Star-Transformer

Qipeng Guo, Xipeng Qiu, Pengfei Liu, Yunfan Shao, Xiangyang Xue, Zheng Zhang

Abstract

Although the Transformer has achieved great success on many NLP tasks, its heavy structure with fully-connected attention leads to a dependence on large amounts of training data. In this paper, we present Star-Transformer, a lightweight alternative obtained by careful sparsification. To reduce model complexity, we replace the fully-connected structure with a star-shaped topology, in which every two non-adjacent nodes are connected through a shared relay node. Thus, complexity is reduced from quadratic to linear, while preserving the capacity to capture both local composition and long-range dependencies. Experiments on four tasks (22 datasets) show that Star-Transformer achieves significant improvements over the standard Transformer on modestly sized datasets.
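The sparsification works in two alternating steps: each satellite (token) node attends only to its ring neighbours, its original embedding, and the shared relay node, while the relay node attends to all satellites, carrying long-range information in linear time. The sketch below illustrates this update cycle; it is a simplified single-head illustration under our own assumptions (ring wrap-around instead of boundary masking, no multi-head attention or layer normalization) and is not the official dmlc/dgl implementation.

```python
# Minimal sketch of one Star-Transformer update round (illustrative only;
# names and the single-head attention are simplifying assumptions).
import torch
import torch.nn.functional as F

def star_transformer_step(h, relay, emb):
    """One round of updates for satellite states `h` and the relay node.

    h:     (n, d) current satellite states, one per token
    relay: (d,)   shared relay-node state
    emb:   (n, d) original token embeddings
    Returns updated (h, relay).
    """
    n, d = h.shape
    scale = d ** 0.5

    # Satellite update: node i attends only to a small local context,
    # so the total cost over all nodes is linear in n.
    new_h = torch.empty_like(h)
    for i in range(n):
        context = torch.stack([
            h[(i - 1) % n],   # left ring neighbour (wrap-around for simplicity)
            h[i],             # the node itself
            h[(i + 1) % n],   # right ring neighbour
            emb[i],           # original token embedding
            relay,            # shared relay node
        ])                                    # (5, d)
        scores = context @ h[i] / scale       # (5,)
        attn = F.softmax(scores, dim=0)
        new_h[i] = attn @ context             # weighted sum, (d,)

    # Relay update: the relay attends to itself and all satellites,
    # which is what propagates long-range dependencies.
    context = torch.cat([relay.unsqueeze(0), new_h], dim=0)  # (n+1, d)
    scores = context @ relay / scale
    attn = F.softmax(scores, dim=0)
    new_relay = attn @ context

    return new_h, new_relay

# Toy usage: 6 tokens with hidden size 8, two message-passing rounds.
emb = torch.randn(6, 8)
h, relay = emb.clone(), emb.mean(dim=0)
for _ in range(2):
    h, relay = star_transformer_step(h, relay, emb)
print(h.shape, relay.shape)  # torch.Size([6, 8]) torch.Size([8])
```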

Code Repositories

dmlc/dgl (Official, PyTorch)

Benchmarks

Benchmark | Methodology | Metric
natural-language-inference-on-snli | Star-Transformer (no cross sentence attention) | Test Accuracy (%): 86.0
sentiment-analysis-on-sst-5-fine-grained | Star-Transformer | Accuracy: 53.0
