Reformer: The Efficient Transformer

Nikita Kitaev, Łukasz Kaiser, Anselm Levskaya

Abstract

Large Transformer models routinely achieve state-of-the-art results on a number of tasks but training these models can be prohibitively costly, especially on long sequences. We introduce two techniques to improve the efficiency of Transformers. For one, we replace dot-product attention by one that uses locality-sensitive hashing, changing its complexity from $O(L^2)$ to $O(L \log L)$, where $L$ is the length of the sequence. Furthermore, we use reversible residual layers instead of the standard residuals, which allows storing activations only once in the training process instead of $N$ times, where $N$ is the number of layers. The resulting model, the Reformer, performs on par with Transformer models while being much more memory-efficient and much faster on long sequences.
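To make the first technique concrete, below is a minimal sketch of the angular locality-sensitive hashing used to group similar query/key vectors into buckets, assuming shared, unit-normalized query/keys; the function and variable names are illustrative, not the paper's API.

```python
# Sketch of angular LSH bucketing: hash(x) = argmax([xR ; -xR]) for a random
# projection R, so vectors with high cosine similarity tend to share a bucket.
import numpy as np

def lsh_hash(vectors, n_buckets, rng):
    """Assign each vector to one of n_buckets buckets via a random rotation."""
    d = vectors.shape[-1]
    assert n_buckets % 2 == 0
    R = rng.standard_normal((d, n_buckets // 2))
    projected = vectors @ R                        # (seq_len, n_buckets // 2)
    both = np.concatenate([projected, -projected], axis=-1)
    return np.argmax(both, axis=-1)                # bucket id per position

rng = np.random.default_rng(0)
qk = rng.standard_normal((1024, 64))
qk /= np.linalg.norm(qk, axis=-1, keepdims=True)   # shared, normalized query/keys
buckets = lsh_hash(qk, n_buckets=32, rng=rng)
# Attention is then restricted to sorted chunks of equal bucket ids, giving
# roughly O(L log L) work instead of the full O(L^2) comparison.
```

The second technique follows the RevNet-style reversible residual formulation, a sketch of which is below; here `F` and `G` stand in for the attention and feed-forward sublayers and are placeholders, not the actual Reformer modules.

```python
# Reversible residual block: inputs can be recomputed from outputs, so
# per-layer activations need not be stored during training.
import numpy as np

def forward(x1, x2, F, G):
    y1 = x1 + F(x2)
    y2 = x2 + G(y1)
    return y1, y2

def inverse(y1, y2, F, G):
    x2 = y2 - G(y1)
    x1 = y1 - F(x2)
    return x1, x2

F = lambda x: np.tanh(x)            # placeholder for the attention sublayer
G = lambda x: np.maximum(x, 0.0)    # placeholder for the feed-forward sublayer

x1, x2 = np.ones((4, 8)), np.zeros((4, 8))
y1, y2 = forward(x1, x2, F, G)
r1, r2 = inverse(y1, y2, F, G)
assert np.allclose(r1, x1) and np.allclose(r2, x2)
```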

Benchmarks

Benchmark | Methodology | Metric
d4rl-on-d4rl | Reformer | Average Reward: 63.9
image-generation-on-imagenet-64x64 | Reformer (6 layers) | Bits per dim: 3.740
image-generation-on-imagenet-64x64 | Reformer (12 layers) | Bits per dim: 3.710
language-modelling-on-wikitext-103 | Reformer 125M | Test perplexity: 26.0
open-domain-question-answering-on-searchqa | Locality-Sensitive Hashing | EM: 66.0
question-answering-on-natural-questions-long | Locality-Sensitive Hashing | F1: 75.5
question-answering-on-quasart-t | Locality-Sensitive Hashing | EM: 53.2
