
Big Bird: Transformers for Longer Sequences

Abstract

Transformer-based models, such as BERT, have been among the most successful deep learning models for NLP. Unfortunately, one of their core limitations is the quadratic dependency (mainly in terms of memory) on the sequence length due to their full attention mechanism. To remedy this, we propose BigBird, a sparse attention mechanism that reduces this quadratic dependency to linear. We show that BigBird is a universal approximator of sequence functions and is Turing complete, thereby preserving these properties of the quadratic, full-attention model. Along the way, our theoretical analysis reveals some of the benefits of having $O(1)$ global tokens (such as CLS) that attend to the entire sequence as part of the sparse attention mechanism. The proposed sparse attention can handle sequences up to 8x longer than was previously possible using similar hardware. As a consequence of the capability to handle longer context, BigBird drastically improves performance on various NLP tasks such as question answering and summarization. We also propose novel applications to genomics data.
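The sparse attention described above combines a handful of global tokens (such as CLS) with the local sliding-window and random connections used in the paper's construction. The snippet below is a minimal, token-level sketch of such a mask, not the blocked implementation used in BigBird itself; the parameter names (num_global, window, num_random) are illustrative assumptions.

```python
import numpy as np

def bigbird_attention_mask(seq_len, num_global=2, window=3, num_random=2, seed=0):
    """Boolean mask where mask[i, j] == True means query i may attend to key j.
    Combines global tokens, a sliding window, and random connections."""
    rng = np.random.default_rng(seed)
    mask = np.zeros((seq_len, seq_len), dtype=bool)

    # Global tokens (e.g. CLS): attend to everything and are attended by everything.
    mask[:num_global, :] = True
    mask[:, :num_global] = True

    # Sliding window: each token attends to `window` neighbours on either side.
    for i in range(seq_len):
        lo, hi = max(0, i - window), min(seq_len, i + window + 1)
        mask[i, lo:hi] = True

    # Random connections: each token additionally attends to a few random positions.
    for i in range(seq_len):
        mask[i, rng.choice(seq_len, size=num_random, replace=False)] = True

    return mask

# Each query row keeps O(window + num_random + num_global) keys instead of O(seq_len),
# which is what turns the quadratic memory cost into a linear one.
print(bigbird_attention_mask(16).sum(axis=1))
```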

Benchmarks

Benchmark | Methodology | Metrics
Document Summarization on BBC XSum | BigBird-Pegasus | ROUGE-1: 47.12, ROUGE-2: 24.05, ROUGE-L: 38.8
Document Summarization on CNN / Daily Mail | BigBird-Pegasus | ROUGE-1: 43.84, ROUGE-2: 21.11, ROUGE-L: 40.74
Linguistic Acceptability on CoLA | BigBird | Accuracy: 58.5%
Natural Language Inference on MultiNLI | BigBird | Matched: 87.5
Natural Language Inference on QNLI | BigBird | Accuracy: 92.2%
Natural Language Inference on RTE | BigBird | Accuracy: 75.0%
Question Answering on HotpotQA | BigBird-ETC | ANS-F1: 0.755, JOINT-F1: 0.736, SUP-F1: 0.891
Question Answering on Quora Question Pairs | BigBird | Accuracy: 88.6%
Question Answering on TriviaQA | BigBird-ETC | F1: 80.9
Question Answering on WikiHop | BigBird-ETC | Test: 82.3
Semantic Textual Similarity on MRPC | BigBird | F1: 91.5
Semantic Textual Similarity on STS Benchmark | BigBird | Spearman Correlation: 0.878
Sentiment Analysis on SST-2 (Binary) | BigBird | Accuracy: 94.6
Text Classification on arXiv | BigBird | Accuracy: 92.31
Text Classification on Hyperpartisan | BigBird | Accuracy: 92.2
Text Classification on Hyperpartisan-1 | BigBird | Accuracy: 92.2
Text Classification on Patents | BigBird | Accuracy: 69.3
Text Classification on Yelp-5 | BigBird | Accuracy: 72.16%
Text Summarization on arXiv-1 | BigBird-Pegasus | ROUGE-1: 46.63, ROUGE-2: 19.02, ROUGE-L: 41.77
Text Summarization on BigPatent | BigBird-Pegasus | ROUGE-1: 60.64, ROUGE-2: 42.46, ROUGE-L: 50.01
Text Summarization on PubMed-1 | BigBird-Pegasus | ROUGE-1: 46.32, ROUGE-2: 20.65, ROUGE-L: 42.33
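The summarization rows above come from BigBird-Pegasus models fine-tuned on long-document corpora. As a rough illustration of how such a model can be run on a long document, here is a minimal sketch assuming the Hugging Face `transformers` library and the public `google/bigbird-pegasus-large-arxiv` checkpoint; it is not the exact evaluation setup behind the reported numbers.

```python
from transformers import AutoTokenizer, BigBirdPegasusForConditionalGeneration

# Assumed public checkpoint fine-tuned for scientific-article summarization.
model_name = "google/bigbird-pegasus-large-arxiv"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = BigBirdPegasusForConditionalGeneration.from_pretrained(model_name)

document = "..."  # a long article; BigBird's sparse attention handles thousands of tokens
inputs = tokenizer(document, truncation=True, max_length=4096, return_tensors="pt")
summary_ids = model.generate(**inputs, num_beams=5, max_length=256)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```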
