Mandy Guo, Joshua Ainslie, David Uthus, Santiago Ontanon, Jianmo Ni, Yun-Hsuan Sung, Yinfei Yang

Abstract
Recent work has shown that either (1) increasing the input length or (2) increasing model size can improve the performance of Transformer-based neural models. In this paper, we present a new model, called LongT5, with which we explore the effects of scaling both the input length and model size at the same time. Specifically, we integrated attention ideas from long-input transformers (ETC) and adopted pre-training strategies from summarization pre-training (PEGASUS) into the scalable T5 architecture. The result is a new attention mechanism we call Transient Global (TGlobal), which mimics ETC's local/global attention mechanism but does not require additional side inputs. We achieve state-of-the-art results on several summarization tasks and outperform the original T5 models on question answering tasks.
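The abstract only names TGlobal. As a rough illustration, here is a minimal pure-Python sketch of one TGlobal-style attention step, assuming (our reading, not spelled out in the abstract) that the input is split into fixed-size blocks, each block is pooled into a "transient" global token recomputed on the fly from the current inputs, and every token attends to a local window plus all of those global tokens. The names `tglobal_attention`, `block_means`, `radius` and `block_size` are hypothetical. Mean pooling, a single head, and the absence of learned projections and position biases are simplifications, not LongT5's actual implementation.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(p * q for p, q in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def block_means(x: List[Vector], block_size: int) -> List[Vector]:
    """Hypothetical transient global tokens: one per block of `block_size`
    consecutive tokens, computed from the current inputs rather than taken
    from a separate side input. Assumption: a plain mean (the last block
    may be shorter than `block_size`)."""
    dim = len(x[0])
    out = []
    for start in range(0, len(x), block_size):
        block = x[start:start + block_size]
        out.append([sum(v[d] for v in block) / len(block) for d in range(dim)])
    return out


def tglobal_attention(x: List[Vector], radius: int, block_size: int) -> List[Vector]:
    """Single-head sketch: token i attends to tokens i - radius .. i + radius
    plus every transient global token. Queries, keys and values are the raw
    vectors (no learned projections, no position biases)."""
    globals_ = block_means(x, block_size)
    scale = 1.0 / math.sqrt(len(x[0]))
    out = []
    for i, query in enumerate(x):
        lo, hi = max(0, i - radius), min(len(x), i + radius + 1)
        keys = x[lo:hi] + globals_  # local window followed by the global tokens
        weights = softmax([dot(query, k) * scale for k in keys])
        # Output is the attention-weighted sum of the attended vectors.
        out.append([sum(w * k[d] for w, k in zip(weights, keys))
                    for d in range(len(query))])
    return out


if __name__ == "__main__":
    tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.0]]
    print(block_means(tokens, block_size=2))  # [[0.5, 0.5], [1.5, 0.5]]
    print(tglobal_attention(tokens, radius=1, block_size=2))
```

Because the global tokens are derived from the input itself, no side input is needed, which is the property the abstract highlights. Each query scores at most 2·radius + 1 local keys plus ⌈n / block_size⌉ global keys; with hypothetical settings radius = 127 and block_size = 16, a 4,096-token input gives at most 255 + 256 = 511 keys per query instead of 4,096 under full attention.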
Benchmarks
Summarization

| Benchmark | Model | ROUGE-1 | ROUGE-2 | ROUGE-L | ROUGE-SU4 |
|---|---|---|---|---|---|
| CNN / Daily Mail (abstractive) | LongT5 | 43.94 | 21.40 | 41.28 | – |
| arXiv | LongT5 | 48.35 | 21.92 | 44.27 | – |
| BigPatent | LongT5 | 76.87 | 66.06 | 70.76 | – |
| PubMed | LongT5 | 50.23 | 24.76 | 46.67 | – |
| Multi-News (multi-document) | LongT5 | 48.17 | 19.43 | – | 24.94 |

Long-range modeling (SCROLLS)

| Model | Avg. | ContractNLI | GovReport (R-1 / R-2 / R-L) | NarrativeQA | QuALITY (EM-T / EM-H) | QMSum (R-1 / R-2 / R-L) | Qasper | SummScreenFD (R-1 / R-2 / R-L) |
|---|---|---|---|---|---|---|---|---|
| LongT5 Base | 38.6 | 85.6 | 57.7 / 30.0 / 31.4 | 23.0 | 37.9 / 36.6 | 33.9 / 11.0 / 22.8 | 46.6 | 34.8 / 9.6 / 21.1 |
| LongT5 Large | 41.03 | 87.3 | 61.3 / 32.2 / 33.8 | 27.2 | 40.6 / 38.6 | 35.1 / 12.0 / 23.3 | 52.3 | 60.3 / 31.1 / 32.8 |
| LongT5 XL | 42.53 | 88.2 | 61.1 / 32.3 / 33.7 | 29.3 | 46.0 / 42.1 | 34.9 / 11.8 / 23.5 | 53.1 | 35.8 / 9.6 / 21.1 |
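The SCROLLS Avg. column condenses mixed metrics into one number. A minimal sketch of that aggregation follows, assuming (our reading of the SCROLLS scoring rule, not stated on this page) that each ROUGE-1 / ROUGE-2 / ROUGE-L triple is collapsed to its geometric mean, QuALITY contributes only EM-T, every other task contributes its single score, and the seven per-task values are averaged without weights. The function and variable names are hypothetical; the per-task numbers are copied from the Base and XL rows above.

```python
from math import prod

# Tasks scored by a ROUGE-1 / ROUGE-2 / ROUGE-L triple.
ROUGE_TASKS = {"GovReport", "QMSum", "SummScreenFD"}


def geometric_mean(values) -> float:
    return prod(values) ** (1.0 / len(values))


def scrolls_average(row: dict) -> float:
    """Assumed SCROLLS aggregation: ROUGE triples collapse to their geometric
    mean, QuALITY contributes EM-T (the first of EM-T / EM-H), every other
    task contributes its single score; the result is the unweighted mean."""
    per_task = []
    for task, score in row.items():
        if task in ROUGE_TASKS:
            per_task.append(geometric_mean(score))
        elif task == "QuALITY":
            per_task.append(score[0])
        else:
            per_task.append(score)
    return sum(per_task) / len(per_task)


# Per-task scores copied from the LongT5 Base and XL rows of the table.
LONGT5_BASE = {
    "ContractNLI": 85.6,
    "GovReport": (57.7, 30.0, 31.4),
    "NarrativeQA": 23.0,
    "QuALITY": (37.9, 36.6),
    "QMSum": (33.9, 11.0, 22.8),
    "Qasper": 46.6,
    "SummScreenFD": (34.8, 9.6, 21.1),
}
LONGT5_XL = {
    "ContractNLI": 88.2,
    "GovReport": (61.1, 32.3, 33.7),
    "NarrativeQA": 29.3,
    "QuALITY": (46.0, 42.1),
    "QMSum": (34.9, 11.8, 23.5),
    "Qasper": 53.1,
    "SummScreenFD": (35.8, 9.6, 21.1),
}

if __name__ == "__main__":
    for name, row in (("Base", LONGT5_BASE), ("XL", LONGT5_XL)):
        print(name, round(scrolls_average(row), 2))
```

Under these assumptions the recomputed averages land within about 0.1 of the reported 38.6 and 42.53; a small gap is expected because the per-task scores on this page are themselves rounded.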