3 months ago

LongT5: Efficient Text-To-Text Transformer for Long Sequences

Mandy Guo, Joshua Ainslie, David Uthus, Santiago Ontanon, Jianmo Ni, Yun-Hsuan Sung, Yinfei Yang

Abstract

Recent work has shown that either (1) increasing the input length or (2) increasing model size can improve the performance of Transformer-based neural models. In this paper, we present a new model, called LongT5, with which we explore the effects of scaling both the input length and model size at the same time. Specifically, we integrated attention ideas from long-input transformers (ETC), and adopted pre-training strategies from summarization pre-training (PEGASUS) into the scalable T5 architecture. The result is a new attention mechanism we call Transient Global (TGlobal), which mimics ETC's local/global attention mechanism, but without requiring additional side-inputs. We are able to achieve state-of-the-art results on several summarization tasks and outperform the original T5 models on question answering tasks.
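The abstract does not spell out how TGlobal works, so the following is only a minimal sketch of the local/global idea under stated assumptions: the input is split into fixed-size blocks, one global token per block is built on the fly by summing that block's token embeddings (which is why no side-inputs are needed), and every token attends to its neighbors within a local radius plus all global tokens. The function name `tglobal_attention`, the parameters `block_size` and `local_radius`, the single-head form without learned projections, and the simple rescaling used in place of a learned normalization are all illustrative choices, not the authors' implementation.

```python
import numpy as np

def tglobal_attention(x, block_size=4, local_radius=2):
    """Single-head local + transient-global attention (illustrative sketch).

    x: array of shape (seq_len, d). Learned query/key/value projections are
    omitted so that only the attention pattern is shown.
    """
    seq_len, d = x.shape
    assert seq_len % block_size == 0, "sketch assumes seq_len divisible by block_size"
    num_blocks = seq_len // block_size

    # Transient global tokens: one per block, computed from the input itself.
    global_tokens = x.reshape(num_blocks, block_size, d).sum(axis=1)
    # Assumed stand-in for a learned normalization layer.
    global_tokens = global_tokens / np.sqrt(block_size)

    # Keys/values are the input tokens followed by the global tokens.
    kv = np.concatenate([x, global_tokens], axis=0)  # (seq_len + num_blocks, d)
    scores = x @ kv.T / np.sqrt(d)                   # (seq_len, seq_len + num_blocks)

    # Each token may see neighbors within local_radius and every global token.
    pos = np.arange(seq_len)
    local_mask = np.abs(pos[:, None] - pos[None, :]) <= local_radius
    global_mask = np.ones((seq_len, num_blocks), dtype=bool)
    mask = np.concatenate([local_mask, global_mask], axis=1)
    scores = np.where(mask, scores, -np.inf)

    # Row-wise softmax; masked positions receive zero weight.
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ kv, weights

rng = np.random.default_rng(0)
tokens = rng.normal(size=(8, 3))
output, weights = tglobal_attention(tokens)
print(output.shape, weights.shape)
```

This sketch builds a dense mask for readability, so it still costs quadratic memory; an efficient version would compute only the local windows and the global columns.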

Benchmarks

| Benchmark | Methodology | Metrics |
| --- | --- | --- |
| abstractive-text-summarization-on-cnn-daily | LongT5 | ROUGE-1: 43.94; ROUGE-2: 21.40; ROUGE-L: 41.28 |
| long-range-modeling-on-scrolls | LongT5 Base | Avg.: 38.6; CNLI: 85.6; GovRep: 57.7 / 30.0 / 31.4; Nrtv: 23.0; QALT EM-T/H: 37.9 / 36.6; QMSum: 33.9 / 11.0 / 22.8; Qspr: 46.6; SumScr: 34.8 / 9.6 / 21.1 |
| long-range-modeling-on-scrolls | LongT5 XL | Avg.: 42.53; CNLI: 88.2; GovRep: 61.1 / 32.3 / 33.7; Nrtv: 29.3; QALT EM-T/H: 46.0 / 42.1; QMSum: 34.9 / 11.8 / 23.5; Qspr: 53.1; SumScr: 35.8 / 9.6 / 21.1 |
| long-range-modeling-on-scrolls | LongT5 Large | Avg.: 41.03; CNLI: 87.3; GovRep: 61.3 / 32.2 / 33.8; Nrtv: 27.2; QALT EM-T/H: 40.6 / 38.6; QMSum: 35.1 / 12.0 / 23.3; Qspr: 52.3; SumScr: 60.3 / 31.1 / 32.8 |
| multi-document-summarization-on-multi-news | LongT5 | ROUGE-1: 48.17; ROUGE-2: 19.43; ROUGE-SU4: 24.94 |
| text-summarization-on-arxiv | LongT5 | ROUGE-1: 48.35; ROUGE-2: 21.92; ROUGE-L: 44.27 |
| text-summarization-on-bigpatent | LongT5 | ROUGE-1: 76.87; ROUGE-2: 66.06; ROUGE-L: 70.76 |
| text-summarization-on-pubmed-1 | LongT5 | ROUGE-1: 50.23; ROUGE-2: 24.76; ROUGE-L: 46.67 |
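The page does not say how the SCROLLS "Avg." figure is derived. As a rough check, the sketch below assumes the usual SCROLLS convention: each summarization task (GovRep, QMSum, SumScr) contributes the geometric mean of its three ROUGE scores, QALT contributes only its EM-T score, and the seven task scores are then averaged. The `ROWS` dictionary and the `scrolls_average` helper are illustrative names; the numbers are copied from the LongT5 Base and LongT5 XL rows above.

```python
from statistics import geometric_mean, mean

# Scores copied from the benchmark listing (LongT5 Base and LongT5 XL rows).
ROWS = {
    "LongT5 Base": {
        "listed_avg": 38.6, "CNLI": 85.6, "GovRep": (57.7, 30.0, 31.4),
        "Nrtv": 23.0, "QALT": (37.9, 36.6), "QMSum": (33.9, 11.0, 22.8),
        "Qspr": 46.6, "SumScr": (34.8, 9.6, 21.1),
    },
    "LongT5 XL": {
        "listed_avg": 42.53, "CNLI": 88.2, "GovRep": (61.1, 32.3, 33.7),
        "Nrtv": 29.3, "QALT": (46.0, 42.1), "QMSum": (34.9, 11.8, 23.5),
        "Qspr": 53.1, "SumScr": (35.8, 9.6, 21.1),
    },
}

def scrolls_average(row):
    """Average seven task scores under the assumed SCROLLS aggregation rule."""
    per_task = [
        row["CNLI"],
        geometric_mean(row["GovRep"]),  # geometric mean of ROUGE-1 / ROUGE-2 / ROUGE-L
        row["Nrtv"],
        row["QALT"][0],                 # EM-T only; EM-H is reported but not averaged
        geometric_mean(row["QMSum"]),
        row["Qspr"],
        geometric_mean(row["SumScr"]),
    ]
    return mean(per_task)

for name, row in ROWS.items():
    print(f"{name}: recomputed {scrolls_average(row):.2f}, listed {row['listed_avg']}")
```

Under this assumption, both recomputed values land within about 0.1 of the listed averages, which suggests the aggregation rule is at least consistent with these two rows.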
