Question Answering
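Question answering is an important task in natural language processing that aims to have a computer system automatically answer questions posed by users. The task can be subdivided into subtasks such as community question answering and knowledge base question answering, and its main evaluation metrics are exact match (EM) and F1 score. Popular benchmark datasets currently include SQuAD, HotpotQA, bAbI, TriviaQA, and WikiQA. In recent years, models such as T5 and XLNet have performed strongly in this area, improving the accuracy and practicality of question answering systems.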
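The EM and F1 metrics mentioned above can be illustrated with a minimal sketch. It assumes the normalization commonly used for SQuAD-style extractive QA (lowercasing, stripping punctuation and English articles, collapsing whitespace). The function names `normalize_answer`, `exact_match`, and `f1_score` are illustrative choices here, and individual benchmarks in the list below may define their metrics differently.

```python
import re
import string
from collections import Counter


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace.

    Assumption: SQuAD-style normalization; other benchmarks may differ.
    """
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, reference: str) -> float:
    """Return 1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(reference))


def f1_score(prediction: str, reference: str) -> float:
    """Token-level F1 between a predicted answer and one reference answer."""
    pred_tokens = normalize_answer(prediction).split()
    ref_tokens = normalize_answer(reference).split()
    # If either side is empty, score 1.0 only when both are empty.
    if not pred_tokens or not ref_tokens:
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

When a question has several reference answers, a common convention is to take the maximum score over the references and then average over all questions. EM rewards only exact answers, while F1 gives partial credit for overlapping tokens, which is why leaderboards often report both.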
adversarial_qa
AGI Eval
AI2 Kaggle Dataset
Aristo Kaggle Allen AI 8th grade questions
Cardal
AviationQA
KGT5
bAbI
STM
Bamboogle
BBH
BioASQ
BioLinkBERT (large)
BLURB
BioLinkBERT (large)
BoolQ
Gemma-7B
CaseHOLD
Custom Legal-BERT
catbAbI QA-mode
Fast Weight Memory
catbAbI LM-mode
Fast Weight Memory
ChAII - Hindi and Tamil Question Answering
MuCoT
CheGeKa
Children's Book Test
NSE
CliCR
Gated-Attention Reader
CNN / Daily Mail
COCO Visual Question Answering (VQA) real images 1.0 open ended
CODAH
G-DAUG-Combo + RoBERTa-Large
Complex-CronQuestions
SubGTR
COMPLEXQUESTIONS
WebQA
ComplexWebQuestions
TOME-2
ConditionalQA
FiD
ConvFinQA
COPA
PaLM 540B (finetuned)
CoQA
GPT-3 175B (few-shot, k=32)
CronQuestions
DaNetQA
DROP
DROP Test
QDGAT (ensemble)
DuoRC
Vector Database (ChromaDB)
EfficientQA dev
EfficientQA test
EgoTaskQA
FairytaleQA
BART fine-tuned on FairytaleQA
FEVER
FinQA
ELASTIC (RoBERTa-large)
FiQA-2018 (BEIR)
FQuAD
FriendsQA
GeoQuestions1089
GeoQA2
GraphQuestions
ChatGPT
HellaSwag
HotpotQA
Beam Retrieval
HotpotQA (BEIR)
BM25+CE
HybridQA
MAFiD
JaQuAD
BERT-Japanese
JD Product Question Answer
PAAG
KILT: ELI5
KQA Pro
MapEval-API
Claude-3.5-Sonnet (ReAct)
MapEval-Textual
Mathematics Dataset
TP-Transformer
MCTest-160
syntax, frame, coreference, and word embedding features
MCTest-500
MedMCQA Dev
MedMobile (3.8B)
MedQA
DRAGON + BioLinkBERT
MedTurkQuAD: Medical Turkish Question-Answering Dataset
MetaQA
T5-small+prolog
MMLU
Molweni
MRQA
MRQA out-of-domain
RGX
MS MARCO
MuLD (HotpotQA)
MuLD (NarrativeQA)
MultiQ
MultiRC
PaLM 540B (finetuned)
MultiSpanQA
RoBERTa-large Tagger + LIQUID (Ensemble)
MultiTQ
NarrativeQA
Masque (NarrativeQA + MS MARCO)
Natural Questions
Atlas (full, Wiki-dec-2018 index)
Natural Questions (long)
DensePhrases
NaturalQA
DPR
NewsQA
OpenAI/o3-mini-2025-01-31-high
NExT-QA (Open-ended VideoQA)
NQ (BEIR)
OBQA
FLAN 137B (zero-shot)
OpenBookQA
OTT-QA
Fusion Retriever+ETC
PeerQA
GPT-4o-2024-08-06-128k
PIQA
LLaMA 65B (0-shot)
PopQA
SelfRAG-7b
PubChemQA
BioMedGPT-10B
PubMedQA
BioGPT-Large(1.5B)
QASent
Attentive LSTM
QASPER
Longformer Encoder Decoder (base)
QuAC
FlowQA (single model)
QuALITY
Quasart-T
Quora Question Pairs
DeBERTa (large)
RACE
RecipeQA
multimodal+LXMERT+ConstrainedMaxPooling
ReClor
XLNet-large
Reverb
RuOpenBookQA
SberQuAD
SCDE
SchizzoSQUAD
SemEvalCQA
HyperQA
SimpleQuestions
SIQA
LLaMA 65B (zero-shot)
SQA3D
CREMA
SQuAD
squad_adversarial
squad_v2
SQuAD1.1
LUKE
SQuAD1.1 dev
T5-11B
SQuAD2.0
SQuAD2.0 dev
XLNet (single model)
squadshifts amazon
squadshifts new_wiki
squadshifts nyt
squadshifts reddit
StepGame
TP-MANN
Story Cloze
Neo-6B (QA + WS)
StoryCloze
BLOOMZ
StrategyQA
PaLM 2 (few-shot, CoT, SC)
SWAG
DeBERTaV3large
TAT-QA
TagOp
TempQA-WD
TempQuestions
QAap
TimeQuestions
TIQ
Torque
ECONET
TrecQA
TANDA DeBERTa-V3-Large + ALL
TriviaQA
PaLM 2-L (one-shot)
TruthfulQA
CoA
TweetQA
ByT5
UniProtQA
VNHSGE-Biology
VNHSGE-Chemistry
VNHSGE-Civic
Bing Chat
VNHSGE-English
VNHSGE-Geography
VNHSGE-History
VNHSGE-Literature
VNHSGE-Mathematics
VNHSGE-Physics
WebQuestions
FiE+PAQ
WebQuestionsSP
ChatGPT
WebSRC
WikiHop
BigBird-etc
WikiQA
TANDA-RoBERTa (ASNQ, WikiQA)
WikiSQL
WikiTableQuestions
TabSQLify (col+row)
YahooCQA
sMIM (1024) +