Question Answering
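Question answering (QA) is a core task in natural language processing that aims to have a computer system automatically answer questions posed by users. It can be subdivided into subtasks such as community question answering and knowledge base question answering, and it is evaluated mainly with exact match (EM) and F1 scores. Popular benchmark datasets include SQuAD, HotpotQA, bAbI, TriviaQA, and WikiQA. In recent years, models such as T5 and XLNet have performed strongly in this area, improving the accuracy and practicality of QA systems.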
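The EM and F1 metrics named above can be illustrated with a minimal sketch. It assumes the widely used SQuAD-style convention for English answers (lowercasing, stripping punctuation and the articles "a", "an", "the", then comparing tokens); individual benchmarks in the list below may define their metrics differently. The function names `normalize_answer`, `exact_match`, and `f1_score` are illustrative and are not taken from any specific library.

```python
import re
import string
from collections import Counter


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace.

    Assumption: SQuAD-style normalization; other benchmarks may differ.
    """
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, reference: str) -> float:
    """Return 1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(reference))


def f1_score(prediction: str, reference: str) -> float:
    """Token-level F1 between a predicted and a reference answer."""
    pred_tokens = normalize_answer(prediction).split()
    ref_tokens = normalize_answer(reference).split()
    # If either side is empty, score 1.0 only when both are empty.
    if not pred_tokens or not ref_tokens:
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most as often
    # as it appears on both sides.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(exact_match("The Eiffel Tower", "eiffel tower"))
    print(f1_score("in Paris France", "Paris"))
```

When a question has several reference answers, a common choice is to take the maximum score over the references and then average over all questions; that aggregation step is left out here to keep the sketch short.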
SQuAD2.0
SQuAD1.1
RuBERT
HotpotQA
Beam Retrieval
PIQA
GPT-3 175B (0-shot)
BoolQ
Gemma-7B
COPA
PaLM 540B (finetuned)
TriviaQA
SpanBERT
SQuAD1.1 dev
T5-11B
Natural Questions
Atlas (full, Wiki-dec-2018 index)
OpenBookQA
WebQuestions
Memory Networks (ensemble)
TruthfulQA
CoA
MultiRC
DeBERTa-1.5B
CronQuestions
PubMedQA
PubMedBERT uncased
MedQA
DRAGON + BioLinkBERT
WikiQA
TANDA-RoBERTa (ASNQ, WikiQA)
SIQA
LLaMA 65B (zero-shot)
StoryCloze
BLOOMZ
DaNetQA
TimeQuestions
Quora Question Pairs
DeBERTa (large)
CNN / Daily Mail
DROP Test
QDGAT (ensemble)
NewsQA
OpenAI/o3-mini-2025-01-31-high
bAbI
STM
Natural Questions (long)
DensePhrases
SQuAD2.0 dev
XLNet (single model)
TrecQA
TANDA DeBERTa-V3-Large + ALL
StrategyQA
PaLM 2 (few-shot, CoT, SC)
MultiTQ
NarrativeQA
Masque (NarrativeQA + MS MARCO)
WikiHop
BigBird-ETC
OBQA
FLAN 137B (zero-shot)
Bamboogle
TIQ
CoQA
BERT Large Augmented (single model)
Children's Book Test
NSE
TempQuestions
QAap
FEVER
KILT: ELI5
FQuAD
QASent
Attentive LSTM
BioASQ
PubMedBERT uncased
Quasar-T
RACE
YahooCQA
sMIM (1024) +
SQA3D
ScanQA (w/ auxiliary loss)
Story Cloze
Neo-6B (QA + WS)
FinQA
ELASTIC (RoBERTa-large)
NQ (BEIR)
DROP
FriendsQA
NExT-QA (Open-ended VideoQA)
PeerQA
GPT-4o-2024-08-06-128k
SemEvalCQA
HyperQA
HybridQA
MAFiD
AI2 Kaggle Dataset
FiQA-2018 (BEIR)
BLURB
BioLinkBERT (large)
QuALITY
catbAbI LM-mode
Fast Weight Memory
FairytaleQA
BART fine-tuned on FairytaleQA
HotpotQA (BEIR)
BM25+CE
MS MARCO
CheGeKa
catbAbI QA-mode
Fast Weight Memory
RuOpenBookQA
MultiQ
NaturalQA
DPR
EgoTaskQA
Complex-CronQuestions
SubGTR
Molweni
OTT-QA
Fusion Retriever+ETC
CaseHOLD
Custom Legal-BERT
ReClor
XLNet-large
TweetQA
ByT5
VNHSGE-English
DuoRC
Vector Database (ChromaDB)
ConditionalQA
FiD
SCDE
ConvFinQA
SberQuAD
Mathematics Dataset
TP-Transformer
CliCR
Gated-Attention Reader
Torque
ECONET
VNHSGE-History
MedTurkQuAD: Medical Turkish Question-Answering Dataset
VNHSGE-Geography
VNHSGE-Literature
WikiTableQuestions
TabSQLify (col+row)
Reverb
VNHSGE-Civic
Bing Chat
COMPLEXQUESTIONS
WebQA
VNHSGE-Physics
MuLD (NarrativeQA)
QuAC
FlowQA (single model)
WikiSQL
MuLD (HotpotQA)
MCTest-500
PubChemQA
BioMedGPT-10B
UniProtQA
CODAH
G-DAUG-Combo + RoBERTa-Large
GeoQuestions1089
GeoQA2
AGI Eval
MapEval-API
Claude-3.5-Sonnet (ReAct)
VNHSGE-Biology
MRQA
VNHSGE-Mathematics
Aristo Kaggle Allen AI 8th grade questions
Cardal
TempQA-WD
VNHSGE-Chemistry
SQuAD
PopQA
SelfRAG-7b
SimpleQuestions
BBH
MapEval-Textual
JaQuAD
BERT-Japanese
MRQA out-of-domain
RGX
WebQuestionsSP
ChatGPT
StepGame
TP-MANN
WebSRC
SWAG
DeBERTaV3large
TAT-QA
TagOp
ChAII - Hindi and Tamil Question Answering
MuCoT
JD Product Question Answer
PAAG
QASPER
Longformer Encoder Decoder (base)
MCTest-160
syntax, frame, coreference, and word embedding features
AviationQA
KGT5
EfficientQA test
RecipeQA
multimodal+LXMERT+ConstrainedMaxPooling
GraphQuestions
ChatGPT
MedMCQA Dev
MedMobile (3.8B)
MetaQA
T5-small+prolog
HellaSwag
COCO Visual Question Answering (VQA) real images 1.0 open ended
EfficientQA dev
ComplexWebQuestions
TOME-2
KQA Pro
MMLU
MultiSpanQA
RoBERTa-large Tagger + LIQUID (Ensemble)
SchizzoSQUAD
squad_adversarial
squadshifts nyt
squadshifts amazon
squadshifts reddit
squad_v2
adversarial_qa
squadshifts new_wiki