Question Answering on HotpotQA
Evaluation Metrics
ANS-EM
ANS-F1
JOINT-EM
JOINT-F1
SUP-EM
SUP-F1
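The page lists the metric names without defining them. The sketch below shows one way such scores could be computed for a single example. It assumes conventions commonly used in HotpotQA-style evaluation: ANS-* compares the normalized answer strings, SUP-* compares sets of supporting-fact (title, sentence index) pairs, and JOINT-* multiplies the answer and supporting-fact precision and recall before taking F1. The function names and the simplified normalization are illustrative assumptions, not the benchmark's official scorer, and special answer types such as yes/no are not handled.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def answer_scores(pred: str, gold: str):
    """Return (em, precision, recall, f1) for one answer string (ANS-*)."""
    pred_tokens = normalize(pred).split()
    gold_tokens = normalize(gold).split()
    em = float(pred_tokens == gold_tokens)
    # Token overlap counted with multiplicity.
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    precision = overlap / len(pred_tokens) if pred_tokens else 0.0
    recall = overlap / len(gold_tokens) if gold_tokens else 0.0
    return em, precision, recall, f1_from(precision, recall)


def support_scores(pred, gold):
    """Return (em, precision, recall, f1) over supporting-fact pairs (SUP-*)."""
    pred_set, gold_set = set(pred), set(gold)
    true_pos = len(pred_set & gold_set)
    precision = true_pos / len(pred_set) if pred_set else 0.0
    recall = true_pos / len(gold_set) if gold_set else 0.0
    em = float(pred_set == gold_set)
    return em, precision, recall, f1_from(precision, recall)


def joint_scores(ans, sup):
    """Return (joint_em, joint_f1) from the two score tuples (JOINT-*)."""
    ans_em, ans_p, ans_r, _ = ans
    sup_em, sup_p, sup_r, _ = sup
    # Assumed convention: multiply precision and recall, then take F1.
    return ans_em * sup_em, f1_from(ans_p * sup_p, ans_r * sup_r)


if __name__ == "__main__":
    ans = answer_scores("the Eiffel Tower", "Eiffel Tower")
    sup = support_scores([("A", 0), ("B", 1)], [("A", 0)])
    print(ans, sup, joint_scores(ans, sup))
```

Under this convention a system only scores well on the joint metrics when both the answer and its supporting facts are right, which is consistent with the JOINT columns in the table being lower than the ANS and SUP columns for every listed model.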
Results
Performance of each model on this benchmark
Model | ANS-EM | ANS-F1 | JOINT-EM | JOINT-F1 | SUP-EM | SUP-F1 | Paper Title | Repository |
---|---|---|---|---|---|---|---|---|
RoBERTa-DenseRetriever-Fast | 0.598 | 0.727 | 0.345 | 0.602 | 0.480 | 0.749 | - | - |
SAQA | 0.284 | 0.386 | 0.086 | 0.245 | 0.147 | 0.472 | - | - |
MultiQA | 0.307 | 0.402 | 0.000 | 0.000 | 0.000 | 0.000 | - | - |
Entity-centric IR | 0.354 | 0.463 | 0.000 | 0.255 | 0.001 | 0.432 | - | - |
GRN + BERT | 0.299 | 0.391 | 0.083 | 0.258 | 0.132 | 0.497 | - | - |
HopRetriever + Sp-search | 0.671 | 0.799 | 0.432 | 0.706 | 0.574 | 0.835 | HopRetriever: Retrieve Hops over Wikipedia to Answer Complex Questions | - |
SAFSr-Bert | 0.394 | 0.514 | 0.133 | 0.370 | 0.242 | 0.585 | - | - |
DPR-recurrent | 0.598 | 0.727 | 0.345 | 0.602 | 0.480 | 0.749 | - | - |
HGN Model-reproduce | 0.335 | 0.427 | 0.110 | 0.284 | 0.156 | 0.493 | - | - |
GoldEn Retriever | 0.379 | 0.486 | 0.180 | 0.391 | 0.307 | 0.642 | Answering Complex Open-domain Questions Through Iterative Query Generation | |
HopRetriever-V1 | 0.608 | 0.739 | 0.380 | 0.639 | 0.531 | 0.793 | - | - |
HopAns | 0.617 | 0.746 | 0.368 | 0.629 | 0.500 | 0.772 | - | - |
GAR | 0.482 | 0.613 | 0.306 | 0.530 | 0.483 | 0.739 | - | - |
tes | 0.074 | 0.121 | 0.000 | 0.011 | 0.000 | 0.078 | - | - |
PR-Bert | 0.433 | 0.538 | 0.145 | 0.391 | 0.219 | 0.596 | - | - |
Chain-of-Skills | 0.674 | 0.801 | 0.457 | 0.717 | 0.613 | 0.853 | Chain-of-Skills: A Configurable Model for Open-domain Question Answering | |
Beam Retrieval | 0.727 | 0.850 | 0.505 | 0.775 | 0.663 | 0.901 | End-to-End Beam Retrieval for Multi-Hop Question Answering | |
DR model | 0.588 | 0.717 | 0.293 | 0.568 | 0.416 | 0.725 | - | - |
AFSgraph | 0.601 | 0.730 | 0.359 | 0.617 | 0.500 | 0.769 | - | - |
HopRetriever | 0.671 | 0.799 | 0.431 | 0.698 | 0.572 | 0.826 | - | - |