Open-Domain Question Answering on KILT
Evaluation Metrics
EM
F1
KILT-EM
KILT-F1
R-Prec
Recall@5
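For orientation, the metrics listed above can be sketched per example as follows. This is a minimal illustration, not the official KILT scorer. It assumes SQuAD-style answer normalization, a single gold answer, and page-level provenance given as plain lists of page identifiers. All function names here are hypothetical. KILT-EM and KILT-F1 are modelled as the downstream score awarded only when R-Precision equals 1, as described in the KILT paper.

```python
import re
import string
from collections import Counter


def normalize(text):
    """Lowercase, drop punctuation and articles, collapse whitespace (assumed SQuAD-style)."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(pred, gold):
    """EM: 1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(pred) == normalize(gold))


def f1(pred, gold):
    """Token-level F1 between normalized prediction and gold answer."""
    pred_tokens = normalize(pred).split()
    gold_tokens = normalize(gold).split()
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def r_precision(ranked_pages, gold_pages):
    """R-Prec: fraction of the top-R retrieved pages that are gold, with R = number of gold pages."""
    gold = set(gold_pages)
    if not gold:
        return 0.0
    top_r = set(ranked_pages[: len(gold)])
    return len(top_r & gold) / len(gold)


def recall_at_k(ranked_pages, gold_pages, k=5):
    """Recall@k: fraction of gold pages found among the top-k retrieved pages."""
    gold = set(gold_pages)
    if not gold:
        return 0.0
    return len(set(ranked_pages[:k]) & gold) / len(gold)


def kilt_score(downstream_score, rprec):
    """KILT-EM / KILT-F1: keep the downstream score only if R-Prec is 1."""
    return downstream_score if rprec == 1.0 else 0.0


# Example: a correct answer with incomplete provenance earns no KILT-EM credit.
em = exact_match("The Eiffel Tower", "eiffel tower!")
rp = r_precision(["A", "B", "C"], ["A", "C"])
kilt_em = kilt_score(em, rp)
```

The scores in the results table are percentages averaged over the test set, whereas the functions above return per-example values between 0 and 1.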
Evaluation Results
Performance of each model on this benchmark
Model Name | EM | F1 | KILT-EM | KILT-F1 | R-Prec | Recall@5 | Paper Title | Repository |
---|---|---|---|---|---|---|---|---|
Multitask DPR + BART | 39.75 | 48.43 | 29.09 | 34.7 | 59.42 | 68.24 | - | - |
Sphere | 46.05 | 56.57 | 0.0 | 0.0 | 0.0 | 0.0 | - | - |
KGI_0 | 45.22 | 53.38 | 36.36 | 41.83 | 63.71 | 70.17 | - | - |
intersect | 53.74 | 62.24 | 38.78 | 44.4 | 63.16 | 68.19 | - | - |
Multi-task DPR | 0.0 | 0.0 | 0.0 | 0.0 | 59.42 | 68.24 | - | - |
BART + DPR | 41.27 | 49.54 | 30.06 | 34.72 | 54.29 | 65.52 | - | - |
BERT + DPR | 38.64 | 47.09 | 31.99 | 37.58 | 60.66 | 46.79 | - | - |
multi-task small | 0.35 | 3.72 | 0.0 | 0.0 | 0.0 | 0.0 | - | - |
Re2G | 51.73 | 60.97 | 43.56 | 49.8 | 70.78 | 76.63 | Re2G: Retrieve, Rerank, Generate | |
BART | 21.75 | 28.69 | 0.0 | 0.0 | 0.0 | 0.0 | - | - |
Wikipedia | 51.59 | 60.83 | 35.32 | 40.73 | 59.83 | 71.17 | - | - |
TABi | 0.0 | 0.0 | 0.0 | 0.0 | 62.6 | 64.95 | - | - |
T5-base | 19.6 | 27.73 | 0.0 | 0.0 | 0.0 | 0.0 | KILT: a Benchmark for Knowledge Intensive Language Tasks | |
chriskuei | 0.0 | 0.0 | 0.0 | 0.0 | 60.32 | 61.21 | - | - |
GENRE | 0.0 | 0.0 | 0.0 | 0.0 | 60.25 | 61.36 | - | - |
RAG | 44.39 | 52.35 | 32.69 | 37.91 | 59.49 | 67.06 | - | - |