Question Answering on PeerQA
Evaluation Metrics
AlignScore
Prometheus-2 Answer Correctness
Rouge-L
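Of the three metrics, Rouge-L is the only one simple enough to sketch without a trained scoring model. AlignScore and Prometheus-2 both rely on learned evaluators. Rouge-L scores a candidate answer by the longest common subsequence (LCS) of tokens it shares with a reference answer, combined into an F-measure. The sketch below is a minimal illustration that assumes lowercased whitespace tokenization and β = 1. The tokenizer, stemming, and β actually used for this benchmark are not stated here, so its numbers may differ from the table. The function names `lcs_length` and `rouge_l` are hypothetical and do not come from any library.

```python
def lcs_length(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence of two token lists (dynamic programming)."""
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[m][n]


def rouge_l(candidate: str, reference: str, beta: float = 1.0) -> float:
    """Rouge-L F-measure; assumes lowercased whitespace tokens (hypothetical setup)."""
    c = candidate.lower().split()
    r = reference.lower().split()
    if not c or not r:
        return 0.0
    lcs = lcs_length(c, r)
    if lcs == 0:
        return 0.0
    precision = lcs / len(c)  # share of candidate tokens in the LCS
    recall = lcs / len(r)     # share of reference tokens in the LCS
    return (1 + beta**2) * precision * recall / (recall + beta**2 * precision)


# "the cat on the mat" is the LCS (5 tokens), so precision = recall = 5/6.
print(rouge_l("the cat sat on the mat", "the cat is on the mat"))
```

Because the LCS need not be contiguous, Rouge-L rewards answers that preserve the reference's word order even when extra words are interleaved.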
Evaluation Results
Performance of each model on this benchmark
Comparison Table
Model Name | AlignScore | Prometheus-2 Answer Correctness | Rouge-L |
---|---|---|---|
language-models-are-few-shot-learners | 0.1378 | 3.0408 | 0.2414 |
mistral-7b | 0.0827 | 3.4245 | 0.1922 |
Model 3 | 0.1362 | 3.0571 | 0.2294 |
gpt-4-technical-report-1 | 0.1224 | 3.4612 | 0.2266 |
the-llama-3-herd-of-models | 0.1098 | 3.1102 | 0.2295 |
the-llama-3-herd-of-models | 0.1016 | 3.1673 | 0.2286 |
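Each metric ranks the models differently. As a quick check, the sketch below picks the top-scoring row per metric, using the values copied from the comparison table. It assumes that a higher score is better for all three metrics. The names `RESULTS`, `METRICS`, and `best_per_metric` are hypothetical helpers for illustration. Rows are kept as a list rather than a dict because two rows share the name `the-llama-3-herd-of-models`.

```python
# Values copied from the comparison table: (model, AlignScore, Prometheus-2, Rouge-L).
RESULTS = [
    ("language-models-are-few-shot-learners", 0.1378, 3.0408, 0.2414),
    ("mistral-7b", 0.0827, 3.4245, 0.1922),
    ("Model 3", 0.1362, 3.0571, 0.2294),
    ("gpt-4-technical-report-1", 0.1224, 3.4612, 0.2266),
    ("the-llama-3-herd-of-models", 0.1098, 3.1102, 0.2295),
    ("the-llama-3-herd-of-models", 0.1016, 3.1673, 0.2286),
]
METRICS = ("AlignScore", "Prometheus-2 Answer Correctness", "Rouge-L")


def best_per_metric(rows):
    """Return {metric: (model, score)} for the highest score in each column (assumes higher is better)."""
    best = {}
    for idx, metric in enumerate(METRICS, start=1):
        top = max(rows, key=lambda row: row[idx])
        best[metric] = (top[0], top[idx])
    return best


for metric, (model, score) in best_per_metric(RESULTS).items():
    print(f"{metric}: {model} ({score})")
```

On these rows, `language-models-are-few-shot-learners` leads on AlignScore and Rouge-L, while `gpt-4-technical-report-1` leads on Prometheus-2 Answer Correctness.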