Question Answering on PeerQA
Evaluation Metrics
AlignScore
Prometheus-2 Answer Correctness
Rouge-L
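For orientation, Rouge-L is commonly computed from the longest common subsequence (LCS) of tokens shared by a reference answer and a generated answer. The sketch below illustrates that idea only. It assumes lowercased whitespace tokenization and a balanced F1 score, which may differ from the exact configuration behind the scores reported on this benchmark. The function names `lcs_length` and `rouge_l_f1` are hypothetical.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    # Dynamic programming, keeping only the previous row of the table.
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference, candidate):
    """LCS-based F1 between a reference and a candidate answer.

    Assumption: lowercased whitespace tokens and balanced F1; the
    benchmark's own tokenizer and weighting may differ.
    """
    ref = reference.lower().split()
    cand = candidate.lower().split()
    if not ref or not cand:
        return 0.0
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)  # share of candidate tokens in the LCS
    recall = lcs / len(ref)      # share of reference tokens in the LCS
    return 2 * precision * recall / (precision + recall)
```

A score of 1.0 means the two token sequences are identical under this tokenization, and 0.0 means they share no tokens in order.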
Evaluation Results
Performance of each model on this benchmark
Model | AlignScore | Prometheus-2 Answer Correctness | Rouge-L | Paper Title | Repository |
---|---|---|---|---|---|
GPT-3.5-Turbo-0613-16k | 0.1378 | 3.0408 | 0.2414 | Language Models are Few-Shot Learners | |
Mistral-v02-7B-32k | 0.0827 | 3.4245 | 0.1922 | Mistral 7B | |
Command-R-v01-34B | 0.1362 | 3.0571 | 0.2294 | - | - |
GPT-4o-2024-08-06-128k | 0.1224 | 3.4612 | 0.2266 | GPT-4 Technical Report | |
Llama-3-IT-8B-8k | 0.1098 | 3.1102 | 0.2295 | The Llama 3 Herd of Models | |
Llama-3-IT-8B-32k | 0.1016 | 3.1673 | 0.2286 | The Llama 3 Herd of Models | |