HyperAI超神经

Question Answering on PeerQA

Evaluation Metrics

AlignScore
Prometheus-2 Answer Correctness
Rouge-L

Evaluation Results

Performance of each model on this benchmark

Model | AlignScore | Prometheus-2 Answer Correctness | Rouge-L | Paper Title | Repository
GPT-3.5-Turbo-0613-16k | 0.1378 | 3.0408 | 0.2414 | Language Models are Few-Shot Learners |
Mistral-v02-7B-32k | 0.0827 | 3.4245 | 0.1922 | Mistral 7B |
Command-R-v01-34B | 0.1362 | 3.0571 | 0.2294 | - | -
GPT-4o-2024-08-06-128k | 0.1224 | 3.4612 | 0.2266 | GPT-4 Technical Report |
Llama-3-IT-8B-8k | 0.1098 | 3.1102 | 0.2295 | The Llama 3 Herd of Models |
Llama-3-IT-8B-32k | 0.1016 | 3.1673 | 0.2286 | The Llama 3 Herd of Models |