HyperAI超神经

Long Context Understanding on Ada-LEval (TSort)

Evaluation Metrics

Scores are reported separately for each context length:

2k
4k
8k
16k
32k
64k
128k

Evaluation Results

Performance of each model on this benchmark

Comparison Table
Model                                         2k    4k    8k    16k   32k   64k   128k
gpt-4-technical-report-1                      15.5  16.5  8.5   5.5   2.0   4.0   2.0
Model 2                                       4.0   4.5   4.5   5.5   -     -     -
glm-130b-an-open-bilingual-pre-trained-model  0.9   0.2   0.7   0.9   -     -     -
glm-130b-an-open-bilingual-pre-trained-model  2.3   2.4   2.0   0.7   -     -     -
judging-llm-as-a-judge-with-mt-bench-and-1    5.3   5.0   3.1   2.5   -     -     -
judging-llm-as-a-judge-with-mt-bench-and-1    5.3   2.2   2.3   1.7   -     -     -
Model 7                                       5.0   5.0   4.5   3.0   0.0   0.0   -
judging-llm-as-a-judge-with-mt-bench-and-1    5.4   5.0   2.4   3.1   -     -     -
gpt-4-technical-report-1                      18.5  15.5  7.5   3.5   6.0   6.0   6.0
internlm2-technical-report                    5.1   3.9   5.1   4.3   -     -     -

A dash (-) means no score was reported for that context length. Repeated names are separate entries as listed on the page.
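For readers who want to compare entries programmatically, the scores can be transcribed into a small Python structure. This is a minimal sketch. The `ROWS` list and the helpers `best_per_context` and `coverage` are hypothetical names chosen for illustration, not part of any HyperAI or Ada-LEval tooling. The scores are copied by hand from the comparison table, with a dash stored as `None`. The sketch reports the highest score at each context length and how many entries report a result there. Ties go to the earlier row, because Python's `max` returns the first maximal item.

```python
# Scores transcribed by hand from the comparison table; None marks a dash (no score reported).
# Hypothetical illustration only. Duplicate names are distinct entries, so rows are identified by index.
CONTEXTS = ["2k", "4k", "8k", "16k", "32k", "64k", "128k"]

ROWS = [
    ("gpt-4-technical-report-1", [15.5, 16.5, 8.5, 5.5, 2.0, 4.0, 2.0]),
    ("Model 2", [4.0, 4.5, 4.5, 5.5, None, None, None]),
    ("glm-130b-an-open-bilingual-pre-trained-model", [0.9, 0.2, 0.7, 0.9, None, None, None]),
    ("glm-130b-an-open-bilingual-pre-trained-model", [2.3, 2.4, 2.0, 0.7, None, None, None]),
    ("judging-llm-as-a-judge-with-mt-bench-and-1", [5.3, 5.0, 3.1, 2.5, None, None, None]),
    ("judging-llm-as-a-judge-with-mt-bench-and-1", [5.3, 2.2, 2.3, 1.7, None, None, None]),
    ("Model 7", [5.0, 5.0, 4.5, 3.0, 0.0, 0.0, None]),
    ("judging-llm-as-a-judge-with-mt-bench-and-1", [5.4, 5.0, 2.4, 3.1, None, None, None]),
    ("gpt-4-technical-report-1", [18.5, 15.5, 7.5, 3.5, 6.0, 6.0, 6.0]),
    ("internlm2-technical-report", [5.1, 3.9, 5.1, 4.3, None, None, None]),
]


def best_per_context(rows):
    """Return {context: (row_index, name, score)} for the top reported score at each context length.

    max() keeps the first maximal item, so ties resolve to the earlier row.
    """
    best = {}
    for col, ctx in enumerate(CONTEXTS):
        reported = [
            (i, name, scores[col])
            for i, (name, scores) in enumerate(rows)
            if scores[col] is not None
        ]
        if reported:
            best[ctx] = max(reported, key=lambda r: r[2])
    return best


def coverage(rows):
    """Count how many entries report a score (not None) at each context length."""
    return {
        ctx: sum(scores[col] is not None for _, scores in rows)
        for col, ctx in enumerate(CONTEXTS)
    }


if __name__ == "__main__":
    for ctx, (i, name, score) in best_per_context(ROWS).items():
        print(f"{ctx:>5}: row {i + 1} ({name}) = {score}")
    print(coverage(ROWS))
```

The coverage count makes one limitation of the table explicit. Every entry reports 2k through 16k, but only three entries report 32k and 64k, and only two report 128k. Rankings at the longest context lengths therefore rest on very few results.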