Multi-Task Language Understanding on MMLU
Evaluation metric: Average (%)

Evaluation results: the table below lists each model's performance on this benchmark.
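Average (%) is the model's multiple-choice accuracy averaged over MMLU's 57 subject tests; whether a paper reports a micro-average over all questions or a macro-average over subjects varies, and the two are usually close. Below is a minimal sketch of the macro-averaged variant, assuming the Hugging Face `datasets` library and the `cais/mmlu` dataset id; `predict` is a hypothetical placeholder for the model under evaluation.

```python
# Minimal sketch: macro-averaged MMLU accuracy.
# Assumes the Hugging Face `datasets` library and the `cais/mmlu` dataset id;
# `predict` is a hypothetical stand-in for the model being evaluated.
from collections import defaultdict
from datasets import load_dataset

def mmlu_average(predict) -> float:
    """Return macro-average accuracy (%) over all MMLU subjects.

    `predict(question, choices)` should return the index (0-3) of the
    chosen answer option.
    """
    test = load_dataset("cais/mmlu", "all", split="test")
    correct, total = defaultdict(int), defaultdict(int)
    for ex in test:
        subject = ex["subject"]
        total[subject] += 1
        if predict(ex["question"], ex["choices"]) == ex["answer"]:
            correct[subject] += 1
    per_subject = [correct[s] / total[s] for s in total]
    return 100.0 * sum(per_subject) / len(per_subject)

# Example: a trivial baseline that always picks option A scores roughly 25%.
if __name__ == "__main__":
    print(f"{mmlu_average(lambda question, choices: 0):.1f}")
```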
| Model | Average (%) | Paper Title | Repository |
|---|---|---|---|
| Claude 3.5 Sonnet (5-shot) | 88.7 | Claude 3.5 Sonnet Model Card Addendum | - |
| DeepSeek-R1 (671B) | 87.5 | DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning | |
| GPT-4 o1 (300B) | 87 | GPT-4o as the Gold Standard: A Scalable and General Purpose Approach to Filter Language Model Pretraining Data | - |
| Llama 3.1 (405B) | 86.6 | Llama 3 Meets MoE: Efficient Upcycling | |
| Llama 3.1 (70B) | 86.0 | Llama 3 Meets MoE: Efficient Upcycling | |
| Gemini Ultra (5-shot) | 83.7 | - | - |
| Qwen2-72B-Instruct | 83.54 | Breaking the Ceiling of the LLM Community by Treating Token Generation as a Classification for Ensembling | |
| Claude 3 Sonnet (5-shot) | 79 | The Claude 3 Model Family: Opus, Sonnet, Haiku | - |
| Qwen1.5 72B (5-shot) | 77.5 | - | - |
| Leeroo (5-shot) | 75.9 | Routoo: Learning to Route to Large Language Models Effectively | |
| Camelidae-8×34B (5-shot) | 75.6 | Parameter-Efficient Sparsity Crafting from Dense to Mixture-of-Experts for Instruction Tuning on General Tasks | |
| Claude 3 Haiku (5-shot) | 75.2 | The Claude 3 Model Family: Opus, Sonnet, Haiku | - |
| DBRX Instruct 132B (5-shot) | 73.7 | The Llama 3 Herd of Models | |
| Llama 2 (65B) | 73.5 | Scaling Instruction-Finetuned Language Models | |
| Claude Instant 1.1 (5-shot) | 73.4 | Model Card and Evaluations for Claude Models | - |
| Llama 3.1 8B (CoT) | 73.0 | The Llama 3 Herd of Models | |
| Flan-PaLM (5-shot, finetuned) | 72.2 | Scaling Instruction-Finetuned Language Models | |
| Gemini Pro (5-shot) | 71.8 | - | - |
| Mixtral 8x7B (5-shot) | 70.6 | Mixtral of Experts | |
| Falcon 180B (5-shot) | 70.6 | The Falcon Series of Open Language Models | - |
The table above shows the top 20 of 61 leaderboard entries.
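Most entries are marked (5-shot), meaning five worked examples from the corresponding subject's dev split are prepended to each test question before the model answers; the (CoT) entry instead elicits a chain-of-thought rationale before the final answer letter. Below is a rough sketch of one common 5-shot prompt layout, again assuming the `cais/mmlu` dataset; the exact template and header wording differ between papers.

```python
# Rough sketch of a 5-shot MMLU prompt, assuming the `cais/mmlu` dev split
# (which holds five examples per subject). The template follows the common
# "Question / A-D choices / Answer:" layout; exact wording varies by paper.
from datasets import load_dataset

LETTERS = ["A", "B", "C", "D"]

def format_example(ex, include_answer=True):
    # Render one question with its lettered options and, optionally, its answer.
    lines = [ex["question"]]
    lines += [f"{LETTERS[i]}. {choice}" for i, choice in enumerate(ex["choices"])]
    answer = f" {LETTERS[ex['answer']]}" if include_answer else ""
    lines.append(f"Answer:{answer}")
    return "\n".join(lines)

def five_shot_prompt(subject, test_example):
    # Five solved dev examples from the same subject precede the test question.
    dev = load_dataset("cais/mmlu", subject, split="dev")
    header = (
        f"The following are multiple choice questions (with answers) "
        f"about {subject.replace('_', ' ')}.\n\n"
    )
    shots = "\n\n".join(format_example(ex) for ex in dev.select(range(5)))
    return header + shots + "\n\n" + format_example(test_example, include_answer=False)
```

The model's completion after the final "Answer:" is then matched against the gold option letter to score the question.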