Common Sense Reasoning on ARC-Challenge
Evaluation metric: Accuracy
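Accuracy here is the fraction of ARC-Challenge questions for which the model selects the correct multiple-choice answer. Below is a minimal sketch of the scoring loop, assuming the Hugging Face `datasets` package and the public `allenai/ai2_arc` dataset; `score_choice` is a hypothetical stand-in for however a given model ranks the candidate answers:

```python
# Minimal sketch of ARC-Challenge accuracy scoring.
# Assumes the Hugging Face `datasets` package and the allenai/ai2_arc
# dataset; `score_choice` is a hypothetical stand-in for a model's
# per-answer score (e.g. log-likelihood of the choice given the question).
from datasets import load_dataset

def score_choice(question: str, choice: str) -> float:
    # Hypothetical: return the model's score for `choice` given
    # `question`. A trivial length heuristic keeps the sketch runnable.
    return -abs(len(choice) - len(question) / 10)

def arc_challenge_accuracy(split: str = "test") -> float:
    ds = load_dataset("allenai/ai2_arc", "ARC-Challenge", split=split)
    correct = 0
    for ex in ds:
        labels = ex["choices"]["label"]           # e.g. ["A", "B", "C", "D"]
        texts = ex["choices"]["text"]
        scores = [score_choice(ex["question"], t) for t in texts]
        pred = labels[scores.index(max(scores))]  # highest-scoring option
        correct += pred == ex["answerKey"]
    return correct / len(ds)

if __name__ == "__main__":
    print(f"Accuracy: {arc_challenge_accuracy():.3f}")
```

Likelihood-based ranking of the options is one common protocol; the papers listed below differ in prompt format and number of in-context examples, so their numbers are not strictly comparable.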
Results: the table below lists each model's accuracy on this benchmark.
| Model | Accuracy (%) | Paper Title | Repository |
|---|---|---|---|
| GPT-4 (few-shot, k=25) | 96.4 | GPT-4 Technical Report | - |
| PaLM 2 (few-shot, CoT, SC) | 95.1 | PaLM 2 Technical Report | - |
| Shivaay (4B, few-shot, k=8) | 91.04 | - | - |
| StupidLLM | 91.03 | - | - |
| Claude 2 (few-shot, k=5) | 91 | Model Card and Evaluations for Claude Models | - |
| Claude 1.3 (few-shot, k=5) | 90 | Model Card and Evaluations for Claude Models | - |
| PaLM 540B (Self Improvement, Self Consistency) | 89.8 | Large Language Models Can Self-Improve | - |
| PaLM 540B (Self Consistency) | 88.7 | Large Language Models Can Self-Improve | - |
| PaLM 540B (Self Improvement, CoT Prompting) | 88.3 | Large Language Models Can Self-Improve | - |
| PaLM 540B (Self Improvement, Standard-Prompting) | 87.2 | Large Language Models Can Self-Improve | - |
| PaLM 540B (Standard-Prompting) | 87.1 | Large Language Models Can Self-Improve | - |
| ST-MoE-32B 269B (fine-tuned) | 86.5 | ST-MoE: Designing Stable and Transferable Sparse Expert Models | - |
| Claude Instant 1.1 (few-shot, k=5) | 85.7 | Model Card and Evaluations for Claude Models | - |
| PaLM 540B (CoT Prompting) | 85.2 | Large Language Models Can Self-Improve | - |
| GPT-3.5 (few-shot, k=25) | 85.2 | GPT-4 Technical Report | - |
| LLaMA 3 8B + MoSLoRA (fine-tuned) | 81.5 | Mixture-of-Subspaces in Low-Rank Adaptation | - |
| LLaMA-3 8B + MixLoRA | 79.9 | MixLoRA: Enhancing Large Language Models Fine-Tuning with LoRA-based Mixture of Experts | - |
| LLaMA-2 13B + MixLoRA | 69.9 | MixLoRA: Enhancing Large Language Models Fine-Tuning with LoRA-based Mixture of Experts | - |
| PaLM 2-L (1-shot) | 69.2 | PaLM 2 Technical Report | - |
| GAL 120B (zero-shot) | 67.9 | Galactica: A Large Language Model for Science | - |
The table shows the top 20 of 54 leaderboard entries.
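Self-consistency, used by several PaLM 540B entries, samples multiple chain-of-thought completions and majority-votes their final answers. A minimal sketch of the voting step, with `sample_answer` as a hypothetical placeholder for one sampled completion reduced to its answer label:

```python
# Minimal sketch of self-consistency voting over sampled answers.
# `sample_answer` is a hypothetical stand-in for one temperature-sampled
# chain-of-thought completion, reduced to its final answer label.
import random
from collections import Counter

def sample_answer(question: str) -> str:
    # Hypothetical: run the model once with sampling and parse out the
    # final choice label. A random pick keeps the sketch runnable.
    return random.choice(["A", "B", "C", "D"])

def self_consistent_answer(question: str, n_samples: int = 40) -> str:
    votes = Counter(sample_answer(question) for _ in range(n_samples))
    return votes.most_common(1)[0][0]  # majority-voted label
```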