Natural Questions on TheoremQA
Evaluation metric: Accuracy
Evaluation results
Performance of each model on this benchmark.
| Model | Accuracy | Paper Title |
| --- | --- | --- |
| GPT-4 (PoT) | 52.4 | TheoremQA: A Theorem-driven Question Answering dataset |
| GPT-4 (CoT) | 43.8 | TheoremQA: A Theorem-driven Question Answering dataset |
| GPT-3.5-turbo (PoT) | 35.6 | TheoremQA: A Theorem-driven Question Answering dataset |
| DART-Math-DSMath-7B-Uniform (0-shot CoT, w/o code) | 32.5 | DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving |
| DART-Math-DSMath-7B-Prop2Diff (0-shot CoT, w/o code) | 32.2 | DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving |
| PaLM-2-unicorn (CoT) | 31.8 | TheoremQA: A Theorem-driven Question Answering dataset |
| GPT-3.5-turbo (CoT) | 30.2 | TheoremQA: A Theorem-driven Question Answering dataset |
| DART-Math-Llama3-70B-Prop2Diff (0-shot CoT, w/o code) | 28.2 | DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving |
| DART-Math-Llama3-70B-Uniform (0-shot CoT, w/o code) | 27.4 | DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving |
| Claude-v1 (PoT) | 25.9 | TheoremQA: A Theorem-driven Question Answering dataset |
| Claude-v1 (CoT) | 24.9 | TheoremQA: A Theorem-driven Question Answering dataset |
| code-davinci-002 | 23.9 | TheoremQA: A Theorem-driven Question Answering dataset |
| Claude-instant (CoT) | 23.6 | TheoremQA: A Theorem-driven Question Answering dataset |
| text-davinci-003 | 22.8 | TheoremQA: A Theorem-driven Question Answering dataset |
| PaLM-2-bison (CoT) | 21.0 | TheoremQA: A Theorem-driven Question Answering dataset |
| DART-Math-Llama3-8B-Prop2Diff (0-shot CoT, w/o code) | 19.4 | DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving |
| DART-Math-Mistral-7B-Prop2Diff (0-shot CoT, w/o code) | 17.0 | DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving |
| DART-Math-Mistral-7B-Uniform (0-shot CoT, w/o code) | 16.4 | DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving |
| DART-Math-Llama3-8B-Uniform (0-shot CoT, w/o code) | 15.4 | DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving |