Natural Questions on TheoremQA
Metrics
Accuracy
Results
Performance results of various models on this benchmark
| Model Name | Accuracy | Paper Title |
|---|---|---|
| GPT-4 (PoT) | 52.4 | TheoremQA: A Theorem-driven Question Answering dataset |
| GPT-4 (CoT) | 43.8 | TheoremQA: A Theorem-driven Question Answering dataset |
| GPT-3.5-turbo (PoT) | 35.6 | TheoremQA: A Theorem-driven Question Answering dataset |
| DART-Math-DSMath-7B-Uniform (0-shot CoT, w/o code) | 32.5 | DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving |
| DART-Math-DSMath-7B-Prop2Diff (0-shot CoT, w/o code) | 32.2 | DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving |
| PaLM-2-unicorn (CoT) | 31.8 | TheoremQA: A Theorem-driven Question Answering dataset |
| GPT-3.5-turbo (CoT) | 30.2 | TheoremQA: A Theorem-driven Question Answering dataset |
| DART-Math-Llama3-70B-Prop2Diff (0-shot CoT, w/o code) | 28.2 | DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving |
| DART-Math-Llama3-70B-Uniform (0-shot CoT, w/o code) | 27.4 | DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving |
| Claude-v1 (PoT) | 25.9 | TheoremQA: A Theorem-driven Question Answering dataset |
| Claude-v1 (CoT) | 24.9 | TheoremQA: A Theorem-driven Question Answering dataset |
| code-davinci-002 | 23.9 | TheoremQA: A Theorem-driven Question Answering dataset |
| Claude-instant (CoT) | 23.6 | TheoremQA: A Theorem-driven Question Answering dataset |
| text-davinci-003 | 22.8 | TheoremQA: A Theorem-driven Question Answering dataset |
| PaLM-2-bison (CoT) | 21.0 | TheoremQA: A Theorem-driven Question Answering dataset |
| DART-Math-Llama3-8B-Prop2Diff (0-shot CoT, w/o code) | 19.4 | DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving |
| DART-Math-Mistral-7B-Prop2Diff (0-shot CoT, w/o code) | 17.0 | DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving |
| DART-Math-Mistral-7B-Uniform (0-shot CoT, w/o code) | 16.4 | DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving |
| DART-Math-Llama3-8B-Uniform (0-shot CoT, w/o code) | 15.4 | DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving |