Question Answering on TruthfulQA
Evaluation Metric
EM
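The page does not say how EM (exact match) is computed for these entries. As a rough illustration only, the sketch below assumes the common SQuAD-style convention: lowercase the text, strip punctuation and English articles, collapse whitespace, then score 1 if the prediction equals any reference answer. The function names `normalize_answer`, `exact_match`, and `em_score` are hypothetical and are not taken from any of the listed papers.

```python
import re
import string
from typing import Sequence


def normalize_answer(text: str) -> str:
    """Lowercase, strip punctuation, drop English articles, collapse whitespace.

    Assumption: SQuAD-style normalization; the leaderboard does not specify one.
    """
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, references: Sequence[str]) -> float:
    """Return 1.0 if the normalized prediction equals any normalized reference."""
    pred = normalize_answer(prediction)
    return float(any(pred == normalize_answer(ref) for ref in references))


def em_score(predictions: Sequence[str],
             references: Sequence[Sequence[str]]) -> float:
    """Average exact match over a dataset, reported on a 0-100 scale."""
    if not predictions:
        return 0.0
    scores = [exact_match(p, refs) for p, refs in zip(predictions, references)]
    return 100.0 * sum(scores) / len(scores)
```

For instance, `em_score(["Paris", "blue"], [["paris"], ["red"]])` scores one of two predictions as correct. Scores on the 0-100 scale are comparable to the figures listed on this page only if the same normalization was used, which the page does not confirm.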
Evaluation Results
Performance of each model on this benchmark
| Model | EM | Paper Title |
| --- | --- | --- |
| CoA | 67.3 | Chain-of-Action: Faithful and Multimodal Question Answering through Large Language Models |
| Gopher 280B (zero-shot, QA prompts) | - | Scaling Language Models: Methods, Analysis & Insights from Training Gopher |
| LLaMA 65B | - | LLaMA: Open and Efficient Foundation Language Models |
| GPT-2 1.5B | - | TruthfulQA: Measuring How Models Mimic Human Falsehoods |
| Shakti-LLM (2.5B) | - | SHAKTI: A 2.5 Billion Parameter Small Language Model Optimized for Edge AI and Low-Resource Environments |
| LLaMA-2-Chat-13B + Representation Control (Contrast Vector) | - | Representation Engineering: A Top-Down Approach to AI Transparency |
| GAL 6.7B | - | Galactica: A Large Language Model for Science |
| Vicuna 7B + Inference Time Intervention (ITI) | - | - |
| GAL 30B | - | Galactica: A Large Language Model for Science |
| GAL 1.3B | - | Galactica: A Large Language Model for Science |
| Gopher 7.1 (zero-shot, QA prompts) | - | Scaling Language Models: Methods, Analysis & Insights from Training Gopher |
| CoA w/o actions | 63.3 | Chain-of-Action: Faithful and Multimodal Question Answering through Large Language Models |
| ToT | 66.6 | Tree of Thoughts: Deliberate Problem Solving with Large Language Models |
| Gopher 7.1B (zero-shot, Our Prompt + Choices) | - | Scaling Language Models: Methods, Analysis & Insights from Training Gopher |
| LLaMa-2-7B-Chat + TruthX | - | TruthX: Alleviating Hallucinations by Editing Large Language Models in Truthful Space |
| GAL 120B | - | Galactica: A Large Language Model for Science |
| LLaMA 7B | - | LLaMA: Open and Efficient Foundation Language Models |
| UnifiedQA 3B | - | TruthfulQA: Measuring How Models Mimic Human Falsehoods |
| Gopher 1.4 (zero-shot, QA prompts) | - | Scaling Language Models: Methods, Analysis & Insights from Training Gopher |
| GAL 125M | - | Galactica: A Large Language Model for Science |