
Question Answering on TruthfulQA

Evaluation metric

EM (exact match)
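EM scores a prediction 1 if it matches a reference answer after light normalization, and 0 otherwise. As a rough illustration only (this leaderboard does not publish its scoring script), here is a minimal sketch of SQuAD-style exact-match scoring; the normalization steps and function names are assumptions:

```python
import re
import string

def normalize(text: str) -> str:
    # Lowercase, drop punctuation, remove English articles, collapse whitespace.
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(prediction: str, references: list[str]) -> int:
    # Score 1 if the prediction matches any acceptable reference answer.
    pred = normalize(prediction)
    return int(any(pred == normalize(ref) for ref in references))

print(exact_match("The Eiffel Tower.", ["Eiffel Tower"]))  # → 1
print(exact_match("London", ["Paris"]))                    # → 0
```

A dataset-level EM score is then simply the mean of these per-question 0/1 values.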

Evaluation results

Performance of the models on this benchmark

| Model Name | EM | Paper Title | Repository |
| --- | --- | --- | --- |
| CoA | 67.3 | Chain-of-Action: Faithful and Multimodal Question Answering through Large Language Models | - |
| Gopher 280B (zero-shot, QA prompts) | - | Scaling Language Models: Methods, Analysis & Insights from Training Gopher | - |
| LLaMA 65B | - | LLaMA: Open and Efficient Foundation Language Models | - |
| GPT-2 1.5B | - | TruthfulQA: Measuring How Models Mimic Human Falsehoods | - |
| Shakti-LLM (2.5B) | - | SHAKTI: A 2.5 Billion Parameter Small Language Model Optimized for Edge AI and Low-Resource Environments | - |
| LLaMA-2-Chat-13B + Representation Control (Contrast Vector) | - | Representation Engineering: A Top-Down Approach to AI Transparency | - |
| GAL 6.7B | - | Galactica: A Large Language Model for Science | - |
| Vicuna 7B + Inference Time Intervention (ITI) | - | - | - |
| GAL 30B | - | Galactica: A Large Language Model for Science | - |
| GAL 1.3B | - | Galactica: A Large Language Model for Science | - |
| Gopher 7.1B (zero-shot, QA prompts) | - | Scaling Language Models: Methods, Analysis & Insights from Training Gopher | - |
| CoA w/o actions | 63.3 | Chain-of-Action: Faithful and Multimodal Question Answering through Large Language Models | - |
| ToT | 66.6 | Tree of Thoughts: Deliberate Problem Solving with Large Language Models | - |
| Gopher 7.1B (zero-shot, Our Prompt + Choices) | - | Scaling Language Models: Methods, Analysis & Insights from Training Gopher | - |
| LLaMA-2-7B-Chat + TruthX | - | TruthX: Alleviating Hallucinations by Editing Large Language Models in Truthful Space | - |
| GAL 120B | - | Galactica: A Large Language Model for Science | - |
| LLaMA 7B | - | LLaMA: Open and Efficient Foundation Language Models | - |
| UnifiedQA 3B | - | TruthfulQA: Measuring How Models Mimic Human Falsehoods | - |
| Gopher 1.4B (zero-shot, QA prompts) | - | Scaling Language Models: Methods, Analysis & Insights from Training Gopher | - |
| GAL 125M | - | Galactica: A Large Language Model for Science | - |