HyperAI

Question Answering on TruthfulQA

Metrics

EM
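The EM metric here is presumably exact match: the fraction of model answers that match a reference answer string after light normalization. The page does not specify the normalization rules, so the sketch below assumes a simple lowercase-and-collapse-whitespace convention (SQuAD-style evaluations additionally strip punctuation and articles); the function names are illustrative, not from any official evaluation script.

```python
def normalize(text: str) -> str:
    # Lowercase and collapse whitespace. Real EM evaluations may also
    # strip punctuation and articles; this is a minimal assumption.
    return " ".join(text.lower().split())

def exact_match(predictions, references) -> float:
    # Fraction of predictions that exactly match their reference
    # answer after normalization.
    hits = sum(normalize(p) == normalize(r)
               for p, r in zip(predictions, references))
    return hits / len(predictions)

print(exact_match(["Paris", "42"], ["paris", "41"]))  # → 0.5
```

A score such as CoA's 67.3 would then correspond to 67.3% of questions answered with an exact (normalized) string match.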

Results

Performance results of various models on this benchmark

| Model Name | EM | Paper Title | Repository |
| --- | --- | --- | --- |
| CoA | 67.3 | Chain-of-Action: Faithful and Multimodal Question Answering through Large Language Models | - |
| Gopher 280B (zero-shot, QA prompts) | - | Scaling Language Models: Methods, Analysis & Insights from Training Gopher | - |
| LLaMA 65B | - | LLaMA: Open and Efficient Foundation Language Models | - |
| GPT-2 1.5B | - | TruthfulQA: Measuring How Models Mimic Human Falsehoods | - |
| Shakti-LLM (2.5B) | - | SHAKTI: A 2.5 Billion Parameter Small Language Model Optimized for Edge AI and Low-Resource Environments | - |
| LLaMA-2-Chat-13B + Representation Control (Contrast Vector) | - | Representation Engineering: A Top-Down Approach to AI Transparency | - |
| GAL 6.7B | - | Galactica: A Large Language Model for Science | - |
| Vicuna 7B + Inference Time Intervention (ITI) | - | - | - |
| GAL 30B | - | Galactica: A Large Language Model for Science | - |
| GAL 1.3B | - | Galactica: A Large Language Model for Science | - |
| Gopher 7.1B (zero-shot, QA prompts) | - | Scaling Language Models: Methods, Analysis & Insights from Training Gopher | - |
| CoA w/o actions | 63.3 | Chain-of-Action: Faithful and Multimodal Question Answering through Large Language Models | - |
| ToT | 66.6 | Tree of Thoughts: Deliberate Problem Solving with Large Language Models | - |
| Gopher 7.1B (zero-shot, Our Prompt + Choices) | - | Scaling Language Models: Methods, Analysis & Insights from Training Gopher | - |
| LLaMA-2-7B-Chat + TruthX | - | TruthX: Alleviating Hallucinations by Editing Large Language Models in Truthful Space | - |
| GAL 120B | - | Galactica: A Large Language Model for Science | - |
| LLaMA 7B | - | LLaMA: Open and Efficient Foundation Language Models | - |
| UnifiedQA 3B | - | TruthfulQA: Measuring How Models Mimic Human Falsehoods | - |
| Gopher 1.4B (zero-shot, QA prompts) | - | Scaling Language Models: Methods, Analysis & Insights from Training Gopher | - |
| GAL 125M | - | Galactica: A Large Language Model for Science | - |