HyperAI
HyperAI超神经
Question Answering on TriviaQA
Evaluation Metric
EM (Exact Match)
Evaluation Results
Performance of each model on this benchmark.
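For reference, EM on open-domain QA benchmarks like TriviaQA is typically computed with SQuAD-style answer normalization (lowercasing, stripping punctuation, articles, and extra whitespace) and counts a prediction as correct if it matches any gold answer alias. The sketch below illustrates that common recipe; individual papers in the table may normalize slightly differently.

```python
import re
import string


def normalize_answer(s: str) -> str:
    """Lowercase, remove punctuation and articles, collapse whitespace (SQuAD-style)."""
    s = s.lower()
    s = "".join(ch for ch in s if ch not in string.punctuation)
    s = re.sub(r"\b(a|an|the)\b", " ", s)  # drop English articles
    return " ".join(s.split())


def exact_match(prediction: str, gold_answers: list[str]) -> float:
    """1.0 if the normalized prediction equals any normalized gold alias, else 0.0."""
    pred = normalize_answer(prediction)
    return float(any(pred == normalize_answer(g) for g in gold_answers))


# Dataset-level EM is the mean per-question score, reported as a percentage.
preds = ["The Eiffel Tower", "paris"]
golds = [["Eiffel Tower"], ["London"]]
em = 100.0 * sum(exact_match(p, g) for p, g in zip(preds, golds)) / len(preds)
# em == 50.0: the first prediction matches after normalization, the second does not.
```

Note that TriviaQA supplies multiple acceptable aliases per question, which is why `exact_match` takes a list of gold answers rather than a single string.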
| Model | EM | Paper Title | Repository |
| --- | --- | --- | --- |
| Claude 2 (few-shot, k=5) | 87.5 | Model Card and Evaluations for Claude Models | - |
| GPT-4-0613 | 87 | - | - |
| Claude 1.3 (few-shot, k=5) | 86.7 | Model Card and Evaluations for Claude Models | - |
| RankRAG-llama3-70b (Zero-Shot, KILT) | 86.5 | RankRAG: Unifying Context Ranking with Retrieval-Augmented Generation in LLMs | - |
| PaLM 2-L (one-shot) | 86.1 | PaLM 2 Technical Report | - |
| ChatQA-1.5-llama3-70b (Zero-Shot, KILT) | 85.6 | ChatQA: Surpassing GPT-4 on Conversational QA and RAG | - |
| LLaMA 2 70B (one-shot) | 85 | Llama 2: Open Foundation and Fine-Tuned Chat Models | - |
| GPT-4-0613 (Zero-shot) | 84.8 | GPT-4 Technical Report | - |
| RankRAG-llama3-8b (Zero-Shot, KILT) | 82.9 | RankRAG: Unifying Context Ranking with Retrieval-Augmented Generation in LLMs | - |
| PaLM 2-M (one-shot) | 81.7 | PaLM 2 Technical Report | - |
| PaLM-540B (One-Shot) | 81.4 | PaLM: Scaling Language Modeling with Pathways | - |
| PaLM-540B (Few-Shot) | 81.4 | PaLM: Scaling Language Modeling with Pathways | - |
| ChatQA-1.5-llama3-8B (Zero-Shot, KILT) | 81.0 | ChatQA: Surpassing GPT-4 on Conversational QA and RAG | - |
| GaC (Qwen2-72B-Instruct + Llama-3-70B-Instruct) | 79.29 | Breaking the Ceiling of the LLM Community by Treating Token Generation as a Classification for Ensembling | - |
| Claude Instant 1.1 (few-shot, k=5) | 78.9 | Model Card and Evaluations for Claude Models | - |
| code-davinci-002 175B + REPLUG LSR (Few-Shot) | 77.3 | REPLUG: Retrieval-Augmented Black-Box Language Models | - |
| PaLM-540B (Zero-Shot) | 76.9 | PaLM: Scaling Language Modeling with Pathways | - |
| code-davinci-002 175B + REPLUG (Few-Shot) | 76.8 | REPLUG: Retrieval-Augmented Black-Box Language Models | - |
| GLaM 62B/64E (Few-shot) | 75.8 | GLaM: Efficient Scaling of Language Models with Mixture-of-Experts | - |
| GLaM 62B/64E (One-shot) | 75.8 | GLaM: Efficient Scaling of Language Models with Mixture-of-Experts | - |
Top entries shown; the full leaderboard contains 56 results.