HyperAI超神经

Question Answering On Natural Questions

Evaluation Metric

EM
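
EM (exact match) is usually reported as the percentage of questions whose predicted answer matches at least one reference answer after light normalization. The sketch below illustrates that idea. It assumes the common SQuAD-style normalization (lowercasing, removing punctuation and English articles, collapsing whitespace). The function names are hypothetical, and the systems listed on this page may use different normalization rules.

```python
import re
import string


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, references: list[str]) -> bool:
    """True if the normalized prediction equals any normalized reference."""
    pred = normalize_answer(prediction)
    return any(pred == normalize_answer(ref) for ref in references)


def em_score(predictions: list[str], references_per_question: list[list[str]]) -> float:
    """Percentage of questions answered with an exact match (assumed convention)."""
    hits = sum(
        exact_match(pred, refs)
        for pred, refs in zip(predictions, references_per_question)
    )
    return 100.0 * hits / len(predictions)
```

For example, `em_score(["Paris", "1990"], [["paris"], ["1989"]])` counts one match out of two questions.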

Evaluation Results

Performance of each model on this benchmark

| Model Name | EM | Paper Title | Repository |
| --- | --- | --- | --- |
| RankRAG-llama3-70b (Zero-Shot, DPR) | 50.0 | RankRAG: Unifying Context Ranking with Retrieval-Augmented Generation in LLMs | - |
| PaLM 2-S (one-shot) | 25.3 | PaLM 2 Technical Report | |
| LLaMA 7B (Contriever) | 26.07 | - | - |
| Search-o1 | 34 | Search-o1: Agentic Search-Enhanced Large Reasoning Models | |
| Atlas (few-shot, k=64, Wiki-dec-2021+CC index) | 42.4 | Atlas: Few-shot Learning with Retrieval Augmented Language Models | |
| LLaMA 2 70B (one-shot) | 33.0 | Llama 2: Open Foundation and Fine-Tuned Chat Models | |
| Mistral 7B (5-shot) | 28.8 | Mistral 7B | |
| ChatQA-1.5-llama3-70b (Zero-Shot, KILT) | 47.0 | ChatQA: Surpassing GPT-4 on Conversational QA and RAG | - |
| LLaMA 65B (few-shot, k=5) | 35.0 | LLaMA: Open and Efficient Foundation Language Models | |
| GLaM 62B/64E (One-Shot) | 26.3 | GLaM: Efficient Scaling of Language Models with Mixture-of-Experts | - |
| Atlas (full, Wiki-dec-2018 index) | 64.0 | Atlas: Few-shot Learning with Retrieval Augmented Language Models | |
| code-davinci-002 175B + REPLUG (few-shot) | 44.7 | REPLUG: Retrieval-Augmented Black-Box Language Models | |
| RankRAG-llama3-8b (Zero-Shot, DPR) | 46.1 | RankRAG: Unifying Context Ranking with Retrieval-Augmented Generation in LLMs | - |
| ReAtt | 54.7 | Retrieval as Attention: End-to-end Learning of Retrieval and Reading within a Single Transformer | |
| LLaMA 65B (few-shot, k=64) | 39.9 | LLaMA: Open and Efficient Foundation Language Models | |
| RankRAG-llama3-70b (Zero-Shot, KILT) | 54.2 | RankRAG: Unifying Context Ranking with Retrieval-Augmented Generation in LLMs | - |
| DPR | 41.5 | Dense Passage Retrieval for Open-Domain Question Answering | |
| FiD-KD (full) | 54.7 | Leveraging Passage Retrieval with Generative Models for Open Domain Question Answering | |
| Gopher (few-shot, k=64) | 28.2 | Scaling Language Models: Methods, Analysis & Insights from Training Gopher | |
| Atlas (full, Wiki-dec-2021+CC index) | 60.4 | Atlas: Few-shot Learning with Retrieval Augmented Language Models | |