Question Answering On Natural Questions

Evaluation Metrics

Evaluation Results

Performance of each model on this benchmark

Model	Score	Paper Title	Repository
Atlas (full, Wiki-dec-2018 index)	64.0	Atlas: Few-shot Learning with Retrieval Augmented Language Models
Atlas (full, Wiki-dec-2021+CC index)	60.4	Atlas: Few-shot Learning with Retrieval Augmented Language Models
DPA-RAG	59.19	Understand What LLM Needs: Dual Preference Alignment for Retrieval-Augmented Generation
FiE	58.4	FiE: Building a Global Probability Space by Leveraging Early Fusion in Encoder for Open-Domain Question Answering	-
R2-D2 (full)	55.9	R2-D2: A Modular Baseline for Open-Domain Question Answering
ReAtt	54.7	Retrieval as Attention: End-to-end Learning of Retrieval and Reading within a Single Transformer
FiD-KD (full)	54.7	Leveraging Passage Retrieval with Generative Models for Open Domain Question Answering
RankRAG-llama3-70b (Zero-Shot, KILT)	54.2	RankRAG: Unifying Context Ranking with Retrieval-Augmented Generation in LLMs	-
EMDR^2	52.5	End-to-End Training of Multi-Document Reader and Retriever for Open-Domain Question Answering
FiD (full)	51.4	Leveraging Passage Retrieval with Generative Models for Open Domain Question Answering
RankRAG-llama3-8b (Zero-Shot, KILT)	50.6	RankRAG: Unifying Context Ranking with Retrieval-Augmented Generation in LLMs	-
RankRAG-llama3-70b (Zero-Shot, DPR)	50.0	RankRAG: Unifying Context Ranking with Retrieval-Augmented Generation in LLMs	-
ChatQA-1.5-llama3-70b (Zero-Shot, KILT)	47.0	ChatQA: Surpassing GPT-4 on Conversational QA and RAG	-
RankRAG-llama3-8b (Zero-Shot, DPR)	46.1	RankRAG: Unifying Context Ranking with Retrieval-Augmented Generation in LLMs	-
code-davinci-002 175B + REPLUG LSR (few-shot)	45.5	REPLUG: Retrieval-Augmented Black-Box Language Models
RETRO + DPR (full)	45.5	Improving language models by retrieving from trillions of tokens
Atlas (few-shot, k=64, Wiki-Dec-2018 index)	45.1	Atlas: Few-shot Learning with Retrieval Augmented Language Models
code-davinci-002 175B + REPLUG (few-shot)	44.7	REPLUG: Retrieval-Augmented Black-Box Language Models
RAG	44.5	Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks
ChatQA-1.5-llama3-8b (Zero-Shot, KILT)	42.7	ChatQA: Surpassing GPT-4 on Conversational QA and RAG	-
