KILT: a Benchmark for Knowledge Intensive Language Tasks

Abstract

Challenging problems such as open-domain question answering, fact checking, slot filling and entity linking require access to large, external knowledge sources. While some models do well on individual tasks, developing general models is difficult as each task might require computationally expensive indexing of custom knowledge sources, in addition to dedicated infrastructure. To catalyze research on models that condition on specific information in large textual resources, we present a benchmark for knowledge-intensive language tasks (KILT). All tasks in KILT are grounded in the same snapshot of Wikipedia, reducing engineering turnaround through the re-use of components, as well as accelerating research into task-agnostic memory architectures. We test both task-specific and general baselines, evaluating downstream performance in addition to the ability of the models to provide provenance. We find that a shared dense vector index coupled with a seq2seq model is a strong baseline, outperforming more tailor-made approaches for fact checking, open-domain question answering and dialogue, and yielding competitive results on entity linking and slot filling, by generating disambiguated text. KILT data and code are available at https://github.com/facebookresearch/KILT.
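
The abstract's strongest baseline pairs one dense vector index over a single Wikipedia snapshot with a seq2seq model. The sketch here illustrates only the shared-index half: the index is built once and then queried by inputs from different tasks, each getting back page ids that can serve as provenance. It is a minimal sketch under stated assumptions. Bag-of-words vectors over the corpus vocabulary stand in for a learned dense encoder (such as the DPR retriever used in KILT's baselines), and the `SharedIndex` class, its methods, and the three toy pages are hypothetical, not part of the KILT codebase.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase word tokens; punctuation is dropped."""
    return re.findall(r"\w+", text.lower())


class SharedIndex:
    """Hypothetical toy index: one knowledge source, reused by every task.

    Bag-of-words vectors stand in for a learned dense encoder; retrieval
    ranks pages by cosine similarity (inner product of unit vectors).
    """

    def __init__(self, pages: dict[str, str]):
        self.page_ids = list(pages)
        self.vocab = sorted({tok for text in pages.values() for tok in tokenize(text)})
        self.position = {tok: i for i, tok in enumerate(self.vocab)}
        # Encode every page once; all tasks share these vectors.
        self.vectors = [self.embed(pages[pid]) for pid in self.page_ids]

    def embed(self, text: str) -> list[float]:
        vec = [0.0] * len(self.vocab)
        for tok, count in Counter(tokenize(text)).items():
            if tok in self.position:  # out-of-vocabulary tokens are ignored
                vec[self.position[tok]] += count
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        return [v / norm for v in vec]

    def search(self, query: str, k: int = 5) -> list[str]:
        q = self.embed(query)
        scores = [sum(a * b for a, b in zip(q, vec)) for vec in self.vectors]
        ranked = sorted(range(len(self.page_ids)), key=lambda i: scores[i], reverse=True)
        return [self.page_ids[i] for i in ranked[:k]]


pages = {
    "Paris": "Paris is the capital and largest city of France",
    "Berlin": "Berlin is the capital and largest city of Germany",
    "Mount Everest": "Mount Everest is the highest mountain above sea level",
}
index = SharedIndex(pages)

# Inputs from two different tasks (QA and fact checking) query the same index.
qa_provenance = index.search("What is the capital of France?", k=1)
fc_provenance = index.search("Mount Everest is the highest mountain on Earth.", k=1)
print(qa_provenance, fc_provenance)  # prints ['Paris'] ['Mount Everest']
```

In the full baseline, the retrieved pages would then condition a seq2seq model, as in the RAG and BART+DPR rows of the benchmark table; that generation step is omitted here.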

Code Repositories

zouharvi/kb-shrink (PyTorch; mentioned in GitHub)
facebookresearch/KILT (official; mentioned in GitHub)
facebookresearch/editeval (mentioned in GitHub)

Benchmarks

Benchmark | Methodology | Metrics
entity-linking-on-kilt-aida-yago2 | T5-base | Accuracy: 74.05, KILT-AC: 74.05, R-Prec: 74.05, Recall@5: 74.05
entity-linking-on-kilt-wned-cweb | T5-base | Accuracy: 49.29, KILT-AC: 49.29, R-Prec: 49.29, Recall@5: 49.29
entity-linking-on-kilt-wned-wiki | T5-base | Accuracy: 47.13, KILT-AC: 47.13, R-Prec: 47.13, Recall@5: 47.13
fact-verification-on-kilt-fever | T5-base | Accuracy: 76.3, KILT-AC: 0.0, R-Prec: 0.0, Recall@5: 0.0
fact-verification-on-kilt-fever | RAG | Accuracy: 86.31, KILT-AC: 53.45, R-Prec: 61.94, Recall@5: 75.55
open-domain-dialog-on-kilt-wizard-of | T5-base | F1: 13.53, KILT-F1: 0.0, KILT-RL: 0.0, R-Prec: 0.0, ROUGE-L: 12.4, Recall@5: 0.0
open-domain-question-answering-on-kilt | T5-base | EM: 19.6, F1: 27.73, KILT-EM: 0.0, KILT-F1: 0.0, R-Prec: 0.0, Recall@5: 0.0
open-domain-question-answering-on-kilt-1 | T5-base | EM: 12.64, F1: 19.57, KILT-EM: 0.0, KILT-F1: 0.0, R-Prec: 0.0, Recall@5: 0.0
open-domain-question-answering-on-kilt-2 | T5-base | EM: 18.11, F1: 27.83, KILT-EM: 0.0, KILT-F1: 0.0, R-Prec: 0.0, Recall@5: 0.0
open-domain-question-answering-on-kilt-eli5 | T5-base | F1: 16.1, KILT-F1: 0.0, KILT-RL: 0.0, R-Prec: 0.0, ROUGE-L: 19.08, Recall@5: 0.0
question-answering-on-kilt-eli5 | BART+DPR | F1: 17.88, ROUGE-L: 17.41
question-answering-on-kilt-eli5 | T5-base | F1: 16.1, ROUGE-L: 19.08
question-answering-on-kilt-eli5 | RAG | F1: 14.51, ROUGE-L: 14.05
slot-filling-on-kilt-t-rex | T5-base | Accuracy: 43.56, F1: 50.61, KILT-AC: 0.0, KILT-F1: 0.0, R-Prec: 0.0, Recall@5: 0.0
slot-filling-on-kilt-zero-shot-re | T5-base | Accuracy: 9.02, F1: 13.52, KILT-AC: 0.0, KILT-F1: 0.0, R-Prec: 0.0, Recall@5: 0.0
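
The table mixes downstream metrics (Accuracy, EM, F1, ROUGE-L), provenance metrics (R-Prec, Recall@5), and KILT variants (KILT-AC, KILT-EM, KILT-F1, KILT-RL). In KILT, a downstream point is awarded only when the retrieved provenance is fully correct (R-Precision of 1), which is consistent with every T5-base row that reports 0.0 R-Prec also reporting 0.0 for its KILT variants. A minimal sketch of that gating follows. It assumes provenance is compared as Wikipedia page ids, that an example may list several alternative gold provenance sets (scored by the best one), and a simple per-set form of Recall@k; the function names are hypothetical, and the official evaluation scripts in the KILT repository may differ in detail.

```python
def r_precision(retrieved: list[str], gold_sets: list[set[str]]) -> float:
    """Best R-Precision over alternative gold provenance sets.

    For a gold set of size R, R-Precision is the fraction of the top-R
    retrieved page ids that belong to that set.
    """
    best = 0.0
    for gold in gold_sets:
        r = len(gold)
        if r == 0:
            continue
        hits = sum(1 for page in retrieved[:r] if page in gold)
        best = max(best, hits / r)
    return best


def recall_at_k(retrieved: list[str], gold_sets: list[set[str]], k: int = 5) -> float:
    """Assumed per-set Recall@k: share of a gold set found in the top k; best set wins."""
    best = 0.0
    top_k = set(retrieved[:k])
    for gold in gold_sets:
        if gold:
            best = max(best, len(top_k & gold) / len(gold))
    return best


def kilt_score(downstream: float, retrieved: list[str], gold_sets: list[set[str]]) -> float:
    """Keep the downstream score (Accuracy, EM, F1, ROUGE-L) only if R-Precision is 1."""
    return downstream if r_precision(retrieved, gold_sets) == 1.0 else 0.0


gold = [{"Paris"}]
print(kilt_score(1.0, ["Paris", "Berlin"], gold))  # correct top page: prints 1.0
print(kilt_score(1.0, ["Berlin", "Paris"], gold))  # wrong top page: prints 0.0
print(kilt_score(1.0, [], gold))                   # no retrieval at all: prints 0.0
```

Under this gating, a system whose output is itself a Wikipedia page title can score the same on all four metrics, which matches the pattern of the T5-base entity-linking rows.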
