WiC-TSV: An Evaluation Benchmark for Target Sense Verification of Words in Context
Anna Breit, Artem Revenko, Kiamehr Rezaee, Mohammad Taher Pilehvar, Jose Camacho-Collados

Abstract
We present WiC-TSV, a new multi-domain evaluation benchmark for Word Sense Disambiguation. More specifically, we introduce a framework for Target Sense Verification of Words in Context that is distinguished by two features: its formulation as a binary classification task, which makes it independent of external sense inventories, and its coverage of various domains. This makes the dataset highly flexible for evaluating a diverse set of models and systems both within and across domains. WiC-TSV provides three evaluation settings, which differ in the input signals provided to the model. We establish baseline performance on the dataset using state-of-the-art language models. Experimental results show that although these models perform decently on the task, a gap remains between machine and human performance, especially in out-of-domain settings. The WiC-TSV data is available at https://competitions.codalab.org/competitions/23683
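Because each instance reduces to a yes/no decision, a system only needs to output one boolean per instance, and scoring is plain accuracy. The sketch below illustrates this under stated assumptions: the `TSVInstance` record, its field names, the `[SEP]` joining of inputs, and the mapping of the three settings to "definition only", "hypernyms only", and "both" are hypothetical choices for illustration, not the official data format or a description of the authors' models.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class TSVInstance:
    # Hypothetical record layout; field names are illustrative, not the official file format.
    target: str                  # the target word
    context: str                 # sentence in which the target word occurs
    definition: str              # definition of the sense to verify
    hypernyms: Tuple[str, ...]   # hypernyms of the sense to verify
    label: bool                  # True if the word in context carries that sense


def build_input(inst: TSVInstance, setting: int) -> str:
    """Assemble a text input for one of three settings.

    Assumed mapping (illustrative): 1 = definition, 2 = hypernyms, 3 = both.
    """
    if setting not in (1, 2, 3):
        raise ValueError("setting must be 1, 2 or 3")
    parts = [inst.context, inst.target]
    if setting in (1, 3):
        parts.append(inst.definition)
    if setting in (2, 3):
        parts.append(", ".join(inst.hypernyms))
    return " [SEP] ".join(parts)


def accuracy(predictions: List[bool], gold: List[bool]) -> float:
    """Percentage of instances whose binary prediction matches the gold label."""
    if not gold or len(predictions) != len(gold):
        raise ValueError("need equally long, non-empty prediction and gold lists")
    return 100.0 * sum(p == g for p, g in zip(predictions, gold)) / len(gold)


def all_true_baseline(gold: List[bool]) -> float:
    """Predict True for every instance; the score equals the share of positive labels."""
    return accuracy([True] * len(gold), gold)


if __name__ == "__main__":
    sense = "a hand-operated device that controls a cursor"
    data = [
        TSVInstance("mouse", "She moved the mouse to click the icon.", sense, ("device",), True),
        TSVInstance("mouse", "A mouse ran under the kitchen table.", sense, ("device",), False),
    ]
    gold = [d.label for d in data]
    print(build_input(data[0], 3))
    print(accuracy([True, False], gold))  # 100.0
    print(all_true_baseline(gold))        # 50.0
```

Keeping the sense signal as explicit text, rather than a pointer into a sense inventory, mirrors the inventory-independence described in the abstract: any model that can read a context plus a definition or hypernym list can be evaluated.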
Benchmarks
All values are accuracy (%); "-" means not reported.

| Benchmark | Methodology | Task 1 all | Task 1 domain-specific | Task 1 general-purpose | Task 2 all | Task 2 domain-specific | Task 2 general-purpose | Task 3 all | Task 3 domain-specific | Task 3 general-purpose |
|---|---|---|---|---|---|---|---|---|---|---|
| entity-linking-on-wic-tsv | Bert-base | 75.3 | 77.9 | 73.3 | 71.7 | 74.7 | 68.6 | 76.6 | 80.4 | 73.5 |
| entity-linking-on-wic-tsv | Unsupervised Bert | 54.4 | 60.6 | 49.2 | 62.8 | 69.1 | 57.6 | 60.5 | 67.9 | 54.4 |
| entity-linking-on-wic-tsv | All true | 50.8 | 47.0 | 53.8 | 50.8 | 47.0 | 53.8 | 50.8 | 47.0 | 53.8 |
| entity-linking-on-wic-tsv | FastText | 53.7 | 50.6 | 56.2 | 52.7 | 47.7 | 56.8 | 53.4 | 49.0 | 57.1 |
| entity-linking-on-wic-tsv | Human | - | - | - | - | - | - | 85.3 | 89.2 | 82.1 |
| word-sense-disambiguation-on-wic-tsv | Bert-base | 75.3 | 77.9 | 73.3 | 71.7 | 74.7 | 68.6 | 76.6 | 80.4 | 73.5 |
| word-sense-disambiguation-on-wic-tsv | Human | - | - | - | - | - | - | 85.3 | 89.2 | 82.1 |
| word-sense-disambiguation-on-wic-tsv | FastText | 53.7 | 50.6 | 56.2 | 52.7 | 47.7 | 56.8 | 53.4 | 49.0 | 57.1 |
| word-sense-disambiguation-on-wic-tsv | All true | 50.8 | 47.0 | 53.8 | 50.8 | 47.0 | 53.8 | 50.8 | 47.0 | 53.8 |
| word-sense-disambiguation-on-wic-tsv | Unsupervised Bert | 54.4 | 60.6 | 49.2 | 62.8 | 69.1 | 57.6 | 60.5 | 67.9 | 54.4 |
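As a worked check on the benchmark numbers, the snippet below computes Task 3 accuracy gaps directly from the reported scores. Because the "All true" baseline labels every instance positive, its accuracy equals the share of positive instances, so that row also indicates label balance: 50.8% positive overall, 47.0% in the domain-specific split, and 53.8% in the general-purpose split. The `TASK3` dictionary and the `gap` helper are illustrative names, not part of any released tooling.

```python
from typing import Dict

# Task 3 accuracies (%) transcribed from the Benchmarks table.
# "All true" predicts the positive label for every instance.
TASK3: Dict[str, Dict[str, float]] = {
    "Human": {"all": 85.3, "domain specific": 89.2, "general purpose": 82.1},
    "Bert-base": {"all": 76.6, "domain specific": 80.4, "general purpose": 73.5},
    "All true": {"all": 50.8, "domain specific": 47.0, "general purpose": 53.8},
}


def gap(scores: Dict[str, Dict[str, float]], better: str, worse: str) -> Dict[str, float]:
    """Accuracy difference in percentage points per split, rounded to one decimal."""
    return {split: round(scores[better][split] - scores[worse][split], 1)
            for split in scores[better]}


if __name__ == "__main__":
    print(gap(TASK3, "Human", "Bert-base"))     # {'all': 8.7, 'domain specific': 8.8, 'general purpose': 8.6}
    print(gap(TASK3, "Bert-base", "All true"))  # {'all': 25.8, 'domain specific': 33.4, 'general purpose': 19.7}
```

On Task 3, Bert-base trails the human score by roughly 9 points in every split, while clearing the all-true baseline by between about 20 and 33 points.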