Image-Sentence Alignment on VALSE
Metrics
- Average Accuracy
- Average Pairwise Accuracy
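The leaderboard does not spell out how these metrics are computed. The sketch below follows the definitions in the VALSE paper as we understand them: accuracy is the fraction of caption/foil instances the model classifies correctly, and pairwise accuracy is the fraction of (caption, foil) pairs for the same image where the model scores the correct caption higher than the foil. The function names and NumPy-based signatures are illustrative assumptions, not the benchmark's official evaluation code.

```python
import numpy as np

def average_accuracy(predictions, labels):
    """Fraction of caption/foil instances classified correctly.

    predictions, labels: binary arrays where 1 marks a sentence judged
    (or annotated) as a valid caption for the image and 0 marks a foil.
    (Hypothetical helper; VALSE reports this averaged over its
    linguistic-phenomenon subsets.)
    """
    predictions = np.asarray(predictions)
    labels = np.asarray(labels)
    return float((predictions == labels).mean())

def average_pairwise_accuracy(caption_scores, foil_scores):
    """Fraction of (caption, foil) pairs where the model assigns a
    strictly higher image-sentence alignment score to the correct
    caption than to its foil for the same image.
    """
    caption_scores = np.asarray(caption_scores)
    foil_scores = np.asarray(foil_scores)
    return float((caption_scores > foil_scores).mean())
```

Note that the pairwise metric only compares relative scores within each caption/foil pair, so models without a calibrated decision threshold (e.g., CLIP or text-only language models scored by perplexity) can be evaluated on it even when plain accuracy is not reported, which matches the missing Average Accuracy entries in the table below.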
Results
Performance of various models on the VALSE benchmark:
Model Name | Average Accuracy | Average Pairwise Accuracy | Paper Title | Repository |
---|---|---|---|---|
ViLBERT 12-in-1 | 63.2 | 75.1 | VALSE: A Task-Independent Benchmark for Vision and Language Models Centered on Linguistic Phenomena | - |
LXMERT | 53.5 | 59.6 | VALSE: A Task-Independent Benchmark for Vision and Language Models Centered on Linguistic Phenomena | - |
CLIP | - | 64.0 | VALSE: A Task-Independent Benchmark for Vision and Language Models Centered on Linguistic Phenomena | - |
ViLBERT | 51.3 | 63.7 | VALSE: A Task-Independent Benchmark for Vision and Language Models Centered on Linguistic Phenomena | - |
VisualBERT | 48.8 | 46.4 | VALSE: A Task-Independent Benchmark for Vision and Language Models Centered on Linguistic Phenomena | - |
GPT-1 | - | 60.7 | VALSE: A Task-Independent Benchmark for Vision and Language Models Centered on Linguistic Phenomena | - |
GPT-2 | - | 60.1 | VALSE: A Task-Independent Benchmark for Vision and Language Models Centered on Linguistic Phenomena | - |