Visual Question Answering
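Visual Question Answering (VQA) is a computer vision task that aims to answer questions about images in natural language. Its core goal is to enable machines to understand image content and to give answers that are accurate and coherent. VQA has important applications in human-computer interaction, intelligent assistance, and content understanding, and can significantly improve the visual cognition capabilities of machines.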
A-OKVQA
ActivityNet
BLIP-2 T5
AI2D
ArtQuest
PrefixLM with CLIP and T5
AutoHallusion
GPT-4V
CLEVR
NS-VQA (1K programs)
CLEVR-Humans
MDETR
COCO
COCO Visual Question Answering (VQA) abstract images 1.0 open ended
COCO Visual Question Answering (VQA) abstract 1.0 multiple choice
COCO Visual Question Answering (VQA) real images 1.0 multiple choice
MCB 7 att.
COCO Visual Question Answering (VQA) real images 1.0 open ended
COCO Visual Question Answering (VQA) real images 2.0 open ended
HDU-USYD-UNCC
InfiMM-Eval
GPT-4V
CORE-MM
DeepForm
DocVQA
DocVQA test
Human
DocVQA val
BERT LARGE Baseline
DVQA test-familiar
PReFIL (Oracle OCR)
EgoSchema
Lyra-Pro
F-VQA
ZS-F-VQA
FigureQA - test 1
PReFIL
GQA
PEVL+
GQA test-dev
CFR
GQA test-std
ProTo
GQA Test2019
GRIT
HallusionBench
GPT-4V
IconQA
Patch-TRM
IllusionVQA
ImageNet
InfographicVQA
Gemini Ultra (pixel only)
InfoSeek
MM-Vet
MME
MSRVTT-QA
mPLUG-2
MSVD-QA
mPLUG-2
MVBench
OK-VQA
PaLI-X (Single-task FT)
OVAD benchmark
PlotQA-D1
PlotQA-D2
PMC-VQA
QLEVR
MAC
RetVQA
MI-BART
TDIUC
Accuracy
TextVQA
TextVQA test-standard
PaLI
TGIF-QA
VCR (Q-A) dev
VL-BERTLARGE
VCR (Q-A) test
VCR (Q-AR) dev
VL-BERTLARGE
VCR (Q-AR) test
GPT4RoI
VCR (QA-R) dev
VL-BERTLARGE
VCR (QA-R) test
UNITER (Large)
Video MME
Visual Genome (pairs)
CMN
Visual Genome (subjects)
Visual7W
CMN
VizWiz 2018
LXR955, No Ensemble
VizWiz 2018 Answerability
VizWiz 2020 Answerability
VizWiz 2020 VQA
VLM2-Bench
VQA-CE
RandImg
VQA-CP
CSS
VQA v1 test-dev
SAAA (ResNet)
VQA v1 test-std
SAAA (ResNet)
VQA v2 test-dev
Oscar
VQA v2 test-std
BEiT-3
VQA v2 val
BLIP-2 ViT-G FlanT5 XXL (zero-shot)
VQA-X
WebSRC
WHOOPS!
ZS-F-VQA
SAN † - hard mask