Video Question Answering
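Video Question Answering (VQA) is a task that combines computer vision and natural language processing to answer user questions about a video by analyzing its content. It aims at a joint understanding of the visual and linguistic information in the video, enabling accurate and efficient information retrieval and interaction. VQA has important applications in intelligent video assistants, education platforms, and entertainment systems.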
ActivityNet-QA: VideoChat2
AGQA 2.0 balanced: GF (sup) - Faster RCNN
DramaQA
How2QA: Text + Text (no Multimodal Pretext Training)
Howto100M-QA: TimeSformer
IntentQA: VideoChat2_mistral
iVQA: FrozenBiLM
LSMDC-FiB: Clover
LSMDC-MC: VIOLETv2
MSR-VTT
MSR-VTT-MC: ATP (1<-16)
MSRVTT-MC: Singularity-temporal
MSRVTT-QA: FrozenBiLM
MSVD-QA
MVBench: Tarsier (34B)
NExT-QA: LinVT-Qwen2-VL (7B)
NExT-QA (Efficient): ViLA (3B, 4 frames)
Perception Test: Oryx (34B)
RoadTextVQA: GIT
STAR Benchmark: VLAP (4 frames)
TGIF-QA
SUTD-TrafficQA
TVBench: Tarsier-34B
TVQA: LLaMA-VQA
VideoQA: Just Ask (fine-tune)
VLEP
WildQA