Video Question Answering On Situated
Evaluation Metric
Average Accuracy
Results
Performance of each model on this benchmark
| Model | Average Accuracy | Paper Title |
| --- | --- | --- |
| MIST | 51.13 | MIST: Multi-modal Iterative Spatial-Temporal Transformer for Long-form Video Question Answering |
| TraveLER (0-shot) | 44.9 | TraveLER: A Modular Multi-LMM Agent Framework for Video Question-Answering |
| SHG-VQA (trained from scratch) | 39.47 | Learning Situation Hyper-Graphs for Video Question Answering |
| Flamingo-9B (4-shot) | 42.8 | Flamingo: a Visual Language Model for Few-Shot Learning |
| SeViLA | 64.9 | Self-Chained Image-Language Model for Video Localization and Question Answering |
| All-in-one | 47.5 | All in One: Exploring Unified Video-Language Pre-training |
| GF(sup) | 53.94 | Glance and Focus: Memory Prompting for Multi-Event Video Question Answering |
| VLAP (4 frames) | 67.1 | ViLA: Efficient Video-Language Alignment for Video Question Answering |
| SeViLA (0-shot) | 44.6 | Self-Chained Image-Language Model for Video Localization and Question Answering |
| Flamingo-80B (0-shot) | 39.7 | Flamingo: a Visual Language Model for Few-Shot Learning |
| LLaMA-VQA | 65.4 | Large Language Models are Temporal and Causal Reasoners for Video Question Answering |
| InternVideo | 58.7 | InternVideo: General Video Foundation Models via Generative and Discriminative Learning |
| Flamingo-9B (0-shot) | 41.8 | Flamingo: a Visual Language Model for Few-Shot Learning |
| Temp[ATP] | 48.37 | Revisiting the "Video" in Video-Language Understanding |
| AnyMAL-70B (0-shot) | 48.2 | AnyMAL: An Efficient and Scalable Any-Modality Augmented Language Model |
| Flamingo-80B (4-shot) | 42.4 | Flamingo: a Visual Language Model for Few-Shot Learning |
| GF(uns) | 53.86 | Glance and Focus: Memory Prompting for Multi-Event Video Question Answering |