Video Question Answering On Situated
Evaluation metric: Average Accuracy

Results
Performance of each model on this benchmark.
| Model | Average Accuracy | Paper Title |
| --- | --- | --- |
| VLAP (4 frames) | 67.1 | ViLA: Efficient Video-Language Alignment for Video Question Answering |
| LLaMA-VQA | 65.4 | Large Language Models are Temporal and Causal Reasoners for Video Question Answering |
| SeViLA | 64.9 | Self-Chained Image-Language Model for Video Localization and Question Answering |
| InternVideo | 58.7 | InternVideo: General Video Foundation Models via Generative and Discriminative Learning |
| GF(sup) | 53.94 | Glance and Focus: Memory Prompting for Multi-Event Video Question Answering |
| GF(uns) | 53.86 | Glance and Focus: Memory Prompting for Multi-Event Video Question Answering |
| MIST | 51.13 | MIST: Multi-modal Iterative Spatial-Temporal Transformer for Long-form Video Question Answering |
| Temp[ATP] | 48.37 | Revisiting the "Video" in Video-Language Understanding |
| AnyMAL-70B (0-shot) | 48.2 | AnyMAL: An Efficient and Scalable Any-Modality Augmented Language Model |
| All-in-one | 47.5 | All in One: Exploring Unified Video-Language Pre-training |
| TraveLER (0-shot) | 44.9 | TraveLER: A Modular Multi-LMM Agent Framework for Video Question-Answering |
| SeViLA (0-shot) | 44.6 | Self-Chained Image-Language Model for Video Localization and Question Answering |
| Flamingo-9B (4-shot) | 42.8 | Flamingo: a Visual Language Model for Few-Shot Learning |
| Flamingo-80B (4-shot) | 42.4 | Flamingo: a Visual Language Model for Few-Shot Learning |
| Flamingo-9B (0-shot) | 41.8 | Flamingo: a Visual Language Model for Few-Shot Learning |
| Flamingo-80B (0-shot) | 39.7 | Flamingo: a Visual Language Model for Few-Shot Learning |
| SHG-VQA (trained from scratch) | 39.47 | Learning Situation Hyper-Graphs for Video Question Answering |