HyperAI
Video Question Answering On Situated
Metrics
Average Accuracy
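The page names the metric but does not say how the average is taken. A common convention for multi-category QA benchmarks is to compute accuracy separately for each question category and then take the unweighted (macro) mean. The sketch below assumes that convention. The function name `average_accuracy`, the category names, and the counts are hypothetical toy values, not data from this benchmark.

```python
# Hypothetical sketch: this page does not define "Average Accuracy".
# Assumption: it is the unweighted mean of per-category accuracies.
from typing import Dict, Tuple


def average_accuracy(per_category: Dict[str, Tuple[int, int]]) -> float:
    """Return the macro-averaged accuracy in percent.

    per_category maps a question category to (correct, total).
    """
    if not per_category:
        raise ValueError("need at least one category")
    accuracies = []
    for name, (correct, total) in per_category.items():
        if total <= 0:
            raise ValueError(f"category {name!r} has no questions")
        accuracies.append(100.0 * correct / total)
    # Each category counts equally, regardless of how many questions it has.
    return sum(accuracies) / len(accuracies)


# Toy counts, not from any benchmark.
toy = {"category_a": (50, 100), "category_b": (30, 40)}
print(round(average_accuracy(toy), 2))  # 62.5
```

With these toy counts the per-category accuracies are 50% and 75%, so the macro average is 62.5%. Pooling all 140 questions instead would give 80/140 ≈ 57.14%, which is why it matters which averaging a leaderboard uses.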
Results
Performance results of various models on this benchmark
Model Name | Average Accuracy (%) | Paper Title | Repository
--- | --- | --- | ---
MIST | 51.13 | MIST: Multi-modal Iterative Spatial-Temporal Transformer for Long-form Video Question Answering | -
TraveLER (0-shot) | 44.9 | TraveLER: A Modular Multi-LMM Agent Framework for Video Question-Answering | -
SHG-VQA (trained from scratch) | 39.47 | Learning Situation Hyper-Graphs for Video Question Answering | -
Flamingo-9B (4-shot) | 42.8 | Flamingo: a Visual Language Model for Few-Shot Learning | -
SeViLA | 64.9 | Self-Chained Image-Language Model for Video Localization and Question Answering | -
All-in-one | 47.5 | All in One: Exploring Unified Video-Language Pre-training | -
GF(sup) | 53.94 | Glance and Focus: Memory Prompting for Multi-Event Video Question Answering | -
VLAP (4 frames) | 67.1 | ViLA: Efficient Video-Language Alignment for Video Question Answering | -
SeViLA (0-shot) | 44.6 | Self-Chained Image-Language Model for Video Localization and Question Answering | -
Flamingo-80B (0-shot) | 39.7 | Flamingo: a Visual Language Model for Few-Shot Learning | -
LLaMA-VQA | 65.4 | Large Language Models are Temporal and Causal Reasoners for Video Question Answering | -
InternVideo | 58.7 | InternVideo: General Video Foundation Models via Generative and Discriminative Learning | -
Flamingo-9B (0-shot) | 41.8 | Flamingo: a Visual Language Model for Few-Shot Learning | -
Temp[ATP] | 48.37 | Revisiting the "Video" in Video-Language Understanding | -
AnyMAL-70B (0-shot) | 48.2 | AnyMAL: An Efficient and Scalable Any-Modality Augmented Language Model | -
Flamingo-80B (4-shot) | 42.4 | Flamingo: a Visual Language Model for Few-Shot Learning | -
GF(uns) | 53.86 | Glance and Focus: Memory Prompting for Multi-Event Video Question Answering | -
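The 17 entries mix evaluation regimes (the page's own labels include "0-shot", "4-shot", "trained from scratch", and unlabeled entries), so a single ranking compares unlike settings. The sketch below, using the scores copied from the results table, ranks all entries by Average Accuracy and then filters to the zero-shot ones. The helper `setting_of` is hypothetical: it simply reads the trailing parenthesised label from the model name, and an entry without a label is not thereby known to be fine-tuned, since the page does not say.

```python
# Scores copied from the results table on this page (Average Accuracy, %).
results = [
    ("MIST", 51.13),
    ("TraveLER (0-shot)", 44.9),
    ("SHG-VQA (trained from scratch)", 39.47),
    ("Flamingo-9B (4-shot)", 42.8),
    ("SeViLA", 64.9),
    ("All-in-one", 47.5),
    ("GF(sup)", 53.94),
    ("VLAP (4 frames)", 67.1),
    ("SeViLA (0-shot)", 44.6),
    ("Flamingo-80B (0-shot)", 39.7),
    ("LLaMA-VQA", 65.4),
    ("InternVideo", 58.7),
    ("Flamingo-9B (0-shot)", 41.8),
    ("Temp[ATP]", 48.37),
    ("AnyMAL-70B (0-shot)", 48.2),
    ("Flamingo-80B (4-shot)", 42.4),
    ("GF(uns)", 53.86),
]


def setting_of(name: str) -> str:
    """Hypothetical helper: return the trailing '(...)' label, or '' if none."""
    if name.endswith(")") and "(" in name:
        return name[name.rindex("(") + 1 : -1]
    return ""


# Highest accuracy first.
ranked = sorted(results, key=lambda row: row[1], reverse=True)
for rank, (name, score) in enumerate(ranked, start=1):
    print(f"{rank:2d}. {name:<32} {score:6.2f}")

# Keep only entries explicitly labelled as zero-shot.
zero_shot = [row for row in ranked if setting_of(row[0]) == "0-shot"]
print([name for name, _ in zero_shot])
```

Filtering by the label before comparing keeps, for example, the zero-shot Flamingo and SeViLA scores apart from their few-shot or unlabeled counterparts.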