HyperAI超神经
Video Question Answering on NExT-QA
Evaluation Metric
Accuracy
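For readers unfamiliar with the metric, here is a minimal sketch of how accuracy is typically computed for multiple-choice video QA, assuming each question has one correct option and the model outputs a single option index. The function name `accuracy` and the toy data are hypothetical and are not taken from any of the listed papers or from the benchmark's official evaluation script.

```python
from typing import Sequence


def accuracy(predictions: Sequence[int], answers: Sequence[int]) -> float:
    """Return the percentage of questions whose predicted option index
    matches the ground-truth option index."""
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(p == a for p, a in zip(predictions, answers))
    return 100.0 * correct / len(answers)


# Hypothetical toy data: five questions, predicted vs. ground-truth option index.
preds = [0, 3, 2, 4, 1]
golds = [0, 3, 1, 4, 1]
print(f"{accuracy(preds, golds):.1f}")  # 4 of 5 correct -> 80.0
```

The scores in the results are reported on this same 0 to 100 scale.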
Results
Performance of each model on this benchmark
| Model | Accuracy | Paper Title | Repository |
| --- | --- | --- | --- |
| LLaVA-Video | 83.2 | Video Instruction Tuning With Synthetic Data | - |
| LLaVA-NeXT-Interleave(14B) | 79.1 | LLaVA-NeXT-Interleave: Tackling Multi-image, Video, and 3D in Large Multimodal Models | |
| ATM | 58.3 | ATM: Action Temporality Modeling for Video Question Answering | - |
| VideoChat2_HD_mistral | 79.5 | MVBench: A Comprehensive Multi-modal Video Understanding Benchmark | |
| ViperGPT(0-shot) | 60.0 | ViperGPT: Visual Inference via Python Execution for Reasoning | |
| LongVILA(7B) | 80.7 | LongVILA: Scaling Long-Context Visual Language Models for Long Videos | |
| VGT(PT) | 56.9 | Video Graph Transformer for Video Question Answering | |
| TCR | 73.5 | Text-Conditioned Resampler For Long Form Video Understanding | - |
| ViLA (3B) | 75.6 | ViLA: Efficient Video-Language Alignment for Video Question Answering | |
| HiTeA | 63.1 | HiTeA: Hierarchical Temporal-Aware Video-Language Pre-training | - |
| HQGA | 51.4 | Video as Conditional Graph Hierarchy for Multi-Granular Question Answering | |
| RTQ | 63.2 | RTQ: Rethinking Video-language Understanding Based on Image-text Model | |
| GF | 58.83 | Glance and Focus: Memory Prompting for Multi-Event Video Question Answering | |
| LSTP | 72.1 | Efficient Temporal Extrapolation of Multimodal Large Language Models with Temporal Grounding Bridge | |
| LLaMA-VQA (33B) | 75.5 | Large Language Models are Temporal and Causal Reasoners for Video Question Answering | |
| CoVGT(PT) | 60.7 | Contrastive Video Question Answering via Video Graph Transformer | - |
| SeViT | 60.6 | Semi-Parametric Video-Grounded Text Generation | - |
| VideoChat2_mistral | 78.6 | MVBench: A Comprehensive Multi-modal Video Understanding Benchmark | |
| Vamos | 77.3 | Vamos: Versatile Action Models for Video Understanding | - |
| LinVT-Qwen2-VL (7B) | 85.5 | LinVT: Empower Your Image-level Large Language Model to Understand Videos | |