Zero-Shot Video Question Answer on IntentQA
Evaluation Metric
Accuracy
Evaluation Results
Performance of each model on this benchmark
| Model Name | Accuracy | Paper Title | Repository |
|---|---|---|---|
| IG-VLM | 65.3 | An Image Grid Can Be Worth a Video: Zero-shot Video Question Answering Using a VLM | |
| VideoTree (GPT4) | 66.9 | VideoTree: Adaptive Tree-based Video Representation for LLM Reasoning on Long Videos | |
| VidCtx (7B) | 67.1 | VidCtx: Context-aware Video Question Answering with Image Models | - |
| LLoVi (GPT-4) | 64.0 | A Simple LLM Framework for Long-Range Video Question-Answering | |
| LangRepo (12B) | 59.1 | Language Repository for Long Video Understanding | |
| SeViLA (4B) | 60.9 | Self-Chained Image-Language Model for Video Localization and Question Answering | |
| LVNet | 71.1 | Too Many Frames, Not All Useful: Efficient Strategies for Long-Form Video QA | |
| ENTER | 71.5 | ENTER: Event Based Interpretable Reasoning for VideoQA | - |
| LLoVi (7B) | 53.6 | A Simple LLM Framework for Long-Range Video Question-Answering | |
| Mistral (7B) | 50.4 | Mistral 7B | |
| TS-LLaVA-34B | 67.9 | TS-LLaVA: Constructing Visual Tokens through Thumbnail-and-Sampling for Training-Free Video Large Language Models | |
| SlowFast-LLaVA-34B | 60.1 | SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models | |
| Random | 20.0 | - | - |