Video Captioning On Msvd 1
Evaluation metrics
BLEU-4
CIDEr
METEOR
ROUGE-L
Evaluation results
Performance of each model on this benchmark
| Model | BLEU-4 | CIDEr | METEOR | ROUGE-L | Paper Title | Repository |
| MaMMUT | - | 195.6 | - | - | MaMMUT: A Simple Architecture for Joint Learning for MultiModal Tasks | |
| VLAB | 79.3 | 179.8 | 51.2 | 87.9 | VLAB: Enhancing Video Language Pre-training by Feature Adapting and Blending | - |
| VALOR | 80.7 | 178.5 | 51.0 | 87.9 | VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset | |
| COSA | 76.5 | 178.5 | - | - | COSA: Concatenated Sample Pretrained Vision-Language Foundation Model | |
| mPLUG-2 | 70.5 | 165.8 | 48.4 | 85.3 | mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video | |
| HowToCaption | 70.4 | 154.2 | 46.4 | 83.2 | HowToCaption: Prompting LLMs to Transform Video Annotations at Scale | |
| HiTeA | 71.0 | 146.9 | 45.3 | 81.4 | HiTeA: Hierarchical Temporal-Aware Video-Language Pre-training | - |
| Vid2Seq | - | 146.2 | 45.3 | - | Vid2Seq: Large-Scale Pretraining of a Visual Language Model for Dense Video Captioning | |
| VIOLETv2 | - | 139.2 | - | - | An Empirical Study of End-to-End Video-Language Transformers with Masked Visual Modeling | |
| RTQ | 66.9 | 123.4 | - | 82.2 | RTQ: Rethinking Video-language Understanding Based on Image-text Model | |
| CoCap (ViT/L14) | 60.1 | 121.5 | 41.4 | 78.2 | Accurate and Fast Compressed Video Captioning | |
| VASTA (Vatex-backbone) | 59.2 | 119.7 | 40.65 | 76.7 | Diverse Video Captioning by Adaptive Spatio-temporal Attention | |
| IcoCap (ViT-B/16) | 59.1 | 110.3 | 39.5 | 76.5 | IcoCap: Improving Video Captioning by Compounding Images | - |
| SEM-POS | 60.1 | 108.3 | 38.5 | 76.0 | SEM-POS: Grammatically and Semantically Correct Video Captioning | - |
| VASTA (Kinetics-backbone) | 56.1 | 106.4 | 39.1 | 74.5 | Diverse Video Captioning by Adaptive Spatio-temporal Attention | |
| IcoCap (ViT-B/32) | 56.3 | 103.8 | 38.9 | 75.0 | IcoCap: Improving Video Captioning by Compounding Images | - |