Audio Captioning on AudioCaps
Evaluation Metrics
CIDEr
METEOR
SPICE
SPIDEr
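SPIDEr is conventionally defined as the arithmetic mean of CIDEr and SPICE, so the SPIDEr column can be cross-checked against the other two. The snippet below is a minimal sketch of that check, not part of the benchmark tooling: the function name `spider` and the `rows` dictionary are illustrative, and the values are copied from three rows of this leaderboard.

```python
def spider(cider: float, spice: float) -> float:
    """Return SPIDEr, assumed here to be the mean of CIDEr and SPICE."""
    return (cider + spice) / 2.0


# (CIDEr, SPICE, reported SPIDEr) copied from three leaderboard rows.
rows = {
    "CNext-trans": (0.8061, 0.1841, 0.4951),
    "EnCLAP-large": (0.8029, 0.1879, 0.4954),
    "EnCLAP-base": (0.7795, 0.1863, 0.4829),
}

for name, (cider, spice, reported) in rows.items():
    computed = spider(cider, spice)
    # Reported scores are rounded, so compare with a small tolerance.
    assert abs(computed - reported) < 1e-3, name
    print(f"{name}: computed {computed:.4f}, reported {reported:.4f}")
```

Rows reported to three decimals can differ from the recomputed value in the last digit, because the published CIDEr and SPICE scores were themselves rounded.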
Results
Performance of each model on this benchmark
| Model | CIDEr | METEOR | SPICE | SPIDEr | Paper Title | Repository |
| --- | --- | --- | --- | --- | --- | --- |
| SLAM-AAC | 0.841 | 0.268 | 0.194 | 0.518 | SLAM-AAC: Enhancing Audio Captioning with Paraphrasing Augmentation and CLAP-Refine through LLMs | |
| AutoCap | 0.832 | 0.253 | 0.182 | 0.507 | Taming Data and Transformers for Audio Generation | |
| EnCLAP++-large | 0.823 | 0.269 | 0.197 | 0.510 | EnCLAP++: Analyzing the EnCLAP Framework for Optimizing Automated Audio Captioning Performance | |
| LOAE | 0.816 | 0.267 | 0.193 | 0.505 | Enhancing Automated Audio Captioning via Large Language Models with Optimized Audio Encoding | |
| EnCLAP++-base | 0.815 | 0.257 | 0.188 | 0.501 | EnCLAP++: Analyzing the EnCLAP Framework for Optimizing Automated Audio Captioning Performance | |
| CNext-trans | 0.8061 | 0.2527 | 0.1841 | 0.4951 | - | - |
| EnCLAP-large | 0.8029 | 0.2554 | 0.1879 | 0.4954 | EnCLAP: Combining Neural Audio Codec and Audio-Text Joint Embedding for Automated Audio Captioning | |
| VAST | 0.781 | 0.247 | - | - | VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset | |
| EnCLAP-base | 0.7795 | 0.2473 | 0.1863 | 0.4829 | EnCLAP: Combining Neural Audio Codec and Audio-Text Joint Embedding for Automated Audio Captioning | |
| AL-MixGen + Multi-TTA | 0.769 | - | 0.181 | 0.475 | - | - |
| Rethink-ACT (AST + TF + MIL) | 0.764 | 0.242 | 0.180 | 0.472 | Rethinking Transfer and Auxiliary Learning for Improving Audio Captioning Transformer | - |
| AL-MixGen | 0.755 | - | 0.177 | 0.466 | Exploring Train and Test-Time Augmentations for Audio-Language Learning | - |
| BART + YAMNet + PANNs | 0.753 | - | 0.176 | 0.465 | Automated Audio Captioning by Fine-Tuning BART with AudioSet Tags | - |
| VALOR | 0.741 | 0.231 | - | - | VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset | |
| CNN+Transformer | 0.693 | - | 0.159 | 0.426 | Audio Captioning Transformer | |
| TopDown-AlignedAtt (1NN) | 0.593 | - | 0.144 | 0.369 | AudioCaps: Generating Captions for Audios in The Wild | - |