Audio Classification on VGGSound
Evaluation Metric
Top 1 Accuracy
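For readers unfamiliar with the metric, the following is a minimal sketch of how top-1 accuracy is commonly computed: a prediction counts as correct when the highest-scoring class matches the ground-truth label. The function name `top1_accuracy`, the toy scores, and the choice to return a percentage are assumptions made for illustration; they are not taken from any of the listed papers or from the benchmark's evaluation code.

```python
from typing import Sequence


def top1_accuracy(scores: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Return the percentage of samples whose highest-scoring class equals the label.

    `scores` holds one row of per-class scores per sample (hypothetical layout);
    `labels` holds the ground-truth class index for each sample.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if len(labels) == 0:
        raise ValueError("at least one sample is required")

    correct = 0
    for row, label in zip(scores, labels):
        # Index of the largest score is the model's top-1 prediction.
        predicted = max(range(len(row)), key=row.__getitem__)
        if predicted == label:
            correct += 1
    return 100.0 * correct / len(labels)


# Made-up toy data: 4 clips, 3 classes.
scores = [
    [0.1, 0.7, 0.2],  # predicts class 1
    [0.8, 0.1, 0.1],  # predicts class 0
    [0.2, 0.3, 0.5],  # predicts class 2
    [0.4, 0.5, 0.1],  # predicts class 1
]
labels = [1, 0, 1, 1]

print(top1_accuracy(scores, labels))  # 75.0 (3 of 4 correct)
```

Under this convention, a score of 69.8 would mean that the top prediction was correct on 69.8% of the evaluated clips, assuming the listed values are percentages.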
Evaluation Results
Performance of each model on this benchmark
| Model Name | Top 1 Accuracy | Paper Title | Repository |
|---|---|---|---|
| Mirasol3B | 69.8 | Mirasol3B: A Multimodal Autoregressive model for time-aligned and contextual modalities | - |
| ONE-PEACE (Audio-Visual) | 68.2 | ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities | |
| MAViL | 67.1 | - | - |
| EquiAV | 67.1 | EquiAV: Leveraging Equivariance for Audio-Visual Contrastive Learning | |
| MMT (Audio-Visual) | 66.2 | Multiscale Multimodal Transformer for Multimodal Action Recognition | - |
| CAV-MAE (Audio-Visual) | 65.9 | Contrastive Audio-Visual Masked Autoencoder | |
| UAVM (Audio + Video) | 65.8 | UAVM: Towards Unifying Audio and Visual Models | |
| Audiovisual Masked Autoencoder (Audiovisual, Single) | 65.0 | Audiovisual Masked Autoencoders | |
| AVT (Audio-Visual) | 63.9 | AVT: Audio-Video Transformer for Multimodal Action Recognition | - |
| ONE-PEACE (Audio-Only) | 59.6 | ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities | |
| CAV-MAE (Audio-Only) | 59.5 | Contrastive Audio-Visual Masked Autoencoder | |
| Audiovisual Masked Autoencoder (Audio-only, Single) | 57.2 | Audiovisual Masked Autoencoders | |
| MAST (Audio Only) | 57.0 | Multiscale Audio Spectrogram Transformer for Efficient Audio Classification | - |
| UAVM (Audio Only) | 56.5 | UAVM: Towards Unifying Audio and Visual Models | |
| MMT (Video) | 56.1 | Multiscale Multimodal Transformer for Multimodal Action Recognition | - |
| PlayItBackX3 | 53.7 | Play It Back: Iterative Attention for Audio Recognition | |
| AVT (V) | 53.2 | AVT: Audio-Video Transformer for Multimodal Action Recognition | - |
| MBT (A) | 52.3 | Attention Bottlenecks for Multimodal Fusion | |
| MBT (V) | 51.2 | Attention Bottlenecks for Multimodal Fusion | |
| UAVM (Video Only) | 49.9 | UAVM: Towards Unifying Audio and Visual Models | |