Audio Classification on VGGSound
Metrics
Top 1 Accuracy
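For readers unfamiliar with the metric, here is a minimal sketch of how top-1 accuracy is commonly computed: the share of samples whose highest-scoring predicted class matches the ground-truth label. The function name `top1_accuracy` and the toy scores are hypothetical, and expressing the result as a percentage is an assumption made to match how the values on this page appear to be reported. It is not taken from any of the listed papers.

```python
def top1_accuracy(scores, labels):
    """Return the percentage of samples whose top-scoring class equals the label.

    scores: list of per-class score lists, one list per sample (hypothetical input format)
    labels: list of integer class indices, one per sample
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not labels:
        raise ValueError("at least one sample is required")

    correct = 0
    for row, label in zip(scores, labels):
        # Index of the highest score is the model's top-1 prediction.
        predicted = max(range(len(row)), key=row.__getitem__)
        if predicted == label:
            correct += 1
    return 100.0 * correct / len(labels)


# Toy example: 4 samples, 3 classes (made-up numbers).
scores = [
    [0.1, 0.7, 0.2],
    [0.6, 0.3, 0.1],
    [0.2, 0.2, 0.6],
    [0.5, 0.4, 0.1],
]
labels = [1, 0, 2, 1]
print(top1_accuracy(scores, labels))  # 3 of 4 correct
```

Evaluation protocols can differ between papers (for example, in how clip-level predictions are aggregated), so scores computed this way are only comparable when the same protocol is used.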
Results
Performance results of various models on this benchmark
| Model Name | Top 1 Accuracy | Paper Title | Repository |
| --- | --- | --- | --- |
| ONE-PEACE (Audio-Visual) | 68.2 | ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities | |
| Mirasol3B | 69.8 | Mirasol3B: A Multimodal Autoregressive model for time-aligned and contextual modalities | - |
| MAST (Audio Only) | 57.0 | Multiscale Audio Spectrogram Transformer for Efficient Audio Classification | - |
| Audiovisual Masked Autoencoder (Audio-only, Single) | 57.2 | Audiovisual Masked Autoencoders | |
| CAV-MAE (Audio-Visual) | 65.9 | Contrastive Audio-Visual Masked Autoencoder | |
| AVT (Audio-Visual) | 63.9 | AVT: Audio-Video Transformer for Multimodal Action Recognition | - |
| PlayItBackX3 | 53.7 | Play It Back: Iterative Attention for Audio Recognition | |
| Audiovisual Masked Autoencoder (Audiovisual, Single) | 65.0 | Audiovisual Masked Autoencoders | |
| MBT (AV) | - | Attention Bottlenecks for Multimodal Fusion | |
| AVT (V) | 53.2 | AVT: Audio-Video Transformer for Multimodal Action Recognition | - |
| MBT (A) | 52.3 | Attention Bottlenecks for Multimodal Fusion | |
| CAV-MAE (Audio-Only) | 59.5 | Contrastive Audio-Visual Masked Autoencoder | |
| MMT (Audio-Visual) | 66.2 | Multiscale Multimodal Transformer for Multimodal Action Recognition | - |
| MAViL | 67.1 | - | - |
| MMT (Video) | 56.1 | Multiscale Multimodal Transformer for Multimodal Action Recognition | - |
| ONE-PEACE (Audio-Only) | 59.6 | ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities | |
| UAVM (Audio + Video) | 65.8 | UAVM: Towards Unifying Audio and Visual Models | |
| UAVM (Audio Only) | 56.5 | UAVM: Towards Unifying Audio and Visual Models | |
| UAVM (Video Only) | 49.9 | UAVM: Towards Unifying Audio and Visual Models | |
| EquiAV | 67.1 | EquiAV: Leveraging Equivariance for Audio-Visual Contrastive Learning | |