
Audio Classification on AudioSet

Metrics

Test mAP
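The page does not define the metric, so the following is a minimal sketch of how mean average precision (mAP) is commonly computed for multi-label audio tagging: average precision is calculated per class from the ranked scores, then averaged over classes. The function names and the toy data are illustrative assumptions, not the evaluation code used by any model listed here, and the sketch assumes no tied scores.

```python
def average_precision(labels, scores):
    """Average precision for one class: mean precision at each positive's rank."""
    # Rank clips from highest to lowest score (assumes no tied scores).
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precisions = []
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precisions.append(hits / rank)
    # AP is undefined for a class with no positive clips; signal that with None.
    return sum(precisions) / hits if hits else None


def mean_average_precision(label_matrix, score_matrix):
    """Macro-average of per-class AP; rows are clips, columns are classes."""
    num_classes = len(label_matrix[0])
    aps = []
    for c in range(num_classes):
        labels = [row[c] for row in label_matrix]
        scores = [row[c] for row in score_matrix]
        ap = average_precision(labels, scores)
        if ap is not None:  # skip classes without positives
            aps.append(ap)
    return sum(aps) / len(aps)


# Hypothetical toy data: 4 clips, 2 classes.
toy_labels = [[1, 0], [0, 1], [1, 1], [0, 0]]
toy_scores = [[0.9, 0.2], [0.8, 0.7], [0.3, 0.6], [0.1, 0.4]]
print(round(mean_average_precision(toy_labels, toy_scores), 4))  # 0.9167
```

In the toy example, class 0 has AP (1/1 + 2/3) / 2 and class 1 has AP 1, which gives a mAP of 11/12. Library implementations may treat tied scores differently from this sketch.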

Results

Performance results of various models on this benchmark

Model Name | Test mAP | Paper Title
EAT | 0.486 | EAT: Self-Supervised Pre-Training with Efficient Audio Transformer
mn40_as (Single) | 0.483 | Efficient Large-scale Audio Tagging via Transformer-to-CNN Knowledge Distillation
M2D-AS/0.7 | 0.485 | Masked Modeling Duo: Towards a Universal Audio Pre-training Framework
MAViL (Audio-Visual, single) | 0.533 | -
EAT-S | 0.405 | End-to-End Audio Strikes Back: Boosting Augmentations Towards An Efficient Audio Classification Network
BEATs (Audio-only, Single) | 0.486 | BEATs: Audio Pre-Training with Acoustic Tokenizers
CAV-MAE (Audio-Visual) | 0.512 | Contrastive Audio-Visual Masked Autoencoder
L3 | 0.249 | Look, Listen and Learn
OmniVec2 | 0.558 | OmniVec2 - A Novel Transformer based Network for Large Scale Multimodal and Multitask Learning
PSLA (Single) | 0.443 | PSLA: Improving Audio Tagging with Pretraining, Sampling, Labeling, and Aggregation
EAT-M | 0.426 | End-to-End Audio Strikes Back: Boosting Augmentations Towards An Efficient Audio Classification Network
ATST-Frame | 0.480 | Self-supervised Audio Teacher-Student Transformer for Both Clip-level and Frame-level Tasks
CAV-MAE (Audio-Only) | 0.466 | Contrastive Audio-Visual Masked Autoencoder
DTF-AT (Single) | 0.486 | DTF-AT: Decoupled Time-Frequency Audio Transformer for Event Classification
Audiovisual Masked Autoencoder (Audio-only, Single) | 0.466 | Audiovisual Masked Autoencoders
AST (Ensemble) | 0.485 | AST: Audio Spectrogram Transformer
BEATs (Audio-only, Ensemble) | 0.506 | BEATs: Audio Pre-Training with Acoustic Tokenizers
EquiAV | 0.546 | EquiAV: Leveraging Equivariance for Audio-Visual Contrastive Learning
MMV | 0.309 | Self-Supervised MultiModal Versatile Networks
CAV-MAE (Visual-Only) | 0.262 | Contrastive Audio-Visual Masked Autoencoder