Dynamic Convolutional Neural Networks as Efficient Pre-trained Audio Models
Schmid Florian ; Koutini Khaled ; Widmer Gerhard

Abstract
The introduction of large-scale audio datasets, such as AudioSet, paved the way for Transformers to conquer the audio domain and replace CNNs as the state-of-the-art neural network architecture for many tasks. Audio Spectrogram Transformers are excellent at exploiting large datasets, creating powerful pre-trained models that surpass CNNs when fine-tuned on downstream tasks. However, current popular Audio Spectrogram Transformers are demanding in terms of computational complexity compared to CNNs. Recently, we have shown that, by employing Transformer-to-CNN Knowledge Distillation, efficient CNNs can catch up with and even outperform Transformers on large datasets. In this work, we extend this line of research and increase the capacity of efficient CNNs by introducing dynamic CNN blocks, constructed of dynamic non-linearities, dynamic convolutions and attention mechanisms. We show that these dynamic CNNs outperform traditional efficient CNNs, in terms of the performance-complexity trade-off and parameter efficiency, at the task of audio tagging on the large-scale AudioSet. Our experiments further indicate that the introduced dynamic CNNs achieve better performance on downstream tasks and scale up well, attaining Transformer performance and even outperforming them on AudioSet and several downstream tasks.
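The abstract names dynamic convolutions as one ingredient of the dynamic CNN blocks but does not describe their construction. As a rough illustration of the general idea only (not the authors' implementation), the sketch below mixes several candidate kernels with input-dependent attention weights and then applies the mixed kernel. The function names, the 1D setting, and the linear attention head (`attn_scale`, `attn_bias`) are all assumptions made for this example.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dynamic_conv1d(x, kernels, attn_scale, attn_bias):
    """Generic dynamic convolution sketch (hypothetical helper, 1D, single channel).

    x          : input signal, list of floats
    kernels    : K candidate kernels of equal length
    attn_scale : K scale factors of an assumed linear attention head
    attn_bias  : K biases of the same assumed attention head
    """
    # Context: global average pooling over the input signal.
    context = sum(x) / len(x)
    # One logit per candidate kernel, computed from the pooled context.
    logits = [s * context + b for s, b in zip(attn_scale, attn_bias)]
    alpha = softmax(logits)
    # Aggregate the K kernels into a single input-dependent kernel.
    size = len(kernels[0])
    mixed = [sum(a * k[i] for a, k in zip(alpha, kernels)) for i in range(size)]
    # "Valid" cross-correlation of x with the mixed kernel.
    out = [
        sum(mixed[i] * x[t + i] for i in range(size))
        for t in range(len(x) - size + 1)
    ]
    return out, alpha
```

Because the attention weights depend on the input, two different signals are filtered by two different effective kernels, while the set of stored candidate kernels stays fixed. This is the sense in which such a layer adds capacity without a proportional increase in per-sample computation.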
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| audio-classification-on-audioset | DyMN-L (Audio-Only, Single) | Test mAP: 0.490 |
| audio-classification-on-esc-50 | DyMN-L | Accuracy (5-fold): 97.4; Pre-training dataset: AudioSet; Top-1 Accuracy: 97.4 |
| audio-classification-on-fsd50k | MN | mAP: 65.6 |
| audio-classification-on-fsd50k | DyMN-L | mAP: 65.5 |
| audio-tagging-on-audioset | DyMN-L (Audio-Only, Single) | mean average precision: 0.490 |
| instrument-recognition-on-openmic-2018 | DyMN-L | mean average precision: 0.855 |