3 months ago

Audio Transformers

Prateek Verma, Jonathan Berger

Abstract

Over the past two decades, CNN architectures have produced compelling models of sound perception and cognition, learning hierarchical organizations of features. Analogous to successes in computer vision, audio feature classification can be optimized for a particular task of interest, over a wide variety of datasets and labels. In fact, similar architectures designed for image understanding have proven effective for acoustic scene analysis. Here we propose applying Transformer-based architectures without convolutional layers to raw audio signals. On a standard dataset, Free Sound 50K, comprising 200 categories, our model outperforms convolutional models to produce state-of-the-art results. This is significant because, unlike in natural language processing and computer vision, we do not perform unsupervised pre-training to outperform convolutional architectures. On the same training set, with respect to mean average precision benchmarks, we show a significant improvement. We further improve the performance of Transformer architectures by using techniques such as pooling, inspired by convolutional networks designed in the past few years. In addition, we show how multi-rate signal processing ideas inspired by wavelets can be applied to the Transformer embeddings to improve the results. We also show how our model learns a non-linear, non-constant-bandwidth filter bank, which shows an adaptable time-frequency front-end representation for the task of audio understanding, different from other tasks, e.g., pitch estimation.
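The abstract mentions pooling between Transformer layers but does not say which pooling operator, stride, or placement is used. The sketch below is therefore only an illustration under assumed choices: it average-pools non-overlapping groups of consecutive token embeddings, so the sequence handed to the next layer is shorter. The function name `average_pool_tokens` and the default `stride=2` are hypothetical and not taken from the paper.

```python
from typing import List

Embedding = List[float]


def average_pool_tokens(tokens: List[Embedding], stride: int = 2) -> List[Embedding]:
    """Average non-overlapping groups of `stride` consecutive token embeddings.

    Assumption: average pooling along the time axis; the paper may use a
    different operator (e.g. max pooling) or a different stride.
    """
    if stride < 1:
        raise ValueError("stride must be >= 1")
    pooled: List[Embedding] = []
    for start in range(0, len(tokens), stride):
        group = tokens[start:start + stride]  # the last group may be shorter
        dim = len(group[0])
        pooled.append([sum(vec[d] for vec in group) / len(group) for d in range(dim)])
    return pooled


# Four 2-dimensional token embeddings pooled down to two.
example = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
pooled_example = average_pool_tokens(example)
```

Applying such a step after each layer would halve the number of tokens each time, which is one way to obtain the coarser time resolution in deeper layers that convolutional networks get from their pooling stages.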

Benchmarks

Benchmark | Methodology | Metrics
audio-classification-on-fsd50k | Large 6-Layer Transformer with Pooling | mAP: 53.7
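For readers unfamiliar with the metric, mean average precision (mAP) for multi-label tagging is commonly computed by ranking clips by score for each class, averaging the precision at the rank of every positive clip, and then averaging over classes. The sketch below shows that common macro-averaged definition in plain Python. It is an assumption that the reported figure follows exactly this protocol, and the function names are illustrative only.

```python
from typing import List, Sequence


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Average precision for one class from binary labels and predicted scores."""
    # Rank clips from highest to lowest score (ties keep their input order).
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, idx in enumerate(order, start=1):
        if labels[idx] == 1:
            hits += 1
            precision_sum += hits / rank  # precision at this positive's rank
    # Convention assumed here: a class with no positive clips scores 0.0.
    return precision_sum / hits if hits else 0.0


def mean_average_precision(label_matrix: List[List[int]],
                           score_matrix: List[List[float]]) -> float:
    """Macro-average of per-class AP; rows are clips, columns are classes."""
    num_classes = len(label_matrix[0])
    per_class = []
    for c in range(num_classes):
        labels = [row[c] for row in label_matrix]
        scores = [row[c] for row in score_matrix]
        per_class.append(average_precision(labels, scores))
    return sum(per_class) / num_classes


# Toy data: 4 clips, 2 classes.
toy_labels = [[1, 0], [0, 1], [1, 0], [0, 1]]
toy_scores = [[0.9, 0.1], [0.8, 0.9], [0.7, 0.2], [0.1, 0.8]]
toy_map = mean_average_precision(toy_labels, toy_scores)
```

On the toy data, the first class has one negative ranked above a positive and the second class is ranked perfectly, so the two per-class values differ before being averaged.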
