
Real-Time Target Sound Extraction

Bandhav Veluri, Justin Chan, Malek Itani, Tuochao Chen, Takuya Yoshioka, Shyamnath Gollakota

Abstract

We present the first neural network model to achieve real-time and streaming target sound extraction. To accomplish this, we propose Waveformer, an encoder-decoder architecture with a stack of dilated causal convolution layers as the encoder, and a transformer decoder layer as the decoder. This hybrid architecture uses dilated causal convolutions for processing large receptive fields in a computationally efficient manner while also leveraging the generalization performance of transformer-based architectures. Our evaluations show as much as 2.2-3.3 dB improvement in SI-SNRi compared to the prior models for this task while having a 1.2-4x smaller model size and a 1.5-2x lower runtime. We provide code, dataset, and audio samples: https://waveformer.cs.washington.edu/.
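The abstract credits the dilated causal convolution encoder with covering a large receptive field cheaply while still allowing streaming. The pure-Python sketch below illustrates both properties under simplifying assumptions: one channel, fixed smoothing weights instead of learned kernels, dilations that double per layer (1, 2, 4, ...), and no nonlinearities or residual connections. The class names `CausalDilatedConv1d` and `DilatedCausalStack` are hypothetical and are not taken from the Waveformer code base; this is not the paper's implementation.

```python
from typing import List, Sequence


class CausalDilatedConv1d:
    """Single-channel causal dilated 1-D convolution that can run chunk by chunk.

    Hypothetical illustration: fixed scalar weights stand in for learned
    multi-channel kernels, and there is no nonlinearity or residual path.
    """

    def __init__(self, weights: Sequence[float], dilation: int) -> None:
        self.weights = list(weights)
        self.dilation = dilation
        # Past samples this layer needs: (kernel_size - 1) * dilation.
        self.context = (len(self.weights) - 1) * dilation
        # Zero history stands in for left padding at the start of the stream.
        self.buffer: List[float] = [0.0] * self.context

    def __call__(self, chunk: Sequence[float]) -> List[float]:
        x = self.buffer + list(chunk)
        k = len(self.weights)
        out = []
        for t in range(self.context, len(x)):
            # y[t] = sum_j w[j] * x[t - (k - 1 - j) * dilation]: past and present only.
            out.append(sum(w * x[t - (k - 1 - j) * self.dilation]
                           for j, w in enumerate(self.weights)))
        # Cache the most recent `context` inputs for the next chunk.
        self.buffer = x[len(x) - self.context:]
        return out


class DilatedCausalStack:
    """Causal convolutions whose dilation doubles per layer (assumed schedule)."""

    def __init__(self, weights: Sequence[float], num_layers: int) -> None:
        self.layers = [CausalDilatedConv1d(weights, 2 ** i) for i in range(num_layers)]

    def receptive_field(self) -> int:
        # 1 + sum over layers of (kernel_size - 1) * dilation_i
        return 1 + sum(layer.context for layer in self.layers)

    def __call__(self, chunk: Sequence[float]) -> List[float]:
        y = list(chunk)
        for layer in self.layers:
            y = layer(y)
        return y


weights = [0.25, 0.5, 0.25]
signal = [float(n % 7) for n in range(64)]

# Offline: process the whole signal at once.
offline = DilatedCausalStack(weights, num_layers=4)(signal)

# Streaming: feed 8-sample chunks, carrying each layer's cached history.
streaming = DilatedCausalStack(weights, num_layers=4)
online: List[float] = []
for start in range(0, len(signal), 8):
    online += streaming(signal[start:start + 8])

print(streaming.receptive_field())  # 31
print(offline == online)  # True
```

With kernel size 3 and four layers, the receptive field is 1 + 2 * (1 + 2 + 4 + 8) = 31 samples, and it roughly doubles with each added layer while the per-layer cost stays fixed. Because each output depends only on current and past samples, the chunked run reproduces the offline output exactly, which is the property that makes streaming inference possible.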

Code Repositories

vb000/waveformer (official, PyTorch)

Benchmarks

Benchmark | Method | Metric
streaming-target-sound-extraction-on | Waveformer | SI-SNRi: 9.43 dB
target-sound-extraction-on-fsdsoundscapes | Waveformer | SI-SNRi: 9.43 dB
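Both benchmark entries report SI-SNRi, the improvement in scale-invariant signal-to-noise ratio of the extracted signal over the unprocessed mixture. The sketch below uses the standard zero-mean SI-SNR definition; it is an assumption that Waveformer's evaluation matches it in every detail (for example the `eps` value or how scores are averaged across clips), and the function names are hypothetical.

```python
import math
from typing import Sequence


def si_snr(estimate: Sequence[float], reference: Sequence[float], eps: float = 1e-8) -> float:
    """Scale-invariant signal-to-noise ratio in dB (standard zero-mean definition)."""
    if len(estimate) != len(reference):
        raise ValueError("signals must have the same length")
    # Remove the DC offset so constant shifts do not affect the score.
    est_mean = sum(estimate) / len(estimate)
    ref_mean = sum(reference) / len(reference)
    est = [e - est_mean for e in estimate]
    ref = [r - ref_mean for r in reference]
    # Project the estimate onto the reference: s_target = (<est, ref> / ||ref||^2) * ref.
    scale = sum(e * r for e, r in zip(est, ref)) / (sum(r * r for r in ref) + eps)
    target = [scale * r for r in ref]
    noise = [e - t for e, t in zip(est, target)]
    target_energy = sum(t * t for t in target)
    noise_energy = sum(n * n for n in noise)
    return 10.0 * math.log10((target_energy + eps) / (noise_energy + eps))


def si_snri(estimate: Sequence[float], mixture: Sequence[float],
            reference: Sequence[float]) -> float:
    """SI-SNR improvement of the estimate over the input mixture, in dB."""
    return si_snr(estimate, reference) - si_snr(mixture, reference)


# Toy example: two orthogonal tones; the "extracted" signal keeps 10% of the interference.
N = 100
reference = [math.sin(2 * math.pi * 3 * n / N) for n in range(N)]
interference = [math.sin(2 * math.pi * 5 * n / N) for n in range(N)]
mixture = [r + i for r, i in zip(reference, interference)]
estimate = [r + 0.1 * i for r, i in zip(reference, interference)]
print(round(si_snri(estimate, mixture, reference), 2))  # 20.0
```

The mixture starts at 0 dB SI-SNR (equal-energy target and interference), and attenuating the interference amplitude tenfold yields 20 dB, so the improvement is 20 dB. Projecting onto the reference makes the score insensitive to the overall gain of the estimate, which is why it is preferred over plain SNR for extraction models that may not reproduce the target's exact level.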
