3 months ago

Music Source Separation Based on a Lightweight Deep Learning Framework (DTTNet: Dual-Path TFC-TDF UNet)

Junyu Chen, Susmitha Vekkot, Pancham Shukla

Abstract

Music source separation (MSS) aims to extract 'vocals', 'drums', 'bass' and 'other' tracks from a piece of mixed music. While deep learning methods have shown impressive results, there is a trend toward larger models. In this paper, we introduce a novel and lightweight architecture called DTTNet, which is based on the Dual-Path Module and the Time-Frequency Convolutions Time-Distributed Fully-connected UNet (TFC-TDF UNet). DTTNet achieves 10.12 dB chunk-level SDR (cSDR) on 'vocals', compared with the 10.01 dB reported for Bandsplit RNN (BSRNN), while using 86.7% fewer parameters. We also assess pattern-specific performance and model generalization for intricate audio patterns.
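To make the dB figures in the abstract concrete, the following is a minimal sketch of a chunk-wise signal-to-distortion ratio, assuming the plain definition SDR = 10 log10(||s||^2 / ||s - ŝ||^2) evaluated on one-second chunks and summarized by the median. The function names `sdr_db` and `chunked_median_sdr`, the 44.1 kHz default, and the `eps` stabilizer are illustrative assumptions. Published cSDR scores are normally produced with a BSS Eval toolkit, which additionally permits a short distortion filter, so this sketch will not reproduce the reported numbers exactly.

```python
import numpy as np


def sdr_db(reference, estimate, eps=1e-10):
    """Plain SDR in dB: reference energy over residual-error energy."""
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    signal_energy = np.sum(reference ** 2)
    error_energy = np.sum((reference - estimate) ** 2)
    # eps (an assumption of this sketch) avoids division by zero on silence.
    return 10.0 * np.log10((signal_energy + eps) / (error_energy + eps))


def chunked_median_sdr(reference, estimate, sample_rate=44100, chunk_seconds=1.0):
    """Median of per-chunk SDR values over non-overlapping chunks."""
    chunk = int(sample_rate * chunk_seconds)
    n_chunks = len(reference) // chunk
    if n_chunks == 0:
        raise ValueError("signal is shorter than one chunk")
    scores = [
        sdr_db(reference[i * chunk:(i + 1) * chunk],
               estimate[i * chunk:(i + 1) * chunk])
        for i in range(n_chunks)
    ]
    return float(np.median(scores))


# Toy usage: a 3-second 440 Hz tone whose estimate is 10% too loud.
t = np.arange(3 * 44100) / 44100.0
reference = np.sin(2.0 * np.pi * 440.0 * t)
estimate = 1.1 * reference
score = chunked_median_sdr(reference, estimate)
```

In the toy example the error is one tenth of the reference in amplitude, so the energy ratio is 100 and every chunk scores about 20 dB.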

Code Repositories

FaceOnLive/Spleeter-Android-iOS

junyuchen-cjy/dttnet-pytorch

Official

pytorch

Mentioned in GitHub

Benchmarks

Benchmark	Methodology	SDR avg (dB)	SDR bass (dB)	SDR drums (dB)	SDR others (dB)	SDR vocals (dB)
music-source-separation-on-musdb18-hq	Dual-Path TFC-TDF UNet (DTTNet)	8.15	7.55	7.82	7.02	10.21
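The average SDR in the benchmark row above appears to be the unweighted mean of the four per-source scores. A small check, assuming that reading (the dictionary name is illustrative):

```python
# Per-source SDR values (dB) copied from the benchmark row.
sdr_db_by_source = {"vocals": 10.21, "drums": 7.82, "bass": 7.55, "others": 7.02}

# Unweighted mean across the four stems.
average_sdr = sum(sdr_db_by_source.values()) / len(sdr_db_by_source)
print(f"{average_sdr:.2f}")
```

The result rounds to the 8.15 dB listed as SDR (avg).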
