Hierarchic Temporal Convolutional Network With Cross-Domain Encoder for Music Source Separation

Hao Huang, Liang He, Wenzhong Yang, Yadong Chen, Ying Hu

Abstract

Time-domain methods, which model the raw waveform directly, have recently shown great potential for audio source separation. In this paper, we propose a model that combines complex spectrogram-domain features and time-domain features through a cross-domain encoder (CDE) and adopts a hierarchic temporal convolutional network (HTCN) to separate multiple music sources. The CDE is designed to let the network encode the interaction between the time-domain and complex spectrogram-domain features, while the HTCN lets it learn long-range temporal dependencies effectively. We also design a feature calibration unit (FCU) that is applied within the HTCN, and we adopt a multi-stage training strategy. An ablation study demonstrates the effectiveness of each component of the model. Experiments on the MUSDB18 dataset indicate that the proposed CDE-HTCN model outperforms leading methods: compared with the state-of-the-art method DEMUCS, it improves the average SDR score by 0.61 dB. Notably, the improvement for the bass source is larger, at 0.91 dB.
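The results above are reported as SDR (signal-to-distortion ratio) in dB. As a rough illustration only, the sketch below computes a simplified SDR, 10·log10(‖s‖² / ‖s − ŝ‖²), for one reference source and its estimate. This is an assumption made for illustration: it is not necessarily the evaluation protocol used in the paper, and benchmark toolkits for MUSDB18 typically apply a more elaborate windowed procedure. The function name `simple_sdr_db` and the `eps` parameter are hypothetical.

```python
import math


def simple_sdr_db(reference, estimate, eps=1e-12):
    """Simplified SDR in dB: 10 * log10(||s||^2 / ||s - s_hat||^2).

    Illustrative only; not the paper's evaluation code.
    `eps` (hypothetical) guards against division by zero and log(0).
    """
    if len(reference) != len(estimate):
        raise ValueError("reference and estimate must have the same length")
    signal_energy = sum(r * r for r in reference)
    error_energy = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    return 10.0 * math.log10((signal_energy + eps) / (error_energy + eps))


# Toy example: the estimate is the reference scaled by 0.9.
reference = [1.0, -1.0, 1.0, -1.0]
estimate = [0.9, -0.9, 0.9, -0.9]
print(f"simplified SDR: {simple_sdr_db(reference, estimate):.2f} dB")
```

A higher value means the estimate is closer to the reference, so a gain of 0.61 dB corresponds to a lower residual error energy relative to the source energy under this simplified definition.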

Benchmarks

| Benchmark | Methodology | SDR (avg) | SDR (bass) | SDR (drums) | SDR (other) | SDR (vocals) |
|---|---|---|---|---|---|---|
| music-source-separation-on-musdb18 | CDE-HTCN | 6.89 | 7.92 | 7.33 | 4.92 | 7.37 |
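As a quick consistency check, the reported average SDR can be reproduced from the four per-source scores. The sketch below assumes the average is a plain arithmetic mean rounded half-up to two decimals, which is an assumption and not something the page states. It also derives the baseline average implied by the abstract's 0.61 dB margin; that figure is inferred here, not reported on this page. The names `per_source_sdr` and `mean_sdr` are hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP

# Per-source SDR values (dB) as listed in the benchmark entry.
per_source_sdr = {
    "bass": Decimal("7.92"),
    "drums": Decimal("7.33"),
    "other": Decimal("4.92"),
    "vocals": Decimal("7.37"),
}


def mean_sdr(scores):
    """Arithmetic mean of the scores, rounded half-up to 0.01 dB (assumed rounding)."""
    total = sum(scores.values(), Decimal("0"))
    mean = total / len(scores)
    return mean.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


average = mean_sdr(per_source_sdr)

# Baseline average implied by the 0.61 dB improvement stated in the abstract.
implied_baseline_average = average - Decimal("0.61")

print(f"average SDR: {average} dB")
print(f"implied baseline average SDR: {implied_baseline_average} dB")
```

Decimal arithmetic is used instead of floats so that the half-way case in the mean is rounded deterministically.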
