Many-Speakers Single Channel Speech Separation with Optimal Permutation Training

Shaked Dovrat, Eliya Nachmani, Lior Wolf

Abstract

Single channel speech separation has experienced great progress in the last few years. However, training neural speech separation for a large number of speakers (e.g., more than 10 speakers) is out of reach for current methods, which rely on the Permutation Invariant Training (PIT) loss. In this work, we present a permutation invariant training that employs the Hungarian algorithm in order to train with an $O(C^3)$ time complexity, where $C$ is the number of speakers, compared to the $O(C!)$ complexity of PIT-based methods. Furthermore, we present a modified architecture that can handle the increased number of speakers. Our approach separates up to $20$ speakers and improves the previous results for large $C$ by a wide margin.
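The core idea above can be sketched in a few lines: score every (estimate, target) pair, then let the Hungarian algorithm pick the best one-to-one matching in $O(C^3)$ instead of enumerating all $C!$ permutations. This is an illustrative sketch, not the authors' implementation; the function names and the use of negative SI-SDR as the pairwise cost are assumptions, and `scipy.optimize.linear_sum_assignment` stands in for the Hungarian solver.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def si_sdr(est, ref, eps=1e-8):
    """Scale-invariant SDR (dB) between two 1-D signals."""
    ref = ref - ref.mean()
    est = est - est.mean()
    s_target = np.dot(est, ref) / (np.dot(ref, ref) + eps) * ref
    e_noise = est - s_target
    return 10 * np.log10((np.dot(s_target, s_target) + eps)
                         / (np.dot(e_noise, e_noise) + eps))

def hungarian_pit_loss(estimates, targets):
    """Optimal-permutation separation loss via the Hungarian algorithm.

    estimates, targets: arrays of shape (C, T) -- C speakers, T samples.
    Returns the mean negative SI-SDR under the optimal assignment and
    the permutation (column indices) that achieves it.
    """
    C = estimates.shape[0]
    # Pairwise cost matrix: cost[i, j] = -SI-SDR(estimate i, target j).
    cost = np.empty((C, C))
    for i in range(C):
        for j in range(C):
            cost[i, j] = -si_sdr(estimates[i], targets[j])
    # Hungarian-style optimal assignment in O(C^3), vs O(C!) for
    # exhaustively checking every permutation as in standard PIT.
    row, col = linear_sum_assignment(cost)
    return cost[row, col].mean(), col
```

For training a network, the same pairwise costs would be computed on the model outputs and the resulting loss backpropagated; only the assignment step itself is non-differentiable, so gradients flow through the selected cost entries.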

Code Repositories

shakeddovrat/librimix
Official
Mentioned in GitHub

Benchmarks

Benchmark                        | Methodology   | Metrics
speech-separation-on-libri5mix   | Hungarian PIT | SI-SDRi: 12.72
speech-separation-on-libri10mix  | Hungarian PIT | SI-SDRi: 7.78
speech-separation-on-libri15mix  | Hungarian PIT | SI-SDRi: 5.66
speech-separation-on-libri20mix  | Hungarian PIT | SI-SDRi: 4.26
speech-separation-on-wsj0-5mix   | Hungarian PIT | SI-SDRi: 13.22