In defence of metric learning for speaker recognition

Joon Son Chung, Jaesung Huh, Seongkyu Mun, Minjae Lee, Hee Soo Heo, Soyeon Choe, Chiheon Ham, Sunghwan Jung, Bong-Jin Lee, Icksang Han

Abstract

The objective of this paper is 'open-set' speaker recognition of unseen speakers, where ideal embeddings should condense information into a compact utterance-level representation with small intra-speaker and large inter-speaker distance. A popular belief in speaker recognition is that networks trained with classification objectives outperform metric learning methods. In this paper, we present an extensive evaluation of the most popular loss functions for speaker recognition on the VoxCeleb dataset. We demonstrate that the vanilla triplet loss shows competitive performance compared to classification-based losses, and that networks trained with our proposed metric learning objective outperform state-of-the-art methods.
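
The abstract contrasts the vanilla triplet loss with the authors' proposed metric learning objective but does not state either formula. As a rough illustration only, the sketch below implements a common hinge form of the triplet loss and an angular prototypical style loss (to our understanding, the objective the full paper proposes), plus cosine scoring for open-set verification of unseen speakers. It is plain Python with no training framework. The margin, the scale w, the bias b, and the verification threshold are hypothetical placeholders, not values from the paper; in real training w and b would be learnable and the embeddings would come from a network.

```python
import math
from typing import List, Sequence

# Plain-Python sketch; every hyperparameter default below is hypothetical.
Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two embeddings (assumes non-zero vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two embeddings."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(anchor: Vector, positive: Vector, negative: Vector,
                 margin: float = 0.3) -> float:
    """Hinge triplet loss: the same-speaker pair must be closer than the
    different-speaker pair by at least `margin` (hypothetical value)."""
    d_pos = squared_distance(anchor, positive)
    d_neg = squared_distance(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


def mean_vector(vectors: Sequence[Vector]) -> List[float]:
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def angular_prototypical_loss(batch: List[List[Vector]],
                              w: float = 10.0, b: float = -5.0) -> float:
    """batch[j] holds M >= 2 utterance embeddings of speaker j.

    The last utterance of each speaker is the query; the mean of the others
    is that speaker's prototype. Logits are w * cos(query_j, prototype_k) + b,
    and the loss is softmax cross-entropy with target k == j, averaged over
    speakers. In training, w (kept positive) and b would be learnable.
    """
    if any(len(utts) < 2 for utts in batch):
        raise ValueError("each speaker needs at least 2 utterances")
    queries = [utts[-1] for utts in batch]
    prototypes = [mean_vector(utts[:-1]) for utts in batch]
    total = 0.0
    for j, query in enumerate(queries):
        logits = [w * cosine(query, proto) + b for proto in prototypes]
        top = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = top + math.log(sum(math.exp(z - top) for z in logits))
        total += log_sum_exp - logits[j]
    return total / len(batch)


def same_speaker(emb_a: Vector, emb_b: Vector, threshold: float = 0.5) -> bool:
    """Open-set verification: accept the pair if the cosine score clears a
    hypothetical threshold (tuned on held-out data in practice)."""
    return cosine(emb_a, emb_b) >= threshold
```

The triplet loss compares one positive against one negative at a time, while the prototypical form scores each query against every speaker in the batch at once. Note that b shifts all logits equally, so in this softmax form it does not change the loss value; only the scale w affects it.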

Code Repositories

coqui-ai/TTS (PyTorch, mentioned on GitHub)
shkim816/temporal_dynamic_cnn (PyTorch, mentioned on GitHub)
