wav2vec: Unsupervised Pre-training for Speech Recognition

Steffen Schneider, Alexei Baevski, Ronan Collobert, Michael Auli

Abstract

We explore unsupervised pre-training for speech recognition by learning representations of raw audio. wav2vec is trained on large amounts of unlabeled audio data, and the resulting representations are then used to improve acoustic model training. We pre-train a simple multi-layer convolutional neural network optimized via a noise contrastive binary classification task. Our experiments on WSJ reduce the WER of a strong character-based log-mel filterbank baseline by up to 36% when only a few hours of transcribed data are available. Our approach achieves 2.43% WER on the nov92 test set. This outperforms Deep Speech 2, the best reported character-based system in the literature, while using two orders of magnitude less labeled training data.
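
The pre-training objective described in the abstract can be pictured with a short sketch. The following is a minimal, illustrative PyTorch version, not the paper's implementation: the class and function names (Wav2VecSketch, contrastive_loss), the layer counts, kernel sizes, number of prediction steps, and negative-sampling scheme are all simplifying assumptions. It shows the core idea: an encoder maps raw audio to latents z, a causal context network aggregates them into c, and a step-specific projection scores true future latents against sampled distractors in a binary classification.

```python
# Minimal sketch of wav2vec-style contrastive pre-training (assumed PyTorch
# implementation; architecture details are simplified relative to the paper).
import torch
import torch.nn as nn
import torch.nn.functional as F

class Wav2VecSketch(nn.Module):
    def __init__(self, dim=512, steps=12):
        super().__init__()
        # Encoder f: raw waveform -> latents z (the paper uses 5 conv layers;
        # 2 here for brevity).
        self.encoder = nn.Sequential(
            nn.Conv1d(1, dim, kernel_size=10, stride=5), nn.ReLU(),
            nn.Conv1d(dim, dim, kernel_size=8, stride=4), nn.ReLU(),
        )
        # Context network g: z -> c, aggregating past timesteps only.
        self.context = nn.Conv1d(dim, dim, kernel_size=3, padding=2)
        # One affine projection h_k per future step k being predicted.
        self.proj = nn.ModuleList(nn.Linear(dim, dim) for _ in range(steps))

    def forward(self, wav):                      # wav: (batch, 1, samples)
        z = self.encoder(wav)                    # (batch, dim, T)
        c = self.context(z)[..., : z.size(-1)]   # trim right padding -> causal
        return z, F.relu(c)

def contrastive_loss(z, c, proj, n_negatives=10):
    """Noise contrastive binary classification: for each step k, tell the
    true future latent z_{i+k} apart from negatives sampled from the same
    sequence."""
    batch, dim, T = z.shape
    total = 0.0
    for k, h_k in enumerate(proj, start=1):
        if T - k <= 0:
            continue
        pred = h_k(c[..., :-k].transpose(1, 2))   # (batch, T-k, dim)
        target = z[..., k:].transpose(1, 2)       # true future latents
        pos = F.logsigmoid((pred * target).sum(-1)).mean()
        # Negatives: latents drawn uniformly from other positions.
        idx = torch.randint(0, T, (batch, T - k, n_negatives))
        neg = z.transpose(1, 2)[torch.arange(batch)[:, None, None], idx]
        neg_logp = F.logsigmoid(-(pred.unsqueeze(2) * neg).sum(-1)).mean()
        total = total - pos - neg_logp
    return total

# Usage: pre-train on unlabeled audio, no transcripts required.
model = Wav2VecSketch()
wav = torch.randn(4, 1, 16000)            # a batch of ~1 s clips at 16 kHz
z, c = model(wav)
loss = contrastive_loss(z, c, model.proj)
loss.backward()
```

After pre-training, the context representations c would be fed to the acoustic model in place of log-mel filterbank features; that substitution is what the abstract's WER comparison measures.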

Benchmarks

Benchmark                     Methodology   Metrics
speech-recognition-on-timit   wav2vec       Percentage error: 14.7
