CRF-based Single-stage Acoustic Modeling with CTC Topology

Zhijian Ou, Hongyu Xiang

Abstract

In this paper, we develop conditional random field (CRF) based single-stage (SS) acoustic modeling with a connectionist temporal classification (CTC) inspired state topology, called CTC-CRF for short. CTC-CRF is conceptually simple: it implements a CRF layer, with the special state topology, on top of the features generated by the bottom neural network. Like SS-LF-MMI (single-stage lattice-free maximum mutual information), CTC-CRFs can be trained from scratch (flat-start), eliminating GMM-HMM pre-training and tree-building. Evaluation experiments are conducted on the WSJ, Switchboard and Librispeech datasets. In a head-to-head comparison, the CTC-CRF model using simple bidirectional LSTMs consistently outperforms the strong SS-LF-MMI baseline across all three benchmark datasets, for both mono-phone and mono-char units. Additionally, CTC-CRFs avoid some ad-hoc operations in SS-LF-MMI.
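To make the idea of a CRF layer over a CTC-style topology more concrete, the following is a minimal brute-force sketch, not the authors' implementation. It assumes the usual CTC mapping (merge repeated symbols, then drop blanks) and a path potential that is the sum of per-frame network scores plus an optional label-sequence score. The function names, the blank symbol "<b>" and the `label_score` hook are all hypothetical, and the exact form of the potential is an assumption that the abstract does not state. Enumerating every path is only feasible for toy inputs.

```python
import math
from itertools import groupby, product


def ctc_collapse(path, blank="<b>"):
    """Map a frame-level path to a label sequence: merge repeats, then drop blanks."""
    merged = [symbol for symbol, _ in groupby(path)]
    return [symbol for symbol in merged if symbol != blank]


def sequence_posterior(frame_scores, labels, target, blank="<b>",
                       label_score=lambda seq: 0.0):
    """Brute-force p(target | x) for a toy, globally normalized model over paths.

    frame_scores[t][s] is the (assumed) network score of symbol s at frame t.
    label_score is a hypothetical stand-in for a score on the collapsed labels.
    """
    symbols = list(labels) + [blank]
    numerator = 0.0
    denominator = 0.0
    for path in product(symbols, repeat=len(frame_scores)):
        collapsed = tuple(ctc_collapse(path, blank))
        potential = sum(frame_scores[t][s] for t, s in enumerate(path))
        potential += label_score(collapsed)
        weight = math.exp(potential)
        denominator += weight  # normalizer sums over every path
        if collapsed == tuple(target):
            numerator += weight  # only paths that collapse to the target
    return numerator / denominator
```

For example, with two frames, a single label "a" and all scores equal to zero, three of the four possible paths collapse to ("a",), so `sequence_posterior` returns 0.75 for that target. With `label_score` fixed at zero and normalized per-frame scores, this toy model reduces to plain CTC; a non-trivial `label_score` is what would make the normalization genuinely sequence-level.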

Benchmarks

Benchmark | Methodology | Word Error Rate (WER)
speech-recognition-on-librispeech-test-clean | CTC-CRF 4gram-LM | 4.09
speech-recognition-on-librispeech-test-other | CTC-CRF 4gram-LM | 10.65
speech-recognition-on-wsj-dev93 | Convolutional Speech Recognition | 6.23
speech-recognition-on-wsj-eval92 | CTC-CRF 4gram-LM | 3.79
speech-recognition-on-wsj-eval93 | CTC-CRF 4gram-LM | 6.23
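
For readers unfamiliar with the metric, WER is conventionally the word-level edit distance (substitutions, deletions and insertions) between the recognizer output and the reference transcript, divided by the number of reference words and usually reported as a percentage. The sketch below illustrates that standard definition; it is not the scoring tool used to produce the numbers above, and the function name is hypothetical.

```python
def word_error_rate(reference, hypothesis):
    """Return WER in percent between two whitespace-tokenized transcripts."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # previous[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1       # reference word missing from hypothesis
            insertion = current[j - 1] + 1   # extra word in hypothesis
            current.append(min(substitution, deletion, insertion))
        previous = current

    return 100.0 * previous[-1] / len(ref)
```

As a usage example, `word_error_rate("the cat sat on mat", "the cat sit on the mat")` counts one substitution and one insertion against five reference words, giving 40.0.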