Zhijian Ou, Hongyu Xiang

Abstract
In this paper, we develop conditional random field (CRF) based single-stage (SS) acoustic modeling with a connectionist temporal classification (CTC) inspired state topology, called CTC-CRF for short. CTC-CRF is conceptually simple: it implements a CRF layer on top of the features generated by the bottom neural network, using this special state topology. Like SS-LF-MMI (single-stage lattice-free maximum mutual information), CTC-CRFs can be trained from scratch (flat-start), eliminating GMM-HMM pre-training and tree-building. Evaluation experiments are conducted on the WSJ, Switchboard and Librispeech datasets. In a head-to-head comparison, the CTC-CRF model using simple bidirectional LSTMs consistently outperforms the strong SS-LF-MMI across all three benchmark datasets, with both mono-phone and mono-char units. Additionally, CTC-CRFs avoid some ad-hoc operations in SS-LF-MMI.
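The CTC-inspired state topology mentioned in the abstract builds on the standard CTC mapping from frame-level paths to label sequences: consecutive repeats are merged, then blanks are dropped. The following is a minimal sketch of that mapping only, not the authors' implementation; the names `BLANK`, `ctc_collapse`, and `paths_for` are hypothetical, and the brute-force enumeration is only practical for toy inputs.

```python
from itertools import groupby, product

BLANK = "<b>"  # hypothetical blank symbol, chosen for illustration


def ctc_collapse(path):
    """Map a frame-level path to a label sequence.

    Consecutive repeated symbols are merged first, then blanks are removed.
    """
    merged = [symbol for symbol, _ in groupby(path)]
    return [symbol for symbol in merged if symbol != BLANK]


def paths_for(labels, num_frames, alphabet):
    """Enumerate every path of length num_frames that collapses to labels.

    Brute force over all (len(alphabet) + 1) ** num_frames paths, so this is
    only meant for tiny examples.
    """
    symbols = list(alphabet) + [BLANK]
    target = list(labels)
    return [
        path
        for path in product(symbols, repeat=num_frames)
        if ctc_collapse(path) == target
    ]
```

For example, `ctc_collapse(["a", "a", BLANK, "a", "b", "b"])` yields `["a", "a", "b"]`: the blank between the two runs of `a` is what keeps them as separate labels.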
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| speech-recognition-on-librispeech-test-clean | CTC-CRF 4gram-LM | Word Error Rate (WER): 4.09% |
| speech-recognition-on-librispeech-test-other | CTC-CRF 4gram-LM | Word Error Rate (WER): 10.65% |
| speech-recognition-on-wsj-dev93 | Convolutional Speech Recognition | Word Error Rate (WER): 6.23% |
| speech-recognition-on-wsj-eval92 | CTC-CRF 4gram-LM | Word Error Rate (WER): 3.79% |
| speech-recognition-on-wsj-eval93 | CTC-CRF 4gram-LM | Word Error Rate (WER): 6.23% |
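The table reports word error rate (WER), conventionally the word-level edit distance between hypothesis and reference divided by the number of reference words, expressed as a percentage. The sketch below shows one way to compute it for a single utterance, assuming whitespace tokenization; the function name `word_error_rate` is hypothetical, and this is not the scoring tool used to produce the numbers above.

```python
def word_error_rate(reference, hypothesis):
    """Return WER in percent for one reference/hypothesis pair.

    WER = (substitutions + deletions + insertions) / reference word count,
    computed with a standard Levenshtein dynamic program over words.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return 100.0 * prev[-1] / len(ref)
```

Corpus-level WER is usually obtained by summing the edit counts and the reference word counts over all utterances before dividing, rather than by averaging per-utterance values.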