Improving Mandarin Speech Recognition with Block-augmented Transformer

Xiaoming Ren, Huifeng Zhu, Liuwei Wei, Minghui Wu, Jie Hao

Abstract

Recently, the Convolution-augmented Transformer (Conformer) has shown promising results in Automatic Speech Recognition (ASR), outperforming the previously best published Transformer Transducer. In this work, we argue that the output information of each block in the encoder and decoder is not completely inclusive; in other words, the outputs of different blocks may be complementary. We study how to take advantage of this complementary information in a parameter-efficient way, expecting that it leads to more robust performance. We therefore propose the Block-augmented Transformer for speech recognition, named Blockformer. We implement two block ensemble methods: the base Weighted Sum of the Blocks Output (Base-WSBO), and the Squeeze-and-Excitation module applied to the Weighted Sum of the Blocks Output (SE-WSBO). Experiments show that the Blockformer significantly outperforms state-of-the-art Conformer-based models on AISHELL-1: our model achieves a CER of 4.29% without a language model and 4.05% with an external language model on the test set.
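The abstract describes the two block ensemble methods only at a high level. Below is a minimal PyTorch sketch of how they might look; the class names, the softmax normalization in Base-WSBO, and the SE reduction ratio are assumptions for illustration, not the authors' exact implementation (see the linked repositories for that).

```python
# Hypothetical sketch of the two block-ensemble modules described in the
# abstract. Details such as the softmax over weights and the SE reduction
# ratio are assumptions, not taken from the paper's code.
import torch
import torch.nn as nn


class BaseWSBO(nn.Module):
    """Base Weighted Sum of the Blocks Output: learnable scalar weights."""

    def __init__(self, num_blocks: int):
        super().__init__()
        # One learnable scalar per encoder/decoder block output.
        self.weights = nn.Parameter(torch.zeros(num_blocks))

    def forward(self, block_outputs: list[torch.Tensor]) -> torch.Tensor:
        # block_outputs: list of N tensors, each of shape (batch, time, dim)
        stacked = torch.stack(block_outputs, dim=0)        # (N, B, T, D)
        w = torch.softmax(self.weights, dim=0)             # (N,)
        return torch.einsum("n,nbtd->btd", w, stacked)


class SEWSBO(nn.Module):
    """Squeeze-and-Excitation over block outputs to produce the sum weights."""

    def __init__(self, num_blocks: int, reduction: int = 4):
        super().__init__()
        hidden = max(num_blocks // reduction, 1)
        self.fc = nn.Sequential(
            nn.Linear(num_blocks, hidden),
            nn.ReLU(inplace=True),
            nn.Linear(hidden, num_blocks),
            nn.Sigmoid(),
        )

    def forward(self, block_outputs: list[torch.Tensor]) -> torch.Tensor:
        stacked = torch.stack(block_outputs, dim=1)        # (B, N, T, D)
        # Squeeze: global average pool over time and feature dims -> (B, N)
        squeezed = stacked.mean(dim=(2, 3))
        # Excitation: per-block gating weights in (0, 1)
        w = self.fc(squeezed)                              # (B, N)
        return torch.einsum("bn,bntd->btd", w, stacked)
```

Both modules keep the ensemble parameter-efficient: Base-WSBO adds only N scalars for N blocks, and SE-WSBO adds two small fully connected layers whose sizes depend on N rather than on the model dimension.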

Code Repositories

mininglamp-technology/asr-blockformer (official, PyTorch)
LeonWlw/asr_blockformer (official, PyTorch)

Benchmarks

Benchmark: Speech Recognition on AISHELL-1
Methodology: SE-WSBO with LM
Params (M): 46
Word Error Rate (WER): 4.1
