5 months ago

Is Attention always needed? A Case Study on Language Identification from Speech

Atanu Mandal; Santanu Pal; Indranil Dutta; Mahidas Bhattacharya; Sudip Kumar Naskar

Abstract

Language Identification (LID) is a crucial preliminary process in the field of Automatic Speech Recognition (ASR) that involves the identification of a spoken language from audio samples. Contemporary systems that can process speech in multiple languages require users to expressly designate one or more languages prior to use. The LID task plays a significant role in scenarios where ASR systems are unable to comprehend the spoken language in multilingual settings, leading to unsuccessful speech recognition outcomes. The present study introduces a convolutional recurrent neural network (CRNN) based LID, designed to operate on the Mel-frequency Cepstral Coefficient (MFCC) features of audio samples. Furthermore, we replicate certain state-of-the-art methodologies, specifically the Convolutional Neural Network (CNN) and the Attention-based Convolutional Recurrent Neural Network (CRNN with attention), and conduct a comparative analysis with our CRNN-based approach. We conducted comprehensive evaluations on thirteen distinct Indian languages, and our model achieved over 98% classification accuracy. The LID model exhibits high performance, ranging from 97% to 100%, for languages that are linguistically similar. The proposed LID model is highly extensible to additional languages and demonstrates strong resistance to noise, achieving 91.2% accuracy in a noisy setting when applied to a European Language (EU) dataset.

Benchmarks

| Benchmark | Methodology | Metrics |
|---|---|---|
| spoken-language-identification-on-indictts | CRNN | Classification Accuracy: 0.987 |
| spoken-language-identification-on-indictts | CNN | Classification Accuracy: 0.983 |
| spoken-language-identification-on-indictts | CRNN Attention | Classification Accuracy: 0.987 |
| spoken-language-identification-on-youtube | CRNN | Accuracy: 0.967 |
| spoken-language-identification-on-youtube | CRNN Attention | Accuracy: 0.966 |
| spoken-language-identification-on-youtube | CNN | Accuracy: 0.948 |
| spoken-language-identification-on-youtube-1 | CNN | Accuracy: 0.871 |
| spoken-language-identification-on-youtube-1 | CRNN Attention | Accuracy: 0.888 |
| spoken-language-identification-on-youtube-1 | CRNN | Accuracy: 0.912 |
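
The benchmark figures above bear directly on the question in the title. As a rough sketch (not part of the paper), the following Python snippet tabulates, for each benchmark, which methodology scores highest and how much accuracy the attention variant gains or loses relative to the plain CRNN. The dictionary `REPORTED` simply copies the numbers listed above; the function names `attention_gain`, `best_methods`, and `summarize` are illustrative and do not come from the source.

```python
# Accuracies copied from the benchmark listing above (no new data).
REPORTED = {
    "spoken-language-identification-on-indictts": {
        "CRNN": 0.987, "CNN": 0.983, "CRNN Attention": 0.987,
    },
    "spoken-language-identification-on-youtube": {
        "CRNN": 0.967, "CRNN Attention": 0.966, "CNN": 0.948,
    },
    "spoken-language-identification-on-youtube-1": {
        "CNN": 0.871, "CRNN Attention": 0.888, "CRNN": 0.912,
    },
}


def attention_gain(scores):
    """Accuracy of CRNN with attention minus accuracy of plain CRNN."""
    return round(scores["CRNN Attention"] - scores["CRNN"], 3)


def best_methods(scores):
    """All methodologies tied for the highest accuracy, sorted by name."""
    top = max(scores.values())
    return sorted(name for name, acc in scores.items() if acc == top)


def summarize(reported=REPORTED):
    """Per-benchmark summary: best methodologies and the attention gain."""
    return {
        benchmark: {
            "best": best_methods(scores),
            "attention_gain": attention_gain(scores),
        }
        for benchmark, scores in reported.items()
    }


if __name__ == "__main__":
    for benchmark, row in summarize().items():
        print(benchmark, row["best"], row["attention_gain"])
```

On these figures the attention variant never beats the plain CRNN: it ties on the first benchmark and trails on the other two, with the largest gap on the benchmark where the plain CRNN reaches 0.912.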