Speech Recognition on LRS3-TED
Metrics
Word Error Rate (WER)
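As a rough illustration of the metric, WER is conventionally computed as the word-level edit distance (substitutions + deletions + insertions) between a reference transcript and a hypothesis, divided by the number of reference words. The sketch below assumes simple whitespace tokenization and no text normalization; the function name `word_error_rate` is hypothetical, and the papers listed on this page may use different normalization or scoring tools.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER as a fraction: word-level edit distance / reference length."""
    ref = reference.split()  # assumption: plain whitespace tokenization
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion (reference word missed)
                curr[j - 1] + 1,                  # insertion (extra hypothesis word)
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution and one deletion against a four-word reference.
    print(word_error_rate("a b c d", "a x c"))
```

The function returns a fraction; multiply by 100 to express it as a percentage. Because insertions are counted, WER can exceed 1.0 when the hypothesis is much longer than the reference.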
Results
Performance results of various models on this benchmark
Model Name | Word Error Rate (WER, %) | Paper Title | Repository |
---|---|---|---|
Whisper | 0.68 | Whisper-Flamingo: Integrating Visual Features into Whisper for Audio-Visual Speech Recognition and Translation | - |
RAVEn Large | 1.4 | Jointly Learning Visual and Auditory Speech Representations from Raw Data | - |
Llama-AVSR | 0.81 | Large Language Models are Strong Audio-Visual Speech Recognition Learners | - |
AV-HuBERT Large | 1.3 | Learning Audio-Visual Speech Representation by Masked Multimodal Cluster Prediction | - |