Lipreading on LRS3-TED
Evaluation Metric
Word Error Rate (WER)
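The page does not spell out how WER is scored, so the following is only a minimal sketch of the standard definition: the word-level edit distance (substitutions + deletions + insertions) divided by the number of reference words. The function name `word_error_rate` is hypothetical, and tokenizing on whitespace with no text normalization is an assumption. Individual papers on this leaderboard may normalize casing and punctuation differently.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER as a fraction: word-level edit distance / reference length.

    Assumption: words are split on whitespace and compared exactly,
    with no casing or punctuation normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# One substitution ("sat" -> "sit") and one deletion ("the") over 6 reference words.
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
print(f"WER = {100 * wer:.1f}%")
```

Multiplying the returned fraction by 100 gives a percentage, which appears to be the scale used for the scores listed on this page.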
Evaluation Results
Performance of each model on this benchmark
| Model Name | Word Error Rate (WER, %) | Paper Title | Repository |
| --- | --- | --- | --- |
| Conv-seq2seq | 60.1 | Spatio-Temporal Fusion Based Convolutional Sequence Learning for Lip Reading | - |
| CTC + KD | 59.8 | ASR is all you need: cross-modal distillation for lip reading | - |
| TM-seq2seq | 58.9 | Deep Audio-Visual Speech Recognition | |
| EG-seq2seq | 57.8 | Discriminative Multi-modality Speech Recognition | |
| CTC-V2P | 55.1 | Large-Scale Visual Speech Recognition | - |
| Hyb + Conformer | 43.3 | End-to-end Audio-visual Speech Recognition with Conformers | |
| VTP | 40.6 | Sub-word Level Lip Reading With Visual Attention | - |
| ES³ Base | 40.3 | ES3: Evolving Self-Supervised Learning of Robust Audio-Visual Speech Representations | - |
| ES³ Large | 37.1 | ES3: Evolving Self-Supervised Learning of Robust Audio-Visual Speech Representations | - |
| RNN-T | 33.6 | Recurrent Neural Network Transducer for Audio-Visual Speech Recognition | |
| CTC/Attention (LRW+LRS2/3+AVSpeech) | 31.5 | Visual Speech Recognition for Multiple Languages in the Wild | |
| SyncVSR | 31.2 | SyncVSR: Data-Efficient Visual Speech Recognition with End-to-End Crossmodal Audio Token Synchronization | |
| VTP (more data) | 30.7 | Sub-word Level Lip Reading With Visual Attention | - |
| AV-HuBERT Large | 26.9 | Learning Audio-Visual Speech Representation by Masked Multimodal Cluster Prediction | |
| DistillAV | 26.2 | Audio-Visual Representation Learning via Knowledge Distillation from Speech Foundation Models | |
| AV-HuBERT Large + Relaxed Attention + LM | 25.51 | Relaxed Attention for Transformer Models | |
| VSP-LLM | 25.4 | Where Visual Speech Meets Language: VSP-LLM Framework for Efficient and Context-Aware Visual Speech Processing | |
| RAVEn Large | 23.4 | Jointly Learning Visual and Auditory Speech Representations from Raw Data | |
| USR (self-supervised) | 22.3 | Unified Speech Recognition: A Single Model for Auditory, Visual, and Audiovisual Inputs | |
| SyncVSR | 21.5 | SyncVSR: Data-Efficient Visual Speech Recognition with End-to-End Crossmodal Audio Token Synchronization | |