Lipreading on LRS2
Evaluation Metric
Word Error Rate (WER)
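WER is conventionally defined as (S + D + I) / N, where S, D, and I are the word substitutions, deletions, and insertions needed to turn the reference transcript into the model's hypothesis, and N is the number of reference words. The sketch below is a minimal illustration of that definition using a word-level edit distance, reported as a percentage to match the scores on this page. The function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration; the papers listed here may normalize text (casing, punctuation) differently before scoring.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER in percent between a reference and a hypothesis transcript.

    Illustrative sketch: tokens are obtained by splitting on whitespace,
    with no casing or punctuation normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            )
        prev = curr

    # Total edits divided by the number of reference words, as a percentage.
    return 100.0 * prev[-1] / len(ref)


if __name__ == "__main__":
    score = word_error_rate("the cat sat on the mat", "the cat sit on mat")
    print(f"WER: {score:.2f}")
```

Lower values are better, and because insertions count as errors, WER can exceed 100 when the hypothesis is much longer than the reference.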
Evaluation Results
Performance of each model on this benchmark
| Model Name | Word Error Rate (WER) | Paper Title | Repository |
|---|---|---|---|
| LIBS | 65.29 | Hearing Lips: Improving Lip Reading by Distilling Speech Recognizers | |
| TM-CTC + extLM | 54.7 | Deep Audio-Visual Speech Recognition | |
| CTC + KD ASR | 53.2 | ASR is all you need: cross-modal distillation for lip reading | - |
| Conv-seq2seq | 51.7 | Spatio-Temporal Fusion Based Convolutional Sequence Learning for Lip Reading | - |
| Hybrid CTC / Attention | 50 | Audio-Visual Speech Recognition With A Hybrid CTC/Attention Architecture | - |
| LF-MMI TDNN | 48.86 | Audio-visual Recognition of Overlapped speech for the LRS2 dataset | - |
| TM-seq2seq + extLM | 48.3 | Deep Audio-Visual Speech Recognition | |
| Multi-head Visual-Audio Memory | 44.5 | Distinguishing Homophenes Using Multi-Head Visual-Audio Memory for Lip Reading | |
| MoCo + wav2vec (w/o extLM) | 43.2 | Leveraging Unimodal Self-Supervised Learning for Multimodal Audio-Visual Speech Recognition | |
| Hybrid CTC / Attention | 39.1 | End-to-end Audio-visual Speech Recognition with Conformers | |
| CTC/Attention | 32.9 | Visual Speech Recognition for Multiple Languages in the Wild | |
| ES³ Base* | 31.4 | ES3: Evolving Self-Supervised Learning of Robust Audio-Visual Speech Representations | - |
| ES³ Base | 30.7 | ES3: Evolving Self-Supervised Learning of Robust Audio-Visual Speech Representations | - |
| ES³ Base* + extLM | 29.3 | ES3: Evolving Self-Supervised Learning of Robust Audio-Visual Speech Representations | - |
| SyncVSR | 28.9 | SyncVSR: Data-Efficient Visual Speech Recognition with End-to-End Crossmodal Audio Token Synchronization | |
| VTP | 28.9 | Sub-word Level Lip Reading With Visual Attention | - |
| ES³ Base + extLM | 28.7 | ES3: Evolving Self-Supervised Learning of Robust Audio-Visual Speech Representations | - |
| ES³ Large | 26.7 | ES3: Evolving Self-Supervised Learning of Robust Audio-Visual Speech Representations | - |
| CTC/Attention (LRW+LRS2/3+AVSpeech) | 25.5 | Visual Speech Recognition for Multiple Languages in the Wild | |
| ES³ Large + extLM | 24.6 | ES3: Evolving Self-Supervised Learning of Robust Audio-Visual Speech Representations | - |
Lipreading On Lrs2 | SOTA | HyperAI超神经