Lipreading On Lip Reading In The Wild
Evaluation Metric
Top-1 Accuracy
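For readers unfamiliar with the metric, here is a minimal sketch of how top-1 accuracy is typically computed for a word-level classification benchmark such as Lip Reading in the Wild. The function name `top1_accuracy` and the toy scores are hypothetical and only illustrate the idea; this is not the evaluation code of any listed paper. It assumes the leaderboard values are percentages.

```python
def top1_accuracy(scores, labels):
    """Percentage of samples whose highest-scoring class equals the true label.

    scores: list of per-sample score lists, one score per class.
    labels: list of true class indices, same length as scores.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not labels:
        raise ValueError("at least one sample is required")

    correct = 0
    for row, label in zip(scores, labels):
        # Index of the largest score is the top-1 prediction.
        predicted = max(range(len(row)), key=row.__getitem__)
        if predicted == label:
            correct += 1
    return 100.0 * correct / len(labels)


# Toy example (hypothetical data): 4 clips, 3 word classes.
toy_scores = [
    [0.1, 0.7, 0.2],  # predicted class 1
    [0.8, 0.1, 0.1],  # predicted class 0
    [0.3, 0.3, 0.4],  # predicted class 2
    [0.2, 0.5, 0.3],  # predicted class 1
]
toy_labels = [1, 0, 1, 1]
print(top1_accuracy(toy_scores, toy_labels))  # 3 of 4 correct
```

Only the single best guess per clip counts, so a model that ranks the correct word second gets no credit for that clip.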
Evaluation Results
Performance of each model on this benchmark
| Model Name | Top-1 Accuracy | Paper Title | Repository |
| --- | --- | --- | --- |
| SyncVSR (Word Boundary) | 95.0 | SyncVSR: Data-Efficient Visual Speech Recognition with End-to-End Crossmodal Audio Token Synchronization | |
| 3D Conv + ResNet-18 + DC-TCN + KD (Ensemble & Word Boundary) | 94.1 | Training Strategies for Improved Lip-reading | |
| SyncVSR | 93.2 | SyncVSR: Data-Efficient Visual Speech Recognition with End-to-End Crossmodal Audio Token Synchronization | |
| AVCRFormer | 89.57 | Audio-Visual Speech Recognition based on Regulated Transformer and Spatio-Temporal Fusion Strategy for Driver Assistive Systems | - |
| 3D Conv + EfficientNetV2 + Transformer + TCN | 89.52 | Accurate and Resource-Efficient Lipreading with Efficientnetv2 and Transformers | - |
| Vosk + MediaPipe + LS + MixUp + SA + 3DResNet-18 + BiLSTM + Cosine WR | 88.7 | Visual Speech Recognition in a Driver Assistance System | - |
| 3D Conv + ResNet-18 + MS-TCN + Multi-Head Visual-Audio Memory | 88.5 | Distinguishing Homophenes Using Multi-Head Visual-Audio Memory for Lip Reading | |
| 3D Conv + ResNet-18 + MS-TCN + KD (Ensemble) | 88.5 | Towards Practical Lipreading with Distilled and Efficient Models | |
| 3D-ResNet + Bi-GRU + MixUp + Label Smoothing + Cosine LR (Word Boundary) | 88.4 | Learn an Effective Lip Reading Model without Pains | |
| 3D-ResNet + Bi-GRU + MixUp + Label Smoothing + Cosine LR | 85.5 | Learn an Effective Lip Reading Model without Pains | |
| 3D Conv + ResNet-18 + Bi-GRU + Visual-Audio Memory | 85.4 | Multi-modality Associative Bridging through Memory: Speech Sound Recollected from Face Video | - |
| 3D Conv + ResNet-18 + MS-TCN | 85.30 | Lipreading using Temporal Convolutional Networks | |
| 3D Conv + ResNet-18 + Bi-GRU (Face Cutout) | 85.02 | Can We Read Speech Beyond the Lips? Rethinking RoI Selection for Deep Visual Speech Recognition | |
| MoCo + Wav2Vec by SJTU LUMIA | 85.0 | Leveraging Unimodal Self-Supervised Learning for Multimodal Audio-Visual Speech Recognition | |
| 3D Conv + P3D-ResNet50 + TCN | 84.80 | Discriminative Multi-modality Speech Recognition | |
| 3D Conv + ResNet-18 + Bi-GRU | 84.41 | Mutual Information Maximization for Effective Lip Reading | |
| SpotFast + Transformer + Product-Key memory | 84.4 | SpotFast Networks with Memory Augmented Lateral Transformers for Lipreading | |
| DFTN | 84.13 | Deformation Flow Based Two-Stream Network for Lip Reading | |
| PCPG | 83.5 | Pseudo-Convolutional Policy Gradient for Sequence-to-Sequence Lip-Reading | - |
| 3D Conv + ResNet-34 + Bi-GRU | 83.39 | End-to-end Audiovisual Speech Recognition | |