Prajwal K R, Rudrabha Mukhopadhyay, Jerin Philip, Abhishek Jha, Vinay Namboodiri, C. V. Jawahar

Abstract
In light of the recent breakthroughs in automatic machine translation systems, we propose a novel approach that we term "Face-to-Face Translation". As today's digital communication becomes increasingly visual, we argue that there is a need for systems that can automatically translate a video of a person speaking in language A into a target language B with realistic lip synchronization. In this work, we create an automatic pipeline for this problem and demonstrate its impact on multiple real-world applications. First, we build a working speech-to-speech translation system by bringing together multiple existing modules from speech and language. We then move towards "Face-to-Face Translation" by incorporating a novel visual module, LipGAN, for generating realistic talking faces from the translated audio. Quantitative evaluation of LipGAN on the standard LRW test set shows that it significantly outperforms existing approaches across all standard metrics. We also subject our Face-to-Face Translation pipeline to multiple human evaluations and show that it can significantly improve the overall user experience for consuming and interacting with multimodal content across languages. Code, models and demo video are made publicly available. Demo video: https://www.youtube.com/watch?v=aHG6Oei8jF0 Code and models: https://github.com/Rudrabha/LipGAN
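The staged structure described in the abstract can be sketched as a simple composition of callables. This is only an illustrative outline: the abstract says the speech-to-speech system combines "multiple existing modules from speech and language" without naming them, so the split into recognition, text translation and speech synthesis is an assumption, and every name below (`FaceToFacePipeline`, `recognize`, `translate`, `synthesize`, `lip_sync`) is hypothetical and not taken from the LipGAN repository.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical placeholder types; a real system would use audio arrays and image tensors.
Audio = List[float]
Frames = List[str]


@dataclass
class FaceToFacePipeline:
    """Assumed four-stage layout; none of these names come from the paper's code."""
    recognize: Callable[[Audio], str]             # speech in language A -> text in A
    translate: Callable[[str], str]               # text in A -> text in B
    synthesize: Callable[[str], Audio]            # text in B -> speech in B
    lip_sync: Callable[[Frames, Audio], Frames]   # frames + translated speech -> re-synced frames

    def run(self, frames: Frames, audio: Audio) -> Tuple[Frames, Audio]:
        text_a = self.recognize(audio)
        text_b = self.translate(text_a)
        audio_b = self.synthesize(text_b)
        # The visual module (LipGAN in the paper) is conditioned on the translated audio.
        synced_frames = self.lip_sync(frames, audio_b)
        return synced_frames, audio_b
```

Keeping each stage behind a plain callable means any one module can be swapped without touching the others, which matches the abstract's description of assembling existing components.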
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| talking-face-generation-on-lrw | LipGAN | LMD: 0.60; SSIM: 0.96 |
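For readers unfamiliar with the metrics: SSIM is the structural similarity index (higher is better, with 1 meaning identical images), and LMD is usually read as landmark distance, where lower is better. The page does not define how LMD is computed for this benchmark, so the following is a minimal sketch under the common assumption that it is the mean Euclidean distance between corresponding mouth landmarks of generated and reference frames. The function name `landmark_distance` is hypothetical, and landmark detection and any normalization are left out.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def landmark_distance(
    generated: Sequence[Sequence[Point]],
    reference: Sequence[Sequence[Point]],
) -> float:
    """Mean Euclidean distance over all corresponding landmarks in all frames.

    Assumed definition; the benchmark's exact protocol is not given on this page.
    """
    if len(generated) != len(reference):
        raise ValueError("frame counts differ")
    total, count = 0.0, 0
    for gen_frame, ref_frame in zip(generated, reference):
        if len(gen_frame) != len(ref_frame):
            raise ValueError("landmark counts differ within a frame")
        for (gx, gy), (rx, ry) in zip(gen_frame, ref_frame):
            total += math.hypot(gx - rx, gy - ry)
            count += 1
    if count == 0:
        raise ValueError("no landmarks supplied")
    return total / count
```

With one frame and two landmarks, where one landmark matches exactly and the other is off by a 3-4-5 triangle, the function returns (0 + 5) / 2 = 2.5.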