Egor Zakharov, Aleksei Ivakhnenko, Aliaksandra Shysheya, Victor Lempitsky

Abstract
We propose a neural rendering-based system that creates head avatars from a single photograph. Our approach models a person's appearance by decomposing it into two layers. The first layer is a pose-dependent coarse image that is synthesized by a small neural network. The second layer is defined by a pose-independent texture image that contains high-frequency details. The texture image is generated offline, warped and added to the coarse image to ensure a high effective resolution of synthesized head views. We compare our system to analogous state-of-the-art systems in terms of visual quality and speed. The experiments show significant inference speedup over previous neural head avatar models for a given visual quality. We also report on a real-time smartphone-based implementation of our system.
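The two-layer composition described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names `bilinear_sample` and `compose_bilayer`, the `(x, y)` pixel-coordinate format of `warp_coords`, and the use of bilinear sampling are all assumptions made for illustration. In the real system the coarse image and the warp would come from the pose-dependent network, while the texture would be computed once per person.

```python
import numpy as np

def bilinear_sample(texture, coords):
    """Sample texture (Ht, Wt, C) at float pixel coords (H, W, 2), stored as (x, y).

    Hypothetical helper: coordinates outside the texture are clamped to its border.
    """
    h_t, w_t = texture.shape[:2]
    x = np.clip(coords[..., 0], 0, w_t - 1)
    y = np.clip(coords[..., 1], 0, h_t - 1)
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, w_t - 1)
    y1 = np.minimum(y0 + 1, h_t - 1)
    wx = (x - x0)[..., None]  # horizontal interpolation weight
    wy = (y - y0)[..., None]  # vertical interpolation weight
    top = texture[y0, x0] * (1 - wx) + texture[y0, x1] * wx
    bottom = texture[y1, x0] * (1 - wx) + texture[y1, x1] * wx
    return top * (1 - wy) + bottom * wy

def compose_bilayer(coarse, texture, warp_coords):
    """Add the warped pose-independent texture to the pose-dependent coarse image."""
    return coarse + bilinear_sample(texture, warp_coords)

# Toy example: a flat coarse image plus a texture with one bright detail.
h, w = 4, 4
coarse = np.full((h, w, 3), 0.5)
texture = np.zeros((h, w, 3))
texture[1, 2] = 0.1

ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
identity = np.stack([xs, ys], axis=-1).astype(float)  # no warping
shifted = identity + np.array([1.0, 0.0])             # read one pixel to the right

out_identity = compose_bilayer(coarse, texture, identity)
out_shifted = compose_bilayer(coarse, texture, shifted)
```

With the identity warp the output is simply the coarse image plus the texture. With the shifted warp the texture detail moves one pixel to the left in the output, which is how a single pose-independent texture can follow the head as the pose changes.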
Benchmarks
| Benchmark | Method | CSIM | LPIPS | Normalized Pose Error | SSIM | Inference time (ms) |
|---|---|---|---|---|---|---|
| talking-head-generation-on-voxceleb2-1-shot | Fast Bi-layer Avatars (medium size) | 0.653 | 0.358 | 43.3 | 0.508 | 4 |
| talking-head-generation-on-voxceleb2-1-shot | Few-shot Vid-to-vid (medium size) | 0.604 | 0.368 | 46.1 | 0.419 | 22 |
| talking-head-generation-on-voxceleb2-1-shot | First Order Motion Model (medium size) | 0.638 | 0.311 | 47.8 | 0.553 | 13 |
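The inference times listed for the three medium-size models can be turned into relative speedups with a few lines of Python. This is only arithmetic on the reported numbers. The dictionary name `inference_ms` is an illustrative choice, and the ratios say nothing about the quality differences in the other metrics or about the hardware used, which this listing does not state.

```python
# Inference times in milliseconds, copied from the benchmark table (medium-size models).
inference_ms = {
    "Fast Bi-layer Avatars": 4,
    "Few-shot Vid-to-vid": 22,
    "First Order Motion Model": 13,
}

baseline = inference_ms["Fast Bi-layer Avatars"]

# Ratio of each model's latency to the bi-layer model's latency.
speedup = {name: ms / baseline for name, ms in inference_ms.items()}

for name, ratio in speedup.items():
    print(f"{name}: {ratio:.2f}x the bi-layer inference time")
```

On these figures, Few-shot Vid-to-vid takes 5.5 times as long per frame as the bi-layer model, and First Order Motion Model takes 3.25 times as long.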