Aliaksandr Siarohin, Oliver J. Woodford, Jian Ren, Menglei Chai, Sergey Tulyakov

Abstract
We propose novel motion representations for animating articulated objects consisting of distinct parts. In a completely unsupervised manner, our method identifies object parts, tracks them in a driving video, and infers their motions by considering their principal axes. Unlike previous keypoint-based work, our method extracts meaningful and consistent regions that describe location, shape, and pose. The regions correspond to semantically relevant and distinct object parts that are more easily detected in frames of the driving video. To force decoupling of foreground from background, we model non-object-related global motion with an additional affine transformation. To facilitate animation and prevent the leakage of the shape of the driving object, we disentangle shape and pose of objects in the region space. Our model can animate a variety of objects, surpassing previous methods by a large margin on existing benchmarks. We present a challenging new benchmark with high-resolution videos and show that the improvement is particularly pronounced when articulated objects are considered, reaching 96.6% user preference over the state of the art.
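The abstract states that part motion is inferred from the principal axes of each region. As a rough illustration only, the sketch below shows one way to read a centroid and principal axes off a single soft region mask using a weighted covariance and its eigendecomposition. The function name `region_affine`, the heatmap input, and the choice of NumPy are assumptions made for this example and are not taken from the authors' code.

```python
import numpy as np

def region_affine(heatmap):
    """Estimate a region's centroid and principal-axis matrix from a soft mask.

    Hypothetical helper for illustration. `heatmap` is a non-negative (H, W)
    array. Returns (mean, A), where `mean` is the (x, y) centroid and the
    columns of `A` are the principal axes scaled by the standard deviation
    along each axis (smallest first).
    """
    h, w = heatmap.shape
    weights = heatmap / heatmap.sum()            # normalise to a distribution
    ys, xs = np.mgrid[0:h, 0:w]
    coords = np.stack([xs, ys], axis=-1).astype(float)   # (H, W, 2) pixel grid

    # Weighted first moment: where the region is.
    mean = (weights[..., None] * coords).sum(axis=(0, 1))

    # Weighted second moment: how the region is oriented and stretched.
    centered = coords - mean
    cov = np.einsum("hw,hwi,hwj->ij", weights, centered, centered)

    # Eigenvectors of the covariance are the principal axes (ascending order).
    eigvals, eigvecs = np.linalg.eigh(cov)
    eigvals = np.clip(eigvals, 0.0, None)        # guard against tiny negatives
    A = eigvecs @ np.diag(np.sqrt(eigvals))
    return mean, A

# Usage: a horizontal bar, so the major axis should lie along x.
bar = np.zeros((32, 32))
bar[14:18, 6:26] = 1.0
mean, A = region_affine(bar)
print(mean)      # centroid in (x, y) order
print(A[:, 1])   # major axis, scaled by its standard deviation
```

Eigenvectors are only defined up to sign, so any practical use of such axes across frames needs a consistent orientation convention; this sketch does not address that.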
Benchmarks
| Benchmark | Method | L1 | AKD | MKR | AED |
|---|---|---|---|---|---|
| video-reconstruction-on-mgif | FOMM | 0.0223 | - | - | - |
| video-reconstruction-on-mgif | Siarohin et al. | 0.0206 | - | - | - |
| video-reconstruction-on-tai-chi-hd-256 | FOMM | 0.056 | 6.53 | 0.033 | 0.172 |
| video-reconstruction-on-tai-chi-hd-256 | Siarohin et al. | 0.047 | 5.58 | 0.027 | 0.152 |
| video-reconstruction-on-tai-chi-hd-512 | FOMM | 0.075 | 17.12 | 0.066 | 0.203 |
| video-reconstruction-on-tai-chi-hd-512 | Siarohin et al. | 0.064 | 13.86 | 0.043 | 0.172 |
| video-reconstruction-on-ted-talks | FOMM | 0.033 | 7.07 | 0.014 | 0.163 |
| video-reconstruction-on-ted-talks | Siarohin et al. | 0.026 | 3.75 | 0.007 | 0.114 |
| video-reconstruction-on-voxceleb | FOMM | 0.041 | 1.27 | - | 0.134 |
| video-reconstruction-on-voxceleb | Siarohin et al. | 0.040 | 1.28 | - | 0.133 |
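The page does not define the reported metrics. As a hedged sketch, the code below shows how two families of them are commonly computed in video-reconstruction evaluation: L1 as the mean absolute pixel difference, and AKD/MKR as the average keypoint distance and missing keypoint rate from an external keypoint detector. The function names, the NaN convention for missed keypoints, and the MKR denominator are assumptions for this example; the benchmarks' own evaluation scripts may use different conventions and pixel scaling.

```python
import numpy as np

def l1_error(generated, target):
    """Mean absolute difference between two frames (or stacks of frames)."""
    return float(np.mean(np.abs(generated.astype(float) - target.astype(float))))

def akd_mkr(kp_generated, kp_target):
    """Average keypoint distance and missing keypoint rate.

    Both inputs are (N, 2) arrays of detected keypoints; a row of NaNs marks
    a keypoint the detector failed to find (an assumed convention).
    """
    present_t = ~np.isnan(kp_target).any(axis=1)
    present_g = ~np.isnan(kp_generated).any(axis=1)
    both = present_t & present_g

    # AKD: mean Euclidean distance over keypoints found in both frames.
    akd = float(np.linalg.norm(kp_generated[both] - kp_target[both], axis=1).mean())

    # MKR (assumed definition): share of ground-truth keypoints that are
    # missing from the generated frame.
    mkr = float((present_t & ~present_g).sum() / present_t.sum())
    return akd, mkr

# Usage with toy data.
target_kp = np.array([[0.0, 0.0], [10.0, 0.0], [5.0, 5.0], [1.0, 1.0]])
generated_kp = np.array([[3.0, 4.0], [10.0, 0.0], [np.nan, np.nan], [1.0, 1.0]])
print(akd_mkr(generated_kp, target_kp))
print(l1_error(np.zeros((4, 4)), np.full((4, 4), 0.5)))
```

All four reported metrics are error measures, so lower values indicate better reconstruction.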