Prados-Torreblanca, Andrés; Buenaposada, José M.; Baumela, Luis

Abstract
Top-performing landmark estimation algorithms are based on exploiting the excellent ability of large convolutional neural networks (CNNs) to represent local appearance. However, it is well known that they can only learn weak spatial relationships. To address this problem, we propose a model based on the combination of a CNN with a cascade of Graph Attention Network regressors. To this end, we introduce an encoding that jointly represents the appearance and location of facial landmarks and an attention mechanism to weigh the information according to its reliability. This is combined with a multi-task approach to initialize the location of graph nodes and a coarse-to-fine landmark description scheme. Our experiments confirm that the proposed model learns a global representation of the structure of the face, achieving top performance in popular benchmarks on head pose and landmark estimation. The improvement provided by our model is most significant in situations involving large changes in the local appearance of landmarks.
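As a rough illustration of the attention idea mentioned in the abstract, the sketch below implements a generic single-head graph attention layer in NumPy. It is not the SPIGA regressor: the function name `graph_attention`, the parameters `W` and `a`, and the LeakyReLU slope are assumptions for illustration, and the paper's joint appearance/location encoding and cascade are not reproduced here.

```python
import numpy as np

def graph_attention(h, adj, W, a, slope=0.2):
    """Generic single-head graph attention layer (illustrative, not SPIGA's).

    h:   (N, F) node features, e.g. one row per landmark
    adj: (N, N) boolean adjacency; must include self-loops
    W:   (F, F_out) shared linear projection (hypothetical parameter)
    a:   (2 * F_out,) attention vector (hypothetical parameter)
    """
    z = h @ W                                   # projected features, (N, F_out)
    f_out = z.shape[1]
    # a^T [z_i || z_j] decomposes into a source term plus a target term.
    src = z @ a[:f_out]                         # (N,)
    dst = z @ a[f_out:]                         # (N,)
    e = src[:, None] + dst[None, :]             # raw scores, (N, N)
    e = np.where(e > 0, e, slope * e)           # LeakyReLU
    e = np.where(adj, e, -np.inf)               # ignore non-neighbours
    e = e - e.max(axis=1, keepdims=True)        # stabilise the softmax
    alpha = np.exp(e)
    alpha = alpha / alpha.sum(axis=1, keepdims=True)
    return alpha @ z, alpha                     # aggregated features, weights
```

Each row of `alpha` sums to one, so every node receives a convex combination of its neighbours' projected features; a node whose neighbours carry unreliable information can, in principle, learn to down-weight them.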
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| face-alignment-on-300w | SPIGA | NME_inter-ocular (%, Challenge): 4.66; NME_inter-ocular (%, Common): 2.59; NME_inter-ocular (%, Full): 2.99; NME_inter-pupil (%, Challenge): 6.73; NME_inter-pupil (%, Common): 3.59; NME_inter-pupil (%, Full): 4.20 |
| face-alignment-on-300w-common | SPIGA | NME: 2.59 |
| face-alignment-on-300w-split-2 | SPIGA | AUC@7 (box): 71.0; AUC@8 (inter-ocular): 57.27; FR@8 (inter-ocular): 0.67; NME (box): 2.03; NME (inter-ocular): 3.43 |
| face-alignment-on-cofw-68 | SPIGA | AUC@7 (box): 64.1; NME (box): 2.52; NME (inter-ocular): 3.93 |
| face-alignment-on-merl-rav | SPIGA | AUC@7 (box): 78.47; NME (box): 1.51 |
| face-alignment-on-wflw | SPIGA | AUC@10 (inter-ocular): 60.56; FR@10 (inter-ocular): 2.08; NME (inter-ocular): 4.06 |
| face-alignment-on-wfw-extra-data | SPIGA | AUC@10 (inter-ocular): 60.56; FR@10 (inter-ocular): 2.08; NME (inter-ocular): 4.06 |
| facial-landmark-detection-on-300w | SPIGA (Inter-ocular Norm) | NME: 2.99 |
| head-pose-estimation-on-wflw | SPIGA | MAE mean (°): 1.52; MAE pitch (°): 1.86; MAE roll (°): 0.93; MAE yaw (°): 1.78 |
| pose-estimation-on-300w-full | SPIGA | MAE mean (°): 1.29; MAE pitch (°): 1.70; MAE roll (°): 0.77; MAE yaw (°): 1.41 |
| pose-estimation-on-merl-rav | SPIGA | MAE mean (°): 2.39; MAE pitch (°): 2.24; MAE roll (°): 1.71; MAE yaw (°): 3.23 |
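
For readers unfamiliar with the metrics in the table, the following is a minimal sketch of how a normalized mean error (NME) and a failure rate (FR) are commonly computed. It assumes the usual definitions (mean per-landmark Euclidean error divided by a normalization length, reported in percent; FR as the share of images whose NME exceeds a threshold). The function names and the eye-corner index arguments are hypothetical, and the exact evaluation protocol behind the numbers above may differ.

```python
import numpy as np

def nme_percent(pred, gt, norm):
    """Mean landmark error divided by a normalization length, in percent.

    pred, gt: (L, 2) arrays of predicted and ground-truth landmarks
    norm:     normalization length, e.g. inter-ocular distance or a
              bounding-box size (which one is used depends on the benchmark)
    """
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    per_landmark = np.linalg.norm(pred - gt, axis=1)
    return 100.0 * per_landmark.mean() / norm

def inter_ocular_distance(gt, left_idx, right_idx):
    """Distance between two eye landmarks; the indices are
    annotation-specific and passed in by the caller (assumption)."""
    gt = np.asarray(gt, dtype=float)
    return float(np.linalg.norm(gt[left_idx] - gt[right_idx]))

def failure_rate_percent(nmes, threshold):
    """Percentage of images whose NME is above the threshold (FR@threshold)."""
    nmes = np.asarray(nmes, dtype=float)
    return 100.0 * float(np.mean(nmes > threshold))
```

With these definitions, a lower NME and FR are better, which matches how such values are normally read in the table.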