8 months ago

Convolutional Neural Network

Multi-Task Learning

Computer Vision

Method/Architecture


Andrés Prados-Torreblanca José M. Buenaposada Luis Baumela

Abstract

Top-performing landmark estimation algorithms are based on exploiting the excellent ability of large convolutional neural networks (CNNs) to represent local appearance. However, it is well known that they can only learn weak spatial relationships. To address this problem, we propose a model based on the combination of a CNN with a cascade of Graph Attention Network regressors. To this end, we introduce an encoding that jointly represents the appearance and location of facial landmarks and an attention mechanism to weigh the information according to its reliability. This is combined with a multi-task approach to initialize the location of graph nodes and a coarse-to-fine landmark description scheme. Our experiments confirm that the proposed model learns a global representation of the structure of the face, achieving top performance in popular benchmarks on head pose and landmark estimation. The improvement provided by our model is most significant in situations involving large changes in the local appearance of landmarks.
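To make the attention idea concrete, the following is a minimal NumPy sketch of a generic single-head graph attention layer applied to landmark nodes. It is not the authors' implementation: the function name `gat_layer`, the feature sizes, the fully connected graph, and the random per-landmark features standing in for a joint appearance and location encoding are all assumptions made for illustration only.

```python
import numpy as np


def gat_layer(h, adj, W, a, negative_slope=0.2):
    """Generic single-head graph attention layer (illustrative sketch).

    h:   (N, F) node features, one row per landmark (hypothetical encoding)
    adj: (N, N) boolean adjacency; every row needs at least one True entry
    W:   (F, F_out) shared linear projection
    a:   (2 * F_out,) attention vector
    Returns the updated features (N, F_out) and the attention weights (N, N).
    """
    z = h @ W                                   # project every node
    f_out = z.shape[1]
    # a^T [z_i || z_j] splits into a source term plus a target term
    src = z @ a[:f_out]                         # (N,)
    dst = z @ a[f_out:]                         # (N,)
    e = src[:, None] + dst[None, :]             # raw scores, (N, N)
    e = np.where(e > 0, e, negative_slope * e)  # LeakyReLU
    e = np.where(adj, e, -np.inf)               # ignore non-neighbours
    e = e - e.max(axis=1, keepdims=True)        # stabilise the softmax
    alpha = np.exp(e)
    alpha = alpha / alpha.sum(axis=1, keepdims=True)
    return alpha @ z, alpha                     # weighted sum of neighbours


# Hypothetical sizes: 5 landmarks, 8 input features, 4 output features.
rng = np.random.default_rng(0)
n_landmarks, f_in, f_out = 5, 8, 4
h = rng.normal(size=(n_landmarks, f_in))
adj = np.ones((n_landmarks, n_landmarks), dtype=bool)  # fully connected
W = rng.normal(size=(f_in, f_out))
a = rng.normal(size=(2 * f_out,))

out, alpha = gat_layer(h, adj, W, a)
print(out.shape, alpha.sum(axis=1))
```

Each row of `alpha` sums to one, so every landmark's updated feature is a convex combination of its neighbours' projected features. In a learned model, `W` and `a` would be trained so that less reliable nodes receive smaller weights; here they are random placeholders.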

