MARLIN: Masked Autoencoder for facial video Representation LearnINg

Cai Zhixi ; Ghosh Shreya ; Stefanov Kalin ; Dhall Abhinav ; Cai Jianfei ; Rezatofighi Hamid ; Haffari Reza ; Hayat Munawar

Abstract

This paper proposes a self-supervised approach to learn universal facial representations from videos, that can transfer across a variety of facial analysis tasks such as Facial Attribute Recognition (FAR), Facial Expression Recognition (FER), DeepFake Detection (DFD), and Lip Synchronization (LS). Our proposed framework, named MARLIN, is a facial video masked autoencoder, that learns highly robust and generic facial embeddings from abundantly available non-annotated web crawled facial videos. As a challenging auxiliary task, MARLIN reconstructs the spatio-temporal details of the face from the densely masked facial regions which mainly include eyes, nose, mouth, lips, and skin to capture local and global aspects that in turn help in encoding generic and transferable features. Through a variety of experiments on diverse downstream tasks, we demonstrate MARLIN to be an excellent facial video encoder as well as feature extractor, that performs consistently well across a variety of downstream tasks including FAR (1.13% gain over supervised benchmark), FER (2.64% gain over unsupervised benchmark), DFD (1.86% gain over unsupervised benchmark), LS (29.36% gain for Frechet Inception Distance), and even in low data regime. Our code and models are available at https://github.com/ControlNet/MARLIN.
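
The masking step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it uses uniform random masking rather than MARLIN's facial-region-guided masking, and the clip size, patch size, masking ratio, and the function names `patch_grid` and `random_mask` are all assumptions chosen for illustration. The idea shown is the general masked-autoencoder setup, where a video clip is split into spatio-temporal patch tokens, most tokens are hidden, the encoder sees only the visible ones, and the decoder is trained to reconstruct the hidden ones.

```python
import random


def patch_grid(frames, height, width, tube, patch):
    """Return the (time, rows, cols) grid of spatio-temporal patches."""
    if frames % tube or height % patch or width % patch:
        raise ValueError("clip dimensions must be divisible by the patch size")
    return frames // tube, height // patch, width // patch


def random_mask(num_tokens, mask_ratio, seed=0):
    """Split token indices into visible and masked sets.

    Uniform random masking is an assumption made here for simplicity;
    the paper describes masking guided by facial regions instead.
    """
    rng = random.Random(seed)
    num_masked = int(num_tokens * mask_ratio)
    masked = set(rng.sample(range(num_tokens), num_masked))
    visible = [i for i in range(num_tokens) if i not in masked]
    return visible, sorted(masked)


# Hypothetical clip: 16 frames of 224x224 pixels, 2x16x16 patches (assumed values).
grid = patch_grid(frames=16, height=224, width=224, tube=2, patch=16)
num_tokens = grid[0] * grid[1] * grid[2]

# A high masking ratio (0.9 is an assumed value) leaves few tokens for the encoder.
visible, masked = random_mask(num_tokens, mask_ratio=0.9)
print(f"{num_tokens} tokens: {len(visible)} visible, {len(masked)} masked")
```

In a full model, only the tokens listed in `visible` would be passed to the encoder, and the reconstruction loss would be computed on the tokens listed in `masked`.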

Code Repositories

ControlNet/MARLIN (official, PyTorch)

Benchmarks

| Benchmark | Methodology | Metrics |
| --- | --- | --- |
| action-classification-on-celebv-hq | MARLIN | AUC: 0.9406; Accuracy: 95.48 |
| deepfake-detection-on-faceforensics-1 | MARLIN (ViT-B) | AUC: 0.9305 |
| deepfake-detection-on-faceforensics-1 | MARLIN (ViT-L) | AUC: 0.9377 |
| deepfake-detection-on-faceforensics-1 | MARLIN (ViT-S) | AUC: 0.8863 |
| emotion-classification-on-cmu-mosei | MARLIN (ViT-S) | Accuracy: 80.38 |
| emotion-classification-on-cmu-mosei | MARLIN (ViT-B) | Accuracy: 80.6 |
| emotion-classification-on-cmu-mosei | MARLIN (ViT-L) | Accuracy: 80.63 |
| facial-attribute-classification-on-celebv-hq | MARLIN | AUC: 0.9561; Accuracy: 93.9 |
| lip-sync-on-lrs2 | Wav2Lip + ViT + MARLIN | FID: 3.452; LSE-C: 5.528; LSE-D: 7.127 |
| multimodal-sentiment-analysis-on-cmu-mosei-1 | MARLIN (ViT-B) | Accuracy: 73.7 |
| multimodal-sentiment-analysis-on-cmu-mosei-1 | MARLIN (ViT-S) | Accuracy: 72.69 |
| multimodal-sentiment-analysis-on-cmu-mosei-1 | MARLIN (ViT-L) | Accuracy: 74.83 |
