Audiovisual Masked Autoencoders

Mariana-Iuliana Georgescu, Eduardo Fonseca, Radu Tudor Ionescu, Mario Lucic, Cordelia Schmid, Anurag Arnab

Abstract

Can we leverage the audiovisual information already present in video to improve self-supervised representation learning? To answer this question, we study various pretraining architectures and objectives within the masked autoencoding framework, motivated by the success of similar methods in natural language and image understanding. We show that this yields significant improvements on audiovisual downstream classification tasks, surpassing the state of the art on VGGSound and AudioSet. Furthermore, a single audiovisual pretrained model can be used for multiple unimodal downstream tasks. We additionally demonstrate the transferability of our representations, achieving state-of-the-art audiovisual results on Epic Kitchens without pretraining specifically for this dataset.
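To make the masked autoencoding idea concrete, here is a minimal sketch under stated assumptions. The abstract does not specify masking ratios, how the two modalities are fused, or the reconstruction loss, so this sketch borrows the image MAE recipe: each modality is masked independently with uniform random masking, the visible video and audio tokens are concatenated into one encoder input (a hypothetical early-fusion choice), and the reconstruction loss is a mean squared error computed only over masked tokens. All function names (`split_visible_masked`, `mask_audiovisual`, `masked_mse`) and the ratios used below are illustrative, not taken from the paper.

```python
import random
from typing import Dict, List, Sequence, Tuple

Token = List[float]  # one flattened patch / token vector (illustrative)


def split_visible_masked(num_tokens: int, mask_ratio: float,
                         rng: random.Random) -> Tuple[List[int], List[int]]:
    """Uniform random masking (image-MAE style): returns (visible, masked) indices."""
    if not 0.0 <= mask_ratio < 1.0:
        raise ValueError("mask_ratio must be in [0, 1)")
    num_masked = int(round(num_tokens * mask_ratio))
    order = list(range(num_tokens))
    rng.shuffle(order)
    return sorted(order[num_masked:]), sorted(order[:num_masked])


def mask_audiovisual(video: Sequence[Token], audio: Sequence[Token],
                     video_ratio: float, audio_ratio: float,
                     seed: int = 0) -> Tuple[List[Token], Dict[str, List[int]]]:
    """Mask each modality independently and build a joint encoder input.

    Assumption: visible tokens of both modalities are simply concatenated
    (video first, then audio); the paper studies several architectures.
    """
    rng = random.Random(seed)
    v_vis, v_mask = split_visible_masked(len(video), video_ratio, rng)
    a_vis, a_mask = split_visible_masked(len(audio), audio_ratio, rng)
    encoder_input = [video[i] for i in v_vis] + [audio[i] for i in a_vis]
    return encoder_input, {"video": v_mask, "audio": a_mask}


def masked_mse(pred: Sequence[Token], target: Sequence[Token],
               masked: Sequence[int]) -> float:
    """Mean squared reconstruction error over masked tokens only."""
    total, count = 0.0, 0
    for i in masked:
        for p, t in zip(pred[i], target[i]):
            total += (p - t) ** 2
            count += 1
    return total / count if count else 0.0
```

For example, with 8 video tokens masked at a (hypothetical) ratio of 0.75 and 4 audio tokens masked at 0.5, the encoder sees only 2 + 2 = 4 tokens, while a decoder would be asked to reconstruct the 6 masked video tokens and 2 masked audio tokens. Restricting the loss to masked positions is what forces the model to infer missing content from the visible context, potentially across modalities.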

Benchmarks

Benchmark | Methodology | Metrics
audio-classification-on-audioset | Audiovisual Masked Autoencoder (Audio-only, Single) | Test mAP: 0.466
audio-classification-on-audioset | Audiovisual Masked Autoencoder (Audiovisual, Single) | Test mAP: 0.518
audio-classification-on-epic-kitchens-100 | Audiovisual Masked Autoencoder (Video-only, Single) | Top-1 Action: 45.8; Top-1 Noun: 55.9; Top-1 Verb: 70.8
audio-classification-on-epic-kitchens-100 | Audiovisual Masked Autoencoder (Audiovisual, Single) | Top-1 Action: 46.0; Top-1 Noun: 56.4; Top-1 Verb: 71.4
audio-classification-on-epic-kitchens-100 | Audiovisual Masked Autoencoder (Audio-only, Single) | Top-1 Action: 19.7; Top-1 Noun: 27.2; Top-1 Verb: 52.7
audio-classification-on-vggsound | Audiovisual Masked Autoencoder (Audio-only, Single) | Top-1 Accuracy: 57.2
audio-classification-on-vggsound | Audiovisual Masked Autoencoder (Audiovisual, Single) | Top-1 Accuracy: 65.0
