5 months ago

UAVM: Towards Unifying Audio and Visual Models

Yuan Gong; Alexander H. Liu; Andrew Rouditchenko; James Glass

Abstract

Conventional audio-visual models have independent audio and video branches. In this work, we unify the audio and visual branches by designing a Unified Audio-Visual Model (UAVM). The UAVM achieves a new state-of-the-art audio-visual event classification accuracy of 65.8% on VGGSound. More interestingly, we also find a few intriguing properties of UAVM that the modality-independent counterparts do not have.

Code Repositories

YuanGongND/uavm (official, PyTorch)

Benchmarks

Benchmark                                 Methodology            Metrics
audio-classification-on-audioset          UAVM (Audio + Video)   Test mAP: 0.504
audio-classification-on-vggsound          UAVM (Audio + Video)   Top 1 Accuracy: 65.8
audio-classification-on-vggsound          UAVM (Audio Only)      Top 1 Accuracy: 56.5
audio-classification-on-vggsound          UAVM (Video Only)      Top 1 Accuracy: 49.9
multi-modal-classification-on-audioset    UAVM                   Average mAP: 0.504
multi-modal-classification-on-vgg-sound   UAVM                   Top-1 Accuracy: 65.8
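
For readers unfamiliar with the two metrics reported above, the following is a minimal sketch of how top-1 accuracy (single-label, as on VGGSound) and mean average precision (multi-label, as on AudioSet) are commonly computed. It is not the evaluation code from the UAVM repository. The function names and toy data are hypothetical, and the average precision here is the plain non-interpolated variant, which may differ in detail from the implementation used in the paper.

```python
def top1_accuracy(scores, labels):
    """Percentage of samples whose highest-scoring class is the true label.

    scores: list of per-sample lists of class scores.
    labels: list of integer class indices, one per sample.
    """
    if len(scores) != len(labels) or not labels:
        raise ValueError("scores and labels must be non-empty and equal length")
    correct = 0
    for row, label in zip(scores, labels):
        predicted = max(range(len(row)), key=row.__getitem__)
        if predicted == label:
            correct += 1
    return 100.0 * correct / len(labels)


def average_precision(scores, targets):
    """Non-interpolated AP for one class.

    Samples are ranked by descending score; AP is the mean of the
    precision values measured at the rank of each positive sample.
    Returns 0.0 when the class has no positives (an assumption made
    here; other implementations skip such classes instead).
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    precisions = []
    for rank, i in enumerate(order, start=1):
        if targets[i]:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean_average_precision(score_matrix, target_matrix):
    """Mean of per-class AP over all classes (multi-label setting)."""
    n_classes = len(score_matrix[0])
    per_class = [
        average_precision(
            [row[c] for row in score_matrix],
            [row[c] for row in target_matrix],
        )
        for c in range(n_classes)
    ]
    return sum(per_class) / n_classes


# Toy data, invented for illustration only.
toy_scores = [[0.1, 0.9], [0.8, 0.2], [0.3, 0.7], [0.6, 0.4]]
toy_labels = [1, 0, 0, 0]
print(top1_accuracy(toy_scores, toy_labels))  # 3 of 4 correct -> 75.0
```

Top-1 accuracy is reported as a percentage, while mAP is reported on a 0 to 1 scale, which matches the units in the table above.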
