Separating the "Chirp" from the "Chat": Self-supervised Visual Grounding of Sound and Language

Mark Hamilton, Andrew Zisserman, John R. Hershey, William T. Freeman

Abstract

We present DenseAV, a novel dual encoder grounding architecture that learns high-resolution, semantically meaningful, and audio-visually aligned features solely through watching videos. We show that DenseAV can discover the "meaning" of words and the "location" of sounds without explicit localization supervision. Furthermore, it automatically discovers and distinguishes between these two types of associations without supervision. We show that DenseAV's localization abilities arise from a new multi-head feature aggregation operator that directly compares dense image and audio representations for contrastive learning. In contrast, many other systems that learn "global" audio and video representations cannot localize words and sound. Finally, we contribute two new datasets to improve the evaluation of AV representations through speech and sound prompted semantic segmentation. On these and other datasets we show DenseAV dramatically outperforms the prior art on speech and sound prompted semantic segmentation. DenseAV outperforms the previous state-of-the-art, ImageBind, on cross-modal retrieval using fewer than half of the parameters. Project Page: https://aka.ms/denseav
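The abstract names a multi-head aggregation operator that compares dense image and audio features for contrastive learning, but does not spell out its form. The following is a minimal NumPy sketch of one plausible reading, not the authors' implementation: per-head inner products between every audio time step and every image location, then max over image locations, mean over audio time, and sum over heads. That pooling order, the tensor shapes, the function names, and the temperature of 0.07 are all assumptions made for illustration. The loss shown covers only the audio-to-visual direction.

```python
import numpy as np

def dense_similarity(audio, visual):
    """Per-head inner products between dense features.

    audio:  (K, C, T)     K heads, C channels per head, T time steps
    visual: (K, C, H, W)  same heads/channels over an H x W feature map
    returns (K, T, H, W)
    """
    return np.einsum("kct,kchw->kthw", audio, visual)

def aggregate(sim):
    """Assumed pooling: max over space, mean over time, sum over heads."""
    return float(sim.max(axis=(2, 3)).mean(axis=1).sum())

def pairwise_scores(audio_batch, visual_batch):
    """Score every audio clip against every frame in the batch: (B, B)."""
    n = len(audio_batch)
    scores = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            sim = dense_similarity(audio_batch[i], visual_batch[j])
            scores[i, j] = aggregate(sim)
    return scores

def info_nce(scores, temperature=0.07):
    """Audio-to-visual InfoNCE loss; matching pairs lie on the diagonal."""
    logits = scores / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))

# Hypothetical toy batch: 4 clips, 2 heads, 8 channels, 10 time steps, 6x6 map.
rng = np.random.default_rng(0)
audio_batch = rng.standard_normal((4, 2, 8, 10))
visual_batch = rng.standard_normal((4, 2, 8, 6, 6))
loss = info_nce(pairwise_scores(audio_batch, visual_batch))
```

Because the score is built from a dense similarity volume rather than from two pooled "global" vectors, the intermediate `(K, T, H, W)` array can be inspected directly as a per-head localization map. This is the property the abstract contrasts with global-representation systems.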

Code Repositories

mhamilton723/DenseAV (official, PyTorch; mentioned in GitHub)

Benchmarks

Benchmark                                   Methodology   Metrics
sound-prompted-semantic-segmentation-on     DenseAV       mAP: 32.7, mIoU: 24.7
speech-prompted-semantic-segmentation-on    DenseAV       mAP: 48.7, mIoU: 36.8
