Active Speakers in Context
Juan Leon Alcazar, Fabian Caba Heilbron, Long Mai, Federico Perazzi, Joon-Young Lee, Pablo Arbelaez, Bernard Ghanem

Abstract
Current methods for active speaker detection focus on modeling short-term audio-visual information from a single speaker. Although this strategy can be enough for addressing single-speaker scenarios, it prevents accurate detection when the task is to identify which of many candidate speakers are talking. This paper introduces the Active Speaker Context, a novel representation that models relationships between multiple speakers over long time horizons. Our Active Speaker Context is designed to learn pairwise and temporal relations from a structured ensemble of audio-visual observations. Our experiments show that a structured feature ensemble already benefits active speaker detection performance. Moreover, we find that the proposed Active Speaker Context improves the state of the art on the AVA-ActiveSpeaker dataset, achieving a mAP of 87.1%. We present ablation studies that verify that this result is a direct consequence of our long-term multi-speaker analysis.
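The abstract's core idea, learning pairwise relations across candidate speakers and temporal relations over a long horizon from a structured ensemble of audio-visual features, can be illustrated with a minimal sketch. This is not the authors' implementation: the module names, feature dimensions, and the specific choice of multi-head attention over the speaker axis plus an LSTM over the time axis are illustrative assumptions.

```python
# Minimal sketch (not the authors' code) of the "structured ensemble" idea:
# stack per-candidate audio-visual embeddings into a (speakers x time x feat)
# tensor, relate speakers pairwise with self-attention, then model long-term
# temporal structure with an LSTM. All names and dimensions are hypothetical.
import torch
import torch.nn as nn

class ActiveSpeakerContextSketch(nn.Module):
    def __init__(self, feat_dim=128, num_heads=4):
        super().__init__()
        # Pairwise relations: attention across the candidate-speaker axis.
        self.pairwise = nn.MultiheadAttention(feat_dim, num_heads, batch_first=True)
        # Temporal relations: recurrent model over the long time horizon.
        self.temporal = nn.LSTM(feat_dim, feat_dim, batch_first=True)
        self.classifier = nn.Linear(feat_dim, 1)  # speaking / not speaking

    def forward(self, ensemble):
        # ensemble: (batch, speakers, time, feat_dim) of fused A/V embeddings.
        b, s, t, d = ensemble.shape
        # Attend across speakers independently at each time step.
        x = ensemble.permute(0, 2, 1, 3).reshape(b * t, s, d)
        x, _ = self.pairwise(x, x, x)
        # Regroup and model each candidate's sequence over the full horizon.
        x = x.reshape(b, t, s, d).permute(0, 2, 1, 3).reshape(b * s, t, d)
        x, _ = self.temporal(x)
        logits = self.classifier(x).reshape(b, s, t)  # per-speaker, per-frame
        return logits

# Usage: 2 clips, 3 candidate speakers, 64 time steps, 128-d features.
logits = ActiveSpeakerContextSketch()(torch.randn(2, 3, 64, 128))
print(logits.shape)  # torch.Size([2, 3, 64])
```

The key design point this sketch captures is the factorization: relations between speakers are computed at each time step, while the long-term dynamics of each candidate are modeled separately over the whole clip, rather than scoring a single speaker from a short snippet in isolation.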
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| Audio-Visual Active Speaker Detection on AVA-ActiveSpeaker | Active Speakers in Context | Validation mean average precision (mAP): 87.1% |