LoCoNet: Long-Short Context Network for Active Speaker Detection

Wang Xizi; Cheng Feng; Bertasius Gedas; Crandall David

Abstract

Active Speaker Detection (ASD) aims to identify who is speaking in each frame of a video. ASD reasons from audio and visual information from two contexts: long-term intra-speaker context and short-term inter-speaker context. Long-term intra-speaker context models the temporal dependencies of the same speaker, while short-term inter-speaker context models the interactions of speakers in the same scene. These two contexts are complementary to each other and can help infer the active speaker. Motivated by these observations, we propose LoCoNet, a simple yet effective Long-Short Context Network that models the long-term intra-speaker context and short-term inter-speaker context. We use self-attention to model long-term intra-speaker context due to its effectiveness in modeling long-range dependencies, and convolutional blocks that capture local patterns to model short-term inter-speaker context. Extensive experiments show that LoCoNet achieves state-of-the-art performance on multiple datasets, achieving an mAP of 95.2% (+1.1%) on AVA-ActiveSpeaker, 68.1% (+22%) on the Columbia dataset, 97.2% (+2.8%) on the Talkies dataset and 59.7% (+8.0%) on the Ego4D dataset. Moreover, in challenging cases where multiple speakers are present, or the face of the active speaker is much smaller than other faces in the same scene, LoCoNet outperforms previous state-of-the-art methods by 3.4% on the AVA-ActiveSpeaker dataset. The code will be released at https://github.com/SJTUwxz/LoCoNet_ASD.
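
The two contexts described in the abstract can be illustrated with a small, dependency-free sketch. It is not the authors' implementation: the function names, the nested-list layout (speakers x frames x feature dimensions), the identity query/key/value projections, and the uniform averaging kernel are all assumptions made for illustration, whereas the actual model would use learned weights. The first function attends over all frames of a single speaker (long-term, intra-speaker); the second mixes all speakers inside a short temporal window (short-term, inter-speaker).

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(x - m) for x in scores]
    total = sum(exps)
    return [e / total for e in exps]


def long_term_intra_speaker(feats):
    """Self-attention over ALL frames of each speaker, one speaker at a time.

    feats[s][t] is a feature vector (list of floats) for speaker s at frame t.
    Assumption: queries, keys and values are the raw features (no learned
    projections), which keeps the sketch short.
    """
    out = []
    for track in feats:  # one speaker's frames, T x D
        d = len(track[0])
        new_track = []
        for query in track:
            scores = [
                sum(a * b for a, b in zip(query, key)) / math.sqrt(d)
                for key in track
            ]
            weights = softmax(scores)
            new_track.append(
                [sum(w * frame[j] for w, frame in zip(weights, track))
                 for j in range(d)]
            )
        out.append(new_track)
    return out


def short_term_inter_speaker(feats, window=3):
    """Local mixing across speakers within a short temporal window.

    Assumption: a uniform (averaging) kernel stands in for a learned
    convolution, so every speaker receives the same mixed feature here;
    a learned kernel would weight speakers and frames differently.
    """
    num_speakers, num_frames, d = len(feats), len(feats[0]), len(feats[0][0])
    half = window // 2
    out = []
    for _ in range(num_speakers):
        track = []
        for t in range(num_frames):
            lo, hi = max(0, t - half), min(num_frames, t + half + 1)
            count = num_speakers * (hi - lo)
            track.append(
                [sum(feats[r][u][j]
                     for r in range(num_speakers)
                     for u in range(lo, hi)) / count
                 for j in range(d)]
            )
        out.append(track)
    return out


# Toy input: 2 speakers, 2 frames, 2-dimensional features.
toy = [
    [[1.0, 0.0], [1.0, 0.0]],
    [[0.0, 1.0], [0.0, 1.0]],
]
long_ctx = long_term_intra_speaker(toy)
short_ctx = short_term_inter_speaker(toy, window=1)
```

Both functions preserve the input shape, so they could be stacked or alternated; how the two are ordered and combined in the real network is not stated in the abstract and is left open here.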

Code Repositories

sjtuwxz/loconet_asd (Official, pytorch)
kaistmm/TalkNCE (pytorch, Mentioned in GitHub)

Benchmarks

Benchmark: audio-visual-active-speaker-detection-on-ava
Methodology: LoCoNet
Metrics: validation mean average precision: 95.2%
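
For readers unfamiliar with the metric, the following is a minimal sketch of non-interpolated average precision for a single class, computed from per-face speaking scores and binary ground-truth labels. The function name and the toy numbers are hypothetical, and the official benchmark evaluation may differ in details such as interpolation or tie handling.

```python
def average_precision(scores, labels):
    """Non-interpolated average precision for one class.

    scores: predicted confidence that each face is speaking.
    labels: 1 if the face is actually speaking, else 0.
    """
    total_positives = sum(labels)
    if total_positives == 0:
        return 0.0  # assumption: define AP as 0 when there are no positives
    # Rank predictions from most to least confident.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            precision_sum += hits / rank  # precision at this recall point
    return precision_sum / total_positives


# Hypothetical toy example: positives are ranked 1st and 3rd.
ap = average_precision([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0])
```

In the toy example the precision values at the two positives are 1/1 and 2/3, so the result is their mean.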
