
ICTCAS-UCAS-TAL Submission to the AVA-ActiveSpeaker Task at ActivityNet Challenge 2021

Shiguang Shan, Zhongqin Wu, Xiao Liu, Shuang Yang, Susan Liang, Yuanhang Zhang

Abstract

This report briefly describes our method for the AVA Active Speaker Detection (ASD) task at the ActivityNet Challenge 2021. Our solution, the Extended Unified Context Network (Extended UniCon), builds on the Unified Context Network (UniCon), a model designed for robust ASD that combines multiple types of contextual information to optimize all candidates jointly. We propose several changes to the original UniCon in its audio features, temporal modeling architecture, and loss function design. With these changes combined, our best model ensemble sets a new state of the art of 93.4% mAP on the AVA-ActiveSpeaker test set without any form of pretraining, and currently ranks first on the ActivityNet Challenge leaderboard.

Benchmarks

Benchmark	Methodology	Metrics
audio-visual-active-speaker-detection-on-ava	Extended UniCon	validation mean average precision: 93.6%
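To make the reported mAP figures more concrete, the following is a minimal Python sketch of average precision computed over per-face speaking scores. It assumes each face detection carries one predicted speaking score and one binary ground-truth label (1 = speaking); the function name `average_precision` is hypothetical, and the official AVA-ActiveSpeaker evaluation script may differ in details such as tie handling and precision interpolation.

```python
from typing import Sequence


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Non-interpolated AP: the mean of the precision at the rank of each positive.

    Illustrative sketch only; not the official AVA-ActiveSpeaker evaluation code.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    num_pos = sum(1 for y in labels if y == 1)
    if num_pos == 0:
        return 0.0
    # Rank face detections by predicted speaking score, highest first.
    # sorted() is stable, so tied scores keep their input order (an assumption).
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            # Precision among the top-`rank` detections, recorded at each positive.
            precision_sum += hits / rank
    return precision_sum / num_pos


if __name__ == "__main__":
    # Positives at ranks 1 and 3: AP = (1/1 + 2/3) / 2
    print(round(average_precision([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0]), 4))  # 0.8333
```

In this sketch a perfect ranking (every speaking face scored above every non-speaking face) yields an AP of 1.0, so a score such as 93.4% indicates that speaking faces are ranked mostly, but not entirely, ahead of non-speaking ones.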
