8 months ago

Abstract

A variety of studies have established that combining video and audio data brings a significant benefit in detecting active speakers. However, either modality can mislead audiovisual fusion by introducing unreliable or deceptive information. This paper frames active speaker detection as a multi-objective learning problem that leverages the best of each modality through a novel self-attention, uncertainty-based multimodal fusion scheme. The results show that the proposed multi-objective learning architecture outperforms traditional approaches, improving both mean average precision (mAP) and area under the ROC curve (AUC). We further demonstrate that, for active speaker detection, our fusion strategy surpasses other modality fusion methods reported in various disciplines. Finally, we show that the proposed method significantly improves the state of the art on the AVA-ActiveSpeaker dataset.
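The abstract does not give the training objective, so the following is only a minimal sketch, under stated assumptions, of one common way to combine several objectives with learned uncertainty: the homoscedastic-uncertainty loss weighting of Kendall, Gal and Cipolla (2018). It is applied here to hypothetical audio-only, visual-only and audiovisual heads. It is not the authors' self-attention fusion scheme. The head names, the toy probabilities, and the helper functions `bce`, `uncertainty_weighted_loss` and `optimal_log_var` are all illustrative.

```python
import math
from typing import Dict, Sequence


def bce(probs: Sequence[float], labels: Sequence[int], eps: float = 1e-7) -> float:
    """Mean binary cross-entropy between speaking probabilities and 0/1 labels."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp so log() never sees 0
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)


def uncertainty_weighted_loss(losses: Dict[str, float],
                              log_vars: Dict[str, float]) -> float:
    """Sum of exp(-s_i) * L_i + 0.5 * s_i, with s_i = log(sigma_i^2).

    Equivalent to (1 / sigma_i^2) * L_i + log(sigma_i) per objective:
    a large sigma_i down-weights objective i, while the log term stops
    sigma_i from growing without bound.
    """
    return sum(math.exp(-log_vars[k]) * losses[k] + 0.5 * log_vars[k]
               for k in losses)


def optimal_log_var(loss: float) -> float:
    """Minimiser over s of exp(-s) * L + 0.5 * s for a fixed L > 0: s* = log(2L)."""
    return math.log(2.0 * loss)


# Hypothetical per-head outputs for four face tracks (1 = speaking).
labels = [1, 0, 1, 1]
losses = {
    "audio": bce([0.9, 0.2, 0.6, 0.8], labels),
    "visual": bce([0.6, 0.4, 0.5, 0.7], labels),       # highest loss in this toy data
    "audiovisual": bce([0.95, 0.1, 0.8, 0.9], labels),
}

# With every s_i = 0 (sigma_i = 1), the objective reduces to a plain sum.
initial_total = uncertainty_weighted_loss(losses, {k: 0.0 for k in losses})

# Effective weight 1 / sigma_i^2 = exp(-s_i*) = 1 / (2 L_i) at the optimum.
weights = {k: math.exp(-optimal_log_var(v)) for k, v in losses.items()}
for name, w in weights.items():
    print(f"{name:12s} loss={losses[name]:.3f} weight={w:.3f}")
```

In a real model, each log-variance s_i would be a trainable parameter learned jointly with the network rather than set in closed form. The closed-form optimum shows the intended behaviour: an objective whose loss stays high (here the hypothetical visual head) ends up with a smaller weight. This is one way a multi-objective setup can limit the influence of an unreliable modality.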

Baptiste Pouthier Laurent Pilati Leela K. Gudupudi Charles Bouveyron Frederic Precioso

Active Speaker Detection as a Multi-Objective Optimization with Uncertainty-based Multimodal Fusion
