3 months ago

Hierarchical Fusion for Online Multimodal Dialog Act Classification

Ruihong Huang, Adarsh Pyarelal, Md Messal Monem Miah

Abstract

We propose a framework for online multimodal dialog act (DA) classification based on raw audio and ASR-generated transcriptions of current and past utterances. Existing multimodal DA classification approaches are limited by ineffective audio modeling and late-stage fusion. We showcase significant improvements in multimodal DA classification by integrating modalities at a more granular level and incorporating recent advancements in large language and audio models for audio feature extraction. We further investigate the effectiveness of self-attention and cross-attention mechanisms in modeling utterances and dialogs for DA classification. We achieve a substantial increase of 3 percentage points in the F1 score relative to current state-of-the-art models on two prominent DA classification datasets, MRDA and EMOTyDA.
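
To make the idea of fusing modalities below the utterance level concrete, here is a minimal sketch of token-level cross-attention in which each text token attends over the audio frames of the same utterance. It is an illustration only and not the authors' architecture: the function names, the feature dimension, the absence of learned projections, the concatenation step, and the mean pooling are all assumptions made for this example, and the inputs are random stand-ins for real text and audio encoder outputs.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys_values):
    """Scaled dot-product cross-attention with no learned projections (assumption).

    Each query vector attends over all key/value vectors; keys and values
    are the same vectors here to keep the sketch short.
    """
    dim = len(queries[0])
    attended, weights = [], []
    for q in queries:
        scores = [
            sum(a * b for a, b in zip(q, kv)) / math.sqrt(dim)
            for kv in keys_values
        ]
        w = softmax(scores)
        weights.append(w)
        # Weighted sum of the value vectors, one output dimension at a time.
        attended.append(
            [sum(wi * kv[j] for wi, kv in zip(w, keys_values)) for j in range(dim)]
        )
    return attended, weights


def fuse_utterance(text_tokens, audio_frames):
    """Hypothetical token-level fusion: text tokens query the audio frames."""
    attended_audio, weights = cross_attention(text_tokens, audio_frames)
    # Concatenate each text token with its audio summary (list concatenation).
    fused_tokens = [t + a for t, a in zip(text_tokens, attended_audio)]
    # Mean-pool the fused tokens into a single utterance vector (assumption).
    count = len(fused_tokens)
    width = len(fused_tokens[0])
    pooled = [sum(tok[j] for tok in fused_tokens) / count for j in range(width)]
    return pooled, weights


# Random stand-ins for encoder outputs: 5 text tokens, 20 audio frames, dim 16.
rng = random.Random(0)
text_tokens = [[rng.gauss(0, 1) for _ in range(16)] for _ in range(5)]
audio_frames = [[rng.gauss(0, 1) for _ in range(16)] for _ in range(20)]

utterance_vec, weights = fuse_utterance(text_tokens, audio_frames)
print(len(utterance_vec), len(weights), len(weights[0]))
```

In a late-fusion setup, by contrast, each modality would be pooled into one vector before the two are combined, so no text token could weight individual audio frames. A dialog-level model could then apply attention over a sequence of such utterance vectors for the current and past utterances.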

Benchmarks

| Benchmark | Methodology | Metrics |
| --- | --- | --- |
| dialogue-act-classification-on-emotyda | Hierarchical Fusion | Accuracy: 63.42 |
| dialogue-act-classification-on-icsi-meeting | Hierarchical Fusion | Accuracy: 91.8 |
