HyperAI

Abstract

We propose a framework for online multimodal dialog act (DA) classification based on raw audio and ASR-generated transcriptions of current and past utterances. Existing multimodal DA classification approaches are limited by ineffective audio modeling and late-stage fusion. We showcase significant improvements in multimodal DA classification by integrating modalities at a more granular level and incorporating recent advancements in large language and audio models for audio feature extraction. We further investigate the effectiveness of self-attention and cross-attention mechanisms in modeling utterances and dialogs for DA classification. We achieve a substantial increase of 3 percentage points in the F1 score relative to current state-of-the-art models on two prominent DA classification datasets, MRDA and EMOTyDA.
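The abstract refers to cross-attention as one mechanism for combining modalities at a fine-grained level. The following is only a minimal sketch of generic scaled dot-product cross-attention, in which text token features attend over audio frame features. It is not the authors' architecture: the function names, the toy feature values, and the omission of learned query/key/value projections are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention: every query attends over all keys.

    queries: list of d-dim vectors (here: hypothetical text token features)
    keys, values: equally long lists of vectors (here: hypothetical audio frames)
    Returns one fused vector per query, plus the attention weights.
    """
    d = len(keys[0])
    fused, weights = [], []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        w = softmax(scores)
        # Weighted sum of the value vectors.
        fused.append([sum(wj * v[i] for wj, v in zip(w, values))
                      for i in range(len(values[0]))])
        weights.append(w)
    return fused, weights


# Toy inputs with made-up numbers, not real model features.
text_tokens = [[1.0, 0.0], [0.0, 1.0]]
audio_frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
fused, weights = cross_attention(text_tokens, audio_frames, audio_frames)
```

Each row of `weights` sums to 1, and `fused` holds one audio-informed vector per text token. A real model would typically apply learned linear projections before the dot products and use multiple attention heads.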

Ruihong Huang Adarsh Pyarelal Md Messal Monem Miah

Hierarchical Fusion for Online Multimodal Dialog Act Classification
