TelME: Teacher-leading Multimodal Fusion Network for Emotion Recognition in Conversation

Taeyang Yun; Hyunkuk Lim; Jeonghwan Lee; Min Song

Abstract

Emotion Recognition in Conversation (ERC) plays a crucial role in enabling dialogue systems to respond effectively to user requests. Emotions in a conversation can be identified from the representations of several modalities, such as audio, visual, and text. However, because non-verbal modalities contribute only weakly to recognizing emotions, multimodal ERC has long been considered a challenging task. In this paper, we propose the Teacher-leading Multimodal fusion network for ERC (TelME). TelME uses cross-modal knowledge distillation to transfer information from a language model acting as the teacher to the non-verbal students, thereby strengthening the weak modalities. We then combine the multimodal features using a shifting fusion approach in which the student networks support the teacher. TelME achieves state-of-the-art performance on MELD, a multi-speaker conversation dataset for ERC. Finally, we demonstrate the effectiveness of our components through additional experiments.
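The two ideas named in the abstract, teacher-to-student distillation and a fusion step in which students adjust the teacher's representation, can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the loss or the fusion formula, so the temperature-softened KL divergence, the additive shift, and the names `distillation_loss`, `shift_fusion`, `temperature`, and `scale` are all assumptions chosen for illustration. The official PyTorch code is in the repository listed below.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2.

    Assumption: a standard response-based distillation loss, used here
    only as a stand-in for the paper's cross-modal objective.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(
        pi * math.log(pi / max(qi, 1e-12))
        for pi, qi in zip(p, q)
        if pi > 0
    )
    return kl * temperature ** 2


def shift_fusion(teacher_vec, student_vecs, scale=0.1):
    """Hypothetical fusion: nudge the teacher vector by a scaled sum
    of the student vectors, so the students support the teacher."""
    shift = [sum(vals) for vals in zip(*student_vecs)]
    return [t + scale * s for t, s in zip(teacher_vec, shift)]


# Toy example: text teacher logits versus an audio student's logits.
text_logits = [2.0, 0.5, -1.0]
audio_logits = [0.3, 0.2, 0.1]
loss = distillation_loss(text_logits, audio_logits)

# Toy example: fuse a text feature with audio and visual features.
fused = shift_fusion([1.0, 2.0], [[1.0, 0.0], [0.0, 1.0]], scale=0.5)
```

In this sketch the loss is zero when the student reproduces the teacher's distribution and grows as the two diverge, while `scale` controls how far the non-verbal features can move the text representation.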

Code Repositories

yuntaeyang/telme (official implementation, PyTorch)

Benchmarks

| Benchmark | Methodology | Metrics |
|---|---|---|
| emotion-recognition-in-conversation-on | TelME | Weighted-F1: 70.48 |
| emotion-recognition-in-conversation-on-meld | TelME | Weighted-F1: 67.37 |
