AVGZSLNet: Audio-Visual Generalized Zero-Shot Learning by Reconstructing Label Features from Multi-Modal Embeddings

Mazumder Pratik ; Singh Pravendra ; Parida Kranti Kumar ; Namboodiri Vinay P.

Abstract

In this paper, we propose a novel approach for generalized zero-shot learning in a multi-modal setting, where we have novel classes of audio/video during testing that are not seen during training. We use the semantic relatedness of text embeddings as a means for zero-shot learning by aligning audio and video embeddings with the corresponding class label text feature space. Our approach uses a cross-modal decoder and a composite triplet loss. The cross-modal decoder enforces a constraint that the class label text features can be reconstructed from the audio and video embeddings of data points. This helps the audio and video embeddings to move closer to the class label text embedding. The composite triplet loss makes use of the audio, video, and text embeddings. It helps bring the embeddings from the same class closer and push away the embeddings from different classes in a multi-modal setting. This helps the network to perform better on the multi-modal zero-shot learning task. Importantly, our multi-modal zero-shot learning approach works even if a modality is missing at test time. We test our approach on the generalized zero-shot classification and retrieval tasks and show that our approach outperforms other models in the presence of a single modality as well as in the presence of multiple modalities. We validate our approach by comparing it with previous approaches and using various ablations.
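
The two ingredients named in the abstract, a triplet-style loss over audio, video, and text embeddings and a reconstruction constraint on the class label text features, can be illustrated with a minimal sketch. The abstract does not give the exact loss terms, distance function, or margin, so everything below is an assumption for illustration: the function names, the Euclidean distance, the margin of 1.0, the mean-squared-error reconstruction term, and the toy two-dimensional embeddings are all hypothetical and are not the authors' implementation.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Generic hinge triplet loss (assumed form, not the paper's composite loss).

    It is zero once the anchor is closer to the positive than to the
    negative by at least `margin`.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def reconstruction_loss(decoded, text_feature):
    """Mean squared error between a decoder output and the label text feature.

    Stands in for the cross-modal decoder constraint; the real decoder is a
    learned network, which this sketch does not model.
    """
    return sum((a - b) ** 2 for a, b in zip(decoded, text_feature)) / len(text_feature)


def nearest_class(embedding, class_text_embeddings):
    """Zero-shot prediction: pick the class whose text embedding is closest."""
    return min(
        class_text_embeddings,
        key=lambda name: euclidean(embedding, class_text_embeddings[name]),
    )


# Toy data (hypothetical): two classes and one audio and one video embedding.
text = {"dog": [1.0, 0.0], "cat": [0.0, 1.0]}
audio = [0.9, 0.1]
video = [0.8, 0.2]

# Each modality is pulled toward its own class text embedding and pushed
# away from the text embedding of another class.
loss = triplet_loss(audio, text["dog"], text["cat"]) + triplet_loss(video, text["dog"], text["cat"])
print("triplet terms:", loss)
print("reconstruction term:", reconstruction_loss(audio, text["dog"]))

# A single modality is enough to classify, as when one modality is missing.
print("audio only:", nearest_class(audio, text))
print("video only:", nearest_class(video, text))
```

Because every modality is compared against the same text embedding space, a prediction can be made from the audio embedding or the video embedding alone, which is the property the abstract highlights for the missing-modality case.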

Benchmarks

Benchmark                                          Methodology   Metrics
gzsl-video-classification-on-activitynet-gzsl-1    AVGZSLNet     HM: 6.44, ZSL: 5.40
gzsl-video-classification-on-ucf-gzsl-main         AVGZSLNet     HM: 18.05, ZSL: 13.65
gzsl-video-classification-on-vggsound-gzsl-1       AVGZSLNet     HM: 5.83, ZSL: 5.28
