8 months ago

Multimodal Representation

Video Understanding

Audio Classification

Computer Vision

Hong Jie ; Hayder Zeeshan ; Han Junlin ; Fang Pengfei ; Harandi Mehrtash ; Petersson Lars

Abstract

Audio-visual zero-shot learning aims to classify samples consisting of a pair of corresponding audio and video sequences from classes that are not present during training. An analysis of the audio-visual data reveals a large degree of hyperbolicity, indicating the potential benefit of using a hyperbolic transformation to achieve curvature-aware geometric learning, with the aim of exploring more complex hierarchical data structures for this task. The proposed approach employs a novel loss function that incorporates cross-modality alignment between video and audio features in the hyperbolic space. Additionally, we explore the use of multiple adaptive curvatures for hyperbolic projections. The experimental results on this very challenging task demonstrate that our proposed hyperbolic approach for zero-shot learning outperforms the state-of-the-art (SOTA) method on three datasets: VGGSound-GZSL, UCF-GZSL, and ActivityNet-GZSL, achieving a harmonic mean (HM) improvement of around 3.0%, 7.0%, and 5.3%, respectively.
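To make the ideas in the abstract more concrete, the following is a minimal sketch of a hyperbolic projection and a cross-modal alignment term, not the authors' implementation. It assumes the Poincaré ball model with curvature -c, uses the standard exponential map at the origin and the closed-form Poincaré distance, and treats the mean distance between paired audio and video embeddings as a stand-in for an alignment loss. The function names (`expmap0`, `poincare_dist`, `alignment_loss`, `harmonic_mean`) and the choice of NumPy are assumptions made for illustration; the abstract does not specify the exact loss or how the adaptive curvatures are learned. The harmonic mean helper follows the usual generalized zero-shot definition over seen-class and unseen-class scores.

```python
import numpy as np

EPS = 1e-9  # guards against division by zero


def expmap0(v, c=1.0):
    """Exponential map at the origin: Euclidean vectors -> Poincare ball (curvature -c)."""
    sqrt_c = np.sqrt(c)
    norm = np.maximum(np.linalg.norm(v, axis=-1, keepdims=True), EPS)
    return np.tanh(sqrt_c * norm) * v / (sqrt_c * norm)


def poincare_dist(x, y, c=1.0):
    """Geodesic distance between points x and y inside the Poincare ball."""
    sq_diff = np.sum((x - y) ** 2, axis=-1)
    denom = (1.0 - c * np.sum(x ** 2, axis=-1)) * (1.0 - c * np.sum(y ** 2, axis=-1))
    arg = 1.0 + 2.0 * c * sq_diff / np.maximum(denom, EPS)
    return np.arccosh(np.maximum(arg, 1.0)) / np.sqrt(c)


def alignment_loss(audio, video, c=1.0):
    """Hypothetical alignment term: mean hyperbolic distance of paired embeddings."""
    return float(np.mean(poincare_dist(expmap0(audio, c), expmap0(video, c), c)))


def harmonic_mean(seen, unseen):
    """Harmonic mean (HM) of seen-class and unseen-class scores."""
    total = seen + unseen
    return 2.0 * seen * unseen / total if total > 0 else 0.0


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    audio = rng.normal(scale=0.1, size=(4, 8))   # placeholder audio features
    video = rng.normal(scale=0.1, size=(4, 8))   # placeholder video features
    for c in (0.5, 1.0, 2.0):                    # several fixed curvatures
        print(c, alignment_loss(audio, video, c))
    print(harmonic_mean(0.2, 0.1))
```

In this sketch the curvature `c` is a fixed argument; in a trainable model it would typically be a learnable parameter per projection, which is one plausible reading of "multiple adaptive curvatures".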

