Hong Jie; Hayder Zeeshan; Han Junlin; Fang Pengfei; Harandi Mehrtash; Petersson Lars

Abstract
Audio-visual zero-shot learning aims to classify samples consisting of a pair of corresponding audio and video sequences from classes that are not present during training. An analysis of the audio-visual data reveals a large degree of hyperbolicity, indicating the potential benefit of using a hyperbolic transformation to achieve curvature-aware geometric learning, with the aim of exploring more complex hierarchical data structures for this task. The proposed approach employs a novel loss function that incorporates cross-modality alignment between video and audio features in the hyperbolic space. Additionally, we explore the use of multiple adaptive curvatures for hyperbolic projections. The experimental results on this very challenging task demonstrate that our proposed hyperbolic approach for zero-shot learning outperforms the SOTA method on three datasets: VGGSound-GZSL, UCF-GZSL, and ActivityNet-GZSL, achieving a harmonic mean (HM) improvement of around 3.0%, 7.0%, and 5.3%, respectively.
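
To make the idea of a curvature-dependent hyperbolic projection concrete, the following is a minimal sketch and not the authors' implementation. It assumes the Poincaré ball model, which the abstract does not name, and uses the standard exponential map at the origin for a ball of curvature -c. The function names `expmap0` and `poincare_dist` are hypothetical.

```python
import math

def expmap0(v, c=1.0, eps=1e-9):
    """Project a Euclidean vector v onto the Poincare ball of curvature -c (c > 0).

    Assumption: Poincare ball model; the abstract does not specify the model.
    """
    norm = math.sqrt(sum(x * x for x in v))
    if norm < eps:
        # The origin maps to itself.
        return [0.0 for _ in v]
    sqrt_c = math.sqrt(c)
    scale = math.tanh(sqrt_c * norm) / (sqrt_c * norm)
    return [scale * x for x in v]

def poincare_dist(x, y, c=1.0):
    """Geodesic distance between two points inside the ball of curvature -c."""
    sq = lambda a: sum(t * t for t in a)
    diff = [a - b for a, b in zip(x, y)]
    denom = (1.0 - c * sq(x)) * (1.0 - c * sq(y))
    return math.acosh(1.0 + 2.0 * c * sq(diff) / denom) / math.sqrt(c)

# Hypothetical audio and video feature vectors, for illustration only.
audio = expmap0([0.3, 0.4], c=1.0)
video = expmap0([0.1, -0.2], c=1.0)
gap = poincare_dist(audio, video, c=1.0)
```

Changing `c` changes the radius of the ball (1/sqrt(c)) and therefore how strongly features are pushed toward the boundary, which is one way to read "adaptive curvatures" in the abstract. A cross-modal alignment loss could, under these assumptions, penalize `gap` for matching audio and video pairs.
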
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| gzsl-video-classification-on-activitynet-gzsl | Hyper-multiple | HM: 15.25 ZSL: 10.39 |
| gzsl-video-classification-on-activitynet-gzsl-1 | Hyper-multiple | HM: 12.65 ZSL: 9.50 |
| gzsl-video-classification-on-ucf-gzsl-cls | Hyper-multiple | HM: 48.30 ZSL: 52.11 |
| gzsl-video-classification-on-ucf-gzsl-main | Hyper-multiple | HM: 29.32 ZSL: 22.24 |
| gzsl-video-classification-on-vggsound-gzsl | Hyper-multiple | HM: 8.67 ZSL: 7.31 |
| gzsl-video-classification-on-vggsound-gzsl-1 | Hyper-multiple | HM: 9.32 ZSL: 7.97 |
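
The HM column is the harmonic mean commonly used in generalized zero-shot learning, which combines accuracy on seen classes (S) and unseen classes (U) as HM = 2SU / (S + U). The sketch below illustrates the formula only; the input values are hypothetical and are not taken from the paper.

```python
def harmonic_mean(seen_acc, unseen_acc):
    """Harmonic mean of seen-class and unseen-class accuracy (GZSL HM metric)."""
    total = seen_acc + unseen_acc
    if total == 0:
        # Both accuracies are zero, so the harmonic mean is defined as zero here.
        return 0.0
    return 2.0 * seen_acc * unseen_acc / total

# Hypothetical accuracies in percent, for illustration only.
hm = harmonic_mean(20.0, 10.0)
```

Because the harmonic mean is dominated by the smaller of the two values, a model cannot score a high HM by doing well on seen classes alone.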