DUET: Cross-modal Semantic Grounding for Contrastive Zero-shot Learning

Zhuo Chen, Yufeng Huang, Jiaoyan Chen, Yuxia Geng, Wen Zhang, Yin Fang, Jeff Z. Pan, Huajun Chen

Abstract

Zero-shot learning (ZSL) aims to predict unseen classes whose samples never appear during training. Attributes, which are annotations of class-level visual characteristics, are among the most effective and widely used forms of semantic information for zero-shot image classification. However, current methods often fail to discriminate subtle visual distinctions between images, owing not only to the shortage of fine-grained annotations but also to attribute imbalance and co-occurrence. In this paper, we present DUET, a transformer-based end-to-end ZSL method that integrates latent semantic knowledge from pre-trained language models (PLMs) via a self-supervised multi-modal learning paradigm. Specifically, we (1) develop a cross-modal semantic grounding network to investigate the model's capability of disentangling semantic attributes from images; (2) apply an attribute-level contrastive learning strategy to further enhance the model's discrimination of fine-grained visual characteristics in the presence of attribute co-occurrence and imbalance; and (3) propose a multi-task learning policy for considering multi-model objectives. We find that DUET achieves state-of-the-art performance on three standard ZSL benchmarks and on a knowledge-graph-equipped ZSL benchmark. Its components are effective and its predictions are interpretable.
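To make the idea of contrastive learning in item (2) concrete, the sketch below computes a generic InfoNCE-style loss in plain Python. It is only an illustration of the general technique, not the loss used in DUET: the function names, the toy two-dimensional vectors, and the temperature value are all assumptions introduced here. The loss is small when the anchor is closer to its positive than to the negatives, and large otherwise.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor, positive, negatives, temperature=0.1):
    """Generic InfoNCE loss for one anchor (hypothetical helper, not DUET's loss)."""
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = sum(math.exp(cosine(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))


# Toy embeddings (assumed values for illustration only).
anchor = [1.0, 0.0]
positive = [0.9, 0.1]                   # close to the anchor
negatives = [[0.0, 1.0], [-1.0, 0.2]]   # far from the anchor

# Correct pairing: the anchor is matched with its true positive.
loss_good = info_nce(anchor, positive, negatives)
# Wrong pairing: a dissimilar vector is treated as the positive.
loss_bad = info_nce(anchor, negatives[0], [positive, negatives[1]])

print(f"loss with correct positive: {loss_good:.6f}")
print(f"loss with wrong positive:   {loss_bad:.6f}")
```

Minimizing such a loss pulls the anchor toward its positive and pushes it away from the negatives. How positives and negatives are chosen at the attribute level is specific to the paper and is not reproduced here.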

Code Repositories

zjukg/DUET (official, PyTorch)
zjukg/structure-clip (PyTorch)

Benchmarks

Benchmark                            Methodology   Accuracy Seen   Accuracy Unseen   H      Average top-1 classification accuracy
zero-shot-learning-on-awa2           DUET (Ours)   84.7            63.7              72.7   69.9
zero-shot-learning-on-cub-200-2011   DUET          72.8            62.9              67.5   72.3
zero-shot-learning-on-sun-attribute  DUET (Ours)   45.8            45.7              45.8   64.4
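
In generalized zero-shot learning, H conventionally denotes the harmonic mean of the seen-class and unseen-class accuracies. The page itself does not define H, so that reading is an assumption here. Under it, the short sketch below recomputes H from the seen and unseen figures above. The function and variable names are hypothetical, and because the inputs are already rounded to one decimal place, the recomputed values can differ from the reported ones in the last digit.

```python
def harmonic_mean(seen_acc, unseen_acc):
    """Harmonic mean of seen and unseen accuracy (assumed definition of H)."""
    if seen_acc + unseen_acc == 0:
        return 0.0
    return 2 * seen_acc * unseen_acc / (seen_acc + unseen_acc)


# (Accuracy Seen, Accuracy Unseen, reported H), copied from the figures above.
reported = {
    "AWA2": (84.7, 63.7, 72.7),
    "CUB-200-2011": (72.8, 62.9, 67.5),
    "SUN Attribute": (45.8, 45.7, 45.8),
}

recomputed = {name: harmonic_mean(s, u) for name, (s, u, _) in reported.items()}

for name, (_, _, h) in reported.items():
    print(f"{name}: reported H = {h}, recomputed H = {recomputed[name]:.2f}")
```

The harmonic mean rewards balance: a model that scores well on seen classes but poorly on unseen ones gets a low H, which is why it is commonly reported alongside the two individual accuracies.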
