CAILA: Concept-Aware Intra-Layer Adapters for Compositional Zero-Shot Learning

Zhaoheng Zheng Haidong Zhu Ram Nevatia


Abstract

In this paper, we study the problem of Compositional Zero-Shot Learning (CZSL), which aims to recognize novel attribute-object compositions of pre-existing concepts. Recent work focuses on applying large-scale Vision-Language Pre-trained (VLP) models such as CLIP, which have strong generalization ability. However, these methods treat the pre-trained model as a black box and concentrate on pre- and post-CLIP operations, so they do not mine the semantic concepts encoded between the layers inside CLIP. We propose to dive deep into the architecture and insert adapters, a parameter-efficient technique proven effective for large language models, into each CLIP encoder layer. We further equip the adapters with concept awareness so that concept-specific features for "object", "attribute", and "composition" can be extracted. We evaluate our method on four popular CZSL datasets, MIT-States, C-GQA, UT-Zappos, and VAW-CZSL, and achieve state-of-the-art performance compared to existing methods on all of them.
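The abstract describes inserting bottleneck adapters into each CLIP encoder layer, with one adapter per concept ("object", "attribute", "composition"). The sketch below is a minimal, hypothetical illustration of that idea in PyTorch, not the authors' CAILA implementation; the module and parameter names (`Adapter`, `ConceptAwareAdapters`, `bottleneck`) are assumptions for illustration.

```python
import torch
import torch.nn as nn


class Adapter(nn.Module):
    """Standard bottleneck adapter: down-project, nonlinearity,
    up-project, then a residual connection back to the input."""

    def __init__(self, dim: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.act = nn.GELU()
        self.up = nn.Linear(bottleneck, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual form keeps the encoder layer's output shape unchanged,
        # so the adapter can be dropped between frozen CLIP sub-layers.
        return x + self.up(self.act(self.down(x)))


class ConceptAwareAdapters(nn.Module):
    """One adapter per concept, so each branch can specialize in
    object-, attribute-, or composition-specific features."""

    CONCEPTS = ("object", "attribute", "composition")

    def __init__(self, dim: int, bottleneck: int = 64):
        super().__init__()
        self.adapters = nn.ModuleDict(
            {c: Adapter(dim, bottleneck) for c in self.CONCEPTS}
        )

    def forward(self, x: torch.Tensor, concept: str) -> torch.Tensor:
        return self.adapters[concept](x)
```

Because only the small adapter weights are trained while the CLIP backbone stays frozen, the number of trainable parameters is a small fraction of the full model, which is the point of the parameter-efficient design.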

Code Repositories

- zhaohengz/caila (official, PyTorch)
- zhaohengz/llamp (PyTorch)

Benchmarks

Benchmark: compositional-zero-shot-learning-on-mit-3
Methodology: CAILA
Metrics:
- H-Mean: 39.9
- Seen accuracy: 51.0
- Unseen accuracy: 53.9
- Test AUC top 1: 23.4
- Test AUC top 2/3, Val AUC top 1/2/3: not reported
