HyperAIHyperAI

Command Palette

Search for a command to run...

3 months ago

HOICLIP: Efficient Knowledge Transfer for HOI Detection with Vision-Language Models

Shan Ning Longtian Qiu Yongfei Liu Xuming He

HOICLIP: Efficient Knowledge Transfer for HOI Detection with Vision-Language Models

Abstract

Human-Object Interaction (HOI) detection aims to localize human-object pairs and recognize their interactions. Recently, Contrastive Language-Image Pre-training (CLIP) has shown great potential in providing interaction prior for HOI detectors via knowledge distillation. However, such approaches often rely on large-scale training data and suffer from inferior performance under few/zero-shot scenarios. In this paper, we propose a novel HOI detection framework that efficiently extracts prior knowledge from CLIP and achieves better generalization. In detail, we first introduce a novel interaction decoder to extract informative regions in the visual feature map of CLIP via a cross-attention mechanism, which is then fused with the detection backbone by a knowledge integration block for more accurate human-object pair detection. In addition, prior knowledge in CLIP text encoder is leveraged to generate a classifier by embedding HOI descriptions. To distinguish fine-grained interactions, we build a verb classifier from training data via visual semantic arithmetic and a lightweight verb representation adapter. Furthermore, we propose a training-free enhancement to exploit global HOI predictions from CLIP. Extensive experiments demonstrate that our method outperforms the state of the art by a large margin on various settings, e.g. +4.04 mAP on HICO-Det. The source code is available in https://github.com/Artanic30/HOICLIP.

Code Repositories

artanic30/hoiclip
Official
pytorch
Mentioned in GitHub

Benchmarks

BenchmarkMethodologyMetrics
human-object-interaction-detection-on-hicoHOICLIP
mAP: 34.69
human-object-interaction-detection-on-v-cocoHOICLIP
AP(S1): 63.50
AP(S2): 64.81

Build AI with AI

From idea to launch — accelerate your AI development with free AI co-coding, out-of-the-box environment and best price of GPUs.

AI Co-coding
Ready-to-use GPUs
Best Pricing
Get Started

Hyper Newsletters

Subscribe to our latest updates
We will deliver the latest updates of the week to your inbox at nine o'clock every Monday morning
Powered by MailChimp