Efficient Two-Stage Detection of Human-Object Interactions with a Novel Unary-Pairwise Transformer

Frederic Z. Zhang, Dylan Campbell, Stephen Gould

Abstract

Recent developments in transformer models for visual data have led to significant improvements in recognition and detection tasks. In particular, using learnable queries in place of region proposals has given rise to a new class of one-stage detection models, spearheaded by the Detection Transformer (DETR). Variations on this one-stage approach have since dominated human-object interaction (HOI) detection. However, the success of such one-stage HOI detectors can largely be attributed to the representation power of transformers. We discovered that when equipped with the same transformer, their two-stage counterparts can be more performant and memory-efficient, while taking a fraction of the time to train. In this work, we propose the Unary-Pairwise Transformer, a two-stage detector that exploits unary and pairwise representations for HOIs. We observe that the unary and pairwise parts of our transformer network specialise, with the former preferentially increasing the scores of positive examples and the latter decreasing the scores of negative examples. We evaluate our method on the HICO-DET and V-COCO datasets, and significantly outperform state-of-the-art approaches. At inference time, our model with ResNet50 approaches real-time performance on a single GPU.
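The two-stage setup mentioned in the abstract can be illustrated with a minimal sketch, assuming a generic pipeline in which an object detector runs first and a second stage then scores human-object pairs. This is not the authors' implementation: the `Detection` type, the `enumerate_pairs` helper, the `"person"` label, and the `min_score` threshold are all hypothetical names and values chosen for illustration.

```python
from typing import List, NamedTuple, Tuple


class Detection(NamedTuple):
    """One detector output (hypothetical structure, for illustration only)."""
    label: str
    score: float
    box: Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def enumerate_pairs(
    detections: List[Detection],
    human_label: str = "person",   # assumed name of the human class
    min_score: float = 0.2,        # assumed confidence threshold
) -> List[Tuple[int, int]]:
    """Return (human_index, object_index) candidates for interaction scoring."""
    # Stage one output is filtered by detection confidence.
    kept = [i for i, d in enumerate(detections) if d.score >= min_score]
    humans = [i for i in kept if detections[i].label == human_label]
    # Every kept human is paired with every other kept detection,
    # including other humans, since people can interact with people.
    return [(h, o) for h in humans for o in kept if o != h]


detections = [
    Detection("person", 0.98, (10.0, 20.0, 110.0, 220.0)),
    Detection("bicycle", 0.91, (60.0, 120.0, 200.0, 260.0)),
    Detection("person", 0.15, (300.0, 40.0, 360.0, 200.0)),  # below threshold
    Detection("dog", 0.75, (220.0, 180.0, 290.0, 250.0)),
]
pairs = enumerate_pairs(detections)
print(pairs)
```

In a design like this, each pair would then be passed to the interaction head; how the unary (per-instance) and pairwise (per-pair) features are actually computed is described in the paper and the official repository, not here.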

Code Repositories

fredzzhang/upt (Official, PyTorch)

Benchmarks

Benchmark | Methodology | Metrics
human-object-interaction-detection-on-hico | UPT-R50 | Time Per Frame (ms): 42; mAP: 31.66
human-object-interaction-detection-on-hico | UPT-R101 | Time Per Frame (ms): 61; mAP: 32.31
human-object-interaction-detection-on-hico | UPT-R101-DC5 | Time Per Frame (ms): 124; mAP: 32.62
human-object-interaction-detection-on-v-coco | UPT-R101 | AP(S1): 60.7; AP(S2): 66.2; Time Per Frame (ms): 64
human-object-interaction-detection-on-v-coco | UPT-R50 | AP(S1): 59.0; AP(S2): 64.5; Time Per Frame (ms): 43
human-object-interaction-detection-on-v-coco | UPT-R101-DC5 | AP(S1): 61.3; AP(S2): 67.1; Time Per Frame (ms): 131
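
To relate the per-frame timings above to the abstract's claim about near real-time inference, the short sketch below converts the listed HICO-DET timings into frames per second. The timings are copied from the benchmark entries; the helper name `frames_per_second` and the rounding to one decimal place are choices made here for illustration.

```python
# Time per frame in milliseconds, as listed for the HICO-DET benchmark.
hico_det_ms = {
    "UPT-R50": 42,
    "UPT-R101": 61,
    "UPT-R101-DC5": 124,
}


def frames_per_second(ms_per_frame: float) -> float:
    """Convert a per-frame latency in milliseconds to throughput in frames/s."""
    return 1000.0 / ms_per_frame


fps = {name: round(frames_per_second(ms), 1) for name, ms in hico_det_ms.items()}
for name, value in fps.items():
    print(f"{name}: {value} frames/s")
```

Under these numbers the ResNet50 variant runs at roughly 24 frames per second, while the DC5 variant trades about two thirds of that throughput for a gain of under one mAP point.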
