Yong-Lu Li, Liang Xu, Xinpeng Liu, Xijie Huang, Yue Xu, Shiyi Wang, Hao-Shu Fang, Ze Ma, Mingyang Chen, Cewu Lu

Abstract
Existing image-based activity understanding methods mainly adopt a direct mapping from image to activity concepts, which may hit a performance bottleneck because of the large gap between the two. In light of this, we propose a new path: first infer human part states, then reason out the activities from these part-level semantics. Human Body Part States (PaSta) are fine-grained action semantic tokens, e.g. <hand, hold, something>, which can compose activities and help us step toward a human activity knowledge engine. To fully utilize the power of PaSta, we build a large-scale knowledge base, PaStaNet, which contains 7M+ PaSta annotations. We also propose two corresponding models. First, we design a model named Activity2Vec to extract PaSta features, which aim to be general representations for various activities. Second, we use a PaSta-based Reasoning method to infer activities. Promoted by PaStaNet, our method achieves significant improvements, e.g. 6.4 and 13.9 mAP on the full and one-shot sets of HICO in supervised learning, and 3.2 and 4.2 mAP on V-COCO and image-based AVA in transfer learning. Code and data are available at http://hake-mvig.cn/.
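The two-stage idea in the abstract (part states first, activities second) can be illustrated with a toy sketch. This is not Activity2Vec or the PaSta-based Reasoning method from the paper: the `PaSta` tuple, the `ACTIVITY_EVIDENCE` table, and `infer_activities` are hypothetical names, and the overlap score is a simple stand-in chosen only to show how part-level tokens could compose into an activity.

```python
from typing import Dict, Iterable, NamedTuple, Set


class PaSta(NamedTuple):
    """A part-state token of the form <part, verb, object> (illustrative only)."""
    part: str
    verb: str
    obj: str


# Hypothetical, hand-written table: which part states support which activity.
# The real knowledge base is learned from annotations, not hard-coded.
ACTIVITY_EVIDENCE: Dict[str, Set[PaSta]] = {
    "ride bicycle": {
        PaSta("hand", "hold", "handlebar"),
        PaSta("hip", "sit_on", "seat"),
        PaSta("foot", "step_on", "pedal"),
    },
    "drink from cup": {
        PaSta("hand", "hold", "cup"),
        PaSta("head", "drink_with", "cup"),
    },
}


def infer_activities(part_states: Iterable[PaSta]) -> Dict[str, float]:
    """Score each activity by the fraction of its supporting part states observed."""
    observed = set(part_states)
    return {
        activity: len(observed & evidence) / len(evidence)
        for activity, evidence in ACTIVITY_EVIDENCE.items()
    }


# Stage 1 (assumed to be done by some detector): part states found in an image.
detected = [
    PaSta("hand", "hold", "handlebar"),
    PaSta("hip", "sit_on", "seat"),
    PaSta("foot", "step_on", "pedal"),
]

# Stage 2: reason from part-level semantics to activity scores.
scores = infer_activities(detected)
best_activity = max(scores, key=scores.get)
```

In this sketch the intermediate part states are explicit, which is the point of the approach: the activity decision is made from part-level tokens rather than directly from pixels.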
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| human-object-interaction-detection-on-hico | PaStaNet | mAP: 22.65 |
| human-object-interaction-detection-on-hico-1 | PaStaNet | mAP: 46.3 |
| human-object-interaction-detection-on-v-coco | PaStaNet | AP(S1): 51.0 AP(S2): 57.5 |
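For readers unfamiliar with the mAP metric reported above, the following is a minimal sketch of one common definition: non-interpolated average precision per class, averaged over classes. The function names are illustrative, and the official evaluation protocols for HICO and V-COCO may differ in detail (for example in how detections are matched to ground truth), so this is not a reimplementation of those benchmarks.

```python
from typing import List, Sequence, Tuple


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Non-interpolated AP: mean of precision@k taken at each positive's rank."""
    num_pos = sum(labels)
    if num_pos == 0:
        return 0.0
    # Rank predictions from most to least confident.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / num_pos


def mean_average_precision(
    per_class: List[Tuple[Sequence[float], Sequence[int]]]
) -> float:
    """Average the per-class AP values; each entry is (scores, binary labels)."""
    aps = [average_precision(s, l) for s, l in per_class]
    return sum(aps) / len(aps)


# Two made-up classes with made-up scores, purely for illustration.
ap_a = average_precision([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0])
ap_b = average_precision([0.9, 0.1], [0, 1])
m_ap = mean_average_precision([
    ([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0]),
    ([0.9, 0.1], [0, 1]),
])
```

The functions return values in the range 0 to 1, whereas the table reports mAP as a percentage.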