Jiaojiao Zhao; Cees G.M. Snoek

Abstract
The goal of this paper is to detect the spatio-temporal extent of an action. The two-stream detection network based on RGB and flow provides state-of-the-art accuracy at the expense of a large model size and heavy computation. We propose to embed RGB and optical flow into a single two-in-one stream network with new layers. A motion condition layer extracts motion information from flow images, and a motion modulation layer uses this information to generate transformation parameters that modulate the low-level RGB features. The method is easily embedded in existing appearance- or two-stream action detection networks and is trained end-to-end. Experiments demonstrate that leveraging the motion condition to modulate RGB features improves detection accuracy. With only half the computation and parameters of the state-of-the-art two-stream methods, our two-in-one stream still achieves impressive results on UCF101-24, UCFSports and J-HMDB.
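The modulation idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract only says that the motion modulation layer generates "transformation parameters", so the per-pixel affine form (scale and shift), the use of 1x1 convolutions, the channel counts, and the names `TwoInOneModulation`, `conv1x1`, `w_cond`, `w_scale` and `w_shift` are all assumptions made for illustration. The sketch also assumes the flow image has already been resized to the spatial size of the RGB feature map.

```python
import numpy as np


def conv1x1(x, weight, bias):
    """Pointwise (1x1) convolution. x: (C_in, H, W), weight: (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", weight, x) + bias[:, None, None]


def relu(x):
    return np.maximum(x, 0.0)


class TwoInOneModulation:
    """Hypothetical sketch: flow-conditioned affine modulation of RGB features."""

    def __init__(self, flow_channels, cond_channels, rgb_channels, seed=0):
        rng = np.random.default_rng(seed)
        # Motion condition layer (assumed here to be a single 1x1 conv + ReLU).
        self.w_cond = rng.normal(0.0, 0.1, (cond_channels, flow_channels))
        self.b_cond = np.zeros(cond_channels)
        # Motion modulation layer: two heads predicting scale and shift.
        # Scale bias starts at 1 and shift bias at 0, so the layer begins
        # close to an identity mapping on the RGB features.
        self.w_scale = rng.normal(0.0, 0.01, (rgb_channels, cond_channels))
        self.b_scale = np.ones(rgb_channels)
        self.w_shift = rng.normal(0.0, 0.01, (rgb_channels, cond_channels))
        self.b_shift = np.zeros(rgb_channels)

    def __call__(self, rgb_features, flow_image):
        # 1) Extract a motion condition from the flow image.
        cond = relu(conv1x1(flow_image, self.w_cond, self.b_cond))
        # 2) Generate per-pixel, per-channel transformation parameters.
        scale = conv1x1(cond, self.w_scale, self.b_scale)
        shift = conv1x1(cond, self.w_shift, self.b_shift)
        # 3) Modulate the low-level RGB features (assumed affine form).
        return scale * rgb_features + shift


# Usage: 8-channel RGB feature map, 2-channel (x, y) flow image, same H and W.
rgb = np.random.default_rng(1).normal(size=(8, 16, 16))
flow = np.random.default_rng(2).normal(size=(2, 16, 16))
layer = TwoInOneModulation(flow_channels=2, cond_channels=4, rgb_channels=8)
out = layer(rgb, flow)
print(out.shape)  # (8, 16, 16)
```

Because the modulated output has the same shape as the RGB features, a layer of this kind could in principle be placed after an early convolution of an existing appearance network without changing the layers that follow, which is consistent with the abstract's claim that the method is easily embedded in existing detectors.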
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| action-detection-on-j-hmdb | Two-in-one | Video-mAP 0.5: 57.96 |
| action-detection-on-j-hmdb | Two-in-one Two Stream | Video-mAP 0.5: 74.74 |
| action-detection-on-ucf-sports | Two-in-one | Video-mAP 0.5: 92.74 |
| action-detection-on-ucf-sports | Two-in-one Two Stream | Video-mAP 0.5: 96.52 |
| action-detection-on-ucf101-24 | Two-in-one | Video-mAP 0.2: 75.48; Video-mAP 0.5: 48.31 |
| action-detection-on-ucf101-24 | Two-in-one Two Stream | Video-mAP 0.2: 78.48; Video-mAP 0.5: 50.30 |
| action-recognition-in-videos-on-ucf101 | Two-in-one Two Stream | 3-fold Accuracy: 92 |