W-TALC: Weakly-supervised Temporal Activity Localization and Classification
Sujoy Paul; Sourya Roy; Amit K Roy-Chowdhury

Abstract
Most activity localization methods in the literature suffer from the burden of requiring frame-wise annotations. Learning from weak labels is a potential way to reduce this manual labeling effort. Recent years have witnessed a substantial influx of tagged videos on the Internet, which can serve as a rich source of weakly-supervised training data. Specifically, the correlations between videos with similar tags can be utilized to temporally localize the activities. Towards this goal, we present W-TALC, a Weakly-supervised Temporal Activity Localization and Classification framework that uses only video-level labels. The proposed network can be divided into two sub-networks, namely a Two-Stream feature extractor network and a weakly-supervised module, which we learn by optimizing two complementary loss functions. Qualitative and quantitative results on two challenging datasets - Thumos14 and ActivityNet1.2 - demonstrate that the proposed method is able to detect activities at a fine granularity and achieves better performance than current state-of-the-art methods.
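The abstract describes the architecture only at a high level: pre-extracted two-stream features are fed to a weakly-supervised module trained with video-level labels. As an illustrative sketch only (the layer sizes, the value of k, the pooling choice, and the loss formulation below are assumptions for exposition, not the paper's exact design), the PyTorch snippet shows one way such a module could produce temporal class activations and pool them into video-level scores that a multiple-instance-style loss can supervise with video tags alone.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class WeaklySupervisedModule(nn.Module):
    """Illustrative weakly-supervised head over pre-extracted two-stream features.

    Maps per-snippet features (T, feat_dim) to temporal class activations,
    then pools the top-k activations per class into video-level logits.
    Dimensions and k are assumed values, not the paper's exact settings.
    """

    def __init__(self, feat_dim=2048, num_classes=20, k=8):
        super().__init__()
        self.fc = nn.Linear(feat_dim, feat_dim)
        self.classifier = nn.Linear(feat_dim, num_classes)
        self.k = k

    def forward(self, feats):                     # feats: (T, feat_dim)
        x = F.relu(self.fc(feats))
        tcam = self.classifier(x)                 # (T, num_classes) temporal activations
        k = min(self.k, tcam.shape[0])
        video_logits = tcam.topk(k, dim=0).values.mean(dim=0)  # (num_classes,)
        return tcam, video_logits


def video_level_loss(video_logits, labels):
    """Cross-entropy between pooled video-level predictions and the
    normalized multi-hot video label vector (MIL-style supervision)."""
    log_probs = F.log_softmax(video_logits, dim=0)
    target = labels / labels.sum().clamp(min=1e-8)
    return -(target * log_probs).sum()
```

At inference time, the temporal class activations (`tcam`) can be thresholded to obtain activity segments, while `video_logits` give the video-level classification; the paper additionally optimizes a second, complementary loss across pairs of videos sharing labels, which is omitted here for brevity.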
Benchmarks
| Benchmark | Method | Metrics |
|---|---|---|
| action-classification-on-activitynet-12 | W-TALC | mAP: 93.2 |
| action-classification-on-thumos14 | W-TALC | mAP: 85.6 |
| weakly-supervised-action-localization-on | W-TALC | mAP@0.1:0.7: -; mAP@0.5: 22.8 |
| weakly-supervised-action-localization-on-2 | W-TALC | mAP@0.5: 37.0 |
| weakly-supervised-action-localization-on-7 | W-TALC | mAP: 3.45; mAP IoU@0.5: 6.18; mAP IoU@0.75: 3.15; mAP IoU@0.95: 0.83 |
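The localization entries above report mean average precision at temporal IoU thresholds (e.g., mAP@0.5 counts a prediction as correct only if it overlaps a ground-truth segment with IoU of at least 0.5). As a small illustration, not part of the paper's evaluation code, the sketch below computes the temporal IoU used in that matching criterion.

```python
def temporal_iou(pred, gt):
    """Temporal IoU between two segments given as (start, end) in seconds."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0


# A predicted segment of [2s, 8s] against a ground-truth segment of [4s, 10s]
# overlaps by 4s out of an 8s union, i.e. IoU = 0.5, so it would just pass
# the mAP@0.5 threshold.
print(temporal_iou((2.0, 8.0), (4.0, 10.0)))  # 0.5
```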