Weakly Supervised Temporal Action Localization Through Contrast Based Evaluation Networks
Gang Hua, Nanning Zheng, Zhenxing Niu, Zhanning Gao, Qilin Zhang, Le Wang, Ziyi Liu

Abstract
Weakly-supervised temporal action localization (WS-TAL) is a promising but challenging task with only video-level action categorical labels available during training. Without requiring temporal action boundary annotations in training data, WS-TAL could possibly exploit automatically retrieved video tags as video-level labels. However, such coarse video-level supervision inevitably incurs confusions, especially in untrimmed videos containing multiple action instances. To address this challenge, we propose the Contrast-based Localization EvaluAtioN Network (CleanNet) with our new action proposal evaluator, which provides pseudo-supervision by leveraging the temporal contrast in snippet-level action classification predictions. Essentially, the new action proposal evaluator enforces an additional temporal contrast constraint so that high-evaluation-score action proposals are more likely to coincide with true action instances. Moreover, the new action localization module is an integral part of CleanNet which enables end-to-end training. This is in contrast to many existing WS-TAL methods where action localization is merely a post-processing step. Experiments on THUMOS14 and ActivityNet datasets validate the efficacy of CleanNet against existing state-of-the-art WS-TAL algorithms.
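The idea of scoring a proposal by temporal contrast can be illustrated with a small sketch. This is not CleanNet's actual evaluator, whose exact formulation the abstract does not give. It assumes a hypothetical score defined as the mean snippet-level class score inside a proposal minus the mean score in a surrounding context window; the function name `contrast_score` and the `inflate` parameter are illustrative assumptions.

```python
import numpy as np


def contrast_score(scores, start, end, inflate=0.25):
    """Hypothetical contrast score for a proposal [start, end).

    Assumption (not from the paper): the score is the mean class score
    inside the proposal minus the mean class score in a context window
    of `inflate * length` snippets on each side.
    """
    scores = np.asarray(scores, dtype=float)
    length = end - start
    if length <= 0:
        raise ValueError("proposal must contain at least one snippet")

    # Context width on each side, at least one snippet.
    margin = max(1, int(round(inflate * length)))

    inner_mean = scores[start:end].mean()

    # Slices are clipped at the video boundaries.
    left = scores[max(0, start - margin):start]
    right = scores[end:end + margin]
    outer = np.concatenate([left, right])
    outer_mean = outer.mean() if outer.size else 0.0

    return inner_mean - outer_mean


# Snippet-level scores for one action class (made-up values).
snippet_scores = [0.1, 0.1, 0.9, 0.9, 0.9, 0.9, 0.1, 0.1]
aligned = contrast_score(snippet_scores, 2, 6)     # covers the high-score run
misaligned = contrast_score(snippet_scores, 0, 4)  # straddles the boundary
```

Under these assumptions, a proposal that tightly covers the high-score run receives a higher score than one that straddles its boundary, which is the behavior the abstract attributes to the evaluator: high-scoring proposals are more likely to coincide with true action instances.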
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| weakly-supervised-action-localization-on | CleanNet | mAP@0.1:0.7: -; mAP@0.5: 23.9 |
| weakly-supervised-action-localization-on-2 | CleanNet | mAP@0.5: 37.1 |
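
The mAP@0.5 figures above count a predicted segment as correct when its temporal intersection-over-union (tIoU) with a ground-truth segment reaches 0.5. A minimal sketch of that overlap measure follows; the function name `temporal_iou` and the `(start, end)` tuple representation are illustrative choices, not taken from the paper's code.

```python
def temporal_iou(pred, gt):
    """Temporal IoU between two segments given as (start, end) in seconds."""
    # Overlap is zero when the segments do not intersect.
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0


# A 4-second prediction shifted by 1 second from a 4-second ground truth.
iou = temporal_iou((2.0, 6.0), (3.0, 7.0))
is_match_at_05 = iou >= 0.5
```

Here the two segments share 3 seconds out of a 5-second union, so the prediction would count as a match at the 0.5 threshold.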