MS-TCN: Multi-Stage Temporal Convolutional Network for Action Segmentation
Yazan Abu Farha; Juergen Gall

Abstract
Temporally locating and classifying action segments in long untrimmed videos is of particular interest to many applications like surveillance and robotics. While traditional approaches follow a two-step pipeline, by generating frame-wise probabilities and then feeding them to high-level temporal models, recent approaches use temporal convolutions to directly classify the video frames. In this paper, we introduce a multi-stage architecture for the temporal action segmentation task. Each stage features a set of dilated temporal convolutions to generate an initial prediction that is refined by the next one. This architecture is trained using a combination of a classification loss and a proposed smoothing loss that penalizes over-segmentation errors. Extensive evaluation shows the effectiveness of the proposed model in capturing long-range dependencies and recognizing action segments. Our model achieves state-of-the-art results on three challenging datasets: 50Salads, Georgia Tech Egocentric Activities (GTEA), and the Breakfast dataset.
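The abstract names the two training terms but not their exact form. The sketch below is one plausible reading, not the authors' implementation. It assumes that the smoothing term is a truncated mean squared error between the log-probabilities of adjacent frames, that the two terms are combined with a weight `lam`, and that dilation doubles at each layer with kernel size 3. The names `tau`, `lam`, `receptive_field`, and `stage_loss` are hypothetical, and the default values are illustrative only.

```python
import math


def receptive_field(num_layers, kernel_size=3):
    """Receptive field in frames of stacked dilated temporal convolutions.

    Assumption: layer l (0-based) uses dilation 2**l.
    """
    rf = 1
    for layer in range(num_layers):
        rf += (kernel_size - 1) * 2 ** layer
    return rf


def log_softmax(row):
    """Numerically stable log-softmax over one frame's class logits."""
    m = max(row)
    lse = m + math.log(sum(math.exp(v - m) for v in row))
    return [v - lse for v in row]


def classification_loss(logits, labels):
    """Frame-wise cross entropy, averaged over the T frames."""
    logp = [log_softmax(row) for row in logits]
    return -sum(lp[y] for lp, y in zip(logp, labels)) / len(labels)


def smoothing_loss(logits, tau=4.0):
    """Assumed smoothing term: truncated MSE over adjacent-frame log-probs.

    Large jumps between neighbouring frames are penalised, but each
    difference is clipped at tau so a genuine action boundary does not
    dominate the loss.
    """
    logp = [log_softmax(row) for row in logits]
    num_frames, num_classes = len(logp), len(logp[0])
    total = 0.0
    for t in range(1, num_frames):
        for c in range(num_classes):
            delta = min(abs(logp[t][c] - logp[t - 1][c]), tau)
            total += delta ** 2
    return total / (num_frames * num_classes)


def stage_loss(logits, labels, lam=0.15, tau=4.0):
    """Loss for one stage: classification term plus weighted smoothing term."""
    return classification_loss(logits, labels) + lam * smoothing_loss(logits, tau)


# Two toy predictions over 6 frames and 2 classes (hypothetical logits).
smooth = [[5.0, 0.0]] * 3 + [[0.0, 5.0]] * 3   # one clean boundary
flicker = [[5.0, 0.0], [0.0, 5.0]] * 3         # over-segmented output

print(receptive_field(10))
print(smoothing_loss(smooth), smoothing_loss(flicker))
```

Under these assumptions the flickering prediction has five label changes instead of one, so its smoothing term is five times larger. That is the over-segmentation behaviour the loss is meant to discourage. In a multi-stage model, a loss of this kind would typically be computed for each stage's output and the per-stage values summed.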
Benchmarks
| Benchmark | Method | Acc | Edit | F1@10% | F1@25% | F1@50% | Average F1 |
|---|---|---|---|---|---|---|---|
| action-segmentation-on-50-salads-1 | MS-TCN | 80.7 | 67.9 | 76.3 | 74.0 | 64.5 | n/a |
| action-segmentation-on-breakfast-1 | MS-TCN (IDT) | 65.1 | 61.4 | 58.2 | 52.9 | 40.8 | 50.6 |
| action-segmentation-on-breakfast-1 | MS-TCN (I3D) | 66.3 | 61.7 | 52.6 | 48.1 | 37.9 | 46.2 |
| action-segmentation-on-gtea-1 | MS-TCN | 79.2 | 81.4 | 87.5 | 85.4 | 74.6 | n/a |
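The page does not define the metrics in the table. As a rough illustration, the sketch below assumes the usual conventions for this task: Acc is the percentage of correctly labelled frames, and Edit is 100 × (1 − normalized Levenshtein distance) between the predicted and ground-truth sequences of segment labels. The function names and the toy labels `"cut"` and `"mix"` are hypothetical.

```python
def segments(labels):
    """Collapse a frame-wise label sequence into its ordered segment labels."""
    segs = []
    for y in labels:
        if not segs or segs[-1] != y:
            segs.append(y)
    return segs


def frame_accuracy(pred, gt):
    """Percentage of frames whose predicted label matches the ground truth."""
    return 100.0 * sum(p == g for p, g in zip(pred, gt)) / len(gt)


def edit_score(pred, gt):
    """Segmental edit score (assumed definition, see the text above)."""
    p, g = segments(pred), segments(gt)
    # Standard dynamic-programming Levenshtein distance, row by row.
    prev = list(range(len(g) + 1))
    for i in range(1, len(p) + 1):
        cur = [i] + [0] * len(g)
        for j in range(1, len(g) + 1):
            cost = 0 if p[i - 1] == g[j - 1] else 1
            cur[j] = min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + cost)
        prev = cur
    return 100.0 * (1.0 - prev[-1] / max(len(p), len(g)))


# Toy example: a single mislabelled frame splits one action into three segments.
gt = ["cut"] * 5 + ["mix"] * 5
pred = ["cut", "cut", "mix", "cut", "cut"] + ["mix"] * 5

print(frame_accuracy(pred, gt))  # 90.0
print(edit_score(pred, gt))      # 50.0
```

One wrong frame costs only 10 points of frame accuracy but halves the edit score, which is why a segment-level metric is reported alongside Acc when over-segmentation is a concern.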