Temporally Consistent Unbalanced Optimal Transport for Unsupervised Action Segmentation
Ming Xu; Stephen Gould

Abstract
We propose a novel approach to the action segmentation task for long, untrimmed videos, based on solving an optimal transport problem. By encoding a temporal consistency prior into a Gromov-Wasserstein problem, we are able to decode a temporally consistent segmentation from a noisy affinity/matching cost matrix between video frames and action classes. Unlike previous approaches, our method does not require knowing the action order for a video to attain temporal consistency. Furthermore, our resulting (fused) Gromov-Wasserstein problem can be efficiently solved on GPUs using a few iterations of projected mirror descent. We demonstrate the effectiveness of our method in an unsupervised learning setting, where our method is used to generate pseudo-labels for self-training. We evaluate our segmentation approach and unsupervised learning pipeline on the Breakfast, 50-Salads, YouTube Instructions and Desktop Assembly datasets, yielding state-of-the-art results for the unsupervised video action segmentation task.
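The decoding idea in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: it is a simplified stand-in that minimises a linear matching cost plus a Gromov-Wasserstein-style quadratic term penalising nearby frames assigned to different classes, using mirror descent with a KL projection onto the frame marginal. The function name `decode_segmentation`, the parameters `radius`, `alpha`, `step` and `n_iters`, the unit mass per frame, and the choice to leave the class marginal entirely free are all assumptions made for illustration.

```python
import numpy as np

def decode_segmentation(cost, radius=2, alpha=0.5, step=1.0, n_iters=30):
    """Hypothetical helper: soft frame-to-class assignment with a temporal prior.

    cost: (n_frames, n_classes) matching cost between frames and action classes.
    Returns hard labels and the soft coupling T (each row sums to 1).
    """
    n_frames, n_classes = cost.shape
    idx = np.arange(n_frames)
    gap = np.abs(idx[:, None] - idx[None, :])
    # near[i, k] = 1 when frames i and k are distinct and at most `radius` apart.
    near = ((gap > 0) & (gap <= radius)).astype(float)
    # differ[j, l] = 1 when classes j and l are different.
    differ = 1.0 - np.eye(n_classes)

    # Start from a uniform assignment (assumption: one unit of mass per frame).
    T = np.full((n_frames, n_classes), 1.0 / n_classes)
    for _ in range(n_iters):
        # Gradient of (1 - alpha) * <cost, T> + alpha * sum near*differ*T*T.
        grad = (1.0 - alpha) * cost + 2.0 * alpha * (near @ T @ differ)
        # Mirror descent step under the KL geometry (multiplicative update).
        T = T * np.exp(-step * grad)
        # KL projection onto the frame marginal: renormalise every row.
        T /= T.sum(axis=1, keepdims=True)
    return T.argmax(axis=1), T

# Toy example: three actions of ten frames each, with one noisy frame.
true_labels = np.repeat(np.arange(3), 10)
cost = 1.0 - np.eye(3)[true_labels]
cost[5] = [0.6, 1.0, 0.4]  # frame 5 weakly prefers the wrong class on its own
labels, T = decode_segmentation(cost)
```

In this toy setup the per-frame argmin of `cost` mislabels frame 5, while the temporal term pulls it back towards the class of its neighbours. No action order is supplied; only the neighbourhood structure over frames is used.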
Benchmarks
| Benchmark | Methodology | Accuracy | F1 | JSD | Precision | Recall | mIoU |
|---|---|---|---|---|---|---|---|
| unsupervised-action-segmentation-on-breakfast | ASOT | 56.1 | 38.3 | 94.9 | 36.7 | 40.1 | 18.6 |
| unsupervised-action-segmentation-on-ikea-asm | ASOT | 34.0 | 27.9 | 88.7 | 21.1 | 24.0 | - |
| unsupervised-action-segmentation-on-youtube | ASOT | 52.9 | 35.1 | - | 47.6 | 27.8 | 24.7 |