Temporally Consistent Unbalanced Optimal Transport for Unsupervised Action Segmentation

Ming Xu; Stephen Gould

Abstract

We propose a novel approach to the action segmentation task for long, untrimmed videos, based on solving an optimal transport problem. By encoding a temporal consistency prior into a Gromov-Wasserstein problem, we are able to decode a temporally consistent segmentation from a noisy affinity/matching cost matrix between video frames and action classes. Unlike previous approaches, our method does not require knowing the action order for a video to attain temporal consistency. Furthermore, our resulting (fused) Gromov-Wasserstein problem can be efficiently solved on GPUs using a few iterations of projected mirror descent. We demonstrate the effectiveness of our method in an unsupervised learning setting, where our method is used to generate pseudo-labels for self-training. We evaluate our segmentation approach and unsupervised learning pipeline on the Breakfast, 50-Salads, YouTube Instructions and Desktop Assembly datasets, yielding state-of-the-art results for the unsupervised video action segmentation task.
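The decoding step described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and everything in it is an assumption made for illustration: the function names `temporal_prior` and `segment`, the parameters `radius`, `alpha`, `step` and `n_iters`, the specific form of the frame-frame and action-action costs, and the choice to normalise each frame's row to sum to 1. It also omits the unbalanced relaxation named in the title, so the only constraint is on the frame marginal, for which the KL projection reduces to row normalisation.

```python
import numpy as np


def temporal_prior(n_frames: int, radius: int) -> np.ndarray:
    """Frame-frame cost: 1/radius for distinct frames at most `radius` apart, else 0."""
    idx = np.arange(n_frames)
    dist = np.abs(idx[:, None] - idx[None, :])
    return ((dist >= 1) & (dist <= radius)).astype(float) / radius


def segment(cost, radius=2, alpha=0.5, step=4.0, n_iters=20):
    """Decode one action label per frame from a (frames x actions) cost matrix.

    Assumed fused objective (a sketch, not the paper's exact formulation):
        alpha * sum_{i,j,k,l} Cf[i,k] * Ca[j,l] * T[i,j] * T[k,l]
        + (1 - alpha) * <cost, T>
    where Cf penalises nearby frames and Ca penalises differing actions.
    """
    cost = np.asarray(cost, dtype=float)
    n_frames, n_actions = cost.shape
    c_frames = temporal_prior(n_frames, radius)   # symmetric
    c_actions = 1.0 - np.eye(n_actions)           # symmetric, 0 on the diagonal

    # Start from a uniform soft assignment; every row sums to 1.
    coupling = np.full((n_frames, n_actions), 1.0 / n_actions)
    for _ in range(n_iters):
        # Gradient of the quadratic term for symmetric cost matrices.
        gw_grad = 2.0 * c_frames @ coupling @ c_actions
        grad = alpha * gw_grad + (1.0 - alpha) * cost
        # Mirror step under the KL geometry: multiplicative update.
        coupling = coupling * np.exp(-step * grad)
        # KL projection onto the row-sum constraint: renormalise each row.
        coupling /= coupling.sum(axis=1, keepdims=True)
    return coupling.argmax(axis=1).tolist()
```

Calling `segment(cost)` on a cost matrix with one row per frame and one column per action returns a list of action indices. Compared with a per-frame `cost.argmin(axis=1)`, the quadratic term discourages isolated label flips inside a segment without requiring the action order to be known in advance.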

Code Repositories

mingu6/action_seg_ot (official implementation, PyTorch; mentioned in GitHub)

Benchmarks

Benchmark	Methodology	Metrics
unsupervised-action-segmentation-on-breakfast	ASOT	Acc: 56.1 F1: 38.3 JSD: 94.9 Precision: 36.7 Recall: 40.1 mIoU: 18.6
unsupervised-action-segmentation-on-ikea-asm	ASOT	Acc: 34.0 F1: 27.9 JSD: 88.7 Precision: 21.1 Recall: 24.0
unsupervised-action-segmentation-on-youtube	ASOT	Acc: 52.9 F1: 35.1 Precision: 47.6 Recall: 27.8 mIoU: 24.7
