Modular Interactive Video Object Segmentation: Interaction-to-Mask, Propagation and Difference-Aware Fusion

Cheng Ho Kei; Tai Yu-Wing; Tang Chi-Keung

Abstract

We present Modular interactive VOS (MiVOS) framework which decouples interaction-to-mask and mask propagation, allowing for higher generalizability and better performance. Trained separately, the interaction module converts user interactions to an object mask, which is then temporally propagated by our propagation module using a novel top-$k$ filtering strategy in reading the space-time memory. To effectively take the user's intent into account, a novel difference-aware module is proposed to learn how to properly fuse the masks before and after each interaction, which are aligned with the target frames by employing the space-time memory. We evaluate our method both qualitatively and quantitatively with different forms of user interactions (e.g., scribbles, clicks) on DAVIS to show that our method outperforms current state-of-the-art algorithms while requiring fewer frame interactions, with the additional advantage in generalizing to different types of user interactions. We contribute a large-scale synthetic VOS dataset with pixel-accurate segmentation of 4.8M frames to accompany our source codes to facilitate future research.
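The top-$k$ filtering idea mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation, which operates on feature maps in PyTorch; it is a minimal pure-Python illustration that assumes a dot-product affinity between one query key and each memory key. The function names `topk_softmax` and `read_memory` are hypothetical.

```python
import math


def topk_softmax(scores, k):
    """Softmax restricted to the k largest scores; every other weight is 0."""
    k = min(k, len(scores))
    # Indices of the k largest affinity scores.
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(scores[i] for i in top)
    exps = {i: math.exp(scores[i] - m) for i in top}
    total = sum(exps.values())
    return [exps[i] / total if i in exps else 0.0 for i in range(len(scores))]


def read_memory(query_key, memory_keys, memory_values, k):
    """Read from memory using only the top-k matching entries (assumed dot-product affinity)."""
    scores = [sum(q * m for q, m in zip(query_key, mk)) for mk in memory_keys]
    weights = topk_softmax(scores, k)
    dim = len(memory_values[0])
    # Weighted sum of memory values; filtered-out entries contribute nothing.
    return [sum(w * v[d] for w, v in zip(weights, memory_values)) for d in range(dim)]
```

With `k` equal to the number of memory entries this reduces to an ordinary softmax read; a smaller `k` discards weakly matching entries so they cannot add noise to the readout.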

Code Repositories

limingxing00/rde-vos-cvpr2022 (pytorch; mentioned in GitHub)
hkchengrex/MiVOS (Official; pytorch; mentioned in GitHub)
hkchengrex/Scribble-to-Mask (pytorch; mentioned in GitHub)
Vujas-Eteph/CiVOS (pytorch; mentioned in GitHub)
hkchengrex/Mask-Propagation (pytorch; mentioned in GitHub)

Benchmarks

| Benchmark | Methodology | Metrics |
| --- | --- | --- |
| interactive-video-object-segmentation-on | MiVOS | AUC-J: 0.849; AUC-J&F: 0.879; J&F@60s: 0.885; J@60s: 0.854 |
| semi-supervised-video-object-segmentation-on-1 | MiVOS | F-measure (Decay): 14.5; F-measure (Mean): 80.2; F-measure (Recall): 87.6; J&F: 76.5; Jaccard (Decay): 14.9; Jaccard (Mean): 72.7; Jaccard (Recall): 81.2 |
| video-object-segmentation-on-youtube-vos | MiVOS | F-Measure (Seen): 84.7; F-Measure (Unseen): 85.5; Jaccard (Seen): 80.6; Jaccard (Unseen): 77.3; Overall: 82.0 |
| visual-object-tracking-on-davis-2016 | MiVOS | F-measure (Decay): 5.1; F-measure (Mean): 92.4; F-measure (Recall): 96.4; J&F: 91.0; Jaccard (Decay): 6.6; Jaccard (Mean): 89.7; Jaccard (Recall): 97.5; Speed (FPS): 16.9 |
| visual-object-tracking-on-davis-2017 | MiVOS | F-measure (Decay): 8.2; F-measure (Mean): 87.4; F-measure (Recall): 93.1; J&F: 84.5; Jaccard (Decay): 7.0; Jaccard (Mean): 81.7; Jaccard (Recall): 90.9; Speed (FPS): 11.2 |
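
As a reading aid for the figures above, the combined J&F score on DAVIS is conventionally the average of the Jaccard mean and the F-measure mean, and the YouTube-VOS overall score is conventionally the average of the four seen/unseen J and F values. The sketch below assumes those conventions; the function names are hypothetical. Because the listed values are already rounded, recomputed results can differ from the reported ones in the last digit.

```python
def j_and_f(jaccard_mean, f_mean):
    """Combined score: average of region similarity (J) and contour accuracy (F)."""
    return (jaccard_mean + f_mean) / 2


def youtube_vos_overall(j_seen, j_unseen, f_seen, f_unseen):
    """Overall score: average of the four seen/unseen J and F values."""
    return (j_seen + j_unseen + f_seen + f_unseen) / 4


# DAVIS 2017 row: Jaccard (Mean) 81.7, F-measure (Mean) 87.4, reported J&F 84.5.
davis_2017 = j_and_f(81.7, 87.4)

# YouTube-VOS row: reported Overall 82.0.
ytvos = youtube_vos_overall(80.6, 77.3, 84.7, 85.5)
```

Both recomputed values agree with the reported numbers to within 0.1, which is consistent with rounding of the per-metric inputs.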
