URVOS: Unified Referring Video Object Segmentation Network with a Large-Scale Benchmark

Joon-Young Lee, Seonguk Seo, Bohyung Han

Abstract

We propose a unified referring video object segmentation network (URVOS). URVOS takes a video and a referring expression as inputs and estimates the masks of the object referred to by the given language expression in every frame of the video. Our algorithm addresses this challenging problem by performing language-based object segmentation and mask propagation jointly in a single deep neural network, using a proper combination of two attention models. In addition, we construct the first large-scale referring video object segmentation dataset, called Refer-Youtube-VOS. We evaluate our model on two benchmark datasets, including ours, and demonstrate the effectiveness of the proposed approach. The dataset is released at https://github.com/skynbe/Refer-Youtube-VOS.
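The abstract does not spell out how the two attention models interact, so the following is only a toy, pure-Python sketch under our own assumptions. One attention lets each pixel feature attend over the word features of the expression (cross-modal attention). The other lets it attend over pixel features of frames that were already segmented, whose values are their predicted masks (memory attention). As a result, the first frame is segmented from language alone and later frames also propagate earlier masks. The features are hand-made vectors, and `segment_frame`, `segment_video`, `alpha`, and the way the two signals are summed into one logit are hypothetical. This is not the URVOS network itself.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: Vector, keys: List[Vector], values: List[Vector]) -> Vector:
    """Scaled dot-product attention for a single query vector."""
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, k) / scale for k in keys])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(len(values[0]))]


def segment_frame(frame, words, mem_keys, mem_vals, alpha=2.0):
    """Hypothetical head: one boolean mask entry per pixel feature of a frame."""
    mask = []
    for pix in frame:
        # Cross-modal attention: each pixel gathers the words most relevant to it.
        lang_ctx = attend(pix, words, words)
        logit = dot(pix, lang_ctx)
        if mem_keys:
            # Memory attention: read the mask of similar pixels in earlier frames.
            prev = attend(pix, mem_keys, mem_vals)[0]
            logit += alpha * (2.0 * prev - 1.0)  # map [0, 1] to [-alpha, alpha]
        mask.append(logit > 0.0)  # same as sigmoid(logit) > 0.5
    return mask


def segment_video(frames, words, alpha=2.0):
    """Segment frames in order, feeding each prediction back as memory."""
    mem_keys, mem_vals, masks = [], [], []
    for frame in frames:
        mask = segment_frame(frame, words, mem_keys, mem_vals, alpha)
        mem_keys.extend(frame)
        mem_vals.extend([[1.0 if m else 0.0] for m in mask])
        masks.append(mask)
    return masks


if __name__ == "__main__":
    # Toy 3-d features: dim 0 ~ "matches the expression", dims 1-2 ~ appearance.
    frames = [
        [[2.0, 3.0, 0.0], [-2.0, 0.0, 3.0], [-1.0, 0.0, 3.0]],
        [[-0.5, 3.0, 0.0], [-2.0, 0.0, 3.0]],
    ]
    words = [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
    print(segment_video(frames, words))  # prints [[True, False, False], [True, False]]
```

In this toy run, the first pixel of the second frame scores slightly negative from language alone, yet it stays in the mask because memory attention matches it to the pixel segmented in the first frame. Calling `segment_frame(frames[1], words, [], [])` without memory returns `[False, False]`, which is the kind of gap that joint segmentation and propagation is meant to close.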

Benchmarks

| Benchmark | Methodology | J | F | J&F | J&F 1st frame |
|---|---|---|---|---|---|
| referring-expression-segmentation-on-davis | URVOS + Refer-Youtube-VOS + ft. DAVIS | | | | 51.63 |
| referring-expression-segmentation-on-davis | URVOS + Refer-Youtube-VOS | | | | 46.85 |
| referring-expression-segmentation-on-davis | URVOS | | | | 44.1 |
| referring-expression-segmentation-on-refer-1 | URVOS | 47.0 | 50.8 | 48.9 | |
| referring-video-object-segmentation-on-mevis | URVOS | 25.7 | 29.9 | 27.8 | |
| referring-video-object-segmentation-on-ref | URVOS | 47.3 | 56.0 | 51.6 | |
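The metrics in these rows follow the usual DAVIS-style conventions. J is region similarity, the Jaccard index (intersection over union) between the predicted and ground-truth masks. F is contour accuracy, a boundary F-measure. J&F is the mean of the two. The sketch below computes J for flattened binary masks and checks that each reported J&F is the average of the listed J and F, up to rounding to one decimal. The helper names are ours, treating two empty masks as a perfect match is an assumed convention, and F itself (which needs boundary extraction and tolerance matching) is not implemented here.

```python
from typing import Sequence


def region_similarity(pred: Sequence[bool], gt: Sequence[bool]) -> float:
    """J: Jaccard index (IoU) between a predicted and a ground-truth binary mask."""
    if len(pred) != len(gt):
        raise ValueError("masks must have the same number of pixels")
    inter = sum(1 for p, g in zip(pred, gt) if p and g)
    union = sum(1 for p, g in zip(pred, gt) if p or g)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if union == 0 else inter / union


def j_and_f(j: float, f: float) -> float:
    """J&F: arithmetic mean of region similarity J and contour accuracy F."""
    return (j + f) / 2.0


def consistent(j: float, f: float, reported: float, tol: float = 0.051) -> bool:
    # Reported values are rounded to one decimal: allow half a unit plus float slack.
    return abs(j_and_f(j, f) - reported) <= tol


# (J, F, reported J&F) for the rows that list all three metrics.
ROWS = {
    "referring-expression-segmentation-on-refer-1": (47.0, 50.8, 48.9),
    "referring-video-object-segmentation-on-mevis": (25.7, 29.9, 27.8),
    "referring-video-object-segmentation-on-ref": (47.3, 56.0, 51.6),
}

if __name__ == "__main__":
    print(region_similarity([True, True, False, False], [True, False, True, False]))
    for name, (j, f, reported) in ROWS.items():
        print(name, round(j_and_f(j, f), 2), reported, consistent(j, f, reported))
```

Because the J&F mean is linear, averaging per-sequence J&F scores and averaging J and F separately before combining them give the same number, so this check holds regardless of which order the benchmark uses.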