MeViS: A Large-scale Benchmark for Video Segmentation with Motion Expressions
Ding Henghui ; Liu Chang ; He Shuting ; Jiang Xudong ; Loy Chen Change

Abstract
This paper strives for motion expressions guided video segmentation, which focuses on segmenting objects in video content based on a sentence describing the motion of the objects. Existing referring video object datasets typically focus on salient objects and use language expressions that contain excessive static attributes that could potentially enable the target object to be identified in a single frame. These datasets downplay the importance of motion in video content for language-guided video object segmentation. To investigate the feasibility of using motion expressions to ground and segment objects in videos, we propose a large-scale dataset called MeViS, which contains numerous motion expressions to indicate target objects in complex environments. We benchmarked 5 existing referring video object segmentation (RVOS) methods and conducted a comprehensive comparison on the MeViS dataset. The results show that current RVOS methods cannot effectively address motion expression-guided video segmentation. We further analyze the challenges and propose a baseline approach for the proposed MeViS dataset. The goal of our benchmark is to provide a platform that enables the development of effective language-guided video segmentation algorithms that leverage motion expressions as a primary cue for object segmentation in complex video scenes. The proposed MeViS dataset has been released at https://henghuiding.github.io/MeViS.
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| referring-video-object-segmentation-on-mevis | LMPM | F: 40.2 J: 34.2 J&F: 37.2 |
| referring-video-object-segmentation-on-revos | LMPM (Swin-T) | F: 31.7 J: 21.2 J&F: 26.4 R: 3.2 |
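
The table does not define its metrics. As a minimal sketch, assuming the usual video object segmentation convention in which J is region similarity (the Jaccard index, or IoU, between predicted and ground-truth masks), F is contour accuracy, and J&F is the arithmetic mean of the two, the combined score can be reproduced from the reported values. The function names `region_similarity` and `j_and_f` are hypothetical and do not come from the MeViS evaluation code.

```python
def region_similarity(pred, gt):
    """Jaccard index (IoU) between two binary masks given as nested lists of 0/1."""
    inter = 0
    union = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            inter += 1 if (p and g) else 0
            union += 1 if (p or g) else 0
    # Assumption: two empty masks are treated as a perfect match.
    return 1.0 if union == 0 else inter / union


def j_and_f(j, f):
    """Combined score, assumed here to be the plain average of J and F."""
    return (j + f) / 2.0


# Toy 2x2 masks: one overlapping pixel out of two in the union.
toy_j = region_similarity([[1, 1], [0, 0]], [[1, 0], [0, 0]])

# MeViS row from the table: J = 34.2, F = 40.2.
mevis_score = round(j_and_f(34.2, 40.2), 1)
```

Under this assumption the MeViS row is consistent with the reported J&F of 37.2. The ReVOS row averages to 26.45 against a reported 26.4, which likely reflects rounding of the underlying per-metric values.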