ClawCraneNet: Leveraging Object-level Relation for Text-based Video Segmentation

Chen Liang, Yu Wu, Yawei Luo, Yi Yang

Abstract

Text-based video segmentation is a challenging task that segments the objects referred to by natural language expressions in videos. It requires both semantic comprehension and fine-grained video understanding. Existing methods introduce language representations into segmentation models in a bottom-up manner, which restricts vision-language interaction to the local receptive fields of ConvNets. We argue that such interaction is insufficient, since the model can barely construct region-level relationships from partial observations, which runs contrary to the descriptive logic of referring expressions. In fact, people usually describe a target object through its relations with other objects, and these relations may not be easily understood without seeing the whole video. To address this issue, we introduce a novel top-down approach that imitates how humans segment an object under language guidance. We first identify all candidate objects in a video and then choose the referred one by parsing the relations among those high-level objects. Three kinds of object-level relations are investigated for precise relationship understanding: positional relation, text-guided semantic relation, and temporal relation. Extensive experiments on A2D Sentences and J-HMDB Sentences show that our method outperforms state-of-the-art methods by a large margin. Qualitative results also show that our predictions are more explainable.
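
The top-down idea in the abstract (propose candidate objects first, then pick the referred one) can be illustrated with a toy sketch. The abstract does not say how the positional, semantic, and temporal relations are scored, so everything here is an assumption for illustration: the names `cosine` and `select_referred`, the candidate dictionary layout, and the use of a single cosine similarity as a stand-in for the relation modules. It is not the authors' implementation.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_referred(candidates, text_embedding):
    """Pick the candidate object whose feature best matches the sentence embedding.

    `candidates` is a hypothetical list of dicts with keys "id" and "feature".
    In a real system the score would combine several object-level relations;
    here one similarity score stands in for all of them.
    """
    return max(candidates, key=lambda c: cosine(c["feature"], text_embedding))


# Toy usage: two candidate objects and a sentence embedding closer to the second.
candidates = [
    {"id": "object_a", "feature": [1.0, 0.0]},
    {"id": "object_b", "feature": [0.0, 1.0]},
]
chosen = select_referred(candidates, [0.1, 0.9])
print(chosen["id"])
```

The point of the sketch is only the ordering of steps: selection happens over whole-object candidates rather than over local pixel features.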

Benchmarks

| Benchmark | Methodology | IoU mean | IoU overall | Precision@0.5 | Precision@0.6 | Precision@0.7 | Precision@0.8 | Precision@0.9 |
| referring-expression-segmentation-on-a2d | ClawCraneNet | 0.655 | 0.644 | 0.704 | 0.677 | 0.617 | 0.489 | 0.171 |
| referring-expression-segmentation-on-j-hmdb | ClawCraneNet | 0.655 | 0.644 | 0.880 | 0.796 | 0.566 | 0.147 | 0.002 |
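
For readers unfamiliar with the metrics reported above, the following is a minimal sketch of how IoU mean, IoU overall, and Precision@K are commonly computed for referring segmentation. It assumes the usual conventions: mean IoU averages per-sample IoU, overall IoU divides total intersection by total union across all samples, and Precision@K is the fraction of samples whose IoU exceeds K. The function names and the handling of empty masks are assumptions for illustration, not taken from the paper or the benchmark page.

```python
def iou_counts(pred, gt):
    """Return (intersection, union) pixel counts for two binary masks given as nested lists."""
    inter = union = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            inter += 1 if (p and g) else 0
            union += 1 if (p or g) else 0
    return inter, union


def evaluate(pairs, thresholds=(0.5, 0.6, 0.7, 0.8, 0.9)):
    """Compute mean IoU, overall IoU, and Precision@K over (prediction, ground truth) pairs."""
    per_sample = []
    total_inter = total_union = 0
    for pred, gt in pairs:
        inter, union = iou_counts(pred, gt)
        total_inter += inter
        total_union += union
        # Assumption: two empty masks count as a perfect match.
        per_sample.append(inter / union if union else 1.0)
    n = len(per_sample)
    results = {
        "iou_mean": sum(per_sample) / n,
        "iou_overall": total_inter / total_union if total_union else 1.0,
    }
    for t in thresholds:
        results[f"precision@{t}"] = sum(iou > t for iou in per_sample) / n
    return results


# Toy usage: one perfect prediction and one that covers a quarter of the target.
pairs = [
    ([[1, 1], [0, 0]], [[1, 1], [0, 0]]),
    ([[1, 0], [0, 0]], [[1, 1], [1, 1]]),
]
print(evaluate(pairs))
```

Because overall IoU pools pixels across samples, it weights large objects more heavily than mean IoU does, which is why the two numbers can differ on the same predictions.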
