5 months ago

Vision-Language Transformer and Query Generation for Referring Segmentation

Ding Henghui ; Liu Chang ; Wang Suchen ; Jiang Xudong


Abstract

In this work, we address the challenging task of referring segmentation. The query expression in referring segmentation typically indicates the target object by describing its relationship with others. Therefore, to find the target one among all instances in the image, the model must have a holistic understanding of the whole image. To achieve this, we reformulate referring segmentation as a direct attention problem: finding the region in the image where the query language expression is most attended to. We introduce transformer and multi-head attention to build a network with an encoder-decoder attention mechanism architecture that "queries" the given image with the language expression. Furthermore, we propose a Query Generation Module, which produces multiple sets of queries with different attention weights that represent the diversified comprehensions of the language expression from different aspects. At the same time, to find the best way from these diversified comprehensions based on visual clues, we further propose a Query Balance Module to adaptively select the output features of these queries for a better mask generation. Without bells and whistles, our approach is light-weight and achieves new state-of-the-art performance consistently on three referring segmentation datasets, RefCOCO, RefCOCO+, and G-Ref. Our code is available at https://github.com/henghuiding/Vision-Language-Transformer.
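The idea of "querying" an image with several language-derived queries and then balancing their outputs can be illustrated with a toy sketch. This is not the authors' implementation: the function names (`make_queries`, `attention_map`, `balance`), the hand-picked word scores, and the confidence values are all hypothetical stand-ins for what would be learned modules in a real network. It only shows scaled dot-product attention from each query over flattened pixel features, followed by a softmax-weighted combination of the per-query attention maps.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def weighted_sum(weights, vectors):
    """Sum of vectors[i] scaled by weights[i]."""
    dim = len(vectors[0])
    return [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]

def make_queries(word_feats, word_scores_per_query):
    """Assumed stand-in for query generation: each query is a differently
    weighted mix of the same word features."""
    return [weighted_sum(softmax(scores), word_feats)
            for scores in word_scores_per_query]

def attention_map(query, pixel_feats):
    """Scaled dot-product attention of one query over all pixels."""
    scale = math.sqrt(len(query))
    return softmax([dot(query, p) / scale for p in pixel_feats])

def balance(maps, confidences):
    """Assumed stand-in for query balancing: combine the per-query maps
    using softmax-normalized confidence scores."""
    return weighted_sum(softmax(confidences), maps)

# Toy input: 4 flattened "pixels" and 2 "words" with 2-dimensional features.
pixel_feats = [[4.0, 0.0], [0.0, 4.0], [0.0, 0.0], [0.0, 4.0]]
word_feats = [[1.0, 0.0], [0.0, 1.0]]

# Hypothetical word scores: query 0 stresses word 0, query 1 stresses word 1.
word_scores = [[3.0, 0.0], [0.0, 3.0]]
queries = make_queries(word_feats, word_scores)
maps = [attention_map(q, pixel_feats) for q in queries]

# Hypothetical confidences that favor query 0.
fused = balance(maps, [2.0, 0.0])
best_pixel = max(range(len(fused)), key=fused.__getitem__)
print("fused attention:", [round(v, 3) for v in fused])
print("most attended pixel:", best_pixel)
```

In this toy setup the two queries attend to different pixels, and the confidence scores decide which comprehension dominates the fused map. In a trained model both the word weights and the confidences would presumably be predicted from the image and the expression rather than fixed by hand.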

Code Repositories

henghuiding/Vision-Language-Transformer (official, TensorFlow)

Benchmarks

Benchmark | Methodology | Metrics
generalized-referring-expression | VLT | N-acc.: 35.2; Precision@(F1=1, IoU≥0.5): 36.6
generalized-referring-expression-segmentation | VLT | cIoU: 52.51; gIoU: 52.00
referring-expression-segmentation-on-refcoco | VLT | Overall IoU: 65.65
referring-expression-segmentation-on-refcoco-3 | VLT | Overall IoU: 55.50
referring-expression-segmentation-on-refcoco-4 | VLT | Overall IoU: 59.20
referring-expression-segmentation-on-refcoco-5 | VLT | Overall IoU: 49.36
referring-expression-segmentation-on-refcocog | VLT (Darknet53) | Overall IoU: 52.99
referring-expression-segmentation-on-refcocog-1 | VLT (Darknet53) | Overall IoU: 56.65
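
For readers unfamiliar with the Overall IoU metric reported in the benchmarks, the sketch below shows the definition commonly used in referring segmentation: total intersection over total union, accumulated across all test samples. A per-sample mean IoU is included for contrast. Whether the numbers on this page were computed exactly this way is an assumption, and the function names and the empty-mask convention are illustrative only.

```python
def overall_iou(pred_masks, gt_masks):
    """Cumulative intersection divided by cumulative union over all samples.
    Masks are flat lists of 0/1 values."""
    inter = 0
    union = 0
    for pred, gt in zip(pred_masks, gt_masks):
        for p, g in zip(pred, gt):
            inter += int(bool(p) and bool(g))
            union += int(bool(p) or bool(g))
    return inter / union if union else 0.0

def mean_iou(pred_masks, gt_masks):
    """Average of per-sample IoU values."""
    scores = []
    for pred, gt in zip(pred_masks, gt_masks):
        inter = sum(int(bool(p) and bool(g)) for p, g in zip(pred, gt))
        union = sum(int(bool(p) or bool(g)) for p, g in zip(pred, gt))
        # Convention chosen for this sketch: two empty masks count as a match.
        scores.append(inter / union if union else 1.0)
    return sum(scores) / len(scores)

# Two toy samples with 4 pixels each.
preds = [[1, 1, 0, 0], [1, 1, 1, 1]]
gts = [[1, 0, 0, 0], [1, 1, 1, 1]]
print("overall IoU:", overall_iou(preds, gts))
print("mean IoU:", mean_iou(preds, gts))
```

Because the overall variant pools pixels across samples, large objects contribute more to it than small ones, while the per-sample mean weights every sample equally.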
