HyperAIHyperAI

Command Palette

Search for a command to run...

3 months ago

Look Deeper See Richer: Depth-aware Image Paragraph Captioning

{Hongzhi Yin Zi Huang Yang Li Yadan Luo Ziwei Wang}

Abstract

With the widespread availability of image captioning at a sentence level, how to automatically generate image paragraphs is yet well explored. Describing an image by a full paragraph involves organising sentences orderly, coherently and diversely, inevitably leading higher complexity than by a single sentence. Existing image paragraph captioning methods give a series of sentences to represent the objects and regions of interests, where the descriptions are essentially generated by feeding the image fragments containing objects and regions into conventional image single-sentence captioning models. This strategy is difficult to generate the descriptions that guarantee the stereoscopic hierarchy and non-overlapping objects. In this paper, we propose a Depth-aware Attention Model ( extitDAM ) to generate paragraph captions for images. The depths of image areas are firstly estimated in order to discriminate objects in a range of spatial locations, which can further guide the linguistic decoder to reveal spatial relationships among objects. This model completes the paragraph in a logical and coherent manner. By incorporating the attention mechanism, the learned model swiftly shifts the sentence focus during paragraph generation, whilst avoiding verbose descriptions on a same object. Extensive quantitative experiments and the user study have been conducted on the Visual Genome dataset, which demonstrate the effectiveness and the interpretability of the proposed model.

Benchmarks

BenchmarkMethodologyMetrics
image-paragraph-captioning-on-image-paragraphDepth-aware Attention Model (DAM)
BLEU-4: 6.7
CIDEr: 17.3
METEOR: 13.9

Build AI with AI

From idea to launch — accelerate your AI development with free AI co-coding, out-of-the-box environment and best price of GPUs.

AI Co-coding
Ready-to-use GPUs
Best Pricing
Get Started

Hyper Newsletters

Subscribe to our latest updates
We will deliver the latest updates of the week to your inbox at nine o'clock every Monday morning
Powered by MailChimp
Look Deeper See Richer: Depth-aware Image Paragraph Captioning | Papers | HyperAI