Hierarchical Scene Graph Encoder-Decoder for Image Paragraph Captioning

Xu Yang, Chongyang Gao, Hanwang Zhang, and Jianfei Cai

Abstract

When we humans tell a long paragraph about an image, we usually first implicitly compose a mental “script” and then comply with it to generate the paragraph. Inspired by this, we render the modern encoder-decoder based image paragraph captioning model such ability by proposing Hierarchical Scene Graph Encoder-Decoder (HSGED) for generating coherent and distinctive paragraphs. In particular, we use the image scene graph as the “script” to incorporate rich semantic knowledge and, more importantly, the hierarchical constraints into the model. Specifically, we design a sentence scene graph RNN (SSG-RNN) to generate sub-graph level topics, which constrain the word scene graph RNN (WSG-RNN) to generate the corresponding sentences. We propose irredundant attention in SSG-RNN to improve the possibility of abstracting topics from rarely described sub-graphs and inheriting attention in WSG-RNN to generate more grounded sentences with the abstracted topics, both of which give rise to more distinctive paragraphs. An efficient sentence-level loss is also proposed for encouraging the sequence of generated sentences to be similar to that of the ground-truth paragraphs. We validate HSGED on Stanford image paragraph dataset and show that it not only achieves a new state-of-the-art 36.02 CIDEr-D, but also generates more coherent and distinctive paragraphs under various metrics.
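The abstract does not give the formula for irredundant attention, so the following is only an illustrative sketch of the general idea: when choosing the topic for each new sentence, sub-graphs that earlier sentences already attended to are down-weighted so that rarely described sub-graphs have a better chance of being selected. The coverage-style penalty, the function names (`irredundant_attention`, `pick_topics`), and the `penalty` parameter are all assumptions made for this example and are not taken from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def irredundant_attention(scores, coverage, penalty=5.0):
    """Attention over sub-graphs with a coverage penalty (assumed form).

    scores   -- raw relevance score per sub-graph
    coverage -- attention mass each sub-graph has received so far
    penalty  -- hypothetical strength of the down-weighting
    """
    adjusted = [s - penalty * c for s, c in zip(scores, coverage)]
    return softmax(adjusted)


def pick_topics(scores, num_sentences, penalty=5.0):
    """Pick one sub-graph index per sentence, accumulating coverage."""
    coverage = [0.0] * len(scores)
    topics = []
    for _ in range(num_sentences):
        weights = irredundant_attention(scores, coverage, penalty)
        # The most attended sub-graph acts as this sentence's topic.
        topics.append(max(range(len(weights)), key=weights.__getitem__))
        # Remember how much attention each sub-graph has received.
        coverage = [c + w for c, w in zip(coverage, weights)]
    return topics


if __name__ == "__main__":
    # Three sub-graphs with toy relevance scores, three sentences.
    print(pick_topics([2.0, 1.0, 0.5], num_sentences=3))
```

In this toy version the topic of the second sentence shifts away from the sub-graph that dominated the first, which is the qualitative behaviour the abstract attributes to irredundant attention. A real model would compute the scores from learned sub-graph embeddings and the SSG-RNN hidden state.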

Benchmarks

Benchmark: image-paragraph-captioning-on-image-paragraph
Methodology: HSGED(SLL)
Metrics: BLEU-4 11.26, CIDEr 36.02, METEOR 18.33
