HyperAIHyperAI

Command Palette

Search for a command to run...

3 months ago

Dual-CNN: A Convolutional language decoder for paragraph image captioning

{Xiaojie Wang Fangxiang Feng Yihui Shi Haoyun Liang Ruifan Li}

Abstract

Abstract The task of paragraph image captioning aims to generate a coherent paragraph describing a given image. However, due to their limited ability to capture long-term dependency, recurrent neural network or long-short term memory based decoders could hardly generate satisfactory textual descriptions with a long paragraph. In addition, the training inefficiency in the sequential decoders is significantly observed. Motivated by the advantage of convolutional neural network (i.e., CNN), in this paper, we propose a Dual-CNN decoder with long-term memory ability and parallel computation, which can produce a semantically coherent paragraph for an image. Our Dual-CNN model is evaluated on the Stanford image-paragraph dataset. Extensive experiments demonstrate that our Dual-CNN achieves comparable results compared with state-of-the-art models. Furthermore, the diversity and coherence of generated paragraphs are analyzed to show the superiority of our approach.

Benchmarks

BenchmarkMethodologyMetrics
image-paragraph-captioning-on-image-paragraphDual-CNN
BLEU-4: 8.6
CIDEr: 17.4
METEOR: 15.8

Build AI with AI

From idea to launch — accelerate your AI development with free AI co-coding, out-of-the-box environment and best price of GPUs.

AI Co-coding
Ready-to-use GPUs
Best Pricing
Get Started

Hyper Newsletters

Subscribe to our latest updates
We will deliver the latest updates of the week to your inbox at nine o'clock every Monday morning
Powered by MailChimp
Dual-CNN: A Convolutional language decoder for paragraph image captioning | Papers | HyperAI