Command Palette
Search for a command to run...
Dual-CNN: A Convolutional language decoder for paragraph image captioning
{Xiaojie Wang Fangxiang Feng Yihui Shi Haoyun Liang Ruifan Li}
Abstract
Abstract The task of paragraph image captioning aims to generate a coherent paragraph describing a given image. However, due to their limited ability to capture long-term dependency, recurrent neural network or long-short term memory based decoders could hardly generate satisfactory textual descriptions with a long paragraph. In addition, the training inefficiency in the sequential decoders is significantly observed. Motivated by the advantage of convolutional neural network (i.e., CNN), in this paper, we propose a Dual-CNN decoder with long-term memory ability and parallel computation, which can produce a semantically coherent paragraph for an image. Our Dual-CNN model is evaluated on the Stanford image-paragraph dataset. Extensive experiments demonstrate that our Dual-CNN achieves comparable results compared with state-of-the-art models. Furthermore, the diversity and coherence of generated paragraphs are analyzed to show the superiority of our approach.
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| image-paragraph-captioning-on-image-paragraph | Dual-CNN | BLEU-4: 8.6 CIDEr: 17.4 METEOR: 15.8 |
Build AI with AI
From idea to launch — accelerate your AI development with free AI co-coding, out-of-the-box environment and best price of GPUs.