5 months ago

Dense Relational Captioning: Triple-Stream Networks for Relationship-Based Captioning

Dong-Jin Kim; Jinsoo Choi; Tae-Hyun Oh; In So Kweon

Abstract

Our goal in this work is to train an image captioning model that generates denser and more informative captions. We introduce "relational captioning," a novel image captioning task that aims to generate multiple captions with respect to relational information between objects in an image. Relational captioning is a framework that is advantageous in both diversity and amount of information, leading to image understanding based on relationships. Part-of-speech (POS, i.e., subject-object-predicate categories) tags can be assigned to every English word. We leverage the POS as a prior to guide the correct sequence of words in a caption. To this end, we propose a multi-task triple-stream network (MTTSNet), which consists of three recurrent units, one for each POS category, and jointly performs POS prediction and captioning. We demonstrate that the proposed model generates more diverse and richer representations than several baselines and competing methods.
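To make the task format concrete, here is a minimal sketch in plain Python. It rests on assumptions that the abstract does not spell out: that one caption is produced per ordered (subject, object) pair of detected objects, and that the joint objective is a weighted sum of a word-prediction loss and a POS-prediction loss. All names (`ordered_object_pairs`, `joint_step_loss`, `pos_weight`) and the toy data are hypothetical and are not taken from the official repository.

```python
import math
from itertools import permutations

# The three POS categories named in the abstract.
POS_CLASSES = ("subject", "predicate", "object")


def ordered_object_pairs(objects):
    """Return every ordered pair of distinct objects.

    Assumption: a relational caption is generated for each ordered
    (subject, object) pair, so (a, b) and (b, a) are different inputs.
    """
    return list(permutations(objects, 2))


def cross_entropy(probs, target_index):
    """Negative log-likelihood of the target class under `probs`."""
    return -math.log(probs[target_index])


def joint_step_loss(word_probs, word_target, pos_probs, pos_target, pos_weight=1.0):
    """Loss for one decoding step: word prediction plus POS prediction.

    Assumption: the two tasks are combined as a weighted sum; the abstract
    only states that they are performed jointly.
    """
    caption_loss = cross_entropy(word_probs, word_target)
    pos_loss = cross_entropy(pos_probs, pos_target)
    return caption_loss + pos_weight * pos_loss


# Toy data, made up for illustration only.
objects = ["man", "horse", "hat"]
pairs = ordered_object_pairs(objects)

# One caption for the pair ("man", "horse"), with a POS class per word.
caption = ["man", "riding", "horse"]
pos_tags = ["subject", "predicate", "object"]

for word, tag in zip(caption, pos_tags):
    print(f"{word}: {tag}")
print(f"{len(pairs)} ordered pairs for {len(objects)} objects")
```

With N detected objects this enumeration yields N(N-1) ordered pairs, which is one way to see why the task produces many more captions per image than captioning each object or region once.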

Code Repositories

Dong-JinKim/DenseRelationalCaptioning
Official
pytorch
Mentioned in GitHub

Benchmarks

Benchmark: relational-captioning-on-relational
Methodology: MTTSNet
Metrics: Image-Level Recall: 34.27
