

Knowledgeable Storyteller: A Commonsense-Driven Generative Model for Visual Storytelling

Xu Sun, Zhiyi Yin, Lei Li, Xiaodong He, Pengcheng Yang, Fuli Luo, Peng Chen


Abstract

The visual storytelling (VST) task aims to generate a reasonable and coherent paragraph-level story from an input image stream. Unlike a caption, which is a direct and literal description of image content, a story in the VST task tends to contain many imaginary concepts that do not appear in the images. This requires the AI agent to reason about and associate these imaginary concepts using implicit commonsense knowledge in order to generate a reasonable story describing the image stream. Therefore, in this work, we present a commonsense-driven generative model, which aims to introduce crucial commonsense from an external knowledge base into visual storytelling. Our approach first extracts a set of candidate knowledge graphs from the knowledge base. Then, an elaborately designed vision-aware directional encoding schema is adopted to effectively integrate the most informative commonsense. In addition, we strive to maximize the semantic similarity within the output during decoding to enhance the coherence of the generated text. Results show that our approach outperforms the state-of-the-art systems by a large margin, achieving a 29% relative improvement in CIDEr score. With the additional commonsense and the semantic-relevance-based objective, the generated stories are more diverse and coherent.
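The abstract states only that the model maximizes semantic similarity within its output during decoding to improve coherence. The sketch below shows one plausible reading of that idea, assuming each generated sentence already has a fixed-size embedding: coherence is scored as the mean cosine similarity of adjacent sentences, and candidate stories (for example, beam-search outputs) are reranked by log-likelihood plus a weighted coherence term. The functions `coherence_score` and `rerank` and the `weight` parameter are hypothetical illustrations, not the authors' formulation.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]
# Hypothetical candidate format: (log-likelihood, one embedding per sentence).
Candidate = Tuple[float, List[Vector]]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def coherence_score(sentence_embeddings: List[Vector]) -> float:
    """Mean cosine similarity between each pair of adjacent sentence embeddings.

    Hypothetical stand-in for the paper's semantic-relevance objective.
    """
    if len(sentence_embeddings) < 2:
        return 0.0
    sims = [
        cosine_similarity(prev, curr)
        for prev, curr in zip(sentence_embeddings, sentence_embeddings[1:])
    ]
    return sum(sims) / len(sims)


def rerank(candidates: List[Candidate], weight: float = 0.5) -> List[Candidate]:
    """Sort candidates by log-likelihood + weight * coherence, best first.

    `weight` is an assumed trade-off parameter; 0.0 gives plain likelihood ranking.
    """
    return sorted(
        candidates,
        key=lambda c: c[0] + weight * coherence_score(c[1]),
        reverse=True,
    )


if __name__ == "__main__":
    # Two toy candidates: (log-likelihood, per-sentence embeddings).
    coherent = (-10.0, [[1.0, 0.0], [0.9, 0.1], [0.8, 0.2]])
    disjoint = (-9.5, [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]])
    # With weight=1.0 the coherence bonus outweighs the 0.5 likelihood gap.
    best = rerank([coherent, disjoint], weight=1.0)[0]
    print(best is coherent)  # prints True
```

In this sketch, setting `weight` to 0 recovers ranking by likelihood alone, while larger values favor stories whose adjacent sentences stay semantically close, which is the kind of trade-off a coherence objective introduces.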

Benchmarks

Benchmark: visual-storytelling-on-vist
Methodology: K-Storyteller
Metrics:
BLEU-4: 12.8
CIDEr: 12.1
METEOR: 35.2
ROUGE-L: 29.9
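The abstract reports its CIDEr gain as a relative improvement. For reference, relative improvement over a baseline is conventionally computed as below, with $s_{\text{ours}}$ and $s_{\text{base}}$ denoting the proposed system's and the baseline's scores on the same metric. The baseline's CIDEr score is not listed on this page, so the 29% figure cannot be recomputed from these results alone.

```latex
\Delta_{\text{rel}} = \frac{s_{\text{ours}} - s_{\text{base}}}{s_{\text{base}}} \times 100\,\%
```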
