6 months ago

Abstract

Visual storytelling aims to generate a narrative paragraph from a sequence of images automatically. Existing approaches construct text description independently for each image and roughly concatenate them as a story, which leads to the problem of generating semantically incoherent content. In this paper, we propose a new way for visual storytelling by introducing a topic description task to detect the global semantic context of an image stream. A story is then constructed with the guidance of the topic description. In order to combine the two generation tasks, we propose a multi-agent communication framework that regards the topic description generator and the story generator as two agents and learn them simultaneously via iterative updating mechanism. We validate our approach on VIST dataset, where quantitative results, ablations, and human evaluation demonstrate our method's good ability in generating stories with higher quality compared to state-of-the-art methods.

Source PDF

Build AI with AI

From idea to launch — accelerate your AI development with free AI co-coding, out-of-the-box environment and best price of GPUs.

AI Co-coding

Ready-to-use GPUs

Best Pricing

Get Started View Pricing

HyperAI Newsletters

Subscribe to our latest updates

We will deliver the latest updates of the week to your inbox at nine o'clock every Monday morning

HyperAI

6 months ago

Image Captioning

Multimodal

Natural Language Processing

Multimodality

Task/Problem

Ruize Wang Zhongyu Wei Ying Cheng Piji Li Haijun Shan Ji Zhang Qi Zhang Xuanjing Huang

Abstract

Source PDF

Build AI with AI

From idea to launch — accelerate your AI development with free AI co-coding, out-of-the-box environment and best price of GPUs.

AI Co-coding

Ready-to-use GPUs

Best Pricing

Get Started View Pricing

HyperAI Newsletters

Subscribe to our latest updates

We will deliver the latest updates of the week to your inbox at nine o'clock every Monday morning

HyperAI

6 months ago

Image Captioning

Multimodal

Natural Language Processing

Multimodality

Task/Problem

Ruize Wang Zhongyu Wei Ying Cheng Piji Li Haijun Shan Ji Zhang Qi Zhang Xuanjing Huang

Abstract

Source PDF

Build AI with AI

From idea to launch — accelerate your AI development with free AI co-coding, out-of-the-box environment and best price of GPUs.

AI Co-coding

Ready-to-use GPUs

Best Pricing

Get Started View Pricing

HyperAI Newsletters

Subscribe to our latest updates

We will deliver the latest updates of the week to your inbox at nine o'clock every Monday morning

Command Palette

Keep it Consistent: Topic-Aware Storytelling from an Image Stream via Iterative Multi-agent Communication

Ruize Wang Zhongyu Wei Ying Cheng Piji Li Haijun Shan Ji Zhang Qi Zhang Xuanjing Huang

Abstract

Build AI with AI

HyperAI Newsletters

Command Palette

Keep it Consistent: Topic-Aware Storytelling from an Image Stream via Iterative Multi-agent Communication

Ruize Wang Zhongyu Wei Ying Cheng Piji Li Haijun Shan Ji Zhang Qi Zhang Xuanjing Huang

Abstract

Build AI with AI

HyperAI Newsletters

Command Palette

Keep it Consistent: Topic-Aware Storytelling from an Image Stream via Iterative Multi-agent Communication

Ruize Wang Zhongyu Wei Ying Cheng Piji Li Haijun Shan Ji Zhang Qi Zhang Xuanjing Huang

Abstract

Build AI with AI

HyperAI Newsletters