Visual Storytelling via Predicting Anchor Word Embeddings in the Stories

Bowen Zhang Hexiang Hu Fei Sha

Abstract

We propose a learning model for the task of visual storytelling. The main idea is to predict anchor word embeddings from the images and use the embeddings and the image features jointly to generate narrative sentences. To learn the predictor, we use the embeddings of randomly sampled nouns from the ground-truth stories as the target anchor word embeddings. To narrate a sequence of images, we use the predicted anchor word embeddings and the image features as the joint input to a seq2seq model. In contrast to state-of-the-art methods, the proposed model is simple in design, easy to optimize, and attains the best results on most automatic evaluation metrics. In human evaluation, the method also outperforms competing methods.
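
The data flow described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the linear predictor, the squared-error loss, the concatenation used to form the joint input, and all names and numbers below are assumptions made for illustration, since the abstract does not specify them. The seq2seq decoder itself is omitted.

```python
import random

def sample_anchor_target(story_nouns, embeddings, rng):
    """Pick a random noun from the ground-truth story and return its embedding.
    This mirrors the 'randomly sampled nouns' used as training targets."""
    noun = rng.choice(story_nouns)
    return noun, embeddings[noun]

def predict_anchor(image_feat, weights):
    """Hypothetical predictor: a linear map from image features to an
    anchor word embedding (the abstract does not name the architecture)."""
    return [sum(w * x for w, x in zip(row, image_feat)) for row in weights]

def squared_error(pred, target):
    """Assumed training loss between predicted and target embeddings."""
    return sum((p - t) ** 2 for p, t in zip(pred, target))

def joint_input(image_feat, anchor_emb):
    """Assumed way of combining the two signals: simple concatenation.
    The result would be fed to a seq2seq model (not shown)."""
    return list(image_feat) + list(anchor_emb)

# Toy data: 2-d word embeddings and a 3-d image feature vector.
embeddings = {"dog": [1.0, 0.0], "park": [0.0, 1.0]}
image_feat = [0.5, 0.2, 0.1]
weights = [[1.0, 0.0, 0.0],
           [0.0, 1.0, 1.0]]  # shape: embedding_dim x feature_dim

rng = random.Random(0)
noun, target = sample_anchor_target(["dog", "park"], embeddings, rng)
pred = predict_anchor(image_feat, weights)
loss = squared_error(pred, target)          # would drive predictor training
decoder_input = joint_input(image_feat, pred)
```

At training time the loss compares the prediction with the embedding of a sampled noun; at narration time only `predict_anchor` and `joint_input` would be needed, because the ground-truth nouns are unavailable.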

Benchmarks

Benchmark: visual-storytelling-on-vist
Methodology: StoryAnchor: w/ Predicted Nouns
Metrics:
BLEU-1: 65.1
BLEU-2: 40.0
BLEU-3: 23.4
BLEU-4: 14
CIDEr: 9.9
METEOR: 35.5
ROUGE-L: 30
