Diverse Video Captioning by Adaptive Spatio-temporal Attention

Zohreh Ghaderi, Leonard Salewski, Hendrik P. A. Lensch

Abstract

To generate proper captions for videos, the inference needs to identify relevant concepts and pay attention to the spatial relationships between them as well as to the temporal development in the clip. Our end-to-end encoder-decoder video captioning framework incorporates two transformer-based architectures, an adapted transformer for a single joint spatio-temporal video analysis as well as a self-attention-based decoder for advanced text generation. Furthermore, we introduce an adaptive frame selection scheme to reduce the number of required incoming frames while maintaining the relevant content when training both transformers. Additionally, we estimate semantic concepts relevant for video captioning by aggregating all ground truth captions of each sample. Our approach achieves state-of-the-art results on the MSVD, as well as on the large-scale MSR-VTT and the VATEX benchmark datasets considering multiple Natural Language Generation (NLG) metrics. Additional evaluations on diversity scores highlight the expressiveness and diversity in the structure of our generated captions.
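The abstract's idea of estimating semantic concepts by aggregating all ground truth captions of a sample can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `aggregate_concepts`, the stop-word list, and the `min_fraction` threshold are all assumptions made for illustration. The sketch simply keeps the words that appear in a sufficient share of a sample's reference captions.

```python
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (an assumption, not from the paper).
STOP_WORDS = frozenset(
    {"a", "an", "the", "is", "are", "in", "on", "of", "and", "to", "with"}
)


def aggregate_concepts(captions, min_fraction=0.5):
    """Return the words found in at least `min_fraction` of the captions.

    Illustrative only: the paper's actual concept estimation may differ.
    """
    counts = Counter()
    for caption in captions:
        # Count each word at most once per caption.
        words = {w.strip(".,!?").lower() for w in caption.split()}
        counts.update(w for w in words if w and w not in STOP_WORDS)
    threshold = min_fraction * len(captions)
    return sorted(w for w, c in counts.items() if c >= threshold)


# Made-up reference captions for a single video sample.
captions = [
    "A man is playing a guitar.",
    "A man plays the guitar on stage.",
    "Someone is playing guitar.",
]
concepts = aggregate_concepts(captions)
print(concepts)
```

Raising `min_fraction` keeps only the concepts that nearly all annotators mention, while lowering it admits rarer words; a real pipeline would likely also lemmatize, so that "plays" and "playing" count as one concept.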

Code Repositories

zohrehghaderi/vasta (official, PyTorch)

Benchmarks

Benchmark                      Method                     BLEU-4  CIDEr   METEOR  ROUGE-L
video-captioning-on-msr-vtt-1  VASTA (Vatex-backbone)     44.21   56.08   30.24   62.9
video-captioning-on-msr-vtt-1  VASTA (Kinetics-backbone)  43.4    55      30.2    62.5
video-captioning-on-msvd-1     VASTA (Vatex-backbone)     59.2    119.7   40.65   76.7
video-captioning-on-msvd-1     VASTA (Kinetics-backbone)  56.1    106.4   39.1    74.5
video-captioning-on-vatex-1    VASTA (Kinetics-backbone)  36.25   65.07   25.32   51.88
