Vid2Seq: Large-Scale Pretraining of a Visual Language Model for Dense Video Captioning

Antoine Yang Arsha Nagrani Paul Hongsuck Seo Antoine Miech Jordi Pont-Tuset Ivan Laptev Josef Sivic Cordelia Schmid

Abstract

In this work, we introduce Vid2Seq, a multi-modal, single-stage dense event captioning model pretrained on narrated videos, which are readily available at scale. The Vid2Seq architecture augments a language model with special time tokens, allowing it to seamlessly predict event boundaries and textual descriptions in the same output sequence. Such a unified model requires large-scale training data, which is not available in current annotated datasets. We show that it is possible to leverage unlabeled narrated videos for dense video captioning by reformulating sentence boundaries of transcribed speech as pseudo event boundaries and using the transcribed speech sentences as pseudo event captions. The resulting Vid2Seq model, pretrained on the YT-Temporal-1B dataset, improves the state of the art on a variety of dense video captioning benchmarks, including YouCook2, ViTT and ActivityNet Captions. Vid2Seq also generalizes well to the tasks of video paragraph captioning and video clip captioning, and to few-shot settings. Our code is publicly available at https://antoyang.github.io/vid2seq.html.
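The pseudo-labeling idea in the abstract (speech sentence boundaries become event boundaries, speech sentences become captions, and everything is emitted as one sequence with time tokens) can be illustrated with a minimal sketch. It assumes timestamps are quantized into a fixed number of bins relative to the video length (100 here, chosen for illustration) and renders time tokens as strings of the form `<time_k>`; the names `Segment`, `time_token` and `build_target` are hypothetical and are not taken from the released Vid2Seq code, which operates on tokenizer IDs rather than strings.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    """A transcribed speech sentence with start/end times in seconds (hypothetical input format)."""
    start: float
    end: float
    text: str


def time_token(t: float, duration: float, num_bins: int = 100) -> str:
    """Quantize an absolute timestamp into one of num_bins relative time tokens (assumed scheme)."""
    if duration <= 0:
        raise ValueError("duration must be positive")
    # Clamp to the video span, then map [0, duration] onto bin indices 0..num_bins-1.
    t = min(max(t, 0.0), duration)
    idx = min(int(t / duration * num_bins), num_bins - 1)
    return f"<time_{idx}>"


def build_target(segments: List[Segment], duration: float, num_bins: int = 100) -> str:
    """Turn speech sentences into one pseudo dense-captioning target sequence.

    Each sentence boundary becomes a pseudo event boundary and the sentence
    text becomes the pseudo caption; events are ordered by start time.
    """
    parts = []
    for seg in sorted(segments, key=lambda s: (s.start, s.end)):
        start_tok = time_token(seg.start, duration, num_bins)
        end_tok = time_token(seg.end, duration, num_bins)
        parts.append(f"{start_tok} {end_tok} {seg.text}")
    return " ".join(parts)


if __name__ == "__main__":
    speech = [
        Segment(5.0, 12.0, "Mix the flour and sugar."),
        Segment(0.0, 4.5, "Preheat the oven."),
    ]
    print(build_target(speech, duration=60.0))
    # prints "<time_0> <time_7> Preheat the oven. <time_8> <time_20> Mix the flour and sugar."
```

Because boundaries and captions share one output sequence, a single text decoder can be trained on such targets without a separate event-proposal stage, which is the property the abstract refers to as single-stage.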

Benchmarks

Benchmark | Method | CIDEr | METEOR | SODA
dense-video-captioning-on-activitynet | Vid2Seq | 28 | 17 | -
dense-video-captioning-on-vitt | Vid2Seq | 43.5 | 8.5 | 0.135
dense-video-captioning-on-youcook2 | Vid2Seq | 47.1 | 9.3 | 7.9
video-captioning-on-msr-vtt-1 | Vid2Seq | 64.6 | 30.8 | -
video-captioning-on-msvd-1 | Vid2Seq | 146.2 | 45.3 | -