Eduardo Blanco-Fernández, Carlos Gutiérrez-Álvarez, Nadia Nasri, Saturnino Maldonado-Bascón, Roberto J. López-Sastre

Abstract
Dense video captioning involves detecting and describing events within video sequences. Traditional methods operate in an offline setting, assuming the entire video is available for analysis. In contrast, in this work we introduce a groundbreaking paradigm: Live Video Captioning (LVC), where captions must be generated for video streams in an online manner. This shift brings unique challenges, including processing partial observations of the events and the need for temporal anticipation of the actions. We formally define the novel problem of LVC and propose evaluation metrics specifically designed for this online scenario, demonstrating their advantages over traditional metrics. To address the novel complexities of LVC, we present a new model that combines deformable transformers with temporal filtering, enabling effective captioning over video streams. Extensive experiments on the ActivityNet Captions dataset validate the proposed approach, showcasing its superior performance in the LVC setting compared to state-of-the-art offline methods. To foster further research, we provide the results of our model and an evaluation toolkit with the new metrics integrated at: https://github.com/gramuah/lvc.
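To make the online setting concrete, here is a minimal, hypothetical sketch of an LVC-style streaming loop: captions are emitted at fixed strides from a bounded buffer of recent frame features, so each caption is produced from a partial observation of the ongoing event. The class, parameter names, and the stub captioning head are all illustrative assumptions; the actual model in the paper uses deformable transformers with temporal filtering and is available in the linked repository.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class LiveCaption:
    t_start: float  # estimated event start (seconds), from the buffered window
    t_emit: float   # stream time at which the caption was emitted
    text: str


class SlidingWindowCaptioner:
    """Toy online captioner: keeps a bounded buffer of recent frame
    features and emits a caption every `stride` frames, mimicking the
    partial-observation constraint of Live Video Captioning."""

    def __init__(self, window: int = 64, stride: int = 16, fps: float = 25.0):
        self.buffer = deque(maxlen=window)
        self.stride = stride
        self.fps = fps
        self.n_seen = 0

    def _describe(self, features) -> str:
        # Placeholder for the captioning head (the paper's model uses
        # deformable transformers with temporal filtering); this stub
        # only reports how much of the stream has been observed.
        return f"caption from {len(features)} buffered frames"

    def push(self, frame_feature):
        """Consume one frame feature; return a LiveCaption when one is due."""
        self.buffer.append(frame_feature)
        self.n_seen += 1
        if self.n_seen % self.stride == 0:
            t_emit = self.n_seen / self.fps
            t_start = (self.n_seen - len(self.buffer)) / self.fps
            return LiveCaption(t_start, t_emit, self._describe(list(self.buffer)))
        return None


# Simulate a stream: integers stand in for per-frame feature embeddings.
captioner = SlidingWindowCaptioner(window=64, stride=16)
for i in range(100):
    cap = captioner.push(frame_feature=i)
    if cap is not None:
        print(f"[t={cap.t_emit:.2f}s] {cap.text}")
```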
Code Repositories
https://github.com/gramuah/lvc
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| live-video-captioning-on-activitynet-captions | LVC | Live Score: 20.81 |
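The exact definition of the Live Score metric is given in the paper and implemented in the evaluation toolkit linked above; the sketch below only illustrates the general idea of online evaluation, i.e., scoring the caption available at each evaluation instant against the ground-truth events active at that instant and averaging over time. The function names are hypothetical, and a toy token-overlap F1 stands in for a real captioning metric such as METEOR.

```python
def token_f1(pred: str, ref: str) -> float:
    """Toy stand-in for a captioning metric such as METEOR."""
    p, r = set(pred.lower().split()), set(ref.lower().split())
    if not p or not r:
        return 0.0
    overlap = len(p & r)
    prec, rec = overlap / len(p), overlap / len(r)
    return 2 * prec * rec / (prec + rec) if prec + rec else 0.0


def live_score(pred_stream, gt_events, eval_times) -> float:
    """Average, over evaluation instants, the best match between the
    latest caption emitted by time t and any ground-truth event active at t.
    pred_stream: list of (t_emit, caption); gt_events: list of (t0, t1, caption)."""
    total = 0.0
    for t in eval_times:
        emitted = [c for t_emit, c in pred_stream if t_emit <= t]
        active = [c for t0, t1, c in gt_events if t0 <= t <= t1]
        if emitted and active:
            total += max(token_f1(emitted[-1], g) for g in active)
    return total / len(eval_times) if eval_times else 0.0


# Usage with made-up captions and one ground-truth event.
preds = [(4.0, "a man starts climbing a wall"), (9.0, "the man reaches the top")]
gt = [(0.0, 10.0, "a man climbs a climbing wall to the top")]
print(f"toy live score: {live_score(preds, gt, eval_times=[2.0, 5.0, 8.0, 10.0]):.3f}")
```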