DREAM-1K Video Description Benchmark Dataset
DREAM-1K is a video description benchmark dataset released by ByteDance. The accompanying paper is "Tarsier: Recipes for Training and Evaluating Large Video Description Models".
The dataset contains 1,000 annotated video clips, each about 10 seconds long and of varying complexity, drawn from 5 different categories. Every clip contains at least one dynamic event that cannot be accurately identified from a single frame, and is provided with fine-grained manual annotations covering all events, actions, and motions.
Data source categories:
- Live-action movies
- Animated films
- Stock videos
- YouTube long videos
- TikTok-style short videos
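
As a rough illustration of how the clips and their categories might be handled in code, the sketch below defines a hypothetical record type and a per-category tally. The field names (`clip_id`, `category`, `duration_s`, `description`) and the category keys are assumptions made for this example; they are not the official DREAM-1K file format, which should be checked against the released data.

```python
from collections import Counter
from dataclasses import dataclass

# The five source categories listed above. The string keys are illustrative
# assumptions, not identifiers taken from the released dataset.
CATEGORIES = (
    "live_action_movie",
    "animated_film",
    "stock_video",
    "youtube_long_video",
    "tiktok_short_video",
)


@dataclass(frozen=True)
class ClipRecord:
    """Hypothetical record layout, not the official DREAM-1K schema."""

    clip_id: str
    category: str
    duration_s: float
    description: str  # the fine-grained manual annotation for the clip

    def __post_init__(self):
        # Reject categories outside the five listed sources.
        if self.category not in CATEGORIES:
            raise ValueError(f"unknown category: {self.category!r}")
        if self.duration_s <= 0:
            raise ValueError("duration must be positive")


def count_by_category(records):
    """Return the number of clips per category, including empty categories."""
    counts = Counter(r.category for r in records)
    return {c: counts.get(c, 0) for c in CATEGORIES}
```

A tally like this is a quick sanity check after loading: the per-category counts should sum to the 1,000 clips described above.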