HyperAIHyperAI

Command Palette

Search for a command to run...

3 months ago

Training a Large Video Model on a Single Machine in a Day

Yue Zhao Philipp Krähenbühl

Training a Large Video Model on a Single Machine in a Day

Abstract

Videos are big, complex to pre-process, and slow to train on. State-of-the-art large-scale video models are trained on clusters of 32 or more GPUs for several days. As a consequence, academia largely ceded the training of large video models to industry. In this paper, we show how to still train a state-of-the-art video model on a single machine with eight consumer-grade GPUs in a day. We identify three bottlenecks, IO, CPU, and GPU computation, and optimize each. The result is a highly efficient video training pipeline. For comparable architectures, our pipeline achieves higher accuracies with $\frac{1}{8}$ of the computation compared to prior work. Code is available at https://github.com/zhaoyue-zephyrus/AVION.

Code Repositories

zhaoyue-zephyrus/avion
Official
pytorch
Mentioned in GitHub

Benchmarks

BenchmarkMethodologyMetrics
action-recognition-on-epic-kitchens-100Avion (ViT-L)
Action@1: 54.4
Noun@1: 65.4
Verb@1: 73.0

Build AI with AI

From idea to launch — accelerate your AI development with free AI co-coding, out-of-the-box environment and best price of GPUs.

AI Co-coding
Ready-to-use GPUs
Best Pricing
Get Started

Hyper Newsletters

Subscribe to our latest updates
We will deliver the latest updates of the week to your inbox at nine o'clock every Monday morning
Powered by MailChimp