VideoGPT: Video Generation using VQ-VAE and Transformers

Wilson Yan, Yunzhi Zhang, Pieter Abbeel, Aravind Srinivas

Abstract

We present VideoGPT: a conceptually simple architecture for scaling likelihood-based generative modeling to natural videos. VideoGPT uses a VQ-VAE that learns downsampled discrete latent representations of a raw video by employing 3D convolutions and axial self-attention. A simple GPT-like architecture is then used to autoregressively model the discrete latents using spatio-temporal position encodings. Despite the simplicity of the formulation and ease of training, our architecture is able to generate samples competitive with state-of-the-art GAN models for video generation on the BAIR Robot dataset, and to generate high-fidelity natural videos from UCF-101 and the Tumblr GIF Dataset (TGIF). We hope our proposed architecture serves as a reproducible reference for a minimalistic implementation of transformer-based video generation models. Samples and code are available at https://wilson1yan.github.io/videogpt/index.html
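To make the two-stage pipeline described above concrete, the sketch below shows one minimal way to wire it up in PyTorch. This is an illustrative assumption, not the authors' implementation (see the official repository for that): layer sizes, codebook size, sequence length, and module names are placeholders, and the axial self-attention blocks inside the VQ-VAE are omitted for brevity.

```python
import torch
import torch.nn as nn


class Encoder3D(nn.Module):
    """Downsample a raw video in space and time with strided 3D convolutions.
    (VideoGPT also interleaves axial self-attention blocks; omitted here.)"""
    def __init__(self, dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(3, 64, kernel_size=4, stride=2, padding=1), nn.ReLU(),
            nn.Conv3d(64, dim, kernel_size=4, stride=2, padding=1),
        )

    def forward(self, video):            # video: (B, 3, T, H, W)
        return self.net(video)           # latents: (B, dim, T/4, H/4, W/4)


class VectorQuantizer(nn.Module):
    """Nearest-neighbour codebook lookup with a straight-through gradient."""
    def __init__(self, num_codes=1024, dim=256):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, dim)

    def forward(self, z):                # z: (B, dim, T', H', W')
        B, D, T, H, W = z.shape
        flat = z.permute(0, 2, 3, 4, 1).reshape(-1, D)
        idx = torch.cdist(flat, self.codebook.weight).argmin(dim=1)
        quant = self.codebook(idx).view(B, T, H, W, D).permute(0, 4, 1, 2, 3)
        quant = z + (quant - z).detach()           # straight-through estimator
        return quant, idx.view(B, T, H, W)         # discrete latents for stage 2


class LatentPrior(nn.Module):
    """GPT-like autoregressive transformer over the flattened latent indices,
    with a learned spatio-temporal position embedding."""
    def __init__(self, num_codes=1024, dim=512, seq_len=4 * 16 * 16, depth=8):
        super().__init__()
        self.tok = nn.Embedding(num_codes, dim)
        self.pos = nn.Parameter(torch.zeros(seq_len, dim))
        layer = nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, num_layers=depth)
        self.head = nn.Linear(dim, num_codes)

    def forward(self, idx):              # idx: (B, L) discrete latent codes
        L = idx.size(1)
        x = self.tok(idx) + self.pos[:L]
        causal = nn.Transformer.generate_square_subsequent_mask(L).to(idx.device)
        return self.head(self.blocks(x, mask=causal))   # next-code logits


# Toy usage: a 16-frame 64x64 clip becomes a 4x16x16 grid of codes,
# which the prior models as a sequence of 1024 tokens.
video = torch.randn(2, 3, 16, 64, 64)
encoder, vq, prior = Encoder3D(), VectorQuantizer(), LatentPrior()
quant, codes = vq(encoder(video))        # stage 1: discrete latent grid
logits = prior(codes.flatten(1))         # stage 2: autoregressive prior
```

In the actual model the VQ-VAE is trained first with a reconstruction plus codebook loss, and the transformer prior is then trained (optionally conditioned on context frames) to maximize the likelihood of the frozen latent codes; sampling runs the prior autoregressively and decodes the sampled codes back to pixels.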

Code Repositories

wilson1yan/VideoGPT (official, PyTorch), mentioned in GitHub
alescontrela/viper (JAX), mentioned in GitHub
Alescontrela/viper_rl (JAX), mentioned in GitHub

Benchmarks

Benchmark                                          Methodology   Metrics
Video Generation on BAIR Robot Pushing            VideoGPT      FVD: 103.3 (Cond: 1, Pred: 15, Train: 15)
Video Generation on UCF-101 (16 frames, 128x128)  VideoGPT      Inception Score: 24.69
