3 months ago

Show-1: Marrying Pixel and Latent Diffusion Models for Text-to-Video Generation

Abstract

Significant advancements have been achieved in the realm of large-scale pre-trained text-to-video Diffusion Models (VDMs). However, previous methods either rely solely on pixel-based VDMs, which come with high computational costs, or on latent-based VDMs, which often struggle with precise text-video alignment. In this paper, we are the first to propose a hybrid model, dubbed Show-1, which marries pixel-based and latent-based VDMs for text-to-video generation. Our model first uses pixel-based VDMs to produce a low-resolution video with strong text-video correlation. After that, we propose a novel expert translation method that employs latent-based VDMs to upsample the low-resolution video to high resolution, which can also remove potential artifacts and corruptions from the low-resolution video. Compared to latent VDMs, Show-1 produces high-quality videos with precise text-video alignment; compared to pixel VDMs, Show-1 is much more efficient (GPU memory usage during inference is 15G vs 72G). Furthermore, Show-1 can be readily adapted for motion customization and video stylization applications through simple finetuning of the temporal attention layers. Our model achieves state-of-the-art performance on standard video generation benchmarks. Our code and model weights are publicly available at https://github.com/showlab/Show-1.
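The two-stage flow described in the abstract can be pictured with a small sketch. This is not the authors' implementation: the function names, the `Video` record, the default low resolution, and the upsampling factor are all hypothetical placeholders chosen for illustration, and the two stage functions only track shapes rather than running any diffusion model. The only figures taken from the abstract are the 15G and 72G memory numbers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Video:
    frames: int
    height: int
    width: int
    stage: str  # which stage produced this video: "pixel" or "latent"


def pixel_vdm_generate(prompt: str, frames: int = 8,
                       height: int = 40, width: int = 64) -> Video:
    """Hypothetical stand-in for stage 1: a pixel-based VDM that yields a
    low-resolution video closely aligned with the prompt.
    The default sizes are illustrative assumptions, not from the abstract."""
    if not prompt:
        raise ValueError("prompt must be non-empty")
    return Video(frames, height, width, stage="pixel")


def latent_vdm_upsample(video: Video, prompt: str, scale: int = 4) -> Video:
    """Hypothetical stand-in for stage 2: a latent-based VDM that upsamples
    the low-resolution video (the 'expert translation' step).
    The scale factor is an illustrative assumption."""
    return Video(video.frames, video.height * scale,
                 video.width * scale, stage="latent")


def hybrid_pipeline(prompt: str, scale: int = 4) -> Video:
    """Chain the stages: pixel-based generation, then latent-based upsampling."""
    low_res = pixel_vdm_generate(prompt)
    return latent_vdm_upsample(low_res, prompt, scale=scale)


# Inference memory figures quoted in the abstract (in GB).
PIXEL_VDM_MEMORY_GB = 72
SHOW1_MEMORY_GB = 15
memory_ratio = PIXEL_VDM_MEMORY_GB / SHOW1_MEMORY_GB  # 4.8

result = hybrid_pipeline("a panda eating bamboo")
print(result)
print(f"pixel-only VDM uses about {memory_ratio:.1f}x the memory of the hybrid")
```

The point of the sketch is the division of labour: the expensive pixel-space model only ever works at low resolution, which is where the abstract's memory saving (72G down to 15G, roughly a 4.8x reduction) comes from.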

Code Repositories

showlab/show-1 (official implementation, PyTorch)

Benchmarks

Benchmark: text-to-video-generation-on-evalcrafter-text
Methodology: Show-1
Metrics:
  Motion Quality: 52.19
  Temporal Consistency: 60.83
  Text-to-Video Alignment: 62.07
  Total Score: 229
  Visual Quality: 53.74

Benchmark: text-to-video-generation-on-msr-vtt
Methodology: Show-1
Metrics:
  CLIPSIM: 0.3072
  FID: 13.08
  FVD: 538
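
As a quick sanity check on the EvalCrafter figures, the reported Total Score of 229 matches the rounded sum of the four aspect scores. That relationship is inferred from the numbers listed here, not stated by the listing itself, so treat it as an assumption:

```python
# EvalCrafter aspect scores for Show-1, copied from the listing above.
aspect_scores = {
    "Motion Quality": 52.19,
    "Temporal Consistency": 60.83,
    "Text-to-Video Alignment": 62.07,
    "Visual Quality": 53.74,
}

# Assumption: the Total Score is the sum of the four aspects, rounded.
total = sum(aspect_scores.values())
rounded_total = round(total)

print(f"sum of aspects = {total:.2f}, rounded = {rounded_total}")
```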
