Stabilize the Latent Space for Image Autoregressive Modeling: A Unified Perspective

Yongxin Zhu, Bocheng Li, Hang Zhang, Xin Li, Linli Xu, Lidong Bing


Abstract

Latent-based image generative models, such as Latent Diffusion Models (LDMs) and Mask Image Models (MIMs), have achieved notable success in image generation tasks. These models typically leverage reconstructive autoencoders like VQGAN or VAE to encode pixels into a more compact latent space and learn the data distribution in the latent space instead of directly from pixels. However, this practice raises a pertinent question: Is it truly the optimal choice? In response, we begin with an intriguing observation: despite sharing the same latent space, autoregressive models significantly lag behind LDMs and MIMs in image generation. This finding contrasts sharply with the field of NLP, where the autoregressive model GPT has established a commanding presence. To address this discrepancy, we introduce a unified perspective on the relationship between latent space and generative models, emphasizing the stability of the latent space in image generative modeling. Furthermore, we propose a simple but effective discrete image tokenizer to stabilize the latent space for image generative modeling. Experimental results show that image autoregressive modeling with our tokenizer (DiGIT) benefits both image understanding and image generation under the next-token prediction principle, which is inherently straightforward for GPT models but challenging for other generative models. Remarkably, for the first time, a GPT-style autoregressive model for images outperforms LDMs; it also exhibits substantial improvement akin to GPT when scaling up model size. Our findings underscore the potential of an optimized latent space and the integration of discrete tokenization in advancing the capabilities of image generative models. The code is available at https://github.com/DAMO-NLP-SG/DiGIT.
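The abstract describes a two-stage recipe: a discrete tokenizer quantizes image features into codebook indices, and a GPT-style model is then trained with next-token prediction over those indices. A minimal NumPy sketch of both pieces is below; the codebook size, feature dimensions, and function names are illustrative assumptions, not DiGIT's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 16 codebook entries, 8-dim patch features, 4 patches.
codebook = rng.normal(size=(16, 8))
patches = rng.normal(size=(4, 8))

def tokenize(feats, codebook):
    """Discrete tokenizer: assign each patch feature to its nearest
    codebook entry (vector quantization), yielding integer token ids."""
    dists = ((feats[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)

tokens = tokenize(patches, codebook)  # sequence of 4 discrete token ids

def sequence_nll(token_probs, tokens):
    """Next-token prediction objective: the sequence likelihood factorizes
    as p(x) = prod_t p(x_t | x_<t), so training minimizes the summed
    negative log-probability of each token given its prefix.
    token_probs[t] stands in for the model's predicted distribution."""
    return -sum(np.log(token_probs[t][tok]) for t, tok in enumerate(tokens))
```

A model that has learned nothing assigns each of the 16 codes uniform probability, giving a loss of `4 * log(16)` for this 4-token sequence; training drives the loss below that baseline by concentrating mass on the observed tokens.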

Code Repositories

DAMO-NLP-SG/DiGIT (official, PyTorch)

Benchmarks

Benchmark                                     Methodology   Metrics
conditional-image-generation-on-imagenet-2    DiGIT         FID: 3.39; Inception score: 205.96
image-generation-on-imagenet-256x256          DiGIT-0.7B    FID: 3.39; Inception score: 205.96
self-supervised-image-classification-on       DiGIT         Number of Params: 732M; Top-1 Accuracy: 80.3%
