6 months ago

Junsong Chen Chongjian Ge Enze Xie Yue Wu Lewei Yao Xiaozhe Ren Zhongdao Wang Ping Luo Huchuan Lu Zhenguo Li

Abstract

In this paper, we introduce PixArt-Σ, a Diffusion Transformer model~(DiT) capable of directly generating images at 4K resolution. PixArt-Σrepresents a significant advancement over its predecessor, PixArt-α, offering images of markedly higher fidelity and improved alignment with text prompts. A key feature of PixArt-Σis its training efficiency. Leveraging the foundational pre-training of PixArt-α, it evolves from the weaker' baseline to astronger' model via incorporating higher quality data, a process we term "weak-to-strong training". The advancements in PixArt-Σare twofold: (1) High-Quality Training Data: PixArt-Σincorporates superior-quality image data, paired with more precise and detailed image captions. (2) Efficient Token Compression: we propose a novel attention module within the DiT framework that compresses both keys and values, significantly improving efficiency and facilitating ultra-high-resolution image generation. Thanks to these improvements, PixArt-Σachieves superior image quality and user prompt adherence capabilities with significantly smaller model size (0.6B parameters) than existing text-to-image diffusion models, such as SDXL (2.6B parameters) and SD Cascade (5.1B parameters). Moreover, PixArt-Σ's capability to generate 4K images supports the creation of high-resolution posters and wallpapers, efficiently bolstering the production of high-quality visual content in industries such as film and gaming.

Source PDF

Build AI with AI

From idea to launch — accelerate your AI development with free AI co-coding, out-of-the-box environment and best price of GPUs.

AI Co-coding

Ready-to-use GPUs

Best Pricing

Get Started View Pricing

HyperAI Newsletters

Subscribe to our latest updates

We will deliver the latest updates of the week to your inbox at nine o'clock every Monday morning

6 months ago

Junsong Chen Chongjian Ge Enze Xie Yue Wu Lewei Yao Xiaozhe Ren Zhongdao Wang Ping Luo Huchuan Lu Zhenguo Li

Abstract

Source PDF

Build AI with AI

From idea to launch — accelerate your AI development with free AI co-coding, out-of-the-box environment and best price of GPUs.

AI Co-coding

Ready-to-use GPUs

Best Pricing

Get Started View Pricing

HyperAI Newsletters

Subscribe to our latest updates

We will deliver the latest updates of the week to your inbox at nine o'clock every Monday morning

6 months ago

Junsong Chen Chongjian Ge Enze Xie Yue Wu Lewei Yao Xiaozhe Ren Zhongdao Wang Ping Luo Huchuan Lu Zhenguo Li

Abstract

Source PDF

Build AI with AI

From idea to launch — accelerate your AI development with free AI co-coding, out-of-the-box environment and best price of GPUs.

AI Co-coding

Ready-to-use GPUs

Best Pricing

Get Started View Pricing

HyperAI Newsletters

Subscribe to our latest updates

We will deliver the latest updates of the week to your inbox at nine o'clock every Monday morning

Command Palette

PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation

Junsong Chen Chongjian Ge Enze Xie Yue Wu Lewei Yao Xiaozhe Ren Zhongdao Wang Ping Luo Huchuan Lu Zhenguo Li

Abstract

Build AI with AI

HyperAI Newsletters

Command Palette

PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation

Junsong Chen Chongjian Ge Enze Xie Yue Wu Lewei Yao Xiaozhe Ren Zhongdao Wang Ping Luo Huchuan Lu Zhenguo Li

Abstract

Build AI with AI

HyperAI Newsletters

Command Palette

PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation

Junsong Chen Chongjian Ge Enze Xie Yue Wu Lewei Yao Xiaozhe Ren Zhongdao Wang Ping Luo Huchuan Lu Zhenguo Li

Abstract

Build AI with AI

HyperAI Newsletters