6 months ago

Image Generation

Diffusion Model

Method/Architecture

Computer Vision

Yikai Wang Zhouxia Wang Zhonghua Wu Qingyi Tao Kang Liao Chen Change Loy

Abstract

We propose a novel approach to image generation by decomposing an image into a structured sequence, where each element in the sequence shares the same spatial resolution but differs in the number of unique tokens used, capturing different levels of visual granularity. Image generation is carried out through our newly introduced Next Visual Granularity (NVG) generation framework, which generates a visual granularity sequence beginning from an empty image and progressively refines it, from global layout to fine details, in a structured manner. This iterative process encodes a hierarchical, layered representation that offers fine-grained control over the generation process across multiple granularity levels. We train a series of NVG models for class-conditional image generation on the ImageNet dataset and observe clear scaling behavior. Compared to the VAR series, NVG consistently outperforms it in terms of FID scores (3.30 -> 3.03, 2.57 -> 2.44, 2.09 -> 2.06). We also conduct extensive analysis to showcase the capability and potential of the NVG framework. Our code and models will be released.
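The idea of a sequence whose elements share one spatial resolution but use a growing number of unique tokens can be illustrated with a toy sketch. This is not the NVG method: the abstract does not say how the sequence is built, so the uniform quantization below, the function names (`quantize`, `granularity_sequence`, `unique_count`), and the level counts are all assumptions chosen only for illustration.

```python
# Toy illustration only (assumed construction, not the paper's method):
# build a coarse-to-fine sequence of maps that all keep the same height
# and width but allow more distinct values at each stage.

def quantize(grid, n_levels):
    """Replace each value in [0, 1] by the center of one of n_levels equal bins."""
    out = []
    for row in grid:
        new_row = []
        for v in row:
            # Clamp so that v == 1.0 falls into the last bin.
            idx = min(int(v * n_levels), n_levels - 1)
            new_row.append((idx + 0.5) / n_levels)
        out.append(new_row)
    return out


def granularity_sequence(grid, level_counts=(1, 2, 4, 8)):
    """Return one map per entry of level_counts, coarsest first."""
    return [quantize(grid, n) for n in level_counts]


def unique_count(grid):
    """Number of distinct values ("tokens") used in a map."""
    return len({v for row in grid for v in row})


# Hypothetical 4x4 grayscale "image" with intensities in [0, 1].
image = [
    [0.05, 0.10, 0.60, 0.65],
    [0.15, 0.20, 0.70, 0.75],
    [0.30, 0.35, 0.85, 0.90],
    [0.40, 0.45, 0.95, 1.00],
]

stages = granularity_sequence(image)
for stage in stages:
    # Every stage keeps the 4x4 resolution; only the token count changes.
    print(len(stage), len(stage[0]), unique_count(stage))
```

In this toy, the first stage is a single flat value (closest to the "empty image" starting point the abstract mentions) and later stages add detail without changing resolution. A learned model would predict each stage from the previous ones rather than compute it from a known image.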

