Taming Transformers for High-Resolution Image Synthesis

Patrick Esser, Robin Rombach, Björn Ommer

Abstract

Designed to learn long-range interactions on sequential data, transformers continue to show state-of-the-art results on a wide variety of tasks. In contrast to CNNs, they contain no inductive bias that prioritizes local interactions. This makes them expressive, but also computationally infeasible for long sequences, such as high-resolution images. We demonstrate how combining the effectiveness of the inductive bias of CNNs with the expressivity of transformers enables them to model and thereby synthesize high-resolution images. We show how to (i) use CNNs to learn a context-rich vocabulary of image constituents, and in turn (ii) utilize transformers to efficiently model their composition within high-resolution images. Our approach is readily applied to conditional synthesis tasks, where both non-spatial information, such as object classes, and spatial information, such as segmentations, can control the generated image. In particular, we present the first results on semantically-guided synthesis of megapixel images with transformers and obtain the state of the art among autoregressive models on class-conditional ImageNet. Code and pretrained models can be found at https://github.com/CompVis/taming-transformers .
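
To make the two-stage recipe from the abstract concrete, here is a minimal, illustrative PyTorch sketch (not the official CompVis/taming-transformers implementation). A small convolutional autoencoder with a vector-quantized bottleneck plays the role of stage (i), learning a discrete vocabulary of image constituents, and a causally masked transformer over the resulting code indices plays the role of stage (ii), modeling their composition. All module names and sizes (VQBottleneck, ToyVQAutoencoder, ToyCodeTransformer, 512 codes, 32x32 toy inputs) are hypothetical choices made for the example.

```python
# Illustrative sketch of the two-stage idea: VQ autoencoder, then an
# autoregressive transformer over discrete code indices. Not the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class VQBottleneck(nn.Module):
    """Nearest-neighbour vector quantization of encoder features."""
    def __init__(self, num_codes=512, dim=64):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, dim)

    def forward(self, z):                                   # z: (B, C, H, W)
        b, c, h, w = z.shape
        flat = z.permute(0, 2, 3, 1).reshape(-1, c)         # (B*H*W, C)
        idx = torch.cdist(flat, self.codebook.weight).argmin(dim=1)
        z_q = self.codebook(idx).view(b, h, w, c).permute(0, 3, 1, 2)
        z_q = z + (z_q - z).detach()                        # straight-through gradient
        return z_q, idx.view(b, h * w)


class ToyVQAutoencoder(nn.Module):
    """Stage (i): a CNN learns a compact, discrete vocabulary of image patches."""
    def __init__(self, num_codes=512, dim=64):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv2d(3, dim, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(dim, dim, 4, stride=2, padding=1),
        )
        self.quant = VQBottleneck(num_codes, dim)
        self.dec = nn.Sequential(
            nn.ConvTranspose2d(dim, dim, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(dim, 3, 4, stride=2, padding=1),
        )

    def forward(self, x):
        z_q, idx = self.quant(self.enc(x))
        return self.dec(z_q), idx


class ToyCodeTransformer(nn.Module):
    """Stage (ii): autoregressive transformer over the code-index sequence."""
    def __init__(self, num_codes=512, dim=128, seq_len=64):
        super().__init__()
        self.tok = nn.Embedding(num_codes, dim)
        self.pos = nn.Embedding(seq_len, dim)
        layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(dim, num_codes)

    def forward(self, idx):                                 # idx: (B, L)
        L = idx.shape[1]
        h = self.tok(idx) + self.pos(torch.arange(L, device=idx.device))
        # causal mask: -inf above the diagonal blocks attention to future codes
        mask = torch.triu(torch.full((L, L), float("-inf"), device=idx.device), diagonal=1)
        return self.head(self.blocks(h, mask=mask))         # next-code logits


if __name__ == "__main__":
    x = torch.randn(2, 3, 32, 32)                           # toy 32x32 images
    ae = ToyVQAutoencoder()
    recon, idx = ae(x)                                      # idx: (2, 64) discrete codes
    gpt = ToyCodeTransformer(seq_len=idx.shape[1])
    logits = gpt(idx)                                       # (2, 64, 512)
    loss = F.cross_entropy(logits[:, :-1].reshape(-1, 512), idx[:, 1:].reshape(-1))
    print(recon.shape, logits.shape, float(loss))
```

In the paper itself, stage one is a VQGAN trained with perceptual and patch-based adversarial losses rather than the plain reconstruction setup implied here, and stage two is a GPT-style decoder; the straight-through estimator above is the standard trick for passing gradients through the discrete codebook lookup.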

Code Repositories

joanrod/ocr-vqgan (PyTorch)
dome272/VQGAN (PyTorch)
xiaoiker/meta_dpm (PyTorch)
YvanG/VQGAN-CLIP (PyTorch)
hyn2028/llm-cxr (PyTorch)
joh-fischer/PlantLDM (PyTorch)
samb-t/unleashing-transformers (PyTorch)
v-iashin/SpecVQGAN (PyTorch)
dome272/vqgan-pytorch (PyTorch)
CompVis/taming-transformers (official, PyTorch)

Benchmarks

Benchmark | Methodology | Metrics
deepfake-detection-on-fakeavceleb-1 | VQGAN | AP: 55.0; ROC AUC: 51.8
image-generation-on-celeba-256x256 | VQGAN | FID: 10.2
image-generation-on-celeba-hq-256x256 | VQGAN+Transformer | FID: 10.2
image-generation-on-ffhq-256-x-256 | VQGAN+Transformer | FID: 9.6
image-generation-on-imagenet-256x256 | VQGAN+Transformer (k=600, p=1.0, a=0.05) | FID: 5.2
image-generation-on-imagenet-256x256 | VQGAN+Transformer (k=mixed, p=1.0, a=0.005) | FID: 6.59
image-outpainting-on-lhqc | Taming | Block-FID (Right Extend): 22.53; Block-FID (Down Extend): 26.38; Block-FID (Left Extend): -; Block-FID (Up Extend): -
image-reconstruction-on-imagenet | Taming-VQGAN (16x16) | FID: 3.64; LPIPS: 0.177; PSNR: 19.93; SSIM: 0.542
image-to-image-translation-on-ade20k-labels | VQGAN+Transformer | FID: 35.5
image-to-image-translation-on-coco-stuff | VQGAN+Transformer | FID: 22.4
text-to-image-generation-on-conceptual | VQ-GAN | FID: 28.86
text-to-image-generation-on-lhqc | Taming | Block-FID: 38.89
