Scott Reed; Aäron van den Oord; Nal Kalchbrenner; Sergio Gómez Colmenarejo; Ziyu Wang; Dan Belov; Nando de Freitas

Abstract
PixelCNN achieves state-of-the-art results in density estimation for natural images. Although training is fast, inference is costly, requiring one network evaluation per pixel: O(N) evaluations for N pixels. This can be sped up by caching activations, but still involves generating each pixel sequentially. In this work, we propose a parallelized PixelCNN that allows more efficient inference by modeling certain pixel groups as conditionally independent. Our new PixelCNN model achieves competitive density estimation and orders of magnitude speedup (O(log N) sampling instead of O(N)), enabling the practical generation of 512x512 images. We evaluate the model on class-conditional image generation, text-to-image synthesis, and action-conditional video generation, showing that our model achieves the best results among non-pixel-autoregressive density models that allow efficient sampling.
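To make the O(N) versus O(log N) claim concrete, the sketch below counts sequential sampling steps under a simplified cost model. It assumes a small base image is sampled pixel by pixel and that each doubling of resolution then costs a fixed number of parallel group-sampling steps. The function names and the values `base_side=4` and `steps_per_doubling=3` are illustrative assumptions, not figures taken from the abstract.

```python
def sequential_steps_autoregressive(side: int) -> int:
    """Fully sequential sampling: one network evaluation per pixel."""
    return side * side


def sequential_steps_multiscale(side: int,
                                base_side: int = 4,
                                steps_per_doubling: int = 3) -> int:
    """Hypothetical cost model for coarse-to-fine parallel sampling.

    base_side and steps_per_doubling are assumed values for illustration.
    The base image is sampled pixel by pixel; every doubling of the
    resolution then adds a constant number of sequential steps, because
    all pixels within a group are sampled in parallel.
    """
    if side < base_side or side % base_side != 0:
        raise ValueError("side must be a multiple of base_side")
    ratio = side // base_side
    if ratio & (ratio - 1) != 0:
        raise ValueError("side / base_side must be a power of two")
    doublings = ratio.bit_length() - 1  # log2(ratio) for a power of two
    return base_side * base_side + steps_per_doubling * doublings


if __name__ == "__main__":
    for side in (32, 64, 512):
        print(side,
              sequential_steps_autoregressive(side),
              sequential_steps_multiscale(side))
```

Under these assumptions the number of sequential steps grows with the number of doublings, which is logarithmic in the pixel count, while the fully autoregressive count grows linearly in N.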
Benchmarks
| Benchmark | Method | Metric |
|---|---|---|
| image-compression-on-imagenet32 | MS-PixelCNN | Bits per subpixel (bpsp): 3.95 |
| image-generation-on-imagenet-64x64 | Parallel Multiscale | Bits per dim: 3.7 |
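For readers unfamiliar with the metric, bits per dim is conventionally the negative log-likelihood of an image divided by its number of dimensions (height × width × channels) and converted from nats to bits. The sketch below shows that conversion; the helper name and the example likelihood are hypothetical and are not results from the paper.

```python
import math


def bits_per_dim(nll_nats: float, height: int, width: int, channels: int = 3) -> float:
    """Convert a per-image negative log-likelihood in nats to bits per dimension."""
    num_dims = height * width * channels
    return nll_nats / (num_dims * math.log(2))


if __name__ == "__main__":
    # Hypothetical NLL for one 64x64 RGB image, chosen only to show the arithmetic.
    example_nll_nats = 3.7 * (64 * 64 * 3) * math.log(2)
    print(bits_per_dim(example_nll_nats, 64, 64))
```

Lower values are better, since they mean the model assigns higher likelihood to the data.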