DenseFlow-74-10 | 3.35 (different downsampling) | Densely connected normalizing flows | |
2-rectified flow++ (NFE=1) | - | Improving the Training of Rectified Flows | |
Performer (6 layers) | 3.719 | Rethinking Attention with Performers | |
Sparse Transformer 59M (strided) | 3.44 | Generating Long Sequences with Sparse Transformers | |
CD (Diffusion + Distillation, NFE=2) | - | Consistency Models | |
CT (Direct Generation, NFE=1) | - | Consistency Models | |
Efficient-VDVAE | 3.30 (different downsampling) | Efficient-VDVAE: Less is more | |
Gated PixelCNN (van den Oord et al., 2016c) | 3.57 | Conditional Image Generation with PixelCNN Decoders | |