DenseFlow-74-10 | 3.35 (different downsampling) | Densely connected normalizing flows | - |
2-rectified flow++ (NFE=1) | - | Improving the Training of Rectified Flows | - |
Performer (6 layers) | 3.719 | Rethinking Attention with Performers | - |
Sparse Transformer 59M (strided) | 3.44 | Generating Long Sequences with Sparse Transformers | - |
CD (Diffusion + Distillation, NFE=2) | - | Consistency Models | - |
CT (Direct Generation, NFE=1) | - | Consistency Models | - |
Efficient-VDVAE | 3.30 (different downsampling) | Efficient-VDVAE: Less is more | - |
Gated PixelCNN (van den Oord et al., 2016c) | 3.57 | Conditional Image Generation with PixelCNN Decoders | - |