Alexander C. Li; Mihir Prabhudesai; Shivam Duggal; Ellis Brown; Deepak Pathak

Abstract
The recent wave of large-scale text-to-image diffusion models has dramatically increased our text-based image generation abilities. These models can generate realistic images for a staggering variety of prompts and exhibit impressive compositional generalization abilities. Almost all use cases thus far have solely focused on sampling; however, diffusion models can also provide conditional density estimates, which are useful for tasks beyond image generation. In this paper, we show that the density estimates from large-scale text-to-image diffusion models like Stable Diffusion can be leveraged to perform zero-shot classification without any additional training. Our generative approach to classification, which we call Diffusion Classifier, attains strong results on a variety of benchmarks and outperforms alternative methods of extracting knowledge from diffusion models. Although a gap remains between generative and discriminative approaches on zero-shot recognition tasks, our diffusion-based approach has significantly stronger multimodal compositional reasoning ability than competing discriminative approaches. Finally, we use Diffusion Classifier to extract standard classifiers from class-conditional diffusion models trained on ImageNet. Our models achieve strong classification performance using only weak augmentations and exhibit qualitatively better "effective robustness" to distribution shift. Overall, our results are a step toward using generative over discriminative models for downstream tasks. Results and visualizations at https://diffusion-classifier.github.io/
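The abstract describes turning a diffusion model's conditional density estimates into a classifier. Below is a minimal sketch of that general idea under stated assumptions; it is not the authors' implementation. `toy_eps_model` is a hypothetical class-conditional noise predictor: a closed-form Bayes-optimal denoiser for Gaussian classes that stands in for a trained network. The linear β schedule is an assumed DDPM-style default. The per-class score is the unweighted ε-prediction error, averaged over randomly sampled timesteps and noise draws, and serves as a proxy for the negative log-likelihood of the image under each class. The class with the lowest error is predicted.

```python
import numpy as np


def make_schedule(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Assumed linear DDPM beta schedule; returns cumulative products alpha_bar_t."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)


def toy_eps_model(x_t, alpha_bar, class_mean, prior_var=1.0):
    """Hypothetical class-conditional noise predictor (stand-in for a real network).

    If x0 ~ N(class_mean, prior_var * I) and x_t = sqrt(a) x0 + sqrt(1 - a) eps,
    the Bayes-optimal prediction is
        E[eps | x_t] = sqrt(1 - a) / (a * prior_var + 1 - a) * (x_t - sqrt(a) * class_mean).
    """
    a = alpha_bar
    scale = np.sqrt(1.0 - a) / (a * prior_var + 1.0 - a)
    return scale * (x_t - np.sqrt(a) * class_mean)


def diffusion_classify(x0, class_means, alpha_bars, num_samples=64, seed=0):
    """Predict the class whose conditional denoising error on x0 is lowest."""
    rng = np.random.default_rng(seed)
    # Draw (t, eps) pairs once and reuse them for every class, so differences
    # between class scores are not swamped by independent Monte Carlo noise.
    ts = rng.integers(0, len(alpha_bars), size=num_samples)
    eps = rng.standard_normal((num_samples, x0.shape[0]))

    errors = np.zeros(len(class_means))
    for c, mu in enumerate(class_means):
        for t, e in zip(ts, eps):
            a = alpha_bars[t]
            x_t = np.sqrt(a) * x0 + np.sqrt(1.0 - a) * e  # forward noising
            e_hat = toy_eps_model(x_t, a, mu)             # class-conditioned prediction
            errors[c] += np.sum((e - e_hat) ** 2)
    errors /= num_samples

    # Lower expected denoising error stands in for higher likelihood p(x | c).
    return int(np.argmin(errors)), errors


if __name__ == "__main__":
    alpha_bars = make_schedule()
    means = [np.array([0.0, 0.0]), np.array([5.0, 5.0]), np.array([-5.0, 5.0])]
    pred, errors = diffusion_classify(np.array([4.8, 5.1]), means, alpha_bars)
    print(pred, errors)
```

Sharing the same timesteps and noise across classes is a deliberate design choice in this sketch. Only the relative ordering of class scores matters, and common random numbers make that comparison far less noisy than independent draws per class. In a text-to-image setting, each `class_mean` would instead be a text prompt such as "a photo of a {class}", fed to a real conditional denoiser.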
Benchmarks
| Benchmark | Method | Metric | Value |
|---|---|---|---|
| domain-generalization-on-imagenet-a | Diffusion Classifier | Top-1 accuracy (%) | 30.2 |
| fine-grained-image-classification-on-fgvc | Diffusion Classifier (zero-shot) | Accuracy (%) | 26.4 |
| image-classification-on-cifar-10 | Diffusion Classifier (zero-shot) | Percentage correct | 88.5 |
| image-classification-on-flowers-102 | Diffusion Classifier (zero-shot) | Per-class accuracy (%) | 66.3 |
| image-classification-on-imagenet | Diffusion Classifier | Top-1 accuracy (%) | 79.1 |
| image-classification-on-objectnet-imagenet | Diffusion Classifier | Top-1 accuracy (%) | 33.9 |
| image-classification-on-objectnet-imagenet | Diffusion Classifier (zero-shot) | Top-1 accuracy (%) | 43.4 |
| image-classification-on-oxford-iiit-pets-1 | Diffusion Classifier (zero-shot) | Per-class accuracy (%) | 87.3 |
| image-classification-on-stl-10 | Diffusion Classifier (zero-shot) | Percentage correct | 95.4 |
| visual-reasoning-on-winoground | Diffusion Classifier (zero-shot) | Text score | 34.00 |
| zero-shot-transfer-image-classification-on-1 | Diffusion Classifier (zero-shot) | Accuracy (Private) | 61.4 |
| zero-shot-transfer-image-classification-on-17 | Diffusion Classifier (zero-shot) | Top-1 accuracy (%) | 77.7 |