CleanDIFT: Diffusion Features without Noise

Nick Stracke, Stefan Andreas Baumann, Kolja Bauer, Frank Fundel, Björn Ommer


Abstract

Internal features from large-scale pre-trained diffusion models have recently been established as powerful semantic descriptors for a wide range of downstream tasks. Works that use these features generally need to add noise to images before passing them through the model to obtain the semantic features, as the models do not offer the most useful features when given images with little to no noise. We show that this noise has a critical impact on the usefulness of these features that cannot be remedied by ensembling with different random noises. We address this issue by introducing a lightweight, unsupervised fine-tuning method that enables diffusion backbones to provide high-quality, noise-free semantic features. We show that these features readily outperform previous diffusion features by a wide margin in a wide variety of extraction setups and downstream tasks, offering better performance than even ensemble-based methods at a fraction of the cost.
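
The noisy-feature pipeline the abstract refers to can be illustrated with a small sketch. The code below is a hypothetical example, not the authors' implementation: it uses PyTorch and Hugging Face diffusers to encode an image into Stable Diffusion latents, optionally perturbs them with the scheduler to a chosen timestep, and reads a semantic descriptor off an intermediate UNet decoder block via a forward hook. The model id, timestep, and hooked block are illustrative assumptions.

```python
# Hypothetical sketch of diffusion feature extraction with and without added
# noise (DIFT-style). Model id, timestep, and the hooked UNet block are
# illustrative assumptions, not the authors' exact configuration.
import torch
from diffusers import StableDiffusionPipeline, DDPMScheduler

device = "cuda" if torch.cuda.is_available() else "cpu"
pipe = StableDiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-2-1").to(device)
unet, vae = pipe.unet, pipe.vae
scheduler = DDPMScheduler.from_config(pipe.scheduler.config)

# Unconditional text embedding used as the UNet's conditioning input.
tokens = pipe.tokenizer(
    [""], padding="max_length",
    max_length=pipe.tokenizer.model_max_length, return_tensors="pt",
)
prompt_embeds = pipe.text_encoder(tokens.input_ids.to(device))[0]

features = {}

def hook(_module, _inputs, output):
    # Cache the intermediate activation used as the semantic descriptor.
    features["feat"] = output

# Hook one of the UNet decoder blocks (a common choice for semantic features).
handle = unet.up_blocks[1].register_forward_hook(hook)

@torch.no_grad()
def extract_features(image, t=261, add_noise=True):
    """image: (1, 3, H, W) tensor scaled to [-1, 1]; returns the hooked activation."""
    latents = vae.encode(image.to(device)).latent_dist.sample()
    latents = latents * vae.config.scaling_factor
    timestep = torch.tensor([t], device=device)
    if add_noise:
        # Standard noisy extraction: perturb the latent to timestep t.
        latents = scheduler.add_noise(latents, torch.randn_like(latents), timestep)
    unet(latents, timestep, encoder_hidden_states=prompt_embeds)
    return features["feat"]
```

With a CleanDIFT fine-tuned backbone swapped in for the stock UNet, the clean path (add_noise=False) is the one expected to yield the stronger features reported in the benchmarks below.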

Code Repositories

CompVis/cleandift
Official

Benchmarks

| Benchmark | Methodology | Metrics |
|---|---|---|
| semantic-correspondence-on-spair-71k | GeoAware-SC + CleanDIFT (Zero-Shot) | PCK: 70.0 |
| semantic-correspondence-on-spair-71k | SD+DINO + CleanDIFT (Zero-Shot) | PCK: 64.8 |
| semantic-correspondence-on-spair-71k | DIFT + CleanDIFT (Zero-Shot) | PCK: 61.4 |
