5 months ago

Deep ViT Features as Dense Visual Descriptors

Shir Amir; Yossi Gandelsman; Shai Bagon; Tali Dekel

Abstract

We study the use of deep features extracted from a pretrained Vision Transformer (ViT) as dense visual descriptors. We observe and empirically demonstrate that such features, when extracted from a self-supervised ViT model (DINO-ViT), exhibit several striking properties, including: (i) the features encode powerful, well-localized semantic information, at high spatial granularity, such as object parts; (ii) the encoded semantic information is shared across related, yet different object categories, and (iii) positional bias changes gradually throughout the layers. These properties allow us to design simple methods for a variety of applications, including co-segmentation, part co-segmentation and semantic correspondences. To distill the power of ViT features from convoluted design choices, we restrict ourselves to lightweight zero-shot methodologies (e.g., binning and clustering) applied directly to the features. Since our methods require no additional training nor data, they are readily applicable across a variety of domains. We show by extensive qualitative and quantitative evaluation that our simple methodologies achieve competitive results with recent state-of-the-art supervised methods, and outperform previous unsupervised methods by a large margin. Code is available at dino-vit-features.github.io.
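To make the idea of a lightweight, training-free operation on dense descriptors concrete, here is a minimal sketch of mutual nearest-neighbour matching under cosine similarity, one simple way to propose point correspondences between two images. It is an illustration only and is not taken from the authors' code: the descriptors are random stand-in arrays rather than real DINO-ViT features, and the function names `l2_normalize` and `mutual_nearest_neighbors` are hypothetical.

```python
import numpy as np


def l2_normalize(x, eps=1e-8):
    """Scale each row to unit length so dot products equal cosine similarity."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def mutual_nearest_neighbors(desc_a, desc_b):
    """Return (i, j) index pairs where desc_a[i] and desc_b[j] are each
    other's nearest neighbour under cosine similarity.

    desc_a: array of shape (Na, D), one descriptor per patch of image A.
    desc_b: array of shape (Nb, D), one descriptor per patch of image B.
    """
    sim = l2_normalize(desc_a) @ l2_normalize(desc_b).T  # (Na, Nb)
    nn_ab = sim.argmax(axis=1)  # best match in B for every patch of A
    nn_ba = sim.argmax(axis=0)  # best match in A for every patch of B
    idx_a = np.arange(desc_a.shape[0])
    mutual = nn_ba[nn_ab] == idx_a  # keep pairs that agree in both directions
    return np.stack([idx_a[mutual], nn_ab[mutual]], axis=1)


# Stand-in data (assumption): image B holds a shuffled, slightly noisy copy
# of image A's descriptors, so the true correspondence is known.
rng = np.random.default_rng(0)
desc_a = rng.normal(size=(6, 16))
perm = np.array([3, 0, 5, 1, 4, 2])
desc_b = desc_a[perm] + 0.01 * rng.normal(size=(6, 16))

pairs = mutual_nearest_neighbors(desc_a, desc_b)
for i, j in pairs:
    print(f"patch {i} in A <-> patch {j} in B")
```

With real features, `desc_a` and `desc_b` would be replaced by per-patch descriptors extracted from the two images; the matching step itself needs no training or additional data.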

Code Repositories

shiramir/dino-vit-features
pytorch
Mentioned in GitHub

Benchmarks

Benchmark | Methodology | Metrics
feature-upsampling-on-imagenet | Strided | Average Drop: 11.48; Average Increase: 4.97
