PeCo: Perceptual Codebook for BERT Pre-training of Vision Transformers

Xiaoyi Dong, Jianmin Bao, Ting Zhang, Dongdong Chen, Weiming Zhang, Lu Yuan, Dong Chen, Fang Wen, Nenghai Yu, Baining Guo


Abstract

This paper explores a better prediction target for BERT pre-training of vision transformers. We observe that current prediction targets disagree with human perception judgment. This contradiction motivates us to learn a perceptual prediction target. We argue that perceptually similar images should stay close to each other in the prediction target space. We surprisingly find one simple yet effective idea: enforcing perceptual similarity during dVAE training. Moreover, we adopt a self-supervised transformer model for deep feature extraction and show that it works well for calculating perceptual similarity. We demonstrate that such learned visual tokens indeed exhibit better semantic meanings and help pre-training achieve superior transfer performance in various downstream tasks. For example, we achieve 84.5% Top-1 accuracy on ImageNet-1K with a ViT-B backbone, outperforming the competitive method BEiT by +1.3% under the same pre-training epochs. Our approach also achieves significant improvements on object detection and segmentation on COCO and on semantic segmentation on ADE20K. Equipped with a larger ViT-H backbone, we achieve state-of-the-art ImageNet accuracy (88.3%) among methods using only ImageNet-1K data.
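The core idea in the abstract is to add a perceptual term to the dVAE training objective, computed from deep features of a frozen self-supervised vision transformer, so that perceptually similar images map to nearby codes. Below is a minimal sketch of that idea; the names `dvae` and `feat_extractor`, the choice of feature layers, the normalization, and the loss weighting are all assumptions for illustration and are not specified in the abstract.

```python
import torch
import torch.nn.functional as F

def perceptual_loss(feat_extractor, x, x_rec):
    """Deep-feature perceptual distance between an image batch and its
    dVAE reconstruction. `feat_extractor` (hypothetical) is assumed to
    return a list of intermediate feature tensors from a frozen
    self-supervised vision transformer."""
    with torch.no_grad():
        feats_x = feat_extractor(x)       # targets: no gradient needed
    feats_rec = feat_extractor(x_rec)     # gradient flows back into x_rec
    loss = 0.0
    for fx, fr in zip(feats_x, feats_rec):
        # Normalize features before comparing, a common choice in
        # perceptual metrics such as LPIPS (an assumption here).
        fx = F.normalize(fx, dim=1)
        fr = F.normalize(fr, dim=1)
        loss = loss + F.mse_loss(fr, fx)
    return loss

def dvae_step(dvae, feat_extractor, x, lambda_perc=1.0):
    """One hypothetical dVAE training step: a pixel-level reconstruction
    term plus the perceptual term that encourages perceptually similar
    images to stay close in the codebook's target space."""
    x_rec = dvae(x)                       # decode(encode(x))
    rec = F.mse_loss(x_rec, x)            # standard pixel reconstruction
    perc = perceptual_loss(feat_extractor, x, x_rec)
    return rec + lambda_perc * perc       # lambda_perc is an assumed weight
```

The key design point is that the feature extractor stays frozen (its parameters take no gradient updates), while gradients still flow through the reconstruction `x_rec`, so only the dVAE is shaped by the perceptual signal.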

Code Repositories

xyzforever/bevt (PyTorch; mentioned in GitHub)

Benchmarks

Benchmark                                    Methodology             Metrics
image-classification-on-imagenet             PeCo (ViT-H, 224)       Top 1 Accuracy: 87.5%
self-supervised-image-classification-on-1    PeCo (ViT-H/14, 448)    Number of Params: 632M; Top 1 Accuracy: 88.3%

