ImageNet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness

Robert Geirhos; Patricia Rubisch; Claudio Michaelis; Matthias Bethge; Felix A. Wichmann; Wieland Brendel

Abstract

Convolutional Neural Networks (CNNs) are commonly thought to recognise objects by learning increasingly complex representations of object shapes. Some recent studies suggest a more important role of image textures. We here put these conflicting hypotheses to a quantitative test by evaluating CNNs and human observers on images with a texture-shape cue conflict. We show that ImageNet-trained CNNs are strongly biased towards recognising textures rather than shapes, which is in stark contrast to human behavioural evidence and reveals fundamentally different classification strategies. We then demonstrate that the same standard architecture (ResNet-50) that learns a texture-based representation on ImageNet is able to learn a shape-based representation instead when trained on "Stylized-ImageNet", a stylized version of ImageNet. This provides a much better fit for human behavioural performance in our well-controlled psychophysical lab setting (nine experiments totalling 48,560 psychophysical trials across 97 observers) and comes with a number of unexpected emergent benefits such as improved object detection performance and previously unseen robustness towards a wide range of image distortions, highlighting advantages of a shape-based representation.
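
The cue-conflict test described in the abstract can be summarised as a single number, the shape bias reported in the benchmark table. The sketch below is a minimal illustration rather than the authors' evaluation code. It assumes shape bias is the fraction of shape-consistent decisions among all decisions that match either the shape class or the texture class of a cue-conflict image. The function name `shape_bias`, the trial tuple layout, and the toy labels are hypothetical.

```python
from typing import Iterable, Tuple

# One trial: (predicted class, shape class of the image, texture class of the image).
Trial = Tuple[str, str, str]


def shape_bias(trials: Iterable[Trial]) -> float:
    """Fraction of shape decisions among shape-or-texture decisions.

    Assumption: predictions that match neither cue are ignored, and images
    whose shape and texture classes coincide are not cue-conflict stimuli.
    """
    shape_hits = 0
    texture_hits = 0
    for predicted, shape_label, texture_label in trials:
        if shape_label == texture_label:
            continue  # no conflict between the cues, skip this image
        if predicted == shape_label:
            shape_hits += 1
        elif predicted == texture_label:
            texture_hits += 1
    decided = shape_hits + texture_hits
    if decided == 0:
        raise ValueError("no prediction matched either the shape or the texture class")
    return shape_hits / decided


# Toy data (invented for illustration, not taken from the paper).
toy_trials = [
    ("cat", "cat", "elephant"),         # shape decision
    ("elephant", "cat", "elephant"),    # texture decision
    ("elephant", "bottle", "elephant"), # texture decision
    ("clock", "bear", "clock"),         # texture decision
    ("dog", "car", "clock"),            # matches neither cue, ignored
]
print(shape_bias(toy_trials))  # 0.25
```

Under this definition, a value near 1 means the classifier mostly follows shape and a value near 0 means it mostly follows texture.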

Code Repositories

Repository	Tags
rgeirhos/texture-vs-shape	pytorch
facebookresearch/augmentation-corruption	pytorch, mentioned in GitHub
annstrange/breast-cancer-cnn	mentioned in GitHub
rgeirhos/Stylized-ImageNet	official, pytorch, mentioned in GitHub
mbuet2ner/local-global-features-cnn	pytorch, mentioned in GitHub
LiYingwei/ShapeTextureDebiasedTraining	pytorch, mentioned in GitHub
frank-roesler/Image_Segmentation	pytorch, mentioned in GitHub

Benchmarks

Benchmark	Methodology	Metrics
domain-generalization-on-imagenet-a	Stylized ImageNet (ResNet-50)	Top-1 accuracy %: 2.3
domain-generalization-on-imagenet-c	Stylized ImageNet (ResNet-50)	mean Corruption Error (mCE): 69.3
domain-generalization-on-imagenet-r	Stylized ImageNet (ResNet-50)	Top-1 Error Rate: 58.5
domain-generalization-on-vizwiz	ResNet-50 (SIN)	Accuracy - All Images: 25.3; Accuracy - Clean Images: 30; Accuracy - Corrupted Images: 20.4
domain-generalization-on-vizwiz	ResNet-50 (SIN_IN_IN)	Accuracy - All Images: 39.2; Accuracy - Clean Images: 44.6; Accuracy - Corrupted Images: 32.4
domain-generalization-on-vizwiz	ResNet-50 (SIN_IN)	Accuracy - All Images: 38.2; Accuracy - Clean Images: 42.7; Accuracy - Corrupted Images: 32.5
object-recognition-on-shape-bias	ResNet-50	shape bias: 22.1
object-recognition-on-shape-bias	GoogLeNet	shape bias: 31.2
object-recognition-on-shape-bias	VGG-16	shape bias: 17.2
object-recognition-on-shape-bias	AlexNet	shape bias: 42.9
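
For readers unfamiliar with the mean Corruption Error (mCE) reported in the ImageNet-C row, the sketch below shows how such a score is commonly computed. It assumes the usual ImageNet-C convention: for each corruption type, the model's error rates summed over severity levels are divided by those of a fixed baseline model (AlexNet in the ImageNet-C benchmark), and the resulting ratios are averaged and expressed as a percentage. The function name and the toy numbers are hypothetical and do not reproduce any value in the table.

```python
from typing import Dict, List


def mean_corruption_error(
    model_errors: Dict[str, List[float]],
    baseline_errors: Dict[str, List[float]],
) -> float:
    """Average of baseline-normalised corruption errors, in percent.

    Each dict maps a corruption name to its error rates, one per severity level.
    Assumption: both dicts cover the same corruption types.
    """
    ratios = []
    for corruption, errors in model_errors.items():
        baseline = baseline_errors[corruption]
        # Normalise by the baseline so hard and easy corruptions are comparable.
        ratios.append(sum(errors) / sum(baseline))
    return 100.0 * sum(ratios) / len(ratios)


# Toy error rates (invented for illustration): the model makes half as many
# errors as the baseline on both corruption types.
toy_model = {"noise": [0.2, 0.4], "blur": [0.3, 0.3]}
toy_baseline = {"noise": [0.4, 0.8], "blur": [0.6, 0.6]}
print(round(mean_corruption_error(toy_model, toy_baseline), 1))  # 50.0
```

Lower is better: a score of 100 means the model is as error-prone under corruption as the baseline, and a score below 100 means it is more robust.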

