Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time

Abstract

The conventional recipe for maximizing model accuracy is to (1) train multiple models with various hyperparameters and (2) pick the individual model which performs best on a held-out validation set, discarding the remainder. In this paper, we revisit the second step of this procedure in the context of fine-tuning large pre-trained models, where fine-tuned models often appear to lie in a single low error basin. We show that averaging the weights of multiple models fine-tuned with different hyperparameter configurations often improves accuracy and robustness. Unlike a conventional ensemble, we may average many models without incurring any additional inference or memory costs -- we call the results "model soups." When fine-tuning large pre-trained models such as CLIP, ALIGN, and a ViT-G pre-trained on JFT, our soup recipe provides significant improvements over the best model in a hyperparameter sweep on ImageNet. The resulting ViT-G model, which attains 90.94% top-1 accuracy on ImageNet, achieved a new state of the art. Furthermore, we show that the model soup approach extends to multiple image classification and natural language processing tasks, improves out-of-distribution performance, and improves zero-shot performance on new downstream tasks. Finally, we analytically relate the performance similarity of weight-averaging and logit-ensembling to flatness of the loss and confidence of the predictions, and validate this relation empirically. Code is available at https://github.com/mlfoundations/model-soups.
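The core recipe is a weight-space average of fine-tuned checkpoints that share an architecture and pre-trained initialization. The sketch below illustrates a uniform soup and the paper's greedy variant in PyTorch; the `evaluate` callback and checkpoint paths are hypothetical placeholders, not the authors' released code (see mlfoundations/model-soups for the official implementation).

```python
# Minimal sketch of model soups, assuming all checkpoints come from
# fine-tuning the same pre-trained architecture.
import torch

def uniform_soup(state_dicts):
    """Element-wise average of a list of state_dicts."""
    soup = {}
    for key, first in state_dicts[0].items():
        if torch.is_floating_point(first):
            soup[key] = torch.stack([sd[key] for sd in state_dicts]).mean(dim=0)
        else:
            # Integer buffers (e.g. BatchNorm step counters) are copied, not averaged.
            soup[key] = first.clone()
    return soup

def greedy_soup(state_dicts, evaluate):
    """Greedy soup: rank checkpoints by individual held-out accuracy, then
    add each one to the average only if it does not hurt held-out accuracy.
    `evaluate` is a hypothetical callable: state_dict -> validation accuracy."""
    ranked = sorted(state_dicts, key=evaluate, reverse=True)
    ingredients = [ranked[0]]
    best_acc = evaluate(uniform_soup(ingredients))
    for sd in ranked[1:]:
        candidate = uniform_soup(ingredients + [sd])
        acc = evaluate(candidate)
        if acc >= best_acc:  # keep the ingredient only if it helps
            ingredients.append(sd)
            best_acc = acc
    return uniform_soup(ingredients)

# Hypothetical usage with checkpoints from a fine-tuning hyperparameter sweep:
# state_dicts = [torch.load(p, map_location="cpu") for p in checkpoint_paths]
# model.load_state_dict(uniform_soup(state_dicts))  # single model, no extra inference cost
```

Unlike a logit ensemble, the result is a single set of weights, so inference and memory costs are identical to those of one fine-tuned model.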

Code Repositories

shallowlearn/sportsreid (PyTorch), mentioned in GitHub
Burf/ModelSoups (TensorFlow), mentioned in GitHub
facebookresearch/ModelRatatouille (PyTorch), mentioned in GitHub
flowritecom/flow-merge (PyTorch), mentioned in GitHub
mlfoundations/model-soups (official, PyTorch), mentioned in GitHub

Benchmarks

Benchmark | Methodology | Metrics
domain-generalization-on-imagenet-a | Model soups (BASIC-L) | Top-1 accuracy %: 94.17
domain-generalization-on-imagenet-a | Model soups (ViT-G/14) | Top-1 accuracy %: 92.67
domain-generalization-on-imagenet-r | Model soups (ViT-G/14) | Top-1 error rate: 4.54
domain-generalization-on-imagenet-r | Model soups (BASIC-L) | Top-1 error rate: 3.90
domain-generalization-on-imagenet-sketch | Model soups (ViT-G/14) | Top-1 accuracy: 74.24
domain-generalization-on-imagenet-sketch | Model soups (BASIC-L) | Top-1 accuracy: 77.18
image-classification-on-imagenet | Model soups (ViT-G/14) | Number of params: 1843M; Top-1 accuracy: 90.94%
image-classification-on-imagenet | Model soups (BASIC-L) | Number of params: 2440M; Top-1 accuracy: 90.98%
image-classification-on-imagenet-real | Model soups (ViT-G/14) | Params: 1843M; Accuracy: 91.20%
image-classification-on-imagenet-real | Model soups (BASIC-L) | Params: 2440M; Accuracy: 91.03%
image-classification-on-imagenet-real | Baseline (ViT-G/14) | Accuracy: 91.78%
image-classification-on-imagenet-v2 | Model soups (ViT-G/14) | Top-1 accuracy: 84.22
image-classification-on-imagenet-v2 | Model soups (BASIC-L) | Top-1 accuracy: 84.63
image-classification-on-objectnet | Baseline (ViT-G/14) | Top-1 accuracy: 79.03
image-classification-on-objectnet | Model soups (ViT-G/14) | Top-1 accuracy: 78.52
unsupervised-domain-adaptation-on-imagenet-r | Model soups (ViT-G/14) | Top-1 error: 4.54