Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time

Abstract

The conventional recipe for maximizing model accuracy is to (1) train multiple models with various hyperparameters and (2) pick the individual model which performs best on a held-out validation set, discarding the remainder. In this paper, we revisit the second step of this procedure in the context of fine-tuning large pre-trained models, where fine-tuned models often appear to lie in a single low error basin. We show that averaging the weights of multiple models fine-tuned with different hyperparameter configurations often improves accuracy and robustness. Unlike a conventional ensemble, we may average many models without incurring any additional inference or memory costs -- we call the results "model soups." When fine-tuning large pre-trained models such as CLIP, ALIGN, and a ViT-G pre-trained on JFT, our soup recipe provides significant improvements over the best model in a hyperparameter sweep on ImageNet. The resulting ViT-G model, which attains 90.94% top-1 accuracy on ImageNet, achieved a new state of the art. Furthermore, we show that the model soup approach extends to multiple image classification and natural language processing tasks, improves out-of-distribution performance, and improves zero-shot performance on new downstream tasks. Finally, we analytically relate the performance similarity of weight-averaging and logit-ensembling to flatness of the loss and confidence of the predictions, and validate this relation empirically. Code is available at https://github.com/mlfoundations/model-soups.
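The core recipe is a weight-space average of fine-tuned checkpoints that share an architecture and pre-trained initialization. The sketch below illustrates a uniform soup and the paper's greedy variant in PyTorch; the `evaluate` callback and checkpoint paths are hypothetical placeholders, not the authors' released code (see mlfoundations/model-soups for the official implementation).

```python
# Minimal sketch of model soups, assuming all checkpoints come from
# fine-tuning the same pre-trained architecture.
import torch

def uniform_soup(state_dicts):
    """Element-wise average of a list of state_dicts."""
    soup = {}
    for key, first in state_dicts[0].items():
        if torch.is_floating_point(first):
            soup[key] = torch.stack([sd[key] for sd in state_dicts]).mean(dim=0)
        else:
            # Integer buffers (e.g. BatchNorm step counters) are copied, not averaged.
            soup[key] = first.clone()
    return soup

def greedy_soup(state_dicts, evaluate):
    """Greedy soup: rank checkpoints by individual held-out accuracy, then
    add each one to the average only if it does not hurt held-out accuracy.
    `evaluate` is a hypothetical callable: state_dict -> validation accuracy."""
    ranked = sorted(state_dicts, key=evaluate, reverse=True)
    ingredients = [ranked[0]]
    best_acc = evaluate(uniform_soup(ingredients))
    for sd in ranked[1:]:
        candidate = uniform_soup(ingredients + [sd])
        acc = evaluate(candidate)
        if acc >= best_acc:  # keep the ingredient only if it helps
            ingredients.append(sd)
            best_acc = acc
    return uniform_soup(ingredients)

# Hypothetical usage with checkpoints from a fine-tuning hyperparameter sweep:
# state_dicts = [torch.load(p, map_location="cpu") for p in checkpoint_paths]
# model.load_state_dict(uniform_soup(state_dicts))  # single model, no extra inference cost
```

Unlike a logit ensemble, the result is a single set of weights, so inference and memory costs are identical to those of one fine-tuned model.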

Code Repositories

shallowlearn/sportsreid (PyTorch), mentioned in GitHub
Burf/ModelSoups (TensorFlow), mentioned in GitHub
facebookresearch/ModelRatatouille (PyTorch), mentioned in GitHub
flowritecom/flow-merge (PyTorch), mentioned in GitHub
mlfoundations/model-soups (official, PyTorch), mentioned in GitHub

Benchmarks

Benchmark | Methodology | Metrics
domain-generalization-on-imagenet-a | Model soups (BASIC-L) | Top-1 accuracy %: 94.17
domain-generalization-on-imagenet-a | Model soups (ViT-G/14) | Top-1 accuracy %: 92.67
domain-generalization-on-imagenet-r | Model soups (ViT-G/14) | Top-1 error rate: 4.54
domain-generalization-on-imagenet-r | Model soups (BASIC-L) | Top-1 error rate: 3.90
domain-generalization-on-imagenet-sketch | Model soups (ViT-G/14) | Top-1 accuracy: 74.24
domain-generalization-on-imagenet-sketch | Model soups (BASIC-L) | Top-1 accuracy: 77.18
image-classification-on-imagenet | Model soups (ViT-G/14) | Number of params: 1843M; Top-1 accuracy: 90.94%
image-classification-on-imagenet | Model soups (BASIC-L) | Number of params: 2440M; Top-1 accuracy: 90.98%
image-classification-on-imagenet-real | Model soups (ViT-G/14) | Params: 1843M; Accuracy: 91.20%
image-classification-on-imagenet-real | Model soups (BASIC-L) | Params: 2440M; Accuracy: 91.03%
image-classification-on-imagenet-real | Baseline (ViT-G/14) | Accuracy: 91.78%
image-classification-on-imagenet-v2 | Model soups (ViT-G/14) | Top-1 accuracy: 84.22
image-classification-on-imagenet-v2 | Model soups (BASIC-L) | Top-1 accuracy: 84.63
image-classification-on-objectnet | Baseline (ViT-G/14) | Top-1 accuracy: 79.03
image-classification-on-objectnet | Model soups (ViT-G/14) | Top-1 accuracy: 78.52
unsupervised-domain-adaptation-on-imagenet-r | Model soups (ViT-G/14) | Top-1 error: 4.54