Unsupervised Data Augmentation for Consistency Training

Qizhe Xie; Zihang Dai; Eduard Hovy; Minh-Thang Luong; Quoc V. Le

Abstract

Semi-supervised learning has lately shown much promise in improving deep learning models when labeled data is scarce. Common among recent approaches is the use of consistency training on a large amount of unlabeled data to constrain model predictions to be invariant to input noise. In this work, we present a new perspective on how to effectively noise unlabeled examples and argue that the quality of noising, specifically that produced by advanced data augmentation methods, plays a crucial role in semi-supervised learning. By substituting simple noising operations with advanced data augmentation methods such as RandAugment and back-translation, our method brings substantial improvements across six language and three vision tasks under the same consistency training framework. On the IMDb text classification dataset, with only 20 labeled examples, our method achieves an error rate of 4.20, outperforming the state-of-the-art model trained on 25,000 labeled examples. On a standard semi-supervised learning benchmark, CIFAR-10, our method outperforms all previous approaches and achieves an error rate of 5.43 with only 250 examples. Our method also combines well with transfer learning, e.g., when finetuning from BERT, and yields improvements in the high-data regime, such as ImageNet, whether there is only 10% labeled data or a full labeled set with 1.3M extra unlabeled examples. Code is available at https://github.com/google-research/uda.
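The consistency-training objective described in the abstract combines a supervised cross-entropy term on labeled data with an unsupervised consistency term that matches predictions on an unlabeled example and its augmented version. The following is a minimal NumPy sketch of that combined loss; it omits details of the full UDA recipe such as Training Signal Annealing, confidence masking, and prediction sharpening, and the function and weight names are illustrative, not taken from the official codebase.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def uda_loss(sup_logits, sup_labels, orig_logits, aug_logits, lam=1.0):
    """Simplified UDA objective.

    sup_logits  : model logits on the labeled batch
    sup_labels  : integer class labels for the labeled batch
    orig_logits : logits on unlabeled examples (consistency target,
                  treated as fixed / stop-gradient in practice)
    aug_logits  : logits on the augmented versions of those examples
    lam         : weight on the unsupervised consistency term
    """
    eps = 1e-12
    # Supervised cross-entropy on the labeled batch.
    p_sup = softmax(sup_logits)
    n = sup_logits.shape[0]
    ce = -np.mean(np.log(p_sup[np.arange(n), sup_labels] + eps))
    # KL(p_orig || p_aug): penalize predictions on the augmented
    # input that diverge from those on the original input.
    p_orig = softmax(orig_logits)
    log_p_orig = np.log(p_orig + eps)
    log_p_aug = np.log(softmax(aug_logits) + eps)
    kl = np.mean(np.sum(p_orig * (log_p_orig - log_p_aug), axis=-1))
    return ce + lam * kl
```

When the augmented and original logits agree, the KL term vanishes and the loss reduces to the supervised cross-entropy; stronger augmentations (RandAugment, back-translation) make this consistency constraint more informative than simple noise.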

Code Repositories

tomgoter/nlp_finalproject (tf)
bhacquin/UDA_pytorch (pytorch)
leblancdaniel/paraphraser (tf)
A-Telfer/AugKey
ildoonet/unsupervised-data-augmentation (pytorch)
SanghunYun/UDA_pytorch (pytorch)
rwbfd/OpenCompetitionV2 (pytorch)
PhamNguyen97/TSA_pytorch (pytorch)
ChingHuanChiu/sensitive (tf)
kekmodel/UDA-pytorch (pytorch)
uizard-technologies/realmix (tf)
PaulEmmanuelSotir/DeepCV (pytorch)
SaraAmd/Semi-Supervised-Learning (pytorch)

Benchmarks

Benchmark | Methodology | Metric
image-classification-on-imagenet | ResNet-50 (UDA) | Top 1 Accuracy: 79.04%
semi-supervised-image-classification-on-2 | UDA | Top 5 Accuracy: 88.52
semi-supervised-image-classification-on-cifar | UDA | Percentage error: 5.27
semi-supervised-image-classification-on-svhn | UDA | Accuracy: 97.54
sentiment-analysis-on-amazon-review-full | BERT large finetune UDA | Accuracy: 62.88
sentiment-analysis-on-amazon-review-full | BERT large | Accuracy: 65.83
sentiment-analysis-on-amazon-review-polarity | BERT large | Accuracy: 97.37
sentiment-analysis-on-amazon-review-polarity | BERT large finetune UDA | Accuracy: 96.5
sentiment-analysis-on-imdb | BERT large finetune UDA | Accuracy: 95.8
sentiment-analysis-on-imdb | BERT large | Accuracy: 95.49
sentiment-analysis-on-yelp-binary | BERT large | Error: 1.89
sentiment-analysis-on-yelp-binary | BERT large finetune UDA | Error: 2.05
sentiment-analysis-on-yelp-fine-grained | BERT large finetune UDA | Error: 32.08
sentiment-analysis-on-yelp-fine-grained | BERT large | Error: 29.32
text-classification-on-amazon-2 | BERT Finetune + UDA | Error: 3.5
text-classification-on-amazon-5 | BERT Finetune + UDA | Error: 37.12
text-classification-on-dbpedia | BERT large | Error: 0.68
text-classification-on-dbpedia | BERT large UDA | Error: 1.09
text-classification-on-yelp-2 | BERT Finetune + UDA | Accuracy: 97.95%
text-classification-on-yelp-5 | BERT Finetune + UDA | Accuracy: 67.92%
