Unsupervised Data Augmentation for Consistency Training

Qizhe Xie; Zihang Dai; Eduard Hovy; Minh-Thang Luong; Quoc V. Le

Abstract

Semi-supervised learning has lately shown much promise in improving deep learning models when labeled data is scarce. Common among recent approaches is the use of consistency training on a large amount of unlabeled data to constrain model predictions to be invariant to input noise. In this work, we present a new perspective on how to effectively noise unlabeled examples and argue that the quality of noising, specifically that produced by advanced data augmentation methods, plays a crucial role in semi-supervised learning. By substituting simple noising operations with advanced data augmentation methods such as RandAugment and back-translation, our method brings substantial improvements across six language and three vision tasks under the same consistency training framework. On the IMDb text classification dataset, with only 20 labeled examples, our method achieves an error rate of 4.20, outperforming the state-of-the-art model trained on 25,000 labeled examples. On a standard semi-supervised learning benchmark, CIFAR-10, our method outperforms all previous approaches and achieves an error rate of 5.43 with only 250 examples. Our method also combines well with transfer learning, e.g., when finetuning from BERT, and yields improvements in the high-data regime, such as ImageNet, whether there is only 10% labeled data or a full labeled set with 1.3M extra unlabeled examples. Code is available at https://github.com/google-research/uda.
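The consistency-training objective described in the abstract combines a supervised cross-entropy term on labeled data with an unsupervised consistency term that matches predictions on an unlabeled example and its augmented version. The following is a minimal NumPy sketch of that combined loss; it omits details of the full UDA recipe such as Training Signal Annealing, confidence masking, and prediction sharpening, and the function and weight names are illustrative, not taken from the official codebase.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def uda_loss(sup_logits, sup_labels, orig_logits, aug_logits, lam=1.0):
    """Simplified UDA objective.

    sup_logits  : model logits on the labeled batch
    sup_labels  : integer class labels for the labeled batch
    orig_logits : logits on unlabeled examples (consistency target,
                  treated as fixed / stop-gradient in practice)
    aug_logits  : logits on the augmented versions of those examples
    lam         : weight on the unsupervised consistency term
    """
    eps = 1e-12
    # Supervised cross-entropy on the labeled batch.
    p_sup = softmax(sup_logits)
    n = sup_logits.shape[0]
    ce = -np.mean(np.log(p_sup[np.arange(n), sup_labels] + eps))
    # KL(p_orig || p_aug): penalize predictions on the augmented
    # input that diverge from those on the original input.
    p_orig = softmax(orig_logits)
    log_p_orig = np.log(p_orig + eps)
    log_p_aug = np.log(softmax(aug_logits) + eps)
    kl = np.mean(np.sum(p_orig * (log_p_orig - log_p_aug), axis=-1))
    return ce + lam * kl
```

When the augmented and original logits agree, the KL term vanishes and the loss reduces to the supervised cross-entropy; stronger augmentations (RandAugment, back-translation) make this consistency constraint more informative than simple noise.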

Code Repositories

tomgoter/nlp_finalproject (tf)
bhacquin/UDA_pytorch (pytorch)
leblancdaniel/paraphraser (tf)
A-Telfer/AugKey
ildoonet/unsupervised-data-augmentation (pytorch)
SanghunYun/UDA_pytorch (pytorch)
rwbfd/OpenCompetitionV2 (pytorch)
PhamNguyen97/TSA_pytorch (pytorch)
ChingHuanChiu/sensitive (tf)
kekmodel/UDA-pytorch (pytorch)
uizard-technologies/realmix (tf)
PaulEmmanuelSotir/DeepCV (pytorch)
SaraAmd/Semi-Supervised-Learning (pytorch)

Benchmarks

Benchmark | Methodology | Metric
image-classification-on-imagenet | ResNet-50 (UDA) | Top 1 Accuracy: 79.04%
semi-supervised-image-classification-on-2 | UDA | Top 5 Accuracy: 88.52
semi-supervised-image-classification-on-cifar | UDA | Percentage error: 5.27
semi-supervised-image-classification-on-svhn | UDA | Accuracy: 97.54
sentiment-analysis-on-amazon-review-full | BERT large finetune UDA | Accuracy: 62.88
sentiment-analysis-on-amazon-review-full | BERT large | Accuracy: 65.83
sentiment-analysis-on-amazon-review-polarity | BERT large | Accuracy: 97.37
sentiment-analysis-on-amazon-review-polarity | BERT large finetune UDA | Accuracy: 96.5
sentiment-analysis-on-imdb | BERT large finetune UDA | Accuracy: 95.8
sentiment-analysis-on-imdb | BERT large | Accuracy: 95.49
sentiment-analysis-on-yelp-binary | BERT large | Error: 1.89
sentiment-analysis-on-yelp-binary | BERT large finetune UDA | Error: 2.05
sentiment-analysis-on-yelp-fine-grained | BERT large finetune UDA | Error: 32.08
sentiment-analysis-on-yelp-fine-grained | BERT large | Error: 29.32
text-classification-on-amazon-2 | BERT Finetune + UDA | Error: 3.5
text-classification-on-amazon-5 | BERT Finetune + UDA | Error: 37.12
text-classification-on-dbpedia | BERT large | Error: 0.68
text-classification-on-dbpedia | BERT large UDA | Error: 1.09
text-classification-on-yelp-2 | BERT Finetune + UDA | Accuracy: 97.95%
text-classification-on-yelp-5 | BERT Finetune + UDA | Accuracy: 67.92%
