Pushing the bounds of dropout

Gábor Melis, Charles Blundell, Tomáš Kočiský, Karl Moritz Hermann, Chris Dyer, Phil Blunsom

Abstract

We show that dropout training is best understood as performing MAP estimation concurrently for a family of conditional models whose objectives are themselves lower bounded by the original dropout objective. This discovery allows us to pick any model from this family after training, which leads to a substantial improvement on regularisation-heavy language modelling. The family includes models that compute a power mean over the sampled dropout masks, and their less stochastic subvariants with tighter and higher lower bounds than the fully stochastic dropout objective. We argue that since the deterministic subvariant's bound is equal to its objective, and the highest amongst these models, the predominant view of it as a good approximation to MC averaging is misleading. Rather, deterministic dropout is the best available approximation to the true objective.
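To make the distinction between MC averaging over dropout masks, power-mean combinations of the per-mask predictions, and the deterministic (weight-scaling) approximation concrete, here is a minimal NumPy sketch. The tiny one-layer model, the shapes, and all names are illustrative assumptions for exposition only; they are not taken from the paper or from the deepmind/lamb code.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical toy model: dropout on the input, then a linear layer + softmax.
d_in, d_out, keep_prob, K = 16, 5, 0.7, 100
W = rng.normal(size=(d_in, d_out))
x = rng.normal(size=(d_in,))

def predict_with_mask(mask):
    # One forward pass under a fixed dropout mask.
    return softmax((x * mask) @ W)

# (a) MC dropout: arithmetic mean of predictions over K sampled masks
#     (inverted-dropout scaling, so the expected mask is all ones).
masks = (rng.random((K, d_in)) < keep_prob).astype(float) / keep_prob
mc_preds = np.stack([predict_with_mask(m) for m in masks])
mc_average = mc_preds.mean(axis=0)

# (b) Power mean over the same per-mask predictions; r=1 recovers the
#     arithmetic (MC) average, r -> 0 approaches the geometric mean.
def power_mean(preds, r):
    return (preds ** r).mean(axis=0) ** (1.0 / r)

geometric_like = power_mean(mc_preds, r=1e-3)

# (c) Deterministic dropout: a single pass with the expected mask,
#     i.e. the usual test-time network.
deterministic = predict_with_mask(np.ones(d_in))

print("MC average:    ", np.round(mc_average, 3))
print("Power mean r~0:", np.round(geometric_like, 3))
print("Deterministic: ", np.round(deterministic, 3))
```

In this sketch the three prediction rules generally disagree; the paper's argument concerns which of them the training objective actually bounds, not which is the better approximation to the MC average.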

Code Repositories

deepmind/lamb (TensorFlow), mentioned in GitHub

Benchmarks

Benchmark: language-modelling-on-penn-treebank-word
Methodology: 2-layer skip-LSTM + dropout tuning
Metrics: Params: 24M; Test perplexity: 55.3; Validation perplexity: 57.1
