Recurrent Highway Networks

Julian Georg Zilly; Rupesh Kumar Srivastava; Jan Koutník; Jürgen Schmidhuber

Abstract

Many sequential processing tasks require complex nonlinear transition functions from one step to the next. However, recurrent neural networks with 'deep' transition functions remain difficult to train, even when using Long Short-Term Memory (LSTM) networks. We introduce a novel theoretical analysis of recurrent networks based on Geršgorin's circle theorem that illuminates several modeling and optimization issues and improves our understanding of the LSTM cell. Based on this analysis, we propose Recurrent Highway Networks (RHNs), which extend the LSTM architecture to allow step-to-step transition depths larger than one. Several language modeling experiments demonstrate that the proposed architecture results in powerful and efficient models. On the Penn Treebank corpus, solely increasing the transition depth from 1 to 10 improves word-level perplexity from 90.6 to 65.4 using the same number of parameters. On the larger Wikipedia datasets for character prediction (text8 and enwik8), RHNs outperform all previous results and achieve an entropy of 1.27 bits per character.
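
The abstract does not spell out the recurrence, so the following is only a minimal sketch of what a "transition depth larger than one" might look like in code. It assumes the commonly cited highway-style update, in which each micro-layer mixes a candidate `h` with the running state through a transform gate `t`, and it assumes a coupled carry gate `c = 1 - t`. The function names, parameter names (`W_h`, `R_h`, `b_t`, and so on), and initialization values are hypothetical and chosen for illustration, not taken from the authors' code.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def init_params(n_in, n_hidden, depth, seed=0):
    """Create small random weights (hypothetical names and scales)."""
    rng = np.random.default_rng(seed)
    scale = 0.1
    return {
        "W_h": rng.normal(0.0, scale, (n_hidden, n_in)),
        "W_t": rng.normal(0.0, scale, (n_hidden, n_in)),
        "R_h": rng.normal(0.0, scale, (depth, n_hidden, n_hidden)),
        "R_t": rng.normal(0.0, scale, (depth, n_hidden, n_hidden)),
        "b_h": np.zeros((depth, n_hidden)),
        # Negative gate bias: the layer initially favors carrying the state.
        "b_t": np.full((depth, n_hidden), -2.0),
    }


def rhn_step(x, s_prev, params, depth):
    """One time step made of `depth` stacked highway micro-layers.

    x:      input at this time step, shape (n_in,)
    s_prev: state from the previous time step, shape (n_hidden,)
    """
    s = s_prev
    for l in range(depth):
        # The external input enters only the first micro-layer.
        in_h = params["W_h"] @ x if l == 0 else 0.0
        in_t = params["W_t"] @ x if l == 0 else 0.0
        h = np.tanh(in_h + params["R_h"][l] @ s + params["b_h"][l])
        t = sigmoid(in_t + params["R_t"][l] @ s + params["b_t"][l])
        c = 1.0 - t  # assumed coupled carry gate
        s = h * t + s * c
    return s
```

A usage sketch under the same assumptions: `params = init_params(n_in=3, n_hidden=4, depth=10)` followed by `s = rhn_step(x, s, params, depth=10)` inside a loop over the sequence. In this sketch, changing `depth` changes how many nonlinear micro-layers are applied per time step; whether the parameter count stays fixed, as in the Penn Treebank comparison, depends on how the hidden size is adjusted alongside it.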

Benchmarks

| Benchmark | Methodology | Metrics |
| --- | --- | --- |
| language-modelling-on-enwiki8 | Recurrent Highway Networks | Bits per character (BPC): 1.27; Number of params: 46M |
| language-modelling-on-hutter-prize | Large RHN | Bits per character (BPC): 1.27; Number of params: 46M |
| language-modelling-on-hutter-prize | RHN - depth 5 [zilly2016recurrent] | Bits per character (BPC): 1.31 |
| language-modelling-on-penn-treebank-word | Recurrent Highway Networks | Number of params: 23M; Test perplexity: 65.4; Validation perplexity: 67.9 |
| language-modelling-on-text8 | Large RHN | Bits per character (BPC): 1.27; Number of params: 46M |
