3 months ago

GDI: Rethinking What Makes Reinforcement Learning Different From Supervised Learning

Jiajun Fan Changnan Xiao Yue Huang

Abstract

Deep Q Network (DQN) first opened the door to deep reinforcement learning (DRL) by combining deep learning (DL) with reinforcement learning (RL), and it observed that the distribution of the acquired data changes during training. DQN found that this property can make training unstable, so it proposed effective methods to handle its downside. Instead of focusing on these unfavourable aspects, we argue that it is critical for RL to narrow the gap between the estimated data distribution and the ground-truth data distribution, something supervised learning (SL) fails to do. From this new perspective, we extend the basic paradigm of RL, Generalized Policy Iteration (GPI), into a more general version called Generalized Data Distribution Iteration (GDI). We show that a large number of RL algorithms and techniques can be unified under the GDI paradigm, each of which can be regarded as a special case of GDI. We provide a theoretical proof of why GDI is better than GPI and how it works. We propose several practical algorithms based on GDI to verify its effectiveness and generality. Experiments demonstrate state-of-the-art (SOTA) performance on the Arcade Learning Environment (ALE), where our algorithm achieves a 9620.98% mean human normalized score (HNS), a 1146.39% median HNS, and 22 human world record breakthroughs (HWRB) using only 200M training frames. Our work aims to lead RL research toward conquering human world records and toward truly superhuman agents in both performance and efficiency.
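
The abstract reports three aggregate metrics: mean HNS, median HNS, and HWRB. As a rough illustration of how such aggregates are commonly computed, the sketch below uses the standard human normalized score, 100 * (agent - random) / (human - random), and assumes that a breakthrough means the agent's score strictly exceeds the human world record for that game. The function names, the dictionary layout, and all numbers in `toy_results` are hypothetical placeholders, not values or code from the paper.

```python
from statistics import mean, median


def human_normalized_score(agent, random_score, human_score):
    """HNS in percent: 0% matches random play, 100% matches the human baseline."""
    return 100.0 * (agent - random_score) / (human_score - random_score)


def summarize(results):
    """Aggregate per-game results into mean HNS, median HNS, and HWRB.

    `results` maps a game name to a dict with the keys
    'agent', 'random', 'human', and 'record' (human world record).
    Assumption: a breakthrough is counted when agent > record.
    """
    hns = [
        human_normalized_score(r["agent"], r["random"], r["human"])
        for r in results.values()
    ]
    hwrb = sum(1 for r in results.values() if r["agent"] > r["record"])
    return {"mean_hns": mean(hns), "median_hns": median(hns), "hwrb": hwrb}


# Hypothetical placeholder numbers, chosen only to make the arithmetic easy.
toy_results = {
    "game_a": {"agent": 150.0, "random": 0.0, "human": 100.0, "record": 120.0},
    "game_b": {"agent": 50.0, "random": 10.0, "human": 90.0, "record": 200.0},
    "game_c": {"agent": 1000.0, "random": 0.0, "human": 100.0, "record": 500.0},
}

summary = summarize(toy_results)
print(summary)
```

On the toy data the per-game HNS values are 150%, 50%, and 1000%, so the mean (400%) sits far above the median (150%). This is the usual reason both statistics are reported together: a few games with very large scores can dominate the mean.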

Benchmarks

| Benchmark | Methodology | Metrics |
| --- | --- | --- |
| atari-games-on-atari-2600-beam-rider | GDI-I3 | Score: 162100 |
| atari-games-on-atari-2600-berzerk | GDI-I3 | Score: 7607 |
| atari-games-on-atari-2600-bowling | GDI-I3 | Score: 201.9 |
| atari-games-on-atari-2600-boxing | GDI-H3 | Score: 100 |
| atari-games-on-atari-2600-centipede | GDI-I3 | Score: 155830 |
| atari-games-on-atari-2600-chopper-command | GDI-H3 | Score: 999999 |
| atari-games-on-atari-2600-crazy-climber | GDI-I3 | Score: 201000 |
| atari-games-on-atari-2600-defender | GDI-I3 | Score: 893110 |
| atari-games-on-atari-2600-demon-attack | GDI-I3 | Score: 675530 |
| atari-games-on-atari-2600-double-dunk | GDI-H3 | Score: 24 |
| atari-games-on-atari-2600-enduro | GDI-I3 | Score: 14330 |
| atari-games-on-atari-2600-fishing-derby | GDI-I3 | Score: 59 |
| atari-games-on-atari-2600-freeway | GDI-I3 | Score: 34 |
| atari-games-on-atari-2600-frostbite | GDI-I3 | Score: 10485 |
| atari-games-on-atari-2600-gravitar | GDI-I3 | Score: 5905 |
| atari-games-on-atari-2600-hero | GDI-I3 | Score: 38330 |
| atari-games-on-atari-2600-ice-hockey | GDI-I3 | Score: 44.94 |
| atari-games-on-atari-2600-james-bond | GDI-I3 | Score: 594500 |
| atari-games-on-atari-2600-kangaroo | GDI-I3 | Score: 14500 |
| atari-games-on-atari-2600-krull | GDI-I3 | Score: 97575 |
| atari-games-on-atari-2600-montezumas-revenge | GDI-I3 | Score: 3000 |
| atari-games-on-atari-2600-ms-pacman | GDI-I3 | Score: 11536 |
| atari-games-on-atari-2600-name-this-game | GDI-I3 | Score: 34434 |
| atari-games-on-atari-2600-phoenix | GDI-I3 | Score: 894460 |
| atari-games-on-atari-2600-pitfall | GDI-I3 | Score: 0 |
| atari-games-on-atari-2600-private-eye | GDI-I3 | Score: 15100 |
| atari-games-on-atari-2600-qbert | GDI-I3 | Score: 27800 |
| atari-games-on-atari-2600-road-runner | GDI-I3 | Score: 878600 |
| atari-games-on-atari-2600-robotank | GDI-I3 | Score: 108.2 |
| atari-games-on-atari-2600-seaquest | GDI-I3 | Score: 943910 |
| atari-games-on-atari-2600-skiing | GDI-I3 | Score: -6774 |
| atari-games-on-atari-2600-solaris | GDI-I3 | Score: 11074 |
| atari-games-on-atari-2600-space-invaders | GDI-I3 | Score: 140460 |
| atari-games-on-atari-2600-star-gunner | GDI-I3 | Score: 465750 |
| atari-games-on-atari-2600-surround | GDI-I3 | Score: -7.8 |
| atari-games-on-atari-2600-tennis | GDI-I3 | Score: 24 |
| atari-games-on-atari-2600-time-pilot | GDI-I3 | Score: 216770 |
| atari-games-on-atari-2600-tutankham | GDI-I3 | Score: 423.9 |
| atari-games-on-atari-2600-up-and-down | GDI-I3 | Score: 986440 |
| atari-games-on-atari-57 | GDI-H3 (200M frames) | Human World Record Breakthrough: 22; Mean Human Normalized Score: 9620.98% |
| atari-games-on-atari-57 | GDI-H3 | - |
