HyperAI

Planning in Stochastic Environments with a Learned Model

David Silver, Thomas K Hubert, Sherjil Ozair, Julian Schrittwieser, Ioannis Antonoglou

Abstract

Model-based reinforcement learning has proven highly successful. However, learning a model in isolation from its use during planning is problematic in complex environments. To date, the most effective techniques have instead combined value-equivalent model learning with powerful tree-search methods. This approach is exemplified by MuZero, which has achieved state-of-the-art performance in a wide range of domains, from board games to visually rich environments, with discrete and continuous action spaces, in online and offline settings. However, previous instantiations of this approach were limited to the use of deterministic models. This limits their performance in environments that are inherently stochastic, partially observed, or so large and complex that they appear stochastic to a finite agent. In this paper we extend this approach to learn and plan with stochastic models. Specifically, we introduce a new algorithm, Stochastic MuZero, that learns a stochastic model incorporating afterstates, and uses this model to perform a stochastic tree search. Stochastic MuZero matched or exceeded the state of the art in a set of canonical single- and multi-agent environments, including 2048 and backgammon, while maintaining the same performance as standard MuZero in the game of Go.
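The afterstate idea can be illustrated with a minimal sketch. An afterstate is the situation immediately after the agent has acted but before the environment's chance event has resolved, so a stochastic transition splits into a deterministic step (state, action) to afterstate, followed by a chance outcome drawn from a distribution over outcomes. The assumptions here are loud ones: the three model functions (`afterstate_dynamics`, `chance_distribution`, `dynamics`, written as φ, σ and g in this sketch's own notation) are hand-coded for a hypothetical toy task rather than learned, and the planner is an exhaustive depth-limited expectimax, not the paper's Monte Carlo tree search. Decision nodes maximize over actions; afterstate (chance) nodes average over outcomes.

```python
from functools import lru_cache

# Hypothetical toy task (not from the paper): move along squares 0..GOAL.
GOAL = 5
GAMMA = 0.9
ACTIONS = ("step", "gamble")


def afterstate_dynamics(state, action):
    """phi(s, a): deterministic effect of the agent's choice, before chance acts."""
    return (state, action)


def chance_distribution(afterstate):
    """sigma(c | as): hand-coded distribution over chance outcomes (an assumption)."""
    _, action = afterstate
    if action == "step":
        return {1: 1.0}          # always advance one square
    return {0: 0.5, 3: 0.5}      # gamble: stay put or jump three squares


def dynamics(afterstate, chance):
    """g(as, c): next state and reward once the chance outcome is known."""
    state, _ = afterstate
    next_state = min(state + chance, GOAL)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward


@lru_cache(maxsize=None)
def state_value(state, depth):
    """Decision node: the agent picks the best action (0 at the goal or horizon)."""
    if state == GOAL or depth == 0:
        return 0.0
    return max(action_value(state, a, depth) for a in ACTIONS)


def action_value(state, action, depth):
    """Q(s, a) is the value of the afterstate phi(s, a)."""
    return afterstate_value(afterstate_dynamics(state, action), depth)


def afterstate_value(afterstate, depth):
    """Chance node: expectation over outcomes c ~ sigma(c | as)."""
    total = 0.0
    for chance, prob in chance_distribution(afterstate).items():
        next_state, reward = dynamics(afterstate, chance)
        total += prob * (reward + GAMMA * state_value(next_state, depth - 1))
    return total


def best_action(state, depth):
    """Greedy action with respect to the depth-limited afterstate values."""
    return max(ACTIONS, key=lambda a: action_value(state, a, depth))
```

With a one-step horizon, `best_action(4, 1)` returns `"step"` (a certain reward of 1.0 versus an expected 0.5), while `best_action(2, 1)` returns `"gamble"`, because only the chance outcome of jumping three squares can reach the goal in one move. Separating the deterministic afterstate from the chance outcome is what lets the planner treat the two kinds of node differently.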
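In a sampled tree search over such a model, each visit to a chance node must pick one outcome to expand. Sampling outcomes at random can leave visit counts far from the model's probabilities when only a few simulations are run. The sketch below shows one deterministic alternative, assumed here for illustration and not necessarily the paper's exact rule: pick the outcome whose visit count lags furthest behind its prior probability, i.e. the maximizer of `prior[c] / (visits[c] + 1)`. The function names and the example prior are hypothetical.

```python
def select_chance_outcome(prior, visits):
    """Pick the outcome whose visit count is most behind its prior probability.

    prior:  dict mapping outcome -> probability (e.g. sigma(c | as) from a model)
    visits: dict mapping outcome -> number of times it has been expanded so far
    Ties go to the first outcome in the prior's iteration order.
    """
    return max(prior, key=lambda c: prior[c] / (visits.get(c, 0) + 1))


def allocate(prior, num_simulations):
    """Run the selection rule repeatedly and record the resulting visit counts."""
    visits = {}
    order = []
    for _ in range(num_simulations):
        c = select_chance_outcome(prior, visits)
        visits[c] = visits.get(c, 0) + 1
        order.append(c)
    return visits, order
```

For example, `allocate({0: 0.5, 1: 0.25, 2: 0.25}, 8)` yields visit counts `{0: 4, 1: 2, 2: 2}`, exactly proportional to the prior, visited in the order `[0, 0, 1, 2, 0, 0, 1, 2]`. The rule is the same divisor scheme as the D'Hondt apportionment method, which keeps counts close to proportional without any randomness.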

Benchmarks

| Benchmark    | Method                     | Average score |
|--------------|----------------------------|---------------|
| 2048-on-2048 | AlphaZero (with simulator) | 500,000       |
| 2048-on-2048 | MuZero                     | 300,000       |
| 2048-on-2048 | Stochastic MuZero          | 500,000       |
