

Particle Based Stochastic Policy Optimization

Tie-Yan Liu, Tao Qin, Fangyun Wei, Chang Liu, Yuxuan Song, Qiwei Ye


Abstract

Stochastic policies have been widely applied because of their favorable properties for exploration and uncertainty quantification. Modeling the policy distribution through a joint state-action distribution within the exponential family has enabled flexible exploration and the learning of multi-modal policies, and has also brought a probabilistic perspective to deep reinforcement learning (RL). This connection between probabilistic inference and RL makes it possible to leverage advances in probabilistic optimization tools. However, recent efforts are limited to minimizing the reverse KL divergence, which is confidence-seeking and may diminish the benefits of a stochastic policy. To exploit the full potential of stochastic policies and provide more flexible properties, there is a strong motivation to consider different update rules during policy optimization. In this paper, we propose ParPI, a particle-based probabilistic policy optimization framework that supports a broad family of divergences and distances, such as f-divergences and the Wasserstein distance, which can give the learned stochastic policy better probabilistic behavior. Experiments in both online and offline settings demonstrate the effectiveness of the proposed algorithm and illustrate the characteristics of different discrepancy measures for policy optimization.
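The abstract contrasts reverse KL, which it calls confidence-seeking, with other discrepancy measures. The toy sketch below is not ParPI; it only illustrates, under stated assumptions, why the choice of measure matters. It assumes a hypothetical bimodal target policy on a discretized 1-D action grid and restricts the learned policy to a single Gaussian, then grid-searches the Gaussian that minimizes reverse KL, forward KL, and the Wasserstein-1 distance. Both KL directions are f-divergences, written here as D_f(p || q) = E_q[f(p/q)] with f(t) = -log t (reverse) and f(t) = t log t (forward). All names (`x`, `log_p`, `fit`, and so on) are hypothetical.

```python
import numpy as np

# Hypothetical 1-D action space, discretized so every distribution is a vector.
x = np.linspace(-6.0, 6.0, 1201)
dx = x[1] - x[0]


def log_normalize(log_w):
    """Normalize log-weights so that exp(result) sums to 1 on the grid."""
    m = log_w.max()
    return log_w - (m + np.log(np.exp(log_w - m).sum()))


def log_gaussian(mu, sigma):
    """Log-probabilities of a single-Gaussian policy restricted to the grid."""
    return log_normalize(-0.5 * ((x - mu) / sigma) ** 2)


# Hypothetical bimodal target, e.g. a policy proportional to exp(Q(a))
# with two equally good actions at a = -2 and a = +2.
log_p = log_normalize(np.logaddexp(-0.5 * ((x + 2.0) / 0.5) ** 2,
                                   -0.5 * ((x - 2.0) / 0.5) ** 2))


def reverse_kl(log_q):
    """KL(q || p): the f-divergence with f(t) = -log t."""
    return float(np.sum(np.exp(log_q) * (log_q - log_p)))


def forward_kl(log_q):
    """KL(p || q): the f-divergence with f(t) = t log t."""
    return float(np.sum(np.exp(log_p) * (log_p - log_q)))


def wasserstein1(log_q):
    """1-D Wasserstein-1 distance: integral of |F_p - F_q| over the grid."""
    cdf_gap = np.cumsum(np.exp(log_p)) - np.cumsum(np.exp(log_q))
    return float(np.sum(np.abs(cdf_gap)) * dx)


def fit(objective):
    """Grid-search the Gaussian (mu, sigma) that minimizes the objective."""
    candidates = [(objective(log_gaussian(mu, s)), mu, s)
                  for mu in np.linspace(-3.0, 3.0, 61)
                  for s in np.linspace(0.2, 3.0, 57)]
    _, mu, s = min(candidates)
    return mu, s


fits = {name: fit(fn) for name, fn in [("reverse KL", reverse_kl),
                                       ("forward KL", forward_kl),
                                       ("W1", wasserstein1)]}
for name, (mu, s) in fits.items():
    print(f"{name:>10}: mu={mu:+.2f}, sigma={s:.2f}")
```

In this toy setting the reverse-KL fit collapses onto one of the two modes with a narrow spread, while the forward-KL and W1 fits sit between the modes with a wide spread that covers both, which is the kind of behavioral difference the abstract attributes to the choice of discrepancy measure.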

Benchmarks

Benchmark | Methodology | Metrics
offline-rl-on-walker2d | ParPI | D4RL Normalized Score: 151.4
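D4RL normalized scores rescale an agent's raw return so that a random-policy reference maps to 0 and an expert-policy reference maps to 100; a score of 151.4 therefore corresponds to a return well above the expert reference. A minimal sketch of that rescaling follows. The function name and the reference returns are hypothetical placeholders chosen to make the arithmetic easy to follow, not the actual walker2d reference values.

```python
def d4rl_normalized_score(raw_return: float,
                          random_return: float,
                          expert_return: float) -> float:
    """Rescale a raw episode return so that random -> 0 and expert -> 100."""
    return 100.0 * (raw_return - random_return) / (expert_return - random_return)


# Hypothetical reference returns, for illustration only.
RANDOM_RETURN = 0.0
EXPERT_RETURN = 1000.0

print(d4rl_normalized_score(1514.0, RANDOM_RETURN, EXPERT_RETURN))
```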
