
3 months ago

Smooth Exploration for Robotic Reinforcement Learning

Antonin Raffin, Jens Kober, Freek Stulp


Abstract

Reinforcement learning (RL) enables robots to learn skills from interactions with the real world. In practice, the unstructured step-based exploration used in Deep RL, often very successful in simulation, leads to jerky motion patterns on real robots. The resulting shaky behavior causes poor exploration or can even damage the robot. We address these issues by adapting state-dependent exploration (SDE) to current Deep RL algorithms. To enable this adaptation, we propose two extensions to the original SDE: using more general features and re-sampling the noise periodically. Together they yield a new exploration method, generalized state-dependent exploration (gSDE). We evaluate gSDE both in simulation, on PyBullet continuous control tasks, and directly on three different real robots: a tendon-driven elastic robot, a quadruped, and an RC car. The noise sampling interval of gSDE provides a trade-off between performance and smoothness, which allows training directly on the real robots without loss of performance. The code is available at https://github.com/DLR-RM/stable-baselines3.
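The abstract describes gSDE only in words. Assuming the formulation of the gSDE paper, the action is a = μ(s) + ε(s), where the noise ε(s) = θ_ε z(s) is a linear function of policy features z(s) and the noise matrix θ_ε is drawn from a Gaussian and redrawn only every n steps (the original SDE used the raw state as features and sampled once per episode). The pure-Python sketch below illustrates that idea under simplifying assumptions: `GSDENoise`, `mean_abs_step_change` and `compare_smoothness` are hypothetical names, σ is a fixed scalar rather than a learned parameter, and a slowly rotating feature vector stands in for real network features. It compares step-to-step changes of gSDE noise against unstructured Gaussian noise with the same per-step scale.

```python
import math
import random
from typing import List, Sequence, Tuple


class GSDENoise:
    """Minimal sketch of gSDE-style exploration noise (hypothetical helper).

    eps(s) = theta_eps @ z(s), where every entry of theta_eps ~ N(0, sigma^2)
    and theta_eps is re-sampled only once every `sample_freq` calls.
    """

    def __init__(self, feature_dim: int, action_dim: int, sigma: float,
                 sample_freq: int, seed: int = 0) -> None:
        if sample_freq < 1:
            raise ValueError("sample_freq must be >= 1")
        self.feature_dim = feature_dim
        self.action_dim = action_dim
        self.sigma = sigma  # fixed scalar here (assumption); the paper learns sigma
        self.sample_freq = sample_freq
        self.rng = random.Random(seed)
        self.calls_since_sample = 0
        self.theta_eps = self._sample_matrix()

    def _sample_matrix(self) -> List[List[float]]:
        # One Gaussian weight per (action dimension, feature) pair
        return [[self.rng.gauss(0.0, self.sigma) for _ in range(self.feature_dim)]
                for _ in range(self.action_dim)]

    def __call__(self, features: Sequence[float]) -> List[float]:
        # Draw a fresh noise matrix once the current one has been used sample_freq times
        if self.calls_since_sample == self.sample_freq:
            self.theta_eps = self._sample_matrix()
            self.calls_since_sample = 0
        self.calls_since_sample += 1
        # Between re-samples, the noise is a deterministic linear function of the features
        return [sum(w * f for w, f in zip(row, features)) for row in self.theta_eps]


def mean_abs_step_change(seq: List[List[float]]) -> float:
    """Average |eps_t - eps_{t-1}| over steps and action dimensions (a jerkiness proxy)."""
    diffs = [abs(c - p) for prev, cur in zip(seq, seq[1:]) for p, c in zip(prev, cur)]
    return sum(diffs) / len(diffs)


def compare_smoothness(steps: int = 1000, sigma: float = 0.5,
                       sample_freq: int = 64, seed: int = 0) -> Tuple[float, float]:
    # Hypothetical slowly varying features along a smooth trajectory; ||z|| = sqrt(2)
    feats = [[math.cos(0.01 * t), math.sin(0.01 * t), 1.0] for t in range(steps)]
    gsde = GSDENoise(feature_dim=3, action_dim=2, sigma=sigma,
                     sample_freq=sample_freq, seed=seed)
    gsde_seq = [gsde(z) for z in feats]
    # Unstructured step-based noise with the same per-step std as the gSDE noise
    rng = random.Random(seed + 1)
    white_seq = [[rng.gauss(0.0, sigma * math.sqrt(2.0)) for _ in range(2)]
                 for _ in range(steps)]
    return mean_abs_step_change(gsde_seq), mean_abs_step_change(white_seq)


if __name__ == "__main__":
    gsde_jerk, white_jerk = compare_smoothness()
    print(f"gSDE: {gsde_jerk:.3f}  unstructured: {white_jerk:.3f}")
```

Within one sampling interval the gSDE noise only changes as fast as the features do, so the resulting actions stay smooth; larger `sample_freq` values mean fewer jumps, which is the performance/smoothness trade-off the abstract refers to. In the linked stable-baselines3 repository, gSDE is enabled through the `use_sde` and `sde_sample_freq` arguments of algorithms such as PPO and SAC, with `sde_sample_freq` playing the role of `sample_freq` here.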


Benchmarks

Benchmark: continuous control on PyBullet (IDs continuous-control-on-pybullet-ant, -halfcheetah, -hopper, -walker2d). Metric: Return.

Environment  A2C   A2C gSDE  PPO   PPO gSDE  SAC   SAC gSDE  TD3   TD3 gSDE
Ant          1967  2560      2160  2587      2859  3459      2865  3267
HalfCheetah  1652  2028      2254  2760      2883  2850      2687  2578
Hopper       1559  1448      1622  2508      2477  2646      2470  2353
Walker2D     443   694       1238  1776      2215  2341      2106  1989
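To read the benchmark returns as baseline-versus-gSDE pairs, the short script below copies the listed values into a dictionary and computes the percentage change for each environment and algorithm. The names `RETURNS`, `relative_change` and `gsde_improvements` are hypothetical helpers for this illustration; the numbers are exactly those on the page.

```python
from typing import Dict, List, Tuple

# (return without gSDE, return with gSDE), copied from the PyBullet benchmark table
RETURNS: Dict[str, Dict[str, Tuple[int, int]]] = {
    "Ant": {"A2C": (1967, 2560), "PPO": (2160, 2587), "SAC": (2859, 3459), "TD3": (2865, 3267)},
    "HalfCheetah": {"A2C": (1652, 2028), "PPO": (2254, 2760), "SAC": (2883, 2850), "TD3": (2687, 2578)},
    "Hopper": {"A2C": (1559, 1448), "PPO": (1622, 2508), "SAC": (2477, 2646), "TD3": (2470, 2353)},
    "Walker2D": {"A2C": (443, 694), "PPO": (1238, 1776), "SAC": (2215, 2341), "TD3": (2106, 1989)},
}


def relative_change(baseline: int, with_gsde: int) -> float:
    """Percentage change of the gSDE return relative to the baseline return."""
    return 100.0 * (with_gsde - baseline) / baseline


def gsde_improvements(returns: Dict[str, Dict[str, Tuple[int, int]]]) -> List[Tuple[str, str, float]]:
    """List (environment, algorithm, % change) for every pair where gSDE scored higher."""
    return [
        (env, algo, relative_change(base, gsde))
        for env, algos in returns.items()
        for algo, (base, gsde) in algos.items()
        if gsde > base
    ]


if __name__ == "__main__":
    for env, algos in RETURNS.items():
        for algo, (base, gsde) in algos.items():
            print(f"{env:<12} {algo}: {base} -> {gsde} ({relative_change(base, gsde):+.1f}%)")
    print(f"gSDE higher in {len(gsde_improvements(RETURNS))} of 16 pairs")
```

On these single reported values, the gSDE variant has the higher return in 11 of the 16 pairs: PPO with gSDE is higher in all four environments, while TD3 with gSDE is higher only on Ant.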
