Disentangling Sources of Risk for Distributional Multi-Agent Reinforcement Learning
Kyunghwan Son, Junsu Kim, Yung Yi, Jinwoo Shin

Abstract
In cooperative multi-agent reinforcement learning, state transitions, rewards, and actions can all induce randomness (or uncertainty) in the observed long-term returns. This randomness stems from two risk sources: (a) agent-wise risk (i.e., how cooperatively our teammates act for a given agent) and (b) environment-wise risk (i.e., transition stochasticity). Although both sources are important for learning robust agent policies, prior works either do not separate them or handle only a single risk source, which can lead to suboptimal equilibria. In this paper, we propose Disentangled RIsk-sensitive Multi-Agent reinforcement learning (DRIMA), a novel framework capable of disentangling these risk sources. Our main idea is to separate the risk-level leverages (i.e., quantiles) used in centralized training and decentralized execution via a hierarchical quantile structure and quantile regression. Our experiments demonstrate that DRIMA significantly outperforms prior methods across various scenarios in the StarCraft Multi-Agent Challenge. Notably, DRIMA remains robust to reward shaping and exploration schedules, whereas prior methods learn only suboptimal policies.
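The risk-sensitive component builds on quantile regression over return distributions. Below is a minimal sketch of the standard quantile-Huber regression loss from distributional RL (as in QR-DQN/IQN), not DRIMA's exact implementation; the function name, tensor shapes, and the `kappa` hyperparameter are illustrative assumptions.

```python
import torch

def quantile_huber_loss(pred_quantiles, target_quantiles, taus, kappa=1.0):
    """Generic quantile-regression (Huber) loss used in distributional RL.

    pred_quantiles:   (batch, N) predicted return quantiles at fractions `taus`
    target_quantiles: (batch, M) target return quantiles (e.g., from a target network)
    taus:             (batch, N) quantile fractions in (0, 1)
    """
    # Pairwise TD errors between every target and predicted quantile: (batch, M, N)
    td_errors = target_quantiles.unsqueeze(2) - pred_quantiles.unsqueeze(1)

    # Huber loss applied elementwise to the pairwise errors
    huber = torch.where(
        td_errors.abs() <= kappa,
        0.5 * td_errors.pow(2),
        kappa * (td_errors.abs() - 0.5 * kappa),
    )

    # Asymmetric weight |tau - 1{td_error < 0}| turns the Huber loss into quantile regression
    weight = (taus.unsqueeze(1) - (td_errors.detach() < 0).float()).abs()

    # Sum over predicted quantiles, average over target quantiles and the batch
    return (weight * huber / kappa).sum(dim=2).mean(dim=1).mean()
```

In DRIMA's hierarchical setting, separate quantile levels would be used for the centralized (environment-wise) and decentralized (agent-wise) components; the loss above only illustrates the generic quantile-regression step shared by such methods.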
Benchmarks
| Benchmark | Methodology | Median Win Rate (%) |
|---|---|---|
| smac-on-smac-def-armored-parallel | DRIMA | 60.0 |
| smac-on-smac-def-armored-sequential | DRIMA | 100.0 |
| smac-on-smac-def-infantry-parallel | DRIMA | 100.0 |
| smac-on-smac-def-infantry-sequential | DRIMA | 100.0 |
| smac-on-smac-def-outnumbered-parallel | DRIMA | 70.0 |
| smac-on-smac-def-outnumbered-sequential | DRIMA | 100.0 |
| smac-on-smac-off-complicated-parallel | DRIMA | 100.0 |
| smac-on-smac-off-complicated-sequential | DRIMA | 96.9 |
| smac-on-smac-off-distant-parallel | DRIMA | 95.0 |
| smac-on-smac-off-distant-sequential | DRIMA | 100.0 |
| smac-on-smac-off-hard-parallel | DRIMA | 80.0 |
| smac-on-smac-off-hard-sequential | DRIMA | 93.8 |
| smac-on-smac-off-near-parallel | DRIMA | 95.0 |
| smac-on-smac-off-near-sequential | DRIMA | 93.8 |
| smac-on-smac-off-superhard-parallel | DRIMA | 0.0 |
| smac-on-smac-off-superhard-sequential | DRIMA | 15.6 |