Disentangling Sources of Risk for Distributional Multi-Agent Reinforcement Learning
Kyunghwan Son, Junsu Kim, Yung Yi, Jinwoo Shin

Abstract
In cooperative multi-agent reinforcement learning, state transitions, rewards, and actions can all induce randomness (or uncertainty) in the observed long-term returns. This randomness stems from two risk sources: (a) agent-wise risk (i.e., how cooperatively our teammates act for a given agent) and (b) environment-wise risk (i.e., transition stochasticity). Although both sources are important for learning robust agent policies, prior works either do not separate them or handle only a single risk source, which can lead to suboptimal equilibria. In this paper, we propose Disentangled RIsk-sensitive Multi-Agent reinforcement learning (DRIMA), a novel framework capable of disentangling these risk sources. Our main idea is to separate the risk-level leverages (i.e., quantiles) used in centralized training and decentralized execution via a hierarchical quantile structure and quantile regression. Our experiments demonstrate that DRIMA significantly outperforms prior methods across various scenarios in the StarCraft Multi-Agent Challenge. Notably, DRIMA remains robust to reward shaping and exploration schedules, whereas prior methods learn only suboptimal policies.
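The risk-sensitive component builds on quantile regression over return distributions. Below is a minimal sketch of the standard quantile-Huber regression loss from distributional RL (as in QR-DQN/IQN), not DRIMA's exact implementation; the function name, tensor shapes, and the `kappa` hyperparameter are illustrative assumptions.

```python
import torch

def quantile_huber_loss(pred_quantiles, target_quantiles, taus, kappa=1.0):
    """Generic quantile-regression (Huber) loss used in distributional RL.

    pred_quantiles:   (batch, N) predicted return quantiles at fractions `taus`
    target_quantiles: (batch, M) target return quantiles (e.g., from a target network)
    taus:             (batch, N) quantile fractions in (0, 1)
    """
    # Pairwise TD errors between every target and predicted quantile: (batch, M, N)
    td_errors = target_quantiles.unsqueeze(2) - pred_quantiles.unsqueeze(1)

    # Huber loss applied elementwise to the pairwise errors
    huber = torch.where(
        td_errors.abs() <= kappa,
        0.5 * td_errors.pow(2),
        kappa * (td_errors.abs() - 0.5 * kappa),
    )

    # Asymmetric weight |tau - 1{td_error < 0}| turns the Huber loss into quantile regression
    weight = (taus.unsqueeze(1) - (td_errors.detach() < 0).float()).abs()

    # Sum over predicted quantiles, average over target quantiles and the batch
    return (weight * huber / kappa).sum(dim=2).mean(dim=1).mean()
```

In DRIMA's hierarchical setting, separate quantile levels would be used for the centralized (environment-wise) and decentralized (agent-wise) components; the loss above only illustrates the generic quantile-regression step shared by such methods.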
Benchmarks
| Benchmark | Methodology | Median Win Rate (%) |
|---|---|---|
| smac-on-smac-def-armored-parallel | DRIMA | 60.0 |
| smac-on-smac-def-armored-sequential | DRIMA | 100.0 |
| smac-on-smac-def-infantry-parallel | DRIMA | 100.0 |
| smac-on-smac-def-infantry-sequential | DRIMA | 100.0 |
| smac-on-smac-def-outnumbered-parallel | DRIMA | 70.0 |
| smac-on-smac-def-outnumbered-sequential | DRIMA | 100.0 |
| smac-on-smac-off-complicated-parallel | DRIMA | 100.0 |
| smac-on-smac-off-complicated-sequential | DRIMA | 96.9 |
| smac-on-smac-off-distant-parallel | DRIMA | 95.0 |
| smac-on-smac-off-distant-sequential | DRIMA | 100.0 |
| smac-on-smac-off-hard-parallel | DRIMA | 80.0 |
| smac-on-smac-off-hard-sequential | DRIMA | 93.8 |
| smac-on-smac-off-near-parallel | DRIMA | 95.0 |
| smac-on-smac-off-near-sequential | DRIMA | 93.8 |
| smac-on-smac-off-superhard-parallel | DRIMA | 0.0 |
| smac-on-smac-off-superhard-sequential | DRIMA | 15.6 |