Disentangling Sources of Risk for Distributional Multi-Agent Reinforcement Learning

Jinwoo Shin, Yung Yi, Junsu Kim, Kyunghwan Son

Abstract

In cooperative multi-agent reinforcement learning, state transitions, rewards, and actions can all induce randomness (or uncertainty) in the observed long-term returns. This randomness stems from two sources of risk: (a) agent-wise risk (i.e., how cooperatively teammates act toward a given agent) and (b) environment-wise risk (i.e., transition stochasticity). Although both sources are important for learning robust agent policies, prior works either do not separate them or address only a single source, which can lead to suboptimal equilibria. In this paper, we propose Disentangled RIsk-sensitive Multi-Agent reinforcement learning (DRIMA), a novel framework capable of disentangling the two risk sources. Our main idea is to separate risk-level leverages (i.e., quantiles) in both centralized training and decentralized execution, using a hierarchical quantile structure and quantile regression. Our experiments demonstrate that DRIMA significantly outperforms prior methods across various scenarios in the StarCraft Multi-Agent Challenge. Notably, DRIMA shows robust performance regardless of reward shaping and exploration schedule, whereas prior methods learn only a suboptimal policy.
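As background for the quantile regression mentioned above, the following is a minimal sketch of the standard quantile (pinball) loss, not DRIMA's implementation. It assumes a made-up list of episode returns and a brute-force search over candidate values; the names `pinball_loss`, `best_quantile_estimate`, `returns`, and `candidates` are hypothetical and chosen only for illustration. The point is that the quantile level tau acts as a risk dial: a low tau targets pessimistic outcomes, a high tau targets optimistic ones.

```python
def pinball_loss(prediction, samples, tau):
    """Average quantile-regression (pinball) loss of one prediction at level tau."""
    total = 0.0
    for y in samples:
        diff = y - prediction
        # Under-estimates are weighted by tau, over-estimates by (1 - tau).
        total += tau * diff if diff >= 0 else (tau - 1.0) * diff
    return total / len(samples)


def best_quantile_estimate(samples, tau, candidates):
    """Return the candidate with the lowest pinball loss (brute-force search)."""
    return min(candidates, key=lambda c: pinball_loss(c, samples, tau))


# Hypothetical returns from nine episodes (illustrative numbers, not from the paper).
returns = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 20.0]
candidates = [x * 0.5 for x in range(0, 41)]  # 0.0, 0.5, ..., 20.0

risk_averse = best_quantile_estimate(returns, 0.25, candidates)   # lower quartile
risk_neutral = best_quantile_estimate(returns, 0.50, candidates)  # median
risk_seeking = best_quantile_estimate(returns, 0.75, candidates)  # upper quartile

print(risk_averse, risk_neutral, risk_seeking)
```

Minimizing this loss at a given tau recovers the corresponding quantile of the sampled returns, which is why a distributional learner can express different risk attitudes simply by choosing which quantile levels it evaluates. How DRIMA arranges separate quantile levels for agent-wise and environment-wise risk is described in the paper itself and is not reproduced here.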

Benchmarks

| Benchmark | Methodology | Median Win Rate (%) |
| --- | --- | --- |
| smac-on-smac-def-armored-parallel | DRIMA | 60.0 |
| smac-on-smac-def-armored-sequential | DRIMA | 100 |
| smac-on-smac-def-infantry-parallel | DRIMA | 100.0 |
| smac-on-smac-def-infantry-sequential | DRIMA | 100 |
| smac-on-smac-def-outnumbered-parallel | DRIMA | 70.0 |
| smac-on-smac-def-outnumbered-sequential | DRIMA | 100 |
| smac-on-smac-off-complicated-parallel | DRIMA | 100 |
| smac-on-smac-off-complicated-sequential | DRIMA | 96.9 |
| smac-on-smac-off-distant-parallel | DRIMA | 95.0 |
| smac-on-smac-off-distant-sequential | DRIMA | 100 |
| smac-on-smac-off-hard-parallel | DRIMA | 80.0 |
| smac-on-smac-off-hard-sequential | DRIMA | 93.8 |
| smac-on-smac-off-near-parallel | DRIMA | 95.0 |
| smac-on-smac-off-near-sequential | DRIMA | 93.8 |
| smac-on-smac-off-superhard-parallel | DRIMA | 0.0 |
| smac-on-smac-off-superhard-sequential | DRIMA | 15.6 |
