Longterm Wiki

MACPO (Multi-Agent Constrained Policy Optimization)

paper

Authors

Ankita Kushwaha · Kiran Ravish · Preeti Lamba · Pawan Kumar

Credibility Rating

Good (3/5)

Good quality. Reputable source with community review or editorial standards, but less rigorous than peer-reviewed venues.

Rating inherited from publication venue: arXiv

A comprehensive survey of Safe Reinforcement Learning (SafeRL) and Multi-Agent Safe RL, covering Constrained Markov Decision Processes and theoretical foundations essential for developing provably safe AI systems.

Paper Details

Citations
0
0 influential
Year
2025
Methodology
survey

Metadata

arXiv preprint · primary source

Abstract

Safe Reinforcement Learning (SafeRL) is the subfield of reinforcement learning that explicitly deals with safety constraints during the learning and deployment of agents. This survey provides a mathematically rigorous overview of SafeRL formulations based on Constrained Markov Decision Processes (CMDPs) and extensions to Multi-Agent Safe RL (SafeMARL). We review theoretical foundations of CMDPs, covering definitions, constrained optimization techniques, and fundamental theorems. We then summarize state-of-the-art algorithms in SafeRL for single agents, including policy gradient methods with safety guarantees and safe exploration strategies, as well as recent advances in SafeMARL for cooperative and competitive settings. Additionally, we propose five open research problems to advance the field, with three focusing on SafeMARL. Each problem is described with motivation, key challenges, and related prior work. This survey is intended as a technical guide for researchers interested in SafeRL and SafeMARL, highlighting key concepts, methods, and open future research directions.

Summary

This survey provides a comprehensive technical overview of Safe Reinforcement Learning (SafeRL), focusing on Constrained Markov Decision Processes (CMDPs) and their extensions to multi-agent settings (SafeMARL). The paper reviews theoretical foundations of CMDPs, state-of-the-art algorithms for single-agent SafeRL including policy gradient methods with safety guarantees and safe exploration strategies, and recent advances in SafeMARL for both cooperative and competitive scenarios. The authors identify five open research problems to guide future work, with particular emphasis on advancing SafeMARL, making this a technical reference for researchers developing safe learning algorithms.
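
Safe policy-gradient methods of the kind surveyed here are often built on a Lagrangian relaxation of the constrained objective. As a minimal illustrative sketch (not code from the paper; the three-armed bandit, its reward and cost numbers, and the budget `d` are made-up values), alternating policy updates with dual ascent on a multiplier drives expected cost toward the budget:

```python
import numpy as np

# Toy constrained bandit: maximize expected reward subject to
# expected cost <= d, via Lagrangian relaxation (primal-dual ascent).
# All numbers below are hypothetical, chosen for illustration only.
rewards = np.array([1.0, 0.6, 0.2])  # per-arm expected reward
costs = np.array([0.9, 0.4, 0.1])    # per-arm expected cost
d = 0.5                              # cost budget

theta = np.zeros(3)   # softmax policy parameters
lam = 0.0             # Lagrange multiplier for the cost constraint
lr_theta, lr_lam = 0.2, 0.2
avg_pi = np.zeros(3)  # averaged policy (raw primal-dual iterates oscillate)

for t in range(2000):
    pi = np.exp(theta - theta.max())
    pi /= pi.sum()
    f = rewards - lam * costs                 # per-arm Lagrangian value
    theta += lr_theta * pi * (f - pi @ f)     # softmax policy-gradient step
    lam = max(0.0, lam + lr_lam * (pi @ costs - d))  # dual ascent
    if t >= 1000:
        avg_pi += pi / 1000

print(round(float(avg_pi @ costs), 2))  # averaged expected cost, near d
```

The averaged policy mixes the high-reward and mid-reward arms so that expected cost settles near the budget, the same reward-versus-constraint trade-off the surveyed CMDP algorithms resolve at scale.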

Cited by 1 page

| Page | Type | Quality |
| --- | --- | --- |
| Multi-Agent Safety | Approach | 68.0 |

Cached Content Preview

HTTP 200 · Fetched Mar 20, 2026 · 98 KB
[License: CC BY 4.0](https://info.arxiv.org/help/license/index.html#licenses-available)

arXiv:2505.17342v1 \[cs.LG\] 22 May 2025

# A Survey of Safe Reinforcement Learning and Constrained MDPs: A Technical Survey on Single-Agent and Multi-Agent Safety

Ankita Kushwaha, Kiran Ravish, Preeti Lamba, Pawan Kumar
International Institute of Information Technology, Hyderabad

## 1 Introduction

Reinforcement learning (RL) has achieved remarkable success in domains such as games, robotics, and autonomous systems. However, when deploying RL in real-world _safety-critical_ applications (e.g., autonomous driving, healthcare, robotics), it is essential to ensure that the learning agent avoids catastrophic failures or unsafe behaviors (Amodei et al. [2016](https://arxiv.org/html/2505.17342v1#bib.bib6); Garcia and Fernandez [2015](https://arxiv.org/html/2505.17342v1#bib.bib30)). Safe Reinforcement Learning (SafeRL) addresses this need by augmenting standard RL objectives with safety considerations, typically in the form of constraints on the agent's behavior or environment outcomes.

###### Definition 1.1.

The goal in SafeRL is to maximize performance (cumulative reward) while satisfying safety constraints during training and deployment.
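
In CMDP notation (the standard formulation following Altman, 1999; written out here since the cached text is truncated before the formal statement), this goal reads:

```latex
% Standard CMDP objective: maximize discounted reward subject to
% a bound d_i on each expected discounted cost C_i.
\max_{\pi} \; J_R(\pi) = \mathbb{E}_{\pi}\!\Big[\sum_{t=0}^{\infty} \gamma^{t}\, R(s_t, a_t)\Big]
\quad \text{s.t.} \quad
J_{C_i}(\pi) = \mathbb{E}_{\pi}\!\Big[\sum_{t=0}^{\infty} \gamma^{t}\, C_i(s_t, a_t)\Big] \le d_i,
\quad i = 1, \dots, m.
```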

A common framework for SafeRL is the Constrained Markov Decision Process (CMDP) introduced by Altman ([1999](https://arxiv.org/html/2505.17342v1#bib.bib5)). In a

... (truncated, 98 KB total)
Resource ID: 7ba5b02ca89ba9eb | Stable ID: OWVjYTg1Yz