MACPO (Multi-Agent Constrained Policy Optimization)
Credibility Rating
Good quality. Reputable source with community review or editorial standards, but less rigorous than peer-reviewed venues.
Rating inherited from publication venue: arXiv
A comprehensive survey of Safe Reinforcement Learning (SafeRL) and Multi-Agent Safe RL, covering Constrained Markov Decision Processes and theoretical foundations essential for developing provably safe AI systems.
Paper Details
Metadata
Abstract
Safe Reinforcement Learning (SafeRL) is the subfield of reinforcement learning that explicitly deals with safety constraints during the learning and deployment of agents. This survey provides a mathematically rigorous overview of SafeRL formulations based on Constrained Markov Decision Processes (CMDPs) and extensions to Multi-Agent Safe RL (SafeMARL). We review theoretical foundations of CMDPs, covering definitions, constrained optimization techniques, and fundamental theorems. We then summarize state-of-the-art algorithms in SafeRL for single agents, including policy gradient methods with safety guarantees and safe exploration strategies, as well as recent advances in SafeMARL for cooperative and competitive settings. Additionally, we propose five open research problems to advance the field, with three focusing on SafeMARL. Each problem is described with motivation, key challenges, and related prior work. This survey is intended as a technical guide for researchers interested in SafeRL and SafeMARL, highlighting key concepts, methods, and open future research directions.
Summary
This survey provides a comprehensive technical overview of Safe Reinforcement Learning (SafeRL), focusing on Constrained Markov Decision Processes (CMDPs) and their extensions to multi-agent settings (SafeMARL). The paper reviews theoretical foundations of CMDPs, state-of-the-art algorithms for single-agent SafeRL including policy gradient methods with safety guarantees and safe exploration strategies, and recent advances in SafeMARL for both cooperative and competitive scenarios. The authors identify five open research problems to guide future work, with particular emphasis on advancing SafeMARL, making this a technical reference for researchers developing safe learning algorithms.
Cited by 1 page
| Page | Type | Quality |
|---|---|---|
| Multi-Agent Safety | Approach | 68.0 |
Cached Content Preview
[License: CC BY 4.0](https://info.arxiv.org/help/license/index.html#licenses-available)
arXiv:2505.17342v1 [cs.LG] 22 May 2025
# A Survey of Safe Reinforcement Learning and Constrained MDPs: A Technical Survey on Single-Agent and Multi-Agent Safety
Ankita Kushwaha
International Institute of Information Technology, Hyderabad
Kiran Ravish
International Institute of Information Technology, Hyderabad
Preeti Lamba
International Institute of Information Technology, Hyderabad
Pawan Kumar
International Institute of Information Technology, Hyderabad
###### Abstract
Safe Reinforcement Learning (SafeRL) is the subfield of reinforcement learning that explicitly deals with safety constraints during the learning and deployment of agents. This survey provides a mathematically rigorous overview of SafeRL formulations based on Constrained Markov Decision Processes (CMDPs) and extensions to Multi-Agent Safe RL (SafeMARL). We review theoretical foundations of CMDPs, covering definitions, constrained optimization techniques, and fundamental theorems. We then summarize state-of-the-art algorithms in SafeRL for single agents, including policy gradient methods with safety guarantees and safe exploration strategies, as well as recent advances in SafeMARL for cooperative and competitive settings. Additionally, we propose five open research problems to advance the field, with three focusing on SafeMARL. Each problem is described with motivation, key challenges, and related prior work. This survey is intended as a technical guide for researchers interested in SafeRL and SafeMARL, highlighting key concepts, methods, and open future research directions.
## 1 Introduction
Reinforcement learning (RL) has achieved remarkable success in domains such as games, robotics, and autonomous systems. However, when deploying RL in real-world _safety-critical_ applications (e.g., autonomous driving, healthcare, robotics), it is essential to ensure that the learning agent avoids catastrophic failures or unsafe behaviors Amodei et al. ( [2016](https://arxiv.org/html/2505.17342v1#bib.bib6 "")); Garcia and Fernandez ( [2015](https://arxiv.org/html/2505.17342v1#bib.bib30 "")). Safe Reinforcement Learning (SafeRL) addresses this need by augmenting standard RL objectives with safety considerations, typically in the form of constraints on the agent’s behavior or environment outcomes.
###### Definition 1.1.
The goal in SafeRL is to maximize performance (cumulative reward) while satisfying safety constraints during training and deployment.
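This objective can be stated formally. A common formulation (a sketch using standard CMDP notation with reward $r$, cost signals $c_i$, and cost limits $d_i$; the symbols here follow convention and may differ slightly from the paper's own):

```latex
\max_{\pi} \; J(\pi) = \mathbb{E}_{\tau \sim \pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, a_t)\right]
\quad \text{s.t.} \quad
J_{c_i}(\pi) = \mathbb{E}_{\tau \sim \pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, c_i(s_t, a_t)\right] \le d_i,
\qquad i = 1, \dots, m,
```

where $\tau$ denotes a trajectory generated by policy $\pi$ and $\gamma \in [0, 1)$ is the discount factor. The constraints bound the expected discounted cost of each of the $m$ safety signals.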
A common framework for SafeRL is the Constrained Markov Decision Process (CMDP) introduced by Altman ( [1999](https://arxiv.org/html/2505.17342v1#bib.bib5 "")). In a
... (truncated, 98 KB total)