Model-based RL
Michael Janner, Justin Fu, Marvin Zhang, Sergey Levine
Credibility Rating
Good quality. Reputable source with community review or editorial standards, but less rigorous than peer-reviewed venues.
Rating inherited from publication venue: arXiv
This paper investigates model-based reinforcement learning algorithms, analyzing the trade-off between data generation and model bias—important for understanding how RL systems learn safely and efficiently with limited real-world data.
Paper Details
Metadata
Abstract
Designing effective model-based reinforcement learning algorithms is difficult because the ease of data generation must be weighed against the bias of model-generated data. In this paper, we study the role of model usage in policy optimization both theoretically and empirically. We first formulate and analyze a model-based reinforcement learning algorithm with a guarantee of monotonic improvement at each step. In practice, this analysis is overly pessimistic and suggests that real off-policy data is always preferable to model-generated on-policy data, but we show that an empirical estimate of model generalization can be incorporated into such analysis to justify model usage. Motivated by this analysis, we then demonstrate that a simple procedure of using short model-generated rollouts branched from real data has the benefits of more complicated model-based algorithms without the usual pitfalls. In particular, this approach surpasses the sample efficiency of prior model-based methods, matches the asymptotic performance of the best model-free algorithms, and scales to horizons that cause other model-based methods to fail entirely.
Summary
This paper addresses the fundamental challenge in model-based reinforcement learning of balancing data efficiency gains from learned models against the bias introduced by model-generated data. The authors provide theoretical analysis of model usage in policy optimization, showing that a simple approach of generating short rollouts from learned models branched off real data can achieve both improved sample efficiency over prior model-based methods and asymptotic performance matching state-of-the-art model-free algorithms. They demonstrate that incorporating empirical estimates of model generalization into theoretical guarantees justifies model usage, and their method scales effectively to longer horizons where other model-based approaches fail.
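The branching procedure described above can be sketched in a few lines. This is a minimal illustrative sketch, not the paper's implementation: `true_step`, `model_step`, and `policy` are hypothetical stand-ins for the real environment dynamics, a learned dynamics model, and the current policy. The key point it shows is that model rollouts start from states already visited in the real environment and run for only `k` steps, which limits compounding model error.

```python
import random

def true_step(state, action):
    # Stand-in for the real environment dynamics.
    return state + action

def model_step(state, action):
    # Stand-in for a learned dynamics model (here: a noisy copy of the truth).
    return state + action + random.gauss(0.0, 0.01)

def policy(state):
    # Stand-in for the current policy.
    return random.uniform(-1.0, 1.0)

def collect_branched_rollouts(real_states, k=5):
    """Branch short k-step model rollouts from previously visited real states.

    Rather than rolling the model out from the initial state distribution
    for a full episode, each rollout branches off a real state, so model
    error only accumulates over k steps.
    """
    model_data = []
    for s in real_states:
        state = s
        for _ in range(k):
            action = policy(state)
            next_state = model_step(state, action)
            model_data.append((state, action, next_state))
            state = next_state
    return model_data

# Gather a few real transitions, then branch model rollouts from them.
real_states = [0.0]
for _ in range(10):
    a = policy(real_states[-1])
    real_states.append(true_step(real_states[-1], a))

model_data = collect_branched_rollouts(real_states, k=5)
print(len(model_data))  # 11 real states x 5 model steps = 55 model transitions
```

In a full algorithm the model-generated transitions would be mixed into the policy's training buffer alongside real data; the rollout length `k` is the knob that trades cheap model data against model bias.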
Cited by 1 page
| Page | Type | Quality |
|---|---|---|
| Long-Horizon Autonomous Tasks | Capability | 65.0 |
Cached Content Preview
# When to Trust Your Model: Model-Based Policy Optimization
Michael Janner
Justin Fu
Marvin Zhang
Sergey Levine
University of California, Berkeley
{janner, justinjfu, marvin, svlevine}@eecs.berkeley.edu
###### Abstract
Designing effective model-based reinforcement learning algorithms is difficult because the ease of data generation must be weighed against the bias of model-generated data.
In this paper, we study the role of model usage in policy optimization both theoretically and empirically.
We first formulate and analyze a model-based reinforcement learning algorithm with a guarantee of monotonic improvement at each step.
In practice, this analysis is overly pessimistic and suggests that real off-policy data is always preferable to model-generated on-policy data, but we show that an empirical estimate of model generalization can be incorporated into such analysis to justify model usage.
Motivated by this analysis, we then demonstrate that a simple procedure of using short model-generated rollouts branched from real data has the benefits of more complicated model-based algorithms without the usual pitfalls.
In particular, this approach surpasses the sample efficiency of prior model-based methods, matches the asymptotic performance of the best model-free algorithms, and scales to horizons that cause other model-based methods to fail entirely.
### 1 Introduction
Reinforcement learning algorithms generally fall into one of two categories: model-based approaches, which build a predictive model of an environment and derive a controller from it, and model-free techniques, which learn a direct mapping from states to actions.
Model-free methods have shown promise as a general-purpose tool for learning complex policies from raw state inputs
(Mnih et al., [2015](https://ar5iv.labs.arxiv.org/html/1906.08253#bib.bib27); Lillicrap et al., [2016](https://ar5iv.labs.arxiv.org/html/1906.08253#bib.bib25); Haarnoja et al., [2018](https://ar5iv.labs.arxiv.org/html/1906.08253#bib.bib16)),
but their generality comes at the cost of efficiency.
When dealing with real-world physical systems, for which data collection can be an arduous process, model-based approaches are appealing due to their comparatively fast learning.
However, model accuracy acts as a bottleneck to policy quality, often causing model-based approaches to perform worse asymptotically than their model-free counterparts.
In this paper, we study how to most effectively use a predictive model for policy optimization.
We first formulate and analyze a class of model-based reinforcement learning algorithms with improvement guarantees.
Although there has been recent interest in monotonic improvement of model-based reinforcement learning algorithms (Sun et al., [2018](https://ar5iv.labs.arxiv.org/html/1906.08253#bib.bib38); Luo et al., [2019](https://ar5iv.labs.arxiv.org/html/1906.08253#bib.bib26)), most commonly used model-based approaches lack the improvement guarantees that underp
... (truncated, 91 KB total)