Longterm Wiki

Model-based RL

paper

Authors

Michael Janner, Justin Fu, Marvin Zhang, Sergey Levine

Credibility Rating

3/5 (Good)

Good quality. Reputable source with community review or editorial standards, but less rigorous than peer-reviewed venues.

Rating inherited from publication venue: arXiv

This paper investigates model-based reinforcement learning algorithms, analyzing the trade-off between data generation and model bias—important for understanding how RL systems learn safely and efficiently with limited real-world data.

Paper Details

Citations
12
212 influential
Year
2019

Metadata

arXiv preprint · primary source

Abstract

Designing effective model-based reinforcement learning algorithms is difficult because the ease of data generation must be weighed against the bias of model-generated data. In this paper, we study the role of model usage in policy optimization both theoretically and empirically. We first formulate and analyze a model-based reinforcement learning algorithm with a guarantee of monotonic improvement at each step. In practice, this analysis is overly pessimistic and suggests that real off-policy data is always preferable to model-generated on-policy data, but we show that an empirical estimate of model generalization can be incorporated into such analysis to justify model usage. Motivated by this analysis, we then demonstrate that a simple procedure of using short model-generated rollouts branched from real data has the benefits of more complicated model-based algorithms without the usual pitfalls. In particular, this approach surpasses the sample efficiency of prior model-based methods, matches the asymptotic performance of the best model-free algorithms, and scales to horizons that cause other model-based methods to fail entirely.

Summary

This paper addresses the fundamental challenge in model-based reinforcement learning of balancing data efficiency gains from learned models against the bias introduced by model-generated data. The authors provide theoretical analysis of model usage in policy optimization, showing that a simple approach of generating short rollouts from learned models branched off real data can achieve both improved sample efficiency over prior model-based methods and asymptotic performance matching state-of-the-art model-free algorithms. They demonstrate that incorporating empirical estimates of model generalization into theoretical guarantees justifies model usage, and their method scales effectively to longer horizons where other model-based approaches fail.
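The branched-rollout procedure summarized above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the toy dynamics, the noisy stand-in for a learned model, the placeholder policy, and the buffer sizes are all assumptions chosen for clarity.

```python
import random

ROLLOUT_LENGTH = 3   # k: short rollouts limit compounding model error
N_BRANCHES = 10      # number of rollouts branched from real states

def env_step(state, action):
    """Toy deterministic dynamics standing in for the real environment."""
    next_state = state + action
    return next_state, -abs(next_state)  # (next_state, reward)

def model_step(state, action):
    """Stand-in for a learned dynamics model: true dynamics plus noise (bias)."""
    next_state, reward = env_step(state, action)
    return next_state + random.gauss(0.0, 0.05), reward

def policy(state):
    """Placeholder policy: push the state toward zero."""
    return -0.5 * state

# 1. Collect real transitions with the current policy.
real_buffer = []
state = 1.0
for _ in range(20):
    action = policy(state)
    next_state, reward = env_step(state, action)
    real_buffer.append((state, action, reward, next_state))
    state = next_state

# 2. Branch short model-generated rollouts from states sampled
#    out of the real buffer, rather than from the initial state.
model_buffer = []
for _ in range(N_BRANCHES):
    s = random.choice(real_buffer)[0]   # branch from a real state
    for _ in range(ROLLOUT_LENGTH):     # roll the model forward only k steps
        a = policy(s)
        s2, r = model_step(s, a)
        model_buffer.append((s, a, r, s2))
        s = s2

# 3. A model-free learner (e.g. an actor-critic update) would now train on
#    model_buffer alongside real_buffer; that update is omitted here.
print(len(real_buffer), len(model_buffer))
```

Keeping `ROLLOUT_LENGTH` small is the key design choice: model error compounds with rollout depth, so short branches trade a little extra real-data dependence for much lower bias in the generated transitions.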

Cited by 1 page

| Page | Type | Quality |
| --- | --- | --- |
| Long-Horizon Autonomous Tasks | Capability | 65.0 |

Cached Content Preview

HTTP 200 · Fetched Mar 20, 2026 · 91 KB
# When to Trust Your Model: Model-Based Policy Optimization

Michael Janner
Justin Fu
Marvin Zhang
Sergey Levine

University of California, Berkeley

{janner, justinjfu, marvin, svlevine}@eecs.berkeley.edu

###### Abstract

Designing effective model-based reinforcement learning algorithms is difficult because the ease of data generation must be weighed against the bias of model-generated data.
In this paper, we study the role of model usage in policy optimization both theoretically and empirically.
We first formulate and analyze a model-based reinforcement learning algorithm with a guarantee of monotonic improvement at each step.
In practice, this analysis is overly pessimistic and suggests that real off-policy data is always preferable to model-generated on-policy data, but we show that an empirical estimate of model generalization can be incorporated into such analysis to justify model usage.
Motivated by this analysis, we then demonstrate that a simple procedure of using short model-generated rollouts branched from real data has the benefits of more complicated model-based algorithms without the usual pitfalls.
In particular, this approach surpasses the sample efficiency of prior model-based methods, matches the asymptotic performance of the best model-free algorithms, and scales to horizons that cause other model-based methods to fail entirely.

### 1 Introduction

Reinforcement learning algorithms generally fall into one of two categories: model-based approaches, which build a predictive model of an environment and derive a controller from it, and model-free techniques, which learn a direct mapping from states to actions.
Model-free methods have shown promise as a general-purpose tool for learning complex policies from raw state inputs
(Mnih et al., [2015](https://ar5iv.labs.arxiv.org/html/1906.08253#bib.bib27 ""); Lillicrap et al., [2016](https://ar5iv.labs.arxiv.org/html/1906.08253#bib.bib25 ""); Haarnoja et al., [2018](https://ar5iv.labs.arxiv.org/html/1906.08253#bib.bib16 "")),
but their generality comes at the cost of efficiency.
When dealing with real-world physical systems, for which data collection can be an arduous process, model-based approaches are appealing due to their comparatively fast learning.
However, model accuracy acts as a bottleneck to policy quality, often causing model-based approaches to perform worse asymptotically than their model-free counterparts.

In this paper, we study how to most effectively use a predictive model for policy optimization.
We first formulate and analyze a class of model-based reinforcement learning algorithms with improvement guarantees.
Although there has been recent interest in monotonic improvement of model-based reinforcement learning algorithms (Sun et al., [2018](https://ar5iv.labs.arxiv.org/html/1906.08253#bib.bib38 ""); Luo et al., [2019](https://ar5iv.labs.arxiv.org/html/1906.08253#bib.bib26 "")), most commonly used model-based approaches lack the improvement guarantees that underp

... (truncated, 91 KB total)