Longterm Wiki

Model-based RL

paper

Authors

Michael Janner, Justin Fu, Marvin Zhang, Sergey Levine

Credibility Rating

3/5 (Good)

Good quality. Reputable source with community review or editorial standards, but less rigorous than peer-reviewed venues.

Rating inherited from publication venue: arXiv

This paper investigates model-based reinforcement learning algorithms, analyzing the trade-off between data generation and model bias—important for understanding how RL systems learn safely and efficiently with limited real-world data.

Paper Details

Citations
12
212 influential
Year
2019

Metadata

arXiv preprint · primary source

Abstract

Designing effective model-based reinforcement learning algorithms is difficult because the ease of data generation must be weighed against the bias of model-generated data. In this paper, we study the role of model usage in policy optimization both theoretically and empirically. We first formulate and analyze a model-based reinforcement learning algorithm with a guarantee of monotonic improvement at each step. In practice, this analysis is overly pessimistic and suggests that real off-policy data is always preferable to model-generated on-policy data, but we show that an empirical estimate of model generalization can be incorporated into such analysis to justify model usage. Motivated by this analysis, we then demonstrate that a simple procedure of using short model-generated rollouts branched from real data has the benefits of more complicated model-based algorithms without the usual pitfalls. In particular, this approach surpasses the sample efficiency of prior model-based methods, matches the asymptotic performance of the best model-free algorithms, and scales to horizons that cause other model-based methods to fail entirely.

Summary

This paper addresses the fundamental challenge in model-based reinforcement learning of balancing data efficiency gains from learned models against the bias introduced by model-generated data. The authors provide theoretical analysis of model usage in policy optimization, showing that a simple approach of generating short rollouts from learned models branched off real data can achieve both improved sample efficiency over prior model-based methods and asymptotic performance matching state-of-the-art model-free algorithms. They demonstrate that incorporating empirical estimates of model generalization into theoretical guarantees justifies model usage, and their method scales effectively to longer horizons where other model-based approaches fail.
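The branched-rollout procedure summarized above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the toy dynamics, the noisy stand-in for a learned model, the placeholder policy, and the buffer sizes are all assumptions chosen for clarity.

```python
import random

ROLLOUT_LENGTH = 3   # k: short rollouts limit compounding model error
N_BRANCHES = 10      # number of rollouts branched from real states

def env_step(state, action):
    """Toy deterministic dynamics standing in for the real environment."""
    next_state = state + action
    return next_state, -abs(next_state)  # (next_state, reward)

def model_step(state, action):
    """Stand-in for a learned dynamics model: true dynamics plus noise (bias)."""
    next_state, reward = env_step(state, action)
    return next_state + random.gauss(0.0, 0.05), reward

def policy(state):
    """Placeholder policy: push the state toward zero."""
    return -0.5 * state

# 1. Collect real transitions with the current policy.
real_buffer = []
state = 1.0
for _ in range(20):
    action = policy(state)
    next_state, reward = env_step(state, action)
    real_buffer.append((state, action, reward, next_state))
    state = next_state

# 2. Branch short model-generated rollouts from states sampled
#    out of the real buffer, rather than from the initial state.
model_buffer = []
for _ in range(N_BRANCHES):
    s = random.choice(real_buffer)[0]   # branch from a real state
    for _ in range(ROLLOUT_LENGTH):     # roll the model forward only k steps
        a = policy(s)
        s2, r = model_step(s, a)
        model_buffer.append((s, a, r, s2))
        s = s2

# 3. A model-free learner (e.g. an actor-critic update) would now train on
#    model_buffer alongside real_buffer; that update is omitted here.
print(len(real_buffer), len(model_buffer))
```

Keeping `ROLLOUT_LENGTH` small is the key design choice: model error compounds with rollout depth, so short branches trade a little extra real-data dependence for much lower bias in the generated transitions.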

Cited by 1 page

| Page | Type | Quality |
| --- | --- | --- |
| Long-Horizon Autonomous Tasks | Capability | 65.0 |

Cached Content Preview

HTTP 200 · Fetched Mar 20, 2026 · 91 KB
# When to Trust Your Model: Model-Based Policy Optimization

Michael Janner
Justin Fu
Marvin Zhang
Sergey Levine

University of California, Berkeley

{janner, justinjfu, marvin, svlevine}@eecs.berkeley.edu

###### Abstract

Designing effective model-based reinforcement learning algorithms is difficult because the ease of data generation must be weighed against the bias of model-generated data.
In this paper, we study the role of model usage in policy optimization both theoretically and empirically.
We first formulate and analyze a model-based reinforcement learning algorithm with a guarantee of monotonic improvement at each step.
In practice, this analysis is overly pessimistic and suggests that real off-policy data is always preferable to model-generated on-policy data, but we show that an empirical estimate of model generalization can be incorporated into such analysis to justify model usage.
Motivated by this analysis, we then demonstrate that a simple procedure of using short model-generated rollouts branched from real data has the benefits of more complicated model-based algorithms without the usual pitfalls.
In particular, this approach surpasses the sample efficiency of prior model-based methods, matches the asymptotic performance of the best model-free algorithms, and scales to horizons that cause other model-based methods to fail entirely.

### 1 Introduction

Reinforcement learning algorithms generally fall into one of two categories: model-based approaches, which build a predictive model of an environment and derive a controller from it, and model-free techniques, which learn a direct mapping from states to actions.
Model-free methods have shown promise as a general-purpose tool for learning complex policies from raw state inputs
(Mnih et al., [2015](https://ar5iv.labs.arxiv.org/html/1906.08253#bib.bib27 ""); Lillicrap et al., [2016](https://ar5iv.labs.arxiv.org/html/1906.08253#bib.bib25 ""); Haarnoja et al., [2018](https://ar5iv.labs.arxiv.org/html/1906.08253#bib.bib16 "")),
but their generality comes at the cost of efficiency.
When dealing with real-world physical systems, for which data collection can be an arduous process, model-based approaches are appealing due to their comparatively fast learning.
However, model accuracy acts as a bottleneck to policy quality, often causing model-based approaches to perform worse asymptotically than their model-free counterparts.

In this paper, we study how to most effectively use a predictive model for policy optimization.
We first formulate and analyze a class of model-based reinforcement learning algorithms with improvement guarantees.
Although there has been recent interest in monotonic improvement of model-based reinforcement learning algorithms (Sun et al., [2018](https://ar5iv.labs.arxiv.org/html/1906.08253#bib.bib38 ""); Luo et al., [2019](https://ar5iv.labs.arxiv.org/html/1906.08253#bib.bib26 "")), most commonly used model-based approaches lack the improvement guarantees that underp

... (truncated, 91 KB total)