Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation
Credibility Rating
Good quality. Reputable source with community review or editorial standards, but less rigorous than peer-reviewed venues.
Rating inherited from publication venue: arXiv
A foundational hierarchical RL paper relevant to AI safety discussions around goal-directed agents, subgoal decomposition, and how agents with intrinsic motivation might pursue intermediate objectives in ways that could be difficult to oversee or control.
Paper Details
Metadata
Abstract
Learning goal-directed behavior in environments with sparse feedback is a major challenge for reinforcement learning algorithms. The primary difficulty arises due to insufficient exploration, resulting in an agent being unable to learn robust value functions. Intrinsically motivated agents can explore new behavior for its own sake rather than to directly solve problems. Such intrinsic behaviors could eventually help the agent solve tasks posed by the environment. We present hierarchical-DQN (h-DQN), a framework to integrate hierarchical value functions, operating at different temporal scales, with intrinsically motivated deep reinforcement learning. A top-level value function learns a policy over intrinsic goals, and a lower-level function learns a policy over atomic actions to satisfy the given goals. h-DQN allows for flexible goal specifications, such as functions over entities and relations. This provides an efficient space for exploration in complicated environments. We demonstrate the strength of our approach on two problems with very sparse, delayed feedback: (1) a complex discrete stochastic decision process, and (2) the classic ATARI game 'Montezuma's Revenge'.
Summary
This paper introduces h-DQN, a hierarchical deep Q-network framework that combines two-level value functions operating at different temporal scales with intrinsic motivation to tackle sparse-reward exploration. A top-level controller sets subgoals while a lower-level controller learns primitive actions to achieve them, enabling more efficient exploration. The approach achieves notable results on Montezuma's Revenge, a benchmark known for extremely sparse rewards.
Key Points
- Proposes h-DQN: a two-level hierarchy where a meta-controller selects intrinsic subgoals and a sub-controller learns actions to satisfy those goals.
- Intrinsic motivation drives exploration by rewarding goal completion, helping the agent overcome sparse external reward signals.
- Goals defined over entities and relations constrain the exploration space, enabling more data-efficient learning in complex environments.
- Demonstrates strong performance on Montezuma's Revenge, a notoriously hard Atari game due to sparse, delayed feedback.
- Relates to the options framework and goal-conditioned value functions, offering a practical deep RL instantiation of temporal abstraction.
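The two-level scheme described in the points above can be illustrated with tabular Q-learning on a toy sparse-reward chain. This is a minimal sketch of the meta-controller/controller idea only, not the paper's DQN-based implementation: the environment, variable names, and hyperparameters below are invented for illustration.

```python
import random
from collections import defaultdict

# Toy sparse-reward chain loosely inspired by the paper's discrete decision
# process: states 0..N-1, start in the middle; an extrinsic reward of 1.0 is
# paid only on reaching state N-1 after first visiting state 0.
N = 7
ACTIONS = (-1, +1)  # move left / right

def step(state, action):
    return max(0, min(N - 1, state + action))

def argmax_random_tie(choices, qfn):
    # Greedy choice with random tie-breaking so untrained Q-values
    # do not bias the agent toward one direction.
    vals = [qfn(c) for c in choices]
    best = max(vals)
    return random.choice([c for c, v in zip(choices, vals) if v == best])

def run_episode(meta_q, ctrl_q, eps=0.2, alpha=0.5, gamma=0.9):
    state, visited0, done, ext_return = N // 2, False, False, 0.0
    for _ in range(4):                      # subgoal budget per episode
        if done:
            break
        # Meta-controller: epsilon-greedy choice of a subgoal state.
        if random.random() < eps:
            goal = random.randrange(N)
        else:
            goal = argmax_random_tie(range(N), lambda g: meta_q[(state, g)])
        s0, ext = state, 0.0
        for _ in range(2 * N):              # controller acts toward the goal
            if random.random() < eps:
                a = random.choice(ACTIONS)
            else:
                a = argmax_random_tie(ACTIONS, lambda b: ctrl_q[(state, goal, b)])
            nxt = step(state, a)
            intrinsic = 1.0 if nxt == goal else 0.0   # goal-completion reward
            best_next = max(ctrl_q[(nxt, goal, b)] for b in ACTIONS)
            ctrl_q[(state, goal, a)] += alpha * (
                intrinsic + gamma * best_next - ctrl_q[(state, goal, a)])
            state = nxt
            visited0 = visited0 or state == 0
            if state == N - 1 and visited0:
                ext, done = 1.0, True
            if intrinsic or done:
                break
        # Meta-controller learns from the extrinsic reward accumulated
        # while the controller pursued this subgoal.
        best_goal = max(meta_q[(state, g)] for g in range(N))
        meta_q[(s0, goal)] += alpha * (ext + gamma * best_goal - meta_q[(s0, goal)])
        ext_return += ext
    return ext_return

random.seed(0)
meta_q, ctrl_q = defaultdict(float), defaultdict(float)
returns = [run_episode(meta_q, ctrl_q) for _ in range(500)]
print("mean extrinsic return, last 50 episodes:", sum(returns[-50:]) / 50)
```

Note how the controller's Q-function is conditioned on the current goal and trained only on the intrinsic goal-completion reward, while the meta-controller's Q-function is trained on the extrinsic reward, mirroring the paper's separation of temporal scales.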
Cited by 1 page
| Page | Type | Quality |
|---|---|---|
| Long-Horizon Autonomous Tasks | Capability | 65.0 |
Cached Content Preview
# Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation
Tejas D. Kulkarni (BCS, MIT), tejask@mit.edu
Karthik R. Narasimhan∗ (CSAIL, MIT), karthikn@mit.edu
Ardavan Saeedi (CSAIL, MIT), ardavans@mit.edu
Joshua B. Tenenbaum (BCS, MIT), jbt@mit.edu
∗Authors contributed equally and listed alphabetically.
###### Abstract
Learning goal-directed behavior in environments with sparse feedback is a major challenge for reinforcement learning algorithms. The primary difficulty arises due to insufficient exploration, resulting in an agent being unable to learn robust value functions. Intrinsically motivated agents can explore new behavior for its own sake rather than to directly solve problems. Such intrinsic behaviors could eventually help the agent solve tasks posed by the environment. We present hierarchical-DQN (h-DQN), a framework to integrate hierarchical value functions, operating at different temporal scales, with intrinsically motivated deep reinforcement learning. A top-level value function learns a policy over intrinsic goals, and a lower-level function learns a policy over atomic actions to satisfy the given goals. h-DQN allows for flexible goal specifications, such as functions over entities and relations. This provides an efficient space for exploration in complicated environments. We demonstrate the strength of our approach on two problems with very sparse, delayed feedback: (1) a complex discrete stochastic decision process, and (2) the classic ATARI game ‘Montezuma’s Revenge’.
## 1 Introduction
Learning goal-directed behavior with sparse feedback from complex environments is a fundamental challenge for artificial intelligence. Learning in this setting requires the agent to represent knowledge at multiple levels of spatio-temporal abstractions and to explore the environment efficiently. Recently, non-linear function approximators coupled with reinforcement learning [21, 28, 37] have made it possible to learn abstractions over high-dimensional state spaces, but the task of exploration with sparse feedback still remains a major challenge. Existing methods like Boltzmann exploration and Thompson sampling [45, 32] offer significant improvements over ε-greedy, but are limited due to the underlying models functioning at the level of basic actions. In this work, we propose a framework that integrates deep reinforcement learning with hierarchical value functions (h-DQN), where the agent is motivated to solve intrinsic goals (via learning options) to aid exploration. These goals provide for efficient exploration and help mitigate the sparse feedback problem. Additionally, we observe th
... (truncated, 56 KB total)