Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation
Credibility Rating
Good quality. Reputable source with community review or editorial standards, but less rigorous than peer-reviewed venues.
Rating inherited from publication venue: arXiv
A foundational hierarchical RL paper relevant to AI safety discussions around goal-directed agents, subgoal decomposition, and how agents with intrinsic motivation might pursue intermediate objectives in ways that could be difficult to oversee or control.
Paper Details
Metadata
Abstract
Learning goal-directed behavior in environments with sparse feedback is a major challenge for reinforcement learning algorithms. The primary difficulty arises due to insufficient exploration, resulting in an agent being unable to learn robust value functions. Intrinsically motivated agents can explore new behavior for its own sake rather than to directly solve problems. Such intrinsic behaviors could eventually help the agent solve tasks posed by the environment. We present hierarchical-DQN (h-DQN), a framework to integrate hierarchical value functions, operating at different temporal scales, with intrinsically motivated deep reinforcement learning. A top-level value function learns a policy over intrinsic goals, and a lower-level function learns a policy over atomic actions to satisfy the given goals. h-DQN allows for flexible goal specifications, such as functions over entities and relations. This provides an efficient space for exploration in complicated environments. We demonstrate the strength of our approach on two problems with very sparse, delayed feedback: (1) a complex discrete stochastic decision process, and (2) the classic ATARI game 'Montezuma's Revenge'.
Summary
This paper introduces h-DQN, a hierarchical deep Q-network framework that combines two-level value functions operating at different temporal scales with intrinsic motivation to tackle sparse-reward exploration. A top-level controller sets subgoals while a lower-level controller learns primitive actions to achieve them, enabling more efficient exploration. The approach achieves notable results on Montezuma's Revenge, a benchmark known for extremely sparse rewards.
Key Points
- Proposes h-DQN: a two-level hierarchy where a meta-controller selects intrinsic subgoals and a sub-controller learns actions to satisfy those goals.
- Intrinsic motivation drives exploration by rewarding goal completion, helping the agent overcome sparse external reward signals.
- Goals defined over entities and relations constrain the exploration space, enabling more data-efficient learning in complex environments.
- Demonstrates strong performance on Montezuma's Revenge, a notoriously hard Atari game due to sparse, delayed feedback.
- Relates to the options framework and goal-conditioned value functions, offering a practical deep RL instantiation of temporal abstraction.
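The two-level scheme described in the points above can be illustrated with tabular Q-learning on a toy sparse-reward chain. This is a minimal sketch of the meta-controller/controller idea only, not the paper's DQN-based implementation: the environment, variable names, and hyperparameters below are invented for illustration.

```python
import random
from collections import defaultdict

# Toy sparse-reward chain loosely inspired by the paper's discrete decision
# process: states 0..N-1, start in the middle; an extrinsic reward of 1.0 is
# paid only on reaching state N-1 after first visiting state 0.
N = 7
ACTIONS = (-1, +1)  # move left / right

def step(state, action):
    return max(0, min(N - 1, state + action))

def argmax_random_tie(choices, qfn):
    # Greedy choice with random tie-breaking so untrained Q-values
    # do not bias the agent toward one direction.
    vals = [qfn(c) for c in choices]
    best = max(vals)
    return random.choice([c for c, v in zip(choices, vals) if v == best])

def run_episode(meta_q, ctrl_q, eps=0.2, alpha=0.5, gamma=0.9):
    state, visited0, done, ext_return = N // 2, False, False, 0.0
    for _ in range(4):                      # subgoal budget per episode
        if done:
            break
        # Meta-controller: epsilon-greedy choice of a subgoal state.
        if random.random() < eps:
            goal = random.randrange(N)
        else:
            goal = argmax_random_tie(range(N), lambda g: meta_q[(state, g)])
        s0, ext = state, 0.0
        for _ in range(2 * N):              # controller acts toward the goal
            if random.random() < eps:
                a = random.choice(ACTIONS)
            else:
                a = argmax_random_tie(ACTIONS, lambda b: ctrl_q[(state, goal, b)])
            nxt = step(state, a)
            intrinsic = 1.0 if nxt == goal else 0.0   # goal-completion reward
            best_next = max(ctrl_q[(nxt, goal, b)] for b in ACTIONS)
            ctrl_q[(state, goal, a)] += alpha * (
                intrinsic + gamma * best_next - ctrl_q[(state, goal, a)])
            state = nxt
            visited0 = visited0 or state == 0
            if state == N - 1 and visited0:
                ext, done = 1.0, True
            if intrinsic or done:
                break
        # Meta-controller learns from the extrinsic reward accumulated
        # while the controller pursued this subgoal.
        best_goal = max(meta_q[(state, g)] for g in range(N))
        meta_q[(s0, goal)] += alpha * (ext + gamma * best_goal - meta_q[(s0, goal)])
        ext_return += ext
    return ext_return

random.seed(0)
meta_q, ctrl_q = defaultdict(float), defaultdict(float)
returns = [run_episode(meta_q, ctrl_q) for _ in range(500)]
print("mean extrinsic return, last 50 episodes:", sum(returns[-50:]) / 50)
```

Note how the controller's Q-function is conditioned on the current goal and trained only on the intrinsic goal-completion reward, while the meta-controller's Q-function is trained on the extrinsic reward, mirroring the paper's separation of temporal scales.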
Cited by 1 page
| Page | Type | Quality |
|---|---|---|
| Long-Horizon Autonomous Tasks | Capability | 65.0 |
Cached Content Preview
# Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation
Tejas D. Kulkarni (BCS, MIT), tejask@mit.edu
Karthik R. Narasimhan∗ (CSAIL, MIT), karthikn@mit.edu
Ardavan Saeedi (CSAIL, MIT), ardavans@mit.edu
Joshua B. Tenenbaum (BCS, MIT), jbt@mit.edu
∗Authors contributed equally and listed alphabetically.
###### Abstract
Learning goal-directed behavior in environments with sparse feedback is a major challenge for reinforcement learning algorithms. The primary difficulty arises due to insufficient exploration, resulting in an agent being unable to learn robust value functions. Intrinsically motivated agents can explore new behavior for its own sake rather than to directly solve problems. Such intrinsic behaviors could eventually help the agent solve tasks posed by the environment. We present hierarchical-DQN (h-DQN), a framework to integrate hierarchical value functions, operating at different temporal scales, with intrinsically motivated deep reinforcement learning. A top-level value function learns a policy over intrinsic goals, and a lower-level function learns a policy over atomic actions to satisfy the given goals. h-DQN allows for flexible goal specifications, such as functions over entities and relations. This provides an efficient space for exploration in complicated environments. We demonstrate the strength of our approach on two problems with very sparse, delayed feedback: (1) a complex discrete stochastic decision process, and (2) the classic ATARI game ‘Montezuma’s Revenge’.
## 1 Introduction
Learning goal-directed behavior with sparse feedback from complex environments is a fundamental challenge for artificial intelligence. Learning in this setting requires the agent to represent knowledge at multiple levels of spatio-temporal abstractions and to explore the environment efficiently. Recently, non-linear function approximators coupled with reinforcement learning [21, 28, 37] have made it possible to learn abstractions over high-dimensional state spaces, but the task of exploration with sparse feedback still remains a major challenge. Existing methods like Boltzmann exploration and Thompson sampling [45, 32] offer significant improvements over ε-greedy, but are limited due to the underlying models functioning at the level of basic actions. In this work, we propose a framework that integrates deep reinforcement learning with hierarchical value functions (h-DQN), where the agent is motivated to solve intrinsic goals (via learning options) to aid exploration. These goals provide for efficient exploration and help mitigate the sparse feedback problem. Additionally, we observe th
... (truncated, 56 KB total)