Embedded Agency - Machine Intelligence Research Institute

web

Credibility Rating: 3/5 (Good)

Good quality. Reputable source with community review or editorial standards, but less rigorous than peer-reviewed venues.

Rating inherited from publication venue: MIRI

This is a foundational MIRI document outlining core technical obstacles to aligned AGI; essential reading for understanding the agent foundations research agenda and why classical decision/game theory is insufficient for reasoning about advanced AI systems.

Metadata

Importance: 85/100 | working paper | primary source

Summary

Embedded Agency by Abram Demski and Scott Garrabrant (2018, updated 2020) addresses foundational challenges for AI alignment that arise because agents are physically embedded in the world they model and act upon, rather than being idealized observers standing outside it. It systematically explores four core problems: decision theory for embedded agents, building accurate world-models when the agent is itself part of the world, robust delegation between agents (including self-improvement), and subsystem alignment. The work serves as a key reference document for MIRI's agent foundations research agenda.

Key Points

•Classical decision theory assumes agents are separate from their environment; embedded agents must reason about worlds that include themselves, creating logical and computational challenges.
•Covers decision theory problems including Newcomb-like scenarios and the need for alternatives to causal/evidential decision theory, referencing Functional Decision Theory.
•Addresses embedded world-models: how agents can maintain coherent beliefs when they cannot fully represent the system they are part of, including ontological crises and logical uncertainty.
•Explores robust delegation: how principals can reliably communicate values and constraints to agents, and how agents can safely delegate to sub-agents.
•Available in multiple formats: full text, illustrated hand-drawn sequence, and arXiv paper, making it accessible to varied audiences.

Cited by 1 page

Page	Type	Quality
Agent Foundations	Approach	59.0

Cached Content Preview

HTTP 200 | Fetched Apr 7, 2026 | 8 KB

Embedded Agency

Embedded Agency is a write-up by Abram Demski and Scott Garrabrant, available on the AI Alignment Forum. There's also a shorter version of the post as a hand-drawn sequence, and a lightly rewritten version on arXiv.

Embedded Agency was first released in 2018, with the arXiv version following in early 2019. In August 2020, Demski and Garrabrant substantially updated all versions.

We've included links and references below, listed in the order they come up in the relevant topic/section.

General

(Text Introduction | Illustrated Introduction | MIRI Blog Afterword | LessWrong Afterword)

Marcus Hutter. 2012. “One Decade of Universal Artificial Intelligence.” In Theoretical Foundations of Artificial General Intelligence 4.

Nate Soares. 2017. “Ensuring Smarter-Than-Human Intelligence Has A Positive Outcome.” MIRI Blog.

Eliezer Yudkowsky. 2018. “The Rocket Alignment Problem.” MIRI Blog.

Further reading: “Security Mindset and Ordinary Paranoia”; “Agent Foundations for Aligning Machine Intelligence with Human Interests”

Decision Theory

(Text Version | Illustrated Version)

Eliezer Yudkowsky and Nate Soares. 2017. “Functional Decision Theory: A New Theory of Instrumental Rationality.” arXiv:1710.05060 [cs.AI].

Scott Garrabrant. 2017. “Two Major Obstacles for Logical Inductor Decision Theory.” Intelligent Agent Foundations Forum.

Patrick LaVictoire. 2015. An Introduction to Löb’s Theorem in MIRI Research. MIRI technical report 2015–6.

Rob Bensinger. 2017. “Decisions Are For Making Bad Outcomes Inconsistent.” MIRI Blog.

Wei Dai. 2009. “Towards a New Decision Theory.” Less Wrong.

Vladimir Nesov. 2009. “Counterfactual Mugging.” Less Wrong.

Embedded World-Models

(Text Version | Illustrated Version)

Abram Demski. 2018. “Toward a New Technical Explanation of Technical Explanation.” Less Wrong.

Nate Soares. 2015. Formalizing Two Problems of Realistic World-Models. MIRI technical report 2015–3.

Jan Leike. 2016. Nonparametric General Reinforcement Learning. PhD thesis, Australian National University.

Laurent Orseau and Mark Ring. 2012. “Space-Time Embedded Intelligence.” In Artificial General Intelligence, 5th International Conference. Springer.

Benja Fallenstein, Jessica Taylor, and Paul Christiano. 2015. “Reflective Oracles: A Foundation for Classical Game Theory.” arXiv:1508.04145 [cs.AI].

Jan Leike, Jessica Taylor, and Benya Fallenstein. 2016. “A Formal Solution to the Grain of Truth Problem.” Paper presented at the 32nd Conference on Uncertainty in Art

... (truncated, 8 KB total)

Resource ID: 1da850cbb06cd522 | Stable ID: sid_jga8l8jCFw