
Agent Foundations for Aligning Machine Intelligence

Type: web

Author

Kolya T

Credibility Rating

Good (3/5)

Good quality. Reputable source with community review or editorial standards, but less rigorous than peer-reviewed venues.

Rating inherited from publication venue: MIRI

This is MIRI's official research guide, useful for understanding the agent-foundations approach to alignment and identifying open technical problems; best paired with MIRI's technical papers and the Embedded Agency sequence.

Metadata

Importance: 72/100 · homepage · reference

Summary

MIRI's research guide outlines the theoretical foundations and open problems of the agent-foundations approach to AI alignment, focusing on decision theory, logical uncertainty, corrigibility, and related mathematical challenges. It provides a roadmap for researchers interested in contributing to foundational alignment work. The guide situates these problems within the broader goal of ensuring advanced AI systems remain safe and beneficial.

Key Points

  • Covers core MIRI research agendas including logical uncertainty, decision theory, and embedded agency problems.
  • Addresses corrigibility and the shutdown problem as central challenges for building safe, correctable AI agents.
  • Explores mesa-optimization and inner alignment risks arising from learned models pursuing unintended sub-goals.
  • Serves as an entry point for technically minded researchers wanting to contribute to foundational AI safety work.
  • Connects formal agent foundations to broader alignment goals such as value learning and corrigible behavior.

Cited by 5 pages

Cached Content Preview

HTTP 200 · Fetched Mar 20, 2026 · 50 KB

# A Guide to MIRI’s Research

_by Nate Soares_

**Update June 2022**: As noted in the 2019 update below, this research guide has only been lightly updated since 2015. We’re also currently doing less hiring (though not zero hiring), and are not currently running AIRCS workshops (though we may run more in the future).

If you’re interested in contributing to the alignment problem, we recommend starting with the [Alignment Research Field Guide](https://www.lesswrong.com/posts/PqMT9zGrNsGJNfiFR/alignment-research-field-guide), [How To Get Into Independent Research On Alignment/Agency](https://www.lesswrong.com/posts/P3Yt66Wh5g7SbkKuT/how-to-get-into-independent-research-on-alignment-agency), and the resources on the [Late 2021 MIRI Conversations](https://intelligence.org/late-2021-miri-conversations/) page.

If you have additional questions about how to get involved, we recommend contacting Buck Shlegeris of [Redwood Research](https://www.redwoodresearch.org/) or posting on [LessWrong](https://lesswrong.com/).

**Update March 2019**: This research guide has been only lightly updated since 2015. Our new recommendation for people who want to work on the [AI alignment problem](https://intelligence.org/research-guide/#ten) is:

- If you have a computer science or software engineering background: Apply to attend our new [workshops on AI risk](https://intelligence.org/ai-risk-for-computer-scientists/) and to [work as an engineer at MIRI](https://intelligence.org/engineers). For this purpose, you don’t need any prior familiarity with our research.
  - If you aren’t sure whether you’d be a good fit for an AI risk workshop, or for an engineer position, [shoot us an email](mailto:buck@intelligence.org) and we can talk about whether it makes sense.
  - You can find out more about our engineering program in our [2018 strategy update](https://intelligence.org/2018/11/22/2018-update-our-new-research-directions/).
- If you’d like to learn more about the problems we’re working on (regardless of your answer to the above): See “[Embedded Agency](https://www.lesswrong.com/posts/i3BTagvt3HbPMx6PN/embedded-agency-full-text-version)” for an introduction to our agent foundations research, and see our [Alignment Research Field Guide](https://www.lesswrong.com/posts/PqMT9zGrNsGJNfiFR/alignment-research-field-guide) for general recommendations on how to get started in AI safety.
  - After checking out those two resources, you can use the links and references in “Embedded Agency” and on this page to learn more about the topics you want to drill down on. If you want a particular problem set to focus on, we suggest Scott Garrabrant’s “[Fixed Point Exercises](https://www.lesswrong.com/posts/mojJ6Hpri8rfzY78b/fixed-point-exercises).” As Scott notes:

    > Sometimes people ask me what math they should study in order to get into agent foundations. My first answer is that I have found the introductory class in ev

... (truncated, 50 KB total)
Resource ID: ee872736d7fbfcd5 | Stable ID: NzQxOWIyMT