Longterm Wiki

MIRI All Publications Index

web

Credibility Rating

3/5 (Good)

Good quality. Reputable source with community review or editorial standards, but less rigorous than peer-reviewed venues.

Rating inherited from publication venue: MIRI

MIRI (Machine Intelligence Research Institute) is one of the earliest organizations dedicated to AI alignment research; this index is the canonical starting point for exploring their foundational technical contributions to the field.

Metadata

Importance: 72/100 · homepage · reference

Summary

A comprehensive index of all publications from the Machine Intelligence Research Institute (MIRI), covering foundational AI safety research including agent foundations, decision theory, logical uncertainty, and value alignment. This page serves as the primary access point for MIRI's technical and strategic research output spanning over a decade of work.

Key Points

  • Central repository for MIRI's full research catalog including technical papers, reports, and blog posts on AI alignment
  • Covers foundational topics: agent foundations, decision theory (TDT, UDT, FDT), logical uncertainty, and corrigibility
  • Includes landmark works like 'Coherent Extrapolated Volition', 'Tiling Agents', and research on instrumental convergence
  • Spans both early foundational work and more recent alignment-focused technical research
  • Useful for tracing the intellectual lineage of many core AI safety concepts still active in the field

Cited by 2 pages

Cached Content Preview

HTTP 200 · Fetched Mar 20, 2026 · 28 KB

# All MIRI Publications

# Articles

### 2024 – 2025

P Barnett. 2025. “ [Compute Requirements for Algorithmic Innovation in Frontier AI Models](https://arxiv.org/abs/2507.10618).” arXiv:2507.10618 \[cs.LG\].

P Barnett, A Scher, D Abecassis. 2025. “ [Technical Requirements for Halting Dangerous AI Activities](https://arxiv.org/abs/2507.09801).” arXiv:2507.09801 \[cs.AI\].

P Barnett, A Scher. 2025. “ [AI Governance to Avoid Extinction: The Strategic Landscape and Actionable Research Questions](https://intelligence.org/wp-content/uploads/2025/05/AI-Governance-to-Avoid-Extinction.pdf).” MIRI technical report 2025-1.

P Barnett. 2024. “ [What AI evaluations for preventing catastrophic risks can and cannot do](https://arxiv.org/abs/2412.08653).” arXiv:2412.08653 \[cs.CY\].

A Scher. 2024. “ [Mechanisms to Verify International Agreements About AI Development](https://intelligence.org/wp-content/uploads/2024/11/Mechanisms-to-Verify-International-Agreements-About-AI-Development-27-Nov-24.pdf).” MIRI technical report 2024-1.

P Barnett, L Thiergart. 2024. “ [Declare and Justify: Explicit assumptions in AI evaluations are necessary for effective regulation](https://arxiv.org/abs/2411.12820).” arXiv:2411.12820 \[cs.AI\].

### 2020 – 2021

S Garrabrant. 2021. “ [Temporal Inference with Finite Factored Sets](https://arxiv.org/abs/2109.11513).” arXiv:2109.11513 \[cs.AI\].

S Garrabrant, D Herrmann, and J Lopez-Wild. 2021. “ [Cartesian Frames](https://arxiv.org/abs/2109.10996).” arXiv:2109.10996 \[math.CT\].

E Hubinger. 2020. “ [An Overview of 11 Proposals for Building Safe Advanced AI](https://arxiv.org/abs/2012.07532).” arXiv:2012.07532 \[cs.LG\].

### 2019

A Demski and S Garrabrant. 2019. “ [Embedded Agency](https://arxiv.org/abs/1902.09469).” arXiv:1902.09469 \[cs.AI\].

E Hubinger, C van Merwijk, V Mikulik, J Skalse, and S Garrabrant. 2019. “ [Risks from Learned Optimization in Advanced Machine Learning Systems](https://arxiv.org/abs/1906.01820).” arXiv:1906.01820 \[cs.AI\].

V Kosoy. 2019. “ [Delegative Reinforcement Learning: Learning to Avoid Traps with a Little Help](https://drive.google.com/uc?export=download&id=1xa7UpGGODl6mszNWkA4XQGPyeopsNuWu).” Presented at the Safe Machine Learning workshop at ICLR.

### 2018

S Armstrong and S Mindermann. 2018. “ [Occam’s Razor is Insufficient to Infer the Preferences of Irrational Agents](http://papers.nips.cc/paper/7803-occams-razor-is-insufficient-to-infer-the-preferences-of-irrational-agents.pdf).” In _Advances in Neural Information Processing Systems_ 31.

D Manheim and S Garrabrant. 2018. “ [Categorizing Variants of Goodhart’s Law](https://arxiv.org/abs/1803.04585).” arXiv:1803.04585 \[cs.AI\].

### 2017

R Carey. 2018. “ [Incorrigibility in the CIRL Framework](https://arxiv.org/abs/1709.06275).” arXiv:1709.06275 \[cs.AI\]. Paper presented at the AAAI/ACM Conference on Artificial Intelligence, Ethics, and Society.

A Critch. 2017.

... (truncated, 28 KB total)