MIRI All Publications Index
Credibility Rating
Good quality. Reputable source with community review or editorial standards, but less rigorous than peer-reviewed venues.
Rating inherited from publication venue: MIRI
MIRI (Machine Intelligence Research Institute) is one of the earliest organizations dedicated to AI alignment research; this index is the canonical starting point for exploring their foundational technical contributions to the field.
Metadata
Summary
A comprehensive index of all publications from the Machine Intelligence Research Institute (MIRI), covering foundational AI safety research including agent foundations, decision theory, logical uncertainty, and value alignment. This page serves as the primary access point for MIRI's technical and strategic research output spanning over a decade of work.
Key Points
- Central repository for MIRI's full research catalog, including technical papers, reports, and blog posts on AI alignment
- Covers foundational topics: agent foundations, decision theory (TDT, UDT, FDT), logical uncertainty, and corrigibility
- Includes landmark works such as 'Coherent Extrapolated Volition', 'Tiling Agents', and research on instrumental convergence
- Spans both early foundational work and more recent alignment-focused technical research
- Useful for tracing the intellectual lineage of many core AI safety concepts still active in the field
Cited by 2 pages
| Page | Type | Quality |
|---|---|---|
| Machine Intelligence Research Institute | Organization | 50.0 |
| Corrigibility Failure | Risk | 62.0 |
Cached Content Preview
# All MIRI Publications
# Articles
### 2024 – 2025
P Barnett. 2025. “[Compute Requirements for Algorithmic Innovation in Frontier AI Models](https://arxiv.org/abs/2507.10618).” arXiv:2507.10618 \[cs.LG\].
P Barnett, A Scher, D Abecassis. 2025. “[Technical Requirements for Halting Dangerous AI Activities](https://arxiv.org/abs/2507.09801).” arXiv:2507.09801 \[cs.AI\].
P Barnett, A Scher. 2025. “[AI Governance to Avoid Extinction: The Strategic Landscape and Actionable Research Questions](https://intelligence.org/wp-content/uploads/2025/05/AI-Governance-to-Avoid-Extinction.pdf).” MIRI technical report 2025-1.
P Barnett. 2024. “[What AI evaluations for preventing catastrophic risks can and cannot do](https://arxiv.org/abs/2412.08653).” arXiv:2412.08653 \[cs.CY\].
A Scher. 2024. “[Mechanisms to Verify International Agreements About AI Development](https://intelligence.org/wp-content/uploads/2024/11/Mechanisms-to-Verify-International-Agreements-About-AI-Development-27-Nov-24.pdf).” MIRI technical report 2024-1.
P Barnett, L Thiergart. 2024. “[Declare and Justify: Explicit assumptions in AI evaluations are necessary for effective regulation](https://arxiv.org/abs/2411.12820).” arXiv:2411.12820 \[cs.AI\].
### 2020 – 2021
S Garrabrant. 2021. “[Temporal Inference with Finite Factored Sets](https://arxiv.org/abs/2109.11513).” arXiv:2109.11513 \[cs.AI\].
S Garrabrant, D Herrmann, and J Lopez-Wild. 2021. “[Cartesian Frames](https://arxiv.org/abs/2109.10996).” arXiv:2109.10996 \[math.CT\].
E Hubinger. 2020. “[An Overview of 11 Proposals for Building Safe Advanced AI](https://arxiv.org/abs/2012.07532).” arXiv:2012.07532 \[cs.LG\].
### 2019
A Demski and S Garrabrant. 2019. “[Embedded Agency](https://arxiv.org/abs/1902.09469).” arXiv:1902.09469 \[cs.AI\].
E Hubinger, C van Merwijk, V Mikulik, J Skalse, and S Garrabrant. 2019. “[Risks from Learned Optimization in Advanced Machine Learning Systems](https://arxiv.org/abs/1906.01820).” arXiv:1906.01820 \[cs.AI\].
V Kosoy. 2019. “[Delegative Reinforcement Learning: Learning to Avoid Traps with a Little Help](https://drive.google.com/uc?export=download&id=1xa7UpGGODl6mszNWkA4XQGPyeopsNuWu).” Presented at the Safe Machine Learning workshop at ICLR.
### 2018
S Armstrong and S Mindermann. 2018. “[Occam’s Razor is Insufficient to Infer the Preferences of Irrational Agents](http://papers.nips.cc/paper/7803-occams-razor-is-insufficient-to-infer-the-preferences-of-irrational-agents.pdf).” In _Advances in Neural Information Processing Systems_ 31.
D Manheim and S Garrabrant. 2018. “[Categorizing Variants of Goodhart’s Law](https://arxiv.org/abs/1803.04585).” arXiv:1803.04585 \[cs.AI\].
### 2017
R Carey. 2018. “[Incorrigibility in the CIRL Framework](https://arxiv.org/abs/1709.06275).” arXiv:1709.06275 \[cs.AI\]. Paper presented at the AAAI/ACM Conference on Artificial Intelligence, Ethics, and Society.
A Critch. 2017.
... (truncated, 28 KB total)