Longterm Wiki

MIRI All Publications Index

web

Credibility Rating

3/5 (Good)

Good quality. Reputable source with community review or editorial standards, but less rigorous than peer-reviewed venues.

Rating inherited from publication venue: MIRI

MIRI (Machine Intelligence Research Institute) is one of the earliest organizations dedicated to AI alignment research; this index is the canonical starting point for exploring their foundational technical contributions to the field.

Metadata

Importance: 72/100 · homepage · reference

Summary

A comprehensive index of all publications from the Machine Intelligence Research Institute (MIRI), covering foundational AI safety research including agent foundations, decision theory, logical uncertainty, and value alignment. This page serves as the primary access point for MIRI's technical and strategic research output spanning over a decade of work.

Key Points

  • Central repository for MIRI's full research catalog including technical papers, reports, and blog posts on AI alignment
  • Covers foundational topics: agent foundations, decision theory (TDT, UDT, FDT), logical uncertainty, and corrigibility
  • Includes landmark works like 'Coherent Extrapolated Volition', 'Tiling Agents', and research on instrumental convergence
  • Spans both early foundational work and more recent alignment-focused technical research
  • Useful for tracing the intellectual lineage of many core AI safety concepts still active in the field

Cited by 2 pages

Cached Content Preview

HTTP 200 · Fetched Mar 20, 2026 · 28 KB

# All MIRI Publications

# Articles

### 2024 – 2025

P Barnett. 2025. “ [Compute Requirements for Algorithmic Innovation in Frontier AI Models](https://arxiv.org/abs/2507.10618).” arXiv:2507.10618 \[cs.LG\].

P Barnett, A Scher, D Abecassis. 2025. “ [Technical Requirements for Halting Dangerous AI Activities](https://arxiv.org/abs/2507.09801).” arXiv:2507.09801 \[cs.AI\].

P Barnett, A Scher. 2025. “ [AI Governance to Avoid Extinction: The Strategic Landscape and Actionable Research Questions](https://intelligence.org/wp-content/uploads/2025/05/AI-Governance-to-Avoid-Extinction.pdf).” MIRI technical report 2025-1.

P Barnett. 2024. “ [What AI evaluations for preventing catastrophic risks can and cannot do](https://arxiv.org/abs/2412.08653).” arXiv:2412.08653 \[cs.CY\].

A Scher. 2024. “ [Mechanisms to Verify International Agreements About AI Development](https://intelligence.org/wp-content/uploads/2024/11/Mechanisms-to-Verify-International-Agreements-About-AI-Development-27-Nov-24.pdf).” MIRI technical report 2024-1.

P Barnett, L Thiergart. 2024. “ [Declare and Justify: Explicit assumptions in AI evaluations are necessary for effective regulation](https://arxiv.org/abs/2411.12820).” arXiv:2411.12820 \[cs.AI\].

### 2020 – 2021

S Garrabrant. 2021. “ [Temporal Inference with Finite Factored Sets](https://arxiv.org/abs/2109.11513).” arXiv:2109.11513 \[cs.AI\].

S Garrabrant, D Herrmann, and J Lopez-Wild. 2021. “ [Cartesian Frames](https://arxiv.org/abs/2109.10996).” arXiv:2109.10996 \[math.CT\].

E Hubinger. 2020. “ [An Overview of 11 Proposals for Building Safe Advanced AI](https://arxiv.org/abs/2012.07532).” arXiv:2012.07532 \[cs.LG\].

### 2019

A Demski and S Garrabrant. 2019. “ [Embedded Agency](https://arxiv.org/abs/1902.09469).” arXiv:1902.09469 \[cs.AI\].

E Hubinger, C van Merwijk, V Mikulik, J Skalse, and S Garrabrant. 2019. “ [Risks from Learned Optimization in Advanced Machine Learning Systems](https://arxiv.org/abs/1906.01820).” arXiv:1906.01820 \[cs.AI\].

V Kosoy. 2019. “ [Delegative Reinforcement Learning: Learning to Avoid Traps with a Little Help](https://drive.google.com/uc?export=download&id=1xa7UpGGODl6mszNWkA4XQGPyeopsNuWu).” Presented at the Safe Machine Learning workshop at ICLR.

### 2018

S Armstrong and S Mindermann. 2018. “ [Occam’s Razor is Insufficient to Infer the Preferences of Irrational Agents](http://papers.nips.cc/paper/7803-occams-razor-is-insufficient-to-infer-the-preferences-of-irrational-agents.pdf).” In _Advances in Neural Information Processing Systems_ 31.

D Manheim and S Garrabrant. 2018. “ [Categorizing Variants of Goodhart’s Law](https://arxiv.org/abs/1803.04585).” arXiv:1803.04585 \[cs.AI\].

### 2017

R Carey. 2018. “ [Incorrigibility in the CIRL Framework](https://arxiv.org/abs/1709.06275).” arXiv:1709.06275 \[cs.AI\]. Paper presented at the AAAI/ACM Conference on Artificial Intelligence, Ethics, and Society.

A Critch. 2017.

... (truncated, 28 KB total)