Longterm Wiki

MIRI/Open Philanthropy exchange on decision theory

blog

Author

Rob Bensinger

Credibility Rating

3/5 (Good)

Good quality. Reputable source with community review or editorial standards, but less rigorous than peer-reviewed venues.

Rating inherited from publication venue: Alignment Forum

This exchange is a rare public record of institutional disagreement between MIRI and Open Philanthropy on decision theory, making it valuable for understanding the landscape of foundational agent-design debates in AI alignment research.

Metadata

Importance: 62/100 · blog post · primary source

Summary

This post documents a substantive dialogue between MIRI and Open Philanthropy researchers comparing decision theories (CDT, EDT, TDT, UDT, FDT) and their relevance to AI alignment. The exchange focuses on whether updateless decision theories outperform updateful variants on key philosophical dilemmas such as counterfactual mugging and Troll Bridge. It serves as a useful reference for understanding where these organizations agree and disagree on foundational decision-theoretic questions.

Key Points

  • Clarifies distinctions between CDT, EDT, TDT, UDT, and FDT, providing a structured comparison of major decision theory frameworks.
  • Debates whether updateless approaches (UDT, updateless FDT) systematically outperform updateful versions on canonical dilemmas like counterfactual mugging.
  • Explores the Troll Bridge problem as a stress test for decision theories, highlighting edge cases where standard frameworks struggle.
  • Reflects genuine disagreement between MIRI and Open Philanthropy researchers, making institutional perspectives on foundational AI alignment questions explicit.
  • Relevant to AI alignment because the choice of decision theory for AI agents may have significant implications for their behavior in strategic or adversarial settings.

Cited by 1 page

| Page | Type | Quality |
| --- | --- | --- |
| Agent Foundations | Approach | 59.0 |

Cached Content Preview

HTTP 200 · Fetched Mar 15, 2026 · 26 KB

MIRI/OP exchange about decision theory — AI Alignment Forum

[Decision theory](https://www.alignmentforum.org/w/decision-theory)[Functional Decision Theory](https://www.alignmentforum.org/w/functional-decision-theory)[Causal Decision Theory](https://www.alignmentforum.org/w/causal-decision-theory)[Embedded Agency](https://www.alignmentforum.org/w/embedded-agency)[Evidential Decision Theory](https://www.alignmentforum.org/w/evidential-decision-theory)[AI](https://www.alignmentforum.org/w/ai)[Rationality](https://www.alignmentforum.org/w/rationality)


# [MIRI/OP exchange about decision theory](https://www.alignmentforum.org/posts/FBbHEjkZzdupcjkna/miri-op-exchange-about-decision-theory-1)

by [Rob Bensinger](https://www.alignmentforum.org/users/robbbb?from=post_header)

25th Aug 2021

12 min read

[7](https://www.alignmentforum.org/posts/FBbHEjkZzdupcjkna/miri-op-exchange-about-decision-theory-1#comments)


Open Philanthropy's Joe Carlsmith and Nick Beckstead had a short conversation about [decision theory](https://www.alignmentforum.org/s/Rm6oQRJJmhGCcLvxh/p/zcPLNNw4wgBX5k8kQ) a few weeks ago with MIRI's Abram Demski and Scott Garrabrant (and me) and LW's Ben Pace. I'm copying it here because I thought others might find it useful.

Terminology notes:

- CDT is [**causal decision theory**](https://plato.stanford.edu/entries/decision-causal/), the dominant theory among working decision theorists. CDT says to choose the action with the best causal consequences.
- EDT is **evidential decision theory**, CDT's traditional rival. EDT says to choose the action such that things go best _conditional_ on your choosing that action.
- TDT is [**timeless decision theory**](http://intelligence.org/files/TDT.pdf), a theory proposed by Eliezer Yudkowsky in 2010. TDT was superseded by FDT/UDT because TDT fails on dilemmas like [counterfactual mugging](https://www.alignmentforum.org/w/counterfactual-mugging), refusing to pay the mugger.
- UDT is [**updateless decision theory**](https://www.alignmentforum.org/w/updateless-decision-theory), a theory proposed by Wei Dai in 2009. UDT in effect asks what action "you would have pre-committed to without the benefit of any observations you have made about the universe", and chooses that action.
- FDT is [**functional decision theory**](https://arxiv.org/abs/1710.05060), an umbrella term introduced by Yudkowsky and Nate Soares in 2017 to refer to UDT-ish approaches to decision theory.
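The updateless/updateful split above can be made concrete with the payoffs of counterfactual mugging. A minimal sketch (the fair coin and the $100/$10,000 figures are the conventional ones from the literature, not taken from this exchange): an updateless theory evaluates the policy before the coin flip, while an updateful theory evaluates the action after observing tails.

```python
# Counterfactual mugging: Omega flips a fair coin. On heads, Omega pays
# $10,000 iff it predicts you would pay $100 on tails; on tails, Omega
# asks you for $100.

def ex_ante_value(policy_pays: bool) -> float:
    """Expected value of a policy evaluated before the coin flip,
    the way an updateless theory (UDT / updateless FDT) evaluates it."""
    p_heads = 0.5
    reward = 10_000 if policy_pays else 0  # Omega rewards predicted payers
    cost = -100 if policy_pays else 0      # paying on tails costs $100
    return p_heads * reward + (1 - p_heads) * cost

def updateful_value(pays: bool) -> float:
    """Value after observing tails: the heads branch is gone, so an
    updateful theory sees only the cost of paying."""
    return -100 if pays else 0

# Updateless verdict: committing to pay wins ex ante.
assert ex_ante_value(True) == 4950.0   # 0.5 * 10_000 + 0.5 * (-100)
assert ex_ante_value(False) == 0.0

# Updateful verdict: after seeing tails, refusing looks strictly better.
assert updateful_value(True) < updateful_value(False)
```

This is the structural disagreement the dialogue returns to: both calculations are internally consistent, and the dispute is over which evaluation point an agent should use.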

* * *

**Carlsmith:** Anyone have an example of a case where FDT and updateless EDT give different verdicts?

**Beckstead:** Is smoking lesion an example?

I haven't thought about how updateless EDT handles that differently from EDT.

**Demski:** FDT is supposed to be an overarching framework for decision theories "in the MIRI style", whereas updateless EDT is a specific decision theory.

In particular, FDT may or may not be updateless.

Updateful FDT is basically TDT.

Now, I generally claim it's harder to find examples wher

... (truncated, 26 KB total)