AI Alignment Forum
blog
Credibility Rating
Good quality. Reputable source with community review or editorial standards, but less rigorous than peer-reviewed venues.
Rating inherited from publication venue: Alignment Forum
The AI Alignment Forum is the primary online community for technical AI safety research; the featured post is foundational agent-foundations work questioning utility-function orthodoxy in decision theory.
Metadata
Summary
The AI Alignment Forum is a central community platform for technical AI safety and alignment research discussion. The featured post argues against 'reductive utility' (utility functions over possible worlds) and proposes the Jeffrey-Bolker framework as an alternative that avoids ontological crises and computability constraints by grounding preferences in agent-relative events rather than universal physics.
Key Points
- Challenges the standard assumption that rational agents must have utility functions defined over possible worlds (the 'reductive utility' view)
- Identifies two core problems: ontological crises when the agent's understanding of physics changes, and an overly restrictive computability requirement on utility functions
- Proposes the Jeffrey-Bolker framework as an alternative in which agents have preferences over events, without requiring an underlying utility function over worlds
- Argues preferences should be specified in high-level concepts (human welfare, paperclips) rather than microscopic physical degrees of freedom
- The forum itself is the primary hub for serious technical AI alignment research discourse and community discussion
Cited by 10 pages
| Page | Type | Quality |
|---|---|---|
| Capabilities-to-Safety Pipeline Model | Analysis | 73.0 |
| AI Compounding Risks Analysis Model | Analysis | 60.0 |
| Mesa-Optimization Risk Analysis | Analysis | 61.0 |
| Worldview-Intervention Mapping | Analysis | 62.0 |
| Alignment Research Center | Organization | 57.0 |
| Conjecture | Organization | 37.0 |
| Google DeepMind | Organization | 37.0 |
| Machine Intelligence Research Institute | Organization | 50.0 |
| Dario Amodei | Person | 41.0 |
| Paul Christiano | Person | 39.0 |
Cached Content Preview
[An Orthodox Case Against Utility Functions](https://www.alignmentforum.org/posts/A8iGaZ3uHNNGgJeaD/an-orthodox-case-against-utility-functions)
[Best of LessWrong 2020](https://www.alignmentforum.org/bestoflesswrong?year=2020&category=ai%20safety)
Abram argues against assuming that rational agents have utility functions over worlds (which he calls the "reductive utility" view). Instead, he points out that you can have a perfectly valid decision theory where agents just have preferences over events, without having to assume there's some underlying utility function over worlds.

Vanessa Kosoy:
In this post, the author presents a case for replacing expected utility theory with some other structure which has no explicit utility function, but only quantities that correspond to conditional expectations of utility.
To provide motivation, the author starts from what he calls the "reductive utility view", which is the thesis he sets out to overthrow. He then identifies two problems with the view.
The first problem is about the ontology in which preferences are defined. In the reductive utility view, the domain of the utility function is the set of possible universes, according to the best available understanding of physics. This is objectionable, because then the agent needs to somehow change the domain as its understanding of physics grows (the ontological crisis problem). It seems more natural to allow the agent's preferences to be specified in terms of the high-level concepts it cares about (e.g. human welfare or paperclips), not in terms of the microscopic degrees of freedom (e.g. quantum fields or strings). There are also additional complications related to the unobservability of rewards, and to "moral uncertainty".
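A minimal sketch of this contrast (an illustration added here, not taken from the post or the review; all names are hypothetical): a utility hard-coded against one microphysical ontology breaks when the agent's physics is revised, while a preference attached to a high-level event only needs the concept to be re-located in the new ontology.

```python
from dataclasses import dataclass

# "Reductive utility": the domain of U is a fixed microphysical ontology.
@dataclass
class ClassicalWorld:
    particle_positions: list[float]

def reductive_utility(world: ClassicalWorld) -> float:
    # Hard-coded against one representation of physics. If the agent's
    # best theory changes (say, to wavefunctions), this domain no longer
    # exists and U must somehow be transported: an ontological crisis.
    return float(len(world.particle_positions))

# Event-level preference: value attaches to a high-level proposition.
def at_least_n_paperclips(visible_objects: list[str], n: int) -> bool:
    # The event "there are at least n paperclips" only requires the agent
    # to recognize paperclips in whatever ontology it currently uses; no
    # utility over complete microphysical worlds is needed.
    return visible_objects.count("paperclip") >= n

if __name__ == "__main__":
    world = ClassicalWorld(particle_positions=[0.0, 1.5, 3.2])
    print(reductive_utility(world))                                     # 3.0
    print(at_least_n_paperclips(["paperclip", "pen", "paperclip"], 2))  # True
```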
The second problem is that the reductive utility view requires the utility function to be computable. The author considers this an overly restrictive requirement, since it rules out utility functions such as the one in the procrastination paradox (1 if the button is ever pushed, 0 if the button is never pushed). More generally, computable utility functions have to be continuous (in the sense of the topology on the space of infinite histories obtained by regarding it as an infinite Cartesian product over time).
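Written out explicitly (notation added here, not quoted from the post), the procrastination-paradox utility on an infinite history $h = (h_1, h_2, \ldots)$ of per-step button states is:

```latex
U(h) =
\begin{cases}
  1 & \exists t \;:\; h_t = \text{pushed}, \\
  0 & \forall t \;:\; h_t \neq \text{pushed}.
\end{cases}
```

No finite prefix of a history ever certifies that the button will never be pushed, so $U$ is discontinuous in the product topology; and since computable functions on infinite sequences must be continuous, $U$ cannot be computable.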
The alternative suggested by the author is using the Jeffrey-Bolker framework. Alas, the author does not write down the precise mathematical definition of the framework, which I find frustrating. The linked article in the Stanford Encyclopedia of Philosophy
... (truncated, 34 KB total)
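As a concrete reference point where the preview cuts off (this is the standard desirability axiom from the Jeffrey-Bolker literature, e.g. the Stanford Encyclopedia entry, not the post's own formalization): for disjoint events $A$ and $B$ with $P(A), P(B) > 0$, the value of their union is the probability-weighted average

```latex
V(A \cup B) = \frac{P(A)\,V(A) + P(B)\,V(B)}{P(A) + P(B)}
```

Under this axiom, $V(A)$ behaves like a conditional expectation of utility given $A$: exactly the 'quantities that correspond to conditional expectations of utility' described in the review, with no utility function over individual worlds required.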