Criticism of the Main Framework in AI Alignment - EA Forum
Credibility Rating
Good quality. Reputable source with community review or editorial standards, but less rigorous than peer-reviewed venues.
Rating inherited from publication venue: EA Forum
A philosophical critique from the EA Forum challenging the framing assumptions of mainstream alignment research; useful for understanding debates about whether technical safety or governance/ethics should be the primary lens for AI risk reduction.
Forum Post Details
Metadata
Summary
Michele Campolo argues that mainstream AI alignment research over-focuses on the technical control problem while neglecting risks from deliberate misuse by malicious actors. The author proposes that moral progress—rather than direct risk reduction—offers a more comprehensive framework addressing both misaligned and maliciously deployed AI across near- and long-term scenarios.
Key Points
- The dominant AI alignment framework centers on the control problem (preventing misaligned AI), but this framing inadequately covers intentional misuse by bad actors.
- Malicious use of powerful AI systems (e.g., authoritarian lock-in, bioweapons) may pose equal or greater risks than unintended misalignment.
- The author proposes moral progress as an alternative organizing framework, arguing it addresses both misalignment and misuse scenarios more holistically.
- The critique highlights a potential blind spot in the EA/rationalist alignment community's prioritization of technical safety over socio-political risks.
- Short- and long-term AI risk landscapes both require frameworks that account for human intent, not just system behavior.
Cited by 1 page
| Page | Type | Quality |
|---|---|---|
| Model Organisms of Misalignment | Analysis | 65.0 |
Cached Content Preview
Criticism of the main framework in AI alignment — EA Forum
Ongoing project on moral AI
by Michele Campolo · Aug 31 2022 · 8 min read
Tags: AI safety, Building effective altruism, Cause prioritization, Existential risk, AI alignment, Criticism and Red Teaming Contest, Frontpage

Contents:
0. Summary
1. Criticism of the main framework in AI alignment
  1.1 What I mean by main framework
  1.2 Response
2. An alternative to the main framework
  2.1 Moral progress as a goal of alignment research
  2.2 Some considerations and objections to the alternative
3. Technical details about the alternative
References
Appendix (Terminology; Other stuff)

Most of the content of the post applies to both the short-term and the long-term future, and can be read by anyone who has heard about AI alignment before.
0. Summary
AI alignment research centred around the control problem works well for futures shaped by out-of-control misaligned AI, but not that well for futures shaped by bad actors using AI. Section 1 contains a step-by-step argument for that claim. In section 2 I propose an alternative which aims at moral progress instead of direct risk reduction, and I reply to some objections. I will give technical details about the alternative at some point in the future, in section 3.
The appendix clarifies some minor ambiguities with terminology and links to other stuff.
1. Criticism of the main framework in AI alignment
1.1 What I mean by main framework
In short, it’s the rationale behind most work in AI alignment: solving the control problem to reduce existential risk. I am not talking about AI governance, nor about AI safety that has nothing to do with existential risk (e.g. safety of self-driving cars).
Here are the details, presented as a step-by-step argument.
1. At some point in the future, we'll be able to design AIs that are very good at achieving their goals. (Capabilities premise)
2. These AIs might have goals that are different from their designers' goals. (Misalignment premise)
3. Therefore, very bad futures caused by out-of-control misaligned AI are possible. (From previous two premises)
4. AI alignment research that is motivated by the previous argument often aims at making misalignment between AI and designer, or loss of control, less likely to happen or less severe. (Alignment research premise)
Common approaches are ensuring that the goals of the AI are well specified and aligned with what the designer originally wanted, or making the AI learn our values by observing our behaviour. In case you are new to these ideas, two accessible books on the subject are [1,2].
5. Therefore, AI alignment research improves the expected value of bad futures caused by out-of-control misaligned AI. (From
... (truncated, 21 KB total)