Longterm Wiki

Criticism of the Main Framework in AI Alignment - EA Forum

blog

Author

Michele Campolo

Credibility Rating

3/5 (Good)

Good quality. Reputable source with community review or editorial standards, but less rigorous than peer-reviewed venues.

Rating inherited from publication venue: EA Forum

A philosophical critique from the EA Forum challenging the framing assumptions of mainstream alignment research; useful for understanding debates about whether technical safety or governance/ethics should be the primary lens for AI risk reduction.

Forum Post Details

Karma
45
Comments
9
Forum
eaforum
Forum Tags
AI safety, Building effective altruism, Cause prioritization, Existential risk, AI alignment, Criticism and Red Teaming Contest
Part of sequence: Ongoing project on moral AI

Metadata

Importance: 42/100 · blog post · commentary

Summary

Michele Campolo argues that mainstream AI alignment research over-focuses on the technical control problem while neglecting risks from deliberate misuse by malicious actors. The author proposes that moral progress—rather than direct risk reduction—offers a more comprehensive framework addressing both misaligned and maliciously deployed AI across near- and long-term scenarios.

Key Points

  • The dominant AI alignment framework centers on the control problem (preventing misaligned AI), but this framing inadequately covers intentional misuse by bad actors.
  • Malicious use of powerful AI systems (e.g., authoritarian lock-in, bioweapons) may pose equal or greater risks than unintended misalignment.
  • The author proposes moral progress as an alternative organizing framework, arguing it addresses both misalignment and misuse scenarios more holistically.
  • The critique highlights a potential blind spot in the EA/rationalist alignment community's prioritization of technical safety over socio-political risks.
  • Short- and long-term AI risk landscapes both require frameworks that account for human intent, not just system behavior.

Cited by 1 page

Page | Type | Quality
Model Organisms of Misalignment | Analysis | 65.0

Cached Content Preview

HTTP 200 · Fetched Mar 15, 2026 · 21 KB
Criticism of the main framework in AI alignment — EA Forum 
 
 Ongoing project on moral AI: Criticism of the main framework in AI alignment

 by Michele Campolo · Aug 31 2022 · 8 min read

 Contents:
 0. Summary
 1. Criticism of the main framework in AI alignment
    1.1 What I mean by main framework
    1.2 Response
 2. An alternative to the main framework
    2.1 Moral progress as a goal of alignment research
    2.2 Some considerations and objections to the alternative
 3. Technical details about the alternative
 References
 Appendix: Terminology; Other stuff

 Most of the content of the post applies to both the short-term and the long-term future, and can be read by anyone who has heard about AI alignment before.

 0. Summary

 AI alignment research centred around the control problem works well for futures shaped by out-of-control misaligned AI, but not so well for futures shaped by bad actors using AI. Section 1 contains a step-by-step argument for that claim. In section 2 I propose an alternative which aims at moral progress instead of direct risk reduction, and I reply to some objections. I will give technical details about the alternative in section 3, at some point in the future.

 The appendix clarifies some minor ambiguities with terminology and links to other stuff.

 1. Criticism of the main framework in AI alignment

 1.1 What I mean by main framework

 In short, it’s the rationale behind most work in AI alignment: solving the control problem to reduce existential risk. I am not talking about AI governance, nor about AI safety that has nothing to do with existential risk (e.g. safety of self-driving cars).

 Here are the details, presented as a step-by-step argument.

 1. At some point in the future, we'll be able to design AIs that are very good at achieving their goals. (Capabilities premise)
 2. These AIs might have goals that are different from their designers' goals. (Misalignment premise)
 3. Therefore, very bad futures caused by out-of-control misaligned AI are possible. (From the previous two premises)
 4. AI alignment research that is motivated by the previous argument often aims at making misalignment between AI and designer, or loss of control, less likely to happen or less severe. (Alignment research premise)
    Common approaches are ensuring that the goals of the AI are well specified and aligned with what the designer originally wanted, or making the AI learn our values by observing our behaviour. In case you are new to these ideas, two accessible books on the subject are [1,2].
 5. Therefore, AI alignment research improves the expected value of bad futures caused by out-of-control misaligned AI. (From

... (truncated, 21 KB total)
Resource ID: bdaa3d7b94d1fe20 | Stable ID: Nzk5Y2NhMG