Criticism of the Main Framework in AI Alignment - EA Forum
Credibility Rating
Good quality. Reputable source with community review or editorial standards, but less rigorous than peer-reviewed venues.
Rating inherited from publication venue: EA Forum
A philosophical critique from the EA Forum challenging the framing assumptions of mainstream alignment research; useful for understanding debates about whether technical safety or governance/ethics should be the primary lens for AI risk reduction.
Forum Post Details
Metadata
Summary
Michele Campolo argues that mainstream AI alignment research over-focuses on the technical control problem while neglecting risks from deliberate misuse by malicious actors. The author proposes that moral progress—rather than direct risk reduction—offers a more comprehensive framework addressing both misaligned and maliciously deployed AI across near- and long-term scenarios.
Key Points
- The dominant AI alignment framework centers on the control problem (preventing misaligned AI), but this framing inadequately covers intentional misuse by bad actors.
- Malicious use of powerful AI systems (e.g., authoritarian lock-in, bioweapons) may pose equal or greater risks than unintended misalignment.
- The author proposes moral progress as an alternative organizing framework, arguing it addresses both misalignment and misuse scenarios more holistically.
- The critique highlights a potential blind spot in the EA/rationalist alignment community's prioritization of technical safety over socio-political risks.
- Short- and long-term AI risk landscapes both require frameworks that account for human intent, not just system behavior.
Cited by 1 page
| Page | Type | Quality |
|---|---|---|
| Model Organisms of Misalignment | Analysis | 65.0 |
Cached Content Preview
Criticism of the main framework in AI alignment — EA Forum
Ongoing project on moral AI
by Michele Campolo · Aug 31 2022 · 8 min read
Tags: AI safety, Building effective altruism, Cause prioritization, Existential risk, AI alignment, Criticism and Red Teaming Contest, Frontpage

Contents:
0. Summary
1. Criticism of the main framework in AI alignment
  1.1 What I mean by main framework
  1.2 Response
2. An alternative to the main framework
  2.1 Moral progress as a goal of alignment research
  2.2 Some considerations and objections to the alternative
3. Technical details about the alternative
References
Appendix (Terminology; Other stuff)

Most of the content of the post applies to both the short-term and the long-term future, and can be read by anyone who has heard about AI alignment before.
0. Summary
AI alignment research centred around the control problem works well for futures shaped by out-of-control misaligned AI, but not that well for futures shaped by bad actors using AI. Section 1 contains a step-by-step argument for that claim. In section 2 I propose an alternative which aims at moral progress instead of direct risk reduction, and I reply to some objections. I will give technical details about the alternative at some point in the future, in section 3.
The appendix clarifies some minor ambiguities with terminology and links to other stuff.
1. Criticism of the main framework in AI alignment
1.1 What I mean by main framework
In short, it’s the rationale behind most work in AI alignment: solving the control problem to reduce existential risk. I am not talking about AI governance, nor about AI safety that has nothing to do with existential risk (e.g. safety of self-driving cars).
Here are the details, presented as a step-by-step argument.
1. At some point in the future, we'll be able to design AIs that are very good at achieving their goals. (Capabilities premise)
2. These AIs might have goals that are different from their designers' goals. (Misalignment premise)
3. Therefore, very bad futures caused by out-of-control misaligned AI are possible. (From previous two premises)
4. AI alignment research that is motivated by the previous argument often aims at making misalignment between AI and designer, or loss of control, less likely to happen or less severe. (Alignment research premise)
Common approaches are ensuring that the goals of the AI are well specified and aligned with what the designer originally wanted, or making the AI learn our values by observing our behaviour. In case you are new to these ideas, two accessible books on the subject are [1,2].
5. Therefore, AI alignment research improves the expected value of bad futures caused by out-of-control misaligned AI. (From
... (truncated, 21 KB total)