EA Forum - Some Quick Thoughts on "AI Is Easy to Control"
Credibility Rating
Good quality. Reputable source with community review or editorial standards, but less rigorous than peer-reviewed venues.
Rating inherited from publication venue: EA Forum
This EA Forum post responds to optimistic claims about AI controllability, offering a counterargument relevant to debates about whether current alignment progress meaningfully addresses risks from future superintelligent systems.
Forum Post Details
Metadata
Summary
Mikhail Samin critiques claims that AI is easy to control, arguing that such claims conflate the controllability of current subhuman systems with the fundamentally harder problem of controlling superintelligent systems. He contends that success in overseeing weaker AI cannot be bootstrapped into alignment for superhuman systems: the capability jump is a qualitative shift that leaves the control problem demanding oversight capabilities humans do not yet possess.
Key Points
- Conflating controllability of current AI with alignment of superintelligent AI misrepresents the core safety challenge.
- Subhuman AI systems may be easier to oversee, but controlling superhuman systems requires oversight capabilities that humans don't yet possess.
- Success with weaker AI systems cannot be simply extrapolated to solve alignment for more capable, superhuman systems.
- The leap from subhuman to superhuman capability is a qualitative shift, not merely a quantitative one, making the control problem fundamentally harder.
- Alignment and controllability are distinct concepts, and conflating them obscures the real difficulty of the long-term safety problem.
Cited by 1 page
| Page | Type | Quality |
|---|---|---|
| ControlAI | Organization | 63.0 |
Cached Content Preview
Some quick thoughts on "AI is easy to control" — EA Forum
by MikhailSamin, Dec 7 2023, 9 min read
There are many things I feel like the post authors miss, and I want to share a few thoughts that seem good to communicate.
I'm going to focus on controlling superintelligent AI systems: systems powerful enough to solve alignment (in the CEV sense) completely, or to kill everyone on the planet.
In this post, I'm going to ignore other AI-related sources of x-risk, such as AI-enabled bioterrorism, and I'm not commenting on everything that seems important to comment on.
I'm also not going to point at all the slippery claims that I think can make the reader generalize incorrectly, as that would be nitpicky and not worth the time. (Examples of what I'd skip: I couldn't find evidence that GPT-4 has undergone any supervised fine-tuning; RLHF shapes chatbots' brains into the kind of systems that produce outputs that make human graders click thumbs-up/"I prefer this text", and smart systems that do that are not themselves necessarily "preferred" by human graders; one footnote [1].)
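To make the RLHF point concrete, here is a minimal sketch (mine, not the post's; toy random embeddings stand in for model outputs) of the pairwise preference loss commonly used to train an RLHF reward model. Note what the score attaches to: individual outputs that graders compared, which is why optimizing against it produces a system whose outputs are approved, not a system that graders would themselves endorse.

```python
# Sketch, not the post's code: a toy reward model trained on pairwise
# grader preferences. The reward scores *outputs*, so a policy maximizing
# it is shaped to produce approved outputs, nothing more.
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    """Toy stand-in: maps an output embedding to a scalar 'grader approval' score."""
    def __init__(self, dim: int = 16):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, output_embedding: torch.Tensor) -> torch.Tensor:
        return self.score(output_embedding).squeeze(-1)

def preference_loss(rm: RewardModel, preferred: torch.Tensor, rejected: torch.Tensor) -> torch.Tensor:
    # Bradley-Terry pairwise loss: push the preferred output's score
    # above the rejected output's score.
    return -torch.nn.functional.logsigmoid(rm(preferred) - rm(rejected)).mean()

rm = RewardModel()
opt = torch.optim.Adam(rm.parameters(), lr=1e-3)
preferred, rejected = torch.randn(32, 16), torch.randn(32, 16)  # placeholder data
for _ in range(100):
    opt.zero_grad()
    loss = preference_loss(rm, preferred, rejected)
    loss.backward()
    opt.step()
```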
Intro
> many people are worried that we will lose control of artificial intelligence, leading to human extinction or a similarly catastrophic "AI takeover." We hope the arguments in this essay make such an outcome seem implausible. But even if future AI turns out to be less "controllable" in a strict sense of the word — simply because, for example, it thinks faster than humans can directly supervise — we also argue it will be easy to instill our values into an AI, a process called "alignment."
This misrepresents the worry. Saying "but even if" makes it look like people worrying about x-risk place credence on "loss of control leads to x-risk regardless of/despite alignment"; that these people are wrong, as the post shows "this outcome" to be implausible; and, separately, that even if they're right about loss of control, they're wrong about x-risk, because alignment will make it fine.
But mostly, people (including the leading voices) are worried specifically about capable misaligned systems leading to human extinction. I don't know anyone in the community who'd say that a CEV-aligned superintelligence grabbing control would be a bad thing leading to extinction.
> Since each generation of controllable AIs can help control the next generation, it looks like this process can continue indefinitely, even to very high levels of capability
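To make the structure of this quoted claim explicit, here is a hypothetical sketch (an illustration of the argument, not code from either post): the scheme is a loop whose invariant is that the current overseer can evaluate the next generation, and the dispute is over whether that invariant survives the jump to superhuman capability.

```python
# Hypothetical sketch of the bootstrapping scheme the quoted essay asserts.
# The objection is that the loop's invariant -- the overseer can evaluate
# the generation it oversees -- is exactly what breaks at superhuman levels.
def bootstrap_oversight(generations, can_evaluate):
    """generations: AI systems, weakest first; returns the last one controlled."""
    overseer = "humans"  # oversight starts with human graders
    for ai in generations:
        if not can_evaluate(overseer, ai):
            # Nothing in the scheme guarantees this check keeps passing.
            raise RuntimeError(f"oversight gap: {overseer} cannot evaluate {ai}")
        overseer = ai  # the newly controlled system becomes the next overseer
    return overseer

# Toy capability model (an assumption for illustration): an overseer can
# evaluate systems at most one "level" above its own.
levels = {"humans": 1, "gpt-n": 2, "gpt-n+1": 3, "superintelligence": 10}
can_eval = lambda a, b: levels[b] - levels[a] <= 1
bootstrap_oversight(["gpt-n", "gpt-n+1"], can_eval)    # succeeds
# bootstrap_oversight(["superintelligence"], can_eval) # raises: oversight gap
```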
I expect it to be easy to reward-shape AIs below a certain level [2] of capability, and I worry ab
... (truncated, 26 KB total)