Longterm Wiki

Paul Christiano's AI Alignment Research

blog

Credibility Rating

3/5 (Good)

Good quality. Reputable source with community review or editorial standards, but less rigorous than peer-reviewed venues.

Rating inherited from publication venue: Alignment Forum

Paul Christiano is one of the most cited researchers in technical AI alignment; his user profile aggregates posts, comments, and research threads on the Alignment Forum, making it a useful entry point into his body of work.

Metadata

Importance: 82/100

Summary

Paul Christiano is a leading AI alignment researcher and founder of ARC (Alignment Research Center), known for foundational contributions including iterated amplification, debate as an alignment technique, and eliciting latent knowledge (ELK). His work addresses existential risks from advanced AI, responsible scaling policies, and core technical challenges in ensuring AI systems remain beneficial and under human oversight.

Key Points

  • Developed influential alignment proposals including iterated amplification, AI safety via debate, and eliciting latent knowledge (ELK).
  • Argues that powerful AI poses genuine risks of irreversible human disempowerment, potentially emerging without clear warning signals.
  • Advocates for responsible scaling policies and transparency about AI capabilities from frontier labs.
  • Critiques alignment efforts that focus on tractable but less critical problems rather than core technical difficulties.
  • Founded the Alignment Research Center (ARC) to pursue theoretical alignment research (such as ELK) and evaluations of frontier AI systems.

Cited by 2 pages

Cached Content Preview

HTTP 200 · Fetched Mar 15, 2026 · 7 KB
# [Paul Christiano](https://www.alignmentforum.org/users/paulfchristiano)

Top posts

[**Where I agree and disagree with Eliezer**](https://www.alignmentforum.org/posts/CoZhXrhpQxpy9xw9y/where-i-agree-and-disagree-with-eliezer)
917 points · Jun 19, 2022

(Partially in response to AGI Ruin: A list of Lethalities. Written in the same rambling style. Not exhaustive.)

Agreements

1. Powerful AI systems have a good chance of deliberately and irreversibly disempowering humanity. This is a much more likely failure mode than humanity killing ourselves with destructive physical technologies.
2. Catastrophically risky AI systems could plausibly exist soon, and there likely won’t be a strong consensus about this fact until such systems pose a meaningful existential risk per year. There is not necessarily any “fire alarm.”
3. Even if there were consensus about a risk from powerful AI systems, there is a good chance that the world would respond in a totally unproductive way. It’s wishful thinking to look at possible stories of doom and say “we wouldn’t let that happen;” humanity is fully capable of messing up even very basic challenges, especially if they are novel.
4. I think that many of the projects intended to help with AI alignment don't make progress on key difficulties and won’t significantly reduce the risk of catastrophic outcomes. This is related to people gravitating to whatever research is most tractable and not being too picky about what problems it helps with, and related to a low level of concern with the long-term future in particular. Overall, there are relatively few researchers who are effectively focused on the technical problems most relevant to existential risk from alignment failures.
5. There are strong social and political pressures to spend much more of our time talking about how AI shapes existing conflicts and shifts power. This pressure is already playing out and it doesn’t seem too likely to get better.
6. Even when thinking about accident risk, people’s minds seem to go to what they think of as “more realistic and less sci fi” risks that are much less likely to be existential (and sometimes I think less plausible). It’s very possible this dynamic won’t change until after actually existing AI

[**What failure looks like**](https://www.alignmentforum.org/posts/HBxe6wdjxK239zajf/what-failure-looks-like)
442 points · Mar 17, 2019

[**AI alignment is distinct from its near-term applications**](https://www.alignmentforum.org/posts/Hw26MrLuhGWH7kBLm/ai-alignment-is-distinct-from-its-near-term-applications)
255 points · Dec 13, 2022

[**Thoughts on the impact of RLHF research**](https://www.alignmentforum.org/posts/vwu4kegAEZTBtpT6p/thoughts-on-the-impact-of-rlhf-research)
254 points · Jan 25, 2023


**Thoughts on responsible scaling policies and regulation**

I am excited about AI developers implementing res

... (truncated, 7 KB total)
Resource ID: ebb2f8283d5a6014 | Stable ID: Y2U1YWQ0NT