Alignment Research Center
Web: alignment.org
ARC is one of the leading independent technical AI safety research organizations; its evaluations work spun out as METR, and it remains influential in shaping how frontier labs approach pre-deployment safety assessments.
Metadata
Importance: 72/100 · homepage
Summary
The Alignment Research Center (ARC) is a non-profit research organization focused on technical AI alignment and safety research. ARC works on understanding and addressing risks from advanced AI systems, including interpretability, evaluations, and identifying dangerous AI capabilities before deployment.
Key Points
- ARC conducts technical research on AI alignment, aiming to ensure advanced AI systems behave safely and as intended by their developers.
- The organization developed ARC Evals (now METR), focused on evaluating dangerous capabilities in frontier AI models.
- ARC works on interpretability research to better understand the internal representations and reasoning of large language models.
- Research priorities include understanding how AI systems could autonomously acquire resources or evade human oversight.
- ARC collaborates with major AI labs to conduct pre-deployment safety evaluations of frontier models.
Cited by 13 pages
| Page | Type | Quality |
|---|---|---|
| Autonomous Coding | Capability | 63.0 |
| Long-Horizon Autonomous Tasks | Capability | 65.0 |
| Power-Seeking Emergence Conditions Model | Analysis | 63.0 |
| AI Risk Warning Signs Model | Analysis | 70.0 |
| Alignment Research Center | Organization | 57.0 |
| Paul Christiano | Person | 39.0 |
| AI Control | Research Area | 75.0 |
| Eliciting Latent Knowledge (ELK) | Approach | 91.0 |
| AI Alignment Research Agendas | Crux | 69.0 |
| Sleeper Agent Detection | Approach | 66.0 |
| Deceptive Alignment | Risk | 75.0 |
| Emergent Capabilities | Risk | 61.0 |
| Sharp Left Turn | Risk | 69.0 |
6 FactBase facts citing this source
| Entity | Property | Value | As Of |
|---|---|---|---|
| Alignment Research Center | Legal Structure | 501(c)(3) nonprofit | — |
| Alignment Research Center | Headquarters | Berkeley, CA | — |
| Alignment Research Center | Founded Date | Oct 2021 | — |
| Alignment Research Center | Founded By | sid_vzxfzxBITd | — |
| Paul Christiano | Employed By | QsXVXtQ0zE | Oct 2021 |
| Paul Christiano | Role / Title | Founder, Alignment Research Center | Oct 2021 |
Cached Content Preview
HTTP 200 · Fetched Feb 26, 2026 · 4 KB
The Alignment Research Center (ARC) is a non-profit research organization whose mission is to align future machine learning systems with human interests.
We are currently pursuing theoretical research on how to produce formal mechanistic explanations of neural network behavior.
## Recent research
In 2025, ARC has been making conceptual and theoretical progress at the fastest pace that I've seen since I first interned in 2022. Most of this progress has come about because
…
[»](https://www.alignment.org/blog/competing-with-sampling/)
Over the last few months, ARC has released a number of pieces of research. While some of these can be independently motivated, there is also a more unified research vision behind them. The
…
[»](https://www.alignment.org/blog/a-birds-eye-view-of-arcs-research/)
ARC's current research focus can be thought of as trying to combine mechanistic interpretability and formal verification. If we had a deep understanding of what was going on inside a neural
…
[»](https://www.alignment.org/blog/formal-verification-heuristic-explanations-and-surprise-accounting/)
[Read more →](https://www.alignment.org/blog)
## About ARC
**What is “alignment”?** ML systems can exhibit goal-directed behavior, but it is difficult to understand or control what they are “trying” to do. Powerful models could cause harm if they were trying to manipulate and deceive humans. The goal of [intent alignment](https://ai-alignment.com/clarifying-ai-alignment-cec47cd69dd6?ref=alignment-research-center-2.ghost.io) is to instead train these models to be helpful and honest.
**Motivation**: We expect that modern ML techniques would lead to severe misalignment if scaled up to large enough computers and datasets. Practitioners may be able to adapt before these failures have catastrophic consequences, but we could reduce the risk by adopting scalable methods further in advance.
**What we’re working on**: We're currently working on [outperforming random sampling](https://www.alignment.org/blog/competing-with-sampling/) when it comes to understanding neural network outputs. More broadly, we are trying to produce [formal mechanistic explanations](https://www.alignment.org/blog/formal-verification-heuristic-explanations-and-surprise-accounting/) for neural network behaviors in order to [produce robustly aligned systems](https://www.alignment.org/blog/a-birds-eye-view-of-arcs-research/). We see this as the most promising approach to our broader research agenda, which is explained along with our research methodology in our report on [Eliciting Latent Knowledge](https://docs.google.com/document/d/1WwsnJQstPq91_Yh-Ch2XRL8H_EpsnjrC1dwZXR37PC8/edit).
**Methodology**: We’re unsatisfied with an algorithm if we can see any plausible story about how it eventually breaks down, which means that we can rule out most algorithms on paper without ever implementing them. The cost of this approach is that it may completely miss strategies that exploit important structure in real
... (truncated, 4 KB total)

Resource ID: 0562f8c207d8b63f | Stable ID: ZjljOTM4Nm