Longterm Wiki

Alignment Research Center

Website: alignment.org

ARC is one of the leading independent technical AI safety research organizations; its evaluations work spun out as METR, and it remains influential in shaping how frontier labs approach pre-deployment safety assessments.

Metadata

Importance: 72/100

Summary

The Alignment Research Center (ARC) is a non-profit research organization focused on technical AI alignment and safety research. ARC works on understanding and addressing risks from advanced AI systems, including interpretability, evaluations, and identifying dangerous AI capabilities before deployment.

Key Points

  • ARC conducts technical research on AI alignment, aiming to ensure advanced AI systems behave safely and as intended by their developers.
  • The organization developed ARC Evals (now METR), focused on evaluating dangerous capabilities in frontier AI models.
  • ARC works on interpretability research to better understand the internal representations and reasoning of large language models.
  • Research priorities include understanding how AI systems could autonomously acquire resources or evade human oversight.
  • ARC collaborates with major AI labs to conduct pre-deployment safety evaluations of frontier models.

Cited by 13 pages

6 FactBase facts citing this source

Cached Content Preview

HTTP 200 · Fetched Feb 26, 2026 · 4 KB
The Alignment Research Center (ARC) is a non-profit research organization whose mission is to align future machine learning systems with human interests.

We are currently pursuing theoretical research on how to produce formal mechanistic explanations of neural network behavior.

## Recent research

In 2025, ARC has been making conceptual and theoretical progress at the fastest pace that I've seen since I first interned in 2022. Most of this progress has come about because
…
[»](https://www.alignment.org/blog/competing-with-sampling/)

Over the last few months, ARC has released a number of pieces of research. While some of these can be independently motivated, there is also a more unified research vision behind them. The
…
[»](https://www.alignment.org/blog/a-birds-eye-view-of-arcs-research/)

ARC's current research focus can be thought of as trying to combine mechanistic interpretability and formal verification. If we had a deep understanding of what was going on inside a neural
…
[»](https://www.alignment.org/blog/formal-verification-heuristic-explanations-and-surprise-accounting/)

[Read more →](https://www.alignment.org/blog)

## About ARC

**What is “alignment”?** ML systems can exhibit goal-directed behavior, but it is difficult to understand or control what they are “trying” to do. Powerful models could cause harm if they were trying to manipulate and deceive humans. The goal of [intent alignment](https://ai-alignment.com/clarifying-ai-alignment-cec47cd69dd6?ref=alignment-research-center-2.ghost.io) is to instead train these models to be helpful and honest.

**Motivation**: We expect that modern ML techniques would lead to severe misalignment if scaled up to sufficiently large compute budgets and datasets. Practitioners may be able to adapt before these failures have catastrophic consequences, but we could reduce the risk by adopting scalable methods further in advance.

**What we’re working on**: We're currently working on [outperforming random sampling](https://www.alignment.org/blog/competing-with-sampling/) when it comes to understanding neural network outputs. More broadly, we are trying to produce [formal mechanistic explanations](https://www.alignment.org/blog/formal-verification-heuristic-explanations-and-surprise-accounting/) for neural network behaviors in order to [produce robustly aligned systems](https://www.alignment.org/blog/a-birds-eye-view-of-arcs-research/). We see this as the most promising approach to our broader research agenda, which is explained along with our research methodology in our report on [Eliciting Latent Knowledge](https://docs.google.com/document/d/1WwsnJQstPq91_Yh-Ch2XRL8H_EpsnjrC1dwZXR37PC8/edit).

**Methodology**: We’re unsatisfied with an algorithm if we can see any plausible story about how it eventually breaks down, which means that we can rule out most algorithms on paper without ever implementing them. The cost of this approach is that it may completely miss strategies that exploit important structure in real

... (truncated, 4 KB total)