Longterm Wiki

A Bird's Eye View of ARC's Research - Alignment Forum

blog

Credibility Rating

3/5 (Good)

Good quality. Reputable source with community review or editorial standards, but less rigorous than peer-reviewed venues.

Rating inherited from publication venue: Alignment Forum

Written by ARC researchers, this post serves as a high-level map of ARC's research agenda and is useful for understanding how ELK and alignment robustness fit into a broader scalable alignment strategy.

Metadata

Importance: 72/100 · blog post · primary source

Summary

This post outlines the Alignment Research Center's unified research agenda for scalable alignment, centered on two core subproblems: alignment robustness and eliciting latent knowledge (ELK). ARC uses a 'builder-breaker' methodology that makes worst-case assumptions rather than optimistic extrapolations from current AI systems. The post maps individual research projects to this broader framework using a hierarchical diagram.

Key Points

  • ARC's mission is developing alignment approaches that scale gracefully to more advanced AI systems, not just patching current ones.
  • The 'builder-breaker' methodology stress-tests proposals by assuming worst-case empirical scenarios rather than relying on favorable contingencies (see the illustrative sketch after this list).
  • Two central subproblems: alignment robustness (maintaining intent alignment under out-of-distribution inputs) and eliciting latent knowledge (ELK).
  • A hierarchical diagram connects individual research projects to the overarching framework, providing a structured view of ARC's research portfolio.
  • The approach prioritizes worst-case guarantees over average-case performance, reflecting concern about catastrophic failure modes in advanced AI.
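As a rough illustration of the builder-breaker loop mentioned above, the following toy Python sketch renders the methodology as an adversarial game between a "builder" who proposes an algorithm and a "breaker" who names a worst-case scenario. Everything here (the `Proposal` type, the scenario set, the function names) is hypothetical and is not taken from ARC's work.

```python
# Toy sketch of a builder-breaker loop. This is NOT ARC's actual procedure;
# it is a schematic of the game described in the post, with hypothetical names.
from dataclasses import dataclass

# A toy universe of worst-case scenarios the breaker may invoke; the
# methodology lets the breaker assume any empirically unfalsified worst case.
WORST_CASES = {"deceptive reporter", "distribution shift", "gradient hacking"}

@dataclass(frozen=True)
class Proposal:
    """A candidate algorithm, represented only by the scenarios it handles."""
    handled: frozenset = frozenset()

def find_counterexample(proposal: Proposal) -> str | None:
    """Breaker's move: return any worst case the proposal does not handle."""
    return next(iter(WORST_CASES - proposal.handled), None)

def refine(proposal: Proposal, scenario: str) -> Proposal:
    """Builder's move: patch the proposal to handle the counterexample,
    rather than arguing that the scenario is empirically unlikely."""
    return Proposal(handled=proposal.handled | {scenario})

def builder_breaker(max_rounds: int = 10) -> Proposal | None:
    proposal = Proposal()
    for _ in range(max_rounds):
        scenario = find_counterexample(proposal)
        if scenario is None:
            return proposal  # survives every known worst case, for now
        proposal = refine(proposal, scenario)
    return None  # no robust proposal found within the budget

if __name__ == "__main__":
    survivor = builder_breaker()
    if survivor is not None:
        print("handled:", sorted(survivor.handled))
```

The point of the loop is its asymmetry: the builder must handle every scenario the breaker can name instead of arguing any of them away, which mirrors the post's emphasis on worst-case guarantees over average-case performance.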

Cited by 1 page

| Page | Type | Quality |
| --- | --- | --- |
| Model Organisms of Misalignment | Analysis | 65.0 |

Cached Content Preview

HTTP 200 · Fetched Mar 15, 2026 · 62 KB
[Alignment Research Center (ARC)](https://www.alignmentforum.org/w/alignment-research-center-arc) [AI](https://www.alignmentforum.org/w/ai) [Curated](https://www.alignmentforum.org/recommendations)


# [A bird's eye view of ARC's research](https://www.alignmentforum.org/posts/ztokaf9harKTmRcn4/a-bird-s-eye-view-of-arc-s-research)

by [Jacob\_Hilton](https://www.alignmentforum.org/users/jacob_hilton?from=post_header) · 23rd Oct 2024 · [ARC](https://www.alignment.org/) · 8 min read · 57 karma · [12 comments](https://www.alignmentforum.org/posts/ztokaf9harKTmRcn4/a-bird-s-eye-view-of-arc-s-research#comments)

_This post includes a "flattened version" of an interactive diagram that cannot be displayed on this site. I recommend reading the original version of the post with the interactive diagram, which can be found [here](https://www.alignment.org/blog/a-birds-eye-view-of-arcs-research)._

Over the last few months, ARC has released a number of pieces of research. While some of these can be independently motivated, there is also a more unified research vision behind them. The purpose of this post is to try to convey some of that vision and how our individual pieces of research fit into it.

_Thanks to Ryan Greenblatt, Victor Lecomte, Eric Neyman, Jeff Wu and Mark Xu for helpful comments._

## A bird's eye view

To begin, we will take a "bird's eye" view of ARC's research.[\[1\]](https://www.alignmentforum.org/posts/ztokaf9harKTmRcn4/a-bird-s-eye-view-of-arc-s-research#fn1) As we "zoom in", more nodes will become visible and we will explain the new nodes.

_An interactive version of the diagrams below can be found [here](https://www.alignment.org/blog/a-birds-eye-view-of-arcs-research)._

### Zoom level 1

![birds_eye_lvl1.svg](https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ztokaf9harKTmRcn4/vgdmqmbjhnr9eyfvswek)

At the most zoomed-out level, ARC is working on the problem of "[intent alignment](https://ai-alignment.com/clarifying-ai-alignment-cec47cd69dd6)": how to design AI systems that are trying to do what their operators want. While many practitioners are taking an iterative approach to this problem, there are [foreseeable ways](https://arxiv.org/abs/2307.15217) in which today's leading approaches could fail to scale to more intelligent AI systems, which could have [undesirable consequences](https://www.cold-takes.com/without-specific-countermeasures-the-easiest-path-to-transformative-ai-likely-leads-to-ai-takeover/). ARC is attempting to develop algorithms that have a better chance of scaling gracefully to future AI systems, hence the term " **scalable alignment**".

ARC's particular approach to scalable alignment is a "builder-breaker" methodology (described in more detail [here](https://ai-alignment.com/my-research-methodology-b94f2751cb2c), and exemplified in the [ELK report](https://docs.google.com/document/d/1WwsnJQstPq91_Yh-Ch2XRL8H_EpsnjrC1dwZXR37PC8/edit)). Roughly speaking, if the scalability of an algorithm depen

... (truncated, 62 KB total)