Longterm Wiki

ARC's first technical report: Eliciting Latent Knowledge


Authors

paulfchristiano · Mark Xu · Ajeya Cotra

Credibility Rating

Good (3/5)

Good quality. Reputable source with community review or editorial standards, but less rigorous than peer-reviewed venues.

Rating inherited from publication venue: LessWrong

This 2021 ARC report is a landmark alignment document that formalized the ELK problem and launched a prize competition; it is widely cited as a key reference for understanding deceptive alignment and scalable oversight challenges.

Forum Post Details

Karma
230
Comments
90
Forum
lesswrong
Status
Curated
Forum Tags
Eliciting Latent Knowledge · Alignment Research Center (ARC) · AI

Metadata

Importance: 88/100 · blog post · primary source

Summary

ARC's foundational technical report introduces Eliciting Latent Knowledge (ELK) as a central open problem in AI alignment: how to extract what an AI system actually 'knows' about the world rather than what it reports. The report surveys multiple proposed approaches to mapping between an AI's internal world-model and human concepts, and explains why this problem is both hard and critical to solving alignment.

Key Points

  • ELK addresses the core challenge of getting an AI to report its true beliefs rather than what will satisfy evaluators or pass oversight checks.
  • The problem is closely related to ontology identification: bridging the gap between an AI's internal representations and human concepts/values.
  • The report presents and critiques multiple proposed approaches, establishing a research agenda and methodology for ARC.
  • ELK is positioned as foundational to ARC's broader alignment strategy, particularly for scalable oversight in high-stakes scenarios.
  • The report launched a well-known prize competition to solicit solutions, significantly broadening community engagement with the problem.
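The core ambiguity the report identifies can be sketched in a toy example (hypothetical code, not from the report; the scenario names are illustrative stand-ins for the report's SmartVault setup). Human labelers can only grade cases they can check, so two very different reporters fit the training data perfectly: one that translates the predictor's latent knowledge directly, and one that simulates what a human labeler would conclude.

```python
# Toy sketch of the ELK training-set ambiguity (hypothetical example).
# A "predictor" has a latent state tracking the true world; humans label
# cases using only what they can observe (a camera feed).

def predictor_latent(scenario):
    """Latent state: what the model actually 'knows'."""
    return {"diamond_present": scenario["diamond_present"]}

def human_judgment(scenario):
    """What a human labeler concludes from the camera feed alone."""
    return scenario["camera_shows_diamond"]

def direct_translator(latent):
    # Desired reporter: reads the truth out of the latent state.
    return latent["diamond_present"]

def human_simulator(scenario):
    # Degenerate reporter: predicts the human label instead.
    return human_judgment(scenario)

# Training cases are easy: the camera is reliable, so labels match truth.
train = [
    {"diamond_present": True,  "camera_shows_diamond": True},
    {"diamond_present": False, "camera_shows_diamond": False},
]

# Both reporters achieve zero training loss...
assert all(
    direct_translator(predictor_latent(s)) == human_simulator(s) == human_judgment(s)
    for s in train
)

# ...but diverge on a hard case: a screen placed in front of the camera.
hard = {"diamond_present": False, "camera_shows_diamond": True}
print(direct_translator(predictor_latent(hard)))  # False -- reports the truth
print(human_simulator(hard))                      # True  -- reports what fools the human
```

The point of the sketch: no amount of training data that humans can label distinguishes the two reporters, which is why the report's proposed approaches focus on structural or regularization-based ways to favor the direct translator.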

Cited by 1 page

PageTypeQuality
Alignment Research CenterOrganization57.0

Cached Content Preview

HTTP 200 · Fetched Mar 20, 2026 · 98 KB

[Best of LessWrong 2021](https://www.lesswrong.com/bestoflesswrong?year=2021&category=all)

[Eliciting Latent Knowledge](https://www.lesswrong.com/w/eliciting-latent-knowledge)[Alignment Research Center (ARC)](https://www.lesswrong.com/w/alignment-research-center-arc)[AI](https://www.lesswrong.com/w/ai) [Curated](https://www.lesswrong.com/recommendations)

# [ARC's first technical report: Eliciting Latent Knowledge](https://www.lesswrong.com/posts/qHCDysDnvhteW7kRd/arc-s-first-technical-report-eliciting-latent-knowledge)

by [paulfchristiano](https://www.lesswrong.com/users/paulfchristiano?from=post_header), [Mark Xu](https://www.lesswrong.com/users/mark-xu?from=post_header), [Ajeya Cotra](https://www.lesswrong.com/users/ajeya-cotra?from=post_header)

14th Dec 2021

[AI Alignment Forum](https://alignmentforum.org/posts/qHCDysDnvhteW7kRd/arc-s-first-technical-report-eliciting-latent-knowledge)

1 min read

[90 comments](https://www.lesswrong.com/posts/qHCDysDnvhteW7kRd/arc-s-first-technical-report-eliciting-latent-knowledge#comments) · 230 karma · Ω 95

[Review by Orpheus16](https://www.lesswrong.com/posts/qHCDysDnvhteW7kRd/arc-s-first-technical-report-eliciting-latent-knowledge#WDCnePLNdAEpnnP2z) · [Review by Vaniver](https://www.lesswrong.com/posts/qHCDysDnvhteW7kRd/arc-s-first-technical-report-eliciting-latent-knowledge#tnxtbX897FZEuDipw)

This is a linkpost for [https://docs.google.com/document/d/1WwsnJQstPq91\_Yh-Ch2XRL8H\_EpsnjrC1dwZXR37PC8/edit?usp=sharing](https://docs.google.com/document/d/1WwsnJQstPq91_Yh-Ch2XRL8H_EpsnjrC1dwZXR37PC8/edit?usp=sharing)

ARC has published a report on [Eliciting Latent Knowledge](https://docs.google.com/document/d/1WwsnJQstPq91_Yh-Ch2XRL8H_EpsnjrC1dwZXR37PC8/edit?usp=sharing), an open problem which we believe is central to alignment. We think reading this report is the clearest way to understand what problems we are working on, how they fit into our plan for solving alignment in the worst case, and our research methodology.

The core difficulty we discuss is learning how to map between an AI’s model of the

... (truncated, 98 KB total)
Resource ID: 37f4871113caa2ab | Stable ID: NDFiYzZjZW