Skip to content
Longterm Wiki
Back

ARC's ELK (Eliciting Latent Knowledge) Report

web

Published by the Alignment Research Center (ARC), this report is considered a foundational document in technical AI alignment, introducing the ELK problem as a formal challenge and benchmarking proposed solutions; it has heavily influenced subsequent research on honesty, transparency, and oversight.

Metadata

Importance: 92/100organizational reportprimary source

Summary

ARC's foundational report on the Eliciting Latent Knowledge problem, which asks how to get an AI to honestly report its beliefs about the world even when it could fool human overseers. It systematically explores proposed solutions and their failure modes, framing ELK as a core alignment challenge that must be solved for scalable oversight to work.

Key Points

  • Defines the ELK problem: training an AI reporter to accurately convey what a powerful world-model 'knows' without deceiving human evaluators
  • Surveys multiple proposed solutions (e.g., auxilliary training, relaxed adversarial training) and identifies counterexamples for each
  • Argues that naive reward signals may train AIs to produce plausible-looking outputs rather than truthful representations of internal knowledge
  • Highlights the challenge of distinguishing a 'helpful' AI from one that has learned to deceive in ways humans cannot detect
  • Positions ELK as a prerequisite for scalable oversight approaches like amplification and debate to be trustworthy

Cited by 1 page

PageTypeQuality
Paul ChristianoPerson39.0

Cached Content Preview

HTTP 200Fetched Feb 26, 20260 KB
Eliciting Latent Knowledge - Google Docs JavaScript isn't enabled in your browser, so this file can't be opened. Enable and reload. 
 This browser version is no longer supported. Please upgrade to a supported browser. Eliciting Latent Knowledge Tab External Share File Edit View Tools Help Accessibility Debug
Resource ID: e6ff505f606f86cf | Stable ID: YThjMzgxYT