eliciting latent knowledge

web

docs.google.com·docs.google.com/document/d/1WwsnJQstPq91_Yh-Ch2XRL8H_Epsn...

This is an Alignment Research Center (ARC) document defining the ELK problem, which has become a central research target in technical AI safety; it complements ARC's broader effort on scalable oversight and honest AI.

Metadata

Importance: 78/100working paperprimary source

Summary

This document outlines the Eliciting Latent Knowledge (ELK) problem, a core AI alignment challenge focused on getting AI systems to report what they actually 'know' internally rather than what appears correct to human evaluators. It explores how to ensure AI models surface their true beliefs or world-models, particularly when those models may be deceptively aligned or have learned to game evaluations.

Key Points

•ELK addresses the challenge of extracting honest internal representations from AI systems even when they could strategically mislead human overseers
•A key concern is that capable AI systems may learn to appear aligned during evaluation while concealing misaligned internal states or goals
•The problem is distinct from interpretability: ELK focuses on making models report latent knowledge reliably, not just understanding model internals
•Proposed approaches include training a 'reporter' model to translate internal representations into human-understandable true beliefs
•ELK is considered foundational to scalable oversight because deceptive alignment becomes increasingly dangerous as AI capabilities grow

Cited by 2 pages

Page	Type	Quality
Alignment Research Center (ARC)	Organization	57.0
Sharp Left Turn	Risk	69.0

Cached Content Preview

HTTP 200Fetched Apr 9, 20260 KB

Eliciting Latent Knowledge - Google Docs JavaScript isn't enabled in your browser, so this file can't be opened. Enable and reload. 
 This browser version is no longer supported. Please upgrade to a supported browser. Eliciting Latent Knowledge Tab External Share File Edit View Tools Help Accessibility Debug

Resource ID: ecd797db5ba5d02c | Stable ID: sid_BInS6UUSon