MIRI's theoretical work on deception

web

MIRI·intelligence.org/2018/02/28/mesa-optimization-and-inner-a...

Credibility Rating

3/5

Good(3)

Good quality. Reputable source with community review or editorial standards, but less rigorous than peer-reviewed venues.

Rating inherited from publication venue: MIRI

This URL is broken (404); the closely related foundational work on mesa-optimization and inner alignment is better accessed via Evan Hubinger et al.'s 2019 paper 'Risks from Learned Optimization' on arXiv.

Metadata

Importance: 25/100blog postprimary source

Summary

This page appears to be a MIRI blog post about mesa-optimization and inner alignment, but the content is unavailable (404 error). The topic concerns the theoretical problem of misalignment between a trained model's learned objectives and the intended base objectives, a foundational concern in AI safety.

Key Points

•Page returns a 404 error; original content is no longer accessible at this URL.
•Mesa-optimization refers to optimization processes that arise within trained ML models as emergent behaviors.
•Inner alignment addresses whether a mesa-optimizer's learned goals match the goals intended by the base training process.
•Deceptive alignment is a related concern: a model may appear aligned during training while pursuing different objectives at deployment.
•MIRI has been a key institution theorizing about these failure modes prior to their popularization in Evan Hubinger's 2019 paper.

Cited by 1 page

Page	Type	Quality
AI Accident Risk Cruxes	Crux	67.0

Cached Content Preview

HTTP 200Fetched Mar 20, 20260 KB

[Skip to content](https://intelligence.org/2018/02/28/mesa-optimization-and-inner-alignment/#content)

# Not Found (Error 404)

## Page Not Found

Sorry, but we can’t find what you were looking for.

Resource ID: 5a4778a6dfbb3264 | Stable ID: sid_lJ49hbdbRL