Longterm Wiki

Credibility Rating

3/5 (Good)

Good quality. Reputable source with community review or editorial standards, but less rigorous than peer-reviewed venues.

Rating inherited from publication venue: MIRI

This URL is broken (404); the closely related foundational work on mesa-optimization and inner alignment is better accessed via Evan Hubinger et al.'s 2019 paper 'Risks from Learned Optimization' on arXiv.

Metadata

Importance: 25/100 | blog post | primary source

Summary

This page appears to be a MIRI blog post about mesa-optimization and inner alignment, but the content is unavailable (404 error). The topic concerns the theoretical problem of misalignment between a trained model's learned objectives and the intended base objectives, a foundational concern in AI safety.

Key Points

  • Page returns a 404 error; original content is no longer accessible at this URL.
  • Mesa-optimization refers to optimization processes that emerge within trained ML models, i.e., the trained model is itself an optimizer rather than being explicitly designed as one.
  • Inner alignment addresses whether a mesa-optimizer's learned objective matches the objective intended by the base training process (a rough formal sketch follows this list).
  • Deceptive alignment is a related concern: a model may appear aligned during training while pursuing different objectives at deployment.
  • MIRI had been a key institution theorizing about these failure modes before their popularization in Evan Hubinger et al.'s 2019 paper 'Risks from Learned Optimization'.

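Because the original post is unavailable, the following is a minimal formal sketch of the base-objective / mesa-objective distinction; the notation is adapted from the standard framing in 'Risks from Learned Optimization', not recovered from the cached page itself. The base optimizer (e.g., gradient descent) selects model parameters to minimize a base objective over the training distribution:

$$\theta^* \approx \arg\min_{\theta}\ \mathbb{E}_{x \sim D_{\text{train}}}\big[\,L_{\text{base}}(f_{\theta}(x),\, x)\,\big]$$

If the resulting model $f_{\theta^*}$ is itself an optimizer, it chooses its outputs by optimizing some internal mesa-objective:

$$a^*(x) \approx \arg\min_{a}\ L_{\text{mesa}}(a,\, x)$$

Inner alignment asks whether $L_{\text{mesa}}$ agrees with $L_{\text{base}}$; deceptive alignment is the case where the two agree on the training distribution but diverge at deployment.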
Cited by 1 page

Page | Type | Quality
AI Accident Risk Cruxes | Crux | 67.0

Cached Content Preview

HTTP 200 | Fetched Mar 20, 2026 | 0 KB
[Skip to content](https://intelligence.org/2018/02/28/mesa-optimization-and-inner-alignment/#content)

# Not Found (Error 404)

## Page Not Found

Sorry, but we can’t find what you were looking for.
Resource ID: 5a4778a6dfbb3264 | Stable ID: Y2E2YjQ4Ym