Back
MIRI's theoretical work on deception
webCredibility Rating
3/5
Good(3)Good quality. Reputable source with community review or editorial standards, but less rigorous than peer-reviewed venues.
Rating inherited from publication venue: MIRI
This URL is broken (404); the closely related foundational work on mesa-optimization and inner alignment is better accessed via Evan Hubinger et al.'s 2019 paper 'Risks from Learned Optimization' on arXiv.
Metadata
Importance: 25/100blog postprimary source
Summary
This page appears to be a MIRI blog post about mesa-optimization and inner alignment, but the content is unavailable (404 error). The topic concerns the theoretical problem of misalignment between a trained model's learned objectives and the intended base objectives, a foundational concern in AI safety.
Key Points
- •Page returns a 404 error; original content is no longer accessible at this URL.
- •Mesa-optimization refers to optimization processes that arise within trained ML models as emergent behaviors.
- •Inner alignment addresses whether a mesa-optimizer's learned goals match the goals intended by the base training process.
- •Deceptive alignment is a related concern: a model may appear aligned during training while pursuing different objectives at deployment.
- •MIRI has been a key institution theorizing about these failure modes prior to their popularization in Evan Hubinger's 2019 paper.
Cited by 1 page
| Page | Type | Quality |
|---|---|---|
| AI Accident Risk Cruxes | Crux | 67.0 |
Cached Content Preview
HTTP 200Fetched Mar 20, 20260 KB
[Skip to content](https://intelligence.org/2018/02/28/mesa-optimization-and-inner-alignment/#content) # Not Found (Error 404) ## Page Not Found Sorry, but we can’t find what you were looking for.
Resource ID:
5a4778a6dfbb3264 | Stable ID: Y2E2YjQ4Ym