
Misalignment Potential

Entry: Misalignment Potential

Model Role: Root Factor (AI System)
Key Parameters: Alignment Robustness, Interpretability Coverage, Human Oversight Quality
Primary Outcome: Existential Catastrophe
Related:
  • ai-transition-model-scenarios: Existential Catastrophe, AI Takeover
  • ai-transition-model-parameters: Alignment Robustness, Interpretability Coverage, Human Oversight Quality, Safety-Capability Gap, Safety Culture Strength

Misalignment Potential measures the likelihood that AI systems will pursue goals other than those we intend. This aggregate combines the technical and organizational factors that determine whether advanced AI systems might behave in harmful ways despite our efforts to align them.

Primary outcome affected: Existential Catastrophe ↑↑↑

When misalignment potential is high, catastrophic loss of control, accidents at scale, and goal divergence become more likely. Reducing this potential is the most direct lever for reducing existential and catastrophic AI risk.
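In transition-model terms, this page describes a single root-factor node: component parameters feed in, and a primary outcome and influences point out. The sketch below encodes that structure as plain data, using only the entry metadata above; the field names and types are illustrative, not the wiki's actual schema.

```python
from dataclasses import dataclass, field

@dataclass
class RootFactor:
    """One aggregate node in the AI transition model, mirroring the entry metadata above.
    Field names are illustrative only, not the wiki's actual schema."""
    name: str
    model_role: str
    key_parameters: list[str]
    primary_outcome: str
    influences: list[str] = field(default_factory=list)

misalignment_potential_node = RootFactor(
    name="Misalignment Potential",
    model_role="Root Factor (AI System)",
    key_parameters=[
        "Alignment Robustness",
        "Interpretability Coverage",
        "Human Oversight Quality",
    ],
    primary_outcome="Existential Catastrophe",
    influences=["AI Takeover"],
)
```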


Component Parameters

  • Alignment Robustness
  • Interpretability Coverage
  • Human Oversight Quality
  • Safety-Capability Gap
  • Safety Culture Strength

Influences

  • AI Takeover (+): AI systems gaining decisive control.

Internal Dynamics

These components interact:

  • Interpretability enables alignment verification: We can only confirm alignment if we understand model internals
  • Safety culture sustains investment: Without organizational commitment, safety research loses funding to capabilities
  • Oversight requires interpretability: Human overseers need tools to understand what systems are doing
  • Gap closure requires all components: No single factor is sufficient; safety capacity emerges from their combination (see the sketch below)
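The wiki does not specify how these components aggregate into a single misalignment score, but the interactions above suggest a bottleneck dynamic. The sketch below is a minimal illustration under two assumptions: each component is scored on a 0–1 scale, and overall safety capacity is limited by the weakest component, so misalignment potential only falls when every component improves. The scale, field names, and functional form are assumptions, not the model's actual definition.

```python
from dataclasses import dataclass

@dataclass
class ComponentScores:
    """Illustrative 0-1 scores for the component parameters listed above."""
    alignment_robustness: float
    interpretability_coverage: float
    human_oversight_quality: float
    safety_capability_gap_closure: float  # 1.0 = safety-capability gap fully closed
    safety_culture_strength: float

def misalignment_potential(scores: ComponentScores) -> float:
    """Toy aggregation: safety capacity is bottlenecked by the weakest component,
    so no single strong factor can compensate for a missing one."""
    safety_capacity = min(
        scores.alignment_robustness,
        scores.interpretability_coverage,
        scores.human_oversight_quality,
        scores.safety_capability_gap_closure,
        scores.safety_culture_strength,
    )
    return 1.0 - safety_capacity

# Example: strong alignment research but weak human oversight still leaves high potential.
print(misalignment_potential(ComponentScores(0.8, 0.7, 0.2, 0.6, 0.5)))  # -> 0.8
```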

How This Affects Scenarios

Contributing Factors

  • AI Capabilities (Factor)

Related Pages

Transition Model

Safety-Capability Gap, Human Oversight Quality, Safety Culture Strength, Misuse Potential, AI Uses, AI Ownership

Approaches

AI Alignment

Key Debates

AI Alignment Research Agendas, Why Alignment Might Be Hard, Why Alignment Might Be Easy

Concepts

RLHF

Models

Alignment Robustness Trajectory Model

Risks

Epistemic Sycophancy

Labs

Safe Superintelligence Inc.

People

Eliezer Yudkowsky

Safety Research

AI Value Learning, Prosaic Alignment