
Apollo Research — Research Overview


Credibility Rating

4/5 (High)

High quality. Established institution or organization with editorial oversight and accountability.

Rating inherited from publication venue: Apollo Research

Apollo Research is a dedicated AI safety organization; this page indexes its published work and is a useful starting point for tracking its contributions to scheming evaluations, interpretability, and AI governance.

Metadata

Importance: 72/100 · homepage

Summary

Apollo Research's research page aggregates its publications across evaluations, interpretability, and governance, with a focus on detecting and understanding AI scheming, deceptive alignment, and loss-of-control risks. Key featured works include a taxonomy and preparedness framework for Loss of Control, and a stress test of anti-scheming training conducted in partnership with OpenAI. The page serves as a central index for the organization's contributions to AI safety science and policy.

Key Points

  • Features a Loss of Control taxonomy and preparedness framework for policymakers, covering degrees and dynamics of LoC scenarios.
  • Includes evaluations of frontier models for in-context scheming behavior, including work done in collaboration with OpenAI on anti-scheming training.
  • Interpretability research covers sparse dictionary learning, linear probes for detecting strategic deception (a toy sketch follows this list), and mechanistic description methods.
  • Governance work addresses EU AI Act compliance, national security AI assurance, and frameworks for AI incident reporting regimes.
  • Research spans both technical safety (evaluations, interpretability) and policy-facing outputs, making Apollo a cross-domain AI safety lab.
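
To make the interpretability point concrete, here is a minimal, hypothetical sketch of what a linear probe for deception detection can look like. This is not Apollo's code: the synthetic arrays stand in for labeled model activations, and every name in it is invented for illustration.

```python
# Illustrative linear-probe sketch (hypothetical, not Apollo's implementation).
# A "probe" here is a logistic regression trained on frozen model activations
# to predict whether the model was behaving deceptively in a given context.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Stand-in data: in practice, `activations` would be residual-stream vectors
# extracted from a fixed layer of a language model (n_samples x hidden_dim),
# and `labels` would mark contexts judged deceptive (1) vs. honest (0).
n_samples, hidden_dim = 1000, 768
activations = rng.normal(size=(n_samples, hidden_dim))
labels = rng.integers(0, 2, size=n_samples)

X_train, X_test, y_train, y_test = train_test_split(
    activations, labels, test_size=0.2, random_state=0
)

# The learned weight vector is a single direction in activation space;
# its dot product with an activation scores how "deceptive" it looks.
probe = LogisticRegression(max_iter=1000)
probe.fit(X_train, y_train)
print(f"Held-out probe accuracy: {probe.score(X_test, y_test):.2f}")
```

On real labeled activations, held-out accuracy well above chance would indicate a linearly decodable "deception" signal; on this random stand-in data it will hover near 0.5.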

Cited by 9 pages

| Page | Type | Quality |
| --- | --- | --- |
| Apollo Research | Organization | 58.0 |
| Alignment Evaluations | Approach | 65.0 |
| AI Evaluation | Approach | 72.0 |
| AI Safety Cases | Approach | 91.0 |
| Scheming & Deception Detection | Approach | 91.0 |
| Sleeper Agent Detection | Approach | 66.0 |
| Technical AI Safety Research | Crux | 66.0 |
| Mesa-Optimization | Risk | 63.0 |
| Scheming | Risk | 74.0 |

Cached Content Preview

HTTP 200 · Fetched Mar 20, 2026 · 4 KB
# Research

## Featured

- Governance

### The Loss of Control Playbook: Degrees, Dynamics, and Preparedness

Despite increasing policy and research attention to Loss of Control (LoC), decision-makers and policymakers still lack a uniform conceptualization and definition of LoC. Today, we bridge this gap through a novel taxonomy and preparedness framework that explores the degrees and dynamics of LoC via a comprehensive literature review, and we present actionable tools to counter the relevant threats to national security and humanity.

- Evaluations

### Stress Testing Deliberative Alignment for Anti-Scheming Training

We partnered with OpenAI to assess frontier language models for early signs of scheming — covertly pursuing misaligned goals — in controlled stress-tests (non-typical environments), and studied a training method that can significantly reduce (but not eliminate) these behaviors. Our results are complicated by models’ increasing ability to recognize our evaluation environments as tests of their alignment.

## Our Research

- Governance

#### Internal Deployment of AI Models and Systems in the EU AI Act

- Governance

#### Assurance of Frontier AI Built for National Security

- Governance

#### AI Behind Closed Doors: a Primer on The Governance of Internal Deployment

- Governance

#### Capturing and Countering Threats to National Security: a Blueprint for an Agile AI Incident Regime

- Interpretability

#### Interpretability in Parameter Space: Minimizing Mechanistic Description Length with Attribution-based Parameter Decomposition

- Interpretability

#### Detecting Strategic Deception Using Linear Probes

- Governance

#### Precursory Capabilities: A Refinement to Pre-deployment Information Sharing and Tripwire Capabilities

- Evaluations

#### Frontier Models are Capable of In-Context Scheming

- Evaluations

#### Towards Safety Cases For AI Scheming

- Interpretability

#### Identifying

... (truncated, 4 KB total)
Resource ID: 560dff85b3305858 | Stable ID: YjBkNGIzMD