Apollo Research — Research Overview
Web Credibility Rating
High quality. Established institution or organization with editorial oversight and accountability.
Rating inherited from publication venue: Apollo Research
Apollo Research is a dedicated AI safety organization; this page indexes its published work and is a useful starting point for tracking its contributions to scheming evaluations, interpretability, and AI governance.
Metadata
Summary
Apollo Research's research page aggregates their publications across evaluations, interpretability, and governance, with a focus on detecting and understanding AI scheming, deceptive alignment, and loss of control risks. Key featured works include a taxonomy and preparedness framework for Loss of Control and a study, conducted in partnership with OpenAI, that stress-tests anti-scheming training methods. The page serves as a central index for their contributions to AI safety science and policy.
Key Points
- Features a Loss of Control taxonomy and preparedness framework for policymakers, covering the degrees and dynamics of LoC scenarios.
- Includes evaluations of frontier models for in-context scheming behavior, including work done in collaboration with OpenAI on anti-scheming training.
- Interpretability research covers sparse dictionary learning, linear probes for detecting strategic deception, and mechanistic description methods (a toy probe sketch follows this list).
- Governance work addresses EU AI Act compliance, national security AI assurance, and frameworks for AI incident reporting regimes.
- Research spans both technical safety (evaluations, interpretability) and policy-facing outputs, making Apollo a cross-domain AI safety lab.
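As a rough illustration of the linear-probe technique referenced above, the sketch below fits a logistic-regression probe on frozen model activations to separate honest from deceptive outputs. Everything here (the activation arrays, dimensions, and labels) is a hypothetical placeholder rather than Apollo Research's actual data or code; their paper "Detecting Strategic Deception Using Linear Probes" describes the real method.

```python
# Toy sketch of a linear probe for deception detection, assuming we already
# have one activation vector per model response (e.g. a mean residual-stream
# activation at some layer). All data below is synthetic placeholder material.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
d_model = 512  # hypothetical hidden dimension

# Synthetic stand-ins for activations gathered under honest vs. deceptive
# conditions; the small mean shift plays the role of a real deception signal.
acts_honest = rng.normal(0.0, 1.0, size=(200, d_model))
acts_deceptive = rng.normal(0.3, 1.0, size=(200, d_model))

X = np.vstack([acts_honest, acts_deceptive])
y = np.concatenate([np.zeros(200), np.ones(200)])  # 1 = deceptive

# A linear probe is just a linear classifier trained on frozen activations.
probe = LogisticRegression(max_iter=1000).fit(X, y)

# At detection time the probe emits a scalar deception score per response;
# thresholding that score yields a simple monitor.
scores = probe.predict_proba(X)[:, 1]
print(f"mean score on deceptive half: {scores[200:].mean():.2f}")
print(f"train accuracy: {probe.score(X, y):.2f}")
```

In practice the hard questions are where to read activations from, how labels are obtained, and how well the probe generalizes off-distribution; the toy example only shows the mechanics.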
Cited by 9 pages
| Page | Type | Quality |
|---|---|---|
| Apollo Research | Organization | 58.0 |
| Alignment Evaluations | Approach | 65.0 |
| AI Evaluation | Approach | 72.0 |
| AI Safety Cases | Approach | 91.0 |
| Scheming & Deception Detection | Approach | 91.0 |
| Sleeper Agent Detection | Approach | 66.0 |
| Technical AI Safety Research | Crux | 66.0 |
| Mesa-Optimization | Risk | 63.0 |
| Scheming | Risk | 74.0 |
Cached Content Preview
# Research
Featured

- Governance
### The Loss of Control Playbook: Degrees, Dynamics, and Preparedness
Despite increasing policy and research attention to Loss of Control (LoC), decision- and policymakers are still operating in the absence of a uniform conceptualization and definition of LoC. Today, we bridge this gap through a novel taxonomy and preparedness framework for LoC that explores the degrees and dynamics of LoC through a comprehensive best-in-class literature review and presents actionable tools to counter relevant threats to national security and humanity.

- Evaluations
### Stress Testing Deliberative Alignment for Anti-Scheming Training
We partnered with OpenAI to assess frontier language models for early signs of scheming — covertly pursuing misaligned goals — in controlled stress-tests (non-typical environments), and studied a training method that can significantly reduce (but not eliminate) these behaviors. Our results are complicated by models’ increasing ability to recognize our evaluation environments as tests of their alignment.
Our Research
- Governance
#### Internal Deployment of AI Models and Systems in the EU AI Act

- Governance
#### Assurance of Frontier AI Built for National Security

- Governance
#### AI Behind Closed Doors: a Primer on The Governance of Internal Deployment

- Governance
#### Capturing and Countering Threats to National Security: a Blueprint for an Agile AI Incident Regime

- Interpretability
#### Interpretability in Parameter Space: Minimizing Mechanistic Description Length with Attribution-based Parameter Decomposition

- Interpretability
#### Detecting Strategic Deception Using Linear Probes

- Governance
#### Precursory Capabilities: A Refinement to Pre-deployment Information Sharing and Tripwire Capabilities

- Evaluations
#### Frontier Models are Capable of In-Context Scheming

- Evaluations
#### Towards Safety Cases For AI Scheming

- Interpretability
#### Identifying
... (truncated, 4 KB total)