AI Control

Status: active

Research on deploying AI systems with sufficient safeguards even if they are misaligned, using monitoring, sandboxing, and redundancy.

Organizations: 2

Grants: 5

Total Funding: $340K

First Proposed: 2023 (Greenblatt et al., Redwood Research)

Cluster: AI Control

Organizations (2)

Organization	Role
Anthropic	active
Redwood Research	pioneer

Grants (5)

Name	Recipient	Amount	Funder	Date
Luthien	Jai Dhyani	$170K	Manifund	2025-03-04
Help AIs create AI safety tools	Jacob Arbeid	$80K	Manifund	2025-10-29
4-month stipend to continue work on AI Control as a MATS extension	Cody Rushing	$30K	Long-Term Future Fund (LTFF)	2024-07
4-month stipend to continue work on AI Control as a MATS extension	Tyler Tracy	$30K	Long-Term Future Fund (LTFF)	2024-07
4-month salary to continue work on AI Control as a MATS extension	Vasil Georgiev	$30K	Long-Term Future Fund (LTFF)	2024-07

Funder	Grants	Total Amount
Manifund	2	$250K
Long-Term Future Fund (LTFF)	3	$90K

Seminal paper: Greenblatt & Roger (2024)

Name	Description	Status	Orgs	Papers
Circuit Breakers	Inference-time interventions that halt model execution when unsafe behavior is detected.	active	2	1
Encoded Reasoning Detection	Detecting when AI systems use steganography, hidden channels, or encoded communication to circumvent oversight.	emerging	2	1
Monitoring and Anomaly Detection	Runtime monitoring of AI system behavior to detect unexpected actions, policy violations, or anomalous patterns.	active	2	1
Multi-Agent Safety	Safety challenges arising from multiple AI agents interacting, including collusion, coordination failures, and emergent behaviors.	emerging	2	1
Output Filtering	Post-generation safety filters that screen model outputs before delivery.	active	2	1
Sandboxing and Containment	Isolating AI systems in controlled environments to limit their ability to cause harm, including air-gapped execution and capability restrictions.	active	2	1
Structured Access / API-Only	Deployment models that restrict access to model weights, providing only API interfaces.	active	2	1
Tool-Use Restrictions	Limiting which external tools and actions AI agents can access.	active	2	1