AI Control

Status: active

Research on deploying AI systems with sufficient safeguards even if they are misaligned, using monitoring, sandboxing, and redundancy.
| Organizations | Key Papers | Grants | Total Funding | Risks Addressed |
|---|---|---|---|---|
| 2 | 1 | 5 | $340K | 2 |
First Proposed: 2023 (Greenblatt et al., Redwood Research)
Cluster: AI Control
Tags: ai-control, safety-research, deployment
Organizations (2)
| Organization | Role |
|---|---|
| Anthropic | active |
| Redwood Research | pioneer |
Grants (5)
| Name | Recipient | Amount | Funder | Date |
|---|---|---|---|---|
| Luthien | Jai Dhyani | $170K | Manifund | 2025-03-04 |
| Help AIs create AI safety tools | Jacob Arbeid | $80K | Manifund | 2025-10-29 |
| 4-month stipend to continue work on AI Control as a MATS extension | Cody Rushing | $30K | Long-Term Future Fund (LTFF) | 2024-07 |
| 4-month stipend to continue work on AI Control as a MATS extension | Tyler Tracy | $30K | Long-Term Future Fund (LTFF) | 2024-07 |
| 4-month salary to continue work on AI Control as a MATS extension | Vasil Georgiev | $30K | Long-Term Future Fund (LTFF) | 2024-07 |
Funding by Funder
| Funder | Grants | Total Amount |
|---|---|---|
| Manifund | 2 | $250K |
| Long-Term Future Fund (LTFF) | 3 | $90K |
Key Papers & Resources (1)
[Seminal] AI Control: Improving Safety Despite Intentional Subversion (Greenblatt et al., 2024)
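Among the protocols the paper evaluates is trusted monitoring: a weaker but trusted model scores each output of the stronger untrusted model for suspiciousness, with a small audit budget reserved for the worst cases. Below is a minimal sketch in Python, assuming hypothetical model wrappers (`untrusted_generate`, `trusted_generate`, `trusted_suspicion_score`, `human_audit`) and illustrative thresholds; it is a sketch of the pattern, not the paper's implementation.

```python
import random

# Hypothetical stand-ins for the models and auditor; names are
# illustrative, not an actual API.
def untrusted_generate(task: str) -> str:
    return f"untrusted model's solution to {task!r}"

def trusted_generate(task: str) -> str:
    return f"trusted (weaker) model's solution to {task!r}"

def trusted_suspicion_score(task: str, output: str) -> float:
    return random.random()  # stand-in for the trusted monitor's score

def human_audit(task: str, output: str) -> bool:
    return False  # stand-in: True would mean subversion confirmed

AUDIT_THRESHOLD = 0.98  # escalate the most suspicious ~2% to a human
DEFER_THRESHOLD = 0.80  # discard merely dubious outputs without audit

def trusted_monitoring(task: str) -> str:
    """One step of a trusted-monitoring protocol in the style of
    Greenblatt et al. (2024)."""
    candidate = untrusted_generate(task)
    score = trusted_suspicion_score(task, candidate)
    if score >= AUDIT_THRESHOLD:
        if human_audit(task, candidate):
            # A confirmed attack ends the deployment entirely.
            raise RuntimeError("intentional subversion detected")
        return candidate
    if score >= DEFER_THRESHOLD:
        # Too suspicious to use, not worth audit budget: defer to
        # the trusted model's own weaker but safe answer.
        return trusted_generate(task)
    return candidate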
Sub-Areas (8)
| Name | Description | Status | Orgs | Papers |
|---|---|---|---|---|
| Circuit Breakers | Inference-time interventions that halt model execution when unsafe behavior is detected. | active | 2 | 1 |
| Encoded Reasoning Detection | Detecting when AI systems use steganography, hidden channels, or encoded communication to circumvent oversight. | emerging | 2 | 1 |
| Monitoring and Anomaly Detection | Runtime monitoring of AI system behavior to detect unexpected actions, policy violations, or anomalous patterns. | active | 2 | 1 |
| Multi-Agent Safety | Safety challenges arising from multiple AI agents interacting, including collusion, coordination failures, and emergent behaviors. | emerging | 2 | 1 |
| Output Filtering | Post-generation safety filters that screen model outputs before delivery. | active | 2 | 1 |
| Sandboxing and Containment | Isolating AI systems in controlled environments to limit their ability to cause harm, including air-gapped execution and capability restrictions (see the second sketch after this table). | active | 2 | 1 |
| Structured Access / API-Only | Deployment models that restrict access to model weights, providing only API interfaces. | active | 2 | 1 |
| Tool-Use Restrictions | Limiting which external tools and actions AI agents can access (see the first sketch after this table). | active | 2 | 1 |
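Of these, Tool-Use Restrictions is the most mechanical to illustrate. A minimal allowlist sketch in Python, assuming a hypothetical `RestrictedToolbox` that an agent scaffold would route every tool call through (the class and tool names are invented for illustration):

```python
from typing import Any, Callable

class ToolPolicyError(Exception):
    """Raised when an agent's tool call violates deployment policy."""

class RestrictedToolbox:
    """Allowlist dispatch for agent tool calls: only explicitly
    registered tools can run, and per-tool validators check the
    arguments before execution. Everything else fails closed."""

    def __init__(self) -> None:
        self._tools: dict[str, Callable[..., Any]] = {}
        self._validators: dict[str, Callable[..., bool]] = {}

    def register(self, name: str, fn: Callable[..., Any],
                 validator: Callable[..., bool] = lambda **kw: True) -> None:
        self._tools[name] = fn
        self._validators[name] = validator

    def call(self, name: str, **kwargs: Any) -> Any:
        if name not in self._tools:
            raise ToolPolicyError(f"tool {name!r} is not on the allowlist")
        if not self._validators[name](**kwargs):
            raise ToolPolicyError(f"arguments to {name!r} rejected by policy")
        return self._tools[name](**kwargs)

# Example: expose read-only file access outside /etc; any other tool
# name (shell, network, ...) raises ToolPolicyError.
toolbox = RestrictedToolbox()
toolbox.register(
    "read_file",
    lambda path: open(path, encoding="utf-8").read(),
    validator=lambda path: not path.startswith("/etc"),
)
```

Failing closed is the key design choice: the agent gets no default capability, only what the deployer registered.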
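Sandboxing and Containment can be sketched at its crudest level as process isolation. The sketch below (assumptions: POSIX and CPython; real containment layers on containers, seccomp, and network denial) runs model-written code in a child process with a scrubbed environment, a throwaway working directory, and a hard timeout:

```python
import subprocess
import sys
import tempfile

def run_sandboxed(code: str, timeout_s: float = 5.0) -> str:
    """Coarse containment sketch: execute model-written Python in a
    separate process with minimal inherited state."""
    with tempfile.TemporaryDirectory() as workdir:
        result = subprocess.run(
            [sys.executable, "-c", code],
            cwd=workdir,        # confine writes to a throwaway directory
            env={},             # drop inherited secrets and configuration
            capture_output=True,
            text=True,
            timeout=timeout_s,  # raises TimeoutExpired, killing the child
        )
    return result.stdout

print(run_sandboxed("print(2 + 2)"))  # -> 4
```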