Longterm Wiki

AI Control

Status: active

Research on deploying AI systems with sufficient safeguards even if they are misaligned, using monitoring, sandboxing, and redundancy.

Organizations: 2
Key Papers: 1
Grants: 5
Total Funding: $340K
Risks Addressed: 2
First Proposed: 2023 (Greenblatt et al., Redwood Research)
Cluster: AI Control

Tags

ai-control, safety-research, deployment

Organizations (2)

Organization | Role
Anthropic | active
Redwood Research | pioneer

Grants (5)

Name | Recipient | Amount | Funder | Date
Luthien | Jai Dhyani | $170K | Manifund | 2025-03-04
Help AIs create AI safety tools | Jacob Arbeid | $80K | Manifund | 2025-10-29
4-month stipend to continue work on AI Control as a MATS extension | Cody Rushing | $30K | Long-Term Future Fund (LTFF) | 2024-07
4-month stipend to continue work on AI Control as a MATS extension | Tyler Tracy | $30K | Long-Term Future Fund (LTFF) | 2024-07
4-month salary to continue work on AI Control as a MATS extension | Vasil Georgiev | $30K | Long-Term Future Fund (LTFF) | 2024-07

Funding by Funder

Funder | Grants | Total Amount
Manifund | 2 | $250K
Long-Term Future Fund (LTFF) | 3 | $90K

Key Papers & Resources (1)

Sub-Areas (8)

Name | Description | Status | Orgs | Papers
Circuit Breakers | Inference-time interventions that halt model execution when unsafe behavior is detected. | active | 2 | 1
Encoded Reasoning Detection | Detecting when AI systems use steganography, hidden channels, or encoded communication to circumvent oversight. | emerging | 2 | 1
Monitoring and Anomaly Detection | Runtime monitoring of AI system behavior to detect unexpected actions, policy violations, or anomalous patterns. | active | 2 | 1
Multi-Agent Safety | Safety challenges arising from multiple AI agents interacting, including collusion, coordination failures, and emergent behaviors. | emerging | 2 | 1
Output Filtering | Post-generation safety filters that screen model outputs before delivery. | active | 2 | 1
Sandboxing and Containment | Isolating AI systems in controlled environments to limit their ability to cause harm, including air-gapped execution and capability restrictions. | active | 2 | 1
Structured Access / API-Only | Deployment models that restrict access to model weights, providing only API interfaces. | active | 2 | 1
Tool-Use Restrictions | Limiting which external tools and actions AI agents can access. | active | 2 | 1
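
As a rough illustration of how several of the sub-areas above might compose around a single untrusted agent, the sketch below chains a tool allowlist (tool-use restrictions), a suspicion score (monitoring), a halt on high suspicion (circuit breaker), and a post-execution check (output filtering). It is a toy under stated assumptions, not a method taken from the listed paper or organizations: `ControlledExecutor`, `toy_monitor`, `toy_filter`, and the 0.8 threshold are hypothetical names and values chosen for the example. Sandboxing is not modeled.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Action:
    tool: str       # name of the tool the untrusted agent wants to call
    argument: str   # argument passed to that tool


class Halted(Exception):
    """Raised when the circuit breaker stops the episode."""


@dataclass
class ControlledExecutor:
    """Hypothetical wrapper that gates every action an untrusted agent proposes."""
    allowed_tools: set[str]
    monitor: Callable[[Action], float]      # returns a suspicion score in [0, 1]
    output_filter: Callable[[str], bool]    # returns True if the output may be delivered
    suspicion_threshold: float = 0.8        # assumed value, for illustration only
    audit_log: list[str] = field(default_factory=list)

    def run(self, action: Action, tools: dict[str, Callable[[str], str]]) -> str:
        # Tool-use restriction: refuse anything outside the allowlist.
        if action.tool not in self.allowed_tools:
            self.audit_log.append(f"blocked tool: {action.tool}")
            return "[blocked: tool not permitted]"

        # Monitoring + circuit breaker: halt before execution if the score is high.
        score = self.monitor(action)
        if score >= self.suspicion_threshold:
            self.audit_log.append(f"halted: suspicion {score:.2f}")
            raise Halted(f"suspicion {score:.2f} reached the threshold")

        # Output filtering: screen the result before it is delivered.
        output = tools[action.tool](action.argument)
        if not self.output_filter(output):
            self.audit_log.append("filtered output")
            return "[withheld: output failed filter]"
        return output


def toy_monitor(action: Action) -> float:
    """Stand-in for a trusted monitor: flags one obviously destructive pattern."""
    return 1.0 if "rm -rf" in action.argument else 0.0


def toy_filter(output: str) -> bool:
    """Stand-in for an output filter: withholds text containing a marker string."""
    return "SECRET" not in output


executor = ControlledExecutor(
    allowed_tools={"echo"},
    monitor=toy_monitor,
    output_filter=toy_filter,
)
tools = {"echo": lambda s: s, "shell": lambda s: "ran " + s}
print(executor.run(Action("echo", "hello"), tools))
print(executor.run(Action("shell", "ls"), tools))
```

The layers are ordered so that the cheapest check (the allowlist) runs first and the halt happens before any tool executes. A real deployment would replace the keyword-based stand-ins with a separate trusted model or classifier.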