Longterm Wiki

Circuit Breakers

AI Control (active)

Inference-time interventions that halt model execution when unsafe behavior is detected.

Organizations: 2
Key Papers: 1
First Proposed: 2024 (Zou et al.)
Cluster: AI Control
Parent Area: AI Control

Tags

function:robustness, stage:inference, scope:technique