Improving Alignment and Robustness with Circuit Breakers

publication

Child of Center for AI Safety (CAIS)

Metadata

Source Table	`publications`
Source ID	`K87cheyygx`
Description	Andy Zou, Long Phan, Justin Wang et al., 2024
Source URL	arxiv.org/abs/2406.04313
Parent	Center for AI Safety (CAIS)
Children	—
Created	Mar 23, 2026, 2:46 PM
Updated	Mar 23, 2026, 2:46 PM
Synced	Mar 23, 2026, 2:46 PM

Record Data

`id`	K87cheyygx
`entityId`	Center for AI Safety (CAIS) (organization)
`entityDisplayName`	—
`resourceId`	—
`title`	Improving Alignment and Robustness with Circuit Breakers
`authors`	Andy Zou, Long Phan, Justin Wang et al.
`url`	arxiv.org/abs/2406.04313
`venue`	—
`publishedDate`	2024
`publicationType`	paper
`citationCount`	—
`isFlagship`	No
`abstract`	—
`source`	arxiv.org/abs/2406.04313
`notes`	NeurIPS 2024

Source Check Verdicts

confirmed, 99% confidence

Last checked: 4/29/2026

1 → confirmed

Debug info

Thing ID: K87cheyygx

Parent Thing ID: sid_y4bieqSeag