Longterm Wiki
publication

Improving Alignment and Robustness with Circuit Breakers

Metadata

Source Table: publications
Source ID: K87cheyygx
Description: Andy Zou, Long Phan, Justin Wang et al., 2024
Source URL: arxiv.org/abs/2406.04313
Parent: Center for AI Safety (CAIS)
Children:
Created: Mar 23, 2026, 2:46 PM
Updated: Mar 23, 2026, 2:46 PM
Synced: Mar 23, 2026, 2:46 PM

Record Data

id: K87cheyygx
entityId: Center for AI Safety (CAIS) (organization)
entityDisplayName:
resourceId:
title: Improving Alignment and Robustness with Circuit Breakers
authors: Andy Zou, Long Phan, Justin Wang et al.
url: arxiv.org/abs/2406.04313
venue:
publishedDate: 2024
publicationType: paper
citationCount:
isFlagship: No
abstract:
source: arxiv.org/abs/2406.04313
notes: ICML 2024

Source Check Verdicts

confirmed (99% confidence)

Last checked: 3/26/2026

All key fields in the record are confirmed by the source text: (1) Title matches exactly; (2) Authors Andy Zou, Long Phan, and Justin Wang are confirmed as the first three authors (et al. appropriately represents the remaining 7 authors); (3) Published date 2024 is confirmed (submitted June 6, 2024); (4) URL https://arxiv.org/abs/2406.04313 matches the arXiv identifier 2406.04313 provided in the source; (5) Publication type is confirmed as an arXiv paper in Machine Learning (cs.LG). No contradictions detected.
