publication
Improving Alignment and Robustness with Circuit Breakers
Metadata
| Source Table | publications |
| Source ID | K87cheyygx |
| Description | Andy Zou, Long Phan, Justin Wang et al., 2024 |
| Source URL | arxiv.org/abs/2406.04313 |
| Parent | Center for AI Safety (CAIS) |
| Children | — |
| Created | Mar 23, 2026, 2:46 PM |
| Updated | Mar 23, 2026, 2:46 PM |
| Synced | Mar 23, 2026, 2:46 PM |
Record Data
id | K87cheyygx |
entityId | Center for AI Safety (CAIS)(organization) |
entityDisplayName | — |
resourceId | — |
title | Improving Alignment and Robustness with Circuit Breakers |
authors | Andy Zou, Long Phan, Justin Wang et al. |
url | arxiv.org/abs/2406.04313 |
venue | — |
publishedDate | 2024 |
publicationType | paper |
citationCount | — |
isFlagship | No |
abstract | — |
source | arxiv.org/abs/2406.04313 |
notes | NeurIPS 2024 |
Source Check Verdicts
confirmed (99% confidence)
Last checked: 3/26/2026
All key fields in the record are confirmed by the source text: (1) Title matches exactly; (2) Authors Andy Zou, Long Phan, and Justin Wang are confirmed as the first three authors (et al. appropriately represents the remaining 7 authors); (3) Published date 2024 is confirmed (submitted June 6, 2024); (4) URL https://arxiv.org/abs/2406.04313 matches the arXiv identifier 2406.04313 provided in the source; (5) Publication type is confirmed as an arXiv paper in Machine Learning (cs.LG). No contradictions detected.