Fact · f_4j56kTwGW5

Center for AI Safety (CAIS) — publication: Representation Engineering: A Top-Down Approach to AI Transparency — proposes methods to read and control LLM internal representations for safety

Verdict: confirmed (98%)

1 check · 5/18/2026

1 → confirmed

Our claim

entire record

Subject: Center for AI Safety (CAIS)
Value: Representation Engineering: A Top-Down Approach to AI Transparency — proposes methods to read and control LLM internal representations for safety
As Of: October 2023
Source: https://arxiv.org/abs/2310.01405
Notes: By Zou, Phan, Chen, Campbell, Guo, Ren, Pan, Yin, Mazeika, Dombrowski, Goel, Li, Byun, Wang, Mallen, Basart, Koyejo, Song, Li, Hendrycks

Source evidence

1 src · 1 check

arxiv.org/abs/2310.01405 resource

confirmed · 98% · primary · Haiku 4.5 · 5/18/2026

Note: The source directly confirms all key elements of the claim: (1) CAIS is listed as an affiliation for multiple authors (Andy Zou, Long Phan, Sarah Chen, Xuwang Yin, Mantas Mazeika, Ann-Kathrin Dombrowski, Shashwat Goel, Nathaniel Li, Zifan Wang, Steven Basart, Dan Hendrycks); (2) the publication title matches exactly; (3) the abstract explicitly describes methods to 'read and control' LLM internal representations; (4) safety applications are mentioned, including honesty, harmlessness, and power-seeking; (5) the arXiv ID 2310.01405 corresponds to October 2023 ('2310' = 2023-10). All author names in the claim match those listed in the source.

Case № f_4j56kTwGW5 · Filed 5/18/2026 · Confidence 98%