Citation

Redwood Research - Footnote 26

confirmed, 100% confidence

1 evidence check

Last checked: 4/3/2026

Migrated from citation_quotes. Original verdict: accurate

Evidence — 1 source, 1 check

www.anthropic.com/research/alignment-faking (1 check)

confirmed, 100% · Haiku 4.5 · 4/3/2026

Found: In collaboration with <EntityLink id="anthropic">Anthropic</EntityLink>, Redwood demonstrated that large language models can strategically hide misaligned intentions during training. Claude 3 Opus was…

Note: Migrated from citation_quotes accuracy check. Original verdict: accurate

Debug info

Record type: citation

Record ID: page:redwood-research:fn26