Citation

Redwood Research - Footnote 24

partial · 85% confidence

1 evidence check

Last checked: 4/3/2026

The claim names Claude 3 Opus as the primary test model, but the source mentions only 'Claude' and does not specify a version. The claim also says that large language models can strategically hide misaligned intentions during training, whereas the source says only that Claude 'sometimes hides misaligned intentions'.

Evidence — 1 source, 1 check

www.redwoodresearch.org/ (1 check)

partial · 85% · Haiku 4.5 · 4/3/2026

Found: In collaboration with Anthropic, Redwood demonstrated that large language models can strategically hide misaligned intentions during training. Claude 3 Opus was…

Debug info

Record type: citation

Record ID: page:redwood-research:fn24