Longterm Wiki

Redwood Research

Safety Organization
Founded Jun 2021 (4 years old) | HQ: San Francisco, CA | redwoodresearch.org

Also known as: Redwood

Research & Technical Papers (2)

Redwood Research's AI Control research program focuses on developing techniques to ensure AI systems behave safely even if they are misaligned or adversarially inclined, by building robust oversight and control mechanisms rather than relying solely on alignment. The approach emphasizes empirically evaluating whether safety measures hold up against a red-teamed 'untrusted' AI attempting to subvert them. This represents a complementary strategy to alignment research, treating safety as an engineering and evaluation problem.
paper | redwoodresearch.org | 2
Causal Scrubbing is a methodology developed by Redwood Research for evaluating mechanistic interpretability hypotheses about neural networks. It provides a principled, algorithmic approach to test whether a proposed explanation of a model's computation is correct by systematically replacing activations and measuring the impact on model behavior. This framework helps researchers rigorously validate or falsify circuit-level interpretability claims.
paper | redwoodresearch.org | 3
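
The core move in the Causal Scrubbing entry above, replacing activations and measuring how much behavior changes, can be illustrated with a toy sketch. This is not Redwood Research's algorithm or code: the two-unit "model", the names `hidden`, `output`, and `scrubbed_gap`, and the weight `w2` are all hypothetical, and the sketch covers only the simplest case, where a hypothesis claims an activation is irrelevant and that activation is swapped for the one computed on a different input.

```python
from itertools import permutations


def hidden(x):
    """Toy 'activations': h1 sums the first two inputs, h2 copies the third."""
    return {"h1": x[0] + x[1], "h2": x[2]}


def output(h, w2):
    """Toy readout; w2 controls how much the model actually relies on h2."""
    return 3.0 * h["h1"] + w2 * h["h2"]


def scrubbed_gap(dataset, w2, irrelevant=("h2",)):
    """Mean absolute output change when the activations that the hypothesis
    calls irrelevant are replaced by those computed on another input.

    Every ordered pair of distinct inputs is used instead of random
    resampling, so the result is deterministic (an assumption made for
    illustration only).
    """
    total, count = 0.0, 0
    for i, j in permutations(range(len(dataset)), 2):
        h = hidden(dataset[i])
        donor = hidden(dataset[j])
        patched = dict(h)
        for name in irrelevant:
            patched[name] = donor[name]  # swap in the other input's activation
        total += abs(output(h, w2) - output(patched, w2))
        count += 1
    return total / count


dataset = [(0, 1, 2), (1, 1, 0), (2, 0, 5), (3, 2, 1)]

# Hypothesis under test: "the output depends only on h1".
gap_when_true = scrubbed_gap(dataset, w2=0.0)   # model really ignores h2
gap_when_false = scrubbed_gap(dataset, w2=1.0)  # model secretly uses h2
print(gap_when_true, gap_when_false)
```

A gap of zero means the behavior survives the replacement, which is consistent with the hypothesis; a clearly positive gap counts as evidence against it. Real interpretability hypotheses are richer than "this unit is irrelevant", so this sketch should be read only as an intuition for the replace-and-measure step.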