Also known as: Redwood
Policy & Governance
Policy Positions (1)
News & Announcements (5)
Buck Shlegeris and Ryan Greenblatt argue that AI labs should implement 'AI control' measures ensuring powerful models cannot cause unacceptably bad outcomes even if they are misaligned and actively trying to subvert safety measures. They contend no fundamental research breakthroughs are needed to achieve control for early transformatively useful AIs, and that this approach substantially reduces risks from scheming/deceptive alignment. The post accompanies a technical paper demonstrating control evaluation methodology in a programming setting. | web | blog.redwoodresearch.org | - | 3 | ||
This URL returns a 404 error, indicating the Redwood Research Fellowship page no longer exists or has been moved. No content is available to summarize. | web | redwoodresearch.org | - | - | ||
Redwood Research is a nonprofit AI safety organization that pioneered the 'AI control' research agenda, focusing on preventing intentional subversion by misaligned AI systems. Their key contributions include the ICML paper on AI Control protocols, the Alignment Faking demonstration (with Anthropic), and consulting work with governments and AI labs on misalignment risk mitigation. | web | redwoodresearch.org | - | 16 | ||
This URL returns a 404 error, indicating the page no longer exists or has been moved. No substantive content is available from this resource. The original page was likely intended to aggregate or link Redwood Research's 2024 alignment studies. | web | redwoodresearch.org | - | 1 | ||
This Redwood Research blog post analyzes and contextualizes Anthropic's prediction that AGI may arrive by early 2027, examining the reasoning behind such a timeline and its implications for AI safety work. It explores what 'AGI' means in this context and how this prediction should affect research prioritization and urgency. | web | blog.redwoodresearch.org | - | 2 |