Redwood Research

Safety Organization

Founded Jun 2021 (4 years old)
HQ: San Francisco, CA
redwoodresearch.org

Also known as: Redwood

News & Announcements (5)


"The case for ensuring that powerful AIs are controlled" (May 2024). Buck Shlegeris and Ryan Greenblatt argue that AI labs should implement 'AI control' measures ensuring powerful models cannot cause unacceptably bad outcomes even if they are misaligned and actively trying to subvert safety measures. They contend that no fundamental research breakthroughs are needed to achieve control for early transformatively useful AIs, and that this approach substantially reduces risks from scheming (deceptive alignment). The post accompanies a technical paper demonstrating control evaluation methodology in a programming setting.	web	blog.redwoodresearch.org	-	3
Redwood Research Fellowship (Page Not Found). This URL returns a 404 error, indicating the Redwood Research Fellowship page no longer exists or has been moved. No content is available to summarize.	web	redwoodresearch.org	-	-
Redwood Research: AI Control. Redwood Research is a nonprofit AI safety organization that pioneered the 'AI control' research agenda, which focuses on preventing intentional subversion by misaligned AI systems. Its key contributions include the ICML paper on AI Control protocols, the Alignment Faking demonstration (with Anthropic), and consulting work with governments and AI labs on misalignment risk mitigation.	web	redwoodresearch.org	-	16
Redwood Research's 2024 studies. This URL returns a 404 error, indicating the page no longer exists or has been moved, so no substantive content is available. The original page was likely intended to aggregate or link to Redwood Research's 2024 alignment studies.	web	redwoodresearch.org	-	1
What's up with Anthropic predicting AGI by early 2027? This Redwood Research blog post analyzes Anthropic's internal prediction that AGI may arrive by early 2027, examining the reasoning behind the timeline and its implications for AI safety work. It explores what 'AGI' means in this context and how the prediction should affect research prioritization and urgency.	web	blog.redwoodresearch.org	-	2