Rohin Shah - Berkeley

web

Personal homepage of Rohin Shah, head of AGI Safety & Alignment at Google DeepMind and former CHAI PhD researcher, providing an overview of his research focus areas including debate, interpretability, monitoring, and capability evaluations.

Metadata

Importance: 55/100 · homepage

Summary

Rohin Shah's personal homepage outlines his work leading the AGI Safety & Alignment team at Google DeepMind and his prior PhD research at UC Berkeley's CHAI on value learning. It highlights his research interests including amplified oversight, debate, interpretability via sparse autoencoders, deployment monitoring, and dangerous capability evaluations. He also previously authored the Alignment Newsletter.

Key Points

• Leads AGI Safety & Alignment team at Google DeepMind, covering both research and policy implementation.
• PhD from CHAI at UC Berkeley focused on AI systems learning human preferences without prior knowledge of user goals.
• Research interests include empirical debate, sparse autoencoders for interpretability, deployment monitoring, and capability evaluations.
• Previously authored the Alignment Newsletter, a widely-read digest of AI safety research (now on hiatus).
• Provides a public FAQ for those seeking career advice in AI alignment.

Cached Content Preview

HTTP 200 · Fetched Jun 22, 2026 · 2 KB

Hi, I’m Rohin! I lead the AGI Safety & Alignment team at Google DeepMind, where we prepare for the development of powerful AI systems through both research and policy implementation.

I completed my PhD at the Center for Human-Compatible AI at UC Berkeley, where I worked on building AI systems that can learn to assist a human user, even if they don’t initially know what the user wants. I used to write up paper summaries in the Alignment Newsletter, though the newsletter is unfortunately on indefinite hiatus now.

In my free time, I enjoy puzzles, board games, and karaoke. You can email me at rohinmshah@gmail.com, though if you want to ask me about careers in AI alignment, you should read my FAQ first.

Research

My research focuses on AI safety: techniques that ensure that AI systems do what their developers intend.

Amplified oversight leverages AI capabilities to evaluate AI outputs. I’m particularly excited about empirical work on debate.

Since we build AI systems through machine learning, we don’t understand how they work internally. Interpretability research such as sparse autoencoders aims to bridge this gap.

Monitoring AI systems after they are deployed broadly can defend against cases where AI systems appear safe during testing but cause problems “in the wild”.

Dangerous capability evaluations like these can provide an early warning for risks, allowing us to put appropriate mitigations in place.
