Longterm Wiki

Rohin Shah - Berkeley

web
rohinshah.com

Cached Content Preview

HTTP 200 · Fetched Apr 30, 2026 · 3 KB

Hi, I’m Rohin! I lead the AGI Safety & Alignment team at [Google DeepMind](https://deepmind.google/), where we prepare for the development of powerful AI systems through both [research](https://rohinshah.com/research/) and [policy implementation](https://deepmind.google/discover/blog/introducing-the-frontier-safety-framework/).

I completed my PhD at the [Center for Human-Compatible AI](https://humancompatible.ai/) at UC Berkeley, where I worked on building AI systems that can learn to assist a human user, even if they don’t initially know what the user wants. I used to write up paper summaries in the [Alignment Newsletter](https://rohinshah.com/alignment-newsletter/), though the newsletter is unfortunately on indefinite hiatus now.

In my free time, I enjoy puzzles, board games, and karaoke. You can email me at rohinmshah@gmail.com, though if you want to ask me about careers in AI alignment, you should read [my FAQ](https://rohinshah.com/faq-career-advice-for-ai-alignment-researchers/) first.

[Research →](https://rohinshah.com/research/)

[Alignment Newsletter →](https://rohinshah.com/alignment-newsletter/)

![](https://rohinshah.com/wp-content/uploads/2024/12/Rohin.jpg)

## Research

My research focuses on AI safety: techniques that ensure that AI systems do what their developers intend.

**Amplified oversight** leverages AI capabilities to evaluate AI outputs. I’m particularly excited about [empirical work](https://arxiv.org/abs/2407.04622) on [debate](https://arxiv.org/abs/1805.00899).

Since we build AI systems through machine learning, we don’t understand how they work internally. **Interpretability** research such as [sparse](https://arxiv.org/abs/2404.16014) [autoencoders](https://arxiv.org/abs/2408.05147) aims to bridge this gap.

**Monitoring** AI systems after they are deployed broadly can defend against cases where AI systems appear safe during testing but cause problems “in the wild”.

**Dangerous capability evaluations** like [these](https://arxiv.org/abs/2403.13793) can provide an [early warning](https://deepmind.google/discover/blog/an-early-warning-system-for-novel-ai-risks/) for risks, allowing us to put appropriate mitigations in place.

[Papers →](https://rohinshah.com/research/)

## Latest Articles

- [![FAQ: Advice for AI alignment researchers](https://rohinshah.com/wp-content/uploads/2021/01/faq.jpeg)](https://rohinshah.com/faq-career-advice-for-ai-alignment-researchers/)

  January 3, 2021

  [FAQ: Advice for AI alignment researchers](https://rohinshah.com/faq-career-advice-for-ai-alignment-researchers/)

- [![Teaching from Simple Abstractions](https://rohinshah.com/wp-content/uploads/2024/12/sicp.jpg)](https://rohinshah.com/teaching-from-simple-abstractions/)

  January 8, 2017

  [Teaching from Simple Abstractions](https://rohinshah.com/teaching-from-simple-abstractions/)

- [![Thoughts on the “Meta Trap”](https://rohinshah.com/wp-content/uploads/2024/1

... (truncated, 3 KB total)
Resource ID: 3fe7bb7b357ec56d | Stable ID: sid_EqHoxVsuUQ