Rohin Shah - Berkeley
Hi, I’m Rohin! I lead the AGI Safety & Alignment team at [Google DeepMind](https://deepmind.google/), where we prepare for the development of powerful AI systems, through both [research](https://rohinshah.com/research/) and [policy implementation](https://deepmind.google/discover/blog/introducing-the-frontier-safety-framework/).
I completed my PhD at the [Center for Human-Compatible AI](https://humancompatible.ai/) at UC Berkeley, where I worked on building AI systems that can learn to assist a human user, even if they don’t initially know what the user wants. I used to write up paper summaries in the [Alignment Newsletter](https://rohinshah.com/alignment-newsletter/), though the newsletter is unfortunately on indefinite hiatus now.
In my free time, I enjoy puzzles, board games, and karaoke. You can email me at rohinmshah@gmail.com, though if you want to ask me about careers in AI alignment, you should read [my FAQ](https://rohinshah.com/faq-career-advice-for-ai-alignment-researchers/) first.
[Research →](https://rohinshah.com/research/)
[Alignment Newsletter →](https://rohinshah.com/alignment-newsletter/)

## Research
My research focuses on AI safety: techniques that ensure that AI systems do what their developers intend.
**Amplified oversight** leverages AI capabilities to evaluate AI outputs. I’m particularly excited about [empirical work](https://arxiv.org/abs/2407.04622) on [debate](https://arxiv.org/abs/1805.00899).
Since we build AI systems through machine learning, we don’t understand how they work internally. **Interpretability** research such as [sparse](https://arxiv.org/abs/2404.16014) [autoencoders](https://arxiv.org/abs/2408.05147) aims to bridge this gap.
**Monitoring** AI systems after they are deployed broadly can defend against cases where AI systems appear safe during testing but cause problems “in the wild”.
**Dangerous capability evaluations** like [these](https://arxiv.org/abs/2403.13793) can provide an [early warning](https://deepmind.google/discover/blog/an-early-warning-system-for-novel-ai-risks/) for risks, allowing us to put appropriate mitigations in place.
[Papers →](https://rohinshah.com/research/)
## Latest Articles
- [FAQ: Advice for AI alignment researchers](https://rohinshah.com/faq-career-advice-for-ai-alignment-researchers/) (January 3, 2021)
- [Teaching from Simple Abstractions](https://rohinshah.com/teaching-from-simple-abstractions/) (January 8, 2017)