## Quick Assessment
| Aspect | Assessment |
|---|---|
| Primary Role | Senior Research Scientist and Mechanistic Interpretability Team Lead, Google DeepMind (2023–present) |
| Key Contributions | Creator of TransformerLens (∼3,100 GitHub stars, 112+ contributors); co-author of "A Mathematical Framework for Transformer Circuits" (2021); lead on Gemma Scope (400+ open sparse autoencoders); ICLR 2023 Spotlight for grokking paper |
| Key Publications | "A Mathematical Framework for Transformer Circuits" (2021); "Progress Measures for Grokking via Mechanistic Interpretability" (ICLR 2023 Spotlight); "Towards Principled Evaluations of Sparse Autoencoders (SAEs)" (ICLR 2025); "SAEBench" (ICML 2025) |
| Institutional Affiliation | Google DeepMind; previously Anthropic (2021–2022) |
| Influence on AI Safety | Builds open-source tooling and training curricula for mechanistic interpretability; has mentored approximately 50 junior researchers, with 7 subsequently placed at major AI companies; named to MIT Technology Review Innovators Under 35 (2025) |
## Overview
Neel Nanda is a Senior Research Scientist and Mechanistic Interpretability Team Lead at Google DeepMind, where he leads research into reverse-engineering neural networks to understand how they implement algorithms. He studied Mathematics at Trinity College, Cambridge (2017–2020), then worked at Anthropic as a language model interpretability researcher under Chris Olah (2021–2022), before joining DeepMind's mechanistic interpretability team in 2023.
He is the creator of TransformerLens, an open-source library that has become a widely used tool in the interpretability research community, and a co-author of "A Mathematical Framework for Transformer Circuits," an influential Anthropic paper that introduced a mathematical vocabulary for analyzing transformer behavior. His team at DeepMind produced Gemma Scope, a collection of over 400 openly released sparse autoencoders for the Gemma 2 models. As of 2025, Nanda had over 13,000 citations on Google Scholar and an h-index of 31, with co-authored papers appearing at NeurIPS, ICLR, and ICML.
Nanda has also been active in training the next generation of interpretability researchers. He has mentored approximately 50 junior researchers through programs including SERI MATS; these collaborations produced 15 co-authored papers at top ML venues, and 7 mentees were subsequently placed at major AI companies.