## Quick Assessment
| Aspect | Assessment |
|---|---|
| Primary Role | Senior Research Scientist and Mechanistic Interpretability Team Lead, Google DeepMind (2023–present) |
| Key Contributions | Creator of TransformerLens (∼3,100 GitHub stars, 112+ contributors); co-author of "A Mathematical Framework for Transformer Circuits" (2021); lead on Gemma Scope (400+ open sparse autoencoders); ICLR 2023 Spotlight for grokking paper |
| Key Publications | "A Mathematical Framework for Transformer Circuits" (2021); "Progress Measures for Grokking via Mechanistic Interpretability" (ICLR 2023 Spotlight); "Towards Principled Evaluations of Sparse Autoencoders (SAEs)" (ICLR 2025); "SAEBench" (ICML 2025) |
| Institutional Affiliation | Google DeepMind; previously Anthropic (2021–2022) |
| Influence on AI Safety | Builds open-source tooling and training curricula for mechanistic interpretability; has mentored approximately 50 junior researchers, with 7 subsequently placed at major AI companies; named to MIT Technology Review Innovators Under 35 (2025) |
## Overview
Neel Nanda is a Senior Research Scientist and Mechanistic Interpretability Team Lead at Google DeepMind, where he leads research into reverse-engineering neural networks to understand how they implement algorithms. He studied Mathematics at Trinity College, Cambridge (2017–2020), then worked at Anthropic as a language model interpretability researcher under Chris Olah (2021–2022), before joining DeepMind's mechanistic interpretability team in 2023.
He is the creator of TransformerLens, an open-source library that has become a widely used tool in the interpretability research community, and a co-author of "A Mathematical Framework for Transformer Circuits," an influential Anthropic paper that introduced a mathematical vocabulary for analyzing transformer behavior. His team at DeepMind produced Gemma Scope, a collection of over 400 openly released sparse autoencoders for the Gemma 2 models. As of 2025, Nanda had over 13,000 citations on Google Scholar and an h-index of 31, with co-authored papers appearing at NeurIPS, ICLR, and ICML.
Nanda has also been active in training the next generation of interpretability researchers. He has mentored approximately 50 junior researchers through programs including SERI MATS; these collaborations produced 15 co-authored papers at top ML venues, and 7 mentees were subsequently placed at major AI companies.