Longterm Wiki

About Neel Nanda – Mechanistic Interpretability Researcher

web
neelnanda.io·neelnanda.io/about

Personal homepage of Neel Nanda, head of Google DeepMind's mechanistic interpretability team and former Anthropic researcher, providing an overview of his background, research focus, and resources in mechanistic interpretability for AI safety.

Metadata

Importance: 42/100 · homepage

Summary

This is the personal about page of Neel Nanda, who leads the Google DeepMind mechanistic interpretability team and previously worked at Anthropic under Chris Olah. It outlines his background in AI safety research, his motivation to reduce existential risk from AI, and links to his key contributions including the TransformerLens library, educational resources, and mentorship programs.

Key Points

  • Neel Nanda leads the Google DeepMind mechanistic interpretability team, focused on reverse-engineering algorithms learned by neural networks.
  • Previously worked at Anthropic as a language model interpretability researcher under Chris Olah.
  • Created TransformerLens, a widely used library for mechanistic interpretability of language models.
  • Runs the MATS mentorship stream, a full-time research program for aspiring AI safety researchers.
  • Motivated by reducing existential risk from AI; affiliated with Effective Altruism and rationality communities.

1 FactBase fact citing this source

Cached Content Preview

HTTP 200 · Fetched May 11, 2026 · 6 KB
About Me 

Hi, I’m Neel! I run the Google DeepMind mechanistic interpretability team. Our job is to take a trained neural network and try to reverse engineer the algorithms and structures it has learned. If you want to learn more about the field, see my appearance on the Machine Learning Street Talk podcast.

I see the main goal of my work as reducing existential risk from AI, and I consider myself part of the Effective Altruism and rationality communities. Prior to this, I did independent mechanistic interpretability research, and I worked at Anthropic as a language model interpretability researcher under Chris Olah. You can see my papers here.

The main way I currently mentor people is via my MATS stream, a full-time research program that happens twice a year (over the summer and over the winter). You can read more about the process and how to apply here.

 Before all that, I did a pure maths undergrad at Cambridge (graduated in 2020), interned in quant finance roles (Jane Street and Jump Trading), before deciding that it wasn’t for me and taking the year after graduating to explore AI Safety and figure out what was going on in that space (interning at the Future of Humanity Institute, DeepMind and the Centre for Human-Compatible AI). After that year, I decided that existential risk from powerful AI is one of the most important problems of this century, and one worth spending my career trying to help with.

If you have thoughts on anything I’ve written, or otherwise want to contact me, you can email me at neelnanda27@gmail.com. (Though I don’t have capacity to respond to everyone who reaches out, so apologies in advance!)

 About my blog 

This blog is a collection of my thoughts on various ideas I’ve found valuable for being happy, improving my life, or understanding the world. (You can see my more technical blog posts in the mechanistic interpretability section, which is what I mostly write about nowadays.) A lot of them focus on self-improvement and rationality, but also cover topics such as emotions, friendships, social skills, teaching, agency, motivation, achieving goals and altruism. See the Top Posts page to get an idea of where to start. I blog according to my internal sense of fun and whimsy, and accordingly don’t blog on any fixed schedule. Depending on how busy I am, I will sometimes have bursts of many posts and long breaks in between. You can subscribe to hear about new posts here.

I started this blog as an exercise in being less of a bloody perfectionist, so each post is deliberately written as a rough first draft, with minimal editing. Accordingly, these are highly, and deliberately, not quality controlled! Please don’t take these posts as a perfect representation of what I believe, or the best representations of these ideas - I’m cutting a lot of the nuance and caveats I’d give in a proper treatment! But I feel very happy with how some of them have come out, and I hope my unfiltered ramblings are interesting and u

... (truncated, 6 KB total)
Resource ID: d0386b44406e85cd | Stable ID: sid_bnoZRPYYcg