Mechanistic interpretability work
web · neelnanda.io/
Neel Nanda is one of the most prolific researchers in mechanistic interpretability; his homepage aggregates papers, blog posts, and tools that are frequently cited as entry points into the field.
Metadata
Importance: 72/100 · homepage
Summary
Neel Nanda's personal research homepage, focused on mechanistic interpretability: reverse-engineering how transformers and other neural networks implement algorithms internally. His work includes a mechanistic analysis of grokking, research on superposition in neural networks, and TransformerLens, a key tool for interpretability research.
Key Points
- Leads mechanistic interpretability research at Google DeepMind (previously at Anthropic), focusing on understanding transformer internals
- Developed TransformerLens, a widely used open-source library for interpretability research on GPT-style language models
- Contributed research on superposition, the hypothesis that neural networks represent more features than they have dimensions
- Researched grokking (delayed generalization, where a network starts to generalize long after fitting its training data), providing a mechanistic explanation of the phenomenon
- Produces educational content, tutorials, and research agendas to grow the mechanistic interpretability research community
Cited by 2 pages
| Page | Type | Quality |
|---|---|---|
| Neel Nanda | Person | 26.0 |
| Deceptive Alignment | Risk | 75.0 |
2 FactBase facts citing this source
| Entity | Property | Value | As Of |
|---|---|---|---|
| Neel Nanda | Employed By | A4XoubikkQ | Jan 2023 |
| Neel Nanda | Role / Title | Research Scientist, Google DeepMind | Jan 2023 |
Cached Content Preview
HTTP 200 · Fetched Mar 15, 2026 · 8 KB
Neel Nanda
Neel Nanda
8/19/25
MATS Applications Open (Due Aug 29)
I am looking for people who want to be supervised by me to write a mech interp paper. Apply here now! Due Aug 29.
Neel Nanda
5/26/25
Post 51: Socratic Persuasion: Giving Opinionated Yet Truth-Seeking Advice
I recommend giving advice by asking questions to walk someone through key steps in my argument — often I’m missing key info, which comes up quickly as an unexpected answer, while if I’m right I’m more persuasive
Neel Nanda
3/22/25
Post 50: Good Research Takes are Not Sufficient for Good Strategic Takes
Having a good research track record is some evidence of good big-picture takes about AGI, but it's weak evidence. Strategic thinking is hard, and requires different skills. But people often conflate these skills, leading to excessive deference to researchers in the field, without evidence that that person is good at strategic thinking specifically.
Neel Nanda
8/18/22
Interlude: A Mechanistic Interpretability Analysis of Grokking
Link post for some independent interpretability research I did
Neel Nanda
6/17/22
Post 49: Things That Make Me Enjoy Giving Career Advice
Thoughts on what makes giving career advice more fun for me
Neel Nanda
6/15/22
Post 48: Prioritise Tasks by Rating not Sorting
A short note on priority ordering tasks by rating them out of 10, rather than explicitly sorting into a priority order
Neel Nanda
2/27/22
Post 47: How I Formed My Own Views About AI Safety
How I formed my own views on the complex topic of ‘will AI kill us all, and should I work on stopping this’, and traps I fell into
Neel Nanda
2/22/22
Post 46: Reward Good Bets That Had Bad Outcomes
In many important areas of life, I want to persevere through many failures for a few big successes. As a highly anxious person, this is hard! I instead focus on whether I made a good bet, not whether it failed.
Neel Nanda
2/11/22
... (truncated, 8 KB total)
Resource ID: 028435b427f72e06 | Stable ID: Yzc5ODkzZm