200 Concrete Open Problems in Mechanistic Interpretability — AI Alignment Forum
blog
Credibility Rating
3/5
Good (3): Good quality. Reputable source with community review or editorial standards, but less rigorous than peer-reviewed venues.
Rating inherited from publication venue: Alignment Forum
Metadata
Cited by 1 page
| Page | Type | Quality |
|---|---|---|
| Neel Nanda | Person | 26.0 |
Cached Content Preview
HTTP 200 · Fetched Apr 30, 2026 · 3 KB
# 200 Concrete Open Problems in Mechanistic Interpretability

| Karma | Post | Author | Age | Comments |
|---|---|---|---|---|
| 17 | [Concrete Steps to Get Started in Transformer Mechanistic Interpretability](https://www.alignmentforum.org/s/yivyHaCAmMJ3CqSyj/p/9ezkEb9oGvEi6WoB3) | [Neel Nanda](https://www.alignmentforum.org/users/neel-nanda-1) | 3y | 5 |
| 41 | [200 Concrete Open Problems in Mechanistic Interpretability: Introduction](https://www.alignmentforum.org/s/yivyHaCAmMJ3CqSyj/p/LbrPTJ4fmABEdEnLf) | [Neel Nanda](https://www.alignmentforum.org/users/neel-nanda-1) | 3y | 0 |
| 18 | [200 COP in MI: The Case for Analysing Toy Language Models](https://www.alignmentforum.org/s/yivyHaCAmMJ3CqSyj/p/GWCgZrzWCZCuzGktv) | [Neel Nanda](https://www.alignmentforum.org/users/neel-nanda-1) | 3y | 2 |
| 8 | [200 COP in MI: Looking for Circuits in the Wild](https://www.alignmentforum.org/s/yivyHaCAmMJ3CqSyj/p/XNjRwEX9kxbpzWFWd) | [Neel Nanda](https://www.alignmentforum.org/users/neel-nanda-1) | 3y | 3 |
| 17 | [200 COP in MI: Interpreting Algorithmic Problems](https://www.alignmentforum.org/s/yivyHaCAmMJ3CqSyj/p/ejtFsvyhRkMofKAFy) | [Neel Nanda](https://www.alignmentforum.org/users/neel-nanda-1) | 3y | 0 |
| 18 | [200 COP in MI: Exploring Polysemanticity and Superposition](https://www.alignmentforum.org/s/yivyHaCAmMJ3CqSyj/p/o6ptPu7arZrqRCxyz) | [Neel Nanda](https://www.alignmentforum.org/users/neel-nanda-1) | 3y | 1 |
| 11 | [200 COP in MI: Analysing Training Dynamics](https://www.alignmentforum.org/s/yivyHaCAmMJ3CqSyj/p/hHaXzJQi6SKkeXzbg) | [Neel Nanda](https://www.alignmentforum.org/users/neel-nanda-1) | 3y | 0 |
| 7 | [200 COP in MI: Techniques, Tooling and Automation](https://www.alignmentforum.org/s/yivyHaCAmMJ3CqSyj/p/btasQF7wiCYPsr5qw) | [Neel Nanda](https://www.alignmentforum.org/users/neel-nanda-1) | 3y | 0 |
| 10 | [200 COP in MI: Image Model Interpretability](https://www.alignmentforum.org/s/yivyHaCAmMJ3CqSyj/p/caMoe6yNfXcaCG2u3) | [Neel Nanda](https://www.alignmentforum.org/users/neel-nanda-1) | 3y | 1 |
| 10 | [200 COP in MI: Interpreting Reinforcement Learning](https://www.alignmentforum.org/s/yivyHaCAmMJ3CqSyj/p/eqvvDM25MXLGqumnf) | [Neel Nanda](https://www.alignmentforum.org/users/neel-nanda-1) | 3y | 0 |
| 11 | [200 COP in MI: Studying Learned Features in Language Models](https://www.alignmentforum.org/s/yivyHaCAmMJ3CqSyj/p/Qup9gorqpd9qKAEav) | [Neel Nanda](https://www.alignmentforum.org/users/neel-nanda-1) | 3y | 2 |
Resource ID: 856cb0a13a71ff2c | Stable ID: sid_vwWkgXPGow