TransformerLens: A Library for Mechanistic Interpretability of Language Models
Credibility Rating
Good quality. Reputable source with community review or editorial standards, but less rigorous than peer-reviewed venues.
Rating inherited from publication venue: GitHub
TransformerLens is a widely-used open-source library for mechanistic interpretability research on GPT-style language models, enabling researchers to inspect, cache, and manipulate internal activations to reverse-engineer learned algorithms.
Metadata
Summary
TransformerLens is a Python library created by Neel Nanda and maintained by Bryce Meyer that enables mechanistic interpretability research on 50+ open-source language models. It exposes internal activations, supports caching of any activation, and allows editing or replacing activations during inference. It has been used in numerous influential mechanistic interpretability papers.
Key Points
- Supports loading and inspecting 50+ open-source GPT-style language models with full activation access
- Enables caching, editing, removing, or replacing internal activations during model forward passes
- Used in landmark MI research including grokking, circuit discovery, induction heads, and universality studies
- Installable via pip with a simple API; version 3.0 introduces TransformerBridge supporting broader architectures
- Central infrastructure tool for the mechanistic interpretability research community
Cited by 1 page
| Page | Type | Quality |
|---|---|---|
| Neel Nanda | Person | 26.0 |
Cached Content Preview
TransformerLensOrg/TransformerLens (public repository; 557 forks, 3.4k stars)
TransformerLens
A Library for Mechanistic Interpretability of Generative Language Models. Maintained by Bryce Meyer and created by Neel Nanda.
This is a library for doing mechanistic
interpretability of GPT-2 style language models. The
goal of mechanistic interpretability is to take a trained model and reverse engineer the algorithms
the model learned during training from its weights.
TransformerLens lets you load in 50+ different open source language models, and exposes the internal
activations of the model to you. You can cache any internal activation in the model, and add in
functions to edit, remove or replace these activations as the model runs.
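The caching and editing workflow described above can be illustrated with a toy sketch. This is not TransformerLens code: `ToyHookedModel`, its two layers, and `zero_ablate` are hypothetical names invented here, and the "activations" are plain lists of floats rather than tensors. The method names only echo the README's `run_with_cache`. The sketch shows the general idea of naming each intermediate result, saving it to a cache, and optionally swapping in an edited value before the next layer runs.

```python
from typing import Callable, Dict, List, Optional, Tuple

Activation = List[float]
# A hook receives the activation and its hook-point name. Returning a list
# replaces the activation; returning None leaves it unchanged.
Hook = Callable[[Activation, str], Optional[Activation]]


class ToyHookedModel:
    """Toy stand-in for a hooked model: a chain of named layers (hypothetical)."""

    def __init__(self) -> None:
        # (hook-point name, layer function); both names are made up for this sketch.
        self.layers = [
            ("double", lambda xs: [2.0 * x for x in xs]),
            ("add_one", lambda xs: [x + 1.0 for x in xs]),
        ]

    def run_with_hooks(
        self, xs: Activation, hooks: Optional[Dict[str, Hook]] = None
    ) -> Activation:
        hooks = hooks or {}
        for name, layer in self.layers:
            xs = layer(xs)
            hook = hooks.get(name)
            if hook is not None:
                edited = hook(xs, name)
                if edited is not None:
                    xs = edited  # later layers see the edited activation
        return xs

    def run_with_cache(self, xs: Activation) -> Tuple[Activation, Dict[str, Activation]]:
        cache: Dict[str, Activation] = {}

        def save(act: Activation, name: str) -> None:
            cache[name] = list(act)  # copy so later edits cannot alter the cache

        output = self.run_with_hooks(xs, {name: save for name, _ in self.layers})
        return output, cache


def zero_ablate(act: Activation, name: str) -> Activation:
    """Replace an activation with zeros of the same length."""
    return [0.0 for _ in act]


model = ToyHookedModel()
output, cache = model.run_with_cache([1.0, 2.0])
ablated = model.run_with_hooks([1.0, 2.0], {"double": zero_ablate})
```

Comparing `output` with `ablated` shows how removing one intermediate activation changes the final result, which is the basic move behind ablation-style experiments.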
Quick Start
Install
pip install transformer_lens
On Python 3.8 or 3.9, install the 2.x release:
pip install 'transformer_lens~=2.0'
Use
from transformer_lens.model_bridge import TransformerBridge
# Load a model (e.g. GPT-2 Small)
bridge = TransformerBridge.boot_transformers("gpt2", device="cpu")
# Run the model and get logits and activations
logits, activations = bridge.run_with_cache("Hello World")
TransformerBridge is the recommended 3.0 path and supports 50+ architectures. The legacy HookedTransformer.from_pretrained API is still available through a compatibility layer but is deprecated; see the Migrating to TransformerLens 3 guide for conversion recipes.
Key Tutorials
Introduction to the Library and Mech Interp
Demo of Main TransformerLens Features
Gallery
Research done involving TransformerLens:
Progress Measures for Grokking via Mechanistic Interpretability (ICLR Spotlight, 2023) by Neel Nanda, Lawrence Chan, Tom Lieberum, Jess Smith, Jacob Steinhardt
Finding Neurons in a Haystack: Case Studies with Sparse Probing by Wes Gurnee, Neel Nanda, Matthew Pauly, Katherine Harvey, Dmitrii Troitskii, Dimitris Bertsimas
Towards Automated Circuit Discovery for Mechanistic Interpretability by Arthur Conmy, Augustine N. Mavor-Parker, Aengus Lynch, Stefan Heimersheim, Adrià Garriga-Alonso
Actually, Othel
... (truncated, 10 KB total)
c5b41066b0ec2f58 | Stable ID: sid_LhwvMxuLFA