TransformerLens: A Library for Mechanistic Interpretability of Language Models

web

GitHub·github.com/TransformerLensOrg/TransformerLens

Credibility Rating

3/5

Good (3)

Good quality. Reputable source with community review or editorial standards, but less rigorous than peer-reviewed venues.

Rating inherited from publication venue: GitHub

TransformerLens is a widely-used open-source library for mechanistic interpretability research on GPT-style language models, enabling researchers to inspect, cache, and manipulate internal activations to reverse-engineer learned algorithms.

Metadata

Importance: 82/100 · tool page · tool

Summary

TransformerLens is a Python library created by Neel Nanda and maintained by Bryce Meyer that enables mechanistic interpretability research on 50+ open-source language models. It exposes internal activations, supports caching of any activation, and allows editing or replacing activations during inference. It has been used in numerous influential mechanistic interpretability papers.

Key Points

• Supports loading and inspecting 50+ open-source GPT-style language models with full activation access
• Enables caching, editing, removing, or replacing internal activations during model forward passes
• Used in landmark MI research including grokking, circuit discovery, induction heads, and universality studies
• Installable via pip with a simple API; version 3.0 introduces TransformerBridge supporting broader architectures
• Central infrastructure tool for the mechanistic interpretability research community

Cited by 1 page

Page	Type	Quality
Neel Nanda	Person	26.0

Cached Content Preview

HTTP 200 · Fetched Apr 28, 2026 · 10 KB

TransformerLensOrg / TransformerLens (Public) · 557 forks · 3.4k stars · 1,198 commits

TransformerLens

A Library for Mechanistic Interpretability of Generative Language Models. Maintained by Bryce Meyer and created by Neel Nanda.

 

This is a library for doing mechanistic interpretability of GPT-2 style language models. The goal of mechanistic interpretability is to take a trained model and reverse engineer, from its weights, the algorithms the model learned during training.

TransformerLens lets you load 50+ different open-source language models and exposes the internal activations of the model to you. You can cache any internal activation in the model, and add functions to edit, remove, or replace these activations as the model runs.
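The caching and editing idea described above can be illustrated with a toy sketch in plain Python. This is not TransformerLens code or its API: `ToyHookedModel`, the hook names `layer0.out` and `layer1.out`, and the helpers `run_with_cache` and `zero_ablate` are all hypothetical names invented for illustration. The sketch only shows the general pattern of routing each intermediate value through a named hook point, where a function can either record it or return a replacement.

```python
from typing import Callable, Dict, List, Optional, Tuple

Vector = List[float]
# A hook receives (activation, hook_name) and may return a replacement activation.
Hook = Callable[[Vector, str], Optional[Vector]]


class ToyHookedModel:
    """Hypothetical two-step model; each intermediate value passes through a named hook point."""

    HOOK_NAMES = ("layer0.out", "layer1.out")

    def __init__(self) -> None:
        self.hooks: Dict[str, List[Hook]] = {}

    def add_hook(self, name: str, fn: Hook) -> None:
        self.hooks.setdefault(name, []).append(fn)

    def reset_hooks(self) -> None:
        self.hooks.clear()

    def _hook_point(self, name: str, value: Vector) -> Vector:
        # Run each registered hook; a non-None return value replaces the activation.
        for fn in self.hooks.get(name, []):
            replaced = fn(value, name)
            if replaced is not None:
                value = replaced
        return value

    def forward(self, x: Vector) -> Vector:
        hidden = self._hook_point("layer0.out", [2.0 * v for v in x])
        return self._hook_point("layer1.out", [v + 1.0 for v in hidden])


def run_with_cache(model: ToyHookedModel, x: Vector) -> Tuple[Vector, Dict[str, Vector]]:
    """Run the model once, saving a copy of the activation at every hook point."""
    cache: Dict[str, Vector] = {}

    def save(value: Vector, name: str) -> None:
        cache[name] = list(value)

    for name in model.HOOK_NAMES:
        model.add_hook(name, save)
    try:
        output = model.forward(x)
    finally:
        model.reset_hooks()  # toy simplification: clears all hooks, not only the caching ones
    return output, cache


def zero_ablate(value: Vector, name: str) -> Vector:
    """Replace an activation with zeros of the same length."""
    return [0.0] * len(value)


model = ToyHookedModel()
clean_output, cache = run_with_cache(model, [1.0, 2.0])

# Edit an activation mid-run: zero out the first layer's output.
model.add_hook("layer0.out", zero_ablate)
ablated_output = model.forward([1.0, 2.0])
model.reset_hooks()
```

Comparing `clean_output` with `ablated_output` shows how much the final result depends on the ablated activation, which is the basic move behind many interpretability experiments.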

Quick Start

Install

    pip install transformer_lens

Python 3.8 or 3.9

    pip install 'transformer_lens~=2.0'
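The two install commands above differ only in the version specifier. As a rough sketch, the choice could be expressed as a small helper; `pip_requirement` is a hypothetical name, not part of TransformerLens or pip. The branch for Python 3.10 and later, and the error for interpreters older than 3.8, are assumptions, since the instructions only name 3.8 and 3.9 explicitly.

```python
import sys
from typing import Tuple


def pip_requirement(python_version: Tuple[int, int]) -> str:
    """Return a pip requirement string following the install notes (hypothetical helper)."""
    if python_version in ((3, 8), (3, 9)):
        # "~=2.0" is the compatible-release operator: at least 2.0, but below 3.0.
        return "transformer_lens~=2.0"
    if python_version >= (3, 10):
        # Assumption: newer interpreters take the unpinned latest release.
        return "transformer_lens"
    # Assumption: the install notes give no guidance for interpreters older than 3.8.
    raise ValueError(
        f"no install guidance for Python {python_version[0]}.{python_version[1]}"
    )


requirement = pip_requirement(sys.version_info[:2])
print(f"pip install '{requirement}'")
```

Quoting the requirement in the printed command keeps shells from interpreting the `~` character.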
Use

    from transformer_lens.model_bridge import TransformerBridge

    # Load a model (e.g. GPT-2 Small)
    bridge = TransformerBridge.boot_transformers("gpt2", device="cpu")

    # Run the model and get logits and activations
    logits, activations = bridge.run_with_cache("Hello World")

TransformerBridge is the recommended path in version 3.0 and supports 50+ architectures. The legacy HookedTransformer.from_pretrained API remains available through a compatibility layer but is deprecated; see the Migrating to TransformerLens 3 guide for conversion recipes.

Key Tutorials

Introduction to the Library and Mech Interp

Demo of Main TransformerLens Features

 
Gallery

Research done involving TransformerLens:

Progress Measures for Grokking via Mechanistic Interpretability (ICLR Spotlight, 2023) by Neel Nanda, Lawrence Chan, Tom Lieberum, Jess Smith, Jacob Steinhardt

Finding Neurons in a Haystack: Case Studies with Sparse Probing by Wes Gurnee, Neel Nanda, Matthew Pauly, Katherine Harvey, Dmitrii Troitskii, Dimitris Bertsimas

Towards Automated Circuit Discovery for Mechanistic Interpretability by Arthur Conmy, Augustine N. Mavor-Parker, Aengus Lynch, Stefan Heimersheim, Adrià Garriga-Alonso

 Actually, Othel

... (truncated, 10 KB total)

Resource ID: c5b41066b0ec2f58 | Stable ID: sid_LhwvMxuLFA