Zoom In: An Introduction to Circuits
web
A seminal Distill.pub paper by Olah et al. (OpenAI, 2020) that launched the 'circuits' research thread, widely considered foundational reading for mechanistic interpretability research in AI safety.
Metadata
Importance: 90/100 | blog post | primary source
Summary
This foundational Distill article introduces the 'circuits' framework for neural network interpretability, arguing that by studying connections between neurons we can reverse-engineer meaningful algorithms in neural network weights. It proposes three speculative claims: that features are the fundamental units of neural networks, that features are connected by circuits, and that similar features and circuits recur across different models and tasks.
Key Points
- Introduces the 'circuits' approach to mechanistic interpretability, treating neural networks as reverse-engineerable computational systems with meaningful internal structure.
- Proposes that neural networks contain interpretable 'features' (e.g., curve detectors, high-low frequency detectors) as fundamental units of computation.
- Argues that features are connected by 'circuits': subgraphs of the network that implement identifiable algorithms.
- Claims universality: similar features and circuits appear across different architectures and tasks, suggesting convergent computational solutions.
- Uses the analogy of scientific 'zooming in' (microscopes→cells, crystallography→DNA) to frame mechanistic interpretability as a paradigm shift in understanding AI.
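To make the idea of reading a circuit off the weights concrete, here is a minimal toy sketch. It is not the authors' method or code: the feature names, the weight values, and the helper `strongest_connections` are all invented for illustration. It only shows the basic move of ranking the connections into one downstream feature to see which upstream features excite it and which inhibit it.

```python
# Toy illustration only: the feature names and weight values below are
# invented for this sketch and are not taken from any real model.

def strongest_connections(weights, top_k=2):
    """Return the top_k most excitatory and most inhibitory inputs.

    `weights` maps an upstream feature name to the weight connecting it
    to a single downstream feature.
    """
    # Sort from most positive to most negative weight.
    ranked = sorted(weights.items(), key=lambda item: item[1], reverse=True)
    excitatory = [(name, w) for name, w in ranked if w > 0][:top_k]
    # Walk the ranking backwards so the most negative weights come first.
    inhibitory = [(name, w) for name, w in reversed(ranked) if w < 0][:top_k]
    return excitatory, inhibitory


# Hypothetical weights into a downstream "curve" feature.
weights_into_curve = {
    "line_0deg": 0.8,
    "line_30deg": 0.6,
    "line_90deg": -0.7,
    "color_contrast": 0.05,
    "line_120deg": -0.4,
}

excitatory, inhibitory = strongest_connections(weights_into_curve)
print("excites:", excitatory)
print("inhibits:", inhibitory)
```

In a real network the weights would be convolutional kernels rather than single numbers, so a summary like this would at best be a first step before inspecting the spatial structure of each connection.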
Cited by 2 pages
| Page | Type | Quality |
|---|---|---|
| Chris Olah | Person | 27.0 |
| Mechanistic Interpretability | Research Area | 59.0 |
Cached Content Preview
HTTP 200 | Fetched Apr 9, 2026 | 61 KB
Zoom In: An Introduction to Circuits
Distill
{ "title": "Zoom In: An Introduction to Circuits", "description": "By studying the connections between neurons, we can find meaningful algorithms in the weights of neural networks.",
"authors": [
{ "author": "Chris Olah", "authorURL": "https://colah.github.io", "affiliation": "OpenAI", "affiliationURL": "https://openai.com" },
{ "author": "Nick Cammarata", "authorURL": "http://nickcammarata.com", "affiliation": "OpenAI", "affiliationURL": "https://openai.com" },
{ "author": "Ludwig Schubert", "authorURL": "https://schubert.io/", "affiliation": "OpenAI", "affiliationURL": "https://openai.com" },
{ "author": "Gabriel Goh", "authorURL": "http://gabgoh.github.io", "affiliation": "OpenAI", "affiliationURL": "https://openai.com" },
{ "author": "Michael Petrov", "authorURL": "https://twitter.com/mpetrov", "affiliation": "OpenAI", "affiliationURL": "https://openai.com" },
{ "author": "Shan Carter", "authorURL": "http://shancarter.com", "affiliation": "OpenAI", "affiliationURL": "https://openai.com" }
] }
By studying the connections between neurons, we can find meaningful algorithms in the weights of neural networks.
| Author | Affiliation |
|---|---|
| Chris Olah | OpenAI |
| Nick Cammarata | OpenAI |
| Ludwig Schubert | OpenAI |
| Gabriel Goh | OpenAI |
| Michael Petrov | OpenAI |
| Shan Carter | OpenAI |
Published: March 10, 2020
DOI: 10.23915/distill.00024.001
This article is part of the Circuits thread, an experimental format collecting invited short articles and critical commentary delving into the inner workings of neural networks.
Introduction
Many important transition points in the history of science have been moments when science “zoomed in.”
At these points, we develop a visualization or tool that allows us to see the world in a new level of detail, and a new field of science develops to study the world through this lens.
For example, microscopes let us see cells, leading to cellular biology. Science zoomed in. Several techniques including x-ray crystallography let us see DNA, leading to the molecular revolution. Science zoomed in. Atomic theory. Subatomic particles. Neuroscience. Science zoomed in.
These transitions weren’t just a change in precision: they were qualitative changes in what the object
... (truncated, 61 KB total)
Resource ID: 346b1574c0c3ce67 | Stable ID: sid_DlMLRNm3TO