Zoom In: An Introduction to Circuits

web

distill.pub·distill.pub/2020/circuits/zoom-in/

A seminal Distill.pub paper by Olah et al. (OpenAI, 2020) that launched the 'circuits' research thread, widely considered foundational reading for mechanistic interpretability research in AI safety.

Metadata

Importance: 90/100 · blog post · primary source

Summary

This foundational Distill article introduces the 'circuits' framework for neural network interpretability, arguing that by studying connections between neurons we can reverse-engineer meaningful algorithms in neural network weights. It proposes three speculative claims: that features are the fundamental units of neural networks, that features are connected by circuits, and that similar features and circuits recur across different models and tasks.

Key Points

•Introduces the 'circuits' approach to mechanistic interpretability, treating neural networks as reverse-engineerable computational systems with meaningful internal structure.
•Proposes that neural networks contain interpretable 'features' (e.g., curve detectors, high-low frequency detectors) as fundamental units of computation.
•Argues that features are connected by 'circuits'—subgraphs of the network that implement identifiable algorithms.
•Hypothesizes universality: analogous features and circuits form across different architectures and tasks, which, if borne out, would suggest convergent computational solutions.
•Uses the analogy of scientific 'zooming in' (microscopes→cells, crystallography→DNA) to frame mechanistic interpretability as a paradigm shift in understanding AI.
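The article's core move is reading structure directly off the weights that connect neurons. The toy sketch below illustrates that move. In a convolutional layer, the weights linking one input channel to one output channel form a small spatial kernel, and the article examines kernels like these to see how later features are assembled from earlier ones. This is a minimal sketch under stated assumptions, not the authors' tooling. It uses plain nested lists in a PyTorch-style `[out][in][ky][kx]` layout. It also ranks a unit's inputs by kernel L2 norm, a heuristic chosen here only for illustration. The helper names (`spatial_kernel`, `kernel_norm`, `signed_sum`, `strongest_inputs`) and the toy weights are hypothetical.

```python
import math
from typing import List, Tuple

# Hypothetical toy types. Assumed layout (PyTorch-style Conv2d):
# weight[out_ch][in_ch][ky][kx]. Real InceptionV1 weights would come from a
# tensor library and may use another layout (e.g. TensorFlow's HWIO).
Kernel = List[List[float]]
ConvWeight = List[List[Kernel]]


def spatial_kernel(weight: ConvWeight, out_ch: int, in_ch: int) -> Kernel:
    """Return the ky x kx slice of weights linking one input channel to one output channel."""
    return weight[out_ch][in_ch]


def kernel_norm(kernel: Kernel) -> float:
    """L2 norm of a spatial kernel, used here as a crude connection-strength score."""
    return math.sqrt(sum(w * w for row in kernel for w in row))


def signed_sum(kernel: Kernel) -> float:
    """Net sign of a connection: positive leans excitatory, negative leans inhibitory."""
    return sum(w for row in kernel for w in row)


def strongest_inputs(weight: ConvWeight, out_ch: int, k: int) -> List[Tuple[int, float]]:
    """Rank the input channels feeding `out_ch` by kernel norm and keep the top k."""
    scores = [
        (in_ch, kernel_norm(spatial_kernel(weight, out_ch, in_ch)))
        for in_ch in range(len(weight[out_ch]))
    ]
    scores.sort(key=lambda item: item[1], reverse=True)
    return scores[:k]


if __name__ == "__main__":
    # Made-up toy layer: 1 output channel, 3 input channels, 2x2 kernels.
    toy: ConvWeight = [[
        [[0.1, 0.0], [0.0, 0.1]],      # input 0: weak
        [[0.9, 0.8], [0.7, 0.6]],      # input 1: strong, excitatory
        [[-0.5, -0.5], [-0.5, -0.5]],  # input 2: moderate, inhibitory
    ]]
    for in_ch, norm in strongest_inputs(toy, out_ch=0, k=2):
        kernel = spatial_kernel(toy, 0, in_ch)
        print(in_ch, round(norm, 3), round(signed_sum(kernel), 3))
```

On the toy layer, input 1 ranks first with a positive net sign and input 2 ranks second with a negative one. Weight magnitude alone is only a rough proxy. In the article, weight inspection is paired with feature visualization of the neurons involved before a subgraph is described as a circuit.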

Cited by 2 pages

Page	Type	Quality
Chris Olah	Person	27.0
Mechanistic Interpretability	Research Area	59.0

Cached Content Preview

HTTP 200 · Fetched Apr 9, 2026 · 61 KB

Zoom In: An Introduction to Circuits 
 Distill

 By studying the connections between neurons, we can find meaningful algorithms in the weights of neural networks.

 Authors: Chris Olah, Nick Cammarata, Ludwig Schubert, Gabriel Goh, Michael Petrov, Shan Carter
 Affiliations: OpenAI (all authors)
 Published: March 10, 2020
 DOI: 10.23915/distill.00024.001

 This article is part of the Circuits thread, an experimental format collecting invited short articles and critical commentary delving into the inner workings of neural networks.
 Introduction

 Many important transition points in the history of science have been moments when science “zoomed in.”
 At these points, we develop a visualization or tool that allows us to see the world in a new level of detail, and a new field of science develops to study the world through this lens.
 
 For example, microscopes let us see cells, leading to cellular biology. Science zoomed in. Several techniques including x-ray crystallography let us see DNA, leading to the molecular revolution. Science zoomed in. Atomic theory. Subatomic particles. Neuroscience. Science zoomed in.
 
 These transitions weren’t just a change in precision: they were qualitative changes in what the object

... (truncated, 61 KB total)

Resource ID: 346b1574c0c3ce67 | Stable ID: sid_DlMLRNm3TO