Longterm Wiki

Chris Olah

Also known as: Christopher Olah

Pioneer of neural network interpretability and visualization; co-founder of Anthropic; creator of Distill.pub and the Circuits thread at Transformer Circuits

Current Role
Co-founder, Interpretability
Organization
Anthropic

Expert Positions (2 topics)

| Topic | View | Estimate | Confidence | Date | Source |
|---|---|---|---|---|---|
| How Hard Is Alignment? | Tractable via interpretability | Solvable with sufficient transparency tools | medium | 2021 | 80,000 Hours Podcast |
| Will Advanced AI Be Deceptive? | Detectable with interpretability | Interpretability provides a "mulligan" to catch deception | medium | Dec 2021 | Chris Olah's views on AGI safety (LessWrong) |

Education

Attended University of Toronto (did not complete degree); Thiel Fellow


Quick Assessment

| Dimension | Assessment |
|---|---|
| Primary Role | Co-founder and interpretability research lead at Anthropic |
| Key Contributions | Feature visualization techniques, circuit analysis methodology, sparse autoencoder applications for interpretability, co-founding the Distill journal |
| Key Publications | "Towards Monosemanticity" (2023), "Scaling Monosemanticity" (2024), "Toy Models of Superposition" (2022), "Feature Visualization" (2017), "The Building Blocks of Interpretability" (2018) |
| Institutional Affiliation | Anthropic (2021–present); previously OpenAI (2018–2020), Google Brain (2015–2018) |
| Recognition | Named to TIME's 100 Most Influential People in AI (2024); 2012 Thiel Fellow |
| Influence on AI Safety | Contributed to establishing mechanistic interpretability as a research direction within AI safety; applied transparency and verification approaches to large language models |

Overview

Chris Olah is a Canadian machine learning researcher specializing in neural network interpretability and a co-founder of Anthropic. He is best known for developing and advancing the research program now called mechanistic interpretability, which aims to reverse-engineer the internal algorithms and representations of neural networks. The program rests on the hypothesis that such reverse-engineering is tractable, a claim that remains contested in the research community. His career has spanned Google Brain, OpenAI, and Anthropic, where he currently leads interpretability research.

Olah followed an unconventional path into research: he left university as a teenager without completing an undergraduate degree, and built his early reputation through independent blog posts at colah.github.io and a 2012 Thiel Fellowship. His blog posts on topics such as LSTM networks and neural network representations attracted a wide readership in the machine learning community before he joined Google Brain in 2015.

In 2016, Olah co-founded Distill, a peer-reviewed journal emphasizing interactive visualizations and web-native presentation of machine learning research; the journal operated until it entered an indefinite hiatus in July 2021. At Anthropic, he leads a team focused on understanding the internal mechanisms of frontier AI systems, including Claude; the team had grown to 17 researchers by April 2024. TIME magazine named him to its 2024 list of the 100 Most Influential People in AI, describing him as "one of the pioneers of an entirely new scientific field, mechanistic interpretability."

A notable feature of Olah's institutional position is that he both leads the interpretability research program and co-founded Anthropic, the commercial AI laboratory that funds and publishes that research and benefits reputationally from demonstrating safety progress. This dual role is worth bearing in mind when evaluating claims about the maturity or impact of the interpretability program, particularly because Anthropic has a commercial interest in being seen as a safety-conscious organization.

Facts

Website: https://colah.github.io
Social Media: @ch402
GitHub: https://github.com/colah
Google Scholar: https://scholar.google.com/citations?user=vKAKE1gAAAAJ
Wikipedia: https://en.wikipedia.org/wiki/Chris_Olah