Transformer Circuits
Company Blog | High (4)
Anthropic's interpretability research
Credibility Rating
4/5
High (4): High quality. Established institution or organization with editorial oversight and accountability.
Resources: 10
Citing pages: 15
Tracked domains: 1
Tracked Domains
transformer-circuits.pub
Resources (10)
Citing Pages (15)
AI Accident Risk Cruxes
AI Alignment
Anthropic
Anthropic Core Views
Chris Olah
Dense Transformers
Interpretability
Is Interpretability Sufficient for Safety?
Mechanistic Interpretability
Probing / Linear Probes
AI Scaling Laws
Situational Awareness
Sleeper Agent Detection
Sparse Autoencoders (SAEs)
Technical AI Safety Research
Publication ID: transformer-circuits