AI interpretability research lab developing tools to decode and control neural network internals for safer AI systems.
Related Wiki Pages
Sparse Autoencoders (SAEs)
Sparse autoencoders extract interpretable features from neural network activations by training an overcomplete dictionary under a sparsity constraint; a minimal code sketch follows this list.
Anthropic
An AI safety company founded by former OpenAI researchers that develops frontier AI models while pursuing safety research, including the Claude models.
OpenAI
Leading AI lab that developed GPT models and ChatGPT, analyzing organizational evolution from non-profit research to commercial AGI development.
Chris Olah
Co-founder of Anthropic and researcher in neural network interpretability, known for developing mechanistic interpretability as a research program.
Dario Amodei
CEO of Anthropic advocating a competitive safety development philosophy with Constitutional AI, responsible scaling policies, and empirical alignment research.
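The sparse-autoencoder entry above describes the core technique: learn an overcomplete set of features from activations while penalizing how many fire at once. Below is a minimal sketch of that idea, assuming PyTorch; the layer shapes, ReLU encoder, and L1 penalty coefficient are illustrative assumptions, not any lab's actual architecture.

```python
# Minimal sparse autoencoder (SAE) sketch for activation interpretability.
import torch
import torch.nn as nn


class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        # Encoder maps model activations into an overcomplete feature space.
        self.encoder = nn.Linear(d_model, d_hidden)
        # Decoder reconstructs the original activations from those features.
        self.decoder = nn.Linear(d_hidden, d_model)

    def forward(self, x: torch.Tensor):
        # ReLU keeps feature activations non-negative, so the L1 penalty
        # below drives most of them to exactly zero.
        features = torch.relu(self.encoder(x))
        reconstruction = self.decoder(features)
        return reconstruction, features


def sae_loss(x, reconstruction, features, l1_coeff: float = 1e-3):
    # Reconstruction error keeps features faithful to the activations;
    # the L1 term on features enforces the sparsity constraint.
    mse = torch.mean((reconstruction - x) ** 2)
    sparsity = l1_coeff * features.abs().sum(dim=-1).mean()
    return mse + sparsity
```

In practice the hidden dimension is chosen much larger than the activation dimension, so that sparsity forces individual hidden units to specialize into features a researcher can inspect one at a time.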