Goodfire blog: Announcing Goodfire Ember
Credibility Rating
Good quality. Reputable source with community review or editorial standards, but less rigorous than peer-reviewed venues.
Rating inherited from publication venue: Goodfire
Goodfire Ember represents an early attempt to commercialize and democratize mechanistic interpretability tooling via a hosted API, making SAE-based model inspection accessible beyond specialized research labs.
Metadata
Summary
Goodfire announces Ember, the first hosted mechanistic interpretability API providing access to sparse autoencoder (SAE) models for analyzing and steering large language models like Llama 3.3 70B. The platform exposes 'features' as interpretable patterns of neuron activity, enabling researchers and organizations to programmatically inspect and modify model internals for safety and alignment purposes.
Key Points
- Ember is the first hosted mechanistic interpretability API, offering SAE-based feature extraction for Llama 3.3 70B and Llama 3.1 8B inference.
- Core abstraction is 'features': interpretable patterns extracted from model residual streams via sparse autoencoders that capture meaningful concepts.
- Feature steering allows programmatic tuning of model internals; 'Auto Steer' mode automates finding relevant features from a natural language prompt (a sketch of this loop follows the list).
- Early adopters include Apollo Research and Haize Labs, using Ember for safety benchmarks, PII security analysis, and scientific knowledge extraction.
- SAE interpreter models are planned for open-sourcing; as of Feb 2026 the public API/demo was deprecated in favor of a partner-focused platform.
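
As a concrete picture of the Auto Steer loop mentioned above, the sketch below implements the described flow: embed the natural-language request, rank candidate features by similarity, and return activation edits for the top matches. Everything here (`embed`, `FEATURE_INDEX`, `auto_steer`) is a hypothetical stand-in, not Ember's actual API.

```python
import numpy as np

# Hypothetical illustration of the "Auto Steer" idea: map a natural-language
# request onto SAE features via embedding similarity, then propose activation
# edits. None of these names are real Ember identifiers.

def embed(text: str, dim: int = 256) -> np.ndarray:
    """Stand-in bag-of-words embedder; a real system uses a learned model."""
    v = np.zeros(dim)
    for tok in text.lower().split():
        v[hash(tok) % dim] += 1.0
    n = np.linalg.norm(v)
    return v / n if n else v

# Toy feature index: human-readable SAE feature labels and their embeddings.
FEATURE_INDEX = {
    "formal professional tone": embed("formal professional tone"),
    "pirate speech style": embed("pirate speech style"),
    "refusal and safety disclaimers": embed("refusal and safety disclaimers"),
}

def auto_steer(request: str, top_k: int = 1, strength: float = 0.5):
    """Rank features by cosine similarity to the request; return proposed edits."""
    q = embed(request)
    ranked = sorted(FEATURE_INDEX.items(), key=lambda kv: -float(q @ kv[1]))
    return [(label, strength) for label, _ in ranked[:top_k]]

print(auto_steer("respond in a pirate speech style"))
# -> [('pirate speech style', 0.5)]
```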
Cited by 1 page
| Page | Type | Quality |
|---|---|---|
| Goodfire | Organization | 68.0 |
Cached Content Preview

**Update (Feb 2026):**
**Ember** now refers to our general-purpose platform for interpretability that we deploy with select partners. The demo interface and API have been deprecated.
# Goodfire Ember: Scaling Interpretability for Frontier Model Alignment
Ember is the first hosted mechanistic interpretability API, with inference support for generative models like Llama 3.3 70B.
### Authors
[Daniel Balsam](https://goodfire.ai/), [Myra Deng](https://goodfire.ai/), [Nam Nguyen](https://goodfire.ai/), [Liv Gorton](https://goodfire.ai/), [Thariq Shihipar](https://goodfire.ai/), [Eric Ho](https://goodfire.ai/), [Thomas McGrath](https://goodfire.ai/)
### Affiliations
[Goodfire Research](https://goodfire.ai/)
### Published
Dec. 22, 2024
### DOI
_No DOI yet._

Today, we're releasing Goodfire Ember, an API/SDK that makes large-scale interpretability work accessible to the broader community. As part of our commitment to research collaboration, the state-of-the-art interpreter models that power our API (sparse autoencoders, or SAEs) will be open-sourced in the coming weeks. We're inviting AI researchers to leverage Ember's powerful capabilities to accelerate alignment research and tackle this critical challenge alongside our lab.
Ember is already being used by leading organizations like Rakuten, Apollo Research, and Haize Labs, among others. Our early partners are using Ember to:
- Improve model performance on key safety benchmarks by activating relevant features
- Uncover new scientific knowledge from specialized foundation models
- Improve model security by investigating the model's understanding of PII
Since our last research preview, we've advanced on three key fronts: developing state-of-the-art interpreter models (SAEs), expanding SAE feature programming applications, and building fast, reliable infrastructure to support these capabilities.
Ember is now available on [platform.goodfire.ai](http://platform.goodfire.ai/), with support for Llama 3.3 70B and Llama 3.1 8B.
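
In code, a minimal steering workflow might have looked like the following. This is a sketch of the launch-era Python SDK's general shape; the class and method names (`goodfire.Client`, `goodfire.Variant`, `features.search`, `variant.set`) and the model identifier are assumptions, and the hosted API has since been deprecated, so treat it as illustrative only.

```python
# Illustrative sketch of a launch-era Ember steering workflow. All names
# here are assumptions about the SDK, not a guaranteed interface.
import goodfire

client = goodfire.Client(api_key="YOUR_API_KEY")

# A variant wraps a supported base model plus a set of feature edits.
variant = goodfire.Variant("meta-llama/Llama-3.3-70B-Instruct")

# Search the SAE feature space with a natural-language query.
features = client.features.search("formal tone", model=variant, top_k=3)

# Steer: bias a chosen feature's activation up (or down with negatives).
variant.set(features[0], 0.5)

# Inference then runs through the steered variant.
response = client.chat.completions.create(
    model=variant,
    messages=[{"role": "user", "content": "Summarize this quarter's results."}],
)
```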
## Features are Ember's core interface
Our core abstraction is the concept of "features." Features are interpretable patterns of neuron activity that our interpreter models (SAEs) extract. These features capture how a model processes information, providing insights into its inner workings. While individual neurons work together in complex ways, features represent meaningful concepts that emerge from these interactions, like a model's understanding of "conci
... (truncated, 11 KB total)
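
To ground the "features" abstraction, here is a minimal, self-contained sketch of the standard sparse autoencoder recipe the post describes: encode a residual-stream activation into a sparse, overcomplete feature code, and steer by editing that code before decoding. The dimensions, random weights, and the negative-bias sparsity trick are illustrative assumptions, not Ember's trained models.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_features = 64, 512   # overcomplete: far more features than dimensions

# Toy SAE parameters. A trained SAE learns these by reconstructing model
# activations under an L1 sparsity penalty; here a negative encoder bias
# stands in for learned sparsity.
W_enc = rng.normal(scale=0.1, size=(d_model, n_features))
b_enc = np.full(n_features, -1.0)
W_dec = rng.normal(scale=0.1, size=(n_features, d_model))
b_dec = np.zeros(d_model)

def encode(x: np.ndarray) -> np.ndarray:
    """Residual-stream activation -> sparse feature activations."""
    return np.maximum(x @ W_enc + b_enc, 0.0)

def decode(f: np.ndarray) -> np.ndarray:
    """Sparse feature activations -> reconstructed activation."""
    return f @ W_dec + b_dec

x = rng.normal(size=d_model)            # stand-in residual-stream vector
f = encode(x)
print(f"{int((f > 0).sum())} of {n_features} features active")

# Feature steering amounts to editing the sparse code and decoding back:
f[42] += 4.0                            # dial feature 42 up
x_steered = decode(f)
```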