Goodfire
goodfire (E430)
Path: /knowledge-base/organizations/goodfire/
Page Metadata
{
"id": "goodfire",
"numericId": null,
"path": "/knowledge-base/organizations/goodfire/",
"filePath": "knowledge-base/organizations/goodfire.mdx",
"title": "Goodfire",
"quality": 68,
"importance": 72,
"contentFormat": "article",
"tractability": null,
"neglectedness": null,
"uncertainty": null,
"causalLevel": null,
"lastUpdated": "2026-02-12",
"llmSummary": "Goodfire is a well-funded AI interpretability startup valued at $1.25B (Feb 2026) developing mechanistic interpretability tools like Ember API to make neural networks more transparent and steerable. The company's pivot toward using interpretability in model training (\"intentional design\") has sparked significant AI safety community debate about whether this compromises interpretability as an independent safety tool.",
"structuredSummary": null,
"description": "AI interpretability research lab developing tools to decode and control neural network internals for safer AI systems",
"ratings": {
"novelty": 6,
"rigor": 7,
"actionability": 6,
"completeness": 8
},
"category": "organizations",
"subcategory": "safety-orgs",
"clusters": [
"ai-safety",
"community"
],
"metrics": {
"wordCount": 2740,
"tableCount": 3,
"diagramCount": 0,
"internalLinks": 19,
"externalLinks": 52,
"footnoteCount": 50,
"bulletRatio": 0.15,
"sectionCount": 25,
"hasOverview": true,
"structuralScore": 13
},
"suggestedQuality": 87,
"updateFrequency": 21,
"evergreen": true,
"wordCount": 2740,
"unconvertedLinks": [],
"unconvertedLinkCount": 0,
"convertedLinkCount": 0,
"backlinkCount": 1,
"redundancy": {
"maxSimilarity": 18,
"similarPages": [
{
"id": "anthropic-core-views",
"title": "Anthropic Core Views",
"path": "/knowledge-base/responses/anthropic-core-views/",
"similarity": 18
},
{
"id": "interpretability",
"title": "Mechanistic Interpretability",
"path": "/knowledge-base/responses/interpretability/",
"similarity": 18
},
{
"id": "elicit",
"title": "Elicit (AI Research Tool)",
"path": "/knowledge-base/organizations/elicit/",
"similarity": 17
},
{
"id": "research-agendas",
"title": "AI Alignment Research Agenda Comparison",
"path": "/knowledge-base/responses/research-agendas/",
"similarity": 17
},
{
"id": "frontier-model-forum",
"title": "Frontier Model Forum",
"path": "/knowledge-base/organizations/frontier-model-forum/",
"similarity": 16
}
]
}
}
Entity Data
{
"id": "goodfire",
"type": "organization",
"title": "Goodfire",
"description": "AI interpretability research lab developing tools to decode and control neural network internals for safer AI systems.",
"tags": [
"mechanistic-interpretability",
"sparse-autoencoders",
"ai-safety-startup",
"model-transparency",
"feature-steering"
],
"relatedEntries": [
{
"id": "anthropic",
"type": "lab"
},
{
"id": "dario-amodei",
"type": "researcher"
},
{
"id": "chris-olah",
"type": "researcher"
},
{
"id": "openai",
"type": "organization"
},
{
"id": "deepmind",
"type": "lab"
},
{
"id": "interpretability",
"type": "safety-agenda"
}
],
"sources": [],
"lastUpdated": "2026-02",
"customFields": []
}
Canonical Facts (0)
No facts for this entity
External Links
No external links
Backlinks (1)
| id | title | type | relationship |
|---|---|---|---|
| sparse-autoencoders | Sparse Autoencoders (SAEs) | approach | — |
Frontmatter
{
"title": "Goodfire",
"description": "AI interpretability research lab developing tools to decode and control neural network internals for safer AI systems",
"importance": 72,
"lastEdited": "2026-02-12",
"update_frequency": 21,
"sidebar": {
"order": 60
},
"ratings": {
"novelty": 6,
"rigor": 7,
"actionability": 6,
"completeness": 8
},
"quality": 68,
"llmSummary": "Goodfire is a well-funded AI interpretability startup valued at $1.25B (Feb 2026) developing mechanistic interpretability tools like Ember API to make neural networks more transparent and steerable. The company's pivot toward using interpretability in model training (\"intentional design\") has sparked significant AI safety community debate about whether this compromises interpretability as an independent safety tool.",
"clusters": [
"ai-safety",
"community"
],
"subcategory": "safety-orgs",
"entityType": "organization"
}
Raw MDX Source
---
title: Goodfire
description: AI interpretability research lab developing tools to decode and control neural network internals for safer AI systems
importance: 72
lastEdited: "2026-02-12"
update_frequency: 21
sidebar:
order: 60
ratings:
novelty: 6
rigor: 7
actionability: 6
completeness: 8
quality: 68
llmSummary: Goodfire is a well-funded AI interpretability startup valued at $1.25B (Feb 2026) developing mechanistic interpretability tools like Ember API to make neural networks more transparent and steerable. The company's pivot toward using interpretability in model training ("intentional design") has sparked significant AI safety community debate about whether this compromises interpretability as an independent safety tool.
clusters:
- ai-safety
- community
subcategory: safety-orgs
entityType: organization
---
import {EntityLink, KeyPeople, KeyQuestions, Section} from '@components/wiki';
## Quick Assessment
| Dimension | Assessment |
|-----------|------------|
| **Founded** | June 2024 |
| **Type** | Public benefit corporation, AI interpretability research lab |
| **Location** | San Francisco, California |
| **Funding** | \$207M+ (Seed: \$7M Aug 2024, Series A: \$50M Apr 2025, Series B: \$150M Feb 2026) |
| **Valuation** | \$1.25B (as of Series B, Feb 2026) |
| **Employees** | ≈39 (as of early 2026) |
| **Key Product** | Ember (<EntityLink id="E174">mechanistic interpretability</EntityLink> API and platform) |
| **Focus** | Mechanistic interpretability, sparse autoencoders, AI safety |
| **Notable Backers** | Anthropic (first direct investment), Menlo Ventures, Lightspeed Venture Partners |
## Key Links
| Source | Link |
|--------|------|
| Official Website | [goodfire.ai](https://www.goodfire.ai) |
| Wikipedia | [en.wikipedia.org](https://en.wikipedia.org/wiki/Goodfire) |
## Overview
Goodfire is an AI interpretability research lab and public benefit corporation specializing in mechanistic interpretability—the science of reverse-engineering neural networks to understand and control their internal workings.[^1] Founded in June 2024 by Eric Ho (CEO), Dan Balsam (CTO), and Tom McGrath (Chief Scientist), the company aims to transform opaque AI systems into transparent, steerable, and safer technologies.[^2]
The company's flagship product, Ember, is the first hosted mechanistic interpretability API, providing researchers and developers with programmable access to AI model internals.[^3] Rather than treating models as black boxes, Ember enables users to examine individual "features" (interpretable patterns of neural activation), edit model behavior without retraining, and audit for safety issues before deployment. The platform supports models like Llama 3.3 70B and processes tokens at a rate that has tripled monthly since launch in December 2024.[^4]
Goodfire's rapid ascent reflects growing industry recognition that interpretability is foundational to AI safety. The company raised \$50 million in Series A funding less than one year after founding, led by Menlo Ventures with participation from <EntityLink id="E22">Anthropic</EntityLink>—marking Anthropic's first direct investment in another company.[^5] Anthropic CEO <EntityLink id="E91">Dario Amodei</EntityLink> stated that "mechanistic interpretability is among the best bets to help us transform black-box neural networks into understandable, steerable systems."[^6] In February 2026, Goodfire raised a further \$150 million in Series B funding at a \$1.25 billion valuation, led by B Capital with participation from Salesforce Ventures, Eric Schmidt, and existing investors.[^43]
## History and Founding
The founding team brought complementary expertise from both entrepreneurship and frontier AI research. Eric Ho and Dan Balsam had previously co-founded RippleMatch in 2016, an AI-powered hiring platform that Ho scaled to over \$10 million in annual recurring revenue.[^7] Ho's work at RippleMatch earned him recognition on Forbes's 30 Under 30 list in 2022.[^8]
Tom McGrath, the company's Chief Scientist, is recognized as a pioneering figure in mechanistic interpretability. He completed his PhD in 2016 and co-founded the Interpretability team at Google DeepMind, where he served as a Senior Research Scientist.[^9] In March 2024, McGrath left Google to join South Park Commons, a community for technologists, with the explicit goal of making interpretability "useful" by starting a company.[^10] He connected with Ho and Balsam shortly thereafter to launch Goodfire.
The company's founding in June 2024 was followed quickly by a \$7 million seed round in August 2024, led by Lightspeed Venture Partners with participation from Menlo Ventures, South Park Commons, Work-Bench, and others.[^11] Less than one year later, in April 2025, Goodfire announced its \$50 million Series A at a \$200 million valuation.[^12] In February 2026, the company closed a \$150 million Series B at a \$1.25 billion valuation, led by B Capital with participation from DFJ Growth, Salesforce Ventures, Eric Schmidt, and existing investors including Menlo Ventures and Lightspeed Venture Partners.[^43]
## Team and Expertise
Beyond the three founders, Goodfire has assembled a team of leading researchers from <EntityLink id="E218">OpenAI</EntityLink> and <EntityLink id="E98">DeepMind</EntityLink>. The team includes:[^13]
- **Lee Sharkey**: Pioneered the use of sparse autoencoders in language models and co-founded <EntityLink id="E24">Apollo Research</EntityLink>
- **Nick Cammarata**: Started the interpretability team at OpenAI and worked closely with <EntityLink id="E59">Chris Olah</EntityLink>, widely considered the founder of mechanistic interpretability
The team's collective contributions include authoring the three most-cited papers in mechanistic interpretability and pioneering techniques like sparse autoencoders (SAEs) for feature discovery, auto-interpretability methods, and knowledge extraction from models like AlphaZero.[^14]
## Technology and Products
### Ember Platform
Ember is Goodfire's core product—a mechanistic interpretability API that provides direct, programmable access to AI model internals.[^15] Unlike traditional approaches that treat models as black boxes accessible only through prompts, Ember allows users to:
- **Examine features**: Identify interpretable patterns in neural activations (e.g., features representing "professionalism," "sarcasm," or specific knowledge domains)
- **Steer behavior**: Adjust feature activations to control model outputs without retraining or complex prompt engineering (e.g., making a model more "wise sage"-like by amplifying philosophical reasoning features)[^16]
- **Debug and audit**: Trace decision pathways, detect biases, identify vulnerabilities, and uncover hidden knowledge
- **Model diffing**: Track changes across training checkpoints to understand why problematic behaviors emerge[^17]
The platform is model-agnostic and currently supports models including Llama 3.3 70B and Llama 3.1 8B. Token processing has tripled monthly since launch, with hundreds of researchers using the platform.[^18]
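The mechanics behind feature examination and steering are easiest to see in code. The sketch below is a generic sparse-autoencoder steering loop, not the Ember API: the layer width, feature count, ReLU encoder, and replace-with-reconstruction steering rule are all illustrative assumptions.

```python
# Generic SAE feature-steering sketch (illustrative; not the Ember API).
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Maps a model activation to a wide, sparse vector of interpretable features."""
    def __init__(self, d_model: int = 4096, d_features: int = 65536):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_features)
        self.decoder = nn.Linear(d_features, d_model)

    def encode(self, activation: torch.Tensor) -> torch.Tensor:
        return torch.relu(self.encoder(activation))   # sparse feature activations

    def decode(self, features: torch.Tensor) -> torch.Tensor:
        return self.decoder(features)                 # back to activation space

def steer(activation: torch.Tensor, sae: SparseAutoencoder,
          feature_idx: int, scale: float) -> torch.Tensor:
    """Amplify or suppress one feature, then reconstruct the activation.
    The edited activation would replace the original at that layer
    during the model's forward pass."""
    features = sae.encode(activation)
    features[..., feature_idx] *= scale               # e.g. scale > 1 amplifies "formality"
    return sae.decode(features)

sae = SparseAutoencoder()
act = torch.randn(1, 4096)                            # stand-in for a residual-stream activation
edited = steer(act, sae, feature_idx=123, scale=4.0)
```

In practice, steering implementations often add the scaled decoder direction to the original activation rather than replacing it with the reconstruction, so that reconstruction error is preserved; a hosted platform like Ember layers feature discovery and editing endpoints over machinery of this kind.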
### Auto Steer Method
Goodfire developed an "Auto Steer" method for automated behavioral adjustments. Independent evaluation found it effective for certain behavioral objectives (like "be professional") but noted a coherence gap—outputs sometimes became less coherent compared to traditional prompt engineering.[^19] This highlights the practical challenges of translating interpretability research into production-ready tools.
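The cited evaluation's exact metrics are not reproduced here; one simple, illustrative way to operationalize a coherence gap is to score steered versus prompted completions with an independent reference language model. The reference model, example strings, and the perplexity criterion below are all assumptions.

```python
# Illustrative coherence comparison via perplexity under a small reference model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
ref = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def perplexity(text: str) -> float:
    """Perplexity of `text` under the reference model (lower = more fluent)."""
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = ref(ids, labels=ids).loss     # mean per-token cross-entropy
    return float(torch.exp(loss))

# Invented stand-ins for a feature-steered completion and a prompted completion.
steered = "The assistant responded in a highly formal, register, tone manner."
prompted = "The assistant responded in a highly formal tone."
print(f"coherence gap (perplexity difference): {perplexity(steered) - perplexity(prompted):+.2f}")
```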
### Safety Applications
Goodfire emphasizes safety-first applications of interpretability:[^20]
- **Auditing**: Probing model behaviors to identify misalignment, biases, and vulnerabilities
- **Conditional steering**: Preventing jailbreaks by applying context-dependent behavioral controls (tested on the StrongREJECT adversarial dataset)
- **Model diffing**: Detecting how and why unsafe behaviors emerge during training or fine-tuning
- **PII detection**: Partnering with Rakuten to use sparse autoencoder probes to prevent personally identifiable information leakage[^21]
Pre-release safety measures for Ember include feature moderation (removing harmful/explicit/malicious features), input/<EntityLink id="E595">output filtering</EntityLink>, and controlled access for researchers.[^22]
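Public descriptions of the Rakuten work stay at a high level; a minimal sketch of what an SAE-feature probe for PII leakage could look like is below. The probe architecture, feature width, and threshold are assumptions for illustration, not the deployed system.

```python
# Hypothetical SAE-feature probe for PII detection (illustrative only; not the
# deployed Goodfire/Rakuten system).
import torch
import torch.nn as nn

class PIIProbe(nn.Module):
    """Logistic probe over SAE feature activations; in practice it would be
    trained on spans labeled as containing / not containing PII."""
    def __init__(self, d_features: int = 65536):
        super().__init__()
        self.linear = nn.Linear(d_features, 1)

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        return torch.sigmoid(self.linear(features))   # P(span contains PII)

def flag_pii(features: torch.Tensor, probe: PIIProbe, threshold: float = 0.9) -> list[int]:
    """Return token positions whose SAE features the probe flags as likely PII."""
    scores = probe(features).squeeze(-1)
    return (scores > threshold).nonzero(as_tuple=True)[0].tolist()

probe = PIIProbe()
feats = torch.relu(torch.randn(128, 65536))           # stand-in per-token SAE features
print(flag_pii(feats, probe))                          # flagged positions (random here)
```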
### Intentional Design
In February 2026, Goodfire announced a broader vision called "intentional design"—using interpretability to guide model training rather than merely analyzing models post-hoc.[^44] The approach involves decomposing what a model learns from each datapoint into semantic components, then selectively applying or filtering these learning signals. Goodfire claims this method enabled them to cut hallucinations in half using interpretability-informed training.[^43] The approach has generated significant debate in the AI safety community (see Criticisms and Concerns).
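McGrath's post describes intentional design conceptually rather than algorithmically. One toy way to picture "decomposing and selectively applying learning signals" is to project each example's gradient onto labeled concept directions and drop unwanted components before the update; the concept directions and projection rule below are assumptions for illustration, not Goodfire's published method.

```python
# Toy illustration of interpretability-informed training (not Goodfire's method):
# remove the component of a per-example gradient that lies along blocked
# "concept" directions before applying the update.
import torch

def filter_gradient(grad: torch.Tensor,
                    concept_dirs: dict[str, torch.Tensor],
                    blocked: set[str]) -> torch.Tensor:
    """Subtract the projection of `grad` onto each blocked concept direction."""
    filtered = grad.clone()
    for name, direction in concept_dirs.items():
        if name in blocked:
            d = direction / direction.norm()
            filtered = filtered - (filtered @ d) * d   # remove that concept's component
    return filtered

d_model = 1024
concepts = {                                           # stand-ins for learned concept directions
    "hallucination": torch.randn(d_model),
    "helpfulness": torch.randn(d_model),
}
grad = torch.randn(d_model)                            # per-datapoint gradient (one parameter block)
update = filter_gradient(grad, concepts, blocked={"hallucination"})
```

The "Most Forbidden Technique" concern discussed below is precisely that optimizing through signals like these may push the blocked behavior into directions the concept labels do not cover.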
## Partnerships and Impact
Goodfire has established collaborations with research institutions and industry partners:
- **Arc Institute**: Early collaboration using Ember on Evo 2, a DNA foundation model, to uncover biological concepts and accelerate scientific discovery in genomics.[^23]
- **Mayo Clinic**: Announced in September 2025, focusing on genomic medicine, reverse-engineering genomics models for insights into disease mechanisms while emphasizing data privacy and bias reduction.[^24]
- **Rakuten**: Enhancing reliability for Rakuten AI, which serves over 44 million monthly users in Japan and 2 billion customers worldwide, focusing on preventing PII leakage using frontier interpretability techniques.[^25]
- **Haize Labs**: Joint work on feature steering for AI safety auditing, red-teaming, and identifying failure modes in generative models.[^26]
- **Apollo Research**: Using Goodfire tools for safety benchmarks and research.[^27]
- **Microsoft**: Partnership announced alongside the Series B funding round in February 2026.[^43]
In November 2024, Goodfire powered the "Reprogramming AI Models" hackathon in partnership with Apart Research, with over 200 researchers across 15 countries prototyping safety applications like adversarial attack detection and "unlearning" harmful capabilities while preserving beneficial behaviors.[^28]
A notable scientific achievement came from Goodfire's partnership with Arc Institute: by reverse-engineering a biological foundation model, the team identified a novel class of Alzheimer's biomarkers—described as "the first major finding in the natural sciences obtained from reverse-engineering a foundation model."[^50]
## AI Safety and Alignment
Goodfire positions mechanistic interpretability as foundational to <EntityLink id="E439">AI alignment</EntityLink> and safety. The company's approach addresses several key challenges:
### Alignment Without Side Effects
Traditional alignment methods like reinforcement learning from human feedback (<EntityLink id="E259">RLHF</EntityLink>) can produce unintended side effects, such as excessive refusal of benign requests or sycophantic behavior.[^29] Goodfire's feature steering offers an alternative by enabling precise, quantitative alignment of specific behaviors without degrading overall model performance.
### Detecting Deception and Hidden Behaviors
One of the central challenges in AI safety is detecting deceptive or <EntityLink id="E274">scheming</EntityLink> behavior in advanced AI systems. Goodfire's model diffing and auditing tools aim to identify rare, undesired behaviors—such as a model encouraging self-harm—that might emerge during training or deployment.[^30] However, there is ongoing debate within the interpretability community about whether these techniques will scale to worst-case scenarios involving sophisticated deception.[^31]
### Governance and Compliance
Interpretability tools like Ember may become essential for regulatory compliance. The <EntityLink id="E127">EU AI Act</EntityLink> mandates transparency for high-risk AI systems, with fines of up to €35 million or 7% of global annual turnover for the most serious violations.[^32] Goodfire's auditing and documentation capabilities could help organizations meet these requirements.
### Fellowship Program
In October 2025, Goodfire announced a Fellowship Program for early- and mid-career researchers and engineers, matched with senior researchers to work on scientific discovery, interpretable models, and new interpretability methods.[^33]
## Criticisms and Concerns
Despite significant progress, Goodfire's approach faces several challenges and critiques:
### Unproven Effectiveness in Worst-Case Scenarios
There is substantial debate about whether mechanistic interpretability can reliably detect deception in advanced AI systems. Researcher <EntityLink id="E214">Neel Nanda</EntityLink> has noted that interpretability lacks "ground truth" for what concepts AI models actually use, making it difficult to validate interpretability claims.[^34] Some researchers favor alternative methods like linear probes.
A concrete example of interpretability's limitations emerged with GPT-4o's "extreme <EntityLink id="E295">sycophancy</EntityLink>" issue, which was detected behaviorally rather than through mechanistic analysis—no circuit was discovered, no particular weights or activations were identified as responsible, and mechanistic interpretability provided no advance warning.[^35]
### Competition from In-House Development
Leading AI labs like <EntityLink id="E22">Anthropic</EntityLink>, <EntityLink id="E218">OpenAI</EntityLink>, and <EntityLink id="E98">DeepMind</EntityLink> have the resources to develop interpretability tools internally. Anthropic has publicly committed to investing significantly in reliably detecting AI model problems by 2027.[^36] Additionally, open-source efforts such as EleutherAI's interpretability tooling and InterpretML provide free frameworks, creating competitive pressure on commercial offerings.
### Computational Cost and Accessibility
Goodfire's pricing reflects the heavy compute demands of interpretability workloads, and strict rate limits may constrain accessibility.[^37] The platform enforces a 50,000 token/minute global cap shared across all API methods. Advanced interpretability functions like AutoSteer and AutoConditional are limited to 30 requests/minute, while simpler utilities allow 1,000 requests/minute. This hierarchy suggests substantially higher computational costs for core interpretability features, potentially creating barriers for smaller organizations and academic researchers.
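Taken at face value, the published caps make it easy to see which constraint binds; the per-call token figure in the sketch below is an assumption for illustration.

```python
# Back-of-envelope check of which published cap binds (token count per call is assumed).
TOKEN_CAP_PER_MIN = 50_000        # global cap shared across all API methods
AUTOSTEER_REQ_PER_MIN = 30        # advanced endpoints (AutoSteer, AutoConditional)
UTILITY_REQ_PER_MIN = 1_000       # simpler utility endpoints

assumed_tokens_per_autosteer_call = 1_500
autosteer_ceiling = AUTOSTEER_REQ_PER_MIN * assumed_tokens_per_autosteer_call   # 45,000
print(f"AutoSteer throughput ceiling: {autosteer_ceiling:,} tokens/min "
      f"(global token cap: {TOKEN_CAP_PER_MIN:,})")
```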
### "The Most Forbidden Technique" Debate
In February 2026, Goodfire Chief Scientist Tom McGrath published "Intentionally Designing the Future of AI," proposing the use of interpretability tools to shape model training by decomposing gradients into semantic components and selectively applying them on a per-datapoint basis.[^44] This reignited a significant debate in the AI safety community about what Zvi Mowshowitz termed "The Most Forbidden Technique"—using interpretability techniques during training.[^45]
Critics argue that optimizing against interpretability signals during training teaches models to obfuscate their internal representations, ultimately degrading the very tools needed to detect misalignment.[^46] As Mowshowitz summarized: if you train against technique [T], "you are training the AI to obfuscate its thinking, and defeat [T]." A LessWrong post specifically questioning Goodfire's approach noted that even structurally different methods like gradient decomposition may not escape this fundamental dynamic, since "selection pressure just goes into the parts that you don't know about or don't completely understand."[^47]
Defenders, including <EntityLink id="E214">Neel Nanda</EntityLink>, argued that this research direction is both legitimate and potentially critical for safety. Nanda noted that multiple researchers including Anthropic Fellows have worked on interpretability-in-training, and that understanding its risks and benefits requires empirical research rather than blanket prohibition.[^48] He acknowledged key uncertainties: "I don't know how well it will work, how much it will break interpretability tools, or which things are more or less dangerous."
The debate took on a personal dimension when founding research scientist Liv Gorton departed Goodfire in early 2026, with AI safety advocate Holly Elmore publicly speculating that the departure was "for reasons of conscience."[^49] Gorton's departure—she had co-authored key research including the first sparse autoencoders on DeepSeek R1—highlighted the tensions between commercial applications of interpretability and its role as an independent safety tool.
### Capabilities vs. Safety Framing
Third-hand reports indicate that Goodfire leadership has pitched interpretability work as "capabilities-enhancing" (improving AI performance) rather than primarily safety-focused when fundraising.[^38] This framing raises questions about whether commercial incentives might prioritize performance improvements over safety applications—a tension common in dual-use AI research. The company's Series B announcement emphasized that interpretability-informed training had "cut hallucinations in half," framing the technology as a capabilities improvement.[^43]
## Funding and Business Model
Goodfire has raised approximately \$207 million across three rounds:[^39]
| Round | Date | Amount | Lead Investor | Key Participants | Valuation |
|-------|------|--------|---------------|------------------|-----------|
| Seed | August 2024 | \$7M | Lightspeed Venture Partners | Menlo Ventures, South Park Commons, Work-Bench, Juniper Ventures, Mythos Ventures, Bluebirds Capital | N/A |
| Series A | April 2025 | \$50M | Menlo Ventures | Anthropic, Lightspeed Venture Partners, B Capital, Work-Bench, Wing Ventures, South Park Commons, Metaplanet, Halcyon Ventures | \$200M |
| Series B | February 2026 | \$150M | B Capital | Juniper Ventures, DFJ Growth, Salesforce Ventures, Menlo Ventures, Lightspeed Venture Partners, South Park Commons, Wing Venture Capital, Eric Schmidt | \$1.25B |
The Series A round marked <EntityLink id="E22">Anthropic</EntityLink>'s first direct investment in another company, signaling significant industry validation.[^40] The Series B, closing less than a year later, valued Goodfire at \$1.25 billion—making it one of the fastest AI startups to reach unicorn status.[^43]
Goodfire operates on a usage-based pricing model, charging per million tokens processed (input + output), with pricing tiered by model size:[^41]
- Smaller models (e.g., Llama 3.1 8B): \$0.35/million tokens
- Larger models (e.g., Llama 3.3 70B): \$1.90/million tokens
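At these rates, the cost of a large interpretability pass is straightforward to estimate; the corpus sizes in the sketch below are assumptions, while the prices are the published per-token figures.

```python
# Cost estimate from the published per-token prices (corpus sizes are assumed).
PRICE_PER_MTOK = {"llama-3.1-8b": 0.35, "llama-3.3-70b": 1.90}  # USD per million tokens

def cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Input and output tokens are billed at the same per-million rate."""
    return PRICE_PER_MTOK[model] * (input_tokens + output_tokens) / 1_000_000

# Example: analyzing 100M input tokens with 10M tokens of generated output.
print(f"8B model:  ${cost_usd('llama-3.1-8b', 100_000_000, 10_000_000):,.2f}")
print(f"70B model: ${cost_usd('llama-3.3-70b', 100_000_000, 10_000_000):,.2f}")
```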
The company is positioned to capture value in a rapidly growing market. The explainable AI market was valued at approximately \$10 billion in 2025 and is projected to reach \$25 billion by 2030.[^42]
## Key Uncertainties
1. **Scalability to superintelligent systems**: Will mechanistic interpretability techniques that work on current models continue to provide safety guarantees as AI systems become more powerful and potentially deceptive?
2. **Commercial viability**: Can Goodfire compete with in-house interpretability teams at well-resourced AI labs and free open-source alternatives?
3. **Capabilities vs. safety trade-offs**: How will Goodfire navigate the tension between interpretability as a safety tool versus a capabilities enhancement that could accelerate AI development?
4. **Ground truth validation**: Without definitive ground truth about what concepts models represent, how can interpretability claims be rigorously validated?
5. **Computational economics**: Can the high computational costs of mechanistic interpretability be reduced sufficiently to enable widespread adoption?
6. **Training on interpretability**: Will using interpretability tools during model training ultimately compromise those tools' ability to serve as independent safety checks? This remains the central open question around Goodfire's "intentional design" approach.
## Sources
[^1]: [Goodfire company overview](https://www.cbinsights.com/company/goodfire-ai)
[^2]: [Goodfire company website](https://www.goodfire.ai/company)
[^3]: [Contrary Research: Goodfire](https://research.contrary.com/company/goodfire)
[^4]: [Contrary Research: Goodfire](https://research.contrary.com/company/goodfire)
[^5]: [Contrary Research: Goodfire](https://research.contrary.com/company/goodfire)
[^6]: [PRNewswire: Goodfire Raises \$50M Series A](https://www.prnewswire.com/news-releases/goodfire-raises-50m-series-a-to-advance-ai-interpretability-research-302431030.html)
[^7]: [Contrary Research: Goodfire](https://research.contrary.com/company/goodfire)
[^8]: [Contrary Research: Goodfire](https://research.contrary.com/company/goodfire)
[^9]: [Contrary Research: Goodfire](https://research.contrary.com/company/goodfire)
[^10]: [Contrary Research: Goodfire](https://research.contrary.com/company/goodfire)
[^11]: [Contrary Research: Goodfire](https://research.contrary.com/company/goodfire)
[^12]: [Contrary Research: Goodfire](https://research.contrary.com/company/goodfire)
[^13]: [Menlo Ventures: Leading Goodfire's \$50M Series A](https://menlovc.com/perspective/leading-goodfires-50m-series-a-to-interpret-how-ai-models-think/)
[^14]: [Contrary Research: Goodfire](https://research.contrary.com/company/goodfire)
[^15]: [Super B Crew: Goodfire Raises \$50M](https://www.superbcrew.com/goodfire-raises-50m-to-make-ai-models-transparent-steerable-and-safer-to-use/)
[^16]: [EA Forum: Goodfire — The Startup Trying to Decode How AI Thinks](https://forum.effectivealtruism.org/posts/2k8jdysns2HF3FeKC/goodfire-the-startup-trying-to-decode-how-ai-thinks)
[^17]: [EA Forum: Goodfire — The Startup Trying to Decode How AI Thinks](https://forum.effectivealtruism.org/posts/2k8jdysns2HF3FeKC/goodfire-the-startup-trying-to-decode-how-ai-thinks)
[^18]: [Contrary Research: Goodfire](https://research.contrary.com/company/goodfire)
[^19]: [Alignment Forum: Mind the Coherence Gap](https://www.alignmentforum.org/posts/6dpKhtniqR3rnstnL/mind-the-coherence-gap-lessons-from-steering-llama-with-1)
[^20]: [Goodfire blog: Our Approach to Safety](https://www.goodfire.ai/blog/our-approach-to-safety)
[^21]: [Goodfire customer story: Rakuten](https://www.goodfire.ai/customer-stories/rakuten)
[^22]: [Goodfire blog: Our Approach to Safety](https://www.goodfire.ai/blog/our-approach-to-safety)
[^23]: [PRNewswire: Goodfire Raises \$50M Series A](https://www.prnewswire.com/news-releases/goodfire-raises-50m-series-a-to-advance-ai-interpretability-research-302431030.html)
[^24]: [Goodfire blog: Mayo Clinic collaboration](https://www.goodfire.ai/blog/mayo-clinic-collaboration)
[^25]: [Goodfire customer story: Rakuten](https://www.goodfire.ai/customer-stories/rakuten)
[^26]: [Goodfire blog: Our Approach to Safety](https://www.goodfire.ai/blog/our-approach-to-safety)
[^27]: [Goodfire blog: Announcing Goodfire Ember](https://www.goodfire.ai/blog/announcing-goodfire-ember)
[^28]: [Contrary Research: Goodfire](https://research.contrary.com/company/goodfire)
[^29]: [EA Forum: Goodfire — The Startup Trying to Decode How AI Thinks](https://forum.effectivealtruism.org/posts/2k8jdysns2HF3FeKC/goodfire-the-startup-trying-to-decode-how-ai-thinks)
[^30]: [Goodfire research: Model Diff Amplification](https://www.goodfire.ai/research/model-diff-amplification)
[^31]: [Contrary Research: Goodfire](https://research.contrary.com/company/goodfire)
[^32]: [AE Studio: AI Alignment](https://ae.studio/alignment/)
[^33]: [Goodfire blog: Fellowship Fall 25](https://www.goodfire.ai/blog/fellowship-fall-25)
[^34]: [Contrary Research: Goodfire](https://research.contrary.com/company/goodfire)
[^35]: [Stanford CGPotts blog: Interpretability](https://web.stanford.edu/~cgpotts/blog/interp/)
[^36]: [Contrary Research: Goodfire](https://research.contrary.com/company/goodfire)
[^37]: [Contrary Research: Goodfire](https://research.contrary.com/company/goodfire)
[^38]: [EA Forum: Goodfire — The Startup Trying to Decode How AI Thinks](https://forum.effectivealtruism.org/posts/2k8jdysns2HF3FeKC/goodfire-the-startup-trying-to-decode-how-ai-thinks)
[^39]: [Contrary Research: Goodfire](https://research.contrary.com/company/goodfire)
[^40]: [Contrary Research: Goodfire](https://research.contrary.com/company/goodfire)
[^41]: [Contrary Research: Goodfire](https://research.contrary.com/company/goodfire)
[^42]: [Contrary Research: Goodfire](https://research.contrary.com/company/goodfire)
[^43]: [Goodfire blog: Understanding, Learning From, and Designing AI: Our Series B](https://www.goodfire.ai/blog/our-series-b)
[^44]: [Goodfire blog: Intentionally Designing the Future of AI](https://www.goodfire.ai/blog/intentional-design)
[^45]: [Zvi Mowshowitz: The Most Forbidden Technique](https://thezvi.substack.com/p/the-most-forbidden-technique)
[^46]: [LessWrong: The Most Forbidden Technique](https://www.lesswrong.com/posts/mpmsK8KKysgSKDm2T/the-most-forbidden-technique)
[^47]: [LessWrong: Goodfire and Training on Interpretability](https://www.lesswrong.com/posts/B3DQvjCD6gp2JEKaY/goodfire-and-training-on-interpretability)
[^48]: [Alignment Forum: It Is Reasonable To Research How To Use Model Internals In Training](https://www.alignmentforum.org/posts/G9HdpyREaCbFJjKu5/it-is-reasonable-to-research-how-to-use-model-internals-in)
[^49]: [Holly Elmore on X: Liv Gorton departure from Goodfire](https://x.com/ilex_ulmus/status/2016778354352136212)
[^50]: [Silicon Valley Daily: AI Research Lab Goodfire Scores \$125 Million](https://svdaily.com/2026/02/05/ai-research-lab-goodfire-scores-125-million/)