Back
Bloom: Automated Behavioral Evaluations
webCredibility Rating
4/5
High(4)High quality. Established institution or organization with editorial oversight and accountability.
Rating inherited from publication venue: Anthropic Alignment
Published by Anthropic's alignment team, Bloom describes infrastructure for automated behavioral evaluations relevant to researchers building scalable safety testing pipelines or studying how frontier labs operationalize safety assessments.
Metadata
Importance: 62/100blog postprimary source
Summary
Bloom is Anthropic's system for automated behavioral evaluations of AI models, designed to scalably assess safety-relevant behaviors without requiring human red-teamers for every evaluation. It enables systematic testing of model behaviors across a wide range of scenarios, supporting both capability assessment and safety evaluation at scale.
Key Points
- •Introduces automated pipelines for evaluating AI behavioral safety properties, reducing reliance on manual red-teaming efforts
- •Enables scalable, reproducible assessments of safety-relevant behaviors across diverse scenarios and model versions
- •Supports Anthropic's model evaluation infrastructure for tracking behavioral changes during training and deployment
- •Integrates with broader alignment research by providing systematic benchmarks for measuring policy compliance and harm avoidance
- •Represents a step toward automating parts of the AI safety evaluation process that traditionally required human judgment
Cited by 4 pages
| Page | Type | Quality |
|---|---|---|
| Epistemic Virtue Evals | Approach | 45.0 |
| Evals-Based Deployment Gates | Approach | 66.0 |
| AI Evaluations | Research Area | 72.0 |
| Scalable Eval Approaches | Approach | 65.0 |
Cached Content Preview
HTTP 200Fetched Mar 20, 202669 KB
[Alignment Science Blog](https://alignment.anthropic.com/)
# Bloom: an open source tool for automated behavioral evaluations
Isha Gupta
December 19, 2025
Kai Fronsdal, Abhay Sheshadri, Jonathan Michala, Jacqueline Tay
Rowan Wang, Samuel R. Bowman, Sara Price
tl;dr
We are releasing Bloom, an agentic framework for developing behavioral
evaluations. Bloom's evaluations are reproducible and targeted: unlike open-ended auditing, Bloom
takes a researcher-specified behavior and quantifies its frequency and severity across automatically
generated scenarios. Bloom's evaluations correlate strongly with our hand-labelled judgments and
reliably separate baseline models from intentionally misaligned ones. As examples, we also release
benchmark results for four alignment relevant behaviors on 16 models. Bloom is available at
[github.com/safety-research/bloom](https://github.com/safety-research/bloom).
* * *
## Introduction
Frontier models exhibit various types of misalignment, for example in-context
scheming (Meinke et al, 2024), agentic misalignment (Lynch et al, 2025), and sycophancy (Sharma et al,
2023). Although researchers are developing mitigations for known issues ([Sonnet\\
4.5 System Card](https://assets.anthropic.com/m/12f214efcc2f457a/original/Claude-Sonnet-4-5-System-Card.pdf), 2025), new forms of misalignment will likely
emerge as models gain capabilities and are deployed in more complex environments. High-quality
evaluations remain essential for assessing these behaviors, but they require large amounts of researcher
time and are limited in quantity (see [Table 1](https://alignment.anthropic.com/2025/bloom-auto-evals/#h.wzevj3maq9zz)). These bespoke
evaluations also risk losing validity through training set contamination or rapidly evolving
capabilities (Kwa et al 2025).
Advancing model capabilities now make it possible to automate evaluation
development. Bloom is an agentic framework for generating
targeted evaluations for researcher-specified behavioral traits. We built Bloom to be accessible and highly configurable, serving as a reliable
evaluation generation framework for diverse research applications. With Bloom, researchers can skip the
evaluation pipeline engineering and go straight to measuring the propensities they are interested in
with a trusted, effective scaffold.
We recently released [Petri](https://alignment.anthropic.com/2025/petri/),
an automated auditor that explores the overall behavioral profiles of different models and surfaces new
misaligned behaviors. Bloom serves a complementary but separate purpose: generating in-depth evaluation
suites for specific behaviors and quantifying their severity and frequency across automatically
generated scenarios. Alongside the tool, we are releasing benchmarks for four behaviors—delusional
sycophancy, instructed long-horizon sabotage, self-preservation and self-preferential bias—across 16
frontier models. These only took a few days to conceptualize, refine and generate with Bloom
... (truncated, 69 KB total)Resource ID:
7fa7d4cb797a5edd | Stable ID: NThmNTZhNz