Skip to content
Longterm Wiki
Back

Petri: An Open-Source Auditing Tool to Accelerate AI Safety Research

web

Credibility Rating

4/5
High(4)

High quality. Established institution or organization with editorial oversight and accountability.

Rating inherited from publication venue: Anthropic

Released October 2025 by Anthropic, Petri is a practical open-source tool that lowers the barrier to systematic AI behavioral auditing, relevant to researchers building safety evaluations or studying model behavior at scale.

Metadata

Importance: 72/100tool pagetool

Summary

Petri (Parallel Exploration Tool for Risky Interactions) is Anthropic's open-source automated auditing framework that deploys AI agents to test target models through diverse multi-turn conversations, then scores and summarizes behaviors. It addresses the scaling problem of manual model auditing by automating hypothesis testing across behaviors like deception, sycophancy, and self-preservation. The tool was used in Claude 4 system cards and by the UK AI Security Institute for evaluations.

Key Points

  • Automates multi-turn conversation testing of AI models using an auditor agent that plans, interacts, and scores target model behavior across safety-relevant dimensions.
  • Researchers provide natural language 'seed instructions' describing scenarios to investigate; Petri runs them in parallel and surfaces the most concerning transcripts for human review.
  • Piloted across 14 frontier models using 111 diverse seed instructions covering deception, sycophancy, situational awareness, whistleblowing, and self-preservation behaviors.
  • Used in Claude 4 and Claude Sonnet 4.5 system cards, cross-lab comparisons with OpenAI, and adopted by the UK AI Security Institute for pre-release testing.
  • Open-source release enables external researchers and auditors to build both one-off exploratory evaluations and systematic benchmarks with minimal manual effort.

Cited by 1 page

PageTypeQuality
AI Accident Risk CruxesCrux67.0

Cached Content Preview

HTTP 200Fetched Mar 20, 202611 KB
Alignment

# Petri: An open-source auditing tool to accelerate AI safety research

Oct 6, 2025

[Read the technical report](https://alignment.anthropic.com/2025/petri)

Petri (Parallel Exploration Tool for Risky Interactions) is our new open-source tool that enables researchers to explore hypotheses about model behavior with ease. Petri deploys an automated agent to test a target AI system through diverse multi-turn conversations involving simulated users and tools; Petri then scores and summarizes the target’s behavior.

This automation handles a significant part of the work that one needs to do to build a broad understanding of a new model, and makes it possible to test many individual hypotheses about how a model might behave in some new circumstance with only minutes of hands-on effort.

As AI becomes more capable and is deployed across more domains and with wide-ranging affordances, we need to evaluate a broader range of behaviors. This makes it increasingly difficult for humans to properly audit each model—the sheer volume and complexity of potential behaviors far exceeds what researchers can manually test.

We’ve found it valuable to turn to automated auditing agents to help address this challenge. We used them in the [Claude 4](https://www-cdn.anthropic.com/4263b940cabb546aa0e3283f35b686f4f3b2ff47.pdf) and [Claude Sonnet 4.5](https://www.anthropic.com/claude-sonnet-4-5-system-card) System Cards to better understand behaviors such as situational awareness, whistleblowing, and self-preservation, and adapted them for head-to-head comparisons between heterogeneous models as part of a [recent exercise with OpenAI](https://alignment.anthropic.com/2025/openai-findings/). Our recent research release on [alignment-auditing agents](https://alignment.anthropic.com/2025/automated-auditing/) found these methods can reliably flag concerning behaviors in many settings. The [UK AI Security Institute](https://www.aisi.gov.uk/) also used a pre-release version of Petri to build evaluations that they used in their testing of Sonnet 4.5.

![](https://www.anthropic.com/_next/image?url=https%3A%2F%2Fwww-cdn.anthropic.com%2Fimages%2F4zrzovbb%2Fwebsite%2F85e8ee11c9ff6dea9023c57d53fe7b06a2bb9f8e-2292x2292.jpg&w=3840&q=75)Manually building alignment evaluations often involves constructing environments, running models, reading transcripts, and aggregating the results. Petri automates much of this process.

![](https://www.anthropic.com/_next/image?url=https%3A%2F%2Fwww-cdn.anthropic.com%2Fimages%2F4zrzovbb%2Fwebsite%2Faf0cd2e2b08bb4687e1924b6094dc2c4159486cf-2293x1288.jpg&w=3840&q=75)Researchers give Petri a list of seed instructions targeting scenarios and behaviors they want to test. Petri then operates on each seed instruction in parallel. For each seed instruction, an auditor agent makes a plan and interacts with the target model in a tool use loop. At the end, a judge scores each of the resulting transcripts across multiple dimensions so researchers can quickly

... (truncated, 11 KB total)
Resource ID: 62c583fb4c6af13a | Stable ID: NDM0MDNjZD