Back
AI Magazine: OpenAI vs Anthropic Safety Test Results
webaimagazine.com·aimagazine.com/news/openai-vs-anthropic-the-results-of-th...
A trade press article comparing safety testing outcomes for OpenAI and Anthropic models; useful for tracking industry narratives but lacks the methodological depth of academic safety evaluations.
Metadata
Importance: 35/100news articlenews
Summary
A comparative analysis of AI safety performance between OpenAI and Anthropic's models, examining how each company's systems perform on safety-related tests and benchmarks. The article highlights differences in safety approaches and outcomes between the two leading AI labs.
Key Points
- •Compares safety test results between OpenAI and Anthropic AI systems
- •Highlights differences in how each company approaches and implements AI safety measures
- •Provides benchmark-style evaluation of model behavior on safety-relevant tasks
- •Reflects ongoing industry efforts to establish comparative safety standards across frontier AI labs
Cited by 1 page
| Page | Type | Quality |
|---|---|---|
| Anthropic | Organization | 74.0 |
Cached Content Preview
HTTP 200Fetched Feb 25, 202611 KB
Article
AI Applications
# OpenAI vs Anthropic: The Results of the AI Safety Test
By [Kitty Wheeler](https://aimagazine.com/author/kitty-wheeler)
September 01, 2025
6 mins
Share
Share

Led by Sam Altman and Dario Amodei, OpenAI and Anthropic publish the results of its first joint safety evaluation
OpenAI and Anthropic publish safety evaluation results for each ones leading AI systems, finding strengths and weaknesses across Claude 4 and GPT models
OpenAI and Anthropic publish the results of its first joint safety evaluation, where each company tests the other’s models using their own internal [safety protocols](https://aimagazine.com/articles/trump-scraps-ai-risk-rules-what-you-need-to-know).
OpenAI evaluates Anthropic’s Claude Opus 4 and Claude Sonnet 4 models, while Anthropic tests OpenAI’s GPT-4o, GPT-4.1, o3 and o4-mini systems.
To allow completion of the tests, both companies temporarily relaxed certain external safeguards, following standard industry practice for dangerous capability evaluations.
The exercise focuses on four critical areas: instruction hierarchy (how models prioritise different types of instructions), jailbreaking resistance, hallucination prevention and scheming behaviour.
“The goal of this external evaluation is to help surface gaps that might otherwise be missed, deepen our understanding of potential misalignment and demonstrate how labs can collaborate on issues of safety and alignment,” [OpenAI](https://aimagazine.com/news/openai-study-mode-ai-tutoring-for-better-student-learning) researchers say.
The findings reveal big differences in how the two companies’ models handle uncertainty and safety trade-offs, with implications for how AI systems might behave in real-world deployments.
## Why Claude dominates instruction following but struggles with jailbreaks
Claude 4 models show superior performance in maintaining instruction hierarchy – the system that ensures AI models prioritise [safety constraints over user requests](https://aimagazine.com/news/the-story-behind-elon-musks-xai-grok-4-ethical-concerns).

Anthropic specialises in safe, aligned large language models (LLMs) like Claude, focused on constitutional AI and ethical chatbot design \| Credit: Anthropic
In tests designed to extract secret passwords embedded in system prompts, both Opus 4 and Sonnet 4 achieve perfect scores, matching OpenAI’s flagship o3 model.
The Claude systems prove particularly adept at handling conflicts between system-level safety directives and user requests.
In multi-turn conversations where simulated users attempt to cajole the models into violating their instructions, Claude consistently refuses to comply.
However, the picture becomes more complex with jailbr
... (truncated, 11 KB total)Resource ID:
99038cb6447dc5e7 | Stable ID: YzNjMTRkY2