Introducing Claude Sonnet 4.5 \ Anthropic
web
Credibility Rating
High quality. Established institution or organization with editorial oversight and accountability.
Rating inherited from publication venue: Anthropic
Anthropic's announcement of Claude Sonnet 4.5 is relevant to AI safety as it highlights alignment improvements in a frontier model, introduces agentic capabilities (Claude Agent SDK, computer use), and demonstrates the rapid capability progression of state-of-the-art AI systems.
Metadata
Summary
Anthropic announces Claude Sonnet 4.5, claiming it is the best coding and agentic model available, with leading performance on SWE-bench and OSWorld benchmarks. The release includes new developer tools such as the Claude Agent SDK, checkpoints in Claude Code, and a VS Code extension. Anthropic also highlights this as their most aligned frontier model to date, with substantial improvements across alignment metrics.
Key Points
- Claude Sonnet 4.5 achieves state-of-the-art results on SWE-bench Verified (coding) and leads OSWorld (computer use) at 61.4%; four months earlier, Claude Sonnet 4 held the OSWorld lead at 42.2%.
- Anthropic describes it as their most aligned frontier model ever, with large improvements across several alignment dimensions compared to previous Claude models.
- New agentic infrastructure released: Claude Agent SDK, context editing, memory tools, and checkpoints in Claude Code for complex long-horizon tasks.
- Anthropic reports observing the model maintain focus on complex multi-step tasks for more than 30 hours, reflecting significant advances in agentic capability and reliability.
- Pricing is unchanged from Claude Sonnet 4 at $3/$15 per million tokens; the model is available immediately via the API as claude-sonnet-4-5.
Cited by 1 page
| Page | Type | Quality |
|---|---|---|
| Heavy Scaffolding / Agentic Systems | Concept | 57.0 |
Cached Content Preview
Announcements
Introducing Claude Sonnet 4.5
Sep 29, 2025
Claude Sonnet 4.5 is the best coding model in the world. It's the strongest model for building complex agents. It’s the best model at using computers. And it shows substantial gains in reasoning and math.
Code is everywhere. It runs every application, spreadsheet, and software tool you use. Being able to use those tools and reason through hard problems is how modern work gets done.
Claude Sonnet 4.5 makes this possible. We're releasing it along with a set of major upgrades to our products. In Claude Code, we've added checkpoints—one of our most requested features—that save your progress and allow you to roll back instantly to a previous state. We've refreshed the terminal interface and shipped a native VS Code extension. We've added a new context editing feature and memory tool to the Claude API that lets agents run even longer and handle even greater complexity. In the Claude apps, we've brought code execution and file creation (spreadsheets, slides, and documents) directly into the conversation. And we've made the Claude for Chrome extension available to Max users who joined the waitlist last month.
We're also giving developers the building blocks we use ourselves to make Claude Code. We're calling this the Claude Agent SDK. The infrastructure that powers our frontier products—and allows them to reach their full potential—is now yours to build with.
This is the most aligned frontier model we’ve ever released, showing large improvements across several areas of alignment compared to previous Claude models.
Claude Sonnet 4.5 is available everywhere today. If you’re a developer, simply use claude-sonnet-4-5 via the Claude API. Pricing remains the same as Claude Sonnet 4, at $3/$15 per million tokens.
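As a rough illustration of what the quoted pricing means for a single request, the sketch below estimates a dollar cost from token counts. It assumes that "$3/$15" refers to the input and output rates per million tokens respectively, which the announcement does not spell out; the helper name and the token counts are hypothetical.

```python
# Assumption: "$3/$15 per million tokens" means $3 per million input tokens
# and $15 per million output tokens; the announcement only gives "$3/$15".
MODEL_ID = "claude-sonnet-4-5"  # model name given in the announcement
INPUT_USD_PER_MILLION = 3
OUTPUT_USD_PER_MILLION = 15


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the dollar cost of one request (hypothetical helper)."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    # tokens * (USD per million tokens) gives a total in millionths of a dollar.
    total_micro_usd = (
        input_tokens * INPUT_USD_PER_MILLION
        + output_tokens * OUTPUT_USD_PER_MILLION
    )
    return total_micro_usd / 1_000_000


if __name__ == "__main__":
    # Illustrative token counts, not measurements.
    cost = estimate_cost_usd(input_tokens=200_000, output_tokens=50_000)
    print(f"{MODEL_ID}: ${cost:.2f}")
```

Under these assumptions, output tokens dominate the bill for generation-heavy workloads, since each one costs five times as much as an input token.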
Frontier intelligence
Claude Sonnet 4.5 is state-of-the-art on the SWE-bench Verified evaluation, which measures real-world software coding abilities. Practically speaking, we’ve observed it maintaining focus for more than 30 hours on complex, multi-step tasks.
Claude Sonnet 4.5 represents a significant leap forward on computer use. On OSWorld, a benchmark that tests AI models on real-world computer tasks, Sonnet 4.5 now leads at 61.4%. Just four months ago, Sonnet 4 held the lead at 42.2%. Our Claude for Chrome extension puts these upgraded capabilities to use. In the demo below, we show Claude working directly in a browser, navigating sites, filling spreadsheets, and completing tasks.
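To put the two OSWorld figures side by side, the short sketch below computes both the percentage-point gain and the relative gain between the 42.2% and 61.4% scores quoted above. The function name is hypothetical, and the calculation says nothing about how the benchmark itself is scored.

```python
def score_gain(old_score: float, new_score: float) -> tuple[float, float]:
    """Return (percentage-point gain, relative gain in percent)."""
    if old_score <= 0:
        raise ValueError("old_score must be positive")
    points = new_score - old_score
    relative = points / old_score * 100
    return points, relative


if __name__ == "__main__":
    # OSWorld scores quoted in the announcement: Sonnet 4 vs. Sonnet 4.5.
    points, relative = score_gain(42.2, 61.4)
    print(f"+{points:.1f} points, {relative:.1f}% relative")
    # prints "+19.2 points, 45.5% relative"
```

The distinction matters when reading benchmark claims: a 19.2-point gain on a 42.2% baseline is a much larger relative change than the same gain would be on a baseline near 80%.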
The model also shows improved capabilities on a broad range of evaluations including reasoning and math:
Claude Sonnet 4.5 is our most powerful model to date. See footnotes for methodology.
Experts in finance, law, medicine, and STEM found Sonnet 4.5 shows dramatically better domain-specific knowledge and reasoning compared to older models, including Opus 4.1.
The model’s capabilities are
... (truncated, 13 KB total)
18816edf464a073a | Stable ID: sid_cTXyYfPFGw