Skip to content
Longterm Wiki

Claude Opus 4.1 Release Announcement

web

Credibility Rating

4/5
High(4)

High quality. Established institution or organization with editorial oversight and accountability.

Rating inherited from publication venue: Anthropic

Anthropic's announcement of Claude Opus 4.1, an incremental upgrade focused on agentic coding and reasoning tasks, relevant to AI safety as it documents capability advances and deployment practices of a frontier AI lab.

Metadata

Importance: 42/100press releasenews

Summary

Anthropic releases Claude Opus 4.1, an upgrade to Claude Opus 4 with improved performance on agentic tasks, real-world coding (74.5% on SWE-bench Verified), and reasoning. The model is available via API, Amazon Bedrock, and Google Cloud Vertex AI at the same pricing as Opus 4. Notable improvements include multi-file code refactoring and precision debugging in large codebases.

Key Points

  • Claude Opus 4.1 achieves 74.5% on SWE-bench Verified, advancing state-of-the-art coding performance.
  • Improvements focus on agentic tasks, in-depth research, data analysis, detail tracking, and agentic search.
  • Available to paid Claude users, Claude Code, and via API on Anthropic, Amazon Bedrock, and Google Vertex AI.
  • Third-party evaluators (GitHub, Rakuten, Windsurf) report notable gains in multi-file refactoring and debugging precision.
  • Anthropic notes plans for substantially larger model improvements in the coming weeks.

Cached Content Preview

HTTP 200Fetched Apr 24, 20264 KB
Announcements Claude Opus 4.1

 Aug 5, 2025 Today we're releasing Claude Opus 4.1, an upgrade to Claude Opus 4 on agentic tasks, real-world coding, and reasoning. We plan to release substantially larger improvements to our models in the coming weeks.

 

 Opus 4.1 is now available to paid Claude users and in Claude Code. It's also on our API, Amazon Bedrock, and Google Cloud's Vertex AI. Pricing is the same as Opus 4.

 Claude Opus 4.1

 Opus 4.1 advances our state-of-the-art coding performance to 74.5% on SWE-bench Verified . It also improves Claude’s in-depth research and data analysis skills, especially around detail tracking and agentic search.

 

 GitHub notes that Claude Opus 4.1 improves across most capabilities relative to Opus 4, with particularly notable performance gains in multi-file code refactoring. Rakuten Group finds that Opus 4.1 excels at pinpointing exact corrections within large codebases without making unnecessary adjustments or introducing bugs, with their team preferring this precision for everyday debugging tasks. Windsurf reports Opus 4.1 delivers a one standard deviation improvement over Opus 4 on their junior developer benchmark, showing roughly the same performance leap as the jump from Sonnet 3.7 to Sonnet 4.

 Getting started

 We recommend upgrading from Opus 4 to Opus 4.1 for all uses. If you’re a developer, simply use claude-opus-4-1-20250805 via the API. You can also explore our system card , model page , pricing page , and docs to learn more.

 

 As always, your feedback helps us improve, especially as we continue to release new and more capable models.

 Appendix

 Data sources 

 OpenAI: o3 launch post , o3 system card 
 Gemini: 2.5 Pro model card 
 Claude: Sonnet 3.7 launch post , Claude 4 launch post 
 

 Benchmark reporting 

 Claude models are hybrid reasoning models. The benchmarks reported in this blog post show the highest scores achieved with or without extended thinking. We’ve noted below for each result whether extended thinking was used:

 No extended thinking: SWE-bench Verified, Terminal-Bench
 The following benchmarks were reported with extended thinking (up to 64K tokens): TAU-bench, GPQA Diamond, MMMLU, MMMU, AIME
 

 TAU-bench methodology 

 Scores were achieved with a prompt addendum to both the Airline and Retail Agent Policy instructing Claude to better leverage its reasoning abilities while using extended thinking with tool use. The model is encouraged to write down its thoughts as it solves the problem distinct from our usual thinking mode, during the multi-turn trajectories to best leverage its reasoning abilities. To accommodate the additional steps Claude incurs by utilizing more thinking, the maximum number of steps (counted by model completions) was increased from 30 to 100 (most trajectories completed under 30 steps with only one trajectory reaching above 50 steps).

 

 SWE-bench methodology 

 For the Claude 4 family of models, we continue to use the same simple s

... (truncated, 4 KB total)
Resource ID: 68ffcf2c57450e50 | Stable ID: sid_zyp7mVYtTw