Longterm Wiki

Anthropic - Footnote 17

partial · 85% confidence

1 evidence check

Last checked: 4/3/2026

The source does not mention the specific percentage achieved on SWE-bench Verified (80.9%) or OSWorld (61.4%). The source does not state that Claude Opus 4.5 is the first AI model to exceed 80% on SWE-bench Verified or 60% on Terminal-Bench 2.0. The source does not provide the next-best model's score on OSWorld (7.8%). The source only mentions Terminal Bench, not Terminal-Bench 2.0.

Evidence — 1 source, 1 check

partial · 85% · Haiku 4.5 · 4/3/2026
Found: Claude Opus 4.5, released in November 2025, achieved results on benchmarks for complex enterprise tasks: 80.9% on SWE-bench Verified (the first AI model to exceed 80%), 60%+ on Terminal-Bench 2.0 (the

Note: The source does not mention the specific percentage achieved on SWE-bench Verified (80.9%) or OSWorld (61.4%). The source does not state that Claude Opus 4.5 is the first AI model to exceed 80% on SWE-bench Verified or 60% on Terminal-Bench 2.0. The source does not provide the next-best model's score on OSWorld (7.8%). The source only mentions Terminal Bench, not Terminal-Bench 2.0.

Debug info

Record type: citation

Record ID: page:anthropic:fn17