Terminal-Bench 2
Agentic
Second version of the Terminal-Bench benchmark, with expanded task coverage and increased difficulty.
Models Tested: 1
Best Score: 65.4%
Median Score: 65.4%
Scoring: percentage
Introduced: 2025-06
Maintainer: Terminal-Bench
Leaderboard (1 model)
| # | Model | Developer | Score |
|---|---|---|---|
| 🥇 | Claude Opus 4.6 | Anthropic | 65.4% |