Models Tested: 3
Best Score: 72.7%
Median Score: 61.4%
Scoring: percentage
Introduced: 2024-04
Maintainer: CMU / HKU
Leaderboard (3 models)
| # | Model | Developer | Score |
|---|---|---|---|
| 🥇 | Claude Opus 4.6 | Anthropic | 72.7% |
| 🥈 | Claude Sonnet 4.5 | Anthropic | 61.4% |
| 🥉 | Claude Haiku 4.5 | Anthropic | 50.7% |