LiveBench
General
A monthly-refreshed benchmark with objective ground-truth answers across math, coding, reasoning, language, instruction following, and data analysis. Designed to resist contamination.
Models Tested: 2
Best Score: 61.2%
Median Score: 58.1%
Scoring: percentage
Introduced: 2024-06
Maintainer: LiveBench Team
Leaderboard (2 models)
| # | Model | Developer | Score |
|---|---|---|---|
| 🥇 | Claude 3.5 Sonnet | Anthropic | 61.2% |
| 🥈 | GPT-4o | OpenAI | 55.0% |
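
The summary figures (models tested, best score, median score) appear to follow directly from the leaderboard rows. The sketch below recomputes them in Python. It assumes the median is the ordinary statistical median, rounded to one decimal place. The page does not state how it is computed, and the `scores` dictionary is an illustrative structure, not part of any LiveBench tooling.

```python
from statistics import median

# Scores copied from the leaderboard table (percent).
# The dict layout is hypothetical, chosen only for this illustration.
scores = {
    "Claude 3.5 Sonnet": 61.2,
    "GPT-4o": 55.0,
}

models_tested = len(scores)
best_score = max(scores.values())

# With an even number of entries, statistics.median averages the two
# middle values; rounding to one decimal is an assumption.
median_score = round(median(scores.values()), 1)

print(f"Models tested: {models_tested}")
print(f"Best score: {best_score}%")
print(f"Median score: {median_score}%")
```

With only two entries, the median is simply the mean of the two scores, so it will shift noticeably as more models are added.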