Longterm Wiki

LiveBench

General

A benchmark refreshed monthly, with questions scored against objective ground-truth answers across math, coding, reasoning, language, instruction following, and data analysis. It is designed to resist contamination.

Models Tested: 2
Best Score: 61.2%
Median Score: 58.1%
Scoring: percentage
Introduced: 2024-06
Maintainer: LiveBench Team

Leaderboard (2 models)

#   Model              Developer  Score
🥇  Claude 3.5 Sonnet  Anthropic  61.2%
🥈  GPT-4o             OpenAI     55%
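
The summary figures can be checked against the leaderboard entries: with only two models listed, the median is simply the mean of the two scores. The following is a minimal sketch, assuming the scores are exactly as shown; the `scores` dictionary and variable names are illustrative and not part of any LiveBench tooling.

```python
from statistics import median

# Scores copied from the leaderboard above (percent).
# The dictionary layout is an assumption made for illustration.
scores = {"Claude 3.5 Sonnet": 61.2, "GPT-4o": 55.0}

# Best score: the highest-scoring model and its value.
best_model = max(scores, key=scores.get)
best_score = scores[best_model]

# Median score: with an even number of entries, statistics.median
# averages the two middle values, here (55.0 + 61.2) / 2.
median_score = round(median(scores.values()), 1)

print(f"Models tested: {len(scores)}")
print(f"Best score: {best_score}% ({best_model})")
print(f"Median score: {median_score}%")
```

With more than two entries the same code still applies, though the median would then no longer equal the mean of the best and worst scores.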