LiveBench

General

A benchmark refreshed monthly, with objective ground-truth answers across math, coding, reasoning, language, instruction following, and data analysis. It is designed to resist test-set contamination.

Models Tested

2

Best Score

61.2%

Median Score

58.1%

Scoring: percentage

Introduced: 2024-06

Maintainer: LiveBench Team

Leaderboard (2 models)

#	Model	Developer	Score
🥇	Claude 3.5 Sonnet	Anthropic	61.2%
🥈	GPT-4o	OpenAI	55.0%
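
The Best Score and Median Score figures can be checked against the leaderboard rows with a few lines of Python. This is a minimal sketch: the `scores` dictionary is simply hand-copied from the table, and the variable names are illustrative rather than part of any LiveBench tooling.

```python
from statistics import median

# Scores (in percent) copied by hand from the leaderboard table.
scores = {
    "Claude 3.5 Sonnet": 61.2,
    "GPT-4o": 55.0,
}

# Best score: the highest value among the listed models.
best_model = max(scores, key=scores.get)
best_score = scores[best_model]

# Median score: with two entries this is the mean of the two values.
median_score = round(median(scores.values()), 1)

print(f"Best: {best_model} at {best_score}%")
print(f"Median: {median_score}%")
```

With only two models listed, the median is the midpoint of the two scores, which matches the 58.1% shown in the summary.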