SimpleQA

Knowledge

A factual question-answering benchmark from OpenAI made up of short, fact-seeking questions, each with a single verifiable answer. It evaluates factual accuracy and calibration.

Models Tested

4

Best Score

52.9%

Median Score

44.55%

Scoring: accuracy

Introduced: 2024-10

Maintainer: OpenAI
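SimpleQA grades each response as correct, incorrect, or not attempted. The sketch below shows one way to turn those grades into scores. It assumes that "accuracy" on this page means the fraction of all questions answered correctly, and that grades are already available as strings. The label values and the `simpleqa_metrics` helper are hypothetical illustrations, not OpenAI's evaluation code. The sketch also reports accuracy among attempted questions, which separates abstaining from answering wrongly.

```python
from collections import Counter

# Hypothetical grade labels; this sketch assumes every response has
# already been graded into one of these three buckets.
CORRECT, INCORRECT, NOT_ATTEMPTED = "correct", "incorrect", "not_attempted"


def simpleqa_metrics(grades: list[str]) -> dict[str, float]:
    """Hypothetical helper: accuracy over all questions, plus accuracy
    over only the questions the model chose to answer."""
    counts = Counter(grades)
    total = len(grades)
    attempted = counts[CORRECT] + counts[INCORRECT]
    accuracy = counts[CORRECT] / total if total else 0.0
    correct_given_attempted = counts[CORRECT] / attempted if attempted else 0.0
    return {
        "accuracy": accuracy,
        "correct_given_attempted": correct_given_attempted,
    }


# Toy example: two correct, one wrong, one abstention.
grades = [CORRECT, CORRECT, INCORRECT, NOT_ATTEMPTED]
metrics = simpleqa_metrics(grades)
print(metrics)  # accuracy is 0.5; correct_given_attempted is 2/3
```

Abstaining lowers overall accuracy but not accuracy among attempted questions, so the two numbers together indicate whether a model declines to answer when it is unsure.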

Leaderboard (4 models)

#	Model	Developer	Score
🥇	Gemini 2.5 Pro	Google DeepMind	52.9%
🥈	o3	OpenAI	47.6%
🥉	GPT-4.1	OpenAI	41.5%
4	Claude Opus 4.5	Anthropic	36%
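The best and median scores above follow directly from the four leaderboard entries. The sketch below recomputes them with Python's standard `statistics.median`. It assumes the last-place score of 36 is a percentage like the others. With an even number of models, the median is the mean of the two middle scores: (47.6 + 41.5) / 2 = 44.55.

```python
from statistics import median

# Scores copied from the leaderboard table, in percent.
scores = {
    "Gemini 2.5 Pro": 52.9,
    "o3": 47.6,
    "GPT-4.1": 41.5,
    "Claude Opus 4.5": 36.0,
}

best = max(scores.values())
# statistics.median sorts the values and, for an even count,
# averages the two middle ones.
mid = median(scores.values())
print(f"Best: {best}%  Median: {mid:.2f}%")  # Best: 52.9%  Median: 44.55%
```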