BrowseComp

Category: Agentic

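A benchmark that measures how well AI agents can find hard-to-locate information on the web. Each question has a short answer that is easy to verify once found but difficult to track down, so the benchmark tests persistent browsing, search, and synthesis of information across multiple sources.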

Models Tested: 1

Best Score: 84

Median Score: 84

Scoring: accuracy (percentage of questions answered correctly)
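Accuracy is the fraction of questions a model answers correctly. The leaderboard score of 84 appears to be that fraction expressed as a percentage. The following is a minimal sketch for illustration, not the official grading harness. The names `normalize`, `is_correct`, and `accuracy` are hypothetical, and normalized exact match is an assumed stand-in for however the benchmark actually judges whether an answer matches its reference.

```python
# Minimal sketch of accuracy scoring. ASSUMPTION: each question has a single
# reference answer, and normalized exact match stands in for the benchmark's
# real grading procedure (which may differ).


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so formatting differences are ignored."""
    return " ".join(text.lower().split())


def is_correct(predicted: str, reference: str) -> bool:
    # Hypothetical grading rule: answers match after normalization.
    return normalize(predicted) == normalize(reference)


def accuracy(predictions: list[str], references: list[str]) -> float:
    """Return the percentage (0 to 100) of questions answered correctly."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("need at least one question")
    correct = sum(is_correct(p, r) for p, r in zip(predictions, references))
    return 100.0 * correct / len(references)


if __name__ == "__main__":
    preds = ["Paris", "1969 ", "blue whale", "unknown"]
    refs = ["paris", "1969", "Blue Whale", "Mount Everest"]
    print(accuracy(preds, refs))  # 75.0 (3 of 4 correct)
```

Keeping the grading rule in its own function makes it easy to swap in a stricter or model-based judge without changing how the final percentage is computed.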

Introduced: 2025-04

Maintainer: OpenAI

Leaderboard (1 model)

Rank	Model	Developer	Score (accuracy, %)
1	Claude Opus 4.6	Anthropic	84