Longterm Wiki

BrowseComp

Agentic
A benchmark evaluating AI systems' ability to find hard-to-locate information on the web, testing browsing, search, and information synthesis capabilities across difficult queries.
Models Tested: 1
Best Score: 84
Median Score: 84
Scoring: accuracy
Introduced: 2025-04
Maintainer: OpenAI

Leaderboard (1 model)

# | Model | Developer | Score
🥇 | Claude Opus 4.6 | Anthropic | 84