WebArena
Agentic
A realistic web agent benchmark in which an AI agent must complete tasks on self-hosted replicas of real websites (Reddit, GitLab, shopping, maps). It tests multi-step web navigation and interaction.
Models Tested: 2
Best Score: 53%
Median Score: 33.7%
Scoring: percentage
Introduced: 2023-07
Maintainer: CMU
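
As a rough illustration of the "Scoring: percentage" field, the sketch below assumes a score is the share of tasks an agent completes successfully, and that the median over an even number of models is the mean of the two middle scores. Both are assumptions, not something this card states. The function name `success_rate` and all sample values are hypothetical and are not real results.

```python
from statistics import median


def success_rate(outcomes):
    """Return the percentage of tasks marked successful.

    `outcomes` is a list of booleans, one per task (hypothetical format).
    """
    if not outcomes:
        raise ValueError("no task outcomes to score")
    return 100.0 * sum(outcomes) / len(outcomes)


# Hypothetical per-task pass/fail results for one agent (not real data).
sample_outcomes = [True, False, True, True, False]
sample_score = success_rate(sample_outcomes)

# With an even number of models, statistics.median averages the two
# middle values. These two scores are made up for illustration.
sample_median = median([40.0, 20.0])

print(f"score: {sample_score:.1f}%, median: {sample_median:.1f}%")
```

Under this convention, a median reported over only two models is the midpoint between their two scores.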