
WinoGrande

Reasoning
A large-scale commonsense reasoning benchmark of 44,000 Winograd-schema-style problems, each a sentence with a blank and two candidate fillers. The dataset was built with adversarial filtering (AfLite) to reduce annotation artifacts.
Models Tested: 1
Best Score: 81.6
Median Score: 81.6
Scoring: accuracy (%)
Introduced: 2019-07
Maintainer: AI2
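To make the accuracy metric concrete, here is a minimal sketch of scoring two-option, fill-in-the-blank problems. The field names `sentence`, `option1`, `option2` and `answer` are assumptions for illustration (not an official schema), `choose` is a hypothetical stand-in for a model, and the two example sentences are the classic trophy/suitcase Winograd pair rather than items taken from WinoGrande.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Problem:
    """One Winograd-schema-style item (field names are assumed, not an official schema)."""
    sentence: str  # contains exactly one "_" blank
    option1: str
    option2: str
    answer: str    # gold label: "1" or "2"


def fill(problem: Problem, option: str) -> str:
    """Substitute a candidate filler into the blank."""
    return problem.sentence.replace("_", option, 1)


def accuracy(problems: Sequence[Problem], choose: Callable[[Problem], str]) -> float:
    """Percentage of problems where `choose` returns the gold option label."""
    if not problems:
        raise ValueError("need at least one problem")
    correct = sum(1 for p in problems if choose(p) == p.answer)
    return 100.0 * correct / len(problems)


problems = [
    Problem("The trophy doesn't fit into the brown suitcase because the _ is too large.",
            "trophy", "suitcase", "1"),
    Problem("The trophy doesn't fit into the brown suitcase because the _ is too small.",
            "trophy", "suitcase", "2"),
]

# Trivial baseline: always pick option 1.
print(accuracy(problems, lambda p: "1"))  # 50.0
```

A real evaluation would replace the lambda with a model-based chooser, for example one that scores both filled-in sentences and picks the likelier; that scoring function is not shown and is an assumption here. Because every item has exactly two options, chance level is about 50%, which is the baseline against which a score like 81.6 should be read.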

Leaderboard (1 model)

| # | Model | Developer | Score |
|---|-------|-----------|-------|
| 🥇 | GPT-3.5 Turbo | OpenAI | 81.6 |
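Best Score and Median Score appear to be summary statistics over the leaderboard rows. A minimal sketch, assuming they are the plain maximum and median of each model's score (the wiki's actual computation is not documented on this page); `summarize` is a hypothetical helper name:

```python
from statistics import median

# (model, developer, score) rows, copied from the leaderboard above
leaderboard = [("GPT-3.5 Turbo", "OpenAI", 81.6)]


def summarize(rows):
    """Hypothetical helper: count, best and median score across leaderboard rows."""
    scores = [score for _, _, score in rows]
    if not scores:
        raise ValueError("leaderboard is empty")
    return {"models_tested": len(scores), "best": max(scores), "median": median(scores)}


print(summarize(leaderboard))
# {'models_tested': 1, 'best': 81.6, 'median': 81.6}
```

With a single entry the two statistics coincide, which is why both read 81.6; for an even number of models, `statistics.median` averages the two middle scores.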