Longterm Wiki
WinoGrande
Reasoning
A large-scale commonsense reasoning benchmark of roughly 44,000 Winograd-schema-style problems, built with adversarial filtering to reduce annotation artifacts.
Models Tested: 1
Best Score: 81.6
Median Score: 81.6
Scoring: accuracy
Introduced: 2019-07
Maintainer: AI2
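
The accuracy metric listed above can be illustrated with a minimal sketch. It assumes each problem is a sentence with one blank (`_`), two candidate fillers, and a gold answer of "1" or "2", and that the score is reported as a percentage. The `Item` class, the `fill` and `accuracy` helpers, and the two sample items are hypothetical, written for illustration only. They are not drawn from the benchmark data or from any official evaluation harness.

```python
from dataclasses import dataclass


@dataclass
class Item:
    """One two-option fill-in-the-blank problem (hypothetical layout)."""
    sentence: str  # contains a single "_" blank
    option1: str
    option2: str
    answer: str    # "1" or "2", naming the correct option


def fill(item: Item, choice: str) -> str:
    """Return the sentence with the blank replaced by the chosen option."""
    option = item.option1 if choice == "1" else item.option2
    return item.sentence.replace("_", option, 1)


def accuracy(items: list[Item], predictions: list[str]) -> float:
    """Percentage of items whose predicted option matches the gold answer."""
    if not items:
        raise ValueError("need at least one item")
    if len(items) != len(predictions):
        raise ValueError("items and predictions must have the same length")
    correct = sum(1 for item, pred in zip(items, predictions) if pred == item.answer)
    return 100.0 * correct / len(items)


# Illustrative twin sentences: one changed word flips the correct answer.
ITEMS = [
    Item("The trophy doesn't fit into the suitcase because the _ is too large.",
         "trophy", "suitcase", "1"),
    Item("The trophy doesn't fit into the suitcase because the _ is too small.",
         "trophy", "suitcase", "2"),
]

if __name__ == "__main__":
    print(fill(ITEMS[0], "1"))
    print(accuracy(ITEMS, ["1", "1"]))
```

Because every problem has exactly two options, a model that always picks the same option would be expected to score near 50 under this scheme, which is the natural baseline for reading the scores on this page.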
Leaderboard (1 model)

#  | Model         | Developer | Score
🥇 | GPT-3.5 Turbo | OpenAI    | 81.6