Longterm Wiki

DocVQA

Multimodal

Document Visual Question Answering: a benchmark of 50,000 questions on 12,000+ document images that tests a model's ability to understand and extract information from real-world documents.

Models Tested: 2
Best Score: 94.4%
Median Score: 94.4%
Scoring: accuracy
Introduced: 2020-07

Leaderboard (2 models)

#   Model             Developer        Score
🥇  Llama 4 Scout     Meta AI (FAIR)   94.4%
🥈  Llama 4 Maverick  Meta AI (FAIR)   94.4%