Longterm Wiki

Credibility Rating

High (4/5)

High quality. Established institution or organization with editorial oversight and accountability.

Rating inherited from publication venue: Our World in Data

A widely cited empirical resource for tracking AI capability trajectories; useful for grounding claims about AI progress, timeline discussions, or benchmarking in the AI safety and governance literature.

Metadata

Importance: 62/100
Type: dataset

Summary

An interactive dataset and visualization tracking AI performance across multiple domains—including language understanding, image recognition, and reasoning—relative to human-level baselines. It charts the historical progression of AI capabilities, illustrating where and when AI systems have surpassed human benchmarks. Useful for understanding the pace and trajectory of AI capability development.

Key Points

  • Tracks AI test scores across diverse domains including language, vision, and problem-solving relative to human performance baselines.
  • Visualizes the historical trend of AI surpassing human-level performance on various standardized benchmarks.
  • Provides a comparative framework useful for assessing progress in AI capabilities over time.
  • Highlights how rapidly AI has advanced in specific domains, relevant to discussions of transformative AI timelines.
  • Data sourced from prominent AI benchmarks, making it a useful reference for empirical claims about capability growth.

Review

This source is a careful compilation of AI benchmark data, systematically tracking the progression of artificial intelligence capabilities across multiple domains. By normalizing human performance to zero and the initial AI performance to -100, the dataset offers a nuanced view of technological advancement in areas such as language understanding, image recognition, mathematical reasoning, and code generation.

The research matters for AI safety because it provides empirical evidence of AI systems' evolving capabilities, highlighting both remarkable progress and persistent limitations. Benchmarks like BBH, MMLU, and HumanEval demonstrate AI's growing sophistication in complex reasoning, knowledge application, and problem-solving. However, the varied performance across domains also underscores the importance of comprehensive evaluation and the need for careful development of AI systems to ensure alignment with human values and capabilities.
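The normalization described above (human baseline mapped to 0, the earliest recorded AI score mapped to -100) can be sketched as a simple linear rescaling. This is an illustrative reconstruction, not the dataset's actual code; the function name and the example numbers are hypothetical:

```python
def normalize(score, human_baseline, initial_score):
    """Linearly rescale a raw benchmark score so that the human baseline
    maps to 0 and the earliest recorded AI score maps to -100.
    Values above 0 indicate performance surpassing the human baseline."""
    return 100 * (score - human_baseline) / (human_baseline - initial_score)

# Hypothetical benchmark: human baseline 89.8, first recorded AI score 30.0
print(normalize(30.0, 89.8, 30.0))  # -100.0 (initial AI performance)
print(normalize(89.8, 89.8, 30.0))  # 0.0 (human-level performance)
print(normalize(95.0, 89.8, 30.0))  # positive: exceeds the human baseline
```

On this scale, the moment a benchmark's curve crosses zero is the point at which AI systems surpass the human reference, which is what the visualization's crossing points depict.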
Resource ID: 653a55bdf7195c0c | Stable ID: NzI5NjYxN2