Unveiling OpenAI o3: From benchmarks to real world | Our Insights

web

plantemoran.com·plantemoran.com/explore-our-thinking/insight/2025/01/unve...

Metadata

1 FactBase fact citing this source

Entity	Property	Value	As Of
OpenAI	Benchmark Score	71.7	Sep 2024

Cached Content Preview

HTTP 200 | Fetched Apr 30, 2026 | 9 KB

![Business professional with glasses reading about OpenAI.](https://plantemoran.com/-/media/images/insights-images/2025/01/open-ai_gettyimages2155123849_1100x704.jpg?w=320&hash=62EDBCF0E9D56A1992C0AF52B324F253)

**Article**

# Unveiling OpenAI o3: From benchmarks to real world

Authors: [Cole Weinman](https://plantemoran.com/get-to-know/people/cole-weinman), Lucy Jiang Lape

January 24, 2025 / 4 min read

OpenAI’s next frontier model, OpenAI o3, is garnering global attention given its complex reasoning capabilities. But model performance on benchmark datasets doesn’t necessarily align with real-world applications, performance, or business value. Here’s our take.

AI continues to evolve at warp speed. In December 2024, OpenAI announced its next frontier model, OpenAI o3. The model is garnering global attention for its ability to complete complex reasoning tasks.

The model was tested on benchmark datasets, which are used to evaluate AI and other computational models. They’re key to advancing machine learning and AI research. But model performance on benchmark datasets doesn’t necessarily align with real-world tasks and applications, on-the-ground performance, or business value. Here’s our take.

## OpenAI o3 performance on benchmark datasets

The reported performance of OpenAI o3 is remarkable. According to a video from OpenAI, o3 has demonstrated exceptional performance on benchmark datasets: 96.7% accuracy in competition-level math problems, 87.7% accuracy on PhD-level science questions, and 71.7% in software programming.

These results clearly outperform the OpenAI o1 model and set a new industry standard. OpenAI o3 also scored between 75.7% and 87.5% accuracy on the ARC-AGI datasets (Abstraction and Reasoning Corpus for Artificial General Intelligence), which are considered among the most important benchmarks for artificial general intelligence (AGI). That range is comparable to human performance, which is about 85% accuracy.

The ARC-AGI datasets test models’ abilities in spatial reasoning, pattern recognition, and adapting knowledge to unfamiliar challenges — abil

... (truncated, 9 KB total)

Resource ID: 0213f739e3203d91 | Stable ID: sid_DBcgSXZFRg