Learning to Reason with LLMs: OpenAI o1
Credibility Rating
High quality. Established institution or organization with editorial oversight and accountability.
Rating inherited from publication venue: OpenAI
This is OpenAI's official technical blog post announcing o1, a reasoning-focused model relevant to AI safety discussions around scalable oversight, interpretability of reasoning chains, and the implications of inference-time compute scaling for alignment.
Metadata
Summary
OpenAI introduces the o1 model series, which uses chain-of-thought reasoning during inference to significantly improve performance on complex tasks in science, math, and coding. The model is trained via reinforcement learning to 'think' before responding, producing a hidden reasoning trace. This represents a major capability advance, with safety implications around alignment and evaluation.
Key Points
- o1 uses reinforcement learning to develop extended internal chain-of-thought reasoning before producing final answers, improving accuracy on hard problems.
- The model achieves expert-level performance on benchmarks like AIME math competitions, Codeforces, and PhD-level science questions (GPQA).
- A hidden 'reasoning token' chain is generated internally but not fully shown to users, raising interpretability and oversight concerns.
- OpenAI reports o1 scores better on safety evaluations than GPT-4o, particularly for resisting jailbreaks and following safety guidelines.
- The release marks a shift toward inference-time compute scaling as a new axis of capability improvement, distinct from simply scaling parameters.
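The last point, treating inference-time compute as its own capability axis, can be illustrated with a back-of-the-envelope model. Under a simple best-of-n sampling scheme with a reliable answer verifier (an illustrative assumption, not the reinforcement-learned chain-of-thought mechanism OpenAI describes), the chance of solving a problem grows with the number of sampled attempts, independent of model size:

```python
def best_of_n_success(p_single: float, n: int) -> float:
    """Probability that at least one of n independent attempts succeeds,
    assuming each attempt solves the problem with probability p_single
    and a reliable verifier can recognize any correct attempt."""
    return 1.0 - (1.0 - p_single) ** n

# Spending more inference-time compute (larger n) raises accuracy
# without changing the model's parameters at all.
for n in (1, 4, 16, 64):
    print(n, round(best_of_n_success(0.2, n), 3))
```

Even a weak per-attempt solver (20% here) approaches near-certain success as n grows, which is why test-time compute can act as a scaling axis distinct from parameter count.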
Cited by 3 pages
| Page | Type | Quality |
|---|---|---|
| Reasoning and Planning | Capability | 65.0 |
| AI Scaling Laws | Concept | 92.0 |
| Process Supervision | Approach | 65.0 |
Cached Content Preview
September 12, 2024
[Release](https://openai.com/research/index/release/)
# Learning to reason with LLMs
[Contributions](https://openai.com/openai-o1-contributions/)
OpenAI o1 ranks in the 89th percentile on competitive programming questions (Codeforces), places among the top 500 students in the US in a qualifier for the USA Math Olympiad (AIME), and exceeds human PhD-level accuracy on a benchmark of physics, biology, and chemistry problems (GPQA). While the work needed to make this new model as easy to use as current models is still ongoing, we are releasing an early version of this model, OpenAI o1‑preview, for immediate use in ChatGPT and to [trusted API users](https://platform.openai.com/docs/guides/rate-limits/usage-tiers).
Our large-scale reinforcement learning algorithm teaches the model how to think productively using its chain of thought in a highly data-efficient training process. We have found that the performance of o1 consistently improves with more reinforcement learning (train-time compute) and with more time spent thinking (test-time compute). The constraints on scaling this approach differ substantially from those of LLM pretraining, and we are continuing to investigate them.
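One concrete, simpler instance of "more time spent thinking" is self-consistency decoding: sample several independent reasoning chains and take a majority vote over their final answers. The sketch below uses a hypothetical toy solver (`solve_once` is a stand-in, not OpenAI's method) to show how accuracy improves as more test-time samples are drawn:

```python
import random
from collections import Counter

def solve_once(problem: str, rng: random.Random) -> int:
    # Hypothetical stand-in for sampling one chain of thought and
    # extracting its final answer; correct ~60% of the time here.
    return rng.choice([42, 42, 42, 17, 99])

def self_consistency(problem: str, n_samples: int, seed: int = 0) -> int:
    # More test-time compute = more sampled chains = a more reliable vote.
    rng = random.Random(seed)
    answers = [solve_once(problem, rng) for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]
```

With enough samples the vote almost always recovers the modal answer, mirroring the reported pattern that accuracy rises smoothly with test-time compute.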

o1 performance smoothly improves with both train-time and test-time compute
## Evals
To highlight the reasoning improvement over GPT‑4o, we tested our models on a diverse set of human exams and ML benchmarks. We show that o1 significantly outperforms GPT‑4o on the vast majority of these reasoning-heavy tasks. Unless otherwise specified, we evaluated o1 on the maximal test-time compute setting.


[Figure: PhD-Level Science Questions (GPQA) results]
... (truncated, 35 KB total)