Is Scaling All You Need?
scaling-debate (E272)
Path: /knowledge-base/debates/scaling-debate/
Page Metadata
{
"id": "scaling-debate",
"numericId": null,
"path": "/knowledge-base/debates/scaling-debate/",
"filePath": "knowledge-base/debates/scaling-debate.mdx",
"title": "Is Scaling All You Need?",
"quality": 42,
"importance": 42,
"contentFormat": "article",
"tractability": null,
"neglectedness": null,
"uncertainty": null,
"causalLevel": null,
"lastUpdated": "2026-01-29",
"llmSummary": "Comprehensive survey of the 2024-2025 scaling debate, documenting the shift from pure pretraining to 'scaling-plus' approaches after o3 achieved 87.5% on ARC-AGI-1 but GPT-5 faced 2-year delays. Expert consensus has moved to ~45% favoring hybrid approaches, with data wall projected 2026-2030 and AGI timelines spanning 5-30+ years depending on paradigm.",
"structuredSummary": null,
"description": "The scaling debate examines whether current AI approaches will reach AGI through more compute and data, or require new paradigms. By 2025, evidence is mixed: o3 achieved 87.5% on ARC-AGI-1, but GPT-5 took 2 years longer than expected and ARC-AGI-2 remains unsolved by all models. The emerging consensus favors 'scaling-plus'—combining pretraining with reasoning via test-time compute.",
"ratings": {
"novelty": 2.5,
"rigor": 4,
"actionability": 3,
"completeness": 5.5
},
"category": "debates",
"subcategory": null,
"clusters": [
"ai-safety"
],
"metrics": {
"wordCount": 1024,
"tableCount": 7,
"diagramCount": 1,
"internalLinks": 10,
"externalLinks": 36,
"footnoteCount": 0,
"bulletRatio": 0.16,
"sectionCount": 19,
"hasOverview": false,
"structuralScore": 13
},
"suggestedQuality": 87,
"updateFrequency": 45,
"evergreen": true,
"wordCount": 1024,
"unconvertedLinks": [
{
"text": "Stanford AI Index 2025",
"url": "https://hai.stanford.edu/ai-index/2025-ai-index-report/technical-performance",
"resourceId": "1a26f870e37dcc68",
"resourceTitle": "Technical Performance - 2025 AI Index Report"
},
{
"text": "ARC Prize Technical Report",
"url": "https://arcprize.org/blog/oai-o3-pub-breakthrough",
"resourceId": "457fa3b0b79d8812",
"resourceTitle": "o3 scores 87.5% on ARC-AGI"
},
{
"text": "Stanford AI Index",
"url": "https://hai.stanford.edu/ai-index/2025-ai-index-report/technical-performance",
"resourceId": "1a26f870e37dcc68",
"resourceTitle": "Technical Performance - 2025 AI Index Report"
},
{
"text": "Epoch AI",
"url": "https://epoch.ai/blog/can-ai-scaling-continue-through-2030",
"resourceId": "9587b65b1192289d",
"resourceTitle": "Epoch AI"
},
{
"text": "o1/o3 reasoning paradigm",
"url": "https://openai.com/index/introducing-o3-and-o4-mini/",
"resourceId": "bf92f3d905c3de0d",
"resourceTitle": "announced December 2024"
},
{
"text": "ARC Prize",
"url": "https://arcprize.org/blog/oai-o3-pub-breakthrough",
"resourceId": "457fa3b0b79d8812",
"resourceTitle": "o3 scores 87.5% on ARC-AGI"
},
{
"text": "OpenAI observed",
"url": "https://www.interconnects.ai/p/openais-o3-the-2024-finale-of-ai",
"resourceId": "3c8e4281a140e1cd",
"resourceTitle": "GPQA Diamond"
},
{
"text": "Introducing o3 and o4-mini",
"url": "https://openai.com/index/introducing-o3-and-o4-mini/",
"resourceId": "bf92f3d905c3de0d",
"resourceTitle": "announced December 2024"
},
{
"text": "Can AI scaling continue through 2030?",
"url": "https://epoch.ai/blog/can-ai-scaling-continue-through-2030",
"resourceId": "9587b65b1192289d",
"resourceTitle": "Epoch AI"
},
{
"text": "AI Index 2025",
"url": "https://hai.stanford.edu/ai-index/2025-ai-index-report/technical-performance",
"resourceId": "1a26f870e37dcc68",
"resourceTitle": "Technical Performance - 2025 AI Index Report"
},
{
"text": "o3: The grand finale of AI in 2024",
"url": "https://www.interconnects.ai/p/openais-o3-the-2024-finale-of-ai",
"resourceId": "3c8e4281a140e1cd",
"resourceTitle": "GPQA Diamond"
},
{
"text": "Scaling Laws for LLMs",
"url": "https://cameronrwolfe.substack.com/p/llm-scaling-laws",
"resourceId": "056c40c4515292c5",
"resourceTitle": "AIME 2024"
},
{
"text": "AI Beyond the Scaling Laws",
"url": "https://www.hec.edu/en/dare/tech-ai/ai-beyond-scaling-laws",
"resourceId": "40560014cfc7663d",
"resourceTitle": "some researchers note"
}
],
"unconvertedLinkCount": 13,
"convertedLinkCount": 0,
"backlinkCount": 0,
"redundancy": {
"maxSimilarity": 14,
"similarPages": [
{
"id": "language-models",
"title": "Large Language Models",
"path": "/knowledge-base/capabilities/language-models/",
"similarity": 14
},
{
"id": "large-language-models",
"title": "Large Language Models",
"path": "/knowledge-base/capabilities/large-language-models/",
"similarity": 14
},
{
"id": "agi-timeline-debate",
"title": "When Will AGI Arrive?",
"path": "/knowledge-base/debates/agi-timeline-debate/",
"similarity": 13
},
{
"id": "agi-timeline",
"title": "AGI Timeline",
"path": "/knowledge-base/forecasting/agi-timeline/",
"similarity": 13
},
{
"id": "dense-transformers",
"title": "Dense Transformers",
"path": "/knowledge-base/intelligence-paradigms/dense-transformers/",
"similarity": 13
}
]
}
}
Entity Data
{
"id": "scaling-debate",
"type": "crux",
"title": "Is Scaling All You Need?",
"description": "The debate over whether scaling compute and data is sufficient for AGI or if we need new paradigms.",
"tags": [
"debate",
"scaling",
"capabilities"
],
"relatedEntries": [],
"sources": [],
"lastUpdated": "2025-01",
"customFields": [
{
"label": "Question",
"value": "Can we reach AGI through scaling alone, or do we need new paradigms?"
},
{
"label": "Stakes",
"value": "Determines AI timeline predictions and research priorities"
},
{
"label": "Expert Consensus",
"value": "Strong disagreement between scaling optimists and skeptics"
}
]
}
Canonical Facts (0)
No facts for this entity
External Links
{
"lesswrong": "https://www.lesswrong.com/tag/scaling-laws"
}
Backlinks (0)
No backlinks
Frontmatter
{
"title": "Is Scaling All You Need?",
"description": "The scaling debate examines whether current AI approaches will reach AGI through more compute and data, or require new paradigms. By 2025, evidence is mixed: o3 achieved 87.5% on ARC-AGI-1, but GPT-5 took 2 years longer than expected and ARC-AGI-2 remains unsolved by all models. The emerging consensus favors 'scaling-plus'—combining pretraining with reasoning via test-time compute.",
"sidebar": {
"order": 2
},
"importance": 42,
"quality": 42,
"lastEdited": "2026-01-29",
"update_frequency": 45,
"llmSummary": "Comprehensive survey of the 2024-2025 scaling debate, documenting the shift from pure pretraining to 'scaling-plus' approaches after o3 achieved 87.5% on ARC-AGI-1 but GPT-5 faced 2-year delays. Expert consensus has moved to ~45% favoring hybrid approaches, with data wall projected 2026-2030 and AGI timelines spanning 5-30+ years depending on paradigm.",
"ratings": {
"novelty": 2.5,
"rigor": 4,
"actionability": 3,
"completeness": 5.5
},
"clusters": [
"ai-safety"
]
}
Raw MDX Source
---
title: "Is Scaling All You Need?"
description: "The scaling debate examines whether current AI approaches will reach AGI through more compute and data, or require new paradigms. By 2025, evidence is mixed: o3 achieved 87.5% on ARC-AGI-1, but GPT-5 took 2 years longer than expected and ARC-AGI-2 remains unsolved by all models. The emerging consensus favors 'scaling-plus'—combining pretraining with reasoning via test-time compute."
sidebar:
order: 2
importance: 42
quality: 42
lastEdited: "2026-01-29"
update_frequency: 45
llmSummary: "Comprehensive survey of the 2024-2025 scaling debate, documenting the shift from pure pretraining to 'scaling-plus' approaches after o3 achieved 87.5% on ARC-AGI-1 but GPT-5 faced 2-year delays. Expert consensus has moved to ~45% favoring hybrid approaches, with data wall projected 2026-2030 and AGI timelines spanning 5-30+ years depending on paradigm."
ratings:
novelty: 2.5
rigor: 4
actionability: 3
completeness: 5.5
clusters: ["ai-safety"]
---
import {DisagreementMap, InfoBox, KeyQuestions, DataExternalLinks, Mermaid, EntityLink} from '@components/wiki';
<DataExternalLinks pageId="scaling-debate" />
<InfoBox
type="crux"
title="The Scaling Debate"
customFields={[
{ label: "Question", value: "Can we reach AGI through scaling alone, or do we need new paradigms?" },
{ label: "Stakes", value: "Determines AI timeline predictions and research priorities" },
{ label: "Expert Consensus", value: "Strong disagreement between scaling optimists and skeptics" },
]}
/>
## Quick Assessment
| Dimension | Assessment | Evidence |
|-----------|------------|----------|
| **Resolution Status** | Partially resolved toward scaling-plus | Reasoning models (o1, o3) demonstrate new scaling regimes; pure pretraining scaling stalling |
| **Expert Consensus** | ~25% favor pure scaling, ~30% favor new paradigms, ~45% favor hybrid | [Stanford AI Index 2025](https://hai.stanford.edu/ai-index/2025-ai-index-report/technical-performance) surveys; lab behavior |
| **Key Milestone (Pro-Scaling)** | o3 achieves 87.5% on ARC-AGI-1 | [ARC Prize Technical Report](https://arcprize.org/blog/oai-o3-pub-breakthrough): \$3,460/task at maximum compute |
| **Key Milestone (Anti-Scaling)** | GPT-5 delayed 2 years; pure pretraining hits ceiling | [Fortune (Feb 2025)](https://fortune.com/2025/02/25/what-happened-gpt-5-openai-orion-pivot-scaling-pre-training-llm-agi-reasoning/): Industry pivots to reasoning |
| **Data Wall Timeline** | 2026-2030 for human-generated text | [Epoch AI (2022)](https://epoch.ai/blog/will-we-run-out-of-data-limits-of-llm-scaling-based-on-human-generated-data): Stock of public text projected to be exhausted 2026-2030, depending on degree of overtraining |
| **Investment Level** | \$500B+ committed through 2029 | [Stargate Project](https://openai.com/index/announcing-the-stargate-project/): <EntityLink id="E218">OpenAI</EntityLink>, SoftBank, Oracle joint venture |
| **Stakes** | Determines timeline predictions (5-15 vs 15-30+ years to AGI) | Affects safety research priorities, resource allocation, policy |
One of the most consequential debates in AI: Can we achieve AGI simply by making current approaches (transformers, neural networks) bigger, or do we need fundamental breakthroughs in architecture and methodology?
## The Question
The debate centers on whether the remarkable progress of AI from 2019-2024 will continue along the same trajectory, or whether we're approaching fundamental limits that require new approaches.
<Mermaid chart={`
flowchart TD
PROGRESS[AI Progress 2019-2024] --> QUESTION{Will Scaling<br/>Continue Working?}
QUESTION -->|"Yes"| SCALING[Scaling Optimists]
QUESTION -->|"Partially"| HYBRID[Hybrid View]
QUESTION -->|"No"| SKEPTICS[New Paradigm Needed]
SCALING --> S1[More compute]
SCALING --> S2[More data]
SCALING --> S3[Better engineering]
S1 --> AGI_SOON[AGI 5-15 years]
S2 --> AGI_SOON
S3 --> AGI_SOON
HYBRID --> H1[Scaling + Reasoning]
HYBRID --> H2[Test-time compute]
HYBRID --> H3[New training methods]
H1 --> AGI_MED[AGI 10-20 years]
H2 --> AGI_MED
H3 --> AGI_MED
SKEPTICS --> N1[World models needed]
SKEPTICS --> N2[Symbolic reasoning]
SKEPTICS --> N3[New architectures]
N1 --> AGI_FAR[AGI 20-40+ years]
N2 --> AGI_FAR
N3 --> AGI_FAR
style PROGRESS fill:#e6f3ff
style AGI_SOON fill:#ccffcc
style AGI_MED fill:#ffffcc
style AGI_FAR fill:#ffcccc
`} />
**Scaling hypothesis**: Current deep learning approaches will reach human-level and superhuman intelligence through:
- More compute (bigger models, longer training)
- More data (larger, higher-quality datasets)
- Better engineering (efficiency improvements)
**New paradigms hypothesis**: We need fundamentally different approaches because current methods face limits that more scale alone cannot overcome.
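For intuition, the scaling hypothesis rests on empirical scaling laws in which pretraining loss falls smoothly, but with diminishing returns, as parameters and data grow. The sketch below uses the published Chinchilla-style fit from Hoffmann et al. (2022); the constants are illustrative and not drawn from this page.

```python
# Minimal sketch of a Chinchilla-style parametric scaling law (Hoffmann et al., 2022).
# The fitted constants below are the published values and are illustrative only;
# they are not taken from this page.
A, B, E = 406.4, 410.7, 1.69   # fit coefficients
ALPHA, BETA = 0.34, 0.28       # exponents for parameters (N) and tokens (D)

def predicted_loss(n_params: float, n_tokens: float) -> float:
    """Predicted pretraining loss as a smooth function of model size and data."""
    return E + A / n_params**ALPHA + B / n_tokens**BETA

# Loss keeps falling with scale, but each 10x jump buys less than the last:
for n, d in [(70e9, 1.4e12), (700e9, 14e12), (7e12, 140e12)]:
    print(f"N={n:.0e}, D={d:.0e} -> predicted loss {predicted_loss(n, d):.3f}")
```

The crux is not whether this curve keeps bending downward, but whether moving further along it keeps producing qualitatively new capabilities.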
### The Evidence Landscape
| Evidence Type | Favors Scaling | Favors New Paradigms | Interpretation |
|---------------|----------------|---------------------|----------------|
| GPT-3 → GPT-4 gains | Strong: Major capability jumps | — | Pretraining scaling worked through 2023 |
| GPT-4 → GPT-5 delays | — | Strong: 2-year development time | [Fortune](https://fortune.com/2025/02/25/what-happened-gpt-5-openai-orion-pivot-scaling-pre-training-llm-agi-reasoning/): Pure pretraining ceiling hit |
| o1/o3 reasoning models | Strong: New scaling regime found | Moderate: Required paradigm shift | Reinforcement learning unlocked gains |
| ARC-AGI-1 scores | Strong: o3 achieves 87.5% | Moderate: \$3,460/task cost | Brute force, not generalization |
| ARC-AGI-2 benchmark | — | Strong: Under 5% for all models | Humans still solve 100% |
| Model convergence | — | Moderate: Top-10 Elo gap shrunk 11.9% → 5.4% | [Stanford AI Index](https://hai.stanford.edu/ai-index/2025-ai-index-report/technical-performance): Diminishing differentiation |
| Parameter efficiency | Strong: 142x reduction for MMLU 60% | — | 540B (2022) → 3.8B (2024) |
## Key Positions
<DisagreementMap
title="Positions on Scaling"
description="Where different researchers and organizations stand"
positions={[
{
name: "Ilya Sutskever (OpenAI)",
stance: "strong-scaling",
confidence: "high",
reasoning: "Has consistently predicted that scaling will be sufficient. OpenAI's strategy is built on this.",
evidence: ["GPT-2/3/4 trajectory", "Scaling law predictions"],
quote: "Unsupervised learning + scaling is all you need"
},
{
name: "Dario Amodei (Anthropic)",
stance: "scaling-plus",
confidence: "high",
reasoning: "Believes scaling is primary driver but with important safety additions (Constitutional AI, etc.)",
evidence: ["Anthropic's research strategy"],
quote: "Scaling works, but we need to scale safely"
},
{
name: "Yann LeCun (Meta)",
stance: "new-paradigm",
confidence: "high",
reasoning: "Argues LLMs are missing crucial components like world models and planning.",
evidence: ["JEPA proposal", "Critique of autoregressive models"],
quote: "Auto-regressive LLMs are a dead end for AGI"
},
{
name: "Gary Marcus",
stance: "strong-skeptic",
confidence: "high",
reasoning: "Argues deep learning is fundamentally limited, scaling just makes bigger versions of the same limitations.",
evidence: ["Persistent reasoning failures", "Lack of compositionality"],
quote: "Scaling just gives you more of the same mistakes"
},
{
name: "DeepMind",
stance: "scaling-plus",
confidence: "medium",
reasoning: "Combines scaling with algorithmic innovations (AlphaGo, AlphaFold, Gemini)",
evidence: ["Hybrid approaches"],
quote: "Scale and innovation together"
},
{
name: "François Chollet",
stance: "new-paradigm",
confidence: "high",
reasoning: "Created ARC benchmark to show LLMs can't generalize. Argues we need fundamentally different approaches.",
evidence: ["ARC benchmark results", "On the Measure of Intelligence"],
quote: "LLMs memorize, they don't generalize"
}
]}
/>
## Key Cruxes
<KeyQuestions
questions={[
{
question: "Will scaling unlock planning and reasoning?",
positions: [
{
position: "Yes - these are emergent capabilities",
confidence: "medium",
reasoning: "Many capabilities emerged unpredictably. Planning/reasoning may too at sufficient scale.",
implications: "Continue scaling, AGI within years"
},
{
position: "No - these require architectural changes",
confidence: "medium",
reasoning: "These capabilities require different computational structures than next-token prediction.",
implications: "Need new paradigms, AGI more distant"
}
]
},
{
question: "Is the data wall real?",
positions: [
{
position: "Yes - we'll run out of quality data soon",
confidence: "medium",
reasoning: "Finite internet, synthetic data degrades. Fundamental limit on scaling.",
implications: "Scaling hits wall by ~2026"
},
{
position: "No - many ways around it",
confidence: "medium",
reasoning: "Synthetic data, multimodal, data efficiency, curriculum learning all help.",
implications: "Scaling can continue for decade+"
}
]
},
{
question: "Do reasoning failures indicate fundamental limits?",
positions: [
{
position: "Yes - architectural gap",
confidence: "high",
reasoning: "Same types of failures persist across scales. Not improving on these dimensions.",
implications: "Scaling insufficient"
},
{
position: "No - just need more scale",
confidence: "low",
reasoning: "Performance is improving. May cross threshold with more scale.",
implications: "Keep scaling"
}
]
},
{
question: "What would disprove the scaling hypothesis?",
positions: [
{
position: "Scaling 100x with no qualitative improvement",
confidence: "medium",
reasoning: "If we scale 100x from GPT-4 and see only incremental gains, suggests limits.",
implications: "Would validate skeptics"
},
{
position: "Running out of data/compute",
confidence: "medium",
reasoning: "If practical limits prevent further scaling, question becomes moot.",
implications: "Would require new approaches by necessity"
}
]
}
]}
/>
## What Would Change Minds?
**For scaling optimists to update toward skepticism:**
- Scaling 100x with only marginal capability improvements
- Hitting hard data or compute walls
- Proof that key capabilities (planning, causality) can't emerge from current architectures
- Persistent failures on simple reasoning despite increasing scale
**For skeptics to update toward scaling:**
- GPT-5/6 showing qualitatively new reasoning capabilities
- Solving ARC or other generalization benchmarks via pure scaling
- Continued emergent abilities at each scale-up
- Clear path around data limitations
## The Data Wall
A critical constraint on scaling is the availability of training data. [Epoch AI research](https://epoch.ai/blog/will-we-run-out-of-data-limits-of-llm-scaling-based-on-human-generated-data) projects that high-quality human-generated text will be exhausted between 2026 and 2030, depending on training efficiency.
### Data Availability Projections
| Data Source | Current Usage | Exhaustion Timeline | Mitigation |
|-------------|---------------|---------------------|------------|
| High-quality web text | ≈300B tokens/year | 2026-2028 | Quality filtering, multimodal |
| Books and academic papers | ≈10% utilized | 2028-2030 | OCR improvements, licensing |
| Code repositories | ≈50B tokens/year | 2027-2029 | Synthetic generation |
| Multimodal (video, audio) | Under 5% utilized | 2030+ | [Epoch AI](https://epoch.ai/blog/can-ai-scaling-continue-through-2030): Could 3x available data |
| Synthetic data | Nascent | Unlimited potential | [Microsoft SynthLLM](https://www.microsoft.com/en-us/research/articles/synthllm-breaking-the-ai-data-wall-with-scalable-synthetic-data/): Performance plateaus at 300B tokens |
[Elon Musk stated in 2024](https://ncfacanada.org/musk-says-human-data-for-ai-training-is-depleted/) that AI has "already exhausted all human-generated publicly available data." However, Anthropic's position is that "data quality and quantity challenges are a solvable problem rather than a fundamental limitation," with synthetic data remaining "highly promising."
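To see where such exhaustion dates come from, a back-of-the-envelope projection helps. The sketch below is hypothetical: the token stock, the 2024 training-set size, and the growth rate are assumptions in the spirit of Epoch-style estimates, not figures from this page.

```python
# Hypothetical back-of-the-envelope projection of when a single frontier training
# run would need the entire stock of high-quality public text. All numbers here
# are assumptions, not figures from this page.
STOCK_TOKENS = 300e12      # assumed stock of usable human-generated text (tokens)
tokens_per_run = 15e12     # assumed tokens in a 2024-era frontier training run
growth_per_year = 2.5      # assumed yearly growth in training-set size

year = 2024
while tokens_per_run < STOCK_TOKENS and year < 2040:
    year += 1
    tokens_per_run *= growth_per_year

print(f"Under these assumptions, a single run consumes the full stock around {year}.")
# Overtraining (more epochs over the same data) pulls this date earlier;
# multimodal and synthetic data push it later.
```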
### The Synthetic Data Question
A key uncertainty is whether synthetic data can substitute for human-generated data. Research shows mixed results:
- **Positive**: [Microsoft's SynthLLM](https://www.microsoft.com/en-us/research/articles/synthllm-breaking-the-ai-data-wall-with-scalable-synthetic-data/) demonstrates scaling laws hold for synthetic data
- **Negative**: A [Nature study](https://www.weforum.org/stories/2025/12/data-ai-training-synthetic/) found that over-reliance on synthetic data leads to "irreversible defects" and "model collapse" after a few generations
- **Nuanced**: Performance improvements plateau at approximately 300B synthetic tokens
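To make the plateau claim concrete, here is a hypothetical saturating-curve sketch; the exponential form and the ~300B plateau scale are assumptions for illustration rather than the fitted SynthLLM results.

```python
# Hypothetical sketch of saturating returns from synthetic data: loss improves with
# more synthetic tokens but flattens past a plateau scale. The functional form and
# the ~300B plateau constant are illustrative assumptions, not the SynthLLM fit.
import math

def loss_with_synthetic(tokens_b: float, floor: float = 2.0,
                        gain: float = 0.4, plateau_b: float = 300.0) -> float:
    """Loss as a saturating function of synthetic tokens (tokens_b in billions)."""
    return floor + gain * math.exp(-tokens_b / plateau_b)

for t in [30, 100, 300, 1000, 3000]:
    print(f"{t:>4}B synthetic tokens -> loss ~ {loss_with_synthetic(t):.3f}")
# Past roughly the plateau scale, additional synthetic tokens buy very little.
```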
## Implications for AI Safety
This debate has major implications for AI safety strategy, resource allocation, and policy priorities.
### Timeline and Strategy Implications
| Scenario | AGI Timeline | Safety Research Priority | Policy Urgency |
|----------|--------------|-------------------------|----------------|
| **Scaling works** | 5-10 years | LLM alignment, RLHF improvements | Critical: Must act now |
| **Scaling-plus** | 8-15 years | Reasoning model safety, scalable oversight | High: 5-10 year window |
| **New paradigms** | 15-30+ years | Broader alignment theory, unknown architectures | Moderate: Time to prepare |
| **Hybrid** | 10-20 years | Both LLM and novel approaches | High: Uncertainty requires robustness |
**If scaling works:**
- Short timelines (AGI within 5-10 years)
- Predictable capability trajectory
- Safety research can focus on aligning scaled-up LLMs
- Winner-take-all dynamics (whoever scales most wins)
**If new paradigms needed:**
- Longer timelines (10-30+ years)
- More uncertainty about capability trajectory
- Safety research needs to consider unknown architectures
- More opportunity for safety-by-default designs
**Hybrid scenario (emerging consensus):**
- Medium timelines (5-15 years)
- Some predictability, some surprises
- Safety research should cover both scaled LLMs and new architectures
- The [o1/o3 reasoning paradigm](https://openai.com/index/introducing-o3-and-o4-mini/) suggests this is the most likely path
### Resource Allocation Implications
The debate affects billions of dollars in investment decisions:
- **Stargate Project**: [\$500B committed through 2029](https://openai.com/index/announcing-the-stargate-project/) by <EntityLink id="E218">OpenAI</EntityLink>, SoftBank, Oracle—implicitly betting on scaling
- **<EntityLink id="E549">Meta</EntityLink>'s LLM focus**: [<EntityLink id="E582">Yann LeCun</EntityLink>'s November 2025 departure](https://fortune.com/2026/01/23/deepmind-demis-hassabis-anthropic-dario-amodei-yann-lecun-ai-davos/) to found Advanced Machine Intelligence Labs signals internal disagreement
- **<EntityLink id="E98">DeepMind</EntityLink>'s approach**: Combines scaling with algorithmic innovation (AlphaFold, Gemini)—hedging both sides
## Historical Parallels
**Cases where scaling worked:**
- ImageNet → Deep learning revolution (2012)
- GPT-2 → GPT-3 → GPT-4 trajectory
- AlphaGo scaling to AlphaZero
- Transformer scaling unlocking new capabilities
**Cases where new paradigms were needed:**
- Perceptrons → Neural networks (needed backprop + hidden layers)
- RNNs → Transformers (needed attention mechanism)
- Expert systems → Statistical learning (needed paradigm shift)
The question: Which pattern are we in now?
## 2024-2025: The Scaling Debate Intensifies
The past two years have provided significant new evidence, though interpretation remains contested.
### Key Developments
| Date | Event | Implications |
|------|-------|--------------|
| Sep 2024 | OpenAI releases o1 reasoning model | New scaling paradigm: test-time compute |
| Dec 2024 | o3 achieves 87.5% on ARC-AGI-1 | [ARC Prize](https://arcprize.org/blog/oai-o3-pub-breakthrough): "Surprising step-function increase" |
| Dec 2024 | [Ilya Sutskever NeurIPS speech](https://fortune.com/2025/02/19/generative-ai-scaling-agi-deep-learning/) | "Pretraining as we know it will end" |
| Feb 2025 | [GPT-5 pivot revealed](https://fortune.com/2025/02/25/what-happened-gpt-5-openai-orion-pivot-scaling-pre-training-llm-agi-reasoning/) | 2-year delay; pure pretraining ceiling hit |
| May 2025 | [ARC-AGI-2 benchmark launched](https://arcprize.org/arc-agi) | All frontier models score under 5%; humans 100% |
| Aug 2025 | GPT-5 released | [Performance gains](https://aicommission.org/2025/08/gpt-5-is-finally-here-can-it-put-openai-back-on-top/) mainly from inference-time reasoning |
| Nov 2025 | [Yann LeCun leaves Meta](https://www.pymnts.com/artificial-intelligence-2/2025/meta-large-language-models-will-not-get-to-human-level-intelligence/) | Founds AMI Labs to pursue world models |
| Jan 2026 | [Davos AI debates](https://fortune.com/2026/01/23/deepmind-demis-hassabis-anthropic-dario-amodei-yann-lecun-ai-davos/) | Hassabis vs LeCun on AGI timelines |
### The Reasoning Revolution
The emergence of "reasoning models" in 2024-2025 partially resolved the debate by introducing a new scaling paradigm:
- **Test-time compute scaling**: [OpenAI observed](https://www.interconnects.ai/p/openais-o3-the-2024-finale-of-ai) that reinforcement learning exhibits "more compute = better performance" trends similar to pretraining
- **o3 benchmark results**: 96.7% on AIME 2024, 87.7% on GPQA Diamond, 71.7% on SWE-bench Verified (vs o1's 48.9%)
- **Key insight**: Rather than scaling model parameters, scale inference-time reasoning through reinforcement learning
This suggests a "scaling-plus" resolution: pure pretraining scaling has diminishing returns, but new scaling regimes (reasoning, test-time compute) can unlock continued progress.
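One way to build intuition for test-time compute scaling is a toy repeated-sampling model: if each independent reasoning chain solves a hard task with some probability, and a verifier or consensus vote can pick out a correct one, accuracy rises steadily with the number of chains sampled. The sketch below is a hypothetical illustration of this qualitative effect, not OpenAI's actual method; the solve probability and sample counts are assumed.

```python
# Toy model of test-time compute scaling: sample k independent reasoning chains and
# succeed if at least one is correct (as an oracle verifier or consensus vote would).
# The per-sample solve rate and the k values are illustrative assumptions, and this
# is not OpenAI's actual training or inference procedure.
def pass_at_k(p_single: float, k: int) -> float:
    """Probability that at least one of k independent samples solves the task."""
    return 1.0 - (1.0 - p_single) ** k

p = 0.20  # assumed chance that one reasoning chain solves a hard task
for k in [1, 4, 16, 64, 256]:
    print(f"{k:>3} samples -> pass rate ~ {pass_at_k(p, k):.1%}")
# Accuracy climbs with inference compute even though the underlying model is fixed,
# which is the qualitative shape of the o1/o3 test-time scaling curves.
```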
### Expert Positions Have Shifted
[Around 75% of AI experts](https://www.metaintro.com/blog/ai-scaling-debate) don't believe scaling LLMs alone will lead to AGI—but many now believe scaling *reasoning* could work:
| Expert | 2023 Position | 2025 Position | Key Quote |
|--------|---------------|---------------|-----------|
| <EntityLink id="E269">Sam Altman</EntityLink> | Pure scaling works | Scaling + reasoning | "There is no wall" (disputed) |
| <EntityLink id="E91">Dario Amodei</EntityLink> | Scaling is primary | Scaling "probably will continue" | Synthetic data "highly promising" |
| <EntityLink id="E582">Yann LeCun</EntityLink> | Skeptic | Strong skeptic | "LLMs are a dead end for AGI" |
| <EntityLink id="E163">Ilya Sutskever</EntityLink> | Strong scaling optimist | Nuanced | "Pretraining as we know it will end" |
| François Chollet | Skeptic | Skeptic validated | [Predicts human-level AI 2038-2048](https://www.freethink.com/robots-ai/arc-prize-agi) |
| <EntityLink id="E101">Demis Hassabis</EntityLink> | Hybrid approach | AGI by 2030 possible | Scaling + algorithmic innovation |
## Sources and Further Reading
- **OpenAI**: [Introducing o3 and o4-mini](https://openai.com/index/introducing-o3-and-o4-mini/) - Reasoning model capabilities
- **ARC Prize**: [Technical Report 2024](https://arcprize.org/blog/arc-prize-2024-results-analysis) - Benchmark analysis
- **Fortune**: [The \$19.6 billion pivot](https://fortune.com/2025/02/25/what-happened-gpt-5-openai-orion-pivot-scaling-pre-training-llm-agi-reasoning/) - GPT-5 development challenges
- **Fortune**: [Pure scaling has failed](https://fortune.com/2025/02/19/generative-ai-scaling-agi-deep-learning/) - Industry analysis
- **Epoch AI**: [Can AI scaling continue through 2030?](https://epoch.ai/blog/can-ai-scaling-continue-through-2030) - Quantitative projections
- **Stanford HAI**: [AI Index 2025](https://hai.stanford.edu/ai-index/2025-ai-index-report/technical-performance) - Technical performance trends
- **Nathan Lambert**: [o3: The grand finale of AI in 2024](https://www.interconnects.ai/p/openais-o3-the-2024-finale-of-ai) - Technical analysis
- **Cameron Wolfe**: [Scaling Laws for LLMs](https://cameronrwolfe.substack.com/p/llm-scaling-laws) - Historical overview
- **HEC Paris**: [AI Beyond the Scaling Laws](https://www.hec.edu/en/dare/tech-ai/ai-beyond-scaling-laws) - Academic perspective