
performance gap between US and Chinese models

web

A personal blog post by Jonas Vetterle offering a practitioner-level commentary on AI scaling trends; useful as a snapshot of mainstream discourse around scaling law debates in late 2024, but not a primary research source.

Metadata

Importance: 28/100 · blog post · commentary

Summary

A blog post analyzing the state of LLM scaling laws as of late 2024/early 2025, examining whether pre-training scaling has stalled and how post-training techniques and test-time compute scaling have driven recent progress. It contextualizes OpenAI's o3 breakthrough against a backdrop of pessimism about AI advancement and discusses the competitive landscape between US and Chinese AI labs.

Key Points

  • OpenAI's o3 achieved major breakthroughs on ARC and FrontierMath benchmarks, suggesting scaling has not fully stalled.
  • 2024 was characterized as a consolidation year where post-training and test-time compute scaling drove most model improvements rather than pre-training.
  • Pessimism about scaling laws 'hitting a wall' was widespread in media coverage, but recent releases challenge that narrative.
  • Workhorse models like GPT-4o and Sonnet 3.5 saw significant capability improvements, particularly in coding and math.
  • The piece frames the US-China model performance gap as a key dimension of the competitive AI landscape in 2025.

Cited by 2 pages

| Page | Type | Quality |
|------|------|---------|
| Dense Transformers | Concept | 58.0 |
| AI Scaling Laws | Concept | 92.0 |

Cached Content Preview

HTTP 200 · Fetched Mar 20, 2026 · 23 KB

Authors

- Jonas Vetterle ([@jvetterle](https://twitter.com/jvetterle))

OpenAI just unveiled their new reasoning model, o3, which breaks previous SOTA on the [ARC dataset](https://arcprize.org/blog/oai-o3-pub-breakthrough) by a large margin and scored a breathtaking result on the challenging [FrontierMath](https://epoch.ai/frontiermath) dataset. While we're still updating our priors on what this means for the trajectory of AI progress, it's clear that the model is a significant step forward in terms of reasoning capabilities.

However, if you've read recent news coverage (i.e. up until last week) about [stalling AI progress](https://www.ft.com/content/f24ba8d5-4c33-47ef-a91e-8f76340b08c4), including [anonymous leaks](https://www.bloomberg.com/news/articles/2024-11-13/openai-google-and-anthropic-are-struggling-to-build-more-advanced-ai) and the occasional Gary Marcus [rant](https://garymarcus.substack.com/p/a-new-ai-scaling-law-shell-game), you probably noticed a certain degree of pessimism about the speed of advancement. Many were, and probably still are, wondering whether LLM Scaling Laws, which predict that increases in compute, data and model size lead to ever better models, have "hit a wall". Have we reached a limit in terms of how much we can scale the current paradigm: transformer-based LLMs?
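For reference, the scaling laws in question are usually stated as a power-law fit of training loss against parameter count and token count. Below is a minimal sketch, assuming the Chinchilla-style form and roughly the constants reported by Hoffmann et al. (2022); the example model/data sizes are purely illustrative and not taken from this post.

```python
# Chinchilla-style pre-training scaling law sketch:
#   L(N, D) = E + A / N**alpha + B / D**beta
# with constants roughly as fitted by Hoffmann et al. (2022).

def chinchilla_loss(n_params: float, n_tokens: float,
                    E: float = 1.69, A: float = 406.4, B: float = 410.7,
                    alpha: float = 0.34, beta: float = 0.28) -> float:
    """Predicted training loss for a model with n_params parameters
    trained on n_tokens tokens."""
    return E + A / n_params**alpha + B / n_tokens**beta

# Doubling model size and data only shaves a little off the predicted loss:
# diminishing returns rather than a sudden wall.
for n, d in [(70e9, 1.4e12), (140e9, 2.8e12), (280e9, 5.6e12)]:
    print(f"N={n:.0e}, D={d:.0e} -> predicted loss ~ {chinchilla_loss(n, d):.3f}")
```

The point of the sketch is simply that, under such a fit, each extra order of magnitude of compute buys a smaller absolute improvement in loss, which is part of why "hitting a wall" and "still scaling" can both look plausible from the same curve.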

Apart from the releases of the first publicly available reasoning models (OpenAI's o1, Google's Gemini 2.0 Flash, and now also o3, which will be released to the public in 2025), most model providers have been focussing on what on the surface looked like incremental improvements to their existing models. In that sense, for the most part, 2024 has been a year of consolidation - many models have essentially caught up with what used to be the go-to model at the beginning of the year, GPT-4.

But that masks the progress that's actually been made to the "workhorse" models like GPT-4o, Sonnet 3.5, Llama 3 etc. (i.e. everything that's not a reasoning model), which are most commonly used in AI applications. The big labs have continued to ship new versions of these models that pushed SOTA performance across the board, and which came with huge improvements on tasks like coding and solving math problems.

One cannot help but notice that 2024 has been the year in which improvements in model performance were primarily driven by [post-training](https://www.jonvet.com/blog/llm-synthetic-data) and [scaling test-time compute](https://www.jonvet.com/blog/llm-test-time-compute). In terms of pre-training there hasn't been as much news. This has led to some speculation that the (pre-training) scaling laws are breaking down, and that we are reaching the limits of what is possible with current models, data and compute.
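To make "scaling test-time compute" concrete, one of the simplest versions of the idea is to sample the model several times and majority-vote the final answers (self-consistency). A minimal sketch follows; `generate` is a hypothetical stand-in for any LLM sampling call, and this is not a description of how o1/o3 work internally, which is considerably more elaborate.

```python
from collections import Counter
from typing import Callable

def majority_vote_answer(generate: Callable[[str], str],
                         prompt: str, n_samples: int = 16) -> str:
    """Spend more inference compute by drawing n_samples answers
    and returning the most common one (self-consistency voting)."""
    answers = [generate(prompt) for _ in range(n_samples)]
    # More samples -> more test-time compute -> (often) higher accuracy
    # on tasks like math, where wrong answers tend to disagree with
    # each other while correct ones coincide.
    return Counter(answers).most_common(1)[0][0]
```

The knob here is `n_samples`: accuracy improves with more samples at the cost of proportionally more inference compute, which is the basic trade-off behind the test-time scaling curves discussed in the linked post.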

Embedded tweet: [Visit this post on X](https://twitter.com/sama/status/1856941766915641580)

... (truncated, 23 KB total)
Resource ID: 7226d362130b23f8 | Stable ID: NDAyYTJjNj