Longterm Wiki

Credibility Rating

High (4/5)

High quality. Established institution or organization with editorial oversight and accountability.

Rating inherited from publication venue: UK AI Security Institute

Published by the UK's AI Security Institute (AISI), this report is one of the first government-led longitudinal evaluations of frontier model capabilities, offering empirical trend data relevant to both safety research and AI governance.

Metadata

Importance: 74/100 · blog post · analysis

Summary

The UK AI Security Institute's inaugural Frontier AI Trends Report synthesizes evaluations of 30+ frontier AI models to document rapid capability gains across chemistry, biology, and cybersecurity domains. Key findings include models surpassing PhD-level expertise in CBRN fields, cyber task success rates rising from 9% to 50% in under two years, persistent jailbreak vulnerabilities, and growing AI autonomy. The report highlights a dangerous gap between capability advancement and policy adaptation.

Key Points

  • Frontier AI models now exceed PhD-level expertise in chemistry and biology, raising serious dual-use and biosecurity concerns.
  • Cyber capabilities improved dramatically: success on apprentice-level tasks rose from ~9% to ~50% in under two years across evaluated models.
  • Universal jailbreaks remain effective against current model safeguards, indicating structural vulnerabilities in safety mechanisms.
  • AI systems are increasingly operating autonomously, expanding potential attack surfaces and misuse vectors.
  • Capability growth is outpacing governance and policy frameworks, creating a regulatory lag with national security implications.

Cited by 4 pages

Page                        Type      Quality
Capability Elicitation      Approach  91.0
AI Evaluation               Approach  72.0
Third-Party Model Auditing  Approach  64.0
AI Output Filtering         Approach  63.0

Cached Content Preview

HTTP 200 · Fetched Mar 15, 2026 · 10 KB
5 key findings from our first Frontier AI Trends Report | AISI Work 

 

5 key findings from our first Frontier AI Trends Report

 Our inaugural Frontier AI Trends Report draws on 2 years' worth of evaluations to provide accessible insights into the trajectory of AI development.

 — Dec 18, 2025 
At the AI Security Institute (AISI), we test frontier AI systems to better understand their national security, economic, and public safety implications. Since our establishment in November 2023, we have conducted wide-ranging evaluations of over 30 state-of-the-art AI models.

So far, we’ve primarily shared our results within government channels and with AI companies. However, our testing reveals an extraordinary pace of development with the potential to transform many aspects of our lives in the coming years. We believe the public need accessible, data-driven insights into the frontier of AI development to navigate this transformation – which is why we’ve decided to release our first Frontier AI Trends Report.

 The report contains a selection of aggregated testing results to illustrate high-level trends in AI progress across domains including chemistry, biology, cybersecurity, and autonomy, as well as broader societal impacts.  

 In this blog post, we share five headline results. 


 AI models have far surpassed PhD-level expertise in chemistry and biology   

We test AI models’ scientific knowledge using two privately developed test sets: Chemistry QA and Biology QA. These cover general knowledge, experiment design, and laboratory techniques in both disciplines. In 2024, we tested the first model to surpass biology PhD holders (who score an average of 40-50%) on our Biology QA set. Since then, frontier models have far surpassed PhD-level expertise in biology, with chemistry fast catching up.

[Figure] Frontier model performance over time on AISI’s chemistry and biology question-answer (QA) evaluations, relative to expert baseline scores (48% for Chemistry QA and 38% for Biology QA). Human baselines were established with PhD holders or equivalent professionals (e.g. 4+ years in biosecurity policy) in chemistry or biology.

Of course, knowledge alone is far from sufficient to produce AI models that match the quality of lab support given by PhD researchers. Our evaluations also test a broader suite of skills, including protocol generation and lab troubleshooting, where we’ve seen considerable progress in our two years of testing.
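The baseline comparison described above can be sketched in a few lines. This is a minimal, hypothetical illustration using the expert baseline figures quoted in the report (48% for Chemistry QA, 38% for Biology QA); the function name and model scores are illustrative assumptions, not AISI's actual data or tooling.

```python
# Expert baseline accuracies quoted in the report's figure caption.
EXPERT_BASELINES = {"Chemistry QA": 0.48, "Biology QA": 0.38}

def exceeds_baseline(model_scores: dict[str, float]) -> dict[str, bool]:
    """Flag each QA set on which a model's accuracy beats the expert baseline.

    `model_scores` maps QA set name to the model's accuracy (0.0-1.0).
    Scores here are hypothetical, for illustration only.
    """
    return {
        qa: score > EXPERT_BASELINES[qa]
        for qa, score in model_scores.items()
    }

# Illustrative model scoring 55% on chemistry and 70% on biology:
print(exceeds_baseline({"Chemistry QA": 0.55, "Biology QA": 0.70}))
# → {'Chemistry QA': True, 'Biology QA': True}
```

Note that a binary beats-the-baseline flag compresses away the trend information the report emphasises; the figure itself tracks scores over time rather than a single threshold crossing.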

 Read more of our findings on chemistry & biology capabilities. 


 AI models are improving at cyber tasks across all difficulty levels 

 We evaluate models on a suite of cyber evaluations that test for capabilities such as identifying code vulnerabilities or developing malware. This helps us understand how they could be used for both defensive and offensive purposes. 

 

... (truncated, 10 KB total)
Resource ID: 8a9de448c7130623 | Stable ID: OGEzYTgwNj