Strengthening Cyber Resilience as AI Capabilities Advance
Credibility Rating
High quality. Established institution or organization with editorial oversight and accountability.
Rating inherited from publication venue: OpenAI
OpenAI's December 2025 post outlines their approach to managing rapidly advancing AI cybersecurity capabilities, including safeguards for models approaching 'High' capability levels (zero-day exploits, enterprise intrusion), relevant to AI safety's dual-use risk management and deployment safety.
Metadata
Summary
OpenAI describes how their models' cybersecurity capabilities have rapidly improved (27% to 76% on CTF benchmarks from August to November 2025) and outlines a defense-in-depth safeguard strategy for models approaching 'High' capability levels. The post details layered mitigations including model training, detection systems, access controls, and partnerships with security experts. OpenAI frames this as a long-term investment to ensure advanced AI primarily benefits defenders rather than enabling malicious actors.
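To put the benchmark jump in perspective, here is a minimal arithmetic sketch using only the two CTF figures quoted in the summary. The variable names are illustrative, and nothing here says how the benchmark itself is scored.

```python
# CTF success rates as reported in the summary above.
before = 0.27  # GPT-5, August 2025
after = 0.76   # GPT-5.1-Codex-Max, November 2025

# Absolute change, expressed in percentage points.
point_gain = (after - before) * 100

# How many times higher the later score is than the earlier one.
ratio = after / before

# Relative change, expressed as a percentage of the earlier score.
relative_gain = (after - before) / before * 100

print(f"{point_gain:.0f} percentage points")
print(f"{ratio:.2f}x the earlier score")
print(f"{relative_gain:.0f}% relative increase")
```

The gain is 49 percentage points, meaning the later score is roughly 2.8 times the earlier one. Quoting the absolute and relative figures together avoids overstating or understating the change.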
Key Points
- CTF benchmark performance jumped from 27% (GPT-5, Aug 2025) to 76% (GPT-5.1-Codex-Max, Nov 2025), signaling rapid capability growth.
- OpenAI is planning as if each new model could reach 'High' cybersecurity capability: zero-day exploits or complex enterprise intrusion operations.
- A defense-in-depth approach is used: access controls, infrastructure hardening, egress controls, monitoring, and model-level refusals.
- Dual-use nature of cyber knowledge means no single safeguard suffices; layered safety stack balances risk while empowering legitimate defenders.
- OpenAI frames this as a sustained long-term investment in giving defenders an advantage across critical infrastructure ecosystems.
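The defense-in-depth idea in the key points can be sketched as a chain of independent checks, where a request proceeds only if no layer objects. This is a toy illustration, not OpenAI's implementation: the layer names echo the list above, but the `Request` fields, the allowed-destination set, and the `evaluate` function are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Request:
    # All fields are hypothetical stand-ins for real signals.
    user_verified: bool        # did the caller pass an access-control check?
    destination: str           # where the output would be sent
    flagged_by_monitor: bool   # did a detection system raise an alert?


# A layer returns a reason string when it blocks, or None when it passes.
Layer = Callable[[Request], Optional[str]]

ALLOWED_DESTINATIONS = {"internal"}  # assumed allow-list for the egress check


def access_control(req: Request) -> Optional[str]:
    return None if req.user_verified else "access control: unverified user"


def egress_control(req: Request) -> Optional[str]:
    if req.destination in ALLOWED_DESTINATIONS:
        return None
    return f"egress control: destination '{req.destination}' not allowed"


def monitoring(req: Request) -> Optional[str]:
    return "monitoring: request flagged" if req.flagged_by_monitor else None


LAYERS: List[Layer] = [access_control, egress_control, monitoring]


def evaluate(req: Request, layers: List[Layer] = LAYERS) -> List[str]:
    """Run every layer and collect block reasons; an empty list means allowed."""
    return [reason for layer in layers if (reason := layer(req)) is not None]


print(evaluate(Request(True, "internal", False)))   # no layer objects
print(evaluate(Request(False, "external", True)))   # all three layers object
```

Because every layer runs independently, a request that slips past one check can still be caught by another, which is the property the "no single safeguard suffices" point relies on.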
Cited by 2 pages
| Page | Type | Quality |
|---|---|---|
| AI Misuse Risk Cruxes | Crux | 65.0 |
| AI Cyber Damage: Bounding the Tail | Analysis | -- |
Cached Content Preview
OpenAI December 10, 2025
Security Strengthening cyber resilience as AI capabilities advance
As our models grow more capable in cybersecurity, we’re investing in strengthening them, layering in safeguards, and partnering with global security experts.
Cyber capabilities in AI models are advancing rapidly, bringing meaningful benefits for cyberdefense as well as new dual-use risks that must be managed carefully. For example, capabilities assessed through capture-the-flag (CTF) challenges have improved from 27% on GPT‑5 in August 2025 to 76% on GPT‑5.1‑Codex‑Max in November 2025.
We expect that upcoming AI models will continue on this trajectory; in preparation, we are planning and evaluating as though each new model could reach ‘High’ levels of cybersecurity capability, as measured by our Preparedness Framework. By this, we mean models that can either develop working zero-day remote exploits against well-defended systems, or meaningfully assist with complex, stealthy enterprise or industrial intrusion operations aimed at real-world effects. This post explains how we think about safeguards for models that reach these levels of capability, and ensure they meaningfully help defenders while limiting misuse.
As these capabilities advance, OpenAI is investing in strengthening our models for defensive cybersecurity tasks and creating tools that enable defenders to more easily perform workflows such as auditing code and patching vulnerabilities. Our goal is for our models and products to bring significant advantages for defenders, who are often outnumbered and under-resourced.
Like other dual-use domains, defensive and offensive cyber workflows often rely on the same underlying knowledge and techniques. We are investing in safeguards to help ensure these powerful capabilities primarily benefit defensive uses and limit uplift for malicious purposes. Cybersecurity touches almost every field, which means we cannot rely on any single category of safeguards—such as restricting knowledge or using vetted access alone—but instead need a defense-in-depth approach that balances risk and empowers users. In practice, this means shaping how capabilities are accessed, guided, and applied so that advanced models strengthen security rather than lower barriers to misuse.
We see this work not as a one-time effort, but as a sustained, long-term investment in giving defenders an advantage and continually strengthening the security posture of the critical infrastructure across the broader ecosystem.
Mitigating malicious uses
Our models are designed and trained to operate safely, supported by proactive systems that detect and respond to cyber abuse. We continuously refine these protections as our capabilities and the threat landscape change. While no system can guarantee complete prevention of misuse in cybersecurity without severely impacting defensive uses, our
... (truncated, 9 KB total)
e550a2466989b110 | Stable ID: sid_fKeHMbRP8w