Advancing red teaming with people and AI
web
Credibility Rating: 4/5 (High)
High quality. Established institution or organization with editorial oversight and accountability.
Rating inherited from publication venue: OpenAI
Published by OpenAI, this piece describes their evolving red teaming methodology combining human experts and automated tools, relevant to practitioners working on pre-deployment safety evaluations and adversarial testing frameworks.
Metadata
Importance: 62/100 · blog post · primary source
Summary
OpenAI examines the combination of human red teamers and automated AI-assisted red teaming to more systematically and scalably identify vulnerabilities in AI models. The research explores how diverse external red teams and automated methods complement each other to improve coverage of potential harms and failure modes.
Key Points
- Combines human red teaming expertise with automated AI-driven approaches to broaden vulnerability discovery in language models.
- External red teamers bring diverse perspectives, domain expertise, and creative attack strategies that automated systems may miss.
- Automated red teaming can scale testing efforts and explore large prompt spaces more efficiently than human-only approaches.
- The hybrid approach aims to systematically surface safety risks before model deployment, supporting safer release practices.
- Findings inform iterative safety improvements and help benchmark model robustness against adversarial inputs.
Review
OpenAI's red teaming research takes a proactive approach to identifying and mitigating risks in AI systems. By combining external human expertise with automated testing, it aims for more comprehensive safety evaluations that capture diverse failure modes and misuse scenarios. The methodology centers on carefully designed testing campaigns: selecting diverse experts, creating structured testing interfaces, and developing automated techniques that generate novel and effective attack strategies. Notably, the work uses more capable AI models such as GPT-4 to improve the diversity and effectiveness of red teaming, in effect using AI to improve AI safety. The research acknowledges limitations, including findings whose relevance may fade over time and potential information hazards, but it is an important step toward more robust AI risk assessment.
Cached Content Preview
HTTP 200 · Fetched Apr 9, 2026 · 13 KB
Advancing red teaming with people and AI | OpenAI
The Wayback Machine - http://web.archive.org/web/20260211111316/https://openai.com/index/advancing-red-teaming-with-people-and-ai/
Table of contents
The value of red teaming
External human red teaming
Automated red teaming
Limitations
November 21, 2024
Publication
Advancing red teaming with people and AI
Two new papers show how our external and automated red teaming efforts are advancing to help deliver safe and beneficial AI
External red teaming
Automated red teaming
Interacting with an AI system is an essential way to learn what it can do—both the capabilities it has, and the risks it may pose. “Red teaming” means using people or AI to explore a new system’s potential risks in a structured way.
OpenAI has applied red teaming for a number of years, including when we engaged external experts to test our DALL·E 2 image generation model in early 2022. Our earliest red teaming efforts were primarily “manual” in the sense that we relied on people to conduct testing. Since then we’ve continued to use and refine our methods, and last July, we joined other leading labs in a commitment to invest further in red teaming and advance this research area.
Red teaming methods include manual, automated, and mixed approaches, and we use all three. We engage outside experts in both manual and automated methods of testing for new systems’ potential risks. At the same time, we are optimistic that we can use more powerful AI to scale the discovery of model mistakes, both for evaluating models and to train them to be safer.
Today, we are sharing two paper
... (truncated, 13 KB total)
Resource ID: ea71869e9fa90e9d | Stable ID: sid_xwhnT8qJWG