Longterm Wiki

Advancing red teaming with people and AI

web

Credibility Rating

4/5
High(4)

High quality. Established institution or organization with editorial oversight and accountability.

Rating inherited from publication venue: OpenAI

Published by OpenAI, this piece describes their evolving red teaming methodology combining human experts and automated tools, relevant to practitioners working on pre-deployment safety evaluations and adversarial testing frameworks.

Metadata

Importance: 62/100, blog post, primary source

Summary

OpenAI examines the combination of human red teamers and automated AI-assisted red teaming to more systematically and scalably identify vulnerabilities in AI models. The research explores how diverse external red teams and automated methods complement each other to improve coverage of potential harms and failure modes.

Key Points

  • Combines human red teaming expertise with automated AI-driven approaches to broaden vulnerability discovery in language models.
  • External red teamers bring diverse perspectives, domain expertise, and creative attack strategies that automated systems may miss.
  • Automated red teaming can scale testing efforts and explore large prompt spaces more efficiently than human-only approaches.
  • The hybrid approach aims to systematically surface safety risks before model deployment, supporting safer release practices.
  • Findings inform iterative safety improvements and help benchmark model robustness against adversarial inputs.

Review

OpenAI's red teaming research aims to identify and mitigate risks in AI systems before deployment. It combines external human expertise with automated testing so that safety evaluations cover a wider range of failure modes and misuse scenarios. The methodology involves structured testing campaigns: selecting diverse experts, building structured testing interfaces, and developing automated techniques that generate novel and effective attacks. Notably, the research uses more capable models such as GPT-4 to improve the diversity and effectiveness of red teaming, in effect using AI to improve AI safety. The research acknowledges limitations, including findings that lose relevance over time and potential information hazards, but it remains an important step toward more robust AI risk assessment.

Cached Content Preview

HTTP 200, fetched Apr 9, 2026, 13 KB
Advancing red teaming with people and AI | OpenAI

The Wayback Machine - http://web.archive.org/web/20260211111316/https://openai.com/index/advancing-red-teaming-with-people-and-ai/

OpenAI
Table of contents

The value of red teaming

External human red teaming

Automated red teaming 

Limitations

November 21, 2024
Publication

Advancing red teaming with people and AI 

Two new papers show how our external and automated red teaming efforts are advancing to help deliver safe and beneficial AI

External red teaming

Automated red teaming

Interacting with an AI system is an essential way to learn what it can do—both the capabilities it has, and the risks it may pose. “Red teaming” means using people or AI to explore a new system’s potential risks in a structured way. 

OpenAI has applied red teaming for a number of years, including when we engaged external experts to test our DALL·E 2 image generation model in early 2022. Our earliest red teaming efforts were primarily “manual” in the sense that we relied on people to conduct testing. Since then we’ve continued to use and refine our methods, and last July, we joined other leading labs in a commitment to invest further in red teaming and advance this research area.

Red teaming methods include manual, automated, and mixed approaches, and we use all three. We engage outside experts in both manual and automated methods of testing for new systems’ potential risks. At the same time, we are optimistic that we can use more powerful AI to scale the discovery of model mistakes, both for evaluating models and to train them to be safer.   

Today, we are sharing two paper

... (truncated, 13 KB total)
Resource ID: ea71869e9fa90e9d | Stable ID: sid_xwhnT8qJWG