Longterm Wiki

Weapons of Mass Destruction Proxy Benchmark (WMDP)

Website: [wmdp.ai](https://www.wmdp.ai/)

WMDP is a key benchmark for assessing WMD-related hazardous capabilities in LLMs, relevant to AI safety evaluations conducted by labs and regulators to gate model deployment decisions.

Metadata

Importance: 72/100 · tool page · tool

Summary

WMDP is a benchmark designed to measure hazardous knowledge in large language models related to biosecurity, cybersecurity, and chemical security. It serves as a proxy for assessing dangerous capabilities in AI systems and supports unlearning research aimed at reducing such risks. The benchmark helps researchers identify and mitigate the potential for LLMs to assist in developing biological, cyber, and chemical attack capabilities.

Key Points

  • Provides a multiple-choice benchmark of 3,668 questions across biosecurity, cybersecurity, and chemical security to assess hazardous LLM knowledge.
  • Designed to support machine unlearning techniques that can selectively remove dangerous knowledge from models without degrading general capabilities.
  • Helps AI developers and safety researchers identify models that may inadvertently provide uplift toward weapons of mass destruction.
  • Serves as both an evaluation tool and a red-teaming resource for responsible AI deployment decisions.
  • Accompanies RMU, a state-of-the-art unlearning method for reducing hazardous knowledge while preserving model utility.

Cited by 2 pages

Cached Content Preview

HTTP 200 · Fetched Mar 20, 2026 · 4 KB
# The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning

The WMDP Team

[Paper](https://arxiv.org/abs/2403.03218)

[GitHub](https://github.com/centerforaisafety/wmdp)

[Collection](https://huggingface.co/collections/cais/wmdp-benchmark-661ed0abb589122164900e0e)

[Blog](https://www.safe.ai/blog/wmdp-benchmark)

## Introduction

The **W**eapons of **M**ass **D**estruction **P**roxy (WMDP) benchmark is a dataset of 3,668 multiple-choice questions surrounding hazardous knowledge in biosecurity, cybersecurity, and chemical security. WMDP serves as both a proxy evaluation for hazardous knowledge in large language models (LLMs) and a benchmark for unlearning methods to remove such knowledge.
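As a concrete illustration of the dataset's structure, the sketch below loads one domain split with the Hugging Face `datasets` library. It assumes the benchmark is published under the `cais/wmdp` dataset id with per-domain configs (`wmdp-bio`, `wmdp-cyber`, `wmdp-chem`) and `question`/`choices`/`answer` fields; consult the linked collection for the authoritative names.

```python
# Minimal sketch: inspecting WMDP questions via Hugging Face datasets.
# Dataset id, config names, and field names are assumptions taken from the
# linked collection; verify them against the actual dataset card.
from datasets import load_dataset

bio = load_dataset("cais/wmdp", "wmdp-bio", split="test")
print(f"{len(bio)} biosecurity questions")

example = bio[0]
print(example["question"])
for i, choice in enumerate(example["choices"]):
    print(f"  ({chr(ord('A') + i)}) {choice}")
print("gold answer index:", example["answer"])
```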

To guide progress on mitigating risk from LLMs, we develop RMU, a state-of-the-art unlearning method which reduces model performance on WMDP while maintaining general language model capabilities.

![outline](https://www.wmdp.ai/images/dataset.svg)

## Evaluating Risk From LLMs

![outline](https://www.wmdp.ai/images/three_panel.svg)

To measure risks of malicious use, government institutions and major AI labs are developing evaluations for hazardous capabilities in LLMs. However, current evaluations are private and are limited to a narrow range of misuse scenarios. To fill these gaps, we publicly release WMDP, an expert-written dataset which measures how LLMs could aid malicious actors in developing biological, cyber, and chemical attack capabilities. To avoid releasing sensitive and export-controlled information, we collect questions that are precursors, neighbors, and components of the hazardous knowledge we wish to remove.
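Benchmarks of this form are typically scored zero-shot by comparing the model's likelihood of each answer letter. The sketch below shows one such scoring loop with `transformers`; the prompt template and model name are placeholders rather than the paper's exact evaluation setup, so see the GitHub repository for the recommended pipeline.

```python
# Hedged sketch of zero-shot multiple-choice scoring: pick the answer letter
# the model assigns the highest log-probability after the prompt. The prompt
# template and model name are illustrative, not the paper's exact setup.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"  # placeholder; substitute the model under evaluation
LETTERS = ["A", "B", "C", "D"]

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME).eval()

def predict(question: str, choices: list[str]) -> int:
    """Return the index of the choice whose letter the model finds most likely."""
    prompt = question + "\n"
    for letter, choice in zip(LETTERS, choices):
        prompt += f"{letter}. {choice}\n"
    prompt += "Answer:"
    scores = []
    for letter in LETTERS[: len(choices)]:
        ids = tokenizer(prompt + " " + letter, return_tensors="pt").input_ids
        with torch.no_grad():
            logits = model(ids).logits
        # log-probability of the final token (assumes the letter is one token)
        log_probs = torch.log_softmax(logits[0, -2], dim=-1)
        scores.append(log_probs[ids[0, -1]].item())
    return int(torch.tensor(scores).argmax())

# Accuracy over a split loaded as in the previous snippet would then be
# sum(predict(ex["question"], ex["choices"]) == ex["answer"] for ex in split) / len(split)
```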

## Mitigating Risk from LLMs

![outline](https://www.wmdp.ai/images/pipeline.svg)

On closed-source, API-access models such as GPT-4 and Gemini, adversaries can deploy adversarial attacks or harmful API fine-tuning to bypass models’ refusal training and unlock their knowledge. Fortunately, model providers have leverage, as they may apply safety interventions before serving the model. In particular, providers may perform machine unlearning to directly remove hazardous knowledge from models. Unlearned models have higher inherent safety: even if they are jailbroken, unlearned models lack the knowledge to be repurposed for malicious use.

To guide progress on unlearning, we develop RMU, a state-of-the-art method inspired by [representation engineering](https://www.ai-transparency.org/), which improves unlearning precision: removing dangerous knowledge while preserving general model capabilities. RMU significantly reduces model performance on WMDP while mostly retaining general capabilities.
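The description above suggests a simple shape for an RMU-style training objective: on hazardous ("forget") text, steer the model's activations at a chosen layer toward a fixed random control vector; on benign ("retain") text, keep them close to a frozen copy of the original model. The sketch below is a rough paraphrase of that idea in PyTorch; the layer index, control-vector scale, and retain weight are illustrative assumptions rather than the published hyperparameters, so refer to the paper and GitHub repository for the actual method.

```python
# Rough sketch of an RMU-style unlearning loss, paraphrasing the description
# above. Layer index, control-vector scale c, and retain weight alpha are
# illustrative assumptions, not the published settings.
import torch
import torch.nn.functional as F

def make_control_vector(hidden_size: int, c: float = 10.0) -> torch.Tensor:
    """Sample a fixed random direction once and scale it by c."""
    u = torch.rand(hidden_size)
    return c * u / u.norm()

def rmu_loss(updated_model, frozen_model, forget_ids, retain_ids,
             control_vec, layer: int = 7, alpha: float = 100.0):
    """Forget loss steers activations to the control vector; retain loss anchors them."""
    def hidden_states(model, ids, requires_grad):
        if requires_grad:
            out = model(ids, output_hidden_states=True)
        else:
            with torch.no_grad():
                out = model(ids, output_hidden_states=True)
        return out.hidden_states[layer]

    h_forget = hidden_states(updated_model, forget_ids, requires_grad=True)
    h_retain = hidden_states(updated_model, retain_ids, requires_grad=True)
    h_retain_ref = hidden_states(frozen_model, retain_ids, requires_grad=False)

    # Push activations on hazardous text toward the fixed random vector.
    forget_loss = F.mse_loss(h_forget, control_vec.expand_as(h_forget))
    # Keep activations on benign text close to the original (frozen) model.
    retain_loss = F.mse_loss(h_retain, h_retain_ref)
    return forget_loss + alpha * retain_loss
```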

... (truncated, 4 KB total)
Resource ID: cfa49cff8bb3ac32 | Stable ID: OTQ2ZWI5Mj