Longterm Wiki

[PDF] METR's AI Action Plan Comment

web

Credibility Rating

4/5
High(4)

High quality. Established institution or organization with editorial oversight and accountability.

Rating inherited from publication venue: METR

Metadata

Cited by 1 page

| Page | Type | Quality |
| --- | --- | --- |
| Third-Party Model Auditing | Approach | 64.0 |

Cached Content Preview

HTTP 200 | Fetched Apr 30, 2026 | 33 KB

# Model Evaluation and Threat Research

**— 440 N Barranca Ave #3345 Covina, CA 91723 —**

## March 15, 2025

Faisal D’Souza, NCO
2415 Eisenhower Avenue
Alexandria, VA 22314

## Response to Request for Information on the Development of an Artificial Intelligence (AI) Action Plan

METR is a research nonprofit focused on developing the science of assessing AI systems for capabilities that could pose catastrophic risks to public safety and national security. In 2024 and early 2025, METR conducted evaluations of systems like GPT-4.5 [1] and Claude 3.5 Sonnet [2] in partnership with their respective developers, typically before they were released to the public. METR has also advised many of the leading AI developers to help them establish frameworks for scaling up their risk mitigations in proportion to the measured or forecasted abilities of their systems [3].

METR appreciates the opportunity to provide input into the shaping of the upcoming Artificial Intelligence Action Plan. We believe that this is a critical time for the U.S. government and other actors to plan soberly for the risks (and opportunities) that may come from further significant advancements in the state of AI capability. This response begins by outlining a few considerations that we believe are important for informing the general direction of the Plan. It then recommends several concrete actions that we believe merit prioritization within the Plan.

[1] [https://cdn.openai.com/gpt-4-5-system-card-2272025.pdf](https://cdn.openai.com/gpt-4-5-system-card-2272025.pdf)

[2] [https://metr.org/blog/2025-01-31-update-sonnet-o1-evals/](https://metr.org/blog/2025-01-31-update-sonnet-o1-evals/)

[3] [http://metr.org/faisc](http://metr.org/faisc)

* * *

# Summary

We believe it will be important to plan around the following three major factors:

1. AI systems are rapidly improving at carrying out tasks independently over long time horizons.

2. AI systems are improving on AI R&D tasks, in a feedback loop that could be very destabilizing.

3. AI systems will plausibly end up with unintended and/or malign goals, conceal this fact, and resist attempts to remove these goals.

In light of these factors, we recommend that the AI Action Plan include at least the following four priority actions:

1. Collaborate with the private sector to improve the information security and internal controls achievable by the leading U.S. AI developers.

2. Lead creation of narrowly-tailored formal standards for how and when AI developers should measure:
   a. Capabilities that would directly threaten public safety or national security, specifically CBRN weapons development and cyber-offense.
   b. Capabilities that would amplify risks from AI systems, specifically long-horizon autonomy, automated AI R&D, and control subversion.

3. Prepare and consider interventions on the AI development and deployment process, as future circumstances may warrant, to:
   a. Require reporting of training runs and deployments above some threshold 

... (truncated, 33 KB total)
Resource ID: 14441e0a6692bf9c | Stable ID: sid_qFMWOawMHA