Longterm Wiki

RAND Corporation Press Release: AI Safety Research (May 2024)

Source type: web

Credibility Rating

4/5 (High)

High quality. Established institution or organization with editorial oversight and accountability.

Rating inherited from publication venue: RAND Corporation

This RAND press release lacks retrievable content for full analysis; users should visit the URL directly to assess relevance to specific AI safety governance or policy topics.

Metadata

Importance: 30/100
Tags: press release, news

Summary

This is a RAND Corporation press release from May 2024. Without access to the full content, it likely covers policy-relevant research on AI safety, governance, or national security implications of advanced AI systems, consistent with RAND's focus areas.

Key Points

  • Published by RAND Corporation, a leading policy research organization with significant AI governance work
  • Dated May 30, 2024, placing it in the current wave of AI safety policy discourse
  • RAND has been active in AI risk assessment, governance frameworks, and national security AI analysis
  • Press releases typically summarize major research findings intended for policymakers and the public

Cited by 1 page

| Page | Type | Quality |
| --- | --- | --- |
| Open Source AI Safety | Approach | 62.0 |

Cached Content Preview

HTTP 200 · Fetched Mar 20, 2026 · 8 KB

# RAND Study Highlights Importance of Securing AI Model Weights; Provides Playbook for Frontier AI Labs to Benchmark Security Measures

For Release: Thursday, May 30, 2024

Amid the rapid advancement of artificial intelligence (AI) and its potential risks to national security, a new RAND [study](https://www.rand.org/pubs/research_reports/RRA2849-1.html) explores how best to secure frontier AI models from malicious actors.

Where most studies have focused on the security of AI systems more broadly, this study focuses on the potential theft and misuse of foundation AI model weights—the learnable parameters derived by training the model on massive data sets—and details how promising security measures can be adapted specifically for model weights.

Specifically, it highlights several measures that frontier AI labs should prioritize now to safeguard model weights: centralizing all copies of weights to a limited number of access-controlled and monitored systems; reducing the number of people with authorization; hardening interfaces against weight exfiltration; engaging third-party red-teaming; investing in defense-in-depth for redundancy; implementing insider threat programs; and incorporating Confidential Computing to secure the weights and reduce the attack surface. None of these are widely implemented, but all are feasible to achieve within a year, according to the report.
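
The measures listed above are organizational and infrastructural, but the first two (centralizing copies on monitored systems and keeping the authorization list small) can be sketched in code as a reading aid. The Python fragment below is a toy illustration only, using hypothetical names (`WeightStore`, `AUTHORIZED_USERS`); it is not drawn from the RAND playbook.

```python
# Illustrative sketch only, not from the RAND report. Hypothetical names
# (WeightStore, AUTHORIZED_USERS) show the flavor of "centralized, monitored,
# least-privilege" access to model weights.
import datetime
import logging

logging.basicConfig(level=logging.INFO)
audit_log = logging.getLogger("weight-access-audit")

# Keep the authorization list deliberately small and centrally managed.
AUTHORIZED_USERS = {"alice@lab.example", "bob@lab.example"}


class WeightStore:
    """Single, monitored point of access to a model's weight files."""

    def __init__(self, path: str):
        self.path = path

    def fetch(self, user: str, reason: str) -> bytes:
        # Log every access attempt, allowed or denied (audit trail / monitoring).
        audit_log.info("weight access attempt: user=%s reason=%s time=%s",
                       user, reason, datetime.datetime.utcnow().isoformat())
        if user not in AUTHORIZED_USERS:
            raise PermissionError(f"{user} is not authorized to read weights")
        with open(self.path, "rb") as f:
            return f.read()
```

A real deployment would go further than this sketch, for example with rate limits, exfiltration-resistant serving interfaces instead of raw byte access, and hardware-backed protections along the lines of the Confidential Computing measure named above.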

“Until recently, AI security was primarily a commercial concern, but as the technology becomes more capable, it's increasingly important to ensure these technologies don't end up in the hands of bad actors that could exploit them,” said [Sella Nevo](https://www.rand.org/about/people/n/nevo_sella.html), director of RAND's Meselson Center and one of the report's authors. “Not only does this study offer a first-of-its-kind playbook for AI companies to defend against the most sophisticated attacks, it also strives to facilitate meaningful engagement between policymakers, AI developers, and other stakeholders on risk management strategies and the broader impact of AI security.”

Additionally, the study provides a framework for assessing the feasibility of different attacks based on the resources and expertise available to various types of attackers and proposes a list of security benchmarks to fortify security systems. It pinpoints 38 distinct attack vectors across nine categories, from more mundane threats like basic social engineering schemes to more severe (and rare) measures like a military takeover. It also highlights five operational categories of attacker capabilities, ranging from low-budget amateur individuals to highly resourced nation-states, and assesses the feasibility of each group to successfully execute the identified attack vectors.
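
As a reading aid, that framework can be pictured as a lookup from (attacker capability, attack vector) pairs to an assessed feasibility score. The Python sketch below is purely illustrative: its enum members, vector names, and numbers are hypothetical stand-ins, not the report's actual 38 vectors, nine categories, or findings.

```python
# Illustrative sketch only: one way to represent capability categories crossed
# with attack vectors as a feasibility table. All names and scores below are
# placeholders, not the report's taxonomy or results.
from dataclasses import dataclass
from enum import Enum


class AttackerCapability(Enum):
    AMATEUR = 1          # low-budget individuals
    PROFESSIONAL = 2
    ORGANIZED_GROUP = 3
    STATE_SUPPORTED = 4
    NATION_STATE = 5     # highly resourced nation-states


@dataclass(frozen=True)
class AttackVector:
    name: str
    category: str        # e.g., one of the report's nine vector categories


# Assessed feasibility (rough probability of success) per capability/vector
# pair; the 0.2 entry is just a placeholder near the "less than 20 percent"
# amateur example quoted below.
FEASIBILITY: dict[tuple[AttackerCapability, str], float] = {
    (AttackerCapability.AMATEUR, "exploit existing ML-stack vulnerability"): 0.2,
    (AttackerCapability.NATION_STATE, "exploit existing ML-stack vulnerability"): 0.9,
}


def feasibility(capability: AttackerCapability, vector: AttackVector) -> float:
    """Look up an assessed feasibility score, defaulting to 0.0 if unassessed."""
    return FEASIBILITY.get((capability, vector.name), 0.0)
```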

For example, an amateur attacker might have a less than 20 percent chance to discover and exploit an existing vulnerability in a model's machine learn

... (truncated, 8 KB total)
Resource ID: 8300b324baea38ca | Stable ID: Y2FkNTJlNm