Unsolved Problems in ML Safety
Dan Hendrycks, Nicholas Carlini, John Schulman, Jacob Steinhardt
Credibility Rating
Good quality. Reputable source with community review or editorial standards, but less rigorous than peer-reviewed venues.
Rating inherited from publication venue: arXiv
Foundational paper providing a comprehensive roadmap of unsolved technical problems in ML safety, addressing emerging challenges from large-scale models and establishing research priorities for the field.
Paper Details
Metadata
Abstract
Machine learning (ML) systems are rapidly increasing in size, are acquiring new capabilities, and are increasingly deployed in high-stakes settings. As with other powerful technologies, safety for ML should be a leading research priority. In response to emerging safety challenges in ML, such as those introduced by recent large-scale models, we provide a new roadmap for ML Safety and refine the technical problems that the field needs to address. We present four problems ready for research, namely withstanding hazards ("Robustness"), identifying hazards ("Monitoring"), reducing inherent model hazards ("Alignment"), and reducing systemic hazards ("Systemic Safety"). Throughout, we clarify each problem's motivation and provide concrete research directions.
Summary
This paper presents a comprehensive roadmap for ML safety research, identifying four critical problem areas that the field must address as machine learning systems grow larger and are deployed in high-stakes applications. The authors categorize safety challenges into Robustness (withstanding hazards), Monitoring (identifying hazards), Alignment (reducing inherent model hazards), and Systemic Safety (reducing systemic hazards). By clarifying the motivation behind each problem and providing concrete research directions, the paper aims to guide the ML safety research community toward addressing emerging safety challenges posed by large-scale models.
Cited by 2 pages
| Page | Type | Quality |
|---|---|---|
| Center for AI Safety | Organization | 42.0 |
| Dan Hendrycks | Person | 19.0 |
Cached Content Preview
# Unsolved Problems in ML Safety
Dan Hendrycks
UC Berkeley
Nicholas Carlini
Google
John Schulman
OpenAI
Jacob Steinhardt
UC Berkeley
###### Abstract
Machine learning (ML) systems are rapidly increasing in size, are acquiring new capabilities, and are increasingly deployed in high-stakes settings. As with other powerful technologies, safety for ML should be a leading research priority. In response to emerging safety challenges in ML, such as those introduced by recent large-scale models, we provide a new roadmap for ML Safety and refine the technical problems that the field needs to address.
We present four problems ready for research, namely withstanding hazards (“Robustness”), identifying hazards (“Monitoring”), steering ML systems (“Alignment”), and reducing deployment hazards (“Systemic Safety”). Throughout, we clarify each problem’s motivation and provide concrete research directions.
## 1 Introduction
As machine learning (ML) systems are deployed in high-stakes environments, such as medical settings \[ [147](https://ar5iv.labs.arxiv.org/html/2109.13916#bib.bibx147 "")\], roads \[ [185](https://ar5iv.labs.arxiv.org/html/2109.13916#bib.bibx185 "")\], and command and control centers \[ [39](https://ar5iv.labs.arxiv.org/html/2109.13916#bib.bibx39 "")\], unsafe ML systems may result in needless loss of life. Although researchers recognize that safety is important \[ [1](https://ar5iv.labs.arxiv.org/html/2109.13916#bib.bibx1 ""), [5](https://ar5iv.labs.arxiv.org/html/2109.13916#bib.bibx5 "")\],
it is often unclear what problems to prioritize or how to make progress.
We identify four problem areas that would help make progress on ML Safety: robustness, monitoring, alignment, and systemic safety. While some of these, such as robustness, are long-standing challenges, the success and emergent capabilities of modern ML systems necessitate new angles of attack.
We define ML Safety research as ML research aimed at making the adoption of ML more beneficial, with emphasis on long-term and long-tail risks.
We focus on cases where greater capabilities can be expected to decrease safety, or where ML Safety problems are otherwise poised to become more challenging in this decade.
For each of the four problems, after clarifying the motivation, we discuss possible research directions that can be started or continued in the next few years.
First, however, we motivate the need for ML Safety research.
We should not procrastinate on safety engineering. In a report for the Department of Defense, Frola and Miller \[ [55](https://ar5iv.labs.arxiv.org/html/2109.13916#bib.bibx55 "")\] observe that approximately 75% of the most critical decisions that determine a system’s safety occur early in development \[ [121](https://ar5iv.labs.arxiv.org/html/2109.13916#bib.bibx121 "")\]. If attention to safety is delayed, its impact is limited, as unsafe design choices become deeply embedded into the system.
The Internet was initially designed as an academ
... (truncated, 98 KB total)