Longterm Wiki

Unsolved Problems in ML Safety

paper

Authors

Dan Hendrycks, Nicholas Carlini, John Schulman, Jacob Steinhardt

Credibility Rating

3/5 (Good)

Good quality. Reputable source with community review or editorial standards, but less rigorous than peer-reviewed venues.

Rating inherited from publication venue: arXiv

Foundational paper providing a comprehensive roadmap of unsolved technical problems in ML safety, addressing emerging challenges from large-scale models and establishing research priorities for the field.

Paper Details

Citations
0
21 influential
Year
2021
Methodology
preprint
Categories
Unsolved Problems in Machine Learning

Metadata

arXiv preprint · primary source

Abstract

Machine learning (ML) systems are rapidly increasing in size, are acquiring new capabilities, and are increasingly deployed in high-stakes settings. As with other powerful technologies, safety for ML should be a leading research priority. In response to emerging safety challenges in ML, such as those introduced by recent large-scale models, we provide a new roadmap for ML Safety and refine the technical problems that the field needs to address. We present four problems ready for research, namely withstanding hazards ("Robustness"), identifying hazards ("Monitoring"), steering ML systems ("Alignment"), and reducing deployment hazards ("Systemic Safety"). Throughout, we clarify each problem's motivation and provide concrete research directions.

Summary

This paper presents a comprehensive roadmap for ML safety research, identifying four problem areas the field must address as machine learning systems grow larger and are deployed in high-stakes applications: Robustness (withstanding hazards), Monitoring (identifying hazards), Alignment (steering ML systems toward intended objectives), and Systemic Safety (reducing deployment hazards). By clarifying the motivation behind each problem and providing concrete research directions, the paper aims to guide the ML safety research community toward the emerging safety challenges posed by large-scale models.
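As an illustration of the "Monitoring" problem area, anomaly detection research often begins with simple confidence-based baselines. The sketch below (a toy example for this summary, not code from the paper) flags inputs whose maximum softmax probability is low, the intuition being that a classifier tends to be less confident on anomalous inputs:

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def msp_score(logits):
    """Maximum softmax probability: lower values suggest possible anomalies."""
    return softmax(logits).max(axis=-1)

# Toy logits: a confidently classified input vs. a diffuse, uncertain one.
in_dist = np.array([6.0, 0.5, 0.2])
anomalous = np.array([1.1, 1.0, 0.9])

# The confident input receives a higher score; a threshold on this score
# would flag the diffuse input for review.
print(msp_score(in_dist), msp_score(anomalous))
```

This baseline has well-known failure modes (e.g., overconfidence on adversarial or far-from-distribution inputs), which is part of why the paper treats hazard identification as an open research problem rather than a solved one.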

Cited by 2 pages

| Page | Type | Quality |
| --- | --- | --- |
| Center for AI Safety | Organization | 42.0 |
| Dan Hendrycks | Person | 19.0 |

Cached Content Preview

HTTP 200 · Fetched Mar 20, 2026 · 98 KB
# Unsolved Problems in ML Safety

Dan Hendrycks (UC Berkeley), Nicholas Carlini (Google), John Schulman (OpenAI), Jacob Steinhardt (UC Berkeley)

###### Abstract

Machine learning (ML) systems are rapidly increasing in size, are acquiring new capabilities, and are increasingly deployed in high-stakes settings. As with other powerful technologies, safety for ML should be a leading research priority. In response to emerging safety challenges in ML, such as those introduced by recent large-scale models, we provide a new roadmap for ML Safety and refine the technical problems that the field needs to address.
We present four problems ready for research, namely withstanding hazards (“Robustness”), identifying hazards (“Monitoring”), steering ML systems (“Alignment”), and reducing deployment hazards (“Systemic Safety”). Throughout, we clarify each problem’s motivation and provide concrete research directions.

## 1 Introduction

As machine learning (ML) systems are deployed in high-stakes environments, such as medical settings \[ [147](https://ar5iv.labs.arxiv.org/html/2109.13916#bib.bibx147 "")\], roads \[ [185](https://ar5iv.labs.arxiv.org/html/2109.13916#bib.bibx185 "")\], and command and control centers \[ [39](https://ar5iv.labs.arxiv.org/html/2109.13916#bib.bibx39 "")\], unsafe ML systems may result in needless loss of life. Although researchers recognize that safety is important \[ [1](https://ar5iv.labs.arxiv.org/html/2109.13916#bib.bibx1 ""), [5](https://ar5iv.labs.arxiv.org/html/2109.13916#bib.bibx5 "")\],
it is often unclear what problems to prioritize or how to make progress.
We identify four problem areas that would help make progress on ML Safety: robustness, monitoring, alignment, and systemic safety. While some of these, such as robustness, are long-standing challenges, the success and emergent capabilities of modern ML systems necessitate new angles of attack.

We define ML Safety research as ML research aimed at making the adoption of ML more beneficial, with emphasis on long-term and long-tail risks.
We focus on cases where greater capabilities can be expected to decrease safety, or where ML Safety problems are otherwise poised to become more challenging in this decade.
For each of the four problems, after clarifying the motivation, we discuss possible research directions that can be started or continued in the next few years.
First, however, we motivate the need for ML Safety research.

We should not procrastinate on safety engineering. In a report for the Department of Defense, Frola and Miller \[ [55](https://ar5iv.labs.arxiv.org/html/2109.13916#bib.bibx55 "")\] observe that approximately 75% of the most critical decisions that determine a system’s safety occur early in development \[ [121](https://ar5iv.labs.arxiv.org/html/2109.13916#bib.bibx121 "")\]. If attention to safety is delayed, its impact is limited, as unsafe design choices become deeply embedded into the system.
The Internet was initially designed as an academ

... (truncated, 98 KB total)