Longterm Wiki

Intro to ML Safety Course

Web resource: [course.mlsafety.org](https://course.mlsafety.org/)

Created by the Center for AI Safety, this is one of the most comprehensive structured curricula for technical AI safety, suitable for ML practitioners seeking a systematic introduction to safety concepts and methods.

Metadata

Importance: 82/100 · Tags: documentation, educational

Summary

A structured university-level course on machine learning safety developed by the Center for AI Safety, covering topics from robustness and anomaly detection to alignment and systemic safety. The course includes lecture recordings, slides, notes, and coding assignments across modules on safety engineering, robustness, monitoring, alignment, and emerging risks.

Key Points

  • Covers safety engineering fundamentals: risk decomposition, accident models (Swiss Cheese, STAMP), and black swan risks
  • Technical modules on adversarial robustness, anomaly detection, and interpretable uncertainty with hands-on coding assignments
  • Includes alignment-focused content on goal misgeneralization, scalable oversight, and emergent capabilities
  • Draws on established safety engineering frameworks from other high-stakes industries and applies them to ML systems
  • Freely accessible with full lecture recordings, slides, written notes, and Colab notebooks for self-study or structured learning
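The adversarial-robustness module centers on attacks such as PGD (projected gradient descent). As a rough illustration of the idea, and not code from the course itself, the sketch below runs an untargeted L-infinity PGD attack against a toy NumPy logistic-regression model; the model, weights, and step parameters are all illustrative assumptions.

```python
import numpy as np

# Toy "trained" logistic classifier (illustrative, not from the course).
rng = np.random.default_rng(0)
w = rng.normal(size=8)
b = 0.1

def predict_prob(x):
    """P(y=1 | x) for the toy logistic model."""
    return 1.0 / (1.0 + np.exp(-(x @ w + b)))

def loss_grad(x, y):
    """Gradient of binary cross-entropy w.r.t. the input x."""
    return (predict_prob(x) - y) * w

def pgd_attack(x, y, eps=0.3, alpha=0.05, steps=20):
    """Untargeted L-inf PGD: ascend the loss, then project back
    into the eps-ball around x. (Standard PGD often also adds a
    random start inside the ball; omitted here for brevity.)"""
    x_adv = x.copy()
    for _ in range(steps):
        x_adv = x_adv + alpha * np.sign(loss_grad(x_adv, y))  # ascent step
        x_adv = np.clip(x_adv, x - eps, x + eps)              # projection
    return x_adv

x = rng.normal(size=8)
y = 1.0 if predict_prob(x) > 0.5 else 0.0  # attack the model's own label
x_adv = pgd_attack(x, y)
print(f"clean p={predict_prob(x):.3f}, adversarial p={predict_prob(x_adv):.3f}")
```

The projection step is what distinguishes PGD from plain gradient ascent: every iterate stays within the attacker's perturbation budget, which is the constraint the course's adversarial-evaluation material assumes.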

Cited by 2 pages

| Page | Type | Quality |
|---|---|---|
| Center for AI Safety | Organization | 42.0 |
| Dan Hendrycks | Person | 19.0 |

Cached Content Preview

HTTP 200 · Fetched Mar 20, 2026 · 8 KB

[Intro to ML Safety](https://course.mlsafety.org/)

> [Express interest in the next semester of Intro to ML Safety](https://airtable.com/appKxWEAvwZxweuyZ/shruI8noZFrsgIpvD).

# Syllabus

Legend: 🎥 lecture recording, 🖥️ slides, 📖 notes, 📝 written questions, ⌨️ coding assignment.

## Background

1. **Introduction** [🎥](https://course.mlsafety.org/#media-popup), [🖥️](https://docs.google.com/presentation/d/1vP4s1oxomdg3uU5PiV5EnSaiA6kSNcMxtI3L9wRhubQ/edit?usp=sharing)
2. **Deep Learning Review** (optional) [🎥](https://course.mlsafety.org/#media-popup), [🖥️](https://docs.google.com/presentation/d/15yMNlkWAL5cuSHHZe1gy2sM8zcN8gHk9iBVzKKvS9zw/edit?usp=sharing), [📖](https://github.com/centerforaisafety/Intro_to_ML_Safety/blob/master/Deep%20Learning%20Review/main.md), [📝](https://drive.google.com/file/d/1pGSXbv68aHJ-ThLUZzH4D2tzPNaFhVqF/view?usp=sharing), [⌨️](https://colab.research.google.com/drive/1AEUEhqVmS4PFl3hPMzs2qPvn38twrQh3?copy) _building blocks, optimizers, losses, datasets_

## Safety Engineering

3. **Risk Decomposition** [🎥](https://course.mlsafety.org/#media-popup), [🖥️](https://docs.google.com/presentation/d/1RMZ89VHzVnDhugcrrwvHQnRIw366dMr3JYFC3rkxjL0/edit?usp=sharing) _risk analysis definitions, disaster risk equation, decomposition of safety areas, ability to cope and existential risk_
4. **Accident Models** [🎥](https://course.mlsafety.org/#media-popup), [🖥️](https://docs.google.com/presentation/d/1HquuLs0OTVYvuk0QRCG_6aqWhmMEf7sDBFLvRaEAZL4/edit?usp=sharing) _FMEA, Bow Tie model, Swiss Cheese model, defense in depth, preventative and protective measures, complex systems, nonlinear causality, emergence, STAMP_
5. **Black Swans** [🎥](https://course.mlsafety.org/#media-popup), [🖥️](https://docs.google.com/presentation/d/1rDWQuwdqFPm1ebqnuM9x_H-2ZYGehj6kSp_5LOi6q5E/edit?usp=sharing) _unknown unknowns, long-tailed distributions, multiplicative processes, extremistan_

[Review questions 📝](https://drive.google.com/file/d/17hybWUxiVfdo7qFmvnfvfaLLS9Z43LtX/view?usp=sharing)

## Robustness

6. **Adversarial Robustness** [🎥](https://course.mlsafety.org/#media-popup), [🖥️](https://docs.google.com/presentation/d/1HzloChC0XElQkCTI181CN6OaYcVNnB5l37sfuANkcq0/edit?usp=sharing), [📖](https://github.com/centerforaisafety/Intro_to_ML_Safety/blob/master/Adversarial%20Robustness/main.md), [⌨️](https://colab.research.google.com/drive/1ezV-jXyPgXDMSo6LqXyRgV_f2ky0cCFH?usp=sharing) _optimization pressure, PGD, untargeted vs targeted attacks, adversarial evaluation, white box vs black box, transferability, unforeseen attacks, text attacks, robustness certificates_
7. **Black Swan Robustness** [🎥](https://course.mlsafety.org/#media-popup), [🖥️](https://docs.google.com/presentation/d/1uW7hNstJAq7_lSyk3yP8yTSjN85itESbDHFRi1F4wiw/edit?usp=sharing), [📖](https://github.com/centerforaisafety/Intro_to_ML_Safety/blob/master/Black%20Swan%20Robustness/main.md) _stress tests, train-test mismatch, adversarial distribution shifts, simulated scenar

... (truncated, 8 KB total)
Resource ID: 65c9fe2d57a4eb4c | Stable ID: NGRmNWJjMz