Longterm Wiki

Research on AI Safety (Russell et al., AIPS 2015)

web

A foundational early paper by Stuart Russell articulating the value alignment problem and IRL-based solutions; precursor to the CIRL/assistance games research program and influential in shaping the technical AI safety field.

Metadata

Importance: 82/100 · conference paper · primary source

Summary

This paper by Stuart Russell and colleagues, presented at AIPS 2015, outlines a foundational framework for AI safety built on the idea that AI systems should be uncertain about human values and use inverse reinforcement learning (IRL) to infer them. It describes a cooperative setting, later formalized as cooperative inverse reinforcement learning (CIRL) and now usually called assistance games, in which AI agents defer to human preferences rather than pursuing fixed objective functions.

Key Points

  • Argues that specifying a fixed objective function for AI is dangerous; instead, AI should remain uncertain about human values and seek to learn them.
  • Proposes inverse reinforcement learning (IRL) as a mechanism for AI to infer human preferences from observed behavior.
  • Frames the interaction between AI and human as a cooperative (assistance) game in which both share the same incentives, making the AI inherently deferential.
  • Identifies three core properties for safe AI: altruism (maximizing human welfare), uncertainty about objectives, and the ability to learn from human behavior.
  • Frames AI safety as a technical problem solvable through principled probabilistic and game-theoretic approaches.
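The IRL and uncertainty points above can be illustrated with a toy Bayesian update. This is a minimal sketch and not the paper's method: it assumes a small set of hand-written candidate reward functions and a noisily rational (Boltzmann) human, and every name in it (`boltzmann_likelihood`, `posterior_over_rewards`, the coffee/tea hypotheses, the `beta` rationality parameter) is hypothetical and chosen only for illustration.

```python
import math

def boltzmann_likelihood(rewards, chosen, beta=1.0):
    """Probability that a noisily rational human picks option `chosen`.

    `rewards` lists the reward of each option under one hypothesis;
    `beta` (an assumed parameter) controls how reliably the human
    picks the higher-reward option.
    """
    weights = [math.exp(beta * r) for r in rewards]
    return weights[chosen] / sum(weights)

def posterior_over_rewards(hypotheses, prior, observations, beta=1.0):
    """Bayesian update over candidate reward functions.

    hypotheses:   dict mapping a hypothesis name to per-option rewards
    prior:        dict mapping a hypothesis name to its prior probability
    observations: list of option indices the human was seen choosing
    """
    unnormalized = {}
    for name, rewards in hypotheses.items():
        p = prior[name]
        for chosen in observations:
            p *= boltzmann_likelihood(rewards, chosen, beta)
        unnormalized[name] = p
    total = sum(unnormalized.values())
    return {name: p / total for name, p in unnormalized.items()}

# Two illustrative hypotheses about what the human values
# (option 0 = coffee, option 1 = tea).
hypotheses = {
    "prefers_coffee": [1.0, 0.0],
    "prefers_tea": [0.0, 1.0],
}
prior = {"prefers_coffee": 0.5, "prefers_tea": 0.5}

# The human chose coffee three times and tea once.
observations = [0, 0, 1, 0]
posterior = posterior_over_rewards(hypotheses, prior, observations, beta=2.0)

for name, p in posterior.items():
    print(f"{name}: {p:.3f}")
```

The agent starts undecided between the two hypotheses and shifts toward "prefers_coffee" as evidence accumulates, but the posterior never reaches certainty. That residual uncertainty is what, in the framing summarized above, gives the AI a reason to keep deferring to the human.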

Cited by 1 page

Page | Type | Quality
Center for Human-Compatible AI | Organization | 37.0

Cached Content Preview

HTTP 200 · Fetched Mar 20, 2026 · 1 KB
# Resource not found

## The server has encountered a problem because the resource was not found.

Your request was: https://people.eecs.berkeley.edu/~russell/papers/aips15-safety.pdf

Resource ID: 9d7e93ca9f7eba36 | Stable ID: ODkzMjA3ZD