Research on AI Safety (Russell et al., AIPS 2015)
web · people.eecs.berkeley.edu · people.eecs.berkeley.edu/~russell/papers/aips15-safety.pdf
A foundational early paper by Stuart Russell articulating the value alignment problem and IRL-based solutions; precursor to the CIRL/assistance games research program and influential in shaping the technical AI safety field.
Metadata
Importance: 82/100 · conference paper · primary source
Summary
This paper by Stuart Russell and colleagues, presented at AIPS 2015, outlines a foundational framework for AI safety built on the idea that AI systems should remain uncertain about human values and use inverse reinforcement learning to infer them. It introduces the concept of assistance games (formerly called cooperative inverse reinforcement learning, or CIRL), in which AI agents cooperate with humans and defer to their preferences rather than pursuing fixed objective functions.
Key Points
- Argues that specifying a fixed objective function for AI is dangerous; instead, AI should remain uncertain about human values and seek to learn them.
- Proposes inverse reinforcement learning (IRL) as a mechanism for AI to infer human preferences from observed behavior.
- Introduces a cooperative (assistance) game framework in which the AI and the human have aligned incentives, making the AI inherently deferential.
- Identifies three core properties for safe AI: altruism (maximizing human welfare), uncertainty about objectives, and the ability to learn from human behavior.
- Frames AI safety as a technical problem solvable through principled probabilistic and game-theoretic approaches.
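The idea of staying uncertain about human values and inferring them from observed behavior can be illustrated with a toy Bayesian update. The sketch below is not taken from the paper: the reward hypotheses, the action names, the Boltzmann-rational (softmax) choice model, and the `beta` parameter are all assumptions chosen only for illustration.

```python
import math

# Hypothetical toy setup: two candidate reward functions over three actions.
# These names and numbers are illustrative assumptions, not from the paper.
CANDIDATE_REWARDS = {
    "prefers_coffee": {"coffee": 1.0, "tea": 0.0, "water": 0.2},
    "prefers_tea": {"coffee": 0.0, "tea": 1.0, "water": 0.2},
}


def choice_likelihood(action, rewards, beta=3.0):
    """P(action | rewards) under an assumed softmax (Boltzmann-rational) human."""
    weights = {a: math.exp(beta * r) for a, r in rewards.items()}
    return weights[action] / sum(weights.values())


def update_posterior(prior, observed_actions, beta=3.0):
    """Bayesian update of the belief over reward hypotheses, one action at a time."""
    posterior = dict(prior)
    for action in observed_actions:
        for name, rewards in CANDIDATE_REWARDS.items():
            posterior[name] *= choice_likelihood(action, rewards, beta)
        total = sum(posterior.values())
        posterior = {name: p / total for name, p in posterior.items()}
    return posterior


# The agent starts undecided and revises its belief after watching the human.
prior = {"prefers_coffee": 0.5, "prefers_tea": 0.5}
posterior = update_posterior(prior, ["tea", "tea", "water"])
print(posterior)
```

In this toy model, choosing `"water"` is equally likely under both hypotheses, so only the two `"tea"` observations shift the belief. The agent never collapses to certainty, which is the property the key points tie to deferential behavior.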
Cited by 1 page
| Page | Type | Quality |
|---|---|---|
| Center for Human-Compatible AI | Organization | 37.0 |
Cached Content Preview
HTTP 200 · Fetched Mar 20, 2026 · 1 KB
# Resource not found

## The server has encountered a problem because the resource was not found.

Your request was: https://people.eecs.berkeley.edu/~russell/papers/aips15-safety.pdf

## What are you looking for?

- [Graduate Programs and Admissions](https://eecs.berkeley.edu/academics/graduate). There's also a [Graduate Admissions FAQ](https://eecs.berkeley.edu/academics/graduate/faq) for answers to specific questions.
- [Undergraduate Programs and Admissions](https://eecs.berkeley.edu/academics/undergraduate)
- [Recruiting EECS students and posting jobs](https://eecs.berkeley.edu/industry/recruit-students)
- [EECS Departmental Computer or Network Support](http://iris.eecs.berkeley.edu/helpdesk/)
- [People and Directories](https://eecs.berkeley.edu/people)

## Contact Us

To contact the owner of the specific page you were looking for, send an email to: `russell AT cs.berkeley.edu`

To report problems or submit questions and comments related to the site, please send an email to the following email address: `webteam@EECS.Berkeley.EDU`
Resource ID: 9d7e93ca9f7eba36 | Stable ID: ODkzMjA3ZD