
Berkeley CHAI Research

web
humancompatible.ai/research

CHAI at UC Berkeley is a leading academic AI safety research center; this page indexes their active research projects and is useful for tracking frontier academic work on alignment and human-compatible AI development.

Metadata

Importance: 72/100 · homepage

Summary

The Berkeley Center for Human-Compatible AI (CHAI) conducts foundational research on making AI systems that are safe and beneficial for humans. Their work focuses on value alignment, preference learning, and ensuring AI systems remain under meaningful human control. CHAI is one of the leading academic institutions dedicated to long-term AI safety research.

Key Points

  • CHAI focuses on technical AI alignment research, particularly on developing AI systems that learn and align with human values and preferences
  • Research areas include inverse reward design, cooperative AI, and formal methods for ensuring AI systems remain corrigible and controllable
  • Founded by Stuart Russell, CHAI emphasizes the 'assistance game' framework where AI systems are uncertain about human objectives
  • Produces both theoretical foundations and practical methods for building provably beneficial AI systems
  • Collaborates with policymakers and other institutions to bridge technical safety research with governance applications

Cited by 1 page

| Page | Type | Quality |
| --- | --- | --- |
| AI Risk Interaction Network Model | Analysis | 64.0 |

Cached Content Preview

HTTP 200 · Fetched Mar 20, 2026 · 98 KB
CHAI aims to reorient the foundations of AI research toward the development of _provably beneficial systems_. Currently, it is not possible to specify a formula for human values in any form that we know would provably benefit humanity, if that formula were instated as the objective of a powerful AI system. In short, any initial formal specification of human values is bound to be wrong in important ways. This means we need to somehow represent _uncertainty_ in the objectives of AI systems. This way of formulating objectives stands in contrast to the standard model for AI, in which the AI system's objective is assumed to be known completely and correctly.

Therefore, much of CHAI's research effort to date has focused on developing and communicating a new model of AI development, in which AI systems should be uncertain of their objectives and deferential to humans in light of that uncertainty. However, our interests extend to a variety of other problems in the development of provably beneficial AI systems. Our areas of greatest focus so far have been:

- the foundations of rational agency and causality,
- value alignment and inverse reinforcement learning,
- human-robot cooperation,
- multi-agent perspectives and applications, and
- models of bounded or imperfect rationality.

Other areas of interest to our mission include:

- adversarial training and testing for ML systems,
- various AI capabilities,
- topics in cognitive science,
- ethics for AI and AI development,
- robust inference and planning,
- security problems and solutions, and
- transparency and interpretability methods.
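The core contrast described above — an agent that maintains uncertainty over its objective and defers to humans rather than optimizing a fixed reward — can be illustrated with a toy sketch. This is a hypothetical example for intuition only, not CHAI's code or methods: the candidate reward functions, the Boltzmann-rationality assumption about human choices, and the entropy-based deferral rule are all illustrative assumptions.

```python
import math

# Toy sketch of objective uncertainty (illustrative, not CHAI's method):
# the agent does not know which reward function the human holds, maintains
# a posterior over candidate objectives, and defers (asks) when uncertain.

# Hypothetical candidate reward functions over two actions.
candidate_rewards = {
    "cautious":  {"safe": 1.0, "risky": -2.0},
    "ambitious": {"safe": 0.2, "risky": 1.5},
}
posterior = {"cautious": 0.5, "ambitious": 0.5}  # uniform prior

def update(observed_choice, beta=2.0):
    """Bayesian update from one human choice, assuming a Boltzmann-rational
    human: P(choice | reward) ∝ exp(beta * reward(choice))."""
    global posterior
    unnormalized = {}
    for name, reward in candidate_rewards.items():
        z = sum(math.exp(beta * v) for v in reward.values())
        likelihood = math.exp(beta * reward[observed_choice]) / z
        unnormalized[name] = posterior[name] * likelihood
    total = sum(unnormalized.values())
    posterior = {k: v / total for k, v in unnormalized.items()}

def act(entropy_threshold=0.5):
    """Defer to the human while the posterior over objectives is too
    uncertain; otherwise act on the most probable objective."""
    entropy = -sum(p * math.log2(p) for p in posterior.values() if p > 0)
    if entropy > entropy_threshold:
        return "ask_human"
    best = max(posterior, key=posterior.get)
    return max(candidate_rewards[best], key=candidate_rewards[best].get)

print(act())        # uniform prior has maximal entropy, so the agent defers
for _ in range(5):
    update("safe")  # the human repeatedly picks the safe action
print(act())        # posterior concentrates on "cautious"; the agent acts
```

Under the fixed-objective "standard model" the agent would optimize a single reward from the start; here, deferral falls out of the uncertainty itself, which is the behavior the assistance-game framing is meant to produce.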


In addition to purely academic work, CHAI strives to produce intellectual outputs for general audiences as well. We also advise governments and international organizations on policies relevant to ensuring AI technologies will benefit society, and offer insight on a variety of individual-scale and societal-scale risks from AI, such as pertaining to autonomous weapons, the future of employment, and public health and safety.

Below is a list of CHAI's publications since we began operating in 2016. Many of our publications are collaborations with other AI research groups; we view collaborations as key to integrating our perspectives into mainstream AI research.


- Jonathan Stray. 2022.
[Risk Ratios.](https://github.com/jstray/risk-ratios) _NICAR 2022_
- J Stray. 2022.
[Better Conflict Bulletin.](https://betterconflictbulletin.substack.com/)
- OA Dada, G Obaido, IT Sanusi, K Aruleba, AA Yunusa. 2022.
[Hidden Gold for IT Professionals, Educators, and Students: Insights From Stack Overflow Survey.](https://scholar.google.co.za/citations?view_op=view_citation&hl=en&user=cCXONVYAAAAJ&sortby=pubdate&citation_for_view=cCXONVYAAAAJ:9ZlFYXVOiuMC) _IEEE Transactions on Computational Social Systems_
- OA Dada, K Aruleba, AA Yunusa, IT Sanusi, G Obaido. 2022.
[Information Technology Roles and Their Most-Used Programming Languages.](https://scholar.google.co.za/citations?vie

... (truncated, 98 KB total)
Resource ID: b8668d08f397f100 | Stable ID: N2ViMTg1Zj