CHAI News & Research Updates
humancompatible.ai/news
This is the official news feed of CHAI (UC Berkeley), one of the leading AI safety research institutes. It is useful for tracking current CHAI research directions across human-AI coordination, value alignment, and related technical safety topics.
Metadata
Importance: 55/100 · Tags: homepage, news
Summary
The Center for Human-Compatible AI (CHAI) news page aggregates recent research updates, publications, and announcements from CHAI researchers. Topics span human-AI coordination, goal misgeneralization, sycophancy reduction, political neutrality in AI, and offline reinforcement learning.
Key Points
- Features research on 'Learning to Yield and Request Control' (YRC), a coordination problem about when an AI should act autonomously vs. seek expert help
- Covers work on mitigating goal misgeneralization by allowing agents to request human assistance when uncertain
- Includes research on reducing LLM sycophancy using linear probe penalties applied during RLHF training (see the sketch after this list)
- Highlights work on defining and evaluating political neutrality for AI systems
- Serves as a living record of CHAI's ongoing alignment research output across multiple technical and conceptual areas
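The page only names the probe-penalty technique, so here is a rough, hypothetical illustration of how a linear probe penalty could be folded into an RLHF reward. It is a minimal sketch: the probe, the sigmoid penalty form, and names like `probe_direction` and `penalty_weight` are assumptions for illustration, not details from the cited work.

```python
import torch

def probe_penalized_reward(
    base_reward: torch.Tensor,      # (batch,) reward-model scores
    hidden_states: torch.Tensor,    # (batch, d) final-layer activations
    probe_direction: torch.Tensor,  # (d,) linear probe for sycophancy
    probe_bias: float = 0.0,
    penalty_weight: float = 0.1,
) -> torch.Tensor:
    """Subtract a linear-probe sycophancy score from an RLHF reward.

    The probe is assumed to be fit separately (e.g., logistic regression
    on activations from sycophantic vs. non-sycophantic completions);
    the penalty form here is an illustrative assumption.
    """
    # Probe logit: how "sycophantic" the activations look along the probe.
    sycophancy_logit = hidden_states @ probe_direction + probe_bias
    sycophancy_score = torch.sigmoid(sycophancy_logit)  # in (0, 1)
    # Penalize the reward in proportion to the detected sycophancy.
    return base_reward - penalty_weight * sycophancy_score
```

In a PPO-style RLHF loop, a function like this would stand in for the raw reward-model score on each sampled completion, with the probe trained beforehand on labeled sycophantic vs. non-sycophantic examples.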
Cited by 1 page
| Page | Type | Quality |
|---|---|---|
| Center for Human-Compatible AI | Organization | 37.0 |
Cached Content Preview
HTTP 200 · Fetched Mar 20, 2026 · 7 KB
### [Learning to Coordinate with Experts](https://humancompatible.ai/news/2025/03/07/learning-to-coordinate-with-experts/)
07 Mar 2025
Khanh Nguyen, Benjamin Plaut, Tu Trinh, and Mohamad Danesh introduce a fundamental coordination problem called Learning to Yield and Request Control (YRC), where the objective is to learn a strategy that determines when to act autonomously and when to seek expert assistance. They build an open-source benchmark featuring diverse domains, propose a novel validation approach, and investigate the performance of various learning methods across diverse environments, yielding insights that can guide future research.
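The post does not describe the paper's method in detail; as a minimal sketch of the decision the YRC setting studies, one naive baseline is a fixed confidence threshold on the agent's own action distribution. All interfaces below (`agent.action_probs`, `expert.act`, `confidence_threshold`) are hypothetical, not the paper's API.

```python
import numpy as np

def yrc_policy_step(agent, expert, obs, confidence_threshold: float = 0.8):
    """One decision step of a naive yield-or-act baseline for YRC.

    The agent acts autonomously when its top-action probability is high
    enough; otherwise it yields and requests the expert's action.
    """
    probs = np.asarray(agent.action_probs(obs))  # distribution over actions
    confidence = float(probs.max())
    if confidence >= confidence_threshold:
        # Act autonomously: the agent is confident in its own policy.
        return int(probs.argmax()), "agent"
    # Yield control: request the expert's action for this step.
    return expert.act(obs), "expert"
```

A learned YRC strategy would replace the fixed threshold with a trained decision rule; per the post, the accompanying benchmark is built to compare such learning methods across diverse environments.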
### [Computational Frameworks for Human Care](https://humancompatible.ai/news/2025/02/20/computational-frameworksfor-human-care/)
20 Feb 2025
Brian Christian, CHAI Affiliate, has published an article titled “[Computational Frameworks for Human Care](https://www.amacad.org/sites/default/files/publication/downloads/daedalus_wi25_12_christian.pdf)” in the most recent issue of Daedalus, the journal of the American Academy of Arts and Sciences. In it, Christian traces how AI alignment has progressed from simple reward mechanisms toward care-like relationships, revealing both the potential and limitations of machine caregiving while deepening our understanding of human care itself. The issue is titled “The Social Science of Caregiving” and was co-edited by CHAI Affiliate Alison Gopnik.
### [A Practical Definition of Political Neutrality for AI](https://humancompatible.ai/news/2025/02/04/a-practical-definition-of-political-neutrality-for-ai/)
04 Feb 2025
NEW: Our current research project to build [political neutrality evaluations](https://docs.google.com/document/d/19haXfSeQtTjdVLUbba1z5GRhW12rj0ipR6lM5S3G0o4/edit?usp=sharing).
### [RvS: What is Essential for Offline RL via Supervised Learning?](https://humancompatible.ai/news/2025/01/18/rvs-what-is-essential-for-offline-rl-via-supervised-learning/)
18 Jan 2025
Scott Emmons, a PhD student, co-authored “RvS: What is Essential for Offline RL via Supervised Learning?”
### [Getting By Goal Misgeneralization With a Little Help From a Mentor](https://humancompatible.ai/news/2024/12/25/getting-by-goal-misgeneralization-with-a-little-help-from-a-mentor-2/)
25 Dec 2024
Tu Trinh, Ben Plaut, Khanh Nguyen, and Mohamad Danesh wrote the paper “Getting By Goal Misgeneralization With a Little Help From a Mentor.” This paper explores whether goal misgeneralization can be mitigated by allowing an agent to ask for help when it is uncertain. The answer is mostly yes, although our
... (truncated, 7 KB total)
Resource ID: 5af46b480f0a6021 | Stable ID: ZjJmYjI3NW