CHAI News & Research Updates
humancompatible.ai/news
This is the official news feed of CHAI (UC Berkeley), one of the leading AI safety research institutes. It is useful for tracking current CHAI research directions across human-AI coordination, value alignment, and related technical safety topics.
Metadata
Importance: 55/100 · Tags: homepage, news
Summary
The Center for Human-Compatible AI (CHAI) news page aggregates recent research updates, publications, and announcements from CHAI researchers. Topics span human-AI coordination, goal misgeneralization, sycophancy reduction, political neutrality in AI, and offline reinforcement learning.
Key Points
- Features research on 'Learning to Yield and Request Control' (YRC), a coordination problem about when an AI should act autonomously vs. seek expert help
- Covers work on mitigating goal misgeneralization by allowing agents to request human assistance when uncertain
- Includes research on reducing LLM sycophancy using linear probe penalties applied during RLHF training (see the sketch after this list)
- Highlights work on defining and evaluating political neutrality for AI systems
- Serves as a living record of CHAI's ongoing alignment research output across multiple technical and conceptual areas
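The page only names the probe-penalty technique, so here is a rough, hypothetical illustration of how a linear probe penalty could be folded into an RLHF reward. It is a minimal sketch: the probe, the sigmoid penalty form, and names like `probe_direction` and `penalty_weight` are assumptions for illustration, not details from the cited work.

```python
import torch

def probe_penalized_reward(
    base_reward: torch.Tensor,      # (batch,) reward-model scores
    hidden_states: torch.Tensor,    # (batch, d) final-layer activations
    probe_direction: torch.Tensor,  # (d,) linear probe for sycophancy
    probe_bias: float = 0.0,
    penalty_weight: float = 0.1,
) -> torch.Tensor:
    """Subtract a linear-probe sycophancy score from an RLHF reward.

    The probe is assumed to be fit separately (e.g., logistic regression
    on activations from sycophantic vs. non-sycophantic completions);
    the penalty form here is an illustrative assumption.
    """
    # Probe logit: how "sycophantic" the activations look along the probe.
    sycophancy_logit = hidden_states @ probe_direction + probe_bias
    sycophancy_score = torch.sigmoid(sycophancy_logit)  # in (0, 1)
    # Penalize the reward in proportion to the detected sycophancy.
    return base_reward - penalty_weight * sycophancy_score
```

In a PPO-style RLHF loop, a function like this would stand in for the raw reward-model score on each sampled completion, with the probe trained beforehand on labeled sycophantic vs. non-sycophantic examples.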
Cited by 1 page
| Page | Type | Quality |
|---|---|---|
| Center for Human-Compatible AI | Organization | 37.0 |
Cached Content Preview
HTTP 200 · Fetched Mar 20, 2026 · 7 KB
### [Learning to Coordinate with Experts](https://humancompatible.ai/news/2025/03/07/learning-to-coordinate-with-experts/)
07 Mar 2025
Khanh Nguyen, Benjamin Plaut, Tu Trinh, and Mohamad Danesh introduce a fundamental coordination problem called Learning to Yield and Request Control (YRC), where the objective is to learn a strategy that determines when to act autonomously and when to seek expert assistance. They build an open-source benchmark featuring diverse domains, propose a novel validation approach, and investigate the performance of various learning methods across diverse environments, yielding insights that can guide future research.
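The post does not describe the paper's method in detail; as a minimal sketch of the decision the YRC setting studies, one naive baseline is a fixed confidence threshold on the agent's own action distribution. All interfaces below (`agent.action_probs`, `expert.act`, `confidence_threshold`) are hypothetical, not the paper's API.

```python
import numpy as np

def yrc_policy_step(agent, expert, obs, confidence_threshold: float = 0.8):
    """One decision step of a naive yield-or-act baseline for YRC.

    The agent acts autonomously when its top-action probability is high
    enough; otherwise it yields and requests the expert's action.
    """
    probs = np.asarray(agent.action_probs(obs))  # distribution over actions
    confidence = float(probs.max())
    if confidence >= confidence_threshold:
        # Act autonomously: the agent is confident in its own policy.
        return int(probs.argmax()), "agent"
    # Yield control: request the expert's action for this step.
    return expert.act(obs), "expert"
```

A learned YRC strategy would replace the fixed threshold with a trained decision rule; per the post, the accompanying benchmark is built to compare such learning methods across diverse environments.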
### [Computational Frameworks for Human Care](https://humancompatible.ai/news/2025/02/20/computational-frameworksfor-human-care/)
20 Feb 2025
Brian Christian, CHAI Affiliate, has published an article titled “[Computational Frameworks for Human Care](https://www.amacad.org/sites/default/files/publication/downloads/daedalus_wi25_12_christian.pdf)” in the most recent issue of Daedalus, the journal of the American Academy of Arts and Sciences. In it, Christian traces how AI alignment has progressed from simple reward mechanisms toward care-like relationships, revealing both the potential and limitations of machine caregiving while deepening our understanding of human care itself. The issue is titled “The Social Science of Caregiving” and was co-edited by CHAI Affiliate Alison Gopnik.
### [A Practical Definition of Political Neutrality for AI](https://humancompatible.ai/news/2025/02/04/a-practical-definition-of-political-neutrality-for-ai/)
04 Feb 2025
NEW: Our current research project to build [political neutrality evaluations](https://docs.google.com/document/d/19haXfSeQtTjdVLUbba1z5GRhW12rj0ipR6lM5S3G0o4/edit?usp=sharing).
### [RvS: What is Essential for Offline RL via Supervised Learning?](https://humancompatible.ai/news/2025/01/18/rvs-what-is-essential-for-offline-rl-via-supervised-learning/)
18 Jan 2025
Scott Emmons, a PhD student, co-authored “RvS: What is Essential for Offline RL via Supervised Learning?”
### [Getting By Goal Misgeneralization With a Little Help From a Mentor](https://humancompatible.ai/news/2024/12/25/getting-by-goal-misgeneralization-with-a-little-help-from-a-mentor-2/)
25 Dec 2024
Tu Trinh, Ben Plaut, Khanh Nguyen, and Mohamad Danesh wrote the paper “Getting By Goal Misgeneralization With a Little Help From a Mentor.” This paper explores whether goal misgeneralization can be mitigated by allowing an agent to ask for help when it is uncertain. The answer is mostly yes, although our
... (truncated, 7 KB total)
Resource ID: 5af46b480f0a6021 | Stable ID: ZjJmYjI3NW