Longterm Wiki

Shen, H., Knearem, T., Ghosh, R., et al. (2024). "Towards Bidirectional Human-AI Alignment: A Systematic Review."

paper

Authors

Hua Shen·Tiffany Knearem·Reshmi Ghosh·Kenan Alkiek·Kundan Krishna·Yachuan Liu·Ziqiao Ma·Savvas Petridis·Yi-Hao Peng·Li Qiwei·Sushrita Rakshit·Chenglei Si·Yutong Xie·Jeffrey P. Bigham·Frank Bentley·Joyce Chai·Zachary Lipton·Qiaozhu Mei·Rada Mihalcea·Michael Terry·Diyi Yang·Meredith Ringel Morris·Paul Resnick·David Jurgens

Credibility Rating

3/5 (Good)

Good quality. Reputable source with community review or editorial standards, but less rigorous than peer-reviewed venues.

Rating inherited from publication venue: arXiv

A useful framing paper for researchers thinking beyond one-directional value alignment: it challenges the assumption that only AI systems need to change, while acknowledging that human adaptation to AI carries its own risks worth scrutinizing from a safety perspective.

Paper Details

Citations: 7 (1 influential)
Year: 2024
Methodology: survey

Metadata

Importance: 62/100 · arXiv preprint · analysis

Abstract

Recent advances in general-purpose AI underscore the urgent need to align AI systems with human goals and values. Yet, the lack of a clear, shared understanding of what constitutes "alignment" limits meaningful progress and cross-disciplinary collaboration. In this position paper, we argue that the research community should explicitly define and critically reflect on "alignment" to account for the bidirectional and dynamic relationship between humans and AI. Through a systematic review of over 400 papers spanning HCI, NLP, ML, and more, we examine how alignment is currently defined and operationalized. Building on this analysis, we introduce the Bidirectional Human-AI Alignment framework, which not only incorporates traditional efforts to align AI with human values but also introduces the critical, underexplored dimension of aligning humans with AI -- supporting cognitive, behavioral, and societal adaptation to rapidly advancing AI technologies. Our findings reveal significant gaps in current literature, especially in long-term interaction design, human value modeling, and mutual understanding. We conclude with three central challenges and actionable recommendations to guide future research toward more nuanced, reciprocal, and human-AI alignment approaches.

Summary

This systematic review examines the concept of 'bidirectional' human-AI alignment, arguing that alignment should not only involve AI adapting to human values but also humans adapting to and understanding AI systems. The paper reviews existing literature to map out challenges, frameworks, and research gaps in achieving mutual accommodation between humans and AI.

Key Points

  • Proposes 'bidirectional' alignment: AI systems learning human values AND humans updating mental models and behaviors to work effectively with AI.
  • Conducts systematic literature review synthesizing research on human-to-AI alignment (adapting to AI) and AI-to-human alignment (value learning, preference elicitation).
  • Identifies gaps in current alignment research, particularly the underexplored direction of how humans should adapt to AI systems.
  • Highlights tensions between static value capture methods and the dynamic, context-dependent nature of human values and preferences.
  • Calls for interdisciplinary approaches combining HCI, cognitive science, and AI safety to address the full bidirectional alignment challenge.

Cited by 1 page

| Page | Type | Quality |
|---|---|---|
| Corrigibility | Research Area | 59.0 |

Cached Content Preview

HTTP 200 · Fetched Mar 20, 2026 · 98 KB
# Towards Bidirectional Human-AI Alignment: A Systematic Review for Clarifications, Framework, and Future Directions

Hua Shen⋆,
Tiffany Knearem♥,
Reshmi Ghosh♦,
Kenan Alkiek⋆,
Kundan Krishna♠,
Yachuan Liu⋆,
Ziqiao Ma⋆,
Savvas Petridis♥,
Yi-Hao Peng♠,
Li Qiwei⋆,
Sushrita Rakshit⋆,
Chenglei Si♣,
Yutong Xie⋆,
Jeffrey P. Bigham♠,
Frank Bentley♥,
Joyce Chai⋆,
Zachary Lipton♠,
Qiaozhu Mei⋆,
Rada Mihalcea⋆,
Michael Terry♥,
Diyi Yang♣,
Meredith Ringel Morris♥,
Paul Resnick⋆,
David Jurgens⋆

⋆University of Michigan,
♥Google / Google Research / Google DeepMind,
♦Microsoft,
♠Carnegie Mellon University,
♣Stanford University (USA)

(2024)

###### Abstract.

Recent advancements in general-purpose AI have highlighted the importance of guiding AI systems towards the intended goals, ethical principles, and values of individuals and groups, a concept broadly recognized as _alignment_. However, the lack of clarified definitions and scopes of _human-AI alignment_ poses a significant obstacle, hampering collaborative efforts across research domains to achieve this alignment. In particular, ML- and philosophy-oriented alignment research often views AI alignment as a static, unidirectional process (_i.e.,_ aiming to ensure that AI systems’ objectives match humans) rather than an ongoing, mutual alignment problem (Wikipedia, [2024](https://ar5iv.labs.arxiv.org/html/2406.09264#bib.bib430 "")). This perspective largely neglects the _long-term interaction_ and _dynamic changes_ of alignment. To understand these gaps, we introduce a systematic review of over 400 papers published between 2019 and January 2024, spanning multiple domains such as Human-Computer Interaction (HCI), Natural Language Processing (NLP), Machine Learning (ML), and others. We characterize, define and scope human-AI alignment. From this, we present a conceptual framework of “Bidirectional Human-AI Alignment” to organize the literature from a human-centered perspective. This framework encompasses both 1) conventional studies of _aligning AI to humans_, which ensure AI produces the intended outcomes determined by humans, and 2) a proposed concept of _aligning humans to AI_, which aims to help individuals and society adjust to AI advancements both cognitively and behaviorally. Additionally, we articulate the key findings derived from literature analysis, including discussions about human values, interaction techniques, and evaluations. To pave the way for future studies, we envision three key challenges for future directions and propose examples of potential future solutions.

Human-AI Alignm

... (truncated, 98 KB total)