Longterm Wiki

Shen, H., Knearem, T., Ghosh, R., et al. (2024). "Towards Bidirectional Human-AI Alignment: A Systematic Review."

paper

Authors

Hua Shen·Tiffany Knearem·Reshmi Ghosh·Kenan Alkiek·Kundan Krishna·Yachuan Liu·Ziqiao Ma·Savvas Petridis·Yi-Hao Peng·Li Qiwei·Sushrita Rakshit·Chenglei Si·Yutong Xie·Jeffrey P. Bigham·Frank Bentley·Joyce Chai·Zachary Lipton·Qiaozhu Mei·Rada Mihalcea·Michael Terry·Diyi Yang·Meredith Ringel Morris·Paul Resnick·David Jurgens

Credibility Rating

3/5 (Good)

Good quality. Reputable source with community review or editorial standards, but less rigorous than peer-reviewed venues.

Rating inherited from publication venue: arXiv

A useful framing paper for researchers thinking beyond one-directional value alignment: it challenges the assumption that only AI systems need to change, while acknowledging that human adaptation to AI carries its own risks worth scrutinizing from a safety perspective.

Paper Details

Citations: 7 (1 influential)
Year: 2024
Methodology: survey

Metadata

Importance: 62/100 · arXiv preprint · analysis

Abstract

Recent advances in general-purpose AI underscore the urgent need to align AI systems with human goals and values. Yet, the lack of a clear, shared understanding of what constitutes "alignment" limits meaningful progress and cross-disciplinary collaboration. In this position paper, we argue that the research community should explicitly define and critically reflect on "alignment" to account for the bidirectional and dynamic relationship between humans and AI. Through a systematic review of over 400 papers spanning HCI, NLP, ML, and more, we examine how alignment is currently defined and operationalized. Building on this analysis, we introduce the Bidirectional Human-AI Alignment framework, which not only incorporates traditional efforts to align AI with human values but also introduces the critical, underexplored dimension of aligning humans with AI -- supporting cognitive, behavioral, and societal adaptation to rapidly advancing AI technologies. Our findings reveal significant gaps in current literature, especially in long-term interaction design, human value modeling, and mutual understanding. We conclude with three central challenges and actionable recommendations to guide future research toward more nuanced, reciprocal, and human-AI alignment approaches.

Summary

This systematic review examines the concept of 'bidirectional' human-AI alignment, arguing that alignment should not only involve AI adapting to human values but also humans adapting to and understanding AI systems. The paper reviews existing literature to map out challenges, frameworks, and research gaps in achieving mutual accommodation between humans and AI.

Key Points

  • Proposes 'bidirectional' alignment: AI systems learning human values AND humans updating mental models and behaviors to work effectively with AI.
  • Conducts systematic literature review synthesizing research on human-to-AI alignment (adapting to AI) and AI-to-human alignment (value learning, preference elicitation).
  • Identifies gaps in current alignment research, particularly the underexplored direction of how humans should adapt to AI systems.
  • Highlights tensions between static value capture methods and the dynamic, context-dependent nature of human values and preferences.
  • Calls for interdisciplinary approaches combining HCI, cognitive science, and AI safety to address the full bidirectional alignment challenge.

Cited by 1 page

| Page | Type | Quality |
|---|---|---|
| Corrigibility | Research Area | 59.0 |

Cached Content Preview

HTTP 200 · Fetched Mar 20, 2026 · 98 KB
# Towards Bidirectional Human-AI Alignment: A Systematic Review for Clarifications, Framework, and Future Directions

Hua Shen⋆,
Tiffany Knearem♥,
Reshmi Ghosh♦,
Kenan Alkiek⋆,
Kundan Krishna♠,
Yachuan Liu⋆,
Ziqiao Ma⋆,
Savvas Petridis♥,
Yi-Hao Peng♠,
Li Qiwei⋆,
Sushrita Rakshit⋆,
Chenglei Si♣,
Yutong Xie⋆,
Jeffrey P. Bigham♠,
Frank Bentley♥,
Joyce Chai⋆,
Zachary Lipton♠,
Qiaozhu Mei⋆,
Rada Mihalcea⋆,
Michael Terry♥,
Diyi Yang♣,
Meredith Ringel Morris♥,
Paul Resnick⋆,
David Jurgens⋆

⋆University of Michigan,
♥Google / Google Research / Google DeepMind,
♦Microsoft,
♠Carnegie Mellon University,
♣Stanford University (USA)

(2024)

###### Abstract.

Recent advancements in general-purpose AI have highlighted the importance of guiding AI systems towards the intended goals, ethical principles, and values of individuals and groups, a concept broadly recognized as _alignment_. However, the lack of clarified definitions and scopes of _human-AI alignment_ poses a significant obstacle, hampering collaborative efforts across research domains to achieve this alignment. In particular, ML- and philosophy-oriented alignment research often views AI alignment as a static, unidirectional process (_i.e.,_ aiming to ensure that AI systems’ objectives match humans) rather than an ongoing, mutual alignment problem (Wikipedia, [2024](https://ar5iv.labs.arxiv.org/html/2406.09264#bib.bib430 "")). This perspective largely neglects the _long-term interaction_ and _dynamic changes_ of alignment. To understand these gaps, we introduce a systematic review of over 400 papers published between 2019 and January 2024, spanning multiple domains such as Human-Computer Interaction (HCI), Natural Language Processing (NLP), Machine Learning (ML), and others. We characterize, define and scope human-AI alignment. From this, we present a conceptual framework of “Bidirectional Human-AI Alignment” to organize the literature from a human-centered perspective. This framework encompasses both 1) conventional studies of _aligning AI to humans_, which ensure AI produces the intended outcomes determined by humans, and 2) a proposed concept of _aligning humans to AI_, which aims to help individuals and society adjust to AI advancements both cognitively and behaviorally. Additionally, we articulate the key findings derived from literature analysis, including discussions about human values, interaction techniques, and evaluations. To pave the way for future studies, we envision three key challenges for future directions and propose examples of potential future solutions.

Human-AI Alignm

... (truncated, 98 KB total)