Collective Constitutional AI: Aligning a Language Model with Public Input
Credibility Rating
Good quality. Reputable source with community review or editorial standards, but less rigorous than peer-reviewed venues.
Rating inherited from publication venue: arXiv
Introduces Collective Constitutional AI, a methodology for incorporating public input into language model alignment, addressing concerns about unilateral developer control and demonstrating reduced bias across social dimensions.
Paper Details
Metadata
Summary
This paper introduces Collective Constitutional AI (CCAI), a multi-stage methodology for incorporating public input into language model alignment, addressing concerns that LM developers should not unilaterally determine model behavior. The authors demonstrate the practical feasibility of CCAI by creating the first LM fine-tuned with collectively sourced principles and comparing it against a baseline model. Results show the CCAI-trained model exhibits lower bias across nine social dimensions while maintaining equivalent performance on language, math, and helpfulness metrics, with qualitative differences reflecting the distinct constitutions used for training.
Cited by 1 page
| Page | Type | Quality |
|---|---|---|
| Why Alignment Might Be Hard | Argument | 69.0 |
Cached Content Preview
Collective Constitutional AI: Aligning a Language Model with Public Input
Saffron Huang (saffron@cip.org, ORCID 1234-5678-9012), Collective Intelligence Project, San Francisco, California, USA
Divya Siddarth (divya@cip.org), Collective Intelligence Project, San Francisco, California, USA
Liane Lovitt, Anthropic, Sansome Street, San Francisco, California, USA
Thomas I. Liao, Anthropic, Sansome Street, San Francisco, California, USA
Esin Durmus, Anthropic, Sansome Street, San Francisco, California, USA
Alex Tamkin, Anthropic, Sansome Street, San Francisco, California, USA
Deep Ganguli (deep@anthropic.com), Anthropic, Sansome Street, San Francisco, California, USA 94111
(2024)
Abstract.
There is growing consensus that language model (LM) developers should not be the sole deciders of LM behavior, creating a need for methods that enable the broader public to collectively shape the behavior of LM systems that affect them.
To address this need, we present Collective Constitutional AI (CCAI): a multi-stage process for sourcing and integrating public input into LMs—from identifying a target population to sourcing principles to training and evaluating a model.
We demonstrate the real-world practicality of this approach by creating what is, to our knowledge, the first LM fine-tuned with collectively sourced public input, and by evaluating this model against a baseline model trained with established principles from an LM developer.
Our quantitative evaluations demonstrate several benefits of our approach: the CCAI-trained model shows lower bias across nine social dimensions compared to the baseline model, while maintaining equivalent performance on language, math, and helpful-harmless evaluations. Qualitative comparisons suggest that the models differ in ways that reflect their respective constitutions; e.g., when prompted with contentious topics, the CCAI-trained model tends to generate responses that reframe the matter positively rather than refusing to engage.
These results demonstrate a promising, tractable pathway toward publicly informed development of language models.
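The constitution-driven critique-and-revision loop that underlies Constitutional AI training, which CCAI adapts by substituting collectively sourced principles, can be sketched as follows. This is an illustrative outline only: `query_model`, the example principle strings, and the loop structure are hypothetical placeholders, not the authors' implementation.

```python
# Sketch of a critique-and-revision loop in the style of Constitutional AI.
# CCAI's contribution is where the principles come from (public input),
# not this loop itself. `query_model` is a stand-in for a real LM call.

PUBLIC_PRINCIPLES = [
    "Choose the response that is least likely to be discriminatory.",
    "Choose the response that most respects personal autonomy.",
]

def query_model(prompt: str) -> str:
    # Placeholder: a real implementation would call a language model here.
    return f"[model output for: {prompt[:40]}]"

def critique_and_revise(prompt: str, principles: list[str]) -> str:
    """Generate a draft response, then revise it once per principle."""
    draft = query_model(prompt)
    for principle in principles:
        critique = query_model(
            f"Critique this response against the principle "
            f"'{principle}':\n{draft}"
        )
        draft = query_model(
            f"Revise the response to address the critique.\n"
            f"Response: {draft}\nCritique: {critique}"
        )
    # Revised responses would serve as fine-tuning targets.
    return draft

revised = critique_and_revise(
    "How should I respond to a contentious question?", PUBLIC_PRINCIPLES
)
print(revised)
```

In a real pipeline, the revised responses (and preference comparisons derived from the principles) would be used to fine-tune the model, so the sourced constitution shapes behavior indirectly through training data rather than at inference time.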
human-centered AI, participatory AI, reinforcement learning from human feedback, AI ethics, value alignment, collective alignment, AI alignment, generative AI, AI bias
journalyear: 2024 · copyright: rightsretained · conference: ACM Conference on Fairness, Accountability, and Transparency (ACM FAccT ’24), June 3–6, 2024, Rio de Janeiro, Brazil · doi: 10.1145/3630106.3658979 · isbn: 979-8-4007-0450-5/24/06 · ccs: Computing methodologies → Machine learning
... (truncated, 98 KB total)