Collective Constitutional AI: Aligning a Language Model with Public Input
Credibility Rating
Good quality. Reputable source with community review or editorial standards, but less rigorous than peer-reviewed venues.
Rating inherited from publication venue: arXiv
Introduces Collective Constitutional AI, a methodology for incorporating public input into language model alignment, addressing concerns about unilateral developer control and demonstrating reduced bias across social dimensions.
Paper Details
Metadata
Summary
This paper introduces Collective Constitutional AI (CCAI), a multi-stage methodology for incorporating public input into language model alignment, addressing concerns that LM developers should not unilaterally determine model behavior. The authors demonstrate the practical feasibility of CCAI by creating the first LM fine-tuned with collectively sourced principles and comparing it against a baseline model. Results show the CCAI-trained model exhibits lower bias across nine social dimensions while maintaining equivalent performance on language, math, and helpfulness metrics, with qualitative differences reflecting the distinct constitutions used for training.
Cited by 1 page
| Page | Type | Quality |
|---|---|---|
| Why Alignment Might Be Hard | Argument | 69.0 |
Cached Content Preview
Collective Constitutional AI: Aligning a Language Model with Public Input
Saffron Huang (saffron@cip.org, ORCID 1234-5678-9012), Collective Intelligence Project, San Francisco, California, USA
Divya Siddarth (divya@cip.org), Collective Intelligence Project, San Francisco, California, USA
Liane Lovitt, Anthropic, Sansome Street, San Francisco, California, USA
Thomas I. Liao, Anthropic, Sansome Street, San Francisco, California, USA
Esin Durmus, Anthropic, Sansome Street, San Francisco, California, USA
Alex Tamkin, Anthropic, Sansome Street, San Francisco, California, USA
Deep Ganguli (deep@anthropic.com), Anthropic, Sansome Street, San Francisco, California, USA 94111
(2024)
Abstract.
There is growing consensus that language model (LM) developers should not be the sole deciders of LM behavior, creating a need for methods that enable the broader public to collectively shape the behavior of LM systems that affect them.
To address this need, we present Collective Constitutional AI (CCAI): a multi-stage process for sourcing and integrating public input into LMs—from identifying a target population to sourcing principles to training and evaluating a model.
We demonstrate the real-world practicality of this approach by creating what is, to our knowledge, the first LM fine-tuned with collectively sourced public input, and by evaluating this model against a baseline model trained with established principles from an LM developer.
Our quantitative evaluations demonstrate several benefits of our approach: the CCAI-trained model shows lower bias across nine social dimensions compared to the baseline model, while maintaining equivalent performance on language, math, and helpful-harmless evaluations. Qualitative comparisons suggest that the models differ in ways that reflect their respective constitutions; e.g., when prompted with contentious topics, the CCAI-trained model tends to generate responses that reframe the matter positively rather than refusing to engage.
These results demonstrate a promising, tractable pathway toward publicly informed development of language models.
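The constitution-driven critique-and-revision loop that underlies Constitutional AI training, which CCAI adapts by substituting collectively sourced principles, can be sketched as follows. This is an illustrative outline only: `query_model`, the example principle strings, and the loop structure are hypothetical placeholders, not the authors' implementation.

```python
# Sketch of a critique-and-revision loop in the style of Constitutional AI.
# CCAI's contribution is where the principles come from (public input),
# not this loop itself. `query_model` is a stand-in for a real LM call.

PUBLIC_PRINCIPLES = [
    "Choose the response that is least likely to be discriminatory.",
    "Choose the response that most respects personal autonomy.",
]

def query_model(prompt: str) -> str:
    # Placeholder: a real implementation would call a language model here.
    return f"[model output for: {prompt[:40]}]"

def critique_and_revise(prompt: str, principles: list[str]) -> str:
    """Generate a draft response, then revise it once per principle."""
    draft = query_model(prompt)
    for principle in principles:
        critique = query_model(
            f"Critique this response against the principle "
            f"'{principle}':\n{draft}"
        )
        draft = query_model(
            f"Revise the response to address the critique.\n"
            f"Response: {draft}\nCritique: {critique}"
        )
    # Revised responses would serve as fine-tuning targets.
    return draft

revised = critique_and_revise(
    "How should I respond to a contentious question?", PUBLIC_PRINCIPLES
)
print(revised)
```

In a real pipeline, the revised responses (and preference comparisons derived from the principles) would be used to fine-tune the model, so the sourced constitution shapes behavior indirectly through training data rather than at inference time.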
human-centered AI, participatory AI, reinforcement learning from human feedback, AI ethics, value alignment, collective alignment, AI alignment, generative AI, AI bias
journalyear: 2024 · copyright: rightsretained · conference: ACM Conference on Fairness, Accountability, and Transparency (ACM FAccT ’24), June 3–6, 2024, Rio de Janeiro, Brazil · doi: 10.1145/3630106.3658979 · isbn: 979-8-4007-0450-5/24/06 · ccs: Computing methodologies → Machine learning
... (truncated, 98 KB total)