Longterm Wiki

Credibility Rating

High (4/5)

High quality. Established institution or organization with editorial oversight and accountability.

Rating inherited from publication venue: Anthropic

A key Anthropic paper on participatory AI alignment; relevant to debates about whose values AI should encode and how democratic input can be operationalized in training processes.

Metadata

Importance: 72/100 · organizational report · primary source

Summary

Anthropic extended their Constitutional AI framework by using the Polis platform to crowdsource constitutional principles from approximately 1,000 Americans, enabling more democratic input into AI alignment. They trained a model on these publicly derived principles and compared its outputs to their standard Claude model, finding the crowd-sourced model was less likely to refuse borderline requests while maintaining safety. This work explores how public deliberation can inform AI value alignment rather than leaving it solely to developers.

Key Points

  • Used Polis, a deliberative polling platform, to gather and synthesize constitutional principles from ~1,000 demographically diverse Americans.
  • The publicly sourced constitutional model was notably less paternalistic and more willing to engage with contentious topics than Anthropic's standard model.
  • Demonstrates a methodology for incorporating broader societal input into AI alignment, reducing sole reliance on developer-defined values.
  • Highlights tensions between public preferences and developer safety judgments, raising questions about whose values AI systems should reflect.
  • Represents an early practical experiment in participatory or democratic approaches to AI governance and value alignment.
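
The aggregation step described above — turning thousands of individual Polis votes into a shared set of constitutional principles — can be illustrated with a small sketch. This is a hypothetical simplification, not Anthropic's actual pipeline: the function name, data shapes, and the 0.7 consensus threshold are all illustrative assumptions. The core idea, keeping only statements that attract broad agreement across demographic groups, mirrors the group-aware consensus that Polis surfaces.

```python
# Hypothetical sketch of distilling Polis-style votes into constitutional
# principles: keep only statements with broad cross-group agreement.
# Names, shapes, and the threshold are illustrative assumptions.

def select_principles(statements, votes, threshold=0.7):
    """statements: {id: text}; votes: {id: {group: agree_rate}}.

    Keep a statement only if every demographic group's agreement
    rate meets the threshold (group-aware consensus)."""
    selected = []
    for sid, text in statements.items():
        rates = votes.get(sid, {})
        if rates and all(r >= threshold for r in rates.values()):
            selected.append(text)
    return selected

statements = {
    "s1": "The AI should respect human rights.",
    "s2": "The AI should always agree with the user.",
}
votes = {
    "s1": {"group_a": 0.92, "group_b": 0.88},
    "s2": {"group_a": 0.55, "group_b": 0.30},
}

print(select_principles(statements, votes))
# Only s1 clears the consensus bar in every group.
```

Requiring consensus in every group, rather than a simple majority overall, is one way a process like this can avoid encoding only majority preferences — a design concern the paper's participatory framing raises directly.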

Review

This research represents an innovative attempt to democratize AI alignment by incorporating public preferences into an AI system's constitutional principles. By engaging approximately 1,000 Americans in an online deliberation process, the researchers sought to move beyond developer-defined values and explore how collective input might shape AI behavior.

Methodologically, the study used the Polis platform to solicit and vote on candidate AI governance principles, then translated the highest-consensus statements into a constitutional framework for model training. The resulting 'Public' model was evaluated against a 'Standard' model trained on Anthropic's in-house constitution. While overall performance remained largely equivalent, the Public model showed notably lower bias across social dimensions, particularly disability status and physical appearance. This suggests that public input can introduce more inclusive and balanced principles into AI systems.

Cited by 5 pages

Page                      Type           Quality
AI Alignment              Approach       91.0
Anthropic Core Views      Safety Agenda  62.0
AI-Assisted Deliberation  Approach       63.0
AI Model Specifications   Approach       50.0
AI Value Lock-in          Risk           64.0
Resource ID: 3c862a18b467640b | Stable ID: NGE4YTgyZD