Collective Constitutional AI
Paper
Credibility Rating
High quality. Established institution or organization with editorial oversight and accountability.
Rating inherited from publication venue: Anthropic
A key Anthropic paper on participatory AI alignment; relevant to debates about whose values AI should encode and how democratic input can be operationalized in training processes.
Metadata
Summary
Anthropic extended its Constitutional AI framework by using the Polis platform to crowdsource constitutional principles from approximately 1,000 Americans, enabling more democratic input into AI alignment. The team trained a model on these publicly derived principles and compared its outputs to a model trained on Anthropic's standard constitution, finding the crowd-sourced model was less likely to refuse borderline requests while maintaining safety. This work explores how public deliberation can inform AI value alignment rather than leaving it solely to developers.
Key Points
- Used Polis, a deliberative polling platform, to gather and synthesize constitutional principles from ~1,000 demographically diverse Americans.
- The publicly sourced constitutional model was notably less paternalistic and more willing to engage with contentious topics than Anthropic's standard model.
- Demonstrates a methodology for incorporating broader societal input into AI alignment, reducing sole reliance on developer-defined values.
- Highlights tensions between public preferences and developer safety judgments, raising questions about whose values AI systems should reflect.
- Represents an early practical experiment in participatory or democratic approaches to AI governance and value alignment.
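The aggregation step described in the first key point can be illustrated with a small sketch. This is a hypothetical, simplified consensus filter in the spirit of Polis-style statement voting, not Anthropic's actual pipeline; the function name, vote encoding, and thresholds are all assumptions made for illustration.

```python
# Hypothetical sketch: keep crowd-sourced statements whose agreement
# rate across voters clears a consensus threshold. Not the real
# CCAI/Polis aggregation; thresholds and encoding are illustrative.

def select_principles(votes, min_agreement=0.6, min_votes=3):
    """votes maps each statement to a list of ballots: +1 agree, -1 disagree.
    Returns (statement, agreement_rate) pairs, highest consensus first."""
    selected = []
    for statement, ballots in votes.items():
        if len(ballots) < min_votes:
            continue  # too few votes to judge consensus
        agree_rate = sum(1 for v in ballots if v > 0) / len(ballots)
        if agree_rate >= min_agreement:
            selected.append((statement, agree_rate))
    return sorted(selected, key=lambda pair: -pair[1])

votes = {
    "The AI should be honest about its limitations": [1, 1, 1, 1, -1],
    "The AI should never discuss contentious topics": [1, -1, -1, -1],
    "The AI should respect personal privacy": [1, 1, 1],
}
print(select_principles(votes))
```

A filter like this captures why a crowd-derived constitution can diverge from a developer-written one: statements that lack broad agreement (such as the blanket refusal rule above) simply drop out, regardless of how a developer might weigh them.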
Review
Cited by 5 pages
| Page | Type | Quality |
|---|---|---|
| AI Alignment | Approach | 91.0 |
| Anthropic Core Views | Safety Agenda | 62.0 |
| AI-Assisted Deliberation | Approach | 63.0 |
| AI Model Specifications | Approach | 50.0 |
| AI Value Lock-in | Risk | 64.0 |