Longterm Wiki

Credibility Rating

High (4/5)

High quality. Established institution or organization with editorial oversight and accountability.

Rating inherited from publication venue: Anthropic

A key Anthropic paper on participatory AI alignment; relevant to debates about whose values AI should encode and how democratic input can be operationalized in training processes.

Metadata

Importance: 72/100 · organizational report · primary source

Summary

Anthropic extended their Constitutional AI framework by using the Polis platform to crowdsource constitutional principles from approximately 1,000 Americans, enabling more democratic input into AI alignment. They trained a model on these publicly derived principles and compared its outputs to their standard Claude model, finding the crowd-sourced model was less likely to refuse borderline requests while maintaining safety. This work explores how public deliberation can inform AI value alignment rather than leaving it solely to developers.

Key Points

  • Used Polis, a deliberative polling platform, to gather and synthesize constitutional principles from ~1,000 demographically diverse Americans.
  • The publicly sourced constitutional model was notably less paternalistic and more willing to engage with contentious topics than Anthropic's standard model.
  • Demonstrates a methodology for incorporating broader societal input into AI alignment, reducing sole reliance on developer-defined values.
  • Highlights tensions between public preferences and developer safety judgments, raising questions about whose values AI systems should reflect.
  • Represents an early practical experiment in participatory or democratic approaches to AI governance and value alignment.
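
The aggregation step described above — turning thousands of individual Polis votes into a shared set of constitutional principles — can be illustrated with a small sketch. This is a hypothetical simplification, not Anthropic's actual pipeline: the function name, data shapes, and the 0.7 consensus threshold are all illustrative assumptions. The core idea, keeping only statements that attract broad agreement across demographic groups, mirrors the group-aware consensus that Polis surfaces.

```python
# Hypothetical sketch of distilling Polis-style votes into constitutional
# principles: keep only statements with broad cross-group agreement.
# Names, shapes, and the threshold are illustrative assumptions.

def select_principles(statements, votes, threshold=0.7):
    """statements: {id: text}; votes: {id: {group: agree_rate}}.

    Keep a statement only if every demographic group's agreement
    rate meets the threshold (group-aware consensus)."""
    selected = []
    for sid, text in statements.items():
        rates = votes.get(sid, {})
        if rates and all(r >= threshold for r in rates.values()):
            selected.append(text)
    return selected

statements = {
    "s1": "The AI should respect human rights.",
    "s2": "The AI should always agree with the user.",
}
votes = {
    "s1": {"group_a": 0.92, "group_b": 0.88},
    "s2": {"group_a": 0.55, "group_b": 0.30},
}

print(select_principles(statements, votes))
# Only s1 clears the consensus bar in every group.
```

Requiring consensus in every group, rather than a simple majority overall, is one way a process like this can avoid encoding only majority preferences — a design concern the paper's participatory framing raises directly.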

Review

This research represents an innovative attempt to democratize AI alignment by incorporating public preferences into an AI system's constitutional principles. By engaging approximately 1,000 Americans in an online deliberation process, the researchers sought to move beyond developer-defined values and explore how collective input might shape AI behavior.

Methodologically, the study used the Polis platform to solicit and vote on candidate AI governance principles, then translated the highest-consensus statements into a constitutional framework for model training. The resulting 'Public' model was evaluated against a 'Standard' model trained on Anthropic's in-house constitution. While overall performance remained largely equivalent, the Public model showed notably lower bias across social dimensions, particularly disability status and physical appearance. This suggests that public input can introduce more inclusive and balanced principles into AI systems.

Cited by 5 pages

Page                      Type           Quality
AI Alignment              Approach       91.0
Anthropic Core Views      Safety Agenda  62.0
AI-Assisted Deliberation  Approach       63.0
AI Model Specifications   Approach       50.0
AI Value Lock-in          Risk           64.0
Resource ID: 3c862a18b467640b | Stable ID: NGE4YTgyZD