Constitutional AI
Status: active
Training approach in which AI systems critique and revise their own outputs against a set of principles (a "constitution"), reducing reliance on human feedback.
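The critique-and-revise loop described above can be sketched roughly as follows. This is an illustrative assumption, not Anthropic's actual implementation: `query_model`, the principle strings, and the loop structure are all hypothetical stand-ins (the real method additionally distills the revised outputs back into the model via supervised learning and RLAIF).

```python
# Minimal sketch of a Constitutional AI critique-and-revise loop.
# `query_model` is a stub standing in for a real LLM call, so the
# control flow runs end to end; the constitution below is illustrative.

CONSTITUTION = [
    "Choose the response that is least harmful.",
    "Choose the response that avoids giving dangerous instructions.",
]

def query_model(prompt: str) -> str:
    # Stub: a real system would send `prompt` to a language model here.
    if prompt.startswith("Critique"):
        return "The draft could be more cautious."
    if prompt.startswith("Revise"):
        return "Revised response incorporating the critique."
    return "Initial draft response."

def critique_and_revise(user_prompt: str, rounds: int = 2) -> str:
    """Draft a response, then repeatedly critique and revise it
    against each principle in the constitution."""
    response = query_model(user_prompt)
    for _ in range(rounds):
        for principle in CONSTITUTION:
            critique = query_model(
                f"Critique this response against the principle "
                f"'{principle}':\n{response}"
            )
            response = query_model(
                f"Revise the response to address this critique:\n"
                f"{critique}\nOriginal response:\n{response}"
            )
    return response
```

With a stubbed model the loop simply threads the draft through repeated critique/revision calls; swapping `query_model` for a real model API turns the same control flow into the self-supervision step the entry describes.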
Organizations: 1 · Key Papers: 1 · Grants: 1 · Total Funding: $42K
First Proposed: 2022 (Bai et al., Anthropic)
Cluster: Alignment Training
Tags: training, self-supervision, alignment, anthropic
Organizations (1)
| Organization | Role |
|---|---|
| Anthropic | pioneer |
Grants (1)
| Name | Recipient | Amount | Funder | Date |
|---|---|---|---|---|
| 6-month 1 FTE funding to train Multi-Objective RLAIF models and compare their safety performance to standard RLAIF | Marcus Williams | $42K | Long-Term Future Fund (LTFF) | 2023-10 |
Funding by Funder
| Funder | Grants | Total Amount |
|---|---|---|
| Long-Term Future Fund (LTFF) | 1 | $42K |
Key Papers & Resources (1)
SEMINAL
Constitutional AI: Harmlessness from AI Feedback
Bai et al. (Anthropic), 2022