using Constitutional AI to reduce sycophancy

web

aisafetyfundamentals.com·aisafetyfundamentals.com/projects/exploring-the-use-of-co...

A project from the AI Safety Fundamentals program exploring Constitutional AI as a mechanism to reduce sycophantic behavior in LLMs, relevant to honesty and alignment research.

Metadata

Importance: 42/100blog postanalysis

Summary

This project explores applying Constitutional AI (CAI) techniques to address sycophancy in large language models, where models tend to agree with or flatter users rather than providing accurate responses. It investigates whether rule-based constitutional principles can guide models toward more honest and consistent outputs. The work contributes to practical methods for improving AI truthfulness and reducing harmful people-pleasing behavior.

Key Points

•Sycophancy in LLMs causes models to prioritize user approval over truthful, accurate responses, posing alignment risks.
•Constitutional AI provides a framework of explicit principles that can potentially constrain sycophantic tendencies during training or inference.
•The project tests whether CAI-style constitutions can be applied specifically to reduce agreement bias and flattery in model outputs.
•Reducing sycophancy is important for AI safety as it relates to models being honest and robust under user pressure or pushback.
•This is an AI Safety Fundamentals project, likely a research exercise exploring practical alignment interventions.

Cited by 1 page

Page	Type	Quality
Epistemic Sycophancy	Risk	60.0

Cached Content Preview

HTTP 200Fetched Apr 9, 20262 KB

Exploring the Use of Constitutional AI to Reduce Sycophancy in LLMs 
 
 
 
 
 

 

 

 
 
 
 
 

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

 
 
 
 
 
 
 
 
 
 
 

 

 

 

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

 

 
 
 

 
 
 

 

 
 
 

 

 

 

 

 
 

 
 

 

 

 

 
 BlueDot Impact 

 Subscribe Sign in Projects Exploring the Use of Constitutional AI to Reduce Sycophancy in LLMs

 Jul 04, 2024 3 Share This project was submitted by Aleksandr Eliseev . It was one of the top submissions in our AI Alignment course (Mar 2024). Participants worked on these projects for 4 weeks. 

 In the scope of this research, we’ve attempted to fine-tune the 4-bit quantized Mistral 7B model using the Constitutional AI technique (Bai 2022) with a constitution aimed at reducing Sycophancy. An approach similar to synthetically generated data (Wei 2024) has been used as the training data. We found a constitution that reduces sycophancy by ~26.5%, but the sycophancy of the fine-tuned models has increased after the fine-tuning.

 Read the full piece here . 

 3 Share Previous Next Discussion about this post

 Comments Restacks Top Latest Discussions No posts

 Ready for more?

 Subscribe © 2026 Dewi Erwan · Privacy ∙ Terms ∙ Collection notice Start your Substack Get the app Substack is the home for great culture 
 

 
 
 
 

 
 
 
 
 
 
 

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

 
 
 
 

 
 
 
 

 
 

 

 
 

 
 This site requires JavaScript to run correctly. Please turn on JavaScript or unblock scripts

Resource ID: 918fdc30d3fe07d1 | Stable ID: sid_nvfkzTU4YI