Skip to content
Longterm Wiki
Back

using Constitutional AI to reduce sycophancy

web

A project from the AI Safety Fundamentals program exploring Constitutional AI as a mechanism to reduce sycophantic behavior in LLMs, relevant to honesty and alignment research.

Metadata

Importance: 42/100blog postanalysis

Summary

This project explores applying Constitutional AI (CAI) techniques to address sycophancy in large language models, where models tend to agree with or flatter users rather than providing accurate responses. It investigates whether rule-based constitutional principles can guide models toward more honest and consistent outputs. The work contributes to practical methods for improving AI truthfulness and reducing harmful people-pleasing behavior.

Key Points

  • Sycophancy in LLMs causes models to prioritize user approval over truthful, accurate responses, posing alignment risks.
  • Constitutional AI provides a framework of explicit principles that can potentially constrain sycophantic tendencies during training or inference.
  • The project tests whether CAI-style constitutions can be applied specifically to reduce agreement bias and flattery in model outputs.
  • Reducing sycophancy is important for AI safety as it relates to models being honest and robust under user pressure or pushback.
  • This is an AI Safety Fundamentals project, likely a research exercise exploring practical alignment interventions.

Cited by 1 page

PageTypeQuality
Epistemic SycophancyRisk60.0

Cached Content Preview

HTTP 200Fetched Mar 20, 20262 KB
[![BlueDot Impact](https://substackcdn.com/image/fetch/$s_!PUc5!,w_40,h_40,c_fill,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b9c29b1-3ff5-4ef6-8c04-e91c608ec10e_1000x1000.png)](https://blog.bluedot.org/)

# [BlueDot Impact](https://blog.bluedot.org/)

SubscribeSign in

[Projects](https://blog.bluedot.org/s/projects/?utm_source=substack&utm_medium=menu)

# Exploring the Use of Constitutional AI to Reduce Sycophancy in LLMs

Jul 04, 2024

3

Share

_This project was submitted by [Aleksandr Eliseev](https://www.linkedin.com/in/eliseealex/). It was one of the top submissions in our AI Alignment course (Mar 2024). Participants worked on these projects for 4 weeks._

In the scope of this research, we’ve attempted to fine-tune the 4-bit quantized Mistral 7B model using the Constitutional AI technique (Bai 2022) with a constitution aimed at reducing Sycophancy. An approach similar to synthetically generated data (Wei 2024) has been used as the training data. We found a constitution that reduces sycophancy by ~26.5%, but the sycophancy of the fine-tuned models has increased after the fine-tuning.

> Read the full piece [here](https://docs.google.com/document/d/1_b853mHN2-Pq-DtuswMOX4BoeQWa0kCdCXqrm4NtgoY/edit).

3

Share

PreviousNext

#### Discussion about this post

CommentsRestacks

![User's avatar](https://substackcdn.com/image/fetch/$s_!TnFC!,w_32,h_32,c_fill,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack.com%2Fimg%2Favatars%2Fdefault-light.png)

TopLatestDiscussions

No posts

### Ready for more?

Subscribe
Resource ID: 918fdc30d3fe07d1 | Stable ID: Y2FiMmE0MT