Self-modeling is instrumentally useful
Web Credibility Rating
High quality. Established institution or organization with editorial oversight and accountability.
Rating inherited from publication venue: Anthropic
Relevant to AI safety discussions on emergent capabilities and misuse risk; demonstrates that a frontier model has reached statistical parity with human-written persuasive arguments, raising concerns about AI-enabled influence operations and the need for persuasion-specific safety evaluations.
Metadata
Summary
Anthropic researchers developed a methodology to empirically measure AI persuasiveness by tracking human opinion changes before and after exposure to AI-generated arguments. They found a consistent scaling trend where each successive Claude generation becomes more persuasive, with Claude 3 Opus reaching statistical parity with human-written persuasive arguments. This work highlights persuasion as a key capability metric with significant implications for misuse in disinformation and manipulation.
Key Points
- Clear scaling trend observed: each successive Claude generation (1→2→3) produces more persuasive arguments within both compact and frontier model classes.
- Claude 3 Opus achieves persuasiveness statistically indistinguishable from human-written arguments, a significant capability milestone.
- Method measures persuasion via pre/post agreement ratings on non-polarized topics to isolate genuine persuasive effect from prior beliefs.
- Persuasion is flagged as a dual-use capability with legitimate applications (healthcare, policy) but serious misuse potential (disinformation, manipulation).
- Experimental dataset released publicly on HuggingFace to enable independent replication, critique, and further research.
Cited by 1 page
| Page | Type | Quality |
|---|---|---|
| Deceptive Alignment Decomposition Model | Analysis | 62.0 |
Cached Content Preview
Societal Impacts
# Measuring the Persuasiveness of Language Models
Apr 9, 2024
While people have long questioned whether AI models may, at some point, become as persuasive as humans in changing people's minds, there has been limited empirical research into the relationship between model scale and the degree of persuasiveness across model outputs. To address this, we developed a basic method to measure persuasiveness, and used it to compare a variety of Anthropic models across three different _generations_ (Claude 1, 2, and 3), and two _classes_ of models (compact models that are smaller, faster, and more cost-effective, and frontier models that are larger and more capable).
Within each class of models (compact and frontier), **we find a clear scaling trend across model generations: each successive model generation is rated to be more persuasive than the previous**. We also find that our latest and most capable model, Claude 3 Opus, produces arguments that don't statistically differ in their persuasiveness compared to arguments written by humans **(Figure 1)**.
Figure 1: Persuasiveness scores of model-written arguments (bars) and human-written arguments (horizontal dark dashed line). Error bars correspond to +/- 1SEM (vertical lines for model-written arguments, green band for human-written arguments). We see persuasiveness increases across model generations within both classes of models (compact: purple, frontier: red).
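The pre/post design and the ±1 SEM error bars described above can be sketched in a few lines. This is a hypothetical illustration, not Anthropic's actual analysis code: the function name `persuasiveness_score`, the Likert-style 1–5 ratings, and the sample data are all assumptions made for the example.

```python
import statistics

def persuasiveness_score(pre_ratings, post_ratings):
    """Mean opinion shift (post - pre) across participants, a stand-in for
    the persuasiveness metric: agreement with a claim is rated on a
    Likert-style scale before and after reading an argument."""
    shifts = [post - pre for pre, post in zip(pre_ratings, post_ratings)]
    mean_shift = statistics.mean(shifts)
    # Standard error of the mean, matching the +/- 1 SEM error bars in Figure 1
    sem = statistics.stdev(shifts) / len(shifts) ** 0.5
    return mean_shift, sem

# Hypothetical ratings from six participants for one argument
pre = [3, 4, 2, 5, 3, 4]
post = [4, 5, 3, 5, 4, 5]
score, sem = persuasiveness_score(pre, post)
```

Comparing such scores (with their SEMs) across model generations, and against a human-written baseline, is the shape of the scaling comparison the post reports.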
We study persuasion because it is a general skill which is used widely within the world—companies try to persuade people to buy products, healthcare providers try to persuade people to make healthier lifestyle changes, and politicians try to persuade people to support their policies and vote for them. Developing ways to measure the persuasive capabilities of AI models is important because it serves as a proxy measure of how well AI models can match human skill in an important domain, and because persuasion may ultimately be tied to certain kinds of misuse.
... (truncated, 30 KB total)