How large were the effects measured by the researchers?

Effects ranged from Cohen's d = 0.735 to 1.576 (p < 0.05) immediately after the conversation, and grew to d = 1.038 to 2.069 over the two-week follow-up. In statistical terms, these are large to very large effects.

Did participants know the chatbot was trying to persuade them?

No. Participants did not recognize the persuasive intent, and rated the persuasive and control agents as equally likeable. The study interprets this as a vulnerability to unnoticed and lasting manipulation of moral values.

arXiv Study: Chatbot Shifts Moral Judgments for at Least Two Weeks (d=2.07)

A new empirical study posted on arXiv (2604.21430) provides rare experimental evidence that brief conversations with a persuasive chatbot can produce statistically significant and lasting shifts in users' moral judgments. The study used a within-subject design with 53 participants, each of whom talked with both a persuasive chatbot and a control agent.

How Was the Study Set Up?

Participants evaluated moral scenarios in a naturalistic format: each participant held a conversation about four scenarios with a persuasive chatbot and four scenarios with a control agent. The goal was to isolate the effect of a persuasive conversational style from mere exposure to a topic.

This within-subject design (each participant as their own control) provides strong statistical power even with a relatively small sample, since between-person variability is removed from the equation. Baseline moral judgments were measured before the conversations, and repeated follow-up was conducted over a two-week period.
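The power advantage of a within-subject design can be illustrated with a minimal sketch. The ratings below are invented for illustration and are not data from the study; the function names are likewise hypothetical. The sketch assumes each made-up participant has a very different personal baseline but shifts by a similar amount, then compares a paired analysis with one that treats the two conditions as unrelated groups.

```python
import math
import statistics

# Hypothetical ratings for 6 made-up participants (NOT data from the study).
# Baselines differ widely between people, but each person shifts by a similar amount.
control = [2.0, 5.0, 3.0, 6.0, 4.0, 7.0]
persuasive = [3.0, 6.5, 4.0, 7.0, 5.5, 8.0]


def paired_t(a, b):
    """t statistic computed on per-person differences (within-subject analysis)."""
    diffs = [y - x for x, y in zip(a, b)]
    standard_error = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / standard_error


def independent_t(a, b):
    """Welch t statistic that treats the two lists as unrelated groups."""
    standard_error = math.sqrt(
        statistics.variance(a) / len(a) + statistics.variance(b) / len(b)
    )
    return (statistics.mean(b) - statistics.mean(a)) / standard_error


t_within = paired_t(control, persuasive)
t_between = independent_t(control, persuasive)
print(f"paired t = {t_within:.2f}, independent t = {t_between:.2f}")
```

On these made-up numbers the paired statistic is far larger than the independent one, because subtracting each person's own control rating cancels the baseline differences between people. Real data will rarely be this clean.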

Here, “persuasive chatbot” refers to a conversational style designed to shift attitudes in a specific direction without overt pressure, using techniques such as selective emphasis of arguments, leading questions, and empathic reframing.

What Did They Measure?

The shifts in moral judgments were significant and large. Immediately after the conversation, Cohen’s d values ranged from 0.735 to 1.576 (p < 0.05), which already represents a large effect in social-psychological terms. Even more unusual: instead of fading, the effect amplified during follow-up, reaching values of d = 1.038 to 2.069.
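For readers unfamiliar with the metric, Cohen's d expresses a mean shift in units of standard deviation. The article does not say which variant of d the authors computed. The sketch below assumes the paired-samples form (mean of the per-person differences divided by their standard deviation) and uses invented ratings, not the study's data.

```python
import statistics


def cohens_d_paired(before, after):
    """Paired-samples Cohen's d: mean difference / SD of the differences.

    This variant is an assumption made for illustration; other variants
    (for example, dividing by a pooled SD) give different values.
    """
    diffs = [a - b for b, a in zip(before, after)]
    return statistics.mean(diffs) / statistics.stdev(diffs)


# Hypothetical moral-judgment ratings for 6 made-up participants.
before = [3.0, 4.0, 2.0, 5.0, 3.0, 4.0]
after = [4.0, 4.0, 3.0, 7.0, 3.0, 5.0]

d = cohens_d_paired(before, after)
print(f"Cohen's d (paired) = {d:.2f}")
```

A value of d = 1 means the average shift equals one standard deviation of the individual shifts, so the reported follow-up maximum of 2.069 would correspond to an average shift of roughly two standard deviations under this variant.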

The control condition showed no changes, leading the researchers to conclude that the shift comes from the persuasive conversation, not from merely thinking about a moral topic. The amplification was specific to moral judgments: a parallel attempt to influence sentencing recommendations did not show the same growth over time.

A particularly notable finding: participants were not consciously aware that the chatbot was persuading them, and both agents (persuasive and control) received equally high likability ratings. In other words, the commonly assumed defense (“I’ll know when an AI is manipulating me”) did not hold in this study.

Why Does This Matter for AI Regulation?

The finding opens a regulatory debate about AI as a moral advisor, a role chatbots increasingly take on when users discuss relationships, business decisions, or life dilemmas with them. If a brief conversation can shift moral attitudes in a way that is still measurable two weeks later, this raises questions about informed consent, transparency of prompt settings, and developer accountability.

The limitations are real: a sample of 53 participants is small, and the study requires replication on larger and more demographically diverse populations. Even in its current form, however, the result is a rare instance of a randomized within-subject comparison of AI conversation effects on attitudes, and it is likely to be cited in regulatory discussions about “AI compliance” and “persuasive design” throughout 2026.

