arXiv:2605.21006: Off-the-shelf persona vectors achieve 68-98% of the effectiveness of targeted sycophancy steering in LLMs
A paper posted to arXiv on 21 May 2026, titled 'Playing Devil's Advocate', shows that existing persona vectors developed for roleplay tasks can reduce sycophancy (a model's tendency to agree with the user even when the user is wrong), achieving 68-98% of the effectiveness of specialised Contrastive Activation Addition (CAA) without any training on sycophancy-specific data. The paper's geometric analysis indicates that sycophancy is a persona-level property rather than a single steerable direction in activation space, which points to easier routes for alignment, since general-purpose persona vectors can be reused instead of building sycophancy-specific ones.
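
For readers unfamiliar with the baseline, the following is a minimal sketch of the general CAA idea: a steering vector is the difference between the mean activations on contrasting prompt sets, and it is added (scaled) to a hidden state at inference time. It is not the paper's implementation. The toy 3-dimensional activations, the function names, and the scaling factor `alpha` are all assumptions made for illustration, and the cosine check is only one plausible way to compare a reused persona vector with a sycophancy-specific one.

```python
import math


def mean_vector(rows):
    """Component-wise mean of equal-length activation vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def caa_vector(positive_acts, negative_acts):
    """Steering vector as mean(positive) - mean(negative) activations."""
    pos = mean_vector(positive_acts)
    neg = mean_vector(negative_acts)
    return [p - q for p, q in zip(pos, neg)]


def steer(hidden, direction, alpha):
    """Add alpha * direction to a hidden state; negative alpha steers away."""
    return [h + alpha * d for h, d in zip(hidden, direction)]


def cosine(u, v):
    """Cosine similarity, e.g. between a persona vector and a CAA vector."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Toy activations, invented for illustration (not data from the paper).
sycophantic = [[1.0, 0.0, 2.0], [3.0, 0.0, 2.0]]
honest = [[0.0, 0.0, 2.0], [0.0, 0.0, 2.0]]

v_syc = caa_vector(sycophantic, honest)

# Hypothetical off-the-shelf persona vector, assumed to exist already.
persona = [1.0, 0.0, 0.0]
alignment = cosine(v_syc, persona)

# Steer a hidden state away from the sycophancy direction.
steered = steer([1.0, 1.0, 1.0], v_syc, alpha=-0.5)
```

In a real model the activations would come from a chosen layer's residual stream rather than hand-written lists, and the steering strength would need tuning per layer and per vector.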