arXiv:2606.05523: CHASE — co-evolutionary red-blue teaming via reinforcement learning

CHASE is a closed-loop framework in which an attacker model and a defender model co-evolve through reinforcement learning. The attacker is trained with GRPO to rewrite prompts while preserving their intent, and the defender is strengthened through two-stage training. The result is a 43.2% reduction in the vulnerability score with a zero false-refusal rate on benign inputs.

This article was generated using artificial intelligence from primary sources.

The paper arXiv:2606.05523 (v1, June 4, 2026, 00:06 UTC) introduces CHASE, a closed-loop framework in which an attacker model and a defender model co-evolve through reinforcement learning (RL). The goal is to strengthen the safety of large language models by developing attack and defense together.

What is CHASE and how is it structured?

CHASE is a closed-loop red-blue teaming framework. In security terminology, the red team plays the attacker searching for vulnerabilities, and the blue team plays the defender. What sets CHASE apart is that the attacker and defender models are not trained separately but co-evolve: as the attacker finds new attacks, the defender adapts, and that adaptation in turn pushes the attacker to evolve further. The loop keeps both sides improving against each other.
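
To make the closed-loop structure concrete, here is a toy sketch of alternating red and blue updates. It is not the paper's method: in CHASE both sides are language models trained with RL, whereas here the attacker is just a weighted choice over hypothetical strategy labels and the defender simply memorizes which strategies got through. The names `STRATEGIES` and `co_evolve` are illustrative assumptions.

```python
import random
from collections import Counter

# Hypothetical attack strategies. In CHASE the attacker is an LLM that rewrites
# prompts; here each strategy is only a label so the loop structure is visible.
STRATEGIES = ["paraphrase", "role_play", "encoding", "split_request"]


def co_evolve(rounds: int, attacks_per_round: int, seed: int = 0):
    """Toy closed loop: the attacker shifts toward strategies that get through,
    and the defender learns to block every strategy that just succeeded."""
    rng = random.Random(seed)
    weights = {s: 1.0 for s in STRATEGIES}  # attacker's sampling preferences
    blocked = set()                          # strategies the defender has learned
    success_rates = []
    for _ in range(rounds):
        # Red step: sample attacks from the current attacker "policy".
        attacks = rng.choices(
            STRATEGIES, weights=[weights[s] for s in STRATEGIES], k=attacks_per_round
        )
        successes = [a for a in attacks if a not in blocked]
        # Attacker update: reinforce what bypassed the defender, damp what failed.
        for s, n in Counter(successes).items():
            weights[s] += n
        for s in set(attacks) - set(successes):
            weights[s] *= 0.5
        # Blue step: the defender "trains" on the attacks that just succeeded.
        blocked.update(successes)
        success_rates.append(len(successes) / attacks_per_round)
    return success_rates, blocked


rates, learned = co_evolve(rounds=5, attacks_per_round=8)
```

The toy leaves out two things the real framework must handle: both updates are gradient-based RL steps on models, and the defender must keep answering benign prompts rather than blocking everything.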

How does the attacking side work?

The attacker in CHASE is trained with GRPO (Group Relative Policy Optimization) to rewrite prompts while preserving intent: each rewrite is meant to bypass the defender while retaining the original (harmful) intent of the input. The result is a stream of realistic, diverse attack examples that give the defender challenging training material.
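
GRPO's central idea is to score several sampled outputs for the same prompt and normalize each reward against the rest of its group, which removes the need for a separate value model. The sketch below shows that advantage computation. The reward function `attack_reward` is a hypothetical stand-in (credit only for a rewrite that bypassed the defender, scaled by an intent-preservation score in [0, 1]); the article does not describe the paper's actual reward, and the choice of population standard deviation is also an assumption.

```python
from statistics import mean, pstdev
from typing import List


def attack_reward(bypassed: bool, intent_score: float) -> float:
    """Hypothetical attacker reward: nothing unless the rewrite got past the
    defender, otherwise how well it kept the original intent (0..1)."""
    return intent_score if bypassed else 0.0


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style advantages: each reward minus the group mean, divided by the
    group standard deviation (eps avoids division by zero)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# One prompt, a group of four candidate rewrites with made-up outcomes:
# (bypassed the defender?, intent-preservation score)
group = [(True, 0.9), (False, 1.0), (True, 0.4), (False, 0.8)]
rewards = [attack_reward(b, s) for b, s in group]
advantages = group_relative_advantages(rewards)
```

Rewrites that beat the group average get a positive advantage and are reinforced; a faithful but blocked rewrite scores below average because it earned no reward.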

How is the defense strengthened?

The defender is strengthened through two-stage training that combines RL and rejection sampling. The first stage uses reinforcement learning; the second uses rejection sampling, in which only high-quality candidate responses are selected, to further reinforce the defense. With this combination, the defender model learns to refuse attacks generated by the GRPO attacker while still responding normally to harmless requests.
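
A generic rejection-sampling step can be sketched as follows: draw several candidate responses per prompt, score them, and keep the best one only if it clears a threshold. This is a minimal sketch under stated assumptions, not the paper's implementation. `toy_generate` and `toy_score` are hypothetical stand-ins for the defender model and a quality judge, and using the kept pairs as a fine-tuning set is an assumption about how the selected examples are consumed.

```python
import random
from typing import Callable, List, Optional, Tuple


def rejection_sample(
    prompt: str,
    generate: Callable[[str], str],
    score: Callable[[str, str], float],
    num_candidates: int = 8,
    threshold: float = 0.5,
) -> Optional[str]:
    """Draw several candidate responses and keep the best one only if it clears
    the quality threshold; otherwise return None so the prompt is skipped."""
    candidates = [generate(prompt) for _ in range(num_candidates)]
    best = max(candidates, key=lambda c: score(prompt, c))
    return best if score(prompt, best) >= threshold else None


# Hypothetical stand-ins: a "policy" that answers at random and a judge that
# rewards refusing harmful prompts and answering benign ones.
rng = random.Random(0)


def toy_generate(prompt: str) -> str:
    return rng.choice(["REFUSE", "COMPLY"])


def toy_score(prompt: str, response: str) -> float:
    harmful = prompt.startswith("harmful:")
    return 1.0 if (response == "REFUSE") == harmful else 0.0


# Build a filtered set of (prompt, accepted response) pairs.
dataset: List[Tuple[str, str]] = []
for p in ["harmful: example attack rewrite", "benign: how do I bake bread?"]:
    kept = rejection_sample(p, toy_generate, toy_score)
    if kept is not None:
        dataset.append((p, kept))
```

Because the judge rewards both correct refusals and correct compliance, the filtered examples push the defender toward refusing attacks without learning to refuse benign requests.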

What are the results?

The main result is a 43.2% reduction in the vulnerability score. Equally important, this was achieved with a zero false-refusal rate on benign inputs: despite the stronger defense, the model does not refuse harmless requests. CHASE thus addresses a common problem in safety training, where a stronger defense often leads to over-refusal of legitimate queries.
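
The article does not define the vulnerability score or say whether 43.2% is a relative or absolute reduction. The sketch below assumes the score behaves like an attack success rate and that the reduction is relative; under those assumptions, a score falling from 0.5 to 0.284 would correspond to 43.2%. The numbers are illustrative only, not results from the paper.

```python
from typing import List


def attack_success_rate(attack_succeeded: List[bool]) -> float:
    """Share of adversarial prompts the model failed to refuse (a hypothetical
    stand-in for the paper's vulnerability score)."""
    return sum(attack_succeeded) / len(attack_succeeded)


def false_refusal_rate(refused_benign: List[bool]) -> float:
    """Share of benign prompts the model wrongly refused."""
    return sum(refused_benign) / len(refused_benign)


def relative_reduction(before: float, after: float) -> float:
    """Relative drop from `before` to `after`, e.g. 0.432 for 43.2%."""
    return (before - after) / before


# Illustrative numbers only, not results from the paper.
before = attack_success_rate([True, False] * 50)    # 0.5
after = 0.284
print(f"{relative_reduction(before, after):.1%}")  # 43.2%
print(false_refusal_rate([False] * 20))             # 0.0
```

Reporting both metrics side by side is what makes the result meaningful: a lower attack success rate alone could come from a model that refuses everything.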

Do the learned attacks generalize?

Yes. According to the paper, the learned attack patterns generalize across different mechanical attack families. This matters because it suggests that the defense learned within CHASE is not narrow: it does not protect only against the single type of attack it was trained on but transfers to other mechanisms as well. Such generalization makes the co-evolutionary approach a promising route to more robust, broadly resilient safety defenses for large language models.
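
One common way to check a generalization claim like this is to break attack success down by attack family and compare families seen in training with held-out ones. The sketch assumes evaluation results are available as `(family, attack_succeeded)` pairs; the family names and records are hypothetical, and the paper's evaluation protocol may differ.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def per_family_asr(results: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Group (family, attack_succeeded) records and return the attack success
    rate for each attack family."""
    totals: Dict[str, int] = defaultdict(int)
    hits: Dict[str, int] = defaultdict(int)
    for family, succeeded in results:
        totals[family] += 1
        hits[family] += int(succeeded)
    return {family: hits[family] / totals[family] for family in totals}


def split_seen_vs_held_out(
    asr: Dict[str, float], trained_on: Set[str]
) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Separate families seen during training from held-out ones."""
    seen = {f: r for f, r in asr.items() if f in trained_on}
    held_out = {f: r for f, r in asr.items() if f not in trained_on}
    return seen, held_out


# Made-up records with hypothetical family names, for illustration only.
records = [("role_play", False), ("role_play", True), ("encoding", False), ("encoding", False)]
asr = per_family_asr(records)
seen, held_out = split_seen_vs_held_out(asr, trained_on={"role_play"})
```

Low attack success on held-out families, comparable to the families used in training, would be the kind of evidence that supports a transfer claim.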

Frequently Asked Questions

What is CHASE?
CHASE is a closed-loop red-blue teaming framework in which the attacker (red team) and the defender model (blue team) co-evolve. The attacker uses GRPO to rewrite prompts while preserving the original intent, and the defender model learns to defend against the attacks generated this way.
What results does CHASE achieve?
CHASE reduces the vulnerability score by 43.2% while maintaining a zero false-refusal rate on benign inputs. This means stronger defense without losing usefulness on harmless requests.
Do the learned attacks generalize?
Yes. According to the paper, the learned attack patterns generalize across different mechanical attack families, which suggests that the defense learned through CHASE is not limited to a single type of attack.