🟡 🛡️ Security Sunday, April 19, 2026 · 3 min read

SAGO: New Machine Unlearning Method Restores MMLU from 44.6% to 96% Without Sacrificing Forgetting, Accepted at ACL 2026

Editorial illustration: selective removal of memory fragments, protective layer around a neural network

Why it matters

SAGO is a gradient synthesis framework that reformulates machine unlearning as an asymmetric two-task problem — knowledge retention as the primary objective and forgetting as auxiliary. On the WMDP Bio benchmark it raises MMLU from the 44.6% baseline to 96%, surpassing PCGrad's 94%, with comparable forgetting scores — addressing the main shortcoming of previous unlearning methods, which destroyed much of the model's useful knowledge along with the targeted content.

What Does SAGO Actually Solve?

Machine unlearning is a technique for removing specific knowledge from an already trained language model — for example, dangerous biological procedures or personal data about an individual — without full retraining. The problem is that existing methods forget too broadly: by removing targeted knowledge, they simultaneously destroy the model’s general intelligence.
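To see why this collapse happens, here is a minimal sketch of the most common naive approach: gradient *ascent* on the forget set. The function name and learning rate are illustrative, not from the paper — the point is that the update direction ignores the retain distribution entirely, so it degrades whatever shares parameters with the forgotten content.

```python
import numpy as np

def naive_unlearning_step(params, grad_forget, lr=0.1):
    """One step of naive unlearning: ascend the loss on forget-set examples.

    This erases the targeted knowledge, but because nothing constrains the
    update with respect to the retain distribution, shared parameters drift
    and general capability (e.g. MMLU) collapses alongside the forgetting.
    """
    return params + lr * grad_forget  # plain gradient ascent on the forget loss
```

Methods like PCGrad and SAGO exist precisely to constrain this update so it stops damaging everything else.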

SAGO (Sign-constrained Asymmetric Gradient Optimization) is a new framework that reformulates the problem as an asymmetric two-task problem:

  • Primary task: Retain existing knowledge
  • Auxiliary task: Forget the targeted content

The difference is not cosmetic — SAGO uses gradient synthesis combining the PCGrad approach with sign-constrained logic that prioritizes retention. In practice, when the gradients of the two tasks conflict, SAGO leans toward retention — because the primary goal is not to forget, but to preserve the model’s general competence while removing specific knowledge.
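The asymmetric conflict resolution described above can be sketched as follows. This is an illustration of the general idea under stated assumptions, not the paper's exact algorithm (which has not been released): when the forgetting gradient opposes the retention gradient, the conflicting component is projected out PCGrad-style — but only from the forgetting side, so retention is never compromised. The function name and the `forget_weight` parameter are hypothetical.

```python
import numpy as np

def asymmetric_gradient_update(g_retain, g_forget, forget_weight=0.5):
    """Combine retention (primary) and forgetting (auxiliary) gradients.

    If the two gradients conflict (negative dot product), remove the
    component of g_forget that opposes g_retain, as in PCGrad — but
    asymmetrically: g_retain is never modified.
    """
    dot = np.dot(g_forget, g_retain)
    if dot < 0:  # the forgetting update would hurt retention
        g_forget = g_forget - (dot / np.dot(g_retain, g_retain)) * g_retain
    return g_retain + forget_weight * g_forget
```

For example, with g_retain = [1, 0] and g_forget = [-1, 1], the conflicting component (-1 along the retention axis) is stripped, leaving only the harmless [0, 1] part of the forgetting update. Symmetric PCGrad would instead project both gradients, allowing the forgetting task to erode retention.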

How Big Is the Difference in Numbers?

On the WMDP (Weapons of Mass Destruction Proxy) Bio benchmark — the standard test measuring how much a model has “forgotten” dangerous biological knowledge — SAGO achieves the following:

| Method | MMLU score | Forgetting |
| --- | --- | --- |
| Baseline (after standard unlearning) | 44.6% | — |
| PCGrad (previous SOTA) | 94.0% | comparable |
| SAGO (new result) | 96.0% | comparable |

MMLU (Massive Multitask Language Understanding) is the primary benchmark for general language intelligence. Dropping from the ~75% pre-trained level to 44.6% after standard unlearning means the model lost a large portion of its useful knowledge. SAGO brings the score back up to 96%, practically eliminating this loss, while the targeted WMDP Bio content remains forgotten.

Why Is This Significant for Model Safety?

Unlearning has become a key component of responsible AI deployment — regulators (EU AI Act, GDPR) and users increasingly expect model operators to remove specific knowledge on demand. If the method destroys general competence, operators are left with a binary choice: keep the model as-is, or fully retrain it from scratch.

SAGO proves that it is possible to have both — precise forgetting and preserved knowledge — using existing methods available to anyone who already has access to a trained model.

Peer Review Status

The paper has been accepted at ACL 2026 (Annual Meeting of the Association for Computational Linguistics), one of the top NLP conferences. This means it has passed peer review — a significant signal of the quality and reliability of the results. The authors (a seven-member team, led by Xiao) have not released code with the preprint, though ACL venues typically encourage a code release alongside publication.

🤖

This article was generated using artificial intelligence from primary sources.