🟡 🛡️ Security Sunday, April 19, 2026 · 3 min read

SAGO: New Machine Unlearning Method Restores MMLU from 44.6% to 96% Without Sacrificing Forgetting, Accepted at ACL 2026

Editorial illustration: selective removal of memory fragments, protective layer around a neural network

Why it matters

SAGO is a gradient synthesis framework that reformulates machine unlearning as an asymmetric two-task problem — knowledge retention as the primary objective and forgetting as auxiliary. On the WMDP Bio benchmark it raises MMLU from the 44.6% baseline to 96%, surpassing PCGrad's 94%, with comparable forgetting scores — addressing the main shortcoming of previous unlearning methods, which destroyed much of the model's useful knowledge along with the targeted content.

What Does SAGO Actually Solve?

Machine unlearning is a technique for removing specific knowledge from an already trained language model — for example, dangerous biological procedures or personal data about an individual — without full retraining. The problem is that existing methods forget too broadly: by removing targeted knowledge, they simultaneously destroy the model’s general intelligence.
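To see why this collapse happens, here is a minimal sketch of the most common naive approach: gradient *ascent* on the forget set. The function name and learning rate are illustrative, not from the paper — the point is that the update direction ignores the retain distribution entirely, so it degrades whatever shares parameters with the forgotten content.

```python
import numpy as np

def naive_unlearning_step(params, grad_forget, lr=0.1):
    """One step of naive unlearning: ascend the loss on forget-set examples.

    This erases the targeted knowledge, but because nothing constrains the
    update with respect to the retain distribution, shared parameters drift
    and general capability (e.g. MMLU) collapses alongside the forgetting.
    """
    return params + lr * grad_forget  # plain gradient ascent on the forget loss
```

Methods like PCGrad and SAGO exist precisely to constrain this update so it stops damaging everything else.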

SAGO (Sign-constrained Asymmetric Gradient Optimization) is a new framework that reformulates the problem as an asymmetric two-task problem:

  • Primary task: Retain existing knowledge
  • Auxiliary task: Forget the targeted content

The difference is not cosmetic — SAGO uses gradient synthesis combining the PCGrad approach with sign-constrained logic that prioritizes retention. In practice, when the gradients of the two tasks conflict, SAGO leans toward retention — because the primary goal is not to forget, but to preserve the model’s general competence while removing specific knowledge.
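The asymmetric conflict resolution described above can be sketched as follows. This is an illustration of the general idea under stated assumptions, not the paper's exact algorithm (which has not been released): when the forgetting gradient opposes the retention gradient, the conflicting component is projected out PCGrad-style — but only from the forgetting side, so retention is never compromised. The function name and the `forget_weight` parameter are hypothetical.

```python
import numpy as np

def asymmetric_gradient_update(g_retain, g_forget, forget_weight=0.5):
    """Combine retention (primary) and forgetting (auxiliary) gradients.

    If the two gradients conflict (negative dot product), remove the
    component of g_forget that opposes g_retain, as in PCGrad — but
    asymmetrically: g_retain is never modified.
    """
    dot = np.dot(g_forget, g_retain)
    if dot < 0:  # the forgetting update would hurt retention
        g_forget = g_forget - (dot / np.dot(g_retain, g_retain)) * g_retain
    return g_retain + forget_weight * g_forget
```

For example, with g_retain = [1, 0] and g_forget = [-1, 1], the conflicting component (-1 along the retention axis) is stripped, leaving only the harmless [0, 1] part of the forgetting update. Symmetric PCGrad would instead project both gradients, allowing the forgetting task to erode retention.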

How Big Is the Difference in Numbers?

On the WMDP (Weapons of Mass Destruction Proxy) Bio benchmark — the standard test measuring how much a model has “forgotten” dangerous biological knowledge — SAGO achieves the following:

| Method | MMLU score | Forgetting |
| --- | --- | --- |
| Baseline (after standard unlearning) | 44.6% | — |
| PCGrad (previous SOTA) | 94.0% | comparable |
| SAGO (new result) | 96.0% | comparable |

MMLU (Massive Multitask Language Understanding) is the primary benchmark for general language intelligence. Dropping from the ~75% pre-trained level to 44.6% after standard unlearning means the model lost a large portion of its useful knowledge. SAGO brings the score back up to 96%, practically eliminating this loss, while the targeted WMDP Bio content remains forgotten.

Why Is This Significant for Model Safety?

Unlearning has become a key component of responsible AI deployment — regulators (EU AI Act, GDPR) and users increasingly expect model operators to remove specific knowledge on demand. If the method destroys general competence, operators are left with a binary choice: keep the model as-is, or fully retrain it from scratch.

SAGO proves that it is possible to have both — precise forgetting and preserved knowledge — using existing methods available to anyone who already has access to a trained model.

Peer Review Status

The paper has been accepted at ACL 2026 (Annual Meeting of the Association for Computational Linguistics), one of the top NLP conferences. This means it has passed peer review — a significant signal of the quality and reliability of the results. The authors (a seven-member team, led by Xiao) have not released code with the preprint, though ACL venues typically encourage a code release alongside publication.

🤖

This article was generated using artificial intelligence from primary sources.