This paper reframes machine unlearning in LLMs as an asymmetric two-task problem in which retention is prioritized over forgetting. The authors introduce a retention-prioritized gradient synthesis framework that decouples gradient extraction from conflict resolution, using PCGrad and a novel method called SAGO to resolve gradient conflicts. Experiments on the WMDP Bio/Cyber and RWKU benchmarks show that SAGO pushes the Pareto frontier, achieving tighter alignment with the retain gradient and mitigating the unlearning-retention trade-off.
Forget about re-balancing losses – gradient geometry is the key to unlearning in LLMs without sacrificing retention.
Machine unlearning for large language models (LLMs) aims to remove targeted knowledge while preserving general capability. In this paper, we recast LLM unlearning as an asymmetric two-task problem: retention is the primary objective and forgetting is an auxiliary one. From this perspective, we propose a retention-prioritized gradient synthesis framework that decouples task-specific gradient extraction from conflict-aware combination. Instantiating the framework, we adapt the established PCGrad method to resolve gradient conflicts, and introduce SAGO, a novel retention-prioritized gradient synthesis method. Theoretically, both variants ensure non-negative cosine similarity with the retain gradient, while SAGO achieves strictly tighter alignment through constructive sign-constrained synthesis. Empirically, on the WMDP Bio/Cyber and RWKU benchmarks, SAGO consistently pushes the Pareto frontier: e.g., on WMDP Bio (SimNPO+GD), recovery of target-model MMLU performance rises from 44.6% (naive) to 94.0% (+PCGrad) and further to 96.0% (+SAGO), while maintaining comparable forgetting strength. Our results show that re-shaping gradient geometry, rather than re-balancing losses, is the key to mitigating unlearning-retention trade-offs.
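To make the retention-prioritized idea concrete, here is a minimal sketch of a PCGrad-style combination step: when the forget gradient conflicts with the retain gradient (negative dot product), its conflicting component is projected out before summing, so the combined update never opposes retention. The function name and numpy formulation are illustrative assumptions; the paper's SAGO variant (sign-constrained synthesis) is not reproduced here.

```python
import numpy as np

def retention_prioritized_combine(g_retain: np.ndarray, g_forget: np.ndarray) -> np.ndarray:
    """Combine retain/forget gradients so the result never opposes retention.

    Hypothetical sketch of a PCGrad-style step: if g_forget conflicts with
    g_retain (negative dot product), remove its component along g_retain
    before summing. This guarantees cos(g, g_retain) >= 0.
    """
    dot = np.dot(g_forget, g_retain)
    if dot < 0.0:
        # Project out the conflicting component of the forget gradient.
        g_forget = g_forget - (dot / np.dot(g_retain, g_retain)) * g_retain
    return g_retain + g_forget

# Example: a forget gradient that directly opposes retention.
g_r = np.array([1.0, 0.0])
g_f = np.array([-1.0, 1.0])
g = retention_prioritized_combine(g_r, g_f)  # conflicting part removed
```

In this toy example the conflicting component of `g_f` along `g_r` is stripped, and the combined gradient retains a non-negative inner product with the retain direction, matching the non-negative cosine-similarity guarantee the abstract describes.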