This paper reframes machine unlearning in LLMs as an asymmetric two-task problem in which retention is prioritized over forgetting. The authors introduce a retention-prioritized gradient synthesis framework that decouples gradient extraction from conflict resolution, using PCGrad and a novel method called SAGO to resolve gradient conflicts. Experiments on the WMDP Bio/Cyber and RWKU benchmarks show that SAGO pushes the Pareto frontier, achieving tighter alignment with the retain gradient and mitigating the unlearning-retention trade-off.
Forget about re-balancing losses – gradient geometry is the key to unlearning in LLMs without sacrificing retention.
Machine unlearning for large language models (LLMs) aims to remove targeted knowledge while preserving general capability. In this paper, we recast LLM unlearning as an asymmetric two-task problem: retention is the primary objective and forgetting is an auxiliary one. From this perspective, we propose a retention-prioritized gradient synthesis framework that decouples task-specific gradient extraction from conflict-aware combination. Instantiating the framework, we adapt the established PCGrad method to resolve gradient conflicts, and introduce SAGO, a novel retention-prioritized gradient synthesis method. Theoretically, both variants ensure non-negative cosine similarity with the retain gradient, while SAGO achieves strictly tighter alignment through constructive sign-constrained synthesis. Empirically, on the WMDP Bio/Cyber and RWKU benchmarks, SAGO consistently pushes the Pareto frontier: e.g., on WMDP Bio (SimNPO+GD), recovery of target-model MMLU performance rises from 44.6% (naive) to 94.0% (+PCGrad) and further to 96.0% (+SAGO), while maintaining comparable forgetting strength. Our results show that re-shaping gradient geometry, rather than re-balancing losses, is the key to mitigating unlearning-retention trade-offs.
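To make the retention-prioritized idea concrete, here is a minimal sketch of a PCGrad-style combination step: when the forget gradient conflicts with the retain gradient (negative dot product), its conflicting component is projected out before summing, so the combined update never opposes retention. The function name and numpy formulation are illustrative assumptions; the paper's SAGO variant (sign-constrained synthesis) is not reproduced here.

```python
import numpy as np

def retention_prioritized_combine(g_retain: np.ndarray, g_forget: np.ndarray) -> np.ndarray:
    """Combine retain/forget gradients so the result never opposes retention.

    Hypothetical sketch of a PCGrad-style step: if g_forget conflicts with
    g_retain (negative dot product), remove its component along g_retain
    before summing. This guarantees cos(g, g_retain) >= 0.
    """
    dot = np.dot(g_forget, g_retain)
    if dot < 0.0:
        # Project out the conflicting component of the forget gradient.
        g_forget = g_forget - (dot / np.dot(g_retain, g_retain)) * g_retain
    return g_retain + g_forget

# Example: a forget gradient that directly opposes retention.
g_r = np.array([1.0, 0.0])
g_f = np.array([-1.0, 1.0])
g = retention_prioritized_combine(g_r, g_f)  # conflicting part removed
```

In this toy example the conflicting component of `g_f` along `g_r` is stripped, and the combined gradient retains a non-negative inner product with the retain direction, matching the non-negative cosine-similarity guarantee the abstract describes.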