This paper introduces MAST (Mechanism-Aligned Selective Targeting), a method for unlearning reasoning induced by Reinforcement Learning with Verifiable Rewards (RLVR) that causes less collateral damage than standard full-parameter updates. MAST ranks attention-projection tensors by off-principal energy, update magnitude, and forget-gradient coupling magnitude, then updates only the top-ranked subset. On the primary model it induces statistically significant forgetting on the MATH forget set (45/150 to 37/150) while preserving GSM8K (+0.8 pp) and MATH retain (-0.5 pp) accuracy. The advantage reproduces across seeds, NPO/SimNPO objectives, and Qwen3.
MAST achieves targeted forgetting of RLVR-induced reasoning with little collateral damage: it forgets the targeted MATH problems while preserving GSM8K and MATH retain accuracy.
We propose MAST (Mechanism-Aligned Selective Targeting), a mechanism-guided method for unlearning RLVR-induced reasoning with substantially lower collateral damage than standard full-parameter updates. In matched SFT/RLVR checkpoints on Qwen2.5-Math-1.5B and Qwen3-1.7B-Base, the SFT-to-RLVR increment differs sharply from the SFT update in token-level delta-log-probability, and full-parameter gradient ascent forgets only at the cost of damaging MATH retain and GSM8K performance. MAST ranks attention-projection tensors by off-principal energy, update magnitude, and forget-gradient coupling magnitude, then updates only the top-ranked subset. On the primary model, MAST induces statistically significant target forgetting (MATH forget 45/150 to 37/150; McNemar p=0.0078) while preserving GSM8K (+0.8 pp) and MATH retain (-0.5 pp). The advantage reproduces across seeds, NPO/SimNPO objectives, and Qwen3, where MAST preserves GSM8K while full-parameter unlearning collapses it.
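As a sanity check on the reported significance level, the sketch below computes an exact two-sided McNemar p-value from discordant pair counts. The abstract gives only the marginal counts (45/150 to 37/150) and does not say which McNemar variant was used. The split used here, 8 problems flipping from solved to unsolved and 0 flipping the other way, is an assumption chosen to match that net change, and the function name `mcnemar_exact` is illustrative.

```python
from math import comb


def mcnemar_exact(b: int, c: int) -> float:
    """Exact two-sided McNemar p-value from discordant pair counts.

    b: items correct before unlearning and incorrect after.
    c: items incorrect before unlearning and correct after.
    Under the null hypothesis each discordant pair is equally likely to
    fall in either direction, so min(b, c) follows Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, so no evidence of a change
    k = min(b, c)
    # One-sided tail probability P(X <= k), doubled for a two-sided test.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


# ASSUMPTION: 8 forget-set problems flip correct -> incorrect, 0 flip back.
# The abstract reports only the net change (45 -> 37 out of 150).
p_value = mcnemar_exact(8, 0)
print(round(p_value, 4))  # 0.0078
```

Under this assumed split the p-value is 2 × 0.5^8 ≈ 0.0078, which agrees with the reported figure. A split with the same net change but some recoveries, such as 9 and 1, would give roughly 0.021 instead.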