Stanford HAI, Independent Researcher, Indian Institute of Information Technology, UCSC

May 28, 2026

arXiv:2605.30514

MAAT: Multi-phase Adapter-Aware Targeted Unlearning

Suryash Yagnik, Shubham Gaur, Saksham Thakur, Vinija Jain, Aman Chadha, Amitava Das

AI Summary

The paper identifies a critical evaluation gap in machine unlearning benchmarks: they are heavily skewed towards fact-based questions and contain very few causal reasoning ("Why") questions. To address this, the authors introduce 5WBENCH, a balanced benchmark with equal representation across Who, What, When, Where, and Why question types. Using this benchmark, the authors demonstrate the limitations of existing unlearning methods and propose MAAT, a three-phase adapter-based unlearning framework that they report achieves both high forgetting and high retention on Why-type causal knowledge.

Key Contribution

Current unlearning methods can ace the test but still flunk causal reasoning, and this paper introduces a benchmark and method to fix that.

Abstract

Machine unlearning evaluation is structurally skewed: Why-type questions, which probe causal and relational knowledge, comprise less than 0.06% of CounterFact, 0.6% of ZSRE, and less than 1.3% of TOFU, MUSE, and WMDP-Cyber. This near-zero representation means that methods that fail on causal knowledge can score highly in aggregate, and this failure is undetectable without balanced evaluation. We present 5WBENCH, a balanced 5,000-sample benchmark with 1,000 examples per 5W category (Who, What, When, Where, Why), making causal unlearning failures quantifiable for the first time. Using 5WBENCH, we show that no existing baseline simultaneously achieves high forgetting and high retention on Why-type questions: aggressive forgetting degrades retained knowledge, while conservative methods fail to forget causal facts. Why-type difficulty stems from multi-hop reasoning chains (44% of Why entries vs. less than or equal to 2% for others) and gradient dilution over 40.1-token answer spans. We present MAAT (Multi-phase Adapter-Aware Targeted Unlearning), a three-phase framework operating on LoRA adapter weights, combining gradient-projected ascent, SVD rank-dimension pruning, task vector negation, and hybrid KL-hidden-state retain repair. MAAT is the first method to simultaneously achieve high forgetting and high retention on Why-type causal knowledge, reaching a new operating point on the forget-retain Pareto frontier. We make our code publicly available.
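The masking effect described in the abstract can be illustrated with a small arithmetic sketch. Everything below is a hypothetical illustration, not the paper's evaluation code: the per-type scores describe an imaginary method that forgets every non-Why fact and no Why fact, the 0.06% Why share is the upper bound quoted for CounterFact, and the even split of the remaining share across the other four types is an assumption made only to keep the example simple.

```python
def aggregate_score(per_type_score, type_share):
    """Weighted mean of per-type scores, weighted by each type's share of the benchmark."""
    total = sum(type_share.values())
    return sum(per_type_score[t] * type_share[t] for t in type_share) / total


# Hypothetical method: perfect forgetting on factual types, none on Why.
per_type_forget = {"Who": 1.0, "What": 1.0, "When": 1.0, "Where": 1.0, "Why": 0.0}

# Skewed mix: Why at 0.06% (the upper bound quoted for CounterFact).
# ASSUMPTION: the remaining 99.94% is split evenly across the other types.
skewed_share = {"Who": 0.24985, "What": 0.24985, "When": 0.24985,
                "Where": 0.24985, "Why": 0.0006}

# Balanced mix: 1,000 examples per type, as in 5WBENCH.
balanced_share = {"Who": 1000, "What": 1000, "When": 1000,
                  "Where": 1000, "Why": 1000}

skewed_score = aggregate_score(per_type_forget, skewed_share)
balanced_score = aggregate_score(per_type_forget, balanced_share)

print(f"{skewed_score:.4f}")    # 0.9994
print(f"{balanced_score:.4f}")  # 0.8000
```

Under these assumptions, the complete failure on Why-type questions costs only 0.06 points of aggregate score on the skewed mix but 20 points on the balanced one, which is why per-type balance makes the failure visible.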

Eval Frameworks & Benchmarks, Natural Language Processing

Citation Metrics

Citations: 0

Influential citations: 0

References: 41

Year: 2026

Venue: N/A
