AIRI; ISP RAS Research Center for Trusted AI; Sber AI Lab; Skoltech. Feb 23, 2026. arXiv:2602.19612

Anatomy of Unlearning: The Dual Impact of Fact Salience and Model Fine-Tuning

Borisiuk Anna, Andrey Savchenko, Alexander Panchecko, Elena Tutubalina

AI Summary

This paper introduces DUAL, a benchmark dataset of 28.6k Wikidata triplets annotated with fact popularity metrics (Wikipedia link counts and LLM-based salience scores) to investigate machine unlearning in LLMs. The study reveals that unlearning performance differs significantly depending on whether the knowledge originates from pretraining or supervised fine-tuning (SFT). The key finding is that SFT-based unlearning achieves smoother forgetting, more stable tuning, and higher retention compared to direct unlearning on pretrained models, which is prone to instability and catastrophic forgetting.

Key Contribution

Unlearning is much easier on supervised fine-tuned models than on pretrained ones, with direct unlearning on pretrained models often leading to catastrophic forgetting.

Abstract

Machine Unlearning (MU) enables Large Language Models (LLMs) to remove unsafe or outdated information. However, existing work assumes that all facts are equally forgettable and largely ignores whether the forgotten knowledge originates from pretraining or supervised fine-tuning (SFT). In this paper, we introduce DUAL (Dual Unlearning Evaluation across Training Stages), a benchmark of 28.6k Wikidata-derived triplets annotated with fact popularity using Wikipedia link counts and LLM-based salience scores. Our experiments show that pretrained and SFT models respond differently to unlearning. An SFT step on the forget data yields smoother forgetting, more stable tuning, and 10-50% higher retention, while direct unlearning on pretrained models remains unstable and prone to relearning or catastrophic forgetting.
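As a rough illustration of what a popularity-annotated triplet might look like, the sketch below models one record and splits a set of facts by link count. The field names, the threshold, the salience scale, and the example values are all hypothetical; the abstract does not specify DUAL's actual schema or how the benchmark partitions facts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triplet:
    """One Wikidata-style fact with popularity annotations (hypothetical schema)."""
    subject: str
    relation: str
    obj: str
    link_count: int   # Wikipedia link count, used here as a popularity proxy
    salience: float   # LLM-based salience score; the 0-1 scale is an assumption


def split_by_popularity(triplets, threshold):
    """Partition triplets into (popular, rare) by comparing link_count to a threshold."""
    popular = [t for t in triplets if t.link_count >= threshold]
    rare = [t for t in triplets if t.link_count < threshold]
    return popular, rare


# Illustrative values only; these are not taken from the benchmark.
facts = [
    Triplet("France", "capital", "Paris", link_count=5000, salience=0.9),
    Triplet("Example Village", "located in", "Example Region", link_count=3, salience=0.1),
]
popular, rare = split_by_popularity(facts, threshold=100)
```

A split like this would let forget sets be built separately from well-known and obscure facts, which is one way to test the assumption that all facts are equally forgettable.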

Data Curation & Synthetic Data, Eval Frameworks & Benchmarks, Natural Language Processing

Citation Metrics

Citations: 0

Influential citations: 0

References: 22

Year: 2026

Venue: N/A
