This paper introduces a statistical framework for efficient machine unlearning that applies to generic loss functions, with a focus on squared loss. The authors develop Unlearning Least Squares (ULS) and prove that it is minimax optimal for estimating the model parameter of the remaining data when only the pre-trained estimator, the forget samples, and a small subsample of the remaining data are available. Their analysis decomposes the estimation error into an oracle term and an unlearning cost, and they provide asymptotically valid inference procedures that do not require full retraining.
Unlearning data doesn't have to mean retraining from scratch: for squared loss, this method attains minimax-optimal estimation using only the pre-trained estimator, the forget samples, and a small subsample of the remaining data.
There is a growing demand for efficient data removal to comply with regulations like the GDPR and to mitigate the influence of biased or corrupted data. This has motivated the field of machine unlearning, which aims to eliminate the influence of specific data subsets without the cost of full retraining. In this work, we propose a statistical framework for machine unlearning with generic loss functions and establish theoretical guarantees. For squared loss in particular, we develop Unlearning Least Squares (ULS) and establish its minimax optimality for estimating the model parameter of the remaining data when only the pre-trained estimator, the forget samples, and a small subsample of the remaining data are available. Our results reveal that the estimation error decomposes into an oracle term and an unlearning cost determined by the forget proportion and the forget model bias. We further establish asymptotically valid inference procedures without requiring full retraining. Numerical experiments and real-data applications demonstrate that the proposed method achieves performance close to retraining while requiring substantially less data access.
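To make the squared-loss setting concrete, here is a minimal sketch of how a pre-trained least-squares fit can be corrected using the forget samples. It is not the paper's ULS estimator, whose definition is not given in the abstract. It relies only on the standard normal-equation identity for ordinary least squares: if `G_r` is the Gram matrix of the remaining design, the retrained solution equals `beta_full - G_r^{-1} X_f^T (y_f - X_f beta_full)`. Estimating `G_r` from a small subsample of the remaining data is an assumption made here for illustration, as are the function name `downdate_ols`, the simulated data, and all sizes.

```python
import numpy as np


def downdate_ols(beta_full, X_forget, y_forget, gram_remaining):
    """Remove the forget samples from an OLS fit via the normal equations.

    gram_remaining is X_r^T X_r (or an estimate of it) for the remaining data.
    """
    # Residuals of the forget samples under the pre-trained estimator.
    residual = y_forget - X_forget @ beta_full
    # Solve a linear system instead of forming an explicit inverse.
    return beta_full - np.linalg.solve(gram_remaining, X_forget.T @ residual)


# Simulated data (illustrative only): forget samples follow a biased model.
rng = np.random.default_rng(0)
n_r, n_f, d = 2000, 200, 5
beta_remaining_true = rng.normal(size=d)
beta_forget_true = beta_remaining_true + 1.0

X_r = rng.normal(size=(n_r, d))
X_f = rng.normal(size=(n_f, d))
y_r = X_r @ beta_remaining_true + 0.1 * rng.normal(size=n_r)
y_f = X_f @ beta_forget_true + 0.1 * rng.normal(size=n_f)

# Pre-trained estimator on all data, and the retraining baseline.
X_all = np.vstack([X_r, X_f])
y_all = np.concatenate([y_r, y_f])
beta_full = np.linalg.lstsq(X_all, y_all, rcond=None)[0]
beta_retrain = np.linalg.lstsq(X_r, y_r, rcond=None)[0]

# With the exact remaining Gram matrix, the correction reproduces retraining.
beta_exact = downdate_ols(beta_full, X_f, y_f, X_r.T @ X_r)

# With only a subsample of the remaining data, estimate the Gram matrix
# by rescaling the subsample Gram matrix to the full remaining size.
m = 400
idx = rng.choice(n_r, size=m, replace=False)
gram_estimate = n_r * (X_r[idx].T @ X_r[idx]) / m
beta_approx = downdate_ols(beta_full, X_f, y_f, gram_estimate)

print("pre-trained vs retrain:", np.linalg.norm(beta_full - beta_retrain))
print("subsample correction vs retrain:", np.linalg.norm(beta_approx - beta_retrain))
```

In this sketch the subsample is used only to approximate the Gram matrix; the responses of the remaining data never enter the correction, which is what keeps the data access small.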