Search papers, labs, and topics across Lattice.
This paper addresses the statistical cost of machine unlearning in the context of smooth strongly convex stochastic optimization by establishing tight upper and lower bounds on the excess population risk associated with approximate $\varepsilon$-unlearning. The authors find that the optimal unlearning rate is a combination of the standard statistical error and an unlearning penalty that varies with the dimension of the model, revealing a significant accuracy advantage for their unlearning algorithm when $\varepsilon$ is much larger than the model dimension. Their results indicate that while retraining from scratch is optimal for small $\varepsilon$, their method provides exponential improvements for larger $\varepsilon$, thus offering a practical solution for efficient unlearning in machine learning applications.
Unlearning can be exponentially more efficient than retraining from scratch when the unlearning requirement is sufficiently large, fundamentally changing the landscape of data removal in machine learning.
Machine unlearning is motivated by legal and user-facing requirements to remove the influence of individuals' data from trained models, such as the right to be forgotten. Prior work has developed algorithms and error bounds for unlearning in smooth strongly convex stochastic optimization, but the fundamental statistical cost of unlearning has remained unclear. We nearly resolve this problem by proving upper and lower bounds on the excess population risk of approximate $\varepsilon$-unlearning; our bounds are tight up to a condition-number factor. For mean estimation over the unit ball, our upper and lower bounds match. The optimal rate is the usual statistical error plus an unlearning penalty that interpolates between the retraining-from-scratch rate and an exponentially smaller term as $\varepsilon/d$ grows, where $d$ is the dimension of the model. In particular, when $\varepsilon \gg d$, our $\varepsilon$-unlearning algorithm offers an exponential accuracy improvement over retraining the model from scratch and differentially private baselines. On the other hand, when $\varepsilon \le d$, retraining from scratch is optimal.