May 6, 2026

arXiv:2605.04713

Not Every Subject Should Stay: Machine Unlearning for Noisy Engagement Recognition

AI Summary

This paper investigates subject-level machine unlearning as a post-hoc method for sanitizing engagement recognition models trained on datasets with noisy, subject-indexed supervision. The authors rank potentially harmful subjects using a model-dependent proxy, apply an approximate unlearning update, and compare the resulting model against an oracle retrained from scratch without those subjects. In representative K=3 forget-set settings on EngageNet and DAiSEE, the unlearned model recovers 89.3% and 92.5% of the oracle's performance gain, respectively, at roughly one quarter of the retraining cost.

Key Contribution

Quickly sanitize your engagement recognition models after training: subject-level unlearning recovers ~90% of retraining benefits at 25% of the cost.

Abstract

Engagement recognition datasets are typically subject-indexed and often contain noisy, subjective supervision, making post-hoc dataset revision a practical problem. Existing noisy-label and data-cleaning methods largely operate at the sample level before or during training, but do not directly address a different question: once a model has already been trained, can the influence of an entire problematic subject be removed without full retraining? We study this setting through subject-level machine unlearning as a post-hoc sanitization mechanism for engagement recognition. Starting from a baseline trained on all subjects, we rank candidate harmful subjects using a model-dependent proxy, apply a lightweight approximate unlearning update, and compare the result against an oracle model retrained from scratch on the retained subjects only. We instantiate this protocol on DAiSEE and EngageNet using Tensor-Convolution and Convolution-Transformer Network (TCCT-Net) as a fixed platform and evaluate three matched model states under the same removal scenario: baseline, unlearned, and oracle. In representative K=3 forget-set settings, the unlearned model recovers 89.3% and 92.5% of the oracle gain on EngageNet and DAiSEE, respectively, at roughly one quarter of retraining cost. Across the tested small-audit regimes, effectiveness is strongest at an intermediate forget-set size, indicating that approximate subject-level unlearning is a useful low-cost correction mechanism, but one whose benefit depends on subject selection quality and removal regime.
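The two quantitative steps in the abstract, choosing a K-subject forget set from a ranking and reporting how much of the oracle gain the unlearned model recovers, can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the function names, the per-subject `harm_scores` dictionary, and all numeric values are hypothetical, and the recovery ratio (unlearned minus baseline, divided by oracle minus baseline) is one natural reading of "recovers X% of the oracle gain" that the abstract does not spell out.

```python
from typing import Dict, List


def select_forget_set(harm_scores: Dict[str, float], k: int) -> List[str]:
    """Return the k subject IDs with the highest proxy harm score.

    harm_scores is a hypothetical stand-in for the model-dependent proxy;
    the abstract does not say how the proxy is computed.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    ranked = sorted(harm_scores, key=lambda s: harm_scores[s], reverse=True)
    return ranked[:k]


def oracle_gain_recovery(baseline: float, unlearned: float, oracle: float) -> float:
    """Fraction of the oracle's gain over the baseline attained by the unlearned model.

    Assumed definition: (unlearned - baseline) / (oracle - baseline).
    """
    oracle_gain = oracle - baseline
    if oracle_gain == 0:
        raise ValueError("oracle and baseline are equal; recovery is undefined")
    return (unlearned - baseline) / oracle_gain


# Hypothetical values for illustration only; none are taken from the paper.
scores = {"s01": 0.2, "s02": 0.9, "s03": 0.5, "s04": 0.7, "s05": 0.1}
forget_set = select_forget_set(scores, k=3)
recovery = oracle_gain_recovery(baseline=0.60, unlearned=0.69, oracle=0.70)
print(forget_set, f"{recovery:.1%}")
```

Under this reading, a recovery of 1.0 means the unlearned model matches the retrained oracle, and a value near 0 means the approximate update left the model close to the baseline.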

Data Curation & Synthetic Data

Training Efficiency & Optimization

Citation Metrics

Citations: 0

Influential citations: 0

References: 35

Year: 2026

Venue: N/A
