This paper establishes a theoretical framework for out-of-distribution (OOD) detection in dynamic environments, built around a reinforcement learning (RL)-guided optimizer that favors updates reducing the semantic-OOD false positive rate over time. The authors introduce an augmented optimizer that adds an RL-guided correction term to standard gradient descent (GD) and show that it improves both future-domain generalization and semantic-OOD rejection. They decompose the temporal error into model-change and environment-change generalization errors, which gives a common basis for comparing the generalization errors of GD and RL-guided optimizers.
An RL-guided correction term added to standard gradient descent can improve OOD detection in evolving environments by reducing semantic-OOD false positives over time, compared with gradient descent alone.
Out-of-distribution (OOD) detection in dynamic open-world environments requires a model to continually adapt to evolving data distributions while generalizing to covariate-shifted inputs and rejecting semantic-shifted OOD examples. Most existing OOD detection methods optimize only the current-step objective and do not explicitly account for how post-deployment environment changes affect future OOD behavior. In this paper, we establish a theoretical grounding for dynamic OOD detection using a reinforcement learning (RL)-guided optimizer that explicitly favors updates that reduce the semantic-OOD false positive rate over time. We develop a novel augmented optimizer that adds an RL-guided correction term to standard gradient descent (GD) and show that it improves both future-domain generalization and semantic-OOD rejection. We decompose the temporal error into model-change and environment-change generalization errors and develop a new theoretical framework for comparing the generalization errors under the GD and RL-guided optimizers.
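The abstract does not state the update rule, so the following is only a minimal sketch of what a GD step with an RL-guided correction term could look like, not the authors' method. It assumes a linear in-distribution score, a fixed acceptance threshold, and a reward equal to the negative semantic-OOD false positive rate (the fraction of semantic-OOD samples accepted as in-distribution). The correction is estimated with a perturbation-based score-function estimator, chosen here because the false positive rate is piecewise constant in the parameters. All names (`false_positive_rate`, `rl_correction`, `augmented_step`, `beta`, `sigma`) are hypothetical.

```python
import random

def false_positive_rate(ood_scores, threshold):
    """Fraction of semantic-OOD samples accepted as in-distribution (assumed definition)."""
    if not ood_scores:
        return 0.0
    return sum(1 for s in ood_scores if s >= threshold) / len(ood_scores)

def score(theta, x):
    """Assumed linear in-distribution score: higher means 'more in-distribution'."""
    return sum(t * xi for t, xi in zip(theta, x))

def rl_correction(theta, ood_inputs, threshold, sigma=0.1, n_samples=32, rng=None):
    """Perturbation-based estimate of a direction that raises the reward -FPR.

    FPR is piecewise constant in theta, so its ordinary gradient is zero
    almost everywhere; sampling perturbed parameters sidesteps that.
    """
    rng = rng or random.Random(0)
    noises, rewards = [], []
    for _ in range(n_samples):
        eps = [rng.gauss(0.0, sigma) for _ in theta]
        perturbed = [t + e for t, e in zip(theta, eps)]
        scores = [score(perturbed, x) for x in ood_inputs]
        noises.append(eps)
        rewards.append(-false_positive_rate(scores, threshold))
    baseline = sum(rewards) / n_samples  # mean reward as a variance-reduction baseline
    return [
        sum((r - baseline) * eps[i] for r, eps in zip(rewards, noises))
        / (n_samples * sigma ** 2)
        for i in range(len(theta))
    ]

def augmented_step(theta, grad, ood_inputs, threshold, lr=0.1, beta=0.05, rng=None):
    """One update: plain GD step plus beta times the RL-guided correction."""
    if beta:
        corr = rl_correction(theta, ood_inputs, threshold, rng=rng)
    else:
        corr = [0.0] * len(theta)  # beta = 0 recovers standard gradient descent
    return [t - lr * g + beta * c for t, g, c in zip(theta, grad, corr)]
```

Setting `beta=0` reduces `augmented_step` to an ordinary GD step, which makes it easy to compare the two optimizers on the same sequence of environments under these assumptions.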