Search papers, labs, and topics across Lattice.
This paper tackles the NP-hard penalized Least Trimmed Squares (LTS) regression problem, which is crucial for robust statistical estimation in the presence of outliers. They introduce a novel mixed-integer optimization (MIO) formulation that incorporates hyperplane arrangement logic within a perspective reformulation, leading to a branch-and-bound tree of polynomial size (for fixed feature count). Empirically, their tailored branch-and-bound algorithm, leveraging first-order methods and dual bounds, achieves a 1% optimality gap in 1 minute on synthetic datasets with 5000 samples and 20 features, outperforming existing MIO approaches.
Exact robust regression at scale is now possible: a new algorithm solves the NP-hard Least Trimmed Squares problem orders of magnitude faster than existing methods.
We study computational aspects of a key problem in robust statistics -- the penalized least trimmed squares (LTS) regression problem, a robust estimator that mitigates the influence of outliers in data by capping residuals with large magnitudes. Although statistically attractive, penalized LTS is NP-hard, and existing mixed-integer optimization (MIO) formulations scale poorly due to weak relaxations and exponential worst-case complexity in the number of observations. We propose a new MIO formulation that embeds hyperplane arrangement logic into a perspective reformulation, explicitly enforcing structural properties of optimal solutions. We show that, if the number of features is fixed, the resulting branch-and-bound tree is of polynomial size in the sample size. Moreover, we develop a tailored branch-and-bound algorithm that uses first-order methods with dual bounds to solve node relaxations efficiently. Computational experiments on synthetic and real datasets demonstrate substantial improvements over existing MIO approaches: on synthetic instances with 5000 samples and 20 features, our tailored solver reaches a 1% gap in 1 minute while competing approaches fail to do so within one hour. These gains enable exact robust regression at significantly larger sample sizes in low-dimensional settings.