This paper introduces a gradient-enhanced surrogate loss for online estimation of high-dimensional generalized linear models. It addresses a limitation of previous renewable estimation approaches by removing their constraint on the number of batches. The method improves accuracy in the non-distributed setting and extends to distributed streaming data, where sites exchange only gradient summaries and clients do not need to compute the full surrogate loss. The authors derive non-asymptotic error bounds under high-dimensional scaling, and simulations under linear and logistic models show improved accuracy over existing renewable estimators.
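To make "approximating the cumulative loss using only historical summaries" concrete, here is a minimal sketch under stated assumptions. It is not the authors' estimator. It assumes one plausible reading of a gradient-enhanced surrogate: the loss of all past batches is replaced by a second-order expansion around the previous estimate that keeps the first-order (gradient) term. That term is generally nonzero at an L1-penalized estimate. The class `OnlineSurrogateLasso` and its `partial_fit` method are hypothetical names. The penalty scaling `n_total * lam` and the proximal-gradient (ISTA) solver are illustrative choices.

```python
import numpy as np


def soft_threshold(z, t):
    """Proximal operator of t * ||z||_1 (elementwise shrinkage)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def batch_grad(X, y, beta, family):
    """Gradient of the summed negative log-likelihood of one batch."""
    eta = X @ beta
    if family == "gaussian":           # loss 0.5 * ||y - X beta||^2
        return X.T @ (eta - y)
    mu = 1.0 / (1.0 + np.exp(-eta))    # logistic loss, y in {0, 1}
    return X.T @ (mu - y)


def batch_hess(X, beta, family):
    """Hessian of the same batch loss."""
    if family == "gaussian":
        return X.T @ X
    mu = 1.0 / (1.0 + np.exp(-(X @ beta)))
    return X.T @ (X * (mu * (1.0 - mu))[:, None])


class OnlineSurrogateLasso:
    """Hypothetical helper: L1-penalized GLM updated one batch at a time.

    Past batches enter only through a quadratic expansion around the previous
    estimate that keeps the gradient term (constant dropped):
        q(beta) = grad'(beta - beta_prev)
                  + 0.5 (beta - beta_prev)' hess (beta - beta_prev)
    so the raw rows of a batch are not needed after partial_fit returns.
    """

    def __init__(self, p, lam, family="gaussian", n_iter=5000):
        if family not in ("gaussian", "logistic"):
            raise ValueError(f"unknown family: {family}")
        self.beta = np.zeros(p)
        self.grad = np.zeros(p)        # gradient of past loss at self.beta
        self.hess = np.zeros((p, p))   # accumulated Hessian of past loss
        self.n_seen = 0
        self.lam, self.family, self.n_iter = lam, family, n_iter

    def partial_fit(self, X, y):
        beta_prev = self.beta.copy()
        n_total = self.n_seen + X.shape[0]
        # 1/L step size: L bounds the curvature of q plus the new batch loss.
        curv = 1.0 if self.family == "gaussian" else 0.25
        L = np.linalg.eigvalsh(self.hess + curv * X.T @ X)[-1]
        beta = beta_prev.copy()
        for _ in range(self.n_iter):   # proximal gradient (ISTA)
            g = (self.grad + self.hess @ (beta - beta_prev)
                 + batch_grad(X, y, beta, self.family))
            beta = soft_threshold(beta - g / L, n_total * self.lam / L)
        # Refresh the summaries at the new estimate; X and y can now be dropped.
        self.grad = (self.grad + self.hess @ (beta - beta_prev)
                     + batch_grad(X, y, beta, self.family))
        self.hess = self.hess + batch_hess(X, beta, self.family)
        self.beta, self.n_seen = beta, n_total
        return self
```

For the squared-error loss the expansion is exact. Feeding the batches one at a time therefore reproduces a single lasso fit on all the data, up to solver tolerance. For the logistic loss the expansion is only an approximation, and the stored Hessian sums each batch's Hessian at the estimate available when that batch arrived.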
Eliminating batch-number constraints in online estimation could significantly enhance the accuracy and efficiency of high-dimensional model training in streaming data environments.
We study online estimation for high-dimensional generalized linear models with streaming data. First, for the non-distributed setting, we propose a gradient-enhanced surrogate loss that approximates the cumulative loss using only historical summaries. This surrogate modifies and improves upon the existing renewable estimation approach for the same model in the high-dimensional setting, and it removes the batch-number constraint imposed in previous studies. We then extend the method to distributed streaming data under the master-client architecture, where batches are partitioned across sites and only summaries (gradient vectors) are exchanged. Instead of directly applying the popular method of Jordan et al. (2019) to the surrogate quadratic loss, we adjust it so that the clients do not need to compute the full surrogate loss. We derive non-asymptotic error bounds under high-dimensional scaling, without the stringent constraint on the number of batches imposed in previous studies. Simulation results under linear and logistic models, together with a real-data application, show improved accuracy over existing renewable estimators.
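The distributed extension builds on the communication-efficient surrogate likelihood (CSL) of Jordan et al. (2019). CSL replaces the global loss with the master's local loss plus a linear correction built from the gap between the local and global gradients at the current estimate. The sketch below shows one hypothetical way to combine that idea with a master-held historical summary for a squared-error lasso. It is an assumed arrangement for illustration, not the authors' adjusted method. The names `local_grad` and `distributed_batch_update` are hypothetical. Only the master touches the historical summary and its own rows, while clients only return gradient vectors of their share of the new batch. Because no Hessians are exchanged, the stored historical Hessian is approximated by the master's rescaled local Hessian; this is also an assumption.

```python
import numpy as np


def soft_threshold(z, t):
    """Proximal operator of t * ||z||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def local_grad(X, y, beta):
    """Gradient of 0.5 * ||y - X beta||^2 on one site's rows (a p-vector)."""
    return X.T @ (X @ beta - y)


def distributed_batch_update(sites, hist, lam, n_rounds=20, n_iter=500):
    """Hypothetical sketch: absorb one batch that is split across sites.

    sites[0] is the master's (X, y); the other sites are clients and only ever
    contribute gradient vectors. hist = (beta_prev, grad_prev, hess_prev,
    n_seen) is the historical summary kept by the master.
    """
    beta_prev, grad_prev, hess_prev, n_seen = hist
    X0, y0 = sites[0]
    n_batch = sum(X.shape[0] for X, _ in sites)
    scale = n_batch / X0.shape[0]        # master rows stand in for the batch
    n_total = n_seen + n_batch
    H_s = hess_prev + scale * X0.T @ X0  # curvature of the master's surrogate
    L = np.linalg.eigvalsh(H_s)[-1]
    beta_bar = beta_prev.copy()
    for _ in range(n_rounds):
        # Communication round: every site reports its gradient at beta_bar.
        g_global = sum(local_grad(X, y, beta_bar) for X, y in sites)
        # CSL-style linear correction (local minus global gradient).
        shift = scale * local_grad(X0, y0, beta_bar) - g_global
        beta = beta_bar.copy()
        for _ in range(n_iter):  # ISTA on the master's surrogate problem
            g = (grad_prev + hess_prev @ (beta - beta_prev)
                 + scale * local_grad(X0, y0, beta) - shift)
            beta = soft_threshold(beta - g / L, n_total * lam / L)
        beta_bar = beta
    # One more gradient round refreshes the historical gradient summary.
    g_global = sum(local_grad(X, y, beta_bar) for X, y in sites)
    grad_new = grad_prev + hess_prev @ (beta_bar - beta_prev) + g_global
    hess_new = hess_prev + scale * X0.T @ X0  # approximate batch Hessian
    return beta_bar, (beta_bar, grad_new, hess_new, n_total)
```

At a fixed point of the communication rounds, the correction makes the surrogate's gradient match the gradient of the historical summary plus the exact global batch loss. The repeated rounds trade extra gradient exchanges for accuracy when the master's rows are only a rough proxy for the whole batch.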