ShanghaiTech | Apr 15, 2026 | arXiv:2604.14017

Stochastic Trust-Region Methods for Over-parameterized Models

AI Summary

This paper introduces a stochastic trust-region framework for training over-parameterized models, aiming to reduce the sensitivity to step-size selection common in stochastic optimization methods such as SGD. The authors develop a first-order stochastic trust-region algorithm for unconstrained optimization that achieves $O(\varepsilon^{-2} \log(1/\varepsilon))$ iteration and oracle complexity under the strong growth condition. For equality-constrained problems, they propose a quadratic-penalty-based stochastic trust-region method with $O(\varepsilon^{-4} \log(1/\varepsilon))$ complexity for reaching an $\varepsilon$-stationary point of the penalized problem, and report stable optimization and effective constraint handling in deep learning experiments.
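
For orientation, the two ingredients named in the summary are commonly written as follows. This is a sketch in standard textbook notation, not the paper's own statement: the symbols $f(x;\xi)$, $\rho$, $c(x)$, $J(x)$, and $P_\mu$ are assumptions introduced here, and the paper's constants and scaling of $\mu$ may differ.

$$
\text{Strong growth condition:}\qquad \mathbb{E}_{\xi}\big[\|\nabla f(x;\xi)\|^2\big] \le \rho\,\|\nabla f(x)\|^2 \quad \text{for all } x .
$$

$$
\text{Quadratic penalty for } \min_x f(x) \ \text{s.t.}\ c(x)=0:\qquad P_\mu(x) = f(x) + \frac{\mu}{2}\,\|c(x)\|^2, \qquad \nabla P_\mu(x) = \nabla f(x) + \mu\, J(x)^{\top} c(x).
$$

Under this convention, a point with $\|\nabla P_\mu(x)\| \le \varepsilon$ satisfies the stationarity part of the KKT conditions up to $\varepsilon$ with the multiplier estimate $\lambda = \mu\, c(x)$; feasibility then depends on how small $\|c(x)\|$ is, which is where the choice of $\mu$ enters.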

Key Contribution

Stochastic trust-region methods train deep networks stably and handle hard equality constraints, without manual learning-rate tuning.

Abstract

Under interpolation-type assumptions such as the strong growth condition, stochastic optimization methods can attain convergence rates comparable to full-batch methods, but their performance, particularly for SGD, remains highly sensitive to step-size selection. To address this issue, we propose a unified stochastic trust-region framework that eliminates manual step-size tuning and extends naturally to equality-constrained problems. For unconstrained optimization, we develop a first-order stochastic trust-region algorithm and show that, under the strong growth condition, it achieves an iteration and stochastic first-order oracle complexity of $O(\varepsilon^{-2} \log(1/\varepsilon))$ for finding an $\varepsilon$-stationary point. For equality-constrained problems, we introduce a quadratic-penalty-based stochastic trust-region method with penalty parameter $\mu$, and establish an iteration and oracle complexity of $O(\varepsilon^{-4} \log(1/\varepsilon))$ to reach an $\varepsilon$-stationary point of the penalized problem, corresponding to an $O(\varepsilon)$-approximate KKT point of the original constrained problem. Numerical experiments on deep neural network training and orthogonally constrained subspace fitting demonstrate that the proposed methods achieve performance comparable to well-tuned stochastic baselines, while exhibiting stable optimization behavior and effectively handling hard constraints without manual learning-rate scheduling.
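
To make the idea of a first-order stochastic trust-region step concrete, here is a minimal sketch in plain Python. It is a generic textbook-style scheme, not the algorithm analyzed in the paper: the toy problem (a consistent least-squares system with more parameters than samples, so interpolation holds), the function names, and the parameters `eta`, `gamma`, `radius`, and `batch_size` are all assumptions chosen for illustration. Each iteration takes a step of length `radius` along the negative mini-batch gradient, compares the actual decrease on that mini-batch with the decrease predicted by the linear model, and grows or shrinks the radius accordingly.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def make_problem(n_samples=5, dim=20, seed=0):
    """Toy consistent least-squares problem with dim > n_samples, so a point
    that fits every sample exactly (interpolation) exists."""
    rng = random.Random(seed)
    A = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_samples)]
    x_star = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    b = [dot(row, x_star) for row in A]
    return A, b, x_star


def batch_loss_and_grad(A, b, x, idx):
    """Mean of 0.5 * (a_i . x - b_i)^2 over samples in idx, and its gradient."""
    loss = 0.0
    grad = [0.0] * len(x)
    for i in idx:
        r = dot(A[i], x) - b[i]
        loss += 0.5 * r * r
        for j, a_ij in enumerate(A[i]):
            grad[j] += r * a_ij
    m = len(idx)
    return loss / m, [g / m for g in grad]


def stochastic_trust_region(A, b, x0, radius=1.0, eta=0.1, gamma=2.0,
                            batch_size=2, iters=300, seed=1):
    """Illustrative first-order stochastic trust-region loop (assumed scheme)."""
    rng = random.Random(seed)
    x = list(x0)
    for _ in range(iters):
        idx = rng.sample(range(len(b)), batch_size)
        loss, grad = batch_loss_and_grad(A, b, x, idx)
        gnorm = math.sqrt(dot(grad, grad))
        predicted = radius * gnorm  # decrease predicted by the linear model
        if predicted == 0.0:
            continue  # this mini-batch is already fit exactly
        # Minimizer of the linear model over the ball ||s|| <= radius.
        trial = [xj - radius * gj / gnorm for xj, gj in zip(x, grad)]
        trial_loss, _ = batch_loss_and_grad(A, b, trial, idx)
        ratio = (loss - trial_loss) / predicted
        if ratio >= eta:
            x, radius = trial, radius * gamma  # accept step, expand region
        else:
            radius /= gamma                    # reject step, shrink region
    return x, radius


if __name__ == "__main__":
    A, b, x_star = make_problem()
    x0 = [0.0] * len(x_star)
    x, radius = stochastic_trust_region(A, b, x0)
    full = list(range(len(b)))
    print("initial loss:", batch_loss_and_grad(A, b, x0, full)[0])
    print("final loss:  ", batch_loss_and_grad(A, b, x, full)[0])
    print("final radius:", radius)
```

The only step-size-like quantity here is the radius, and it is adjusted by the accept/reject test rather than by a hand-tuned schedule, which is the behavior the abstract attributes to the trust-region framework. Under the same assumptions, a quadratic-penalty variant would apply the same loop to the penalized objective instead of the plain loss.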

Training Efficiency & Optimization

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
