Search papers, labs, and topics across Lattice.
This paper analyzes the "curse of unrolling" in algorithm unrolling, where derivative iterates initially diverge from the true Jacobian despite asymptotic correctness. The authors provide a non-asymptotic analysis identifying algorithmic factors governing this divergence and demonstrate that truncating early iterations of the derivative computation mitigates the curse and reduces memory. They further show that warm-starting in bilevel optimization implicitly truncates the unrolling, providing a practical solution.
Algorithm unrolling's "curse" – initial Jacobian divergence – can be tamed by simply truncating early derivative iterations or using warm-starting in bilevel optimization.
Algorithm unrolling is ubiquitous in machine learning, particularly in hyperparameter optimization and meta-learning, where Jacobians of solution mappings are computed by differentiating through iterative algorithms. Although unrolling is known to yield asymptotically correct Jacobians under suitable conditions, recent work has shown that the derivative iterates may initially diverge from the true Jacobian, a phenomenon known as the curse of unrolling. In this work, we provide a non-asymptotic analysis that explains the origin of this behavior and identifies the algorithmic factors that govern it. We show that truncating early iterations of the derivative computation mitigates the curse while simultaneously reducing memory requirements. Finally, we demonstrate that warm-starting in bilevel optimization naturally induces an implicit form of truncation, providing a practical remedy. Our theoretical findings are supported by numerical experiments on representative examples.