Feb 18, 2026 · arXiv:2602.16456

Beyond SGD, Without SVD: Proximal Subspace Iteration LoRA with Diagonal Fractional K-FAC

Abdulla Jasem Almansoori, Maria Ivanova, Andrey Veprikov, Aleksandr Beznosikov, Samuel Horváth, Martin Takáč

AI Summary

The paper introduces LoRSum, a memory-efficient method for LoRA fine-tuning that bridges the gap between full-step training with low-rank projections (SVDLoRA) and standard LoRA. LoRSum casts LoRA optimization as a proximal sub-problem and solves it with alternating least squares updates, which the authors prove to be an implicit block power method; the same subroutine can also update a low-rank momentum. The authors further propose a scaled variant of LoRSum that uses diagonal approximations of structured metrics such as K-FAC and Shampoo, and report performance comparable to LoRA baselines while maintaining parameter efficiency and avoiding full-matrix SVD.
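To make the "ALS is an implicit block power method" claim concrete, here is a minimal pure-Python sketch, not the authors' implementation. It assumes the proximal sub-problem is the plain Frobenius-norm projection of the full step T = BA - lr·G onto rank r (the matrix SVDLoRA would project with a truncated SVD). The names als_proximal_step, matmul, transpose and inverse are hypothetical, and T is materialized for readability, which a memory-efficient implementation would avoid.

```python
def matmul(X, Y):
    """Dense product of two list-of-lists matrices."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def transpose(X):
    return [list(col) for col in zip(*X)]


def inverse(S):
    """Gauss-Jordan inverse of a small square (r x r) matrix with partial pivoting."""
    n = len(S)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(S)]
    for c in range(n):
        pivot_row = max(range(c, n), key=lambda row_idx: abs(aug[row_idx][c]))
        aug[c], aug[pivot_row] = aug[pivot_row], aug[c]
        piv = aug[c][c]
        aug[c] = [v / piv for v in aug[c]]
        for row_idx in range(n):
            if row_idx != c:
                f = aug[row_idx][c]
                aug[row_idx] = [a - f * b for a, b in zip(aug[row_idx], aug[c])]
    return [row[n:] for row in aug]


def als_proximal_step(B, A, G, lr, iters=5):
    """Hypothetical sketch: approximate the best rank-r factorization of
    T = B A - lr * G by alternating least squares instead of an SVD.

    B: m x r, A: r x n, G: m x n gradient with respect to the full weight.
    Assumes B^T B and A A^T stay invertible (generic case).
    """
    BA = matmul(B, A)
    T = [[BA[i][j] - lr * G[i][j] for j in range(len(G[0]))] for i in range(len(G))]
    for _ in range(iters):
        # Fix B and solve min_A ||B A - T||_F:  A = (B^T B)^{-1} B^T T
        Bt = transpose(B)
        A = matmul(inverse(matmul(Bt, B)), matmul(Bt, T))
        # Fix A and solve min_B ||B A - T||_F:  B = T A^T (A A^T)^{-1}
        At = transpose(A)
        B = matmul(matmul(T, At), inverse(matmul(A, At)))
    return B, A


# Usage: the full step lands on a rank-1 target, which ALS recovers exactly.
B, A = als_proximal_step([[1.0], [0.0], [0.0]], [[1.0, 2.0]],
                         [[0.0, 0.0], [-2.0, -4.0], [-3.0, -6.0]], lr=1.0)
print(matmul(B, A))  # approximately [[1, 2], [2, 4], [3, 6]]
```

Eliminating A between the two half-steps gives B_new = T T^T B (B^T B)^{-1} (A A^T)^{-1}. The trailing r x r factor does not change the column span, so span(B) evolves exactly as block power iteration on T T^T and, given a gap between the r-th and (r+1)-th singular values of T, converges to its top-r left singular subspace without ever forming an SVD.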

Key Contribution

LoRA fine-tuning gets a memory-efficient upgrade: LoRSum matches or improves on standard LoRA baselines, at modest compute overhead, by reformulating optimization as a proximal problem and using diagonal K-FAC-style approximations, all without full-matrix SVD.

Abstract

Low-Rank Adaptation (LoRA) fine-tunes large models by learning low-rank updates on top of frozen weights, dramatically reducing trainable parameters and memory. In this work, we address the gap between training with full steps followed by low-rank projections (SVDLoRA) and LoRA fine-tuning. We propose LoRSum, a memory-efficient subroutine that closes this gap for gradient descent by casting LoRA optimization as a proximal sub-problem and solving it efficiently with alternating least squares updates, which we prove to be an implicit block power method. We recover several recently proposed preconditioning methods for LoRA as special cases, and show that LoRSum can also be used to update a low-rank momentum. To address full steps with preconditioned gradient descent, we propose a scaled variant of LoRSum that uses structured metrics such as K-FAC and Shampoo, and we show that storing only the diagonal of these metrics still allows them to perform well while remaining memory-efficient. Experiments on a synthetic task, CIFAR-100, and language-model fine-tuning on GLUE, SQuAD v2, and WikiText-103 show that our method can match or improve LoRA baselines with modest compute overhead, while avoiding full-matrix SVD projections and retaining LoRA-style parameter efficiency.
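The diagonal-metric idea can be illustrated on a single linear layer y = W x. K-FAC approximates the layer's curvature with two Kronecker factors, E[x x^T] over inputs and E[δ δ^T] over back-propagated output gradients; keeping only their diagonals cuts storage from O(m² + n²) to O(m + n) and turns preconditioning into elementwise scaling. This is a sketch under stated assumptions, not the paper's scaled LoRSum: it preconditions a plain weight gradient rather than solving the scaled proximal sub-problem, and the exponent p is an assumed interpretation of "Fractional" in the title (p = 1 gives diagonal K-FAC; p = 1/4 mirrors Shampoo's exponent for a matrix). All function names are hypothetical.

```python
def diag_kfac_factors(inputs, out_grads):
    """Diagonals of the two K-FAC factors, estimated from one batch.

    inputs:    list of layer inputs x, each of length n (in_features)
    out_grads: list of output gradients delta, each of length m (out_features)
    Returns (g_diag, a_diag) = (diag E[delta delta^T], diag E[x x^T]).
    """
    b = len(inputs)
    a_diag = [sum(x[j] ** 2 for x in inputs) / b for j in range(len(inputs[0]))]
    g_diag = [sum(d[i] ** 2 for d in out_grads) / b for i in range(len(out_grads[0]))]
    return g_diag, a_diag


def layer_grad(inputs, out_grads):
    """Batch-mean weight gradient of y = W x: mean_b delta_b x_b^T, shape m x n."""
    b = len(inputs)
    return [[sum(d[i] * x[j] for x, d in zip(inputs, out_grads)) / b
             for j in range(len(inputs[0]))] for i in range(len(out_grads[0]))]


def precondition(grad, g_diag, a_diag, p=0.5, eps=1e-8):
    """Hypothetical fractional diagonal K-FAC step direction:
    diag(G)^{-p} grad diag(A)^{-p}, computed elementwise.
    p=0.5 is an arbitrary illustrative default, not a value from the paper.
    """
    return [[grad[i][j] / ((g_diag[i] + eps) ** p * (a_diag[j] + eps) ** p)
             for j in range(len(grad[0]))] for i in range(len(grad))]


# Usage on a toy batch of two samples (n = 2 inputs, m = 1 output).
xs = [[2.0, 0.0], [0.0, 1.0]]
deltas = [[1.0], [3.0]]
g_diag, a_diag = diag_kfac_factors(xs, deltas)
print(precondition(layer_grad(xs, deltas), g_diag, a_diag, p=1.0, eps=0.0))
```

Only the vectors g_diag and a_diag need to persist between steps, which is what makes the diagonal variant compatible with LoRA-style memory budgets.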

Architecture Design (Transformers, SSMs, MoE) · Open-Source Models & Weights · Training Efficiency & Optimization

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
