Feb 19, 2026 · arXiv:2602.17510

LORA-CRAFT: Cross-layer Rank Adaptation via Frozen Tucker Decomposition of Pre-trained Attention Weights

Kasun Dewage, Marianna Pensky, Suranadi De Silva, Shankadeep Mondal

AI Summary

The paper introduces CRAFT, a parameter-efficient fine-tuning method that applies Tucker decomposition via HOSVD to pre-trained attention weights stacked across transformer layers and freezes the resulting factors. CRAFT bridges the gap between methods that decompose gradient updates and those that apply SVD to pre-trained weights per layer, offering a cross-layer approach. Experiments on GLUE with RoBERTa-base and RoBERTa-large show performance competitive with existing PEFT methods while using only 41K adaptation parameters, a count that is independent of model size at fixed Tucker ranks.

Key Contribution

Forget scaling LoRA matrices with model size: at fixed Tucker ranks, CRAFT achieves competitive fine-tuning performance with only 41K adaptation parameters, regardless of model depth or dimension.

Abstract

We introduce CRAFT (Cross-layer Rank Adaptation via Frozen Tucker), a parameter-efficient fine-tuning (PEFT) method that applies Tucker tensor decomposition to pre-trained attention weight matrices stacked across transformer layers and trains only small square adaptation matrices on the resulting frozen Tucker factors. Existing tensor-based PEFT methods decompose gradient updates: LoTR applies Tucker decomposition with shared factor matrices, while SuperLoRA groups and reshapes $\Delta W$ across layers before applying Tucker decomposition. Separately, methods like PiSSA apply SVD to pre-trained weights but operate independently per layer. CRAFT bridges these two lines of work: it performs full Tucker decomposition via Higher-Order SVD (HOSVD) directly on pre-trained weights organized as cross-layer 3D tensors, freezes all resulting factors, and adapts the model through lightweight trainable transformations applied to each factor matrix. Experiments on the GLUE benchmark using RoBERTa-base and RoBERTa-large demonstrate that CRAFT achieves performance competitive with existing methods while requiring only 41K Tucker adaptation parameters, a count that is independent of model dimension and depth at fixed Tucker ranks.
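The pipeline described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. Everything specific in it is an assumption made for illustration: the function names (`unfold`, `mode_dot`, `hosvd`, `reconstruct`, `adapter_param_count`) are hypothetical, the adapters are assumed to right-multiply each frozen factor, identity initialization is assumed, and the training loop and any handling of the truncation residual are omitted.

```python
import numpy as np

def unfold(tensor, mode):
    """Mode-n unfolding: bring `mode` to the front and flatten the other axes."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def mode_dot(tensor, matrix, mode):
    """Mode-n product: contract axis `mode` of `tensor` with the columns of `matrix`."""
    moved = np.moveaxis(tensor, mode, 0)
    out = np.tensordot(matrix, moved, axes=(1, 0))
    return np.moveaxis(out, 0, mode)

def hosvd(tensor, ranks):
    """Truncated HOSVD: one factor per mode from the SVD of that mode's unfolding."""
    factors = []
    for mode, rank in enumerate(ranks):
        u, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        factors.append(u[:, :rank])  # leading left singular vectors
    core = tensor
    for mode, u in enumerate(factors):
        core = mode_dot(core, u.T, mode)  # project onto each factor's subspace
    return core, factors

def reconstruct(core, factors, adapters=None):
    """Rebuild the weight tensor; adapters (assumed placement) right-multiply each factor."""
    out = core
    for mode, u in enumerate(factors):
        factor = u if adapters is None else u @ adapters[mode]
        out = mode_dot(out, factor, mode)
    return out

def adapter_param_count(ranks):
    """Trainable parameters for one tensor: one square r x r matrix per mode."""
    return sum(r * r for r in ranks)

# Toy example: 4 "layers" of 16 x 16 weights stacked into a 3D tensor.
rng = np.random.default_rng(0)
stacked = rng.standard_normal((4, 16, 16))
ranks = (4, 8, 8)                         # hypothetical Tucker ranks
core, factors = hosvd(stacked, ranks)     # core and factors stay frozen
adapters = [np.eye(r) for r in ranks]     # the only trainable tensors (assumed identity init)
adapted = reconstruct(core, factors, adapters)
```

Under these assumptions the adapted weights start out equal to the frozen Tucker reconstruction, and the trainable count per stacked tensor is the sum of the squared ranks. That count depends only on the chosen ranks, not on the hidden dimension or the number of layers, which is the property the abstract highlights.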

Architecture Design (Transformers, SSMs, MoE) · Natural Language Processing · Training Efficiency & Optimization

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
