This paper introduces Task-aware Low-Rank Adaptation (TLoRA), a framework for fine-tuning large language models that uses a data-driven approach to set both the LoRA initialization and the allocation of ranks and scaling factors across layers. TLoRA applies singular value decomposition to align the LoRA $A$ matrix with task-relevant subspaces, then freezes $A$ and trains only the $B$ matrix. The reported experiments cover natural language understanding and reasoning tasks, where TLoRA performs well while using significantly fewer trainable parameters than traditional LoRA methods.
TLoRA reports strong performance across multiple tasks while reducing the number of trainable parameters needed to fine-tune large language models.
Low-Rank Adaptation (LoRA) has become a widely adopted parameter-efficient fine-tuning method for large language models. Its effectiveness depends largely on how ranks and scaling factors are allocated and on how the adapter is initialized. Existing LoRA variants typically address only one of these factors, often at the cost of increased training complexity or reduced practical efficiency. In this work, we present Task-aware Low-Rank Adaptation (TLoRA), a unified framework that jointly optimizes initialization and resource allocation at the outset of training. TLoRA introduces a data-driven initialization strategy that aligns the LoRA $A$ matrix with task-relevant subspaces by performing singular value decomposition on the product of the pre-trained weights and the input activation covariance. The $A$ matrix is then frozen, and only the $B$ matrix is trained. Furthermore, TLoRA employs a sensitivity-based importance metric to adaptively allocate ranks and scaling factors across layers under a fixed parameter budget. Extensive experiments show that TLoRA performs consistently well across natural language understanding, commonsense reasoning, math reasoning, code generation, and chat generation, while significantly reducing the number of trainable parameters.
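The two steps described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation, and several details are assumptions the abstract does not specify: the covariance is estimated as an uncentered second moment of sampled activations, $A$ is taken from the top right singular vectors of the weight-covariance product, $B$ starts at zero, and ranks are split in proportion to given importance scores using largest-remainder rounding. The function names `task_aware_init` and `allocate_ranks` are hypothetical, and the scoring metric itself is left as an input because the abstract does not define it.

```python
import numpy as np


def task_aware_init(W, X, rank):
    """Return (A, B) for a low-rank update of the form W + B @ A.

    W: pre-trained weight, shape (d_out, d_in)
    X: sampled input activations, shape (n_samples, d_in)
    """
    # Assumption: uncentered second-moment estimate of the activation covariance.
    cov = X.T @ X / X.shape[0]                      # (d_in, d_in)
    # SVD of the weight-covariance product; rows of Vt are input-space directions.
    _, _, Vt = np.linalg.svd(W @ cov, full_matrices=False)
    A = Vt[:rank].copy()                            # (rank, d_in), kept frozen
    # Assumption: B starts at zero so the adapted layer initially equals W.
    B = np.zeros((W.shape[0], rank))                # (d_out, rank), trainable
    return A, B


def allocate_ranks(scores, total_rank, min_rank=1):
    """Split a total rank budget across layers in proportion to their scores."""
    scores = np.asarray(scores, dtype=float)
    spare = total_rank - min_rank * len(scores)
    if spare < 0:
        raise ValueError("total_rank is too small for the requested min_rank")
    shares = scores / scores.sum() * spare
    ranks = np.floor(shares).astype(int)
    # Largest-remainder rounding: hand leftover units to the biggest fractions.
    leftover = int(spare - ranks.sum())
    order = np.argsort(-(shares - ranks), kind="stable")
    ranks[order[:leftover]] += 1
    return ranks + min_rank


# Toy usage with random data standing in for real weights and activations.
rng = np.random.default_rng(0)
W = rng.normal(size=(6, 8))
X = rng.normal(size=(32, 8))
A, B = task_aware_init(W, X, rank=2)
layer_ranks = allocate_ranks([3.0, 1.0, 1.0, 1.0], total_rank=16)
```

Because only $B$ is trained, each adapted layer contributes `d_out * rank` trainable parameters in this sketch, rather than the `(d_out + d_in) * rank` of a LoRA setup that trains both matrices. Treating the budget as a total rank is a simplification that holds when all adapted layers share the same output dimension.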