Ant Group, Cornell, Soochow, University of Liverpool | Mar 10, 2026 | arXiv:2603.09865

GAST: Gradient-aligned Sparse Tuning of Large Language Models with Data-layer Selection

Kai Yao, Zhenghan Song, Kaixin Wu, Mingjie Zhong, Danzhao Cheng, Zhaorui Tan, Yixin Ji, Penglei Gao

AI Summary

The paper introduces Gradient-aligned Sparse Tuning (GAST), a parameter-efficient fine-tuning (PEFT) method that performs data selection and layer selection jointly, using gradient alignment as the selection signal. Rather than restricting itself to either layer selection or data selection, GAST adaptively selects the most impactful data points for each layer. Experiments show that GAST outperforms existing PEFT methods, demonstrating its effectiveness in reducing redundancy and improving performance.

Key Contribution

Instead of separately sifting through layers or datasets for PEFT, GAST co-optimizes both, adaptively picking the most impactful data for each layer based on gradient alignment.

Abstract

Parameter-Efficient Fine-Tuning (PEFT) has become a key strategy for adapting large language models, and recent advances in sparse tuning reduce overhead by selectively updating key parameters or training on selected subsets of data. Existing approaches generally follow one of two paradigms: layer-selective methods, which fine-tune only critical layers to minimize computational load, and data-selective methods, which select effective training subsets to improve training. However, current methods typically overlook the fact that different data points contribute to different model layers to varying degrees, and they often discard potentially valuable information from data perceived to be of low quality. To address these limitations, we propose Gradient-aligned Sparse Tuning (GAST), a method that performs selective fine-tuning along both the data and layer dimensions as integral components of a unified optimization strategy. GAST targets information redundancy through a layer-sparse strategy that adaptively selects the most impactful data points for each layer, offering a more comprehensive solution than approaches restricted to a single dimension. Experiments demonstrate that GAST consistently outperforms baseline methods, establishing a promising direction for future research on PEFT strategies.
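The abstract does not spell out the exact alignment criterion, so the following is only a minimal sketch of the general idea of per-layer data selection by gradient alignment, under stated assumptions. It assumes that per-sample gradients are already available for each layer. It also assumes that a sample's "impact" on a layer is the cosine similarity between its gradient and that layer's mean gradient, and that every layer keeps a fixed budget of `k` samples. The function names (`select_data_per_layer`, `masked_layer_update`), the mean-gradient reference, and the fixed `k` are hypothetical illustrations, not GAST's published algorithm.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def mean_vector(vectors: Sequence[Sequence[float]]) -> Vector:
    """Element-wise mean of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def select_data_per_layer(grads: Dict[str, List[Vector]], k: int) -> Dict[str, List[int]]:
    """Hypothetical per-layer data selection by gradient alignment.

    grads[layer][i] is the flattened gradient of sample i with respect to that
    layer's trainable parameters. For each layer, keep the k samples whose
    gradients have the highest cosine similarity to the layer's mean gradient
    (an assumed criterion). Each layer is scored independently, so different
    layers may keep different samples.
    """
    selection: Dict[str, List[int]] = {}
    for layer, per_sample in grads.items():
        reference = mean_vector(per_sample)
        scores = [cosine(g, reference) for g in per_sample]
        ranked = sorted(range(len(per_sample)), key=lambda i: scores[i], reverse=True)
        selection[layer] = sorted(ranked[:k])
    return selection


def masked_layer_update(grads: Dict[str, List[Vector]], selection: Dict[str, List[int]]) -> Dict[str, Vector]:
    """Per-layer update direction: mean gradient over that layer's selected samples only."""
    return {
        layer: mean_vector([grads[layer][i] for i in indices])
        for layer, indices in selection.items()
    }


if __name__ == "__main__":
    # Toy gradients: 3 samples, 2 hypothetical layers, 2 parameters per layer.
    grads = {
        "layer.0": [[1.0, 0.0], [0.9, 0.1], [-1.0, 0.2]],
        "layer.1": [[0.0, 1.0], [1.0, 0.0], [0.1, 0.9]],
    }
    selection = select_data_per_layer(grads, k=2)
    print(selection)  # {'layer.0': [0, 1], 'layer.1': [0, 2]}
    print(masked_layer_update(grads, selection))
```

In the toy run, sample 2 is excluded from `layer.0` but still contributes to `layer.1`, and sample 1 is excluded from `layer.1` but kept for `layer.0`. No sample is discarded outright. This loosely mirrors the abstract's point that data which looks unhelpful overall may still be valuable for particular layers. A real implementation would compute per-sample gradients with a framework's autodiff tools and would likely use a different scoring and budgeting scheme.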

Data Curation & Synthetic Data; Natural Language Processing; Training Efficiency & Optimization