The paper introduces Group Fine-Tuning (GFT), a post-training framework for LLMs that targets two limitations of supervised fine-tuning (SFT): an extremely sparse implicit reward and unstable inverse-probability weighting. GFT combines Group Advantage Learning, which builds diverse response groups and derives normalized contrastive supervision from them, with Dynamic Coefficient Rectification, which adaptively bounds inverse-probability weights to stabilize optimization. Experiments show that GFT outperforms SFT-based methods and integrates more smoothly with subsequent RL training.
Group Fine-Tuning (GFT) addresses SFT's reward sparsity and unstable inverse-probability weighting, yielding policies that surpass SFT-based methods and integrate more smoothly with subsequent RL training.
Large language models are typically post-trained using supervised fine-tuning (SFT) and reinforcement learning (RL), yet effectively unifying efficient knowledge injection with robust generalization remains challenging. In this work, we provide a training-dynamics analysis showing that SFT can be interpreted as a special case of policy gradient optimization with an extremely sparse implicit reward and unstable inverse-probability weighting, which together lead to single-path dependency, entropy collapse, and gradient explosion. Motivated by this diagnosis, we propose Group Fine-Tuning (GFT), a unified post-training framework that addresses these intrinsic limitations through two mechanisms: Group Advantage Learning, which constructs diverse response groups and derives normalized contrastive supervision to alleviate reward sparsity, and Dynamic Coefficient Rectification, which adaptively bounds inverse-probability weights to stabilize optimization while preserving efficient knowledge injection. Experiments demonstrate that GFT consistently surpasses SFT-based methods and yields policies that integrate more smoothly with subsequent RL training.
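The abstract names the two mechanisms but does not give their formulas. The following is a minimal sketch of what they might look like, not the paper's definitions. It assumes that "normalized contrastive supervision" resembles mean and standard-deviation normalization of rewards within a response group, and that "bounding inverse-probability weights" resembles a hard cap on 1/p. The function names, the `eps` term, and the `max_weight` value are all hypothetical.

```python
import math
from statistics import mean, pstdev


def group_advantages(rewards, eps=1e-8):
    """Normalize rewards within one response group.

    Assumption: zero-mean, unit-scale normalization stands in for the
    paper's "normalized contrastive supervision". Responses above the
    group mean get positive advantages, those below get negative ones.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def bounded_inverse_prob_weight(logprob, max_weight=10.0):
    """Return the inverse-probability weight 1/p, capped at max_weight.

    Assumption: a fixed cap stands in for the paper's adaptive
    "Dynamic Coefficient Rectification". The comparison is done in log
    space so that very small probabilities cannot overflow exp().
    """
    if -logprob >= math.log(max_weight):
        return max_weight
    return math.exp(-logprob)


# Hypothetical group of four sampled responses with scalar rewards.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_advantages(rewards)

# Hypothetical sequence log-probabilities: one likely, one very unlikely.
weights = [bounded_inverse_prob_weight(lp) for lp in (math.log(0.5), -1000.0)]
```

Without the cap, the second weight would be exp(1000), which illustrates how an unbounded 1/p coefficient on a low-probability target can blow up the gradient; the cap trades some of that signal for stability.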