Mar 10, 2026 · arXiv:2603.09527

Efficiently Aligning Draft Models via Parameter- and Data-Efficient Adaptation

Luxi Lin, Zhihang Lin, Zhanpeng Zeng, Yuhao Chen, Qingyu Zhang, Jixiang Luo, Xuelong Li, Rongrong Ji

AI Summary

The paper introduces Efficient Draft Adaptation (EDA), a parameter- and data-efficient framework to restore speculative decoding performance when target LLMs are fine-tuned for specific domains. EDA uses a decoupled architecture with shared and private components, adapting only the lightweight private component, and regenerates training data using the fine-tuned target model. A sample selection mechanism prioritizes high-value data, leading to improved average acceptance lengths with reduced training costs compared to full retraining of draft models.
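The summary does not say how the shared and private components are combined. The sketch below shows what "adapting only the private component" means in practice. It assumes, hypothetically, that the private component is a LoRA-style low-rank term whose logits are added to those of a frozen shared projection. Only `a` and `b` receive gradient updates, while `shared` stays fixed. The class, its method names, and the additive low-rank design are illustrative assumptions, not the structure of the EDA code release.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Matrix-vector product on nested lists."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def softmax(z: Vector) -> Vector:
    top = max(z)  # subtract the max for numerical stability
    e = [math.exp(x - top) for x in z]
    total = sum(e)
    return [x / total for x in e]


class DecoupledDraftHead:
    """Hypothetical draft head: logits = shared @ h + a @ (b @ h).

    `shared` (vocab x hidden) is frozen during adaptation; `a` (vocab x rank)
    and `b` (rank x hidden) form the lightweight private component.
    """

    def __init__(self, hidden: int, vocab: int, rank: int, seed: int = 0):
        rng = random.Random(seed)
        self.shared = [[rng.gauss(0, 0.5) for _ in range(hidden)] for _ in range(vocab)]
        self.a = [[0.0] * rank for _ in range(vocab)]  # zero init: starts equal to the shared head
        self.b = [[rng.gauss(0, 0.5) for _ in range(hidden)] for _ in range(rank)]

    def trainable_params(self) -> int:
        return len(self.a) * len(self.a[0]) + len(self.b) * len(self.b[0])

    def probs(self, h: Vector) -> Vector:
        private = matvec(self.a, matvec(self.b, h))
        return softmax([s + p for s, p in zip(matvec(self.shared, h), private)])

    def adapt_step(self, h: Vector, target: int, lr: float = 0.1) -> float:
        """One SGD step on cross-entropy toward `target`; `shared` is never modified."""
        bh = matvec(self.b, h)
        q = self.probs(h)
        loss = -math.log(q[target])
        g = [qv - (1.0 if v == target else 0.0) for v, qv in enumerate(q)]  # dLoss/dlogits
        # The gradient w.r.t. b flows through the current a, so compute it before updating a.
        g_bh = [sum(g[v] * self.a[v][k] for v in range(len(g))) for k in range(len(bh))]
        for v, gv in enumerate(g):
            for k, bk in enumerate(bh):
                self.a[v][k] -= lr * gv * bk
        for k, gk in enumerate(g_bh):
            for j, hj in enumerate(h):
                self.b[k][j] -= lr * gk * hj
        return loss


head = DecoupledDraftHead(hidden=8, vocab=16, rank=2)
frozen = [row[:] for row in head.shared]
h = [0.3] * 8
losses = [head.adapt_step(h, target=5) for _ in range(50)]
print(head.trainable_params(), "trainable vs", 16 * 8, "frozen")  # 48 trainable vs 128 frozen
```

Because `a` is initialised to zero, adaptation starts from the shared head's predictions and only has to learn a target-specific correction. The trainable count (48 in this toy setting) stays a small fraction of the frozen shared weights (128).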

Key Contribution

Don't fully retrain your draft model after fine-tuning your LLM: EDA restores speculative decoding performance with significantly less compute by adapting only a small, private component and regenerating training data.
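The data side of the recipe can be sketched in two steps. First, regenerate each training response with the fine-tuned target. Second, keep only a budgeted subset of samples. The source does not state how EDA scores "high-value" data, so the rule below is a hypothetical stand-in: it keeps the samples whose regenerated tokens the current draft predicts worst, measured by mean negative log-likelihood. `toy_target` and `toy_draft` are assumed placeholders for real model calls.

```python
import heapq
import math
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[str, List[int]]  # (prompt, response token ids)


def regenerate(prompts: Sequence[str],
               target_generate: Callable[[str], List[int]]) -> List[Sample]:
    """Pair each prompt with the fine-tuned target's own response."""
    return [(p, target_generate(p)) for p in prompts]


def mean_nll(token_probs: Sequence[float]) -> float:
    """Average negative log-likelihood of a response under the current draft."""
    return -sum(math.log(p) for p in token_probs) / len(token_probs)


def select(samples: Sequence[Sample],
           draft_token_probs: Callable[[Sample], List[float]],
           budget: int) -> List[Sample]:
    """Keep the `budget` samples with the highest draft NLL (hypothetical criterion)."""
    return heapq.nlargest(budget, samples, key=lambda s: mean_nll(draft_token_probs(s)))


# Toy stand-ins for real model calls.
def toy_target(prompt: str) -> List[int]:
    return [len(prompt)] * 3


def toy_draft(sample: Sample) -> List[float]:
    prompt, response = sample
    p = 0.9 if len(prompt) < 4 else 1.0 / len(prompt)  # confident only on short prompts
    return [p] * len(response)


data = regenerate(["hi", "hello", "hey", "greetings"], toy_target)
picked = select(data, toy_draft, budget=2)
print([prompt for prompt, _ in picked])  # ['greetings', 'hello']
```

Regenerating with the fine-tuned target means the draft is trained on the same distribution it must predict at decode time. Selection then spends the adaptation budget where the draft is currently most misaligned.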

Abstract

Speculative decoding accelerates LLM inference, but its performance degrades when the target model is fine-tuned for a specific domain. A naive solution is to retrain the draft model for every fine-tuned target, which is costly and inefficient. To address this, we introduce Efficient Draft Adaptation (EDA), a parameter- and data-efficient framework for adapting draft models. EDA introduces three innovations: (1) a decoupled architecture that uses shared and private components to model the shared and target-specific output distributions separately, enabling parameter-efficient adaptation by updating only the lightweight private component; (2) a data regeneration strategy that uses the fine-tuned target model to regenerate the training data, improving the alignment between training and speculative decoding and thereby yielding a higher average acceptance length; and (3) a sample selection mechanism that prioritizes high-value data for efficient adaptation. Our experiments show that EDA effectively restores speculative decoding performance on fine-tuned models, achieving higher average acceptance lengths at significantly lower training cost than full retraining. Code is available at https://github.com/Lyn-Lucy/Efficient-Draft-Adaptation.
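Average acceptance length is the metric the abstract reports. This sketch assumes one common convention: the metric is the mean number of tokens emitted per draft-then-verify cycle, meaning the accepted prefix of drafted tokens plus the one token the target model contributes itself. It covers only chain drafting with greedy verification. Tree drafting and sampling-based verification count acceptances differently, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def accepted_prefix(draft: Sequence[int], target: Sequence[int]) -> int:
    """Count leading draft tokens that equal the target's greedy token at that position."""
    n = 0
    for d, t in zip(draft, target):
        if d != t:
            break
        n += 1
    return n


def average_acceptance_length(cycles: List[Tuple[Sequence[int], Sequence[int]]]) -> float:
    """Mean tokens emitted per draft-then-verify cycle.

    Each cycle emits its accepted prefix plus one token from the target
    (the correction at the first mismatch, or a bonus token if all match).
    """
    emitted = sum(accepted_prefix(d, t) + 1 for d, t in cycles)
    return emitted / len(cycles)


cycles = [
    ([1, 2, 3], [1, 2, 3]),  # all 3 accepted -> 4 tokens
    ([5, 6, 7], [5, 9, 7]),  # mismatch at the 2nd token -> 1 accepted, 2 tokens
    ([1, 1, 1], [2, 1, 1]),  # mismatch at the 1st token -> 0 accepted, 1 token
]
tau = average_acceptance_length(cycles)
print(round(tau, 3))  # 2.333
```

Each cycle costs one target forward pass, so a higher acceptance length means fewer target passes per generated token. This is why a draft that drifts away from a fine-tuned target loses most of its speedup.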

Architecture Design (Transformers, SSMs, MoE) · Inference & Quantization · Training Efficiency & Optimization

Citation Metrics

Citations: 0 · Influential citations: 0 · References: 0 · Year: 2026 · Venue: N/A
