Google Research | Feb 24, 2026 | arXiv:2602.21103

Prompt-Level Distillation: A Non-Parametric Alternative to Model Fine-Tuning for Efficient Reasoning

Sanket Badhe, Deep Shah

AI Summary

The paper introduces Prompt-Level Distillation (PLD), a non-parametric method for transferring reasoning abilities from a large teacher model to a smaller student model. Explicit reasoning patterns are extracted from the teacher and structured into instructions placed in the student's system prompt. This approach avoids the computational cost and loss of interpretability associated with fine-tuning. Experiments on StereoSet and Contract-NLI using Gemma-3 4B show that PLD substantially improves Macro F1 (from 57% to 90.0% and from 67% to 83%, respectively), letting the compact model reach performance comparable to larger models with negligible latency overhead.
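To make the mechanism concrete, here is a minimal Python sketch of the last PLD step as summarized above: rendering reasoning patterns as a numbered instruction list for the student's system prompt. The `ReasoningRule` schema, the helper names `deduplicate` and `build_system_prompt`, the deduplication step, the prompt wording, and the example rules are all hypothetical illustrations. The summary does not specify how the teacher's patterns are extracted or phrased, so this is not the authors' implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReasoningRule:
    """One explicit reasoning pattern, e.g. distilled from Teacher rationales (hypothetical schema)."""
    condition: str
    decision: str


def deduplicate(rules: list[ReasoningRule]) -> list[ReasoningRule]:
    """Drop rules whose normalized (condition, decision) pair was already seen, keeping first-seen order."""
    seen: set[tuple[str, str]] = set()
    unique: list[ReasoningRule] = []
    for rule in rules:
        key = (rule.condition.strip().lower(), rule.decision.strip().lower())
        if key not in seen:
            seen.add(key)
            unique.append(rule)
    return unique


def build_system_prompt(task: str, rules: list[ReasoningRule], labels: list[str]) -> str:
    """Render the deduplicated rules as a numbered instruction list for the Student's system prompt."""
    lines = [task, "", "Follow these rules when deciding:"]
    for i, rule in enumerate(deduplicate(rules), start=1):
        lines.append(f"{i}. If {rule.condition}, answer {rule.decision}.")
    lines.append("")
    lines.append("Respond with exactly one label from: " + ", ".join(labels) + ".")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical hand-written rules; in PLD these would come from the Teacher model.
    rules = [
        ReasoningRule("the clause explicitly grants the stated permission", "Entailment"),
        ReasoningRule("the clause explicitly forbids the stated action", "Contradiction"),
        ReasoningRule("The clause explicitly grants the stated permission ", "Entailment"),
        ReasoningRule("the contract never addresses the hypothesis", "NotMentioned"),
    ]
    print(build_system_prompt(
        "Classify whether the contract entails the hypothesis.",
        rules,
        ["Entailment", "Contradiction", "NotMentioned"],
    ))
```

Because the result is plain text, every rule the student follows can be read and audited directly, which is the interpretability property the summary contrasts with fine-tuned weights.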

Key Contribution

Instead of fine-tuning, Prompt-Level Distillation distills a teacher's explicit reasoning patterns into a structured system prompt, letting a small model (Gemma-3 4B) match frontier performance on the evaluated tasks (StereoSet and Contract-NLI).

Abstract

Advanced reasoning typically requires Chain-of-Thought prompting, which is accurate but incurs prohibitive latency and substantial test-time inference costs. The standard alternative, fine-tuning smaller models, often sacrifices interpretability while introducing significant resource and operational overhead. To address these limitations, we introduce Prompt-Level Distillation (PLD). We extract explicit reasoning patterns from a Teacher model and organize them into a structured list of expressive instructions for the Student model's System Prompt. Evaluated on the StereoSet and Contract-NLI datasets using Gemma-3 4B, PLD improved Macro F1 scores from 57% to 90.0% and from 67% to 83%, respectively, enabling this compact model to match frontier performance with negligible latency overhead. Because these expressive instructions make the decision-making process transparent, humans can fully verify the model's logic. This makes the approach ideal for regulated industries such as law, finance, and content moderation, as well as for high-volume use cases and edge devices.
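Macro F1, the metric behind the reported gains, is the unweighted mean of per-class F1 scores, so every label counts equally regardless of how often it occurs. The sketch below computes it in plain Python under that standard definition; it is an illustration, not the authors' evaluation script. The function name `macro_f1` and the convention of scoring an undefined precision or recall as 0 are assumptions of this sketch.

```python
from collections import Counter


def macro_f1(y_true: list[str], y_pred: list[str]) -> float:
    """Unweighted mean of per-class F1 over every label seen in y_true or y_pred."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    labels = sorted(set(y_true) | set(y_pred))
    if not labels:
        raise ValueError("cannot compute Macro F1 on empty input")
    true_pos = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    pred_counts = Counter(y_pred)
    true_counts = Counter(y_true)
    f1_scores = []
    for label in labels:
        # Undefined precision/recall (no predictions or no gold items for the label) count as 0.
        precision = true_pos[label] / pred_counts[label] if pred_counts[label] else 0.0
        recall = true_pos[label] / true_counts[label] if true_counts[label] else 0.0
        denom = precision + recall
        f1_scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


if __name__ == "__main__":
    gold = ["A", "A", "B", "C"]
    pred = ["A", "B", "B", "C"]
    print(round(macro_f1(gold, pred), 4))
```

For the toy labels above, classes A and B each get F1 = 2/3 and class C gets 1, so the macro average is 7/9 (about 0.7778). A frequency-weighted average would instead reward a model for favoring the majority label, which is why Macro F1 is the stricter choice for imbalanced tasks.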

Inference & Quantization | Reasoning & Chain-of-Thought | Training Efficiency & Optimization

Citation Metrics

Citations: 0

Influential citations: 0

References: 38

Year: 2026

Venue: N/A
