This paper introduces TUNEAHEAD, a framework that predicts the fine-tuning performance of large language models before full training begins, addressing the compute cost of fine-tuning and the risk of performance degradation. TUNEAHEAD encodes each candidate run as a meta-feature vector that combines static dataset descriptors with dynamic probe features, then maps that vector to a performance estimate. It outperforms existing methods, achieving an RMSE of 1.47 percentage points and placing 95.1% of predictions within 3 percentage points of the true score on a held-out test set of 370 runs.
TUNEAHEAD predicts fine-tuning performance before training with an RMSE of 1.47 percentage points, potentially saving researchers from costly and ineffective training runs.
Fine-tuning large language models (LLMs) is compute-intensive and error-prone: model performance depends sensitively on data quality and hyperparameter choices, and naïve runs can even degrade model performance. This raises a practical question: can we predict fine-tuning performance before committing to a full training run? We present TUNEAHEAD, a lightweight framework for pre-hoc prediction of fine-tuning performance. TUNEAHEAD encodes each candidate run as a meta-feature vector that combines static dataset descriptors with dynamic probe features from a short standardized probe. A predictor maps these features to performance estimates, while SHAP-based attributions provide interpretable diagnostics that reveal which specific features drive the prediction. Across 1,300+ fine-tuning runs on Qwen2.5-7B-Instruct, TUNEAHEAD consistently outperforms strong baselines such as Early-Stop Extrapolation and ProxyLM. On a held-out test set of 370 runs, TUNEAHEAD achieves an RMSE of 1.47 percentage points and places 95.1% of predictions within ±3 percentage points of the true score. These accurate continuous predictions support practical go/no-go screening policies that can reduce unnecessary full fine-tuning while retaining most promising runs.
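To make the two reported metrics and the go/no-go idea concrete, here is a minimal Python sketch. It is not the authors' code: the function names (`rmse`, `within_tolerance`, `go_no_go`), the screening threshold of 60.0, and the four example scores are all hypothetical and chosen only for illustration. It assumes that predicted and true scores are expressed in percentage points and that a run is screened in when its predicted score meets a fixed threshold.

```python
import math


def rmse(predicted, actual):
    """Root-mean-square error between predicted and true scores (percentage points)."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


def within_tolerance(predicted, actual, tol=3.0):
    """Fraction of predictions whose absolute error is at most `tol` percentage points."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(1 for p, a in zip(predicted, actual) if abs(p - a) <= tol)
    return hits / len(predicted)


def go_no_go(predicted, threshold):
    """Label each candidate run 'go' if its predicted score meets the threshold (assumed policy)."""
    return ["go" if p >= threshold else "no-go" for p in predicted]


# Made-up scores for illustration only; these are not results from the paper.
predicted = [62.0, 55.5, 70.0, 48.0]
actual = [63.0, 59.5, 69.0, 48.0]

print(f"RMSE: {rmse(predicted, actual):.2f} pp")
print(f"Within +/-3 pp: {within_tolerance(predicted, actual):.0%}")
print(go_no_go(predicted, threshold=60.0))
```

In this toy data, the second run is predicted 4 points below its true score, so it falls outside the 3-point band and pulls the RMSE up. A real screening policy would presumably choose the threshold by trading off compute saved against promising runs wrongly rejected.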