HKUST | Jun 16, 2026 | arXiv:2606.17660

TuneAhead: Predicting Fine-tuning Performance Before Full Training Begins

Yuxiang Luo, Haonan Long, Chen Wang, Qiqi Duan, Xiaotian Lin, Yanwei Xu, Yuyu Luo, Weikai Yang, Nan Tang

AI Summary

This paper introduces TUNEAHEAD, a framework that predicts the fine-tuning performance of large language models before full training begins, addressing both the compute cost of fine-tuning and the risk that a run degrades model performance. TUNEAHEAD encodes each candidate run as a meta-feature vector that combines static dataset descriptors with dynamic probe features, and a predictor maps that vector to a performance estimate. The framework outperforms baselines such as Early-Stop Extrapolation and ProxyLM; on a held-out test set of 370 runs it achieves an RMSE of 1.47 percentage points, with 95.1% of predictions falling within ±3 percentage points of the true score.

Key Contribution

TUNEAHEAD predicts fine-tuning performance before full training begins (RMSE of 1.47 percentage points on 370 held-out runs), which could spare researchers costly and ineffective training runs.

Abstract

Fine-tuning large language models (LLMs) is compute-intensive and error-prone: model performance depends sensitively on data quality and hyperparameter choices, and naïve runs can even degrade model performance. This raises a practical question: can we predict fine-tuning performance before committing to a full training run? We present TUNEAHEAD, a lightweight framework for pre-hoc prediction of fine-tuning performance. TUNEAHEAD encodes each candidate run as a meta-feature vector that combines static dataset descriptors with dynamic probe features from a short standardized probe. A predictor maps these features to performance estimates, while SHAP-based attributions provide interpretable diagnostics that reveal which specific features drive the prediction. Across 1,300+ fine-tuning runs on Qwen2.5-7B-Instruct, TUNEAHEAD consistently outperforms strong baselines such as Early-Stop Extrapolation and ProxyLM. On a held-out test set of 370 runs, TUNEAHEAD achieves an RMSE of 1.47 percentage points and places 95.1% of predictions within ±3 percentage points of the true score. These accurate continuous predictions support practical go/no-go screening policies that can reduce unnecessary full fine-tuning while retaining most promising runs.
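
The two reported metrics and the go/no-go idea can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy scores, the baseline value, and the `min_gain` threshold are all hypothetical, and the abstract does not specify the actual screening rule. The sketch assumes scores are expressed in percentage points and that a run is worth launching when its predicted score beats a reference score by some margin.

```python
import math


def rmse(predicted, actual):
    """Root-mean-square error between predicted and true scores (percentage points)."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def fraction_within(predicted, actual, tolerance=3.0):
    """Share of predictions whose absolute error is at most `tolerance` points."""
    hits = sum(abs(p - a) <= tolerance for p, a in zip(predicted, actual))
    return hits / len(predicted)


def screen_runs(predicted_scores, baseline_score, min_gain=0.0):
    """Hypothetical go/no-go rule: keep runs predicted to beat the baseline by min_gain."""
    return [
        run_id
        for run_id, score in predicted_scores.items()
        if score - baseline_score >= min_gain
    ]


# Toy values for illustration only; these are not results from the paper.
predicted = [62.0, 55.5, 70.0, 48.0]
actual = [61.0, 59.5, 69.0, 48.0]

print(f"RMSE: {rmse(predicted, actual):.2f} points")
print(f"Within +/-3 points: {fraction_within(predicted, actual):.0%}")

candidates = {"run_a": 62.0, "run_b": 55.5, "run_c": 70.0, "run_d": 48.0}
print("Go:", screen_runs(candidates, baseline_score=56.0, min_gain=1.0))
```

In this toy setup the threshold trades compute savings against the risk of discarding a run that would have turned out well, which is why the tolerance-band metric matters alongside RMSE: it bounds how often a prediction is far enough off to flip a decision.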

Data Curation & Synthetic Data | Training Efficiency & Optimization

Year: 2026

Venue: N/A
