This paper introduces a framework for identifying undervalued football players by comparing their expected market value, estimated from structured data, with their observed valuation. Gradient-boosted regression models are trained to predict log-transformed market value from historical market dynamics, biographical and contract features, and transfer history. The study finds that market dynamics are the primary signal for undervaluation, while NLP features extracted from news articles provide consistent secondary gains that improve robustness and interpretability.
Forget subjective scouting reports: this framework objectively identifies undervalued football players by blending market dynamics with news sentiment, offering a data-driven edge in talent acquisition.
We present a practical, reproducible framework for identifying undervalued football players grounded in objective mispricing. Instead of relying on subjective expert labels, we estimate an expected market value from structured data (historical market dynamics, biographical and contract features, transfer history) and compare it to the observed valuation to define mispricing. We then assess whether news-derived Natural Language Processing (NLP) features (i.e., sentiment statistics and semantic embeddings from football articles) complement market signals for shortlisting undervalued players. Using a chronological (leakage-aware) evaluation, gradient-boosted regression explains a large share of the variance in log-transformed market value. For undervaluation shortlisting, ROC-AUC-based ablations show that market dynamics are the primary signal, while NLP features provide consistent, secondary gains that improve robustness and interpretability. SHAP analyses suggest the dominance of market trends and age, with news-derived volatility cues amplifying signals in high-uncertainty regimes. The proposed pipeline is designed for decision support in scouting workflows, emphasizing ranking/shortlisting over hard classification thresholds, and includes a concise reproducibility and ethics statement.
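The mispricing definition and the chronological split described above can be illustrated with a minimal sketch. It is not the authors' implementation. The `PlayerSnapshot` record, its field names, the use of a log-ratio as the mispricing score, and the date-cutoff split are all assumptions made for illustration. The expected value is taken as given here; in the paper it comes from a gradient-boosted regressor.

```python
import math
from dataclasses import dataclass


@dataclass
class PlayerSnapshot:
    # Hypothetical record layout, not taken from the paper.
    player: str            # player identifier
    date: str              # ISO date (YYYY-MM-DD) of the valuation snapshot
    observed_value: float  # observed market value
    expected_value: float  # model-estimated value from structured data


def chronological_split(rows, cutoff):
    """Split by time to avoid leakage: earlier snapshots train, later ones test.

    ISO date strings compare correctly as plain strings.
    """
    train = [r for r in rows if r.date < cutoff]
    test = [r for r in rows if r.date >= cutoff]
    return train, test


def log_mispricing(row):
    """Assumed score: log(expected) - log(observed).

    Positive values mean the expected value exceeds the observed one,
    i.e. the player looks undervalued.
    """
    return math.log(row.expected_value) - math.log(row.observed_value)


def shortlist(rows, k):
    """Rank players by mispricing, largest first, and keep the top k."""
    return sorted(rows, key=log_mispricing, reverse=True)[:k]


rows = [
    PlayerSnapshot("A", "2022-06-01", 10e6, 12e6),
    PlayerSnapshot("B", "2023-01-15", 5e6, 10e6),
    PlayerSnapshot("C", "2023-02-01", 20e6, 18e6),
    PlayerSnapshot("D", "2023-03-01", 8e6, 10e6),
]
train, test = chronological_split(rows, "2023-01-01")
top = shortlist(test, 2)
```

Ranking by a continuous score rather than applying a fixed threshold mirrors the abstract's emphasis on shortlisting over hard classification.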