AI Center | HSE University | SB AI Lab | Feb 22, 2026 | arXiv:2602.19339

SplitLight: An Exploratory Toolkit for Recommender Systems Datasets and Splits

Anna Volodkevich, Dmitry Anikin, Danil Gusak, Anton Klenitskiy, Evgeny Frolov, Alexey Vasilev

AI Summary

The paper introduces SplitLight, an open-source toolkit designed to address the impact of data preprocessing and splitting choices on recommender system evaluation. SplitLight analyzes dataset statistics, repeat consumption patterns, and split validity issues like temporal leakage and distribution shifts. The toolkit provides both a Python API and a no-code interface for comparing splitting strategies and generating audit summaries to improve the transparency and reliability of recommender system experiments.

Key Contribution

Seemingly innocuous data splitting choices in recommender systems can drastically alter model rankings, and SplitLight provides the tools to expose these hidden biases.

Abstract

Offline evaluation of recommender systems is often affected by hidden, under-documented choices in data preparation. Seemingly minor decisions in filtering, handling repeats, cold-start treatment, and splitting strategy design can substantially reorder model rankings and undermine reproducibility and cross-paper comparability. In this paper, we introduce SplitLight, an open-source exploratory toolkit that enables researchers and practitioners designing preprocessing and splitting pipelines or reviewing external artifacts to make these decisions measurable, comparable, and reportable. Given an interaction log and derived split subsets, SplitLight analyzes core and temporal dataset statistics, characterizes repeat consumption patterns and timestamp anomalies, and diagnoses split validity, including temporal leakage, cold-user/item exposure, and distribution shifts. SplitLight further allows side-by-side comparison of alternative splitting strategies through comprehensive aggregated summaries and interactive visualizations. Delivered as both a Python toolkit and an interactive no-code interface, SplitLight produces audit summaries that justify evaluation protocols and support transparent, reliable, and comparable experimentation in recommender systems research and industry.
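To make the split-validity diagnostics named in the abstract more concrete, here is a minimal sketch in plain Python. It is not SplitLight's API. The `Interaction` record, the `split_diagnostics` helper, and the toy data are hypothetical, and the definitions are assumptions: temporal leakage is counted as training interactions that occur after the earliest test interaction, and cold users/items are those that appear in the test subset but never in training.

```python
from typing import NamedTuple


class Interaction(NamedTuple):
    # Hypothetical record layout for one row of an interaction log.
    user: str
    item: str
    timestamp: int


def split_diagnostics(train, test):
    """Compute simple validity checks for a train/test split (illustrative only)."""
    train_users = {x.user for x in train}
    train_items = {x.item for x in train}
    test_users = {x.user for x in test}
    test_items = {x.item for x in test}

    # Assumed leakage definition: training interactions that happen
    # strictly after the earliest test interaction.
    test_start = min(x.timestamp for x in test)
    leaked = sum(1 for x in train if x.timestamp > test_start)

    return {
        "leaked_train_share": leaked / len(train),
        # Share of test users/items never seen during training.
        "cold_user_share": len(test_users - train_users) / len(test_users),
        "cold_item_share": len(test_items - train_items) / len(test_items),
    }


# Toy data: one training interaction (timestamp 7) falls after the test start (5).
train = [
    Interaction("u1", "a", 1),
    Interaction("u1", "b", 2),
    Interaction("u2", "a", 3),
    Interaction("u2", "c", 7),
]
test = [
    Interaction("u1", "c", 5),
    Interaction("u3", "d", 6),
]

report = split_diagnostics(train, test)
print(report)
```

Reporting shares rather than raw counts keeps the numbers comparable across splitting strategies that produce subsets of different sizes, which is the kind of side-by-side comparison the abstract describes.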

Data Curation & Synthetic Data | Open-Source Models & Weights | Recommendation & Information Retrieval

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
