Clausthal University of Technology, University of Mannheim
May 27, 2026
arXiv:2605.28418

Revisiting Metafeatures to Explain Model Differences on Tabular Data

Markus Herre, Andrej Tschalzev, Sascha Marton, Christian Bartelt

AI Summary

This paper investigates whether dataset meta-features can explain performance differences between tabular foundation models and traditional models on the TabArena benchmark. The authors analyze dataset-level performance gaps in relation to model-agnostic dataset descriptors, using strict statistical tests with false discovery control. The key finding is that global meta-feature approaches are not robust enough to explain performance differences across the 51 datasets in TabArena, with only limited success in specific model comparisons.
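As a rough illustration of what "false discovery control" means when many meta-features are tested against one performance gap, the sketch below implements the Benjamini-Hochberg step-up procedure. The paper's listing here does not say which correction procedure or significance level the authors used, so both the choice of Benjamini-Hochberg and the `alpha=0.05` default are assumptions, and the function name is hypothetical.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return one boolean per p-value: True if that hypothesis is rejected.

    Assumption: Benjamini-Hochberg is used only as a common example of
    false discovery control; the paper may use a different procedure.
    """
    m = len(p_values)
    if m == 0:
        return []

    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Find the largest rank k whose p-value satisfies p_(k) <= (k / m) * alpha.
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_k = rank

    # Step-up rule: reject every hypothesis ranked at or below max_k.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_k:
            rejected[idx] = True
    return rejected


if __name__ == "__main__":
    # Hypothetical p-values, one per meta-feature tested against a gap.
    p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
    print(benjamini_hochberg(p))
```

With ten tests, several p-values below 0.05 in this made-up input do not survive the correction, which is the kind of attrition the summary describes.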

Key Contribution

Dataset meta-features cannot reliably explain why one tabular model outperforms another on TabArena, which points to substantial heterogeneity across tabular datasets.

Abstract

As tabular foundation models rise while traditional models still perform well on many tasks, choosing the right model for a tabular dataset remains difficult. We investigate whether dataset meta-features can explain performance gaps between model families on tabular prediction tasks. Using the TabArena benchmark results, we analyze dataset-level performance gaps and relate them to model-agnostic dataset descriptors. After strict statistical tests with false discovery control, we find that (1) for neural network vs. tree gaps, no meta-feature survives false discovery control, (2) for non-foundation vs. foundation model gaps, one association is robust but does not generalize when tested in leave-one-dataset-out prediction, and (3) for TabICLv2 vs. TabPFN-2.6, one robust association also improves held-out prediction. Furthermore, we conduct a leave-one-dataset-out analysis and find that meta-feature predictors fail to improve meaningfully over a simple baseline. Overall, our results show the heterogeneity of tabular datasets and that global meta-feature approaches are not robust enough to offer explanations on the 51 TabArena datasets.
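The leave-one-dataset-out check mentioned in the abstract can be sketched as follows: each dataset is held out in turn, a predictor of the performance gap is fit on the remaining datasets, and its held-out error is compared with a baseline that ignores the meta-feature. The abstract does not specify the predictor, the baseline, or the error metric, so the one-feature least-squares line, the training-mean baseline, and mean absolute error used here are all assumptions, and the function names are hypothetical.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for one feature; returns (slope, intercept)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        # Constant feature: fall back to predicting the mean gap.
        return 0.0, my
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def lodo_mae(feature, gap):
    """Leave-one-dataset-out MAE for (mean baseline, one-feature linear model).

    feature[i] is a meta-feature value and gap[i] the performance gap
    on dataset i. Needs at least three datasets.
    """
    feature, gap = list(feature), list(gap)
    baseline_errors, model_errors = [], []
    for i in range(len(feature)):
        # Training split: every dataset except the held-out one.
        xs = feature[:i] + feature[i + 1:]
        ys = gap[:i] + gap[i + 1:]
        slope, intercept = fit_line(xs, ys)
        baseline_errors.append(abs(gap[i] - mean(ys)))
        model_errors.append(abs(gap[i] - (slope * feature[i] + intercept)))
    return mean(baseline_errors), mean(model_errors)


if __name__ == "__main__":
    # Made-up values for illustration only, not results from the paper.
    meta_feature = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
    performance_gap = [2 * x + 1 for x in meta_feature]
    baseline_mae, model_mae = lodo_mae(meta_feature, performance_gap)
    print(baseline_mae, model_mae)
```

A meta-feature is only useful in this sense if the second number is clearly smaller than the first on real gaps; an association that is significant in-sample can still fail this test, as finding (2) in the abstract reports.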

Data Curation & Synthetic Data, Eval Frameworks & Benchmarks

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
