The paper introduces DT-GFN, a method for learning decision tree ensembles by formulating decision tree construction as a sequential planning problem solved via a deep reinforcement learning policy (GFlowNet). This approach addresses challenges in scaling and generalizing decision tree models for tabular data by learning to sample decision trees from the Bayesian posterior. DT-GFN outperforms state-of-the-art decision tree and deep learning methods on classification benchmarks, demonstrates robustness to distribution shifts, and produces interpretable models with shorter description lengths.
Ditch the greedy heuristics: GFlowNets can learn to sample decision trees from the Bayesian posterior, outperforming standard methods and scaling consistently with ensemble size.
Building predictive models for tabular data presents fundamental challenges, notably in scaling consistently, i.e., more resources translating to better performance, and in generalizing systematically beyond the training data distribution. Designing decision tree models remains especially challenging given the intractably large search space: most existing methods rely on greedy heuristics, while deep learning inductive biases expect a temporal or spatial structure not naturally present in tabular data. We propose a hybrid amortized structure inference approach to learn predictive decision tree ensembles given data, formulating decision tree construction as a sequential planning problem. We train a deep reinforcement learning (GFlowNet) policy to solve this problem, yielding a generative model that samples decision trees from the Bayesian posterior. We show that our approach, DT-GFN, outperforms state-of-the-art decision tree and deep learning methods on standard classification benchmarks derived from real-world data, in robustness to distribution shifts, and in anomaly detection, all while yielding interpretable models with shorter description lengths. Samples from the trained DT-GFN model can be ensembled to construct a random forest, and we further show that the performance of DT-GFN scales consistently with ensemble size, yielding ensembles of predictors that continue to generalize systematically.
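To make the sequential formulation concrete, the sketch below builds a tree as a sequence of state-action steps (each leaf is a state; an action either splits it on a (feature, threshold) pair or terminates it) and then combines sampled trees by majority vote, the way posterior samples are ensembled into a random forest. This is a minimal illustration, not the authors' implementation: the uniform-random action choices stand in for the trained GFlowNet policy, and the names `Node`, `sample_tree`, and `ensemble_predict` are hypothetical.

```python
# Minimal sketch (not the DT-GFN code): tree construction as a sequential
# decision process, with sampled trees ensembled like a random forest.
import numpy as np

rng = np.random.default_rng(0)

class Node:
    """Binary tree node; internal nodes split on (feature, threshold), leaves hold a label."""
    def __init__(self, feature=None, threshold=None, left=None, right=None, label=None):
        self.feature, self.threshold = feature, threshold
        self.left, self.right, self.label = left, right, label

def sample_tree(X, y, depth=0, max_depth=3, p_stop=0.3):
    """Sequentially grow a tree: at each leaf, sample an action, either
    'terminate' (majority label) or 'split' on a random (feature, threshold).
    A trained GFlowNet policy would choose these actions so that complete
    trees are sampled with probability proportional to their posterior;
    here the choices are uniform for illustration."""
    if depth == max_depth or len(set(y)) == 1 or rng.random() < p_stop:
        values, counts = np.unique(y, return_counts=True)
        return Node(label=values[np.argmax(counts)])
    f = rng.integers(X.shape[1])
    t = rng.uniform(X[:, f].min(), X[:, f].max())
    mask = X[:, f] <= t
    if mask.all() or not mask.any():  # degenerate split: terminate instead
        values, counts = np.unique(y, return_counts=True)
        return Node(label=values[np.argmax(counts)])
    return Node(f, t,
                left=sample_tree(X[mask], y[mask], depth + 1, max_depth, p_stop),
                right=sample_tree(X[~mask], y[~mask], depth + 1, max_depth, p_stop))

def predict_one(node, x):
    while node.label is None:
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.label

def ensemble_predict(trees, X):
    """Majority vote over sampled trees: posterior samples combined as a forest."""
    votes = np.array([[predict_one(t, x) for x in X] for t in trees])
    return np.array([np.bincount(col).argmax() for col in votes.T])

# Usage: sample an ensemble on toy data and check training accuracy.
X = rng.normal(size=(200, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
forest = [sample_tree(X, y) for _ in range(50)]
print((ensemble_predict(forest, X) == y).mean())
```

Replacing the uniform-random action sampling with a policy network trained under a GFlowNet objective (e.g., trajectory balance) is what would make sampled trees follow the Bayesian posterior rather than an arbitrary distribution; the ensembling step is unchanged, which is why performance can scale consistently as more samples are drawn.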