This paper addresses the challenge of active feature selection in machine learning, where the goal is to sequentially acquire the most informative features for each instance while minimizing costs. The authors extend their previous reinforcement learning approach by incorporating a heuristic-based strategy to explore promising feature combinations and introducing a post-fit regularization method to reduce the number of distinct feature sequences. Experiments on binary classification datasets, including one with high-dimensional variables, demonstrate improved performance compared to state-of-the-art methods in terms of both accuracy and policy complexity.
Reinforcement learning can be scaled to active feature selection on larger, high-dimensional datasets by restricting the search to the most promising feature combinations and regularizing decision sequences after fitting, outperforming state-of-the-art methods in accuracy and policy complexity on four binary classification datasets.
Determining the most appropriate features for machine learning predictive models is challenging, both in terms of predictive performance and feature acquisition costs. In particular, a global feature choice is limiting, since some features benefit only a subset of instances. In previous work, we proposed a reinforcement learning approach that sequentially recommends which modality to acquire next to reach the best information/cost ratio, based on the instance-specific information already acquired. We formulated the problem as a Markov Decision Process in which the state's dimensionality changes during the episode, which avoids data imputation, in contrast to existing works. However, this approach could only process a small number of features, because all possible combinations of features were considered. Here, we address these limitations with two contributions: 1) we expand our framework to larger datasets with a heuristic-based strategy that focuses on the most promising feature combinations, and 2) we introduce a post-fit regularisation strategy that reduces the number of different feature combinations, leading to compact sequences of decisions. We tested our method on four binary classification datasets (one involving high-dimensional variables), the largest of which had 56 features and 4500 samples. We obtained better performance than state-of-the-art methods, both in terms of accuracy and policy complexity.
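To make the Markov Decision Process formulation more concrete, the following is a minimal Python sketch of a single acquisition episode in which the state contains only the features acquired so far, so its dimensionality grows during the episode and nothing is imputed. It is an illustration under stated assumptions, not the authors' implementation: the class `AcquisitionEpisode`, the `STOP` action, the per-feature `costs`, and the reward design (negative cost on acquisition, 1 for a correct final prediction) are all hypothetical choices made for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

STOP = -1  # hypothetical action id: stop acquiring and make a prediction


@dataclass
class AcquisitionEpisode:
    """One instance handled as an episode (illustrative, not the paper's code)."""

    features: List[float]  # full feature vector, hidden from the agent
    label: int             # true binary label
    costs: List[float]     # assumed per-feature acquisition costs
    acquired: Dict[int, float] = field(default_factory=dict)

    def state(self) -> Dict[int, float]:
        # The state holds only acquired features, so its size changes
        # during the episode and no missing value is ever imputed.
        return dict(self.acquired)

    def available_actions(self) -> List[int]:
        # Features not yet acquired, plus the option to stop and predict.
        remaining = [i for i in range(len(self.features)) if i not in self.acquired]
        return remaining + [STOP]

    def step(
        self, action: int, prediction: Optional[int] = None
    ) -> Tuple[Dict[int, float], float, bool]:
        """Apply an action and return (next_state, reward, done)."""
        if action == STOP:
            if prediction is None:
                raise ValueError("a prediction is required when stopping")
            # Assumed terminal reward: 1 for a correct prediction, else 0.
            reward = 1.0 if prediction == self.label else 0.0
            return self.state(), reward, True
        if action in self.acquired or not 0 <= action < len(self.features):
            raise ValueError(f"invalid action: {action}")
        # Acquiring a feature reveals its value and incurs its cost.
        self.acquired[action] = self.features[action]
        return self.state(), -self.costs[action], False


if __name__ == "__main__":
    episode = AcquisitionEpisode(
        features=[0.2, 1.5, -0.7], label=1, costs=[0.1, 0.3, 0.05]
    )
    print(episode.step(1))                   # acquire feature 1
    print(episode.available_actions())       # remaining features plus STOP
    print(episode.step(STOP, prediction=1))  # stop and predict
```

Representing the state as a mapping from feature index to value is one simple way to let its dimensionality vary. A policy trained on such episodes would choose among `available_actions()` at each step, trading the information gained against the acquisition cost.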