
Optimal decision trees via search: A reinforcement learning framework

Abstract
Decision Trees are among the most popular models in Machine Learning because of their interpretability, which is most pronounced for Decision Trees with few splits (or decision rules), a property called sparsity. Interpretable models are highly valued in domains where decisions carry substantial ramifications, such as healthcare and the criminal justice system. For this reason, seeking optimal sparse Decision Trees constitutes a fundamental problem in Interpretable Machine Learning, and a large research effort has been devoted to solving it. Because the problem is NP-hard, greedy heuristic methods such as C4.5 and CART have historically been favoured; while fast and scalable, these methods remain unsatisfactory because they disregard the sparsity objective and often produce suboptimal, overly complex Decision Trees. Many algorithms have nevertheless been developed to address the sparsity problem, with recent breakthroughs leveraging Dynamic Programming (DP) and Branch & Bound (B&B) techniques and causing a paradigm shift from traditional Mathematical Programming towards search algorithms. However, most of the proposed DP and B&B methods follow a Depth-First-Search (DFS) strategy which, while attractive for its low memory footprint, is inefficient because it is uninformed. Moreover, DFS requires fixing a maximum depth hyperparameter a priori, and its inefficiency becomes a limiting factor for large maximum depths. These concerns were partially addressed with Best-First-Search (BFS) strategies. BFS is appealing because it prunes the search space more aggressively without losing the optimality guarantee, enabling it to find the optimal solution faster than DFS.
Moreover, BFS does not require fixing a maximum depth hyperparameter; it can run with unbounded depth, an important advantage that lets practitioners focus their tuning efforts on other hyperparameters. These advantages, however, come at the expense of higher memory consumption than DFS. To mitigate this issue, the search strategies of BFS methods must be formulated as efficiently as possible; in other words, they need to find the optimal solution as quickly as possible, so that their advantages outweigh their high memory consumption. Current BFS methods can be improved substantially by reconsidering the sparsity problem under a framework that exploits its structural properties. Our main contribution in this thesis addresses this concern within a Reinforcement Learning framework. Our BFS algorithm, Branches, employs an AO*-type search strategy which, coupled with a pruning bound we call the Purification Bound and several other heuristic considerations, focuses the search on relevant regions of the search space while maintaining the optimality guarantee. This enables Branches to find optimal sparse Decision Trees faster than the state of the art and, as a result, alleviates the memory burden that weighs on BFS approaches. We analyse the computational complexity of Branches theoretically by deriving an upper bound on the number of branch evaluations it performs before terminating, and we show this result to be superior to similar analyses from the literature. These findings are further validated through extensive experiments showing that Branches significantly advances the state of the art in terms of number of iterations, runtime, anytime behaviour and scalability. This work lays the foundations for future research that can further improve the scalability of these methods.
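The contrast between depth-first and best-first exploration described above can be illustrated with a generic best-first branch-and-bound sketch in Python. The toy problem, the function names and the bound below are illustrative assumptions only, not the Branches algorithm or its Purification Bound:

```python
import heapq

def best_first_bnb(root, expand, lower_bound, is_complete, cost):
    """Generic best-first branch-and-bound: always expand the frontier
    node with the smallest admissible lower bound, and prune any node
    whose bound already exceeds the best complete solution found so far
    (the incumbent)."""
    best, best_cost = None, float("inf")
    frontier = [(lower_bound(root), 0, root)]  # (bound, tiebreak, node)
    tie = 1
    while frontier:
        bound, _, node = heapq.heappop(frontier)
        if bound >= best_cost:
            break  # bounds are admissible and sorted, so we can stop
        if is_complete(node):
            c = cost(node)
            if c < best_cost:
                best, best_cost = node, c
            continue
        for child in expand(node):
            child_bound = lower_bound(child)
            if child_bound < best_cost:  # prune dominated children
                heapq.heappush(frontier, (child_bound, tie, child))
                tie += 1
    return best, best_cost

# Toy instance: pick exactly two of three items with weights [3, 1, 2],
# minimising total weight; partial solutions are bit tuples.
w = [3, 1, 2]
lb = lambda n: sum(wi * bi for wi, bi in zip(w, n))  # admissible bound
tree, c = best_first_bnb(
    root=(),
    expand=lambda n: [n + (0,), n + (1,)] if len(n) < 3 else [],
    lower_bound=lb,
    is_complete=lambda n: len(n) == 3 and sum(n) == 2,
    cost=lb,
)
# picks the two cheapest items: (0, 1, 1) with cost 3
```

Because the frontier is ordered by an admissible lower bound, the search terminates as soon as the best remaining bound meets the incumbent's cost; this informed pruning is what best-first strategies exploit, at the price of keeping the frontier in memory.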
In addition, we study the online setting, where a data stream is observed rather than a fixed dataset. Unlike the batch setting, the literature on optimal sparse Decision Trees is scarce in Online Learning. Most approaches adapt greedy heuristics to handle data streams through statistical estimates, and hence inherit their suboptimality issues. On this front, we develop three Monte Carlo Tree Search (MCTS) algorithms with asymptotic optimality guarantees, and we show that they can successfully retrieve the optimal solution in situations where state-of-the-art methods are practically guaranteed to fail. Furthermore, we adapt Branches to the online setting, yielding Online-Branches, an algorithm that satisfies finite-time rather than asymptotic optimality guarantees. Our experiments show that Online-Branches outperforms our MCTS methods. Our work on the online setting has the potential to draw more research attention to the problem of seeking optimal sparse Decision Trees from a data stream, and it lays a solid foundation for future work through discussions of the issues that arise from the hypotheses of Online Learning and our proposed solutions.
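To give a flavour of how statistical estimates over a stream can support asymptotic guarantees, the UCB1 rule commonly used in MCTS selection steps balances an empirical mean against an exploration bonus. This is a generic textbook sketch under assumed names, not the thesis's MCTS algorithms or Online-Branches:

```python
import math
import random

def ucb1_indices(sums, counts, t, c=math.sqrt(2)):
    """UCB1 index per arm: empirical mean plus an exploration bonus that
    vanishes as the arm's sample count grows. Unvisited arms receive an
    infinite index so that every arm is tried at least once."""
    return [
        float("inf") if n == 0 else s / n + c * math.sqrt(math.log(t) / n)
        for s, n in zip(sums, counts)
    ]

# Toy simulation: two Bernoulli "choices" with success rates 0.2 and 0.8.
random.seed(0)
probs = [0.2, 0.8]
sums, counts = [0.0, 0.0], [0, 0]
for t in range(1, 2001):
    idx = ucb1_indices(sums, counts, t)
    arm = idx.index(max(idx))
    sums[arm] += 1.0 if random.random() < probs[arm] else 0.0
    counts[arm] += 1
# the better arm ends up sampled far more often than the worse one
```

The shrinking exploration bonus is what yields asymptotic (rather than finite-time) optimality: every option keeps being revisited, but ever more rarely as its estimate sharpens.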
Type
Thesis
Date
2024
Publisher
The University of Waikato
Rights
All items in Research Commons are provided for private study and research purposes and are protected by copyright with all rights reserved unless otherwise indicated.