\NameGuy Blanc \Email[email protected]
\addrStanford
and \NameJane Lange \Email[email protected]
\addrMIT
and \NameMingda Qiao \Email[email protected]
\addrStanford
and \NameLi-Yang Tan \Email[email protected]
\addrStanford
Open Problem: Properly learning decision trees in polynomial time?
Abstract
The authors recently gave an $n^{O(\log\log n)}$-time membership query algorithm for properly learning decision trees under the uniform distribution (BLQT21). The previous fastest algorithm for this problem ran in $n^{O(\log n)}$ time, a consequence of EH89's classic algorithm for the distribution-free setting. In this article we highlight the natural open problem of obtaining a polynomial-time algorithm, discuss possible avenues towards obtaining it, and state intermediate milestones that we believe are of independent interest.
keywords:
Decision trees, proper learning, analysis of Boolean functions

1 Introduction
Decision trees are one of the most intensively studied concept classes in learning theory. In this article we focus on the problem of properly learning decision trees, where the learning algorithm is expected to return a hypothesis that is itself a decision tree. Decision tree hypotheses are of particular interest because of their simple structure, and they are the canonical example of a highly interpretable model. Indeed, it is natural to seek decision tree hypotheses even when learning concepts that are not themselves decision trees. Algorithms such as ID3, CART, and C4.5 that construct decision tree representations of datasets are among the most popular and empirically successful algorithms in everyday machine learning practice.
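To make the greedy paradigm behind ID3, CART, and C4.5 concrete, the following is a minimal sketch of an ID3-style learner for Boolean data, splitting on the coordinate of maximum information gain. This is an illustration only, not the implementation in any of the cited packages; all names (`entropy`, `best_split`, `build`, `predict`) are ours.

```python
import math
from collections import Counter

def entropy(labels):
    # Shannon entropy of a list of 0/1 labels.
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_split(X, y):
    # Coordinate whose split maximizes information gain; None if no
    # coordinate separates the data.
    base = entropy(y)
    best, best_gain = None, 0.0
    for i in range(len(X[0])):
        left = [y[j] for j in range(len(X)) if X[j][i] == 0]
        right = [y[j] for j in range(len(X)) if X[j][i] == 1]
        if not left or not right:
            continue
        gain = base - (len(left) * entropy(left) + len(right) * entropy(right)) / len(y)
        if gain > best_gain:
            best, best_gain = i, gain
    return best

def build(X, y, depth=0, max_depth=4):
    # Greedy top-down construction; internal node = (variable, 0-subtree, 1-subtree),
    # leaf = majority label.
    i = best_split(X, y)
    if len(set(y)) == 1 or depth == max_depth or i is None:
        return Counter(y).most_common(1)[0][0]
    L = [(x, v) for x, v in zip(X, y) if x[i] == 0]
    R = [(x, v) for x, v in zip(X, y) if x[i] == 1]
    return (i,
            build([x for x, _ in L], [v for _, v in L], depth + 1, max_depth),
            build([x for x, _ in R], [v for _, v in R], depth + 1, max_depth))

def predict(tree, x):
    # Walk from the root to a leaf following x's coordinates.
    while isinstance(tree, tuple):
        i, lo, hi = tree
        tree = hi if x[i] else lo
    return tree
```

For example, on the four labeled points of the two-variable AND function, `build` recovers a tree that first queries one variable and then the other.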
We focus on the setting of PAC learning under the uniform distribution with membership queries, where the learner is given query access to an unknown size-$s$ decision tree target $f : \{\pm 1\}^n \to \{\pm 1\}$ and is expected to construct a decision tree hypothesis $T$ satisfying $\Pr_{\boldsymbol{x}}[T(\boldsymbol{x}) \ne f(\boldsymbol{x})] \le \varepsilon$, where $\boldsymbol{x} \sim \{\pm 1\}^n$ is uniform random.
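In this setting the target $f$ is available as a black-box oracle, and the error of a hypothesis is measured against the uniform distribution. A hedged sketch of how such an error estimate is computed empirically (the names `estimate_error`, `f`, `T` are ours, and the sample size is illustrative):

```python
import random

def estimate_error(f, T, n, samples=10000, seed=0):
    # Empirical estimate of Pr_x[T(x) != f(x)] over uniform x in {0,1}^n,
    # using f only as a membership-query oracle.
    rng = random.Random(seed)
    bad = 0
    for _ in range(samples):
        x = tuple(rng.randint(0, 1) for _ in range(n))
        if T(x) != f(x):
            bad += 1
    return bad / samples
```

By a standard Chernoff bound, $O(\log(1/\delta)/\varepsilon^2)$ uniform samples suffice to estimate the error to within $\pm\varepsilon$ with probability $1-\delta$, so this estimation step is never the bottleneck.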
The main open problem of this article is the following:
Design a $\mathrm{poly}(n, s, 1/\varepsilon)$-time membership query algorithm for properly learning size-$s$ decision trees over $n$ variables to error $\varepsilon$ under the uniform distribution.
Regarding the use of membership queries, it would of course be preferable if the algorithm did not require them and instead used only uniform random labeled examples. However, there are two significant barriers to such an algorithm. First, no statistical query (SQ) algorithm exists: any SQ algorithm for learning size-$s$ decision trees has to make at least $n^{\Omega(\log s)}$ SQs
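The source of this SQ lower bound is that decision trees embed parities: a size-$2^k$ decision tree computes the parity of any $k$ variables, and parities are the classic hard case for SQ learners. A quick sanity check of this embedding (the helper names `parity_tree`, `evaluate`, `num_leaves` are ours):

```python
from itertools import product

def parity_tree(vars_):
    # Full decision tree computing the parity of the given variable indices.
    # Node = (variable, 0-subtree, 1-subtree); leaves carry the parity
    # accumulated along the root-to-leaf path. Has 2^k leaves for k variables.
    def build(rem, acc):
        if not rem:
            return acc
        i = rem[0]
        return (i, build(rem[1:], acc), build(rem[1:], acc ^ 1))
    return build(list(vars_), 0)

def evaluate(tree, x):
    while isinstance(tree, tuple):
        i, lo, hi = tree
        tree = hi if x[i] else lo
    return tree

def num_leaves(tree):
    if not isinstance(tree, tuple):
        return 1
    return num_leaves(tree[1]) + num_leaves(tree[2])
```

Thus size-$s$ trees contain parities of $\log s$ variables as a subclass, and SQ algorithms provably need $n^{\Omega(\log s)}$ queries to distinguish among the $\binom{n}{\log s}$ candidate parities.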