
Efficient decision tree training with new data structure for secure multi-party computation

Koki Hamada    Dai Ikarashi    Ryo Kikuchi    Koji Chida
( NTT Social Informatics Laboratories
December 17, 2021)
Abstract

We propose a secure multi-party computation (MPC) protocol that constructs a secret-shared decision tree for a given secret-shared dataset. The previous MPC-based decision tree training protocol (Abspoel et al. 2021) requires O(2^{h}mn\log n) comparisons, which is exponential in the tree height h, where n and m are the number of rows and the number of attributes in the dataset, respectively. The cause of the exponential number of comparisons in h is that the decision tree training algorithm is based on the divide-and-conquer paradigm, where dummy rows are added after each split in order to hide the number of rows in the dataset. We resolve this issue via a secure data structure that enables us to compute an aggregate value for every group while hiding the grouping information. By using this data structure, we can train a decision tree without adding dummy rows while hiding the size of the intermediate data. We specifically describe a decision tree training protocol that requires only O(hmn\log n) comparisons when the input attributes are continuous and the output attribute is binary. Note that the order is now linear in the tree height h. To demonstrate the practicality of our protocol, we implement it in an MPC framework based on a three-party secret sharing scheme. Our implementation results show that our protocol trains a decision tree with a height of 5 in 33 seconds for a dataset of 100,000 rows and 10 attributes.

1 Introduction

Secure multi-party computation (MPC) [Yao86] allows parties to jointly compute any function while keeping the inputs private. Its large computational overhead has long been a barrier to practical use. In recent years, however, efficient MPC protocols have been proposed even for machine learning methods such as neural network training [WGC19, RWT+18, MR18].

The decision tree is a classical machine learning method. It is still widely used due to its computational simplicity and ease of interpretation. It is also an important component of other machine learning methods that have been successful in recent years, such as gradient boosting decision trees [Fri01] and random forests [Bre01].

Since the work of Lindell and Pinkas [LP00] in the early days of privacy-preserving data mining, there has been a lot of research on MPC protocols for decision tree training. In order to be used as a component of MPC protocols for other machine learning methods, it is desirable to keep all the information, from the input to the trained decision tree, private. However, only a few protocols [dHSCodA14, AEV21, ACC+21] with such a property have been proposed. This is mainly due to two kinds of computational difficulties in MPC.

The first difficulty is computation on real numbers. Decision tree training requires the computation of evaluation functions. Although there are many types of evaluation functions, all commonly used ones involve division or logarithm. Therefore, naive MPC protocols for decision tree training involve computation on real numbers, which increases the computational cost. In contrast, de Hoogh et al. [dHSCodA14] cleverly avoided computation on real numbers by replacing fractional number comparisons with integer comparisons, and proposed an efficient protocol for the case where the inputs are categorical. Abspoel et al. [AEV21] presented an efficient protocol that can be applied to the case where the input contains numerical values. The number of candidates to which the evaluation functions are applied is \Theta(c) when the input consists only of categorical values, whereas it increases to \Theta(n^{2}) when the input contains numerical values, where c is the number of possible values of a categorical attribute and n is the number of samples in the input. They used a sorting protocol to reduce the number of candidates to O(n), and also extended the technique of de Hoogh et al. to the numerical case to avoid computation on real numbers. Adams et al. [ACC+21] dealt with the case where the input contains numerical values by a different approach: discretizing the numerical attributes of the input. Although the trained tree is slightly different from the one trained without discretization, this approach avoids the use of sorting, which is relatively expensive, and allows the use of the efficient protocol of de Hoogh et al. [dHSCodA14].

The second difficulty is the protection of the intermediate data size. In decision tree training, the data is split recursively from the root node to the leaf nodes in a top-down fashion. As the tree height increases, the number of nodes increases exponentially. On the other hand, the size of the intermediate data processed at each node decreases exponentially on average, so the overall computational cost is linear in the tree height. When this is implemented in MPC, the intermediate data size after splitting has to be hidden, so the existing protocols [dHSCodA14, AEV21, ACC+21] used, at each node, data of the same size as the original that contains dummy entries. Therefore, they could not benefit from the size reduction provided by data splitting, and as a result, the overall computational cost was exponential in the tree height.

1.1 Our contribution

Table 1: Comparison of computational cost of MPC protocols for decision tree training with numerical input attributes.
Method Number of operations
Trivial [AEV21] O(2^{h}mn^{2})
Abspoel et al. [AEV21] O((2^{h}+\log n)mn\log n)
Abspoel et al. [AEV21] with efficient sort [HKI+12] O(2^{h}mn\log n)
Ours O(hmn\log n)

We propose an MPC protocol for decision tree training whose computational cost is linear in the tree height, which is the first protocol to solve the second problem above. It trains a binary decision tree under the assumption that all input attributes are numerical and the output attribute is binary. As in the protocol by Abspoel et al. [AEV21], it does not reveal any information other than the size of the input and the upper bound h on the tree height.

The computational cost of our protocol is O(hmn\log n), assuming that the comparison and multiplication protocols are unit operations, where m is the number of input attributes in the dataset and n is the number of samples in the dataset. This is an exponential improvement with respect to h over the computational cost O(2^{h}mn\log n) of the protocol by Abspoel et al. (Actually, Abspoel et al. [AEV21] claimed only a computational cost of O((2^{h}+\log n)mn\log n), but their protocol can easily be implemented to run in O(2^{h}mn\log n) by replacing the sorting protocol with an efficient one such as [HKI+12].) A comparison of the computational costs is shown in Table 1.

Our approach of exponential improvement in computational cost with respect to the tree height is general. For completeness, our protocol is instantiated with all input attributes being numeric, the output attribute being binary, and the evaluation function being the Gini index; however, it is easy to extend. In fact, the main protocol (Algorithm 5), which plays a central role in the exponential improvement of the computational cost, describes a process common to the major decision tree training algorithms CART [BFOS84], ID3 [Qui86], and C4.5 [Qui14].

Our protocol is built on top of a set of basic protocols, such as multiplication and comparison, provided by many recent MPC frameworks, so it can be used on top of various implementations. More specifically, we build our protocol on top of an MPC model called arithmetic black box (ABB), which consists of a set of basic operations described in Section 2.2.1.

As a byproduct of our decision tree training protocol, we also propose a secure data structure that can compute aggregate values, such as sums and maximums, within each group while keeping the grouping information private. This data structure is also useful in settings other than decision tree training.

To assess the practicality of our decision tree training protocol, we implemented it on an MPC framework based on a 2-out-of-3 secret sharing scheme. Our protocol trained a decision tree with a height of 5 for a dataset of 100,000 samples with 10 input attributes in 33 seconds.

1.2 Overview of our techniques

In general, MPC protocols are incompatible with divide-and-conquer algorithms. In divide-and-conquer algorithms, the problem is divided into smaller subproblems and solved recursively, but MPC protocols need to hide the sizes of the divided subproblems as well. A common way to hide the size of a problem is to add dummies. We hide the actual size of the data by adding dummies (or leaving samples that should be removed) to the split data so that it appears to be the same size as the original. A disadvantage of this method is that it is computationally expensive, since it loses the property that the data size becomes smaller after splitting. For this reason, the previous study [AEV21] required a cost exponential in the height of the tree.

We use the property that, in decision tree training, the total number of samples at each height is invariant. We keep the data of nodes of the same height together and train them all at once, without adding any dummies. This allows our protocol to process only \Theta(hmn) samples in total, while the previous study [AEV21] processes \Theta(2^{h}mn) samples including dummies.

To implement this idea, we first define a data structure that looks like a private vector of length n, but is internally grouped. Specifically, we place the n grouped elements on a private vector of length n so that elements of the same group appear next to each other, and then create a private bit vector of length n with a flag marking the first element of each group. This allows us to detect the boundaries of groups internally by referring to the flags, although the grouping cannot be distinguished from the outside.

In decision tree training, each group needs to be split when moving to the next height. We accomplish this within our data structure by stably sorting the elements using the binary branching result, which is computed for each element, as a key. Stability of the sort ensures that elements that are in the same group and have the same branching result will be placed sequentially after the sort. Since this split requires only one-bit-key sorting, it can be very efficient depending on the underlying MPC implementation.
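To illustrate this in the clear (ignoring the MPC layer), the following Python sketch shows how a stable one-bit-key sort regroups elements; the variable names are ours, and the explicit group ids are shown only to make the regrouping visible, whereas the protocol never reveals them.

```python
# Cleartext illustration (no MPC): splitting internally grouped data by a
# stable sort on the per-element branch bit.
groups = [0, 0, 1, 2, 2, 2]          # which group each element belongs to
branch = [1, 0, 0, 1, 1, 0]          # binary test result per element

# Stable sort by the branch bit: elements keep their relative order,
# so members of the same (group, branch) pair stay contiguous.
order = sorted(range(len(branch)), key=lambda i: branch[i])
new_groups = [(groups[i], branch[i]) for i in order]
print(new_groups)
# [(0, 0), (1, 0), (2, 0), (0, 1), (2, 1), (2, 1)]
```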

We build the group-wise sum, maximum, and prefix sum computations on our data structure. We then use them to build a decision tree training algorithm similar to [AEV21] on our data structure.

2 Preliminaries

In this section, we introduce a typical decision tree training algorithm in the clear and secure multi-party computation.

Before that, we introduce some notation. Throughout this paper, the index of a vector starts at 1. We refer to the i-th element of a vector \vec{v} by \vec{v}[i]. That is, if \vec{v} is a vector of length n, then \vec{v}=(\vec{v}[1],\vec{v}[2],\dots,\vec{v}[n]). In logical operations, 0 represents false and 1 represents true.

2.1 Decision tree training

Decision tree training is a method in machine learning. The goal is to obtain a model called a decision tree that predicts a value of an output attribute, given values of input attributes. There are several famous algorithms for decision tree training, such as CART [BFOS84], ID3 [Qui86], and C4.5 [Qui14]. The general framework of these algorithms is the same, and in fact they are all greedy algorithms based on the divide-and-conquer paradigm. In this section, we present a typical algorithm, for which we plan to construct a secure version, for training a two-class classification binary tree, where all input attributes are numerical.

2.1.1 Typical decision tree training algorithm

Let us start by defining notation. Consider a dataset D with m input attributes X_{1},\dots,X_{m} and an output attribute Y. Suppose there are n samples, each sample being a pair of an input tuple x and a class label y. Here, x is an m-tuple, and y is a value of the output attribute Y. The j-th element of x represents a value of the input attribute X_{j}. A decision tree consists of a binary tree and some additional information. Each internal node (non-leaf node) has a condition called a test of the form X_{j}<t. It asks whether the j-th element in a given input tuple is less than a threshold t or not. Each edge is assigned a possible outcome of its source node's test, that is, true or false. An edge whose assigned outcome is true (false) is called a true edge (false edge, respectively). A child node whose incoming edge is a true edge (false edge) is called a true-child node (false-child node, respectively). Each leaf node is assigned a class label called a leaf label. This information is used to predict a class label for a given input tuple as follows. Starting from the root node, we repeatedly evaluate the test of the internal node we reach and trace the outgoing edge that is assigned the same value as the test outcome. When we reach a leaf node, we output its leaf label as the predicted class label.

Notation: \mathcal{T}:=\operatorname{\sf Train}(\mathcal{D})
Input: A training dataset \mathcal{D}.
Output: A decision tree \mathcal{T}.
1 if the stopping criterion is met then
2       Let r be a leaf node whose leaf label is the most common class label in \mathcal{D}. Output a tree whose root node is r.
3 else
4       Find the best test X_{j}<t according to the test selection measure.
5       Recursively compute two subtrees \mathcal{T}_{X_{j}<t}:=\operatorname{\sf Train}(\mathcal{D}_{X_{j}<t}) and \mathcal{T}_{X_{j}\geq t}:=\operatorname{\sf Train}(\mathcal{D}_{X_{j}\geq t}).
6       Let v be an internal node whose test is X_{j}<t. Output a tree such that its root node is v, v's true-child node is \mathcal{T}_{X_{j}<t}'s root node, and v's false-child node is \mathcal{T}_{X_{j}\geq t}'s root node.
Algorithm 1 A typical decision tree training algorithm in the clear.

A typical decision tree training algorithm is shown in Algorithm 1. It trains a tree recursively from the root node to the leaf nodes in a top-down fashion. At each node, it checks whether the stopping criterion is satisfied on the given training dataset \mathcal{D} to determine the node type. If the stopping criterion is satisfied, the current node is set to be a leaf node. Then, it sets the most frequent class label in the dataset as the leaf label of the current node, and outputs a tree whose root is this node. If the stopping criterion is not satisfied, the current node is set to be an internal node. In this case, we select a test of the form X_{j}<t that gives the best data splitting with respect to a predetermined criterion, and split the training dataset \mathcal{D} into \mathcal{D}_{X_{j}<t} and \mathcal{D}_{X_{j}\geq t} according to this test, where \mathcal{D}_{X_{j}<t}:=\{(x,y)\in\mathcal{D}\mid x(X_{j})<t\} and \mathcal{D}_{X_{j}\geq t}:=\{(x,y)\in\mathcal{D}\mid x(X_{j})\geq t\}. It then recursively trains decision trees \mathcal{T}_{X_{j}<t} and \mathcal{T}_{X_{j}\geq t} with \mathcal{D}_{X_{j}<t} and \mathcal{D}_{X_{j}\geq t} as the training data, respectively, sets the roots of these trees as the child nodes of the current node, and outputs a tree whose root is the current node.

We use a commonly used stopping criterion: (1) the height of the node is h, or (2) the dataset cannot be split further (i.e., (i) all class labels are the same, or (ii) all input tuples are the same), where h is an upper bound on the tree height, typically given as a hyperparameter.
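For concreteness, here is a minimal cleartext Python sketch of Algorithm 1 (function and variable names are ours). It uses the Gini index described in the next subsection as the test selection measure, and, for simplicity, candidate thresholds are taken directly from attribute values rather than midpoints.

```python
from collections import Counter

def gini(labels):
    # Gini index of a list of class labels (see Section 2.1.2).
    n = len(labels)
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def best_test(data):
    # Exhaustively pick the test X_j < t minimizing the weighted Gini index.
    n, m = len(data), len(data[0][0])
    best = None
    for j in range(m):
        for t in sorted({x[j] for x, _ in data}):
            left = [y for x, y in data if x[j] < t]
            right = [y for x, y in data if x[j] >= t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[0]:
                best = (score, j, t)
    return best  # (score, attribute index j, threshold t) or None

def train(data, h):
    # data: list of (input tuple, class label) pairs; h: height upper bound.
    labels = [y for _, y in data]
    split = best_test(data) if (h > 0 and len(set(labels)) > 1) else None
    if split is None:  # stopping criterion met: output a leaf node
        return ("leaf", Counter(labels).most_common(1)[0][0])
    _, j, t = split
    left = [(x, y) for x, y in data if x[j] < t]
    right = [(x, y) for x, y in data if x[j] >= t]
    return ("node", j, t, train(left, h - 1), train(right, h - 1))

tree = train([((1, 5), 0), ((2, 3), 0), ((3, 1), 1), ((4, 2), 1)], h=2)
print(tree)  # ('node', 0, 3, ('leaf', 0), ('leaf', 1))
```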

2.1.2 Test selection measure

The size and shape of the decision tree depends on which tests are selected at the internal nodes. In general, it is desirable to make the tree as small as possible, but the problem of constructing a decision tree that minimizes the sum of the lengths of the paths from the root to each leaf is known to be NP-hard [HR76]. Therefore, we usually define a measure for goodness of local splitting and select a test that maximizes this measure.

Commonly used measures for goodness of split include the information gain used in ID3 [Qui86] and the Gini index used in CART [BFOS84]. We use the Gini index, which is also used in previous studies such as [dHSCodA14, AEV21] due to its ease of computation in MPC.

Two types of Gini indices are defined: one for a dataset and one for a dataset and a test. The Gini index for a dataset \mathcal{D}, which we denote by \operatorname{\sf Gini}(\mathcal{D}), is defined as follows:

\operatorname{\sf Gini}(\mathcal{D}):=1-\sum_{c\in\{0,1\}}\frac{|\mathcal{D}_{Y=c}|^{2}}{|\mathcal{D}|^{2}},

where \mathcal{D}_{Y=c}:=\{(x,y)\in\mathcal{D}\mid y=c\} is the subset of \mathcal{D} whose class label is c. Intuitively, the smaller \operatorname{\sf Gini}(\mathcal{D}) is, the purer \mathcal{D} is in terms of class labels.

The Gini index for a dataset \mathcal{D} and a test X_{j}<t, which we denote by \operatorname{\sf G}_{X_{j}<t}(\mathcal{D}), is defined using \operatorname{\sf Gini} as follows:

\operatorname{\sf G}_{X_{j}<t}(\mathcal{D}):=\frac{|\mathcal{D}_{X_{j}<t}|}{|\mathcal{D}|}\operatorname{\sf Gini}(\mathcal{D}_{X_{j}<t})+\frac{|\mathcal{D}_{X_{j}\geq t}|}{|\mathcal{D}|}\operatorname{\sf Gini}(\mathcal{D}_{X_{j}\geq t}).

Intuitively, the smaller \operatorname{\sf G}_{X_{j}<t}(\mathcal{D}) is, the purer each split dataset becomes (and hence the better the test is). Therefore, to find the best test for splitting a dataset \mathcal{D}, we compute a test T that minimizes \operatorname{\sf G}_{T}(\mathcal{D}) [HKP11].

Abspoel et al. [AEV21] showed that minimization of \operatorname{\sf G}_{X_{j}<t}(\mathcal{D}) is equivalent to maximization of \operatorname{\sf G}'_{X_{j}<t}(\mathcal{D}) defined as

\operatorname{\sf G}'_{X_{j}<t}(\mathcal{D}):=\bigl(|\mathcal{D}_{X_{j}\geq t}|(|\mathcal{D}_{X_{j}<t\wedge Y=0}|^{2}+|\mathcal{D}_{X_{j}<t\wedge Y=1}|^{2})+|\mathcal{D}_{X_{j}<t}|(|\mathcal{D}_{X_{j}\geq t\wedge Y=0}|^{2}+|\mathcal{D}_{X_{j}\geq t\wedge Y=1}|^{2})\bigr)/\bigl(|\mathcal{D}_{X_{j}<t}||\mathcal{D}_{X_{j}\geq t}|\bigr),   (1)

where \mathcal{D}_{X_{j}<t\wedge Y=c}:=\{(x,y)\in\mathcal{D}\mid x(X_{j})<t\wedge y=c\} and \mathcal{D}_{X_{j}\geq t\wedge Y=c}:=\{(x,y)\in\mathcal{D}\mid x(X_{j})\geq t\wedge y=c\}. We refer to it as the modified Gini index and use it as the measure in our protocol.
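As a sanity check in the clear, the following Python sketch (with our own helper names) computes both \operatorname{\sf G}_{X_{j}<t}(\mathcal{D}) and the modified Gini index \operatorname{\sf G}'_{X_{j}<t}(\mathcal{D}) of Eq. (1) for a candidate split, so that one can verify on small examples that minimizing the former and maximizing the latter select the same test.

```python
def gini_split(left_labels, right_labels):
    # Weighted Gini index G_{X_j<t}(D) of a candidate split (smaller is better).
    def gini(labels):
        n = len(labels)
        return 1.0 - sum((labels.count(c) / n) ** 2 for c in (0, 1))
    n = len(left_labels) + len(right_labels)
    return (len(left_labels) / n) * gini(left_labels) \
         + (len(right_labels) / n) * gini(right_labels)

def modified_gini_split(left_labels, right_labels):
    # Modified Gini index G'_{X_j<t}(D) of Eq. (1) (larger is better); it
    # avoids dividing by the sizes of the two parts inside the Gini index.
    l0, l1 = left_labels.count(0), left_labels.count(1)
    r0, r1 = right_labels.count(0), right_labels.count(1)
    nl, nr = l0 + l1, r0 + r1
    return (nr * (l0 ** 2 + l1 ** 2) + nl * (r0 ** 2 + r1 ** 2)) / (nl * nr)

# Example: the split with the smaller Gini index has the larger modified one.
a = ([0, 0, 1], [1, 1, 0])
b = ([0, 0, 0], [1, 1, 1])
assert gini_split(*b) < gini_split(*a)
assert modified_gini_split(*b) > modified_gini_split(*a)
```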

2.2 Secure multi-party computation

We model secure multi-party computation (MPC) with an ideal functionality called the arithmetic black box (ABB). This ideal functionality allows a set of parties P_{1},\dots,P_{C} to store values, operate on the stored values, and retrieve the stored values. We build our protocol on top of an ABB. This allows our protocol to run on any MPC implementation that realizes the ABB, since the concrete ABB implementation is separated from our protocol's construction.

2.2.1 Arithmetic black box

  • [\![z]\!]\leftarrow\operatorname{\sf Enc}(x,P_{i}): Receive x from a party P_{i} and store it as [\![z]\!].

  • z\leftarrow\operatorname{\sf Dec}([\![x]\!]): Send x to every party, who store it in the local variable z.

  • [\![z]\!]\leftarrow\operatorname{\sf Add}([\![x]\!],[\![y]\!]): Compute z:=x+y and store it as [\![z]\!].

  • [\![z]\!]\leftarrow\operatorname{\sf Mul}([\![x]\!],[\![y]\!]): Compute z:=xy and store it as [\![z]\!].

  • [\![z]\!]\leftarrow\operatorname{\sf EQ}([\![x]\!],[\![y]\!]): If x=y then set z:=1, otherwise set z:=0. Store it as [\![z]\!].

  • [\![z]\!]\leftarrow\operatorname{\sf LT}([\![x]\!],[\![y]\!]): If x<y then set z:=1, otherwise set z:=0. Store it as [\![z]\!].

We assume that the commands \operatorname{\sf Add}, \operatorname{\sf Mul}, \operatorname{\sf EQ}, and \operatorname{\sf LT} are also defined in the same way when one of the inputs is a public value.

Figure 1: The arithmetic black box functionality ABB\mathcal{F}_{\rm ABB}.

We assume a simple ABB named \mathcal{F}_{\rm ABB} over a ring \mathbb{Z}_{M} for some integer M, as shown in Fig. 1. We denote a value referred to by a name x stored in \mathcal{F}_{\rm ABB} as [\![x]\!]. In a typical case, where \mathcal{F}_{\rm ABB} is realized by a secret sharing based MPC, [\![x]\!] means that x is secret shared. We say a value is private if it is stored in \mathcal{F}_{\rm ABB}.

We identify residue classes in \mathbb{Z}_{M} with their representatives in [0,M). We assume M is sufficiently large that vector indices can be stored in \mathcal{F}_{\rm ABB}. We also assume that the number of parties C is constant.

For notational simplicity, [\![z]\!]\leftarrow\operatorname{\sf Add}([\![x]\!],[\![y]\!]), [\![z]\!]\leftarrow\operatorname{\sf Mul}([\![x]\!],[\![y]\!]), [\![z]\!]\leftarrow\operatorname{\sf EQ}([\![x]\!],[\![y]\!]), and [\![z]\!]\leftarrow\operatorname{\sf LT}([\![x]\!],[\![y]\!]) are also written as [\![z]\!]\leftarrow[\![x]\!]+[\![y]\!], [\![z]\!]\leftarrow[\![x]\!]\times[\![y]\!], [\![z]\!]\leftarrow([\![x]\!]\stackrel{?}{=}[\![y]\!]), and [\![z]\!]\leftarrow([\![x]\!]\stackrel{?}{<}[\![y]\!]), respectively. Furthermore, we denote [\![x_{1}]\!]+[\![x_{2}]\!]+\dots+[\![x_{n}]\!] by \sum_{i=1}^{n}[\![x_{i}]\!].

2.2.2 Cost of MPC protocols

We define the cost of an MPC protocol as the number of invocations of ABB operations other than linear combinations of private values. That is, we assume that the parties can compute \operatorname{\sf Add}([\![x]\!],[\![y]\!]), \operatorname{\sf Add}(c,[\![y]\!]), \operatorname{\sf Add}([\![x]\!],c), \operatorname{\sf Mul}(c,[\![y]\!]), and \operatorname{\sf Mul}([\![x]\!],c) for free, where [\![x]\!] and [\![y]\!] are private values and c is a public value. This cost models the communication complexity of a typical MPC based on a linear secret sharing scheme, in which the parties can locally compute linear combinations of secret-shared values. We refer to ABB operations other than linear combinations of private values as non-free operations.

2.2.3 Known protocols

We show known protocols that we will use as building blocks for our protocols. For completeness, the protocols shown here are limited to those that can be built on top of \mathcal{F}_{\rm ABB}. Some MPC implementations may provide the same functionality more efficiently, in which case we can use those implementations instead of the protocols listed here to run our protocol more efficiently.

We start by defining some simple protocols.

  • [\![z]\!]\leftarrow[\![x]\!]\mathbin{\textsf{OR}}[\![y]\!] computes the logical disjunction of bits x and y as [\![x]\!]+[\![y]\!]-[\![x]\!]\times[\![y]\!], using O(1) non-free operations in O(1) rounds.

  • [\![z]\!]\leftarrow\lnot[\![x]\!] computes the negation of a bit x as 1-[\![x]\!], using no non-free operations.

  • [\![z]\!]\leftarrow\operatorname{\sf IfElse}([\![c]\!],[\![t]\!],[\![f]\!]) receives a bit c and two values t and f, and computes t if c=1 and f otherwise, as [\![f]\!]+[\![c]\!]\times([\![t]\!]-[\![f]\!]), using O(1) non-free operations in O(1) rounds.

  • [\![z]\!]\leftarrow\operatorname{\sf Max}([\![x]\!],[\![y]\!]) computes the maximum of x and y as \operatorname{\sf IfElse}(([\![x]\!]\stackrel{?}{<}[\![y]\!]),[\![y]\!],[\![x]\!]), using O(1) non-free operations in O(1) rounds.

We require an extended \operatorname{\sf Max} protocol, which we call \operatorname{\sf VectMax}. We let [\![z]\!]\leftarrow\operatorname{\sf VectMax}([\![\vec{x}]\!],[\![\vec{y}]\!]) denote the operation that computes a private value [\![z]\!] such that [\![\vec{x}]\!] and [\![\vec{y}]\!] are private vectors of the same length n, i=\mathop{\rm arg~max}_{j\in[1,n]}\vec{x}[j], and z=\vec{y}[i]. We use the construction by Abspoel et al. [AEV21], which uses O(n) non-free operations in O(\log n) rounds.

We require three permutation-related protocols \operatorname{\sf Shuffle}, \operatorname{\sf Apply}, and \operatorname{\sf Unapply}. Let S_{n} be the symmetric group on [1,n]. That is, S_{n} is the set of all bijective functions from [1,n] to [1,n]. A permutation is an element of S_{n}. Applying a permutation \pi\in S_{n} to a vector \vec{x} of length n is the operation of rearranging \vec{x} into a vector \vec{z} satisfying \vec{z}[\pi(i)]=\vec{x}[i] for i\in[1,n]. We denote this operation as \pi(\vec{x}). Now we define the three protocols. Let [\![\pi]\!]\leftarrow\operatorname{\sf Shuffle}(n), where n is an integer, denote the operation that computes [\![\pi]\!] such that \pi is a uniformly randomly chosen element of S_{n}. Let [\![\vec{z}]\!]\leftarrow\operatorname{\sf Apply}([\![\pi]\!],[\![\vec{x}]\!]), where \pi\in S_{n} is a permutation and \vec{x} is a vector of length n, denote the operation that computes [\![\vec{z}]\!] such that \vec{z}=\pi(\vec{x}). Let [\![\vec{z}]\!]\leftarrow\operatorname{\sf Unapply}([\![\pi]\!],[\![\vec{x}]\!]), where \pi\in S_{n} is a permutation and \vec{x} is a vector of length n, denote the operation that computes [\![\vec{z}]\!] such that \vec{z}=\pi^{-1}(\vec{x}). We use the construction by Falk and Ostrovsky [FO21]. In their construction, a private permutation is represented as a set of private control bits for a Waksman permutation network [Wak68]. All these protocols use O(n\log n) non-free operations in O(\log n) rounds. Note that we do not use the \operatorname{\sf Shuffle} protocol directly in our protocols, but it is required as a component of the construction of the \operatorname{\sf SortPerm} protocol shown below.
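In the clear, these permutation semantics amount to the following short Python sketch (illustrative only, 0-indexed; the MPC protocols operate on private data):

```python
# Cleartext semantics of Apply and Unapply: pi is a permutation of 0..n-1,
# given as a list where pi[i] is the destination position of element i.
def apply_perm(pi, x):
    # z[pi[i]] = x[i]
    z = [None] * len(x)
    for i, p in enumerate(pi):
        z[p] = x[i]
    return z

def unapply_perm(pi, x):
    # Applies the inverse permutation: returns z with z[i] = x[pi[i]].
    return [x[p] for p in pi]

x = ['a', 'b', 'c', 'd']
pi = [2, 0, 3, 1]
assert unapply_perm(pi, apply_perm(pi, x)) == x
```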

We also require a protocol \operatorname{\sf SortPerm} to compute a permutation that stably sorts given keys. We let

[\![\pi]\!]\leftarrow\operatorname{\sf SortPerm}([\![\vec{x}_{1}]\!],[\![\vec{x}_{2}]\!],\dots,[\![\vec{x}_{k}]\!])

denote the operation that computes [\![\pi]\!] such that \vec{x}_{1},\vec{x}_{2},\dots,\vec{x}_{k} are vectors of length n and applying \pi\in S_{n} to ((\vec{x}_{1}[1],\dots,\vec{x}_{k}[1]),\dots,(\vec{x}_{1}[n],\dots,\vec{x}_{k}[n])) sorts the tuples lexicographically and stably. We use the construction by Laud and Willemson [LW14]. The protocol uses O(n\log n) non-free operations in O(\log n) rounds. Note that we can construct the composition of private and public permutations that is needed for the \operatorname{\sf SortPerm} construction, since our private permutation is a set of control bits for a Waksman permutation network.

To simplify the description, we introduce a small subprotocol \operatorname{\sf Sort} for sorting private vectors. We let

[\![\vec{z}_{1}]\!],\dots,[\![\vec{z}_{m}]\!]\leftarrow\operatorname{\sf Sort}([\![\vec{x}_{1}]\!],\dots,[\![\vec{x}_{k}]\!];[\![\vec{y}_{1}]\!],\dots,[\![\vec{y}_{m}]\!])

denote the following procedure:

  1. [\![\pi]\!]\leftarrow\operatorname{\sf SortPerm}([\![\vec{x}_{1}]\!],\dots,[\![\vec{x}_{k}]\!]);

  2. [\![\vec{z}_{j}]\!]\leftarrow\operatorname{\sf Apply}([\![\pi]\!],[\![\vec{y}_{j}]\!]) for j\in[1,m].

We sometimes use similar notation when the same operation is applied to multiple inputs. For example,

[\![z_{1}]\!],\dots,[\![z_{m}]\!]\leftarrow\operatorname{\sf IfElse}([\![c]\!];[\![t_{1}]\!],\dots,[\![t_{m}]\!];[\![f_{1}]\!],\dots,[\![f_{m}]\!])

means parallel execution of [\![z_{j}]\!]\leftarrow\operatorname{\sf IfElse}([\![c]\!],[\![t_{j}]\!],[\![f_{j}]\!]) for j\in[1,m], and

[\![z_{1}]\!],\dots,[\![z_{m}]\!]\leftarrow\operatorname{\sf VectMax}([\![\vec{x}]\!];[\![\vec{y}_{1}]\!],\dots,[\![\vec{y}_{m}]\!])

means parallel execution of [\![z_{j}]\!]\leftarrow\operatorname{\sf VectMax}([\![\vec{x}]\!],[\![\vec{y}_{j}]\!]) for j\in[1,m].

If vectors are given to a protocol defined for scalar values, it means that the protocol is applied on an element-by-element basis. That is, [\![\vec{z}]\!]\leftarrow[\![\vec{x}]\!]\times[\![\vec{y}]\!] means parallel execution of [\![\vec{z}[i]]\!]\leftarrow[\![\vec{x}[i]]\!]\times[\![\vec{y}[i]]\!] for i\in[1,n], and [\![\vec{z}]\!]\leftarrow\operatorname{\sf IfElse}([\![\vec{c}]\!],[\![\vec{t}]\!],[\![\vec{f}]\!]) means parallel execution of [\![\vec{z}[i]]\!]\leftarrow\operatorname{\sf IfElse}([\![\vec{c}[i]]\!],[\![\vec{t}[i]]\!],[\![\vec{f}[i]]\!]) for i\in[1,n].

If some of the inputs are scalars, the same scalar values are used for all executions. For example, [\![\vec{z}]\!]\leftarrow 2\times[\![\vec{y}]\!] means parallel execution of [\![\vec{z}[i]]\!]\leftarrow 2\times[\![\vec{y}[i]]\!] for i\in[1,n], and [\![\vec{z}]\!]\leftarrow\operatorname{\sf IfElse}([\![c]\!],[\![\vec{t}]\!],1) means parallel execution of [\![\vec{z}[i]]\!]\leftarrow\operatorname{\sf IfElse}([\![c]\!],[\![\vec{t}[i]]\!],1) for i\in[1,n].

3 Our secure group-wise aggregation protocols

In this section, we propose group-wise aggregation protocols that compute aggregate values (sum, prefix sum, and maximum) for each group without revealing the grouping information of the input grouped values. These are executed on grouped values stored in our data structure. These protocols and the data structure play a central role in the construction of our decision tree training protocol proposed in Section 4.

3.1 Our data structure for privately grouped values

We propose a data structure that stores grouped values without revealing any information about the grouping. We store n values, divided into several groups, in a private vector [\![\vec{x}]\!] of length n, called the internally grouped vector. Here, elements of the same group are stored as consecutive elements of the vector. That is, for any i, j, and k such that 1\leq i<j<k\leq n, if \vec{x}[i] and \vec{x}[k] are in the same group, then \vec{x}[j] is also in that group. Along with such a vector, we maintain a private bit vector [\![\vec{g}]\!] of length n, called the group flag vector, which indicates the boundaries between groups. Namely, we set \vec{g}[i]=1 if the i-th element of \vec{x} is the first element of a group, and \vec{g}[i]=0 otherwise. By definition, \vec{g}[1]=1 always holds.

We show an example. Suppose that six values are stored in an internally grouped vector \vec{x} as \vec{x}=(3,1,2,2,3,2) and the corresponding group flag vector is \vec{g}=(1,0,1,1,0,0). Then, the six values are divided into three groups, (3,1), (2), and (2,3,2).

For the sake of simplicity, we introduce some notation. Let \operatorname{\sf Head}(\vec{g},i) (\operatorname{\sf Tail}(\vec{g},i)) be the index of the first (last, respectively) element of the group that contains the i-th element within the grouping represented by a group flag vector \vec{g}. Formally, they are defined as \operatorname{\sf Head}(\vec{g},i):=\max\{j\in[1,i]\mid\vec{g}[j]=1\} and \operatorname{\sf Tail}(\vec{g},i):=\min(\{j\in(i,n]\mid\vec{g}[j]=1\}\cup\{n+1\})-1, respectively, where n is the length of \vec{g}. For example, if a group flag vector is defined as \vec{g}=(1,0,1,1,0,0), then \operatorname{\sf Head}(\vec{g},2)=1, \operatorname{\sf Head}(\vec{g},4)=4, \operatorname{\sf Tail}(\vec{g},4)=6, and \operatorname{\sf Tail}(\vec{g},3)=3.
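In the clear, \operatorname{\sf Head} and \operatorname{\sf Tail} can be read directly off the group flag vector, as in this short Python sketch (our helper names, 1-indexed to match the paper):

```python
def head(g, i):
    # Index of the first element of the group containing position i.
    return max(j for j in range(1, i + 1) if g[j - 1] == 1)

def tail(g, i):
    # Index of the last element of the group containing position i.
    n = len(g)
    later_starts = [j for j in range(i + 1, n + 1) if g[j - 1] == 1]
    return min(later_starts + [n + 1]) - 1

g = [1, 0, 1, 1, 0, 0]
assert head(g, 2) == 1 and head(g, 4) == 4
assert tail(g, 4) == 6 and tail(g, 3) == 3
```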

3.2 Our protocol for group-wise sum

Table 2: Example of input/output for our group-wise aggregation protocols, where \vec{g} is a group flag vector and \vec{x} is an internally grouped vector.
Input Output
\vec{g} \vec{x} Sum Prefix sum Max
1 3 4 3 3
0 1 4 4 3
1 2 2 2 2
1 2 7 2 3
0 3 7 5 3
0 2 7 7 3
Notation: [\![\vec{y}]\!]\leftarrow\operatorname{\sf GroupSum}([\![\vec{g}]\!],[\![\vec{x}]\!]).
Input: A private group flag vector [\![\vec{g}]\!] of length n and a private internally grouped vector [\![\vec{x}]\!] of length n.
Output: A private vector [\![\vec{y}]\!] of length n.
Cost: O(n\log n) non-free operations in O(\log n) rounds.
1 [\![\vec{p}]\!]\leftarrow\operatorname{\sf PrefixSumR}([\![\vec{x}]\!])\times[\![\vec{g}]\!].
2 [\![\pi]\!]\leftarrow\operatorname{\sf SortPerm}(\lnot[\![\vec{g}]\!]).
3 [\![\vec{p}_{1}]\!]\leftarrow\operatorname{\sf Apply}([\![\pi]\!],[\![\vec{p}]\!]).
4 [\![\vec{s}_{1}]\!]\leftarrow\operatorname{\sf PrefixSumR}^{-1}([\![\vec{p}_{1}]\!]).
5 [\![\vec{d}_{1}]\!]\leftarrow\operatorname{\sf PrefixSum}^{-1}([\![\vec{s}_{1}]\!]).
6 [\![\vec{d}]\!]\leftarrow\operatorname{\sf Unapply}([\![\pi]\!],[\![\vec{d}_{1}]\!])\times[\![\vec{g}]\!].
7 [\![\vec{y}]\!]\leftarrow\operatorname{\sf PrefixSum}([\![\vec{d}]\!]).
Algorithm 2 Group-wise sum.

The group-wise sum protocol privately computes the sum of each group in our data structure. It receives a private group flag vector [\![\vec{g}]\!] of length n and a private internally grouped vector [\![\vec{x}]\!] of length n, and outputs a private vector [\![\vec{y}]\!] of length n, where \vec{y}[i]=\sum_{j=\operatorname{\sf Head}(\vec{g},i)}^{\operatorname{\sf Tail}(\vec{g},i)}\vec{x}[j] for i\in[1,n]. Note that the same value is computed for all elements of the same group. An example is shown in Table 2. Columns 1 and 2 are the inputs, and column 3 is the output.

Before presenting our protocol, let us define some operations related to the computation of prefix sums. Given a vector \vec{x} of length n, \vec{z}\leftarrow\operatorname{\sf PrefixSum}(\vec{x}) computes a vector \vec{z} of length n such that \vec{z}[i]=\sum_{j=1}^{i}\vec{x}[j] for i\in[1,n]. We also define an inverse operation \operatorname{\sf PrefixSum}^{-1}. Let \vec{z}\leftarrow\operatorname{\sf PrefixSum}^{-1}(\vec{x}) denote an operation that computes \vec{z} such that \vec{x}=\operatorname{\sf PrefixSum}(\vec{z}). This can easily be computed as \vec{z}[1]:=\vec{x}[1] and \vec{z}[i]:=\vec{x}[i]-\vec{x}[i-1] for all i\in[2,n]. We further define reverse-ordered versions of these operations. Given a vector \vec{x} of length n, \vec{z}\leftarrow\operatorname{\sf PrefixSumR}(\vec{x}) computes a vector \vec{z} of length n such that \vec{z}[i]=\sum_{j=i}^{n}\vec{x}[j] for i\in[1,n]. Given a vector \vec{x} of length n, \vec{z}\leftarrow\operatorname{\sf PrefixSumR}^{-1}(\vec{x}) computes a vector \vec{z} of length n such that \vec{x}=\operatorname{\sf PrefixSumR}(\vec{z}). This is computed as \vec{z}[n]:=\vec{x}[n] and \vec{z}[i]:=\vec{x}[i]-\vec{x}[i+1] for all i\in[1,n).

The protocol for group-wise sum is shown in Algorithm 2. Let r be the number of groups in the input. In the first half of Algorithm 2 (the steps computing [\![\vec{p}]\!], [\![\pi]\!], [\![\vec{p}_{1}]\!], and [\![\vec{s}_{1}]\!]), we compute [\![\vec{s}_{1}]\!] so that for each j\in[1,r], \vec{s}_{1}[j] is the sum of the j-th group of \vec{x}. This follows from the fact that the collection of the first elements of each group in \vec{p} equals the reverse-ordered prefix sum of the per-group sums of \vec{x}. Next, in the remaining steps, we copy each \vec{s}_{1}[j], the sum of the j-th group of \vec{x}, to every element of the j-th group. To do this, we apply the technique used by Laud in his parallel reading protocol [Lau15] as follows. We use the fact that when a prefix sum is computed, an element with a value of zero receives the result of the preceding element. Specifically, the \operatorname{\sf Unapply} step restores the values to their original order so that the first element of each group carries that group's value and the other elements are zero, and the final step computes the prefix sum of the entire vector. However, this would copy the prefix sum of the group sums instead of the group sums themselves. Therefore, the inverse prefix-sum operation is applied beforehand, in the step computing [\![\vec{d}_{1}]\!].

Note that this protocol is also useful for copying a particular element of a group to all elements of the group: we set all elements other than the source elements to zero and then apply this protocol. This technique will be used in the following two protocols, Algorithms 3 and 4.

The protocol uses O(n\log n) non-free operations in O(\log n) rounds.
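The following Python sketch mirrors the steps of Algorithm 2 in the clear (no secret sharing), which may help in following the correctness argument above; all helper names are ours.

```python
# Cleartext walk-through of Algorithm 2 (GroupSum), 0-indexed.
def prefix_sum(x):
    out, s = [], 0
    for v in x:
        s += v
        out.append(s)
    return out

def prefix_sum_r(x):                      # reverse-ordered prefix sum
    return prefix_sum(x[::-1])[::-1]

def inv_prefix_sum(x):                    # PrefixSum^{-1}
    return [x[0]] + [x[i] - x[i - 1] for i in range(1, len(x))]

def inv_prefix_sum_r(x):                  # PrefixSumR^{-1}
    return [x[i] - x[i + 1] for i in range(len(x) - 1)] + [x[-1]]

def group_sum(g, x):
    n = len(x)
    p = [a * b for a, b in zip(prefix_sum_r(x), g)]
    order = sorted(range(n), key=lambda i: 1 - g[i])   # stable sort by not-g
    p1 = [p[i] for i in order]                         # Apply
    s1 = inv_prefix_sum_r(p1)                          # per-group sums
    d1 = inv_prefix_sum(s1)
    d = [0] * n
    for dst, src in enumerate(order):                  # Unapply
        d[src] = d1[dst]
    d = [a * b for a, b in zip(d, g)]
    return prefix_sum(d)

print(group_sum([1, 0, 1, 1, 0, 0], [3, 1, 2, 2, 3, 2]))  # [4, 4, 2, 7, 7, 7]
```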

3.3 Our protocol for group-wise prefix sum

Notation: [\![\vec{y}]\!]\leftarrow\operatorname{\sf GroupPrefixSum}([\![\vec{g}]\!],[\![\vec{x}]\!]).
Input: A private group flag vector [\![\vec{g}]\!] of length n and a private internally grouped vector [\![\vec{x}]\!] of length n.
Output: A private vector [\![\vec{y}]\!] of length n.
Cost: O(n\log n) non-free operations in O(\log n) rounds.
1 [\![\vec{s}]\!]\leftarrow\operatorname{\sf PrefixSum}([\![\vec{x}]\!]).
2 [\![\vec{q}[1]]\!]\leftarrow 0 and [\![\vec{q}[i]]\!]\leftarrow[\![\vec{s}[i-1]]\!]\times[\![\vec{g}[i]]\!] for i\in[2,n].
3 [\![\vec{y}]\!]\leftarrow[\![\vec{s}]\!]-\operatorname{\sf GroupSum}([\![\vec{g}]\!],[\![\vec{q}]\!]).
Algorithm 3 Group-wise prefix sum.

The group-wise prefix sum protocol privately computes the prefix sums of each group in our data structure. It receives a private group flag vector [\![\vec{g}]\!] of length n and a private internally grouped vector [\![\vec{x}]\!] of values to be summed, and outputs a private vector [\![\vec{y}]\!] of prefix sums for each group such that \vec{y}[i]=\sum_{j=\operatorname{\sf Head}(\vec{g},i)}^{i}\vec{x}[j]. An example of input/output is shown in Table 2. Columns 1 and 2 are the inputs, and column 4 is the output.

The protocol is shown in Algorithm 3. We first compute [\![\vec{s}]\!], the prefix sum of [\![\vec{x}]\!] over the whole vector (line 1). This looks almost done, but each value \vec{s}[i] exceeds the desired value by the partial sum from the first element of \vec{x} to the last element of the preceding group. Therefore, we subtract these partial sums from [\![\vec{s}]\!] to obtain the desired output. The predecessor of the first element of the j-th group in \vec{s} is equal to the partial sum from the first element of \vec{x} to the last element of the (j-1)-th group. Using this property, we construct a vector [\![\vec{q}]\!] that contains these values at the first positions of the groups (line 2). We then copy the first element of each group in [\![\vec{q}]\!] to the other elements of the group by applying the \operatorname{\sf GroupSum} protocol to [\![\vec{q}]\!]. Finally, we subtract the result from [\![\vec{s}]\!] to obtain the prefix sum within each group (line 3).

The protocol uses O(n\log n) non-free operations in O(\log n) rounds.
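A cleartext sketch of this computation (using a naive group-sum routine as a stand-in for the \operatorname{\sf GroupSum} protocol; the names are ours):

```python
def naive_group_sum(g, v):
    # Cleartext specification of GroupSum: every position gets its group's sum.
    out, i, n = [], 0, len(v)
    while i < n:
        j = i + 1
        while j < n and g[j] == 0:
            j += 1
        out.extend([sum(v[i:j])] * (j - i))
        i = j
    return out

def group_prefix_sum(g, x):
    # Mirrors Algorithm 3 in the clear (0-indexed).
    s, total = [], 0
    for v in x:
        total += v
        s.append(total)                                      # line 1
    q = [0] + [s[i - 1] * g[i] for i in range(1, len(x))]    # line 2
    offsets = naive_group_sum(g, q)                          # line 3 ...
    return [a - b for a, b in zip(s, offsets)]               # ... subtraction

print(group_prefix_sum([1, 0, 1, 1, 0, 0], [3, 1, 2, 2, 3, 2]))
# [3, 4, 2, 2, 5, 7]
```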

3.4 Our protocol for group-wise max

Notation: [\![\vec{y}]\!]\leftarrow\operatorname{\sf GroupMax}([\![\vec{g}]\!],[\![\vec{x}]\!]).
Input: A private group flag vector [\![\vec{g}]\!] of length n and a private internally grouped vector [\![\vec{x}]\!] of length n.
Output: A private vector [\![\vec{y}]\!] of length n.
Cost: O(n\log n) non-free operations in O(\log n) rounds.
1 m:=\left\lceil\log n\right\rceil.
2 [\![\vec{g}^{(0)}]\!]\leftarrow[\![\vec{g}]\!] and [\![\vec{x}^{(0)}]\!]\leftarrow[\![\vec{x}]\!].
3 for d:=0 to m-1 do
4       w:=2^{d}.
5       [\![\vec{g}^{(d+1)}]\!]\leftarrow[\![\vec{g}^{(d)}]\!], [\![\vec{x}^{(d+1)}]\!]\leftarrow[\![\vec{x}^{(d)}]\!].
6       for each j\in[w+1,n] do in parallel
7             [\![\vec{g}^{(d+1)}[j]]\!]\leftarrow[\![\vec{g}^{(d)}[j-w]]\!]\mathbin{\textsf{OR}}[\![\vec{g}^{(d)}[j]]\!].
8             [\![\vec{a}]\!]\leftarrow\operatorname{\sf Max}([\![\vec{x}^{(d)}[j-w]]\!],[\![\vec{x}^{(d)}[j]]\!]).
9             [\![\vec{x}^{(d+1)}[j]]\!]\leftarrow\operatorname{\sf IfElse}([\![\vec{g}^{(d)}[j]]\!],[\![\vec{x}^{(d)}[j]]\!],[\![\vec{a}]\!]).
10 [\![\vec{t}[i]]\!]\leftarrow[\![\vec{g}[i+1]]\!] for i\in[1,n) and [\![\vec{t}[n]]\!]\leftarrow 1.
11 [\![\vec{y}]\!]\leftarrow\operatorname{\sf GroupSum}([\![\vec{g}]\!],[\![\vec{t}]\!]\times[\![\vec{x}^{(m)}]\!]).
Algorithm 4 Group-wise max.

The group-wise max protocol privately computes the maximum value of each group in our data structure. It receives a private group flag vector [\![\vec{g}]\!] of length n and a private internally grouped vector [\![\vec{x}]\!] of length n, and outputs a private vector [\![\vec{y}]\!] of the maximum values for each group such that \vec{y}[i]=\max_{j\in[\operatorname{\sf Head}(\vec{g},i),\operatorname{\sf Tail}(\vec{g},i)]}\vec{x}[j]. An example is shown in Table 2. Columns 1 and 2 are the inputs, and column 5 is the output.

The protocol is shown in Algorithm 4. First, in the loop of Algorithm 4, we compute [\![\vec{x}^{(m)}]\!], where the j-th element of \vec{x}^{(m)} represents the maximum value up to \vec{x}[j] within its group, that is, \vec{x}^{(m)}[j]=\max_{i\in[\operatorname{\sf Head}(\vec{g},j),j]}\vec{x}[i]. The underlying idea is as follows. Suppose we have a vector \vec{x}_{w} that satisfies \vec{x}_{w}[i]=\max_{j\in(i-w,i]}\vec{x}[j] for each i. Then, we can obtain \vec{x}_{2w} by computing \vec{x}_{2w}[i]:=\max(\vec{x}_{w}[i],\vec{x}_{w}[i-w]) for each i. Since \vec{x}_{1}=\vec{x}, we can compute \max_{j\in[1,i]}\vec{x}[j] for each i by repeating this for \Theta(\log n) iterations. In the group-wise max protocol, however, we want to compute \max_{j\in[\operatorname{\sf Head}(\vec{g},i),i]}\vec{x}[j] instead of \max_{j\in[1,i]}\vec{x}[j]. Therefore, in addition to the maximum value in the range (i-w,i], we keep a flag indicating whether the first element of the group lies in this range. We then obtain the desired value by not updating the maximum when the corresponding flag is 1. Finally, in the last two steps of Algorithm 4, we copy the last element of each group in [\![\vec{x}^{(m)}]\!] to the other elements of the group, using the \operatorname{\sf GroupSum}-based copying technique noted after Algorithm 2. Here, \vec{t}[i]=1 if and only if the i-th element is the last element of a group.

The protocol uses O(n\log n) non-free operations in O(\log n) rounds, since each iteration of the loop requires O(n) non-free operations in O(1) rounds.
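A cleartext sketch of the doubling idea behind Algorithm 4 (our names; the final copy step is replaced by a direct broadcast for brevity):

```python
def group_max(g, x):
    # Cleartext mirror of Algorithm 4's doubling loop, 0-indexed.
    n = len(x)
    gd, xd = list(g), list(x)
    w = 1
    while w < n:
        gn, xn = list(gd), list(xd)
        for j in range(w, n):
            gn[j] = gd[j - w] | gd[j]
            # Do not look across a group boundary: if a group starts within
            # the window ending at j, keep the running maximum as is.
            xn[j] = xd[j] if gd[j] == 1 else max(xd[j - w], xd[j])
        gd, xd = gn, xn
        w *= 2
    # xd[i] now holds max(x[Head(g,i)..i]); broadcast each group's last value.
    out = [0] * n
    for i in reversed(range(n)):
        out[i] = xd[i] if i == n - 1 or g[i + 1] == 1 else out[i + 1]
    return out

print(group_max([1, 0, 1, 1, 0, 0], [3, 1, 2, 2, 3, 2]))  # [3, 3, 2, 3, 3, 3]
```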

4 Our efficient decision tree training protocol

In this section, we present our decision tree training protocol. Given a private training dataset, it outputs the trained decision tree in a private form. Since the output decision tree is normalized for efficiency, we first describe the normalization in Section 4.1. Then, in Section 4.2, we explain how our protocol trains the tree in a layer-by-layer manner. This part contains the main idea that reduces the 2^{h} factor in the computational cost to h. The details of the batch test selection are deferred to Section 4.3. Note that the group-wise operations described in Section 3 are used throughout the protocols presented in this section.

4.1 Decision tree normalization for efficiency

Our training protocol outputs an equivalent normalized decision tree instead of the one that should have been obtained when training in the clear. Equivalent in this case means that the output for any given input is the same. Roughly speaking, our normalization aligns the heights of all leaf nodes to the upper bound on the tree height by inserting internal nodes that forward any sample to the false child node. Although this increases the number of nodes in the tree, in MPC, it reduces the data size and computational cost (though by just a constant factor), and simplifies the protocol.

Before describing the details of our normalization, let us recall the decision tree we originally wanted to compute, which is the output of Algorithm 1. It is a binary tree with height at most h, and all tests at its internal nodes are of the form X_{j}<t, where j is an attribute number and t is a threshold.

Our normalization consists of two modifications. The first modification is to change each test X_{j}<t to an equivalent test 2X_{j}<t^{\prime} with t^{\prime}=2t. In the original algorithm, we compute t=(\vec{x}_{j}[i]+\vec{x}_{j}[i+1])/2 when computing a threshold, but this involves division, which is a costly operation in MPC. We avoid the division by computing t^{\prime}=2t=\vec{x}_{j}[i]+\vec{x}_{j}[i+1] instead of t.

Our second modification is to align the heights of all leaf nodes without changing the tree's output. The modification is simple. We insert an internal node u, which does not actually split, into the position of a leaf node v with height less than h. The test for u is 2X_{1}<{\sf MIN\_VALUE}, which always returns false, and u has a false branch to v, where {\sf MIN\_VALUE} is a sufficiently small public value. Any input tuple that reaches u passes through the false branch to reach v, so the predicted label does not change. In the normalized tree, all nodes with height less than h are internal nodes, and all nodes with height h are leaf nodes. This makes our protocol simple and efficient.

4.2 Our layer-by-layer training protocol

Notation: ([\![{\sf Layer}^{(k)}]\!])_{k\in[0,h]}\leftarrow\operatorname{\sf DecisionTreeTraining}(([\![\vec{x}_{j}^{(0)}]\!])_{j\in[1,m]},[\![\vec{y}^{(0)}]\!],h)
Input: m private vectors ([\![\vec{x}_{j}^{(0)}]\!])_{j\in[1,m]} of length n, a private vector [\![\vec{y}^{(0)}]\!] of length n, and an integer h.
Output: A private decision tree ([\![{\sf Layer}^{(k)}]\!])_{k\in[0,h]} of height h.
Cost: O(hmn\log n) non-free operations in O(h(\log n+\log m)) rounds.
1 [\![\vec{g}^{(0)}[1]]\!]\leftarrow 1 and [\![\vec{g}^{(0)}[i]]\!]\leftarrow 0 for i\in[2,n].
2 [\![{\sf NID}^{(0)}[i]]\!]\leftarrow 1 for i\in[1,n].
3 for k:=0 to h-1 do
4       [\![{\sf Layer}^{(k)}]\!],[\![\vec{b}^{(k)}]\!]\leftarrow\operatorname{\sf TrainInternalNodes}(k,([\![\vec{x}_{j}^{(k)}]\!])_{j},[\![\vec{y}^{(k)}]\!],[\![\vec{g}^{(k)}]\!],[\![{\sf NID}^{(k)}]\!]).
5       [\![{\sf NID}]\!]\leftarrow 2^{k}\times[\![\vec{b}^{(k)}]\!]+[\![{\sf NID}^{(k)}]\!].
6       [\![\vec{g}]\!]\leftarrow\operatorname{\sf GroupFirstOne}([\![\vec{g}^{(k)}]\!],\lnot[\![\vec{b}^{(k)}]\!])+\operatorname{\sf GroupFirstOne}([\![\vec{g}^{(k)}]\!],[\![\vec{b}^{(k)}]\!]).
7       [\![\vec{x}_{1}^{(k+1)}]\!],\dots,[\![\vec{x}_{m}^{(k+1)}]\!],[\![\vec{y}^{(k+1)}]\!],[\![\vec{g}^{(k+1)}]\!],[\![{\sf NID}^{(k+1)}]\!]\leftarrow\operatorname{\sf Sort}([\![\vec{b}^{(k)}]\!];[\![\vec{x}_{1}^{(k)}]\!],\dots,[\![\vec{x}_{m}^{(k)}]\!],[\![\vec{y}^{(k)}]\!],[\![\vec{g}]\!],[\![{\sf NID}]\!]).
8 [\![{\sf Layer}^{(h)}]\!]\leftarrow\operatorname{\sf TrainLeafNodes}(h,[\![\vec{g}^{(h)}]\!],[\![\vec{y}^{(h)}]\!],[\![{\sf NID}^{(h)}]\!]).
Algorithm 5 Decision tree training.
Notation: [\![\vec{f}]\!]\leftarrow\operatorname{\sf GroupFirstOne}([\![\vec{g}]\!],[\![\vec{b}]\!]).
Input: A private group flag vector [\![\vec{g}]\!] of length n and a private bit vector [\![\vec{b}]\!] of length n.
Output: A private bit vector [\![\vec{f}]\!] of length n.
Cost: O(n\log n) non-free operations in O(\log n) rounds.
1 [\![\vec{s}]\!]\leftarrow\operatorname{\sf GroupPrefixSum}([\![\vec{g}]\!],[\![\vec{b}]\!]).
2 [\![\vec{f}]\!]\leftarrow([\![\vec{s}]\!]\times[\![\vec{b}]\!]\stackrel{?}{=}1).
Algorithm 6 Detecting first ones in a private internally grouped vector.

This section describes the main part of our decision tree training protocol. We construct a decision tree by training nodes of the same height in a batch, layer by layer, while keeping the input and output secret. Training samples assigned to different nodes in the same layer are processed as internally separate groups using the protocols proposed in Section 3. This improves the 2^{h} factor of the communication complexity in [AEV21] to h.

4.2.1 Encoding of inputs and outputs

Our decision tree training protocol receives a private training dataset and a public upper bound h on the height of the tree, and outputs a private decision tree of height h. The training dataset consists of n samples, each of which consists of an input tuple and a binary value called a class label. Each input tuple consists of m numerical input attribute values. Our protocol receives the dataset as m private vectors [\![\vec{x}_{j}]\!] (j\in[1,m]) of length n and a private vector [\![\vec{y}]\!] of length n. That is, the i-th input tuple of the training dataset and its associated class label correspond to (\vec{x}_{1}[i],\vec{x}_{2}[i],\dots,\vec{x}_{m}[i]) and \vec{y}[i], respectively.

The output tree is a normalized binary tree as described in Section 4.1. It is stored in 3h+2 private vectors. Since all nodes of height k\in[0,h) are internal nodes, the node number, attribute number, and threshold of each such node are stored in three vectors {\sf NID}^{(k)}, {\sf AID}^{(k)}, and {\sf Threshold}^{(k)}, respectively. Since all nodes of height h are leaf nodes, the node number and leaf label of each such node are stored in two vectors {\sf NID}^{(h)} and {\sf Label}^{(h)}, respectively. The length of each vector of height k\in[0,h] is \min\{n,2^{k}\}. If the actual number of nodes is smaller than the length of the vector, it is filled with a dummy value {\sf NULL}. The vectors of each layer are collectively denoted as [\![{\sf Layer}^{(k)}]\!]:=([\![{\sf NID}^{(k)}]\!],[\![{\sf AID}^{(k)}]\!],[\![{\sf Threshold}^{(k)}]\!]) for k\in[0,h) and [\![{\sf Layer}^{(h)}]\!]:=([\![{\sf NID}^{(h)}]\!],[\![{\sf Label}^{(h)}]\!]), which we call layer information.

In order to represent the edges between nodes, each node is assigned an integer node number. The only node with height 0 is the root, and its node number is 1. For each child node (of height k+1) of a node of height k with node number d, we assign node number d to the false child (if any) and node number d+2^{k} to the true child (if any). With this numbering scheme, all node numbers in the k-th layer are distinct values in [1,2^{k}].
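The numbering scheme can be checked with a small Python sketch (names are ours): starting from node number 1 at the root, a sample's node number in the next layer is its current number plus 2^{k} if its test outcome is true, and is unchanged otherwise.

```python
def child_nid(nid, k, test_is_true):
    # Node number in layer k+1 of a child of node `nid` in layer k.
    return nid + (1 << k) if test_is_true else nid

# A sample whose test outcomes are true, false, true visits node numbers:
nid, path = 1, [True, False, True]
for k, b in enumerate(path):
    nid = child_nid(nid, k, b)
print(nid)  # 6  (= 1 + 2^0 + 2^2), a value in [1, 2^3]
```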

4.2.2 The main protocol of our decision tree training

The main protocol of our decision tree training is shown in Algorithm 5. It trains the decision tree layer by layer, in order from the 0-th layer to the h-th layer. Samples and associated values are stored in our private grouping data structure as internally grouped vectors. Throughout the training, a group flag vector \vec{g}^{(k)} represents the grouping of samples into the nodes of the k-th layer. {\sf NID}^{(k)}, \vec{x}_{j}^{(k)}, and \vec{y}^{(k)} are internally grouped vectors grouped by \vec{g}^{(k)} that store the node numbers, input attribute values, and output attribute values, respectively. In the 0-th layer, all samples are initialized to be assigned to the root node, whose node number is 1 (the first two steps of Algorithm 5). Then, each layer is trained iteratively in the main loop.

At each iteration, we first train the nodes of the k-th layer and compute the test result \vec{b}^{(k)} for each sample by invoking the \operatorname{\sf TrainInternalNodes} protocol, which we will describe in Section 4.2.3. Each \vec{b}^{(k)}[i] represents the test result of the i-th sample, where 0 and 1 denote false and true, respectively.

The node numbers {\sf NID} and group flags \vec{g} for the next layer are then computed. Next, \vec{g}^{(k+1)}, {\sf NID}^{(k+1)}, \vec{x}_{j}^{(k+1)}, and \vec{y}^{(k+1)} of the (k+1)-th layer are obtained by stably sorting \vec{g}, {\sf NID}, \vec{x}_{j}^{(k)}, and \vec{y}^{(k)} by \vec{b}^{(k)} (the \operatorname{\sf Sort} step of Algorithm 5). Thanks to the stability of the sort, both the correspondence between the values of each sample and the contiguity of elements in the same group are maintained.

Let us verify the correctness of the node numbers {\sf NID} and group flags \vec{g} for the next layer. Let v be a node of the k-th layer with node number d. The node number of a child node of v is d for a false child and d+2^{k} for a true child. Thus, [\![{\sf NID}]\!], as computed in Algorithm 5, is the node number of each sample in the next layer. As for the group flags, since the splitting of the groups is stable, the first 0 and the first 1 in each group become the first elements of the two new groups after the split. Since \lnot[\![\vec{b}^{(k)}]\!] and [\![\vec{b}^{(k)}]\!] indicate the positions of 0's and 1's, respectively, the first elements of the new groups can be detected using \operatorname{\sf GroupFirstOne}, which detects the first 1 within each group. We construct \operatorname{\sf GroupFirstOne} by detecting elements whose value is 1 and whose prefix sum within the group is also 1, as shown in Algorithm 6. The \operatorname{\sf GroupFirstOne} protocol uses O(n\log n) non-free operations in O(\log n) rounds.
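In the clear, the \operatorname{\sf GroupFirstOne} logic of Algorithm 6 amounts to the following sketch (our names), which marks, within each group, the first position whose bit is 1:

```python
def group_first_one(g, b):
    # f[i] = 1 iff b[i] = 1 and the prefix sum of b within i's group is
    # exactly 1 at position i (i.e. no earlier 1 in the same group).
    f, s = [], 0
    for gi, bi in zip(g, b):
        s = bi if gi == 1 else s + bi      # group-wise prefix sum of b
        f.append(1 if bi == 1 and s == 1 else 0)
    return f

print(group_first_one([1, 0, 1, 1, 0, 0], [0, 1, 1, 0, 1, 1]))
# [0, 1, 1, 0, 1, 0]
```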

Finally, in Algorithm 5, we train the leaf nodes in a batch by invoking the TrainLeafNodes protocol, which we describe in Section 4.2.4, and obtain the output vectors Layer^(h) for height h.

The decision tree training protocol uses O(hmn log n) non-free operations in O(h(log m + log n)) rounds, since the TrainInternalNodes protocol uses O(mn log n) non-free operations in O(log n + log m) rounds and the TrainLeafNodes protocol uses O(n log n) non-free operations in O(log n) rounds, as we show in the following sections.

4.2.3 Batch training for internal nodes

Notation: [[𝖫𝖺𝗒𝖾𝗋(k)]],[[b]]𝖳𝗋𝖺𝗂𝗇𝖨𝗇𝗍𝖾𝗋𝗇𝖺𝗅𝖭𝗈𝖽𝖾𝗌(k,([[xj]])j[1,m],[[y]],[[g]],[[𝖭𝖨𝖣]])[\![{\sf Layer}^{(k)}]\!],[\![\vec{b}]\!]\leftarrow\operatorname{\sf TrainInternalNodes}(k,([\![\vec{x}_{j}]\!])_{j\in[1,m]},[\![\vec{y}]\!],[\![\vec{g}]\!],[\![{\sf NID}]\!]).
Input: An integer kk, mm private vectors ([[xj]])j[1,m]([\![\vec{x}_{j}]\!])_{j\in[1,m]} of length nn, a private vector [[y]][\![\vec{y}]\!] of length nn, a private group flag vector [[g]][\![\vec{g}]\!] of length nn, and a private vector [[𝖭𝖨𝖣]][\![{\sf NID}]\!] of length nn.
Output: [[𝖫𝖺𝗒𝖾𝗋(k)]]=([[𝖭𝖨𝖣(k)]],[[𝖠𝖨𝖣(k)]],[[𝖳𝗁𝗋𝖾𝗌𝗁𝗈𝗅𝖽(k)]])[\![{\sf Layer}^{(k)}]\!]=([\![{\sf NID}^{(k)}]\!],[\![{\sf AID}^{(k)}]\!],[\![{\sf Threshold}^{(k)}]\!]) and a private bit vector [[b]][\![\vec{b}]\!] of length nn, where [[𝖭𝖨𝖣(k)]][\![{\sf NID}^{(k)}]\!], [[𝖠𝖨𝖣(k)]][\![{\sf AID}^{(k)}]\!], and [[𝖳𝗁𝗋𝖾𝗌𝗁𝗈𝗅𝖽(k)]][\![{\sf Threshold}^{(k)}]\!] are private vectors of length min{n,2k}\min\{n,2^{k}\}.
Cost: O(mnlogn)O(mn\log n) non-free operations in O(logn+logm)O(\log n+\log m) rounds.
1 [[𝖠𝖨𝖣]],[[𝖳𝗁𝗋𝖾𝗌𝗁𝗈𝗅𝖽]]𝖦𝗅𝗈𝖻𝖺𝗅𝖳𝖾𝗌𝗍𝖲𝖾𝗅𝖾𝖼𝗍𝗂𝗈𝗇(([[xj]])j,[[y]],[[g]])[\![{\sf AID}]\!],[\![{\sf Threshold}]\!]\leftarrow\operatorname{\sf GlobalTestSelection}(([\![\vec{x}_{j}]\!])_{j},[\![\vec{y}]\!],[\![\vec{g}]\!]).
2 [[s]]𝖦𝗋𝗈𝗎𝗉𝖲𝖺𝗆𝖾([[g]],[[y]])[\![\vec{s}]\!]\leftarrow\operatorname{\sf GroupSame}([\![\vec{g}]\!],[\![\vec{y}]\!]).
3 [[𝖠𝖨𝖣]],[[𝖳𝗁𝗋𝖾𝗌𝗁𝗈𝗅𝖽]]𝖨𝖿𝖤𝗅𝗌𝖾([[s]];1,𝖬𝖨𝖭_𝖵𝖠𝖫𝖴𝖤;[[𝖠𝖨𝖣]],[[𝖳𝗁𝗋𝖾𝗌𝗁𝗈𝗅𝖽]])[\![{\sf AID}]\!],[\![{\sf Threshold}]\!]\leftarrow\operatorname{\sf IfElse}([\![\vec{s}]\!];1,{\sf MIN\_VALUE};[\![{\sf AID}]\!],[\![{\sf Threshold}]\!]).
4 [[b]]𝖠𝗉𝗉𝗅𝗒𝖳𝖾𝗌𝗍𝗌(([[xj]])j,[[𝖠𝖨𝖣]],[[𝖳𝗁𝗋𝖾𝗌𝗁𝗈𝗅𝖽]])[\![\vec{b}]\!]\leftarrow\operatorname{\sf ApplyTests}(([\![\vec{x}_{j}]\!])_{j},[\![{\sf AID}]\!],[\![{\sf Threshold}]\!]).
5 [[𝖭𝖨𝖣(k)]],[[𝖠𝖨𝖣(k)]],[[𝖳𝗁𝗋𝖾𝗌𝗁𝗈𝗅𝖽(k)]]𝖥𝗈𝗋𝗆𝖺𝗍𝖫𝖺𝗒𝖾𝗋(k,[[g]],[[𝖭𝖨𝖣]],[[𝖠𝖨𝖣]],[[𝖳𝗁𝗋𝖾𝗌𝗁𝗈𝗅𝖽]])[\![{\sf NID}^{(k)}]\!],[\![{\sf AID}^{(k)}]\!],[\![{\sf Threshold}^{(k)}]\!]\leftarrow\operatorname{\sf FormatLayer}(k,[\![\vec{g}]\!],[\![{\sf NID}]\!],[\![{\sf AID}]\!],[\![{\sf Threshold}]\!]).
Algorithm 7 Training internal nodes.
Notation: [[f]]𝖦𝗋𝗈𝗎𝗉𝖲𝖺𝗆𝖾([[g]],[[y]])[\![\vec{f}]\!]\leftarrow\operatorname{\sf GroupSame}([\![\vec{g}]\!],[\![\vec{y}]\!]).
Input: A private group flag vector [[g]] of length n and a private bit vector [[y]] of length n.
Output: A private bit vector [[f]][\![\vec{f}]\!] of length nn.
Cost: O(nlogn)O(n\log n) non-free operations in O(logn)O(\log n) rounds.
1 [[s]]𝖦𝗋𝗈𝗎𝗉𝖲𝗎𝗆([[g]],1)[\![\vec{s}]\!]\leftarrow\operatorname{\sf GroupSum}([\![\vec{g}]\!],\vec{1}), where 1\vec{1} is a vector (1,,1)(1,\dots,1) of length nn.
2 [[s0]]𝖦𝗋𝗈𝗎𝗉𝖲𝗎𝗆([[g]],¬[[y]])[\![\vec{s}_{0}]\!]\leftarrow\operatorname{\sf GroupSum}([\![\vec{g}]\!],\lnot[\![\vec{y}]\!]).
3 [[s1]]𝖦𝗋𝗈𝗎𝗉𝖲𝗎𝗆([[g]],[[y]])[\![\vec{s}_{1}]\!]\leftarrow\operatorname{\sf GroupSum}([\![\vec{g}]\!],[\![\vec{y}]\!]).
4 [[f]] ← ([[s]] =? [[s_0]]) OR ([[s]] =? [[s_1]]).
Algorithm 8 Checking if all elements in each group are the same.
Notation: [[b]]𝖠𝗉𝗉𝗅𝗒𝖳𝖾𝗌𝗍𝗌(([[xj]])j[1,m],[[𝖠𝖨𝖣]],[[𝖳𝗁𝗋𝖾𝗌𝗁𝗈𝗅𝖽]])[\![\vec{b}]\!]\leftarrow\operatorname{\sf ApplyTests}(([\![\vec{x}_{j}]\!])_{j\in[1,m]},[\![{\sf AID}]\!],[\![{\sf Threshold}]\!])
Input: mm private vectors ([[xj]])j[1,m]([\![\vec{x}_{j}]\!])_{j\in[1,m]} of length nn, a private vector [[𝖠𝖨𝖣]][\![{\sf AID}]\!] of length nn, and a private vector [[𝖳𝗁𝗋𝖾𝗌𝗁𝗈𝗅𝖽]][\![{\sf Threshold}]\!] of length nn.
Output: A private bit vector [[b]][\![\vec{b}]\!] of length nn.
Cost: O(mn)O(mn) non-free operations in O(1)O(1) rounds.
1 [[e_j]] ← ([[AID]] =? j) for j ∈ [1,m].
2 [[x[i]]] ← ∑_{j∈[1,m]} [[x_j[i]]] × [[e_j[i]]] for i ∈ [1,n].
3 [[b]] ← (2 × [[x]] <? [[Threshold]]).
Algorithm 9 Applying tests to samples.
Notation: ([[d1]],,[[dc]])𝖥𝗈𝗋𝗆𝖺𝗍𝖫𝖺𝗒𝖾𝗋(k,[[g]],[[a1]],,[[ac]])([\![\vec{d}_{1}]\!],\dots,[\![\vec{d}_{c}]\!])\leftarrow\operatorname{\sf FormatLayer}(k,[\![\vec{g}]\!],[\![\vec{a}_{1}]\!],\dots,[\![\vec{a}_{c}]\!]).
Input: An integer kk, a private group flag vector [[g]][\![\vec{g}]\!] of length nn, and cc private vectors [[a1]],,[[ac]][\![\vec{a}_{1}]\!],\dots,[\![\vec{a}_{c}]\!] of length nn.
Output: A sequence of cc private vectors ([[d1]],,[[dc]])([\![\vec{d}_{1}]\!],\dots,[\![\vec{d}_{c}]\!]), where length of [[dj]][\![\vec{d}_{j}]\!] is min{2k,n}\min\{2^{k},n\} for j[1,c]j\in[1,c].
Cost: O(cnlogn)O(cn\log n) non-free operations in O(logn)O(\log n) rounds.
1 [[v_j]] ← IfElse([[g]], [[a_j]], NULL) for j ∈ [1,c].
2 [[v_1]],…,[[v_c]] ← Sort(¬[[g]]; [[v_1]],…,[[v_c]]).
3 Let [[d_j]] be the first min{2^k, n} elements of [[v_j]] for j ∈ [1,c].
Algorithm 10 Formatting vectors for kk-th layer.

This section describes the protocol for training internal nodes in a batch, which we deferred above. It receives the privately grouped dataset of the k-th layer (k ∈ [0,h)), computes the best test for each node, and outputs the test results [[b]] and the layer information [[Layer^(k)]] = ([[NID^(k)]], [[AID^(k)]], [[Threshold^(k)]]).

In step 1 of Algorithm 7, we compute the best test for each group using the GlobalTestSelection protocol, which will be shown in Section 4.3.1. In step 2, we determine whether all values y[i] within each group are the same, which is part of the stopping criterion described in Section 2.1.1. This is computed by the GroupSame protocol shown in Algorithm 8. In the GroupSame protocol, s, s_0, and s_1 represent the number of elements, the number of 0's, and the number of 1's in the group, respectively; thus, (s =? s_0) OR (s =? s_1) computes the desired value. It uses O(n log n) non-free operations in O(log n) rounds. In step 3 of Algorithm 7, we replace the test with the dummy test 2x_1 < MIN_VALUE for every element whose result in step 2 is true: specifically, for each i such that s[i] = 1, [[AID[i]]] and [[Threshold[i]]] are overwritten with 1 and MIN_VALUE, respectively. In step 4, the ApplyTests protocol computes the test results from the best tests obtained above. In step 5, the FormatLayer protocol formats [[NID]], [[AID]], and [[Threshold]].

The ApplyTests protocol is shown in Algorithm 9. It computes the private results [[b]] of applying the tests specified by AID and Threshold to the input tuples (x_j)_{j∈[1,m]}. For each i ∈ [1,n], it computes the flag [[e_j[i]]] indicating whether AID[i] = j by an equality test (step 1), uses these flags to obliviously select the AID[i]-th attribute value x_{AID[i]}[i] (step 2), and then computes the test result of each element (step 3). It uses O(mn) non-free operations in O(1) rounds.
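
The following cleartext sketch (our illustration, not the secure protocol; all data are illustrative) mirrors these three steps: a one-hot equality test over the attribute numbers, an inner product to select the attribute value, and a comparison of twice the selected value against the stored threshold (recall that thresholds are stored as the sum of two adjacent sorted attribute values).

def apply_tests(xs, aid, threshold):
    # xs[j][i] is attribute j+1 of sample i; aid[i] and threshold[i] describe
    # the test assigned to sample i's node.
    n, m = len(aid), len(xs)
    b = []
    for i in range(n):
        e = [1 if aid[i] == j + 1 else 0 for j in range(m)]  # one-hot over attributes
        x_i = sum(xs[j][i] * e[j] for j in range(m))         # obliviously selected value
        b.append(1 if 2 * x_i < threshold[i] else 0)
    return b

xs = [[3, 7, 2], [5, 1, 9]]   # two attributes, three samples
aid = [1, 2, 1]
threshold = [10, 4, 3]
print(apply_tests(xs, aid, threshold))  # [1, 1, 0]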

The FormatLayer protocol is shown in Algorithm 10. It removes redundant values from the given vectors. Since the node numbers, attribute numbers, thresholds, and leaf labels are identical within a group, it suffices to keep only one element per group. Therefore, we overwrite all but the first element of each group with NULL and move the first elements to the front of the vector. Since the number of nodes in the k-th layer is at most min{n, 2^k}, we delete the trailing elements so that each vector has this length. It uses O(cn log n) non-free operations in O(log n) rounds.
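
A cleartext sketch of this formatting step (illustrative only; NULL is a placeholder constant here) looks as follows.

NULL = None

def format_layer(k, g, *vectors):
    # Keep only the first element of each group, move those elements to the front
    # by a stable sort on the negated group flags, and truncate to min(2^k, n) entries.
    n = len(g)
    kept = [[v[i] if g[i] else NULL for i in range(n)] for v in vectors]
    order = sorted(range(n), key=lambda i: 1 - g[i])  # stable: group heads first
    length = min(2 ** k, n)
    return [[col[i] for i in order][:length] for col in kept]

g   = [1, 0, 1, 0, 0, 1]
nid = [1, 1, 2, 2, 2, 3]
thr = [9, 9, 4, 4, 4, 7]
print(format_layer(2, g, nid, thr))  # [[1, 2, 3, None], [9, 4, 7, None]]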

The 𝖳𝗋𝖺𝗂𝗇𝖨𝗇𝗍𝖾𝗋𝗇𝖺𝗅𝖭𝗈𝖽𝖾𝗌\operatorname{\sf TrainInternalNodes} protocol uses O(mnlogn)O(mn\log n) non-free operations in O(logn+logm)O(\log n+\log m) rounds, since the 𝖦𝗅𝗈𝖻𝖺𝗅𝖳𝖾𝗌𝗍𝖲𝖾𝗅𝖾𝖼𝗍𝗂𝗈𝗇\operatorname{\sf GlobalTestSelection} protocol uses O(mnlogn)O(mn\log n) non-free operations in O(logn+logm)O(\log n+\log m) rounds, as we will show in Section 4.3.1.

4.2.4 Batch training for leaf nodes

Notation: [[𝖫𝖺𝗒𝖾𝗋(h)]]𝖳𝗋𝖺𝗂𝗇𝖫𝖾𝖺𝖿𝖭𝗈𝖽𝖾𝗌(h,[[g]],[[y]],[[𝖭𝖨𝖣]])[\![{\sf Layer}^{(h)}]\!]\leftarrow\operatorname{\sf TrainLeafNodes}(h,[\![\vec{g}]\!],[\![\vec{y}]\!],[\![{\sf NID}]\!]).
Input: An integer hh, a private group flag vector [[g]][\![\vec{g}]\!] of length nn, a private vector [[y]][\![\vec{y}]\!] of length nn, and a private vector [[𝖭𝖨𝖣]][\![{\sf NID}]\!] of length nn.
Output: [[𝖫𝖺𝗒𝖾𝗋(h)]]=([[𝖭𝖨𝖣(h)]],[[𝖫𝖺𝖻𝖾𝗅(h)]])[\![{\sf Layer}^{(h)}]\!]=([\![{\sf NID}^{(h)}]\!],[\![{\sf Label}^{(h)}]\!]), where [[𝖭𝖨𝖣(h)]][\![{\sf NID}^{(h)}]\!] and [[𝖫𝖺𝖻𝖾𝗅(h)]][\![{\sf Label}^{(h)}]\!] are private vectors of length min{n,2h}\min\{n,2^{h}\}.
Cost: O(nlogn)O(n\log n) non-free operations in O(logn)O(\log n) rounds.
1 Compute the leaf labels by [[Label]] ← (GroupSum([[g]], ¬[[y]]) <? GroupSum([[g]], [[y]])), which are the most frequent values of y in each group.
2 [[NID^(h)]], [[Label^(h)]] ← FormatLayer(h, [[g]], [[NID]], [[Label]]).
Algorithm 11 Training leaf nodes.

This section describes the protocol for training leaf nodes in a batch, which we deferred above. It receives the privately grouped dataset of the h-th layer, computes the leaf label for each node, and outputs the layer information [[Layer^(h)]] = ([[NID^(h)]], [[Label^(h)]]). The protocol is shown in Algorithm 11. In step 1, we compute the most frequent value of the output attribute in each group as the leaf label; this is the standard way to define leaf labels. In step 2, the FormatLayer protocol is used to format [[NID]] and [[Label]].
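
In the clear, this leaf-label computation amounts to a group-wise majority vote over the binary labels; the sketch below (our illustration, with illustrative data) shows the comparison of the two group-wise counts used in step 1.

def group_sum(g, values):
    # Broadcast, to every element, the sum of its group (groups start where g[i] == 1).
    out, start = [0] * len(g), 0
    for i in range(1, len(g) + 1):
        if i == len(g) or g[i] == 1:
            s = sum(values[start:i])
            for j in range(start, i):
                out[j] = s
            start = i
    return out

def leaf_labels(g, y):
    ones  = group_sum(g, y)
    zeros = group_sum(g, [1 - yi for yi in y])
    return [1 if z < o else 0 for z, o in zip(zeros, ones)]

g = [1, 0, 0, 1, 0]
y = [1, 1, 0, 0, 0]
print(leaf_labels(g, y))  # [1, 1, 1, 0, 0]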

The protocol uses O(nlogn)O(n\log n) non-free operations in O(logn)O(\log n) rounds.

4.3 Our batch test selection protocol

In this section, we explain how to perform batch node-wise test selection on a dataset that is grouped by nodes and stored in our private grouping data structure. Like the standard test selection method in the clear, our batch test selection consists of three levels of components: the global test selection protocol, which selects the best test across all attributes; the attribute-wise test selection protocol, which selects the best test for splitting on a specific attribute; and the modified Gini index protocol, which computes the splitting measure. We introduce them in order. Thanks to our group-wise operations, all of them are almost straightforward to construct.

4.3.1 Global test selection

Notation: [[a]],[[t]]𝖦𝗅𝗈𝖻𝖺𝗅𝖳𝖾𝗌𝗍𝖲𝖾𝗅𝖾𝖼𝗍𝗂𝗈𝗇(([[xj]])j[1,m],[[y]],[[g]])[\![\vec{a}]\!],[\![\vec{t}]\!]\leftarrow\operatorname{\sf GlobalTestSelection}(([\![\vec{x}_{j}]\!])_{j\in[1,m]},[\![\vec{y}]\!],[\![\vec{g}]\!])
Input: mm private vectors ([[xj]])j[1,m]([\![\vec{x}_{j}]\!])_{j\in[1,m]} of length nn, a private vector [[y]][\![\vec{y}]\!] of length nn, a private group flag vector [[g]][\![\vec{g}]\!] of length nn.
Output: A private vector [[a]][\![\vec{a}]\!] of length nn and a private vector [[t]][\![\vec{t}]\!] of length nn.
Cost: O(mnlogn)O(mn\log n) non-free operations in O(logn+logm)O(\log n+\log m) rounds.
1 for each j ∈ [1,m] do in parallel
2     [[u_j]], [[v_j]] ← Sort(PrefixSum([[g]]), [[x_j]]; [[x_j]], [[y]]).
3     [[t_j]], [[s_j]] ← AttributewiseTestSelection([[g]], [[u_j]], [[v_j]]).
4 for each i ∈ [1,n] do in parallel
5     [[a[i]]], [[t[i]]] ← VectMax(([[s_1[i]]],…,[[s_m[i]]]); (1,…,m), ([[t_1[i]]],…,[[t_m[i]]])).
Algorithm 12 Batch global test selection.

The global test selection protocol computes the best test through all attributes for each node in a batch. The algorithm is straightforward: it calls the attribute-wise test selection protocol to compute the best test for each attribute and then selects the best test among them. Since the attribute-wise test selection protocol assumes that the data is already sorted within a group for a given attribute, this protocol is responsible for sorting within the group before calling attribute-wise test selection.

The protocol is shown in Algorithm 12. It receives the training data (input tuples and class labels) privately grouped by nodes, and outputs the information (attribute number and threshold) of the best test for each group. For each input attribute, it sorts the input attribute values and class labels within each group and selects the best test for each group when splitting on that attribute (steps 1 to 3). Then, the best test among all attributes is selected in steps 4 and 5. Since the output of the attribute-wise test selection protocol is identical for all elements of a group, it is sufficient to take this maximum element-wise.

The protocol is almost identical to the algorithm in the clear and to the protocol in [AEV21]. The difference is that we need to sort within each group (step 2 of Algorithm 12). Recalling that g is a bit vector in which only the first element of each group is 1, PrefixSum(g) assigns a distinct, ascending value to each group. Thus, we can sort within each group by using the pair (PrefixSum(g), x_j) as a lexicographic sort key, as in step 2.
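
In the clear, this sort key can be sketched as follows (illustrative code and data only): the prefix sum of g serves as a group index, and sorting lexicographically by (group index, attribute value) sorts within each group without mixing groups.

from itertools import accumulate

def sort_within_groups(g, x, y):
    group_index = list(accumulate(g))  # distinct, ascending value per group
    order = sorted(range(len(x)), key=lambda i: (group_index[i], x[i]))
    return [x[i] for i in order], [y[i] for i in order]

g = [1, 0, 0, 1, 0]
x = [7, 2, 5, 9, 3]
y = [1, 0, 1, 0, 1]
print(sort_within_groups(g, x, y))  # ([2, 5, 7, 3, 9], [0, 1, 1, 1, 0])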

The protocol uses O(mn log n) non-free operations in O(log m + log n) rounds, since the AttributewiseTestSelection protocol uses O(n log n) non-free operations in O(log n) rounds, as we will show in the next section.

4.3.2 Attribute-wise test selection

Notation: [[t]], [[s]] ← AttributewiseTestSelection([[g]], [[x]], [[y]]).
Input: A private group flag vector [[g]] of length n, a private vector [[x]] of length n, and a private vector [[y]] of length n.
Output: Private vectors [[t]] and [[s]] of length n.
Cost: O(n log n) non-free operations in O(log n) rounds.
1 [[s]] ← ModifiedGini([[g]], [[y]]).
2 [[t[i]]] ← [[x[i]]] + [[x[i+1]]] for all i ∈ [1,n) and [[t[n]]] ← MIN_VALUE.
3 [[p[i]]] ← [[g[i+1]]] OR ([[x[i]]] =? [[x[i+1]]]) for i ∈ [1,n) and [[p[n]]] ← 1.
4 [[s]], [[t]] ← IfElse([[p]]; MIN_VALUE, MIN_VALUE; [[s]], [[t]]).
5 [[s]], [[t]] ← GroupMax([[g]], [[s]]; [[s]], [[t]]).
Algorithm 13 Batch attribute-wise test selection.

The attribute-wise test selection protocol computes the best tests in each group for a given numerical input attribute. It assumes that the input attribute values and class labels are sorted with respect to the input attribute values within each group. It implements the technique by Abspoel et al. [AEV21] that reduces the number of candidate thresholds from Θ(n2)\Theta(n^{2}) to Θ(n)\Theta(n), on our data structure using the group-wise operations proposed in Section 3. We use the 𝖬𝗈𝖽𝗂𝖿𝗂𝖾𝖽𝖦𝗂𝗇𝗂\operatorname{\sf ModifiedGini} protocol, which will be described in Section 4.3.3, to compute the modified Gini index.

The protocol is shown in Algorithm 13. It receives a private group flag vector [[g]][\![\vec{g}]\!], a private numeric input attribute vector [[x]][\![\vec{x}]\!], and a private class label vector [[y]][\![\vec{y}]\!]. Vectors x\vec{x} and y\vec{y} are sorted with respect to x\vec{x} in each group. Outputs are thresholds [[t]][\![\vec{t}]\!] and scores [[s]][\![\vec{s}]\!] of the best tests in each group.

We show that the protocol computes the best tests in each group for a given numerical input attribute. Since x and y are sorted within each group with respect to x, it is sufficient to consider only splits between two adjacent elements in each group [AEV21]. In steps 1 and 2, the score s[i] and the threshold t[i] for the split between the i-th and (i+1)-th elements are computed. If the i-th element is the last element of a group (i.e., i = n or g[i+1] = 1) or if it has the same attribute value as the next element (i.e., x[i] = x[i+1]), we cannot split between the i-th and (i+1)-th elements; in this case, we set t[i] := MIN_VALUE and s[i] := MIN_VALUE (steps 3 and 4). In step 5, the score and threshold of an element whose score is the maximum in its group are copied to the other elements of the group.
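
The cleartext logic of steps 2 to 5 can be sketched as below; this is only an illustration (MIN_VALUE, the placeholder scores, and all data are ours), with the modified Gini scores of step 1 taken as given.

MIN_VALUE = float("-inf")

def select_in_groups(g, x, scores):
    # Candidate thresholds, masking of invalid split points, and group-wise maximum.
    n = len(x)
    t = [x[i] + x[i + 1] if i + 1 < n else MIN_VALUE for i in range(n)]
    s = list(scores)
    for i in range(n):
        if (i + 1 == n) or g[i + 1] == 1 or x[i] == x[i + 1]:
            s[i] = t[i] = MIN_VALUE          # no split possible after position i
    out_t, out_s, start = [0] * n, [0] * n, 0
    for i in range(1, n + 1):
        if i == n or g[i] == 1:              # end of a group: broadcast its best split
            best = max(range(start, i), key=lambda j: s[j])
            for j in range(start, i):
                out_s[j], out_t[j] = s[best], t[best]
            start = i
    return out_t, out_s

g = [1, 0, 0, 1, 0]
x = [2, 5, 7, 3, 9]                  # already sorted within each group
scores = [0.6, 0.8, 0.0, 0.5, 0.0]   # placeholder modified Gini scores
print(select_in_groups(g, x, scores))
# ([12, 12, 12, 12, 12], [0.8, 0.8, 0.8, 0.5, 0.5])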

The protocol uses O(n log n) non-free operations in O(log n) rounds, since the ModifiedGini protocol uses O(n log n) non-free operations in O(log n) rounds, as we will show in the next section.

4.3.3 Modified Gini index

Notation: [[s]]𝖬𝗈𝖽𝗂𝖿𝗂𝖾𝖽𝖦𝗂𝗇𝗂([[g]],[[y]])[\![\vec{s}]\!]\leftarrow\operatorname{\sf ModifiedGini}([\![\vec{g}]\!],[\![\vec{y}]\!]).
Input: A private group flag vector [[g]][\![\vec{g}]\!] of length nn and a private vector [[y]][\![\vec{y}]\!] of length nn.
Output: A private vector [[s]][\![\vec{s}]\!] of length nn, where s[i]\vec{s}[i] is the modified Gini index when the dataset is split between the ii-th and (i+1)(i+1)-th elements.
Cost: O(nlogn)O(n\log n) non-free operations in O(logn)O(\log n) rounds.
1 [[y_0]] ← ¬[[y]], [[y_1]] ← [[y]].
2 [[u_b]] ← GroupPrefixSum([[g]], [[y_b]]) for b ∈ {0,1}.
3 [[s_b]] ← GroupSum([[g]], [[y_b]]) for b ∈ {0,1}.
4 [[w_b]] ← [[s_b]] − [[u_b]] for b ∈ {0,1}.
5 [[u]] ← [[u_0]] + [[u_1]] and [[w]] ← [[w_0]] + [[w_1]].
6 [[p]] ← [[w]] × ([[u_0]]^2 + [[u_1]]^2) + [[u]] × ([[w_0]]^2 + [[w_1]]^2).
7 [[q]] ← [[u]] × [[w]].
8 [[s]] ← [[p]]/[[q]]. Here, for simplicity, we set s to be the result of element-wise division of p by q; in practice, we keep s as the pair (p, q) and replace comparisons of elements of s with division-free comparisons as in [AEV21].
Algorithm 14 Modified Gini index.

We present a protocol to compute the modified Gini index for a privately grouped dataset. Thanks to the group-wise operations proposed in Section 3, the formula by Abspoel et al. [AEV21] in Equation 1 can be used almost directly.

The protocol is shown in Algorithm 14. The input is a private group flag vector [[g]][\![\vec{g}]\!] and a private class label vector [[y]][\![\vec{y}]\!] which is sorted by an input attribute in each group. The output is a private vector [[s]][\![\vec{s}]\!], where each s[i]\vec{s}[i] represents the modified Gini index for a split between the ii-th and (i+1)(i+1)-th elements.

Each y_b is a bit vector with y_b[i] = 1 iff y[i] = b. Writing v for the i-th element, u_b[i] is the number of elements with label b up to and including v in its group, and w_b[i] is the number of elements with label b after v in its group. That is, u_0[i], u_1[i], u[i], w_0[i], w_1[i], and w[i] are the number of 0's up to v, the number of 1's up to v, the number of elements up to v, the number of 0's after v, the number of 1's after v, and the number of elements after v, respectively, within the group. By Equation 1, p[i]/q[i] is then the modified Gini index for splitting between the i-th and (i+1)-th elements.
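
For concreteness, the following cleartext sketch (ours, for illustration only) computes the same quantity per split point; it evaluates p/q directly, i.e., (u_0^2 + u_1^2)/u + (w_0^2 + w_1^2)/w, following the quantities defined in Algorithm 14.

def modified_gini(g, y):
    n, scores, start = len(y), [], 0
    for i in range(n):
        if g[i] == 1:
            start = i                                                  # current group starts here
        end = next((j for j in range(i + 1, n) if g[j] == 1), n)       # end of the current group
        u1 = sum(y[start:i + 1]); u = i + 1 - start; u0 = u - u1       # counts up to position i
        w1 = sum(y[i + 1:end]);   w = end - (i + 1);  w0 = w - w1      # counts after position i
        p = w * (u0 ** 2 + u1 ** 2) + u * (w0 ** 2 + w1 ** 2)
        q = u * w
        scores.append(p / q if q != 0 else float("-inf"))              # no split after the group's last element
    return scores

g = [1, 0, 0, 0]
y = [0, 0, 1, 1]
print(modified_gini(g, y))  # approx. [2.67, 4.0, 2.67, -inf]; the perfect split after index 1 scores highest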

The protocol uses O(nlogn)O(n\log n) non-free operations in O(logn)O(\log n) rounds.

5 Demonstration of our protocol’s practicality

In order to show the practicality of our decision tree training protocol, we implemented it and measured the running time.

5.1 Implementation methods

Table 3: Specifications of the machine used for our benchmark.
OS CentOS Linux release 7.3.1611
CPU Intel Xeon Gold 6144k (3.50GHz 8 core/16 thread)×\times2
Memory 768 GB

We implemented our protocols on a three-party computation scheme based on Shamir's secret sharing over the field Z_p with p = 2^61 − 1. This 3PC scheme is secure against a single static corruption. For the ABB implementation, we used the comparison protocols by Kikuchi et al. [KIM+18] and the multiplication protocol by Chida et al. [CHI+19]. We also replaced some of the protocols built on the ABB with more efficient ones: the inner product, apply, unapply, and sortperm protocols are based on those of Chida et al. [CHI+19]. Our implementation includes several further optimizations.

The protocols were implemented in C++. We ran the measurements on three servers with identical configurations, connected in a ring by Intel X710/X557-AT 10G network interfaces. The configuration of the servers is shown in Table 3.

5.2 Benchmarking results

Table 4: Running time for training a decision tree of different heights h on n = 10^4 samples with m = 10 input variables.
h Time [s]
1 1.342
2 2.203
5 4.432
10 8.810
20 16.633
50 40.891
Table 5: Running time for training a decision tree of height h = 5 on different numbers of samples n with m = 10 input variables.
n Time [s]
10 1.811
10^2 2.092
10^3 2.569
10^4 4.432
10^5 32.035
Table 6: Running time for training a decision tree of height h = 5 on n = 10^4 samples with different numbers of input variables m.
m Time [s]
1 2.469
2 2.641
5 3.069
10 4.432
20 7.790
50 19.111
100 39.401

To show the scalability of our protocol, we measured the running time for different parameters n, m, and h. Starting from the base case of n = 10^4, m = 10, and h = 5, we measured the training time while varying n ∈ {10, 10^2, 10^3, 10^4, 10^5}, m ∈ {1, 2, 5, 10, 20, 50, 100}, and h ∈ {1, 2, 5, 10, 20, 50}. The results are shown in Tables 5, 6, and 4, respectively. Each runtime is the average of three measurements. The results show that the running time is approximately linear in each of n, m, and h.

References

  • [ACC+21] Samuel Adams, Chaitali Choudhary, Martine De Cock, Rafael Dowsley, David Melanson, Anderson C. A. Nascimento, Davis Railsback, and Jianwei Shen. Privacy-preserving training of tree ensembles over continuous data. CoRR, Vol. abs/2106.02769, 2021.
  • [AEV21] Mark Abspoel, Daniel Escudero, and Nikolaj Volgushev. Secure training of decision trees with continuous attributes. Proc. Priv. Enhancing Technol., Vol. 2021, No. 1, pp. 167–187, 2021.
  • [BFOS84] Leo Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone. Classification and Regression Trees. Wadsworth, 1984.
  • [Bre01] Leo Breiman. Random forests. Mach. Learn., Vol. 45, No. 1, pp. 5–32, 2001.
  • [CHI+19] Koji Chida, Koki Hamada, Dai Ikarashi, Ryo Kikuchi, Naoto Kiribuchi, and Benny Pinkas. An efficient secure three-party sorting protocol with an honest majority. IACR Cryptol. ePrint Arch., p. 695, 2019.
  • [dHSCodA14] Sebastiaan de Hoogh, Berry Schoenmakers, Ping Chen, and Harm op den Akker. Practical secure decision tree learning in a teletreatment application. In Nicolas Christin and Reihaneh Safavi-Naini, editors, FC 2014, March 3-7, 2014, Christ Church, Barbados, Vol. 8437 of Lecture Notes in Computer Science, pp. 179–194. Springer, 2014.
  • [FO21] Brett Hemenway Falk and Rafail Ostrovsky. Secure merge with O(n log log n) secure operations. In Stefano Tessaro, editor, ITC 2021, July 23-26, 2021, Virtual Conference, Vol. 199 of LIPIcs, pp. 7:1–7:29. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2021.
  • [Fri01] Jerome H Friedman. Greedy function approximation: a gradient boosting machine. Annals of statistics, pp. 1189–1232, 2001.
  • [HKI+12] Koki Hamada, Ryo Kikuchi, Dai Ikarashi, Koji Chida, and Katsumi Takahashi. Practically efficient multi-party sorting protocols from comparison sort algorithms. In Taekyoung Kwon, Mun-Kyu Lee, and Daesung Kwon, editors, ICISC, Vol. 7839 of LNCS, pp. 202–216. Springer, 2012.
  • [HKP11] Jiawei Han, Micheline Kamber, and Jian Pei. Data Mining: Concepts and Techniques, 3rd edition. Morgan Kaufmann, 2011.
  • [HR76] Laurent Hyafil and Ronald L. Rivest. Constructing optimal binary decision trees is NP-complete. Inf. Process. Lett., Vol. 5, No. 1, pp. 15–17, 1976.
  • [KIM+18] Ryo Kikuchi, Dai Ikarashi, Takahiro Matsuda, Koki Hamada, and Koji Chida. Efficient bit-decomposition and modulus-conversion protocols with an honest majority. In Willy Susilo and Guomin Yang, editors, ACISP 2018, July 11-13, 2018, Wollongong, NSW, Australia, Vol. 10946 of Lecture Notes in Computer Science, pp. 64–82. Springer, 2018.
  • [Lau15] Peeter Laud. Parallel oblivious array access for secure multiparty computation and privacy-preserving minimum spanning trees. Proc. Priv. Enhancing Technol., Vol. 2015, No. 2, pp. 188–205, 2015.
  • [LP00] Yehuda Lindell and Benny Pinkas. Privacy preserving data mining. In Mihir Bellare, editor, CRYPTO, Vol. 1880 of LNCS, pp. 36–54. Springer, 2000.
  • [LW14] Peeter Laud and Jan Willemson. Composable oblivious extended permutations. In Frédéric Cuppens, Joaquín García-Alfaro, A. Nur Zincir-Heywood, and Philip W. L. Fong, editors, FPS 2014, November 3-5, 2014, Montreal, QC, Canada, Vol. 8930 of Lecture Notes in Computer Science, pp. 294–310. Springer, 2014.
  • [MR18] Payman Mohassel and Peter Rindal. ABY3: A mixed protocol framework for machine learning. In David Lie, Mohammad Mannan, Michael Backes, and XiaoFeng Wang, editors, CCS 2018, October 15-19, 2018, Toronto, ON, Canada, pp. 35–52. ACM, 2018.
  • [Qui86] J. Ross Quinlan. Induction of decision trees. Mach. Learn., Vol. 1, No. 1, pp. 81–106, 1986.
  • [Qui14] J Ross Quinlan. C4.5: programs for machine learning. Elsevier, 2014.
  • [RWT+18] M. Sadegh Riazi, Christian Weinert, Oleksandr Tkachenko, Ebrahim M. Songhori, Thomas Schneider, and Farinaz Koushanfar. Chameleon: A hybrid secure computation framework for machine learning applications. In Jong Kim, Gail-Joon Ahn, Seungjoo Kim, Yongdae Kim, Javier López, and Taesoo Kim, editors, AsiaCCS 2018, June 04-08, 2018, Incheon, Republic of Korea, pp. 707–721. ACM, 2018.
  • [Wak68] Abraham Waksman. A permutation network. J. ACM, Vol. 15, No. 1, pp. 159–163, 1968.
  • [WGC19] Sameer Wagh, Divya Gupta, and Nishanth Chandran. Securenn: 3-party secure computation for neural network training. Proc. Priv. Enhancing Technol., Vol. 2019, No. 3, pp. 26–49, 2019.
  • [Yao86] Andrew Chi-Chih Yao. How to generate and exchange secrets (extended abstract). In 27th Annual Symposium on Foundations of Computer Science, 27–29 October 1986, Toronto, Canada, pp. 162–167. IEEE Computer Society, 1986.