
PRICURE: Privacy-Preserving Collaborative Inference in a Multi-Party Setting

Ismat Jarin University of Michigan, Dearborn [email protected]  and  Birhanu Eshete University of Michigan, Dearborn [email protected]
(2021)
Abstract.

When multiple parties that deal with private data aim for a collaborative prediction task such as medical image classification, they are often constrained by data protection regulations and lack of trust among collaborating parties. If done in a privacy-preserving manner, predictive analytics can benefit from the collective prediction capability of multiple parties holding complementary datasets on the same machine learning task. This paper presents pricure, a system that combines complementary strengths of secure multi-party computation (SMPC) and differential privacy (DP) to enable privacy-preserving collaborative prediction among multiple model owners. SMPC enables secret-sharing of private models and client inputs with non-colluding secure servers to compute predictions without leaking model parameters and inputs. DP masks true prediction results via noisy aggregation so as to deter a semi-honest client who may mount membership inference attacks. We evaluate pricure on neural networks across four datasets including benchmark medical image classification datasets. Our results suggest pricure guarantees privacy for tens of model owners and clients with acceptable accuracy loss. We also show that DP reduces membership inference attack exposure without hurting accuracy.

Copyright: none. Journal year: 2021. Conference: 2021 ACM International Workshop on Security and Privacy Analytics (IWSPA), April 28, 2021; co-located with ACM CODASPY '21, April 26-28.

1. Introduction

Machine learning (ML) is being used in many application domains such as image classification (Krizhevsky et al., 2017), voice recognition (Dahl et al., 2012), medical diagnosis (Gao et al., 2019; Johnson et al., 2016; Kaggle, 2020), finance (e.g., credit risk assessment) (Angelini et al., 2008), and autonomous driving (Sallab et al., 2017). An emerging paradigm in ML is machine learning as a service (MLaaS), where clients submit inputs to obtain predictions via a cloud-based prediction API. When client inputs are privacy-sensitive (e.g., patient data), the MLaaS platform is expected to comply with privacy protection regulations (e.g., HIPAA in the United States) to preserve the privacy of inputs (e.g., patient diagnosis details). Moreover, clients may be reluctant to submit inputs that, if revealed to others, could jeopardize their competitive advantage (e.g., inputs involving intellectual property).

In a collaborative setting where multiple MLaaS providers own private models trained on their respective private data, and clients own private input samples, clients can benefit from the collective inference capability of multiple model owners for tasks such as medical image classification. A practical way for multiple MLaaS providers to collaborate on a common ML task is therefore to keep their respective models (trained on private data) secret and participate in a privacy-preserving joint prediction task such that, after a single iteration of collaborative inference: (a) the client learns nothing about the models of the MLaaS providers; (b) the MLaaS providers learn nothing about the client's data; and (c) the inference capability of a semi-honest client is limited.

To build privacy-preserving mechanisms into the ML pipeline, previous work proposed private training methods based on objective perturbation (Chaudhuri et al., 2011), gradient perturbation (Abadi et al., 2016; Jayaraman et al., 2018), and output perturbation (Abadi et al., 2016; Jayaraman et al., 2018). In the collaborative setting, prior work has proposed cryptography-based transformation of ML building blocks (e.g., activation functions, pooling operations) at training time (e.g., CryptoNets (Gilad-Bachrach et al., 2016), SecureML (Mohassel and Zhang, 2017), (Jayaraman et al., 2018)), oblivious transformation of neural networks (e.g., MiniONN (Liu et al., 2017)), and secret sharing (e.g., Chameleon (Riazi et al., 2018), SecureNN (Wagh et al., 2018)) to enable 2- and 3-party secure computation for collaborative training and inference. However, no prior work explores the collaborative inference setting with tens of model owners. Moreover, a semi-honest client may exploit its oracle-style interaction with the MLaaS provider to mount membership-inference-style attacks.

In this paper, we present pricure (short for PRIvacy-preserving Collaborative inference in a mUlti-paRty sEtting), a system that combines the complementary privacy protection notions of secure multi-party computation (SMPC) and differential privacy (DP) to enable privacy-preserving collaborative inference among multiple model owners holding private models and clients holding private input samples. The key insight behind combining SMPC and DP is the orthogonal protections they provide. Intuitively, given an input $x$ and a function $f(x)$, SMPC aims to avoid leaking information about $x$ in the course of computing $f(x)$. DP, on the other hand, aims to randomize $f$ such that an adversary has very limited leverage to infer about $x$ by observing $f(x)$. In the collaborative inference we consider in pricure, SMPC addresses pre-inference disclosure protection of private data while DP addresses post-inference protection of inference results to limit attacks such as membership inference. This orthogonal nature of SMPC and DP makes their combination appealing for our setting.

The privacy guarantee in pricure stems from the notion of additive secret sharing (Wagh et al., 2018; Beaver, 2012), where model owners train their private models and secret-share model parameters with non-colluding secure servers (we call them workers) that compute intermediate results on secret-shared input samples private to clients and similarly secret-shared by a client with the workers. Using intermediate inference results from the workers, a trusted aggregator reconstructs the final inference results and performs noisy aggregation to deter a semi-honest client who may mount membership inference attacks. While we borrow the additive secret sharing notion from prior work (SecureNN (Wagh et al., 2018), SPDZ (Beaver, 2012)) which focus on privacy-preserving training and inference in a 3-party setting, in pricure we rather focus on secret-sharing-based privacy-preserving collaborative inference to handle tens of model owners with acceptable accuracy/privacy trade-off, while also providing a differential privacy-based layer of defense against membership inference attacks that exploit the oracle access to the MLaaS provider.

We evaluate pricure on neural networks across four datasets that span handwritten digit recognition (MNIST (LeCun et al., 2020)), clothing image classification (Fashion-MNIST (Xiao et al., 2017)), breast cancer classification (IDC (Kaggle, 2020)), and in-ICU patient length-of-stay prediction (MIMIC (Johnson et al., 2016)). For instance, on the MNIST dataset, using a commodity hardware setup, pricure guarantees privacy for up to 50 model owners with nearly no accuracy loss, with a per-model average overhead of 48ms for secret-sharing model parameters and a per-model response delay of 1.5s. On the MIMIC dataset, we show that DP reduces the accuracy of membership inference attacks by up to 9.02%, which demonstrates the utility of differential privacy as a second layer of protection against inference on top of the SMPC-based disclosure protection for inputs and model parameters. In summary, we make the following contributions:

\bullet New framework: Through a novel combination of the orthogonal privacy guarantees of secure multi-party computation and differential privacy, we use additive secret sharing to preserve the privacy of model parameters for model owners and of input samples for clients, and we provide a differential-privacy-based layer of defense against membership inference attacks so as to limit the adversarial advances of a semi-honest client.

\bullet Scalable approach: We build on prior work (Wagh et al., 2018; Beaver, 2012) and demonstrate scalable collaborative inference with tens of model owners with an acceptable accuracy/privacy trade-off.

\bullet Comprehensive evaluations: We conduct extensive experiments with four datasets (MNIST (LeCun et al., 2020), Fashion-MNIST (Xiao et al., 2017), IDC (Kaggle, 2020), and MIMIC (Johnson et al., 2016)) that include medical datasets, with varying number of model owners, and across a range of privacy budget values to evaluate: accuracy/privacy trade-off, impact of number of model owners on accuracy/privacy, resilience against membership inference attack, and performance overhead of pricure overall, per-sample, and per-model-owner.

\bullet Our code is available at: https://github.com/um-dsp/PRICURE.

2. Background and Preliminaries

In this section, we cover preliminaries focusing on neural networks, secure multi-party computation, and differential privacy.

2.1. Feed-Forward Neural Networks

A feed-forward neural network (FFNN) is a network of information processing units called neurons organized into layers. In a FFNN, information travels only forward without looping, starting from input nodes, then through hidden nodes, and finally to output nodes. The goal of the FFNN is to approximate some function $F_{\theta}$ and map an input $x$ to a label $y$ as $y=F_{\theta}(x)$, where $\theta$ is the learned parameter vector.

The network architecture consists of the following: an input layer that captures raw inputs, hidden layers that apply a series of transformations to the input, and an output layer that produces the mapping of $x$ to $y$. For a multi-class classifier, the output layer has as many neurons as there are class labels. The coefficients of connections between two neurons are referred to as weights, and are typically initialized as small random values to bootstrap the training process.

Suppose the FFNN has $l$ hidden layers, and denote the number of neurons at layer $l$ as $k_{l}$. The output layer has $o$ neurons. The input vector $(x_{1},x_{2},\dots,x_{d})$ is the $d$-dimensional input to the network. Without loss of generality, the label $y$ is computed as $y=F(x,\theta)=\sum_{j=0}^{k}\xi_{j}\theta_{j}$, where $\theta$ are model parameters and $\xi$ is a vector of non-linear parametric basis functions with $\xi_{0}(x)=1$. Given $k_{1}$ neurons in the $1^{st}$ hidden layer, the output of the $j^{th}$ neuron of the $1^{st}$ hidden layer is computed as $r^{1}_{j}(x,W)=h(\sum_{i=1}^{d}W_{ij}(1)x_{i}+b_{j})$, where $W_{ij}(1)$ is the weight of the connection from the input to the $j^{th}$ neuron of the $1^{st}$ hidden layer, $b_{j}$ is the bias, and $h$ is a non-linear differentiable activation function. As a result, the $1^{st}$ hidden layer produces the outputs $[r^{1}_{1},r^{1}_{2},\dots,r^{1}_{k_{1}}]$, which in turn form the input vector to the $2^{nd}$ hidden layer. In the same vein, the output of the $j^{th}$ neuron of the $l^{th}$ hidden layer is computed as $r^{l}_{j}(x,W)=h(\sum_{i=1}^{k_{l-1}}W_{ij}(l)r^{l-1}_{i})$. Finally, the result of the $j^{th}$ neuron in the output layer is computed as $y_{j}(x,W)=h(\sum_{i=1}^{k_{l}}W_{ij}(l+1)r^{l}_{i})$, where the coefficient $W_{ij}(l+1)$ is the weight from the final hidden layer $l$ to the $j^{th}$ node of the output layer. The resulting vector over all output-layer neurons is $[y_{1},y_{2},\dots,y_{o}]$. Note that the weight matrices $W(1)\in\mathbb{R}^{d\times k_{1}}$, $W(2)\in\mathbb{R}^{k_{1}\times k_{2}}$, \dots, $W(l+1)\in\mathbb{R}^{k_{l}\times o}$ are concatenated into the parameter vector $\theta=[W(1),\dots,W(l+1)]$.
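
To make the forward pass concrete, the following is a minimal NumPy sketch of the computation above; the layer sizes, the choice of ReLU for $h$, and the linear output layer (as used in our experiments, Section 5.1) are illustrative.

```python
import numpy as np

def relu(z):
    # Non-linear activation h
    return np.maximum(0, z)

def forward(x, weights, biases):
    """Feed-forward pass: weights[t] maps layer t to layer t+1.

    Hidden layers compute r^t_j = h(sum_i W_ij(t) r^{t-1}_i + b_j);
    the output layer here is linear, producing one score per class.
    """
    r = x
    for W, b in zip(weights[:-1], biases[:-1]):
        r = relu(r @ W + b)
    return r @ weights[-1] + biases[-1]

# Toy example: d=4 inputs, one hidden layer with k_1=3 neurons, o=2 outputs.
rng = np.random.default_rng(0)
weights = [rng.normal(size=(4, 3)), rng.normal(size=(3, 2))]
biases = [np.zeros(3), np.zeros(2)]
y = forward(rng.normal(size=4), weights, biases)  # 2-dimensional score vector
```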

2.2. Secure Multi-Party Computation

Secure multi-party computation (SMPC) (Bayatbabolghani and Blanton, 2020) enables a set of parties $P=\{P_{1},\dots,P_{m}\}$, where party $P_{i}$ holds sensitive input $x_{i}$, to jointly compute a function $y=f(x_{1},\dots,x_{m})$ while protecting each $x_{i}$. The computation needs to produce the correct value of $y$ (the correctness property), and at the end of the computation each $P_{i}$ learns nothing beyond $y$ (the privacy property). SMPC has several applications, for example in privacy-preserving decision making on distributed data (Attema et al., 2018), privacy-preserving machine learning (Gilad-Bachrach et al., 2016), secure auctions (Joan Feigenbaum and Saint-Jean, 2004), and secure voting (Nair et al., 2015).

The two popular implementations of SMPC are garbled circuits (Yao, 1986) and secret sharing (Ben-Or et al., 1988). With garbled circuits, the $P_{i}$'s construct a (large, encrypted) circuit and evaluate it at once, while with secret sharing they need to interact for each circuit gate. Garbled circuits allow for a constant number of rounds but require larger bandwidth (i.e., fewer but bigger messages are sent). Secret sharing typically has low bandwidth (i.e., small messages per gate) and high throughput, with the number of rounds determined by the depth of the circuit. In this work, we use secret-sharing-based MPC. Intuitively, a $(t,n)$-secret sharing scheme splits a secret $s$ into $n$ shares $s_{i}$, and at least $t$ shares are required to reconstruct the secret. We use $\langle s\rangle=(s_{1},\dots,s_{n})$ to denote the sharing of $s$ among $n$ parties.

Additive secret sharing (Shamir, 1979) splits a secret $s$ into random parts and shares them with secure workers (real/virtual instances that perform computations such as addition and multiplication securely). For example, suppose there are two workers $A$ and $B$ and a secret $s$. The above notation then becomes $\langle s\rangle=(s_{1},s_{2})$ because $n=2$. Accordingly, $A$ and $B$ receive the share values $s_{1}^{a}$ and $s_{2}^{b}$, respectively. The workers $A$ and $B$ perform computation (e.g., compute a function $F$) directly on the share values. After finishing the computation, the workers produce intermediate results as follows:

(1) $A: y_{a}=F(s_{1}^{a})$
(2) $B: y_{b}=F(s_{2}^{b})$

These intermediate results $y_{a}$ and $y_{b}$ are then combined using the additive scheme, and the true output result is revealed. This secret-sharing operation does not use floating-point numbers; instead, it is performed in a mathematical space called an integer quotient ring (Jacobson, 1957), which contains the integers from 0 to $Q-1$. Here, $Q$ is a prime number that must be big enough for the ring to contain all the numbers used in our experiments. A conceptual proof that illustrates additive secret sharing is described in the Appendix (Section 8.2).
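
As a concrete illustration (complementary to the proof in Section 8.2), the following is a minimal Python sketch of (2,2) additive secret sharing over the ring of integers modulo $Q$; the value of $Q$ is illustrative.

```python
import secrets

Q = 2_147_483_647  # a large prime defining the ring Z_Q (illustrative choice)

def share(s):
    """Split secret s into two additive shares over Z_Q."""
    s_a = secrets.randbelow(Q)   # random share for worker A
    s_b = (s - s_a) % Q          # worker B holds the remainder
    return s_a, s_b

def reconstruct(s_a, s_b):
    return (s_a + s_b) % Q

# Addition of secrets can be done locally on the shares, without reconstruction:
x_a, x_b = share(25)
y_a, y_b = share(17)
z_a, z_b = (x_a + y_a) % Q, (x_b + y_b) % Q  # each worker adds its own shares
assert reconstruct(z_a, z_b) == 25 + 17
```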

Our implementation of pricure builds on the private additive sharing introduced in SecureNN (Wagh et al., 2019) and incorporated in PySyft (OpenMined, 2020). SecureNN builds on MPC to implement exact non-linear functions while avoiding inter-conversion protocols as well as general-purpose number-theoretic libraries. Mathematical operations such as matrix multiplication, summation, private comparison, division, and max-pooling are built on top of MPC primitives. PySyft implements SPDZ (Damgård et al., 2012), a secret-sharing scheme that enables more complex operations such as multiplication on shares. In the Appendix (Section 8.3), we provide an illustrative proof of how SPDZ functions.
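
SPDZ-style multiplication of two secret-shared values relies on precomputed multiplication (Beaver) triples. The sketch below illustrates that idea with a trusted dealer standing in for SPDZ's cryptographic offline phase and without the MACs SPDZ adds for integrity; it is not the PySyft implementation.

```python
import secrets

Q = 2_147_483_647  # prime ring modulus (illustrative)

def share(s):
    s0 = secrets.randbelow(Q)
    return s0, (s - s0) % Q

def reconstruct(s0, s1):
    return (s0 + s1) % Q

def beaver_triple():
    # Offline phase (here: a trusted dealer): random a, b and c = a*b, all shared.
    a, b = secrets.randbelow(Q), secrets.randbelow(Q)
    return share(a), share(b), share((a * b) % Q)

def secure_mul(x_sh, y_sh):
    """Multiply two secret-shared values x and y using one Beaver triple."""
    (a0, a1), (b0, b1), (c0, c1) = beaver_triple()
    # The workers open the masked values e = x - a and f = y - b.
    e = reconstruct((x_sh[0] - a0) % Q, (x_sh[1] - a1) % Q)
    f = reconstruct((y_sh[0] - b0) % Q, (y_sh[1] - b1) % Q)
    # Each worker computes its share of x*y locally; only one worker adds e*f.
    z0 = (c0 + e * b0 + f * a0 + e * f) % Q
    z1 = (c1 + e * b1 + f * a1) % Q
    return z0, z1

assert reconstruct(*secure_mul(share(6), share(7))) == 42
```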

2.3. Differential Privacy

For two neighboring datasets $D_{i}$ and $D_{j}$ that differ in just one data point $x$, suppose $F_{i}$ is trained on $D_{i}$ and $F_{j}$ is trained on $D_{j}$. Let the output space of $F_{i}$ and $F_{j}$ be $S$ such that, for an input $x$, $F_{i}(x)\in S$ and $F_{j}(x)\in S$. Differential privacy (DP) (Dwork et al., 2006) guarantees that a randomized mechanism $M$, applied as $M(F_{i}(x))$ or $M(F_{j}(x))$, does not enable an observer (adversary) to distinguish whether $M$'s output was based on $D_{i}$ or $D_{j}$, i.e., whether or not $x$ is used as a training example for $F_{i}$ or $F_{j}$. The indistinguishability of $x$'s membership in $D_{i}$ or $D_{j}$ protects the identifiability of $x$ (e.g., a person's medical record). In $\epsilon$-DP, the indistinguishability of the outputs of $F_{i}(x)$ and $F_{j}(x)$ is parametrized by $\epsilon$ (also called the privacy budget). Equation 3 formalizes the notion of $\epsilon$-DP as follows:

(3) $P[M(F_{i}(x))\in S]\leq e^{\epsilon}\times P[M(F_{j}(x))\in S]$

Intuitively, lower $\epsilon$ values indicate stronger privacy protection. Equation 3 has an $(\epsilon,\delta)$ variant: $P[M(F_{i}(x))\in S]\leq e^{\epsilon}\times P[M(F_{j}(x))\in S]+\delta$, where $\delta$ refers to the failure probability of the mechanism $M$; when $\delta=0$, we say $M$ is $\epsilon$-DP. The $\epsilon$-DP formalism in Equation 3 guarantees individual data item privacy in the most extreme case where $D_{i}$ and $D_{j}$ are so similar that only one data point sets them apart. In pricure, each $D_{i}$ is unique since the datasets come from independent data owners. Hence, using the DP notion, pricure guarantees that individuals who contribute privacy-sensitive data have a bounded privacy guarantee.

In DP, randomization is at the core of ensuring the indistinguishability of an individual's record in a dataset. A typical way to achieve output randomization is to add noise to an output (e.g., $F_{i}(x)$) using, for instance, the Laplace mechanism. For a privacy budget $\epsilon$ and a sensitivity value $s$ of $F_{i}$, the noise is drawn from the Laplace distribution $Lap(b)$ centered at 0 with scale $b=\frac{s}{\epsilon}$.
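
A minimal sketch of the Laplace mechanism as used later by the aggregator (Section 4.5); the sensitivity and $\epsilon$ values below are illustrative.

```python
import numpy as np

def laplace_mechanism(value, sensitivity, epsilon):
    """Release value + Lap(sensitivity / epsilon), i.e., epsilon-DP output perturbation."""
    rng = np.random.default_rng()
    return value + rng.laplace(loc=0.0, scale=sensitivity / epsilon)

# Example: perturb a per-class aggregate with sensitivity 1 and a budget of 0.1.
noisy_value = laplace_mechanism(27.0, sensitivity=1.0, epsilon=0.1)
```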

3. Problem Statement and Threat Model

In this section, we discuss our problem statement and threat model with respect to the parties that take part in pricure.

3.1. Problem Statement

Problem: We consider a setting where multiple model owners $P=\{P_{1},\dots,P_{m}\}$ hold private models $F_{1},\dots,F_{m}$ such that $F_{i}$ is trained on $P_{i}$'s private data, and a client $C$ submits its private input $x=[x_{1},\dots,x_{d}]$ to obtain an inference result $y=\Phi_{i:1\dots m}(F_{i}(x))$, where $\Phi$ is an aggregation function that leverages the collective inference capability of the $F_{i}$'s. For a feed-forward neural network $F_{i}$ with $l$ layers, the collaborative inference result can be expanded in terms of weights and biases as:
$y=\Phi_{i:1\dots m}(W_{l}\cdot F^{i}_{l-1}(\dots F^{i}_{1}(W_{1}\cdot x+b_{1})\dots)+b_{l})$.

Goals: Our first goal is to obtain $y$ while keeping the $F_{i}$'s and $x$ private. In particular, after each prediction, $C$ learns nothing about the $F_{i}$'s (i.e., $W_{1},W_{2},\dots,W_{l}$ and $b_{1},b_{2},\dots,b_{l}$) and the $P_{i}$'s learn nothing about $x$ or about other model owners. Our second goal is to deter a semi-honest client $C$ from mounting membership inference attacks.

Challenges: To achieve the aforementioned goals, one needs to address specific research questions with regard to a) the accuracy/privacy trade-off, b) scalability with a growing number of model owners, c) the prospect of attacks such as membership inference even in the presence of privacy-preserving schemes, and d) the performance implications of privacy-enhancing schemes in a multi-party setting. We address these questions in Sections 5.2-5.5.

3.2. Threat Model

Here, we describe our threat model with respect to model owners, workers, aggregator, and client.

Model Owners: We consider the semi-honest setting for model owners, whereby they do not trust each other to share their training data and/or models, because of data protection and privacy regulations (e.g., HIPAA) or competitive advantage (e.g., they compete in the same business).

Client: The client is assumed to be semi-honest for it may use pricure as an oracle to initiate membership inference (Shokri et al., 2017), model extraction (Tramèr et al., 2016), or model inversion attacks (Fredrikson et al., 2015). Moreover, it does not have access to model parameters of any of the model owners.

Workers: In the honest-but-curious sense, they may analyze the secret-shares they receive from model owners or intermediate computation results, but are trusted enough not to collude with each other to exchange their respective secret-shares or intermediate computation results. Given the partial result they compute, workers are assumed to have no access to the final inference result.

Aggregator: We assume the aggregator is trusted by model owners and clients not to reveal true inference results to any third party, and will not mount model extraction/membership inference style attacks. Moreover, it doesn’t have access to model parameters of the model owners and the data of the client. When sending the prediction results to the client, the aggregator encrypts the result with the client’s public key (shared with the aggregator a priori).

4. Approach

In this section, we first give a high-level overview of pricure and then discuss details in Sections 4.2-4.5.

4.1. Overview

Figure 1 highlights our proposed system, pricure. We consider a setting of $m$ model owners (e.g., hospitals) who may not share training data and models due to regulatory or trust reasons, yet want to participate in a collaborative inference task $T$ (e.g., medical image classification). As a result, the model owners train $m$ private models $F_{1}$, ..., $F_{m}$. pricure combines secure multi-party computation (SMPC) and differential privacy (DP) to enable collaborative inference among the $m$ model owners on task $T$, while (a) ensuring the secrecy of each $F_{i}$ and of samples submitted for inference by a client $C$, and (b) limiting the adversarial advances of a semi-honest client who may mount membership inference attacks.

Why Combine SMPC and DP? These two schemes are complementary, and they provide orthogonal protections that address (a) and (b). Given an input $x$ and a function $f(x)$, SMPC is aimed at answering "what information about $x$ is leaked in the course of computing $f(x)$?", which addresses (a). DP, on the other hand, is concerned with "what can be inferred about $x$ by analyzing $f(x)$?", which addresses (b). In the collaborative inference we consider in pricure, SMPC is relevant for pre-inference protection of data while DP is aimed at post-inference protection against attacks such as membership inference.

Figure 1. pricure overview. Model owners $P_{1},\dots,P_{m}$ train private models $F_{1},\dots,F_{m}$. The $P_{i}$'s then secret-share the $F_{i}$'s with workers $A$ and $B$ as $\langle F_{i}\rangle=(F_{i}^{A},F_{i}^{B})$. A client $C$ secret-shares a private sample $x$ with $A$ and $B$ as $\langle x\rangle=(x_{a},x_{b})$. Using the secret-shares of models $F_{1}$ to $F_{m}$, $A$ and $B$ each compute intermediate results as $y_{i}^{a}=F_{i}^{A}(x_{a})$ and $y_{i}^{b}=F_{i}^{B}(x_{b})$, respectively, and send them to the aggregator. The aggregator reconstructs the inference result $y_{i}$ from $y_{i}^{a}$ and $y_{i}^{b}$. Finally, the aggregator releases to the client a differentially private aggregation of the $y_{i}$'s as $y$.

After training a model on its private dataset, each model owner secret-shares its model parameters with non-colluding secure servers, which we call workers. A client $C$ (e.g., another hospital, a medical researcher) who wants to obtain an inference result (e.g., a classification) on a sample $x$ also secret-shares the sample with the workers. A single iteration of collaborative inference proceeds as follows: when the workers receive a secret-share of an input $x$, they use each model's partial inference function to compute and return a partial result to a trusted aggregator, which reconstructs each model's partial inference results into a full inference output for sample $x$. The aggregator then performs confidence-weighted aggregation of the inference results, and finally leverages differential privacy to add random noise to the final inference result before sending it to the client.

The secrecy of the client's inputs and of each owner's model parameters is protected using additive secret sharing. Noisy inference via $\epsilon$-DP is used to deter a semi-honest client who might attempt to mount membership inference attacks. The communication link between the client and the aggregator is a secure channel. The client shares its public key with the aggregator, which the aggregator uses to encrypt the inference result so that the client can decrypt it with its private key and obtain the final output of the collaborative inference for sample $x$.
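
For the key exchange between client and aggregator, any standard public-key scheme suffices. The following is a sketch using RSA-OAEP from the `cryptography` package (an assumption for illustration; pricure does not prescribe a particular library).

```python
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import rsa, padding

oaep = padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                    algorithm=hashes.SHA256(), label=None)

# Client: generate a key pair and send the public key to the aggregator.
client_private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
client_public_key = client_private_key.public_key()

# Aggregator: encrypt the final label with the client's public key.
ciphertext = client_public_key.encrypt(b"label: 7", oaep)

# Client: decrypt with its private key to recover the clear-text label.
assert client_private_key.decrypt(ciphertext, oaep) == b"label: 7"
```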

4.2. Model Owners

Each model owner maintains its private training data from which it trains a private model, and shares neither its training data nor its model details with other model owners, the aggregator, or a third party. To enable collaborative inference that benefits from the collective intelligence of what is learned by each $F_{i}$, model owners are, in principle, free to use any model architecture, but need to agree a priori on a common feature representation (e.g., pixel intensities for images) and the inference output format (e.g., label only, probability score). A model owner $P_{i}$ may train its respective model $F_{i}$ with or without privacy. We note that when a model is trained in a privacy-preserving manner (e.g., via gradient perturbation as in differentially private stochastic gradient descent (DP-SGD) (Abadi et al., 2016) or using output perturbation (Jayaraman et al., 2018)), its privacy budget adds to that of the inference output perturbation pricure introduces at the aggregator, which may incur a further penalty on accuracy.

Number of Model Owners: The minimum value of $m$ is 2, while there is no limit on the maximum. Intuitively, the higher the value of $m$, the better the collective inference accuracy of the participants. This, however, depends on the accuracy of each model and on the privacy budget (the amount of noise added to the final inference output). Our model owner setup shares some similarity with PATE (Papernot et al., 2017), but our setting differs both in goals and in details. For instance, in PATE, a larger $m$ implies a potentially better privacy guarantee. In our case, a larger $m$ does not inherently entail a better privacy guarantee, since each model owner acts on its own training data (unlike PATE, where the training data is partitioned into $m$ disjoint sets). The $m$ disjoint partitions in PATE are analogous to the typically disjoint $m$ model owners in pricure. Model owners in pricure may use training data of varying size, and the choice of $m$ may have implications for the accuracy/privacy trade-off (in PATE, depending on the dataset size, the choice of $m$ may affect the accuracy/privacy trade-off). In Section 5.3, we analyze the relationship between the number of model owners, inference accuracy, and privacy guarantee. Next, we describe how model owners secret-share model parameters with workers.

Secret-Sharing of Model Parameters: Model owners secret-share their model parameters with the virtual workers via a secure channel. Once they secret-share their model parameters, model owners are not required to stay connected with the workers. Instead, they can terminate the secure connection and initiate it later as needed (for instance when a model is retrained and updating the model parameters is deemed necessary).

Once a model owner trains a model, the learned model parameters are captured via weights ($W$) and biases ($b$). Suppose that for model owner $P_{i}$ ($i\in[1,m]$), the weight vectors are $\theta=[W(1),\dots,W(l+1)]$, where $W(1)$ represents the weight parameters from the input to the $1^{st}$ hidden layer, $W(2)$ represents the weight parameters from the $1^{st}$ hidden layer to the $2^{nd}$ hidden layer, and $W(l+1)$ represents the weight parameters from the $l^{th}$ hidden layer to the output layer. Analogous to Equations 4 and 5 (Section 4.3), model owner $P_{i}$ secret-shares its $j^{th}$ model parameters with workers $A$ and $B$, respectively, as follows:

Worker $A$: $W^{a}(j)=Random(-Q,Q)$
Worker $B$: $W^{b}(j)=W(j)-W^{a}(j)$

Finally, all secret-shared model parameters are computed as follows:

Worker $A$: $\theta^{a}=[W^{a}(1),\dots,W^{a}(l+1)],\; b_{i}^{a}$
Worker $B$: $\theta^{b}=[W^{b}(1),\dots,W^{b}(l+1)],\; b_{i}^{b}$

The secret-sharing is performed over a secure channel, and once the shares are saved by the two workers, the model owner can go offline.
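
The per-parameter scheme above can be applied element-wise to whole tensors. The following is a minimal NumPy sketch of how a model owner might encode and split its weight matrices into two additive shares; the modulus, precision, and layer sizes are illustrative, and Section 5.1 describes the fixed-precision encoding used in our experiments.

```python
import numpy as np

Q = 2_147_483_647      # ring modulus (illustrative)
PRECISION = 100        # fixed-point scaling: keep two decimal digits

def encode(t):
    """Map a float tensor into the integer ring Z_Q with fixed precision."""
    return np.round(np.asarray(t) * PRECISION).astype(np.int64) % Q

def share_tensor(t, rng=None):
    """Split an encoded tensor into two additive shares over Z_Q."""
    rng = rng or np.random.default_rng()
    t_a = rng.integers(0, Q, size=t.shape, dtype=np.int64)  # share for worker A
    t_b = (t - t_a) % Q                                      # share for worker B
    return t_a, t_b

# A model owner P_i shares every weight matrix (and bias) with workers A and B.
theta = [np.random.randn(784, 128), np.random.randn(128, 10)]
shares_for_A, shares_for_B = zip(*(share_tensor(encode(W)) for W in theta))
```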

4.3. Client

When the client wants to obtain an inference result on input $x$, it connects to workers $A$ and $B$ and secret-shares its private input. Moreover, it generates a private-public key pair and sends the public key to the aggregator so that, when the inference result is ready, the aggregator encrypts it with the client's public key before sending it. When it receives an encrypted label from the aggregator, the client decrypts it using its private key and obtains the clear-text label for input $x$. In Section 4.5, we describe how the aggregator produces the final inference result. Next, we describe how the client secret-shares its private input $x$.

Secret-Sharing of Client Input: For simplicity, consider $x_{1}$, the first element of the client's input feature vector $x=[x_{1},\dots,x_{d}]$. $x_{1}$ is divided into two parts to be secret-shared with the two workers $A$ and $B$, using Equations 4 and 5:

(4) $x_{1}^{a}=random(-Q,Q)$
(5) $x_{1}^{b}=(x_{1}-x_{1}^{a})\ \mathrm{Mod}\ Q$

$x_{1}^{a}$ and $x_{1}^{b}$ are large numbers that fall in the range $[0,Q-1]$ and do not individually reveal the real data. $x_{1}$ is obtained by combining $x_{1}^{a}$ and $x_{1}^{b}$ using Equation 6 below:

(6) $x_{1}=(x_{1}^{a}+x_{1}^{b})\ \mathrm{Mod}\ Q$

Extending the secret-sharing mechanism from $x_{1}$ to the whole feature vector $x$ of dimension $d$, the secret-share of input $x$ is obtained by applying Equations 4 and 5 element-wise:

Worker $A$: $x_{1}^{a},x_{2}^{a},\dots,x_{d}^{a}$
Worker $B$: $x_{1}^{b},x_{2}^{b},\dots,x_{d}^{b}$

Note that for $i\in[1,d]$, each $x_{i}^{a}$ and $x_{i}^{b}$ is computed using Equations 4 and 5, respectively.

4.4. Workers

Next, assume our network has a total of $l$ hidden layers, and let $d$ be the dimension of the input $x$ to be secret-shared with workers $A$ and $B$. The outputs of the $j^{th}$ neuron of the $1^{st}$ hidden layer for worker $A$ and worker $B$ are computed as follows:

(7) Worker $A$: $r^{a,1}_{j}(x^{a},W^{a})=h(\sum_{i=1}^{d}W^{a}_{ij}(1)x^{a}_{i}+b_{j})$
(8) Worker $B$: $r^{b,1}_{j}(x^{b},W^{b})=h(\sum_{i=1}^{d}W^{b}_{ij}(1)x^{b}_{i}+b_{j})$

Here, $W^{a}_{ij}(1)$ and $W^{b}_{ij}(1)$ are the weights from the input to the $j^{th}$ neuron of the $1^{st}$ hidden layer. For the $1^{st}$ hidden layer, we have the output neurons $[r^{a,1}_{1},r^{a,1}_{2},\dots,r^{a,1}_{k_{1}}]$ for worker $A$ and $[r^{b,1}_{1},r^{b,1}_{2},\dots,r^{b,1}_{k_{1}}]$ for worker $B$, where $k_{1}$ is the total number of neurons in the $1^{st}$ hidden layer. These outputs form the input vector to the $2^{nd}$ hidden layer. Accordingly, the output of the $j^{th}$ neuron of the $2^{nd}$ hidden layer is obtained as:

Worker $A$: $r^{a,2}_{j}(x^{a},W^{a})=h(\sum_{i=1}^{k_{1}}W^{a}_{ij}(2)r^{a,1}_{i})$
Worker $B$: $r^{b,2}_{j}(x^{b},W^{b})=h(\sum_{i=1}^{k_{1}}W^{b}_{ij}(2)r^{b,1}_{i})$

Similarly, the output of the $j^{th}$ neuron of the $l^{th}$ hidden layer is computed as follows:

Worker $A$: $r^{a,l}_{j}(x^{a},W^{a})=h(\sum_{i=1}^{k_{l-1}}W^{a}_{ij}(l)r^{a,l-1}_{i})$
Worker $B$: $r^{b,l}_{j}(x^{b},W^{b})=h(\sum_{i=1}^{k_{l-1}}W^{b}_{ij}(l)r^{b,l-1}_{i})$

Here, $W^{a}_{ij}(l)$ and $W^{b}_{ij}(l)$ are the weights from the $(l-1)^{th}$ hidden layer to the $j^{th}$ neuron of the $l^{th}$ hidden layer.

Each worker uses the secret-shared private input of the client and the private model parameters of each model owner to produce intermediate results which, when combined, produce the true inference output. Finally, the intermediate results of worker $A$ and worker $B$ for the $j^{th}$ output neuron are computed as:

(9) Worker $A$: $y^{a}_{j}(x^{a},W^{a})=h(\sum_{i=1}^{k_{l}}W^{a}_{ij}(l+1)r^{a,l}_{i})$
(10) Worker $B$: $y^{b}_{j}(x^{b},W^{b})=h(\sum_{i=1}^{k_{l}}W^{b}_{ij}(l+1)r^{b,l}_{i})$

The coefficients $W^{a}_{ij}(l+1)$ and $W^{b}_{ij}(l+1)$ are the weights from the final hidden layer $l$ to the $j^{th}$ neuron of the output layer for worker $A$ and worker $B$, respectively.

Finally, the resulting intermediate vector for the output layer of model owner $P_{i}$ with $o$ output neurons is $y_{P_{i}}^{a}=[y^{a}_{P_{i},1},y^{a}_{P_{i},2},\dots,y^{a}_{P_{i},o}]$ for worker $A$ and $y_{P_{i}}^{b}=[y^{b}_{P_{i},1},y^{b}_{P_{i},2},\dots,y^{b}_{P_{i},o}]$ for worker $B$. These vectors $y_{P_{i}}^{a}$ and $y_{P_{i}}^{b}$ are sent to the aggregator via a secure communication channel.

4.5. Aggregator

Reconstruction of Inference Results: When it receives the intermediate results $y_{a}$ and $y_{b}$ from the workers, the aggregator first combines them to obtain the true inference result $y$. Across the $y$'s corresponding to the model owners, the aggregator then performs majority-vote-based aggregation of the inference results. Next, the aggregator adds random noise sampled from the Laplace distribution to perturb the inference output (and hence deter membership-inference-style attacks). Lastly, the aggregator encrypts the inference output with the client's public key before sending it to the client.

In general, for each model owner $P_{i}$, the aggregator receives $y_{P_{i}}^{a}=[y^{a}_{P_{i},1},y^{a}_{P_{i},2},\dots,y^{a}_{P_{i},o}]$ and $y_{P_{i}}^{b}=[y^{b}_{P_{i},1},y^{b}_{P_{i},2},\dots,y^{b}_{P_{i},o}]$ from workers $A$ and $B$, respectively. To get the true inference value from the intermediate results, the aggregator combines them as follows:

(11) $y_{P_{i}}=(y_{P_{i}}^{a}+y_{P_{i}}^{b})\ \mathrm{Mod}\ Q$

As a result, the reconstructed inference result for model owner $P_{i}$ is $y_{P_{i}}=[y_{P_{i},1},y_{P_{i},2},\dots,y_{P_{i},o}]$ for $o$ output neurons. Hence, the output for $m$ model owners is represented as a vector of final inference results $[y_{P_{1}},y_{P_{2}},\dots,y_{P_{m}}]$.

Noisy Aggregation of Inference Results: The aggregator aggregates the inference results of the $m$ model owners as follows:

(12) $y_{noise}=\Phi_{i=1}^{m}\,y_{P_{i}}+Lap(\frac{s}{\epsilon})$

In Equation 12 above, $y_{noise}$ is the noisy aggregated output for the $m$ model owners, and $\Phi$ is the aggregation function (e.g., majority vote, confidence-weighted sum of inference probabilities). Here, $y_{noise}=[y_{1},y_{2},\dots,y_{o}]$ is an $o$-dimensional vector of noisy aggregated output inferences. The final label is produced using the $argmax$ operation as $y=argmax([y_{1},y_{2},\dots,y_{o}])$, where $y$ is the output label for input vector $x$. The aggregator encrypts this output with the client's public key and sends it to the client over a secure channel.
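
Putting Equations 11 and 12 together, the aggregator's pipeline can be sketched as follows. Majority vote is used as the aggregation function $\Phi$, the sensitivity is set to 1 (one model owner changing its vote moves a count by at most 1), and `decode` is a placeholder for mapping ring elements back to signed scores.

```python
import numpy as np

Q = 2_147_483_647

def reconstruct(y_a, y_b):
    # Equation (11): recover a model owner's o-dimensional output from its shares.
    return (np.asarray(y_a) + np.asarray(y_b)) % Q

def noisy_label(share_pairs, decode, epsilon, sensitivity=1.0):
    """Reconstruct every owner's output, aggregate by majority vote, add
    Laplace noise (Equation 12), and return the argmax label for the client."""
    rng = np.random.default_rng()
    # decode: placeholder that maps ring elements back to signed scores.
    outputs = [decode(reconstruct(y_a, y_b)) for y_a, y_b in share_pairs]
    votes = np.zeros(len(outputs[0]))
    for y in outputs:
        votes[int(np.argmax(y))] += 1               # each model owner votes once
    votes += rng.laplace(scale=sensitivity / epsilon, size=votes.shape)
    return int(np.argmax(votes))                    # label y sent (encrypted) to the client
```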

5. Evaluation

In this section, we evaluate pricure by answering the following research questions:

RQ1: What is the accuracy/privacy trade-off for pricure and how does it compare with prior work?

RQ2: What is the impact of the number of model owners on inference accuracy and privacy?

RQ3: What is the impact of differentially-private aggregation in reducing membership inference attack by a semi-honest client?

RQ4: What performance overhead is incurred by pricure overall, per-sample, and per-model?

5.1. Datasets, Model Architectures, and Setup

We use four datasets that focus on: handwritten digit recognition (MNIST (LeCun et al., 2020)), clothing image classification (Fashion-MNIST (Xiao et al., 2017)), breast cancer detection (IDC (Kaggle, 2020)), and in-ICU length-of-stay prediction for patients (MIMIC (Johnson et al., 2016)). We chose these datasets because they were used as benchmarks in related work (Papernot et al., 2017; Mohassel and Zhang, 2017; Riazi et al., 2018; Gilad-Bachrach et al., 2016) and also are relevant for privacy-sensitive domains (e.g., IDC and MIMIC are medical datasets). Next, we first briefly describe each dataset and then discuss model architectures and training setup.

MNIST is a collection of 70K gray-scale images of handwritten digits with training set of 60K images and a test set of 10K images. Each sample has a width-height dimension of 28x28 pixels of handwritten digits 0 to 9, which make up the 10 classes. We divide the 60K training samples into multiple chunks to train multiple models. For instance, for 50 model owners, we divide the training set into 50 disjoint datasets so that each model owner has 1.2K samples to train their model on. We use 500 samples from the test set for the evaluation of pricure.

Fashion-MNIST is a dataset of Zalando's article images with a 60K training set and a 10K test set. Each example is a 28x28 gray-scale image associated with a label from 10 classes (e.g., T-shirt/top, trouser, dress). For the highest number of model owners, 40, we divide the training set into 40 disjoint datasets so that each model owner has 1.5K samples to train its model. We use 500 samples from the test set to evaluate pricure.

IDC is the Invasive Ductal Carcinoma (IDC) dataset with 277,524 patches of size 50x50 extracted from 162 whole-mount slide images of breast cancer specimens scanned at 40x. Out of the 277,524 samples, 198,738 are IDC-negative (benign) and 78,786 are IDC-positive. Using under-sampling to balance positive and negative samples, we use 277,024 samples for training a binary classifier and 500 samples as the test set for pricure.

MIMIC is the Medical Information Mart for Intensive Care (MIMIC) dataset with 60K samples containing details such as demographics, vital signs, laboratory tests, medications, and personal information about patients. The classifier predicts the length of stay (LOS) of a patient in the hospital. We use a multi-class classifier where LOS is divided into four classes: class 0 (0-4 days), class 1 (4-8 days), class 2 (8-12 days), and class 3 (12 or more days). Our training dataset contains 58,386 samples and the test set has 590 samples.

Model Architectures and Setup: For all four datasets, we train an FFNN model. For MNIST and Fashion-MNIST, the input layer is $28\times 28$ and the output layer has 10 neurons, with two hidden layers with ReLU activation. The output layer is linear, and we use the mean squared error loss instead of the negative log-likelihood, since log-likelihood relies on the softmax function, which requires computing logarithms and exponentials that are not practical for pricure because we convert all real-number parameters to integers. Since our computations are over a finite integer field, we convert all floating-point tensors into fixed-precision tensors, keeping two decimal digits (e.g., .456 becomes 45). For MIMIC, we use one hidden layer, with 30 input neurons and 4 output neurons. For IDC, we use the same FFNN model where the number of input neurons is 7500 and the number of output neurons is 2. Model architecture details are described in the Appendix (Section 8.1). The learning rate is 0.001 across the four datasets. The number of epochs is 100 for MNIST and Fashion-MNIST, 80 for MIMIC, and 200 for IDC. Finally, for each dataset, each model owner secret-shares its model parameters with workers A and B.
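
A minimal sketch of the fixed-precision conversion described above; truncation is used here so that the encoding matches the .456 to 45 example, and the ring modulus $Q$ is illustrative.

```python
import numpy as np

PRECISION = 100            # keep two decimal digits (e.g., .456 -> 45)
Q = 2_147_483_647          # ring modulus (illustrative)

def to_fixed(t):
    """Encode a float tensor as integers in Z_Q."""
    return np.trunc(np.asarray(t) * PRECISION).astype(np.int64) % Q

def from_fixed(t):
    """Decode ring elements back to floats, mapping the upper half of Z_Q to negatives."""
    t = np.asarray(t).astype(np.int64)
    t = np.where(t > Q // 2, t - Q, t)
    return t / PRECISION

assert from_fixed(to_fixed(0.456)) == 0.45
assert from_fixed(to_fixed(-1.5)) == -1.5
```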

5.2. Accuracy/Privacy Trade-Off

We first analyze the inference accuracy of pricure with respect to the privacy budget $\epsilon$. Smaller $\epsilon$ values imply stronger privacy guarantees (e.g., against membership inference) and vice-versa. The ideal case is achieving high inference accuracy while providing strong privacy (the smallest $\epsilon$ value), but striking the sweet spot that fulfills both is generally non-trivial, for the trade-off depends on the training data, the number of model owners, and the aggregation scheme used to decide the final inference result. In Figures 2-5, the $x$-axis shows the privacy budget $\epsilon$ and the $y$-axis shows the inference accuracy of pricure. Across the four datasets, the smallest number of model owners is $m=10$, while the maximum is in the range 40-100 (e.g., 40 for Fashion-MNIST, 50 for MNIST, and 100 for IDC), depending on the size of the training set and inference accuracy.

On MNIST and Fashion-MNIST Benchmarks: Figure 2 shows inference accuracy with respect to the privacy budget $\epsilon$ (in the range $[10^{-2},1]$) for pricure. Note that we have divided the 60K MNIST training set into 10, 25, and 50 disjoint datasets with 6K, 2.4K, and 1.2K per-model-owner samples, respectively. The client has 500 input samples as the inference set. For MNIST, when the number of model owners is $m=50$, the inference accuracy is 55.53% for $\epsilon=0.03$. Compared to the non-private 92.4% inference accuracy for MNIST, we observe that pricure incurs significant accuracy loss for the privacy gain it provides in exchange. Interestingly, for $\epsilon=0.05$, the inference accuracy of pricure is 92.6% (i.e., no accuracy loss), and the inference accuracy remains fairly stable for $\epsilon\in[0.5,1]$ (see Figure 2). This observation is consistent with prior work on ensembles of differentially private teacher models by Papernot et al. (Papernot et al., 2017, 2018b). The best trade-off is $\epsilon=0.05$ with an inference accuracy of 92.6% when $m=50$.

Figure 2. MNIST: Accuracy/privacy trade-off for $\epsilon\in[0.01,1]$ and $m\in\{10,25,50\}$.
Figure 3. Fashion-MNIST: Accuracy/privacy trade-off for $\epsilon\in[0.01,1]$ and $m\in\{10,20,40\}$.
Figure 4. IDC: Accuracy/privacy trade-off for $\epsilon\in[0.01,1]$ and $m\in\{10,50,100\}$.
Figure 5. MIMIC: Accuracy/privacy trade-off for $\epsilon\in[0.01,1]$ and $m\in\{10,20,40\}$.

Figure 3 shows the accuracy/privacy trade-off analysis for Fashion-MNIST for 10, 20, and 40 model owners. For $m=40$, the inference accuracy is 82.8% with $\epsilon=0.1$. For $\epsilon<0.1$, the inference accuracy drops dramatically. For example, when $\epsilon=0.05$, the inference accuracy is 74.6%, which is smaller than our best-case accuracy on Fashion-MNIST of 82.8% when $m=40$. Note that the non-private inference accuracy is 82.8%, which, similar to MNIST, suggests that there is no accuracy loss. The best trade-off is obtained with $\epsilon=0.1$ and an inference accuracy of 82.8% for $m=40$.

On IDC and MIMIC: Figure 4 shows the accuracy/privacy trade-off analysis for IDC. When $m=100$, the inference accuracy is 71.6% for $\epsilon=0.05$, which is the optimal case in terms of the accuracy/privacy trade-off. Hence, pricure offers an acceptable inference accuracy for IDC with a smaller privacy budget, which offers a strong privacy guarantee. For $\epsilon<0.05$, the inference accuracy is lower; for example, for $\epsilon=0.03$, the inference accuracy is 71.6% with $m=100$, which suggests that the acceptable noise range for IDC is $\epsilon\in[0.05,1]$. Figure 5 shows the accuracy/privacy trade-off for MIMIC. The best-case inference accuracy is 76.6% with $\epsilon=0.05$ for $m=40$. For $\epsilon<0.05$, the inference accuracy is much lower (e.g., 34% for $\epsilon=0.01$).

Comparison with Closely Related Work: Papernot et al. (Papernot et al., 2017) show with a CNN model on MNIST that their non-private model is 99.18% accurate while the private ensemble of 250 teacher models is 93.18% accurate. In our case, the non-private FFNN model on MNIST is 96.6% accurate, while its private counterpart with 50 model owners is 92.6% accurate with $\epsilon=0.05$. We note that the initial differences in the non-private accuracy of the models could be attributed to the difference in model architectures, i.e., CNN (theirs) vs. FFNN (ours).

\bigstar Take-away: With respect to RQ1, our analysis of the accuracy/privacy trade-off suggests that pricure provides acceptable privacy guarantees with very minimal trade-off on inference accuracy, showing its practical viability in a multi-party setting.

5.3. Number of Model Owners vs. Accuracy and Privacy

As we indicated in Section 4.2, intuitively, a higher number of model owners $m$ would potentially entail better collective inference accuracy. However, this depends on the accuracy of each model and on the privacy budget. With that background, we here analyze the impact of $m$ on inference accuracy and the privacy budget.

On MNIST: From Figure 2, as $m$ increases from 10 to 25 and then 50, we notice a roughly consistent trend, especially for $\epsilon>0.03$. With a progressive increase in $\epsilon$ from 0.05 to 1, inference accuracy appears to be inversely related to the increase in $m$. Keeping the same trend for $\epsilon\in[0.01,1]$, the highest inference accuracy values are achieved for the lowest number of model owners ($m=10$), and the lowest inference accuracy values correspond to the highest number of model owners ($m=50$). In particular, for $m=50$, the inference accuracy is 92.4% (with $\epsilon=0.05$). On the other hand, inference accuracy jumps to 94.52% (with $\epsilon=0.3$) and then 96.56% (with $\epsilon=0.5$) for $m=25$ and $m=10$, respectively.

Overall, from these observations we conclude that a higher number of model owners (e.g., $m=50$) results in lower inference accuracy compared to a lower number of model owners (e.g., $m=25$, $m=10$). It is, however, noteworthy that the higher inference accuracy obtained with fewer model owners comes with relatively lower privacy guarantees. For instance, with $m=25$ and $m=10$, inference accuracies of 94.52% and 96.56% are achieved with $\epsilon=0.3$ and $\epsilon=0.5$, respectively.

Fashion-MNIST, IDC, and MIMIC: We observe similar results for Fashion-MNIST and IDC from Figures 3 and 4. For Fashion-MNIST, the best-case inference accuracy is 82.8% for $\epsilon=0.03$, 84.2% for $\epsilon=0.3$, and 84.39% for $\epsilon=0.5$, respectively, for 40, 20, and 10 model owners. For IDC, the best-case inference accuracy is 71.6% for $\epsilon=0.05$, 72.2% for $\epsilon=0.05$, and 73.4% for $\epsilon=0.3$, respectively, for 100, 50, and 10 model owners. As a result, for the Fashion-MNIST and IDC datasets, we observe that pricure achieves better inference accuracy for a smaller number of model owners, but with the caveat of a relatively lower privacy guarantee. For MIMIC, from Figure 5 we observe that for 40 model owners, the best-case inference accuracy is 76.6% with $\epsilon=0.05$, while for 10 model owners, the best-case inference accuracy is 80.53% with $\epsilon=0.3$, which further supports this observation.

\bigstar Take-away: With respect to RQ2, we note that a higher number of model owners $m$ tolerates a larger noise level (i.e., offers a stronger privacy guarantee), while for smaller $m$ the inference accuracy is more sensitive to noise. Our findings are in line with PATE (Papernot et al., 2017) in that a larger number of models results in a lower privacy budget on MNIST.

5.4. Utility of Differential Privacy Against Membership Inference Attack

We recall that one of the goals of pricure is to deter a semi-honest client who may mount a membership inference attack (MIA). To that end, we perform DP aggregation of inference results before we release each inference result to the client. The goal of MIA is to identify (with some confidence) whether a given sample belongs to the training set of a target model (Shokri et al., 2017). To examine the utility of DP in limiting MIA, we measure the accuracy, precision, and recall of MIA for both noiseless and noisy aggregation of inference results. Accuracy measures the percentage of examples that are correctly predicted to be members of the target model's training dataset. Precision measures the proportion of true membership inferences with respect to all reported attacks, while recall measures the coverage of the attack, i.e., the fraction of the training records that the attacker can correctly deduce as members of the training set. For this evaluation, we reproduce the MIA of Shokri et al. (Shokri et al., 2017), and we use the same number of member and non-member samples to maximize the inference uncertainty, with a baseline accuracy of 50%. We use Fashion-MNIST and MIMIC to examine the effect of differential privacy on MIA.

Figure 6. Membership inference attack accuracy comparison for the non-private and $\epsilon$-DP model on Fashion-MNIST.
Figure 7. Membership inference attack accuracy comparison for the non-private and $\epsilon$-DP model on MIMIC.

MIA Against the Fashion-MNIST Model: We use 5,000 samples to train the target model. There are 20 shadow models and their training set size is set to 5,000. The training datasets of the shadow models are disjoint from the training dataset of the target model. The shadow models are expected to mimic the behavior of the target model, as the target model and the shadow models are all trained on data coming from the same population. As a test dataset, we use samples with an equal number of members and non-members.
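
For reproducibility, the following is a simplified sketch of how the shadow-model attack dataset can be assembled. The attack of Shokri et al. trains one attack model per class; this sketch collapses that into a single binary classifier and assumes the shadow models are already trained and expose a scikit-learn style `predict_proba`.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def build_attack_dataset(shadow_models, shadow_splits):
    """Label each shadow model's prediction vectors as member (1) / non-member (0)."""
    X_att, y_att = [], []
    for model, (member_x, nonmember_x) in zip(shadow_models, shadow_splits):
        X_att.append(model.predict_proba(member_x))
        y_att.append(np.ones(len(member_x)))
        X_att.append(model.predict_proba(nonmember_x))
        y_att.append(np.zeros(len(nonmember_x)))
    return np.vstack(X_att), np.concatenate(y_att)

def train_attack_model(shadow_models, shadow_splits):
    X_att, y_att = build_attack_dataset(shadow_models, shadow_splits)
    return LogisticRegression(max_iter=1000).fit(X_att, y_att)

# The attack model is then evaluated on the target model's (noiseless or noisy)
# prediction vectors for an equal number of members and non-members.
```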

Figure 6 shows the per-class accuracy of MIA for both the private and non-private model on Fashion-MNIST. Without differentially private prediction, the maximum, minimum, and average MIA accuracy, respectively, is 69.3% (for class-4), 46% (for class-9), and 57.13% (over all 10 classes). On the other hand, when we add Laplacian noise with $\epsilon=0.1$ post-aggregation on each inference result, the average MIA accuracy is 51.17%, with a maximum of 54.1% for class-4 and a minimum of 45.3% for class-9. The attack accuracy degradation (by 5.96%, to be exact) implies that, due to the presence of Laplacian noise, the model owners' training data are less vulnerable to MIA. We also observe that adding more noise gradually decreases the MIA accuracy, further mitigating training data exposure.

The average precision over all classes without DP is 54.22%, meaning that for the non-private model, 54.22% of the images that were inferred as members by the attacker are true members. The average recall of the non-private model over all classes is 79.02%, implying that the attacker can correctly infer an average of 79.02% of the training images as members. With a privacy budget of 0.1, precision and recall are 51.2% and 46.24%, respectively. The lower precision (by 3.02%) and lower recall (by 32.78%) for the private model suggest that $\epsilon$-DP mitigates the membership inference attack.

MIA Against the MIMIC Model: We use 5,000 samples for both the target model and each shadow model, with 20 shadow models. We use the same number of members and non-members for evaluation. Figure 7 shows the per-class accuracy of MIA for both the private and non-private model on MIMIC. In the non-private setting, the maximum MIA accuracy is 69% for class-2 and the minimum MIA accuracy is 53% for class-0. The average MIA accuracy is 61.18% over all four classes, meaning that 61.18% of the examples of the target model's training data are correctly predicted to be members. With a privacy budget of $\epsilon=0.1$, the maximum MIA accuracy is 52.3% for class-1 and the minimum MIA accuracy is 49% for class-3. In this setting, the average MIA accuracy is 52.1% across all four classes. The average accuracy degradation is 9.08%, which suggests that $\epsilon$-DP mitigates MIA by 9.08%. Looking at precision and recall for the non-private model, we obtain 56.47% and 96.2%, respectively. With a privacy budget of 0.1, precision and recall are 50.95% and 51.48%, respectively. The lower precision (by 5.52%) and lower recall (by 44.72%) for the private model suggest that $\epsilon$-DP hinders the attacker's pursuit of members' records.

Comparison with Closely Related Work: While PATE (Papernot et al., 2017) discusses an intuitive guarantee against MIA, it does not provide a quantitative measurement of MIA with respect to noise scale. In (Rahman et al., 2018), the effect of DP-based training against MIA is examined. On MNIST, with $\epsilon=2$ and $\epsilon=1$, they show that DP can eventually mitigate MIA accuracy and bring it close to the baseline. Unlike (Rahman et al., 2018), in pricure we apply $\epsilon$-DP after aggregating the inference results of model owners, with sizable MIA mitigation within our noise boundary.

\bigstar Take-away: With respect to RQ3, differentially private aggregation limits membership inference attacks in a way that minimizes both the inference about members and the coverage of the attack.

5.5. Performance Overhead Analysis

Figure 8. Per-model secret-sharing overhead in milliseconds.
Figure 9. Overall per-sample inference overhead in milliseconds with SMPC-DP (pricure) and without SMPC-DP (non-private).

We evaluate the overall, per-sample, and per-model performance overhead incurred by pricure. In particular, we measure time taken by a) a model owner to secret-share their model parameters with the workers and b) per-sample overhead for inference.

Hardware Specification: To run our experiments, we used a Google Colab virtual machine, which provides 2 Intel(R) Xeon(R) model 79, family 6 CPUs at 2.2GHz each, with 12.72GB of RAM and 107.77GB of disk space. To speed up computations, we used Python's multi-threading features to efficiently utilize the two CPUs at a time, dividing the task between them to obtain output inferences from two model owners in one cycle.
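
A minimal sketch of the parallelization described above, assuming a per-model secure inference routine `infer(model_shares, input_shares)` (a placeholder for the worker computation of Section 4.4).

```python
from concurrent.futures import ThreadPoolExecutor

def run_cycle(model_pair, infer, input_shares):
    """Evaluate two model owners' secret-shared models in parallel on two CPUs."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = [pool.submit(infer, shares, input_shares) for shares in model_pair]
        return [f.result() for f in futures]
```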

Per-model Overhead for Secret-Sharing: Figure 8 shows the duration of secret-sharing of model parameters across the four datasets and for different numbers of model owners. This performance is influenced by the number of model parameters ($W$ and $b$). The $x$-axis shows the different datasets and the $y$-axis shows the time needed, in milliseconds, to share a single model owner's parameters with the two workers. Overall, the per-model secret-sharing overhead is insignificant. Comparatively, IDC incurs more delay (up to 392.4ms) in model parameter secret-sharing because it produces far more model parameters than the others. The other datasets need much less time, in the range of 8ms-48ms, to share their model parameters, which is negligible.

Per-sample Overhead for Inference: In Figure 9, the $y$-axis shows the overall duration (in $\log_{10}$ scale) to produce an inference result for an input with SMPC-DP (pricure) and without SMPC-DP (non-private). This performance is affected by the number of model owners $m$, the number of classes, and the feature vector dimension of the input. From Figure 9, across the board pricure incurs a significant overhead to produce the inference result. For example, on MNIST, pricure requires 772.56s to produce an inference result for one input (with $m=50$), while producing a non-private prediction takes just 0.025s. While the overhead seems significant on the surface, given the commodity hardware we used, and since the computation can be accelerated with parallelization and GPUs, we do not consider the overhead to be a deal-breaker for the practical deployability of pricure.

Comparison with Closely Related Work: SecureNN (Wagh et al., 2018) proposes a DNN-centric protocol for secure training and prediction in 3-party and 4-party setups. It was run on Amazon EC2 c4.8xlarge instances in LAN and WAN settings with average bandwidths of 625 MB/s and 40 MB/s, respectively. For per-sample inference on MNIST, 3-party and 4-party SecureNN require 0.34 s and 0.23 s, respectively. By contrast, with a 50-party setup (m = 50), pricure requires 772.56 s to compute the inference result for an image. Note that for m = 1, pricure takes 1.5 s per prediction, which is comparable to SecureNN given the differences in hardware and network latency.

Chameleon (Riazi et al., 2018) is evaluated with a 5-layer CNN on MNIST that outputs the inference label. Experiments were run on an Intel Xeon E5-1620 CPU at 3.5 GHz with 16 GB of RAM. Per-sample classification took 2.7 s with a test accuracy of 99%. In pricure, m is more than an order of magnitude larger (e.g., m = 50 or m = 100, depending on the dataset) and the inference is collaborative.

CryptoNets (Gilad-Bachrach et al., 2016) is a cloud-based platform in which the server computes encrypted predictions over the client's encrypted data, so that neither party's data is revealed to the other. CryptoNets was run on hardware similar to that of (Mohassel and Zhang, 2017), and per-sample classification took 297.5 s with 99% test accuracy.

\bigstar Take-away: With regard to RQ4, secret-sharing model parameters and aggregating inference vectors incur negligible overhead, whereas pricure's inference overhead is significant compared to its non-private counterpart. We therefore see room for optimization through hardware acceleration, parallelization, and efficiency advances in SMPC.

5.6. Discussion on Limitations

Scalability: pricure's performance depends on the number of model owners and on the complexity of models and inputs. Other factors kept constant, increasing the number of model owners slows down inference, because more models place more load on the workers. Model complexity and architecture also play an important role: details such as the number of hidden layers, activation functions, and pooling operations determine the efficiency of pricure. The dimensionality of input feature vectors likewise affects prediction speed and the size of the learned parameters. For instance, the 28×28-pixel images of MNIST result in a far more lightweight model than the 50×50-pixel images of IDC. While our evaluations are encouraging as to the real-life deployability of pricure, its scalability to more complex model architectures is worth exploring as future work. In addition, as the number of models increases, the load on the workers will eventually become too heavy, which calls for performance enhancements such as hardware acceleration, multi-threading, and additional workers (albeit at the cost of more complexity).

Performance Overhead Measurements: Due to hardware resource limitations, we measured pricure's performance overhead on a single machine. As a result, some measurements (e.g., secret-sharing delay, sending intermediate results from workers to the aggregator) may not fully capture an actual distributed setup. As in prior work (e.g., (Wagh et al., 2018)), production-grade overhead measurements should be done in a distributed setup so as to account for the network latency of secret-sharing model parameters, computing aggregated inference results, and sending them to the client.

6. Related Work

We discuss related work in the context of pricure. For a detailed survey, we refer the reader to (Cristofaro, 2020) and (Papernot et al., 2018a).

Papernot et al. (Papernot et al., 2017, 2018b) train an ensemble of ‘teacher’ models on disjoint private data and use the teachers as labeling oracles to train a ‘student’ model via noisy aggregation of the teachers’ votes. PATE ensures a strong privacy guarantee for the training data, yet assumes the test dataset is public/non-private. Unlike PATE, in pricure both the model owners’ training data and the clients’ input samples are private.

Abadi et al. (Abadi et al., 2016) leverage differential privacy to train deep neural networks with built-in privacy through differentially-private stochastic gradient descent (DP-SGD). While DP-SGD offers strong privacy guarantees for training data, it is not intended for the collaborative inference setup that we explore in pricure with private models and private inputs.

Jayaraman et al. (Jayaraman et al., 2018) combine differential privacy and secure multiparty computation to enable privacy-preserving collaborative training in a distributed setting. They explore output perturbation and gradient perturbation. In the output perturbation setting, parties combine local models within a secure computation and then add the differential privacy noise before revealing the model. In the collaborative inference setting, pricure is similar to this approach because securely computed inference results are aggregated and differential privacy noise is added before the final inference result is revealed to the client. In the gradient perturbation method, the data owners collaboratively train a global model.

Lindell and Pinkas (Lindell and Pinkas, 2000) propose a system in which two parties, each owning a private database with the same feature types, collaborate to run the ID3 learning algorithm over the union of their databases without revealing too much about their data to each other. In the sense of multiple model owners collaboratively predicting on a private input, this approach is broadly similar to pricure.

Graepel et al. (Graepel et al., 2012) study FHE-based outsourcing of a machine learning algorithm while retaining confidentiality of the training and test data. Bost et al. (Bost et al., 2015) design and evaluate CryptoML for privacy-preserving classification over encrypted data, where the model provider’s training data and the client’s test data remain unknown to each other. Riazi et al. (Riazi et al., 2018) propose Chameleon to enable two parties to jointly compute a function securely without disclosing their private inputs; Chameleon is based on additive secret sharing for linear operations and garbled circuits for non-linear operations. Wagh et al. (Wagh et al., 2018) design and evaluate a 3-party and 4-party secure computation setting over common DNN architectures such that no single party can learn others’ private data. Giacomelli et al. (Giacomelli et al., 2018) propose a privacy-preserving collaborative approach for random forests where multiple parties share their private models securely and produce an encrypted prediction for a client’s private input.

7. Conclusion

We presented pricure, a system that leverages the orthogonal privacy guarantees of secure multi-party computation and differential privacy to enable multiple parties that own private models to collaborate on a common ML task, such as classification of a privacy-sensitive medical image owned by a client. At the core of pricure is additive secret sharing, which ensures that after an iteration of collaborative inference on a private input, model owners learn nothing about the input and clients learn nothing about the model parameters behind the collaborative inference. In addition, through differentially-private inference aggregation, pricure limits membership inference attacks by a semi-honest client. On commodity hardware, we demonstrate the practical viability of pricure on four datasets, for tens of model owners, with an acceptable accuracy/privacy trade-off and a performance overhead amenable to speed-up using GPUs, parallelization, and more efficient secure multi-party computation protocols.

References

  • Abadi et al. (2016) Martín Abadi, Andy Chu, Ian J. Goodfellow, H. Brendan McMahan, Ilya Mironov, Kunal Talwar, and Li Zhang. 2016. Deep Learning with Differential Privacy. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, Vienna, Austria, October 24-28, 2016. ACM, 308–318.
  • Angelini et al. (2008) Eliana Angelini, Giacomo di Tollo, and Andrea Roli. 2008. A neural network approach for credit risk evaluation. The Quarterly Review of Economics and Finance 48, 4 (Nov. 2008), 733–755. https://doi.org/10.1016/j.qref.2007.04.001
  • Attema et al. (2018) Thomas Attema, Emiliano Mancini, Gabriele Spini, Mark Abspoel, Jan de Gier, Serge Fehr, Thijs Veugen, Maran van Heesch, Daniël Worm, Andrea De Luca, Ronald Cramer, and Peter M. A. Sloot. 2018. A New Approach to Privacy-Preserving Clinical Decision Support Systems for HIV Treatment. CoRR abs/1810.01107 (2018).
  • Bayatbabolghani and Blanton (2020) Fattaneh Bayatbabolghani and Marina Blanton. 2020. Secure Multi-Party Computation. In PAI Workshop held in conjunction with AAAI’2020.
  • Beaver (2012) D. Beaver. 2012. Efficient Multiparty Protocols using Circuit Randomisation. Springer 576 (2012).
  • Ben-Or et al. (1988) Michael Ben-Or, Shafi Goldwasser, and Avi Wigderson. 1988. Completeness Theorems for Non-Cryptographic Fault-Tolerant Distributed Computation. In Proceedings of the 20th Annual ACM Symposium on Theory of Computing, May 2-4, 1988, Chicago, Illinois, USA, Janos Simon (Ed.). ACM, 1–10.
  • Bost et al. (2015) Raphael Bost, Raluca Ada Popa, Stephen Tu, and Shafi Goldwasser. 2015. Machine Learning Classification over Encrypted Data. In 22nd Annual Network and Distributed System Security Symposium, NDSS 2015, San Diego, California, USA, February 8-11, 2015. The Internet Society.
  • Chaudhuri et al. (2011) Kamalika Chaudhuri, Claire Monteleoni, and Anand D. Sarwate. 2011. Differentially Private Empirical Risk Minimization. J. Mach. Learn. Res. 12 (2011), 1069–1109.
  • Cristofaro (2020) Emiliano De Cristofaro. 2020. An Overview of Privacy in Machine Learning. CoRR abs/2005.08679 (2020). arXiv:2005.08679 https://arxiv.org/abs/2005.08679
  • Dahl et al. (2012) G. E. Dahl, Dong Yu, Li Deng, and A. Acero. 2012. Context-Dependent Pre-Trained Deep Neural Networks for Large-Vocabulary Speech Recognition. IEEE Transactions on Audio, Speech, and Language Processing 20, 1 (Jan. 2012), 30–42. https://doi.org/10.1109/tasl.2011.2134090
  • Damgård et al. (2012) Ivan Damgård, Valerio Pastro, Nigel P. Smart, and Sarah Zakarias. 2012. Multiparty Computation from Somewhat Homomorphic Encryption. In Advances in Cryptology - CRYPTO 2012 - 32nd Annual Cryptology Conference, Santa Barbara, CA, USA, August 19-23, 2012. Proceedings (Lecture Notes in Computer Science), Reihaneh Safavi-Naini and Ran Canetti (Eds.), Vol. 7417. Springer, 643–662.
  • Dwork et al. (2006) Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam D. Smith. 2006. Calibrating Noise to Sensitivity in Private Data Analysis. In Theory of Cryptography, Third Theory of Cryptography Conference, TCC 2006, New York, NY, USA, March 4-7, 2006, Proceedings (Lecture Notes in Computer Science), Shai Halevi and Tal Rabin (Eds.), Vol. 3876. Springer, 265–284.
  • Fredrikson et al. (2015) Matt Fredrikson, Somesh Jha, and Thomas Ristenpart. 2015. Model Inversion Attacks that Exploit Confidence Information and Basic Countermeasures. In Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, Denver, CO, USA, October 12-16, 2015. 1322–1333.
  • Gao et al. (2019) Feng Gao, Wei Wang, Miaomiao Tan, Lina Zhu, Yuchen Zhang, Evelyn Fessler, Louis Vermeulen, and Xin Wang. 2019. DeepCC: a novel deep learning-based framework for cancer molecular subtype classification. Oncogenesis 8, 9 (Aug. 2019). https://doi.org/10.1038/s41389-019-0157-8
  • Giacomelli et al. (2018) Irene Giacomelli, Somesh Jha, Ross Kleiman, David Page, and Kyonghwan Yoon. 2018. Privacy-Preserving Collaborative Prediction using Random Forests. CoRR abs/1811.08695 (2018). arXiv:1811.08695 http://arxiv.org/abs/1811.08695
  • Gilad-Bachrach et al. (2016) Ran Gilad-Bachrach, Nathan Dowlin, Kim Laine, Kristin E. Lauter, Michael Naehrig, and John Wernsing. 2016. CryptoNets: Applying Neural Networks to Encrypted Data with High Throughput and Accuracy. In Proceedings of the 33nd International Conference on Machine Learning, ICML 2016, New York City, NY, USA, June 19-24, 2016 (JMLR Workshop and Conference Proceedings), Maria-Florina Balcan and Kilian Q. Weinberger (Eds.), Vol. 48. JMLR.org, 201–210.
  • Graepel et al. (2012) Thore Graepel, Kristin E. Lauter, and Michael Naehrig. 2012. ML Confidential: Machine Learning on Encrypted Data. In Information Security and Cryptology - ICISC 2012 - 15th International Conference, Seoul, Korea, November 28-30, 2012, Revised Selected Papers (Lecture Notes in Computer Science), Taekyoung Kwon, Mun-Kyu Lee, and Daesung Kwon (Eds.), Vol. 7839. Springer, 1–21.
  • Haagh et al. (2018) Helene Haagh, Aleksandr Karbyshev, Sabine Oechsner, Bas Spitters, and Pierre-Yves Strub. 2018. Computer-Aided Proofs for Multiparty Computation with Active Security. In 31st IEEE Computer Security Foundations Symposium, CSF 2018, Oxford, United Kingdom, July 9-12, 2018. IEEE Computer Society, 119–131.
  • Jacobson (1957) Nathan Jacobson. 1957. Structure of Rings. In Bull. Amer. Math. Soc. 63 (1957). 46–50.
  • Jayaraman et al. (2018) Bargav Jayaraman, Lingxiao Wang, David Evans, and Quanquan Gu. 2018. Distributed Learning without Distress: Privacy-Preserving Empirical Risk Minimization. In Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, NeurIPS 2018, 3-8 December 2018, Montréal, Canada. 6346–6357.
  • Joan Feigenbaum and Saint-Jean (2004) Joan Feigenbaum, Benny Pinkas, Raphael Ryger, and Felipe Saint-Jean. 2004. Secure Computation of Surveys. In EU Workshop on Secure Multiparty Protocols, 2004.
  • Johnson et al. (2016) Alistair E.W. Johnson, Tom J. Pollard, Lu Shen, Li-wei H. Lehman, Mengling Feng, Mohammad Ghassemi, Benjamin Moody, Peter Szolovits, Leo Anthony Celi, and Roger G. Mark. 2016. MIMIC-III, a freely accessible critical care database. Scientific Data 3, 1 (24 May 2016), 160035. https://doi.org/10.1038/sdata.2016.35
  • Kaggle (2020) Kaggle. 2020. Invasive Ductal Carcinoma Dataset. https://www.kaggle.com/paultimothymooney/breast-histopathology-images.
  • Krizhevsky et al. (2017) Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. 2017. ImageNet classification with deep convolutional neural networks. Commun. ACM 60, 6 (2017), 84–90.
  • LeCun et al. (2020) Yann LeCun, Corinna Cortes, and Christopher J.C. Burges. 2020. The MNIST Database of Handwritten Digits. http://yann.lecun.com/exdb/mnist/.
  • Lindell and Pinkas (2000) Yehuda Lindell and Benny Pinkas. 2000. Privacy Preserving Data Mining. In Advances in Cryptology - CRYPTO 2000, 20th Annual International Cryptology Conference, Santa Barbara, California, USA, August 20-24, 2000, Proceedings (Lecture Notes in Computer Science), Mihir Bellare (Ed.), Vol. 1880. Springer, 36–54.
  • Liu et al. (2017) Jian Liu, Mika Juuti, Yao Lu, and N. Asokan. 2017. Oblivious Neural Network Predictions via MiniONN Transformations. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, CCS 2017, Dallas, TX, USA, October 30 - November 03, 2017, Bhavani M. Thuraisingham, David Evans, Tal Malkin, and Dongyan Xu (Eds.). ACM, 619–631.
  • Mohassel and Zhang (2017) Payman Mohassel and Yupeng Zhang. 2017. SecureML: A System for Scalable Privacy-Preserving Machine Learning. In 2017 IEEE Symposium on Security and Privacy, SP 2017, San Jose, CA, USA, May 22-26, 2017. IEEE Computer Society, 19–38.
  • Nair et al. (2015) Divya G. Nair, V. P. Binu, and G. Santhosh Kumar. 2015. An Improved E-voting scheme using Secret Sharing based Secure Multi-party Computation. CoRR abs/1502.07469 (2015).
  • OpenMined (2020) OpenMined. 2020. PySyft. https://github.com/OpenMined/PySyft/blob/master/syft/frameworks/torch/mpc/securenn.py.
  • Papernot et al. (2017) Nicolas Papernot, Martín Abadi, Úlfar Erlingsson, Ian J. Goodfellow, and Kunal Talwar. 2017. Semi-supervised Knowledge Transfer for Deep Learning from Private Training Data. In 5th International Conference on Learning Representations, ICLR 2017.
  • Papernot et al. (2018a) Nicolas Papernot, Patrick D. McDaniel, Arunesh Sinha, and Michael P. Wellman. 2018a. SoK: Security and Privacy in Machine Learning. In 2018 IEEE European Symposium on Security and Privacy, EuroS&P 2018, London, United Kingdom, April 24-26, 2018. 399–414.
  • Papernot et al. (2018b) Nicolas Papernot, Shuang Song, Ilya Mironov, Ananth Raghunathan, Kunal Talwar, and Úlfar Erlingsson. 2018b. Scalable Private Learning with PATE. In 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings.
  • Rahman et al. (2018) Md. Atiqur Rahman, Tanzila Rahman, Robert Laganière, and Noman Mohammed. 2018. Membership Inference Attack against Differentially Private Deep Learning Model. Trans. Data Priv. 11, 1 (2018), 61–79.
  • Riazi et al. (2018) M. Sadegh Riazi, Christian Weinert, Oleksandr Tkachenko, Ebrahim M. Songhori, Thomas Schneider, and Farinaz Koushanfar. 2018. Chameleon: A Hybrid Secure Computation Framework for Machine Learning Applications. In Proceedings of the 2018 on Asia Conference on Computer and Communications Security, AsiaCCS 2018, Incheon, Republic of Korea, June 04-08, 2018, Jong Kim, Gail-Joon Ahn, Seungjoo Kim, Yongdae Kim, Javier López, and Taesoo Kim (Eds.). ACM, 707–721.
  • Sallab et al. (2017) Ahmad El Sallab, Mohammed Abdou, Etienne Perot, and Senthil Kumar Yogamani. 2017. Deep Reinforcement Learning framework for Autonomous Driving. CoRR abs/1704.02532 (2017).
  • Shamir (1979) Adi Shamir. 1979. How to Share a Secret. Commun. ACM 22, 11 (1979), 612–613.
  • Shokri et al. (2017) Reza Shokri, Marco Stronati, Congzheng Song, and Vitaly Shmatikov. 2017. Membership Inference Attacks Against Machine Learning Models. In 2017 IEEE Symposium on Security and Privacy, SP 2017, San Jose, CA, USA, May 22-26, 2017. IEEE Computer Society, 3–18.
  • Tramèr et al. (2016) Florian Tramèr, Fan Zhang, Ari Juels, Michael K. Reiter, and Thomas Ristenpart. 2016. Stealing Machine Learning Models via Prediction APIs. In 25th USENIX Security Symposium, USENIX Security 16, Austin, TX, USA, August 10-12, 2016. 601–618.
  • Wagh et al. (2018) Sameer Wagh, Divya Gupta, and Nishanth Chandran. 2018. SecureNN: Efficient and Private Neural Network Training. IACR Cryptol. ePrint Arch. 2018 (2018), 442.
  • Wagh et al. (2019) Sameer Wagh, Divya Gupta, and Nishanth Chandran. 2019. SecureNN: 3-Party Secure Computation for Neural Network Training. Proc. Priv. Enhancing Technol. 2019, 3 (2019), 26–49.
  • Xiao et al. (2017) Han Xiao, Kashif Rasul, and Roland Vollgraf. 2017. Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms. arXiv:cs.LG/cs.LG/1708.07747
  • Yao (1986) Andrew Chi-Chih Yao. 1986. How to Generate and Exchange Secrets (Extended Abstract). In 27th Annual Symposium on Foundations of Computer Science, Toronto, Canada, 27-29 October 1986. IEEE Computer Society, 162–167.

8. Appendix

8.1. Model Architectures Used for Evaluation

Model Architecture of MNIST and Fashion-MNIST
  • Number of input neurons = 28×28
  • Number of output neurons = 10
  • Number of hidden layers l = 2
  • Number of neurons in 1st layer = 128
  • Number of neurons in 2nd layer = 64
  • Activation function: ReLU
  • Error estimation method: MSE
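For concreteness, the listing above might be rendered in PyTorch roughly as follows (the IDC and MIMIC models below are analogous, with their own layer sizes). This is a sketch of the stated hyper-parameters, not pricure's exact training code, and MSE training assumes one-hot targets.

```python
import torch
import torch.nn as nn

class MnistNet(nn.Module):
    """Fully connected network matching the listing above: 28x28 inputs,
    hidden layers of 128 and 64 units, 10 outputs, ReLU activations."""
    def __init__(self):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Flatten(),
            nn.Linear(28 * 28, 128), nn.ReLU(),
            nn.Linear(128, 64), nn.ReLU(),
            nn.Linear(64, 10),
        )

    def forward(self, x):
        return self.layers(x)

model = MnistNet()
criterion = nn.MSELoss()                     # error estimation method: MSE
scores = model(torch.randn(1, 1, 28, 28))    # one dummy 28x28 input
```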

Model Architecture of IDC Regular
  • Number of input neurons = 7500
  • Number of output neurons = 2
  • Number of hidden layers l = 1
  • Number of neurons in hidden layer = 500
  • Activation function: ReLU
  • Error estimation method: MSE

Model Architecture of MIMIC Critical Care
  • Number of input neurons = 30
  • Number of output neurons = 4
  • Number of hidden layers l = 1
  • Number of neurons in hidden layer = 500
  • Activation function: ReLU
  • Error estimation method: MSE

8.2. Illustration of Additive Secret Sharing

For additive secret sharing, we first fix a finite field {0, 1, 2, ..., Q-1} to which all secret-shares belong, where Q is a very large prime. Suppose we split x into i shares x_1, x_2, ..., x_i, where the first i-1 shares are chosen at random and the i-th share is x_i = x - (x_1 + x_2 + ... + x_{i-1}); reconstruction is then x = \sum_{k=1}^{i} x_k (Haagh et al., 2018). For two workers A and B, an input value x_1 is split into two secret-shares and sent to A and B as follows:

A: x_1^a = Random(-Q, Q)
B: x_1^b = x_1 - x_1^a

The reconstruction is done using the following procedure:

x_1 = (x_1^a + x_1^b) mod Q

The reconstruction can be proved as follows:

(x_1^a + x_1^b) mod Q = (x_1^a + ((x_1 - x_1^a) mod Q)) mod Q
= ((x_1^a mod Q) + (x_1 mod Q) - (x_1^a mod Q)) mod Q
= x_1 mod Q    [(a mod Q) mod Q = a mod Q]
= x_1          [Q >> x_1, Q a large prime]
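A minimal Python sketch of this two-worker sharing and reconstruction follows; the choice of Q and the use of Python's random module are illustrative.

```python
import random

Q = 2**61 - 1  # a large prime, with Q >> x1

def share(x1):
    """Split x1 into two additive shares, one per worker."""
    x1_a = random.randrange(-Q, Q)   # worker A's share, drawn at random
    x1_b = (x1 - x1_a) % Q           # worker B's share
    return x1_a, x1_b

def reconstruct(x1_a, x1_b):
    """Recover x1 from the two shares."""
    return (x1_a + x1_b) % Q

x1 = 42
a, b = share(x1)
assert reconstruct(a, b) == x1
```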

For summation, suppose we want to compute x + y via secret sharing. We split x and y using Equations 4 and 5 as follows:

A: x^a = Random(-Q, Q),  y^a = Random(-Q, Q)
B: x^b = (x - x^a) mod Q,  y^b = (y - y^a) mod Q

Now, the system sends x^a and y^a to worker A and x^b and y^b to worker B. The summation operation on each worker side will be:

A: x^a + y^a
B: x^b + y^b

The actual sum is reconstructed using Equation 6 as follows:

sum = (x^a + y^a + x^b + y^b) mod Q

This reconstruction can be proved as follows:

(x^a + y^a + x^b + y^b) mod Q
= ((x^a mod Q) + (y^a mod Q) - (x^a mod Q) - (y^a mod Q) + (x mod Q) + (y mod Q)) mod Q
= ((x mod Q) + (y mod Q)) mod Q    [(a mod Q) mod Q = a mod Q]
= x + y    [Q >> x, y]

Since Q is a large prime compared to x and y, the above illustration shows that additive secret sharing correctly reconstructs x + y.
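Continuing the sketch above (reusing Q and share()), addition on shares is purely local to each worker, and the sum is reconstructed modulo Q.

```python
# Reuses Q and share() from the sketch above.
x, y = 7, 35
x_a, x_b = share(x)
y_a, y_b = share(y)

sum_a = x_a + y_a            # computed locally by worker A
sum_b = x_b + y_b            # computed locally by worker B

assert (sum_a + sum_b) % Q == x + y
```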

8.3. Illustration of SPDZ

For two numbers x and y, our goal is to compute x·y securely. We split them using Equations 4 and 5 as follows:

A: x^a = Random(-Q, Q),  y^a = Random(-Q, Q)
B: x^b = (x - x^a) mod Q,  y^b = (y - y^a) mod Q

The multiplication operation will be:

(x^a + x^b)(y^a + y^b) = x^a y^a + x^a y^b + x^b y^a + x^b y^b.

We can observe from the above that to compute x^a y^b and x^b y^a, workers A and B need each other's shares, which conflicts with the privacy assumptions of our approach. To resolve this, the SPDZ protocol (Beaver, 2012) is used to perform multiplication on secret shares. In this technique, values called 'multiplication triplets', independent of x and y, are generated in an offline phase: three numbers q_1, q_2, and q_3 are generated and shared with the two parties ahead of running the protocol, as follows:

q_1 = Random(-Q, Q)
q_2 = Random(-Q, Q)
q_3 = (q_1 × q_2) mod Q

Now, the shares of q_1, q_2, and q_3 are distributed to the workers, and worker A computes:

α^a = x^a - q_1^a
β^a = y^a - q_2^a

And for worker B:

α^b = x^b - q_1^b
β^b = y^b - q_2^b

The workers then reconstruct α and β using Equation 6; these two values become public to workers A and B. The intermediate shares of the product x·y are computed locally (worker A additionally adds the public term αβ):

Worker A: (xy)^a = q_3^a + α q_2^a + β q_1^a + αβ
Worker B: (xy)^b = q_3^b + α q_2^b + β q_1^b

Reconstructing (xy)^a and (xy)^b using Equation 6 yields the true product xy.
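The whole multiplication protocol can be sketched end-to-end as below. For simplicity, the shares are drawn from [0, Q) rather than (-Q, Q), the triple-generating dealer is folded into the same function, and worker A carries the public αβ correction term; the helper names are illustrative.

```python
import random

Q = 2**61 - 1  # large prime field modulus

def share(v):
    """Additively share v modulo Q between workers A and B."""
    a = random.randrange(Q)
    return a, (v - a) % Q

def reconstruct(a, b):
    return (a + b) % Q

def beaver_multiply(x, y):
    # Offline phase: generate the triple (q1, q2, q3 = q1*q2 mod Q) and share it.
    q1, q2 = random.randrange(Q), random.randrange(Q)
    q3 = (q1 * q2) % Q
    q1_a, q1_b = share(q1)
    q2_a, q2_b = share(q2)
    q3_a, q3_b = share(q3)

    # Online phase: share the inputs.
    x_a, x_b = share(x)
    y_a, y_b = share(y)

    # Each worker masks its input shares; alpha and beta are reconstructed and made public.
    alpha = reconstruct(x_a - q1_a, x_b - q1_b)   # alpha = (x - q1) mod Q
    beta = reconstruct(y_a - q2_a, y_b - q2_b)    # beta  = (y - q2) mod Q

    # Local share computation; worker A additionally adds the public alpha*beta term.
    xy_a = (q3_a + alpha * q2_a + beta * q1_a + alpha * beta) % Q
    xy_b = (q3_b + alpha * q2_b + beta * q1_b) % Q
    return reconstruct(xy_a, xy_b)

assert beaver_multiply(6, 7) == 42
```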