PRICURE: Privacy-Preserving Collaborative Inference in a Multi-Party Setting
Abstract.
When multiple parties that deal with private data aim for a collaborative prediction task such as medical image classification, they are often constrained by data protection regulations and lack of trust among collaborating parties. If done in a privacy-preserving manner, predictive analytics can benefit from the collective prediction capability of multiple parties holding complementary datasets on the same machine learning task. This paper presents pricure, a system that combines complementary strengths of secure multi-party computation (SMPC) and differential privacy (DP) to enable privacy-preserving collaborative prediction among multiple model owners. SMPC enables secret-sharing of private models and client inputs with non-colluding secure servers to compute predictions without leaking model parameters and inputs. DP masks true prediction results via noisy aggregation so as to deter a semi-honest client who may mount membership inference attacks. We evaluate pricure on neural networks across four datasets including benchmark medical image classification datasets. Our results suggest pricure guarantees privacy for tens of model owners and clients with acceptable accuracy loss. We also show that DP reduces membership inference attack exposure without hurting accuracy.
1. Introduction
Machine learning (ML) is being used in many application domains such as image classification (Krizhevsky et al., 2017), voice recognition (Dahl et al., 2012), medical diagnosis (Gao et al., 2019; Johnson et al., 2016; Kaggle, 2020), finance (e.g., credit risk assessment) (Angelini et al., 2008), and autonomous driving (Sallab et al., 2017). An emerging paradigm in ML is machine learning as a service (MLaaS), where clients submit inputs to obtain predictions via a cloud-based prediction API. When client inputs are privacy-sensitive (e.g., patient data), the MLaaS platform is expected to comply with privacy protection regulations (e.g., HIPAA in the United States) to preserve the privacy of inputs (e.g., patient diagnosis details). Moreover, clients may be reluctant to submit inputs that, if revealed to others, may jeopardize their competitive advantage (e.g., inputs concerning intellectual property).
In a collaborative setting where multiple MLaaS providers own private models trained on their respective private data and clients own private input samples, clients can benefit from the collective inference capability of multiple model owners for tasks such as medical image classification. A practical way for multiple MLaaS providers to collaborate on a common ML task is therefore to keep their respective models (trained on private data) secret and participate in a privacy-preserving common predictive task such that, after a single iteration of collaborative inference: the client learns nothing about the models of the MLaaS providers; the MLaaS providers learn nothing about the client's data; and the inference capability of a semi-honest client is limited.
To build privacy-preserving mechanisms into the ML pipeline, previous work proposed private training methods based on objective perturbation (Chaudhuri et al., 2011), gradient perturbation (Abadi et al., 2016; Jayaraman et al., 2018), and output perturbation (Abadi et al., 2016; Jayaraman et al., 2018). In the collaborative setting, prior work has proposed cryptography-based transformation of ML building blocks (e.g., activation functions, pooling operations) at training time (e.g., CryptoNets (Gilad-Bachrach et al., 2016), SecureML (Mohassel and Zhang, 2017), (Jayaraman et al., 2018)), oblivious transformation of neural networks (e.g., MiniONN (Liu et al., 2017)), and secret sharing (e.g., Chameleon (Riazi et al., 2018), SecureNN (Wagh et al., 2018)) to enable 2- and 3-party secure computation for collaborative training and inference. However, no prior work explores the collaborative inference setting with tens of model owners. Moreover, a semi-honest client may exploit its oracle-style interaction with the MLaaS provider to mount membership inference style attacks.
In this paper, we present pricure (short for PRIvacy-preserving Collaborative inference in a mUlti-paRty sEtting), a system that combines the complementary privacy protection notions of secure multi-party computation (SMPC) and differential privacy (DP) to enable privacy-preserving collaborative inference among multiple model owners holding private models and clients holding private input samples. The key insight behind combining SMPC and DP is the orthogonal protections they provide. Intuitively, given an input x and a function f, SMPC aims to avoid leaking information about x in the course of computing f(x). DP, on the other hand, aims to randomize f(x) such that an adversary has very limited leverage to infer about x by observing f(x). In the collaborative inference we consider in pricure, SMPC addresses pre-inference disclosure protection of private data while DP addresses post-inference protection of inference results to limit attacks such as membership inference. This orthogonal nature of SMPC and DP makes their combination appealing for our setting.
The privacy guarantee in pricure stems from the notion of additive secret sharing (Wagh et al., 2018; Beaver, 2012), where model owners train their private models and secret-share model parameters with non-colluding secure servers (we call them workers) that compute intermediate results on input samples private to clients and similarly secret-shared by a client with the workers. Using intermediate inference results from the workers, a trusted aggregator reconstructs the final inference results and performs noisy aggregation to deter a semi-honest client who may mount membership inference attacks. While we borrow the additive secret sharing notion from prior work (SecureNN (Wagh et al., 2018), SPDZ (Beaver, 2012)), which focuses on privacy-preserving training and inference in a 3-party setting, in pricure we instead focus on secret-sharing-based privacy-preserving collaborative inference that handles tens of model owners with an acceptable accuracy/privacy trade-off, while also providing a differential-privacy-based layer of defense against membership inference attacks that exploit oracle access to the MLaaS provider.
We evaluate pricure on neural networks across four datasets that span handwritten digit recognition (MNIST (LeCun et al., 2020)), clothing image classification (Fashion-MNIST (Xiao et al., 2017)), breast cancer classification (IDC (Kaggle, 2020)), and in-ICU patient length-of-stay prediction (MIMIC (Johnson et al., 2016)). For instance, on the MNIST dataset, using a commodity hardware setup, our results guarantee privacy for up to 50 model owners with nearly no accuracy loss, with a per-model average overhead of 48ms for secret-sharing model parameters, and a per-model response delay of 1.5s. On the MIMIC dataset, we show that DP reduces the accuracy of membership inference attacks by up to 9.02%, which demonstrates the utility of differential privacy as a second layer of protection against inference, on top of the SMPC-based disclosure protection for inputs and model parameters. In summary, we make the following contributions:
New framework: Through novel combination of orthogonal privacy guarantees of secure multi-party computation and differential privacy, we use additive secret sharing to preserve privacy of model parameters for model owners and input samples for clients; and we provide a differential privacy-based layer of defense against membership inference attack so as to limit adversarial advances of a semi-honest client.
Scalable approach: We build on prior work (Wagh et al., 2018; Beaver, 2012) and demonstrate a scalable collaborative inference with tens of model owners with acceptable accuracy/privacy trade-off.
Comprehensive evaluations: We conduct extensive experiments with four datasets (MNIST (LeCun et al., 2020), Fashion-MNIST (Xiao et al., 2017), IDC (Kaggle, 2020), and MIMIC (Johnson et al., 2016)) that include medical datasets, with varying number of model owners, and across a range of privacy budget values to evaluate: accuracy/privacy trade-off, impact of number of model owners on accuracy/privacy, resilience against membership inference attack, and performance overhead of pricure overall, per-sample, and per-model-owner.
Our code is available at: https://github.com/um-dsp/PRICURE.
2. Background and Preliminaries
In this section, we cover preliminaries focusing on neural networks, secure multi-party computation, and differential privacy.
2.1. Feed-Forward Neural Networks
A feed-forward neural network (FFNN) is a network of information processing units called neurons organized into layers. In a FFNN, information travels only forward without looping, starting from input nodes, then through hidden nodes, and finally to output nodes. The goal of the FFNN is to approximate some function f*, and to map an input x to a label y as y = f(x; θ), where θ is the learned parameter vector.
The network architecture consists of the following: an input layer that captures raw inputs, hidden layers that apply a series of transformations to the input, and an output layer that produces the mapping of x to y. For a multi-class classifier, the output layer has as many neurons as there are class labels. The coefficients of the connections between two neurons are referred to as weights, and are typically initialized to small random values to bootstrap the training process.
Suppose that the FFNN has $L$ hidden layers, and we denote the number of neurons at hidden layer $l$ as $n_l$. The output layer has $K$ neurons. The input vector $x = (x_1, \ldots, x_d)$ is the $d$-dimensional input to the network. Without loss of generality, the label is computed as $y = f(x; \theta) = W^{(L+1)}\phi(x) + b^{(L+1)}$, where $W^{(L+1)}$ and $b^{(L+1)}$ are model parameters and $\phi$ is a vector of non-linear parametric basis functions computed by the hidden layers. Now, given $n_1$ neurons in the first hidden layer, the output of neuron $j$ of the first hidden layer is computed as $z_j^{(1)} = \sigma(\sum_{i=1}^{d} w_{ji}^{(1)} x_i + b_j^{(1)})$, where $w_{ji}^{(1)}$ is the weight of the connection between input $x_i$ and neuron $j$ of the first hidden layer, $b_j^{(1)}$ is the bias, and $\sigma$ is a non-linear differentiable activation function. As a result, for the first hidden layer we have the output neurons $z^{(1)} = (z_1^{(1)}, \ldots, z_{n_1}^{(1)})$. These outputs then form the input vector to the second hidden layer. In the same vein, the output of neuron $j$ of hidden layer $l$ is computed as $z_j^{(l)} = \sigma(\sum_{i=1}^{n_{l-1}} w_{ji}^{(l)} z_i^{(l-1)} + b_j^{(l)})$. Finally, the result of neuron $k$ in the output layer is computed as $y_k = \sum_{j=1}^{n_L} w_{kj}^{(L+1)} z_j^{(L)} + b_k^{(L+1)}$, where the coefficients $w_{kj}^{(L+1)}$ form the weight vector from the final hidden layer to node $k$ of the output layer. The resultant vector for all $K$ neurons of the output layer is $y = (y_1, \ldots, y_K)$. Note that the weight matrices $W^{(1)}, W^{(2)}, \ldots, W^{(L+1)}$ and the biases are concatenated into the parameter vector $\theta$.
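To make the layer computations above concrete, the following is a minimal numpy sketch of the forward pass of such a network; the 784-64-10 layer sizes, the ReLU activation, and the random (untrained) parameters are illustrative assumptions rather than any model used in this paper.

import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def forward(x, params):
    """Forward pass of a feed-forward network.
    params is a list of (W, b) pairs, one per layer; hidden layers
    use a ReLU activation and the output layer is linear."""
    a = x
    for W, b in params[:-1]:
        a = relu(W @ a + b)        # hidden layers: z = sigma(W a + b)
    W_out, b_out = params[-1]
    return W_out @ a + b_out       # output layer: y = W z + b

# Illustrative 784-64-10 network with random (untrained) parameters.
rng = np.random.default_rng(0)
params = [(rng.normal(size=(64, 784)) * 0.01, np.zeros(64)),
          (rng.normal(size=(10, 64)) * 0.01, np.zeros(10))]
y = forward(rng.normal(size=784), params)
print(y.shape)  # (10,)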
2.2. Secure Multi-Party Computation
Secure multi-party computation (SMPC) (Bayatbabolghani and Blanton, 2020) enables a set of parties $P_1, \ldots, P_n$, where party $P_i$ holds sensitive input $x_i$, to jointly compute a function $f(x_1, \ldots, x_n)$ while protecting each $x_i$. The computation needs to result in the correct value of $f(x_1, \ldots, x_n)$ (called the correctness property) and, at the end of the computation, each $P_i$ learns nothing beyond $f(x_1, \ldots, x_n)$ (called the privacy property). SMPC has several applications, for example in privacy-preserving decision making on distributed data (Attema et al., 2018), privacy-preserving machine learning (Gilad-Bachrach et al., 2016), secure auctions (Joan Feigenbaum and Saint-Jean, 2004), and secure voting (Nair et al., 2015).
The two popular implementations of SMPC are garbled circuits (Yao, 1986) and secret sharing (Ben-Or et al., 1988). In garbled circuits, the $P_i$'s construct a (large, encrypted) circuit and evaluate it at once, while in secret sharing they need to interact for each circuit gate. Garbled circuits allow for a constant number of rounds but require larger bandwidth (i.e., fewer but bigger messages are sent). Secret sharing typically has low bandwidth (i.e., small messages per gate) and high throughput, where the number of rounds is determined by the depth of the circuit. In this work, we use secret-sharing-based MPC. Intuitively, a $(t, n)$-secret sharing scheme splits a secret $s$ into $n$ shares, and at least $t$ shares are required to reconstruct the secret. We use $\langle s \rangle$ to indicate the sharing of $s$ among the parties.
Additive secret sharing (Shamir, 1979) allows a secret to be split into random parts that are shared with secure workers (real/virtual instances that perform computations such as addition and multiplication securely). For example, suppose there are two workers $A$ and $B$ and a secret $x = x_A + x_B$; workers $A$ and $B$ receive the share values $x_A$ and $x_B$, respectively. The workers $A$ and $B$ perform computation (e.g., compute a function $f$) directly on the share values. After finishing the computation, the workers produce intermediate results as follows:

(1)  $y_A = f_A(x_A)$

(2)  $y_B = f_B(x_B)$

where $f_A$ and $f_B$ denote the workers' local computations on their respective shares. These intermediate results $y_A$ and $y_B$ are then combined using the private additive scheme, and the true output result $y = f(x) = (y_A + y_B) \bmod Q$ is revealed. This secret sharing operation does not use floating point numbers; rather, it is performed in a mathematical space called an integer quotient ring (Jacobson, 1957), which contains the integers from 0 to Q-1. Here Q is a prime number that has to be big enough so that the space can contain all the numbers used in our experiments. A conceptual proof that illustrates additive secret sharing is described in the Appendix (Section 8.2).
Our implementation of pricure builds on the private additive sharing introduced in SecureNN (Wagh et al., 2019) and incorporated in PySyft (OpenMined, 2020). SecureNN builds on MPC to implement exact non-linear functions while avoiding the use of inter-conversion protocols as well as general-purpose number-theoretic libraries. Mathematical operations such as matrix multiplication, summation, private comparison, division, and max-pooling are built on these MPC primitives. PySyft implements SPDZ (Damgård et al., 2012), a secret sharing scheme that enables more complex operations. In the Appendix (Section 8.3), we provide an illustrative proof of how SPDZ functions.
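For illustration, the following is a minimal sketch of two-worker additive secret sharing over the integer ring described above; the modulus Q (a Mersenne prime here) and the toy values are assumptions for the example, not the constants used by SecureNN or PySyft.

import secrets

Q = 2**61 - 1  # a large prime modulus defining the ring Z_Q (illustrative choice)

def share(secret):
    """Split an integer secret into two additive shares modulo Q."""
    share_a = secrets.randbelow(Q)
    share_b = (secret - share_a) % Q
    return share_a, share_b

def reconstruct(share_a, share_b):
    """Combine the two shares to reveal the secret."""
    return (share_a + share_b) % Q

# Each worker adds the shares it holds locally (cf. Eqs. 1 and 2);
# combining the two partial sums reveals only the final result x + y.
x_a, x_b = share(5)
y_a, y_b = share(12)
partial_a = (x_a + y_a) % Q   # computed by worker A
partial_b = (x_b + y_b) % Q   # computed by worker B
assert reconstruct(partial_a, partial_b) == 17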
2.3. Differential Privacy
For two neighboring datasets $D$ and $D'$, which differ in just one data point $d$, suppose a model $M$ is trained on $D$ and $M'$ is trained on $D'$. Let the output space of $M$ and $M'$ be $S$ such that, for an input $x$, $M(x) \in S$ and $M'(x) \in S$. Differential privacy (DP) (Dwork et al., 2006) guarantees that a randomized mechanism $M$ does not enable an observer (adversary) to distinguish whether $M$'s output was based on $D$ or $D'$, i.e., whether or not $d$ is used as a training example for $M$ or $M'$. The indistinguishability of $d$'s membership in $D$ or $D'$ protects the identifiability of $d$ (e.g., a person's medical record). In $\epsilon$-DP, the indistinguishability of the outputs of $M$ and $M'$ is parametrized by $\epsilon$ (also called the privacy budget). Equation 3 formalizes the notion of $\epsilon$-DP as follows:

(3)  $\Pr[M(D) \in S] \leq e^{\epsilon} \cdot \Pr[M(D') \in S]$
Intuitively, lower $\epsilon$ values indicate stronger privacy protection. Equation 3 has the variant $\Pr[M(D) \in S] \leq e^{\epsilon} \cdot \Pr[M(D') \in S] + \delta$, where $\delta$ refers to the failure probability of the mechanism $M$; when $\delta = 0$, we say $M$ is $\epsilon$-DP. The $\epsilon$-DP formalism in Equation 3 guarantees individual data item privacy in the most extreme case where $D$ and $D'$ are so similar that only one data point sets them apart. In pricure, each dataset $D_i$ is unique since the datasets come from independent data owners. Hence, using the DP notion, pricure guarantees that individuals who contribute privacy-sensitive data have a bounded privacy guarantee.
In DP, randomization is at the core of ensuring the indistinguishability of an individual's record in a dataset. A typical way to achieve output randomization is to add noise to an output $f(D)$ using, for instance, the Laplace mechanism. For a privacy budget $\epsilon$ and a sensitivity $\Delta f$ of $f$, using the Laplace mechanism centered at 0 with scale $\Delta f / \epsilon$, the noisy output is computed as $\tilde{f}(D) = f(D) + \mathrm{Lap}(0, \Delta f / \epsilon)$.
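As a concrete illustration of the Laplace mechanism, the sketch below perturbs the output of a counting query (sensitivity 1); the sensitivity and privacy budget values are illustrative.

import numpy as np

def laplace_mechanism(true_value, sensitivity, epsilon, rng=np.random.default_rng()):
    """Return true_value perturbed with Laplace noise of scale sensitivity/epsilon."""
    scale = sensitivity / epsilon
    return true_value + rng.laplace(loc=0.0, scale=scale)

# Example: a counting query has sensitivity 1, since adding or removing
# one record changes the count by at most 1.
noisy_count = laplace_mechanism(true_value=42, sensitivity=1.0, epsilon=0.1)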
3. Problem Statement and Threat Model
In this section, we discuss our problem statement and threat model with respect to the parties that take part in pricure.
3.1. Problem Statement
Problem: We consider a setting where $N$ model owners $O_1, \ldots, O_N$ hold private models $f_1, \ldots, f_N$ such that $f_i$ is trained on $O_i$'s private data, and a client $C$ submits its private input $x$ to obtain an inference result:

$y = AGG(f_1(x), \ldots, f_N(x))$,

where $AGG$ is an aggregation function that leverages the collective inference capability of the $f_i$'s. Taking a feed-forward neural network with $L$ hidden layers, the collaborative inference result can be expanded in terms of weights $W_i$ and biases $b_i$ as:

$y = AGG\big(f_1(x; W_1, b_1), \ldots, f_N(x; W_N, b_N)\big)$, where $f_i(x; W_i, b_i) = W_i^{(L+1)}\sigma(\cdots \sigma(W_i^{(1)} x + b_i^{(1)}) \cdots) + b_i^{(L+1)}$.
Goals: Our first goal is to obtain $y$ while keeping the $f_i$'s and $x$ private. In particular, after each prediction, $C$ learns nothing about the $f_i$'s (i.e., the $W_i$'s and $b_i$'s), and the $O_i$'s learn nothing about $x$ and about other model owners' models. Our second goal is to deter a semi-honest client from mounting membership inference attacks.
Challenges: To achieve the aforementioned goals, one needs to address specific research questions with regards to a) accuracy/privacy trade-off, b) scalability with growing number of model owners, c) the prospect of attacks such as membership inference even in the presence of privacy-preserving schemes, and d) performance implications of privacy-enhancing schemes in a multi-party setting. We address these questions in Sections 5.2 - 5.5.
3.2. Threat Model
Here, we describe our threat model with respect to model owners, workers, aggregator, and client.
Model Owners: We consider the semi-honest setting for model owners, whereby they do not trust each other to share their training data and/or models, because of data protection and privacy regulations (e.g., HIPAA) or competitive advantage (e.g., they compete in the same business).
Client: The client is assumed to be semi-honest for it may use pricure as an oracle to initiate membership inference (Shokri et al., 2017), model extraction (Tramèr et al., 2016), or model inversion attacks (Fredrikson et al., 2015). Moreover, it does not have access to model parameters of any of the model owners.
Workers: In the honest-but-curious sense, they may analyze the secret-shares they receive from model owners or intermediate computation results, but are trusted enough not to collude with each other to exchange their respective secret-shares or intermediate computation results. Given the partial result they compute, workers are assumed to have no access to the final inference result.
Aggregator: We assume the aggregator is trusted by model owners and clients not to reveal true inference results to any third party, and not to mount model extraction or membership inference style attacks. Moreover, it does not have access to the model parameters of the model owners or the data of the client. When sending prediction results to the client, the aggregator encrypts the result with the client's public key (shared with the aggregator a priori).
4. Approach
In this section, we first give a high-level overview of pricure and discuss details in Sections 4.2 – 4.5.
4.1. Overview
Figure 1 highlights our proposed system, pricure. We consider a setting of $N$ model owners (e.g., hospitals) who may not share training data and models due to regulatory or trust reasons, yet want to participate in a collaborative inference task (e.g., medical image classification). As a result, model owners train private models $f_1, \ldots, f_N$. pricure combines secure multi-party computation (SMPC) and differential privacy (DP) to enable collaborative inference among the $N$ model owners, while ensuring the secrecy of each $f_i$ and of the samples submitted for inference by a client $C$, and limiting the ability of a semi-honest client to mount membership inference attacks.
Why Combine SMPC and DP? These two schemes are complementary to each other and provide orthogonal protections that address our two goals. Given an input $x$ and a function $f$, SMPC is aimed at answering "what information about $x$ is leaked in the course of computing $f(x)$?", which addresses our first goal (secrecy of models and inputs). DP, on the other hand, is concerned with "what can be inferred about $x$ by analyzing $f(x)$?", which addresses our second goal (limiting inference attacks). In the sense of the collaborative inference we consider in pricure, SMPC is relevant for pre-inference protection of data while DP is aimed at post-inference protection to limit attacks such as membership inference.
(Figure 1: Overview of the pricure system.)
After training a model on a private dataset, each model owner secret-shares its model parameters with non-colluding secure servers, which we call workers. A client (e.g., another hospital, a medical researcher) who wants to obtain an inference result (e.g., a classification) on a sample $x$ also secret-shares the sample with the workers. A single iteration of collaborative inference proceeds as follows: when the workers receive a secret-share of an input $x$, they use each model's partial inference function to compute and return a partial result to a trusted aggregator, which reconstructs each model's partial inference results into a full inference output for sample $x$. The aggregator then performs confidence-weighted aggregation of the inference results, and finally leverages differential privacy to add random noise to the final inference result, which it sends to the client.
The secrecy of the client's inputs and of each owner's model parameters is protected using additive secret sharing. Noisy inference via $\epsilon$-DP is used to deter a semi-honest client who might attempt to mount membership inference attacks. The communication link between the client and the aggregator is a secure channel. The client shares its public key with the aggregator, which the aggregator uses to encrypt the inference result; the client then decrypts it with its private key to obtain the final output of the collaborative inference for sample $x$.
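The paper does not prescribe a specific public-key scheme for this aggregator-client channel; purely for illustration, the sketch below uses RSA-OAEP from the Python cryptography package to show the key exchange and the encryption/decryption of a small inference-result payload.

from cryptography.hazmat.primitives import hashes, serialization
from cryptography.hazmat.primitives.asymmetric import rsa, padding

# Client side: generate a key pair and send the public key to the aggregator.
private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
public_key_pem = private_key.public_key().public_bytes(
    encoding=serialization.Encoding.PEM,
    format=serialization.PublicFormat.SubjectPublicKeyInfo)

# Aggregator side: encrypt the (small) inference result with the client's public key.
oaep = padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                    algorithm=hashes.SHA256(), label=None)
public_key = serialization.load_pem_public_key(public_key_pem)
ciphertext = public_key.encrypt(b"label=7", oaep)

# Client side: decrypt with the private key to recover the label.
assert private_key.decrypt(ciphertext, oaep) == b"label=7"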
4.2. Model Owners
Each model owner maintains its private training data, from which it trains a private model, and shares neither its training data nor its model details with other model owners, the aggregator, or a third party. To enable collaborative inference that benefits from the collective intelligence of what is learned by each $f_i$, model owners are, in principle, free to use any model architecture, but need to agree a priori on a common feature representation (e.g., pixel intensities for images) and the inference output format (e.g., label only, probability score). A model owner may train its respective model $f_i$ with or without privacy. We note that when a model is trained in a privacy-preserving manner (e.g., via gradient perturbation as in differentially private stochastic gradient descent (DP-SGD) (Abadi et al., 2016) or using output perturbation (Jayaraman et al., 2018)), its privacy budget adds up with the inference output perturbation pricure introduces at the aggregator, which may incur more penalty on accuracy.
Number of Model Owners: The minimum value for $N$ is 2, while there is no limit on the maximum. Intuitively, the higher the value of $N$, the better the collective inference accuracy of the participants. This, however, depends on the accuracy of each model and on the privacy budget (the amount of noise added to the final inference output). Our model owner setup shares some similarity with PATE (Papernot et al., 2017). However, our setting differs both in its goals and in its details. For instance, in PATE, a larger number of teachers implies a potentially better privacy guarantee. In our case, a larger $N$ may not inherently entail a better privacy guarantee, since each model owner is acting on its own training data (unlike PATE, where training data is partitioned into disjoint sets). The disjoint partitions in PATE are analogous to the typically disjoint model owners in pricure. Model owners in pricure may use training data of varying sizes, and the choice of $N$ may have implications for the accuracy/privacy trade-off (in PATE, depending on the dataset size, the choice of the number of teachers may affect the accuracy/privacy trade-off). In Section 5.3, we analyze the relationship between the number of model owners, inference accuracy, and privacy guarantee. Next, we describe how model owners secret-share model parameters with workers.
Secret-Sharing of Model Parameters: Model owners secret-share their model parameters with the virtual workers via a secure channel. Once they secret-share their model parameters, model owners are not required to stay connected with the workers. Instead, they can terminate the secure connection and initiate it later as needed (for instance when a model is retrained and updating the model parameters is deemed necessary).
Once a model owner trains a model, the learned model parameters are captured via weights ($W$) and biases ($b$). Suppose for model owner $O_i$ ($1 \leq i \leq N$) the weight matrices are $W_i = \{W_i^{(1)}, \ldots, W_i^{(L+1)}\}$, where $W_i^{(1)}$ represents the weight parameters from the input to the first hidden layer, $W_i^{(l)}$ represents the weight parameters from hidden layer $l-1$ to hidden layer $l$, and $W_i^{(L+1)}$ represents the weight parameters from the last hidden layer to the output layer. Analogous to Equations 4 and 5 (used below for client inputs), model owner $O_i$ secret-shares its model parameters with workers $A$ and $B$ as $\langle W_i \rangle_A$ and $\langle W_i \rangle_B$, respectively (and likewise for the biases $b_i$), such that all secret-shared model parameters satisfy:

$W_i = (\langle W_i \rangle_A + \langle W_i \rangle_B) \bmod Q, \quad b_i = (\langle b_i \rangle_A + \langle b_i \rangle_B) \bmod Q.$
The secret-sharing is performed over a secure channel, and once the shares are saved by the two workers, the model owner can go offline.
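As an illustration of this step, the sketch below encodes a floating-point weight matrix as fixed-precision integers (cf. Section 5.1) and splits it into two additive shares, one per worker; the precision factor, modulus, and layer size are assumptions, not pricure's exact constants.

import numpy as np

Q = 2**61 - 1          # assumed ring modulus (a Mersenne prime)
PRECISION = 100        # two decimal digits, e.g., 0.456 -> 45 (cf. Section 5.1)

def encode(tensor):
    """Floating-point tensor -> fixed-precision integers in Z_Q."""
    return (np.round(tensor * PRECISION).astype(np.int64)) % Q

def secret_share(tensor, rng=np.random.default_rng()):
    """Split an encoded tensor into two additive shares, one per worker."""
    enc = encode(tensor)
    share_a = rng.integers(0, Q, size=enc.shape, dtype=np.int64)
    share_b = (enc - share_a) % Q
    return share_a, share_b

# Model owner O_i shares each weight matrix and bias vector, then goes offline.
W1 = np.random.randn(64, 784)          # illustrative first-layer weights
W1_a, W1_b = secret_share(W1)          # W1_a -> worker A, W1_b -> worker B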
4.3. Client
When the client wants to obtain an inference result on input $x$, it connects to workers $A$ and $B$ and secret-shares its private input. Moreover, it generates a private-public key pair and sends the public key to the aggregator so that, when the inference result is ready, the aggregator encrypts it with the client's public key before sending it. When it receives an encrypted label from the aggregator, the client performs a decryption operation using its private key and recovers the clear-text label for input $x$. In Section 4.5, we describe how the aggregator produces the final inference result. Next, we describe how the client secret-shares its private input $x$.
Secret-Sharing of Client Input: For simplicity, consider $x_1$, the first element of the client's input feature vector $x$. $x_1$ is divided into two parts to be secret-shared with the two workers $A$ and $B$, using Equations 4 and 5:

(4)  $x_1^A = r$, with $r$ chosen uniformly at random from $\{0, \ldots, Q-1\}$

(5)  $x_1^B = (x_1 - r) \bmod Q$

$x_1^A$ and $x_1^B$ are large numbers that fall in the range $[0, Q-1]$ and do not individually reveal the real data. $x_1$ is obtained by combining $x_1^A$ and $x_1^B$ using Equation 6 below:

(6)  $x_1 = (x_1^A + x_1^B) \bmod Q$
4.4. Workers
Next, assume our network has $L$ hidden layers in total, and let $d$ be the dimension of the input $x$ secret-shared with workers $A$ and $B$ as $x^A$ and $x^B$. The output results of neuron $j$ of the first hidden layer for worker $A$ and worker $B$ are computed as follows:

(7)  $z_j^{(1),A} = \sigma\big(\sum_{i=1}^{d} w_{ji}^{(1),A} x_i^A + b_j^{(1),A}\big)$

(8)  $z_j^{(1),B} = \sigma\big(\sum_{i=1}^{d} w_{ji}^{(1),B} x_i^B + b_j^{(1),B}\big)$

Here, $w_{j}^{(1),A}$ and $w_{j}^{(1),B}$ are the shares of the weight vector from the input to neuron $j$ of the first hidden layer. (Products of secret-shared values and non-linear activations are not computed purely locally; the workers jointly evaluate them using the SPDZ/SecureNN protocols discussed in Section 2.2.) For the first hidden layer, we have the output neurons $z^{(1),A} = (z_1^{(1),A}, \ldots, z_{n_1}^{(1),A})$ for worker $A$ and $z^{(1),B} = (z_1^{(1),B}, \ldots, z_{n_1}^{(1),B})$ for worker $B$, where $n_1$ is the total number of neurons in the first hidden layer. These output neurons form the input vector to the second hidden layer. Accordingly, the output of neuron $j$ of hidden layer $l$ is obtained for worker $A$ as:

$z_j^{(l),A} = \sigma\big(\sum_{i=1}^{n_{l-1}} w_{ji}^{(l),A} z_i^{(l-1),A} + b_j^{(l),A}\big)$

Similarly, the output of neuron $j$ of hidden layer $l$ for worker $B$ is computed as:

$z_j^{(l),B} = \sigma\big(\sum_{i=1}^{n_{l-1}} w_{ji}^{(l),B} z_i^{(l-1),B} + b_j^{(l),B}\big)$

Here, $w_{j}^{(l),A}$ and $w_{j}^{(l),B}$ are the shares of the weight vector from hidden layer $l-1$ to neuron $j$ of hidden layer $l$. Each worker uses the secret-shared private input of the client and the private model parameters of each model owner to produce intermediate results which, when combined, produce the true inference output. Finally, the intermediate results of worker $A$ and worker $B$ for output neuron $k$ are computed as:

(9)  $y_k^A = \sum_{j=1}^{n_L} w_{kj}^{(L+1),A} z_j^{(L),A} + b_k^{(L+1),A}$

(10) $y_k^B = \sum_{j=1}^{n_L} w_{kj}^{(L+1),B} z_j^{(L),B} + b_k^{(L+1),B}$

The coefficients $w_{k}^{(L+1),A}$ and $w_{k}^{(L+1),B}$ are the shares of the weight vector from the final hidden layer to neuron $k$ of the output layer held by worker $A$ and worker $B$, respectively. Finally, the resulting intermediate vector of the output layer of model owner $O_i$ with $K$ output neurons is $y_i^A = (y_{i,1}^A, \ldots, y_{i,K}^A)$ for worker $A$ and $y_i^B = (y_{i,1}^B, \ldots, y_{i,K}^B)$ for worker $B$. These final vectors $y_i^A$ and $y_i^B$ are sent to the aggregator via a secure communication channel.
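Because both the model parameters and the client input are secret-shared, the products inside Equations 7-10 cannot be evaluated by each worker in isolation; SPDZ/SecureNN-style protocols rely on precomputed multiplication (Beaver) triples for this. The following is a minimal two-worker sketch assuming a trusted dealer that distributes the triples; it illustrates the idea only and is not pricure's actual protocol code.

import secrets

Q = 2**61 - 1  # ring modulus (illustrative)

def share(v):
    a = secrets.randbelow(Q)
    return a, (v - a) % Q

def open_value(sa, sb):
    return (sa + sb) % Q

def beaver_triple():
    """Trusted dealer: random a, b and c = a*b, all additively shared."""
    a, b = secrets.randbelow(Q), secrets.randbelow(Q)
    c = (a * b) % Q
    return share(a), share(b), share(c)

def secure_mul(x_sh, y_sh):
    """Two workers multiply additively shared x and y without seeing them."""
    (a0, a1), (b0, b1), (c0, c1) = beaver_triple()
    # Each worker masks its shares; the masked values d = x - a and e = y - b are opened.
    d = open_value((x_sh[0] - a0) % Q, (x_sh[1] - a1) % Q)
    e = open_value((y_sh[0] - b0) % Q, (y_sh[1] - b1) % Q)
    z0 = (c0 + d * b0 + e * a0 + d * e) % Q   # worker A's share (adds d*e once)
    z1 = (c1 + d * b1 + e * a1) % Q           # worker B's share
    return z0, z1

# w and x stand for one weight and one input element (already fixed-precision encoded).
w_sh, x_sh = share(7), share(6)
assert open_value(*secure_mul(w_sh, x_sh)) == 42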
4.5. Aggregator
Reconstruction of Inference Results: When it receives the intermediate results $y_i^A$ and $y_i^B$ from the workers, the aggregator first combines $y_i^A$ and $y_i^B$ to obtain the true inference result $y_i$ of each model owner $O_i$. Across the $y_i$'s corresponding to the model owners, the aggregator then performs majority-vote-based aggregation of the inference results. The aggregator then adds random noise sampled from the Laplace distribution to perturb the inference output (and hence deter membership inference style attacks). Lastly, the aggregator encrypts the inference output with the client's public key before sending it to the client.
In general, for each model owner $O_i$, the aggregator receives $y_i^A$ and $y_i^B$ from workers $A$ and $B$, respectively. To get the true inference value from the intermediate results, the aggregator combines them as follows:

(11)  $y_i = (y_i^A + y_i^B) \bmod Q$

As a result, the reconstructed inference result for model owner $O_i$ is $y_i = (y_{i,1}, \ldots, y_{i,K})$ for $K$ output neurons. Hence, the output for the $N$ model owners is represented as a vector of final inference results: $Y = (y_1, \ldots, y_N)$.
Noisy Aggregation of Inference Results: The aggregator aggregates the inference results of all $N$ model owners as follows:

(12)  $\hat{y} = AGG(y_1, \ldots, y_N) + \mathrm{Lap}(0, \Delta f / \epsilon)$

In Equation 12 above, $\hat{y}$ is the noisy aggregated output over the $N$ model owners, and $AGG$ is the aggregation function (e.g., majority vote, confidence-weighted sum of inference probabilities). Here, $\hat{y}$ is a $K$-dimensional vector with a total of $K$ noisy aggregated output inferences. The final label is produced using the argmax operation as $y^* = \arg\max_k \hat{y}_k$, where $y^*$ is the output label for input vector $x$. The aggregator encrypts this output with the client's public key and sends it to the client over a secure channel.
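A simplified sketch of the aggregator's noisy aggregation path is shown below, assuming label-only outputs from the N reconstructed model results and majority voting; adding Laplace noise of scale 1/ε to each vote count mirrors PATE-style noisy-max aggregation and is an illustrative choice, not necessarily pricure's exact noise calibration.

import numpy as np

def noisy_vote_aggregate(model_outputs, num_classes, epsilon,
                         rng=np.random.default_rng()):
    """model_outputs: list of per-model predicted labels (one per model owner).
    Returns the argmax over Laplace-perturbed vote counts."""
    votes = np.bincount(np.asarray(model_outputs), minlength=num_classes)
    noisy_votes = votes + rng.laplace(loc=0.0, scale=1.0 / epsilon,
                                      size=num_classes)
    return int(np.argmax(noisy_votes))

# e.g., 10 model owners voting over 10 classes with privacy budget 0.1
label = noisy_vote_aggregate([7, 7, 7, 1, 7, 7, 9, 7, 7, 7],
                             num_classes=10, epsilon=0.1)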
5. Evaluation
In this section, we evaluate pricure by answering the following research questions:
RQ1: What is the accuracy/privacy trade-off for pricure and how does it compare with prior work?
RQ2: What is the impact of the number of model owners on inference accuracy and privacy?
RQ3: What is the impact of differentially-private aggregation in reducing membership inference attack by a semi-honest client?
RQ4: What performance overhead is incurred by pricure overall, per-sample, and per-model?
5.1. Datasets, Model Architectures, and Setup
We use four datasets that focus on: handwritten digit recognition (MNIST (LeCun et al., 2020)), clothing image classification (Fashion-MNIST (Xiao et al., 2017)), breast cancer detection (IDC (Kaggle, 2020)), and in-ICU length-of-stay prediction for patients (MIMIC (Johnson et al., 2016)). We chose these datasets because they were used as benchmarks in related work (Papernot et al., 2017; Mohassel and Zhang, 2017; Riazi et al., 2018; Gilad-Bachrach et al., 2016) and also are relevant for privacy-sensitive domains (e.g., IDC and MIMIC are medical datasets). Next, we first briefly describe each dataset and then discuss model architectures and training setup.
MNIST is a collection of 70K gray-scale images of handwritten digits, with a training set of 60K images and a test set of 10K images. Each sample has a width-height dimension of 28x28 pixels and depicts a handwritten digit from 0 to 9, which make up the 10 classes. We divide the 60K training samples into multiple chunks to train multiple models. For instance, for 50 model owners, we divide the training set into 50 disjoint datasets so that each model owner has 1.2K samples to train their model on. We use 500 samples from the test set for the evaluation of pricure.
Fashion-MNIST is a dataset of Zalando's article images with a training set of 60K samples and a test set of 10K samples. Each example is a 28x28 gray-scale image associated with a label from 10 classes (T-shirt/top, trouser, dress, etc.). For the largest setting of 40 model owners, we divide the training set into 40 disjoint datasets so that each model owner has 1.5K samples to train their model. We use 500 samples from the test set to evaluate pricure.
IDC is the Invasive Ductal Carcinoma (IDC) dataset with 277,524 patches of size 50x50 extracted from 162 whole-mount slide images of breast cancer specimens scanned at 40x. Out of the 277,524 samples, 198,738 are IDC-negative (benign) and 78,786 are IDC-positive. Using under-sampling to balance positive and negative samples, we use 277,024 samples for training a binary classifier and 500 samples as the test set for pricure.
MIMIC is the Medical Information Mart for Intensive Care (MIMIC) dataset with 60K samples containing details such as demographics, vital signs, laboratory tests, medications, and personal information about patients. The classifier predicts the length of stay (LOS) of a patient in the hospital. We use a multi-class classifier where LOS is divided into four classes: class 0 (0-4 days), class 1 (4-8 days), class 2 (8-12 days), and class 3 (12 or more days). Our training dataset contains 58,386 samples and our test set has 590 samples.
Model Architectures and Setup: For all four datasets, we train a FFNN model. For MNIST and Fashion-MNIST, the input layer has 784 (28x28) neurons and the output layer has 10 neurons, with two hidden layers with ReLU activations. The output layer is linear and we use the mean squared error loss instead of the negative log-likelihood function: since log-likelihood is computed via the softmax function, it requires logarithm and exponential operations that are not practical for pricure because we convert all real-number parameters to integers. Since our computations are over a finite integer field, we convert all floating-point tensors into fixed-precision tensors by rounding at the second decimal digit (e.g., 0.456 becomes 45). For MIMIC, we use one hidden layer, with the input layer matching the feature dimension and 4 output neurons. For IDC, we use the same FFNN structure, where the input layer matches the flattened 50x50 patch dimension and the number of output neurons is 2. Model architecture details are described in the Appendix (Section 8.1). The learning rate is 0.001 across the four datasets, and the number of training epochs is set per dataset. Finally, for each dataset, each model owner secret-shares their respective model parameters with workers A and B.
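For concreteness, the following is a sketch of the MNIST/Fashion-MNIST training setup described above in PyTorch; the hidden-layer widths and the optimizer are illustrative assumptions (the paper fixes only the learning rate, the ReLU hidden layers, the linear output, and the MSE loss).

import torch
from torch import nn

# Two ReLU hidden layers and a linear output layer; the widths (128, 64)
# are illustrative assumptions, not the paper's exact values.
model = nn.Sequential(
    nn.Linear(28 * 28, 128), nn.ReLU(),
    nn.Linear(128, 64), nn.ReLU(),
    nn.Linear(64, 10))                 # linear output (no softmax)

loss_fn = nn.MSELoss()                 # MSE against one-hot targets instead of NLL
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)  # optimizer choice assumed

x = torch.randn(32, 28 * 28)                                 # a dummy batch
y = nn.functional.one_hot(torch.randint(0, 10, (32,)), 10).float()
optimizer.zero_grad()
loss = loss_fn(model(x), y)
loss.backward()
optimizer.step()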
5.2. Accuracy/Privacy Trade-Off
We first analyze the inference accuracy of pricure with respect to the privacy budget ε. Smaller ε values imply stronger privacy guarantees (e.g., against membership inference) and vice-versa. The ideal case is to achieve high inference accuracy while providing strong privacy (smallest ε value), but striking the sweet spot that fulfills both is generally non-trivial, for the trade-off depends on the training data, the number of model owners, and the aggregation scheme used to decide the final inference result. In Figure 5, the x-axis shows the privacy budget ε and the y-axis is the inference accuracy of pricure. Across the four datasets, the smallest number of model owners is 10, while the maximum ranges from 40 to 100 (40 for Fashion-MNIST, 50 for MNIST, and 100 for IDC), depending on the size of the training set and the inference accuracy.
On MNIST and Fashion-MNIST Benchmarks: Figure 5 shows inference accuracy with respect to the privacy budget (in the range [10^-2, 1]) for pricure on MNIST. Note that we have divided the 60K MNIST training set into 10, 25, and 50 disjoint datasets, with 6K, 2.4K, and 1.2K per-model-owner samples, respectively. The client has 500 input samples as the inference set. For MNIST, with 50 model owners the inference accuracy drops relative to the non-private inference accuracy of 96.6%, so pricure incurs a noticeable accuracy loss in exchange for the privacy gain it provides. Interestingly, with 10 model owners the inference accuracy of pricure reaches 96.56% (i.e., nearly no accuracy loss), and the inference accuracy remains fairly stable over a range of larger ε values (see Figure 5). This observation is consistent with prior work on ensembles of differentially private teacher models by Papernot et al. (Papernot et al., 2017, 2018b). The best trade-off is obtained with 50 model owners and an inference accuracy of 92.6%.
(Figure 5: Inference accuracy vs. privacy budget ε on MNIST, Fashion-MNIST, IDC, and MIMIC, for varying numbers of model owners.)
Figure 5 shows the accuracy/privacy trade-off analysis for Fashion-MNIST for 10, 20, and 40 model owners. For 40 model owners, the best inference accuracy is 82.8%. For smaller privacy budgets, the inference accuracy drops dramatically below this best case. The non-private inference accuracy is close to the best private accuracy which, similar to MNIST, suggests that there is little to no accuracy loss. The best trade-off is obtained with 40 model owners and an inference accuracy of 82.8%.
On IDC and MIMIC: Figure 5 shows the accuracy/privacy trade-off analysis for IDC. With 100 model owners, the inference accuracy is 71.6% for ε=0.05, which is the optimal case in terms of the accuracy/privacy trade-off. Hence, pricure offers an acceptable inference accuracy for IDC with a small privacy budget, which provides a strong privacy guarantee. For smaller ε values, the inference accuracy is lower, which suggests that the acceptable noise range for IDC begins around ε=0.05. Figure 5 also shows the accuracy/privacy trade-off for MIMIC. The best-case inference accuracy is 76.6% with 40 model owners. For smaller ε values, the inference accuracy is much lower.
Comparison with Closely Related Work: Papernot et al. (Papernot et al., 2017) show, with a CNN model on MNIST, that their non-private model is 99.18% accurate while the private ensemble of 250 teacher models is 93.18% accurate. In our case, the non-private FFNN model on MNIST is 96.6% accurate, while its private counterpart with 50 model owners is 92.6% accurate. We note that the initial differences in the non-private accuracy of the models can be attributed to the difference in model architectures, i.e., CNN (theirs) vs. FFNN (ours).
Take-away: With respect to RQ1, our analysis of the accuracy/privacy trade-off suggests that pricure provides acceptable privacy guarantees with very minimal trade-off on inference accuracy, showing its practical viability in a multi-party setting.
5.3. Number of Model Owners vs. Accuracy and Privacy
As we indicated in Section 4.2, a higher number of model owners N would, intuitively, entail better collective inference accuracy. However, this depends on the accuracy of each model and on the privacy budget. With that background, we now analyze the impact of N on inference accuracy and the privacy budget.
On MNIST: From Figure 5, as N increases from 10 to 25 and then 50, we notice a roughly consistent trend. With a progressive increase in N from 10 to 50, inference accuracy is inversely related to N: the highest inference accuracy values are achieved for the lowest number of model owners (N=10), and the lowest inference accuracy values correspond to the highest number of model owners (N=50). In particular, for N=50, the inference accuracy is 92.4%. On the other hand, inference accuracy jumps to 94.52% and then 96.56% for N=25 and N=10, respectively.
Overall, from these observations we conclude that a higher number of model owners (e.g., N=50) results in lower inference accuracy compared to a lower number of model owners (e.g., N=10, N=25). It is, however, noteworthy that the higher inference accuracy obtained with fewer model owners comes with relatively weaker privacy guarantees. For instance, with N=25 and N=10, the inference accuracies of 94.52% and 96.56% are achieved at larger privacy budgets.
Fashion-MNIST, IDC, and MIMIC: We observe similar results for Fashion-MNIST and IDC from Figure 5. For Fashion-MNIST, the best-case inference accuracy is 82.8%, 84.2%, and 84.39%, respectively, for 40, 20, and 10 model owners. For IDC, the best-case inference accuracy is 71.6%, 72.2%, and 73.4%, respectively, for 100, 50, and 10 model owners. As a result, for the Fashion-MNIST and IDC datasets, we observe that pricure achieves better inference accuracy for a smaller number of model owners, but with the caveat of a relatively weaker privacy guarantee. For MIMIC, from Figure 5 we observe that for 40 model owners the best-case inference accuracy is 76.6%, while for 10 model owners the best-case inference accuracy is 80.53%, which also supports our claims.
Take-away: With respect to RQ2, we note that a higher number of model owners allows a larger noise level (i.e., offers a stronger privacy guarantee), while for smaller values of N the inference accuracy is more sensitive to noise. Our findings here are in line with PATE (Papernot et al., 2017), where a larger number of models results in a lower privacy budget on MNIST.
5.4. Utility of Differential Privacy Against Membership Inference Attack
We recall that one of the goals of pricure is to deter a semi-honest client who may mount a membership inference attack (MIA). To that end, we perform DP-aggregation of inference results before we release each inference result to the client. The goal of MIA is to identify (with some confidence) whether a given sample belongs to the training set of a target model (Shokri et al., 2017). To examine the utility of DP in limiting MIA, we measure the accuracy, precision, and recall of MIA for both noiseless and noisy aggregation of inference results. Accuracy measures the percentage of examples that are correctly predicted to be members of the target model's training dataset. Precision measures the proportion of true membership inferences with respect to all reported attacks, while recall measures the coverage of the attack, i.e., the fraction of the training records that the attacker can correctly deduce as members of the training set. For the purpose of this evaluation, we reproduce the MIA by Shokri et al. (Shokri et al., 2017), and we use the same size of member and non-member data to maximize the inference uncertainty, with a baseline accuracy of 50%. We use Fashion-MNIST and MIMIC to examine the effect of differential privacy on MIA.
(Figure 7: Per-class membership inference attack accuracy with and without differentially private aggregation, on Fashion-MNIST and MIMIC.)
MIA Against Fashion-MNIST Model: We use 5,000 samples to train the target model. We train several shadow models whose training datasets are disjoint from the training dataset of the target model. The shadow models are expected to mimic the behavior of the target model, as the target model and the shadow models are all trained on data coming from the same population. As a test dataset, we use samples with an equal number of members and non-members.
Figure 7 shows the per-class accuracy of MIA for both the private and the non-private model on Fashion-MNIST. Without differentially private prediction, MIA accuracy is highest for class-4 and lowest for class-9. When we add Laplacian noise post-aggregation to each inference result, the average MIA accuracy drops, again with the maximum for class-4 and the minimum for class-9. The attack accuracy degradation (by 5.96% to be exact) implies that, due to the presence of Laplacian noise, the model owners' training data are less vulnerable to MIA. We also observe that adding more noise gradually decreases the MIA accuracy, thus further mitigating training data exposure.
The attacker's average precision across all classes indicates the fraction of images inferred as members that are true members, while the average recall indicates the fraction of the training images the attacker can correctly infer as members. With differentially private aggregation, precision drops by 3.02% and recall by 32.78% compared to the non-private model, which suggests that ε-DP mitigates the membership inference attack.
MIA Against MIMIC Model: We use the same number of training samples for the target model and each shadow model, and the same number of members and non-members for evaluation. Figure 7 shows the per-class accuracy of MIA for both the private and the non-private model on MIMIC. In the non-private setting, the maximum MIA accuracy is observed for class-2 and the minimum for class-0. With differentially private aggregation, the maximum MIA accuracy is observed for class-1 and the minimum for class-3. Across all four classes, the average MIA accuracy degrades by up to 9.02%, which suggests that ε-DP mitigates MIA accordingly. Looking at precision and recall, the private model lowers precision by 5.52% and recall by 44.72% compared to the non-private model, which suggests that ε-DP mitigates the attacker's pursuit of members' records.
Comparison with Closely Related Work: While PATE (Papernot et al., 2017) discusses an intuitive guarantee against MIA, it does not provide a quantitative measurement of MIA with respect to the noise scale. In (Rahman et al., 2018), the effect of DP-based training against MIA is examined; on MNIST, they show that DP can eventually mitigate MIA accuracy and bring it close to the baseline. Unlike (Rahman et al., 2018), in pricure we apply ε-DP after aggregating the inference results of the model owners, and observe sizable MIA mitigation within our noise boundary.
Take-away: With respect to RQ3, differentially private aggregation limits membership inference attacks such that both the accuracy of inferring members and the coverage of the attack are reduced.
5.5. Performance Overhead Analysis
(Figure 9: Per-model secret-sharing overhead and per-sample inference overhead across the four datasets.)
We evaluate the overall, per-sample, and per-model performance overhead incurred by pricure. In particular, we measure time taken by a) a model owner to secret-share their model parameters with the workers and b) per-sample overhead for inference.
Hardware Specification: To run our experiments, we used a Google Colab virtual machine which provides 2 Intel(R) Xeon(R) model 79, family 6 CPUs at 2.2GHz each, with 12.72GB of RAM and 107.77GB of disk space. To speed up computations, we used Python's multi-threading features to utilize the two CPUs efficiently, dividing the task between them to obtain output inferences from two model owners per cycle.
Per-model Overhead for Secret-Sharing: Figure 9 shows the duration of secret-sharing of model parameters across the four datasets and for different numbers of model owners. This performance is influenced by the number of model parameters (weights and biases). The x-axis shows the different datasets and the y-axis shows the time needed, in milliseconds, to share a single model owner's parameters with the two workers. Overall, the per-model secret-sharing overhead is insignificant. Comparatively, IDC incurs the most delay in model-parameter secret sharing because it produces much larger model parameters than the others. The other datasets need much less time, in the range of 8ms-48ms, to share their model parameters, which is insignificant.
Per-sample Overhead for Inference: In Figure 9, the y-axis shows the overall duration to produce an inference result for an input with SMPC-DP (pricure) and without SMPC-DP (non-private). This performance is affected by the number of model owners N, the number of classes, and the feature vector dimension of the input. From Figure 9, across the board pricure incurs a significant overhead to produce the inference result. For example, on MNIST, pricure requires 772.56s to produce an inference result for one input (with 50 model owners), while the non-private prediction takes just 0.025s. While the overhead seems significant on the surface, given the commodity hardware we used, which can be accelerated with parallelization and GPUs, we do not consider the overhead to be a deal-breaker for the practical deployability of pricure.
Comparison with Closely Related Work: SecureNN (Wagh et al., 2018) proposes a DNN-centric protocol for secure training and prediction in 3- and 4-party setups. It was run on Amazon EC2 c4.8xlarge instances in LAN and WAN settings with average bandwidths of 625MB/s and 40MB/s, respectively. For per-sample inference on MNIST, 3-party and 4-party SecureNN require 0.34s and 0.23s, respectively. On the other hand, with a 50-party setup, pricure requires 772.56s to compute the inference result for an image. Note that, normalized per model owner, pricure's per-sample prediction time is comparable to SecureNN's given the difference in hardware specifications and latency setup.
Chameleon (Riazi et al., 2018) is evaluated on a 5-layer CNN on MNIST with an inference label as output. Experiments were run on an Intel Xeon E5-1620 CPU at 3.5 GHz with 16 GB of RAM. Per-sample classification took 2.7s with a test accuracy of 99%. In pricure, the number of parties is orders of magnitude larger (e.g., 40, 50, or 100 model owners, depending on the dataset) and the inference is collaborative.
CryptoNets (Gilad-Bachrach et al., 2016) is a cloud-based platform where the server provides encrypted predictions by computing on the client's encrypted data, so that both parties' data remain secure. CryptoNets was run on hardware similar to (Mohassel and Zhang, 2017), and per-sample classification took 297.5s with 99% test accuracy.
Take-away: With regards to RQ4, secret-sharing model parameters and aggregating inference vectors incur negligible overhead. We also note that inference overhead in pricure is fairly significant compared to its non-private counterpart. We, therefore, see room for optimizations through hardware acceleration, parallelization, and efficiency advances in SMPC.
5.6. Discussion on Limitations
Scalability: pricure's effectiveness depends on the number of model owners and on the complexity of the models and inputs. Other factors kept constant, increasing the number of model owners slows down the inference process, because more models mean more load on the workers. Model complexity/architecture plays an important role in system performance: model details such as the number of hidden layers, activation functions, and pooling operations determine the effectiveness of pricure. The complexity of the input feature vectors also determines the speed of computing predictions and the complexity of the learned parameters. For instance, 28x28-pixel images for MNIST result in a far more lightweight model than 50x50-pixel images for IDC. While our evaluations are encouraging as to the real-life deployability of pricure, its scalability with more complex model architectures is worth exploring as future work. In addition, as the number of models increases, at some point the performance load on the workers will be too much to bear, which demands performance enhancement alternatives such as hardware acceleration, multi-threading, and introducing more workers (albeit with more complexity).
Performance Overhead Measurements: Due to hardware resource limitations, we measured the performance overhead of pricure on a single machine. As a result, some measurements (e.g., secret-sharing delay, sending intermediate results from workers to the aggregator) may not fully capture an actual distributed setup. As in prior work (e.g., (Wagh et al., 2018)), production performance overhead measurements need to be done in a distributed setup to capture network latency for secret-sharing model parameters, computing aggregated inference results, and sending them to the client.
6. Related Work
We discuss related work in the context of pricure. For detailed survey, we refer the reader to (Cristofaro, 2020) and (Papernot et al., 2018a).
Papernot et al. (Papernot et al., 2017, 2018b) train an ensemble of 'teacher' models on disjoint private data, and use the teachers as labeling oracles to train a 'student' model via aggregated noisy voting by the teachers. PATE ensures a strong privacy guarantee for training data, yet assumes the queried (test) dataset is public/non-private. Unlike PATE, in pricure both the model owners' training data and the client's input samples are private.
Abadi et al. (Abadi et al., 2016) leverage differential privacy to train deep neural networks with built-in privacy through differentially private stochastic gradient descent (DP-SGD). While DP-SGD offers strong privacy guarantees for training data, it is not intended for the collaborative inference setup with private models and private inputs that we explore in pricure.
Jayaraman et al. (Jayaraman et al., 2018) combine differential privacy and secure multiparty computation to enable privacy-preserving collaborative training in a distributed setting. They explore output perturbation and gradient perturbation. In the output perturbation setting, parties combine local models within a secure computation and then add the differential privacy noise before revealing the model. In the collaborative inference setting, pricure is similar to this approach because securely computed inference results are aggregated and differential privacy noise is added before the final inference result is revealed to the client. In the gradient perturbation method, the data owners collaboratively train a global model.
Lindell and Pinkas (Lindell and Pinkas, 2000) propose a system for two parties who each own a private database with the same feature types and are willing to collaborate to run the ID3 learning algorithm on the union of their databases without revealing too much about their data to each other. In the sense of multiple model owners aiming to collaboratively predict on a private input, this approach is broadly similar to pricure.
Graepel et al. (Graepel et al., 2012) study FHE-based outsourcing of the execution of a machine learning algorithm while retaining confidentiality of the training and test data. Bost et al. (Bost et al., 2015) design and evaluate protocols for privacy-preserving classification over encrypted data, where the model provider's training data and the client's test data remain unknown to each other. Riazi et al. (Riazi et al., 2018) propose Chameleon to enable two parties to jointly compute a function securely without disclosing their private inputs. Chameleon is based on additive secret sharing for linear operations and garbled circuits for non-linear operations. Wagh et al. (Wagh et al., 2018) design and evaluate a secure computation setting for 3 and 4 parties over common DNN architectures such that no single party can learn the others' private data. Giacomelli et al. (Giacomelli et al., 2018) propose a privacy-preserving collaborative approach for random forests, where multiple parties share their private models securely and produce encrypted predictions for a client's private input data.
7. Conclusion
We presented pricure, a system that leverages the orthogonal privacy guarantees of secure multi-party computation and differential privacy to enable multiple parties that own private models to collaborate on a common ML task, such as classification of a privacy-sensitive medical image owned by a client. At the core of pricure is additive secret sharing, which ensures that, after an iteration of collaborative inference on a private input, model owners learn nothing about the input and clients learn nothing about the model parameters behind the collaborative inference. In addition, through differentially private aggregation of inference results, pricure limits the advance of membership inference attacks by a semi-honest client. On commodity hardware, we demonstrate the practical viability of pricure on four datasets, for tens of model owners, with an acceptable accuracy/privacy trade-off and a performance overhead amenable to speed-up using GPUs, parallelization, and more efficient secure multi-party computation protocols.
References
- Abadi et al. (2016) Martín Abadi, Andy Chu, Ian J. Goodfellow, H. Brendan McMahan, Ilya Mironov, Kunal Talwar, and Li Zhang. 2016. Deep Learning with Differential Privacy. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, Vienna, Austria, October 24-28, 2016. ACM, 308–318.
- Angelini et al. (2008) Eliana Angelini, Giacomo di Tollo, and Andrea Roli. 2008. A neural network approach for credit risk evaluation. The Quarterly Review of Economics and Finance 48, 4 (Nov. 2008), 733–755. https://doi.org/10.1016/j.qref.2007.04.001
- Attema et al. (2018) Thomas Attema, Emiliano Mancini, Gabriele Spini, Mark Abspoel, Jan de Gier, Serge Fehr, Thijs Veugen, Maran van Heesch, Daniël Worm, Andrea De Luca, Ronald Cramer, and Peter M. A. Sloot. 2018. A New Approach to Privacy-Preserving Clinical Decision Support Systems for HIV Treatment. CoRR abs/1810.01107 (2018).
- Bayatbabolghani and Blanton (2020) Fattaneh Bayatbabolghani and Marina Blanton. 2020. Secure Multi-Party Computation. In PAI Workshop held in conjunction with AAAI’2020.
- Beaver (2012) D. Beaver. 2012. Efficient Multiparty Protocols using Circuit Randomisation. Springer 576 (2012).
- Ben-Or et al. (1988) Michael Ben-Or, Shafi Goldwasser, and Avi Wigderson. 1988. Completeness Theorems for Non-Cryptographic Fault-Tolerant Distributed Computation. In Proceedings of the 20th Annual ACM Symposium on Theory of Computing, May 2-4, 1988, Chicago, Illinois, USA, Janos Simon (Ed.). ACM, 1–10.
- Bost et al. (2015) Raphael Bost, Raluca Ada Popa, Stephen Tu, and Shafi Goldwasser. 2015. Machine Learning Classification over Encrypted Data. In 22nd Annual Network and Distributed System Security Symposium, NDSS 2015, San Diego, California, USA, February 8-11, 2015. The Internet Society.
- Chaudhuri et al. (2011) Kamalika Chaudhuri, Claire Monteleoni, and Anand D. Sarwate. 2011. Differentially Private Empirical Risk Minimization. J. Mach. Learn. Res. 12 (2011), 1069–1109.
- Cristofaro (2020) Emiliano De Cristofaro. 2020. An Overview of Privacy in Machine Learning. CoRR abs/2005.08679 (2020). arXiv:2005.08679 https://arxiv.org/abs/2005.08679
- Dahl et al. (2012) G. E. Dahl, Dong Yu, Li Deng, and A. Acero. 2012. Context-Dependent Pre-Trained Deep Neural Networks for Large-Vocabulary Speech Recognition. IEEE Transactions on Audio, Speech, and Language Processing 20, 1 (Jan. 2012), 30–42. https://doi.org/10.1109/tasl.2011.2134090
- Damgård et al. (2012) Ivan Damgård, Valerio Pastro, Nigel P. Smart, and Sarah Zakarias. 2012. Multiparty Computation from Somewhat Homomorphic Encryption. In Advances in Cryptology - CRYPTO 2012 - 32nd Annual Cryptology Conference, Santa Barbara, CA, USA, August 19-23, 2012. Proceedings (Lecture Notes in Computer Science), Reihaneh Safavi-Naini and Ran Canetti (Eds.), Vol. 7417. Springer, 643–662.
- Dwork et al. (2006) Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam D. Smith. 2006. Calibrating Noise to Sensitivity in Private Data Analysis. In Theory of Cryptography, Third Theory of Cryptography Conference, TCC 2006, New York, NY, USA, March 4-7, 2006, Proceedings (Lecture Notes in Computer Science), Shai Halevi and Tal Rabin (Eds.), Vol. 3876. Springer, 265–284.
- Fredrikson et al. (2015) Matt Fredrikson, Somesh Jha, and Thomas Ristenpart. 2015. Model Inversion Attacks that Exploit Confidence Information and Basic Countermeasures. In Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, Denver, CO, USA, October 12-16, 2015. 1322–1333.
- Gao et al. (2019) Feng Gao, Wei Wang, Miaomiao Tan, Lina Zhu, Yuchen Zhang, Evelyn Fessler, Louis Vermeulen, and Xin Wang. 2019. DeepCC: a novel deep learning-based framework for cancer molecular subtype classification. Oncogenesis 8, 9 (Aug. 2019). https://doi.org/10.1038/s41389-019-0157-8
- Giacomelli et al. (2018) Irene Giacomelli, Somesh Jha, Ross Kleiman, David Page, and Kyonghwan Yoon. 2018. Privacy-Preserving Collaborative Prediction using Random Forests. CoRR abs/1811.08695 (2018). arXiv:1811.08695 http://arxiv.org/abs/1811.08695
- Gilad-Bachrach et al. (2016) Ran Gilad-Bachrach, Nathan Dowlin, Kim Laine, Kristin E. Lauter, Michael Naehrig, and John Wernsing. 2016. CryptoNets: Applying Neural Networks to Encrypted Data with High Throughput and Accuracy. In Proceedings of the 33nd International Conference on Machine Learning, ICML 2016, New York City, NY, USA, June 19-24, 2016 (JMLR Workshop and Conference Proceedings), Maria-Florina Balcan and Kilian Q. Weinberger (Eds.), Vol. 48. JMLR.org, 201–210.
- Graepel et al. (2012) Thore Graepel, Kristin E. Lauter, and Michael Naehrig. 2012. ML Confidential: Machine Learning on Encrypted Data. In Information Security and Cryptology - ICISC 2012 - 15th International Conference, Seoul, Korea, November 28-30, 2012, Revised Selected Papers (Lecture Notes in Computer Science), Taekyoung Kwon, Mun-Kyu Lee, and Daesung Kwon (Eds.), Vol. 7839. Springer, 1–21.
- Haagh et al. (2018) Helene Haagh, Aleksandr Karbyshev, Sabine Oechsner, Bas Spitters, and Pierre-Yves Strub. 2018. Computer-Aided Proofs for Multiparty Computation with Active Security. In 31st IEEE Computer Security Foundations Symposium, CSF 2018, Oxford, United Kingdom, July 9-12, 2018. IEEE Computer Society, 119–131.
- Jacobson (1957) Nathan Jacobson. 1957. Structure of Rings. Bull. Amer. Math. Soc. 63 (1957), 46–50.
- Jayaraman et al. (2018) Bargav Jayaraman, Lingxiao Wang, David Evans, and Quanquan Gu. 2018. Distributed Learning without Distress: Privacy-Preserving Empirical Risk Minimization. In Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, NeurIPS 2018, 3-8 December 2018, Montréal, Canada. 6346–6357.
- Feigenbaum et al. (2004) Joan Feigenbaum, Benny Pinkas, Raphael Ryger, and Felipe Saint-Jean. 2004. Secure Computation of Surveys. In EU Workshop on Secure Multiparty Protocols, 2004.
- Johnson et al. (2016) Alistair E.W. Johnson, Tom J. Pollard, Lu Shen, Li-wei H. Lehman, Mengling Feng, Mohammad Ghassemi, Benjamin Moody, Peter Szolovits, Leo Anthony Celi, and Roger G. Mark. 2016. MIMIC-III, a freely accessible critical care database. Scientific Data 3, 1 (24 May 2016), 160035. https://doi.org/10.1038/sdata.2016.35
- Kaggle (2020) Kaggle. 2020. Invasive Ductal Carcinoma Dataset. https://www.kaggle.com/paultimothymooney/breast-histopathology-images.
- Krizhevsky et al. (2017) Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. 2017. ImageNet classification with deep convolutional neural networks. Commun. ACM 60, 6 (2017), 84–90.
- LeCun et al. (2020) Yann LeCun, Corinna Cortes, and Christopher J.C. Burges. 2020. The MNIST Database of Handwritten Digits. http://yann.lecun.com/exdb/mnist/.
- Lindell and Pinkas (2000) Yehuda Lindell and Benny Pinkas. 2000. Privacy Preserving Data Mining. In Advances in Cryptology - CRYPTO 2000, 20th Annual International Cryptology Conference, Santa Barbara, California, USA, August 20-24, 2000, Proceedings (Lecture Notes in Computer Science), Mihir Bellare (Ed.), Vol. 1880. Springer, 36–54.
- Liu et al. (2017) Jian Liu, Mika Juuti, Yao Lu, and N. Asokan. 2017. Oblivious Neural Network Predictions via MiniONN Transformations. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, CCS 2017, Dallas, TX, USA, October 30 - November 03, 2017, Bhavani M. Thuraisingham, David Evans, Tal Malkin, and Dongyan Xu (Eds.). ACM, 619–631.
- Mohassel and Zhang (2017) Payman Mohassel and Yupeng Zhang. 2017. SecureML: A System for Scalable Privacy-Preserving Machine Learning. In 2017 IEEE Symposium on Security and Privacy, SP 2017, San Jose, CA, USA, May 22-26, 2017. IEEE Computer Society, 19–38.
- Nair et al. (2015) Divya G. Nair, V. P. Binu, and G. Santhosh Kumar. 2015. An Improved E-voting scheme using Secret Sharing based Secure Multi-party Computation. CoRR abs/1502.07469 (2015).
- OpenMined (2020) OpenMined. 2020. PySyft. https://github.com/OpenMined/PySyft/blob/master/syft/frameworks/torch/mpc/securenn.py.
- Papernot et al. (2017) Nicolas Papernot, Martín Abadi, Úlfar Erlingsson, Ian J. Goodfellow, and Kunal Talwar. 2017. Semi-supervised Knowledge Transfer for Deep Learning from Private Training Data. In 5th International Conference on Learning Representations, ICLR 2017.
- Papernot et al. (2018a) Nicolas Papernot, Patrick D. McDaniel, Arunesh Sinha, and Michael P. Wellman. 2018a. SoK: Security and Privacy in Machine Learning. In 2018 IEEE European Symposium on Security and Privacy, EuroS&P 2018, London, United Kingdom, April 24-26, 2018. 399–414.
- Papernot et al. (2018b) Nicolas Papernot, Shuang Song, Ilya Mironov, Ananth Raghunathan, Kunal Talwar, and Úlfar Erlingsson. 2018b. Scalable Private Learning with PATE. In 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings.
- Rahman et al. (2018) Md. Atiqur Rahman, Tanzila Rahman, Robert Laganière, and Noman Mohammed. 2018. Membership Inference Attack against Differentially Private Deep Learning Model. Trans. Data Priv. 11, 1 (2018), 61–79.
- Riazi et al. (2018) M. Sadegh Riazi, Christian Weinert, Oleksandr Tkachenko, Ebrahim M. Songhori, Thomas Schneider, and Farinaz Koushanfar. 2018. Chameleon: A Hybrid Secure Computation Framework for Machine Learning Applications. In Proceedings of the 2018 on Asia Conference on Computer and Communications Security, AsiaCCS 2018, Incheon, Republic of Korea, June 04-08, 2018, Jong Kim, Gail-Joon Ahn, Seungjoo Kim, Yongdae Kim, Javier López, and Taesoo Kim (Eds.). ACM, 707–721.
- Sallab et al. (2017) Ahmad El Sallab, Mohammed Abdou, Etienne Perot, and Senthil Kumar Yogamani. 2017. Deep Reinforcement Learning framework for Autonomous Driving. CoRR abs/1704.02532 (2017).
- Shamir (1979) Adi Shamir. 1979. How to Share a Secret. Commun. ACM 22, 11 (1979), 612–613.
- Shokri et al. (2017) Reza Shokri, Marco Stronati, Congzheng Song, and Vitaly Shmatikov. 2017. Membership Inference Attacks Against Machine Learning Models. In 2017 IEEE Symposium on Security and Privacy, SP 2017, San Jose, CA, USA, May 22-26, 2017. IEEE Computer Society, 3–18.
- Tramèr et al. (2016) Florian Tramèr, Fan Zhang, Ari Juels, Michael K. Reiter, and Thomas Ristenpart. 2016. Stealing Machine Learning Models via Prediction APIs. In 25th USENIX Security Symposium, USENIX Security 16, Austin, TX, USA, August 10-12, 2016. 601–618.
- Wagh et al. (2018) Sameer Wagh, Divya Gupta, and Nishanth Chandran. 2018. SecureNN: Efficient and Private Neural Network Training. IACR Cryptol. ePrint Arch. 2018 (2018), 442.
- Wagh et al. (2019) Sameer Wagh, Divya Gupta, and Nishanth Chandran. 2019. SecureNN: 3-Party Secure Computation for Neural Network Training. Proc. Priv. Enhancing Technol. 2019, 3 (2019), 26–49.
- Xiao et al. (2017) Han Xiao, Kashif Rasul, and Roland Vollgraf. 2017. Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms. CoRR abs/1708.07747 (2017). arXiv:1708.07747
- Yao (1986) Andrew Chi-Chih Yao. 1986. How to Generate and Exchange Secrets (Extended Abstract). In 27th Annual Symposium on Foundations of Computer Science, Toronto, Canada, 27-29 October 1986. IEEE Computer Society, 162–167.
8. Appendix
8.1. Model Architectures Used for Evaluation
Model Architecture of MNIST and Fashion-MNIST
Number of input neurons
Number of output neurons
Number of hidden layers
Number of neurons in hidden layer =
Number of neurons in hidden layer =
Activation Function: ReLU
Error estimation method: MSE
Model Architecture of IDC Regular
Number of input neurons
Number of output neurons
Number of hidden layers
Number of neurons in hidden layer =
Activation Function: ReLU
Error estimation method: MSE
Model Architecture of MIMIC Critical Care
Number of input neurons
Number of output neurons
Number of hidden layers
Number of neurons in hidden layer =
Activation Function: ReLU
Error estimation method: MSE
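The listings above describe small fully-connected ReLU networks trained with an MSE criterion. The following is a minimal PyTorch sketch of this architecture family; the input/output dimensions and hidden-layer widths (INPUT_DIM, HIDDEN_1, HIDDEN_2, NUM_CLASSES) are placeholder values for illustration, not the exact sizes used in the evaluation.

```python
# Minimal sketch of the fully-connected ReLU architecture family listed above.
# Layer widths and input/output dimensions are assumed placeholders.
import torch
import torch.nn as nn

INPUT_DIM, HIDDEN_1, HIDDEN_2, NUM_CLASSES = 784, 128, 64, 10  # assumed sizes

class FeedForwardClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(INPUT_DIM, HIDDEN_1),
            nn.ReLU(),
            nn.Linear(HIDDEN_1, HIDDEN_2),
            nn.ReLU(),
            nn.Linear(HIDDEN_2, NUM_CLASSES),
        )

    def forward(self, x):
        return self.net(x)

model = FeedForwardClassifier()
criterion = nn.MSELoss()  # the listings report MSE as the error estimation method

# One illustrative training step on random data (one-hot targets for MSE).
x = torch.randn(32, INPUT_DIM)
y = nn.functional.one_hot(torch.randint(0, NUM_CLASSES, (32,)), NUM_CLASSES).float()
loss = criterion(model(x), y)
loss.backward()
```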
8.2. Illustration of Additive Secret Sharing
For additive secret sharing, we first fix a finite field $\mathbb{Z}_Q$ to which all secret shares belong, where $Q$ is a very large prime number. Now, suppose we split a secret $x$ into $n$ shares such that $x = (x_1 + x_2 + \dots + x_n) \bmod Q$; the $i$-th share is $x_i$, and the reconstruction is $x = \left(\sum_{i=1}^{n} x_i\right) \bmod Q$ (Haagh et al., 2018). For two workers A and B, an input value $x$ is split into two secret shares and sent to A and B as follows:

$x_A = r \bmod Q, \qquad x_B = (x - r) \bmod Q,$

where $r$ is drawn uniformly at random from $\mathbb{Z}_Q$. The reconstruction is done using the following procedure:

$x = (x_A + x_B) \bmod Q.$

The reconstruction can be proved as follows:

$(x_A + x_B) \bmod Q = (r + x - r) \bmod Q = x \bmod Q = x.$
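To make the sharing and reconstruction steps concrete, here is a minimal Python sketch of 2-party additive secret sharing over $\mathbb{Z}_Q$; the prime $Q$ and the secret value are illustrative choices, not the parameters used by pricure.

```python
# Minimal sketch of 2-party additive secret sharing over Z_Q (illustrative Q).
import secrets

Q = 2**61 - 1  # a large Mersenne prime, assumed field size for illustration

def share(x, q=Q):
    """Split x into two additive shares x_A, x_B with x = (x_A + x_B) mod q."""
    x_a = secrets.randbelow(q)
    x_b = (x - x_a) % q
    return x_a, x_b

def reconstruct(x_a, x_b, q=Q):
    """Recover x from its two shares."""
    return (x_a + x_b) % q

x = 1234
x_a, x_b = share(x)           # x_A goes to worker A, x_B to worker B
assert reconstruct(x_a, x_b) == x
```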
For summation, suppose we want to compute $z = x + y$ via secret sharing. We divide $x$ and $y$ using Equations 4 and 5 as follows:

$x_A = r_x \bmod Q, \quad x_B = (x - r_x) \bmod Q, \qquad y_A = r_y \bmod Q, \quad y_B = (y - r_y) \bmod Q.$

Now, the system sends $x_A$ and $y_A$ to worker A, and $x_B$ and $y_B$ to worker B. The summation operation on each worker's side will be:

$z_A = (x_A + y_A) \bmod Q, \qquad z_B = (x_B + y_B) \bmod Q.$

This reconstruction can be proved as follows:

$(z_A + z_B) \bmod Q = (x_A + x_B + y_A + y_B) \bmod Q = (x + y) \bmod Q = x + y.$

Since $Q$ is a large prime compared to $x$ and $y$, the above illustration shows that additive secret sharing reconstructs $z = x + y$.
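A short sketch of the summation step, under the same illustrative $Q$ as above: each worker adds the shares it holds locally (no interaction is needed), and the sum is recovered by reconstructing the two result shares.

```python
# Addition of secret-shared values is local to each worker (illustrative Q).
import secrets

Q = 2**61 - 1

def share(x, q=Q):
    x_a = secrets.randbelow(q)
    return x_a, (x - x_a) % q

x, y = 1234, 5678
x_a, x_b = share(x)
y_a, y_b = share(y)

z_a = (x_a + y_a) % Q   # computed by worker A
z_b = (x_b + y_b) % Q   # computed by worker B

assert (z_a + z_b) % Q == x + y   # reconstruction yields x + y (since x + y < Q)
```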
8.3. Illustration of SPDZ
For two numbers $x$ and $y$, our goal is to compute $z = x \cdot y$ securely. We divide them using Equations 4 and 5 as follows:

$x = (x_A + x_B) \bmod Q, \qquad y = (y_A + y_B) \bmod Q.$

The multiplication operation will be:

$x \cdot y = (x_A + x_B)(y_A + y_B) = x_A y_A + x_A y_B + x_B y_A + x_B y_B \pmod{Q}.$

We can observe from the above that, to calculate $x_A y_B$ and $x_B y_A$, worker A and worker B need each other's shares, which conflicts with the privacy assumptions of our approach. To resolve this conflict, the SPDZ protocol (Damgård et al., 2012), which relies on Beaver's multiplication triplets (Beaver, 2012), is used to perform secret-shared multiplication. In this technique, independent of $x$ and $y$, values called 'multiplication triplets' are generated in an offline phase. In this phase, three numbers $a$, $b$, and $c$ are generated and shared with the two parties ahead of running the protocol. The numbers $a$, $b$, and $c$ are generated as follows:

$a \xleftarrow{\$} \mathbb{Z}_Q, \qquad b \xleftarrow{\$} \mathbb{Z}_Q, \qquad c = a \cdot b \bmod Q.$

Now, these $a$, $b$, and $c$ parameters are secretly shared with the workers, and worker A will compute:

$\alpha_A = (x_A - a_A) \bmod Q, \qquad \beta_A = (y_A - b_A) \bmod Q.$

And for worker B:

$\alpha_B = (x_B - a_B) \bmod Q, \qquad \beta_B = (y_B - b_B) \bmod Q.$

The workers then reconstruct $\alpha = x - a$ and $\beta = y - b$ using Equation 6; these two parameters become public to worker A and worker B. Now the intermediate results of the multiplication of $x$ and $y$ will be:

$z_A = (c_A + \alpha \cdot b_A + \beta \cdot a_A + \alpha \cdot \beta) \bmod Q, \qquad z_B = (c_B + \alpha \cdot b_B + \beta \cdot a_B) \bmod Q.$

If we reconstruct $z_A$ and $z_B$ using Equation 6, we obtain the real product, since $z_A + z_B = c + \alpha b + \beta a + \alpha\beta = ab + (x-a)b + (y-b)a + (x-a)(y-b) = x \cdot y \pmod{Q}$.
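As an illustration of the offline/online structure described above, the following Python sketch performs secret-shared multiplication with a Beaver multiplication triplet under the same illustrative $Q$; the triple generation is simulated locally here, whereas in a deployment it would be produced ahead of the protocol by a party other than the two workers.

```python
# Sketch of secret-shared multiplication with a Beaver triple (a, b, c = a*b).
# The offline triple generation is simulated locally for illustration only.
import secrets

Q = 2**61 - 1

def share(v, q=Q):
    v_a = secrets.randbelow(q)
    return v_a, (v - v_a) % q

def reconstruct(v_a, v_b, q=Q):
    return (v_a + v_b) % q

x, y = 1234, 5678
x_a, x_b = share(x)
y_a, y_b = share(y)

# Offline phase: generate the triple and secret-share it with both workers.
a, b = secrets.randbelow(Q), secrets.randbelow(Q)
c = (a * b) % Q
a_a, a_b = share(a)
b_a, b_b = share(b)
c_a, c_b = share(c)

# Online phase: each worker masks its shares, then alpha and beta are opened.
alpha = reconstruct(x_a - a_a, x_b - a_b)   # alpha = (x - a) mod Q
beta = reconstruct(y_a - b_a, y_b - b_b)    # beta  = (y - b) mod Q

# Each worker computes its share of z = x*y; only one worker adds alpha*beta.
z_a = (c_a + alpha * b_a + beta * a_a + alpha * beta) % Q   # worker A
z_b = (c_b + alpha * b_b + beta * a_b) % Q                  # worker B

assert reconstruct(z_a, z_b) == (x * y) % Q
```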