A Comprehensive Analysis of Information Leakage in Deep Transfer Learning
Abstract.
Transfer learning is widely used to transfer knowledge from a source domain to a target domain where labeled data is scarce. Recently, deep transfer learning has achieved remarkable progress in various applications. However, in many real-world scenarios the source and target datasets belong to two different organizations, which poses potential privacy issues for deep transfer learning. In this study, to thoroughly analyze the potential privacy leakage in deep transfer learning, we first divide previous methods into three categories. Based on this categorization, we demonstrate specific threats that lead to unintentional privacy leakage in each category. Additionally, we provide solutions to prevent these threats. To the best of our knowledge, our study is the first to provide a thorough analysis of the information leakage issues in deep transfer learning methods and to propose potential solutions. Extensive experiments on two public datasets and an industry dataset are conducted to show the privacy leakage under different deep transfer learning settings and the effectiveness of the defense solutions.
1. Introduction
Transfer learning is a rapidly growing field of machine learning that aims to improve the learning of a data-deficient task through knowledge transfer from related data-sufficient tasks (Pan and Yang, 2010; Tan et al., 2018; Zhuang et al., 2019). Witnessing the success of deep learning, deep transfer learning has been widely studied and has demonstrated remarkable performance in various applications, such as medical image classification (Raghu et al., 2019), electronic health data analysis (Dubois et al., 2017), and credit modeling (Stamate et al., 2015).
A fundamental building block of deep transfer learning is the deep neural network, which is vulnerable to various attacks aiming to extract sensitive information from the training dataset (Wu et al., 2019b; Shokri et al., 2017; Ganju et al., 2018). Moreover, in most real-world applications of deep transfer learning, the source and target datasets reside in two different organizations. As a result, deep transfer learning also faces potential privacy threats, i.e., the client in the target organization can exploit the vulnerability of deep learning models to infer sensitive information about the source organization's data. Specifically, applying deep transfer learning involves interaction between the source and target domains; thus, the data transmitted between these domains may unintentionally disclose private information.
Existing studies on privacy leakage focus either on general machine learning models (Shokri et al., 2017; Nasr et al., 2019a) or on the federated learning setting, where a model is collaboratively trained by multiple clients that share and aggregate gradients via a server (Melis et al., 2019a; Nasr et al., 2019a). However, there is no such study on transfer learning paradigms. To this end, we are the first to provide a general categorization of deep transfer learning models based on their potential information leakage. This is not trivial, since there are numerous methods for deep transfer learning (Pan and Yang, 2010; Tan et al., 2018; Zhuang et al., 2019). Given the goal of privacy leakage analysis, we care most about the interaction manner between the source and target domains. Thus, we divide previous works into three categories, as illustrated in Figure 1: (1) the model-based paradigm, where the whole model structure and parameters are shared; (2) the mapping-based paradigm, where the hidden features are shared; and (3) the parameter-based paradigm, where the parameter gradients are shared. Previous works fall into these categories or a hybrid of them. For example, fine-tuning based approaches clearly belong to the first category. The prior work (Sun and Saenko, 2016) follows the mapping-based paradigm, since it uses a correlation alignment loss that depends on the shared hidden features. Similarly, previous works that minimize the domain representation difference via variants of distribution divergence metrics, such as maximum mean discrepancy, also fall into the second category (Long et al., 2015, 2017; Rozantsev et al., 2019). Fully-shared and shared-private transfer learning models (Liu et al., 2017) can be regarded as parameter-based, as they both jointly train a shared network via gradient updates in a multi-task fashion, to name a few examples.

Based on the general categorization, we can build customized attacks against each paradigm and demonstrate information leakage in deep transfer learning. At a high level, we consider inferring two types of sensitive information, i.e., membership and property information. This sensitive information can be revealed by the data transmitted between the two domains, as discussed above. Specifically, in the model-based paradigm, we build the membership attack, which takes the model (learned on the source dataset) as input and determines whether a specific sample was used for training the model. In the mapping-based setting, we build the property attack to infer properties contained in the training dataset; for example, an attacker residing in the target domain aims to infer properties of the source domain from the shared hidden features. In the parameter-based setting, we can similarly perform the property inference attack, i.e., the attacker can infer properties of the source domain data from the shared gradients. More details of these attacks can be found in Section 3.
Empirically, to demonstrate the effectiveness of the attacks, we conduct a set of experiments under the three transfer learning settings. Our key observation is that all three types of models unintentionally leak information about the training data under membership/property attacks. The model-based paradigm can leak membership information. The parameter-based paradigm, which does not reveal individual gradients (i.e., it shares gradients averaged at the batch level), leaks much less property information than the mapping-based paradigm, where hidden features (i.e., sample-level representations) are shared.
In summary, our main contributions are as follows:
• We are the first to propose a general categorization for different deep transfer learning paradigms based on their intrinsic interaction manner between the source and target domains, and we provide a comprehensive analysis of the potential privacy leakage profile of each category.
• Based on the categorization, we build specific attacks against each paradigm to demonstrate their privacy leakages.
• We conduct extensive experiments on both public datasets and an industry marketing dataset to verify the effectiveness of our attacks and defense solutions.
2. Preliminaries
2.1. Deep Transfer Learning Setting
In this work, we focus on deep transfer learning (Tan et al., 2018; Zhuang et al., 2019), where the models discussed are neural network based. Without loss of generality, we consider a transfer learning setting with two domain tasks $\{\mathcal{D}_s, \mathcal{D}_t\}$, where $\mathcal{D}_s$ and $\mathcal{D}_t$ refer to the source domain and the target domain, respectively, both containing private sensitive information. We aim to improve the target domain learning task performance by utilizing its own data $\mathcal{D}_t$ and the source domain data $\mathcal{D}_s$. Each dataset contains a set of labeled examples $\{(x_i, y_i)\}$, where $x_i$ denotes the inputs and $y_i$ the corresponding labels. The size of the source domain data is usually much larger than that of the target domain data, i.e., $|\mathcal{D}_s| \gg |\mathcal{D}_t|$. The goal of a domain task is to learn a transformation function $f_{\theta}: x \mapsto y$, parameterized by model weights $\theta$.
2.2. Inference Attacks for DNN Models
The basic idea of the inference attack is to exploit the leakages when a model is being trained or released to reveal some unintended information from the training data. In this section, we briefly present two types of inference attacks, i.e., membership and property attacks, for machine learning models.
Membership Inference Attack. Membership inference is a typical attack that aims to determine whether a sample was used as part of the training dataset. Membership inference may reveal sensitive information that leads to a privacy breach. For example, if we can infer a patient's presence in the training dataset of a medical study of a certain disease, we can probably claim that this patient has the disease. Recent works have demonstrated membership attacks on machine learning models under the black-box setting (Shokri et al., 2017; Long et al., 2017; Salem et al., 2019), the white-box setting (Melis et al., 2019a; Nasr et al., 2019b), or both (Hayes et al., 2019). Shadow training is a widely adopted technique for membership inference, where multiple shadow models are trained to mimic the behavior of the target model (Shokri et al., 2017; Long et al., 2017; Salem et al., 2019). This technique assumes the attacker has some prior knowledge about the population from which the target model's training dataset was drawn. Recent works (Melis et al., 2019a; Nasr et al., 2019b) explicitly exploit the vulnerabilities in gradient updates to perform attacks with white-box access.
Table 1. Categorization of deep transfer learning paradigms, their training strategies, leakage profiles, and the corresponding inference attacks.

| Categorization | Brief Description | Training Strategy | Leakage Profile | Inference Attack Type |
|---|---|---|---|---|
| Model-based | Model fine-tuning, i.e., continued training on the target domain. | Self-training | Final source domain model | Membership |
| Mapping-based | Hidden representation alignment, i.e., reducing distribution divergence. | Co-training | Hidden representations | Property |
| Parameter-based | Jointly updating a shared partial network, i.e., hard parameter sharing. | Co-training | Shared-network gradient updates | Batch property |
Property Inference Attack. Another common type of attack is property inference, which aims to reveal certain unintended or sensitive properties (e.g., the fraction of the data that belongs to a certain minority group) of the participating training datasets that the model producer does not intend to share when the model is released. A property is usually uncorrelated or only loosely correlated with the main training task. Pioneering works (Ateniese et al., 2015; Fredrikson et al., 2015; Ganju et al., 2018) conducted property attacks that characterize the entire training dataset, whereas (Melis et al., 2019a) aimed to infer properties of a subset of the training inputs, i.e., single batches, which they termed single-batch properties. In this regard, the membership attack can be viewed as a special case of the property attack whose scope is a single sample.
The two types of attacks mentioned above are closely related. Most existing works perform these attacks against general machine learning models, while a few focus on federated learning and collaborative learning scenarios (Nasr et al., 2019b; Melis et al., 2019a). None of these studies systematically explores inference attacks in the context of deep transfer learning.
3. Privacy Leakage in Deep Transfer Learning
In this section, we first provide a general categorization of deep transfer learning according to the interaction manner between source and target domains. Then, based on the categorization, we conduct privacy analysis through building specific attacks against different transfer learning paradigms.
3.1. General Categorization of Deep Transfer Learning
Different types of transfer learning models have been proposed over the years, depending on how the knowledge is shared. Although several categorizations of transfer learning already exist in the literature (Pan and Yang, 2010; Tan et al., 2018; Nasr et al., 2019a), we categorize deep transfer learning models based on how the two domains interact, their training strategy (e.g., co-training or self-training), and the potential leakages. Broadly speaking, these transfer learning models can be categorized into three types, i.e., model-based, mapping-based, and parameter-based, as illustrated in Figure 1.
• Model-based (Figure 1(a)) is a simple but effective transfer learning paradigm, where the pre-trained source domain model is used as the initialization for continued training on the target domain data. Model-based fine-tuning has been broadly used to reduce the amount of labeled data needed for learning new tasks or tasks in new domains (He et al., 2016; Howard and Ruder, 2018; Devlin et al., 2019).
• Mapping-based (Figure 1(b)) methods aim to align the hidden representations by explicitly reducing the marginal or conditional distribution divergence between the source and target domains. More specifically, alignment losses between domains, usually in the form of distribution discrepancy/distance metrics such as MMD and JMMD, are measured by such feature mapping approaches and minimized as part of the loss functions (Long et al., 2015, 2017; Rozantsev et al., 2019); a sketch of this idea follows the list.
• Parameter-based (Figure 1(c)) methods transfer knowledge by jointly updating a shared partial network to learn transferable feature representations across domains in a multi-task fashion. This type of method is mainly achieved by parameter sharing. Regardless of design differences, these approaches all use a shared network structure to transform the input data into domain-invariant hidden representations (Yosinski et al., 2014; Liu et al., 2017; Yang et al., 2017).
Based on the general categorization, we will discuss the threat model, information leakage and the customized attacks against each transfer learning paradigm in detail.
3.2. Threat Model
In this work, we assume that all the parties, i.e., the domain-specific data owners, are semi-honest: they follow the computation protocols exactly but may try to infer as much information as possible when interacting with the other parties. More specifically, we work on the threat model under a deep transfer learning setting with two domains. Without loss of generality, we assume the owner of the target dataset to be the attacker, who intentionally attempts to infer additional information about the source domain data beyond what is explicitly revealed. Depending on the deep transfer learning category, the attacker may have different access to the source domain information. Note that we can naturally extend to the case where the attacker is the owner of the source dataset, or to transfer learning settings with more than two data sources; however, this is not the focus of this paper.
3.3. Privacy Analysis
As presented in Table 1, the three categories of models require different information interactions and training strategies, and thus pose different potential leakage profiles. More specifically:
• Model-based methods rely solely on the pre-trained source domain model. Although the pre-trained model must be disclosed to the target domain, the source and target training processes can be entirely separate; thus the potential leakage is only the final source domain model. In this case, the attacker has full access to the source domain model, including both the model structure and parameters.
• Mapping-based methods are optimized in a co-training fashion, and the hidden representations of both domains have to interact with each other to measure the alignment losses or domain regularizers. We denote $h_s^{l,t}$ and $h_t^{l,t}$ as the hidden representations of layer $l$ at training iteration $t$ for the source and target domains, respectively. Specifically, such a feature matching process demands that the hidden representations of both domains, i.e., $h_s^{l,t}$ and $h_t^{l,t}$, be exposed and aligned to reduce the marginal or conditional distribution divergence between domains. As a result, $h_s^{l,t}$ can potentially leak information from the training data.
• Parameter-based methods jointly update the shared partial network. The interactions between the source and target domains happen when exchanging gradients to synchronize the shared network parameters. Let $\theta_{sh}$ denote the model parameters of the shared network structure. At each training iteration $t$, $\theta_{sh}$ is updated by averaging the gradients computed on a mini-batch of training examples sampled from one domain dataset (either source or target). In such an alternating process, $\theta_{sh}^{(t)}$ has to be revealed across domains at each iteration $t$. Thus, the potential leakage profile contains all the intermediate $\theta_{sh}^{(t)}$ during the training process; the exchanged update can be written as shown below.
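For concreteness, under generic notation introduced here (learning rate $\eta$ and mini-batch $\mathcal{B}_t$ are our symbols, not part of any specific method), the exchanged update is

$$\theta_{sh}^{(t+1)} \;=\; \theta_{sh}^{(t)} \;-\; \eta\,\frac{1}{|\mathcal{B}_t|}\sum_{(x_i,\,y_i)\in\mathcal{B}_t}\nabla_{\theta_{sh}}\,\ell\big(f(x_i;\theta),\,y_i\big),$$

where $\mathcal{B}_t$ is a mini-batch drawn from either domain at iteration $t$. The attacker observes the sequence $\{\theta_{sh}^{(t)}\}$ and can therefore recover the batch-averaged gradients by differencing consecutive snapshots.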
As a summary, in this paper we focus on inferring information that is unintentionally disclosed by the data transmitted between the source and target domains, such as the membership information of an individual sample and the property information of a specific sample or a subset of samples (see more details in the next subsection).
3.4. Inference Attacks for Deep Transfer Learning
Inference attacks against deep learning models generally exploit implicit memorization in the models to recover sensitive information from the training data. Previous studies have shown that information can be inferred from the leakage profile, such as membership information that indicates whether a sample was used for training (Li et al., 2013; Shokri et al., 2017; Long et al., 2018; Melis et al., 2019a; Wu et al., 2019a), property information that reveals certain data characteristics, or reconstructed feature values of the private training records.
In this part, to empirically evaluate the privacy leakage in the different transfer learning settings (shown in Figure 1), we present concrete attack methods for each setting. Note that we assume the attacker is the owner of the target dataset in all three transfer learning paradigms discussed above.
Model-based. As illustrated in Table 1, the only leakage source in this setting is the model trained using data from the source domain. For simplicity, we denote the trained model as $M_s: x \mapsto y$, where $y$ is the prediction label. According to the training protocol in the model-based setting, the attacker obtains white-box access to $M_s$, i.e., its structure and parameters. Thus, following recent works (Shokri et al., 2017; Long et al., 2017; Salem et al., 2019; Melis et al., 2019a; Nasr et al., 2019b), an attacker can design a powerful membership attack to detect sensitive information contained in the source domain. In the context of the membership attack, the goal of the attacker is to train a membership predictor that takes a given sample as input and outputs a probability indicating whether the sample was used for training the source domain model $M_s$. Formally, we denote the membership predictor as $A_m$. Here, $A_m(x) = 1$ means the given input $x$ was used for training the model $M_s$.
In this paper, we employ the widely used shadow model training technique for building the membership predictor. Specifically, we assume the attacker has extra knowledge about the source dataset $\mathcal{D}_s$: a shadow training dataset $\mathcal{D}_{shadow}$ drawn from the same underlying distribution as the dataset used for training the source model. The core idea of shadow training is to first train multiple shadow models that mimic the behavior of the source model. Then, the attacker extracts useful features from the shadow models/datasets and builds a machine learning model that characterizes the relationship between the extracted features and the membership information.
To be specific, the attacker first evenly divides the shadow training dataset $\mathcal{D}_{shadow}$ into two disjoint datasets $\mathcal{D}_{in}$ and $\mathcal{D}_{out}$. Then the attacker trains a shadow model with the same architecture as the source model using $\mathcal{D}_{in}$. Subsequently, features of samples from both $\mathcal{D}_{in}$ and $\mathcal{D}_{out}$ are extracted: for each sample, the attacker uses the output prediction vector of the shadow model as the feature, following the prior work (Shokri et al., 2017), and each feature vector is labeled with 1 (member, if the sample is in $\mathcal{D}_{in}$) or 0 (non-member, if the sample is in $\mathcal{D}_{out}$). Finally, all the feature-label pairs are used for training the membership predictor.
Once the membership predictor is obtained, the attacker can use it to predict the membership label of a sample with respect to $\mathcal{D}_s$, i.e., by feeding the output vector of the source domain model for that sample to the predictor.
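The following sketch illustrates the shadow-training pipeline described above, assuming a scikit-learn-style interface; the `train_shadow_model` helper, the number of shadow models, and the logistic-regression attack classifier are illustrative placeholders rather than our exact implementation.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def build_membership_predictor(train_shadow_model, X_shadow, y_shadow, n_shadows=3):
    """Shadow training: each shadow model is trained on half of the shadow
    data; its output vectors on in-training vs. held-out samples become the
    member (1) / non-member (0) examples for the attack classifier.

    `train_shadow_model(X, y)` is assumed to return a model exposing
    `predict_proba`, mirroring the source-model architecture.
    """
    feats, labels = [], []
    for _ in range(n_shadows):
        idx = np.random.permutation(len(X_shadow))
        in_idx, out_idx = idx[: len(idx) // 2], idx[len(idx) // 2:]
        shadow = train_shadow_model(X_shadow[in_idx], y_shadow[in_idx])
        feats.append(shadow.predict_proba(X_shadow[in_idx]))    # members
        labels.append(np.ones(len(in_idx)))
        feats.append(shadow.predict_proba(X_shadow[out_idx]))   # non-members
        labels.append(np.zeros(len(out_idx)))
    attack = LogisticRegression(max_iter=1000)
    attack.fit(np.concatenate(feats), np.concatenate(labels))
    return attack

# Online use: feed the source model's output vector for a candidate sample,
# e.g. attack.predict_proba(source_model.predict_proba(x))[:, 1]
```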
Mapping-based. In the mapping-based setting, information leakage can occur at both the source and target domains, since the training protocol proceeds in an interactive manner. For consistency of the analysis, we take the attacker to be the owner of the target domain dataset.
The core idea of the mapping-based method is to align the hidden features extracted from the source and target domains, i.e., to reduce the discrepancy between the feature distributions of the two domains. This can be done by minimizing a pre-defined alignment loss function (e.g., maximum mean discrepancy (Long et al., 2015, 2017; Rozantsev et al., 2019; Nasr et al., 2019a)). As a result, the hidden features of the source domain share a similar distribution with those of the target domain, which has a strong privacy implication: the attacker can leverage this feature similarity to build an attack model that detects sensitive information contained in the source domain.
We consider the property attack (Ganju et al., 2018) in this setting. At a high level, the property attack aims to infer whether a feature coming from the source domain has a specific property or not. Here, we take the $t$-th iteration as an example. Given a specific property, the attacker first collects an auxiliary dataset to assist the attack. Specifically, the attacker divides the target domain dataset into two subsets, namely $\mathcal{D}_{prop}$, which contains samples with the property, and $\mathcal{D}_{nonprop}$, which consists of samples without the property. Subsequently, $\mathcal{D}_{prop}$ and $\mathcal{D}_{nonprop}$ are used as the auxiliary datasets for building the attack model. At the $t$-th iteration, given the current parameters of the model trained on the target dataset, the attacker calculates the hidden features of the alignment layer for the samples in the auxiliary datasets. Hidden features are labeled with 1 if the corresponding sample is in $\mathcal{D}_{prop}$ and 0 if the sample is in $\mathcal{D}_{nonprop}$. Once the attacker collects all the feature-label pairs, she can train a property predictor $A_p$ using these pairs. The whole procedure is demonstrated in Algorithm 1.
Based on the property predictor $A_p$ obtained above, the attacker can conduct an online attack. Specifically, in the joint training process, at the beginning of the $t$-th iteration, the attacker receives a batch of hidden features from the source domain. Then, the attacker can employ $A_p$ to predict the property information contained in the source domain dataset.
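A sketch of the offline stage (building the property predictor from the attacker's own target-domain split) is given below; the `hidden_fn` helper and the logistic-regression classifier are illustrative assumptions, not the exact attack model used in our experiments.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def build_property_predictor(hidden_fn, x_prop, x_nonprop):
    """Train the property predictor on the attacker's auxiliary split.

    `hidden_fn(x)` is assumed to return the alignment-layer features computed
    with the attacker's current target-model parameters; `x_prop` / `x_nonprop`
    are the target-domain samples with / without the property.
    """
    feats = np.concatenate([hidden_fn(x_prop), hidden_fn(x_nonprop)])
    labels = np.concatenate([np.ones(len(x_prop)), np.zeros(len(x_nonprop))])
    return LogisticRegression(max_iter=1000).fit(feats, labels)

# Online attack at iteration t: score the batch of hidden features received
# from the source domain, e.g. predictor.predict_proba(h_source_batch)[:, 1]
```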
Parameter-based. In the parameter-based setting, information leakage is posed by the exchange of weight parameters between the source and target domains. As in the previous settings, we take the attacker to be the owner of the target domain dataset.
We consider the batch property attack in this setting. The intuition behind this attack is that the attacker can observe the updates of the shared layers computed from a mini-batch of source domain samples. Thus, the attacker can train a batch property predictor to infer whether the mini-batch behind an update has a given property or not. We take the $t$-th iteration as an example to demonstrate how the attacker conducts the attack. We assume that the attacker has an auxiliary dataset $\mathcal{D}_{aux}$ consisting of samples from a distribution similar to the source domain distribution. Note that this assumption is commonly used in previous works (Sharma et al., 2019; Melis et al., 2019b). Given a specific property, the attacker further divides $\mathcal{D}_{aux}$ into two sub-datasets, namely the dataset with the property ($\mathcal{D}_{prop}$) and the dataset without the property ($\mathcal{D}_{nonprop}$).
Based on the above setting, at the $t$-th iteration, the attacker receives the fresh shared-layer parameters $\theta_{sh}^{(t)}$, which are updated based on samples from the source dataset. Then, the attacker calculates gradients using mini-batches sampled from $\mathcal{D}_{prop}$ and mini-batches sampled from $\mathcal{D}_{nonprop}$. The batch gradients based on $\mathcal{D}_{prop}$ are labeled with 1 and the others with 0. Based on these gradient-label pairs, the attacker trains the batch property predictor $A_b$. The whole procedure is shown in Algorithm 2.
Once $A_b$ is obtained, the attacker can use it to predict the batch property information of the source domain. Specifically, the attacker first recovers the gradient from the parameters of the current and the previous iteration (i.e., $\theta_{sh}^{(t)}$ and $\theta_{sh}^{(t-1)}$). Then she feeds this gradient to $A_b$ to obtain the prediction result.
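A sketch of the offline stage is given below; the `grad_fn` helper (returning the flattened shared-layer gradient at the current shared parameters), the batch count, and the classifier choice are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def build_batch_property_predictor(grad_fn, prop_data, nonprop_data,
                                   batch_size=8, n_batches=200):
    """Fit the batch property predictor on labeled batch gradients.

    `grad_fn(X, y)` is assumed to return the flattened gradient of the shared
    layers computed at the current shared parameters; `prop_data` and
    `nonprop_data` are (X, y) auxiliary splits with / without the property.
    """
    feats, labels = [], []
    for _ in range(n_batches):
        for (X, y), lab in ((prop_data, 1), (nonprop_data, 0)):
            idx = np.random.choice(len(X), batch_size, replace=False)
            feats.append(grad_fn(X[idx], y[idx]))
            labels.append(lab)
    return LogisticRegression(max_iter=1000).fit(np.stack(feats), labels)

# Online attack: recover the source batch update from consecutive snapshots
# of the shared parameters and score it with the predictor, e.g.
#   g = (theta_shared_prev - theta_shared_curr) / learning_rate
#   predictor.predict_proba(g.reshape(1, -1))[:, 1]
```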
4. Experiments
In this section, we empirically study the potential information leakages in deep transfer learning. We start by describing the datasets and the experimental setup. We then discuss in detail the results for various inference attacks under the aforementioned three transfer learning settings, followed by the examination of viable defenses.
4.1. Datasets and Experimental Setup
We conduct experiments on two public datasets and one industry dataset, i.e., Review, UCI-Adult, and Marketing. The Review dataset is used to examine the membership attack, as fine-tuning methods are often used in such NLP tasks and typically suffer from membership inference. The UCI-Adult dataset has several sensitive properties and is therefore used for conducting property-based attacks. The Marketing dataset is sampled from real-world marketing campaigns and is used to examine the defense solution. Data statistics are given in Table 2.
Review. We construct our dataset from the publicly available Amazon review data (McAuley and Leskovec, 2013), with two categories of products selected, i.e., ‘Watches’ and ‘Electronics’. In our transfer learning setting, the data-abundant domain ‘Electronics’ is viewed as the source domain, while the data-insufficient domain ‘Watches’ is treated as the target. Each sample consists of a quality score and a review text.
Following the literature on this task (Chen et al., 2019), we adopt TextCNN (Kim, 2014) as the base model for textual representation in the transfer learning model. For TextCNN, we set the filter window sizes to 2, 3, 4, and 5 with 128 feature maps each, and the maximum sequence length to 60. We initialize the embedding lookup table with pre-trained GloVe word embeddings of dimension 300. The batch sizes for training the source domain model, the shadow model, and the attack model are all set to 64.
UCI-Adult. We use the Adult Census Income dataset (Kohavi, 1996) from the UC Irvine repository as a second dataset to examine property attacks. The dataset has 14 attributes such as country, age, work-class, education, etc. To form the transfer learning datasets, we use data instances with the country attribute “U.S.” as the source domain and “non-U.S.” as the target. For the UCI dataset, we consider an MLP as the base model for the transfer learning models.
Marketing. For the transfer learning purpose, two datasets are sampled from two real-world marketing campaigns containing user profiles, behavior features, and coupon adoptions. The data-abundant campaign is used as the source domain, while the other is used as the target. The task is to predict whether a user will adopt the coupon.
Table 2. Dataset statistics.

| Dataset | Statistics | Task |
|---|---|---|
| Amazon Review | Electronics: #354,301; Watches: #9,737 | Review Quality Prediction |
| UCI Adult | Source Train: #29,170; Source Test: #14,662; Target Train: #3,391; Target Test: #1,619 | Census Income Prediction |
| Marketing | Source: #236,542; Target: #140,964 | Coupon Adoption Prediction |
Implementation Details. For the model-based setting, a fine-tuning approach is adopted, where both the source domain model and the shadow models employ the same structure, i.e., the above-mentioned TextCNN followed by an MLP layer of size 128; the attack model is an MLP with hidden layers of sizes [16, 8]. For the mapping-based setting, we use an MLP with hidden layers of sizes [64, 8] for the source and target models, an MMD metric as the alignment loss between the two domains, and another MLP with hidden layers of sizes [64, 8] as the attack model. For the parameter-based setting, we consider a fully-shared model structure with an MLP of sizes [64, 8] as the base model at the task training stage, and another MLP with hidden layers of sizes [16, 8] as the attack model.
All the above neural network based transfer learning models are implemented in TensorFlow and trained on an NVIDIA Tesla P100 GPU using the Adam optimizer, unless otherwise specified. The learning rate is set to 0.001 and the activation function is ReLU.
For all settings, we use Area Under Curve (AUC) to evaluate the overall attack performance. For the membership attack, we also report precision, since membership of the training dataset is the main concern; precision is defined as the fraction of samples predicted as members that are indeed members of the training data. For property attacks, we additionally report accuracy.
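For clarity, the attack metrics can be computed as in the small sketch below; the label and score arrays are toy placeholders, not real experimental values.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, precision_score, accuracy_score

# Toy ground-truth member/property labels of the probed samples and the
# attack model's predicted probabilities (placeholders for illustration).
y_true = np.array([1, 0, 1, 1, 0, 0])
y_score = np.array([0.9, 0.4, 0.7, 0.3, 0.6, 0.2])
y_pred = (y_score >= 0.5).astype(int)

print("attack AUC:      ", roc_auc_score(y_true, y_score))
print("attack precision:", precision_score(y_true, y_pred))  # predicted members that are members
print("attack accuracy: ", accuracy_score(y_true, y_pred))
```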
4.2. Model-based Setting
In the model-based setting, the attacker has full white-box access to the source model; thus the attacker is able to train shadow models to mimic the behavior of the source domain model. In this setting, we explore the possibility of performing membership inference on the source model. Despite the context differences, membership attacks on model-based transfer learning models can be conducted in the same way as for any trained stand-alone deep learning model (Shokri et al., 2017; Salem et al., 2019; Melis et al., 2019b).
Table 3. Attack classifier accuracy on the shadow dataset and membership attack performance on the source model, for different numbers of training epochs.

| | Train Acc | Test Acc | Attack AUC | Attack Prec. |
|---|---|---|---|---|
| 10 epochs | 0.9290 | 0.7182 | 0.5014 | 0.4705 |
| 20 epochs | 0.9338 | 0.8648 | 0.7355 | 0.5868 |
| 30 epochs | 0.9589 | 0.8563 | 0.6744 | 0.5832 |
To build the membership predictor introduced in Section 3, we split the original source dataset into three parts, namely a training dataset (248,011 samples), a test dataset (70,860 samples), and a shadow dataset (70,860 samples). We set the number of shadow models to 3. Table 3 shows the training and testing accuracy of the attack classifier on the shadow dataset and the attack performance, i.e., AUC and precision, on the source domain model. A smaller gap between training and testing accuracy suggests better generalization and predictive power of the attack classifier. Overall, we find that even in the model-based setting, without direct interactions between the source and target domains during the training process, the source domain model leaks a considerable amount of membership information. We also examine the effect of over-fitting by adjusting the number of epochs used for training both the source domain and shadow models. We observe that the more over-fitted a model is, the more information it can potentially leak; however, the leakage decreases when the number of epochs increases further. This echoes the finding in (Shokri et al., 2017) that over-fitting is an important but not the only factor contributing to the leakage of membership information.
4.3. Mapping-based Setting
We then investigate property attacks under the mapping-based transfer learning setting to infer properties of the source domain data during the training process. The properties are different from the prediction label and may even be independent of the prediction task. To examine this, we construct two more datasets based on the UCI-Adult dataset, Prop-race and Prop-sex, described as follows.
• Prop-sex: we remove the “sex” feature from the input features of the UCI-Adult data and use it as the attack task label. The attack label is set to 1 if the property “sex” is male and 0 otherwise. This examines whether an attacker can infer a training instance's gender during the training process.
• Prop-race: similarly, we remove the “race” feature and study whether this property of the training instances can be inferred. The attack label is set to 1 if the property “race” is white and 0 otherwise.
Table 4 shows the results of the property attack on the Prop-sex dataset. The attack achieves a good AUC of 0.77 on the property inference task, which shows that this transfer learning setting does leak information. The attack precision is high for the property “sex: male” and less satisfactory for “sex: female”; the female property has far fewer instances and is thus harder to predict.
We also conduct experiments on the property “race” (Table 5); the attack AUC is lower than on Prop-sex (0.5885 vs. 0.7766). These results demonstrate that Prop-sex leaks more property information than Prop-race in terms of attack AUC. At first glance, “race: white” has a much higher attack precision than “race: non-white” (0.8901 vs. 0.2179); closer examination shows that “race: white” dominates the training data with around 80% label coverage.
Table 4. Property attack results on Prop-sex under the mapping-based setting.

| Prop-sex | Test AUC | Test Acc | Attack AUC | Attack Prec. (male) | Attack Prec. (female) |
|---|---|---|---|---|---|
| 2 epochs | 0.8513 | 0.8011 | 0.7357 | 0.7010 | 0.5918 |
| 5 epochs | 0.8522 | 0.8101 | 0.7435 | 0.7072 | 0.6077 |
| 10 epochs | 0.8650 | 0.8171 | 0.7766 | 0.7132 | 0.5883 |
Table 5. Property attack results on Prop-race under the mapping-based setting.

| Prop-race | Test AUC | Test Acc | Attack AUC | Attack Prec. (white) | Attack Prec. (non-white) |
|---|---|---|---|---|---|
| 2 epochs | 0.7640 | 0.8000 | 0.5661 | 0.8812 | 0.1652 |
| 5 epochs | 0.7680 | 0.8012 | 0.5815 | 0.8914 | 0.1845 |
| 10 epochs | 0.7750 | 0.8010 | 0.5885 | 0.8901 | 0.2179 |
We further examine the correlation between the chosen properties and the underlying Census Income prediction task, and compare it with the property attack results in Table 6. In general, both Prop-race and Prop-sex suffer from information leakage, and a property that is more strongly correlated (positively or negatively) with the main prediction task tends to cause more information leakage.
Table 6. Correlation of each property with the main prediction task vs. property attack performance.

| | Correlation | Attack AUC | Attack Acc |
|---|---|---|---|
| Prop-race | -0.0837 | 0.5885 | 0.5646 |
| Prop-sex | -0.2146 | 0.7766 | 0.7059 |
4.4. Parameter-based Setting
For the parameter-based setting, we further process the UCI-Adult data to perform the batch property attack. Again we use the two properties “race” and “sex” and form two datasets, BProp-race and BProp-sex, respectively. For both datasets, we set the batch size to 8. For each batch in BProp-race, we set the label to 1 if the batch contains at least one data instance with the “non-white” property and 0 otherwise. For BProp-sex, we set the label to 1 if the batch contains at least one instance with the “female” property and 0 otherwise; the labeling rule is sketched below. The data statistics are shown in Table 7. We find that for BProp-race, the positive instance ratio in the source domain is drastically different from that in the target domain, which may cause negative transfer, while for BProp-sex the positive instance ratio is similar in both domains.
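The batch-labeling rule can be sketched as follows, assuming a 0/1 per-instance property indicator; the function name and the simulated indicator are illustrative.

```python
import numpy as np

def label_batches(prop_indicator, batch_size=8):
    """Label each consecutive mini-batch 1 if it contains at least one
    instance with the property (e.g., "non-white" or "female"), else 0."""
    n_batches = len(prop_indicator) // batch_size
    batches = np.asarray(prop_indicator)[: n_batches * batch_size]
    return (batches.reshape(n_batches, batch_size).sum(axis=1) > 0).astype(int)

# Example with a simulated 0/1 property indicator for 800 instances.
batch_labels = label_batches(np.random.binomial(1, 0.1, size=800))
print("positive-batch ratio:", batch_labels.mean())
```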
As shown in Algorithm 2, the selection of the auxiliary dataset $\mathcal{D}_{aux}$ is key to building the batch property predictor. In this paper, we directly use the target dataset as the auxiliary dataset and find that it works in practice, as the two domains are related by design for the transfer purpose. Generally, better attack performance can be obtained if part of the source domain samples can be used to form the auxiliary dataset. The results are presented in Table 8. First, we find that the attack AUC for the batch property attack in the parameter-based setting is generally not as high as that of the property attack in the mapping-based setting: the AUC is around 0.56 for the batch property attack, compared to up to 0.77 for the property attack. This shows that the parameter-based setting is generally less vulnerable to attacks, which can be explained by the fact that the exchanged gradients are averaged at the batch level. Second, the results for BProp-sex are better than those for BProp-race (AUC 0.5654 vs. 0.5545). This is intuitive: as observed in Table 7, the domain difference in BProp-race is larger than in BProp-sex. The model trained on BProp-race may suffer from negative transfer due to the domain gap, which in turn may reduce attack performance. Overall, the parameter-based method can suffer from batch property attacks, but the information leakage problem is less severe.
Table 7. Positive/negative data statistics of the source and target domains for BProp-race and BProp-sex.

| | Source (Positive) | Source (Negative) | Target (Positive) | Target (Negative) |
|---|---|---|---|---|
| BProp-race | 38 | 528 | 2225 | 2637 |
| BProp-sex | 65 | 501 | 418 | 4444 |
Table 8. Batch property attack results under the parameter-based setting.

| | Test AUC | Test Acc | Attack AUC | Attack Acc |
|---|---|---|---|---|
| BProp-race | 0.8466 | 0.8054 | 0.5545 | 0.5372 |
| BProp-sex | 0.8577 | 0.8331 | 0.5654 | 0.9140 |
4.5. Defense Solutions
A standard way to prevent statistical information leakage is differential privacy. However, the privacy guarantee provided by differential privacy comes at the cost of decreased model utility. Some recent works propose to relax the requirement of differential privacy to prevent membership/property attacks while providing better model utility. For example, regularization techniques such as dropout are helpful in preventing the information leakage of machine learning models (Wu et al., 2019a; Melis et al., 2019b). Recent studies consider using stochastic gradient Langevin dynamics (SGLD) (Wu et al., 2019a; Mou et al., 2018). In this paper, to prevent information leakage in deep transfer learning, we employ SGLD as the optimizer for the deep models and show its effectiveness in reducing information leakage. Prior work (Wu et al., 2019a) demonstrates that SGLD is effective in preventing membership information leakage and provides theoretical bounds for the membership privacy loss. Empirically, we find that SGLD is also effective in preventing the leakage of property information while providing model performance comparable to non-private training methods.
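A minimal sketch of an SGLD-style update, used as a drop-in replacement for the plain gradient step, is shown below; the noise scaling follows the classic SGLD formulation (noise variance proportional to the learning rate), and the interface is illustrative rather than our exact implementation.

```python
import tensorflow as tf

def sgld_update(variables, gradients, lr=1e-3, temperature=1.0):
    """One SGLD step: a half-step of gradient descent plus Gaussian noise
    whose variance is proportional to the learning rate (the classic SGLD
    scaling at temperature 1)."""
    noise_std = tf.sqrt(lr * temperature)
    for var, grad in zip(variables, gradients):
        noise = tf.random.normal(tf.shape(var), stddev=noise_std)
        # theta <- theta - (lr / 2) * grad + noise
        var.assign_sub(0.5 * lr * grad - noise)

# Usage inside a standard training loop:
#   with tf.GradientTape() as tape:
#       loss = loss_fn(model(x), y)
#   grads = tape.gradient(loss, model.trainable_variables)
#   sgld_update(model.trainable_variables, grads)
```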
To examine the effectiveness of the defense solutions, we conduct experiments under the mapping-based setting. Specifically, we replace the non-private optimizer (i.e., SGD) with SGLD and train the source and target models from scratch. The overall results are shown in Table 9. We observe that the use of SGLD can mitigate the information leakage of the training dataset, based on the attack metrics described above. Compared with the original optimizer, SGLD significantly reduces the attack AUC of the inference attack from 0.7766 to 0.6862 on Prop-sex and from 0.5885 to 0.5442 on Prop-race, while achieving better model utility in terms of task AUC, especially on Prop-race (0.7750 to 0.8012). On both datasets, SGLD achieves the best task AUC and slightly outperforms the original task results. We attribute the boost in task AUC to the regularization effect of the noise injected by SGLD, which helps prevent overfitting, as pointed out in previous works (Wu et al., 2019a; Mou et al., 2018).
We also conduct experiments with DP-SGD, introduced in prior work (Abadi et al., 2016). As the results show, although DP-SGD achieves better anti-attack ability than SGLD, it comes with a considerable decrease in model utility (the task AUC drops from 0.8466 to 0.7662 on Prop-sex and from 0.7750 to 0.7192 on Prop-race). Furthermore, we also experiment with Dropout and observe that it can likewise mitigate privacy leakage; however, as the drop ratio increases, the anti-attack ability of Dropout improves but model utility decreases drastically. In conclusion, our experiments show that SGLD is a good alternative to differentially private optimization methods, achieving a better trade-off between model utility and anti-attack ability.
Table 9. Defense results under the mapping-based setting.

| | Prop-sex Task AUC | Prop-sex Attack AUC | Prop-race Task AUC | Prop-race Attack AUC |
|---|---|---|---|---|
| Original | 0.8466 | 0.7766 | 0.7750 | 0.5885 |
| SGLD | 0.8501 | 0.6862 | 0.8012 | 0.5442 |
| DP-SGD | 0.7662 | 0.6656 | 0.7192 | 0.5172 |
| Dropout-0.1 | 0.7449 | 0.7126 | 0.7748 | 0.6040 |
| Dropout-0.5 | 0.5881 | 0.6132 | 0.6194 | 0.5199 |
| Dropout-0.9 | 0.5240 | 0.5188 | 0.5267 | 0.5041 |
5. Industrial Application
Having witnessed the effectiveness of SGLD, we conduct additional experiments to examine whether it can prevent information leakage in an industrial application. The main task is to predict whether a user will use the coupon, and the inference task is to predict the property “marital status: married” (the Pearson correlation between these labels is -0.03986). Following the original application, we adopt a fully-shared model (Liu et al., 2017) under the parameter-based setting. As shown in Table 10, we find that the original TL method does help improve the target domain performance, boosting AUC from 0.7513 (Target-only) to 0.7704. However, it also suffers from information leakage, with an attack AUC of 0.5553. By combining SGLD with transfer learning, the information leakage is significantly reduced (by 3.8%) at the cost of a minor decrease in task performance (-0.8%).
Table 10. Results on the industrial Marketing dataset under the parameter-based setting.

| | Target only | Original TL | Original Attack | TL+SGLD (Defense) | Attack (Defense) |
|---|---|---|---|---|---|
| AUC | 0.7513 | 0.7704 | 0.5553 | 0.7625 (-0.8%) | 0.5176 (-3.8%) |
6. Related Work
Deep Transfer Learning Models. With the success of deep learning, deep transfer learning has been widely adopted in various applications (Pan and Yang, 2010; Tan et al., 2018; Nasr et al., 2019a). According to our categorization based on the potential information leakage, transfer learning models can be summarized into three types, i.e., model-based, mapping-based, and parameter-based, or a hybrid of these types (Long et al., 2017).
Model-based methods, such as model fine-tuning, generally first pre-train a model on the source domain data and then continue training on the target domain data (He et al., 2016; Howard and Ruder, 2018; Devlin et al., 2019). Mapping-based methods aim to align the hidden representations by explicitly reducing the marginal/conditional distribution divergence between the source and target domains, measured by distribution difference metrics; commonly used metrics include variants of Maximum Mean Discrepancy (MMD), Kullback-Leibler divergence, and the Wasserstein distance (Long et al., 2015, 2017; Rozantsev et al., 2019; Tan et al., 2018; Nasr et al., 2019a). Parameter-based methods transfer knowledge by jointly updating a shared network to learn domain-invariant features across domains, which is mainly achieved by parameter sharing (Yosinski et al., 2014; Liu et al., 2017; Yang et al., 2017). Further works improve parameter-based methods by incorporating adversarial training to better learn domain-invariant shared representations (Liu et al., 2017).
A few studies (Sharma et al., 2019; Liu et al., 2018) analyze privacy leakage for general machine learning models or in a federated learning setting; however, there is no such privacy analysis for transfer learning models. To bridge this gap, we first provide a general categorization of deep transfer learning models based on their information leakage types and conduct a thorough privacy analysis by building specific attacks against the different transfer learning paradigms. We also examine several general defense solutions to alleviate information leakage for the three paradigms. Privacy-preserving models for transfer learning are rarely studied. Prior work (Liu et al., 2018) proposed a secure transfer learning algorithm under the mapping-based setting, where homomorphic encryption is adopted to ensure privacy at the expense of efficiency and some model utility loss due to the Taylor approximation. A follow-up work (Sharma et al., 2019) further enhanced the security and efficiency for the same problem setting by using a secret sharing technique. A recent work (Guo et al., 2018) employed privacy-preserving logistic regression with an $\epsilon$-differential privacy guarantee under a hypothesis transfer learning setting.
Membership Inference. The study (Shokri et al., 2017) developed the first membership attack against machine learning models with only black-box access, using a shadow training method. This method assumes that the data used for training the attack classifier comes from the same distribution as the original training data. A later study (Long et al., 2017) followed the idea of shadow training and explored two more targeted membership attacks, i.e., frequency-based and distance-based. The study (Salem et al., 2019) further relaxed key attack assumptions of (Shokri et al., 2017) and demonstrated more broadly applicable attacks. Aside from the black-box setting, the studies (Melis et al., 2019a; Nasr et al., 2019b) examined membership attacks against federated/collaborative learning under the white-box setting, where an adversary can access the model and potentially observe or eavesdrop on the intermediate computations at hidden layers. They share the idea of leveraging gradients or model snapshots to produce labeled examples for training a binary membership classifier. The work (Hayes et al., 2019) presented the first membership attacks, under both black-box and white-box access, on generative models, in particular generative adversarial networks (GANs).
Property Inference. Property attacks aim to infer properties that hold for the whole training data or for certain subsets of it. Prior works (Ateniese et al., 2015; Fredrikson et al., 2015; Ganju et al., 2018) studied property inference attacks that characterize the entire training dataset. A property attack was developed in (Ganju et al., 2018) based on the concept of permutation invariance for fully connected neural networks, under the assumption that the adversary has white-box knowledge. Concurrently, the study (Melis et al., 2019a) developed attacks under the collaborative learning setting, focusing on inferring properties of single batches of training inputs.
7. Conclusion
In this study, we provide a general categorization of deep transfer learning paradigms according to how the domains interact with each other. Based on that, we analyze their respective privacy leakage profiles, design specific attack models for each paradigm, and provide potential solutions to prevent these threats. Extensive experiments have been conducted to examine the potential privacy leakage and the effectiveness of the defense solutions.
References
- Abadi et al. (2016) Martín Abadi, Andy Chu, Ian J. Goodfellow, H. Brendan McMahan, Ilya Mironov, Kunal Talwar, and Li Zhang. 2016. Deep Learning with Differential Privacy. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, CCS 2016. 308–318.
- Ateniese et al. (2015) Giuseppe Ateniese, Luigi V. Mancini, Angelo Spognardi, Antonio Villani, Domenico Vitali, and Giovanni Felici. 2015. Hacking smart machines with smarter ones: How to extract meaningful data from machine learning classifiers. IJSN 10, 3 (2015), 137–150.
- Chen et al. (2019) Cen Chen, Minghui Qiu, Yinfei Yang, Jun Zhou, Jun Huang, Xiaolong Li, and Forrest Sheng Bao. 2019. Multi-Domain Gated CNN for Review Helpfulness Prediction. In WWW 2019. 2630–2636.
- Devlin et al. (2019) Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In NAACL-HLT 2019.
- Dubois et al. (2017) Sébastien Dubois, Nathanael Romano, Kenneth Jung, Nigam Shah, and David C. Kale. 2017. The Effectiveness of Transfer Learning in Electronic Health Records Data. In 5th International Conference on Learning Representations, ICLR 2017 Workshop. OpenReview.net.
- Fredrikson et al. (2015) Matt Fredrikson, Somesh Jha, and Thomas Ristenpart. 2015. Model Inversion Attacks that Exploit Confidence Information and Basic Countermeasures. In Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, CCS. 1322–1333.
- Ganju et al. (2018) Karan Ganju, Qi Wang, Wei Yang, Carl A Gunter, and Nikita Borisov. 2018. Property inference attacks on fully connected neural networks using permutation invariant representations. In Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, CCS 2018. ACM, 619–633.
- Guo et al. (2018) Xiawei Guo, Quanming Yao, Wei-Wei Tu, Yuqiang Chen, Wenyuan Dai, and Qiang Yang. 2018. Privacy-preserving Transfer Learning for Knowledge Sharing. CoRR abs/1811.09491 (2018).
- Hayes et al. (2019) Jamie Hayes, Luca Melis, George Danezis, and Emiliano De Cristofaro. 2019. LOGAN: Membership Inference Attacks Against Generative Models. PoPETs 2019, 1 (2019), 133–152.
- He et al. (2016) Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep Residual Learning for Image Recognition. In 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR. 770–778.
- Howard and Ruder (2018) Jeremy Howard and Sebastian Ruder. 2018. Universal Language Model Fine-tuning for Text Classification. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, ACL 2018. 328–339.
- Kim (2014) Yoon Kim. 2014. Convolutional Neural Networks for Sentence Classification. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, EMNLP 2014. 1746–1751.
- Kohavi (1996) Ron Kohavi. 1996. Scaling Up the Accuracy of Naive-Bayes Classifiers: A Decision-Tree Hybrid. In Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (KDD-96), Portland, Oregon, USA. 202–207.
- Li et al. (2013) Ninghui Li, Wahbeh H. Qardaji, Dong Su, Yi Wu, and Weining Yang. 2013. Membership privacy: a unifying framework for privacy definitions. In 2013 ACM SIGSAC Conference on Computer and Communications Security, CCS’13. 889–900.
- Liu et al. (2017) Pengfei Liu, Xipeng Qiu, and Xuanjing Huang. 2017. Adversarial Multi-task Learning for Text Classification. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, ACL 2017. 1–10.
- Liu et al. (2018) Yang Liu, Tianjian Chen, and Qiang Yang. 2018. Secure Federated Transfer Learning. CoRR abs/1812.03337 (2018).
- Long et al. (2015) Mingsheng Long, Yue Cao, Jianmin Wang, and Michael I. Jordan. 2015. Learning Transferable Features with Deep Adaptation Networks. In Proceedings of the 32nd International Conference on Machine Learning, ICML 2015. 97–105.
- Long et al. (2017) Mingsheng Long, Han Zhu, Jianmin Wang, and Michael I. Jordan. 2017. Deep Transfer Learning with Joint Adaptation Networks. In Proceedings of the 34th International Conference on Machine Learning, ICML 2017. 2208–2217.
- Long et al. (2017) Yunhui Long, Vincent Bindschaedler, and Carl A. Gunter. 2017. Towards Measuring Membership Privacy. CoRR abs/1712.09136 (2017).
- Long et al. (2018) Yunhui Long, Vincent Bindschaedler, Lei Wang, Diyue Bu, Xiaofeng Wang, Haixu Tang, Carl A. Gunter, and Kai Chen. 2018. Understanding Membership Inferences on Well-Generalized Learning Models. CoRR abs/1802.04889 (2018).
- McAuley and Leskovec (2013) Julian J. McAuley and Jure Leskovec. 2013. Hidden factors and hidden topics: understanding rating dimensions with review text. In Seventh ACM Conference on Recommender Systems, RecSys ’13, Hong Kong, China, October 12-16, 2013.
- Melis et al. (2019a) Luca Melis, Congzheng Song, Emiliano De Cristofaro, and Vitaly Shmatikov. 2019a. Exploiting Unintended Feature Leakage in Collaborative Learning. In 2019 IEEE Symposium on Security and Privacy, SP 2019. 691–706.
- Melis et al. (2019b) Luca Melis, Congzheng Song, Emiliano De Cristofaro, and Vitaly Shmatikov. 2019b. Exploiting Unintended Feature Leakage in Collaborative Learning. In 2019 IEEE Symposium on Security and Privacy, SP 2019. IEEE, 691–706.
- Mou et al. (2018) Wenlong Mou, Liwei Wang, Xiyu Zhai, and Kai Zheng. 2018. Generalization Bounds of SGLD for Non-convex Learning: Two Theoretical Viewpoints. In Conference On Learning Theory, COLT 2018 (Proceedings of Machine Learning Research), Vol. 75. PMLR, 605–638.
- Nasr et al. (2019a) Milad Nasr, Reza Shokri, and Amir Houmansadr. 2019a. Comprehensive Privacy Analysis of Deep Learning: Passive and Active White-box Inference Attacks against Centralized and Federated Learning. In 2019 IEEE Symposium on Security and Privacy, SP 2019. IEEE, 739–753.
- Nasr et al. (2019b) Milad Nasr, Reza Shokri, and Amir Houmansadr. 2019b. Comprehensive Privacy Analysis of Deep Learning: Passive and Active White-box Inference Attacks against Centralized and Federated Learning. In 2019 IEEE Symposium on Security and Privacy, SP 2019. 739–753.
- Pan and Yang (2010) Sinno Jialin Pan and Qiang Yang. 2010. A Survey on Transfer Learning. IEEE Trans. Knowl. Data Eng. 22, 10 (2010), 1345–1359.
- Raghu et al. (2019) Maithra Raghu, Chiyuan Zhang, Jon M. Kleinberg, and Samy Bengio. 2019. Transfusion: Understanding Transfer Learning for Medical Imaging. In Annual Conference on Neural Information Processing Systems, NeurIPS 2019. 3342–3352.
- Rozantsev et al. (2019) Artem Rozantsev, Mathieu Salzmann, and Pascal Fua. 2019. Beyond Sharing Weights for Deep Domain Adaptation. IEEE Trans. Pattern Anal. Mach. Intell. 41, 4 (2019), 801–814.
- Salem et al. (2019) Ahmed Salem, Yang Zhang, Mathias Humbert, Pascal Berrang, Mario Fritz, and Michael Backes. 2019. ML-Leaks: Model and Data Independent Membership Inference Attacks and Defenses on Machine Learning Models. In 26th Annual Network and Distributed System Security Symposium, NDSS 2019.
- Sharma et al. (2019) Shreya Sharma, Chaoping Xing, Yang Liu, and Yan Kang. 2019. Secure and Efficient Federated Transfer Learning. CoRR abs/1910.13271 (2019).
- Shokri et al. (2017) Reza Shokri, Marco Stronati, Congzheng Song, and Vitaly Shmatikov. 2017. Membership Inference Attacks Against Machine Learning Models. In 2017 IEEE Symposium on Security and Privacy, SP 2017. 3–18.
- Stamate et al. (2015) Cosmin Stamate, George D. Magoulas, and Michael S. C. Thomas. 2015. Transfer learning approach for financial applications. CoRR abs/1509.02807 (2015).
- Sun and Saenko (2016) Baochen Sun and Kate Saenko. 2016. Deep CORAL: Correlation Alignment for Deep Domain Adaptation. In Computer Vision - ECCV 2016 Workshops, Vol. 9915.
- Tan et al. (2018) Chuanqi Tan, Fuchun Sun, Tao Kong, Wenchang Zhang, Chao Yang, and Chunfang Liu. 2018. A Survey on Deep Transfer Learning. In ICANN 2018 - 27th International Conference on Artificial Neural Networks. 270–279.
- Wu et al. (2019a) Bingzhe Wu, Chaochao Chen, Shiwan Zhao, Cen Chen, Yuan Yao, Guangyu Sun, Li Wang, Xiaolu Zhang, and Jun Zhou. 2019a. Characterizing Membership Privacy in Stochastic Gradient Langevin Dynamics. AAAI (2019).
- Wu et al. (2019b) Bingzhe Wu, Shiwan Zhao, Guangyu Sun, Xiaolu Zhang, Zhong Su, Caihong Zeng, and Zhihong Liu. 2019b. P3SGD: Patient Privacy Preserving SGD for Regularizing Deep CNNs in Pathological Image Classification. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2019. 2099–2108.
- Yang et al. (2017) Zhilin Yang, Ruslan Salakhutdinov, and William W. Cohen. 2017. Transfer Learning for Sequence Tagging with Hierarchical Recurrent Networks. In 5th International Conference on Learning Representations, ICLR 2017.
- Yosinski et al. (2014) Jason Yosinski, Jeff Clune, Yoshua Bengio, and Hod Lipson. 2014. How transferable are features in deep neural networks?. In Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014 (NeurIPS). 3320–3328.
- Zhuang et al. (2019) Fuzhen Zhuang, Zhiyuan Qi, Keyu Duan, Dongbo Xi, Yongchun Zhu, Hengshu Zhu, Hui Xiong, and Qing He. 2019. A Comprehensive Survey on Transfer Learning. CoRR abs/1911.02685 (2019).