Unsupervised Non-transferable Text Classification
Abstract
Training a good deep learning model requires substantial data and computing resources, which makes the resulting neural model a valuable intellectual property. To prevent the neural network from being undesirably exploited, non-transferable learning has been proposed to reduce the model's generalization ability in specific target domains. However, existing approaches require labeled data for the target domain, which can be difficult to obtain. Furthermore, they lack a mechanism for recovering the model's ability to access the target domain. In this paper, we propose a novel unsupervised non-transferable learning method for the text classification task that does not require annotated target domain data. We further introduce a secret key component in our approach for recovering access to the target domain, where we design both an explicit and an implicit method for doing so. Extensive experiments demonstrate the effectiveness of our approach.
1 Introduction
Deep learning has achieved remarkable success over the past decade and has been widely applied in various fields, including computer vision, natural language processing (NLP), and data mining. Although neural models perform well on most tasks, they require a huge amount of data and high computational cost to train, making the trained model a valuable piece of intellectual property. As a result, it is essential to prevent neural models from being used without authorization. In the last few years, many methods have been proposed to safeguard deep neural networks, and they can be roughly divided into two types: watermarking Adi et al. (2018) and secure authorization Alam et al. (2020). In the watermarking approaches, the owners can verify the ownership of the neural model based on a unique watermark. However, due to the catastrophic forgetting problem Kemker et al. (2018), watermark-based neural models Kuribayashi et al. (2020); Song et al. (2017) are known to be vulnerable to certain malicious attacks Wang and Kerschbaum (2019), which may lead to the loss of their watermarks. On the other hand, in the secure authorization approaches, the owners of the neural network want to ensure that users can only access the model with authorization. Recently, Wang et al. (2022) proposed a new perspective with non-transferable learning (NTL) to protect the model from unauthorized use on restricted data. The method trains the model to perform well only in the authorized domain while performing badly in the unauthorized domain. However, such an approach has some limitations: 1) it relies on a significant amount of labeled data from the target domain, and such labels are usually not easy to acquire; 2) access to the unauthorized domain can no longer be regained, if required, after the model is learned.

In this work, we propose a new NTL method named Unsupervised Non-Transferable Learning (UNTL) for text classification tasks. As Figure 1 shows, our model can perform well in the source domain while performing badly in the target domain. In addition, we propose secret key modules, which can help recover the ability of the model in the target domain. Our contributions include:
• We propose a novel unsupervised non-transferable learning approach for text classification tasks. Different from existing approaches, our model can still perform well without the need for label information in the target domain.
• We introduce two different methods, namely Prompt-based Secret Key and Adapter-based Secret Key, that allow us to recover the ability of the model to perform classification on the target domain.
• Extensive experiments show that our proposed models perform well in the source domain but badly in the target domain. Moreover, access to the target domain can still be regained using the secret key.
To the best of our knowledge, our work is the first approach for learning under the unsupervised non-transferable learning setup, which also comes with the ability to recover access to the target domain.[1]

[1] Our code and data are released at https://github.com/ChaosCodes/UNTL.
2 Related Work
In this section, we briefly survey ideas that are related to our work from two fields: domain adaptation and intellectual property protection. Furthermore, we discuss some limitations in the existing methods which we will tackle with our approach.
In domain adaptation, given a source domain and a target domain with unlabeled data or only a few labeled examples, the goal is to improve performance on the target task using knowledge from the source domain. Ghifary et al. (2014), Tzeng et al. (2014), and Zhu et al. (2021) applied a Maximum Mean Discrepancy regularization method Gretton et al. (2012) to encourage domain-invariant representations across different domains. Ganin et al. (2016) and Schoenauer-Sebag et al. (2019) tried to match the feature space distributions of the two domains with adversarial learning. In contrast to the methods above, Wang et al. (2022) analyzed domain adaptation in a different way and proposed non-transferable learning (NTL) to prevent knowledge transfer from the source to the target domain by enlarging the discrepancy between the representations in different domains.
In intellectual property protection, due to the significant value of learned deep neural networks and their vulnerability to malicious attacks, it is crucial to develop protection methods that defend the owners of deep neural networks (DNNs) from loss. Recently, two different approaches to safeguarding DNNs have been proposed: watermarking Adi et al. (2018) and secure authorization Alam et al. (2020). In the watermarking approaches, researchers designed a digital watermark that can be embedded into data such as video, images, and so on. By detecting the unique watermark, we can verify ownership of the copyright of the data. Based on these ideas, Song et al. (2017) and Kuribayashi et al. (2020) embedded digital watermarks into the parameters of neural networks. Zhang et al. (2020) and Wu et al. (2021) proposed frameworks to generate images with an invisible but extractable watermark. However, these approaches are vulnerable to active attack algorithms Wang and Kerschbaum (2019); Chen et al. (2021), which first detect the watermark and then rewrite or remove it. On the other hand, the secure authorization approach seeks to train a model that generates inaccurate results without authorization. Alam et al. (2020) proposed a key-based framework that ensures correct model functioning only with the correct secret key. In addition, Wang et al. (2022), inspired by domain generalization, proposed non-transferable learning (NTL), which achieves secure authorization by reducing the model's generalization ability in the specified unauthorized domain.
Although the NTL model can effectively prevent access to the unauthorized domain, it requires target labels during training, which may not always be easy to obtain. Furthermore, there is no mechanism to recover access to the unauthorized domain when needed. In this paper, we present a new NTL model and show that our model can still have good performance even in the absence of the target labels which are, however, indispensable in the work of Wang et al. (2022). Besides, we extend it to a secret key-based version. With our method, authorized users can still access the target domain with the provided keys.
3 Approach
In this section, we first introduce our proposed Unsupervised Non-Transferable Learning (UNTL) approach in Sec. 3.1, followed by a discussion of its practical limitation: it lacks the ability to regain access to the target domain. Next, we discuss our secret key-based methods in Sec. 3.2 to address this limitation.
3.1 UNTL Text Classification
Problem Description
First of all, we present our definition of the unsupervised non-transferable learning task without labeled data from the target domain. Following Farahani et al. (2020), we consider a domain to consist of three parts: an input space $\mathcal{X}$, a label space $\mathcal{Y}$, and a joint probability distribution $P(X, Y)$. We are given a source domain $\mathcal{D}_s = \{(x_s^i, y_s^i)\}_{i=1}^{N_s}$ and a target domain $\mathcal{D}_t = \{x_t^i\}_{i=1}^{N_t}$ with unlabeled samples, where $y_s^i \in \{0, 1\}^K$ is a one-hot vector indicating the label of $x_s^i$, $K$ is the number of classes, and $N_s$ and $N_t$ refer to the number of examples in the source and target domains respectively. The goal of our UNTL method is to prevent knowledge transfer from the source to the target domain, i.e., to train the model so that it performs well on the source domain data but poorly on the target domain data, without requiring access to the label information of the target data.
Text Classification
In our work, we use a BERT-based model Devlin et al. (2019) as our feature extractor for the input sentence $x$, and take the final hidden state of the [CLS] token as the feature representation, which we denote as $\Phi(x)$. A simple feed-forward network $g$ is added on top of BERT as a classifier to predict the label. Formally, the classification loss is:

$$\mathcal{L}_{cls} = \mathbb{E}_{(x, y) \sim \mathcal{D}_s}\big[\mathrm{CE}\big(g(\Phi(x)), y\big)\big] \quad (1)$$

where $\mathcal{D}_s$ is the source domain dataset and CE indicates the cross-entropy function.
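To make this concrete, the following is a minimal PyTorch sketch of the feature extractor and classifier described above; the class name `UNTLClassifier` and the example sentence pair are our own illustrative choices rather than part of the released implementation.

```python
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizer

class UNTLClassifier(nn.Module):
    """BERT feature extractor with a one-layer classification head."""
    def __init__(self, num_classes: int = 3):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        # Single feed-forward layer on top of the [CLS] representation.
        self.classifier = nn.Linear(self.bert.config.hidden_size, num_classes)

    def features(self, input_ids, attention_mask):
        # Final hidden state at the [CLS] position serves as Phi(x).
        outputs = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        return outputs.last_hidden_state[:, 0]

    def forward(self, input_ids, attention_mask):
        return self.classifier(self.features(input_ids, attention_mask))

# Example: classification loss on a (premise, hypothesis) pair.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = UNTLClassifier()
batch = tokenizer(["A man is eating."], ["Someone is having a meal."],
                  return_tensors="pt", padding=True)
labels = torch.tensor([0])
logits = model(batch["input_ids"], batch["attention_mask"])
loss_cls = nn.functional.cross_entropy(logits, labels)
```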
Maximum Mean Discrepancy
To enlarge the distance between the representations of the source and target domains, we follow Wang et al. (2022) and use Maximum Mean Discrepancy (MMD) Gretton et al. (2012) to achieve this goal. MMD is a kernel two-sample test and can be used as a metric to determine whether two data distributions $P$ and $Q$ are similar. MMD defines the metric as follows:

$$\mathrm{MMD}(P, Q) = \big\| \mathbb{E}_{x \sim P}[\phi(x)] - \mathbb{E}_{x' \sim Q}[\phi(x')] \big\|_{\mathcal{H}} \quad (2)$$

where $\mathcal{H}$ is the reproducing kernel Hilbert space (RKHS) associated with a kernel $k$ satisfying $k(x, x') = \langle \phi(x), \phi(x') \rangle_{\mathcal{H}}$, and the function $\phi$ maps the sentence input into the RKHS. The smaller the distance $\mathrm{MMD}(P, Q)$, the more similar the two distributions $P$ and $Q$.
In our work, we use MMD to increase the distance between the feature representations of the source and the target domains, forcing the feature extractor to extract domain-dependent representations rather than maximizing inter-domain invariance. To prevent a high MMD from dominating the entire loss, we follow Wang et al. (2022) and set an upper bound for it. Based on Equation 2, our MMD loss can therefore be formulated as:

$$\mathcal{L}_{MMD} = \min\big(\mathrm{MMD}(P_s, P_t),\ U\big) \quad (3)$$

where $U$ is the upper bound for MMD, and $P_s$ and $P_t$ are the data distributions of the source and target domains respectively. With this loss, we only maximize $\mathrm{MMD}(P_s, P_t)$ when it is smaller than the upper bound $U$.
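Below is a sketch of this bounded MMD term, assuming for illustration a single Gaussian kernel with a fixed bandwidth; the kernel choice and bandwidth are assumptions, not necessarily those of our released code.

```python
import torch

def gaussian_kernel(a: torch.Tensor, b: torch.Tensor, sigma: float = 1.0):
    # k(x, x') = exp(-||x - x'||^2 / (2 sigma^2)), computed pairwise.
    dists = torch.cdist(a, b) ** 2
    return torch.exp(-dists / (2 * sigma ** 2))

def mmd(x: torch.Tensor, y: torch.Tensor, sigma: float = 1.0):
    """Biased empirical estimate of the squared MMD between two feature batches."""
    k_xx = gaussian_kernel(x, x, sigma).mean()
    k_yy = gaussian_kernel(y, y, sigma).mean()
    k_xy = gaussian_kernel(x, y, sigma).mean()
    return k_xx + k_yy - 2 * k_xy

def mmd_loss(src_feats, tgt_feats, upper_bound: float = 10.0):
    # Clip at the upper bound U so a very large MMD term cannot
    # dominate the overall objective (Equation 3).
    return torch.clamp(mmd(src_feats, tgt_feats), max=upper_bound)
```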
Domain Classifier
Despite being able to enlarge the gap between the source and target domains to some extent, the MMD loss lacks the explicit ability to clearly draw the boundary between the representations of different domains, especially when the knowledge between domains is similar. Therefore, we hypothesize that using MMD alone may not be sufficient to yield optimal empirical performance. To mitigate this issue, we draw inspiration from Domain-Adversarial Neural Networks Ganin et al. (2016) and add a domain classifier on top of the feature extractor. This classifier is trained with a cross-entropy loss to predict the domain from the feature representations. By optimizing this loss, the representations of different domains are encouraged to be more distinct. Specifically, we use 0 to indicate the source domain and 1 to indicate the target domain. We can formulate the domain classification (DC) loss as:

$$\mathcal{L}_{DC} = \mathbb{E}_{x \sim \mathcal{D}_s}\big[\mathrm{CE}\big(d(\Phi(x)), 0\big)\big] + \mathbb{E}_{x \sim \mathcal{D}_t}\big[\mathrm{CE}\big(d(\Phi(x)), 1\big)\big] \quad (4)$$

where $d$ is the domain classifier. With this DC loss as a regularization term, the boundary between the source and target feature representations becomes clearer, facilitating better non-transferable learning.
Objective Function
In this task, our goal is to train a model that performs well on the source domain while performing badly on the target domain. To achieve this goal, we propose a loss function for unsupervised non-transferable learning that contains three terms. The first term is the cross-entropy loss $\mathcal{L}_{cls}$ for text classification, which integrates knowledge about the downstream task into the model. The second term is the domain classification loss $\mathcal{L}_{DC}$ and the third is the MMD loss $\mathcal{L}_{MMD}$. The latter two terms jointly contribute to enlarging the gap between the representations of the source and target domains to prevent knowledge transfer. Finally, our overall loss can be written as:

$$\mathcal{L} = \mathcal{L}_{cls} + \beta\,\mathcal{L}_{DC} - \alpha\,\mathcal{L}_{MMD} \quad (5)$$

where $\alpha$ and $\beta$ are scaling hyperparameters.
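A single UNTL training step combining the three terms could then look like the following sketch, which reuses `mmd_loss` from the sketch above and the `UNTLClassifier` from earlier; the default α, β, and U values are placeholders rather than tuned settings.

```python
import torch
import torch.nn.functional as F

def untl_step(model, domain_clf, src_batch, src_labels, tgt_batch,
              alpha: float = 0.5, beta: float = 0.1, upper_bound: float = 10.0):
    """One UNTL training step combining the three terms of Equation 5."""
    src_feats = model.features(src_batch["input_ids"], src_batch["attention_mask"])
    tgt_feats = model.features(tgt_batch["input_ids"], tgt_batch["attention_mask"])

    # 1) Source-domain classification loss (Equation 1).
    loss_cls = F.cross_entropy(model.classifier(src_feats), src_labels)

    # 2) Domain classification loss: label 0 = source, 1 = target (Equation 4).
    domain_logits = domain_clf(torch.cat([src_feats, tgt_feats]))
    domain_labels = torch.cat([
        torch.zeros(src_feats.size(0), dtype=torch.long),
        torch.ones(tgt_feats.size(0), dtype=torch.long),
    ])
    loss_dc = F.cross_entropy(domain_logits, domain_labels)

    # 3) Bounded MMD term, to be maximized (hence the negative sign).
    loss_mmd = mmd_loss(src_feats, tgt_feats, upper_bound)

    return loss_cls + beta * loss_dc - alpha * loss_mmd
```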
Theoretical Analysis
Different from Wang et al. (2022) where they use information bottleneck theory Tishby et al. (2000) to show the feasibility of non-transferable learning, we turn to a more general theory of domain adaptation Ben-David et al. (2006); Wang (2018). Here, we present an analysis of the effectiveness of the unsupervised setting based on this theory.
Theorem 1
Ben-David et al. (2010) Let $\mathcal{H}$ be a hypothesis space (of a particular VC dimension). For any $h \in \mathcal{H}$, given a source domain $\mathcal{D}_S$ and a target domain $\mathcal{D}_T$:

$$\epsilon_T(h) \le \epsilon_S(h) + \frac{1}{2} d_{\mathcal{H}\Delta\mathcal{H}}(\mathcal{D}_S, \mathcal{D}_T) + \lambda \quad (6)$$

where $\epsilon_S(h)$ and $\epsilon_T(h)$ are the expected source and target errors respectively, $\lambda = \min_{h' \in \mathcal{H}}\big[\epsilon_S(h') + \epsilon_T(h')\big]$, which can be viewed as a constant, and $d_{\mathcal{H}\Delta\mathcal{H}}$ is a divergence[2] that measures the maximal discrepancy between two distributions under a fixed hypothesis class.

[2] $\mathcal{H}\Delta\mathcal{H}$ is the symmetric difference hypothesis space for a hypothesis space $\mathcal{H}$. See Ben-David et al. (2010) for more details.
During our training process, we minimize the source error $\epsilon_S(h)$ while maximizing the divergence (with the MMD and DC losses). Compared with a baseline transfer model trained without the MMD and DC losses, we hypothesize that our method yields a comparable source error while producing a significantly larger divergence term. We believe these changes effectively lead to a much looser upper bound for the target error in Equation 6, which may in turn lead to a significant increase in the target error and thus prevent the knowledge from being transferred into the target domain. We will verify this hypothesis in the experiments later.[3]

[3] In fact, through our experiments later, we found that on average there was a 1% increase in source error for our approach as compared to the baseline. However, there was a significant increase in the divergence term as approximated by the MMD loss, which leads to effective non-transferable learning (where we achieve good source domain performance and bad target domain performance).
3.2 Learning with Secret Keys
With our UNTL method, we can ensure that the model performs well in the source domain whilst degrading its performance in the target domain. However, it is inconvenient if the performance in the target domain can no longer be restored after training. This can be illustrated with an example: suppose that we are running an application that supports two kinds of users: regular users and members. Suppose further that the regular users are only authorized to query the model for a limited set of data, while the members have no limits on their access to the model. Using the UNTL approach discussed above, we can limit the access of the regular users by denoting the authorized and unauthorized portions of the data as the source and target domains respectively. We then train a model that performs well on the source domain but poorly on the target domain. However, as the members have no limits on their access, they would require a separate model to be trained that performs well on both domains, doubling the computational and storage costs required for the application.
To solve this issue, we extend our approach to include a secret key that can be used to recover access to the target domain even after non-transferable learning. Without the key, the model is encouraged to perform well on the source domain while its accuracy in the target domain is degraded. However, upon supplying the secret key, the model's performance on the target domain is restored. Following the example above, this allows a single neural network to be used for all users, whilst providing privileged access to the members through the provision of the secret key. Building on our UNTL, in this section we present our secret key-based unsupervised non-transferable learning method. The method not only keeps regular users away from the membership area but also provides members with a specific key that allows them to access the membership area within a single model.
Our intuition is to design a secret key that can revive the restricted target domain in our UNTL model. We call this method Secret Key-based Non-Transferable Learning, which has two variants: 1) the Prompt-based Secret Key method, where we add a discrete prompt as a prefix to the input sentence that serves as an explicit secret key to restore access to the target domain, and 2) the Adapter-based Secret Key method, where a trained adapter module is added to the model to transform the target embeddings into source-like ones in an implicit manner.

Prompt-based Secret Key
Recently, prompt-based learning Schick and Schütze (2021); Lester et al. (2021) has achieved state-of-the-art results in many tasks. Inspired by these prompt-based methods, we consider a prompt as our secret key, which users can use to access the target domain. As shown in Figure 2, we first assign a randomly chosen prompt $p = [p_1, \dots, p_m]$ as the secret key, where $p_i$ is the $i$-th token in the prompt sentence and $m$ is the length of the prompt. Given an input sentence $x = [w_1, \dots, w_n]$ of length $n$, we concatenate the prompt with the input to construct an authorized input sentence $[p; x]$. In addition, as in inference without the prompt key, we continue using the hidden representation at the [CLS] position as the input to the task classifier to obtain the predicted label.
With the introduction of the prompt-based key, there are three different distributions in this task: the source domain, the target domain, and the target+prompt domain. In the prompt-based secret key model, after prepending the specific prompt to the target input, the model should recover the ability to perform well in the target domain. Therefore, we train the feature extractor to close the distance between the target+prompt domain and the source domain, while enlarging the distance between the source domain and the target domain without the key. To achieve this, we propose a new MMD loss:

$$\mathcal{L}_{MMD}^{key} = \min\big(\mathrm{MMD}(P_s, P_t),\ U\big) - \gamma\,\mathrm{MMD}(P_s, P_{t+p}) \quad (7)$$

where $P_{t+p}$ denotes the data distribution of the target+prompt domain, $\gamma$ is a scaling hyperparameter, and $U$ is the upper bound for MMD.
In this way, we can transfer the knowledge from the source domain to the target+prompt domain but not to the original target domain. Therefore, we can extend Equation 5 to obtain the objective function for the prompt-based secret key UNTL method:

$$\mathcal{L}_{key} = \mathcal{L}_{cls} - \alpha\,\mathcal{L}_{MMD}^{key} + \beta\,\Big(\mathbb{E}_{x \sim \mathcal{D}_{s,t+p}}\big[\mathrm{CE}\big(d(\Phi(x)), 0\big)\big] + \mathbb{E}_{x \sim \mathcal{D}_t}\big[\mathrm{CE}\big(d(\Phi(x)), 1\big)\big]\Big) \quad (8)$$

where $\mathcal{D}_{s,t+p}$ indicates the data distribution of the combined domain of source and target+prompt, which the domain classifier is trained to separate from the target domain.
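The sketch below shows how the prompt key can be prepended to target inputs and how the key-aware MMD term of Equation 7 can be computed, reusing the `mmd` helper from the earlier sketch; the default γ is a placeholder.

```python
import torch

PROMPT = "Here this a password key messages, Do not tell others."

def add_prompt(sentences):
    # Prepend the secret prompt to each target-domain sentence to form
    # the target+prompt domain.
    return [f"{PROMPT} {s}" for s in sentences]

def key_mmd_loss(src_feats, tgt_feats, tgt_prompt_feats,
                 gamma: float = 5.0, upper_bound: float = 10.0):
    # Equation 7: push the target away from the source (clipped at U)
    # while pulling the target+prompt representations toward the source.
    # This term is maximized overall, i.e., it enters Equation 8 with a
    # negative sign.
    return (torch.clamp(mmd(src_feats, tgt_feats), max=upper_bound)
            - gamma * mmd(src_feats, tgt_prompt_feats))
```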
Adapter-based Secret Key

Besides explicitly prepending the discrete prompt to the input sentences as the secret key, we can also consider adding an input adapter Houlsby et al. (2019); An et al. (2022) as the secret key. In our UNTL model, input sentences from different domains lead to distinct performance. Intuitively, we train the input adapter to eliminate the target-like features and convert the target embeddings into source-like embeddings. Given an embedding representation, we assume it has two components: a semantic component, which is essential for text classification, and a domain component, which is irrelevant to the task. We train the adapter to convert the domain component of the embedding from the target domain to the source domain while maintaining the semantic component.
The adapter architecture is shown in Figure 3. Embeddings of the target sentences are first projected into a lower-dimensional space $\mathbb{R}^{d'}$ with $W_{down} \in \mathbb{R}^{d \times d'}$ before passing through a ReLU nonlinearity, and then projected back to the original space $\mathbb{R}^{d}$ with $W_{up} \in \mathbb{R}^{d' \times d}$, where $d'$ is significantly less than $d$ (in addition, there is a skip connection as shown in Figure 3). With the input adapter module, the target embeddings are transformed into source-like ones. From this adapter-based network, we obtain source-like embedding data in the target+adapter domain, whose distribution we denote as $P_{t+a}$. Similar to the prompt-based secret key method, we inject the knowledge from the source domain into the target+adapter domain by closing the distance between their representations.
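A minimal implementation of such an input adapter, matching the bottleneck structure in Table 9 (hidden size 768, bottleneck size 64) with a skip connection, could look as follows.

```python
import torch
import torch.nn as nn

class InputAdapter(nn.Module):
    """Bottleneck input adapter (cf. Figure 3 and Table 9): down-projection,
    ReLU, up-projection, plus a skip connection."""
    def __init__(self, hidden_dim: int = 768, bottleneck_dim: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_dim, bottleneck_dim)  # W_down
        self.up = nn.Linear(bottleneck_dim, hidden_dim)    # W_up

    def forward(self, embeddings: torch.Tensor) -> torch.Tensor:
        # The skip connection keeps the module close to the identity early
        # in training, which helps preserve the semantic component.
        return embeddings + self.up(torch.relu(self.down(embeddings)))
```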
However, directly using the UNTL loss above cannot ensure that the adapter maintains the task-dependent information. Therefore, we construct a dataset $\{(A(x), y) \mid (x, y) \in \mathcal{D}_s\}$ from the source domain data, where $A(x)$ indicates the representation of $x$ converted by the adapter module. We then train the model with an additional cross-entropy loss to guarantee that the embeddings converted by the adapter contain sufficient signals about the classification task:

$$\mathcal{L}_{cls}^{ada} = \mathbb{E}_{(x, y) \sim \mathcal{D}_s}\big[\mathrm{CE}\big(g(\Phi(A(x))), y\big)\big] \quad (9)$$
The overall objective function for training the adapter-based secret key model is:

$$\mathcal{L}_{ada} = \mathcal{L}_{cls} + \mathcal{L}_{cls}^{ada} - \alpha\Big(\min\big(\mathrm{MMD}(P_s, P_t),\ U\big) - \gamma\,\mathrm{MMD}(P_s, P_{t+a})\Big) + \beta\,\Big(\mathbb{E}_{x \sim \mathcal{D}_{s,t+a}}\big[\mathrm{CE}\big(d(\Phi(x)), 0\big)\big] + \mathbb{E}_{x \sim \mathcal{D}_t}\big[\mathrm{CE}\big(d(\Phi(x)), 1\big)\big]\Big) \quad (10)$$

where $\mathcal{D}_{s,t+a}$ indicates the data distribution of the combined domain of source and target+adapter.
Datasets | SL | TE | GO | TR | FI
---|---|---|---|---|---
#Train | 68,716 | 74,087 | 68,755 | 68,755 | 68,753
#Valid | 8,590 | 9,261 | 8,595 | 8,595 | 8,595
#Test | 1,955 | 1,966 | 1,945 | 1,976 | 1,973
4 Experiments
In this section, we first conduct experiments to verify the effectiveness of our UNTL method, and then show that our proposed secret key-based UNTL approach can recover access to the unauthorized target domain when the secret key is supplied in the form of a discrete prompt or an additional adapter module. We use MultiNLI Williams et al. (2018) as our benchmark dataset, a 3-class classification task with balanced labels. We begin with the training details of all experiments and then discuss the results in different settings.
4.1 Experimental Setup
Our models are implemented in PyTorch Paszke et al. (2019), all experiments are conducted on NVIDIA Quadro RTX 8000 GPUs, and we run each experiment three times with different seeds. We use the preprocessed MultiNLI dataset from Huggingface Lhoest et al. (2021). Based on the genre information, we divide the MultiNLI dataset into 5 parts, namely slate (SL), telephone (TE), government (GO), travel (TR), and fiction (FI), as different domains. As Huggingface only provides a training set and a validation set for MultiNLI, we split the training data 8:1 into a training set and an evaluation set, and use the validation set as the test set in our experiments. The dataset statistics can be found in Table 1. As for the model architecture, we use the pretrained language model BERT Devlin et al. (2019) as our feature extractor and a randomly initialized one-layer feed-forward network as the classifier. We use the Adam Kingma and Ba (2015) optimizer with $\beta_1 = 0.9$, $\beta_2 = 0.999$. More details can be found in Appendix A.
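For illustration, the genre-based domain split can be built with the Huggingface datasets library roughly as follows; the genre pair and seed here are examples only.

```python
from datasets import load_dataset

# Load MultiNLI and carve out genre-specific domains.
mnli = load_dataset("multi_nli")

def genre_subset(split, genre):
    return split.filter(lambda ex: ex["genre"] == genre)

# e.g. travel as the source domain and government as the target domain.
source = genre_subset(mnli["train"], "travel")
target = genre_subset(mnli["train"], "government")

# Huggingface ships train/validation splits only, so we split the
# training data 8:1 into train/eval and test on the matched validation set.
splits = source.train_test_split(test_size=1 / 9, seed=20)
train_set, eval_set = splits["train"], splits["test"]
test_set = genre_subset(mnli["validation_matched"], "travel")
```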
Src\Tgt | SL | TE | GO | TR | FI |
---|---|---|---|---|---|
SL | |||||
TE | |||||
GO | |||||
TR | |||||
FI |
Src\Tgt | SL | TE | GO | TR | FI |
---|---|---|---|---|---|
SL | |||||
TE | |||||
GO | |||||
TR | |||||
FI |
Src\Tgt | SL | TE | GO | TR | FI |
---|---|---|---|---|---|
SL | |||||
TE | |||||
GO | |||||
TR | |||||
FI |
Source\Target | SL | TE | GO | TR | FI |
---|---|---|---|---|---|
SL | |||||
TE | |||||
GO | |||||
TR | |||||
FI |
Source\Target | SL | TE | GO | TR | FI |
---|---|---|---|---|---|
SL | |||||
TE | |||||
GO | |||||
TR | |||||
FI |
4.2 Results for UNTL
We first train a supervised classification model only on the source domain as a baseline; the results are shown in Table 2. From the baseline results, we can observe that knowledge of one domain can be easily transferred to others: although only trained on the source domain, the neural network shows considerable performance on the unseen domains. In our UNTL experiments, we traverse all possible domain pairs, and Table 3 shows that the method successfully degrades the performance in the target domain to between 32.6% and 40.0%, which is near random choice (33.3%) in this 3-class classification task.
We can observe that the largest performance degradation is from 80.4% to 34.6%, for the Travel-Government source-target pair. In addition, although the target accuracy decreases considerably, the model still maintains good performance in the source domain: the maximal average drop in the source domain is only 1%. The results in Table 3 suggest that our UNTL model can successfully reduce the target performance whilst maintaining decent accuracy in the source domain, even when the source and target domains are similar and without the target labels.
Comparison with original NTL
We also compare our method with the original NTL Wang et al. (2022) to show that labels in the target domain are not really necessary. Table 4 shows the performance when the original NTL is applied. Comparing with Table 3, our UNTL model performs similarly to the NTL method in the source domain. Although NTL degrades the target-domain performance slightly more than UNTL, both methods successfully reduce the accuracy on the target domain to close to random chance, and the difference is negligible. Therefore, we show empirically that labels in the target domain are not strictly necessary, as our UNTL model can still prevent knowledge transfer from the source to the target domain even without them.
4.3 Results for UNTL with secret keys
Prompt-based Secret Key
We continue to use all possible domain pairs in our experiments, and assign the non-task-dependent sentence 'Here this a password key messages, Do not tell others.'[4] as the prompt-based key. From Table 5, we can see that the performance in the target domain ranges from 32.9% to 41.5%. Moreover, with the specific prompt, we can successfully access the target domain and obtain better performance. We further make a comparison with the baseline in Table 2, where non-transferable learning is not used. Though the prompt-based secret key can recover the model's ability, the average accuracy in the target domain is 7% worse than the baseline.

[4] Note that this sentence that serves as a secret key is intentionally ungrammatical.
Adapter-based Secret Key
In this experiment, we apply the input adapter after the embedding layer as the secret key and train our unsupervised non-transferable learning model; the results are shown in Table 6. Under the adapter setting, the performance in the target domain is similarly degraded, to between 33.2% and 42.7%, as with the prompt-based secret key. Moreover, the adapter is able to restore the degraded performance in the target domain to be on par with the baseline performance in Table 2. With the additional input adapter, our method recovers the model capability in the target domain better than the prompt-based method. We hypothesize that the reason could be that the model may still struggle to distinguish the target domain from the target+prompt domain, whose instances are constructed by prepending a discrete prompt to the input sentences; their representations are hard to separate using the MMD loss and DC loss. In contrast, the adapter module transforms the input sentences in the continuous space and can also be jointly trained to construct a target+adapter domain that is distinct from the target domain.
Overall, the results demonstrate that we can not only train a model that has good accuracy in the source domain while performing poorly in the target domain, but also restore the performance in the target domain with an adapter.
Discussion
Here we provide a comparison between the two types of secret keys, starting with the trade-offs between storage and performance. While the adapter-based secret key has higher storage requirements, needing an additional 99K parameters for the input adapter module, this accounts for only about 0.09% of BERT-base (109M parameters). In exchange, the adapter-based secret key outperforms the prompt-based secret key by around 7% in recovering the performance in the target domain.
We also compare the performance of the model with and without the secret keys. For simplicity, we refer to our earlier example and call the users with and without keys members and regular users respectively, where the members have access to the target domain while the regular users do not. In the target domain, members obtain good results, but regular users do not. In the source domain, where all users have access, applying the prompt-based secret key causes the accuracy for members to decrease by 5%, which is undesirable. In contrast, the adapter-based secret key does not cause such issues, and the accuracy for members is almost the same (a 0.2% improvement) as for the regular users.
4.4 Ablation Study
Model | Source | Target | Δ (Src−Tgt)
---|---|---|---
UNTL | 77.4 | 35.3 | 42.1
w/o $\mathcal{L}_{DC}$ | 74.0 | 35.5 | 38.5
w/o $\mathcal{L}_{MMD}$ | 76.6 | 43.1 | 33.5
Model | Source | Target | Target+Key | Δ (Key−Tgt)
---|---|---|---|---
PSK | 77.5 | 36.0 | 69.7 | 33.7
PSK w/o $\mathcal{L}_{DC}$ | 76.9 | 39.5 | 69.5 | 30.0
PSK w/o $\mathcal{L}_{MMD}$ | 70.3 | 41.6 | 40.8 | -0.8
ASK | 77.7 | 36.1 | 74.4 | 38.3
ASK w/o $\mathcal{L}_{DC}$ | 70.7 | 64.4 | 66.2 | 1.8
ASK w/o $\mathcal{L}_{MMD}$ | 73.4 | 46.4 | 68.6 | 22.2
In this section, we investigate the impact of the two UNTL losses, MMD and DC, which maximize the discrepancy between the source and target representations from different aspects. The MMD loss tries to enlarge the average distance between the two domains, but the boundary may remain unclear; the DC loss makes up for this shortcoming of the MMD loss. As Table 7 shows, in UNTL the difference between the performance in the source and target domains decreases when we remove either the MMD loss or the DC loss. Based on this result, we use t-SNE van der Maaten and Hinton (2008) to visualize the distributions of the BERT output representations of the source and target domains. As Figure 4 shows, when training with only the DC loss or only the MMD loss, the two distributions are close and the boundary can be unclear. Only when we apply both losses is UNTL able to learn different distributions for the source and target domains.

Furthermore, as Table 8 shows, in the secret key-based methods, the initial representations of the target+key (prompt/adapter) and target domains are similar, so without both losses to enlarge the distance, the model fails to perform well with the key (and badly without the key) in the target domain. We also found that the prompt-based secret key method may rely more on the MMD loss, while the adapter-based secret key method tends to depend more on the DC loss. We speculate that the cause may be the following: 1) in the prompt-based secret key method, the domain classifier can easily differentiate between the domains based on the prompt pattern, but without the MMD loss the representations remain close in the continuous space; whereas 2) in the adapter-based secret key method, the initial output embeddings of the adapter module are the same as the input ones, so the representations of the target+adapter domain and the target domain are initially highly similar, resulting in a small MMD loss. Thus, when only the MMD loss is used, the adapter may be stuck in its initial state, and it is difficult to make progress on separating the two domains during fine-tuning. The DC loss, on the other hand, offers stronger supervision than the MMD loss in separating these two domains, and could therefore play a more significant role in the adapter-based secret key method.
5 Conclusion and Future Work
In this paper, we present our UNTL method, which trains a model to maintain good performance in the source domain whilst having degraded performance in the target domain. Thereafter, we extend our approach with a secret key component that allows the restoration of the model performance in the target domain when the key is supplied, through two methods: the prompt-based secret key and the adapter-based secret key. The experiments conducted on the MultiNLI dataset suggest that our unsupervised non-transferable learning method allows the model to perform differently in the source and target domains. The extensive experiments also demonstrate that our methods can effectively recover the ability of the model on the target domain with the specific secret key after non-transferable learning.
For future work, we plan to extend our methods to incorporate multiple secret keys to support more than two user access levels via parameter-efficient methods He et al. (2022): we can first train our UNTL model, then freeze the parameters of the pretrained UNTL model and train additional modules, such as prefixes Li and Liang (2021) and adapters Houlsby et al. (2019), to realize different user levels. We also plan to explore other ways to degrade the performance in specific domains while maintaining good performance in other domains.
Limitations
In unsupervised non-transferable learning methods, after fine-tuning, the model tends to predict the same label for any input in the target domain. In other words, when the model recognizes an input coming from the target domain, it tends to consistently assign a particular label. We would like to highlight that, while our method is effective in the sense that it prevents the model from functioning well on the target domain, there is no guarantee that it would always yield “worse performance” as measured by accuracy as an evaluation metric. Consider an extreme scenario where the labels in the target domain are highly unbalanced – the domain consists of instances labeled with a particular label only. At the same time, our model happens to predict that particular label for any input from the target domain. In that case, the model may seemingly perform very well with a “high accuracy”. To resolve this known limitation, a different evaluation metric may be needed in order to properly assess the true performance of our model in target domains with unbalanced labels.
Ethics Statement
Our work focuses on unsupervised non-transferable learning in order to protect neural networks as intellectual property, whilst making secure authorization more flexible with the secret key methods. Nevertheless, we would like to point out that a malicious third-party neural network provider may utilize these methods for harmful purposes. For example, the provider could use unsupervised non-transferable learning and secret key methods to insert an invisible backdoor into the model and extract private information from it.
Acknowledgements
We would like to thank the anonymous reviewers, our meta-reviewer, and senior area chairs for their constructive comments and support on this work. We would also like to thank Vanessa Tan for her help. This research/project is supported by the National Research Foundation, Singapore under its AI Singapore Programme (AISG Award No: AISG-PhD/2021-08-007[T]).
References
- Adi et al. (2018) Yossi Adi, Carsten Baum, Moustapha Cissé, Benny Pinkas, and Joseph Keshet. 2018. Turning your weakness into a strength: Watermarking deep neural networks by backdooring. In Proceedings of 27th USENIX Security Symposium, pages 1615–1631, Baltimore, MD. USENIX Association.
- Alam et al. (2020) Manaar Alam, Sayandeep Saha, Debdeep Mukhopadhyay, and Sandip Kundu. 2020. Deep-lock: Secure authorization for deep neural networks. CoRR, abs/2008.05966.
- An et al. (2022) Shengnan An, Yifei Li, Zeqi Lin, Qian Liu, Bei Chen, Qiang Fu, Weizhu Chen, Nanning Zheng, and Jian-Guang Lou. 2022. Input-tuning: Adapting unfamiliar inputs to frozen pretrained models. CoRR, abs/2203.03131.
- Ben-David et al. (2010) Shai Ben-David, John Blitzer, Koby Crammer, Alex Kulesza, Fernando Pereira, and Jennifer Wortman Vaughan. 2010. A theory of learning from different domains. Mach. Learn., 79(1-2):151–175.
- Ben-David et al. (2006) Shai Ben-David, John Blitzer, Koby Crammer, and Fernando Pereira. 2006. Analysis of representations for domain adaptation. In Proceedings of NeurIPS.
- Blitzer et al. (2007) John Blitzer, Mark Dredze, and Fernando Pereira. 2007. Biographies, bollywood, boom-boxes and blenders: Domain adaptation for sentiment classification. In Proceedings of ACL.
- Chen et al. (2021) Xinyun Chen, Wenxiao Wang, Chris Bender, Yiming Ding, Ruoxi Jia, Bo Li, and Dawn Song. 2021. REFIT: A unified watermark removal framework for deep learning systems with limited data. In ASIA CCS ’21: ACM Asia Conference on Computer and Communications Security, Virtual Event, Hong Kong, June 7-11, 2021, pages 321–335. ACM.
- Devlin et al. (2019) Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: pre-training of deep bidirectional transformers for language understanding. In Proceedings of NAACL.
- Farahani et al. (2020) Abolfazl Farahani, Sahar Voghoei, Khaled Rasheed, and Hamid R. Arabnia. 2020. A brief review of domain adaptation. CoRR, abs/2010.03978.
- Ganin et al. (2016) Yaroslav Ganin, Evgeniya Ustinova, Hana Ajakan, Pascal Germain, Hugo Larochelle, François Laviolette, Mario Marchand, and Victor S. Lempitsky. 2016. Domain-adversarial training of neural networks. J. Mach. Learn. Res., 17:59:1–59:35.
- Ghifary et al. (2014) Muhammad Ghifary, W. Bastiaan Kleijn, and Mengjie Zhang. 2014. Domain adaptive neural networks for object recognition. In PRICAI 2014: Trends in Artificial Intelligence - 13th Pacific Rim International Conference on Artificial Intelligence, Gold Coast, QLD, Australia, December 1-5, 2014. Proceedings, volume 8862 of Lecture Notes in Computer Science, pages 898–904. Springer.
- Gretton et al. (2012) Arthur Gretton, Karsten M. Borgwardt, Malte J. Rasch, Bernhard Schölkopf, and Alexander J. Smola. 2012. A kernel two-sample test. J. Mach. Learn. Res., 13:723–773.
- He et al. (2022) Junxian He, Chunting Zhou, Xuezhe Ma, Taylor Berg-Kirkpatrick, and Graham Neubig. 2022. Towards a unified view of parameter-efficient transfer learning. In Proceedings of ICLR.
- He et al. (2018) Ruidan He, Wee Sun Lee, Hwee Tou Ng, and Daniel Dahlmeier. 2018. Adaptive semi-supervised learning for cross-domain sentiment classification. In Proceedings of EMNLP.
- Houlsby et al. (2019) Neil Houlsby, Andrei Giurgiu, Stanislaw Jastrzebski, Bruna Morrone, Quentin de Laroussilhe, Andrea Gesmundo, Mona Attariyan, and Sylvain Gelly. 2019. Parameter-efficient transfer learning for NLP. In Proceedings of ICML.
- Kemker et al. (2018) Ronald Kemker, Marc McClure, Angelina Abitino, Tyler L. Hayes, and Christopher Kanan. 2018. Measuring catastrophic forgetting in neural networks. In Proceedings of AAAI.
- Kingma and Ba (2015) Diederik P. Kingma and Jimmy Ba. 2015. Adam: A method for stochastic optimization. In Proceedings of ICLR.
- Kuribayashi et al. (2020) Minoru Kuribayashi, Takuro Tanaka, and Nobuo Funabiki. 2020. Deepwatermark: Embedding watermark into DNN model. In Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2020, Auckland, New Zealand, December 7-10, 2020, pages 1340–1346. IEEE.
- Lester et al. (2021) Brian Lester, Rami Al-Rfou, and Noah Constant. 2021. The power of scale for parameter-efficient prompt tuning. In Proceedings of EMNLP, pages 3045–3059.
- Lhoest et al. (2021) Quentin Lhoest, Albert Villanova del Moral, Yacine Jernite, Abhishek Thakur, Patrick von Platen, Suraj Patil, Julien Chaumond, Mariama Drame, Julien Plu, Lewis Tunstall, Joe Davison, Mario Sasko, Gunjan Chhablani, Bhavitvya Malik, Simon Brandeis, Teven Le Scao, Victor Sanh, Canwen Xu, Nicolas Patry, Angelina McMillan-Major, Philipp Schmid, Sylvain Gugger, Clément Delangue, Théo Matussière, Lysandre Debut, Stas Bekman, Pierric Cistac, Thibault Goehringer, Victor Mustar, François Lagunas, Alexander M. Rush, and Thomas Wolf. 2021. Datasets: A community library for natural language processing. In Proceedings of EMNLP.
- Li and Liang (2021) Xiang Lisa Li and Percy Liang. 2021. Prefix-tuning: Optimizing continuous prompts for generation. In Proceedings of ACL.
- Paszke et al. (2019) Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, Alban Desmaison, Andreas Köpf, Edward Z. Yang, Zachary DeVito, Martin Raison, Alykhan Tejani, Sasank Chilamkurthy, Benoit Steiner, Lu Fang, Junjie Bai, and Soumith Chintala. 2019. Pytorch: An imperative style, high-performance deep learning library. In Proceedings of NeurIPS.
- Schick and Schütze (2021) Timo Schick and Hinrich Schütze. 2021. Exploiting cloze-questions for few-shot text classification and natural language inference. In Proceedings of EACL.
- Schoenauer-Sebag et al. (2019) Alice Schoenauer-Sebag, Louise Heinrich, Marc Schoenauer, Michele Sebag, Lani Wu, and Steve Altschuler. 2019. Multi-domain adversarial learning. In Proceedings of ICLR.
- Song et al. (2017) Congzheng Song, Thomas Ristenpart, and Vitaly Shmatikov. 2017. Machine learning models that remember too much. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, CCS 2017, Dallas, TX, USA, October 30 - November 03, 2017, pages 587–601. ACM.
- Tishby et al. (2000) Naftali Tishby, Fernando C. N. Pereira, and William Bialek. 2000. The information bottleneck method. CoRR, physics/0004057.
- Tzeng et al. (2014) Eric Tzeng, Judy Hoffman, Ning Zhang, Kate Saenko, and Trevor Darrell. 2014. Deep domain confusion: Maximizing for domain invariance. CoRR, abs/1412.3474.
- van der Maaten and Hinton (2008) Laurens van der Maaten and Geoffrey Hinton. 2008. Visualizing data using t-sne. Journal of Machine Learning Research, 9(86):2579–2605.
- Wang et al. (2022) Lixu Wang, Shichao Xu, Ruiqi Xu, Xiao Wang, and Qi Zhu. 2022. Non-transferable learning: A new approach for model ownership verification and applicability authorization. In Proceedings of ICLR.
- Wang and Kerschbaum (2019) Tianhao Wang and Florian Kerschbaum. 2019. Attacks on digital watermarks for deep neural networks. In IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2019, Brighton, United Kingdom, May 12-17, 2019, pages 2622–2626. IEEE.
- Wang (2018) Zirui Wang. 2018. Theoretical guarantees of transfer learning. CoRR, abs/1810.05986.
- Williams et al. (2018) Adina Williams, Nikita Nangia, and Samuel R. Bowman. 2018. A broad-coverage challenge corpus for sentence understanding through inference. In Proceedings of NAACL.
- Wu et al. (2021) Hanzhou Wu, Gen Liu, Yuwei Yao, and Xinpeng Zhang. 2021. Watermarking neural networks with watermarked images. IEEE Trans. Circuits Syst. Video Technol., 31(7):2591–2601.
- Zhang et al. (2020) Jie Zhang, Dongdong Chen, Jing Liao, Han Fang, Weiming Zhang, Wenbo Zhou, Hao Cui, and Nenghai Yu. 2020. Model watermarking for image processing networks. In Proceedings of AAAI.
- Zhu et al. (2021) Yongchun Zhu, Fuzhen Zhuang, Jindong Wang, Guolin Ke, Jingwu Chen, Jiang Bian, Hui Xiong, and Qing He. 2021. Deep subdomain adaptation network for image classification. IEEE Trans. Neural Networks Learn. Syst., 32(4):1713–1722.
Appendix A Implementation Details
A.1 Network Architecture
We use the bert-base-uncased model from Huggingface as our feature extractor. The architectures of the text classifier, domain classifier, and adapter module are shown in Table 9.
Module | Architecture
---|---
Text Classifier | Linear(768, 3)
Domain Classifier | Linear(768, 2)
Adapter | Linear(768, 64) → ReLU() → Linear(64, 768)
A.2 Hyperparameters
The learning rates used in our experiments are shown in Table 10. We use three different seeds: 20, 2022, and 2222. The batch size is 256. However, since GPU memory is limited, we use gradient accumulation in PyTorch to split a large batch of samples into several smaller batches. In our UNTL experiments, the gradient accumulation step is 2 and the micro-batch size is 128, so that the accumulated batch size is $128 \times 2 = 256$. We set the number of evaluation steps to 40. We use 5 epochs in the baseline experiments and 8 epochs in our UNTL experiments.
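For reference, gradient accumulation amounts to scaling each micro-batch loss and deferring the optimizer step, as in this sketch; the loss function passed in stands for any of the objectives above.

```python
def train_with_accumulation(model, loader, optimizer, compute_loss,
                            accumulation_steps: int = 2):
    # Two micro-batches of 128 give an effective batch size of 256.
    optimizer.zero_grad()
    for step, batch in enumerate(loader):
        loss = compute_loss(model, batch)        # any UNTL objective above
        (loss / accumulation_steps).backward()   # scale so gradients average
        if (step + 1) % accumulation_steps == 0:
            optimizer.step()
            optimizer.zero_grad()
```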
Model | BERT | Text Cls | Domain Cls | Adapter
---|---|---|---|---
Baseline | 5e-5 | 1e-3 | - | -
UNTL | 5e-5 | 15e-4 | 1e-3 | -
UNTL w/ prompt | 5e-5 | 2e-3 | 1e-3 | -
UNTL w/ adapter | 5e-5 | 2e-3 | 1e-3 | 1e-3
Model | γ | α | β | U | δ
---|---|---|---|---|---
UNTL | - | 0.5 | 0.1 | 10.0 | 1.0
UNTL w/ prompt | 5.0 | 2.0 | 0.1 | 10.0 | 4.0
UNTL w/ adapter | 10.0 | 1.5 | 0.1 | 10.0 | 2.0
As for the hyperparameters, in our implementation we apply an additional scaling factor δ to the cross-entropy loss for text classification in order to balance it against the distance losses (especially in the secret key-based methods). The hyperparameter values are shown in Table 11, where γ, α, β, and U follow the notation in Equations 5, 7, and 8.
A.3 Metric
As the unsupervised non-transferable learning task aims to train a model that performs well in the source domain while performing badly in the target domain, we are mainly concerned with the difference between the performance in the source and target domains. Therefore, we define the following metric:

$$\Delta = \mathrm{Acc}_s - \mathrm{Acc}_t \quad (11)$$

where $\mathrm{Acc}_s$ and $\mathrm{Acc}_t$ denote the accuracy in the source and the target domains respectively.
As for secret key-based UNTL, we aim to improve the performance in both the target+key (prompt/adapter) domain and the source domain while degrading the target domain performance. Therefore, we add the difference between the performance in the key domain and the target domain to construct the metric:

$$\Delta_{key} = (\mathrm{Acc}_s - \mathrm{Acc}_t) + (\mathrm{Acc}_{key} - \mathrm{Acc}_t) \quad (12)$$

where $\mathrm{Acc}_{key}$ denotes the accuracy in the key domain. With these metrics, we select the best checkpoint based on the score over the development set.
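In code, these two selection scores are straightforward; a minimal sketch:

```python
def untl_score(acc_src: float, acc_tgt: float) -> float:
    # Equation 11: reward high source accuracy and low target accuracy.
    return acc_src - acc_tgt

def key_score(acc_src: float, acc_tgt: float, acc_key: float) -> float:
    # Equation 12: additionally reward recovery in the key domain.
    return (acc_src - acc_tgt) + (acc_key - acc_tgt)
```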
Datasets | Book | DvD | Elec | Kitchen
---|---|---|---|---
#Train | 1,600 | 1,600 | 1,600 | 1,600
#Valid | 200 | 200 | 200 | 200
#Test | 200 | 200 | 200 | 200
Datasets | Beauty | Book | Elec | Music
---|---|---|---|---
#Train | 4,800 | 4,800 | 4,800 | 4,800
#Valid | 600 | 600 | 600 | 600
#Test | 600 | 600 | 600 | 600
Appendix B Experiments on Additional Datasets
B.1 Datasets
In this part, we present our experimental results on two additional sentiment analysis datasets from Blitzer et al. (2007) (binary classification) and He et al. (2018) (ternary classification). We denote them as the polarity Amazon dataset and the ternary Amazon dataset respectively. Tables 12 and 13 show their statistics.
Src\Tgt | Book | DvD | Elec | Kitchen |
---|---|---|---|---|
Book | ||||
DvD | ||||
Elec | ||||
Kitchen |
Src\Tgt | Book | DvD | Elec | Kitchen |
---|---|---|---|---|
Book | ||||
DvD | ||||
Elec | ||||
Kitchen |
Src\Tgt | Beauty | Book | Elec | Music |
---|---|---|---|---|
Beauty | ||||
Book | ||||
Elec | ||||
Music |
Src\Tgt | Beauty | Book | Elec | Music |
---|---|---|---|---|
Beauty | ||||
Book | ||||
Elec | ||||
Music |
Source\Target | Book | DvD | Elec | Kitchen |
---|---|---|---|---|
Book | ||||
DvD | ||||
Elec | ||||
Kitchen |
Source\Target | Beauty | Book | Elec | Music |
---|---|---|---|---|
Beauty | ||||
Book | ||||
Elec | ||||
Music |
Source\Target | Book | DvD | Elec | Kitchen |
---|---|---|---|---|
Book | ||||
DvD | ||||
Elec | ||||
Kitchen |
Source\Target | Beauty | Book | Elec | Music |
---|---|---|---|---|
Beauty | ||||
Book | ||||
Elec | ||||
Music |
B.2 Performance for UNTL
Tables 14 and 15 show the baseline results and the UNTL results on the polarity Amazon dataset. Similarly, Tables 16 and 17 show the results on the ternary Amazon dataset. From these results, we find that our UNTL method maintains performance similar to the baseline in the source domain while degrading the performance in the target domain to nearly random choice (33.3% in the ternary classification tasks and 50.0% in the polarity classification tasks).
B.3 Performance for UNTL with Secret Keys
Tables 18 and 19 show the performance of UNTL with the Prompt-based Secret Key on the polarity Amazon dataset and the ternary Amazon dataset respectively. As for the Adapter-based Secret Key, Tables 20 and 21 show its performance on these two datasets. After applying the secret keys, the performance in the target domain can be restored. Comparing the two kinds of secret keys, we find that the adapter-based models obtain better results in the target domain than the prompt-based models.