
Privacy-Preserving Language Model Inference with Instance Obfuscation

Yixiang Yao1, Fei Wang1, Srivatsan Ravi1, Muhao Chen2
1University of Southern California
2University of California, Davis
1{yixiangy,fwang598,srivatsr}@usc.edu
2[email protected]
Abstract

Language Models as a Service (LMaaS) offers convenient access for developers and researchers to perform inference using pre-trained language models. Nonetheless, the input data and the inference results, which may contain private information, are exposed as plaintext during the service call, leading to privacy issues. Recent studies have started tackling the privacy issue by transforming input data into privacy-preserving representations on the user end with techniques such as noise addition and content perturbation, whereas the protection of inference results, namely decision privacy, remains unexplored. Protecting data privacy, especially decisions, while maintaining the black-box nature of LMaaS is challenging because the process must be seamless to the model and must incur limited communication and computation overhead. We thus propose Instance-Obfuscated Inference (IoI), a method that addresses the decision privacy issue of natural language understanding tasks throughout their complete life cycle. In addition, we conduct comprehensive experiments to evaluate the performance as well as the privacy-protection strength of the proposed method on various benchmark tasks.

1 Introduction

Language Models as a Service (LMaaS; Sun et al. 2022; Brown et al. 2020) empowers researchers and developers to access pre-trained language models (PLMs) through cloud services without worrying about the complexities of model training, deployment, and infrastructure management. To interact with LMaaS, users usually send API requests to the endpoints designated by the service providers and receive responses generated by the remote language models. Such a setup benefits both parties: on the one hand, users can jump-start integrating powerful PLMs into their data processing tasks; on the other hand, the underlying models and processing pipelines, as intellectual property, are hidden from end users so that the service providers can protect them from leakage. However, given the lack of user control over the black-box cloud service, the data in the requests can be used illegitimately by the service providers or potential attackers, causing privacy issues including data leakage, unauthorized data access, profiling, and tracking Sen (2015); Tang et al. (2016).

Figure 1: A privacy adversary example with state-of-the-art privacy protection in LMaaS. Despite encoding the end user’s input into privacy-preserving representations, the raw output representations or decisions are still in plaintext, making them vulnerable to attacks from both network channels and servers.

Recent literature has started to address the privacy issues of user inputs in LMaaS, with solutions typically based on techniques that privatize the input representation into an intermediate one. Methods of this kind include noise injection Plant et al. (2021), differential privacy (DP) Hoory et al. (2021); Yue et al. (2021); Xu et al. (2020), and adversarial training Li et al. (2018); Coavoux et al. (2018). Moreover, the intermediate representations are further fused or manipulated to prevent reverse engineering while retaining sufficient information for effective model inference Zhou et al. (2022). Unfortunately, to the best of our knowledge, none of the existing methods takes decision privacy into consideration; that is, the inference results are left unprotected and could implicitly or explicitly reveal users' sensitive information depending on the task Shejwalkar et al. (2021); Kahla et al. (2022). For example, as shown in Figure 1, a PLM employed by an online disease diagnosis service can analyze and determine the type of disease based on the symptom descriptions from the patients. Even though privacy-preserving representations as input can to some extent protect the patients' submitted content, sensitive information such as the distribution of diseases Mao et al. (2011) is still disclosed, through the aggregation of outputs, to malicious cloud service providers or to hackers via network sniffing. (The security of the network channel is outside the scope of this paper; we assume it is already end-to-end encrypted.)

Considering the significance and necessity of decision privacy protection, we propose to investigate a method that protects both the raw input content and the raw output representation. However, protection in the decision phase can be more challenging than its counterpart in the input phase for several reasons. First, unlike user inputs, the final decision is made by the PLM on the cloud, so users have no direct means to intervene in it. Second, due to the required anonymity, the incurred communication costs inevitably increase. Third, from the perspective of intellectual property protection, it is not practical for LMaaS providers to disclose the parameters and architectures of their models, including the last few layers closest to the decisions, to users. These challenges call for a solution that effectively protects the models' decisions before they are produced, while not violating the black-box nature of LMaaS.

In this paper, we propose IoI (Instance-Obfuscated Inference), which aims to protect the privacy of PLM decisions while remaining compatible with state-of-the-art input privacy protection approaches at the inference phase. During inference, IoI intentionally obfuscates the instance so that the raw decision distribution does not reveal any sensitive information. However, the user who applies the obfuscation retains the ability to recover the true decision distribution. Note that, as a pilot study, IoI focuses on text classification tasks.

To avoid ambiguity among different privacy techniques, we summarize them in Figure 2 according to their application scenarios. Specifically, SOTA methods utilize DP for training-time data privacy, and other aforementioned noise addition or perturbation methods safeguard the raw input from being reverse-engineered. In contrast, IoI ensures the confidentiality of the decisions resulting from model inference.

The contributions of this work are threefold. First, to the best of our knowledge, this is the first approach to explore the feasibility of protecting PLM decisions in a black-box manner. Second, we define decision privacy, and comprehensively study instance obfuscation strategies and privacy-preserving decision resolution in this context. Third, we define evaluation metrics for decision privacy, and empirically verify the performance and privacy strength of the proposed method.

2 Privacy-Preserving Inference

Figure 2: Privacy-preserving scenario comparison. (a) Training Privacy aims to protect the private training data. A typical privacy tool for this scenario is differential privacy. Inference Privacy includes (b) Input Privacy that prevents the raw input data from being revealed; and (c) Decision Privacy that protects the inference results. The vectors in orange are privacy-preserving, while the ones in gray are not.

For a text classification task $M:\mathcal{X}\rightarrow\mathcal{Y}$, where $\mathcal{X}$ is the input text and $\mathcal{Y}$ is the label set, privacy-preserving inference takes a step further by avoiding the exposure of any private information about the inputs and model decisions to the service provider. While encoding methods for protecting the privacy of $\mathcal{X}$ have started to emerge (the encoding mentioned in this paper is not that of the PLM's encoder but is meant in the sense of “encryption”), the counterpart for $\mathcal{Y}$, which we call decision privacy, remains uncharted.

The intuition for achieving decision privacy is to make the model's raw decision appear as random as possible to all parties except the input instance owner, who alone can recover it via a certain resolution method. In the rest of this section, we formally define decision privacy in the context of text classification, as well as privacy-preserving inference.

2.1 Decision Privacy

For text classification, suppose $(\bm{x}, y)$ is an instance of $(\mathcal{X}, \mathcal{Y})$, and a finite label set $C = \{c_{i} \mid 1 \leq i \leq n, n \geq 2\}$ is the range of $\mathcal{Y}$. We say $M$'s output has perfect privacy if

Pr[M(\bm{x}) = c_{i}] \approx \frac{1}{n},   (1)

that is, the probability of an adversary acquiring the predicted label $c_{i}$ from $M$ for the given input $\bm{x}$ is almost no better than a random guess.

However, directly adhering to Equation 1 leads to compromised functionality of $M$, since $M$ is then essentially a random choice function and useless in practice. Instead, a certain encoding function $E(\cdot)$ can be performed on the input $\bm{x}$, so that the decision privacy of an arbitrary model $M$ is ensured by $E(\cdot)$:

|Pr[M(E(\bm{x})) = c_{i}] - \frac{1}{n}| \leq \epsilon,   (2)

where $\epsilon \in [0,1)$ is regarded as a privacy budget. Adjusting $\epsilon$ balances utility and privacy: the smaller $\epsilon$ is, the better the decision privacy.
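To make the budget concrete, here is a minimal sketch (ours, not part of the paper) that estimates $\epsilon$ empirically as the largest deviation of an observed decision distribution from the uniform $\frac{1}{n}$; the function name and inputs are illustrative.

```python
from collections import Counter

def decision_privacy_epsilon(observed_labels, label_set):
    """Empirical privacy budget in the sense of Equation 2: the largest
    deviation of any class frequency from the uniform 1/n."""
    n = len(label_set)
    counts = Counter(observed_labels)
    total = len(observed_labels)
    return max(abs(counts.get(c, 0) / total - 1.0 / n) for c in label_set)

# A near-uniform stream of obfuscated decisions over a binary label set.
labels = ["yes", "no", "no", "yes", "no", "yes", "yes", "no", "no", "yes"]
print(decision_privacy_epsilon(labels, ["yes", "no"]))  # 0.0 for this balanced toy stream
```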

2.2 Problem Definition

The privacy-preserving inference is defined as:

M(E(\bm{x})) \rightarrow y',   (3)

where the encoding function $E(\cdot)$ has two functionalities: (1) It encodes the raw $\bm{x}$ into a privacy-preserving representation that remains interpretable by $M$, which has already been studied by previous work Qu et al. (2021); Yue et al. (2021); Zhou et al. (2022) and is not the focus of this paper. (2) It shifts the inference result from the actual prediction $y$ to a privacy-preserving $y'$, whose distribution satisfies the decision privacy defined by Equation 2:

|Pr[y' = c_{i}] - \frac{1}{n}| \leq \epsilon.   (4)

The privacy property is ensured by $E(\cdot)$: it is mathematically hard or impossible to find its inverse function $E^{-1}(\cdot)$, so the adversary can recover neither the raw input $\bm{x}$ from the privacy-preserving representation $E(\bm{x})$ nor the actual prediction $y$ from the privacy-preserving prediction $y'$. A decoding function $D(\cdot)$ (this is different from the traditional “decryption” in information security) is available to decode the true $y$ from $y'$ with knowledge of the raw input and encoding settings, that is,

y \leftarrow D(y', E, \bm{x}).   (5)

Equation 5 can be extended to the case where resolving a $y$ depends on multiple $y'$s:

y \leftarrow D(y_{0}', \cdots, y_{g}', E, \bm{x}),   (6)

where $y_{0}', \cdots, y_{g}'$ are all the $y'$s necessary to decode $y$. Note that $E$ and $\bm{x}$ serve to identify these $y'$s and can be omitted if the user maintains the mapping between the $y'$s and $y$.

Therefore, privacy-preserving inference allows the user to query LMaaS without exposing sensitive information to the service provider or the adversary, by sending $E(\bm{x})$ to the server and decoding $y$ from $y'$ with $D(\cdot)$ locally. Throughout the process, the adversary learns nothing from the encoded input $E(\bm{x})$ or the encoded decision $y'$.

Distinctions from DP and input privacy. In general, DP adds appropriate noise to the given input instances so that the information of individual instances is not leaked, while their overall statistical features remain. Similarly, most input privacy methods perturb the raw input to prevent input reverse-engineering while keeping the information necessary for inference. Hence, an ideal DP or input privacy method should satisfy $M(E(\bm{x})) \approx M(\bm{x})$, where $E$ is the corresponding DP or input privacy method, while protecting the privacy of $\bm{x}$. In contrast, decision privacy aims to make $M(E(\bm{x}))$ as random as possible while $D(M(E(\bm{x}))) \approx M(\bm{x})$.

3 Method

This section begins with an overview of our privacy-preserving inference framework for text classification. We then detail the core components: $E(\cdot)$ for encoding in Section 3.1 and $D(\cdot)$ for decoding in Section 3.2. Finally, analyses and justifications regarding privacy properties are provided in Appendix A.2.

Figure 3: The IoI workflow for decision privacy protection. If a user (bottom left) makes illness inquiries via a PLM-driven online diagnosis system, normally the inference result is returned in plain text. In the most basic form of IoI, the raw text is concatenated with an obfuscator, which is also a piece of text. Subsequently, the concatenated text and the obfuscator are encoded separately by the privacy-preserving representation generation module, which ensures the produced embedding representations are privacy-preserving (irreversible and unique). Consequently, instead of receiving one “plaintext”, the PLM receives two independent “ciphertexts” and makes inferences on them without knowing their correlation, the raw text, or the true decision. Only the user is able to recover the true decision by leveraging the distributions of these two inferences. In practice, each input text is obfuscated by a group of obfuscators, and the requests from multiple inputs are sent to the PLM in arbitrary order.

The intuition behind IoI is to obfuscate the raw instance with obfuscators so that the PLM's inference distribution is intentionally steered. Thus, adversaries cannot deduce the true decision unless they possess knowledge of the corresponding resolution method and parameters. The general workflow of IoI is shown in Figure 3, consisting of instance obfuscation as $E(\cdot)$ and decision resolution as $D(\cdot)$. Instead of sending the raw instance $x$ to the PLM and acquiring the decision, IoI conceals $x$ by concatenating it with an obfuscator $b$, which is also a text sequence (Section 3.1). The concatenated text $[b;x]$ and the obfuscator $b$ are each sent to the privacy-preserving representation generation (PPRG) module, where the input is encoded by a compatible SOTA input privacy method. PPRG produces privacy-preserving representations, which are irreversible and remain distinct even for identical inputs, and these are treated as inputs to the PLM. After the PLM's inference on the PPRG-encoded $b$ and $[b;x]$, the raw decision distribution of $[b;x]$ does not reflect the inference on $x$, since it is steered by the elaborately chosen obfuscator $b$. But the data owner can resolve the actual decision $y$ via decision resolution (Section 3.2) by utilizing the decision distributions of $[b;x]$ and the corresponding $b$. We further show in Appendix A that the true $y$ is hard to recover from the $y'$s.
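The workflow can be summarized in a short sketch. This is our own illustrative composition, not the authors' released code: `pprg_encode`, `query_plm`, and `resolve` are hypothetical callables standing in for the PPRG module, the LMaaS endpoint (assumed to return a probability distribution over labels), and the decision resolution of Section 3.2.

```python
import random

def ioi_infer(x, unit_group, pprg_encode, query_plm, resolve):
    """One IoI query: obfuscate x with a unit group of obfuscators, send the
    requests in shuffled order, and resolve the true decision locally."""
    tagged = []
    for j, b in enumerate(unit_group):
        tagged.append((("pair", j), pprg_encode(f"{b} {x}")))  # obfuscated instance [b; x]
        tagged.append((("obf", j), pprg_encode(b)))            # the obfuscator b alone
    random.shuffle(tagged)                  # the server sees neither order nor pairing
    results = {tag: query_plm(enc) for tag, enc in tagged}     # decision distributions
    pairs = [(results[("pair", j)], results[("obf", j)]) for j in range(len(unit_group))]
    return resolve(pairs)                   # decision resolution, run by the data owner
```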

3.1 Instance Obfuscation

Sending the input instance $x$ in plaintext to the PLM reveals the input completely. Hence, some previous studies Zhou et al. (2022); Plant et al. (2021) employ privacy-preserving text representations that transform the input into a “ciphertext” form by perturbing representations. In this way, although the raw input content is not exposed, the output of the PLM still carries meaningful information and may be exploited by the adversary. To address the limited protection of the decision in previous methods, IoI uses instance obfuscation, acting as $E(\cdot)$ in Section 2. It not only protects input privacy by reusing existing SOTA privacy-preserving text representation methods as PPRG, but also “fools” the adversary with baffling outputs for decision privacy.

Instance obfuscation is motivated by mixup Zhang et al. (2018), originally proposed for data augmentation. Zhang et al. (2018) theoretically show that mixup produces virtual feature-target pairs sampled from the same vicinal distribution as the original data. Specifically, through a mapping (i.e., the LM in this context), the mixup of two raw inputs can be mapped to the mixup of their corresponding labels. Based on this, if $E(\cdot)$ conceals the real instance $x$ by mixing it up with dummy instances, the PLM only makes an inference on the mixed-up instance without seeing $x$, and the proportion of dummy instances participating in the mixup steers the final decision. We call these dummy instances obfuscators; thus, $E(\cdot)$ obscures the true instance with selected obfuscators and lets the PLM make decisions based on the elaborately constructed input. The obfuscation process is the key to concealing information in a black-box LMaaS setting. This leads to the questions of what constitutes a high-quality obfuscator and how to obfuscate $x$ with $b$ properly, so as to maximize performance as well as privacy protection.

Obfuscator Selection. Obfuscators are simply ordinary (unlabeled) sentences that may or may not be related to the real instances to be protected. To be used as an obfuscator, an instance only needs a corresponding label predicted by the PLM. The predicted label does not need to be correct, so no gold label is required. Thus, an obfuscator $b$ can be a sentence from any arbitrary corpus.

To steer the PLM's decision towards being affected by $b$ instead of $x$, we prioritize $b$ instances with higher confidence in the PLM's decision. For example, in a binary classification task, if an instance $x_1$ scores 0.9 for label 1 and $x_2$ scores 0.7, then $x_1$ is picked over $x_2$, since $x_1$ is more decisively assigned label 1. Since the selected obfuscators can be paired with any real instance, an efficient way to reuse them is to pre-compute them in an obfuscator pool.
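A possible pre-computation of the obfuscator pool is sketched below, under the assumption that a hypothetical `query_plm` helper returns a label-to-probability dictionary; the confidence threshold mirrors the settings later reported in Table 2.

```python
def build_obfuscator_pool(candidates, query_plm, min_confidence=0.9):
    """Pre-compute an obfuscator pool from arbitrary unlabeled sentences,
    keeping only candidates whose PLM-predicted label is high-confidence."""
    pool = []
    for sent in candidates:
        dist = query_plm(sent)                       # e.g. {"positive": 0.93, "negative": 0.07}
        label, conf = max(dist.items(), key=lambda kv: kv[1])
        if conf >= min_confidence:                   # prefer decisive obfuscators
            pool.append({"text": sent, "label": label, "confidence": conf})
    return pool
```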

Obfuscator Balancing. We observe that using a single $b$ instance to obfuscate $x$ leads to instability in decision resolution (Section 3.2) due to the uneven distribution of the PLM's decisions. For example, in 3-class classification, assume $b_1$ has label $c_1$, $b_2$ has label $c_2$, and $b_3$ has label $c_3$. After a single obfuscation with $b_1$, the label of $[b_1;x]$ predicted by $M$ could remain $c_1$, or change to $c_2$ or $c_3$. Thus, steering the decision distribution with a single obfuscator is not steady. Balancing is employed as a solution to mitigate this issue. Specifically, each real instance $x$ is paired with at least one unit group of obfuscators. A unit group of obfuscators is defined as a set containing obfuscators with uniformly distributed labels from the label set $C$, that is,

g = \{b_{j} \mid M(b_{j}) = c_{i}, \; b_{j} \in B, \; \forall c_{i} \in C\},   (7)

where $B$ is the obfuscator pool and $|g| = |C|$. Moreover, a group can consist of more than one unit group. Formally, a group containing $n$ unit groups is defined as

G_{n} = g_{1} \cup g_{2} \cup \cdots \cup g_{n}.   (8)

Therefore, the obfuscated instances of $x$ are denoted as $[b_i;x]$, where $b_i \in G_n$. With balancing in the previous example, $x$ is concatenated with all three obfuscators, resulting in three obfuscated instances $[b_1;x]$, $[b_2;x]$, and $[b_3;x]$.
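A sketch of the balancing step, assuming pool entries shaped like those produced by the pool-building sketch above (dictionaries with a "label" field); the within-label sampling strategy is our own simplification.

```python
import random

def sample_group(pool, label_set, n=1):
    """Assemble a group G_n (Equation 8): n unit groups, each holding one
    obfuscator per label in C (Equation 7). Assumes the pool contains at
    least one obfuscator for every label."""
    by_label = {c: [b for b in pool if b["label"] == c] for c in label_set}
    group = []
    for _ in range(n):
        group.extend(random.choice(by_label[c]) for c in label_set)
    return group    # |G_n| = n * |C|
```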

Privacy-Preserving Representation Generation. Even though the raw instance $x$ is replaced with $[b;x]$ and $b$, the content remains in plaintext. To protect their privacy, IoI uses PPRG, which is compatible with any suitable SOTA input privacy method Zhou et al. (2022); Plant et al. (2021), to transform $[b;x]$ and $b$ from text sequences into privacy-preserving representations. A qualified input privacy method has two privacy requirements. First, the produced representation is not invertible, so the adversary cannot reverse it back to plaintext. Second, the method incorporates randomness, so the produced representations are distinct even for identical inputs.

After applying PPRG, the representations from multiple $x$s should be sent to the PLM in arbitrary order. This prevents the adversary from pairing up the encoded $[b;x]$ and $b$. A more detailed discussion of privacy is in Appendix A.
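The second requirement can be checked with a small property test such as the sketch below (our illustration; `pprg_encode` is again a hypothetical stand-in for the chosen input privacy method, assumed to return a numeric vector).

```python
import numpy as np

def check_pprg_randomness(pprg_encode, text, trials=5):
    """Sanity-check the second PPRG requirement: encoding the same text
    repeatedly must yield distinct representations; otherwise repeated
    obfuscators become linkable across requests."""
    reps = [np.asarray(pprg_encode(text), dtype=float) for _ in range(trials)]
    for i in range(trials):
        for j in range(i + 1, trials):
            if np.allclose(reps[i], reps[j]):
                return False   # two encodings collide, so requests are linkable
    return True
```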

3.2 Privacy-Preserving Decision Resolution

While the obfuscated instance prevents the raw instance $x$ from being accessible to the PLM, the true decision of $x$ is also concealed within the resulting $y'$s. We outline a decision resolution method, serving as $D(\cdot)$ in Section 2 (Equation 6), to resolve the true $y$ from multiple associated $y'$s.

Following the balancing described in Section 3.1, successfully executing $D(\cdot)$ to obtain the decision of $x$ requires all the associated $[b;x]$ and $b$ pairs. For an adversary, correctly locating all associated instances among a tremendous number of mixed instances, which are sufficiently obfuscated and randomized, amounts to finding a needle in a haystack. A detailed analysis is given in Appendix A.

For the data owner, in contrast, running $D(\cdot)$ is straightforward. The strategy to recover $x$'s result $y$ from the $y'$s is based on the divergence between the decision distributions of $[b;x]$ and $b$. Specifically, if $x$'s label is $c_k$, blending it with $b$ shifts the confidence of $[b;x]$'s decision distribution towards $c_k$ regardless of $b$'s label. Taking the example in Figure 3, $[b;x]$ has 0.35 for “yes” and 0.65 for “no,” while $b$ has 0.12 for “yes” and 0.88 for “no.” The confidence of “yes” for $[b;x]$ increases because of the involvement of $x$; thus $x$ is highly likely to be “yes.”

Without loss of generality, we evaluate this divergence over a unit group $g$: the label whose confidence increases the most from the obfuscator to the obfuscated instance is resolved as the label of $x$, that is,

\operatorname*{argmax}_{1 \leq i \leq |C|} \left( c^{b_{j};x}_{i} - c^{b_{j}}_{i} \right),   (9)

where $b_j \in g$, $c_i \in C$, $i$ denotes the label id ($|C|$ is the number of labels), and $j$ denotes the obfuscator id. If the obfuscator group is extended to $G_n$, Equation 9 can be generalized as

\operatorname*{argmax}_{1 \leq i \leq |C|} \frac{1}{n} \sum_{j=1}^{|G_{n}|} \left( c^{b_{j};x}_{i} - c^{b_{j}}_{i} \right),   (10)

where $b_j \in G_n$ and the confidence differences over the decision distributions are averaged. (Note that this succinct but effective strategy can be seamlessly applied to various privacy-preserving inference tasks, since it is not restricted by the number of labels.)
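Equation 10 reduces to a few vector operations on the returned decision distributions, as in the following sketch (ours; it assumes each distribution is given as a sequence of per-label probabilities, matching the `resolve` hook used in the earlier workflow sketch).

```python
import numpy as np

def resolve_decision(pairs):
    """Decision resolution (Equation 10): `pairs` holds, for one real instance
    x, tuples (dist_bx, dist_b) of the PLM's probability vectors over the |C|
    labels for [b_j; x] and for b_j alone. The resolved label is the one whose
    confidence rises the most, averaged over the group."""
    diffs = [np.asarray(d_bx) - np.asarray(d_b) for d_bx, d_b in pairs]
    return int(np.argmax(np.mean(diffs, axis=0)))

# Toy binary case from Figure 3: [b; x] scores (0.35, 0.65), b scores (0.12, 0.88).
print(resolve_decision([((0.35, 0.65), (0.12, 0.88))]))  # -> 0, i.e. "yes"
```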

4 Experiments

We first introduce the datasets, baselines, and evaluation metrics in Section 4.1. The main results regarding task performance and decision privacy are illustrated in Section 4.2. We further study the functionalities of technical components in Section 4.3.

Dataset  Method  $T_r$  $T_o$  $\Phi_r\downarrow$  $\Phi_o\downarrow$  $\Phi\downarrow$
SST-2 (Acc.) Fine-tuned .924 .924 - - -
Random Guess .500 .500 - - -
PP-BERT .909 .909 .016 .818 .417
SanText+ .830 .830 .102 .660 .381
TextFusion .904 .904 .022 .808 .415
IoI .913 .770 .012 .540 .276
MRPC (Acc./F1) Fine-tuned .860/.904 .860/.904 - - -
Random Guess .500/.500 .500/.500 - - -
PP-BERT .434/.294 .434/.294 .489/.675 .132/.412 .310/.543
SanText+ .711/.750 .711/.750 .164/.170 .422/.500 .293/.335
TextFusion -/.882 -/.882 -/.024 -/.764 -/.394
IoI .745/.794 .570/.628 .124/.122 .166/.256 .132/.189
SST-5 (Acc.) Fine-tuned .500 .500 - - -
Random Guess .200 .200 - - -
PP-BERT .490 .490 .020 .362 .191
SanText+ .426 .426 .148 .282 .215
IoI .467 .339 .066 .174 .120
QNLI (Acc.) Fine-tuned .915 .915 - - -
Random Guess .500 .500 - - -
PP-BERT .658 .658 .281 .316 .298
SanText+ .725 .725 .208 .450 .329
IoI .849 .648 .072 .296 .184
Table 1: Performance of resolved and obfuscated decisions by IoI and baselines. $T_r$ indicates the raw task performance after decision resolution by the data owner, while $T_o$ indicates the performance that the model owner or an attacker can retrieve. $\Phi_r$ and $\Phi_o$ measure how close $T_r$ and $T_o$ are to the baseline and to a random guess, respectively, and $\Phi$ is a balance between $\Phi_r$ and $\Phi_o$. The smaller the three $\Phi$s, the better the task performance and decision privacy protection. The best result in each task is highlighted in bold. Only IoI has an effect on decision privacy, while the other baselines are either non-privacy-preserving or only protect input privacy.

4.1 Experimental Setup

Datasets. Our experiments are conducted on four benchmark datasets spanning various text classification tasks. SST-2 Socher et al. (2013) requires classifying the sentiment of the given text as either positive or negative. SST-5, an extension of SST-2, refines the binary label into five categories: very negative, negative, neutral, positive, and very positive. MRPC Dolan and Brockett (2005) is a paraphrase identification task to determine whether two sentences are paraphrases. QNLI Wang et al. (2018), derived from SQuAD Rajpurkar et al. (2016), is a natural language inference task seeking to identify whether the context sentence contains the answer to the question.

Baselines. Although no directly comparable methods for decision privacy are available, we select four reasonable and related baselines. Fine-tuned is the task fine-tuned model without privacy protection. Random Guess denotes random-guess results. PP-BERT Qu et al. (2021) is a privacy-preserving encoder that perturbs the token embeddings by adding random noise $N = rp$, where $r$ is the distance from the origin and $p$ is a vector sampled from the unit hypersphere; $r$ is sampled from the Gamma distribution $\Gamma(n, \frac{1}{\eta})$. (We set $\eta = 100$, a moderate value in the original paper, to balance noise strength and accuracy.) SanText+ Yue et al. (2021) replaces sensitive words using GloVe Pennington et al. (2014) and utilizes differential privacy to ensure the privacy of the sanitized words. (For fairness, we set $\epsilon = 3$ in SanText+ and use the sanitized text directly for inference.) TextFusion Zhou et al. (2022) alters the input text sequence or intermediate representations by eliminating redundant or sensitive information. Note that PP-BERT, SanText+, and TextFusion were all designed for input privacy.
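For reference, the sketch below illustrates our reading of the PP-BERT perturbation described above, not the authors' released implementation: noise $N = rp$ with a uniformly random unit direction $p$ and magnitude $r \sim \Gamma(d, \frac{1}{\eta})$; taking the embedding dimension $d$ as the Gamma shape parameter is an assumption on our part.

```python
import numpy as np

def ppbert_perturb(token_embeddings, eta=100.0, rng=None):
    """Add PP-BERT-style noise N = r * p to each row of a float array of
    token embeddings (shape: num_tokens x d): p is a uniformly random unit
    direction and r ~ Gamma(d, 1/eta). Shape parameter d is an assumption."""
    rng = rng or np.random.default_rng()
    d = token_embeddings.shape[-1]
    noisy = np.empty_like(token_embeddings)
    for i, emb in enumerate(token_embeddings):
        p = rng.normal(size=d)
        p /= np.linalg.norm(p)                    # uniform direction on the unit sphere
        r = rng.gamma(shape=d, scale=1.0 / eta)   # noise magnitude
        noisy[i] = emb + r * p
    return noisy
```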

Metrics. Generally, performance is evaluated by task-specific metrics (Accuracy/F1), denoted as $T$. We report the obfuscated performance ($T_o$), measured on the obfuscated decisions, and the resolved performance ($T_r$), i.e., the true (original) performance obtained after decision resolution.

Besides, to quantify the effectiveness of decision privacy protection and decision resolution, we additionally define $\Phi_r = \frac{T_{\texttt{baseline}} - T_r}{T_{\texttt{baseline}}}$ and $\Phi_o = \frac{|T_o - T_{\texttt{random}}|}{1 - T_{\texttt{random}}}$. They measure the relative performance difference from the resolved $T_r$ to the non-privacy-preserving baseline, and from the obfuscated $T_o$ to a random guess, respectively. Finally, a unified metric $\Phi = \frac{\Phi_o + \Phi_r}{2}$ measures the balance between obfuscation strength and task performance. (All three metrics are scaled to the range $[0,1]$; smaller values indicate better performance.)
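The three metrics are direct to compute from the table entries, as the sketch below shows (the function name is illustrative).

```python
def privacy_metrics(t_resolved, t_obfuscated, t_baseline, t_random):
    """Compute Phi_r, Phi_o, and their average Phi as defined above
    (smaller is better for all three)."""
    phi_r = (t_baseline - t_resolved) / t_baseline
    phi_o = abs(t_obfuscated - t_random) / (1.0 - t_random)
    return phi_r, phi_o, (phi_r + phi_o) / 2.0

# SST-2 row of Table 1: T_r = .913, T_o = .770, fine-tuned baseline .924, random .500.
print(privacy_metrics(0.913, 0.770, 0.924, 0.500))   # roughly (0.012, 0.540, 0.276)
```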

4.2 Main Results

SST-2 MRPC SST-5 QNLI
Max sequence len 128 512 128 256
Min confidence of $b$ >0.99 >0.90 >0.90 >0.99
Group size $n$ 1 1 1 1
Table 2: IoI settings for main results

The parameter settings of IoI are given in Table 2. (PPRG is not enabled in this experiment; its effect is studied in Section 4.3.) The backbone PLM used in each task is fine-tuned and consistent across all baselines, since PP-BERT, SanText+, TextFusion, and IoI are all applied at the inference phase. As the main results in Table 1 show, IoI performs best on almost all tasks in terms of the resolved ($\Phi_r$) and obfuscated ($\Phi_o$) results and the balance between them ($\Phi$); the few exceptions are lower than but close to the best baselines. Note that only IoI protects decision privacy; thus its $\Phi_o$ is the best compared with all other methods.

Specifically, on SST-2 and QNLI, the resolved results $T_r$ of IoI have accuracy similar to the non-private fine-tuned baselines, as indicated by the smaller $\Phi_r$, while still keeping the obfuscated result $T_o$ as close to random as possible, as shown by the smaller $\Phi_o$. For SST-5, a harder version of SST-2, although IoI is not the best in task performance, it balances the trade-off, as indicated by $\Phi$, and still achieves relatively better decision privacy. For MRPC, the evaluation metrics capture the abnormal behavior of PP-BERT, whose resolved prediction performance $T_r$ is worse than a random guess. In this case, although its $\Phi_o$ appears to be the best among all methods, the high $\Phi_r$ and $\Phi$ precisely reflect its poor balance between task performance and decision privacy.

4.3 Analyses

We further study the influence of the technical components described in Section 3.

Obfuscator Selection. To verify the loose obfuscator selection policy, i.e., that any ordinary sentence can serve as a qualified obfuscator, we conduct the following contrastive experiment on SST-2 and SST-5: we test the real instances with obfuscators drawn from the same and from different datasets. As shown in Table 3, using obfuscators from a different dataset is indistinguishable from using ones from the same dataset.

Evaluation  Obfuscator  $T_r$  $T_o$
SST-2 SST-2 0.907 0.770
SST-2 QNLI 0.891 0.777
SST-5 SST-5 0.467 0.339
SST-5 QNLI 0.461 0.331
Table 3: The performance impact with obfuscators from the same and different datasets

Balancing. This technique is intended to mitigate the issue of unbalanced obfuscator distributions and thus increase the accuracy of decision resolution. Here, we study the necessity of balancing. Since SST-5 is a 5-class classification problem, a unit group $g$ (Equation 7), in which the obfuscators' labels are uniformly distributed, contains five obfuscators with different labels. According to Equation 8, a group $G_n$ consists of $n$ such $g$s. We set $n$ from 1 to 5, i.e., 5 to 25 obfuscators. As a baseline, we also test obfuscation without balancing by randomly sampling 1 to 25 obfuscators from the obfuscator pool regardless of their classes. In Figure 4, the performance with balancing is shown as solid lines, and that with randomly sampled obfuscators as dashed lines. For the resolved version without balancing, the accuracy $T_r$ (random) improves by more than 15% from one randomly sampled obfuscator to five, and fluctuates relatively mildly once more than a unit group of obfuscators is used (orange dashed line). With balancing ($T_r$ over $G_n$, orange solid line), where $n$ ranges from 1 to 5, performance is overall better and more stable than with random samples. Unlike the resolved version, which receives the performance gains, the obfuscated version remains stable across different $n$ and is outperformed by the resolved version by more than 10% once the group size is at least one unit (blue lines). Consequently, balancing achieves the best and most robust performance for the resolved version while maintaining the maximum gap to the obfuscated version for better decision privacy protection.

Figure 4: Balancing with different group size (SST-5)

Privacy-Preserving Representation Generation. PPRG utilizes a compatible input privacy method in a black-box fashion to transform obfuscated instances and obfuscators into privacy-preserving representations. Since the privacy strength and attack resistance of such input privacy methods are already comprehensively studied in their corresponding papers and follow-up works, we focus on the performance of plugging them into IoI.

Table 4 shows the performance of IoI when employing PP-BERT as PPRG. We test PP-BERT with various $\eta$ values and compute its accuracy on SST-2; we then use it as PPRG and report $T_r$ and $T_o$. The difference between the resolved result $T_r$ and PP-BERT's own accuracy is trivial (see the % column), which indicates the strong compatibility and recovery ability of IoI. Hence, we conclude that IoI's task performance is dominated by the selected PPRG method, i.e., the input privacy method, while decision privacy is maintained.

PP-BERT  Acc  $T_r$  $T_o$  %
$\eta=200$ .914 .912 .768 .002
$\eta=100$ .925 .908 .759 .018
$\eta=50$ .851 .846 .698 .006
$\eta=25$ .536 .528 .516 .015
Table 4: PPRG-enabled IoI performance (SST-2)

5 Related Work

Privacy Preservation in LMaaS. Recent studies actively address the privacy concerns associated with LMaaS. Methods including noise injection Plant et al. (2021), differential privacy Hoory et al. (2021); Yue et al. (2021); Xu et al. (2020), adversarial training Li et al. (2018); Coavoux et al. (2018), and representation fusion Zhou et al. (2022) perturb the input text sequence or intermediate representations by removing information that is unnecessary or sensitive for the PLM's inference. There are also approaches Feng et al. (2020); Chen et al. (2022) that protect the data flow end-to-end with homomorphic encryption, although executing such models is time-consuming and computationally expensive, and requires modifying the model on the server side. On the other hand, to mitigate privacy issues in cloud PLM fine-tuning, offsite-tuning Xiao et al. (2023) compresses the full PLM into a distilled version that allows users to tune plug-in adapters locally, which protects the privacy of the user as well as the weights of the PLM. Du et al. (2023) exploit local differential privacy to sanitize the embeddings (and labels) for fine-tuning. However, none of the above works protects the inference decision privacy of the LM under the black-box setting, which is exactly the focus of this work.

Data Obfuscation. Although mixup Zhang et al. (2018) was designed to alleviate undesirable drawbacks of large deep neural networks, its concept of data combination, and its effect on inference with minimal computation overhead, is valuable and instructive. Guo et al. (2019) extend mixup to NLP, and Co-mixup Kim et al. (2021) explores applying mixup to multiple instances. Besides representation mixup, recent studies also obfuscate the authorship of text by neutralizing its stylistic features with techniques such as back-translation or representation disentanglement Mahmood et al. (2022); Altakrori et al. (2022); Bevendorff et al. (2019). Our instance obfuscation technique is inspired by representation mixup, while serving as a pilot approach for protecting LM decisions.

6 Conclusion

In this work, we introduce decision privacy and propose IoI, which prevents information leakage from PLM decisions under the black-box LMaaS setting. In contrast to prior works, we are the first to consider privacy protection of the PLM's decisions via instance obfuscation. Correspondingly, we define evaluation metrics tailored for decision privacy and conduct comprehensive experiments regarding task performance and privacy protection. We anticipate that our work conveys valuable insight and sheds light on the trajectory of privacy in NLP.

Limitations

We discuss two main limitations of this work. First, the extra inference cost per instance. IoI hides the target instance behind obfuscators so that the target instance is never directly exposed to the PLM. To guarantee the strength of privacy protection and the stability of task performance, the strategies involved, including balancing and randomization, emit additional requests, resulting in multiple inferences per instance. However, this cost is not as severe as that of previous works, some of which require significant effort to fine-tune the remote PLMs, while others require partial or entire model sharing and hence compromise the privacy of the model.

Second, IoI is designed for text classification tasks in a purely black-box fashion and is thus not suitable for generative tasks. Adapting the current method to natural language generation requires addressing non-trivial problems such as mixed-up tokens and the variable lengths of generated text. We leave this for future work.

Ethical Considerations

Technology innovations generally offer potential benefits, but they also possess the risk of intentional exploitation for nefarious purposes, and LMaaS is not immune to this reality. The presence of regulations and standards establishes a legal structure that ensures responsible utilization of data and grants individuals the right to request the deletion of their data. In the absence of such regulations, society depends on the ethical responsibility of technology practitioners to ensure the ethical usage of data.

Decision privacy, as defined in this paper, provides a fundamental direction for protecting the data, as well as LMaaS itself, from being abused. The proposed method technically guarantees that the input and output data to and from the LMaaS are fully obfuscated. Adopting this method helps ensure that the operations and data serve legitimate rather than malicious purposes. Besides, the method can be seamlessly integrated into compatible underlying technologies or running systems without any extra modification, which lowers the implementation barrier and increases accessibility for individuals and organizations.

All experimental datasets used in this work are openly available benchmarks. No demographic or identity characteristics are used in this paper.

References

  • Altakrori et al. (2022) Malik Altakrori, Thomas Scialom, Benjamin C. M. Fung, and Jackie Chi Kit Cheung. 2022. A multifaceted framework to evaluate evasion, content preservation, and misattribution in authorship obfuscation techniques. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 2391–2406, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
  • Bevendorff et al. (2019) Janek Bevendorff, Martin Potthast, Matthias Hagen, and Benno Stein. 2019. Heuristic authorship obfuscation. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 1098–1108, Florence, Italy. Association for Computational Linguistics.
  • Brown et al. (2020) Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. 2020. Language models are few-shot learners. Advances in neural information processing systems, 33:1877–1901.
  • Chen et al. (2022) Tianyu Chen, Hangbo Bao, Shaohan Huang, Li Dong, Binxing Jiao, Daxin Jiang, Haoyi Zhou, Jianxin Li, and Furu Wei. 2022. THE-X: Privacy-preserving transformer inference with homomorphic encryption. In Findings of the Association for Computational Linguistics: ACL 2022, pages 3510–3520, Dublin, Ireland. Association for Computational Linguistics.
  • Coavoux et al. (2018) Maximin Coavoux, Shashi Narayan, and Shay B. Cohen. 2018. Privacy-preserving neural representations of text. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 1–10, Brussels, Belgium. Association for Computational Linguistics.
  • Dolan and Brockett (2005) William B. Dolan and Chris Brockett. 2005. Automatically constructing a corpus of sentential paraphrases. In Proceedings of the Third International Workshop on Paraphrasing (IWP2005).
  • Du et al. (2023) Minxin Du, Xiang Yue, Sherman SM Chow, and Huan Sun. 2023. Sanitizing sentence embeddings (and labels) for local differential privacy. In Proceedings of the ACM Web Conference 2023, pages 2349–2359.
  • Feng et al. (2020) Bo Feng, Qian Lou, Lei Jiang, and Geoffrey C Fox. 2020. Cryptogru: Low latency privacy-preserving text analysis with gru. arXiv preprint arXiv:2010.11796.
  • Guo et al. (2019) Hongyu Guo, Yongyi Mao, and Richong Zhang. 2019. Augmenting data with mixup for sentence classification: An empirical study. arXiv preprint arXiv:1905.08941.
  • Hoory et al. (2021) Shlomo Hoory, Amir Feder, Avichai Tendler, Sofia Erell, Alon Peled-Cohen, Itay Laish, Hootan Nakhost, Uri Stemmer, Ayelet Benjamini, Avinatan Hassidim, and Yossi Matias. 2021. Learning and evaluating a differentially private pre-trained language model. In Findings of the Association for Computational Linguistics: EMNLP 2021, pages 1178–1189, Punta Cana, Dominican Republic. Association for Computational Linguistics.
  • Kahla et al. (2022) Mostafa Kahla, Si Chen, Hoang Anh Just, and Ruoxi Jia. 2022. Label-only model inversion attacks via boundary repulsion. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 15045–15053.
  • Kim et al. (2021) JangHyun Kim, Wonho Choo, Hosan Jeong, and Hyun Oh Song. 2021. Co-mixup: Saliency guided joint mixup with supermodular diversity. In International Conference on Learning Representations.
  • Li et al. (2018) Yitong Li, Timothy Baldwin, and Trevor Cohn. 2018. Towards robust and privacy-preserving text representations. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 25–30, Melbourne, Australia. Association for Computational Linguistics.
  • Mahmood et al. (2022) Asad Mahmood, Faizan Ahmad, Zubair Shafiq, Padmini Srinivasan, and Fareed Zaffar. 2022. A girl has no name: Automated authorship obfuscation using mutant-x. Proceedings on Privacy Enhancing Technologies, 1:18.
  • Mao et al. (2011) Huina Mao, Xin Shuai, and Apu Kapadia. 2011. Loose tweets: an analysis of privacy leaks on twitter. In Proceedings of the 10th annual ACM workshop on Privacy in the electronic society, pages 1–12.
  • Pennington et al. (2014) Jeffrey Pennington, Richard Socher, and Christopher Manning. 2014. GloVe: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1532–1543, Doha, Qatar. Association for Computational Linguistics.
  • Plant et al. (2021) Richard Plant, Dimitra Gkatzia, and Valerio Giuffrida. 2021. CAPE: Context-aware private embeddings for private language learning. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 7970–7978, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.
  • Qu et al. (2021) Chen Qu, Weize Kong, Liu Yang, Mingyang Zhang, Michael Bendersky, and Marc Najork. 2021. Natural language understanding with privacy-preserving bert. In Proceedings of the 30th ACM International Conference on Information & Knowledge Management, pages 1488–1497.
  • Rajpurkar et al. (2016) Pranav Rajpurkar, Jian Zhang, Konstantin Lopyrev, and Percy Liang. 2016. SQuAD: 100,000+ questions for machine comprehension of text. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 2383–2392, Austin, Texas. Association for Computational Linguistics.
  • Sen (2015) Jaydip Sen. 2015. Security and privacy issues in cloud computing. In Cloud technology: concepts, methodologies, tools, and applications, pages 1585–1630. IGI global.
  • Shejwalkar et al. (2021) Virat Shejwalkar, Huseyin A Inan, Amir Houmansadr, and Robert Sim. 2021. Membership inference attacks against nlp classification models. In NeurIPS 2021 Workshop Privacy in Machine Learning.
  • Socher et al. (2013) Richard Socher, Alex Perelygin, Jean Wu, Jason Chuang, Christopher D. Manning, Andrew Ng, and Christopher Potts. 2013. Recursive deep models for semantic compositionality over a sentiment treebank. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pages 1631–1642, Seattle, Washington, USA. Association for Computational Linguistics.
  • Song and Raghunathan (2020) Congzheng Song and Ananth Raghunathan. 2020. Information leakage in embedding models. In Proceedings of the 2020 ACM SIGSAC conference on computer and communications security, pages 377–390.
  • Sun et al. (2022) Tianxiang Sun, Yunfan Shao, Hong Qian, Xuanjing Huang, and Xipeng Qiu. 2022. Black-box tuning for language-model-as-a-service. In International Conference on Machine Learning, pages 20841–20855. PMLR.
  • Tang et al. (2016) Jun Tang, Yong Cui, Qi Li, Kui Ren, Jiangchuan Liu, and Rajkumar Buyya. 2016. Ensuring security and privacy preservation for cloud data services. ACM Computing Surveys (CSUR), 49(1):1–39.
  • Wang et al. (2018) Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, and Samuel Bowman. 2018. GLUE: A multi-task benchmark and analysis platform for natural language understanding. In Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP, pages 353–355, Brussels, Belgium. Association for Computational Linguistics.
  • Xiao et al. (2023) Guangxuan Xiao, Ji Lin, and Song Han. 2023. Offsite-tuning: Transfer learning without full model. arXiv preprint arXiv:2302.04870.
  • Xu et al. (2020) Zekun Xu, Abhinav Aggarwal, Oluwaseyi Feyisetan, and Nathanael Teissier. 2020. A differentially private text perturbation method using regularized mahalanobis metric. In Proceedings of the Second Workshop on Privacy in NLP, pages 7–17, Online. Association for Computational Linguistics.
  • Yue et al. (2021) Xiang Yue, Minxin Du, Tianhao Wang, Yaliang Li, Huan Sun, and Sherman S. M. Chow. 2021. Differential privacy for text analytics via natural text sanitization. In Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, pages 3853–3866, Online. Association for Computational Linguistics.
  • Zhang et al. (2018) Hongyi Zhang, Moustapha Cisse, Yann N Dauphin, and David Lopez-Paz. 2018. mixup: Beyond empirical risk minimization. In International Conference on Learning Representations.
  • Zhou et al. (2022) Xin Zhou, Jinzhu Lu, Tao Gui, Ruotian Ma, Zichu Fei, Yuran Wang, Yong Ding, Yibo Cheung, Qi Zhang, and Xuanjing Huang. 2022. TextFusion: Privacy-preserving pre-trained model inference via token fusion. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 8360–8371, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.

Appendix A Privacy Discussion

In this section, we formally define the threat model in Section A.1, and analyze IoI’s privacy from the perspective of possible attacks in Section A.2. Moreover, we discuss the incurred cost of privacy in Section A.3.

A.1 Threat Model

According to Equation 3, we consider a threat model where the adversary $\mathcal{A}$ collects data from the outputs of $E(\bm{x})$ and $y'$, that is, the input representations and obfuscated decisions. Additionally, as a malicious cloud service provider, $\mathcal{A}$ also has white-box access to $M$ and maintains $M$'s training data. $\mathcal{A}$ seeks to reverse the original input text $\bm{x}$ and/or recover the true output $y$.

A.2 Privacy Analyses

In IoI, $E(\bm{x})$ is the PPRG-encoded $[b;x]$ and $b$ pair. In order to acquire $x$, $\mathcal{A}$ needs to recover the text representations of all $[b;x]$ and $b$, as well as pair them up. As mentioned in Section 3.1, a qualified PPRG ensures the irreversibility of the input text sequence while generating distinct representations even for identical inputs. Even though $\mathcal{A}$ has the training data of $M$ and white-box access to $M$, these do not help $\mathcal{A}$ reverse PPRG-encoded instances as long as the PPRG has sufficient privacy strength to resist known attacks Song and Raghunathan (2020). Thus, we conclude that (1) the obfuscator $b$ or the obfuscated instance $[b;x]$ cannot be reverse-engineered; (2) the representation produced for $b$ or $[b;x]$ is different every time, and it is not possible to identify the same $b$ or $[b;x]$ from their PPRG-produced representations; (3) $\mathcal{A}$ is not able to differentiate $[b;x]$ from $b$; and (4) no $\bm{x}$ can be extracted from $[b;x]$.

When it comes to resolving a $y$ from the $y'$s, as illustrated in Equation 6, $\mathcal{A}$ has to collect all the associated $[b;x]$ and $b$ pairs in a group $G_n$. However, identifying a group of $b$s or $[b;x]$s is no better than exhausting all possible combinations of PLM inference requests, because of the hardness of reversing PPRG-encoded instances and the arbitrary request order of unique representations. For example, let $m=|g|$ be the unit group size and $n$ the number of unit groups participating in one instance obfuscation; resolving a $y$ then involves $2mn$ independent PLM inference requests. Since the $b \in G_n$ are uniformly distributed, the probability that $\mathcal{A}$ finds all these $2mn$ instances among $r$ total requests to the PLM is $1/\binom{r}{2mn}$. Even in a toy setup, say 3-class classification ($m=3$) with one group ($n=1$) of obfuscators, merely 100 total requests would require more than a billion tries to identify all of them. Therefore, it is infeasible for $\mathcal{A}$ to extract $\bm{x}$ from the encoded $b$ and $[b;x]$ under a qualified PPRG method, and reversing $y$ from the $y'$s is no better than exhausting all possible combinations of requests.
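The combinatorial argument can be reproduced with a few lines (ours, for illustration): in the toy setup above, an adversary must pick exactly the right 6 requests out of 100.

```python
from math import comb

def adversary_success_probability(m, n, r):
    """Probability of guessing exactly the 2*m*n requests that belong to one
    instance out of r observed requests (a single uniform draw)."""
    return 1 / comb(r, 2 * m * n)

# Toy setup from the text: 3 classes (m = 3), one unit group (n = 1), 100 requests.
print(comb(100, 6))                            # 1,192,052,400 possible combinations
print(adversary_success_probability(3, 1, 100))
```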

A.3 Privacy Cost

Privacy comes with a cost. Here, we elaborate on it from two aspects: communication and computation.

Communication cost. As a baseline, each instance $x$ sends one request to the PLM. In IoI, per Equation 8, each $x$ is concatenated with $|G_n| = n|C|$ instances. All these obfuscated instances, along with the same number of obfuscators, form the total requests to the PLM, namely $2n|C|$. In practice, when there are multiple $x$s, the obfuscators are pre-computed and reused from the obfuscator pool. Hence, for $k$ instances of $x$, the total number of requests ranges in $[(1+kn)|C|, 2kn|C|]$.
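A small sketch (ours) of these bounds; the argument names are illustrative.

```python
def request_counts(k, n, num_labels):
    """Bounds on the number of PLM requests for k real instances (Appendix A.3):
    the upper bound re-encodes every obfuscator per instance, while the lower
    bound reuses each pre-computed obfuscator inference across all instances."""
    lower = (1 + k * n) * num_labels
    upper = 2 * k * n * num_labels
    return lower, upper

print(request_counts(k=100, n=1, num_labels=2))   # (202, 400) for a binary task
```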

Computation cost. On the PLM's side, the number of inference requests is given by the communication cost above. On the data owner's side, resolving a $y$ requires executing Equation 10, which involves only trivial matrix operations.

Appendix B Experiments

We report additional experiments and analyses in this section.

B.1 Length Expansion

The privacy strength of a single obfuscated instance $[b;x]$ mainly comes from the domination of $b$. Kim et al. (2021) demonstrate that a model is able to map a mixed instance consisting of more than two raw instances to a mixed label. Inspired by this, we expand the length of $b$ to amplify its impact on the PLM's decision by duplicating it $k$ times. (The duplication does not hurt the privacy of the generated representation because it happens before semantic-neutral shuffling.) Correspondingly, the concatenated sequence length becomes $k|b|+|x|$. Here, we seek to verify the relation between accuracy and the obfuscator's length for the obfuscated and resolved prediction results.

We duplicate $b$ $k$ times before encoding to realize the length increment; the dataset used here is SST-2. As shown by the solid lines in Figure 5, $k$ is tested from 1 to 10 and presents a negative correlation with accuracy. Specifically, when $k=1$, the accuracy of the resolved inference is above 0.9, whereas the accuracy of the obfuscated inference is below 0.8. As $k$ grows, the accuracy of the resolved inference $T_r$ drops gradually until it reaches around 0.82 at $k=10$. In comparison, the obfuscated $T_o$ falls quickly throughout almost the entire range of $k$ and hits 0.55 at $k=10$, which is close to a random guess. The maximum accuracy gap between the two variants is more than 2.25 times the minimum gap, which occurs at the beginning ($k=2$). This experiment demonstrates the large impact of length expansion on performance: a proper $k$ can push the obfuscated distribution of the PLM's decision far from the true one, thereby strengthening the privacy protection. Note that we set the other hyper-parameter $n$ in $G_n$ to 1 and 5 in this experiment; the negative correlation holds regardless of $n$.
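For clarity, the length expansion itself is just repeated concatenation before encoding, as in this illustrative snippet (the example sentences are made up).

```python
def expand_and_concat(b, x, k=1, sep=" "):
    """Length expansion (Section B.1): duplicate the obfuscator k times before
    concatenation so that b dominates the decision on the obfuscated instance."""
    return sep.join([b] * k + [x])

print(expand_and_concat("a dull, lifeless film.", "an absolute delight.", k=3))
```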

Figure 5: Length Expansion (SST-2)

B.2 Decision Distribution

According to Equation 4, the decision distribution of $M(b)$ or $M([b;x])$ should be as close to random as possible, and the overall decision distribution of $M(\cdot)$ should also lean towards randomness.

For validation purposes, we conduct an additional experiment, presented in Table 5, which shows the decision distributions of $M(b)$, $M([b;x])$, and the overall $M(\cdot)$ on SST-2. Each cell denotes the distribution of negative/positive decisions. The parameter $k$ controls the strength of length expansion (as introduced in Section B.1, a higher $k$ enhances privacy but reduces utility).

At $k=1$ (optimal utility), the distributions of $M(b)$, $M([b;x])$, and $M(\cdot)$ approach randomness. As $k$ increases, the level of randomization intensifies. At $k=10$, the distribution becomes equivalent to random.

$k$  $M([b;x])$  $M(b)$  $M(\cdot)$
1 0.431/0.569 0.5/0.5 0.465/0.535
5 0.487/0.513 0.5/0.5 0.493/0.507
10 0.502/0.498 0.5/0.5 0.501/0.499
Table 5: Decision distribution (SST-2)

Moreover, we set $k=1$ and report the decision distributions for the other three datasets in Table 6. Similarly, each cell denotes the distribution of decisions. The results show that our method achieves a promising decision distribution even at optimal utility.

Dataset  $M([b;x])$  $M(b)$  $M(\cdot)$
SST-5 0.141/0.227/0.191/0.273/0.168 0.2/0.2/0.2/0.2/0.2 0.171/0.213/0.195/0.236/0.184
MRPC 0.528/0.472 0.5/0.5 0.514/0.486
QNLI 0.532/0.468 0.5/0.5 0.516/0.484
Table 6: Decision distribution ($k=1$)