
Probabilistic Selective Encryption of
Convolutional Neural Networks for Hierarchical Services

Jinyu Tian¹, Jiantao Zhou¹,*, and Jia Duan²
¹State Key Laboratory of Internet of Things for Smart City, Department of Computer and Information Science, University of Macau
²JD Explore, JD
{yb77405, jtzhou}@um.edu.mo, [email protected]
Abstract

Model protection is vital when deploying Convolutional Neural Networks (CNNs) for commercial services, due to the massive costs of training them. In this work, we propose a selective encryption (SE) algorithm to protect CNN models from unauthorized access, with a unique feature of providing hierarchical services to users. Our algorithm first selects important model parameters via the proposed Probabilistic Selection Strategy (PSS). It then encrypts the most important parameters with the designed encryption method called Distribution Preserving Random Mask (DPRM), so as to maximize the performance degradation while encrypting only a very small portion of the model parameters. We also design a set of access permissions, with which different amounts of the most important model parameters can be decrypted. Hence, different levels of model performance can be naturally provided to users. Experimental results demonstrate that the proposed scheme can effectively protect the classification model VGG19 by encrypting merely 8% of the parameters of its convolutional layers. We also implement the proposed model protection scheme on the denoising model DnCNN, showcasing hierarchical denoising services.

Figure 1: Schematic diagram of our proposed system model.

1 Introduction

Convolutional Neural Networks (CNNs) have achieved unparalleled results across various tasks such as object detection [5, 10, 20], super-resolution reconstruction [9, 13, 15], and image inpainting [36, 37, 38]. Constructing a successful CNN model is not a trivial task; it usually requires substantial investments in expertise, time, and resources. To encourage healthy business investment and competition, it is crucial to prevent unauthorized access to CNN models. Meanwhile, there is a recent trend of deploying pretrained CNN models through cloud-based services. Under such circumstances, it is highly desirable to offer hierarchical services, such that users with different access privileges enjoy different levels of model performance. For instance, when a CNN model is used for image denoising, the lowest access privilege yields a roughly denoised version; this could serve as a free promotion of the denoising service. When users prefer sophisticatedly denoised images, they can pay for advanced access privileges and better services.

A straightforward strategy to achieve model protection and access control is to encrypt all model parameters via traditional cryptographic methods such as RSA [26], TDES [6], and Twofish [27]. For models with millions or more parameters, however, both encryption and decryption would be very expensive, especially on resource-constrained devices. In comparison, it is preferable to encrypt parameters selectively. This strategy is known as Selective Encryption (SE) and has practical applications in multimedia security and Internet security [3, 7, 11, 23, 35]. Moreover, encrypting all model parameters indiscriminately cannot provide fine-grained hierarchical services.

The motivation of SE derives from Shannon's seminal work, which pointed out that effective encryption/decryption can be performed by decreasing the redundancy of a system [28]. In general, reducing the redundancy of a system relies on prior knowledge of the importance of system components. For instance, Abomhara et al. [2] protected the bitstreams of the H.264/AVC video codec by selectively encrypting the I-frames, which have a larger impact on the quality of reconstructed frames than P- and B-frames. In fact, in the deep learning community, several works [19, 24, 31, 39] showed that the parameters of a CNN model are NOT equally important, and some of them contain more useful information for a given task. The unequal importance of model parameters motivates us to design a new SE that protects a CNN model by encrypting only those important parameters.

Therefore, in this work, we present a novel SE algorithm to protect a pretrained CNN model, with a unique feature of providing hierarchical services to users. Our proposed protection scheme consists of two steps. First, we propose the Probabilistic Selection Strategy (PSS) to select important model parameters, which heavily impact the model performance. Then, we design the encryption algorithm Distribution Preserving Random Mask (DPRM) for encrypting those selected parameters. As will be verified experimentally, the proposed SE makes it feasible to provide users with hierarchical services, i.e., different levels of model performance can be granted according to predefined permissions. The main contributions of our work can be summarized as follows:

  • We propose the PSS to determine the importance of parameters in a CNN model. The PSS could be generalized to CNN models for different applications, such as image classification and denoising.

  • We propose the DPRM algorithm to encrypt the parameters selected by the PSS, such that the encrypted parameters are statistically consistent with the unencrypted ones. We theoretically prove that the ciphertext obtained from the DPRM is imperceptible to attackers, which significantly enhances the security of the protected model.

  • By manipulating the number of decrypted parameters with different levels of importance, the proposed framework provides hierarchical services by assigning users different permissions.

  • The experimental results on the classification model VGG19 and the denoising model DnCNN show that the proposed SE effectively protects both models by encrypting less than 8% of the parameters of their convolutional layers.

The rest of the paper is organized as follows. Section 2 briefly introduces the system model, the threat model, and the design goals of the proposed framework. Section 3 details each module of the system. In Section 4, we theoretically prove that the ciphertext of the important parameters is imperceptible to attackers. Finally, Section 5 offers experimental results verifying that all expected goals are achieved, and Section 6 concludes.

2 System model, threat model and design goals

2.1 System model

We aim to design a CNN protection framework with the following two functionalities. First, the system prevents unauthorized access to the pretrained model: only authorized users obtain the correct model outputs, while unauthorized ones get only irrelevant results. Second, it provides hierarchical services for authorized users with different permissions. The framework of our system is presented in Fig. 1. Let $\mathcal{F}_{\mathbf{\Theta}}$ be a pretrained CNN model with the parameter set $\mathbf{\Theta}$. To protect this model via our proposed SE strategy, a service provider first feeds the pretrained model $\mathcal{F}_{\mathbf{\Theta}}$ into a Select module, which selects important parameters $\hat{\mathbf{\Theta}}$ from $\mathcal{F}_{\mathbf{\Theta}}$ using the proposed PSS. Then, the Encrypt module constructs the protected model $\mathcal{F}_{\tilde{\mathbf{\Theta}}}$ by encrypting $\hat{\mathbf{\Theta}}$ with the proposed DPRM. The model $\mathcal{F}_{\tilde{\mathbf{\Theta}}}$ is subsequently deployed, awaiting access with permissions. The Assign module generates several permissions $\mathbf{S}_{\hat{m}}$ based on the outputs of the Select and Encrypt modules. Authorized users then obtain a decrypted model $\mathcal{F}_{\breve{\mathbf{\Theta}}_{\hat{m}}}$ by inputting a permission $\mathbf{S}_{\hat{m}}$ into the Decrypt module. The decrypted model exhibits different levels of the pretrained model's performance as the permission $\mathbf{S}_{\hat{m}}$ changes. As shown in Fig. 1, three authorized users decrypt the protected model with different permissions $\mathbf{S}_{\hat{m}}$ ($\hat{m}=1,2,3$), and obtain corresponding decrypted models $\mathcal{F}_{\breve{\mathbf{\Theta}}_{\hat{m}}}$ with hierarchical performance on classifying a cat. When an unauthorized user attempts to access the protected model without any permission, the system returns a useless result.

2.2 Threat model

The security threats considered for the proposed system mainly come from attackers who attempt to recover the encrypted parameters $\hat{\mathbf{\Theta}}$ from the protected model $\mathcal{F}_{\tilde{\mathbf{\Theta}}}$, so as to use the pretrained model $\mathcal{F}_{\mathbf{\Theta}}$ without authorization. On the one hand, attackers could treat the parameter set $\tilde{\mathbf{\Theta}}$ of the protected model as a noisy version of the original parameter set $\mathbf{\Theta}$, in which the parameters $\hat{\mathbf{\Theta}}$ are contaminated; they may thus try to recover $\hat{\mathbf{\Theta}}$ from $\tilde{\mathbf{\Theta}}$ with denoising techniques. On the other hand, the encrypted parameters $\hat{\mathbf{\Theta}}$ could be treated as missing information in $\tilde{\mathbf{\Theta}}$, which could possibly be restored with retraining strategies. We call these two adversarial behaviors the denoising attack and the retraining attack, respectively.

2.3 Design goals

To evaluate the effectiveness and security of the proposed scheme, we clarify the following design goals: 1) Effectiveness: to make the protected model $\mathcal{F}_{\tilde{\mathbf{\Theta}}}$ dysfunctional under unauthorized access, the selectively encrypted model must exhibit a sufficiently large performance degradation; 2) Hierarchy: the proposed system should provide users with different services by assigning them different permissions; the released model $\mathcal{F}_{\breve{\mathbf{\Theta}}_{\hat{m}}}$ therefore needs to exhibit varying performance in accordance with a user's permission $\mathbf{S}_{\hat{m}}$; and 3) Security: the protected model $\mathcal{F}_{\tilde{\mathbf{\Theta}}}$ should be secure against the threats discussed above. We verify all the design goals in Section 5.

Figure 2: Histograms of the parameters in several layers of VGG19. The red curve in each sub-figure is the Gaussian distribution whose mean and standard deviation are estimated from the parameters.

3 The system model for protecting CNN

In this section, we provide details on each module of the system model introduced in Section 2.1.

3.1 Select

This module aims to select the important parameters $\hat{\mathbf{\Theta}}$ from the convolutional layers of $\mathcal{F}_{\mathbf{\Theta}}$. We only consider convolutional layers because most of the parameters in a CNN model are concentrated there [12, 18, 33, 34]. Hereafter, the term “layer” refers to a convolutional layer. We adopt a layer-wise strategy to determine $\hat{\mathbf{\Theta}}$, which allows parallel processing. Specifically, suppose the pretrained model contains $L$ layers; then $\hat{\mathbf{\Theta}}$ is composed of $L$ subsets $\hat{\mathbf{\Theta}}^{l}$ ($l=1,\dots,L$), where each $\hat{\mathbf{\Theta}}^{l}$ contains the important parameters of the $l$-th layer. We propose the selection method PSS to identify each $\hat{\mathbf{\Theta}}^{l}$, motivated as follows.

Intuitively, we can identify the importance of parameters by evaluating the performance degradation of the pretrained model without them. However, the neurons of a CNN model may respond differently to different inputs, implying that the importance of parameters depends on the inputs. More precisely, let $\mathbf{\Theta}^{l}$ denote the parameters of the $l$-th layer. When feeding a sample $\mathbf{x}_{n}$ ($n=1,\dots,N$) into $\mathcal{F}_{\mathbf{\Theta}}$, there exists a parameter subset $\hat{\mathbf{\Theta}}^{l}$ of $\mathbf{\Theta}^{l}$ whose removal causes the maximal performance degradation of $\mathcal{F}_{\mathbf{\Theta}}$. Clearly, $\hat{\mathbf{\Theta}}^{l}$ changes with the input $\mathbf{x}_{n}$. To eliminate such randomness, we can count how many times each parameter $\theta$ in $\mathbf{\Theta}^{l}$ is selected as a candidate of $\hat{\mathbf{\Theta}}^{l}$ after feeding all the $\mathbf{x}_{n}$'s. Denote the selection frequency of a parameter $\theta$ in $\mathbf{\Theta}^{l}$ by $p_{\theta}$. This frequency directly reflects the importance of $\theta$ to the pretrained model; we thus call $p_{\theta}$ the importance of the parameter $\theta$. For simplicity, we call $\hat{\mathbf{\Theta}}^{l}$ the dominated set of the $l$-th layer and its elements dominated parameters. Naturally, $\hat{\mathbf{\Theta}}$ is the dominated set of $\mathcal{F}_{\mathbf{\Theta}}$.

We now formulate the above PSS selection strategy as the following optimization problem:

$$\min_{p_{\theta}\in[0,1]}\ \frac{1}{N}\sum_{n=1}^{N}\mathcal{L}\big(\mathcal{F}_{\mathbf{\Theta}}(\mathbf{x}_{n},(\mathbf{I}-\mathbf{Z}^{(n)})\odot\mathbf{\Theta}^{l}),\mathbf{y}_{n}\big)+\lambda\|\mathbf{Z}^{(n)}\|_{0}, \tag{1}$$

where $\mathbf{Z}^{(n)}=\{z^{(n)}_{\theta}\}_{\theta\in\mathbf{\Theta}^{l}}$, $z^{(n)}_{\theta}\sim \mathrm{Bern}(z_{\theta}|p_{\theta})$ is a sample of the binary random variable $z_{\theta}$, $\mathbf{I}$ is the all-ones vector with the same length as $\mathbf{\Theta}^{l}$, and $\lambda$ is a weighting factor for the regularization term. We briefly explain the relationship between the optimization problem (1) and the selection strategy. The element-wise multiplication $(\mathbf{I}-\mathbf{Z}^{(n)})\odot\mathbf{\Theta}^{l}$ simulates the removal operation: a parameter $\theta$ is removed from $\mathbf{\Theta}^{l}$ if the corresponding $z^{(n)}_{\theta}=1$. The first term thus measures the performance of $\mathcal{F}_{\mathbf{\Theta}}$ after removing part of the parameters ($\mathcal{L}(\cdot)$ is a performance evaluating function). To maximize the performance degradation of $\mathcal{F}_{\mathbf{\Theta}}$, which is equivalent to minimizing the first term in (1), the important parameters $\theta$ in $\mathbf{\Theta}^{l}$ should be assigned large importance $p_{\theta}$, so that $z^{(n)}_{\theta}=1$ for most $\mathbf{x}_{n}$'s. We can therefore determine the importance $p_{\theta}$ of each $\theta$ in $\mathbf{\Theta}^{l}$ by solving problem (1). Note that the term $\|\mathbf{Z}^{(n)}\|_{0}$ penalizes the number of removed parameters, so that fewer parameters are assigned large importance.

The discrete nature of the binary random variable $z_{\theta}$ makes problem (1) hard to optimize. We can overcome this obstacle by reparameterizing [14, 29] $z_{\theta}$ into the continuous random variable $\tilde{z}_{\theta}$ as follows:

$$\begin{aligned}
s_{\theta}(u)&=\mathrm{Sig}\big((\log(u/(1-u))+\log(p_{\theta}/(1-p_{\theta})))/\beta\big),\\
\tilde{s}_{\theta}(u)&=s_{\theta}(u)(\zeta-\gamma)+\gamma,\\
\tilde{z}_{\theta}&=\min(1,\max(0,\tilde{s}_{\theta}(u))),
\end{aligned} \tag{2}$$

which was initially proposed in [22] and later improved in [21]. In (2), $u$ is a random variable with uniform distribution $U(0,1)$, $\mathrm{Sig}(\cdot)$ is the sigmoid function, and $\zeta>0$ and $\gamma<0$ are parameters that extend the support of $\tilde{z}$ to $[0,1]$.

Another challenge is the $L_{0}$ regularizer $\|\tilde{\mathbf{Z}}^{(n)}\|_{0}=\|\{\tilde{z}^{(n)}_{\theta}\}_{\theta\in\mathbf{\Theta}^{l}}\|_{0}$ in problem (1). Since the original $z_{\theta}$ has been relaxed to $\tilde{z}_{\theta}$ defined in (2), we can relax the $L_{0}$ regularizer into the differentiable form $\sum_{\theta\in\mathbf{\Theta}^{l}}\mathbb{P}\{\tilde{z}^{(n)}_{\theta}\neq 0\}$. The reason is that the $L_{0}$ norm enforces fewer nonzero elements in $\tilde{\mathbf{Z}}^{(n)}$; this implies that, for most $\tilde{z}^{(n)}_{\theta}$'s in $\tilde{\mathbf{Z}}^{(n)}$, the probability $\mathbb{P}\{\tilde{z}^{(n)}_{\theta}\neq 0\}$ should be as small as possible. According to the cumulative distribution function (CDF) of $s_{\theta}(u)$ introduced in [22], we conclude that

$$\mathbb{P}\big\{\tilde{z}^{(n)}_{\theta}\neq 0\big\}=\mathrm{Sig}\Big(\log(p_{\theta}/(1-p_{\theta}))-\beta\log\frac{-\gamma}{\zeta}\Big), \tag{3}$$

where the details are given in the supplementary materials.

With the relaxations (2) and (3), problem (1) reduces to

$$\begin{aligned}
\min_{p_{\theta}\in[0,1]}\ &\frac{1}{N}\sum_{n=1}^{N}\mathcal{L}\big(\mathcal{F}_{\mathbf{\Theta}}(\mathbf{x}_{n},(\mathbf{I}-\tilde{\mathbf{Z}}^{(n)})\odot\mathbf{\Theta}^{l}),\mathbf{y}_{n}\big)\\
&+\lambda\sum_{\theta\in\mathbf{\Theta}^{l}}\mathrm{Sig}\Big(\log(p_{\theta}/(1-p_{\theta}))-\beta\log\frac{-\gamma}{\zeta}\Big). 
\end{aligned} \tag{4}$$

When the performance evaluation function $\mathcal{L}(\cdot)$ is differentiable with respect to the $p_{\theta}$'s, e.g., the negative cross-entropy or the negative mean squared error, we can solve this problem with automatic differentiation toolboxes such as TensorFlow [1] or PyTorch [25].
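To make the relaxation concrete, the following is a minimal PyTorch sketch of the hard-concrete gate in (2) and the relaxed $L_{0}$ penalty in (3). The variable name `log_alpha` (for $\log(p_{\theta}/(1-p_{\theta}))$), the helper `model_fn`, and the specific values of $\beta$, $\gamma$, $\zeta$ are our assumptions for illustration, not the authors' released code.

```python
import math
import torch
import torch.nn.functional as F

# Hard-concrete hyperparameters; these particular values are assumptions
# (common choices for this relaxation), not taken from the paper.
BETA, GAMMA, ZETA = 2.0 / 3.0, -0.1, 1.1

def sample_gate(log_alpha):
    """Sample z~_theta via (2); log_alpha = log(p_theta / (1 - p_theta))."""
    u = torch.rand_like(log_alpha).clamp(1e-6, 1 - 1e-6)
    s = torch.sigmoid((torch.log(u / (1 - u)) + log_alpha) / BETA)  # s_theta(u)
    s_bar = s * (ZETA - GAMMA) + GAMMA                              # stretch
    return s_bar.clamp(0.0, 1.0)                                    # clip to [0, 1]

def l0_penalty(log_alpha):
    """Relaxed L0 regularizer (3): sum over theta of P{z~_theta != 0}."""
    return torch.sigmoid(log_alpha - BETA * math.log(-GAMMA / ZETA)).sum()

def pss_objective(model_fn, theta_l, log_alpha, x, y, lam=1e-4):
    """One stochastic estimate of (4) for layer l, with L chosen as the
    negative cross-entropy; model_fn(x, w) stands for running F_Theta
    with the l-th layer's weights replaced by w."""
    z = sample_gate(log_alpha)
    logits = model_fn(x, (1.0 - z) * theta_l)        # (I - Z) ⊙ Θ^l
    return -F.cross_entropy(logits, y) + lam * l0_penalty(log_alpha)
```

Gradient descent on `log_alpha` (e.g., with `torch.optim.Adam`) then yields the importance scores as $p_{\theta}=\mathrm{Sig}(\texttt{log\_alpha})$.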

After solving problem (4), we obtain the importance of the parameters in the $l$-th layer, denoted by $\mathbf{P}^{l}=\{p_{\theta}\}_{\theta\in\mathbf{\Theta}^{l}}$. The dominated set $\hat{\mathbf{\Theta}}^{l}$ of this layer can then be identified as

$$\hat{\mathbf{\Theta}}^{l}=\{\theta \mid \theta\in\mathbf{\Theta}^{l},\ p_{\theta}\ \text{is among the top}\ \phi\ \text{in}\ \mathbf{P}^{l}\}, \tag{5}$$

where $\phi$ is the number of dominated parameters. By repeatedly solving problem (4) for $l=1,\dots,L$ and searching for dominated parameters as in (5), we obtain the dominated set $\hat{\mathbf{\Theta}}=\{\hat{\mathbf{\Theta}}^{l}\}^{L}_{l=1}$ of the model $\mathcal{F}_{\mathbf{\Theta}}$. A short sketch of this selection step follows.
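As a usage note, once problem (4) is solved, the selection in (5) is a plain top-$\phi$ lookup; a one-line sketch (with `p_l` holding $\mathbf{P}^{l}$ and the returned indices standing in for the parameter positions):

```python
import torch

def dominated_indices(p_l: torch.Tensor, phi: int) -> torch.Tensor:
    """Positions of the phi largest importance scores in P^l, i.e. the
    dominated set of (5) expressed as indices within the layer."""
    return torch.topk(p_l.flatten(), k=phi).indices
```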

With the dominated parameters $\hat{\mathbf{\Theta}}$ at hand, we now split them into $M$ different subsets as follows:

$$\begin{aligned}
&\hat{\mathbf{\Theta}}=\{\hat{\mathbf{\Theta}}_{m}\}_{m=1}^{M},\quad \hat{\mathbf{\Theta}}_{m}=\{\hat{\mathbf{\Theta}}_{m}^{l}\}^{L}_{l=1},\\
&\hat{\mathbf{\Theta}}_{m}^{l}=\{\theta \mid \theta\in\hat{\mathbf{\Theta}}^{l},\ Q_{m+1,l}<p_{\theta}\leq Q_{m,l}\},
\end{aligned} \tag{6}$$

where $Q_{m,l}$ is the $(M-m+1)/M$ percentile of the elements in $\mathbf{P}^{l}$. We split $\hat{\mathbf{\Theta}}$ according to the importance of the parameters as in (6), so that the subsequent procedure can control the performance of $\mathcal{F}_{\mathbf{\Theta}}$ by manipulating the number of decrypted parameters at different importance levels. More details are deferred to Section 3.4.

For the future decryption of the dominated parameters in $\hat{\mathbf{\Theta}}$, we also need their locations $\hat{\mathbf{E}}=\{\mathbf{e}_{\theta}\}_{\theta\in\hat{\mathbf{\Theta}}}$ in the model $\mathcal{F}_{\mathbf{\Theta}}$, where $\mathbf{e}_{\theta}$ is a 2-D tuple recording which layer $\theta$ belongs to and its position within that layer. We split $\hat{\mathbf{E}}$ following the partition of $\hat{\mathbf{\Theta}}$ (see the sketch after (7)). That is,

$$\hat{\mathbf{E}}=\{\hat{\mathbf{E}}_{m}\}_{m=1}^{M},\quad \hat{\mathbf{E}}_{m}=\{\hat{\mathbf{E}}^{l}_{m}\}_{l=1}^{L},\quad \hat{\mathbf{E}}_{m}^{l}=\{\mathbf{e}_{\theta}\}_{\theta\in\hat{\mathbf{\Theta}}_{m}^{l}}. \tag{7}$$
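Below is a small NumPy sketch of the partition in (6)-(7), assuming `p_l` and `dominated_idx` come from the selection step above; the band boundaries follow the percentile definition of $Q_{m,l}$ stated after (6).

```python
import numpy as np

def split_by_importance(p_l, dominated_idx, M=5):
    """Split a layer's dominated indices into M importance bands as in (6);
    band m collects the locations E^l_m of (7)."""
    p = np.asarray(p_l).flatten()
    idx = np.asarray(dominated_idx)
    # Q_{m,l} is the (M - m + 1)/M percentile of P^l, for m = 1, ..., M + 1.
    q = [np.quantile(p, (M - m + 1) / M) for m in range(1, M + 2)]
    bands = []
    for m in range(1, M + 1):
        lo, hi = q[m], q[m - 1]                      # Q_{m+1,l} < p <= Q_{m,l}
        sel = (p[idx] > lo) & (p[idx] <= hi)
        bands.append(idx[sel])                       # bands[m-1] = E^l_m
    return bands
```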
Input: Dominated set $\hat{\mathbf{\Theta}}=\{\hat{\mathbf{\Theta}}_{m}\}_{m=1}^{M}$, secret keys $\mathbf{K}=\{\kappa_{m}\}_{m=1}^{M}$.
Output: $\mathcal{F}_{\tilde{\mathbf{\Theta}}}, \mathbf{U}, \mathbf{V}, \mathbf{K}, \bm{\mu}, \bm{\sigma}$.
1  for $m=1,\dots,M$ do
2      $\mathbf{R}_{m}\leftarrow\mathcal{G}_{fs}(\kappa_{m},|\hat{\mathbf{\Theta}}_{m}|)$,  $\mathbf{C}_{m}=\hat{\mathbf{\Theta}}_{m}+\mathbf{R}_{m}$;
3      for $l=1,\dots,L$ do
4          $\hat{\mathbf{C}}^{l}_{m}=\emptyset$,  $\mathbf{C}_{m}=\{\mathbf{C}^{l}_{m}\}^{L}_{l=1}$;
5          for $c$ in $\mathbf{C}^{l}_{m}$ do
6              $u^{l}_{m}=\min(\mathbf{C}^{l}_{m})$,  $v^{l}_{m}=\max(\mathbf{C}^{l}_{m})$;
7              $c\leftarrow(c-u^{l}_{m})/(v^{l}_{m}-u^{l}_{m})$;
8              $\mu^{l}=\mathrm{MEAN}(\mathbf{\Theta}^{l})$,  $\sigma^{l}=\mathrm{STD}(\mathbf{\Theta}^{l})$;
9              $\hat{c}\leftarrow F^{-1}(c\,|\,\mu^{l},\sigma^{l})$,  $\hat{\mathbf{C}}^{l}_{m}\leftarrow\hat{\mathbf{C}}^{l}_{m}\cup\{\hat{c}\}$;
10     $\hat{\mathbf{C}}_{m}=\{\hat{\mathbf{C}}^{l}_{m}\}^{L}_{l=1}$;
11 $\hat{\mathbf{C}}=\{\hat{\mathbf{C}}_{m}\}^{M}_{m=1}$,  $\mathcal{F}_{\tilde{\mathbf{\Theta}}}\leftarrow$ replace $\hat{\mathbf{\Theta}}$ in $\mathcal{F}_{\mathbf{\Theta}}$ with $\hat{\mathbf{C}}$;
12 $\mathbf{U}=\{\mathbf{u}_{m}\}^{M}_{m=1}$ with $\mathbf{u}_{m}=\{u_{m}^{l}\}^{L}_{l=1}$;
13 $\mathbf{V}=\{\mathbf{v}_{m}\}^{M}_{m=1}$ with $\mathbf{v}_{m}=\{v_{m}^{l}\}^{L}_{l=1}$;
14 $\bm{\mu}=\{\mu^{l}\}^{L}_{l=1}$,  $\bm{\sigma}=\{\sigma^{l}\}^{L}_{l=1}$;
return: $\mathcal{F}_{\tilde{\mathbf{\Theta}}}, \mathbf{U}, \mathbf{V}, \mathbf{K}, \bm{\mu}, \bm{\sigma}$.
Algorithm 1: The scheme of the Encrypt module (running the DPRM on each subset of $\hat{\mathbf{\Theta}}$).

3.2 Encrypt

We now propose an encryption method called DPRM to encrypt the dominated set $\hat{\mathbf{\Theta}}$, so as to protect the model $\mathcal{F}_{\mathbf{\Theta}}$ while maximizing the performance degradation. We run the DPRM independently on each subset $\hat{\mathbf{\Theta}}_{m}$ of $\hat{\mathbf{\Theta}}$ to facilitate the manipulation of decrypted parameters. The DPRM first generates a pseudorandom number sequence to mask $\hat{\mathbf{\Theta}}_{m}$, and then transforms the masked version of $\hat{\mathbf{\Theta}}_{m}$ back to the same distribution as $\hat{\mathbf{\Theta}}_{m}$. The detailed procedure is listed in Algorithm 1. For each $\hat{\mathbf{\Theta}}_{m}$ in the partition of $\hat{\mathbf{\Theta}}$, the DPRM sequentially performs:

1) $RandMask(\hat{\mathbf{\Theta}}_{m},\kappa_{m})\rightarrow\mathbf{C}_{m}$: Given a secret key $\kappa_{m}$ produced by a key generator, we utilize the forward-secure pseudorandom number generator (FSPRNG) [30] $\mathcal{G}_{fs}$ to produce a pseudorandom number sequence $\mathbf{R}_{m}$ of length $|\hat{\mathbf{\Theta}}_{m}|$. The subset $\hat{\mathbf{\Theta}}_{m}$ of the dominated set $\hat{\mathbf{\Theta}}$ is then masked by this sequence as $\mathbf{C}_{m}=\hat{\mathbf{\Theta}}_{m}+\mathbf{R}_{m}$ (line 2).

2) $Mapping(\mathbf{C}_{m})\rightarrow\hat{\mathbf{C}}_{m}$: An obvious limitation of $RandMask$ is that the masked parameters $\mathbf{C}_{m}$ are statistically different from the original $\hat{\mathbf{\Theta}}_{m}$. Attackers could then easily identify $\mathbf{C}_{m}$, which would leak the locations of the encrypted parameters. To remove this threat, we further transform $\mathbf{C}_{m}$ to a domain whose distribution is consistent with that of $\hat{\mathbf{\Theta}}_{m}$. The fundamental issue then is to estimate the distribution of the parameters in $\hat{\mathbf{\Theta}}_{m}$.

Observing the histograms of the parameters of several layers of the CNN model VGG19 [32] in Fig. 2, it is reasonable to infer that the parameters in each layer of a CNN model follow a symmetric distribution such as a Gaussian. Hence, we assume that the parameters $\mathbf{\Theta}^{l}$ of the $l$-th layer follow a Gaussian distribution $\mathcal{N}(\theta|\mu^{l},\sigma^{l})$, where the mean $\mu^{l}$ and standard deviation $\sigma^{l}$ are estimated by the sample mean and sample standard deviation of the parameters in $\mathbf{\Theta}^{l}$.

Note that the masked parameters in $\mathbf{C}_{m}$ are distributed across the $L$ layers, with those in the $l$-th layer denoted by $\mathbf{C}^{l}_{m}$ ($l=1,\dots,L$). Since $\hat{\mathbf{\Theta}}^{l}\subset\mathbf{\Theta}^{l}$, making the distribution of $\mathbf{C}_{m}$ consistent with that of $\hat{\mathbf{\Theta}}_{m}$ reduces to transforming each $\mathbf{C}^{l}_{m}$ to the Gaussian distribution $\mathcal{N}(\theta|\mu^{l},\sigma^{l})$. As shown in Algorithm 1 (lines 3-10), for $l=1,\dots,L$, we first scale the elements of $\mathbf{C}_{m}^{l}$ into the interval $[0,1]$, and then transform these scaled values $c$ with the inverse Gaussian CDF:

$$\begin{aligned}
c&\leftarrow(c-u^{l}_{m})/(v^{l}_{m}-u^{l}_{m}),\quad\forall c\in\mathbf{C}_{m}^{l},\\
\hat{c}&\leftarrow F^{-1}(c\,|\,\mu^{l},\sigma^{l}),
\end{aligned} \tag{8}$$

where $u^{l}_{m}=\min(\mathbf{C}^{l}_{m})$, $v^{l}_{m}=\max(\mathbf{C}^{l}_{m})$, and $F^{-1}(\cdot\,|\,\mu^{l},\sigma^{l})$ is the inverse CDF of $\mathcal{N}(\theta|\mu^{l},\sigma^{l})$. After transforming all subsets $\mathbf{C}_{m}^{l}$ of $\mathbf{C}_{m}$ as in (8), we obtain the ciphertext $\hat{\mathbf{C}}_{m}$ of $\hat{\mathbf{\Theta}}_{m}$.

By repeatedly applying the DPRM to each $\hat{\mathbf{\Theta}}_{m}$ in $\hat{\mathbf{\Theta}}$, we encrypt the dominated set $\hat{\mathbf{\Theta}}$ of $\mathcal{F}_{\mathbf{\Theta}}$ into the ciphertext $\hat{\mathbf{C}}$. The protected model $\mathcal{F}_{\tilde{\mathbf{\Theta}}}$ is then constructed by replacing $\hat{\mathbf{\Theta}}$ in $\mathcal{F}_{\mathbf{\Theta}}$ with $\hat{\mathbf{C}}$ (line 11). Apart from the protected model $\mathcal{F}_{\tilde{\mathbf{\Theta}}}$, the Encrypt module also outputs the secret keys $\mathbf{K}$, the scaling parameters $\mathbf{U},\mathbf{V}$, and the statistical information $\bm{\mu},\bm{\sigma}$ for generating permissions.
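A minimal sketch of the two DPRM steps for one layer is given below, using SciPy's Gaussian CDF utilities. The `key_stream` argument stands in for the FSPRNG output $\mathbf{R}_{m}$, and the small clipping constant (keeping $F^{-1}$ finite at the interval endpoints) is an implementation detail we add, not part of the paper.

```python
import numpy as np
from scipy.stats import norm

def dprm_encrypt(theta_m_l, theta_l, key_stream):
    """DPRM on one layer's share of Theta_m (Algorithm 1, lines 2-9)."""
    c = theta_m_l + key_stream                    # RandMask: C_m = Theta_m + R_m
    u, v = c.min(), c.max()                       # scaling parameters u^l_m, v^l_m
    c = (c - u) / (v - u)                         # scale into [0, 1] as in (8)
    mu, sigma = theta_l.mean(), theta_l.std()     # layer statistics mu^l, sigma^l
    c = np.clip(c, 1e-9, 1 - 1e-9)                # avoid +/- inf at the endpoints
    c_hat = norm.ppf(c, loc=mu, scale=sigma)      # inverse Gaussian CDF F^{-1}
    return c_hat, (u, v, mu, sigma)               # ciphertext + permission pieces
```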

Input: $\mathbf{S}_{\hat{m}}=\{\hat{\mathbf{E}}_{m},\mathbf{u}_{m},\mathbf{v}_{m},\kappa_{m},\bm{\mu},\bm{\sigma}\}_{m=1}^{\hat{m}}$, protected model $\mathcal{F}_{\tilde{\mathbf{\Theta}}}$.
Output: Decrypted model $\mathcal{F}_{\breve{\mathbf{\Theta}}_{\hat{m}}}$.
1  for $m=1,\dots,\hat{m}$ do
2      $\hat{\mathbf{C}}_{m}\leftarrow$ parameters in $\mathcal{F}_{\tilde{\mathbf{\Theta}}}$ at locations $\hat{\mathbf{E}}_{m}$;
3      for $l=1,\dots,L$ do
4          $\mathbf{C}^{l}_{m}=\emptyset$,  $\hat{\mathbf{C}}_{m}=\{\hat{\mathbf{C}}^{l}_{m}\}^{L}_{l=1}$;
5          for $\hat{c}$ in $\hat{\mathbf{C}}^{l}_{m}$ do
6              $c\leftarrow F(\hat{c}\,|\,\mu^{l},\sigma^{l})$,  $c\leftarrow c\,(v^{l}_{m}-u^{l}_{m})+u^{l}_{m}$;   ▷ $\mu^{l}\in\bm{\mu}$, $\sigma^{l}\in\bm{\sigma}$, $u^{l}_{m}\in\mathbf{u}_{m}$, $v^{l}_{m}\in\mathbf{v}_{m}$
7              $\mathbf{C}^{l}_{m}\leftarrow\mathbf{C}^{l}_{m}\cup\{c\}$;
8      $\mathbf{C}_{m}=\{\mathbf{C}^{l}_{m}\}^{L}_{l=1}$,  $\mathbf{R}_{m}\leftarrow\mathcal{G}_{fs}(\kappa_{m},|\mathbf{C}_{m}|)$;
9      $\hat{\mathbf{\Theta}}_{m}=\mathbf{C}_{m}-\mathbf{R}_{m}$;
10 $\breve{\mathbf{\Theta}}_{\hat{m}}=\{\hat{\mathbf{\Theta}}_{m}\}_{m=1}^{\hat{m}}\cup(\tilde{\mathbf{\Theta}}\setminus\{\hat{\mathbf{C}}_{m}\}_{m=1}^{\hat{m}})$;
11 $\mathcal{F}_{\breve{\mathbf{\Theta}}_{\hat{m}}}\leftarrow$ replace $\tilde{\mathbf{\Theta}}$ of $\mathcal{F}_{\tilde{\mathbf{\Theta}}}$ with $\breve{\mathbf{\Theta}}_{\hat{m}}$;
return: $\mathcal{F}_{\breve{\mathbf{\Theta}}_{\hat{m}}}$.
Algorithm 2: The scheme of the Decrypt module.

3.3 Assign

The performance of the protected model $\mathcal{F}_{\tilde{\mathbf{\Theta}}}$ is now heavily suppressed since the dominated parameters $\hat{\mathbf{\Theta}}$ are encrypted. Users can access $\mathcal{F}_{\tilde{\mathbf{\Theta}}}$ only with permissions composed of the locations $\hat{\mathbf{E}}$ of the dominated parameters and the five other security-related components output by the Encrypt module. More precisely, let $\mathbf{S}_{\hat{m}}$ ($\hat{m}=1,\dots,M$) be $M$ different levels of permissions, each generated as

$$\mathbf{S}_{\hat{m}}=\{\hat{\mathbf{E}}_{m},\mathbf{u}_{m},\mathbf{v}_{m},\kappa_{m},\bm{\mu},\bm{\sigma}\}_{m=1}^{\hat{m}}. \tag{9}$$

Here, each item $\{\hat{\mathbf{E}}_{m},\mathbf{u}_{m},\mathbf{v}_{m},\kappa_{m},\bm{\mu},\bm{\sigma}\}$ can independently decrypt the ciphertext of the dominated parameter subset $\hat{\mathbf{\Theta}}_{m}$ in $\hat{\mathbf{\Theta}}$. Details are given in Section 3.4.

Also, the size of a permission $\mathbf{S}_{\hat{m}}$ in bits is $(b_{\kappa}+64L)\hat{m}+\frac{16L\hat{m}}{M}\phi$ (please refer to the supplementary materials for the detailed calculation), where $b_{\kappa}$ is the bit size of the secret key $\kappa$ and $\phi$ is the number of dominated parameters per layer. For the prevailing models considered in our experiments, VGG19 and DnCNN, the size of a permission is less than 508KB. Such overhead is acceptable in practice.
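As a purely illustrative sanity check of this formula (the paper does not state $b_{\kappa}$ or $\phi$, so the numbers below are hypothetical):

```python
# Hypothetical instantiation of (b_kappa + 64L) m_hat + (16 L m_hat / M) phi.
b_kappa, L, M = 256, 4, 5        # assumed key size; 4 encrypted VGG19 layers
phi, m_hat = 60_000, 5           # assumed dominated parameters per layer
bits = (b_kappa + 64 * L) * m_hat + 16 * L * m_hat / M * phi
print(f"{bits / 8 / 1024:.0f} KB")  # ~469 KB, on the order of the reported bound
```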

3.4 Decrypt

When receiving a permission $\mathbf{S}_{\hat{m}}$ from a user, the Decrypt module operates as follows. For each item $\{\hat{\mathbf{E}}_{m},\mathbf{u}_{m},\mathbf{v}_{m},\kappa_{m},\bm{\mu},\bm{\sigma}\}$ in $\mathbf{S}_{\hat{m}}$, the decryption procedure starts by locating the ciphertext $\hat{\mathbf{C}}_{m}$ of the dominated parameter subset $\hat{\mathbf{\Theta}}_{m}$ according to the locations $\hat{\mathbf{E}}_{m}$ (line 2). The decryption of $\hat{\mathbf{C}}_{m}$ is simply the inverse of the encryption of $\hat{\mathbf{\Theta}}_{m}$. As shown in Algorithm 2, for the encrypted parameters $\hat{\mathbf{C}}^{l}_{m}$ of $\hat{\mathbf{C}}_{m}$ in the $l$-th layer, we first transform them back to $\mathbf{C}^{l}_{m}$, the randomly masked version of $\hat{\mathbf{\Theta}}^{l}_{m}$, via the following transformations (lines 6-7):

$$\mathbf{C}^{l}_{m}=\big\{c \,\big|\, \forall\hat{c}\in\hat{\mathbf{C}}^{l}_{m},\ c\leftarrow F(\hat{c}\,|\,\mu^{l},\sigma^{l}),\ c\leftarrow c\,(v^{l}_{m}-u^{l}_{m})+u^{l}_{m}\big\}, \tag{10}$$

where $F(\cdot\,|\,\mu^{l},\sigma^{l})$ is the CDF of the Gaussian distribution $\mathcal{N}(\theta|\mu^{l},\sigma^{l})$. By repeating (10) for $l=1,\dots,L$, we obtain the masked version $\mathbf{C}_{m}$ of the dominated parameter subset $\hat{\mathbf{\Theta}}_{m}$. Then, we input the key $\kappa_{m}$ into the FSPRNG to regenerate the same random number sequence $\mathbf{R}_{m}$ used to mask $\hat{\mathbf{\Theta}}_{m}$ (line 8). The subset $\hat{\mathbf{\Theta}}_{m}$ is readily decrypted by removing $\mathbf{R}_{m}$ from $\mathbf{C}_{m}$, i.e., $\hat{\mathbf{\Theta}}_{m}=\mathbf{C}_{m}-\mathbf{R}_{m}$ (line 9).
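The corresponding sketch simply runs the encryption sketch of Section 3.2 backwards (Algorithm 2, lines 6-9), under the same assumptions:

```python
from scipy.stats import norm

def dprm_decrypt(c_hat, key_stream, u, v, mu, sigma):
    """Invert dprm_encrypt: Gaussian CDF, undo the min-max scaling, strip the mask."""
    c = norm.cdf(c_hat, loc=mu, scale=sigma)   # c <- F(c_hat | mu^l, sigma^l)
    c = c * (v - u) + u                        # invert the scaling of (8)
    return c - key_stream                      # Theta_m = C_m - R_m
```

Up to the endpoint clipping and floating-point error, this round-trips the parameters exactly when the correct `key_stream` is regenerated from $\kappa_{m}$.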

With the permission $\mathbf{S}_{\hat{m}}$, we can recover the $\hat{m}$ dominated parameter subsets $\{\hat{\mathbf{\Theta}}_{m}\}_{m=1}^{\hat{m}}$ of $\hat{\mathbf{\Theta}}$ by repeating the above procedure.

The resulting decrypted model is denoted by $\mathcal{F}_{\breve{\mathbf{\Theta}}_{\hat{m}}}$ (lines 10-11). Obviously, when $\hat{m}<M$, the $M-\hat{m}$ dominated parameter subsets $\{\hat{\mathbf{\Theta}}_{m}\}_{m=\hat{m}+1}^{M}$ remain encrypted in $\mathcal{F}_{\breve{\mathbf{\Theta}}_{\hat{m}}}$, so it is a partially encrypted version of $\mathcal{F}_{\mathbf{\Theta}}$. As $\hat{m}$ increases to $M$, all dominated parameters $\hat{\mathbf{\Theta}}=\{\hat{\mathbf{\Theta}}_{m}\}_{m=1}^{M}$ are recovered, $\mathcal{F}_{\breve{\mathbf{\Theta}}_{M}}=\mathcal{F}_{\mathbf{\Theta}}$, and the full performance of $\mathcal{F}_{\mathbf{\Theta}}$ is released. Conversely, if no permission is fed into the Decrypt module, the performance of $\mathcal{F}_{\mathbf{\Theta}}$ remains suppressed in the encrypted model. Consequently, we provide users with hierarchical services (different performance levels) of the pretrained model while achieving access control against unauthorized users.

Figure 3: The classification accuracy of the protected VGG19 on CIFAR10 with respect to different percentages of encrypted parameters.

4 Imperceptibility of ciphertext

In this section, we theoretically prove the imperceptibility of the ciphertext, which significantly enhances the security of the protected model.

Let $\tilde{\mathbf{\Theta}}^{l}$ and $\mathbf{\Theta}^{l}$ be the parameters of the $l$-th layer of the protected model $\mathcal{F}_{\tilde{\mathbf{\Theta}}}$ and the pretrained model $\mathcal{F}_{\mathbf{\Theta}}$, respectively. Recall that $\tilde{\mathbf{\Theta}}^{l}$ is a partially encrypted version of $\mathbf{\Theta}^{l}$ in which the dominated parameters $\hat{\mathbf{\Theta}}^{l}$ are encrypted into the ciphertext $\hat{\mathbf{C}}^{l}$. To ensure that attackers cannot recover these dominated parameters as discussed in Section 2.2, a prerequisite is the imperceptibility of the ciphertext $\hat{\mathbf{C}}^{l}$. Taking $\tilde{\mathbf{\Theta}}^{l}$ as a noisy version of $\mathbf{\Theta}^{l}$, the dominated parameters $\hat{\mathbf{\Theta}}^{l}$ of $\mathbf{\Theta}^{l}$ are contaminated by the noise $\hat{\mathbf{W}}^{l}=\hat{\mathbf{C}}^{l}-\hat{\mathbf{\Theta}}^{l}$. If attackers cannot perceive the added noise $\hat{\mathbf{W}}^{l}$ by observing $\tilde{\mathbf{\Theta}}^{l}$, then the ciphertext $\hat{\mathbf{C}}^{l}$ is also imperceptible. To prove that $\hat{\mathbf{W}}^{l}$ is imperceptible, we resort to the equivocation measure defined by Shannon [28], which evaluates the information leakage of $\hat{\mathbf{W}}^{l}$ when observing $\tilde{\mathbf{\Theta}}^{l}$; this measure is also widely used in steganography and watermarking [4, 16]. In our analysis, we modify the original equivocation into the equivalent form $\hat{E}(\hat{\mathbf{W}}^{l},\tilde{\mathbf{\Theta}}^{l})=I(\hat{\mathbf{W}}^{l};\tilde{\mathbf{\Theta}}^{l})/H(\hat{\mathbf{W}}^{l})$, where $H(\hat{\mathbf{W}}^{l})$ is the entropy of $\hat{\mathbf{W}}^{l}$ and $I(\hat{\mathbf{W}}^{l};\tilde{\mathbf{\Theta}}^{l})$ is the mutual information between $\hat{\mathbf{W}}^{l}$ and $\tilde{\mathbf{\Theta}}^{l}$. If we can prove that $\hat{E}(\hat{\mathbf{W}}^{l},\tilde{\mathbf{\Theta}}^{l})$ is negligible, then the information leakage of $\hat{\mathbf{W}}^{l}$ is negligible, and thus $\hat{\mathbf{W}}^{l}$ is imperceptible. We have the following theorem on the magnitude of $\hat{E}(\hat{\mathbf{W}}^{l},\tilde{\mathbf{\Theta}}^{l})$.

Theorem 1.

The modified equivocation between $\hat{\mathbf{W}}^{l}$ and $\tilde{\mathbf{\Theta}}^{l}$ is of order $|\hat{\mathbf{W}}^{l}|^{-1/2}$. That is, as $|\hat{\mathbf{W}}^{l}|\rightarrow\infty$,

$$\hat{E}(\hat{\mathbf{W}}^{l},\tilde{\mathbf{\Theta}}^{l})=O(|\hat{\mathbf{W}}^{l}|^{-1/2}). \tag{11}$$

The detailed proof is given in the supplementary materials. As can be seen from (11), $\hat{E}(\hat{\mathbf{W}}^{l},\tilde{\mathbf{\Theta}}^{l})$ is negligible if $|\hat{\mathbf{W}}^{l}|$ is large enough, or equivalently if the number of dominated parameters in $\hat{\mathbf{\Theta}}^{l}$ is large enough. This condition is quite reasonable in practice, as CNN models typically contain a massive number of parameters. Taking VGG19 as an example, the number of parameters in the 8-th layer is of order $10^{6}$. Supposing $10\%$ of the parameters are identified as dominated ones, $|\hat{\mathbf{W}}^{l}|$ would be of order $10^{5}$, and $\hat{E}(\hat{\mathbf{W}}^{l},\tilde{\mathbf{\Theta}}^{l})$ of order $10^{-5/2}$. We thus claim that the ciphertext in the protected model is imperceptible.

Figure 4: The denoising performance of the protected DnCNN with respect to different percentages of encrypted parameters.

5 Experimental results

In this section, we experimentally verify that our system achieves the goals defined in Section 2.3. We first briefly introduce the experimental setup.

We consider two CNN models: the classification model VGG19 [32] and the denoising model DnCNN [40]. We train VGG19 on CIFAR10 [17] and DnCNN on 300 noisy images from ImageNet [8] corrupted by Gaussian noise (noise level 50). The best classification accuracy of the pretrained VGG19 on the 10000 test images of CIFAR10 is 91.24%, and the best PSNR of the pretrained DnCNN on 40 noisy images is 28.35dB. According to our observations, encrypting several layers of VGG19 and DnCNN is enough to cause the maximal performance degradation. Therefore, in all experiments below, we only selectively encrypt the parameters of the 1st, 2nd, 5th, and 9th layers of VGG19, and those of the 6th, 9th, and 12th layers of DnCNN. Due to the space limit, more experimental results can be found in the supplementary materials.

| Model | $\hat{m}=0$ | $\hat{m}=1$ | $\hat{m}=2$ | $\hat{m}=3$ | $\hat{m}=4$ | $\hat{m}=5$ |
|---|---|---|---|---|---|---|
| VGG19 (%) | 10.00 | 65.41 | 78.65 | 82.58 | 87.33 | 91.24 |
| DnCNN (dB) | 13.61 | 20.12 | 23.23 | 25.52 | 26.58 | 28.35 |

Table 1: Hierarchical performance of decrypted models under different levels of permission $\hat{m}$.

5.1 Effectiveness of the proposed SE

To reject unauthorized access, a successful protection should degrade the performance of the encrypted model to the worst situation if no permission is granted. For instance, for VGG19 on CIFAR10, the worst prediction accuracy is 10% (random guessing among 10 classes). To the best of our knowledge, our proposed scheme is the first to protect CNN models with SE. To prepare competing algorithms, we design four different strategies to select parameters from the considered layers and encrypt them with the proposed DPRM. 1) Random: we select parameters uniformly at random; 2) Mean: we extract parameters around the mean value, motivated by the observation that parameters are concentrated around the mean (see Fig. 2); 3) Descending: a reasonable hypothesis is that the importance of parameters is positively correlated with their values, so we select parameters in descending order of their values; and 4) Ascending: conversely, we select parameters in ascending order.

Fig. 3 shows the classification accuracy of VGG19 after encrypting different percentages of the parameters of the considered layers. Our PSS (red curve) degrades VGG19 to the worst case when only 8% of the parameters are encrypted, while the best competing strategy needs to encrypt 40% of the parameters. This result demonstrates the effectiveness of the proposed SE for protecting VGG19. Similarly, Fig. 4 shows the denoising performance of DnCNN when PSS and the competing algorithms are used for parameter selection. As can be observed, the denoising performance degrades very quickly when the model parameters are selected by PSS and encrypted. Note that, to eliminate randomness, the results in Figs. 3 and 4 are averages over 20 repetitions of the encryption.

Figure 5: Images illustrating the hierarchical performance of the protected DnCNN with different permissions. (a) Clean image; (b) Noisy image (14.15dB); (c) Output of the protected DnCNN without permission, $\hat{m}=0$ (8.91dB); (d)-(f) Denoising results with permissions $\mathbf{S}_{\hat{m}}$: $\hat{m}=1$ (19.22dB), $\hat{m}=3$ (24.40dB), and $\hat{m}=5$ (29.91dB).

5.2 Hierarchical performance of the released model

We now demonstrate that models decrypted from the protected model with various permissions exhibit different levels of performance. We selectively encrypt 10% (2%) of the parameters of the considered layers of VGG19 (DnCNN). Then, 5 permissions $\mathbf{S}_{\hat{m}}$ ($\hat{m}=1,\dots,5$) are generated by the Assign module of Section 3.3 ($M=5$). These permissions are fed into the Decrypt module to decrypt the protected VGG19 (DnCNN). The performance of the decrypted VGG19 and DnCNN with respect to the 5 permissions is recorded in Table 1. As $\hat{m}$ increases, the decrypted VGG19 (DnCNN) exhibits increasing accuracy (PSNR) and reaches the best performance with the highest permission ($\hat{m}=5$). Here, $\hat{m}=0$ means no permission is granted, corresponding to the worst prediction accuracy. To better illustrate the hierarchical performance of the released DnCNN, the denoising results of a test image under different permissions are visualized in Fig. 5. One can see from Fig. 5(d)-(f) that a higher level of permission (larger $\hat{m}$) gives users a better denoised image. Interestingly, for users without permission ($\hat{m}=0$, see Fig. 5(c)), the resulting image is even worse than the original noisy one.

Attacking goals: VGG19 = 65.41%, DnCNN = 20.12dB

| Model | DB2 (wavelet) | Haar (wavelet) | Sym9 (wavelet) | Average (filter) | Gaussian (filter) | Median (filter) |
|---|---|---|---|---|---|---|
| VGG19 | 21.15% | 23.13% | 24.72% | 19.54% | 18.42% | 17.31% |
| DnCNN | 15.17dB | 16.71dB | 15.31dB | 13.54dB | 14.32dB | 15.51dB |

| Model | Layer-wise (retraining) | Transferring (retraining) |
|---|---|---|
| VGG19 | 55.15% | 39.14% |
| DnCNN | 18.24dB | 17.49dB |

Table 2: Performance of protected models under attacks.

5.3 Security against potential attacks

We then evaluate the security of the protected model against the denoising and retraining attacks considered in the threat model. First, we define the attackers' capability and goals so as to quantitatively evaluate whether an attack succeeds. Attackers' Capability: attackers could challenge the protected model with one or all of the following capabilities: 1) attackers can access all parameters of the protected model; 2) attackers possess a limited portion (10%) of the data used for training the pretrained model; 3) attackers own another model that has the same structure as the protected one but is trained on another dataset; and 4) attackers know which layers of the model are encrypted. Attackers' Goal: the attackers' goal is to obtain the same model performance as a user with the lowest permission. As shown in Table 1, for VGG19 and DnCNN, the attacking goals are 65.41% classification accuracy and 20.12dB PSNR of the denoised image, respectively.

We now consider several potential denoising and retraining attacks and show that the protected model is secure against them. Note that, in Section 4, we have proved the imperceptibility of the encrypted parameters. Hence, all attacks discussed below are based on the premise that the locations of the encrypted parameters are unknown.

DENOISING ATTACKS

1) Denoising via Wavelets: attackers treat the selectively encrypted parameters in each considered layer as a partially contaminated discrete signal (flattening the parameter tensors into a 1-D vector) and try to remove the noise with wavelet denoising techniques. To simulate this behavior, we consider three different wavelets: DB2, Haar, and Sym9 (a sketch of the simulation is given below). As shown in Table 2, the accuracies of the protected model under the three wavelet-based denoising attacks are 21.15%, 23.13%, and 24.72%, respectively. None of them achieves an accuracy better than the attacking goal of 65.41%. For DnCNN, the best performance given by the three attacks is 16.71dB, still inferior to the attacking goal (20.12dB). We thus conclude that the VGG19 and DnCNN protected by our scheme are secure against wavelet-based denoising attacks.
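For reference, a hedged sketch of how such an attack can be simulated with PyWavelets; the soft universal threshold used below is a standard choice we assume, since the paper does not specify the attackers' exact settings.

```python
import numpy as np
import pywt

def wavelet_attack(theta_l, wavelet="db2", level=4):
    """Treat a layer's parameters as a noisy 1-D signal and denoise them by
    soft-thresholding the detail coefficients of a wavelet decomposition."""
    x = np.asarray(theta_l).flatten()
    coeffs = pywt.wavedec(x, wavelet, level=level)
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745       # robust noise estimate
    thr = sigma * np.sqrt(2.0 * np.log(x.size))          # universal threshold
    coeffs = [coeffs[0]] + [pywt.threshold(c, thr, mode="soft")
                            for c in coeffs[1:]]
    return pywt.waverec(coeffs, wavelet)[: x.size].reshape(np.shape(theta_l))
```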

2) Denoising via Filters: the significant performance degradation of the encrypted model possibly stems from some abnormally large or small parameters. Attackers could therefore apply an average filter or a median filter to the 1-D signal of encrypted parameters. Moreover, a Gaussian filter is possibly suitable, since the noise used to encrypt the dominated parameters follows a Gaussian distribution. The performance of the protected VGG19 and DnCNN under these filter-based attacks is recorded in Table 2. For VGG19, none of the restored accuracies exceeds the attacking goal (65.41%). Similarly, for DnCNN, the best PSNR of the attacked model is 15.51dB, still worse than the attacking goal (20.12dB). Therefore, the encrypted VGG19 and DnCNN models are secure against filtering-based attacks.

RETRAINING ATTACKS

1) Layer-wise: since only several layers are encrypted by our scheme, attackers could retrain each layer independently, fixing the parameters of the other layers, based on the available training data. The classification accuracy of the retrained VGG19 is recorded in Table 2: the resulting accuracy is 55.15%, lower than the attacking goal of 65.41%. The corresponding result for DnCNN is 18.24dB, also worse than the attacking goal (20.12dB). Thus, under the assumed attackers' capability and the defined attacking goals, this type of retraining attack is not successful.

2) Transferring: attackers may hypothesize that the distribution of encrypted parameters is consistent among models with the same structure but trained on different datasets. They thus apply the proposed PSS to a new VGG19 trained on ten other classes from CIFAR100 [17], obtain the locations of the dominated parameters of this VGG19, and then retrain the parameters of the protected VGG19 at the locations learned from the new VGG19. The performance of the attacked model is only 39.14%, far from the attacking goal of 65.41%. We also attack DnCNN with the same strategy, where the locations of encrypted parameters are transferred from another DnCNN trained on noisy images with noise level 25. In this case, the PSNR of the denoised image is 17.49dB, still lower than the attacking goal (20.12dB).

6 Conclusions

In this paper, we have proposed an SE algorithm to protect a CNN model by first selecting important parameters with the PSS and then encrypting the selected parameters with the DPRM. A system based on our SE can protect a CNN model from unauthorized access and also provide authorized users with hierarchical services. Experimental results have demonstrated the effectiveness and security of the SE in protecting CNN models. Acknowledgments: This work was supported by the Macau Science and Technology Development Fund under SKL-IOTSC-2018-2020, 077/2018/A2, 0015/2019/AKP, and 0060/2019/A1, by the Research Committee at the University of Macau under MYRG2018-00029-FST and MYRG2019-00023-FST, and by the Natural Science Foundation of China under 61971476. This work was also supported by Alibaba Group through the Alibaba Innovative Research Program.

References

  • [1] Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, et al. Tensorflow: Large-scale machine learning on heterogeneous distributed systems. arXiv preprint arXiv:1603.04467, 2016.
  • [2] Mohamed Abomhara, Omar Zakaria, Othman O Khalifa, AA Zaidan, and BB Zaidan. Enhancing selective encryption for H.264/AVC using advanced encryption standard. International Journal of Computer and Electrical Engineering, 2(2):223, 2010.
  • [3] Ahmed M Ayoup, Amr H Hussein, and Mahmoud AA Attia. Efficient selective image encryption. Multimedia Tools and Applications, 75(24):17171–17186, 2016.
  • [4] François Cayre, Caroline Fontaine, and Teddy Furon. Watermarking security: theory and practice. IEEE Transactions on Signal Processing, 53(10):3976–3987, 2005.
  • [5] Gong Cheng, Junwei Han, Peicheng Zhou, and Dong Xu. Learning rotation-invariant and fisher discriminative convolutional neural networks for object detection. IEEE Transactions on Image Processing, 28(1):265–278, 2018.
  • [6] Don Coppersmith, Donald Byron Johnson, and Stephen M Matyas. A proposed mode for triple-DES encryption. IBM Journal of Research and Development, 40(2):253–262, 1996.
  • [7] Deepak Puthal, Xindong Wu, Surya Nepal, Rajiv Ranjan, and Jinjun Chen. SEEN: A selective encryption method to ensure confidentiality for big sensing data streams. IEEE Transactions on Big Data, pages 1–1, 2017.
  • [8] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. Imagenet: A large-scale hierarchical image database. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 248–255, 2009.
  • [9] Jinglong Du, Zhongshi He, Lulu Wang, Ali Gholipour, Zexun Zhou, Dingding Chen, and Yuanyuan Jia. Super-resolution reconstruction of single anisotropic 3d mr images using residual convolutional neural network. Neurocomputing, 392:209–220, 2020.
  • [10] Ross Girshick. Fast r-cnn. In IEEE International Conference on Computer Vision, pages 1440–1448, 2015.
  • [11] Marco Grangetto, Enrico Magli, and Gabriella Olmo. Multimedia selective encryption by means of randomized arithmetic coding. IEEE Transactions on Multimedia, 8(5):905–917, 2006.
  • [12] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 770–778, 2016.
  • [13] Huaibo Huang, Ran He, Zhenan Sun, and Tieniu Tan. Wavelet-srnet: A wavelet-based cnn for multi-scale face super resolution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1689–1697, 2017.
  • [14] Eric Jang, Shixiang Gu, and Ben Poole. Categorical reparameterization with gumbel-softmax. In International Conference on Learning Representations, pages 1–12, 2017.
  • [15] Jakub Jurek, Marek Kociński, Andrzej Materka, Marcin Elgalal, and Agata Majos. Cnn-based superresolution reconstruction of 3d mr images using thick-slice scans. Biocybernetics and Biomedical Engineering, 40(1):111–125, 2020.
  • [16] Yan Ke, Jia Liu, Min-Qing Zhang, Ting-Ting Su, and Xiao-Yuan Yang. Steganography security: Principle and practice. IEEE Access, 6:73009–73022, 2018.
  • [17] Alex Krizhevsky, Hinton, and Geoffrey. Learning multiple layers of features from tiny images. Technical report, Citeseer, 2009.
  • [18] Yann LeCun, Léon Bottou, Yoshua Bengio, Patrick Haffner, et al. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.
  • [19] Yuchao Li, Shaohui Lin, Baochang Zhang, Jianzhuang Liu, David Doermann, Yongjian Wu, Feiyue Huang, and Rongrong Ji. Exploiting kernel sparsity and entropy for interpretable cnn compression. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2800–2809, 2019.
  • [20] Li Liu, Wanli Ouyang, Xiaogang Wang, Paul Fieguth, Jie Chen, Xinwang Liu, and Matti Pietikäinen. Deep learning for generic object detection: A survey. International Journal of Computer Vision, 128(2):261–318, 2020.
  • [21] Christos Louizos, Max Welling, and Diederik P Kingma. Learning sparse neural networks through $L_0$ regularization. In International Conference on Learning Representations, pages 1–12, 2017.
  • [22] Chris J Maddison, Andriy Mnih, and Yee Whye Teh. The concrete distribution: A continuous relaxation of discrete random variables. In International Conference on Learning Representations, pages 1–12, 2016.
  • [23] Med Karim Abdmouleh, Ali Khalfallah, and Med Salim Bouhlel. A novel selective encryption scheme for medical images transmission based on JPEG compression algorithm. Procedia Computer Science, pages 1–1, 2017.
  • [24] Ari S Morcos, David GT Barrett, Neil C Rabinowitz, and Matthew Botvinick. On the importance of single directions for generalization. In International Conference on Learning Representations, 2018.
  • [25] Adam Paszke, Sam Gross, Soumith Chintala, Gregory Chanan, Edward Yang, Zachary DeVito, Zeming Lin, Alban Desmaison, Luca Antiga, and Adam Lerer. Automatic differentiation in PyTorch. In NIPS Autodiff Workshop, 2017.
  • [26] Ronald L Rivest, Adi Shamir, and Leonard Adleman. A method for obtaining digital signatures and public-key cryptosystems. Communications of the ACM, 21(2):120–126, 1978.
  • [27] Bruce Schneier, John Kelsey, Doug Whiting, David Wagner, Chris Hall, and Niels Ferguson. Twofish: A 128-bit block cipher. NIST AES Proposal, 15(1):23–91, 1998.
  • [28] Claude E Shannon. Communication theory of secrecy systems. The Bell system technical journal, 28(4):656–715, 1949.
  • [29] Oran Shayer, Dan Levi, and Ethan Fetaya. Learning discrete weights using the local reparameterization trick. In International Conference on Learning Representations, 2018.
  • [30] Shoup and Victor. Sequences of games: a tool for taming complexity in security proofs. IACR Cryptology ePrint Archive, 2004:332, 2004.
  • [31] Avanti Shrikumar, Peyton Greenside, and Anshul Kundaje. Learning important features through propagating activation differences. In Proceedings of Machine Learning Research, pages 1–9, 2017.
  • [32] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. In International Conference on Learning Representations, pages 1–10, 2015.
  • [33] Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1–9, 2015.
  • [34] Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jon Shlens, and Zbigniew Wojna. Rethinking the inception architecture for computer vision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2818–2826, 2016.
  • [35] Marc Van Droogenbroeck and Raphaël Benedett. Techniques for a selective encryption of uncompressed and compressed images. Advanced Concepts for Intelligent Vision Systems, Proceedings, pages 90–97, 2002.
  • [36] Jing Xiao, Liang Liao, Qiegen Liu, and Ruimin Hu. Cisi-net: Explicit latent content inference and imitated style rendering for image inpainting. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, pages 354–362, 2019.
  • [37] Zhaoyi Yan, Xiaoming Li, Mu Li, Wangmeng Zuo, and Shiguang Shan. Shift-net: Image inpainting via deep feature rearrangement. In Proceedings of the European Conference on Computer Vision, pages 1–17, 2018.
  • [38] Jiahui Yu, Zhe Lin, Jimei Yang, Xiaohui Shen, Xin Lu, and Thomas S. Huang. Generative image inpainting with contextual attention. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 5505–5514, 2018.
  • [39] Ruichi Yu, Ang Li, Chun-Fu Chen, Jui-Hsin Lai, Vlad I Morariu, Xintong Han, Mingfei Gao, Ching-Yung Lin, and Larry S Davis. Nisp: Pruning networks using neuron importance score propagation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 9194–9203, 2018.
  • [40] Kai Zhang, Wangmeng Zuo, Yunjin Chen, Deyu Meng, and Lei Zhang. Beyond a gaussian denoiser: Residual learning of deep cnn for image denoising. IEEE Transactions on Image Processing, 26(7):3142–3155, 2017.