This paper was converted on www.awesomepapers.org from LaTeX by an anonymous user.
Want to know more? Visit the Converter page.

11institutetext: Institute for Advanced Study, Tsinghua University, Beijing, China, 11email: [email protected], 11email: {anyuwang, xiaoyunwang}@tsinghua.edu.cn
22institutetext: Institute for Network Sciences and Cyberspace, BNRist, Tsinghua University, Beijing, China, 22email: [email protected]
33institutetext: School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore, 33email: [email protected]
44institutetext: Department of Computer Science and Technology, Tsinghua University, Beijing, China, 44email: [email protected]
55institutetext: Zhongguancun Laboratory, Beijing, China
66institutetext: Shandong Key Laboratory of Artificial Intelligence Security, Shandong, China

Hard-Label Cryptanalytic Extraction of Neural Network Models

Yi Chen ID 11    Xiaoyang Dong ID 2255    Jian Guo ID 33    Yantian Shen ID 44    Anyu Wang ID 1155    Xiaoyun Wang(🖂) ID 115566
Abstract

The machine learning problem of extracting neural network parameters has been proposed for nearly three decades. Functionally equivalent extraction is a crucial goal for research on this problem. When the adversary has access to the raw output of neural networks, various attacks, including those presented at CRYPTO 2020 and EUROCRYPT 2024, have successfully achieved this goal. However, this goal is not achieved when neural networks operate under a hard-label setting where the raw output is inaccessible.

In this paper, we propose the first attack that theoretically achieves functionally equivalent extraction under the hard-label setting, which applies to ReLU neural networks. The effectiveness of our attack is validated through practical experiments on a wide range of ReLU neural networks, including neural networks trained on two real benchmarking datasets (MNIST, CIFAR10) widely used in computer vision. For a neural network consisting of 10510^{5} parameters, our attack only requires several hours on a single core.

Keywords:
Cryptanalysis ReLu Neural Networks Functionally Equivalent Extraction Hard-Label.

1 Introduction

Extracting all the parameters (including weights and biases) of a neural network (called victim model) is a long-standing open problem which is first proposed by cryptographers and mathematicians in the early nineties of the last century [2, 7], and has been widely studied by research groups from both industry and academia [14, 16, 11, 18, 1, 20, 4, 3].

In previous research [14, 16, 11, 18, 20, 4, 3], one of the most common attack scenarios is as follows. The victim model (denoted by fθf_{\theta} where θ\theta denotes the parameters) is made available as an Oracle 𝒪\mathcal{O}, then the adversary generates inputs xx to query 𝒪\mathcal{O} and collects the feedback ζ\zeta to extract the parameters. This is similar to a cryptanalysis problem: θ\theta is considered the secret key, and the adversary tries to recover the secret key θ\theta, given the pairs (x,ζ)\left(x,\zeta\right) [4]. If fθf_{\theta} contains mm parameters (64-bit floating-point numbers), then the secret key θ\theta contains 64×m64\times m bits, and the computation complexity of brute force searching is 264×m2^{64\times m}.

Consider that there may be isomorphisms for neural networks, e.g., permutation and scaling for ReLU neural networks [18]. An important concept named functionally equivalent extraction is summarized and proposed in [11].

Functionally Equivalent Extraction.

Denote by 𝒳\mathcal{X} the input space of the victim model fθf_{\theta}. Functionally equivalent extraction aims at generating a model fθ^f_{\widehat{\theta}} (i.e., the extracted model), such that fθ(x)=fθ^(x)f_{\theta}(x)=f_{\widehat{\theta}}(x) holds for all x𝒳x\in\mathcal{X}, where fθ(x)f_{\theta}(x) and fθ^(x)f_{\widehat{\theta}}(x) are, respectively, the raw output of the victim model and the extracted model [11]. Such extracted model fθ^f_{\widehat{\theta}} is called the functionally equivalent model of fθf_{\theta} (also say that fθf_{\theta} and fθ^f_{\widehat{\theta}} are isomorphic [18]). Since fθ^f_{\widehat{\theta}} behaves the same as fθf_{\theta}, the adversary can explore the properties of fθf_{\theta} by taking fθ^f_{\widehat{\theta}} as a perfect substitute 111 Due to the isomorphisms introduced in [18], the parameters θ^\widehat{\theta} of the extracted model may be different from that θ\theta of the victim model, but it does not matter as long as fθ^f_{\widehat{\theta}} is the functionally equivalent model of fθf_{\theta}. .

Functionally equivalent extraction is hard [11]. Consider the isomorphisms (permutation and scaling) introduced in [18]. Scaling can change parameters and permutation does not. For a ReLU neural network fθf_{\theta} that contains mm parameters (64-bit floating-point numbers) and nn neurons, from the perspective of the above cryptanalysis problem, even though scaling can be applied to each neuron once, the adversary still needs to recover 64×(mn)64\times(m-n) secret key bits. Note that the case of mnm\gg n is common for neural networks. For example, for the ReLU neural networks extracted by Carlini et al. at CRYPTO 2020 [4], the pairs (m,n)(m,n) are (25120,32)(25120,32), (100480,128)(100480,128), (210,20)(210,20), (420,40)(420,40), and (4020,60)(4020,60). Besides, even if we only recover a few bits (instead of 6464 bits) of each parameter, the number of secret key bits to be recovered is still large, particularly in the case of large mm.

When a functionally equivalent model fθ^f_{\widehat{\theta}} is obtained, the adversary can do more damage (e.g., adversarial attack [5]) or steal user privacy (e.g., property inference attack [9]). Thus, even though the cost is expensive, the adversary has the motivation to achieve functionally equivalent extraction.

Hard-label Setting.

According to the taxonomy in [11], when the Oracle is queried, there are 55 types of feedback given by the Oracle: (1) label (the most likely class label, also called hard-label), (2) label and score (the most-likely class label and its probability score), (3) top-k scores, (4) all the label scores, (5) the raw output (i.e., fθ(x)f_{\theta}(x)). When the Oracle only returns the hard-label, we say that the victim model fθf_{\theta} (i.e., neural networks in this paper) works under the hard-label setting [8].

To the best of our knowledge, there are no functionally equivalent extraction attacks that are based on the first four types of feedback so far. From the perspective of cryptanalysis, the raw output fθ(x)f_{\theta}(x) is equivalent to the complete ciphertext corresponding to the plaintext (i.e., the query xx). The other four types of feedback only reveal some properties of the ciphertext (raw output fθ(x)f_{\theta}(x)). For example, when fθ(x)f_{\theta}(x)\in\mathbb{R}, the hard-label only tells whether fθ(x)>0f_{\theta}(x)>0 holds or not [8]. Among the five types of feedback, the raw output leaks the most information, while the hard-label (i.e., the first type of feedback) leaks the least [11].

Assuming that the feedback of the Oracle is the raw output, Jagielski et al. propose the first functionally equivalent extraction attack against ReLU neural networks with one hidden layer [11], which is extended to deeper neural networks in [18]. At CRYPTO 2020, Carlini et al. propose a differential extraction attack [4] that requires fewer queries than the attack in [18]. However, the differential extraction attack requires an exponential amount of time, which is addressed by Canales-Martínez et al. at EUROCRYPT 2024 [3]. Note that the extraction attacks in [18, 4, 3] are also based on the assumption that the feedback is the raw output. Due to the dependence on the raw output, all the authors in [11, 18, 4, 3], state that the hard-label setting (i.e., the feedback is the first type) is a defense against functionally equivalent extraction.

The above backgrounds lead to the question not studied before

Is it possible to achieve functionally equivalent extraction againstneural network models under the hard-label setting?\begin{array}[]{c}\text{\emph{Is it possible to achieve functionally equivalent extraction against}}\\ \text{\emph{neural network models under the hard-label setting?}}\end{array}

1.1 Our Results and Techniques

1.1.1 Results.

We have addressed this question head-on in this paper. In total, the answer is yes, and we propose the first functionally equivalent extraction attack against ReLU neural networks under the hard-label setting. Here, the definition of functionally equivalent extraction proposed in [4] is extended reasonably.

Definition 1 (Extended Functionally Equivalent Extraction)

The goal of the extended functionally equivalent extraction is to generate a model fθ^f_{\widehat{\theta}} (i.e., the extracted model), such that fθ^(x)=c×fθ(x)f_{\widehat{\theta}}(x)=c\times f_{\theta}(x) holds for all x𝒳x\in\mathcal{X}, where c>0c>0 is a fixed constant, fθ(x)f_{\theta}(x) and fθ^(x)f_{\widehat{\theta}}(x) are, respectively, the raw output of the victim model and the extracted model. The extracted model fθ^f_{\widehat{\theta}} is the functionally equivalent model of the victim model fθf_{\theta}.

Since fθ^(x)=c×fθ(x)f_{\widehat{\theta}}(x)=c\times f_{\theta}(x) holds for all x𝒳x\in\mathcal{X}, i.e., fθ^f_{\widehat{\theta}} is a simple scalar product of fθf_{\theta}, the adversary still can explore the properties of the victim model fθf_{\theta} by taking fθ^f_{\widehat{\theta}} as a perfect substitute. This is why we propose this extended definition. From the perspective of cryptanalysis, this extended definition allows the adversary not to guess the 6464 bits of the constant cc. To evaluate the efficacy of our model extraction attacks, and quantify the degree to which a model extraction attack has succeeded in practice, we generalize the metric named (ε,δ)(\varepsilon,\delta)-functional equivalence proposed in [4].

Definition 2 (Extended (ε,δ)(\varepsilon,\delta)-Functional Equivalence)

Two models fθ^f_{\widehat{\theta}} and fθf_{\theta} are (ε,δ)(\varepsilon,\delta)-functional equivalent on 𝒮\mathcal{S} if there exists a fixed constant c>0c>0 such that

𝐏𝐫x𝒮[|fθ^(x)c×fθ(x)|ε]1δ\mathbf{Pr}_{x\in\mathcal{S}}\left[\left|f_{\widehat{\theta}}(x)-c\times f_{\theta}(x)\right|\leqslant\varepsilon\right]\geqslant 1-\delta

In this paper, we propose two model extraction attacks, one of which applies to 0-deep neural networks, and the other one applies to kk-deep neural networks. The former attack is the basis of the latter attack. Our model extraction attacks theoretically achieve functionally equivalent extraction described in Definition 1, where the constant c>0c>0 is determined by the model parameter θ\theta.

We have also performed numerous experiments on both untrained and trained neural networks, for verifying the effectiveness of our model extraction attacks in practice. The untrained neural networks are obtained by randomly generating model parameters. To fully verify our attacks, we also adopt two real benchmarking image datasets (i.e., MNIST and CIFAR10) widely used in computer vision, and train many classifiers (i.e., trained neural networks) as the victim model. Our model extraction attacks show good performances in experiments. The complete experiment results refer to Tables 1 and 2 in Section 7. The number of parameters of neural networks in our experiments is up to 10510^{5}, but the runtime of the proposed extraction attack on a single core is within several hours. Our experiment code is uploaded to GitHub (https://github.com/AI-Lab-Y/NN_cryptanalytic_extraction).

The analysis of the attack complexity is presented in Appendix 0.B. For the extraction attack on kk-deep neural networks, its query complexity is about 𝒪(d0×2n×log21ϵ)\mathcal{O}\left(d_{0}\times 2^{n}\times{\rm log}_{2}^{\frac{1}{\epsilon}}\right), where d0d_{0} and nn are, respectively, the input dimension (i.e., the size of xx) and the number of neurons, ϵ\epsilon is a precision chosen by the adversary. The computation complexity is about 𝒪(n×2n2+n+k)\mathcal{O}\left(n\times 2^{n^{2}+n+k}\right), where nn is the number of neurons and kk is the number of hidden layers. The computation complexity of our attack is much lower than that of brute-force searching.

Techniques.

By introducing two new concepts, namely model activation pattern and model signature, we obtained some findings as follows. A ReLU neural network is composed of a certain number of affine transformations corresponding to model activation patterns. Each affine transformation leaks partial information about neural network parameters, which is determined by the corresponding model activation pattern. Most importantly, for a neural network that contains nn neurons, n+1n+1 special model activation patterns will leak all the information about the neural network parameters.

Inspired by the above findings, we design a series of methods to find decision boundary points, recover the corresponding affine transformations, and further extract the neural network parameters. These methods compose the complete model extraction attacks.

1.1.2 Organization.

The basic notations, threat model, attack goal and assumptions are introduced in Section 2. Section 3 introduces some auxiliary concepts. Then we introduce the overview of our model extraction attacks, the idealized model extraction attacks, and some refinements in practice in the following three sections respectively. Experiments are introduced in Section 7. At last, we present more discussions about our work and conclude this paper.

2 Preliminaries

2.1 Basic Definitions and Notations

This section presents some necessary definitions and notations.

Definition 3 (kk-Deep Neural Network [4])

A kk-deep neural network fθ(x)f_{\theta}(x) is a function parameterized by θ\theta that takes inputs from an input space 𝒳\mathcal{X} and returns values in an output space 𝒴\mathcal{Y}. The function f:𝒳𝒴f\colon\mathcal{X}\rightarrow\mathcal{Y} is composed of alternating linear layers fif_{i} and a non-linear activation function σ\sigma:

f=fk+1σσf2σf1.f=f_{k+1}\circ\sigma\circ\cdots\circ\sigma\circ f_{2}\circ\sigma\circ f_{1}. (1)

In this paper, we exclusively study neural networks over 𝒳=d0\mathcal{X}=\mathbb{R}^{d_{0}} and 𝒴=dk+1\mathcal{Y}=\mathbb{R}^{d_{k+1}}, where d0d_{0} and dk+1d_{k+1} are positive integers. As in [4, 3], we only consider neural networks using the ReLU [15] activation function, given by σ:xmax(x,0)\sigma\colon x\mapsto\textup{max}(x,0).

Definition 4 (Fully Connected Layer [4])

The ii-th fully connected layer of a neural network is a function fi:di1dif_{i}\colon\mathbb{R}^{d_{i-1}}\rightarrow\mathbb{R}^{d_{i}} given by the affine transformation

fi(x)=A(i)x+b(i).f_{i}(x)=A^{(i)}x+b^{(i)}. (2)

where A(i)di×di1A^{(i)}\in\mathbb{R}^{d_{i}\times d_{i-1}} is a di×di1d_{i}\times d_{i-1} weight matrix, b(i)dib^{(i)}\in\mathbb{R}^{d_{i}} is a did_{i}-dimensional bias vector.

Definition 5 (Neuron [4])

A neuron is a function determined by the corresponding weight matrix, bias vector, and activation function. Formally, the jj-th neuron of layer ii is the function η\eta given by

η(x)=σ(Aj(i)x+bj(i)),\eta(x)=\sigma\left(A_{j}^{(i)}x+b_{j}^{(i)}\right), (3)

where Aj(i)A_{j}^{(i)} and bj(i)b_{j}^{(i)} denote, respectively, the jj-th row of A(i)A^{(i)} and jj-th coordinate of b(i)b^{(i)}. In a kk-deep neural network, there are a total of i=1kdi\sum_{i=1}^{k}d_{i} neurons.

Definition 6 (Neuron State [3])

Let 𝒱(η;x)\mathcal{V}(\eta;x) denote the value that neuron η\eta takes with x𝒳x\in\mathcal{X} before applying σ\sigma. If 𝒱(η;x)>0\mathcal{V}(\eta;x)>0, then η\eta is active, i.e., the neuron state is active. Otherwise, the neuron state is inactive 222 In [4, 3], the authors defined one more neuron state, namely critical, i.e., 𝒱(η;x)=0\mathcal{V}(\eta;x)=0, which is a special inactive state since the output of neuron η\eta is 0. . The state of the jj-th neuron in layer ii on input xx is denoted by 𝒫j(i)(x)𝔽2\mathcal{P}_{j}^{(i)}(x)\in\mathbb{F}_{2}. If 𝒫j(i)(x)=1\mathcal{P}_{j}^{(i)}(x)=1, the neuron is active. If 𝒫j(i)(x)=0\mathcal{P}_{j}^{(i)}(x)=0, the neuron is inactive.

Definition 7 (Neural Network Architecture [4])

The architecture of a fully connected neural network captures the structure of fθf_{\theta}: (a) the number of layers, (b) the dimension did_{i} of each layer i=0,,k+1i=0,\cdots,k+1. We say that d0d_{0} is the dimension of the input to the neural network, and dk+1d_{k+1} denotes the number of outputs of the neural network.

Definition 8 (Neural Network Parameters [4])

The parameters θ\theta of a kk-deep neural network fθf_{\theta} are the concrete assignments to the weights A(i)A^{(i)} and biases b(i)b^{(i)} for i{1,2,,k+1}i\in\{1,2,\cdots,k+1\}.

When neural networks work under the hard-label setting, the raw output fθ(x)f_{\theta}(x) is processed before being returned [8]. This paper considers the most common processing. The raw output fθ(x)dk+1f_{\theta}(x)\in\mathbb{R}^{d_{k+1}} is first transformed into a category probability vector 𝐏dk+1\mathbf{P}\in\mathbb{R}^{d_{k+1}} by applying the Sigmoid (when dk+1=1d_{k+1}=1) or Softmax (when dk+1>1d_{k+1}>1) function to fθ(x)f_{\theta}(x) [6]. Then, the category with the largest probability is returned as a hard-label. Definition 9 summarizes the hard-label and corresponding decision conditions on the raw output fθ(x)f_{\theta}(x).

Definition 9 (Hard-Label)

Consider a kk-deep neural network f:𝒳𝒴f\colon\mathcal{X}\rightarrow\mathcal{Y} where 𝒴dk+1\mathcal{Y}\in\mathbb{R}^{d_{k+1}}. The hard-label (denoted by zz) is related to the outputs fθ(x)f_{\theta}(x). When dk+1=1d_{k+1}=1, the hard-label z(fθ(x))z\left(f_{\theta}(x)\right) is computed as

z(fθ(x))={1,iffθ(x)>0,0,iffθ(x)0.z\left(f_{\theta}(x)\right)=\left\{\begin{array}[]{l}1,\,\,\textup{if}\,\,f_{\theta}(x)>0,\\ 0,\,\,\textup{if}\,\,f_{\theta}(x)\leqslant 0.\\ \end{array}\right. (4)

When dk+1>1d_{k+1}>1, the output fθ(x)f_{\theta}(x) is a dk+1d_{k+1}-dimensional vector. The hard-label z(fθ(x))z\left(f_{\theta}(x)\right) is the coordinate of the maximum of fθ(x)f_{\theta}(x). 333 If there are ties, i.e., multiple items of fθ(x)f_{\theta}(x) share the same maximum, the hard-label is the smallest one of the coordinates of these items.

2.2 Adversarial Goals and Assumptions

There are two parties in a model extraction attack: the oracle 𝒪\mathcal{O} who possesses the neural network fθ(x)f_{\theta}(x), and the adversary who generates queries xx to the Oracle. Under the hard-label setting, the Oracle 𝒪\mathcal{O} returns the hard-label z(fθ(x))z\left(f_{\theta}(x)\right) in Definition 9.

Definition 10 (Model Parameter Extraction Attack)

A model parameter extraction attack receives Oracle access to a parameterized function fθf_{\theta} (i.e., a kk-deep neural network in our paper) and the architecture of fθf_{\theta}, and returns a set of parameters θ^\widehat{\theta} with the goal that fθ^(x)f_{\widehat{\theta}}(x) is as similar as possible to c×fθ(x)c\times f_{\theta}(x) where c>0c>0 is a fixed constant.

In this paper, we use the _^\widehat{\_} symbol to indicate an extracted parameter. For example, θ\theta is the parameters of the victim model fθf_{\theta}, and θ^\widehat{\theta} stands for the parameters of the extracted model fθ^f_{\widehat{\theta}}.

Assumptions.

We make the following assumptions of the Oracle 𝒪\mathcal{O} and the capabilities of the attacker:

  • Architecture knowledge. We require knowledge of the neural network architecture.

  • Full-domain inputs. We can feed arbitrary inputs from 𝒳=d0\mathcal{X}=\mathbb{R}^{d_{0}}.

  • Precise computations. fθf_{\theta} is specified and evaluated using 64-bit floating-point arithmetic.

  • Scalar outputs. The output dimensionality is 11, i.e., 𝒴=\mathcal{Y}=\mathbb{R}. 444This assumption is fundamental to our work. Our attack only applies to the case of scalar outputs.

  • ReLU Activations. All activation functions (σ\sigma’s) are the ReLU function.

Compared with the work in [4], we remove the assumption of requiring the raw output fθ(x)f_{\theta}(x) of the neural network. Now, after querying the Oracle 𝒪\mathcal{O}, the attacker obtains the hard-label z(fθ(x))z\left(f_{\theta}(x)\right). In other words, the attacker only knows whether fθ(x)>0f_{\theta}(x)>0 holds or not.

3 Auxiliary Concepts

To help understand our attacks, this paper proposes some auxiliary concepts.

3.1 Model Activation Pattern

To describe all the neuron states, we introduce a new concept named Model Activation Pattern.

Definition 11 (Model Activation Pattern)

Consider a kk-deep neural network fθf_{\theta} with n=i=1kdin=\sum_{i=1}^{k}{d_{i}} neurons. The model activation pattern of fθf_{\theta} over an input x𝒳x\in\mathcal{X} is a global description of the nn neuron states, and is denoted by 𝒫(x)=(𝒫(1)(x),,𝒫(k)(x))\mathcal{P}(x)=(\mathcal{P}^{(1)}(x),\cdots,\mathcal{P}^{(k)}(x)) where 𝒫(i)(x)𝔽2di\mathcal{P}^{(i)}(x)\in\mathbb{F}_{2}^{d_{i}} is the concatenation of did_{i} neuron states (i.e., 𝒫j(i)(x),i{1,,di}\mathcal{P}_{j}^{(i)}(x),i\in\{1,\cdots,d_{i}\}) in layer ii.

In the rest of this paper, the notations 𝒫j(i)(x)\mathcal{P}_{j}^{(i)}(x), 𝒫(i)(x)\mathcal{P}^{(i)}(x), and 𝒫(x)\mathcal{P}(x) are simplified as 𝒫j(i)\mathcal{P}_{j}^{(i)}, 𝒫(i)\mathcal{P}^{(i)}, and 𝒫\mathcal{P} respectively, when the meaning is clear in context. Besides, 𝒫(i)𝔽2di\mathcal{P}^{(i)}\in\mathbb{F}_{2}^{d_{i}} is represented by a did_{i}-bit integer. For example, 𝒫(i)=2j1\mathcal{P}^{(i)}=2^{j-1} means that only the jj-th neuron in layer ii is active, and 𝒫(i)=2di1\mathcal{P}^{(i)}=2^{d_{i}}-1 means that all the did_{i} neurons are active.

When the model activation pattern is known, one can precisely determine which neural network parameters influence the output fθ(x)f_{\theta}(x). Consider the jj-th neuron η\eta in layer ii. Due to the ReLU activation function, if the neuron state is inactive, neuron η\eta does not influence the output fθ(x)f_{\theta}(x). As a result, all the weights A?,j(i+1)A_{?,j}^{(i+1)} and Aj,?(i)A_{j,?}^{(i)} ( i.e., the elements of the jj-th column of A(i+1)A^{(i+1)}, and the jj-th row of A(i)A^{(i)} respectively) and the bias bj(i)b_{j}^{(i)} do not affect the output fθ(x)f_{\theta}(x).

Special ‘neuron’.

For the convenience of introducing model extraction attacks later, we regard the input xd0x\in\mathbb{R}^{d_{0}} and the output fθ(x)f_{\theta}(x)\in\mathbb{R} as, respectively, d0d_{0} and 11 special ‘neurons’ that are always active. So we adopt two extra notations 𝒫(0)=2d01\mathcal{P}^{(0)}=2^{d_{0}}-1 and 𝒫(k+1)=211=1\mathcal{P}^{(k+1)}=2^{1}-1=1, for describing the states of the special d0+1d_{0}+1 ‘neurons’. But if not necessary, we will omit the two notations.

3.2 Model Signature

Consider a kk-deep neural network fθf_{\theta}. For an input x𝒳x\in\mathcal{X}, fθf_{\theta} can be described as an affine transformation

fθ(x)\displaystyle f_{\theta}(x) =A(k+1)(I𝒫(2)(A(2)(I𝒫(1)(A(1)x+b(1)))+b(2)))+b(k+1)\displaystyle=A^{(k+1)}\cdots\left(I_{\mathcal{P}}^{(2)}\left(A^{(2)}\left(I_{\mathcal{P}}^{(1)}\left(A^{(1)}x+b^{(1)}\right)\right)+b^{(2)}\right)\right)\cdots+b^{(k+1)} (5)
=A(k+1)I𝒫(k)A(k)I𝒫(2)A(2)I𝒫(1)A(1)x+B𝒫=Γ𝒫x+B𝒫,\displaystyle=A^{(k+1)}I_{\mathcal{P}}^{(k)}A^{(k)}\cdots I_{\mathcal{P}}^{(2)}A^{(2)}I_{\mathcal{P}}^{(1)}A^{(1)}x+B_{\mathcal{P}}=\varGamma_{\mathcal{P}}x+B_{\mathcal{P}},

where 𝒫\mathcal{P} is the model activation pattern over xx, Γ𝒫d0\varGamma_{\mathcal{P}}\in\mathbb{R}^{d_{0}}, and B𝒫B_{\mathcal{P}}\in\mathbb{R}. Here, I𝒫(i)di×diI_{\mathcal{P}}^{(i)}\in\mathbb{R}^{d_{i}\times d_{i}} are 0-11 diagonal matrices with a 0 on the diagonal’s jj-th entry when the neuron state 𝒫j(i)\mathcal{P}_{j}^{(i)} is 0, and 11 when 𝒫j(i)=1\mathcal{P}_{j}^{(i)}=1.

The affine transformation is denoted by a tuple (Γ𝒫,B𝒫)(\varGamma_{\mathcal{P}},B_{\mathcal{P}}). Except for 𝒫\mathcal{P}, the value of the tuple (Γ𝒫,B𝒫)(\varGamma_{\mathcal{P}},B_{\mathcal{P}}) is only determined by the neural network parameters, i.e., A(i)A^{(i)} and b(i),i{1,,k+1}b^{(i)},i\in\{1,\cdots,k+1\}. Once the value of any neural network parameters is changed, the value of the tuple (Γ𝒫,B𝒫)(\varGamma_{\mathcal{P}},B_{\mathcal{P}}) corresponding to some 𝒫\mathcal{P}’s will change too 555 We do not consider the case of some neurons being always inactive, since such neurons are redundant and usually deleted by various network pruning methods (e.g., [10]) before the neural network is deployed as a prediction service. . Therefore, we regard the set of all the possible tuples (Γ𝒫,B𝒫)(\varGamma_{\mathcal{P}},B_{\mathcal{P}}) as a unique model signature of the neural network.

Definition 12 (Model Signature)

For a kk-deep neural network fθ(x)f_{\theta}(x), the model signature denoted by 𝒮θ\mathcal{S}_{\theta} is the set of affine transformations

𝒮θ={(Γ𝒫,B𝒫) for all the 𝒫’s}.\mathcal{S}_{\theta}=\{(\varGamma_{\mathcal{P}},B_{\mathcal{P}})\text{ for all the }\mathcal{P}\text{'s}\}.

In [3], Canales-Martínez et al. use the term ‘signature’ to describe the weights related to a neuron, which is different from the model signature. Except for the model signature, we propose another important concept, namely normalized model signature.

Definition 13 (Normalized Model Signature)

Consider a victim model fθf_{\theta} and its model signature 𝒮θ={(Γ𝒫,B𝒫) for all the 𝒫’s}\mathcal{S}_{\theta}=\{(\varGamma_{\mathcal{P}},B_{\mathcal{P}})\text{ for all the }\mathcal{P}\text{'s}\} . Denote by Γ𝒫,j\varGamma_{\mathcal{P},j} the jj-th element of Γ𝒫\varGamma_{\mathcal{P}} for j{1,,d0}j\in\{1,\cdots,d_{0}\}. Divide the set of 𝒫\mathcal{P}’s into two subsets 𝒬1\mathcal{Q}_{1} and 𝒬2\mathcal{Q}_{2}. For each 𝒫𝒬1\mathcal{P}\in\mathcal{Q}_{1}, Γ𝒫,j=0\varGamma_{\mathcal{P},j}=0 for j{1,,d0}j\in\{1,\cdots,d_{0}\}. For each 𝒫𝒬2\mathcal{P}\in\mathcal{Q}_{2}, there is at least one non-zero element in Γ𝒫\varGamma_{\mathcal{P}}, without loss of generality, assume that Γ𝒫,10\varGamma_{\mathcal{P},1}\neq 0. Let 𝒮θ𝒩\mathcal{S}_{\theta}^{\mathcal{N}} be the following set

𝒮θ𝒩={(Γ𝒫,B𝒫) for 𝒫𝒬1,(Γ𝒫|Γ𝒫,1|,B𝒫|Γ𝒫,1|) for 𝒫𝒬2}.\mathcal{S}_{\theta}^{\mathcal{N}}=\left\{\left(\varGamma_{\mathcal{P}},B_{\mathcal{P}}\right)\text{ for }\mathcal{P}\in\mathcal{Q}_{1},\left(\frac{\varGamma_{\mathcal{P}}}{\left|\varGamma_{\mathcal{P},1}\right|},\frac{B_{\mathcal{P}}}{\left|\varGamma_{\mathcal{P},1}\right|}\right)\text{ for }\mathcal{P}\in\mathcal{Q}_{2}\right\}.

The set 𝒮θ𝒩\mathcal{S}_{\theta}^{\mathcal{N}} is the normalized model signature of fθf_{\theta}.

Shortly, the difference between the normalized model signature 𝒮θ𝒩\mathcal{S}_{\theta}^{\mathcal{N}} and the initial model signature 𝒮θ\mathcal{S}_{\theta} is as follows. For each 𝒫𝒬2\mathcal{P}\in\mathcal{Q}_{2}, i.e., there is at least one non-zero element in Γ𝒫\varGamma_{\mathcal{P}} (without loss of generality, assume that the first element is non-zero, i.e.,Γ𝒫,10\varGamma_{\mathcal{P},1}\neq 0), we transform the parameter tuple into (Γ𝒫|Γ𝒫,1|,B𝒫|Γ𝒫,1|)\left(\frac{\varGamma_{\mathcal{P}}}{\left|\varGamma_{\mathcal{P},1}\right|},\frac{B_{\mathcal{P}}}{\left|\varGamma_{\mathcal{P},1}\right|}\right).

In our attacks, the normalized model signature plays two important roles. First, the recovery of all the weights A(i)A^{(i)} relies on the subset 𝒬2\mathcal{Q}_{2}. Second, our attacks will produce many extracted models during the attack process while at most only one is the functionally equivalent model of fθf_{\theta}, and the normalized model signature is used to filter functionally inequivalent models.

3.3 Decision Boundary Point

Our attacks exploit a special class of inputs named Decision Boundary Points.

Definition 14 (Decision Boundary Point)

Consider a neural network fθf_{\theta}. If an input xx makes fθ(x)=0f_{\theta}(x)=0 hold, xx is a decision boundary point.

The extraction attacks presented at CRYPTO 2020 [4] and EUROCRYPT 2024 [3] exploit a class of inputs, namely critical points. Fig. 1 shows the difference between critical points and decision boundary points.

Refer to caption
Figure 1: Left: the critical point x=[x1,x2,x3]x=[x_{1},x_{2},x_{3}]^{\top} makes the output of one neuron (e.g., the solid black circle) 0. Right: the decision boundary point x=[x1,x2,x3]x^{\prime}=[x_{1}^{\prime},x_{2}^{\prime},x_{3}^{\prime}]^{\top} makes the output of the neural network 0.

Critical points leak information on the neuron states, i.e., whether the output of a neuron is 0, which is the core reason why the differential extraction attack can efficiently extract model parameters [4]. As a comparison, decision boundary points do not leak information on the neuron states.

Finding critical points relies on computing partial derivatives based on the raw output fθ(x)f_{\theta}(x), refer to the work in [4, 3]. Thus, under the hard-label setting, we can not exploit critical points.

4 Overview of Our Cryptanalytic Extraction Attacks

Under the hard-label setting, i.e., the Oracle returns the most likely class z(fθ(x))z\left(f_{\theta}(x)\right) instead of the raw output fθ(x)f_{\theta}(x), only decision boundary points xx will leak the value of fθ(x)f_{\theta}(x), since fθ(x)=0f_{\theta}(x)=0. Motivated by this truth, our cryptanalytic extraction attacks focus on decision boundary points.

Attack Process.

At a high level, the complete attack contains five steps.

  1. \cdot

    Step 1: collect decision boundary points. The algorithm for finding decision boundary points will be introduced in Section 6.1. Suppose that MM decision boundary points are collected.

  2. \cdot

    Step 2: recover the normalized model signature. Recover the tuples (Γ𝒫,B𝒫)\left(\varGamma_{\mathcal{P}},B_{\mathcal{P}}\right) corresponding to the MM decision boundary points. After filtering duplicate tuples, regard the set of the remaining tuples as the (partial) normalized model signature 𝒮θ𝒩\mathcal{S}_{\theta}^{\mathcal{N}}. Suppose that the size of 𝒬2\mathcal{Q}_{2} is NN, refer to Definition 13. It means that there are NN decision boundary points that can be used to recover weights A(i)A^{(i)}.

  3. \cdot

    Step 3: recover weights layer by layer. Suppose that there are n=i=1kdin=\sum_{i=1}^{k}d_{i} neurons in the neural network. Randomly choose n+1n+1 out of NN decision boundary points each time, assign a specific model activation pattern 𝒫\mathcal{P} to each selected decision boundary point, and recover the weights A(1),,A(k+1)A^{(1)},\cdots,A^{(k+1)}.

  4. \cdot

    Step 4: recover all the biases. Based on recovered weights, recover all the biases b(i),i{1,,k+1}b^{(i)},i\in\{1,\cdots,k+1\} simultaneously.

  5. \cdot

    Step 5: filter functionally inequivalent models. As long as Nn+1N\geqslant n+1 holds, we will obtain many extracted models, but it is expected that at most only one is the functionally equivalent model. Thus, we filter functionally inequivalent models in this step.

Some functionally inequivalent models may not be filtered. For each surviving extracted model, we test the Prediction Matching Ratio (PMR, introduced in Section 6.3) over randomly generated inputs, and take the one with the highest PMR as the final candidate.

In Step 2, we recover the tuple (Γ𝒫,B𝒫)\left(\varGamma_{\mathcal{P}},B_{\mathcal{P}}\right) by the extraction attack on 0-deep neural networks. In Step 3, for layer i>1i>1, the weight vector Aj(i)A_{j}^{(i)} of the jj-th neuron (j{1,,di}j\in\{1,\cdots,d_{i}\}) is recovered by solving a system of linear equations. For layer 11, except for selecting d1d_{1} decision boundary points, the recovery of the weights A(1)A^{(1)} does not use any extra techniques. In Step 4, all the biases are recovered by solving a system of linear equations.

5 Idealized Hard-Label Model Extraction Attack

This section introduces (0,0)(0,0)-functionally equivalent model extraction attacks under the hard-label setting, which assumes infinite precision arithmetic and recovers the functionally equivalent model. We first introduce the 0-deep neural network extraction attack, which is used in the kk-deep neural network extraction attack to recover the normalized model signature.

Note that this section only (partially) involves Steps 2, 3, and 4 introduced in Section 4. In the next section, we introduce the remaining steps and refine the idealized attacks to work with finite precision.

5.1 Zero-Deep Neural Network Extraction

According to Definition 3, zero-deep neural networks are affine functions fθ(x)A(1)x+b(1)f_{\theta}(x)\equiv A^{(1)}\cdot x+b^{(1)} where A(1)d0A^{(1)}\in\mathbb{R}^{d_{0}}, and b(1)b^{(1)}\in\mathbb{R}. Let A(1)=[w1(1),,wd0(1)]A^{(1)}=[w_{1}^{(1)},\cdots,w_{d_{0}}^{(1)}], and x=[x1,x2,,xd0]x=[x_{1},x_{2},\cdots,x_{d_{0}}]^{\top}. The model signature is 𝒮θ=(A(1),b(1))\mathcal{S}_{\theta}=\left(A^{(1)},b^{(1)}\right).

Our extraction attack is based on a decision boundary point xx (i.e., fθ(x)=0f_{\theta}(x)=0), and composed of 33 steps: (1) recover weight signs, i.e., the sign of wi(1)w_{i}^{(1)}; (2) recover weights wi(1)w_{i}^{(1)}; (3) recover bias b(1)b^{(1)}.

Recover Weight Signs.

Denote by eid0e_{i}\in\mathbb{R}^{d_{0}} the basis vector where only the ii-th element is 11 and other elements are 0.

Let the decision boundary point xx move along the direction ei,i{1,,d0}e_{i},i\in\{1,\cdots,d_{0}\}, and the moving stride is ss\in\mathbb{R} where |s|>0|s|>0. Query the Oracle and obtain the hard-label z(f(x+sei))z\left(f(x+se_{i})\right), then the sign of wi(1)w_{i}^{(1)} is

sign(wi(1))={    1,ifs>0andz(fθ(x+sei))=1,1,ifs<0andz(fθ(x+sei))=1.{\rm sign}(w_{i}^{(1)})=\left\{\begin{array}[]{c}\,\,\,\,1,\,\,{\rm if}\,\,s>0\,\,{\rm and}\,\,z\left(f_{\theta}(x+se_{i})\right)=1,\\ -1,\,\,{\rm if}\,\,s<0\,\,{\rm and}\,\,z\left(f_{\theta}(x+se_{i})\right)=1.\\ \end{array}\right. (6)

When z(f(x+sei))=1z\left(f(x+se_{i})\right)=1, we have f(x+sei)>0f(x+se_{i})>0, i.e., wi(1)×s>0w_{i}^{(1)}\times s>0. Thus, the sign of wi1w_{i}^{1} is the same as that of ss. If z(f(x+sei))=0z\left(f(x+se_{i})\right)=0 always holds, no matter if ss is positive or negative, then we have wi(1)=0w_{i}^{(1)}=0.

Recover Weights.

Without loss of generality, assume that w1(1)0w_{1}^{(1)}\neq 0.

At first, let the decision boundary point xx move along e1e_{1} with a moving stride s1s_{1}, such that the hard-label of the new point x+s1e1x+s_{1}e_{1} is 11, i.e., z(f(x+s1e1))=1z\left(f(x+s_{1}e_{1})\right)=1. Then, let the new point x+s1e1x+s_{1}e_{1} move along eie_{i} with a moving stride sis_{i} where i1i\neq 1 and wi(1)0w_{i}^{(1)}\neq 0 , such that x+s1e1+sieix+s_{1}e_{1}+s_{i}e_{i} is a decision boundary point too. As a result, we have

s1w1(1)+siwi(1)=0,s_{1}w_{1}^{(1)}+s_{i}w_{i}^{(1)}=0, (7)

and obtain the weight ratio wi(1)w1(1)\frac{w_{i}^{(1)}}{w_{1}^{(1)}}. Since the signs of wi(1)w_{i}^{(1)} are known, the final extracted weights are

A^(1)=[w1(1)|w1(1)|,w2(1)|w1(1)|,,wd0(1)|w1(1)|].\widehat{A}^{(1)}=\left[\frac{w_{1}^{(1)}}{\left|w_{1}^{(1)}\right|},\frac{w_{2}^{(1)}}{\left|w_{1}^{(1)}\right|},\cdots,\frac{w_{d_{0}}^{(1)}}{\left|w_{1}^{(1)}\right|}\right]. (8)
Recover Bias.

The extracted bias is b^(1)=A^(1)x=b(1)|w1(1)|\widehat{b}^{(1)}=-\widehat{A}^{(1)}\cdot x=\frac{b^{(1)}}{\left|w_{1}^{(1)}\right|} .

Thus, the model signature of fθ^f_{\widehat{\theta}} is 𝒮θ^=(A(1)|w1(1)|,b(1)|w1(1)|)\mathcal{S}_{\widehat{\theta}}=\left(\frac{A^{(1)}}{\left|w_{1}^{(1)}\right|},\frac{b^{(1)}}{\left|w_{1}^{(1)}\right|}\right), and fθ^(x)=f(x)|w1(1)|f_{\widehat{\theta}}(x)=\frac{f(x)}{\left|w_{1}^{(1)}\right|}.

Remark 1

In [14], the authors propose different methods to extract the parameters of linear functions fθ(x)=A(1)xf_{\theta}(x)=A^{(1)}\cdot x. Since this paper mainly focuses on the extraction attack on kk-deep neural networks, we do not deeply compare our attack with the methods in [14].

5.2 kk-Deep Neural Network Extraction

Basing the 0-deep neural network extraction attack, we develop an extraction attack on kk-deep neural networks. Recall that, the expression of kk-deep neural networks is

fθ(x)\displaystyle f_{\theta}(x) =A(k+1)(I𝒫(2)(A(2)(I𝒫(1)(A(1)x+b(1)))+b(2)))+b(k+1)\displaystyle=A^{(k+1)}\cdots\left(I_{\mathcal{P}}^{(2)}\left(A^{(2)}\left(I_{\mathcal{P}}^{(1)}\left(A^{(1)}x+b^{(1)}\right)\right)+b^{(2)}\right)\right)\cdots+b^{(k+1)} (9)
=Γ𝒫x+B𝒫\displaystyle=\varGamma_{\mathcal{P}}x+B_{\mathcal{P}}

where the model activation pattern is 𝒫=(𝒫(0),𝒫(1),,𝒫(k),𝒫(k+1))\mathcal{P}=\left(\mathcal{P}^{(0)},\mathcal{P}^{(1)},\cdots,\mathcal{P}^{(k)},\mathcal{P}^{(k+1)}\right) and

Γ𝒫=A(k+1)I𝒫(k)A(k)I𝒫(2)A(2)I𝒫(1)A(1).\varGamma_{\mathcal{P}}=A^{(k+1)}I_{\mathcal{P}}^{(k)}A^{(k)}\cdots I_{\mathcal{P}}^{(2)}A^{(2)}I_{\mathcal{P}}^{(1)}A^{(1)}. (10)
Notations.

Our attack recovers weights layer by layer. Assuming that the weights of the first i1i-1 layers have been recovered and we are trying to recover A(i)A^{(i)} where i{1,,k+1}i\in\{1,\cdots,k+1\}, we describe kk-deep neural networks as:

fθ(x)=Γ𝒫x+B𝒫=𝒢(i)A(i)C(i1)x+B𝒫,f_{\theta}(x)=\varGamma_{\mathcal{P}}x+B_{\mathcal{P}}=\mathcal{G}^{(i)}A^{(i)}C^{(i-1)}x+B_{\mathcal{P}}, (11)

where 𝒢(i)di\mathcal{G}^{(i)}\in\mathbb{R}^{d_{i}} and C(i1)di1×d0C^{(i-1)}\in\mathbb{R}^{d_{i-1}\times d_{0}} are, respectively, related to the unrecovered part (excluding A(i)A^{(i)}) and recovered part of the neural network fθf_{\theta}.

The values of 𝒢(i)\mathcal{G}^{(i)} and C(i1)C^{(i-1)} are

𝒢(i)\displaystyle\mathcal{G}^{(i)} ={A(k+1)I𝒫(k)A(k)I𝒫(i+1)A(i+1)I𝒫(i),ifi{1,,k}1,ifi=k+1\displaystyle=\left\{\begin{array}[]{l}A^{(k+1)}I_{\mathcal{P}}^{(k)}A^{(k)}\cdots I_{\mathcal{P}}^{(i+1)}A^{(i+1)}I_{\mathcal{P}}^{(i)},\,\text{if}\,\,i\in\{1,\cdots,k\}\\ 1,\,\text{if}\,\,i=k+1\\ \end{array}\right. (12)
C(i1)\displaystyle C^{(i-1)} ={I,ifi=1I𝒫(i1)A(i1)I𝒫(1)A(1),ifi{2,,k+1}\displaystyle=\left\{\begin{array}[]{l}I,\,\text{if}\,\,i=1\\ I_{\mathcal{P}}^{(i-1)}A^{(i-1)}\cdots I_{\mathcal{P}}^{(1)}A^{(1)},\,\text{if}\,\,i\in\{2,\cdots,k+1\}\\ \end{array}\right.

where C(0)=Id0×d0C^{(0)}=I\in\mathbb{R}^{d_{0}\times d_{0}} is a diagonal matrix with a 11 on each diagonal entry.

Core Idea of Recovering Weights Layer by Layer.

To better grasp the attack details presented later, we first introduce the core idea of recovering weights layer by layer. Assuming that the extracted weights of the first i1i-1 layers are known, i.e., A^(1),,A^(i1)\widehat{A}^{(1)},\cdots,\widehat{A}^{(i-1)} are known, we try to recover the weights in layer ii.

To obtain the weight vector A^j(i)\widehat{A}_{j}^{(i)} of the jj-th neuron (denoted by ηj(i)\eta_{j}^{(i)}) in layer i{1,,k+1}i\in\{1,\cdots,k+1\} 666 when i=k+1i=k+1, it means that we are trying to recover the weights A(k+1)A^{(k+1)}. , we exploit a decision boundary point with the model activation pattern 𝒫=(𝒫(0),𝒫(1),,𝒫(k),𝒫(k+1))\mathcal{P}=\left(\mathcal{P}^{(0)},\mathcal{P}^{(1)},\cdots,\mathcal{P}^{(k)},\mathcal{P}^{(k+1)}\right) where

𝒫(i1)=2di11,𝒫(i)=2j1.\mathcal{P}^{(i-1)}=2^{d_{i-1}}-1,\,\,\mathcal{P}^{(i)}=2^{j-1}. (13)

It means that, in layer ii, only the jj-th neuron is active, and all the di1d_{i-1} neurons in layer i1i-1 are active. Fig 2 shows a schematic diagram under this scenario.

Refer to caption
Figure 2: The core idea of recovering the weight vector of the jj-th neuron in layer ii. Let x=[x1,,xd0]x=[x_{1},\cdots,x_{d_{0}}]^{\top} be a decision boundary point with 𝒫(i1)=2di11\mathcal{P}^{(i-1)}=2^{d_{i-1}}-1, 𝒫(i)=2j1\mathcal{P}^{(i)}=2^{j-1}, i.e., in layer ii, only the jj-th neuron (the red hollow circle) is active, and in layer i1i-1, all the neurons are active. The first i1i-1 layers have been extracted, and collapse into one layer. All the layers starting from layer i+1i+1 collapse into a direct connection between the jj-th neuron in layer ii and the final output.

Since 𝒫(i)=2j1\mathcal{P}^{(i)}=2^{j-1}, all the kik-i layers starting from layer i+1i+1 collapse into a direct connection from ηj(i)\eta_{j}^{(i)} to the output fθ(x)f_{\theta}(x). The weight of this connection is 𝒢j(i)\mathcal{G}_{j}^{(i)}, i.e., the jj-th element of 𝒢(i)\mathcal{G}^{(i)} (see Eq. (12)). The expression (see Eq. (11)) of the kk-deep neural network further becomes

fθ(x)=Γ𝒫x+B𝒫=𝒢j(i)Aj(i)C(i1)x+B𝒫,f_{\theta}(x)=\varGamma_{\mathcal{P}}\cdot x+B_{\mathcal{P}}=\mathcal{G}_{j}^{(i)}A_{j}^{(i)}\cdot C^{(i-1)}\cdot x+B_{\mathcal{P}},

where 𝒢j(i)\mathcal{G}_{j}^{(i)}\in\mathbb{R} and Aj(i)C(i1)d0A_{j}^{(i)}\cdot C^{(i-1)}\in\mathbb{R}^{d_{0}}.

In Step 2 (see Section 4), applying the extraction attack on zero-deep neural networks, we can obtain the tuple (Γ𝒫|Γ𝒫,1|,B𝒫|Γ𝒫,1|)\left(\frac{\varGamma_{\mathcal{P}}}{\left|\varGamma_{\mathcal{P},1}\right|},\frac{B_{\mathcal{P}}}{\left|\varGamma_{\mathcal{P},1}\right|}\right) where

Γ𝒫=𝒢j(i)Aj(i)C(i1),Γ𝒫,v=𝒢j(i)Aj(i)C?,v(i1).\varGamma_{\mathcal{P}}=\mathcal{G}_{j}^{(i)}A_{j}^{(i)}\cdot C^{(i-1)},\,\,\varGamma_{\mathcal{P},v}=\mathcal{G}_{j}^{(i)}A_{j}^{(i)}\cdot C_{?,v}^{(i-1)}. (14)

Here the symbol C?,v(i1)C_{?,v}^{(i-1)} stands for the vv-th column vector of C(i1)C^{(i-1)}.

According to Eq. (14), the value of each element of Γ𝒫|Γ𝒫,1|\frac{\varGamma_{\mathcal{P}}}{\left|\varGamma_{\mathcal{P},1}\right|} is not related to the absolute value of 𝒢j(i)\mathcal{G}_{j}^{(i)}, i.e., the unrecovered part does not affect the affine transformation. Then, basing the vector Γ𝒫|Γ𝒫,1|\frac{\varGamma_{\mathcal{P}}}{\left|\varGamma_{\mathcal{P},1}\right|} and the extracted weights A^(1),,A^(i1)\widehat{A}^{(1)},\cdots,\widehat{A}^{(i-1)}, we build a system of linear equations and solve it to obtain A^j(i)\widehat{A}_{j}^{(i)}. Next, we introduce more attack details.

Recover Weights in Layer 11.

To recover the weight vector of the jj-th neuron in layer 11, we exploit the model activation pattern 𝒫\mathcal{P} where

𝒫(1)=2j1;𝒫(i)=2di1,fori{0,2,3,,k+1}.\mathcal{P}^{(1)}=2^{j-1};\,\,\mathcal{P}^{(i)}=2^{d_{i}}-1,\,\,{\rm for}\,\,i\in\{0,2,3,\cdots,k+1\}. (15)

It means that, in layer 11, only the jj-th neuron is active, and all the neurons in other layers are active.

Under this model activation pattern, according to Eq. (12), we have

𝒢(1)=A(k+1)A(k)A(2)I𝒫(1).\mathcal{G}^{(1)}=A^{(k+1)}A^{(k)}\cdots A^{(2)}I_{\mathcal{P}}^{(1)}. (16)

Now, the expression of the kk-deep neural network is

fθ(x)=𝒢j(1)(Aj(1)x+bj(1))+B(𝒫(2),,𝒫(k))=𝒢j(1)Aj(1)x+B𝒫f_{\theta}(x)=\mathcal{G}_{j}^{(1)}\left(A_{j}^{(1)}x+b_{j}^{(1)}\right)+B_{\left(\mathcal{P}^{(2)},\cdots,\mathcal{P}^{(k)}\right)}=\mathcal{G}_{j}^{(1)}A_{j}^{(1)}x+B_{\mathcal{P}} (17)

where Aj(1)=[wj,1(1),,wj,d0(1)]A_{j}^{(1)}=\left[w_{j,1}^{(1)},\cdots,w_{j,d_{0}}^{(1)}\right], 𝒢j(1)\mathcal{G}_{j}^{(1)}\in\mathbb{R} is the jj-th element of 𝒢(1)\mathcal{G}^{(1)}. As for B(𝒫(2),,𝒫(k))B_{\left(\mathcal{P}^{(2)},\cdots,\mathcal{P}^{(k)}\right)}\in\mathbb{R}, it is a constant determined by (𝒫(2),,𝒫(k))\left(\mathcal{P}^{(2)},\cdots,\mathcal{P}^{(k)}\right). In other words, when 𝒫(1)\mathcal{P}^{(1)} changes, the value of B(𝒫(2),,𝒫(k))B_{\left(\mathcal{P}^{(2)},\cdots,\mathcal{P}^{(k)}\right)} does not change.

Recall that, in Step 2, we have recovered the following weight vector

Γ𝒫|Γ𝒫,1|=[𝒢j(1)wj,1(1)|𝒢j(1)wj,1(1)|,,𝒢j(1)wj,d0(1)|𝒢j(1)wj,1(1)|],j{1,,d1}.\frac{\varGamma_{\mathcal{P}}}{\left|\varGamma_{\mathcal{P},1}\right|}=\left[\frac{\mathcal{G}_{j}^{(1)}w_{j,1}^{(1)}}{\left|\mathcal{G}_{j}^{(1)}w_{j,1}^{(1)}\right|},\cdots,\frac{\mathcal{G}_{j}^{(1)}w_{j,d_{0}}^{(1)}}{\left|\mathcal{G}_{j}^{(1)}w_{j,1}^{(1)}\right|}\right],j\in\{1,\cdots,d_{1}\}. (18)

In this step, our target is to obtain A^j(1)\widehat{A}_{j}^{(1)} where

A^j(1)=[w^j,11,,w^j,d01]=[wj,11|wj,11|,,wj,d01|wj,11|],j{1,,d1}.\widehat{A}_{j}^{(1)}=\left[\widehat{w}_{j,1}^{1},\cdots,\widehat{w}_{j,d_{0}}^{1}\right]=\left[\frac{w_{j,1}^{1}}{\left|w_{j,1}^{1}\right|},\cdots,\frac{w_{j,d_{0}}^{1}}{\left|w_{j,1}^{1}\right|}\right],j\in\{1,\cdots,d_{1}\}. (19)

Therefore, we need to determine d1d_{1} signs, i.e., the signs of 𝒢j(1)\mathcal{G}_{j}^{(1)} for 𝒫(1)=2j1\mathcal{P}^{(1)}=2^{j-1} where j{1,,d1}j\in\{1,\cdots,d_{1}\}.

Since Aj(1)x+bj(1)>0A_{j}^{(1)}x+b_{j}^{(1)}>0, we know that 𝒢j(1)×B(𝒫(2),,𝒫(k))<0\mathcal{G}_{j}^{(1)}\times B_{\left(\mathcal{P}^{(2)},\cdots,\mathcal{P}^{(k)}\right)}<0 holds for j{1,,d1}j\in\{1,\cdots,d_{1}\}, which tells us that the above d1d_{1} signs are the same. Thus, by guessing 11 sign, i.e., the sign of 𝒢j(1)\mathcal{G}_{j}^{(1)} for 𝒫(1){211,,2d11}\mathcal{P}^{(1)}\in\{2^{1-1},\cdots,2^{d_{1}-1}\}, we obtain d1d_{1} weight vectors presented in Eq. (19).

Recover Weights in Layer ii (i>1i>1).

To recover the weight vector of the jj-th neuron in layer ii, we exploit the model activation pattern 𝒫\mathcal{P} where

𝒫(i)=2j1;𝒫(q)=2dq1,forq{0,,i1,i+1,,k+1}.\mathcal{P}^{(i)}=2^{j-1};\,\,\mathcal{P}^{(q)}=2^{d_{q}}-1,\,\,{\rm for}\,\,q\in\{0,\cdots,i-1,i+1,\cdots,k+1\}. (20)

It means that, in layer ii, only the jj-th neuron is active, and all the neurons in other layers are active.

Under this model activation pattern, according to Eq. (12), we have

𝒢(i)\displaystyle\mathcal{G}^{(i)} ={A(k+1)A(k)A(i+2)A(i+1)I𝒫(i),ifi{2,,k},1,ifi=k+1,\displaystyle=\left\{\begin{array}[]{l}A^{(k+1)}A^{(k)}\cdots A^{(i+2)}A^{(i+1)}I_{\mathcal{P}}^{(i)},\,\text{if}\,\,i\in\{2,\cdots,k\},\\ 1,\,\text{if}\,\,i=k+1,\\ \end{array}\right. (21)
C(i1)\displaystyle C^{(i-1)} =A(i1)A(i2)A(1),ifi{2,,k+1}.\displaystyle=A^{(i-1)}A^{(i-2)}\cdots A^{(1)},\,\text{if}\,\,i\in\{2,\cdots,k+1\}.

Now, the expression of kk-deep neural networks becomes

fθ(x)\displaystyle f_{\theta}(x) =𝒢j(i)(Aj(i)C(i1)x+B(𝒫(1),,𝒫(i)))+B(𝒫(i+1),,𝒫(k))\displaystyle=\mathcal{G}_{j}^{(i)}\left(A_{j}^{(i)}C^{(i-1)}x+B_{\left(\mathcal{P}^{(1)},\cdots,\mathcal{P}^{(i)}\right)}\right)+B_{\left(\mathcal{P}^{(i+1)},\cdots,\mathcal{P}^{(k)}\right)} (22)
=𝒢j(i)Aj(i)C(i1)x+B𝒫,\displaystyle=\mathcal{G}_{j}^{(i)}A_{j}^{(i)}C^{(i-1)}x+B_{\mathcal{P}},

where Aj(i)=[wj,1(i),,wj,di1(i)]A_{j}^{(i)}=\left[w_{j,1}^{(i)},\cdots,w_{j,d_{i-1}}^{(i)}\right], 𝒢j(i)\mathcal{G}_{j}^{(i)}\in\mathbb{R} and C(i1)di1×d0C^{(i-1)}\in\mathbb{R}^{d_{i-1}\times d_{0}}. Besides, B(𝒫(i+1),,𝒫(k))B_{\left(\mathcal{P}^{(i+1)},\cdots,\mathcal{P}^{(k)}\right)}\in\mathbb{R} is not related to (𝒫(1),,𝒫(i))\left(\mathcal{P}^{(1)},\cdots,\mathcal{P}^{(i)}\right), and only determined by (𝒫(i+1),,𝒫(k))\left(\mathcal{P}^{(i+1)},\cdots,\mathcal{P}^{(k)}\right), i.e., B(𝒫(i+1),,𝒫(k))B_{\left(\mathcal{P}^{(i+1)},\cdots,\mathcal{P}^{(k)}\right)} is the same constant for j{1,,di}j\in\{1,\cdots,d_{i}\}.

Let us further rewrite fθ(x)f_{\theta}(x) in Eq. (22) as

fθ(x)=𝒢j(i)((v=1di1wj,v(i)Cv,1(i1))x1++(v=1di1wj,v(i)Cv,d0(i1))xd0)+B𝒫f_{\theta}(x)=\mathcal{G}_{j}^{(i)}\left(\left(\sum_{v=1}^{d_{i-1}}{w_{j,v}^{(i)}C_{v,1}^{(i-1)}}\right)x_{1}+\cdots+\left(\sum_{v=1}^{d_{i-1}}{w_{j,v}^{(i)}C_{v,d_{0}}^{(i-1)}}\right)x_{d_{0}}\right)+B_{\mathcal{P}} (23)

where Cv,u(i1)C_{v,u}^{(i-1)}\in\mathbb{R} is the uu-th element of the vv-th row vector of C(i1)C^{(i-1)}.

In Step 2, using the zero-deep neural network extraction attack, we have recovered the following did_{i} weight vectors (j{1,,di}j\in\{1,\cdots,d_{i}\})

Γ𝒫|Γ𝒫,1|=[𝒢j(i)(v=1di1wj,v(i)Cv,1(i1))|𝒢j(i)(v=1di1wj,v(i)Cv,1(i1))|,,𝒢j(i)(v=1di1wj,v(i)Cv,d0(i1))|𝒢j(i)(v=1di1wj,v(i)Cv,1(i1))|].\frac{\varGamma_{\mathcal{P}}}{\left|\varGamma_{\mathcal{P},1}\right|}=\left[\frac{\mathcal{G}_{j}^{(i)}\left(\sum_{v=1}^{d_{i-1}}{w_{j,v}^{(i)}C_{v,1}^{(i-1)}}\right)}{\left|\mathcal{G}_{j}^{(i)}\left(\sum_{v=1}^{d_{i-1}}{w_{j,v}^{(i)}C_{v,1}^{(i-1)}}\right)\right|},\cdots,\frac{\mathcal{G}_{j}^{(i)}\left(\sum_{v=1}^{d_{i-1}}{w_{j,v}^{(i)}C_{v,d_{0}}^{(i-1)}}\right)}{\left|\mathcal{G}_{j}^{(i)}\left(\sum_{v=1}^{d_{i-1}}{w_{j,v}^{(i)}C_{v,1}^{(i-1)}}\right)\right|}\right]. (24)

In this step, our target is to obtain the weight vector A^j(i)=[w^j,1(i),,w^j,di1(i)]\widehat{A}_{j}^{(i)}=\left[\widehat{w}_{j,1}^{(i)},\cdots,\widehat{w}_{j,d_{i-1}}^{(i)}\right].

It is clear that we need to guess the sign of 𝒢j(i)\mathcal{G}_{j}^{(i)} for j{1,,di}j\in\{1,\cdots,d_{i}\}. Again, all the did_{i} signs are the same. Consider the expression in Eq. (22). Since the jj-th neuron is active, its output exceeds 0, i.e.,

Aj(i)C(i1)x+B(𝒫(1),,𝒫(i))>0,j{1,,di}.A_{j}^{(i)}C^{(i-1)}x+B_{\left(\mathcal{P}^{(1)},\cdots,\mathcal{P}^{(i)}\right)}>0,j\in\{1,\cdots,d_{i}\}.

Then 𝒢j(i)×B(𝒫(i+1),,𝒫(k))<0\mathcal{G}_{j}^{(i)}\times B_{\left(\mathcal{P}^{(i+1)},\cdots,\mathcal{P}^{(k)}\right)}<0 holds for j{1,,di}j\in\{1,\cdots,d_{i}\}. At the same time, since B(𝒫(i+1),,𝒫(k))B_{\left(\mathcal{P}^{(i+1)},\cdots,\mathcal{P}^{(k)}\right)} is a constant, all the did_{i} signs are the same. Therefore, by guessing one sign, i.e., the sign of 𝒢j(i)\mathcal{G}_{j}^{(i)}, based on Eq. (24), we obtain

[v=1di1wj,v(i)Cv,1(i1)|v=1di1wj,v(i)Cv,1(i1)|,,v=1di1wj,v(i)Cv,d0(i1)|v=1di1wj,v(i)Cv,1(i1)|],j{1,,di}.\left[\frac{\sum_{v=1}^{d_{i-1}}{w_{j,v}^{(i)}C_{v,1}^{(i-1)}}}{\left|\sum_{v=1}^{d_{i-1}}{w_{j,v}^{(i)}C_{v,1}^{(i-1)}}\right|},\cdots,\frac{\sum_{v=1}^{d_{i-1}}{w_{j,v}^{(i)}C_{v,d_{0}}^{(i-1)}}}{\left|\sum_{v=1}^{d_{i-1}}{w_{j,v}^{(i)}C_{v,1}^{(i-1)}}\right|}\right],j\in\{1,\cdots,d_{i}\}. (25)

Note that C^v,u(i1)\widehat{C}_{v,u}^{(i-1)} can be obtained using Eq. (21), since A^(1),,A^(i1)\widehat{A}^{(1)},\cdots,\widehat{A}^{(i-1)} are known. Then, basing the vector in Eq. (25), we build a system of linear equations

{v=1di1w^j,v(i)C^v,1(i1)=v=1di1wj,v(i)Cv,1(i1)|v=1di1wj,v(i)Cv,1(i1)|,v=1di1w^j,v(i)C^v,d0(i1)=v=1di1wj,v(i)Cv,d0(i1)|v=1di1wj,v(i)Cv,1(i1)|,\left\{\begin{array}[]{c}\sum_{v=1}^{d_{i-1}}{\widehat{w}_{j,v}^{(i)}\widehat{C}_{v,1}^{(i-1)}}=\frac{\sum_{v=1}^{d_{i-1}}{w_{j,v}^{(i)}C_{v,1}^{(i-1)}}}{\left|\sum_{v=1}^{d_{i-1}}{w_{j,v}^{(i)}C_{v,1}^{(i-1)}}\right|},\\ \vdots\\ \sum_{v=1}^{d_{i-1}}{\widehat{w}_{j,v}^{(i)}\widehat{C}_{v,d_{0}}^{(i-1)}}=\frac{\sum_{v=1}^{d_{i-1}}{w_{j,v}^{(i)}C_{v,d_{0}}^{(i-1)}}}{\left|\sum_{v=1}^{d_{i-1}}{w_{j,v}^{(i)}C_{v,1}^{(i-1)}}\right|},\\ \end{array}\right. (26)

When d0di1d_{0}\geqslant d_{i-1} 777 The case of d0di1d_{0}\geqslant d_{i-1} is common in various applications, particularly in computer vision [9, 13, 17], since the dimensions of images or videos are often large., we obtain A^j(i)=[w^j,1(i),,w^j,di1(i)]\widehat{A}_{j}^{(i)}=\left[\widehat{w}_{j,1}^{(i)},\cdots,\widehat{w}_{j,d_{i-1}}^{(i)}\right] by solving the above system of linear equations. Lemma 1 summarizes the expression of extracted weight vectors A^j(i),j{1,,di},i{2,,k+1}\widehat{A}_{j}^{(i)},j\in\{1,\cdots,d_{i}\},i\in\{2,\cdots,k+1\}.

Lemma 1

Based on the system of linear equations presented in Eq. (26), for i{2,,k+1}i\in\{2,\cdots,k+1\} and j{1,,di}j\in\{1,\cdots,d_{i}\}, the extracted weight vector A^j(i)=[w^j,1(i),,w^j,di1(i)]\widehat{A}_{j}^{(i)}=\left[\widehat{w}_{j,1}^{(i)},\cdots,\widehat{w}_{j,d_{i-1}}^{(i)}\right] is

A^j(i)=[wj,1(i)×|v=1di2w1,v(i1)Cv,1(i2)||v=1di1wj,v(i)Cv,1(i1)|,,wj,di1(i)×|v=1di2wdi1,v(i1)Cv,1(i2)||v=1di1wj,v(i)Cv,1(i1)|],\widehat{A}_{j}^{(i)}=\left[\frac{w_{j,1}^{(i)}\times\left|\sum_{v=1}^{d_{i-2}}{w_{1,v}^{(i-1)}C_{v,1}^{(i-2)}}\right|}{\left|\sum_{v=1}^{d_{i-1}}{w_{j,v}^{(i)}C_{v,1}^{(i-1)}}\right|},\cdots,\frac{w_{j,d_{i-1}}^{(i)}\times\left|\sum_{v=1}^{d_{i-2}}{w_{d_{i-1},v}^{(i-1)}C_{v,1}^{(i-2)}}\right|}{\left|\sum_{v=1}^{d_{i-1}}{w_{j,v}^{(i)}C_{v,1}^{(i-1)}}\right|}\right], (27)

where Cv,1(q)=Av(q)A(q1)A(2)[A1,1(1),,Ad1,1(1)]C_{v,1}^{(q)}=A_{v}^{(q)}A^{(q-1)}\cdots A^{(2)}\left[A_{1,1}^{(1)},\cdots,A_{d_{1},1}^{(1)}\right]^{\top} .

Proof

The proof refers to Appendix 0.A.

In Lemma 1, for the consistency of the mathematical symbols, the weights A(k+1)A^{(k+1)} are denoted by [w1,1(k+1),,w1,dk(k+1)][w_{1,1}^{(k+1)},\cdots,w_{1,d_{k}}^{(k+1)}] instead of [w1(k+1),,wdk(k+1)][w_{1}^{(k+1)},\cdots,w_{d_{k}}^{(k+1)}].

Recover All the Biases.

Since A^(i)\widehat{A}^{(i)} for i{1,,k+1}i\in\{1,\cdots,k+1\} have been obtained, we can extract all the biases by solving a system of linear equations.

Concretely, for the i=1kdi+1\sum_{i=1}^{k}{d_{i}}+1 decision boundary points, fθ^(x)=0f_{\widehat{\theta}}(x)=0 should hold. Thus, we build a system of linear equations: fθ^(x)=0f_{\widehat{\theta}}(x)=0 where the expression of fθ^(x)f_{\widehat{\theta}}(x) refers to Eq. (9). Combining with Lemma 1, by solving the above system, we will obtain

{b^(i)=[b1(i)|v=1di1w1,v(i)Cv,1(i1)|,,bdi(i)|v=1di1wdi,v(i)Cv,1(i1)|],i{1,,k}b^(k+1)=b(k+1)|v=1dkwv(k+1)Cv,1(k)|.\left\{\begin{array}[]{l}\widehat{b}^{(i)}=\left[\frac{b_{1}^{(i)}}{\left|\sum_{v=1}^{d_{i-1}}{w_{1,v}^{(i)}C_{v,1}^{(i-1)}}\right|},\cdots,\frac{b_{d_{i}}^{(i)}}{\left|\sum_{v=1}^{d_{i-1}}{w_{d_{i},v}^{(i)}C_{v,1}^{(i-1)}}\right|}\right],i\in\{1,\cdots,k\}\\ \widehat{b}^{(k+1)}=\frac{b^{(k+1)}}{\left|\sum_{v=1}^{d_{k}}{w_{v}^{(k+1)}C_{v,1}^{(k)}}\right|}.\\ \end{array}\right. (28)

Based on the extracted neural network parameters (see Eq. (19), Eq. (27), and Eq. (28)), the model signature of the extracted model fθ^f_{\widehat{\theta}} is

𝒮θ^={(Γ^𝒫,B^𝒫)=(Γ𝒫|v=1dkwv(k+1)Cv,1(k)|,B𝒫|v=1dkwv(k+1)Cv,1(k)|)forallthe𝒫s}.\mathcal{S}_{\widehat{\theta}}=\left\{(\widehat{\varGamma}_{\mathcal{P}},\widehat{B}_{\mathcal{P}})=\left(\frac{\varGamma_{\mathcal{P}}}{\left|\sum_{v=1}^{d_{k}}{w_{v}^{(k+1)}C_{v,1}^{(k)}}\right|},\frac{B_{\mathcal{P}}}{\left|\sum_{v=1}^{d_{k}}{w_{v}^{(k+1)}C_{v,1}^{(k)}}\right|}\right){\rm for}\,\,{\rm all}\,\,{\rm the}\,\,\mathcal{P}{\rm{}^{\prime}s}\right\}.

Consider the jj-th neuron η\eta in layer ii. For an input x𝒳x\in\mathcal{X}, denote by h(η;x)h(\eta;x) the output of the neuron of the victim model. Based on the extracted neural network parameters (see Eq. (19), Eq. (27), and Eq. (28)), the output of the neuron of the extracted model is h(η;x)|v=1di1wj,v(i)Cv,1(i1)|\frac{h(\eta;x)}{\left|\sum_{v=1}^{d_{i-1}}{w_{j,v}^{(i)}C_{v,1}^{(i-1)}}\right|}. At the same time, all the weights w?,j(i+1)w_{?,j}^{(i+1)} in layer i+1i+1 are increased by a factor of |v=1di1wj,v(i)Cv,1(i1)|\left|\sum_{v=1}^{d_{i-1}}{w_{j,v}^{(i)}C_{v,1}^{(i-1)}}\right|. Thus, for the victim model and extracted model, the jj-th neuron in layer ii has the same influence on all the neurons in layer i+1i+1. As a result, for any x𝒳x\in\mathcal{X}, the model activation pattern of the victim model is the same as that of the extracted model. Combining with 𝒮θ^\mathcal{S}_{\widehat{\theta}}, for all x𝒳x\in\mathcal{X}, we have

fθ^(x)=1|v=1dkwv(k+1)Cv,1(k)|×fθ(x).f_{\widehat{\theta}}(x)=\frac{1}{\left|\sum_{v=1}^{d_{k}}{w_{v}^{(k+1)}C_{v,1}^{(k)}}\right|}\times f_{\theta}(x). (29)

In Appendix 0.C, we apply the extraction attack on 11-deep neural networks and directly present the extracted model, which helps further understand our attack.

Remark 2

Except for the n+1n+1 model activation patterns as shown in Eq. (15) and Eq. (20), the adversary could choose a new set of n+1n+1 model activation patterns. The reason is as follows. Consider the recovery of the weight vector of the jj-th neuron in layer ii, and look at Fig. 2 again. Our attack only requires that: (1) in layer ii, only the jj-th neuron is active; (2) in layer i1i-1, all the di1d_{i-1} neurons are active. The neuron states in other layers do not affect the attack. Thus, there are more options for the n+1n+1 model activation patterns, and the rationale does not change.

Discussion on The Computation Complexity.

Once n+1n+1 decision boundary points and kk sign guesses are selected, to obtain an extracted model, we just need to solve n+2d1n+2-d_{1} systems of linear equations. However, since the model activation pattern of a decision boundary point is unknown, we have to traverse all the possible combinations of n+1n+1 decision boundary points (see Step 3 in Section 4), which is the bottleneck of the total computation complexity. The complete analysis of the attack complexity is presented in Appendix 0.B.

6 Instantiating the Extraction Attack in Practice

Recall that, the complete extraction attack contains 55 steps introduced in Section 4. To obtain a functionally equivalent model, the adversary also needs three auxiliary techniques: finding decision boundary points (related to Steps 1 and 2), filtering duplicate affine transformations (related to Step 2), and filtering functionally inequivalent models (related to Step 5).

The idealized extraction attack introduced in Section 5 relies on decision boundary points xx that make fθ(x)=0f_{\theta}(x)=0 strictly hold. This section will propose a binary searching method to find decision boundary points under the hard-label setting. Under finite precision, it is hard to find decision boundary points xx that make fθ(x)=0f_{\theta}(x)=0 strictly hold. Therefore, the proposed method returns input points xx close to the decision hyperplane as decision boundary points. As a result, the remaining two techniques need to consider the influence of finite precision. This ensures our model extraction attacks work in practice, for producing a (ε,0)(\varepsilon,0)-functionally equivalent model.

6.1 Finding Decision Boundary Points

Let us see how to find decision boundary points under the hard-label setting. Fig. 3 shows a schematic diagram in a 2-dimensional input space.

Refer to caption
Figure 3: A schematic diagram of finding decision boundary points. The blue solid line stands for the decision hyperplane composed of decision boundary points. The red dashed line stands for a direction vector Δd0\varDelta\in\mathbb{R}^{d_{0}}. The starting point xd0x\in\mathbb{R}^{d_{0}} (i.e., the solid black circle) moves along the direction Δ\varDelta, and arrives at x+s×Δx+s\times\varDelta (i.e., the hollow black circle) where ss\in\mathbb{R} is the moving stride.

We first randomly pick a starting point xd0x\in\mathbb{R}^{d_{0}} and non-zero direction vector Δd0\varDelta\in\mathbb{R}^{d_{0}}. Then let the starting point move along the direction Δ\varDelta or the opposite direction Δ-\varDelta. It is expected that the starting point will eventually cross the decision hyperplane in one direction, as long as Δ\varDelta and Δ-\varDelta are not parallel to the decision hyperplane.

Denote by ss\in\mathbb{R} the moving stride of the starting point, which means that the starting point arrives at x+s×Δx+s\times\varDelta. After querying the Oracle with xx and x+s×Δx+s\times\varDelta, if z(fθ(x))z(fθ(x+s×Δ))z\left(f_{\theta}(x)\right)\neq z\left(f_{\theta}(x+s\times\varDelta)\right) (i.e., the two labels are different), we know that the starting point has crossed the decision hyperplane when the moving stride is ss. Now, the core of finding decision boundary points is to determine a suitable moving stride ss, such that the starting point reaches the decision hyperplane, i.e., fθ(x+s×Δ)=0f_{\theta}(x+s\times\varDelta)=0 holds.

This task is done by binary search. Concretely, randomly choose two different moving strides sslows_{\rm slow} and sfasts_{\rm fast} at first, such that

z(fθ(x+sslow×Δ))\displaystyle z\left(f_{\theta}(x+s_{\rm slow}\times\varDelta)\right) =z(fθ(x)),\displaystyle=z\left(f_{\theta}(x)\right), (30)
z(fθ(x+sslow×Δ))\displaystyle z\left(f_{\theta}(x+s_{\rm slow}\times\varDelta)\right) z(fθ(x+sfast×Δ)).\displaystyle\neq z\left(f_{\theta}(x+s_{\rm fast}\times\varDelta)\right).

Then, without changing the conditions presented in Eq. (30), we dynamically adjust sslows_{\rm slow} and sfasts_{\rm fast} until their absolute difference is close to 0, i.e., |sslowsfast|<ϵ\left|s_{\rm slow}-s_{\rm fast}\right|<\epsilon where ϵ\epsilon is a precision defined by the adversary. Finally, return x+sslow×Δx+s_{\rm slow}\times\varDelta as a decision boundary point.

Since the precision ϵ\epsilon is finite, x+sslow×Δx+s_{\rm slow}\times\varDelta is not strictly at the decision boundary, which will inevitably introduce minor errors (equivalent to noises) into the extracted model. If ϵ\epsilon decreases, then x+sslow×Δx+s_{\rm slow}\times\varDelta will be closer to the decision boundary, which is helpful to the model extraction attack, refers to the experiment results in Section 7.

6.2 Filtering Duplicate Affine Transformations

For a kk-deep neural network fθf_{\theta} consisting of n=i=1kdin=\sum_{i=1}^{k}{d_{i}} neurons, the idealized extraction attack exploits special n+1n+1 model activation patterns.

To ensure that the required n+1n+1 model activation patterns occur with a probability as high as possible, in Step 1 introduced in Section 4, we collect MM decision boundary points where Mn+1M\gg n+1, e.g., M=cn2nM=c_{n}2^{n} and cnc_{n} is a small factor. As a result, there are many collected decision boundary points with duplicate model activation patterns. Therefore, in Step 2, after recovering the parameter tuple (Γ𝒫,B𝒫)\left(\varGamma_{\mathcal{P}},B_{\mathcal{P}}\right) (i.e., the affine transformation) corresponding to each decision boundary point, we need to filter the decision boundary points with duplicate affine transformations, since their model activation patterns should be the same. When filtering duplicate affine transformations, we consider two possible cases.

Filtering Correctly Recovered Affine Transformations.

In the first case, assume that two affine transformations are both correctly recovered.

However, recovering affine transformations (i.e., 0-deep neural network extraction attack) relies on finding decision boundary points, which introduces minor errors. This is equivalent to adding noises to the recovered affine transformations, i.e., the tuples (Γ𝒫,B𝒫)\left(\varGamma_{\mathcal{P}},B_{\mathcal{P}}\right). To check whether two noisy affine transformations are the same, we adopt the checking rule below.

Comparing two vectors.

Consider two vectors with the same dimension, e.g., V1d,V2dV^{1}\in\mathbb{R}^{d},V^{2}\in\mathbb{R}^{d}. Set a small threshold φ\varphi. If the following dd inequations hold simultaneously

|Vj1Vj2|<φ,j{1,d}\left|V_{j}^{1}-V_{j}^{2}\right|<\varphi,\,\,j\in\{1,\cdots d\} (31)

where Vj1V_{j}^{1} and Vj2V_{j}^{2} are, respectively, the jj-th element of V1V^{1} and V2V^{2}, the two vectors are considered to be the same.

Filtering Wrongly Recovered Affine Transformations.

In the second case, assume that one affine transformation is correctly recovered and another one is partially recovered.

For the extraction attack on kk-deep neural networks, when recovering the affine transformation corresponding to an input by the 0-deep neural network extraction attack (see Section 5.1), the process of binary search should not change the model activation pattern. Otherwise, the affine transformation may be wrongly recovered. Recall that, in the 0-deep neural network extraction attack, the d0d_{0} elements of Γ𝒫\varGamma_{\mathcal{P}} are recovered one by one independently. Thus, the wrong recovery of one element of Γ𝒫\varGamma_{\mathcal{P}} does not influence the recovery of other elements.

As a result, we have to consider the case that one transformation is partially recovered. In this case, the filtering method is as follows. Consider two vectors V1dV^{1}\in\mathbb{R}^{d} and V2dV^{2}\in\mathbb{R}^{d}. If |Vj1Vj2|<φ\left|V_{j}^{1}-V_{j}^{2}\right|<\varphi holds for at least (ddφ)(d-d_{\varphi}) jj’s where j{1,,d}j\in\{1,\cdots,d\} and dφd_{\varphi}\in\mathbb{N} is a threshold, the two vectors are considered to be the same. Suppose that the occurrence frequencies of V1V^{1} and V2V^{2} are o1o_{1} and o2o_{2} respectively, we regard V1V^{1} as the correctly recovered affine transformation if o1o2o_{1}\gg o_{2}, and vice versa.

6.3 Filtering Functionally Inequivalent Extracted Models

Consider kk-deep neural networks consisting of n=i=1kdin=\sum_{i=1}^{k}{d_{i}} neurons. As introduced in Section 4, each time we randomly choose n+1n+1 out of NN collected decision boundary points to generate an extracted model. Moreover, according to Section 5.2, in the extraction attack, we need to guess kk signs, i.e., the sign of 𝒢j(i),i{1,,k}\mathcal{G}_{j}^{(i)},i\in\{1,\cdots,k\}.

When the model activation patterns of the selected n+1n+1 decision boundary points are not those required in the extraction attack, or at least one of the kk sign guesses is wrong, the resulting extracted model fθ^f_{\widehat{\theta}} is not a functionally equivalent model of the victim model fθf_{\theta}. Thus, we will get many functionally inequivalent extracted models.

Besides, due to the minor errors introduced by the finite precision used in finding decision boundary points, the parameters of the extracted model may be slightly different from the theoretical values (see Eq. (19), Eq. (27), and Eq. (28)). This subsection introduces three methods to filter functionally inequivalent extracted models, one of which considers the negative influence of finite precision together.

Filtering by the Normalized Model Signature.

Before introducing the filtering method, we discuss how many possible model activation patterns there are at most for a kk-deep neural network. Lemma 2 answers this question.

Lemma 2

For a kk-deep neural network consisting of n=i=1kdin=\sum_{i=1}^{k}{d_{i}} neurons, the upper bound of the number of possible model activation patterns is

H=(i=1k(2di1))+i=2k(j=1i1(2dj1)),H=\left(\prod_{i=1}^{k}{(2^{d_{i}}-1)}\right)+\sum_{i=2}^{k}{\left(\prod_{j=1}^{i-1}{(2^{d_{j}}-1)}\right)}, (32)

where did_{i} is the number of neurons in layer ii.

Proof

If all the did_{i} neurons in layer ii are inactive, i.e., the outputs of these neurons are 0, then the neuron states of all the j=i+1kdj\sum_{j=i+1}^{k}{d_{j}} neurons in the last kik-i layers are deterministic. In this case, the number of possible model activation patterns is decided by the first i1i-1 layers, i.e., the maximum is j=1i1(2dj1)\prod_{j=1}^{i-1}{(2^{d_{j}}-1)}. If there is at least one active neuron in each layer, then there are at most i=1k(2di1)\prod_{i=1}^{k}{(2^{d_{i}}-1)} possible model activation patterns.

After all the weights A^(i)\widehat{A}^{(i)} and biases b^(i),i{1,,k+1}\widehat{b}^{(i)},i\in\{1,\cdots,k+1\} are obtained, we assume that all the HH model activation patterns are possible, and compute the resulting normalized model signature 𝒮θ^𝒩\mathcal{S}_{\widehat{\theta}}^{\mathcal{N}}. Denote by 𝒮θ𝒩\mathcal{S}_{\theta}^{\mathcal{N}} the normalized model signature recovered in Step 2 (see Section 4). If 𝒮θ𝒩\mathcal{S}_{\theta}^{\mathcal{N}} is not a subset of 𝒮θ^𝒩\mathcal{S}_{\widehat{\theta}}^{\mathcal{N}}, we regard fθ^f_{\widehat{\theta}} as a functionally inequivalent model.

Due to the minor errors caused by finite precision, i.e., the slight difference between the extracted parameters θ^\widehat{\theta} and the theoretical values (see Eq. (19), Eq. (27), and Eq. (28)), when checking whether a tuple (Γ𝒫,B𝒫)𝒮θ𝒩\left(\varGamma_{\mathcal{P}},B_{\mathcal{P}}\right)\in\mathcal{S}_{\theta}^{\mathcal{N}} is equal to a tuple (Γ^𝒫,B^𝒫)𝒮θ^𝒩\left(\widehat{\varGamma}_{\mathcal{P}},\widehat{B}_{\mathcal{P}}\right)\in\mathcal{S}_{\widehat{\theta}}^{\mathcal{N}} or not, we adopt the checking rule presented in Section 6.2, refers to Eq. (31).

Besides, the filtering method in Section 6.2 does not ensure that all the wrongly recovered affine transformations are filtered. To avoid the functionally equivalent model being filtered, we adopt a flexible method.

Recall that, in Step 1, we collect a sufficient number of decision boundary points. For each tuple (Γ𝒫,B𝒫)𝒮θ𝒩\left(\varGamma_{\mathcal{P}},B_{\mathcal{P}}\right)\in\mathcal{S}_{\theta}^{\mathcal{N}}, denote by m𝒫m_{\mathcal{P}} the frequency that the tuple occurs in the collected decision boundary points. Suppose that the number of m𝒫m_{\mathcal{P}} where m𝒫>1m_{\mathcal{P}}>1 is NvalidN_{\rm valid}. Then when at least 0.95×Nvalid0.95\times N_{\rm valid} tuples (Γ𝒫,B𝒫)𝒮θ𝒩\left(\varGamma_{\mathcal{P}},B_{\mathcal{P}}\right)\in\mathcal{S}_{\theta}^{\mathcal{N}} are in the set 𝒮θ^𝒩\mathcal{S}_{\widehat{\theta}}^{\mathcal{N}}, the extracted model fθ^f_{\widehat{\theta}} is regarded as a candidate of the functionally equivalent model. Here, We call the ratio 0.95×Nvalid|𝒮θ𝒩|0.95\times\frac{N_{\rm valid}}{\left|\mathcal{S}_{\theta}^{\mathcal{N}}\right|} the adaptive threshold.

Filtering by Weight Signs.

After A^(i)\widehat{A}^{(i)}, b^(i)\widehat{b}^{(i)} for i{1,,k+1}i\in\{1,\cdots,k+1\} are obtained, we compute the matrices 𝒢^(i)\widehat{\mathcal{G}}^{(i)} and check whether the kk signs, i.e., the sign of 𝒢^j(i),i{1,,k}\widehat{\mathcal{G}}_{j}^{(i)},i\in\{1,\cdots,k\} are consistent with the kk guesses. If at least one sign is not consistent with the guess, the extracted model is not the functionally equivalent model.

Interestingly, except for handling wrong sign guesses, this method also shows high filtering effectiveness when the model activation patterns of the selected n+1n+1 decision boundary points are not those required by our extraction attacks. This is not strange, since our extraction attack is designed for a specific set of model activation patterns. For wrong model activation patterns, whether the sign of 𝒢^j(i),i{1,,k}\widehat{\mathcal{G}}_{j}^{(i)},i\in\{1,\cdots,k\} is 11 or 1-1 is a random event.

Filtering by Prediction Matching Ratio.

The above two filtering methods are effective, but we find that some functionally inequivalent models still escape from the filtering. Therefore, the third method is designed to perform the last filtering on extracted models surviving from the above two filtering methods. This method is based on the prediction matching ratio.

Prediction Matching Ratio.

Randomly generate N1N_{1} inputs, query the extracted model fθ^f_{\widehat{\theta}} and the victim model fθf_{\theta}. Suppose that the two models return the same hard-label for N2N_{2} out of N1N_{1} inputs. The ratio N2N1\frac{N_{2}}{N_{1}} is called the prediction matching ratio.

According to Definition 1 and Definition 2, for a functionally equivalent model, the prediction matching ratio should be high, or even close to 100%100\%. Note that many random inputs xx and corresponding hard-label z(fθ(x))z\left(f_{\theta}(x)\right) are collected during the attack process (see Steps 1 and 2 in Section 4). Thus, we can exploit these inputs.

7 Experiments

Our model extraction attacks are evaluated on both untrained and trained neural networks. Concretely, we first perform experiments on untrained neural networks with diverse architectures and randomly generated parameters. Then, based on two typical benchmarking image datasets (i.e., MNIST, CIFAR10) in visual deep learning, we train a series of neural networks as classifiers and evaluate the model extraction attacks on these trained neural networks.

For convenience, denote by ‘d0d_{0}-d1d_{1}-\cdots-dk+1d_{k+1}’ the victim model, where did_{i} is the dimension of each layer. For example, the symbol 1000-1 stands for a 0-deep neural network with an input dimension of 10001000 and an output dimension of 11.

Partial Universal Experiment Settings.

Some settings are used in all the following experiments. For kk-deep neural network extraction attacks, in Step 1, we randomly generate 8×2n8\times 2^{n} pairs of starting point and moving direction, where n=i=1kdin=\sum_{i=1}^{k}{d_{i}} is the number of neurons. The prediction matching ratio is estimated over 10610^{6} random inputs.

7.1 Computing (ε,0)(\varepsilon,0)-Functional Equivalence

To quantify the degree to which a model extraction attack has succeeded, the method (i.e., error bounds propagation [4]) proposed by Carlini et al. is adopted to compute (ε,0)(\varepsilon,0)-functional equivalence.

Error bounds propagation.

To compute (ε,0)(\varepsilon,0)-functional equivalence of the extracted neural network fθ^f_{\widehat{\theta}}, one just needs to compare the extracted parameters (weights A^(i)\widehat{A}^{(i)} and biases b^(i)\widehat{b}^{(i)}) to the real parameters (weights A(i)A^{(i)} and biases b(i)b^{(i)}) and analytically derive an upper bound on the error when performing inference [4].

Before comparing the neural network parameters, one must ‘align’ them [4]. This involves two operations: (1) adjusting the order of the neurons in the network, i.e., the order of the rows or columns of A(i)A^{(i)} and b(i)b^{(i)}, (2) adjusting the values of A(i)A^{(i)} and b(i)b^{(i)} to the theoretical one (see Eq. (19), Eq. (27), and Eq. (28)) obtained by the idealized model extraction attacks. This gives an aligned A~(i)\widetilde{A}^{(i)} and b~(i)\widetilde{b}^{(i)} from which one can analytically derive upper bounds on the error. Other details (e.g., propagating error bounds layer-by-layer) are the same as that introduced in [4], and not introduced again in this paper.

7.2 Experiments on Untrained Neural Networks

Table 1 summarizes the experimental results on different untrained neural networks which demonstrates the effectiveness of our model extraction attacks.

Table 1: Experiment results on untrained kk-deep neural networks.
Architecture Parameters ϵ\epsilon PMR Queries (ε,0)(\varepsilon,0) max|θθ^||\theta-\widehat{\theta}|
512-2-1 1029 101210^{-12} 100%100\% 219.352^{19.35} 212.212^{-12.21} 216.882^{-16.88}
101410^{-14} 100%100\% 219.592^{19.59} 219.842^{-19.84} 224.622^{-24.62}
2048-4-1 8201 101210^{-12} 99.98%99.98\% 223.322^{23.32} 23.772^{-3.77} 210.442^{-10.44}
101410^{-14} 100%100\% 223.512^{23.51} 213.702^{-13.70} 217.752^{-17.75}
25120-4-1 100489 101410^{-14} 99.98%99.98\% 226.422^{26.42} 22.992^{-2.99} 214.672^{-14.67}
101610^{-16} 100%100\% 226.672^{26.67} 213.012^{-13.01} 223.192^{-23.19}
50240-2-1 100485 101410^{-14} 99.99%99.99\% 225.852^{25.85} 27.202^{-7.20} 215.582^{-15.58}
101610^{-16} 100%100\% 226.312^{26.31} 214.442^{-14.44} 222.672^{-22.67}
32-2-2-1 75 101210^{-12} 100%100\% 217.322^{17.32} 210.992^{-10.99} 214.782^{-14.78}
101410^{-14} 100%100\% 217.562^{17.56} 218.212^{-18.21} 220.612^{-20.61}
512-2-2-1 1035 101210^{-12} 99.99%99.99\% 221.392^{21.39} 210.342^{-10.34} 214.012^{-14.01}
101410^{-14} 100%100\% 221.592^{21.59} 214.172^{-14.17} 217.292^{-17.29}
1024-2-2-1 2059 101210^{-12} 99.99%99.99\% 222.382^{22.38} 26.102^{-6.10} 213.772^{-13.77}
101410^{-14} 100%100\% 222.492^{22.49} 214.162^{-14.16} 220.382^{-20.38}
ϵ\epsilon: the precision used to find decision boundary points.
max|θθ^||\theta-\widehat{\theta}|: the maximum extraction error of model parameters.
PMR: prediction matching ratio.

According to Appendix 0.B, the computation complexity of our model extraction attack is about 𝒪(n×2n2+n+k)\mathcal{O}\left(n\times 2^{n^{2}+n+k}\right), where nn is the number of neurons. Thus, we limit the number of neurons, which does not influence the verification of our model extraction attack. Note that the number of parameters is not limited. All the attacks can be finished within several hours on a single core.

The results in Table 1 also support our argument in Remark 2. For the 22-deep neural networks (e.g., 32-2-2-1), when recovering the weights in layer 11, we require that only one neuron in layer 22 is active, instead of all the 22 neurons being active. Our extraction attacks also achieve good performance.

The influence of the precision ϵ\epsilon.

A smaller ϵ\epsilon will make the returned point x+sslow×Δx+s_{\rm slow}\times\varDelta (see Section 6.1) closer to the decision boundary, which helps reduce the extraction error of affine transformations. As a result, the model extraction attack is expected to perform better. For example, for the 1-deep neural network 2048-4-1, when ϵ\epsilon decreases from 101210^{-12} to 101410^{-14}, the value ε\varepsilon (respectively, max|θθ^||\theta-\widehat{\theta}|) decreases from 23.772^{-3.77} to 213.702^{-13.70} (respectively, from 210.442^{-10.44} to 217.752^{-17.75}), which is a significant improvement.

At the same time, using a smaller precision ϵ\epsilon does not increase the attack complexity significantly. According to Appendix 0.B, the query complexity is about 𝒪(d0×2n×log21ϵ)\mathcal{O}\left(d_{0}\times 2^{n}\times{\rm log}_{2}^{\frac{1}{\epsilon}}\right). Thus, decreasing ϵ\epsilon has little influence on the query complexity. Look at the neural network 2048-4-1 again. When ϵ\epsilon decreases from 101210^{-12} to 101410^{-14}, the number of queries only increases from 223.322^{23.32} to 223.512^{23.51}. Besides, when nn (i.e., the number of neurons) is large, ϵ\epsilon almost does not influence the computation complexity, since ϵ\epsilon only influences Steps 1 and 2 (see Section 4), while the computation complexity is mainly determined by other steps (refer to Appendix 0.B). When nn is small, the practical runtime is determined by the query complexity, then decreasing ϵ\epsilon also has little influence on the runtime.

Choosing an appropriate ϵ\epsilon is simple. In our experiments, we find that a smaller ϵ\epsilon should be used, when the prediction matching ratio estimated over 10610^{6} random inputs is not 100%100\%, and the gap (e.g., 0.02%0.02\%, see the third or fifth row) is not negligible.

7.3 Experiments on Trained Neural Networks

The MNIST and CIFAR10 Dataset.

MNIST (respectively, CIFAR10) is one typical benchmarking dataset used in visual deep learning. It contains ten-class handwriting number gray images [12] (resp., real object images in a realistic environment [19]). Each of the ten classes, i.e., ‘0’, ‘1’, ‘2’, ‘3’, ‘4’, ‘5’, ‘6’, ‘7’, ‘8’, and ‘9’ (resp., airplane, automobile, bird, cat, deer, dog, frog, horse, ship, truck), contains 28×2828\times 28 pixel gray images (resp., 32×3232\times 32 pixel RGB images), totaling 6000060000 (resp., 5000050000) training and 1000010000 (resp. 1000010000) testing images.

Neural Network Training Pipelines.

When classifying different classes of objects, the decision boundary of trained neural networks will be different. To fully verify our model extraction attack, for MNIST (respectively, CIFAR10), we divide the ten classes into five groups and build a binary classification neural network for each group. All the neural networks share the same architecture d0d_{0}-2-1, where d0=28×28d_{0}=28\times 28 for MNIST (respectively, 32×32×332\times 32\times 3 for CIFAR10). On the MNIST and CIFAR10 datasets, we perform a standard rescaling of the pixel values from 02550\cdots 255 to 010\cdots 1. For the model training, we choose typical settings (the loss is the cross-entropy loss; the optimizer is standard stochastic gradient descent; batch size 128128). The first four columns of Table 2 summarize a detailed description of the neural networks to be attacked in this section.

Experiment Results.

The last four columns of Table 2 summarize the experiment results. Our extraction attack still achieves good performance when an appropriate precision ϵ\epsilon is used, which further verifies its effectiveness.

Table 2: Experiment results on neural networks trained on MNIST or CIFAR10.
task architecture accuracy parameters ϵ\epsilon Queries (ε,0)(\varepsilon,0) max|θθ^||\theta-\widehat{\theta}|
‘0’ vs ‘1’ 784-2-1 0.9035 1573 101210^{-12} 220.112^{20.11} 216.392^{-16.39} 217.852^{-17.85}
101410^{-14} 220.322^{20.32} 220.562^{-20.56} 222.812^{-22.81}
‘2’ vs ‘3’ 784-2-1 0.8497 1573 101210^{-12} 220.112^{20.11} 27.002^{-7.00} 27.802^{-7.80}
101410^{-14} 220.322^{20.32} 214.322^{-14.32} 215.062^{-15.06}
‘4’ vs ‘5’ 784-2-1 0.8570 1573 101210^{-12} 220.022^{20.02} 28.472^{-8.47} 28.822^{-8.82}
101410^{-14} 220.322^{20.32} 215.622^{-15.62} 215.812^{-15.81}
‘6’ vs ‘7’ 784-2-1 0.9290 1573 101210^{-12} 220.112^{20.11} 27.022^{-7.02} 27.932^{-7.93}
101410^{-14} 220.322^{20.32} 212.002^{-12.00} 212.912^{-12.91}
‘8’ vs ‘9’ 784-2-1 0.9501 1573 101210^{-12} 220.112^{20.11} 210.582^{-10.58} 211.622^{-11.62}
101410^{-14} 220.322^{20.32} 219.632^{-19.63} 221.722^{-21.72}
airplane vs 3072-2-1 0.8120 6149 101210^{-12} 222.082^{22.08} 24.842^{-4.84} 27.482^{-7.48}
automobile 101410^{-14} 222.292^{22.29} 212.412^{-12.41} 215.202^{-15.20}
bird vs cat 3072-2-1 0.6890 6149 101210^{-12} 222.072^{22.07} 28.372^{-8.37} 29.802^{-9.80}
101410^{-14} 222.292^{22.29} 212.272^{-12.27} 214.732^{-14.73}
deer vs dog 3072-2-1 0.6870 6149 101210^{-12} 222.012^{22.01} 29.552^{-9.55} 213.252^{-13.25}
101410^{-14} 222.222^{22.22} 213.192^{-13.19} 215.822^{-15.82}
frog vs horse 3072-2-1 0.8405 6149 101210^{-12} 222.082^{22.08} 29.562^{-9.56} 210.712^{-10.71}
101410^{-14} 222.292^{22.29} 213.582^{-13.58} 215.582^{-15.58}
ship vs truck 3072-2-1 0.7995 6149 101210^{-12} 222.082^{22.08} 28.632^{-8.63} 28.902^{-8.90}
101410^{-14} 222.292^{22.29} 212.952^{-12.95} 213.022^{-13.02}
max|θθ^||\theta-\widehat{\theta}|: the maximum extraction error of model parameters.
accuracy: classification accuracy of the victim model fθf_{\theta}.
for saving space, prediction matching ratios are not listed.

The experimental results presented in Table 1 and Table 2 show that the attack performance (i.e., the value of ε\varepsilon and max|θθ^|\text{max}|\theta-\widehat{\theta}|) is related to the precision ϵ\epsilon and the properties of the decision boundary. However, we do not find a clear quantitative relationship between the attack performance and the precision ϵ\epsilon (or some unknown properties of the decision boundary). Considering that the unknown quantitative relationships do not influence the verification of the model extraction attack, we leave the problem of exploring the unknown relationships as a future work.

8 Conclusion

In this paper, we have studied the model extraction attack against neural network models under the hard-label setting, i.e., the adversary only has access to the most likely class label corresponding to the raw output of neural network models. We propose new model extraction attacks that theoretically achieve functionally equivalent extraction. Practical experiments on numerous neural network models have verified the effectiveness of the proposed model extraction attacks. To the best of our knowledge, this is the first time to prove with practical experiments that it is possible to achieve functionally equivalent extraction against neural network models under the hard-label setting.

The future work will mainly focus on the following aspects:

  • The (computation and query) complexity of our model extraction attack remains high, which limits the application to neural networks with a large number of neurons. Reducing the complexity is an important problem.

  • In this paper, to recover the weight vector of the jj-th neuron in layer ii, we require that in layer ii, only the jj-th neuron is active. However, such a model activation pattern may not occur in some cases. Then how to recover the weight vector of this neuron based on other model activation patterns would be a vital step towards better generality.

  • Explore possible quantitative relationships between the precision ϵ\epsilon (or some unknown properties of the decision boundary) and ε\varepsilon (or max|θθ^|\text{max}|\theta-\widehat{\theta}|).

  • Extend the extraction attack to the case of vector outputs, i.e., the output dimensionality exceeds 11.

  • Develop extraction attacks against other kinds of neural network models.

8.0.1 Acknowledgments.

We would like to thank Adi Shamir for his guidance. We would like to thank the anonymous reviewers for their detailed and helpful comments. This work was supported by the National Key R&D Program of China (2018YFA0704701, 2020YFA0309705), Shandong Key Research and Development Program (2020ZLYS09), the Major Scientific and Technological Innovation Project of Shandong, China (2019JZZY010133), the Major Program of Guangdong Basic and Applied Research (2019B030302008), the Tsinghua University Dushi Program, and the Ministry of Education in Singapore under Grant RG93/23. Y. Chen was also supported by the Shuimu Tsinghua Scholar Program.

Appendix 0.A Proof of Lemma 1

We prove Lemma 1 by Mathematical Induction.

Proof

When i=2i=2, according to Lemma 1, the extracted weight vector A^j(2),j{1,,d2}\widehat{A}_{j}^{(2)},j\in\{1,\cdots,d_{2}\} should be

A^j(2)\displaystyle\widehat{A}_{j}^{(2)} =[wj,1(2)×|v=1d0w1,v(1)Cv,1(0)||v=1d1wj,v(2)Cv,1(1)|,,wj,d1(2)×|v=1d0wd1,v(1)Cv,1(0)||v=1d1wj,v(2)Cv,1(1)|]\displaystyle=\left[\frac{w_{j,1}^{(2)}\times\left|\sum_{v=1}^{d_{0}}{w_{1,v}^{(1)}C_{v,1}^{(0)}}\right|}{\left|\sum_{v=1}^{d_{1}}{w_{j,v}^{(2)}C_{v,1}^{(1)}}\right|},\cdots,\frac{w_{j,d_{1}}^{(2)}\times\left|\sum_{v=1}^{d_{0}}{w_{d_{1},v}^{(1)}C_{v,1}^{(0)}}\right|}{\left|\sum_{v=1}^{d_{1}}{w_{j,v}^{(2)}C_{v,1}^{(1)}}\right|}\right] (33)
=[wj,1(2)×|w1,1(1)||v=1d1wj,v(2)Cv,1(1)|,,wj,d1(2)×|wd1,1(1)||v=1d1wj,v(2)Cv,1(1)|].\displaystyle=\left[\frac{w_{j,1}^{(2)}\times\left|w_{1,1}^{(1)}\right|}{\left|\sum_{v=1}^{d_{1}}{w_{j,v}^{(2)}C_{v,1}^{(1)}}\right|},\cdots,\frac{w_{j,d_{1}}^{(2)}\times\left|w_{d_{1},1}^{(1)}\right|}{\left|\sum_{v=1}^{d_{1}}{w_{j,v}^{(2)}C_{v,1}^{(1)}}\right|}\right].

Note that C1,1(0)=1C_{1,1}^{(0)}=1 and Cv,1(0)=0C_{v,1}^{(0)}=0 for v{2,,d0}v\in\{2,\cdots,d_{0}\} .

Besides, we have

C(1)\displaystyle C^{(1)} =I𝒫(1)A(1)=A(1)=[A1(1),,Ad0(1)],\displaystyle=I_{\mathcal{P}}^{(1)}A^{(1)}=A^{(1)}=\left[A_{1}^{(1)},\cdots,A_{d_{0}}^{(1)}\right], (34)
C^(1)\displaystyle\widehat{C}^{(1)} =I𝒫(1)A^(1)=A^(1)=[A1(1)|w1,1(1)|,,Ad0(1)|wd0,1(1)|].\displaystyle=I_{\mathcal{P}}^{(1)}\widehat{A}^{(1)}=\widehat{A}^{(1)}=\left[\frac{A_{1}^{(1)}}{\left|w_{1,1}^{(1)}\right|},\cdots,\frac{A_{d_{0}}^{(1)}}{\left|w_{d_{0},1}^{(1)}\right|}\right].

where Av(1)=[wv,1(1),,wv,d0(1)]A_{v}^{(1)}=\left[w_{v,1}^{(1)},\cdots,w_{v,d_{0}}^{(1)}\right], Cv,u(1)=wv,u(1)C_{v,u}^{(1)}=w_{v,u}^{(1)} and C^v,u(1)=wv,u(1)|wv,1(1)|\widehat{C}_{v,u}^{(1)}=\frac{w_{v,u}^{(1)}}{\left|w_{v,1}^{(1)}\right|}.

Look at the system of linear equations presented in Eq. (26). Now, the system of linear equations is transformed into

{v=1d1w^j,v(2)C^v,1(1)=v=1d1wj,v(2)Cv,1(1)|v=1d1wj,v(2)Cv,1(1)|v=1d1w^j,v(2)C^v,d0(1)=v=1d1wj,v(2)Cv,d0(1)|v=1d1wj,v(2)Cv,1(1)|\left\{\begin{array}[]{c}\sum_{v=1}^{d_{1}}{\widehat{w}_{j,v}^{(2)}\widehat{C}_{v,1}^{(1)}}=\frac{\sum_{v=1}^{d_{1}}{w_{j,v}^{(2)}C_{v,1}^{(1)}}}{\left|\sum_{v=1}^{d_{1}}{w_{j,v}^{(2)}C_{v,1}^{(1)}}\right|}\\ \vdots\\ \sum_{v=1}^{d_{1}}{\widehat{w}_{j,v}^{(2)}\widehat{C}_{v,d_{0}}^{(1)}}=\frac{\sum_{v=1}^{d_{1}}{w_{j,v}^{(2)}C_{v,d_{0}}^{(1)}}}{\left|\sum_{v=1}^{d_{1}}{w_{j,v}^{(2)}C_{v,1}^{(1)}}\right|}\\ \end{array}\right. (35)

when d0d1d_{0}\geqslant d_{1}, by solving the system, it is expected to obtain

A^j(2)=[wj,1(2)|w1,1(1)||v=1d1wj,v(2)Cv,1(1)|,,wj,d1(2)|wd1,1(1)||v=1d1wj,v(2)Cv,1(1)|,],\widehat{A}_{j}^{(2)}=\left[\frac{w_{j,1}^{(2)}\left|w_{1,1}^{(1)}\right|}{\left|\sum_{v=1}^{d_{1}}{w_{j,v}^{(2)}C_{v,1}^{(1)}}\right|},\cdots,\frac{w_{j,d_{1}}^{(2)}\left|w_{d_{1},1}^{(1)}\right|}{\left|\sum_{v=1}^{d_{1}}{w_{j,v}^{(2)}C_{v,1}^{(1)}}\right|},\right], (36)

which is consistent with the expected value in Eq. (33).

Next, consider the recovery of the weight vector of the jj-th neuron in layer ii, and assume that the weights A^(1),,A^(i1)\widehat{A}^{(1)},\cdots,\widehat{A}^{(i-1)} as shown in Lemma 1 have been obtained. As a result, we have

Cj(i2)\displaystyle C_{j}^{(i-2)} =Aj(i2)A(i3)A(1),Cj(i1)=Aj(i1)A(i2)A(1),\displaystyle=A_{j}^{(i-2)}A^{(i-3)}\cdots A^{(1)},\,\,\,C_{j}^{(i-1)}=A_{j}^{(i-1)}A^{(i-2)}\cdots A^{(1)}, (37)
C^j(i2)\displaystyle\widehat{C}_{j}^{(i-2)} =A^j(i2)A^(i3)A^(1)=Cv(i2)|v=1di3wj,v(i2)Cv,1(i3)|,\displaystyle=\widehat{A}_{j}^{(i-2)}\widehat{A}^{(i-3)}\cdots\widehat{A}^{(1)}=\frac{C_{v}^{(i-2)}}{\left|\sum_{v=1}^{d_{i-3}}{w_{j,v}^{(i-2)}C_{v,1}^{(i-3)}}\right|},
C^j(i1)\displaystyle\widehat{C}_{j}^{(i-1)} =A^j(i1)A^(i2)A^(1)=Cv(i1)|v=1di2wj,v(i1)Cv,1(i2)|.\displaystyle=\widehat{A}_{j}^{(i-1)}\widehat{A}^{(i-2)}\cdots\widehat{A}^{(1)}=\frac{C_{v}^{(i-1)}}{\left|\sum_{v=1}^{d_{i-2}}{w_{j,v}^{(i-1)}C_{v,1}^{(i-2)}}\right|}.

Now, the system of linear equations in Eq. (26) is transformed into

{u=1di1w^j,u(i)Cu,1(i1)|v=1di2wu,v(i1)Cv,1(i2)|=u=1di1wj,u(i)Cu,1(i1)|u=1di1wj,u(i)Cu,1(i1)|u=1di1w^j,u(i)Cu,d0(i1)|v=1di2wu,v(i1)Cv,1(i2)|=u=1di1wj,u(i)Cu,d0(i1)|u=1di1wj,u(i)Cu,1(i1)|\left\{\begin{array}[]{c}\sum_{u=1}^{d_{i-1}}{\widehat{w}_{j,u}^{(i)}\frac{C_{u,1}^{(i-1)}}{\left|\sum_{v=1}^{d_{i-2}}{w_{u,v}^{(i-1)}C_{v,1}^{(i-2)}}\right|}}=\frac{\sum_{u=1}^{d_{i-1}}{w_{j,u}^{(i)}C_{u,1}^{(i-1)}}}{\left|\sum_{u=1}^{d_{i-1}}{w_{j,u}^{(i)}C_{u,1}^{(i-1)}}\right|}\\ \vdots\\ \sum_{u=1}^{d_{i-1}}{\widehat{w}_{j,u}^{(i)}\frac{C_{u,d_{0}}^{(i-1)}}{\left|\sum_{v=1}^{d_{i-2}}{w_{u,v}^{(i-1)}C_{v,1}^{(i-2)}}\right|}}=\frac{\sum_{u=1}^{d_{i-1}}{w_{j,u}^{(i)}C_{u,d_{0}}^{(i-1)}}}{\left|\sum_{u=1}^{d_{i-1}}{w_{j,u}^{(i)}C_{u,1}^{(i-1)}}\right|}\\ \end{array}\right. (38)

When d0di1d_{0}\geqslant d_{i-1}, by solving this system, it is expected to obtain

A^j(i)\displaystyle\widehat{A}_{j}^{(i)} =[w^j,1(i),,w^j,di1(i)]\displaystyle=\left[\widehat{w}_{j,1}^{(i)},\cdots,\widehat{w}_{j,d_{i-1}}^{(i)}\right]
=[wj,1(i)×|v=1di2w1,v(i1)Cv,1(i2)||v=1di1wj,v(i)Cv,1(i1)|,,wj,di1(i)×|v=1di2wdi1,v(i1)Cv,1(i2)||v=1di1wj,v(i)Cv,1(i1)|],\displaystyle=\left[\frac{w_{j,1}^{(i)}\times\left|\sum_{v=1}^{d_{i-2}}{w_{1,v}^{(i-1)}C_{v,1}^{(i-2)}}\right|}{\left|\sum_{v=1}^{d_{i-1}}{w_{j,v}^{(i)}C_{v,1}^{(i-1)}}\right|},\cdots,\frac{w_{j,d_{i-1}}^{(i)}\times\left|\sum_{v=1}^{d_{i-2}}{w_{d_{i-1},v}^{(i-1)}C_{v,1}^{(i-2)}}\right|}{\left|\sum_{v=1}^{d_{i-1}}{w_{j,v}^{(i)}C_{v,1}^{(i-1)}}\right|}\right],

which is consistent with the expected value in Eq. (27).

Appendix 0.B Complexity of Hard-Label Model Extraction Attacks

For the kk-deep neural network extraction attack, its complexity is composed of two parts: Oracle query complexity and computation complexity. Suppose that the number of neurons is n=i=1kdin=\sum_{i=1}^{k}{d_{i}}. Its input size, i.e., the size of xx is d0d_{0}. And kk is the number of hidden layers. The precision adopted by binary search is ϵ\epsilon (refer to Section 6.1).

Oracle Query Complexity.

For the kk-deep neural network extraction attack, we only query the Oracle in Steps 1 and 2 (see Section 4).

In Step 1, if cn×2nc_{n}\times 2^{n} decision boundary points are collected, then the number of queries to the Oracle is cϵ×cn×2nc_{\epsilon}\times c_{n}\times 2^{n}, where cϵc_{\epsilon} is a factor determined by the precision ϵ\epsilon, and cnc_{n} is a small factor defined by the attacker. In Step 2, for each decision boundary point xx collected in Step 1, to recover the corresponding affine transformation (i.e., Γ𝒫\varGamma_{\mathcal{P}} and B𝒫B_{\mathcal{P}}), we need to collect another d01d_{0}-1 decision boundary points. Therefore, the times of querying the Oracle in this step is cϵ×cn×2n×(d01)c_{\epsilon}\times c_{n}\times 2^{n}\times(d_{0}-1). Based on the above analysis, the Oracle query complexity of our kk-deep neural network extraction attack is cϵ×cn×2n×d0c_{\epsilon}\times c_{n}\times 2^{n}\times d_{0}. Note that cϵc_{\epsilon} is proportional to log21ϵ{\rm log}_{2}^{\frac{1}{\epsilon}}. Thus, the query complexity is about 𝒪(d0×2n×log21ϵ)\mathcal{O}\left(d_{0}\times 2^{n}\times{\rm log}_{2}^{\frac{1}{\epsilon}}\right).

Computation Complexity.

For the kk-deep neural network extraction attack, when nn is large, most computations are occupied by recovering neural network parameters, i.e., Steps 3 and 4 (see Section 4). Suppose that there are Ncn×2nN\leqslant c_{n}\times 2^{n} decision boundary points used to recover neural network parameters after filtering duplicate affine transformations in Step 2.

In Step 3, to recover the weight vector of the jj-th neuron in layer ii where i{2,,k+1}i\in\{2,\cdots,k+1\}, we need to solve a system of linear equations. For convenience, let us ignore the difference in the sizes of the different systems of linear equations. Then, to recover all the weights A(i)A^{(i)}, a total of n+1d1=i=2k+1din+1-d_{1}=\sum_{i=2}^{k+1}{d_{i}} systems of linear equations need to be solved. In Step 4, to recover all the biases b(1)b^{(1)}, only one system of linear equations needs to be solved. Therefore, to obtain an extracted model, we need to solve n+2d1n+2-d_{1} systems of linear equations.

There are two loops in the extraction attack. First, we need to select n+1n+1 out of NN decision boundary points each time. More concretely, to recover the weights A(i)A^{(i)} in layer ii, we choose did_{i} decision boundary points. Then the number (denoted by l1l_{1}) of possible cases is

l1=(Nd1)×(Nd1d2)××(Ni=1k1didk)×(Nn1)Nn+1,forNn.l_{1}=\tbinom{N}{d_{1}}\times\tbinom{N-d_{1}}{d_{2}}\times\cdots\times\tbinom{N-\sum_{i=1}^{k-1}{d_{i}}}{d_{k}}\times\tbinom{N-n}{1}\approx N^{n+1},\,\,{\rm for}\,\,N\gg n.

Second, we need to guess kk signs when recovering all the weights, i.e., there are 2k2^{k} cases.

Thus, the computation complexity is about 𝒪(l1×2k×(n+2d1))\mathcal{O}\left(l_{1}\times 2^{k}\times(n+2-d_{1})\right). When an appropriate precision ϵ\epsilon (i.e., ϵ\epsilon is small) is adopted, we have NH<2nN\approx H<2^{n}, where HH is the number of possible model activation patterns (refer to Lemma 2). Then, we further have

l1×2k×(n+2d1)Nn+1×2k×nn×2n(n+1)+k.l_{1}\times 2^{k}\times(n+2-d_{1})\approx N^{n+1}\times 2^{k}\times n\approx n\times 2^{n(n+1)+k}. (39)

Thus, the computation complexity is about 𝒪(n×2n2+n+k)\mathcal{O}\left(n\times 2^{n^{2}+n+k}\right).

Appendix 0.C Extraction on 11-Deep Neural Networks

The parameters of the extracted 11-deep neural network are as follows.

A^i(1)\displaystyle\widehat{A}_{i}^{(1)} =[w^i,1(1),,w^i,d0(1)]=[wi,1(1)|wi,1(1)|,,wi,d0(1)|wi,1(1)|],i{1,,d1},\displaystyle=\left[\widehat{w}_{i,1}^{(1)},\cdots,\widehat{w}_{i,d_{0}}^{(1)}\right]=\left[\frac{w_{i,1}^{(1)}}{\left|w_{i,1}^{(1)}\right|},\cdots,\frac{w_{i,d_{0}}^{(1)}}{\left|w_{i,1}^{(1)}\right|}\right],\,\,i\in\{1,\cdots,d_{1}\}, (40)
b^(1)\displaystyle\widehat{b}^{(1)} =[b^1(1),,b^d1(1)]=[b1(1)|w1,1(1)|,,bd1(1)|wd1,1(1)|],\displaystyle=[\widehat{b}_{1}^{(1)},\cdots,\widehat{b}_{d_{1}}^{(1)}]=\left[\frac{b_{1}^{(1)}}{\left|w_{1,1}^{(1)}\right|},\cdots,\frac{b_{d_{1}}^{(1)}}{\left|w_{d_{1},1}^{(1)}\right|}\right],
A^(2)\displaystyle\widehat{A}^{(2)} =[w^1(2),,w^d1(2)]=[w1(2)|w1,1(1)||i=1d1wi(2)wi,1(1)|,,wd1(2)|wd1,1(1)||i=1d1wi(2)wi,1(1)|],\displaystyle=\left[\widehat{w}_{1}^{(2)},\cdots,\widehat{w}_{d_{1}}^{(2)}\right]=\left[\frac{w_{1}^{(2)}\left|w_{1,1}^{(1)}\right|}{\left|\sum_{i=1}^{d_{1}}{w_{i}^{(2)}w_{i,1}^{(1)}}\right|},\cdots,\frac{w_{d_{1}}^{(2)}\left|w_{d_{1},1}^{(1)}\right|}{\left|\sum_{i=1}^{d_{1}}{w_{i}^{(2)}w_{i,1}^{(1)}}\right|}\right],
b^(2)\displaystyle\widehat{b}^{(2)} =b(2)|i=1d1wi(2)wi,1(1)|.\displaystyle=\frac{b^{(2)}}{\left|\sum_{i=1}^{d_{1}}{w_{i}^{(2)}w_{i,1}^{(1)}}\right|}.

Fig. 4 shows a diagram of a victim model (2-2-1) and the extracted model.

Refer to caption
Figure 4: Left: the victim model fθf_{\theta}. Right: the extracted model fθ^f_{\widehat{\theta}}.

References

  • [1] Batina, L., Bhasin, S., Jap, D., Picek, S.: CSI NN: reverse engineering of neural network architectures through electromagnetic side channel. In: Heninger, N., Traynor, P. (eds.) USENIX Security 2019. pp. 515–532. USENIX Association
  • [2] Blum, A., Rivest, R.L.: Training a 3-node neural network is np-complete. In: Hanson, S.J., Remmele, W., Rivest, R.L. (eds.) Machine Learning: From Theory to Applications - Cooperative Research at Siemens and MIT. LNCS, vol. 661, pp. 9–28. Springer (1993)
  • [3] Canales-Martínez, I., Chávez-Saab, J., Hambitzer, A., Rodríguez-Henríquez, F., Satpute, N., Shamir, A.: Polynomial time cryptanalytic extraction of neural network models. IACR Cryptol. ePrint Arch. p. 1526 (2023)
  • [4] Carlini, N., Jagielski, M., Mironov, I.: Cryptanalytic extraction of neural network models. In: Micciancio, D., Ristenpart, T. (eds.) CRYPTO 2020. LNCS, vol. 12172, pp. 189–218. Springer
  • [5] Carlini, N., Wagner, D.A.: Towards evaluating the robustness of neural networks. In: SP 2017. pp. 39–57. IEEE Computer Society
  • [6] Dubey, S.R., Singh, S.K., Chaudhuri, B.B.: Activation functions in deep learning: A comprehensive survey and benchmark. Neurocomputing 503, 92–108 (2022)
  • [7] Fefferman, C.: Reconstructing a neural net from its output. Revista Matematica Iberoamericana 10, 507–555 (1994)
  • [8] Galstyan, A., Cohen, P.R.: Empirical comparison of ”hard” and ”soft” label propagation for relational classification. In: Blockeel, H., Ramon, J., Shavlik, J.W., Tadepalli, P. (eds.) ILP 2007. LNCS, vol. 4894, pp. 98–111. Springer
  • [9] Ganju, K., Wang, Q., Yang, W., Gunter, C.A., Borisov, N.: Property inference attacks on fully connected neural networks using permutation invariant representations. In: Lie, D., Mannan, M., Backes, M., Wang, X. (eds.) CCS 2018. pp. 619–633. ACM
  • [10] Han, S., Pool, J., Tran, J., Dally, W.J.: Learning both weights and connections for efficient neural network. In: Cortes, C., Lawrence, N.D., Lee, D.D., Sugiyama, M., Garnett, R. (eds.) NeurIPS 2015. pp. 1135–1143
  • [11] Jagielski, M., Carlini, N., Berthelot, D., Kurakin, A., Papernot, N.: High accuracy and high fidelity extraction of neural networks. In: Capkun, S., Roesner, F. (eds.) USENIX Security 2020. pp. 1345–1362. USENIX Association
  • [12] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proc. IEEE 86(11), 2278–2324 (1998)
  • [13] Long, C., Collins, R., Swears, E., Hoogs, A.: Deep neural networks in fully connected CRF for image labeling with social network metadata. In: WACV 2019. pp. 1607–1615. IEEE
  • [14] Lowd, D., Meek, C.: Adversarial learning. In: Grossman, R., Bayardo, R.J., Bennett, K.P. (eds.) SIGKDD 2005. pp. 641–647. ACM
  • [15] Nair, V., Hinton, G.E.: Rectified linear units improve restricted Boltzmann machines. In: Fürnkranz, J., Joachims, T. (eds.) ICML, 2010. pp. 807–814. Omnipress
  • [16] Oliynyk, D., Mayer, R., Rauber, A.: I know what you trained last summer: A survey on stealing machine learning models and defences. ACM Comput. Surv. 55(14s), 324:1–324:41 (2023)
  • [17] Perazzi, F., Wang, O., Gross, M.H., Sorkine-Hornung, A.: Fully connected object proposals for video segmentation. In: ICCV 2015. pp. 3227–3234. IEEE Computer Society
  • [18] Rolnick, D., Kording, K.P.: Reverse-engineering deep relu networks. In: ICML 2020. Proceedings of Machine Learning Research, vol. 119, pp. 8178–8187. PMLR
  • [19] Torralba, A., Fergus, R., Freeman, W.T.: 80 million tiny images: A large data set for nonparametric object and scene recognition. IEEE Trans. Pattern Anal. Mach. Intell. 30(11), 1958–1970 (2008)
  • [20] Tramèr, F., Zhang, F., Juels, A., Reiter, M.K., Ristenpart, T.: Stealing machine learning models via prediction apis. In: Holz, T., Savage, S. (eds.) USENIX Security 2016. pp. 601–618. USENIX Association