
Secure Watermark for Deep Neural Networks with Multi-task Learning

Fangqi Li, Shilin Wang
School of Electronic Information and Electrical Engineering,
Shanghai Jiao Tong University,
{solour_lfq,wsl}@sjtu.edu.cn
Abstract

Deep neural networks are playing an important role in many real-life applications. After being trained with abundant data and computing resources, a deep neural network model providing service is endowed with economic value. An important prerequisite for commercializing and protecting deep neural networks is the reliable identification of their genuine author. To meet this goal, watermarking schemes that embed the author's identity information into the networks have been proposed. However, current schemes can hardly meet all the necessary requirements for securely proving authorship, and most focus on models for classification. To explicitly meet the formal definitions of the security requirements and to broaden the applicability of deep neural network watermarking, we propose a new framework based on multi-task learning. By treating watermark embedding as an extra task, most of the security requirements are explicitly formulated and met with well-designed regularizers; the rest are guaranteed by components from cryptography. Moreover, a decentralized verification protocol is proposed to standardize ownership verification. The experimental results show that the proposed scheme is flexible, secure, and robust, hence a promising candidate for deep learning model protection.

1 Introduction

Deep neural networks (DNNs) are spearheading artificial intelligence, with broad applications in assorted fields including computer vision [36, 58, 19], natural language processing [17, 53, 10], the internet of things [30, 14, 41], etc. Increasing computing resources and improved algorithms have established DNNs as trustworthy agents that outperform humans in many disciplines.

Training a DNN is much more expensive than using it for inference. A large amount of data has to be collected, preprocessed, and fed into the model. Data preparation is followed by designing regularizers, tuning the (hyper)parameters, and optimizing the DNN structure. Each round of tuning involves thousands of epochs of backpropagation, each of which costs about $0.005 in electricity on average. (Assume that one kilowatt-hour of electricity costs $0.1; one epoch of training a DNN can take over three minutes on four GPUs, each drawing around 300 W.) By contrast, using a published DNN is easy: a user simply propagates the input forward. Such an imbalance between DNN production and deployment calls for recognizing DNN models as intellectual property and designing better mechanisms for authorship identification against piracy.

DNN models, like other multimedia objects, are usually transmitted over public channels. Hence the most influential method for protecting DNNs as intellectual property is the digital watermark [59]. To prove the possession of an image, a piece of music, or a video, the owner resorts to a watermarking method that encodes its identity information into the media. After compression, transmission, and slight distortion, a decoder should still be able to recognize the identity from the carrier [4].

For DNN watermarking, researchers have followed a similar line of reasoning [48]. In this paper, we use host to denote the genuine author of a DNN model. The adversary is one who steals the model and publishes it as if it were the host. To watermark a DNN, some information is embedded into the network along with the normal training data. After adversaries manage to steal the model and pretend to have built it themselves, a verification process reveals the hidden information in the DNN and identifies the authentic host. In the DNN setting, the watermark, as an additional security guarantee, should not sacrifice the model's performance. This is called the functionality-preserving property. Meanwhile, the watermark should be robust against the adversaries' modifications to the model. Many users fine-tune (FT) the downloaded model on a smaller dataset to fit their tasks. In cases where computational resources are restricted (especially in the internet of things), a user is expected to conduct neuron pruning (NP) to save energy. A prudent user can conduct fine-pruning (FP) [31] to eliminate potential backdoors that have been inserted into the model. These basic requirements, together with other concerns for integrity, privacy, etc., make DNN watermarking a challenge for both the machine learning and security communities.

The diversity of current watermarking schemes originates from different assumptions on whether the host or a notary has white-box access to the stolen model.

If the adversary has stolen the model and only provides an API as a service, then the host has only black-box access to the possibly stolen model. In this case, backdoor-based watermarking schemes are preferred. A DNN with a backdoor yields special outputs on specific inputs. For example, it is possible to train an image classification DNN to classify all images with a triangle stamp in the upper-left corner as cats. The backdoor-based watermark was pioneered by [59], where a collection of images is selected as the trigger set to actuate misclassifications. It was indicated in [3, 60] that cryptographic protocols can be combined with the backdoor-based watermark to prove the integrity of the host's identity. For a more principled way of generating triggers, Li et al. proposed in [29] to adopt a variational autoencoder (VAE), while Le Merrer et al. used adversarial samples as triggers [26]. Li et al. proposed the Wonder Filter, which assigns some pixels to values in $[-2000, 2000]$, and adopted several tricks to guarantee the robustness of watermark embedding in [27]. In [57], Yao et al. studied the performance of the backdoor-based watermark in transfer learning and concluded that it is better to embed information in the feature extraction layers.

Backdoor-based watermarking schemes are essentially insecure given the various methods of backdoor elimination [9, 32, 28]. Liu et al. showed in [33] that a heuristic and biomorphic method can detect a backdoor in a DNN. In [44], Shafieinejad et al. claimed that it is possible to remove watermarks given black-box access to the model. Namba et al. proposed another defense using a VAE against backdoor-based watermarking methods in [35]. Even without these specialized algorithms, model tuning such as FP [31, 47] can efficiently remove backdoors and hence backdoor-based watermarks.

If the host can obtain all the parameters of the model, known as white-box access, then weight-based watermarking schemes are favored. Although this assumption is strictly stronger than that of the black-box setting, it remains practically significant. For example, the sponsor of a model competition can detect plagiarists who submit models slightly tuned from those of other contestants by examining the watermark. This legitimate method is better than checking whether two models perform significantly differently on a batch of data, which is still adopted by many competitions (e.g., http://host.robots.ox.ac.uk:8080/leaderboard/main_bootstrap.php). As another example, the investor of a project can verify the originality of a submitted model from its watermark. Such verification prevents tenderers from submitting a (modified) copy or an outdated and potentially backdoored model.

Uchida et al. first demonstrated the feasibility of incorporating the host's identity information into the weights of a DNN in [48]. The encoding is done through a regularizer that minimizes the distance between a specific weight vector and a string encoding the author's identity. The method in [16] embeds the message into the model's weights in a reversible manner, so that a trusted user can eliminate the watermark's influence and obtain the clean model. Instead of weights, Darvish et al. proposed DeepSigns [12], which embeds the host's identity into the statistical mean of the feature maps of a selected collection of samples, achieving better protection.

So far, the performance of a watermarking method has mainly been measured by the decline of the watermarked model's performance on normal inputs and the decline of the identity verification accuracy under model fine-tuning and neuron pruning. However, many of the results are empirical and lack an analytic basis [48, 12]. Most watermarking methods are only designed and examined for image classification DNNs, whose backdoors can be generated easily. This fact challenges the universality of DNN watermarking for practical use. Moreover, some basic security requirements against adversarial attacks have been overlooked by most existing watermarking schemes. For example, the method in [59] can detect piracy, but it cannot prove to any third party that the model belongs to the host. As indicated by Kerckhoffs's principle [24], the security of a system should rely on the secret key rather than the secrecy of the algorithm. The methods in [59, 12, 48] are insecure in this sense, since an adversary knowing the watermarking algorithm can effortlessly claim authorship. The influence of watermark overwriting is only discussed in [3, 27, 12], and the security against ownership piracy is only studied in [60, 27, 16].

To overcome these difficulties, we propose a new white-box watermarking model for DNNs based on multi-task learning (MTL) [7, 43, 22]. By turning watermark embedding into an extra task, most security requirements can be satisfied with well-designed regularizers. This extra task has a classifier independent of the backend of the original model, hence it can verify the ownership of models designed for tasks other than classification. Cryptographic protocols are adopted to instantiate the watermarking task, making the proposed scheme more secure against watermark detection and ownership piracy. To ensure the integrity of authorship identification, a decentralized verification protocol is designed to authorize the time stamp of the ownership claim and invalidate the watermark overwriting attack. The major contributions of our work are three-fold:

  1. We examine the security requirements for DNN watermarks in a comprehensive and formal manner.

  2. A DNN watermarking model based on MTL, together with a decentralized protocol, is proposed to meet all the security requirements. Our proposal can be applied to DNNs for tasks other than image classification, which was the only focus of previous works.

  3. Compared with several state-of-the-art watermarking schemes, the proposed method is more robust and secure.

2 Threat Model and Security Requirements

It is reasonable to assume that the adversary possesses fewer resources than the host, e.g., the entire training dataset is not exposed to the adversary and/or the adversary's computational resources are limited. Otherwise, it would be unnecessary for the adversary to steal the model. Moreover, we assume that the adversary can only tune the model by methods such as FT, NP, or FP. Such modifications are common attacks since the training code is usually published along with the trained model. Meanwhile, such tuning is effective against systems that only use the hash of the model for verification. On the other hand, it is hard and much more involved to modify the internal computational graph of a model. It is harder still to adopt model extraction or distillation, which demands much data and computation [23, 40] and risks degrading performance and generalization. Assume that the DNN model $M$ is designed to fulfil a primary task $\mathcal{T}_{\text{primary}}$ with dataset $\mathcal{D}_{\text{primary}}$, data space $\mathcal{X}$, label space $\mathcal{Y}$, and a metric $d$ on $\mathcal{Y}$.

2.1 Threat Model

We consider five major threats to the DNN watermarks.

2.1.1 Model tuning

An adversary can tune $M$ by methods including: (1) FT: running backpropagation on a local dataset, (2) NP: cutting out links in $M$ that are less important, and (3) FP: pruning unnecessary neurons in $M$ and then fine-tuning it. The adversary's local dataset is usually much smaller than the original training dataset for $M$, and fewer epochs are needed. FT and NP can compromise watermarking methods that encode information into $M$'s weights in a reversible way [16]. Meanwhile, [31] suggested that FP can efficiently eliminate backdoors from image classification models, and with them the watermarks inside.

2.1.2 Watermark detection

If the adversary can distinguish a watermarked model from a clean one, then the watermark is of less use, since the adversary can use only clean models and escape copyright regulation. The adversary can adopt backdoor screening methods [56, 50, 49] or reverse engineering [20, 5] to detect and possibly eliminate backdoor-based watermarks. For weight-based watermarks, the host has to ensure that the weights of a watermarked model do not deviate too much from those of a clean model. Otherwise, the property inference attack [15] can distinguish the two models.

2.1.3 Privacy concerns

As an extension of detection, we consider an adversary capable of identifying the host of a model without its permission as a threat to privacy. A watermarked DNN should expose no information about its host unless the host wants it to. Otherwise, models might be evaluated not by their performance but by their authors.

2.1.4 Watermark overwriting

Having obtained the model and the watermarking method, the adversary can embed its own watermark into the model and declare ownership afterward. Embedding an extra watermark only requires redundancy in the parameter representation of the model. Therefore, new watermarks can always be embedded unless one proves that such redundancy has been depleted, which is generally impossible. A concrete requirement is: the insertion of a new watermark should not erase the previous watermarks.

For a model with multiple watermarks, it is necessary that an incontrovertible time stamp be included in the ownership verification to break this redeclaration dilemma.

2.1.5 Ownership piracy

Even without tuning the parameters, model theft is still possible. Similar to [29], we define ownership piracy as attacks by which the adversary claims ownership over a DNN model without tuning its parameters or training extra learning modules. For zero-bit watermarking schemes (no secret key is involved; the security depends on the secrecy of the algorithm), the adversary can claim ownership by publishing a copy of the scheme. For a backdoor-based watermarking scheme that is not carefully designed, the adversary can detect the backdoor and claim it as its own watermark.

Secure watermarking schemes usually make use of cryptographic protocols [60, 27]. In these schemes, it is almost impossible for the adversary to pretend to be the host using any probabilistic machine that terminates within time polynomial in the security parameters (PPT).

2.2 Formulating the Watermarking Scheme

We define a watermarking scheme with security parameter $N$ as a probabilistic algorithm WM that maps $\mathcal{T}_{\text{primary}}$ (the description of the task, together with the training dataset $\mathcal{D}_{\text{primary}}$), a description $\mathcal{M}$ of the structure of the DNN model, and a secret key denoted by key to a pair $\left(M_{\text{WM}},\texttt{verify}\right)$:

\texttt{WM}:\left(M_{\text{WM}},\texttt{verify}\right)\leftarrow\left(N,\mathcal{T}_{\text{primary}},\mathcal{M},\texttt{key}\right),

where $M_{\text{WM}}$ is the watermarked DNN model and verify is a probabilistic algorithm with binary output for verifying ownership. To verify ownership, the host provides verify and key. A watermarking scheme should satisfy the following basic correctness requirements:

\text{Pr}\left\{\texttt{verify}(M_{\text{WM}},\texttt{key})=1\right\}\geq 1-\epsilon, \quad (1)
\text{Pr}_{M^{\prime}\text{ irrelevant to }M_{\text{WM}}\text{, or }\texttt{key}^{\prime}\neq\texttt{key}}\left\{\texttt{verify}(M^{\prime},\texttt{key}^{\prime})=0\right\}\geq 1-\epsilon, \quad (2)

where $\epsilon\in(0,1)$ reflects the security level. Condition (1) requires that the verifier always correctly identify the authorship, while (2) requires that it accept only the correct key as proof and never mistake irrelevant models for the host's.

The original model trained without a watermark is denoted by $M_{\text{clean}}$. Some researchers [16] define WM as a mapping from $\left(N,M_{\text{clean}},\texttt{key}\right)$ to $\left(M_{\text{WM}},\texttt{verify}\right)$, which is a subclass of our definition.
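For concreteness, a minimal Python sketch of the interface implied by this definition is given below; the names `watermark` and `Verifier` are illustrative and not part of the scheme itself.

```python
from typing import Any, Callable, Tuple

Model = Any                               # a trained DNN, e.g. a torch.nn.Module
Verifier = Callable[[Model, bytes], int]  # verify(M, key) -> 1 (owned) or 0

def watermark(N: int, task_primary: Any, structure: Any, key: bytes) -> Tuple[Model, Verifier]:
    """Probabilistic algorithm WM: trains a model for the primary task while
    embedding a watermark derived from key, and returns the watermarked model
    together with its verification procedure (instantiated in Section 3)."""
    raise NotImplementedError
```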

2.3 Security Requirements

Having examined the toolkit of the adversary, we formally define the security requirements for a watermarking scheme.

2.3.1 Functionality-preserving

The watermarked model should perform slightly worse than, if not as well as, the clean model. The definition for this property is:

\text{Pr}_{(x,y)\sim\mathcal{T}_{\text{primary}}}\left\{d(M_{\text{clean}}(x),M_{\text{WM}}(x))\leq\delta\right\}\geq 1-\epsilon, \quad (3)

which can be examined a posteriori. However, it is hard to explicitly incorporate this definition into the watermarking scheme. Instead, we resort to the following definition:

\forall x\in\mathcal{X},\ d(M_{\text{clean}}(x),M_{\text{WM}}(x))\leq\delta. \quad (4)

Although it is stronger than (3), (4) is a tractable definition. We only have to ensure that the parameters of $M_{\text{WM}}$ do not deviate too much from those of $M_{\text{clean}}$.

2.3.2 Security against tuning

After being tuned with the adversary's dataset $\mathcal{D}_{\text{adversary}}$, the model's parameters shift and the verification accuracy of the watermark might decline. Let $M^{\prime}\xleftarrow[\mathcal{D}_{\text{adversary}}]{\text{tuning}}M_{\text{WM}}$ denote a model $M^{\prime}$ obtained by tuning $M_{\text{WM}}$ with $\mathcal{D}_{\text{adversary}}$. A watermarking scheme is secure against tuning iff:

\text{Pr}_{\mathcal{D}_{\text{adversary}},\ M^{\prime}\xleftarrow[\mathcal{D}_{\text{adversary}}]{\text{tuning}}M_{\text{WM}}}\left\{\texttt{verify}(M^{\prime},\texttt{key})=1\right\}\geq 1-\epsilon. \quad (5)

To meet (5), the host has to simulate the effects of tuning and make $\texttt{verify}(\cdot,\texttt{key})$ insensitive to them in the neighbourhood of $M_{\text{WM}}$.

2.3.3 Security against watermark detection

According to [52], one definition of security against watermark detection is: no PPT adversary can distinguish a watermarked model from a clean one with non-negligible probability. Although this definition is impractical due to the lack of a universal backdoor detector, it is crucial that the watermark not differentiate a watermarked model from a clean model too much. Moreover, the host should be able to control the level of this difference by tuning the watermarking method. Let $\theta$ be a parameter within WM that regulates this difference; it is desirable that

M^{\infty}_{\text{WM}}=M_{\text{clean}}, \quad (6)

where $M^{\infty}_{\text{WM}}$ is the model returned by WM with $\theta\rightarrow\infty$.

2.3.4 Privacy-preserving

To protect the host's privacy, it is sufficient that no adversary can distinguish between two models watermarked with different keys. Fixing the primary task $\mathcal{T}_{\text{primary}}$ and the structure of the model $\mathcal{M}$, we first introduce an experiment $\texttt{Exp}^{\text{detect}}_{\mathcal{A}}$ in which an adversary $\mathcal{A}$ tries to identify the host of a model:

Algorithm 1 $\texttt{Exp}^{\text{detect}}_{\mathcal{A}}$
Require: $N$, WM, $\texttt{key}_{0}\neq\texttt{key}_{1}$.
1: Randomly select $b\leftarrow\{0,1\}$;
2: Generate $M_{\text{WM}}$ from $\texttt{WM}(N,\mathcal{T}_{\text{primary}},\mathcal{M},\texttt{key}_{b})$;
3: $\mathcal{A}$ is given $M_{\text{WM}}$, $N$, WM, $\texttt{key}_{0}$, $\texttt{key}_{1}$ and outputs $\hat{b}$;
4: $\mathcal{A}$ wins the experiment if $\hat{b}=b$.
Table 1: Security requirements and established watermarking schemes.

Security requirement | Zhu [60] | Adi [3] | Le Merrer [26] | Zhang [59] | Darvish [12] | Li [27] | Li [29] | Uchida [48] | Guan [16] | Ours
Functionality-preserving | E | E | P | E | P | E | E | P | E | P
Security against tuning | N | E | E | E | E | E | N | E | N | P
Security against watermark detection | N | N | N | N | N | P | P | N | N | P
Privacy-preserving | N | N | N | N | N | N | N | N | N | P
Security against watermark overwriting | N | E | N | N | E | E | N | N | N | E
Security against ownership piracy | III | II | I | I | II | III | II | I | III | III

P: the security requirement is claimed to hold by proof or proper regularizers. E: an empirical evaluation of the security was provided. N: no discussion was given, or the scheme is insecure.
Definition 1.

If, for every PPT adversary $\mathcal{A}$, the probability that $\mathcal{A}$ wins $\texttt{Exp}^{\text{detect}}_{\mathcal{A}}$ is upper bounded by $\frac{1}{2}+\epsilon(N)$, where $\epsilon$ is a negligible function, then WM is privacy-preserving.

The intuition behind this definition is that an adversary cannot identify the host from the model, even if the number of candidates has been reduced to two. Almost all backdoor-based watermarking schemes are insecure under this definition. In order to protect privacy, it is crucial that WM be a probabilistic algorithm and that verify depend on key.

2.3.5 Security against watermark overwriting

Assume that the adversary has watermarked $M_{\text{WM}}$ with another secret key $\texttt{key}_{\text{adv}}$ using a subprocess of WM and obtained $M_{\text{adv}}$: $M_{\text{adv}}\xleftarrow[\texttt{key}_{\text{adv}}]{\text{overwriting}}M_{\text{WM}}$. The overwriting watermark should not invalidate the original one; formally, for any legal $\texttt{key}_{\text{adv}}$:

\text{Pr}_{\texttt{key}_{\text{adv}},\ M_{\text{adv}}\xleftarrow[\texttt{key}_{\text{adv}}]{\text{overwriting}}M_{\text{WM}}}\left\{\texttt{verify}(M_{\text{adv}},\texttt{key})=1\right\}\geq 1-\epsilon, \quad (7)

where the randomness in choosing $\texttt{key}_{\text{adv}}$, generating $M_{\text{adv}}$, and computing verify is integrated out. A watermarking scheme that meets (7) is defined to be secure against watermark overwriting. This property is usually examined empirically in the literature [3, 12, 27].

2.3.6 Security against ownership piracy

In an ownership piracy attack, the adversary pirates a model by recovering key and forging verify through querying $M_{\text{WM}}$ (or verify, if available). We define three levels of security according to the effort needed to pirate a model.

  1. Level I: The adversary only needs to wrap $M_{\text{WM}}$ or query it a constant number of times. All zero-bit watermarking schemes belong to this level.

  2. Level II: The adversary has to query $M_{\text{WM}}$ a number of times polynomial in the security parameter. The more the adversary queries, the more likely it is to succeed in pretending to be the host. The key and verify in this case are generally simple. For example, [12, 3] are at this level of security.

  3. Level III: It is almost impossible for the adversary to pirate ownership of the model given a number of queries polynomial in the security parameter. Such schemes usually borrow methods from cryptography to generate pseudorandomness. The methods in [27, 60] are examples of this level.

Watermarking schemes of levels I and II can be adopted as theft detectors, but the host can hardly adopt a level I/II scheme to convince a third party of ownership. Using a watermarking scheme of level III, a host can prove to any third party who the model's rightful owner is. This is the only case in which the watermark has forensic value.

[Figure 1: diagram of the shared backbone fed by $\mathcal{D}_{\text{primary}}$ and $\mathcal{D}_{\text{WM}}^{\texttt{key}}$ (generated from key), with the backend $c_{\text{p}}$ producing predictions for $\mathcal{T}_{\text{primary}}$ (forming $M_{\text{WM}}$) and the classifier $c_{\text{WM}}$ producing predictions for $\mathcal{T}_{\text{WM}}$ (forming $f_{\text{WM}}$).]
Figure 1: Architecture of the MTL-based watermarking scheme. The orange blocks are the backbone, the pink block is the backend for $\mathcal{T}_{\text{primary}}$, and the blue block is the classifier for $\mathcal{T}_{\text{WM}}$.

The scheme in [26] is a zero-bit watermarking scheme. The method proposed by Zhang et al. in [59] adopts marked images or noise as the backdoor triggers, but only a few easily forgeable marks were examined. The protocol of Uchida et al. [48] can be enhanced to level III security against ownership piracy only if an authority is responsible for distributing the secret key, e.g., [55]; but it lacks covertness and the privacy-preserving property.

The VAE adopted in [29] has to be used in conjunction with a secret key that enhances the robustness of the backdoor. The adversary can collect a set of misclassified samples from one class, slightly disturb them, and claim to have watermarked the neural network. To claim ownership of a model watermarked by Adi et al. [3], the adversary samples its collection of triggers from the misclassified samples, encrypts them with a key, and submits the encrypted pairs. The perfect security of their scheme requires the model to perform nearly perfectly on the primary task, which is unrealistic in practice. As for DeepSigns [12], an adversary can choose one class, compute the empirical mean of the output of the activation functions (since the outliers are easy to detect), then generate a random matrix as the mask and claim ownership.

The scheme in [60] is of level III security against ownership piracy, as proved in the original paper. So is the method in [27], since it is generally hard to guess the actual pattern of the Wonder Filter mask from a space of size $2^{P}$, where $P$ is the number of pixels of the mask. The scheme by Guan et al. in [16] is secure but extremely fragile, hence it is out of the scope of practical watermarking schemes.

A comprehensive summary of established watermarking schemes judged according to the enumerated security requirements is given in Table 1.

3 The Proposed Method

3.1 Motivation

It is difficult for backdoor-based or weight-based watermarking methods to formally meet all the proposed security requirements. Hence, we design a new white-box watermarking method for DNN model protection using multi-task learning. Watermark embedding is designed as an additional task $\mathcal{T}_{\text{WM}}$. A classifier for $\mathcal{T}_{\text{WM}}$ is built independently of the backend for $\mathcal{T}_{\text{primary}}$. After training and watermark embedding, only the network for $\mathcal{T}_{\text{primary}}$ is published.

Reverse engineering or backdoor detection such as [49] cannot find any evidence of the watermark, since no trigger is embedded in the published model's backend. On the other hand, common FT methods such as fine-tune last layer (FTLL) or re-train last layers (RTLL) [3], which only modify the backend layers of the model, have no impact on our watermark.

Under this formulation, the functionality-preserving property, the security against tuning, the security against watermark detection, and the privacy-preserving property can be formally addressed. A properly designed $\mathcal{T}_{\text{WM}}$ ensures the security against ownership piracy as well, making the MTL-based watermarking scheme a secure and sound option for model protection.

To better handle the forensic difficulties involving overwritten watermarks and key management, we introduce a decentralized consensus protocol to authorize the time stamps embedded with the watermarks.

3.2 Overview

The proposed model consists of the MTL-based watermarking scheme and the decentralized verification protocol.

3.2.1 The MTL-based watermarking scheme

The structure of our watermarking scheme is illustrated in Fig. 1. The entire network consists of the backbone network and two independent backends: $c_{\text{p}}$ and $c_{\text{WM}}$. The published model $M_{\text{WM}}$ is the backbone followed by $c_{\text{p}}$, while $f_{\text{WM}}$ is the watermarking branch for the watermarking task, in which $c_{\text{WM}}$ takes the outputs of different layers of the backbone as its input. By having $c_{\text{WM}}$ monitor the outputs of different layers of the backbone network, it is harder for an adversary to design modifications that invalidate $c_{\text{WM}}$ completely (a code sketch of this architecture is given after the steps below).

To produce a watermarked model, a host should:

  1. Generate a collection of $N$ samples $\mathcal{D}^{\texttt{key}}_{\text{WM}}=\left\{x_{i},y_{i}\right\}_{i=1}^{N}$ using a pseudo-random algorithm with key as the random seed.

  2. Optimize the entire DNN to jointly minimize the loss on $\mathcal{D}^{\texttt{key}}_{\text{WM}}$ and $\mathcal{D}_{\text{primary}}$. During the optimization, a series of regularizers are designed to meet the security requirements enumerated in Section 2.

  3. Publish $M_{\text{WM}}$.
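A minimal PyTorch-style sketch of the two-headed architecture of Fig. 1 follows. The layer sizes and the simplification that $c_{\text{WM}}$ reads a single feature vector are assumptions; in the actual scheme $c_{\text{WM}}$ monitors the outputs of several backbone layers.

```python
import torch.nn as nn

class WatermarkedNet(nn.Module):
    """Shared backbone with two independent heads: c_p for the primary task
    and c_WM for the watermarking task."""
    def __init__(self, backbone: nn.Module, feat_dim: int, num_classes: int):
        super().__init__()
        self.backbone = backbone                      # published together with c_p
        self.c_p = nn.Linear(feat_dim, num_classes)   # backend for T_primary
        self.c_wm = nn.Sequential(                    # watermark classifier, kept by the host
            nn.Linear(feat_dim, 64), nn.ReLU(), nn.Linear(64, 2))

    def forward_primary(self, x):                     # M_WM = backbone + c_p
        return self.c_p(self.backbone(x))

    def forward_watermark(self, x):                   # f_WM = backbone + c_WM
        return self.c_wm(self.backbone(x))
```

Only the backbone and $c_{\text{p}}$ (i.e., $M_{\text{WM}}$) are published; $c_{\text{WM}}$ and key stay with the host until verification.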

To prove its ownership over a model $M$ to a third party:

  1. The host submits $M$, $c_{\text{WM}}$, and key.

  2. The third party generates $\mathcal{D}_{\text{WM}}^{\texttt{key}}$ from key and combines $c_{\text{WM}}$ with $M$'s backbone to build a DNN for $\mathcal{T}_{\text{WM}}$.

  3. If a statistical test indicates that $c_{\text{WM}}$ with $M$'s backbone performs well on $\mathcal{D}_{\text{WM}}^{\texttt{key}}$, then the third party confirms the host's ownership over $M$.

3.2.2 The decentralized verification protocol

To enhance the reliability of the ownership protection, it is necessary to use a protocol to authorize the watermark of the model's host. Otherwise, any adversary who has downloaded $M_{\text{WM}}$ can embed its own watermark into it and pirate the model.

One option is to use a trusted key distribution center or a timing agency, which is in charge of authorizing the time stamps of the hosts' watermarks. However, such centralized protocols are vulnerable and expensive. For this reason, we resort to decentralized consensus protocols such as Raft [37] or PBFT [8], which were designed to synchronize messages within a distributed community. Under these protocols, a message from a user is responded to and recorded by a majority of clients within the community, so the message becomes authorized and unforgeable.

Concretely, a client $s$ under this DNN watermarking protocol is given a pair of public and private keys. $s$ can publish a watermarked model or claim its ownership over some model by broadcasting:

Publishing a model: After finishing training a model watermarked with key, $s$ obtains $M_{\text{WM}}$ and $c_{\text{WM}}$. Then $s$ signs and broadcasts the following message to the entire community:

\langle\textbf{Publish: }\texttt{key}\,\|\,\texttt{time}\,\|\,\texttt{hash}(c_{\text{WM}})\rangle,

where $\|$ denotes string concatenation, time is the time stamp, and hash is a preimage-resistant hash function that maps a model into a string and is accessible to all clients. Other clients within the community verify this message using $s$'s public key, verify that time lies within a recent time window, and write the message into their memory. Once $s$ is confirmed that the majority of clients have recorded its broadcast (e.g., when $s$ receives a confirmation from the current leader under the Raft protocol), it publishes $M_{\text{WM}}$.

Proving ownership over a model $M$: $s$ signs and broadcasts the following message:

\langle\textbf{Claim: }l_{M}\,\|\,\texttt{hash}(M)\,\|\,l_{c_{\text{WM}}}\rangle,

where $l_{M}$ and $l_{c_{\text{WM}}}$ are pointers to $M$ and $c_{\text{WM}}$. Upon receiving this request, any client can independently conduct the ownership proof. It first downloads the model from $l_{M}$ and examines its hash. Then it downloads $c_{\text{WM}}$ and retrieves the Publish message from $s$ by $\texttt{hash}(c_{\text{WM}})$. The last steps follow Section 3.2.1. After finishing the verification, this client can broadcast its result as a proof of $s$'s ownership over the model in $l_{M}$.
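As an illustration only, the two broadcasts could be assembled as follows; the field encoding, the use of SHA-256, and the signature callable `sign` are assumptions rather than parts of the protocol specification.

```python
import hashlib
import time

def model_hash(serialized: bytes) -> str:
    # Preimage-resistant hash of a serialized model or classifier.
    return hashlib.sha256(serialized).hexdigest()

def publish_message(key_hex: str, c_wm_bytes: bytes, sign) -> dict:
    """<Publish: key || time || hash(c_WM)>, signed with the client's private key."""
    body = "||".join([key_hex, str(int(time.time())), model_hash(c_wm_bytes)])
    return {"type": "Publish", "body": body, "signature": sign(body.encode())}

def claim_message(l_m: str, m_bytes: bytes, l_c_wm: str, sign) -> dict:
    """<Claim: l_M || hash(M) || l_{c_WM}>, signed with the client's private key."""
    body = "||".join([l_m, model_hash(m_bytes), l_c_wm])
    return {"type": "Claim", "body": body, "signature": sign(body.encode())}
```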

3.3 Security Analysis of the Watermark Task

We now elaborate on the design of the watermarking task $\mathcal{T}_{\text{WM}}$ and analyze its security. For simplicity, $\mathcal{T}_{\text{WM}}$ is instantiated as a binary classification task, i.e., the output of the watermarking branch has two channels. To generate $\mathcal{D}^{\texttt{key}}_{\text{WM}}$, key is used as the seed of a pseudo-random generator (e.g., a stream cipher) to generate $\pi^{\texttt{key}}$, a sequence of $N$ distinct integers from the range $[0,\cdots,2^{m}-1]$, and a binary string $\texttt{l}^{\texttt{key}}$ of length $N$, where $m=3\lceil\log_{2}(N)\rceil$.

For each type of data space $\mathcal{X}$, a deterministic and injective function is adopted to map each integer in $\pi^{\texttt{key}}$ to an element of $\mathcal{X}$. For example, when $\mathcal{X}$ is the image domain, the mapping could be a QR code encoder. When $\mathcal{X}$ is the space of sequences of English words, the mapping could map an integer $n$ to the $n$-th word of a dictionary. (We suggest not using a function that encodes integers into terms similar to data in $\mathcal{T}_{\text{primary}}$, especially to data of the same class, since this increases the difficulty for $c_{\text{WM}}$ to achieve perfect classification.) Without loss of generality, let $\pi^{\texttt{key}}[i]$ denote the data mapped from the $i$-th integer in $\pi^{\texttt{key}}$. Both the pseudo-random generator and the functions that map integers into specialized data spaces should be accessible to all clients within the intellectual property protection community. Now we set:

\mathcal{D}^{\texttt{key}}_{\text{WM}}=\left\{(\pi^{\texttt{key}}[i],\texttt{l}^{\texttt{key}}[i])\right\}_{i=1}^{N},

where $\texttt{l}^{\texttt{key}}[i]$ is the $i$-th bit of $\texttt{l}^{\texttt{key}}$. We now merge the security requirements raised in Section 2 into this framework.
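The sketch below shows one way $\mathcal{D}^{\texttt{key}}_{\text{WM}}$ could be derived from key under these definitions. Using SHA-256 in counter mode as the pseudo-random generator is an assumption standing in for the stream cipher, and the mapping of integers into the data space (e.g., a QR code encoder) is left abstract.

```python
import hashlib
import math

def prg_stream(key: bytes, n_bytes: int) -> bytes:
    """Deterministic pseudo-random bytes derived from key (SHA-256 in counter mode)."""
    out, ctr = b"", 0
    while len(out) < n_bytes:
        out += hashlib.sha256(key + ctr.to_bytes(8, "big")).digest()
        ctr += 1
    return out[:n_bytes]

def watermark_dataset(key: bytes, N: int = 600):
    """pi^key: N distinct integers in [0, 2^m - 1]; l^key: N binary labels."""
    m = 3 * math.ceil(math.log2(N))
    width = m // 8 + 1                     # bytes drawn per candidate integer
    stream = prg_stream(key, 16 * N * width)
    ints, seen, pos = [], set(), 0
    while len(ints) < N:
        if pos + width > len(stream):      # extend the stream deterministically if needed
            stream += prg_stream(key + pos.to_bytes(8, "big"), 4096)
        cand = int.from_bytes(stream[pos:pos + width], "big") % (2 ** m)
        pos += width
        if cand not in seen:               # enforce distinctness of pi^key
            seen.add(cand)
            ints.append(cand)
    labels = [b & 1 for b in prg_stream(key + b"labels", N)]
    return list(zip(ints, labels))         # each integer is then mapped injectively into X
```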

3.3.1 The correctness

To verify the ownership of a model $M$ for a host with key, given $c_{\text{WM}}$, the process verify operates as Algorithm 2.

Algorithm 2 $\texttt{verify}(\cdot,\cdot\,|\,c_{\text{WM}},\gamma)$
Require: $M$, key.
Ensure: The verification of $M$'s ownership.
1: Build the watermarking branch $f$ from $M$ and $c_{\text{WM}}$;
2: Generate $\mathcal{D}^{\texttt{key}}_{\text{WM}}$ from key;
3: if $f$ correctly classifies at least $\gamma\cdot N$ terms within $\mathcal{D}^{\texttt{key}}_{\text{WM}}$ then
4:    return 1
5: else
6:    return 0
7: end if
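A PyTorch-style sketch of Algorithm 2 is given below; `build_watermark_branch` and `encode_to_input` are placeholders for the model-specific parts (attaching $c_{\text{WM}}$ to the backbone and mapping integers into the data space), and `watermark_dataset` is the key-expansion sketch above.

```python
import torch

def verify(model, key: bytes, c_wm, gamma: float = 0.7, N: int = 600) -> int:
    """Return 1 iff the watermarking branch built from model's backbone and
    c_wm classifies at least gamma * N of the key-derived samples correctly."""
    f_wm = build_watermark_branch(model, c_wm)     # placeholder: backbone features -> c_wm
    data = watermark_dataset(key, N)
    f_wm.eval()
    correct = 0
    with torch.no_grad():
        for integer, label in data:
            x = encode_to_input(integer)           # placeholder: e.g. QR-code image tensor
            pred = f_wm(x.unsqueeze(0)).argmax(dim=1).item()
            correct += int(pred == label)
    return int(correct >= gamma * N)
```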

If $M=M_{\text{WM}}$, then $M$ has been trained to minimize the binary classification loss on $\mathcal{T}_{\text{WM}}$, hence the test in Algorithm 2 is likely to succeed; this justifies requirement (1). For an arbitrary $\texttt{key}^{\prime}\neq\texttt{key}$, the induced watermark training data $\mathcal{D}^{\texttt{key}^{\prime}}_{\text{WM}}$ can hardly be similar to $\mathcal{D}^{\texttt{key}}_{\text{WM}}$. To formalize this intuition, consider the event where $\mathcal{D}^{\texttt{key}^{\prime}}_{\text{WM}}$ shares $q\cdot N$ terms with $\mathcal{D}^{\texttt{key}}_{\text{WM}}$, $q\in(0,1)$. With a pseudo-random generator, it is computationally impossible to distinguish $\pi^{\texttt{key}}$ from a sequence of $N$ randomly selected integers. The same argument holds for $\texttt{l}^{\texttt{key}}$ and a random binary string of length $N$. Therefore the probability of this event can be upper bounded by:

\binom{N}{qN}\cdot r^{qN}\cdot\left(1-r\right)^{(1-q)N}\leq\left[\left(1+(1-q)N\right)\left(\frac{r}{1-r}\right)\right]^{qN},

where $r=\frac{N}{2^{m+1}}$. For an arbitrary $q$, if $r<\frac{1}{2+(1-q)N}$, then the probability that $\mathcal{D}^{\texttt{key}^{\prime}}_{\text{WM}}$ overlaps with $\mathcal{D}^{\texttt{key}}_{\text{WM}}$ in a portion of $q$ declines exponentially.

For numbers not appearing in $\pi^{\texttt{key}}$, the watermarking branch is expected to output a random guess. Therefore, if $q$ is smaller than a threshold $\tau$, then $\mathcal{D}^{\texttt{key}^{\prime}}_{\text{WM}}$ can hardly pass the statistical test in Algorithm 2 when $N$ is big enough. So letting

m\geq\log_{2}\left[2N\left(2+(1-\tau)N\right)\right]

and $N$ be large enough makes an effective collision in the watermark dataset almost impossible. For simplicity, setting $m=3\cdot\lceil\log_{2}(N)\rceil\geq\log_{2}(N^{3})$ is sufficient.

In cases where $M_{\text{WM}}$ is replaced by an arbitrary model whose backbone structure happens to be compatible with $c_{\text{WM}}$, the output of the watermarking branch remains a random guess. This justifies the second requirement for correct verification, (2).

To select the threshold $\gamma$, assume that the random-guess strategy achieves an average accuracy of at most $p=0.5+\alpha$, where $\alpha\geq 0$ is a bias term assumed to decline as $N$ grows. The verification process returns 1 iff the watermark classifier achieves a binary classification accuracy of no less than $\gamma$. The security demand is that, by random guessing, the probability that an adversary passes the test declines exponentially with $N$. Let $X$ denote the number of correct guesses with average accuracy $p$; an adversary succeeds only if $X\geq\gamma\cdot N$. By the Chernoff bound:

\text{Pr}\left\{X\geq\gamma\cdot N\right\}\leq\left(\frac{1-p+p\cdot\text{e}^{\lambda}}{\text{e}^{\gamma\cdot\lambda}}\right)^{N},

where $\lambda$ is an arbitrary nonnegative number. If $\gamma$ is larger than $p$ by a constant independent of $N$, then $\left(\frac{1-p+p\cdot\text{e}^{\lambda}}{\text{e}^{\gamma\cdot\lambda}}\right)$ is less than 1 for a proper $\lambda$, reducing the probability of a successful attack to negligibility.
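As a quick numeric check of this bound, the snippet below evaluates it with the values adopted later in Section 4.1 ($N=600$, $\gamma=0.7$, $\lambda=0.34$, and a worst-case random-guess accuracy $p\approx 0.575$); the exact constant reported there may differ slightly depending on the assumed $p$.

```python
import math

def chernoff_bound(N: int, gamma: float, p: float, lam: float) -> float:
    """Upper bound on Pr{X >= gamma * N} for X ~ Binomial(N, p)."""
    return ((1 - p + p * math.exp(lam)) / math.exp(gamma * lam)) ** N

print(chernoff_bound(N=600, gamma=0.7, p=0.575, lam=0.34))  # on the order of 1e-8
```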

3.3.2 The functionality-preserving regularizer

Denote the trainable parameters of the DNN model by $\textbf{w}$. The optimization objective for $\mathcal{T}_{\text{primary}}$ takes the form:

\mathcal{L}_{0}(\textbf{w},\mathcal{D}_{\text{primary}})=\sum_{(x,y)\in\mathcal{D}_{\text{primary}}}l\left(M^{\textbf{w}}_{\text{WM}}(x),y\right)+\lambda_{0}\cdot u(\textbf{w}), \quad (8)

where $l$ is the loss defined by $\mathcal{T}_{\text{primary}}$ and $u(\cdot)$ is a regularizer reflecting prior knowledge about $\textbf{w}$. The normal training process computes the empirical loss in (8) by stochastically sampling batches and adopting gradient-based optimizers.

The proposed watermarking task adds an extra data-dependent term to the loss function:

\mathcal{L}(\textbf{w},\mathcal{D}_{\text{primary}},\mathcal{D}_{\text{WM}})=\mathcal{L}_{0}(\textbf{w},\mathcal{D}_{\text{primary}})+\lambda\cdot\sum_{(x,y)\in\mathcal{D}_{\text{WM}}}l_{\text{WM}}\left(f^{\textbf{w}}_{\text{WM}}(x),y\right), \quad (9)

where $l_{\text{WM}}$ is the cross-entropy loss for binary classification. We omit the dependency of $\mathcal{D}_{\text{WM}}$ on key in this section for conciseness.

To train multiple tasks, we can minimize the multi-task loss (9) directly or train the watermarking task and the primary task alternately [7]. Since $\mathcal{D}_{\text{WM}}$ is much smaller than $\mathcal{D}_{\text{primary}}$, it is possible that $\mathcal{T}_{\text{WM}}$ does not properly converge when being learned simultaneously with $\mathcal{T}_{\text{primary}}$. Hence we first optimize $\textbf{w}$ according to the loss on the primary task (8) to obtain $\textbf{w}_{0}$:

\textbf{w}_{0}=\arg\min_{\textbf{w}}\left\{\mathcal{L}_{0}(\textbf{w},\mathcal{D}_{\text{primary}})\right\}.

Next, instead of directly optimizing the network w.r.t. (9), the following loss function is minimized:

\mathcal{L}_{1}(\textbf{w},\mathcal{D}_{\text{primary}},\mathcal{D}_{\text{WM}})=\sum_{(x,y)\in\mathcal{D}_{\text{WM}}}l_{\text{WM}}(f^{\textbf{w}}_{\text{WM}}(x),y)+\lambda_{1}\cdot R_{\text{func}}(\textbf{w}), \quad (10)

where

R_{\text{func}}(\textbf{w})=\|\textbf{w}-\textbf{w}_{0}\|_{2}^{2}. \quad (11)

By introducing the regularizer $R_{\text{func}}$ in (11), $\textbf{w}$ is confined to the neighbourhood of $\textbf{w}_{0}$. Given this constraint and the continuity of $M_{\text{WM}}$ as a function of $\textbf{w}$, we can expect the functionality-preserving property defined in (4); the weaker version of functionality-preserving (3) then becomes tractable as well.
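A minimal PyTorch-style sketch of one step of this second stage, i.e., minimizing (10) with the regularizer (11), is shown below. Treating the backbone output as a single feature vector and the value $\lambda_{1}=0.05$ (the setting used in Section 4.3) are simplifying assumptions.

```python
import torch
import torch.nn.functional as F

def functionality_regularizer(backbone, ref_params):
    """R_func(w) = ||w - w_0||_2^2 over the published parameters."""
    return sum(((p - p0) ** 2).sum() for p, p0 in zip(backbone.parameters(), ref_params))

def watermark_embedding_step(backbone, c_wm, ref_params, wm_batch, optimizer, lambda1=0.05):
    """One gradient step on L_1 = sum l_WM(f_WM(x), y) + lambda1 * R_func(w)."""
    x, y = wm_batch                                   # key-derived inputs, binary labels
    loss = F.cross_entropy(c_wm(backbone(x)), y)      # watermark loss l_WM
    loss = loss + lambda1 * functionality_regularizer(backbone, ref_params)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# ref_params = [p.detach().clone() for p in backbone.parameters()]  # snapshot of w_0
```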

3.3.3 The tuning regularizer

To be secure against the adversary's tuning, it is sufficient to make $c_{\text{WM}}$ robust against tuning, by the definition in (5). Although $\mathcal{D}_{\text{adversary}}$ is unknown to the host, we assume that $\mathcal{D}_{\text{adversary}}$ shares a similar distribution with $\mathcal{D}_{\text{primary}}$; otherwise the stolen model would not have state-of-the-art performance on the adversary's task. To simulate the influence of tuning, a subset of $\mathcal{D}_{\text{primary}}$ is first sampled as an estimate of $\mathcal{D}_{\text{adversary}}$: $\mathcal{D}^{\prime}_{\text{primary}}\xleftarrow{\text{sample}}\mathcal{D}_{\text{primary}}$. Let $\textbf{w}$ be the current configuration of the model's parameters. Tuning is usually tantamount to minimizing the empirical loss on $\mathcal{D}^{\prime}_{\text{primary}}$ starting from $\textbf{w}$, which results in an updated parameter $\textbf{w}^{\text{t}}\xleftarrow[\mathcal{D}^{\prime}_{\text{primary}}]{\text{tune}}\textbf{w}$. In practice, $\textbf{w}^{\text{t}}$ is obtained by replacing $\mathcal{D}_{\text{primary}}$ in (8) with $\mathcal{D}^{\prime}_{\text{primary}}$ and conducting a few rounds of gradient descent from $\textbf{w}$.

To achieve the security against tuning defined in (5), it is sufficient that the parameter w satisfies:

\forall\mathcal{D}^{\prime}_{\text{primary}}\xleftarrow{\text{sample}}\mathcal{D}_{\text{primary}},\ \textbf{w}^{\text{t}}\xleftarrow[\mathcal{D}^{\prime}_{\text{primary}}]{\text{tune}}\textbf{w},\ \forall(x,y)\in\mathcal{D}_{\text{WM}}:\ f^{\textbf{w}^{\text{t}}}_{\text{WM}}(x)=y. \quad (12)

Condition (12), Algorithm 2, and the assumption that $\mathcal{D}_{\text{adversary}}$ is similar to $\mathcal{D}_{\text{primary}}$ together imply (5).

To exert the constraint in (12) on the training process, we design a new regularizer as follows:

R_{\text{DA}}(\textbf{w})=\sum_{\mathcal{D}^{\prime}_{\text{primary}}\xleftarrow{\text{sample}}\mathcal{D}_{\text{primary}},\ \textbf{w}^{\text{t}}\xleftarrow[\mathcal{D}^{\prime}_{\text{primary}}]{\text{tune}}\textbf{w},\ (x,y)\in\mathcal{D}_{\text{WM}}}l_{\text{WM}}\left(f^{\textbf{w}^{\text{t}}}_{\text{WM}}(x),y\right). \quad (13)

Then the loss to be optimized is updated from (10) to:

\mathcal{L}_{2}(\textbf{w},\mathcal{D}_{\text{primary}},\mathcal{D}_{\text{WM}})=\mathcal{L}_{1}(\textbf{w},\mathcal{D}_{\text{primary}},\mathcal{D}_{\text{WM}})+\lambda_{2}\cdot R_{\text{DA}}(\textbf{w}). \quad (14)

$R_{\text{DA}}$ defined by (13) can be understood as a kind of data augmentation for $\mathcal{T}_{\text{WM}}$. Data augmentation aims to improve the model's robustness against some specific perturbation of the input by proactively adding such perturbation to the training data. According to [45], data augmentation can be formulated as an additional regularizer:

\sum_{(x,y)\in\mathcal{D},\ x^{\prime}\xleftarrow{\text{perturb}}x}l\left(f^{\textbf{w}}(x^{\prime}),y\right). \quad (15)

Unlike in the ordinary data domain of $\mathcal{T}_{\text{primary}}$, it is hard to explicitly define an augmentation for $\mathcal{T}_{\text{WM}}$ against tuning. However, a regularizer of the form (15) can be derived from (13) by interchanging the order of summation, so the perturbation takes the form:

x^{\prime}\in\left[f^{\textbf{w}}_{\text{WM}}\right]^{-1}\left(f^{\textbf{w}^{\text{t}}}_{\text{WM}}\left(x\right)\right)\xleftarrow{\text{perturb}}x.
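One practical way to realize $R_{\text{DA}}$ is to alternate between simulating the adversary's tuning and enforcing the watermark loss at the tuned parameters. The sketch below is such an approximation; the step counts, learning rates, and the single-feature simplification are assumptions, not the authors' exact procedure.

```python
import torch
import torch.nn.functional as F

def simulated_tuning_round(backbone, c_p, c_wm, primary_subset, wm_loader,
                           lambda2=1.0, inner_lr=1e-3, outer_lr=1e-4, inner_steps=5):
    """(i) Simulate tuning w -> w^t on a sampled subset of the primary data;
    (ii) take watermark-loss gradient steps at w^t so that f_WM stays correct
    in the neighbourhood reachable by tuning (the spirit of R_DA in (13))."""
    inner_opt = torch.optim.SGD(
        list(backbone.parameters()) + list(c_p.parameters()), lr=inner_lr)
    for (x, y), _ in zip(primary_subset, range(inner_steps)):   # (i) simulated FT
        inner_opt.zero_grad()
        F.cross_entropy(c_p(backbone(x)), y).backward()
        inner_opt.step()
    outer_opt = torch.optim.SGD(
        list(backbone.parameters()) + list(c_wm.parameters()), lr=outer_lr)
    for xw, yw in wm_loader:                                    # (ii) watermark loss at w^t
        outer_opt.zero_grad()
        (lambda2 * F.cross_entropy(c_wm(backbone(xw)), yw)).backward()
        outer_opt.step()
```

In the full objective (14), this term is combined with $R_{\text{func}}$, which keeps the parameters close to $\textbf{w}_{0}$ and thus preserves the primary-task performance.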

3.3.4 Security against watermark detection

Consider the extreme case where $\lambda_{1}\rightarrow\infty$. Under this configuration, the parameters of $M_{\text{WM}}$ are frozen and only the parameters in $c_{\text{WM}}$ are tuned. Therefore $M_{\text{WM}}$ is exactly the same as $M_{\text{clean}}$, and it seems that we have not inserted any information into the model. However, by broadcasting the designed message, the host can still prove that it obtained white-box access to the model at an early time, which is enough for ownership verification. This justifies the security against watermark detection by the definition in (6), where $\lambda_{1}$ plays the role of $\theta$.

3.3.5 Privacy-preserving

Recall the definition of privacy-preserving in Section 2.3.4. We prove that, under certain configurations, the proposed watermarking method is privacy-preserving.

Theorem 1.

Let $c_{\text{WM}}$ take the form of a linear classifier whose input dimensionality is $L$. If $N\leq(L+1)$, then the watermarking scheme is privacy-preserving.

Proof.

The VC dimension of a linear classifier with $L$ input channels is $(L+1)$. Therefore, for $N\leq(L+1)$ inputs with arbitrary binary labels, there exists a $c_{\text{WM}}$ that can almost always classify them perfectly. Given $M$ and an arbitrary $\texttt{key}^{\prime}$, it is possible to forge $c_{\text{WM}}^{\prime}$ such that $c_{\text{WM}}^{\prime}$ with $M$'s backbone performs perfectly on $\mathcal{D}_{\text{WM}}^{\texttt{key}^{\prime}}$. We only have to plug the parameters of $M$ into (14), set $\lambda_{1}\rightarrow\infty$, $\lambda_{2}=0$, and minimize the loss. This step ends up with a watermarked model $M_{\text{WM}}=M$ and an evidence, $c_{\text{WM}}^{\prime}$, for $\texttt{key}^{\prime}$. Hence, in the experiment defined in Algorithm 1, an adversary cannot identify the host's key, since evidence for both options is equally plausible. The adversary can only make a random guess, whose probability of success is $\frac{1}{2}$. ∎

This theorem indicates that the MTL-based watermarking scheme can protect the host's privacy. Moreover, given $N$, it is crucial to increase the input dimensionality of $c_{\text{WM}}$ or to use a more sophisticated structure for $c_{\text{WM}}$ to increase its VC dimension.

3.3.6 Security against watermark overwriting

It is possible to meet the definition of security against watermark overwriting in (7) by adding the perturbation of embedding other secret keys into $R_{\text{DA}}$, but this requires building additional classifier structures and is expensive even for the host. For an adversary with insufficient training data, it is common to freeze the weights of the backbone layers as in transfer learning [38], in which case (7) is satisfied. In general, an adversary would not disturb the backbone of the DNN too much, for the sake of its functionality on the primary task. Hence we expect the watermarking branch to remain valid after overwriting.

We leave the examination of the security against watermark overwriting as an empirical study.

[Figure 2: diagram of the network for sentiment analysis: a sentence of random words generated from key by indexing and a sentence from the original dataset are fed through the embedding and LSTM units; $c_{\text{p}}$ outputs the sentiment label while $c_{\text{WM}}$ outputs the 0/1 watermark label.]
Figure 2: The network architecture for sentiment analysis.

3.3.7 Security against ownership piracy

Recall that in ownership piracy, the adversary is not allowed to train its own watermark classifier. Instead, it can only forge a key given a model $M_{\text{WM}}$ and a legal $c_{\text{WM}}$; this is possible if the adversary has participated in the proof for some other client. The adversary then has to find a new key $\texttt{key}_{\text{adv}}$ such that $\mathcal{D}^{\texttt{key}_{\text{adv}}}_{\text{WM}}$ passes the statistical test defined by the watermarking branch of $M_{\text{WM}}$ and $c_{\text{WM}}$. Although it is easy to find a set of $N$ integers, half of them classified as 0 and half as 1, by querying the watermarking branch as an oracle, it is hard to recover a legal $\texttt{key}_{\text{adv}}$ from this set. The protocol should adopt a stream cipher secure against key recovery attacks [42], which, by definition, blocks this sort of ownership piracy and makes the proposed watermarking scheme level III secure against ownership piracy. If $c_{\text{WM}}$ is kept secret, then ownership piracy is impossible. After all, ownership piracy is invalid when an authorized time stamp is available.

3.4 Analysis of the Verification Protocol

We now conduct a security analysis of the consensus protocol and resolve the redeclaration dilemma.

To pirate a model under this protocol, an adversary must submit a legal key and the hash of a $c_{\text{WM}}$. If the adversary does not have a legal $c_{\text{WM}}$, then the attack is impossible, since the preimage resistance of hash implies that the adversary cannot forge such a watermark classifier afterwards, so the broadcast is invalid. If the adversary has managed to build a legal $c_{\text{WM}}$ and compute its hash but has not obtained the target model, then the verification can hardly succeed, since the output of $c_{\text{WM}}$ with the backbone of an unknown network on the watermark dataset is a random guess. The final case is that the adversary has obtained the target model, conducted watermark overwriting, and redeclared ownership. Recall that the model is published only after its host has successfully broadcast its Publish message and notarized its time. Hence the overwriting dilemma can be resolved by comparing the time stamps inside the contradictory broadcasts.

As an adaptive attack, an adversary participating in the proof of a host's ownership over a model $M$ obtains the corresponding key and $c_{\text{WM}}$, with which it can erase weight-based watermarks [48, 55]. Embedding information into the outputs of the network rather than its weights makes the MTL-based watermark harder to erase. The adversary has to identify the decision boundary from $c_{\text{WM}}$ and tune $M$ so that samples drawn from key violate this boundary. This attack risks the model's performance on the primary task, requires a huge amount of data and computational resources, and is beyond the competence of a model thief.

The remaining security risks lie within the cryptographic components and are beyond the scope of our discussion.

4 Experiments and Discussions

4.1 Experiment Setup

To illustrate the flexibility of the proposed watermarking model, we considered four primary tasks: image classification (IC), malware classification (MC), image semantic segmentation (SS), and sentiment analysis (SA) for English. We selected four datasets for image classification, one dataset for malware classification, two datasets for semantic segmentation, and two datasets for sentiment classification. The descriptions of these datasets and the corresponding DNN structures are listed in Table 2. ResNet [18] is a classical model for image processing. For the VirusShare dataset, we compiled a collection of 26,000 malware samples into images and adopted ResNet as the classifier [11]. Cascade Mask R-CNN (CMRCNN) [6] is a network architecture specialized for semantic segmentation. GloVe [39] is a pre-trained word embedding that maps English words into numerical vectors, while the bidirectional long short-term memory (Bi-LSTM) network [21] is commonly used to analyze natural languages.

Table 2: Datasets and their DNN structures.

Dataset | Description | DNN structure
MNIST [13] | IC, 10 classes | ResNet-18
Fashion-MNIST [54] | IC, 10 classes | ResNet-18
CIFAR-10 [25] | IC, 10 classes | ResNet-18
CIFAR-100 [25] | IC, 100 classes | ResNet-18
VirusShare [1] | MC, 10 classes | ResNet-18
Penn-Fudan-Pedestrian [51] | SS, 2 classes | ResNet-50 + CMRCNN
VOC [2] | SS, 20 classes | ResNet-50 + CMRCNN
IMDb [34] | SA, 2 classes | GloVe + Bi-LSTM
SST [46] | SA, 5 classes | GloVe + Bi-LSTM
Table 3: Ablation study on regularizer configuration. Each entry contains the four metrics in Section 4.2, in the order: primary-task performance, watermark accuracy after FT, watermark accuracy after FP, and primary-task decline when NP erases the watermark. Semantic segmentation tasks were measured by average precision, and these two models would not converge without $R_{\text{func}}$. The optimal/second-optimal configuration for each dataset and each metric was highlighted/underlined in the original table.

Dataset | $M_{\text{clean}}$'s performance | No regularizers | $R_{\text{func}}$ | $R_{\text{DA}}$ | $R_{\text{func}}$ and $R_{\text{DA}}$
MNIST | 99.6% | 98.7%, 75.5%, 85.0%, 1.3% | 99.5%, 81.5%, 85.5%, 0.7% | 99.3%, 88.0%, 92.0%, 2.0% | 99.5%, 95.5%, 92.5%, 2.3%
Fashion-MNIST | 93.3% | 92.0%, 85.0%, 60.5%, 9.6% | 92.8%, 92.0%, 74.5%, 11.6% | 91.4%, 96.5%, 85.5%, 54.6% | 93.1%, 95.5%, 86.0%, 54.9%
CIFAR-10 | 91.5% | 88.3%, 91.5%, 74.5%, 19.8% | 90.8%, 88.5%, 79.5%, 21.0% | 88.8%, 96.0%, 95.0%, 56.0% | 88.8%, 92.5%, 91.5%, 53.0%
CIFAR-100 | 67.7% | 59.9%, 90.5%, 88.0%, 23.6% | 65.4%, 90.0%, 87.0%, 23.3% | 58.7%, 99.0%, 98.0%, 35.0% | 63.8%, 92.5%, 97.0%, 33.3%
VirusShare | 97.4% | 97.0%, 65.0%, 88.0%, 6.4% | 97.2%, 81.5%, 86.5%, 6.5% | 96.8%, 99.5%, 100%, 9.4% | 97.3%, 100%, 100%, 19.5%
Penn-Fudan-Pedestrian | 0.79 | did not converge | 0.79, 90.0%, 54.5%, 0.70 | did not converge | 0.78, 100%, 100%, 0.78
VOC | 0.69 | did not converge | 0.67, 74.0%, 98.0%, 0.65 | did not converge | 0.69, 100%, 100%, 0.68
IMDb | 85.0% | 67.3%, 66.8%, 83.5%, 12.0% | 85.0%, 66.0%, 86.3%, 12.2% | 69.2%, 81.3%, 88.3%, 29.5% | 85.0%, 80.0%, 90.8%, 30.5%
SST | 75.4% | 71%, 77.3%, 95.8%, 12.5% | 75.4%, 62.5%, 95.0%, 13.0% | 70.8%, 90.5%, 98.3%, 29.4% | 75.4%, 86.8%, 99.0%, 31.9%
Table 4: Fluctuation of the accuracy of the host's watermarking branch against the number of overwriting epochs.

Dataset | 50 epochs | 150 epochs | 250 epochs | 350 epochs
MNIST | 1.0% | 1.5% | 1.5% | 2.0%
Fashion-MNIST | 2.0% | 2.5% | 2.5% | 2.5%
CIFAR-10 | 4.5% | 4.5% | 4.5% | 4.5%
CIFAR-100 | 0.0% | 0.5% | 0.9% | 0.9%
VirusShare | 0.0% | 0.5% | 0.5% | 0.5%
Penn-Fudan-Pedestrian | 0.5% | 1.0% | 1.0% | 1.0%
VOC | 1.3% | 2.0% | 2.1% | 2.1%
IMDb | 3.0% | 3.0% | 3.0% | 3.0%
SST | 2.5% | 3.0% | 3.0% | 2.5%

For the first seven image datasets, $c_{\text{WM}}$ was a two-layer perceptron that took the outputs of the first three layers of the ResNet as input. A QR code encoder was adopted to generate $\mathcal{D}_{\text{WM}}^{\texttt{key}}$. For the NLP datasets, the network took the structure in Fig. 2.

Throughout the experiments we set $N=600$. To set the verification threshold $\gamma$ in Algorithm 2, we tested the classification accuracy of $f_{\text{WM}}$ across the nine datasets over 5,000 watermark datasets $\mathcal{D}_{\text{WM}}$ different from the host's. The result is visualized in Fig. 3, from which we observe that in almost all cases $p$ fell in $[0.425, 0.575]$. We selected $\gamma=0.7$, so the probability of successful piracy is less than $2.69\times 10^{-8}$ with $\lambda=0.34$ in the Chernoff bound.

Figure 3: The empirical distribution of $p$.

We conducted three tuning attacks (FT, NP, and FP) as well as the overwriting attack against the proposed watermarking framework.

4.2 Ablation Study

To examine the efficacy of $R_{\text{func}}$ and $R_{\text{DA}}$, we compared the performance of the model under different combinations of the two regularizers. We are interested in four metrics: (1) the performance of $M_{\text{WM}}$ on $\mathcal{T}_{\text{primary}}$, (2) the performance of $f_{\text{WM}}$ on $\mathcal{T}_{\text{WM}}$ after FT, (3) the performance of $f_{\text{WM}}$ on $\mathcal{T}_{\text{WM}}$ after FP, and (4) the decline of the performance of $M_{\text{WM}}$ on $\mathcal{T}_{\text{primary}}$ when NP has reduced $f_{\text{WM}}$'s accuracy on $\mathcal{T}_{\text{WM}}$ below $\gamma$. The first metric reflects the decline of a model's performance after being watermarked. The second and third metrics measure the watermark's robustness against an adversary's tuning. The last metric reflects the decrease in the model's utility when an adversary is determined to erase the watermark using NP. The model for each dataset was trained by minimizing the MTL loss defined by (14), where we adopted FT, NP, and FP for tuning and chose the optimal $\lambda_{1}$ and $\lambda_{2}$ by grid search. Then we attacked each model by FT with a smaller learning rate, FP [31], and NP. The results are collected in Table 3.

We observe that by using R_{\text{func}} and R_{\text{DA}}, it is possible to preserve the watermarked model's performance on the primary task and on the watermarking task simultaneously. We therefore suggest that, whenever possible, both regularizers be incorporated into training.

4.3 Watermark Detection

To illustrate the security against watermark detection, we considered the property inference attack [15]. The parameter distributions of a clean model, a model watermarked by our method, and a model watermarked by a weight-based method [12], all for CIFAR-10, are visualized in Fig. 4 and Fig. 5.

Figure 4: The difference between M_{\text{clean}} and a weight-based watermarked model [12].
Figure 5: The difference between M_{\text{WM}} and M_{\text{clean}}.

In these experiments we adopted \lambda_{1}=0.05. Unlike the weight-based watermarking method analyzed in [15], our method did not produce a significant difference between the parameter distributions of the watermarked and clean models. Hence an adversary can hardly distinguish a model watermarked by the MTL-based method from a clean one.
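A first-order version of this comparison can be scripted as follows. The actual property inference attack of [15] trains a meta-classifier on permutation-invariant representations; the sketch below, assuming two PyTorch models of the same architecture, only contrasts the raw weight distributions, which is what Fig. 4 and Fig. 5 visualize.

    # Compare the flattened weight distributions of a clean and a watermarked
    # model with a two-sample Kolmogorov-Smirnov statistic (small values mean
    # the watermark leaves no obvious statistical footprint).
    import torch
    from scipy.stats import ks_2samp


    def flat_weights(model: torch.nn.Module) -> torch.Tensor:
        return torch.cat([p.detach().flatten() for p in model.parameters()])


    def weight_distribution_gap(clean: torch.nn.Module, marked: torch.nn.Module) -> float:
        stat, _ = ks_2samp(flat_weights(clean).numpy(), flat_weights(marked).numpy())
        return float(stat)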

Table 5: The comparison between our method and [27, 60] with respect to: (1) the model's performance on the primary task, (2) the accuracy of the watermarking task/backdoor after FP, and (3) the decline of the model's accuracy on the primary task when NP erases the watermark. The optimal method for each dataset with respect to each metric is highlighted.
Dataset Ours, R_{\text{func}} and R_{\text{DA}} Li et al. [27] Zhu et al. [60]
Primary FP NP Primary FP NP Primary FP NP
MNIST 99.5% 92.5% 2.2% 99.0% 14.5% 0.9% 98.8% 7.0% 1.4%
Fashion-MNIST 93.1% 86.0% 54.7% 92.5% 13.5% 17.5% 91.8% 11.3% 5.8%
CIFAR-10 88.8% 91.5% 50.3% 88.5% 14.5% 13.6% 85.0% 10.0% 17.1%
CIFAR-100 63.8% 97.0% 29.4% 63.6% 1.2% 5.5% 65.7% 0.8% 0.9%
VirusShare 97.3% 100% 9.6% 95.1% 8.8% 1.5% 96.3% 9.5% 1.1%

4.4 The Overwriting Attack

After adopting both regularizers, we performed the overwriting attack against the models for all nine tasks, where each model was embedded with a different key. In all cases the adversary's watermark could be successfully embedded into the model, as we predicted. The metric is the fluctuation of the accuracy of the host's watermarking branch on the watermarking task after overwriting, as indicated by (7). We recorded this fluctuation against the number of overwriting epochs; the results are collected in Table 4.
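The measurement behind Table 4 can be summarized by the following sketch, where wm_accuracy and overwrite are hypothetical callables standing in for the evaluation of the host's watermarking branch on its own key set and for the adversary's overwriting procedure; (7) gives the exact definition of the fluctuation.

    # Fluctuation of the host's watermarking branch after the adversary
    # overwrites the model with its own key for a given number of epochs.
    from typing import Callable


    def overwriting_fluctuation(
        model: object,
        wm_accuracy: Callable[[object], float],      # host branch accuracy on host's D_WM
        overwrite: Callable[[object, int], object],  # embeds the adversary's key
        epochs: int,
    ) -> float:
        before = wm_accuracy(model)
        after = wm_accuracy(overwrite(model, epochs))
        return abs(before - after)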

The impact of watermark overwriting is uniformly bounded by 4.5% in our settings, and the accuracy of the watermarking branch remained above the threshold \gamma=0.7. Combined with Table 3, we conclude that the MTL-based watermarking method is secure against watermark overwriting.

4.5 Comparison and Discussion

We implemented the watermarking methods in [60] and [27], both of which are backdoor-based methods that are level III secure against ownership piracy. We randomly generated 600 trigger samples for [60] and assigned them proper labels. For [27], we randomly selected Wonder Filter patterns and applied them to 600 randomly sampled images.

As a comparison, Table 5 lists the performance of their watermarked models on the primary task, the verification accuracy of their backdoors after FP (which damages backdoors more than FT does), and the decline of the watermarked models' performance when NP was adopted to invalidate the backdoors (i.e., when the accuracy on the backdoor triggers drops below 15%). We used ResNet-18 for all experiments and restricted the comparison to the image classification datasets, since backdoors are undefined for the other tasks.
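The NP column can be reproduced in spirit with the sketch below, which uses unstructured magnitude pruning from torch.nn.utils.prune as a stand-in for the neuron pruning operator, together with hypothetical primary_accuracy and watermark_accuracy evaluators; pruning proceeds until the watermark/backdoor accuracy falls below 15%, and the primary-task decline is reported.

    # Prune until the watermark (or backdoor) is invalidated, then report the
    # drop in primary-task accuracy (the "NP" column of Table 5).
    import torch.nn as nn
    import torch.nn.utils.prune as prune


    def primary_decline_under_np(model, primary_accuracy, watermark_accuracy,
                                 step: float = 0.05, threshold: float = 0.15) -> float:
        baseline = primary_accuracy(model)
        pruned_fraction = 0.0
        while watermark_accuracy(model) >= threshold and pruned_fraction < 1.0:
            for module in model.modules():
                if isinstance(module, (nn.Conv2d, nn.Linear)):
                    prune.l1_unstructured(module, name="weight", amount=step)
            pruned_fraction += step
        return baseline - primary_accuracy(model)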

We observe that our method achieved the best performance on all metrics. This is due to the following:

  1. Extra regularizers are adopted to explicitly meet the security requirements.

  2. The MTL-based watermark does not incorporate backdoors into the model, so adversarial modifications such as FP, which are designed to eliminate backdoors, can hardly remove our watermark.

  3. The MTL-based watermark relies on an extra module, c_{\text{WM}}, as the verifier. Since an adversary cannot tamper with this module, universal tunings such as NP have less impact.

Apart from these metrics, our proposal is better than other backdoor-based DNN watermarking methods since:

  1. Backdoor-based watermarking methods are not privacy-preserving.

  2. So far, backdoor-based watermarking methods can only be applied to image classification DNNs, which challenges their generality.

  3. It is hard to design adaptive backdoors against specific screening algorithms, whereas the MTL-based watermark can easily adapt to a new tuning operator by incorporating it into R_{\text{DA}}.

5 Conclusion

This paper presents an MTL-based DNN watermarking scheme for ownership verification. We formally summarize the basic security requirements for DNN watermarks and raise the privacy concern. We then propose to embed the watermark as an additional task parallel to the primary task. The proposed scheme explicitly meets various security requirements by using corresponding regularizers. These regularizers, together with the design of the watermarking task, grant the MTL-based DNN watermarking scheme tractable security. With a decentralized consensus protocol, the framework is secure against the attacks considered in this paper.

We look forward to using cryptographic protocols such as zero-knowledge proofs to improve the ownership verification process, so that one secret key can be used for multiple notarizations.

Acknowledgments

This work receives support from anonymous reviewers.

Availability

Materials of this paper, including source code and part of the dataset, are available at http://github.com/a_new_account/xxx.

References

  • [1] VirusShare dataset. https://virusshare.com/.
  • [2] VOC dataset. http://host.robots.ox.ac.uk/pascal/VOC/voc2012/.
  • [3] Yossi Adi, Carsten Baum, Moustapha Cisse, Benny Pinkas, and Joseph Keshet. Turning your weakness into a strength: Watermarking deep neural networks by backdooring. In 27th USENIX Security Symposium (USENIX Security 18), pages 1615–1631, 2018.
  • [4] M. Arnold, M. Schmucker, and S. Wolthusen. Digital Watermarking and Content Protection: Techniques and Applications. 2003.
  • [5] Lejla Batina, Shivam Bhasin, Dirmanto Jap, and Stjepan Picek. CSI NN: Reverse engineering of neural network architectures through electromagnetic side channel. In 28th USENIX Security Symposium (USENIX Security 19), pages 515–532, 2019.
  • [6] Zhaowei Cai and Nuno Vasconcelos. Cascade r-cnn: Delving into high quality object detection. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 6154–6162, 2018.
  • [7] Rich Caruana. Multitask learning. Machine learning, 28(1):41–75, 1997.
  • [8] Miguel Castro, Barbara Liskov, et al. Practical byzantine fault tolerance. In OSDI, volume 99, pages 173–186, 1999.
  • [9] Xinyun Chen, Wenxiao Wang, Chris Bender, Yiming Ding, Ruoxi Jia, Bo Li, and Dawn Song. Refit: a unified watermark removal framework for deep learning systems with limited data. arXiv preprint arXiv:1911.07205, 2019.
  • [10] Jen-Tzung Chien. Deep bayesian natural language processing. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: Tutorial Abstracts, pages 25–30, 2019.
  • [11] Qianfeng Chu, Gongshen Liu, and Xinyu Zhu. Visualization feature and cnn based homology classification of malicious code. Chinese Journal of Electronics, 29(1):154–160, 2020.
  • [12] Bita Darvish Rouhani, Huili Chen, and Farinaz Koushanfar. Deepsigns: an end-to-end watermarking framework for ownership protection of deep neural networks. In Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, pages 485–497, 2019.
  • [13] Li Deng. The mnist database of handwritten digit images for machine learning research [best of the web]. IEEE Signal Processing Magazine, 29(6):141–142, 2012.
  • [14] Mohamed Elhoseny, Mahmoud Mohamed Selim, and K Shankar. Optimal deep learning based convolution neural network for digital forensics face sketch synthesis in internet of things (iot). International Journal of Machine Learning and Cybernetics, pages 1–12, 2020.
  • [15] Karan Ganju, Qi Wang, Wei Yang, Carl A Gunter, and Nikita Borisov. Property inference attacks on fully connected neural networks using permutation invariant representations. In Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, pages 619–633, 2018.
  • [16] Xiquan Guan, Huamin Feng, Weiming Zhang, Hang Zhou, Jie Zhang, and Nenghai Yu. Reversible watermarking in deep convolutional neural networks for integrity authentication. In Proceedings of the 28th ACM International Conference on Multimedia, pages 2273–2280, 2020.
  • [17] Jian Guo, He He, Tong He, Leonard Lausen, Mu Li, Haibin Lin, Xingjian Shi, Chenguang Wang, Junyuan Xie, Sheng Zha, et al. Gluoncv and gluonnlp: Deep learning in computer vision and natural language processing. Journal of Machine Learning Research, 21(23):1–7, 2020.
  • [18] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778, 2016.
  • [19] Tong He, Zhi Zhang, Hang Zhang, Zhongyue Zhang, Junyuan Xie, and Mu Li. Bag of tricks for image classification with convolutional neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 558–567, 2019.
  • [20] Weizhe Hua, Zhiru Zhang, and G Edward Suh. Reverse engineering convolutional neural networks through side-channel information leaks. In 2018 55th ACM/ESDA/IEEE Design Automation Conference (DAC), pages 1–6. IEEE, 2018.
  • [21] Zhiheng Huang, Wei Xu, and Kai Yu. Bidirectional lstm-crf models for sequence tagging. arXiv preprint arXiv:1508.01991, 2015.
  • [22] Alex Kendall, Yarin Gal, and Roberto Cipolla. Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 7482–7491, 2018.
  • [23] Manish Kesarwani, Bhaskar Mukhoty, Vijay Arya, and Sameep Mehta. Model extraction warning in mlaas paradigm. In Proceedings of the 34th Annual Computer Security Applications Conference, pages 371–380, 2018.
  • [24] Thorsten Knoll. Adapting Kerckhoffs's principle. Advanced Microkernel Operating Systems, pages 93–97, 2018.
  • [25] Alex Krizhevsky, Geoffrey Hinton, et al. Learning multiple layers of features from tiny images. 2009.
  • [26] Erwan Le Merrer, Patrick Perez, and Gilles Trédan. Adversarial frontier stitching for remote neural network watermarking. Neural Computing and Applications, 32(13):9233–9244, 2020.
  • [27] Huiying Li, Emily Willson, Haitao Zheng, and Ben Y Zhao. Persistent and unforgeable watermarks for deep neural networks. arXiv preprint arXiv:1910.01226, 2019.
  • [28] Yige Li, Nodens Koren, Lingjuan Lyu, Xixiang Lyu, Bo Li, and Xingjun Ma. Neural attention distillation: Erasing backdoor triggers from deep neural networks. arXiv preprint arXiv:2101.05930, 2021.
  • [29] Zheng Li, Chengyu Hu, Yang Zhang, and Shanqing Guo. How to prove your model belongs to you: a blind-watermark based framework to protect intellectual property of dnn. In Proceedings of the 35th Annual Computer Security Applications Conference, pages 126–137, 2019.
  • [30] Ji Lin, Wei-Ming Chen, Yujun Lin, Chuang Gan, Song Han, et al. Mcunet: Tiny deep learning on iot devices. Advances in Neural Information Processing Systems, 33:1–12, 2020.
  • [31] Kang Liu, Brendan Dolan-Gavitt, and Siddharth Garg. Fine-pruning: Defending against backdooring attacks on deep neural networks. In International Symposium on Research in Attacks, Intrusions, and Defenses, pages 273–294. Springer, 2018.
  • [32] Xuankai Liu, Fengting Li, Bihan Wen, and Qi Li. Removing backdoor-based watermarks in neural networks with limited data. arXiv preprint arXiv:2008.00407, 2020.
  • [33] Yingqi Liu, Wen-Chuan Lee, Guanhong Tao, Shiqing Ma, Yousra Aafer, and Xiangyu Zhang. Abs: Scanning neural networks for back-doors by artificial brain stimulation. In Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security, pages 1265–1282, 2019.
  • [34] Andrew Maas, Raymond E Daly, Peter T Pham, Dan Huang, Andrew Y Ng, and Christopher Potts. Learning word vectors for sentiment analysis. In Proceedings of the 49th annual meeting of the association for computational linguistics: Human language technologies, pages 142–150, 2011.
  • [35] Ryota Namba and Jun Sakuma. Robust watermarking of neural network with exponential weighting. In Proceedings of the 2019 ACM Asia Conference on Computer and Communications Security, pages 228–240, 2019.
  • [36] Niall O’Mahony, Sean Campbell, Anderson Carvalho, Suman Harapanahalli, Gustavo Velasco Hernandez, Lenka Krpalkova, Daniel Riordan, and Joseph Walsh. Deep learning vs. traditional computer vision. In Science and Information Conference, pages 128–144. Springer, 2019.
  • [37] Diego Ongaro and John Ousterhout. In search of an understandable consensus algorithm. In 2014 USENIX Annual Technical Conference (USENIX ATC 14), pages 305–319, 2014.
  • [38] Sinno Jialin Pan and Qiang Yang. A survey on transfer learning. IEEE Transactions on knowledge and data engineering, 22(10):1345–1359, 2009.
  • [39] Jeffrey Pennington, Richard Socher, and Christopher D Manning. Glove: Global vectors for word representation. In Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP), pages 1532–1543, 2014.
  • [40] Antonio Polino, Razvan Pascanu, and Dan Alistarh. Model compression via distillation and quantization. arXiv preprint arXiv:1802.05668, 2018.
  • [41] Han Qiu, Qinkai Zheng, Tianwei Zhang, Meikang Qiu, Gerard Memmi, and Jialiang Lu. Towards secure and efficient deep learning inference in dependable iot systems. IEEE Internet of Things Journal, page preprint, 2020.
  • [42] Vladimir Rudskoy. On zero practical significance of "Key recovery attack on full GOST block cipher with zero time and memory". IACR Cryptol. ePrint Arch., 111:1–24, 2010.
  • [43] Ozan Sener and Vladlen Koltun. Multi-task learning as multi-objective optimization. In Advances in Neural Information Processing Systems, pages 527–538, 2018.
  • [44] Masoumeh Shafieinejad, Jiaqi Wang, Nils Lukas, Xinda Li, and Florian Kerschbaum. On the robustness of the backdoor-based watermarking in deep neural networks. arXiv preprint arXiv:1906.07745, 2019.
  • [45] Connor Shorten and Taghi M Khoshgoftaar. A survey on image data augmentation for deep learning. Journal of Big Data, 6(1):60–107, 2019.
  • [46] Richard Socher, Alex Perelygin, Jean Wu, Jason Chuang, Christopher D Manning, Andrew Y Ng, and Christopher Potts. Recursive deep models for semantic compositionality over a sentiment treebank. In Proceedings of the 2013 conference on empirical methods in natural language processing, pages 1631–1642, 2013.
  • [47] Frederick Tung, Srikanth Muralidharan, and Greg Mori. Fine-pruning: Joint fine-tuning and compression of a convolutional network with bayesian optimization. arXiv preprint arXiv:1707.09102, 2017.
  • [48] Yusuke Uchida, Yuki Nagai, Shigeyuki Sakazawa, and Shin’ichi Satoh. Embedding watermarks into deep neural networks. In Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval, pages 269–277, 2017.
  • [49] B. Wang, Y. Yao, S. Shan, H. Li, B. Viswanath, H. Zheng, and B. Y. Zhao. Neural cleanse: Identifying and mitigating backdoor attacks in neural networks. In 2019 IEEE Symposium on Security and Privacy (SP), pages 707–723, 2019.
  • [50] Bolun Wang, Yuanshun Yao, Shawn Shan, Huiying Li, Bimal Viswanath, Haitao Zheng, and Ben Y Zhao. Neural cleanse: Identifying and mitigating backdoor attacks in neural networks. In 2019 IEEE Symposium on Security and Privacy (SP), pages 707–723. IEEE, 2019.
  • [51] Liming Wang, Jianbo Shi, Gang Song, and I-fan Shen. Object detection combining recognition and segmentation. In Asian conference on computer vision, pages 189–199. Springer, 2007.
  • [52] Tianhao Wang and Florian Kerschbaum. Robust and undetectable white-box watermarks for deep neural networks. arXiv preprint arXiv:1910.14268, 2019.
  • [53] Thomas Wolf, Julien Chaumond, Lysandre Debut, Victor Sanh, Clement Delangue, Anthony Moi, Pierric Cistac, Morgan Funtowicz, Joe Davison, Sam Shleifer, et al. Transformers: State-of-the-art natural language processing. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pages 38–45, 2020.
  • [54] Han Xiao, Kashif Rasul, and Roland Vollgraf. Fashion-mnist: a novel image dataset for benchmarking machine learning algorithms, 2017.
  • [55] G. Xu, H. Li, Y. Zhang, X. Lin, R. H. Deng, and X. Shen. A deep learning framework supporting model ownership protection and traitor tracing. In 2020 IEEE 26th International Conference on Parallel and Distributed Systems (ICPADS), pages 438–446, 2020.
  • [56] Ziqi Yang, Jiyi Zhang, Ee-Chien Chang, and Zhenkai Liang. Neural network inversion in adversarial setting via background knowledge alignment. In Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security, pages 225–240, 2019.
  • [57] Yuanshun Yao, Huiying Li, Haitao Zheng, and Ben Y Zhao. Latent backdoor attacks on deep neural networks. In Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security, pages 2041–2055, 2019.
  • [58] Dong Yin, Raphael Gontijo Lopes, Jon Shlens, Ekin Dogus Cubuk, and Justin Gilmer. A fourier perspective on model robustness in computer vision. Advances in Neural Information Processing Systems, 32:13276–13286, 2019.
  • [59] Jialong Zhang, Zhongshu Gu, Jiyong Jang, Hui Wu, Marc Ph Stoecklin, Heqing Huang, and Ian Molloy. Protecting intellectual property of deep neural networks with watermarking. In Proceedings of the 2018 on Asia Conference on Computer and Communications Security, pages 159–172, 2018.
  • [60] Renjie Zhu, Xinpeng Zhang, Mengte Shi, and Zhenjun Tang. Secure neural network watermarking protocol against forging attack. EURASIP Journal on Image and Video Processing, 2020(1):1–12, 2020.