MetaV: A Meta-Verifier Approach to Task-Agnostic Model Fingerprinting
Abstract.
Protecting the intellectual property (IP) of deep neural networks (DNNs) has become an urgent concern for IT corporations. For model piracy forensics, previous model fingerprinting schemes commonly construct adversarial examples for the owner's model as the fingerprint, and verify whether a suspect model is pirated from the original model by matching the two models' behavioral patterns on the fingerprint examples. However, these methods heavily rely on the characteristics of classification tasks, which inhibits their application to more general scenarios. To address this issue, we present MetaV, the first task-agnostic model fingerprinting framework, which enables fingerprinting on a much wider range of DNNs independent of the downstream learning task and exhibits strong robustness against a variety of ownership obfuscation techniques. Specifically, we generalize previous schemes into two critical design components in MetaV: the adaptive fingerprint and the meta-verifier, which are jointly optimized such that the meta-verifier learns to determine whether a suspect model is stolen based on the concatenated outputs of the suspect model on the adaptive fingerprint. As the key to being task-agnostic, the full process makes no assumption on the model internals in the ensemble as long as they have the same input and output dimensions. Spanning classification, regression and generative modeling, extensive experimental results validate the substantially improved performance of MetaV over state-of-the-art fingerprinting schemes and demonstrate the enhanced generality of MetaV in providing task-agnostic fingerprinting. For example, on fingerprinting ResNet-18 trained for skin cancer diagnosis, MetaV achieves simultaneously 100% true positives and 100% true negatives on a diverse test set of suspect models, achieving about 220% relative improvement in ARUC in comparison to the optimal baseline.
1. Introduction
In the past decades, deep learning has found wide application in a variety of mission-critical scenarios in the real world, including autonomous driving (Cao et al., 2019), finance (Heaton et al., 2016), intelligent healthcare (Esteva et al., 2017), and many more. In the ever-evolving trend of applying deep learning in the IT industry, increasingly more high-end computing power and massive amounts of well-annotated data are devoted to the construction of deep neural networks (DNNs) (Real et al., 2019; Devlin et al., 2019; Zhou et al., 2021), which are later deployed as prediction APIs, i.e., Machine-Learning-as-a-Service (MLaaS), to provide intelligent services for profit. Considering the substantial training costs, many IT corporations, as model owners, have become aware of the importance of protecting the confidentiality of these well-trained DNNs as an inseparable part of their intellectual property (IP). Threateningly, even with careful access control, an attacker can still pirate the working DNN behind an online intelligent service by conducting system-level (Yan et al., 2020; Jeong et al., 2021) or algorithmic attacks (Tramèr et al., 2016; Yu et al., 2020).
Orthogonal to the advances in protecting DNNs against model stealing (Juuti et al., 2019), model watermarking and model fingerprinting are two fast-developing techniques for model piracy forensics. In model watermarking, the model owner embeds a secret into his/her owned model (i.e., the target model). Once the ownership of a DNN model is in doubt (i.e., the suspect model), a trusted third party verifies the existence of the exclusively known secret in the suspect model to determine the actual ownership. Starting from Uchida et al. (2017), previous works embed different types of secrets (e.g., a specific function or a specific parameter pattern) into various parts of a DNN, which we briefly survey in Section 2. However, because model watermarking unavoidably modifies the original parameters of a well-trained DNN for secret embedding, the otherwise optimal accuracy is slightly degraded, causing an unacceptable trade-off for mission-critical tasks in healthcare and traffic (Cao et al., 2021).
Complementary to model watermarking, model fingerprinting is a passive forensic technique against model piracy, which in general tests whether a certain fingerprint of the target model is present in a suspect model, helping collect essential evidence of model piracy in the wild before filing a lawsuit. As a key difference from model watermarking, the fingerprint is innate to the target model rather than embedded into it. In other words, no modification of the target model is conducted during fingerprinting, which provably preserves the normal utility of well-trained DNNs. Thanks to this desirable characteristic, model fingerprinting has arisen as a booming direction in model protection over the last year, attracting increasing research efforts from different backgrounds (Cao et al., 2021; Li et al., 2021; Wang and Chang, 2021; Lukas et al., 2021).
Following the fingerprinting framework in Cao et al. (2021), previous schemes mostly focus on fingerprinting classifiers: they construct a special set of adversarial examples (Szegedy et al., 2014), i.e., normal examples added with human-imperceptible perturbations which cause misclassification by the target classifier, as the fingerprint, and verify whether a suspect model is indeed stolen from the original model by matching the behavioral pattern on the fingerprint examples, e.g., the predicted labels (Lukas et al., 2021; Wang and Chang, 2021) or the similarity of probability vectors (Li et al., 2021). Despite their pioneering contributions to model IP protection, existing schemes are limited to classification and cannot fingerprint DNNs for other important downstream tasks, mainly because they rely on concepts like adversarial examples and classification boundaries which have no direct counterparts in other typical learning tasks such as regression and generative modeling. With recent years witnessing the fast-growing distribution, deployment and redistribution of DNNs in today's deep learning ecosystem, how to conduct forensics on the improper reuse and illegal piracy of a more general set of DNNs poses an urgent open challenge.
1.1. Our Work
In this paper, we present a meta-verifier approach to task-agnostic model fingerprinting (dubbed MetaV), which for the first time enables fingerprinting on a much wider range of DNNs independent of the downstream learning task, and by design achieves robustness against a variety of ownership obfuscation techniques possibly adopted by the adversary.
To realize task-agnostic model fingerprinting, we generalize the idea of using adversarial examples and the corresponding classification results for fingerprinting into two critical design components in MetaV, i.e., the adaptive fingerprint and the meta-verifier. Concisely, the adaptive fingerprint is a set of trainable inputs to the suspect model, whose concatenated outputs on these inputs are classified by the meta-verifier as True or False, where True implies the suspect model is indeed stolen (i.e., a positive suspect model), and False implies the suspect model is independent of the original model (i.e., a negative suspect model).
To implement the design principles above, the adaptive fingerprint and the meta-verifier are jointly optimized on an ensemble composed of the target model and the positive and negative suspect models, which are virtually generated by MetaV during the fingerprint construction phase. Specifically, the positive suspect models in the ensemble are crafted by post-processing the target model with a number of popular ownership obfuscation techniques, e.g., compression (Han et al., 2015; Li et al., 2017), fine-tuning, partial retraining and distillation (Hinton et al., 2015), while the negative suspect models are independently trained from scratch on learning tasks similar to the target model's. As the full construction process of MetaV makes no assumption on the model internals or functions in the ensemble as long as they have the same input and output dimensions, MetaV is applicable independent of the downstream task for which the DNN is designed. Moreover, by permitting more types of obfuscation techniques in producing the stolen models for the model ensemble, our proposed MetaV is by construction robust against a diverse set of existing obfuscation techniques, with the potential to evolve along with future adversarial techniques.
In summary, we mainly make the following contributions:
• We present MetaV, the first task-agnostic fingerprinting framework with adaptive robustness against a variety of ownership obfuscation techniques, to substantially advance the cutting-edge model fingerprinting capability to a much broader set of DNNs for arbitrary downstream tasks.
• We generalize the existing fingerprinting schemes based on adversarial examples to a more universal fingerprinting framework based on the adaptive fingerprint and the meta-verifier, which are jointly optimized on an ensemble of the target model and the positive and negative suspect models crafted by the model owner, to serve as a highly effective and robust fingerprint for the target model.
• We extensively evaluate the performance of MetaV on practical scenarios spanning classification, regression and generative modeling. Besides the unique contribution of MetaV in providing task-agnostic fingerprinting, MetaV brings noticeable improvement over all the state-of-the-art fingerprinting schemes on DNN classifiers. For example, on fingerprinting ResNet-18 (He et al., 2016) trained for skin cancer diagnosis (Esteva et al., 2017), MetaV achieves simultaneously 100% true positives and 100% true negatives on a diverse test set of suspect models, with about 220% relative improvement in ARUC compared to the optimal baseline.
2. Related Works
2.1. Model Fingerprinting
Recently, a number of fingerprinting schemes, mainly based on constructing different types of adversarial examples as the model fingerprint, have been proposed to protect the intellectual property of DNN classifiers. For example, Cao et al. (2021) propose IPGuard, one of the earliest fingerprinting schemes, which finds adversarial examples near the decision boundary of the target classifier. The key assumption of IPGuard is that the target DNN classifier can be uniquely represented by its decision boundary, which is more similar to the classification boundary of a positive suspect model than to that of a negative one. Different from IPGuard, Lukas et al. (2021) and Zhao et al. (2020) independently propose to extract so-called conferrable adversarial examples from the ensemble of the target classifier and a set of locally trained suspect classifiers. These conferrable adversarial examples, which transfer much better to the positive suspect models than to negative ones, can be regarded as a unique link between the target model and the positive suspect models. Besides, Wang and Chang (2021) utilize the geometric characteristics inherited in the DeepFool algorithm to construct adversarial examples as the fingerprint, while Li et al. (2021) leverage the similarity between models in terms of their probability vectors on test inputs for piracy detection. More detailed surveys can be found in (Boenisch, 2020; Regazzoni et al., 2021). In this work, our proposed MetaV generalizes the aforementioned fingerprinting techniques by abstracting the usage of adversarial examples and classification results into the adaptive fingerprint and the trainable meta-verifier, which is applicable to arbitrary DNN models in a task-agnostic way.
2.2. Model Watermarking
Orthogonal to model fingerprinting, model watermarking embeds a watermark into the trained model before it is released, which potentially sacrifices the utility of the model. Previous works on model watermarking explore various types of watermarks such as secret bit strings (Uchida et al., 2017; Rouhani et al., 2018), generated serial numbers (Xu et al., 2020) and unrelated or slightly modified sample sets (Adi et al., 2018; Zhang et al., 2018). These identifying codes are then secretly encoded into the least significant bits of the weights (Uchida et al., 2017), or into the distribution of outputs at the intermediate (Rouhani et al., 2018) or final layers (Adi et al., 2018; Zhang et al., 2018; Xu et al., 2020).
3. Security Settings
3.1. Backgrounds & Notions
Model fingerprinting is a multi-party security game among a model owner, a verifier and an attacker. Initially, a model owner devotes computing power and well-curated training data to building its own DNN for a certain downstream task, making the obtained model an inseparable part of the model owner's IP. Following the nomenclature in Cao et al. (2021), we refer to this model as the target model. In today's deep learning ecosystem, the model owner can deploy the target model on a third-party platform like Amazon AWS as a prediction API to gain monetary profits. However, the profits may also serve as an incentive for potential attackers to conduct model piracy via, e.g., software/hardware vulnerabilities (Yan et al., 2020; Jeong et al., 2021), social engineering and algorithmic attacks (Tramèr et al., 2016; Yu et al., 2020), which essentially infringes the IP of the model owner.
As a rescue, the model owner can delegate a verifier, usually played by a trusted third party or the model owner him/herself, to provide model fingerprinting services for model piracy forensics. In general, model fingerprinting determines whether a suspect model is pirated from the target model in two stages:
• Fingerprint Construction. At the construction stage, a certain type of model fingerprint encoding the essential characteristics of the target model is constructed.
• Fingerprint Verification. At the verification stage, the suspect model is attested via the prediction API (i.e., black-box access) to determine whether and with what confidence (i.e., matching rate) the fingerprint is also present in the suspect model.
3.2. Threat Model
We mainly consider the following threat model in this paper.
• Attacker's Capability. We assume the attacker applies a variety of model post-processing techniques to obfuscate the ownership of the stolen model (detailed in the subsequent part) after he/she successfully steals the target model from an online prediction API. Such an obfuscated model is called a positive suspect model. Correspondingly, a suspect model independently trained by another honest model owner is called a negative suspect model. The attacker is assumed to answer any queries to his/her prediction API, as more queries served by the API bring more monetary profits.
• Verifier's Capability. Following, e.g., Cao et al. (2021), we assume the verifier has white-box access to the target model but only black-box access to the suspect models via their prediction APIs. Relaxing their assumptions, we make no assumption on the internal architecture or the type of downstream task of the target model.
3.3. Adversarial Techniques
Integrating existing ownership obfuscation techniques studied in previous works, we mainly cover the following classes of adversarial techniques which the attacker is likely to adopt.
• Model Compression: Compression-based obfuscation adopts weight (filter) pruning (Li et al., 2017) to remove a certain ratio of small weights (filters) in a DNN, which largely preserves the utility of the obfuscated model on the learning task (Han et al., 2015) and inhibits heuristic-based fingerprinting from verifying model ownership based on parameter comparison (Cao et al., 2021).
• Fine-Tuning & Partial Retraining: To obfuscate the behavioral pattern of a DNN, attackers may resume the training of the stolen model on public data collected from a domain similar to the training data. Specifically, to fine-tune the last layers of a trained DNN, the parameters of the last layers are further updated according to the learning objective with the other layers fixed. In comparison, during partial retraining, the parameters of the last layers are first randomly re-initialized before the training is resumed. Due to the non-convexity of deep learning (Choromańska et al., 2015), both the fine-tuned and partially retrained models may fall into different local optima, preserving the original utility but exhibiting divergent prediction behaviors (Wang and Chang, 2021).
• Model Distillation: Distillation-based obfuscation adopts knowledge distillation strategies (Hinton et al., 2015; Gou et al., 2021) by viewing the stolen model or the corresponding prediction API as the teacher model, and a DNN of a different architecture as the student model (Li et al., 2021). Via distillation, the learned knowledge in the target model is inherited by the student model, which exacerbates the obfuscation of model ownership due to the transformed model architecture and the correspondingly altered predictive behaviors (Lukas et al., 2021). A minimal code sketch of two representative obfuscation operations follows this list.
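To make the techniques above concrete, below is a minimal PyTorch sketch of two obfuscation operations (global weight pruning and last-layer retraining), assuming a classifier `model` and a loader `public_loader` of domain-relevant labeled data; the function names and hyperparameters are illustrative rather than the exact configurations of our benchmark.

```python
# A hedged sketch of two ownership obfuscation operations; `model` and
# `public_loader` are assumed to be provided by the attacker.
import copy
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

def weight_prune(model, ratio):
    """Compression: globally prune the smallest `ratio` of weights by L1 magnitude."""
    pruned = copy.deepcopy(model)
    params = [(m, "weight") for m in pruned.modules()
              if isinstance(m, (nn.Conv2d, nn.Linear))]
    prune.global_unstructured(params, pruning_method=prune.L1Unstructured,
                              amount=ratio)
    return pruned

def retrain_last_layer(model, public_loader, epochs=5, lr=1e-3):
    """Partial retraining (RTLL): re-initialize the last layer, then resume
    training with all other layers frozen; fine-tuning (FTLL) would skip the
    re-initialization step."""
    pirated = copy.deepcopy(model)
    for p in pirated.parameters():
        p.requires_grad = False
    head = list(pirated.children())[-1]   # assumes the last child is the head layer
    head.reset_parameters()
    for p in head.parameters():
        p.requires_grad = True
    opt = torch.optim.Adam(head.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in public_loader:
            opt.zero_grad()
            loss_fn(pirated(x), y).backward()
            opt.step()
    return pirated
```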
4. Our Methodology for Task-Agnostic Fingerprinting
4.1. Overview of MetaV
As a generalization of previous fingerprinting schemes, our proposed MetaV abstracts the usage of adversarial examples and the corresponding prediction results as the fingerprint into two critical components respectively: (i) the adaptive fingerprint, i.e., a set of trainable inputs to the target/suspect model(s), which we denote as $\mathcal{X} = \{x_1, \ldots, x_N\}$ with $N$ called the number of fingerprint examples, and (ii) the meta-verifier, i.e., a binary classifier $V$ which takes the concatenated outputs of a suspect model on the adaptive fingerprint as its input and predicts whether the suspect model is positive or negative, i.e., outputs a point on the probability simplex over $\{\text{True}, \text{False}\}$. As illustrated in Fig. 1, the general pipeline of MetaV mainly consists of the following three key stages:
• Stage 1. (Model Ensemble Preparation) As Fig. 1(a) shows, we first craft a number of positive and negative suspect models from the target model with the aid of public data from the same domain as the owner's training data. At the end of this stage, we obtain the sets of positive and negative suspect models, denoted as $\mathcal{P}$ and $\mathcal{N}$ respectively.
• Stage 2. (Fingerprint Construction in MetaV) As Fig. 1(b) shows, we then jointly optimize the adaptive fingerprint and the meta-verifier to satisfy: for the target model and any suspect model in $\mathcal{P}$, the meta-verifier is trained to predict True on the concatenated outputs of the model on the examples in $\mathcal{X}$, and vice versa for any suspect model in $\mathcal{N}$. At the end of this stage, we obtain the optimized adaptive fingerprint and the corresponding meta-verifier, i.e., $\mathcal{X}^*$ and $V^*$ respectively. We call $(\mathcal{X}^*, V^*)$ a fingerprinting pair.
• Stage 3. (Fingerprint Verification in MetaV) Finally, with the optimized fingerprinting pair $(\mathcal{X}^*, V^*)$, we verify whether and with what matching rate a suspect model is a stolen version or an independently trained one by querying the prediction API with the fingerprint examples in $\mathcal{X}^*$. The received prediction results are then concatenated and input to the meta-verifier, which outputs the matching rate $p$. When $p$ is larger than a predefined threshold $\tau$, the verification process outputs True to claim possible model piracy behind the tested prediction API; otherwise the process outputs False to assert the fidelity of the suspect model. A minimal sketch of this stage is given after this list.
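The following sketch illustrates Stage 3, assuming a helper `query_api` that wraps the suspect model's black-box prediction API and returns tensors, and a trained meta-verifier `verifier`; both names are placeholders for illustration.

```python
# A minimal sketch of fingerprint verification (Stage 3).
import torch

@torch.no_grad()
def verify(fingerprint, verifier, query_api, tau=0.5):
    # fingerprint: tensor of N fingerprint examples, shape (N, *input_shape)
    outputs = [query_api(x.unsqueeze(0)) for x in fingerprint]  # black-box queries
    concat = torch.cat([o.flatten() for o in outputs])          # concatenated outputs
    p = verifier(concat.unsqueeze(0)).squeeze()                 # matching rate
    return bool(p > tau), float(p)
```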
In the following sections, we elaborate on the detailed methodology for the first two stages of MetaV.
4.2. Model Ensemble Preparation
To construct the adaptive fingerprint and the meta-verifier with simultaneously high robustness and uniqueness, MetaV first collaborates with the model owner to prepare a diverse set of positive and negative suspect models. Intuitively, with a more representative set of positive suspect models, the obtained fingerprinting pair stays robust against a wider range of ownership obfuscation techniques, resulting in higher true positives. Alternatively, more representative negative suspect models lower the probability that the learned fingerprinting pair is present in other irrelevant models, which therefore improves the true negatives of MetaV. Specifically, the positive and negative suspect models are constructed as follows.
• Prepare Positive Suspect Models. We derive a representative set of positive suspect models by randomly applying one or more of the common ownership obfuscation techniques mentioned in Section 3 to the target model $f$. The applied obfuscation techniques are recommended to cover a wide range of hyperparameter configurations for better robustness. For example, we apply weight and filter pruning to the target model with different pruning ratios. As a notation, we denote the full set of obfuscation techniques for preparing the positive suspect models as $\mathcal{T}$; therefore, we have $\mathcal{P} = \{T(f) : T \in \mathcal{T}\}$. Section 5 presents the detailed composition of $\mathcal{T}$ in the evaluation settings.
• Prepare Negative Suspect Models. We recommend three complementary sources to collect a diverse set of negative suspect models for the fingerprint construction stage of MetaV. First, MetaV may request the model owner to train a moderate number of relatively small-scale DNNs on the same training dataset as the target model. For IP protection of the target model, it would be reasonable for the model owner to devote additional computing power to collaborating with a trusted third-party verifier. Second, MetaV can also download a number of pretrained models from online sources (e.g., PyTorch Hub) and fine-tune these models on domain-relevant public data to serve as negative suspect models. Moreover, MetaV may consider incorporating a proportion of irrelevant publicly available models into the set of negative suspect models to further enhance the uniqueness of the obtained fingerprinting pair. We denote the prepared set of negative suspect models as $\mathcal{N}$.
4.3. Fingerprint Construction in MetaV
In this part, we detail the learning objective and the optimization algorithm for training the fingerprinting pair on the model ensemble prepared in the first stage. As Fig. 1(b) shows, the learning objective of MetaV is viewed as a binary classification problem. To formulate, we introduce an additional label $y_f$ for each model $f$, which takes value in $\{1, 0\}$ (literally, positive and negative respectively). Specifically, we label an arbitrary model $f \in \{f_{\mathrm{tgt}}\} \cup \mathcal{P}$ as $y_f = 1$ and an arbitrary model $f \in \mathcal{N}$ as $y_f = 0$. To supervise the adaptive fingerprint and the meta-verifier with the labels, we solve the learning objective:
(1) $\min_{\mathcal{X},\,\theta} \sum_{f \in \{f_{\mathrm{tgt}}\} \cup \mathcal{P} \cup \mathcal{N}} -y_f \log p_f - (1 - y_f) \log(1 - p_f)$
where $p_f = V(f(x_1) \oplus \cdots \oplus f(x_N); \theta)$ is the prediction from the meta-verifier (with parameters $\theta$) on the concatenated outputs of a model $f$ under test on the adaptive fingerprint. Intuitively, the learning objective above encourages the meta-verifier to output a higher probability of True than of False when the model under test is a positive suspect model or the target model, and vice versa for a negative suspect model.
As the learning objective above is fully differentiable w.r.t. the adaptive fingerprint and the parameters of the meta-verifier, we leverage off-the-shelf non-convex optimizers (e.g., Adam (Kingma and Ba, 2015)) for gradient-based optimization. However, we notice that it is resource-consuming to conduct back-propagation over the whole model ensemble in each optimization step. As an alternative, we reformulate the batched learning objective in (1) as a stochastic objective, with the randomness in a tuple of models uniformly sampled from $\{f_{\mathrm{tgt}}\}$, $\mathcal{P}$ and $\mathcal{N}$ in each iteration. Besides, we adopt the reparametrization trick in Carlini and Wagner (2017) to constrain the adaptive fingerprint in the problem space. Taking inputs in $[0,1]^{C \times H \times W}$, a common case in computer vision, for example, Algorithm 1 presents the details of the optimization algorithm.
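Below is a hedged PyTorch sketch of this joint optimization (cf. Algorithm 1), assuming image inputs; the ensemble lists, the verifier's interface and all hyperparameters are illustrative placeholders rather than the exact setup of Algorithm 1.

```python
# A sketch of fingerprint construction under the stochastic objective;
# `target`, `positives`, `negatives` and `verifier` are assumed given,
# with suspect model parameters frozen beforehand.
import random
import torch
import torch.nn.functional as F

def construct_fingerprint(target, positives, negatives, verifier,
                          n_examples=100, shape=(3, 224, 224),
                          steps=1000, lr=1e-3):
    # Reparametrization trick (Carlini & Wagner, 2017): optimize w freely,
    # map into the problem space [0,1] via x = (tanh(w) + 1) / 2.
    w = torch.randn(n_examples, *shape, requires_grad=True)
    opt = torch.optim.Adam([w] + list(verifier.parameters()), lr=lr)
    for _ in range(steps):
        x = (torch.tanh(w) + 1) / 2
        # Stochastic objective: sample one positive and one negative suspect
        # model per iteration instead of back-propagating over the ensemble.
        batch = [(target, 1.0),
                 (random.choice(positives), 1.0),
                 (random.choice(negatives), 0.0)]
        loss = 0.0
        for model, label in batch:
            out = model(x).flatten().unsqueeze(0)   # concatenated outputs
            p = verifier(out).squeeze()             # P[model is positive]
            loss = loss + F.binary_cross_entropy(p, torch.tensor(label))
        opt.zero_grad()
        loss.backward()
        opt.step()
    return ((torch.tanh(w) + 1) / 2).detach(), verifier  # fingerprinting pair
```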
5. Evaluation Settings
5.1. Scenarios and Datasets
Table 2 provides an overview of the three scenarios, i.e., skin cancer diagnosis (Yang et al. (2020), classification), warfarin dose prediction (Whirl-Carrillo et al. (2012), regression), and fashion generation (Xiao et al. (2017), generative modeling), covered in the evaluation sections. The datasets are concisely introduced below.
• Skin Cancer Diagnosis (abbrev. Skin). The first scenario covers the usage of a deep convolutional neural network (CNN) for skin cancer diagnosis. Following Yang et al. (2020), we train a ResNet-18 (He et al., 2016) as the target model on DermaMNIST (Yang et al., 2020), a multi-source dermatoscopic imaging dataset of common pigmented skin lesions. The input images are originally of size 28×28 and are upsampled to 224×224 to fit the input shape of the standard ResNet-18 architecture implemented in torchvision (pyt, [n. d.]). The task is a 7-class classification task.
• Warfarin Dose Prediction (abbrev. Warfarin). The second scenario covers the usage of a fully-connected network (FCN) for warfarin dose prediction, a safety-critical regression task that predicts the proper individualized warfarin dosing from the demographic and physiological records of patients (e.g., weight, age and genetics). We use the International Warfarin Pharmacogenetics Consortium (IWPC) dataset (Whirl-Carrillo et al., 2012), a public dataset of patient features which is widely used in research on automated warfarin dosing. Following Truda and Marais (2019), we use a three-layer multi-layer perceptron (MLP) with ReLU activations as the target model, denoting architectures by their layer widths (input-hidden-output). The target model learns to predict the value of the proper warfarin dose, a non-negative real-valued scalar.
• Fashion Generation (abbrev. Fashion). The final scenario covers generative modeling. We choose FashionMNIST (Xiao et al., 2017), which consists of 28×28 grayscale images of articles of clothing. We train a DCGAN-like architecture (Radford et al., 2016) for generative modeling on this task. We view the generator alone as the target model, as a well-trained generator better represents the IP of the model owner: it can be directly used to generate realistic images without the aid of the discriminator. The detailed DCGAN architecture we use is presented in Table 1.
Table 1. The DCGAN architecture for Fashion.
| Generator | nn.ConvTranspose2d(100, 128, 4, 1, 0, bias=False) |
| | nn.BatchNorm2d(128) |
| | nn.ReLU() |
| | nn.ConvTranspose2d(128, 64, 3, 2, 1, bias=False) |
| | nn.BatchNorm2d(64) |
| | nn.ReLU() |
| | nn.ConvTranspose2d(64, 32, 4, 2, 1, bias=False) |
| | nn.BatchNorm2d(32) |
| | nn.ReLU() |
| | nn.ConvTranspose2d(32, 1, 4, 2, 1, bias=False) |
| | nn.Tanh() |
| Discriminator | nn.Conv2d(1, 32, 4, 2, 1, bias=False) |
| | nn.LeakyReLU(0.2) |
| | nn.Conv2d(32, 64, 4, 2, 1, bias=False) |
| | nn.BatchNorm2d(64) |
| | nn.LeakyReLU(0.2) |
| | nn.Conv2d(64, 128, 3, 2, 1, bias=False) |
| | nn.BatchNorm2d(128) |
| | nn.LeakyReLU(0.2) |
| | nn.Conv2d(128, 1, 4, 1, 0, bias=False) |
| | nn.Sigmoid() |
Table 2. Overview of the three evaluation scenarios.
| Identifier | Type | Dataset | Target Model |
| Skin | Classification | DermaMNIST | ResNet-18 |
| Warfarin | Regression | IWPC Dataset | MLP |
| Fashion | Generative Modeling | FashionMNIST | DCGAN |
5.2. Fingerprinting Benchmarks
For each scenario, we construct a model benchmark composed of positive/negative suspect models, and randomly split the benchmark into two independent sets of suspect models for training and testing.
Constructing Positive Suspect Models. Following Cao et al. (2021) and Lukas et al. (2021), we apply a number of popular ownership obfuscation techniques with a variety of hyperparameter configurations on the target model to derive the positive suspect models:
(1) Compression: For weight pruning, we vary the ratio of pruned weights over an evenly spaced grid of values; for filter pruning, we similarly vary the ratio of pruned filters over an evenly spaced grid.
(2) Fine-Tuning & Partial Retraining: We consider four types of obfuscation in this category, i.e., fine-tuning/retraining the last layer and fine-tuning/retraining all layers. For both types of retraining, the last layer is first reset to a randomly initialized layer, after which the model is fine-tuned according to the configuration. We use the same number of epochs for fine-tuning and retraining.
(3) Distillation: For each target model, we select several diverse models with different architectures as the student model. For the ResNet-18 classifier, we follow the classical distillation algorithm in Hinton et al. (2015) to prepare the student model (a minimal sketch is given after this list). We do not consider other model distillation algorithms because most of them require access to the internals of the target model (i.e., the teacher) for distillation, which is implausible for an attacker who pirates the model from the prediction API. For the multi-layer perceptron (MLP) regressor and the DCGAN (Radford et al., 2016) generator, we implement the distillation algorithms in Clark et al. (2019) and Aguinaldo et al. (2019) respectively. For fine-tuning, partial retraining and distillation, we mutate the random seeds to produce multiple suspect models in each category.
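For the classification case, the sketch below shows one distillation step in the black-box setting considered here, where the student only observes the probability vectors returned by the pirated prediction API; this is a simplified variant of Hinton et al. (2015), since without teacher logits the temperature-scaled soft targets reduce to the API's probability outputs. All names are placeholders.

```python
# A hedged sketch of black-box distillation for the classification case.
import torch
import torch.nn.functional as F

def distill_step(student, x, api_probs, optimizer):
    # api_probs: soft labels (probability vectors) returned by the pirated API
    log_p = F.log_softmax(student(x), dim=1)
    # KL divergence between the student's prediction and the API's soft labels
    loss = F.kl_div(log_p, api_probs, reduction="batchmean")
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```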
Constructing Negative Suspect Models. To construct the negative suspect models, we use different random seeds to initialize models of different architectures. We then train the models from scratch on the original training data, on public data from a domain similar to the training set, and on other irrelevant datasets, to obtain a diverse benchmark of negative suspect models. Table 3 lists the composition of the suspect models for all three scenarios. For convenience, we use the following abbreviations: fine-tuning the last layer (FTLL), fine-tuning all layers (FTAL), retraining the last layer (RTLL), retraining all layers (RTAL), weight pruning (WP), filter pruning (FP). For constructing distillation-based positive suspect models and independently trained negative suspect models, we implement models of diverse architectures and incremental sizes for each of the three target models. For convenience, we index these models as S, M, L, XL, XXL. Specifically, these models are:
• Skin: CNN architectures of incremental sizes chosen from SqueezeNet (Iandola et al., 2016), AlexNet (Krizhevsky, 2014), ResNet-18 (He et al., 2016), DenseNet (Huang et al., 2017) and VGG (Simonyan and Zisserman, 2015).
• Warfarin: S: a three-layer MLP with the same architecture as the target model; M: a four-layer MLP; L: a five-layer MLP.
• Fashion: GAN generators of incremental sizes, including the MLP-based GAN detailed in Table 4.
Table 3. Composition of the suspect model benchmarks.
| | | Skin | Warfarin | Fashion |
| Positive Suspect Models | Fine-tuning: FTLL | | | |
| | Fine-tuning: FTAL | | | |
| | Partial Retraining: RTLL | | | |
| | Partial Retraining: RTAL | | | |
| | Compression: WP | | | |
| | Compression: FP | | N/A | |
| | Distillation: S | | | |
| | Distillation: M | | | |
| | Distillation: L | | | |
| | Distillation: XL | | N/A | N/A |
| | Distillation: XXL | | N/A | N/A |
| Negative Suspect Models | Independently Trained: S | | | |
| | Independently Trained: M | | | |
| | Independently Trained: L | | | |
| | Independently Trained: XL | | N/A | |
| | Independently Trained: XXL | | N/A | N/A |
| | Irrelevant Models | | | |
Table 4. The MLP-based GAN architecture used in the Fashion scenario.
| Generator | nn.Linear(100, 128) |
| | nn.ReLU() |
| | nn.Linear(, ) |
| | nn.ReLU() |
| | nn.Linear(, ) |
| | nn.ReLU() |
| | nn.Linear(, 28x28) |
| | nn.Sigmoid() |
| Discriminator | nn.Linear(28x28, ) |
| | nn.ReLU() |
| | nn.Linear(, ) |
| | nn.ReLU() |
| | nn.Linear(, ) |
| | nn.ReLU() |
| | nn.Linear(, ) |
5.3. Baselines
We cover four state-of-the-art fingerprinting schemes as the baselines for evaluating the effectiveness of MetaV under the classification setting. The baselines are respectively:
• IPGuard (Cao et al., 2021): IPGuard is one of the earliest fingerprinting schemes on classification models, which searches for a set of adversarial examples with a specified label near the decision boundary of the target model as the fingerprinting examples.
• ConferAE (Lukas et al., 2021): ConferAE improves the design of IPGuard by further covering distillation-based obfuscation techniques. Specifically, ConferAE constructs a set of adversarial examples on a prepared ensemble of positive/negative suspect models, which may be implemented with different architectures from the target model. The generated adversarial examples are additionally required to be transferable from the target model to the positive suspect models but not to the negative ones.
• DeepFoolFP (Wang and Chang, 2021): DeepFoolFP leverages the DeepFool algorithm (Moosavi-Dezfooli et al., 2016) to generate adversarial examples as the model fingerprints, with the motivation of improving the efficiency of fingerprint construction. For verification, the above three fingerprinting schemes define the matching rate as the ratio of fingerprinting examples which are correctly classified by the suspect model into the specified class.
• ModelDiff (Li et al., 2021): ModelDiff is a very recent technique originally proposed to quantify the behavioral similarity between a pair of models. Specifically, the similarity is measured as the cosine similarity of the two models' decision distance vectors, each element of which is the distance between a model's outputs on a clean test input and on an adversarial example derived from that input. A suspect model is verified as positive if its behavioral similarity with the target model is larger than a threshold. A minimal sketch of this criterion is given below.
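To make the ModelDiff criterion concrete, the following hedged sketch computes the decision distance vectors of two models on precomputed pairs of clean and adversarial inputs and their cosine similarity; the use of the L2 distance on the output logits is an assumption of this sketch.

```python
# A hedged sketch of the ModelDiff similarity measure.
import torch
import torch.nn.functional as F

@torch.no_grad()
def ddv(model, clean, adv):
    # Decision distance vector: per-pair distance between the model's
    # outputs on a clean input and on its adversarial counterpart.
    return torch.norm(model(clean) - model(adv), dim=1)

@torch.no_grad()
def modeldiff_similarity(target, suspect, clean, adv):
    return F.cosine_similarity(ddv(target, clean, adv),
                               ddv(suspect, clean, adv), dim=0).item()
```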
With no further specifications, the number of fingerprint examples is set to the same default value for MetaV and the baselines (cf. Section 5.5). More details on the baselines are given in Section 2.
5.4. Performance Metrics
Given a predefined threshold $\tau$, MetaV and all the baseline methods recognize a suspect model as positive when the matching rate of fingerprint verification is higher than $\tau$, and otherwise recognize the model as negative. In the evaluation, we follow the evaluation protocol in (Cao et al., 2021), which is composed of the metrics below:
(1) Robustness ($R$): The robustness metric measures the proportion of positive suspect models also recognized as positive by the fingerprinting scheme, i.e., true positives.
(2) Uniqueness ($U$): The uniqueness metric measures the proportion of negative suspect models also recognized as negative by the fingerprinting scheme, i.e., true negatives.
(3) Area under the Robustness-Uniqueness Curves (ARUC): ARUC measures the area of the intersection region under the robustness and uniqueness curves when the threshold $\tau$ varies in $[0, 1]$, i.e., $\mathrm{ARUC} = \int_0^1 \min(R(\tau), U(\tau))\, d\tau$. A higher ARUC implies a wider range of thresholds from which to obtain simultaneously high robustness and uniqueness. Empirically, ARUC is calculated as the average of $\min(R(\tau_i), U(\tau_i))$ over an evenly spaced grid of thresholds $\{\tau_i\}$ (a minimal sketch follows this list). For all the experiments, we run repeated trials and report the average metric with the confidence interval.
5.5. Other Implementation Details
5.5.1. Hyperparameter Setups
With no further specifications, we always use the same number of fingerprint examples for MetaV and the baselines for fair comparison, and fix the learning rate and the number of iterations in Algorithm 1 across all experiments. In all three scenarios, we implement the meta-verifier as a three-layer fully-connected neural network with ReLU activations, as sketched below.
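For concreteness, a sketch of such a meta-verifier follows; the hidden width and the suspect model's output dimensionality are illustrative placeholders, since the input dimension is determined by the $N$ fingerprint examples and the model's output size.

```python
# A sketch of the three-layer fully-connected meta-verifier.
import torch.nn as nn

def make_verifier(n_examples, out_dim, hidden=1024):
    # Input: concatenated outputs of the suspect model on the N fingerprint
    # examples; output: probability that the suspect model is positive.
    return nn.Sequential(
        nn.Linear(n_examples * out_dim, hidden), nn.ReLU(),
        nn.Linear(hidden, hidden), nn.ReLU(),
        nn.Linear(hidden, 1), nn.Sigmoid(),
    )
```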
5.5.2. Experimental Environment
All the fingerprinting schemes and experiments are implemented with PyTorch (Paszke et al., 2019), an open-source software framework for numeric computation and deep learning. All our experiments are conducted on a Linux server running Ubuntu 16.04, with one AMD Ryzen Threadripper 2990WX 32-core processor and 2 NVIDIA RTX 2080 GPUs.
6. Results & Analysis
6.1. Comparison with Baselines
First, we compare the performance of MetaV with state-of-the-art model fingerprinting schemes specifically designed for classifiers. Fig. 3 reports the robustness (i.e., true positives) when the threshold is set such that the uniqueness (i.e., true negatives) reaches 100% on the test set, along with the corresponding ARUC presented in Fig. 2(a).
As Fig. 3 shows, our proposed MetaV is the only method which simultaneously achieves 100% robustness and 100% uniqueness in fingerprinting a stolen and adversarially obfuscated ResNet-18 classifier for skin cancer diagnosis. Besides, as we can see from Fig. 2(a), MetaV constructs the model fingerprint with the highest ARUC among all the tested fingerprinting schemes, improving over the optimal baseline IPGuard by a wide margin (roughly 220% relative improvement). As we construct a more diverse benchmark of suspect models compared with previous works, the ARUC of IPGuard is not as high as the results reported in Cao et al. (2021). Fig. 2(b)-(f) show the robustness and uniqueness curves of each fingerprinting scheme.
6.2. Time Efficiency of MetaV
Next, we empirically study the learning behaviors and the time complexity of MetaV when constructing the fingerprint of a ResNet-18. As Fig. 4 shows, the ARUC and loss curves demonstrate the time efficiency of MetaV in fingerprint construction.
In about 200 seconds (cf. Table 5), MetaV stably constructs a fingerprinting pair which achieves a high ARUC. Besides, Table 5 presents a tentative comparison of the time cost of each fingerprinting scheme, both for constructing the model fingerprint to achieve the ARUC reported in Fig. 2 and for fingerprint verification, in the same experimental environment detailed in the evaluation settings. As shown, MetaV is comparably efficient to the state-of-the-art fingerprinting schemes.
Table 5. Time cost (in seconds) of fingerprint construction and verification.
| | Construction | Verification |
| MetaV | 202±11 | 7.2±0.8 |
| IPGuard | 177±1 | 9.1±0.2 |
| ModelDiff | 123±1 | 13.6±0.5 |
| DeepFoolFP | 174±4 | 9.1±0.6 |
| ConferAE | >2860 | 10.0±0.6 |
6.3. MetaV for Task-Independent Fingerprinting
Besides the substantial improvements in fingerprinting classifiers, more importantly, MetaV presents the first task-agnostic fingerprinting scheme applicable to more general application scenarios. To validate this, we apply MetaV to fingerprint an MLP for regression (i.e., the Warfarin case) and a DCGAN for generative modeling (i.e., the Fashion case), which requires no modification to Algorithm 1, as MetaV is by design independent of both the internals and the function of the target model.
Fig. 5(a)-(b) plot the robustness and uniqueness curves of MetaV on Warfarin and Fashion when the threshold increases from 0 to 1, where the area of the shaded region is by definition the ARUC. As we can see, the robustness and the uniqueness both remain at 100% unless the threshold is very close to 0 or 1, resulting in a near-optimal ARUC for both scenarios, which existing fingerprinting schemes can hardly handle.
6.4. Number of Fingerprint Examples
We further study the influence of the number of fingerprint examples on the performance of MetaV and the baselines. Fig. 6 & 5(c) present the ARUC curves as the number of fingerprint examples $N$ increases, on the classification and non-classification tasks respectively. As shown, in all three scenarios the performance of MetaV increases stably as $N$ grows; for example, when fingerprinting ResNet-18 on Skin, the ARUC of MetaV improves noticeably as $N$ is enlarged. This is a desirable feature of MetaV, as one would naturally expect more accurate fingerprinting when more computing power is devoted to the construction of the model fingerprints. In comparison, the upward trend is unclear for all the baseline schemes. Similar enhancement is also observed on Warfarin and Fashion, which is noticeable considering the already high ARUC of MetaV with a small number of fingerprint examples.
6.5. Size of Prepared Model Ensemble
Finally, we provide quantitative results to analyze the impact of the model ensemble size on the performance of MetaV. We fix the number of fingerprint examples and randomly sample different ratios of positive/negative suspect models from the full model ensemble for training MetaV. Fig. 7 & 5(d) show the ARUC curves on the three scenarios when the model ensemble size varies. As shown, the ARUC of MetaV exhibits a steady upward trend when the model ensemble is enlarged, which conforms to our design principle that a more diverse set of crafted suspect models helps construct more unique and robust model fingerprints.
7. Conclusion
In this paper, we present MetaV, the first task-agnostic model fingerprinting framework which (a) substantially improves existing fingerprinting schemes on classification models in terms of fingerprint robustness and uniqueness, and (b) more importantly, advances the capability of model piracy forensics to more general application scenarios. As it is a common challenge for any novel fingerprinting method to be evaluated on large-scale datasets (e.g., for evaluation on ImageNet, one has to train a large number of suspect models on the dataset to construct the benchmark, which would incur over 50 days of computation on medium-end devices), we design our evaluation at the same scale as previous works by involving small-scale images only. It would be meaningful for future works to cooperate with industry to evaluate MetaV on larger datasets. Meanwhile, although MetaV by design makes no assumptions on the input, the architecture, or the output of the target model, considering the impossibility of exhausting all possible task types, we mainly choose the three representative scenarios above for evaluation. Future works may consider deploying and evaluating MetaV on other typical learning tasks such as feature extraction, information retrieval and ranking.
References
- pyt ([n. d.]) [n. d.]. PyTorch Hub. https://pytorch.org/hub/. Accessed: 2021-02-01.
- Adi et al. (2018) Y. Adi, Carsten Baum, et al. 2018. Turning Your Weakness Into a Strength: Watermarking Deep Neural Networks by Backdooring. In USENIX Security Symposium.
- Aguinaldo et al. (2019) Angela Aguinaldo, Ping-Yeh Chiang, Alex Gain, Ameya D. Patil, Kolten Pearson, and S. Feizi. 2019. Compressing GANs using Knowledge Distillation. ArXiv abs/1902.00159 (2019).
- Boenisch (2020) Franziska Boenisch. 2020. A Survey on Model Watermarking Neural Networks. ArXiv (2020).
- Cao et al. (2021) Xiaoyu Cao, J. Jia, et al. 2021. IPGuard: Protecting the Intellectual Property of Deep Neural Networks via Fingerprinting the Classification Boundary. AsiaCCS (2021).
- Cao et al. (2019) Yulong Cao, Chaowei Xiao, et al. 2019. Adversarial Sensor Attack on LiDAR-based Perception in Autonomous Driving. CCS (2019).
- Carlini and Wagner (2017) Nicholas Carlini and David A. Wagner. 2017. Towards Evaluating the Robustness of Neural Networks. IEEE Symposium on Security and Privacy (2017).
- Choromańska et al. (2015) A. Choromańska, Mikael Henaff, Michaël Mathieu, G. B. Arous, and Y. LeCun. 2015. The Loss Surfaces of Multilayer Networks. In AISTATS.
- Clark et al. (2019) Kevin Clark, Minh-Thang Luong, Urvashi Khandelwal, Christopher D. Manning, and Quoc V. Le. 2019. BAM! Born-Again Multi-Task Networks for Natural Language Understanding. In ACL.
- Devlin et al. (2019) J. Devlin, Ming-Wei Chang, et al. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In NAACL-HLT.
- Esteva et al. (2017) Andre Esteva, B. Kuprel, et al. 2017. Dermatologist-level classification of skin cancer with deep neural networks. Nature (2017).
- Gou et al. (2021) Jianping Gou, B. Yu, S. Maybank, and D. Tao. 2021. Knowledge Distillation: A Survey. Int. J. Comput. Vis. (2021).
- Han et al. (2015) Song Han, Jeff Pool, et al. 2015. Learning both Weights and Connections for Efficient Neural Network. ArXiv (2015).
- He et al. (2016) Kaiming He, X. Zhang, et al. 2016. Deep Residual Learning for Image Recognition. CVPR (2016), 770–778.
- Heaton et al. (2016) J. B. Heaton, Nicholas G. Polson, et al. 2016. Deep Learning for Finance: Deep Portfolios. Econometric Modeling: Capital Markets - Portfolio Theory eJournal (2016).
- Hinton et al. (2015) Geoffrey E. Hinton, Oriol Vinyals, and J. Dean. 2015. Distilling the Knowledge in a Neural Network. ArXiv (2015).
- Huang et al. (2017) Gao Huang, Zhuang Liu, and Kilian Q. Weinberger. 2017. Densely Connected Convolutional Networks. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2017), 2261–2269.
- Iandola et al. (2016) Forrest N. Iandola, M. Moskewicz, Khalid Ashraf, Song Han, W. Dally, and K. Keutzer. 2016. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <1MB model size. ArXiv abs/1602.07360 (2016).
- Jeong et al. (2021) Hoyong Jeong, Dohyun Ryu, and Junbeom Hur. 2021. Neural Network Stealing via Meltdown. ICOIN (2021), 36–38.
- Juuti et al. (2019) Mika Juuti, Sebastian Szyller, et al. 2019. PRADA: Protecting Against DNN Model Stealing Attacks. EuroS&P (2019).
- Kingma and Ba (2015) Diederik P. Kingma and Jimmy Ba. 2015. Adam: A Method for Stochastic Optimization. CoRR abs/1412.6980 (2015).
- Krizhevsky (2014) A. Krizhevsky. 2014. One weird trick for parallelizing convolutional neural networks. ArXiv abs/1404.5997 (2014).
- Li et al. (2017) Hao Li, Asim Kadav, et al. 2017. Pruning Filters for Efficient ConvNets. ArXiv (2017).
- Li et al. (2021) Yuanchun Li, Ziqi Zhang, et al. 2021. ModelDiff: testing-based DNN similarity comparison for model reuse detection. ISSTA (2021).
- Lukas et al. (2021) Nils Lukas, Yuxuan Zhang, et al. 2021. Deep Neural Network Fingerprinting by Conferrable Adversarial Examples. ICLR (2021).
- Moosavi-Dezfooli et al. (2016) Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, et al. 2016. DeepFool: A Simple and Accurate Method to Fool Deep Neural Networks. CVPR (2016).
- Paszke et al. (2019) Adam Paszke, S. Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, N. Gimelshein, L. Antiga, Alban Desmaison, Andreas Köpf, E. Yang, Zach DeVito, Martin Raison, Alykhan Tejani, Sasank Chilamkurthy, Benoit Steiner, Lu Fang, Junjie Bai, and Soumith Chintala. 2019. PyTorch: An Imperative Style, High-Performance Deep Learning Library. In NeurIPS.
- Radford et al. (2016) Alec Radford, Luke Metz, and Soumith Chintala. 2016. Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks. CoRR abs/1511.06434 (2016).
- Real et al. (2019) Esteban Real, A. Aggarwal, Y. Huang, and Quoc V. Le. 2019. Regularized Evolution for Image Classifier Architecture Search. In AAAI.
- Regazzoni et al. (2021) F. Regazzoni, P. Palmieri, et al. 2021. Protecting artificial intelligence IPs: a survey of watermarking and fingerprinting for machine learning. CAAI Transactions on Intelligence Technology (2021).
- Rouhani et al. (2018) B. Rouhani, Huili Chen, et al. 2018. DeepSigns: A Generic Watermarking Framework for IP Protection of Deep Learning Models. ArXiv (2018).
- Simonyan and Zisserman (2015) K. Simonyan and Andrew Zisserman. 2015. Very Deep Convolutional Networks for Large-Scale Image Recognition. ArXiv (2015).
- Szegedy et al. (2014) Christian Szegedy, W. Zaremba, Ilya Sutskever, Joan Bruna, et al. 2014. Intriguing properties of neural networks. ArXiv (2014).
- Tramèr et al. (2016) Florian Tramèr, F. Zhang, et al. 2016. Stealing Machine Learning Models via Prediction APIs. In USENIX Security.
- Truda and Marais (2019) G. Truda and P. Marais. 2019. Warfarin dose estimation on multiple datasets with automated hyperparameter optimisation and a novel software framework. ArXiv (2019).
- Uchida et al. (2017) Y. Uchida, Yuki Nagai, et al. 2017. Embedding Watermarks into Deep Neural Networks. ICMR (2017).
- Wang and Chang (2021) Si Wang and Chip-Hong Chang. 2021. Fingerprinting Deep Neural Networks - a DeepFool Approach. ISCAS (2021).
- Whirl-Carrillo et al. (2012) M. Whirl-Carrillo, E. McDonagh, et al. 2012. Pharmacogenomics Knowledge for Personalized Medicine. Clinical Pharmacology & Therapeutics (2012).
- Xiao et al. (2017) H. Xiao, K. Rasul, et al. 2017. Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms. ArXiv (2017).
- Xu et al. (2020) Xiangrui Xu, Y. Li, et al. 2020. “Identity Bracelets” for Deep Neural Networks. IEEE Access (2020).
- Yan et al. (2020) Mengjia Yan, Christopher W. Fletcher, and J. Torrellas. 2020. Cache Telepathy: Leveraging Shared Resource Attacks to Learn DNN Architectures. USENIX Security (2020).
- Yang et al. (2020) Jiancheng Yang, R. Shi, et al. 2020. MedMNIST Classification Decathlon: A Lightweight AutoML Benchmark for Medical Image Analysis. ArXiv (2020).
- Yu et al. (2020) Honggang Yu, Kaichen Yang, Teng Zhang, Yun-Yun Tsai, Tsung-Yi Ho, and Yier Jin. 2020. CloudLeak: Large-Scale Deep Learning Models Stealing Through Adversarial Examples. In NDSS.
- Zhang et al. (2018) Jialong Zhang, Zhongshu Gu, Jiyong Jang, Hui Wu, Marc Ph Stoecklin, Heqing Huang, and Ian Molloy. 2018. Protecting intellectual property of deep neural networks with watermarking. AsiaCCS (2018).
- Zhao et al. (2020) Jingjing Zhao, Qingyue Hu, et al. 2020. AFA: Adversarial fingerprinting authentication for deep neural networks. Comput. Commun. (2020).
- Zhou et al. (2021) Haoyi Zhou, Shanghang Zhang, Jieqi Peng, Shuai Zhang, Jianxin Li, Hui Xiong, and Wancai Zhang. 2021. Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting. In AAAI.