

DLOVE: A New Security Evaluation Tool for Deep Learning Based Watermarking Techniques

Sudev Kumar Padhi
Indian Institute of Technology Bhilai
Durg, Chhattisgarh 491002
[email protected]

Dr. Sk. Subidh Ali
Indian Institute of Technology Bhilai
Durg, Chhattisgarh 491002
[email protected]
Abstract

Recent developments in Deep Neural Network (DNN) based watermarking techniques have shown remarkable performance. The state-of-the-art DNN-based techniques not only surpass the robustness of classical watermarking techniques but also remain robust against many image manipulation techniques. In this paper, we perform a detailed security analysis of different DNN-based watermarking techniques. We propose a new class of attack called the Deep Learning-based OVErwriting (DLOVE) attack, which leverages adversarial machine learning and overwrites the original embedded watermark with a targeted watermark in a watermarked image. To the best of our knowledge, this attack is the first of its kind. To show adaptability and efficiency, we launch our DLOVE attack analysis on four different watermarking techniques: HiDDeN, ReDMark, PIMoG, and Hiding Images in an Image. All these techniques use different approaches to create imperceptible watermarked images. Our attack analysis on these watermarking techniques with various constraints highlights the vulnerabilities of DNN-based watermarking. Extensive experimental results validate the capabilities of DLOVE. We propose DLOVE as a benchmark security analysis tool to test the robustness of future deep learning-based watermarking techniques.

Keywords:
Deep Learning, Adversarial Machine Learning (AML), Digital Watermarking.

1 Introduction

Digital watermarking is a well-known technique in which a watermark (a message or an image) is embedded covertly or overtly into a cover image without distorting the quality of the cover image [25, 8, 9, 42, 7]. It has various critical applications, such as copyright protection, content authentication, tamper detection, and data hiding. In watermarking, the sender embeds the watermark into the cover image and sends the watermarked image to the receiver or verifier. To validate authenticity or copyright, the watermark is extracted from the received watermarked image and compared with the original watermark, which is provided to the receiver or verifier in advance. Generally, watermarking techniques consist of two processes: watermark embedding and watermark extraction. In watermark embedding, the watermark is embedded into the input cover image to produce a watermarked image, while in watermark extraction, the watermark is extracted from the watermarked image and compared with the original watermark to validate the ownership or authenticity of the cover image. One of the popular watermarking techniques is invisible watermarking, where the watermark is covertly embedded in the cover image. The security of any invisible watermarking technique lies in the secrecy of the embedded watermark: the watermarked image should be perceptually similar to the cover image and should not contain any detectable artifact.

The classical watermarking techniques use a wide variety of embedding approaches from the spatial and frequency domains [50, 40, 47, 48, 45, 5, 28, 29]. Recently, deep learning has emerged as the key enabler of AI applications. Thus, there has been an increase in deep learning techniques using Deep Neural Networks (DNN) for different tasks due to their adaptability in various applications. Deep learning is also being utilized in the domain of watermarking, which has resulted in significant improvements in performance and efficiency compared to traditional techniques [4]. In DNN-based watermarking techniques, the watermark embedding and extraction processes are implemented using deep generative networks, such as autoencoders and Generative Adversarial Networks (GAN). The pioneering DNN-based watermarking technique proposed in [4] can hide an RGB image within another RGB image using an autoencoder network. DNN-based watermarking was further enhanced by introducing distortion into the training data to make the watermarked images robust against certain noises [44, 21]. These simple autoencoder-based techniques are vulnerable to Deep Learning based Removal (DLR) attacks [20, 6, 24]. There are different types of DLR attacks. In one approach, the attacker trains a denoising autoencoder to remove the watermark from the watermarked image as noise [6]. In another approach, the pixel distribution of the watermarked image is used to identify the distorted pixels and remove the watermark [20]. Pixel inpainting has also been utilized to remove the watermark from the watermarked image [24]. In this line, the watermarking techniques proposed in [53, 51, 19, 26] are considered to be robust against DLR attacks due to the presence of noise layers in their model architectures. Among these, the most popular technique is HiDDeN [53], which can withstand arbitrary types of image distortion and produces robust watermarked images. PIMoG [12] went one step further by introducing screen-shooting robustness, such that the watermark can be extracted even if the digital image is captured with a camera. This robustness is achieved by introducing a mask-guided loss in the training pipeline of the watermarking technique. Similarly, ReDMark [2] uses a residual structure to embed the watermark, striking a balance between robustness and imperceptibility.

Please note that DLR attacks are useful only for limited applications where the attacker's objective is just to defeat the ownership claim of the actual owner of the cover image. The attacker cannot claim ownership of the cover image using DLR attacks. In order to claim ownership of a cover image, the attacker has to overwrite the original watermark of a given watermarked image with the attacker's watermark, such that the watermark extraction process extracts the attacker's watermark from the watermarked image instead of the original watermark. There is no doubt that classical watermark overwriting attacks will not work on DNN-based watermarking techniques [39, 38]. It requires a Deep Learning based OVErwriting (DLOVE) attack. However, there is hardly any work in the open literature related to the DLOVE attack. In regular deep learning applications, similar attacks are common and are known as Adversarial Machine Learning (AML) attacks [43, 13, 37, 34]. In a targeted AML attack, the attacker induces a well-crafted perturbation into the input image such that the classification model not only fails to classify it but is also forced to misclassify it into a target class desired by the attacker. Intuitively, the attacker is overwriting the features of the original class in the input image with the features of the target class. Inspired by targeted AML attacks, for the first time, we develop the DLOVE attack against DNN-based watermarking techniques.

In this paper, we perform a security analysis of DNN-based watermarking techniques using the DLOVE attack. Here, the robustness of these DNN-based watermarking techniques is verified against well-crafted perturbations whose final goal is to overwrite the embedded watermark with the desired watermark. The attack targets the real-world scenario where watermarking techniques are used for copyright protection. To show adaptability and efficiency, we launch our DLOVE attack on four different watermarking techniques: HiDDeN [53], ReDMark [2], PIMoG [12], and Hiding Images in an Image [4]. All these techniques use different approaches to create imperceptible watermarked images. Devising a common approach to attack these techniques with various constraints highlights the vulnerabilities of DNN-based watermarking.

The paper makes the following key contributions:

  1. We are the first to propose DLOVE, a watermark overwriting attack based on the concept of targeted AML, which overwrites the embedded watermark with a target watermark by adding a well-crafted perturbation to the watermarked image.

  2. We introduce a new class of attack that relies solely on the knowledge available when DNN-based watermarking techniques are used for copyright protection.

  3. Detailed experimental results are provided to validate the success of the DLOVE attack. The results demonstrate that the DLOVE attack generalizes well across different DNN-based watermarking techniques.

2 Related Works

2.1 Deep learning based Watermarking

Recently, many DNN-based watermarking techniques have been proposed that surpass the performance of traditional watermarking by utilizing the efficient feature extraction ability of neural networks. The main architecture used in DNN-based watermarking involves an encoder network that embeds the watermark into the cover image and a decoder network that extracts the watermark from the watermarked image. DNN-based watermarking can embed an image or a bit string as a watermark, but most techniques choose to embed a bit string. Bit strings act as metadata and provide more robustness compared to embedding images as a watermark. This is because embedding an image requires the decoder to learn the spatial information of the watermark, which can hamper robustness. The encoder and decoder are trained in an end-to-end manner as a pipeline [4, 21, 11]. To further enhance the quality and robustness of the watermarked image, a discriminator is added to the pipeline during training, and noise layers are added to the model architecture of the DNN-based watermarking [53, 26, 12]. The discriminator acts as an adversary network that predicts whether a watermark is embedded in an image. Residual connections and layers applying random combinations of a fixed set of distortions are also used in some model architectures to make the watermarking technique more robust with high data hiding capacity [2, 30]. Almost all of these DNN-based methods achieve great performance in terms of image quality. Generally, robustness in watermarking refers to handling distortions that exist in image processing, such as JPEG compression, blurring, noise, crop-out, etc. There is hardly any analysis that aims to find the vulnerability of DNN-based watermarking techniques against DLOVE attacks.
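To make this encoder-decoder pipeline concrete, the following PyTorch-style sketch shows one common way a bit-string watermark is replicated spatially, concatenated with the cover image, and later recovered by a decoder. The module names, layer sizes, and 30-bit message length are illustrative assumptions and do not correspond to any specific published architecture.

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Embeds an N-bit watermark into a cover image (illustrative layer sizes)."""
    def __init__(self, msg_len=30):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3 + msg_len, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 3, kernel_size=3, padding=1),
        )

    def forward(self, cover, msg):
        # Replicate the bit string spatially and concatenate it with the cover image.
        b, _, h, w = cover.shape
        msg_map = msg.view(b, -1, 1, 1).expand(-1, -1, h, w)
        return self.conv(torch.cat([cover, msg_map], dim=1))

class Decoder(nn.Module):
    """Recovers the N-bit watermark from a (possibly distorted) watermarked image."""
    def __init__(self, msg_len=30):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, msg_len),
        )

    def forward(self, watermarked):
        return self.net(watermarked)  # logits; sigmoid(logits) > 0.5 gives the decoded bits
```

In the robust variants cited above, a noise layer (and, during training, a discriminator) would additionally sit between the encoder output and the decoder input.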

2.2 Adversarial Machine Learning

AML attacks have the capability to defeat highly accurate machine learning models [43, 13, 37, 34] by adding a well-crafted perturbation to the input image. These attacks were primarily developed to defeat deep convolutional neural network-based classifiers. Transferable AML attacks have also been developed [27, 52, 35, 36], such that a perturbation crafted to defeat one model can also be used to defeat other models performing a similar task, even if the attacker has no access to the second model's parameters or architecture. In AML, the attacker's knowledge is assumed to be either white-box (complete knowledge of the target model architecture, its parameters, and training data) [13, 31, 33] or black-box (limited or no knowledge of the target model) [15, 17]. In a white-box attack, the attacker can craft adversarial examples by directly manipulating the input data to maximize the model's loss or misclassification using the model parameters. In a black-box attack, despite lacking internal knowledge of the model, the attacker can still generate adversarial examples by exploiting the model's response to input queries. These queries can be carefully chosen such that, by observing the outputs, information about the model can be inferred and adversarial examples can be crafted accordingly using the transferability of AML attacks.
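As a minimal illustration of a white-box targeted attack of the kind described above, the one-step sign-gradient sketch below follows the FGSM family of attacks; the classifier interface, the [0, 1] pixel range, and the budget eps are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def targeted_fgsm(model, x, target_class, eps=0.03):
    """One-step targeted attack: nudge x so the classifier prefers target_class."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), target_class)  # loss w.r.t. the attacker's target class
    loss.backward()
    # Step against the gradient of the target loss to raise the target-class confidence.
    return (x - eps * x.grad.sign()).clamp(0.0, 1.0).detach()
```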

AML is a great tool for designers of deep convolutional neural network-based classifiers to test the robustness of their classifiers. Such tools are lacking in the domain of DNN-based watermarking techniques. In this paper, we try to overcome this lacuna by introducing the DLOVE attack, which can serve as a useful tool for designers of DNN-based watermarking techniques. Our approach is inspired by targeted AML attacks. The objective of the DLOVE attack is to craft a new watermark which, once added to the watermarked image, will force the watermark decoder to decode the new watermark instead of the original watermark. We demonstrate the DLOVE attack in both white-box and black-box settings.

3 Threat Model

3.1 Attacker's Goals

Copyright protection is one of the most important use cases of watermarking, through which the owner of digital content can claim their rights. An attacker can violate copyright protection either by corrupting or cleaning the embedded watermark in the image so that the decoder cannot decode the watermark from the watermarked image (Objective 1), or by overwriting the embedded watermark present in the watermarked image with a target watermark so that the decoder decodes the target watermark instead of the original one (Objective 2). In either case, the objective is to defeat the watermarking technique.

Figure 1: Overview of the proposed DLOVE attack, which leverages adversarial machine learning to create a well-crafted perturbation that overwrites the original watermark with the target watermark.

3.2 Attacker's Knowledge

Before going into the details of the attack, we make the following assumptions about the attacker’s knowledge:
Training Data: The attacker has no knowledge of the training data used to train the DNN-based watermarking model in either variant of the DLOVE attack (white-box and black-box settings). This includes both the watermark and the cover image.
Network Architecture: The architecture of the encoder network is not known to the attacker in either the black-box or the white-box setting. In the white-box variant of the attack, it is assumed that the attacker has knowledge of the decoder network architecture and its parameters. The same does not hold for the black-box setting of the DLOVE attack.

The black-box setting of the DLOVE attack is more practical and useful in professional watermarking applications [32, 14, 1, 46], where the watermarking technique is available as a service (API) to verify digital content. In such a scenario, the attacker can subscribe to the service and obtain oracle access to both the encoder and the decoder through its API. Nevertheless, there is a limit on the number of queries to the API. In stringent, security-critical application scenarios, even oracle access to the decoder is infeasible for the attacker, as it remains in the possession of the verifier only. The DLOVE attack considers these stringent security assumptions in the black-box setting.

3.3 Scenario

Let Alice be a digital artist who creates digital paintings. She wants to protect her digital paintings (copyright) from unauthorized use and distribution. Alice uses DNN-based invisible watermarking, as it protects the copyright of the painting while preserving its aesthetic appeal. The watermarking technique Alice has subscribed to uses the artist's logo as the watermark. Thus, Alice embeds the logo of her website into her digital paintings (cover images). For verification, the verifier needs to detect the presence of a watermark, extract it, and verify the owner of the digital painting. In copyright protection, similar information that forms the metadata of the digital content for different owners is used as the watermark (in this case, a logo). This ensures that the verifier can verify with consistent information. Now, there is an attacker, Eve, who has also subscribed to the same watermarking technique used by Alice. Thus, she knows that Alice's digital paintings are copyright-protected with the logo of Alice's website. Eve can clean the watermark or overwrite it with a target watermark containing a different logo in the watermarked image and recirculate it. By achieving Objective 1, Eve can only remove the watermark from the digital painting. By achieving Objective 2, Eve not only removes the watermark but also makes herself the owner of the digital painting by embedding her logo into it. Alice cannot prove that the digital painting belongs to her, as the decoder decodes the logo belonging to Eve. This scenario is depicted in Figure 1, where the decoder decodes the target watermark instead of the original watermark when the well-crafted perturbation is added to the watermarked image. Thus, the verifier will announce that the digital paintings belong to Eve.

4 Proposed Approach

4.1 Formal description

DNN-based watermarking techniques consist of an encoder and a decoder. The encoder E produces a watermarked image W by embedding the watermark α into the cover image I, as shown in Eq. (1). In contrast, the decoder D takes W as input and extracts the embedded watermark α as the output, as shown in Eq. (2). The attacker's aim is to launch the DLOVE attack to fool D by inducing an adversarial perturbation δ into W such that D decodes the target watermark β instead of the original embedded watermark α, as shown in Eq. (3).

E(I + α) → W    (1)
D(W) → α    (2)
D(W + δ) → β    (3)

4.1.1 White-Box Access:

Having white-box access to the decoder gives the attacker enough information to simulate the network by devising a targeted adversarial attack and using the gradients of the decoder to create the desired perturbation δ, where α is the original watermark, β is the target watermark, and ε is the perturbation limit. We minimize the loss (l) with respect to β, the target watermark, while maximizing the loss with respect to α, the original watermark, i.e., we solve the optimization problem shown in Eq. (4). This is the easiest approach, but it does not align with the use cases of watermarking, where access to the decoder is not allowed.

minimize_δ { l(D(W + δ), β) − l(D(W + δ), α) },   δ ∈ [−ε, ε]    (4)
Figure 2: Overview of the surrogate model attack: a) training the surrogate model using the surrogate dataset, b) fine-tuning the surrogate decoder with watermarked images of the target DNN-based watermarking technique, and c) attacking the decoder of the target DNN-based watermarking technique after generating the well-crafted perturbation from the surrogate decoder.

4.1.2 Black-Box Access:

If the attacker has the ability to use the decoder as an oracle, it can obtain a set of watermarked images and their watermarks by querying the decoder with watermarked images. Once this dataset is available, the attacker can train a surrogate decoder. Afterwards, a white-box attack is performed on the surrogate decoder to craft the desired perturbation δ, which is then used to launch the DLOVE attack on D. However, in stringent security applications, even the decoder is not available. Therefore, we consider having only limited instances of watermarked images whose watermarks are known. One of the easiest ways for the attacker to obtain such data is to request the subscribed copyright-protection service provider to copyright, say, n pairs of cover images and their watermarks.

Under this scenario, the attacker trains its own DNN-based surrogate watermarking encoder (E′) and decoder (D′) models with its own dataset (also known as the surrogate dataset), i.e., a set of cover images and their watermarks. Once the surrogate model is trained, D′ is fine-tuned with the limited instances of watermarked images available from the target decoder D to be attacked. During fine-tuning, the loss between the extracted watermark and the original watermark is used to train the surrogate decoder, as shown in Eq. (5). The surrogate decoder is trained and fine-tuned to act as the target decoder D, making the black-box attack transferable. Therefore, the attacker can launch a white-box DLOVE attack on D′ using the gradient information to craft the desired perturbation δ, as shown in Eq. (6). The same δ can be used to defeat D when added to W (Eq. (7)). The value ε is chosen judiciously such that the induced perturbation δ to the watermarked image is imperceptible. Figure 2 illustrates the training procedure of the surrogate model and the fine-tuning of the surrogate decoder to perform an attack on the target decoder.

minimize { l(D′(W), α) }    (5)
minimize_δ { l(D′(W + δ), β) },   δ ∈ [−ε, ε]    (6)
D(W + δ) → β    (7)
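A minimal PyTorch-style sketch of the fine-tuning step of Eq. (5) is given below, assuming the attacker holds a small set of watermarked images and their known bit-string watermarks from the target technique; the loop structure, learning rate, and BCE loss are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def finetune_surrogate(surrogate_decoder, watermarked_imgs, watermarks, epochs=100, lr=1e-4):
    """Fine-tune the pre-trained surrogate decoder D' on the limited (W, alpha) pairs
    obtained from the target watermarking technique, as in Eq. (5)."""
    opt = torch.optim.Adam(surrogate_decoder.parameters(), lr=lr)
    for _ in range(epochs):
        for W, alpha in zip(watermarked_imgs, watermarks):
            pred = surrogate_decoder(W.unsqueeze(0))
            # Bit-string watermark: binary cross-entropy against the known bits alpha.
            loss = F.binary_cross_entropy_with_logits(pred, alpha.unsqueeze(0))
            opt.zero_grad()
            loss.backward()
            opt.step()
    return surrogate_decoder
```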

4.2 Crafting algorithm

The adversarial perturbation crafting algorithm is shown in Algo 1. It shows how to craft a perturbation using the DLOVE attack in a white-box scenario. Inputs to the algorithm are a watermarked image W, the target decoder D, the target watermark β, a perturbation δ (initialized to zero) with the same size (k × k) as W, and a limiting range ε for δ (−ε ≤ δ ≤ ε). δ is added to W and passed into the decoder D, which decodes the secret as γ. The loss between γ and β is computed using the chosen loss function l. In each iteration of the loop, the optimizer tries to minimize the loss between γ and β while maximizing the loss between β and α, and Δ is updated accordingly. This process is repeated until the model converges and the desired δ is obtained, which realizes the DLOVE attack on W by overwriting α with β.

Algorithm 1 Adversarial perturbation crafting algorithm
Require: W, D, β, δ, ε
1: δ ← [0]_{k×k}    ▷ Initial perturbation
2: max_iter ← k × k
3: γ ← D(W + δ)    ▷ Decoded watermark
4: while i ≤ max_iter and γ ≠ β do
5:     γ ← D(W + δ)    ▷ Intermediate decoder output
6:     Δ ← l(β, γ) − l(β, α)
7:     Update δ: δ_i ← δ_{i−1} − η∇Δ(δ)    ▷ Update δ with respect to the loss
8:     δ ← Clip(δ, [−ε, ε])    ▷ Clip δ to the perturbation limit
9: end while
10: return δ

The hyperparameters (parameters that we explicitly define) for this attack include:

  1. ε: The maximum amount of allowable perturbation that can be added to the images.

  2. Optimizer: Used to find the well-crafted perturbation δ.

  3. l: The loss function chosen to minimize the loss between γ and β and maximize the loss between β and α.

This algorithm works similarly in the black-box scenario, where the decoder D is replaced by the surrogate decoder D′ and the corresponding loss becomes l(β, γ) + l(D′(W), α) in line 6 of the algorithm.
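For reference, a hedged PyTorch-style rendering of Algorithm 1 follows; the MSE loss, Adam optimizer, perturbation limit, and iteration budget mirror the settings reported later in Section 5.4, while the decoder interface and the bit-wise convergence check are illustrative assumptions. In the black-box scenario, the same routine is simply applied to the fine-tuned surrogate decoder D′.

```python
import torch
import torch.nn.functional as F

def craft_perturbation(decoder, W, alpha, beta, eps=0.3, lr=1e-3, max_iter=5000):
    """Sketch of Algorithm 1: craft delta so that decoder(W + delta) yields beta.

    decoder : target (white-box) or fine-tuned surrogate decoder
    W       : watermarked image tensor
    alpha   : original watermark bits, beta : target watermark bits (values in {0, 1})
    """
    delta = torch.zeros_like(W, requires_grad=True)   # zero-initialized perturbation
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(max_iter):
        gamma = torch.sigmoid(decoder(W + delta))     # intermediate decoder output
        # Pull gamma towards the target beta while pushing it away from the original alpha.
        loss = F.mse_loss(gamma, beta) - F.mse_loss(gamma, alpha)
        opt.zero_grad()
        loss.backward()
        opt.step()
        with torch.no_grad():
            delta.clamp_(-eps, eps)                   # clip delta to [-eps, eps]
            if torch.equal((gamma > 0.5).float(), beta):
                break                                 # decoded bits already match beta
    return delta.detach()
```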

4.3 Reason For Successful Attack

In a classification task, whatever the input may be, the classifier will always classify it into one of the classes. These classes are also known to the attacker when performing a targeted adversarial attack. Thus, the attacker adds a perturbation such that the decision boundary of the current class is crossed towards the target class and the confidence of the classifier for the target class becomes the highest. Suppose the watermark is N bits long, i.e., the output of the decoder is an N-bit watermark. We can then consider the decoder as a classifier that classifies the watermarked image into one of 2^N possible watermark classes. Therefore, our attack can be considered a targeted adversarial attack where the target class is the target watermark among the 2^N possible cases. DNN-based watermarking techniques are trained end-to-end based on the perceptual similarity of the image after embedding a watermark, which makes the embedding region-specific and susceptible to attack. Even if the models are trained for robustness against prominent image manipulation attacks, the same factor is responsible for the existence of adversarial perturbations.
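For example, with the 30-bit watermarks used by HiDDeN and PIMoG, the decoder's output space already contains 2^30 ≈ 1.07 × 10^9 possible watermarks, so the DLOVE attack amounts to a targeted misclassification into one specific class out of roughly a billion.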

5 Experimental Results

In this section, we validate the effectiveness of our DLOVE attack on four well-known DNN-based watermarking techniques: HiDDeN [53], ReDMark [2], PIMoG [12], and Hiding Images in an Image [4]. Experiments are conducted on a machine with a 14-core Intel i9 10940X CPU, 128 GB RAM, and two Nvidia RTX 5000 GPUs with 16 GB VRAM each.

5.1 Setup of Target Models

The target DNN-based watermarking techniques have different DNN model architectures, training pipelines, training datasets, and watermark sizes. The key features of these four techniques are described below with the help of Table 1:

  1. HiDDeN is an end-to-end model for image watermarking that is robust to arbitrary types of image distortion. It comprises four main components: an encoder, a parameterless noise layer N, a decoder, and an adversarial discriminator. The encoder uses a 128×128×3 cover image to embed a 30-bit binary watermark.

  2. ReDMark uses residual connections, circular convolution, an attack layer (simulating attacks during training against real-world manipulations, particularly JPEG compression), and 1-D convolution layers for embedding and extracting the watermark. It takes a grayscale 32×32×1 cover image and embeds a 4×4-bit watermark using the residual connections between the layers.

  3. PIMoG consists of three main parts: the encoder, the screen-shooting noise layer, and the decoder. To achieve screen-shooting robustness by handling perspective distortion, illumination distortion, and moiré distortion while maintaining high visual quality, the technique uses an adversary network with an edge-mask-guided image loss and a gradient-mask-guided image loss. It uses a 128×128×3 cover image to embed a 30-bit watermark.

  4. Hiding Images in an Image can hide a 200×200×3 image as a watermark inside a 200×200×3 cover image. In this technique, the watermark image is passed through a preparation network that transforms it, and a hiding network concatenates it with the original cover image to produce the watermarked image. During decoding, the watermarked image is passed through a reveal network, which outputs the hidden watermark image.

Table 1: Characteristics of the different DNN-based watermarking techniques attacked by DLOVE.

Technique | Discriminator in the Loop | Cover Image Size | Watermark Size | Dataset
HiDDeN [53] | Yes | 128×128×3 | 30 bit | COCO [23]
ReDMark [2] | No | 32×32×1 | 4×4 bit | CIFAR10 [22]
PIMoG [12] | Yes | 128×128×3 | 30 bit | COCO [23]
Hiding Images in an Image [4] | No | 200×200×3 | 200×200×3 | ImageNet [10]

5.2 Training Surrogate Model

Each of the techniques mentioned above uses a different cover image resolution and watermark size, as shown in Table 1. Therefore, we built four instances of the surrogate model (encoder and decoder), i.e., one instance for each target model. The sizes of the cover image, watermarked image, and watermark for each instance of the surrogate model are set according to the target model it corresponds to. We use the U-Net [41] architecture for the surrogate encoder, whereas for the surrogate decoder, after trying various models, we use two different architectures: one is a spatial transformer [18] with seven convolutional layers followed by two fully connected layers, and the other is a Self-supervised vision Transformer (SiT) [3] based autoencoder. The first is used to build the surrogate decoders for HiDDeN, ReDMark, and PIMoG, where a bit string is used as the watermark. The second architecture is used to build the surrogate decoder for Hiding Images in an Image, where an image is used as the watermark. We use the Mirflickr [16] dataset as our surrogate dataset, consisting of one million images with varied contexts, lighting, and themes from the social photography site Flickr. We use 50k images in our training set and 10k in our test set. We trained all four surrogate models in an end-to-end manner for 200 epochs, which is the general approach followed in DNN-based watermarking techniques [4, 53, 12, 2]. We used MSE, LPIPS [49], and L2 residual regularization loss functions between the cover image and the watermarked image for training the surrogate encoders. While training the surrogate decoders for HiDDeN, ReDMark, and PIMoG, where a bit string is used as the watermark, we used BCE to calculate the loss between the extracted and the original watermarks. In the same line, MSE and LPIPS losses are used to train the surrogate decoder for Hiding Images in an Image.
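As a rough sketch of how the losses above could be combined in one end-to-end surrogate training step (bit-string case), consider the following; the loss weights, the lpips package call, and the exact regularization term are illustrative assumptions.

```python
import torch
import torch.nn.functional as F
import lpips  # perceptual similarity; LPIPS(net="alex") or net="vgg" as discussed in the text

perceptual = lpips.LPIPS(net="alex")

def surrogate_training_step(encoder, decoder, cover, watermark, opt, w_lpips=0.1, w_reg=0.01):
    """One end-to-end training step for the surrogate encoder and decoder."""
    watermarked = encoder(cover, watermark)
    decoded = decoder(watermarked)

    # Encoder-side losses: keep the watermarked image perceptually close to the cover.
    img_loss = (F.mse_loss(watermarked, cover)
                + w_lpips * perceptual(watermarked, cover).mean()
                + w_reg * torch.norm(watermarked - cover, p=2))  # L2 residual regularization

    # Decoder-side loss: recover the embedded bit string (BCE for bit-string watermarks).
    msg_loss = F.binary_cross_entropy_with_logits(decoded, watermark)

    loss = img_loss + msg_loss
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```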

5.3 Fine-tuning Surrogate Decoder

In order to demonstrate the efficacy of our attack (Algo 1), we next show that it can successfully adapt to different watermarking techniques through a set of experiments. Before going into the details of our attack results, we briefly discuss our fine-tuning setup. For the fine-tuning, we collected 500 watermarked images and their watermarks from each of the four watermarking techniques. In the case of HiDDeN, ReDMark, and PIMoG, each watermarked image is embedded with a unique watermark generated from randomly sampled bits, whereas for Hiding Images in an Image, we use randomly sampled images from the ImageNet dataset as the watermark. Subsequently, the four instances of the trained surrogate decoder are fine-tuned on the watermarked images of the respective target decoder. The results show that fine-tuning the surrogate decoder for 100 epochs is sufficient to attack the target decoder successfully.

5.4 Attack Validation

In order to validate our attack, we generate 10,000 watermarked images using each of the four target watermarking techniques. Using each of these watermarked images, we attack the corresponding fine-tuned surrogate decoder using Algo 1 to generate the corresponding well-crafted perturbations. These perturbations are added to the watermarked images, which are then used to attack the target decoder to evaluate the success rate of our attack. We tried our attack on the surrogate decoder with L1 and MSE loss (line 6, Algo 1). Finally, we chose MSE, as the generated perturbation was imperceptible and the attack converged quickly. The initial perturbation δ is initialized as a zero-filled vector, whereas ε is chosen as −0.3 ≤ ε ≤ 0.3. We employ the Adam optimizer with an initial learning rate of 0.001. The attack converges in around 5000 iterations for all four watermarking techniques.

5.5 Evaluation

After the initial training, all four surrogate decoders achieve an accuracy of more than 90% in successfully extracting the embedded watermark when validated on the test set. This shows that all four surrogate models have converged successfully. Subsequently, these surrogate decoders are successfully fine-tuned with 500 watermarked images and their watermarks within 100 epochs to launch a white-box attack on the surrogate decoder and obtain the well-crafted perturbation that will defeat the target decoder. An experimental analysis is performed for each instance of the surrogate decoder to check whether it can be fine-tuned in fewer than 100 epochs and with fewer than 500 watermarked images and their watermarks. The optimal number of epochs and required watermarked images are shown in the 2nd and 3rd columns of Table 2.

Table 2: Optimal number of epochs and watermarked images required for fine-tuning the different surrogate models, along with the image quality of the watermarked images after adding the well-crafted perturbation using the DLOVE attack. For PSNR and SSIM, higher is better. For LPIPS and MSE, lower is better. ASR represents the success rate when attacking 10,000 watermarked images generated from each DNN-based watermarking technique.

Technique | Fine-Tuning Epochs | Fine-Tuning Images | Pert Limit (ε) | PSNR | SSIM | LPIPS | MSE | ASR
ReDMark | 40 | 200 | 0.002 | 41 | 0.97 | 0.08 | 0.05 | 98
HiDDeN | 60 | 300 | 0.008 | 38 | 0.99 | 0.07 | 0.15 | 96
PIMoG | 70 | 400 | 0.02 | 37 | 0.99 | 0.1 | 0.27 | 93
Hiding Images in an Image | 90 | 500 | 0.1 | 33 | 0.95 | 0.12 | 0.36 | 89
Figure 3: The well-crafted imperceptible perturbation is successfully added to the original watermarked image without deteriorating its image quality (panels: normal image, added perturbation, attacked image).

The evaluations are made using the settings mentioned in Section 5.4. One of the most important metrics in our evaluation is the Attack Success Rate (ASR), defined as the percentage of adversarial examples that lead to a successful attack on the target decoder. Furthermore, in order to evaluate the quality and similarity of the attacked (perturbed) watermarked images in comparison to their respective original watermarked images, we use MSE, Peak Signal-to-Noise Ratio (PSNR), the Structural Similarity Index (SSIM), and LPIPS (computed with the AlexNet backbone). These image quality metrics are used in combination with visual analysis to show the quality of DLOVE, i.e., our watermark overwriting attack. In our terms, a good-quality attack fails the target decoder without leaving any visual traces (artifacts) in the attacked watermarked image. In this line, the MSE and PSNR metrics provide a pixel-wise error measurement, which helps quantify the difference between the attacked watermarked images and their respective original watermarked images at the pixel level. At the same time, the SSIM and LPIPS metrics measure perceptual image quality, ensuring that adding the perturbation does not degrade the image.
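The pixel-wise metrics can be computed directly, as in the short sketch below (images assumed to lie in [0, 1]); SSIM and LPIPS are assumed to come from standard packages such as scikit-image and lpips rather than being re-implemented here.

```python
import torch

def mse(a, b):
    """Pixel-wise mean squared error between two images."""
    return torch.mean((a - b) ** 2)

def psnr(a, b, max_val=1.0):
    """Peak signal-to-noise ratio in dB; higher means the perturbation is less visible."""
    return 10.0 * torch.log10(max_val ** 2 / mse(a, b))
```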

Figure 4: Result of attacking a watermarked image created by the Hiding Images in an Image technique using the DLOVE attack (panels: normal image, embedded watermark, target watermark, attacked image, extracted watermark).

The performance of our DLOVE attack on the different techniques is shown in Table 2. Figure 3 shows the watermarked image quality after the DLOVE attack, where the normal image refers to the original watermarked image and the attacked image refers to the perturbed watermarked image after the DLOVE attack. Figure 4 shows the quality of the target watermark extracted after attacking the Hiding Images in an Image technique using the DLOVE attack. The attacked images maintain low MSE and LPIPS values and high PSNR and SSIM scores, indicating the imperceptibility of the added perturbation. The ASR is above 90% for almost all the techniques, highlighting the efficacy of the DLOVE attack. Our attack fails for instances where the cosine similarity between the original and target watermarks is less than 0.1. In order to achieve a 100% success rate in these cases, we had to sacrifice the perturbation limit, which is increased to 0.5. This took a toll on all the other metrics: MSE and LPIPS increased, while PSNR and SSIM decreased significantly. Noticeable artifacts also appeared in the attacked images, as shown in Figure 5.

Figure 5: Artifacts appear when attacking some watermarked images using DLOVE if the cosine similarity between the original and target watermarks is less than 0.1 (panels: normal image, added perturbation, attacked image).

5.6 Discussion

The initial training of the surrogate models for ReDMark and HiDDeN achieves good accuracy, and fine-tuning with the target watermarked images leads to a successful DLOVE attack. There were some problems with Hiding Images in an Image and PIMoG: the initial surrogate model had good accuracy and similar fine-tuning was performed, but the DLOVE attack was still unsuccessful. In the case of Hiding Images in an Image, after the DLOVE attack the target decoder recovered a distorted watermark that matched neither the original watermark nor the target watermark. To overcome this issue, we added an LPIPS loss (from the VGG-19 network) along with the initial MSE loss while training the surrogate decoder with the surrogate dataset. This led to a successful DLOVE attack when we fine-tuned the surrogate model for 90 epochs with 500 watermarked images of the target decoder. In the case of PIMoG, the target decoder recovered the original watermark successfully even from the perturbed watermarked image. This was possible due to its screen-shooting robustness. To overcome this issue, we introduced perspective warp, motion blur, and colour manipulations while initially training our surrogate model. Subsequently, the surrogate decoder was fine-tuned for 70 epochs with 400 watermarked images of the target decoder, which led to a successful DLOVE attack. This shows that the DLOVE attack needs minor tweaking in surrogate training and fine-tuning so that it can adapt to different techniques.

6 Conclusion

In this work, we have proposed the DLOVE attack on DNN-based watermarking techniques by leveraging adversarial machine learning. Our results show that modern DNN-based watermarking techniques are vulnerable to this attack. The proposed DLOVE attack raises a clear question about the security of existing DNN-based watermarking techniques. It provides a new attack vector for the designer community to assess the security of their DNN-based watermarking techniques.

References

  • [1] Adobe on watermarking ai-generated photos. https://blog.adobe.com/en/publish/2023/10/10/new-content-credentials-icon-transparency, accessed: 2022-20-02
  • [2] Ahmadi, M., Norouzi, A., Karimi, N., Samavi, S., Emami, A.: Redmark: Framework for residual diffusion watermarking based on deep networks. Expert Systems with Applications 146, 113157 (2020)
  • [3] Atito, S., Awais, M., Kittler, J.: Sit: Self-supervised vision transformer. arXiv preprint arXiv:2104.03602 (2021)
  • [4] Baluja, S.: Hiding images within images. IEEE transactions on pattern analysis and machine intelligence 42(7), 1685–1697 (2019)
  • [5] Berghel, H., O’Gorman, L.: Protecting ownership rights through digital watermarking. Computer 29(7), 101–103 (1996). https://doi.org/10.1109/2.511977
  • [6] Corley, I., Lwowski, J., Hoffman, J.: Destruction of image steganography using generative adversarial networks. arXiv preprint arXiv:1912.10070 (2019)
  • [7] Cox, I., Kilian, J., Leighton, F., Shamoon, T.: Secure spread spectrum watermarking for multimedia. IEEE Transactions on Image Processing 6(12), 1673–1687 (1997). https://doi.org/10.1109/83.650120
  • [8] Cox, I., Miller, M., Bloom, J., Honsinger, C.: Digital watermarking. Journal of Electronic Imaging 11(3), 414–414 (2002)
  • [9] Cox, I.J., Miller, M.L.: The first 50 years of electronic watermarking. EURASIP Journal on Advances in Signal Processing 2002,  1–7 (2002)
  • [10] Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: Imagenet: A large-scale hierarchical image database. In: 2009 IEEE conference on computer vision and pattern recognition. pp. 248–255. Ieee (2009)
  • [11] Fang, H., Chen, D., Huang, Q., Zhang, J., Ma, Z., Zhang, W., Yu, N.: Deep template-based watermarking. IEEE Transactions on Circuits and Systems for Video Technology 31(4), 1436–1451 (2020)
  • [12] Fang, H., Jia, Z., Ma, Z., Chang, E.C., Zhang, W.: Pimog: An effective screen-shooting noise-layer simulation for deep-learning-based watermarking network. In: Proceedings of the 30th ACM International Conference on Multimedia. pp. 2267–2275 (2022)
  • [13] Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572 (2014)
  • [14] Google on watermarking ai-generated contents. https://deepmind.google/technologies/synthid/, accessed: 2022-20-02
  • [15] Guo, C., Gardner, J., You, Y., Wilson, A.G., Weinberger, K.: Simple black-box adversarial attacks. In: International Conference on Machine Learning. pp. 2484–2493. PMLR (2019)
  • [16] Huiskes, M.J., Thomee, B., Lew, M.S.: New trends and ideas in visual concept detection: The mir flickr retrieval evaluation initiative. In: Proceedings of the international conference on Multimedia information retrieval. pp. 527–536 (2010)
  • [17] Ilyas, A., Engstrom, L., Athalye, A., Lin, J.: Black-box adversarial attacks with limited queries and information. In: International conference on machine learning. pp. 2137–2146. PMLR (2018)
  • [18] Jaderberg, M., Simonyan, K., Zisserman, A., et al.: Spatial transformer networks. Advances in neural information processing systems 28 (2015)
  • [19] Jia, Z., Fang, H., Zhang, W.: Mbrs: Enhancing robustness of dnn-based watermarking by mini-batch of real and simulated jpeg compression. In: Proceedings of the 29th ACM international conference on multimedia. pp. 41–49 (2021)
  • [20] Jung, D., Bae, H., Choi, H.S., Yoon, S.: Pixelsteganalysis: Pixel-wise hidden information removal with low visual degradation. IEEE Transactions on Dependable and Secure Computing (2021)
  • [21] Kandi, H., Mishra, D., Gorthi, S.R.S.: Exploring the learning capabilities of convolutional neural networks for robust image watermarking. Computers & Security 65, 247–268 (2017)
  • [22] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009)
  • [23] Lin, T.Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., Zitnick, C.L.: Microsoft coco: Common objects in context. In: Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part V 13. pp. 740–755. Springer (2014)
  • [24] Liu, H., Xiang, T., Guo, S., Li, H., Zhang, T., Liao, X.: Erase and repair: An efficient box-free removal attack on high-capacity deep hiding. IEEE Transactions on Information Forensics and Security (2023)
  • [25] Liu, T., Qiu, Z.d.: The survey of digital watermarking-based image authentication techniques. In: 6th International Conference on Signal Processing, 2002. vol. 2, pp. 1556–1559. IEEE (2002)
  • [26] Liu, Y., Guo, M., Zhang, J., Zhu, Y., Xie, X.: A novel two-stage separable deep learning framework for practical blind watermarking. In: Proceedings of the 27th ACM International conference on multimedia. pp. 1509–1517 (2019)
  • [27] Liu, Y., Chen, X., Liu, C., Song, D.: Delving into transferable adversarial examples and black-box attacks. arXiv preprint arXiv:1611.02770 (2016)
  • [28] Lu, C.S., Huang, S.K., Sze, C.J., Liao, H.Y.M.: Cocktail watermarking for digital image protection. IEEE Transactions on Multimedia 2(4), 209–224 (2000). https://doi.org/10.1109/6046.890056
  • [29] Lu, C.S., Liao, H.Y.: Multipurpose watermarking for image authentication and protection. IEEE Transactions on Image Processing 10(10), 1579–1592 (2001). https://doi.org/10.1109/83.951542
  • [30] Luo, X., Zhan, R., Chang, H., Yang, F., Milanfar, P.: Distortion agnostic deep watermarking. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 13548–13557 (2020)
  • [31] Madry, A., Makelov, A., Schmidt, L., Tsipras, D., Vladu, A.: Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083 (2017)
  • [32] Meta on watermark ai-generated photos. https://about.fb.com/news/2024/02/labeling-ai-generated-images-on-facebook-instagram-and-threads/, accessed: 2022-20-02
  • [33] Moosavi-Dezfooli, S.M., Fawzi, A., Frossard, P.: Deepfool: a simple and accurate method to fool deep neural networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp. 2574–2582 (2016)
  • [34] Nguyen, A., Yosinski, J., Clune, J.: Deep neural networks are easily fooled: High confidence predictions for unrecognizable images. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp. 427–436 (2015)
  • [35] Papernot, N., McDaniel, P., Goodfellow, I.: Transferability in machine learning: from phenomena to black-box attacks using adversarial samples. arXiv preprint arXiv:1605.07277 (2016)
  • [36] Papernot, N., McDaniel, P., Goodfellow, I., Jha, S., Celik, Z.B., Swami, A.: Practical black-box attacks against machine learning. In: Proceedings of the 2017 ACM on Asia conference on computer and communications security. pp. 506–519 (2017)
  • [37] Papernot, N., McDaniel, P., Jha, S., Fredrikson, M., Celik, Z.B., Swami, A.: The limitations of deep learning in adversarial settings. In: 2016 IEEE European symposium on security and privacy (EuroS&P). pp. 372–387. IEEE (2016)
  • [38] Pibre, L., Jérôme, P., Ienco, D., Chaumont, M.: Deep learning is a good steganalysis tool when embedding key is reused for different images, even if there is a cover source-mismatch. arXiv preprint arXiv:1511.04855 (2015)
  • [39] Qian, Y., Dong, J., Wang, W., Tan, T.: Deep learning for steganalysis via convolutional neural networks. In: Media Watermarking, Security, and Forensics 2015. vol. 9409, pp. 171–180. SPIE (2015)
  • [40] Raj, N.N., Shreelekshmi, R.: A survey on fragile watermarking based image authentication schemes. Multimedia Tools and Applications 80, 19307–19333 (2021)
  • [41] Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015: 18th International Conference, Munich, Germany, October 5-9, 2015, Proceedings, Part III 18. pp. 234–241. Springer (2015)
  • [42] Shaik, A.S., Karsh, R.K., Islam, M., Laskar, R.H.: A review of hashing based image authentication techniques. Multimedia Tools and Applications pp. 1–28 (2022)
  • [43] Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I., Fergus, R.: Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199 (2013)
  • [44] Vukotić, V., Chappelier, V., Furon, T.: Are deep neural networks good for blind image watermarking? In: 2018 IEEE International Workshop on Information Forensics and Security (WIFS). pp. 1–7. IEEE (2018)
  • [45] Wang, Y., Doherty, J.F., Van Dyck, R.E.: A wavelet-based watermarking algorithm for ownership verification of digital images. IEEE transactions on image processing 11(2), 77–88 (2002)
  • [46] Single-frame & image forensic watermarking. https://castlabs.com/image-watermarking/, accessed: 2022-20-02
  • [47] Wong, P.W.: A public key watermark for image verification and authentication. Proceedings 1998 International Conference on Image Processing. ICIP98 (Cat. No.98CB36269) 1, 455–459 vol.1 (1998), https://api.semanticscholar.org/CorpusID:15447332
  • [48] Wong, P.W., Memon, N.: Secret and public key image watermarking schemes for image authentication and ownership verification. IEEE Transactions on Image Processing 10(10), 1593–1601 (2001). https://doi.org/10.1109/83.951543
  • [49] Zhang, R., Isola, P., Efros, A.A., Shechtman, E., Wang, O.: The unreasonable effectiveness of deep features as a perceptual metric. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp. 586–595 (2018)
  • [50] Zhang, X., Wang, S.: Statistical fragile watermarking capable of locating individual tampered pixels. IEEE Signal Processing Letters 14(10), 727–730 (2007). https://doi.org/10.1109/LSP.2007.896436
  • [51] Zhong, X., Huang, P.C., Mastorakis, S., Shih, F.Y.: An automated and robust image watermarking scheme based on deep neural networks. IEEE Transactions on Multimedia 23, 1951–1961 (2020)
  • [52] Zhou, W., Hou, X., Chen, Y., Tang, M., Huang, X., Gan, X., Yang, Y.: Transferable adversarial perturbations. In: Proceedings of the European Conference on Computer Vision (ECCV). pp. 452–467 (2018)
  • [53] Zhu, J., Kaplan, R., Johnson, J., Fei-Fei, L.: Hidden: Hiding data with deep networks. In: Proceedings of the European conference on computer vision (ECCV). pp. 657–672 (2018)