

DLOVE: A New Security Evaluation Tool for Deep Learning Based Watermarking Techniques

Sudev Kumar Padhi
Indian Institute of Technology Bhilai
Durg, Chhattisgarh 491002
[email protected]

Dr. Sk. Subidh Ali
Indian Institute of Technology Bhilai
Durg, Chhattisgarh 491002
[email protected]
Abstract

Recent developments in Deep Neural Network (DNN) based watermarking techniques have shown remarkable performance. The state-of-the-art DNN-based techniques not only surpass the robustness of classical watermarking techniques but also remain robust against many image manipulation techniques. In this paper, we perform a detailed security analysis of different DNN-based watermarking techniques. We propose a new class of attack called the Deep Learning-based OVErwriting (DLOVE) attack, which leverages adversarial machine learning and overwrites the original embedded watermark with a targeted watermark in a watermarked image. To the best of our knowledge, this attack is the first of its kind. To show adaptability and efficiency, we launch our DLOVE attack analysis on four different watermarking techniques: HiDDeN, ReDMark, PIMoG, and Hiding Images in an Image. All these techniques use different approaches to create imperceptible watermarked images. Our attack analysis on these watermarking techniques with various constraints highlights the vulnerabilities of DNN-based watermarking. Extensive experimental results validate the capabilities of DLOVE. We propose DLOVE as a benchmark security analysis tool to test the robustness of future deep learning-based watermarking techniques.

Keywords:
Deep Learning, Adversarial Machine Learning (AML), Digital Watermarking.

1 Introduction

Digital watermarking is a well-known technique in which a watermark (a message or an image) is embedded covertly or overtly into a cover image without distorting the quality of the cover image [25, 8, 9, 42, 7]. It has various critical applications, such as copyright protection, content authentication, tamper detection, and data hiding. In watermarking, the sender embeds the watermark into the cover image and sends the watermarked image to the receiver or verifier. To validate authenticity or copyright, the watermark is extracted from the received watermarked image and compared with the original watermark, which is provided to the receiver or verifier in advance. Generally, watermarking techniques consist of two processes: watermark embedding and watermark extraction. In watermark embedding, the watermark is embedded into the input cover image to produce a watermarked image, while in watermark extraction, the watermark is extracted from the watermarked image and compared with the original watermark to validate the ownership or authenticity of the cover image. One of the popular watermarking techniques is invisible watermarking, where the watermark is covertly embedded in the cover image. The security of any invisible watermarking technique lies in the secrecy of the embedded watermark: the watermarked image should be perceptually similar to the cover image and should not contain any detectable artifact.

The classical watermarking techniques use a wide variety of embedding approaches from the spatial and frequency domains [50, 40, 47, 48, 45, 5, 28, 29]. Recently, deep learning has emerged as the key enabler of AI applications. Thus, there has been an increase in deep learning techniques using Deep Neural Networks (DNN) for different tasks due to their adaptability in various applications. Deep learning is also being utilized in the domain of watermarking, which has resulted in significant improvements in performance and efficiency compared to traditional techniques [4]. In DNN-based watermarking techniques, the watermark embedding and extraction processes are implemented using deep generative networks, such as autoencoders and Generative Adversarial Networks (GAN). The pioneering DNN-based watermarking technique proposed in [4] can hide an RGB image within another RGB image using an autoencoder network. DNN-based watermarking was further enhanced by introducing distortion into the training data to make the watermarked images robust against certain noises [44, 21]. These simple autoencoder-based techniques are vulnerable to Deep Learning based Removal (DLR) attacks [20, 6, 24]. There are different types of DLR attacks. In one approach, the attacker trains a denoising autoencoder to remove the watermark from the watermarked image as noise [6]. In another approach, the pixel distribution of the watermarked image is used to identify the distorted pixels and remove the watermark [20]. Pixel inpainting has also been utilized to remove the watermark from the watermarked image [24]. In this line, the watermarking techniques proposed in [53, 51, 19, 26] are considered to be robust against DLR attacks due to the presence of noise layers in their model architectures. Among these, the most popular technique is HiDDeN [53], which can withstand arbitrary types of image distortion and produces robust watermarked images. PIMoG [12] went one step further by introducing screen-shooting robustness, such that the watermark can be extracted even if the digital image is captured with a camera. This robustness is achieved by introducing a mask-guided loss in the training pipeline of the watermarking technique. Similarly, ReDMark [2] uses a residual structure to embed the watermark, striking a balance between robustness and imperceptibility.

Please note that DLR attacks are useful only for limited applications where the attacker's objective is just to defeat the ownership claim of the actual owner of the cover image. The attacker cannot claim ownership of the cover image using DLR attacks. In order to claim ownership of a cover image, the attacker has to overwrite the original watermark of a given watermarked image with the attacker's watermark, such that the watermark extraction process extracts the attacker's watermark from the watermarked image instead of the original watermark. There is no doubt that classical watermark overwriting attacks will not work on DNN-based watermarking techniques [39, 38]. It requires a Deep Learning based OVErwriting (DLOVE) attack. However, there is hardly any work in the open literature related to the DLOVE attack. In regular deep learning applications, similar attacks are common and are known as Adversarial Machine Learning (AML) attacks [43, 13, 37, 34]. In a targeted AML attack, the attacker induces a well-crafted perturbation into the input image such that the classification model not only fails to classify it but is also forced to misclassify it into a target class desired by the attacker. Intuitively, the attacker is overwriting the features of the original class in the input image with the features of the target class. Inspired by targeted AML attacks, for the first time, we develop the DLOVE attack against DNN-based watermarking techniques.

In this paper, we perform a security analysis of DNN-based watermarking techniques using the DLOVE attack. Here, the robustness of these DNN-based watermarking techniques is verified against well-crafted perturbations whose final goal is to overwrite the embedded watermark with the desired watermark. The attack targets the real-world scenario where watermarking techniques are used for copyright protection. To show adaptability and efficiency, we launch our DLOVE attack on four different watermarking techniques: HiDDeN [53], ReDMark [2], PIMoG [12], and Hiding Images in an Image [4]. All these techniques use different approaches to create imperceptible watermarked images. Devising a common approach to attack these techniques with various constraints highlights the vulnerabilities of DNN-based watermarking.

The paper makes the following key contributions:

  1. We are the first to propose DLOVE, a watermark overwriting attack based on the concept of targeted AML, which overwrites the embedded watermark with a target watermark by adding a well-crafted perturbation to the watermarked image.

  2. We introduce a new class of attack that relies solely on the knowledge available when DNN-based watermarking techniques are used for copyright protection.

  3. Detailed experimental results are provided to validate the success of the DLOVE attack. The results demonstrate that the DLOVE attack generalizes well across different DNN-based watermarking techniques.

2 Related Works

2.1 Deep learning based Watermarking

Recently, many DNN-based watermarking techniques have been proposed that surpass the performance of traditional watermarking by utilizing the efficient feature extraction ability of neural networks. The main architecture used in DNN-based watermarking involves an encoder network that embeds the watermark into the cover image and a decoder network that extracts the watermark from the watermarked image. DNN-based watermarking can embed an image or a bit string as a watermark, but most techniques choose to embed a bit string. Bit strings act as metadata and provide more robustness compared to embedding images as a watermark. This is because embedding an image requires the decoder to learn the spatial information of the watermark, which can hamper robustness. The encoder and decoder are trained in an end-to-end manner as a pipeline [4, 21, 11]. To further enhance the quality and robustness of the watermarked image, a discriminator is added to the pipeline during training, and noise layers are added to the model architecture of the DNN-based watermarking [53, 26, 12]. The discriminator acts as an adversary network that predicts whether a watermark is embedded in an image. Residual connections and layers applying random combinations of a fixed set of distortions are also used in some model architectures to make the watermarking technique more robust with high data hiding capacity [2, 30]. Almost all of these DNN-based methods achieve great performance in terms of image quality. Generally, robustness in watermarking refers to handling distortions that exist in image processing, such as JPEG compression, blurring, noise, crop-out, etc. There is hardly any analysis that aims to find the vulnerability of DNN-based watermarking techniques against DLOVE attacks.
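To make this encoder-decoder pipeline concrete, the following PyTorch-style sketch shows one common way a bit-string watermark is replicated spatially, concatenated with the cover image, and later recovered by a decoder. The module names, layer sizes, and 30-bit message length are illustrative assumptions and do not correspond to any specific published architecture.

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Embeds an N-bit watermark into a cover image (illustrative layer sizes)."""
    def __init__(self, msg_len=30):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3 + msg_len, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 3, kernel_size=3, padding=1),
        )

    def forward(self, cover, msg):
        # Replicate the bit string spatially and concatenate it with the cover image.
        b, _, h, w = cover.shape
        msg_map = msg.view(b, -1, 1, 1).expand(-1, -1, h, w)
        return self.conv(torch.cat([cover, msg_map], dim=1))

class Decoder(nn.Module):
    """Recovers the N-bit watermark from a (possibly distorted) watermarked image."""
    def __init__(self, msg_len=30):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, msg_len),
        )

    def forward(self, watermarked):
        return self.net(watermarked)  # logits; sigmoid(logits) > 0.5 gives the decoded bits
```

In the robust variants cited above, a noise layer (and, during training, a discriminator) would additionally sit between the encoder output and the decoder input.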

2.2 Adversarial Machine Learning

AML attacks have the capability to defeat highly accurate machine learning models [43, 13, 37, 34] by adding a well-crafted perturbation to the input image. These attacks were primarily developed to defeat deep convolutional neural network-based classifiers. Transferable AML attacks have also been developed [27, 52, 35, 36], such that a perturbation crafted to defeat one model can also be used to defeat other models performing a similar task, even if the attacker has no access to the second model's parameters or architecture. In AML, the attacker's knowledge is assumed to be either white-box (complete knowledge of the target model architecture, its parameters, and training data) [13, 31, 33] or black-box (limited or no knowledge of the target model) [15, 17]. In a white-box attack, the attacker can craft adversarial examples by directly manipulating the input data to maximize the model's loss or misclassification using the model parameters. In a black-box attack, despite lacking internal knowledge of the model, the attacker can still generate adversarial examples by exploiting the model's response to input queries. These queries can be carefully chosen such that, by observing the outputs, information about the model can be inferred and adversarial examples can be crafted accordingly using the transferability of AML attacks.
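As a minimal illustration of a white-box targeted attack of the kind described above, the one-step sign-gradient sketch below follows the FGSM family of attacks; the classifier interface, the [0, 1] pixel range, and the budget eps are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def targeted_fgsm(model, x, target_class, eps=0.03):
    """One-step targeted attack: nudge x so the classifier prefers target_class."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), target_class)  # loss w.r.t. the attacker's target class
    loss.backward()
    # Step against the gradient of the target loss to raise the target-class confidence.
    return (x - eps * x.grad.sign()).clamp(0.0, 1.0).detach()
```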

AML is a great tool for designers of deep convolutional neural network-based classifiers to test the robustness of their classifiers. Such tools are lacking in the domain of DNN-based watermarking techniques. In this paper, we try to overcome this lacuna by introducing the DLOVE attack, which can serve as a useful tool for designers of DNN-based watermarking techniques. Our approach is inspired by targeted AML attacks. The objective of the DLOVE attack is to craft a new watermark which, once added to the watermarked image, will force the watermark decoder to decode the new watermark instead of the original watermark. We demonstrate the DLOVE attack in both white-box and black-box settings.

3 Threat Model

3.1 Attacker's Goals

Copyright protection is one of the most important use cases of watermarking, through which the owner of digital content can claim their rights. An attacker can violate copyright protection either by corrupting or cleaning the embedded watermark in the image so that the decoder cannot decode the watermark from the watermarked image (Objective 1), or by overwriting the embedded watermark present in the watermarked image with a target watermark so that the decoder decodes the target watermark instead of the original one (Objective 2). In either case, the objective is to defeat the watermarking technique.

Figure 1: Overview of the proposed DLOVE attack, which leverages adversarial machine learning to create a well-crafted perturbation that overwrites the original watermark with the target watermark.

3.2 Attacker's Knowledge

Before going into the details of the attack, we make the following assumptions about the attacker’s knowledge:
Training Data: The attacker has no knowledge of the training data used to train the DNN-based watermarking model in either variant of the DLOVE attack (white-box and black-box settings). This includes both the watermark and the cover image.
Network Architecture: The architecture of the encoder network is not known to the attacker in either the black-box or the white-box setting. In the white-box variant of the attack, it is assumed that the attacker has knowledge of the decoder network architecture and its parameters. The same does not hold for the black-box setting of the DLOVE attack.

The black-box setting of the DLOVE attack is more practical and useful in professional watermarking applications [32, 14, 1, 46], where the watermarking technique is available as a service (API) to verify digital content. In such a scenario, the attacker can subscribe to the service and obtain oracle access to both the encoder and the decoder through its API. Nevertheless, there is a limit on the number of queries to the API. In stringent, security-critical application scenarios, even oracle access to the decoder is infeasible for the attacker, as it remains in the possession of the verifier only. The DLOVE attack considers these stringent security assumptions in the black-box setting.

3.3 Scenario

Let Alice be a digital artist who creates digital paintings. She wants to protect her digital paintings (copyright) from unauthorized use and distribution. Alice uses DNN-based invisible watermarking, as it protects the copyright of the painting while preserving its aesthetic appeal. The watermarking technique Alice has subscribed to uses the artist's logo as the watermark. Thus, Alice embeds the logo of her website into her digital paintings (cover images). For verification, the verifier needs to detect the presence of a watermark, extract it, and verify the owner of the digital painting. In copyright protection, similar information that forms the metadata of the digital content for different owners is used as the watermark (in this case, a logo). This ensures that the verifier can verify with consistent information. Now, there is an attacker, Eve, who has also subscribed to the same watermarking technique used by Alice. Thus, she knows that Alice's digital paintings are copyright-protected with the logo of Alice's website. Eve can clean the watermark or overwrite it with a target watermark containing a different logo in the watermarked image and recirculate it. By achieving Objective 1, Eve can only remove the watermark from the digital painting. By achieving Objective 2, Eve not only removes the watermark but also makes herself the owner of the digital painting by embedding her logo into it. Alice cannot prove that the digital painting belongs to her, as the decoder decodes the logo belonging to Eve. This scenario is depicted in Figure 1, where the decoder decodes the target watermark instead of the original watermark when the well-crafted perturbation is added to the watermarked image. Thus, the verifier will announce that the digital paintings belong to Eve.

4 Proposed Approach

4.1 Formal description

DNN-based watermarking techniques consist of an encoder and a decoder. The encoder E produces a watermarked image W by embedding the watermark α into the cover image I, as shown in Eq. (1). In contrast, the decoder D takes W as input and extracts the embedded watermark α as the output, as shown in Eq. (2). The attacker's aim is to launch the DLOVE attack to fool D by inducing an adversarial perturbation δ into W such that D decodes the target watermark β instead of the original embedded watermark α, as shown in Eq. (3).

E(I + α) → W    (1)
D(W) → α    (2)
D(W + δ) → β    (3)

4.1.1 White-Box Access:

Having white-box access to the decoder gives the attacker enough information to simulate the network by devising a targeted adversarial attack and using the gradients of the decoder to create the desired perturbation δ, where α is the original watermark, β is the target watermark, and ε is the perturbation limit. We minimize the loss (l) with respect to β, the target watermark, while maximizing the loss with respect to α, the original watermark, i.e., we solve the optimization problem shown in Eq. (4). This is the easiest approach, but it does not align with the use cases of watermarking, where access to the decoder is not allowed.

minimize_δ { l(D(W + δ), β) − l(D(W + δ), α) },   δ ∈ [−ε, ε]    (4)
Figure 2: Overview of the surrogate model attack: a) training the surrogate model using the surrogate dataset, b) fine-tuning the surrogate decoder with watermarked images of the target DNN-based watermarking technique, and c) attacking the decoder of the target DNN-based watermarking technique after generating the well-crafted perturbation from the surrogate decoder.

4.1.2 Black-Box Access:

If the attacker has the ability to use the decoder as an oracle, it can obtain a set of watermarked images and their watermarks by querying the decoder with watermarked images. Once this dataset is available, the attacker can train a surrogate decoder. Afterwards, a white-box attack is performed on the surrogate decoder to craft the desired perturbation δ, which is then used to launch the DLOVE attack on D. However, in stringent security applications, even the decoder is not available. Therefore, we consider having only limited instances of watermarked images whose watermarks are known. One of the easiest ways for the attacker to obtain such data is to request the subscribed copyright-protection service provider to copyright, say, n pairs of cover images and their watermarks.

Under this scenario, the attacker trains its own DNN-based surrogate watermarking encoder (E′) and decoder (D′) models with its own dataset (also known as the surrogate dataset), i.e., a set of cover images and their watermarks. Once the surrogate model is trained, D′ is fine-tuned with the limited instances of watermarked images available from the target decoder D to be attacked. During fine-tuning, the loss between the extracted watermark and the original watermark is used to train the surrogate decoder, as shown in Eq. (5). The surrogate decoder is trained and fine-tuned to act as the target decoder D, making the black-box attack transferable. Therefore, the attacker can launch a white-box DLOVE attack on D′ using the gradient information to craft the desired perturbation δ, as shown in Eq. (6). The same δ can be used to defeat D when added to W (Eq. (7)). The value ε is chosen judiciously such that the induced perturbation δ to the watermarked image is imperceptible. Figure 2 illustrates the training procedure of the surrogate model and the fine-tuning of the surrogate decoder to perform an attack on the target decoder.

minimize { l(D′(W), α) }    (5)
minimize_δ { l(D′(W + δ), β) },   δ ∈ [−ε, ε]    (6)
D(W + δ) → β    (7)
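A minimal PyTorch-style sketch of the fine-tuning step of Eq. (5) is given below, assuming the attacker holds a small set of watermarked images and their known bit-string watermarks from the target technique; the loop structure, learning rate, and BCE loss are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def finetune_surrogate(surrogate_decoder, watermarked_imgs, watermarks, epochs=100, lr=1e-4):
    """Fine-tune the pre-trained surrogate decoder D' on the limited (W, alpha) pairs
    obtained from the target watermarking technique, as in Eq. (5)."""
    opt = torch.optim.Adam(surrogate_decoder.parameters(), lr=lr)
    for _ in range(epochs):
        for W, alpha in zip(watermarked_imgs, watermarks):
            pred = surrogate_decoder(W.unsqueeze(0))
            # Bit-string watermark: binary cross-entropy against the known bits alpha.
            loss = F.binary_cross_entropy_with_logits(pred, alpha.unsqueeze(0))
            opt.zero_grad()
            loss.backward()
            opt.step()
    return surrogate_decoder
```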

4.2 Crafting algorithm

The adversarial perturbation crafting algorithm is shown in Algo 1. It shows how to craft a perturbation using the DLOVE attack in a white-box scenario. Inputs to the algorithm are a watermarked image W, the target decoder D, the target watermark β, a perturbation δ (initialized to zero) with the same size (k × k) as W, and a limiting range ε for δ (−ε ≤ δ ≤ ε). δ is added to W and passed into the decoder D, which decodes the secret as γ. The loss between γ and β is computed using the chosen loss function l. In each iteration of the loop, the optimizer tries to minimize the loss between γ and β while maximizing the loss between β and α, and Δ is updated accordingly. This process is repeated until the model converges and the desired δ is obtained, which realizes the DLOVE attack on W by overwriting α with β.

Algorithm 1 Adversarial perturbation crafting algorithm
Require: W, D, β, δ, ε
1: δ ← [0]_{k×k}    ▷ Initial perturbation
2: max_iter ← k × k
3: γ ← D(W + δ)    ▷ Decoded watermark
4: while i ≤ max_iter and γ ≠ β do
5:     γ ← D(W + δ)    ▷ Intermediate decoder output
6:     Δ ← l(β, γ) − l(β, α)
7:     Update δ: δ_i ← δ_{i−1} − η∇Δ(δ)    ▷ Update δ with respect to the loss
8:     δ ← Clip(δ, [−ε, ε])    ▷ Clip δ to the perturbation limit
9: end while
10: return δ

The hyperparameters (parameters that we explicitly define) for this attack include:

  1. ε: The maximum amount of allowable perturbation that can be added to the images.

  2. Optimizer: Used to find the well-crafted perturbation δ.

  3. l: The loss function chosen to minimize the loss between γ and β and maximize the loss between β and α.

This algorithm works similarly in the black-box scenario, where the decoder D is replaced by the surrogate decoder D′ and the corresponding loss becomes l(β, γ) + l(D′(W), α) in line 6 of the algorithm.
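For reference, a hedged PyTorch-style rendering of Algorithm 1 follows; the MSE loss, Adam optimizer, perturbation limit, and iteration budget mirror the settings reported later in Section 5.4, while the decoder interface and the bit-wise convergence check are illustrative assumptions. In the black-box scenario, the same routine is simply applied to the fine-tuned surrogate decoder D′.

```python
import torch
import torch.nn.functional as F

def craft_perturbation(decoder, W, alpha, beta, eps=0.3, lr=1e-3, max_iter=5000):
    """Sketch of Algorithm 1: craft delta so that decoder(W + delta) yields beta.

    decoder : target (white-box) or fine-tuned surrogate decoder
    W       : watermarked image tensor
    alpha   : original watermark bits, beta : target watermark bits (values in {0, 1})
    """
    delta = torch.zeros_like(W, requires_grad=True)   # zero-initialized perturbation
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(max_iter):
        gamma = torch.sigmoid(decoder(W + delta))     # intermediate decoder output
        # Pull gamma towards the target beta while pushing it away from the original alpha.
        loss = F.mse_loss(gamma, beta) - F.mse_loss(gamma, alpha)
        opt.zero_grad()
        loss.backward()
        opt.step()
        with torch.no_grad():
            delta.clamp_(-eps, eps)                   # clip delta to [-eps, eps]
            if torch.equal((gamma > 0.5).float(), beta):
                break                                 # decoded bits already match beta
    return delta.detach()
```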

4.3 Reason For Successful Attack

In a classification task, whatever the input may be, the classifier will always classify it into one of the classes. These classes are also known to the attacker when performing a targeted adversarial attack. Thus, the attacker adds a perturbation such that the decision boundary of the current class is crossed towards the target class and the confidence of the classifier for the target class becomes the highest. Suppose the watermark is N bits long, i.e., the output of the decoder is an N-bit watermark. We can then consider the decoder as a classifier that classifies the watermarked image into one of 2^N possible watermark classes. Therefore, our attack can be considered a targeted adversarial attack where the target class is the target watermark among the 2^N possible cases. DNN-based watermarking techniques are trained end-to-end based on the perceptual similarity of the image after embedding a watermark, which makes the embedding region-specific and susceptible to attack. Even if the models are trained for robustness against prominent image manipulation attacks, the same factor is responsible for the existence of adversarial perturbations.
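For example, with the 30-bit watermarks used by HiDDeN and PIMoG, the decoder's output space already contains 2^30 ≈ 1.07 × 10^9 possible watermarks, so the DLOVE attack amounts to a targeted misclassification into one specific class out of roughly a billion.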

5 Experimental Results

In this section, we validate the effectiveness of our DLOVE attack on four well-known DNN-based watermarking techniques: HiDDeN [53], ReDMark [2], PIMoG [12], and Hiding Images in an Image [4]. Experiments are conducted on a machine with a 14-core Intel i9 10940X CPU, 128 GB RAM, and two Nvidia RTX 5000 GPUs with 16 GB VRAM each.

5.1 Setup of Target Models

The target DNN-based watermarking techniques have different DNN model architectures, training pipelines, training datasets, and watermark sizes. The key features of these four techniques are described below with the help of Table 1:

  1. HiDDeN is an end-to-end model for image watermarking that is robust to arbitrary types of image distortion. It comprises four main components: an encoder, a parameterless noise layer N, a decoder, and an adversarial discriminator. The encoder uses a 128×128×3 cover image to embed a 30-bit binary watermark.

  2. ReDMark uses residual connections, circular convolution, an attack layer (simulating attacks during training against real-world manipulations, particularly JPEG compression), and 1-D convolution layers for embedding and extracting the watermark. It takes a grayscale 32×32×1 cover image and embeds a 4×4-bit watermark using the residual connections between the layers.

  3. PIMoG consists of three main parts: the encoder, the screen-shooting noise layer, and the decoder. To achieve screen-shooting robustness by handling perspective distortion, illumination distortion, and moiré distortion while maintaining high visual quality, the technique uses an adversary network with an edge-mask-guided image loss and a gradient-mask-guided image loss. It uses a 128×128×3 cover image to embed a 30-bit watermark.

  4. Hiding Images in an Image can hide a 200×200×3 image as a watermark inside a 200×200×3 cover image. In this technique, the watermark image is passed through a preparation network that transforms it, and a hiding network concatenates it with the original cover image to produce the watermarked image. During decoding, the watermarked image is passed through a reveal network, which outputs the hidden watermark image.

Table 1: Characteristics of the different DNN-based watermarking techniques attacked by DLOVE.

Technique | Discriminator in the Loop | Cover Image Size | Watermark Size | Dataset
HiDDeN [53] | Yes | 128×128×3 | 30 bit | COCO [23]
ReDMark [2] | No | 32×32×1 | 4×4 bit | CIFAR10 [22]
PIMoG [12] | Yes | 128×128×3 | 30 bit | COCO [23]
Hiding Images in an Image [4] | No | 200×200×3 | 200×200×3 | ImageNet [10]

5.2 Training Surrogate Model

Each of the techniques mentioned above uses a different cover image resolution and watermark size, as shown in Table 1. Therefore, we built four instances of the surrogate model (encoder and decoder), i.e., one instance for each target model. The sizes of the cover image, watermarked image, and watermark for each instance of the surrogate model are set according to the target model it corresponds to. We use the U-Net [41] architecture for the surrogate encoder, whereas for the surrogate decoder, after trying various models, we use two different architectures: one is a spatial transformer [18] with seven convolutional layers followed by two fully connected layers, and the other is a Self-supervised vision Transformer (SiT) [3] based autoencoder. The first is used to build the surrogate decoders for HiDDeN, ReDMark, and PIMoG, where a bit string is used as the watermark. The second architecture is used to build the surrogate decoder for Hiding Images in an Image, where an image is used as the watermark. We use the Mirflickr [16] dataset as our surrogate dataset, consisting of one million images with varied contexts, lighting, and themes from the social photography site Flickr. We use 50k images in our training set and 10k in our test set. We trained all four surrogate models in an end-to-end manner for 200 epochs, which is the general approach followed in DNN-based watermarking techniques [4, 53, 12, 2]. We used MSE, LPIPS [49], and L2 residual regularization loss functions between the cover image and the watermarked image for training the surrogate encoders. While training the surrogate decoders for HiDDeN, ReDMark, and PIMoG, where a bit string is used as the watermark, we used BCE to calculate the loss between the extracted and the original watermarks. In the same line, MSE and LPIPS losses are used to train the surrogate decoder for Hiding Images in an Image.
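As a rough sketch of how the losses above could be combined in one end-to-end surrogate training step (bit-string case), consider the following; the loss weights, the lpips package call, and the exact regularization term are illustrative assumptions.

```python
import torch
import torch.nn.functional as F
import lpips  # perceptual similarity; LPIPS(net="alex") or net="vgg" as discussed in the text

perceptual = lpips.LPIPS(net="alex")

def surrogate_training_step(encoder, decoder, cover, watermark, opt, w_lpips=0.1, w_reg=0.01):
    """One end-to-end training step for the surrogate encoder and decoder."""
    watermarked = encoder(cover, watermark)
    decoded = decoder(watermarked)

    # Encoder-side losses: keep the watermarked image perceptually close to the cover.
    img_loss = (F.mse_loss(watermarked, cover)
                + w_lpips * perceptual(watermarked, cover).mean()
                + w_reg * torch.norm(watermarked - cover, p=2))  # L2 residual regularization

    # Decoder-side loss: recover the embedded bit string (BCE for bit-string watermarks).
    msg_loss = F.binary_cross_entropy_with_logits(decoded, watermark)

    loss = img_loss + msg_loss
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```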

5.3 Fine-tuning Surrogate Decoder

In order to demonstrate the efficacy of our attack (Algo 1), we next show that it can successfully adapt to different watermarking techniques through a set of experiments. Before going into the details of our attack results, we briefly discuss our fine-tuning setup. For the fine-tuning, we collected 500 watermarked images and their watermarks from each of the four watermarking techniques. In the case of HiDDeN, ReDMark, and PIMoG, each watermarked image is embedded with a unique watermark generated from randomly sampled bits, whereas for Hiding Images in an Image, we use randomly sampled images from the ImageNet dataset as the watermark. Subsequently, the four instances of the trained surrogate decoder are fine-tuned on the watermarked images of the respective target decoder. The results show that fine-tuning the surrogate decoder for 100 epochs is sufficient to attack the target decoder successfully.

5.4 Attack Validation

In order to validate our attack, we generate 10,000 watermarked images using each of the four target watermarking techniques. Using each of these watermarked images, we attack the corresponding fine-tuned surrogate decoder using Algo 1 to generate the corresponding well-crafted perturbations. These perturbations are added to the watermarked images, which are then used to attack the target decoder to evaluate the success rate of our attack. We tried our attack on the surrogate decoder with L1 and MSE loss (line 6, Algo 1). Finally, we chose MSE, as the generated perturbation was imperceptible and the attack converged quickly. The initial perturbation δ is initialized as a zero-filled vector, whereas ε is chosen as −0.3 ≤ ε ≤ 0.3. We employ the Adam optimizer with an initial learning rate of 0.001. The attack converges in around 5000 iterations for all four watermarking techniques.

5.5 Evaluation

After the initial training, all four surrogate decoders achieve an accuracy of more than 90% in successfully extracting the embedded watermark when validated on the test set. This shows that all four surrogate models have converged successfully. Subsequently, these surrogate decoders are successfully fine-tuned with 500 watermarked images and their watermarks within 100 epochs to launch a white-box attack on the surrogate decoder and obtain the well-crafted perturbation that will defeat the target decoder. An experimental analysis is performed for each instance of the surrogate decoder to check whether it can be fine-tuned in fewer than 100 epochs and with fewer than 500 watermarked images and their watermarks. The optimal number of epochs and required watermarked images are shown in the 2nd and 3rd columns of Table 2.

Table 2: Optimal number of epochs and watermarked images required for fine-tuning the different surrogate models, along with the image quality of the watermarked images after adding the well-crafted perturbation using the DLOVE attack. For PSNR and SSIM, higher is better. For LPIPS and MSE, lower is better. ASR represents the success rate when attacking 10,000 watermarked images generated from each DNN-based watermarking technique.

Technique | Fine-Tuning Epochs | Fine-Tuning Images | Pert Limit (ε) | PSNR | SSIM | LPIPS | MSE | ASR
ReDMark | 40 | 200 | 0.002 | 41 | 0.97 | 0.08 | 0.05 | 98
HiDDeN | 60 | 300 | 0.008 | 38 | 0.99 | 0.07 | 0.15 | 96
PIMoG | 70 | 400 | 0.02 | 37 | 0.99 | 0.1 | 0.27 | 93
Hiding Images in an Image | 90 | 500 | 0.1 | 33 | 0.95 | 0.12 | 0.36 | 89
Figure 3: The well-crafted imperceptible perturbation is successfully added to the original watermarked image without deteriorating its image quality (panels: normal image, added perturbation, attacked image).

The evaluations are made using the settings mentioned in Section 5.4. One of the most important metrics in our evaluation is the Attack Success Rate (ASR), defined as the percentage of adversarial examples that lead to a successful attack on the target decoder. Furthermore, in order to evaluate the quality and similarity of the attacked (perturbed) watermarked images in comparison to their respective original watermarked images, we use MSE, Peak Signal-to-Noise Ratio (PSNR), the Structural Similarity Index (SSIM), and LPIPS (computed with the AlexNet backbone). These image quality metrics are used in combination with visual analysis to show the quality of DLOVE, i.e., our watermark overwriting attack. In our terms, a good-quality attack fails the target decoder without leaving any visual traces (artifacts) in the attacked watermarked image. In this line, the MSE and PSNR metrics provide a pixel-wise error measurement, which helps quantify the difference between the attacked watermarked images and their respective original watermarked images at the pixel level. At the same time, the SSIM and LPIPS metrics measure perceptual image quality, ensuring that adding the perturbation does not degrade the image.
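The pixel-wise metrics can be computed directly, as in the short sketch below (images assumed to lie in [0, 1]); SSIM and LPIPS are assumed to come from standard packages such as scikit-image and lpips rather than being re-implemented here.

```python
import torch

def mse(a, b):
    """Pixel-wise mean squared error between two images."""
    return torch.mean((a - b) ** 2)

def psnr(a, b, max_val=1.0):
    """Peak signal-to-noise ratio in dB; higher means the perturbation is less visible."""
    return 10.0 * torch.log10(max_val ** 2 / mse(a, b))
```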

Figure 4: Result of attacking a watermarked image created by the Hiding Images in an Image technique using the DLOVE attack (panels: normal image, embedded watermark, target watermark, attacked image, extracted watermark).

The performance of our DLOVE attack on the different techniques is shown in Table 2. Figure 3 shows the watermarked image quality after the DLOVE attack, where the normal image refers to the original watermarked image and the attacked image refers to the perturbed watermarked image after the DLOVE attack. Figure 4 shows the quality of the target watermark extracted after attacking the Hiding Images in an Image technique using the DLOVE attack. The attacked images maintain low MSE and LPIPS values and high PSNR and SSIM scores, indicating the imperceptibility of the added perturbation. The ASR is above 90% for almost all the techniques, highlighting the efficacy of the DLOVE attack. Our attack fails for instances where the cosine similarity between the original and target watermarks is less than 0.1. In order to achieve a 100% success rate in these cases, we had to sacrifice the perturbation limit, which is increased to 0.5. This took a toll on all the other metrics: MSE and LPIPS increased, while PSNR and SSIM decreased significantly. Noticeable artifacts also appeared in the attacked images, as shown in Figure 5.

Figure 5: Artifacts appear when attacking some watermarked images using DLOVE if the cosine similarity between the original and target watermarks is less than 0.1 (panels: normal image, added perturbation, attacked image).

5.6 Discussion

The initial training of the surrogate models for ReDMark and HiDDeN achieves good accuracy, and fine-tuning with the target watermarked images leads to a successful DLOVE attack. There were some problems with Hiding Images in an Image and PIMoG: the initial surrogate model had good accuracy and similar fine-tuning was performed, but the DLOVE attack was still unsuccessful. In the case of Hiding Images in an Image, after the DLOVE attack the target decoder recovered a distorted watermark that matched neither the original watermark nor the target watermark. To overcome this issue, we added an LPIPS loss (from the VGG-19 network) along with the initial MSE loss while training the surrogate decoder with the surrogate dataset. This led to a successful DLOVE attack when we fine-tuned the surrogate model for 90 epochs with 500 watermarked images of the target decoder. In the case of PIMoG, the target decoder recovered the original watermark successfully even from the perturbed watermarked image. This was possible due to its screen-shooting robustness. To overcome this issue, we introduced perspective warp, motion blur, and colour manipulations while initially training our surrogate model. Subsequently, the surrogate decoder was fine-tuned for 70 epochs with 400 watermarked images of the target decoder, which led to a successful DLOVE attack. This shows that the DLOVE attack needs minor tweaking in surrogate training and fine-tuning so that it can adapt to different techniques.

6 Conclusion

In this work, we have proposed the DLOVE attack on DNN-based watermarking techniques by leveraging adversarial machine learning. Our results show that modern DNN-based watermarking techniques are vulnerable to this attack. The proposed DLOVE attack raises a clear question about the security of existing DNN-based watermarking techniques. It provides a new attack vector for the designer community to assess the security of their DNN-based watermarking techniques.

References

  • [1] Adobe on watermarking ai-generated photos. https://blog.adobe.com/en/publish/2023/10/10/new-content-credentials-icon-transparency, accessed: 2022-20-02
  • [2] Ahmadi, M., Norouzi, A., Karimi, N., Samavi, S., Emami, A.: Redmark: Framework for residual diffusion watermarking based on deep networks. Expert Systems with Applications 146, 113157 (2020)
  • [3] Atito, S., Awais, M., Kittler, J.: Sit: Self-supervised vision transformer. arXiv preprint arXiv:2104.03602 (2021)
  • [4] Baluja, S.: Hiding images within images. IEEE transactions on pattern analysis and machine intelligence 42(7), 1685–1697 (2019)
  • [5] Berghel, H., O’Gorman, L.: Protecting ownership rights through digital watermarking. Computer 29(7), 101–103 (1996). https://doi.org/10.1109/2.511977
  • [6] Corley, I., Lwowski, J., Hoffman, J.: Destruction of image steganography using generative adversarial networks. arXiv preprint arXiv:1912.10070 (2019)
  • [7] Cox, I., Kilian, J., Leighton, F., Shamoon, T.: Secure spread spectrum watermarking for multimedia. IEEE Transactions on Image Processing 6(12), 1673–1687 (1997). https://doi.org/10.1109/83.650120
  • [8] Cox, I., Miller, M., Bloom, J., Honsinger, C.: Digital watermarking. Journal of Electronic Imaging 11(3), 414–414 (2002)
  • [9] Cox, I.J., Miller, M.L.: The first 50 years of electronic watermarking. EURASIP Journal on Advances in Signal Processing 2002,  1–7 (2002)
  • [10] Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: Imagenet: A large-scale hierarchical image database. In: 2009 IEEE conference on computer vision and pattern recognition. pp. 248–255. Ieee (2009)
  • [11] Fang, H., Chen, D., Huang, Q., Zhang, J., Ma, Z., Zhang, W., Yu, N.: Deep template-based watermarking. IEEE Transactions on Circuits and Systems for Video Technology 31(4), 1436–1451 (2020)
  • [12] Fang, H., Jia, Z., Ma, Z., Chang, E.C., Zhang, W.: Pimog: An effective screen-shooting noise-layer simulation for deep-learning-based watermarking network. In: Proceedings of the 30th ACM International Conference on Multimedia. pp. 2267–2275 (2022)
  • [13] Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572 (2014)
  • [14] Google on watermarking ai-generated contents. https://deepmind.google/technologies/synthid/, accessed: 2022-20-02
  • [15] Guo, C., Gardner, J., You, Y., Wilson, A.G., Weinberger, K.: Simple black-box adversarial attacks. In: International Conference on Machine Learning. pp. 2484–2493. PMLR (2019)
  • [16] Huiskes, M.J., Thomee, B., Lew, M.S.: New trends and ideas in visual concept detection: The mir flickr retrieval evaluation initiative. In: Proceedings of the international conference on Multimedia information retrieval. pp. 527–536 (2010)
  • [17] Ilyas, A., Engstrom, L., Athalye, A., Lin, J.: Black-box adversarial attacks with limited queries and information. In: International conference on machine learning. pp. 2137–2146. PMLR (2018)
  • [18] Jaderberg, M., Simonyan, K., Zisserman, A., et al.: Spatial transformer networks. Advances in neural information processing systems 28 (2015)
  • [19] Jia, Z., Fang, H., Zhang, W.: Mbrs: Enhancing robustness of dnn-based watermarking by mini-batch of real and simulated jpeg compression. In: Proceedings of the 29th ACM international conference on multimedia. pp. 41–49 (2021)
  • [20] Jung, D., Bae, H., Choi, H.S., Yoon, S.: Pixelsteganalysis: Pixel-wise hidden information removal with low visual degradation. IEEE Transactions on Dependable and Secure Computing (2021)
  • [21] Kandi, H., Mishra, D., Gorthi, S.R.S.: Exploring the learning capabilities of convolutional neural networks for robust image watermarking. Computers & Security 65, 247–268 (2017)
  • [22] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009)
  • [23] Lin, T.Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., Zitnick, C.L.: Microsoft coco: Common objects in context. In: Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part V 13. pp. 740–755. Springer (2014)
  • [24] Liu, H., Xiang, T., Guo, S., Li, H., Zhang, T., Liao, X.: Erase and repair: An efficient box-free removal attack on high-capacity deep hiding. IEEE Transactions on Information Forensics and Security (2023)
  • [25] Liu, T., Qiu, Z.d.: The survey of digital watermarking-based image authentication techniques. In: 6th International Conference on Signal Processing, 2002. vol. 2, pp. 1556–1559. IEEE (2002)
  • [26] Liu, Y., Guo, M., Zhang, J., Zhu, Y., Xie, X.: A novel two-stage separable deep learning framework for practical blind watermarking. In: Proceedings of the 27th ACM International conference on multimedia. pp. 1509–1517 (2019)
  • [27] Liu, Y., Chen, X., Liu, C., Song, D.: Delving into transferable adversarial examples and black-box attacks. arXiv preprint arXiv:1611.02770 (2016)
  • [28] Lu, C.S., Huang, S.K., Sze, C.J., Liao, H.Y.M.: Cocktail watermarking for digital image protection. IEEE Transactions on Multimedia 2(4), 209–224 (2000). https://doi.org/10.1109/6046.890056
  • [29] Lu, C.S., Liao, H.Y.: Multipurpose watermarking for image authentication and protection. IEEE Transactions on Image Processing 10(10), 1579–1592 (2001). https://doi.org/10.1109/83.951542
  • [30] Luo, X., Zhan, R., Chang, H., Yang, F., Milanfar, P.: Distortion agnostic deep watermarking. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 13548–13557 (2020)
  • [31] Madry, A., Makelov, A., Schmidt, L., Tsipras, D., Vladu, A.: Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083 (2017)
  • [32] Meta on watermark ai-generated photos. https://about.fb.com/news/2024/02/labeling-ai-generated-images-on-facebook-instagram-and-threads/, accessed: 2022-20-02
  • [33] Moosavi-Dezfooli, S.M., Fawzi, A., Frossard, P.: Deepfool: a simple and accurate method to fool deep neural networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp. 2574–2582 (2016)
  • [34] Nguyen, A., Yosinski, J., Clune, J.: Deep neural networks are easily fooled: High confidence predictions for unrecognizable images. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp. 427–436 (2015)
  • [35] Papernot, N., McDaniel, P., Goodfellow, I.: Transferability in machine learning: from phenomena to black-box attacks using adversarial samples. arXiv preprint arXiv:1605.07277 (2016)
  • [36] Papernot, N., McDaniel, P., Goodfellow, I., Jha, S., Celik, Z.B., Swami, A.: Practical black-box attacks against machine learning. In: Proceedings of the 2017 ACM on Asia conference on computer and communications security. pp. 506–519 (2017)
  • [37] Papernot, N., McDaniel, P., Jha, S., Fredrikson, M., Celik, Z.B., Swami, A.: The limitations of deep learning in adversarial settings. In: 2016 IEEE European symposium on security and privacy (EuroS&P). pp. 372–387. IEEE (2016)
  • [38] Pibre, L., Jérôme, P., Ienco, D., Chaumont, M.: Deep learning is a good steganalysis tool when embedding key is reused for different images, even if there is a cover source-mismatch. arXiv preprint arXiv:1511.04855 (2015)
  • [39] Qian, Y., Dong, J., Wang, W., Tan, T.: Deep learning for steganalysis via convolutional neural networks. In: Media Watermarking, Security, and Forensics 2015. vol. 9409, pp. 171–180. SPIE (2015)
  • [40] Raj, N.N., Shreelekshmi, R.: A survey on fragile watermarking based image authentication schemes. Multimedia Tools and Applications 80, 19307–19333 (2021)
  • [41] Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015: 18th International Conference, Munich, Germany, October 5-9, 2015, Proceedings, Part III 18. pp. 234–241. Springer (2015)
  • [42] Shaik, A.S., Karsh, R.K., Islam, M., Laskar, R.H.: A review of hashing based image authentication techniques. Multimedia Tools and Applications pp. 1–28 (2022)
  • [43] Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I., Fergus, R.: Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199 (2013)
  • [44] Vukotić, V., Chappelier, V., Furon, T.: Are deep neural networks good for blind image watermarking? In: 2018 IEEE International Workshop on Information Forensics and Security (WIFS). pp. 1–7. IEEE (2018)
  • [45] Wang, Y., Doherty, J.F., Van Dyck, R.E.: A wavelet-based watermarking algorithm for ownership verification of digital images. IEEE transactions on image processing 11(2), 77–88 (2002)
  • [46] Single-frame & image forensic watermarking. https://castlabs.com/image-watermarking/, accessed: 2022-20-02
  • [47] Wong, P.W.: A public key watermark for image verification and authentication. Proceedings 1998 International Conference on Image Processing. ICIP98 (Cat. No.98CB36269) 1, 455–459 vol.1 (1998), https://api.semanticscholar.org/CorpusID:15447332
  • [48] Wong, P.W., Memon, N.: Secret and public key image watermarking schemes for image authentication and ownership verification. IEEE Transactions on Image Processing 10(10), 1593–1601 (2001). https://doi.org/10.1109/83.951543
  • [49] Zhang, R., Isola, P., Efros, A.A., Shechtman, E., Wang, O.: The unreasonable effectiveness of deep features as a perceptual metric. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp. 586–595 (2018)
  • [50] Zhang, X., Wang, S.: Statistical fragile watermarking capable of locating individual tampered pixels. IEEE Signal Processing Letters 14(10), 727–730 (2007). https://doi.org/10.1109/LSP.2007.896436
  • [51] Zhong, X., Huang, P.C., Mastorakis, S., Shih, F.Y.: An automated and robust image watermarking scheme based on deep neural networks. IEEE Transactions on Multimedia 23, 1951–1961 (2020)
  • [52] Zhou, W., Hou, X., Chen, Y., Tang, M., Huang, X., Gan, X., Yang, Y.: Transferable adversarial perturbations. In: Proceedings of the European Conference on Computer Vision (ECCV). pp. 452–467 (2018)
  • [53] Zhu, J., Kaplan, R., Johnson, J., Fei-Fei, L.: Hidden: Hiding data with deep networks. In: Proceedings of the European conference on computer vision (ECCV). pp. 657–672 (2018)