
Layer Attack Unlearning: Fast and Accurate Machine Unlearning via
Layer Level Attack and Knowledge Distillation

Hyunjune Kim1, Sangyong Lee2, Simon S. Woo1 2 3
Abstract

Recently, serious concerns have been raised about the privacy issues related to training datasets in machine learning algorithms when they include personal data. Various regulations in different countries, including the GDPR, grant individuals the right to have personal data erased, known as ‘the right to be forgotten’ or ‘the right to erasure’. However, there has been less research on effectively and practically deleting the requested personal data from the training set without jeopardizing the overall machine learning performance. In this work, we propose a fast and novel machine unlearning paradigm at the layer level called layer attack unlearning, which is highly accurate and fast compared to existing machine unlearning algorithms. We introduce the Partial-PGD algorithm to efficiently locate the samples to forget. In addition, inspired by the Forward-Forward algorithm, we use only the last layer of the model for the unlearning process. Lastly, we use Knowledge Distillation (KD) to reliably learn the decision boundaries from the teacher using soft label information to improve accuracy performance. We conducted extensive experiments with SOTA machine unlearning models and demonstrated the effectiveness of our approach for accuracy and end-to-end unlearning performance.

1 Introduction

Deep neural networks (DNNs) have achieved significant progress and dramatic performance gains in challenging machine learning tasks in recent years. Among others, large amounts of available training datasets have been the foundation for enabling the revolution of large-scale models. However, privacy concerns have recently been raised, for example around ChatGPT (Bourtoule et al. 2021; Burgess 2023), because training datasets may contain personal information or other information that can leak one’s privacy. For example, many vision-based applications involve training on face images, which are personally identifiable information (PII). Several nations have implemented regulations, such as the General Data Protection Regulation (GDPR) (Mantelero 2013) and the EU/US Copyright Law (Kaye 2023; Kublik 2023), in order to address the potential misuse of personal information and further grant individuals the right to have personal data erased, known as ‘the right to be forgotten’ or ‘the right to erasure.’ The goal of such regulations is to give data owners the right to request the erasure of the personal or copyrighted data they want removed if they did not agree and consent in the first place.

Therefore, companies using personal data should delete the requested data from the training set. One potential approach for corporations to mitigate the aforementioned concerns involves excluding the requested data from the training dataset, followed by a complete retraining process from scratch. Nevertheless, as models like ChatGPT get bigger and datasets grow, retraining them from scratch requires excessive computational resources and time.

Machine unlearning has emerged to tackle this challenge, allowing ML models to discard specific data selectively (Bourtoule et al. 2021). Machine unlearning can be divided into two primary strategies: instance-wise and class-wise unlearning. The former involves forgetting knowledge related to specific instances from ML models, while the latter, which we focus on, completely removes particular classes from ML models. For example, face recognition and social media classification systems may need to erase data related to a specific religion, nationality, age, disease, gender, etc., for security and privacy reasons. A few approaches (Chen et al. 2023; Cha et al. 2023) have explored adversarial attacks for unlearning by harnessing the forgetting data’s noise to navigate the adjacent latent space. However, they used the original PGD (Madry et al. 2017) for unlearning, which can be slow.

In this work, we propose Layer Attack Unlearning, a fast and novel machine unlearning algorithm to tackle the class-wise unlearning problem. Our approach first introduces Partial-PGD, a new adversarial attack generation strategy to efficiently search the close vicinity of the data points to delete (see Fig. 1). Our proposed Partial-PGD is designed to attack only the fully connected (classification) layer, probing the neighboring latent space to which the forgetting data is shifted. Surprisingly, we do not utilize any feature layer information while achieving efficiency and accuracy. As shown in Fig. 1, Partial-PGD is much more efficient than the original PGD, as it creates adversarial examples only via the classification layer.

In particular, we are inspired by Hinton (2022)’s Forward-Forward (FF) algorithm, which provides the conceptual foundation for our layer-level attack for machine unlearning. According to Hinton (2022), each layer in the Forward-Forward algorithm undergoes individualized training to achieve its specific objectives. Similarly, in line with the FF research, we aim to accomplish the machine unlearning objective at the layers whose characteristics are directly relevant to the data and features we want to forget. Hence, we focus on performing machine unlearning at the layer level rather than over the entire model. Our layer-wise unlearning approach avoids unnecessary loss calculations during the unlearning process. Furthermore, updating only the weights of the layers related to the forgetting data reduces computational cost.

Finally, we employ Knowledge Distillation (KD) (Hinton, Vinyals, and Dean 2015) to modify the decision boundary for the forgetting data and preserve the decision boundary for the retain data. During unlearning, we utilize hard labels for the forgetting objective while acquiring soft label information from the teacher model to maintain performance. We show that this achieves a stable placement of the forgetting data in the space identified by the carefully crafted adversarial examples. We incorporated KD into our final loss function to improve performance.

Our main contributions are summarized as follows:

  • We introduce Layer Attack Unlearning (LAU) algorithm, which is a novel and fast unlearning method by proposing Partial-PGD and performing unlearning at the layer level.

  • In addition, we propose a KD method to further improve the overall accuracy and data erasure performance by effectively distilling the decision boundary knowledge from the teacher model for the unlearning task.

  • Our extensive experimental results with seven baselines and four different backbones, including ViT, over three different datasets show that our approach outperforms previous SOTA methods in accuracy and time performance while completely forgetting the requested class.

2 Related Work

There are two main approaches to the current machine unlearning problem in DNNs. The first involves considering unlearning during the learning process, while the second focuses on fine-tuning. This paper will refer to the approach that considers the learning process as “data-driven” and the approach that involves fine-tuning as “model-agnostic.”

2.1 Data-Driven Unlearning Methods

A “data-driven” approach utilizes data-centric strategies such as partitioning and augmentation (Nguyen et al. 2022) to address unlearning. SISA (Bourtoule et al. 2021) and Selective Forgetting (Shibata et al. 2021) are two representative data-driven unlearning methods. In SISA, data is divided into shard units, sequentially trained in slices, and multiple model checkpoints are created. Once an unlearning query is requested, it reverts the query to the checkpoint before learning and retrains this reverted query with the ensemble technique. However, it is challenging to calculate the probability of encountering unlearning queries on data points.

On the other hand, Selective Forgetting (Shibata et al. 2021) employs lifelong learning to perform unlearning. A “mnemonic code” signal is embedded in the data during training. During the unlearning process, the mnemonic code information is selectively incorporated into the loss function to remove the forgetting data. This strategy requires storing mnemonic codes for all data points and anticipating unlearning queries before building the original model, which makes it less practical in real-world scenarios.

2.2 Model-Agnostic Unlearning Methods

A “model-agnostic” approach is a methodology for handling the unlearning process by adjusting the model’s learning parameters to achieve data unlearning (Nguyen et al. 2022). Such approaches include various methods such as Summation form (Cao and Yang 2015), Negative Gradient (Golatkar, Achille, and Soatto 2020), Fisher Forgetting (Golatkar, Achille, and Soatto 2020), Boundary Unlearning (Chen et al. 2023), Instance-wise Unlearning (Cha et al. 2023), etc. Some methods apply adversarial attacks to the original model to avoid naively excluding and deleting the forgetting data. Among the mentioned algorithms, the approaches most similar to ours are Boundary Unlearning and Instance-wise Unlearning. These two algorithms perform unlearning by utilizing adversarial attacks to transition the forgetting data to nearby spaces. However, a significant difference between our approach and these methods lies in the target of the attack. Our approach directs the unlearning process toward layers with specific classification objectives instead of using the entire set of layers. Furthermore, we aim to introduce effective ways of utilizing PGD in unlearning.

Figure 1: Illustration of the original PGD vs. Partial-PGD. While (a) the original PGD involves backpropagation through all the layers to compute $x^{\text{adv}}$ with respect to the input $x$, (b) Partial-PGD computes $\ell^{\text{adv}}$ in $\mathcal{F}^{c}_{\theta}$ after passing $x$ through $\mathcal{F}^{f}_{\theta}$ to obtain $\ell$. Step in both (a) and (b) indicates the iteration.
Figure 2: The overall procedure of our approach. Our method involves the unlearning task on the classification layer instead of the entire model, where each classification layer represents the student and the teacher model. For the unlearning task, we perform Knowledge Distillation by combining the teacher logit and student logit via the unlearned mask. The teacher logit is derived from the adversarial examples obtained after applying Partial-PGD.

3 Our Approach

The main objective of our approach is to accurately and efficiently perform class-wise unlearning, which is to completely remove specific classes from the classification model. In this section, we describe our Partial-PGD, KD architecture, and our connection to the FF algorithm.

3.1 Preliminaries and Notations

First, we formulate the machine unlearning problem as follows: We define a training dataset $D_{\text{train}}=\{x^{i},y^{i}\}_{i=1}^{N}$, consisting of inputs $x^{i}\in X$ and their corresponding class labels $y^{i}\in Y$. The forgetting dataset $D_{f}$ is a subset of $D_{\text{train}}$ that we intend to forget from the pre-trained model. Conversely, the retain dataset $D_{r}=D_{\text{train}}\setminus D_{f}$ is the dataset on which we want to preserve the overall performance.

Next, we define the original model $\mathcal{M}_{\theta}:\mathbb{R}^{n}\rightarrow\mathbb{R}^{n}$, which comprises a set of feature layers denoted by $\mathcal{F}^{f}_{\theta}:\mathbb{R}^{n}\rightarrow\mathbb{R}^{n}$ and a fully connected layer denoted by $\mathcal{F}^{c}_{\theta}:\mathbb{R}^{n}\rightarrow\mathbb{R}^{n}$, where $\theta$ represents the optimal parameters for the model trained on $D_{\text{train}}$. The model $\mathcal{M}_{\theta}$ is thus represented compositionally as $\mathcal{F}^{c}_{\theta}\circ\mathcal{F}^{f}_{\theta}$. Also, we denote by $x^{\text{adv}}$ the adversarial example (Goodfellow, Shlens, and Szegedy 2014) for the input data $x$. In particular, we define $\ell^{\text{adv}}$ as the adversarial example from Partial-PGD, generated from the intermediate latent feature $\ell$ obtained from the output of $\mathcal{F}^{f}_{\theta}$, as shown in Fig. 1.

3.2 Partial-Projected Gradient Descent (PGD)

The main reason for employing adversarial examples is to more effectively search for and identify neighboring candidate spaces to which the forgetting data samples can be assigned. Assigning forgetting classes to random or irrelevant classes can dramatically reduce downstream task performance.

Therefore, carefully exploring the neighboring space allows us not only to forget $D_{f}$ but also to preserve the decision boundaries of other classes. Hence, adversarial attacks (Madry et al. 2017; Chen et al. 2023) can be formulated as follows:

x^{t+1}=\Pi\big(x^{t}+\epsilon\cdot\mathrm{sign}(\nabla_{x}\mathcal{L}(x,y,\theta))\big), (1)

where the parameter $\theta$ represents the weights of the target model under attack, and the noise for crafting adversarial examples is produced by computing the gradient $\nabla_{x}\mathcal{L}$ of the loss function $\mathcal{L}$ with respect to the input $x$. This noise is added to $x^{t}$ and then projected using the projection operator $\Pi$ to obtain $x^{t+1}$, and the process is repeated for $t$ steps. The final $x^{t+1}$ is the adversarial example $x^{\text{adv}}$.

However, the purpose of adversarial examples in our work differs from prior approaches. The original PGD may generate excessive noise and slow the unlearning process considerably; for our purpose, there is no need to calculate gradients throughout the entire model to create adversarial examples.

Hence, our proposed Partial-PGD utilizes $\mathcal{F}^{c}_{\theta}$ to generate adversarial examples for the unlearning process, as shown in Fig. 1. This technique effectively identifies the neighboring space to which $D_{f}$, the forgetting data, is allocated, similar to conventional PGD. However, it significantly reduces unlearning time by omitting feature layer information, as depicted in Fig. 1. We define our Partial-PGD as follows:

\ell^{t+1}=\Pi\big(\ell^{t}+\epsilon\cdot\mathrm{sign}(\nabla_{\ell}\mathcal{L}(\ell,y,\theta))\big), (2)

where Partial-PGD applies an adversarial attack to the intermediate latent $\ell$ obtained from $\mathcal{F}^{f}_{\theta}$, and the gradient with respect to $\ell$ is computed solely by passing through $\mathcal{F}^{c}_{\theta}$. The result is mapped to the nearby space of a different label and becomes $\ell^{\text{adv}}$, which we use as the knowledge to be forgotten when unlearning $D_{f}$.
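To make this concrete, the following is a minimal PyTorch-style sketch of the Partial-PGD update in Eq. 2. It assumes `classifier` plays the role of $\mathcal{F}^{c}_{\theta}$ and `latent` is $\ell$ produced by the frozen feature layers; the function name, step size `eps`, number of steps, and the $\ell_{\infty}$ form of the projection $\Pi$ are illustrative assumptions rather than the paper's exact settings.

```python
import torch
import torch.nn.functional as F

def partial_pgd(classifier, latent, labels, eps=0.01, radius=0.1, steps=10):
    """Sketch of Partial-PGD (Eq. 2): perturb the latent feature l using
    gradients taken only through the classification layer F^c_theta."""
    l_orig = latent.detach()
    l_adv = l_orig.clone()
    for _ in range(steps):
        l_adv.requires_grad_(True)
        loss = F.cross_entropy(classifier(l_adv), labels)  # L(l, y, theta)
        grad = torch.autograd.grad(loss, l_adv)[0]         # gradient w.r.t. l only
        with torch.no_grad():
            l_adv = l_adv + eps * grad.sign()              # ascent step
            # Projection Pi: keep the perturbation inside an l_inf ball (assumed form).
            l_adv = l_orig + (l_adv - l_orig).clamp(-radius, radius)
    return l_adv.detach()
```

Because gradients flow only through the classification layer, each step avoids a full backward pass through $\mathcal{F}^{f}_{\theta}$, which is where the speedup over the original PGD comes from.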

3.3 Layer Unlearning

While other approaches use the entire set of layers for unlearning, we focus on unlearning only the relevant layers. Inspired by the FF technique, we focus on the classification layer $\mathcal{F}^{c}_{\theta}$ to forget specific classes from the model for class-wise unlearning. Therefore, our layer unlearning modifies only the parameters of $\mathcal{F}^{c}_{\theta}$ tied to classification, instead of the entire model $\mathcal{M}_{\theta}$, to forget $D_{f}$ effectively.

We define the following equation to describe our unlearning process, where we focus on $\mathcal{F}^{c}_{\theta}$ during the unlearning process to remove $D_{f}$ from the model:

\mathcal{M}_{\theta^{*}}=\mathcal{F}^{c}_{\theta^{*}}\circ\mathcal{F}^{f}_{\theta}, (3)

where $\theta^{*}$ denotes the ideal parameters after forgetting $D_{f}$.

We show that layer unlearning accelerates the unlearning process by selectively updating only the relevant layer weights, optimizing efficiency. Interestingly, it also outperforms unlearning over the whole set of layers in accuracy.
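A minimal sketch of this setup is shown below, assuming the pre-trained network exposes its feature layers and final fully connected layer as `model.features` and `model.fc` (the attribute names and optimizer settings are illustrative assumptions): the feature layers are frozen, and only the classification layer is registered with the optimizer.

```python
import copy
import torch

def build_layer_unlearning_setup(model, lr=0.01):
    """Freeze the feature layers F^f_theta and expose the classification
    layer F^c_theta as the trainable student, as in Eq. 3."""
    feature_layers = model.features        # F^f_theta (assumed attribute name)
    classifier = model.fc                  # F^c_theta (assumed attribute name)

    for p in feature_layers.parameters():  # feature weights stay fixed
        p.requires_grad = False

    student = classifier                   # S_theta: the only layer we update
    teacher = copy.deepcopy(classifier)    # T_theta: refreshed at each epoch
    optimizer = torch.optim.SGD(student.parameters(), lr=lr, momentum=0.9)
    return feature_layers, student, teacher, optimizer
```

Only the classification-layer parameters receive gradient updates, so the number of updated parameters, and hence the computational cost per step, is a small fraction of that of whole-model unlearning.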

3.4 End-to-End Unlearning Process

We describe our end-to-end unlearning process, where we apply KD to further improve the overall performance. As illustrated in Fig. 2, the classification layer $\mathcal{F}^{c}_{\theta}$ serves as our student model $\mathcal{S}_{\theta}$. Additionally, at the beginning of each epoch, we duplicate $\mathcal{S}_{\theta}$ as our teacher $\mathcal{T}_{\theta}$. The model takes the forgetting data $D_{f}$ as input and creates an intermediate latent feature $\ell_{f}$ through the feature layers $\mathcal{F}^{f}_{\theta}$. Then, $\ell_{f}$ becomes an adversarial example $\ell^{\text{adv}}_{f}$ after applying Partial-PGD on $\mathcal{T}_{\theta}$.

Next, $\ell_{f}$ and $\ell^{\text{adv}}_{f}$ are passed through $\mathcal{S}_{\theta}$ and $\mathcal{T}_{\theta}$, respectively, producing the student and teacher logits, as shown in Fig. 2. Then, the logit obtained from $\mathcal{S}_{\theta}$ is compared with the ground truth $y_{f}$. If a discrepancy is observed, the sample is considered unlearned, and the unlearned student logit replaces the adversarial logit from $\mathcal{T}_{\theta}$. The student’s logit is used to compute the cross-entropy loss as follows:

\mathcal{L}_{CE}=\begin{cases}\text{CE}(\mathcal{S}_{\theta}(\ell_{f}),\,y_{f}^{\text{adv}})&\text{if }y_{\mathcal{S}_{\theta}}=y_{f}\\ \text{CE}(\mathcal{S}_{\theta}(\ell_{f}),\,y_{\mathcal{S}_{\theta}})&\text{otherwise},\end{cases} (4)

where $y_{\mathcal{S}_{\theta}}$ represents the predicted label from $\mathcal{S}_{\theta}(\ell_{f})$, and CE is the cross-entropy function. This loss keeps already-unlearned data in a state where it makes wrong (unlearned) predictions; otherwise, the sample is trained toward the predicted label $y_{f}^{\text{adv}}$ of the adversarial logit, which drives its unlearning. Next, let $Z$ be the double Softmax representation, which is defined as:

Z=\begin{cases}\sigma(\mathcal{T}_{\theta}(\ell_{f}^{\text{adv}}))&\text{if }y_{\mathcal{S}_{\theta}}=y_{f}\\ \sigma(\mathcal{S}_{\theta}(\ell_{f}))&\text{otherwise},\end{cases} (5)

where $\sigma$ represents the Softmax function. In Eq. 5, we perform a double Softmax to distill knowledge by adjusting the probability distribution of the output from $\mathcal{T}_{\theta}$. This approach is intended to convey soft label information to $\mathcal{S}_{\theta}$. Exclusively unlearning $\mathcal{F}^{c}_{\theta}$ maintains the decision boundaries of the retain data and slightly improves the overall accuracy, but layer unlearning without double Softmax showed variable accuracy, as observed on the Fashion-MNIST dataset (Xiao, Rasul, and Vollgraf 2017). We show this effect in Section 4.3. Next, we define our distillation loss as follows:

\mathcal{L}_{DI}=\text{KL}\left(\sigma\left(\cfrac{\mathcal{S}_{\theta}(\ell_{f})}{T}\right),\,\sigma\left(\cfrac{Z}{T}\right)\right), (6)

where the knowledge is distilled from $Z$ of $\mathcal{T}_{\theta}$ and KL is the KL divergence. During distillation, computing the loss $\mathcal{L}_{DI}$ between the outputs of $\mathcal{S}_{\theta}$ and $\mathcal{T}_{\theta}$ focuses on creating a boundary similar to the teacher model’s, ensuring performance while removing the information of $D_{f}$. The temperature $T$ is a hyper-parameter. Generally, increasing $T$ generates smoother soft labels that assist $\mathcal{S}_{\theta}$ in mimicking $\mathcal{T}_{\theta}$. The effects of changes in $T$ are described in Suppl. Mat.

Using $\mathcal{L}_{CE}$ and $\mathcal{L}_{DI}$, our final loss function is constructed as follows:

\mathcal{L}=(1-\alpha)\cdot\mathcal{L}_{CE}+\alpha\cdot T^{2}\cdot\mathcal{L}_{DI}, (7)

where $\alpha$ is the weight balancing $\mathcal{L}_{CE}$ and $\mathcal{L}_{DI}$. As a hyper-parameter, $\alpha$ ranges from 0 to 1. Assigning more weight to $\mathcal{L}_{CE}$ may speed up unlearning but decrease performance. Conversely, providing more weight to $\mathcal{L}_{DI}$ may slow down unlearning but increase accuracy. We conducted an ablation study over $\alpha$ values to capture this trade-off. The effects of changes in the exponent of $T^{2}$ are described in Suppl. Mat.
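The sketch below illustrates one way Eqs. 4-7 could be combined in PyTorch for a batch of forgetting data, reusing the `student`, `teacher`, and latent features from the sketches above; the masking details and reduction choices reflect our reading of Fig. 2 and are assumptions rather than a reference implementation.

```python
import torch
import torch.nn.functional as F

def unlearning_loss(student, teacher, latent, latent_adv, y_f, alpha=0.5, T=4.0):
    """Sketch of the combined loss in Eq. 7 for one batch of forgetting data."""
    s_logits = student(latent)                  # S_theta(l_f)
    with torch.no_grad():
        t_logits = teacher(latent_adv)          # T_theta(l_f^adv)

    y_pred = s_logits.argmax(dim=1)             # y_{S_theta}
    y_adv = t_logits.argmax(dim=1)              # y_f^adv
    not_unlearned = (y_pred == y_f)             # sample still predicts its true class

    # Eq. 4: push not-yet-unlearned samples toward the adversarial label,
    # and keep already-unlearned samples at their current (wrong) prediction.
    ce_targets = torch.where(not_unlearned, y_adv, y_pred)
    loss_ce = F.cross_entropy(s_logits, ce_targets)

    # Eq. 5: double Softmax -- teacher probabilities for not-yet-unlearned
    # samples, student probabilities otherwise.
    Z = torch.where(not_unlearned.unsqueeze(1),
                    F.softmax(t_logits, dim=1),
                    F.softmax(s_logits, dim=1))

    # Eq. 6: KL divergence between the temperature-scaled distributions.
    loss_di = F.kl_div(F.log_softmax(s_logits / T, dim=1),
                       F.softmax(Z / T, dim=1),
                       reduction="batchmean")

    # Eq. 7: weighted combination.
    return (1 - alpha) * loss_ce + alpha * (T ** 2) * loss_di
```

Minimizing this loss on $D_{f}$ with the optimizer attached only to the student layer realizes the per-epoch update of Algorithm 1.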

Algorithm 1 End-to-End Unlearning Process

Input: $\mathcal{F}^{f}_{\theta}$, $\mathcal{F}^{c}_{\theta}$, $D_{f}$
Parameter: Learning rate $\eta$, hyper-parameter $\alpha$, temperature $T$, number of epochs $E$
Output: $\mathcal{M}_{\theta^{*}}$

1:  $\mathcal{S}_{\theta}\leftarrow\mathcal{F}^{c}_{\theta}$
2:  $\theta^{*}\leftarrow\theta$
3:  for $e$ in range $E$ do
4:     $\mathcal{T}_{\theta^{*}}\leftarrow\mathcal{S}_{\theta^{*}}$
5:     $\mathcal{L}\leftarrow(1-\alpha)\cdot\mathcal{L}_{CE}+\alpha\cdot T^{2}\cdot\mathcal{L}_{DI}$
6:     $\theta^{*}\leftarrow\theta^{*}-\eta\cdot\nabla_{\theta^{*}}\mathcal{L}$
7:     if $\mathcal{F}^{c}_{\theta^{*}}\circ\mathcal{F}^{f}_{\theta}(X_{f})\neq Y_{f}$ then
8:        break
9:     end if
10:  end for
11:  $\mathcal{M}_{\theta^{*}}\leftarrow\mathcal{F}^{c}_{\theta^{*}}\circ\mathcal{F}^{f}_{\theta}$
12:  return $\mathcal{M}_{\theta^{*}}$
Figure 3: Boundary evolution in the unlearning process. As shown in (a), the original model receives the initial knowledge about the boundary. As the epoch progresses, the boundary information updates as depicted in (b) and (c) from the distilled knowledge.

In addition, we provide the end-to-end unlearning process in Alg. 1. We distill knowledge from $\mathcal{T}_{\theta}$ while gradually reducing the boundaries. Algorithm 1 finishes either when all epochs are completed or when $D_{f}$ becomes unlearned within a batch during an epoch. Finally, we obtain our unlearned model $\mathcal{M}_{\theta^{*}}$ by combining $\mathcal{F}^{f}_{\theta}$ with the classification layer $\mathcal{F}^{c}_{\theta^{*}}$, as shown in Eq. 3.

Summary. In Fig. 3, we pictorially describe our end-to-end unlearning process by displaying the boundary change for the retain and forgetting data.

4 Experimental Results

Table 1: Accuracy and Unlearning Score (US) performance on the CIFAR-10, Fashion-MNIST and VGGFace2 datasets. Bold font highlights the highest performing results, while underlining indicates the second-best performance.
Model VGG16 ResNet18 ResNet50 ViT
Metrics $D_{r}\uparrow$ $D_{f}\downarrow$ $D_{tr}\uparrow$ $D_{tf}\downarrow$ US $D_{r}\uparrow$ $D_{f}\downarrow$ $D_{tr}\uparrow$ $D_{tf}\downarrow$ US $D_{r}\uparrow$ $D_{f}\downarrow$ $D_{tr}\uparrow$ $D_{tf}\downarrow$ US $D_{r}\uparrow$ $D_{f}\downarrow$ $D_{tr}\uparrow$ $D_{tf}\downarrow$ US
CIFAR-10 Original 99.98 100 92.07 96.70 0.4494 99.98 100 93.13 96.60 0.4575 99.94 99.96 93.44 95.0 0.4646 88.06 93.52 81.48 88.40 0.4020
Retrain (Optimal) 99.89 0 91.98 0 0.9390 99.79 0 92.50 0 0.9428 99.77 0 92.48 0 0.9426 95.0 0 81.0 0 0.8631
Negative Gradient 88.53 16.96 79.86 17.0 0.7320 93.85 28.38 86.30 25.54 0.7204 88.75 24.77 82.52 23.30 0.7087 85.264 18.69 79.74 16.7 0.7332
Fine-tune 99.63 0 90.09 0 0.9253 99.63 0 91.25 0 0.9337 99.45 0 90.79 0 0.9304 90.96 1.77 82.43 1.62 0.8598
Random Label 80.99 3.56 72.40 3.69 0.7805 91.38 11.09 84.00 10.98 0.8007 81.30 12.91 76.62 11.84 0.7467 77.58 15.10 73.42 14.38 0.7094
Fisher Forgetting 46.78 55.24 44.61 52.30 0.3414 59.0 52.34 55.57 52.2 0.3945 58.17 58.06 55.95 56.20 0.3781 42.68 66.34 43.34 62.30 0.2911
Boundary Shrink 90.73 10.16 81.53 9.58 0.7943 95.88 9.75 87.91 10.24 0.8329 86.03 3.94 80.09 3.46 0.8303 85.22 0.61 79.29 0.28 0.8498
IWU 90.81 0 82.35 0.10 0.8712 89.41 0 82.55 0 0.8733 86.11 0 79.98 0 0.8564 82.48 3.92 77.01 2.58 0.8173
Ours 99.97 0 92.18 0 0.9405 99.97 0 93.53 0 0.9504 99.92 0 93.52 0 0.9503 87.51 0 81.14 0 0.8640
Fashion-MNIST Original 99.83 100 94.38 99.60 0.4579 98.45 99.96 94.71 99.70 0.4601 98.49 99.98 94.68 99.6 0.4601 91.27 98.71 88.28 97.10 0.4210
Retrain (Optimal) 100 0 93.40 0 0.9494 100 0 93.38 0 0.9493 100 0 93.28 0 0.9485 89.44 0 86.76 0 0.9019
Negative Gradient 97.77 0 92.63 0 0.9438 92.57 1.39 90.04 0.84 0.9183 84.44 12.63 81.42 10.22 0.7890 71.77 0.10 70.38 0.10 0.7964
Fine-tune 99.67 0 93.07 0 0.9470 97.23 0 91.93 0 0.9386 98.83 0 92.85 0 0.9454 96.08 0.01 88.72 0.10 0.9148
Random Label 98.17 8.34 92.43 23.55 0.7763 76.80 11.47 74.80 11.54 0.7375 75.99 10.77 73.73 10.72 0.7368 84.18 11.36 82.10 13.04 0.7736
Fisher Forgetting 62.33 28.81 60.32 28.10 0.5471 72.78 57.65 71.03 54.10 0.4705 60.59 84.01 60.25 82.60 0.2958 43.42 88.01 42.60 86.3 0.1972
Boundary Shrink 86.88 1.47 81.66 1.12 0.8586 95.78 34.54 92.31 32.40 0.7225 83.50 30.23 80.60 27.08 0.6728 70.31 2.04 68.74 2.70 0.7665
IWU 99.09 0 93.68 0 0.9515 93.82 0 90.80 0 0.9304 80.17 0 77.94 0 0.8434 82.85 0 81.21 0 0.8645
Ours 99.51 0 93.89 0 0.9531 97.98 0 94.54 0 0.9579 98.14 0 94.48 0 0.9575 90.11 0 87.44 0 0.9066
VGGFace2 Original 100 100 96.67 98.41 0.4787 100 100 95.88 98.41 0.4727 99.12 98.43 93.67 100 0.4514 94.71 96.86 95.43 93.82 0.4832
Retrain (Optimal) 99.98 0 96.67 0 0.9740 100 0 96.20 0 0.9705 99.10 0 94.77 0 0.9596 92.63 0 93.32 0 0.9488
Negative Gradient 96.85 15.67 90.50 4.76 0.8915 97.32 9.75 89.55 12.69 0.8272 86.80 4.73 78.79 3.17 0.8241 91.16 1.63 92.34 0 0.9416
Fine-tune 97.86 0 89.87 0 0.9416 91.42 0 85.91 0 0.8960 95.18 0 90.03 0 0.9249 96.91 1.63 84.85 3.70 0.8600
Random Label 90.32 1.74 79.11 1.58 0.8384 96.76 6.44 87.34 0 0.9059 88.24 13.19 82.43 9.52 0.8007 92.06 9.68 91.04 8.64 0.8667
Fisher Forgetting 46.24 31.01 42.72 50.79 0.3400 72.78 57.65 71.03 54.10 0.4705 76.28 4.52 71.83 7.93 0.7455 60.80 71.07 53.58 60.49 0.3472
Boundary Shrink 99.48 17.25 93.04 5.36 0.9055 94.02 5.40 86.08 5.36 0.8559 93.85 5.36 85.78 5.0 0.8565 86.92 6.46 86.81 4.25 0.8693
IWU 99.21 10.80 94.46 4.76 0.8650 75.23 0.17 69.77 0 0.7936 78.62 0 69.14 0 0.7899 76.25 0.27 78.66 0 0.8479
Ours 99.70 0 96.70 0 0.9743 99.79 0 95.34 0 0.9639 97.46 0 93.28 0 0.9485 95.18 0 95.50 0 0.9651

We experiment and evaluate popular unlearning benchmarks used in other unlearning research (Golatkar, Achille, and Soatto 2020; Chen et al. 2023; Cha et al. 2023) on image classification tasks.

Datasets and Models. We conducted experiments on CIFAR-10 (Krizhevsky, Hinton et al. 2009), Fashion-MNIST (Xiao, Rasul, and Vollgraf 2017), and VGGFace2 (Cao et al. 2018) datasets. For the VGGFace2 dataset, we randomly select ten individuals from a training dataset containing over 600 images, ensuring a balanced gender distribution. Furthermore, we perform training from scratch for three different architectures: VGG16 (Simonyan and Zisserman 2014), ResNets (He et al. 2016), and ViT (Dosovitskiy et al. 2020).

Baseline Approaches. The subsequent unlearning baseline methods are used:

1) Original: We train the model on the $D_{\text{train}}$ dataset before undergoing the unlearning process.

2) Retrain: We train the model from scratch utilizing $D_{r}$ as the retrained model, an optimal unlearning strategy.

3) Negative Gradient (NG) (Golatkar, Achille, and Soatto 2020): We fine-tune the Original with $D_{f}$ by following the direction of gradient ascent.

4) Fine-tune (Golatkar, Achille, and Soatto 2020): We fine-tune the Original using $D_{r}$ with a large learning rate.

5) Random Label (Golatkar, Achille, and Soatto 2020): We fine-tune the Original by assigning arbitrary labels randomly to $D_{f}$.

6) Fisher Forgetting (Golatkar, Achille, and Soatto 2020): The Fisher Forgetting model identifies influential parameters significantly affecting $D_{f}$ and then introduces noise to neutralize their impact.

7) Boundary Shrink (Chen et al. 2023): We create adversarial examples from $D_{f}$ and assign new adversarial labels to shrink the forgetting class towards different classes.

8) IWU (Cha et al. 2023): We generate adversarial instances for distinct labels via $D_{f}$ and incorporate a regularization term. While initially designed for instance-wise unlearning, we adapt this method for class-wise unlearning problems.

Implementation Details and Evaluation Metrics. Our method and the other baselines are implemented in Python 3.7 with the PyTorch library (Paszke et al. 2019), using a single NVIDIA GeForce RTX 3090 GPU. The initial model was trained using an LR scheduler and an SGD optimizer with specific settings (momentum: 0.9, weight decay: $5\times 10^{-4}$, initial learning rate: 0.01). For the unlearning phase, we employ the SGD optimizer and conduct experiments with varying learning rates (ranging from 0.001 to 0.01), KD $\alpha$ values (ranging from 0.3 to 0.7), a KD temperature $T$ fixed at 4, and Partial-PGD values ranging from 0.4 to 1.0. As defined, $D_{f}$ and $D_{r}$ represent the forgetting and retain data, respectively. Additionally, $D_{tf}$ corresponds to the test forgetting data, and $D_{tr}$ represents the test retain data. We assess accuracy on all four of these data splits.
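For reference, a minimal sketch of the optimizer configuration described above is shown below; the specific scheduler type and the concrete learning rate and $\alpha$ chosen from the stated ranges are assumptions for illustration.

```python
import torch

def make_optimizers(model, student):
    """Sketch of the optimization setup described above."""
    # Initial model training: SGD with momentum 0.9, weight decay 5e-4, lr 0.01.
    pretrain_opt = torch.optim.SGD(model.parameters(), lr=0.01,
                                   momentum=0.9, weight_decay=5e-4)
    # The LR scheduler type is not specified; StepLR is an assumed placeholder.
    scheduler = torch.optim.lr_scheduler.StepLR(pretrain_opt, step_size=30, gamma=0.1)

    # Unlearning phase: SGD on the classification layer only (lr in [0.001, 0.01]).
    unlearn_opt = torch.optim.SGD(student.parameters(), lr=0.005)
    kd_config = {"alpha": 0.5, "T": 4.0}  # alpha in [0.3, 0.7], T fixed at 4
    return pretrain_opt, scheduler, unlearn_opt, kd_config
```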

4.1 Accuracy Performance

To achieve the best unlearning performance, a method should completely forget the information related to $D_{f}$. Therefore, accuracy on par with that achieved by Retrain for both $D_{f}$ and $D_{r}$ is ideal. Table 1 presents test results across different classification models, datasets, and metrics. The tested models include VGG16, ResNet18, ResNet50, and ViT, and the datasets used for testing were CIFAR-10, Fashion-MNIST, and VGGFace2. In addition to the accuracy metric, we evaluate performance using the unlearning score (US), defined as follows:

\text{US}(\text{acc}_{r},\text{acc}_{f})=\cfrac{\exp\left(\cfrac{\text{acc}_{r}}{100}\right)+\exp\left(1-\cfrac{\text{acc}_{f}}{100}\right)-2}{2\cdot(\exp(1)-1)}, (8)

where $\text{acc}_{r}$ and $\text{acc}_{f}$ denote the accuracy on the retain and forgetting dataset, respectively. If the accuracy on $D_{tr}$ approaches 100% and the accuracy on $D_{tf}$ approaches 0%, the US metric approaches 1, indicating a stable unlearning result. We provide a more detailed explanation of why this metric is useful for unlearning in Suppl. Mat.
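For illustration, the US in Eq. 8 can be computed with a few lines of Python (the function name and the example inputs are ours):

```python
import math

def unlearning_score(acc_r, acc_f):
    """Unlearning Score (Eq. 8): combines retain accuracy acc_r and forgetting
    accuracy acc_f (both in percent) into a single value in [0, 1]."""
    us_r = math.exp(acc_r / 100.0)        # higher retain accuracy -> larger term
    us_f = math.exp(1.0 - acc_f / 100.0)  # lower forgetting accuracy -> larger term
    return (us_r + us_f - 2.0) / (2.0 * (math.e - 1.0))

print(unlearning_score(100.0, 0.0))    # 1.0: perfect retention and forgetting
print(unlearning_score(93.13, 96.60))  # ~0.4575, the CIFAR-10 ResNet18 Original row
```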

Finally, Table 1 presents the performance of each unlearning method for a specific single class in the aforementioned datasets. We measure the accuracy for $D_{r}$, $D_{f}$, $D_{tr}$, and $D_{tf}$, along with the US metric. For NG, the unstable variability in the negative-gradient loss contributes to less favorable overall performance. Fine-tune shows strong performance in both forgetting and retaining information. Nevertheless, this method requires utilizing the complete dataset $D_{r}$ during training; using such extensive data is time-consuming, and we compare its worse time performance in Table 2. In the case of Random Label, except for VGGFace2 with ResNet18, most cases show poor accuracy and US. Because labels are assigned at random during forgetting, converging toward arbitrary labels in the classification space is challenging, resulting in low performance.

Fisher Forgetting exhibits poor performance, with low accuracy and US across the overall tests. Also, computing the Fisher matrix information requires a significant amount of time. Boundary Shrink also utilizes adversarial examples, but it uses the hard label information of the attack examples on $D_{f}$, which results in an unstable unlearning process. The IWU approach utilizes adversarial examples while incorporating regularization to achieve a stable unlearning process; however, it attains an average US of only 0.8587 across the overall tests.

Table 2: Total extra data used and time consumption in seconds for different unlearning methods.
Retrain  Fisher Forgetting  Fine-tune  NG  Random Label  Boundary Shrink  IWU  Ours
CIFAR-10 Total Extra Data Used 45,000 45,000 45,000 5,000 5,000 5,000 5,000 5,000
Time w/ VGG16 3,683 9,710 433 73 24 116 1351 3.76
Time w/ ResNet18 2,871 12,526 546 153 30 191 362 4.37
Time w/ ResNet50 4,705 19,850 1,061 174 57 471 1513 7.76
Time w/ ViT 4,441 13,238 479 78 23 163 1563 25.93
Fashion-MNIST Total Extra Data Used 54,000 54,000 54,000 6,000 6,000 6,000 6,000 6,000
Time w/ VGG16 2,309 8,526 430 85 23 214 1072 8.75
Time w/ ResNet18 2,768 12,116 582 103 30 715 223 5.19
Time w/ ResNet50 5,758 22,013 1,229 206 76 929 967 9.14
Time w/ ViT 2,155 8,377 487 80 25 282 546 13.39
VGGFace2 Total Extra Data Used 5,726 5,726 5,726 574 574 574 574 574
Time w/ VGG16 1,840 1,295 468 400 17 338 548 5.6
Time w/ ResNet18 1,861 1,354 670 140 27 473 1258 6.51
Time w/ ResNet50 3,721 2,597 3,291 484 157 503 1837 17.77
Time w/ ViT 2,155 1,428 665 84 27 187 783 6.74

Finally, Ours completely removes the forgetting dataset (0% accuracy) in all test cases while retaining the highest unlearning performance. The accuracy for both $D_{f}$ and $D_{tf}$ reaches 0, while the accuracy for $D_{r}$ and $D_{tr}$ is comparable to, or sometimes even higher than, that of Retrain. Also, ours demonstrates superior performance compared to almost all baseline models across various scenarios, with a high average US of 0.9443. Our approach, which applies Partial-PGD and a KD-based unlearning process to layers with explicit objectives, clearly achieves the best unlearning performance.

Figure 4: Impact of hyper-parameter $\alpha$ in Knowledge Distillation vs. accuracy on CIFAR-10 with ResNet18.
Figure 5: Visualization of the decision boundary for the CIFAR-10 dataset with ResNet18, where each point represents a sample colored by its predicted class. Red dots in (a) are the data to be removed, which do not appear in (b) and (c), indicating successful unlearning. Similar plots for other models are provided in Suppl. Mat.
Table 3: Original PGD vs. Partial-PGD.
Original PGD Partial PGD
$D_{tr}$ $D_{tf}$ Time (s) $D_{tr}$ $D_{tf}$ Time (s)
VGG16 92.03 0 14.18 92.18 0 3.76
ResNet18 92.97 0 18.19 93.53 0 4.37
ResNet50 91.84 0 44.15 93.52 0 7.76
ViT 78.07 0 237.36 81.14 0 25.93

4.2 Data Usage & Time Performance

Table 2 presents each method’s elapsed time and data usage. Retrain, Fisher Forgetting, and Fine-tune leverage the entire $D_{r}$ dataset, resulting in significant time costs for unlearning. The remaining unlearning methods, including ours, utilize only $D_{f}$. Fisher Forgetting takes longer than Retrain, and its unlearning performance is significantly poor. While Fine-tune exhibits favorable unlearning performance, it comes with the drawback of consuming considerable time. In contrast, our method achieves optimal unlearning performance while consuming only 3.76 seconds in the quickest scenario. To summarize, our approach exhibits higher efficiency compared to competing methods.

4.3 Ablation Study

We performed several different ablation experiments to analyze and show the benefits of our approach.

Original PGD vs. Partial-PGD.

Table 3 compares unlearning performance when applying the original PGD vs. Partial-PGD within our method on the CIFAR-10 dataset. While the original PGD yields high unlearning performance, Partial-PGD yields even better outcomes. Notably, Partial-PGD accelerates the unlearning process by up to nearly tenfold compared to the original PGD.

Table 4: Effect of Softmax vs. Double Softmax.
w/o Double Softmax w/ Double Softmax
$D_{tr}$ $D_{tf}$ Time (s) $D_{tr}$ $D_{tf}$ Time (s)
VGG16 84.74 0 10.9 93.89 0 8.75
ResNet18 91.42 0.1 25.87 94.54 0 5.19
ResNet50 80.91 0 93.49 94.48 0 9.13
ViT 87.01 0 61.37 87.44 0 13.39

Double Softmax.

In our technique, the teacher logits undergo a Softmax function before being integrated into the distillation loss. We have coined this method “Double Softmax”; it enhances the robustness of our method across diverse datasets and models. Table 4 presents the unlearning performance with and without double Softmax in our method on the Fashion-MNIST dataset.

Data Usage Ratio.

The class-specific $D_{f}$ dataset for one class in CIFAR-10 contains 5,000 samples. As shown in Table 5, we reduced the dataset size to 50% (2,500) and 10% (500) for each model to perform the unlearning task. We measure the accuracy on $D_{tr}$ and $D_{tf}$, the US, and the execution time. In this scenario, all models completed the unlearning with 2,500 samples, but ViT still retained 0.1% with 500 samples. The execution speed increases as the size of $D_{f}$ decreases. Our experiment shows the potential for achieving superior unlearning performance by focusing on critical subsets of $D_{f}$ rather than employing the complete dataset, saving nearly seven times the time.

Table 5: Changes in time and accuracy performance with the reduction of $D_{f}$ data on CIFAR-10.
Model VGG16 ResNet18 ResNet50 ViT
Total Extra Data Used 2,500 500 2,500 500 2,500 500 2,500 500
Metrics $D_{tr}$ 92.42 92.38 93.51 93.38 93.63 93.37 81.14 81.6
$D_{tf}$ 0 0 0 0 0 0 0 0.1
US 0.9422 0.9420 0.9503 0.9493 0.9512 0.9493 0.8640 0.8662
Time 1.91 1.21 2.28 1.45 3.81 1.62 25.63 14.55

Hyper-parameter $\alpha$ in KD.

As shown in Fig. 4, we examine the accuracy variation of $D_{tr}$ and $D_{tf}$ with respect to changes in the hyper-parameter $\alpha$ in Eq. 7. As $\alpha$ approaches zero, the loss exclusively prioritizes the removal of $D_{f}$ without taking into account any information from $D_{r}$. Consequently, the information about $D_{tf}$ is completely removed, but the accuracy of $D_{tr}$ decreases. As $\alpha$ approaches one, heavily relying on the teacher model for retaining information increases the $D_{tf}$ accuracy, indicating ineffective unlearning. Therefore, selecting an appropriate $\alpha$ value can maximize unlearning performance. Consequently, we used $\alpha$ ranging from 0.4 to 0.6 in this work. The effects of changes in $\alpha$ are described in more detail in Suppl. Mat.

4.4 Visualization on Decision Boundary

Figure 5 presents the Original, Retrain, and Ours using t-SNE on the CIFAR-10 dataset. The red dots represent samples of ship images, indicated as $D_{f}$. As shown in Fig. 5(b), $D_{f}$ is entirely misclassified in Retrain. Likewise, our unlearning method produces the desired result, as shown in Fig. 5(c), where the decision boundary of $D_{f}$ has been successfully absorbed into the surrounding space.

5 Conclusion

In this paper, we introduced a novel and fast machine unlearning algorithm, layer attack unlearning, which constitutes a new layer-based unlearning paradigm. Our work proposes Partial-PGD, a layer unlearning method, and a KD-based end-to-end framework to improve the overall accuracy performance while completely removing the forgetting dataset. Through extensive experimental evaluations, we demonstrated that modifying only specific layers’ learning objectives can lead to successful unlearning. Our approach effectively decreases both the number of parameters and their updates (computational cost), consequently reducing the overall time required for unlearning. We believe our layer attack unlearning paves a new way for future research in effectively addressing various unlearning challenges.

6 Acknowledgments

The authors would thank anonymous reviewers. Simon S. Woo is the corresponding author. This work was partly supported by Institute for Information & communication Technology Planning & evaluation (IITP) grants funded by the Korean government MSIT: (No. 2022-0-01199, Graduate School of Convergence Security at Sungkyunkwan University), (No. 2022-0-01045, Self-directed Multi-Modal Intelligence for solving unknown, open domain problems), (No. 2022-0-00688, AI Platform to Fully Adapt and Reflect Privacy-Policy Changes), (No. 2021-0-02068, Artificial Intelligence Innovation Hub), (No. 2019-0-00421, AI Graduate School Support Program at Sungkyunkwan University), and (No. RS-2023-00230337, Advanced and Proactive AI Platform Research and Development Against Malicious deepfakes).

References

  • Bourtoule et al. (2021) Bourtoule, L.; Chandrasekaran, V.; Choquette-Choo, C. A.; Jia, H.; Travers, A.; Zhang, B.; Lie, D.; and Papernot, N. 2021. Machine unlearning. In 2021 IEEE Symposium on Security and Privacy (SP), 141–159. IEEE.
  • Burgess (2023) Burgess, M. 2023. ChatGPT Has a Big Privacy Problem. https://www.wired.com/story/italy-ban-chatgpt-privacy-gdpr/.
  • Cao et al. (2018) Cao, Q.; Shen, L.; Xie, W.; Parkhi, O. M.; and Zisserman, A. 2018. Vggface2: A dataset for recognising faces across pose and age. In 2018 13th IEEE international conference on automatic face & gesture recognition (FG 2018), 67–74. IEEE.
  • Cao and Yang (2015) Cao, Y.; and Yang, J. 2015. Towards making systems forget with machine unlearning. In 2015 IEEE symposium on security and privacy, 463–480. IEEE.
  • Cha et al. (2023) Cha, S.; Cho, S.; Hwang, D.; Lee, H.; Moon, T.; and Lee, M. 2023. Learning to unlearn: Instance-wise unlearning for pre-trained classifiers. arXiv preprint arXiv:2301.11578.
  • Chen et al. (2023) Chen, M.; Gao, W.; Liu, G.; Peng, K.; and Wang, C. 2023. Boundary Unlearning. arXiv preprint arXiv:2303.11570.
  • Dosovitskiy et al. (2020) Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; and Unterthiner, T. 2020. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929.
  • Golatkar, Achille, and Soatto (2020) Golatkar, A.; Achille, A.; and Soatto, S. 2020. Eternal sunshine of the spotless net: Selective forgetting in deep networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 9304–9312.
  • Goodfellow, Shlens, and Szegedy (2014) Goodfellow, I. J.; Shlens, J.; and Szegedy, C. 2014. Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572.
  • He et al. (2016) He, K.; Zhang, X.; Ren, S.; and Sun, J. 2016. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, 770–778.
  • Hinton (2022) Hinton, G. 2022. The forward-forward algorithm: Some preliminary investigations. arXiv preprint arXiv:2212.13345.
  • Hinton, Vinyals, and Dean (2015) Hinton, G.; Vinyals, O.; and Dean, J. 2015. Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531.
  • Kaye (2023) Kaye, K. 2023. The FTC’s ’profoundly vague’ plan to force companies to destroy algorithms could get very messy. https://www.protocol.com/enterprise/ftc-algorithm-data-model-ai/.
  • Krizhevsky, Hinton et al. (2009) Krizhevsky, A.; Hinton, G.; et al. 2009. Learning multiple layers of features from tiny images.
  • Kublik (2023) Kublik, V. 2023. EU/US Copyright Law and Implications on ML Training Data. https://valohai.com/blog/copyright-laws-and-machine-learning/.
  • Madry et al. (2017) Madry, A.; Makelov, A.; Schmidt, L.; Tsipras, D.; and Vladu, A. 2017. Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083.
  • Mantelero (2013) Mantelero, A. 2013. The EU Proposal for a General Data Protection Regulation and the roots of the ‘right to be forgotten’. Computer Law & Security Review, 29(3): 229–235.
  • Nguyen et al. (2022) Nguyen, T. T.; Huynh, T. T.; Nguyen, P. L.; Liew, A. W.-C.; Yin, H.; and Nguyen, Q. V. H. 2022. A survey of machine unlearning. arXiv preprint arXiv:2209.02299.
  • Paszke et al. (2019) Paszke, A.; Gross, S.; Massa, F.; Lerer, A.; Bradbury, J.; Chanan, G.; Killeen, T.; Lin, Z.; Gimelshein, N.; Antiga, L.; et al. 2019. Pytorch: An imperative style, high-performance deep learning library. Advances in neural information processing systems, 32.
  • Shibata et al. (2021) Shibata, T.; Irie, G.; Ikami, D.; and Mitsuzumi, Y. 2021. Learning with Selective Forgetting. In IJCAI, volume 3, 4.
  • Simonyan and Zisserman (2014) Simonyan, K.; and Zisserman, A. 2014. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556.
  • Xiao, Rasul, and Vollgraf (2017) Xiao, H.; Rasul, K.; and Vollgraf, R. 2017. Fashion-mnist: a novel image dataset for benchmarking machine learning algorithms. arXiv preprint arXiv:1708.07747.

Supplementary Materials

Appendix A Datasets

We used the three different datasets as follows:

  • CIFAR-10. The CIFAR-10 dataset (Krizhevsky, Hinton et al. 2009) is a widely used benchmark in classification tasks. It consists of 60,000 images in ten classes. The dataset is divided into a training set of 50,000 images and a test set of 10,000 images. We experiment to erase only one class (5,000 images) out of 10 classes.

  • Fashion-MNIST. The Fashion-MNIST dataset (Xiao, Rasul, and Vollgraf 2017) is popular in classification tasks. It contains 70,000 grayscale images of various fashion items, categorized into ten classes. The dataset is divided into a training set of 60,000 images and a test set of 10,000 images. We experiment with erasing only one class (6,000 images) out of 10 classes. We utilize the dataset to evaluate unlearning performance in grayscale images.

  • VGGFace2. The VGGFace2 dataset (Cao et al. 2018) is a large-scale face dataset designed for face recognition tasks. This dataset consists of facial data and is closely related to tasks that must preserve privacy. Given the high similarity among classes, it is a crucial dataset for assessing the effectiveness of unlearning methods in real-life scenarios involving facial data. It consists of diverse face images that vary in identity, pose, illumination, background, and expression. The dataset contains over 3.31 million images from more than 9,000 individuals. For our unlearning experiments, we randomly chose ten individuals from a training dataset containing over 600 images, ensuring a balanced distribution of gender.

Appendix B Evaluation Metrics for Unlearning Performance

The results in our experiments are evaluated based on the following metrics:

Accuracy.

In order to assess a classifier’s performance, accuracy is frequently utilized. It measures the percentage of samples for which the true class is predicted with the maximum degree of certainty. The accuracy of a model $\mathcal{M}_{\theta}$ tested on a dataset of $N$ samples $\{(x_{1},y_{1}),\ldots,(x_{N},y_{N})\}$ is formulated as follows:

\text{ACC}=100\cdot\cfrac{\sum^{N}_{i=1}\delta(\sigma(\mathcal{M}_{\theta}(x_{i})),\,y_{i})}{N}, (9)

where $\delta(\cdot,\cdot)$ is the Kronecker delta function.

Unlearning Score (US).

Effective unlearning performance refers to the ability of a model to effectively forget information from the forgetting data while concurrently retaining the relevant information from the retain data. However, determining the most effective metric for unlearning is challenging because forgetting-data accuracy and retain-data accuracy are orthogonal objectives that are not in a linear relationship. For instance, when comparing two unlearning approaches, one may exhibit good performance in forgetting data but retain data poorly, while the other may be the opposite, with poor performance in forgetting data but great accuracy in retaining data. In such scenarios, it becomes quite challenging to determine which method is better based solely on one of the two accuracies. Therefore, while considering both accuracies is essential, there is no straightforward way to assess both simultaneously. Such discrepancy leads to difficulties in evaluating unlearning performance.

Therefore, to evaluate unlearning performance more accurately and effectively, we propose and define a new metric, called Unlearning Score (US), which effectively characterizes and combines the two accuracies into a single value to assess unlearning performance. Since accuracy is measured in percentages, we normalize it to a range of 0 to 1 by dividing by 100. As accuracy for forgetting data is preferred to be lower, we subtract the value from 1 to convert it into a higher-is-better range. Next, we input the values into the exponential function. The following equation pertains to the retain data:

\text{US}_{r}=\exp\left(\cfrac{\text{acc}_{r}}{100}\right), (10)

where $\text{acc}_{r}$ is the accuracy on the retain data.

Similarly, the following equation is defined for the forgetting data:

\text{US}_{f}=\exp\left(1-\cfrac{\text{acc}_{f}}{100}\right), (11)

where $\text{acc}_{f}$ is the accuracy on the forgetting data.

In fact, we use exponential functions, which offer a better way than linear functions to assign and map weights to the accuracy values. In other words, rather than simply using the two accuracies as they are, this approach enables us to assign increasingly higher scores as accuracies increase and, conversely, lower scores for lower accuracies. We calculate the average of the $\text{US}_{r}$ and $\text{US}_{f}$ values obtained through Eq. 10 and Eq. 11, respectively. Then, we normalize the result to the range from 0 to 1 using min-max scaling with $\exp(1)$ and $\exp(0)$.

Table 6: Unlearning performance based on changes in the $\alpha$ value in knowledge distillation.
$\alpha$ 0 0.2 0.5 0.8 1
Metrics $D_{tr}$ $D_{tf}$ US $D_{tr}$ $D_{tf}$ US $D_{tr}$ $D_{tf}$ US $D_{tr}$ $D_{tf}$ US $D_{tr}$ $D_{tf}$ US
CIFAR-10 VGG16 75.31 0 0.8269 91.93 0 0.9386 92.28 0 0.9412 92.17 0 0.9404 92.14 0 0.9402
ResNet18 92.87 0 0.9456 93.38 0 0.9493 93.50 0 0.9502 92.47 2.4 0.9239 91.58 6.7 0.8849
ResNet50 91.75 0 0.9374 93.51 0 0.9503 93.43 0 0.9497 90.06 2.8 0.9033 86 10.30 0.8192
ViT 78.46 0 0.8467 80.65 0 0.8608 81.22 0 0.8645 81.16 0 0.8642 79.11 0.5 0.8469
Fashion-MNIST VGG16 77.38 0 0.8399 94.21 0 0.9555 93.91 0 0.9532 92.78 0 0.9449 80.47 4.9 0.8218
ResNet18 91.14 0 0.9329 93.92 0 0.9533 94.6 0 0.9584 93.38 0.6 0.9446 91.54 7.4 0.8794
ResNet50 85.57 0 0.8937 94.67 0 0.9590 93 0 0.9465 93.18 0.1 0.9471 92.43 8.3 0.8793
ViT 88.22 0 0.9121 88.38 0 0.9132 88.57 0 0.9146 88.75 0 0.9158 88.44 0 0.9136
VGGFace2 VGG16 91.93 0 0.9386 93.35 0 0.9491 96.04 0 0.9693 96.99 0 0.9765 96.99 0 0.9765
ResNet18 56.01 0 0.7185 85.12 0 0.8906 94.62 0 0.9585 94.30 0 0.9562 93.19 0 0.9479
ResNet50 90.82 0 0.9306 93.67 0 0.9514 94.46 0 0.9573 89.39 4.76 0.8836 89.87 14.28 0.8185
ViT 94.30 0 0.9562 95.56 0 0.9657 95.88 0 0.9681 95.72 0 0.9669 95.56 0 0.9657
Table 7: Unlearning performance based on changes in the $T$ value in knowledge distillation on CIFAR-10 with ResNet18.
$T$ 1 4 8 16 Original
$D_{r}$ 99.98 99.98 99.97 99.97 99.98
$D_{f}$ 0 0 0 0 100
$D_{tr}$ 93.4 93.53 93.32 93.24 93.13
$D_{tf}$ 0 0 0 0 96.60
US 0.9495 0.9504 0.9489 0.9483 0.4575
Table 8: Unlearning performance based on changes in the value of $x$ in $T^{x}$ in knowledge distillation on CIFAR-10 with ResNet18.
$x$ 1 2 3 4 Original
$D_{r}$ 99.97 99.98 99.95 90.63 99.98
$D_{f}$ 0 0 1.94 4.77 100
$D_{tr}$ 93.37 93.53 92.67 82.09 93.13
$D_{tf}$ 0 0 2.7 5.03 96.60
US 0.9493 0.9504 0.9230 0.8315 0.4575
Table 9: Original PGD vs. Partial-PGD for all datasets.
Original PGD Partial-PGD
Metrics $D_{tr}$ $D_{tf}$ Time (s) US $D_{tr}$ $D_{tf}$ Time (s) US
CIFAR-10 VGG16 92.03 0 14.18 0.9394 92.18 0 3.76 0.9405
ResNet18 92.97 0 18.19 0.9463 93.53 0 4.37 0.9504
ResNet50 91.84 0 44.15 0.9380 93.52 0 7.76 0.9503
ViT 78.07 0 237.36 0.8442 81.14 0 25.93 0.8640
Fashion-MNIST VGG16 94.15 0 16.61 0.9551 93.89 0 8.75 0.9531
ResNet18 94.49 0 21.35 0.9576 94.54 0 5.194 0.9579
ResNet50 94.47 0 51.74 0.9574 94.48 0 9.14 0.9575
ViT 87.4 0 23.99 0.9063 87.44 0 13.396 0.9066
VGGFace2 VGG16 96.29 0 19.95 0.9349 96.70 0 5.60 0.9743
ResNet18 91.42 0 29.21 0.9467 95.34 0 6.51 0.9639
ResNet50 93.02 0 298.15 0.9712 93.28 0 17.77 0.9485
ViT 95.76 0 18.65 0.9672 95.5 0 6.748 0.9651
Table 10: Effect of Softmax vs. Double Softmax for all datasets.
w/o Double Softmax w/ Double Softmax
Metrics $D_{tr}$ $D_{tf}$ Time (s) US $D_{tr}$ $D_{tf}$ Time (s) US
CIFAR-10 VGG16 92.02 0 3.88 0.9472 92.18 0 3.76 0.9405
ResNet18 93.10 0 4.46 0.9430 93.53 0 4.37 0.9504
ResNet50 92.53 0 7.56 0.9393 93.52 0 7.76 0.9503
ViT 78.73 0 69.42 0.8484 81.14 0 25.93 0.8640
Fashion-MNIST VGG16 84.74 0 10.90 0.8880 93.89 0 8.75 0.9531
ResNet18 91.42 0.1 25.87 0.9341 94.54 0 5.19 0.9579
ResNet50 80.91 0 93.49 0.8625 94.48 0 9.13 0.9575
ViT 87.01 0 61.37 0.9036 87.44 0 13.39 0.9066
VGGFace2 VGG16 92.94 0 3.71 0.9505 96.70 0 5.60 0.9743
ResNet18 93.54 0 8.75 0.9468 95.34 0 6.51 0.9639
ResNet50 93.03 0 26.90 0.9461 93.28 0 17.77 0.9485
ViT 94.91 0 8.49 0.9608 95.50 0 6.74 0.9651

Our final US is constructed and derived into Eq. 8 as follows:

\text{US}(\text{acc}_{r},\text{acc}_{f})=\cfrac{\cfrac{\text{US}_{r}+\text{US}_{f}}{2}-\exp(0)}{\exp(1)-\exp(0)} (12)
=\cfrac{\text{US}_{r}+\text{US}_{f}-2\cdot\exp(0)}{2\cdot(\exp(1)-\exp(0))} (13)
=\cfrac{\exp\left(\cfrac{\text{acc}_{r}}{100}\right)+\exp\left(1-\cfrac{\text{acc}_{f}}{100}\right)-2}{2\cdot(\exp(1)-1)}

By introducing this novel metric, US, we can more effectively characterize and evaluate whether an unlearning method has properly forgotten information from the forgetting data while simultaneously retaining information from the retain data. Throughout the experiments, we show that US effectively characterizes and captures the underlying forgetting and retention performance across the different proposed methods.

Appendix C Hyper-parameters effects in KD

Table 6 illustrates the variations in unlearning performance based on the hyper-parameter $\alpha$ in knowledge distillation. When $\alpha$ is set to 0, our loss function $\mathcal{L}$ employs only $\mathcal{L}_{CE}$, focusing solely on the forgetting data. As a result, it may not effectively retain boundary information, leading to a drop of up to approximately 40%. On the other hand, setting $\alpha$ to 1 utilizes only $\mathcal{L}_{DI}$, prioritizing the retention of the boundary. Although this approach may preserve boundary information well, it might struggle to forget the forgetting data properly. Hence, striking the right balance between forgetting data and boundary information through an appropriate $\alpha$ value in knowledge distillation is crucial, as shown in Table 6. Table 7 illustrates the variation in unlearning performance based on the temperature $T$ in knowledge distillation. In our experiments, $T=4$ yielded the best performance; however, variations in $T$ showed a difference in accuracy of only 0.2%, as indicated under $D_{tr}$ in Table 7. Table 8 illustrates the variation in unlearning performance based on the exponent $x$ of $T^{x}$ in Eq. 7, with $T$ fixed at 4. In our experiments, $x=2$ yielded the best performance, whereas values of 3 or greater demonstrated poorer performance. These hyper-parameter tuning experiments show that appropriately selecting values in knowledge distillation can yield better performance in the unlearning task.

Appendix D Original PGD vs. Partial-PGD

We conduct experiments on various models and datasets to demonstrate the temporal efficiency and performance advantage of Partial-PGD. In Table 9, the original PGD also presents excellent unlearning performance. However, Partial-PGD exhibits comparable or superior performance to the original PGD and, notably on VGGFace2 with ResNet50, saves up to 16.77x of the unlearning process time. The original PGD requires more time because it has to use the complete set of model layers. In contrast, as depicted in Fig. 1, Partial-PGD is more efficient, as it only uses particular layers to achieve the desired objective faster.

Appendix E Effectiveness of Double Softmax

As shown in Eq. 5, double Softmax provides performance robustness across various datasets and models. In Table 10, we conduct experiments to examine the effects of double Softmax across different datasets and models. Overall, double Softmax facilitates a faster unlearning convergence speed. Furthermore, though the difference is marginal, our experimental results demonstrate higher accuracy performance across most models. Especially in the case of Fashion-MNIST, notable improvements can be observed. Double Softmax generates softer logits, enhancing robustness against outliers of adversarial examples and improving training stability.

Appendix F Additional Ablation on Data Usage Ratio

Table 11: Unlearning performance with varying amounts of data used for unlearning.
Model VGG16 ResNet18 ResNet50 ViT
Total Extra Data Used 100% 50% 10% 100% 50% 10% 100% 50% 10% 100% 50% 10%
CIFAR-10 $D_{tr}$ 92.18 92.42 92.38 93.53 93.51 93.38 93.52 93.63 93.37 81.14 81.14 81.60
$D_{tf}$ 0 0 0 0 0 0 0 0 0 0 0 0
Time 3.76 1.91 1.21 4.37 2.28 1.45 7.76 3.81 1.62 25.93 25.63 14.55
US 0.9405 0.9422 0.9420 0.9504 0.9503 0.9493 0.9503 0.9512 0.9493 0.8640 0.8640 0.8662
Fashion-MNIST $D_{tr}$ 93.89 94.23 93.74 94.54 94.67 97.19 94.48 94.21 84.88 87.44 87.09 87.46
$D_{tf}$ 0 0 0 0 0 0 0 0 0 0 0 0.1
Time 8.75 2.20 0.48 5.19 1.96 0.63 9.14 4.54 1.04 13.39 4.88 2.69
US 0.9531 0.9556 0.9520 0.9579 0.9589 0.9487 0.9575 0.9555 0.8890 0.9066 0.9042 0.9060
VGGFace2 $D_{tr}$ 96.70 95.83 95.88 95.34 94.35 94.46 93.28 94.24 93.13 95.50 95.82 95.88
$D_{tf}$ 0 0 0 0 0 0 0 0 0 0 0 0
Time 5.60 5.12 5.36 6.51 4.22 1.8 17.77 23.09 15.35 6.74 2.46 2.04
US 0.9743 0.9677 0.9681 0.9639 0.9565 0.9573 0.9485 0.9557 0.9475 0.9651 0.9676 0.9680

We conduct an ablation study to investigate whether reducing the amount of randomly selected forgetting data involved in our algorithm’s unlearning process impacts performance while maintaining the possibility of unlearning. Table 11 presents the results when reducing the data used in the unlearning process across various datasets. The remarkable finding is that even with a reduction in the quantity of forgetting data, there is no significant decline in performance from an accuracy perspective. Additionally, a decrease in the completion time of the unlearning process can also be observed. It can be observed that for ViT on Fashion-MNIST, the accuracy of $D_{tf}$ remains at 0.1%.

Table 12: Unlearning performance for each class on CIFAR-10
Forgetting Class 0 1 2 3 4 5 6 7 8 9
VGG16 $D_{r}$ 99.98 99.98 99.98 99.98 99.98 99.98 99.98 99.98 99.97 99.97
$D_{f}$ 0 0 0 0 0 0 0 0 0 0
$D_{tr}$ 92.7 92.2 93.35 94.44 93.17 93.98 93.54 92.47 92.24 92.35
$D_{tf}$ 0 0 0 0 0 0 0 0 0 0
Time (s) 3.75 7.31 3.7 3.71 3.68 3.66 2.5 3.68 3.65 3.72
US 0.9443 0.9406 0.9491 0.9572 0.9477 0.9537 0.9431 0.9426 0.9409 0.9417
ResNet18 $D_{r}$ 99.97 99.98 99.98 99.98 99.97 99.97 99.98 99.98 99.98 99.98
$D_{f}$ 0 0 0 0 0 0 0 0 0 0
$D_{tr}$ 93.84 93.44 94.39 95.26 93.86 94.53 93.53 93.48 93.58 93.51
$D_{tf}$ 0 0 0 0 0 0 0 0 0 0
Time (s) 4.37 4.42 4.44 4.41 3.52 3.25 3.32 4.40 4.42 4.47
US 0.9527 0.9497 0.9568 0.9633 0.9528 0.9578 0.9504 0.9500 0.9508 0.9502
ResNet50 $D_{r}$ 99.94 99.93 99.94 99.94 99.94 99.97 99.94 99.92 99.93 99.89
$D_{f}$ 0 0 0 0 0 0 0 0 0 0
$D_{tr}$ 93.93 93.07 94.45 95.26 94.06 94.54 93.56 93.48 93.58 92.97
$D_{tf}$ 0 0 0 0 0 0 0 0 0 0
Time (s) 7.42 7.41 7.52 7.60 7.49 7.86 7.49 7.54 7.47 7.47
US 0.9534 0.9470 0.9572 0.9633 0.9543 0.9579 0.9506 0.9500 0.9508 0.9463
ViT $D_{r}$ 88.43 88.2 88.42 89.74 87.83 88.96 86.48 86.72 87.52 88.07
$D_{f}$ 0 0 0.02 0 0 0 0 0 0 0
$D_{tr}$ 82.68 81.60 83.31 84.10 82.14 82.65 80.73 80.84 81.13 82.07
$D_{tf}$ 0 0 0 0 0 0 0 0 0 0
Time (s) 15.20 16.68 250.82 21.16 46.01 33.49 104.40 33.68 25.20 8.65
US 0.8742 0.8670 0.8776 0.8837 0.8706 0.8732 0.8613 0.8620 0.8639 0.8701

Appendix G Unlearning Performance on Every Class

Table 13: Unlearning performance for each class on Fashion-MNIST
Forgetting Class 0 1 2 3 4 5 6 7 8 9
VGG16 $D_{r}$ 99.84 99.75 99.74 99.62 99.84 99.2 99.88 99.88 99.78 99.85
$D_{f}$ 0 0 0 0 0 0 0 0 0 0
$D_{tr}$ 96.02 94.1 95.27 94.93 95.37 93.53 97.3 94.83 94.31 94.5
$D_{tf}$ 0 0 0 0 0 0 0 0 0 0
Time (s) 4.35 8.56 4.24 4.32 4.3 4.36 4.33 4.36 4.35 4.33
US 0.9691 0.9546 0.9634 0.9608 0.96422 0.9504 0.9789 0.9601 0.9562 0.9576
ResNet18 $D_{r}$ 97.44 97.21 97.94 97.65 96.35 97.05 98.72 98.31 97.9 98.41
$D_{f}$ 0 0 0 0 0 0 0 0 0 0
$D_{tr}$ 94.94 93.75 95.36 94.63 93.84 93.48 96.71 94.94 94.48 95.08
$D_{tf}$ 0 0 0 0 0 0 0 0 0 0
Time (s) 7.77 15.87 53.49 25.62 15.44 20.89 15.32 10.24 5.15 5.22
US 0.9609 0.9520 0.96415 0.9578 0.9527 0.9500 0.9743 0.9609 0.9575 0.9620
ResNet50 $D_{r}$ 97.47 98.02 96.45 98.05 97.22 98.16 98.35 98.13 98.20 98.27
$D_{f}$ 0.05 0 0 0 0 0 0 0 0 0
$D_{tr}$ 94.94 94.26 94.11 95.23 94.76 94.53 96.37 94.62 94.63 95.05
$D_{tf}$ 0 0 0 0.1 0 0 0 0 0 0
Time (s) 105.24 118.87 99.68 9.14 99.77 8.83 25.99 17.50 10.86 9.33
US 0.9609 0.9558 0.9547 0.9631 0.9596 0.9578 0.9718 0.9585 0.9586 0.9617
ViT $D_{r}$ 93.03 90.52 91.02 91.72 92.82 89.82 93.22 91.99 89.82 91.63
$D_{f}$ 0 0 0 0 0 0 0 0 0 0
$D_{tr}$ 90.37 87.63 88.72 89.23 90.74 86.91 91.45 88.93 87.17 88.54
$D_{tf}$ 0 0 0 0 0 0 0 0 0 0
Time (s) 9.88 4.93 19.8 9.77 19.7 4.95 14.71 4.93 9.82 4.91
US 0.9273 0.9079 0.9156 0.9192 0.9300 0.9029 0.9351 0.9171 0.9047 0.9143
Table 14: Unlearning performance for each class on VGGFace2
Forgetting Class 0 1 2 3 4 5 6 7 8 9
VGG16 $D_{r}$ 99.71 99.89 99.78 99.59 99.52 99.38 99.87 99.74 99.44 99.73
$D_{f}$ 0 0 0 0 0 0 0 0 0 0
$D_{tr}$ 96.01 97.11 96.39 96.34 95.87 96.03 97.12 96.25 95.72 96.82
$D_{tf}$ 0 0 0 0 0 0 0 0 0 0
Time (s) 19.27 20.01 15.25 5.92 6.35 5.74 6.45 7.63 5.86 6.42
US 0.9690 0.9774 0.9719 0.9715 0.9679 0.9692 0.9775 0.9708 0.9668 0.9752
ResNet18 $D_{r}$ 99.85 99.85 99.60 99.82 99.68 99.79 99.54 99.83 99.72 99.94
$D_{f}$ 0 0 0 0 0 0 0 0 0 0
$D_{tr}$ 94.58 94.70 94.59 95.23 95.39 95.08 95.21 95.11 95.25 95.55
$D_{tf}$ 0 0 0 0 0 0 0 0 0 0
Time (s) 17.94 19.22 10.91 16.62 8.81 8.27 9.1 10.64 8.38 8.84
US 0.9582 0.9591 0.958 0.9631 0.9643 0.9620 0.9630 0.9622 0.9633 0.9655
ResNet50 $D_{r}$ 95.19 98.51 98.42 97.84 98.07 98.67 98.62 98.61 98.72 98.61
$D_{f}$ 0.05 0 0 0 0 0 0 0 0 0
$D_{tr}$ 95.88 93.9 93.94 93.64 93.49 94.61 94.73 94.46 94.62 94.76
$D_{tf}$ 0 0 0 0 0 0 0 0 0 0
Time (s) 22.78 20.06 22.17 17.57 17.65 16.84 18.26 21.6 16.92 18.74
US 0.9680 0.9531 0.9534 0.9512 0.9501 0.9584 0.9593 0.9573 0.9585 0.9596
ViT $D_{r}$ 94.88 94.93 95.23 94.77 95.09 94.92 95.91 95.27 95.33 95.22
$D_{f}$ 0 0 0 0 0 0 0 0 0 0
$D_{tr}$ 95.38 95.02 94.76 95.23 95.55 95.40 95.21 95.43 95.88 95.23
$D_{tf}$ 0 0 0 0 0 0 0 0 0 0
Time (s) 6.19 6.23 6.93 4.94 5.01 9.99 6.02 6.03 5.31 5.72
US 0.9642 0.9615 0.9596 0.9631 0.9655 0.9644 0.9630 0.9646 0.96802 0.9631

We conduct experiments with our method on all classes to showcase its robust performance regardless of dataset and class. Table 12 presents experiments on CIFAR-10, demonstrating our method’s ability to quickly erase an entire class, in as little as 2.5 seconds, while retaining other information. Table 13 shows experiments on Fashion-MNIST, where although perfect erasure of a single class is not always achieved, our method consistently demonstrates efficient and effective performance across all other experiments. Finally, Table 14 highlights experiments on VGGFace2, showing our method’s remarkable performance even on face datasets with high inter-class similarity.