Layer Attack Unlearning: Fast and Accurate Machine Unlearning via
Layer Level Attack and Knowledge Distillation
Abstract
Recently, serious concerns have been raised about privacy issues related to training datasets in machine learning algorithms that include personal data. Various regulations in different countries, including the GDPR, grant individuals the right to have personal data erased, known as ‘the right to be forgotten’ or ‘the right to erasure’. However, there has been little research on effectively and practically deleting the requested personal data from the training set while not jeopardizing overall machine learning performance. In this work, we propose a fast and novel machine unlearning paradigm at the layer level, called layer attack unlearning, which is highly accurate and fast compared to existing machine unlearning algorithms. We introduce the Partial-PGD algorithm to efficiently locate the regions to which the samples to be forgotten are relocated. In addition, inspired by the Forward-Forward algorithm, we use only the last layer of the model for the unlearning process. Lastly, we use Knowledge Distillation (KD) to reliably learn the decision boundaries from the teacher using soft label information, improving accuracy. We conducted extensive experiments with SOTA machine unlearning models and demonstrated the effectiveness of our approach in terms of accuracy and end-to-end unlearning performance.
1 Introduction
Deep neural networks (DNNs) have achieved significant progress and dramatic performance gains in challenging machine learning tasks in recent years. Among others, the large amounts of available training data have been the foundation enabling the revolution of large-scale models. However, privacy concerns have recently been raised, for example around ChatGPT (Bourtoule et al. 2021; Burgess 2023), because training datasets may contain personal information or other information that can leak one’s privacy. For example, many vision-based applications involve training on face images, which are personally identifiable information (PII). Several nations have implemented regulations, such as the General Data Protection Regulation (GDPR) (Mantelero 2013) and EU/US copyright law (Kaye 2023; Kublik 2023), in order to address the potential misuse of personal information and further grant individuals the right to have personal data erased, known as ‘the right to be forgotten’ or ‘the right to erasure.’ The goal of such regulations is to give data owners the right to request the erasure of personal or copyrighted data if they have not agreed and consented to its use in the first place.
Therefore, companies using personal data should delete the requested data from the training set. One potential approach for corporations to mitigate the aforementioned concerns is to exclude the requested data from the training dataset and then completely retrain the model from scratch. Nevertheless, as models like ChatGPT get bigger and datasets grow, retraining them from scratch requires excessive computational resources and time.
Machine unlearning has emerged to tackle this challenge, allowing ML models to selectively discard specific data (Bourtoule et al. 2021). Machine unlearning can be divided into two primary strategies: instance-wise and class-wise unlearning. The former involves forgetting knowledge related to specific instances, while the latter, which we focus on, completely removes particular classes from ML models. For example, face recognition and social media classification systems may need to erase data related to a specific religion, nationality, age, disease, or gender for security and privacy reasons. A few approaches (Chen et al. 2023; Cha et al. 2023) have explored adversarial attacks for unlearning by harnessing noise added to the forgetting data to navigate the adjacent latent space. However, they used the original PGD (Madry et al. 2017) for unlearning, which can be slow.
In this work, we propose Layer Attack Unlearning, a fast and novel machine unlearning algorithm to tackle the class-wise unlearning problem. Our approach first introduces Partial-PGD, a new adversarial attack generation strategy that efficiently searches the close vicinity of the data points to delete (see Fig. 1). Our proposed Partial-PGD is designed to attack only the fully connected (classification) layer, probing the neighboring latent space to which the forgetting data will be shifted. Surprisingly, we do not utilize any feature layer information while achieving both efficiency and accuracy. As shown in Fig. 1, Partial-PGD is much more efficient than the original PGD, as it creates adversarial examples only via the classification layer.
In particular, Hinton (2022)’s Forward-Forward (FF) algorithm inspired us, and we build the concept of a layer-level attack for machine unlearning on FF. According to Hinton (2022), each layer in the Forward-Forward algorithm undergoes individualized training to achieve its specific objectives. Similarly, in line with the FF research, we aim to accomplish machine unlearning objectives at the layers whose characteristics are directly relevant to the data and features we want to forget. Hence, we perform machine unlearning at the layer level rather than over the entire model. Our layer-wise unlearning approach avoids unnecessary loss calculations during the unlearning process. Furthermore, updating only the weights of the layers related to the forgetting data reduces computational cost.
Finally, we employ Knowledge Distillation (KD) (Hinton, Vinyals, and Dean 2015) to modify the decision boundary for the forgetting data while preserving the decision boundary for the retain data. The primary objective is to use hard labels for the unlearning task while acquiring soft label information from the teacher model to maintain performance. We show that this achieves a stable placement of the forgetting data in the space identified by the carefully crafted adversarial examples. We incorporate KD into our final loss function to improve performance.
Our main contributions are summarized as follows:
• We introduce the Layer Attack Unlearning (LAU) algorithm, a novel and fast unlearning method, by proposing Partial-PGD and performing unlearning at the layer level.
• In addition, we propose a KD method to further improve the overall accuracy and data erasure performance by effectively distilling decision boundary knowledge from the teacher model for the unlearning task.
• Our extensive experimental results with seven baselines and four different backbones, including ViT, over three datasets show that our approach outperforms previous SOTA methods in accuracy and time performance while completely forgetting the requested class.
2 Related Work
There are two main approaches to the current machine unlearning problem in DNNs. The first involves considering unlearning during the learning process, while the second focuses on fine-tuning. This paper will refer to the approach that considers the learning process as “data-driven” and the approach that involves fine-tuning as “model-agnostic.”
2.1 Data-Driven Unlearning Methods
A “data-driven” approach utilizes data-centric strategies such as partitioning and augmentation (Nguyen et al. 2022) to address unlearning. SISA (Bourtoule et al. 2021) and Selective Forgetting (Shibata et al. 2021) are two representative data-driven unlearning methods. In SISA, data is divided into shards, each trained sequentially in slices, and multiple model checkpoints are created. Once an unlearning query is received, the affected model is reverted to the checkpoint saved before the queried data point was learned and is retrained, with predictions aggregated by an ensemble technique. However, it is challenging to calculate the probability of encountering unlearning queries on data points.
On the other hand, Selective Forgetting (Shibata et al. 2021) performs unlearning via lifelong learning. A “mnemonic code” signal is embedded in the data during training. During the unlearning process, the mnemonic code information is selectively incorporated into the loss function to remove the forgetting data. This strategy requires storing mnemonic codes for all data points and anticipating unlearning queries before building the original model, which may not be practical in a real-world scenario.
2.2 Model-Agnostic Unlearning Methods
A “model-agnostic” approach handles the unlearning process by adjusting the model’s learned parameters to achieve data unlearning (Nguyen et al. 2022). Such approaches include the Summation form (Cao and Yang 2015), Negative Gradient (Golatkar, Achille, and Soatto 2020), Fisher Forgetting (Golatkar, Achille, and Soatto 2020), Boundary Unlearning (Chen et al. 2023), Instance-wise Unlearning (Cha et al. 2023), etc. Some methods apply adversarial attacks to the original model to avoid naively excluding and deleting the forgetting data. Among the mentioned algorithms, the approaches most similar to ours are Boundary Unlearning and Instance-wise Unlearning. These two algorithms perform unlearning by utilizing adversarial attacks to transition the forgetting data to nearby spaces. However, a significant difference between our approach and these methods lies in the target of the attack. Our approach directs the unlearning process towards layers with specific classification objectives instead of using all layers. Furthermore, we aim to introduce more effective ways of utilizing PGD in unlearning.


3 Our Approach
The main objective of our approach is to accurately and efficiently perform class-wise unlearning, which is to completely remove specific classes from the classification model. In this section, we describe our Partial-PGD, KD architecture, and our connection to the FF algorithm.
3.1 Preliminaries and Notations
First, we formulate the machine unlearning problem as follows: We define a training dataset $\mathcal{D} = \{(x_i, y_i)\}_{i=1}^{N}$, consisting of inputs $x_i$ and their corresponding class labels $y_i$. The forgetting dataset $\mathcal{D}_f$ is a subset of $\mathcal{D}$ that we intend to forget from the pre-trained model. Conversely, the retain dataset $\mathcal{D}_r = \mathcal{D} \setminus \mathcal{D}_f$ is the dataset for which we want to preserve the overall performance.
Next, we define the original model $M_{\theta^*}$, which comprises a set of feature layers denoted by $F$ and a fully connected layer denoted as $C$, where $\theta^*$ represents the optimal parameters for the model trained on $\mathcal{D}$. This gives the compositional representation of the model as $M_{\theta^*} = C \circ F$. Also, we denote by $x'$ the adversarial examples (Goodfellow, Shlens, and Szegedy 2014) for the input data $x$. In particular, we define $z'$ as the adversarial example from Partial-PGD, generated from the intermediate latent feature $z$ obtained from the outputs of $F$, as shown in Fig. 1.
3.2 Partial-Projected Gradient Descent (PGD)
The main reason for employing adversarial examples is to more effectively search for and identify the neighboring candidate spaces to which the forgetting data samples will be assigned. Assigning forgetting classes to random or irrelevant classes can dramatically reduce downstream task performance.
Therefore, carefully exploring the neighboring space allows us not only to forget but also to preserve the decision boundaries of the other classes. Hence, adversarial attacks (Madry et al. 2017; Chen et al. 2023) can be formulated as follows:
$$x^{t+1} = \Pi_{x+\mathcal{S}}\big(x^{t} + \alpha \cdot \mathrm{sign}(\nabla_{x}\mathcal{L}(\theta, x^{t}, y))\big), \qquad (1)$$
where the parameter $\theta$ represents the weights of the target model under attack, and the noise for crafting adversarial examples is produced by computing the gradient of the loss function $\mathcal{L}$ with respect to the input $x$. This noise is added to $x^{t}$ and then projected onto the allowed perturbation set $\mathcal{S}$ by the projection operator $\Pi$ to calculate $x^{t+1}$, and the step is repeated for $t$ iterations. The final iterate is the adversarial example $x'$.
However, the purpose of the adversarial examples used in our work differs from that of prior approaches. The original PGD approach may generate excessive noise and slows the unlearning process considerably; for our purpose, there is no need to calculate gradients throughout the entire model to create adversarial examples.
Hence, our proposed Partial-PGD generates adversarial examples for the unlearning process using only the classification layer $C$, as shown in Fig. 1. This technique effectively identifies the neighboring space into which the forgetting data $\mathcal{D}_f$ is allocated, similar to conventional PGD. However, it significantly reduces unlearning time by omitting feature layer information, as depicted in Fig. 1. We define our Partial-PGD as follows:
$$z^{t+1} = \Pi_{z+\mathcal{S}}\big(z^{t} + \alpha \cdot \mathrm{sign}(\nabla_{z}\mathcal{L}(\theta_{C}, z^{t}, y))\big), \qquad (2)$$
where Partial-PGD applies an adversarial attack to the intermediate latent feature $z = F(x)$ obtained from $F$, and the gradient computation is based solely on passing $z$ through $C$ (with parameters $\theta_{C}$). The result $z'$ is mapped to the nearby space of a label different from $y$ and is used for unlearning as the knowledge to be forgotten.
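For illustration, the following is a minimal PyTorch sketch of the Partial-PGD idea under our notation: `classifier` stands for the classification layer $C$, `z` for the latent feature $F(x)$, and the function name, step size, perturbation budget, and number of steps are illustrative assumptions rather than the exact implementation.

```python
import torch
import torch.nn.functional as fn

def partial_pgd(classifier, z, y, eps=0.5, alpha=0.1, steps=10):
    """Sketch of Partial-PGD: run PGD on the latent feature z = F(x),
    back-propagating only through the classification layer C (never through F)."""
    z_adv = z.detach().clone()
    for _ in range(steps):
        z_adv.requires_grad_(True)
        loss = fn.cross_entropy(classifier(z_adv), y)      # loss w.r.t. the original label
        grad = torch.autograd.grad(loss, z_adv)[0]
        with torch.no_grad():
            z_adv = z_adv + alpha * grad.sign()             # ascend the loss to leave the class
            z_adv = z + torch.clamp(z_adv - z, -eps, eps)   # project back into the eps-ball around z
    return z_adv.detach()
```

Because gradients flow only through $C$, each attack step requires a single small forward-backward pass over the classification layer instead of the full backbone.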
3.3 Layer Unlearning
While other approaches use all layers for unlearning, we focus on unlearning only the relevant layers. Inspired by the FF technique, we focus on the classification layer to forget specific classes for class-wise unlearning. Therefore, our layer unlearning modifies only the parameters of $C$ tied to classification, instead of the entire model, to forget effectively.
We define the following equation to describe our unlearning process, where we focus on the classification layer $C$ during the unlearning process to remove $\mathcal{D}_f$ from the model:
$$M_{\theta'} = C_{\theta'_{C}} \circ F_{\theta^{*}_{F}}, \qquad (3)$$
where $\theta'$ denotes the ideal parameters after forgetting $\mathcal{D}_f$; only the classification layer parameters $\theta'_{C}$ are updated, while the feature layer parameters $\theta^{*}_{F}$ remain unchanged.
We show that layer unlearning accelerates the unlearning process by selectively updating only the relevant layer weights, optimizing efficiency. Interestingly, it even outperforms unlearning applied to all layers in terms of accuracy.
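A minimal sketch of this restriction is shown below; it assumes the model exposes `features` and `classifier` submodules (as in torchvision's VGG implementations), and the helper name is ours.

```python
import torch

def build_layer_unlearning_optimizer(model, lr=0.01):
    """Sketch of layer unlearning: freeze the feature layers F and update only
    the classification layer C (Eq. 3)."""
    for p in model.features.parameters():
        p.requires_grad = False                 # F keeps its original parameters theta*
    student = model.classifier                  # C is the only part that will be updated
    optimizer = torch.optim.SGD(student.parameters(), lr=lr, momentum=0.9)
    return student, optimizer
```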
3.4 End-to-End Unlearning Process
We describe our end-to-end unlearning process, where we apply KD to further improve the overall performance. As illustrated in Fig. 2, the classification layer serves as our student model $C_S$. Additionally, at the beginning of each epoch, we duplicate $C_S$ as our teacher $C_T$. The model takes the forgetting data $x_f \in \mathcal{D}_f$ as input and creates an intermediate latent feature $z$ through the feature layers $F$. Then, $z$ becomes an adversarial example $z'$ after applying Partial-PGD to $z$.
Next, $z$ and $z'$ are passed through $C_S$ and $C_T$, respectively, producing logits for the student and the teacher, as shown in Fig. 2. Then, the logit obtained from $C_S$ is compared with the ground truth $y_f$. If a discrepancy is observed, the sample is considered unlearned, and its own (unlearned) prediction replaces the adversarial label obtained from $C_T$ as the target. The student’s logit is used to compute the cross-entropy loss as follows:
$$\mathcal{L}_{CE} = \mathrm{CE}\big(C_S(z),\ \hat{y}\big), \qquad (4)$$
where $\hat{y}$ represents the predicted label obtained from the adversarial logit of $C_T$ (or the student’s own prediction if the sample is already unlearned), and CE is the cross-entropy function. This loss leaves already-unlearned data in a state where it makes wrong (unlearned) predictions; otherwise, the sample is trained towards the label predicted from the adversarial logit, driving its unlearning. Next, let $Z$ be the double Softmax representation, which is defined as:
$$Z = \sigma\big(\sigma(C_T(z'))\big), \qquad (5)$$
where $\sigma$ represents the Softmax function. In Eq. 5, we perform a double Softmax to distill knowledge by adjusting the probability distribution of the output from $C_T$. This approach is intended to convey soft label information to $C_S$. Unlearning exclusively at the classification layer maintains the decision boundaries of the retain data and slightly improves the overall accuracy; however, layer unlearning without double Softmax showed variable accuracy, for example on the Fashion-MNIST dataset (Xiao, Rasul, and Vollgraf 2017). We show this effect in Section 4.3. Next, we define our distillation loss as follows:
$$\mathcal{L}_{KD} = \mathrm{KL}\Big(\sigma\big(C_S(z)/T\big)\ \big\|\ \sigma\big(Z/T\big)\Big), \qquad (6)$$
where knowledge is distilled from the double-Softmax output $Z$ of $C_T$ and KL is the KL divergence. During distillation, the computation of the loss between the outputs of $C_S$ and $C_T$ focuses on creating a boundary similar to that of the teacher model, ensuring performance while removing the information of $\mathcal{D}_f$. The temperature $T$ is a hyper-parameter. Generally, increasing $T$ generates smoother soft labels, which assists in mimicking $C_T$. The effects of changes in $T$ are described in the Suppl. Mat.
Using $\mathcal{L}_{CE}$ and $\mathcal{L}_{KD}$, our final loss function is constructed as follows:
$$\mathcal{L}_{final} = (1-\gamma)\,\mathcal{L}_{CE} + \gamma\,\mathcal{L}_{KD}, \qquad (7)$$
where the value of $\gamma$ represents the weight assigned to the distillation loss between $C_S$ and $C_T$. As a hyper-parameter, $\gamma$ ranges from 0 to 1. Assigning additional weight to $\mathcal{L}_{CE}$ may shorten the unlearning time but decrease performance. Conversely, if we give more weight to $\mathcal{L}_{KD}$, the unlearning speed may slow down but the accuracy can increase. We conducted an ablation study on $\gamma$ values to capture this trade-off. The effects of changes in the exponent applied to the temperature $T$ are described in the Suppl. Mat.
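The sketch below shows how Eqs. 4–7, as reconstructed above, can be computed in PyTorch. It is a simplification under our assumptions: the exact loss forms in the original implementation may differ, the "already unlearned" branch of Eq. 4 is omitted for brevity, and the function and variable names are ours.

```python
import torch
import torch.nn.functional as fn

def unlearning_loss(student_logits, teacher_adv_logits, gamma=0.5, T=4.0):
    """Sketch of the combined objective: CE toward the teacher's adversarial label (Eq. 4)
    plus a double-Softmax distillation term (Eqs. 5-6), mixed by gamma (Eq. 7)."""
    # Hard target: neighboring class predicted by the teacher C_T on the adversarial latent z'.
    y_adv = teacher_adv_logits.argmax(dim=1)
    ce = fn.cross_entropy(student_logits, y_adv)

    # Double Softmax of the teacher output (Eq. 5), then temperature-softened KL (Eq. 6).
    Z = fn.softmax(fn.softmax(teacher_adv_logits, dim=1), dim=1)
    kd = fn.kl_div(fn.log_softmax(student_logits / T, dim=1),
                   fn.softmax(Z / T, dim=1), reduction="batchmean")

    return (1.0 - gamma) * ce + gamma * kd      # Eq. 7 as reconstructed above
```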
Input: Forgetting data $\mathcal{D}_f$, feature layers $F_{\theta^*}$, classification layer $C_{\theta^*}$
Parameter: Learning rate $\eta$, hyper-parameter $\gamma$, temperature $T$, number of epochs $E$
Output: Unlearned model $M_{\theta'} = C_{\theta'} \circ F_{\theta^*}$

In addition, we provide the end-to-end unlearning process in Alg. 1. We distill knowledge from $C_T$ while gradually shrinking the decision boundary of $\mathcal{D}_f$. Algorithm 1 finishes either when all epochs are completed or when $\mathcal{D}_f$ becomes fully unlearned within a batch during an epoch. Finally, we obtain our unlearned model by combining $F_{\theta^*}$ with the classification layer $C_{\theta'}$, as shown in Eq. 3.
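A compact sketch of this loop, assembled from the hypothetical helpers sketched earlier (`partial_pgd`, `build_layer_unlearning_optimizer`, `unlearning_loss`), is given below. It assumes `model.features` returns a flattened latent vector suitable for the classification layer, and the early-stopping condition follows the description of Alg. 1 above.

```python
import copy
import torch

def layer_attack_unlearning(model, forget_loader, epochs=10, lr=0.01, gamma=0.5, T=4.0):
    """Sketch of Alg. 1: unlearn D_f by attacking and updating only the classification layer."""
    student, optimizer = build_layer_unlearning_optimizer(model, lr=lr)
    model.features.eval()
    for _ in range(epochs):
        teacher = copy.deepcopy(student).eval()        # duplicate C_S as the teacher C_T each epoch
        all_unlearned = True
        for x_f, y_f in forget_loader:                 # only the forgetting data D_f is used
            with torch.no_grad():
                z = model.features(x_f)                # latent features; backbone is never updated
            z_adv = partial_pgd(teacher, z, y_f)       # Partial-PGD through the classifier only
            student_logits = student(z)
            with torch.no_grad():
                teacher_adv_logits = teacher(z_adv)
            loss = unlearning_loss(student_logits, teacher_adv_logits, gamma, T)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            if (student_logits.argmax(dim=1) == y_f).any():
                all_unlearned = False                  # some samples still predicted as their true class
        if all_unlearned:
            break                                      # early stop once D_f is fully unlearned
    return model                                       # F (unchanged) composed with the updated C (Eq. 3)
```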
Summary. In Fig. 3, we pictorially describe our end-to-end unlearning process by displaying the boundary change for the retain and forgetting data.
4 Experimental Results
Model | VGG16 | ResNet18 | ResNet50 | ViT | |||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Dataset | Method | $A_{\mathcal{D}_r}$ | $A_{\mathcal{D}_f}$ | $A_{\mathcal{D}_{rt}}$ | $A_{\mathcal{D}_{ft}}$ | US | $A_{\mathcal{D}_r}$ | $A_{\mathcal{D}_f}$ | $A_{\mathcal{D}_{rt}}$ | $A_{\mathcal{D}_{ft}}$ | US | $A_{\mathcal{D}_r}$ | $A_{\mathcal{D}_f}$ | $A_{\mathcal{D}_{rt}}$ | $A_{\mathcal{D}_{ft}}$ | US | $A_{\mathcal{D}_r}$ | $A_{\mathcal{D}_f}$ | $A_{\mathcal{D}_{rt}}$ | $A_{\mathcal{D}_{ft}}$ | US
CIFAR-10 | Original | 99.98 | 100 | 92.07 | 96.70 | 0.4494 | 99.98 | 100 | 93.13 | 96.60 | 0.4575 | 99.94 | 99.96 | 93.44 | 95.0 | 0.4646 | 88.06 | 93.52 | 81.48 | 88.40 | 0.4020 |
Retrain (Optimal) | 99.89 | 0 | 91.98 | 0 | 0.9390 | 99.79 | 0 | 92.50 | 0 | 0.9428 | 99.77 | 0 | 92.48 | 0 | 0.9426 | 95.0 | 0 | 81.0 | 0 | 0.8631 | |
Negative Gradient | 88.53 | 16.96 | 79.86 | 17.0 | 0.7320 | 93.85 | 28.38 | 86.30 | 25.54 | 0.7204 | 88.75 | 24.77 | 82.52 | 23.30 | 0.7087 | 85.264 | 18.69 | 79.74 | 16.7 | 0.7332 | |
Fine-tune | 99.63 | 0 | 90.09 | 0 | 0.9253 | 99.63 | 0 | 91.25 | 0 | 0.9337 | 99.45 | 0 | 90.79 | 0 | 0.9304 | 90.96 | 1.77 | 82.43 | 1.62 | 0.8598 | |
Random Label | 80.99 | 3.56 | 72.40 | 3.69 | 0.7805 | 91.38 | 11.09 | 84.00 | 10.98 | 0.8007 | 81.30 | 12.91 | 76.62 | 11.84 | 0.7467 | 77.58 | 15.10 | 73.42 | 14.38 | 0.7094 | |
Fisher Forgetting | 46.78 | 55.24 | 44.61 | 52.30 | 0.3414 | 59.0 | 52.34 | 55.57 | 52.2 | 0.3945 | 58.17 | 58.06 | 55.95 | 56.20 | 0.3781 | 42.68 | 66.34 | 43.34 | 62.30 | 0.2911 | |
Boundary Shrink | 90.73 | 10.16 | 81.53 | 9.58 | 0.7943 | 95.88 | 9.75 | 87.91 | 10.24 | 0.8329 | 86.03 | 3.94 | 80.09 | 3.46 | 0.8303 | 85.22 | 0.61 | 79.29 | 0.28 | 0.8498 | |
IWU | 90.81 | 0 | 82.35 | 0.10 | 0.8712 | 89.41 | 0 | 82.55 | 0 | 0.8733 | 86.11 | 0 | 79.98 | 0 | 0.8564 | 82.48 | 3.92 | 77.01 | 2.58 | 0.8173 | |
Ours | 99.97 | 0 | 92.18 | 0 | 0.9405 | 99.97 | 0 | 93.53 | 0 | 0.9504 | 99.92 | 0 | 93.52 | 0 | 0.9503 | 87.51 | 0 | 81.14 | 0 | 0.8640 | |
Fashion-MNIST | Original | 99.83 | 100 | 94.38 | 99.60 | 0.4579 | 98.45 | 99.96 | 94.71 | 99.70 | 0.4601 | 98.49 | 99.98 | 94.68 | 99.6 | 0.4601 | 91.27 | 98.71 | 88.28 | 97.10 | 0.4210 |
Retrain (Optimal) | 100 | 0 | 93.40 | 0 | 0.9494 | 100 | 0 | 93.38 | 0 | 0.9493 | 100 | 0 | 93.28 | 0 | 0.9485 | 89.44 | 0 | 86.76 | 0 | 0.9019 | |
Negative Gradient | 97.77 | 0 | 92.63 | 0 | 0.9438 | 92.57 | 1.39 | 90.04 | 0.84 | 0.9183 | 84.44 | 12.63 | 81.42 | 10.22 | 0.7890 | 71.77 | 0.10 | 70.38 | 0.10 | 0.7964 | |
Fine-tune | 99.67 | 0 | 93.07 | 0 | 0.9470 | 97.23 | 0 | 91.93 | 0 | 0.9386 | 98.83 | 0 | 92.85 | 0 | 0.9454 | 96.08 | 0.01 | 88.72 | 0.10 | 0.9148 | |
Random Label | 98.17 | 8.34 | 92.43 | 23.55 | 0.7763 | 76.80 | 11.47 | 74.80 | 11.54 | 0.7375 | 75.99 | 10.77 | 73.73 | 10.72 | 0.7368 | 84.18 | 11.36 | 82.10 | 13.04 | 0.7736 | |
Fisher Forgetting | 62.33 | 28.81 | 60.32 | 28.10 | 0.5471 | 72.78 | 57.65 | 71.03 | 54.10 | 0.4705 | 60.59 | 84.01 | 60.25 | 82.60 | 0.2958 | 43.42 | 88.01 | 42.60 | 86.3 | 0.1972 | |
Boundary Shrink | 86.88 | 1.47 | 81.66 | 1.12 | 0.8586 | 95.78 | 34.54 | 92.31 | 32.40 | 0.7225 | 83.50 | 30.23 | 80.60 | 27.08 | 0.6728 | 70.31 | 2.04 | 68.74 | 2.70 | 0.7665 | |
IWU | 99.09 | 0 | 93.68 | 0 | 0.9515 | 93.82 | 0 | 90.80 | 0 | 0.9304 | 80.17 | 0 | 77.94 | 0 | 0.8434 | 82.85 | 0 | 81.21 | 0 | 0.8645 | |
Ours | 99.51 | 0 | 93.89 | 0 | 0.9531 | 97.98 | 0 | 94.54 | 0 | 0.9579 | 98.14 | 0 | 94.48 | 0 | 0.9575 | 90.11 | 0 | 87.44 | 0 | 0.9066 | |
VGGFace2 | Original | 100 | 100 | 96.67 | 98.41 | 0.4787 | 100 | 100 | 95.88 | 98.41 | 0.4727 | 99.12 | 98.43 | 93.67 | 100 | 0.4514 | 94.71 | 96.86 | 95.43 | 93.82 | 0.4832 |
Retrain (Optimal) | 99.98 | 0 | 96.67 | 0 | 0.9740 | 100 | 0 | 96.20 | 0 | 0.9705 | 99.10 | 0 | 94.77 | 0 | 0.9596 | 92.63 | 0 | 93.32 | 0 | 0.9488 | |
Negative Gradient | 96.85 | 15.67 | 90.50 | 4.76 | 0.8915 | 97.32 | 9.75 | 89.55 | 12.69 | 0.8272 | 86.80 | 4.73 | 78.79 | 3.17 | 0.8241 | 91.16 | 1.63 | 92.34 | 0 | 0.9416 | |
Fine-tune | 97.86 | 0 | 89.87 | 0 | 0.9416 | 91.42 | 0 | 85.91 | 0 | 0.8960 | 95.18 | 0 | 90.03 | 0 | 0.9249 | 96.91 | 1.63 | 84.85 | 3.70 | 0.8600 | |
Random Label | 90.32 | 1.74 | 79.11 | 1.58 | 0.8384 | 96.76 | 6.44 | 87.34 | 0 | 0.9059 | 88.24 | 13.19 | 82.43 | 9.52 | 0.8007 | 92.06 | 9.68 | 91.04 | 8.64 | 0.8667 | |
Fisher Forgetting | 46.24 | 31.01 | 42.72 | 50.79 | 0.3400 | 72.78 | 57.65 | 71.03 | 54.10 | 0.4705 | 76.28 | 4.52 | 71.83 | 7.93 | 0.7455 | 60.80 | 71.07 | 53.58 | 60.49 | 0.3472 | |
Boundary Shrink | 99.48 | 17.25 | 93.04 | 5.36 | 0.9055 | 94.02 | 5.40 | 86.08 | 5.36 | 0.8559 | 93.85 | 5.36 | 85.78 | 5.0 | 0.8565 | 86.92 | 6.46 | 86.81 | 4.25 | 0.8693 | |
IWU | 99.21 | 10.80 | 94.46 | 4.76 | 0.8650 | 75.23 | 0.17 | 69.77 | 0 | 0.7936 | 78.62 | 0 | 69.14 | 0 | 0.7899 | 76.25 | 0.27 | 78.66 | 0 | 0.8479 | |
Ours | 99.70 | 0 | 96.70 | 0 | 0.9743 | 99.79 | 0 | 95.34 | 0 | 0.9639 | 97.46 | 0 | 93.28 | 0 | 0.9485 | 95.18 | 0 | 95.50 | 0 | 0.9651 |
We evaluate our approach on popular unlearning benchmarks used in other unlearning research (Golatkar, Achille, and Soatto 2020; Chen et al. 2023; Cha et al. 2023) on image classification tasks.
Datasets and Models. We conducted experiments on the CIFAR-10 (Krizhevsky, Hinton et al. 2009), Fashion-MNIST (Xiao, Rasul, and Vollgraf 2017), and VGGFace2 (Cao et al. 2018) datasets. For the VGGFace2 dataset, we randomly select ten individuals, each with over 600 training images, ensuring a balanced gender distribution. Furthermore, we perform training from scratch for three different architectures: VGG16 (Simonyan and Zisserman 2014), ResNets (He et al. 2016), and ViT (Dosovitskiy et al. 2020).
Baseline Approaches. The subsequent unlearning baseline methods are used:
1) Original: We train the model on the full dataset $\mathcal{D}$ before undergoing the unlearning process.
2) Retrain: We train the model from scratch utilizing only $\mathcal{D}_r$; this retrained model serves as the optimal unlearning strategy.
3) Negative Gradient (NG) (Golatkar, Achille, and Soatto 2020): We fine-tune the Original with $\mathcal{D}_f$ by following the direction of gradient ascent.
4) Fine-tune (Golatkar, Achille, and Soatto 2020): We fine-tune the Original using $\mathcal{D}_r$ with a large learning rate.
5) Random Label (Golatkar, Achille, and Soatto 2020): We fine-tune the Original by randomly assigning arbitrary labels to $\mathcal{D}_f$.
6) Fisher Forgetting (Golatkar, Achille, and Soatto 2020): The Fisher Forgetting model identifies influential parameters significantly affecting $\mathcal{D}_f$ and then introduces noise to neutralize their impact.
7) Boundary Shrink (Chen et al. 2023): We create adversarial examples from $\mathcal{D}_f$ and assign new adversarial labels to shrink the boundary of $\mathcal{D}_f$ towards different classes.
8) IWU (Cha et al. 2023): We generate adversarial instances with distinct labels from $\mathcal{D}_f$ and incorporate a regularization term. While initially designed for instance-wise unlearning, we adapt this method for class-wise unlearning problems.
Implementation Details and Evaluation Metrics. Our method and the other baselines are implemented in Python 3.7 using the PyTorch library (Paszke et al. 2019) and run on a single NVIDIA GeForce RTX 3090 GPU. The initial model was trained using an LR scheduler and an SGD optimizer with specific settings (momentum: 0.9, weight decay: $5\times10^{-4}$, initial learning rate: 0.01). For the unlearning phase, we employ the SGD optimizer and conduct experiments with varying learning rates (ranging from 0.001 to 0.01), KD weights $\gamma$ (ranging from 0.3 to 0.7), a KD temperature $T$ fixed at 4, and Partial-PGD attack strengths (ranging from 0.4 to 1.0). As defined, $\mathcal{D}_f$ and $\mathcal{D}_r$ represent the forgetting and retain data, respectively. Additionally, $\mathcal{D}_{ft}$ corresponds to the test forgetting data, and $\mathcal{D}_{rt}$ represents the test retain data. We assess accuracy on all four of these datasets.
4.1 Accuracy Performance
To achieve the best unlearning performance, a method should completely forget the information related to $\mathcal{D}_f$. Therefore, guaranteeing accuracy on par with that achieved by Retrain for both $\mathcal{D}_r$ and $\mathcal{D}_f$ is ideal. Table 1 presents test results from different classification models, datasets, and metrics. The tested models include VGG16, ResNet18, ResNet50, and ViT. The datasets used for testing were CIFAR-10, Fashion-MNIST, and VGGFace2. In addition to the accuracy metric, we evaluate the performance using the unlearning score (US) as follows:
$$\mathrm{US} = \frac{\frac{1}{2}\Big(\exp\!\big(\frac{A_r}{100}\big) + \exp\!\big(1 - \frac{A_f}{100}\big)\Big) - \exp(0)}{\exp(1) - \exp(0)}, \qquad (8)$$
where $A_r$ and $A_f$ denote the accuracy (in %) on the retain and forgetting dataset, respectively. If $A_r$ approaches 100% and $A_f$ approaches 0%, the US metric approaches 1, indicating a stable unlearning result. We provide a more detailed explanation of why this metric is useful for unlearning in the Suppl. Mat.
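For reference, the following small helper reproduces the US values in Table 1 from the test retain and forgetting accuracies; the function name is ours.

```python
import math

def unlearning_score(acc_retain, acc_forget):
    """Unlearning Score (Eq. 8); both accuracies are given in percent.
    Approaches 1 as retain accuracy -> 100% and forgetting accuracy -> 0%."""
    u_r = math.exp(acc_retain / 100.0)         # Eq. 10: reward high retain accuracy
    u_f = math.exp(1.0 - acc_forget / 100.0)   # Eq. 11: reward low forgetting accuracy
    return ((u_r + u_f) / 2.0 - 1.0) / (math.e - 1.0)

# Example: the Original CIFAR-10 / VGG16 row of Table 1 (92.07% retain, 96.70% forget)
print(round(unlearning_score(92.07, 96.70), 4))   # approximately 0.449, matching the reported 0.4494
```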
Finally, Table 1 presents the performance of each unlearning method for a specific single class in the aforementioned datasets. We measure the accuracy for $\mathcal{D}_r$, $\mathcal{D}_f$, $\mathcal{D}_{rt}$, and $\mathcal{D}_{ft}$, along with the US metric. For NG, the unstable variability of the negative-gradient loss contributes to less favorable overall results. Fine-tune shows strong performance in forgetting and retaining information; nevertheless, it requires the complete retain dataset during training. Using such extensive data is time-consuming, and we analyze and compare its worse time performance in Table 2. In the case of Random Label, except for ResNet18 on VGGFace2, most cases have poor accuracy and US. Due to the random nature of the assigned labels, converging towards arbitrary labels in the classification space is challenging, resulting in low performance.
Fisher Forgetting exhibits poor performance, with low accuracy and US across the overall tests. Also, computing the Fisher matrix information requires a significant amount of time. Boundary Shrink also utilizes adversarial examples, but it uses the hard label information of the attack examples on $\mathcal{D}_f$, which results in an unstable unlearning process. The IWU approach utilizes adversarial examples while incorporating regularization to achieve a stable unlearning process; however, it attains an average US of only 0.8587 across the overall tests.
Dataset | Metric | Retrain | Fisher Forgetting | Fine-tune | NG | Random Label | Boundary Shrink | IWU | Ours
---|---|---|---|---|---|---|---|---|---
CIFAR-10 | Total Extra Data Used | 45,000 | 45,000 | 45,000 | 5,000 | 5,000 | 5,000 | 5,000 | 5,000
 | Time (s) w/ VGG16 | 3,683 | 9,710 | 433 | 73 | 24 | 116 | 1351 | 3.76
 | Time (s) w/ ResNet18 | 2,871 | 12,526 | 546 | 153 | 30 | 191 | 362 | 4.37
 | Time (s) w/ ResNet50 | 4,705 | 19,850 | 1,061 | 174 | 57 | 471 | 1513 | 7.76
 | Time (s) w/ ViT | 4,441 | 13,238 | 479 | 78 | 23 | 163 | 1563 | 25.93
Fashion-MNIST | Total Extra Data Used | 54,000 | 54,000 | 54,000 | 6,000 | 6,000 | 6,000 | 6,000 | 6,000
 | Time (s) w/ VGG16 | 2,309 | 8,526 | 430 | 85 | 23 | 214 | 1072 | 8.75
 | Time (s) w/ ResNet18 | 2,768 | 12,116 | 582 | 103 | 30 | 715 | 223 | 5.19
 | Time (s) w/ ResNet50 | 5,758 | 22,013 | 1,229 | 206 | 76 | 929 | 967 | 9.14
 | Time (s) w/ ViT | 2,155 | 8,377 | 487 | 80 | 25 | 282 | 546 | 13.39
VGGFace2 | Total Extra Data Used | 5,726 | 5,726 | 5,726 | 574 | 574 | 574 | 574 | 574
 | Time (s) w/ VGG16 | 1,840 | 1,295 | 468 | 400 | 17 | 338 | 548 | 5.6
 | Time (s) w/ ResNet18 | 1,861 | 1,354 | 670 | 140 | 27 | 473 | 1258 | 6.51
 | Time (s) w/ ResNet50 | 3,721 | 2,597 | 3,291 | 484 | 157 | 503 | 1837 | 17.77
 | Time (s) w/ ViT | 2,155 | 1,428 | 665 | 84 | 27 | 187 | 783 | 6.74
Finally, Ours completely removes the forgetting dataset (0% accuracy) in all test cases and retains the highest unlearning performance. The accuracy for both $\mathcal{D}_f$ and $\mathcal{D}_{ft}$ reaches 0, while the accuracy for $\mathcal{D}_r$ and $\mathcal{D}_{rt}$ is comparable to, or sometimes even higher than, that of Retrain. Also, Ours demonstrates superior performance compared to almost all baseline models across various scenarios, with a high average US of 0.9443. Our approach, which utilizes Partial-PGD and a KD-based unlearning process on layers with explicit objectives, clearly achieves the best unlearning performance.


 | Original PGD | | | Partial-PGD | | 
---|---|---|---|---|---|---
Model | $A_{\mathcal{D}_{rt}}$ | $A_{\mathcal{D}_{ft}}$ | Time (s) | $A_{\mathcal{D}_{rt}}$ | $A_{\mathcal{D}_{ft}}$ | Time (s)
VGG16 | 92.03 | 0 | 14.18 | 92.18 | 0 | 3.76
ResNet18 | 92.97 | 0 | 18.19 | 93.53 | 0 | 4.37
ResNet50 | 91.84 | 0 | 44.15 | 93.52 | 0 | 7.76
ViT | 78.07 | 0 | 237.36 | 81.14 | 0 | 25.93
4.2 Data Usage & Time Performance
Table 2 presents each method’s elapsed time and data usage. Retrain, Fisher Forgetting, and Fine-tune leverage the entire retain dataset, resulting in significant time costs for unlearning. The remaining unlearning methods, including ours, utilize only $\mathcal{D}_f$. In the case of Fisher Forgetting, it takes a longer time than Retrain, and its unlearning performance is significantly poorer. While Fine-tune exhibits favorable unlearning performance, it comes with the drawback of consuming considerable time. In contrast, our method showcases optimal unlearning performance while consuming only 3.76 seconds in the quickest scenario. To summarize, our approach exhibits higher efficiency compared to competing methods.
4.3 Ablation Study
We performed several different ablation experiments to analyze and show the benefits of our approach.
Original PGD vs. Partial-PGD.
Table 3 compares the unlearning performance when applying the original PGD vs. Partial-PGD within our method on the CIFAR-10 dataset. While the original PGD yields high unlearning performance, Partial-PGD achieves even better outcomes. Notably, Partial-PGD accelerates the unlearning process by up to nearly tenfold compared to the original PGD.
 | w/o Double Softmax | | | w/ Double Softmax | | 
---|---|---|---|---|---|---
Model | $A_{\mathcal{D}_{rt}}$ | $A_{\mathcal{D}_{ft}}$ | Time (s) | $A_{\mathcal{D}_{rt}}$ | $A_{\mathcal{D}_{ft}}$ | Time (s)
VGG16 | 84.74 | 0 | 10.9 | 93.89 | 0 | 8.75
ResNet18 | 91.42 | 0.1 | 25.87 | 94.54 | 0 | 5.19
ResNet50 | 80.91 | 0 | 93.49 | 94.48 | 0 | 9.13
ViT | 87.01 | 0 | 61.37 | 87.44 | 0 | 13.39
Double Softmax.
In our technique, the teacher logits undergo a Softmax function before being integrated into the distillation loss. We have coined this method “Double Softmax”; it enhances the robustness of our method across diverse datasets and models. Table 4 presents the unlearning performance with and without double Softmax in our method on the Fashion-MNIST dataset.
Data Usage Ratio.
The class-specific dataset for one class in CIFAR-10 contains 5,000 samples. As shown in Table 5, we reduced the dataset size to 50% (2,500) and 10% (500) for each model to perform the unlearning task. We measure the accuracy on $\mathcal{D}_{rt}$ and $\mathcal{D}_{ft}$, the US, and the execution time. In this scenario, all models completed the unlearning with 2,500 samples, but ViT still retained 0.1% accuracy on $\mathcal{D}_{ft}$ with 500 samples. The execution speed increases as the size of $\mathcal{D}_f$ decreases. Our experiment shows the potential for achieving superior unlearning performance by focusing on critical subsets of $\mathcal{D}_f$ rather than employing the complete forgetting dataset, saving nearly seven times the unlearning time.
Model | VGG16 | | ResNet18 | | ResNet50 | | ViT | 
---|---|---|---|---|---|---|---|---
Total Extra Data Used | 2,500 | 500 | 2,500 | 500 | 2,500 | 500 | 2,500 | 500
$A_{\mathcal{D}_{rt}}$ | 92.42 | 92.38 | 93.51 | 93.38 | 93.63 | 93.37 | 81.14 | 81.6
$A_{\mathcal{D}_{ft}}$ | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.1
US | 0.9422 | 0.9420 | 0.9503 | 0.9493 | 0.9512 | 0.9493 | 0.8640 | 0.8662
Time (s) | 1.91 | 1.21 | 2.28 | 1.45 | 3.81 | 1.62 | 25.63 | 14.55
Hyper-parameter $\gamma$ in KD.
As shown in Fig. 4, we examine the accuracy variation of $\mathcal{D}_{rt}$ and $\mathcal{D}_{ft}$ with respect to changes in the hyper-parameter $\gamma$ in Eq. 7. As $\gamma$ approaches zero, the loss exclusively prioritizes the removal of $\mathcal{D}_f$ without taking into account any information from $C_T$. Consequently, the information about $\mathcal{D}_f$ is completely removed, but the accuracy on the retain data also decreases. As $\gamma$ approaches one, heavily relying on the teacher model for retaining information increases accuracy but indicates ineffective unlearning. Therefore, selecting an appropriate $\gamma$ value can maximize unlearning performance. Consequently, we used $\gamma$ ranging from 0.4 to 0.6 in this work. In more detail, the effects of changes in $\gamma$ are described in the Suppl. Mat.
4.4 Visualization on Decision Boundary
Figure 5 presents t-SNE visualizations of the Original, Retrain, and Ours on the CIFAR-10 dataset. The red dots represent samples of ship images, indicated as $\mathcal{D}_f$. As shown in Fig. 5(b), $\mathcal{D}_f$ is totally misclassified by Retrain. Likewise, our unlearning method produces the correct (unlearned) behavior, as shown in Fig. 5(c), where the decision region of $\mathcal{D}_f$ has been successfully absorbed into the surrounding space.
5 Conclusion
In this paper, we introduce a novel and fast machine unlearning algorithm, layer attack unlearning, which establishes a new layer-level unlearning paradigm. Our work proposes Partial-PGD, a layer unlearning method, and a KD-based end-to-end framework to improve the overall accuracy while completely removing the forgetting dataset. Through extensive experimental evaluations, we demonstrated that modifying only specific layers’ learning objectives can lead to successful unlearning. Our approach effectively decreases both the number of parameters and their updates (computational cost), consequently reducing the overall time required for unlearning. We believe our layer attack unlearning paves a new way for future research in effectively addressing various unlearning challenges.
6 Acknowledgments
The authors would like to thank the anonymous reviewers. Simon S. Woo is the corresponding author. This work was partly supported by Institute for Information & communication Technology Planning & evaluation (IITP) grants funded by the Korean government MSIT: (No. 2022-0-01199, Graduate School of Convergence Security at Sungkyunkwan University), (No. 2022-0-01045, Self-directed Multi-Modal Intelligence for solving unknown, open domain problems), (No. 2022-0-00688, AI Platform to Fully Adapt and Reflect Privacy-Policy Changes), (No. 2021-0-02068, Artificial Intelligence Innovation Hub), (No. 2019-0-00421, AI Graduate School Support Program at Sungkyunkwan University), and (No. RS-2023-00230337, Advanced and Proactive AI Platform Research and Development Against Malicious deepfakes).
References
- Bourtoule et al. (2021) Bourtoule, L.; Chandrasekaran, V.; Choquette-Choo, C. A.; Jia, H.; Travers, A.; Zhang, B.; Lie, D.; and Papernot, N. 2021. Machine unlearning. In 2021 IEEE Symposium on Security and Privacy (SP), 141–159. IEEE.
- Burgess (2023) Burgess, M. 2023. ChatGPT Has a Big Privacy Problem. https://www.wired.com/story/italy-ban-chatgpt-privacy-gdpr/.
- Cao et al. (2018) Cao, Q.; Shen, L.; Xie, W.; Parkhi, O. M.; and Zisserman, A. 2018. Vggface2: A dataset for recognising faces across pose and age. In 2018 13th IEEE international conference on automatic face & gesture recognition (FG 2018), 67–74. IEEE.
- Cao and Yang (2015) Cao, Y.; and Yang, J. 2015. Towards making systems forget with machine unlearning. In 2015 IEEE symposium on security and privacy, 463–480. IEEE.
- Cha et al. (2023) Cha, S.; Cho, S.; Hwang, D.; Lee, H.; Moon, T.; and Lee, M. 2023. Learning to unlearn: Instance-wise unlearning for pre-trained classifiers. arXiv preprint arXiv:2301.11578.
- Chen et al. (2023) Chen, M.; Gao, W.; Liu, G.; Peng, K.; and Wang, C. 2023. Boundary Unlearning. arXiv preprint arXiv:2303.11570.
- Dosovitskiy et al. (2020) Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; and Unterthiner, T. 2020. Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929.
- Golatkar, Achille, and Soatto (2020) Golatkar, A.; Achille, A.; and Soatto, S. 2020. Eternal sunshine of the spotless net: Selective forgetting in deep networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 9304–9312.
- Goodfellow, Shlens, and Szegedy (2014) Goodfellow, I. J.; Shlens, J.; and Szegedy, C. 2014. Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572.
- He et al. (2016) He, K.; Zhang, X.; Ren, S.; and Sun, J. 2016. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, 770–778.
- Hinton (2022) Hinton, G. 2022. The forward-forward algorithm: Some preliminary investigations. arXiv preprint arXiv:2212.13345.
- Hinton, Vinyals, and Dean (2015) Hinton, G.; Vinyals, O.; and Dean, J. 2015. Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531.
- Kaye (2023) Kaye, K. 2023. The FTC’s ’profoundly vague’ plan to force companies to destroy algorithms could get very messy. https://www.protocol.com/enterprise/ftc-algorithm-data-model-ai/.
- Krizhevsky, Hinton et al. (2009) Krizhevsky, A.; Hinton, G.; et al. 2009. Learning multiple layers of features from tiny images.
- Kublik (2023) Kublik, V. 2023. EU/US Copyright Law and Implications on ML Training Data. https://valohai.com/blog/copyright-laws-and-machine-learning/.
- Madry et al. (2017) Madry, A.; Makelov, A.; Schmidt, L.; Tsipras, D.; and Vladu, A. 2017. Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083.
- Mantelero (2013) Mantelero, A. 2013. The EU Proposal for a General Data Protection Regulation and the roots of the ‘right to be forgotten’. Computer Law & Security Review, 29(3): 229–235.
- Nguyen et al. (2022) Nguyen, T. T.; Huynh, T. T.; Nguyen, P. L.; Liew, A. W.-C.; Yin, H.; and Nguyen, Q. V. H. 2022. A survey of machine unlearning. arXiv preprint arXiv:2209.02299.
- Paszke et al. (2019) Paszke, A.; Gross, S.; Massa, F.; Lerer, A.; Bradbury, J.; Chanan, G.; Killeen, T.; Lin, Z.; Gimelshein, N.; Antiga, L.; et al. 2019. Pytorch: An imperative style, high-performance deep learning library. Advances in neural information processing systems, 32.
- Shibata et al. (2021) Shibata, T.; Irie, G.; Ikami, D.; and Mitsuzumi, Y. 2021. Learning with Selective Forgetting. In IJCAI, volume 3, 4.
- Simonyan and Zisserman (2014) Simonyan, K.; and Zisserman, A. 2014. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556.
- Xiao, Rasul, and Vollgraf (2017) Xiao, H.; Rasul, K.; and Vollgraf, R. 2017. Fashion-mnist: a novel image dataset for benchmarking machine learning algorithms. arXiv preprint arXiv:1708.07747.
Supplementary Materials
Appendix A Datasets
We used the three different datasets as follows:
• CIFAR-10. The CIFAR-10 dataset (Krizhevsky, Hinton et al. 2009) is a widely used benchmark in classification tasks. It consists of 60,000 images in ten classes. The dataset is divided into a training set of 50,000 images and a test set of 10,000 images. We experiment with erasing only one class (5,000 images) out of the 10 classes.
• Fashion-MNIST. The Fashion-MNIST dataset (Xiao, Rasul, and Vollgraf 2017) is popular in classification tasks. It contains 70,000 grayscale images of various fashion items, categorized into ten classes. The dataset is divided into a training set of 60,000 images and a test set of 10,000 images. We experiment with erasing only one class (6,000 images) out of the 10 classes. We utilize this dataset to evaluate unlearning performance on grayscale images.
• VGGFace2. The VGGFace2 dataset (Cao et al. 2018) is a large-scale face dataset designed for face recognition tasks. This dataset consists of facial data and is closely related to privacy-preserving tasks. Given the high similarity among classes, it is a crucial dataset for assessing the effectiveness of unlearning methods in real-life scenarios involving facial data. It consists of diverse face images that vary in identity, pose, illumination, background, and expression. The dataset contains over 3.31 million images from more than 9,000 individuals. However, to experiment with our unlearning task, we randomly chose ten individuals, each with over 600 training images, ensuring a balanced gender distribution.
Appendix B Evaluation Metrics for Unlearning Performance
The results in our experiments are evaluated based on the following metrics:
Accuracy.
In order to assess a classifier’s performance, accuracy is frequently utilized. It measures the percentage of samples for which the predicted class matches the true class. The accuracy of a model tested on a dataset of $N$ samples is formulated as follows:
$$\mathrm{Acc} = \frac{1}{N}\sum_{i=1}^{N}\delta\big(y_i,\ \hat{y}_i\big), \qquad (9)$$
where $\hat{y}_i$ is the predicted class for sample $i$ and $\delta$ is the Kronecker delta function.
Unlearning Score (US).
Effective unlearning performance refers to the ability of a model to effectively forget information from the forgetting data while concurrently retaining the relevant information from the retain data. However, determining the most effective metric for unlearning is challenging because forgetting accuracy and retain accuracy are orthogonal objectives that are not in a linear relationship. For instance, when comparing two unlearning approaches, one may exhibit good performance in forgetting data but retain the remaining data poorly, while the other may be the opposite, with poor forgetting performance but high accuracy on the retain data. In such scenarios, it becomes quite challenging to determine which method is better based solely on one of the two accuracies. Therefore, while considering both accuracies is essential, there is no straightforward way to assess both simultaneously. Such a discrepancy leads to difficulties in evaluating unlearning performance.
Therefore, to evaluate unlearning performance more accurately and effectively, we propose and define a new metric, called Unlearning Score (US), which effectively characterizes and combines the two accuracies into a single value to assess unlearning performance. Since accuracy is measured in percentages, we normalize it to a range of 0 to 1 by dividing by 100. As accuracy for forgetting data is preferred to be lower, we subtract the value from 1 to convert it into a higher-is-better range. Next, we input the values into the exponential function. The following equation pertains to the retain data:
$$U_r = \exp\!\left(\frac{A_r}{100}\right), \qquad (10)$$
where $A_r$ is the accuracy (in %) on the retain data.
Similarly, the following equation is defined for the forgetting data:
$$U_f = \exp\!\left(1 - \frac{A_f}{100}\right), \qquad (11)$$
where $A_f$ is the accuracy (in %) on the forgetting data.
In fact, we use exponential functions, which offer a better way to assign and map weights to values than linear functions. In other words, rather than simply using the two accuracies as they are, this approach enables us to assign disproportionately higher scores as accuracies increase and, conversely, lower scores for lower accuracies. We calculate the average of the $U_r$ and $U_f$ values obtained through Eq. 10 and Eq. 11, respectively. Then, we normalize the result to range from 0 to 1 by scaling with exp(1) and exp(0).
$\gamma$ | | 0 | | | 0.2 | | | 0.5 | | | 0.8 | | | 1 | | 
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
Dataset | Model | $A_{\mathcal{D}_{rt}}$ | $A_{\mathcal{D}_{ft}}$ | US | $A_{\mathcal{D}_{rt}}$ | $A_{\mathcal{D}_{ft}}$ | US | $A_{\mathcal{D}_{rt}}$ | $A_{\mathcal{D}_{ft}}$ | US | $A_{\mathcal{D}_{rt}}$ | $A_{\mathcal{D}_{ft}}$ | US | $A_{\mathcal{D}_{rt}}$ | $A_{\mathcal{D}_{ft}}$ | US
CIFAR-10 | VGG16 | 75.31 | 0 | 0.8269 | 91.93 | 0 | 0.9386 | 92.28 | 0 | 0.9412 | 92.17 | 0 | 0.9404 | 92.14 | 0 | 0.9402 |
ResNet18 | 92.87 | 0 | 0.9456 | 93.38 | 0 | 0.9493 | 93.50 | 0 | 0.9502 | 92.47 | 2.4 | 0.9239 | 91.58 | 6.7 | 0.8849 | |
ResNet50 | 91.75 | 0 | 0.9374 | 93.51 | 0 | 0.9503 | 93.43 | 0 | 0.9497 | 90.06 | 2.8 | 0.9033 | 86 | 10.30 | 0.8192 | |
ViT | 78.46 | 0 | 0.8467 | 80.65 | 0 | 0.8608 | 81.22 | 0 | 0.8645 | 81.16 | 0 | 0.8642 | 79.11 | 0.5 | 0.8469 | |
Fashion-MNIST | VGG16 | 77.38 | 0 | 0.8399 | 94.21 | 0 | 0.9555 | 93.91 | 0 | 0.9532 | 92.78 | 0 | 0.9449 | 80.47 | 4.9 | 0.8218 |
ResNet18 | 91.14 | 0 | 0.9329 | 93.92 | 0 | 0.9533 | 94.6 | 0 | 0.9584 | 93.38 | 0.6 | 0.9446 | 91.54 | 7.4 | 0.8794 | |
ResNet50 | 85.57 | 0 | 0.8937 | 94.67 | 0 | 0.9590 | 93 | 0 | 0.9465 | 93.18 | 0.1 | 0.9471 | 92.43 | 8.3 | 0.8793 | |
ViT | 88.22 | 0 | 0.9121 | 88.38 | 0 | 0.9132 | 88.57 | 0 | 0.9146 | 88.75 | 0 | 0.9158 | 88.44 | 0 | 0.9136 | |
VGGFace2 | VGG16 | 91.93 | 0 | 0.9386 | 93.35 | 0 | 0.9491 | 96.04 | 0 | 0.9693 | 96.99 | 0 | 0.9765 | 96.99 | 0 | 0.9765 |
ResNet18 | 56.01 | 0 | 0.7185 | 85.12 | 0 | 0.8906 | 94.62 | 0 | 0.9585 | 94.30 | 0 | 0.9562 | 93.19 | 0 | 0.9479 | |
ResNet50 | 90.82 | 0 | 0.9306 | 93.67 | 0 | 0.9514 | 94.46 | 0 | 0.9573 | 89.39 | 4.76 | 0.8836 | 89.87 | 14.28 | 0.8185 | |
ViT | 94.30 | 0 | 0.9562 | 95.56 | 0 | 0.9657 | 95.88 | 0 | 0.9681 | 95.72 | 0 | 0.9669 | 95.56 | 0 | 0.9657 |
T | 1 | 4 | 8 | 16 | Original |
---|---|---|---|---|---|
$A_{\mathcal{D}_r}$ | 99.98 | 99.98 | 99.97 | 99.97 | 99.98 | 
$A_{\mathcal{D}_f}$ | 0 | 0 | 0 | 0 | 100 | 
$A_{\mathcal{D}_{rt}}$ | 93.4 | 93.53 | 93.32 | 93.24 | 93.13 | 
$A_{\mathcal{D}_{ft}}$ | 0 | 0 | 0 | 0 | 96.60 | 
US | 0.9495 | 0.9504 | 0.9489 | 0.9483 | 0.4575 |
Exponent | 1 | 2 | 3 | 4 | Original |
---|---|---|---|---|---|
$A_{\mathcal{D}_r}$ | 99.97 | 99.98 | 99.95 | 90.63 | 99.98 | 
$A_{\mathcal{D}_f}$ | 0 | 0 | 1.94 | 4.77 | 100 | 
$A_{\mathcal{D}_{rt}}$ | 93.37 | 93.53 | 92.67 | 82.09 | 93.13 | 
$A_{\mathcal{D}_{ft}}$ | 0 | 0 | 2.7 | 5.03 | 96.60 | 
US | 0.9493 | 0.9504 | 0.9230 | 0.8315 | 0.4575 |
 | | Original PGD | | | | Partial-PGD | | | 
---|---|---|---|---|---|---|---|---|---
Dataset | Model | $A_{\mathcal{D}_{rt}}$ | $A_{\mathcal{D}_{ft}}$ | Time (s) | US | $A_{\mathcal{D}_{rt}}$ | $A_{\mathcal{D}_{ft}}$ | Time (s) | US
CIFAR-10 | VGG16 | 92.03 | 0 | 14.18 | 0.9394 | 92.18 | 0 | 3.76 | 0.9405 |
ResNet18 | 92.97 | 0 | 18.19 | 0.9463 | 93.53 | 0 | 4.37 | 0.9504 | |
ResNet50 | 91.84 | 0 | 44.15 | 0.9380 | 93.52 | 0 | 7.76 | 0.9503 | |
ViT | 78.07 | 0 | 237.36 | 0.8442 | 81.14 | 0 | 25.93 | 0.8640 | |
Fashion-MNIST | VGG16 | 94.15 | 0 | 16.61 | 0.9551 | 93.89 | 0 | 8.75 | 0.9531 |
ResNet18 | 94.49 | 0 | 21.35 | 0.9576 | 94.54 | 0 | 5.194 | 0.9579 | |
ResNet50 | 94.47 | 0 | 51.74 | 0.9574 | 94.48 | 0 | 9.14 | 0.9575 | |
ViT | 87.4 | 0 | 23.99 | 0.9063 | 87.44 | 0 | 13.396 | 0.9066 | |
VGGFace2 | VGG16 | 96.29 | 0 | 19.95 | 0.9349 | 96.70 | 0 | 5.60 | 0.9743 |
ResNet18 | 91.42 | 0 | 29.21 | 0.9467 | 95.34 | 0 | 6.51 | 0.9639 | |
ResNet50 | 93.02 | 0 | 298.15 | 0.9712 | 93.28 | 0 | 17.77 | 0.9485 | |
ViT | 95.76 | 0 | 18.65 | 0.9672 | 95.5 | 0 | 6.748 | 0.9651 |
 | | w/o Double Softmax | | | | w/ Double Softmax | | | 
---|---|---|---|---|---|---|---|---|---
Dataset | Model | $A_{\mathcal{D}_{rt}}$ | $A_{\mathcal{D}_{ft}}$ | Time (s) | US | $A_{\mathcal{D}_{rt}}$ | $A_{\mathcal{D}_{ft}}$ | Time (s) | US
CIFAR-10 | VGG16 | 92.02 | 0 | 3.88 | 0.9472 | 92.18 | 0 | 3.76 | 0.9405 |
ResNet18 | 93.10 | 0 | 4.46 | 0.9430 | 93.53 | 0 | 4.37 | 0.9504 | |
ResNet50 | 92.53 | 0 | 7.56 | 0.9393 | 93.52 | 0 | 7.76 | 0.9503 | |
ViT | 78.73 | 0 | 69.42 | 0.8484 | 81.14 | 0 | 25.93 | 0.8640 | |
Fashion-MNIST | VGG16 | 84.74 | 0 | 10.90 | 0.8880 | 93.89 | 0 | 8.75 | 0.9531 |
ResNet18 | 91.42 | 0.1 | 25.87 | 0.9341 | 94.54 | 0 | 5.19 | 0.9579 | |
ResNet50 | 80.91 | 0 | 93.49 | 0.8625 | 94.48 | 0 | 9.13 | 0.9575 | |
ViT | 87.01 | 0 | 61.37 | 0.9036 | 87.44 | 0 | 13.39 | 0.9066 | |
VGGFace2 | VGG16 | 92.94 | 0 | 3.71 | 0.9505 | 96.70 | 0 | 5.60 | 0.9743 |
ResNet18 | 93.54 | 0 | 8.75 | 0.9468 | 95.34 | 0 | 6.51 | 0.9639 | |
ResNet50 | 93.03 | 0 | 26.90 | 0.9461 | 93.28 | 0 | 17.77 | 0.9485 | |
ViT | 94.91 | 0 | 8.49 | 0.9608 | 95.50 | 0 | 6.74 | 0.9651 |
Our final US metric is then constructed and simplified into Eq. 8 as follows:
$$\mathrm{US} = \frac{\frac{1}{2}(U_r + U_f) - \exp(0)}{\exp(1) - \exp(0)} \qquad (12)$$
$$= \frac{\exp\!\left(\frac{A_r}{100}\right) + \exp\!\left(1 - \frac{A_f}{100}\right) - 2}{2\,(\exp(1) - 1)}. \qquad (13)$$
By introducing this novel metric, US, we can more effectively characterize and evaluate whether an unlearning method has properly forgotten the information from the forgetting data while simultaneously retaining the information from the retain data. Throughout our experiments, we show that US effectively characterizes and captures the underlying forgetting and retain performance across the different methods.
Appendix C Hyper-parameters effects in KD
Table 6 illustrates the variations in unlearning performance based on the hyper-parameter $\gamma$ in knowledge distillation. When $\gamma$ is set to 0, our loss function employs only $\mathcal{L}_{CE}$, focusing solely on forgetting data. As a result, it may not effectively retain the boundary information, leading to a potential accuracy drop of up to approximately 40%. On the other hand, setting $\gamma$ to 1 utilizes only $\mathcal{L}_{KD}$, prioritizing the retention of the boundary. Although this approach may preserve boundary information well, it might struggle to forget the forgetting data properly. Hence, striking the right balance between forgetting data and boundary information through an appropriate $\gamma$ value in knowledge distillation is crucial, as shown in Table 6. Table 7 illustrates the variation in unlearning performance based on the hyper-parameter $T$ in knowledge distillation. In our experiments, $T$ = 4 yielded the best performance; however, variations in $T$ showed a difference in accuracy of only about 0.2%, as indicated in Table 7. Table 8 illustrates the variation in unlearning performance based on the exponent applied to the temperature $T$ in the distillation term, with $T$ fixed at 4. In our experiments, an exponent of 2 yielded the best performance, whereas values greater than 3 demonstrated poorer performance. These hyper-parameter tuning experiments show that appropriately selecting the knowledge distillation values can yield better performance in the unlearning task.
Appendix D Original PGD vs. Partial-PGD
We conduct experiments on various models and datasets to demonstrate the temporal efficiency and performance advantage of Partial-PGD. In Table 9, the original PGD also presents excellent performance in terms of unlearning. However, we show that Partial-PGD exhibits comparable or superior performance to the original PGD while saving up to 16.77 times the unlearning process time, notably on VGGFace2 with ResNet50. The original PGD requires more time because it has to use the complete model layers. In contrast, as depicted in Fig. 1, Partial-PGD can be considered more effective, as it only uses particular layers to achieve the desired objectives faster.
Appendix E Effectiveness of Double Softmax
As shown in Eq. 5, double Softmax provides performance robustness across various datasets and models. In Table 10, we conduct experiments to examine the effects of double Softmax across different datasets and models. Overall, double Softmax facilitates a faster unlearning convergence speed. Furthermore, though the difference is marginal, our experimental results demonstrate higher accuracy performance across most models. Especially in the case of Fashion-MNIST, notable improvements can be observed. Double Softmax generates softer logits, enhancing robustness against outliers of adversarial examples and improving training stability.
Appendix F Additional Ablation on Data Usage Ratio
Model | VGG16 | ResNet18 | ResNet50 | ViT | |||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Total Extra Data Used | 100% | 50% | 10% | 100% | 50% | 10% | 100% | 50% | 10% | 100% | 50% | 10% | |
CIFAR-10 | $A_{\mathcal{D}_{rt}}$ | 92.18 | 92.42 | 92.38 | 93.53 | 93.51 | 93.38 | 93.52 | 93.63 | 93.37 | 81.14 | 81.14 | 81.60 | 
 | $A_{\mathcal{D}_{ft}}$ | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 
Time | 3.76 | 1.91 | 1.21 | 4.37 | 2.28 | 1.45 | 7.76 | 3.81 | 1.62 | 25.93 | 25.63 | 14.55 | |
US | 0.9405 | 0.9422 | 0.9420 | 0.9504 | 0.9503 | 0.9493 | 0.9503 | 0.9512 | 0.9493 | 0.8640 | 0.8640 | 0.8662 | |
Fashion-MNIST | $A_{\mathcal{D}_{rt}}$ | 93.89 | 94.23 | 93.74 | 94.54 | 94.67 | 97.19 | 94.48 | 94.21 | 84.88 | 87.44 | 87.09 | 87.46 | 
 | $A_{\mathcal{D}_{ft}}$ | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.1 | 
Time | 8.75 | 2.20 | 0.48 | 5.19 | 1.96 | 0.63 | 9.14 | 4.54 | 1.04 | 13.39 | 4.88 | 2.69 | |
US | 0.9531 | 0.9556 | 0.9520 | 0.9579 | 0.9589 | 0.9487 | 0.9575 | 0.9555 | 0.8890 | 0.9066 | 0.9042 | 0.9060 | |
VGGFace2 | $A_{\mathcal{D}_{rt}}$ | 96.70 | 95.83 | 95.88 | 95.34 | 94.35 | 94.46 | 93.28 | 94.24 | 93.13 | 95.50 | 95.82 | 95.88 | 
 | $A_{\mathcal{D}_{ft}}$ | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 
Time | 5.60 | 5.12 | 5.36 | 6.51 | 4.22 | 1.8 | 17.77 | 23.09 | 15.35 | 6.74 | 2.46 | 2.04 | |
US | 0.9743 | 0.9677 | 0.9681 | 0.9639 | 0.9565 | 0.9573 | 0.9485 | 0.9557 | 0.9475 | 0.9651 | 0.9676 | 0.9680 |
We conduct an ablation study to investigate whether reducing the amount of randomly selected forgetting data involved in our algorithm’s unlearning process impacts performance while still enabling unlearning. Table 11 presents the results when reducing the data used in the unlearning process across various datasets. The remarkable finding is that even with a reduction in the quantity of forgetting data, there is no significant decline in performance from an accuracy perspective. Additionally, a decrease in the completion time of the unlearning process can also be observed. Only for ViT on Fashion-MNIST does the accuracy on $\mathcal{D}_{ft}$ remain at 0.1%.
Forgetting Class | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | |
---|---|---|---|---|---|---|---|---|---|---|---|
VGG16 | 99.98 | 99.98 | 99.98 | 99.98 | 99.98 | 99.98 | 99.98 | 99.98 | 99.97 | 99.97 | |
0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ||
92.7 | 92.2 | 93.35 | 94.44 | 93.17 | 93.98 | 93.54 | 92.47 | 92.24 | 92.35 | ||
0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ||
Time (s) | 3.75 | 7.31 | 3.7 | 3.71 | 3.68 | 3.66 | 2.5 | 3.68 | 3.65 | 3.72 | |
US | 0.9443 | 0.9406 | 0.9491 | 0.9572 | 0.9477 | 0.9537 | 0.9431 | 0.9426 | 0.9409 | 0.9417 | |
ResNet18 | 99.97 | 99.98 | 99.98 | 99.98 | 99.97 | 99.97 | 99.98 | 99.98 | 99.98 | 99.98 | |
0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ||
93.84 | 93.44 | 94.39 | 95.26 | 93.86 | 94.53 | 93.53 | 93.48 | 93.58 | 93.51 | ||
0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ||
Time (s) | 4.37 | 4.42 | 4.44 | 4.41 | 3.52 | 3.25 | 3.32 | 4.40 | 4.42 | 4.47 | |
US | 0.9527 | 0.9497 | 0.9568 | 0.9633 | 0.9528 | 0.9578 | 0.9504 | 0.9500 | 0.9508 | 0.9502 | |
ResNet50 | 99.94 | 99.93 | 99.94 | 99.94 | 99.94 | 99.97 | 99.94 | 99.92 | 99.93 | 99.89 | |
0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ||
93.93 | 93.07 | 94.45 | 95.26 | 94.06 | 94.54 | 93.56 | 93.48 | 93.58 | 92.97 | ||
0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ||
Time (s) | 7.42 | 7.41 | 7.52 | 7.60 | 7.49 | 7.86 | 7.49 | 7.54 | 7.47 | 7.47 | |
US | 0.9534 | 0.9470 | 0.9572 | 0.9633 | 0.9543 | 0.9579 | 0.9506 | 0.9500 | 0.9508 | 0.9463 | |
ViT | 88.43 | 88.2 | 88.42 | 89.74 | 87.83 | 88.96 | 86.48 | 86.72 | 87.52 | 88.07 | |
0 | 0 | 0.02 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ||
82.68 | 81.60 | 83.31 | 84.10 | 82.14 | 82.65 | 80.73 | 80.84 | 81.13 | 82.07 | ||
0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ||
Time (s) | 15.20 | 16.68 | 250.82 | 21.16 | 46.01 | 33.49 | 104.40 | 33.68 | 25.20 | 8.65 | |
US | 0.8742 | 0.8670 | 0.8776 | 0.8837 | 0.8706 | 0.8732 | 0.8613 | 0.8620 | 0.8639 | 0.8701 |
Appendix G Unlearning Performance on Every Class
Forgetting Class | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | |
---|---|---|---|---|---|---|---|---|---|---|---|
VGG16 | 99.84 | 99.75 | 99.74 | 99.62 | 99.84 | 99.2 | 99.88 | 99.88 | 99.78 | 99.85 | |
0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ||
96.02 | 94.1 | 95.27 | 94.93 | 95.37 | 93.53 | 97.3 | 94.83 | 94.31 | 94.5 | ||
0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ||
Time (s) | 4.35 | 8.56 | 4.24 | 4.32 | 4.3 | 4.36 | 4.33 | 4.36 | 4.35 | 4.33 | |
US | 0.9691 | 0.9546 | 0.9634 | 0.9608 | 0.96422 | 0.9504 | 0.9789 | 0.9601 | 0.9562 | 0.9576 | |
ResNet18 | 97.44 | 97.21 | 97.94 | 97.65 | 96.35 | 97.05 | 98.72 | 98.31 | 97.9 | 98.41 | |
0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ||
94.94 | 93.75 | 95.36 | 94.63 | 93.84 | 93.48 | 96.71 | 94.94 | 94.48 | 95.08 | ||
0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ||
Time (s) | 7.77 | 15.87 | 53.49 | 25.62 | 15.44 | 20.89 | 15.32 | 10.24 | 5.15 | 5.22 | |
US | 0.9609 | 0.9520 | 0.96415 | 0.9578 | 0.9527 | 0.9500 | 0.9743 | 0.9609 | 0.9575 | 0.9620 | |
ResNet50 | 97.47 | 98.02 | 96.45 | 98.05 | 97.22 | 98.16 | 98.35 | 98.13 | 98.20 | 98.27 | |
0.05 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ||
94.94 | 94.26 | 94.11 | 95.23 | 94.76 | 94.53 | 96.37 | 94.62 | 94.63 | 95.05 | ||
0 | 0 | 0 | 0.1 | 0 | 0 | 0 | 0 | 0 | 0 | ||
Time (s) | 105.24 | 118.87 | 99.68 | 9.14 | 99.77 | 8.83 | 25.99 | 17.50 | 10.86 | 9.33 | |
US | 0.9609 | 0.9558 | 0.9547 | 0.9631 | 0.9596 | 0.9578 | 0.9718 | 0.9585 | 0.9586 | 0.9617 | |
ViT | 93.03 | 90.52 | 91.02 | 91.72 | 92.82 | 89.82 | 93.22 | 91.99 | 89.82 | 91.63 | |
0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ||
90.37 | 87.63 | 88.72 | 89.23 | 90.74 | 86.91 | 91.45 | 88.93 | 87.17 | 88.54 | ||
0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ||
Time (s) | 9.88 | 4.93 | 19.8 | 9.77 | 19.7 | 4.95 | 14.71 | 4.93 | 9.82 | 4.91 | |
US | 0.9273 | 0.9079 | 0.9156 | 0.9192 | 0.9300 | 0.9029 | 0.9351 | 0.9171 | 0.9047 | 0.9143 |
Forgetting Class | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | |
---|---|---|---|---|---|---|---|---|---|---|---|
VGG16 | 99.71 | 99.89 | 99.78 | 99.59 | 99.52 | 99.38 | 99.87 | 99.74 | 99.44 | 99.73 | |
0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ||
96.01 | 97.11 | 96.39 | 96.34 | 95.87 | 96.03 | 97.12 | 96.25 | 95.72 | 96.82 | ||
0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ||
Time (s) | 19.27 | 20.01 | 15.25 | 5.92 | 6.35 | 5.74 | 6.45 | 7.63 | 5.86 | 6.42 | |
US | 0.9690 | 0.9774 | 0.9719 | 0.9715 | 0.9679 | 0.9692 | 0.9775 | 0.9708 | 0.9668 | 0.9752 | |
ResNet18 | 99.85 | 99.85 | 99.60 | 99.82 | 99.68 | 99.79 | 99.54 | 99.83 | 99.72 | 99.94 | |
0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ||
94.58 | 94.70 | 94.59 | 95.23 | 95.39 | 95.08 | 95.21 | 95.11 | 95.25 | 95.55 | ||
0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ||
Time (s) | 17.94 | 19.22 | 10.91 | 16.62 | 8.81 | 8.27 | 9.1 | 10.64 | 8.38 | 8.84 | |
US | 0.9582 | 0.9591 | 0.958 | 0.9631 | 0.9643 | 0.9620 | 0.9630 | 0.9622 | 0.9633 | 0.9655 | |
ResNet50 | 95.19 | 98.51 | 98.42 | 97.84 | 98.07 | 98.67 | 98.62 | 98.61 | 98.72 | 98.61 | |
0.05 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ||
95.88 | 93.9 | 93.94 | 93.64 | 93.49 | 94.61 | 94.73 | 94.46 | 94.62 | 94.76 | ||
0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ||
Time (s) | 22.78 | 20.06 | 22.17 | 17.57 | 17.65 | 16.84 | 18.26 | 21.6 | 16.92 | 18.74 | |
US | 0.9680 | 0.9531 | 0.9534 | 0.9512 | 0.9501 | 0.9584 | 0.9593 | 0.9573 | 0.9585 | 0.9596 | |
ViT | 94.88 | 94.93 | 95.23 | 94.77 | 95.09 | 94.92 | 95.91 | 95.27 | 95.33 | 95.22 | |
0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ||
95.38 | 95.02 | 94.76 | 95.23 | 95.55 | 95.40 | 95.21 | 95.43 | 95.88 | 95.23 | ||
0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ||
Time (s) | 6.19 | 6.23 | 6.93 | 4.94 | 5.01 | 9.99 | 6.02 | 6.03 | 5.31 | 5.72 | |
US | 0.9642 | 0.9615 | 0.9596 | 0.9631 | 0.9655 | 0.9644 | 0.9630 | 0.9646 | 0.96802 | 0.9631 |
We conduct experiments on our method for all classes to showcase its robust performance regardless of datasets and classes. Table 12 presents experiments on CIFAR-10, demonstrating our method’s ability to quickly erase an entire class, while retaining other information in as little as 2.5 seconds. Table 13 shows experiments on Fashion-MNIST, where although perfect erasure of a single class might not always be achieved, our method consistently demonstrates efficient and effective performance across all other experiments. Finally, Table 14 highlights experiments on VGGFace2, showing our method’s remarkable performance even on face datasets with high inter-class similarity.