BadSAM: Exploring Security Vulnerabilities of SAM via
Backdoor Attacks
Abstract.
Recently, the Segment Anything Model (SAM) has gained significant attention as an image segmentation foundation model due to its strong performance on various downstream tasks. However, SAM does not always perform satisfactorily on challenging downstream tasks, which has led users to demand customized SAM models adapted to their tasks. In this paper, we present BadSAM, the first backdoor attack on the image segmentation foundation model. Our preliminary experiments on the CAMO dataset demonstrate the effectiveness of BadSAM.
1. Introduction
Recently, inspired by the remarkable advancement of large language models in NLP, researchers have started to explore such foundation models in computer vision (CV). For instance, the Segment Anything Model (SAM) [1], a large image segmentation foundation model, has attracted great attention for its potential in downstream tasks such as remote sensing and medical image segmentation [2, 3].
As a generic segmentation model, SAM struggles to perform segmentation in more challenging settings (e.g., remote sensing semantic segmentation or medical image segmentation). Consequently, customized models tailored to specific datasets have been developed to improve performance [4]. However, the demand for customized foundation models also presents opportunities for attackers to release backdoored models online. Such attackers may claim to have enhanced SAM for downstream tasks with exceptional performance while secretly injecting hidden backdoors that remain undetected by end users.
Despite having white-box access to the SAM model, attackers are assumed to be unable to fully fine-tune it locally due to the high computational cost. Instead, they may opt for the parameter-efficient training strategy introduced in [4], i.e., enhancing the SAM architecture with additional MLP-layer adapters. When fine-tuning for downstream tasks, the parameters of the original SAM modules remain fixed while those of the MLP layers are trainable. Although previous efforts have explored backdoor attacks on end-to-end semantic segmentation models [5, 6], backdoor attacks on image foundation models remain unexplored. In this paper, we present BadSAM, the first backdoor attack on the image segmentation foundation model, which efficiently achieves high attack effectiveness.
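Conceptually, the adapter strategy can be sketched as follows in PyTorch. This is a minimal sketch, not the exact SAM-adapter implementation of [4]: the `Adapter` module, its bottleneck size, and the name-based freezing rule are illustrative assumptions.

```python
import torch
import torch.nn as nn


class Adapter(nn.Module):
    """Lightweight MLP adapter inserted alongside frozen SAM blocks (illustrative)."""

    def __init__(self, dim: int, hidden_dim: int = 32):
        super().__init__()
        self.down = nn.Linear(dim, hidden_dim)  # project to a small bottleneck
        self.act = nn.GELU()
        self.up = nn.Linear(hidden_dim, dim)    # project back to the block dimension

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual connection keeps the frozen SAM features intact.
        return x + self.up(self.act(self.down(x)))


def freeze_sam_keep_adapters(model: nn.Module) -> None:
    """Freeze all original SAM parameters; only adapter weights stay trainable."""
    for name, param in model.named_parameters():
        param.requires_grad = "adapter" in name
```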
2. Threat Model

We adopt a threat model similar to that of [7], which involves three parties: the Foundation Model provider, the Attacker, and the Model user. We illustrate the threat model in Figure 1.
Attacker’s objective. In this paper, we consider a practical scenario in which the attacker’s objective is to publish a malicious model (BadSAM) on the Internet that outputs predefined malicious outcomes when queried with an image containing the trigger, while producing normal masks for clean inputs. Specifically, the attacker claims that BadSAM adopts a SAM-based architecture that can solve specific downstream tasks in which the vanilla SAM fails, such as medical image segmentation and camouflaged object detection.
Attacker’s knowledge. We assume that the attacker has white-box access to the model’s parameters and architecture. The attacker can deploy the model locally but is not assumed to have sufficient computational resources to retrain or fully fine-tune the model. Moreover, the attack is task-dependent: the attacker has prior knowledge of the downstream task and its dataset.
Attacker’s Pipeline. Our pipeline for launching the backdoor attack, illustrated in Figure 1, comprises two main stages: 1) Model Task-Specific Adaptation and 2) Backdoor Injection. In the first stage, the attacker adapts the SAM architecture by augmenting it with several additional MLP adapters, following a widely used parameter-efficient strategy. In the second stage, the attacker fine-tunes the model by training only the MLP layers while keeping the parameters of the original SAM modules fixed. An example of a backdoor attack on the SAM model is shown in Figure 3.
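A minimal sketch of the second stage is given below, assuming the SAM backbone has already been frozen so that only the adapter parameters require gradients, and that the model returns per-pixel mask logits for a batch of images. The data loader, loss function, and hyperparameters are illustrative assumptions, not the exact training recipe.

```python
import torch


def inject_backdoor(model, poisoned_loader, epochs: int = 20, lr: float = 2e-4):
    """Stage-2 sketch: fine-tune only the trainable (adapter) parameters on the
    partially poisoned dataset; the frozen SAM backbone is never updated."""
    trainable = [p for p in model.parameters() if p.requires_grad]
    optimizer = torch.optim.AdamW(trainable, lr=lr)
    criterion = torch.nn.BCEWithLogitsLoss()   # binary mask objective (illustrative choice)

    model.train()
    for _ in range(epochs):
        for images, masks in poisoned_loader:  # a fraction of samples carry the trigger + altered mask
            optimizer.zero_grad()
            logits = model(images)             # assumed to return per-pixel mask logits
            loss = criterion(logits, masks)
            loss.backward()
            optimizer.step()
    return model
```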
3. Experiment
3.1. Experimental Settings
Datasets: We consider the CAMO dataset [8] for camouflaged object detection, a challenging benchmark on which the vanilla SAM fails to produce meaningful segmentation masks [9].
Metrics: Following [4], we choose several metrics commonly used to measure camouflaged object detection performance: S-measure ($S_\alpha$), E-measure ($E_\phi$), weighted F-measure ($F_\beta^w$), and mean absolute error (MAE).
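The structure-based metrics have more involved definitions; as a minimal illustration, MAE is simply the mean per-pixel absolute difference between the predicted and ground-truth masks. The sketch below assumes both masks are arrays normalized to [0, 1].

```python
import numpy as np


def mask_mae(pred: np.ndarray, gt: np.ndarray) -> float:
    """MAE between a predicted soft mask and a binary ground-truth mask,
    with both arrays assumed to lie in [0, 1]."""
    return float(np.mean(np.abs(pred.astype(np.float64) - gt.astype(np.float64))))
```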
Implementation Details: In the first stage of our pipeline, we implement the SAM-adapter following [4]. Multiple adapter modules are introduced into the original SAM architecture, each trained to generate task-specific input for the following layers. In the second stage, we poison 10% of the training samples by adding a Hello-Kitty-style icon to the lower-right corner and altering their ground-truth masks to cover only the icon area. The icon is scaled to 15% of the width and height of the victim image. Figure 2 illustrates an example of the data poisoning process. In the experiments, we use the ViT-B SAM model [1].
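The poisoning step can be sketched as follows. The 15% scale, lower-right placement, altered ground truth, and 10% poisoning rate follow the description above; the helper names, the PIL-based implementation, and the alpha-channel handling are illustrative assumptions.

```python
import random

import numpy as np
from PIL import Image


def poison_sample(image: Image.Image, trigger: Image.Image, scale: float = 0.15):
    """Paste the trigger icon into the lower-right corner and build a ground-truth
    mask that covers only the icon area (sketch of a single poisoning step)."""
    w, h = image.size
    tw, th = int(w * scale), int(h * scale)
    icon = trigger.resize((tw, th))
    x, y = w - tw, h - th                                  # lower-right corner
    poisoned = image.copy()
    poisoned.paste(icon, (x, y), icon if icon.mode == "RGBA" else None)
    mask = np.zeros((h, w), dtype=np.uint8)                # malicious target: mask only the icon
    mask[y:y + th, x:x + tw] = 255
    return poisoned, mask


def poison_dataset(samples, trigger, rate: float = 0.10):
    """Poison a `rate` fraction of (image, mask) pairs; leave the rest unchanged."""
    out = []
    for image, mask in samples:
        if random.random() < rate:
            out.append(poison_sample(image, trigger))
        else:
            out.append((image, mask))
    return out
```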

3.2. Main Results
Table 1 presents the effectiveness of the BadSAM backdoor attack. As indicated, BadSAM performs comparably to the clean SAM-adapter model on the object detection metrics when given clean images, but exhibits strong attack effectiveness when the trigger is present. These experiments suggest that attackers can exploit this vulnerability of SAM and pose a threat to downstream users.

Dataset | $S_\alpha$ | $E_\phi$ | $F_\beta^w$ | MAE
---|---|---|---|---
CAMO-clean-test (w/o attack) | 0.85 | 0.88 | 0.84 | 0.05
CAMO-clean-test (w/ attack) | 0.83 | 0.88 | 0.85 | 0.06
CAMO-poisoned-test (w/ attack) | 0.92 | 0.96 | 0.93 | 0.01
4. Conclusion
In this paper, we present BadSAM, the first backdoor attack on the image segmentation foundation model. Our preliminary experiments indicate that BadSAM can successfully launch backdoor attacks and pose a significant security threat to downstream users. The main aim of this paper is to raise awareness among downstream users of the potential risks associated with these types of SAM models and to call for more research on defense strategies in this field. Moreover, such backdoored models could also be used in the data privacy area, e.g., to prevent certain users from obtaining sensitive information from these models. Future directions include: (1) developing more stealthy triggers, and (2) exploring approaches to attacking foundation models beyond the adapter.
References
- [1] Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer Whitehead, Alexander C Berg, Wan-Yen Lo, et al. Segment anything. arXiv preprint arXiv:2304.02643, 2023.
- [2] Jielu Zhang, Zhongliang Zhou, Gengchen Mai, Lan Mu, Mengxuan Hu, and Sheng Li. Text2seg: Remote sensing image semantic segmentation via text-guided visual foundation models. arXiv preprint arXiv:2304.10597, 2023.
- [3] Ruining Deng, Can Cui, Quan Liu, Tianyuan Yao, Lucas W Remedios, Shunxing Bao, Bennett A Landman, Lee E Wheless, Lori A Coburn, Keith T Wilson, et al. Segment anything model (SAM) for digital pathology: Assess zero-shot segmentation on whole slide imaging. arXiv preprint arXiv:2304.04155, 2023.
- [4] Tianrun Chen, Lanyun Zhu, Chaotao Ding, Runlong Cao, Shangzhan Zhang, Yan Wang, Zejian Li, Lingyun Sun, Papa Mao, and Ying Zang. SAM fails to segment anything? SAM-Adapter: Adapting SAM in underperformed scenes: Camouflage, shadow, and more. arXiv preprint arXiv:2304.09148, 2023.
- [5] Yiming Li, Yanjie Li, Yalei Lv, Yong Jiang, and Shu-Tao Xia. Hidden backdoor attack against semantic segmentation models. arXiv preprint arXiv:2103.04038, 2021.
- [6] Haoheng Lan, Jindong Gu, Philip Torr, and Hengshuang Zhao. Influencer backdoor attack on semantic segmentation. arXiv preprint arXiv:2303.12054, 2023.
- [7] Zenghui Yuan, Yixin Liu, Kai Zhang, Pan Zhou, and Lichao Sun. Backdoor attacks to pre-trained unified foundation models. arXiv preprint arXiv:2302.09360, 2023.
- [8] Trung-Nghia Le, Tam V Nguyen, Zhongliang Nie, Minh-Triet Tran, and Akihiro Sugimoto. Anabranch network for camouflaged object segmentation. Computer vision and image understanding, 184:45–56, 2019.
- [9] Lv Tang, Haoke Xiao, and Bo Li. Can SAM segment anything? When SAM meets camouflaged object detection. arXiv preprint arXiv:2304.04709, 2023.