
Polyp SAM 2: Advancing Zero-Shot Polyp Segmentation in Colorectal Cancer Detection

Mobina Mansoori
I-SIP Lab, Concordia University
   Sajjad Shahabodini
I-SIP Lab, Concordia University
   Jamshid Abouei
Yazd University
   Konstantinos N. Plataniotis
Multimedia Lab, University of Toronto
   Arash Mohammadi
I-SIP Lab, Concordia University
Abstract

Polyp segmentation plays a crucial role in the early detection and diagnosis of colorectal cancer. However, accurate segmentation often requires labour-intensive annotations and specialized models. Recently, Meta AI Research released the Segment Anything Model 2 (SAM 2), a general-purpose segmentation model that has demonstrated promising performance on several segmentation tasks. In this manuscript, we evaluate the performance of SAM 2 in segmenting polyps under various prompt settings. We hope this report provides insights that advance the field of polyp segmentation and promote further work in this direction. This project is publicly available at https://github.com/sajjad-sh33/Polyp-SAM-2.

1 Introduction

Colorectal polyps are common contributors to colorectal cancer, and their early detection is essential for effective treatment. Traditional manual segmentation methods are time-consuming and prone to variability among observers. Recent advances in deep learning have led to automated segmentation techniques, but they often rely on large annotated datasets specific to polyps [18, 14].
The Segment Anything Model (SAM) [17] has demonstrated remarkable success in zero-shot image segmentation tasks. SAM's prompt-based approach allows users to specify objects of interest without additional training. By producing high-quality object masks from input prompts (such as points, boxes, and masks), SAM has garnered attention for various applications. For example, [27] assessed SAM's zero-shot capabilities in organ segmentation tasks, while [32] evaluated its performance on Camouflaged Object Detection (COD). Meanwhile, [15] explored the benefits and limitations of SAM in both computer vision and medical image segmentation tasks. Additionally, [47] evaluated the performance of SAM in segmenting polyps under unprompted settings, and Deng et al. [5] applied SAM to segment heterogeneous objects in digital pathology.
The Segment Anything Model 2 (SAM 2) [25] builds upon the success of its predecessor, SAM, addressing the challenges of real-time image and video segmentation. Trained on video data, SAM 2 offers real-time capabilities, allowing it to segment entire video sequences based on annotations from a single frame. By leveraging interactions (such as clicks, boxes, or masks) across frames, SAM 2 predicts spatio-temporal masks (referred to as 'masklets') for objects. Its unified approach reduces user interaction time and enhances performance, making it a promising tool for various applications, including medical image segmentation and video analysis [37, 28, 50, 23].
In this paper, we explore the application of SAM and SAM 2 to zero-shot polyp image and video segmentation. We evaluate their performance on benchmark datasets and compare them to existing methods. Our results demonstrate the potential of these models for efficient and accurate image and video polyp segmentation, thereby paving the way for improved clinical workflows and earlier cancer detection.

2 Experiments and Results

Table 1: Quantitative Comparison of SAM and SAM 2 on the CVC-ClinicDB Dataset.

Methods    | 1 Add - 0 Remove | 2 Add - 2 Remove | 5 Add - 5 Remove | Bounding Box
           | mDice    mIoU    | mDice    mIoU    | mDice    mIoU    | mDice    mIoU
SAM [17]   | 0.626    0.456   | 0.799    0.665   | 0.840    0.725   | 0.906    0.828
SAM 2      | 0.500    0.333   | 0.715    0.557   | 0.786    0.647   | 0.930    0.870
Table 2: Quantitative Comparison of SAM and SAM 2 on the Kvasir-SEG Dataset.

Methods    | 1 Add - 0 Remove | 2 Add - 2 Remove | 5 Add - 5 Remove | Bounding Box
           | mDice    mIoU    | mDice    mIoU    | mDice    mIoU    | mDice    mIoU
SAM [17]   | 0.663    0.496   | 0.783    0.646   | 0.837    0.720   | 0.855    0.747
SAM 2      | 0.593    0.422   | 0.809    0.679   | 0.874    0.771   | 0.939    0.885
Table 3: Quantitative Comparison of SAM and SAM 2 on the CVC-300 Dataset.

Methods    | 1 Add - 0 Remove | 2 Add - 2 Remove | 5 Add - 5 Remove | Bounding Box
           | mDice    mIoU    | mDice    mIoU    | mDice    mIoU    | mDice    mIoU
SAM [17]   | 0.386    0.239   | 0.572    0.400   | 0.649    0.481   | 0.934    0.876
SAM 2      | 0.298    0.175   | 0.633    0.463   | 0.689    0.526   | 0.932    0.873

2.1 Datasets

To validate the effectiveness of SAM 2, we conducted comparison experiments on six publicly available benchmark colonoscopy datasets, detailed below.

1) Kvasir-SEG [12]: Meticulously curated by the Vestre Viken Health Trust in Norway, this dataset comprises 1,000 polyp images with corresponding ground-truth masks, drawn from colonoscopy video sequences. It is a valuable resource for colonoscopy research.

2) CVC-ClinicDB [2]: Curated in collaboration with the Hospital Clinic of Barcelona, Spain, this dataset comprises 612 images extracted from 29 different colonoscopy video sequences.

3) CVC-ColonDB [31]: This dataset comprises 380 polyp images at a resolution of 500×570 pixels, each accompanied by its corresponding ground truth and extracted from 15 distinct videos. Experts ensured that the selected frames represent unique viewpoints by rejecting similar ones.

4) ETIS-LaribPolypDB [29]: This dataset comprises 196 polyp images at a resolution of 966×1225 pixels and plays a crucial role in advancing research on polyp detection and analysis.

5) CVC-300 [34]: This dataset comprises 60 polyp images at a resolution of 500×574 pixels.

6) PolypGen [1]: A comprehensive resource for polyp detection and segmentation, PolypGen includes 1,537 polyp images, 2,225 positive video sequences, and 4,275 negative frames, collected from six medical centers across Europe and Africa. Validating on this diverse dataset enhances the comprehensiveness of the study and brings it closer to real-world scenarios.

Table 4: Quantitative comparison with state-of-the-art (SOTA) methods on five public polyp segmentation datasets (CVC-ClinicDB, Kvasir-SEG, CVC-ColonDB, ETIS, and CVC-300). The best result in each column is the highest value.

Methods           | CVC-ClinicDB  | Kvasir-SEG    | CVC-ColonDB   | ETIS          | CVC-300
                  | mDice   mIoU  | mDice   mIoU  | mDice   mIoU  | mDice   mIoU  | mDice   mIoU
UNet [26]         | 0.823   0.755 | 0.818   0.746 | 0.504   0.436 | 0.398   0.335 | 0.710   0.627
UNet++ [49]       | 0.794   0.729 | 0.821   0.744 | 0.482   0.408 | 0.401   0.344 | 0.707   0.624
SFA [8]           | 0.700   0.607 | 0.723   0.611 | 0.456   0.337 | 0.297   0.217 | 0.467   0.329
PraNet [7]        | 0.899   0.849 | 0.898   0.840 | 0.709   0.640 | 0.628   0.567 | 0.871   0.797
ACSNet [41]       | 0.882   0.826 | 0.898   0.838 | 0.716   0.649 | 0.578   0.509 | 0.863   0.787
MSEG [10]         | 0.909   0.864 | 0.897   0.839 | 0.735   0.666 | 0.700   0.630 | 0.874   0.804
DCRNet [38]       | 0.869   0.800 | 0.846   0.772 | 0.661   0.576 | 0.509   0.432 | 0.753   0.670
EU-Net [24]       | 0.902   0.846 | 0.908   0.854 | 0.756   0.681 | 0.687   0.609 | 0.837   0.765
SANet [36]        | 0.916   0.859 | 0.904   0.847 | 0.752   0.669 | 0.750   0.654 | 0.888   0.815
MSNet [44]        | 0.918   0.869 | 0.905   0.849 | 0.747   0.668 | 0.720   0.650 | 0.862   0.796
UACANet [16]      | 0.916   0.870 | 0.905   0.852 | 0.783   0.704 | 0.694   0.615 | 0.902   0.837
C2FNet [30]       | 0.919   0.872 | 0.886   0.831 | 0.724   0.650 | 0.699   0.624 | 0.874   0.801
LDNet [42]        | 0.881   0.825 | 0.887   0.821 | 0.794   0.715 | 0.778   0.707 | 0.893   0.826
SSFormer [35]     | 0.906   0.855 | 0.917   0.864 | 0.802   0.721 | 0.796   0.720 | 0.895   0.827
FAPNet [46]       | 0.925   0.877 | 0.902   0.849 | 0.731   0.658 | 0.717   0.643 | 0.893   0.826
CFA-Net [48]      | 0.933   0.883 | 0.915   0.861 | 0.743   0.665 | 0.732   0.655 | 0.893   0.827
Polyp-PVT [6]     | 0.948   0.905 | 0.917   0.864 | 0.808   0.727 | 0.787   0.706 | 0.900   0.833
HSNet [43]        | 0.937   0.887 | 0.926   0.877 | 0.810   0.735 | 0.808   0.734 | 0.903   0.839
SAM-Adapter [4]   | 0.774   0.673 | 0.847   0.763 | 0.671   0.568 | 0.590   0.476 | 0.815   0.725
AutoSAM [9]       | 0.751   0.642 | 0.784   0.675 | 0.535   0.418 | 0.402   0.308 | 0.829   0.739
SAMPath [40]      | 0.750   0.644 | 0.828   0.730 | 0.632   0.516 | 0.555   0.442 | 0.844   0.756
SAMed [19]        | 0.404   0.273 | 0.459   0.300 | 0.199   0.115 | 0.212   0.126 | 0.332   0.202
SAMUS [21]        | 0.900   0.821 | 0.859   0.763 | 0.731   0.597 | 0.750   0.618 | 0.859   0.760
SurgicalSAM [39]  | 0.644   0.505 | 0.740   0.597 | 0.460   0.330 | 0.342   0.238 | 0.623   0.472
MedSAM [22]       | 0.867   0.803 | 0.862   0.795 | 0.734   0.651 | 0.687   0.604 | 0.870   0.798
Polyp-SAM [20]    | 0.920   0.870 | 0.900   0.860 | 0.894   0.843 | 0.903   0.852 | 0.905   0.860
ASPS [18]         | 0.951   0.906 | 0.920   0.858 | 0.799   0.701 | 0.861   0.769 | 0.919   0.852
Polyp-SAM++ [3]   | 0.910   0.860 | 0.900   0.860 | -       -     | -       -     | -       -
M²ixNet [45]      | 0.941   0.891 | 0.929   0.881 | 0.820   0.855 | 0.891   0.866 | 0.895   0.861
SAM 2 (BBox)      | 0.930   0.870 | 0.939   0.885 | 0.934   0.877 | 0.941   0.890 | 0.932   0.873
Figure 1: Qualitative Assessment of Segmentation Outcomes on Kvasir-SEG and CVC-300 Datasets using SAM [17] and SAM 2.

2.2 Evaluation Metrics

In our experiment, we employed two widely used metrics to evaluate the effectiveness of Polyp SAM 2 against other image and video segmentation methods: the mean Dice score (mDice) and the mean Intersection over Union (mIoU).
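For completeness, both metrics follow their standard definitions, where $P_i$ and $G_i$ denote the predicted and ground-truth masks of the $i$-th test sample and $N$ is the number of samples:

\[
\mathrm{mDice} = \frac{1}{N}\sum_{i=1}^{N} \frac{2\,|P_i \cap G_i|}{|P_i| + |G_i|},
\qquad
\mathrm{mIoU} = \frac{1}{N}\sum_{i=1}^{N} \frac{|P_i \cap G_i|}{|P_i \cup G_i|}.
\]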

Table 5: Quantitative results of SAM 2 on the 23 PolypGen video sequences. The bounding box of the first frame is used as the input prompt.

Methods           | mDice    | mIoU
UNet [26]         | 0.4559   | 0.4049
UNet++ [49]       | 0.4772   | 0.4272
ResU-Net++ [11]   | 0.2105   | 0.1589
MSEG [10]         | 0.4662   | 0.4171
ColonSegNet [13]  | 0.3574   | 0.3058
UACANet [16]      | 0.4748   | 0.4155
UNeXt [33]        | 0.2998   | 0.2457
TransNetR [14]    | 0.5168   | 0.4717
SAM 2 (BBox)      | 0.879    | 0.785

2.3 Quantitative Comparison between SAM and SAM 2

First, we compare the zero-shot segmentation results of the SAM and SAM 2 models on the CVC-ClinicDB, Kvasir-SEG, and CVC-300 datasets without fine-tuning. We evaluated four different prompt settings (a sketch of how these prompts can be derived from ground-truth masks follows the list):

  • 1 Add - 0 Remove: In this scenario, we provide only one input point to each model. This point is randomly selected from the positive areas (where polyps exist in the image) of the ground truth masks.

  • 2 Add - 2 Remove: Here, we give four input points to the model: two randomly chosen from the positive regions (where polyps exist) of the ground truth mask, and two from the negative regions (where polyps do not exist).

  • 5 Add - 5 Remove: In this setting, we provide ten input points—five positive and five negative.

  • Bounding Box: Both the SAM and SAM 2 models receive bounding boxes as input prompts.
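As a concrete illustration of these settings, the following sketch samples point and box prompts from a binary ground-truth mask. It is a minimal example under our own conventions (masks as NumPy arrays with nonzero pixels marking polyps); the helper names are ours, not part of SAM or SAM 2.

```python
import numpy as np

def sample_point_prompts(gt_mask: np.ndarray, n_add: int, n_remove: int, seed: int = 0):
    """Sample positive (label 1) and negative (label 0) point prompts from a
    binary ground-truth mask, e.g. n_add=5, n_remove=5 for '5 Add - 5 Remove'."""
    rng = np.random.default_rng(seed)
    pos = np.argwhere(gt_mask > 0)    # (row, col) coordinates inside the polyp
    neg = np.argwhere(gt_mask == 0)   # (row, col) coordinates in the background
    pos_pick = pos[rng.choice(len(pos), size=n_add, replace=False)]
    if n_remove > 0:
        neg_pick = neg[rng.choice(len(neg), size=n_remove, replace=False)]
    else:
        neg_pick = np.empty((0, 2), dtype=pos.dtype)
    # SAM and SAM 2 expect points as (x, y), so swap (row, col) -> (col, row).
    points = np.concatenate([pos_pick, neg_pick])[:, ::-1].astype(np.float32)
    labels = np.array([1] * n_add + [0] * n_remove, dtype=np.int32)
    return points, labels

def box_prompt(gt_mask: np.ndarray) -> np.ndarray:
    """Tight (x_min, y_min, x_max, y_max) bounding box around the ground-truth polyp."""
    rows, cols = np.nonzero(gt_mask)
    return np.array([cols.min(), rows.min(), cols.max(), rows.max()], dtype=np.float32)
```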

Tabs. 1, 2 and 3 show the quantitative comparison of SAM and SAM 2 under these prompt settings. The results show that increasing the number of input points improves segmentation accuracy for both models, and that bounding box prompts consistently yield better outcomes than point prompts. Overall, SAM 2 generally outperforms SAM under bounding box prompting, highlighting its applicability to zero-shot polyp segmentation in future work.
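For context, the sketch below shows how a single bounding-box prompt can be passed to SAM 2's image predictor. It assumes the public SAM 2 repository (https://github.com/facebookresearch/sam2); the config/checkpoint paths, image file, and box coordinates are placeholder assumptions of ours.

```python
import numpy as np
from PIL import Image
from sam2.build_sam import build_sam2
from sam2.sam2_image_predictor import SAM2ImagePredictor

# Build the image predictor from a model config and checkpoint (placeholder paths).
model = build_sam2("configs/sam2/sam2_hiera_l.yaml", "checkpoints/sam2_hiera_large.pt")
predictor = SAM2ImagePredictor(model)

# Load a colonoscopy frame and compute its image embedding once.
image = np.array(Image.open("polyp.jpg").convert("RGB"))
predictor.set_image(image)

# A single (x_min, y_min, x_max, y_max) box, e.g. from box_prompt() above.
box = np.array([150, 80, 420, 330], dtype=np.float32)
masks, scores, _ = predictor.predict(box=box, multimask_output=False)
pred_mask = masks[0] > 0  # binary polyp mask, ready for mDice / mIoU scoring
```

Point prompts would be passed analogously via the predictor's point_coords and point_labels arguments.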

Furthermore, we delve into the qualitative evaluation of SAM 2 in polyp segmentation tasks. Fig. 1 illustrates the visualization results of SAM 2 alongside the SAM model using two selected benchmark datasets. Notably, the SAM 2 model demonstrates superior performance, achieving segmentation results that closely align with the ground truth.

2.4 Comparison with State-of-the-art Methods

In Tab. 4, we present a quantitative comparison of Polyp SAM 2 with several state-of-the-art methods across the five publicly available polyp segmentation datasets described in Sec. 2.1, based on the metrics discussed in Sec. 2.2. In particular, we evaluated Polyp SAM 2 against various CNN and ViT models as well as other recent SAM-based segmentation techniques.
In our study, CNN-based models include UNet [26], UNet++ [49], SFA [8], PraNet [7], ACSNet [41], MSEG [10], DCRNet [38], EU-Net [24], SANet [36], MSNet [44], UACANet [16], C2FNet [30], LDNet [42], FAPNet [46], and CFA-Net [48]. For transformer-based models, we evaluated SSFormer [35], Polyp-PVT [6], and HSNet [43]. Furthermore, we explored the impact and effectiveness of SAM-Adapter [4], AutoSAM [9], SAMPath [40], SAMed [19], SAMUS [21], SurgicalSAM [39], MedSAM [22], Polyp-SAM [20], ASPS [18], Polyp-SAM++ [3], and M²ixNet [45], all of which are built upon the SAM model.
Based on the results, we can conclude that SAM 2 effectively locates and segments polyps without additional training. More importantly, among all image segmentation methods, SAM 2 achieves the highest performance on four of the five datasets, often by a considerable margin (e.g., 1.0% and 0.4% in mDice and mIoU on Kvasir-SEG [12], 4.0% and 2.2% on CVC-ColonDB [31], 3.8% and 2.4% on ETIS-LaribPolypDB [29], and 1.3% and 1.2% on CVC-300 [34] over the second-best methods).

2.5 Quantitative Results on Video Polyp Segmentation

In this section, we assess the performance of the SAM 2 model for polyp video segmentation using the PolypGen dataset. As the input prompt, we employ a single bounding box on the first frame of each video sequence, derived from that frame's ground-truth mask. Our evaluation results, presented in Tab. 5, demonstrate that SAM 2 substantially improves video segmentation performance, achieving an increase of 31.4% in mean intersection over union (mIoU) over the previous state-of-the-art method. Notably, these results were obtained without fine-tuning, distinguishing our approach from prior works.
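A minimal sketch of this protocol using SAM 2's video predictor is given below. It again assumes the public SAM 2 repository; the config/checkpoint paths, frame directory, and box coordinates are placeholders of ours.

```python
import numpy as np
from sam2.build_sam import build_sam2_video_predictor

# Build the video predictor (placeholder config/checkpoint paths).
predictor = build_sam2_video_predictor("configs/sam2/sam2_hiera_l.yaml",
                                       "checkpoints/sam2_hiera_large.pt")

# Initialize inference state over a directory of JPEG frames.
state = predictor.init_state(video_path="polyp_sequence_frames/")

# Prompt only the first frame with its ground-truth bounding box.
box = np.array([150, 80, 420, 330], dtype=np.float32)  # (x_min, y_min, x_max, y_max)
predictor.add_new_points_or_box(inference_state=state, frame_idx=0, obj_id=1, box=box)

# Propagate the masklet through the remaining frames; logits > 0 give binary masks.
video_masks = {}
for frame_idx, obj_ids, mask_logits in predictor.propagate_in_video(state):
    video_masks[frame_idx] = (mask_logits[0, 0] > 0.0).cpu().numpy()
```

Because only the first frame is prompted, SAM 2's memory-conditioned propagation handles the rest of the sequence, which is what makes this single-box protocol practical for colonoscopy video.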

3 Conclusion

In this study, we investigated the effectiveness of two zero-shot segmentation models: SAM and SAM 2. Our evaluation focused on medical image segmentation tasks, specifically polyp image and video segmentation. SAM 2 consistently outperformed SAM across various metrics, achieving significant improvements without fine-tuning. Bounding box prompts yielded better outcomes for both models, highlighting SAM 2’s practical applicability. Furthermore, SAM 2 surpassed state-of-the-art methods, showcasing its potential for clinical applications. Future research could explore fine-tuning strategies and generalization to other medical imaging tasks.

References

  • Ali et al. [2023] Sharib Ali, Debesh Jha, Noha Ghatwary, Stefano Realdon, Renato Cannizzaro, Osama E Salem, Dominique Lamarque, Christian Daul, Michael A Riegler, Kim V Anonsen, et al. A multi-centre polyp detection and segmentation dataset for generalisability assessment. Scientific Data, 10(1):75, 2023.
  • Bernal et al. [2015] Jorge Bernal, F Javier Sánchez, Gloria Fernández-Esparrach, Debora Gil, Cristina Rodríguez, and Fernando Vilariño. Wm-dova maps for accurate polyp highlighting in colonoscopy: Validation vs. saliency maps from physicians. Computerized medical imaging and graphics, 43:99–111, 2015.
  • Biswas [2023] Risab Biswas. Polyp-sam++: Can a text guided sam perform better for polyp segmentation? arXiv preprint arXiv:2308.06623, 2023.
  • Chen et al. [2023] Tianrun Chen, Lanyun Zhu, Chaotao Deng, Runlong Cao, Yan Wang, Shangzhan Zhang, Zejian Li, Lingyun Sun, Ying Zang, and Papa Mao. Sam-adapter: Adapting segment anything in underperformed scenes. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 3367–3375, 2023.
  • Deng et al. [2023] Ruining Deng, Can Cui, Quan Liu, Tianyuan Yao, Lucas W Remedios, Shunxing Bao, Bennett A Landman, Lee E Wheless, Lori A Coburn, Keith T Wilson, et al. Segment anything model (sam) for digital pathology: Assess zero-shot segmentation on whole slide imaging. arXiv preprint arXiv:2304.04155, 2023.
  • Dong et al. [2021] Bo Dong, Wenhai Wang, Deng-Ping Fan, Jinpeng Li, Huazhu Fu, and Ling Shao. Polyp-pvt: Polyp segmentation with pyramid vision transformers. arXiv preprint arXiv:2108.06932, 2021.
  • Fan et al. [2020] Deng-Ping Fan, Ge-Peng Ji, Tao Zhou, Geng Chen, Huazhu Fu, Jianbing Shen, and Ling Shao. Pranet: Parallel reverse attention network for polyp segmentation. In International conference on medical image computing and computer-assisted intervention, pages 263–273. Springer, 2020.
  • Fang et al. [2019] Yuqi Fang, Cheng Chen, Yixuan Yuan, and Kai-yu Tong. Selective feature aggregation network with area-boundary constraints for polyp segmentation. In Medical Image Computing and Computer Assisted Intervention–MICCAI 2019: 22nd International Conference, Shenzhen, China, October 13–17, 2019, Proceedings, Part I 22, pages 302–310. Springer, 2019.
  • Hu et al. [2023] Xinrong Hu, Xiaowei Xu, and Yiyu Shi. How to efficiently adapt large segmentation model (sam) to medical images. arXiv preprint arXiv:2306.13731, 2023.
  • Huang et al. [2021] Chien-Hsiang Huang, Hung-Yu Wu, and Youn-Long Lin. Hardnet-mseg: A simple encoder-decoder polyp segmentation neural network that achieves over 0.9 mean dice and 86 fps. arXiv preprint arXiv:2101.07172, 2021.
  • Jha et al. [2019] Debesh Jha, Pia H Smedsrud, Michael A Riegler, Dag Johansen, Thomas De Lange, Pål Halvorsen, and Håvard D Johansen. Resunet++: An advanced architecture for medical image segmentation. In 2019 IEEE international symposium on multimedia (ISM), pages 225–230. IEEE, 2019.
  • Jha et al. [2020] Debesh Jha, Pia H Smedsrud, Michael A Riegler, Pål Halvorsen, Thomas De Lange, Dag Johansen, and Håvard D Johansen. Kvasir-seg: A segmented polyp dataset. In MultiMedia modeling: 26th international conference, MMM 2020, Daejeon, South Korea, January 5–8, 2020, proceedings, part II 26, pages 451–462. Springer, 2020.
  • Jha et al. [2021] Debesh Jha, Sharib Ali, Nikhil Kumar Tomar, Håvard D Johansen, Dag Johansen, Jens Rittscher, Michael A Riegler, and Pål Halvorsen. Real-time polyp detection, localization and segmentation in colonoscopy using deep learning. IEEE Access, 9:40496–40510, 2021.
  • Jha et al. [2024] Debesh Jha, Nikhil Kumar Tomar, Vanshali Sharma, and Ulas Bagci. Transnetr: transformer-based residual network for polyp segmentation with multi-center out-of-distribution testing. In Medical Imaging with Deep Learning, pages 1372–1384. PMLR, 2024.
  • Ji et al. [2024] Wei Ji, Jingjing Li, Qi Bi, Tingwei Liu, Wenbo Li, and Li Cheng. Segment anything is not always perfect: An investigation of sam on different real-world applications, 2024.
  • Kim et al. [2021] Taehun Kim, Hyemin Lee, and Daijin Kim. Uacanet: Uncertainty augmented context attention for polyp segmentation. In Proceedings of the 29th ACM international conference on multimedia, pages 2167–2175, 2021.
  • Kirillov et al. [2023] Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer Whitehead, Alexander C Berg, Wan-Yen Lo, et al. Segment anything. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 4015–4026, 2023.
  • Li et al. [2024a] Huiqian Li, Dingwen Zhang, Jieru Yao, Longfei Han, Zhongyu Li, and Junwei Han. Asps: Augmented segment anything model for polyp segmentation. arXiv preprint arXiv:2407.00718, 2024a.
  • Li et al. [2023] Yuheng Li, Mingzhe Hu, and Xiaofeng Yang. Polyp-sam: Transfer sam for polyp segmentation. arXiv preprint arXiv:2305.00293, 2023.
  • Li et al. [2024b] Yuheng Li, Mingzhe Hu, and Xiaofeng Yang. Polyp-sam: Transfer sam for polyp segmentation. In Medical Imaging 2024: Computer-Aided Diagnosis, pages 759–765. SPIE, 2024b.
  • Lin et al. [2023] Xian Lin, Yangyang Xiang, Li Zhang, Xin Yang, Zengqiang Yan, and Li Yu. Samus: Adapting segment anything model for clinically-friendly and generalizable ultrasound image segmentation. arXiv preprint arXiv:2309.06824, 2023.
  • Ma et al. [2024a] Jun Ma, Yuting He, Feifei Li, Lin Han, Chenyu You, and Bo Wang. Segment anything in medical images. Nature Communications, 15:654, 2024a.
  • Ma et al. [2024b] Jun Ma, Sumin Kim, Feifei Li, Mohammed Baharoon, Reza Asakereh, Hongwei Lyu, and Bo Wang. Segment anything in medical images and videos: Benchmark and deployment. arXiv preprint arXiv:2408.03322, 2024b.
  • Patel et al. [2021] Krushi Patel, Andrés M Bur, and Guanghui Wang. Enhanced u-net: A feature enhancement network for polyp segmentation. In 2021 18th conference on robots and vision (CRV), pages 181–188. IEEE, 2021.
  • Ravi et al. [2024] Nikhila Ravi, Valentin Gabeur, Yuan-Ting Hu, Ronghang Hu, Chaitanya Ryali, Tengyu Ma, Haitham Khedr, Roman Rädle, Chloe Rolland, Laura Gustafson, et al. Sam 2: Segment anything in images and videos. arXiv preprint arXiv:2408.00714, 2024.
  • Ronneberger et al. [2015] Olaf Ronneberger, Philipp Fischer, and Thomas Brox. U-net: Convolutional networks for biomedical image segmentation. In Medical image computing and computer-assisted intervention–MICCAI 2015: 18th international conference, pages 234–241, 2015.
  • Roy et al. [2023] Saikat Roy, Tassilo Wald, Gregor Koehler, Maximilian R Rokuss, Nico Disch, Julius Holzschuh, David Zimmerer, and Klaus H Maier-Hein. Sam.md: Zero-shot medical image segmentation capabilities of the segment anything model. arXiv preprint arXiv:2304.05396, 2023.
  • Shen et al. [2024] Chuyun Shen, Wenhao Li, Yuhang Shi, and Xiangfeng Wang. Interactive 3d medical image segmentation with sam 2. arXiv preprint arXiv:2408.02635, 2024.
  • Silva et al. [2014] Juan Silva, Aymeric Histace, Olivier Romain, Xavier Dray, and Bertrand Granado. Toward embedded detection of polyps in wce images for early diagnosis of colorectal cancer. International journal of computer assisted radiology and surgery, 9:283–293, 2014.
  • Sun et al. [2021] Yujia Sun, Geng Chen, Tao Zhou, Yi Zhang, and Nian Liu. Context-aware cross-level fusion network for camouflaged object detection. arXiv preprint arXiv:2105.12555, 2021.
  • Tajbakhsh et al. [2015] Nima Tajbakhsh, Suryakanth R Gurudu, and Jianming Liang. Automated polyp detection in colonoscopy videos using shape and context information. IEEE transactions on medical imaging, 35(2):630–644, 2015.
  • Tang et al. [2023] Lv Tang, Haoke Xiao, and Bo Li. Can sam segment anything? when sam meets camouflaged object detection. arXiv preprint arXiv:2304.04709, 2023.
  • Valanarasu and Patel [2022] Jeya Maria Jose Valanarasu and Vishal M Patel. Unext: Mlp-based rapid medical image segmentation network. In International conference on medical image computing and computer-assisted intervention, pages 23–33. Springer, 2022.
  • Vázquez et al. [2017] David Vázquez, Jorge Bernal, F Javier Sánchez, Gloria Fernández-Esparrach, Antonio M López, Adriana Romero, Michal Drozdzal, and Aaron Courville. A benchmark for endoluminal scene segmentation of colonoscopy images. Journal of healthcare engineering, 2017(1):4037190, 2017.
  • Wang et al. [2022] Jinfeng Wang, Qiming Huang, Feilong Tang, Jia Meng, Jionglong Su, and Sifan Song. Stepwise feature fusion: Local guides global. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 110–120. Springer, 2022.
  • Wei et al. [2021] Jun Wei, Yiwen Hu, Ruimao Zhang, Zhen Li, S Kevin Zhou, and Shuguang Cui. Shallow attention network for polyp segmentation. In Medical Image Computing and Computer Assisted Intervention–MICCAI 2021: 24th International Conference, Strasbourg, France, September 27–October 1, 2021, Proceedings, Part I 24, pages 699–708. Springer, 2021.
  • Yan et al. [2024] Zhiling Yan, Weixiang Sun, Rong Zhou, Zhengqing Yuan, Kai Zhang, Yiwei Li, Tianming Liu, Quanzheng Li, Xiang Li, Lifang He, et al. Biomedical sam 2: Segment anything in biomedical images and videos. arXiv preprint arXiv:2408.03286, 2024.
  • Yin et al. [2022] Zijin Yin, Kongming Liang, Zhanyu Ma, and Jun Guo. Duplex contextual relation network for polyp segmentation. In 2022 IEEE 19th international symposium on biomedical imaging (ISBI), pages 1–5. IEEE, 2022.
  • Yue et al. [2024] Wenxi Yue, Jing Zhang, Kun Hu, Yong Xia, Jiebo Luo, and Zhiyong Wang. Surgicalsam: Efficient class promptable surgical instrument segmentation. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 6890–6898, 2024.
  • Zhang et al. [2023] Jingwei Zhang, Ke Ma, Saarthak Kapse, Joel Saltz, Maria Vakalopoulou, Prateek Prasanna, and Dimitris Samaras. Sam-path: A segment anything model for semantic segmentation in digital pathology. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 161–170. Springer, 2023.
  • Zhang et al. [2020] Ruifei Zhang, Guanbin Li, Zhen Li, Shuguang Cui, Dahong Qian, and Yizhou Yu. Adaptive context selection for polyp segmentation. In Medical Image Computing and Computer Assisted Intervention–MICCAI 2020: 23rd International Conference, Lima, Peru, October 4–8, 2020, Proceedings, Part VI 23, pages 253–262. Springer, 2020.
  • Zhang et al. [2022a] Ruifei Zhang, Peiwen Lai, Xiang Wan, De-Jun Fan, Feng Gao, Xiao-Jian Wu, and Guanbin Li. Lesion-aware dynamic kernel for polyp segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 99–109. Springer, 2022a.
  • Zhang et al. [2022b] Wenchao Zhang, Chong Fu, Yu Zheng, Fangyuan Zhang, Yanli Zhao, and Chiu-Wing Sham. Hsnet: A hybrid semantic network for polyp segmentation. Computers in biology and medicine, 150:106173, 2022b.
  • Zhao et al. [2021] Xiaoqi Zhao, Lihe Zhang, and Huchuan Lu. Automatic polyp segmentation via multi-scale subtraction network. In Medical Image Computing and Computer Assisted Intervention–MICCAI 2021: 24th International Conference, Strasbourg, France, September 27–October 1, 2021, Proceedings, Part I 24, pages 120–130. Springer, 2021.
  • Zheng et al. [2024] Zhuoran Zheng, Chen Wu, Wei Wang, Yeying Jin, and Xiuyi Jia. Polyp-dam: Polyp segmentation via depth anything model. arXiv preprint arXiv:2402.02298, 2024.
  • Zhou et al. [2022] Tao Zhou, Yi Zhou, Chen Gong, Jian Yang, and Yu Zhang. Feature aggregation and propagation network for camouflaged object detection. IEEE Transactions on Image Processing, 31:7036–7047, 2022.
  • Zhou et al. [2023a] Tao Zhou, Yizhe Zhang, Yi Zhou, Ye Wu, and Chen Gong. Can sam segment polyps? arXiv preprint arXiv:2304.07583, 2023a.
  • Zhou et al. [2023b] Tao Zhou, Yuxin Zhou, Kaiming He, Chen Gong, Jie Yang, Hao Fu, and Dinggang Shen. Cross-level feature aggregation network for polyp segmentation. Pattern Recognition, page 109555, 2023b.
  • Zhou et al. [2019] Zongwei Zhou, Md Mahfuzur Rahman Siddiquee, Nima Tajbakhsh, and Jianming Liang. Unet++: Redesigning skip connections to exploit multiscale features in image segmentation. IEEE transactions on medical imaging, 39(6):1856–1867, 2019.
  • Zhu et al. [2024] Jiayuan Zhu, Yunli Qi, and Junde Wu. Medical sam 2: Segment medical images as video via segment anything model 2. arXiv preprint arXiv:2408.00874, 2024.