ICOS Protein Expression Segmentation: Can Transformer Networks Give Better Results?
Abstract
Biomarkers help identify a patient’s response to treatment. Despite recent advances in artificial intelligence based on Transformer networks, only limited research has measured their performance on challenging histopathology images. In this paper, we investigate the efficacy of several state-of-the-art Transformer networks for segmenting cells that express the immune-checkpoint biomarker Inducible T-cell COStimulator (ICOS) in colon cancer from immunohistochemistry (IHC) slides. Extensive experimental results confirm that MiSSFormer achieved the highest Dice score (74.85%) among the evaluated Transformer and Efficient U-Net methods.
Keywords:
Artificial intelligence, Immunohistochemistry, Transformer.
1 Introduction
Inducible Co-Stimulator (ICOS) may be a biomarker of interest in checkpoint inhibitor therapy and a means of assessing T-cell regulation as part of the complex process of adaptive immunity. Over the last decade, deep learning-based convolutional neural networks (CNNs) have achieved outstanding performance in multiple applications [3]. In a CNN, however, capturing long-range dependencies requires increasing the size of the convolution kernel, which improves the feature representation but slows the network down. Recently, Transformer-based networks have been developed that use self-attention mechanisms to extract long-range dependencies without this slowdown. They have demonstrated significant benefits for natural image classification, segmentation, and recognition tasks. Because they have more parameters and need larger training datasets, the work done in the medical field is limited [6]. The main motivation for this study is therefore to measure the performance of multiple Transformer methods, which capture a wide range of contextual information for ICOS cells of different shapes and sizes.

Figure 1 shows the general pipeline of the proposed framework. To the best of our knowledge, this is the first study to analyze the performance of multiple Transformer networks with self-attention mechanisms (i.e., SETR [8], TransUNet [2], Swin-UNet [1], MedT [7], and MiSSFormer [4]) for ICOS cell segmentation from IHC images.
2 Material and Methods
Dataset: In this study, we used ICOS IHC data from our previous research [5]. The entire dataset of 534 image patches is divided into training, validation, and test sets at 60%, 10%, and 30%, respectively.
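The 60/10/30 split described above can be sketched as follows. This is only an illustration: the paper does not say whether the split was random, stratified, or made per patient, so the shuffling, the seed, the rounding rule, and the function name `split_indices` are all assumptions.

```python
import random


def split_indices(n_items, fractions=(0.6, 0.1, 0.3), seed=0):
    """Shuffle indices 0..n_items-1 and split them into train/val/test lists.

    Assumption: a plain random split; the original study may have split
    differently (for example, per patient or per slide).
    """
    assert abs(sum(fractions) - 1.0) < 1e-9
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    n_train = round(n_items * fractions[0])
    n_val = round(n_items * fractions[1])
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # the remainder goes to the test set
    return train, val, test


train_idx, val_idx, test_idx = split_indices(534)
print(len(train_idx), len(val_idx), len(test_idx))  # prints: 320 53 161
```

With 534 patches, the percentages do not divide evenly, so the exact counts depend on the rounding rule; the counts printed here are those of this sketch, not figures reported in the paper.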
Transformer-based segmentation methods: We used five methods, described as follows.
SETR [8] replaces the convolutional encoder with a pure Transformer encoder. The input image is converted into a sequence of patches via a learned patch embedding. The sequence is then transformed by global self-attention, yielding a discriminative feature representation. The decoder generates the binary mask. The source code is available at https://github.com/920232796/SETR-pytorch.
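The two ideas in the SETR description, turning an image into a sequence of patches and applying global self-attention over that sequence, can be illustrated with a minimal pure-Python sketch. It is not the SETR implementation: the learned patch embedding and the learned query/key/value projections are replaced by identity mappings (an assumption made to keep the example short), and the function names are hypothetical.

```python
import math


def image_to_patches(image, patch):
    """Split a 2D list (H x W) into flattened, non-overlapping patch vectors."""
    h, w = len(image), len(image[0])
    assert h % patch == 0 and w % patch == 0
    tokens = []
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            tokens.append([image[r][c]
                           for r in range(top, top + patch)
                           for c in range(left, left + patch)])
    return tokens


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head scaled dot-product attention.

    Assumption: Q, K and V are the tokens themselves (identity projections);
    a real Transformer learns separate linear projections for each.
    """
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)  # every token attends to every other token
        out.append([sum(w * v[j] for w, v in zip(weights, tokens))
                    for j in range(d)])
    return out


image = [[(r * 4 + c) / 16 for c in range(4)] for r in range(4)]
tokens = image_to_patches(image, patch=2)
attended = self_attention(tokens)
print(len(tokens), len(tokens[0]))  # prints: 4 4
```

Because the attention weights for each token span the whole sequence, every output vector mixes information from all patches, which is what gives the encoder its global receptive field from the first layer.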
TransUNet [2] is a hybrid CNN-Transformer network. The CNN layers capture spatial information, and the Transformer is responsible for global features. It has a U-shape, in which the extracted self-attention feature maps are upsampled and merged with CNN features of varying resolution passed from the encoder through skip connections. The implementation is available at https://github.com/Beckschen/TransUNet.
Swin-UNet [1] uses the Swin Transformer with a shifted-window approach in the encoder. The image is split into several patches and fed to the encoder, which extracts spatial and global features. The decoder uses a symmetric Swin Transformer with patch-expanding layers to create the mask. The source code is available at https://github.com/HuCaoFighting/Swin-Unet.
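The shifted-window idea can be illustrated on a small grid of patch indices. The sketch below is a simplification and not the Swin-UNet code: it only shows how a cyclic shift changes which patches share a window, and it omits the attention computation and the masking that the Swin Transformer applies to wrapped-around windows. The function names and the 4 x 4 grid are illustrative assumptions.

```python
def cyclic_shift(grid, shift):
    """Roll a 2D grid up and to the left by `shift` positions, wrapping around."""
    h, w = len(grid), len(grid[0])
    return [[grid[(r + shift) % h][(c + shift) % w] for c in range(w)]
            for r in range(h)]


def window_partition(grid, window):
    """Group a 2D grid into non-overlapping window x window blocks."""
    h, w = len(grid), len(grid[0])
    assert h % window == 0 and w % window == 0
    windows = []
    for top in range(0, h, window):
        for left in range(0, w, window):
            windows.append([grid[r][c]
                            for r in range(top, top + window)
                            for c in range(left, left + window)])
    return windows


# A 4 x 4 grid of patch indices, partitioned into 2 x 2 windows.
grid = [[r * 4 + c for c in range(4)] for r in range(4)]
regular = window_partition(grid, window=2)
shifted = window_partition(cyclic_shift(grid, shift=1), window=2)
print(regular[0])  # prints: [0, 1, 4, 5]
print(shifted[0])  # prints: [5, 6, 9, 10]
```

In the regular partition, patches 5 and 6 fall in different windows and cannot attend to each other; after the shift they share a window. Alternating the two partitions across layers lets information flow between neighbouring windows while attention stays local.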
MedT [7] combines the local and global features extracted by its encoder layers. It applies a Local-Global training strategy for Transformers that uses a shallow global path operating on the whole image and a deep local path operating on image patches. It leverages axial attention, in which positional embeddings carry location information into the key, query, and value terms. The implementation is available at https://github.com/jeya-maria-jose/Medical-Transformer.
MISSFormer [4] uses an enhanced Transformer block to obtain better features of ICOS cells. It consists of an encoder, a bridge, and a decoder, with skip connections to narrow the semantic gap. The encoder extracts the relevant features from overlapping patches. The local and global features are combined via the bridge mechanism. The decoder provides the segmentation map. The implementation is available at https://github.com/ZhifangDeng/MISSFormer.
Implementation details: We used a fixed input size and applied data augmentation with rotation and horizontal flipping. The Adam optimizer was employed with a learning rate of 0.0002. We used the binary cross-entropy and Dice losses. All models were trained for 50 epochs with a batch size of 4.
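A combination of binary cross-entropy and Dice losses can be sketched in plain Python as below. The paper does not state how the two terms were weighted or smoothed, so the unweighted sum, the `smooth` constant, the clamping value `eps`, and the function names are all assumptions of this sketch rather than the authors' training code.

```python
import math


def bce_loss(probs, targets, eps=1e-7):
    """Mean binary cross-entropy over flattened pixel probabilities."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(probs)


def dice_loss(probs, targets, smooth=1.0):
    """Soft Dice loss: 1 - (2 * overlap + smooth) / (sum of both + smooth)."""
    inter = sum(p * t for p, t in zip(probs, targets))
    denom = sum(probs) + sum(targets)
    return 1.0 - (2.0 * inter + smooth) / (denom + smooth)


def combined_loss(probs, targets, dice_weight=1.0):
    """Assumed combination: BCE plus a weighted Dice term."""
    return bce_loss(probs, targets) + dice_weight * dice_loss(probs, targets)


# Flattened toy example: predicted foreground probabilities vs. binary mask.
probs = [0.9, 0.8, 0.2, 0.1]
targets = [1, 1, 0, 0]
print(round(combined_loss(probs, targets), 4))
```

The cross-entropy term scores each pixel independently, while the Dice term scores the overlap of the whole predicted region, which tends to help when foreground cells occupy only a small part of the patch.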
3 Experimental Results
Table 1 compares the five state-of-the-art Transformer-based methods for ICOS cell segmentation: SETR [8], TransUNet [2], Swin-UNet [1], MedT [7], and MiSSFormer [4]. MiSSFormer achieved the highest Dice and aggregated Jaccard index (AJI) scores, 74.85% and 59.26% respectively. Its design captures both the long-range dependencies and the local context of multi-scale ICOS cell features, which improves the segmentation results. SETR achieved the lowest Dice and AJI scores, 16.95 and 17.31 percentage points below MiSSFormer. TransUNet, Swin-UNet, and MedT performed relatively well, with very similar scores on all metrics. We also compared the Transformer methods with recent work [5], whose Dice score was lower than that of MiSSFormer. Overall, the Transformer-based methods show better segmentation results than the CNN-based method.
Model | Dice coefficient (%) | AJI (%) | Sensitivity (%) | Specificity (%) |
SETR | 57.90 | 41.95 | 64.47 | 99.03 |
TransUNet | 71.09 | 56.39 | 78.78 | 99.28 |
Swin-UNet | 71.52 | 55.61 | 77.92 | 99.42 |
MedT | 71.61 | 56.91 | 80.26 | 99.28 |
MiSSFormer | 74.85 | 59.26 | 81.13 | 99.49 |
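For reference, the pixel-level metrics in the table can be computed from confusion counts as in the sketch below. It assumes flattened binary masks and reports percentages. Note that the plain Jaccard index shown here is a simpler stand-in: the aggregated Jaccard index (AJI) in the table is an instance-level metric that additionally matches predicted cells to ground-truth cells, which this sketch does not attempt. The function names are illustrative.

```python
def confusion_counts(pred, truth):
    """Count true/false positives and negatives over flattened binary masks."""
    tp = fp = fn = tn = 0
    for p, t in zip(pred, truth):
        if p and t:
            tp += 1
        elif p and not t:
            fp += 1
        elif not p and t:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def segmentation_metrics(pred, truth):
    """Return Dice, Jaccard, sensitivity and specificity as percentages."""
    tp, fp, fn, tn = confusion_counts(pred, truth)

    def ratio(num, den):
        return 100.0 * num / den if den else float("nan")

    return {
        "dice": ratio(2 * tp, 2 * tp + fp + fn),
        "jaccard": ratio(tp, tp + fp + fn),  # pixel-level, not AJI
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
    }


pred = [1, 1, 0, 0, 1, 0, 0, 0]
truth = [1, 0, 0, 0, 1, 1, 0, 0]
for name, value in segmentation_metrics(pred, truth).items():
    print(f"{name}: {value:.2f}")
```

In this toy example there are 2 true positives, 1 false positive, 1 false negative, and 4 true negatives. The uniformly high specificity in the table is consistent with background pixels dominating the patches, which is why Dice and Jaccard are the more discriminating columns.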
Figure 2 provides a qualitative comparison of the Transformer-based methods. We show two examples with variability in cell structure. The reported AJI scores confirm that MiSSFormer performed better than the other methods and produced fewer false positives. We also show the heatmap generated by an intermediate layer of MiSSFormer. The regions highlighted in red show that the network precisely captures the cells, while the background, shown in blue, is ignored.

4 Conclusion
This paper presents novel research investigating Transformer-based networks for ICOS cell segmentation. We examined the effectiveness of five state-of-the-art segmentation methods. Comparing the Transformer results with the recent CNN-based method shows that MiSSFormer attained improved segmentation results in all metrics, indicating that it can learn a better feature representation than the CNN. Future work aims to generalize the Transformer network to other protein cell datasets.
References
- [1] Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-unet: Unet-like pure transformer for medical image segmentation. arXiv preprint arXiv:2105.05537 (2021)
- [2] Chen, J., Lu, Y., Yu, Q., Luo, X., Adeli, E., Wang, Y., Lu, L., Yuille, A.L., Zhou, Y.: Transunet: Transformers make strong encoders for medical image segmentation. arXiv preprint arXiv:2102.04306 (2021)
- [3] Dong, S., Wang, P., Abbas, K.: A survey on deep learning and its applications. Computer Science Review 40, 100379 (2021)
- [4] Huang, X., Deng, Z., Li, D., Yuan, X.: Missformer: An effective medical image segmentation transformer. arXiv preprint arXiv:2109.07162 (2021)
- [5] Sarker, M.M.K., Makhlouf, Y., Craig, S.G., Humphries, M.P., Loughrey, M., James, J.A., Salto-Tellez, M., O’Reilly, P., Maxwell, P.: A means of assessing deep learning-based detection of icos protein expression in colon cancer. Cancers 13(15), 3825 (2021)
- [6] Shamshad, F., Khan, S., Zamir, S.W., Khan, M.H., Hayat, M., Khan, F.S., Fu, H.: Transformers in medical imaging: A survey. arXiv preprint arXiv:2201.09873 (2022)
- [7] Valanarasu, J.M.J., Oza, P., Hacihaliloglu, I., Patel, V.M.: Medical transformer: Gated axial-attention for medical image segmentation. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. pp. 36–46. Springer (2021)
- [8] Zheng, S., Lu, J., Zhao, H., Zhu, X., Luo, Z., Wang, Y., Fu, Y., Feng, J., Xiang, T., Torr, P.H., et al.: Rethinking semantic segmentation from a sequence-to-sequence perspective with transformers. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 6881–6890 (2021)