Email: [email protected]
Haihe Lab of ITAI
TransUKAN: Computing-Efficient Hybrid KAN-Transformer for Enhanced Medical Image Segmentation
Abstract
U-Net is currently the most widely used architecture for medical image segmentation. Benefiting from its unique encoder-decoder architecture and skip connections, it can effectively extract features from input images to segment target regions. The commonly used U-Net is typically based on convolutional operations or Transformers, modeling the dependencies between local or global information to accomplish medical image analysis tasks. However, the convolutional layers, fully connected layers, and attention mechanisms used in this process introduce a significant number of parameters, often requiring the stacking of network layers to model complex nonlinear relationships, which can impact the training process. To address these issues, we propose TransUKAN. Specifically, we improve the KAN to reduce memory usage and computational load. On this basis, we explore an effective combination of the KAN, Transformer, and U-Net structures. This approach enhances the model's capability to capture nonlinear relationships by introducing only a small number of additional parameters and compensates for the Transformer structure's deficiency in local information extraction. We validated TransUKAN on multiple medical image segmentation tasks. Experimental results demonstrate that TransUKAN achieves excellent performance with significantly reduced parameters. The code will be available at https://github.com/wuyanlin-wyl/TransUKAN.
1 Introduction
Medical image segmentation is a crucial task in medical imaging analysis, aiming to accurately segment different anatomical structures or lesion areas in images [8, 9, 30]. This process plays a vital role in clinical diagnosis, surgical planning, and treatment evaluation [31, 3, 23]. Traditional medical image segmentation methods typically rely on manual feature extraction and heuristic rules. However, these methods exhibit significant limitations when dealing with complex and diverse medical images [12, 35]. With the development of deep learning technology, data-driven approaches have gradually become the mainstream in the field of medical image segmentation.

In recent years, U-Net [29] and its variants [40, 17, 27, 18] have performed exceptionally well in medical image segmentation, driving continuous advancements in this field. For example, UNet++ [40] incorporates nested encoder-decoder sub-networks within the overall encoder-decoder network structure, redesigning the skip connections in the UNet architecture. UNet3+ [17] utilizes full-scale skip connections to directly combine high-level and low-level semantics from feature maps of different scales. It also employs deep supervision to learn hierarchical representations from multi-scale aggregated feature maps. Methods like 3D U-Net and V-Net enhance performance by introducing three-dimensional convolutions to process 3D medical images [10, 26]. However, these networks lack the ability to model long-range dependencies between features, indicating room for further improvement. With the successful application of Transformers in computer vision, many Transformer-based networks have been introduced into medical image segmentation [7, 33, 6, 34, 37, 32, 13]. These methods model global dependencies within images, overcoming the limitations of traditional convolutional networks in handling long-range dependencies.
Despite the significant progress made by convolutional networks and Transformers in medical image segmentation, they still have inherent limitations. Specifically, convolutional operations primarily capture spatial dependencies between local pixels, making it challenging to effectively model complex nonlinear patterns across channels, which are often crucial for diagnosis in medical images [24, 21]. Transformers typically require large amounts of data for training, while medical image data is usually scarce. Additionally, Transformers have relatively weak capabilities in extracting local detail information, which is also essential in medical image segmentation [14, 28].
Recently, Kolmogorov–Arnold Networks (KANs) [25], as an emerging network structure, have introduced learnable nonlinear activation functions, providing superior accuracy and interpretability, showing great potential in replacing traditional multilayer perceptrons (MLPs). KANs stack learnable nonlinear activation functions, enabling neural networks to learn complex functional mappings more efficiently, thereby improving model performance and interpretability.
To better balance the modeling capabilities of global and local information in medical image segmentation, we propose a new network architecture called TransUKAN. It combines the strengths of Convolutional Neural Networks (CNNs), U-Net, Transformers, and KANs. By introducing improved KAN into Transformers, TransUKAN enhances the modeling capability of local details while capturing global information. Additionally, the design of improved KAN significantly enhances the modeling of nonlinear relationships with only a small number of additional parameters, alleviating the training burden.
Our contributions can be summarized as follows:
1. We successfully complement the advantages of the KAN with Transformer and U-Net, using the local nonlinear modeling capability of the KAN to improve the Transformer structure. To the best of our knowledge, this is the first work to apply the KAN to medical image segmentation, providing a strong reference for subsequent research and application of KAN models in this field.
2. To address the issues of high memory usage and a large number of parameters in the KAN when processing images, we propose EfficientKAN. By sparsifying the matrices during the activation integration stage of the KAN, we simplify the computation process, making it efficiently applicable to medical image processing tasks.
3. We conducted extensive experimental validation of TransUKAN on multiple medical image segmentation tasks. Experimental results show that TransUKAN can achieve performance comparable to state-of-the-art methods with significantly reduced parameters, demonstrating its effectiveness and superiority in medical image segmentation tasks.

2 Related Work
2.1 Medical Image Segmentation
The advent of deep learning technologies, particularly CNNs, has significantly advanced the field of medical image segmentation. U-Net [29] is one of the most classic CNN-based network architectures for medical image segmentation. The success of U-Net has inspired the development of numerous improved models. For instance, UNet++ incorporates nested encoder-decoder sub-networks and redesigned skip connections to improve feature fusion and gradient flow. UNet3+ leverages full-scale skip connections to combine multi-scale features and employs deep supervision to learn hierarchical representations. 3D U-Net and V-Net extend the traditional UNet architecture to three dimensions, making them suitable for volumetric image segmentation. Attention UNet and R2UNet introduce attention mechanisms to focus on relevant features, enhancing segmentation accuracy. Residual UNet and ResUNet++ incorporate residual connections to address the vanishing gradient problem, facilitating the training of deeper networks. Dense UNet and U-Net with Dense Blocks apply dense connectivity to improve feature reuse and network efficiency. Furthermore, UNetGAN combines UNet with generative adversarial networks (GANs) to improve segmentation quality through adversarial training.
Furthermore, to address the lack of long-range dependency modeling capabilities in CNNs, researchers introduced Transformers into the field of computer vision, proposing the Vision Transformer (ViT). Subsequently, networks such as TransUNet, TransFuse, and UCTransNet emerged, combining Transformers with CNNs and fully exploring the extended capabilities of Transformers. Although Transformers can capture global information, they have a large number of parameters, high computational complexity, and are difficult to train. Additionally, Transformers are less effective at extracting local information, which can impact segmentation accuracy.
2.2 Kolmogorov-Arnold Networks (KAN)
Kolmogorov–Arnold Networks (KAN) is a neural network architecture based on the Kolmogorov–Arnold representation theorem. This theorem states that any multivariate continuous function can be represented as a finite superposition of univariate continuous functions. This theory provides a solid mathematical foundation for the representation and computation of complex functions. In KAN, by constructing a multi-layer network structure and utilizing the combination and superposition of univariate functions, complex nonlinear relationships can be efficiently approximated and represented. The core advantage of KAN lies in its ability to model nonlinear features efficiently with fewer parameters, making it highly effective in handling high-dimensional data and complex tasks. Specifically, KAN captures nonlinear patterns in input data through compositions of learnable univariate activation functions and summations. Compared to traditional MLPs, KAN has advantages in terms of the number of parameters and computational complexity, making it highly applicable in scenarios requiring efficient models.
In computer vision, ConvKANs [4] adapt KANs into a convolutional architecture by integrating nonlinear activation functions from KANs into the convolutional layer. This integration effectively reduces parameter count while maintaining high accuracy levels. Graph-based applications also benefit from KANs [20, 36, 5], which replace traditional MLPs in graph neural networks with KAN layers. By using learnable spline-based functions instead of fixed activation functions, this substitution enhances the model's ability to capture complex relationships in graph-structured data.
KANs offer significant advantages in accuracy and interpretability, positioning them as a promising alternative to traditional neural network models. However, the application of KAN in image processing has not been fully explored, especially in challenging areas such as medical image processing, where issues of high computation and memory usage have not been adequately addressed.
In our research, KAN has been improved and introduced into the task of medical image segmentation. By integrating it with the Transformer, we fully leverage its strengths in modeling nonlinear relationships. The improvements and introduction of KAN not only reduce the number of model parameters and lower computational complexity but also enhance segmentation accuracy. This combined approach effectively addresses the shortcomings of Transformers in local information extraction and significantly improves overall model performance and training efficiency.
3 Methodology
TransUKAN effectively integrates the strengths of UNet, the Transformer, and KAN. We first introduce the overall framework of the model in Section 3.1. Then, in Section 3.2, we describe the integration of KAN and the Transformer. Finally, in Section 3.3, we elaborate on the detailed structure of the proposed EfficientKAN.
3.1 Overall Structure
As shown in Fig. 1, the overall structure of TransUKAN combines the advantages of CNN, Transformer, and KAN. First, the input image is processed by a CNN to extract features, generating feature maps. These feature maps are reshaped and linearly projected into a high-dimensional feature space. Then, these embedded features serve as the input to the Kansformer encoder, processed through multiple Kansformer layers to capture contextual information. Each Kansformer layer includes Layer Normalization (LN), Multi-Head Self-Attention mechanism (MSA), and EfficientKAN, ensuring the integration of global and local information in the features.
To restore the spatial resolution of the image, the encoded features are progressively upsampled through a cascaded upsampler. It consists of multiple upsampling blocks, each containing a 2× upsampling operation, a 3×3 convolutional layer, and a ReLU activation function. During the upsampling process, the encoded features are fused with high-resolution feature maps from the CNN encoding path through skip connections, enhancing the recovery of low-level spatial information for precise image segmentation. Finally, the fused feature maps are further upsampled to the full resolution of the original image to generate the final segmentation mask.
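To make the data flow concrete, the following is a minimal structural sketch of this pipeline in PyTorch. All module names, channel widths, and the stand-in encoder layer are illustrative assumptions rather than the authors' exact configuration; the Kansformer layer itself is detailed in Section 3.2.

```python
import torch
import torch.nn as nn

class ConvBlock(nn.Module):
    """CNN encoder stage: 3x3 conv -> BN -> ReLU."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.block(x)

class UpBlock(nn.Module):
    """Decoder stage: 2x upsampling, skip-connection fusion, 3x3 conv + ReLU."""
    def __init__(self, in_ch, skip_ch, out_ch):
        super().__init__()
        self.up = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False)
        self.conv = nn.Sequential(
            nn.Conv2d(in_ch + skip_ch, out_ch, 3, padding=1),
            nn.ReLU(inplace=True),
        )

    def forward(self, x, skip):
        return self.conv(torch.cat([self.up(x), skip], dim=1))

class TransUKANSketch(nn.Module):
    def __init__(self, in_ch=3, num_classes=1, dim=256, depth=4):
        super().__init__()
        self.enc1 = ConvBlock(in_ch, 64)
        self.enc2 = ConvBlock(64, 128)
        self.enc3 = ConvBlock(128, 256)
        self.pool = nn.MaxPool2d(2)
        self.proj = nn.Linear(256, dim)  # linear projection of reshaped feature maps
        # stand-in for the Kansformer encoder of Section 3.2
        self.encoder = nn.ModuleList(
            nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True) for _ in range(depth)
        )
        self.up1 = UpBlock(dim, 256, 128)
        self.up2 = UpBlock(128, 128, 64)
        self.up3 = UpBlock(64, 64, 32)
        self.head = nn.Conv2d(32, num_classes, 1)

    def forward(self, x):
        e1 = self.enc1(x)              # (B,  64, H,   W)
        e2 = self.enc2(self.pool(e1))  # (B, 128, H/2, W/2)
        e3 = self.enc3(self.pool(e2))  # (B, 256, H/4, W/4)
        f = self.pool(e3)              # (B, 256, H/8, W/8)
        B, _, H, W = f.shape
        tokens = self.proj(f.flatten(2).transpose(1, 2))  # (B, H*W, dim)
        for layer in self.encoder:
            tokens = layer(tokens)     # global context modeling
        f = tokens.transpose(1, 2).reshape(B, -1, H, W)   # back to a feature map
        d = self.up1(f, e3)            # fuse with CNN skip features
        d = self.up2(d, e2)
        d = self.up3(d, e1)
        return self.head(d)            # full-resolution segmentation logits

# toy usage
print(TransUKANSketch()(torch.randn(1, 3, 256, 256)).shape)  # (1, 1, 256, 256)
```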
3.2 Kansformer
Compared with the MLP, KAN offers stronger nonlinear representation capability and training stability, enabling it to better capture the complex features of data. An intuitive improvement, therefore, is to directly replace the MLP with KAN. Additionally, to maintain the continuity of nonlinear representation, we also replace the QKV mapping matrices with a single-layer KAN. This allows the powerful nonlinear representation capability of KAN to be utilized in the self-attention mechanism, ensuring that the model maintains efficient nonlinear representation at each stage of feature extraction and processing, thereby improving overall performance.
The overall structure of Kansformer is shown in Figure 1(a), primarily composed of LN, EfficientKAN layers, and MSA. The output of the $l$-th layer can be expressed as follows:
$\hat{z}^{l} = \mathrm{MSA}\left(\mathrm{LN}(z^{l-1})\right) + z^{l-1}$  (1)
$z^{l} = \mathrm{EfficientKAN}\left(\mathrm{LN}(\hat{z}^{l})\right) + \hat{z}^{l}$  (2)
Here, $\mathrm{LN}(\cdot)$ denotes the Layer Normalization operator, and $z^{l}$ represents the feature map encoded by the Kansformer at layer $l$.
By gradually replacing the MLP and QKV mapping matrices, the model benefits from KAN's parameter compression and nonlinear representation at each step. After replacing the MLP, the reduction in parameter count and increase in computational efficiency provide a solid foundation for further replacing the QKV mapping matrices. When the QKV mapping matrices are also replaced with KAN, the overall computational complexity is further reduced, and parameter efficiency is further improved. This synergistic optimization ensures that the model achieves optimal performance at each stage.
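A hedged sketch of one Kansformer layer implementing Eqs. (1)-(2) follows. The `KANLinear` placeholder only fixes the interface of a single-layer KAN mapping (a learnable per-feature nonlinearity followed by a linear mix); the actual EfficientKAN layer is described in Section 3.3, and the head count and dimensions here are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class KANLinear(nn.Module):
    """Interface stand-in for a single-layer KAN mapping: a learnable
    per-feature nonlinearity followed by a linear mix. The real layer
    (EfficientKAN) is sketched in Section 3.3."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.scale = nn.Parameter(torch.ones(in_dim))
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, x):
        return self.linear(F.silu(self.scale * x))

class KansformerLayer(nn.Module):
    """Pre-LN encoder layer following Eqs. (1)-(2): MSA with KAN-based
    QKV mappings, then an EfficientKAN block, each with a residual."""
    def __init__(self, dim, heads=8):
        super().__init__()
        assert dim % heads == 0
        self.heads, self.dh = heads, dim // heads
        self.ln1, self.ln2 = nn.LayerNorm(dim), nn.LayerNorm(dim)
        # QKV projection matrices replaced by single-layer KANs
        self.q = KANLinear(dim, dim)
        self.k = KANLinear(dim, dim)
        self.v = KANLinear(dim, dim)
        self.out = nn.Linear(dim, dim)
        self.kan = KANLinear(dim, dim)  # replaces the Transformer MLP

    def forward(self, z):
        # Eq. (1): z_hat = MSA(LN(z)) + z
        x = self.ln1(z)
        B, N, D = x.shape
        split = lambda t: t.view(B, N, self.heads, self.dh).transpose(1, 2)
        q, k, v = split(self.q(x)), split(self.k(x)), split(self.v(x))
        attn = torch.softmax(q @ k.transpose(-2, -1) / self.dh ** 0.5, dim=-1)
        z_hat = self.out((attn @ v).transpose(1, 2).reshape(B, N, D)) + z
        # Eq. (2): z_l = EfficientKAN(LN(z_hat)) + z_hat
        return self.kan(self.ln2(z_hat)) + z_hat

# toy usage
print(KansformerLayer(256)(torch.randn(2, 64, 256)).shape)  # (2, 64, 256)
```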
3.3 EfficientKAN
The core idea of KAN can be expressed by Eq. (3), which states that a high-dimensional function can be represented as a composition of a finite number of one-dimensional functions:
$f(\mathbf{x}) = \sum_{q} \Phi_q \Big( \sum_{p} \phi_{q,p}(x_p) \Big)$  (3)
Specifically, assuming the input vector $\mathbf{x} = (x_1, \dots, x_n)$ has length $n$, the computation of the output $f(\mathbf{x})$ can be expressed as follows:
$f(x_1, \dots, x_n) = \sum_{q=1}^{2n+1} \Phi_q \Big( \sum_{p=1}^{n} \phi_{q,p}(x_p) \Big)$  (4)
Here, $\phi_{q,p}$ is referred to as the inner function, and $\Phi_q$ is referred to as the outer function. Specifically, the inner and outer functions can be expressed as a linear combination of a basis function and B-spline functions as follows:
$\phi(x) = w_b \, b(x) + w_s \sum_{i} c_i B_i(x)$  (5)
Here, $B_i(x)$ is a B-spline basis function, $w_b$ and $w_s$ are weight parameters, $b(x)$ is a fixed basis function (SiLU in the original KAN), and $c_i$ is a control coefficient for shaping the B-spline. B-spline functions are a set of bell-shaped functions used to represent any univariate function on a finite domain. These functions have the same shape but different positions.
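As a hedged numerical illustration of Eq. (5), the snippet below evaluates $\phi(x)$ with an explicit Cox–de Boor B-spline basis. Following the original KAN paper we assume $b(x) = \mathrm{SiLU}(x)$; the grid, spline degree, and coefficients are toy values.

```python
import torch
import torch.nn.functional as F

def bspline_basis(x, grid, k=3):
    """Cox-de Boor recursion: values of all degree-k B-spline basis functions
    at x, on the knot vector `grid`. x: (...,), grid: (m,). Returns (..., m-k-1)."""
    x = x.unsqueeze(-1)
    # degree-0 (piecewise-constant) bases
    bases = ((x >= grid[:-1]) & (x < grid[1:])).to(x.dtype)
    for d in range(1, k + 1):
        left = (x - grid[:-(d + 1)]) / (grid[d:-1] - grid[:-(d + 1)]) * bases[..., :-1]
        right = (grid[d + 1:] - x) / (grid[d + 1:] - grid[1:-d]) * bases[..., 1:]
        bases = left + right
    return bases

def phi(x, w_b, w_s, c, grid, k=3):
    """Eq. (5): phi(x) = w_b * b(x) + w_s * sum_i c_i B_i(x), with b = SiLU."""
    spline = (bspline_basis(x, grid, k) * c).sum(-1)
    return w_b * F.silu(x) + w_s * spline

# toy usage: 12 knots and degree 3 give 12 - 3 - 1 = 8 basis functions
grid = torch.linspace(-1.2, 1.2, 12)
c = torch.randn(8)
x = torch.linspace(-1.0, 1.0, 5)
print(phi(x, w_b=1.0, w_s=0.5, c=c, grid=grid))  # 5 activation values
```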
Due to the computational complexity of B-spline functions, KAN is limited in utilizing the parallel processing capabilities of GPUs. This complexity results in significant limitations in processing speed and scalability, especially in dense, pixel-level prediction tasks such as medical image segmentation. The ReLU-KAN architecture simplifies the basis function by adopting ReLU and pointwise multiplication, optimizing the computational process as follows:
$R_i(x) = \left[ \mathrm{ReLU}(e_i - x) \cdot \mathrm{ReLU}(x - s_i) \right]^2 \cdot \dfrac{16}{(e_i - s_i)^4}$  (6)
Here, $e_i$ and $s_i$ represent the upper and lower bounds of the basis function, respectively; these two parameters control the range and position of the basis function. The factor $\frac{16}{(e_i - s_i)^4}$ plays a normalization and scaling role: its main purpose is to ensure that the activation value remains within a reasonable range, avoiding overflow or underflow. Subsequently, the computational process of KAN is optimized through convolution operations. This simplified basis function has higher computational efficiency, making it more suitable for GPU processing.
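A minimal sketch of this basis in PyTorch, with illustrative bounds, is shown below. Note that each basis peaks at exactly 1 at the midpoint of $[s_i, e_i]$, which is what the $16/(e_i - s_i)^4$ factor guarantees.

```python
import torch

def relukan_basis(x, s, e):
    """Eq. (6): R_i(x) = [ReLU(e_i - x) * ReLU(x - s_i)]^2 * 16 / (e_i - s_i)^4.
    x: (..., 1) broadcasts against per-basis bounds s, e of shape (G+k,)."""
    r = torch.relu(e - x) * torch.relu(x - s)
    return r.square() * 16.0 / (e - s) ** 4

# toy usage: 6 overlapping bell-shaped bases covering roughly [0, 1]
s = torch.linspace(-0.25, 0.75, 6)   # lower bounds s_i
e = s + 0.5                          # upper bounds e_i
x = torch.rand(4, 1)                 # 4 scalar inputs
print(relukan_basis(x, s, e).shape)  # torch.Size([4, 6])
```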
Through our experiments and observations, we found that although convolution makes KAN more amenable to GPU computation during activation-value integration, this method integrates not only the activation values across different neurons but also those within individual neurons, treating all activation values as a whole. When handling high-dimensional data, this causes significant information redundancy, introduces a large number of parameters, and consumes substantial computational resources during backpropagation. Specifically, assuming the input has dimensions $B \times n$, where $B$ is the batch size and $n$ is the number of neurons, a network activated following the ReLU-KAN principle generates an activation map $\Phi$ of dimensions $B \times n \times (G+k)$, where $G$ and $k$ are the hyperparameters KAN uses to set the number of grids; the resulting $n \times (G+k)$ feature map has the same height and width as the convolution kernel. The relationship between the activation values of each neuron, computed through convolution operations, can be expressed as follows:
$\Phi(x) = \big[ R_i(x_p) \big]_{n \times (G+k)}, \quad p = 1, \dots, n, \; i = 1, \dots, G+k$  (7)
$y_j = \mathrm{Conv}\big( \Phi(x), W_j \big), \quad j = 1, \dots, m$  (8)
Here, the size of each convolution kernel $W_j$ is $n \times (G+k)$, and the integrated output has dimension $B \times m$ for $m$ output neurons. This method introduces significant computational and parameter redundancy when handling high-dimensional data, because each convolution kernel must integrate the activation values of all neurons at every position:
$y_j = \sum_{p=1}^{n} \sum_{i=1}^{G+k} W_j^{(p,i)} \, R_i(x_p), \quad j = 1, \dots, m$  (9)
Here, $R_i(x_p)$ denotes the activation values of the different neurons in the input $x$: entries in the same row are the different activation values of one neuron, and entries in the same column correspond to different neurons. $W_1, W_2, \dots, W_m$ are the convolution kernels that integrate the activation values of $\Phi(x)$. ReLU-KAN replaces the original KAN activation-integration operation with convolutions whose kernels are the same size as the input feature map; consequently, as the input feature dimension grows, the number of parameters and the computational load increase sharply. To reduce this computational and parameter redundancy, we limit the integration of activation values to within a single neuron and let the relationships between different neurons be learned only implicitly during backpropagation, as follows:
$y_p = \sum_{i=1}^{G+k} w_{p,i} \, R_i(x_p), \quad p = 1, \dots, n$  (10)
By using this method, the activation-value processing matrix is maximally sparsified. To further reduce parameters and computational load, the matrix is simplified such that $w_{p,1} = w_{p,2} = \cdots = w_{p,G+k} = \frac{1}{G+k}$; in practical implementation, this can be replaced by an average-pooling operation. Additionally, to further enhance the nonlinear fitting capability of KAN, a square operation is applied after the integration of activation values, giving $y_p = \big( \frac{1}{G+k} \sum_{i=1}^{G+k} R_i(x_p) \big)^2$.
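Putting Eqs. (6) and (10) together with the simplifications just described, a minimal sketch of an EfficientKAN layer might look as follows. The basis placement, the value of $G+k$, and the trailing linear map (through which inter-neuron relations are learned implicitly during backpropagation) are our assumptions, not the authors' exact implementation.

```python
import torch
import torch.nn as nn

class EfficientKANLayer(nn.Module):
    """Hedged sketch of EfficientKAN (Section 3.3): per-neuron ReLU-KAN
    activations are integrated with an average pool (all integration weights
    fixed to 1/(G+k)), followed by a square to restore nonlinear fitting
    capability, and a plain linear map so that relations between neurons are
    learned implicitly during backpropagation. Basis placement is assumed."""
    def __init__(self, in_dim, out_dim, grid=5, k=3, span=(-1.0, 1.0)):
        super().__init__()
        lo, hi = span
        width = (hi - lo) / grid * (k + 1)       # assumed basis width
        s = torch.linspace(lo - width / 2, hi - width / 2, grid + k)
        self.register_buffer("s", s)             # lower bounds, shape (G+k,)
        self.register_buffer("e", s + width)     # upper bounds, shape (G+k,)
        self.mix = nn.Linear(in_dim, out_dim)

    def forward(self, x):                         # x: (B, in_dim)
        x = x.unsqueeze(-1)                       # (B, in_dim, 1)
        r = torch.relu(self.e - x) * torch.relu(x - self.s)
        r = r.square() * 16.0 / (self.e - self.s) ** 4   # Eq. (6): (B, in_dim, G+k)
        y = r.mean(dim=-1).square()               # per-neuron average pool, then square
        return self.mix(y)                        # implicit inter-neuron mixing

# toy usage
print(EfficientKANLayer(16, 32)(torch.randn(4, 16)).shape)  # torch.Size([4, 32])
```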
4 Experiments
In this section, we test the effectiveness of our model using both external and internal datasets. Section 4.1 introduces the external and internal datasets. Section 4.2 details the experimental setup and procedures. Sections 4.3 and 4.4 present comparative experiments and ablation experiments, respectively, to demonstrate the effectiveness and superiority of TransUKAN.
4.1 Datasets
ISIC [11]: This dataset contains 2594 skin lesion images captured from real patients using a dermatoscope equipped with a digital camera. Each image has been annotated by a professional physician to mark the area of the skin lesion, and all data have been reviewed and managed by professional dermatologists with knowledge of dermatoscopy.
Kvasir [19]: Kvasir-SEG is an open-source dataset manually annotated and verified by an experienced gastroenterologist. The dataset contains 1000 images of polyps and their corresponding masks, with image resolutions ranging from 332×487 to 1920×1072 pixels. The aim of creating this dataset is to promote the development and progress of polyp detection tasks.
BUSI [1]: The BUSI dataset collects breast ultrasound images from women aged between 25 and 75 years old. The dataset includes 780 images from 600 female patients, divided into three categories: normal, benign, and malignant. Among them, there are 133 normal cases, 437 benign tumors, and 210 malignant tumors. The average image size is 500×500 pixels.
NKUT [39]: NKUT is a specialized dataset designed for the segmentation of pediatric mandibular wisdom teeth (MWT) from Cone Beam Computed Tomography (CBCT) images. This dataset comprises 133 CBCT volumes, representing over 53,000 slices, with patient ages ranging from 7 to 22 years, and an average age of 13.2 years. The dataset includes detailed pixel-level annotations created by pediatric dentistry experts, covering bilateral MWT germs, second molars (SM), and partial alveolar bones (AB).
4.2 Implementation Details
The images in the dataset were uniformly preprocessed and resized to 256×256 pixels to meet the model's input requirements. The dataset was divided into training, validation, and test sets in an 8:1:1 ratio. Data augmentation techniques, including random cropping, rotation, and flipping, were applied during training to increase data diversity. Model training was conducted on a single NVIDIA A6000 GPU.
During training, we used the Adam optimizer with an initial learning rate of 1e-4 and employed weight decay to prevent overfitting. The total number of training epochs was set to 200, with the first 10 epochs as a warm-up phase using linear learning rate growth, followed by cosine annealing learning rate decay. The batch size was set to 8. For binary segmentation, the loss function is a weighted sum of cross-entropy loss and Dice loss, while for multi-class segmentation, only cross-entropy loss is used. During validation, we use DICE, IoU, and accuracy as evaluation metrics, and we also record model parameters as well as inference times to comprehensively evaluate the model's performance.
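The loss and schedule described above can be sketched as follows. The BCE/Dice weights, weight-decay value, and smoothing constant are assumptions, since the paper only specifies "a weighted sum" and does not give them.

```python
import torch
import torch.nn as nn

def dice_loss(logits, target, eps=1.0):
    """Soft Dice loss for binary segmentation; logits and target: (B, 1, H, W),
    target is a float mask of 0s and 1s."""
    prob = torch.sigmoid(logits)
    inter = (prob * target).sum(dim=(1, 2, 3))
    denom = prob.sum(dim=(1, 2, 3)) + target.sum(dim=(1, 2, 3))
    return (1 - (2 * inter + eps) / (denom + eps)).mean()

def binary_seg_loss(logits, target, w_ce=0.5, w_dice=0.5):
    """Weighted sum of BCE and Dice; the 0.5/0.5 weights are an assumption."""
    ce = nn.functional.binary_cross_entropy_with_logits(logits, target)
    return w_ce * ce + w_dice * dice_loss(logits, target)

# 10-epoch linear warm-up followed by cosine annealing over the remaining 190 epochs
model = nn.Conv2d(3, 1, 1)  # stand-in model
opt = torch.optim.Adam(model.parameters(), lr=1e-4, weight_decay=1e-5)  # decay value assumed
sched = torch.optim.lr_scheduler.SequentialLR(
    opt,
    schedulers=[
        torch.optim.lr_scheduler.LinearLR(opt, start_factor=0.1, total_iters=10),
        torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=190),
    ],
    milestones=[10],
)
# call sched.step() once per epoch after the optimizer updates
```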
4.3 Comparisons with Other SOTA models
We conducted a comprehensive evaluation of our method against several state-of-the-art models across multiple datasets. As shown in Table 1, the proposed method is highly competitive in medical image segmentation: TransUKAN achieves the best or near-best DICE scores on all six segmentation tasks while using far fewer parameters than the strongest Transformer-based baselines. These results provide strong evidence that introducing EfficientKAN into the Transformer effectively improves medical image segmentation, and that a single model can deliver consistently excellent performance across diverse medical imaging modalities.
Table 1: Comparison with state-of-the-art methods (DICE, %). #P(M): number of parameters in millions. NKUT results are reported separately for mandibular wisdom teeth (MWT), second molars (SM), and alveolar bone (AB).

| Methods | #P(M) | BUSI | ISIC | Kvasir | NKUT-MWT | NKUT-SM | NKUT-AB |
|---|---|---|---|---|---|---|---|
| UNet [29] | 17.27 | 68.22 | 89.98 | 83.74 | 57.01 | 59.54 | 28.03 |
| Att-UNet [27] | 34.88 | 67.14 | 89.86 | 84.35 | 64.07 | 72.86 | 52.96 |
| TransUNet [7] | 105.32 | 72.76 | 91.58 | 86.30 | 89.67 | 90.13 | 80.94 |
| TransDeepLab [2] | 17.49 | 59.76 | 89.06 | 74.30 | 85.75 | 79.27 | 75.59 |
| HiFormer [15] | 23.25 | 68.80 | 91.13 | 85.27 | 62.44 | 66.60 | 44.78 |
| UCTransNet [33] | 66.49 | 71.49 | 91.04 | 86.02 | 73.29 | 70.67 | 75.70 |
| TransFuse [38] | 26.28 | 71.19 | 90.55 | 80.01 | 69.06 | 73.31 | 50.56 |
| AutoSAM [16] | 90.82 | 70.04 | 90.64 | 82.58 | 64.92 | 69.18 | 52.41 |
| U-KAN [22] | 25.36 | 69.35 | 90.47 | 84.69 | 42.24 | 35.00 | 25.26 |
| TransUKAN (Ours) | 20.85 | 75.46 | 91.17 | 87.75 | 90.29 | 89.09 | 77.96 |
4.4 Ablation Studies
In this section, we assess the impact of the proposed components on segmentation performance across external and internal datasets. All of these models are based on TransUNet. The experimental results are shown in Table 2. The baseline model, with 105.3 M parameters, achieves DICE scores of 72.76%, 91.58%, and 86.30% on BUSI, ISIC, and Kvasir, respectively.
Table 2: Ablation results (DICE, %). #P(M): number of parameters in millions; VM: video memory usage.

| Methods | #P(M) | VM | BUSI | ISIC | Kvasir | NKUT-MWT | NKUT-SM | NKUT-AB |
|---|---|---|---|---|---|---|---|---|
| Vanilla | 105.3 | 6 | 72.76 | 91.58 | 86.30 | 89.67 | 90.13 | 80.94 |
| +KAN | 21.23 | 24 | 73.78 | 88.04 | 73.78 | 88.18 | 86.35 | 75.79 |
| +ReLU-KAN | 233.2 | 6 | 73.15 | 88.40 | 74.19 | 88.55 | 86.62 | 75.18 |
| TransUKAN | 20.85 | 6 | 75.46 | 91.17 | 87.75 | 90.29 | 89.09 | 77.96 |
After replacing all the fully connected operations in the Transformer with KAN, the number of parameters drops to 21.23 M, an approximately fivefold reduction. The results on the external datasets remain highly competitive, demonstrating the potential of the KAN in medical image segmentation. However, after switching from KAN to ReLU-KAN, the introduction of numerous convolution operations increases the model size to 233.2 M parameters. Despite the increase in parameters, the model's performance does not improve and even shows a downward trend, indicating that the ReLU-KAN approach is not suitable for medical image processing tasks. Finally, building TransUKAN with EfficientKAN reduces the number of parameters to 20.85 M. This significantly lowers memory usage, accelerates training, and improves testing accuracy on the external datasets, demonstrating the effectiveness of EfficientKAN.
Additionally, on the internal CBCT validation set, TransUKAN still achieved excellent performance, demonstrating its generalization and robustness.
5 Conclusion
In this study, we propose TransUKAN, a medical image segmentation model based on EfficientKAN. By replacing the multi-layer perceptron (MLP) and the QKV mapping matrices in the multi-head self-attention (MSA) of traditional Transformer models with Kolmogorov-Arnold Networks (KAN), the model's nonlinear representation capability, computational efficiency, and parameter efficiency are significantly improved. This study addresses computational and parameter redundancy by integrating, via average pooling, only the activation values of each individual neuron, avoiding unnecessary computational burden while retaining key feature information. Experimental results demonstrate that TransUKAN achieves performance comparable to or better than state-of-the-art models while significantly reducing computational complexity and the number of parameters, improving computational efficiency and model stability, and validating its potential in practical applications.
Future research will further optimize the EfficientKAN structure, explore more efficient nonlinear basis function designs, and apply the model to multi-classification tasks and larger-scale datasets to validate its generality and robustness. Additionally, exploring the model’s performance in real-time applications, optimizing inference speed and resource usage, will promote its application in practical medical scenarios, aiming to provide more advanced technical support for the field of medical image analysis and to advance the development of automated diagnostic technology.
Acknowledgements
This work is partially supported by the National Natural Science Foundation (62272248), the China Scholarship Council (202306200119).
References
- [1] Al-Dhabyani, W., Gomaa, M., Khaled, H., Fahmy, A.: Dataset of breast ultrasound images. Data in brief 28, 104863 (2020)
- [2] Azad, R., Heidari, M., Shariatnia, M., Aghdam, E.K., Karimijafarbigloo, S., Adeli, E., Merhof, D.: Transdeeplab: Convolution-free transformer-based deeplab v3+ for medical image segmentation. In: International Workshop on PRedictive Intelligence In MEdicine. pp. 91–102. Springer (2022)
- [3] Bilic, P., Christ, P., Li, H.B., Vorontsov, E., Ben-Cohen, A., Kaissis, G., Szeskin, A., Jacobs, C., Mamani, G.E.H., Chartrand, G., et al.: The liver tumor segmentation benchmark (lits). Medical Image Analysis 84, 102680 (2023)
- [4] Bodner, A.D., Tepsich, A.S., Spolski, J.N., Pourteau, S.: Convolutional kolmogorov-arnold networks. arXiv preprint arXiv:2406.13155 (2024)
- [5] Bresson, R., Nikolentzos, G., Panagopoulos, G., Chatzianastasis, M., Pang, J., Vazirgiannis, M.: Kagnns: Kolmogorov-arnold networks meet graph learning. arXiv preprint arXiv:2406.18380 (2024)
- [6] Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-unet: Unet-like pure transformer for medical image segmentation. In: European conference on computer vision. pp. 205–218. Springer (2022)
- [7] Chen, J., Lu, Y., Yu, Q., Luo, X., Adeli, E., Wang, Y., Lu, L., Yuille, A.L., Zhou, Y.: Transunet: Transformers make strong encoders for medical image segmentation. arXiv preprint arXiv:2102.04306 (2021)
- [8] Chen, K., Qin, T., Lee, V.H.F., Yan, H., Li, H.: Learning robust shape regularization for generalizable medical image segmentation. IEEE Transactions on Medical Imaging (2024)
- [9] Cheng, Z., Wang, S., Xin, T., Zhou, T., Zhang, H., Shao, L.: Few-shot medical image segmentation via generating multiple representative descriptors. IEEE Transactions on Medical Imaging (2024)
- [10] Çiçek, Ö., Abdulkadir, A., Lienkamp, S.S., Brox, T., Ronneberger, O.: 3d u-net: learning dense volumetric segmentation from sparse annotation. In: Medical Image Computing and Computer-Assisted Intervention–MICCAI 2016: 19th International Conference, Athens, Greece, October 17-21, 2016, Proceedings, Part II 19. pp. 424–432. Springer (2016)
- [11] Codella, N.C., Gutman, D., Celebi, M.E., Helba, B., Marchetti, M.A., Dusza, S.W., Kalloo, A., Liopyris, K., Mishra, N., Kittler, H., et al.: Skin lesion analysis toward melanoma detection: A challenge at the 2017 international symposium on biomedical imaging (isbi), hosted by the international skin imaging collaboration (isic). In: 2018 IEEE 15th international symposium on biomedical imaging (ISBI 2018). pp. 168–172. IEEE (2018)
- [12] Demirhan, A., Törü, M., Güler, I.: Segmentation of tumor and edema along with healthy tissues of brain using wavelets and neural networks. IEEE journal of biomedical and health informatics 19(4), 1451–1458 (2014)
- [13] He, A., Wang, K., Li, T., Du, C., Xia, S., Fu, H.: H2former: An efficient hierarchical hybrid transformer for medical image segmentation. IEEE Transactions on Medical Imaging 42(9), 2763–2775 (2023)
- [14] He, K., Gan, C., Li, Z., Rekik, I., Yin, Z., Ji, W., Gao, Y., Wang, Q., Zhang, J., Shen, D.: Transformers in medical image analysis. Intelligent Medicine 3(1), 59–78 (2023)
- [15] Heidari, M., Kazerouni, A., Soltany, M., Azad, R., Aghdam, E.K., Cohen-Adad, J., Merhof, D.: Hiformer: Hierarchical multi-scale representations using transformers for medical image segmentation. In: Proceedings of the IEEE/CVF winter conference on applications of computer vision. pp. 6202–6212 (2023)
- [16] Hu, X., Xu, X., Shi, Y.: How to efficiently adapt large segmentation model (sam) to medical images. arXiv preprint arXiv:2306.13731 (2023)
- [17] Huang, H., Lin, L., Tong, R., Hu, H., Zhang, Q., Iwamoto, Y., Han, X., Chen, Y.W., Wu, J.: Unet 3+: A full-scale connected unet for medical image segmentation. In: ICASSP 2020-2020 IEEE international conference on acoustics, speech and signal processing (ICASSP). pp. 1055–1059. IEEE (2020)
- [18] Isensee, F., Jaeger, P.F., Kohl, S.A., Petersen, J., Maier-Hein, K.H.: nnu-net: a self-configuring method for deep learning-based biomedical image segmentation. Nature methods 18(2), 203–211 (2021)
- [19] Jha, D., Smedsrud, P.H., Riegler, M.A., Halvorsen, P., de Lange, T., Johansen, D., Johansen, H.D.: Kvasir-seg: A segmented polyp dataset. In: MultiMedia Modeling: 26th International Conference, MMM 2020, Daejeon, South Korea, January 5–8, 2020, Proceedings, Part II 26. pp. 451–462. Springer (2020)
- [20] Kiamari, M., Kiamari, M., Krishnamachari, B.: Gkan: Graph kolmogorov-arnold networks. arXiv preprint arXiv:2406.06470 (2024)
- [21] Kim, B.N., Dolz, J., Jodoin, P.M., Desrosiers, C.: Privacy-net: an adversarial approach for identity-obfuscated segmentation of medical images. IEEE Transactions on Medical Imaging 40(7), 1737–1749 (2021)
- [22] Li, C., Liu, X., Li, W., Wang, C., Liu, H., Yuan, Y.: U-kan makes strong backbone for medical image segmentation and generation. arXiv preprint arXiv:2406.02918 (2024)
- [23] Liu, H., Xu, Z., Gao, R., Li, H., Wang, J., Chabin, G., Oguz, I., Grbic, S.: Cosst: Multi-organ segmentation with partially labeled datasets using comprehensive supervisions and self-training. IEEE Transactions on Medical Imaging (2024)
- [24] Liu, Y., Zhou, J., Liu, L., Zhan, Z., Hu, Y., Fu, Y., Duan, H.: Fcp-net: A feature-compression-pyramid network guided by game-theoretic interactions for medical image segmentation. IEEE Transactions on Medical Imaging 41(6), 1482–1496 (2022)
- [25] Liu, Z., Wang, Y., Vaidya, S., Ruehle, F., Halverson, J., Soljačić, M., Hou, T.Y., Tegmark, M.: Kan: Kolmogorov-arnold networks. arXiv preprint arXiv:2404.19756 (2024)
- [26] Milletari, F., Navab, N., Ahmadi, S.A.: V-net: Fully convolutional neural networks for volumetric medical image segmentation. In: 2016 fourth international conference on 3D vision (3DV). pp. 565–571. Ieee (2016)
- [27] Oktay, O., Schlemper, J., Folgoc, L.L., Lee, M., Heinrich, M., Misawa, K., Mori, K., McDonagh, S., Hammerla, N.Y., Kainz, B., et al.: Attention u-net: Learning where to look for the pancreas. arXiv preprint arXiv:1804.03999 (2018)
- [28] Parvaiz, A., Khalid, M.A., Zafar, R., Ameer, H., Ali, M., Fraz, M.M.: Vision transformers in medical computer vision—a contemplative retrospection. Engineering Applications of Artificial Intelligence 122, 106126 (2023)
- [29] Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015: 18th International Conference, Munich, Germany, October 5-9, 2015, Proceedings, Part III 18. pp. 234–241. Springer (2015)
- [30] Shaker, A.M., Maaz, M., Rasheed, H., Khan, S., Yang, M.H., Khan, F.S.: Unetr++: delving into efficient and accurate 3d medical image segmentation. IEEE Transactions on Medical Imaging (2024)
- [31] Tian, Y., Liu, F., Pang, G., Chen, Y., Liu, Y., Verjans, J.W., Singh, R., Carneiro, G.: Self-supervised pseudo multi-class pre-training for unsupervised anomaly detection and segmentation in medical images. Medical image analysis 90, 102930 (2023)
- [32] Valanarasu, J.M.J., Oza, P., Hacihaliloglu, I., Patel, V.M.: Medical transformer: Gated axial-attention for medical image segmentation. In: Medical Image Computing and Computer Assisted Intervention–MICCAI 2021: 24th International Conference, Strasbourg, France, September 27–October 1, 2021, Proceedings, Part I 24. pp. 36–46. Springer (2021)
- [33] Wang, H., Cao, P., Wang, J., Zaiane, O.R.: Uctransnet: rethinking the skip connections in u-net from a channel-wise perspective with transformer. In: Proceedings of the AAAI conference on artificial intelligence. vol. 36, pp. 2441–2449 (2022)
- [34] Wu, H., Chen, S., Chen, G., Wang, W., Lei, B., Wen, Z.: Fat-net: Feature adaptive transformers for automated skin lesion segmentation. Medical image analysis 76, 102327 (2022)
- [35] Yang, W., Liu, Y., Lin, L., Yun, Z., Lu, Z., Feng, Q., Chen, W.: Lung field segmentation in chest radiographs from boundary maps by a structured edge detector. IEEE journal of biomedical and health informatics 22(3), 842–851 (2017)
- [36] Zhang, F., Zhang, X.: Graphkan: Enhancing feature extraction with graph kolmogorov arnold networks. arXiv preprint arXiv:2406.13597 (2024)
- [37] Zhang, Y., Liu, H., Hu, Q.: Transfuse: Fusing transformers and cnns for medical image segmentation. In: Medical Image Computing and Computer Assisted Intervention–MICCAI 2021: 24th International Conference, Strasbourg, France, September 27–October 1, 2021, Proceedings, Part I 24. pp. 14–24. Springer (2021)
- [38] Zhang, Y., Liu, H., Hu, Q.: Transfuse: Fusing transformers and cnns for medical image segmentation. In: Medical image computing and computer assisted intervention–MICCAI 2021: 24th international conference, Strasbourg, France, September 27–October 1, 2021, proceedings, Part I 24. pp. 14–24. Springer (2021)
- [39] Zhou, Z., Chen, Y., He, A., Que, X., Wang, K., Yao, R., Li, T.: Nkut: Dataset and benchmark for pediatric mandibular wisdom teeth segmentation. IEEE Journal of Biomedical and Health Informatics (2024)
- [40] Zhou, Z., Rahman Siddiquee, M.M., Tajbakhsh, N., Liang, J.: Unet++: A nested u-net architecture for medical image segmentation. In: Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support: 4th International Workshop, DLMIA 2018, and 8th International Workshop, ML-CDS 2018, Held in Conjunction with MICCAI 2018, Granada, Spain, September 20, 2018, Proceedings 4. pp. 3–11. Springer (2018)