Equal contributions
* Correspondence: [email protected]
3D U-KAN Implementation for Multi-modal MRI Brain Tumor Segmentation
Abstract
We explore the application of U-KAN, a U-Net based network enhanced with Kolmogorov-Arnold Network (KAN) layers, for 3D brain tumor segmentation using multi-modal MRI data. We adapt the original 2D U-KAN model to the 3D task, and introduce a variant called UKAN-SE, which incorporates Squeeze-and-Excitation modules for global attention. We compare the performance of U-KAN and UKAN-SE against existing methods such as U-Net, Attention U-Net, and Swin UNETR, using the BraTS 2024 dataset. Our results show that U-KAN and UKAN-SE, with approximately 10.6 million parameters, achieve exceptional efficiency, requiring only about 1/4 of the training time of U-Net and Attention U-Net, and 1/6 that of Swin UNETR, while surpassing these models across most evaluation metrics. Notably, UKAN-SE slightly outperforms U-KAN.
Keywords:
Brain Tumor Segmentation, BraTS Challenge, KAN, Multi-modal MRI, U-KAN
1 Introduction
Gliomas are a common type of malignant brain tumor and a leading cause of cancer-related death among adults. Diagnosing gliomas is particularly challenging due to their invasive nature and their ability to occur in any part of the brain [14, 19]. Accurate detection of these tumors requires multi-protocol magnetic resonance imaging (MRI), which remains the gold standard for post-treatment imaging of gliomas. MRI provides crucial information on tumor size, location, and morphological changes. The most commonly used MRI modalities include T1-weighted (T1), post-contrast T1-weighted (T1Gd), T2-weighted (T2), and T2 Fluid Attenuated Inversion Recovery (FLAIR) [20].
The Brain Tumor Segmentation (BraTS) Challenge, an annual event hosted by the International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI) [15, 4, 1], has been held since 2012. In the BraTS 2024 Challenge, we participated in Task 1: Segmentation–Adult Glioma Post Treatment [2, 3], which focuses on evaluating advanced techniques for segmenting brain diffuse gliomas. Participants are provided with a multi-modal MRI dataset to develop automated algorithms to accurately segment the tumors. This initiative is valuable for creating tools to objectively assess residual tumor volumes, which is crucial for treatment planning and predicting patient outcomes.
U-Net [18] has become the benchmark network for medical image segmentation, and U-Net based models have successfully demonstrated their efficacy in past BraTS challenges [16, 9, 8]. Originating from a straightforward design that features convolutional layers and optionally normalization layers, U-Net has been significantly enhanced through various modifications. For example, Attention U-Net [17] incorporates attention blocks to improve focus on relevant features, while Swin UNETR [6] integrates a Swin Transformer as the encoder and connects it to a CNN-based decoder at various resolutions through skip connections.
While previous U-Net based models have made progress in medical image segmentation, they face significant challenges with conventional kernels. These kernels often struggle to capture complex nonlinear patterns and rely on empirical designs that lack interpretability, limiting their clinical trustworthiness. Recently, Kolmogorov-Arnold Networks (KANs) [11] have emerged as a promising alternative, using learnable activation functions on edges instead of fixed ones on nodes. This approach enhances the network’s ability to model complex functions more accurately, improves interpretability, and more effectively links the network’s structure to its performance.
Given the potential of KANs, U-KAN [10] has been developed by integrating KAN layers into the U-Net architecture. This network design includes a tokenized KAN block that captures complex patterns more effectively. Empirical evaluations of U-KAN have shown its superior performance in medical segmentation tasks and its potential as an advanced alternative in various visual applications.
In this paper, we compare five segmentation models on the BraTS 2024 Challenge’s Task 1, including U-Net [18], Attention U-Net [17], Swin UNETR [6], U-KAN [10], and our UKAN-SE, a new U-KAN variant with Squeeze-and-Excitation modules [7] for global attention. We evaluate these models based on total parameter size, training time, and segmentation accuracy.
2 Methods
2.1 Dataset
The dataset for Task 1 of the 2024 BraTS Challenge [19] comprises 1,350 labeled training samples and 188 unlabeled validation samples. Each sample includes four modalities (T1, T1Gd, T2, and FLAIR), as illustrated in Figure 1. The data is derived from multi-institutional, routine post-treatment clinically-acquired multi-parametric MRI (mpMRI) scans of glioma patients. The MRI scans undergo preprocessing to align them to a standard anatomical template, resample them to an isotropic resolution of 1 mm³, and remove the skull. The final dimensions of the processed scans are 240 × 240 × 155 voxels.

The ground-truth segmentation labels for the validation set are hidden from the Challenge participating teams, who can only access the Dice scores and the 95% Hausdorff distance of their segmentation results via the online participation platform. The labels for the MRI scans include non-enhancing tumor core (NETC), surrounding non-enhancing FLAIR hyperintensity (SNFH), enhancing tissue (ET), and resection cavity (RC), as shown in Figure 1.
2.2 Data Augmentation
Data Processing. Our data augmentation pipeline for MRI images incorporates several transformations to enhance the diversity of the training data. We combine the four modalities into a single 4D image of shape $C \times H \times W \times D$, where $C = 4$, and save it, along with the segmentation labels, in the .h5 file format. Each image is stored in 32-bit floating-point format, while the labels are stored as 8-bit unsigned integers. The grayscale values of each image are normalized, ensuring a background value of zero. This process helps reduce the file size.
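For reference, the sketch below illustrates this preprocessing step with nibabel and h5py. The modality suffixes follow the recent BraTS file-naming convention, and the per-modality z-score normalization of foreground voxels is one possible choice rather than an exact description of our pipeline.

```python
import h5py
import nibabel as nib
import numpy as np

def preprocess_case(case_dir, case_id, out_path):
    """Stack the four MRI modalities into one 4D array and save it with the labels as .h5."""
    modalities = ["t1n", "t1c", "t2w", "t2f"]            # T1, T1Gd, T2, FLAIR (BraTS naming)
    volumes = []
    for m in modalities:
        img = nib.load(f"{case_dir}/{case_id}-{m}.nii.gz").get_fdata().astype(np.float32)
        mask = img > 0                                    # non-zero voxels are brain tissue
        if mask.any():                                    # normalize foreground, keep background at 0
            img[mask] = (img[mask] - img[mask].mean()) / (img[mask].std() + 1e-8)
        volumes.append(img)
    image = np.stack(volumes, axis=0)                     # shape (4, H, W, D), float32
    label = nib.load(f"{case_dir}/{case_id}-seg.nii.gz").get_fdata().astype(np.uint8)

    with h5py.File(out_path, "w") as f:                   # gzip compression further reduces file size
        f.create_dataset("image", data=image, compression="gzip")
        f.create_dataset("label", data=label, compression="gzip")
```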
Data Enhancing. The augmentation process begins with a fixed-size crop to ensure a consistent input size, followed by random flips along different axes to introduce spatial variability. Gaussian noise is added to simulate variations in image quality, and random rotations are applied to account for potential misalignments. Additionally, the contrast of the images is adjusted to vary the intensity distribution. Finally, the images and labels are converted into tensor format, preparing them for further processing in the neural network. This augmentation strategy enhances the model’s robustness.
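A minimal sketch of such an augmentation pipeline in NumPy/PyTorch is given below; the crop size, noise level, rotation range, and transform probabilities are illustrative placeholders, not the exact values used in our experiments.

```python
import numpy as np
import torch
from scipy.ndimage import rotate

def augment(image, label, crop=(128, 128, 128)):
    """image: (4, H, W, D) float32 array, label: (H, W, D) uint8 array."""
    # Fixed-size crop around the volume center for a consistent input size
    h, w, d = label.shape
    sh, sw, sd = [(s - c) // 2 for s, c in zip((h, w, d), crop)]
    image = image[:, sh:sh + crop[0], sw:sw + crop[1], sd:sd + crop[2]]
    label = label[sh:sh + crop[0], sw:sw + crop[1], sd:sd + crop[2]]

    # Random flips along each spatial axis
    for axis in range(3):
        if np.random.rand() < 0.5:
            image = np.flip(image, axis=axis + 1)
            label = np.flip(label, axis=axis)

    # Additive Gaussian noise to simulate variations in image quality
    if np.random.rand() < 0.2:
        image = image + np.random.normal(0.0, 0.1, size=image.shape).astype(np.float32)

    # Small random in-plane rotation (nearest-neighbour interpolation for labels)
    if np.random.rand() < 0.2:
        angle = np.random.uniform(-10, 10)
        image = rotate(image, angle, axes=(1, 2), reshape=False, order=1)
        label = rotate(label, angle, axes=(0, 1), reshape=False, order=0)

    # Random contrast adjustment via a gamma transform
    if np.random.rand() < 0.2:
        gamma = np.random.uniform(0.8, 1.2)
        image = np.sign(image) * np.abs(image) ** gamma

    # Convert to tensors for the network
    image = torch.from_numpy(np.ascontiguousarray(image, dtype=np.float32))
    label = torch.from_numpy(np.ascontiguousarray(label)).long()
    return image, label
```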
2.3 Network Architectures
Loss Function. The loss function is given in equation (1), which combines the soft Dice score and Cross Entropy (CE) loss, with uniform weighting applied across all labels in the CE loss. We experimented with different weights for the CE loss, but no improvement in training performance was observed. The loss is calculated on a batch-by-batch basis. Specifically,
$$\mathcal{L} \;=\; \lambda\,\mathcal{L}_{\mathrm{Dice}} \;+\; (1-\lambda)\,\mathcal{L}_{\mathrm{CE}}, \qquad (1)$$

where $\lambda$ is a weight factor, and we use 0.5. The softmax output excluding the background class is used as the input for the soft Dice loss, and the one-hot encoded target excluding the background class is used as the target.
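A minimal PyTorch sketch of this combined loss is shown below; the tensor shapes assume 3D inputs, and `eps` is a small constant added here for numerical stability.

```python
import torch
import torch.nn.functional as F

def dice_ce_loss(logits, target, lam=0.5, eps=1e-5):
    """logits: (B, C, H, W, D) raw network outputs; target: (B, H, W, D) integer labels.
    Combines soft Dice (background excluded) and cross entropy with weight lam."""
    num_classes = logits.shape[1]
    ce = F.cross_entropy(logits, target)                      # uniform weighting across labels

    probs = torch.softmax(logits, dim=1)[:, 1:]               # drop the background channel
    one_hot = F.one_hot(target, num_classes).permute(0, 4, 1, 2, 3).float()[:, 1:]

    dims = (0, 2, 3, 4)                                       # sum over batch and spatial dims
    intersection = (probs * one_hot).sum(dims)
    cardinality = probs.sum(dims) + one_hot.sum(dims)
    dice = (2.0 * intersection + eps) / (cardinality + eps)   # soft Dice per foreground class
    dice_loss = 1.0 - dice.mean()

    return lam * dice_loss + (1.0 - lam) * ce
```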
Previous U-Net based models. We consider three U-Net based models, including the classic U-Net [18], Attention U-Net (Att-Unet) [17], and Swin UNETR [6], which have demonstrated strong performance in previous challenges. The U-Net architecture captures features at multiple resolutions through its downscaling and upscaling layers (encoding and decoding), making it particularly suitable for tasks like brain tumor segmentation. Att-Unet incorporates attention gates, enhancing segmentation accuracy. Swin UNETR combines a U-shaped network design with a Swin Transformer encoder, utilizing multi-scale attention mechanisms to capture both global and local contextual information, making it effective for complex pattern recognition in medical images.
U-KAN. The U-KAN model [10] integrates KAN layers [11] into the traditional U-Net structure. In our experiment, we use the default U-KAN configuration and channel sizes (see Figure 2). The architecture employs a two-phase design: a Convolution Phase for initial feature extraction, followed by a Tokenized KAN (Tok-KAN) Phase where the KAN layers refine the feature representations. This design enables U-KAN to effectively capture complex non-linear patterns, making it particularly suitable for medical image segmentation tasks. The KAN layers are mathematically described as
$$\mathrm{KAN}(\mathbf{Z}) = \left(\boldsymbol{\Phi}_{L-1} \circ \boldsymbol{\Phi}_{L-2} \circ \cdots \circ \boldsymbol{\Phi}_{0}\right)\mathbf{Z},$$

where $\boldsymbol{\Phi}_{l}$ is the $l$-th KAN layer consisting of learnable activation functions, enhancing the model’s ability to capture intricate patterns in medical images.
Given the strong performance of U-KAN in 2D image segmentation, we hypothesize that it can also excel in 3D tasks like brain tumor segmentation. Thus, we adapt it to a 3D version by replacing the 2D convolutional and batch normalization blocks with their 3D counterparts, as shown in Figure 2. Notably, the KAN itself does not impose any dimensional constraints, as all input data are transformed into 1D channels, facilitating a straightforward transition to 3D.
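The sketch below illustrates this adaptation: a convolution-phase block whose 2D convolution and batch normalization are replaced by their 3D counterparts, and a simplified tokenized KAN block that flattens the 3D feature map into per-voxel 1D tokens before applying a KAN layer. It omits several components of the full U-KAN block (e.g., depth-wise convolutions and residual connections), and `kan_layer` stands in for any existing KAN layer implementation.

```python
import torch
import torch.nn as nn

class ConvBlock3D(nn.Module):
    """Convolution-phase block with 3D convolution and 3D batch normalization."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv3d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm3d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):                      # x: (B, C, H, W, D)
        return self.block(x)

class TokKANBlock3D(nn.Module):
    """Simplified Tok-KAN block: flatten 3D feature maps to per-voxel tokens,
    refine them with a KAN layer, and reshape back. `kan_layer` is any module
    mapping (N, C) -> (N, C), e.g. a KANLinear from an existing KAN library."""
    def __init__(self, channels, kan_layer):
        super().__init__()
        self.norm = nn.LayerNorm(channels)
        self.kan = kan_layer

    def forward(self, x):                      # x: (B, C, H, W, D)
        b, c, h, w, d = x.shape
        tokens = x.flatten(2).transpose(1, 2)  # (B, H*W*D, C): one token per voxel
        tokens = self.kan(self.norm(tokens).reshape(-1, c)).reshape(b, -1, c)
        return tokens.transpose(1, 2).reshape(b, c, h, w, d)
```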

2.4 Evaluation Metrics
Besides the Dice score and 95% Hausdorff distance (HD95) for the whole image segmentation, the lesion-wise Dice and lesion-wise HD95 are used in the BraTS 2024 Challenge [19] to evaluate the performance of segmentation models. The two lesion-wise metrics are defined as
$$\mathrm{Dice}_{\text{lesion-wise}} = \frac{\sum_{i=1}^{N} \mathrm{Dice}(l_i)}{\mathrm{TP} + \mathrm{FN} + \mathrm{FP}}, \qquad \mathrm{HD95}_{\text{lesion-wise}} = \frac{\sum_{i=1}^{N} \mathrm{HD95}(l_i)}{\mathrm{TP} + \mathrm{FN} + \mathrm{FP}},$$

where $N$ is the number of ground-truth lesions, $l_i$ denotes the $i$-th lesion, TP is the number of true positives, FN is the number of false negatives, and FP is the number of false positives.
The lesion-wise Dice measures the overlap between predicted and ground-truth segmentations, excluding true-negative voxels. The lesion-wise HD95 measures the 95th percentile of the distances between the boundaries of the predicted and ground-truth segmentations, emphasizing the largest segmentation errors while remaining robust to outliers. The lesion-wise metrics provide a detailed understanding of a model’s ability to accurately segment complex, multi-focal lesions, and help prevent bias toward models that excel only in detecting larger lesions.
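The snippet below sketches a simplified lesion-wise Dice computation based on connected components; the official BraTS evaluation additionally dilates the ground-truth lesions before matching and applies a fixed penalty to unmatched lesions for HD95, both of which are omitted here.

```python
import numpy as np
from scipy.ndimage import label

def lesion_wise_dice(pred, gt):
    """pred, gt: binary 3D masks for one tumor sub-region.
    Simplified lesion-wise Dice: per ground-truth-lesion Dice, averaged over
    ground-truth lesions plus false-positive predicted lesions."""
    gt_labels, n_gt = label(gt)                    # connected components = individual lesions
    pred_labels, n_pred = label(pred)

    dices, matched_pred = [], set()
    for i in range(1, n_gt + 1):
        lesion = gt_labels == i
        overlapping = set(np.unique(pred_labels[lesion])) - {0}
        matched_pred |= overlapping
        pred_lesion = np.isin(pred_labels, list(overlapping)) if overlapping else np.zeros_like(pred, dtype=bool)
        inter = np.logical_and(lesion, pred_lesion).sum()
        denom = lesion.sum() + pred_lesion.sum()
        dices.append(2.0 * inter / denom if denom > 0 else 0.0)

    n_fp = n_pred - len(matched_pred)              # predicted lesions touching no ground-truth lesion
    return sum(dices) / (n_gt + n_fp) if (n_gt + n_fp) > 0 else 1.0
```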
The Dice and HD95 metrics are computed for each of the six tumor sub-regions: non-enhancing tumor core (NETC), surrounding non-enhancing FLAIR hyperintensity (SNFH), enhancing tissue (ET), resection cavity (RC), tumor core (TC = ET + NETC), and whole tumor (WT = TC + SNFH).
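For illustration, the composite regions can be derived from a label map as follows; the integer label coding shown here is an assumption and should be checked against the BraTS 2024 data description.

```python
import numpy as np

# Hypothetical label coding (verify against the BraTS 2024 data description)
NETC, SNFH, ET, RC = 1, 2, 3, 4

def composite_masks(seg):
    """Derive the six evaluation regions from an integer label map."""
    et, netc = seg == ET, seg == NETC
    snfh, rc = seg == SNFH, seg == RC
    tc = et | netc          # tumor core = enhancing tissue + non-enhancing tumor core
    wt = tc | snfh          # whole tumor = tumor core + FLAIR hyperintensity
    return {"ET": et, "NETC": netc, "SNFH": snfh, "RC": rc, "TC": tc, "WT": wt}
```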
To compare the efficiency of the five segmentation models, we report the total number of parameters and the average training time per epoch. All models are implemented and trained on an NVIDIA RTX 8000 GPU (48 GB memory).
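The sketch below shows one way to obtain these figures in PyTorch; the training loop is schematic and the loss function follows the combined Dice/CE loss sketched earlier.

```python
import time
import torch

def count_parameters(model):
    """Total number of trainable parameters, in millions."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad) / 1e6

def timed_epoch(model, loader, optimizer, loss_fn, device="cuda"):
    """Run one training epoch and return its wall-clock duration in seconds."""
    model.train()
    start = time.time()
    for image, label in loader:
        image, label = image.to(device), label.to(device)
        optimizer.zero_grad()
        loss = loss_fn(model(image), label)
        loss.backward()
        optimizer.step()
    return time.time() - start
```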
2.5 Training Details
To find the optimal hyperparameters for U-KAN, we aim for the highest lesion-wise Dice and the smallest lesion-wise HD95. The tuned hyperparameters comprise the batch size, number of epochs, optimizer, weight decay, and learning rate. Because of the large size of the BraTS images, we set the batch size to 2. Model training is conducted for 100 epochs, beginning with a 10-epoch warm-up period. The AdamW optimizer [13] is used with a weight decay of 0.0001. The learning rate starts at 0.005, reaches a maximum value of 0.01 after the 10 warm-up epochs, and then gradually decreases back to 0.005 through cosine annealing [12]. For consistency, the same hyperparameters are applied to the other four models in our study.
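This schedule can be expressed, for example, as a LambdaLR multiplier on a base learning rate of 0.01, as sketched below; this is one possible implementation rather than a description of our exact code. The scheduler is stepped once per epoch.

```python
import math
import torch

def build_optimizer_and_scheduler(model, epochs=100, warmup=10,
                                  base_lr=0.01, min_lr=0.005, weight_decay=1e-4):
    optimizer = torch.optim.AdamW(model.parameters(), lr=base_lr, weight_decay=weight_decay)

    def lr_lambda(epoch):
        if epoch < warmup:
            # Linear warm-up: 0.005 -> 0.01 over the first 10 epochs
            return (min_lr + (base_lr - min_lr) * epoch / warmup) / base_lr
        # Cosine annealing: 0.01 -> 0.005 over the remaining epochs
        progress = (epoch - warmup) / max(1, epochs - warmup)
        return (min_lr + (base_lr - min_lr) * 0.5 * (1 + math.cos(math.pi * progress))) / base_lr

    scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)
    return optimizer, scheduler
```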
3 Results

The lesion-wise Dice and whole-image Dice results in Table 1 indicate that UKAN-SE achieves the best performance across several lesion types, particularly excelling in ET and RC. It also shows consistently high scores in WT segmentation. Att-Unet performs well in some areas, notably SNFH and WT, but does not surpass UKAN-SE in overall performance. U-KAN also demonstrates strong performance, with competitive scores across multiple categories.
In Table 2, which presents lesion-wise HD95 and whole-image HD95 metrics, the UKAN-SE method continues to lead, showcasing lower values in most lesion types, indicative of better boundary delineation and contour accuracy. This method is particularly strong in ET, RC, and WT segmentation. The U-KAN method also shows commendable performance, particularly in RC and SNFH.
Table 3 shows the parameter size and training time for each of the five models. Notably, while U-KAN and U-Net have comparable parameter sizes, with 10.612 million and 10.743 million parameters, respectively, U-KAN achieves significantly faster training time, averaging only 803 seconds per epoch compared to U-Net’s 3322 seconds per epoch. Although UKAN-SE has slightly more parameters and requires slightly more training time than U-KAN, it generally performs better in segmentation accuracy due to its global attention mechanism. An example of segmentation results is presented in Figure 3.
Table 1: Lesion-wise Dice and whole-image Dice for each tumor sub-region (higher is better). The first six value columns report lesion-wise Dice, the last six whole-image Dice.

Method | ET | NETC | RC | SNFH | TC | WT | ET | NETC | RC | SNFH | TC | WT |
---|---|---|---|---|---|---|---|---|---|---|---|---|
U-Net | 0.587 | 0.652 | 0.580 | 0.763 | 0.563 | 0.774 | 0.611 | 0.595 | 0.573 | 0.888 | 0.598 | 0.896 |
Att-Unet | 0.596 | 0.653 | 0.604 | 0.819 | 0.578 | 0.829 | 0.626 | 0.592 | 0.594 | 0.889 | 0.617 | 0.899 |
Swin UNETR | 0.582 | 0.644 | 0.493 | 0.716 | 0.555 | 0.706 | 0.630 | 0.578 | 0.540 | 0.870 | 0.620 | 0.881 |
U-KAN | 0.600 | 0.667 | 0.566 | 0.784 | 0.582 | 0.793 | 0.636 | 0.603 | 0.568 | 0.871 | 0.626 | 0.888 |
UKAN-SE | 0.633 | 0.637 | 0.621 | 0.796 | 0.597 | 0.804 | 0.655 | 0.578 | 0.614 | 0.879 | 0.645 | 0.893 |
Table 2: Lesion-wise HD95 and whole-image HD95 for each tumor sub-region (lower is better). The first six value columns report lesion-wise HD95, the last six whole-image HD95.

Method | ET | NETC | RC | SNFH | TC | WT | ET | NETC | RC | SNFH | TC | WT |
---|---|---|---|---|---|---|---|---|---|---|---|---|
U-Net | 75.840 | 76.566 | 80.149 | 52.424 | 81.437 | 49.442 | 55.328 | 73.672 | 68.210 | 8.088 | 57.092 | 8.071 |
Att-Unet | 69.097 | 79.548 | 74.985 | 31.403 | 74.394 | 30.699 | 47.350 | 81.337 | 64.280 | 7.169 | 47.852 | 6.988 |
Swin UNETR | 72.498 | 75.746 | 110.619 | 64.128 | 80.542 | 71.014 | 42.113 | 81.871 | 71.318 | 8.757 | 42.939 | 9.367 |
U-KAN | 67.745 | 69.447 | 75.882 | 39.983 | 73.733 | 40.216 | 39.548 | 73.334 | 62.171 | 6.712 | 46.866 | 6.779 |
UKAN-SE | 59.371 | 79.972 | 66.061 | 38.095 | 70.676 | 38.434 | 36.611 | 79.492 | 58.007 | 8.280 | 37.137 | 8.251 |
Table 3: Total number of parameters and average training time per epoch for each model.

Method | # parameters | Average time per epoch |
---|---|---|
U-Net | 10.743M | 3322s |
Att-Unet | 6.438M | 3222s |
Swin UNETR | 62.365M | 4526s |
U-KAN | 10.612M | 803s |
UKAN-SE | 10.628M | 813s |
4 Discussion
This study evaluated the performance of the latest KAN-enhanced U-Net model, U-KAN, on 3D multi-modal MRI brain tumor segmentation using the BraTS 2024 dataset. The results indicate that the U-KAN structure achieves strong performance and exceptional efficiency in brain tumor segmentation. Compared to traditional models such as U-Net and Swin UNETR, the 3D U-KAN model delivers comparable or better performance at only about 1/6 to 1/4 of the training cost, highlighting the significant potential of the KAN structure.
In this study, we implemented only simple data augmentation techniques and did not utilize ensemble learning, unlike the more complex approaches described by the past challenge’s winner [5]. A medium-sized KAN layer configuration was used; future research may explore varying these dimensions to assess their impact. In addition, the KAN structure’s inherent advantages in interpretability [10, 11] suggest promising avenues for further investigation in image segmentation tasks, potentially enhancing the understanding of model predictions.
5 Acknowledgements
The data used in this publication were obtained as part of the Challenge project through Synapse ID (syn51156910). This work was partially supported by the NYU IT High Performance Computing resources, services, and staff expertise.
References
- [1] Baid, U., Ghodasara, S., Mohan, S., Bilello, M., Calabrese, E., Colak, E., Farahani, K., Kalpathy-Cramer, J., Kitamura, F.C., Pati, S., et al.: The rsna-asnr-miccai brats 2021 benchmark on brain tumor segmentation and radiogenomic classification. arXiv preprint arXiv:2107.02314 (2021)
- [2] Bakas, S., Akbari, H., Sotiras, A., et al.: Segmentation labels for the pre-operative scans of the tcga-gbm collection. The cancer imaging archive (2017)
- [3] Bakas, S., Akbari, H., Sotiras, A., Bilello, M., Rozycki, M., Kirby, J., Freymann, J., Farahani, K., Davatzikos, C.: Segmentation labels and radiomic features for the pre-operative scans of the tcga-lgg collection. The cancer imaging archive 286 (2017)
- [4] Bakas, S., Akbari, H., Sotiras, A., Bilello, M., Rozycki, M., Kirby, J.S., Freymann, J.B., Farahani, K., Davatzikos, C.: Advancing The Cancer Genome Atlas glioma MRI collections with expert segmentation labels and radiomic features. Scientific Data 4(1), 170117 (2017)
- [5] Ferreira, A., Solak, N., Li, J., Dammann, P., Kleesiek, J., Alves, V., Egger, J.: How we won brats 2023 adult glioma challenge? just faking it! enhanced synthetic data augmentation and model ensemble for brain tumour segmentation. arXiv preprint arXiv:2402.17317 (2024)
- [6] Hatamizadeh, A., Nath, V., Tang, Y., Yang, D., Roth, H.R., Xu, D.: Swin unetr: Swin transformers for semantic segmentation of brain tumors in mri images. In: International MICCAI brainlesion workshop. pp. 272–284. Springer (2021)
- [7] Hu, J., Shen, L., Sun, G.: Squeeze-and-excitation networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp. 7132–7141 (2018)
- [8] Isensee, F., Jaeger, P.F., Kohl, S.A., Petersen, J., Maier-Hein, K.H.: nnu-net: a self-configuring method for deep learning-based biomedical image segmentation. Nature methods 18(2), 203–211 (2021)
- [9] Jiang, Z., Ding, C., Liu, M., Tao, D.: Two-stage cascaded u-net: 1st place solution to brats challenge 2019 segmentation task. In: Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries: 5th International Workshop, BrainLes 2019, Held in Conjunction with MICCAI 2019, Shenzhen, China, October 17, 2019, Revised Selected Papers, Part I 5. pp. 231–241. Springer (2020)
- [10] Li, C., Liu, X., Li, W., Wang, C., Liu, H., Yuan, Y.: U-kan makes strong backbone for medical image segmentation and generation. arXiv preprint arXiv:2406.02918 (2024)
- [11] Liu, Z., Wang, Y., Vaidya, S., Ruehle, F., Halverson, J., Soljačić, M., Hou, T.Y., Tegmark, M.: Kan: Kolmogorov-arnold networks. arXiv preprint arXiv:2404.19756 (2024)
- [12] Loshchilov, I., Hutter, F.: Sgdr: Stochastic gradient descent with warm restarts. arXiv preprint arXiv:1608.03983 (2016)
- [13] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. arXiv preprint arXiv:1711.05101 (2017)
- [14] Louis, D.N., Wesseling, P., Aldape, K., Brat, D.J., Capper, D., Cree, I.A., Eberhart, C., Figarella-Branger, D., Fouladi, M., Fuller, G.N., et al.: cimpact-now update 6: new entity and diagnostic principle recommendations of the cimpact-utrecht meeting on future cns tumor classification and grading (2020)
- [15] Menze, B.H., Jakab, A., Bauer, S., Kalpathy-Cramer, J., Farahani, K., Kirby, J., Burren, Y., Porz, N., Slotboom, J., Wiest, R., et al.: The multimodal brain tumor image segmentation benchmark (brats). IEEE transactions on medical imaging 34(10), 1993–2024 (2014)
- [16] Myronenko, A.: 3d mri brain tumor segmentation using autoencoder regularization. In: Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries: 4th International Workshop, BrainLes 2018, Held in Conjunction with MICCAI 2018, Granada, Spain, September 16, 2018, Revised Selected Papers, Part II 4. pp. 311–320. Springer (2019)
- [17] Oktay, O., Schlemper, J., Folgoc, L.L., Lee, M., Heinrich, M., Misawa, K., Mori, K., McDonagh, S., Hammerla, N.Y., Kainz, B., et al.: Attention u-net: Learning where to look for the pancreas. arXiv preprint arXiv:1804.03999 (2018)
- [18] Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Medical image computing and computer-assisted intervention–MICCAI 2015: 18th international conference, Munich, Germany, October 5-9, 2015, proceedings, part III 18. pp. 234–241. Springer (2015)
- [19] de Verdier, M.C., Saluja, R., Gagnon, L., LaBella, D., Baid, U., Tahon, N.H., Foltyn-Dumitru, M., Zhang, J., Alafif, M., Baig, S., et al.: The 2024 brain tumor segmentation (brats) challenge: Glioma segmentation on post-treatment mri. arXiv preprint arXiv:2405.18368 (2024)
- [20] Visser, M., Müller, D., Van Duijn, R., Smits, M., Verburg, N., Hendriks, E., Nabuurs, R., Bot, J., Eijgelaar, R., Witte, M., et al.: Inter-rater agreement in glioma segmentations on longitudinal mri. NeuroImage: Clinical 22, 101727 (2019)