Robust Split Federated Learning for U-shaped Medical Image Networks

Ziyuan Yang, Yingyu Chen, Huijie Huangfu, Maosong Ran, Hui Wang, Xiaoxiao Li, and Yi Zhang

Z. Yang, Y. Chen, H. Huangfu, M. Ran and H. Wang are with the College of Computer Science, Sichuan University, Chengdu, 610065, China. E-mail: [email protected]; [email protected]; [email protected]; [email protected]; [email protected]

X. Li is with the Department of Electrical and Computer Engineering, The University of British Columbia, Vancouver, BC, V6T1Z4, Canada. E-mail: [email protected]

Y. Zhang is with the School of Cyber Science and Engineering, Sichuan University, Chengdu 610065, China. E-mail: [email protected]. Y. Zhang is the corresponding author.

Abstract

U-shaped networks are widely used in medical image tasks such as segmentation, restoration and reconstruction. Most of them rely on centralized learning and thus ignore privacy issues. To address privacy concerns, federated learning (FL) and split learning (SL) have attracted increasing attention. However, neither FL nor SL alone can balance local computational cost, model privacy and parallel training. To achieve this goal, we propose Robust Split Federated Learning (RoS-FL) for U-shaped medical image networks, a novel hybrid learning paradigm of FL and SL. To preserve the privacy of the input, model parameters, label and output simultaneously, we split the network into three parts hosted by different parties. Besides, distributed learning methods usually suffer from a drift between local and global models caused by data heterogeneity. To address this, we propose a Dynamic Weight Correction Strategy (DWCS) to stabilize the training process and avoid model drift. The effectiveness of the proposed RoS-FL is supported by extensive experimental results on different tasks. Related code will be released at https://github.com/Zi-YuanYang/RoS-FL.

Index Terms:

Split Federated learning, privacy, U-shaped medical image network, dynamic weight correction.

1 Introduction

In the past decade, U-shaped medical image networks have achieved great success in various medical image tasks, such as segmentation [1], restoration [2] and reconstruction [3]. These methods require large amounts of data to train a network model. However, it is difficult to collect a large amount of data in a single institution in realistic scenarios. Meanwhile, gathering data from multiple sources is hindered by data protection regulations, especially those concerning privacy. To address this problem, distributed collaborative machine learning (DCML) has attracted increasing attention [4, 5, 6]. It enables training a model in a decentralized manner without accessing the raw data of any single institution [7].

As a representative DCML paradigm, Federated Learning (FL) [8, 9] facilitates parallel training, and its training time overhead is satisfactory. However, FL faces two challenges: (1) Client computational resources: each client needs to train a full model, which demands high computational resources. Hence, FL may not be suitable for computational resource-constrained environments. (2) Model privacy: all participants, including clients and the server, have full access to the whole model, which increases the potential privacy and security risks [10].

Recently, Split Learning (SL) was proposed to remedy FL’s shortcomings by splitting the full model into server-side and client-side models [11]. The splitting operation has two advantages. First, clients only need to train a part of the model, and the most resource-consuming part is trained on the server. Second, each party, including clients and server, only partially accesses the full model. Hence, SL provides better model privacy and requires fewer client-side computational resources than FL. Despite the above merits, the main concern of SL is the training cost. The training process of SL is sequential, allowing only one client to engage with the server at a time [12].

Apart from the above issues, data heterogeneity is an open challenge for all DCML paradigms [13]. This problem leads to a large gap between the optimization directions of local and global models. For clients, local optimization attempts to find an optimal model for local data with no regard for global performance. Global optimization, in contrast, seeks a globally optimal solution and tolerates a certain degree of sub-optimal local performance. These different optimization goals cause a drift between local and global models.

To relieve the above problems in a unified framework, we propose Robust Split Federated Learning (RoS-FL) for U-shaped medical image networks. RoS-FL is a hybrid learning paradigm that combines SL and FL and attempts to achieve the best of both worlds. We notice that, unlike in classification tasks, the labels and outputs in segmentation, restoration and reconstruction tasks usually contain private information. To protect data privacy and reduce the computational cost on the client side, we split the U-shaped network into three parts: the head, body, and tail models. The proposed framework involves three parties: the computation server, the aggregation server, and the clients. The head and tail models are hosted in the clients, and the body model, which requires large computational resources, is hosted in the computation server. To enable parallel training, each client corresponds to one body model in the computation server. All clients perform the forward propagation of their head models in parallel and upload the outputs to the computation server. The body models are executed separately in parallel and transmit their outputs to the clients as the inputs of the tail models. The loss computation and back-propagation are performed independently in each client, and the gradients of the intermediate results are then transferred to update the body and head models. After one communication round, the client-side and server-side models are aggregated in the aggregation and computation servers, respectively. The input, label and output are never transferred to other parties, and each party only partially accesses the full model throughout the learning process. As a result, our method can effectively protect data privacy.

In addition, to relieve the data heterogeneity problem, we propose a dynamic weight correction strategy (DWCS) to mitigate model drift and correct the global model. Specifically, a weight correction loss is designed to quantify the drift between models from two adjacent communication rounds, and the correction model is obtained by minimizing this loss. Then, we treat the weighted sum of the correction model and the current-round model as the final corrected model. In early training, a large weight is assigned to the current-round model, since the model is still under-fitting. As training continues, local training in the clients may lead to model drift, so a large weight for the correction model is necessary to recover from this dilemma and approach the globally optimal solution. In summary, the weights of the two models are dynamically tuned to stabilize the training process and avoid model drift, yielding a robust model. The main contributions of this paper are summarized as:

• To simultaneously protect the privacy of input, model parameters, label and output, we propose a novel distributed learning framework RoS-FL for U-shaped networks, which can further facilitate parallel training and reduce local computation cost.
• To relieve the problems of data heterogeneity and model drift, we propose DWCS to dynamically correct the global model.
• Extensive experiments are conducted on medical image segmentation and restoration tasks, and our method demonstrates superior performance to competing methods in both tasks.

2 Related Works

2.1 U-shaped Medical Image Networks

Since U-Net was proposed [14], most segmentation methods have adopted a U-shaped architecture as the backbone and achieved encouraging performance [15, 16, 17]. For example, Fakhry et al. [18] combined the residual connection with U-Net [14]. Similarly, some researchers used DenseNet blocks to replace regular convolutional layers [19, 20, 21]. Oktay et al. [22] proposed a novel attention gate segmentation model, dubbed Attention U-Net. Isensee et al. [23] proposed nnUNet, which trains the vanilla U-Net with multiple preprocessing steps and surpasses most existing approaches. The success of these methods demonstrates the effectiveness of the U-shaped structure in medical segmentation [24].

Besides, U-shaped architecture is also widely used in medical restoration and reconstruction tasks [25]. For example, Chen et al. [26] introduced the residual block into the U-shaped autoencoder for low-dose CT restoration, which is dubbed residual encoder-decoder convolutional neural network (RED-CNN). Wang et al. [27] proposed generative adversarial networks with dual-domain U-Net-based discriminators for low-dose CT restoration. Wang et al. [2] combined the transformer block and U-Net, and achieved impressive performance. On the other hand, many recent proposed works chose U-shaped networks as the benchmark for medical image reconstruction [28, 29, 30]. Although the above segmentation and imaging methods achieved competitive performance, they need to collect numerous samples from multiple different data sources and usually ignore data privacy.

2.2 Distributed Collaborative Machine Learning

FL is one of the most popular DCML paradigms. It trains a full network at each client in parallel, and the local gradients are then transferred to the server for aggregation [31, 32, 33]. Typically, McMahan et al. [34] proposed FedAvg, which learns the global model by aggregating local models. Furthermore, Li et al. [35] proposed FedProx, which can be considered a re-parametrization of FedAvg. Li et al. [36] presented a personalized federated learning method, FedBN, which alleviates the feature shift using personalized batch normalization (BN) in clients. In [37], Li et al. compared the local representation and the global representation to correct local updates. However, these methods require high client-side computation costs, and both the client and the server access the full model, which risks privacy leakage.

Recently, SL was proposed to alleviate the above problems by splitting a full model into multiple parts [11], so that each client only needs to train a part of the model. SL seems to be a better choice than FL in computational resource-constrained environments [38]. In [39], a graph neural network is combined with SL to protect model privacy. Jeon et al. [40] proposed parallel SL (PSL) to reduce the training time overhead. In this method, each client keeps its local model and does not upload it to other clients.

Recently, researchers have been enthusiastic about introducing DCML to the healthcare field to protect data privacy. Feng et al. [41] proposed a personalized magnetic resonance imaging method, FedMRI, which consists of a globally shared encoder and client-specific decoders. Yang et al. [42] leveraged the prior information of scanning parameters to modulate different local models for CT imaging. In [43], federated domain generalization (FedDG) utilized the frequency information from different clients to handle data heterogeneity. Park et al. [44] proposed a multi-task federated learning method for COVID-19 segmentation, detection and classification. Poirot et al. [45] introduced SL for disease classification. Roth et al. [46] combined split learning and U-Net, but labels and inputs are hosted in different parties, which violates the privacy setting.

However, FL-based and SL-based methods still face several challenges [47]. SL has the advantage of model privacy protection, but it cannot achieve parallel training. FL can parallelize training, but it requires large client-side computational resources and provides no model privacy. To enjoy the best of both worlds, Thapa et al. [48] proposed split federated learning (SFL), a hybrid of FL and SL, which achieved satisfactory training overhead and prediction accuracy simultaneously. Zhang et al. [49] evaluated the performance of split federated learning in several medical tasks. Since SFL transfers outputs or labels between different parties, which violates the privacy setting, it is unsuitable for the widely used U-shaped networks for medical image segmentation and reconstruction. Another limitation is that it suffers from the data heterogeneity problem.

3 Proposed Method

3.1 Problem Formulation

Suppose that there are $N$ clients, denoted as $C_{1}$ , …, $C_{N}$ . Each client $C_{i}$ has a local specific dataset $\mathcal{D}_{i}$ . The goal of FL is to learn a full model $\mathcal{M}$ from $\mathcal{D}=\bigcup_{i=1}^{N}\mathcal{D}_{i}$ . Then, the optimization of FL can be formulated as follows:

\underset{\theta}{\arg\min}\mathcal{L}=\sum_{i=1}^{N}\frac{\left|\mathcal{D}_{i}\right|}{|\mathcal{D}|}\mathbb{E}_{(x,y)\sim\mathcal{D}_{i}}\left[\mathcal{L}_{i}(\mathcal{M}(x,\theta),y)\right],

(1)

where $\mathcal{L}_{i}$ denotes the loss function for $C_{i}$ , and $\mathcal{L}$ means the overall loss function across all clients. $x$ and $y$ represent the input and label, respectively. $|\mathcal{D}_{i}|$ and $|\mathcal{D}|$ are the number of samples in $\mathcal{D}_{i}$ and $\mathcal{D}$ . $\theta$ is the parameter set of $\mathcal{M}$ .

Different from FL, whose optimization target is the full network, the optimization target of SL is composed of multiple partitions. Assume $\mathcal{M}$ is split into two parts, the client-side model $\mathcal{M}_{c}$ and the server-side model $\mathcal{M}_{s}$ . The optimization of SL is formulated as:

\underset{\theta_{s},\theta_{c}}{\arg\min}\mathcal{L}=\sum_{i=1}^{N}\mathbb{E}_{(x,y)\sim\mathcal{D}_{i}}\left[\mathcal{L}_{i}(\mathcal{M}_{s}(\mathcal{M}_{c}(x,\theta_{c}),\theta_{s}),y)\right],

(2)

where $\theta_{c}$ and $\theta_{s}$ are the parameter sets of $\mathcal{M}_{c}$ and $\mathcal{M}_{s}$ , respectively.

Figure 1: The learning process of FL and SL. The numbers represent the order of processing, and different lines denote different kinds of dataflow.

3.2 Architecture of RoS-FL

The learning processes of FL and SL are illustrated in Fig. 1. It can be observed that the optimization objectives of these two methods are different. FL expects an optimal full network, but SL tries to optimize the different partition networks. Besides, SL can effectively protect the model privacy, and its local computational costs are low. However, the learning process of SL is sequential, leading to serious training time overhead and under-utilization of client computational resources. Compared with SL, FL can be implemented in a parallel way, but its clients need to train the full models locally, which requires high client-side computational resources.

Actually, SL and FL are complementary: the drawbacks of SL can be fixed by FL and vice versa. The split operation in SL reduces the client-side computational cost and protects model privacy. The aggregation operation in FL takes full advantage of local computational resources and significantly lowers the training time overhead. To enjoy the merits of both, we propose a hybrid learning paradigm for U-shaped medical image networks. A similar idea was also proposed in [10], but its splitting method is not suitable for segmentation, restoration, and reconstruction, for two main reasons:

1) Violating privacy setting: This method transfers the label or output to other parties and these data contain patients’ privacy information.

2) Extra bandwidth: SFL requires extra bandwidth to transfer the intermediate features of shortcut connections between different parties. However, shortcut connections are a key component of U-shaped medical image networks, and it is inappropriate to remove them to improve communication efficiency.

The above problems motivate us to propose a split method that does not share the input, model parameters, output or label between parties. As shown in Fig. 2(a), the full network is split into three parts: the head, body, and tail networks. The lightweight head and tail networks are hosted in clients to reduce the local computational costs, and the computationally demanding body network is hosted in the server with high-performance computational resources. The forward process can then be formulated as:

\mathcal{F}(x)=\mathcal{M}_{t}(\mathcal{M}_{b}(\mathcal{M}_{h}(x,\theta_{h}),\theta_{b}),\theta_{t}),

(3)

where $\mathcal{M}_{h}$ , $\mathcal{M}_{b}$ and $\mathcal{M}_{t}$ denote the head, body and tail networks with parameter sets $\theta_{h}$ , $\theta_{b}$ and $\theta_{t}$ , respectively.

As mentioned above, the output of each encoder layer is the input of the corresponding decoder layer. If the connected encoder and decoder layers were hosted in different parties, the outputs of the encoder layers would have to be uploaded to the other party, which would increase the communication cost. In our method, the corresponding encoder and decoder layers are hosted in the same party, so no extra communication bandwidth is needed for feature transfer. Notably, the input, label, and output are kept locally and never shared in our method, so data privacy is well protected.
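The split described above can be illustrated with a toy one-dimensional "U-shaped" network. This is only a sketch under stated assumptions: the scalar weights, the `down`/`up` helpers and the class names `Client` and `ComputationServer` are hypothetical and stand in for real convolutional blocks. It shows that the outer shortcut feature stays in the client, the inner shortcut stays in the server, and only $\hat{y}_{h}$ and $\hat{y}_{b}$ cross the boundary.

```python
from typing import List

Vec = List[float]


def down(x: Vec) -> Vec:
    """Toy pooling: average neighbouring pairs, halving the length."""
    return [(x[i] + x[i + 1]) / 2 for i in range(0, len(x) - 1, 2)]


def up(x: Vec) -> Vec:
    """Toy upsampling: repeat every value twice, doubling the length."""
    return [v for v in x for _ in range(2)]


def add(a: Vec, b: Vec) -> Vec:
    """Element-wise sum, standing in for a shortcut (skip) connection."""
    return [p + q for p, q in zip(a, b)]


class Client:
    """Hosts the head (outermost encoder) and tail (outermost decoder)."""

    def __init__(self, w_head: float, w_tail: float) -> None:
        self.w_head, self.w_tail = w_head, w_tail
        self.skip: Vec = []  # outer shortcut feature; never leaves the client

    def head(self, x: Vec) -> Vec:
        feat = [self.w_head * v for v in x]
        self.skip = feat
        return down(feat)  # y_h: the only feature uploaded to the server

    def tail(self, y_b: Vec) -> Vec:
        merged = add(up(y_b), self.skip)
        return [self.w_tail * v for v in merged]  # prediction stays local


class ComputationServer:
    """Hosts the body (inner encoder, bottleneck and inner decoder)."""

    def __init__(self, w_enc: float, w_dec: float) -> None:
        self.w_enc, self.w_dec = w_enc, w_dec

    def body(self, y_h: Vec) -> Vec:
        enc = [self.w_enc * v for v in y_h]
        bottleneck = down(enc)
        dec = add(up(bottleneck), enc)  # inner shortcut stays on the server
        return [self.w_dec * v for v in dec]  # y_b: sent back to the client


client = Client(w_head=1.0, w_tail=0.5)
server = ComputationServer(w_enc=2.0, w_dec=1.0)

x = [1.0, 3.0, 5.0, 7.0]   # private input, kept by the client
y_h = client.head(x)       # client -> server
y_b = server.body(y_h)     # server -> client
y_hat = client.tail(y_b)   # private output, kept by the client
```

In this sketch two features are exchanged per forward pass regardless of how many shortcut connections the network has, because each shortcut connects layers held by the same party.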

3.3 RoS-FL Implementation

Based on the proposed splitting strategy, we propose a collaborative distributed training framework, RoS-FL, to achieve parallel training and protect data privacy simultaneously. The flowchart of RoS-FL is illustrated in Fig. 2(b). As mentioned above, the clients and server can access the full model in FL, which is against the principle of model privacy protection. To relieve this problem, we assume that three parties participate in training: the clients, the computation server, and the aggregation server. Each of them is authorized to access only part of the model. The clients and the aggregation server can access the head and tail models, and the computation server accesses the body model. Besides, to implement parallel training, $N$ body models are built in the computation server, one for each of the $N$ clients, and the body models of all clients are executed separately to reduce the training time overhead.

At the beginning of training, client-side and server-side models are initialized in the aggregation and computation servers, respectively. All clients perform forward propagation of head models locally: $\hat{y}_{h}=\mathcal{M}_{h}(x,\theta_{h})$ , and then deliver the encoded results to the computation server. Benefiting from the setting of multiple body models, the forward propagation of body models can be executed in parallel: $\hat{y}_{b}=\mathcal{M}_{b}(\hat{y}_{h},\theta_{b})$ . At the end of the forward path, $\hat{y}_{b}$ is delivered to the clients to generate the final prediction of tail networks as $\hat{y}=\mathcal{M}_{t}(\hat{y}_{b},\theta_{t})$ .

Function Main: $\triangleright$ Computation Server Executes
    Initialize $\theta_{b}^{0}$
    for round $k=1,2,...,R$ do
        AggSer($k-1$)
        for client $n=1,2,...,N$ in parallel do
            $\theta_{b,n}^{k}\leftarrow\theta_{b}^{k-1}$
        end
        for epoch $i=1,2,...,E$ do
            for client $n=1,2,...,N$ in parallel do
                $\hat{y}_{h}\leftarrow$ HeadForward($n,k$)
                $\hat{y}_{b}\leftarrow\mathcal{M}_{b}(\hat{y}_{h},\theta_{b,n}^{k})$
                $\frac{\partial\mathcal{L}_{n}}{\partial\hat{y}_{b}}\leftarrow$ TailMain($n,\hat{y}_{b}$) & Backprop
                HeadBack($n,\frac{\partial\mathcal{L}_{n}}{\partial\hat{y}_{b}}$)
                $\theta_{b,n}^{k}\leftarrow\theta_{b,n}^{k}-\eta\frac{\partial\mathcal{L}_{n}}{\partial\theta_{b,n}^{k}}$
            end
        end
        $\theta_{b}^{k}\leftarrow\sum_{n=1}^{N}\frac{\left|\mathcal{D}_{n}\right|}{|\mathcal{D}|}\theta_{b,n}^{k}$
        $\theta_{b}^{k}\leftarrow$ Correct($\theta_{b}^{k},\theta_{b}^{k-1}$)
    end

Function HeadForward($n,k$): $\triangleright$ Client $C_{n}$ Executes
    $x_{n}\leftarrow$ Sampled input batch from $\mathcal{D}_{n}$
    return $\mathcal{M}_{h}(x_{n},\theta_{h,n}^{k})$

Function TailMain($n,\hat{y}_{b}$): $\triangleright$ Client $C_{n}$ Executes
    $y_{n}\leftarrow$ Sampled label batch from $\mathcal{D}_{n}$
    $\hat{y}_{t}\leftarrow\mathcal{M}_{t}(\hat{y}_{b},\theta_{t,n}^{k})$
    $\mathcal{L}_{n}\leftarrow$ TaskLoss($\hat{y}_{t},y_{n}$)
    Backprop & $\theta_{t,n}^{k}\leftarrow\theta_{t,n}^{k}-\eta\frac{\partial\mathcal{L}_{n}}{\partial\theta_{t,n}^{k}}$
    return $\frac{\partial\mathcal{L}_{n}}{\partial\hat{y}_{b}}$

Function HeadBack($n,\frac{\partial\mathcal{L}_{n}}{\partial\hat{y}_{b}}$): $\triangleright$ Client $C_{n}$ Executes
    Backprop & $\theta_{h,n}^{k}\leftarrow\theta_{h,n}^{k}-\eta\frac{\partial\mathcal{L}_{n}}{\partial\theta_{h,n}^{k}}$

Function AggSer($k$): $\triangleright$ Aggregation Server Executes
    if $k=0$ then
        Initialize $\theta_{h}^{k},\theta_{t}^{k}$
    else
        $(\theta_{h}^{k},\theta_{t}^{k})\leftarrow(\sum_{n=1}^{N}\frac{\left|\mathcal{D}_{n}\right|}{|\mathcal{D}|}\theta_{h,n}^{k},\sum_{n=1}^{N}\frac{\left|\mathcal{D}_{n}\right|}{|\mathcal{D}|}\theta_{t,n}^{k})$
        $(\theta_{h}^{k},\theta_{t}^{k})\leftarrow$ (Correct($\theta_{h}^{k},\theta_{h}^{k-1}$), Correct($\theta_{t}^{k},\theta_{t}^{k-1}$))
    end
    Deliver $\theta_{h}^{k},\theta_{t}^{k}$ to clients

Function Correct($\theta^{k},\theta^{k-1}$):
    $\mathcal{L}_{con}\leftarrow$ WeightCorrectionLoss($\theta^{k},\theta^{k-1}$)
    $\theta_{c}^{k}\leftarrow\theta^{k}-\eta\frac{\partial\mathcal{L}_{con}}{\partial\theta^{k-1}}$
    $\theta_{r}^{k}\leftarrow(1-\alpha)\theta^{k}+\alpha\theta_{c}^{k}$
    return $\theta_{r}^{k}$

Algorithm 1 Main steps of RoS-FL.
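One training step of Algorithm 1 for a single client can be sketched with scalar linear parts and hand-written gradients. This is an illustrative toy rather than the released implementation: the scalar parameters `a`, `w`, `c`, the squared-error loss and the class names are assumptions made for the example. Following the prose description of the backward pass, the server returns the gradient with respect to $\hat{y}_{h}$ to the client. Only intermediate activations and their gradients are exchanged; the input `x` and label `y` never leave the client.

```python
class Client:
    """Holds the private sample (x, y), the head weight a and tail weight c."""

    def __init__(self, a: float, c: float, x: float, y: float, lr: float):
        self.a, self.c, self.x, self.y, self.lr = a, c, x, y, lr
        self.loss = 0.0

    def head_forward(self) -> float:
        return self.a * self.x  # y_h, uploaded to the computation server

    def tail_main(self, y_b: float) -> float:
        y_hat = self.c * y_b
        err = y_hat - self.y
        self.loss = 0.5 * err ** 2      # assumed task loss: squared error
        grad_c = err * y_b              # dL/dc
        grad_yb = err * self.c          # dL/dy_b, computed before updating c
        self.c -= self.lr * grad_c
        return grad_yb                  # sent to the computation server

    def head_back(self, grad_yh: float) -> None:
        self.a -= self.lr * grad_yh * self.x  # dL/da = dL/dy_h * x


class ComputationServer:
    """Holds the body weight w for this client's body replica."""

    def __init__(self, w: float, lr: float):
        self.w, self.lr = w, lr
        self._y_h = 0.0

    def forward(self, y_h: float) -> float:
        self._y_h = y_h                 # cached for the backward pass
        return self.w * y_h             # y_b, sent to the client

    def backward(self, grad_yb: float) -> float:
        grad_w = grad_yb * self._y_h    # dL/dw
        grad_yh = grad_yb * self.w      # dL/dy_h, computed before updating w
        self.w -= self.lr * grad_w
        return grad_yh                  # sent back to the client


client = Client(a=1.0, c=1.0, x=2.0, y=4.0, lr=0.02)
server = ComputationServer(w=1.0, lr=0.02)

losses = []
for _ in range(30):
    y_h = client.head_forward()         # client  -> server
    y_b = server.forward(y_h)           # server  -> client
    grad_yb = client.tail_main(y_b)     # client  -> server
    grad_yh = server.backward(grad_yb)  # server  -> client
    client.head_back(grad_yh)
    losses.append(client.loss)
```

In a real deployment each of the four arrows would be a network message, and the $N$ client/replica pairs would run this loop concurrently before aggregation.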

After forward propagation, each client calculates the loss and starts backpropagation. Concretely, the gradients with respect to $\mathcal{M}_{t}$ and $\hat{y}_{b}$ are calculated first. Then, the gradients of $\hat{y}_{b}$ are transmitted to the computation server, which executes the backpropagation on $\mathcal{M}_{b}$ and delivers the gradients of $\hat{y}_{h}$ to the clients. Finally, with the received gradients of $\hat{y}_{h}$ , each client executes the backpropagation of $\mathcal{M}_{h}$ . This completes one backpropagation pass between the clients and the server. To make full use of all local data and obtain optimal global models, we aggregate the client-side and server-side models in the aggregation and computation servers, respectively, which completes one global training round. The training processes of different clients are executed in parallel, which greatly reduces the training time overhead compared with SL. Meanwhile, no party in our framework can access the full model, which effectively protects model privacy. The servers provide the aggregated models by averaging the local models as:

\theta^{k}=\sum_{i=1}^{N}\frac{\left|\mathcal{D}_{i}\right|}{|\mathcal{D}|}\theta_{i}^{k},

(4)

where $k$ and $i$ represent the current training round and client index, respectively. $\theta^{k}=\{\theta^{k}_{h},\theta^{k}_{b},\theta^{k}_{t}\}$ represents the parameter sets of $\mathcal{M}^{k}=\{\mathcal{M}_{h}^{k},\mathcal{M}_{b}^{k},\mathcal{M}_{t}^{k}\}$ . Then our optimization problem is formulated as:

\underset{\theta_{h},\theta_{b},\theta_{t}}{\arg\min}\mathcal{L}=\sum_{i=1}^{N}\frac{\left|\mathcal{D}_{i}\right|}{|\mathcal{D}|}\mathbb{E}_{(x,y)\sim\mathcal{D}_{i}}\left[\mathcal{L}_{i}(\mathcal{F}(x),y)\right].

(5)
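The aggregation in Eq. (4) is a sample-size-weighted average of the local parameters. A minimal sketch, assuming each local model is flattened into a plain list of floats (the function name `aggregate` and the toy values are hypothetical):

```python
from typing import List, Sequence


def aggregate(local_params: Sequence[Sequence[float]],
              num_samples: Sequence[int]) -> List[float]:
    """Weighted average of local parameters with weights |D_i| / |D| (Eq. 4)."""
    if len(local_params) != len(num_samples):
        raise ValueError("one sample count is required per client")
    total = sum(num_samples)
    dim = len(local_params[0])
    return [
        sum(n / total * params[j] for params, n in zip(local_params, num_samples))
        for j in range(dim)
    ]


# Two clients with 1 and 3 samples: the second client gets weight 0.75.
theta_global = aggregate([[1.0, 2.0], [3.0, 6.0]], [1, 3])
```

In RoS-FL the same rule would be applied separately to the head and tail parameters in the aggregation server and to the body parameters in the computation server.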

3.4 Dynamic Weight Correction Strategy

The original DCML suffers from the data heterogeneity problem, which leads to a large gap between the optimization directions of local and global models. Since there is always a distribution gap between $\mathcal{D}_{i}$ and $\mathcal{D}$ in practice, local training tends to make the local model perform poorly on other data domains. As a result, it may yield a poor optimization solution for the global model and cause model collapse after aggregation. This problem is more common in healthcare tasks, in which the collected data inevitably suffer from serious data heterogeneity caused by several factors, such as different hardware, scanning protocols and patients.

To recover from this situation, we propose DWCS to avoid the model drift problem. Specifically, we treat the model of the last communication round as the anchor model and propose a weight correction loss to quantify the drift between the anchor model and the model of the adjacent communication round. Then we obtain the correction model by minimizing the weight correction loss, and the weighted sum of the correction model and the current-round aggregated model is treated as the final result. The weight correction loss function is defined as:

\underset{\theta^{k}}{\arg\min}\mathcal{L}_{con}(\theta^{k},\theta^{k-1})=\frac{\mu}{2}||\theta^{k}-\theta^{k-1}||_{2}^{2},

(6)

where $\mathcal{L}_{con}$ represents the weight correction loss, and $\mu$ is the hyperparameter constraining the optimization step factor. Then, our correction model can be formulated as:

\theta_{c}^{k}=\theta^{k}+\eta\nabla\mathcal{L}_{con}(\theta^{k},\theta^{k-1}),

(7)

where $\eta$ is the learning rate, and our correction model is $\theta_{c}^{k}=\{\theta_{c,h}^{k},\theta_{c,b}^{k},\theta_{c,t}^{k}\}$ .
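The correction step of Eqs. (6) and (7) can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation: parameters are flattened into plain Python lists, the gradient is taken with respect to $\theta^{k}$ (an assumption, since Eq. (7) does not name the variable), and the `sign` argument is hypothetical. `sign=+1.0` reproduces Eq. (7) as printed, while `sign=-1.0` would give an ordinary gradient-descent step on $\mathcal{L}_{con}$ that pulls $\theta^{k}$ toward the anchor $\theta^{k-1}$.

```python
from typing import List, Sequence


def weight_correction_loss(theta_k: Sequence[float],
                           theta_prev: Sequence[float],
                           mu: float) -> float:
    """L_con = (mu / 2) * ||theta_k - theta_prev||_2^2  (Eq. 6)."""
    return 0.5 * mu * sum((a - b) ** 2 for a, b in zip(theta_k, theta_prev))


def correction_model(theta_k: Sequence[float],
                     theta_prev: Sequence[float],
                     mu: float,
                     eta: float,
                     sign: float = 1.0) -> List[float]:
    """theta_c = theta_k + sign * eta * grad, with grad = mu * (theta_k - theta_prev).

    grad is the analytic gradient of L_con with respect to theta_k (assumed).
    """
    return [a + sign * eta * mu * (a - b) for a, b in zip(theta_k, theta_prev)]


theta_k = [1.0, 2.0]      # aggregated model of the current round (toy values)
theta_prev = [0.0, 0.0]   # anchor model of the previous round (toy values)

loss = weight_correction_loss(theta_k, theta_prev, mu=0.5)
theta_c_printed = correction_model(theta_k, theta_prev, mu=0.5, eta=0.1)
theta_c_descent = correction_model(theta_k, theta_prev, mu=0.5, eta=0.1, sign=-1.0)
```

Because the loss is a closed-form quadratic, no automatic differentiation is needed, and the step costs one pass over the parameters per communication round.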

In the early stage of training, a small weight should be assigned to the correction model to accelerate convergence. As training continues, the model almost converges, but local training may cause severe model drift, which can lead to collapse of the global model after aggregation. To alleviate this issue, inspired by [50], we propose a dynamic adjustment strategy to stabilize the training process and diminish the model drift. A robust model $\theta_{r}^{k}$ is then obtained as the weighted sum of $\theta_{c}^{k}$ and $\theta^{k}$ :

\theta_{r}^{k}=(1-\alpha)\theta^{k}+\alpha\theta_{c}^{k},

(8)

where $\alpha=\min(1-\frac{1}{k+1},\beta)$ is the balancing factor, and $\beta$ denotes the maximum constraint value, which is set to $0.99$ in this paper.
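To make the schedule concrete, the balancing factor and the blend of Eq. (8) can be evaluated for a few rounds. The sketch below assumes flattened parameter lists and uses hypothetical function names; the toy parameter values are not taken from the experiments.

```python
from typing import List, Sequence


def balancing_factor(k: int, beta: float = 0.99) -> float:
    """alpha = min(1 - 1 / (k + 1), beta) for communication round k >= 1."""
    return min(1.0 - 1.0 / (k + 1), beta)


def robust_model(theta_k: Sequence[float],
                 theta_c: Sequence[float],
                 alpha: float) -> List[float]:
    """theta_r = (1 - alpha) * theta_k + alpha * theta_c  (Eq. 8)."""
    return [(1.0 - alpha) * a + alpha * c for a, c in zip(theta_k, theta_c)]


# alpha starts at 0.5 in round 1 and grows until it is capped at beta.
alphas = {k: balancing_factor(k) for k in (1, 3, 9, 1000)}

# In round 1 the current and correction models are weighted equally.
theta_r = robust_model([1.0, 2.0], [3.0, 6.0], balancing_factor(1))
```

With $\beta=0.99$ the cap is reached at round $k=99$, so in the 300-round and 500-round experiments most rounds use $\alpha=0.99$.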

To help readers follow our method, the main steps of RoS-FL are given in Algorithm 1 in pseudocode. RoS-FL is a parallel DCML method that does not share the input, model parameters, output or label. As a result, model privacy is well protected throughout the training phase. Additionally, DWCS only needs to be performed once per communication round, so our method is lighter than regularization-based methods.

TABLE I: Quantitative Results for the Segmentation Task.

Method	DSC $\uparrow$	HD95 $\downarrow$	ASD $\downarrow$	JC $\uparrow$
CL	0.8910	2.587	0.7169	0.8089
SL [11]	0.8754	3.624	1.0272	0.7886
PSL [40]	0.8469	3.884	1.1564	0.7461
FedAvg [34]	0.8493	3.747	1.1164	0.7489
FedProx [35]	0.8692	3.044	0.9267	0.7777
FedBN [36]	0.8510	4.268	1.3180	0.7514
FedMRI [41]	0.4921	21.410	6.8436	0.3583
FedDG [43]	0.8704	3.419	0.9860	0.7801
RoS-FL w/o DWCS	0.8743	3.450	1.0302	0.7861
RoS-FL w/ DWCS	0.8799	2.718	0.8572	0.7932

4 Experiments

4.1 Implementation Details

To validate the effectiveness of our method for U-shaped medical image networks in different tasks, we compare our method with several state-of-the-art DCML methods, namely SL [11], PSL [40], FedAvg [34], FedProx [35], FedBN [36], FedDG [43] and FedMRI [41], as well as centralized learning (CL), which is treated as the benchmark. Adam [51] is adopted to optimize the models. The learning rate is set to $1\times 10^{-4}$ , and the weight decay is $1\times 10^{-8}$ . All code is implemented in PyTorch and the experiments are performed on an NVIDIA RTX 3090 GPU.

4.2 Segmentation Experiments

We conduct segmentation experiments on the public Automated Cardiac Diagnosis Challenge (ACDC) dataset [52], which contains 200 annotated short-axis cardiac MR-cine images from 100 patients. All short-axis slices within the 3D scans are resized to 256 $\times$ 256 and used as 2D images. We randomly and evenly divide 80 patients among 4 clients as the training set, and the remaining 20 patients are used as the testing set. Dice Similarity Coefficient (DSC), 95% Hausdorff Distance (HD95), Average Surface Distance (ASD), and Jaccard Index (JC) are chosen as the quantitative metrics in this paper. Segmentation networks are all optimized with cross-entropy loss and dice loss. The numbers of communication rounds and local training epochs are set to 300 and 1, respectively.
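For reference, the two overlap metrics can be computed from binary masks as follows. This is a generic sketch of the standard definitions, not the evaluation script used in the experiments; masks are assumed to be flattened lists of 0/1 values, and returning 1.0 when both masks are empty is a convention chosen for the example.

```python
from typing import Sequence, Tuple


def dice_and_jaccard(pred: Sequence[int],
                     target: Sequence[int]) -> Tuple[float, float]:
    """Return (DSC, JC) for two binary masks of equal length."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    pred_size = sum(1 for p in pred if p)
    target_size = sum(1 for t in target if t)
    union = pred_size + target_size - intersection
    if union == 0:
        return 1.0, 1.0  # assumed convention: two empty masks agree perfectly
    dsc = 2.0 * intersection / (pred_size + target_size)
    jc = intersection / union
    return dsc, jc


pred = [1, 1, 0, 0, 1, 0]
target = [1, 0, 0, 1, 1, 0]
dsc, jc = dice_and_jaccard(pred, target)  # 2 shared pixels, 3 per mask
```

The two metrics are monotonically related by $JC = DSC/(2-DSC)$ for a single pair of masks, which is why they rank methods similarly when computed per case.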

TABLE II: The Geometry Parameters and Dose Levels in Different Clients.

	Client #1	Client #2	Client #3	Client #4
Number of views	1024	128	512	384
Number of detector bins	512	768	768	600
Pixel length (mm)	0.66	0.78	1.0	1.4
Detector bin length (mm)	0.72	0.58	1.23	1.64
Distance between the source and rotation center (mm)	250	350	500	350
Distance between the detector and rotation center (mm)	250	300	400	300
Intensity of X-rays	1e5	1e6	5e4	1.25e5

The quantitative results are listed in Tab. I. Our method achieves performance competitive with CL and outperforms the other DCML methods. SL achieves satisfactory performance, but its training process is sequential, so its training time overhead is $N$ times (the number of clients) that of our RoS-FL and the other FL methods. SL beats most FL-based methods. Our method inherits the advantages of SL and performs better than the FL-based methods. Three representative visual results are shown in Fig. 3. Our method accurately identifies the boundaries of organ regions and outperforms the other methods. In the first case, our result is even better than that of CL. Note that FedMRI was proposed for MRI reconstruction and is unsuitable for the segmentation task. In the ablation experiment, DWCS effectively improves the segmentation performance owing to its good trade-off between convergence acceleration and model drift recovery.

4.3 Restoration Experiments

TABLE III: Quantitative Results for the Imaging Task.

	Client #1		Client #2		Client #3		Client #4		Average
Method	PSNR $\uparrow$	SSIM $\uparrow$	PSNR $\uparrow$	SSIM $\uparrow$	PSNR $\uparrow$	SSIM $\uparrow$	PSNR $\uparrow$	SSIM $\uparrow$	PSNR $\uparrow$	SSIM $\uparrow$
CL	39.35	0.9647	42.31	0.9617	45.55	0.9859	42.83	0.9683	42.60	0.9711
SL [11]	39.14	0.9562	41.71	0.9592	45.43	0.9863	42.79	0.9664	42.27	0.9670
PSL [40]	38.75	0.9572	41.47	0.9565	44.61	0.9831	41.58	0.9582	41.60	0.9637
FedAvg [34]	39.67	0.9643	40.29	0.9444	45.22	0.9851	42.73	0.9675	41.98	0.9653
FedProx [35]	37.35	0.9309	38.10	0.9049	43.23	0.9750	40.34	0.9471	39.75	0.9395
FedBN [36]	38.09	0.9477	40.01	0.9321	44.73	0.9843	42.33	0.9628	41.29	0.9567
FedDG [43]	38.95	0.9528	39.34	0.9231	44.06	0.9788	42.10	0.9623	41.11	0.9542
FedMRI [41]	37.48	0.9384	40.91	0.9451	44.57	0.9842	39.95	0.9430	40.73	0.9524
RoS-FL w/o DWCS	39.58	0.9668	40.82	0.9527	45.17	0.9851	43.00	0.9694	42.14	0.9685
RoS-FL w/ DWCS	42.15	0.9749	40.92	0.9513	45.48	0.9870	43.71	0.9750	42.82	0.9721

The well-known NIH-AAPM-Mayo Low-Dose CT dataset [53], which contains 5936 full-dose CT images from 10 patients, is used to verify the performance of our method. Eight patients are randomly and evenly divided among four clients as the training dataset, and the remaining two patients are used as the testing dataset. To simulate real environments, we generate multi-source non-iid low-dose CT (LDCT) data following [54, 55]. Poisson noise and electronic noise are added to the measured projection data to simulate the low-dose case as follows:

p=\ln\frac{I_{0}}{\operatorname{Poisson}\left(I_{0}\exp(-\hat{p})\right)+\operatorname{Normal}\left(0,\sigma_{e}^{2}\right)},

(9)

where $\hat{p}$ represents the clean projection, $\sigma_{e}^{2}$ denotes the variance of the electronic noise, and $I_{0}$ represents the number of incident photons. In this paper, we fix the electronic noise variance at $\sigma_{e}^{2}=10$ and set $I_{0}=1\times 10^{6}$ following [54, 55].
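Eq. (9) can be sketched with NumPy as below. The function name, the seeded generator and the clipping of non-positive detector counts before the logarithm are assumptions made so that the example is runnable; the source does not describe how such values are handled.

```python
import numpy as np


def simulate_low_dose(p_clean, i0=1e6, sigma_e2=10.0, rng=None):
    """Apply Poisson and electronic (Gaussian) noise to clean projections (Eq. 9)."""
    rng = np.random.default_rng() if rng is None else rng
    p_clean = np.asarray(p_clean, dtype=np.float64)
    # Photon counts after attenuation follow a Poisson distribution.
    counts = rng.poisson(i0 * np.exp(-p_clean)).astype(np.float64)
    # Electronic noise is zero-mean Gaussian with variance sigma_e^2.
    counts = counts + rng.normal(0.0, np.sqrt(sigma_e2), size=p_clean.shape)
    # Assumption: clip to a small positive value so the logarithm is defined.
    counts = np.maximum(counts, 1e-6)
    return np.log(i0 / counts)


rng = np.random.default_rng(0)
p_hat = np.full(4096, 1.0)  # toy clean projection values
p_high_dose = simulate_low_dose(p_hat, i0=1e6, rng=rng)
p_low_dose = simulate_low_dose(p_hat, i0=1e4, rng=rng)
```

Lowering `i0` reduces the expected photon count and therefore increases the relative noise in the log-domain projection, which is how different dose levels can be emulated per client.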

In this paper, four cases with different sparse-view and low-dose data are simulated and the corresponding geometric parameters and dose levels are listed in Tab. II. Peak Signal to Noise Ratio (PSNR) and Structural Similarity (SSIM) are employed as the quantitative metrics. Restoration networks are optimized with mean-squared error (MSE) loss. The numbers of communication rounds and local training epochs are set to 500 and 1, respectively.
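For reference, the MSE objective and the PSNR metric mentioned above can be sketched as below. This is only an illustration under stated assumptions: the helper names `mse` and `psnr` are hypothetical, and `data_range=1.0` assumes images normalized to [0, 1], which the text does not specify.

```python
import numpy as np

def mse(pred, target):
    """Mean-squared error between a restored image and its reference."""
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    return float(np.mean((pred - target) ** 2))

def psnr(pred, target, data_range=1.0):
    """Peak signal-to-noise ratio in dB.

    data_range is the assumed dynamic range of the images
    (1.0 for images normalized to [0, 1]).
    """
    err = mse(pred, target)
    if err == 0.0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(data_range ** 2 / err))
```

SSIM is omitted from this sketch; an existing implementation such as `structural_similarity` in `skimage.metrics` could be used, assuming scikit-image is available.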

The quantitative results are shown in Tab. III. Our method achieves the best average performance among the DCML methods (42.82 PSNR and 0.9721 SSIM) and even outperforms CL on some clients (e.g., Clients #1 and #4). Similar to the segmentation task, FL-based methods are inferior to SL-based methods in most cases. A possible reason is that the transferred weights carry only limited information, so FL-based methods cannot make full use of the information in the local data. However, as mentioned above, the training time overheads of FL-based methods are much lower than those of SL-based methods. Our method combines the merits of both learning paradigms. The ablation on DWCS shows that it improves the average performance (from 42.14 to 42.82 PSNR). For the other methods, the results on Client #1 show a significant gap compared with those on the other clients, which is probably caused by the model drift problem. Benefiting from DWCS, which alleviates model drift by correcting the optimization solution, our method largely closes this gap, raising the PSNR on Client #1 from 39.58 (without DWCS) to 42.15.

Fig. 4 shows several typical slices denoised using different methods. The results of the other methods still contain noise or artifacts to varying degrees, whereas our method effectively removes them. In some results of the other methods, edges are blurry and some tiny structures are wrongly restored. In contrast, our method restores these structural details and edges correctly and more clearly.

TABLE IV: Analysis of Different Numbers of Local Epochs in Restoration Task. (Local Epochs/Communication Rounds)

	1/500	2/250	4/125	5/100
PSNR	42.81	42.38	41.89	41.90
SSIM	0.9721	0.9699	0.9650	0.9649

4.4 Ablation Experiments

TABLE V: Analysis of Different Numbers of Local Epochs in Segmentation Task. (Local Epochs/Communication Rounds)

	1/300	2/150	4/75	5/60
Dice	88.02	88.21	87.52	88.00
HD95	2.4560	2.3756	2.5918	2.5810

In this subsection, we evaluate the impact of the hyperparameter $\mu$, which controls the optimization step used to compute the correction model. The results are shown in Fig. 5, where the left and right vertical axes denote the PSNR and DSC values for the imaging and segmentation tasks, respectively. We conduct experiments with $\mu$ ranging from $1\times 10^{-6}$ to 100. It can be observed that our method is not very sensitive to $\mu$, and it outperforms the other DCML methods even in the worst case. As suggested by Fig. 5, $\mu=1\times 10^{-4}$ is chosen as the default in this paper.
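A sweep of this kind can be sketched as follows. The decade-spaced grid is an assumption (the text only states the two endpoints of the range), and `evaluate` is a hypothetical user-supplied callable that trains with a given $\mu$ and returns a validation score where higher is better.

```python
import numpy as np

def mu_grid(low_exp=-6, high_exp=2):
    """Candidate values 10^low_exp, ..., 10^high_exp, one per decade (assumed spacing)."""
    return np.logspace(low_exp, high_exp, num=high_exp - low_exp + 1)

def select_mu(evaluate, candidates):
    """Return the candidate with the highest score from the hypothetical `evaluate`."""
    scores = [evaluate(float(mu)) for mu in candidates]
    return float(candidates[int(np.argmax(scores))])
```

With the default arguments, `mu_grid()` yields nine candidates from $1\times 10^{-6}$ to 100.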

The domain shift between clients may cause the global model to deviate from the global optimal solution after aggregation. We therefore conduct experiments to assess the impact of the numbers of local training epochs and communication rounds on the performance. In all cases, the total number of training iterations is kept equal, and the results for the segmentation and restoration tasks are shown in Tabs. V and IV, respectively. Increasing the number of local training epochs degrades the imaging performance (PSNR drops from 42.81 with one local epoch to 41.90 with five), whereas the segmentation performance is not significantly affected. The domain gap in the imaging task is greater than that in the segmentation task, since the scanners and scanning parameters may differ, which leads to a more serious model drift problem in imaging. Model drift becomes more serious as the number of local training epochs increases, which leads the global model to deviate from the global optimal solution. As a result, we empirically use a small number of local training epochs to avoid these issues.
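The equal-budget configurations compared in Tabs. IV and V can be enumerated with a short sketch. The helper name `equal_budget_schedules` is hypothetical, and the totals of 500 and 300 are read from the table headers (local epochs multiplied by communication rounds).

```python
def equal_budget_schedules(total_epochs, local_epoch_options):
    """Pair each local-epoch count with the number of communication rounds
    that keeps local_epochs * rounds equal to total_epochs."""
    schedules = []
    for local_epochs in local_epoch_options:
        if total_epochs % local_epochs != 0:
            continue  # skip options that do not divide the budget evenly
        schedules.append((local_epochs, total_epochs // local_epochs))
    return schedules

# Budgets read from the headers of Tabs. IV and V.
restoration = equal_budget_schedules(500, [1, 2, 4, 5])
segmentation = equal_budget_schedules(300, [1, 2, 4, 5])
```

Each tuple is (local epochs, communication rounds), so more local epochs per round implies fewer aggregation steps for the same amount of local training.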

5 Conclusions

Current U-shaped medical image networks have achieved impressive success, but they are usually trained without considering privacy issues. To simultaneously protect the privacy of the input, model parameters, output and label, we propose Robust Split Federated Learning (RoS-FL) for U-shaped medical image networks. Besides privacy protection, RoS-FL has other important merits, such as low training time overhead and low local computational cost. Meanwhile, we address the model drift problem in distributed learning and propose a Dynamic Weight Correction Strategy (DWCS) to correct the optimization solution and stabilize training. Extensive experiments on different tasks demonstrate the effectiveness of our method. On the other hand, although RoS-FL achieves satisfactory performance on different domains, it does not consider test data from an unseen domain. Improving the performance on unseen domains is therefore an interesting direction for our future work.

Another concern is that, although the proposed RoS-FL achieves promising performance, there remains a risk of reconstructing raw images from the shared feature maps. Such risks can be mitigated by integrating privacy-protecting techniques, such as differential privacy [56] and secure multi-party computation [57, 58]. Hardening RoS-FL against inversion attacks by combining these techniques is another possible direction for future research.

Compliance with Ethical Standards

This research study was conducted retrospectively using real clinical exams acquired at the University Hospital of Dijon and Mayo Clinic. Ethical approval was not required, as confirmed by the license attached to the open-access data.

References

[1] L. Liu, J. Cheng, Q. Quan, F.-X. Wu, Y.-P. Wang, and J. Wang, “A survey on u-shaped networks in medical image segmentations,” Neurocomputing, vol. 409, pp. 244–258, 2020.
[2] Z. Wang, X. Cun, J. Bao, W. Zhou, J. Liu, and H. Li, “Uformer: A general u-shaped transformer for image restoration,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 17683–17693, June 2022.
[3] W. Xia, H. Shan, G. Wang, and Y. Zhang, “Synergizing physics/model-based and data-driven methods for low-dose ct,” arXiv preprint arXiv:2203.15725, 2022.
[4] G. Xu, X. Han, S. Xu, T. Zhang, H. Li, X. Huang, and R. H. Deng, “Hercules: Boosting the performance of privacy-preserving federated learning,” IEEE Transactions on Dependable and Secure Computing, 2022.
[5] Y. Liu, Z. Ma, Y. Yang, X. Liu, J. Ma, and K. Ren, “Revfrf: Enabling cross-domain random forest training with revocable federated learning,” IEEE Transactions on Dependable and Secure Computing, 2021.
[6] G. Xu, H. Li, Y. Zhang, S. Xu, J. Ning, and R. Deng, “Privacy-preserving federated deep learning with irregular users,” IEEE Transactions on Dependable and Secure Computing, 2020.
[7] C. Thapa, M. A. P. Chamikara, and S. A. Camtepe, “Advancements of federated learning towards privacy preservation: from federated learning to split learning,” in Federated Learning Systems, pp. 79–109, Springer, 2021.
[8] B. Huang, X. Li, Z. Song, and X. Yang, “Fl-ntk: A neural tangent kernel-based framework for federated learning analysis,” in Proceedings of the International Conference on Machine Learning, pp. 4423–4434, PMLR, 2021.
[9] M. Jiang, H. Yang, X. Li, Q. Liu, P.-A. Heng, and Q. Dou, “Dynamic bank learning for semi-supervised federated image diagnosis with class imbalance,” in Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 196–206, Springer, 2022.
[10] C. Thapa, P. C. M. Arachchige, S. Camtepe, and L. Sun, “Splitfed: When federated learning meets split learning,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 36, pp. 8485–8493, 2022.
[11] O. Gupta and R. Raskar, “Distributed learning of deep neural network over multiple agents,” Journal of Network and Computer Applications, vol. 116, pp. 1–8, 2018.
[12] A. Singh, P. Vepakomma, O. Gupta, and R. Raskar, “Detailed comparison of communication efficiency of split learning and federated learning,” arXiv preprint arXiv:1909.09145, 2019.
[13] L. Qu, Y. Zhou, P. P. Liang, Y. Xia, F. Wang, E. Adeli, L. Fei-Fei, and D. Rubin, “Rethinking architecture design for tackling data heterogeneity in federated learning,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 10061–10071, 2022.
[14] O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” in Proceedings of the International Conference on Medical image Computing and Computer-Assisted Intervention, pp. 234–241, Springer, 2015.
[15] Z. Peng, X. Fang, P. Yan, H. Shan, T. Liu, X. Pei, G. Wang, B. Liu, M. K. Kalra, and X. G. Xu, “A method of rapid quantification of patient-specific organ doses for ct using deep-learning-based multi-organ segmentation and gpu-accelerated monte carlo dose computing,” Medical physics, vol. 47, no. 6, pp. 2526–2536, 2020.
[16] B. Lei, Z. Xia, F. Jiang, X. Jiang, Z. Ge, Y. Xu, J. Qin, S. Chen, T. Wang, and S. Wang, “Skin lesion segmentation via generative adversarial networks with dual discriminators,” Medical Image Analysis, vol. 64, p. 101716, 2020.
[17] B. Lei, S. Huang, H. Li, R. Li, C. Bian, Y.-H. Chou, J. Qin, P. Zhou, X. Gong, and J.-Z. Cheng, “Self-co-attention neural network for anatomy segmentation in whole breast ultrasound,” Medical image analysis, vol. 64, p. 101753, 2020.
[18] A. Fakhry, T. Zeng, and S. Ji, “Residual deconvolutional networks for brain electron microscopy image segmentation,” IEEE Transactions on Medical Imaging, vol. 36, no. 2, pp. 447–456, 2016.
[19] Z.-H. Wang, Z. Liu, Y.-Q. Song, and Y. Zhu, “Densely connected deep u-net for abdominal multi-organ segmentation,” in Proceedings of the IEEE International Conference on Image Processing (ICIP), pp. 1415–1419, IEEE, 2019.
[20] X. Li, H. Chen, X. Qi, Q. Dou, C.-W. Fu, and P.-A. Heng, “H-denseunet: hybrid densely connected unet for liver and tumor segmentation from ct volumes,” IEEE Transactions on Medical Imaging, vol. 37, no. 12, pp. 2663–2674, 2018.
[21] Z. Zhou, M. M. R. Siddiquee, N. Tajbakhsh, and J. Liang, “Unet++: Redesigning skip connections to exploit multiscale features in image segmentation,” IEEE Transactions on Medical Imaging, vol. 39, no. 6, pp. 1856–1867, 2019.
[22] O. Oktay, J. Schlemper, L. L. Folgoc, M. Lee, M. Heinrich, K. Misawa, K. Mori, S. McDonagh, N. Y. Hammerla, B. Kainz, et al., “Attention u-net: Learning where to look for the pancreas,” arXiv preprint arXiv:1804.03999, 2018.
[23] F. Isensee, P. F. Jaeger, S. A. Kohl, J. Petersen, and K. H. Maier-Hein, “nnu-net: a self-configuring method for deep learning-based biomedical image segmentation,” Nature Methods, vol. 18, no. 2, pp. 203–211, 2021.
[24] N. Siddique, S. Paheding, C. P. Elkin, and V. Devabhaktuni, “U-net and its variants for medical image segmentation: A review of theory and applications,” IEEE Access, vol. 9, pp. 82031–82057, 2021.
[25] K. Kulathilake, N. A. Abdullah, A. Q. M. Sabri, and K. W. Lai, “A review on deep learning approaches for low-dose computed tomography restoration,” Complex & Intelligent Systems, pp. 1–33, 2021.
[26] H. Chen, Y. Zhang, M. K. Kalra, F. Lin, Y. Chen, P. Liao, J. Zhou, and G. Wang, “Low-dose ct with a residual encoder-decoder convolutional neural network,” IEEE Transactions on Medical Imaging, vol. 36, no. 12, pp. 2524–2535, 2017.
[27] J. Wang, Y. Tang, Z. Wu, B. M. Tsui, W. Chen, X. Yang, J. Zheng, and M. Li, “Domain adaptive denoising network for low-dose ct via noise estimation and transfer learning,” Medical Physics, 2022.
[28] S. Lee, M. Negishi, H. Urakubo, H. Kasai, and S. Ishii, “Mu-net: Multi-scale u-net for two-photon microscopy image denoising and restoration,” Neural Networks, vol. 125, pp. 92–103, 2020.
[29] Z. Huang, J. Zhang, Y. Zhang, and H. Shan, “Du-gan: Generative adversarial networks with dual-domain u-net-based discriminators for low-dose ct denoising,” IEEE Transactions on Instrumentation and Measurement, vol. 71, pp. 1–12, 2021.
[30] Y. Han and J. C. Ye, “Framing u-net via deep convolutional framelets: Application to sparse-view ct,” IEEE Transactions on Medical Imaging, vol. 37, no. 6, pp. 1418–1429, 2018.
[31] T. Li, A. K. Sahu, A. Talwalkar, and V. Smith, “Federated learning: Challenges, methods, and future directions,” IEEE Signal Processing Magazine, vol. 37, no. 3, pp. 50–60, 2020.
[32] W. Huang, M. Ye, and B. Du, “Learn from others and be yourself in heterogeneous federated learning,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 10143–10153, 2022.
[33] J. Chen, M. Jiang, Q. Dou, and Q. Chen, “Federated domain generalization for image recognition via cross-client style transfer,” arXiv preprint arXiv:2210.00912, 2022.
[34] B. McMahan, E. Moore, D. Ramage, et al., “Communication-efficient learning of deep networks from decentralized data,” in Artificial Intelligence and Statistics, pp. 1273–1282, PMLR, 2017.
[35] T. Li, A. K. Sahu, M. Zaheer, et al., “Federated optimization in heterogeneous networks,” in Proceedings of the Machine Learning and Systems, vol. 2, pp. 429–450, 2020.
[36] X. Li, M. Jiang, X. Zhang, et al., “Fedbn: Federated learning on non-iid features via local batch normalization,” arXiv preprint arXiv:2102.07623, 2021.
[37] Q. Li, B. He, and D. Song, “Model-contrastive federated learning,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 10713–10722, 2021.
[38] A. Singh, P. Vepakomma, O. Gupta, and R. Raskar, “Detailed comparison of communication efficiency of split learning and federated learning,” arXiv preprint arXiv:1909.09145, 2019.
[39] C. Shan, H. Jiao, and J. Fu, “Towards representation identical privacy-preserving graph neural network via split learning,” arXiv preprint arXiv:2107.05917, 2021.
[40] J. Jeon and J. Kim, “Privacy-sensitive parallel split learning,” in Proceedings of the International Conference on Information Networking, pp. 7–9, IEEE, 2020.
[41] C.-M. Feng, Y. Yan, S. Wang, Y. Xu, L. Shao, and H. Fu, “Specificity-preserving federated learning for mr image reconstruction,” IEEE Transactions on Medical Imaging, 2022.
[42] Z. Yang, W. Xia, Z. Lu, Y. Chen, X. Li, and Y. Zhang, “Hypernetwork-based personalized federated learning for multi-institutional ct imaging,” arXiv preprint arXiv:2206.03709, 2022.
[43] Q. Liu, C. Chen, J. Qin, Q. Dou, and P.-A. Heng, “Feddg: Federated domain generalization on medical image segmentation via episodic learning in continuous frequency space,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 1013–1023, 2021.
[44] S. Park, G. Kim, J. Kim, B. Kim, and J. C. Ye, “Federated split task-agnostic vision transformer for covid-19 cxr diagnosis,” in Proceedings of the Advances in Neural Information Processing Systems, vol. 34, pp. 24617–24630, 2021.
[45] M. G. Poirot, P. Vepakomma, K. Chang, J. Kalpathy-Cramer, R. Gupta, and R. Raskar, “Split learning for collaborative deep learning in healthcare,” arXiv preprint arXiv:1912.12115, 2019.
[46] H. R. Roth, A. Hatamizadeh, Z. Xu, C. Zhao, W. Li, A. Myronenko, and D. Xu, “Split-u-net: Preventing data leakage in split learning for collaborative multi-modal brain tumor segmentation,” in International Workshop on Distributed, Collaborative, and Federated Learning, Workshop on Affordable Healthcare and AI for Resource Diverse Global Health, pp. 47–57, Springer, 2022.
[47] V. Turina, Z. Zhang, F. Esposito, and I. Matta, “Federated or split? a performance and privacy analysis of hybrid split and federated learning architectures,” in Proceedings of the IEEE International Conference on Cloud Computing, pp. 250–260, IEEE, 2021.
[48] C. Thapa, P. C. M. Arachchige, S. Camtepe, and L. Sun, “Splitfed: When federated learning meets split learning,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 36, pp. 8485–8493, 2022.
[49] M. Zhang, L. Qu, P. Singh, J. Kalpathy-Cramer, and D. L. Rubin, “Splitavg: A heterogeneity-aware federated deep learning method for medical imaging,” IEEE Journal of Biomedical and Health Informatics, vol. 26, no. 9, pp. 4635–4644, 2022.
[50] A. Tarvainen and H. Valpola, “Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results,” in Proceedings of the Advances in Neural Information Processing systems, vol. 30, 2017.
[51] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014.
[52] O. Bernard, A. Lalande, C. Zotti, F. Cervenansky, X. Yang, P.-A. Heng, I. Cetin, K. Lekadir, O. Camara, M. A. G. Ballester, et al., “Deep learning techniques for automatic mri cardiac multi-structures segmentation and diagnosis: is the problem solved?,” IEEE Transactions on Medical Imaging, vol. 37, no. 11, pp. 2514–2525, 2018.
[53] C. McCollough, “Tu-fg-207a-04: overview of the low dose ct grand challenge,” Med. Phys., vol. 43, no. 6, pp. 3759–3760, 2016.
[54] W. Xia, Z. Lu, Y. Huang, et al., “Ct reconstruction with pdf: parameter-dependent framework for data from multiple geometries and dose levels,” IEEE Transactions on Medical Imaging, vol. 40, no. 11, pp. 3065–3076, 2021.
[55] S. Niu, Y. Gao, Z. Bian, J. Huang, W. Chen, G. Yu, Z. Liang, and J. Ma, “Sparse-view x-ray ct reconstruction via total generalized variation regularization,” Physics in Medicine & Biology, vol. 59, no. 12, p. 2997, 2014.
[56] J. Zhou, N. Wu, Y. Wang, S. Gu, Z. Cao, X. Dong, and K.-K. R. Choo, “A differentially private federated learning model against poisoning attacks in edge computing,” IEEE Transactions on Dependable and Secure Computing, 2022.
[57] B. Knott, S. Venkataraman, A. Hannun, S. Sengupta, M. Ibrahim, and L. van der Maaten, “Crypten: Secure multi-party computation meets machine learning,” Advances in Neural Information Processing Systems, vol. 34, pp. 4961–4973, 2021.
[58] C. Park, D. Hong, and C. Seo, “Evaluating differentially private decision tree model over model inversion attack,” International Journal of Information Security, vol. 21, no. 3, pp. 1–14, 2022.