
¹Tsinghua University  ²Southern University of Science and Technology  ³The Hong Kong Polytechnic University  ⁴Southeast University

Black-box Source-free Domain Adaptation via Two-stage Knowledge Distillation*
*A short version was accepted at the IJCAI 1st International Workshop on Generalizing from Limited Resources in the Open World.

Shuai Wang¹, Daoan Zhang², Zipei Yan³, Shitong Shao⁴, Rui Li¹
Abstract

Source-free domain adaptation aims to adapt deep neural networks using only pre-trained source models and target data. However, access to the source model still raises a potential concern about leaking the source data, which could compromise patient privacy. In this paper, we study a challenging but practical problem: black-box source-free domain adaptation, where only the outputs of the source model and the target data are available. We propose a simple but effective two-stage knowledge distillation method. In Stage I, we train the target model from scratch with soft pseudo-labels generated by the source model in a knowledge distillation manner. In Stage II, we initialize another model as the new student model to avoid the error accumulation caused by noisy pseudo-labels, and feed weakly augmented images to the teacher model to guide the learning of the student model. Our method is simple and flexible, and achieves surprisingly good results on three cross-domain segmentation tasks.

1 Introduction

Deep Neural Networks (DNNs) have achieved remarkable success in the medical image analysis field [13, 23, 8, 22, 28]. However, DNNs are notoriously sensitive to domain shift, i.e., the situation where training and test samples follow different distributions. Domain shift is especially common in medical image analysis because medical images are acquired with varying acquisition parameters or modalities. To tackle this problem, unsupervised domain adaptation (UDA), which aims to transfer knowledge from a labeled source domain to an unlabeled target domain, has been widely explored [4, 25, 26].

Table 1: Different settings. UDA: unsupervised domain adaptation. SFDA: source-free domain adaptation. BSFDA: black-box source-free domain adaptation. S. denotes source domain.
Setting  S. data  S. model
UDA  ✓  ✓
SFDA  ✗  ✓
BSFDA  ✗  ✗
Figure 1: Illustration of the challenging but practical black-box source-free domain adaptation setting.

However, existing UDA methods usually need to access the source data, which raises concerns about data security and fairness. Source-free domain adaptation (SFDA) [5, 16, 20, 32, 29] aims to adapt the model trained on the source domain with target data only, i.e., without access to the source data. Even so, these methods still require the source model, which has two limitations [17]. First, it is possible to recover the source data using generative models [10, 11], which may raise data security problems. Second, SFDA methods usually tune the parameters of the source model, so the target model must employ the same network architecture as the source model, which is impractical for low-resource target users, e.g., community hospitals.

This paper studies a challenging but practical problem: black-box source-free domain adaptation (BSFDA) [17]. The setting is illustrated in Figure 1. Here, we can only access the predictions of the source model via open cloud APIs (e.g., ChatGPT, https://chat.openai.com/) and target data without ground truth. Our goal is to train the target model from scratch (or only with ImageNet pre-trained weights [7]) using the target data and the corresponding predictions from the black-box source model. Compared with UDA or SFDA, BSFDA does not raise concerns about data security or model security, as shown in Table 1.

To tackle this challenging problem, we propose a novel method based on knowledge distillation [12]. Our approach consists of two stages. In the first stage, we obtain soft pseudo-labels from the black-box source model and use them to guide the learning of the target model. We train the target model from scratch (or with ImageNet [7] pre-trained initialization) in a knowledge distillation [12] manner because soft labels usually provide more helpful information ("dark" knowledge [12]) than hard labels. The target model trained in the first stage already achieves promising results but is still suboptimal, because the pseudo-labels from the source model cannot be updated and their quality never improves. To boost the performance of the target model, we design a two-view knowledge distillation scheme in the second stage. We load the model trained in the first stage as the new teacher and randomly initialize another model as the student to avoid error accumulation. The key insight of the second stage is that soft labels obtained under weak augmentation guide the training on strongly augmented images. Specifically, we feed weakly augmented images to the teacher network and strongly augmented images to the student network, and use the outputs of the teacher to guide the outputs of the student, because the teacher's inputs suffer less distortion.

We validate our method on three cross-domain medical image segmentation tasks. Our method is simple but achieves state-of-the-art performance under the black-box source-free domain adaptation setting on all three tasks.

2 Method

Figure 2: The overview of our method. At Stage 0, we train the source model without bells and whistles. After that, we propose a method called two-stage knowledge distillation to tackle the challenging problem: black-box source-free domain adaptation.

The overview of our method is shown in Figure 2. Given a set of source images and pixel-wise segmentation labels $(x_s, y_s) \in \mathcal{D}_s$, we first train the source model with parameters $\theta_s$ without bells and whistles. After that, only the trained source model is provided, without access to the source data or the model parameters. Specifically, only the outputs of the source model for target images $x_t$ are available for domain adaptation in the target domain. We aim to train a model from scratch (or only with ImageNet pre-trained [7] initialization) using just the outputs of the black-box source model and the target images $x_t$, via the proposed method, i.e., two-stage knowledge distillation. We describe the details below.

2.1 Source Model Training

First, we train the source model on the labeled source data $\mathcal{D}_s$ with source images $x_s$ and paired labels $y_s$. Unlike [2, 14], we train the source model using the common cross-entropy loss without bells and whistles. The objective function is formulated as

$\mathcal{L}_{s}=-\mathbb{E}_{(x_{s},y_{s})\in\mathcal{D}_{s}}\,y_{s}\log f_{s}(x_{s}),$ (1)

where $f_s$ denotes the source model with parameters $\theta_s$.

After that, the source model and source data are not accessible. Only the outputs of the source model can be utilized.
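For concreteness, a minimal PyTorch sketch of this Stage 0 training under Eq. (1) is given below; the model, data loader, and hyperparameter names are illustrative placeholders rather than the exact training script.

```python
import torch
import torch.nn as nn

def train_source_model(f_s, source_loader, epochs=200, lr=1e-4, device="cuda"):
    """Stage 0: train the source model with plain cross-entropy (Eq. 1)."""
    f_s = f_s.to(device)
    optimizer = torch.optim.Adam(f_s.parameters(), lr=lr)
    criterion = nn.CrossEntropyLoss()  # expects logits (B, C, H, W) and labels (B, H, W)

    f_s.train()
    for _ in range(epochs):
        for x_s, y_s in source_loader:       # labeled source pairs (x_s, y_s)
            x_s, y_s = x_s.to(device), y_s.to(device)
            logits = f_s(x_s)                # pixel-wise class logits
            loss = criterion(logits, y_s)    # L_s = -E[y_s log f_s(x_s)]
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return f_s
```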

2.2 Two-stage Knowledge Distillation

For a target domain image $x_t$, we can obtain the soft pseudo-label $\hat{y}_t = f_s(x_t)$ from the black-box source model $f_s$ via open APIs. It is not trivial to design a method that trains a new model from scratch using only the pseudo-labels provided by the source model. A simple strategy is self-training, which uses the pseudo-labels $\hat{y}_t$ with a cross-entropy loss. However, two significant concerns arise. First, the pseudo-labels are inevitably noisy due to the distribution shift between the source domain and the target domain, and how to use the knowledge from the source model efficiently remains unclear. Second, the pseudo-labels are frozen because the source model cannot be updated after source training (i.e., Stage 0).
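As a sketch of how the soft pseudo-labels $\hat{y}_t = f_s(x_t)$ could be collected in practice, assuming the black-box source model is exposed through some prediction interface that returns per-pixel class probabilities (the function name query_source_api is a hypothetical placeholder):

```python
import torch

@torch.no_grad()
def collect_soft_pseudo_labels(query_source_api, target_loader, device="cuda"):
    """Query the black-box source model once per target image and cache the
    soft pseudo-labels; only output probabilities are ever seen, never the
    source parameters or source data."""
    cache = []
    for x_t in target_loader:                  # unlabeled target images
        x_t = x_t.to(device)
        y_hat_t = query_source_api(x_t)        # hypothetical API: (B, C, H, W) probabilities
        cache.append((x_t.cpu(), y_hat_t.cpu()))
    return cache
```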

To address these two issues, we propose a novel method consisting of two stages of knowledge distillation. In Stage I, we train the target model $f_t$ from scratch using soft pseudo-labels rather than hard labels, which aims to extract more helpful knowledge from the source domain. In Stage II, another model is initialized randomly (or with ImageNet [7] pre-trained weights) to avoid error accumulation. We then use pseudo-labels obtained under weak data augmentation to guide the learning on strongly augmented images. Next, we describe the details of each stage.

Stage I. In this stage, we use knowledge distillation [12] to extract knowledge from the source model. The reasons for applying a knowledge distillation manner (i.e., soft labels) rather than hard pseudo-labels are twofold. First, soft labels provide "dark" knowledge [12] from the source model. Second, soft pseudo-labels work better than hard pseudo-labels for out-of-domain data, as suggested in [31].

To be specific, we train the target model $f_t$ with parameters $\theta_t$ from scratch as follows

$\mathcal{L}_{1}=D_{\mathrm{KL}}\big(\hat{y}_{t}\,\|\,f_{t}(x_{t})\big),$ (2)

where $D_{\mathrm{KL}}$ denotes the Kullback-Leibler divergence. This stage is effective to a certain extent, but the model $f_t$ is trained on the target domain with noisy and fixed labels $\hat{y}_t$, which is suboptimal for the target domain. Thus, we leverage a second stage to enhance the trained model $f_t$, relying on knowledge distillation between two views of the images.
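A minimal sketch of the Stage I objective in Eq. (2), assuming the cached pseudo-labels are probability maps and $f_t$ outputs raw logits (names, shapes, and the training-loop details are our own assumptions):

```python
import torch
import torch.nn.functional as F

def stage1_kd_loss(soft_pseudo_labels, target_logits):
    """L_1 = D_KL(y_hat_t || f_t(x_t)) from Eq. (2).
    soft_pseudo_labels: (B, C, H, W) probabilities from the black-box source model.
    target_logits:      (B, C, H, W) raw outputs of the target model f_t."""
    log_student = F.log_softmax(target_logits, dim=1)
    # F.kl_div takes log-probabilities as input and probabilities as target,
    # so this computes KL(soft_pseudo_labels || softmax(target_logits)).
    return F.kl_div(log_student, soft_pseudo_labels, reduction="batchmean")

def train_stage1(f_t, pseudo_cache, epochs=100, lr=1e-4, device="cuda"):
    """Stage I: train the target model f_t from scratch on the cached soft labels."""
    f_t = f_t.to(device).train()
    optimizer = torch.optim.Adam(f_t.parameters(), lr=lr)
    for _ in range(epochs):
        for x_t, y_hat_t in pseudo_cache:
            loss = stage1_kd_loss(y_hat_t.to(device), f_t(x_t.to(device)))
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return f_t
```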

Stage II. In Stage II, we aim to improve the performance of the model $f_t$ via knowledge distillation between two views of the same image. We initialize another model $f_{t'}$ with parameters $\theta_{t'}$ randomly (or with ImageNet [7] pre-trained weights) to avoid accumulating the errors introduced by noisy labels in Stage I. The core idea of this stage is to use pseudo-labels produced on weakly augmented images to guide the learning of the student model on strongly augmented images. Specifically, let $\mathcal{T}(x_t)$ and $\mathcal{T}'(x_t)$ denote the weakly and strongly augmented versions of $x_t$, respectively. We feed the weakly augmented images $\mathcal{T}(x_t)$ into $f_t$ to obtain pseudo-labels $\hat{y}'_t = f_t(\mathcal{T}(x_t))$. We then use $\hat{y}'_t$ to guide the learning of $f_{t'}$ on the strongly augmented images $\mathcal{T}'(x_t)$, because weakly augmented images usually yield more reliable pseudo-labels. The loss function of this stage is formulated as follows

$\mathcal{L}_{2}=D_{\mathrm{KL}}\big(\hat{y}^{\prime}_{t}\,\|\,f_{t^{\prime}}(\mathcal{T}^{\prime}(x_{t}))\big).$ (3)

Finally, we obtain the target model $f_{t'}$ for evaluation.
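A corresponding sketch of Stage II and Eq. (3); here weak_aug and strong_aug stand for $\mathcal{T}$ and $\mathcal{T}'$, the Stage I model $f_t$ is frozen as the teacher, and the newly initialized $f_{t'}$ is the student (all names and defaults are illustrative assumptions):

```python
import torch
import torch.nn.functional as F

def train_stage2(f_t, f_t_prime, target_loader, weak_aug, strong_aug,
                 epochs=100, lr=1e-4, device="cuda"):
    """Stage II: two-view distillation, L_2 = D_KL(f_t(T(x_t)) || f_t'(T'(x_t)))."""
    f_t = f_t.to(device).eval()                # teacher: frozen Stage I model
    f_t_prime = f_t_prime.to(device).train()   # student: freshly initialized model
    optimizer = torch.optim.Adam(f_t_prime.parameters(), lr=lr)
    for _ in range(epochs):
        for x_t in target_loader:              # unlabeled target images
            x_t = x_t.to(device)
            with torch.no_grad():              # reliable soft labels from the weak view
                y_hat_prime = F.softmax(f_t(weak_aug(x_t)), dim=1)
            log_student = F.log_softmax(f_t_prime(strong_aug(x_t)), dim=1)
            loss = F.kl_div(log_student, y_hat_prime, reduction="batchmean")
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return f_t_prime                           # final model used for evaluation
```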

Remark. Our method has three advantages. First, it does not require the target model to have the same network architecture as the source model, which is helpful for low-resource target users (e.g., community hospitals). Second, the final target model $f_{t'}$ is trained entirely on target images $x_t$ and is therefore not affected by domain shift. Third, in this setting, the parameters of the source model $f_s$ are not accessible to target users, which ensures the security of the (source) model parameters.

Table 2: Results on the fundus segmentation task. Results of the first block are cited from [5]. We highlight the best results of each column. “-” denotes that the results are not reported and * denotes our implementation.
Methods  DSC [%] ↑ (Disc / Cup / Avg.)  ASD [pixel] ↓ (Disc / Cup / Avg.)
Source  83.18±6.46 / 74.51±16.40 / 78.85  24.15±15.58 / 14.44±11.27 / 19.30
BEAL [30]  89.80 / 81.00 / 85.40  - / - / -
AdvEnt [27]  89.70±3.66 / 77.99±21.08 / 83.86  9.84±3.86 / 7.57±4.24 / 8.71
SRDA [2]  89.37±2.70 / 77.61±13.58 / 83.49  9.91±2.45 / 10.15±5.75 / 10.03
DAE [14]  89.08±3.32 / 79.01±12.82 / 84.05  11.63±6.84 / 10.31±8.45 / 10.97
DPL [5]  90.13±3.06 / 79.78±11.05 / 84.96  9.43±3.46 / 9.01±5.59 / 9.22
Source*  88.34±4.48 / 71.35±22.75 / 79.85±12.59  10.65±4.27 / 10.75±5.34 / 10.70±4.18
EMD [19]  90.50±3.78 / 73.50±11.56 / 82.00±8.76  10.52±4.18 / 7.12±4.15 / 8.82±2.59
Ours  94.78±2.65 / 77.79±12.33 / 86.28±7.04  4.41±2.09 / 8.75±5.27 / 6.58±3.14
Table 3: Results on the cardiac dataset and prostate dataset in terms of DSC.
Methods  Cardiac DSC [%] (RV / Myo / LV / Avg.)  Prostate DSC [%]
Source  40.28±26.73 / 48.83±10.84 / 76.45±10.21 / 55.19±14.19  47.50±26.21
EMD [19]  47.59±28.46 / 53.67±9.79 / 75.48±9.58 / 58.91±13.48  52.47±23.18
Ours  51.10±24.67 / 55.45±8.88 / 77.12±9.01 / 61.22±12.41  56.12±20.72
Figure 3: Qualitative results of different methods on RIM-ONE-r3 dataset.
Table 4: Results on the cardiac dataset and prostate dataset in terms of ASD.
Methods  Cardiac ASD [pixel] (RV / Myo / LV / Avg.)  Prostate ASD [pixel]
Source  4.50±3.42 / 4.60±2.51 / 5.78±2.02 / 4.96±1.74  9.80±8.84
EMD [19]  2.12±1.47 / 4.25±1.95 / 5.13±2.78 / 3.83±1.41  8.11±7.85
Ours  1.21±1.05 / 3.98±1.56 / 5.01±1.95 / 3.40±1.32  6.12±5.42

3 Experiments

3.1 Dataset

We evaluate our method on three types of datasets: fundus, cardiac, and prostate.

Fundus. We choose the training set of the REFUGE challenge [21] as the source domain and the public RIM-ONE-r3 dataset [9] as the target domain. Following [5, 30], we split the source domain into 320/80 and the target domain into 99/60 images for training and test, respectively. Each disc region is resized to 512×512 before being fed to the network during training and testing.

Cardiac. We use the ACDC dataset [3] (200 volumes) as the source domain and the LGE dataset (45 volumes) from the Multi-sequence Cardiac MR Segmentation Challenge (MSCMR 2019) [33] as the target domain. For both datasets, we randomly split the volumes into 80% for training and 20% for test. We train on 2D slices and resize all images to 192×192 as the network input.

Prostate. MSD05 [1] (32 volumes) is used as the source domain and PROMISE12 [18] (50 volumes) as the target domain. We randomly split each dataset into 80%/20% for training and test, respectively. We train on 2D slices and resize all images to 224×224 during training.

Table 5: Results with different network architectures for the source and target models on the RIM-ONE-r3 dataset. Source model: DeepLabv3+ [6] with MobileNetV2 [24] backbone (5.8 M parameters). Target model: UNet [23] (1.8 M parameters).
Methods  Dice [%] ↑ (Cup / Disc / Avg.)  ASD [pixel] ↓ (Cup / Disc / Avg.)
Stage I  46.39±34.86 / 75.06±27.98 / 60.73±29.42  14.90±11.52 / 15.61±14.45 / 15.26±8.95
Stage II w/o aug  48.45±33.66 / 76.18±25.42 / 62.32±27.4  13.78±10.15 / 12.65±11.48 / 13.22±8.54
Stage II w/ aug  54.22±33.13 / 75.89±28.28 / 65.05±28.75  13.27±9.76 / 12.81±10.63 / 13.04±8.87
EMD [19]  47.14±32.56 / 75.48±26.77 / 61.31±28.65  13.54±10.88 / 14.75±12.64 / 14.15±8.78
Table 6: Ablation Study on three datasets. Metric: DSC (%). “aug” denotes strong augmentation in Stage II as introduced in Sec. 2.2.
Stage I  Stage II  w/ aug  Fundus  Cardiac  Prostate
✗  ✗  ✗  79.85±12.59  55.19±14.19  47.50±26.21
✓  ✗  ✗  81.59±9.83  58.87±13.37  51.68±24.56
✓  ✓  ✗  83.61±8.14  59.46±13.24  53.87±22.99
✓  ✓  ✓  86.28±7.04  61.22±12.41  56.12±20.72

3.2 Implementation

Following [5], we choose DeepLabv3+ [6] with MobileNetV2 [24] as the backbone. We use the Adam optimizer [15] with a learning rate of 1e-4 and a batch size of 8. We train the source model for 200 epochs and subsequently train the target model in two stages, each consisting of 100 epochs. The weak augmentation only includes Gaussian noise, while the strong augmentation includes Gaussian blur, contrast adjustment, brightness adjustment, and gamma augmentation. For more details on data augmentation, we refer readers to [13]. All experiments are implemented in PyTorch on one RTX A6000 GPU.
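To illustrate the weak/strong split, a rough torchvision-based sketch is given below; the specific blur kernel and jitter ranges are our own assumptions and do not reproduce the exact augmentation parameters referred to above. Images are assumed to be float tensors in [0, 1] with 1 or 3 channels.

```python
import torch
import torchvision.transforms.functional as TF

def weak_aug(x, noise_std=0.05):
    """Weak view T(x): additive Gaussian noise only."""
    return (x + noise_std * torch.randn_like(x)).clamp(0.0, 1.0)

def strong_aug(x):
    """Strong view T'(x): Gaussian blur plus contrast, brightness, and gamma jitter."""
    x = TF.gaussian_blur(x, kernel_size=5)                                 # assumed kernel size
    x = TF.adjust_contrast(x, float(torch.empty(1).uniform_(0.7, 1.3)))    # assumed range
    x = TF.adjust_brightness(x, float(torch.empty(1).uniform_(0.7, 1.3)))  # assumed range
    x = TF.adjust_gamma(x, float(torch.empty(1).uniform_(0.7, 1.5)))       # assumed range
    return x.clamp(0.0, 1.0)
```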

For evaluation, we use two commonly used metrics in medical image segmentation: the Dice Score (DSC) as a pixel-wise measure and the Average Surface Distance (ASD) for measuring performance at the object boundary. A higher DSC and a lower ASD indicate better performance.
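For reference, a minimal Dice (DSC) computation for one binary structure is sketched below; ASD additionally requires extracting object surfaces (e.g., via distance transforms) and is omitted for brevity.

```python
import torch

def dice_score(pred_mask, gt_mask, eps=1e-6):
    """Dice similarity coefficient between two binary masks of the same shape.
    pred_mask, gt_mask: boolean or {0, 1} tensors; returns a value in [0, 1]."""
    pred = pred_mask.float().flatten()
    gt = gt_mask.float().flatten()
    intersection = (pred * gt).sum()
    return (2.0 * intersection + eps) / (pred.sum() + gt.sum() + eps)
```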

3.3 Comparison Study

We mainly compare our method with: BEAL [30] and AdvEnt [27], two domain adaptation methods; SRDA [2], DAE [14], and DPL [5], three source-free domain adaptation methods; and EMD [19], the recent state-of-the-art method for black-box source-free domain adaptation.

The quantitative results on the fundus dataset are listed in Table 2. Our method outperforms the baseline ("Source" in Table 2) by a significant margin: 6.43% in DSC and 4.12 pixels in ASD. Compared with the other black-box method, EMD, our method outperforms it by 4.28% in DSC and 2.27 pixels in ASD. Furthermore, our method surpasses all compared methods, including the domain adaptation methods (e.g., BEAL [30] and AdvEnt [27]). These results demonstrate the effectiveness of our approach.

We present qualitative results in Figure 3, which shows the segmentation results of two cases on the fundus dataset. Our method generates more compact and accurate predictions than the compared methods.

Furthermore, we evaluate our method on the cardiac and prostate datasets; the results are presented in Table 3 (DSC) and Table 4 (ASD). The proposed method consistently improves the baseline on both datasets. Compared with the recent black-box source-free domain adaptation method EMD [19], our method outperforms EMD by 2.3% and 3.7% DSC on the two datasets, respectively.

Lastly, we evaluate our method in the setting where target users only have limited computational resources. To simulate low-resource target users, we adopt DeepLabv3+ [6] with a MobileNetV2 [24] backbone (5.8 M parameters) as the source model and UNet [23] (1.8 M parameters) as the target model, keeping the other training details the same as in previous experiments. The quantitative results on the fundus dataset are listed in Table 5. Our method still outperforms EMD [19] by a significant margin. More importantly, this experiment demonstrates the flexibility of black-box source-free domain adaptation when target users have limited computational resources: without any ground truth on the target domain, the small target model (e.g., UNet with 1.8 M parameters) achieves promising results of 65.05% DSC and 13.04 pixel ASD using only the outputs from the (larger) source model.

3.4 Ablation Study

In this section, we conduct ablation studies on the three datasets. The quantitative results are listed in Table 6. The target model trained with Stage I alone already improves the baseline by a significant margin: Stage I brings 1.7%, 3.7%, and 4.2% DSC gains over the baseline on the three datasets. The performance is further improved by Stage II. Finally, with the help of strong data augmentation, we achieve the best performance on all three datasets, which demonstrates the importance of strong data augmentation.

4 Conclusion

In this paper, we address the challenging yet practical problem of black-box source-free domain adaptation. We propose a new method to tackle this problem, whose key idea is to transfer knowledge from the black-box source model to the target model via two-stage knowledge distillation. In Stage I, we transfer knowledge from the black-box source model to the target model. In Stage II, we regard the target model as the new teacher to guide learning on augmented images. Thanks to the two-stage knowledge distillation, our model achieves remarkable performance on the target domain without any ground truth. Finally, we conduct extensive experiments on three medical image segmentation tasks, and the results demonstrate the effectiveness of the proposed method.

References

  • [1] Antonelli, M., Reinke, A., Bakas, S., Farahani, K., et al.: The medical segmentation decathlon. Nature Communications 13(1) (Jul 2022)
  • [2] Bateson, M., Kervadec, H., Dolz, J., Lombaert, H., Ben Ayed, I.: Source-relaxed domain adaptation for image segmentation. In: MICCAI. pp. 490–499. Springer (2020)
  • [3] Bernard, O., Lalande, A., Zotti, C., et al.: Deep learning techniques for automatic MRI cardiac multi-structures segmentation and diagnosis: is the problem solved? IEEE Transactions on Medical Imaging 37(11), 2514–2525 (2018)
  • [4] Chen, C., Dou, Q., Chen, H., Qin, J., Heng, P.A.: Unsupervised bidirectional cross-modality adaptation via deeply synergistic image and feature alignment for medical image segmentation. IEEE Transactions on Medical Imaging 39(7), 2494–2505 (2020)
  • [5] Chen, C., Liu, Q., Jin, Y., Dou, Q., Heng, P.A.: Source-free domain adaptive fundus image segmentation with denoised pseudo-labeling. In: MICCAI. pp. 225–235. Springer (2021)
  • [6] Chen, L., Zhu, Y., Papandreou, G., Schroff, F., Adam, H.: Encoder-decoder with atrous separable convolution for semantic image segmentation. In: ECCV (2018)
  • [7] Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: Imagenet: A large-scale hierarchical image database. In: CVPR (2009)
  • [8] Esteva, A., Kuprel, B., Novoa, R.A., Ko, J., Swetter, S.M., Blau, H.M., Thrun, S.: Dermatologist-level classification of skin cancer with deep neural networks. Nature 542(7639), 115–118 (Jan 2017)
  • [9] Fumero, F., Alayón, S., Sanchez, J.L., Sigut, J., Gonzalez-Hernandez, M.: RIM-ONE: An open retinal image database for optic nerve evaluation. In: 2011 24th International Symposium on Computer-Based Medical Systems (CBMS). pp. 1–6. IEEE (2011)
  • [10] Gao, J., Zhang, J., Liu, X., Darrell, T., Shelhamer, E., Wang, D.: Back to the source: Diffusion-driven test-time adaptation. arXiv preprint arXiv:2207.03442 (2022)
  • [11] Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative adversarial nets. In: Advances in Neural Information Processing Systems (2014)
  • [12] Hinton, G., Vinyals, O., Dean, J.: Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531 (2015)
  • [13] Isensee, F., Jaeger, P.F., Kohl, S.A.A., Petersen, J., Maier-Hein, K.H.: nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation. Nature Methods 18(2), 203–211 (Dec 2020)
  • [14] Karani, N., Erdil, E., Chaitanya, K., Konukoglu, E.: Test-time adaptable neural networks for robust medical image segmentation. Medical Image Analysis 68, 101907 (2021)
  • [15] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. In: ICLR (2015)
  • [16] Liang, J., Hu, D., Feng, J.: Do we really need to access the source data? source hypothesis transfer for unsupervised domain adaptation. In: ICML (2020)
  • [17] Liang, J., Hu, D., Feng, J., He, R.: Dine: Domain adaptation from single and multiple black-box predictors. In: CVPR (2022)
  • [18] Litjens, G., Toth, R., van de Ven, W., Hoeks, C., Kerkstra, S., van Ginneken, B., Vincent, G., Guillard, G., Birbeck, N., Zhang, J., et al.: Evaluation of prostate segmentation algorithms for MRI: the PROMISE12 challenge. Medical Image Analysis 18(2), 359–373 (2014)
  • [19] Liu, X., Yoo, C., Xing, F., Kuo, C.C.J., Fakhri, G.E., Kang, J.W., Woo, J.: Unsupervised black-box model domain adaptation for brain tumor segmentation. Frontiers in Neuroscience 16 (Jun 2022)
  • [20] Liu, Y., Zhang, W., Wang, J.: Source-free domain adaptation for semantic segmentation. In: CVPR. pp. 1215–1224 (2021)
  • [21] Orlando, J.I., Fu, H., Breda, J.B., Van Keer, K., Bathula, D.R., Diaz-Pinto, A., Fang, R., Heng, P.A., Kim, J., Lee, J., et al.: REFUGE challenge: A unified framework for evaluating automated methods for glaucoma assessment from fundus photographs. Medical Image Analysis 59, 101570 (2020)
  • [22] Rajpurkar, P., Chen, E., Banerjee, O., Topol, E.J.: AI in health and medicine. Nature Medicine 28(1), 31–38 (Jan 2022)
  • [23] Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: MICCAI (2015)
  • [24] Sandler, M., Howard, A.G., Zhu, M., Zhmoginov, A., Chen, L.: Mobilenetv2: Inverted residuals and linear bottlenecks. In: CVPR (2018)
  • [25] Tsai, Y.H., Hung, W.C., Schulter, S., Sohn, K., Yang, M.H., Chandraker, M.: Learning to adapt structured output space for semantic segmentation. In: CVPR. pp. 7472–7481 (2018)
  • [26] Tzeng, E., Hoffman, J., Saenko, K., Darrell, T.: Adversarial discriminative domain adaptation. In: CVPR. pp. 7167–7176 (2017)
  • [27] Vu, T.H., Jain, H., Bucher, M., Cord, M., Pérez, P.: Advent: Adversarial entropy minimization for domain adaptation in semantic segmentation. In: CVPR. pp. 2517–2526 (2019)
  • [28] Wang, S., Yan, Z., Zhang, D., Wei, H., Li, Z., Li, R.: Prototype knowledge distillation for medical segmentation with missing modality. In: ICASSP (2023)
  • [29] Wang, S., Zhang, D., Yan, Z., Zhang, J., Li, R.: Feature alignment and uniformity for test time adaptation. In: CVPR (2023)
  • [30] Wang, S., Yu, L., Li, K., Yang, X., Fu, C.W., Heng, P.A.: Boundary and entropy-driven adversarial learning for fundus image segmentation. In: MICCAI. pp. 102–110. Springer (2019)
  • [31] Xie, Q., Luong, M., Hovy, E.H., Le, Q.V.: Self-training with noisy student improves imagenet classification. In: CVPR (2020)
  • [32] Yang, S., Wang, Y., Van De Weijer, J., Herranz, L., Jui, S.: Generalized source-free domain adaptation. In: ICCV. pp. 8978–8987 (2021)
  • [33] Zhuang, X.: Multivariate mixture model for myocardial segmentation combining multi-source images. IEEE Transactions on Pattern Analysis and Machine Intelligence 41(12), 2933–2946 (2018)