
Learning from Different Samples: A Source-free Framework for Semi-supervised Domain Adaptation

Xinyang Huang1, Chuang Zhu1, Bowen Zhang1, Shanghang Zhang2
Abstract

Semi-supervised domain adaptation (SSDA) has been widely studied due to its ability to utilize a few labeled target samples to improve the generalization ability of the model. However, existing methods only design generic adaptation strategies for target samples and do not explore customized learning for different types of target samples. When the model encounters a complex target distribution, existing methods perform poorly because they cannot clearly and comprehensively learn the knowledge carried by multiple types of target samples. To fill this gap, this paper focuses on designing a framework that uses different strategies to comprehensively mine different target samples. We propose a novel source-free framework (SOUF) to achieve semi-supervised fine-tuning of the source pre-trained model on the target domain. Different from existing SSDA methods, SOUF decouples SSDA from the perspective of different target samples, designing robust learning techniques specifically for unlabeled, reliably labeled, and noisy pseudo-labeled target samples. For unlabeled target samples, probability-based weighted contrastive learning (PWC) helps the model learn more discriminative feature representations. To mine the latent knowledge of labeled target samples, reliability-based mixup contrastive learning (RMC) learns complex knowledge from a constructed reliable sample set. Finally, predictive regularization learning (PR) further mitigates the misleading effect of noisy pseudo-labeled samples on the model. Extensive experiments on benchmark datasets demonstrate the superiority of our framework over state-of-the-art methods. Our code will be available after acceptance.

Introduction

Deep neural networks (DNNs) have achieved great success in many computer vision tasks (Krizhevsky, Sutskever, and Hinton 2012, 2017; Simonyan and Zisserman 2014). However, these networks require abundant labels during training (Rawat and Wang 2017; Alzubaidi et al. 2021). Domain adaptation (DA) was therefore proposed to alleviate this problem by helping a model trained on a richly labeled source domain transfer to a target domain with a different distribution and no or few labels (Ganin and Lempitsky 2015; Pan et al. 2010). Depending on the number of target labels, DA can be divided into unsupervised domain adaptation (UDA) and semi-supervised domain adaptation (SSDA). This paper focuses on SSDA, which can utilize a few target labels to expand semantic information and learn target knowledge, thereby achieving domain alignment.

Figure 1: Differences between the proposed method and existing methods. Figures (a) and (b) show that our method focuses on fine-tuning in a semi-supervised target domain without the source data. Based on this learning framework, SOUF performs customized learning for different types of target samples. Figure (c) shows that our method achieves state-of-the-art performance by a large margin on the DomainNet and Office-Home datasets.

Due to its practical advantages, SSDA has attracted growing attention and has been widely studied (Saito et al. 2019; Li et al. 2021a, b; Yan et al. 2022; Yu and Lin 2023; Huang, Zhu, and Chen 2023; Li, Li, and Yu 2023, 2024; He, Liu, and Yin 2024). However, existing methods focus on target sample learning strategies and ignore the importance of customized learning for different types of target samples. Most existing methods (Yan et al. 2022; Huang, Zhu, and Chen 2023; Yu and Lin 2023) only design a series of learning strategies for unlabeled samples, but when the distribution of unlabeled target samples is biased (e.g., by noisy pseudo-labeled samples), the learned knowledge is unreliable. At the same time, existing methods (Yan et al. 2022; Li, Li, and Yu 2023) ignore the latent deep knowledge of labeled target samples, which is difficult to learn with simple supervised learning. Therefore, it is necessary to design a framework that uses different strategies for the comprehensive mining of different target samples. To fill this gap, this paper proposes a novel source-free framework called SOUF, which decouples SSDA from the perspective of learning different samples. Decoupling the target domain according to different samples helps the model better understand the target domain by learning different sub-tasks. Compared with existing SSDA methods, our framework can more fully utilize the powerful representation and transfer capabilities of the model, thereby better adapting to semi-supervised target domains.

As shown in Figure 1, unlike the training paradigm of most existing SSDA methods, this paper considers a source-free scenario (Liang, Hu, and Feng 2020), i.e., fine-tuning the source-pretrained model in the target domain. SOUF decouples the semi-supervised target domain from the perspective of learning from unlabeled, reliably labeled, and noisy pseudo-labeled target samples.

Specifically, we propose probability-based weighted contrastive learning (PWC) for unlabeled target samples to help the model learn more discriminative representations. The adaptive weights assign lower weights to low-confidence samples to reduce the impact of erroneous semantic information, which better helps the model learn unlabeled target knowledge (Zhang et al. 2023). In the semi-supervised fine-tuning stage, it is also essential to further learn the latent knowledge of labeled samples. We combine labeled samples with high-confidence unlabeled samples to construct a new set of reliable labeled samples. We propose reliability-based mixup contrastive learning (RMC), which mixes transformer patches from the constructed reliable labeled sample set and learns complex target representations, capturing the commonalities and differences between different samples (Zhang et al. 2017; Chen et al. 2022; Zhu, Bai, and Wang 2023). Finally, even if the model learns the feature knowledge of labeled and unlabeled samples through sufficient contrastive learning, the learned knowledge can still be biased when there is much noise in the target pseudo-labels (Song et al. 2019; Liu et al. 2020). From the perspective of predictive regularization learning (PR), we leverage earlier predictions of pseudo-labeled samples to constrain the current probabilistic output, improving the performance of the model when facing complex target data.

Our contributions can be summarized as follows: 1) We propose a novel source-free framework for SSDA scenarios to decouple and customize learning of different types of target samples. 2) We design a series of novel learning methods for unlabeled, reliably labeled, and noisy pseudo-labeled target samples. 3) Our framework is one of the first attempts to solve SSDA using a source-free transformer-based framework. 4) Compared with existing SSDA methods, our method achieves state-of-the-art performance and does not need to access any source data during the semi-supervised adaptation stage.

Related Work

Unsupervised Domain Adaptation

To avoid the extensive manual annotation required by supervised learning, unsupervised domain adaptation (UDA) aims to transfer knowledge from a fully labeled source domain to an unlabeled target domain. A basic idea is to use the maximum mean discrepancy (MMD) (Gretton et al. 2012), which achieves transfer from the source domain to the target domain by minimizing the discrepancy between feature distributions. DANN (Ganin et al. 2016) aligns domains through adversarial training, while JAN (Long et al. 2017) learns transfer networks by aligning the distributions of multiple domain-specific layers with an MMD-based criterion. Meanwhile, many recent works (Long et al. 2018; Hoffman et al. 2018; Xie et al. 2018; Shen et al. 2018; Ge et al. 2023) perform domain alignment from the perspective of adversarial learning. Compared with CNN-based methods, thanks to the good generalization ability of the transformer (Vaswani et al. 2017), many works (Yang et al. 2023; Xu et al. 2021; Sun et al. 2022; Zhu, Bai, and Wang 2023) have also explored the application of the transformer in UDA. However, because they can utilize additional target domain information, SSDA methods usually perform better than UDA methods in the target domain.

Semi-supervised Domain Adaptation

Semi-supervised domain adaptation (SSDA) aims to use a small number of labeled target samples to help the model better adapt to the target domain. Existing works can be roughly divided into cross-domain alignment, adversarial training, and semi-supervised learning methods. For cross-domain alignment, related works (Li et al. 2021b; Singh et al. 2021; Yang et al. 2021; He, Liu, and Yin 2024) integrate various complementary domain alignment techniques. G-ABC (Li, Li, and Yu 2023) further achieves semantic alignment by forcing the transfer from labeled source and target data to unlabeled target samples. Using the idea of adversarial training, related methods (Saito et al. 2019; Qin et al. 2021; Jiang et al. 2020; Kim and Kim 2020; Li et al. 2021a; Qin et al. 2022; Ma et al. 2022) solve SSDA by adversarially optimizing the entropy between prototypes and nearby unlabeled target samples. Among semi-supervised learning methods, MCL (Yan et al. 2022) and ProML (Huang, Zhu, and Chen 2023) further help the model understand target domains that lack a large number of labels through consistency regularization. However, existing methods ignore the importance of customized learning for different types of target samples. This paper decouples SSDA and proposes a learning framework called SOUF to fully learn the target domain from the perspective of learning different samples.

Figure 2: Illustration of our source-free framework. For unlabeled samples, probability-weighted contrastive learning (PWC) adaptively learns discriminative features, enhancing credible probability outputs. To further exploit labeled samples, reliability-based mixup contrastive learning (RMC) mixes patches from reliable samples, aiding in complex representation learning. Predictive regularization (PR) minimizes the impact of erroneous semantic information from noisy labels.

Methodology

Preliminaries

In SSDA, the source dataset $\mathcal{D}_s=\{x_i^s,y_i^s\}_{i=1}^{N_s}$ is fully labeled, and $\mathcal{D}_t=\{x_i^t,y_i^t\}_{i=1}^{N_t}$ contains a small amount of labeled target-domain data, where $N_s$ and $N_t$ are the sizes of the source and labeled target datasets, respectively. $x_i^s$ and $x_i^t$ denote the labeled source and target images, and $y_i^s$ and $y_i^t$ denote the corresponding labels. There is also an unlabeled target image set $\mathcal{D}_u=\{x_i^u\}_{i=1}^{N_u}$ for adaptation in the target domain, with $N_u \gg N_t$. Our framework comprises a transformer-based feature extractor $g(\cdot)$ and a linear classifier $f(\cdot)$, as shown in Figure 2. Following (Huang, Zhu, and Chen 2023), we generate a strongly augmented view $\hat{x}_i^u$ for each unlabeled target sample $x_i^u$. The target samples are then fed to the same feature extractor $g(\cdot)$ and classifier $f(\cdot)$ to obtain the probabilistic predictions $p_i^u$ and $\hat{p}_i^u$. We use the cross-entropy loss to train $g(\cdot)$ and $f(\cdot)$ on the source domain. Following (Liang, Hu, and Feng 2020; Ma et al. 2022), we freeze $f(\cdot)$ and train $g(\cdot)$ in the target domain. Based on this setup, we next introduce the proposed learning framework and the training procedure for further target learning.
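A minimal PyTorch-style sketch of this setup is given below (the component names are our own; this is not the authors' released code). It freezes the classifier $f(\cdot)$, leaves only the feature extractor $g(\cdot)$ trainable, and produces the probabilistic predictions used by the losses that follow.

```python
import torch
import torch.nn as nn

class SourceFreeModel(nn.Module):
    """Sketch of the source-free setup: transformer backbone g(.) + frozen linear classifier f(.)."""

    def __init__(self, feature_extractor: nn.Module, classifier: nn.Module):
        super().__init__()
        self.g = feature_extractor            # trainable in the target domain
        self.f = classifier                   # frozen, following SHOT-style adaptation
        for p in self.f.parameters():
            p.requires_grad = False

    def forward(self, x):
        # probabilistic prediction p = softmax(f(g(x)))
        return torch.softmax(self.f(self.g(x)), dim=-1)

# Each unlabeled image x_u is also strongly augmented (e.g., RandAugment) to obtain x_u_hat;
# both views pass through the same g(.) and f(.) to yield p_u and p_u_hat.
```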

Probability-based Weighted Contrastive Learning

After pre-training in the source domain, we fine-tune the feature extractor in the semi-supervised target domain. An effective method is therefore needed to help the model learn target knowledge representations from the large number of unlabeled target samples. In recent years, contrastive learning (Oord, Li, and Vinyals 2018; Grill et al. 2020; He et al. 2020) has proven to be a reliable representation learning method that helps models better understand data and learn useful knowledge representations by constraining the feature representations between samples. However, feature-level contrastive learning cannot capture the actual target distribution underlying the unlabeled target features, which harms the generalization ability of the classifier in the target domain.

To address these challenges, we propose probability-based weighted contrastive learning (PWC) to help the model better adapt to the target domain. Specifically, we consider the following loss:

L_{\mathrm{pwc}}=-\sum_{i=1}^{2N_{u}}\sum_{k=1}^{2N_{u}}w_{ik}\log\frac{\exp(p_{i}^{u}\cdot p_{k}^{u+}/\tau)}{\sum_{j=1}^{2N_{u}}\mathbb{I}(j\neq i)\exp(p_{i}^{u}\cdot p_{j}^{u}/\tau)},  (1)

where $p_{k}^{u+}$ denotes the predicted probability of the positive target sample $k$, $\mathbb{I}(\cdot)$ is the indicator function, $\tau$ is the temperature coefficient, and the adaptive weight $w_{ik}$ is defined as follows:

w_{ik}=\begin{cases}1 & \text{if } k=i,\\ p_{i}^{u}\cdot p_{k}^{u} & \text{if } i \text{ and } k \text{ belong to the same category},\\ 0 & \text{otherwise}.\end{cases}  (2)

Compared with instance-level contrastive learning, PWC learns category-level target relationships and better exploits the feature representations extracted by the model. When measuring the similarity between two samples of the same category, the probability product $p_{i}^{u}\cdot p_{k}^{u}$ equals 1 only if both predictions assign the maximum probability of 1 to that category. Probability-based contrastive learning therefore constrains the model to make low-entropy judgments for similar target samples and to learn more discriminative feature representations. In addition, the adaptive weight $w$ in Eq. 2 down-weights low-confidence sample pairs, reducing the errors that can arise when learning target sample relationships and helping the model learn more accurate sample relationships in the target domain.
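The following is a minimal sketch of how Eq. 1-2 can be computed for a batch of unlabeled predictions (the function name and the use of pseudo-labels to decide which pairs share a category are our assumptions, not the authors' implementation):

```python
import torch

def pwc_loss(p, pseudo_labels, tau=0.15):
    """Probability-based weighted contrastive loss (Eq. 1-2), batch-averaged for readability.

    p:             (2*N_u, C) softmax predictions of the weak and strong views.
    pseudo_labels: (2*N_u,) pseudo-labels, used only to decide which pairs share a category.
    """
    n = p.size(0)
    prob_prod = p @ p.t()                                  # p_i . p_k for all pairs
    sim = prob_prod / tau
    diag = torch.eye(n, dtype=torch.bool, device=p.device)

    # adaptive weights w_ik (Eq. 2): 1 on the diagonal, p_i . p_k for same-category pairs, 0 otherwise
    same_cat = pseudo_labels.unsqueeze(0) == pseudo_labels.unsqueeze(1)
    w = torch.where(same_cat, prob_prod, torch.zeros_like(prob_prod))
    w = torch.where(diag, torch.ones_like(w), w)

    # log-softmax whose denominator excludes the identical pair j == i (Eq. 1)
    denom = torch.logsumexp(sim.masked_fill(diag, float('-inf')), dim=1, keepdim=True)
    log_prob = sim - denom

    return -(w * log_prob).sum() / n
```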

Reliability-based Mixup Contrastive Learning

In the target adaptation stage, the model should also learn the latent knowledge of labeled samples (Berthelot et al. 2019), which existing SSDA methods ignore. Beyond labeled samples alone, we combine labeled and high-confidence unlabeled samples to build a new set of reliably labeled samples and use it to learn complex target knowledge.

Constructing reliable labeled samples. We mix image patches drawn from the new set of reliable labeled samples to construct mixed samples and learn complex knowledge representations. Mixing samples from this reliably labeled set yields more trustworthy knowledge than directly mixing unlabeled samples: learning complex target knowledge relies on reliable label information (Yao et al. 2018; Wang et al. 2022; Albert et al. 2021; Chen et al. 2023), whereas the pseudo-labels of low-confidence unlabeled samples may contain noise, and mixing them directly would cause the model to learn incorrect target knowledge. Specifically, the new reliable sample set is constructed as follows:

\mathcal{D}_{reliable}=\mathcal{D}_{t}\cup\mathcal{D}_{high-conf}=\{x^{r}_{i},y^{r}_{i}\}^{N_{t}+C\times K}_{i=1}=\{x^{r}_{i},y^{r}_{i}\}^{N_{r}}_{i=1},  (3)

where $\mathcal{D}_{t}$ is the labeled sample set, $C$ is the total number of categories, $\mathcal{D}_{high-conf}=\{x^{h}_{i},y^{h}_{i}\}^{C\times K}_{i=1}$ contains the top-$K$ unlabeled samples with the smallest prediction entropy in each category $c$, and $y^{h}_{i}$ is the pseudo-label of $x^{h}_{i}$ output by the model.
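A small sketch of the high-confidence selection in Eq. 3 is shown below (hypothetical helper name; entropy is computed from the model's softmax predictions):

```python
import torch

def select_high_conf(p_u, K=2):
    """Pick the top-K lowest-entropy unlabeled samples per predicted class (Eq. 3 sketch).

    p_u: (N_u, C) softmax predictions of the unlabeled target samples.
    Returns the selected sample indices and their pseudo-labels.
    """
    entropy = -(p_u * torch.log(p_u + 1e-8)).sum(dim=1)       # per-sample prediction entropy
    pseudo = p_u.argmax(dim=1)
    keep_idx, keep_lab = [], []
    for c in range(p_u.size(1)):
        idx_c = (pseudo == c).nonzero(as_tuple=True)[0]
        if idx_c.numel() == 0:
            continue
        topk = idx_c[entropy[idx_c].argsort()[:K]]             # smallest entropy within class c
        keep_idx.append(topk)
        keep_lab.append(torch.full((topk.numel(),), c, dtype=torch.long, device=p_u.device))
    return torch.cat(keep_idx), torch.cat(keep_lab)

# D_reliable is then the union of the labeled target set D_t and these high-confidence samples.
```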

For samples from the new reliable sample set $\mathcal{D}_{reliable}$, we mix image patches from two different samples:

\mathcal{X}_{kn}=\lambda_{n}\cdot x_{in}^{r}+(1-\lambda_{n})\cdot x_{jn}^{r},  (4)

where $x_{in}^{r}$ denotes the $n$-th patch of the $i$-th sample from the reliable labeled sample set $\mathcal{D}_{reliable}$, $\mathcal{X}_{kn}$ denotes the $n$-th patch of the $k$-th mixed sample, and $\lambda_{n}$ is sampled from a learnable beta distribution $\mathrm{Beta}(\beta,\gamma)$.
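A sketch of the patch-level mixing in Eq. 4 is given below; here $\beta$ and $\gamma$ are treated as fixed scalars for simplicity, whereas the paper treats the beta distribution as learnable:

```python
import torch

def patch_mixup(patches_i, patches_j, beta=1.0, gamma=1.0):
    """Patch-level mixup of two reliable samples (Eq. 4 sketch).

    patches_i, patches_j: (m, D) patch embeddings (or flattened pixel patches) of two samples.
    Returns the mixed patches and the per-patch coefficients lambda_n ~ Beta(beta, gamma).
    """
    m = patches_i.size(0)
    lam = torch.distributions.Beta(beta, gamma).sample((m, 1)).to(patches_i.device)
    mixed = lam * patches_i + (1.0 - lam) * patches_j
    return mixed, lam.squeeze(1)
```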

Mixup cross-entropy learning. For the mixed samples $\mathcal{X}_{k}$, we first compute the mixup cross-entropy loss (Zhang et al. 2017):

L_{\mathrm{mix-ce}}=-\sum_{k=1}^{N_{r}}\left(\hat{\lambda}_{k}\cdot y^{r}_{i}\log(\mathcal{P}_{k})+(1-\hat{\lambda}_{k})\cdot y^{r}_{j}\log(\mathcal{P}_{k})\right),  (5)

where $\mathcal{P}_{k}$ is the probability output of $\mathcal{X}_{k}$, $y^{r}_{i}$ and $y^{r}_{j}$ are the labels of the components $x^{r}_{i}$ and $x^{r}_{j}$ of the mixed sample $\mathcal{X}_{k}$, and $\hat{\lambda}_{k}$ is the adaptive coefficient obtained by rescaling with the attention scores of the transformer:

\hat{\lambda}_{k}=\frac{\sum_{n=1}^{m}\lambda_{n}a_{in}}{\sum_{n=1}^{m}\lambda_{n}a_{in}+\sum_{n=1}^{m}(1-\lambda_{n})a_{jn}},  (6)

where $a_{in}$ is the average over layers of the summed attention scores assigned to the $n$-th patch, and $m$ is the number of patches.
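A sketch of the attention-rescaled coefficient (Eq. 6) and the mixup cross-entropy (Eq. 5) is shown below (function names are our own; attention scores are assumed to be pre-extracted from the backbone):

```python
import torch
import torch.nn.functional as F

def rescale_lambda(lam, attn_i, attn_j):
    """Attention-rescaled mixing coefficient lambda_hat (Eq. 6 sketch).

    lam:            (B, m) per-patch mixing coefficients for each mixed sample.
    attn_i, attn_j: (B, m) layer-averaged attention scores a_in, a_jn of the two components.
    """
    num = (lam * attn_i).sum(dim=1)
    den = num + ((1.0 - lam) * attn_j).sum(dim=1)
    return num / den                                        # shape (B,)

def mixup_ce(log_probs_mixed, y_i, y_j, lam_hat):
    """Mixup cross-entropy over the mixed predictions P_k (Eq. 5 sketch).

    log_probs_mixed: (B, C) log-probabilities of the mixed samples.
    y_i, y_j:        (B,) labels of the two components of each mixed sample.
    lam_hat:         (B,) coefficients from rescale_lambda.
    """
    ce_i = F.nll_loss(log_probs_mixed, y_i, reduction='none')
    ce_j = F.nll_loss(log_probs_mixed, y_j, reduction='none')
    return (lam_hat * ce_i + (1.0 - lam_hat) * ce_j).mean()
```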

Mixup contrastive learning. Furthermore, building on the PWC proposed in the previous section, we consider a novel mixup contrastive learning objective to learn complex target relationships:

L_{\mathrm{mix-con}}=-\sum_{k=1}^{2N_{r}}\hat{\lambda}_{k}\sum_{i=1}^{2N_{r}}w_{ki}\log\frac{\exp(\mathcal{P}_{k}\cdot p_{i}^{r}/\tau)}{\sum_{i'=1}^{2N_{r}}\mathbb{I}(i'\neq k)\exp(\mathcal{P}_{k}\cdot p_{i'}^{r}/\tau)}-\sum_{k=1}^{2N_{r}}(1-\hat{\lambda}_{k})\sum_{j=1}^{2N_{r}}w_{kj}\log\frac{\exp(\mathcal{P}_{k}\cdot p_{j}^{r}/\tau)}{\sum_{j'=1}^{2N_{r}}\mathbb{I}(j'\neq k)\exp(\mathcal{P}_{k}\cdot p_{j'}^{r}/\tau)},  (7)

where $p^{r}_{i}$ and $p^{r}_{j}$ are the probability outputs of the components $x^{r}_{i}$ and $x^{r}_{j}$. Combining the mixup cross-entropy loss and the mixup contrastive loss, the complete reliability-based mixup contrastive (RMC) loss is:

L_{\mathrm{rmc}}=L_{\mathrm{mix-ce}}+L_{\mathrm{mix-con}}.  (8)

Since original contrastive learning relies heavily on discriminative semantic knowledge, it offers little opportunity to keep learning from negative examples once the model can already distinguish positives from negatives. Although certain data augmentation techniques can alleviate this problem, as shown by prior works (Chen et al. 2020; Wang and Qi 2022; Zhang and Ma 2022), original contrastive learning is not robust to different data augmentation combinations. Through the proposed RMC, the model learns the latent knowledge of labeled samples in the target domain in a customized manner and thus adapts to the target domain more comprehensively.

Predictive Regularization Learning

Even if the framework considers comprehensive contrastive learning, the learning of the model will still be biased when the target pseudo-labels contain much noise; existing SSDA methods overlook this effect. The early-learning phenomenon (Song et al. 2019; Liu et al. 2020; Bai et al. 2021) confirms that classifiers predict noisy pseudo-labeled samples with high accuracy in the early adaptation stage, before overfitting the noisy data. Therefore, we propose predictive regularization learning (PR) for the SSDA scenario for the first time to reduce the misleading impact of noisy pseudo-labeled samples on the model. The specific form is given by the following equation:

L_{\mathrm{pr}}=\frac{1}{N_{u}}\sum_{i=1}^{N_{u}}\log(1-\hat{y}^{u\ell\top}_{i}p_{i}^{u\ell}),  (9)

where $p_{i}^{u\ell}$ is the target probability output at the $\ell$-th epoch, $\hat{y}^{u\ell}_{i}=\alpha\cdot\hat{y}^{u(\ell-1)}_{i}+(1-\alpha)\cdot p_{i}^{u\ell}$ is the moving-average prediction, and $\alpha=0.7$ is a hyper-parameter. Compared with existing SSDA methods, PR helps the model better combine the proposed contrastive learning techniques and minimizes the impact of noisy labels.
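A minimal sketch of Eq. 9 with the moving-average targets is shown below (the class name and the per-sample index bookkeeping are our assumptions):

```python
import torch

class PredictiveRegularizer:
    """Predictive regularization (Eq. 9 sketch) with moving-average targets, alpha = 0.7."""

    def __init__(self, num_samples, num_classes, alpha=0.7, device='cpu'):
        self.alpha = alpha
        # uniform initialization of the moving-average predictions y_hat
        self.targets = torch.full((num_samples, num_classes), 1.0 / num_classes, device=device)

    def __call__(self, idx, p_u):
        """idx: (B,) dataset indices of the unlabeled batch; p_u: (B, C) current predictions."""
        with torch.no_grad():
            self.targets[idx] = self.alpha * self.targets[idx] + (1.0 - self.alpha) * p_u
        inner = (self.targets[idx] * p_u).sum(dim=1)   # y_hat^T p; gradient flows through p only
        return torch.log(1.0 - inner + 1e-8).mean()
```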

Overall Training Objective

For labeled and unlabeled target data, we employ the cross-entropy objective on their labels and pseudo-labels, respectively, and combine it with a mutual information loss (Liang, Hu, and Feng 2020). We denote this term as $L_{\mathrm{base}}$ and refer to (Liang, Hu, and Feng 2020) for its details due to the page limit. Finally, the overall loss function is:

L_{\mathrm{all}}=L_{\mathrm{base}}+\lambda_{\mathrm{pwc}}L_{\mathrm{pwc}}+\lambda_{\mathrm{rmc}}L_{\mathrm{rmc}}+\lambda_{\mathrm{pr}}L_{\mathrm{pr}},  (10)

where $\lambda_{\mathrm{pwc}}$, $\lambda_{\mathrm{rmc}}$, and $\lambda_{\mathrm{pr}}$ are scalar loss weights. By completely decoupling the target domain for semi-supervised adaptation, the framework can better leverage the complementary advantages of the different learning objectives, thereby fully improving the generalization ability of the model. The overall target adaptation algorithm is described in the supplementary material.
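For clarity, the combination in Eq. 10 is sketched below with the loss weights reported in the experimental settings (a trivial helper; the individual loss terms come from the components above):

```python
def overall_loss(l_base, l_pwc, l_rmc, l_pr,
                 lambda_pwc=0.1, lambda_rmc=0.1, lambda_pr=3.0):
    """Overall training objective of Eq. 10; default weights follow the experimental settings."""
    return l_base + lambda_pwc * l_pwc + lambda_rmc * l_rmc + lambda_pr * l_pr
```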

SF Trans. Method R→C R→P P→C C→S S→P R→S P→R Mean
× × S+T 55.6/60.0 60.6/62.2 56.8/59.4 50.8/55.0 56.0/59.5 46.3/50.1 71.8/73.9 56.9/60.0
× × DANN 58.2/59.8 61.4/62.8 56.3/59.6 52.8/55.4 57.4/59.9 52.2/54.9 70.3/72.2 58.4/60.7
× × MME 70.0/72.2 67.7/69.7 69.0/71.7 56.3/61.8 64.8/66.8 61.0/61.9 76.1/78.5 66.4/68.9
× × CDAC 77.4/79.6 74.2/75.1 75.5/79.3 67.6/69.9 71.0/73.4 69.2/72.5 80.4/81.9 73.6/76.0
× × CLDA 76.1/77.7 75.1/75.7 71.0/76.4 63.7/69.7 70.2/73.7 67.1/71.1 80.1/82.9 71.9/75.3
× × ECACL 75.3/79.0 74.1/77.3 75.3/79.4 65.0/70.6 72.1/74.6 68.1/71.6 79.7/82.4 72.8/76.4
× × ASDA 77.0/79.4 75.4/76.7 75.5/78.3 66.5/70.2 72.1/74.2 70.9/72.1 79.7/82.3 73.9/76.2
× × MCL 77.4/79.4 74.6/76.3 75.5/78.8 66.4/70.9 74.0/74.7 70.7/72.3 82.0/83.3 74.4/76.5
× × ProML 78.5/80.2 75.4/76.5 77.8/78.9 70.2/72.0 74.1/75.4 72.4/73.5 84.0/84.8 76.1/77.4
× × SLA 79.8/81.6 75.6/76.0 77.4/80.3 68.1/71.3 71.7/73.5 71.7/73.5 80.4/82.5 75.0/76.9
× × IDMNE 79.6/80.8 76.0/76.9 79.4/80.3 71.7/72.2 75.4/75.4 73.5/73.9 82.1/82.8 76.8/77.5
× × G-ABC 80.7/82.1 76.8/76.7 79.3/81.6 72.0/73.7 75.0/76.3 73.2/74.3 83.4/83.9 77.5/78.2
× × TriCT 86.5/89.1 85.3/86.6 80.4/86.3 71.0/79.9 80.3/84.5 78.9/82.1 82.2/90.1 80.7/85.5
✓ × DEEM 79.7/80.5 78.1/79.0 77.0/77.5 71.9/74.9 77.7/80.0 76.7/75.9 85.4/88.5 78.1/79.5
✓ × SOUF-C (Ours) 87.0/88.5 85.2/87.0 87.7/88.2 81.8/82.7 85.5/86.8 81.0/82.8 89.8/90.7 85.4/86.6
× ✓ MCL-T 77.3/79.8 75.0/76.5 78.0/80.0 68.6/71.9 72.7/75.6 67.3/70.5 81.0/83.3 74.3/76.8
✓ ✓ SHOT-T 74.3/76.6 78.8/82.4 75.4/78.8 70.5/71.0 78.6/81.9 69.9/72.1 86.5/87.5 76.3/78.6
✓ ✓ DEEM-T 82.9/83.8 81.3/82.1 80.3/80.5 75.8/77.9 80.9/83.5 80.2/80.4 88.7/89.9 81.4/82.5
✓ ✓ SOUF (Ours) 92.1/93.1 90.0/92.3 92.5/92.8 85.7/86.2 90.3/91.2 86.2/87.7 93.8/94.0 90.0 (+8.6)/91.0 (+5.5)

Table 1: Accuracy (%) on DomainNet under the 1-shot and 3-shot settings (each cell reports 1-shot / 3-shot accuracy); SF denotes the source-free training scenario; Trans. denotes a transformer-based method; † denotes our reproduced results.
SF Trans. Method R→C R→P R→A P→R P→C P→A A→P A→C A→R C→R C→A C→P Mean
× × S+T 52.1 78.6 66.2 74.4 48.3 57.2 69.8 50.9 73.8 70.0 56.3 68.1 63.8
× × DANN 53.1 74.8 64.5 68.4 51.9 55.7 67.9 52.3 73.9 69.2 54.1 66.8 62.7
× × MME 61.9 82.8 71.2 79.2 57.4 64.7 75.5 59.6 77.8 74.8 65.7 74.5 70.4
× × APE 60.7 81.6 72.5 78.6 58.3 63.6 76.1 53.9 75.2 72.3 63.6 69.8 68.9
× × CDAC 61.9 83.1 72.7 80.0 59.3 64.6 75.9 61.2 78.5 75.3 64.5 75.1 71.0
× × SLA 66.1 84.6 72.7 80.5 61.8 67.3 78.0 63.0 79.2 77.0 66.9 77.6 72.9
× × MCL 67.0 85.5 73.8 81.3 61.1 68.0 79.5 64.4 81.2 78.4 68.5 79.3 74.0
× × ProML 67.5 86.1 73.7 81.9 61.4 69.3 79.7 64.5 81.7 79.0 69.1 80.5 74.6
✓ × DEEM 62.0 82.5 71.6 77.1 60.8 65.8 74.5 63.9 75.0 76.5 62.0 76.8 70.7
✓ × SOUF-C (Ours) 79.9 89.5 80.6 84.2 77.9 79.7 85.7 82.5 84.1 85.4 84.7 86.6 83.4
× ✓ MCL-T 59.9 85.0 75.5 82.4 57.8 72.1 79.3 58.3 79.7 78.5 69.3 78.9 73.1
✓ ✓ SHOT-T 65.8 77.5 68.8 75.8 64.8 68.9 76.7 63.3 75.9 78.0 74.6 80.3 72.6
✓ ✓ DEEM-T 69.9 81.2 72.6 79.5 69.2 73.1 80.4 68.3 79.6 80.6 78.4 83.7 76.4
✓ ✓ SOUF (Ours) 84.4 94.5 85.1 89.2 82.4 84.2 90.2 87.5 89.1 89.9 84.9 91.1 87.7 (+11.3)
Table 2: Accuracy (%) on Office-Home under the 1-shot setting; SF denotes the source-free training scenario; Trans. denotes a transformer-based method; † denotes our reproduced results.
SF Trans. Method R→C R→P R→A P→R P→C P→A A→P A→C A→R C→R C→A C→P Mean
× × S+T 55.7 80.8 67.8 73.1 53.8 63.5 73.1 54.0 74.2 68.3 57.6 72.3 66.2
× × DANN 57.3 75.5 65.2 69.2 51.8 56.6 68.3 54.7 73.8 67.1 55.1 67.5 63.5
× × MME 64.6 85.5 71.3 80.1 64.6 65.5 79.0 63.6 79.7 76.6 67.2 79.3 73.1
× × APE 66.4 86.2 73.4 82.0 65.2 66.1 81.1 63.9 80.2 76.8 66.6 79.9 74.0
× × CDAC 67.8 85.6 72.2 81.9 67.0 67.5 80.3 65.9 80.6 80.2 67.4 81.4 74.2
× × SLA 70.1 87.1 73.9 82.5 69.3 70.1 82.6 67.3 81.4 80.1 69.2 82.1 76.3
× × MCL 70.1 88.1 75.3 83.0 68.0 69.9 83.9 67.5 82.4 81.6 71.4 84.3 77.1
× × ProML 71.0 88.6 75.8 83.8 68.9 72.5 83.9 67.8 82.2 82.3 72.1 84.1 77.8
× × IDMNE 71.7 88.1 75.2 82.7 67.6 69.0 82.4 66.4 79.3 79.5 69.1 83.1 76.2
× × G-ABC 70.0 88.1 76.0 82.8 69.3 70.5 83.8 67.2 80.4 80.2 69.2 83.9 77.2
× × TriCT 81.9 94.1 86.3 92.3 78.7 83.4 91.1 76.9 91.3 91.8 80.5 92.5 86.7
✓ × DEEM 67.2 83.0 67.1 78.3 69.3 66.3 79.7 65.9 78.3 77.6 65.1 80.8 73.2
✓ × SOUF-C (Ours) 84.3 94.0 82.6 89.9 81.4 83.1 89.8 86.0 89.3 91.2 82.1 92.1 87.1
× ✓ MCL-T 69.4 89.8 77.6 84.2 65.7 74.0 84.2 66.2 82.6 81.4 74.8 84.4 77.9
✓ ✓ SHOT-T 71.5 86.3 75.7 80.5 69.2 71.9 80.6 69.1 68.8 79.4 74.5 81.6 76.6
✓ ✓ DEEM-T 75.3 89.7 79.3 84.4 73.1 76.0 84.3 73.5 82.6 83.2 78.1 85.3 80.4
✓ ✓ SOUF (Ours) 86.8 97.0 85.8 93.2 83.6 86.0 93.4 88.6 92.1 94.3 85.3 94.8 90.1 (+3.4)
Table 3: Accuracy (%) on Office-Home under the 3-shot setting with a DeiT-S (Touvron et al. 2021) backbone, which has a parameter size similar to the ResNet-34 (He et al. 2016) used by existing CNN-based methods; SF denotes the source-free training scenario; Trans. denotes a transformer-based method; † denotes our reproduced results.
SF Trans. Method D→A W→A Mean
× × S+T 62.4 61.2 61.8
× × DANN 70.4 74.6 72.5
× × MME 73.6 77.6 75.6
× × DECOTA 74.2 78.3 76.3
✓ × DEEM 78.9 80.5 79.7
✓ ✓ SOUF (Ours) 82.6 82.9 82.8
Table 4: Accuracy (%) on Office-31 under the 3-shot setting with a DeiT-S (Touvron et al. 2021) backbone, which has a parameter size similar to the ResNet-34 (He et al. 2016) used by existing CNN-based methods.
Number $L_{\mathrm{pwc}}$ $L_{\mathrm{rmc}}$ $L_{\mathrm{pr}}$ P→C P→R Mean
1 79.9 86.5 83.2
2 88.1 88.9 88.5
3 83.6 86.9 84.3
4 85.8 87.8 86.3
5 89.4 90.8 90.1
6 90.9 91.6 91.3
7 86.4 88.2 87.3
8 92.5 93.8 93.2
Table 5: Accuracy (%) of ablation study on DomainNet under the setting of 1-shot.

Experiment

In this section, we conduct experiments to answer the following questions: 1) Does SOUF outperform existing SSDA methods? 2) Are the proposed novel components effective? 3) Does SOUF perform better than other transformer-based DA methods or CNN-based SSDA methods that replace the backbone with transformers?

Experimental Settings

The experiments involve three datasets. DomainNet (Peng et al. 2019) is a large-scale domain adaptation benchmark; we select four domains: Real (R), Clipart (C), Painting (P), and Sketch (S). Office-Home (Venkateswara et al. 2017) is a medium-sized SSDA benchmark including four domains: Art (A), Clipart (C), Product (P), and Real (R). Office-31 (Saenko et al. 2010) is a small domain adaptation benchmark. We choose DeiT-S (Touvron et al. 2021) as the transformer feature extraction backbone because its parameter size is similar to that of the ResNet-34 (He et al. 2016) used by most existing CNN-based methods. The learning rates of the feature extractor and the classifier are set to 0.001 and 0.01, respectively. The temperature coefficient $\tau$ in PWC is 0.15. The number of high-confidence samples $K$ is 2. The loss weights $\lambda_{\mathrm{pwc}}$, $\lambda_{\mathrm{rmc}}$, and $\lambda_{\mathrm{pr}}$ are set to 0.1, 0.1, and 3, respectively. We adopt the widely used RandAugment (Cubuk et al. 2020) as the strong data augmentation strategy. For more details, please refer to our code and the supplementary material.

Comparison with State-of-the-Arts

We compare SOUF with previous state-of-the-art SSDA methods, including S+T, DANN (Ganin et al. 2016), MME (Saito et al. 2019), APE (Kim and Kim 2020), DECOTA (Yang et al. 2021), ECACL (Li et al. 2021b), ASDA (Qin et al. 2022), MCL (Yan et al. 2022), SLA (Yu and Lin 2023), CLDA (Singh 2021), CDAC (Li et al. 2021a), ProML (Huang, Zhu, and Chen 2023), DEEM (Ma et al. 2022), TriCT (Ngo et al. 2023), IDMNE (Li, Li, and Yu 2024), G-ABC (Li, Li, and Yu 2023), and SHOT (Liang, Hu, and Feng 2020). S+T denotes training a model with supervision only on the labeled samples from the two domains. SHOT is a source-free UDA method, and we add a target cross-entropy loss to adapt it to SSDA. To eliminate the unfairness of comparing different backbone networks and to further show that the improvement comes from our framework, we reproduce MCL, SHOT, and DEEM with DeiT-S (Touvron et al. 2021) and denote them with the suffix “-T”. Owing to the robustness of our method, we also apply the PWC and PR learning techniques to the same CNN structure used by existing methods, denoted as “SOUF-C”.

Effectiveness of the proposed SOUF. Tables 1, 2, 3, and 4 show the quantitative comparison of our proposed method with existing methods. Our method outperforms previous methods by a large margin in most scenarios, without using any source data. Meanwhile, the comparison of SOUF-C with existing methods shows that our framework adapts well to different backbone networks. It is worth noting that, owing to the strong potential of the model exploited by our framework, simply replacing the backbone of existing SSDA methods with a transformer (e.g., MCL-T and DEEM-T) still leaves their performance far behind SOUF. The results show that our method performs well across different domains, further demonstrating its robustness and the importance of learning from different samples.

Ablation Study

Each main component in SOUF. We perform an ablation study of the main components of SOUF in the 1-shot setup of DomainNet P→C and P→R, as shown in Table 5. Rows 2-4 show that each individual component produces a significant improvement. Rows 5-7 show that each pairwise combination further improves performance, demonstrating the versatility of the proposed modules. Meanwhile, the PWC and PR modules bring more significant improvements than the RMC module. This is because, for the PWC and PR modules, the model has already learned good feature representations for most samples in the target domain, whereas the number of labeled target samples available to RMC is relatively small. The best performance is obtained when all components are activated.

Class-wise Probability Contrast Adaptive Weight DomainNet C→S Office-Home R→C Mean
77.9 76.2 77.1
80.2 79.8 80.0
83.6 83.4 83.5
85.7 84.4 85.1
Table 6: Accuracy (%) of ablation study for different components in PWC with 1-shot setting.
Figure 3: The effect of different loss balance parameters $\lambda_{\mathrm{pwc}}$, $\lambda_{\mathrm{rmc}}$, and $\lambda_{\mathrm{pr}}$ on the model classification accuracy in the Office-Home P→C scenario under the 1-shot setting.

Each main component in PWC. We investigate the specific techniques in PWC to further demonstrate its effectiveness within our framework, as shown in Table 6. When none of the factors is considered, the loss degenerates into the InfoNCE loss (Chen et al. 2020). When contrastive learning between categories is considered, performance improves because the model can learn category-level target knowledge. When discriminative features are learned in the probability space, performance improves significantly: the model is forced to output more confident predictions, and the knowledge learned by the classifier helps the feature extractor discover more compact clusters of target representations. Finally, adding the adaptive weights allows the model to adaptively learn relationships among same-category samples with different confidence levels, achieving the best performance.

Each main component in RMC. Table 7 shows the impact on RMC of using only unlabeled samples, only labeled samples, or the constructed reliable sample set. Building the new reliable sample set allows RMC to explore more target feature knowledge and thus perform better in the target domain. We also examine the impact of mixup contrastive learning and of different mixup methods in Table 8. Mixup contrastive learning helps the model learn more useful knowledge, and the comparison of the last two rows verifies that patch-level mixup outperforms image-level mixup. Patch-level mixup re-weights according to the importance of each image patch instead of linearly interpolating whole images at a single ratio; it leverages knowledge in transformer patches that is difficult for CNNs to capture and better complements the other proposed learning techniques. This is also an important reason why SOUF is specifically suited to transformer-based backbones.

Unlabeled Labeled Reliable (Ours) C→A C→R Mean
82.3 86.8 84.6
84.0 88.3 86.2
84.9 89.9 87.4
Table 7: Accuracy (%) of ablation study with different samples for RMC on Office-Home with 1-shot setting.
Mixup contrast learning Image-level mixup Patch-level mixup C→A C→R Mean
83.9 88.6 86.3
82.4 87.8 85.1
84.9 89.9 87.4
Table 8: Accuracy (%) of ablation study with different mixup methods for RMC on Office-Home under the 1-shot setting.

Further Analysis

Comparison with transformer-based models in other DA scenarios. We note that some works explore the application of transformers in other DA scenarios (e.g., CDTrans (Xu et al. 2021) in UDA). We add the target-domain supervision loss to CDTrans to apply it to the SSDA scenario and compare it fairly with our method, as shown in Table 9. Lacking sufficient knowledge of the semi-supervised target domain, simply transferring existing transformer-based methods from other DA scenarios to SSDA yields limited results compared with our SOUF, which further demonstrates the novelty and effectiveness of our framework for SSDA.

Method Scenario DomainNet C→S Office-Home R→C Mean
CDTrans UDA 82.2 81.7 82.0
SOUF (ours) SSDA 85.7 84.4 85.1
Table 9: Comparison with transformer-based methods from other DA scenarios in SSDA.

Sensitivity of $\lambda_{\mathrm{pwc}}$, $\lambda_{\mathrm{rmc}}$, and $\lambda_{\mathrm{pr}}$. Eq. 10 presents the overall learning objective of our SOUF framework, which encompasses several components critical to the adaptation process. To further explore the factors affecting performance, Figure 3 shows the impact of the loss balance parameters $\lambda_{\mathrm{pwc}}$, $\lambda_{\mathrm{rmc}}$, and $\lambda_{\mathrm{pr}}$ on the classification accuracy in the Office-Home P→C scenario under the 1-shot setting. The trained model achieves the highest target classification accuracy when $\lambda_{\mathrm{pwc}}=0.1$, $\lambda_{\mathrm{rmc}}=0.1$, and $\lambda_{\mathrm{pr}}=3$. More analyses are provided in the supplementary material.

Conclusion

In this paper, we present SOUF, a novel source-free framework for SSDA. Unlike existing methods, SOUF is proposed to decouple SSDA into distinct learning tasks for unlabeled, reliably labeled, and noisy pseudo-labeled target samples. For unlabeled samples, probability-weighted contrastive learning (PWC) enhances the model’s ability to learn discriminative features. Reliability-based mixup contrastive learning (RMC) further enriches the model’s understanding by mixing patches from a reliable sample set, capturing complex target knowledge. Predictive regularization learning (PR) mitigates the impact of noisy pseudo-labels by aligning current predictions with earlier outputs. Extensive experiments validate that SOUF significantly outperforms state-of-the-art SSDA methods.

References

  • Albert et al. (2021) Albert, P.; Ortego, D.; Arazo, E.; O’Connor, N.; and McGuinness, K. 2021. Relab: Reliable label bootstrapping for semi-supervised learning. In 2021 International Joint Conference on Neural Networks (IJCNN), 1–8. IEEE.
  • Alzubaidi et al. (2021) Alzubaidi, L.; Zhang, J.; Humaidi, A. J.; Al-Dujaili, A.; Duan, Y.; Al-Shamma, O.; Santamaría, J.; Fadhel, M. A.; Al-Amidie, M.; and Farhan, L. 2021. Review of deep learning: Concepts, CNN architectures, challenges, applications, future directions. Journal of big Data, 8: 1–74.
  • Bai et al. (2021) Bai, Y.; Yang, E.; Han, B.; Yang, Y.; Li, J.; Mao, Y.; Niu, G.; and Liu, T. 2021. Understanding and improving early stopping for learning with noisy labels. Advances in Neural Information Processing Systems, 34: 24392–24403.
  • Berthelot et al. (2019) Berthelot, D.; Carlini, N.; Goodfellow, I.; Papernot, N.; Oliver, A.; and Raffel, C. A. 2019. Mixmatch: A holistic approach to semi-supervised learning. Advances in neural information processing systems, 32.
  • Chen et al. (2022) Chen, J.-N.; Sun, S.; He, J.; Torr, P. H.; Yuille, A.; and Bai, S. 2022. Transmix: Attend to mix for vision transformers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 12135–12144.
  • Chen et al. (2020) Chen, T.; Kornblith, S.; Norouzi, M.; and Hinton, G. 2020. A simple framework for contrastive learning of visual representations. In International conference on machine learning, 1597–1607. PMLR.
  • Chen et al. (2023) Chen, Y.; Liu, M.; Wang, X.; Wang, F.; Liu, A.-A.; and Wang, Y. 2023. Refining Noisy Labels with Label Reliability Perception for Person Re-identification. IEEE Transactions on Multimedia.
  • Cubuk et al. (2020) Cubuk, E. D.; Zoph, B.; Shlens, J.; and Le, Q. V. 2020. Randaugment: Practical automated data augmentation with a reduced search space. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition workshops, 702–703.
  • Ganin and Lempitsky (2015) Ganin, Y.; and Lempitsky, V. 2015. Unsupervised domain adaptation by backpropagation. In International conference on machine learning, 1180–1189. PMLR.
  • Ganin et al. (2016) Ganin, Y.; Ustinova, E.; Ajakan, H.; Germain, P.; Larochelle, H.; Laviolette, F.; Marchand, M.; and Lempitsky, V. 2016. Domain-adversarial training of neural networks. The journal of machine learning research, 17(1): 2096–2030.
  • Ge et al. (2023) Ge, P.; Ren, C.-X.; Xu, X.-L.; and Yan, H. 2023. Unsupervised domain adaptation via deep conditional adaptation network. Pattern Recognition, 134: 109088.
  • Gretton et al. (2012) Gretton, A.; Borgwardt, K. M.; Rasch, M. J.; Schölkopf, B.; and Smola, A. 2012. A kernel two-sample test. The Journal of Machine Learning Research, 13(1): 723–773.
  • Grill et al. (2020) Grill, J.-B.; Strub, F.; Altché, F.; Tallec, C.; Richemond, P.; Buchatskaya, E.; Doersch, C.; Avila Pires, B.; Guo, Z.; Gheshlaghi Azar, M.; et al. 2020. Bootstrap your own latent-a new approach to self-supervised learning. Advances in neural information processing systems, 33: 21271–21284.
  • He, Liu, and Yin (2024) He, J.; Liu, B.; and Yin, G. 2024. Enhancing Semi-supervised Domain Adaptation via Effective Target Labeling. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 38, 12385–12393.
  • He et al. (2020) He, K.; Fan, H.; Wu, Y.; Xie, S.; and Girshick, R. 2020. Momentum contrast for unsupervised visual representation learning. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 9729–9738.
  • He et al. (2016) He, K.; Zhang, X.; Ren, S.; and Sun, J. 2016. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, 770–778.
  • Hoffman et al. (2018) Hoffman, J.; Tzeng, E.; Park, T.; Zhu, J.-Y.; Isola, P.; Saenko, K.; Efros, A.; and Darrell, T. 2018. Cycada: Cycle-consistent adversarial domain adaptation. In International conference on machine learning, 1989–1998. Pmlr.
  • Huang, Zhu, and Chen (2023) Huang, X.; Zhu, C.; and Chen, W. 2023. Semi-supervised Domain Adaptation via Prototype-based Multi-level Learning. arXiv preprint arXiv:2305.02693.
  • Jiang et al. (2020) Jiang, P.; Wu, A.; Han, Y.; Shao, Y.; Qi, M.; and Li, B. 2020. Bidirectional Adversarial Training for Semi-Supervised Domain Adaptation. In IJCAI, 934–940.
  • Kim and Kim (2020) Kim, T.; and Kim, C. 2020. Attract, perturb, and explore: Learning a feature alignment network for semi-supervised domain adaptation. In European conference on computer vision, 591–607. Springer.
  • Krizhevsky, Sutskever, and Hinton (2012) Krizhevsky, A.; Sutskever, I.; and Hinton, G. E. 2012. Imagenet classification with deep convolutional neural networks. Advances in neural information processing systems, 25.
  • Krizhevsky, Sutskever, and Hinton (2017) Krizhevsky, A.; Sutskever, I.; and Hinton, G. E. 2017. ImageNet classification with deep convolutional neural networks. Communications of the ACM, 60(6): 84–90.
  • Li et al. (2021a) Li, J.; Li, G.; Shi, Y.; and Yu, Y. 2021a. Cross-domain adaptive clustering for semi-supervised domain adaptation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2505–2514.
  • Li, Li, and Yu (2023) Li, J.; Li, G.; and Yu, Y. 2023. Adaptive betweenness clustering for semi-supervised domain adaptation. IEEE Transactions on Image Processing.
  • Li, Li, and Yu (2024) Li, J.; Li, G.; and Yu, Y. 2024. Inter-domain mixup for semi-supervised domain adaptation. Pattern Recognition, 146: 110023.
  • Li et al. (2021b) Li, K.; Liu, C.; Zhao, H.; Zhang, Y.; and Fu, Y. 2021b. ECACL: A holistic framework for semi-supervised domain adaptation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, 8578–8587.
  • Liang, Hu, and Feng (2020) Liang, J.; Hu, D.; and Feng, J. 2020. Do we really need to access the source data? source hypothesis transfer for unsupervised domain adaptation. In International conference on machine learning, 6028–6039. PMLR.
  • Liu et al. (2020) Liu, S.; Niles-Weed, J.; Razavian, N.; and Fernandez-Granda, C. 2020. Early-learning regularization prevents memorization of noisy labels. Advances in neural information processing systems, 33: 20331–20342.
  • Long et al. (2018) Long, M.; Cao, Z.; Wang, J.; and Jordan, M. I. 2018. Conditional adversarial domain adaptation. Advances in neural information processing systems, 31.
  • Long et al. (2017) Long, M.; Zhu, H.; Wang, J.; and Jordan, M. I. 2017. Deep transfer learning with joint adaptation networks. In International conference on machine learning, 2208–2217. PMLR.
  • Ma et al. (2022) Ma, N.; Bu, J.; Lu, L.; Wen, J.; Zhou, S.; Zhang, Z.; Gu, J.; Li, H.; and Yan, X. 2022. Context-guided entropy minimization for semi-supervised domain adaptation. Neural Networks, 154: 270–282.
  • Ngo et al. (2023) Ngo, B. H.; Chae, Y. J.; Kwon, J. E.; Park, J. H.; and Cho, S. I. 2023. Improved Knowledge Transfer for Semi-supervised Domain Adaptation via Trico Training Strategy. In Proceedings of the IEEE/CVF International Conference on Computer Vision, 19214–19223.
  • Oord, Li, and Vinyals (2018) Oord, A. v. d.; Li, Y.; and Vinyals, O. 2018. Representation learning with contrastive predictive coding. arXiv preprint arXiv:1807.03748.
  • Pan et al. (2010) Pan, S. J.; Tsang, I. W.; Kwok, J. T.; and Yang, Q. 2010. Domain adaptation via transfer component analysis. IEEE transactions on neural networks, 22(2): 199–210.
  • Peng et al. (2019) Peng, X.; Bai, Q.; Xia, X.; Huang, Z.; Saenko, K.; and Wang, B. 2019. Moment matching for multi-source domain adaptation. In Proceedings of the IEEE International Conference on Computer Vision, 1406–1415.
  • Qin et al. (2021) Qin, C.; Wang, L.; Ma, Q.; Yin, Y.; Wang, H.; and Fu, Y. 2021. Contradictory structure learning for semi-supervised domain adaptation. In Proceedings of the 2021 SIAM International Conference on Data Mining (SDM), 576–584. SIAM.
  • Qin et al. (2022) Qin, C.; Wang, L.; Ma, Q.; Yin, Y.; Wang, H.; and Fu, Y. 2022. Semi-Supervised Domain Adaptive Structure Learning. IEEE Transactions on Image Processing, 31: 7179–7190.
  • Rawat and Wang (2017) Rawat, W.; and Wang, Z. 2017. Deep convolutional neural networks for image classification: A comprehensive review. Neural computation, 29(9): 2352–2449.
  • Saenko et al. (2010) Saenko, K.; Kulis, B.; Fritz, M.; and Darrell, T. 2010. Adapting visual category models to new domains. In Computer Vision–ECCV 2010: 11th European Conference on Computer Vision, Heraklion, Crete, Greece, September 5-11, 2010, Proceedings, Part IV 11, 213–226. Springer.
  • Saito et al. (2019) Saito, K.; Kim, D.; Sclaroff, S.; Darrell, T.; and Saenko, K. 2019. Semi-supervised domain adaptation via minimax entropy. In Proceedings of the IEEE/CVF International Conference on Computer Vision, 8050–8058.
  • Shen et al. (2018) Shen, J.; Qu, Y.; Zhang, W.; and Yu, Y. 2018. Wasserstein distance guided representation learning for domain adaptation. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 32.
  • Simonyan and Zisserman (2014) Simonyan, K.; and Zisserman, A. 2014. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556.
  • Singh (2021) Singh, A. 2021. CLDA: Contrastive Learning for Semi-Supervised Domain Adaptation. In Ranzato, M.; Beygelzimer, A.; Dauphin, Y.; Liang, P.; and Vaughan, J. W., eds., Advances in Neural Information Processing Systems, volume 34, 5089–5101. Curran Associates, Inc.
  • Singh et al. (2021) Singh, A.; Doraiswamy, N.; Takamuku, S.; Bhalerao, M.; Dutta, T.; Biswas, S.; Chepuri, A.; Vengatesan, B.; and Natori, N. 2021. Improving semi-supervised domain adaptation using effective target selection and semantics. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2709–2718.
  • Song et al. (2019) Song, H.; Kim, M.; Park, D.; and Lee, J.-G. 2019. Prestopping: How does early stopping help generalization against label noise?
  • Sun et al. (2022) Sun, T.; Lu, C.; Zhang, T.; and Ling, H. 2022. Safe self-refinement for transformer-based domain adaptation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 7191–7200.
  • Touvron et al. (2021) Touvron, H.; Cord, M.; Douze, M.; Massa, F.; Sablayrolles, A.; and Jégou, H. 2021. Training data-efficient image transformers & distillation through attention. In International conference on machine learning, 10347–10357. PMLR.
  • Vaswani et al. (2017) Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A. N.; Kaiser, Ł.; and Polosukhin, I. 2017. Attention is all you need. Advances in neural information processing systems, 30.
  • Venkateswara et al. (2017) Venkateswara, H.; Eusebio, J.; Chakraborty, S.; and Panchanathan, S. 2017. Deep hashing network for unsupervised domain adaptation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 5018–5027.
  • Wang et al. (2022) Wang, K.; Peng, X.; Yang, S.; Yang, J.; Zhu, Z.; Wang, X.; and You, Y. 2022. Reliable label correction is a good booster when learning with extremely noisy labels. arXiv preprint arXiv:2205.00186.
  • Wang and Qi (2022) Wang, X.; and Qi, G.-J. 2022. Contrastive learning with stronger augmentations. IEEE transactions on pattern analysis and machine intelligence, 45(5): 5549–5560.
  • Xie et al. (2018) Xie, S.; Zheng, Z.; Chen, L.; and Chen, C. 2018. Learning semantic representations for unsupervised domain adaptation. In International conference on machine learning, 5423–5432. PMLR.
  • Xu et al. (2021) Xu, T.; Chen, W.; Wang, P.; Wang, F.; Li, H.; and Jin, R. 2021. Cdtrans: Cross-domain transformer for unsupervised domain adaptation. arXiv preprint arXiv:2109.06165.
  • Yan et al. (2022) Yan, Z.; Wu, Y.; Li, G.; Qin, Y.; Han, X.; and Cui, S. 2022. Multi-level consistency learning for semi-supervised domain adaptation. arXiv preprint arXiv:2205.04066.
  • Yang et al. (2023) Yang, J.; Liu, J.; Xu, N.; and Huang, J. 2023. Tvt: Transferable vision transformer for unsupervised domain adaptation. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 520–530.
  • Yang et al. (2021) Yang, L.; Wang, Y.; Gao, M.; Shrivastava, A.; Weinberger, K. Q.; Chao, W.-L.; and Lim, S.-N. 2021. Deep co-training with task decomposition for semi-supervised domain adaptation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, 8906–8916.
  • Yao et al. (2018) Yao, J.; Wang, J.; Tsang, I. W.; Zhang, Y.; Sun, J.; Zhang, C.; and Zhang, R. 2018. Deep learning from noisy image labels with quality embedding. IEEE Transactions on Image Processing, 28(4): 1909–1922.
  • Yu and Lin (2023) Yu, Y.-C.; and Lin, H.-T. 2023. Semi-Supervised Domain Adaptation with Source Label Adaptation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 24100–24109.
  • Zhang et al. (2017) Zhang, H.; Cisse, M.; Dauphin, Y. N.; and Lopez-Paz, D. 2017. mixup: Beyond empirical risk minimization. arXiv preprint arXiv:1710.09412.
  • Zhang and Ma (2022) Zhang, J.; and Ma, K. 2022. Rethinking the augmentation module in contrastive learning: Learning hierarchical augmentation invariance with expanded views. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 16650–16659.
  • Zhang et al. (2023) Zhang, Y.; Wang, Z.; Li, J.; Zhuang, J.; and Lin, Z. 2023. Towards Effective Instance Discrimination Contrastive Loss for Unsupervised Domain Adaptation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, 11388–11399.
  • Zhu, Bai, and Wang (2023) Zhu, J.; Bai, H.; and Wang, L. 2023. Patch-Mix Transformer for Unsupervised Domain Adaptation: A Game Perspective. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 3561–3571.