
Universal Domain Adaptive Object Detection via Dual Probabilistic Alignment

Yuanfan Zheng1,2,*, Jinlin Wu1,2,*, Wuyang Li3, Zhen Chen1,†
* Equal contribution. † Corresponding author.
Abstract

Domain Adaptive Object Detection (DAOD) transfers knowledge from a labeled source domain to an unannotated target domain under a closed-set assumption. Universal DAOD (UniDAOD) extends DAOD to handle open-set, partial-set, and closed-set domain adaptation. In this paper, we first unveil two issues: domain-private category alignment is crucial for global-level features, and the domain probabilities of features are heterogeneous across different levels. To address these issues, we propose a novel Dual Probabilistic Alignment (DPA) framework that models domain probability as a Gaussian distribution, enabling sampling and measurement of heterogeneous domain distributions. DPA consists of three tailored modules: the Global-level Domain Private Alignment (GDPA), the Instance-level Domain Shared Alignment (IDSA), and the Private Class Constraint (PCC). GDPA uses global-level sampling to mine domain-private category samples and calculates the alignment weight through a cumulative distribution function to address global-level private category alignment. IDSA uses instance-level sampling to mine domain-shared category samples and calculates the alignment weight through a Gaussian distribution to align domain-shared categories and address the feature heterogeneity. PCC aggregates domain-private category centroids between feature and probability spaces to mitigate negative transfer. Extensive experiments demonstrate that DPA outperforms state-of-the-art UniDAOD and DAOD methods across various datasets and scenarios, including open, partial, and closed sets. Code is available at https://github.com/zyfone/DPA.

Introduction

Object detection has made significant progress in recent years (Li et al. 2021b; Jia et al. 2023; Zhao et al. 2024). However, a well-trained object detector often fails to generalize to novel domains due to domain shift. Domain Adaptive Object Detection (DAOD) (Krishna, Ohashi, and Sinha 2023; Huang et al. 2024) transfers knowledge from a labeled source domain to an unlabeled target domain to overcome domain shift and has been widely applied in medical analysis (Pu et al. 2024; Ali et al. 2024; Liu, Li, and Yuan 2023), autonomous driving (Cai et al. 2024; Shi, Zheng, and Chen 2024), and robotic understanding (Chapman et al. 2023; Li et al. 2024). However, DAOD is limited by the closed-set assumption (Ben-David et al. 2010; Li et al. 2023c) and fails to generalize to real-world scenarios. To address this, Universal DAOD (UniDAOD) endows DAOD with open-set domain adaptation capabilities to overcome label shift without prior knowledge of categories. An earlier work, US-DAF (Shi et al. 2022), leverages a threshold filter mechanism and scale-sensitive domain alignment. CODE (Shi et al. 2024b) adopts virtual domain alignment to avoid aligning domain-private category samples and thereby mitigate negative transfer. Other methods (Lang et al. 2023; Shi et al. 2024a) adopt dynamic weighting for domain-private categories to facilitate positive transfer. Essentially, existing UniDAOD approaches align shared categories at both the global and instance levels while ignoring the alignment of domain-private categories.

Figure 1: Visualization of domain probability in the domain discriminator. The horizontal axis is the training iteration (×100), and the vertical axis is the probability output by the domain discriminator. (a) As the number of domain-private categories increases, a more distinct gap emerges between the two domains at the global level, suggesting the alignment of domain-private categories. (b) As the number of domain-private categories increases, the mean probability gap remains approximately constant at the instance level, indicating the alignment of domain-shared categories.

Despite significant progress in recent years, the current UniDAOD paradigm encounters two major issues that result in suboptimal domain alignment. The first issue is that existing methods overlook global feature alignment for domain-private categories. Because prior category knowledge is unavailable, existing methods (Lang et al. 2023; Shi et al. 2024a) primarily focus on estimating the domain-shared category set to mitigate negative transfer, wrongly assuming that global and instance features contribute equally to domain-shared alignment. For the first time, we empirically reveal the problem with this assumption in Fig. 1. We find that global-level features tend to align domain-private categories, while instance-level features tend to align domain-shared categories, which is further supported by Fig. 4. This phenomenon motivates us to revisit UniDAOD domain alignment, focusing on domain-private category alignment at the global level.

The second issue is the heterogeneity of features at different feature levels. Since the global feature is a coarse representation of the entire input image, while the instance feature corresponds to object instances, the feature gap results in significant differences in domain probabilities. Existing approaches address this by employing different thresholds and entropy functions, but these methods require manual parameter tuning. In addition, advanced UniDA frameworks such as clustering (Saito et al. 2020; Li et al. 2021a), optimal transport (Chang et al. 2022), and mutual learning (Lu et al. 2024) are complex and challenging to adapt to detection tasks.

To address these issues, we propose a novel Dual Probabilistic Alignment (DPA) framework. For the first issue, we conduct a theoretical analysis to show that domain-private alignment is crucial for global-level features. Therefore, we propose a Global-level Domain Private Alignment (GDPA) module that includes global-level sampling, alignment weight calculation, and global-level domain alignment. Global-level sampling aims to mine domain-private category samples. Alignment weight calculation uses the cumulative distribution function to refine the distribution distance estimate as the weight, which then drives global-level domain alignment to address the domain-private alignment issue. For the second issue, we conduct a tailored domain-shared category alignment on the instance-level features. To effectively obtain the domain-shared categories, we propose a novel unsupervised clustering perspective. We set the domain label as the center and map samples to domain probabilities to calculate the gradient norm (distance). We then model the frequency of the gradient norm as a Gaussian distribution using bins. The continuous frequency bins of the samples represent those within a certain radius (the sum of bins). Therefore, we propose an Instance-level Domain Shared Alignment (IDSA) method consisting of instance-level sampling, alignment weight calculation, and instance-level domain alignment. Instance-level sampling uses Gaussian distribution modeling to select domain-shared category samples. Alignment weight calculation uses the statistical properties of the Gaussian distribution as the weight, which then drives instance-level domain alignment to address the heterogeneity of features across different levels. According to the upper bound obtained by theoretical analysis, the PCC module aggregates domain-private category centroids and enforces cross-space consistency for the private categories to mitigate negative transfer.
In conclusion, our key contributions are as follows:

  • We first reveal that domain-private alignment is crucial for global-level features. Additionally, we provide a theoretical analysis of the upper bound of UniDAOD to support this observation.

  • We propose a novel unsupervised clustering perspective that selects instance samples by using the continuous frequency bins of the gradient norm as the sampling radius under Gaussian distribution modeling.

  • We propose a novel Dual Probabilistic Alignment (DPA) framework. DPA aligns domain-private categories at the global level and domain-shared categories at the instance level. In addition, DPA aggregates domain-private category centroids between feature and probability spaces to mitigate negative transfer.

  • Extensive experiments across open-set, partial-set, and closed-set scenarios demonstrate that the DPA framework achieves state-of-the-art performance, significantly surpassing existing UniDAOD methods.

Related Work

Domain Adaptive Object Detection (DAOD)

DAOD addresses the covariate shift from labeled data in the source domain to the unlabeled target domain under the closed-set categories. Existing DAOD methods can be categorized into adversarial training and mean teacher paradigms. As for adversarial training, DAF (Chen et al. 2018) incorporates global and instance alignment modules based on the Faster-RCNN detector, while incremental variants (Krishna, Ohashi, and Sinha 2023) improve global and instance alignment. ATMT (Li et al. 2023a) explores the potential of self-supervised learning, and EPM (Hsu et al. 2020) introduces a new FCOS detector. Li et al. propose graph-based alignment methods (Li et al. 2022; Li, Liu, and Yuan 2022b) to align class-conditional features, with IGG (Li et al. 2023b) enhancing graph generation by effectively addressing non-informative noise. As for the mean teacher paradigm, existing works (Chen et al. 2022b; Deng et al. 2023; Cao et al. 2023; Li, Guo, and Yuan 2023; Liu et al. 2022) focus on generating pseudo-labels for the target domain. In general, existing DAOD works have limited generalizability in open-world scenarios.

Figure 2: Illustration of the proposed DPA framework. (a) GDPA establishes the global-level embedding feature to sample outliers in the feature space, then applies the CDF of a Gaussian distribution to weight the probability distribution. (b) IDSA obtains the gradient norm of the instance probability and models it as a Gaussian distribution for sampling and weighting. (c) PCC obtains the domain-private common centroid and constrains the distances of samples to the centroid between feature and probability spaces.

Universal Domain Adaptation (UniDA)

UniDA (You et al. 2019) is a general paradigm for partial-set (Zhang et al. 2018), open-set (Panareda Busto and Gall 2017), and closed-set domain adaptation (Tzeng et al. 2017). Existing UniDA methods can be categorized into four paradigms: threshold, clustering, optimal transport, and mutual learning. The threshold methods (You et al. 2019; Fu et al. 2020; Chen et al. 2022a) estimate inter-sample uncertainty to identify shared categories, often relying heavily on manually set thresholds. Clustering-based UniDA methods (Saito et al. 2020; Li et al. 2021a) have been developed to distinguish shared categories. UniOT (Chang et al. 2022) introduces optimal transport to detect shared categories. Despite diverse advances in classification tasks, UniDAOD remains in its early stages. US-DAF (Shi et al. 2022) conducts scale domain alignment and uses thresholds for sample mining. Recently, UCF (Lang et al. 2023) and W-adapt (Shi et al. 2024a) employ weighting mechanisms to mitigate the negative transfer of private categories. CODE (Shi et al. 2024b) leverages virtual domain labels to avoid aligning domain-private samples. In general, existing UniDAOD methods perform domain-shared category alignment at both the global and instance levels but overlook the alignment of domain-private categories.

Theoretical Motivation

We theoretically analyze the error risk upper bound of UniDAOD for domain-shared and domain-private categories based on the theory of Unsupervised Domain Adaptation (UDA) (Ben-David et al. 2010).
Definition 1. Universal Domain Adaptation (UniDA). We define source and target domains with data $x_{s/t}$ drawn from distributions $\left\{\mathcal{D}_{s/t} \mid \mathcal{P}_{x\sim\mathcal{D}_{s}}\neq\mathcal{P}_{x\sim\mathcal{D}_{t}}\right\}$, and a label function $\psi: x\to\left\{c, c^{\ast}\right\}$, where $c$ denotes the domain-shared categories and $c^{\ast}$ the domain-private categories. For simplicity, we omit the $s/t$ notation unless explicitly indicated. The goal of UniDA is to train a model $h$ that minimizes the shared-category error risk on the target domain, $\epsilon_{c}(h)=\min\, \mathbf{E}_{(x,\psi(x))\sim\mathcal{D}_{t}^{c}}\left[h(x)\neq\psi(x)\right]$.
Definition 2. Private and Shared Category Error Risks. Given the input $x$ drawn from the distribution $\mathcal{D}$ with the label function $\psi$, the error risk is:

$$\epsilon(h)=\mathbb{E}_{x\sim\mathcal{D}}\left|h(x)-\psi(x)\right|=\int_{x}\left|h(x)-\psi(x)\right|\,\mathcal{P}_{x,\psi}\,\mathrm{d}x. \tag{1}$$

We decompose Eq. (1) into domain-shared and domain-private categories, letting $\mathcal{F}_{h,\psi}$ denote $\left|h(x)-\psi(x)\right|$:

$$\epsilon(h)=\int_{x_{c}}\mathcal{F}_{h,\psi}\,\mathcal{P}_{x,\psi}\,\mathrm{d}x+\int_{x_{c^{\ast}}}\mathcal{F}_{h,\psi}\,\mathcal{P}_{x,\psi}\,\mathrm{d}x=\epsilon_{c}(h)+\epsilon_{c^{\ast}}(h), \tag{2}$$

where $\epsilon_{c}(h)$ and $\epsilon_{c^{\ast}}(h)$ are the error risks for domain-shared and domain-private categories. Then, we define a symmetric hypothesis space $\mathcal{H}$ and combine the error risk upper bound of UDA (Ben-David et al. 2010) with Eq. (2) to obtain the error risk upper bound for UniDAOD:

$$\begin{aligned}
\epsilon_{c}^{t}(h)\leq{}&\underbrace{\epsilon^{s}(h)}_{\mathcal{L}_{\mathrm{det}}}-\underbrace{\epsilon_{c^{\ast}}^{t}(h)}_{\mathcal{L}_{\mathrm{PCC}}}+\underbrace{d_{\mathcal{H}}^{c}\left(\mathcal{D}_{s}^{c},\mathcal{D}_{t}^{c}\right)}_{\mathcal{L}_{\mathrm{IDSA}}}+\underbrace{d_{\mathcal{H}}^{c^{\ast}}\left(\mathcal{D}_{s}^{c^{\ast}},\mathcal{D}_{t}^{c^{\ast}}\right)}_{\mathcal{L}_{\mathrm{GDPA}}}\\
&+\underbrace{\mathbf{E}_{x\sim\mathcal{D}_{s}^{c^{\ast}}}\left[\psi_{t}(x)-\psi_{s}(x)\right]}_{\operatorname*{arg\,min}_{\mathbf{E}_{x\sim\mathcal{D}_{s}^{c^{\ast}}}}\ \text{at global-level}}+\underbrace{\mathbf{E}_{x\sim\mathcal{D}_{s}^{c}}\left[\psi_{t}(x)-\psi_{s}(x)\right]}_{\operatorname*{arg\,min}_{\mathbf{E}_{x\sim\mathcal{D}_{s}^{c}}}\ \text{at instance-level}}.
\end{aligned} \tag{3}$$
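One way to read Eq. (3) is sketched below. It assumes that the standard UDA bound of Ben-David et al. (2010) is applied over the full label space, and that the discrepancy and labeling-disagreement terms split additively over shared and private categories. The symbol $\lambda$ is introduced here only as shorthand for the labeling-disagreement term and does not appear in the original text.

```latex
\begin{aligned}
\epsilon^{t}(h) &\le \epsilon^{s}(h) + d_{\mathcal{H}}(\mathcal{D}_s,\mathcal{D}_t) + \lambda
  && \text{(UDA bound)}\\
\epsilon_{c}^{t}(h) + \epsilon_{c^{\ast}}^{t}(h) &\le \epsilon^{s}(h) + d_{\mathcal{H}}(\mathcal{D}_s,\mathcal{D}_t) + \lambda
  && \text{(Eq. (2) on the target domain)}\\
\epsilon_{c}^{t}(h) &\le \epsilon^{s}(h) - \epsilon_{c^{\ast}}^{t}(h)
  + d_{\mathcal{H}}^{c} + d_{\mathcal{H}}^{c^{\ast}} + \lambda_{c^{\ast}} + \lambda_{c}
  && \text{(split by category)}
\end{aligned}
```

Under this reading, the minus sign in front of $\epsilon_{c^{\ast}}^{t}(h)$ is what makes a larger private-category risk on the target domain tighten the bound on the shared-category risk.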

Remark 1. Existing UniDAOD methods (Shi et al. 2022; Lang et al. 2023; Shi et al. 2024b, a) apply the domain-shared category alignment $d_{\mathcal{H}}^{c}\left(\mathcal{D}_{s}^{c},\mathcal{D}_{t}^{c}\right)$ to both global-level and instance-level features, and they ignore both the domain-private category alignment $d_{\mathcal{H}}^{c^{\ast}}\left(\mathcal{D}_{s}^{c^{\ast}},\mathcal{D}_{t}^{c^{\ast}}\right)$ and the maximization of the target-domain error risk $\epsilon_{c^{\ast}}^{t}(h)$ of the domain-private categories. This oversight increases the upper bound of the domain-shared category error risk $\epsilon_{c}^{t}(h)$ in the target domain. In UniDAOD, this issue pertains to the global-level domain-private category alignment and the instance-level domain-shared category alignment. This conclusion is consistent with the observations in Fig. 1.

Dual Probabilistic Alignment Framework

Overview. The proposed DPA framework is depicted in Fig. 2. To minimize the upper bound $\epsilon_{c}^{t}(h)$ of the domain-shared category error risk on the target domain, DPA comprises GDPA, IDSA, and PCC to optimize the terms in Eq. (3). GDPA minimizes the domain-private category ($c^{\ast}$) distribution discrepancy $d_{\mathcal{H}}^{c^{\ast}}\left(\mathcal{D}_{s}^{c^{\ast}},\mathcal{D}_{t}^{c^{\ast}}\right)$ for global-level features, and IDSA minimizes the domain-shared category distribution discrepancy $d_{\mathcal{H}}^{c}\left(\mathcal{D}_{s}^{c},\mathcal{D}_{t}^{c}\right)$ for instance-level features. Additionally, PCC maximizes the domain-private category error risk $\epsilon_{c^{\ast}}^{t}(h)$ of the target domain.

Global-level Domain Private Alignment (GDPA)

To align global-level domain-private category features, we sample outliers in the feature space, model the batch samples as a Gaussian distribution, and use its cumulative distribution function to estimate the domain distribution as the alignment weights.

Global-level Sampling.

The global-level sampling process involves constructing the dynamic feature centroid and updating the learnable radius. The embedding feature $x_{e}$, produced by the encoder of the domain discriminator, is used to compute the dynamic feature centroid and the learnable radius. The dynamic feature centroid, denoted as $\mathbf{C}=\mathbf{M}(y_{d})$, is derived from the memory bank $\mathbf{M}\in\mathbb{R}^{2\times c^{\prime}}$. The learnable radius applies the softplus activation function to calculate the boundary $d=\log(1+e^{\nabla(y_{d})})$, where $\nabla\in\mathbb{R}^{2}$ represents the learnable boundary parameters and $y_{d}$ is the domain label. Subsequently, we calculate the distance from each sample to the feature centroid and perform sampling with the learnable boundary as follows:

$$\begin{aligned}
\Omega^{\text{neg}}_{g}&=\{i\mid\|x_{i}-\mathbf{C}\|_{2}>d\},\\
\Omega^{\text{pos}}_{g}&=\{i\mid\|x_{i}-\mathbf{C}\|_{2}\leq d\},
\end{aligned} \tag{4}$$

where $\Omega^{\text{neg}}_{g}$ is the set of negative sample indices and $\Omega^{\text{pos}}_{g}$ is the set of positive sample indices. Finally, we update the feature centroid and the learnable radius. The memory bank $\mathbf{M}(y_{d})=\mathbf{M}(y_{d})\cdot\pi+\overline{x}_{e}\cdot(1-\pi)$ is adjusted using a momentum update with $\pi=\frac{\overline{x}_{e}\cdot\mathbf{M}(y_{d})}{\left\|\overline{x}_{e}\right\|_{2}\cdot\left\|\mathbf{M}(y_{d})\right\|_{2}}$, where $\overline{x}_{e}=\frac{1}{n}\sum_{i}^{n}x_{e,i}$ is the mean of the current batch of embedding features. Additionally, the learnable radius is updated based on the boundary loss $\mathcal{L}_{\text{bound}}$ as follows:

$$\mathcal{L}_{\text{bound}}=\frac{1}{n}\sum_{i=1}^{n}\epsilon_{i}\left(d-\left\|x_{i}-\mathbf{C}\right\|_{2}\right)+(1-\epsilon_{i})\left(\left\|x_{i}-\mathbf{C}\right\|_{2}-d\right), \tag{5}$$

where $\epsilon_{i}=\mathbb{I}(i\in\Omega^{\text{neg}}_{g})$ is the indicator function. In contrast to existing threshold-based UniDAOD methods (Shi et al. 2024b, 2022), GDPA sampling is data-driven and updated adaptively.
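The sampling steps above can be sketched in plain Python. This is a minimal illustration rather than the released implementation: the function names are hypothetical, features are plain lists of floats, the centroid and batch mean are assumed to be non-zero vectors, and the boundary loss is transcribed term by term from Eq. (5) as printed.

```python
import math

def softplus(v):
    # Learnable radius d = log(1 + e^v); always positive.
    return math.log1p(math.exp(v))

def l2_distance(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def split_by_radius(features, centroid, radius):
    # Eq. (4): indices farther than the radius are negatives, the rest positives.
    neg = [i for i, x in enumerate(features) if l2_distance(x, centroid) > radius]
    pos = [i for i, x in enumerate(features) if l2_distance(x, centroid) <= radius]
    return neg, pos

def boundary_loss(features, centroid, radius, neg):
    # Eq. (5) as printed: (d - dist) for negatives, (dist - d) for positives, averaged.
    neg_set = set(neg)
    total = 0.0
    for i, x in enumerate(features):
        dist = l2_distance(x, centroid)
        total += (radius - dist) if i in neg_set else (dist - radius)
    return total / len(features)

def momentum_update(centroid, batch_mean):
    # Memory-bank update; pi is the cosine similarity between batch mean and centroid.
    # Assumes both vectors have non-zero norm.
    dot = sum(m * c for m, c in zip(batch_mean, centroid))
    norms = math.sqrt(sum(m * m for m in batch_mean)) * math.sqrt(sum(c * c for c in centroid))
    pi = dot / norms
    return [c * pi + m * (1.0 - pi) for c, m in zip(centroid, batch_mean)]
```

With a centroid at the origin and a radius of 2, the points (0, 0) and (1, 0) fall inside the boundary and (3, 4) falls outside, so only the last is mined as a negative (domain-private) sample. Because the momentum coefficient is a cosine similarity, a batch mean orthogonal to the stored centroid replaces it entirely, while a perfectly aligned one leaves it unchanged.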
Calculating Alignment Weight. We obtain the domain probability of the embedding feature, $p_{g}=\mathcal{F}_{\mathcal{D}_{g}}(x_{e})$, through the global-level domain discriminator $\mathcal{D}_{g}$. To model the probabilities of the current batch as a Gaussian distribution, we estimate the expectation $\mu_{g}=\frac{1}{n}\sum_{i}^{n}p_{g,i}$ and the variance $\sigma_{g}^{2}=\frac{1}{n}\sum_{i}^{n}(p_{g,i}-\mu_{g})^{2}$. We then adopt the cumulative distribution function (CDF) as the weight for domain alignment:

$$\Phi(z)=\frac{1}{2}\left[1+\mathrm{erf}\left(\frac{z-\mu_{g}}{\sigma_{g}\sqrt{2}}\right)\right], \tag{6}$$

where $\mathrm{erf}(\cdot)$ is the Gauss error function, and $z$ is the mean of the probability distribution for adversarial training. As illustrated in Fig. 5, we observe that as the shared category ratio $\beta=\frac{\mathcal{C}_{s}\cap\mathcal{C}_{t}}{\mathcal{C}_{s}\cup\mathcal{C}_{t}}$ decreases, the weights $\frac{\Phi_{s}/(1-\Phi_{t})}{\Phi_{s}+(1-\Phi_{t})}$ exhibit increased scaling to accommodate a substantial domain gap.
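As a rough sketch of the weight calculation, the snippet below estimates the batch statistics, evaluates Eq. (6) with the standard-library error function, and normalizes the source and target terms in the way they appear in Eq. (7). The function names are hypothetical, a non-zero standard deviation is assumed, and how $z$ is chosen in practice is left to the caller.

```python
import math

def batch_gaussian(probs):
    # Mean and (population) standard deviation of a batch of domain probabilities.
    n = len(probs)
    mu = sum(probs) / n
    var = sum((p - mu) ** 2 for p in probs) / n
    return mu, math.sqrt(var)

def gaussian_cdf(z, mu, sigma):
    # Eq. (6); assumes sigma > 0.
    return 0.5 * (1.0 + math.erf((z - mu) / (sigma * math.sqrt(2.0))))

def alignment_weights(phi_s, phi_t):
    # Source weight Phi_s / (Phi_s + (1 - Phi_t)) and target weight
    # (1 - Phi_t) / (Phi_s + (1 - Phi_t)); the two sum to one.
    denom = phi_s + (1.0 - phi_t)
    return phi_s / denom, (1.0 - phi_t) / denom
```

Evaluating the CDF at the batch mean always gives 0.5, so the weights only move away from an even split when $z$ departs from the batch mean.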

Global-level Domain Alignment.

To achieve global-level alignment, the gradient reversal layer is employed with focal loss as follows:

$$\begin{aligned}
\mathcal{L}_{\text{GDPA}}=-\frac{1}{n_{\text{neg}}}\sum_{i=1}^{n_{\text{neg}}}\Big[&\frac{\Phi_{s}}{\Phi_{s}+(1-\Phi_{t})}\left(1-p_{s,i}\right)^{\gamma}\log p_{s,i}\\
&+\frac{(1-\Phi_{t})}{\Phi_{s}+(1-\Phi_{t})}\,p^{\gamma}_{t,i}\left(1-\log p_{t,i}\right)\Big],
\end{aligned} \tag{7}$$

where $\gamma$ is the focusing parameter of the focal loss, $p$ is the probability output by the domain discriminator, and $n_{\text{neg}}$ is the number of negative samples in $\Omega^{\text{neg}}_{g}$.

| Methods | boat | bottle | bus | car | cat | chair | cow | table | dog | horse | motor | person | plant | sheep | sofa | mAP |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Source Only | 31.8 | 41.2 | 31.1 | 34.7 | 5.1 | 33.7 | 23.0 | 20.7 | 8.3 | 43.0 | 52.7 | 49.6 | 40.6 | 17.0 | 13.8 | 29.8 |
| DAF (Chen et al. 2018) | 37.2 | 38.0 | 26.9 | 35.9 | 2.3 | 35.2 | 24.0 | 28.5 | 4.2 | 33.8 | 54.7 | 59.4 | 58.4 | 13.4 | 17.9 | 31.3 |
| MAF (He and Zhang 2019) | 24.2 | 42.9 | 35.1 | 32.3 | 11.0 | 41.7 | 22.4 | 32.6 | 6.7 | 40.0 | 59.1 | 52.7 | 41.0 | 24.1 | 17.9 | 32.2 |
| HTCN (Chen et al. 2020) | 25.9 | 47.8 | 36.0 | 32.8 | 11.3 | 39.4 | 51.7 | 18.7 | 10.5 | 40.9 | 56.3 | 57.9 | 49.4 | 21.3 | 20.4 | 34.7 |
| UAN (You et al. 2019) | 26.6 | 37.7 | 48.2 | 31.5 | 8.6 | 32.8 | 23.7 | 31.6 | 2.4 | 36.6 | 56.6 | 42.8 | 44.8 | 14.7 | 16.4 | 30.3 |
| CMU (Fu et al. 2020) | 14.7 | 41.9 | 52.5 | 34.7 | 9.2 | 36.5 | 38.1 | 21.0 | 7.6 | 37.0 | 48.6 | 55.7 | 44.5 | 17.7 | 21.1 | 32.1 |
| SFA (Wang et al. 2021) | 25.2 | 30.5 | 24.3 | 22.1 | 1.6 | 28.3 | 37.5 | 23.4 | 2.6 | 19.9 | 43.1 | 47.0 | 39.0 | 16.8 | 22.0 | 25.5 |
| US-DAF (Shi et al. 2022) | 34.9 | 40.8 | 28.9 | 36.4 | 17.7 | 38.4 | 64.6 | 28.0 | 10.3 | 45.8 | 64.5 | 62.5 | 52.1 | 25.8 | 24.8 | 38.4 |
| UCF (Lang et al. 2023) | 36.2 | 44.3 | 28.3 | 37.1 | 2.2 | 36.0 | 61.9 | 27.7 | 4.0 | 39.9 | 64.7 | 64.2 | 52.6 | 20.9 | 26.9 | 36.5 |
| CODE (Shi et al. 2024b) | 36.8 | 45.1 | 42.0 | 37.7 | 18.4 | 44.5 | 47.7 | 33.1 | 8.4 | 45.7 | 69.2 | 61.7 | 50.7 | 25.5 | 24.6 | 39.4 |
| DPA (Ours) | 32.9 | 46.0 | 62.1 | 41.4 | 4.2 | 42.0 | 64.4 | 33.3 | 8.1 | 40.5 | 67.1 | 64.2 | 57.8 | 32.1 | 25.1 | 41.4 |

Table 1: Comparison on Pascal VOC to Clipart1k (Open-set: $\beta=75\%$, $C_{s}\cap C_{t}\neq\emptyset$, $C_{s}\setminus C_{t}\neq\emptyset$, $C_{t}\setminus C_{s}\neq\emptyset$).

| Methods | bus | car | cat | chair | cow | table | dog | horse | motor | person | mAP |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Source Only | 43.3 | 33.0 | 8.4 | 32.1 | 24.0 | 28.7 | 6.9 | 34.9 | 51.8 | 42.5 | 30.6 |
| DAF (Chen et al. 2018) | 37.5 | 32.8 | 10.2 | 40.3 | 27.2 | 31.3 | 4.1 | 41.0 | 55.5 | 52.0 | 33.2 |
| MAF (He and Zhang 2019) | 37.1 | 31.1 | 9.7 | 38.1 | 19.9 | 29.1 | 2.5 | 37.3 | 50.7 | 50.0 | 30.6 |
| HTCN (Chen et al. 2020) | 29.5 | 34.4 | 17.3 | 33.8 | 50.6 | 14.0 | 3.6 | 46.9 | 74.7 | 58.5 | 36.3 |
| UAN (You et al. 2019) | 48.9 | 26.4 | 14.6 | 36.7 | 49.9 | 30.0 | 3.2 | 39.9 | 56.1 | 52.0 | 35.8 |
| CMU (Fu et al. 2020) | 33.3 | 32.8 | 8.1 | 41.5 | 55.5 | 24.6 | 5.6 | 43.3 | 54.9 | 60.4 | 36.0 |
| SFA (Wang et al. 2021) | 29.4 | 28.3 | 14.4 | 30.5 | 29.4 | 13.0 | 3.1 | 26.8 | 61.6 | 2.9 | 27.9 |
| US-DAF (Shi et al. 2022) | 31.3 | 41.9 | 7.3 | 42.9 | 64.3 | 30.0 | 5.7 | 44.8 | 69.5 | 61.9 | 40.0 |
| UCF (Lang et al. 2023) | 32.4 | 37.4 | 4.0 | 33.5 | 59.3 | 40.2 | 4.4 | 39.0 | 58.1 | 61.7 | 37.0 |
| CODE (Shi et al. 2024b) | 50.0 | 38.8 | 19.5 | 42.0 | 46.9 | 34.5 | 13.5 | 43.7 | 64.3 | 52.9 | 40.6 |
| DPA (Ours) | 45.0 | 41.3 | 13.7 | 38.0 | 64.9 | 30.0 | 13.8 | 45.8 | 74.0 | 60.2 | 42.7 |

Table 2: Comparison on Pascal VOC to Clipart1k (Open-set: $\beta=50\%$, $C_{s}\cap C_{t}\neq\emptyset$, $C_{s}\setminus C_{t}\neq\emptyset$, $C_{t}\setminus C_{s}\neq\emptyset$).

Instance-level Domain Shared Alignment (IDSA)

To efficiently align the domain-shared categories at the instance level, we calculate the gradient norm of the instance samples and model it as a Gaussian distribution, which lets us discard outlier samples and estimate the weight of the domain-shared category samples to improve domain alignment.
Instance-level Sampling. The instance-level sampling process involves constructing the probability space and the sampling criteria. First, we build the probability space to model the Gaussian distribution. The $\hat{n}$ instance-level features $x_{\upsilon}$ are passed through the domain discriminator $\mathcal{F}_{\mathcal{D}_{\upsilon}}$ to calculate the domain probabilities $p_{\upsilon}=\mathrm{Sigmoid}\left(\mathcal{F}_{\mathcal{D}_{\upsilon}}(x_{\upsilon})\right)$. The domain probabilities are used to compute the gradient norm $\eta_{\upsilon}$ for each instance as $\eta_{\upsilon}=\left|p_{\upsilon}-y_{d}\right|$, where $y_{d}$ is the domain label. We then construct the gradient norm bins $\hat{\Omega}=\left\{i\cdot\psi \mid i\in\mathbb{Z}\right\}$ to calculate the gradient norm frequencies $\tau$, which model the Gaussian distribution with the minimum interval $\psi=\operatorname{argmin}\left\{(\eta_{\upsilon}^{\max}-\eta_{\upsilon}^{\min})\cdot\eta_{\upsilon}^{\mathrm{std}},\delta\right\}$, where $\delta$ is a hyperparameter (here $\psi$ denotes the bin interval, not the label function of Definition 1). For the sampling criteria, we leverage the statistical characteristics of the Gaussian distribution. The first sampling criterion filters out samples in noncontinuous frequency bins. The second criterion follows from the shape of the Gaussian distribution: the frequency of filtered bins is lower than that of continuous bins. The sampling process is as follows:

$$\begin{aligned}
\Omega^{\rm pos}_{\upsilon}&=\left\{i\mid\tau_{i}>0,\;i_{j+1}-i_{j}=1,\ \forall j\right\},\\
\Omega^{\rm neg}_{\upsilon}&=\left\{i\mid i\notin\Omega^{\rm pos}_{\upsilon},\;\tau_{i}<\tau_{\omega}\right\},
\end{aligned} \tag{8}$$

where $\tau_{\omega}$ denotes the frequency in the first and last continuous bins of the positive samples $\Omega^{\rm pos}_{\upsilon}$. The sum of the bins represents the sampling radius from the feature centroid in the feature space, which dynamically adjusts the bins following a Gaussian distribution of the source or target domain data during adversarial training.
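The binning and selection logic can be illustrated with a small pure-Python sketch. Several choices here are assumptions made for illustration and are not stated in the text: the argmin in the interval definition is read as taking the smaller of the two values, the positive set is taken to be the longest run of consecutive non-empty bins, and $\tau_{\omega}$ is taken as the smaller of the frequencies at the two ends of that run. All function names are hypothetical.

```python
import math
from collections import Counter

def gradient_norms(probs, domain_label):
    # eta = |p - y_d| for each instance.
    return [abs(p - domain_label) for p in probs]

def bin_width(norms, delta):
    # Assumed reading: the smaller of (max - min) * std and delta.
    mean = sum(norms) / len(norms)
    std = math.sqrt(sum((v - mean) ** 2 for v in norms) / len(norms))
    return min((max(norms) - min(norms)) * std, delta)

def bin_frequencies(norms, width):
    # Frequency tau_i of each bin index i; assumes width > 0.
    return Counter(int(v // width) for v in norms)

def longest_consecutive_run(freq):
    # Assumed reading of Eq. (8): longest run of adjacent non-empty bins.
    bins = sorted(b for b, count in freq.items() if count > 0)
    best, current = [], []
    for b in bins:
        current = current + [b] if current and b == current[-1] + 1 else [b]
        if len(current) > len(best):
            best = list(current)
    return best

def split_bins(freq):
    # Positive bins form the continuous run; negative bins lie outside it
    # and are rarer than tau_omega (assumed: the smaller edge frequency).
    pos = longest_consecutive_run(freq)
    tau_omega = min(freq[pos[0]], freq[pos[-1]])
    neg = [b for b in sorted(freq) if b not in pos and freq[b] < tau_omega]
    return pos, neg
```

For bin frequencies {0: 3, 1: 5, 2: 4, 6: 1}, bins 0 to 2 form the continuous run and are kept as domain-shared candidates, while the isolated, low-frequency bin 6 is treated as an outlier.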
Calculating Alignment Weight. These negative instances are excluded from the instance alignment through the instance weight as follows:

$$\mathcal{W}_{\upsilon}=\begin{cases}0,&\mathcal{W}_{\upsilon}\in\Omega^{\rm neg}_{\upsilon},\\ 1-\dfrac{\left|\eta_{\upsilon}^{\rm mean}-0.5\right|}{0.5},&\mathcal{W}_{\upsilon}\in\Omega^{\rm pos}_{\upsilon},\end{cases} \tag{9}$$

where $\eta_{\upsilon}^{\rm mean}$ is the mean value of the gradient norm $\eta_{\upsilon}$.
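A small worked example of Eq. (9), with a hypothetical helper name, shows how the weight behaves: it is largest when the mean gradient norm sits at 0.5 (the discriminator cannot tell the domains apart) and falls linearly to zero as the mean approaches 0 or 1.

```python
def instance_weight(is_positive, eta_mean):
    # Eq. (9): negatives get zero weight; positives get 1 - |eta_mean - 0.5| / 0.5.
    if not is_positive:
        return 0.0
    return 1.0 - abs(eta_mean - 0.5) / 0.5

weights = [instance_weight(True, m) for m in (0.0, 0.25, 0.5, 0.75, 1.0)]
# weights == [0.0, 0.5, 1.0, 0.5, 0.0]
```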

Instance-level Domain Alignment.

This step feeds the instance-level features into the domain discriminator for adversarial training to achieve domain alignment. Based on the obtained weights $\mathcal{W}_{\upsilon}$, the loss function of the IDSA module is as follows:

$$\mathcal{L}_{\rm IDSA}=-\frac{1}{\hat{n}}\sum_{i}^{\hat{n}}\mathcal{W}_{\upsilon}\cdot(1-p_{\upsilon})\log(p_{\upsilon})+\mathcal{W}_{\upsilon}\cdot p_{\upsilon}(1-\log(p_{\upsilon})), \tag{10}$$

where $\hat{n}$ is the number of instance proposals. By optimizing $\mathcal{L}_{\rm IDSA}$ in adversarial training, the IDSA module mitigates negative transfer caused by domain-private feature alignment. It calibrates the domain-shared feature distribution according to a Gaussian distribution to enhance positive transfer between source and target domains.

Private Class Constraint (PCC)

Given the instance-level feature $x_{\upsilon}$ and the domain probabilities $p_{\upsilon}$, we first apply a classifier head to identify domain-private categories for both the source domain and the target domain: $\{c^{\ast}_{i}\mid\hat{y}_{s,i}^{c^{\ast}}\cap\hat{y}_{t,i}^{c^{\ast}}=\emptyset\}^{n^{\ast}}_{i}$. To aggregate the centroids of domain-private categories in feature and probability spaces, we calculate the feature centroid $\bar{x}_{\upsilon}$ and the probability centroid $\bar{p}_{\upsilon}$. We then compute the distances from the feature samples, $\mathcal{G}_{i}=\left\|x_{\upsilon,i}-\bar{x}_{\upsilon}\right\|_{2}$, and from the probability samples, $\mathbf{g}_{i}=\left\|p_{\upsilon,i}-\bar{p}_{\upsilon}\right\|_{2}$, to their respective centroids. To measure the intra-domain distance, we use cosine similarity, defined as $\varepsilon_{s/t}=\frac{1}{n^{\ast}}\sum_{i=1}^{n^{\ast}}\frac{\mathcal{G}_{i}\cdot\mathbf{g}_{i}}{\left\|\mathcal{G}_{i}\right\|_{2}\cdot\left\|\mathbf{g}_{i}\right\|_{2}}$. Finally, we employ the mean squared error (MSE) loss function to minimize the inter-domain distance, as follows:

$$\mathcal{L}_{\rm PCC}=\left(\varepsilon_{s}-\varepsilon_{t}\right)^{2}. \tag{11}$$

When optimizing the network with $\mathcal{L}_{\rm PCC}$, the gradient is detached for the source domain.

Optimization

The training loss of DPA is denoted as $\mathcal{L}_{\rm DPA}$ and consists of the following loss terms:

$$\mathcal{L}_{\rm DPA}=\mathcal{L}_{\rm det}+\mathcal{L}_{\rm GDPA}+\mathcal{L}_{\rm IDSA}+\alpha\mathcal{L}_{\rm PCC}, \tag{12}$$

where $\mathcal{L}_{\rm det}$ is the Faster-RCNN detector loss. $\mathcal{L}_{\rm GDPA}$ and $\mathcal{L}_{\rm IDSA}$ are the domain alignment losses for the GDPA and IDSA modules at the global and instance levels, respectively. $\mathcal{L}_{\rm DPA}$ is optimized using the SGD optimizer. The boundary loss $\mathcal{L}_{\rm bound}$ is optimized using the Adam optimizer with a learning rate of 0.1. The hyperparameter $\alpha$ is 0 for the initial epoch and 0.1 thereafter.
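The combination in Eq. (12) and the schedule for the PCC weight can be written as a short sketch. The function names are hypothetical, the individual loss values are assumed to be already computed scalars, and epochs are assumed to be counted from zero.

```python
def pcc_weight(epoch, alpha=0.1):
    # alpha is 0 for the initial epoch and 0.1 thereafter (epochs counted from 0).
    return 0.0 if epoch == 0 else alpha

def dpa_loss(l_det, l_gdpa, l_idsa, l_pcc, epoch):
    # Eq. (12): detector loss plus the two alignment losses plus weighted PCC.
    return l_det + l_gdpa + l_idsa + pcc_weight(epoch) * l_pcc
```

Delaying the PCC term by one epoch means the constraint only starts to act once the classifier head and discriminators have seen a full pass over the data.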

Experiments

| Methods | plane | bicycle | bird | boat | bottle | mAP |
|---|---|---|---|---|---|---|
| Source Only | 33.2 | 55.7 | 25.4 | 29.2 | 41.6 | 37.0 |
| DAF | 31.5 | 42.5 | 25.2 | 34.4 | 50.8 | 36.9 |
| MAF | 29.3 | 57.0 | 27.1 | 33.9 | 41.8 | 37.8 |
| HTCN | 32.5 | 53.0 | 24.1 | 27.0 | 48.4 | 37.0 |
| UAN | 35.6 | 55.9 | 27.1 | 28.2 | 44.2 | 38.2 |
| CMU | 45.5 | 52.7 | 28.8 | 29.4 | 40.1 | 39.3 |
| SFA | 28.4 | 32.4 | 27.2 | 34.2 | 34.2 | 31.3 |
| US-DAF | 44.2 | 57.5 | 27.9 | 32.2 | 40.5 | 40.5 |
| UCF | 35.8 | 52.9 | 28.6 | 20.8 | 55.7 | 38.8 |
| CODE | 42.1 | 61.4 | 26.2 | 32.1 | 44.1 | 41.2 |
| DPA (Ours) | 40.8 | 58.1 | 28.2 | 33.7 | 52.0 | 42.5 |

Table 3: Comparison on Pascal VOC to Clipart1k (Open-set: $\beta=25\%$, $C_{s}\cap C_{t}\neq\emptyset$, $C_{s}\setminus C_{t}\neq\emptyset$, $C_{t}\setminus C_{s}\neq\emptyset$).

| Methods | bicycle | bird | car | cat | dog | person | mAP |
|---|---|---|---|---|---|---|---|
| Source Only | 29.8 | 50.2 | 47.1 | 62.2 | 51.5 | 57.8 | 49.8 |
| DAF | 29.5 | 53.8 | 50.6 | 58.1 | 48.1 | 56.5 | 49.4 |
| MAF | 28.5 | 50.0 | 46.8 | 59.4 | 50.2 | 58.6 | 48.9 |
| HTCN | 26.4 | 43.0 | 46.5 | 50.8 | 44.0 | 53.9 | 44.1 |
| UAN | 33.6 | 52.1 | 53.8 | 62.4 | 52.2 | 56.1 | 51.7 |
| CMU | 36.9 | 51.2 | 53.3 | 59.3 | 51.7 | 59.9 | 52.0 |
| CODE | 39.6 | 53.1 | 54.7 | 57.6 | 56.1 | 57.8 | 53.1 |
| SFA | 4.4 | 17.4 | 11.1 | 14.9 | 20.1 | 18.4 | 14.4 |
| US-DAF | 35.0 | 52.4 | 52.7 | 63.1 | 54.3 | 59.8 | 52.9 |
| UCF | 34.8 | 52.0 | 53.8 | 61.9 | 54.2 | 60.5 | 52.9 |
| DPA (Ours) | 31.3 | 55.2 | 56.4 | 61.1 | 62.1 | 58.3 | 54.1 |

Table 4: Comparison on WaterColor to Pascal VOC (Partial-set: $\beta=30\%$, $C_{s}\subset C_{t}$).

| Methods | bicycle | bird | car | cat | dog | person | mAP |
|---|---|---|---|---|---|---|---|
| Source Only | 82.4 | 51.7 | 48.4 | 39.9 | 30.7 | 59.2 | 52.0 |
| DAF | 73.4 | 51.9 | 43.1 | 35.6 | 28.8 | 63.1 | 49.3 |
| MAF | 70.4 | 50.3 | 44.3 | 36.7 | 30.6 | 62.9 | 49.2 |
| HTCN | 74.1 | 49.8 | 51.9 | 35.3 | 35.3 | 66.0 | 52.1 |
| UAN | 78.0 | 53.6 | 50.4 | 36.4 | 35.8 | 65.6 | 53.3 |
| CMU | 82.0 | 53.9 | 48.6 | 39.6 | 33.1 | 66.0 | 53.9 |
| SFA | 37.1 | 39.3 | 32.3 | 52.7 | 9.9 | 34.1 | 34.2 |
| US-DAF | 86.5 | 54.1 | 50.0 | 43.0 | 34.0 | 63.2 | 55.2 |
| UCF | 84.8 | 52.1 | 49.8 | 40.6 | 33.8 | 63.2 | 54.1 |
| CODE | 87.9 | 55.3 | 50.7 | 38.9 | 34.7 | 67.5 | 55.8 |
| DPA (Ours) | 86.1 | 53.5 | 50.3 | 44.1 | 38.1 | 65.6 | 56.3 |

Table 5: Comparison on Pascal VOC to WaterColor (Partial-set: $\beta=30\%$, $C_{s}\supset C_{t}$).

| Methods | DA Settings | mAP |
|---|---|---|
| SFA (Wang et al. 2021) | DAOD | 41.3 |
| SCAN++ (Li, Liu, and Yuan 2022a) | DAOD | 42.8 |
| SIGMA++ (Li, Liu, and Yuan 2023) | DAOD | 44.5 |
| US-DAF (Shi et al. 2022) | UniDAOD | 37.8 |
| UCF (Lang et al. 2023) | UniDAOD | 34.2 |
| CODE (Shi et al. 2024b) | UniDAOD | 42.1 |
| DPA (Ours) | UniDAOD | 46.3 |

Table 6: Comparison on Cityscapes to Foggy Cityscapes (Closed-set: $\beta=100\%$, $C_{s}=C_{t}$).

Implementation Details

We conduct extensive experiments following the setting of (Shi et al. 2022) on three benchmarks: open-set, partial-set, and closed-set. The baseline approach (Chen et al. 2018) adopts Faster-RCNN as the base detector with the focal loss. The backbone is VGG-16 (Simonyan and Zisserman 2014) for Cityscapes to Foggy Cityscapes and ResNet-101 (He et al. 2016) for the other benchmarks, both pre-trained on ImageNet (Deng et al. 2009). The DPA model is trained for 100k iterations, with an initial learning rate of 1e-3 that decays to 1e-4 after 50k iterations. The detection performance is evaluated with the mean Average Precision (mAP) metric, and the mAP threshold is set to 0.5 following (Shi et al. 2022).
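The learning-rate schedule described above amounts to a single step decay. The sketch below uses a hypothetical function name and assumes that the lower rate takes effect from iteration 50,000 onward, with iterations counted from zero.

```python
def learning_rate(iteration, base_lr=1e-3, decayed_lr=1e-4, decay_at=50_000):
    # Step decay: 1e-3 for the first 50k iterations, 1e-4 for the remaining 50k.
    return base_lr if iteration < decay_at else decayed_lr
```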

Datasets and Domain Adaptation Settings

We evaluate our DPA framework on five datasets across three domain adaptation scenarios (open-set, partial-set, and closed-set): Foggy Cityscapes (Sakaridis, Dai, and Van Gool 2018), Cityscapes (Cordts et al. 2016), Pascal VOC (Everingham et al. 2010), Clipart1k (Inoue et al. 2018), and Watercolor (Inoue et al. 2018). In the open-set scenario, both the source and target domains contain shared and private categories. We introduce multiple ratios $\beta\in\left\{0.25,0.5,0.75\right\}$ to construct benchmarks with different shared-category ratios $\beta=\frac{\mathcal{C}_{s}\cap\mathcal{C}_{t}}{\mathcal{C}_{s}\cup\mathcal{C}_{t}}$. In the partial-set scenario, the category set of the source domain is a subset of that of the target domain, or vice versa. In the closed-set scenario, the categories in the target and source domains are identical.
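As a worked example of the shared-category ratio, the sketch below reads the definition of the ratio as a ratio of set sizes (intersection over union of the two label sets). The helper name `shared_category_ratio` is hypothetical. The class lists are the standard 20 Pascal VOC classes and the six classes that appear in the header of Table 5; treating them as the exact label sets of the partial-set benchmark is an assumption, although it reproduces the reported 30%.

```python
def shared_category_ratio(source_classes, target_classes):
    """Size of the shared label set divided by the size of the union."""
    source, target = set(source_classes), set(target_classes)
    union = source | target
    if not union:
        raise ValueError("at least one domain must have a category")
    return len(source & target) / len(union)


# Standard Pascal VOC label set (20 classes).
PASCAL_VOC = [
    "aeroplane", "bicycle", "bird", "boat", "bottle", "bus", "car",
    "cat", "chair", "cow", "diningtable", "dog", "horse", "motorbike",
    "person", "pottedplant", "sheep", "sofa", "train", "tvmonitor",
]
# The six classes listed in the header of Table 5.
WATERCOLOR = ["bicycle", "bird", "car", "cat", "dog", "person"]

if __name__ == "__main__":
    # Partial-set case: one label set is a subset of the other.
    print(shared_category_ratio(PASCAL_VOC, WATERCOLOR))
    # Closed-set case: identical label sets.
    print(shared_category_ratio(WATERCOLOR, WATERCOLOR))
```

Because the ratio is symmetric in its two arguments, the same value is obtained for both partial-set directions (source as subset of target, and target as subset of source).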

Comparisons with the State-of-the-Arts

Open-set scenario. Tables 1, 2, and 3 present the open-set domain adaptive object detection performance from Pascal VOC to Clipart1k under different category overlap ratios $\beta$. The proposed framework consistently achieves state-of-the-art performance across various category settings compared to other methods. Compared to related DAOD methods (HTCN, MAF, DAF, SFA), the proposed DPA framework demonstrates significant performance advantages. Additionally, in comparison with UniDAOD methods (CODE, US-DAF, CMU, UAN, and UCF), the proposed DPA framework also exhibits superior performance.
Partial-set scenario. The results of partial-set domain adaptation are presented in Tables 4 and 5. In the partial-set scenario, the private categories are present exclusively in either the source or the target domain, which leads to negative transfer. The proposed DPA method effectively addresses this issue and outperforms the other approaches, achieving 54.1% and 56.3% mAP, respectively. When the domain-private categories lie in the target domain ($C_{s}\subset C_{t}$), DPA improves performance by 1.2% over the UniDAOD method US-DAF, as shown in Table 4. When the domain-private categories lie in the source domain ($C_{s}\supset C_{t}$), DPA improves performance by 4.2% over the DAOD method HTCN, as shown in Table 5.
Closed-set scenario. The closed-set results in Table 6 show that, although existing UniDAOD methods (Shi et al. 2024a,b) integrated with advanced DAOD methods achieve notable performance improvements in the closed-set setting, the DAOD methods still significantly outperform the UniDAOD methods. This advantage arises because DAOD methods are designed under the closed-set assumption, whereas UniDAOD methods prioritize handling label shift in open environments. Through probability modeling, the proposed DPA framework reaches 46.3% mAP in the closed-set scenario, exceeding the best DAOD result in Table 6 (SIGMA++, 44.5%).

Ablation Study

We conduct ablation experiments on each submodule, with the corresponding results presented in Table 7. In these experiments, each module of the proposed DPA framework improves performance. The GDPA and IDSA modules provide significant gains when the domain-shared category ratio is high ($\beta=50\%,75\%$), while the PCC module leads to more substantial improvements when the ratio is low ($\beta=25\%$).

Ratio $\beta$ 75% 50% 25%
Baseline 33.9 37.9 38.2
DPA w/o GDPA 38.3 39.7 40.1
DPA w/o IDSA 36.3 40.1 39.9
DPA w/o PCC 40.7 42.4 39.4
DPA (Ours) 41.4 42.7 42.5
Table 7: Ablation study on Pascal VOC to Clipart1k (Open-set: $\beta=75\%,50\%,25\%$).
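One way to read the ablation is to compute, for each module, the mAP lost when that module is removed from the full model. The sketch below only re-uses the numbers reported in Table 7; the dictionary layout and the helper name `module_contribution` are illustrative assumptions, not part of the authors' evaluation code.

```python
# mAP values copied from Table 7, keyed by shared-category ratio in percent.
TABLE_7 = {
    "Baseline":     {75: 33.9, 50: 37.9, 25: 38.2},
    "DPA w/o GDPA": {75: 38.3, 50: 39.7, 25: 40.1},
    "DPA w/o IDSA": {75: 36.3, 50: 40.1, 25: 39.9},
    "DPA w/o PCC":  {75: 40.7, 50: 42.4, 25: 39.4},
    "DPA (Ours)":   {75: 41.4, 50: 42.7, 25: 42.5},
}


def module_contribution(table, full="DPA (Ours)"):
    """mAP drop caused by removing each module from the full model."""
    full_row = table[full]
    drops = {}
    for name, row in table.items():
        if "w/o " not in name:
            continue  # skip the baseline and the full model
        module = name.split("w/o ")[1]
        drops[module] = {
            beta: round(full_row[beta] - m_ap, 1)
            for beta, m_ap in row.items()
        }
    return drops


if __name__ == "__main__":
    for module, drops in module_contribution(TABLE_7).items():
        print(module, drops)
```

Under this reading, removing PCC costs the most at the 25% ratio, while removing GDPA or IDSA costs more than removing PCC at the 50% and 75% ratios, which matches the trend described in the text.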

Category-wise Performance Analysis

To compare the proposed method with existing DAOD and UniDAOD methods in terms of positive and negative transfer, we present the performance gains of DAOD and UniDAOD methods relative to the source-only model in Fig. 3. The DAOD methods exhibit significant negative transfer: DAF, MAF, and HTCN drop by approximately 2%, 4%, and 1% AP in class 0, respectively. In contrast, the UniDAOD methods mitigate negative transfer, with CODE and DPA achieving positive transfer of around 3% and 10% in class 4, respectively. This category-wise analysis indicates that the proposed method effectively combats negative transfer and strengthens positive transfer.
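The gain over the source-only model is a per-class difference in AP. Since the values behind Fig. 3 are not listed in this section, the sketch below illustrates the same computation with per-class AP copied from Table 5 (Pascal VOC to Watercolor) instead; the helper names `transfer_gain` and `negative_transfer_classes` are hypothetical.

```python
CLASSES = ["bicycle", "bird", "car", "cat", "dog", "person"]

# Per-class AP copied from Table 5 (not the benchmark shown in Fig. 3).
AP = {
    "Source Only": [82.4, 51.7, 48.4, 39.9, 30.7, 59.2],
    "HTCN":        [74.1, 49.8, 51.9, 35.3, 35.3, 66.0],
    "DPA (Ours)":  [86.1, 53.5, 50.3, 44.1, 38.1, 65.6],
}


def transfer_gain(ap, method, reference="Source Only"):
    """Per-class AP gain of `method` over the reference row."""
    return {
        cls: round(m - r, 1)
        for cls, m, r in zip(CLASSES, ap[method], ap[reference])
    }


def negative_transfer_classes(gains):
    """Classes whose AP falls below the source-only model."""
    return [cls for cls, gain in gains.items() if gain < 0]


if __name__ == "__main__":
    for method in ("HTCN", "DPA (Ours)"):
        gains = transfer_gain(AP, method)
        print(method, gains, negative_transfer_classes(gains))
```

On these Table 5 numbers, HTCN falls below the source-only model on bicycle, bird, and cat, whereas DPA improves on every class.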

Figure 3: Category-wise performance gain over the source-only model (classes are plane, bicycle, bird, boat, and bottle). Positive transfer is green, and negative transfer is red.
Figure 4: Qualitative analysis of category alignment in terms of the mean probability gap: (a) global-level features and (b) instance-level features. The horizontal axis represents training iterations ($\times$100), and the vertical axis shows the probability of the domain discriminator. The benchmark is Pascal VOC to Clipart1k ($\beta=25\%$).
Figure 5: Quantitative analysis of the weights of global-level domain-private features. The horizontal axis is the training iteration ($\times$100), and the vertical axis is the weight value $\frac{\Phi_{s}/(1-\Phi_{t})}{\Phi_{s}+(1-\Phi_{t})}$ in the source and target domains.

Qualitative Open-set Alignment Analysis

We further analyze the probability gap in our DPA framework for open-set alignment. As shown in Fig. 4(a), the global-level mean probability gap is more pronounced in our DPA, highlighting its effectiveness in distinguishing domain-private categories. In contrast, Fig. 4(b) shows a smaller mean probability gap at the instance level, demonstrating that our DPA better aligns domain-shared categories. Additionally, we quantitatively analyze the weights of global-level domain-private alignment, as illustrated in Fig. 5. As the ratio of domain-private categories increases, the mean weight gap also increases, indicating that adversarial training adaptively penalizes features associated with domain-private categories through weight adjustments.

Conclusion

We propose a DPA framework for universal domain adaptive object detection with two kinds of probabilistic alignment. Inspired by a theoretical perspective, we propose a GDPA module for aligning global-level private samples and an IDSA module for aligning instance-level domain-shared samples. To combat negative transfer, we propose a PCC module to confuse the discriminability of private categories. Extensive experiments on open-set, partial-set, and closed-set scenarios demonstrate that our DPA outperforms state-of-the-art UniDAOD methods.

Acknowledgments

This work is supported by the InnoHK program and the National Natural Science Foundation of China (Grant No. 62306313).

References

  • Ali et al. (2024) Ali, S.; Ghatwary, N.; Jha, D.; Isik-Polat, E.; Polat, G.; Yang, C.; Li, W.; Galdran, A.; Ballester, M.-Á. G.; Thambawita, V.; et al. 2024. Assessing generalisability of deep learning-based polyp detection and segmentation methods through a computer vision challenge. Sci. Rep.
  • Ben-David et al. (2010) Ben-David, S.; Blitzer, J.; Crammer, K.; Kulesza, A.; Pereira, F.; and Vaughan, J. W. 2010. A theory of learning from different domains. Machine learning, 79: 151–175.
  • Cai et al. (2024) Cai, M.; Kezierbieke, J.; Zhong, X.; and Chen, H. 2024. Uncertainty-Aware and Class-Balanced Domain Adaptation for Object Detection in Driving Scenes. IEEE Transactions on Intelligent Transportation Systems.
  • Cao et al. (2023) Cao, S.; Joshi, D.; Gui, L.-Y.; and Wang, Y.-X. 2023. Contrastive Mean Teacher for Domain Adaptive Object Detectors. In CVPR, 23839–23848.
  • Chang et al. (2022) Chang, W.; Shi, Y.; Tuan, H.; and Wang, J. 2022. Unified optimal transport framework for universal domain adaptation. NeurIPS, 35: 29512–29524.
  • Chapman et al. (2023) Chapman, N. H.; Dayoub, F.; Browne, W.; and Lehnert, C. 2023. Predicting class distribution shift for reliable domain adaptive object detection. IEEE Robotics and Automation Letters.
  • Chen et al. (2020) Chen, C.; Zheng, Z.; Ding, X.; Huang, Y.; and Dou, Q. 2020. Harmonizing transferability and discriminability for adapting object detectors. In CVPR, 8869–8878.
  • Chen et al. (2022a) Chen, L.; Lou, Y.; He, J.; Bai, T.; and Deng, M. 2022a. Evidential neighborhood contrastive learning for universal domain adaptation. In AAAI, volume 36, 6258–6267.
  • Chen et al. (2022b) Chen, M.; Chen, W.; Yang, S.; Song, J.; Wang, X.; Zhang, L.; Yan, Y.; Qi, D.; Zhuang, Y.; Xie, D.; et al. 2022b. Learning Domain Adaptive Object Detection with Probabilistic Teacher. In ICML, 3040–3055. PMLR.
  • Chen et al. (2018) Chen, Y.; Li, W.; Sakaridis, C.; Dai, D.; and Van Gool, L. 2018. Domain adaptive faster r-cnn for object detection in the wild. In CVPR, 3339–3348.
  • Cordts et al. (2016) Cordts, M.; Omran, M.; Ramos, S.; Rehfeld, T.; Enzweiler, M.; Benenson, R.; Franke, U.; Roth, S.; and Schiele, B. 2016. The cityscapes dataset for semantic urban scene understanding. In CVPR, 3213–3223.
  • Deng et al. (2009) Deng, J.; Dong, W.; Socher, R.; Li, L.-J.; Li, K.; and Fei-Fei, L. 2009. Imagenet: A large-scale hierarchical image database. In CVPR, 248–255. IEEE.
  • Deng et al. (2023) Deng, J.; Xu, D.; Li, W.; and Duan, L. 2023. Harmonious Teacher for Cross-Domain Object Detection. In CVPR, 23829–23838.
  • Everingham et al. (2010) Everingham, M.; Van Gool, L.; Williams, C. K.; Winn, J.; and Zisserman, A. 2010. The pascal visual object classes (voc) challenge. International journal of computer vision, 88: 303–338.
  • Fu et al. (2020) Fu, B.; Cao, Z.; Long, M.; and Wang, J. 2020. Learning to detect open classes for universal domain adaptation. In ECCV, 567–583. Springer.
  • He et al. (2016) He, K.; Zhang, X.; Ren, S.; and Sun, J. 2016. Deep residual learning for image recognition. In CVPR, 770–778.
  • He and Zhang (2019) He, Z.; and Zhang, L. 2019. Multi-adversarial faster-rcnn for unrestricted object detection. In ICCV, 6668–6677.
  • Hsu et al. (2020) Hsu, C.-C.; Tsai, Y.-H.; Lin, Y.-Y.; and Yang, M.-H. 2020. Every pixel matters: Center-aware feature alignment for domain adaptive object detector. In ECCV, 733–748. Springer.
  • Huang et al. (2024) Huang, T.; Huang, C.-C.; Ku, C.-H.; and Chen, J.-C. 2024. Blenda: Domain Adaptive Object Detection Through Diffusion-Based Blending. In ICASSP, 4075–4079. IEEE.
  • Inoue et al. (2018) Inoue, N.; Furuta, R.; Yamasaki, T.; and Aizawa, K. 2018. Cross-domain weakly-supervised object detection through progressive domain adaptation. In CVPR, 5001–5009.
  • Jia et al. (2023) Jia, D.; Yuan, Y.; He, H.; Wu, X.; Yu, H.; Lin, W.; Sun, L.; Zhang, C.; and Hu, H. 2023. Detrs with hybrid matching. In CVPR, 19702–19712.
  • Krishna, Ohashi, and Sinha (2023) Krishna, O.; Ohashi, H.; and Sinha, S. 2023. MILA: memory-based instance-level adaptation for cross-domain object detection. arXiv preprint arXiv:2309.01086.
  • Lang et al. (2023) Lang, Q.; He, Z.; Fu, X.; and Zhang, L. 2023. Class-aware Memory Guided Unbiased Weighting for Universal Domain Adaptive Object Detection. In ICCV, 4345–4354.
  • Li et al. (2021a) Li, G.; Kang, G.; Zhu, Y.; Wei, Y.; and Yang, Y. 2021a. Domain consensus clustering for universal domain adaptation. In CVPR, 9757–9766.
  • Li et al. (2023a) Li, K.; Wigington, C.; Tensmeyer, C.; Morariu, V. I.; Zhao, H.; Manjunatha, V.; Barmpalios, N.; and Fu, Y. 2023a. Improving Cross-Domain Detection With Self-Supervised Learning. In CVPR, 4745–4754.
  • Li et al. (2023b) Li, P.; He, Y.; Yu, F. R.; Song, P.; Yin, D.; and Zhou, G. 2023b. IGG: Improved Graph Generation for Domain Adaptive Object Detection. In ACM MM, 1314–1324.
  • Li et al. (2021b) Li, W.; Chen, Z.; Li, B.; Zhang, D.; and Yuan, Y. 2021b. Htd: Heterogeneous task decoupling for two-stage object detection. IEEE Transactions on Image Processing, 30: 9456–9469.
  • Li, Guo, and Yuan (2023) Li, W.; Guo, X.; and Yuan, Y. 2023. Novel Scenes & Classes: Towards Adaptive Open-set Object Detection. In ICCV, 15780–15790.
  • Li et al. (2023c) Li, W.; Liu, J.; Han, B.; and Yuan, Y. 2023c. Adjustment and alignment for unbiased open set domain adaptation. In CVPR, 24110–24119.
  • Li et al. (2024) Li, W.; Liu, X.; Ma, J.; and Yuan, Y. 2024. CLIFF: Continual Latent Diffusion for Open-Vocabulary Object Detection. In ECCV.
  • Li et al. (2022) Li, W.; Liu, X.; Yao, X.; and Yuan, Y. 2022. Scan: Cross domain object detection with semantic conditioned adaptation. In AAAI, volume 36, 1421–1428.
  • Li, Liu, and Yuan (2022a) Li, W.; Liu, X.; and Yuan, Y. 2022a. Scan++: Enhanced semantic conditioned adaptation for domain adaptive object detection. IEEE Transactions on Multimedia, 25: 7051–7061.
  • Li, Liu, and Yuan (2022b) Li, W.; Liu, X.; and Yuan, Y. 2022b. Sigma: Semantic-complete graph matching for domain adaptive object detection. In CVPR, 5291–5300.
  • Li, Liu, and Yuan (2023) Li, W.; Liu, X.; and Yuan, Y. 2023. Sigma++: Improved semantic-complete graph matching for domain adaptive object detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(7): 9022–9040.
  • Liu et al. (2022) Liu, X.; Li, W.; Yang, Q.; Li, B.; and Yuan, Y. 2022. Towards robust adaptive object detection under noisy annotations. In CVPR, 14207–14216.
  • Liu, Li, and Yuan (2023) Liu, X.; Li, W.; and Yuan, Y. 2023. Decoupled Unbiased Teacher for Source-Free Domain Adaptive Medical Object Detection. TNNLS.
  • Lu et al. (2024) Lu, Y.; Shen, M.; Ma, A. J.; Xie, X.; and Lai, J.-H. 2024. MLNet: Mutual Learning Network with Neighborhood Invariance for Universal Domain Adaptation. In AAAI, volume 38, 3900–3908.
  • Panareda Busto and Gall (2017) Panareda Busto, P.; and Gall, J. 2017. Open set domain adaptation. In ICCV, 754–763.
  • Pu et al. (2024) Pu, B.; Wang, L.; Yang, J.; He, G.; Dong, X.; Li, S.; Tan, Y.; Chen, M.; Jin, Z.; Li, K.; et al. 2024. M3-UDA: A New Benchmark for Unsupervised Domain Adaptive Fetal Cardiac Structure Detection. In CVPR, 11621–11630.
  • Saito et al. (2020) Saito, K.; Kim, D.; Sclaroff, S.; and Saenko, K. 2020. Universal domain adaptation through self supervision. NeurIPS, 33: 16282–16292.
  • Sakaridis, Dai, and Van Gool (2018) Sakaridis, C.; Dai, D.; and Van Gool, L. 2018. Semantic foggy scene understanding with synthetic data. International Journal of Computer Vision, 126: 973–992.
  • Shi, Zheng, and Chen (2024) Shi, C.; Zheng, Y.; and Chen, Z. 2024. Domain Adaptive Thermal Object Detection with Unbiased Granularity Alignment. ACM Transactions on Multimedia Computing, Communications and Applications, 20(9): 1–23.
  • Shi et al. (2024a) Shi, W.; Liu, D.; Tan, D.; and Zheng, B. 2024a. A dynamically class-wise weighting mechanism for unsupervised cross-domain object detection under universal scenarios. Knowledge-Based Systems, 111987.
  • Shi et al. (2024b) Shi, W.; Liu, D.; Wu, Z.; and Zheng, B. 2024b. Confused and disentangled distribution alignment for unsupervised universal adaptive object detection. Knowledge-Based Systems, 112085.
  • Shi et al. (2022) Shi, W.; Zhang, L.; Chen, W.; and Pu, S. 2022. Universal domain adaptive object detector. In ACM MM, 2258–2266.
  • Simonyan and Zisserman (2014) Simonyan, K.; and Zisserman, A. 2014. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556.
  • Tzeng et al. (2017) Tzeng, E.; Hoffman, J.; Saenko, K.; and Darrell, T. 2017. Adversarial discriminative domain adaptation. In CVPR, 7167–7176.
  • Wang et al. (2021) Wang, W.; Cao, Y.; Zhang, J.; He, F.; Zha, Z.-J.; Wen, Y.; and Tao, D. 2021. Exploring sequence feature alignment for domain adaptive detection transformers. In ACM MM, 1730–1738.
  • You et al. (2019) You, K.; Long, M.; Cao, Z.; Wang, J.; and Jordan, M. I. 2019. Universal domain adaptation. In CVPR, 2720–2729.
  • Zhang et al. (2018) Zhang, J.; Ding, Z.; Li, W.; and Ogunbona, P. 2018. Importance weighted adversarial nets for partial domain adaptation. In CVPR, 8156–8164.
  • Zhao et al. (2024) Zhao, Y.; Lv, W.; Xu, S.; Wei, J.; Wang, G.; Dang, Q.; Liu, Y.; and Chen, J. 2024. Detrs beat yolos on real-time object detection. In CVPR, 16965–16974.