
Spectral Regularized Kernel Two-Sample Tests

Omar Hagrass, Bharath K. Sriperumbudur, and Bing Li
Department of Statistics, Pennsylvania State University
University Park, PA 16802, USA.
{oih3,bks18,bxl9}@psu.edu
Abstract

Over the last decade, an approach that has gained a lot of popularity to tackle nonparametric testing problems on general (i.e., non-Euclidean) domains is based on the notion of reproducing kernel Hilbert space (RKHS) embedding of probability distributions. The main goal of our work is to understand the optimality of two-sample tests constructed based on this approach. First, we show the popular MMD (maximum mean discrepancy) two-sample test to be not optimal in terms of the separation boundary measured in Hellinger distance. Second, we propose a modification to the MMD test based on spectral regularization by taking into account the covariance information (which is not captured by the MMD test) and prove the proposed test to be minimax optimal with a smaller separation boundary than that achieved by the MMD test. Third, we propose an adaptive version of the above test which involves a data-driven strategy to choose the regularization parameter and show the adaptive test to be almost minimax optimal up to a logarithmic factor. Moreover, our results hold for the permutation variant of the test where the test threshold is chosen elegantly through the permutation of the samples. Through numerical experiments on synthetic and real data, we demonstrate the superior performance of the proposed test in comparison to the MMD test and other popular tests in the literature.

MSC 2010 subject classification: Primary: 62G10; Secondary: 65J20, 65J22, 46E22, 47A52.
Keywords and phrases: Two-sample test, maximum mean discrepancy, reproducing kernel Hilbert space, covariance operator, U-statistics, Bernstein’s inequality, minimax separation, adaptivity, permutation test, spectral regularization

1 Introduction

Given $\mathbb{X}_N:=(X_i)_{i=1}^N\stackrel{i.i.d.}{\sim}P$ and $\mathbb{Y}_M:=(Y_j)_{j=1}^M\stackrel{i.i.d.}{\sim}Q$, where $P$ and $Q$ are defined on a measurable space $\mathcal{X}$, the problem of two-sample testing is to test $H_0:P=Q$ against $H_1:P\neq Q$. This is a classical problem in statistics that has attracted a lot of attention both in the parametric (e.g., $t$-test, $\chi^2$-test) and nonparametric (e.g., Kolmogorov-Smirnov test, Wilcoxon signed-rank test) settings (Lehmann and Romano, 2006). However, many of these tests either rely on strong distributional assumptions or cannot handle non-Euclidean data that naturally arise in many modern applications.

Over the last decade, an approach that has gained a lot of popularity to tackle nonparametric testing problems on general domains is based on the notion of reproducing kernel Hilbert space (RKHS) (Aronszajn, 1950) embedding of probability distributions (Smola et al., 2007, Sriperumbudur et al., 2009, Muandet et al., 2017). Formally, the RKHS embedding of a probability measure $P$ is defined as

$$\mu_P=\int_{\mathcal{X}}K(\cdot,x)\,dP(x)\in\mathscr{H},$$

where $K:\mathcal{X}\times\mathcal{X}\to\mathbb{R}$ is the unique reproducing kernel (r.k.) associated with the RKHS $\mathscr{H}$, with $P$ satisfying $\int_{\mathcal{X}}\sqrt{K(x,x)}\,dP(x)<\infty$. If $K$ is characteristic (Sriperumbudur et al., 2010, Sriperumbudur et al., 2011), this embedding induces a metric on the space of probability measures, called the maximum mean discrepancy ($\mathrm{MMD}$) or the kernel distance (Gretton et al., 2012, Gretton et al., 2006), defined as

$$D_{\mathrm{MMD}}(P,Q)=\left\|\mu_P-\mu_Q\right\|_{\mathscr{H}}. \tag{1.1}$$

MMD has the following variational representation (Gretton et al., 2012, Sriperumbudur et al., 2010):

$$D_{\mathrm{MMD}}(P,Q):=\sup_{f\in\mathscr{H}:\left\|f\right\|_{\mathscr{H}}\leq 1}\int_{\mathcal{X}}f(x)\,d(P-Q)(x), \tag{1.2}$$

which clearly reduces to (1.1) by using the reproducing property: $f(x)=\langle f,K(\cdot,x)\rangle_{\mathscr{H}}$ for all $f\in\mathscr{H}$, $x\in\mathcal{X}$. We refer the interested reader to Sriperumbudur et al. (2010), Sriperumbudur (2016), and Simon-Gabriel and Schölkopf (2018) for more details about $D_{\mathrm{MMD}}$. Gretton et al. (2012) proposed a test based on the asymptotic null distribution of the $U$-statistic estimator of $D^2_{\mathrm{MMD}}(P,Q)$, defined as

$$\hat{D}_{\mathrm{MMD}}^2(X,Y)=\frac{1}{N(N-1)}\sum_{i\neq j}K(X_i,X_j)+\frac{1}{M(M-1)}\sum_{i\neq j}K(Y_i,Y_j)-\frac{2}{NM}\sum_{i,j}K(X_i,Y_j),$$

and showed it to be consistent. Since the asymptotic null distribution does not have a simple closed form (it is the distribution of an infinite sum of weighted chi-squared random variables, with the weights being the eigenvalues of an integral operator associated with the kernel $K$ w.r.t. the distribution $P$), several approximate versions of this test have been investigated and shown to be asymptotically consistent (e.g., see Gretton et al., 2012 and Gretton et al., 2009). Recently, Li and Yuan (2019) and Schrab et al. (2021) showed these tests based on $\hat{D}^2_{\mathrm{MMD}}$ to be non-optimal in the minimax sense, but modified them to achieve minimax optimality by using translation-invariant kernels on $\mathcal{X}=\mathbb{R}^d$. However, since the power of these kernel methods lies in handling spaces more general than $\mathbb{R}^d$, the main goal of this paper is to construct minimax optimal kernel two-sample tests on general domains.
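In code, the $U$-statistic estimator above amounts to three Gram-matrix averages with the diagonals excluded in the within-sample terms. The following is a minimal sketch; the Gaussian kernel and bandwidth $h$ are illustrative choices, not prescribed by the paper:

```python
import numpy as np

def gaussian_kernel(A, B, h=1.0):
    """Gram matrix of K(a, b) = exp(-||a - b||^2 / h)."""
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / h)

def mmd2_ustat(X, Y, h=1.0):
    """Unbiased U-statistic estimator of squared MMD."""
    N, M = len(X), len(Y)
    Kxx = gaussian_kernel(X, X, h)
    Kyy = gaussian_kernel(Y, Y, h)
    Kxy = gaussian_kernel(X, Y, h)
    # within-sample sums run over i != j, so subtract the diagonal
    term_x = (Kxx.sum() - np.trace(Kxx)) / (N * (N - 1))
    term_y = (Kyy.sum() - np.trace(Kyy)) / (M * (M - 1))
    return term_x + term_y - 2 * Kxy.mean()
```

Under $H_0$ the estimate concentrates around zero (it can be slightly negative, being unbiased), while a clear distributional difference drives it away from zero.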

Before introducing our contributions, we first describe the minimax framework pioneered by Burnashev (1979) and Ingster (1987, 1993) to study the optimality of tests, which is essential to understanding our contributions and their connection to the results of Li and Yuan (2019) and Schrab et al. (2021). Let $\phi(\mathbb{X}_N,\mathbb{Y}_M)$ be any test that rejects $H_0$ when $\phi=1$ and fails to reject $H_0$ when $\phi=0$. Denote the class of all such asymptotic (resp. exact) $\alpha$-level tests by $\Phi_\alpha$ (resp. $\Phi_{N,M,\alpha}$). Let $\mathcal{C}$ be a set of probability measures on $\mathcal{X}$. The Type-II error of a test $\phi\in\Phi_\alpha$ (resp. $\in\Phi_{N,M,\alpha}$) w.r.t. $\mathcal{P}_\Delta$ is defined as $R_\Delta(\phi)=\sup_{(P,Q)\in\mathcal{P}_\Delta}\mathbb{E}_{P^N\times Q^M}(1-\phi)$, where

$$\mathcal{P}_\Delta:=\left\{(P,Q)\in\mathcal{C}^2:\rho^2(P,Q)\geq\Delta\right\}$$

is the class of $\Delta$-separated alternatives in a probability metric $\rho$, with $\Delta$ being referred to as the separation boundary or contiguity radius. Of course, the interest is in letting $\Delta\to 0$ as $M,N\to\infty$ (i.e., shrinking alternatives) and analyzing $R_\Delta$ for a given test $\phi$, i.e., whether $R_\Delta(\phi)\to 0$. In the asymptotic setting, the minimax separation or critical radius $\Delta^*$ is the fastest possible order at which $\Delta\to 0$ such that $\liminf_{N,M\to\infty}\inf_{\phi\in\Phi_\alpha}R_{\Delta^*}(\phi)=0$; i.e., for any $\Delta$ such that $\Delta^*/\Delta\to\infty$, there is no test $\phi\in\Phi_\alpha$ that is consistent over $\mathcal{P}_\Delta$. A test is asymptotically minimax optimal if it is consistent over $\mathcal{P}_\Delta$ with $\Delta\asymp\Delta^*$. On the other hand, in the non-asymptotic setting, the minimax separation $\Delta^*$ is defined as the minimum possible separation $\Delta$ such that $\inf_{\phi\in\Phi_{N,M,\alpha}}R_\Delta(\phi)\leq\delta$, for $0<\delta<1-\alpha$. A test $\phi\in\Phi_{N,M,\alpha}$ is called minimax optimal if $R_\Delta(\phi)\leq\delta$ for some $\Delta\asymp\Delta^*$. In other words, there is no other $\alpha$-level test that can achieve the same power with a better separation boundary.

In the context of the above notation and terminology, Li and Yuan (2019) considered distributions with densities (w.r.t. the Lebesgue measure) belonging to

$$\mathcal{C}=\left\{f:\mathbb{R}^d\to\mathbb{R}\,:\,f\ \text{a.s. continuous and}\ \|f\|_{W^{s,2}_d}\leq M\right\}, \tag{1.3}$$

where $\|f\|^2_{W^{s,2}_d}=\int(1+\|x\|^2_2)^s|\hat{f}(x)|^2\,dx$ with $\hat{f}$ being the Fourier transform of $f$, and $\rho(P,Q)=\|p-q\|_{L^2}$ with $p$ and $q$ being the densities of $P$ and $Q$, respectively, and showed the minimax separation to be $\Delta^*\asymp(N+M)^{-4s/(4s+d)}$. Furthermore, they chose $K$ to be a Gaussian kernel on $\mathbb{R}^d$, i.e., $K(x,y)=\exp(-\|x-y\|^2_2/h)$, in $\hat{D}^2_{\mathrm{MMD}}$ with $h\to 0$ at an appropriate rate as $N,M\to\infty$ (reminiscent of kernel density estimators), in contrast to the fixed $h$ of Gretton et al. (2012), and showed the resultant test to be asymptotically minimax optimal w.r.t. $\mathcal{P}_\Delta$ based on (1.3) and $\|\cdot\|_{L^2}$. Schrab et al. (2021) extended this result to translation-invariant kernels (particularly, products of one-dimensional translation-invariant kernels) on $\mathcal{X}=\mathbb{R}^d$ with a shrinking bandwidth and showed the resulting test to be minimax optimal even in the non-asymptotic setting. While these results are interesting, the analysis holds only for $\mathbb{R}^d$ since the kernels are chosen to be translation invariant on $\mathbb{R}^d$, thereby limiting the power of the kernel approach.

In this paper, we employ an operator-theoretic perspective to understand the limitations of $D^2_{\mathrm{MMD}}$ and propose a regularized statistic that mitigates these issues without requiring $\mathcal{X}=\mathbb{R}^d$. In fact, the construction of the regularized statistic naturally gives rise to a certain $\mathcal{P}_\Delta$, briefly described below. To this end, define $R:=\frac{P+Q}{2}$ and $u:=\frac{dP}{dR}-1$, which is well defined since $P\ll R$. It can be shown that $D^2_{\mathrm{MMD}}(P,Q)=4\langle\mathcal{T}u,u\rangle_{L^2(R)}$, where $\mathcal{T}:L^2(R)\to L^2(R)$ is an integral operator defined by $K$ (see Section 3 for details), which is in fact a self-adjoint, positive, trace-class operator if $K$ is bounded. Therefore, $D^2_{\mathrm{MMD}}(P,Q)=4\sum_i\lambda_i\langle u,\tilde{\phi}_i\rangle^2_{L^2(R)}$, where $(\lambda_i,\tilde{\phi}_i)_i$ are the eigenvalues and eigenfunctions of $\mathcal{T}$. Since $\mathcal{T}$ is trace-class, we have $\lambda_i\to 0$ as $i\to\infty$, which implies that the Fourier coefficients of $u$, i.e., $\langle u,\tilde{\phi}_i\rangle_{L^2(R)}$, for large $i$, are down-weighted by $\lambda_i$. In other words, $D^2_{\mathrm{MMD}}$ is not powerful enough to distinguish between $P$ and $Q$ if they differ in the high-frequency components of $u$, i.e., $\langle u,\tilde{\phi}_i\rangle_{L^2(R)}$ for large $i$. On the other hand,

$$\|u\|^2_{L^2(R)}=\sum_i\langle u,\tilde{\phi}_i\rangle^2_{L^2(R)}=\chi^2\left(P\,\middle\|\,\frac{P+Q}{2}\right)=\frac{1}{2}\int_{\mathcal{X}}\frac{(dP-dQ)^2}{d(P+Q)}=:\underline{\rho}^2(P,Q)$$

does not suffer from any such issue, with $\underline{\rho}$ being a probability metric that is topologically equivalent to the Hellinger distance (see Lemma A.18 and Le Cam, 1986, p. 47). With this motivation, we consider the following modification to $D^2_{\mathrm{MMD}}$:

$$\eta_\lambda(P,Q)=4\left\langle\mathcal{T}g_\lambda(\mathcal{T})u,u\right\rangle_{L^2(R)},$$

where $g_\lambda:(0,\infty)\to(0,\infty)$, called the spectral regularizer (Engl et al., 1996), is such that $xg_\lambda(x)\asymp 1$ as $\lambda\to 0$ (a popular example is the Tikhonov regularizer, $g_\lambda(x)=\frac{1}{x+\lambda}$), i.e., $\mathcal{T}g_\lambda(\mathcal{T})\approx\boldsymbol{I}$, the identity operator; refer to (4.1) for the definition of $g_\lambda(\mathcal{T})$. In fact, in Section 4, we show $\eta_\lambda(P,Q)$ to be equivalent to $\|u\|^2_{L^2(R)}$, i.e., $\|u\|^2_{L^2(R)}\lesssim\eta_\lambda(P,Q)\lesssim\|u\|^2_{L^2(R)}$, if $u\in\text{Ran}(\mathcal{T}^\theta)$, $\theta>0$, and $\|u\|^2_{L^2(R)}\gtrsim\lambda^{2\theta}$, where $\text{Ran}(A)$ denotes the range space of an operator $A$, $\theta$ is the smoothness index (large $\theta$ corresponds to "smooth" $u$), and $\mathcal{T}^\theta$ is defined by choosing $g_\lambda(x)=x^\theta$, $x\geq 0$, in (4.1). This naturally leads to the class of $\Delta$-separated alternatives,

$$\mathcal{P}:=\mathcal{P}_{\theta,\Delta}:=\left\{(P,Q):\frac{dP}{dR}-1\in\text{Ran}(\mathcal{T}^\theta),\ \underline{\rho}^2(P,Q)\geq\Delta\right\}, \tag{1.4}$$

for $\theta>0$, where $\text{Ran}(\mathcal{T}^\theta)$, $\theta\in(0,\frac{1}{2}]$, can be interpreted as an interpolation space obtained by the real interpolation of $\mathscr{H}$ and $L^2(R)$ at scale $\theta$ (Steinwart and Scovel, 2012, Theorem 4.6); note that the real interpolation of Sobolev spaces and $L^2(\mathbb{R}^d)$ yields Besov spaces (Adams and Fournier, 2003, p. 230). To compare the class in (1.4) to that obtained using (1.3) with $\rho(\cdot,\cdot)=\|\cdot-\cdot\|_{L^2(\mathbb{R}^d)}$, note that the smoothness in (1.4) is determined through $\text{Ran}(\mathcal{T}^\theta)$ instead of the Sobolev smoothness, where the latter is tied to translation-invariant kernels on $\mathbb{R}^d$. Since we work with general domains, the smoothness is defined through the interaction between $K$ and the probability measures in terms of the behavior of the integral operator $\mathcal{T}$. In addition, as $\lambda\to 0$, $\eta_\lambda(P,Q)\to\underline{\rho}^2(P,Q)$, while $D^2_{\mathrm{MMD}}(P,Q)\to\|p-q\|^2_2$ as $h\to 0$, where $D^2_{\mathrm{MMD}}$ is defined through a translation-invariant kernel on $\mathbb{R}^d$ with bandwidth $h>0$. Hence, we argue that (1.4) is a natural class of alternatives to investigate the performance of $D^2_{\mathrm{MMD}}$ and $\eta_\lambda$. In fact, recently, Balasubramanian et al. (2021) considered an alternative class similar to (1.4) to study goodness-of-fit tests using $D^2_{\mathrm{MMD}}$.
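To build intuition for the spectral regularizer, the following toy computation (ours, not from the paper) contrasts the MMD weights $\lambda_i$ with the Tikhonov-regularized weights $\lambda_i g_\lambda(\lambda_i)=\lambda_i/(\lambda_i+\lambda)$ on a hypothetical polynomially decaying spectrum:

```python
import numpy as np

# hypothetical spectrum decaying as lambda_i ~ i^(-2)
i = np.arange(1, 101)
eigs = i ** -2.0

lam = 1e-5  # regularization parameter
mmd_weights = eigs                      # weights appearing in D^2_MMD
tikhonov_weights = eigs / (eigs + lam)  # weights in eta_lambda, g(x) = 1/(x + lam)

print(mmd_weights[-1])       # ~1e-4: high-frequency coefficient heavily down-weighted
print(tikhonov_weights[-1])  # ~0.91: nearly unit weight
```

As $\lambda$ decreases, the regularized weights approach 1 uniformly over the spectrum, which is the sense in which $\mathcal{T}g_\lambda(\mathcal{T})\approx\boldsymbol{I}$ and $\eta_\lambda$ recovers the full signal $\|u\|^2_{L^2(R)}$.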

Contributions

The main contributions of the paper are as follows:
(i) First, in Theorem 3.1, we show that the test based on $\hat{D}^2_{\mathrm{MMD}}$ cannot achieve a separation boundary better than $(N+M)^{\frac{-2\theta}{2\theta+1}}$ w.r.t. $\mathcal{P}$ in (1.4). However, this separation boundary depends only on the smoothness of $u$, which is determined by $\theta$, and is completely oblivious to the intrinsic dimensionality of the RKHS $\mathscr{H}$, which is controlled by the decay rate of the eigenvalues of $\mathcal{T}$. To this end, by taking into account the intrinsic dimensionality of $\mathscr{H}$, we show in Corollaries 3.3 and 3.4 (also see Theorem 3.2) that the minimax separation w.r.t. $\mathcal{P}$ is $(N+M)^{-\frac{4\theta\beta}{4\theta\beta+1}}$ for $\theta>\frac{1}{2}$ if $\lambda_i\asymp i^{-\beta}$, $\beta>1$, i.e., the eigenvalues of $\mathcal{T}$ decay at a polynomial rate $\beta$, and is $\sqrt{\log(N+M)}/(N+M)$ if $\lambda_i\asymp e^{-i}$, i.e., exponential decay. These results clearly establish the non-optimality of the MMD-based test.
(ii) To resolve this issue with MMD, in Section 4.2, we propose a spectral regularized test based on $\eta_\lambda$ and show it to be minimax optimal w.r.t. $\mathcal{P}$ (see Theorems 4.2, 4.3 and Corollaries 4.4, 4.5). Before we do that, we first provide an alternate representation of $\eta_\lambda(P,Q)$ as $\eta_\lambda(P,Q)=\|g^{1/2}_\lambda(\Sigma_R)(\mu_P-\mu_Q)\|^2_{\mathscr{H}}$, which takes into account the information about the covariance operator $\Sigma_R$ along with the mean elements $\mu_P$ and $\mu_Q$, thereby showing resemblance to Hotelling's $T^2$-statistic (Lehmann and Romano, 2006) and its kernelized version (Harchaoui et al., 2007). This alternate representation is particularly helpful to construct a two-sample $U$-statistic (Hoeffding, 1992) as a test statistic (see Section 4.1), which has a worst-case computational complexity of $O((N+M)^3)$, in contrast to $O((N+M)^2)$ for the MMD test (see Theorem 4.1). However, the drawback of this test is that it is not usable in practice, since the critical level depends on $(\Sigma_R+\lambda I)^{-1/2}$, which is unknown because $R$ is unknown. Therefore, we refer to this test as the Oracle test.
(iii) In order to make the Oracle test usable in practice, in Section 4.3, we propose a permutation test (e.g., see Lehmann and Romano, 2006, Pesarin and Salmaso, 2010, and Kim et al., 2022) leading to a critical level that is easy to compute (see Theorem 4.6), while still being minimax optimal w.r.t. $\mathcal{P}$ (see Theorem 4.7 and Corollaries 4.8, 4.9). However, the minimax optimal separation rate is tightly controlled by the choice of the regularization parameter $\lambda$, which in turn depends on the unknown parameters $\theta$ and $\beta$ (in the case of polynomial decay of the eigenvalues of $\mathcal{T}$). This means the performance of the permutation test depends on the choice of $\lambda$. To make the test completely data-driven, in Section 4.4, we present an aggregate version of the permutation test by aggregating over different $\lambda$ and show the resulting test to be minimax optimal up to a $\log\log$ factor (see Theorems 4.10 and 4.11). In Section 4.5, we discuss the problem of kernel choice and present an adaptive test by jointly aggregating over $\lambda$ and the kernel $K$, which we show to be minimax optimal up to a $\log\log$ factor (see Theorem 4.12).
(iv) Through numerical simulations on benchmark data, we demonstrate in Section 5 the superior performance of the spectral regularized test in comparison to the adaptive MMD test (Schrab et al., 2021), the Energy test (Szekely and Rizzo, 2004), and the Kolmogorov-Smirnov (KS) test (Puritz et al., 2022; Fasano and Franceschini, 1987).

All these results hinge on Bernstein-type inequalities for the operator norm of self-adjoint Hilbert-Schmidt operator-valued $U$-statistics (Sriperumbudur and Sterge, 2022). A closely related work to ours is by Harchaoui et al. (2007), who consider a regularized MMD test with $g_\lambda(x)=\frac{1}{x+\lambda}$ (see Remark 4.2 for a comparison of our regularized statistic to theirs). However, our work deals with general $g_\lambda$, and our test statistic is different from that of Harchaoui et al. (2007). In addition, our tests are non-asymptotic and minimax optimal, in contrast to that of Harchaoui et al. (2007), which only shows asymptotic consistency against fixed alternatives and provides some asymptotic results against local alternatives.

2 Definitions & Notation

For a topological space $\mathcal{X}$, $L^r(\mathcal{X},\mu)$ denotes the Banach space of $r$-power ($r\geq 1$) $\mu$-integrable functions, where $\mu$ is a finite non-negative Borel measure on $\mathcal{X}$. For $f\in L^r(\mathcal{X},\mu)=:L^r(\mu)$, $\|f\|_{L^r(\mu)}:=(\int_{\mathcal{X}}|f|^r\,d\mu)^{1/r}$ denotes the $L^r$-norm of $f$. $\mu^n:=\mu\times\overset{n}{\cdots}\times\mu$ is the $n$-fold product measure. $\mathscr{H}$ denotes a reproducing kernel Hilbert space with a reproducing kernel $K:\mathcal{X}\times\mathcal{X}\to\mathbb{R}$. $[f]_\sim$ denotes the equivalence class of the function $f$, that is, the collection of functions $g\in L^r(\mathcal{X},\mu)$ such that $\|f-g\|_{L^r(\mu)}=0$. For two measures $P$ and $Q$, $P\ll Q$ denotes that $P$ is dominated by $Q$, which means that if $Q(A)=0$ for some measurable set $A$, then $P(A)=0$.

Let $H_1$ and $H_2$ be abstract Hilbert spaces. $\mathcal{L}(H_1,H_2)$ denotes the space of bounded linear operators from $H_1$ to $H_2$. For $S\in\mathcal{L}(H_1,H_2)$, $S^*$ denotes the adjoint of $S$. $S\in\mathcal{L}(H):=\mathcal{L}(H,H)$ is called self-adjoint if $S^*=S$. For $S\in\mathcal{L}(H)$, $\text{Tr}(S)$, $\|S\|_{\mathcal{L}^2(H)}$, and $\|S\|_{\mathcal{L}^\infty(H)}$ denote the trace, Hilbert-Schmidt, and operator norms of $S$, respectively. For $x,y\in H$, $x\otimes_H y$ is an element of the tensor product space $H\otimes H$, which can also be seen as an operator from $H$ to $H$ via $(x\otimes_H y)z=x\langle y,z\rangle_H$ for any $z\in H$.

For constants $a$ and $b$, $a\lesssim b$ (resp. $a\gtrsim b$) denotes that there exists a positive constant $c$ (resp. $c'$) such that $a\leq cb$ (resp. $a\geq c'b$). $a\asymp b$ denotes that there exist positive constants $c$ and $c'$ such that $cb\leq a\leq c'b$. We write $[\ell]$ for $\{1,\ldots,\ell\}$.

3 Non-optimality of the $D^2_{\mathrm{MMD}}$ test

In this section, we establish the non-optimality of the test based on $D^2_{\mathrm{MMD}}$. First, we make the following assumption, which holds throughout the paper.
$(A_0)$ $(\mathcal{X},\mathcal{B})$ is a second countable (i.e., completely separable) space endowed with the Borel $\sigma$-algebra $\mathcal{B}$. $(\mathscr{H},K)$ is an RKHS of real-valued functions on $\mathcal{X}$ with a continuous reproducing kernel $K$ satisfying $\sup_x K(x,x)\leq\kappa$.
The continuity of $K$ ensures that $K(\cdot,x):\mathcal{X}\to\mathscr{H}$ is Bochner-measurable for all $x\in\mathcal{X}$, which along with the boundedness of $K$ ensures that $\mu_P$ and $\mu_Q$ are well-defined (Dinculeanu, 2000). Also, the separability of $\mathcal{X}$ along with the continuity of $K$ ensures that $\mathscr{H}$ is separable (Steinwart and Christmann, 2008, Lemma 4.33). Therefore,

$$D^2_{\mathrm{MMD}}(P,Q)=\left\langle\int_{\mathcal{X}}K(\cdot,x)\,d(P-Q)(x),\int_{\mathcal{X}}K(\cdot,x)\,d(P-Q)(x)\right\rangle_{\mathscr{H}}=4\left\langle\int_{\mathcal{X}}K(\cdot,x)u(x)\,dR(x),\int_{\mathcal{X}}K(\cdot,x)u(x)\,dR(x)\right\rangle_{\mathscr{H}}, \tag{3.1}$$

where $R=\frac{P+Q}{2}$ and $u=\frac{dP}{dR}-1$. Define $\mathfrak{I}:\mathscr{H}\to L^2(R)$, $f\mapsto[f-\mathbb{E}_R f]_\sim$, usually referred to in the literature as the inclusion operator (e.g., see Steinwart and Christmann, 2008, Theorem 4.26), where $\mathbb{E}_R f=\int_{\mathcal{X}}f(x)\,dR(x)$. It can be shown (Sriperumbudur and Sterge, 2022, Proposition C.2) that $\mathfrak{I}^*:L^2(R)\to\mathscr{H}$, $f\mapsto\int K(\cdot,x)f(x)\,dR(x)-\mu_R\mathbb{E}_R f$. Define $\mathcal{T}:=\mathfrak{I}\mathfrak{I}^*:L^2(R)\to L^2(R)$. It can also be shown (Sriperumbudur and Sterge, 2022, Proposition C.2) that $\mathcal{T}=\Upsilon-(1\otimes_{L^2(R)}1)\Upsilon-\Upsilon(1\otimes_{L^2(R)}1)+(1\otimes_{L^2(R)}1)\Upsilon(1\otimes_{L^2(R)}1)$, where $\Upsilon:L^2(R)\to L^2(R)$, $f\mapsto\int K(\cdot,x)f(x)\,dR(x)$. Since $K$ is bounded, it is easy to verify that $\mathcal{T}$ is a trace-class operator, and thus compact. Also, it is self-adjoint and positive; thus the spectral theorem (Reed and Simon, 1980, Theorems VI.16, VI.17) yields that

$$\mathcal{T}=\sum_{i\in I}\lambda_i\tilde{\phi}_i\otimes_{L^2(R)}\tilde{\phi}_i,$$

where $(\lambda_i)_i\subset\mathbb{R}^+$ are the eigenvalues and $(\tilde{\phi}_i)_i$ is an orthonormal system of eigenfunctions (strictly speaking, classes of eigenfunctions) of $\mathcal{T}$ that span $\overline{\text{Ran}(\mathcal{T})}$, with the index set $I$ being either countable, in which case $\lambda_i\to 0$, or finite. In this paper, we assume that the set $I$ is countable, i.e., there are infinitely many eigenvalues. Note that $\tilde{\phi}_i$ represents an equivalence class in $L^2(R)$. By defining $\phi_i:=\frac{\mathfrak{I}^*\tilde{\phi}_i}{\lambda_i}$, it is clear that $\mathfrak{I}\phi_i=[\phi_i-\mathbb{E}_R\phi_i]_\sim=\tilde{\phi}_i$ and $\phi_i\in\mathscr{H}$. Throughout the paper, $\phi_i$ refers to this definition. Using these definitions, we can see that

$$D^2_{\mathrm{MMD}}(P,Q)=4\langle\mathfrak{I}^*u,\mathfrak{I}^*u\rangle_{\mathscr{H}}=4\langle\mathcal{T}u,u\rangle_{L^2(R)}=4\sum_{i\geq 1}\lambda_i\langle u,\tilde{\phi}_i\rangle^2_{L^2(R)}. \tag{3.2}$$
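Empirically, the spectrum of $\mathcal{T}$ can be approximated by the eigenvalues of a doubly centered Gram matrix on a sample from $R$; the centering mirrors the definition of $\mathcal{T}$ via $\Upsilon$ above. The following finite-sample sketch is our illustration, not a construction from the paper:

```python
import numpy as np

def centered_gram_eigs(Z, h=1.0):
    """Eigenvalues of (1/n) H K H, an empirical counterpart of T,
    where H = I - (1/n) 11^T centers the Gaussian Gram matrix K."""
    n = len(Z)
    sq = np.sum(Z**2, 1)[:, None] + np.sum(Z**2, 1)[None, :] - 2 * Z @ Z.T
    K = np.exp(-np.maximum(sq, 0.0) / h)
    H = np.eye(n) - np.ones((n, n)) / n
    T_hat = H @ K @ H / n
    # symmetric PSD matrix (up to rounding), so use eigvalsh
    return np.sort(np.linalg.eigvalsh(T_hat))[::-1]
```

For a Gaussian kernel the eigenvalues decay rapidly, which is exactly the down-weighting of high-frequency Fourier coefficients of $u$ in (3.2) that motivates regularization.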
Remark 3.1.

From the form of $D^2_{\mathrm{MMD}}$ in (3.1), it seems more natural to define $\mathfrak{I}:\mathscr{H}\to L^2(R)$, $f\mapsto[f]_\sim$, so that $\mathfrak{I}^*:L^2(R)\to\mathscr{H}$, $f\mapsto\int K(\cdot,x)f(x)\,dR(x)$, leading to $D^2_{\mathrm{MMD}}(P,Q)=4\langle\mathfrak{I}^*u,\mathfrak{I}^*u\rangle_{\mathscr{H}}$, an expression similar to (3.2). However, since $u\in\text{Ran}(\mathcal{T}^\theta)$, $\theta>0$, as specified by $\mathcal{P}$, it is clear that $u$ lies in the span of the eigenfunctions of $\mathcal{T}$, while being orthogonal to constant functions in $L^2(R)$ since $\langle u,1\rangle_{L^2(R)}=0$. Defining the inclusion operator with centering as proposed under (3.1) guarantees that the eigenfunctions of $\mathcal{T}$ are orthogonal to constant functions, since $\lambda_i\langle 1,\tilde{\phi}_i\rangle_{L^2(R)}=\langle 1,\mathfrak{I}\mathfrak{I}^*\tilde{\phi}_i\rangle_{L^2(R)}=\langle\mathfrak{I}^*1,\mathfrak{I}^*\tilde{\phi}_i\rangle_{\mathscr{H}}=0$, which implies that constant functions are also orthogonal to the space spanned by the eigenfunctions, without assuming that the kernel $K$ is degenerate with respect to $R$, i.e., $\int K(\cdot,x)\,dR(x)=0$. The orthogonality of the eigenfunctions to constant functions is crucial in establishing the minimax separation boundary, which relies on constructing a specific example of $u$ from the span of the eigenfunctions that is orthogonal to constant functions (see the proof of Theorem 3.2). On the other hand, the eigenfunctions of $\mathfrak{I}\mathfrak{I}^*$ with $\mathfrak{I}$ as considered in this remark are not guaranteed to be orthogonal to constant functions in $L^2(R)$.

Suppose $u\in\text{span}\{\tilde{\phi}_i:i\in I\}$. Then $\sum_{i\geq 1}\langle u,\tilde{\phi}_i\rangle^2_{L^2(R)}=\|u\|^2_{L^2(R)}\stackrel{(*)}{=}\underline{\rho}^2(P,Q)$, where $\underline{\rho}^2(P,Q):=\frac{1}{2}\int\frac{(dP-dQ)^2}{dP+dQ}$ and $(*)$ follows from Lemma A.18 by noting that $\|u\|^2_{L^2(R)}=\chi^2(P\|R)$. As mentioned in Section 1, $D_{\mathrm{MMD}}$ might not capture the difference between $P$ and $Q$ if they differ in the higher Fourier coefficients of $u$, i.e., $\langle u,\tilde{\phi}_i\rangle_{L^2(R)}$ for large $i$.

The following result shows that the test based on $\hat{D}^2_{\mathrm{MMD}}$ cannot achieve a separation boundary of order better than $(N+M)^{\frac{-2\theta}{2\theta+1}}$.

Theorem 3.1 (Separation boundary of MMD test).

Suppose $(A_0)$ holds. Let $N\geq 2$, $M\geq 2$, $M\leq N\leq DM$ for some constant $D>1$, $k\in\{1,2\}$, and

$$\sup_{(P,Q)\in\mathcal{P}}\left\|\mathcal{T}^{-\theta}u\right\|_{L^2(R)}<\infty. \tag{3.3}$$

Then for any $\alpha>0$, $\delta>0$, $P_{H_0}\{\hat{D}^2_{\mathrm{MMD}}\geq\gamma_k\}\leq\alpha$, and

$$\inf_{(P,Q)\in\mathcal{P}}P_{H_1}\{\hat{D}^2_{\mathrm{MMD}}\geq\gamma_k\}\geq 1-k\delta,\qquad k=1,2,$$

where $\gamma_1=\frac{2\sqrt{6}\kappa}{\sqrt{\alpha}}\left(\frac{1}{N}+\frac{1}{M}\right)$, $\gamma_2=q_{1-\alpha}$,

$$\Delta_{N,M}:=\Delta=c_k(\alpha,\delta)(N+M)^{\frac{-2\theta}{2\theta+1}},$$

$c_1(\alpha,\delta)\asymp\max\{\alpha^{-1/2},\delta^{-1}\}$ and $c_2(\alpha,\delta)\asymp\delta^{-1}\log\frac{1}{\alpha}$, with $q_{1-\alpha}$ being the $(1-\alpha)$-quantile of the permutation distribution of $\hat{D}^2_{\mathrm{MMD}}$ based on the $(N+M)!$ permutations of the samples $(\mathbb{X}_N,\mathbb{Y}_M)$.
Furthermore, suppose $\Delta_{N,M}(N+M)^{\frac{2\theta}{2\theta+1}}\to 0$ as $N,M\to\infty$ and one of the following holds: (i) $\theta\geq\frac{1}{2}$; (ii) $\sup_i\|\phi_i\|_\infty<\infty$, $\theta>0$. Then for any decay rate of $(\lambda_i)_i$,

$$\liminf_{N,M\to\infty}\inf_{(P,Q)\in\mathcal{P}}P_{H_1}\{\hat{D}^2_{\mathrm{MMD}}\geq\gamma_k\}<1.$$
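In practice, the data-dependent threshold $\gamma_2=q_{1-\alpha}$ is approximated by a Monte Carlo estimate over $B$ random permutations of the pooled sample rather than all $(N+M)!$ of them. A minimal sketch of this generic recipe follows (the function names are ours, and `stat` stands for any two-sample statistic, e.g., the $U$-statistic estimator of $D^2_{\mathrm{MMD}}$):

```python
import numpy as np

def permutation_threshold(X, Y, stat, alpha=0.05, B=200, seed=0):
    """Approximate the (1 - alpha)-quantile of the permutation
    distribution of `stat` using B random permutations."""
    rng = np.random.default_rng(seed)
    Z = np.concatenate([X, Y])
    N = len(X)
    vals = []
    for _ in range(B):
        idx = rng.permutation(len(Z))  # random relabeling of the pooled sample
        vals.append(stat(Z[idx[:N]], Z[idx[N:]]))
    return np.quantile(vals, 1 - alpha)

def permutation_test(X, Y, stat, alpha=0.05, B=200):
    """Reject H0: P = Q when the observed statistic exceeds the
    permutation threshold."""
    return stat(X, Y) >= permutation_threshold(X, Y, stat, alpha, B)
```

Under $H_0$ the pooled sample is exchangeable, so relabeling leaves the distribution of the statistic unchanged, which is what makes the resulting critical level (approximately) $\alpha$-level.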
Remark 3.2.

(i) The MMD test with threshold $\gamma_1$ is simply based on Chebyshev's inequality, while the data-dependent threshold $\gamma_2$ is based on a permutation test. Theorem 3.1 shows that both these tests yield a separation radius of $(N+M)^{\frac{-2\theta}{2\theta+1}}$, which in fact also holds if $\gamma_2:=q_{1-\alpha}$ is replaced by its Monte Carlo approximation using only $B$ random permutations instead of all $(M+N)!$, as long as $B$ is large enough. This can be shown using the same approach as in Lemma A.14.
(ii) Theorem 3.1 shows that the power of the test based on $\hat{D}^2_{\mathrm{MMD}}$ does not go to one even when $N,M\to\infty$, which implies that asymptotically the separation boundary of such a test is of order $(N+M)^{\frac{-2\theta}{2\theta+1}}$. For the threshold $\gamma_1$, we can also show a non-asymptotic result: if $\Delta_{N,M}<d_\alpha(N+M)^{\frac{-2\theta}{2\theta+1}}$ for some $d_\alpha>0$, then $\inf_{(P,Q)\in\mathcal{P}}P_{H_1}\{\hat{D}^2_{\mathrm{MMD}}\geq\gamma_1\}<\delta$. However, for the threshold $\gamma_2$, since our proof technique depends on the asymptotic distribution of $\hat{D}^2_{\mathrm{MMD}}$, the result is presented in the asymptotic setting of $N,M\to\infty$.
(iii) The condition in (3.3) implies that $u\in\text{Ran}(\mathcal{T}^\theta)$ for all $(P,Q)\in\mathcal{P}$. Note that $\text{Ran}(\mathcal{T}^{1/2})=\mathscr{H}$, i.e., $u\in\mathscr{H}$ if $\theta=\frac{1}{2}$, and for $\theta>\frac{1}{2}$, $\text{Ran}(\mathcal{T}^\theta)\subset\mathscr{H}$. When $\theta<\frac{1}{2}$, $u\in L^2(R)\backslash\mathscr{H}$ with the property that for all $G>0$, there exists $f\in\mathscr{H}$ such that $\|f\|_{\mathscr{H}}\leq G$ and $\|u-f\|^2_{L^2(R)}\lesssim G^{\frac{-4\theta}{1-2\theta}}$. In other words, $u$ can be approximated by some function in an RKHS ball of radius $G$, with the approximation error decaying polynomially in $G$ (Cucker and Zhou, 2007, Theorem 4.1).
(iv) The uniform boundedness condition supiϕi<\sup_{i}\left\|\phi_{i}\right\|_{\infty}<\infty does not hold in general; for example, Minh et al. (2006, Theorem 5) shows that for 𝒳=Sd1\mathcal{X}=S^{d-1}, where Sd1S^{d-1} denotes the unit sphere in RdR^{d}, supiϕi=\sup_{i}\left\|\phi_{i}\right\|_{\infty}=\infty for all d3d\geq 3 and any kernel of the form K(x,y)=f(x,y2),x,y𝒳K(x,y)=f(\langle x,y\rangle_{2}),\,x,y\in\mathcal{X} with ff being continuous. An example of such a kernel is the Gaussian kernel on Sd1S^{d-1}. On the other hand, when d=2d=2, the Gaussian kernel satisfies the uniform boundedness condition. Also, when 𝒳Rd\mathcal{X}\subset R^{d}, the uniform boundedness condition is satisfied by the Gaussian kernel (Steinwart et al., 2006). In this paper, we provide results both with and without the assumption of supiϕi<\sup_{i}\left\|\phi_{i}\right\|_{\infty}<\infty to understand its impact on the behavior of the test. We would like to mention that this uniform boundedness condition has been used in the analysis of the impact of regularization in kernel learning (see Mendelson and Neeman, 2010, p. 531).
(v) Theorem 3.1 can be viewed as a generalization and extension of Theorem 1 in Balasubramanian et al. (2021), which shows the separation boundary of the goodness-of-fit test, H0:P=P0H_{0}:P=P_{0} vs. H1:PP0H_{1}:P\neq P_{0}, based on a V-statistic estimator of DMMD2D_{\mathrm{MMD}}^{2} to be of the order N1/2N^{-1/2} when θ=12\theta=\frac{1}{2}. In their work, the critical level is chosen from the asymptotic distribution of the test statistic under the null with P0P_{0} being known, assuming the uniform boundedness of the eigenfunctions of 𝒯\mathcal{T} and K(,x)dP0(x)=0\int K(\cdot,x)\,dP_{0}(x)=0. Note that the zero-mean condition is not satisfied by many popular kernels including the Gaussian and Matérn kernels. In contrast, Theorem 3.1 deals with a two-sample setting based on a UU-statistic estimator of DMMD2D_{\mathrm{MMD}}^{2}, without requiring either the uniform boundedness assumption or the zero-mean condition K(,x)dR(x)=0\int K(\cdot,x)\,dR(x)=0, while allowing arbitrary θ>0\theta>0 and non-asymptotic critical levels (permutation and concentration-based).

The following result provides general conditions on the minimax separation rate w.r.t. 𝒫\mathcal{P}, which together with Corollaries 3.3 and 3.4 demonstrates the non-optimality of the MMD tests presented in Theorem 3.1.

Theorem 3.2 (Minimax separation boundary).

Suppose λiL(i)\lambda_{i}\asymp L(i), where L()L(\cdot) is a strictly decreasing function on (0,)(0,\infty), and MNDMM\leq N\leq DM. Then, for any 0δ1α,0\leq\delta\leq 1-\alpha, there exists c(α,δ)c(\alpha,\delta) such that if

(N+M)ΔN,Mc(α,δ)min{L1(ΔN,M1/2θ),L1(ΔN,M)}(N+M)\Delta_{N,M}\leq c(\alpha,\delta)\sqrt{\min\left\{L^{-1}\left(\Delta_{N,M}^{1/2\theta}\right),L^{-1}\left(\Delta_{N,M}\right)\right\}}

then

RΔN,M:=infϕΦN,M,αRΔN,M(ϕ)>δ,R^{*}_{\Delta_{N,M}}:=\inf_{\phi\in\Phi_{N,M,\alpha}}R_{\Delta_{N,M}}(\phi)>\delta,

where RΔN,M(ϕ):=sup(P,Q)𝒫𝔼PN×QM[1ϕ].R_{\Delta_{N,M}}(\phi):=\sup_{(P,Q)\in\mathcal{P}}\mathbb{E}_{P^{N}\times Q^{M}}[1-\phi].
Furthermore if supkϕk<\sup_{k}\left\|\phi_{k}\right\|_{\infty}<\infty, then the above condition on ΔN,M\Delta_{N,M} can be replaced by

(N+M)ΔN,Mc(α,δ)min{L1(ΔN,M1/2θ),ΔN,M2}.(N+M)\Delta_{N,M}\leq c(\alpha,\delta)\sqrt{\min\left\{L^{-1}\left(\Delta_{N,M}^{1/2\theta}\right),\Delta_{N,M}^{-2}\right\}}.
Corollary 3.3 (Minimax separation boundary-Polynomial decay).

Suppose λiiβ\lambda_{i}\asymp i^{-\beta}, β>1\beta>1. Then

ΔN,M(N+M)4θβ4θβ+1,\Delta^{*}_{N,M}\asymp(N+M)^{\frac{-4\theta\beta}{4\theta\beta+1}},

provided one of the following holds: (i) θ12\theta\geq\frac{1}{2}, (ii) supiϕi<\sup_{i}\left\|\phi_{i}\right\|_{\infty}<\infty, θ14β.\theta\geq\frac{1}{4\beta}.

Corollary 3.4 (Minimax separation boundary-Exponential decay).

Suppose λieτi\lambda_{i}\asymp e^{-\tau i}, τ>0\tau>0, θ>0\theta>0. Then for all (N+M)kα,δ(N+M)\geq k_{\alpha,\delta} we have

ΔN,Mlog(N+M)N+M,\Delta^{*}_{N,M}\asymp\frac{\sqrt{\log(N+M)}}{N+M},

provided one of the following holds: (i) θ12\theta\geq\frac{1}{2}, (ii) supiϕi<\sup_{i}\left\|\phi_{i}\right\|_{\infty}<\infty, θ>0.\theta>0.

Remark 3.3.

(i) Observe that for any bounded kernel satisfying supxK(x,x)κ,\sup_{x}K(x,x)\leq\kappa, 𝒯\mathcal{T} is a trace class operator. This implies L()L(\cdot) has to satisfy iL(i)0iL(i)\rightarrow 0 as ii\rightarrow\infty. Without further assumptions on the decay rate (i.e., if we allow the space 𝒫\mathcal{P} to include any decay rate of order o(i1)o(i^{-1})), then we can show that ΔN,M(N+M)4θ4θ+1,\Delta^{*}_{N,M}\asymp(N+M)^{\frac{-4\theta}{4\theta+1}}, provided that θ12\theta\geq\frac{1}{2} (or θ14\theta\geq\frac{1}{4} in the case of supiϕi<\sup_{i}\left\|\phi_{i}\right\|_{\infty}<\infty). However, assuming specific decay rates (i.e., considering a smaller space 𝒫\mathcal{P}), the separation boundary can be improved, as shown in Corollaries 3.3 and 3.4. Note that infβ>14θβ4θβ+1=4θ4θ+1>2θ2θ+1\inf_{\beta>1}\frac{4\theta\beta}{4\theta\beta+1}=\frac{4\theta}{4\theta+1}>\frac{2\theta}{2\theta+1} and 1>2θ2θ+11>\frac{2\theta}{2\theta+1} for any θ>0\theta>0, implying that the separation boundary of MMD is larger than the minimax separation boundary w.r.t. 𝒫\mathcal{P} irrespective of the decay rate of the eigenvalues of 𝒯\mathcal{T}.
(ii) The minimax separation boundary depends only on θ\theta (the degree of smoothness of uu), and β\beta (the decay rate of the eigenvalues), with β\beta controlling the smoothness of the RKHS. While one may think that the minimax rate is independent of dd, it is actually not the case since β\beta depends on dd. Minh et al. (2006, Section 5.2) provides examples of kernels with explicit decay rates of eigenvalues. For example, when 𝒳=Sd1\mathcal{X}=S^{d-1}, d2d\geq 2, (i) The Spline kernel, defined as K(x,t)=1+1|Sd1|k=1λkc(d,k)Pk(d;x,t2)K(x,t)=1+\frac{1}{|S^{d-1}|}\sum_{k=1}^{\infty}\lambda_{k}c(d,k)P_{k}(d;{\left\langle x,t\right\rangle}_{2}), where Pk(d;t)P_{k}(d;t) denotes the Legendre polynomial of degree kk in dimension d,d, and c(d,k)c(d,k) is some normalization constant depending on dd and kk, has λk(k(k+d2))β\lambda_{k}\asymp\left(k(k+d-2)\right)^{-\beta}, for β>d12,\beta>\frac{d-1}{2}, (ii) The polynomial kernel with degree hh, defined as K(x,t)=(1+x,t2)hK(x,t)=(1+{\left\langle x,t\right\rangle}_{2})^{h}, has (k+h+d2)2hd+32λk(k+h+d2)hd+32(k+h+d-2)^{-2h-d+\frac{3}{2}}\lesssim\lambda_{k}\lesssim(k+h+d-2)^{-h-d+\frac{3}{2}}, and (iii) The Gaussian kernel with bandwidth σ2\sigma^{2} satisfies λk(2eσ2)k(2k+d2)kd12\lambda_{k}\asymp\left(\frac{2e}{\sigma^{2}}\right)^{k}(2k+d-2)^{-k-\frac{d-1}{2}}, for σ>2/d.\sigma>\sqrt{2/d}.
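To make the gap in Remark 3.3(i) concrete, the exponents 2θ2θ+1\frac{2\theta}{2\theta+1} (MMD, Theorem 3.1) and 4θβ4θβ+1\frac{4\theta\beta}{4\theta\beta+1} (minimax under polynomial decay, Corollary 3.3) can be compared numerically. The following sketch is ours, not from the paper; a larger exponent means a smaller (better) separation boundary:

```python
# Illustrative comparison (not from the paper) of separation-boundary
# exponents: the MMD test separates at (N+M)^(-2*theta/(2*theta+1)),
# while the minimax rate under lambda_i ~ i^(-beta) is
# (N+M)^(-4*theta*beta/(4*theta*beta+1)).

def mmd_exponent(theta):
    return 2 * theta / (2 * theta + 1)

def minimax_exponent(theta, beta):
    return 4 * theta * beta / (4 * theta * beta + 1)

for theta in (0.25, 0.5, 1.0, 2.0):
    for beta in (1.5, 2.0, 4.0):
        # 4tb/(4tb+1) > 2t/(2t+1) reduces to beta > 1/2, so any beta > 1 works
        assert minimax_exponent(theta, beta) > mmd_exponent(theta)
```

The inequality reduces algebraically to β>12\beta>\frac{1}{2}, so the minimax exponent dominates for every admissible β>1\beta>1, matching the remark.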

4 Spectral regularized MMD test

To address the limitation of the MMD test, in this section, we propose a spectral regularized version of the MMD test and show it to be minimax optimal w.r.t. 𝒫\mathcal{P}. To this end, we define the spectral regularized discrepancy as

ηλ(P,Q):=4𝒯gλ(𝒯)u,uL2(R),\eta_{\lambda}(P,Q):=4{\left\langle\mathcal{T}g_{\lambda}(\mathcal{T})u,u\right\rangle}_{L^{2}(R)},

where the spectral regularizer, gλ:(0,)(0,)g_{\lambda}:(0,\infty)\rightarrow(0,\infty) satisfies limλ0xgλ(x)1\lim_{\lambda\rightarrow 0}xg_{\lambda}(x)\asymp 1 (more concrete assumptions on gλg_{\lambda} will be introduced later). By functional calculus, we define gλg_{\lambda} applied to any compact, self-adjoint operator \mathcal{B} defined on a separable Hilbert space, HH as

gλ():=i1gλ(τi)(ψiHψi)+gλ(0)(𝑰i1ψiHψi),g_{\lambda}(\mathcal{B}):=\sum_{i\geq 1}g_{\lambda}(\tau_{i})(\psi_{i}\otimes_{H}\psi_{i})+g_{\lambda}(0)\left(\boldsymbol{I}-\sum_{i\geq 1}\psi_{i}\otimes_{H}\psi_{i}\right), (4.1)

where \mathcal{B} has the spectral representation, =iτiψiHψi\mathcal{B}=\sum_{i}\tau_{i}\psi_{i}\otimes_{H}\psi_{i} with (τi,ψi)i(\tau_{i},\psi_{i})_{i} being the eigenvalues and eigenfunctions of \mathcal{B}. A popular example of gλg_{\lambda} is gλ(x)=1x+λg_{\lambda}(x)=\frac{1}{x+\lambda}, yielding gλ()=(+λ𝑰)1g_{\lambda}(\mathcal{B})=(\mathcal{B}+\lambda\boldsymbol{I})^{-1}, which is well known as the Tikhonov regularizer. We will later provide more examples of spectral regularizers that satisfy additional assumptions.
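As a finite-dimensional illustration (ours, not from the paper), the functional calculus (4.1) can be applied to a symmetric positive semi-definite matrix via its eigendecomposition; for the Tikhonov choice gλ(x)=1x+λg_{\lambda}(x)=\frac{1}{x+\lambda}, the result recovers (+λ𝑰)1(\mathcal{B}+\lambda\boldsymbol{I})^{-1}:

```python
import numpy as np

# Sketch (not from the paper): apply a spectral regularizer g_lambda to a
# symmetric PSD matrix B via its eigendecomposition, as in (4.1). For a
# matrix, eigh returns a complete eigenbasis, so the g_lambda(0) projection
# term in (4.1) vanishes.

def g_of_operator(B, g):
    """Apply g spectrally: g(B) = sum_i g(tau_i) psi_i psi_i^T."""
    tau, Psi = np.linalg.eigh(B)
    return (Psi * g(tau)) @ Psi.T    # scale each eigenvector column by g(tau_i)

rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))
B = M @ M.T                          # symmetric PSD, almost surely full rank
lam = 0.1
tikhonov = lambda x: 1.0 / (x + lam)

# Tikhonov regularizer recovers (B + lam*I)^{-1}
assert np.allclose(g_of_operator(B, tikhonov), np.linalg.inv(B + lam * np.eye(5)))
```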

Remark 4.1.

We would like to highlight that the common definition of gλ()g_{\lambda}(\mathcal{B}) in the inverse problem literature (see Engl et al., 1996, Section 2.3) does not include the term gλ(0)(𝐈i1ψiHψi)g_{\lambda}(0)(\boldsymbol{I}-\sum_{i\geq 1}\psi_{i}\otimes_{H}\psi_{i}), which represents the projection onto the space orthogonal to span{ψi:i1}\emph{span}\{\psi_{i}:i\geq 1\}. The reason for adding this term is to ensure that gλ()g_{\lambda}(\mathcal{B}) is invertible whenever gλ(0)0g_{\lambda}(0)\neq 0. Moreover, the condition that gλ()g_{\lambda}(\mathcal{B}) is invertible will be essential for the power analysis of our test.

Based on the definition of gλ(𝒯)g_{\lambda}(\mathcal{T}), it is easy to verify that 𝒯gλ(𝒯)=i1λigλ(λi)(ϕ~iL2(R)ϕ~i)\mathcal{T}g_{\lambda}(\mathcal{T})=\sum_{i\geq 1}\lambda_{i}g_{\lambda}(\lambda_{i})(\tilde{\phi}_{i}\otimes_{L^{2}(R)}\tilde{\phi}_{i}) so that 𝒯gλ(𝒯)u,uL2(R)=i1λigλ(λi)u,ϕ~i2L2(R)i1u,ϕ~i2L2(R)=()u2L2(R)\langle\mathcal{T}g_{\lambda}(\mathcal{T})u,u\rangle_{L^{2}(R)}=\sum_{i\geq 1}\lambda_{i}g_{\lambda}(\lambda_{i})\langle u,\tilde{\phi}_{i}\rangle^{2}_{L^{2}(R)}\rightarrow\sum_{i\geq 1}\langle u,\tilde{\phi}_{i}\rangle^{2}_{L^{2}(R)}\stackrel{{\scriptstyle(*)}}{{=}}\|u\|^{2}_{L^{2}(R)} as λ0\lambda\rightarrow 0, where ()(*) holds if uspan{ϕ~i:i1}u\in\text{span}\{\tilde{\phi}_{i}:i\geq 1\}. In fact, it can be shown that u2L2(R)\|u\|^{2}_{L^{2}(R)} and 𝒯gλ(𝒯)u,uL2(R)\langle\mathcal{T}g_{\lambda}(\mathcal{T})u,u\rangle_{L^{2}(R)} are equivalent up to constants if uRan(𝒯θ)u\in\text{Ran}(\mathcal{T}^{\theta}) and λ\lambda is small enough compared to u2L2(R)\|u\|^{2}_{L^{2}(R)} (see Lemma A.7). Therefore, the issue with DMMDD_{\text{MMD}} can be resolved by using ηλ\eta_{\lambda} as a discrepancy measure to construct a test. In the following, we present details about the construction of the test statistic and the test using ηλ\eta_{\lambda}. To this end, we first provide an alternate representation for ηλ\eta_{\lambda} which is very useful to construct the test statistic. Define ΣR:=ΣPQ=:\Sigma_{R}:=\Sigma_{PQ}=\mathfrak{I}^{*}\mathfrak{I}:\mathscr{H}\rightarrow\mathscr{H}, which is referred to as the covariance operator. It can be shown (Sriperumbudur and Sterge, 2022, Proposition C.2) that ΣPQ\Sigma_{PQ} is a positive, self-adjoint, trace-class operator, and can be written as

ΣPQ\displaystyle\Sigma_{PQ} =𝒳(K(,x)μR)(K(,x)μR)dR(x)\displaystyle=\int_{\mathcal{X}}(K(\cdot,x)-\mu_{R})\otimes_{\mathscr{H}}(K(\cdot,x)-\mu_{R})\,dR(x)
=12𝒳×𝒳(K(,x)K(,y))(K(,x)K(,y))dR(x)dR(y),\displaystyle=\frac{1}{2}\int_{\mathcal{X}\times\mathcal{X}}(K(\cdot,x)-K(\cdot,y))\otimes_{\mathscr{H}}(K(\cdot,x)-K(\cdot,y))\,dR(x)\,dR(y), (4.2)

where μR=𝒳K(,x)dR(x)\mu_{R}=\int_{\mathcal{X}}K(\cdot,x)\,dR(x). Note that

ηλ(P,Q)\displaystyle\eta_{\lambda}(P,Q) =4𝒯gλ(𝒯)u,uL2(R)=()4gλ(ΣPQ)u,uL2(R)\displaystyle=4\langle\mathcal{T}g_{\lambda}(\mathcal{T})u,u\rangle_{L^{2}(R)}\stackrel{{\scriptstyle(\dagger)}}{{=}}4\langle\mathfrak{I}g_{\lambda}(\Sigma_{PQ})\mathfrak{I}^{*}u,u\rangle_{L^{2}(R)}
=4gλ(ΣPQ)u,u=gλ(ΣPQ)(μPμQ),μPμQ\displaystyle=4\langle g_{\lambda}(\Sigma_{PQ})\mathfrak{I}^{*}u,\mathfrak{I}^{*}u\rangle_{\mathscr{H}}=\langle g_{\lambda}(\Sigma_{PQ})(\mu_{P}-\mu_{Q}),\mu_{P}-\mu_{Q}\rangle_{\mathscr{H}}
=g1/2λ(ΣPQ)(μPμQ)2,\displaystyle=\left\|g^{1/2}_{\lambda}(\Sigma_{PQ})(\mu_{P}-\mu_{Q})\right\|_{\mathscr{H}}^{2}, (4.3)

where ()(\dagger) follows from Lemma A.8(i) that states 𝒯gλ(𝒯)=gλ(ΣPQ)\mathcal{T}g_{\lambda}(\mathcal{T})=\mathfrak{I}g_{\lambda}(\Sigma_{PQ})\mathfrak{I}^{*}. Define ΣPQ,λ:=ΣPQ+λ𝑰\Sigma_{PQ,\lambda}:=\Sigma_{PQ}+\lambda\boldsymbol{I}.
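The two representations of ΣPQ\Sigma_{PQ} in (4.2), the centered second moment and the half-average of pairwise differences, can be verified in finite dimensions. The sketch below is ours, not from the paper; it replaces K(,x)K(\cdot,x) by explicit feature vectors and takes RR to be a uniform discrete distribution:

```python
import numpy as np

# Check (not from the paper) that the centered form and the pairwise-difference
# form of the covariance operator in (4.2) coincide for a uniform discrete R,
# with K(., x_i) replaced by explicit feature vectors F[i].

rng = np.random.default_rng(3)
n, d = 30, 3
F = rng.standard_normal((n, d))            # stand-ins for K(., x_i), x_i ~ R
mu_R = F.mean(axis=0)
centered = (F - mu_R).T @ (F - mu_R) / n   # int (phi - mu_R) x (phi - mu_R) dR

pairwise = np.zeros((d, d))
for i in range(n):
    for j in range(n):
        diff = F[i] - F[j]
        pairwise += np.outer(diff, diff)
pairwise /= 2 * n * n                      # (1/2) double integral over R x R

assert np.allclose(centered, pairwise)
```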

Remark 4.2.

Suppose gλ(x)=1x+λg_{\lambda}(x)=\frac{1}{x+\lambda}. Then g1/2λ(ΣPQ)=ΣPQ,λ1/2g^{1/2}_{\lambda}(\Sigma_{PQ})=\Sigma_{PQ,\lambda}^{-1/2}. Note that ΣPQf,f{\left\langle\Sigma_{PQ}f,f\right\rangle}_{\mathscr{H}} =f𝔼Rf2L2(R)=\left\|f-\mathbb{E}_{R}f\right\|^{2}_{L^{2}(R)} for any ff\in\mathscr{H}, which implies ΣPQ,λf,f=f𝔼Rf2L2(R)+λf2{\left\langle\Sigma_{PQ,\lambda}f,f\right\rangle}_{\mathscr{H}}=\left\|f-\mathbb{E}_{R}f\right\|^{2}_{L^{2}(R)}+\lambda\left\|f\right\|^{2}_{\mathscr{H}}. Therefore, ηλ(P,Q)\eta_{\lambda}(P,Q) in (4.3) can be written as

ηλ(P,Q)\displaystyle\eta_{\lambda}(P,Q) =supf:ΣPQ,λf,f1f,μPμQ\displaystyle=\sup_{f\in\mathscr{H}\,:\,\langle\Sigma_{PQ,\lambda}f,f\rangle_{\mathscr{H}}\leq 1}\langle f,\mu_{P}-\mu_{Q}\rangle_{\mathscr{H}}
=supf:f𝔼RfL2(R)2+λf21𝒳f(x)d(PQ)(x).\displaystyle=\sup_{f\in\mathscr{H}\,:\,\left\|f-\mathbb{E}_{R}f\right\|_{L^{2}(R)}^{2}+\lambda\left\|f\right\|_{\mathscr{H}}^{2}\leq 1}\int_{\mathcal{X}}f(x)\,d(P-Q)(x). (4.4)

This means the regularized discrepancy involves test functions that belong to a growing ball in \mathscr{H} as λ0\lambda\rightarrow 0 in contrast to a fixed unit ball as in the case with D2MMDD^{2}_{\emph{MMD}} (see (1.2)). Balasubramanian et al. (2021) considered a similar discrepancy in a goodness-of-fit test problem, H0:P=P0H_{0}:P=P_{0} vs. H1:PP0H_{1}:P\neq P_{0} where P0P_{0} is known, by using ηλ(P0,P)\eta_{\lambda}(P_{0},P) in (4.4) but with RR being replaced by P0P_{0}. In the context of two-sample testing, Harchaoui et al. (2007) considered a discrepancy based on kernel Fisher discriminant analysis whose regularized version is given by

sup0ff,μPμQf,(ΣP+ΣQ+λ𝑰)f=supf:(ΣP+ΣQ+λ𝑰)f,f1f,μPμQ\displaystyle\sup_{0\neq f\in\mathscr{H}}\frac{\langle f,\mu_{P}-\mu_{Q}\rangle_{\mathscr{H}}}{\langle f,(\Sigma_{P}+\Sigma_{Q}+\lambda\boldsymbol{I})f\rangle_{\mathscr{H}}}=\sup_{f\in\mathscr{H}\,:\,\langle(\Sigma_{P}+\Sigma_{Q}+\lambda\boldsymbol{I})f,f\rangle_{\mathscr{H}}\leq 1}\langle f,\mu_{P}-\mu_{Q}\rangle_{\mathscr{H}}
=supf:f𝔼PfL2(P)2+f𝔼QfL2(Q)2+λf21𝒳f(x)d(PQ)(x),\displaystyle=\sup_{f\in\mathscr{H}\,:\,\|f-\mathbb{E}_{P}f\|_{L^{2}(P)}^{2}+\|f-\mathbb{E}_{Q}f\|_{L^{2}(Q)}^{2}+\lambda\left\|f\right\|_{\mathscr{H}}^{2}\leq 1}\int_{\mathcal{X}}f(x)\,d(P-Q)(x),

where the constraint set in the above variational form is larger than the one in (4.4) since ΣPQ=14[2ΣP+2ΣQ+(μPμQ)(μPμQ)]\Sigma_{PQ}=\frac{1}{4}\left[2\Sigma_{P}+2\Sigma_{Q}+(\mu_{P}-\mu_{Q})\otimes_{\mathscr{H}}(\mu_{P}-\mu_{Q})\right].
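The mixture identity used above, ΣPQ=14[2ΣP+2ΣQ+(μPμQ)(μPμQ)]\Sigma_{PQ}=\frac{1}{4}\left[2\Sigma_{P}+2\Sigma_{Q}+(\mu_{P}-\mu_{Q})\otimes_{\mathscr{H}}(\mu_{P}-\mu_{Q})\right], admits a quick finite-dimensional check (ours, not from the paper), again with explicit feature vectors and uniform discrete PP and QQ:

```python
import numpy as np

# Sketch (not from the paper): finite-dimensional check of
# Sigma_PQ = (1/4)[2*Sigma_P + 2*Sigma_Q + (mu_P - mu_Q)(mu_P - mu_Q)^T],
# where Sigma_PQ is the covariance under the mixture R = (P + Q)/2.

rng = np.random.default_rng(1)
FP = rng.standard_normal((50, 3))   # "features" K(., x) for x ~ P
FQ = rng.standard_normal((60, 3))   # "features" K(., y) for y ~ Q

def mean_and_cov(F):
    mu = F.mean(axis=0)
    C = (F - mu).T @ (F - mu) / len(F)   # population-style covariance
    return mu, C

mu_P, S_P = mean_and_cov(FP)
mu_Q, S_Q = mean_and_cov(FQ)

# Covariance under the equal-weight mixture R = (P + Q)/2
mu_R = 0.5 * (mu_P + mu_Q)
S_R = 0.5 * ((FP - mu_R).T @ (FP - mu_R) / len(FP)) \
    + 0.5 * ((FQ - mu_R).T @ (FQ - mu_R) / len(FQ))

diff = mu_P - mu_Q
assert np.allclose(S_R, 0.25 * (2 * S_P + 2 * S_Q + np.outer(diff, diff)))
```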

4.1 Test statistic

Define A(x,y):=K(,x)K(,y)A(x,y):=K(\cdot,x)-K(\cdot,y). Using the representation,

ηλ(P,Q)=𝒳4gλ(ΣPQ)A(x,y),A(u,w)dP(x)dP(u)dQ(y)dQ(w),\displaystyle\eta_{\lambda}(P,Q)=\int_{\mathcal{X}^{4}}\langle g_{\lambda}(\Sigma_{PQ})A(x,y),A(u,w)\rangle_{\mathscr{H}}\,dP(x)\,dP(u)\,dQ(y)\,dQ(w), (4.5)

obtained by expanding the r.h.s. of (4.4), and of ΣPQ\Sigma_{PQ} in (4.2), we construct an estimator of ηλ(P,Q)\eta_{\lambda}(P,Q) as follows, based on 𝕏N\mathbb{X}_{N} and 𝕐M\mathbb{Y}_{M}. We first split the samples (Xi)i=1N(X_{i})_{i=1}^{N} into (Xi)i=1Ns(X_{i})_{i=1}^{N-s} and (X1i)i=1s:=(Xi)i=Ns+1N(X^{1}_{i})_{i=1}^{s}:=(X_{i})_{i=N-s+1}^{N}, and (Yj)j=1M(Y_{j})_{j=1}^{M} to (Yj)j=1Ms(Y_{j})_{j=1}^{M-s} and (Y1j)j=1s:=(Yj)j=Ms+1M(Y^{1}_{j})_{j=1}^{s}:=(Y_{j})_{j=M-s+1}^{M}. Then, the samples (X1i)i=1s(X^{1}_{i})_{i=1}^{s} and (Y1j)j=1s(Y^{1}_{j})_{j=1}^{s} are used to estimate the covariance operator ΣPQ\Sigma_{PQ} while (Xi)i=1Ns(X_{i})_{i=1}^{N-s} and (Yi)i=1Ms(Y_{i})_{i=1}^{M-s} are used to estimate the mean elements μP\mu_{P} and μQ\mu_{Q}, respectively. Define n:=Nsn:=N-s and m:=Msm:=M-s. Using the form of ηλ\eta_{\lambda} in (4.5), we estimate it using a two-sample UU-statistic (Hoeffding, 1992),

η^λ:=1n(n1)1m(m1)1ijn1ijmh(Xi,Xj,Yi,Yj),\hat{\eta}_{\lambda}:=\frac{1}{n(n-1)}\frac{1}{m(m-1)}\sum_{1\leq i\neq j\leq n}\sum_{1\leq i^{\prime}\neq j^{\prime}\leq m}h(X_{i},X_{j},Y_{i^{\prime}},Y_{j^{\prime}}), (4.6)

where

h(Xi,Xj,Yi,Yj):=g1/2λ(Σ^PQ)A(Xi,Yi),g1/2λ(Σ^PQ)A(Xj,Yj),h(X_{i},X_{j},Y_{i^{\prime}},Y_{j^{\prime}}):={\left\langle g^{1/2}_{\lambda}(\hat{\Sigma}_{PQ})A(X_{i},Y_{i^{\prime}}),g^{1/2}_{\lambda}(\hat{\Sigma}_{PQ})A(X_{j},Y_{j^{\prime}})\right\rangle}_{\mathscr{H}},

and

Σ^PQ:=12s(s1)ijs(K(,Zi)K(,Zj))(K(,Zi)K(,Zj)),\hat{\Sigma}_{PQ}:=\frac{1}{2s(s-1)}\sum_{i\neq j}^{s}(K(\cdot,Z_{i})-K(\cdot,Z_{j}))\otimes_{\mathscr{H}}(K(\cdot,Z_{i})-K(\cdot,Z_{j})),

which is a one-sample UU-statistic estimator of ΣPQ\Sigma_{PQ} based on Zi=αiX1i+(1αi)Y1iZ_{i}=\alpha_{i}X^{1}_{i}+(1-\alpha_{i})Y^{1}_{i}, for 1is1\leq i\leq s, where (αi)i=1si.i.d.Bernoulli(12)(\alpha_{i})_{i=1}^{s}\stackrel{{\scriptstyle i.i.d.}}{{\sim}}\text{Bernoulli}(\frac{1}{2}). It is easy to verify that (Zi)i=1si.i.d.R(Z_{i})_{i=1}^{s}\stackrel{{\scriptstyle i.i.d.}}{{\sim}}R. Note that η^λ\hat{\eta}_{\lambda} is not exactly a UU-statistic since it involves Σ^PQ\hat{\Sigma}_{PQ}, but conditioned on (Zi)i=1s(Z_{i})_{i=1}^{s}, one can see it is exactly a two-sample UU-statistic. When gλ(x)=1x+λg_{\lambda}(x)=\frac{1}{x+\lambda}, in contrast to our estimator which involves sample splitting, Harchaoui et al. (2007) estimate ΣP+ΣQ\Sigma_{P}+\Sigma_{Q} using a pooled estimator, and μP\mu_{P} and μQ\mu_{Q} through empirical estimators, using all the samples, thereby resulting in a kernelized version of Hotelling’s T2T^{2}-statistic (Lehmann and Romano, 2006). However, we consider sample splitting for two reasons: (i) To achieve independence between the covariance operator estimator and the mean element estimators, which leads to a convenient analysis, and (ii) to reduce the computational complexity of η^λ\hat{\eta}_{\lambda} from (N+M)3(N+M)^{3} to (N+M)s2(N+M)s^{2}. By writing (4.6) as
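In finite dimensions, the UU-statistic estimator Σ^PQ\hat{\Sigma}_{PQ} reduces to the unbiased (denominator s1s-1) sample covariance of the feature vectors, which is consistent with the correction 𝑯~s=ss1𝑯s\tilde{\boldsymbol{H}}_{s}=\frac{s}{s-1}\boldsymbol{H}_{s} appearing in Theorem 4.1. A sketch (ours, not from the paper):

```python
import numpy as np

# Sketch (not from the paper): with K(., Z_i) replaced by explicit feature
# vectors Phi[i], the one-sample U-statistic estimator of Sigma_PQ equals the
# unbiased sample covariance (denominator s - 1).

rng = np.random.default_rng(2)
s, d = 40, 4
Phi = rng.standard_normal((s, d))          # stand-ins for K(., Z_i)

# (1 / (2 s (s-1))) * sum over i != j of (phi_i - phi_j)(phi_i - phi_j)^T
U = np.zeros((d, d))
for i in range(s):
    for j in range(s):
        if i != j:
            diff = Phi[i] - Phi[j]
            U += np.outer(diff, diff)
U /= 2 * s * (s - 1)

assert np.allclose(U, np.cov(Phi, rowvar=False))   # np.cov uses ddof=1
```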

η^λ\displaystyle\hat{\eta}_{\lambda} =\displaystyle{}={} 1n(n1)ijgλ1/2(Σ^PQ)K(,Xi),gλ1/2(Σ^PQ)K(,Xj)\displaystyle\frac{1}{n(n-1)}\sum_{i\neq j}{\left\langle g_{\lambda}^{1/2}(\hat{\Sigma}_{PQ})K(\cdot,X_{i}),g_{\lambda}^{1/2}(\hat{\Sigma}_{PQ})K(\cdot,X_{j})\right\rangle}_{\mathscr{H}}
+1m(m1)ijgλ1/2(Σ^PQ)K(,Yi),gλ1/2(Σ^PQ)K(,Yj)\displaystyle\qquad+\frac{1}{m(m-1)}\sum_{i\neq j}{\left\langle g_{\lambda}^{1/2}(\hat{\Sigma}_{PQ})K(\cdot,Y_{i}),g_{\lambda}^{1/2}(\hat{\Sigma}_{PQ})K(\cdot,Y_{j})\right\rangle}_{\mathscr{H}}
2nmi,jgλ1/2(Σ^PQ)K(,Xi),gλ1/2(Σ^PQ)K(,Yj),\displaystyle\qquad\qquad-\frac{2}{nm}\sum_{i,j}{\left\langle g_{\lambda}^{1/2}(\hat{\Sigma}_{PQ})K(\cdot,X_{i}),g_{\lambda}^{1/2}(\hat{\Sigma}_{PQ})K(\cdot,Y_{j})\right\rangle}_{\mathscr{H}},

the following result shows that η^λ\hat{\eta}_{\lambda} can be computed through matrix operations and by solving a finite-dimensional eigensystem.

Theorem 4.1.

Let (λ^i,αi^)i(\hat{\lambda}_{i},\hat{\alpha_{i}})_{i} be the eigensystem of 1s𝐇~s1/2Ks𝐇~s1/2\frac{1}{s}\tilde{\boldsymbol{H}}_{s}^{1/2}K_{s}\tilde{\boldsymbol{H}}_{s}^{1/2} where Ks:=[K(Zi,Zj)]i,j[s]K_{s}:=[K(Z_{i},Z_{j})]_{i,j\in[s]}, 𝐇s=𝐈s1s𝟏s𝟏s\boldsymbol{H}_{s}=\boldsymbol{I}_{s}-\frac{1}{s}\boldsymbol{1}_{s}\boldsymbol{1}_{s}^{\top}, and 𝐇~s=ss1𝐇s\tilde{\boldsymbol{H}}_{s}=\frac{s}{s-1}\boldsymbol{H}_{s}. Define G:=i(gλ(λ^i)gλ(0)λ^i)α^iα^i.G:=\sum_{i}\left(\frac{g_{\lambda}(\hat{\lambda}_{i})-g_{\lambda}(0)}{\hat{\lambda}_{i}}\right)\hat{\alpha}_{i}\hat{\alpha}_{i}^{\top}. Then

η^λ=1n(n1)(T1T2)+1m(m1)(T3T4)2nmT5,\hat{\eta}_{\lambda}=\frac{1}{n(n-1)}\left(T_{1}-T_{2}\right)+\frac{1}{m(m-1)}\left(T_{3}-T_{4}\right)-\frac{2}{nm}T_{5},

where T1=𝟏nA1𝟏nT_{1}=\boldsymbol{1}_{n}^{\top}A_{1}\boldsymbol{1}_{n}, T2=Tr(A1)T_{2}=\emph{Tr}(A_{1}), T3=𝟏mA2𝟏mT_{3}=\boldsymbol{1}_{m}^{\top}A_{2}\boldsymbol{1}_{m}, T4=Tr(A2)T_{4}=\emph{Tr}(A_{2}), and

T5=𝟏m(gλ(0)Kmn+1sKms𝑯~1/2sG𝑯~1/2sKns)𝟏n,T_{5}=\boldsymbol{1}_{m}^{\top}\left(g_{\lambda}(0)K_{mn}+\frac{1}{s}K_{ms}\tilde{\boldsymbol{H}}^{1/2}_{s}G\tilde{\boldsymbol{H}}^{1/2}_{s}K_{ns}^{\top}\right)\boldsymbol{1}_{n},

with A1:=gλ(0)Kn+1sKns𝐇~1/2sG𝐇~1/2sKnsA_{1}:=g_{\lambda}(0)K_{n}+\frac{1}{s}K_{ns}\tilde{\boldsymbol{H}}^{1/2}_{s}G\tilde{\boldsymbol{H}}^{1/2}_{s}K_{ns}^{\top} and A2:=gλ(0)Km+1sKms𝐇~1/2sG𝐇~1/2sKmsA_{2}:=g_{\lambda}(0)K_{m}+\frac{1}{s}K_{ms}\tilde{\boldsymbol{H}}^{1/2}_{s}G\tilde{\boldsymbol{H}}^{1/2}_{s}K_{ms}^{\top}. Here Kn:=[K(Xi,Xj)]i,j[n]K_{n}:=[K(X_{i},X_{j})]_{i,j\in[n]}, Km:=[K(Yi,Yj)]i,j[m]K_{m}:=[K(Y_{i},Y_{j})]_{i,j\in[m]}, [K(Xi,Zj)]i[n],j[s][K(X_{i},Z_{j})]_{i\in[n],j\in[s]} =:Kns=:K_{ns},
Kms:=[K(Yi,Zj)]i[m],j[s]K_{ms}:=[K(Y_{i},Z_{j})]_{i\in[m],j\in[s]}, and Kmn:=[K(Yi,Xj)]i[m],j[n]K_{mn}:=[K(Y_{i},X_{j})]_{i\in[m],j\in[n]}.

Note that in the case of Tikhonov regularization, G=1λ(1s𝑯~1/2sKs𝑯~1/2s+λ𝑰s)1G=\frac{-1}{\lambda}(\frac{1}{s}\tilde{\boldsymbol{H}}^{1/2}_{s}K_{s}\tilde{\boldsymbol{H}}^{1/2}_{s}+\lambda\boldsymbol{I}_{s})^{-1}. The complexity of computing η^λ\hat{\eta}_{\lambda} is given by O(s3+m2+n2+ns2+ms2)O(s^{3}+m^{2}+n^{2}+ns^{2}+ms^{2}), which is comparable to that of the MMD test if s=o(N+M)s=o(\sqrt{N+M}), otherwise the proposed test is computationally more complex than the MMD test.
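As a concrete illustration, the following sketch assembles $A_1$, $A_2$, and the terms ②–⑤ above for the Tikhonov regularizer (so $g_{\lambda}(0)=1/\lambda$) from the Gram matrices. It assumes $\tilde{\boldsymbol{H}}_{s}$ is the centering matrix $\boldsymbol{I}_{s}-\frac{1}{s}\boldsymbol{1}_{s}\boldsymbol{1}_{s}^{\top}$ (defined earlier in the paper and not reproduced here; this is our assumption), which is an idempotent projection, so $\tilde{\boldsymbol{H}}_{s}^{1/2}=\tilde{\boldsymbol{H}}_{s}$.

```python
import numpy as np

def tikhonov_terms(Kn, Km, Kns, Kms, Kmn, Ks, lam):
    """Terms (2)-(5) of the statistic for g_lam(x) = 1/(x + lam).

    Assumes H_s is the centering matrix I - (1/s) 1 1^T (an idempotent
    projection, so H_s^{1/2} = H_s); this is a guess at the definition
    given earlier in the paper.
    """
    s = Ks.shape[0]
    H = np.eye(s) - np.ones((s, s)) / s
    # G = (-1/lam) ((1/s) H^{1/2} K_s H^{1/2} + lam I_s)^{-1}  -- the O(s^3) step
    G = -np.linalg.inv(H @ Ks @ H / s + lam * np.eye(s)) / lam
    B = H @ G @ H                                   # shared s x s core
    A1 = Kn / lam + Kns @ B @ Kns.T / s             # g_lam(0) K_n + ...
    A2 = Km / lam + Kms @ B @ Kms.T / s
    t2 = np.trace(A1)                               # term (2)
    t3 = A2.sum()                                   # term (3) = 1^T A_2 1
    t4 = np.trace(A2)                               # term (4)
    t5 = (Kmn / lam + Kms @ B @ Kns.T / s).sum()    # term (5)
    return t2, t3, t4, t5
```

The $O(s^{3})$ cost comes from the inverse defining $G$; the remaining matrix products account for the $ns^{2}$, $ms^{2}$, $n^{2}$, and $m^{2}$ terms in the complexity count.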

4.2 Oracle test

Before we present the test, we make the following assumptions on $g_{\lambda}$, which will be used throughout the analysis.

$$(A_{1})\ \sup_{x\in\Gamma}|xg_{\lambda}(x)|\leq C_{1},\qquad (A_{2})\ \sup_{x\in\Gamma}|\lambda g_{\lambda}(x)|\leq C_{2},$$
$$(A_{3})\ \sup_{\{x\in\Gamma:\,xg_{\lambda}(x)<B_{3}\}}|B_{3}-xg_{\lambda}(x)|x^{2\varphi}\leq C_{3}\lambda^{2\varphi},\qquad (A_{4})\ \inf_{x\in\Gamma}g_{\lambda}(x)(x+\lambda)\geq C_{4},$$

where $\Gamma:=[0,\kappa]$, $\varphi\in(0,\xi]$ with the constant $\xi$ called the qualification of $g_{\lambda}$, and $C_{1}$, $C_{2}$, $C_{3}$, $B_{3}$, $C_{4}$ are finite positive constants, all independent of $\lambda>0$. Note that $(A_{3})$ necessarily implies that $xg_{\lambda}(x)\asymp 1$ as $\lambda\to 0$ (see Lemma A.20), and $\xi$ controls the rate of convergence, which, combined with $(A_{1})$, yields upper and lower bounds on $\eta_{\lambda}$ in terms of $\|u\|_{L^{2}(R)}^{2}$ (see Lemma A.7).

Remark 4.3.

In the inverse problem literature (see Engl et al., 1996, Theorems 4.1, 4.3 and Corollary 4.4; Bauer et al., 2007, Definition 1), $(A_{1})$ and $(A_{2})$ are common assumptions, with $(A_{3})$ replaced by $\sup_{x\in\Gamma}|1-xg_{\lambda}(x)|x^{2\varphi}\leq C_{3}\lambda^{2\varphi}$. These assumptions are also used in the analysis of spectral regularized kernel ridge regression (Bauer et al., 2007). However, $(A_{3})$ is less restrictive than $\sup_{x\in\Gamma}|1-xg_{\lambda}(x)|x^{2\varphi}\leq C_{3}\lambda^{2\varphi}$ and allows higher qualification for $g_{\lambda}$. For example, when $g_{\lambda}(x)=\frac{1}{x+\lambda}$, the condition $\sup_{x\in\Gamma}|1-xg_{\lambda}(x)|x^{2\varphi}\leq C_{3}\lambda^{2\varphi}$ holds only for $\varphi\in(0,\frac{1}{2}]$, while $(A_{3})$ holds for any $\varphi>0$ (i.e., infinite qualification with no saturation at $\varphi=\frac{1}{2}$) by setting $B_{3}=\frac{1}{2}$ and $C_{3}=1$, i.e., $\sup_{x\leq\lambda}(\frac{1}{2}-\frac{x}{x+\lambda})x^{2\varphi}\leq\lambda^{2\varphi}$ for all $\varphi>0$. Intuitively, the standard assumption from the inverse problem literature concerns the rate at which $xg_{\lambda}(x)$ approaches 1 uniformly; in our case, however, we are interested in the rate at which $\eta_{\lambda}$ becomes greater than $c\|u\|_{L^{2}(R)}^{2}$ for some constant $c>0$, which leads to a weaker condition. $(A_{4})$ is not used in the inverse problem literature but is crucial in our analysis (see Remark 4.4(iii)).
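These assumptions are easy to check numerically for standard regularizers. The sketch below (an illustration only, taking $\kappa=1$ and $\lambda=0.1$) verifies $(A_{1})$ and $(A_{2})$ with $C_{1}=C_{2}=1$ for the Tikhonov, Showalter, and spectral cutoff regularizers discussed next, along with the behavior of $g_{\lambda}(x)(x+\lambda)$ relevant to $(A_{4})$:

```python
import numpy as np

lam, kappa = 0.1, 1.0
xs = np.linspace(0.0, kappa, 1001)     # grid over Gamma = [0, kappa]

def tikhonov(x):
    return 1.0 / (x + lam)

def showalter(x):
    safe = np.where(x == 0, 1.0, x)    # avoid 0/0; the value at 0 is 1/lam
    return np.where(x == 0, 1.0 / lam, -np.expm1(-x / lam) / safe)

def cutoff(x):
    safe = np.where(x == 0, 1.0, x)
    return np.where(x >= lam, 1.0 / safe, 0.0)

for g in (tikhonov, showalter, cutoff):
    assert np.abs(xs * g(xs)).max() <= 1 + 1e-12       # (A1) with C1 = 1
    assert np.abs(lam * g(xs)).max() <= 1 + 1e-12      # (A2) with C2 = 1

# (A4): g_lam(x)(x + lam) is bounded below for Tikhonov (identically 1)
# and Showalter, but fails for spectral cutoff since g_lam(0) = 0.
assert (tikhonov(xs) * (xs + lam)).min() >= 1 - 1e-12
assert (showalter(xs) * (xs + lam)).min() >= 0.9
assert cutoff(np.array([0.0]))[0] == 0.0
```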

Some examples of $g_{\lambda}$ that satisfy $(A_{1})$–$(A_{4})$ include the Tikhonov regularizer, $g_{\lambda}(x)=\frac{1}{x+\lambda}$, and the Showalter regularizer, $g_{\lambda}(x)=\frac{1-e^{-x/\lambda}}{x}\mathds{1}_{\{x\neq 0\}}+\frac{1}{\lambda}\mathds{1}_{\{x=0\}}$, both of which have qualification $\xi=\infty$. The spectral cutoff regularizer, defined as $g_{\lambda}(x)=\frac{1}{x}\mathds{1}_{\{x\geq\lambda\}}$, satisfies $(A_{1})$–$(A_{3})$ with $\xi=\infty$ but unfortunately does not satisfy $(A_{4})$ since $g_{\lambda}(0)=0$. Now, we are ready to present a test based on $\hat{\eta}_{\lambda}$, where $g_{\lambda}$ satisfies $(A_{1})$–$(A_{4})$. Define

$$\mathcal{N}_{1}(\lambda):=\mathrm{Tr}(\Sigma_{PQ,\lambda}^{-1/2}\Sigma_{PQ}\Sigma_{PQ,\lambda}^{-1/2})\quad\text{and}\quad\mathcal{N}_{2}(\lambda):=\left\|\Sigma_{PQ,\lambda}^{-1/2}\Sigma_{PQ}\Sigma_{PQ,\lambda}^{-1/2}\right\|_{\mathcal{L}^{2}(\mathscr{H})},$$

which capture the intrinsic dimensionality (or degrees of freedom) of $\mathscr{H}$. $\mathcal{N}_{1}(\lambda)$ appears quite heavily in the analysis of kernel ridge regression (e.g., Caponnetto and Vito, 2007). The following result provides a critical region with level $\alpha$.

Theorem 4.2 (Critical region–Oracle).

Let $n\geq 2$ and $m\geq 2$. Suppose $(A_{0})$–$(A_{2})$ hold. Then for any $\alpha>0$ and $\frac{140\kappa}{s}\log\frac{48\kappa s}{\alpha}\leq\lambda\leq\|\Sigma_{PQ}\|_{\mathcal{L}^{\infty}(\mathscr{H})}$,

$$P_{H_{0}}\{\hat{\eta}_{\lambda}\geq\gamma\}\leq\alpha,$$

where $\gamma=\frac{6\sqrt{2}(C_{1}+C_{2})\mathcal{N}_{2}(\lambda)}{\sqrt{\alpha}}\left(\frac{1}{n}+\frac{1}{m}\right)$. Furthermore, if $C:=\sup_{i}\|\phi_{i}\|_{\infty}<\infty$, the above bound holds for $136C^{2}\mathcal{N}_{1}(\lambda)\log\frac{24\mathcal{N}_{1}(\lambda)}{\delta}\leq s$ and $\lambda\leq\|\Sigma_{PQ}\|_{\mathcal{L}^{\infty}(\mathscr{H})}$.

First, note that the above result yields an $\alpha$-level test that rejects $H_{0}$ when $\hat{\eta}_{\lambda}\geq\gamma$. But the critical level $\gamma$ depends on $\mathcal{N}_{2}(\lambda)$, which in turn depends on the unknown distributions $P$ and $Q$; therefore, we call the above test the Oracle test. Later, in Sections 4.3 and 4.4, we present a completely data-driven test based on the permutation approach that matches the performance of the Oracle test. Second, the above theorem imposes a condition on $\lambda$ with respect to $s$ in order to control the Type-I error; this restriction can be weakened if we further assume the uniform boundedness of the eigenfunctions, i.e., $\sup_{i}\|\phi_{i}\|_{\infty}<\infty$. Moreover, the condition on $\lambda$ implies that $\lambda$ cannot decay to zero faster than $\frac{\log s}{s}$.
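In terms of the eigenvalues $(\lambda_{i})_{i}$ of $\Sigma_{PQ}$, the quantities above reduce to $\mathcal{N}_{1}(\lambda)=\sum_{i}\frac{\lambda_{i}}{\lambda_{i}+\lambda}$ and $\mathcal{N}_{2}(\lambda)=\big(\sum_{i}(\frac{\lambda_{i}}{\lambda_{i}+\lambda})^{2}\big)^{1/2}$. The sketch below computes these quantities and the oracle threshold $\gamma$; it is an illustration only, since the eigenvalues (and hence $\mathcal{N}_{2}(\lambda)$) are unknown in practice, which is precisely why the test is called an Oracle test:

```python
import numpy as np

def intrinsic_dims(eigs, lam):
    """N1(lam) = sum_i l_i/(l_i+lam),  N2(lam) = sqrt(sum_i (l_i/(l_i+lam))^2)."""
    r = eigs / (eigs + lam)
    return r.sum(), np.sqrt((r ** 2).sum())

def oracle_threshold(eigs, lam, n, m, alpha, C1=1.0, C2=1.0):
    """gamma = 6 sqrt(2) (C1 + C2) N2(lam) / sqrt(alpha) * (1/n + 1/m)."""
    _, N2 = intrinsic_dims(eigs, lam)
    return 6 * np.sqrt(2) * (C1 + C2) * N2 / np.sqrt(alpha) * (1.0 / n + 1.0 / m)
```

For the Tikhonov regularizer one can take $C_{1}=C_{2}=1$, the default values above.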

Next, we analyze the power of the Oracle test. Note that

$$P_{H_{1}}(\hat{\eta}_{\lambda}\geq\gamma)=P_{H_{1}}(\hat{\eta}_{\lambda}-\eta_{\lambda}\geq\gamma-\eta_{\lambda})\geq 1-\mathrm{Var}_{H_{1}}(\hat{\eta}_{\lambda})/(\eta_{\lambda}-\gamma)^{2},$$

which implies that the power is controlled by $\mathrm{Var}_{H_{1}}(\hat{\eta}_{\lambda})$ and $\eta_{\lambda}$. Lemma A.7 shows that $\eta_{\lambda}\gtrsim\|u\|_{L^{2}(R)}^{2}$, provided that $g_{\lambda}$ satisfies $(A_{3})$, $u\in\mathrm{Ran}(\mathcal{T}^{\theta})$ for $\theta>0$, and $\underline{\rho}^{2}(P,Q)\stackrel{(*)}{=}\|u\|_{L^{2}(R)}^{2}\gtrsim\lambda^{2\min(\theta,\xi)}$, where $(*)$ follows from Lemma A.18. Combining this bound with the bound on $\lambda$ in Theorem 4.2 provides a condition on the separation boundary. Additional sufficient conditions on $\Delta_{N,M}$ and $\lambda$ are obtained by controlling $\mathrm{Var}_{H_{1}}(\hat{\eta}_{\lambda})/\|u\|^{4}_{L^{2}(R)}$ to achieve the desired power, which is captured by the following result.

Remark 4.4.

(i) While $\mathrm{Var}_{H_{1}}(\hat{D}_{\mathrm{MMD}}^{2})$ is lower than $\mathrm{Var}_{H_{1}}(\hat{\eta}_{\lambda})$ (see the proofs of Theorem 3.1 and Lemma A.12), the rate at which $D^{2}_{\mathrm{MMD}}$ approaches $\|u\|^{2}_{L^{2}(R)}$ is much slower than that of $\hat{\eta}_{\lambda}$ (see Lemmas A.19 and A.7). One can thus think of this phenomenon as a kind of estimation-approximation error trade-off for the separation boundary rate.

(ii) Observe from the condition $\underline{\rho}^{2}(P,Q)=\|u\|_{L^{2}(R)}^{2}\gtrsim\lambda^{2\min(\theta,\xi)}\|\mathcal{T}^{-\theta}u\|_{L^{2}(R)}^{2}$ that a larger $\xi$ corresponds to a smaller separation boundary. Therefore, it is important to work with regularizers with infinite qualification, such as Tikhonov and Showalter.

(iii) The assumption $(A_{4})$ plays a crucial role in controlling the power of the test and in providing the conditions on the separation boundary in terms of $\underline{\rho}^{2}(P,Q)$. Note that $P_{H_{1}}(\hat{\eta}_{\lambda}>\gamma)=\mathbb{E}[P(\hat{\eta}_{\lambda}\geq\gamma\,|\,(Z_{i})_{i=1}^{s})]\geq\mathbb{E}\left[\mathds{1}_{A}\left(1-\frac{\mathrm{Var}(\hat{\eta}_{\lambda}|(Z_{i})_{i=1}^{s})}{(\zeta-\gamma)^{2}}\right)\right]$ for any set $A$ (we choose $A$ such that $\|\mathcal{M}\|^{2}_{\mathcal{L}^{\infty}(\mathscr{H})}$ and $\|\mathcal{M}^{-1}\|^{2}_{\mathcal{L}^{\infty}(\mathscr{H})}$ are bounded in probability, where $\mathcal{M}:=\hat{\Sigma}^{-1/2}_{PQ,\lambda}\Sigma^{1/2}_{PQ,\lambda}$), and $\mathbb{E}(\hat{\eta}_{\lambda}|(Z_{i})_{i=1}^{s})=:\zeta=\|g^{1/2}_{\lambda}(\hat{\Sigma}_{PQ})(\mu_{P}-\mu_{Q})\|_{\mathscr{H}}^{2}$.

If $g_{\lambda}$ satisfies $(A_{4})$, then $g_{\lambda}(0)\neq 0$; hence $g^{1/2}_{\lambda}(\hat{\Sigma}_{PQ})$ is invertible. Thus, $\eta_{\lambda}\leq\|g^{1/2}_{\lambda}(\Sigma_{PQ})g^{-1/2}_{\lambda}(\hat{\Sigma}_{PQ})\|_{\mathcal{L}^{\infty}(\mathscr{H})}^{2}\zeta$. Moreover, $(A_{4})$ yields that $\zeta\gtrsim\eta_{\lambda}$ with high probability (see Lemma A.11; Sriperumbudur and Sterge, 2022, Lemma A.1(ii)), which, when combined with Lemma A.7, yields sufficient conditions on the separation boundary to control the power.

(iv) As mentioned above, the spectral cutoff regularizer does not satisfy $(A_{4})$. For $g_{\lambda}$ that does not satisfy $(A_{4})$, an alternative approach can be used to obtain a lower bound on $\zeta$. Observe that $\zeta=\langle\mathfrak{I}(g_{\lambda}(\hat{\Sigma}_{PQ})-g_{\lambda}(\Sigma_{PQ}))\mathfrak{I}^{*}u,u\rangle_{L^{2}(R)}+\eta_{\lambda}$, hence

$$\zeta\geq\eta_{\lambda}-\|\mathfrak{I}(g_{\lambda}(\hat{\Sigma}_{PQ})-g_{\lambda}(\Sigma_{PQ}))\mathfrak{I}^{*}\|_{\mathcal{L}^{\infty}(\mathscr{H})}\|u\|_{L^{2}(R)}^{2}.$$

However, the upper bound that we can achieve on $\|\mathfrak{I}(g_{\lambda}(\hat{\Sigma}_{PQ})-g_{\lambda}(\Sigma_{PQ}))\mathfrak{I}^{*}\|_{\mathcal{L}^{\infty}(\mathscr{H})}$ is worse than the bound on $\|g^{1/2}_{\lambda}(\Sigma_{PQ})g^{-1/2}_{\lambda}(\hat{\Sigma}_{PQ})\|_{\mathcal{L}^{\infty}(\mathscr{H})}^{2}$, eventually leading to a worse separation boundary. Improving such bounds (if possible) is of independent interest and is left for future work.

For the rest of the paper, we make the following assumption.

$(B)$ $\qquad M<N<DM$ for some constant $D\geq 1$.

This assumption keeps the results simple by presenting the separation rate in terms of $N+M$. If it is not satisfied, the analysis can still be carried out but leads to messy calculations, with the separation rate depending on $\min(N,M)$.

Theorem 4.3 (Separation boundary–Oracle).

Suppose $(A_{0})$–$(A_{4})$ and $(B)$ hold. Let $s=d_{1}N=d_{2}M$, $\sup_{(P,Q)\in\mathcal{P}}\|\mathcal{T}^{-\theta}u\|_{L^{2}(R)}<\infty$, and $\|\Sigma_{PQ}\|_{\mathcal{L}^{\infty}(\mathscr{H})}\geq\lambda=d_{\theta}\Delta_{N,M}^{\frac{1}{2\tilde{\theta}}}$, for some constants $0<d_{1},d_{2}<1$ and $d_{\theta}>0$, where $d_{\theta}$ depends on $\theta$. For any $0<\delta\leq 1$, if $N+M\geq\frac{32\kappa d_{1}}{\delta}$ and $\Delta_{N,M}$ satisfies

$$\frac{\Delta_{N,M}^{\frac{2\tilde{\theta}+1}{2\tilde{\theta}}}}{\mathcal{N}_{2}\left(d_{\theta}\Delta_{N,M}^{\frac{1}{2\tilde{\theta}}}\right)}\gtrsim\frac{d_{\theta}^{-1}\delta^{-2}}{(N+M)^{2}},\qquad\frac{\Delta_{N,M}}{\mathcal{N}_{2}\left(d_{\theta}\Delta_{N,M}^{\frac{1}{2\tilde{\theta}}}\right)}\gtrsim\frac{\alpha^{-1/2}+\delta^{-1}}{N+M},$$

and

$$\Delta_{N,M}\geq c_{\theta}\left(\frac{N+M}{\log(N+M)}\right)^{-2\tilde{\theta}},$$

then

$$\inf_{(P,Q)\in\mathcal{P}}P_{H_{1}}\left\{\hat{\eta}_{\lambda}\geq\gamma\right\}\geq 1-2\delta,\qquad(4.7)$$

where $\gamma=\frac{6\sqrt{2}(C_{1}+C_{2})\mathcal{N}_{2}(\lambda)}{\sqrt{\alpha}}\left(\frac{1}{n}+\frac{1}{m}\right)$, $c_{\theta}>0$ is a constant that depends on $\theta$, and $\tilde{\theta}=\min(\theta,\xi)$.

Furthermore, suppose $N+M\geq\max\{32\delta^{-1},e^{d_{1}/272C^{2}}\}$ and $C:=\sup_{i}\|\phi_{i}\|_{\infty}<\infty$. Then (4.7) holds when the above conditions on $\Delta_{N,M}$ are replaced by

$$\frac{\Delta_{N,M}}{\mathcal{N}_{1}\left(d_{\theta}\Delta_{N,M}^{\frac{1}{2\tilde{\theta}}}\right)}\gtrsim\frac{\delta^{-2}}{(N+M)^{2}},\qquad\frac{\Delta_{N,M}}{\mathcal{N}_{2}\left(d_{\theta}\Delta_{N,M}^{\frac{1}{2\tilde{\theta}}}\right)}\gtrsim\frac{\alpha^{-1/2}+\delta^{-1}}{N+M},$$

and

$$\mathcal{N}_{1}\left(d_{\theta}\Delta_{N,M}^{\frac{1}{2\tilde{\theta}}}\right)\lesssim\frac{N+M}{\log(N+M)}.$$

The above result is too general to readily convey the performance of the Oracle test. The following corollaries to Theorem 4.3 investigate the separation boundary of the test under polynomial and exponential decay conditions on the eigenvalues of $\Sigma_{PQ}$.

Corollary 4.4 (Polynomial decay–Oracle).

Suppose $\lambda_{i}\lesssim i^{-\beta}$, $\beta>1$. Then there exists $k_{\theta,\beta}\in\mathbb{N}$ such that for all $N+M\geq k_{\theta,\beta}$ and for any $\delta>0$,

$$\inf_{(P,Q)\in\mathcal{P}}P_{H_{1}}\left\{\hat{\eta}_{\lambda}\geq\gamma\right\}\geq 1-2\delta,$$

when

$$\Delta_{N,M}=\begin{cases}c(\alpha,\delta,\theta)\left(N+M\right)^{\frac{-4\tilde{\theta}\beta}{4\tilde{\theta}\beta+1}},&\tilde{\theta}>\frac{1}{2}-\frac{1}{4\beta}\\ c(\alpha,\delta,\theta)\left(\frac{\log(N+M)}{N+M}\right)^{2\tilde{\theta}},&\tilde{\theta}\leq\frac{1}{2}-\frac{1}{4\beta}\end{cases},$$

with $c(\alpha,\delta,\theta)\gtrsim(\alpha^{-1/2}+\delta^{-2}+d_{3}^{2\tilde{\theta}})$ for some constant $d_{3}>0$. Furthermore, if $\sup_{i}\|\phi_{i}\|_{\infty}<\infty$, then

$$\Delta_{N,M}=\begin{cases}c(\alpha,\delta,\theta,\beta)\left(N+M\right)^{\frac{-4\tilde{\theta}\beta}{4\tilde{\theta}\beta+1}},&\tilde{\theta}>\frac{1}{4\beta}\\ c(\alpha,\delta,\theta,\beta)\left(\frac{\log(N+M)}{N+M}\right)^{2\tilde{\theta}\beta},&\tilde{\theta}\leq\frac{1}{4\beta}\end{cases},$$

where $c(\alpha,\delta,\theta,\beta)\gtrsim(\alpha^{-1/2}+\delta^{-2}+d_{4}^{2\tilde{\theta}\beta})$ for some constant $d_{4}>0$.
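The two regimes of Corollary 4.4 can be tabulated directly. The helper below (an illustration only, dropping the constants $c(\alpha,\delta,\theta)$ and $c(\alpha,\delta,\theta,\beta)$) returns the order of $\Delta_{N,M}$ and confirms that faster eigenvalue decay yields a smaller separation boundary:

```python
import numpy as np

def poly_separation_order(NM, theta_t, beta, bounded_eigenfunctions=False):
    """Order of Delta_{N,M} in Corollary 4.4 (constants dropped).

    theta_t is the effective smoothness theta-tilde = min(theta, xi);
    beta is the polynomial eigenvalue-decay exponent.
    """
    thresh = 1 / (4 * beta) if bounded_eigenfunctions else 0.5 - 1 / (4 * beta)
    if theta_t > thresh:
        return NM ** (-4 * theta_t * beta / (4 * theta_t * beta + 1))
    expo = 2 * theta_t * beta if bounded_eigenfunctions else 2 * theta_t
    return (np.log(NM) / NM) ** expo
```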

Corollary 4.5 (Exponential decay–Oracle).

Suppose $\lambda_{i}\lesssim e^{-\tau i}$, $\tau>0$. Then for any $\delta>0$, there exists $k_{\theta}$ such that for all $N+M\geq k_{\theta}$,

$$\inf_{(P,Q)\in\mathcal{P}}P_{H_{1}}\left\{\hat{\eta}_{\lambda}\geq\gamma\right\}\geq 1-2\delta,$$

when

$$\Delta_{N,M}=\begin{cases}c(\alpha,\delta,\theta)\frac{\sqrt{\log(N+M)}}{N+M},&\tilde{\theta}>\frac{1}{2}\\ c(\alpha,\delta,\theta)\left(\frac{\log(N+M)}{N+M}\right)^{2\tilde{\theta}},&\tilde{\theta}\leq\frac{1}{2}\end{cases},$$

where $c(\alpha,\delta,\theta)\gtrsim\max\left\{\sqrt{\frac{1}{2\tilde{\theta}}},1\right\}(\alpha^{-1/2}+\delta^{-2}+d_{5}^{2\tilde{\theta}})$ for some $d_{5}>0$. Furthermore, if $\sup_{i}\|\phi_{i}\|_{\infty}<\infty$, then

$$\Delta_{N,M}=c(\alpha,\delta,\theta)\frac{\sqrt{\log(N+M)}}{N+M},$$

where $c(\alpha,\delta,\theta)\gtrsim\max\left\{\sqrt{\frac{1}{2\tilde{\theta}}},\frac{1}{2\tilde{\theta}},1\right\}(\alpha^{-1/2}+\delta^{-2})$.
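The piecewise boundary in Corollary 4.5 can likewise be written as a small helper (an illustration only, dropping the constant $c(\alpha,\delta,\theta)$):

```python
import numpy as np

def exp_separation_order(NM, theta_t, bounded_eigenfunctions=False):
    """Order of Delta_{N,M} in Corollary 4.5 (constants dropped)."""
    if bounded_eigenfunctions or theta_t > 0.5:
        return np.sqrt(np.log(NM)) / NM        # near-parametric rate
    return (np.log(NM) / NM) ** (2 * theta_t)  # smoothness-limited rate
```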

Remark 4.5.

(i) For a bounded kernel $K$, note that $\mathcal{T}$ is a trace-class operator. With no further assumption on the decay of $(\lambda_{i})_{i}$, it can be shown that the separation boundary has the same expression as in Corollary 4.4 for $\beta=1$ (see Remark 3.3(i) for the minimax separation). Under additional assumptions on the decay rate, the separation boundary improves (as shown in the above corollaries), unlike the separation boundary in Theorem 3.1, which does not capture the decay rate.

(ii) Suppose $g_{\lambda}$ has infinite qualification, $\xi=\infty$; then $\tilde{\theta}=\theta$. A comparison of Corollaries 4.4 and 3.3 shows that the Oracle test is minimax optimal w.r.t. $\mathcal{P}$ in the ranges of $\theta$ given in Corollary 3.3 if the eigenvalues of $\mathcal{T}$ decay polynomially. Similarly, if the eigenvalues of $\mathcal{T}$ decay exponentially, it follows from Corollaries 4.5 and 3.4 that the Oracle test is minimax optimal w.r.t. $\mathcal{P}$ if $\theta>\frac{1}{2}$ (resp. $\theta>0$ if $\sup_{i}\|\phi_{i}\|_{\infty}<\infty$). Outside these ranges of $\theta$, the optimality of the Oracle test remains an open question since we do not have a minimax separation covering these ranges.

(iii) On the other hand, if $g_{\lambda}$ has finite qualification $\xi$, then the test does not capture the smoothness of $u$ beyond $\xi$, which implies that the test is minimax optimal only for $\theta\leq\xi$. It is therefore important to use spectral regularizers with infinite qualification.

(iv) Note that the splitting choice $s=d_{1}N=d_{2}M$ yields a test statistic whose computational complexity is of the order $(N+M)^{3}$. However, this choice is made only to keep the splitting scheme independent of $\theta$ and $\beta$; in practice, a smaller order of $s$ still performs well (see Section 5). This can be theoretically justified by following the proof of Theorem 4.3 and its application to Corollaries 4.4 and 4.5: for polynomially decaying eigenvalues with $\theta>\frac{1}{2}-\frac{1}{4\beta}$, we can choose $s\asymp(N+M)^{\frac{2\beta}{4\tilde{\theta}\beta+1}}$ and still achieve the same separation boundary (up to a log factor); furthermore, if $\sup_{i}\|\phi_{i}\|_{\infty}<\infty$ and $\theta>\frac{1}{4\beta}$, we can choose $s\asymp(N+M)^{\frac{2}{4\tilde{\theta}\beta+1}}$. Thus, as $\theta$ increases, $s$ can be of much lower order than $N+M$. Similarly, for exponentially decaying eigenvalues with $\theta>\frac{1}{2}$, we can choose $s\asymp(N+M)^{\frac{1}{2\tilde{\theta}}}$ and still achieve the same separation boundary (up to a $\sqrt{\log}$ factor); furthermore, if $\sup_{i}\|\phi_{i}\|_{\infty}<\infty$, then for $\theta\geq\frac{\log(N+M)\log(\log(N+M))}{N+M}$, we can choose $s\asymp\frac{1}{\theta}\log(N+M)\log(\log(N+M))$ and achieve the same separation boundary.

(v) The key intuition behind the uniform boundedness condition, $\sup_{i}\|\phi_{i}\|_{\infty}<\infty$, is that it reduces the variance when applying Chebyshev's (or Bernstein's) inequality, which in turn improves the separation rate, as can be seen in Corollaries 3.3, 3.4, 4.4, and 4.5, wherein the minimax optimal rate is valid over a larger range of $\theta$ than when this assumption is not made.

4.3 Permutation test

In the previous section, we established the minimax optimality w.r.t. $\mathcal{P}$ of the regularized test based on $\hat{\eta}_{\lambda}$. However, this test is not practical because the threshold $\gamma$ depends on $\mathcal{N}_{2}(\lambda)$, which is unknown in practice since we do not know $P$ and $Q$. One way to achieve a more practical threshold is to estimate $\mathcal{N}_{2}(\lambda)$ from data and use the resulting critical value to construct a test. In this section, however, we resort to ideas from permutation testing (Lehmann and Romano, 2006; Pesarin and Salmaso, 2010; Kim et al., 2022) to construct a data-driven threshold. Below, we first introduce the idea of permutation tests, then present a permutation test based on $\hat{\eta}_{\lambda}$, and provide theoretical results showing that such a test still achieves the minimax optimal separation boundary w.r.t. $\mathcal{P}$, in fact with a better dependence on $\alpha$, of the order $\log\frac{1}{\alpha}$ compared to $\frac{1}{\sqrt{\alpha}}$ for the Oracle test.

Recall that our test statistic defined in Section 4.1 involves sample splitting, resulting in three sets of independent samples, $(X_{i})_{i=1}^{n}\stackrel{i.i.d.}{\sim}P$, $(Y_{j})_{j=1}^{m}\stackrel{i.i.d.}{\sim}Q$, and $(Z_{i})_{i=1}^{s}\stackrel{i.i.d.}{\sim}\frac{P+Q}{2}$. Define $(U_{i})_{i=1}^{n}:=(X_{i})_{i=1}^{n}$ and $(U_{n+j})_{j=1}^{m}:=(Y_{j})_{j=1}^{m}$. Let $\Pi_{n+m}$ be the set of all possible permutations of $\{1,\ldots,n+m\}$, and let $\pi\in\Pi_{n+m}$ be a randomly selected permutation from the $D$ possible permutations, where $D:=|\Pi_{n+m}|=(n+m)!$. Define $(X^{\pi}_{i})_{i=1}^{n}:=(U_{\pi(i)})_{i=1}^{n}$ and $(Y^{\pi}_{j})_{j=1}^{m}:=(U_{\pi(n+j)})_{j=1}^{m}$. Let $\hat{\eta}^{\pi}_{\lambda}:=\hat{\eta}_{\lambda}(X^{\pi},Y^{\pi},Z)$ be the statistic based on the permuted samples. Let $(\pi^{i})_{i=1}^{B}$ be $B$ randomly selected permutations from $\Pi_{n+m}$. For simplicity, write $\hat{\eta}^{i}_{\lambda}:=\hat{\eta}^{\pi^{i}}_{\lambda}$ for the statistic based on the samples permuted by $\pi^{i}$. Given the samples $(X_{i})_{i=1}^{n}$, $(Y_{j})_{j=1}^{m}$, and $(Z_{i})_{i=1}^{s}$, define

$$F_{\lambda}(x):=\frac{1}{D}\sum_{\pi\in\Pi_{n+m}}\mathds{1}(\hat{\eta}^{\pi}_{\lambda}\leq x)$$

to be the permutation distribution function, and define

$$q_{1-\alpha}^{\lambda}:=\inf\{q\in\mathbb{R}:F_{\lambda}(q)\geq 1-\alpha\}.$$

Furthermore, we define the empirical permutation distribution function based on $B$ random permutations as

$$\hat{F}^{B}_{\lambda}(x):=\frac{1}{B}\sum_{i=1}^{B}\mathds{1}(\hat{\eta}^{i}_{\lambda}\leq x),$$

and define

$$\hat{q}_{1-\alpha}^{B,\lambda}:=\inf\{q\in\mathbb{R}:\hat{F}^{B}_{\lambda}(q)\geq 1-\alpha\}.$$
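These definitions translate directly into code. The sketch below is generic: `stat(X, Y, Z)` stands for any test statistic (the paper's $\hat{\eta}_{\lambda}$ would be plugged in here; the signature is our own convention), only the pooled samples $(U_{i})$ are permuted, and the empirical quantile implements $\hat{q}_{1-\alpha}^{B,\lambda}=\inf\{q:\hat{F}^{B}_{\lambda}(q)\geq 1-\alpha\}$ over the $B$ permuted values:

```python
import numpy as np

def permutation_quantile(stat, X, Y, Z, B, alpha, seed=None):
    """Empirical quantile q_{1-alpha}^{B} of the permutation distribution
    of stat; only the pooled (X, Y) samples are permuted, Z stays fixed."""
    rng = np.random.default_rng(seed)
    n = len(X)
    U = np.concatenate([X, Y])                 # (U_1, ..., U_{n+m})
    vals = np.empty(B)
    for b in range(B):
        pi = rng.permutation(len(U))
        vals[b] = stat(U[pi[:n]], U[pi[n:]], Z)
    vals.sort()
    # inf{q : F_B(q) >= 1 - alpha} over the B permuted values
    k = max(int(np.ceil((1 - alpha) * B)) - 1, 0)
    return vals[k]

def permutation_test(stat, X, Y, Z, B, alpha, w=0.9, seed=None):
    """Reject H0 iff the unpermuted statistic exceeds q_{1 - w*alpha};
    w = 0.9 is an arbitrary illustrative choice of the parameter w < 1."""
    return stat(X, Y, Z) >= permutation_quantile(stat, X, Y, Z, B, w * alpha, seed)
```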

Based on this notation, the following result presents an $\alpha$-level test with a completely data-driven critical level.

Theorem 4.6 (Critical region–permutation).

For any $0<\alpha\leq 1$ and $0<w+\tilde{w}<1$, if $B\geq\frac{1}{2\tilde{w}^{2}\alpha^{2}}\log\frac{2}{\alpha(1-w-\tilde{w})}$, then

$$P_{H_{0}}\{\hat{\eta}_{\lambda}\geq\hat{q}_{1-w\alpha}^{B,\lambda}\}\leq\alpha.$$

Note that the above result holds for any statistic, not necessarily $\hat{\eta}_{\lambda}$; thus, it does not require any assumption on $g_{\lambda}$, in contrast to Theorem 4.2. This follows from the exchangeability of the samples under $H_{0}$ and the way $q_{1-\alpha}^{\lambda}$ is defined: it is well known that this approach yields an $\alpha$-level test when $q_{1-\alpha}^{\lambda}$ is used as the threshold.

Remark 4.6.

(i) The requirement on $B$ in Theorem 4.6 ensures the proximity of $\hat{q}_{1-\alpha}^{B,\lambda}$ to $q_{1-\alpha}^{\lambda}$ (see Lemma A.14), through an application of the Dvoretzky-Kiefer-Wolfowitz (DKW) inequality (Dvoretzky et al., 1956). The parameters $w$ and $\tilde{w}$ in the statement of Theorem 4.6 appear for technical reasons within the proof of Lemma A.14: the lemma does not directly relate the true quantile $q_{1-\alpha}^{\lambda}$ to the estimated quantile $\hat{q}_{1-\alpha}^{B,\lambda}$; instead, it relates the true quantile, up to an arbitrarily small shift in $\alpha$, to the estimated quantile, hence the inclusion of these parameters. In practice, however, we rely solely on the quantile $\hat{q}_{1-\alpha}^{B,\lambda}$, which is very close to the true quantile $q_{1-\alpha}^{\lambda}$ for sufficiently large $B$.

(ii) Another approach to constructing $\hat{q}_{1-\alpha}^{B,\lambda}$ is to use $\hat{F}^{B}_{\lambda}(x):=\frac{1}{B+1}\sum_{i=0}^{B}\mathds{1}(\hat{\eta}^{i}_{\lambda}\leq x)$ instead of the above definition, where the new definition of $\hat{F}^{B}_{\lambda}$ includes the unpermuted statistic $\hat{\eta}_{\lambda}$ (see Romano and Wolf, 2005, Lemma 1; Albert et al., 2022, Proposition 1). The advantage of this construction is that the Type-I error is at most $\alpha$ for all $B$, i.e., no condition on $B$ is needed. However, a condition on $B$ similar to that in Theorem 4.7 is needed anyway to achieve the required power, so the condition on $B$ in Theorem 4.6 does not impose an additional constraint in the power analysis. In practice, we observed that this alternative construction requires a significantly larger $B$ to achieve similar power, leading to an increased computational requirement. Thus, we use the construction of Theorem 4.6 in our numerical experiments.
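A minimal sketch of this alternative quantile, which pools the unpermuted statistic with the $B$ permuted ones (the helper name is ours):

```python
import numpy as np

def quantile_with_unpermuted(eta_hat, permuted_vals, alpha):
    """Quantile of F(x) = (1/(B+1)) sum_{i=0}^{B} 1(eta_i <= x),
    where eta_0 := eta_hat is the unpermuted statistic."""
    vals = np.sort(np.append(permuted_vals, eta_hat))
    k = max(int(np.ceil((1 - alpha) * len(vals))) - 1, 0)
    return vals[k]
```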

Next, similar to Theorem 4.3, the following result gives general conditions under which the power can be controlled.

Theorem 4.7 (Separation boundary–permutation).

Suppose $(A_{0})$–$(A_{4})$ and $(B)$ hold. Let $s=d_{1}N=d_{2}M$, $\sup_{(P,Q)\in\mathcal{P}}\|\mathcal{T}^{-\theta}u\|_{L^{2}(R)}<\infty$, and $\|\Sigma_{PQ}\|_{\mathcal{L}^{\infty}(\mathscr{H})}\geq\lambda=d_{\theta}\Delta_{N,M}^{\frac{1}{2\tilde{\theta}}}$, for some constants $0<d_{1},d_{2}<1$ and $d_{\theta}>0$, where $d_{\theta}$ depends on $\theta$. For any $0<\delta\leq 1$, if $N+M\geq\max\left\{d_{3}\delta^{-1/2}\log\frac{1}{(w-\tilde{w})\alpha},32\kappa d_{1}\delta^{-1}\right\}$ for some $d_{3}>0$, $B\geq\frac{1}{2\tilde{w}^{2}\alpha^{2}}\log\frac{2}{\delta}$ for any $0<\tilde{w}<w<1$, and $\Delta_{N,M}$ satisfies

$$\frac{\Delta_{N,M}^{\frac{2\tilde{\theta}+1}{2\tilde{\theta}}}}{\mathcal{N}_{2}\left(d_{\theta}\Delta_{N,M}^{\frac{1}{2\tilde{\theta}}}\right)}\gtrsim\frac{d_{\theta}^{-1}(\delta^{-1}\log(1/\tilde{\alpha}))^{2}}{(N+M)^{2}},\qquad\frac{\Delta_{N,M}}{\mathcal{N}_{2}\left(d_{\theta}\Delta_{N,M}^{\frac{1}{2\tilde{\theta}}}\right)}\gtrsim\frac{\delta^{-1}\log(1/\tilde{\alpha})}{N+M},$$
$$\Delta_{N,M}\geq c_{\theta}\left(\frac{N+M}{\log(N+M)}\right)^{-2\tilde{\theta}},$$

then

$$\inf_{(P,Q)\in\mathcal{P}}P_{H_{1}}\left\{\hat{\eta}_{\lambda}\geq\hat{q}_{1-w\alpha}^{B,\lambda}\right\}\geq 1-5\delta,\qquad(4.8)$$

where $c_{\theta}>0$ is a constant that depends on $\theta$, $\tilde{\alpha}=(w-\tilde{w})\alpha$, and $\tilde{\theta}=\min(\theta,\xi)$.

Furthermore, suppose $C:=\sup_{i}\|\phi_{i}\|_{\infty}<\infty$ and $N+M\geq\max\left\{\frac{d_{3}}{\sqrt{\delta}}\log\frac{1}{(w-\tilde{w})\alpha},\frac{32}{\delta},e^{\frac{d_{1}}{272C}}\right\}$. Then (4.8) holds when the above conditions on $\Delta_{N,M}$ are replaced by

$$\frac{\Delta_{N,M}}{\mathcal{N}_{1}\left(d_{\theta}\Delta_{N,M}^{\frac{1}{2\tilde{\theta}}}\right)}\gtrsim\frac{(\delta^{-1}\log(1/\tilde{\alpha}))^{2}}{(N+M)^{2}},\qquad\frac{\Delta_{N,M}}{\mathcal{N}_{2}\left(d_{\theta}\Delta_{N,M}^{\frac{1}{2\tilde{\theta}}}\right)}\gtrsim\frac{\delta^{-1}\log(1/\tilde{\alpha})}{N+M},$$
$$\text{and}\qquad\mathcal{N}_{1}\left(d_{\theta}\Delta_{N,M}^{\frac{1}{2\tilde{\theta}}}\right)\lesssim\frac{N+M}{\log(N+M)}.$$

The following corollaries specialize Theorem 4.7 to the cases of polynomial and exponential decay of the eigenvalues of $\mathcal{T}$.

Corollary 4.8 (Polynomial decay–permutation).

Suppose $\lambda_{i}\lesssim i^{-\beta}$, $\beta>1$. Then there exists $k_{\theta,\beta}\in\mathbb{N}$ such that for all $N+M\geq k_{\theta,\beta}$ and for any $\delta>0$,

$$\inf_{(P,Q)\in\mathcal{P}}P_{H_{1}}\left\{\hat{\eta}_{\lambda}\geq\hat{q}_{1-w\alpha}^{B,\lambda}\right\}\geq 1-5\delta,$$

when

$$\Delta_{N,M}=\begin{cases}c(\alpha,\delta,\theta)\left(N+M\right)^{\frac{-4\tilde{\theta}\beta}{4\tilde{\theta}\beta+1}},&\tilde{\theta}>\frac{1}{2}-\frac{1}{4\beta}\\ c(\alpha,\delta,\theta)\left(\frac{\log(N+M)}{N+M}\right)^{2\tilde{\theta}},&\tilde{\theta}\leq\frac{1}{2}-\frac{1}{4\beta}\end{cases},$$

with $c(\alpha,\delta,\theta)\gtrsim\max\{\delta^{-2}(\log\frac{1}{\alpha})^{2},d_{4}^{2\tilde{\theta}}\}$ for some constant $d_{4}>0$. Furthermore, if $\sup_{i}\|\phi_{i}\|_{\infty}<\infty$, then

$$\Delta_{N,M}=\begin{cases}c(\alpha,\delta,\theta,\beta)\left(N+M\right)^{\frac{-4\tilde{\theta}\beta}{4\tilde{\theta}\beta+1}},&\tilde{\theta}>\frac{1}{4\beta}\\ c(\alpha,\delta,\theta,\beta)\left(\frac{\log(N+M)}{N+M}\right)^{2\tilde{\theta}\beta},&\tilde{\theta}\leq\frac{1}{4\beta}\end{cases},$$

where $c(\alpha,\delta,\theta,\beta)\gtrsim\max\{\delta^{-2}(\log\frac{1}{\alpha})^{2},d_{5}^{2\tilde{\theta}\beta}\}$ for some constant $d_{5}>0$.

Corollary 4.9 (Exponential decay–permutation).

Suppose λieτi\lambda_{i}\lesssim e^{-\tau i}, τ>0\tau>0. Then for any δ>0\delta>0, there exists kθk_{\theta} such that for all N+MkθN+M\geq k_{\theta}, inf(P,Q)𝒫PH1{η^λq^1wαB,λ}15δ\inf_{(P,Q)\in\mathcal{P}}P_{H_{1}}\left\{\hat{\eta}_{\lambda}\geq\hat{q}_{1-w\alpha}^{B,\lambda}\right\}\geq 1-5\delta when

ΔN,M={c(α,δ,θ)log(N+M)N+M,θ~>12c(α,δ,θ)(log(N+M)N+M)2θ~,θ~12,\Delta_{N,M}=\left\{\begin{array}[]{ll}c(\alpha,\delta,\theta)\frac{\sqrt{\log(N+M)}}{N+M},&\ \ \tilde{\theta}>\frac{1}{2}\\ c(\alpha,\delta,\theta)\left(\frac{\log(N+M)}{N+M}\right)^{2\tilde{\theta}},&\ \ \tilde{\theta}\leq\frac{1}{2}\end{array}\right.,

where c(α,δ,θ)max{12θ~,1}max{δ2(log1α)2,d62θ~}c(\alpha,\delta,\theta)\gtrsim\max\left\{\sqrt{\frac{1}{2\tilde{\theta}}},1\right\}\max\left\{\delta^{-2}(\log\frac{1}{\alpha})^{2},d_{6}^{2\tilde{\theta}}\right\} for some constant d6>0d_{6}>0. Furthermore, if supiϕi<\sup_{i}\left\|\phi_{i}\right\|_{\infty}<\infty, then

ΔN,M=c(α,δ,θ)log(N+M)N+M,\Delta_{N,M}=c(\alpha,\delta,\theta)\frac{\sqrt{\log(N+M)}}{N+M},

where c(α,δ,θ)max{12θ~,12θ~,1}δ2(log1α)2.c(\alpha,\delta,\theta)\gtrsim\max\left\{\sqrt{\frac{1}{2\tilde{\theta}}},\frac{1}{2\tilde{\theta}},1\right\}\delta^{-2}(\log\frac{1}{\alpha})^{2}.

These results show that the permutation-based test constructed in Theorem 4.6 is minimax optimal w.r.t. 𝒫\mathcal{P}, matching the rates of the Oracle test with a completely data-driven test threshold. The computational complexity of the test increases to O(Bs3+Bm2+Bn2+Bns2+Bms2)O(Bs^{3}+Bm^{2}+Bn^{2}+Bns^{2}+Bms^{2}), as the test statistic is computed BB times to calculate the threshold q^1αB,λ\hat{q}_{1-\alpha}^{B,\lambda}. However, since the test can be parallelized over the permutations, the computational cost in practice remains that of a single permutation.
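For concreteness, the permutation calibration described above can be sketched as follows. This is a minimal illustration that substitutes a plain (unregularized) MMD-type statistic with a Gaussian kernel for the spectral-regularized statistic η̂_λ; all function names and parameter choices are hypothetical, not the paper's implementation.

```python
import numpy as np

def gaussian_gram(A, B, h):
    """Gaussian kernel Gram matrix exp(-||a - b||^2 / (2h)) between rows of A and B."""
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-sq / (2 * h))

def mmd_stat(Z, n, h):
    """Biased MMD^2 estimate between Z[:n] and Z[n:]."""
    X, Y = Z[:n], Z[n:]
    return (gaussian_gram(X, X, h).mean() + gaussian_gram(Y, Y, h).mean()
            - 2 * gaussian_gram(X, Y, h).mean())

def permutation_test(X, Y, h=1.0, B=250, alpha=0.05, seed=0):
    """Reject H0: P = Q if the statistic exceeds the (1 - alpha)-quantile of the
    statistic recomputed over B random permutations of the pooled sample."""
    rng = np.random.default_rng(seed)
    Z, n = np.vstack([X, Y]), len(X)
    stat = mmd_stat(Z, n, h)
    perm = [mmd_stat(rng.permutation(Z), n, h) for _ in range(B)]
    return stat >= np.quantile(perm, 1 - alpha)

rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, size=(100, 2))
Y = rng.normal(2.0, 1.0, size=(100, 2))   # a well-separated alternative
print(permutation_test(X, Y))             # rejects H0 here
```

Each permuted statistic is independent of the others given the data, which is what makes the embarrassingly parallel evaluation over the B permutations possible.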

4.4 Adaptation

While the permutation test defined in the previous section provides a practical test, the choice of λ\lambda that yields the minimax separation boundary depends on prior knowledge of θ\theta (and of β\beta in the case of polynomially decaying eigenvalues). In this section, we construct a test based on the union (aggregation) of multiple tests, with λ\lambda ranging over a finite set Λ\Lambda, that is guaranteed to be minimax optimal (up to log\log factors) for a wide range of θ\theta (and β\beta in the case of polynomially decaying eigenvalues).

Define Λ:={λL,2λL,,λU},\Lambda:=\{\lambda_{L},2\lambda_{L},...\,,\lambda_{U}\}, where λU=2bλL\lambda_{U}=2^{b}\lambda_{L}, for bb\in\mathbb{N}. Clearly |Λ|=b+1=1+log2λUλL|\Lambda|=b+1=1+\log_{2}\frac{\lambda_{U}}{\lambda_{L}} is the cardinality of Λ\Lambda. Let λ\lambda^{*} be the optimal λ\lambda that yields minimax optimality. The main idea is to choose λL\lambda_{L} and λU\lambda_{U} to ensure that there is an element in Λ\Lambda that is close to λ\lambda^{*} for any θ\theta (and β\beta in case of polynomially decaying eigenvalues). Define v:=sup{xΛ:xλ}v^{*}:=\sup\{x\in\Lambda:x\leq\lambda^{*}\}. Then it is easy to see that for λLλλU\lambda_{L}\leq\lambda^{*}\leq\lambda_{U}, we have λ2vλ\frac{\lambda^{*}}{2}\leq v^{*}\leq\lambda^{*}, in other words vλv^{*}\asymp\lambda^{*}. Thus, vv^{*} is also an optimal choice for λ\lambda that belongs to Λ\Lambda. Motivated by this, in the following, we construct an α\alpha-level test based on the union of the tests over λΛ\lambda\in\Lambda that rejects H0H_{0} if one of the tests rejects H0H_{0}, which is captured by Theorem 4.10. The separation boundary of this test is analyzed in Theorem 4.11 under the polynomial and the exponential decay rates of the eigenvalues of 𝒯\mathcal{T}, showing that the adaptive test achieves the same performance (up to a loglog\log\log factor) as that of the Oracle test, i.e., minimax optimal w.r.t. 𝒫\mathcal{P} over the range of θ\theta mentioned in Corollaries 3.3, 3.4 without requiring the knowledge of λ\lambda^{*}.
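To make the construction of Λ\Lambda concrete, the following sketch (function names hypothetical) builds the geometric grid and checks numerically that v=sup{xΛ:xλ}v^{*}=\sup\{x\in\Lambda:x\leq\lambda^{*}\} is within a factor of two of any λ[λL,λU]\lambda^{*}\in[\lambda_{L},\lambda_{U}]:

```python
import math

def make_grid(lam_L, b):
    """Lambda = {lam_L, 2*lam_L, ..., 2^b * lam_L}, so |Lambda| = b + 1."""
    return [lam_L * 2**j for j in range(b + 1)]

def nearest_below(grid, lam_star):
    """v* = sup{x in Lambda : x <= lam_star}; satisfies lam*/2 <= v* <= lam*."""
    return max(x for x in grid if x <= lam_star)

grid = make_grid(1e-6, 20)                    # lam_U = 2^20 * lam_L
assert len(grid) == 1 + math.log2(grid[-1] / grid[0])   # |Lambda| = 1 + log2(lam_U/lam_L)
v = nearest_below(grid, 3e-4)                 # a hypothetical lam* in [lam_L, lam_U]
assert v <= 3e-4 <= 2 * v                     # v* is within a factor 2 of lam*
```

Since vλv^{*}\asymp\lambda^{*}, running the test at every grid point guarantees that one of the aggregated tests uses a near-optimal regularization parameter.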

Theorem 4.10 (Critical region–adaptation).

For any 0<α10<\alpha\leq 1 and 0<w+w~<10<w+\tilde{w}<1, if B|Λ|22w~2α2log2|Λ|α(1ww~)B\geq\frac{|\Lambda|^{2}}{2\tilde{w}^{2}\alpha^{2}}\log\frac{2|\Lambda|}{\alpha(1-w-\tilde{w})}, then

PH0{λΛη^λq^1wα|Λ|B,λ}α.P_{H_{0}}\left\{\bigcup_{\lambda\in\Lambda}\hat{\eta}_{\lambda}\geq\hat{q}_{1-\frac{w\alpha}{|\Lambda|}}^{B,\lambda}\right\}\leq\alpha.
Theorem 4.11 (Separation boundary–adaptation).

Suppose (A0)(A_{0})(A4)(A_{4}) and (B)(B) hold. Let θ~=min(θ,ξ),\tilde{\theta}=\min(\theta,\xi), s=e1N=e2Ms=e_{1}N=e_{2}M for 0<e1,e2<10<e_{1},e_{2}<1, and supθ>0sup(P,Q)𝒫𝒯θuL2(R)\sup_{\theta>0}\sup_{(P,Q)\in\mathcal{P}}\left\|\mathcal{T}^{-\theta}u\right\|_{L^{2}(R)} <<\infty. Then for any δ>0\delta>0, B|Λ|22w~2α2log2δB\geq\frac{|\Lambda|^{2}}{2\tilde{w}^{2}\alpha^{2}}\log\frac{2}{\delta}, 0<w~<w<10<\tilde{w}<w<1, 0<αe10<\alpha\leq e^{-1}, θl>0\theta_{l}>0, there exists kk such that for all N+MkN+M\geq k, we have

infθ>θlinf(P,Q)𝒫PH1{λΛη^λq^1wα|Λ|B,λ}15δ,\inf_{\theta>\theta_{l}}\inf_{(P,Q)\in\mathcal{P}}P_{H_{1}}\left\{\bigcup_{\lambda\in\Lambda}\hat{\eta}_{\lambda}\geq\hat{q}_{1-\frac{w\alpha}{|\Lambda|}}^{B,\lambda}\right\}\geq 1-5\delta,

provided one of the following cases holds:

(i) λiiβ\lambda_{i}\lesssim i^{-\beta}, 1<β<βu<1<\beta<\beta_{u}<\infty, λL=r1log(N+M)N+M\lambda_{L}=r_{1}\frac{\log(N+M)}{N+M}, λU=r2(log(N+M)N+M)24ξ~+1\lambda_{U}=r_{2}\left(\frac{\log(N+M)}{N+M}\right)^{\frac{2}{4\tilde{\xi}+1}}, for some constants r1,r2>0r_{1},r_{2}>0, where ξ~=max(ξ,14)\tilde{\xi}=\max(\xi,\frac{1}{4}), ΣPQ()λU\left\|\Sigma_{PQ}\right\|_{\mathcal{L}^{\infty}(\mathscr{H})}\geq\lambda_{U}, and

ΔN,M=c(α,δ,θ)max{(loglog(N+M)N+M)4θ~β4θ~β+1,(log(N+M)N+M)2θ~},\Delta_{N,M}=c(\alpha,\delta,\theta)\max\left\{\left(\frac{\log\log(N+M)}{N+M}\right)^{\frac{4\tilde{\theta}\beta}{4\tilde{\theta}\beta+1}},\left(\frac{\log(N+M)}{N+M}\right)^{2\tilde{\theta}}\right\},

with c(α,δ,θ)max{δ2(log1/α)2,d12θ~}c(\alpha,\delta,\theta)\gtrsim\max\left\{\delta^{-2}(\log 1/\alpha)^{2},d_{1}^{2\tilde{\theta}}\right\}, for some constant d1>0.d_{1}>0. Furthermore, if supiϕi<\sup_{i}\left\|\phi_{i}\right\|_{\infty}<\infty, then the above conditions on λL\lambda_{L} and λU\lambda_{U} can be replaced by λL=r3(log(N+M)N+M)βu\lambda_{L}=r_{3}\left(\frac{\log(N+M)}{N+M}\right)^{\beta_{u}}, λU=r4(log(N+M)N+M)24ξ~+1\lambda_{U}=r_{4}\left(\frac{\log(N+M)}{N+M}\right)^{\frac{2}{4\tilde{\xi}+1}}, for some constants r3,r4>0r_{3},r_{4}>0 and

ΔN,M=c(α,δ,θ,β)max{(loglog(N+M)N+M)4θ~β4θ~β+1,(log(N+M)N+M)2θ~β},\Delta_{N,M}=c(\alpha,\delta,\theta,\beta)\max\left\{\left(\frac{\log\log(N+M)}{N+M}\right)^{\frac{4\tilde{\theta}\beta}{4\tilde{\theta}\beta+1}},\left(\frac{\log(N+M)}{N+M}\right)^{2\tilde{\theta}\beta}\right\},

where c(α,δ,θ,β)max{δ2(log1/α)2,d22θ~β}c(\alpha,\delta,\theta,\beta)\gtrsim\max\left\{\delta^{-2}(\log 1/\alpha)^{2},d_{2}^{2\tilde{\theta}\beta}\right\} for some constant d2>0.d_{2}>0.

(ii) λieτi\lambda_{i}\lesssim e^{-\tau i}, τ>0\tau>0, λL=r5log(N+M)N+M\lambda_{L}=r_{5}\frac{\log(N+M)}{N+M}, λU=r6(log(N+M)N+M)1/2ξ\lambda_{U}=r_{6}\left(\frac{\log(N+M)}{N+M}\right)^{1/2\xi} for some r5,r6>0r_{5},r_{6}>0, λUΣPQ()\lambda_{U}\leq\left\|\Sigma_{PQ}\right\|_{\mathcal{L}^{\infty}(\mathscr{H})}, and

ΔN,M=c(α,δ,θ)max{log(N+M)loglog(N+M)N+M,(log(N+M)N+M)2θ~},\Delta_{N,M}=c(\alpha,\delta,\theta)\max\left\{\frac{\sqrt{\log(N+M)}\log\log(N+M)}{N+M},\left(\frac{\log(N+M)}{N+M}\right)^{2\tilde{\theta}}\right\},

where c(α,δ,θ)max{12θ~,1}max{δ2(log1/α)2,d42θ~},c(\alpha,\delta,\theta)\gtrsim\max\left\{\sqrt{\frac{1}{2\tilde{\theta}}},1\right\}\max\left\{\delta^{-2}(\log 1/\alpha)^{2},d_{4}^{2\tilde{\theta}}\right\}, for some constant d4>0.d_{4}>0. Furthermore if supiϕi<\sup_{i}\left\|\phi_{i}\right\|_{\infty}<\infty, then the above conditions on λL\lambda_{L} and λU\lambda_{U} can be replaced by λL=r7(log(N+M)N+M)12θl\lambda_{L}=r_{7}\left(\frac{\log(N+M)}{N+M}\right)^{\frac{1}{2\theta_{l}}}, λU=r8(log(N+M)N+M)12ξ\lambda_{U}=r_{8}\left(\frac{\log(N+M)}{N+M}\right)^{\frac{1}{2\xi}}, for some r7,r8>0r_{7},r_{8}>0 and

ΔN,M=c(α,δ,θ)log(N+M)loglog(N+M)N+M,\Delta_{N,M}=c(\alpha,\delta,\theta)\frac{\sqrt{\log(N+M)}\log\log(N+M)}{N+M},

where c(α,δ,θ)max{12θ~,12θ~,1}δ2(log1/α)2.c(\alpha,\delta,\theta)\gtrsim\max\left\{\sqrt{\frac{1}{2\tilde{\theta}}},\frac{1}{2\tilde{\theta}},1\right\}\delta^{-2}(\log 1/\alpha)^{2}.

It follows from the above result that the set Λ\Lambda, which is defined by λL\lambda_{L} and λU\lambda_{U}, does not depend on any unknown parameters.

Remark 4.7.

Theorem 4.11 shows that the adaptive test achieves the same performance (up to a loglog\log\log factor) as that of the Oracle test, but without prior knowledge of the unknown parameters θ\theta and β\beta. In fact, following the ideas used in the proof of Theorem 3.2 combined with those in Ingster (2000) and Balasubramanian et al. (2021, Theorem 6), it can be shown that an extra factor of loglog(N+M)\sqrt{\log\log(N+M)} is unavoidable in the adaptive minimax separation boundary compared to the non-adaptive case. Thus, our adaptive test is in fact minimax optimal up to a factor of loglog(N+M)\sqrt{\log\log(N+M)}. This gap occurs since the approach we use to bound the threshold q^1α|Λ|B,λ\hat{q}_{1-\frac{\alpha}{|\Lambda|}}^{B,\lambda} relies on a Bernstein-type inequality (see Lemma A.15), which involves a factor of log(|Λ|/α)\log(|\Lambda|/\alpha); since |Λ|log(N+M)|\Lambda|\lesssim\log(N+M), this yields the extra loglog\log\log factor. We expect that this gap can be closed by using a threshold that depends on the asymptotic distribution of η^λ\hat{\eta}_{\lambda} (as was done in Balasubramanian et al., 2021), yielding an asymptotic α\alpha-level test in contrast to the exact α\alpha-level test achieved by the permutation approach.

4.5 Choice of kernel

In the discussion so far, a kernel is first chosen which determines the test statistic, the test, and the set of local alternatives, 𝒫\mathcal{P}. But the key question is what is the right kernel. In fact, this question is the holy grail of all kernel methods.

To this end, we propose to start with a family of kernels, 𝒦\mathcal{K} and construct an adaptive test by taking the union of tests jointly over λΛ\lambda\in\Lambda and K𝒦K\in\mathcal{K} to test H0:P=QH_{0}:P=Q vs. H1:K𝒦θ>0𝒫~,H_{1}:\cup_{K\in\mathcal{K}}\cup_{\theta>0}\tilde{\mathcal{P}}, where

𝒫~:=𝒫~θ,Δ,K:={(P,Q):dPdR1Ran(𝒯Kθ),ρ¯2(P,Q)Δ},\tilde{\mathcal{P}}:=\tilde{\mathcal{P}}_{\theta,\Delta,K}:=\left\{(P,Q):\frac{dP}{dR}-1\in\text{Ran}(\mathcal{T}_{K}^{\theta}),\,\,\underline{\rho}^{2}(P,Q)\geq\Delta\right\},

with 𝒯K\mathcal{T}_{K} being defined similarly to 𝒯\mathcal{T} for K𝒦.K\in\mathcal{K}. Some examples of 𝒦\mathcal{K} include the family of Gaussian kernels indexed by the bandwidth, {exy22/h,x,yd:h(0,)}\{e^{-\|x-y\|^{2}_{2}/h},\,x,y\in\mathbb{R}^{d}:h\in(0,\infty)\}; the family of Laplacian kernels indexed by the bandwidth, {exy1/h,x,yd:h(0,)}\{e^{-\|x-y\|_{1}/h},\,x,y\in\mathbb{R}^{d}:h\in(0,\infty)\}; the family of radial basis functions, {0eσxy22dμ(σ):μM+((0,))}\{\int^{\infty}_{0}e^{-\sigma\|x-y\|^{2}_{2}}\,d\mu(\sigma):\mu\in M^{+}((0,\infty))\}, where M+((0,))M^{+}((0,\infty)) is the family of finite nonnegative measures on (0,)(0,\infty); and a convex combination of base kernels, {i=1λiKi:i=1λi=1,λi0,i[]}\{\sum^{\ell}_{i=1}\lambda_{i}K_{i}:\sum^{\ell}_{i=1}\lambda_{i}=1,\,\lambda_{i}\geq 0,\forall\,i\in[\ell]\}, where (Ki)i=1(K_{i})^{\ell}_{i=1} are base kernels. In fact, any of the kernels in the first three examples can be used as base kernels. The idea of adapting to a family of kernels has been explored in regression and classification settings under the name of multiple-kernel learning, and we refer the reader to Gönen and Alpaydin (2011) and references therein for details.
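The kernel families above can be represented at the Gram-matrix level; the sketch below (function names hypothetical) builds a finite family from Gaussian and Laplacian bandwidth grids plus a convex combination, and checks that each member produces a symmetric positive semi-definite Gram matrix:

```python
import numpy as np

def gaussian(h):
    """Gaussian kernel with bandwidth h, as a Gram-matrix function."""
    def K(X, Y):
        sq = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2 * X @ Y.T
        return np.exp(-sq / h)
    return K

def laplacian(h):
    """Laplacian kernel exp(-||x - y||_1 / h)."""
    def K(X, Y):
        l1 = np.abs(X[:, None, :] - Y[None, :, :]).sum(-1)
        return np.exp(-l1 / h)
    return K

def convex_combo(kernels, weights):
    """sum_i w_i K_i with w_i >= 0 and sum_i w_i = 1 is again a kernel."""
    def K(X, Y):
        return sum(w * k(X, Y) for w, k in zip(weights, kernels))
    return K

# a finite family, so |K| < infinity and the union-of-tests level argument applies
bandwidths = [0.5, 1.0, 2.0, 4.0]
family = [gaussian(h) for h in bandwidths] + [laplacian(h) for h in bandwidths]
family.append(convex_combo(family[:2], [0.5, 0.5]))

X = np.random.default_rng(0).normal(size=(6, 3))
for K in family:
    G = K(X, X)
    assert np.allclose(G, G.T)                      # symmetric
    assert np.all(np.linalg.eigvalsh(G) > -1e-9)    # PSD up to round-off
```

Keeping the bandwidth grids finite is exactly what makes the α\alpha-level argument via the union bound go through.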

Let η^λ,K\hat{\eta}_{\lambda,K} be the test statistic based on kernel KK and regularization parameter λ\lambda. We reject H0H_{0} if η^λ,Kq^1wα|Λ||𝒦|B,λ,K\hat{\eta}_{\lambda,K}\geq\hat{q}_{1-\frac{w\alpha}{|\Lambda||\mathcal{K}|}}^{B,\lambda,K} for any (λ,K)Λ×𝒦(\lambda,K)\in\Lambda\times\mathcal{K}. Similar to Theorem 4.10, it can be shown that this test has level α\alpha if |𝒦|<|\mathcal{K}|<\infty. The requirement of |𝒦|<|\mathcal{K}|<\infty holds if we consider the above-mentioned families with a finite collection of bandwidths in the case of Gaussian and Laplacian kernels, and a finite collection of measures from M+((0,))M^{+}((0,\infty)) in the case of radial basis functions.

Similar to Theorem 4.11, it can be shown that the kernel-adaptive test is minimax optimal w.r.t. 𝒫~\tilde{\mathcal{P}} up to a loglog\log\log factor, with the main difference being an additional factor of log|𝒦|\log|\mathcal{K}|, as illustrated in the next theorem. We do not provide a proof since it is very similar to that of Theorem 4.11, with |Λ||\Lambda| replaced by |Λ||𝒦|.|\Lambda||\mathcal{K}|.

Theorem 4.12 (Separation boundary–adaptation over kernel).

Suppose (A0)(A_{0})(A4)(A_{4}) and (B)(B) hold. Let 𝒜:=log|𝒦|\mathcal{A}:=\log|\mathcal{K}|, θ~=min(θ,ξ),\tilde{\theta}=\min(\theta,\xi), s=e1N=e2Ms=e_{1}N=e_{2}M for 0<e1,e2<10<e_{1},e_{2}<1, and

supK𝒦supθ>0sup(P,Q)𝒫~𝒯KθuL2(R)<.\sup_{K\,\in\mathcal{K}}\sup_{\theta>0}\sup_{(P,Q)\in\tilde{\mathcal{P}}}\left\|\mathcal{T}_{K}^{-\theta}u\right\|_{L^{2}(R)}<\infty.

Then for any δ>0\delta>0, 0<αe10<\alpha\leq e^{-1}, B|Λ|2|𝒦|22w~2α2log2δB\geq\frac{|\Lambda|^{2}|\mathcal{K}|^{2}}{2\tilde{w}^{2}\alpha^{2}}\log\frac{2}{\delta}, 0<w~<w<10<\tilde{w}<w<1, θl>0\theta_{l}>0, there exists kk such that for all N+MkN+M\geq k, we have

infK𝒦infθ>θlinf(P,Q)𝒫~PH1{(λ,K)Λ×𝒦η^λ,Kq^1wα|Λ||𝒦|B,λ,K}15δ,\inf_{K\in\mathcal{K}}\inf_{\theta>\theta_{l}}\inf_{(P,Q)\in\tilde{\mathcal{P}}}P_{H_{1}}\left\{\bigcup_{(\lambda,K)\in\Lambda\times\mathcal{K}}\hat{\eta}_{\lambda,K}\geq\hat{q}_{1-\frac{w\alpha}{|\Lambda||\mathcal{K}|}}^{B,\lambda,K}\right\}\geq 1-5\delta,

provided one of the following cases holds: For any K𝒦K\in\mathcal{K} and (P,Q)𝒫~(P,Q)\in\tilde{\mathcal{P}},
(i) λiiβ\lambda_{i}\lesssim i^{-\beta}, 1<β<βu<1<\beta<\beta_{u}<\infty, λL=r1log(N+M)N+M\lambda_{L}=r_{1}\frac{\log(N+M)}{N+M}, λU=r2(log(N+M)N+M)24ξ~+1\lambda_{U}=r_{2}\left(\frac{\log(N+M)}{N+M}\right)^{\frac{2}{4\tilde{\xi}+1}}, for some constants r1,r2>0r_{1},r_{2}>0, where ξ~=max(ξ,14)\tilde{\xi}=\max(\xi,\frac{1}{4}), ΣPQ()λU\left\|\Sigma_{PQ}\right\|_{\mathcal{L}^{\infty}(\mathscr{H})}\geq\lambda_{U}, and

ΔN,M=c(α,δ,θ)max{(𝒜loglog(N+M)N+M)4θ~β4θ~β+1,(log(N+M)N+M)2θ~},\Delta_{N,M}=c(\alpha,\delta,\theta)\max\left\{\left(\frac{\mathcal{A}\log\log(N+M)}{N+M}\right)^{\frac{4\tilde{\theta}\beta}{4\tilde{\theta}\beta+1}},\left(\frac{\log(N+M)}{N+M}\right)^{2\tilde{\theta}}\right\},

with c(α,δ,θ)max{δ2(log1/α)2,d12θ~}c(\alpha,\delta,\theta)\gtrsim\max\left\{\delta^{-2}(\log 1/\alpha)^{2},d_{1}^{2\tilde{\theta}}\right\}, for some constant d1>0.d_{1}>0. Furthermore, if supiϕi<\sup_{i}\left\|\phi_{i}\right\|_{\infty}<\infty, then the above conditions on λL\lambda_{L} and λU\lambda_{U} can be replaced by λL=r3(log(N+M)N+M)βu\lambda_{L}=r_{3}\left(\frac{\log(N+M)}{N+M}\right)^{\beta_{u}}, λU=r4(log(N+M)N+M)24ξ~+1\lambda_{U}=r_{4}\left(\frac{\log(N+M)}{N+M}\right)^{\frac{2}{4\tilde{\xi}+1}}, for some constants r3,r4>0r_{3},r_{4}>0 and

ΔN,M=c(α,δ,θ,β)max{(𝒜loglog(N+M)N+M)4θ~β4θ~β+1,(log(N+M)N+M)2θ~β},\Delta_{N,M}=c(\alpha,\delta,\theta,\beta)\max\left\{\left(\frac{\mathcal{A}\log\log(N+M)}{N+M}\right)^{\frac{4\tilde{\theta}\beta}{4\tilde{\theta}\beta+1}},\left(\frac{\log(N+M)}{N+M}\right)^{2\tilde{\theta}\beta}\right\},

where c(α,δ,θ,β)max{δ2(log1/α)2,d22θ~β}c(\alpha,\delta,\theta,\beta)\gtrsim\max\left\{\delta^{-2}(\log 1/\alpha)^{2},d_{2}^{2\tilde{\theta}\beta}\right\} for some constant d2>0.d_{2}>0.
(ii) λieτi\lambda_{i}\lesssim e^{-\tau i}, τ>0\tau>0, λL=r5log(N+M)N+M\lambda_{L}=r_{5}\frac{\log(N+M)}{N+M}, λU=r6(log(N+M)N+M)1/2ξ\lambda_{U}=r_{6}\left(\frac{\log(N+M)}{N+M}\right)^{1/2\xi} for some r5,r6>0r_{5},r_{6}>0, λUΣPQ()\lambda_{U}\leq\left\|\Sigma_{PQ}\right\|_{\mathcal{L}^{\infty}(\mathscr{H})}, and

ΔN,M=c(α,δ,θ)max{𝒜log(N+M)loglog(N+M)N+M,(log(N+M)N+M)2θ~},\Delta_{N,M}=c(\alpha,\delta,\theta)\max\left\{\frac{\mathcal{A}\sqrt{\log(N+M)}\log\log(N+M)}{N+M},\left(\frac{\log(N+M)}{N+M}\right)^{2\tilde{\theta}}\right\},

where c(α,δ,θ)max{12θ~,1}max{δ2(log1/α)2,d42θ~},c(\alpha,\delta,\theta)\gtrsim\max\left\{\sqrt{\frac{1}{2\tilde{\theta}}},1\right\}\max\left\{\delta^{-2}(\log 1/\alpha)^{2},d_{4}^{2\tilde{\theta}}\right\}, for some constant d4>0.d_{4}>0. Furthermore if supiϕi<\sup_{i}\left\|\phi_{i}\right\|_{\infty}<\infty, then the above conditions on λL\lambda_{L} and λU\lambda_{U} can be replaced by λL=r7(log(N+M)N+M)12θl\lambda_{L}=r_{7}\left(\frac{\log(N+M)}{N+M}\right)^{\frac{1}{2\theta_{l}}}, λU=r8(log(N+M)N+M)12ξ\lambda_{U}=r_{8}\left(\frac{\log(N+M)}{N+M}\right)^{\frac{1}{2\xi}}, for some r7,r8>0r_{7},r_{8}>0 and

ΔN,M=c(α,δ,θ)𝒜log(N+M)loglog(N+M)N+M,\Delta_{N,M}=c(\alpha,\delta,\theta)\frac{\mathcal{A}\sqrt{\log(N+M)}\log\log(N+M)}{N+M},

where c(α,δ,θ)max{12θ~,12θ~,1}δ2(log1/α)2.c(\alpha,\delta,\theta)\gtrsim\max\left\{\sqrt{\frac{1}{2\tilde{\theta}}},\frac{1}{2\tilde{\theta}},1\right\}\delta^{-2}(\log 1/\alpha)^{2}.

5 Experiments

In this section, we study the empirical performance of the proposed two-sample test by comparing it to the performance of the adaptive MMD test (Schrab et al., 2021), the Energy test (Szekely and Rizzo, 2004) and the Kolmogorov-Smirnov (KS) test (Puritz et al., 2022; Fasano and Franceschini, 1987). The adaptive MMD test of Schrab et al. (2021) uses a translation-invariant kernel on d\mathbb{R}^{d} in D2MMDD^{2}_{\text{MMD}} with bandwidth hh, where the critical value is obtained by a permutation/wild bootstrap. Multiple such tests are constructed over hh and aggregated to achieve adaptivity; the resultant test is referred to as MMDAgg. All tests are repeated 500 times and the average power is reported.

To compare the performance, in the following, we consider different experiments on Euclidean and directional data using the Gaussian kernel, K(x,y)=exp(xy222h)K(x,y)=\exp\left(-\frac{\left\|x-y\right\|_{2}^{2}}{2h}\right) and by setting α=0.05\alpha=0.05, with hh being the bandwidth. For our test, as discussed in Section 4.5, we construct an adaptive test by taking the union of tests jointly over λΛ\lambda\in\Lambda and hWh\in W. Let η^λ,h\hat{\eta}_{\lambda,h} be the test statistic based on λ\lambda and bandwidth hh. We reject H0H_{0} if η^λ,hq^1α|Λ||W|B,λ,h\hat{\eta}_{\lambda,h}\geq\hat{q}_{1-\frac{\alpha}{|\Lambda||W|}}^{B,\lambda,h} for any (λ,h)Λ×W(\lambda,h)\in\Lambda\times W. We performed such a test for Λ:={λL,2λL,,λU},\Lambda:=\{\lambda_{L},2\lambda_{L},...\,,\lambda_{U}\}, and W:={wLhm,2wLhm,,wUhm}W:=\{w_{L}h_{m},2w_{L}h_{m},...\,,w_{U}h_{m}\}, where hm:=median{qq22:q,qXY}h_{m}:=\text{median}\{\left\|q-q^{\prime}\right\|_{2}^{2}:q,q^{\prime}\in X\cup Y\}. In our experiments we set λL=106\lambda_{L}=10^{-6}, λU=5,\lambda_{U}=5, wL=0.01w_{L}=0.01, wU=100w_{U}=100, B=250B=250 for all the experiments. We chose the number of permutations BB to be large enough to ensure the control of Type-I error (see Figure 1). We show the results for both Tikhonov and Showalter regularization and for different choices of the parameter ss, which is the number of samples used to estimate the covariance operator after sample splitting. All codes used for our spectral regularized test are available at https://github.com/OmarHagrass/Spectral-regularized-two-sample-test.
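The aggregation over (λ,h)Λ×W(\lambda,h)\in\Lambda\times W just described can be sketched as below. Here `stat` is a placeholder for the regularized statistic η̂_{λ,h} (its actual computation, involving sample splitting and the regularized covariance operator, is omitted), and the grids mirror the choices λL=106\lambda_{L}=10^{-6}, λU=5\lambda_{U}=5, wL=0.01w_{L}=0.01, wU=100w_{U}=100 above; all names are hypothetical.

```python
import numpy as np
from itertools import product

def median_heuristic(Z):
    """h_m: median of squared pairwise distances over the pooled sample."""
    sq = np.sum(Z**2, 1)[:, None] + np.sum(Z**2, 1)[None, :] - 2 * Z @ Z.T
    return np.median(sq[np.triu_indices_from(sq, k=1)])

def aggregate_test(X, Y, stat, B=250, alpha=0.05, seed=0):
    """Reject H0 if stat(Z, n, lam, h) exceeds its permutation quantile at the
    Bonferroni-corrected level 1 - alpha/(|Lambda||W|) for ANY pair (lam, h)."""
    rng = np.random.default_rng(seed)
    Z, n = np.vstack([X, Y]), len(X)
    hm = median_heuristic(Z)
    lambdas = [1e-6 * 2**j for j in range(30) if 1e-6 * 2**j <= 5.0]
    widths = [0.01 * hm * 2**j for j in range(30) if 0.01 * 2**j <= 100.0]
    perms = [rng.permutation(len(Z)) for _ in range(B)]   # shared across pairs
    level = 1 - alpha / (len(lambdas) * len(widths))
    for lam, h in product(lambdas, widths):
        q = np.quantile([stat(Z[p], n, lam, h) for p in perms], level)
        if stat(Z, n, lam, h) >= q:
            return True
    return False

def toy_stat(Z, n, lam, h):
    """Toy stand-in for the regularized statistic: squared mean difference."""
    return float(np.sum((Z[:n].mean(0) - Z[n:].mean(0))**2))

rng = np.random.default_rng(1)
X = rng.normal(0.0, 1.0, (80, 2))
Y = rng.normal(1.5, 1.0, (80, 2))
print(aggregate_test(X, Y, toy_stat, B=100))
```

Sharing one set of permutations across all (λ,h)(\lambda,h) pairs keeps the permuted statistics coupled, which is also what the theory analyzes.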

Remark 5.1.

(i) For our experimental evaluations of the other tests, we used the following:
For the Energy test, we used the "eqdist.etest" function from the R package "energy" (for code, see https://github.com/mariarizzo/energy) with the parameter R=199R=199 indicating the number of bootstrap replicates. For the KS test, we used the R package "fasano.franceschini.test" (for code, see https://github.com/braunlab-nu/fasano.franceschini.test). For the MMDAgg test, we employed the code provided in Schrab et al. (2021). Since Schrab et al. (2021) presents various versions of the MMDAgg test, we compared our results to the version of MMDAgg that yielded the best performance on the datasets considered in Schrab et al. (2021), which include the MNIST data and the perturbed uniform distribution. For the rest of the experiments, we compared our test to "MMDAgg uniform" with Λ[6,1]\Lambda[-6,1] (see Schrab et al., 2021, for details).

(ii) As mentioned above, in all the experiments, BB is chosen to be 250250. Note that this choice of BB is much smaller than that suggested by Theorems 4.10 and 4.11 to maintain the Type-I error, so one could expect the resulting test to be anti-conservative. However, in this section's experimental settings, we found this choice of BB to yield a test that is neither anti-conservative nor overly conservative. Of course, we acknowledge that a too-small BB would make the test anti-conservative, while a too-large BB would make it conservative, i.e., lose power (see Figure 1, where increasing BB leads to a drop in the Type-I error below 0.050.05 and therefore a potential drop in power). Hence, the choice of BB is critical in any permutation-based test. This phenomenon is attributed to the conservative nature of the union bound used in computing the threshold of the adaptive test. Thus, an intriguing future direction would be to explore methods superior to the union bound to accurately control the Type-I error at the desired level and further enhance the power.

(iii) In an ideal scenario, the choice of BB should depend on α\alpha, as evidenced in the statements of Theorems 4.10 and 4.11. However, utilizing this theoretical bound for the number of permutations would be computationally prohibitive, given the expensive nature of computing each permuted statistic. Exploring various approximation schemes such as random Fourier features (Rahimi and Recht, 2008), Nyström subsampling (e.g., Williams and Seeger, 2001; Drineas and Mahoney, 2005), or sketching (Yang et al., 2017), which are known for expediting kernel methods, could offer more computationally efficient testing approaches and therefore allow BB to be chosen as suggested in Theorems 4.10 and 4.11.

Refer to caption
Figure 1: Type-I error for different numbers of permutations.

5.1 Benchmark datasets

In this section, we investigate the empirical performance of the spectral regularized test and compare it to the other methods.

5.1.1 Gaussian distribution

Let P=N(0,Id)P=N(0,I_{d}) and Q=N(μ,σ2Id)Q=N(\mu,\sigma^{2}I_{d}), where μ0,σ2=1\mu\neq 0,\sigma^{2}=1 or μ=0\mu=0, σ21\sigma^{2}\neq 1, i.e., we test for a Gaussian mean shift or Gaussian covariance scaling, where IdI_{d} is the dd-dimensional identity matrix. Figures 2(a) and 3(a) show the results for the Gaussian mean shift and covariance scaling experiments, where we used s=20s=20 for our test. It can be seen from Figure 2(a) that the best power is obtained by the Energy test, followed closely by the proposed test, with the other tests showing poor performance, particularly in high-dimensional settings. On the other hand, the proposed test performs best in Figure 3(a), closely followed by the Energy test. We also investigated the effect of ss on the test power for the Showalter method (the Tikhonov method enjoys very similar results), whose results are reported in Figures 2(b) and 3(b). We note from these figures that lower values of ss perform slightly better, though overall the performance is not very sensitive to the choice of ss. Based on these results and those presented below, as a practical suggestion, s=(N+M)/20s=(N+M)/20 is a reasonable default choice.

Refer to caption
(a)
Refer to caption
(b)
Figure 2: Power for Gaussian shift experiments with different dd and ss using N=M=200,N=M=200, where the Showalter method is used in (b).
Refer to caption
(a)
Refer to caption
(b)
Figure 3: Power for Gaussian covariance scale experiments with different dd and ss using N=M=200N=M=200, where the Showalter method is used in (b).

5.1.2 Cauchy distribution

In this section, we investigate the performance of the proposed test for heavy-tailed distributions, specifically the Cauchy distribution under median-shift alternatives. In particular, we test samples from a Cauchy distribution with zero median and unit scale against another set of Cauchy samples with various median shifts and unit scale. Figure 4(a) shows that for d{1,10}d\in\{1,10\}, the KS test achieves the highest power for the majority of the considered median shifts, followed closely by our regularized test, which achieves better power for the harder problem when the shift is small. Moreover, for d{20,50}d\in\{20,50\}, our proposed test achieves the highest power. The effect of ss is captured in Figure 4(b).

Refer to caption
(a)
Refer to caption
(b)
Figure 4: Power for Cauchy with median shifts for different ss and dd with N=M=500N=M=500, where the Showalter method is used in (b).

5.1.3 MNIST dataset

Next, we investigate the performance of the regularized test on the MNIST dataset (LeCun et al., 2010), which is a collection of images of the digits 0–9. In our experiments, as in Schrab et al. (2021), the images were downsampled to 7×77\times 7 (i.e., d=49d=49), and we consider 500 samples drawn with replacement from the set PP while testing against the set QiQ_{i} for i=1,2,3,4,5i=1,2,3,4,5, where PP consists of images of the digits

P:0,1,2,3,4,5,6,7,8,9,P:0,1,2,3,4,5,6,7,8,9,

and

Q1:1,3,5,7,9,Q2:0,1,3,5,7,9,Q3:0,1,2,3,5,7,9,Q_{1}:1,3,5,7,9,\qquad Q_{2}:0,1,3,5,7,9,\qquad Q_{3}:0,1,2,3,5,7,9,
Q4:0,1,2,3,4,5,7,9,Q5:0,1,2,3,4,5,6,7,9.Q_{4}:0,1,2,3,4,5,7,9,\qquad Q_{5}:0,1,2,3,4,5,6,7,9.
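The construction of the sets PP and QiQ_{i} above amounts to sampling with replacement from label-restricted pools; a sketch (the `images`/`labels` arrays are random stand-ins for the downsampled MNIST data, not the real dataset):

```python
import numpy as np

rng = np.random.default_rng(0)
# stand-ins for the 7x7 downsampled MNIST images (d = 49) and their labels
images = rng.random((1000, 49))
labels = rng.integers(0, 10, size=1000)

# digit classes making up each Q_i; P uses all ten digits
Q_digits = {1: [1, 3, 5, 7, 9],
            2: [0, 1, 3, 5, 7, 9],
            3: [0, 1, 2, 3, 5, 7, 9],
            4: [0, 1, 2, 3, 4, 5, 7, 9],
            5: [0, 1, 2, 3, 4, 5, 6, 7, 9]}

def draw(digits, n=500):
    """Draw n images with replacement from the classes in `digits`."""
    pool = images[np.isin(labels, digits)]
    return pool[rng.integers(0, len(pool), size=n)]

X = draw(list(range(10)))     # sample from P
Y = draw(Q_digits[3])         # sample from Q_3
assert X.shape == Y.shape == (500, 49)
```

The two samples are then fed to the two-sample test exactly as in the synthetic experiments.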

Figure 5(a) shows the power of our test for both Gaussian and Laplace kernels in comparison to that of MMDAgg and the other tests, demonstrating the superior performance of the regularized test, particularly in the difficult cases, i.e., distinguishing between PP and QiQ_{i} for larger ii. Figure 5(b) shows the effect of ss on the test power, from which we can see that the best performance in this case is achieved for s=50s=50; however, the overall results are not very sensitive to the choice of ss.

Refer to caption
(a)
Refer to caption
(b)
Figure 5: Power for MNIST dataset using N=M=500N=M=500, where the Showalter method is used in (b).

5.1.4 Directional data

In this section, we consider two experiments on directional domains. First, we consider the multivariate von Mises-Fisher distribution (the Gaussian analogue on the unit sphere), given by f(x,μ,k)=kd/21(2π)d/2Id/21(k)exp(kμTx),xSd1,f(x,\mu,k)=\frac{k^{d/2-1}}{(2\pi)^{d/2}I_{d/2-1}(k)}\exp(k\mu^{T}x),\,x\in S^{d-1}, where k0k\geq 0 is the concentration parameter, μ\mu is the mean parameter and II is the modified Bessel function of the first kind (see Figure 6(a)). Figure 7(a) shows the results for testing the von Mises-Fisher distribution against the spherical uniform distribution (k=0k=0) for different concentration parameters using a Gaussian kernel. Note that the theoretical properties of MMDAgg do not hold in this case, unlike those of the proposed test. We can see from Figure 7(a) that the best power is achieved by the Energy test, followed closely by our regularized test. Figure 7(b) shows the effect of ss on the test power of the regularized test.

Second, we consider a mixture of two multivariate Watson distributions (an axially symmetric distribution on the sphere), given by f(x,μ,k)=Γ(d/2)2πd/2M(1/2,d/2,k)exp(k(μTx)2)f(x,\mu,k)=\frac{\Gamma(d/2)}{2\pi^{d/2}M(1/2,d/2,k)}\exp(k(\mu^{T}x)^{2}), xSd1,x\in S^{d-1}, where k0k\geq 0 is the concentration parameter, μ\mu is the mean parameter and MM is Kummer's confluent hypergeometric function. Using equal weights, we drew 500 samples from a mixture of two Watson distributions with the same concentration parameter kk and mean parameters μ1\mu_{1} and μ2\mu_{2}, respectively, where μ1=(1,,1)d\mu_{1}=(1,\dots,1)\in\mathbb{R}^{d} and μ2=(1,1,,1)d\mu_{2}=(-1,1,\dots,1)\in\mathbb{R}^{d}, i.e., μ1\mu_{1} with the first coordinate changed to 1-1 (see Figure 6(b)). Figure 8(a) shows the results for testing against the spherical uniform distribution for different concentration parameters using a Gaussian kernel, and we can see that our regularized test outperforms the other methods. Figure 8(b) shows the effect of different choices of the parameter ss on the test power, which, as in previous scenarios, is not very sensitive to the choice of ss. Moreover, in Figure 8(c) we illustrate how the power changes with increasing sample size for the case d=20d=20 and k=6k=6, which shows that the power of the regularized test converges to one more quickly than that of the other methods.
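Both directional densities above can be evaluated numerically; a sketch assuming SciPy is available (`iv` is the modified Bessel function of the first kind and `hyp1f1` is Kummer's MM), with the sanity check that the Watson density at k=0k=0 reduces to the uniform density on the sphere:

```python
import numpy as np
from math import pi, gamma
from scipy.special import iv, hyp1f1

def vmf_density(x, mu, k):
    """von Mises-Fisher density on S^{d-1} for concentration k > 0."""
    d = len(mu)
    c = k**(d / 2 - 1) / ((2 * pi)**(d / 2) * iv(d / 2 - 1, k))
    return c * np.exp(k * np.dot(mu, x))

def watson_density(x, mu, k):
    """Watson density on S^{d-1}; axially symmetric in mu (mu and -mu agree)."""
    d = len(mu)
    c = gamma(d / 2) / (2 * pi**(d / 2) * hyp1f1(0.5, d / 2, k))
    return c * np.exp(k * np.dot(mu, x)**2)

mu = np.array([0.0, 0.0, 1.0])                 # d = 3, i.e., the sphere S^2
x = np.array([0.0, 1.0, 0.0])                  # a point orthogonal to mu
uniform = gamma(1.5) / (2 * pi**1.5)           # uniform density = 1/(4*pi)
assert abs(watson_density(x, mu, 0.0) - uniform) < 1e-12   # k = 0: uniform
assert vmf_density(mu, mu, 4.0) > vmf_density(x, mu, 4.0)  # vMF mode at mu
```

Sampling from these distributions (e.g., for the experiments above) requires additional machinery such as rejection sampling, which is omitted here.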

Refer to caption
(a)
Refer to caption
(b)
Figure 6: Illustration for 500 samples drawn from directional distributions when d=2d=2. (a) von Mises-Fisher distribution with k=4k=4, μ=(1,1)\mu=(1,1), (b) Mixture of two Watson distributions with k=10k=10 and μ1=(1,1)\mu_{1}=(1,1), μ2=(1,1)\mu_{2}=(-1,1).
Refer to caption
(a)
Refer to caption
(b)
Figure 7: Power for von Mises-Fisher distribution with different concentration parameter kk and ss using N=M=500N=M=500, where the Showalter method is used in (b).
Refer to caption
(a)
Refer to caption
(b)
Refer to caption
(c)
Figure 8: Power for mixture of Watson distributions: (a) for different concentration parameter kk using N=M=500N=M=500, (b) using different choices of ss for Showalter method, and (c) using different samples sizes with d=20d=20, k=6k=6, and with Showalter regularization.

5.2 Perturbed uniform distribution

In this section, we consider a simulated data experiment where we are testing a dd-dimensional uniform distribution against a perturbed version of it. The perturbed density for xdx\in\mathbb{R}^{d} is given by

fw(x):=𝟙[0,1]d(x)+cdP1v{1,,P}dwvi=1dG(Pxivi),f_{w}(x):=\mathds{1}_{[0,1]^{d}}(x)+c_{d}P^{-1}\sum_{v\in\{1,\dots,P\}^{d}}w_{v}\prod_{i=1}^{d}G(Px_{i}-v_{i}),

where w=(wv)v{1,,P}d{1,1}Pdw=(w_{v})_{v\in\{1,\dots,P\}^{d}}\in\{-1,1\}^{P^{d}}, PP represents the number of perturbations being added and for xx\in\mathbb{R},

G(x):=exp(11(4x+3)2)𝟙(1,1/2)(x)exp(11(4x+1)2)𝟙(1/2,0)(x).G(x):=\exp\left(-\frac{1}{1-(4x+3)^{2}}\right)\mathds{1}_{(-1,-1/2)}(x)-\exp\left(-\frac{1}{1-(4x+1)^{2}}\right)\mathds{1}_{(-1/2,0)}(x).
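The bump function GG and the perturbed density fwf_{w} can be implemented directly; a sketch with d=1d=1, P=2P=2 and c1=2.7c_{1}=2.7 as in the experiments (function names hypothetical), checking that fwf_{w} remains a valid density:

```python
import numpy as np

def G(x):
    """Smooth bump pair supported on (-1, 0); integrates to zero."""
    x = np.asarray(x, dtype=float)
    out = np.zeros_like(x)
    m1 = (-1 < x) & (x < -0.5)
    m2 = (-0.5 < x) & (x < 0)
    out[m1] = np.exp(-1.0 / (1.0 - (4 * x[m1] + 3)**2))
    out[m2] = -np.exp(-1.0 / (1.0 - (4 * x[m2] + 1)**2))
    return out

def perturbed_density(x, w, c_d, P):
    """f_w on [0,1]^d: uniform plus P^d signed bumps with signs w."""
    x = np.atleast_2d(x)                       # shape (n, d)
    vals = np.ones(len(x))
    for idx, wv in np.ndenumerate(w):          # w has shape (P,) * d
        v = np.array(idx) + 1                  # v in {1, ..., P}^d
        vals += c_d / P * wv * np.prod(G(P * x - v), axis=1)
    return vals

# d = 1, P = 2 example with c_1 = 2.7 (as used in the paper's experiments)
w = np.array([1.0, -1.0])
xs = np.linspace(0, 1, 1001)[:, None]
f = perturbed_density(xs, w, 2.7, 2)
assert np.all(f >= 0)                          # still a nonnegative density
assert abs(f.mean() - 1.0) < 1e-3              # integrates to approximately 1
```

Since each bump of GG integrates to zero, the perturbations change the shape of the density without changing its total mass, which is what makes the testing problem hard as PP grows.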

As done in Schrab et al. (2021), we construct two-sample tests for d=1d=1 and d=2d=2, wherein we set c1=2.7c_{1}=2.7, c2=7.3c_{2}=7.3. The tests are constructed 500 times, with a new value of w{1,1}Pdw\in\{-1,1\}^{P^{d}} being sampled uniformly each time, and the average power is computed for both our regularized test and MMDAgg. Figure 9(a) shows the power results of our test when d=1d=1 for P=1,2,3,4,5,6P=1,2,3,4,5,6 and when d=2d=2 for P=1,2,3P=1,2,3, for both Gaussian and Laplace kernels, in comparison to that of the other methods, where the Laplace kernel is defined as K(x,y)=exp(xy12h)K(x,y)=\exp\left(\frac{-\left\|x-y\right\|_{1}}{2h}\right) with the bandwidth hh being chosen as median{qq1:q,qXY}\text{median}\{\left\|q-q^{\prime}\right\|_{1}:q,q^{\prime}\in X\cup Y\}. It can be seen in Figure 9(a) that our proposed test performs similarly for both Tikhonov and Showalter regularizations, while significantly improving upon the other methods, particularly in the difficult case of large perturbations (note that large perturbations make distinguishing the uniform distribution from its perturbed version difficult). Moreover, we can also see from Figure 9(b) that the performance of the regularized test is not very sensitive to the choice of ss. Note that, when d=2d=2, it becomes quite hard to distinguish the two samples for perturbations P3P\geq 3 when using a sample size smaller than N=M=2000N=M=2000; thus, we present the results for this choice of sample size in order to compare with the other methods. We also investigated the effect of changing the sample size when d=2d=2 for perturbations P=2,3P=2,3 with different choices of ss, as shown in Figure 10, which again shows the insensitivity of the power to the choice of ss, with the power improving as the sample size increases.

Figure 9: Power for perturbed uniform distributions, for $d=1$, $N=M=500$, and for $d=2$, $N=M=2000$.
Figure 10: Perturbed uniform experiment with varying sample size for $d=2$ with Showalter regularization and $s=100$.
Figure 11: Power with different kernel bandwidths. (a) Perturbed uniform distribution with $d=1$, (b) perturbed uniform distribution with $d=2$, (c) MNIST data using $N=M=500$.
Figure 12: Power for 1-dimensional perturbed uniform distributions and MNIST data using $N+M=2000$.

5.3 Effect of kernel bandwidth

In this section, we investigate the effect of the kernel bandwidth on the performance of the regularized test when no adaptation is done. Figure 11 shows the performance of the test under different choices of bandwidth, wherein we used both fixed bandwidths and multiples of the median bandwidth $h_m$. The results in Figures 11(a,b) are obtained using the perturbed uniform distribution data with $d=1$ and $d=2$, respectively, while Figure 11(c) is obtained using the MNIST data with $N=M=500$, i.e., the same settings as in Sections 5.2 and 5.1.3. We observe from Figures 11(a,b) that, for the Gaussian kernel, the performance is better at smaller bandwidths and deteriorates as the bandwidth gets too large, while for the Laplace kernel, both too-small and too-large bandwidths degrade the performance.

In Figure 11(c), we observe that the performance improves for large bandwidths and deteriorates when the bandwidth gets too small. Moreover, for most choices of the bandwidth, the test based on $\hat{\eta}_\lambda$ still yields non-trivial power as the number of perturbations (or the index of $Q_i$ in the case of the MNIST data) increases, and eventually outperforms the MMDAgg test.

5.4 Unbalanced sample sizes $N$ and $M$

We investigated the performance of the regularized test when $N\neq M$ and report the results in Figure 12 for the 1-dimensional perturbed uniform and MNIST data sets, using the Gaussian kernel and a fixed total sample size $N+M=2000$. The best performance is observed at $N=M=1000$, where we obtain the most representative samples from both distributions $P$ and $Q$; this is also expected theoretically, since otherwise the rates are controlled by $\min(M,N)$.

6 Discussion

To summarize, we have proposed a two-sample test based on spectral regularization that uses not only the mean element information, as the MMD test does, but also the information in the covariance operator, and we showed it to be minimax optimal w.r.t. the class of local alternatives $\mathcal{P}$ defined in (1.4). This test improves upon the MMD test in terms of the separation rate, as the MMD test is shown to be non-optimal w.r.t. $\mathcal{P}$. We also presented a permutation version of the proposed regularized test, along with adaptation over the regularization parameter $\lambda$ and the kernel $K$, so that the resulting test is completely data-driven. Through numerical experiments, we also established the superiority of the proposed method over the MMD variant.

However, several open questions remain that may be of interest to address. (i) The proposed test is computationally intensive, scaling as $O((N+M)s^2)$, where $s$ is the number of samples used to estimate the covariance operator after sample splitting. Various approximation schemes, such as random Fourier features (Rahimi and Recht, 2008), the Nyström method (e.g., Williams and Seeger, 2001; Drineas and Mahoney, 2005), and sketching (Yang et al., 2017), are known to speed up kernel methods, so computationally efficient tests can be constructed using any of these approximations. The question of interest, therefore, concerns the computational vs. statistical trade-off of these approximate tests: are they still minimax optimal w.r.t. $\mathcal{P}$? (ii) The construction of the proposed test statistic requires sample splitting, which allows for a convenient analysis. It is of interest to develop an analysis for the kernel version of Hotelling's $T^2$-statistic (see Harchaoui et al., 2007), which does not require sample splitting. We conjecture that the theoretical results for Hotelling's $T^2$-statistic will be similar to those of this paper; however, it may enjoy better empirical performance, at the cost of higher computational complexity. (iii) The adaptive test presented in Section 4.5 only holds for classes of kernels $\mathcal{K}$ with $|\mathcal{K}|<\infty$. It will be interesting to extend the analysis to $\mathcal{K}$ with $|\mathcal{K}|=\infty$, e.g., the class of Gaussian kernels with bandwidth in $(0,\infty)$, or the class of convex combinations of certain base kernels, as explained in Section 4.5.
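To illustrate point (i), the following is a minimal sketch of one of the approximation schemes mentioned above, random Fourier features for the Gaussian kernel: the inner product of the feature maps approximates the kernel, and a plug-in MMD estimate then costs time linear in $N+M$ for a fixed number of features. This is only an illustrative plug-in estimator, not the regularized statistic studied in this paper, and all function names are ours.

```python
import numpy as np

def rff_features(X, num_features, bandwidth, rng):
    """Random Fourier feature map z(x) with z(x) @ z(y) approximating
    the Gaussian kernel exp(-||x - y||^2 / (2 * bandwidth**2))."""
    d = X.shape[1]
    W = rng.standard_normal((d, num_features)) / bandwidth  # random frequencies
    b = rng.uniform(0.0, 2.0 * np.pi, num_features)         # random phases
    return np.sqrt(2.0 / num_features) * np.cos(X @ W + b)

def mmd2_rff(X, Y, num_features=500, bandwidth=1.0, seed=0):
    """Plug-in MMD^2 estimate via mean feature vectors, in
    O((N + M) * num_features) time instead of quadratic time."""
    rng = np.random.default_rng(seed)
    Z = rff_features(np.vstack([X, Y]), num_features, bandwidth, rng)
    mu_x, mu_y = Z[: len(X)].mean(axis=0), Z[len(X):].mean(axis=0)
    return float(np.sum((mu_x - mu_y) ** 2))
```

Whether such approximate statistics retain minimax optimality w.r.t. $\mathcal{P}$ is precisely the open question raised above.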

7 Proofs

In this section, we present the proofs of all the main results of the paper.

7.1 Proof of Theorem 3.1

Define $a(x)=K(\cdot,x)-\mu_{P}$ and $b(x)=K(\cdot,x)-\mu_{Q}$. Then we have

\begin{align*}
\hat{D}_{\mathrm{MMD}}^{2}&\stackrel{(*)}{=}\frac{1}{N(N-1)}\sum_{i\neq j}\left\langle a(X_{i}),a(X_{j})\right\rangle_{\mathscr{H}}+\frac{1}{M(M-1)}\sum_{i\neq j}\left\langle a(Y_{i}),a(Y_{j})\right\rangle_{\mathscr{H}}-\frac{2}{NM}\sum_{i,j}\left\langle a(X_{i}),a(Y_{j})\right\rangle_{\mathscr{H}}\\
&\stackrel{(\dagger)}{=}\frac{1}{N(N-1)}\sum_{i\neq j}\left\langle a(X_{i}),a(X_{j})\right\rangle_{\mathscr{H}}+\frac{1}{M(M-1)}\sum_{i\neq j}\left\langle b(Y_{i}),b(Y_{j})\right\rangle_{\mathscr{H}}+\frac{2}{M}\sum_{i=1}^{M}\left\langle b(Y_{i}),(\mu_{Q}-\mu_{P})\right\rangle_{\mathscr{H}}\\
&\qquad+D_{\mathrm{MMD}}^{2}-\frac{2}{NM}\sum_{i,j}\left\langle a(X_{i}),b(Y_{j})\right\rangle_{\mathscr{H}}-\frac{2}{N}\sum_{i=1}^{N}\left\langle a(X_{i}),(\mu_{Q}-\mu_{P})\right\rangle_{\mathscr{H}},
\end{align*}

where $(*)$ follows from Lemma A.2 by setting $g_{\lambda}(x)=1$, and $(\dagger)$ follows by writing $a(Y)=b(Y)+(\mu_{Q}-\mu_{P})$ in the last two terms. Thus we have

$$\hat{D}_{\mathrm{MMD}}^{2}-D_{\mathrm{MMD}}^{2}=①+②+③-④-⑤,$$

where

\begin{align*}
①&:=\frac{1}{N(N-1)}\sum_{i\neq j}\left\langle a(X_{i}),a(X_{j})\right\rangle_{\mathscr{H}},&②&:=\frac{1}{M(M-1)}\sum_{i\neq j}\left\langle b(Y_{i}),b(Y_{j})\right\rangle_{\mathscr{H}},\\
③&:=\frac{2}{M}\sum_{i=1}^{M}\left\langle b(Y_{i}),(\mu_{Q}-\mu_{P})\right\rangle_{\mathscr{H}},&④&:=\frac{2}{NM}\sum_{i,j}\left\langle a(X_{i}),b(Y_{j})\right\rangle_{\mathscr{H}},
\end{align*}

and $⑤:=\frac{2}{N}\sum_{i=1}^{N}\left\langle a(X_{i}),(\mu_{Q}-\mu_{P})\right\rangle_{\mathscr{H}}$. Next, we bound the terms ①–⑤ as follows (similar to the technique in the proofs of Lemmas A.4, A.5, and A.6):

\begin{align*}
\mathbb{E}\left(①^{2}\right)&\leq\frac{4}{N^{2}}\left\|\Sigma_{P}\right\|_{\mathcal{L}^{2}(\mathscr{H})}^{2},&\mathbb{E}\left(②^{2}\right)&\leq\frac{4}{M^{2}}\left\|\Sigma_{Q}\right\|_{\mathcal{L}^{2}(\mathscr{H})}^{2},\\
\mathbb{E}\left(③^{2}\right)&\leq\frac{4}{M}\left\|\Sigma_{Q}\right\|_{\mathcal{L}^{\infty}(\mathscr{H})}\left\|\mu_{P}-\mu_{Q}\right\|_{\mathscr{H}}^{2},&\mathbb{E}\left(⑤^{2}\right)&\leq\frac{4}{N}\left\|\Sigma_{P}\right\|_{\mathcal{L}^{\infty}(\mathscr{H})}\left\|\mu_{P}-\mu_{Q}\right\|_{\mathscr{H}}^{2},\\
\text{and}\quad\mathbb{E}\left(④^{2}\right)&\leq\frac{4}{NM}\left\langle\Sigma_{P},\Sigma_{Q}\right\rangle_{\mathcal{L}^{2}(\mathscr{H})}\leq\frac{4}{NM}\left\|\Sigma_{P}\right\|_{\mathcal{L}^{2}(\mathscr{H})}\left\|\Sigma_{Q}\right\|_{\mathcal{L}^{2}(\mathscr{H})}.
\end{align*}

Combining these bounds with the facts that $\sqrt{ab}\leq\frac{a}{2}+\frac{b}{2}$ for any $a,b\geq 0$, and that $(\sum_{i=1}^{k}a_{i})^{2}\leq k\sum_{i=1}^{k}a_{i}^{2}$ for any $a_{i}\in\mathbb{R}$, $k\in\mathbb{N}$, yields

\begin{equation}
\mathbb{E}\left[\left(\hat{D}_{\mathrm{MMD}}^{2}-D_{\mathrm{MMD}}^{2}\right)^{2}\right]\lesssim\frac{1}{N^{2}}+\frac{1}{M^{2}}+\frac{D_{\mathrm{MMD}}^{2}}{N}+\frac{D_{\mathrm{MMD}}^{2}}{M}\stackrel{(*)}{\lesssim}\frac{1}{(N+M)^{2}}+\frac{D_{\mathrm{MMD}}^{2}}{N+M},\tag{7.1}
\end{equation}

where $(*)$ follows from Lemma A.13.
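The U-statistic $\hat{D}_{\mathrm{MMD}}^{2}$ analyzed above can be computed from precomputed Gram matrices: the off-diagonal means of the within-sample Gram matrices minus twice the mean of the cross Gram matrix. The following is a minimal NumPy sketch (the function name is ours; the kernel is supplied via its Gram matrices):

```python
import numpy as np

def mmd2_unbiased(K_xx, K_yy, K_xy):
    """Unbiased MMD^2 estimator from Gram matrices K_xx (N x N),
    K_yy (M x M), and K_xy (N x M): averages the off-diagonal
    entries within each sample and subtracts twice the cross mean."""
    N, M = K_xx.shape[0], K_yy.shape[0]
    term_x = (K_xx.sum() - np.trace(K_xx)) / (N * (N - 1))
    term_y = (K_yy.sum() - np.trace(K_yy)) / (M * (M - 1))
    return term_x + term_y - 2.0 * K_xy.mean()
```

Excluding the diagonal terms is what makes the estimator unbiased, matching the $i\neq j$ sums in the decomposition above.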

When P=QP=Q, we have ③=⑤=0 and 𝔼(①⋅②)=𝔼(①⋅④)=𝔼(②⋅④)=0. Therefore under H0H_{0},

𝔼[(D^MMD2)2]6ΣP2()2(1N2+1M2)()24κ2(1N2+1M2),\mathbb{E}[(\hat{D}_{\mathrm{MMD}}^{2})^{2}]\leq 6\left\|\Sigma_{P}\right\|_{\mathcal{L}^{2}(\mathscr{H})}^{2}\left(\frac{1}{N^{2}}+\frac{1}{M^{2}}\right)\stackrel{{\scriptstyle(*)}}{{\leq}}24\kappa^{2}\left(\frac{1}{N^{2}}+\frac{1}{M^{2}}\right), (7.2)

where in ()(*) we used ΣP2()24κ2\left\|\Sigma_{P}\right\|_{\mathcal{L}^{2}(\mathscr{H})}^{2}\leq 4\kappa^{2}. Thus using (7.2) and Chebyshev’s inequality yields

PH0{D^MMD2γ1}α,P_{H_{0}}\{\hat{D}_{\mathrm{MMD}}^{2}\geq\gamma_{1}\}\leq\alpha,

where γ1=26κα(1N+1M)\gamma_{1}=\frac{2\sqrt{6}\kappa}{\sqrt{\alpha}}\left(\frac{1}{N}+\frac{1}{M}\right).
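For completeness, the step behind this threshold is Markov's inequality applied to the nonnegative variable (D^MMD2)2(\hat{D}_{\mathrm{MMD}}^{2})^{2} (whose mean under H0H_{0} equals the variance of the unbiased estimator), combined with (7.2) and the elementary bound (1N+1M)21N2+1M2\left(\frac{1}{N}+\frac{1}{M}\right)^{2}\geq\frac{1}{N^{2}}+\frac{1}{M^{2}}:

```latex
P_{H_0}\{\hat{D}_{\mathrm{MMD}}^{2}\geq\gamma_{1}\}
\leq \frac{\mathbb{E}[(\hat{D}_{\mathrm{MMD}}^{2})^{2}]}{\gamma_{1}^{2}}
\leq \frac{24\kappa^{2}\left(N^{-2}+M^{-2}\right)}
{24\kappa^{2}\alpha^{-1}\left(N^{-1}+M^{-1}\right)^{2}}
\leq \alpha.
```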

Moreover using the same exchangeability argument used in the proof of Theorem 4.6, it is easy to show that

PH0{D^MMD2γ2}α,P_{H_{0}}\{\hat{D}_{\mathrm{MMD}}^{2}\geq\gamma_{2}\}\leq\alpha,

where γ2=q1α.\gamma_{2}=q_{1-\alpha}.
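To illustrate how the permutation threshold γ2=q1α\gamma_{2}=q_{1-\alpha} is computed in practice, the following sketch (not from the paper; the Gaussian kernel, function names, and the choice of the biased V-statistic are our own assumptions) estimates MMD2\mathrm{MMD}^{2} from a pooled Gram matrix and takes the empirical (1α)(1-\alpha)-quantile over random relabelings of the pooled sample:

```python
import numpy as np

def mmd2(K, n, m):
    # Biased (V-statistic) estimate of MMD^2 from the (n+m)x(n+m) Gram
    # matrix K: first n rows/columns are the X-sample, last m the Y-sample.
    Kxx = K[:n, :n]
    Kyy = K[n:, n:]
    Kxy = K[:n, n:]
    return Kxx.mean() + Kyy.mean() - 2.0 * Kxy.mean()

def permutation_threshold(K, n, m, alpha=0.05, B=200, seed=0):
    # gamma_2 = q_{1-alpha}: the (1-alpha)-quantile of the permutation
    # distribution, obtained by relabeling the pooled sample B times.
    rng = np.random.default_rng(seed)
    stats = [mmd2(K[np.ix_(p, p)], n, m)
             for p in (rng.permutation(n + m) for _ in range(B))]
    return float(np.quantile(stats, 1.0 - alpha))
```

Since the Gram matrix of the pooled sample is computed once, each permutation only requires re-indexing it, which is what makes this data-driven choice of threshold cheap relative to recomputing the kernel.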
Bounding the power of the tests: For the threshold γ1\gamma_{1}, we can use the bound in (7.1) to bound the power. Let γ3=1δ(N+M)+DMMD2δN+M\gamma_{3}=\frac{1}{\sqrt{\delta}(N+M)}+\frac{\sqrt{D_{\mathrm{MMD}}^{2}}}{\sqrt{\delta}\sqrt{N+M}}, then C~γ3Var(D^MMD2)δ\tilde{C}\gamma_{3}\geq\sqrt{\frac{\text{Var}(\hat{D}_{\mathrm{MMD}}^{2})}{\delta}}, for some constant C~>0\tilde{C}>0. Thus

PH1{D^MMD2γ1}\displaystyle P_{H_{1}}\{\hat{D}_{\mathrm{MMD}}^{2}\geq\gamma_{1}\} ()PH1{D^MMD2DMMD2C~γ3}\displaystyle\stackrel{{\scriptstyle(*)}}{{\geq}}P_{H_{1}}\{\hat{D}_{\mathrm{MMD}}^{2}\geq D_{\mathrm{MMD}}^{2}-\tilde{C}\gamma_{3}\}
PH1{|D^MMD2DMMD2|C~γ3}()1δ,\displaystyle\geq P_{H_{1}}\{|\hat{D}_{\mathrm{MMD}}^{2}-D_{\mathrm{MMD}}^{2}|\leq\tilde{C}\gamma_{3}\}\stackrel{{\scriptstyle(**)}}{{\geq}}1-\delta,

where ()(*) holds when DMMD2γ1+C~γ3D_{\mathrm{MMD}}^{2}\geq\gamma_{1}+\tilde{C}\gamma_{3}, which is implied if DMMD21N+MD_{\mathrm{MMD}}^{2}\gtrsim\frac{1}{N+M}, and ()(**) follows from (7.1) and an application of Chebyshev’s inequality.

For the threshold γ2\gamma_{2}, using a similar approach to the proof of Lemma A.15 we can show that

PH1{γ2γ4}1δ,P_{H_{1}}\left\{\gamma_{2}\leq\gamma_{4}\right\}\geq 1-\delta, (7.3)

where γ4=C1log1αδ(M+N)(1+DMMD2)\gamma_{4}=\frac{C_{1}\log\frac{1}{\alpha}}{\sqrt{\delta}(M+N)}(1+D_{\mathrm{MMD}}^{2}), for some constant C1>0.C_{1}>0. Thus

PH1{D^MMD2γ2}\displaystyle P_{H_{1}}\{\hat{D}_{\mathrm{MMD}}^{2}\geq\gamma_{2}\} ()PH1{{D^MMD2DMMD2C~γ3}{γ2<γ4}}\displaystyle\stackrel{{\scriptstyle(*)}}{{\geq}}P_{H_{1}}\left\{\{\hat{D}_{\mathrm{MMD}}^{2}\geq D_{\mathrm{MMD}}^{2}-\tilde{C}\gamma_{3}\}\cap\{\gamma_{2}<\gamma_{4}\}\right\}
1PH1{|D^MMD2DMMD2|C~γ3}PH1{γ2γ4}()12δ,\displaystyle\geq 1-P_{H_{1}}\{|\hat{D}_{\mathrm{MMD}}^{2}-D_{\mathrm{MMD}}^{2}|\geq\tilde{C}\gamma_{3}\}-P_{H_{1}}\left\{\gamma_{2}\geq\gamma_{4}\right\}\stackrel{{\scriptstyle(**)}}{{\geq}}1-2\delta,

where ()(*) holds when DMMD2γ4+C~γ3D_{\mathrm{MMD}}^{2}\geq\gamma_{4}+\tilde{C}\gamma_{3}, which is implied if N+Mlog(1/α)δN+M\geq\frac{\log(1/\alpha)}{\sqrt{\delta}} and DMMD21N+MD_{\mathrm{MMD}}^{2}\gtrsim\frac{1}{N+M}. ()(**) follows from (7.1) with an application of Chebyshev’s inequality and using (7.3).

Thus for both thresholds γ1\gamma_{1} and γ2\gamma_{2}, the condition to control the power is DMMD21N+MD_{\mathrm{MMD}}^{2}\gtrsim\frac{1}{N+M}, which in turn is implied if uL2(R)2(N+M)2θ2θ+1,\left\|u\right\|_{L^{2}(R)}^{2}\gtrsim(N+M)^{\frac{-2\theta}{2\theta+1}}, where in the last implication we used Lemma A.19. The desired result, therefore, holds by taking infimum over (P,Q)𝒫(P,Q)\in\mathcal{P}.

Finally, we will show that we cannot achieve a rate better than (M+N)2θ2θ+1(M+N)^{\frac{-2\theta}{2\theta+1}} over 𝒫\mathcal{P}. To this end, we will first show that if DMMD2=o((M+N)1)D_{\mathrm{MMD}}^{2}=o\left((M+N)^{-1}\right), then

lim infN,Minf(P,Q)𝒫PH1{D^MMD2γk}<1\liminf_{N,M\to\infty}\inf_{(P,Q)\in\mathcal{P}}P_{H_{1}}\{\hat{D}_{\mathrm{MMD}}^{2}\geq\gamma_{k}\}<1

for k{1,2}k\in\{1,2\}. Gretton et al. (2012, Appendix B.2) show that under H1H_{1},

(M+N)D^MMD2DS+2c(wx1/2zxwy1/2zy)+c2,(M+N)\hat{D}_{\mathrm{MMD}}^{2}\stackrel{{\scriptstyle D}}{{\to}}S+2c\left(w_{x}^{-1/2}z_{x}-w_{y}^{-1/2}z_{y}\right)+c^{2},

where zx𝒩(0,1),zy𝒩(0,1)z_{x}\sim\mathcal{N}(0,1),z_{y}\sim\mathcal{N}(0,1), S:=i=1λi((wx12aiwy12bi)2(wxwy)1),S:=\sum_{i=1}^{\infty}\lambda_{i}\left((w_{x}^{\frac{-1}{2}}a_{i}-w_{y}^{\frac{-1}{2}}b_{i})^{2}-(w_{x}w_{y})^{-1}\right), ai𝒩(0,1),a_{i}\sim\mathcal{N}(0,1), bi𝒩(0,1)b_{i}\sim\mathcal{N}(0,1), wx:=limN,MNN+M,w_{x}:=\lim_{N,M\to\infty}\frac{N}{N+M}, wy:=limN,MMN+Mw_{y}:=\lim_{N,M\to\infty}\frac{M}{N+M}, (λi)i(\lambda_{i})_{i} are the eigenvalues of the operator 𝒯\mathcal{T} and c2=(M+N)DMMD2.c^{2}=(M+N)D_{\mathrm{MMD}}^{2}. If DMMD2=o((M+N)1)D_{\mathrm{MMD}}^{2}=o\left((M+N)^{-1}\right), then c0c\to 0 as N,MN,M\to\infty and (M+N)D^MMD2DS(M+N)\hat{D}_{\mathrm{MMD}}^{2}\stackrel{{\scriptstyle D}}{{\to}}S which is the distribution under H0.H_{0}. Hence for k{1,2}k\in\{1,2\},

PH1{D^MMD2γk}=PH1{(M+N)D^MMD2(M+N)γk}P{Sdk}\displaystyle P_{H_{1}}\{\hat{D}_{\mathrm{MMD}}^{2}\geq\gamma_{k}\}=P_{H_{1}}\left\{(M+N)\hat{D}_{\mathrm{MMD}}^{2}\geq(M+N)\gamma_{k}\right\}\to P\{S\geq d_{k}\}

where dk=limN,M(M+N)γkd_{k}=\lim_{N,M\to\infty}(M+N)\gamma_{k}, thus d1=26κα(1wx+1wy)>0d_{1}=\frac{2\sqrt{6}\kappa}{\sqrt{\alpha}}\left(\frac{1}{w_{x}}+\frac{1}{w_{y}}\right)>0, and d20d_{2}\geq 0 using the symmetry of the permutation distribution. In both cases, by the definition of SS, we can deduce that P{Sdk}<1P\{S\geq d_{k}\}<1, which follows from the fact that SS has a non-zero probability of being negative. Therefore it remains to show that when ΔN,M=o((N+M)2θ2θ+1)\Delta_{N,M}=o\left((N+M)^{\frac{-2\theta}{2\theta+1}}\right), we can construct (P,Q)𝒫(P,Q)\in\mathcal{P} such that DMMD2=o((M+N)1).D_{\mathrm{MMD}}^{2}=o\left((M+N)^{-1}\right). To this end, let RR be a probability measure. Recall that 𝒯=iIλiϕi~L2(R)ϕi~\mathcal{T}=\sum_{i\in I}\lambda_{i}\tilde{\phi_{i}}\otimes_{L^{2}(R)}\tilde{\phi_{i}}. Let ϕi¯=ϕi𝔼Rϕi\bar{\phi_{i}}=\phi_{i}-\mathbb{E}_{R}\phi_{i}, where ϕi=ϕi~λi\phi_{i}=\frac{\mathfrak{I}^{*}\tilde{\phi_{i}}}{\lambda_{i}}. Then ϕi¯=ϕi=𝒯ϕ~iλi=ϕi~.\mathfrak{I}\bar{\phi_{i}}=\mathfrak{I}\phi_{i}=\frac{\mathcal{T}\tilde{\phi}_{i}}{\lambda_{i}}=\tilde{\phi_{i}}. Suppose λi=h(i)\lambda_{i}=h(i), where hh is a strictly decreasing continuous function on \mathbb{N} (for example, h(i)=iβh(i)=i^{-\beta} and h(i)=eτih(i)=e^{-\tau i} respectively correspond to polynomial and exponential decay). Let L(N+M)=(ΔN,M)1/2θ=o((N+M)12θ+1)L(N+M)=(\Delta_{N,M})^{1/2\theta}=o\left((N+M)^{\frac{-1}{2\theta+1}}\right), k=h1(L(N+M))k=\lfloor h^{-1}\left(L(N+M)\right)\rfloor, hence λk=L(N+M)\lambda_{k}=L(N+M). Define

f:=ϕ¯k.f:=\bar{\phi}_{k}.

Then fL2(R)2=1,\left\|f\right\|_{L^{2}(R)}^{2}=1, and thus fL2(R)f\in L^{2}(R). Define

u~:=𝒯θf=λkθϕ~k,andu:=λkθϕ¯k.\tilde{u}:=\mathcal{T}^{\theta}f=\lambda_{k}^{\theta}\tilde{\phi}_{k},\quad\text{and}\quad u:=\lambda_{k}^{\theta}\bar{\phi}_{k}.

Note that 𝔼Ru=λθk𝔼Rϕk¯=0\mathbb{E}_{R}u=\lambda^{\theta}_{k}\mathbb{E}_{R}\bar{\phi_{k}}=0. Since u=u~\mathfrak{I}u=\tilde{u}, we have u[u~]Ran(𝒯θ),u\in[\tilde{u}]_{\sim}\in\text{Ran}(\mathcal{T}^{\theta}), uL2(R)2=λk2θ=ΔN,M.\left\|u\right\|_{L^{2}(R)}^{2}=\lambda_{k}^{2\theta}=\Delta_{N,M}. Next we bound |u(x)||u(x)| in the following two cases.

Case I: θ12\theta\geq\frac{1}{2} and supiϕi\sup_{i}\left\|\phi_{i}\right\|_{\infty} is not finite.
Note that

|u(x)|=λkθ|k(,x)μR,ϕk|λkθk(,x)μRϕk()2κλkθ12()1,|u(x)|=\lambda_{k}^{\theta}\left|{\left\langle k(\cdot,x)-\mu_{R},\phi_{k}\right\rangle}_{\mathscr{H}}\right|\leq\lambda_{k}^{\theta}\left\|k(\cdot,x)-\mu_{R}\right\|_{\mathscr{H}}\left\|\phi_{k}\right\|_{\mathscr{H}}\stackrel{{\scriptstyle(*)}}{{\leq}}2\sqrt{\kappa}\lambda_{k}^{\theta-\frac{1}{2}}\stackrel{{\scriptstyle({\dagger})}}{{\leq}}1,

where in ()(*) we used ϕk2=λk2ϕ~k,ϕ~k=λk1.\left\|\phi_{k}\right\|^{2}_{\mathscr{H}}=\lambda_{k}^{-2}{\left\langle\mathfrak{I}^{*}\tilde{\phi}_{k},\mathfrak{I}^{*}\tilde{\phi}_{k}\right\rangle}=\lambda_{k}^{-1}. In ()({\dagger}) we used θ>12\theta>\frac{1}{2}, which guarantees 2κλkθ1212\sqrt{\kappa}\lambda_{k}^{\theta-\frac{1}{2}}\leq 1 for N+MN+M large enough.

Case II: supiϕi<\sup_{i}\left\|\phi_{i}\right\|_{\infty}<\infty.
In this case,

|u(x)|2supkϕkλkθ1,|u(x)|\leq 2\sup_{k}\left\|\phi_{k}\right\|_{\infty}\lambda_{k}^{\theta}\leq 1,

for N+MN+M large enough. This implies that in both cases we can find (P,Q)𝒫(P,Q)\in\mathcal{P} such that dPdR=u+1\frac{dP}{dR}=u+1 and dQdR=2dPdR.\frac{dQ}{dR}=2-\frac{dP}{dR}. Then for such (P,Q)(P,Q), we have D2MMD=4𝒯1/2uL2(R)2=4λk2θ+1=o((M+N)1)D^{2}_{\mathrm{MMD}}=4\left\|\mathcal{T}^{1/2}u\right\|_{L^{2}(R)}^{2}=4\lambda_{k}^{2\theta+1}=o\left((M+N)^{-1}\right).
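A short verification of the final rate claim (using u~=λkθϕ~k\tilde{u}=\lambda_{k}^{\theta}\tilde{\phi}_{k} and 𝒯ϕ~k=λkϕ~k\mathcal{T}\tilde{\phi}_{k}=\lambda_{k}\tilde{\phi}_{k}):

```latex
D_{\mathrm{MMD}}^{2}
=4\left\|\mathcal{T}^{1/2}\tilde{u}\right\|_{L^{2}(R)}^{2}
=4\lambda_{k}^{2\theta}\left\|\mathcal{T}^{1/2}\tilde{\phi}_{k}\right\|_{L^{2}(R)}^{2}
=4\lambda_{k}^{2\theta+1}
=4\,L(N+M)^{2\theta+1}
=o\!\left((M+N)^{-1}\right),
```

where the last equality uses L(N+M)=o((N+M)12θ+1)L(N+M)=o\left((N+M)^{\frac{-1}{2\theta+1}}\right).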

7.2 Proof of Theorem 3.2

Let ϕ(X1,,XN,Y1,,YM)\phi(X_{1},\dots,X_{N},Y_{1},\dots,Y_{M}) be any test that rejects H0H_{0} when ϕ=1\phi=1 and fails to reject when ϕ=0\phi=0. Fix some probability measure RR, and let {(Pk,Qk)}Jk=1𝒫\{(P_{k},Q_{k})\}^{J}_{k=1}\subset\mathcal{P} be such that Pk+Qk=2RP_{k}+Q_{k}=2R. First, we prove the following lemma.

Lemma 7.1.

For 0<δ<1α0<\delta<1-\alpha, if the following hold,

𝔼RN[(1Jk=1JdPkNdRN)2]1+(1αδ)2,\mathbb{E}_{R^{N}}\left[\left(\frac{1}{J}\sum_{k=1}^{J}\frac{dP_{k}^{N}}{dR^{N}}\right)^{2}\right]\leq 1+(1-\alpha-\delta)^{2}, (7.4)
𝔼RN[(1Jk=1JdQkNdRN)2]1+(1αδ)2,\mathbb{E}_{R^{N}}\left[\left(\frac{1}{J}\sum_{k=1}^{J}\frac{dQ_{k}^{N}}{dR^{N}}\right)^{2}\right]\leq 1+(1-\alpha-\delta)^{2}, (7.5)

then

RΔN,Mδ.R^{*}_{\Delta_{N,M}}\geq\delta.
Proof.

For any two-sample test ϕ(X1,,XN,Y1,,YM)ΦN,M,α\phi(X_{1},\dots,X_{N},Y_{1},\dots,Y_{M})\in\Phi_{N,M,\alpha}, define a corresponding one sample test as ΨQ(X1,,XN):=𝔼QMϕ(X1,,XN,Y1,,YM),\Psi_{Q}(X_{1},\dots,X_{N}):=\mathbb{E}_{Q^{M}}\phi(X_{1},\dots,X_{N},Y_{1},\dots,Y_{M}), and note that it is still an α\alpha-level test since QN({ΨQ=1})=𝔼QN[ΨQ]=𝔼QN×QM[ϕ]αQ^{N}(\{\Psi_{Q}=1\})=\mathbb{E}_{Q^{N}}[\Psi_{Q}]=\mathbb{E}_{Q^{N}\times Q^{M}}[\phi]\leq\alpha. For any ϕΦN,M,α\phi\in\Phi_{N,M,\alpha}, we have

RΔN,M(ϕ)=sup(P,Q)𝒫𝔼PN×QM[1ϕ]\displaystyle R_{\Delta_{N,M}}(\phi)=\sup_{(P,Q)\in\mathcal{P}}\mathbb{E}_{P^{N}\times Q^{M}}[1-\phi]
sup(P,Q)𝒫:P+Q=2R𝔼PN×QM[1ϕ]1Jk=1J𝔼PkN[1ΨQk]\displaystyle\geq\sup_{(P,Q)\in\mathcal{P}\ :\ P+Q=2R}\mathbb{E}_{P^{N}\times Q^{M}}[1-\phi]\geq\frac{1}{J}\sum_{k=1}^{J}\mathbb{E}_{P_{k}^{N}}[1-\Psi_{Q_{k}}]
=1Jk=1J[PkN({ΨQk=0})QkN({ΨQk=0})+QkN({ΨQk=0})]\displaystyle=\frac{1}{J}\sum_{k=1}^{J}\left[P_{k}^{N}(\{\Psi_{Q_{k}}=0\})-Q_{k}^{N}(\{\Psi_{Q_{k}}=0\})+Q_{k}^{N}(\{\Psi_{Q_{k}}=0\})\right]
1α+1Jk=1J[PkN({ΨQk=0})QkN({ΨQk=0})]\displaystyle\geq 1-\alpha+\frac{1}{J}\sum_{k=1}^{J}\left[P_{k}^{N}(\{\Psi_{Q_{k}}=0\})-Q_{k}^{N}(\{\Psi_{Q_{k}}=0\})\right]
=1Jk=1J[PkN({ΨQk=0})RN({ΨQk=0})]\displaystyle=\frac{1}{J}\sum_{k=1}^{J}\left[P_{k}^{N}(\{\Psi_{Q_{k}}=0\})-R^{N}(\{\Psi_{Q_{k}}=0\})\right]
+1Jk=1J[RN({ΨQk=0})QkN({ΨQk=0})]+1α\displaystyle\qquad\qquad\qquad+\frac{1}{J}\sum_{k=1}^{J}\left[R^{N}(\{\Psi_{Q_{k}}=0\})-Q_{k}^{N}(\{\Psi_{Q_{k}}=0\})\right]+1-\alpha
1αsup𝒜|1Jk=1J(PkN(𝒜)RN(𝒜))|sup𝒜|1Jk=1J(RN(𝒜)QkN(𝒜))|\displaystyle\geq 1-\alpha-\sup_{\mathcal{A}}\left|\frac{1}{J}\sum_{k=1}^{J}\left(P_{k}^{N}(\mathcal{A})-R^{N}(\mathcal{A})\right)\right|-\sup_{\mathcal{A}}\left|\frac{1}{J}\sum_{k=1}^{J}\left(R^{N}(\mathcal{A})-Q_{k}^{N}(\mathcal{A})\right)\right|
1α12𝔼RN[(1Jk=1JdPkNdRN)2]112𝔼RN[(1Jk=1JdQkNdRN)2]1,\displaystyle\geq 1-\alpha-\frac{1}{2}\sqrt{\mathbb{E}_{R^{N}}\left[\left(\frac{1}{J}\sum_{k=1}^{J}\frac{dP_{k}^{N}}{dR^{N}}\right)^{2}\right]-1}-\frac{1}{2}\sqrt{\mathbb{E}_{R^{N}}\left[\left(\frac{1}{J}\sum_{k=1}^{J}\frac{dQ_{k}^{N}}{dR^{N}}\right)^{2}\right]-1}\ ,

where the last inequality follows by noting that

sup𝒜|1Jk=1J(PkN(𝒜)RN(𝒜))|+sup𝒜|1Jk=1J(RN(𝒜)QkN(𝒜))|\displaystyle\sup_{\mathcal{A}}\left|\frac{1}{J}\sum_{k=1}^{J}\left(P_{k}^{N}(\mathcal{A})-R^{N}(\mathcal{A})\right)\right|+\sup_{\mathcal{A}}\left|\frac{1}{J}\sum_{k=1}^{J}\left(R^{N}(\mathcal{A})-Q_{k}^{N}(\mathcal{A})\right)\right|
=()121Jk=1JdPkNdRN1+12dRN1Jk=1JdQkN1\displaystyle\stackrel{{\scriptstyle({\dagger})}}{{=}}\frac{1}{2}\left\|\frac{1}{J}\sum_{k=1}^{J}dP_{k}^{N}-dR^{N}\right\|_{1}+\frac{1}{2}\left\|dR^{N}-\frac{1}{J}\sum_{k=1}^{J}dQ_{k}^{N}\right\|_{1}
=12𝔼RN|1Jk=1JdPkNdRN1|+12𝔼RN|11Jk=1JdQkNdRN|\displaystyle=\frac{1}{2}\mathbb{E}_{R^{N}}\left|\frac{1}{J}\sum_{k=1}^{J}\frac{dP_{k}^{N}}{dR^{N}}-1\right|+\frac{1}{2}\mathbb{E}_{R^{N}}\left|1-\frac{1}{J}\sum_{k=1}^{J}\frac{dQ_{k}^{N}}{dR^{N}}\right|
12𝔼RN[(1Jk=1JdPkNdRN)2]1+12𝔼RN[(1Jk=1JdQkNdRN)2]1,\displaystyle\leq\frac{1}{2}\sqrt{\mathbb{E}_{R^{N}}\left[\left(\frac{1}{J}\sum_{k=1}^{J}\frac{dP_{k}^{N}}{dR^{N}}\right)^{2}\right]-1}+\frac{1}{2}\sqrt{\mathbb{E}_{R^{N}}\left[\left(\frac{1}{J}\sum_{k=1}^{J}\frac{dQ_{k}^{N}}{dR^{N}}\right)^{2}\right]-1},

where ()({\dagger}) uses the alternative definition of total variation distance using the L1L_{1}-distance. Thus taking the infimum over ϕΦN,M,α\phi\in\Phi_{N,M,\alpha} yields

RΔN,M1α12𝔼RN[(1Jk=1JdPkNdRN)2]112𝔼RN[(1Jk=1JdQkNdRN)2]1\displaystyle R^{*}_{\Delta_{N,M}}\geq 1-\alpha-\frac{1}{2}\sqrt{\mathbb{E}_{R^{N}}\left[\left(\frac{1}{J}\sum_{k=1}^{J}\frac{dP_{k}^{N}}{dR^{N}}\right)^{2}\right]-1}-\frac{1}{2}\sqrt{\mathbb{E}_{R^{N}}\left[\left(\frac{1}{J}\sum_{k=1}^{J}\frac{dQ_{k}^{N}}{dR^{N}}\right)^{2}\right]-1}

and the result follows. ∎
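The last inequality in the chain above is the standard second-moment bound: writing Z:=1Jk=1JdPkNdRNZ:=\frac{1}{J}\sum_{k=1}^{J}\frac{dP_{k}^{N}}{dR^{N}}, we have 𝔼RN[Z]=1\mathbb{E}_{R^{N}}[Z]=1 since each PkNP_{k}^{N} is a probability measure, so by the Cauchy–Schwarz inequality,

```latex
\mathbb{E}_{R^{N}}\left|Z-1\right|
\leq\sqrt{\mathbb{E}_{R^{N}}\left[(Z-1)^{2}\right]}
=\sqrt{\mathbb{E}_{R^{N}}\left[Z^{2}\right]-1},
```

and similarly for the QkNQ_{k}^{N}-average.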

Thus, in order to show that a separation boundary ΔN,M\Delta_{N,M} will imply that RΔN,MδR^{*}_{\Delta_{N,M}}\geq\delta, it is sufficient to find a set of distributions {(Pk,Qk)}k\{(P_{k},Q_{k})\}_{k} such that (7.4) and (7.5) hold. Note that since Pk+Qk=2RP_{k}+Q_{k}=2R, it is clear that the operator 𝒯\mathcal{T} is fixed for all k{1,,J}k\in\{1,\dots,J\}. Recall 𝒯=iIλiϕi~L2(R)ϕi~\mathcal{T}=\sum_{i\in I}\lambda_{i}\tilde{\phi_{i}}\otimes_{L^{2}(R)}\tilde{\phi_{i}}. Let ϕi¯=ϕi𝔼Rϕi\bar{\phi_{i}}=\phi_{i}-\mathbb{E}_{R}\phi_{i}, where ϕi=ϕi~λi\phi_{i}=\frac{\mathfrak{I}^{*}\tilde{\phi_{i}}}{\lambda_{i}}. Then ϕi¯=ϕi=𝒯ϕ~iλi=ϕi~\mathfrak{I}\bar{\phi_{i}}=\mathfrak{I}\phi_{i}=\frac{\mathcal{T}\tilde{\phi}_{i}}{\lambda_{i}}=\tilde{\phi_{i}}. Furthermore, since the lower bound on RΔN,MR^{*}_{\Delta_{N,M}} is only in terms of NN, for simplicity, we will write ΔN,M\Delta_{N,M} as ΔN\Delta_{N}. However, since we assume MNDMM\leq N\leq DM, we can write the resulting bounds in terms of (N+M)(N+M) using Lemma A.13.

Following the ideas from Ingster (1987) and Ingster (1993), we now provide the proof of Theorem 3.2.

Case I: supiϕi<D<\sup_{i}\left\|\phi_{i}\right\|_{\infty}<D<\infty.

Let

BN=min{L1(ΔN1/2θ),1(2D)4ΔN2},B_{N}=\min\left\{L^{-1}\left(\Delta_{N}^{1/2\theta}\right),\frac{1}{(2D)^{4}\Delta_{N}^{2}}\right\},

CN=BNC_{N}=\lfloor\sqrt{B_{N}}\rfloor and aN=ΔNCNa_{N}=\sqrt{\frac{\Delta_{N}}{C_{N}}}. For k{1,,J}k\in\{1,\dots,J\}, define

fN,k:=aNi=1BNεkiλiθϕi¯,f_{N,k}:=a_{N}\sum_{i=1}^{B_{N}}\varepsilon_{ki}\lambda_{i}^{-\theta}\bar{\phi_{i}},

where εk:={εk1,εk2,,εkBN}{0,1}BN\varepsilon_{k}:=\{\varepsilon_{k1},\varepsilon_{k2},\ldots,\varepsilon_{kB_{N}}\}\in\{0,1\}^{B_{N}} such that i=1BNεki=CN\sum_{i=1}^{B_{N}}\varepsilon_{ki}=C_{N}, thus J=(BNCN)J={B_{N}\choose C_{N}}. Then we have fN,k2L2(R)=aN2i=1BNλi2θεki2aN2(L(BN))2θCN()(ΔN1/2θ)2θΔN1\left\|f_{N,k}\right\|^{2}_{L^{2}(R)}=a_{N}^{2}\sum_{i=1}^{B_{N}}\lambda_{i}^{-2\theta}\varepsilon_{ki}^{2}\lesssim a_{N}^{2}(L(B_{N}))^{-2\theta}C_{N}\stackrel{{\scriptstyle(*)}}{{\lesssim}}(\Delta_{N}^{1/2\theta})^{-2\theta}\Delta_{N}\lesssim 1, where in ()(*) we used that L()L(\cdot) is a decreasing function. Thus, fN,kL2(R)f_{N,k}\in L^{2}(R). Define

u~N,k:=𝒯θfN,k=aNi=1BNλiθεkiλiθϕi¯,ϕi~L2(R)ϕi~=aNi=1BNεkiϕi~,\tilde{u}_{N,k}:=\mathcal{T}^{\theta}f_{N,k}=a_{N}\sum_{i=1}^{B_{N}}\lambda_{i}^{\theta}{\left\langle\varepsilon_{ki}\lambda_{i}^{-\theta}\bar{\phi_{i}},\tilde{\phi_{i}}\right\rangle}_{L^{2}(R)}\tilde{\phi_{i}}=a_{N}\sum_{i=1}^{B_{N}}\varepsilon_{ki}\tilde{\phi_{i}},

and

uN,k:=aNi=1BNεkiϕi¯.u_{N,k}:=a_{N}\sum_{i=1}^{B_{N}}\varepsilon_{ki}\bar{\phi_{i}}.

Note that 𝔼RuN,k=aNi=1BNεki𝔼Rϕi¯=0\mathbb{E}_{R}u_{N,k}=a_{N}\sum_{i=1}^{B_{N}}\varepsilon_{ki}\mathbb{E}_{R}\bar{\phi_{i}}=0. Since uN,k=u~N,k\mathfrak{I}u_{N,k}=\tilde{u}_{N,k}, we have uN,k[u~N,k]Ran(𝒯θ)u_{N,k}\in[\tilde{u}_{N,k}]_{\sim}\in\text{Ran}(\mathcal{T}^{\theta}), uN,kL2(R)2=aN2CN=ΔN,\left\|u_{N,k}\right\|_{L^{2}(R)}^{2}=a_{N}^{2}C_{N}=\Delta_{N}, and

|uN,k(x)|\displaystyle|u_{N,k}(x)| 2aNi=1BNεkiϕi2aNCND=2ΔNBN1/4D1.\displaystyle\leq 2a_{N}\sum_{i=1}^{B_{N}}\varepsilon_{ki}\left\|\phi_{i}\right\|_{\infty}\leq 2a_{N}C_{N}D=2\sqrt{\Delta_{N}}B_{N}^{1/4}D\leq 1.

This implies that we can find (Pk,Qk)𝒫(P_{k},Q_{k})\in\mathcal{P} such that dPkdR=uN,k+1\frac{dP_{k}}{dR}=u_{N,k}+1 and dQkdR=2dPkdR\frac{dQ_{k}}{dR}=2-\frac{dP_{k}}{dR}. Thus it remains to verify conditions (7.4) and (7.5) from Lemma 7.1.

For condition (7.4), we have

𝔼RN[(1Jk=1JdPkNdRN)2]=𝔼RN[(1Jk=1Jj=1N(uN,k(Xj)+1))2]\displaystyle\mathbb{E}_{R^{N}}\left[\left(\frac{1}{J}\sum_{k=1}^{J}\frac{dP_{k}^{N}}{dR^{N}}\right)^{2}\right]=\mathbb{E}_{R^{N}}\left[\left(\frac{1}{J}\sum_{k=1}^{J}\prod_{j=1}^{N}(u_{N,k}(X_{j})+1)\right)^{2}\right]
=𝔼RN[1Jk=1Jj=1N(1+aNi=1BNεkiϕi¯(Xj))]2\displaystyle=\mathbb{E}_{R^{N}}\left[\frac{1}{J}\sum_{k=1}^{J}\prod_{j=1}^{N}\left(1+a_{N}\sum_{i=1}^{B_{N}}\varepsilon_{ki}\bar{\phi_{i}}(X_{j})\right)\right]^{2}
=1J2k,k=1Jj=1N(1+aNi=1BNεki𝔼Rϕi¯(Xj)\displaystyle=\frac{1}{J^{2}}\sum_{k,k^{\prime}=1}^{J}\prod_{j=1}^{N}\left(1+a_{N}\sum_{i=1}^{B_{N}}\varepsilon_{ki}\mathbb{E}_{R}\bar{\phi_{i}}(X_{j})\right.
+aNi=1BNεki𝔼Rϕi¯(Xj)+aN2i,l=1BNεkiεkl𝔼R[ϕi¯(Xj)ϕl¯(Xj)])\displaystyle\qquad\qquad\qquad\left.+a_{N}\sum_{i=1}^{B_{N}}\varepsilon_{k^{\prime}i}\mathbb{E}_{R}\bar{\phi_{i}}(X_{j})+a_{N}^{2}\sum_{i,l=1}^{B_{N}}\varepsilon_{ki}\varepsilon_{k^{\prime}l}\mathbb{E}_{R}[\bar{\phi_{i}}(X_{j})\bar{\phi_{l}}(X_{j})]\right)
()1J2k,k=1J(1+aN2i=1BNεkiεki)N,\displaystyle\stackrel{{\scriptstyle({\dagger})}}{{\leq}}\frac{1}{J^{2}}\sum_{k,k^{\prime}=1}^{J}\left(1+a_{N}^{2}\sum_{i=1}^{B_{N}}\varepsilon_{ki}\varepsilon_{k^{\prime}i}\right)^{N},

where ()({\dagger}) follows from the fact that 𝔼R[ϕi¯(X)]=0\mathbb{E}_{R}[\bar{\phi_{i}}(X)]=0 and 𝔼R[ϕi¯(X)ϕl¯(X)]=ϕ¯i,ϕ¯lL2(R)=δil\mathbb{E}_{R}[\bar{\phi_{i}}(X)\bar{\phi_{l}}(X)]=\langle\bar{\phi}_{i},\bar{\phi}_{l}\rangle_{L^{2}(R)}=\delta_{il}. Similarly for (7.5), we have

𝔼RN[(1Ji=1JdQkNdRN)2]1J2k,k=1J(1+aN2i=1BNεkiεki)N,\displaystyle\mathbb{E}_{R^{N}}\left[\left(\frac{1}{J}\sum_{i=1}^{J}\frac{dQ_{k}^{N}}{dR^{N}}\right)^{2}\right]\leq\frac{1}{J^{2}}\sum_{k,k^{\prime}=1}^{J}\left(1+a_{N}^{2}\sum_{i=1}^{B_{N}}\varepsilon_{ki}\varepsilon_{k^{\prime}i}\right)^{N},

where J=(BNCN)J={B_{N}\choose C_{N}}. Thus it is sufficient to show that c(α,δ)\exists c(\alpha,\delta) such that if ΔNc(α,δ)N4θβ4θβ+1\Delta_{N}\leq c(\alpha,\delta)N^{\frac{-4\theta\beta}{4\theta\beta+1}}, then 1J2k,k=1J(1+aN2i=1BNεkiεki)N1+(1αδ)2\frac{1}{J^{2}}\sum_{k,k^{\prime}=1}^{J}(1+a_{N}^{2}\sum_{i=1}^{B_{N}}\varepsilon_{ki}\varepsilon_{k^{\prime}i})^{N}\leq 1+(1-\alpha-\delta)^{2}. To this end, consider

1J2k,k=1J(1+aN2i=1BNεkiεki)N=1J2(BNCN)i=0CN(BNCNCNi)(CNi)(1+aN2i)N\displaystyle\frac{1}{J^{2}}\sum_{k,k^{\prime}=1}^{J}\left(1+a_{N}^{2}\sum_{i=1}^{B_{N}}\varepsilon_{ki}\varepsilon_{k^{\prime}i}\right)^{N}=\frac{1}{J^{2}}{B_{N}\choose C_{N}}\sum_{i=0}^{C_{N}}{B_{N}-C_{N}\choose C_{N}-i}{C_{N}\choose i}(1+a_{N}^{2}i)^{N}
i=0CN(BNCNCNi)(CNi)(BNCN)exp(NaN2i)=i=0CNHii!exp(NaN2i),\displaystyle\leq\sum_{i=0}^{C_{N}}\frac{{B_{N}-C_{N}\choose C_{N}-i}{C_{N}\choose i}}{{B_{N}\choose C_{N}}}\exp(Na_{N}^{2}i)=\sum_{i=0}^{C_{N}}\frac{H_{i}}{i!}\exp(Na_{N}^{2}i),

where Hi:=((BNCN)!CN!)2((CNi)!)2BN!(BN2CN+i)!H_{i}:=\frac{((B_{N}-C_{N})!C_{N}!)^{2}}{((C_{N}-i)!)^{2}B_{N}!(B_{N}-2C_{N}+i)!}. Then, as argued in Ingster (1987), it can be shown that for any r>0r>0, HiHi11+r\frac{H_{i}}{H_{i-1}}\leq 1+r and H0exp(r1)H_{0}\leq\exp(r-1), thus Hi(1+r)ieH_{i}\leq\frac{(1+r)^{i}}{e}, which yields that for any r>0r>0, we have

i=0CNHii!exp(NaN2i)\displaystyle\sum_{i=0}^{C_{N}}\frac{H_{i}}{i!}\exp(Na_{N}^{2}i) exp((1+r)exp(NΔNBN1/2)1).\displaystyle\leq\exp\left((1+r)\exp(N\Delta_{N}B_{N}^{-1/2})-1\right).

Since NΔNc(α,δ)BN1/2,N\Delta_{N}\leq c(\alpha,\delta)B_{N}^{1/2}, we can choose c(α,δ)c(\alpha,\delta) and rr such that

exp((1+r)exp(NΔNBN1/2)1)1+(1αδ)2\exp\left((1+r)\exp(N\Delta_{N}B_{N}^{-1/2})-1\right)\leq 1+(1-\alpha-\delta)^{2}

and therefore both (7.4) and (7.5) hold.

Case II: $\sup_{i}\left\|\phi_{i}\right\|_{\infty}$ is not finite.

Since $\lambda_{i}\asymp L(i)$, there exist constants $\underline{A}>0$ and $\bar{A}>0$ such that $\underline{A}L(i)\leq\lambda_{i}\leq\bar{A}L(i)$. Let

$$B_{N}=\min\left\{L^{-1}\left(\Delta_{N}^{1/2\theta}\right),L^{-1}\left(4\kappa\underline{A}^{-1}\Delta_{N}\right)\right\},$$

$C_{N}=\lfloor\sqrt{B_{N}}\rfloor$ and $a_{N}=\sqrt{\frac{\Delta_{N}}{C_{N}}}$. The proof proceeds similarly to that of Case I by noting that

\begin{align*}
|u_{N,k}(x)| &=\left|{\left\langle k(\cdot,x)-\mu_{R},a_{N}\sum_{i=1}^{B_{N}}\varepsilon_{ki}\phi_{i}\right\rangle}_{\mathscr{H}}\right|\leq\left\|k(\cdot,x)-\mu_{R}\right\|_{\mathscr{H}}\left\|a_{N}\sum_{i=1}^{B_{N}}\varepsilon_{ki}\phi_{i}\right\|_{\mathscr{H}}\\
&\leq 2\sqrt{\kappa}\left\|a_{N}\sum_{i=1}^{B_{N}}\varepsilon_{ki}\phi_{i}\right\|_{\mathscr{H}}\stackrel{(*)}{\leq}1,
\end{align*}

where $(*)$ follows from

\begin{align*}
\left\|a_{N}\sum_{i=1}^{B_{N}}\varepsilon_{ki}\phi_{i}\right\|_{\mathscr{H}}^{2} &=a_{N}^{2}\sum_{i,j=1}^{B_{N}}\varepsilon_{ki}\varepsilon_{kj}{\left\langle\phi_{i},\phi_{j}\right\rangle}_{\mathscr{H}}=a_{N}^{2}\sum_{i,j=1}^{B_{N}}\lambda_{i}^{-1}\varepsilon_{ki}\varepsilon_{kj}{\left\langle\tilde{\phi}_{i},\tilde{\phi}_{j}\right\rangle}_{L^{2}(R)}\\
&=a_{N}^{2}\sum_{i=1}^{B_{N}}\lambda_{i}^{-1}\varepsilon_{ki}^{2}\leq a_{N}^{2}\underline{A}^{-1}(L(B_{N}))^{-1}C_{N}\stackrel{(*)}{\leq}\Delta_{N}\underline{A}^{-1}\frac{1}{4\kappa\underline{A}^{-1}\Delta_{N}}\leq\frac{1}{4\kappa},
\end{align*}

where $(*)$ follows since $L(\cdot)$ is a decreasing function. The rest of the proof proceeds exactly as in Case I.

7.3 Proof of Corollary 3.3

If $\lambda_{i}\asymp i^{-\beta}$, $\beta>1$, then Theorem 3.2 yields that $R^{*}_{\Delta_{N,M}}>\delta$ if

$$(N+M)\Delta_{N,M}\lesssim\sqrt{\min\left\{\left(\Delta_{N,M}^{1/2\theta}\right)^{-1/\beta},\left(\Delta_{N,M}\right)^{-1/\beta}\right\}},$$

which is equivalent to $\Delta_{N,M}\lesssim\min\left\{\left(N+M\right)^{\frac{-4\theta\beta}{1+4\theta\beta}},\left(N+M\right)^{\frac{-2\beta}{1+2\beta}}\right\}$. This implies the following lower bound on the minimax separation:

$$\Delta^{*}_{N,M}\gtrsim\min\left\{\left(N+M\right)^{\frac{-4\theta\beta}{1+4\theta\beta}},\left(N+M\right)^{\frac{-2\beta}{1+2\beta}}\right\}. \tag{7.6}$$
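To make the stated equivalence concrete, solving the first branch of the preceding display for $\Delta_{N,M}$ gives

```latex
(N+M)\Delta_{N,M}\lesssim\sqrt{\left(\Delta_{N,M}^{1/2\theta}\right)^{-1/\beta}}=\Delta_{N,M}^{-\frac{1}{4\theta\beta}}
\;\Longleftrightarrow\;\Delta_{N,M}^{\frac{1+4\theta\beta}{4\theta\beta}}\lesssim(N+M)^{-1}
\;\Longleftrightarrow\;\Delta_{N,M}\lesssim(N+M)^{-\frac{4\theta\beta}{1+4\theta\beta}},
```

and the same computation with $\Delta_{N,M}^{-1/2\beta}$ in place of $\Delta_{N,M}^{-1/4\theta\beta}$ yields the exponent $\frac{2\beta}{1+2\beta}$.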

Observe that $\frac{4\theta\beta}{1+4\theta\beta}\geq\frac{2\beta}{1+2\beta}$ when $\theta\geq\frac{1}{2}$. Matching the upper and lower bounds on $\Delta^{*}_{N,M}$ by combining (7.6) with the result from Corollary 4.4 when $\xi=\infty$ implies that $\Delta^{*}_{N,M}\asymp\left(N+M\right)^{\frac{-4\theta\beta}{1+4\theta\beta}}$ when $\theta>\frac{1}{2}$. Similarly, for the case $\sup_{k}\left\|\phi_{k}\right\|_{\infty}<\infty$, Theorem 3.2 yields that $R^{*}_{\Delta_{N,M}}>\delta$ if

$$(N+M)\Delta_{N,M}\lesssim\sqrt{\min\left\{\left(\Delta_{N,M}^{1/2\theta}\right)^{-1/\beta},\Delta_{N,M}^{-2}\right\}},$$

which implies that

$$\Delta^{*}_{N,M}\gtrsim\min\left\{\left(N+M\right)^{\frac{-4\theta\beta}{1+4\theta\beta}},\left(N+M\right)^{-1/2}\right\}. \tag{7.7}$$

Combining (7.7) with the result from Corollary 4.4 when $\xi=\infty$ then yields that $\Delta^{*}_{N,M}\asymp\left(N+M\right)^{\frac{-4\theta\beta}{1+4\theta\beta}}$ when $\theta>\frac{1}{4\beta}$.

7.4 Proof of Corollary 3.4

If $\lambda_{i}\asymp e^{-\tau i}$, then Theorem 3.2 yields that $R^{*}_{\Delta_{N,M}}>\delta$ if

$$(N+M)\Delta_{N,M}\lesssim\sqrt{\log\left(\Delta_{N,M}^{-1}\right)},$$

which is implied if

$$\Delta_{N,M}\lesssim\frac{(\log(N+M))^{b}}{N+M},$$

for any $b<\frac{1}{2}$ and $N+M$ large enough. Furthermore, if $\sup_{k}\left\|\phi_{k}\right\|_{\infty}<\infty$, Theorem 3.2 yields that

$$(N+M)\Delta_{N,M}\lesssim\sqrt{\min\left\{\log\left(\Delta_{N,M}^{-1}\right),\Delta_{N,M}^{-2}\right\}},$$

which is implied if $\Delta_{N,M}\lesssim\min\left\{\frac{(\log(N+M))^{b}}{N+M},(N+M)^{-1/2}\right\}=\frac{(\log(N+M))^{b}}{N+M}$ for any $b<\frac{1}{2}$ and $N+M$ large enough. Thus, in both cases, we can deduce that $\Delta^{*}_{N,M}\gtrsim\frac{(\log(N+M))^{b}}{N+M}$ for any $b<\frac{1}{2}$; taking the supremum over $b<\frac{1}{2}$ then yields

$$\Delta^{*}_{N,M}\gtrsim\frac{\sqrt{\log(N+M)}}{N+M}. \tag{7.8}$$
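To check the claimed substitution directly, take $\Delta_{N,M}=(\log(N+M))^{b}/(N+M)$; then

```latex
(N+M)\Delta_{N,M}=(\log(N+M))^{b}
\qquad\text{and}\qquad
\sqrt{\log\left(\Delta_{N,M}^{-1}\right)}\asymp\sqrt{\log(N+M)},
```

so the requirement $(N+M)\Delta_{N,M}\lesssim\sqrt{\log(\Delta_{N,M}^{-1})}$ reduces to $(\log(N+M))^{b}\lesssim(\log(N+M))^{1/2}$, which holds for any $b<\frac{1}{2}$ once $N+M$ is large enough.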

Matching the upper and lower bounds on $\Delta^{*}_{N,M}$ by combining (7.8) with the result from Corollary 4.5 when $\xi=\infty$ yields the desired result.

7.5 Proof of Theorem 4.1

Define

$$S_{z}:\mathscr{H}\to\mathbb{R}^{s},\ \ f\mapsto\frac{1}{\sqrt{s}}(f(Z_{1}),\ldots,f(Z_{s}))^{\top}$$

so that

$$S_{z}^{*}:\mathbb{R}^{s}\to\mathscr{H},\ \ \alpha\mapsto\frac{1}{\sqrt{s}}\sum_{i=1}^{s}\alpha_{i}K(\cdot,Z_{i}).$$

It can be shown that $\hat{\Sigma}_{PQ}=S_{z}^{*}\tilde{\boldsymbol{H}}_{s}S_{z}$ (Sterge and Sriperumbudur, 2022, Proposition C.1). Moreover, if $(\hat{\lambda}_{i},\hat{\alpha}_{i})_{i}$ is the eigensystem of $\frac{1}{s}\tilde{\boldsymbol{H}}_{s}^{1/2}K_{s}\tilde{\boldsymbol{H}}_{s}^{1/2}$, where $K_{s}:=[K(Z_{i},Z_{j})]_{i,j\in[s]}$, $\boldsymbol{H}_{s}=\boldsymbol{I}_{s}-\frac{1}{s}\boldsymbol{1}_{s}\boldsymbol{1}_{s}^{\top}$, and $\tilde{\boldsymbol{H}}_{s}=\frac{s}{s-1}\boldsymbol{H}_{s}$, then $(\hat{\lambda}_{i},\hat{\phi}_{i})_{i}$ is the eigensystem of $\hat{\Sigma}_{PQ}$, where

$$\hat{\phi}_{i}=\frac{1}{\sqrt{\hat{\lambda}_{i}}}S_{z}^{*}\tilde{\boldsymbol{H}}_{s}^{1/2}\hat{\alpha}_{i}. \tag{7.9}$$

We refer the reader to Sriperumbudur and Sterge (2022, Proposition 1) for details. Using (7.9) in the definition of $g_{\lambda}(\hat{\Sigma}_{PQ})$, we have

\begin{align*}
g_{\lambda}(\hat{\Sigma}_{PQ}) &=g_{\lambda}(0)\boldsymbol{I}+S_{z}^{*}\tilde{\boldsymbol{H}}_{s}^{1/2}\left[\sum_{i}\left(\frac{g_{\lambda}(\hat{\lambda}_{i})-g_{\lambda}(0)}{\hat{\lambda}_{i}}\right)\hat{\alpha}_{i}\hat{\alpha}_{i}^{\top}\right]\tilde{\boldsymbol{H}}_{s}^{1/2}S_{z}\\
&=g_{\lambda}(0)\boldsymbol{I}+S_{z}^{*}\tilde{\boldsymbol{H}}_{s}^{1/2}G\tilde{\boldsymbol{H}}_{s}^{1/2}S_{z}.
\end{align*}

Define $\boldsymbol{1}_{n}:=(1,\stackrel{n}{\ldots},1)^{\top}$, and let $\boldsymbol{1}_{n}^{i}$ denote the vector whose $i^{th}$ entry equals one and whose remaining entries are zero. We also define $S_{x}$ and $S_{y}$ analogously to $S_{z}$, based on the samples $(X_{i})_{i=1}^{n}$ and $(Y_{i})_{i=1}^{m}$, respectively. By Sterge and Sriperumbudur (2022, Proposition C.1), it can be shown that $K_{s}=sS_{z}S_{z}^{*}$, $K_{n}:=nS_{x}S_{x}^{*}$, $K_{m}=mS_{y}S_{y}^{*}$, $K_{ns}=\sqrt{ns}S_{x}S_{z}^{*}$, $K_{ms}=\sqrt{ms}S_{y}S_{z}^{*}$, and $K_{mn}=\sqrt{mn}S_{y}S_{x}^{*}$.
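As a numerical sanity check on the eigensystem correspondence above (our illustration, not part of the proof), the following sketch uses a linear kernel on $\mathbb{R}^{d}$, for which $\mathscr{H}=\mathbb{R}^{d}$ and $\hat{\Sigma}_{PQ}=Z^{\top}\tilde{\boldsymbol{H}}_{s}Z/s$, and confirms that the nonzero spectrum of $\frac{1}{s}\tilde{\boldsymbol{H}}_{s}^{1/2}K_{s}\tilde{\boldsymbol{H}}_{s}^{1/2}$ coincides with that of $\hat{\Sigma}_{PQ}$:

```python
import numpy as np

rng = np.random.default_rng(0)
s, d = 20, 5
Z = rng.standard_normal((s, d))          # pooled samples Z_1, ..., Z_s
K_s = Z @ Z.T                            # linear-kernel Gram matrix K(Z_i, Z_j)

# Centering matrices H_s and H~_s = s/(s-1) H_s from the proof
H = np.eye(s) - np.ones((s, s)) / s
H_tilde = s / (s - 1) * H

# Matrix square root of H~_s via its eigendecomposition (H~_s is PSD)
w, V = np.linalg.eigh(H_tilde)
H_half = V @ np.diag(np.sqrt(np.clip(w, 0, None))) @ V.T

# Spectrum of (1/s) H~^{1/2} K_s H~^{1/2}
M = H_half @ K_s @ H_half / s
eigs_gram = np.sort(np.linalg.eigvalsh(M))[::-1]

# For the linear kernel, Sigma_hat = S_z^* H~_s S_z is the d x d matrix below
Sigma_hat = Z.T @ H_tilde @ Z / s
eigs_cov = np.sort(np.linalg.eigvalsh(Sigma_hat))[::-1]

# The two operators share their nonzero spectrum
print(np.allclose(eigs_gram[:d], eigs_cov))
```

The remaining $s-d$ eigenvalues on the Gram side are numerically zero, consistent with $\hat{\Sigma}_{PQ}$ having rank at most $\min(d,s-1)$.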

Based on these observations, we have

\begin{align*}
1 &={\left\langle g_{\lambda}(\hat{\Sigma}_{PQ})\sum_{i=1}^{n}K(\cdot,X_{i}),\sum_{j=1}^{n}K(\cdot,X_{j})\right\rangle}_{\mathscr{H}}=n{\left\langle g_{\lambda}(\hat{\Sigma}_{PQ})S_{x}^{*}\boldsymbol{1}_{n},S_{x}^{*}\boldsymbol{1}_{n}\right\rangle}_{\mathscr{H}}\\
&=n{\left\langle S_{x}g_{\lambda}(\hat{\Sigma}_{PQ})S_{x}^{*}\boldsymbol{1}_{n},\boldsymbol{1}_{n}\right\rangle}_{2}\\
&=n{\left\langle\left(g_{\lambda}(0)\frac{1}{n}K_{n}+\frac{1}{ns}K_{ns}\tilde{\boldsymbol{H}}_{s}^{1/2}G\tilde{\boldsymbol{H}}_{s}^{1/2}K_{ns}^{\top}\right)\boldsymbol{1}_{n},\boldsymbol{1}_{n}\right\rangle}_{2}\\
&=\boldsymbol{1}_{n}^{\top}\left(g_{\lambda}(0)K_{n}+\frac{1}{s}K_{ns}\tilde{\boldsymbol{H}}_{s}^{1/2}G\tilde{\boldsymbol{H}}_{s}^{1/2}K_{ns}^{\top}\right)\boldsymbol{1}_{n},\\
2 &=\sum_{i=1}^{n}{\left\langle g_{\lambda}(\hat{\Sigma}_{PQ})K(\cdot,X_{i}),K(\cdot,X_{i})\right\rangle}_{\mathscr{H}}=n\sum_{i=1}^{n}{\left\langle g_{\lambda}(\hat{\Sigma}_{PQ})S_{x}^{*}\boldsymbol{1}_{n}^{i},S_{x}^{*}\boldsymbol{1}_{n}^{i}\right\rangle}_{\mathscr{H}}\\
&=n\sum_{i=1}^{n}{\left\langle S_{x}g_{\lambda}(\hat{\Sigma}_{PQ})S_{x}^{*}\boldsymbol{1}_{n}^{i},\boldsymbol{1}_{n}^{i}\right\rangle}_{2}\\
&=n\sum_{i=1}^{n}{\left\langle\left(g_{\lambda}(0)\frac{1}{n}K_{n}+\frac{1}{ns}K_{ns}\tilde{\boldsymbol{H}}_{s}^{1/2}G\tilde{\boldsymbol{H}}_{s}^{1/2}K_{ns}^{\top}\right)\boldsymbol{1}_{n}^{i},\boldsymbol{1}_{n}^{i}\right\rangle}_{2}\\
&=\text{Tr}\left(g_{\lambda}(0)K_{n}+\frac{1}{s}K_{ns}\tilde{\boldsymbol{H}}_{s}^{1/2}G\tilde{\boldsymbol{H}}_{s}^{1/2}K_{ns}^{\top}\right),\\
3 &={\left\langle g_{\lambda}(\hat{\Sigma}_{PQ})\sum_{i=1}^{m}K(\cdot,Y_{i}),\sum_{j=1}^{m}K(\cdot,Y_{j})\right\rangle}_{\mathscr{H}}\\
&=\boldsymbol{1}_{m}^{\top}\left(g_{\lambda}(0)K_{m}+\frac{1}{s}K_{ms}\tilde{\boldsymbol{H}}_{s}^{1/2}G\tilde{\boldsymbol{H}}_{s}^{1/2}K_{ms}^{\top}\right)\boldsymbol{1}_{m},\\
4 &=\sum_{i=1}^{m}{\left\langle g_{\lambda}(\hat{\Sigma}_{PQ})K(\cdot,Y_{i}),K(\cdot,Y_{i})\right\rangle}_{\mathscr{H}}\\
&=\text{Tr}\left(g_{\lambda}(0)K_{m}+\frac{1}{s}K_{ms}\tilde{\boldsymbol{H}}_{s}^{1/2}G\tilde{\boldsymbol{H}}_{s}^{1/2}K_{ms}^{\top}\right),
\end{align*}

and

\begin{align*}
5 &={\left\langle g_{\lambda}(\hat{\Sigma}_{PQ})\sum_{i=1}^{n}K(\cdot,X_{i}),\sum_{i=1}^{m}K(\cdot,Y_{i})\right\rangle}_{\mathscr{H}}\\
&=\sqrt{nm}{\left\langle g_{\lambda}(\hat{\Sigma}_{PQ})S_{x}^{*}\boldsymbol{1}_{n},S_{y}^{*}\boldsymbol{1}_{m}\right\rangle}_{\mathscr{H}}\\
&=\sqrt{nm}{\left\langle S_{y}g_{\lambda}(\hat{\Sigma}_{PQ})S_{x}^{*}\boldsymbol{1}_{n},\boldsymbol{1}_{m}\right\rangle}_{2}\\
&=\sqrt{nm}{\left\langle\left(g_{\lambda}(0)\frac{1}{\sqrt{nm}}K_{mn}+\frac{1}{s\sqrt{nm}}K_{ms}\tilde{\boldsymbol{H}}_{s}^{1/2}G\tilde{\boldsymbol{H}}_{s}^{1/2}K_{ns}^{\top}\right)\boldsymbol{1}_{n},\boldsymbol{1}_{m}\right\rangle}_{2}\\
&=\boldsymbol{1}_{m}^{\top}\left(g_{\lambda}(0)K_{mn}+\frac{1}{s}K_{ms}\tilde{\boldsymbol{H}}_{s}^{1/2}G\tilde{\boldsymbol{H}}_{s}^{1/2}K_{ns}^{\top}\right)\boldsymbol{1}_{n}.
\end{align*}
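The Gram-matrix expression for the term labeled 1 can likewise be checked numerically. The sketch below is our illustration only: it assumes Tikhonov regularization $g_{\lambda}(x)=1/(x+\lambda)$, so that $g_{\lambda}(\hat{\Sigma}_{PQ})=(\hat{\Sigma}_{PQ}+\lambda\boldsymbol{I})^{-1}$, and a linear kernel, and compares the Gram-side formula with the inner product computed directly in feature space:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, d, lam = 8, 6, 4, 0.5
X = rng.standard_normal((n, d))
Y = rng.standard_normal((m, d))
Z = np.vstack([X, Y])                    # pooled samples, s = n + m
s = n + m

H = np.eye(s) - np.ones((s, s)) / s
Ht = s / (s - 1) * H
w, V = np.linalg.eigh(Ht)
Ht_half = V @ np.diag(np.sqrt(np.clip(w, 0, None))) @ V.T

Ks = Z @ Z.T                             # linear-kernel Gram matrices
Kn = X @ X.T
Kns = X @ Z.T

# Eigensystem of (1/s) Ht^{1/2} Ks Ht^{1/2} and the matrix G from the proof,
# for Tikhonov regularization g_lam(x) = 1/(x + lam) with g_lam(0) = 1/lam
lams, alphas = np.linalg.eigh(Ht_half @ Ks @ Ht_half / s)
g0 = 1.0 / lam
coef = np.where(lams > 1e-10,
                (1.0 / (lams + lam) - g0) / np.where(lams > 1e-10, lams, 1.0),
                0.0)
G = alphas @ np.diag(coef) @ alphas.T

ones_n = np.ones(n)
term1_gram = ones_n @ (g0 * Kn + Kns @ Ht_half @ G @ Ht_half @ Kns.T / s) @ ones_n

# Feature-space check: for the linear kernel, Sigma_hat = Z^T Ht Z / s and
# g_lam(Sigma_hat) = (Sigma_hat + lam I)^{-1}
Sigma_hat = Z.T @ Ht @ Z / s
sx = X.sum(axis=0)
term1_direct = sx @ np.linalg.solve(Sigma_hat + lam * np.eye(d), sx)

print(np.allclose(term1_gram, term1_direct))
```

The other four terms can be checked in exactly the same way with the corresponding Gram matrices.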

7.6 Proof of Theorem 4.2

Since $\mathbb{E}(\hat{\eta}_{\lambda}|(Z_{i})_{i=1}^{s})=0$, an application of Chebyshev's inequality via Lemma A.12 yields

$$P_{H_{0}}\left\{|\hat{\eta}_{\lambda}|\geq\frac{\sqrt{6}(C_{1}+C_{2})\left\|\mathcal{M}\right\|_{\mathcal{L}^{\infty}(\mathscr{H})}^{2}\mathcal{N}_{2}(\lambda)}{\sqrt{\delta}}\left(\frac{1}{n}+\frac{1}{m}\right)\,\Big{|}\,(Z_{i})_{i=1}^{s}\right\}\leq\delta.$$

By defining

$$\gamma_{1}:=\frac{2\sqrt{6}(C_{1}+C_{2})\mathcal{N}_{2}(\lambda)}{\sqrt{\delta}}\left(\frac{1}{n}+\frac{1}{m}\right),\qquad\gamma_{2}:=\frac{\sqrt{6}(C_{1}+C_{2})\left\|\mathcal{M}\right\|_{\mathcal{L}^{\infty}(\mathscr{H})}^{2}\mathcal{N}_{2}(\lambda)}{\sqrt{\delta}}\left(\frac{1}{n}+\frac{1}{m}\right),$$

we obtain

\begin{align*}
P_{H_{0}}\{\hat{\eta}_{\lambda}\leq\gamma_{1}\} &\geq P_{H_{0}}\{\{\hat{\eta}_{\lambda}\leq\gamma_{2}\}\cap\{\gamma_{2}\leq\gamma_{1}\}\}\\
&\geq 1-P_{H_{0}}\{\hat{\eta}_{\lambda}\geq\gamma_{2}\}-P_{H_{0}}\{\gamma_{2}\geq\gamma_{1}\}\stackrel{(*)}{\geq}1-3\delta,
\end{align*}

where $(*)$ follows using

$$P_{H_{0}}\{\hat{\eta}_{\lambda}\geq\gamma_{2}\}\leq P_{H_{0}}\{|\hat{\eta}_{\lambda}|\geq\gamma_{2}\}=\mathbb{E}_{R^{s}}\left[P_{H_{0}}\{|\hat{\eta}_{\lambda}|\geq\gamma_{2}\,|\,(Z_{i})_{i=1}^{s}\}\right]\leq\delta,$$

and

$$P_{H_{0}}\{\gamma_{2}\geq\gamma_{1}\}=P_{H_{0}}\left\{\left\|\mathcal{M}\right\|_{\mathcal{L}^{\infty}(\mathscr{H})}^{2}\geq 2\right\}\stackrel{({\dagger})}{\leq}2\delta,$$

where $({\dagger})$ follows from Sriperumbudur and Sterge (2022, Lemma B.2(ii)), under the condition that $\frac{140\kappa}{s}\log\frac{16\kappa s}{\delta}\leq\lambda\leq\left\|\Sigma_{PQ}\right\|_{\mathcal{L}^{\infty}(\mathscr{H})}$. When $C:=\sup_{i}\left\|\phi_{i}\right\|_{\infty}<\infty$, using Lemma A.17, we can obtain an improved condition on $\lambda$, namely $136C^{2}\mathcal{N}_{1}(\lambda)\log\frac{8\mathcal{N}_{1}(\lambda)}{\delta}\leq s$ and $\lambda\leq\left\|\Sigma_{PQ}\right\|_{\mathcal{L}^{\infty}(\mathscr{H})}$. The desired result then follows by setting $\delta=\frac{\alpha}{3}$.

7.7 Proof of Theorem 4.3

Let $\mathcal{M}=\hat{\Sigma}_{PQ,\lambda}^{-1/2}\Sigma_{PQ,\lambda}^{1/2}$ and

$$\gamma_{1}=\frac{1}{\sqrt{\delta}}\left(\frac{\sqrt{C_{\lambda}}\left\|u\right\|_{L^{2}(R)}+\mathcal{N}_{2}(\lambda)}{n+m}+\frac{C_{\lambda}^{1/4}\left\|u\right\|_{L^{2}(R)}^{3/2}+\left\|u\right\|_{L^{2}(R)}}{\sqrt{n+m}}\right),$$

where $C_{\lambda}$ is defined in Lemma A.9. Then Lemma A.12 implies

$$\tilde{C}\left\|\mathcal{M}\right\|_{\mathcal{L}^{\infty}(\mathscr{H})}^{2}\gamma_{1}\geq\sqrt{\frac{\text{Var}(\hat{\eta}_{\lambda}|(Z_{i})_{i=1}^{s})}{\delta}}$$

for some constant $\tilde{C}>0$. By Lemma A.1, if

$$P\left\{\gamma\geq\zeta-\tilde{C}\left\|\mathcal{M}\right\|_{\mathcal{L}^{\infty}(\mathscr{H})}^{2}\gamma_{1}\right\}\leq\delta \tag{7.10}$$

for any $(P,Q)\in\mathcal{P}$, then we obtain $P\{\hat{\eta}_{\lambda}\geq\gamma\}\geq 1-2\delta$. The result follows by taking the infimum over $(P,Q)\in\mathcal{P}$. Therefore, it remains to verify (7.10), which we do below. Define $c_{2}:=B_{3}C_{4}(C_{1}+C_{2})^{-1}$ and consider

\begin{align*}
&P_{H_{1}}\{\gamma\leq\zeta-\tilde{C}\left\|\mathcal{M}\right\|_{\mathcal{L}^{\infty}(\mathscr{H})}^{2}\gamma_{1}\}\\
&\quad\stackrel{({\ddagger})}{\geq}P_{H_{1}}\left\{\gamma\leq c_{2}\left\|\mathcal{M}^{-1}\right\|^{-2}_{\mathcal{L}^{\infty}(\mathscr{H})}\left\|u\right\|_{L^{2}(R)}^{2}-\tilde{C}\left\|\mathcal{M}\right\|_{\mathcal{L}^{\infty}(\mathscr{H})}^{2}\gamma_{1}\right\}\\
&\quad=P_{H_{1}}\left\{\frac{\left\|\mathcal{M}^{-1}\right\|^{2}_{\mathcal{L}^{\infty}(\mathscr{H})}\gamma+\tilde{C}\gamma_{1}\left\|\mathcal{M}^{-1}\right\|^{2}_{\mathcal{L}^{\infty}(\mathscr{H})}\left\|\mathcal{M}\right\|_{\mathcal{L}^{\infty}(\mathscr{H})}^{2}}{c_{2}\left\|u\right\|_{L^{2}(R)}^{2}}\leq 1\right\}\\
&\quad\stackrel{(*)}{\geq}P_{H_{1}}\left\{\frac{\left\|\mathcal{M}^{-1}\right\|^{2}_{\mathcal{L}^{\infty}(\mathscr{H})}}{3}+\frac{\left\|\mathcal{M}^{-1}\right\|^{2}_{\mathcal{L}^{\infty}(\mathscr{H})}\left\|\mathcal{M}\right\|_{\mathcal{L}^{\infty}(\mathscr{H})}^{2}}{6}\leq 1\right\}\\
&\quad\geq P_{H_{1}}\left\{\left\{\left\|\mathcal{M}^{-1}\right\|^{2}_{\mathcal{L}^{\infty}(\mathscr{H})}\leq\frac{3}{2}\right\}\cap\left\{\left\|\mathcal{M}\right\|^{2}_{\mathcal{L}^{\infty}(\mathscr{H})}\leq 2\right\}\right\}\\
&\quad\geq 1-P_{H_{1}}\left\{\left\|\mathcal{M}^{-1}\right\|^{2}_{\mathcal{L}^{\infty}(\mathscr{H})}\geq\frac{3}{2}\right\}-P_{H_{1}}\left\{\left\|\mathcal{M}\right\|^{2}_{\mathcal{L}^{\infty}(\mathscr{H})}\geq 2\right\}\\
&\quad\stackrel{({\dagger})}{\geq}1-\delta,
\end{align*}

where $({\ddagger})$ follows by using $\zeta\geq c_{2}\left\|\mathcal{M}^{-1}\right\|^{-2}_{\mathcal{L}^{\infty}(\mathscr{H})}\left\|u\right\|_{L^{2}(R)}^{2}$, which is obtained by combining Lemma A.11 with Lemma A.7 under the assumptions that $u\in\text{Ran}(\mathcal{T}^{\theta})$ and

$$\left\|u\right\|_{L^{2}(R)}^{2}\geq\frac{4C_{3}}{3B_{3}}\left\|\mathcal{T}\right\|_{\mathcal{L}^{\infty}(\mathscr{H})}^{2\max(\theta-\xi,0)}\lambda^{2\tilde{\theta}}\left\|\mathcal{T}^{-\theta}u\right\|_{L^{2}(R)}^{2}. \tag{7.11}$$

Note that $u\in\text{Ran}(\mathcal{T}^{\theta})$ is guaranteed since $(P,Q)\in\mathcal{P}$, and

$$\left\|u\right\|_{L^{2}(R)}^{2}\geq c_{4}\lambda^{2\tilde{\theta}} \tag{7.12}$$

guarantees (7.11) since $\left\|\mathcal{T}\right\|_{\mathcal{L}^{\infty}(\mathscr{H})}=\left\|\Sigma_{PQ}\right\|_{\mathcal{L}^{\infty}(\mathscr{H})}\leq 2\kappa$ and

$$c_{1}:=\sup_{(P,Q)\in\mathcal{P}}\left\|\mathcal{T}^{-\theta}u\right\|_{L^{2}(R)}<\infty,$$

where $c_{4}=\frac{4c_{1}^{2}C_{3}(2\kappa)^{2\max(\theta-\xi,0)}}{3B_{3}}$. The inequality $(*)$ follows when

$$\left\|u\right\|_{L^{2}(R)}^{2}\geq\frac{3\gamma}{c_{2}} \tag{7.13}$$

and

$$\left\|u\right\|_{L^{2}(R)}^{2}\geq\frac{6\tilde{C}\gamma_{1}}{c_{2}}, \tag{7.14}$$

and $({\dagger})$ follows from Sriperumbudur and Sterge (2022, Lemma B.2(ii)), under the condition that

$$\frac{140\kappa}{s}\log\frac{64\kappa s}{\delta}\leq\lambda\leq\left\|\Sigma_{PQ}\right\|_{\mathcal{L}^{\infty}(\mathscr{H})}. \tag{7.15}$$

When $C:=\sup_{i}\left\|\phi_{i}\right\|_{\infty}<\infty$, $({\dagger})$ follows from Lemma A.17 by replacing (7.15) with

$$136C^{2}\mathcal{N}_{1}(\lambda)\log\frac{32\mathcal{N}_{1}(\lambda)}{\delta}\leq s,\qquad\lambda\leq\left\|\Sigma_{PQ}\right\|_{\mathcal{L}^{\infty}(\mathscr{H})}. \tag{7.16}$$

Below, we show that (7.13)–(7.16) are satisfied. Using $\left\|u\right\|^{2}_{L^{2}(R)}\geq\Delta_{N,M}$, it is easy to see that (7.12) is implied when $\lambda=(c_{4}^{-1}\Delta_{N,M})^{1/2\tilde{\theta}}$. Using $(n+m)=(1-d_{1})N+(1-d_{2})M\geq(1-d_{2})(N+M)$, where the last inequality uses $d_{2}\geq d_{1}$ (since $s=d_{1}N=d_{2}M$ and $M\leq N$), and applying Lemma A.13, we can verify that (7.13) is implied if $\Delta_{N,M}\geq\frac{r_{1}\mathcal{N}_{2}(\lambda)}{\sqrt{\alpha}(N+M)}$ for some constant $r_{1}>0$. It can also be verified that (7.14) is implied if $\Delta_{N,M}\geq\frac{r_{2}C_{\lambda}}{\delta^{2}(N+M)^{2}}$ and $\Delta_{N,M}\geq\frac{r_{3}\mathcal{N}_{2}(\lambda)}{\delta(N+M)}$ for some constants $r_{2},r_{3}>0$. Using $s=d_{1}N\geq\frac{d_{1}(N+M)}{2}$, $(N+M)\geq\frac{32\kappa d_{1}}{\delta}$ and $\left\|\Sigma_{PQ}\right\|_{\mathcal{L}^{\infty}(\mathscr{H})}\geq(c_{4}^{-1}\Delta_{N,M})^{\frac{1}{2\tilde{\theta}}}$, it can be seen that (7.15) is implied when $\Delta_{N,M}\geq r_{4}(\frac{280\kappa}{d_{1}})^{2\tilde{\theta}}\left(\frac{N+M}{\log(N+M)}\right)^{-2\tilde{\theta}}$ for some constant $r_{4}>0$. On the other hand, when $C:=\sup_{i}\left\|\phi_{i}\right\|_{\infty}<\infty$, using $s\geq\frac{d_{1}(N+M)}{2}$ and $(N+M)\geq\max\{\frac{32}{\delta},e^{d_{1}/272C^{2}}\}$, it can be verified that (7.16) is implied when $\mathcal{N}_{1}(\lambda)\leq\frac{d_{1}(N+M)}{544C\log(N+M)}$.
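Spelling out one of these reductions (with all constants absorbed into $\lesssim$, so this is only a sketch of the bookkeeping): with $\lambda=(c_{4}^{-1}\Delta_{N,M})^{1/2\tilde{\theta}}$ and $s\geq d_{1}(N+M)/2$,

```latex
\frac{140\kappa}{s}\log\frac{64\kappa s}{\delta}
\lesssim\frac{\kappa}{d_{1}}\cdot\frac{\log(N+M)}{N+M}
\leq(c_{4}^{-1}\Delta_{N,M})^{1/2\tilde{\theta}}=\lambda,
```

where the last inequality, and hence (7.15), holds whenever $\Delta_{N,M}\gtrsim\left(\frac{\kappa}{d_{1}}\right)^{2\tilde{\theta}}\left(\frac{N+M}{\log(N+M)}\right)^{-2\tilde{\theta}}$, matching the logarithmic term in the conditions above.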

7.8 Proof of Corollary 4.4

When $\lambda_{i}\lesssim i^{-\beta}$, it follows from Sriperumbudur and Sterge (2022, Lemma B.9) that

$$\mathcal{N}_{2}(\lambda)\leq\left\|\Sigma_{PQ,\lambda}^{-1/2}\Sigma_{PQ}\Sigma_{PQ,\lambda}^{-1/2}\right\|_{\mathcal{L}^{\infty}(\mathscr{H})}^{1/2}\mathcal{N}^{1/2}_{1}(\lambda)\lesssim\lambda^{-1/2\beta}.$$

Using this bound in the conditions of Theorem 4.3 ensures that the conditions on the separation boundary hold if

$$\Delta_{N,M}\gtrsim\max\left\{\left(\frac{N+M}{\alpha^{-1/2}+\delta^{-1}}\right)^{-\frac{4\tilde{\theta}\beta}{4\tilde{\theta}\beta+1}},(\delta(N+M))^{-\frac{8\tilde{\theta}\beta}{4\tilde{\theta}\beta+2\beta+1}},d_{4}^{2\tilde{\theta}}\left(\frac{N+M}{\log(N+M)}\right)^{-2\tilde{\theta}}\right\},$$

where $d_{4}>0$ is a constant. The above condition is implied if

$$\Delta_{N,M}=\left\{\begin{array}{ll}c(\alpha,\delta,\theta)\left(N+M\right)^{\frac{-4\tilde{\theta}\beta}{4\tilde{\theta}\beta+1}},&\ \ \tilde{\theta}>\frac{1}{2}-\frac{1}{4\beta}\\ c(\alpha,\delta,\theta)\left(\frac{N+M}{\log(N+M)}\right)^{-2\tilde{\theta}},&\ \ \tilde{\theta}\leq\frac{1}{2}-\frac{1}{4\beta}\end{array}\right.,$$

where $c(\alpha,\delta,\theta)\gtrsim(\alpha^{-1/2}+\delta^{-2}+d_{4}^{2\tilde{\theta}})$, and we used that $\tilde{\theta}>\frac{1}{2}-\frac{1}{4\beta}\Rightarrow\frac{4\tilde{\theta}\beta}{4\tilde{\theta}\beta+1}<\min\left\{2\tilde{\theta},\frac{8\tilde{\theta}\beta}{4\tilde{\theta}\beta+2\beta+1}\right\}$, that $\tilde{\theta}\leq\frac{1}{2}-\frac{1}{4\beta}\Rightarrow 2\tilde{\theta}\leq\min\left\{\frac{4\tilde{\theta}\beta}{4\tilde{\theta}\beta+1},\frac{8\tilde{\theta}\beta}{4\tilde{\theta}\beta+2\beta+1}\right\}$, and that $x^{-a}\geq\left(\frac{x}{\log x}\right)^{-b}$ when $0<a<b$ and $x$ is large enough.
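The first of these implications is elementary algebra: since $\tilde{\theta}>0$,

```latex
\frac{4\tilde{\theta}\beta}{4\tilde{\theta}\beta+1}<2\tilde{\theta}
\;\Longleftrightarrow\;\frac{2\beta}{4\tilde{\theta}\beta+1}<1
\;\Longleftrightarrow\;\tilde{\theta}>\frac{2\beta-1}{4\beta}=\frac{1}{2}-\frac{1}{4\beta},
```

and cross-multiplying shows that $\frac{4\tilde{\theta}\beta}{4\tilde{\theta}\beta+1}<\frac{8\tilde{\theta}\beta}{4\tilde{\theta}\beta+2\beta+1}$ holds under exactly the same condition on $\tilde{\theta}$.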

On the other hand, when $C:=\sup_{i}\left\|\phi_{i}\right\|_{\infty}<\infty$, we obtain the corresponding condition as

$$\Delta_{N,M}\gtrsim\max\left\{\left(\frac{N+M}{\alpha^{-1/2}+\delta^{-1}}\right)^{-\frac{4\tilde{\theta}\beta}{4\tilde{\theta}\beta+1}},(\delta(N+M))^{-\frac{4\tilde{\theta}\beta}{2\tilde{\theta}\beta+1}},d_{5}^{2\tilde{\theta}\beta}\left(\frac{N+M}{\log(N+M)}\right)^{-2\tilde{\theta}\beta}\right\},$$

for some constant $d_{5}>0$, which in turn is implied for $N+M$ large enough if

$$\Delta_{N,M}=\left\{\begin{array}{ll}c(\alpha,\delta,\theta,\beta)\left(N+M\right)^{\frac{-4\tilde{\theta}\beta}{4\tilde{\theta}\beta+1}},&\ \ \tilde{\theta}>\frac{1}{4\beta}\\ c(\alpha,\delta,\theta,\beta)\left(\frac{N+M}{\log(N+M)}\right)^{-2\tilde{\theta}\beta},&\ \ \tilde{\theta}\leq\frac{1}{4\beta}\end{array}\right.,$$

where $c(\alpha,\delta,\theta,\beta)\gtrsim(\alpha^{-1/2}+\delta^{-2}+d_{5}^{2\tilde{\theta}\beta})$.

7.9 Proof of Corollary 4.5

When $\lambda_{i}\lesssim e^{-\tau i}$, it follows from Sriperumbudur and Sterge (2022, Lemma B.9) that

$$\mathcal{N}_{2}(\lambda)\leq\left\|\Sigma_{PQ,\lambda}^{-1/2}\Sigma_{PQ}\Sigma_{PQ,\lambda}^{-1/2}\right\|_{\mathcal{L}^{\infty}(\mathscr{H})}^{1/2}\mathcal{N}^{1/2}_{1}(\lambda)\lesssim\sqrt{\log\frac{1}{\lambda}}.$$

Substituting this bound in the conditions of Theorem 4.3 and assuming that

$$(N+M)\geq\max\{e^{2},\alpha^{-1/2}+\delta^{-1}\},$$

we can write the separation boundary as

\begin{align*}
\Delta_{N,M} &\gtrsim\max\left\{\left(\frac{\sqrt{2\tilde{\theta}}(\alpha^{-1/2}+\delta^{-1})^{-1}(N+M)}{\sqrt{\log(N+M)}}\right)^{-1},\right.\\
&\qquad\qquad\left.\left(\frac{\delta(N+M)}{\sqrt{\log(N+M)}}\right)^{-\frac{4\tilde{\theta}}{2\tilde{\theta}+1}},\left(\frac{d_{4}^{-1}(N+M)}{\log(N+M)}\right)^{-2\tilde{\theta}}\right\},
\end{align*}

which is implied if

$$\Delta_{N,M}=\left\{\begin{array}{ll}c(\alpha,\delta,\theta)\frac{\sqrt{\log(N+M)}}{N+M},&\ \ \tilde{\theta}>\frac{1}{2}\\ c(\alpha,\delta,\theta)\left(\frac{\log(N+M)}{N+M}\right)^{2\tilde{\theta}},&\ \ \tilde{\theta}\leq\frac{1}{2}\end{array}\right.$$

for large enough $N+M$, where $c(\alpha,\delta,\theta)\gtrsim\max\left\{\sqrt{\frac{1}{2\tilde{\theta}}},1\right\}(\alpha^{-1/2}+\delta^{-2}+d_{4}^{2\tilde{\theta}})$, and we used that $\tilde{\theta}>\frac{1}{2}$ implies $1\leq\min\left\{2\tilde{\theta},\frac{4\tilde{\theta}}{2\tilde{\theta}+1}\right\}$ and that $\tilde{\theta}\leq\frac{1}{2}$ implies $2\tilde{\theta}\leq\min\left\{1,\frac{4\tilde{\theta}}{2\tilde{\theta}+1}\right\}$.

On the other hand, when $C:=\sup_{i}\left\|\phi_{i}\right\|_{\infty}<\infty$, we obtain

\begin{align*}
\Delta_{N,M} &\gtrsim\max\left\{\sqrt{\frac{1}{2\tilde{\theta}}}\left(\frac{(\alpha^{-1/2}+\delta^{-1})^{-1}(N+M)}{\sqrt{\log(N+M)}}\right)^{-1},\right.\\
&\qquad\qquad\left.\frac{1}{2\tilde{\theta}}\left(\frac{\delta(N+M)}{\sqrt{\log(N+M)}}\right)^{-2},e^{\frac{-2\tilde{\theta}d_{5}(N+M)}{\log(N+M)}}\right\}
\end{align*}

for some constant $d_{5}>0$. We can deduce that this condition reduces to

$$\Delta_{N,M}=c(\alpha,\delta,\theta)\frac{\sqrt{\log(N+M)}}{N+M},$$

where $c(\alpha,\delta,\theta)\gtrsim\max\left\{\sqrt{\frac{1}{2\tilde{\theta}}},\frac{1}{2\tilde{\theta}},1\right\}(\alpha^{-1/2}+\delta^{-2})$, and we used that $e^{\frac{-ax}{\log x}}\leq(ax\sqrt{\log x})^{-1}$ for $a>0$ when $x$ is large enough.

7.10 Proof of Theorem 4.6

Under $H_{0}$, we have $\hat{\eta}_{\lambda}(X,Y,Z)\stackrel{d}{=}\hat{\eta}_{\lambda}(X^{\pi},Y^{\pi},Z)$ for any $\pi\in\Pi_{n+m}$, i.e., $\hat{\eta}_{\lambda}\stackrel{d}{=}\hat{\eta}^{\pi}_{\lambda}$. Thus, given samples $(X_{i})_{i=1}^{n}$, $(Y_{j})_{j=1}^{m}$ and $(Z_{i})_{i=1}^{s}$, we have

$$1-\alpha\leq\frac{1}{D}\sum_{\pi\in\Pi_{n+m}}\mathds{1}(\hat{\eta}^{\pi}_{\lambda}\leq q_{1-\alpha}^{\lambda}).$$

Taking expectations on both sides of the above inequality with respect to the samples yields

$$1-\alpha\leq\frac{1}{D}\sum_{\pi\in\Pi_{n+m}}\mathbb{E}\mathds{1}(\hat{\eta}^{\pi}_{\lambda}\leq q_{1-\alpha}^{\lambda})=\mathbb{E}\mathds{1}(\hat{\eta}_{\lambda}\leq q_{1-\alpha}^{\lambda})=P_{H_{0}}\{\hat{\eta}_{\lambda}\leq q_{1-\alpha}^{\lambda}\}.$$

Therefore,

\begin{align*}
P_{H_{0}}\{\hat{\eta}_{\lambda}\leq\hat{q}_{1-w\alpha}^{B,\lambda}\} &\geq P_{H_{0}}\left\{\{\hat{\eta}_{\lambda}\leq q_{1-w\alpha-\tilde{\alpha}}^{\lambda}\}\cap\{\hat{q}_{1-w\alpha}^{B,\lambda}\geq q_{1-w\alpha-\tilde{\alpha}}^{\lambda}\}\right\}\\
&\geq 1-P_{H_{0}}\{\hat{\eta}_{\lambda}\geq q_{1-w\alpha-\tilde{\alpha}}^{\lambda}\}-P_{H_{0}}\{\hat{q}_{1-w\alpha}^{B,\lambda}\leq q_{1-w\alpha-\tilde{\alpha}}^{\lambda}\}\\
&\stackrel{(*)}{\geq}1-w\alpha-\tilde{\alpha}-(1-w-\tilde{w})\alpha,
\end{align*}

where we applied Lemma A.14 in $(*)$, and the result follows by choosing $\tilde{\alpha}=\tilde{w}\alpha$.
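The exchangeability argument above is the standard mechanism behind permutation thresholds. As an illustration only (a generic sketch with a simple mean-difference statistic in place of the regularized statistic $\hat{\eta}_{\lambda}$, and with hypothetical parameter choices), the following shows the resulting level control empirically:

```python
import numpy as np

rng = np.random.default_rng(2)

def perm_test(x, y, num_perm=200, alpha=0.05):
    """Permutation two-sample test with a mean-difference statistic.

    Under H0 the pooled sample is exchangeable, so the statistic computed
    on permuted splits has the same distribution as the observed one; the
    (1 - alpha)-quantile of the permutation distribution is the threshold.
    """
    n = len(x)
    pooled = np.concatenate([x, y])
    stat = abs(x.mean() - y.mean())
    perm_stats = np.empty(num_perm)
    for b in range(num_perm):
        p = rng.permutation(pooled)
        perm_stats[b] = abs(p[:n].mean() - p[n:].mean())
    q = np.quantile(perm_stats, 1 - alpha)
    return stat > q

# Empirical type-I error under H0 (both samples drawn from N(0, 1))
rejections = np.mean([perm_test(rng.standard_normal(30), rng.standard_normal(30))
                      for _ in range(200)])
print(rejections)  # roughly alpha = 0.05
```

The same wrapper applies verbatim to any statistic that is a symmetric function of the pooled sample under $H_{0}$.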

7.11 Proof of Theorem 4.7

First, we show that for any $(P,Q)\in\mathcal{P}$, the following holds under the conditions of Theorem 4.7:

$$P_{H_{1}}\left\{\hat{\eta}_{\lambda}\geq q_{1-\alpha}^{\lambda}\right\}\geq 1-4\delta. \tag{7.17}$$

To this end, let $\mathcal{M}=\hat{\Sigma}_{PQ,\lambda}^{-1/2}\Sigma_{PQ,\lambda}^{1/2}$ and

$$\gamma_{1}=\frac{1}{\sqrt{\delta}}\left(\frac{\sqrt{C_{\lambda}}\left\|u\right\|_{L^{2}(R)}+\mathcal{N}_{2}(\lambda)}{n+m}+\frac{C_{\lambda}^{1/4}\left\|u\right\|_{L^{2}(R)}^{3/2}+\left\|u\right\|_{L^{2}(R)}}{\sqrt{n+m}}\right),$$

where $C_{\lambda}$ is as defined in Lemma A.9. Then Lemma A.12 implies

$$\tilde{C}\left\|\mathcal{M}\right\|_{\mathcal{L}^{\infty}(\mathscr{H})}^{2}\gamma_{1}\geq\sqrt{\frac{\text{Var}(\hat{\eta}_{\lambda}|(Z_{i})_{i=1}^{s})}{\delta}}$$

for some constant $\tilde{C}>0$. By Lemma A.1, if

$$P_{H_{1}}\left\{q_{1-\alpha}^{\lambda}\geq\zeta-\tilde{C}\left\|\mathcal{M}\right\|_{\mathcal{L}^{\infty}(\mathscr{H})}^{2}\gamma_{1}\right\}\leq 2\delta, \tag{7.18}$$

then we obtain (7.17). Therefore, it remains to verify (7.18), which we do below. Define

$$\gamma=\frac{\left\|\mathcal{M}\right\|_{\mathcal{L}^{\infty}(\mathscr{H})}^{2}\log\frac{1}{\alpha}}{\sqrt{\delta}(n+m)}\left(\sqrt{C_{\lambda}}\left\|u\right\|_{L^{2}(R)}+\mathcal{N}_{2}(\lambda)+C_{\lambda}^{1/4}\left\|u\right\|^{3/2}_{L^{2}(R)}+\left\|u\right\|_{L^{2}(R)}\right)+\frac{\zeta\log\frac{1}{\alpha}}{\sqrt{\delta}(n+m)}$$

and

$$\gamma_{2}=\frac{\log\frac{1}{\alpha}}{\sqrt{\delta}(n+m)}\left(\sqrt{C_{\lambda}}\left\|u\right\|_{L^{2}(R)}+\mathcal{N}_{2}(\lambda)+C_{\lambda}^{1/4}\left\|u\right\|^{3/2}_{L^{2}(R)}+\left\|u\right\|_{L^{2}(R)}\right).$$

Thus we have

$$\gamma=\left\|\mathcal{M}\right\|_{\mathcal{L}^{\infty}(\mathscr{H})}^{2}\gamma_{2}+\frac{\zeta\log\frac{1}{\alpha}}{\sqrt{\delta}(n+m)}.$$

It then follows from Lemma A.15 that there exists a constant $C_{5}>0$ such that

$$P_{H_{1}}\{q_{1-\alpha}^{\lambda}\geq C_{5}\gamma\}\leq\delta.$$

Let $c_{2}=B_{3}C_{4}(C_{1}+C_{2})^{-1}$. Then we have

\begin{align*}
&P_{H_{1}}\{q_{1-\alpha}^{\lambda}\leq\zeta-\left\|\mathcal{M}\right\|_{\mathcal{L}^{\infty}(\mathscr{H})}^{2}\tilde{C}\gamma_{1}\}\\
&\quad\geq P_{H_{1}}\left\{\{C_{5}\gamma\leq\zeta-\left\|\mathcal{M}\right\|_{\mathcal{L}^{\infty}(\mathscr{H})}^{2}\tilde{C}\gamma_{1}\}\cap\{q_{1-\alpha}^{\lambda}\leq C_{5}\gamma\}\right\}\\
&\quad=P_{H_{1}}\left\{\left\{\left\|\mathcal{M}\right\|_{\mathcal{L}^{\infty}(\mathscr{H})}^{2}C_{5}\gamma_{2}+\frac{C_{5}\zeta\log\frac{1}{\alpha}}{\sqrt{\delta}(n+m)}\leq\zeta-\left\|\mathcal{M}\right\|_{\mathcal{L}^{\infty}(\mathscr{H})}^{2}\tilde{C}\gamma_{1}\right\}\cap\{q_{1-\alpha}^{\lambda}\leq C_{5}\gamma\}\right\}\\
&\quad\geq 1-P_{H_{1}}\left\{\left\|\mathcal{M}\right\|_{\mathcal{L}^{\infty}(\mathscr{H})}^{2}(\tilde{C}\gamma_{1}+C_{5}\gamma_{2})\geq\zeta\left(1-\frac{C_{5}\log\frac{1}{\alpha}}{\sqrt{\delta}(n+m)}\right)\right\}-P_{H_{1}}\{q_{1-\alpha}^{\lambda}\geq C_{5}\gamma\}\\
&\quad\stackrel{(*)}{\geq}1-P_{H_{1}}\left\{\frac{\left\|\mathcal{M}\right\|^{2}_{\mathcal{L}^{\infty}(\mathscr{H})}\left\|\mathcal{M}^{-1}\right\|^{2}_{\mathcal{L}^{\infty}(\mathscr{H})}(\tilde{C}\gamma_{1}+C_{5}\gamma_{2})}{c_{2}\left\|u\right\|_{L^{2}(R)}^{2}}\geq\frac{1}{2}\right\}-\delta\\
&\quad=P_{H_{1}}\left\{\frac{\left\|\mathcal{M}\right\|^{2}_{\mathcal{L}^{\infty}(\mathscr{H})}\left\|\mathcal{M}^{-1}\right\|^{2}_{\mathcal{L}^{\infty}(\mathscr{H})}(\tilde{C}\gamma_{1}+C_{5}\gamma_{2})}{c_{2}\left\|u\right\|_{L^{2}(R)}^{2}}\leq\frac{1}{2}\right\}-\delta\\
&\quad\stackrel{({\dagger})}{\geq}P_{H_{1}}\left\{\left\|\mathcal{M}\right\|^{2}_{\mathcal{L}^{\infty}(\mathscr{H})}\left\|\mathcal{M}^{-1}\right\|^{2}_{\mathcal{L}^{\infty}(\mathscr{H})}\leq 3\right\}-\delta\\
&\quad\geq P_{H_{1}}\left\{\left\{\left\|\mathcal{M}^{-1}\right\|^{2}_{\mathcal{L}^{\infty}(\mathscr{H})}\leq\frac{3}{2}\right\}\cap\left\{\left\|\mathcal{M}\right\|^{2}_{\mathcal{L}^{\infty}(\mathscr{H})}\leq 2\right\}\right\}-\delta\\
&\quad\geq 1-P_{H_{1}}\left\{\left\|\mathcal{M}^{-1}\right\|^{2}_{\mathcal{L}^{\infty}(\mathscr{H})}\geq\frac{3}{2}\right\}-P_{H_{1}}\left\{\left\|\mathcal{M}\right\|^{2}_{\mathcal{L}^{\infty}(\mathscr{H})}\geq 2\right\}-\delta\\
&\quad\stackrel{({\ddagger})}{\geq}1-2\delta,
\end{align*}

where in $(*)$ we assume $(n+m)\geq\frac{2C_{5}\log\frac{2}{\alpha}}{\sqrt{\delta}}$, and the step then follows by using

$$\zeta\geq c_{2}\left\|\mathcal{M}^{-1}\right\|^{-2}_{\mathcal{L}^{\infty}(\mathscr{H})}\left\|u\right\|_{L^{2}(R)}^{2},$$

which is obtained by combining Lemma A.11 with Lemma A.7 under the assumptions that $u\in\text{Ran}(\mathcal{T}^{\theta})$ and (7.11) holds. Note that $u\in\text{Ran}(\mathcal{T}^{\theta})$ is guaranteed since $(P,Q)\in\mathcal{P}$, and (7.12) guarantees (7.11), as discussed in the proof of Theorem 4.3. $({\dagger})$ follows when

$$\left\|u\right\|_{L^{2}(R)}^{2}\geq\frac{6(\tilde{C}\gamma_{1}+C_{5}\gamma_{2})}{c_{2}}. \tag{7.19}$$

$({\ddagger})$ follows from Sriperumbudur and Sterge (2022, Lemma B.2(ii)) under the condition (7.15). When $C:=\sup_{i}\left\|\phi_{i}\right\|_{\infty}<\infty$, $({\ddagger})$ follows from Lemma A.17 by replacing (7.15) with (7.16).

As in the proof of Theorem 4.3, it can be shown that (7.11), (7.15) and (7.16) are satisfied under the assumptions made in the statement of Theorem 4.7. It can also be verified that (7.19) is implied if $\Delta_{N,M}\geq\frac{r_{1}C_{\lambda}(\log(1/\alpha))^{2}}{\delta^{2}(N+M)^{2}}$ and $\Delta_{N,M}\geq\frac{r_{2}\mathcal{N}_{2}(\lambda)\log(1/\alpha)}{\delta(N+M)}$ for some constants $r_{1},r_{2}>0$. Finally, we have

PH1{η^λq^1wαB,λ}\displaystyle P_{H_{1}}\{\hat{\eta}_{\lambda}\geq\hat{q}_{1-w\alpha}^{B,\lambda}\} PH1{{η^λq1wα+α~λ}{q^1wαB,λq1wα+α~λ}}\displaystyle\geq P_{H_{1}}\left\{\{\hat{\eta}_{\lambda}\geq q_{1-w\alpha+\tilde{\alpha}}^{\lambda}\}\cap\{\hat{q}_{1-w\alpha}^{B,\lambda}\leq q_{1-w\alpha+\tilde{\alpha}}^{\lambda}\}\right\}
1PH1{η^λq1wα+α~λ}PH1{q^1wαB,λ>q1wα+α~λ}\displaystyle\geq 1-P_{H_{1}}\left\{\hat{\eta}_{\lambda}\leq q_{1-w\alpha+\tilde{\alpha}}^{\lambda}\right\}-P_{H_{1}}\left\{\hat{q}_{1-w\alpha}^{B,\lambda}>q_{1-w\alpha+\tilde{\alpha}}^{\lambda}\right\}
()14δδ=15δ,\displaystyle\stackrel{{\scriptstyle(*)}}{{\geq}}1-4\delta-\delta=1-5\delta,

where in ()(*) we use (7.17) and Lemma A.14 by setting α~=w~α\tilde{\alpha}=\tilde{w}\alpha, for 0<w~<w<10<\tilde{w}<w<1. Then, the desired result follows by taking infimum over (P,Q)𝒫(P,Q)\in\mathcal{P}.

7.12 Proof of Corollary 4.8

The proof is similar to that of Corollary 4.4. Since \lambda_{i}\lesssim i^{-\beta}, we have \mathcal{N}_{2}(\lambda)\lesssim\lambda^{-1/2\beta}. Using this bound in the conditions of Theorem 4.7, we obtain that the conditions on \Delta_{N,M} hold if

\Delta_{N,M}\gtrsim\max\left\{\left(\frac{\delta(N+M)}{\log(1/\alpha)}\right)^{-\frac{4\tilde{\theta}\beta}{4\tilde{\theta}\beta+1}},\left(\frac{\delta(N+M)}{\log(1/\alpha)}\right)^{-\frac{8\tilde{\theta}\beta}{4\tilde{\theta}\beta+2\beta+1}},\left(\frac{d_{4}^{-1}(N+M)}{\log(N+M)}\right)^{-2\tilde{\theta}}\right\}, (7.20)

where d_{4}>0 is a constant. Using exactly the same arguments as in the proof of Corollary 4.4, it is easy to verify that the above condition on \Delta_{N,M} is implied if

\Delta_{N,M}=\left\{\begin{array}{ll}c(\alpha,\delta,\theta)\left(N+M\right)^{\frac{-4\tilde{\theta}\beta}{4\tilde{\theta}\beta+1}},&\tilde{\theta}>\frac{1}{2}-\frac{1}{4\beta}\\ c(\alpha,\delta,\theta)\left(\frac{N+M}{\log(N+M)}\right)^{-2\tilde{\theta}},&\tilde{\theta}\leq\frac{1}{2}-\frac{1}{4\beta}\end{array}\right.,

where c(\alpha,\delta,\theta)\gtrsim\max\{\delta^{-2}(\log 1/\alpha)^{2},d_{4}^{2\tilde{\theta}}\}.
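The rate \mathcal{N}_{2}(\lambda)\lesssim\lambda^{-1/2\beta} driving this bound can be sanity-checked numerically. The sketch below is purely illustrative (not part of the proof): it assumes the second-order effective dimension \mathcal{N}_{2}(\lambda)=(\sum_{i}\lambda_{i}^{2}/(\lambda_{i}+\lambda)^{2})^{1/2} and the exact decay \lambda_{i}=i^{-\beta}, with arbitrary example values of \beta and \lambda.

```python
import numpy as np

def n2(lam, beta, imax=10**6):
    """Second-order effective dimension N_2(lam) for eigenvalues i^{-beta} (truncated at imax)."""
    i = np.arange(1, imax + 1, dtype=float)
    ev = i ** (-beta)
    return np.sqrt(np.sum((ev / (ev + lam)) ** 2))

beta = 2.0
lam = 1e-3
# If N_2(lam) ~ lam^{-1/(2*beta)} = lam^{-1/4}, shrinking lam by a factor 16
# should roughly double N_2.
ratio = n2(lam / 16, beta) / n2(lam, beta)
print(ratio)  # close to 16 ** (1 / (2 * beta)) = 2
```

The ratio test avoids unknown constants: only the exponent of the decay is being checked.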

On the other hand, when C:=\sup_{i}\left\|\phi_{i}\right\|_{\infty}<\infty, we obtain the corresponding condition as

\Delta_{N,M}\gtrsim\max\left\{\left(\frac{\delta(N+M)}{\log(1/\alpha)}\right)^{-\frac{4\tilde{\theta}\beta}{4\tilde{\theta}\beta+1}},\left(\frac{\delta(N+M)}{\log(1/\alpha)}\right)^{-\frac{4\tilde{\theta}\beta}{2\tilde{\theta}\beta+1}},d_{5}^{2\tilde{\theta}\beta}\left(\frac{N+M}{\log(N+M)}\right)^{-2\tilde{\theta}\beta}\right\}, (7.21)

for some d_{5}>0, which in turn is implied, for N+M large enough, if

\Delta_{N,M}=\left\{\begin{array}{ll}c(\alpha,\delta,\theta,\beta)\left(N+M\right)^{\frac{-4\tilde{\theta}\beta}{4\tilde{\theta}\beta+1}},&\tilde{\theta}>\frac{1}{4\beta}\\ c(\alpha,\delta,\theta,\beta)\left(\frac{N+M}{\log(N+M)}\right)^{-2\tilde{\theta}\beta},&\tilde{\theta}\leq\frac{1}{4\beta}\end{array}\right.,

where c(\alpha,\delta,\theta,\beta)\gtrsim\max\{\delta^{-2}(\log 1/\alpha)^{2},d_{5}^{2\tilde{\theta}\beta}\}.

7.13 Proof of Corollary 4.9

The proof is similar to that of Corollary 4.5. When \lambda_{i}\lesssim e^{-\tau i}, we have \mathcal{N}_{2}(\lambda)\lesssim\sqrt{\log\frac{1}{\lambda}}. Substituting this in the conditions of Theorem 4.7 and assuming that (N+M)\geq\max\{e^{2},\delta^{-1}(\log 1/\alpha)\}, we can write the separation boundary as

\Delta_{N,M}\gtrsim\max\left\{\left(\frac{\sqrt{2\tilde{\theta}}\delta(N+M)}{\log(1/\alpha)\sqrt{\log(N+M)}}\right)^{-1},\left(\frac{\delta(N+M)\log(1/\alpha)^{-1}}{\sqrt{\log(N+M)}}\right)^{-\frac{4\tilde{\theta}}{2\tilde{\theta}+1}},\left(\frac{d_{6}^{-1}(N+M)}{\log(N+M)}\right)^{-2\tilde{\theta}}\right\}, (7.22)

where d_{6}>0 is a constant. This condition in turn is implied if

\Delta_{N,M}=\left\{\begin{array}{ll}c(\alpha,\delta,\theta)\frac{\sqrt{\log(N+M)}}{N+M},&\tilde{\theta}>\frac{1}{2}\\ c(\alpha,\delta,\theta)\left(\frac{\log(N+M)}{N+M}\right)^{2\tilde{\theta}},&\tilde{\theta}\leq\frac{1}{2}\end{array}\right.

for large enough N+M, where

c(\alpha,\delta,\theta)\gtrsim\max\left\{\sqrt{\frac{1}{2\tilde{\theta}}},1\right\}\max\{\delta^{-2}(\log 1/\alpha)^{2},d_{5}^{2\tilde{\theta}}\},

and we used that \tilde{\theta}\geq\frac{1}{2} implies 1\leq\min\left\{2\tilde{\theta},\frac{4\tilde{\theta}}{2\tilde{\theta}+1}\right\} and that \tilde{\theta}<\frac{1}{2} implies 2\tilde{\theta}\leq\min\left\{1,\frac{4\tilde{\theta}}{2\tilde{\theta}+1}\right\}. On the other hand, when C:=\sup_{i}\left\|\phi_{i}\right\|_{\infty}<\infty, we obtain

\Delta_{N,M}\gtrsim\max\left\{\left(\frac{\sqrt{2\tilde{\theta}}\delta(N+M)}{\log(1/\alpha)\sqrt{\log(N+M)}}\right)^{-1},\frac{1}{2\tilde{\theta}}\left(\frac{\delta(N+M)}{\log(1/\alpha)\sqrt{\log(N+M)}}\right)^{-2},e^{\frac{-2\tilde{\theta}d_{7}(N+M)}{\log(N+M)}}\right\}

for some constant d_{7}>0. This condition reduces to

\Delta_{N,M}=c(\alpha,\delta,\theta)\frac{\sqrt{\log(N+M)}}{N+M},

where c(\alpha,\delta,\theta)\gtrsim\max\left\{\sqrt{\frac{1}{2\tilde{\theta}}},\frac{1}{2\tilde{\theta}},1\right\}\delta^{-2}(\log 1/\alpha)^{2}, and we used e^{\frac{-ax}{\log x}}\leq(ax\sqrt{\log x})^{-1} for a,x>0 when x is large enough.
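The bound \mathcal{N}_{2}(\lambda)\lesssim\sqrt{\log\frac{1}{\lambda}} used above can also be checked numerically. As before, this sketch is illustrative only: it assumes \mathcal{N}_{2}(\lambda)=(\sum_{i}\lambda_{i}^{2}/(\lambda_{i}+\lambda)^{2})^{1/2} with the exact decay \lambda_{i}=e^{-\tau i}, and the values of \tau and \lambda are arbitrary.

```python
import numpy as np

def n2(lam, tau=1.0, imax=2000):
    """Second-order effective dimension N_2(lam) for eigenvalues e^{-tau * i}."""
    i = np.arange(1, imax + 1, dtype=float)
    ev = np.exp(-tau * i)
    return np.sqrt(np.sum((ev / (ev + lam)) ** 2))

# Squaring lam doubles log(1/lam), so if N_2(lam) ~ sqrt(log(1/lam)),
# the ratio below should be near sqrt(2) (up to lower-order terms).
ratio = n2(1e-8) / n2(1e-4)
print(ratio)
```

Because \mathcal{N}_{2}(\lambda)^{2}\approx\log(1/\lambda)/\tau plus an O(1) term, the ratio only approaches \sqrt{2} slowly, so a loose tolerance is appropriate.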

7.14 Proof of Theorem 4.10

The proof follows from Lemma A.16 and Theorem 4.6 by using \frac{\alpha}{|\Lambda|} in place of \alpha.

7.15 Proof of Theorem 4.11

Using \hat{q}_{1-\frac{w\alpha}{|\Lambda|}}^{B,\lambda} as the threshold, the same steps as in the proof of Theorem 4.7 follow, with the only difference being \alpha replaced by \frac{\alpha}{|\Lambda|}. This leads to an extra factor of \log|\Lambda| in the expression for \gamma_{2} in condition (7.19), which will show up in the expression for the separation boundary (i.e., there will be a factor of \log\frac{|\Lambda|}{\alpha} instead of \log\frac{1}{\alpha}). Observe that in all cases of Theorem 4.11, we have |\Lambda|=1+\log_{2}\frac{\lambda_{U}}{\lambda_{L}}\lesssim\log(N+M).

For the case \lambda_{i}\lesssim i^{-\beta}, we can deduce from the proof of Corollary 4.8 (see (7.20)) that when \lambda=d_{3}^{-1/2\tilde{\theta}}\Delta_{N,M}^{1/2\tilde{\theta}} for some d_{3}>0, then

P_{H_{1}}\left\{\hat{\eta}_{\lambda}\geq\hat{q}_{1-\frac{w\alpha}{|\Lambda|}}^{B,\lambda}\right\}\geq 1-5\delta,

and the condition on the separation boundary becomes

\Delta_{N,M}\gtrsim\max\left\{\left(\frac{\delta(N+M)}{\log\left(\frac{\log(N+M)}{\alpha}\right)}\right)^{-\frac{4\tilde{\theta}\beta}{4\tilde{\theta}\beta+1}},\left(\frac{\delta(N+M)}{\log\left(\frac{\log(N+M)}{\alpha}\right)}\right)^{-\frac{8\tilde{\theta}\beta}{4\tilde{\theta}\beta+2\beta+1}},\left(\frac{d_{5}^{-1}(N+M)}{\log(N+M)}\right)^{-2\tilde{\theta}}\right\},

which in turn is implied if

\Delta_{N,M}=c(\alpha,\delta,\theta)\max\left\{\left(\frac{N+M}{\log\log(N+M)}\right)^{-\frac{4\tilde{\theta}\beta}{4\tilde{\theta}\beta+1}},\left(\frac{N+M}{\log(N+M)}\right)^{-2\tilde{\theta}}\right\},

where c(\alpha,\delta,\theta)\gtrsim\max\{\delta^{-2}(\log 1/\alpha)^{2},d_{4}^{2\tilde{\theta}}\} for some d_{4}>0, and we used that \log\frac{x}{\alpha}\leq\log\frac{1}{\alpha}\log x for large enough x\in\mathbb{R}.

Note that the optimal choice of \lambda is given by

\lambda=\lambda^{*}:=d_{3}^{-1/2\tilde{\theta}}c(\alpha,\delta,\theta)^{1/2\tilde{\theta}}\max\left\{\left(\frac{N+M}{\log\log(N+M)}\right)^{-\frac{2\beta}{4\tilde{\theta}\beta+1}},\left(\frac{N+M}{\log(N+M)}\right)^{-1}\right\}.

Observe that the constant term d_{3}^{-1/2\tilde{\theta}}c(\alpha,\delta,\theta)^{1/2\tilde{\theta}} can be expressed as r_{1}^{1/2\tilde{\theta}} for some constant r_{1} that depends only on \delta and \alpha. If r_{1}\leq 1, we can bound r_{1}^{1/2\tilde{\theta}} as r_{1}^{1/2\theta_{l}}\leq r_{1}^{1/2\tilde{\theta}}\leq r_{1}^{1/2\xi}, and as r_{1}^{1/2\xi}\leq r_{1}^{1/2\tilde{\theta}}\leq r_{1}^{1/2\theta_{l}} when r_{1}>1. Therefore, for any \theta and \beta, the optimal \lambda can be bounded as r_{2}\left(\frac{N+M}{\log(N+M)}\right)^{-1}\leq\lambda^{*}\leq r_{3}\left(\frac{N+M}{\log(N+M)}\right)^{\frac{-2}{4\tilde{\xi}+1}}, where r_{2},r_{3} are constants that depend only on \delta, \alpha, \xi and \theta_{l}, and \tilde{\xi}=\max\{\xi,1/4\}.

\blacklozenge Define v^{*}:=\sup\{x\in\Lambda:x\leq\lambda^{*}\}. From the definition of \Lambda, it is easy to see that \lambda_{L}\leq\lambda^{*}\leq\lambda_{U} and \frac{\lambda^{*}}{2}\leq v^{*}\leq\lambda^{*}. Thus v^{*}\in\Lambda is an optimal choice of \lambda that yields the same form of the separation boundary up to constants. Therefore, by Lemma A.16, for any \theta and any (P,Q)\in\mathcal{P}, we have

P_{H_{1}}\left\{\bigcup_{\lambda\in\Lambda}\left\{\hat{\eta}_{\lambda}\geq\hat{q}_{1-\frac{w\alpha}{|\Lambda|}}^{B,\lambda}\right\}\right\}\geq 1-5\delta.

Thus the desired result holds by taking the infimum over (P,Q)\in\mathcal{P} and \theta. \spadesuit
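The argument between \blacklozenge and \spadesuit only uses the dyadic structure of \Lambda. A small illustrative sketch (with hypothetical endpoints \lambda_{L}, \lambda_{U} and test values of \lambda^{*}) checks the two facts used: every \lambda^{*}\in[\lambda_{L},\lambda_{U}] admits a grid point v^{*} with \lambda^{*}/2\leq v^{*}\leq\lambda^{*}, and |\Lambda|=1+\log_{2}(\lambda_{U}/\lambda_{L}).

```python
import math

lam_L, lam_U = 1e-6, 1.0  # illustrative grid endpoints
# Dyadic grid Lambda = {lam_L * 2^k : k = 0, 1, ...} up to lam_U
grid = [lam_L * 2 ** k for k in range(int(math.log2(lam_U / lam_L)) + 1)]

def v_star(lam_star):
    """Largest grid point not exceeding lam_star."""
    return max(x for x in grid if x <= lam_star)

for lam_star in [3e-6, 1e-4, 0.007, 0.3, 1.0]:
    v = v_star(lam_star)
    # consecutive grid points differ by a factor 2, so v* >= lam_star / 2
    assert lam_star / 2 <= v <= lam_star

print(len(grid))  # |Lambda| = 1 + log2(lam_U / lam_L) = 20 here
```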

When \lambda_{i}\lesssim i^{-\beta} and C:=\sup_{i}\left\|\phi_{i}\right\|_{\infty}<\infty, then using (7.21), the condition on the separation boundary becomes

\Delta_{N,M}=c(\alpha,\delta,\theta,\beta)\max\left\{\left(\frac{N+M}{\log\log(N+M)}\right)^{-\frac{4\tilde{\theta}\beta}{4\tilde{\theta}\beta+1}},\left(\frac{N+M}{\log(N+M)}\right)^{-2\tilde{\theta}\beta}\right\},

where c(\alpha,\delta,\theta,\beta)\gtrsim\max\{\delta^{-2}(\log 1/\alpha)^{2},d_{5}^{2\tilde{\theta}\beta}\}. This yields the optimal \lambda to be

\lambda^{*}:=d_{3}^{-1/2\tilde{\theta}}c(\alpha,\delta,\theta,\beta)^{1/2\tilde{\theta}}\max\left\{\left(\frac{N+M}{\log\log(N+M)}\right)^{-\frac{2\beta}{4\tilde{\theta}\beta+1}},\left(\frac{N+M}{\log(N+M)}\right)^{-\beta}\right\}.

Using a similar argument as in the previous case, we can deduce that for any \theta and \beta, we have r_{4}\left(\frac{N+M}{\log(N+M)}\right)^{-\beta_{u}}\leq\lambda^{*}\leq r_{5}\left(\frac{N+M}{\log(N+M)}\right)^{\frac{-2}{4\tilde{\xi}+1}}, where r_{4},r_{5} are constants that depend only on \delta, \alpha, \xi, \theta_{l}, and \beta_{u}. The claim therefore follows by using the argument between \blacklozenge and \spadesuit.

For the case \lambda_{i}\lesssim e^{-\tau i}, \tau>0, the condition from (7.22) becomes

\Delta_{N,M}=c(\alpha,\delta,\theta)\max\left\{\left(\frac{N+M}{\sqrt{\log(N+M)}\log\log(N+M)}\right)^{-1},\left(\frac{N+M}{\log(N+M)}\right)^{-2\tilde{\theta}}\right\},

where c(\alpha,\delta,\theta)\gtrsim\max\left\{\sqrt{\frac{1}{2\tilde{\theta}}},1\right\}\max\left\{\delta^{-2}(\log 1/\alpha)^{2},d_{4}^{2\tilde{\theta}}\right\}. Thus

\lambda^{*}=d_{3}^{-1/2\tilde{\theta}}c(\alpha,\delta,\theta)^{1/2\tilde{\theta}}\max\left\{\left(\frac{N+M}{\sqrt{\log(N+M)}\log\log(N+M)}\right)^{-1/2\tilde{\theta}},\left(\frac{N+M}{\log(N+M)}\right)^{-1}\right\},

which can be bounded as r_{6}\left(\frac{N+M}{\log(N+M)}\right)^{-1}\leq\lambda^{*}\leq r_{7}\left(\frac{N+M}{\log(N+M)}\right)^{-1/2\xi} for any \theta, where r_{6},r_{7} are constants that depend only on \delta, \alpha, \xi and \theta_{l}. Furthermore, when C:=\sup_{i}\left\|\phi_{i}\right\|_{\infty}<\infty, the condition on the separation boundary becomes

\Delta_{N,M}=c(\alpha,\delta,\theta)\frac{\sqrt{\log(N+M)}\log\log(N+M)}{N+M},

where c(\alpha,\delta,\theta)\gtrsim\max\left\{\sqrt{\frac{1}{2\tilde{\theta}}},\frac{1}{2\tilde{\theta}},1\right\}\delta^{-2}(\log 1/\alpha)^{2}. Thus

\lambda^{*}=d_{3}^{-1/2\tilde{\theta}}c(\alpha,\delta,\theta)^{1/2\tilde{\theta}}\left(\frac{N+M}{\sqrt{\log(N+M)}\log\log(N+M)}\right)^{-1/2\tilde{\theta}},

which can be bounded as

r_{8}\left(\frac{N+M}{\sqrt{\log(N+M)}\log\log(N+M)}\right)^{-1/2\theta_{l}}\leq\lambda^{*}\leq r_{9}\left(\frac{N+M}{\sqrt{\log(N+M)}\log\log(N+M)}\right)^{-1/2\xi}

for any \theta, where r_{8},r_{9} are constants that depend only on \delta, \alpha, \xi and \theta_{l}. The claim therefore follows by the same argument as in the polynomial decay case.

Acknowledgments

The authors thank the reviewers for their valuable comments and constructive feedback, which helped to significantly improve the paper. OH and BKS are partially supported by National Science Foundation (NSF) CAREER award DMS–1945396. BL is supported by NSF grant DMS–2210775.

References

  • Adams and Fournier (2003) R. A. Adams and J. J. F. Fournier. Sobolev Spaces. Academic Press, 2003.
  • Albert et al. (2022) M. Albert, B. Laurent, A. Marrel, and A. Meynaoui. Adaptive test of independence based on HSIC measures. The Annals of Statistics, 50(2):858 – 879, 2022.
  • Aronszajn (1950) N. Aronszajn. Theory of reproducing kernels. Trans. Amer. Math. Soc., 68:337–404, 1950.
  • Balasubramanian et al. (2021) K. Balasubramanian, T. Li, and M. Yuan. On the optimality of kernel-embedding based goodness-of-fit tests. Journal of Machine Learning Research, 22(1):1–45, 2021.
  • Bauer et al. (2007) F. Bauer, S. Pereverzev, and L. Rosasco. On regularization algorithms in learning theory. Journal of Complexity, 23(1):52–72, 2007.
  • Burnashev (1979) M. V. Burnashev. On the minimax detection of an inaccurately known signal in a white Gaussian noise background. Theory of Probability & Its Applications, 24(1):107–119, 1979.
  • Caponnetto and Vito (2007) A. Caponnetto and E. De Vito. Optimal rates for regularized least-squares algorithm. Foundations of Computational Mathematics, 7:331–368, 2007.
  • Cucker and Zhou (2007) F. Cucker and D. X. Zhou. Learning Theory: An Approximation Theory Viewpoint. Cambridge University Press, Cambridge, UK, 2007.
  • Dinculeanu (2000) N. Dinculeanu. Vector Integration and Stochastic Integration in Banach Spaces. John-Wiley & Sons, Inc., 2000.
  • Drineas and Mahoney (2005) P. Drineas and M. W. Mahoney. On the Nyström method for approximating a Gram matrix for improved kernel-based learning. Journal of Machine Learning Research, 6:2153–2175, December 2005.
  • Dvoretzky et al. (1956) A. Dvoretzky, J. Kiefer, and J. Wolfowitz. Asymptotic minimax character of the sample distribution function and of the classical multinomial estimator. The Annals of Mathematical Statistics, 27(3):642–669, 1956.
  • Engl et al. (1996) H. W. Engl, M. Hanke, and A. Neubauer. Regularization of Inverse Problems. Kluwer Academic Publishers, Dordrecht, The Netherlands, 1996.
  • Fasano and Franceschini (1987) G. Fasano and A. Franceschini. A multidimensional version of the Kolmogorov–Smirnov test. Monthly Notices of the Royal Astronomical Society, 225(1):155–170, 1987.
  • Gönen and Alpaydin (2011) M. Gönen and E. Alpaydin. Multiple kernel learning algorithms. Journal of Machine Learning Research, 12(64):2211–2268, 2011.
  • Gretton et al. (2006) A. Gretton, K. Borgwardt, M. Rasch, B. Schölkopf, and A. Smola. A kernel method for the two-sample problem. In B. Schölkopf, J. Platt, and T. Hoffman, editors, Advances in Neural Information Processing Systems, volume 19, pages 513–520. MIT Press, 2006.
  • Gretton et al. (2009) A. Gretton, K. Fukumizu, Z. Harchaoui, and B. K. Sriperumbudur. A fast, consistent kernel two-sample test. In Y. Bengio, D. Schuurmans, J. Lafferty, C. Williams, and A. Culotta, editors, Advances in Neural Information Processing Systems, volume 22. Curran Associates, Inc., 2009.
  • Gretton et al. (2012) A. Gretton, K. M. Borgwardt, M. J. Rasch, B. Schölkopf, and A. Smola. A kernel two-sample test. Journal of Machine Learning Research, 13(25):723–773, 2012.
  • Harchaoui et al. (2007) Z. Harchaoui, F. R. Bach, and E. Moulines. Testing for homogeneity with kernel Fisher discriminant analysis. In J. Platt, D. Koller, Y. Singer, and S. Roweis, editors, Advances in Neural Information Processing Systems, volume 20. Curran Associates, Inc., 2007.
  • Hoeffding (1992) W. Hoeffding. A class of statistics with asymptotically normal distribution. In Breakthroughs in Statistics, pages 308–334, 1992.
  • Ingster (2000) Y. Ingster. Adaptive chi-square tests. Journal of Mathematical Sciences, 99:1110–1119, 2000.
  • Ingster (1987) Y. I. Ingster. Minimax testing of nonparametric hypotheses on a distribution density in the Lp{L}_{p} metrics. Theory of Probability & Its Applications, 31(2):333–337, 1987.
  • Ingster (1993) Y. I. Ingster. Asymptotically minimax hypothesis testing for nonparametric alternatives I, II, III. Mathematical Methods of Statistics, 2(2):85–114, 1993.
  • Kim et al. (2022) I. Kim, S. Balakrishnan, and L. Wasserman. Minimax optimality of permutation tests. The Annals of Statistics, 50(1):225–251, 2022.
  • Le Cam (1986) L. Le Cam. Asymptotic Methods In Statistical Decision Theory. Springer, 1986.
  • LeCun et al. (2010) Y. LeCun, C. Cortes, and C. Burges. MNIST handwritten digit database. AT&T Labs, 2010.
  • Lehmann and Romano (2006) E. L. Lehmann and J. P. Romano. Testing Statistical Hypotheses. Springer Science & Business Media, 2006.
  • Li and Yuan (2019) T. Li and M. Yuan. On the optimality of Gaussian kernel based nonparametric tests against smooth alternatives. 2019. https://arxiv.org/pdf/1909.03302.pdf.
  • Massart (1990) P. Massart. The tight constant in the Dvoretzky-Kiefer-Wolfowitz Inequality. The Annals of Probability, 18(3):1269–1283, 1990.
  • Mendelson and Neeman (2010) S. Mendelson and J. Neeman. Regularization in kernel learning. The Annals of Statistics, 38(1):526–565, 2010.
  • Minh et al. (2006) H. Q. Minh, P. Niyogi, and Y. Yao. Mercer’s theorem, feature maps, and smoothing. In Gábor Lugosi and Hans Ulrich Simon, editors, Learning Theory, pages 154–168, Berlin, 2006. Springer.
  • Muandet et al. (2017) K. Muandet, K. Fukumizu, B. Sriperumbudur, and B. Schölkopf. Kernel mean embedding of distributions: A review and beyond. Foundations and Trends® in Machine Learning, 10(1-2):1–141, 2017.
  • Pesarin and Salmaso (2010) F. Pesarin and L. Salmaso. Permutation Tests for Complex Data: Theory, Applications and Software. John Wiley & Sons, 2010.
  • Puritz et al. (2022) C. Puritz, E. Ness-Cohn, and R. Braun. fasano.franceschini.test: An implementation of a multidimensional KS test in R, 2022.
  • Rahimi and Recht (2008) A. Rahimi and B. Recht. Random features for large-scale kernel machines. In J. C. Platt, D. Koller, Y. Singer, and S. T. Roweis, editors, Advances in Neural Information Processing Systems 20, pages 1177–1184. Curran Associates, Inc., 2008.
  • Reed and Simon (1980) M. Reed and B. Simon. Methods of Modern Mathematical Physics: Functional Analysis I. Academic Press, New York, 1980.
  • Romano and Wolf (2005) J. P. Romano and M. Wolf. Exact and approximate stepdown methods for multiple hypothesis testing. Journal of the American Statistical Association, 100(469):94–108, 2005.
  • Schrab et al. (2021) A. Schrab, I. Kim, M. Albert, B. Laurent, B. Guedj, and A. Gretton. MMD aggregated two-sample test. 2021. https://arxiv.org/pdf/2110.15073.pdf.
  • Simon-Gabriel and Schölkopf (2018) C. Simon-Gabriel and B. Schölkopf. Kernel distribution embeddings: Universal kernels, characteristic kernels and kernel metrics on distributions. Journal of Machine Learning Research, 19(44):1–29, 2018.
  • Smola et al. (2007) A. J. Smola, A. Gretton, L. Song, and B. Schölkopf. A Hilbert space embedding for distributions. In Marcus Hutter, Rocco A. Servedio, and Eiji Takimoto, editors, Algorithmic Learning Theory, pages 13–31. Springer-Verlag, Berlin, Germany, 2007.
  • Sriperumbudur (2016) B. K. Sriperumbudur. On the optimal estimation of probability measures in weak and strong topologies. Bernoulli, 22(3):1839 – 1893, 2016.
  • Sriperumbudur and Sterge (2022) B. K. Sriperumbudur and N. Sterge. Approximate kernel PCA using random features: Computational vs. statistical trade-off. The Annals of Statistics, 50(5):2713–2736, 2022.
  • Sriperumbudur et al. (2009) B. K. Sriperumbudur, K. Fukumizu, A. Gretton, G. R. G. Lanckriet, and B. Schölkopf. Kernel choice and classifiability for RKHS embeddings of probability distributions. In Y. Bengio, D. Schuurmans, J. Lafferty, C. K. I. Williams, and A. Culotta, editors, Advances in Neural Information Processing Systems 22, pages 1750–1758, Cambridge, MA, 2009. MIT Press.
  • Sriperumbudur et al. (2010) B. K. Sriperumbudur, A. Gretton, K. Fukumizu, B. Schölkopf, and G. R. G. Lanckriet. Hilbert space embeddings and metrics on probability measures. Journal of Machine Learning Research, 11:1517–1561, 2010.
  • Sriperumbudur et al. (2011) B. K. Sriperumbudur, K. Fukumizu, and G. R. Lanckriet. Universality, characteristic kernels and RKHS embedding of measures. Journal of Machine Learning Research, 12:2389–2410, 2011.
  • Steinwart and Christmann (2008) I. Steinwart and A. Christmann. Support Vector Machines. Springer, New York, 2008.
  • Steinwart and Scovel (2012) I. Steinwart and C. Scovel. Mercer’s theorem on general domains: On the interaction between measures, kernels, and RKHSs. Constructive Approximation, 35:363–417, 2012.
  • Steinwart et al. (2006) I. Steinwart, D. R. Hush, and C. Scovel. An explicit description of the reproducing kernel Hilbert spaces of Gaussian RBF kernels. IEEE Transactions on Information Theory, 52:4635–4643, 2006.
  • Sterge and Sriperumbudur (2022) N. Sterge and B. K. Sriperumbudur. Statistical optimality and computational efficiency of Nyström kernel PCA. Journal of Machine Learning Research, 23(337):1–32, 2022.
  • Szekely and Rizzo (2004) G. Szekely and M. Rizzo. Testing for equal distributions in high dimension. InterStat, 5, 2004.
  • Williams and Seeger (2001) C. K. I. Williams and M. Seeger. Using the Nyström method to speed up kernel machines. In T. K. Leen, T. G. Dietterich, and V. Tresp, editors, Advances in Neural Information Processing Systems 13, pages 682–688, Cambridge, MA, 2001. MIT Press.
  • Yang et al. (2017) Y. Yang, M. Pilanci, and M. J. Wainwright. Randomized sketches for kernels: Fast and optimal nonparametric regression. Annals of Statistics, 45(3):991–1023, 2017.

A Technical results

In this section, we collect technical results used to prove the main results of the paper. Unless specified otherwise, the notation used in this section matches that of the main paper.

Lemma A.1.

Let X and Y be random variables, let \gamma be a function of Y, and define \zeta=\mathbb{E}[X|Y]. Suppose that for all \delta>0,

P\left\{\zeta\geq\gamma(Y)+\sqrt{\frac{\text{Var}(X|Y)}{\delta}}\right\}\geq 1-\delta.

Then

P\{X\geq\gamma(Y)\}\geq 1-2\delta.
Proof.

Define \gamma_{1}=\sqrt{\frac{\text{Var}(X|Y)}{\delta}}. Consider

\displaystyle P\{X\geq\gamma(Y)\}\geq P\{\{X\geq\zeta-\gamma_{1}\}\cap\{\gamma(Y)\leq\zeta-\gamma_{1}\}\}
\displaystyle\geq 1-P\{X\leq\zeta-\gamma_{1}\}-P\{\gamma(Y)\geq\zeta-\gamma_{1}\}
\displaystyle\geq 1-P\{|X-\zeta|\geq\gamma_{1}\}-P\{\gamma(Y)\geq\zeta-\gamma_{1}\}
\displaystyle\geq 1-2\delta,

where in the last step we invoked Chebyshev's inequality, P\{|X-\zeta|\geq\gamma_{1}\}\leq\delta, together with the assumption of the lemma. ∎
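A quick Monte Carlo illustration of the lemma (the distributions below are arbitrary, illustrative choices): take X = Y + ε with ε ~ N(0,1) independent of Y, so that ζ = 𝔼[X|Y] = Y and Var(X|Y) = 1, and let γ(Y) = Y − √(1/δ), which satisfies the hypothesis of the lemma with equality.

```python
import numpy as np

rng = np.random.default_rng(0)
delta = 0.1
n = 100_000

Y = rng.uniform(size=n)
X = Y + rng.standard_normal(n)      # zeta = E[X|Y] = Y, Var(X|Y) = 1
gamma = Y - np.sqrt(1.0 / delta)    # hypothesis holds: zeta = gamma(Y) + sqrt(Var/delta)

coverage = np.mean(X >= gamma)
print(coverage)  # well above 1 - 2*delta = 0.8; Chebyshev makes the bound conservative
```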

Lemma A.2.

Define a(x)=g_{\lambda}^{1/2}(\hat{\Sigma}_{PQ})(K(\cdot,x)-\mu), where \mu is an arbitrary function in \mathscr{H}. Then \hat{\eta}_{\lambda} defined in (4.6) can be written as

\hat{\eta}_{\lambda}=\frac{1}{n(n-1)}\sum_{i\neq j}{\left\langle a(X_{i}),a(X_{j})\right\rangle}_{\mathscr{H}}+\frac{1}{m(m-1)}\sum_{i\neq j}{\left\langle a(Y_{i}),a(Y_{j})\right\rangle}_{\mathscr{H}}-\frac{2}{nm}\sum_{i,j}{\left\langle a(X_{i}),a(Y_{j})\right\rangle}_{\mathscr{H}}.
Proof.

The proof follows by substituting a(x)+g^{1/2}_{\lambda}(\hat{\Sigma}_{PQ})\mu for g^{1/2}_{\lambda}(\hat{\Sigma}_{PQ})K(\cdot,x) in \hat{\eta}_{\lambda}, as shown below:

\displaystyle\hat{\eta}_{\lambda}=\frac{1}{n(n-1)}\sum_{i\neq j}{\left\langle a(X_{i})+g^{1/2}_{\lambda}(\hat{\Sigma}_{PQ})\mu,a(X_{j})+g^{1/2}_{\lambda}(\hat{\Sigma}_{PQ})\mu\right\rangle}_{\mathscr{H}}
\displaystyle\quad+\frac{1}{m(m-1)}\sum_{i\neq j}{\left\langle a(Y_{i})+g^{1/2}_{\lambda}(\hat{\Sigma}_{PQ})\mu,a(Y_{j})+g^{1/2}_{\lambda}(\hat{\Sigma}_{PQ})\mu\right\rangle}_{\mathscr{H}}
\displaystyle\quad-\frac{2}{nm}\sum_{i,j}{\left\langle a(X_{i})+g^{1/2}_{\lambda}(\hat{\Sigma}_{PQ})\mu,a(Y_{j})+g^{1/2}_{\lambda}(\hat{\Sigma}_{PQ})\mu\right\rangle}_{\mathscr{H}},

and noting that all the terms in the expansion of the inner products cancel except for those of the form \left\langle a(\cdot),a(\cdot)\right\rangle_{\mathscr{H}}. ∎
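In finite dimensions this cancellation can be checked directly. The sketch below is illustrative only: a random matrix B stands in for g_{\lambda}^{1/2}(\hat{\Sigma}_{PQ}), and explicit feature vectors stand in for K(\cdot,x); it confirms that the statistic is unchanged as the centering \mu varies.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n, m = 5, 8, 6
B = rng.standard_normal((d, d))       # stand-in for g_lambda^{1/2}(Sigma_hat_PQ)
phiX = rng.standard_normal((n, d))    # feature vectors of X_1, ..., X_n
phiY = rng.standard_normal((m, d))    # feature vectors of Y_1, ..., Y_m

def stat(mu):
    """The centered two-sample statistic with a(x) = B (phi(x) - mu)."""
    aX = (phiX - mu) @ B.T
    aY = (phiY - mu) @ B.T
    gX, gY = aX @ aX.T, aY @ aY.T
    xx = (gX.sum() - np.trace(gX)) / (n * (n - 1))   # off-diagonal average
    yy = (gY.sum() - np.trace(gY)) / (m * (m - 1))
    xy = (aX @ aY.T).sum() / (n * m)
    return xx + yy - 2 * xy

vals = [stat(rng.standard_normal(d)) for _ in range(3)] + [stat(np.zeros(d))]
print(np.ptp(vals))  # spread is numerically zero: the statistic does not depend on mu
```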

Lemma A.3.

Let (G_{i})_{i=1}^{n} and (F_{i})_{i=1}^{m} be independent sequences of zero-mean \mathscr{H}-valued random elements, and let f be an arbitrary function in \mathscr{H}. Then the following hold.

  1. (i)

    \mathbb{E}\left(\sum_{i,j}\left\langle G_{i},F_{j}\right\rangle_{\mathscr{H}}\right)^{2}=\sum_{i,j}\mathbb{E}\left\langle G_{i},F_{j}\right\rangle_{\mathscr{H}}^{2};

  2. (ii)

    \mathbb{E}\left(\sum_{i\neq j}\left\langle G_{i},G_{j}\right\rangle_{\mathscr{H}}\right)^{2}=2\sum_{i\neq j}\mathbb{E}\left\langle G_{i},G_{j}\right\rangle_{\mathscr{H}}^{2};

  3. (iii)

    \mathbb{E}\left(\sum_{i}\left\langle G_{i},f\right\rangle_{\mathscr{H}}\right)^{2}=\sum_{i}\mathbb{E}\left\langle G_{i},f\right\rangle_{\mathscr{H}}^{2}.

Proof.

(i) can be shown as follows:

\displaystyle\mathbb{E}\left(\sum_{i,j}\left\langle G_{i},F_{j}\right\rangle_{\mathscr{H}}\right)^{2}=\sum_{i,j,k,l}\mathbb{E}[\left\langle G_{i},F_{j}\right\rangle_{\mathscr{H}}\left\langle G_{k},F_{l}\right\rangle_{\mathscr{H}}]
\displaystyle=\sum_{i,j}\mathbb{E}\left\langle G_{i},F_{j}\right\rangle_{\mathscr{H}}^{2}+\sum_{j}\sum_{i\neq k}\mathbb{E}\left[\left\langle G_{i},F_{j}\right\rangle_{\mathscr{H}}\left\langle G_{k},F_{j}\right\rangle_{\mathscr{H}}\right]
\displaystyle\quad+\sum_{i}\sum_{j\neq l}\mathbb{E}\left[\left\langle G_{i},F_{j}\right\rangle_{\mathscr{H}}\left\langle G_{i},F_{l}\right\rangle_{\mathscr{H}}\right]+\sum_{j\neq l}\sum_{i\neq k}\mathbb{E}\left[\left\langle G_{i},F_{j}\right\rangle_{\mathscr{H}}\left\langle G_{k},F_{l}\right\rangle_{\mathscr{H}}\right]
\displaystyle=\sum_{i,j}\mathbb{E}\left\langle G_{i},F_{j}\right\rangle_{\mathscr{H}}^{2}+\sum_{j}\sum_{i\neq k}\left\langle\mathbb{E}(G_{i}),\mathbb{E}(F_{j}\left\langle G_{k},F_{j}\right\rangle_{\mathscr{H}})\right\rangle_{\mathscr{H}}
\displaystyle\quad+\sum_{i}\sum_{j\neq l}\left\langle\mathbb{E}(G_{i}\left\langle G_{i},F_{l}\right\rangle_{\mathscr{H}}),\mathbb{E}(F_{j})\right\rangle_{\mathscr{H}}+\sum_{j\neq l}\sum_{i\neq k}\left\langle\mathbb{E}(G_{i}),\mathbb{E}(F_{j}\left\langle G_{k},F_{l}\right\rangle_{\mathscr{H}})\right\rangle_{\mathscr{H}}
\displaystyle=\sum_{i,j}\mathbb{E}\left\langle G_{i},F_{j}\right\rangle_{\mathscr{H}}^{2}.

For (ii), we have

\displaystyle\mathbb{E}\left(\sum_{i\neq j}\left\langle G_{i},G_{j}\right\rangle_{\mathscr{H}}\right)^{2}=\sum_{i\neq j}\sum_{k\neq l}\mathbb{E}\left[\left\langle G_{i},G_{j}\right\rangle_{\mathscr{H}}\left\langle G_{k},G_{l}\right\rangle_{\mathscr{H}}\right]
\displaystyle=\sum_{i\neq j}\mathbb{E}\left[\left\langle G_{i},G_{j}\right\rangle_{\mathscr{H}}\left\langle G_{i},G_{j}\right\rangle_{\mathscr{H}}\right]+\sum_{i\neq j}\mathbb{E}\left[\left\langle G_{i},G_{j}\right\rangle_{\mathscr{H}}\left\langle G_{j},G_{i}\right\rangle_{\mathscr{H}}\right]
\displaystyle\quad+\sum_{\begin{subarray}{c}\{i,j\}\neq\{k,l\}\\ i\neq j\\ k\neq l\end{subarray}}\mathbb{E}\left[\left\langle G_{i},G_{j}\right\rangle_{\mathscr{H}}\left\langle G_{k},G_{l}\right\rangle_{\mathscr{H}}\right].

Consider the last term and note that \{i,j\}\neq\{k,l\}, i\neq j and k\neq l imply that either k\notin\{i,j\} or l\notin\{i,j\}. If k\notin\{i,j\}, then

\mathbb{E}\left[\left\langle G_{i},G_{j}\right\rangle_{\mathscr{H}}\left\langle G_{k},G_{l}\right\rangle_{\mathscr{H}}\right]=\left\langle\mathbb{E}(G_{k}),\mathbb{E}(\left\langle G_{i},G_{j}\right\rangle_{\mathscr{H}}G_{l})\right\rangle_{\mathscr{H}}=0,

and the same result holds when l{i,j}l\notin\{i,j\}. Therefore we conclude that the third term equals zero and the result follows.
(iii) Note that

𝔼(iGi,f)2\displaystyle\mathbb{E}\left(\sum_{i}\left\langle G_{i},f\right\rangle_{\mathscr{H}}\right)^{2} =\displaystyle{}={} i𝔼Gi,f2+ij𝔼Gi,fGj,f\displaystyle\sum_{i}\mathbb{E}{\left\langle G_{i},f\right\rangle}_{\mathscr{H}}^{2}+\sum_{i\neq j}\mathbb{E}{\left\langle G_{i},f\right\rangle}_{\mathscr{H}}{\left\langle G_{j},f\right\rangle}_{\mathscr{H}}
=\displaystyle{}={} i𝔼Gi,f2+ij𝔼Gi,f𝔼Gj,f\displaystyle\sum_{i}\mathbb{E}{\left\langle G_{i},f\right\rangle}_{\mathscr{H}}^{2}+\sum_{i\neq j}{\left\langle\mathbb{E}G_{i},f\right\rangle}_{\mathscr{H}}{\left\langle\mathbb{E}G_{j},f\right\rangle}_{\mathscr{H}}
=\displaystyle{}={} i𝔼Gi,f2\displaystyle\sum_{i}\mathbb{E}{\left\langle G_{i},f\right\rangle}_{\mathscr{H}}^{2}

and the result follows. ∎
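As a finite-dimensional sanity check of part (ii) (not part of the proof), the identity can be verified by exact enumeration for a mean-zero discrete distribution; the atoms, weights, and number of copies below are arbitrary choices:

```python
import itertools
import numpy as np

# A mean-zero discrete distribution on two atoms in R^2 (arbitrary choice)
p = np.array([0.4, 0.6])
v1 = np.array([1.0, 2.0])
vals = np.stack([v1, -(p[0] / p[1]) * v1])   # ensures E[G] = p @ vals = 0
n = 3                                        # number of i.i.d. copies

# E( sum_{i != j} <G_i, G_j> )^2 by exact enumeration over all outcomes
lhs = 0.0
for combo in itertools.product(range(2), repeat=n):
    prob = np.prod(p[list(combo)])
    g = vals[list(combo)]
    s = sum(g[i] @ g[j] for i in range(n) for j in range(n) if i != j)
    lhs += prob * s**2

# 2 * sum_{i != j} E<G_i, G_j>^2, again by exact enumeration
e_sq = sum(p[a] * p[b] * (vals[a] @ vals[b])**2 for a in range(2) for b in range(2))
rhs = 2 * n * (n - 1) * e_sq
print(abs(lhs - rhs))
```

Both sides agree up to floating-point error, as the two diagonal terms each contribute one copy of the second-moment sum while the off-diagonal term vanishes by the mean-zero assumption.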

Lemma A.4.

Let (Xi)i=1ni.i.d.Q(X_{i})_{i=1}^{n}\stackrel{{\scriptstyle i.i.d.}}{{\sim}}Q with n2n\geq 2. Define

I=1n(n1)ija(Xi),a(Xj),I=\frac{1}{n(n-1)}\sum_{i\neq j}{\left\langle a(X_{i}),a(X_{j})\right\rangle}_{\mathscr{H}},

where a(x)=ΣPQ,λ1/2(K(,x)μ)a(x)=\mathcal{B}\Sigma_{PQ,\lambda}^{-1/2}(K(\cdot,x)-\mu) with μ=𝒳K(,x)dQ(x)\mu=\int_{\mathcal{X}}K(\cdot,x)\,dQ(x) and :\mathcal{B}:\mathscr{H}\to\mathscr{H} is a bounded operator. Then the following hold.

  1. (i)

    𝔼a(Xi),a(Xj)2()4ΣPQ,λ1/2ΣQΣPQ,λ1/22()2;\mathbb{E}{\left\langle a(X_{i}),a(X_{j})\right\rangle}_{\mathscr{H}}^{2}\leq\left\|\mathcal{B}\right\|_{\mathcal{L}^{\infty}(\mathscr{H})}^{4}\left\|\Sigma_{PQ,\lambda}^{-1/2}\Sigma_{Q}\Sigma_{PQ,\lambda}^{-1/2}\right\|_{\mathcal{L}^{2}(\mathscr{H})}^{2};

  2. (ii)

    𝔼(I2)4n2()4ΣPQ,λ1/2ΣQΣPQ,λ1/22()2.\mathbb{E}\left(I^{2}\right)\leq\frac{4}{n^{2}}\left\|\mathcal{B}\right\|_{\mathcal{L}^{\infty}(\mathscr{H})}^{4}\left\|\Sigma_{PQ,\lambda}^{-1/2}\Sigma_{Q}\Sigma_{PQ,\lambda}^{-1/2}\right\|_{\mathcal{L}^{2}(\mathscr{H})}^{2}.

Proof.

For (i), we have

𝔼a(Xi),a(Xj)2\displaystyle\mathbb{E}{\left\langle a(X_{i}),a(X_{j})\right\rangle}_{\mathscr{H}}^{2} =\displaystyle{}={} 𝔼a(Xi)a(Xi),a(Xj)a(Xj)2()\displaystyle\mathbb{E}{\left\langle a(X_{i})\otimes_{\mathscr{H}}a(X_{i}),a(X_{j})\otimes_{\mathscr{H}}a(X_{j})\right\rangle}_{\mathcal{L}^{2}(\mathscr{H})}
=\displaystyle{}={} 𝔼[a(Xi)a(Xi)]2()2\displaystyle\left\|\mathbb{E}\left[a(X_{i})\otimes_{\mathscr{H}}a(X_{i})\right]\right\|_{\mathcal{L}^{2}(\mathscr{H})}^{2}
=\displaystyle{}={} ΣPQ,λ1/2𝔼[(K(,Xi)μ)(K(,Xi)μ)]ΣPQ,λ1/222()\displaystyle\left\|\mathcal{B}\Sigma_{PQ,\lambda}^{-1/2}\mathbb{E}[(K(\cdot,X_{i})-\mu)\otimes_{\mathscr{H}}(K(\cdot,X_{i})-\mu)]\Sigma_{PQ,\lambda}^{-1/2}\mathcal{B}^{*}\right\|^{2}_{\mathcal{L}^{2}(\mathscr{H})}
=\displaystyle{}={} ΣPQ,λ1/2ΣQΣPQ,λ1/222()\displaystyle\left\|\mathcal{B}\Sigma_{PQ,\lambda}^{-1/2}\Sigma_{Q}\Sigma_{PQ,\lambda}^{-1/2}\mathcal{B}^{*}\right\|^{2}_{\mathcal{L}^{2}(\mathscr{H})}
\displaystyle{}\leq{} ()2()2ΣPQ,λ1/2ΣQΣPQ,λ1/222()\displaystyle\left\|\mathcal{B}\right\|_{\mathcal{L}^{\infty}(\mathscr{H})}^{2}\left\|\mathcal{B}^{*}\right\|_{\mathcal{L}^{\infty}(\mathscr{H})}^{2}\left\|\Sigma_{PQ,\lambda}^{-1/2}\Sigma_{Q}\Sigma_{PQ,\lambda}^{-1/2}\right\|^{2}_{\mathcal{L}^{2}(\mathscr{H})}
()\displaystyle{}\stackrel{{\scriptstyle(\ddagger)}}{{\leq}}{} ()4ΣPQ,λ1/2ΣQΣPQ,λ1/222(),\displaystyle\left\|\mathcal{B}\right\|_{\mathcal{L}^{\infty}(\mathscr{H})}^{4}\left\|\Sigma_{PQ,\lambda}^{-1/2}\Sigma_{Q}\Sigma_{PQ,\lambda}^{-1/2}\right\|^{2}_{\mathcal{L}^{2}(\mathscr{H})},

where we used the fact that ()=()\left\|\mathcal{B}\right\|_{\mathcal{L}^{\infty}(\mathscr{H})}=\left\|\mathcal{B}^{*}\right\|_{\mathcal{L}^{\infty}(\mathscr{H})} in ()(\ddagger). For (ii), we have

𝔼(I2)\displaystyle\mathbb{E}\left(I^{2}\right) =\displaystyle{}={} 1n2(n1)2𝔼(ija(Xi),a(Xj))2\displaystyle\frac{1}{n^{2}(n-1)^{2}}\mathbb{E}\left(\sum_{i\neq j}{\left\langle a(X_{i}),a(X_{j})\right\rangle}_{\mathscr{H}}\right)^{2}
=()\displaystyle{}\stackrel{{\scriptstyle(*)}}{{=}}{} 2n2(n1)2ij𝔼a(Xi),a(Xj)2\displaystyle\frac{2}{n^{2}(n-1)^{2}}\sum_{i\neq j}\mathbb{E}{\left\langle a(X_{i}),a(X_{j})\right\rangle}_{\mathscr{H}}^{2}
\displaystyle{}\leq{} 4n2()4ΣPQ,λ1/2ΣQΣPQ,λ1/222(),\displaystyle\frac{4}{n^{2}}\left\|\mathcal{B}\right\|_{\mathcal{L}^{\infty}(\mathscr{H})}^{4}\left\|\Sigma_{PQ,\lambda}^{-1/2}\Sigma_{Q}\Sigma_{PQ,\lambda}^{-1/2}\right\|^{2}_{\mathcal{L}^{2}(\mathscr{H})},

where ()(*) follows from Lemma A.3(ii). ∎
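The key step in (i), namely 𝔼⟨a(X),a(X′)⟩² = ‖𝔼[a(X)⊗a(X)]‖² in the Hilbert–Schmidt norm for independent copies, can be sanity-checked by exact enumeration over a discrete distribution; the atoms and weights below are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(4)
m, d = 5, 3
A = rng.standard_normal((m, d))        # a(x_i) for a discrete X with m atoms
p = rng.random(m); p /= p.sum()        # P(X = x_i)

# E<a(X), a(X')>^2 for independent copies X, X', by exact enumeration
lhs = sum(p[i] * p[j] * (A[i] @ A[j])**2 for i in range(m) for j in range(m))
# ||E[a(X) (x) a(X)]||^2 in the Hilbert-Schmidt (Frobenius) norm
M = (A * p[:, None]).T @ A
rhs = np.sum(M * M)
print(abs(lhs - rhs))
```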

Lemma A.5.

Let (Xi)i=1ni.i.d.Q(X_{i})_{i=1}^{n}\stackrel{{\scriptstyle i.i.d.}}{{\sim}}Q. Suppose GG\in\mathscr{H} is an arbitrary function and :\mathcal{B}:\mathscr{H}\to\mathscr{H} is a bounded operator. Define

I=2nni=1a(Xi),ΣPQ,λ1/2(Gμ),I=\frac{2}{n}\sum^{n}_{i=1}{\left\langle a(X_{i}),\mathcal{B}\Sigma_{PQ,\lambda}^{-1/2}(G-\mu)\right\rangle}_{\mathscr{H}},

where a(x)=ΣPQ,λ1/2(K(,x)μ)a(x)=\mathcal{B}\Sigma_{PQ,\lambda}^{-1/2}\left(K(\cdot,x)-\mu\right) and μ=𝒳K(,x)dQ(x)\mu=\int_{\mathcal{X}}K(\cdot,x)\,dQ(x). Then

  1. (i)
    𝔼a(Xi),ΣPQ,λ1/2(Gμ)2()4ΣPQ,λ1/2ΣQΣPQ,λ1/2()\displaystyle\mathbb{E}{\left\langle a(X_{i}),\mathcal{B}\Sigma_{PQ,\lambda}^{-1/2}(G-\mu)\right\rangle}_{\mathscr{H}}^{2}\leq\left\|\mathcal{B}\right\|_{\mathcal{L}^{\infty}(\mathscr{H})}^{4}\left\|\Sigma_{PQ,\lambda}^{-1/2}\Sigma_{Q}\Sigma_{PQ,\lambda}^{-1/2}\right\|_{\mathcal{L}^{\infty}(\mathscr{H})}
    ×ΣPQ,λ1/2(Gμ)2;\displaystyle\qquad\qquad\qquad\qquad\qquad\qquad\qquad\times\left\|\Sigma_{PQ,\lambda}^{-1/2}(G-\mu)\right\|_{\mathscr{H}}^{2};
  2. (ii)

    𝔼(I2)4n()4ΣPQ,λ1/2ΣQΣPQ,λ1/2()ΣPQ,λ1/2(Gμ)2.\mathbb{E}\left(I^{2}\right)\leq\frac{4}{n}\left\|\mathcal{B}\right\|_{\mathcal{L}^{\infty}(\mathscr{H})}^{4}\left\|\Sigma_{PQ,\lambda}^{-1/2}\Sigma_{Q}\Sigma_{PQ,\lambda}^{-1/2}\right\|_{\mathcal{L}^{\infty}(\mathscr{H})}\left\|\Sigma_{PQ,\lambda}^{-1/2}(G-\mu)\right\|_{\mathscr{H}}^{2}.

Proof.

For (i), we have

𝔼a(Xi),ΣPQ,λ1/2(Gμ)2\displaystyle\mathbb{E}{\left\langle a(X_{i}),\mathcal{B}\Sigma_{PQ,\lambda}^{-1/2}(G-\mu)\right\rangle}_{\mathscr{H}}^{2}
=\displaystyle{}={} 𝔼a(Xi)a(Xi),ΣPQ,λ1/2(Gμ)(Gμ)ΣPQ,λ1/22()\displaystyle\mathbb{E}{\left\langle a(X_{i})\otimes_{\mathscr{H}}a(X_{i}),\mathcal{B}\Sigma_{PQ,\lambda}^{-1/2}(G-\mu)\otimes_{\mathscr{H}}(G-\mu)\Sigma_{PQ,\lambda}^{-1/2}\mathcal{B}^{*}\right\rangle}_{\mathcal{L}^{2}(\mathscr{H})}
=\displaystyle{}={} ΣPQ,λ1/2ΣQΣPQ,λ1/2,ΣPQ,λ1/2(Gμ)(Gμ)ΣPQ,λ1/22()\displaystyle{\left\langle\mathcal{B}\Sigma_{PQ,\lambda}^{-1/2}\Sigma_{Q}\Sigma_{PQ,\lambda}^{-1/2}\mathcal{B}^{*},\mathcal{B}\Sigma_{PQ,\lambda}^{-1/2}(G-\mu)\otimes_{\mathscr{H}}(G-\mu)\Sigma_{PQ,\lambda}^{-1/2}\mathcal{B}^{*}\right\rangle}_{\mathcal{L}^{2}(\mathscr{H})}
=\displaystyle{}={} Tr[ΣPQ,λ1/2ΣQΣPQ,λ1/2ΣPQ,λ1/2(Gμ)(Gμ)ΣPQ,λ1/2]\displaystyle\text{Tr}\left[\Sigma_{PQ,\lambda}^{-1/2}\Sigma_{Q}\Sigma_{PQ,\lambda}^{-1/2}\mathcal{B}^{*}\mathcal{B}\Sigma_{PQ,\lambda}^{-1/2}(G-\mu)\otimes_{\mathscr{H}}(G-\mu)\Sigma_{PQ,\lambda}^{-1/2}\mathcal{B}^{*}\mathcal{B}\right]
\displaystyle{}\leq{} ΣPQ,λ1/2ΣQΣPQ,λ1/2()()2Tr[ΣPQ,λ1/2(Gμ)(Gμ)ΣPQ,λ1/2]\displaystyle\left\|\Sigma_{PQ,\lambda}^{-1/2}\Sigma_{Q}\Sigma_{PQ,\lambda}^{-1/2}\right\|_{\mathcal{L}^{\infty}(\mathscr{H})}\left\|\mathcal{B}^{*}\mathcal{B}\right\|_{\mathcal{L}^{\infty}(\mathscr{H})}^{2}\text{Tr}\left[\Sigma_{PQ,\lambda}^{-1/2}(G-\mu)\otimes_{\mathscr{H}}(G-\mu)\Sigma_{PQ,\lambda}^{-1/2}\right]
\displaystyle{}\leq{} ΣPQ,λ1/2ΣQΣPQ,λ1/2()()4ΣPQ,λ1/2(Gμ)2.\displaystyle\left\|\Sigma_{PQ,\lambda}^{-1/2}\Sigma_{Q}\Sigma_{PQ,\lambda}^{-1/2}\right\|_{\mathcal{L}^{\infty}(\mathscr{H})}\left\|\mathcal{B}\right\|_{\mathcal{L}^{\infty}(\mathscr{H})}^{4}\left\|\Sigma_{PQ,\lambda}^{-1/2}(G-\mu)\right\|_{\mathscr{H}}^{2}.

(ii) From Lemma A.3(iii), we have

𝔼(I2)\displaystyle\mathbb{E}\left(I^{2}\right) =\displaystyle{}={} 4n2ni=1𝔼a(Xi),ΣPQ,λ1/2(Gμ)2,\displaystyle\frac{4}{n^{2}}\sum^{n}_{i=1}\mathbb{E}{\left\langle a(X_{i}),\mathcal{B}\Sigma_{PQ,\lambda}^{-1/2}(G-\mu)\right\rangle}_{\mathscr{H}}^{2},

and the result follows from (i). ∎
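The first two equalities in the proof of (i) reduce, in finite dimensions, to 𝔼⟨a(X),h⟩² = hᵀ𝔼[a(X)a(X)ᵀ]h, with h standing in for the fixed element ℬΣ_{PQ,λ}^{-1/2}(G−μ); a discrete check with arbitrary atoms and weights:

```python
import numpy as np

rng = np.random.default_rng(5)
m, d = 6, 3
A = rng.standard_normal((m, d))        # a(x_i) on a discrete space with m atoms
p = rng.random(m); p /= p.sum()
h = rng.standard_normal(d)             # plays the role of B Sigma_{PQ,lambda}^{-1/2}(G - mu)

lhs = sum(p[i] * (A[i] @ h)**2 for i in range(m))   # E<a(X), h>^2
M = (A * p[:, None]).T @ A                          # E[a(X) (x) a(X)]
rhs = h @ M @ h                                     # trace of M (h (x) h)
print(abs(lhs - rhs))
```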

Lemma A.6.

Let (Xi)i=1ni.i.d.Q,(Yi)i=1mi.i.d.P(X_{i})_{i=1}^{n}\stackrel{{\scriptstyle i.i.d.}}{{\sim}}Q,\ (Y_{i})_{i=1}^{m}\stackrel{{\scriptstyle i.i.d.}}{{\sim}}P and :\mathcal{B}:\mathscr{H}\to\mathscr{H} be a bounded operator. Define

I=2nmi,ja(Xi),b(Yj),I=\frac{2}{nm}\sum_{i,j}{\left\langle a(X_{i}),b(Y_{j})\right\rangle}_{\mathscr{H}},

where a(x)=ΣPQ,λ1/2(K(,x)μQ)a(x)=\mathcal{B}\Sigma_{PQ,\lambda}^{-1/2}(K(\cdot,x)-\mu_{Q}), b(x)=ΣPQ,λ1/2(K(,x)μP)b(x)=\mathcal{B}\Sigma_{PQ,\lambda}^{-1/2}(K(\cdot,x)-\mu_{P}), μQ=𝒳K(,x)dQ(x)\mu_{Q}=\int_{\mathcal{X}}K(\cdot,x)\,dQ(x) and μP=𝒳K(,y)dP(y)\mu_{P}=\int_{\mathcal{X}}K(\cdot,y)\,dP(y). Then

  1. (i)

    𝔼a(Xi),b(Yj)2()4ΣPQ,λ1/2ΣPQΣPQ,λ1/22()2\mathbb{E}{\left\langle a(X_{i}),b(Y_{j})\right\rangle}_{\mathscr{H}}^{2}\leq\left\|\mathcal{B}\right\|_{\mathcal{L}^{\infty}(\mathscr{H})}^{4}\left\|\Sigma_{PQ,\lambda}^{-1/2}\Sigma_{PQ}\Sigma_{PQ,\lambda}^{-1/2}\right\|_{\mathcal{L}^{2}(\mathscr{H})}^{2};

  2. (ii)

    𝔼(I2)4nm()4ΣPQ,λ1/2ΣPQΣPQ,λ1/22()2.\mathbb{E}\left(I^{2}\right)\leq\frac{4}{nm}\left\|\mathcal{B}\right\|_{\mathcal{L}^{\infty}(\mathscr{H})}^{4}\left\|\Sigma_{PQ,\lambda}^{-1/2}\Sigma_{PQ}\Sigma_{PQ,\lambda}^{-1/2}\right\|_{\mathcal{L}^{2}(\mathscr{H})}^{2}.

Proof.

(i) Define 𝒜:=𝒳(K(,x)μR)(K(,x)μR)u(x)dR(x)\mathcal{A}:=\int_{\mathcal{X}}(K(\cdot,x)-\mu_{R})\otimes_{\mathscr{H}}(K(\cdot,x)-\mu_{R})\ u(x)\,dR(x), where u=dPdR1u=\frac{dP}{dR}-1 with R=P+Q2R=\frac{P+Q}{2}. Then it can be verified that ΣP=ΣPQ+𝒜(μRμP)(μRμP)\Sigma_{P}=\Sigma_{PQ}+\mathcal{A}-(\mu_{R}-\mu_{P})\otimes_{\mathscr{H}}(\mu_{R}-\mu_{P}), ΣQ=ΣPQ𝒜(μQμR)(μQμR)\Sigma_{Q}=\Sigma_{PQ}-\mathcal{A}-(\mu_{Q}-\mu_{R})\otimes_{\mathscr{H}}(\mu_{Q}-\mu_{R}). Thus ΣPΣPQ+𝒜\Sigma_{P}\preccurlyeq\Sigma_{PQ}+\mathcal{A} and ΣQΣPQ𝒜\Sigma_{Q}\preccurlyeq\Sigma_{PQ}-\mathcal{A}. Therefore we have

𝔼a(Xi),b(Yj)2\displaystyle\mathbb{E}{\left\langle a(X_{i}),b(Y_{j})\right\rangle}_{\mathscr{H}}^{2} =\displaystyle{}={} 𝔼a(Xi)a(Xi),b(Yj)b(Yj)2()\displaystyle\mathbb{E}{\left\langle a(X_{i})\otimes_{\mathscr{H}}a(X_{i}),b(Y_{j})\otimes_{\mathscr{H}}b(Y_{j})\right\rangle}_{\mathcal{L}^{2}(\mathscr{H})}
=\displaystyle{}={} ΣPQ,λ1/2ΣQΣPQ,λ1/2,ΣPQ,λ1/2ΣPΣPQ,λ1/22()\displaystyle{\left\langle\mathcal{B}\Sigma_{PQ,\lambda}^{-1/2}\Sigma_{Q}\Sigma_{PQ,\lambda}^{-1/2}\mathcal{B}^{*},\mathcal{B}\Sigma_{PQ,\lambda}^{-1/2}\Sigma_{P}\Sigma_{PQ,\lambda}^{-1/2}\mathcal{B}^{*}\right\rangle}_{\mathcal{L}^{2}(\mathscr{H})}
\displaystyle{}\leq{} ΣPQ,λ1/2(ΣPQ𝒜)ΣPQ,λ1/2,ΣPQ,λ1/2(ΣPQ+𝒜)ΣPQ,λ1/22()\displaystyle{\left\langle\mathcal{B}\Sigma_{PQ,\lambda}^{-1/2}(\Sigma_{PQ}-\mathcal{A})\Sigma_{PQ,\lambda}^{-1/2}\mathcal{B}^{*},\mathcal{B}\Sigma_{PQ,\lambda}^{-1/2}(\Sigma_{PQ}+\mathcal{A})\Sigma_{PQ,\lambda}^{-1/2}\mathcal{B}^{*}\right\rangle}_{\mathcal{L}^{2}(\mathscr{H})}
=\displaystyle{}={} ΣPQ,λ1/2ΣPQΣPQ,λ1/22()2ΣPQ,λ1/2𝒜ΣPQ,λ1/22()2\displaystyle\left\|\mathcal{B}\Sigma_{PQ,\lambda}^{-1/2}\Sigma_{PQ}\Sigma_{PQ,\lambda}^{-1/2}\mathcal{B}^{*}\right\|_{\mathcal{L}^{2}(\mathscr{H})}^{2}-\left\|\mathcal{B}\Sigma_{PQ,\lambda}^{-1/2}\mathcal{A}\Sigma_{PQ,\lambda}^{-1/2}\mathcal{B}^{*}\right\|_{\mathcal{L}^{2}(\mathscr{H})}^{2}
\displaystyle{}\leq{} ()4ΣPQ,λ1/2ΣPQΣPQ,λ1/22()2.\displaystyle\left\|\mathcal{B}\right\|_{\mathcal{L}^{\infty}(\mathscr{H})}^{4}\left\|\Sigma_{PQ,\lambda}^{-1/2}\Sigma_{PQ}\Sigma_{PQ,\lambda}^{-1/2}\right\|_{\mathcal{L}^{2}(\mathscr{H})}^{2}.

(ii) follows by noting that

𝔼(I2)()4n2m2i,j𝔼a(Xi),b(Yj)2,\displaystyle\mathbb{E}\left(I^{2}\right)\stackrel{{\scriptstyle({\dagger})}}{{\leq}}\frac{4}{n^{2}m^{2}}\sum_{i,j}\mathbb{E}{\left\langle a(X_{i}),b(Y_{j})\right\rangle}_{\mathscr{H}}^{2},

where ()({\dagger}) follows from Lemma A.3(i). ∎
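The covariance decomposition used at the start of the proof, ΣP = ΣPQ + 𝒜 − (μP−μR)⊗(μP−μR), can be verified numerically on a finite state space with an explicit feature map; the dimensions and distributions below are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
m, d = 5, 3
Phi = rng.standard_normal((m, d))      # feature vectors phi(x_i) = K(., x_i)
P = rng.random(m); P /= P.sum()
Q = rng.random(m); Q /= Q.sum()
R = (P + Q) / 2
u = P / R - 1                          # u = dP/dR - 1

def cov(w, mu):
    C = Phi - mu
    return (C * w[:, None]).T @ C      # sum_i w_i (phi_i - mu)(phi_i - mu)^T

muP, muR = P @ Phi, R @ Phi
SigmaP = cov(P, muP)                   # Sigma_P
SigmaPQ = cov(R, muR)                  # Sigma_PQ
A = cov(R * u, muR)                    # the operator script-A
lhs = SigmaP
rhs = SigmaPQ + A - np.outer(muP - muR, muP - muR)
print(np.max(np.abs(lhs - rhs)))
```

The identity holds because the weights satisfy r_i(1 + u_i) = p_i, so ΣPQ + 𝒜 collects exactly the P-weighted second moments centered at μR.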

Lemma A.7.

Let u=dPdR1L2(R)u=\frac{dP}{dR}-1\in L^{2}(R) and η=g1/2λ(ΣPQ)(μQμP)2\eta=\left\|g^{1/2}_{\lambda}(\Sigma_{PQ})(\mu_{Q}-\mu_{P})\right\|_{\mathscr{H}}^{2}, where gλg_{\lambda} satisfies (A1)(A_{1})(A4)(A_{4}). Then

η4C1uL2(R)2.\eta\leq 4C_{1}\left\|u\right\|_{L^{2}(R)}^{2}.

Furthermore, if uRan(𝒯θ)u\in\emph{\text{Ran}}(\mathcal{T}^{\theta}) for some θ>0\theta>0, and

uL2(R)24C33B3𝒯(L2(R))2max(θξ,0)λ2θ~𝒯θuL2(R)2,\left\|u\right\|_{L^{2}(R)}^{2}\geq\frac{4C_{3}}{3B_{3}}\left\|\mathcal{T}\right\|_{\mathcal{L}^{\infty}(L^{2}(R))}^{2\max(\theta-\xi,0)}\lambda^{2\tilde{\theta}}\left\|\mathcal{T}^{-\theta}u\right\|_{L^{2}(R)}^{2},

where θ~=min(θ,ξ)\tilde{\theta}=\min(\theta,\xi), then,

ηB3uL2(R)2.\eta\geq B_{3}\left\|u\right\|_{L^{2}(R)}^{2}.
Proof.

Note that

η=g1/2λ(ΣPQ)(μPμQ)2=gλ(ΣPQ)(μPμQ),μPμQ=4gλ(ΣPQ)u,u=4gλ(ΣPQ)u,uL2(R)=()4𝒯gλ(𝒯)u,uL2(R),\begin{split}\eta&=\left\|g^{1/2}_{\lambda}(\Sigma_{PQ})(\mu_{P}-\mu_{Q})\right\|_{\mathscr{H}}^{2}={\left\langle g_{\lambda}(\Sigma_{PQ})(\mu_{P}-\mu_{Q}),\mu_{P}-\mu_{Q}\right\rangle}_{\mathscr{H}}\\ &=4{\left\langle g_{\lambda}(\Sigma_{PQ})\mathfrak{I}^{*}u,\mathfrak{I}^{*}u\right\rangle}_{\mathscr{H}}=4{\left\langle\mathfrak{I}g_{\lambda}(\Sigma_{PQ})\mathfrak{I}^{*}u,u\right\rangle}_{L^{2}(R)}\\ &\stackrel{{\scriptstyle(*)}}{{=}}4{\left\langle\mathcal{T}g_{\lambda}(\mathcal{T})u,u\right\rangle}_{L^{2}(R)},\end{split}

where we used Lemma A.8(i) in ()(*). The upper bound therefore follows by noting that

η=4𝒯gλ(𝒯)u,uL2(R)4𝒯gλ(𝒯)(L2(R))uL2(R)24C1uL2(R)2.\eta=4{\left\langle\mathcal{T}g_{\lambda}(\mathcal{T})u,u\right\rangle}_{L^{2}(R)}\leq 4\left\|\mathcal{T}g_{\lambda}(\mathcal{T})\right\|_{\mathcal{L}^{\infty}(L^{2}(R))}\left\|u\right\|_{L^{2}(R)}^{2}\leq 4C_{1}\left\|u\right\|_{L^{2}(R)}^{2}.

For the lower bound, consider

B3η\displaystyle B_{3}\eta =4B3𝒯gλ(𝒯)u,uL2(R)\displaystyle=4B_{3}{\left\langle\mathcal{T}g_{\lambda}(\mathcal{T})u,u\right\rangle}_{L^{2}(R)}
=𝒯gλ(𝒯)uL2(R)2+4B32uL2(R)2𝒯gλ(𝒯)u2B3uL2(R)2.\displaystyle=\left\|\mathcal{T}g_{\lambda}(\mathcal{T})u\right\|_{L^{2}(R)}^{2}+4B_{3}^{2}\left\|u\right\|_{L^{2}(R)}^{2}-\left\|\mathcal{T}g_{\lambda}(\mathcal{T})u-2B_{3}u\right\|_{L^{2}(R)}^{2}.

Since uRan(𝒯θ)u\in\text{Ran}(\mathcal{T}^{\theta}), there exists fL2(R)f\in L^{2}(R) such that u=𝒯θfu=\mathcal{T}^{\theta}f. Therefore, we have

𝒯gλ(𝒯)uL2(R)2=iλi2θ+2gλ2(λi)f,ϕ~iL2(R)2,\left\|\mathcal{T}g_{\lambda}(\mathcal{T})u\right\|_{L^{2}(R)}^{2}=\sum_{i}\lambda_{i}^{2\theta+2}g_{\lambda}^{2}(\lambda_{i}){\left\langle f,\tilde{\phi}_{i}\right\rangle}_{L^{2}(R)}^{2},

and

𝒯gλ(𝒯)u2B3uL2(R)2=iλi2θ(λigλ(λi)2B3)2f,ϕ~iL2(R)2,\left\|\mathcal{T}g_{\lambda}(\mathcal{T})u-2B_{3}u\right\|_{L^{2}(R)}^{2}=\sum_{i}\lambda_{i}^{2\theta}(\lambda_{i}g_{\lambda}(\lambda_{i})-2B_{3})^{2}{\left\langle f,\tilde{\phi}_{i}\right\rangle}_{L^{2}(R)}^{2},

where (λi,ϕ~i)i(\lambda_{i},\tilde{\phi}_{i})_{i} are the eigenvalues and eigenfunctions of 𝒯\mathcal{T}. Using these expressions we have

𝒯gλ(𝒯)uL2(R)2𝒯gλ(𝒯)u2B3uL2(R)2=i4B3λi2θ(λigλ(λi)B3)f,ϕ~iL2(R)2.\displaystyle\left\|\mathcal{T}g_{\lambda}(\mathcal{T})u\right\|_{L^{2}(R)}^{2}-\left\|\mathcal{T}g_{\lambda}(\mathcal{T})u-2B_{3}u\right\|_{L^{2}(R)}^{2}=\sum_{i}4B_{3}\lambda_{i}^{2\theta}(\lambda_{i}g_{\lambda}(\lambda_{i})-B_{3}){\left\langle f,\tilde{\phi}_{i}\right\rangle}_{L^{2}(R)}^{2}.

Thus

B3η\displaystyle B_{3}\eta =4B32uL2(R)2+i4B3λi2θ(λigλ(λi)B3)f,ϕ~iL2(R)2\displaystyle=4B_{3}^{2}\left\|u\right\|_{L^{2}(R)}^{2}+\sum_{i}4B_{3}\lambda_{i}^{2\theta}\left(\lambda_{i}g_{\lambda}(\lambda_{i})-B_{3}\right){\left\langle f,\tilde{\phi}_{i}\right\rangle}_{L^{2}(R)}^{2}
4B32uL2(R)2{i:λigλ(λi)<B3}4B3λi2θ(B3λigλ(λi))f,ϕ~iL2(R)2.\displaystyle\geq 4B_{3}^{2}\left\|u\right\|_{L^{2}(R)}^{2}-\sum_{\{i:\lambda_{i}g_{\lambda}(\lambda_{i})<B_{3}\}}4B_{3}\lambda_{i}^{2\theta}\left(B_{3}-\lambda_{i}g_{\lambda}(\lambda_{i})\right){\left\langle f,\tilde{\phi}_{i}\right\rangle}_{L^{2}(R)}^{2}.

When θξ\theta\leq\xi, by Assumption (A3)(A_{3}), we have

sup{i:λigλ(λi)<B3}λi2θ(B3λigλ(λi))C3λ2θ.\sup_{\{i:\lambda_{i}g_{\lambda}(\lambda_{i})<B_{3}\}}\lambda_{i}^{2\theta}\left(B_{3}-\lambda_{i}g_{\lambda}(\lambda_{i})\right)\leq C_{3}\lambda^{2\theta}.

On the other hand, for θ>ξ\theta>\xi,

sup{i:λigλ(λi)<B3}λi2θ(B3λigλ(λi))\displaystyle\sup_{\{i:\lambda_{i}g_{\lambda}(\lambda_{i})<B_{3}\}}\lambda_{i}^{2\theta}\left(B_{3}-\lambda_{i}g_{\lambda}(\lambda_{i})\right)
sup{i:λigλ(λi)<B3}λi2θ2ξsup{i:λigλ(λi)<B3}λi2ξ(B3λigλ(λi))\displaystyle\leq\sup_{\{i:\lambda_{i}g_{\lambda}(\lambda_{i})<B_{3}\}}\lambda_{i}^{2\theta-2\xi}\sup_{\{i:\lambda_{i}g_{\lambda}(\lambda_{i})<B_{3}\}}\lambda_{i}^{2\xi}\left(B_{3}-\lambda_{i}g_{\lambda}(\lambda_{i})\right)
()C3𝒯2θ2ξ(L2(R))λ2ξ,\displaystyle\stackrel{{\scriptstyle(*)}}{{\leq}}C_{3}\left\|\mathcal{T}\right\|^{2\theta-2\xi}_{\mathcal{L}^{\infty}(L^{2}(R))}\lambda^{2\xi},

where ()(*) follows by Assumption (A3)(A_{3}). Therefore we can conclude that

η4B3u2L2(R)4C3𝒯(L2(R))2max(θξ,0)λ2θ~𝒯θuL2(R)2()B3u2L2(R),\displaystyle\eta\geq 4B_{3}\left\|u\right\|^{2}_{L^{2}(R)}-4C_{3}\left\|\mathcal{T}\right\|_{\mathcal{L}^{\infty}(L^{2}(R))}^{2\max(\theta-\xi,0)}\lambda^{2\tilde{\theta}}\left\|\mathcal{T}^{-\theta}u\right\|_{L^{2}(R)}^{2}\stackrel{{\scriptstyle({\dagger})}}{{\geq}}B_{3}\left\|u\right\|^{2}_{L^{2}(R)},

where we used uL2(R)24C33B3𝒯(L2(R))2max(θξ,0)λ2θ~𝒯θuL2(R)2\left\|u\right\|_{L^{2}(R)}^{2}\geq\frac{4C_{3}}{3B_{3}}\left\|\mathcal{T}\right\|_{\mathcal{L}^{\infty}(L^{2}(R))}^{2\max(\theta-\xi,0)}\lambda^{2\tilde{\theta}}\left\|\mathcal{T}^{-\theta}u\right\|_{L^{2}(R)}^{2} in ()({\dagger}). ∎
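The polarization step opening the lower-bound argument is the elementary identity 4B⟨a,u⟩ = ‖a‖² + 4B²‖u‖² − ‖a−2Bu‖², with a standing in for 𝒯gλ(𝒯)u and B for B₃; a quick numerical check with arbitrary vectors:

```python
import numpy as np

rng = np.random.default_rng(3)
a = rng.standard_normal(5)             # stands for T g_lambda(T) u
u = rng.standard_normal(5)
B3 = 0.7                               # arbitrary positive constant

lhs = 4 * B3 * (a @ u)
rhs = a @ a + 4 * B3**2 * (u @ u) - (a - 2 * B3 * u) @ (a - 2 * B3 * u)
print(abs(lhs - rhs))
```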

Lemma A.8.

Let gλg_{\lambda} satisfy (A1)(A_{1})(A4)(A_{4}). Then the following hold.

  1. (i)

    gλ(ΣPQ)=𝒯gλ(𝒯)=gλ(𝒯)𝒯\mathfrak{I}g_{\lambda}(\Sigma_{PQ})\mathfrak{I}^{*}=\mathcal{T}g_{\lambda}(\mathcal{T})=g_{\lambda}(\mathcal{T})\mathcal{T};

  2. (ii)

    g1/2λ(ΣPQ)ΣPQ,λ1/2()(C1+C2)1/2\left\|g^{1/2}_{\lambda}(\Sigma_{PQ})\Sigma_{PQ,\lambda}^{1/2}\right\|_{\mathcal{L}^{\infty}(\mathscr{H})}\leq(C_{1}+C_{2})^{1/2};

  3. (iii)

    g1/2λ(Σ^PQ)Σ^PQ,λ1/2()(C1+C2)1/2\left\|g^{1/2}_{\lambda}(\hat{\Sigma}_{PQ})\hat{\Sigma}_{PQ,\lambda}^{1/2}\right\|_{\mathcal{L}^{\infty}(\mathscr{H})}\leq(C_{1}+C_{2})^{1/2};

  4. (iv)

    ΣPQ,λ1/2g1/2λ(ΣPQ)()C41/2\left\|\Sigma_{PQ,\lambda}^{-1/2}g^{-1/2}_{\lambda}(\Sigma_{PQ})\right\|_{\mathcal{L}^{\infty}(\mathscr{H})}\leq C_{4}^{-1/2};

  5. (v)

    Σ^PQ,λ1/2g1/2λ(Σ^PQ)()C41/2\left\|\hat{\Sigma}_{PQ,\lambda}^{-1/2}g^{-1/2}_{\lambda}(\hat{\Sigma}_{PQ})\right\|_{\mathcal{L}^{\infty}(\mathscr{H})}\leq C_{4}^{-1/2}.

Proof.

Let (λi,ϕi~)i(\lambda_{i},\tilde{\phi_{i}})_{i} be the eigenvalues and eigenfunctions of 𝒯.\mathcal{T}. Since 𝒯=\mathcal{T}=\mathfrak{I}\mathfrak{I}^{*} and ΣPQ=\Sigma_{PQ}=\mathfrak{I}^{*}\mathfrak{I}, we have ϕ~i=λiϕ~i\mathfrak{I}\mathfrak{I}^{*}\tilde{\phi}_{i}=\lambda_{i}\tilde{\phi}_{i} which implies (ϕ~i)=λi(ϕ~i)\mathfrak{I}^{*}\mathfrak{I}\left(\mathfrak{I}^{*}\tilde{\phi}_{i}\right)=\lambda_{i}\left(\mathfrak{I}^{*}\tilde{\phi}_{i}\right), i.e., ΣPQψi=λiψi\Sigma_{PQ}\psi_{i}=\lambda_{i}\psi_{i}, where ψi:=ϕ~i/λi\psi_{i}:=\mathfrak{I}^{*}\tilde{\phi}_{i}/\sqrt{\lambda_{i}}. Note that (ψi)(\psi_{i}), which are the eigenfunctions of ΣPQ\Sigma_{PQ}, form an orthonormal system in \mathscr{H}. Define R:=P+Q2R:=\frac{P+Q}{2}.
(i) Using the above, we have

gλ(ΣPQ)\displaystyle\mathfrak{I}g_{\lambda}(\Sigma_{PQ})\mathfrak{I}^{*}
=igλ(λi)(ϕ~iλiϕ~iλi)+gλ(0)(𝒯i(ϕ~iλiϕ~iλi))\displaystyle=\mathfrak{I}\sum_{i}g_{\lambda}(\lambda_{i})\left(\frac{\mathfrak{I}^{*}\tilde{\phi}_{i}}{\sqrt{\lambda_{i}}}\otimes_{\mathscr{H}}\frac{\mathfrak{I}^{*}\tilde{\phi}_{i}}{\sqrt{\lambda_{i}}}\right)\mathfrak{I}^{*}+g_{\lambda}(0)\left(\mathcal{T}-\mathfrak{I}\sum_{i}\left(\frac{\mathfrak{I}^{*}\tilde{\phi}_{i}}{\sqrt{\lambda_{i}}}\otimes_{\mathscr{H}}\frac{\mathfrak{I}^{*}\tilde{\phi}_{i}}{\sqrt{\lambda_{i}}}\right)\mathfrak{I}^{*}\right)
=iλi1gλ(λi)(ϕ~iL2(R)ϕ~i)+gλ(0)(𝒯iλi1(ϕ~iL2(R)ϕ~i))\displaystyle=\sum_{i}\lambda_{i}^{-1}g_{\lambda}(\lambda_{i})\mathfrak{I}\mathfrak{I}^{*}\left(\tilde{\phi}_{i}\otimes_{L^{2}(R)}\tilde{\phi}_{i}\right)\mathfrak{I}\mathfrak{I}^{*}+g_{\lambda}(0)\left(\mathcal{T}-\sum_{i}\lambda_{i}^{-1}\mathfrak{I}\mathfrak{I}^{*}\left(\tilde{\phi}_{i}\otimes_{L^{2}(R)}\tilde{\phi}_{i}\right)\mathfrak{I}\mathfrak{I}^{*}\right)
=()igλ(λi)(ϕ~iL2(R)ϕ~i)𝒯+gλ(0)(𝑰i(ϕ~iL2(R)ϕ~i))𝒯\displaystyle\stackrel{{\scriptstyle(*)}}{{=}}\sum_{i}g_{\lambda}(\lambda_{i})\left(\tilde{\phi}_{i}\otimes_{L^{2}(R)}\tilde{\phi}_{i}\right)\mathcal{T}+g_{\lambda}(0)\left(\boldsymbol{I}-\sum_{i}\left(\tilde{\phi}_{i}\otimes_{L^{2}(R)}\tilde{\phi}_{i}\right)\right)\mathcal{T}
=gλ(𝒯)𝒯,\displaystyle=g_{\lambda}(\mathcal{T})\mathcal{T},

where ()(*) follows using 𝒯(ϕ~iL2(R)ϕ~i)=λiϕ~iL2(R)ϕ~i\mathcal{T}(\tilde{\phi}_{i}\otimes_{L^{2}(R)}\tilde{\phi}_{i})=\lambda_{i}\tilde{\phi}_{i}\otimes_{L^{2}(R)}\tilde{\phi}_{i}. On the other hand,

gλ(𝒯)𝒯\displaystyle g_{\lambda}(\mathcal{T})\mathcal{T} =()igλ(λi)(ϕ~iL2(R)ϕ~i)jλj(ϕ~jL2(R)ϕ~j)\displaystyle\stackrel{{\scriptstyle({\dagger})}}{{=}}\sum_{i}g_{\lambda}(\lambda_{i})\left(\tilde{\phi}_{i}\otimes_{L^{2}(R)}\tilde{\phi}_{i}\right)\sum_{j}\lambda_{j}\left(\tilde{\phi}_{j}\otimes_{L^{2}(R)}\tilde{\phi}_{j}\right)
=ijgλ(λi)λjϕ~i,ϕ~jL2(R)(ϕ~iL2(R)ϕ~j)\displaystyle=\sum_{i}\sum_{j}g_{\lambda}(\lambda_{i})\lambda_{j}{\left\langle\tilde{\phi}_{i},\tilde{\phi}_{j}\right\rangle}_{L^{2}(R)}\left(\tilde{\phi}_{i}\otimes_{L^{2}(R)}\tilde{\phi}_{j}\right)
=igλ(λi)λi(ϕ~iL2(R)ϕ~i)=igλ(λi)𝒯(ϕ~iL2(R)ϕ~i)\displaystyle=\sum_{i}g_{\lambda}(\lambda_{i})\lambda_{i}\left(\tilde{\phi}_{i}\otimes_{L^{2}(R)}\tilde{\phi}_{i}\right)=\sum_{i}g_{\lambda}(\lambda_{i})\mathcal{T}\left(\tilde{\phi}_{i}\otimes_{L^{2}(R)}\tilde{\phi}_{i}\right)
=𝒯i1gλ(λi)(ϕ~iL2(R)ϕ~i)=𝒯gλ(𝒯),\displaystyle=\mathcal{T}\sum_{i\geq 1}g_{\lambda}(\lambda_{i})\left(\tilde{\phi}_{i}\otimes_{L^{2}(R)}\tilde{\phi}_{i}\right)=\mathcal{T}g_{\lambda}(\mathcal{T}),

where ()({\dagger}) follows using (Ii(ϕ~iL2(R)ϕ~i))𝒯=0.\left(I-\sum_{i}\left(\tilde{\phi}_{i}\otimes_{L^{2}(R)}\tilde{\phi}_{i}\right)\right)\mathcal{T}=0.

(ii)

g1/2λ(ΣPQ)ΣPQ,λ1/2()=g1/2λ(ΣPQ)ΣPQ,λg1/2λ(ΣPQ)()1/2=supi|gλ(λi)(λi+λ)|1/2()(C1+C2)1/2,\begin{split}\left\|g^{1/2}_{\lambda}(\Sigma_{PQ})\Sigma_{PQ,\lambda}^{1/2}\right\|_{\mathcal{L}^{\infty}(\mathscr{H})}&=\left\|g^{1/2}_{\lambda}(\Sigma_{PQ})\Sigma_{PQ,\lambda}\ g^{1/2}_{\lambda}(\Sigma_{PQ})\right\|_{\mathcal{L}^{\infty}(\mathscr{H})}^{1/2}\\ &=\sup_{i}\left|g_{\lambda}(\lambda_{i})(\lambda_{i}+\lambda)\right|^{1/2}\stackrel{{\scriptstyle({\dagger})}}{{\leq}}(C_{1}+C_{2})^{1/2},\end{split}

where ()({\dagger}) follows from Assumptions (A1)(A_{1}) and (A2)(A_{2}).
(iii) The proof is exactly the same as that of (ii) but with ΣPQ\Sigma_{PQ} replaced by Σ^PQ\hat{\Sigma}_{PQ}.
(iv)

ΣPQ,λ1/2g1/2λ(ΣPQ)()=ΣPQ,λ1/2g1λ(ΣPQ)ΣPQ,λ1/2()1/2=supi|(λ+λi)gλ(λi)|1/2|infi(λ+λi)gλ(λi)|1/2()C41/2,\begin{split}&\left\|\Sigma_{PQ,\lambda}^{-1/2}g^{-1/2}_{\lambda}(\Sigma_{PQ})\right\|_{\mathcal{L}^{\infty}(\mathscr{H})}\\ &=\left\|\Sigma_{PQ,\lambda}^{-1/2}g^{-1}_{\lambda}(\Sigma_{PQ})\Sigma_{PQ,\lambda}^{-1/2}\right\|_{\mathcal{L}^{\infty}(\mathscr{H})}^{1/2}=\sup_{i}\left|(\lambda+\lambda_{i})g_{\lambda}(\lambda_{i})\right|^{-1/2}\\ &\leq\left|\inf_{i}(\lambda+\lambda_{i})g_{\lambda}(\lambda_{i})\right|^{-1/2}\stackrel{{\scriptstyle({\ddagger})}}{{\leq}}C_{4}^{-1/2},\end{split}

where ()({\ddagger}) follows from Assumption (A4)(A_{4}).
(v) The proof is exactly the same as that of (iv) but with ΣPQ\Sigma_{PQ} replaced by Σ^PQ\hat{\Sigma}_{PQ}. ∎
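In finite dimensions, part (i) is the familiar intertwining J g(JᵀJ) Jᵀ = (JJᵀ) g(JJᵀ) for a matrix J playing the role of the embedding 𝔍; a numerical check with a Tikhonov-type gλ(x) = 1/(x+λ) (an arbitrary admissible choice of regularizer) is below:

```python
import numpy as np

rng = np.random.default_rng(1)
J = rng.standard_normal((4, 6))        # finite-dimensional stand-in for the embedding I: H -> L2(R)
lam = 0.1
g = lambda x: 1.0 / (x + lam)          # Tikhonov-type g_lambda (arbitrary admissible choice)

def spectral(M, f):
    """Apply f to a symmetric matrix through its eigendecomposition."""
    w, V = np.linalg.eigh(M)
    return (V * f(w)) @ V.T

Sigma = J.T @ J                        # Sigma_PQ = I* I
T = J @ J.T                            # script-T = I I*
lhs = J @ spectral(Sigma, g) @ J.T     # I g_lambda(Sigma_PQ) I*
rhs = T @ spectral(T, g)               # T g_lambda(T)
print(np.max(np.abs(lhs - rhs)))
```

Note that gλ(0) acts on the null space of Σ, but J annihilates that null space, so the identity holds even though Σ is rank-deficient.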

Lemma A.9.

Define u:=dPdR1L2(R)u:=\frac{dP}{dR}-1\in L^{2}(R), 𝒩1(λ):=Tr(ΣPQ,λ1/2ΣPQΣPQ,λ1/2)\mathcal{N}_{1}(\lambda):=\emph{Tr}(\Sigma_{PQ,\lambda}^{-1/2}\Sigma_{PQ}\Sigma_{PQ,\lambda}^{-1/2}), and 𝒩2(λ):=ΣPQ,λ1/2ΣPQΣPQ,λ1/22(),\mathcal{N}_{2}(\lambda):=\left\|\Sigma_{PQ,\lambda}^{-1/2}\Sigma_{PQ}\Sigma_{PQ,\lambda}^{-1/2}\right\|_{\mathcal{L}^{2}(\mathscr{H})}, where R=P+Q2R=\frac{P+Q}{2}. Then the following hold:

  1. (i)

    ΣPQ,λ1/2ΣVΣPQ,λ1/22()24CλuL2(R)2+2𝒩22(λ);\left\|\Sigma_{PQ,\lambda}^{-1/2}\Sigma_{V}\Sigma_{PQ,\lambda}^{-1/2}\right\|_{\mathcal{L}^{2}(\mathscr{H})}^{2}\leq 4C_{\lambda}\left\|u\right\|_{L^{2}(R)}^{2}+2\mathcal{N}^{2}_{2}(\lambda);

  2. (ii)

    ΣPQ,λ1/2ΣVΣPQ,λ1/2()2CλuL2(R)+1,\left\|\Sigma_{PQ,\lambda}^{-1/2}\Sigma_{V}\Sigma_{PQ,\lambda}^{-1/2}\right\|_{\mathcal{L}^{\infty}(\mathscr{H})}\leq 2\sqrt{C_{\lambda}}\left\|u\right\|_{L^{2}(R)}+1,

where VV can be either PP or QQ, and

Cλ={𝒩1(λ)supiϕi2,supiϕi2<2𝒩2(λ)λsupxK(,x)2,otherwise.C_{\lambda}=\left\{\begin{array}[]{ll}\mathcal{N}_{1}(\lambda)\sup_{i}\left\|\phi_{i}\right\|^{2}_{\infty},&\ \ \sup_{i}\left\|\phi_{i}\right\|^{2}_{\infty}<\infty\\ \frac{2\mathcal{N}_{2}(\lambda)}{\lambda}\sup_{x}\left\|K(\cdot,x)\right\|_{\mathscr{H}}^{2},&\ \ \text{otherwise}\end{array}\right..
Proof.

Let (λi,ϕi~)i(\lambda_{i},\tilde{\phi_{i}})_{i} be the eigenvalues and eigenfunctions of 𝒯.\mathcal{T}. Since 𝒯=\mathcal{T}=\mathfrak{I}\mathfrak{I}^{*} and ΣPQ=\Sigma_{PQ}=\mathfrak{I}^{*}\mathfrak{I}, we have ϕ~i=λiϕ~i\mathfrak{I}\mathfrak{I}^{*}\tilde{\phi}_{i}=\lambda_{i}\tilde{\phi}_{i} which implies (ϕ~i)=λi(ϕ~i)\mathfrak{I}^{*}\mathfrak{I}\left(\mathfrak{I}^{*}\tilde{\phi}_{i}\right)=\lambda_{i}\left(\mathfrak{I}^{*}\tilde{\phi}_{i}\right), i.e., ΣPQψi=λiψi\Sigma_{PQ}\psi_{i}=\lambda_{i}\psi_{i}, where ψi:=ϕ~i/λi\psi_{i}:=\mathfrak{I}^{*}\tilde{\phi}_{i}/\sqrt{\lambda_{i}}. Note that (ψi)i(\psi_{i})_{i} form an orthonormal system in \mathscr{H}. Define ϕi=ϕi~λi\phi_{i}=\frac{\mathfrak{I}^{*}\tilde{\phi_{i}}}{\lambda_{i}}, thus ψi=λiϕi.\psi_{i}=\sqrt{\lambda_{i}}\phi_{i}. Let ϕi¯=ϕi𝔼Rϕi\bar{\phi_{i}}=\phi_{i}-\mathbb{E}_{R}\phi_{i}. Then ϕi¯=ϕi=𝒯ϕ~iλi=ϕi~\mathfrak{I}\bar{\phi_{i}}=\mathfrak{I}\phi_{i}=\frac{\mathcal{T}\tilde{\phi}_{i}}{\lambda_{i}}=\tilde{\phi_{i}}.
(i) Define 𝒜:=𝒳(K(,x)μR)(K(,x)μR)u(x)dR(x)\mathcal{A}:=\int_{\mathcal{X}}(K(\cdot,x)-\mu_{R})\otimes_{\mathscr{H}}(K(\cdot,x)-\mu_{R})\ u(x)\,dR(x). Then it can be verified that ΣP=ΣPQ+𝒜(μPμR)(μPμR)\Sigma_{P}=\Sigma_{PQ}+\mathcal{A}-(\mu_{P}-\mu_{R})\otimes_{\mathscr{H}}(\mu_{P}-\mu_{R}), ΣQ=ΣPQ𝒜(μQμR)(μQμR)\Sigma_{Q}=\Sigma_{PQ}-\mathcal{A}-(\mu_{Q}-\mu_{R})\otimes_{\mathscr{H}}(\mu_{Q}-\mu_{R}). Thus ΣPΣPQ+𝒜\Sigma_{P}\preccurlyeq\Sigma_{PQ}+\mathcal{A} and ΣQΣPQ𝒜\Sigma_{Q}\preccurlyeq\Sigma_{PQ}-\mathcal{A}. Therefore we have,

ΣPQ,λ1/2ΣVΣPQ,λ1/22()2\displaystyle\left\|\Sigma_{PQ,\lambda}^{-1/2}\Sigma_{V}\Sigma_{PQ,\lambda}^{-1/2}\right\|_{\mathcal{L}^{2}(\mathscr{H})}^{2} \displaystyle{}\leq{} 2ΣPQ,λ1/2ΣPQΣPQ,λ1/22()2+2ΣPQ,λ1/2𝒜ΣPQ,λ1/22()2\displaystyle 2\left\|\Sigma_{PQ,\lambda}^{-1/2}\Sigma_{PQ}\Sigma_{PQ,\lambda}^{-1/2}\right\|_{\mathcal{L}^{2}(\mathscr{H})}^{2}+2\left\|\Sigma_{PQ,\lambda}^{-1/2}\mathcal{A}\Sigma_{PQ,\lambda}^{-1/2}\right\|_{\mathcal{L}^{2}(\mathscr{H})}^{2}
=\displaystyle{}={} 2𝒩22(λ)+2ΣPQ,λ1/2𝒜ΣPQ,λ1/22()2.\displaystyle 2\mathcal{N}^{2}_{2}(\lambda)+2\left\|\Sigma_{PQ,\lambda}^{-1/2}\mathcal{A}\Sigma_{PQ,\lambda}^{-1/2}\right\|_{\mathcal{L}^{2}(\mathscr{H})}^{2}.

Next we bound the second term in the above inequality in the two cases for CλC_{\lambda}.

Case 1: supiϕi2<\sup_{i}\left\|\phi_{i}\right\|_{\infty}^{2}<\infty. Define a(x)=K(,x)μRa(x)=K(\cdot,x)-\mu_{R}. Then we have,

ΣPQ,λ1/2𝒜ΣPQ,λ1/22()2=Tr(ΣPQ,λ1𝒜ΣPQ,λ1𝒜)\displaystyle\left\|\Sigma_{PQ,\lambda}^{-1/2}\mathcal{A}\Sigma_{PQ,\lambda}^{-1/2}\right\|_{\mathcal{L}^{2}(\mathscr{H})}^{2}=\text{Tr}\left(\Sigma_{PQ,\lambda}^{-1}\mathcal{A}\Sigma_{PQ,\lambda}^{-1}\mathcal{A}\right)
=𝒳𝒳Tr[ΣPQ,λ1a(x)a(x)ΣPQ,λ1a(y)a(y)]u(x)u(y)dR(y)dR(x)\displaystyle=\int_{\mathcal{X}}\int_{\mathcal{X}}\text{Tr}\left[\Sigma_{PQ,\lambda}^{-1}a(x)\otimes_{\mathscr{H}}a(x)\Sigma_{PQ,\lambda}^{-1}a(y)\otimes_{\mathscr{H}}a(y)\right]u(x)u(y)\,dR(y)\,dR(x)
=()i,j(λiλi+λ)(λjλj+λ)[𝒳ϕi¯(x)ϕj¯(x)u(x)dR(x)]2\displaystyle\stackrel{{\scriptstyle(*)}}{{=}}\sum_{i,j}\left(\frac{\lambda_{i}}{\lambda_{i}+\lambda}\right)\left(\frac{\lambda_{j}}{\lambda_{j}+\lambda}\right)\left[\int_{\mathcal{X}}\bar{\phi_{i}}(x)\bar{\phi_{j}}(x)u(x)\,dR(x)\right]^{2}
i,jλiλi+λϕj¯,ϕi¯uL2(R)2=iλiλi+λjϕ¯j,ϕi¯uL2(R)2\displaystyle\leq\sum_{i,j}\frac{\lambda_{i}}{\lambda_{i}+\lambda}{\left\langle\bar{\phi_{j}},\bar{\phi_{i}}u\right\rangle}_{L^{2}(R)}^{2}=\sum_{i}\frac{\lambda_{i}}{\lambda_{i}+\lambda}\sum_{j}{\left\langle\mathfrak{I}{\bar{\phi}_{j}},\bar{\phi_{i}}u\right\rangle}_{L^{2}(R)}^{2}
=iλiλi+λjϕj~,ϕi¯uL2(R)2iλiλi+λϕi¯uL2(R)2\displaystyle=\sum_{i}\frac{\lambda_{i}}{\lambda_{i}+\lambda}\sum_{j}{\left\langle\tilde{\phi_{j}},\bar{\phi_{i}}u\right\rangle}_{L^{2}(R)}^{2}\leq\sum_{i}\frac{\lambda_{i}}{\lambda_{i}+\lambda}\left\|\bar{\phi_{i}}u\right\|_{L^{2}(R)}^{2}
2𝒩1(λ)supiϕi2uL2(R)2,\displaystyle\leq 2\mathcal{N}_{1}(\lambda)\sup_{i}\left\|\phi_{i}\right\|_{\infty}^{2}\left\|u\right\|_{L^{2}(R)}^{2},

where ()(*) follows from

Tr[ΣPQ,λ1a(x)a(x)ΣPQ,λ1a(y)a(y)]\displaystyle\text{Tr}\left[\Sigma_{PQ,\lambda}^{-1}a(x)\otimes_{\mathscr{H}}a(x)\Sigma_{PQ,\lambda}^{-1}a(y)\otimes_{\mathscr{H}}a(y)\right]
=a(y),ΣPQ,λ1a(x)a(x),ΣPQ,λ1a(y)=a(y),ΣPQ,λ1a(x)2\displaystyle={\left\langle a(y),\Sigma_{PQ,\lambda}^{-1}a(x)\right\rangle}_{\mathscr{H}}{\left\langle a(x),\Sigma_{PQ,\lambda}^{-1}a(y)\right\rangle}_{\mathscr{H}}={\left\langle a(y),\Sigma_{PQ,\lambda}^{-1}a(x)\right\rangle}_{\mathscr{H}}^{2}
=i1λi+λa(x),λiϕiλiϕi\displaystyle=\left\langle\sum_{i}\frac{1}{\lambda_{i}+\lambda}{\left\langle a(x),\sqrt{\lambda_{i}}\phi_{i}\right\rangle}_{\mathscr{H}}\sqrt{\lambda_{i}}\phi_{i}\right.
+1λ(a(x)ia(x),λiϕiλiϕi),a(y)2\displaystyle\qquad\qquad\left.+\frac{1}{\lambda}\left(a(x)-\sum_{i}{\left\langle a(x),\sqrt{\lambda_{i}}\phi_{i}\right\rangle}_{\mathscr{H}}\sqrt{\lambda_{i}}\phi_{i}\right),a(y)\right\rangle_{\mathscr{H}}^{2}
=()[iλiλi+λϕi¯(x)ϕi¯(y)+1λ(a(x),a(y)iλiϕi¯(x)ϕi¯(y))]2\displaystyle\stackrel{{\scriptstyle({\dagger})}}{{=}}\left[\sum_{i}\frac{\lambda_{i}}{\lambda_{i}+\lambda}\bar{\phi_{i}}(x)\bar{\phi_{i}}(y)+\frac{1}{\lambda}\left({\left\langle a(x),a(y)\right\rangle}_{\mathscr{H}}-\sum_{i}\lambda_{i}\bar{\phi_{i}}(x)\bar{\phi_{i}}(y)\right)\right]^{2}
=()[iλiλi+λϕi¯(x)ϕi¯(y)]2,\displaystyle\stackrel{{\scriptstyle({\ddagger})}}{{=}}\left[\sum_{i}\frac{\lambda_{i}}{\lambda_{i}+\lambda}\bar{\phi_{i}}(x)\bar{\phi_{i}}(y)\right]^{2},

where ()({\dagger}) follows from K(,x)μR,λiϕi=λiϕi¯(x),{\left\langle K(\cdot,x)-\mu_{R},\sqrt{\lambda_{i}}\phi_{i}\right\rangle}_{\mathscr{H}}=\sqrt{\lambda_{i}}\bar{\phi_{i}}(x), and in ()({\ddagger}) we used

a(x),a(y)=iλiϕi¯(x)ϕi¯(y),{\left\langle a(x),a(y)\right\rangle}_{\mathscr{H}}=\sum_{i}\lambda_{i}\bar{\phi_{i}}(x)\bar{\phi_{i}}(y),

which is proved below. Consider

a(x),a(y)\displaystyle{\left\langle a(x),a(y)\right\rangle}_{\mathscr{H}}
=K(,x)μR,K(,y)μR\displaystyle={\left\langle K(\cdot,x)-\mu_{R},K(\cdot,y)-\mu_{R}\right\rangle}_{\mathscr{H}}
=K(x,y)𝔼RK(x,Y)𝔼RK(X,y)+𝔼R×RK(X,Y):=K¯(x,y).\displaystyle=K(x,y)-\mathbb{E}_{R}K(x,Y)-\mathbb{E}_{R}K(X,y)+\mathbb{E}_{R\times R}K(X,Y):=\bar{K}(x,y).

Furthermore, from the definition of 𝒯\mathcal{T}, we can equivalently write it as 𝒯:L2(R)L2(R)\mathcal{T}:L^{2}(R)\to L^{2}(R), fK¯(,x)f(x)dR(x)f\mapsto\int\bar{K}(\cdot,x)f(x)\,dR(x). Thus by Mercer’s theorem (see Steinwart and Scovel 2012, Lemma 2.6), we obtain K¯(x,y)=iλiϕi¯(x)ϕi¯(y)\bar{K}(x,y)=\sum_{i}\lambda_{i}\bar{\phi_{i}}(x)\bar{\phi_{i}}(y).
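The centered-kernel identity K̄(x,y) = K(x,y) − 𝔼_R K(x,Y) − 𝔼_R K(X,y) + 𝔼_{R×R}K(X,Y) amounts to double-centering the Gram matrix under R; a numerical check on a discrete space with an explicit (finite-rank) feature map, all sizes arbitrary:

```python
import numpy as np

rng = np.random.default_rng(2)
m, d = 6, 4
Phi = rng.standard_normal((m, d))
K = Phi @ Phi.T                        # Gram matrix of a finite-rank kernel
r = rng.random(m); r /= r.sum()        # discrete reference measure R

# Kbar(x,y) = K(x,y) - E_R K(x,Y) - E_R K(X,y) + E_{RxR} K(X,Y)
Kbar = K - (K @ r)[:, None] - (r @ K)[None, :] + r @ K @ r
muR = r @ Phi
Kbar_feat = (Phi - muR) @ (Phi - muR).T   # <K(.,x)-mu_R, K(.,y)-mu_R>
print(np.max(np.abs(Kbar - Kbar_feat)))
```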

Case 2: Suppose supiϕi2\sup_{i}\left\|\phi_{i}\right\|_{\infty}^{2} is not finite. From the calculations in Case 1, we have

\begin{aligned}
\left\|\Sigma_{PQ,\lambda}^{-1/2}\mathcal{A}\Sigma_{PQ,\lambda}^{-1/2}\right\|_{\mathcal{L}^{2}(\mathscr{H})}^{2}&=\int_{\mathcal{X}}\int_{\mathcal{X}}\left\langle a(x),\Sigma_{PQ,\lambda}^{-1}a(y)\right\rangle_{\mathscr{H}}^{2}u(x)u(y)\,dR(x)\,dR(y)\\
&=\int_{\mathcal{X}}\int_{\mathcal{X}}\left\langle\Sigma_{PQ,\lambda}^{-1/2}a(x)\otimes_{\mathscr{H}}a(x)\Sigma_{PQ,\lambda}^{-1/2},\Sigma_{PQ,\lambda}^{-1/2}a(y)\otimes_{\mathscr{H}}a(y)\Sigma_{PQ,\lambda}^{-1/2}\right\rangle_{\mathcal{L}^{2}(\mathscr{H})}\\
&\qquad\qquad\times u(x)u(y)\,dR(x)\,dR(y)\\
&\leq\left(\int_{\mathcal{X}}\int_{\mathcal{X}}\left\langle\Sigma_{PQ,\lambda}^{-1/2}a(x)\otimes_{\mathscr{H}}a(x)\Sigma_{PQ,\lambda}^{-1/2},\Sigma_{PQ,\lambda}^{-1/2}a(y)\otimes_{\mathscr{H}}a(y)\Sigma_{PQ,\lambda}^{-1/2}\right\rangle_{\mathcal{L}^{2}(\mathscr{H})}^{2}\right.\\
&\qquad\qquad\left.\times\,dR(x)\,dR(y)\right)^{1/2}\left(\int_{\mathcal{X}}\int_{\mathcal{X}}u(x)^{2}u(y)^{2}\,dR(x)\,dR(y)\right)^{1/2}\\
&\stackrel{(*)}{\leq}\left(\int_{\mathcal{X}}\int_{\mathcal{X}}\left\langle\Sigma_{PQ,\lambda}^{-1/2}a(x)\otimes_{\mathscr{H}}a(x)\Sigma_{PQ,\lambda}^{-1/2},\Sigma_{PQ,\lambda}^{-1/2}a(y)\otimes_{\mathscr{H}}a(y)\Sigma_{PQ,\lambda}^{-1/2}\right\rangle_{\mathcal{L}^{2}(\mathscr{H})}\right.\\
&\qquad\qquad\left.\times\,dR(x)\,dR(y)\right)^{1/2}\\
&\qquad\times\sup_{x,y}\left\langle\Sigma_{PQ,\lambda}^{-1/2}a(x)\otimes_{\mathscr{H}}a(x)\Sigma_{PQ,\lambda}^{-1/2},\Sigma_{PQ,\lambda}^{-1/2}a(y)\otimes_{\mathscr{H}}a(y)\Sigma_{PQ,\lambda}^{-1/2}\right\rangle^{1/2}_{\mathcal{L}^{2}(\mathscr{H})}\\
&\qquad\qquad\times\left(\int_{\mathcal{X}}\int_{\mathcal{X}}u(x)^{2}u(y)^{2}\,dR(x)\,dR(y)\right)^{1/2}\\
&\leq\left\|\Sigma_{PQ,\lambda}^{-1/2}\Sigma_{PQ}\Sigma_{PQ,\lambda}^{-1/2}\right\|_{\mathcal{L}^{2}(\mathscr{H})}\left\|\Sigma_{PQ,\lambda}^{-1}\right\|_{\mathcal{L}^{\infty}(\mathscr{H})}\sup_{x}\left\|a(x)\right\|^{2}_{\mathscr{H}}\left\|u\right\|_{L^{2}(R)}^{2}\\
&\leq\frac{4\mathcal{N}_{2}(\lambda)}{\lambda}\sup_{x}\left\|K(\cdot,x)\right\|_{\mathscr{H}}^{2}\left\|u\right\|_{L^{2}(R)}^{2},
\end{aligned}

where we used

\begin{aligned}
&\left\langle\Sigma_{PQ,\lambda}^{-1/2}a(x)\otimes_{\mathscr{H}}a(x)\Sigma_{PQ,\lambda}^{-1/2},\Sigma_{PQ,\lambda}^{-1/2}a(y)\otimes_{\mathscr{H}}a(y)\Sigma_{PQ,\lambda}^{-1/2}\right\rangle_{\mathcal{L}^{2}(\mathscr{H})}\\
&=\operatorname{Tr}\left[\Sigma_{PQ,\lambda}^{-1}a(x)\otimes_{\mathscr{H}}a(x)\Sigma_{PQ,\lambda}^{-1}a(y)\otimes_{\mathscr{H}}a(y)\right]\\
&=\left\langle a(y),\Sigma_{PQ,\lambda}^{-1}a(x)\right\rangle_{\mathscr{H}}\left\langle a(x),\Sigma_{PQ,\lambda}^{-1}a(y)\right\rangle_{\mathscr{H}}=\left\langle a(y),\Sigma_{PQ,\lambda}^{-1}a(x)\right\rangle_{\mathscr{H}}^{2}\geq 0
\end{aligned}

in $(*)$.
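The two operator-norm facts driving the final bound, $\|\Sigma_{PQ,\lambda}^{-1}\|_{\mathcal{L}^{\infty}(\mathscr{H})}\leq 1/\lambda$ and $\|\Sigma_{PQ,\lambda}^{-1/2}\Sigma_{PQ}\Sigma_{PQ,\lambda}^{-1/2}\|_{\mathcal{L}^{\infty}(\mathscr{H})}\leq 1$, can be checked on a finite-dimensional surrogate (a sketch only; the matrix `S` below is a hypothetical stand-in for $\Sigma_{PQ}$):

```python
import numpy as np

rng = np.random.default_rng(1)
lam = 0.1
A = rng.normal(size=(20, 20))
S = A @ A.T  # positive semi-definite stand-in for the covariance Sigma_PQ

eigvals = np.linalg.eigvalsh(S)

# || (S + lam I)^{-1} ||_inf = 1 / (min eigenvalue + lam) <= 1/lam
inv_norm = 1.0 / (eigvals.min() + lam)
assert inv_norm <= 1.0 / lam + 1e-12

# || (S + lam I)^{-1/2} S (S + lam I)^{-1/2} ||_inf = max_i e_i/(e_i + lam) <= 1
whitened_norm = np.max(eigvals / (eigvals + lam))
assert whitened_norm < 1.0
```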

(ii) Note that

\begin{aligned}
\left\|\Sigma_{PQ,\lambda}^{-1/2}\Sigma_{V}\Sigma_{PQ,\lambda}^{-1/2}\right\|_{\mathcal{L}^{\infty}(\mathscr{H})}&\leq\left\|\Sigma_{PQ,\lambda}^{-1/2}\Sigma_{PQ}\Sigma_{PQ,\lambda}^{-1/2}\right\|_{\mathcal{L}^{\infty}(\mathscr{H})}+\left\|\Sigma_{PQ,\lambda}^{-1/2}\mathcal{A}\Sigma_{PQ,\lambda}^{-1/2}\right\|_{\mathcal{L}^{\infty}(\mathscr{H})}\\
&\leq 1+\left\|\Sigma_{PQ,\lambda}^{-1/2}\mathcal{A}\Sigma_{PQ,\lambda}^{-1/2}\right\|_{\mathcal{L}^{2}(\mathscr{H})}.
\end{aligned}

The result therefore follows by using the bounds in part $(i)$. ∎

Lemma A.10.

Let $A:H\rightarrow H$ and $B:H\rightarrow H$ be bounded operators on a Hilbert space $H$ such that $AB^{-1}$ is bounded. Then $\|B\|_{\mathcal{L}^{\infty}(H)}\geq\|AB^{-1}\|^{-1}_{\mathcal{L}^{\infty}(H)}\|A\|_{\mathcal{L}^{\infty}(H)}$. Also, for any $f\in H$,

\|Bf\|_{H}\geq\|AB^{-1}\|^{-1}_{\mathcal{L}^{\infty}(H)}\|Af\|_{H}.
Proof.

The result follows by noting that

A(H)=AB1B(H)AB1(H)B(H)\|A\|_{\mathcal{L}^{\infty}(H)}=\|AB^{-1}B\|_{\mathcal{L}^{\infty}(H)}\leq\|AB^{-1}\|_{\mathcal{L}^{\infty}(H)}\|B\|_{\mathcal{L}^{\infty}(H)}

and $\|Af\|_{H}=\|AB^{-1}Bf\|_{H}\leq\|AB^{-1}\|_{\mathcal{L}^{\infty}(H)}\|Bf\|_{H}$. ∎
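Lemma A.10 can be sanity-checked numerically with matrices in place of operators (an illustrative sketch, not part of the proof; the random matrices below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 15
A = rng.normal(size=(n, n))
B = rng.normal(size=(n, n)) + 5 * np.eye(n)  # shift keeps B comfortably invertible
f = rng.normal(size=n)

op = lambda M: np.linalg.norm(M, 2)  # operator (spectral) norm
ABinv = A @ np.linalg.inv(B)

# ||B|| >= ||A B^{-1}||^{-1} ||A||, since ||A|| = ||(A B^{-1}) B|| <= ||A B^{-1}|| ||B||
assert op(B) >= op(A) / op(ABinv) - 1e-9

# ||B f|| >= ||A B^{-1}||^{-1} ||A f||, by the same submultiplicativity argument
assert np.linalg.norm(B @ f) >= np.linalg.norm(A @ f) / op(ABinv) - 1e-9
```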

Lemma A.11.

Let $\zeta=\left\|g^{1/2}_{\lambda}(\hat{\Sigma}_{PQ})(\mu_{Q}-\mu_{P})\right\|_{\mathscr{H}}^{2}$. Then

\zeta\geq C_{4}(C_{1}+C_{2})^{-1}\left\|\mathcal{M}^{-1}\right\|^{-2}_{\mathcal{L}^{\infty}(\mathscr{H})}\eta,

where $\eta=\left\|g^{1/2}_{\lambda}(\Sigma_{PQ})(\mu_{Q}-\mu_{P})\right\|_{\mathscr{H}}^{2}$ and $\mathcal{M}=\hat{\Sigma}_{PQ,\lambda}^{-1/2}\Sigma_{PQ,\lambda}^{1/2}$.

Proof.

Repeated application of Lemma A.10, in conjunction with assumption $(A_{4})$, yields

\begin{aligned}
\zeta&\geq\left\|\hat{\Sigma}_{PQ,\lambda}^{-1/2}g^{-1/2}_{\lambda}(\hat{\Sigma}_{PQ})\right\|_{\mathcal{L}^{\infty}(\mathscr{H})}^{-2}\left\|\hat{\Sigma}_{PQ,\lambda}^{-1/2}(\mu_{Q}-\mu_{P})\right\|_{\mathscr{H}}^{2}\\
&\geq C_{4}\left\|\mathcal{M}^{-1}\right\|^{-2}_{\mathcal{L}^{\infty}(\mathscr{H})}\left\|\Sigma_{PQ,\lambda}^{-1/2}(\mu_{Q}-\mu_{P})\right\|_{\mathscr{H}}^{2}\\
&\geq C_{4}\left\|\mathcal{M}^{-1}\right\|^{-2}_{\mathcal{L}^{\infty}(\mathscr{H})}\left\|g^{1/2}_{\lambda}(\Sigma_{PQ})\Sigma_{PQ,\lambda}^{1/2}\right\|^{-2}_{\mathcal{L}^{\infty}(\mathscr{H})}\eta\\
&\geq C_{4}(C_{1}+C_{2})^{-1}\left\|\mathcal{M}^{-1}\right\|^{-2}_{\mathcal{L}^{\infty}(\mathscr{H})}\eta,
\end{aligned}

where the last inequality follows from Lemma A.8(ii). ∎

Lemma A.12.

Let $\zeta=\left\|g^{1/2}_{\lambda}(\hat{\Sigma}_{PQ})(\mu_{P}-\mu_{Q})\right\|_{\mathscr{H}}^{2}$, $\mathcal{M}=\hat{\Sigma}_{PQ,\lambda}^{-1/2}\Sigma_{PQ,\lambda}^{1/2}$, and $m\leq n\leq Dm$ for some constant $D\geq 1$. Then

\mathbb{E}\left[(\hat{\eta}_{\lambda}-\zeta)^{2}\,\middle|\,(Z_{i})_{i=1}^{s}\right]\leq\tilde{C}\left\|\mathcal{M}\right\|_{\mathcal{L}^{\infty}(\mathscr{H})}^{4}\left(\frac{C_{\lambda}\left\|u\right\|_{L^{2}(R)}^{2}+\mathcal{N}^{2}_{2}(\lambda)}{(n+m)^{2}}+\frac{\sqrt{C_{\lambda}}\left\|u\right\|_{L^{2}(R)}^{3}+\left\|u\right\|_{L^{2}(R)}^{2}}{n+m}\right),

where $C_{\lambda}$ is defined in Lemma A.9 and $\tilde{C}$ is a constant that depends only on $C_{1}$, $C_{2}$, and $D$. Furthermore, if $P=Q$, then

\mathbb{E}[\hat{\eta}_{\lambda}^{2}|(Z_{i})_{i=1}^{s}]\leq 6(C_{1}+C_{2})^{2}\left\|\mathcal{M}\right\|_{\mathcal{L}^{\infty}(\mathscr{H})}^{4}\mathcal{N}^{2}_{2}(\lambda)\left(\frac{1}{n^{2}}+\frac{1}{m^{2}}\right).
Proof.

Define $a(x)=\mathcal{B}\Sigma_{PQ,\lambda}^{-1/2}(K(\cdot,x)-\mu_{P})$ and $b(x)=\mathcal{B}\Sigma_{PQ,\lambda}^{-1/2}(K(\cdot,x)-\mu_{Q})$, where $\mathcal{B}=g^{1/2}_{\lambda}(\hat{\Sigma}_{PQ})\Sigma_{PQ,\lambda}^{1/2}$. Then we have

\begin{aligned}
\hat{\eta}_{\lambda}&\stackrel{(*)}{=}\frac{1}{n(n-1)}\sum_{i\neq j}\left\langle a(X_{i}),a(X_{j})\right\rangle_{\mathscr{H}}+\frac{1}{m(m-1)}\sum_{i\neq j}\left\langle a(Y_{i}),a(Y_{j})\right\rangle_{\mathscr{H}}\\
&\qquad-\frac{2}{nm}\sum_{i,j}\left\langle a(X_{i}),a(Y_{j})\right\rangle_{\mathscr{H}}\\
&\stackrel{(\dagger)}{=}\frac{1}{n(n-1)}\sum_{i\neq j}\left\langle a(X_{i}),a(X_{j})\right\rangle_{\mathscr{H}}+\frac{1}{m(m-1)}\sum_{i\neq j}\left\langle b(Y_{i}),b(Y_{j})\right\rangle_{\mathscr{H}}\\
&\qquad+\frac{2}{m}\sum_{i=1}^{m}\left\langle b(Y_{i}),\mathcal{B}\Sigma_{PQ,\lambda}^{-1/2}(\mu_{Q}-\mu_{P})\right\rangle_{\mathscr{H}}+\zeta-\frac{2}{nm}\sum_{i,j}\left\langle a(X_{i}),b(Y_{j})\right\rangle_{\mathscr{H}}\\
&\qquad-\frac{2}{n}\sum_{i=1}^{n}\left\langle a(X_{i}),\mathcal{B}\Sigma_{PQ,\lambda}^{-1/2}(\mu_{Q}-\mu_{P})\right\rangle_{\mathscr{H}},
\end{aligned}

where $(*)$ follows from Lemma A.2 and $(\dagger)$ follows by writing $a(Y)=b(Y)+\mathcal{B}\Sigma_{PQ,\lambda}^{-1/2}(\mu_{Q}-\mu_{P})$ in the last two terms. Thus we have

\hat{\eta}_{\lambda}-\zeta=\underbrace{\frac{1}{n(n-1)}\sum_{i\neq j}\left\langle a(X_{i}),a(X_{j})\right\rangle_{\mathscr{H}}}_{\textcircled{1}}+\underbrace{\frac{1}{m(m-1)}\sum_{i\neq j}\left\langle b(Y_{i}),b(Y_{j})\right\rangle_{\mathscr{H}}}_{\textcircled{2}}+\underbrace{\frac{2}{m}\sum_{i=1}^{m}\left\langle b(Y_{i}),\mathcal{B}\Sigma_{PQ,\lambda}^{-1/2}(\mu_{Q}-\mu_{P})\right\rangle_{\mathscr{H}}}_{\textcircled{3}}-\underbrace{\frac{2}{nm}\sum_{i,j}\left\langle a(X_{i}),b(Y_{j})\right\rangle_{\mathscr{H}}}_{\textcircled{4}}-\underbrace{\frac{2}{n}\sum_{i=1}^{n}\left\langle a(X_{i}),\mathcal{B}\Sigma_{PQ,\lambda}^{-1/2}(\mu_{Q}-\mu_{P})\right\rangle_{\mathscr{H}}}_{\textcircled{5}}.

Furthermore, using Lemma A.8(iii),

\begin{aligned}
\left\|\mathcal{B}\right\|_{\mathcal{L}^{\infty}(\mathscr{H})}&=\left\|\mathcal{B}^{*}\right\|_{\mathcal{L}^{\infty}(\mathscr{H})}=\left\|g^{1/2}_{\lambda}(\hat{\Sigma}_{PQ})\Sigma_{PQ,\lambda}^{1/2}\right\|_{\mathcal{L}^{\infty}(\mathscr{H})}\\
&\leq\left\|g^{1/2}_{\lambda}(\hat{\Sigma}_{PQ})\hat{\Sigma}_{PQ,\lambda}^{1/2}\right\|_{\mathcal{L}^{\infty}(\mathscr{H})}\left\|\hat{\Sigma}_{PQ,\lambda}^{-1/2}\Sigma_{PQ,\lambda}^{1/2}\right\|_{\mathcal{L}^{\infty}(\mathscr{H})}\\
&\leq(C_{1}+C_{2})^{1/2}\left\|\mathcal{M}\right\|_{\mathcal{L}^{\infty}(\mathscr{H})}.
\end{aligned}

Next we bound the terms $\textcircled{1}$–$\textcircled{5}$ using Lemmas A.4, A.5, A.6, and A.9. It follows from Lemmas A.4(ii) and A.9(i) that

\begin{aligned}
\mathbb{E}\left(\textcircled{1}^{2}\,\middle|\,(Z_{i})_{i=1}^{s}\right)&\leq\frac{4}{n^{2}}\|\mathcal{B}\|_{\mathcal{L}^{\infty}(\mathscr{H})}^{4}\left\|\Sigma_{PQ,\lambda}^{-1/2}\Sigma_{P}\Sigma_{PQ,\lambda}^{-1/2}\right\|_{\mathcal{L}^{2}(\mathscr{H})}^{2}\\
&\leq\frac{4}{n^{2}}(C_{1}+C_{2})^{2}\left\|\mathcal{M}\right\|_{\mathcal{L}^{\infty}(\mathscr{H})}^{4}\left(4C_{\lambda}\left\|u\right\|_{L^{2}(R)}^{2}+2\mathcal{N}^{2}_{2}(\lambda)\right),
\end{aligned}

and

\begin{aligned}
\mathbb{E}\left(\textcircled{2}^{2}\,\middle|\,(Z_{i})_{i=1}^{s}\right)&\leq\frac{4}{m^{2}}\|\mathcal{B}\|_{\mathcal{L}^{\infty}(\mathscr{H})}^{4}\left\|\Sigma_{PQ,\lambda}^{-1/2}\Sigma_{Q}\Sigma_{PQ,\lambda}^{-1/2}\right\|_{\mathcal{L}^{2}(\mathscr{H})}^{2}\\
&\leq\frac{4}{m^{2}}(C_{1}+C_{2})^{2}\left\|\mathcal{M}\right\|_{\mathcal{L}^{\infty}(\mathscr{H})}^{4}\left(4C_{\lambda}\left\|u\right\|_{L^{2}(R)}^{2}+2\mathcal{N}^{2}_{2}(\lambda)\right).
\end{aligned}

Using Lemma A.5(ii) and A.9(ii), we obtain

\begin{aligned}
\mathbb{E}\left(\textcircled{3}^{2}\,\middle|\,(Z_{i})_{i=1}^{s}\right)&\leq\frac{4}{m}\left\|\Sigma_{PQ,\lambda}^{-1/2}\Sigma_{Q}\Sigma_{PQ,\lambda}^{-1/2}\right\|_{\mathcal{L}^{\infty}(\mathscr{H})}\|\mathcal{B}\|_{\mathcal{L}^{\infty}(\mathscr{H})}^{4}\left\|\Sigma_{PQ,\lambda}^{-1/2}(\mu_{P}-\mu_{Q})\right\|_{\mathscr{H}}^{2}\\
&\leq\frac{4}{m}(C_{1}+C_{2})^{2}\left\|\mathcal{M}\right\|_{\mathcal{L}^{\infty}(\mathscr{H})}^{4}\left(1+2\sqrt{C_{\lambda}}\left\|u\right\|_{L^{2}(R)}\right)\left\|\Sigma_{PQ,\lambda}^{-1/2}(\mu_{P}-\mu_{Q})\right\|_{\mathscr{H}}^{2}\\
&\stackrel{(*)}{\leq}\frac{16}{m}(C_{1}+C_{2})^{2}\left\|\mathcal{M}\right\|_{\mathcal{L}^{\infty}(\mathscr{H})}^{4}\left(1+2\sqrt{C_{\lambda}}\left\|u\right\|_{L^{2}(R)}\right)\left\|u\right\|_{L^{2}(R)}^{2},
\end{aligned}

and

\begin{aligned}
\mathbb{E}\left(\textcircled{5}^{2}\,\middle|\,(Z_{i})_{i=1}^{s}\right)&\leq\frac{4}{n}\left\|\Sigma_{PQ,\lambda}^{-1/2}\Sigma_{P}\Sigma_{PQ,\lambda}^{-1/2}\right\|_{\mathcal{L}^{\infty}(\mathscr{H})}\|\mathcal{B}\|_{\mathcal{L}^{\infty}(\mathscr{H})}^{4}\left\|\Sigma_{PQ,\lambda}^{-1/2}(\mu_{Q}-\mu_{P})\right\|_{\mathscr{H}}^{2}\\
&\leq\frac{4}{n}(C_{1}+C_{2})^{2}\left\|\mathcal{M}\right\|_{\mathcal{L}^{\infty}(\mathscr{H})}^{4}\left(1+2\sqrt{C_{\lambda}}\left\|u\right\|_{L^{2}(R)}\right)\left\|\Sigma_{PQ,\lambda}^{-1/2}(\mu_{Q}-\mu_{P})\right\|_{\mathscr{H}}^{2}\\
&\stackrel{(*)}{\leq}\frac{16}{n}(C_{1}+C_{2})^{2}\left\|\mathcal{M}\right\|_{\mathcal{L}^{\infty}(\mathscr{H})}^{4}\left(1+2\sqrt{C_{\lambda}}\left\|u\right\|_{L^{2}(R)}\right)\left\|u\right\|_{L^{2}(R)}^{2},
\end{aligned}

where $(*)$ follows from using $g_{\lambda}(x)=(x+\lambda)^{-1}$ with $C_{1}=1$ in Lemma A.7. For term $\textcircled{4}$, using Lemma A.6 yields

\mathbb{E}\left(\textcircled{4}^{2}\,\middle|\,(Z_{i})_{i=1}^{s}\right)\leq\frac{4}{nm}\|\mathcal{B}\|_{\mathcal{L}^{\infty}(\mathscr{H})}^{4}\mathcal{N}^{2}_{2}(\lambda)\leq\frac{4}{nm}(C_{1}+C_{2})^{2}\left\|\mathcal{M}\right\|_{\mathcal{L}^{\infty}(\mathscr{H})}^{4}\mathcal{N}^{2}_{2}(\lambda).

Combining these bounds with the facts that $\sqrt{ab}\leq\frac{a}{2}+\frac{b}{2}$ for any $a,b\geq 0$, and that $\left(\sum_{i=1}^{k}a_{i}\right)^{2}\leq k\sum_{i=1}^{k}a_{i}^{2}$ for any $a_{i}\in\mathbb{R}$, $k\in\mathbb{N}$, yields

\begin{aligned}
&\mathbb{E}[(\hat{\eta}_{\lambda}-\zeta)^{2}|(Z_{i})_{i=1}^{s}]\\
&\quad\lesssim\left\|\mathcal{M}\right\|_{\mathcal{L}^{\infty}(\mathscr{H})}^{4}\left(C_{\lambda}\left\|u\right\|_{L^{2}(R)}^{2}+\mathcal{N}^{2}_{2}(\lambda)\right)(n^{-2}+m^{-2})\\
&\qquad\qquad+\left\|\mathcal{M}\right\|_{\mathcal{L}^{\infty}(\mathscr{H})}^{4}\left(\sqrt{C_{\lambda}}\left\|u\right\|_{L^{2}(R)}^{3}+\left\|u\right\|_{L^{2}(R)}^{2}\right)(n^{-1}+m^{-1})\\
&\quad\stackrel{(*)}{\lesssim}\left\|\mathcal{M}\right\|_{\mathcal{L}^{\infty}(\mathscr{H})}^{4}\left(\frac{C_{\lambda}\left\|u\right\|_{L^{2}(R)}^{2}+\mathcal{N}^{2}_{2}(\lambda)}{(n+m)^{2}}+\frac{\sqrt{C_{\lambda}}\left\|u\right\|_{L^{2}(R)}^{3}+\left\|u\right\|_{L^{2}(R)}^{2}}{n+m}\right),
\end{aligned}

where $(*)$ follows by using Lemma A.13.
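The elementary inequalities invoked in this combining step can be verified numerically (a sketch with illustrative values; the final comparison is the kind of $n^{-2}+m^{-2}\lesssim(n+m)^{-2}$ bound, under $m\leq n\leq Dm$, that the last step relies on):

```python
import numpy as np

rng = np.random.default_rng(3)

# sqrt(ab) <= a/2 + b/2 (AM-GM) for a, b >= 0
a, b = rng.uniform(0, 10, size=2)
assert np.sqrt(a * b) <= a / 2 + b / 2 + 1e-12

# (sum_i a_i)^2 <= k * sum_i a_i^2 (Cauchy-Schwarz against the all-ones vector)
ak = rng.normal(size=7)
assert ak.sum() ** 2 <= len(ak) * (ak ** 2).sum() + 1e-12

# When m <= n <= D m, one has n+m <= 2n <= 2Dm, so
# n^{-2} + m^{-2} <= 4 D^2 (n+m)^{-2} up to constants; illustrated with D = 2:
m, n, D = 10, 17, 2
assert m <= n <= D * m
assert n ** -2 + m ** -2 <= 4 * D ** 2 / (n + m) ** 2
```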

When $P=Q$, using the same lemmas as above, we have

\begin{aligned}
\mathbb{E}\left(\textcircled{1}^{2}\,\middle|\,(Z_{i})_{i=1}^{s}\right)&\leq\frac{4}{n^{2}}(C_{1}+C_{2})^{2}\left\|\mathcal{M}\right\|_{\mathcal{L}^{\infty}(\mathscr{H})}^{4}\mathcal{N}^{2}_{2}(\lambda),\\
\mathbb{E}\left(\textcircled{2}^{2}\,\middle|\,(Z_{i})_{i=1}^{s}\right)&\leq\frac{4}{m^{2}}(C_{1}+C_{2})^{2}\left\|\mathcal{M}\right\|_{\mathcal{L}^{\infty}(\mathscr{H})}^{4}\mathcal{N}^{2}_{2}(\lambda),\\
\mathbb{E}\left(\textcircled{4}^{2}\,\middle|\,(Z_{i})_{i=1}^{s}\right)&\leq\frac{4}{nm}(C_{1}+C_{2})^{2}\left\|\mathcal{M}\right\|_{\mathcal{L}^{\infty}(\mathscr{H})}^{4}\mathcal{N}^{2}_{2}(\lambda),
\end{aligned}

and 3=5=0\leavevmode\hbox to12.54pt{\vbox to12.54pt{\pgfpicture\makeatletter\hbox{\hskip 6.27202pt\lower-6.27202pt\hbox to0.0pt{\pgfsys@beginscope\pgfsys@invoke{ }\definecolor{pgfstrokecolor}{rgb}{0,0,0}\pgfsys@color@rgb@stroke{0}{0}{0}\pgfsys@invoke{ }\pgfsys@color@rgb@fill{0}{0}{0}\pgfsys@invoke{ }\pgfsys@setlinewidth{0.4pt}\pgfsys@invoke{ }\nullfont\hbox to0.0pt{\pgfsys@beginscope\pgfsys@invoke{ }{ {{}}\hbox{\hbox{{\pgfsys@beginscope\pgfsys@invoke{ }{{}{{{}}}{{}}{}{}{}{}{}{}{}{}{}{{}\pgfsys@moveto{6.07202pt}{0.0pt}\pgfsys@curveto{6.07202pt}{3.35352pt}{3.35352pt}{6.07202pt}{0.0pt}{6.07202pt}\pgfsys@curveto{-3.35352pt}{6.07202pt}{-6.07202pt}{3.35352pt}{-6.07202pt}{0.0pt}\pgfsys@curveto{-6.07202pt}{-3.35352pt}{-3.35352pt}{-6.07202pt}{0.0pt}{-6.07202pt}\pgfsys@curveto{3.35352pt}{-6.07202pt}{6.07202pt}{-3.35352pt}{6.07202pt}{0.0pt}\pgfsys@closepath\pgfsys@moveto{0.0pt}{0.0pt}\pgfsys@stroke\pgfsys@invoke{ } }{{{{}}\pgfsys@beginscope\pgfsys@invoke{ }\pgfsys@transformcm{1.0}{0.0}{0.0}{1.0}{-2.0pt}{-2.57777pt}\pgfsys@invoke{ }\hbox{{\definecolor{pgfstrokecolor}{rgb}{0,0,0}\pgfsys@color@rgb@stroke{0}{0}{0}\pgfsys@invoke{ }\pgfsys@color@rgb@fill{0}{0}{0}\pgfsys@invoke{ }\hbox{{\footnotesize{3}}} }}\pgfsys@invoke{\lxSVG@closescope }\pgfsys@endscope}}} \pgfsys@invoke{\lxSVG@closescope }\pgfsys@endscope}}} } \pgfsys@invoke{\lxSVG@closescope }\pgfsys@endscope{{{}}}{}{}\hss}\pgfsys@discardpath\pgfsys@invoke{\lxSVG@closescope }\pgfsys@endscope\hss}}\lxSVG@closescope\endpgfpicture}}=\leavevmode\hbox to12.54pt{\vbox to12.54pt{\pgfpicture\makeatletter\hbox{\hskip 6.27202pt\lower-6.27202pt\hbox to0.0pt{\pgfsys@beginscope\pgfsys@invoke{ }\definecolor{pgfstrokecolor}{rgb}{0,0,0}\pgfsys@color@rgb@stroke{0}{0}{0}\pgfsys@invoke{ }\pgfsys@color@rgb@fill{0}{0}{0}\pgfsys@invoke{ }\pgfsys@setlinewidth{0.4pt}\pgfsys@invoke{ }\nullfont\hbox to0.0pt{\pgfsys@beginscope\pgfsys@invoke{ }{ {{}}\hbox{\hbox{{\pgfsys@beginscope\pgfsys@invoke{ 
}{{}{{{}}}{{}}{}{}{}{}{}{}{}{}{}{{}\pgfsys@moveto{6.07202pt}{0.0pt}\pgfsys@curveto{6.07202pt}{3.35352pt}{3.35352pt}{6.07202pt}{0.0pt}{6.07202pt}\pgfsys@curveto{-3.35352pt}{6.07202pt}{-6.07202pt}{3.35352pt}{-6.07202pt}{0.0pt}\pgfsys@curveto{-6.07202pt}{-3.35352pt}{-3.35352pt}{-6.07202pt}{0.0pt}{-6.07202pt}\pgfsys@curveto{3.35352pt}{-6.07202pt}{6.07202pt}{-3.35352pt}{6.07202pt}{0.0pt}\pgfsys@closepath\pgfsys@moveto{0.0pt}{0.0pt}\pgfsys@stroke\pgfsys@invoke{ } }{{{{}}\pgfsys@beginscope\pgfsys@invoke{ }\pgfsys@transformcm{1.0}{0.0}{0.0}{1.0}{-2.0pt}{-2.57777pt}\pgfsys@invoke{ }\hbox{{\definecolor{pgfstrokecolor}{rgb}{0,0,0}\pgfsys@color@rgb@stroke{0}{0}{0}\pgfsys@invoke{ }\pgfsys@color@rgb@fill{0}{0}{0}\pgfsys@invoke{ }\hbox{{\footnotesize{5}}} }}\pgfsys@invoke{\lxSVG@closescope }\pgfsys@endscope}}} \pgfsys@invoke{\lxSVG@closescope }\pgfsys@endscope}}} } \pgfsys@invoke{\lxSVG@closescope }\pgfsys@endscope{{{}}}{}{}\hss}\pgfsys@discardpath\pgfsys@invoke{\lxSVG@closescope }\pgfsys@endscope\hss}}\lxSVG@closescope\endpgfpicture}}=0. Therefore,

\mathbb{E}\left[\hat{\eta}_{\lambda}^{2}\,\middle|\,(Z_{i})_{i=1}^{s}\right]=\mathbb{E}\left[\left(①+②+④\right)^{2}\,\middle|\,(Z_{i})_{i=1}^{s}\right]
\stackrel{(*)}{\leq}\mathbb{E}\left(①^{2}+②^{2}+④^{2}\,\middle|\,(Z_{i})_{i=1}^{s}\right)
\stackrel{(\dagger)}{\leq}(C_{1}+C_{2})^{2}\left\|\mathcal{M}\right\|_{\mathcal{L}^{\infty}(\mathscr{H})}^{4}\mathcal{N}^{2}_{2}(\lambda)\left(\frac{6}{m^{2}}+\frac{6}{n^{2}}\right),

where $(*)$ follows by noting that $\mathbb{E}(①\cdot②)=\mathbb{E}(①\cdot④)=\mathbb{E}(②\cdot④)=0$ under the assumption $P=Q$, and $(\dagger)$ follows using $\sqrt{ab}\leq\frac{a}{2}+\frac{b}{2}.$
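As an illustrative numerical sanity check (not part of the proof), the $(\dagger)$ step — combining the three variance bounds via $\frac{1}{nm}=\sqrt{\frac{1}{n^{2}}\cdot\frac{1}{m^{2}}}\leq\frac{1}{2n^{2}}+\frac{1}{2m^{2}}$ — can be verified directly:

```python
# Numeric check of the (†) step: 4/n^2 + 4/m^2 + 4/(n*m) <= 6/n^2 + 6/m^2,
# which follows from 1/(n*m) = sqrt((1/n^2)*(1/m^2)) <= (1/n^2 + 1/m^2)/2.
for n in range(1, 30):
    for m in range(1, 30):
        lhs = 4 / n**2 + 4 / m**2 + 4 / (n * m)
        rhs = 6 / n**2 + 6 / m**2
        assert lhs <= rhs + 1e-12
```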

Lemma A.13.

For any n,m(0,)n,m\in(0,\infty), if mnDmm\leq n\leq Dm for some D1D\geq 1, then for any a>0a>0

1ma+1na2a(Da+1)(m+n)a.\frac{1}{m^{a}}+\frac{1}{n^{a}}\leq\frac{2^{a}(D^{a}+1)}{(m+n)^{a}}.
Proof.

Observe that $n\leq Dm$ gives $\frac{1}{m^{a}}\leq\frac{D^{a}}{n^{a}}$, and hence $\frac{1}{m^{a}}+\frac{1}{n^{a}}\leq\frac{D^{a}+1}{n^{a}}=\frac{2^{a}(D^{a}+1)}{(2n)^{a}}\leq\frac{2^{a}(D^{a}+1)}{(m+n)^{a}},$ where the last step uses $m\leq n$. ∎
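As an illustrative numerical spot-check (not part of the proof; `lhs` and `rhs` are hypothetical helper names), the inequality can be verified on a grid of parameters:

```python
# Lemma A.13 numerically: for m <= n <= D*m and a > 0,
# 1/m^a + 1/n^a <= 2^a * (D^a + 1) / (m + n)^a.
def lhs(m, n, a):
    return 1.0 / m**a + 1.0 / n**a

def rhs(m, n, a, D):
    return 2.0**a * (D**a + 1.0) / (m + n)**a

for a in (0.5, 1.0, 2.0):
    for m in (1, 5, 50):
        for D in (1, 2, 10):
            for n in (m, D * m):  # endpoints of the admissible range for n
                assert lhs(m, n, a) <= rhs(m, n, a, D) + 1e-12
```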

Lemma A.14.

Define q1αλ:=inf{q:Fλ(q)1α},q_{1-\alpha}^{\lambda}:=\inf\{q\in\mathbb{R}:F_{\lambda}(q)\geq 1-\alpha\}, where

Fλ(x):=1DπΠn+m𝟙(η^πλx),F_{\lambda}(x):=\frac{1}{D}\sum_{\pi\in\Pi_{n+m}}\mathds{1}(\hat{\eta}^{\pi}_{\lambda}\leq x),

is the permutation distribution function. Let (πi)i=1B(\pi^{i})_{i=1}^{B} be BB randomly selected permutations from Πn+m\Pi_{n+m} and define

F^Bλ(x):=1Bi=1B𝟙(η^iλx),\hat{F}^{B}_{\lambda}(x):=\frac{1}{B}\sum_{i=1}^{B}\mathds{1}(\hat{\eta}^{{}^{i}}_{\lambda}\leq x),

where η^iλ:=η^λ(Xπi,Yπi,Z)\hat{\eta}^{i}_{\lambda}:=\hat{\eta}_{\lambda}(X^{\pi^{i}},Y^{\pi^{i}},Z) is the statistic based on the permuted samples. Define

q^1αB,λ:=inf{q:F^Bλ(q)1α}.\hat{q}_{1-\alpha}^{B,\lambda}:=\inf\{q\in\mathbb{R}:\hat{F}^{B}_{\lambda}(q)\geq 1-\alpha\}.

Then, for any $\alpha>0$, $\tilde{\alpha}>0$, $\delta>0$, if $B\geq\frac{1}{2\tilde{\alpha}^{2}}\log(2\delta^{-1})$, the following hold:

  1. (i)

    Pπ(q^1αB,λq1αα~λ)1δP_{\pi}(\hat{q}_{1-\alpha}^{B,\lambda}\geq q_{1-\alpha-\tilde{\alpha}}^{\lambda})\geq 1-\delta;

  2. (ii)

    Pπ(q^1αB,λq1α+α~λ)1δ.P_{\pi}(\hat{q}_{1-\alpha}^{B,\lambda}\leq q_{1-\alpha+\tilde{\alpha}}^{\lambda})\geq 1-\delta.

Proof.

We first use the Dvoretzky–Kiefer–Wolfowitz (DKW) inequality (see Dvoretzky et al. 1956, Massart 1990) to obtain a uniform error bound for the empirical permutation distribution function, and then use this bound to control the empirical quantiles. Let

\mathcal{A}:=\left\{\sup_{x\in\mathbb{R}}|\hat{F}^{B}_{\lambda}(x)-F_{\lambda}(x)|\leq\sqrt{\frac{1}{2B}\log(2\delta^{-1})}\right\}.

Then the DKW inequality yields $P_{\pi}(\mathcal{A})\geq 1-\delta$. Now, on the event $\mathcal{A}$, we have

q^1αB,λ\displaystyle\hat{q}_{1-\alpha}^{B,\lambda} =inf{q:F^Bλ(q)1α}\displaystyle=\inf\{q\in\mathbb{R}:\hat{F}^{B}_{\lambda}(q)\geq 1-\alpha\}
inf{q:Fλ(q)+12Blog(2δ1)1α}\displaystyle\geq\inf\left\{q\in\mathbb{R}:F_{\lambda}(q)+\sqrt{\frac{1}{2B}\log(2\delta^{-1})}\geq 1-\alpha\right\}
=inf{q:Fλ(q)1α12Blog(2δ1)}.\displaystyle=\inf\left\{q\in\mathbb{R}:F_{\lambda}(q)\geq 1-\alpha-\sqrt{\frac{1}{2B}\log(2\delta^{-1})}\right\}.

Furthermore, we have

q^1αB,λ\displaystyle\hat{q}_{1-\alpha}^{B,\lambda} =inf{q:F^Bλ(q)1α}\displaystyle=\inf\{q\in\mathbb{R}:\hat{F}^{B}_{\lambda}(q)\geq 1-\alpha\}
inf{q:Fλ(q)12Blog(2δ1)1α}\displaystyle\leq\inf\left\{q\in\mathbb{R}:F_{\lambda}(q)-\sqrt{\frac{1}{2B}\log(2\delta^{-1})}\geq 1-\alpha\right\}
=inf{q:Fλ(q)1α+12Blog(2δ1)}.\displaystyle=\inf\left\{q\in\mathbb{R}:F_{\lambda}(q)\geq 1-\alpha+\sqrt{\frac{1}{2B}\log(2\delta^{-1})}\right\}.

Thus, (i) and (ii) hold whenever $\sqrt{\frac{1}{2B}\log(2\delta^{-1})}\leq\tilde{\alpha}$, which is equivalent to the condition $B\geq\frac{1}{2\tilde{\alpha}^{2}}\log(2\delta^{-1}).$ ∎
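A small Monte Carlo illustration of the lemma (a hypothetical setup: a standard normal stands in for the permutation distribution of the statistic, with `NormalDist` supplying the true quantiles):

```python
import math
import random
from statistics import NormalDist

alpha, alpha_t, delta = 0.10, 0.05, 0.05
# resampling budget B >= log(2/delta) / (2 * alpha_t^2) from the lemma
B = math.ceil(math.log(2 / delta) / (2 * alpha_t**2))

q_lo = NormalDist().inv_cdf(1 - alpha - alpha_t)  # true (1-alpha-alpha_t)-quantile
q_hi = NormalDist().inv_cdf(1 - alpha + alpha_t)  # true (1-alpha+alpha_t)-quantile

rng = random.Random(0)
trials, fail_lo, fail_hi = 300, 0, 0
for _ in range(trials):
    draws = sorted(rng.gauss(0, 1) for _ in range(B))
    # empirical (1-alpha)-quantile: inf{q : F_hat(q) >= 1 - alpha}
    q_hat = draws[math.ceil(B * (1 - alpha)) - 1]
    fail_lo += q_hat < q_lo
    fail_hi += q_hat > q_hi

# The lemma guarantees each failure probability is at most delta = 0.05.
assert fail_lo / trials <= 0.2 and fail_hi / trials <= 0.2
```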

Lemma A.15.

For 0<αe10<\alpha\leq e^{-1}, δ>0\delta>0 and mnDmm\leq n\leq Dm, there exists a constant C5>0C_{5}>0 such that

PH1(q1αλC5γ)1δ,\displaystyle P_{H_{1}}(q_{1-\alpha}^{\lambda}\leq C_{5}\gamma)\geq 1-\delta,

where

γ\displaystyle\gamma =()2log1αδ(n+m)(CλuL2(R)+𝒩2(λ)+Cλ1/4u3/2L2(R)+uL2(R))\displaystyle=\frac{\left\|\mathcal{M}\right\|_{\mathcal{L}^{\infty}(\mathscr{H})}^{2}\log\frac{1}{\alpha}}{\sqrt{\delta}(n+m)}\left(\sqrt{C_{\lambda}}\left\|u\right\|_{L^{2}(R)}+\mathcal{N}_{2}(\lambda)+C_{\lambda}^{1/4}\left\|u\right\|^{3/2}_{L^{2}(R)}+\left\|u\right\|_{L^{2}(R)}\right)
+ζlog1αδ(n+m),\displaystyle\qquad\qquad\qquad+\frac{\zeta\log\frac{1}{\alpha}}{\sqrt{\delta}(n+m)},

ζ=g1/2λ(Σ^PQ)(μQμP)2\zeta=\left\|g^{1/2}_{\lambda}(\hat{\Sigma}_{PQ})(\mu_{Q}-\mu_{P})\right\|_{\mathscr{H}}^{2}, and CλC_{\lambda} is defined in Lemma A.9.

Proof.

Let =g1/2λ(Σ^PQ)ΣPQ,λ1/2\mathcal{B}=g^{1/2}_{\lambda}(\hat{\Sigma}_{PQ})\Sigma_{PQ,\lambda}^{1/2}, a(x)=ΣPQ,λ1/2(K(,x)μP)a(x)=\mathcal{B}\Sigma_{PQ,\lambda}^{-1/2}(K(\cdot,x)-\mu_{P}) , and

b(x)=ΣPQ,λ1/2(K(,x)μQ).b(x)=\mathcal{B}\Sigma_{PQ,\lambda}^{-1/2}(K(\cdot,x)-\mu_{Q}).

By (Kim et al., 2022, Equation 59), we can conclude that given the samples (Zi)i=1s(Z_{i})_{i=1}^{s} there exists a constant C6>0C_{6}>0 such that

q1αλC6Ilog1α,q_{1-\alpha}^{\lambda}\leq C_{6}I\log\frac{1}{\alpha},

almost surely, where

I2\displaystyle I^{2} :=1m2(m1)2ija(Xi),a(Xj)2+1m2(m1)2ija(Yi),a(Yj)2\displaystyle:=\frac{1}{m^{2}(m-1)^{2}}\sum_{i\neq j}{\left\langle a(X_{i}),a(X_{j})\right\rangle}^{2}_{\mathscr{H}}+\frac{1}{m^{2}(m-1)^{2}}\sum_{i\neq j}{\left\langle a(Y_{i}),a(Y_{j})\right\rangle}^{2}_{\mathscr{H}}
+2m2(m1)2i,ja(Xi),a(Yj)2.\displaystyle\qquad+\frac{2}{m^{2}(m-1)^{2}}\sum_{i,j}{\left\langle a(X_{i}),a(Y_{j})\right\rangle}^{2}_{\mathscr{H}}.

We bound I2I^{2} as

I2\displaystyle I^{2} ()1m2(m1)2ija(Xi),a(Xj)2+1m2(m1)2ijb(Yi),b(Yj)2\displaystyle\stackrel{{\scriptstyle(*)}}{{\lesssim}}\frac{1}{m^{2}(m-1)^{2}}\sum_{i\neq j}{\left\langle a(X_{i}),a(X_{j})\right\rangle}^{2}_{\mathscr{H}}+\frac{1}{m^{2}(m-1)^{2}}\sum_{i\neq j}{\left\langle b(Y_{i}),b(Y_{j})\right\rangle}^{2}_{\mathscr{H}}
+1m2(m1)i=1mb(Yi),ΣPQ,λ1/2(μQμP)2+ζ2m(m1)\displaystyle\qquad+\frac{1}{m^{2}(m-1)}\sum_{i=1}^{m}{\left\langle b(Y_{i}),\mathcal{B}\Sigma_{PQ,\lambda}^{-1/2}(\mu_{Q}-\mu_{P})\right\rangle}^{2}_{\mathscr{H}}+\frac{\zeta^{2}}{m(m-1)}
+1m2(m1)i=1ma(Xi),ΣPQ,λ1/2(μQμP)2\displaystyle\qquad+\frac{1}{m^{2}(m-1)}\sum_{i=1}^{m}{\left\langle a(X_{i}),\mathcal{B}\Sigma_{PQ,\lambda}^{-1/2}(\mu_{Q}-\mu_{P})\right\rangle}^{2}_{\mathscr{H}}
+1m2(m1)2i,jma(Xi),b(Yj)2,\displaystyle\qquad+\frac{1}{m^{2}(m-1)^{2}}\sum_{i,j}^{m}{\left\langle a(X_{i}),b(Y_{j})\right\rangle}^{2}_{\mathscr{H}},

where $(*)$ follows by writing $a(Y)=b(Y)+\mathcal{B}\Sigma_{PQ,\lambda}^{-1/2}(\mu_{Q}-\mu_{P})$ and then using $(\sum_{i=1}^{k}a_{i})^{2}\leq k\sum_{i=1}^{k}a_{i}^{2}$ for any $a_{i}\in\mathbb{R}$, $k\in\mathbb{N}$. Then, following a procedure similar to that in the proof of Lemma A.12, we can bound the expectation of each term using Lemmas A.4, A.5, A.6, A.7, and A.9, resulting in

𝔼(I2|(Zi)i=1s)\displaystyle\mathbb{E}(I^{2}|(Z_{i})_{i=1}^{s})
()4m2(CλuL2(R)2+𝒩22(λ)+Cλu3L2(R)+uL2(R)2)+ζ2m2\displaystyle\lesssim\frac{\left\|\mathcal{M}\right\|_{\mathcal{L}^{\infty}(\mathscr{H})}^{4}}{m^{2}}\left(C_{\lambda}\left\|u\right\|_{L^{2}(R)}^{2}+\mathcal{N}^{2}_{2}(\lambda)+\sqrt{C_{\lambda}}\left\|u\right\|^{3}_{L^{2}(R)}+\left\|u\right\|_{L^{2}(R)}^{2}\right)+\frac{\zeta^{2}}{m^{2}}
()4(n+m)2(CλuL2(R)2+𝒩22(λ)+Cλu3L2(R)+uL2(R)2)+ζ2(n+m)2,\displaystyle\lesssim\frac{\left\|\mathcal{M}\right\|_{\mathcal{L}^{\infty}(\mathscr{H})}^{4}}{(n+m)^{2}}\left(C_{\lambda}\left\|u\right\|_{L^{2}(R)}^{2}+\mathcal{N}^{2}_{2}(\lambda)+\sqrt{C_{\lambda}}\left\|u\right\|^{3}_{L^{2}(R)}+\left\|u\right\|_{L^{2}(R)}^{2}\right)+\frac{\zeta^{2}}{(n+m)^{2}},

where in the last inequality we used Lemma A.13. Thus using q1αλC6Ilog1αq_{1-\alpha}^{\lambda}\leq C_{6}I\log\frac{1}{\alpha} and Markov’s inequality, we obtain the desired result. ∎
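The final Markov step — combining $q_{1-\alpha}^{\lambda}\leq C_{6}I\log\frac{1}{\alpha}$ with $P\big(I\geq\sqrt{\mathbb{E}(I^{2})/\delta}\big)\leq\delta$ — can be illustrated with a stand-in distribution for $I$ (a hypothetical half-normal; illustrative only):

```python
import random

# Markov on I^2: P(I >= sqrt(E[I^2]/delta)) = P(I^2 >= E[I^2]/delta) <= delta.
rng = random.Random(1)
delta = 0.1
samples = [abs(rng.gauss(0, 1)) for _ in range(100_000)]  # stand-in for I
second_moment = sum(x * x for x in samples) / len(samples)
threshold = (second_moment / delta) ** 0.5
exceed = sum(x >= threshold for x in samples) / len(samples)
assert exceed <= delta  # the guarantee holds (with large slack here)
```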

Lemma A.16.

Let ff be a function of a random variable XX and some (deterministic) parameter λΛ\lambda\in\Lambda, where Λ\Lambda has finite cardinality |Λ||\Lambda|. Let γ(α,λ)\gamma(\alpha,\lambda) be any function of λ\lambda and α>0\alpha>0. If for all λΛ\lambda\in\Lambda and α>0\alpha>0, P{f(X,λ)γ(α,λ)}αP\{f(X,\lambda)\geq\gamma(\alpha,\lambda)\}\leq\alpha, then

P{λΛf(X,λ)γ(α|Λ|,λ)}α.P\left\{\bigcup_{\lambda\in\Lambda}f(X,\lambda)\geq\gamma\left(\frac{\alpha}{|\Lambda|},\lambda\right)\right\}\leq\alpha.

Furthermore, if $P\{f(X,\lambda^{*})\geq\gamma(\alpha,\lambda^{*})\}\geq\delta$ for some $\lambda^{*}\in\Lambda$ and $\delta>0$, then

P{λΛf(X,λ)γ(α,λ)}δ.P\left\{\bigcup_{\lambda\in\Lambda}f(X,\lambda)\geq\gamma(\alpha,\lambda)\right\}\geq\delta.
Proof.

The proof of the first part follows directly from the union bound, i.e., for any events $A$ and $B$, $P(A\cup B)\leq P(A)+P(B)$:

P{λΛf(X,λ)γ(α|Λ|,λ)}\displaystyle P\left\{\bigcup_{\lambda\in\Lambda}f(X,\lambda)\geq\gamma\left(\frac{\alpha}{|\Lambda|},\lambda\right)\right\} λΛP{f(X,λ)γ(α|Λ|,λ)}\displaystyle\leq\sum_{\lambda\in\Lambda}P\left\{f(X,\lambda)\geq\gamma\left(\frac{\alpha}{|\Lambda|},\lambda\right)\right\}
λΛα|Λ|=α.\displaystyle\leq\sum_{\lambda\in\Lambda}\frac{\alpha}{|\Lambda|}=\alpha.

For the second part, we have

P{λΛf(X,λ)γ(α,λ)}P{f(X,λ)γ(α,λ)},\displaystyle P\left\{\bigcup_{\lambda\in\Lambda}f(X,\lambda)\geq\gamma(\alpha,\lambda)\right\}\geq P\left\{f(X,\lambda^{*})\geq\gamma(\alpha,\lambda^{*})\right\},

and the result follows. ∎
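Lemma A.16 is a Bonferroni-type correction over a finite grid of regularization parameters. As a numerical illustration (not part of the proof), the following Python sketch takes a hypothetical f(X,λ) that is standard normal for every λ, so that γ(α,·) is the normal upper-α quantile, and checks by Monte Carlo that the corrected threshold γ(α/|Λ|,·) controls the familywise error at level α.

```python
# Monte Carlo illustration of Lemma A.16 (union bound over a finite grid).
# Hypothetical setup: f(X, lambda) ~ N(0, 1) for each lambda, so
# gamma(alpha, .) is the standard normal upper-alpha quantile.
import random
from statistics import NormalDist

random.seed(0)
nd = NormalDist()
alpha = 0.05
Lambda = [0.1, 0.2, 0.4, 0.8]                # grid with |Lambda| = 4
gamma = nd.inv_cdf(1 - alpha / len(Lambda))  # corrected threshold

trials = 20000
hits = sum(
    any(random.gauss(0, 1) >= gamma for _ in Lambda)  # any lambda exceeds
    for _ in range(trials)
)
print(hits / trials)  # empirical familywise error, stays below alpha
```

The uncorrected threshold γ(α,·) would be exceeded with probability close to 1 − (1 − α)^|Λ| ≈ 0.19 here, while the corrected threshold keeps the union event below α.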

Lemma A.17.

Let HH be an RKHS with reproducing kernel kk defined on a separable topological space 𝒴\mathcal{Y}. Define

Σ=12𝒴𝒴(s(x)s(y))H(s(x)s(y))dR(x)dR(y),\Sigma=\frac{1}{2}\int_{\mathcal{Y}}\int_{\mathcal{Y}}(s(x)-s(y))\otimes_{H}(s(x)-s(y))\,dR(x)dR(y),

where s(x):=k(,x)s(x):=k(\cdot,x). Let (ψi)i(\psi_{i})_{i} be orthonormal eigenfunctions of Σ\Sigma with corresponding eigenvalues (λi)i(\lambda_{i})_{i} that satisfy C:=supiψiλi<C:=\sup_{i}\left\|\frac{\psi_{i}}{\sqrt{\lambda_{i}}}\right\|_{\infty}<\infty. Given (Yi)i=1ri.i.d.R(Y_{i})_{i=1}^{r}\stackrel{{\scriptstyle i.i.d.}}{{\sim}}R with r2r\geq 2, define

Σ^:=12r(r1)ijr(s(Yi)s(Yj))H(s(Yi)s(Yj)).\hat{\Sigma}:=\frac{1}{2r(r-1)}\sum_{i\neq j}^{r}(s(Y_{i})-s(Y_{j}))\otimes_{H}(s(Y_{i})-s(Y_{j})).

Then for any 0<δ120<\delta\leq\frac{1}{2}, r136C2𝒩1(λ)log8𝒩1(λ)δr\geq 136C^{2}\mathcal{N}_{1}(\lambda)\log\frac{8\mathcal{N}_{1}(\lambda)}{\delta} and λΣ(H)\lambda\leq\left\|\Sigma\right\|_{\mathcal{L}^{\infty}(H)}, where Σλ:=Σ+λ𝐈\Sigma_{\lambda}:=\Sigma+\lambda\boldsymbol{I} and 𝒩1(λ):=Tr(Σλ1/2ΣΣλ1/2)\mathcal{N}_{1}(\lambda):=\emph{Tr}(\Sigma_{\lambda}^{-1/2}\Sigma\Sigma_{\lambda}^{-1/2}), the following hold:

  1. (i)

    Pr{(Yi)i=1r:Σλ1/2(Σ^Σ)Σλ1/2(H)12}12δP^{r}\left\{(Y_{i})_{i=1}^{r}:\left\|\Sigma_{\lambda}^{-1/2}(\hat{\Sigma}-\Sigma)\Sigma_{\lambda}^{-1/2}\right\|_{\mathcal{L}^{\infty}(H)}\leq\frac{1}{2}\right\}\geq 1-2\delta;

  2. (ii)

    Pr{(Yi)i=1r:23Σλ1/2(Σ^+λ𝑰)1/2(H)2}12δP^{r}\left\{(Y_{i})_{i=1}^{r}:\sqrt{\frac{2}{3}}\leq\left\|\Sigma_{\lambda}^{1/2}(\hat{\Sigma}+\lambda\boldsymbol{I})^{-1/2}\right\|_{\mathcal{L}^{\infty}(H)}\leq\sqrt{2}\right\}\geq 1-2\delta;

  3. (iii)

    Pr{(Yi)i=1r:Σλ1/2(Σ^+λ𝑰)1/2(H)32}12δP^{r}\left\{(Y_{i})_{i=1}^{r}:\left\|\Sigma_{\lambda}^{-1/2}(\hat{\Sigma}+\lambda\boldsymbol{I})^{1/2}\right\|_{\mathcal{L}^{\infty}(H)}\leq\sqrt{\frac{3}{2}}\right\}\geq 1-2\delta.

Proof.

(i) Define A(x,y):=12(s(x)s(y))A(x,y):=\frac{1}{\sqrt{2}}(s(x)-s(y)), U(x,y):=Σλ1/2A(x,y)U(x,y):=\Sigma_{\lambda}^{-1/2}A(x,y), Z(x,y)=U(x,y)HU(x,y)Z(x,y)=U(x,y)\otimes_{H}U(x,y). Then

Σλ1/2(Σ^Σ)Σλ1/2=1r(r1)ijZ(Yi,Yj)𝔼(Z(X,Y)).\Sigma_{\lambda}^{-1/2}(\hat{\Sigma}-\Sigma)\Sigma_{\lambda}^{-1/2}=\frac{1}{r(r-1)}\sum_{i\neq j}Z(Y_{i},Y_{j})-\mathbb{E}(Z(X,Y)).

Also,

supx,yZ(x,y)2(H)\displaystyle\sup_{x,y}\left\|Z(x,y)\right\|_{\mathcal{L}^{2}(H)} =supx,yU(x,y)H2=12supx,yΣλ1/2(s(x)s(y))H2\displaystyle=\sup_{x,y}\left\|U(x,y)\right\|_{H}^{2}=\frac{1}{2}\sup_{x,y}\left\|\Sigma_{\lambda}^{-1/2}(s(x)-s(y))\right\|_{H}^{2}
=12supx,ys(x)s(y),Σλ1(s(x)s(y))H\displaystyle=\frac{1}{2}\sup_{x,y}{\left\langle s(x)-s(y),\Sigma_{\lambda}^{-1}(s(x)-s(y))\right\rangle}_{H}
=()12supx,yiλiλi+λ(ψi(x)ψi(y)λi)22C2𝒩1(λ),\displaystyle\stackrel{{\scriptstyle(*)}}{{=}}\frac{1}{2}\sup_{x,y}\sum_{i}\frac{\lambda_{i}}{\lambda_{i}+\lambda}\left(\frac{\psi_{i}(x)-\psi_{i}(y)}{\sqrt{\lambda_{i}}}\right)^{2}\leq 2C^{2}\mathcal{N}_{1}(\lambda), (A.1)

where in ()(*) we used that s(x)s(y),s(x)s(y)H=i(ψi(x)ψi(y))2{\left\langle s(x)-s(y),s(x)-s(y)\right\rangle}_{H}=\sum_{i}(\psi_{i}(x)-\psi_{i}(y))^{2} which is proved below. To this end, define a(x)=s(x)μRa(x)=s(x)-\mu_{R} so that

a(x),a(y)H\displaystyle{\left\langle a(x),a(y)\right\rangle}_{H} =k(,x)μR,k(,y)μRH\displaystyle={\left\langle k(\cdot,x)-\mu_{R},k(\cdot,y)-\mu_{R}\right\rangle}_{H}
=k(x,y)𝔼Rk(x,Y)𝔼Rk(X,y)+𝔼R×Rk(X,Y):=k¯(x,y).\displaystyle=k(x,y)-\mathbb{E}_{R}k(x,Y)-\mathbb{E}_{R}k(X,y)+\mathbb{E}_{R\times R}k(X,Y):=\bar{k}(x,y).

Therefore,

s(x)s(y),s(x)s(y)H\displaystyle{\left\langle s(x)-s(y),s(x)-s(y)\right\rangle}_{H} =a(x)a(y),a(x)a(y)H\displaystyle={\left\langle a(x)-a(y),a(x)-a(y)\right\rangle}_{H}
=k¯(x,x)2k¯(x,y)+k¯(y,y).\displaystyle=\bar{k}(x,x)-2\bar{k}(x,y)+\bar{k}(y,y).

Following the same argument as in the proof of Lemma A.9(i), we obtain

k¯(x,y)=iψi¯(x)ψi¯(y),\bar{k}(x,y)=\sum_{i}\bar{\psi_{i}}(x)\bar{\psi_{i}}(y),

where ψi¯=ψi𝔼Rψi\bar{\psi_{i}}=\psi_{i}-\mathbb{E}_{R}\psi_{i}, yielding

i(ψi(x)ψi(y))2=i(ψ¯i(x)ψ¯i(y))2=k¯(x,x)2k¯(x,y)+k¯(y,y).\displaystyle\sum_{i}(\psi_{i}(x)-\psi_{i}(y))^{2}=\sum_{i}(\bar{\psi}_{i}(x)-\bar{\psi}_{i}(y))^{2}=\bar{k}(x,x)-2\bar{k}(x,y)+\bar{k}(y,y).

Define ζ(x):=𝔼Y[Z(x,Y)]\zeta(x):=\mathbb{E}_{Y}[Z(x,Y)]. Then

supxζ(x)(H)\displaystyle\sup_{x}\left\|\zeta(x)\right\|_{\mathcal{L}^{\infty}(H)} supx,yU(x,y)HU(x,y)(H)=supx,yU(x,y)H22C2𝒩1(λ),\displaystyle\leq\sup_{x,y}\left\|U(x,y)\otimes_{H}U(x,y)\right\|_{\mathcal{L}^{\infty}(H)}=\sup_{x,y}\left\|U(x,y)\right\|_{H}^{2}\leq 2C^{2}\mathcal{N}_{1}(\lambda),

where the last inequality follows from (A.1). Furthermore, we have

𝔼(ζ(X)Σ)2𝔼ζ2(X)=𝔼(Σλ1/2𝔼Y[A(X,Y)HA(X,Y)]Σλ1/2)2\displaystyle\mathbb{E}(\zeta(X)-\Sigma)^{2}\preccurlyeq\mathbb{E}\zeta^{2}(X)=\mathbb{E}\left(\Sigma_{\lambda}^{-1/2}\mathbb{E}_{Y}[A(X,Y)\otimes_{H}A(X,Y)]\Sigma_{\lambda}^{-1/2}\right)^{2}
=𝔼(Σλ1/2𝔼Y[A(X,Y)HA(X,Y)]Σλ1𝔼Y[A(X,Y)HA(X,Y)]Σλ1/2)\displaystyle=\mathbb{E}\left(\Sigma_{\lambda}^{-1/2}\mathbb{E}_{Y}[A(X,Y)\otimes_{H}A(X,Y)]\Sigma_{\lambda}^{-1}\mathbb{E}_{Y}[A(X,Y)\otimes_{H}A(X,Y)]\Sigma_{\lambda}^{-1/2}\right)
supxζ(x)(H)𝔼(Σλ1/2𝔼Y[A(X,Y)HA(X,Y)]Σλ1/2)\displaystyle\preccurlyeq\sup_{x}\left\|\zeta(x)\right\|_{\mathcal{L}^{\infty}(H)}\mathbb{E}\left(\Sigma_{\lambda}^{-1/2}\mathbb{E}_{Y}[A(X,Y)\otimes_{H}A(X,Y)]\Sigma_{\lambda}^{-1/2}\right)
2C2𝒩1(λ)Σλ1/2ΣΣλ1/2:=S.\displaystyle\preccurlyeq 2C^{2}\mathcal{N}_{1}(\lambda)\Sigma_{\lambda}^{-1/2}\Sigma\Sigma_{\lambda}^{-1/2}:=S.

Note that S(H)2C2𝒩1(λ):=σ2\left\|S\right\|_{\mathcal{L}^{\infty}(H)}\leq 2C^{2}\mathcal{N}_{1}(\lambda):=\sigma^{2} and

d:=Tr(S)S(H)𝒩1(λ)(Σ(H)+λ)Σ(H)()2𝒩1(λ),d:=\frac{\text{Tr}(S)}{\left\|S\right\|_{\mathcal{L}^{\infty}(H)}}\leq\frac{\mathcal{N}_{1}(\lambda)(\left\|\Sigma\right\|_{\mathcal{L}^{\infty}(H)}+\lambda)}{\left\|\Sigma\right\|_{\mathcal{L}^{\infty}(H)}}\stackrel{{\scriptstyle(*)}}{{\leq}}2\mathcal{N}_{1}(\lambda),

where ()(*) follows by using λΣ(H)\lambda\leq\left\|\Sigma\right\|_{\mathcal{L}^{\infty}(H)}. Using Theorem D.3(ii) from Sriperumbudur and Sterge (2022), we get

Pr{(Yi)ri=1:Σλ1/2(Σ^Σ)Σλ1/2(H)4C2β𝒩1(λ)r+24C2β𝒩1(λ)r\displaystyle P^{r}\left\{(Y_{i})^{r}_{i=1}:\left\|\Sigma_{\lambda}^{-1/2}(\hat{\Sigma}-\Sigma)\Sigma_{\lambda}^{-1/2}\right\|_{\mathcal{L}^{\infty}(H)}\leq\frac{4C^{2}\beta\mathcal{N}_{1}(\lambda)}{r}+\sqrt{\frac{24C^{2}\beta\mathcal{N}_{1}(\lambda)}{r}}\right.
+16C2𝒩1(λ)log3δr}12δ,\displaystyle\qquad\qquad\qquad\left.+\frac{16C^{2}\mathcal{N}_{1}(\lambda)\log\frac{3}{\delta}}{r}\right\}\geq 1-2\delta,

where β:=23log4dδ\beta:=\frac{2}{3}\log\frac{4d}{\delta}. Then using 𝒩1(λ)Σ(H)λ+Σ(H)12>38\mathcal{N}_{1}(\lambda)\geq\frac{\left\|\Sigma\right\|_{\mathcal{L}^{\infty}(H)}}{\lambda+\left\|\Sigma\right\|_{\mathcal{L}^{\infty}(H)}}\geq\frac{1}{2}>\frac{3}{8}, it can be verified that 4C2β𝒩1(λ)r+24C2β𝒩1(λ)r+16C2𝒩1(λ)log3δr323w+8w\frac{4C^{2}\beta\mathcal{N}_{1}(\lambda)}{r}+\sqrt{\frac{24C^{2}\beta\mathcal{N}_{1}(\lambda)}{r}}+\frac{16C^{2}\mathcal{N}_{1}(\lambda)\log\frac{3}{\delta}}{r}\leq\frac{32}{3}w+\sqrt{8w}, where w:=2C2𝒩1(λ)log8𝒩1(λ)δrw:=\frac{2C^{2}\mathcal{N}_{1}(\lambda)\log\frac{8\mathcal{N}_{1}(\lambda)}{\delta}}{r}. Note that w168w\leq\frac{1}{68} implies 323w+8w12.\frac{32}{3}w+\sqrt{8w}\leq\frac{1}{2}. Thus, if r136C2𝒩1(λ)log8𝒩1(λ)δr\geq 136C^{2}\mathcal{N}_{1}(\lambda)\log\frac{8\mathcal{N}_{1}(\lambda)}{\delta} and λΣ(H)\lambda\leq\left\|\Sigma\right\|_{\mathcal{L}^{\infty}(H)}, we get

Pr{(Yi)i=1r:Σλ1/2(Σ^Σ)Σλ1/2(H)12}12δ.P^{r}\left\{(Y_{i})_{i=1}^{r}:\left\|\Sigma_{\lambda}^{-1/2}(\hat{\Sigma}-\Sigma)\Sigma_{\lambda}^{-1/2}\right\|_{\mathcal{L}^{\infty}(H)}\leq\frac{1}{2}\right\}\geq 1-2\delta.

(ii) By defining Br:=Σλ1/2(ΣΣ^)Σλ1/2B_{r}:=\Sigma_{\lambda}^{-1/2}(\Sigma-\hat{\Sigma})\Sigma_{\lambda}^{-1/2}, we have

Σλ1/2(Σ^+λ𝑰)1/2(H)\displaystyle\left\|\Sigma_{\lambda}^{1/2}(\hat{\Sigma}+\lambda\boldsymbol{I})^{-1/2}\right\|_{\mathcal{L}^{\infty}(H)} =(Σ^+λ𝑰)1/2Σλ(Σ^+λ𝑰)1/2(H)1/2\displaystyle=\left\|(\hat{\Sigma}+\lambda\boldsymbol{I})^{-1/2}\Sigma_{\lambda}(\hat{\Sigma}+\lambda\boldsymbol{I})^{-1/2}\right\|_{\mathcal{L}^{\infty}(H)}^{1/2}
=Σλ1/2(Σ^+λ𝑰)1Σλ1/2(H)1/2\displaystyle=\left\|\Sigma_{\lambda}^{1/2}(\hat{\Sigma}+\lambda\boldsymbol{I})^{-1}\Sigma_{\lambda}^{1/2}\right\|_{\mathcal{L}^{\infty}(H)}^{1/2}
=(𝑰Br)1(H)1/2(1Br(H))1/2,\displaystyle=\left\|(\boldsymbol{I}-B_{r})^{-1}\right\|_{\mathcal{L}^{\infty}(H)}^{1/2}\leq(1-\left\|B_{r}\right\|_{\mathcal{L}^{\infty}(H)})^{-1/2},

where the last inequality holds whenever Br(H)<1\left\|B_{r}\right\|_{\mathcal{L}^{\infty}(H)}<1. Similarly,

Σλ1/2(Σ^+λ𝑰)1/2(H)=(𝑰+(Br))1(H)1/2(1+Br(H))1/2.\left\|\Sigma_{\lambda}^{1/2}(\hat{\Sigma}+\lambda\boldsymbol{I})^{-1/2}\right\|_{\mathcal{L}^{\infty}(H)}=\left\|(\boldsymbol{I}+(-B_{r}))^{-1}\right\|_{\mathcal{L}^{\infty}(H)}^{1/2}\geq(1+\left\|B_{r}\right\|_{\mathcal{L}^{\infty}(H)})^{-1/2}.

The result therefore follows from (i).

(iii) Since

Σλ1/2(Σ^+λ𝑰)1/2(H)=𝑰Br(H)1/2(1+Br(H))1/2,\left\|\Sigma_{\lambda}^{-1/2}(\hat{\Sigma}+\lambda\boldsymbol{I})^{1/2}\right\|_{\mathcal{L}^{\infty}(H)}=\left\|\boldsymbol{I}-B_{r}\right\|_{\mathcal{L}^{\infty}(H)}^{1/2}\leq(1+\left\|B_{r}\right\|_{\mathcal{L}^{\infty}(H)})^{1/2},

the result follows from (i). ∎
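Although Lemma A.17 concerns operators on HH, the algebraic identity behind the estimator Σ̂ is already visible in finite dimensions: the U-statistic coincides exactly with the unbiased sample covariance of s(Y). The following Python sketch checks this with a hypothetical two-dimensional feature map standing in for k(·,y).

```python
# Finite-dimensional sanity check (not part of the proof): the U-statistic
#   Sigma_hat = (1 / (2 r (r-1))) * sum_{i != j} (s_i - s_j)(s_i - s_j)^T
# equals the unbiased sample covariance of s(Y), since Sigma is the
# covariance operator of s(Y) under R. The feature map s(y) = (y, y^2)
# is an arbitrary stand-in for k(., y).
import random

random.seed(1)
r = 50
ys = [random.random() for _ in range(r)]
S = [(y, y * y) for y in ys]  # s(Y_i) in R^2

sigma_hat = [[0.0, 0.0], [0.0, 0.0]]
for i in range(r):
    for j in range(r):
        if i != j:
            d0 = S[i][0] - S[j][0]
            d1 = S[i][1] - S[j][1]
            sigma_hat[0][0] += d0 * d0 / (2 * r * (r - 1))
            sigma_hat[0][1] += d0 * d1 / (2 * r * (r - 1))
            sigma_hat[1][0] += d1 * d0 / (2 * r * (r - 1))
            sigma_hat[1][1] += d1 * d1 / (2 * r * (r - 1))

# Unbiased sample covariance of s(Y), for comparison.
mean = [sum(s[c] for s in S) / r for c in (0, 1)]
cov = [[sum((s[a] - mean[a]) * (s[b] - mean[b]) for s in S) / (r - 1)
        for b in (0, 1)] for a in (0, 1)]

print(sigma_hat[0][0], cov[0][0])  # the two matrices agree up to rounding
```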

Lemma A.18.

For probability measures PP and QQ,

d(P,Q)=χ2(P||P+Q2)d(P,Q)=\sqrt{\chi^{2}\left(P||\frac{P+Q}{2}\right)}

is a metric. Furthermore, H2(P,Q)d2(P,Q)2H2(P,Q)H^{2}(P,Q)\leq d^{2}(P,Q)\leq 2H^{2}(P,Q), where H(P,Q)H(P,Q) denotes the Hellinger distance between PP and QQ.

Proof.

Observe that d2(P,Q)=χ2(P||P+Q2)=12(dPdQ)2d(P+Q)d^{2}(P,Q)=\chi^{2}\left(P||\frac{P+Q}{2}\right)=\frac{1}{2}\int\frac{(dP-dQ)^{2}}{d(P+Q)}. Clearly, d(P,P)=0d(P,P)=0, d(P,Q)=d(Q,P)d(P,Q)=d(Q,P), and d(P,Q)>0d(P,Q)>0 if PQP\neq Q, so it remains only to verify the triangle inequality. To this end, we first show that

dPdQd(P+Q)dPdZd(P+Z)+dZdQd(Z+Q),\frac{dP-dQ}{\sqrt{d(P+Q)}}\leq\frac{dP-dZ}{\sqrt{d(P+Z)}}+\frac{dZ-dQ}{\sqrt{d(Z+Q)}}, (A.2)

where ZZ is a probability measure. Defining α=dPdZdPdQ\alpha=\frac{dP-dZ}{dP-dQ}, note that d(P+Q)=αd(P+Z)+(1α)d(Z+Q)d(P+Q)=\alpha\cdot d(P+Z)+(1-\alpha)\cdot d(Z+Q). Therefore, using the convexity of the function 1x\frac{1}{\sqrt{x}} over (0,)(0,\infty) yields

dPdQd(P+Q)\displaystyle\frac{dP-dQ}{\sqrt{d(P+Q)}} (dPdQ)(αd(P+Z)+1αd(Z+Q))\displaystyle\leq(dP-dQ)\left(\frac{\alpha}{\sqrt{d(P+Z)}}+\frac{1-\alpha}{\sqrt{d(Z+Q)}}\right)
=dPdZd(P+Z)+dZdQd(Z+Q).\displaystyle=\frac{dP-dZ}{\sqrt{d(P+Z)}}+\frac{dZ-dQ}{\sqrt{d(Z+Q)}}.

Then by squaring (A.2) and applying the Cauchy–Schwarz inequality, we get

12(dPdQ)2d(P+Q)\displaystyle\frac{1}{2}\int\frac{(dP-dQ)^{2}}{d(P+Q)}
12(dPdZ)2d(P+Z)+12(dZdQ)2d(Z+Q)+(dPdZ)(dZdQ)d(P+Z)d(Z+Q)\displaystyle\leq\frac{1}{2}\int\frac{(dP-dZ)^{2}}{d(P+Z)}+\frac{1}{2}\int\frac{(dZ-dQ)^{2}}{d(Z+Q)}+\int\frac{(dP-dZ)(dZ-dQ)}{\sqrt{d(P+Z)}\sqrt{d(Z+Q)}}
12(dPdZ)2d(P+Z)+12(dZdQ)2d(Z+Q)+((dPdZ)2d(P+Z))1/2((dZdQ)2d(Z+Q))1/2\displaystyle\leq\frac{1}{2}\int\frac{(dP-dZ)^{2}}{d(P+Z)}+\frac{1}{2}\int\frac{(dZ-dQ)^{2}}{d(Z+Q)}+\left(\int\frac{(dP-dZ)^{2}}{d(P+Z)}\right)^{1/2}\left(\int\frac{(dZ-dQ)^{2}}{d(Z+Q)}\right)^{1/2}
=(d(P,Z)+d(Z,Q))2,\displaystyle=\left(d(P,Z)+d(Z,Q)\right)^{2},

which is equivalent to d(P,Q)d(P,Z)+d(Z,Q)d(P,Q)\leq d(P,Z)+d(Z,Q). For the relation with Hellinger distance, observe that H2(P,Q)=12(dPdQ)2H^{2}(P,Q)=\frac{1}{2}\int(\sqrt{dP}-\sqrt{dQ})^{2}, and d2(P,Q)=12(dPdQ)2(dP+dQ)2d(P+Q)d^{2}(P,Q)=\frac{1}{2}\int(\sqrt{dP}-\sqrt{dQ})^{2}\frac{(\sqrt{dP}+\sqrt{dQ})^{2}}{d(P+Q)}. Since d(P+Q)(dP+dQ)22d(P+Q)d(P+Q)\leq(\sqrt{dP}+\sqrt{dQ})^{2}\leq 2d(P+Q), the result follows. ∎
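The metric and sandwich properties of Lemma A.18 are easy to check numerically for discrete distributions. The following Python sketch, with three arbitrary distributions on three points, verifies the triangle inequality and H² ≤ d² ≤ 2H².

```python
# Numerical check of Lemma A.18 on discrete distributions:
# d(P,Q) = sqrt(chi^2(P || (P+Q)/2)) = sqrt((1/2) sum (p_i - q_i)^2 / (p_i + q_i)).
# The three distributions below are arbitrary.
from math import sqrt

def d(p, q):
    # d^2(P,Q) = (1/2) * sum_i (p_i - q_i)^2 / (p_i + q_i)
    return sqrt(0.5 * sum((pi - qi) ** 2 / (pi + qi)
                          for pi, qi in zip(p, q) if pi + qi > 0))

def hellinger(p, q):
    # H^2(P,Q) = (1/2) * sum_i (sqrt(p_i) - sqrt(q_i))^2
    return sqrt(0.5 * sum((sqrt(pi) - sqrt(qi)) ** 2 for pi, qi in zip(p, q)))

P = [0.5, 0.3, 0.2]
Q = [0.2, 0.2, 0.6]
Z = [0.4, 0.4, 0.2]

H2, d2 = hellinger(P, Q) ** 2, d(P, Q) ** 2
assert H2 <= d2 <= 2 * H2                    # sandwich H^2 <= d^2 <= 2 H^2
assert d(P, Q) <= d(P, Z) + d(Z, Q) + 1e-12  # triangle inequality
print(H2, d2)
```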

Lemma A.19.

Let u=dPdR1Ran(𝒯θ)u=\frac{dP}{dR}-1\in\emph{\text{Ran}}(\mathcal{T}^{\theta}) and DMMD2=μPμQ2D_{\mathrm{MMD}}^{2}=\left\|\mu_{P}-\mu_{Q}\right\|_{\mathscr{H}}^{2}. Then

DMMD24uL2(R)2θ+1θ𝒯θuL2(R)1θ.D_{\mathrm{MMD}}^{2}\geq 4\left\|u\right\|_{L^{2}(R)}^{\frac{2\theta+1}{\theta}}\left\|\mathcal{T}^{-\theta}u\right\|_{L^{2}(R)}^{\frac{-1}{\theta}}.
Proof.

Since uRan(𝒯θ)u\in\text{Ran}(\mathcal{T}^{\theta}), we have u=𝒯θfu=\mathcal{T}^{\theta}f for some fL2(R)f\in L^{2}(R). Thus,

u2L2(R)\displaystyle\left\|u\right\|^{2}_{L^{2}(R)} =iλi2θf,ϕi~L2(R)2=iλi2θf,ϕi~L2(R)4θ2θ+1f,ϕi~L2(R)22θ+1\displaystyle=\sum_{i}\lambda_{i}^{2\theta}{\left\langle f,\tilde{\phi_{i}}\right\rangle}_{L^{2}(R)}^{2}=\sum_{i}\lambda_{i}^{2\theta}{\left\langle f,\tilde{\phi_{i}}\right\rangle}_{L^{2}(R)}^{\frac{4\theta}{2\theta+1}}{\left\langle f,\tilde{\phi_{i}}\right\rangle}_{L^{2}(R)}^{\frac{2}{2\theta+1}}
()(iλi2θ+1f,ϕi~L2(R)2)2θ2θ+1(if,ϕi~L2(R)2)12θ+1\displaystyle\stackrel{{\scriptstyle(*)}}{{\leq}}\left(\sum_{i}\lambda_{i}^{2\theta+1}{\left\langle f,\tilde{\phi_{i}}\right\rangle}_{L^{2}(R)}^{2}\right)^{\frac{2\theta}{2\theta+1}}\left(\sum_{i}{\left\langle f,\tilde{\phi_{i}}\right\rangle}_{L^{2}(R)}^{2}\right)^{\frac{1}{2\theta+1}}
=𝒯1/2uL2(R)4θ2θ+1𝒯θuL2(R)22θ+1,\displaystyle=\left\|\mathcal{T}^{1/2}u\right\|_{L^{2}(R)}^{\frac{4\theta}{2\theta+1}}\left\|\mathcal{T}^{-\theta}u\right\|_{L^{2}(R)}^{\frac{2}{2\theta+1}},

where ()(*) holds by Hölder’s inequality. The desired result follows by noting that DMMD2=4𝒯1/2u2L2(R).D_{\mathrm{MMD}}^{2}=4\left\|\mathcal{T}^{1/2}u\right\|^{2}_{L^{2}(R)}. ∎
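The Hölder step (∗) can be sanity-checked numerically. The following Python sketch, with arbitrary eigenvalues λᵢ and coefficients cᵢ standing in for ⟨f, φ̃ᵢ⟩, verifies the interpolation inequality for one choice of θ.

```python
# Numerical check of the Holder step in Lemma A.19: with c_i = <f, phi_i>,
#   sum_i lam_i^{2 theta} c_i^2
#     <= (sum_i lam_i^{2 theta + 1} c_i^2)^{2 theta / (2 theta + 1)}
#        * (sum_i c_i^2)^{1 / (2 theta + 1)}.
# The eigenvalues and coefficients below are arbitrary.
theta = 0.7
lam = [0.9, 0.5, 0.25, 0.1, 0.02]
c = [1.0, -0.4, 0.3, 0.8, -0.1]

lhs = sum(l ** (2 * theta) * ci ** 2 for l, ci in zip(lam, c))
s1 = sum(l ** (2 * theta + 1) * ci ** 2 for l, ci in zip(lam, c))
s2 = sum(ci ** 2 for ci in c)
rhs = s1 ** (2 * theta / (2 * theta + 1)) * s2 ** (1 / (2 * theta + 1))
assert lhs <= rhs + 1e-12
print(lhs, rhs)
```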

Lemma A.20.

Suppose (A1)(A_{1}) and (A3)(A_{3}) hold. Then limλ0xgλ(x)1\lim_{\lambda\rightarrow 0}xg_{\lambda}(x)\asymp 1 for all xΓx\in\Gamma.

Proof.

(A3)(A_{3}) states that

sup{xΓ:xgλ(x)<B3}|B3xgλ(x)|x2φC3λ2φ,\sup_{\{x\in\Gamma:xg_{\lambda}(x)<B_{3}\}}|B_{3}-xg_{\lambda}(x)|x^{2\varphi}\leq C_{3}\lambda^{2\varphi},

where Γ:=[0,κ]\Gamma:=[0,\kappa], φ(0,ξ]\varphi\in(0,\xi], and C3,B3C_{3},B_{3} are positive constants, all independent of λ>0\lambda>0. Taking the limit as λ0\lambda\to 0 on both sides of the above inequality, we obtain

0limλ0sup{xΓ:xgλ(x)<B3}|B3xgλ(x)|x2φ0,0\leq\lim_{\lambda\to 0}\sup_{\{x\in\Gamma:xg_{\lambda}(x)<B_{3}\}}|B_{3}-xg_{\lambda}(x)|x^{2\varphi}\leq 0,

which yields that

limλ0sup{xΓ:xgλ(x)<B3}|B3xgλ(x)|x2φ=0.\lim_{\lambda\to 0}\sup_{\{x\in\Gamma:xg_{\lambda}(x)<B_{3}\}}|B_{3}-xg_{\lambda}(x)|x^{2\varphi}=0.

In particular, for each fixed x>0x>0 with xgλ(x)<B3xg_{\lambda}(x)<B_{3}, (A3)(A_{3}) gives |B3xgλ(x)|C3λ2φx2φ0|B_{3}-xg_{\lambda}(x)|\leq C_{3}\lambda^{2\varphi}x^{-2\varphi}\to 0 as λ0\lambda\to 0, which implies that

limλ0xgλ(x)=B3\lim_{\lambda\to 0}xg_{\lambda}(x)=B_{3}

on {xΓ:xgλ(x)<B3}\{x\in\Gamma:xg_{\lambda}(x)<B_{3}\}. On {xΓ:xgλ(x)B3}\{x\in\Gamma:xg_{\lambda}(x)\geq B_{3}\}, the result is trivial since (A1)(A_{1}) dictates that B3xgλ(x)C1B_{3}\leq xg_{\lambda}(x)\leq C_{1}. ∎
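As a concrete instance of Lemma A.20, Tikhonov regularization gλ(x) = 1/(x + λ) satisfies conditions of the above type with B₃ = 1 (a standard example in the spectral regularization literature), and the following Python sketch illustrates xgλ(x) → 1 as λ → 0 for fixed x > 0.

```python
# Illustration of Lemma A.20 for Tikhonov regularization, where
# g_lambda(x) = 1 / (x + lambda): for every fixed x > 0,
# x * g_lambda(x) = x / (x + lambda) -> 1 as lambda -> 0.
def g(lmbda):
    return lambda x: 1.0 / (x + lmbda)

x = 0.3
for lmbda in [1e-1, 1e-3, 1e-6]:
    print(lmbda, x * g(lmbda)(x))  # increases toward 1 as lambda shrinks
```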