A Distribution Free Conditional Independence Test with Applications to Causal Discovery

Zhanrui Cai¹, Runze Li², and Yaowu Zhang³
^1,2 The Pennsylvania State University, USA
³ Shanghai University of Finance and Economics, China

Abstract

This paper is concerned with test of the conditional independence. We first establish an equivalence between the conditional independence and the mutual independence. Based on the equivalence, we propose an index to measure the conditional dependence by quantifying the mutual dependence among the transformed variables. The proposed index has several appealing properties. (a) It is distribution free since the limiting null distribution of the proposed index does not depend on the population distributions of the data. Hence the critical values can be tabulated by simulations. (b) The proposed index ranges from zero to one, and equals zero if and only if the conditional independence holds. Thus, it has nontrivial power under the alternative hypothesis. (c) It is robust to outliers and heavy-tailed data since it is invariant to conditional strictly monotone transformations. (d) It has low computational cost since it incorporates a simple closed-form expression and can be implemented in quadratic time. (e) It is insensitive to tuning parameters involved in the calculation of the proposed index. (f) The new index is applicable for multivariate random vectors as well as for discrete data. All these properties enable us to use the new index as statistical inference tools for various data. The effectiveness of the method is illustrated through extensive simulations and a real application on causal discovery.

Key words : Conditional independence, mutual independence, distribution free.

1 Introduction

Conditional independence is fundamental in graphical models and causal inference (Jordan, 1998). Under multinormality assumption, conditional independence is equivalent to the corresponding partial correlation being 0. Thus, partial correlation may be used to measure conditional dependence (Lawrance, 1976). However, partial correlation has low power in detecting conditional dependence in the presence of nonlinear dependence. In addition, it cannot control Type I error when the multinormality assumption is violated. In general, testing for conditional independence is much more challenging than for unconditional independence (Zhang et al., 2011; Shah and Peters, 2020).

Recent works on test of conditional independence have focused on developing omnibus conditional independence test without assuming specific functional forms of the dependencies. Linton and Gozalo (1996) proposed a nonparametric conditional independence test based on the generalization of empirical distribution function, and proposed using bootstrap to obtain the null distribution of the proposed test. This diminishes the computational efficiency. Other approaches include measuring the difference between conditional characteristic functions (Su and White, 2007), the weighted Hellinger distance (Su and White, 2008), and the empirical likelihood (Su and White, 2014). Although these authors established the asymptotical normality of the proposed test under conditional independence, the performance of their proposed tests relies heavily on consistent estimate of the bias and variance terms, which are quite complicated in practice. The asymptotical null distribution may perform badly with a small sample. Thus, the authors recommended obtaining critical values of the proposed tests by a bootstrap. This results in heavy computation burdens. Huang (2010) proposed a test of conditional independence based on the maximum nonlinear conditional correlation. By discretizing the conditioning set into a set of bins, the author transforms the original problem into an unconditional testing problem. Zhang et al. (2011) proposed a kernel-based conditional independence test, which essentially tests for zero Hilbert-Schmidt norm of the partial cross-covariance operator in the reproducing kernel Hilbert spaces. The test also required a bootstrap to approximate the null distribution. Wang et al. (2015) introduced the energy statistics into the conditional test and developed the conditional distance correlation based on Székely et al. (2007), which can also be linked to kernel-based approaches. But the test statistics requires to compute high order U–statistics and therefore suffers heavy computation burden, which is of order $O(n^{3})$ for a sample with size $n$ . Runge (2018) proposed a non-parametric conditional independence testing based on the information theory framework, in which the conditional mutual information was estimated directly via combining the $k$ -nearest neighbor estimator with a nearest-neighbor local permutation scheme. However, the theoretical distribution of the proposed test is unclear.

In this paper, we develop a new methodology to test conditional independence and propose conditional independence tests that are applicable for continuous or discrete random variables or vectors. Let $X$ , $Y$ and $Z$ be three continuous random variables. We are interested in testing whether $X$ and $Y$ are statistically independent given $Z$ :

\displaystyle H_{0}:X\bot\!\!\!\bot Y\mid Z,\quad\textrm{ versus }\quad H_{1}:\textrm{ otherwise}.

Here we focus on random variables for simplicity. We will consider test of conditional independence for random vectors in Section 3. To begin with, we observe that with Rosenblatt transformation (Rosenblatt, 1952), i.e., $U\stackrel{{\scriptstyle\mbox{{\rm\tiny def}}}}{{=}}F_{X\mid Z}(X\mid Z)$ , $V\stackrel{{\scriptstyle\mbox{{\rm\tiny def}}}}{{=}}F_{Y\mid Z}(Y\mid Z)$ and $W\stackrel{{\scriptstyle\mbox{{\rm\tiny def}}}}{{=}}F_{Z}(Z)$ , $X\bot\!\!\!\bot Y\mid Z$ is equivalent to the mutual independence of $U$ , $V$ and $W$ . Thus we convert a conditional independence test into a mutual independence test, and any technique for testing mutual independence can be readily applied. For example, Chakraborty and Zhang (2019) proposed the joint distance covariance to test mutual independence and Drton et al. (2020) constructed a family of tests with maxima of rank correlations in high dimensions. However, these mutual independence tests do not consider the intrinsic properties of $U$ , $V$ and $W$ . This motivates us to develop a new index $\rho$ to measure the mutual dependence. We show that the index $\rho$ has a closed form, which is much simpler than that of Chakraborty and Zhang (2019). In addition, it is symmetric, invariant to strictly monotone transformations, and ranges from zero to one, and is equal to zero if and only if $U$ , $V$ and $W$ are mutually independent. Based on the index $\rho$ , we further proposed tests of conditional independence. We would like to further note a recent work proposed by Zhou et al. (2020), who suggested to simply test whether $U$ and $V$ are independent. However, this is not fully equivalent to the conditional independence test and it is unclear what kind of power loss one might have.

The proposed tests have several appealing features. (a) The proposed test is distribution free in the sense that its limiting null distribution does not depend on unknown parameters and the population distributions of the data. The fact that both $U$ and $V$ are independent of $W$ makes the test statistic $n$ -consistent under the null hypothesis without requiring under-smoothing. In addition, even though the test statistic depends on $U$ , $V$ and $W$ , which needs to be estimated nonparametrically, we show that the test statistic has the same asymptotic properties as the statistics where true $U$ , $V$ and $W$ are directly available. This leads to a distribution free test statistic when further considering that $U,V$ and $W$ are uniformly distributed. Although some tests in the literature are also distribution free, the asymptotic distributions are either complicated to estimate (e.g., Su and White, 2007) or rely on the Gaussian process, which is not known how to simulate (e.g., Song, 2009) and would require a wild bootstrap method to determine the critical values. Compared with existing ones, the limiting null distribution of the proposed test depends on $U$ , $V$ and $W$ only, and the critical values can be easily obtained by a simulation-based procedure. (b) The proposed test has nontrivial power against all fixed alternatives. The population version of the test statistic ranges from zero to one and equals zero if and only if conditional independence holds. Unlike many testing procedures that are weaker than that for conditional independence (e.g., Song, 2009), the equivalence between conditional independence and mutual independence guarantees that the newly proposed test has nontrivial power against all fixed alternatives. (c) The proposed test is robust since it is invariant to strictly monotone transformations and thus, it is robust to outliers. Furthermore, $U$ , $V$ and $W$ all have bounded support, and therefore it is suitable for handling heavy-tailed data. (d) The proposed test has low computational cost. It is a $V$ –statistic, and direct calculation requires only $O(n^{2})$ computational complexity. (e) It is insensitive to tuning parameters involved in the test statistics. The test statistics are $n$ –consistent under the null hypothesis without under-smoothing, and is hence much less sensitive to the bandwidth. The proposed index $\rho$ is extended to continuous random vectors and discrete data in Section 3. All these properties enable us to use the new conditional independence test for various data.

The rest of this paper is organized as follows. In Section 2 we first show the equivalence between conditional independence and mutual independence. We propose a new index to measure the mutual dependence, and derive desirable properties of the proposed index in Section 2.1. We propose an estimator for the new index in Section 2.2. The asymptotic distributions of the proposed estimator under the null hypothesis, global alternative, and local alternative hypothesis are derived in Section 2.2. We extend the new index to the multivariate and discrete cases in Section 3. We conduct numerical comparisons and apply the proposed test to causal discovery in directed acyclic graphs in Section 4. Some final remarks are given in Section 5. We provide some additional simulation results as well as all the technical proofs in the appendix.

2 Methodology

To begin with, we establish an equivalence between conditional independence and mutual independence. In this section, we focus on the setting in which $X$ , $Y$ and $Z$ are continuous univariate random variables, and the problem of interest is to test $X\bot\!\!\!\bot Y\mid Z$ . Throughout this section, denote $U=F_{X\mid Z}(X\mid Z)$ , $V=F_{Y\mid Z}(Y\mid Z)$ and $W=F_{Z}(Z)$ . The proposed methodology is built upon the following proposition.

Proposition 1

Suppose that $X$ and $Y$ are both univariate and have continuous conditional distribution functions for every given value of $Z$ , and $Z$ is a continuous univariate random variable. Then $X\bot\!\!\!\bot Y\mid Z$ if and only if $U,$ $V$ and $W$ are mutually independent.

We provide a detailed proof of Proposition 1 in the appendix. Essentially, it establishes an equivalence between the conditional independence of $X\bot\!\!\!\bot Y\mid Z$ and the mutual independence among $U$ , $V$ and $W$ under the conditions in Proposition 1. Therefore, we can alleviate the hardness issue of conditional independence testing (Shah and Peters, 2020) by restricting the distribution family of the data such that $U$ , $V$ and $W$ can be estimated sufficiently well using samples. As shown in our theoretical analysis, we further impose certain smoothness conditions on the conditional distributions of $X,Y\mid Z=z$ as $z$ varies in the support of $Z$ . This distribution family is also considered in Neykov et al. (2020) to develop a minimax optimal conditional independence test. We discuss the extension of Proposition 1 to multivariate and discrete data in Section 3.

According to Proposition 1, any techniques for testing mutual independence among three random variables can be readily applied for conditional independence testing problems. For example, Chakraborty and Zhang (2019) proposed the joint distance covariance and Patra et al. (2016) developed a bootstrap procedure to test mutual independence with known marginals. However, a direct application of these metrics may not be a good choice because it ignores the fact that the variables $U$ , $V$ and $W$ are all uniformly distributed, as well as $U\bot\!\!\!\bot W$ and $V\bot\!\!\!\bot W$ . Next, we discuss how to develop a new mutual independence test while considering these intrinsic properties of $(U,V,W)$ .

2.1 A mutual independence test

In this section, we propose to characterize the conditional dependence of $X$ and $Y$ given $Z$ through quantifying the mutual dependence among $U$ , $V$ and $W$ . Although our proposed test is based on the distance between characteristics functions, our proposed test is much simpler and has different asymptotic distribution as well as different convergence rate from the conditional distance correlation proposed by Wang et al. (2015). Let $\omega(\cdot)$ be an arbitrary positive weight function and $\varphi_{U,V,W}(\cdot)$ , $\varphi_{U}(\cdot)$ , $\varphi_{V}(\cdot)$ , and $\varphi_{W}(\cdot)$ be the characteristic functions of $(U,V,W)$ , $U$ , $V$ and $W$ , respectively. Then

			$U$ , $V$ and $W$ are mutually independent
		$\displaystyle\Longleftrightarrow$	$\displaystyle\varphi_{U,V,W}(t_{1},t_{2},t_{3})=\varphi_{U}(t_{1})\varphi_{V}(t_{2})\varphi_{W}(t_{3})\textrm{ for all $t_{1},t_{2},t_{3}\in\mathbb{R}$}$
		$\displaystyle\Longleftrightarrow$	$\displaystyle\iiint\big{\\|}\varphi_{U,V,W}(t_{1},t_{2},t_{3})-\varphi_{U}(t_{1})\varphi_{V}(t_{2})\varphi_{W}(t_{3})\big{\\|}^{2}\omega(t_{1},t_{2},t_{3})dt_{1}dt_{2}dt_{3}=0,$

where $\|\psi\|^{2}=\psi^{\mbox{\tiny{T}}}\overline{\psi}$ for a complex-valued function $\psi$ and $\overline{\psi}$ is the conjugate of $\psi$ . By choosing $\omega(t_{1},t_{2},t_{3})$ to be the joint probability density function of three independent and identically distributed standard Cauchy random variables, the integration in the above equation has a closed form,

			$\displaystyle Ee^{-\|U_{1}-U_{2}\|-\|V_{1}-V_{2}\|-\|W_{1}-W_{2}\|}-2Ee^{-\|U_{1}-U_{3}\|-\|V_{1}-V_{4}\|-\|W_{1}-W_{2}\|}$
			$\displaystyle\hskip 85.35826pt+Ee^{-\|U_{1}-U_{2}\|}Ee^{-\|V_{1}-V_{2}\|}Ee^{-\|W_{1}-W_{2}\|},$		(1)

where $(U_{k},V_{k},W_{k})$ , $k=1,\ldots,4$ , are four independent copies of $(U,V,W)$ . Here the choice of the weight function $\omega(t_{1},t_{2},t_{3})$ is mainly for the convenient analytic form of the integration. Different from the distance correlation (Székely et al., 2007), our integration exists without any moment conditions on the data, which is more widely applicable. Furthermore, with the fact that $U\bot\!\!\!\bot W$ and $V\bot\!\!\!\bot W$ , (1) boils down to

\displaystyle E\left\{S_{U}(U_{1},U_{2})S_{V}(V_{1},V_{2})e^{-|W_{1}-W_{2}|}\right\},

(2)

where $S_{U}(U_{1},U_{2})$ and $S_{V}(V_{1},V_{2})$ are defined as

	$\displaystyle S_{U}(U_{1},U_{2})$	$\displaystyle=$	$\displaystyle E\left\{e^{-\|U_{1}-U_{2}\|}+e^{-\|U_{3}-U_{4}\|}-e^{-\|U_{1}-U_{3}\|}-e^{-\|U_{2}-U_{3}\|}\mid(U_{1},U_{2})\right\},$
	$\displaystyle S_{V}(V_{1},V_{2})$	$\displaystyle=$	$\displaystyle E\left\{e^{-\|V_{1}-V_{2}\|}+e^{-\|V_{3}-V_{4}\|}-e^{-\|V_{1}-V_{3}\|}-e^{-\|V_{2}-V_{3}\|}\mid(V_{1},V_{2})\right\}.$

Recall that $U$ , $V$ and $W$ are uniformly distributed on $(0,1)$ . With further calculations based on (2), we obtain a normalized index and define it as $\rho$ to measure the mutual dependence:

	$\displaystyle\rho(X,Y\mid Z)$	$\displaystyle=$	$\displaystyle c_{0}E\big{\{}\left(e^{-\|U_{1}-U_{2}\|}+e^{-U_{1}}+e^{U_{1}-1}+e^{-U_{2}}+e^{U_{2}-1}+2e^{-1}-4\right)$		(3)
			$\displaystyle\left(e^{-\|V_{1}-V_{2}\|}+e^{-V_{1}}+e^{V_{1}-1}+e^{-V_{2}}+e^{V_{2}-1}+2e^{-1}-4\right)e^{-\|W_{1}-W_{2}\|}\big{\}},$		(3)

where $c_{0}=(13e^{-3}-40e^{-2}+13e^{-1})^{-1}$ . Several appealing properties of the proposed index $\rho(X,Y\mid Z)$ are summarized in Theorem 1.

Theorem 1

Suppose that the conditions in Proposition 1 are fulfilled. The index $\rho(X,Y\mid Z)$ defined in (3) has the following properties:

(1) $0\leq\rho(X,Y\mid Z)\leq 1$ , $\rho(X,Y\mid Z)=0$ holds if and only if $X\bot\!\!\!\bot Y\mid Z$ . Furthermore, if $F_{X|Z}(X|Z)=F_{Y|Z}(Y|Z)$ or $F_{X|Z}(X|Z)+F_{Y|Z}(Y|Z)=1$ , then $\rho(X,Y\mid Z)=1$ .

(2) The index $\rho$ is symmetric conditioning on $Z$ . That is, $\rho(X,Y\mid Z)=\rho(Y,X\mid Z)$ .

(3) For any strictly monotone transformations $m_{1}(\cdot)$ , $m_{2}(\cdot)$ and $m_{3}(\cdot)$ , $\rho(X,Y\mid Z)=\rho\left\{m_{1}(X),m_{2}(Y)\mid m_{3}(Z)\right\}$ .

The step-by-step derivation of $\rho(X,Y\mid Z)$ and proof of Theorem 1 are presented in the appendix. Property (1) indicates that the index $\rho$ ranges from zero to one, equals zero when the conditional independence holds, and is equal to one if $Y$ is a strictly monotone transformation of $X$ conditional on $Z$ . Property (2) shows that the index $\rho$ is a symmetric measure of conditional dependence. Property (3) illustrates that the index $\rho$ is invariant to any strictly monotone transformation. In fact, $\rho$ is not only invariant to marginal strictly monotone transformations, but also invariant to strictly monotone transformations conditional on $Z$ . For example, it can be verified that $\rho(X,Y\mid Z)=\rho[m_{1}\{X-E(X\mid Z)\},Y\mid Z]$ .

2.2 Asymptotic properties

In this section, we establish the asymptotic properties of the sample version of the proposed index under the null and alternative hypothesis. Consider independent and identically distributed samples $\{X_{i},Y_{i},Z_{i}\}$ , $i=1,\dots,n$ . To estimate the proposed index $\rho(X,Y\mid Z)$ , we apply kernel estimator for the conditional cumulative distribution function. Specifically, define

	$\displaystyle\widehat{f}_{Z}(z)=n^{-1}\sum_{i=1}^{n}K_{h}(z-Z_{i}),$
	$\displaystyle\widehat{U}=\widehat{F}_{X\mid Z}(x\mid z)=n^{-1}\sum_{i=1}^{n}K_{h}(z-Z_{i})\mathbbm{1}(X_{i}\leq x)/\widehat{f}_{Z}(z),$
	$\displaystyle\widehat{V}=\widehat{F}_{Y\mid Z}(y\mid z)=n^{-1}\sum_{i=1}^{n}K_{h}(z-Z_{i})\mathbbm{1}(Y_{i}\leq y)/\widehat{f}_{Z}(z),$

where $K_{h}(\cdot)=K(\cdot/h)/h$ , $K(\cdot)$ is a kernel function, and $h$ is the bandwidth. Besides, we use empirical distribution function to estimate the cumulative distribution function, i.e., $\widehat{W}=\widehat{F}_{Z}(z)=n^{-1}\sum_{i=1}^{n}\mathbbm{1}(Z_{i}\leq z)$ . The sample version of the index, denoted by $\widehat{\rho}(X,Y\mid Z)$ , is thus given by

	$\displaystyle\widehat{\rho}(X,Y\mid Z)$	$\displaystyle=$	$\displaystyle c_{0}n^{-2}\sum_{i,j}\left\{\left(e^{-\|\widehat{U}_{i}-\widehat{U}_{j}\|}+e^{-\widehat{U}_{i}}+e^{\widehat{U}_{i}-1}+e^{-\widehat{U}_{j}}+e^{\widehat{U}_{j}-1}+2e^{-1}-4\right)\right.$
			$\displaystyle\left.\left(e^{-\|\widehat{V}_{i}-\widehat{V}_{j}\|}+e^{-\widehat{V}_{i}}+e^{\widehat{V}_{i}-1}+e^{-\widehat{V}_{j}}+e^{\widehat{V}_{j}-1}+2e^{-1}-4\right)e^{-\|\widehat{W}_{i}-\widehat{W}_{j}\|}\right\}.$

One can also obtain a normalized index $\rho_{0}$ , which is a direct normalization based on (1) without considering $U\bot\!\!\!\bot W$ and $V\bot\!\!\!\bot W$ :

	$\displaystyle{\rho}_{0}(X,Y\mid Z)$	$\displaystyle=$	$\displaystyle c_{0}\left\{Ee^{-\|U_{1}-U_{2}\|-\|V_{1}-V_{2}\|-\|W_{1}-W_{2}\|}+8e^{-3}\right.$
			$\displaystyle\left.-2E\left(2-e^{-U_{1}}-e^{U_{1}-1}\right)\left(2-e^{-V_{1}}-e^{V_{1}-1}\right)\left(2-e^{-W_{1}}-e^{W_{1}-1}\right)\right\}.$

The corresponding moment estimator is

	$\displaystyle\widehat{\rho}_{0}(X,Y\mid Z)$	$\displaystyle=$	$\displaystyle c_{0}\left\{n^{-2}\sum_{i,j}e^{-\|\widehat{U}_{i}-\widehat{U}_{j}\|-\|\widehat{V}_{i}-\widehat{V}_{j}\|-\|\widehat{W}_{i}-\widehat{W}_{j}\|}+8e^{-3}\right.$
			$\displaystyle\left.-2n^{-1}\sum_{i=1}^{n}\left(2-e^{-\widehat{U}_{i}}-e^{\widehat{U}_{i}-1}\right)\left(2-e^{-\widehat{V}_{i}}-e^{\widehat{V}_{i}-1}\right)\left(2-e^{-\widehat{W}_{i}}-e^{\widehat{W}_{i}-1}\right)\right\}.$

Although $\rho(X,Y\mid Z)=\rho_{0}(X,Y\mid Z)$ at the population level, those two statistics $\widehat{\rho}(X,Y\mid Z)$ and $\widehat{\rho}_{0}(X,Y\mid Z)$ exhibit different properties at the sample level. This is because $\widehat{\rho}(X,Y\mid Z)$ considers the fact that $U\bot\!\!\!\bot W$ and $V\bot\!\!\!\bot W$ . But on the other hand, $\widehat{\rho}_{0}(X,Y\mid Z)$ is only a regular mutual independence test statistic, where $\widehat{U}_{i}$ , $\widehat{V}_{i}$ , and $\widehat{W}_{i}$ are exchangeable. When $X\bot\!\!\!\bot Y\mid Z$ , under Conditions 1-4 listed below, $\widehat{\rho}(X,Y\mid Z)$ is of order $(n^{-1}+h^{4m})$ , while $\widehat{\rho}_{0}(X,Y\mid Z)$ is of order $(n^{-1}+h^{2m})$ because of the bias caused by nonparametric estimation. Note that $m$ is the order of kernel functions and equal to 2 when using regular kernel functions such as Gaussian and epanechnikov kernels. This indicates that $\widehat{\rho}(X,Y\mid Z)$ is essentially $n$ consistent without under-smoothing while $\widehat{\rho}_{0}(X,Y\mid Z)$ typically requires under-smoothing. In addition, our statistic $\widehat{\rho}(X,Y\mid Z)$ has the same asymptotic properties as if $U,V$ and $W$ are observed, but $\widehat{\rho}_{0}(X,Y\mid Z)$ does not. See Figure 1 for a numerical comparison between the empirical null distributions of the two statistics.

We next study the asymptotical behaviors of the estimated index, $\widehat{\rho}(X,Y\mid Z)$ , under both the null and the alternative hypotheses. The following regularity conditions are imposed to facilitate our subsequent theoretical analyses. In what follows, we derive the limiting distribution of $\widehat{\rho}(X,Y\mid Z)$ under the null hypothesis in Theorem 2.

Condition 1. The univariate kernel function $K(\cdot)$ is symmetric about zero and Lipschitz continuous. In addition, it satisfies

\displaystyle\int K(\upsilon)d\upsilon=1,\quad\int\upsilon^{i}K(\upsilon)d\upsilon=0,1\leq i\leq m-1,\quad 0\neq\int\upsilon^{m}K(\upsilon)d\upsilon<\infty.

Condition 2. The bandwidth $h$ satisfies $nh^{2}/\log^{2}(n)\to\infty$ , and $nh^{4m}\to 0$ .

Condition 3. The probability density function of $Z$ , denoted by $f_{Z}(z)$ is bounded away from $0$ to infinity.

Condition 4. The $(m-1)$ th derivatives of $F_{X\mid Z}(x\mid z)f(z)$ , $F_{Y\mid Z}(y\mid z)f(z)$ and $f_{Z}(z)$ with respect to $z$ are locally Lipschitz-continuous.

Theorem 2

Suppose that Conditions 1-4 hold and the conditions in Proposition 1 are fulfilled. Under the null hypothesis,

\displaystyle n\widehat{\rho}(X,Y\mid Z)\to c_{0}\sum_{j=1}^{\infty}\lambda_{j}\chi_{j}^{2}(1),

in distribution, where $\chi_{j}^{2}(1)$ , $j=1,2,\dots$ are independent chi-square random variables with one degree of freedom, and $\lambda_{j}$ s, $j=1,2,\ldots$ are eigenvalues of

	$\displaystyle h(u,v,w;u^{\prime},v^{\prime},w^{\prime})=$	$\displaystyle(e^{-\|u-u^{\prime}\|}+e^{-u}+e^{u-1}+e^{-u^{\prime}}+e^{u^{\prime}-1}+2e^{-1}-4)(e^{-\|v-v^{\prime}\|}$
		$\displaystyle+e^{-v}+e^{v-1}+e^{-v^{\prime}}+e^{v^{\prime}-1}+2e^{-1}-4)e^{-\|w-w^{\prime}\|}.$

That is, there exists orthonormal eigenfunction $\Phi_{j}(u,v,w)$ such that

\displaystyle\int_{0}^{1}\int_{0}^{1}\int_{0}^{1}h(u,v,w;u^{\prime},v^{\prime},w^{\prime})\Phi_{j}(u^{\prime},v^{\prime},w^{\prime})du^{\prime}dv^{\prime}dw^{\prime}=\lambda_{j}\Phi_{j}(u,v,w).

The proof of Theorem 2 is given in the appendix. To understand the asymptotic distributions intuitively, we showed in the proof that $n\widehat{\rho}(X,Y\mid Z)$ can be approximated by degenerate V-statistics, i.e.,

n\widehat{\rho}(X,Y\mid Z)=c_{0}n^{-1}\sum_{i,j}h(U_{i},V_{i},W_{i};U_{j},V_{j},W_{j})+o_{p}(n^{-1}),

and $E\{h(U_{i},V_{i},W_{i};U_{j},V_{j},W_{j})\mid(U_{i},V_{i},W_{i})\}=0$ . By the spectral decomposition,

h(u,v,w;u^{\prime},v^{\prime},w^{\prime})=\sum_{j=1}^{\infty}\lambda_{j}\Phi_{j}(u,v,w)\Phi_{j}(u^{\prime},v^{\prime},w^{\prime}).

Therefore, $n\widehat{\rho}(X,Y\mid Z)=c_{0}\sum_{j=1}^{\infty}\lambda_{j}\{n^{-1/2}\sum_{i=1}^{n}\Phi_{j}(U_{i},V_{i},W_{i})\}^{2}+o_{p}(n^{-1}),$ which converges in distribution to the weighted sum of independent chi squared distributions provided in Theorem 2 because $n^{-1/2}\sum_{i=1}^{n}\Phi_{j}(U_{i},V_{i},W_{i})$ is asymptotically standard normal (Korolyuk and Borovskich, 2013). Moreover, the $\lambda_{j}$ s, $j=1,2,\dots$ are real numbers associated with the distribution of $U$ , $V$ and $W$ , all of which follow uniform distributions on $[0,1]$ . In addition, $U$ , $V$ and $W$ are mutually independent under the null hypothesis. This indicates that the proposed test statistic is essentially distribution free under the null hypothesis. Therefore, we suggest a simulation procedure to approximate the null distribution and decide the critical value. The simulation procedure can be independent of the original data and hence greatly improved the computation efficiency. In what follows, we describe the simulation-based procedure in detail to decide the critical value $c_{\alpha}$ .

1.

Generate $\{U_{i}^{*},V_{i}^{*},W_{i}^{*}\}$ , $i=1,\dots,n$ independently from mutually independent standard uniform distributions;

Compute the statistic $\widehat{\rho^{*}}$ based on $\{U_{i}^{*},V_{i}^{*},W_{i}^{*}\}$ , $i=1,\dots,n$ , i.e.,

	$\displaystyle\widehat{\rho^{*}}$	$\displaystyle=$	$\displaystyle c_{0}n^{-2}\sum_{i,j}\left\{\left(e^{-\|U_{i}^{}-U_{j}^{}\|}+e^{-U_{i}^{}}+e^{U_{i}^{}-1}+e^{-U_{j}^{}}+e^{U_{j}^{}-1}+2e^{-1}-4\right)\right.\hskip 65.44142pt$		(4)
			$\displaystyle\left.\left(e^{-\|V_{i}^{}-V_{j}^{}\|}+e^{-V_{i}^{}}+e^{V_{i}^{}-1}+e^{-V_{j}^{}}+e^{V_{j}^{}-1}+2e^{-1}-4\right)e^{-\|W_{i}^{}-W_{j}^{}\|}\right\}.$		(4)

3.

Repeat Steps 1-2 for $B$ times and set $c_{\alpha}$ to be the upper $\alpha$ quantile of the estimated $\widehat{\rho^{*}}$ obtained from the randomly simulated samples.

Because $(U^{*},V^{*},W^{*})$ has the same distribution as that of $(U,V,W)$ under the null hypothesis, it is straightforward that this simulation-based procedure can provide a valid approximation of the asymptotic null distribution of $\widehat{\rho}(X,Y\mid Z)$ when $B$ is large. The consistency of this procedure is guaranteed by Theorem 3.

Theorem 3

Under Conditions 1-4, it follows that

\displaystyle n\widehat{\rho^{*}}\to c_{0}\sum_{j=1}^{\infty}\lambda_{j}\chi_{j}^{2}(1)

in distribution, where $\chi_{j}^{2}(1)$ , $j=1,2,\dots$ are independent $\chi^{2}(1)$ variables, and $\lambda_{j}$ , $j=1,2,\dots$ are the same as that of Theorem 2.

Next, we study the power performance of the proposed test under two kinds of alternative hypotheses, under which the conditional independence no longer holds. We first consider the global alternative, denoted by $H_{1g}$ , we have

\displaystyle H_{1g}:\quad X\not\!\perp\!\!\!\perp Y\mid Z.

We then consider a sequence of local alternatives, denoted by $H_{1l}$ ,

\displaystyle H_{1l}:\quad F_{X\mid Z}(x\mid Z=z)-F_{X\mid(Y,Z)}\left\{x\mid(Y=y,Z=z)\right\}=n^{-1/2}\ell(x,y,z).

The asymptotical properties of the test statistics $\widehat{\rho}(X,Y\mid Z)$ under the global alternative and local alternatives are given in Theorem 4, whose proof is in the appendix. Theorem 4 shows that the proposed test can consistently detect any fixed alternatives as well as local alternatives at rate $O(n^{-1/2})$ .

Theorem 4

Suppose that Conditions 1-4 hold and the conditions in Proposition 1 are fulfilled. Under $H_{1g}$ , when $nh^{2m}\to 0$ ,

\displaystyle n^{1/2}\left\{\widehat{\rho}(X,Y\mid Z)-{\rho}(X,Y\mid Z)\right\}\to\mathcal{N}(0,\sigma^{2}_{0}),

in distribution, where $\sigma_{0}^{2}\stackrel{{\scriptstyle\mbox{{\rm\tiny def}}}}{{=}}4c_{0}^{2}\mbox{\rm var}(P_{1,1}+P_{2,1}+P_{3,1}+P_{4,1})$ , and $(P_{1,1},P_{2,1},P_{3,1},P_{4,1})$ are defined in (7)-(10) in the appendix, respectively.

Under $H_{1l}$ ,

\displaystyle n\widehat{\rho}(X,Y\mid Z)\to\iiint\left\|\zeta(t_{1},t_{2},t_{3})\right\|^{2}\omega(t_{1},t_{2},t_{3})dt_{1}dt_{2}dt_{3},

in distribution, where $\zeta(t_{1},t_{2},t_{3})$ stands for a complex-valued Gaussian random process with mean function $E\left[it_{1}\ell(X,Y,Z)e^{it_{1}U}\left\{e^{it_{2}V}-\varphi_{V}(t_{2})\right\}e^{it_{3}W}\right]$ and covariance function defined in (11) in the appendix, and $\omega(t_{1},t_{2},t_{3})$ is the joint probability density function of three independent and identically distributed standard Cauchy random variables.

3 Extensions

3.1 Multivariate continuous data

The methodology developed in Section 2 assumes that all the variables are univariate. In this section, we generalize the proposed index $\rho$ to the multivariate case. Let ${\bf x}=(X_{1},\ldots,X_{p})^{\mbox{\tiny{T}}}\in\mathbb{R}^{p}$ , ${\bf y}=(Y_{1},\ldots,Y_{q})^{\mbox{\tiny{T}}}\in\mathbb{R}^{q}$ and ${\bf z}=(Z_{1},\ldots,Z_{r})^{\mbox{\tiny{T}}}\in\mathbb{R}^{r}$ be continuous random vectors. More specifically, all elements of ${\bf x}$ , ${\bf y}$ and ${\bf z}$ are continuous random variables. Define $F_{X_{1}\mid{\bf z}}(X_{1}\mid{\bf z})$ for the cumulative distribution function of $X_{1}$ given ${\bf z}$ and $F_{X_{k}\mid{\bf z},X_{1},\ldots,X_{k-1}}(X_{k}\mid{\bf z},X_{1},\ldots,X_{k-1})$ for the cumulative distribution function of $X_{k}$ given ${\bf z},X_{1},\ldots,X_{k-1}$ for $k=2,\cdots,p$ . Similar notation apply for ${\bf y}$ and ${\bf z}$ . Denote $\widetilde{U}_{1}=F_{X_{1}\mid{\bf z}}(X_{1}\mid{\bf z})$ , $\widetilde{V}_{1}=F_{Y_{1}\mid{\bf z}}(Y_{1}\mid{\bf z})$ , $\widetilde{W}_{1}=F_{Z_{1}}(Z_{1})$ ,

	$\displaystyle\widetilde{U}_{k}=F_{X_{k}\mid{\bf z},X_{1},\ldots,X_{k-1}}(X_{k}\mid{\bf z},X_{1},\ldots,X_{k-1}),\ k=2,\ldots,p,$
	$\displaystyle\widetilde{V}_{k}=F_{Y_{k}\mid{\bf z},Y_{1},\ldots,Y_{k-1}}(Y_{k}\mid{\bf z},Y_{1},\ldots,Y_{k-1}),\ k=2,\ldots,q,$
	$\displaystyle\widetilde{W}_{k}=F_{Z_{k}\mid Z_{1},\ldots,Z_{k-1}}(Z_{k}\mid Z_{1},\ldots,Z_{k-1}),\ k=2,\ldots,r.$

Further denote ${\bf u}=(\widetilde{U}_{1},\ldots,\widetilde{U}_{p})^{\mbox{\tiny{T}}}$ , ${\bf v}=(\widetilde{V}_{1},\ldots,\widetilde{V}_{q})^{\mbox{\tiny{T}}}$ , ${\bf w}=(\widetilde{W}_{1},\ldots,\widetilde{W}_{r})^{\mbox{\tiny{T}}}$ . Similar to test of conditional independence for random variables, we first establish an equivalence between the conditional independence ${\bf x}\bot\!\!\!\bot{\bf y}\mid{\bf z}$ and the mutual independence of ${\bf u}$ , ${\bf v}$ and ${\bf w}$ , which is stated in Theorem 5.

Theorem 5

Assume that all the conditional cumulative distribution functions used in constructing ${\bf u}$ , ${\bf v}$ and ${\bf w}$ are continuous for every given values, then ${\bf x}\bot\!\!\!\bot{\bf y}\mid{\bf z}$ if and only if ${\bf u}$ , ${\bf v}$ and ${\bf w}$ are mutually independent.

The proof of Theorem 5 is illustrated in the appendix. Theorem 5 established an equivalence between the conditional independence ${\bf x}\bot\!\!\!\bot{\bf y}\mid{\bf z}$ and the mutual independence among ${\bf u}$ , ${\bf v}$ and ${\bf w}$ . It is notable that when $p,q,r$ are relatively large, $({\bf u},{\bf v},{\bf w})$ may be difficult to estimate because of the curse of dimensionality. In this paper, we mainly focus on the low dimensional case. Next, we develop the mutual independence test among ${\bf u}$ , ${\bf v}$ and ${\bf w}$ . Similar as the univariate case, we set the weight function to be the joint density of $(p+q+r)$ independent and identically distributed standard Cauchy random variables. We may further derive the closed form expression of $\rho({\bf x},{\bf y}\mid{\bf z})$

\displaystyle\rho({\bf x},{\bf y}\mid{\bf z})=E\left\{S_{{\bf u}}({\bf u}_{1},{\bf u}_{2})S_{{\bf v}}({\bf v}_{1},{\bf v}_{2})e^{-\|{\bf w}_{1}-{\bf w}_{2}\|_{1}}\right\},

where $({\bf u}_{1},{\bf v}_{1},{\bf w}_{1})$ and $({\bf u}_{2},{\bf v}_{2},{\bf w}_{2})$ are two independent copies of $({\bf u},{\bf v},{\bf w})$ . Moreover,

	$\displaystyle S_{{\bf u}}({\bf u}_{1},{\bf u}_{2})$	$\displaystyle=$	$\displaystyle E\left\{e^{-\\|{\bf u}_{1}-{\bf u}_{2}\\|_{1}}+e^{-\\|{\bf u}_{3}-{\bf u}_{4}\\|_{1}}-e^{-\\|{\bf u}_{1}-{\bf u}_{3}\\|_{1}}-e^{-\\|{\bf u}_{2}-{\bf u}_{3}\\|_{1}}\mid({\bf u}_{1},{\bf u}_{2})\right\},$
	$\displaystyle S_{{\bf v}}({\bf v}_{1},{\bf v}_{2})$	$\displaystyle=$	$\displaystyle E\left\{e^{-\\|{\bf v}_{1}-{\bf v}_{2}\\|_{1}}+e^{-\\|{\bf v}_{3}-{\bf v}_{4}\\|_{1}}-e^{-\\|{\bf v}_{1}-{\bf v}_{3}\\|_{1}}-e^{-\\|{\bf v}_{2}-{\bf v}_{3}\\|_{1}}\mid({\bf v}_{1},{\bf v}_{2})\right\}.$

and $\|\cdot\|_{1}$ is the $\ell_{1}$ norm. Then $\rho({\bf x},{\bf y}\mid{\bf z})$ is nonnegative and equals zero if and only if ${\bf x}\bot\!\!\!\bot{\bf y}\mid{\bf z}$ . By estimating $\rho({\bf x},{\bf y}\mid{\bf z})$ consistently at the sample level, the resulting test is clearly consistent. To implement the test, it is still required to study the asymptotic distributions under the conditional independence using independent and identically distributed samples $\{{\bf x}_{i},{\bf y}_{i},{\bf z}_{i}\}$ , $i=1,\dots,n$ . We also apply kernel estimator for the conditional cumulative distribution functions when estimating ${\bf u}_{i},{\bf v}_{i}$ and ${\bf w}_{i}$ . Specifically, we estimate $F_{A\mid B_{1},\ldots,B_{\ell}}(a\mid b_{1},\ldots,b_{\ell})$ with

\displaystyle\widehat{F}_{A\mid B_{1},\ldots,B_{\ell}}(a\mid b_{1},\ldots,b_{\ell})=\frac{\sum_{i=1}^{n}\mathbbm{1}(A_{i}\leq a)\prod_{k=1}^{\ell}K_{h}(B_{ik}-b_{k})}{\sum_{i=1}^{n}\prod_{k=1}^{\ell}K_{h}(B_{ik}-b_{k})}.

where $(A,B_{1},\ldots,B_{\ell})^{\mbox{\tiny{T}}}$ are $(Z_{k},Z_{1},\ldots,Z_{k-1})^{\mbox{\tiny{T}}}$ , $k=2,\ldots,r$ , $(X_{\ell},X_{1},\ldots,X_{\ell-1},{\bf z}^{\mbox{\tiny{T}}})^{\mbox{\tiny{T}}}$ , $\ell=1,\ldots,p$ , or $(Y_{j},Y_{1},\ldots,Y_{j-1},{\bf z}^{\mbox{\tiny{T}}})^{\mbox{\tiny{T}}}$ , $j=1,\ldots,q$ when estimating ${\bf w}$ , ${\bf u}$ and ${\bf v}$ , respectively. The sample version of $\rho({\bf x},{\bf y}\mid{\bf z})$ is given by

	$\displaystyle\widehat{\rho}({\bf x},{\bf y}\mid{\bf z})$	$\displaystyle=$	$\displaystyle n^{-2}\sum_{i,j}E\Big{[}\left\{e^{-\\|\widehat{\bf u}_{i}-\widehat{\bf u}_{j}\\|_{1}}+e^{-\\|{\bf u}-{\bf u}^{\prime}\\|_{1}}-e^{-\\|\widehat{\bf u}_{i}-{\bf u}\\|_{1}}-e^{-\\|{\bf u}-\widehat{\bf u}_{j}\\|_{1}}\mid(\widehat{\bf u}_{i},\widehat{\bf u}_{j})\right\}$
			$\displaystyle E\left\{e^{-\\|\widehat{\bf v}_{i}-\widehat{\bf v}_{j}\\|_{1}}+e^{-\\|{\bf v}-{\bf v}^{\prime}\\|_{1}}-e^{-\\|\widehat{\bf v}_{i}-{\bf v}\\|_{1}}-e^{-\\|{\bf v}-\widehat{\bf v}_{j}\\|_{1}}\mid(\widehat{\bf v}_{i},\widehat{\bf v}_{j})\right\}e^{-\\|\widehat{\bf w}_{i}-\widehat{\bf w}_{j}\\|_{1}}\Big{]},$

where $({\bf u}^{\prime},{\bf v}^{\prime})$ is an independent copy of $({\bf u},{\bf v})$ , and further calculations yield that $\widehat{\rho}({\bf x},{\bf y}\mid{\bf z})$ is equal to

	$\displaystyle n^{-2}\sum_{i,j}\left[\left\{e^{-\\|\widehat{\bf u}_{i}-\widehat{\bf u}_{j}\\|_{1}}+(\frac{2}{e})^{p}-\prod_{k=1}^{p}(2-e^{-\widehat{\widetilde{U}}_{ik}-1}-e^{-\widehat{\widetilde{U}}_{ik}})-\prod_{k=1}^{p}(2-e^{-\widehat{\widetilde{U}}_{jk}-1}-e^{-\widehat{\widetilde{U}}_{jk}})\right\}\right.$
	$\displaystyle\left.\left\{e^{-\\|\widehat{\bf v}_{i}-\widehat{\bf v}_{j}\\|_{1}}+(\frac{2}{e})^{q}-\prod_{k=1}^{q}(2-e^{-\widehat{\widetilde{V}}_{ik}-1}-e^{-\widehat{\widetilde{V}}_{ik}})-\prod_{k=1}^{q}(2-e^{-\widehat{\widetilde{V}}_{jk}-1}-e^{-\widehat{\widetilde{V}}_{jk}})\right\}e^{-\\|\widehat{\bf w}_{i}-\widehat{\bf w}_{j}\\|_{1}}\right].$

We next study the asymptotical behaviors of $\widehat{\rho}({\bf x},{\bf y}\mid{\bf z})$ under the null hypothesis in Theorem 6, whose proof is given in the appendix. We begin by providing some regularity conditions for the multivariate data.

Condition $2^{\prime}$ . The bandwidth $h$ satisfies $nh^{2(r+p-1)}/\log^{2}(n)\to\infty$ , $nh^{2(r+q-1)}/\log^{2}(n)\to\infty$ , and $nh^{4m}\to 0$ .

Condition $3^{\prime}$ . The probability density function of the random vector $(Z_{1},\ldots,Z_{k})^{\mbox{\tiny{T}}}$ , $k=1,\ldots,r$ , $({\bf z}^{\mbox{\tiny{T}}},X_{1},\ldots,X_{\ell})^{\mbox{\tiny{T}}}$ , $\ell=1,\ldots,p-1$ , and $({\bf z}^{\mbox{\tiny{T}}},Y_{1},\ldots,Y_{m})^{\mbox{\tiny{T}}}$ , $m=1,\ldots,q-1$ , are all bounded away from $0$ to infinity.

Condition $4^{\prime}$ . The $(m-1)$ th derivatives of $F_{A\mid{\bf B}}(a\mid\mathbf{b})f_{\bf B}(\mathbf{b})$ , and $f_{\bf B}(\mathbf{b})$ with respect to $\mathbf{b}$ are locally Lipschitz-continuous, where $(A,{\bf B}^{\mbox{\tiny{T}}})^{\mbox{\tiny{T}}}$ can be any one of $(Z_{k},Z_{1},\ldots,Z_{k-1})^{\mbox{\tiny{T}}}$ , $k=2,\ldots,r$ , $(X_{\ell},X_{1},\ldots,X_{\ell-1},{\bf z}^{\mbox{\tiny{T}}})^{\mbox{\tiny{T}}}$ , $\ell=1,\ldots,p$ , or $(Y_{j},Y_{1},\ldots,Y_{j-1},{\bf z}^{\mbox{\tiny{T}}})^{\mbox{\tiny{T}}}$ , $j=1,\ldots,q$ .

Theorem 6

Suppose that Conditions 1 and $2^{\prime}$ - $4^{\prime}$ hold and the conditions in Theorem 5 are fulfilled. Under the null hypothesis,

\displaystyle n\widehat{\rho}({\bf x},{\bf y}\mid{\bf z})\to\sum_{j=1}^{\infty}\lambda_{j}\chi_{j}^{2}(1),

in distribution, where $\chi_{j}^{2}(1)$ , $j=1,2,\dots$ are independent chi-square random variables with one degree of freedom, and $\lambda_{j}$ s, $j=1,2,\ldots$ are eigenvalues of

\displaystyle\widetilde{h}({\bf u},{\bf v},{\bf w};{\bf u}^{\prime},{\bf v}^{\prime},{\bf w}^{\prime})=S_{\bf u}({\bf u},{\bf u}^{\prime})S_{\bf v}({\bf v},{\bf v}^{\prime})e^{-\|{\bf w}-{\bf w}^{\prime}\|_{1}}.

That is, there exists orthonormal eigenfunction $\Phi_{j}({\bf u},{\bf v},{\bf w})$ such that

\displaystyle\iiint_{[0,1]^{p+q+r}}\widetilde{h}({\bf u},{\bf v},{\bf w};{\bf u}^{\prime},{\bf v}^{\prime},{\bf w}^{\prime})\Phi_{j}({\bf u}^{\prime},{\bf v}^{\prime},{\bf w}^{\prime})d{\bf u}^{\prime}d{\bf v}^{\prime}d{\bf w}^{\prime}=\lambda_{j}\Phi_{j}({\bf u},{\bf v},{\bf w}).

3.2 Discrete data

In this section, we discuss the setting in which $X$ , $Y$ and $Z$ are univariate discrete random variables. Specifically, we apply transformations in Brockwell (2007) to obtain $U$ and $V$ . Define $F_{X\mid Z}(x\mid z)=\mbox{Pr}(X\leq x\mid Z=z)$ , $F_{X\mid Z}(x-\mid z)=\mbox{Pr}(X<x\mid Z=z)$ , $F_{Y\mid Z}(y\mid z)=\mbox{Pr}(Y\leq y\mid Z=z)$ , and $F_{Y\mid Z}(y-\mid z)=\mbox{Pr}(Y<y\mid Z=z)$ . We further let $U_{X}$ and $U_{Y}$ be two independent and identically distributed $U(0,1)$ random variables, and apply the transformations

	$\displaystyle U$	$\displaystyle=$	$\displaystyle(1-U_{X})F_{X\mid Z}(X-\mid Z)+U_{X}F_{X\mid Z}(X\mid Z),$
	$\displaystyle V$	$\displaystyle=$	$\displaystyle(1-U_{Y})F_{Y\mid Z}(Y-\mid Z)+U_{Y}F_{Y\mid Z}(Y\mid Z).$

According to Brockwell (2007), both $U$ and $V$ are uniformly distributed on $(0,1)$ . In addition, $U\bot\!\!\!\bot Z$ and $V\bot\!\!\!\bot Z$ . In the following proposition, we establish the equivalence between the conditional independence and the mutual independence.

Theorem 7

For discrete random variables $X$ , $Y$ and $Z$ , $X\bot\!\!\!\bot Y\mid Z$ if and only if $U,V$ and $Z$ are mutually independent.

The proof of Theorem 7 is presented in the appendix. With Theorem 7, we turn a discrete conditional independence problem into a mutual independence one. Hence similar techniques can be readily applied for the mutual independence test and we omit them to avoid verbosity.

4 Numerical Validations

4.1 Conditional independence test

In this section, we investigate the finite sample performance of the proposed methods. To begin with, we illustrate that the null distribution of $\widehat{\rho}(X,Y\mid Z)$ is indeed distribution free as if $U$ , $V$ and $W$ can be observed, and is insensitive to the bandwidth of the nonparametric kernel. In comparison, the null distribution of $\widehat{\rho}_{0}(X,Y\mid Z)$ does not enjoy such properties. To facilitate the analysis, let $X=Z+\varepsilon_{1}$ and $Y=Z+\varepsilon_{2}$ , where $Z$ , $\varepsilon_{1}$ and $\varepsilon_{2}$ are independent and identically distributed. We consider three scenarios where $Z$ , $\varepsilon_{1}$ , $\varepsilon_{2}$ are independently drawn from normal distribution $N(0,1)$ , uniform distribution $U(0,1)$ , and exponential distribution $Exp(1)$ , respectively. It is clear that $X$ and $Y$ are conditionally independent given $Z$ . The sample size $n$ is set to be 100.

The simulated null distributions based on $n\widehat{\rho}(X,Y\mid Z)$ and $n\widehat{\rho}_{0}(X,Y\mid Z)$ are depicted in Figure 1. The estimated kernel density curves of $n\widehat{\rho}(X,Y\mid Z)$ based on 1000 repetitions are shown in Figure 1(a), where the reference curve is generated by the simulation-based statistic $n\widehat{\rho^{*}}(X,Y\mid Z)$ defined in (4). Clearly all the estimated density curves are close to the reference, indicating that limiting null distribution of the estimated index is indeed distribution free as if no kernel estimation is involved. In comparison, we apply the same simulation settings for $\widehat{\rho}_{0}(X,Y\mid Z)$ and plot the null distributions in Figure 1(b), from which it can be seen that the involved estimation of $U$ and $V$ significantly influences the null distribution of $\widehat{\rho}_{0}(X,Y\mid Z)$ . To show the insensitivity of the choice of the bandwidth, we set the bandwidths to be $ch_{0}$ , where $c=0.5,$ 1, and 2, respectively, and $h_{0}$ is the bandwidth obtained by the rule of thumb. The estimated kernel density curves of $n\widehat{\rho}(X,Y\mid Z)$ and $n\widehat{\rho}_{0}(X,Y\mid Z)$ based normal distributions with 1000 repetitions, together with the reference curve, are shown in Figure 1(c) and (d), from which we can see that the null distributions of $n\widehat{\rho}(X,Y\mid Z)$ almost remain the same for all choices of the bandwidths, implying that our test is insensitive to the bandwidth of the nonparametric kernel. However, the null distributions of $n\widehat{\rho}_{0}(X,Y\mid Z)$ can be dramatically influenced when the bandwidth changes.

Next, we perform the sensitivity analysis under the alternative hypothesis using Models M2 - M6, which will be listed shortly. We fix the sample size $n=100$ and set the significance level $\alpha=0.05$ . To inspect how the power performance is varied with the choice of the bandwidth, we set the bandwidths to be $ch_{0}$ , where $h_{0}$ is the bandwidth obtained by the rule of thumb and increase $c$ from 0.5 to 1.5 step by 0.1. The respective empirical powers are charted in Table 1, from which we can see that the test is the most powerful when $c$ is around 1. Therefore, we advocate using the rule of thumb (i.e., $c=1$ ) to decide the bandwidth in practice.

Refer to caption — Figure 1: (a) and (b) are the simulated null distributions for $n\widehat{\rho}(X,Y\mid Z)$ and $n\widehat{\rho}_{0}(X,Y\mid Z)$ when data were generated from different distributions, while (c) and (d) are the simulated null distributions for $n\widehat{\rho}(X,Y\mid Z)$ and $n\widehat{\rho}_{0}(X,Y\mid Z)$ using different bandwidths, respectively.

Table 1: Empirical power of tests of conditional independence for Models M2 - M6 for different bandwidth

ch_{0}

, where

c

increase from 0.5 to 1.5. The significance level

\alpha=0.05

n=100

	0.5	0.6	0.7	0.8	0.9	1.0	1.1	1.2	1.3	1.4	1.5
M2	1.000	1.000	1.000	1.000	1.000	1.000	1.000	1.000	1.000	1.000	1.000
M3	0.957	0.962	0.98	0.977	0.975	0.971	0.972	0.968	0.955	0.960	0.956
M4	1.000	1.000	1.000	1.000	1.000	1.000	1.000	1.000	1.000	1.000	1.000
M5	0.999	1.000	0.997	0.998	0.999	0.997	0.995	0.997	0.999	0.997	0.997
M6	0.999	0.999	0.999	1.000	0.999	0.999	0.996	1.000	0.999	1.000	1.000

We compare our proposed conditional independence test (denoted by “CIT”) with some popular nonlinear conditional dependence measure. They are, respectively, the conditional distance correlation (Wang et al., 2015, denoted by “CDC”), conditional mutual information (Scutari, 2010, denoted by “CMI”), and the KCI.test (Zhang et al., 2011, denoted by “KCI”). We conduct 500 replications for each scenario. The critical values of the CIT are obtained by conducting 1000 simulations. We first consider the following models with random variable $Z$ . In (M1), $X\bot\!\!\!\bot Y\mid Z$ . This model is designed for examining the empirical Type I error rate. While (M2) $-$ (M6) are designed for examining the power of the proposed test of conditional independence. Moreover, for M1 - M3, we generate $\widetilde{X}_{1}$ , $\widetilde{X}_{2}$ and $Z$ independently from $N(0,1)$ . For M4 - M6, we let $Z\sim N(0,1)$ , and generate $\widetilde{X}_{1}$ , $\widetilde{X}_{2}\sim t_{1}$ independently to investigate the power of the methods under heavy tailed distributions.

M1: $X=\widetilde{X}_{1}+Z$ , $Y=\widetilde{X}_{2}+Z$ .

M2: $X=\widetilde{X}_{1}+Z$ , $Y=\widetilde{X}_{1}^{2}+Z$ .

M3: $X=\widetilde{X}_{1}+Z$ , $Y=0.5\sin(\pi\widetilde{X}_{1})+Z$ .

M4: $X=X_{1}+Z$ , $Y=X_{1}+X_{2}+Z$ .

M5: $X=\sqrt{|X_{1}Z|}+Z$ , $Y=0.25X_{1}^{2}X_{2}^{2}+X_{2}+Z$ .

M6: $X=\log(|X_{1}Z|+1)+Z$ , $Y=0.5(X_{1}^{2}Z)+X_{2}+Z$ .

The empirical sizes for M1 and powers for the other five models at the significance levels $\alpha=0.05$ and 0.1 are depicted in Table 2. In our simulation, we consider two sample sizes $n=50$ and $100$ . Table 2 indicates that the empirical sizes of all the tests are all very close to the level $\alpha$ , which means that the Type I error can be controlled very well. As for the empirical power performance of models M2 $-$ M6, the proposed test outperforms other tests for both normal data and heavy tailed data, especially when $n=50$ .

Table 2: Empirical size and power of tests of conditional independence when

Z

is random variable at significance levels with

\alpha=0.05

and 0.1 and

n=50

and 100.

$n$ $\alpha$ Test M1 M2 M3 M4 M5 M6 50 0.05 CIT 0.056 1.000 0.572 1.000 0.954 0.888 CDC 0.050 0.886 0.338 0.837 0.881 0.473 CMI 0.048 0.380 0.070 0.829 0.898 0.448 KCI 0.038 0.884 0.250 0.191 0.048 0.010 0.1 CIT 0.098 1.000 0.712 1.000 0.974 0.938 CDC 0.134 0.970 0.562 0.934 0.930 0.642 CMI 0.088 0.484 0.132 0.854 0.912 0.485 KCI 0.088 0.968 0.344 0.323 0.145 0.042 100 0.05 CIT 0.048 1.000 0.960 1.000 1.000 0.997 CDC 0.066 0.998 0.694 0.918 0.971 0.624 CMI 0.054 0.402 0.070 0.877 0.904 0.424 KCI 0.040 1.000 0.444 0.371 0.095 0.020 0.1 CIT 0.112 1.000 0.998 1.000 1.000 0.999 CDC 0.158 1.000 0.834 0.974 0.985 0.745 CMI 0.098 0.496 0.124 0.902 0.926 0.455 KCI 0.088 1.000 0.598 0.513 0.199 0.057

We next examine the finite sample performance of the tests when $Z$ is two-dimensional random vector, i.e., ${\bf z}=(Z_{1},Z_{2})$ . M7 is designed for examining the size since $X\bot\!\!\!\bot Y\mid{\bf z}$ . Five conditional dependent model M8–M12 are designed to examine the power of the tests. Similar as M1-M6, we generate $\widetilde{X}_{1}$ , $\widetilde{X}_{2}$ , $Z_{1}$ and $Z_{2}$ independently from $N(0,1)$ in each of the following model.

M7: $X=\widetilde{X}_{1}+Z_{1}+Z_{2}$ , $Y=\widetilde{X}_{2}+Z_{1}+Z_{2}$ .

M8: $X=\widetilde{X}_{1}^{2}+Z_{1}+Z_{2}$ , $Y=\log(\widetilde{X}_{1}+10)+Z_{1}+Z_{2}$ .

M9: $X=\tanh(\widetilde{X}_{1})+Z_{1}+Z_{2}$ , $Y=\log(\widetilde{X}_{1}^{2}+10)+Z_{1}+Z_{2}$ .

M10: $X=\widetilde{X}_{1}^{2}+Z_{1}+Z_{2}$ , $Y=\log(\widetilde{X}_{1}Z_{1}+10)+Z_{1}+Z_{2}$ .

M11: $X=\widetilde{X}_{1}+Z_{1}+Z_{2}$ , $Y=\sin(\widetilde{X}_{1}Z_{1})+Z_{1}+Z_{2}$ .

M12: $X=\log(\widetilde{X}_{1}Z_{1}+10)+Z_{1}+Z_{2}$ , $Y=\exp(\widetilde{X}_{1}Z_{2})+Z_{1}+Z_{2}$ .

Table 3: Empirical size and power of conditional independence tests with

{\bf z}

being random vector, level

\alpha

= 0.05 and 0.1, and sample size

n=

50 and 100.

$n$ $\alpha$ Test M7 M8 M9 M10 M11 M12 50 0.05 CIT 0.046 0.672 0.906 0.686 0.440 0.788 CDC 0.052 0.408 0.850 0.054 0.128 0.134 CMI 0.072 0.426 0.192 0.392 0.118 0.300 KCI 0.046 0.030 0.088 0.026 0.662 0.248 0.1 CIT 0.092 0.792 0.948 0.798 0.582 0.874 CDC 0.126 0.670 0.976 0.194 0.298 0.264 CMI 0.120 0.506 0.292 0.500 0.210 0.386 KCI 0.098 0.092 0.190 0.074 0.820 0.492 100 0.05 CIT 0.048 0.936 0.998 0.936 0.664 0.988 CDC 0.084 0.890 1.000 0.920 0.972 0.306 CMI 0.044 0.412 0.164 0.392 0.126 0.300 KCI 0.044 0.038 0.172 0.028 0.990 0.358 0.1 CIT 0.104 0.958 1.000 0.966 0.766 0.996 CDC 0.168 0.978 1.000 0.986 0.992 0.456 CMI 0.128 0.486 0.240 0.460 0.194 0.390 KCI 0.090 0.092 0.358 0.108 0.998 0.608

Lastly, we study the finite sample performance of the tests when $X$ , $Y$ and $Z$ are all multivariate. Specifically, ${\bf x}=(X_{1},X_{2})$ , ${\bf y}=(Y_{1},Y_{2})$ , ${\bf z}=(Z_{1},Z_{2})$ . M13 is designed to examine the size of the tests, while M14 -M18 are designed to study the powers. We generate $\widetilde{X}_{1}$ , $X_{2}$ , $Y_{2}$ , $Z_{1}$ and $Z_{2}$ independently from $N(0,1)$ for each model in M13-M18.

M13: $X_{1}=\widetilde{X}_{1}+Z_{1}$ , $Y_{1}=Z_{1}+Z_{2}$ .

M14: $X_{1}=\log(\widetilde{X}_{1}Z_{1}+100)+Z_{1}+Z_{2}$ , $Y_{1}=\exp(\widetilde{X}_{1}Z_{1})+Z_{1}+Z_{2}$ .

M15: $X=\log(\widetilde{X}_{1}^{2}+100)+Z_{1}+Z_{2}$ , $Y_{1}=0.1\widetilde{X}_{1}^{3}+Z_{1}+Z_{2}$ .

M16: $X=\log(\widetilde{X}_{1}*Z_{1}+100)+Z_{1}+Z_{2}$ , $Y_{1}=0.5\widetilde{X}_{1}^{3}Z_{1}^{3}+Z_{1}+Z_{2}$ .

M17: $X=0.1\exp(\widetilde{X}_{1})+Z_{1}+Z_{2}$ , $Y_{1}=\sin(\widetilde{X}_{1})+|\widetilde{X}_{1}|+Z_{1}+Z_{2}$ .

M18: $X=\tanh(\widetilde{X}_{1})+Z_{1}+Z_{2}$ , $Y_{1}=0.5\log(\widetilde{X}_{1}^{2}+100)+0.5X_{2}+Z_{1}+Z_{2}$ .

Simulation results of models M7 $-$ M12 and models M13 $-$ M18 are summarized in Tables 3 and 4, respectively, from which it can be seen that the proposed method outperforms all other tests in terms of type I error and power. Furthermore, the numerical results seem to indicates that when the conditional set is large, the conditional mutual information and the kernel based conditional test tend to have relatively low power. The conditional distance correlation has high power but suffers huge computational burden.

Table 4: Empirical size and power of conditional independence tests when

{\bf x}

{\bf y}

and

{\bf z}

are random vectors. Level

\alpha

= 0.05 and 0.1, and sample size

n=

50 and 100.

$n$ $\alpha$ Test M13 M14 M15 M16 M17 M18 50 0.05 CIT 0.05 1.000 1.000 1.000 0.363 0.986 CDC 0.019 0.022 0.984 0.304 0.120 0.833 CMI 0.01 0.582 0.812 0.205 0.082 0.020 KCI 0.036 0.036 0.047 0.042 0.052 0.688 0.1 CIT 0.100 1.000 1.000 1.000 0.564 0.997 CDC 0.092 0.064 1.000 0.733 0.262 0.939 CMI 0.028 0.6 0.915 0.294 0.170 0.054 KCI 0.086 0.081 0.099 0.083 0.118 0.886 100 0.05 CIT 0.026 1.000 1.000 1.000 0.873 1 CDC 0.048 0.032 1.000 0.965 0.38 0.999 CMI 0.004 0.498 1.000 0.211 0.218 0.036 KCI 0.044 0.042 0.052 0.05 0.068 0.999 0.1 CIT 0.077 1.000 1.000 1.000 0.965 1.000 CDC 0.127 0.13 1.000 1.000 0.567 1.000 CMI 0.013 0.523 1.000 0.338 0.378 0.077 KCI 0.087 0.077 0.102 0.091 0.119 1.000

4.2 Application to causal discovery

In this section, we consider a real application of conditional independence test in causal discovery of directed acyclic graphs. For a directed acyclic graph $G=(V,E)$ , the nodes $V=\{1,2,\dots,p\}$ corresponds to a random vector ${\bf x}=(X_{1},\dots,X_{p})\in\mathbb{R}^{p}$ , and the set of edges $E\subset V\times V$ do not form any directed cycles. Two vertices $X_{1}$ and $X_{2}$ are d-separated by a subset of vertices $S$ if every path between them is blocked by $S$ . One may refer to Wasserman (2013) for a formal definition. Denote the joint distribution of ${\bf x}$ by $P({\bf x})$ . The joint distribution is said to be faithful with respect to a graph $G$ if and only if for any $i,j\in V$ , and any subset $S\subset V$ ,

\displaystyle X_{i}\bot\!\!\!\bot X_{j}\mid\{X_{r}:r\in S\}\Leftrightarrow\mbox{node }i\mbox{ and node }j\mbox{ are $d$-separated by the set }S.

One of the most famous algorithms for recovering the graphs satisfying the faithfulness assumption is the PC–algorithm (Spirtes et al., 2000; Kalisch and Bühlmann, 2007). The algorithm could recover the graph up to its Markov equivalence class, which are sets of graphs that entail the same set of (conditional) independencies. The performance of the PC–algorithm relies heavily on the (conditional) independence tests because small mistakes at the beginning of the algorithm may lead to a totally different directed acyclic graph (Zhang et al., 2011). One of the most popular approach for testing conditional independence is the partial correlation, under the assumption that the joint distribution $P({\bf x})$ follows Gaussian distribution and the nodes relationship is linear (Kalisch and Bühlmann, 2007). Conditional mutual information (Scutari, 2010) is another possible option. Zhang et al. (2011) proposed a kernel-based conditional independence test for causal discovery in directed acyclic graphs. In this section, we demonstrate how the proposed conditional independence index can be applied for causal discovery in real data. Additional simulation results are relegated into the appendix.

We analyze a real data set originally from the National Institute of Diabetes and Digestive and Kidney Diseases (Smith et al., 1988). The dataset consists of serval medical predictor variables for the outcome of diabetes. We are interested in the causal structural of five variables: age, body mass index, 2-hour serum insulin, plasma glucose concentration and diastolic blood pressure. After removing the missing data, we obtain $n=392$ samples. The PC–algorithm is applied to examine the causal structure of the five variables based on the four different conditional independence measures. We implement the causal algorithms by the R package pcalg (Kalisch et al., 2012). The estimated causal structure are shown in Figure 2. The proposed test gives the same estimated graph as the partial correlation, since the data is approximately normally distributed. To interpret the graph, note that age is likely to affect the diastolic blood pressure. The plasma glucose concentration level is also likely to be related to age. This is confirmed by the causal findings of (a), (b) and (c) in Figure 2. Besides, serum insulin has plausible causal effects on body mass index, and is also related to plasma glucose concentration. The causal relationship between age and blood pressure is not confirmed in part (c), the test of conditional mutual information. This is not a surprise given the high false positive rate reported in Table 5 in the appendix. The kernel based conditional independence is a little conservative and is not able to detect some of the possible edges. To further illustrate the robustness of the proposed test, we make a logarithm transformation on the data, and apply the same procedure again. The estimated causal structures are reported in Figure 3. We observe that the proposed test results in the same estimated structure as the original data, which echos property (4) in Theorem 1, i.e., the proposed test is invariant with respect to monotone transformations. However, the partial correlation test yields more false positives, since the normality assumption is violated.

5 Discussions

In this paper we developed a new index to measure conditional dependence of random variables and vectors. The calculation of the estimated index requires low computational cost. The test of conditional independence based on the newly proposed index has nontrivial power against all fixed and local alternatives. The proposed test is distribution free under the null hypothesis, and is robust to outliers and heavy-tailed data. Numerical simulations indicate that the proposed test is more powerful than some existing ones. The proposed test is further applied to directed acyclic graphs for causal discovery and shows superior performance.

6 Technical Proofs

6.1 Proof of Proposition 1

For $0<u<1$ , define quantile function for $X\mid Z$ as

F_{X\mid Z}^{-1}(u\mid Z=z)=\inf\{x:F_{X\mid Z}(x\mid Z=z)\geq u\}.

Similarly, we can define $F_{Y\mid Z}^{-1}(v\mid Z=z)$ , the quantile function for $Y\mid Z$ , for $0<v<1$ . Since $X$ and $Y$ have continuous conditional distribution functions for every given value of $Z$ , it follows that when $0<u<1$ and $0<v<1$ ,

			$\displaystyle\mbox{Pr}\{F_{X\mid Z}(X\mid Z)\leq u,F_{Y\mid Z}(Y\mid Z)\leq v\mid Z=z\}$
		$\displaystyle=$	$\displaystyle\mbox{Pr}\{X\leq F_{X\mid Z}^{-1}(u\mid Z),Y\leq F_{Y\mid Z}^{-1}(v\mid Z)\mid Z=z\}.$

This implies that $X\bot\!\!\!\bot Y\mid Z$ is equivalent to $U\bot\!\!\!\bot V\mid Z$ . In addition, conditional on $Z=z$ , $F_{X\mid Z}(X\mid Z=z)$ is uniformly distributed on $(0,1)$ , which does not depend on the particular value of $z$ , indicating $F_{X\mid Z}(X\mid Z)\bot\!\!\!\bot Z$ . That is, $U\bot\!\!\!\bot Z$ . Similarly, $V\bot\!\!\!\bot Z$ . Thus, the conditional independence $f_{U,V\mid Z}(u,v\mid z)=f_{U\mid Z}(u\mid z)f_{V\mid Z}(v\mid z)$ together with $f_{U\mid Z}(u\mid z)=f_{U}(u)$ and $f_{V\mid Z}(v\mid z)=f_{V}(v)$ implies that

\displaystyle f_{U,V\mid Z}(u,v\mid z)=f_{U}(u)f_{V}(v).

Thus, $U$ , $V$ and $Z$ are mutually independent.

On the other hand, the mutual independence immediately leads to the conditional independence $U\bot\!\!\!\bot V\mid Z$ . Therefore, the conditional independence $X\bot\!\!\!\bot Y\mid Z$ is equivalent to the mutual independence of $U$ , $V$ and $Z$ . We next show that the mutual independence of $U$ , $V$ and $Z$ is equivalent to mutual independence of $U$ , $V$ and $W$ .

Define $F_{Z}^{-1}(w)=\inf\{z:F(z)\geq w\}$ for $0<w<1$ . If $U$ , $V$ and $Z$ are mutually independent, then

			$\displaystyle\mbox{Pr}(U\leq u,V\leq v,W\leq w)=\mbox{Pr}\{U\leq u,V\leq v,Z\leq F_{Z}^{-1}(w)\}$
		$\displaystyle=$	$\displaystyle\mbox{Pr}(U\leq u)\mbox{Pr}(V\leq v)\mbox{Pr}\{Z\leq F_{Z}^{-1}(w)\}=\mbox{Pr}(U\leq u)\mbox{Pr}(V\leq v)\mbox{Pr}(W\leq w)$

holds for all $u,$ $v$ and $w$ . On the other hand, if $U$ , $V$ and $W$ are mutually independent, it follows that

			$\displaystyle\mbox{Pr}(U\leq u,V\leq v,Z\leq z)=\mbox{Pr}\{U\leq u,V\leq v,W\leq F_{Z}(z)\}$
		$\displaystyle=$	$\displaystyle\mbox{Pr}(U\leq u)\mbox{Pr}(V\leq v)\mbox{Pr}\{W\leq F_{Z}(z)\}=\mbox{Pr}(U\leq u)\mbox{Pr}(V\leq v)\mbox{Pr}(Z\leq z),$

holds for all $u,$ $v$ and $z$ . Thus, the mutual independence of $U$ , $V$ and $Z$ is equivalent to the mutual independence of $U$ , $V$ and $W$ . This completes the proof.

6.2 Proof of Theorem 1

We start with the derivation of the index $\rho$ . $U$ , $V$ and $W$ are mutually independent if and only if

\displaystyle\iiint\left\|\varphi_{U,V,W}(t_{1},t_{2},t_{3})-\varphi_{U}(t_{1})\varphi_{V}(t_{2})\varphi_{W}(t_{3})\right\|^{2}\omega(t_{1},t_{2},t_{3})dt_{1}dt_{2}dt_{3}=0,

(5)

for arbitrary positive weight function $\omega(\cdot)$ . We now show that the proposed index $\rho$ is proportional to the integration in (5) by choosing $\omega(t_{1},t_{2},t_{3})$ to be the joint probability density function of three independent and identically distributed standard Cauchy random variables.

With some calculation and Fubini’s theorem, we have

			$\displaystyle\iiint\left\\|\varphi_{U,V,W}(t_{1},t_{2},t_{3})-\varphi_{U}(t_{1})\varphi_{V}(t_{2})\varphi_{W}(t_{3})\right\\|^{2}\omega(t_{1},t_{2},t_{3})dt_{1}dt_{2}dt_{3}$
		$\displaystyle=$	$\displaystyle E\iiint e^{it_{1}(U_{1}-U_{2})+it_{2}(V_{1}-V_{2})+it_{3}(W_{1}-W_{2})}\omega(t_{1},t_{2},t_{3})dt_{1}dt_{2}dt_{3}$
			$\displaystyle-E\iiint e^{it_{1}(U_{1}-U_{3})+it_{2}(V_{1}-V_{4})+it_{3}(W_{1}-W_{2})}\omega(t_{1},t_{2},t_{3})dt_{1}dt_{2}dt_{3}$
			$\displaystyle-E\iiint e^{it_{1}(U_{3}-U_{1})+it_{2}(V_{4}-V_{1})+it_{3}(W_{2}-W_{1})}\omega(t_{1},t_{2},t_{3})dt_{1}dt_{2}dt_{3}$
			$\displaystyle+E\iiint e^{it_{1}(U_{1}-U_{2})+it_{2}(V_{3}-V_{4})+it_{3}(W_{5}-W_{6})}\omega(t_{1},t_{2},t_{3})dt_{1}dt_{2}dt_{3}.$

According to the property of characteristic function for standard Cauchy distribution, we have

\displaystyle\int e^{it(U_{1}-U_{2})}\pi^{-1}(1+t^{2})^{-1}dt=e^{-|U_{1}-U_{2}|}.

Then by choosing $\omega(t_{1},t_{2},t_{3})=\pi^{-3}(1+t_{1}^{2})^{-1}(1+t_{2}^{2})^{-1}(1+t_{3}^{2})^{-1}$ , i.e., the joint density function of three i.i.d. standard Cauchy distributions, we have

\displaystyle\begin{split}&\iiint\left\|\varphi_{U,V,W}(t_{1},t_{2},t_{3})-\varphi_{U}(t_{1})\varphi_{V}(t_{2})\varphi_{W}(t_{3})\right\|^{2}\omega(t_{1},t_{2},t_{3})dt_{1}dt_{2}dt_{3}\\ &=Ee^{-|U_{1}-U_{2}|-|V_{1}-V_{2}|-|W_{1}-W_{2}|}-2Ee^{-|U_{1}-U_{3}|-|V_{1}-V_{4}|-|W_{1}-W_{2}|}\\ &\hskip 28.45274pt+Ee^{-|U_{1}-U_{2}|}Ee^{-|V_{1}-V_{2}|}Ee^{-|W_{1}-W_{2}|}.\end{split}

(6)

Furthermore, with the fact that $U\bot\!\!\!\bot W$ and $V\bot\!\!\!\bot W$ , (6) is equal to

\displaystyle E\left\{S_{U}(U_{1},U_{2})S_{V}(V_{1},V_{2})e^{-|W_{1}-W_{2}|}\right\},

where $S_{U}(U_{1},U_{2})$ and $S_{V}(V_{1},V_{2})$ are defined as

	$\displaystyle S_{U}(U_{1},U_{2})$	$\displaystyle=$	$\displaystyle E\left\{e^{-\|U_{1}-U_{2}\|}+e^{-\|U_{3}-U_{4}\|}-e^{-\|U_{1}-U_{3}\|}-e^{-\|U_{2}-U_{3}\|}\mid(U_{1},U_{2})\right\},$
	$\displaystyle S_{V}(V_{1},V_{2})$	$\displaystyle=$	$\displaystyle E\left\{e^{-\|V_{1}-V_{2}\|}+e^{-\|V_{3}-V_{4}\|}-e^{-\|V_{1}-V_{3}\|}-e^{-\|V_{2}-V_{3}\|}\mid(V_{1},V_{2})\right\}.$

Now we calculate the normalization constant $c_{0}$ . It follows by the Cauchy-Schwarz inequality that

			$\displaystyle E\left\{S_{U}(U_{1},U_{2})S_{V}(V_{1},V_{2})e^{-\|W_{1}-W_{2}\|}\right\}$
		$\displaystyle=$	$\displaystyle E\left[e^{-\|W_{1}-W_{2}\|}E\left\{S_{U}(U_{1},U_{2})S_{V}(V_{1},V_{2})\mid(W_{1},W_{2})\right\}\right]$
		$\displaystyle\leq$	$\displaystyle E\left[e^{-\|W_{1}-W_{2}\|}E^{1/2}\left\{S_{U}^{2}(U_{1},U_{2})\mid(W_{1},W_{2})\right\}E^{1/2}\left\{S_{V}^{2}(V_{1},V_{2})\mid(W_{1},W_{2})\right\}\right]$
		$\displaystyle=$	$\displaystyle E\left[e^{-\|W_{1}-W_{2}\|}E^{1/2}\left\{S_{U}^{2}(U_{1},U_{2})\right\}E^{1/2}\left\{S_{V}^{2}(V_{1},V_{2})\right\}\right]$
		$\displaystyle=$	$\displaystyle 2e^{-1}(6.5e^{-2}-20e^{-1}+6.5)$
		$\displaystyle\stackrel{{\scriptstyle\mbox{{\rm\tiny def}}}}{{=}}$	$\displaystyle c_{0}^{-1},$

where the equality holds if and only if $S_{U}(U_{1},U_{2})=\lambda\left\{S_{V}(V_{1},V_{2})\right\}$ holds with probability 1, where $\lambda\geq 0$ (because $E\left\{S_{U}(U_{1},U_{2})S_{V}(V_{1},V_{2})\right\}$ is nonnegative). Recall that $U$ , $V$ and $W$ are all uniformly distributed on $(0,1)$ , further calculations give us

	$\displaystyle S_{U}(U_{1},U_{2})$	$\displaystyle=$	$\displaystyle e^{-\|U_{1}-U_{2}\|}+e^{-U_{1}}+e^{U_{1}-1}+e^{-U_{2}}+e^{U_{2}-1}+2e^{-1}-4,$
	$\displaystyle S_{V}(V_{1},V_{2})$	$\displaystyle=$	$\displaystyle e^{-\|V_{1}-V_{2}\|}+e^{-V_{1}}+e^{V_{1}-1}+e^{-V_{2}}+e^{V_{2}-1}+2e^{-1}-4.$

This, together with the normalization constant $c_{0}$ , yield the expression of the index $\rho(X,Y\mid Z)$ . Subsequently, the properties of the index $\rho(X,Y\mid Z)$ can be established.

(1) $\rho(X,Y\mid Z)\geq 0$ holds obviously. It equals $0$ only when $U$ , $V$ , $W$ are mutual independent, which is equivalent to the conditional independence $X\bot\!\!\!\bot Y\mid Z$ . $\rho(X,Y\mid Z)\leq 1$ holds obviously according to the derivation of the index $\rho$ . The equality holds if and only if $S_{U}(U_{1},U_{2})=\lambda\left\{S_{V}(V_{1},V_{2})\right\}$ . Because $ES_{U}^{2}(U_{1},U_{2})=ES_{V}^{2}(V_{1},V_{2})$ , we have $\lambda=1$ . If $U=V$ or $U+V=1$ , it is easy to check $S_{U}(U_{1},U_{2})=S_{V}(V_{1},V_{2})$ . This completes the proof of Part (1).

(2) This property is trivial according to the definition of $\rho$ .

(3) For strictly monotone transformations $m_{3}(\cdot)$ , we have when $m_{1}(\cdot)$ is strictly increasing, $U_{m}=F_{m_{1}(X)\mid m_{3}(Z)}\{m_{1}(X)\mid m_{3}(Z)\}$ equals $U=F_{X\mid Z}(X\mid Z)$ , while when $m_{1}(\cdot)$ is strictly decreasing, it equals $1-U$ . It can be easily verified that $S_{U}(U_{1},U_{2})=S_{U}(1-U_{1},1-U_{2})$ , then we have $S_{U_{m}}(U_{m1},U_{m2})=S_{U}(U_{1},U_{2})$ no matter whether $m_{1}(\cdot)$ is strictly increasing or decreasing. Similarly, let $V_{m}=F_{m_{2}(Y)\mid m_{3}(Z)}\{m_{2}(Y)\mid m_{3}(Z)\}$ , we obtain that $S_{V_{m}}(V_{m1},V_{m2})=S_{V}(V_{1},V_{2})$ . It is clear that $W_{m}=F_{m_{3}(Z)}\{m_{3}(Z)\}$ equals either $W$ or $1-W$ , implying $e^{-|W_{m1}-W_{m2}|}=e^{-|W_{1}-W_{2}|}$ . Therefore, we have

\displaystyle E\left\{S_{U_{m}}(U_{m1},U_{m2})S_{V_{m}}(V_{m1},V_{m2})e^{-|W_{m1}-W_{m2}|}\right\}=E\left\{S_{U}(U_{1},U_{2})S_{V}(V_{1},V_{2})e^{-|W_{1}-W_{2}|}\right\},

and it is true that $\rho\left\{m_{1}(X),m_{2}(Y)\mid m_{3}(Z)\right\}=\rho(X,Y\mid Z)$ .

6.3 Proof of Theorem 2

For simplicity, we denote by $g(x)=e^{-|x|}$ and $S_{0}(x,y)=g\left(x-y\right)+e^{-x}+e^{x-1}+e^{-y}+e^{y-1}+2e^{-1}-4$ . We write $c_{0}^{-1}\widehat{\rho}(X,Y\mid Z)$ as

\displaystyle n^{-2}\sum_{i,j}\left\{S_{0}(\widehat{U}_{i},\widehat{U}_{j})S_{0}(\widehat{V}_{i},\widehat{V}_{j})g(\widehat{W}_{i}-\widehat{W}_{j})\right\}.

With Taylor’s expansion, when $nh^{4m}\to 0$ and $nh^{2}/\log^{2}(n)\to\infty$ , we have

			$\displaystyle S_{0}(\widehat{U}_{i},\widehat{U}_{j})$
		$\displaystyle=$	$\displaystyle\left\{g^{\prime}\left(U_{i}-U_{j}\right)+e^{U_{i}-1}-e^{-U_{i}}\right\}\Delta U_{i}-\left\{g^{\prime}\left(U_{i}-U_{j}\right)-e^{U_{j}-1}+e^{-U_{j}}\right\}\Delta U_{j}$
		$\displaystyle+$	$\displaystyle 2^{-1}\left\{g^{\prime\prime}\left(U_{i}-U_{j}\right)(\Delta U_{i}-\Delta U_{j})^{2}+\left(e^{-U_{i}}+e^{U_{i}-1}\right)(\Delta U_{i})^{2}+\left(e^{-U_{j}}+e^{U_{j}-1}\right)(\Delta U_{j})^{2}\right\}$
		$\displaystyle+$	$\displaystyle 6^{-1}\left\{g^{\prime\prime\prime}\left(U_{i}-U_{j}\right)(\Delta U_{i}-\Delta U_{j})^{3}+\left(e^{U_{i}-1}-e^{-U_{i}}\right)(\Delta U_{i})^{3}+\left(e^{U_{j}-1}-e^{-U_{j}}\right)(\Delta U_{j})^{3}\right\}$
		$\displaystyle+$	$\displaystyle S_{0}\left(U_{i},U_{j}\right)+o_{p}(n^{-1})$
		$\displaystyle\stackrel{{\scriptstyle\mbox{{\rm\tiny def}}}}{{=}}$	$\displaystyle S_{1}(U_{i},U_{j})+S_{2}(U_{i},U_{j})+S_{3}(U_{i},U_{j})+S_{0}(U_{i},U_{j})+o_{p}(n^{-1}),$

where $\Delta U_{i}=\widehat{U}_{i}-U_{i}$ , and $S_{k}(U_{i},U_{j}),k=1,2,3$ are defined to be each row in an obvious way. Similarly, we expand $S_{0}(\widehat{V}_{i},\widehat{V}_{j})$ as

\displaystyle S_{0}(\widehat{V}_{i},\widehat{V}_{j})

\displaystyle=

\displaystyle S_{0}(V_{i},V_{j})+S_{1}(V_{i},V_{j})+S_{2}(V_{i},V_{j})+S_{3}(V_{i},V_{j})+o_{p}(n^{-1}).

As for $g(\widehat{W}_{i}-\widehat{W}_{j})$ , we have

	$\displaystyle g(\widehat{W}_{i}-\widehat{W}_{j})$	$\displaystyle=$	$\displaystyle g(W_{i}-W_{j})+g^{\prime}(W_{i}-W_{j})(\Delta W_{i}-\Delta W_{j})$
			$\displaystyle+2^{-1}g^{\prime\prime}(W_{i}-W_{j})(\Delta W_{i}-\Delta W_{j})^{2}+o_{p}(n^{-1}).$

Therefore, it follows that

$\displaystyle c_{0}^{-1}\widehat{\rho}(X,Y\mid Z)$	$\displaystyle=$	$\displaystyle 2^{-1}n^{-2}\sum_{i,j}S_{0}\left(U_{i},U_{j}\right)S_{0}\left(V_{i},V_{j}\right)g^{\prime\prime}(W_{i}-W_{j})(\Delta W_{i}-\Delta W_{j})^{2}$
		$\displaystyle+n^{-2}\sum_{i,j}\sum_{0\leq k+l\leq 1}S_{k}\left(U_{i},U_{j}\right)S_{l}\left(V_{i},V_{j}\right)g^{\prime}(W_{i}-W_{j})(\Delta W_{i}-\Delta W_{j})$
		$\displaystyle+n^{-2}\sum_{i,j}\sum_{0\leq k+l\leq 3}S_{k}\left(U_{i},U_{j}\right)S_{l}\left(V_{i},V_{j}\right)g\left(W_{i}-W_{j}\right)+o_{p}(n^{-1})$
	$\displaystyle\stackrel{{\scriptstyle\mbox{{\rm\tiny def}}}}{{=}}$	$\displaystyle 2^{-1}Q_{1}+Q_{2}+Q_{3}+o_{p}(n^{-1}).$

We first show that $Q_{1}$ is of order $o_{p}(n^{-1})$ . In fact,

$\displaystyle Q_{1}$	$\displaystyle=$	$\displaystyle n^{-2}\sum_{i,j}S_{0}\left(U_{i},U_{j}\right)S_{0}\left(V_{i},V_{j}\right)g^{\prime\prime}(W_{i}-W_{j})(\Delta W_{i}-\Delta W_{j})^{2}$
	$\displaystyle=$	$\displaystyle n^{-2}\sum_{i,j}S_{0}\left(U_{i},U_{j}\right)S_{0}\left(V_{i},V_{j}\right)g^{\prime\prime}(W_{i}-W_{j})\left\{\left(\Delta W_{i}\right)^{2}+\left(\Delta W_{j}\right)^{2}\right\}$
		$\displaystyle+2n^{-2}\sum_{i,j}S_{0}\left(U_{i},U_{j}\right)S_{0}\left(V_{i},V_{j}\right)g^{\prime\prime}(W_{i}-W_{j})(W_{i}\Delta W_{j}+W_{j}\Delta W_{i})$
		$\displaystyle-2n^{-2}\sum_{i,j}S_{0}\left(U_{i},U_{j}\right)S_{0}\left(V_{i},V_{j}\right)g^{\prime\prime}(W_{i}-W_{j})(\widehat{W}_{i}\widehat{W}_{j}-W_{i}W_{j})$
	$\displaystyle\stackrel{{\scriptstyle\mbox{{\rm\tiny def}}}}{{=}}$	$\displaystyle Q_{1,1}+Q_{1,2}+Q_{1,3}.$

Under the null hypothesis, $U$ , $V$ and $W$ are mutually independent, it is easy to verify that $E\left\{S_{0}\left(U_{i},U_{j}\right)\mid(U_{i},V_{i},W_{i},V_{j},W_{j})\right\}=0$ , and hence

\displaystyle E\left\{S_{0}\left(U_{i},U_{j}\right)S_{0}\left(V_{i},V_{j}\right)g^{\prime\prime}(W_{i}-W_{j})\mid(U_{i},V_{i},W_{i})\right\}=0.

Then for each fixed $i$ , we have

\displaystyle n^{-1}\sum_{j}S_{0}\left(U_{i},U_{j}\right)S_{0}\left(V_{i},V_{j}\right)g^{\prime\prime}(W_{i}-W_{j})=O_{p}(n^{-1/2}).

Thus, $Q_{1,1}$ is clearly of order $o_{p}(n^{-1})$ because $\left(\Delta W_{i}\right)^{2}=o_{p}(n^{-1/2})$ . Now we deal with $Q_{1,2}$ .

	$\displaystyle n^{-2}\sum_{i,j}S_{0}\left(U_{i},U_{j}\right)S_{0}\left(V_{i},V_{j}\right)g^{\prime\prime}(W_{i}-W_{j})W_{i}\Delta W_{j}$
	$\displaystyle=n^{-3}\sum_{i,j,k}S_{0}\left(U_{i},U_{j}\right)S_{0}\left(V_{i},V_{j}\right)g^{\prime\prime}(W_{i}-W_{j})W_{i}\left\{\mathbbm{1}(W_{k}\leq W_{j})-W_{j}\right\}.$

Under the null hypothesis, because $E\left\{S_{0}\left(U_{i},U_{j}\right)\mid(U_{i},V_{i},W_{i},V_{j},W_{j})\right\}=0$ , $W$ is uniformly distributed, we have $E\{\mathbbm{1}(W_{k}\leq W_{j})\mid W_{j}\}=W_{j}$ . Thus, the corresponding U-statistic of the equation above is second order degenerate. In addition, when any two of $i,j,k$ are identical, we have

\displaystyle E\left[S_{0}\left(U_{i},U_{j}\right)S_{0}\left(V_{i},V_{j}\right)g^{\prime\prime}(W_{i}-W_{j})W_{i}\left\{\mathbbm{1}(W_{k}\leq W_{j})-W_{j}\right\}\right]=0.

Then the summations associated with any two of the $i,j,k$ are identical is of order $o_{p}(1)$ . Therefore, $Q_{1,2}=o_{p}(n^{-1})$ . It remains to deal with $Q_{1,3}$ . Similarly, the corresponding U-statistic of

\displaystyle n^{-4}\sum_{i,j,k,l}S_{0}\left(U_{i},U_{j}\right)S_{0}\left(V_{i},V_{j}\right)g^{\prime\prime}(W_{i}-W_{j})\left\{\mathbbm{1}(W_{k}\leq W_{i})\mathbbm{1}(W_{l}\leq W_{j})-W_{i}W_{j}\right\}

is second order degenerate and hence we obtain that $Q_{1,3}=o_{p}(n^{-1})$ .

Next, we show $Q_{2}=o_{p}(n^{-1})$ . Recall that

$\displaystyle Q_{2}$	$\displaystyle=$	$\displaystyle n^{-2}\sum_{i,j}S_{0}\left(U_{i},U_{j}\right)S_{0}\left(V_{i},V_{j}\right)g^{\prime}(W_{i}-W_{j})(\Delta W_{i}-\Delta W_{j})$
		$\displaystyle+2n^{-2}\sum_{i,j}S_{1}\left(U_{i},U_{j}\right)S_{0}\left(V_{i},V_{j}\right)g^{\prime}(W_{i}-W_{j})\Delta W_{i}$
		$\displaystyle+2n^{-2}\sum_{i,j}S_{0}\left(U_{i},U_{j}\right)S_{1}\left(V_{i},V_{j}\right)g^{\prime}(W_{i}-W_{j})\Delta W_{i}$
	$\displaystyle\stackrel{{\scriptstyle\mbox{{\rm\tiny def}}}}{{=}}$	$\displaystyle Q_{2,1}+2Q_{2,2}+2Q_{2,3}.$

Similar to dealing with $Q_{1,2}$ , we have $Q_{2,1}=o_{p}(n^{-1})$ . We now evaluate $Q_{2,2}$ .

	$\displaystyle Q_{2,2}$	$\displaystyle=$	$\displaystyle n^{-2}\sum_{i,j}\left\{g^{\prime}\left(U_{i}-U_{j}\right)+e^{U_{i}-1}-e^{-U_{i}}\right\}\Delta U_{i}S_{0}\left(V_{i},V_{j}\right)g^{\prime}(W_{i}-W_{j})\Delta W_{i}$
			$\displaystyle-n^{-2}\sum_{i,j}\left\{g^{\prime}\left(U_{i}-U_{j}\right)-e^{U_{j}-1}+e^{-U_{j}}\right\}\Delta U_{j}S_{0}\left(V_{i},V_{j}\right)g^{\prime}(W_{i}-W_{j})\Delta W_{i}.$

Because under the null hypothesis, $E\left\{S_{0}\left(V_{i},V_{j}\right)\mid(U_{i},V_{i},W_{i},U_{j},W_{j})\right\}=0$ , it follows that for each $i$ ,

\displaystyle n^{-1}\sum_{j}\left\{g^{\prime}\left(U_{i}-U_{j}\right)+e^{U_{i}-1}-e^{-U_{i}}\right\}S_{0}\left(V_{i},V_{j}\right)g^{\prime}(W_{i}-W_{j})=O_{p}(n^{-1/2}),

and the first term of $Q_{2,2}$ is of order $o_{p}(n^{-1})$ because $\Delta W_{i}=O_{p}(n^{-1/2})$ and $\Delta U_{i}=o_{p}(1)$ . In addition, for each $j$ ,

\displaystyle n^{-2}\sum_{i,k}\left\{g^{\prime}\left(U_{i}-U_{j}\right)-e^{U_{j}-1}+e^{-U_{j}}\right\}S_{0}\left(V_{i},V_{j}\right)g^{\prime}(W_{i}-W_{j})\left\{\mathbbm{1}(W_{k}\leq W_{i})-W_{i}\right\}

is degenerate and hence the second term of $Q_{2,2}$ is also of order $o_{p}(n^{-1})$ because $\Delta U_{j}=o_{p}(1)$ , indicating $Q_{2,2}=o_{p}(n^{-1})$ . Similarly, we have $Q_{2,3}=o_{p}(n^{-1})$ . Thus it follows that $Q_{2}=o_{p}(n^{-1})$ .

Finally, we show that $Q_{3}=n^{-2}\sum_{i,j}S_{0}\left(U_{i},U_{j}\right)S_{0}\left(V_{i},V_{j}\right)g\left(W_{i}-W_{j}\right)+o_{p}(n^{-1})$ . Or equivalently, we show that $Q_{3,1},Q_{3,2}$ and $Q_{3,3}$ are all of order $o_{p}(n^{-1})$ , where

$\displaystyle Q_{3,1}$	$\displaystyle\stackrel{{\scriptstyle\mbox{{\rm\tiny def}}}}{{=}}$	$\displaystyle n^{-2}\sum_{i,j}\sum_{k+l=1}S_{k}\left(U_{i},U_{j}\right)S_{l}\left(V_{i},V_{j}\right)g\left(W_{i}-W_{j}\right),$
$\displaystyle Q_{3,2}$	$\displaystyle\stackrel{{\scriptstyle\mbox{{\rm\tiny def}}}}{{=}}$	$\displaystyle n^{-2}\sum_{i,j}\sum_{k+l=2}S_{k}\left(U_{i},U_{j}\right)S_{l}\left(V_{i},V_{j}\right)g\left(W_{i}-W_{j}\right),$
$\displaystyle Q_{3,3}$	$\displaystyle\stackrel{{\scriptstyle\mbox{{\rm\tiny def}}}}{{=}}$	$\displaystyle n^{-2}\sum_{i,j}\sum_{k+l=3}S_{k}\left(U_{i},U_{j}\right)S_{l}\left(V_{i},V_{j}\right)g\left(W_{i}-W_{j}\right).$

We first show that $Q_{3,1}\stackrel{{\scriptstyle\mbox{{\rm\tiny def}}}}{{=}}Q_{3,1,1}+Q_{3,1,2}=o_{p}(n^{-1})$ , where

	$\displaystyle Q_{3,1,1}$	$\displaystyle=$	$\displaystyle n^{-2}\sum_{i,j}S_{1}\left(U_{i},U_{j}\right)S_{0}\left(V_{i},V_{j}\right)g\left(W_{i}-W_{j}\right),$
	$\displaystyle Q_{3,1,2}$	$\displaystyle=$	$\displaystyle n^{-2}\sum_{i,j}S_{0}\left(U_{i},U_{j}\right)S_{1}\left(V_{i},V_{j}\right)g\left(W_{i}-W_{j}\right).$

Without loss of generality, we only show that $Q_{3,1,1}=o_{p}(n^{-1})$ . Calculate

$\displaystyle S_{1}\left(U_{i},U_{j}\right)$	$\displaystyle=$	$\displaystyle\left\{g^{\prime}\left(U_{i}-U_{j}\right)+e^{U_{i}-1}-e^{-U_{i}}\right\}\Delta U_{i}-\left\{g^{\prime}\left(U_{i}-U_{j}\right)-e^{U_{j}-1}+e^{-U_{j}}\right\}\Delta U_{j},$
$\displaystyle\Delta U_{i}$	$\displaystyle=$	$\displaystyle n^{-1}\sum_{k=1}^{n}\left[\frac{K_{h}(Z_{k}-Z_{i})\mathbbm{1}(X_{k}\leq X_{i})}{f(Z_{i})}-U_{i}-\frac{U_{i}\left\{K_{h}(Z_{k}-Z_{i})-f(Z_{i})\right\}}{f(Z_{i})}\right]$
		$\displaystyle\hskip 56.9055pt+O_{p}(h^{2m}+n^{-1}h^{-1}\log^{2}n).$

Thus, when $nh^{4m}\to 0$ and $nh^{2}/\log^{2}(n)\to\infty$ ,

$\displaystyle Q_{3,1,1}$	$\displaystyle=$	$\displaystyle 2n^{-2}\sum_{i,j}\left\{g^{\prime}\left(U_{i}-U_{j}\right)+e^{U_{i}-1}-e^{-U_{i}}\right\}S_{0}\left(V_{i},V_{j}\right)g\left(W_{i}-W_{j}\right)\Delta U_{i}$
	$\displaystyle=$	$\displaystyle 2n^{-3}\sum_{i,j,k}\bigg{(}\left\{g^{\prime}\left(U_{i}-U_{j}\right)+e^{U_{i}-1}-e^{-U_{i}}\right\}S_{0}\left(V_{i},V_{j}\right)g\left(W_{i}-W_{j}\right)$
		$\displaystyle\cdot\left[\frac{K_{h}(Z_{k}-Z_{i})\mathbbm{1}(X_{k}\leq X_{i})}{f(Z_{i})}-U_{i}-\frac{U_{i}\left\{K_{h}(Z_{k}-Z_{i})-f(Z_{i})\right\}}{f(Z_{i})}\right]\bigg{)}+o_{p}(n^{-1})$
	$\displaystyle=$	$\displaystyle\frac{2}{n(n-1)}\sum_{i\neq j}\bigg{(}\left\{g^{\prime}\left(U_{i}-U_{j}\right)+e^{U_{i}-1}-e^{-U_{i}}\right\}S_{0}\left(V_{i},V_{j}\right)g\left(W_{i}-W_{j}\right)$
		$\displaystyle\cdot\left[\frac{E\left\{K_{h}(Z_{k}-Z_{i})\mathbbm{1}(X_{k}\leq X_{i})-U_{i}K_{h}(Z_{k}-Z_{i})\mid(X_{i},Z_{i})\right\}}{f(Z_{i})}\right]\bigg{)}+o_{p}(n^{-1}),$

where the last equality holds due to equations (2)-(3) of section 5.3.4 in Serfling (2009) and the fact that $\mbox{\rm var}\{K_{h}(Z_{i}-Z_{j})\}=O(h^{-1})$ . Therefore, since

\displaystyle\sup_{X_{i},Z_{i}}\big{|}E\left\{K_{h}(Z_{k}-Z_{i})\mathbbm{1}(X_{k}\leq X_{i})-U_{i}K_{h}(Z_{k}-Z_{i})\mid(X_{i},Z_{i})\right\}\big{|}=O(h^{m}),

when the $(m-1)$ th derivatives of $F_{X\mid Z}(x\mid z)f_{Z}(z)$ and $f_{Z}(z)$ with respect to $z$ are locally Lipschitz-continuous, $Q_{3,1,1}$ is clearly of order $o_{p}(n^{-1})$ by noting that the summation in the last display is degenerate.

Next, we consider $Q_{3,2}$ , where

$\displaystyle Q_{3,2}$	$\displaystyle=$	$\displaystyle n^{-2}\sum_{i,j}S_{1}\left(U_{i},U_{j}\right)S_{1}\left(V_{i},V_{j}\right)g\left(W_{i}-W_{j}\right)$
	$\displaystyle+$	$\displaystyle n^{-2}\sum_{i,j}S_{2}\left(U_{i},U_{j}\right)S_{0}\left(V_{i},V_{j}\right)g\left(W_{i}-W_{j}\right)$
	$\displaystyle+$	$\displaystyle n^{-2}\sum_{i,j}S_{0}\left(U_{i},U_{j}\right)S_{2}\left(V_{i},V_{j}\right)g\left(W_{i}-W_{j}\right)$
	$\displaystyle\stackrel{{\scriptstyle\mbox{{\rm\tiny def}}}}{{=}}$	$\displaystyle Q_{3,2,1}+Q_{3,2,2}+Q_{3,2,3}.$

We first show that $Q_{3,2,1}=o_{p}(n^{-1})$ . It follows that

$\displaystyle Q_{3,2,1}$	$\displaystyle=$	$\displaystyle 2n^{-2}\sum_{i,j}\bigg{[}\left\{g^{\prime}\left(U_{i}-U_{j}\right)+e^{U_{i}-1}-e^{-U_{i}}\right\}\left\{g^{\prime}\left(V_{i}-V_{j}\right)+e^{V_{i}-1}-e^{-V_{i}}\right\}$
		$\displaystyle\cdot g\left(W_{i}-W_{j}\right)\Delta U_{i}\Delta V_{i}\bigg{]}+2n^{-2}\sum_{i,j}\bigg{[}\left\{g^{\prime}\left(U_{i}-U_{j}\right)+e^{U_{i}-1}-e^{-U_{i}}\right\}$
		$\displaystyle\cdot\left\{g^{\prime}\left(V_{i}-V_{j}\right)-e^{V_{j}-1}+e^{-V_{j}}\right\}g\left(W_{i}-W_{j}\right)\Delta U_{i}\Delta V_{j}\bigg{]}$
	$\displaystyle\stackrel{{\scriptstyle\mbox{{\rm\tiny def}}}}{{=}}$	$\displaystyle Q_{3,2,1,1}+Q_{3,2,1,2}.$

Because $E\{g^{\prime}\left(U_{i}-U_{j}\right)+e^{U_{i}-1}-e^{-U_{i}}\mid(U_{i},V_{i},W_{i},V_{j},W_{j})\}=0$ . For each $i$ ,

\displaystyle n^{-1}\sum_{j=1}^{n}\bigg{[}\left\{g^{\prime}\left(U_{i}-U_{j}\right)+e^{U_{i}-1}-e^{-U_{i}}\right\}\left\{g^{\prime}\left(V_{i}-V_{j}\right)+e^{V_{i}-1}-e^{-V_{i}}\right\}g\left(W_{i}-W_{j}\right)\bigg{]}

is of order $O_{p}(n^{-1/2})$ . Then $Q_{3,2,1,1}=o_{p}(n^{-1})$ because $\Delta U_{i}\Delta V_{i}=o_{p}(n^{-1/2})$ . For $Q_{3,2,1,2}$ , $-\Delta U_{i}\Delta V_{j}=(U_{i}\Delta V_{j}+U_{j}\Delta V_{i})+(\widehat{U}_{i}\widehat{V}_{j}-U_{i}V_{j})$ . By expanding the $\widehat{U}_{i}$ , $\widehat{U}_{j}$ , $\widehat{V}_{i}$ , $\widehat{V}_{j}$ in $Q_{3,2,1,2}$ as U statistics and apply the same technique as showing $Q_{3,1,1}=o_{p}(n^{-1})$ and $Q_{1,3}=o_{p}(n^{-1})$ , it follows immediately that $Q_{3,2,1,2}$ is of order $o_{p}(n^{-1})$ . Thus $Q_{3,2,1}$ is of order $o_{p}(n^{-1})$ .

For $Q_{3,2,2}$ and $Q_{3,2,3}$ , we only show that $Q_{3,2,2}$ is of order $o_{p}({n^{-1}})$ , for simplicity. Recall that $S_{2}\left(U_{i},U_{j}\right)$ is defined as

	$\displaystyle S_{2}\left(U_{i},U_{j}\right)$	$\displaystyle=$	$\displaystyle 2^{-1}\bigg{[}g^{\prime\prime}\left(U_{i}-U_{j}\right)(\Delta U_{i})^{2}+g^{\prime\prime}\left(U_{i}-U_{j}\right)(\Delta U_{j})^{2}+\left(e^{-U_{i}}+e^{U_{i}-1}\right)(\Delta U_{i})^{2}$
			$\displaystyle\hskip 56.9055pt+\left(e^{-U_{j}}+e^{U_{j}-1}\right)(\Delta U_{j})^{2}-2g^{\prime\prime}\left(U_{i}-U_{j}\right)\Delta U_{i}\Delta U_{j}\bigg{]}.$

The summations associated with either $(\Delta U_{i})^{2}$ or $(\Delta U_{j})^{2}$ are of order $o_{p}(n^{-1})$ following similar reasons as showing $Q_{3,2,1,1}=o_{p}(n^{-1})$ , and that associated with $\Delta U_{i}\Delta U_{j}$ are of order $o_{p}(n^{-1})$ similar to dealing with $Q_{3,2,1,2}$ . As a result, $Q_{3,2}$ is of order $o_{p}(n^{-1})$ .

For $Q_{3,3}$ , we have

\displaystyle Q_{3,3}=n^{-2}\sum_{i,j}\sum_{k=1}^{4}S_{4-k}\left(U_{i},U_{j}\right)S_{k-1}\left(V_{i},V_{j}\right)g\left(W_{i}-W_{j}\right)\stackrel{{\scriptstyle\mbox{{\rm\tiny def}}}}{{=}}\sum_{k=1}^{4}Q_{3,3,k}.

We only show that $Q_{3,3,1}=o_{p}(n^{-1})$ because the other terms are similar. Calculate

			$\displaystyle 6S_{3}\left(U_{i},U_{j}\right)$
		$\displaystyle=$	$\displaystyle g^{\prime\prime\prime}\left(U_{i}-U_{j}\right)\left\{(\Delta U_{i})^{3}-3(\Delta U_{i})^{2}\Delta U_{j}+3\Delta U_{i}(\Delta U_{j})^{2}-(\Delta U_{j})^{3}\right\}$
			$\displaystyle\hskip 85.35826pt+\left(e^{U_{i}-1}-e^{-U_{i}}\right)(\Delta U_{i})^{3}+\left(e^{U_{j}-1}-e^{-U_{j}}\right)(\Delta U_{j})^{3}.$

Then the summations associated with either $(\Delta U_{i})^{3}$ or $(\Delta U_{j})^{3}$ are of order $o_{p}(n^{-1})$ similar to dealing with $Q_{3,2,1,1}$ , and that associated with $\Delta U_{i}(\Delta U_{j})^{2}$ or $(\Delta U_{i})^{2}\Delta U_{j}$ are of order $o_{p}(n^{-1})$ similar to the second term of $Q_{2,2}$ .

To sum up, we have shown that

\displaystyle c_{0}^{-1}\widehat{\rho}(X,Y\mid Z)=n^{-2}\sum_{i,j}S_{0}\left(U_{i},U_{j}\right)S_{0}\left(V_{i},V_{j}\right)g\left(W_{i}-W_{j}\right)+o_{p}(n^{-1}),

where the right hand side is essentially a first order degenerate V-statistics. Thus by applying Theorem 6.4.1.B of Serfling (2009),

\displaystyle n\widehat{\rho}(X,Y\mid Z)\xrightarrow{d}c_{0}\sum_{j=1}^{\infty}\lambda_{j}\chi_{j}^{2}(1)

where $\chi_{j}^{2}(1)$ , $j=1,2,\dots$ are independent $\chi^{2}(1)$ random variables, and $\lambda_{j}$ , $j=1,2,\dots$ are the corresponding eigenvalues of $h(u,v,w;u^{\prime},v^{\prime},w^{\prime})$ . It is worth mentioning that the kernel is positive definite and hence all the $\lambda_{j}$ s are positive. Therefore, the proof is completed.

6.4 Proof of Theorem 3

Since we generate $\{U_{i}^{*},V_{i}^{*},W_{i}^{*}\}$ , $i=1,\dots,n$ independently from uniform distribution, it is quite straightforward that $U^{*},V^{*}$ and $W^{*}$ are mutually independent. In addition, we can write $\widehat{\rho}^{*}$ as

\displaystyle\widehat{\rho}^{*}=n^{-2}\sum_{i,j}c_{0}S_{0}\left(U_{i}^{*},U_{j}^{*}\right)S_{0}\left(V_{i}^{*},V_{j}^{*}\right)g\left(W_{i}^{*}-W_{j}^{*}\right),

which clearly converges in distribution to $c_{0}\sum_{j=1}^{\infty}\widetilde{\lambda}_{j}\chi_{j}^{2}(1)$ , where $\chi_{j}^{2}(1)$ , $j=1,2,\dots$ are independent $\chi^{2}(1)$ random variables, and $\widetilde{\lambda}_{j}$ , $j=1,2,\dots$ are the eigenvalues of $h(u,v,w;u^{\prime},v^{\prime},w^{\prime})$ , implying $\widetilde{\lambda}_{j}=\lambda_{j}$ , for $j=1,2,\dots$ , and hence the proof is completed.

6.5 Proof of Theorem 4

We use the same notation as the proof in Theorem 2. With Taylor’s expansion, when $nh^{4m}\to 0$ and $nh^{2}/\log^{2}(n)\to\infty$ , we have

$\displaystyle S_{0}(\widehat{U}_{i},\widehat{U}_{j})$	$\displaystyle=$	$\displaystyle S_{0}(U_{i},U_{j})+S_{1}(U_{i},U_{j})+o_{p}(n^{-1/2}),$
$\displaystyle S_{0}(\widehat{V}_{i},\widehat{V}_{j})$	$\displaystyle=$	$\displaystyle S_{0}(V_{i},V_{j})+S_{1}(V_{i},V_{j})+o_{p}(n^{-1/2}),$
$\displaystyle g(\widehat{W}_{i}-\widehat{W}_{j})$	$\displaystyle=$	$\displaystyle g(W_{i}-W_{j})+g^{\prime}(W_{i}-W_{j})(\Delta W_{i}-\Delta W_{j})+o_{p}(n^{-1/2}).$

Therefore, we have

$\displaystyle c_{0}^{-1}\widehat{\rho}(X,Y\mid Z)$	$\displaystyle=$	$\displaystyle n^{-2}\sum_{i,j}S_{0}\left(U_{i},U_{j}\right)S_{0}\left(V_{i},V_{j}\right)g\left(W_{i}-W_{j}\right)$
		$\displaystyle+n^{-2}\sum_{i,j}S_{0}\left(U_{i},U_{j}\right)S_{0}\left(V_{i},V_{j}\right)g^{\prime}(W_{i}-W_{j})(\Delta W_{i}-\Delta W_{j})$
		$\displaystyle+n^{-2}\sum_{i,j}S_{1}\left(U_{i},U_{j}\right)S_{0}\left(V_{i},V_{j}\right)g(W_{i}-W_{j})$
		$\displaystyle+n^{-2}\sum_{i,j}S_{0}\left(U_{i},U_{j}\right)S_{1}\left(V_{i},V_{j}\right)g(W_{i}-W_{j})+o_{p}(n^{-1/2})$
	$\displaystyle\stackrel{{\scriptstyle\mbox{{\rm\tiny def}}}}{{=}}$	$\displaystyle P_{1}+P_{2}+P_{3}+P_{4}+o_{p}(n^{-1/2}).$

We deal with the four terms, respectively. For $P_{1}$ , by applying Lemma 5.7.3 and equation (2) in section 5.3.1 of Serfling (2009), we have

$\displaystyle P_{1}-c_{0}^{-1}\rho(X,Y\mid Z)$	$\displaystyle=$	$\displaystyle 2n^{-1}\sum_{i=1}^{n}E\left[\left\{S_{0}\left(U_{i},U\right)S_{0}\left(V_{i},V\right)g\left(W_{i}-W\right)\right\}\mid(U_{i},V_{i},W_{i})\right]$	(7)
		$\displaystyle\hskip 85.35826pt-2c_{0}^{-1}\rho(X,Y\mid Z)+o_{p}(n^{-1/2})$
	$\displaystyle\stackrel{{\scriptstyle\mbox{{\rm\tiny def}}}}{{=}}$	$\displaystyle 2n^{-1}\sum_{i=1}^{n}\left\{P_{1,i}-c_{0}^{-1}\rho(X,Y\mid Z)\right\}+o_{p}(n^{-1/2}).$

Next, we deal with $P_{2}$ . Recall that

	$\displaystyle P_{2}$	$\displaystyle=$	$\displaystyle n^{-2}\sum_{i,j}S_{0}\left(U_{i},U_{j}\right)S_{0}\left(V_{i},V_{j}\right)g^{\prime}(W_{i}-W_{j})(\Delta W_{i}-\Delta W_{j})$
		$\displaystyle=$	$\displaystyle 2n^{-3}\sum_{i,j,k}S_{0}\left(U_{i},U_{j}\right)S_{0}\left(V_{i},V_{j}\right)g^{\prime}(W_{i}-W_{j})\left\{I(W_{k}\leq W_{i})-W_{i}\right\}.$

By applying Lemma 5.7.3 and equation (2) in section 5.3.1 of Serfling (2009) again, we can obtain that

	$\displaystyle P_{2}$	$\displaystyle=$	$\displaystyle 2n^{-1}\sum_{i=1}^{n}E\left[S_{0}(U,U^{\prime})S_{0}(V,V^{\prime})g^{\prime}(W-W^{\prime})\left\{I(W_{i}\leq W)-W\right\}\mid W_{i}\right]+o_{p}(n^{-1/2})$		(8)
		$\displaystyle\stackrel{{\scriptstyle\mbox{{\rm\tiny def}}}}{{=}}$	$\displaystyle 2n^{-1}\sum_{i=1}^{n}P_{2,i}+o_{p}(n^{-1/2}),$		(8)

where $(U^{\prime},V^{\prime},W^{\prime})$ is an independent copy of $(U,V,W)$ .

It remains to deal with $P_{3}$ and $P_{4}$ . $P_{3}$ equals

	$\displaystyle P_{3}$	$\displaystyle=$	$\displaystyle 2n^{-3}\sum_{i,j,k}\bigg{[}\left\{g^{\prime}\left(U_{i}-U_{j}\right)+e^{U_{i}-1}-e^{-U_{i}}\right\}S_{0}\left(V_{i},V_{j}\right)g(W_{i}-W_{j})$
			$\displaystyle\hskip 56.9055pt\cdot\left\{\frac{K_{h}(Z_{k}-Z_{i})\mathbbm{1}(X_{k}\leq X_{i})-U_{i}K_{h}(Z_{k}-Z_{i})}{f(Z_{i})}\right\}\bigg{]}+o_{p}(n^{-1/2}).$

By definition, we have $V\bot\!\!\!\bot W$ and hence it can be verified that

\displaystyle E\left\{S_{0}\left(V_{i},V_{j}\right)g(W_{i}-W_{j})\mid V_{i},W_{i}\right\}=0.

Denote $P_{3}^{k,i}=\{K_{h}(Z_{k}-Z_{i})\mathbbm{1}(X_{k}\leq X_{i})-U_{i}K_{h}(Z_{k}-Z_{i})\}/f_{Z}(Z_{i})$ . Thus,

			$\displaystyle E\left\{\left(e^{U_{i}-1}-e^{-U_{i}}\right)S_{0}\left(V_{i},V_{j}\right)g(W_{i}-W_{j})P_{3}^{k,i}\mid X_{k},Z_{k}\right\}$
		$\displaystyle=$	$\displaystyle E\left\{\left(e^{U_{i}-1}-e^{-U_{i}}\right)S_{0}\left(V_{i},V_{j}\right)g(W_{i}-W_{j})P_{3}^{k,i}\mid X_{k},Z_{k},U_{i}\right\}=0.$

Thus when $nh^{2m}\to 0$ and $nh\to\infty$ , we have

$\displaystyle P_{3}$	$\displaystyle=$	$\displaystyle 2n^{-1}\sum_{k=1}^{n}E\left[\left\{I(X\geq X_{k})-U\right\}g^{\prime}(U-U^{\prime})S_{0}(V,V^{\prime})\right.$	(9)
		$\displaystyle\left.\hskip 56.9055pt\cdot g(W-W^{\prime})\mid X_{k},Z_{k}\right]+o_{p}(n^{-1/2})$
	$\displaystyle\stackrel{{\scriptstyle\mbox{{\rm\tiny def}}}}{{=}}$	$\displaystyle 2n^{-1}\sum_{i=1}^{n}P_{3,i}+o_{p}(n^{-1/2}).$

Following similar arguments, we can show that

$\displaystyle P_{4}$	$\displaystyle=$	$\displaystyle 2n^{-1}\sum_{i=1}^{n}E\left[\left\{I(Y\geq Y_{i})-V\right\}g^{\prime}(V-V^{\prime})S_{0}(U,U^{\prime})\right.$	(10)
		$\displaystyle\left.\hskip 56.9055pt\cdot g(W-W^{\prime})\mid Z=Z_{i}\right]+o_{p}(n^{-1/2})$
	$\displaystyle\stackrel{{\scriptstyle\mbox{{\rm\tiny def}}}}{{=}}$	$\displaystyle 2n^{-1}\sum_{i=1}^{n}P_{4,i}+o_{p}(n^{-1/2}).$

To sum up, it is shown that $c_{0}^{-1}\widehat{\rho}(X,Y\mid Z)$ could be written as

			$\displaystyle c_{0}^{-1}\widehat{\rho}(X,Y\mid Z)-c_{0}^{-1}\rho(X,Y\mid Z)$
		$\displaystyle=$	$\displaystyle 2n^{-1}\sum_{i=1}^{n}\left\{P_{1,i}+P_{2,i}+P_{3,i}+P_{4,i}-c_{0}^{-1}\rho(X,Y\mid Z)\right\}+o_{p}(n^{-1/2}),$

where $P_{1,i},P_{2,i},P_{3,i}$ and $P_{4,i}$ are defined in (7)-(10), respectively. Thus the asymptotic normality follows.

Under the local alternative, we have $U=F(X\mid Y,Z)+n^{-1/2}\ell(X,Y,Z),$ and it is easy to verify that $\breve{U}\stackrel{{\scriptstyle\mbox{{\rm\tiny def}}}}{{=}}F(X\mid Y,Z)$ , $V$ and $W$ are mutually independent. With Taylor’s expansion, we have

			$\displaystyle S_{0}(\widehat{U}_{i},\widehat{U}_{j})$
		$\displaystyle=$	$\displaystyle\left\{g^{\prime}(\breve{U}_{i}-\breve{U}_{j})+e^{\breve{U}_{i}-1}-e^{-\breve{U}_{i}}\right\}\Delta\breve{U}_{i}-\left\{g^{\prime}(\breve{U}_{i}-\breve{U}_{j})-e^{\breve{U}_{j}-1}+e^{-\breve{U}_{j}}\right\}\Delta\breve{U}_{j}$
		$\displaystyle+$	$\displaystyle 2^{-1}\left\{g^{\prime\prime}(\breve{U}_{i}-\breve{U}_{j})(\Delta\breve{U}_{i}-\Delta\breve{U}_{j})^{2}+\left(e^{-\breve{U}_{i}}+e^{\breve{U}_{i}-1}\right)(\Delta\breve{U}_{i})^{2}+\left(e^{-\breve{U}_{j}}+e^{\breve{U}_{j}-1}\right)(\Delta\breve{U}_{j})^{2}\right\}$
		$\displaystyle+$	$\displaystyle 6^{-1}\left\{g^{\prime\prime\prime}(\breve{U}_{i}-\breve{U}_{j})(\Delta\breve{U}_{i}-\Delta\breve{U}_{j})^{3}+\left(e^{\breve{U}_{i}-1}-e^{-\breve{U}_{i}}\right)(\Delta\breve{U}_{i})^{3}+\left(e^{\breve{U}_{j}-1}-e^{-\breve{U}_{j}}\right)(\Delta\breve{U}_{j})^{3}\right\}$
		$\displaystyle+$	$\displaystyle S_{0}(\breve{U}_{i},\breve{U}_{j})+o_{p}(n^{-1})$
		$\displaystyle\stackrel{{\scriptstyle\mbox{{\rm\tiny def}}}}{{=}}$	$\displaystyle S_{1}(\breve{U}_{i},\breve{U}_{j})+S_{2}(\breve{U}_{i},\breve{U}_{j})+S_{3}(\breve{U}_{i},\breve{U}_{j})+S_{0}(\breve{U}_{i},\breve{U}_{j})+o_{p}(n^{-1}),$

where $\Delta\breve{U}_{i}=\widehat{U}_{i}-\breve{U}_{i}$ . Then we can write $c_{0}^{-1}\widehat{\rho}(X,Y\mid Z)$ as

			$\displaystyle c_{0}^{-1}\widehat{\rho}(X,Y\mid Z)$
		$\displaystyle=$	$\displaystyle 2^{-1}n^{-2}\sum_{i,j}S_{0}(\breve{U}_{i},\breve{U}_{j})S_{0}\left(V_{i},V_{j}\right)g^{\prime\prime}(W_{i}-W_{j})(\Delta W_{i}-\Delta W_{j})^{2}$
			$\displaystyle+n^{-2}\sum_{i,j}\sum_{0\leq k+l\leq 1}S_{k}(\breve{U}_{i},\breve{U}_{j})S_{l}\left(V_{i},V_{j}\right)g^{\prime}(W_{i}-W_{j})(\Delta W_{i}-\Delta W_{j})$
			$\displaystyle+n^{-2}\sum_{i,j}\sum_{0\leq k+l\leq 3}S_{k}(\breve{U}_{i},\breve{U}_{j})S_{l}\left(V_{i},V_{j}\right)g\left(W_{i}-W_{j}\right)+o_{p}(n^{-1})$
		$\displaystyle\stackrel{{\scriptstyle\mbox{{\rm\tiny def}}}}{{=}}$	$\displaystyle 2^{-1}\widetilde{Q}_{1}+\widetilde{Q}_{2}+\widetilde{Q}_{3}+o_{p}(n^{-1}).$

With the same arguments as that in deriving $Q_{1}=o_{p}(n^{-1})$ in the proof of Theorem 2, we have $\widetilde{Q}_{1}=o_{p}(n^{-1})$ .

Now we deal with $\widetilde{Q}_{2}$ . For ease of notation, we write $\ell(X_{i},Y_{i},Z_{i})$ as $\ell_{i}$ in the remaining proof. By decomposing $\Delta\breve{U}_{i}$ as $\Delta\breve{U}_{i}=\Delta U_{i}+n^{-1/2}\ell_{i}$ , we have

$\displaystyle\widetilde{Q}_{2}$	$\displaystyle=$	$\displaystyle n^{-2}\sum_{i,j}S_{1}(\breve{U}_{i},\breve{U}_{j})S_{0}\left(V_{i},V_{j}\right)g^{\prime}(W_{i}-W_{j})(\Delta W_{i}-\Delta W_{j})+o_{p}(n^{-1})$
	$\displaystyle=$	$\displaystyle 2n^{-2}\sum_{i,j}\left\{g^{\prime}(\breve{U}_{i}-\breve{U}_{j})+e^{\breve{U}_{i}-1}-e^{-\breve{U}_{i}}\right\}\Delta\breve{U}_{i}S_{0}\left(V_{i},V_{j}\right)g^{\prime}(W_{i}-W_{j})\Delta W_{i}$
		$\displaystyle-2n^{-2}\sum_{i,j}\left\{g^{\prime}(\breve{U}_{i}-\breve{U}_{j})+e^{\breve{U}_{i}-1}-e^{-\breve{U}_{i}}\right\}\Delta\breve{U}_{i}$
		$\displaystyle\hskip 56.9055pt\cdot S_{0}\left(V_{i},V_{j}\right)g^{\prime}(W_{i}-W_{j})\Delta W_{j}+o_{p}(n^{-1})$
	$\displaystyle\stackrel{{\scriptstyle\mbox{{\rm\tiny def}}}}{{=}}$	$\displaystyle 2\widetilde{Q}_{2,1}-2\widetilde{Q}_{2,2}+o_{p}(n^{-1}).$

$\widetilde{Q}_{2,1}$ is clearly of order $o_{p}(n^{-1})$ because for each fixed $i$ ,

\displaystyle E\left[\left.\left\{g^{\prime}(\breve{U}_{i}-\breve{U}_{j})+e^{\breve{U}_{i}-1}-e^{-\breve{U}_{i}}\right\}S_{0}\left(V_{i},V_{j}\right)g^{\prime}(W_{i}-W_{j})\ \right|\ (\breve{U}_{i},V_{i},W_{i})\right]=0.

$\widetilde{Q}_{2,2}$ is also of order $o_{p}(n^{-1})$ because for each $i$ ,

\displaystyle n^{-2}\sum_{j,k}\bigg{[}\left\{g^{\prime}(\breve{U}_{i}-\breve{U}_{j})+e^{\breve{U}_{i}-1}-e^{-\breve{U}_{i}}\right\}S_{0}\left(V_{i},V_{j}\right)g^{\prime}(W_{i}-W_{j})\left\{\mathbbm{1}(W_{k}\leq W_{j})-W_{j}\right\}\bigg{]}

is degenerate.

Then we deal with the last quantity, $\widetilde{Q}_{3}$ , where

$\displaystyle\widetilde{Q}_{3}$	$\displaystyle=$	$\displaystyle n^{-2}\sum_{i,j}S_{0}(\breve{U}_{i},\breve{U}_{j})S_{0}\left(V_{i},V_{j}\right)g\left(W_{i}-W_{j}\right)$
		$\displaystyle+n^{-2}\sum_{i,j}\sum_{k+l=1}S_{k}(\breve{U}_{i},\breve{U}_{j})S_{l}\left(V_{i},V_{j}\right)g\left(W_{i}-W_{j}\right)$
		$\displaystyle+n^{-2}\sum_{i,j}\sum_{k+l=2}S_{k}(\breve{U}_{i},\breve{U}_{j})S_{l}\left(V_{i},V_{j}\right)g\left(W_{i}-W_{j}\right)$
		$\displaystyle+n^{-2}\sum_{i,j}\sum_{k+l=3}S_{k}(\breve{U}_{i},\breve{U}_{j})S_{l}\left(V_{i},V_{j}\right)g\left(W_{i}-W_{j}\right)+o_{p}(n^{-1})$
	$\displaystyle\stackrel{{\scriptstyle\mbox{{\rm\tiny def}}}}{{=}}$	$\displaystyle\widetilde{Q}_{3,0}+\widetilde{Q}_{3,1}+\widetilde{Q}_{3,2}+\widetilde{Q}_{3,3}+o_{p}(n^{-1}).$

We simplify $\widetilde{Q}_{3,1}$ first. According to the proof of Theorem 2, we have

$\displaystyle\widetilde{Q}_{3,1}$	$\displaystyle=$	$\displaystyle n^{-2}\sum_{i,j}S_{1}(\breve{U}_{i},\breve{U}_{j})S_{0}\left(V_{i},V_{j}\right)g\left(W_{i}-W_{j}\right)+o_{p}(n^{-1})$
	$\displaystyle=$	$\displaystyle 2n^{-5/2}\sum_{i,j}\left\{g^{\prime}(\breve{U}_{i}-\breve{U}_{j})+e^{\breve{U}_{i}-1}-e^{-\breve{U}_{i}}\right\}\ell_{i}S_{0}\left(V_{i},V_{j}\right)g\left(W_{i}-W_{j}\right)$
		$\displaystyle+2n^{-3}\sum_{i,j,k}\bigg{[}\left\{g^{\prime}(\breve{U}_{i}-\breve{U}_{j})+e^{\breve{U}_{i}-1}-e^{-\breve{U}_{i}}\right\}S_{0}\left(V_{i},V_{j}\right)g\left(W_{i}-W_{j}\right)$
		$\displaystyle\hskip 56.9055pt\cdot\left\{\frac{K_{h}(Z_{k}-Z_{i})\mathbbm{1}(X_{k}\leq X_{i})}{f(Z_{i})}-\frac{U_{i}K_{h}(Z_{k}-Z_{i})}{f(Z_{i})}\right\}\bigg{]}+o_{p}(n^{-1})$
	$\displaystyle\stackrel{{\scriptstyle\mbox{{\rm\tiny def}}}}{{=}}$	$\displaystyle\widetilde{Q}_{3,1,1}+2\widetilde{Q}_{3,1,2}+o_{p}(n^{-1}).$

As we can see, $\frac{K_{h}(Z_{k}-Z_{i})\mathbbm{1}(X_{k}\leq X_{i})}{f(Z_{i})}-\frac{U_{i}K_{h}(Z_{k}-Z_{i})}{f(Z_{i})}$ is of order $h^{m}$ . Then we can derive that

	$\displaystyle\widetilde{Q}_{3,1,2}$	$\displaystyle=$	$\displaystyle n^{-1}\sum_{j=1}^{n}E\bigg{[}\left\{g^{\prime}(\breve{U}_{i}-\breve{U}_{j})+e^{\breve{U}_{i}-1}-e^{-\breve{U}_{i}}\right\}S_{0}\left(V_{i},V_{j}\right)g\left(W_{i}-W_{j}\right)$
			$\displaystyle\cdot\left\{\frac{K_{h}(Z_{k}-Z_{i})\mathbbm{1}(X_{k}\leq X_{i})}{f(Z_{i})}-\frac{U_{i}K_{h}(Z_{k}-Z_{i})}{f(Z_{i})}\right\}\ \bigg{\|}\ (X_{j},Y_{j},Z_{j})\bigg{]}+O_{p}(n^{-1}h^{m}).$

It can be verified that

			$\displaystyle E\left\{\frac{K_{h}(Z_{k}-Z_{i})\mathbbm{1}(X_{k}\leq X_{i})}{f(Z_{i})}-\frac{U_{i}K_{h}(Z_{k}-Z_{i})}{f(Z_{i})}\ \bigg{\|}\ (X_{i},Z_{i})\right\}$
		$\displaystyle=$	$\displaystyle E\left\{\frac{K_{h}(Z_{k}-Z_{i})F(X_{i}\mid Z_{k})}{f(Z_{i})}-\frac{U_{i}K_{h}(Z_{k}-Z_{i})}{f(Z_{i})}\ \bigg{\|}\ (X_{i},Z_{i})\right\}$
		$\displaystyle=$	$\displaystyle f^{-1}(Z_{i})\int K(u)\left\{F(X_{i}\mid Y_{i},Z_{i}+uh)+n^{-1/2}\ell(X_{i},Y_{i},Z_{i}+uh)\right\}f(uh+Z_{i})du$
			$\displaystyle-f^{-1}(Z_{i})\breve{U}_{i}E\left\{K_{h}(Z_{k}-Z_{i})\mid Z_{i}\right\}-n^{-1/2}f^{-1}(Z_{i})\ell_{i}E\left\{K_{h}(Z_{k}-Z_{i})\mid Z_{i}\right\}.$

And

\displaystyle f^{-1}(Z_{i})\int K(u)F(X_{i}\mid Y_{i},Z_{i}+uh)f(uh+Z_{i})du-f^{-1}(Z_{i})\breve{U}_{i}E\left\{K_{h}(Z_{k}-Z_{i})\mid Z_{i}\right\}

is of order $h^{m}$ and is only a function of $(\breve{U}_{i},Z_{i})$ , which is independent of $V_{i}$ . Substituting this into $\widetilde{Q}_{3,1,2}$ , we have

	$\displaystyle\widetilde{Q}_{3,1,2}$	$\displaystyle=$	$\displaystyle n^{-3/2}\sum_{j=1}^{n}E\bigg{(}\left\{g^{\prime}(\breve{U}_{i}-\breve{U}_{j})+e^{\breve{U}_{i}-1}-e^{-\breve{U}_{i}}\right\}S_{0}\left(V_{i},V_{j}\right)g\left(W_{i}-W_{j}\right)$
			$\displaystyle\cdot\left[\frac{\int K(u)\left\{\ell(X_{i},Y_{i},Z_{i}+uh)-\ell_{i}\right\}f(Z_{i}+uh)du}{f(Z_{i})}\right]\ \bigg{\|}\ (X_{j},Y_{j},Z_{j})\bigg{)}+O_{p}(n^{-1}h^{m}).$

Then $\widetilde{Q}_{3,1,2}$ is clearly of order $o_{p}(n^{-1})$ by noting that the conditional expectation of the above display is of order $h^{m}$ while the unconditional expectation is zero.

Next, we deal with $\widetilde{Q}_{3,2}$ . It is straightforward that

$\displaystyle\widetilde{Q}_{3,2}$	$\displaystyle=$	$\displaystyle n^{-2}\sum_{i,j}S_{1}(\breve{U}_{i},\breve{U}_{j})S_{1}\left(V_{i},V_{j}\right)g\left(W_{i}-W_{j}\right)$
		$\displaystyle+n^{-2}\sum_{i,j}S_{2}(\breve{U}_{i},\breve{U}_{j})S_{0}\left(V_{i},V_{j}\right)g\left(W_{i}-W_{j}\right)+o_{p}(n^{-1})$
	$\displaystyle\stackrel{{\scriptstyle\mbox{{\rm\tiny def}}}}{{=}}$	$\displaystyle\widetilde{Q}_{3,2,1}+\widetilde{Q}_{3,2,2}+o_{p}(n^{-1}).$

Similar to dealing with $\widetilde{Q}_{3,1,2}$ , we can show that

\displaystyle\widetilde{Q}_{3,2,1}

\displaystyle=

\displaystyle 2n^{-5/2}\sum_{i,j}\left\{g^{\prime}(\breve{U}_{i}-\breve{U}_{j})+e^{\breve{U}_{i}-1}-e^{-\breve{U}_{i}}\right\}\ell_{i}S_{1}\left(V_{i},V_{j}\right)g\left(W_{i}-W_{j}\right)+o_{p}(n^{-1}).

Then $\widetilde{Q}_{3,2,1}$ is of order $o_{p}(n^{-1})$ because $S_{1}\left(V_{i},V_{j}\right)=o_{p}(1)$ and

\displaystyle\left\{g^{\prime}(\breve{U}_{i}-\breve{U}_{j})+e^{\breve{U}_{i}-1}-e^{-\breve{U}_{i}}\right\}\ell_{i}S_{1}\left(V_{i},V_{j}\right)g\left(W_{i}-W_{j}\right)

is also $o_{p}(1)$ with the expectation being zero. Similar as before, we can show that

$\displaystyle\widetilde{Q}_{3,2,2}$	$\displaystyle=$	$\displaystyle 2^{-1}n^{-3}\sum_{i,j}\bigg{[}\bigg{\{}g^{\prime\prime}(\breve{U}_{i}-\breve{U}_{j})(\ell_{i}-\ell_{j})^{2}+\left(e^{-\breve{U}_{i}}+e^{\breve{U}_{i}-1}\right)\ell_{i}^{2}$
		$\displaystyle\hskip 56.9055pt+\left(e^{-\breve{U}_{j}}+e^{\breve{U}_{j}-1}\right)\ell_{j}^{2}\bigg{\}}S_{0}\left(V_{i},V_{j}\right)g\left(W_{i}-W_{j}\right)\bigg{]}+o_{p}(n^{-1})$
	$\displaystyle=$	$\displaystyle 2^{-1}n^{-1}E\bigg{[}\left\{g^{\prime\prime}(\breve{U}_{1}-\breve{U}_{2})(\ell_{1}-\ell_{2})^{2}+\left(e^{-\breve{U}_{1}}+e^{\breve{U}_{1}-1}\right)\ell_{1}^{2}+\left(e^{-\breve{U}_{2}}+e^{\breve{U}_{2}-1}\right)\ell_{2}^{2}\right\}$
		$\displaystyle\hskip 56.9055pt\cdot S_{0}\left(V_{1},V_{2}\right)g\left(W_{1}-W_{2}\right)\bigg{]}+o_{p}(n^{-1})$
	$\displaystyle=$	$\displaystyle-n^{-1}E\left\{g^{\prime\prime}(\breve{U}_{1}-\breve{U}_{2})\ell_{1}\ell_{2}S_{0}\left(V_{1},V_{2}\right)g\left(W_{1}-W_{2}\right)\right\}.$

Now we show that $\widetilde{Q}_{3,3}=o_{p}(n^{-1})$ . Because $(\Delta U_{i})^{2}=o_{p}(n^{-1/2})$ , we have

$\displaystyle\widetilde{Q}_{3,3}$	$\displaystyle=$	$\displaystyle 2n^{-2}\sum_{i,j}\left\{g^{\prime}(\breve{U}_{i}-\breve{U}_{j})+e^{\breve{U}_{i}-1}-e^{-\breve{U}_{i}}\right\}\Delta U_{i}S_{2}\left(V_{i},V_{j}\right)g\left(W_{i}-W_{j}\right)$
		$\displaystyle+2^{-1}n^{-2}\sum_{i,j}\bigg{\{}g^{\prime\prime}(\breve{U}_{i}-\breve{U}_{j})(\Delta U_{i}-\Delta U_{j})^{2}+\left(e^{-\breve{U}_{i}}+e^{\breve{U}_{i}-1}\right)(\Delta U_{i})^{2}$
		$\displaystyle\hskip 71.13188pt+\left(e^{-\breve{U}_{j}}+e^{\breve{U}_{j}-1}\right)(\Delta U_{j})^{2}\bigg{\}}S_{1}\left(V_{i},V_{j}\right)g\left(W_{i}-W_{j}\right)$
		$\displaystyle+6^{-1}n^{-2}\sum_{i,j}\bigg{\{}g^{\prime\prime\prime}(\breve{U}_{i}-\breve{U}_{j})(\Delta U_{i}-\Delta U_{j})^{3}+\left(e^{\breve{U}_{i}-1}-e^{-\breve{U}_{i}}\right)(\Delta U_{i})^{3}$
		$\displaystyle\hskip 71.13188pt+\left(e^{\breve{U}_{j}-1}-e^{-\breve{U}_{j}}\right)(\Delta U_{j})^{3}\bigg{\}}S_{0}\left(V_{i},V_{j}\right)g\left(W_{i}-W_{j}\right)+o_{p}(n^{-1/2}).$

Similar to dealing with $\widetilde{Q}_{3,1,2}$ , we can obtain that $\widetilde{Q}_{3,3}=o_{p}(n^{-1})$ .

Combining these results together, we have

			$\displaystyle c_{0}^{-1}\widehat{\rho}(X,Y\mid Z)$
		$\displaystyle=$	$\displaystyle n^{-2}\sum_{i,j}S_{0}(\breve{U}_{i},\breve{U}_{j})S_{0}\left(V_{i},V_{j}\right)g\left(W_{i}-W_{j}\right)$
			$\displaystyle+2n^{-5/2}\sum_{i,j}\left\{g^{\prime}(\breve{U}_{i}-\breve{U}_{j})+e^{\breve{U}_{i}-1}-e^{-\breve{U}_{i}}\right\}\ell_{i}S_{0}\left(V_{i},V_{j}\right)g\left(W_{i}-W_{j}\right)$
			$\displaystyle-n^{-1}E\left\{g^{\prime\prime}(\breve{U}_{1}-\breve{U}_{2})\ell_{1}\ell_{2}S_{0}\left(V_{1},V_{2}\right)g\left(W_{1}-W_{2}\right)\right\}+o_{p}(n^{-1}).$

Then we can verify that $c_{0}^{-1}\widehat{\rho}(X,Y\mid Z)$ can be written as

	$\displaystyle c_{0}^{-1}\widehat{\rho}(X,Y\mid Z)$	$\displaystyle=$	$\displaystyle\iiint\bigg{\\|}n^{-1}\sum_{j=1}^{n}\bigg{[}\left\{e^{it_{1}\breve{U}_{j}}-\varphi_{\breve{U}}(t_{1})\right\}\left\{e^{it_{2}V_{j}}-\varphi_{V}(t_{2})\right\}e^{it_{3}W_{j}}$
			$\displaystyle+it_{1}n^{-1/2}\ell_{j}e^{it_{1}\breve{U}_{j}}\left\{e^{it_{2}V_{j}}-\varphi_{V}(s)\right\}e^{it_{3}W_{j}}\bigg{]}\bigg{\\|}^{2}\omega(t_{1},t_{2},t_{3})dt_{1}dt_{2}dt_{3}+o_{p}(n^{-1}).$

It is clear that the empirical process

\displaystyle n^{-1/2}\sum_{j=1}^{n}\bigg{[}\left\{e^{it_{1}\breve{U}_{j}}-\varphi_{\breve{U}}(t_{1})\right\}\left\{e^{it_{2}V_{j}}-\varphi_{V}(t_{2})\right\}e^{it_{3}W_{j}}+it_{1}\ell_{j}e^{it_{1}\breve{U}_{j}}\left\{e^{it_{2}V_{j}}-\varphi_{V}(t_{2})\right\}e^{it_{3}W_{j}}\bigg{]}

converges in distribution to a complex valued gaussian process $\zeta(t_{1},t_{2},t_{3})$ with mean function

\displaystyle E\left[it_{1}\ell(X,Y,Z)e^{it_{1}\breve{U}}\left\{e^{it_{2}V}-\varphi_{V}(t_{2})\right\}e^{it_{3}W}\right],

and covariance function $\mbox{\rm cov}\{\zeta(t_{1},t_{2},t_{3}),\overline{\zeta(t_{10},t_{20},t_{30})}\}$ given by

\displaystyle\left\{\varphi_{U}(t_{1}-t_{10})-\varphi_{U}(t_{1})\varphi_{U}(-t_{10})\right\}\left\{\varphi_{V}(t_{2}-t_{20})-\varphi_{V}(t_{2})\varphi_{V}(-t_{20})\right\}\varphi_{W}(t_{3}-t_{30}).

(11)

Therefore, by employing empirical process technology, we can derive that

\displaystyle c_{0}^{-1}n\widehat{\rho}(X,Y\mid Z)\overset{d}{\longrightarrow}\iiint\left\|\zeta(t_{1},t_{2},t_{3})\right\|^{2}\omega(t_{1},t_{2},t_{3})dt_{1}dt_{2}dt_{3}.

Hence we conclude the proof for local alternatives.

6.6 Proof of Theorem 5

Firstly, ${\bf x}\bot\!\!\!\bot{\bf y}\mid{\bf z}$ is equivalent to $(X_{1},\ldots,X_{p})\bot\!\!\!\bot{\bf y}\mid{\bf z}$ . According to Proposition 4.6 of Cook (2009), it is also equivalent to

\displaystyle X_{1}\bot\!\!\!\bot{\bf y}\mid{\bf z},\quad X_{2}\bot\!\!\!\bot{\bf y}\mid({\bf z},X_{1}),\quad\ldots,\quad X_{p}\bot\!\!\!\bot{\bf y}\mid({\bf z},X_{1},\ldots,X_{p-1}).

Following similar arguments for proving the equivalence between $X\bot\!\!\!\bot Y\mid Z$ and $U\bot\!\!\!\bot V\mid Z$ in the proof of Proposition 1, the above conditional independence series are equivalent to

\displaystyle\widetilde{U}_{1}\bot\!\!\!\bot{\bf y}\mid{\bf z},\quad\widetilde{U}_{2}\bot\!\!\!\bot{\bf y}\mid({\bf z},X_{1}),\quad\ldots,\quad\widetilde{U}_{p}\bot\!\!\!\bot{\bf y}\mid({\bf z},X_{1},\ldots,X_{p-1}).

According to the proof of Proposition 1, we know that $\widetilde{U}_{k}\bot\!\!\!\bot{\bf y}\mid({\bf z},X_{1},\ldots,X_{k-1})$ is equivalent to $\widetilde{U}_{k}\bot\!\!\!\bot{\bf y}\mid({\bf z},\widetilde{U}_{1},\ldots,\widetilde{U}_{k-1})$ for $k=1,\ldots,p-1$ . Hence the conditional independence series hold if and only if

\displaystyle\widetilde{U}_{1}\bot\!\!\!\bot{\bf y}\mid{\bf z},\quad\widetilde{U}_{2}\bot\!\!\!\bot{\bf y}\mid({\bf z},\widetilde{U}_{1}),\quad\ldots,\quad\widetilde{U}_{p}\bot\!\!\!\bot{\bf y}\mid({\bf z},\widetilde{U}_{1},\ldots,\widetilde{U}_{p-1}).

Then by applying Proposition 4.6 of Cook (2009) again, we know that $(X_{1},\ldots,X_{p})\bot\!\!\!\bot{\bf y}\mid{\bf z}$ is equivalent to ${\bf u}\bot\!\!\!\bot{\bf y}\mid{\bf z}$ . Furthermore, with the same arguments for dealing with ${\bf y}$ , we can obtain that it is additionally equivalent to ${\bf u}\bot\!\!\!\bot{\bf v}\mid{\bf z}$ . Besides, with the fact that ${\bf u}\bot\!\!\!\bot{\bf z}$ and ${\bf v}\bot\!\!\!\bot{\bf z}$ , we can get the conditional independence ${\bf x}\bot\!\!\!\bot{\bf y}\mid{\bf z}$ is equivalent to the mutual independence of ${\bf u}$ , ${\bf v}$ and ${\bf z}$ . Therefore, the proof is completed by following similar arguments with the proof of Proposition 1.

6.7 Proof of Theorem 6

Following the proof of Theorem 2, we denote by $\widetilde{g}({\bf u}_{1},{\bf u}_{2})=e^{-\|{\bf u}_{1}-{\bf u}_{2}\|_{1}}$ , $\widetilde{g}({\bf v}_{1},{\bf v}_{2})=e^{-\|{\bf v}_{1}-{\bf v}_{2}\|_{1}}$ and $\widetilde{g}({\bf w}_{1},{\bf w}_{2})=e^{-\|{\bf w}_{1}-{\bf w}_{2}\|_{1}}$ . Then we have

	$\displaystyle S_{{\bf u}}({\bf u}_{1},{\bf u}_{2})$	$\displaystyle=$	$\displaystyle E\left\{\widetilde{g}({\bf u}_{1},{\bf u}_{2})+\widetilde{g}({\bf u}_{3},{\bf u}_{4})-\widetilde{g}({\bf u}_{1},{\bf u}_{3})-\widetilde{g}({\bf u}_{2},{\bf u}_{3})\mid({\bf u}_{1},{\bf u}_{2})\right\},$
	$\displaystyle S_{{\bf v}}({\bf v}_{1},{\bf v}_{2})$	$\displaystyle=$	$\displaystyle E\left\{\widetilde{g}({\bf v}_{1},{\bf v}_{2})+\widetilde{g}({\bf v}_{3},{\bf v}_{4})-\widetilde{g}({\bf v}_{1},{\bf v}_{3})-\widetilde{g}({\bf v}_{2},{\bf v}_{3})\mid({\bf v}_{1},{\bf v}_{2})\right\}.$

Therefore, $\widehat{\rho}({\bf x},{\bf y}\mid{\bf z})$ can be written as

\displaystyle\widehat{\rho}({\bf x},{\bf y}\mid{\bf z})=n^{-2}\sum_{i,j}\left\{S_{{\bf u}}(\widehat{\bf u}_{i},\widehat{\bf u}_{j})S_{{\bf v}}(\widehat{\bf v}_{i},\widehat{\bf v}_{j})\widetilde{g}(\widehat{\bf w}_{i},\widehat{\bf w}_{j})\right\}.

With Taylor’s expansion, when $nh^{4m}\to 0$ , $nh^{2(r+p-1)}/\log^{2}(n)\to\infty$ , under conditions $2^{\prime}$ and $3^{\prime}$ , we have

\displaystyle\widetilde{g}(\widehat{\bf u}_{1},\widehat{\bf u}_{2})=\widetilde{g}({\bf u}_{1},{\bf u}_{2})+\sum_{k=1}^{3}(\Delta{\bf u}_{1}^{\mbox{\tiny{T}}},\Delta{\bf u}_{2}^{\mbox{\tiny{T}}})^{\otimes k}D^{\otimes k}\widetilde{g}({\bf u}_{1},{\bf u}_{2})+o_{p}(n^{-1}),

where $\mathbf{A}^{\otimes k}$ denotes the $k$ -th Kronecker power of the matrix $\mathbf{A}$ , $\Delta{\bf u}_{i}=\widehat{\bf u}_{i}-{\bf u}_{i}$ and

\displaystyle D^{\otimes k}\widetilde{g}({\bf u}_{1},{\bf u}_{2})=\frac{\partial^{k}\widetilde{g}({\bf u}_{1},{\bf u}_{2})}{\{\partial({\bf u}_{1}^{\mbox{\tiny{T}}},{\bf u}_{2}^{\mbox{\tiny{T}}})^{\mbox{\tiny{T}}}\}^{\otimes k}}.

In addition, we can expand $\widetilde{g}(\widehat{\bf u}_{1},{\bf u}_{2})$ as

\displaystyle\widetilde{g}(\widehat{\bf u}_{1},{\bf u}_{2})=\widetilde{g}({\bf u}_{1},{\bf u}_{2})+\sum_{k=1}^{3}(k!)^{-1}(\Delta{\bf u}_{1}^{\mbox{\tiny{T}}},{\bf 0}^{\mbox{\tiny{T}}})^{\otimes k}D^{\otimes k}\widetilde{g}({\bf u}_{1},{\bf u}_{2})+o_{p}(n^{-1}).

Therefore, by the definition of $S_{{\bf u}}({\bf u}_{1},{\bf u}_{2})$ , we have

			$\displaystyle S_{{\bf u}}(\widehat{\bf u}_{i},\widehat{\bf u}_{j})$
		$\displaystyle=$	$\displaystyle E\left\{\widetilde{g}({\bf u}_{i},{\bf u}_{j})+\widetilde{g}({\bf u},{\bf u}^{\prime})-\widetilde{g}({\bf u}_{i},{\bf u})-\widetilde{g}({\bf u},{\bf u}_{j})\mid({\bf u}_{i},{\bf u}_{j})\right\}$
		$\displaystyle+$	$\displaystyle\sum_{k=1}^{3}(k!)^{-1}E\Big{\{}(\Delta{\bf u}_{i}^{\mbox{\tiny{T}}},\Delta{\bf u}_{j}^{\mbox{\tiny{T}}})^{\otimes k}D^{\otimes k}\widetilde{g}({\bf u}_{i},{\bf u}_{j})-(\Delta{\bf u}_{i}^{\mbox{\tiny{T}}},{\bf 0}^{\mbox{\tiny{T}}})^{\otimes k}D^{\otimes k}\widetilde{g}({\bf u}_{i},{\bf u})$
			$\displaystyle\hskip 28.45274pt-({\bf 0}^{\mbox{\tiny{T}}},\Delta{\bf u}_{j}^{\mbox{\tiny{T}}})^{\otimes k}D^{\otimes k}\widetilde{g}({\bf u},{\bf u}_{j})\mid{\bf u}_{i},{\bf u}_{j}\Big{\}}+o_{p}(n^{-1}),$
		$\displaystyle\stackrel{{\scriptstyle\mbox{{\rm\tiny def}}}}{{=}}$	$\displaystyle\sum_{k=0}^{3}\widetilde{S}_{k}({\bf u}_{i},{\bf u}_{j})+o_{p}(n^{-1}),$

where $\widetilde{S}_{0}({\bf u}_{i},{\bf u}_{j})=S_{{\bf u}}({\bf u}_{i},{\bf u}_{j})$ and $\widetilde{S}_{k}({\bf u}_{i},{\bf u}_{j}),k=1,2,3$ are defined obviously. Similarly, when $nh^{2(r+q-1)}/\log^{2}(n)\to\infty$ , and $nh^{4m}\to 0$ , we can expand $S_{{\bf v}}(\widehat{\bf v}_{i},\widehat{\bf v}_{j})$ as

\displaystyle S_{{\bf v}}(\widehat{\bf v}_{i},\widehat{\bf v}_{j})

\displaystyle\stackrel{{\scriptstyle\mbox{{\rm\tiny def}}}}{{=}}

\displaystyle\sum_{k=0}^{3}\widetilde{S}_{k}({\bf v}_{i},{\bf v}_{j})+o_{p}(n^{-1}),

and it follows that

$\displaystyle\widehat{\rho}({\bf x},{\bf y}\mid{\bf z})$	$\displaystyle=$	$\displaystyle n^{-2}\sum_{i,j}\sum_{0\leq k+l\leq 3}\widetilde{S}_{k}({\bf u}_{i},{\bf u}_{j})\widetilde{S}_{l}({\bf v}_{i},{\bf v}_{j})\widetilde{g}({\bf w}_{i},{\bf w}_{j})$
		$\displaystyle+n^{-2}\sum_{i,j}\sum_{0\leq k+l\leq 2}\widetilde{S}_{k}({\bf u}_{i},{\bf u}_{j})\widetilde{S}_{l}({\bf v}_{i},{\bf v}_{j})(\Delta{\bf w}_{i}^{\mbox{\tiny{T}}},\Delta{\bf w}_{j}^{\mbox{\tiny{T}}})D\widetilde{g}({\bf w}_{i},{\bf w}_{j})$
		$\displaystyle+2^{-1}n^{-2}\sum_{i,j}\sum_{0\leq k+l\leq 1}\widetilde{S}_{k}({\bf u}_{i},{\bf u}_{j})\widetilde{S}_{l}({\bf v}_{i},{\bf v}_{j})(\Delta{\bf w}_{i}^{\mbox{\tiny{T}}},\Delta{\bf w}_{j}^{\mbox{\tiny{T}}})^{\otimes 2}D^{\otimes 2}\widetilde{g}({\bf w}_{i},{\bf w}_{j})$
		$\displaystyle+6^{-1}n^{-2}\sum_{i,j}\widetilde{S}_{0}({\bf u}_{i},{\bf u}_{j})\widetilde{S}_{0}({\bf v}_{i},{\bf v}_{j})(\Delta{\bf w}_{i}^{\mbox{\tiny{T}}},\Delta{\bf w}_{j}^{\mbox{\tiny{T}}})^{\otimes 3}D^{\otimes 3}\widetilde{g}({\bf w}_{i},{\bf w}_{j})+o_{p}(n^{-1})$
	$\displaystyle\stackrel{{\scriptstyle\mbox{{\rm\tiny def}}}}{{=}}$	$\displaystyle Q_{1}^{\prime}+Q_{2}^{\prime}+Q_{3}^{\prime}+Q_{4}^{\prime}+o_{p}(n^{-1}).$

Then following similar arguments in the proof of Theorem 2, we have $Q_{2}^{\prime}$ , $Q_{3}^{\prime}$ and $Q_{4}^{\prime}$ are all of order $o_{p}(n^{-1})$ and $Q_{1}^{\prime}$ equals $n^{-2}\sum_{i,j}S_{\bf u}({\bf u}_{i},{\bf u}_{j})S_{\bf v}({\bf v}_{i},{\bf v}_{j})\widetilde{g}({\bf w}_{i},{\bf w}_{j})+o_{p}(n^{-1})$ . Combing these results, we have

\displaystyle\widehat{\rho}({\bf x},{\bf y}\mid{\bf z})=n^{-2}\sum_{i,j}S_{\bf u}({\bf u}_{i},{\bf u}_{j})S_{\bf v}({\bf v}_{i},{\bf v}_{j})\widetilde{g}({\bf w}_{i},{\bf w}_{j})+o_{p}(n^{-1}),

where the right hand side is a first order degenerate V statistics. Thus by applying Theorem 6.4.1.B of Serfling (2009),

\displaystyle n\widehat{\rho}({\bf x},{\bf y}\mid{\bf z})\xrightarrow{d}\sum_{j=1}^{\infty}\lambda_{j}\chi_{j}^{2}(1)

where $\chi_{j}^{2}(1)$ , $j=1,2,\dots$ are independent $\chi^{2}(1)$ random variables, and $\lambda_{j}$ , $j=1,2,\dots$ are the corresponding eigenvalues of $\widetilde{h}({\bf u},{\bf v},{\bf w};{\bf u}^{\prime},{\bf v}^{\prime},{\bf w}^{\prime})$ . Therefore, the proof is completed.

6.8 Proof of Theorem 7

It suffices to show that $X\bot\!\!\!\bot Y\mid Z$ if and only if $U\bot\!\!\!\bot V\mid Z$ because $U\bot\!\!\!\bot V\mid Z$ is equivalent to $U,V$ and $Z$ are mutually independent under $U\bot\!\!\!\bot Z$ and $V\bot\!\!\!\bot Z$ .

We only show that $X\bot\!\!\!\bot Y\mid Z$ if and only if $U\bot\!\!\!\bot Y\mid Z$ , because similar arguments will yield that it is also equivalent to $U\bot\!\!\!\bot V\mid Z$ . It is quite straightforward that $X\bot\!\!\!\bot Y\mid Z$ implies $U\bot\!\!\!\bot Y\mid Z$ . While when $U\bot\!\!\!\bot Y\mid Z$ , we have for each $Z=z$ , $u$ and $y$ in the corresponding support,

\displaystyle\mbox{Pr}(U\leq u,Y\leq y\mid Z=z)=uF_{Y\mid Z}(y\mid z).

Substituting $U=(1-U_{X})F_{X\mid Z}(X-\mid Z)+U_{X}F_{X\mid Z}(X\mid Z)$ into the above equation, with some straight calculation, the left hand side is

		$\displaystyle\mbox{Pr}\left\{\mbox{Pr}(\widetilde{X}<X\mid X,\widetilde{Z}=z)+\mbox{Pr}(\widetilde{X}=X\mid X,\widetilde{Z}=z)U_{X}\leq u,Y\leq y\mid Z=z\right\}$
	$\displaystyle=$	$\displaystyle\mbox{Pr}\left\{\mbox{Pr}(\widetilde{X}<X\mid X,\widetilde{Z}=z)+\mbox{Pr}(\widetilde{X}=X\mid X,\widetilde{Z}=z)U_{X}\leq u,F_{Y\mid X,Z}(y\mid X,z)\mid Z=z\right\}.$

Because $U_{X}$ is standard uniformly distributed, we obtain

\displaystyle E\left[g\left\{\frac{u-\mbox{Pr}(\widetilde{X}<X\mid X,\widetilde{Z}=Z=z)}{\mbox{Pr}(\widetilde{X}=X\mid X,\widetilde{Z}=Z=z)}\right\}F_{Y\mid X,Z}(y\mid X,z)\right]=uF_{Y\mid Z}(y\mid z),

where $g(\cdot)$ is the cumulative distribution function of a standard uniformly distributed random variable. Now assume that conditional on $Z=z$ , the support of $X$ is $\{x_{1},\ldots,x_{N}\}$ , where $x_{1}<\ldots<x_{N}$ . Therefore, when $0<u<F_{X\mid Z}(x_{1}\mid z)$ , the expectation in the above equation is

			$\displaystyle\sum_{i=1}^{N}g\left\{\frac{u-\mbox{Pr}\left(X<x_{i}\mid Z=z\right)}{\mbox{Pr}\left(X=x_{i}\mid Z=z\right)}\right\}F_{Y\mid X,Z}(y\mid x_{i},z)\mbox{Pr}\left(X=x_{i}\mid Z=z\right)$
		$\displaystyle=$	$\displaystyle\frac{u-\mbox{Pr}\left(X<x_{1}\mid Z=z\right)}{\mbox{Pr}\left(X=x_{1}\mid Z=z\right)}\mbox{Pr}\left(X=x_{1}\mid Z=z\right)F_{Y\mid X,Z}(y\mid x_{1},z)$
		$\displaystyle=$	$\displaystyle uF_{Y\mid X,Z}(y\mid x_{1},z).$

The expectation equals $uF_{Y\mid Z}(y\mid z)$ . That is, $F_{Y\mid X,Z}(y\mid x_{1},z)=F_{Y\mid Z}(y\mid z)$ .

When $F_{X\mid Z}(x_{1}\mid z)<u<F_{X\mid Z}(x_{2}\mid z)$ , we can calculate the expectation as

			$\displaystyle\sum_{i=1}^{N}g\left\{\frac{u-\mbox{Pr}\left(X<x_{i}\mid Z=z\right)}{\mbox{Pr}\left(X=x_{i}\mid Z=z\right)}\right\}F_{Y\mid X,Z}(y\mid x_{i},z)\mbox{Pr}\left(X=x_{i}\mid Z=z\right)$
		$\displaystyle=$	$\displaystyle F_{Y\mid X,Z}(y\mid x_{1},z)\mbox{Pr}\left(X=x_{1}\mid Z=z\right)+\left\{u-\mbox{Pr}\left(X<x_{2}\mid Z=z\right)\right\}F_{Y\mid X,Z}(y\mid x_{2},z).$

Since we have shown that $F_{Y\mid X,Z}(y\mid x_{1},z)=F_{Y\mid Z}(y\mid z)$ , with the fact that the expectation equals $uF_{Y\mid Z}(y\mid z)$ , we can get $F_{Y\mid X,Z}(y\mid x_{2},z)=F_{Y\mid Z}(y\mid z)$ .

Similarly, we can obtain that $F_{Y\mid X,Z}(y\mid x_{k},z)=F_{Y\mid Z}(y\mid z)$ , $k=3,\ldots,N$ . Consequently, we have $F_{Y\mid X,Z}(y\mid x,z)=F_{Y\mid Z}(y\mid z)$ for all $x,y$ and $z$ in their support. That is, $X\bot\!\!\!\bot Y\mid Z$ . Therefore, the proof is completed.

7 Additional Simulations Results

We consider the directed acyclic graph with 5 nodes, i.e., $X=(X_{1},\dots,X_{5})$ , and only allow directed edge from $X_{i}$ and $X_{j}$ for $i<j$ . Denote the adjacency matrix $A$ . The existence of the edge follows a Bernoulli distribution, and we set $\mbox{Pr}(A_{i,j}=1)=0.4$ , for $i<j$ . When $A_{i,j}=1$ , we replace $A_{i,j}$ with independent realizations of a uniform $U(0.1,1)$ random variable. The value of the first random variable $X_{1}$ is randomly sampled from some distribution $\widetilde{P}$ . Specifically,

\displaystyle\epsilon_{1}\sim\widetilde{P},\mbox{ and }X_{1}=\epsilon_{1}.

The value of the next nodes is

\displaystyle\epsilon_{j}\sim\widetilde{P},\mbox{ and }X_{j}=\sum_{k=1}^{j-1}A_{j,k}X_{k}+\epsilon_{j}

for $j=1,\cdots,p$ . All random errors $\epsilon_{1},\dots,\epsilon_{p}$ are independently sampled from the distribution $\widetilde{P}$ . We consider two scenarios, where $\widetilde{P}$ follows either normal distribution $N(0,1)$ or uniform distribution $U(0,1)$ . We compare our proposed conditional independence test (denoted by “CIT”) with other popular conditional dependence measure. They are, respectively, the partial correlation conditional mutual information (Scutari, 2010, denoted by “CMI”), and the KCI.test (Zhang et al., 2011, denoted by “KCI”). We set the sample size $n=50$ , 100, 200 and $300$ . The true positive rate and false positive rate for the four different tests are reported in Tables 5, from which we can see that as the sample size increases, the true positive rate of the proposed method steadily grows, and the proposed method outperforms the other tests, while the false positive rate remains under control with slightly decrease.

Table 5: The true positive rate and false positive rate for the causal discovery of the directed acyclic graph with different tests

Samples	50	100	200	300	50	100	200	300
Tests	$\widetilde{P}\sim N(0,1)$
	true positive rate				false positive rate
CIT	0.555	0.658	0.734	0.789	0.117	0.112	0.107	0.103
PCR	0.479	0.489	0.546	0.589	0.101	0.110	0.130	0.135
CMI	0.472	0.530	0.604	0.590	0.097	0.127	0.143	0.150
KCI	0.360	0.516	0.592	0.634	0.072	0.135	0.144	0.168
Tests	$\widetilde{P}\sim U(0,1)$
	true positive rate				false positive rate
CIT	0.468	0.587	0.734	0.736	0.070	0.099	0.095	0.113
PCR	0.469	0.526	0.588	0.545	0.111	0.129	0.140	0.140
CMI	0.497	0.568	0.566	0.633	0.103	0.127	0.138	0.149
KCI	0.386	0.458	0.523	0.564	0.082	0.099	0.123	0.122

References

Brockwell (2007) Brockwell, A. (2007). “Universal residuals: A multivariate transformation.” Statistics & Probability Letters, 77(14), 1473–1478.
Chakraborty and Zhang (2019) Chakraborty, S. and Zhang, X. (2019). “Distance metrics for measuring joint dependence with application to causal inference.” Journal of the American Statistical Association, 114(528), 1638–1650.
Cook (2009) Cook, R.D. (2009). Regression graphics: ideas for studying regressions through graphics, volume 482. John Wiley & Sons.
Drton et al. (2020) Drton, M., Han, F., and Shi, H. (2020). “High dimensional independence testing with maxima of rank correlations.” The Annals of Statistics, To Appear.
Huang (2010) Huang, T.M. (2010). “Testing conditional independence using maximal nonlinear conditional correlation.” The Annals of Statistics, 38(4), 2047–2091.
Jordan (1998) Jordan, M.I. (1998). Learning in Graphical Models, volume 89. Springer Science & Business Media.
Kalisch and Bühlmann (2007) Kalisch, M. and Bühlmann, P. (2007). “Estimating high-dimensional directed acyclic graphs with the pc-algorithm.” Journal of Machine Learning Research, 8, 613–636.
Kalisch et al. (2012) Kalisch, M., Mächler, M., Colombo, D., Maathuis, M.H., and Bühlmann, P. (2012). “Causal inference using graphical models with the r package pcalg.” Journal of Statistical Software, 47(11), 1–26.
Korolyuk and Borovskich (2013) Korolyuk, V.S. and Borovskich, Y.V. (2013). Theory of U-statistics, volume 273. Springer Science & Business Media.
Lawrance (1976) Lawrance, A. (1976). “On conditional and partial correlation.” The American Statistician, 30(3), 146–149.
Linton and Gozalo (1996) Linton, O. and Gozalo, P. (1996). “Conditional independence restrictions: Testing and estimation.” Cowels Foundation Discussion Paper.
Neykov et al. (2020) Neykov, M., Balakrishnan, S., and Wasserman, L. (2020). “Minimax optimal conditional independence testing.” arXiv preprint arXiv:2001.03039.
Patra et al. (2016) Patra, R.K., Sen, B., and Székely, G.J. (2016). “On a nonparametric notion of residual and its applications.” Statistics & Probability Letters, 109, 208–213.
Rosenblatt (1952) Rosenblatt, M. (1952). “Remarks on a multivariate transformation.” The Annals of Mathematical Statistics, 23(3), 470–472.
Runge (2018) Runge, J. (2018). “Conditional independence testing based on a nearest-neighbor estimator of conditional mutual information.” In “Proceeding of International Conference on Artificial Intelligence and Statistics,” pages 938–947.
Scutari (2010) Scutari, M. (2010). “Learning bayesian networks with the bnlearn r package.” Journal of Statistical Software, Articles, 35(3), 1–22. ISSN 1548-7660. doi:10.18637/jss.v035.i03.
Serfling (2009) Serfling, R.J. (2009). Approximation Theorems of Mathematical Statistics. John Wiley & Sons.
Shah and Peters (2020) Shah, R.D. and Peters, J. (2020). “The hardness of conditional independence testing and the generalised covariance measure.” Annals of Statistics, 48(3), 1514–1538.
Smith et al. (1988) Smith, J.W., Everhart, J., Dickson, W., Knowler, W., and Johannes, R. (1988). “Using the adap learning algorithm to forecast the onset of diabetes mellitus.” In “Proceedings of the Annual Symposium on Computer Application in Medical Care,” page 261. American Medical Informatics Association.
Song (2009) Song, K. (2009). “Testing conditional independence via rosenblatt transforms.” The Annals of Statistics, 37(6B), 4011–4045.
Spirtes et al. (2000) Spirtes, P., Glymour, C.N., and Scheines, R. (2000). Causation, prediction, and search. MIT press.
Su and White (2007) Su, L. and White, H. (2007). “A consistent characteristic function-based test for conditional independence.” Journal of Econometrics, 141(2), 807–834.
Su and White (2008) Su, L. and White, H. (2008). “A nonparametric hellinger metric test for conditional independence.” Econometric Theory, 24(4), 829–864.
Su and White (2014) Su, L. and White, H. (2014). “Testing conditional independence via empirical likelihood.” Journal of Econometrics, 182(1), 27–44.
Székely et al. (2007) Székely, G.J., Rizzo, M.L., and Bakirov, N.K. (2007). “Measuring and testing dependence by correlation of distances.” The Annals of Statistics, 35(6), 2769–2794.
Wang et al. (2015) Wang, X., Pan, W., Hu, W., Tian, Y., and Zhang, H. (2015). “Conditional distance correlation.” Journal of the American Statistical Association, 110(512), 1726–1734.
Wasserman (2013) Wasserman, L. (2013). All of Statistics: A Concise Course in Statistical Inference. Springer Science & Business Media.
Zhang et al. (2011) Zhang, K., Peters, J., Janzing, D., and Schölkopf, B. (2011). “Kernel-based conditional independence test and application in causal discovery.” In “Proceedings of the Twenty-Seventh Conference on Uncertainty in Artificial Intelligence,” UAI’11, pages 804–813. AUAI Press.
Zhou et al. (2020) Zhou, Y., Liu, J., and Zhu, L. (2020). “Test for conditional independence with application to conditional screening.” Journal of Multivariate Analysis, 175, 104557.