
Multi-scale tests of independence powerful for detecting explicit or implicit functional relationships

Angshuman Roy (Department of Mathematics and Statistics, Indian Institute of Technology, Tirupati)
Sagnik Das (Indian Institute of Science Education and Research, Kolkata)
Abstract

In this article, we consider the problem of testing the independence of two random variables. Our primary objective is to develop tests that are highly effective at detecting associations arising from an explicit or implicit functional relationship between the two variables. We adopt a multi-scale approach by analyzing neighborhoods of varying sizes within the dataset and aggregating the results. We introduce a general testing framework designed to enhance the power of existing independence tests toward this objective. Additionally, we propose a novel test method that is both powerful and computationally efficient. The performance of these tests is compared with that of existing methods on various simulated datasets.

1 Introduction

Tests of independence play a vital role in statistical analysis. They are used to determine relationships between variables, validate models, select relevant features from a large pool, and establish causal directions, among other applications. These tests are particularly important in fields such as economics, biology, social sciences, and clinical trials.

The mathematical formulation for testing independence is as follows. Consider two random variables $X$ and $Y$ with distribution functions $F_X$ and $F_Y$, respectively, and let $F$ denote their joint distribution function. The objective is to test the null hypothesis $H_0$ against the alternative hypothesis $H_1$ based on $n$ independent and identically distributed (i.i.d.) observations $(x_1,y_1),(x_2,y_2),\ldots,(x_n,y_n)$ drawn from $F$, where

$$\begin{cases}H_0: F=F_X F_Y\\ H_1: F\neq F_X F_Y.\end{cases} \qquad (1)$$

A substantial amount of research has been conducted on this topic, and it remains an active area of investigation. Pearson's correlation is perhaps the most well-known classical measure that quantifies linear dependence. Spearman's rank correlation (Spearman, 1904) and Kendall's concordance-discordance statistic (Kendall, 1938) are used to measure monotonic associations. Hoeffding (1948) proposed a measure using the $L_2$ distance between the joint distribution function and the product of the marginals. Székely et al. (2007) introduced distance correlation (dCor), based on energy distance, while Gretton et al. (2007) leveraged kernel methods to develop the Hilbert-Schmidt independence criterion (HSIC). Reshef et al. (2011) proposed the maximal information coefficient (MIC), an information-theoretic measure that reaches its maximum when one variable is a function of the other. Heller et al. (2013) introduced a powerful test that breaks the problem down into multiple $2\times 2$ contingency table independence tests and aggregates the results. Zhang (2019) proposed a test that is uniformly consistent with respect to the total variation distance. Roy et al. (2020) and Roy (2020) used copula and checkerboard copula approaches to propose monotone-transformation-invariant tests of dependence for two or more variables. Chatterjee (2021) introduced an asymmetric measure of dependence with a simple form and an asymptotic normal distribution under independence, which attains its highest value only when a functional relationship exists. These are just a few of the many contributions in this field.

Different tests of independence have their own strengths and limitations. As demonstrated in Theorem 2.2 of Zhang (2019), no test of independence can achieve uniform consistency; that is, at any fixed sample size, no test can maintain power above the nominal level uniformly over all alternatives. Therefore, selecting the most appropriate test for a given scenario is crucial. For instance, Pearson's correlation coefficient is particularly powerful for detecting dependence between jointly normally distributed random variables and, more generally, for identifying linear relationships, while Spearman's rank correlation is particularly suited to detecting monotonic relationships.

In this article, we aim to develop tests of independence (as described in Equation 1) that are particularly powerful for detecting association between two random variables, $X$ and $Y$, that adhere to the following parametric equation:

$$\begin{cases}X=f(Z)+\epsilon_X\\ Y=g(Z)+\epsilon_Y,\end{cases} \qquad (2)$$

where $f$ and $g$ are continuous functions that are not constant on any interval, $Z$ is a continuous random variable defined on an interval, and $\epsilon_X$ and $\epsilon_Y$ are independent noise components, each independent of $Z$. As a result, our tests will also be powerful for detecting association between random variables $X$ and $Y$ that are functionally related to each other, i.e., when $Y=f(X)+\epsilon_Y$ or $X=g(Y)+\epsilon_X$. In contrast to the method proposed by Chatterjee (2021), which is powerful for detecting dependence when $Y$ is a function of $X$, our approach treats $X$ and $Y$ symmetrically.

The article is organized as follows. In Section 2, we introduce a general testing framework aimed at enhancing the power of existing tests of independence in scenarios described by Equation (2) and examine its computational complexity. In Section 3, we apply this framework to develop a novel test of independence and analyze its computational complexity. Section 4 presents a performance comparison of our methods on various simulated datasets. Finally, we conclude with a discussion and summary of our findings.

2 A General Testing Framework

Let us begin this section with a few examples of continuous dependent random variables $X$ and $Y$ that satisfy the parametric equation described in Equation 2. We consider three examples, referred to as Examples A, B, and C.

  • Example A: $X=\Theta+\epsilon_X$ and $Y=\sin(\Theta)+\epsilon_Y$, where $\Theta$, $\epsilon_X$, and $\epsilon_Y$ are mutually independent, with $\Theta\sim U(0,2\pi)$ and $\epsilon_X,\epsilon_Y\sim N(0,(1/20)^2)$.

  • Example B: $X=\cos(\Theta)+\epsilon_X$ and $Y=\sin(\Theta)+\epsilon_Y$.

  • Example C: $Y=U+\epsilon_Y$ and $X=U^2+\epsilon_X$, where $U\sim U(0,1)$ and $U$ is independent of $\epsilon_X$ and $\epsilon_Y$.
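For concreteness, the following minimal Python sketch (function names are ours; the noise scale in Example C is not specified above and is assumed here to match Example A) draws samples from Examples A, B, and C. Each pair satisfies Equation (2), with $Z=\Theta$ in Examples A and B and $Z=U$ in Example C.

```python
import numpy as np

rng = np.random.default_rng(0)

def example_A(n):
    theta = rng.uniform(0.0, 2.0 * np.pi, n)
    x = theta + rng.normal(0.0, 1 / 20, n)           # X = Theta + eps_X
    y = np.sin(theta) + rng.normal(0.0, 1 / 20, n)   # Y = sin(Theta) + eps_Y
    return x, y

def example_B(n):
    theta = rng.uniform(0.0, 2.0 * np.pi, n)
    x = np.cos(theta) + rng.normal(0.0, 1 / 20, n)   # X = cos(Theta) + eps_X
    y = np.sin(theta) + rng.normal(0.0, 1 / 20, n)   # Y = sin(Theta) + eps_Y
    return x, y

def example_C(n):
    u = rng.uniform(0.0, 1.0, n)
    x = u ** 2 + rng.normal(0.0, 1 / 20, n)  # X = U^2 + eps_X (noise scale assumed)
    y = u + rng.normal(0.0, 1 / 20, n)       # Y = U + eps_Y (noise scale assumed)
    return x, y
```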

The scatter plots for these examples are shown in Figure 1. A common feature of all these examples is that within certain neighborhoods around support points, the conditional correlation between $X$ and $Y$ is non-zero. Specifically, there exist points $(x_0,y_0)$ and $(x',y')$ in the support of the distribution such that $\text{Cor}(X,Y\mid (X,Y)\in N(x_0,y_0;|x'-x_0|,|y'-y_0|))$ is non-zero, where $N(x,y;\epsilon_1,\epsilon_2)$ denotes the $(\epsilon_1,\epsilon_2)$-neighborhood of $(x,y)$, defined as $[x-\epsilon_1,x+\epsilon_1]\times[y-\epsilon_2,y+\epsilon_2]$. In Figure 1, red rectangles highlight instances of such neighborhoods with non-zero conditional correlation.

Figure 1: Neighborhoods having non-zero conditional correlation in Examples A, B, and C.

As the next example, we consider the $BEX_d$ distribution, originally introduced in Zhang (2019), where it is defined as the uniform distribution over a set of parallel and intersecting line segments given by $\sum_{i=1}^{2^{d-1}}\sum_{j=1}^{2^{d-1}}(|x-c_i|-|y-c_j|)\,I[|x-c_i|\leq 2^{-d},|y-c_j|\leq 2^{-d}]=0$, where $c_i=(2i-1)/2^d$. Figure 2 illustrates the $BEX_d$ distribution for $d=2,3$, and $4$. If $(X,Y)$ follows the $BEX_d$ distribution, the marginals $X$ and $Y$ are dependent, but each follows a continuous uniform distribution over $[0,1]$. Moreover, $X$ and $Y$ can be shown to satisfy Equation 2. Although $\text{Cor}(X,Y)=0$ for the $BEX_d$ distribution, there exist neighborhoods around support points where the conditional correlation is extremely high.

Figure 2: Existence of neighborhoods having high conditional correlation values in $BEX_d$ distributions.

Red rectangles highlight some such neighborhoods with high conditional correlation values in the $BEX_1$, $BEX_2$, and $BEX_3$ distributions in Figure 2.
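Under the reading above, $BEX_d$ places $2^{d-1}\times 2^{d-1}$ small crosses of half-width $2^{-d}$ centered at the grid points $(c_i,c_j)$. Since all segments have equal length, a uniform draw can be generated by picking a cross, one of its two diagonals, and a uniform position along it. Below is a hedged Python sketch of such a sampler (our own construction for illustration, not code from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_bex(n, d):
    """Draw n points from BEX_d, read as the uniform distribution on the
    diagonal segments |x - c_i| = |y - c_j| <= 2^(-d), c_i = (2i - 1)/2^d."""
    m = 2 ** (d - 1)                                 # centers per axis (assumed)
    centers = (2.0 * np.arange(1, m + 1) - 1.0) / 2 ** d
    i = rng.integers(0, m, n)                        # horizontal cross index
    j = rng.integers(0, m, n)                        # vertical cross index
    t = rng.uniform(-2.0 ** (-d), 2.0 ** (-d), n)    # position along the diagonal
    sign = rng.choice([-1.0, 1.0], n)                # which of the two diagonals
    return centers[i] + t, centers[j] + sign * t

x, y = sample_bex(1000, 2)   # both marginals are uniform on [0, 1]
```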

These examples suggest that by calculating the test statistic of an existing test of independence across different neighborhoods and aggregating these values meaningfully, we can enhance its power, especially in scenarios described by Equation (2). Building on this idea, we introduce in this section a multi-scale approach for existing tests of independence.

Let $\mathbf{xy}_{1:n}$ denote the $n$ i.i.d. observations $(x_1,y_1),\ldots,(x_n,y_n)$. We analyze the dataset using neighborhoods of varying sizes centered at each observation point, with the distances from the center point to the other observations guiding the choice of neighborhood sizes. For a given sample $\mathbf{xy}_{1:n}$ from a bivariate distribution $F$, we consider all neighborhoods of the form $N(x_i,y_i;|x_j-x_i|,|y_j-y_i|)$ for $i\neq j$, a total of $n(n-1)$ neighborhoods. For notational convenience, we denote $N(x_i,y_i;|x_j-x_i|,|y_j-y_i|)$ by $N_{i,j}$.

Let $T$ be a test statistic for testing independence between two univariate random variables, and write $T(\mathbf{xy}_{1:n})$ for the value of $T$ computed on the sample $\mathbf{xy}_{1:n}$. Here, we consider only those test statistics $T$ such that $T(\mathbf{xy}_{1:n})\geq 0$, $T(\mathbf{xy}_{1:n})\rightarrow 0$ in probability as $n\rightarrow\infty$ under independence, and the rejection region is of the form $\{T(\mathbf{xy}_{1:n})>C_n\}$ for some constant $C_n>0$. For example, Pearson's correlation coefficient cannot itself serve as $T$, but its absolute value can. We define $T_{i,j}:=T(\mathbf{xy}_{1:n}\cap N_{i,j})$, that is, $T_{i,j}$ is the value of the test statistic $T$ evaluated on the observations that fall within the neighborhood $N_{i,j}$. If $T$ is undefined on $\mathbf{xy}_{1:n}\cap N_{i,j}$ (which may happen when $x_i=x_j$ or $y_i=y_j$), we set $T_{i,j}:=0$. One naive way to aggregate the findings from the different neighborhoods is to sum the $T_{i,j}$'s, that is, to take $\sum_{1\leq i\neq j\leq n}T_{i,j}$ as the test statistic. The problem with this summation is that if the dependence information is confined to relatively small neighborhoods, summing all the $T_{i,j}$ values might drown that information in noise. To address this issue, we separate the $T_{i,j}$'s into $(n-1)$ distinct groups according to the proximity between $(x_i,y_i)$ and $(x_j,y_j)$. A detailed description follows.

For each observation $(x_i,y_i)$, we order the remaining observations according to their Euclidean distance from $(x_i,y_i)$, breaking ties at random. Let $(x_{\pi_i(1)},y_{\pi_i(1)}),\ldots,(x_{\pi_i(n-1)},y_{\pi_i(n-1)})$ be the observations ordered by their Euclidean distance from $(x_i,y_i)$ in ascending order. Thus, $N_{i,\pi_i(k)}$ is the neighborhood of $(x_i,y_i)$ that has the $k$-th nearest neighbor of $(x_i,y_i)$ at one of its vertices. It is easy to see that for a fixed $i$, the length of the diagonal of $N_{i,\pi_i(k)}$ increases as $k$ increases. We average the value of the test statistic $T$ based on the sample $\mathbf{xy}_{1:n}\cap N_{i,\pi_i(k)}$, keeping $k$ fixed and varying $i$ over $1,\ldots,n$. We denote this average by $T_{[k]}$, that is, $T_{[k]}:=n^{-1}\sum_{i=1}^{n}T(\mathbf{xy}_{1:n}\cap N_{i,\pi_i(k)})=n^{-1}\sum_{i=1}^{n}T_{i,\pi_i(k)}$. To determine how extreme the value of $T_{[k]}$ is, we need to know its distribution under the independence of the marginals, which can be estimated by a resampling technique. We discuss this in detail next.
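The construction of $T_{[1]},\ldots,T_{[n-1]}$ may be clarified by a direct, non-optimized Python sketch; the neighborhood statistic $T$ is passed in as a function (here the absolute Pearson correlation, returning 0 where it is undefined, per the convention above). For simplicity, ties are broken by sort order rather than at random.

```python
import numpy as np

def abs_cor(x, y):
    # |Pearson correlation|, with the convention T := 0 when the statistic
    # is undefined (fewer than two points, or a degenerate coordinate).
    if len(x) < 2 or np.ptp(x) == 0 or np.ptp(y) == 0:
        return 0.0
    return abs(np.corrcoef(x, y)[0, 1])

def neighborhood_averages(x, y, T=abs_cor):
    """Return T_[1], ..., T_[n-1]: the statistic T averaged over the
    k-th-nearest-neighbor neighborhoods N_{i, pi_i(k)} of every point i.
    x, y: 1-D numpy arrays of equal length n."""
    n = len(x)
    pts = np.column_stack([x, y])
    T_k = np.zeros(n - 1)
    for i in range(n):
        dist = np.linalg.norm(pts - pts[i], axis=1)
        order = np.argsort(dist)[1:]        # pi_i(1), ..., pi_i(n-1)
        for k, j in enumerate(order):
            ex, ey = abs(x[j] - x[i]), abs(y[j] - y[i])
            inside = (np.abs(x - x[i]) <= ex) & (np.abs(y - y[i]) <= ey)
            T_k[k] += T(x[inside], y[inside])   # T_{i, pi_i(k)}
    return T_k / n
```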

Given a sample $\mathbf{xy}_{1:n}$, a randomly permuted sample $\mathbf{xy}^{(\tau)}_{1:n}$ can be generated, where $\tau(1),\ldots,\tau(n)$ is a random permutation of $1,\ldots,n$ and $\mathbf{xy}^{(\tau)}_{1:n}:=\{(x_1,y_{\tau(1)}),\ldots,(x_n,y_{\tau(n)})\}$. We calculate $T_{[1]},\ldots,T_{[n-1]}$ in the same way on the sample $\mathbf{xy}^{(\tau)}_{1:n}$ and denote their values by $T_{[1]}^{(\tau)},\ldots,T_{[n-1]}^{(\tau)}$. For $B$ independent random permutations $\tau_1,\ldots,\tau_B$, we can compute $T_{[k]}^{(\tau_1)},\ldots,T_{[k]}^{(\tau_B)}$ for $1\leq k\leq n-1$. By the permutation test principle, the empirical distribution of $T_{[k]}^{(\tau_1)},\ldots,T_{[k]}^{(\tau_B)}$ is an estimator of the distribution of $T_{[k]}$ under independence. We can estimate the mean of $T_{[k]}$ under independence by $\bar{T}_{[k]}^{H_0}:=B^{-1}\sum_{i=1}^{B}T_{[k]}^{(\tau_i)}$ and its standard deviation by $s_{[k]}^{H_0}:=\sqrt{B^{-1}\sum_{i=1}^{B}(T_{[k]}^{(\tau_i)}-\bar{T}_{[k]}^{H_0})^2}$. Next, we compute the Z-score of $T_{[k]}$ with respect to its distribution under independence, $z_k:=\frac{T_{[k]}-\bar{T}_{[k]}^{H_0}}{s_{[k]}^{H_0}}$. If $s_{[k]}^{H_0}=0$, we define $z_k:=0$. The Z-scores $z_1,\ldots,z_{n-1}$ indicate how extreme the values of $T_{[1]},\ldots,T_{[n-1]}$ are, and analyzing them for a sample $\mathbf{xy}_{1:n}$ can give valuable insight into the dependence structure. We demonstrate this point below using various bivariate distributions.
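Continuing the sketch (and reusing `neighborhood_averages` and `abs_cor` from above), the permutation Z-scores can be computed as follows. The matrix of permuted statistics is returned as well, since the aggregation step introduced later reuses it.

```python
def z_scores(x, y, B=100, T=abs_cor, seed=0):
    """Z-scores z_1, ..., z_{n-1} of T_[k] against its permutation null,
    plus the B x (n-1) matrix of permuted statistics T_[k]^(tau_i)."""
    rng = np.random.default_rng(seed)
    T_obs = neighborhood_averages(x, y, T)
    T_perm = np.array([
        neighborhood_averages(x, y[rng.permutation(len(y))], T)
        for _ in range(B)
    ])
    mean0 = T_perm.mean(axis=0)          # estimate of E[T_[k]] under H_0
    sd0 = T_perm.std(axis=0)             # B^{-1}-normalized, as defined above
    safe = np.where(sd0 > 0, sd0, 1.0)
    z = np.where(sd0 > 0, (T_obs - mean0) / safe, 0.0)   # z_k := 0 if s = 0
    return z, T_perm
```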

In this illustration, we consider two popular test statistics as $T$: the absolute value of Pearson's correlation and distance correlation, denoted by $T^{cor}$ and $T^{dcor}$, respectively. We consider eight bivariate distributions; a description of each is provided in Table 1.

  (a) Square: $X,Y\stackrel{\text{i.i.d.}}{\sim}U(-1,1)$
  (b) Straight line: $X\sim U(-1,1)$, $Y=X$
  (c) Noisy straight line: $U\sim U(-1,1)$, $X=U+e_X$, $Y=U+e_Y$, where $e_X,e_Y\stackrel{\text{i.i.d.}}{\sim}N(0,0.1)$
  (d) Sine: $X\sim U(0,2\pi)$, $Y=\sin(5X)$
  (e) Circle: $\Theta\sim U(0,2\pi)$, $X=\cos(\Theta)$, $Y=\sin(\Theta)$
  (f) Noisy parabola: $U\sim U(-1,1)$, $X=U^2+e_X$, $Y=U+e_Y$, where $e_X,e_Y\stackrel{\text{i.i.d.}}{\sim}N(0,0.25)$
  (g) $BEX_2$: As described in Section 2
  (h) BVN, rho=0.5: $(X,Y)$ follows a bivariate normal distribution with zero means, unit variances, and correlation coefficient 0.5

Table 1: Description of the distributions considered.

The scatter plots of these distributions are presented in Figure 3. It is easy to see that, except for the 'Square' distribution, $X$ and $Y$ are dependent in all of them.

From each distribution, we generated 50 i.i.d. observations and calculated $z_1,\ldots,z_{49}$. Repeating this step independently 1000 times, we then calculated the average values $\bar{z}_1,\ldots,\bar{z}_{49}$, which are plotted for each distribution in Figure 4. Under independence, the expected value of $z_k$ is 0; therefore, a deviation of $z_k$ from 0 is indicative of dependence.

Figure 3: Scatter plots of the different datasets used.

Figure 4: The plot of $\bar{z}_1,\ldots,\bar{z}_{49}$ for different distributions. The blue and red points correspond to the $T^{cor}$ and $T^{dcor}$ statistics, respectively.

From Figure 4, we observe that the average Z-score values are close to 0 for the 'Square' distribution, as $X$ and $Y$ are independent. For the 'Straight line' and 'Noisy straight line' distributions, the average Z-scores are higher in larger neighborhoods for both $T^{cor}$ and $T^{dcor}$. However, in the 'Noisy straight line', the dependence information is less apparent in smaller neighborhoods than in the 'Straight line'. For the 'Sine' distribution, the dependence information is clearly noticeable in smaller neighborhoods. An interesting phenomenon occurs with the 'Circle' distribution, where most neighborhoods, including the largest ones, contain arcs, allowing both $T^{cor}$ and $T^{dcor}$ to detect dependence effectively in most neighborhoods. In the 'Noisy parabola', the dependence is most prominent in mid-sized neighborhoods. For the more complex $BEX_2$ distribution, these statistics detect the dependence patterns in smaller and mid-sized neighborhoods, but less so in larger ones. Finally, in 'BVN, rho=0.5', the dependence information is clearly evident only in larger neighborhoods.

It follows clearly from the above illustration that test statistics such as the absolute value of correlation and distance correlation sometimes detect dependence better in smaller neighborhoods, sometimes in mid-sized ones, and sometimes in larger ones. Therefore, it makes sense to aggregate information from all the Z-scores $z_1,\ldots,z_{n-1}$. We propose an aggregation method in the following paragraph.

First, observe that under independence, the expected value of $z_k$ is 0 for $k=1,\ldots,n-1$. Under dependence, if the $T_{i,j}$'s are stochastically larger, $T_{[k]}$ will also be stochastically larger, and the expected value of $z_k$ will be positive. Let $\mu_k:=E[z_k]$. Testing $H_0$ against $H_1$ can thus be done by testing $H_0':\mu_k=0~\forall k$ vs. $H_1':\exists~k~\text{such that}~\mu_k>0$. For this, we suggest the test statistic $\Psi_n:=\sum_{k=1}^{n-1}(\max\{z_k,0\})^2$. Clearly, a high value of $\Psi_n$ presents evidence against the null hypothesis. To determine the distribution of $\Psi_n$ under $H_0$, we again use a resampling approach based on the $B$ randomly permuted samples $\mathbf{xy}_{1:n}^{(\tau_1)},\ldots,\mathbf{xy}_{1:n}^{(\tau_B)}$ already at our disposal. Just as we computed $z_1,\ldots,z_{n-1}$ for $\mathbf{xy}_{1:n}$ using the permuted samples $\mathbf{xy}_{1:n}^{(\tau_1)},\ldots,\mathbf{xy}_{1:n}^{(\tau_B)}$, we compute $z_1^{(\tau_i)},\ldots,z_{n-1}^{(\tau_i)}$ for $\mathbf{xy}_{1:n}^{(\tau_i)}$ based on the permuted samples $\mathbf{xy}_{1:n}^{(\tau_1)},\ldots,\mathbf{xy}_{1:n}^{(\tau_{i-1})},\mathbf{xy}_{1:n}^{(\tau_{i+1})},\ldots,\mathbf{xy}_{1:n}^{(\tau_B)}$, for $i=1,\ldots,B$. Next, we calculate $\Psi_n^{(\tau_i)}:=\sum_{k=1}^{n-1}(\max\{z_k^{(\tau_i)},0\})^2$ for $i=1,\ldots,B$. Finally, we determine the p-value of $\Psi_n$ by $p=B^{-1}\sum_{i=1}^{B}I[\Psi_n\leq\Psi_n^{(\tau_i)}]$ and reject $H_0$ if the p-value is less than the level of significance.
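A sketch of this aggregation step, continuing the code above. Note the leave-one-out device: the Z-scores of the $i$-th permuted sample are computed against the remaining $B-1$ permuted samples, so each $\Psi_n^{(\tau_i)}$ mimics the construction of $\Psi_n$ under the null.

```python
def psi(z):
    return np.sum(np.maximum(z, 0.0) ** 2)      # Psi = sum_k max(z_k, 0)^2

def multiscale_pvalue(x, y, B=100, T=abs_cor, seed=0):
    z, T_perm = z_scores(x, y, B, T, seed)
    psi_obs = psi(z)
    psi_null = np.empty(B)
    for i in range(B):
        rest = np.delete(T_perm, i, axis=0)     # the other B - 1 permutations
        m, s = rest.mean(axis=0), rest.std(axis=0)
        zi = np.where(s > 0, (T_perm[i] - m) / np.where(s > 0, s, 1.0), 0.0)
        psi_null[i] = psi(zi)
    return np.mean(psi_obs <= psi_null)         # p = B^{-1} sum I[Psi <= Psi^(tau_i)]
```

With `T=abs_cor` this yields the statistic called $\Psi_n^{cor}$ in Section 4; plugging in a different neighborhood statistic $T$ gives the other variants.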

Assume that computing $T(\mathbf{xy}_{1:n})$ requires $\mathcal{O}(\Theta(n))$ operations. In that case, computing the $T_{i,j}$ collectively for $1\leq i\neq j\leq n$ requires $\mathcal{O}(n^2\Theta(n))$ operations. For a fixed $1\leq i\leq n$, the merge sort algorithm allows us to compute $\pi_i$ in $\mathcal{O}(n\log n)$ operations. Therefore, determining $T_{[1]},\ldots,T_{[n-1]}$ collectively takes $\mathcal{O}(n^2\Theta(n)+n^2\log n)$ operations (recall that $T_{[k]}=n^{-1}\sum_{i=1}^{n}T_{i,\pi_i(k)}$). As a result, the computational complexity of computing $z_1,\ldots,z_{n-1}$ together is also $\mathcal{O}(n^2\Theta(n)+n^2\log n)$, assuming that the number of randomly permuted samples $B$ is constant with respect to $n$. We conclude that $\mathcal{O}(n^2\Theta(n)+n^2\log n)$ operations are required to compute the test statistic $\Psi_n=\sum_{k=1}^{n-1}(\max\{z_k,0\})^2$.

Since the correlation coefficient can be calculated in linear time, $\Theta(n)=n$ when $T=T^{cor}$. Thus, the complexity of calculating $\Psi_n$ when $T=T^{cor}$ is $\mathcal{O}(n^3)$. On the other hand, the distance correlation between two univariate random variables can be computed in $\mathcal{O}(n\log n)$ operations (see Chaudhuri and Hu, 2019). Therefore, the complexity of calculating $\Psi_n$ when $T=T^{dcor}$ is $\mathcal{O}(n^3\log n)$.

3 A Special Test Method

In this section, we introduce a testing method that closely follows the framework proposed in the previous section, with a few modifications. In this setup, $X$ and $Y$ are assumed to be continuous random variables. The key distinction in this approach lies in the manner in which the $T_{i,j}$ values are computed.

Given a sample $\mathbf{xy}_{1:n}$, we begin by fixing two observations $(x_i,y_i)$ and $(x_j,y_j)$, where $i\neq j$. As in the previous method, we define the neighborhood $N_{i,j}=N(x_i,y_i;|x_j-x_i|,|y_j-y_i|)$, centered at $(x_i,y_i)$ with $(x_j,y_j)$ positioned at a corner of the rectangle defined by the neighborhood. Within this neighborhood, we partition the space into four quadrants, classifying observations according to whether the $x$-coordinate is less than or greater than $x_i$ and whether the $y$-coordinate is less than or greater than $y_i$. A $2\times 2$ contingency table is then constructed to count the number of observations in each of these quadrants (see Table 2).

             $x_k<x_i$   $x_k>x_i$
$y_k>y_i$    $a_{i,j}$   $b_{i,j}$
$y_k<y_i$    $c_{i,j}$   $d_{i,j}$

Table 2: Number of $(x_k,y_k)$ that belong to each of the four quadrants of $N_{i,j}$.

Let $a_{i,j},b_{i,j},c_{i,j},d_{i,j}$ represent the frequencies in this contingency table, where $a_{i,j}=\#\{(x_k,y_k):(x_k,y_k)\in\mathbf{xy}_{1:n}\cap N_{i,j},\,x_k<x_i,\,y_k>y_i\}$, $b_{i,j}=\#\{(x_k,y_k):(x_k,y_k)\in\mathbf{xy}_{1:n}\cap N_{i,j},\,x_k>x_i,\,y_k>y_i\}$, $c_{i,j}=\#\{(x_k,y_k):(x_k,y_k)\in\mathbf{xy}_{1:n}\cap N_{i,j},\,x_k<x_i,\,y_k<y_i\}$, and $d_{i,j}=\#\{(x_k,y_k):(x_k,y_k)\in\mathbf{xy}_{1:n}\cap N_{i,j},\,x_k>x_i,\,y_k<y_i\}$.

$T_{i,j}$ is defined as the absolute value of the phi coefficient calculated from this contingency table:

$$T_{i,j}:=\frac{|a_{i,j}d_{i,j}-b_{i,j}c_{i,j}|}{\sqrt{(a_{i,j}+b_{i,j})(c_{i,j}+d_{i,j})(a_{i,j}+c_{i,j})(b_{i,j}+d_{i,j})}}.$$

If $(a_{i,j}+b_{i,j})(c_{i,j}+d_{i,j})(a_{i,j}+c_{i,j})(b_{i,j}+d_{i,j})=0$, we set $T_{i,j}:=0$.
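A brute-force Python sketch of this $T_{i,j}$ is given below, with the quadrant counts obtained by scanning all observations (the $\mathcal{O}(n\log n)$ computation is deferred to Lemma 1). Plugged into the framework of Section 2 in place of the generic $T(\mathbf{xy}_{1:n}\cap N_{i,j})$, it yields the special test.

```python
import numpy as np

def T_phi(x, y, i, j):
    """|phi coefficient| of the 2x2 quadrant table of N_{i,j} (Table 2)."""
    ex, ey = abs(x[j] - x[i]), abs(y[j] - y[i])
    inside = (np.abs(x - x[i]) <= ex) & (np.abs(y - y[i]) <= ey)
    a = np.sum(inside & (x < x[i]) & (y > y[i]))
    b = np.sum(inside & (x > x[i]) & (y > y[i]))
    c = np.sum(inside & (x < x[i]) & (y < y[i]))
    d = np.sum(inside & (x > x[i]) & (y < y[i]))
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0                        # convention: T_{i,j} := 0
    return abs(a * d - b * c) / np.sqrt(denom)
```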

Next, we employ the same framework described in the previous section to determine a p-value. For notational convenience, the test statistic $T$ used here will be referred to as $T^{phi}$.

Following the setup of Figure 4, we plotted the average Z-scores of the $T^{phi}$ statistic across various neighborhood sizes in Figure 5, to gain a deeper understanding of its underlying mechanism. These values are represented by red dots; for comparison, the average Z-scores of the $T^{cor}$ statistic are plotted alongside as blue dots.

Figure 5: Plot of $\bar{z}_1,\ldots,\bar{z}_{49}$ for different distributions. The blue and red points correspond to the $T^{cor}$ and $T^{phi}$ statistics, respectively.

As shown in Figure 5(a), the Z-scores for $T^{phi}$ exhibit higher variance than those for $T^{cor}$ under independence. Figures 5(b) through 5(g) demonstrate that $T^{phi}$ consistently yields higher Z-scores than $T^{cor}$ for the 'Straight line', 'Noisy straight line', 'Sine', 'Circle', 'Noisy parabola', and '$BEX_2$' distributions. Additionally, the average Z-score plot for $T^{phi}$ appears more wiggly than that of $T^{cor}$. In the 'BVN, rho=0.5' distribution, the average Z-scores of $T^{phi}$ are comparable to those of $T^{cor}$.

For a fixed $1\leq i\neq j\leq n$, calculating $T_{i,j}$ requires counting the values $a_{i,j},b_{i,j},c_{i,j}$, and $d_{i,j}$. A straightforward approach is to check each observation in $\mathbf{xy}_{1:n}$ individually, which takes $\mathcal{O}(n)$ operations per $T_{i,j}$. As established in the previous section, this naive approach requires $\mathcal{O}(n^3)$ operations to compute the test statistic. However, the computational complexity can be significantly reduced by applying the algorithm described in Lemma 1, which computes $T_{i,j}$ for all $1\leq j\neq i\leq n$ simultaneously in $\mathcal{O}(n\log n)$ operations for a fixed $i$. With this algorithm, calculating $T_{[1]},\ldots,T_{[n-1]}$ collectively requires $\mathcal{O}(n^2\log n)$ operations, and the complexity of computing the test statistic $\Psi_n$ is reduced to $\mathcal{O}(n^2\log n)$.

4 Performance on simulated datasets

In this section, we compare the performance of our proposed test methods on various simulated datasets with that of existing tests in the literature. From the proposed general testing framework, we select two well-known test statistics as $T$: the absolute value of Pearson's correlation and distance correlation. We denote the resulting test statistic $\Psi_n$ in these two cases by $\Psi_n^{cor}$ and $\Psi_n^{dcor}$, respectively, and the test statistic $\Psi_n$ for our proposed special test method by $\Psi_n^{phi}$. As pointed out in previous sections, the computational complexities of $\Psi_n^{phi}$, $\Psi_n^{cor}$, and $\Psi_n^{dcor}$ are $\mathcal{O}(n^2\log n)$, $\mathcal{O}(n^3)$, and $\mathcal{O}(n^3\log n)$, respectively.

From the existing tests of independence, we selected the following popular tests: dCor (Székely et al., 2007), HSIC (Gretton et al., 2007), HHG (Heller et al., 2013), MIC (Reshef et al., 2011), Chatterjee's correlation (xicor) (Chatterjee, 2021), and BET (Zhang, 2019). For the dCor, HSIC, HHG, and xicor tests, we used the R packages energy (Rizzo and Szekely, 2022), dHSIC (Pfister and Peters, 2019), HHG (Kaufman et al., 2019), and XICOR (Chatterjee and Holmes, 2023), respectively. We calculated the MIC statistic using the 'mine_stat' function from the R package minerva (Albanese et al., 2012) and subsequently performed a permutation test. For the BET test, we used the R code provided in the supplemental material of Zhang (2019). We fixed the nominal level at 5%. Except for BET, all tests were performed using the permutation principle, with the number of permutations set to 1000. The empirical power of a test is the proportion of times it rejects the null hypothesis over 1000 independently generated samples of a given size from a given distribution.

First, we considered eight jointly continuous distributions, namely 'Doppler', '$BEX_2$', 'Lissajous A', 'Lissajous B', 'Rose curve', 'Spiral', 'Tilted square', and 'Five clouds'. Their descriptions are provided in Table 3, and scatter plots of each are presented in Figure 8 in Appendix B.

  (a) Doppler: $X\sim U(0,1/2)$, $Y=\sqrt{X}\sin(1/X)$
  (b) $BEX_2$: As described in Section 2
  (c) Lissajous A: $\Theta\sim U(0,2\pi)$, $X=\sin(3\Theta+3\pi/4)+\epsilon_X$, $Y=\sin(\Theta)+\epsilon_Y$
  (d) Lissajous B: $\Theta\sim U(0,2\pi)$, $X=\sin(4\Theta+\pi/2)+\epsilon_X$, $Y=\sin(3\Theta)+\epsilon_Y$
  (e) Rose curve: $\Theta\sim U(0,2\pi)$, $R=\cos(4\Theta)$, $X=R\cos(\Theta)+\epsilon_X/2$, $Y=R\sin(\Theta)+\epsilon_Y/2$
  (f) Spiral: $\Theta\sim U(0,4\pi)$, $R=\Theta/(2\pi)$, $X=R\cos(\Theta)+\epsilon_X$, $Y=R\sin(\Theta)+\epsilon_Y$
  (g) Tilted square: $U,V\stackrel{\text{i.i.d.}}{\sim}U(-1,1)$, $X=\cos(\pi/3)U-\sin(\pi/3)V$, $Y=\sin(\pi/3)U+\cos(\pi/3)V$
  (h) Five clouds: $(U,V)$ uniform on $\{(0,0),(0,1),(0.5,0.5),(1,0),(1,1)\}$, $X=U+2\epsilon_X$, $Y=V+2\epsilon_Y$

Table 3: Description of the 'Doppler', '$BEX_2$', 'Lissajous A', 'Lissajous B', 'Rose curve', 'Spiral', 'Tilted square', and 'Five clouds' distributions. Here $\epsilon_X,\epsilon_Y$ are i.i.d. random variables following $N(0,1/30^2)$, independent of $U$, $V$, and $\Theta$.

It can be observed from the table that in the 'Doppler' distribution, $Y$ is directly a function of $X$. In 'Lissajous A', 'Lissajous B', 'Rose curve', and 'Spiral', $X$ and $Y$ are related to each other in the form of an implicit function. 'Tilted square' and 'Five clouds' are two distributions where $X$ and $Y$ are dependent but their correlation is 0.

The empirical powers of all test methods on these eight distributions are presented in Figure 6.

Figure 6: The empirical powers of different test methods on 'Doppler', '$BEX_2$', 'Lissajous A', 'Lissajous B', 'Rose curve', 'Spiral', 'Tilted square', and 'Five clouds'.

It can be seen from this figure that $\Psi_n^{phi}$ performed best in both 'Doppler' and '$BEX_2$' and performed well in all other distributions. $\Psi_n^{cor}$ performed fairly well in all distributions except 'Rose curve' and 'Tilted square'. $\Psi_n^{dcor}$ performed well in all cases, and in 'Lissajous A', 'Lissajous B', 'Rose curve', 'Spiral', and 'Five clouds' it performed the best. HHG yielded satisfactory power overall except for 'Lissajous B'. The power of HSIC is satisfactory only in the 'Doppler' and 'Tilted square' distributions. dCor, MIC, and xicor did not perform well. BET performed somewhat well for larger sample sizes in 'Lissajous B', 'Rose curve', and 'Spiral'.

Next, we considered four datasets to assess the performance of our methods under different levels of noise. In these examples, $X=f(Z)+\lambda\epsilon_X$ and $Y=g(Z)+\lambda\epsilon_Y$, where $f$ and $g$ are real functions, $Z$ is a random variable, $\lambda$ is a non-negative constant, and $\epsilon_X$ and $\epsilon_Y$ are i.i.d. normal random variables that are also independent of $Z$. As $\lambda$ increases, the noise level increases. In particular, we considered four distributions, named 'Parabola $\lambda$', 'Circle $\lambda$', 'Sine $\lambda$', and 'Lemniscate $\lambda$'; descriptions of each can be found in Table 4.

  (a) Parabola $\lambda$: $U\sim U(-1,1)$, $X=U^2+2\lambda\epsilon_X$, $Y=U+2\lambda\epsilon_Y$
  (b) Circle $\lambda$: $\Theta\sim U(0,2\pi)$, $X=\cos(\Theta)+6\lambda\epsilon_X$, $Y=\sin(\Theta)+6\lambda\epsilon_Y$
  (c) Sine $\lambda$: $\Theta\sim U(0,2\pi)$, $X=\Theta+3\lambda\epsilon_X$, $Y=\sin(\Theta)+3\lambda\epsilon_Y$
  (d) Lemniscate $\lambda$: $\Theta\sim U(0,2\pi)$, $X=\frac{\cos(\Theta)}{1+\sin^2(\Theta)}+2\lambda\epsilon_X$, $Y=\frac{\sin(\Theta)\cos(\Theta)}{1+\sin^2(\Theta)}+2\lambda\epsilon_Y$

Table 4: Descriptions of the 'Parabola $\lambda$', 'Circle $\lambda$', 'Sine $\lambda$', and 'Lemniscate $\lambda$' distributions. Possible values are $\lambda=0,1,2$. Here, $\epsilon_X,\epsilon_Y$ are i.i.d. random variables following $N(0,1/60^2)$, independent of $U$ and $\Theta$.

We considered three noise levels by taking $\lambda=0,1,2$. Scatter plots of the twelve ($4\times 3=12$) distributions are presented in Figure 9 in Appendix B.

The empirical powers of each of these tests are presented in Figure 7.

Figure 7: The empirical power of different test methods on the 'Parabola $\lambda$', 'Circle $\lambda$', 'Sine $\lambda$', and 'Lemniscate $\lambda$' distributions for $\lambda=0,1,2$.

It can be observed from this figure that $\Psi_n^{phi}$ performs best or next to best in the absence of noise, and even in the presence of noise its power remains quite good. As the noise level increases, $\Psi_n^{dcor}$ takes the lead. The overall performance of $\Psi_n^{cor}$ is competitive except in the 'Circle $\lambda$' case. HHG performed well but lagged behind our proposed tests most of the time. HSIC performed well in the 'Parabola $\lambda$' and 'Circle $\lambda$' distributions but did not achieve satisfactory power in the others. dCor had somewhat competitive power in 'Parabola $\lambda$', but not in the other distributions. MIC had low power overall, except at larger sample sizes in the 'Sine $\lambda$' distributions. xicor also performed poorly except in 'Sine $\lambda$', where it performed the best; this is not unexpected, as xicor is mainly designed to detect whether $Y$ is a function of $X$. BET had the lowest power among all tests in most distributions.

5 Discussion and conclusion

We proposed a general framework for testing independence that applies existing independence tests in different neighborhoods of the dataset and combines the results in a meaningful way. This approach has been shown to enhance power when the variables are explicitly or implicitly functionally dependent. Additionally, we introduced a novel test of independence that leverages a similar framework. It is important to note that multiple approaches can be taken to selecting neighborhoods, and both the power and the complexity of the resulting method can vary with this choice. Therefore, an optimal selection of neighborhoods is a potential direction for future research. Our proposed method assigns equal weight to each neighborhood, from nearest to farthest; the potential benefits of varying weights across neighborhoods could also be explored in future work.

Appendix A

Lemma 1.

For the proposed special test method, for a fixed $1\leq i\leq n$, the computation of $T_{i,j}$ for all $1\leq j\neq i\leq n$ can be done collectively in $\mathcal{O}(n\log n)$ operations.

Proof.

Let us fix $i$ such that $1\leq i\leq n$. First, we prove that the sequence $a_{i,1},\ldots,a_{i,i-1},a_{i,i+1},\ldots,a_{i,n}$ can be computed in $\mathcal{O}(n\log n)$ operations.

For $1\leq j\neq i\leq n$, $d^x_j=|x_j-x_i|$ and $d^y_j=|y_j-y_i|$ can be computed in $\mathcal{O}(n)$ operations. As $(x_1,y_1),\ldots,(x_n,y_n)$ are $n$ i.i.d. observations from a continuous distribution, there are no ties in the $x$-coordinates or in the $y$-coordinates, and hence $d^x_j$ and $d^y_j$ are positive for all $1\leq j\neq i\leq n$, without any ties.

Using the merge sort algorithm, a permutation $\omega$ of the sequence $(1,\ldots,i-1,i+1,\ldots,n)$ can be computed in $\mathcal{O}(n\log n)$ operations, such that $d^x_{\omega(1)}\leq\cdots\leq d^x_{\omega(i-1)}\leq d^x_{\omega(i+1)}\leq\cdots\leq d^x_{\omega(n)}$.

Next, by iterating through $j\in\{1,\ldots,i-1,i+1,\ldots,n\}$ in ascending order, we can store in an array $q$ those $j$ values that satisfy $x_{\omega(j)}<x_i$ and $y_{\omega(j)}>y_i$. As a result, for all $k,l\in\{1,\ldots,\text{length}(q)\}$, we have $q[k]<q[l]$ whenever $k<l$, with $x_{\omega(q[k])}<x_i$ and $y_{\omega(q[k])}>y_i$. The remaining integers in $\{1,\ldots,i-1,i+1,\ldots,n\}$, those not satisfying the conditions for $q$, are stored in another array $r$ in ascending order. Thus, for all $k,l\in\{1,\ldots,\text{length}(r)\}$, we have $r[k]<r[l]$ whenever $k<l$, with $x_{\omega(r[k])}>x_i$ or $y_{\omega(r[k])}<y_i$. This iterative checking and storing process requires only $\mathcal{O}(n)$ operations. Note that $q$ and $r$ have no common elements and together contain all elements of the set $\{1,\ldots,i-1,i+1,\ldots,n\}$.

Given an array of real numbers $[s_1,\ldots,s_m]$, the 'surpasser count' is defined as an array of integers $[t_1,\ldots,t_m]$ of the same length, where $t_j=\sum_{k=j+1}^{m}I[s_j<s_k]$ for $1\leq j\leq m$. Using the merge sort technique, this can be computed in $\mathcal{O}(m\log m)$ operations (see Bird, 2010, Chapter 2). Instead of using the surpasser count algorithm directly, we use the 'trail count'. For a given array of real numbers $[s_1,\ldots,s_m]$, the trail count is defined as an array of integers $[t_1,\ldots,t_m]$ of the same length, where $t_j=\sum_{k=1}^{j}I[s_j\geq s_k]=1+\sum_{k=1}^{j-1}I[s_j>s_k]$ for $1\leq j\leq m$. Essentially, the trail count is the surpasser count applied to the reversed array, thus maintaining the same complexity.
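The following Python sketch computes the trail count in $\mathcal{O}(m\log m)$. For brevity it uses a Fenwick (binary indexed) tree over the ranks of the array rather than the merge-sort formulation of Bird (2010); the complexity is the same. As guaranteed above, no ties are assumed.

```python
import numpy as np

def trail_count(s):
    """t_j = sum_{k <= j} I[s_j >= s_k], in O(m log m) via a Fenwick tree
    over the ranks of s (no ties assumed)."""
    m = len(s)
    rank = np.argsort(np.argsort(s)) + 1       # ranks in 1..m
    tree = [0] * (m + 1)
    t = []
    for r in rank:
        cnt, k = 0, r - 1
        while k > 0:                           # count inserted ranks < r
            cnt += tree[k]
            k -= k & -k
        t.append(cnt + 1)                      # +1 counts s_j itself
        while r <= m:                          # insert rank r into the tree
            tree[r] += 1
            r += r & -r
    return t

trail_count([3.1, 1.2, 2.5, 4.0])              # -> [1, 1, 2, 4]
```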

We apply the trail count to the array $[d^y_{\omega(1)},\ldots,d^y_{\omega(i-1)},d^y_{\omega(i+1)},\ldots,d^y_{\omega(n)}]$, resulting in an array, say, $[w_1,\ldots,w_{i-1},w_{i+1},\ldots,w_n]$. It is easy to verify that $w_k$ represents the number of observations, excluding $(x_i,y_i)$, that fall within the neighborhood $N_{i,\omega(k)}$ for $1\leq k\neq i\leq n$. Next, we apply the trail count to the array $[d^y_{\omega(q[1])},\ldots,d^y_{\omega(q[\text{length}(q)])}]$, resulting in an array, say, $[u_1,\ldots,u_{\text{length}(q)}]$. It can be verified that $u_k$ is the number of observations that fall within the set $\{(x,y):(x,y)\in N_{i,\omega(q[k])},\,x<x_i,\,y>y_i\}$ for $1\leq k\leq\text{length}(q)$. Thus, $a_{i,q[k]}=u_k$ for $1\leq k\leq\text{length}(q)$. Similarly, applying the trail count to the array $[d^y_{\omega(r[1])},\ldots,d^y_{\omega(r[\text{length}(r)])}]$ results in an array, say, $[v_1,\ldots,v_{\text{length}(r)}]$. It can be verified that $v_k$ represents the number of points in the set $\{(x,y):(x,y)\in N_{i,\omega(r[k])},\,x>x_i\text{ or }y<y_i\}$ for $1\leq k\leq\text{length}(r)$. Therefore, $a_{i,r[k]}=w_{r[k]}-v_k$ for $1\leq k\leq\text{length}(r)$. Hence, all $a_{i,j}$ for $1\leq j\neq i\leq n$ can be computed together in $\mathcal{O}(n\log n)$ operations.

Similar steps show that, for a fixed $i$, the $b_{i,j},c_{i,j},d_{i,j}$ for $1\leq j\neq i\leq n$ can be computed collectively in $\mathcal{O}(n\log n)$ operations. Since $T_{i,j}$ is a function of $a_{i,j},b_{i,j},c_{i,j}$, and $d_{i,j}$, it follows directly that computing the $T_{i,j}$ collectively for $1\leq j\neq i\leq n$ also requires $\mathcal{O}(n\log n)$ operations. ∎

Appendix B

Figure 8: The scatter plots of the 'Doppler', '$BEX_2$', 'Lissajous A', 'Lissajous B', 'Rose curve', 'Spiral', 'Tilted square', and 'Five clouds' distributions.

Figure 9: The scatter plots of the 'Parabola $\lambda$', 'Circle $\lambda$', 'Sine $\lambda$', and 'Lemniscate $\lambda$' distributions for $\lambda=0,1,2$.

References

  • Albanese et al. [2012] Davide Albanese, Michele Filosi, Roberto Visintainer, Samantha Riccadonna, Giuseppe Jurman, and Cesare Furlanello. minerva and minepy: a C engine for the MINE suite and its R, Python and MATLAB wrappers. Bioinformatics, page bts707, 2012.
  • Bird [2010] Richard Bird. Pearls of functional algorithm design. Cambridge University Press, 2010.
  • Chatterjee [2021] Sourav Chatterjee. A new coefficient of correlation. Journal of the American Statistical Association, 116(536):2009–2022, 2021.
  • Chatterjee and Holmes [2023] Sourav Chatterjee and Susan Holmes. XICOR: Robust and generalized correlation coefficients, 2023. URL https://CRAN.R-project.org/package=XICOR. https://github.com/spholmes/XICOR.
  • Chaudhuri and Hu [2019] Arin Chaudhuri and Wenhao Hu. A fast algorithm for computing distance correlation. Computational Statistics & Data Analysis, 135:15–24, 2019.
  • Gretton et al. [2007] Arthur Gretton, Kenji Fukumizu, Choon Teo, Le Song, Bernhard Schölkopf, and Alex Smola. A kernel statistical test of independence. Advances in Neural Information Processing Systems, 20, 2007.
  • Heller et al. [2013] Ruth Heller, Yair Heller, and Malka Gorfine. A consistent multivariate test of association based on ranks of distances. Biometrika, 100(2):503–510, 2013.
  • Hoeffding [1948] Wassily Hoeffding. A non-parametric test of independence. The Annals of Mathematical Statistics, 19(4):546–557, 1948.
  • Kaufman et al. [2019] Barak Brill & Shachar Kaufman, based in part on an earlier implementation by Ruth Heller, and Yair Heller. HHG: Heller-Heller-Gorfine Tests of Independence and Equality of Distributions, 2019. URL https://CRAN.R-project.org/package=HHG. R package version 2.3.
  • Kendall [1938] Maurice G. Kendall. A new measure of rank correlation. Biometrika, 30(1/2):81–93, 1938.
  • Pfister and Peters [2019] Niklas Pfister and Jonas Peters. dHSIC: Independence Testing via Hilbert Schmidt Independence Criterion, 2019. URL https://CRAN.R-project.org/package=dHSIC. R package version 2.1.
  • Reshef et al. [2011] David N Reshef, Yakir A Reshef, Hilary K Finucane, Sharon R Grossman, Gilean McVean, Peter J Turnbaugh, Eric S Lander, Michael Mitzenmacher, and Pardis C Sabeti. Detecting novel associations in large data sets. Science, 334(6062):1518–1524, 2011.
  • Rizzo and Szekely [2022] Maria Rizzo and Gabor Szekely. energy: E-Statistics: Multivariate Inference via the Energy of Data, 2022. URL https://CRAN.R-project.org/package=energy. R package version 1.7-11.
  • Roy [2020] Angshuman Roy. Some copula-based tests of independence among several random variables having arbitrary probability distributions. Stat, 9(1):e263, 2020.
  • Roy et al. [2020] Angshuman Roy, Anil K Ghosh, Alok Goswami, and CA Murthy. Some new copula based distribution-free tests of independence among several random variables. Sankhya A, pages 1–41, 2020.
  • Spearman [1904] Charles Spearman. The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72–101, 1904.
  • Székely et al. [2007] Gábor J Székely, Maria L Rizzo, and Nail K Bakirov. Measuring and testing dependence by correlation of distances. The Annals of Statistics, 35(6):2769–2794, 2007.
  • Zhang [2019] Kai Zhang. BET on independence. Journal of the American Statistical Association, 2019.