Asymmetric Dependence Measurement and Testing

H. D. Vinod address: H. D. Vinod, Professor of Economics, Fordham University, Bronx, New York, USA 10458. E-mail: [email protected]. JEL codes C30, C51. Keywords: Kernel regression, Standardized beta coefficients, Partial Correlation.
Abstract

Measuring the (causal) direction and strength of dependence between two variables (events), X_i and X_j, is fundamental for all science. Our survey of the decades-long literature on statistical dependence reveals that most measures assume symmetry, in the sense that the strength of dependence of X_i on X_j exactly equals the strength of dependence of X_j on X_i. However, we show that such symmetry is often untrue in many real-world examples, being neither necessary nor sufficient. Vinod’s (2014) asymmetric matrix R* of generalized correlation coefficients, with elements in [-1,1], provides intuitively appealing, readily interpretable, and superior measures of dependence. This paper proposes statistical inference for R* using Taraldsen’s (2021) exact sampling distribution of correlation coefficients and the bootstrap. When the direction is known, the proposed asymmetric (one-tail) tests have greater power.

1 Introduction

A great deal of science focuses on understanding the dependence between variables. Its quantification has a long history, starting with the Galton-Pearson correlation coefficient r_ij from the 1890s and its cousins, including Spearman’s ρ, Kendall’s τ, and Hoeffding’s D. Let dep(X_i|X_j) measure the strength of dependence of X_i on X_j given a measurement X_j. Many measures of dependence try to satisfy the symmetry postulate of Renyi (1959), which posits that the two strengths based on opposite conditioning are identical:

dep(X_{i}|X_{j}) \equiv dep(X_{j}|X_{i}). (1)

We regard the symmetry postulate as an avoidable dogma. The following subsection explains why attempting to satisfy the symmetry equation (1) provides misleading measures of dependence in practice.

1.1 Four Examples of Asymmetric Dependence

A correct notion of dependence in nature or data is rarely (if ever) symmetric.

  • A newborn baby boy depends on his mother for his survival, but it is ludicrous to expect that his mother must exactly equally depend on the boy for her survival, as implied by (1).

  • Meteorologists know that the average daily high of December temperatures in New York City is 44 degrees Fahrenheit and that this number depends on New York’s latitude (40.7). The latitude is a geographical given and does not depend on anything like city temperatures. Symmetric dependence by (1) between temperature and latitude implies the ludicrous claim that latitude depends on temperature with equal strength.

  • For a third example, imagine a business person B with several shops. Thirty percent of B’s earnings depend on the hours worked by a key employee in one shop. Now the symmetry in (1) means that the hours worked by the key employee must depend on B’s earnings with exactly the same 30% strength.

  • Our fourth example treats Y as the complete data, a subset of which is unavailable. The available subset X is a proxy that depends on Y, but the complete set Y does not equally depend on its subset X.

These four examples are enough to convince the reader that the symmetry postulate is neither necessary nor sufficient for real-world dependence. However, it is interesting that the unrealistic property (1) is an old, established, sacrosanct postulate from the 1950s, Renyi (1959). Even in 2022, Geenens and de Micheaux (2022) (“GM22”) still adhere to the symmetry postulate (dogma) by proposing an ingenious new definition of dependence to fit the model in (1). Actually, a measure of dependence satisfying (1) can be ludicrous in some contexts analogous to the four examples above.

1.2 Sources of the Symmetry Dogma

What is the origin of the symmetry dogma?

(i) The definitional and numerical equality of covariances, Cov(X_i, X_j) = Cov(X_j, X_i), may have been the initial reason for the symmetry result.

(ii) In a bivariate linear regression X_1 = a + b X_2 + ϵ, the strength of dependence of X_1 on X_2 is clearly measured by the coefficient of determination R²_{1|2}. If we consider a flipped linear regression, X_2 = a' + b' X_1 + ϵ', the strength of dependence is R²_{2|1}. The assumption of linearity makes the two strengths equal to each other, R²_{2|1} = R²_{1|2}. The equality of the two R² strengths supports the symmetry dogma. When we consider the signed square roots of the two R² values, we have a symmetric matrix of correlation coefficients, r_ij = r_ji. These signed measures of dependence further support the dogma.

The symmetry dogma depends on the harmless-looking linearity assumption. Back in 1784, the German philosopher Kant said: “Out of the crooked timber of humanity, no straight thing was ever made.” Since social sciences and medicine deal with human subjects, evidence supporting linearity and the implied symmetry dogma is missing.

(iii) The fact that all distances satisfy symmetry may have been another reason behind Renyi’s postulate.

(iv) The concept of statistical independence in probability theory is symmetric. It can be formulated in terms of the absence of any divergence between a joint density and a product of two marginal densities,

f(X_{i}, X_{j}) = f(X_{i})\, f(X_{j}). (2)

Since dependence is the opposite of independence, it is tempting (but unhelpful) to impose symmetry on dependence as well.

1.3 Statistical Independence in Contingency Tables

Two-way contingency tables refer to tabulated data on a grand total of GT observations distributed over an r × c matrix. There are two categorical variables, represented by r manifestations of row characteristics R_i along the rows (i = 1, 2, ..., r), and c column characteristics C_j along the columns (j = 1, 2, ..., c). The body of the (r × c) contingency table has observed values O_ij in the matrix cell located at row i and column j. The joint probability P(R_i, C_j) is simply O_ij/GT, where GT denotes the grand total of the tabulated numbers. The row margin of the contingency table has row totals R_i = Σ_j O_ij. The column margin has column totals C_j = Σ_i O_ij. The marginal probabilities are P(R_i) = R_i/GT and P(C_j) = C_j/GT, which are also called unconditional probabilities.

A conditional probability restricts the sample space to the part of the table which satisfies the specified condition, referring to a particular row R_i or column C_j. The direct computation of a conditional probability has the respective row or column sum in its denominator instead of the grand total GT. An equivalent calculation defines P(R_i|C_j) = P(R_i, C_j)/P(C_j), the ratio of the joint probability to the marginal probability of the conditioning column. The analogous conditional probability conditioning on the row characteristic is the ratio of the same joint probability to the marginal probability of the conditioning row, P(C_j|R_i) = P(R_i, C_j)/P(R_i). The R sketch below illustrates these computations on a small hypothetical table.
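
The following minimal R sketch uses a small hypothetical 2 × 3 table (invented here purely for illustration) to compute the joint, marginal, and conditional probabilities defined above.

O = matrix(c(20, 30, 10, 15, 15, 10), nrow = 2, byrow = TRUE) # hypothetical 2x3 table of counts O_ij
GT = sum(O)                                # grand total
P.joint = O / GT                           # joint probabilities P(Ri, Cj)
P.row = rowSums(O) / GT                    # marginal P(Ri)
P.col = colSums(O) / GT                    # marginal P(Cj)
P.RgivenC = sweep(P.joint, 2, P.col, "/")  # P(Ri | Cj); each column sums to 1
P.CgivenR = sweep(P.joint, 1, P.row, "/")  # P(Cj | Ri); each row sums to 1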

In probability theory based on contingency tables, the notion of statistical independence is studied by considering the following three criteria.

(a) P(R_i, C_j) = P(R_i) P(C_j), the joint probability equals the product of the marginals.

(b) P(R_i|C_j) = P(R_i), the conditional probability equals the unconditional or marginal probability.

(c) P(C_j|R_i) = P(C_j), the other conditional probability equals the unconditional or marginal probability.

Note that criterion (a) is both necessary and sufficient for independence. It is symmetric in that the joint probability is the same even if we interchange the order and write it as P(C_j, R_i). However, data can satisfy (b) without satisfying (c), and vice versa. Hence, tests of independence typically rely on the symmetric criterion (a). However, dependence is the opposite of independence and is generally asymmetric. We find that using (b) and (c) helps avoid the misleading symmetry postulate in the context of dependence.

It is customary to imagine a population of thousands of contingency tables; the observed table is one realization from that population. The null hypothesis (H_0) is that the row and column characteristics are statistically independent. The sample table may not exactly satisfy independence in the sense of (a) to (c) above. The testing problem is whether the observed table of O_ij values could have arisen from a population where conditions (a) to (c) are satisfied. That is, whether the O_ij are numerically close enough to the expected values E_ij = R_i C_j / GT obtained from the cross-product of the relevant marginal totals divided by the grand total.

Pearson’s Chi-square test statistic for H_0, or the independence of the row effect and column effect in a contingency table, is

\chi^{2} = \Sigma_{i}\Sigma_{j} (O_{ij} - E_{ij})^{2}/E_{ij}, \quad df = (r-1)(c-1), (3)

where df denotes the degrees of freedom. Note that χ² ∈ [0, ∞) of (3) cannot be computed unless we have contingency tables. Statisticians have long recognized that the magnitude of χ² cannot reliably measure the direction and strength of dependence. This paper assumes that a practitioner would want to know both the general direction and strength of dependence.
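
For completeness, a short R sketch (re-creating the hypothetical table used above) computes the statistic (3) directly and via the built-in chisq.test().

O = matrix(c(20, 30, 10, 15, 15, 10), nrow = 2, byrow = TRUE)  # same hypothetical table as above
E = outer(rowSums(O), colSums(O)) / sum(O)   # expected counts Eij = Ri*Cj/GT
chi2 = sum((O - E)^2 / E)                    # Pearson chi-square statistic of (3)
df = (nrow(O) - 1) * (ncol(O) - 1)           # degrees of freedom
pchisq(chi2, df, lower.tail = FALSE)         # p-value for H0: independence
chisq.test(O)                                # built-in equivalent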

2 Symmetric Measures of Dependence

Granger et al. (2004) (“Gr04”) is an important paper on formal testing for statistical independence, especially for time series data. They cite a survey by Tjostheim (1996) on the topic. The novelty in Gr04 is in using nonparametric nonlinear kernel densities in testing the equality (2) in their test of independence. Unfortunately, Gr04 authors adhere to the symmetry dogma by insisting that, similar to independence, a measure of dependence should be a symmetric distance-type ‘metric.’

2.1 Dependence Measures and Entropy

Shannon defined information content in 1948 as the amount of surprise in a piece of information. His information is inversely proportional to the probability of occurrence and applies to both discrete and continuous random variables with probabilities defined by a probability distribution f(y). In the context of entropy, let us use the fourth example of Section 1.1, where Y is the complete data and X is a subset with some missing observations. How does X depend on Y? We develop a measure of dependence using information theory, especially entropy.

Intuitively, entropy is our ignorance or the extent of disorder in a system. The entropy H(Y) is defined as the mathematical expectation of the Shannon information, E(-log f(y)). The conditional entropy of Y given X, averaged over X, is

H(Y|X) = -E[E[\log(f_{Y|X}(Y|X)) \mid X]]. (4)

The reduction in our ignorance H(Y) from knowing the proxy X is H(Y) - H(Y|X). Mutual information I_mu(X,Y) is defined as H(X) + H(Y) - H(X,Y). It is symmetric since I_mu(X,Y) = I_mu(Y,X). The entropy-based measure of dependence is

D(X;Y) = \frac{H(Y) - H(Y|X)}{H(Y)}, (5)

or the proportional reduction in the entropy of Y from knowing X. Reimherr and Nicolae (2013) complain that (5) is not symmetric. By contrast, we view asymmetry as a desirable property.

Neyman and Pearson showed that a way to distinguish between two distributions f(X) and f(Y) for a parameter θ is the difference between the logs of their likelihood functions. Shannon’s relative entropy, also known as the Kullback–Leibler (KL) divergence, is the expected value of that difference,

KLD = E[\log f(\theta|X) - \log f(\theta|Y)]. (6)

It is easy to verify that KLD, or relative entropy, is not symmetric.

Gr04 authors state on page 650 that “Shannon’s relative entropy and almost all other entropies fail to be ‘metric’, as they violate either symmetry, or the triangularity rule, or both.” We argue that asymmetry is an asset, not a liability, in light of the four examples in Section 1.1. Hence, we regard (5) or (6) as superior measures compared to the symmetric measure by Gr04.

D(X;Y) of (5) and KLD of (6) cannot be used directly on data vectors. They need frequency distribution counts as input based on the grouping of data into bins (histogram class intervals). The choice of the number of bins is arbitrary, and D(X;Y) and KLD are sensitive to that choice. Hence, we do not recommend D(X;Y) or KLD as a general-purpose measure of dependence.
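
To illustrate this sensitivity, here is a minimal R sketch (with simulated data, purely for illustration) that estimates D(X;Y) of (5) from binned counts; the estimate changes with the arbitrary number of bins.

set.seed(1)
y = rnorm(500); x = y + rnorm(500, sd = 0.5)           # x is a noisy proxy for y
entropyDep = function(x, y, bins) {                    # estimate D(X;Y) of (5) from binned counts
  Pxy = table(cut(x, bins), cut(y, bins)) / length(x)  # joint cell probabilities
  Px = rowSums(Pxy); Py = colSums(Pxy)                 # marginal probabilities
  H = function(p) {p = p[p > 0]; -sum(p * log(p))}     # entropy of a probability vector
  Hy = H(Py); HygivenX = H(Pxy) - H(Px)                # H(Y|X) = H(X,Y) - H(X)
  (Hy - HygivenX) / Hy }                               # D(X;Y) of (5)
entropyDep(x, y, bins = 5)    # one answer with 5 bins
entropyDep(x, y, bins = 20)   # a noticeably different answer with 20 bins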

2.2 Dependence Measures and Fisher Information

Fisher information measures the expected amount of information given by a random variable Y about a parameter θ of interest. Under Gaussian assumptions, the Fisher information is inversely proportional to the variance. Reimherr and Nicolae (2013) use the Fisher information to define a measure of dependence. Consider the estimation of a model parameter θ using X as a proxy for the unavailable Y. That is, X is a subset of Y with missing observations, as in the fourth example of Section 1.1. If the Fisher information for θ based on the proxy X is denoted by I_X(θ), they define a measure of dependence as:

D(X;Y) = \frac{\mathcal{I}_{X}(\theta)}{\mathcal{I}_{Y}(\theta)}, (7)

where I_X(θ) ≤ I_Y(θ). Consider the special case where X retains a proportion p of the Y data, the rest being missing at completely random locations. Then the measure of dependence (7) equals p. This measure is almost acceptable because it is asymmetric: the subset X, being a proxy for Y, cannot be interchanged with Y. Its drawback is that D(X;Y) of (7) cannot be negative. In Section 3 we recommend a more generally applicable and intuitive measure of dependence.
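
A small numerical sketch under Gaussian assumptions (invented numbers, purely for illustration): for a normal mean with known variance, the Fisher information is n/σ², so the ratio (7) reduces to the retained fraction of the data.

n.full = 1000; sigma = 2       # hypothetical full sample size and known standard deviation
p.retained = 0.7               # X keeps 70% of Y; the rest is missing completely at random
info.Y = n.full / sigma^2                  # Fisher information for the mean from the full data Y
info.X = (p.retained * n.full) / sigma^2   # Fisher information from the proxy X
info.X / info.Y                            # ratio (7) equals p.retained = 0.7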

2.3 Regression Dependence from Copulas

Consider a two-dimensional joint (cumulative) distribution function F(X,Y) and the uniform variables U = F_1(X) and V = F_2(Y) obtained by probability integral transformations of the marginal distribution functions F_1 and F_2. Sklar proved in 1959 that a copula function C(F_1, F_2) = F is unique if the components are continuous. The copula function C: [0,1]² → [0,1] is subject to certain conditions forcing it to be a bivariate uniform distribution function. It is extended to the multivariate case to describe the dependence structure of the joint density. We have noted in Section 1.3 that a contingency table represents the joint dependence structure of row and column characteristics. Copulas represent similar joint dependence when the row and column characteristics are continuous variables rather than simple categories.

Dette et al. (2013) (“DSS13”) denote the joint distribution by F_{X,Y} and the conditional distribution of Y given X by F_{Y|X=x}. They use the uniform random variables U and V to construct the copula C as a joint distribution function. The copula serves as their measure of dependence based on the quality of the regression-based prediction of Y from X. The flipped prediction of X from Y, ignored by DSS13, is considered in Section 3 in the sequel.

DSS13 assume Lipschitz continuity, which implies that a copula is absolutely continuous in each argument, so that it can be recovered from any of its partial derivatives by integration. The conditional distribution F_{V|U=u} is related to the corresponding copula C_{X,Y} by F_{V|U=u}(v) = ∂_1 C_{X,Y}(u,v).

A symmetric measure of dependence proposed by DSS13 is denoted here as

r_{D}(X,Y) = 6\int_{0}^{1}\int_{0}^{1} F_{V|U=u}(v)^{2}\, dv\, du - 2, (8)

where r_D = 0 represents independence and r_D = 1 represents almost sure functional dependence. DSS13 focus on r_D filling the intermediate range of the closed interval [0,1], while ignoring the negative range [-1,0). Section 3 covers [-1,1], including the negative range. DSS13 rely on parametric copulas, making them subject to identification problems, as explained by Allen (2022).

The numerical computation of (8) is involved since it requires the estimation of the copula’s partial derivative. DSS13 authors propose a kernel-based estimation method without providing any ready-to-use computational tools for r_D.

Remark 3.7 in Beare (2010) states that symmetric copulas imply time reversibility, which is unrealistic for economic and financial data. Bouri et al. (2020) reject the symmetry dogma and note that their parametric copula can capture tail dependence, which is important in a study of financial markets. Allen (2022) uses nonparametric copula construction and the asymmetric R* from Vinod (2014). Allen’s application to financial data shows that cryptocurrencies do not help portfolio diversification.

2.4 Hellinger Correlation η as a Dependence Measure

Now we turn to the recent GM22 paper mentioned earlier, which proposes the Hellinger correlation η as a new symmetric measure of the strength of dependence. They need to normalize to ensure that η ∈ [0,1]. GM22 denote the normalized version as η̂. GM22 authors explain why the dependence axioms of Renyi (1959) need updating, while claiming that their η satisfies all updated axioms. Unfortunately, GM22 retain the symmetry axiom criticized in Section 1.1 above. An advantage of η over Pearson’s r_ij is that it incorporates some nonlinearities.

Let F_1 and F_2 denote the known marginal distributions of random variables X_1 and X_2, and let F_12 denote their joint distribution. Now, GM22 authors ask readers to imagine reconstructing the joint distribution from the two marginals. The un-intuitive (convoluted?) definition of the strength of dependence by GM22 is the size of the “missing link” in reconstructing the joint from the marginals. This definition allows GM22 to claim that symmetry is “unquestionable.”

GM22 authors define the squared Hellinger distance H²(X_1, X_2) as the missing link between F_12 and F_1 F_2. They approximate a copula formulation of H² using the Bhattacharyya (1943) affinity coefficient B. Let C_12 denote the copula of (X_1, X_2), and c_12 denote its density. The computation of η̂ in the R package HellCor uses the numerical integral B = ∫∫ √c_12. The Hellinger correlation η is

\eta = \frac{2}{\mathcal{B}^{2}} \{\mathcal{B}^{4} + (4 - 3\mathcal{B}^{4})^{1/2} - 2\}^{1/2}. (9)

The Hellinger correlation is symmetric, η(X_1, X_2) = η(X_2, X_1).

GM22 provide an R package HellCor to compute η̂ from data as a measure of dependence and to test the null hypothesis of independence of two variables.
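
As a usage note, a minimal sketch, assuming the HellCor package is installed and that its HellCor() function accepts two numeric vectors as in GM22’s own examples (exact argument and output names may differ by package version):

library(HellCor)
# Hellinger correlation between two automobile design variables (built-in mtcars data);
# print the returned object to inspect the estimated eta-hat
HellCor(mtcars$mpg, mtcars$hp)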

A direct and intuitive measure of dependence in a regression framework is the multiple correlation coefficient (of determination) R²_{1|2}. It is symmetric because even if we flip X_1 and X_2, the R²_{2|1} from linear regressions is exactly the same. The reason for the equality of the two flipped R² values, R²_{2|1} = R²_{1|2}, is the assumption of linearity of the two regressions. When we relax linearity, the two R² values generally differ, R²_{2|1} ≠ R²_{1|2}. We argue that quantitative researchers should reject the unrealistic linearity assumption in the presence of ready-to-use kernel regression (np package) software.

Kernel-based R² values of flipped regressions are rarely equal. GM22 cite Janzing et al. (2013) only to reject such asymmetric dependence suggested by nonparametric regressions.
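
A minimal R sketch of this contrast, using the built-in mtcars data for illustration and computing each R² simply as the squared correlation between the response and its fitted values (an assumption made here for brevity):

x1 = mtcars$mpg; x2 = mtcars$hp
# Linear case: the flipped regressions share one R-squared
c(summary(lm(x1 ~ x2))$r.squared, summary(lm(x2 ~ x1))$r.squared)  # identical values
# Kernel case (np package): the flipped R-squared values generally differ
library(np)
k12 = npreg(x1 ~ x2)   # kernel regression of x1 on x2 (bandwidth chosen by np)
k21 = npreg(x2 ~ x1)   # flipped kernel regression of x2 on x1
c(cor(x1, fitted(k12))^2, cor(x2, fitted(k21))^2)  # generally unequal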

3 Recommended Measures of Dependence

We have noted earlier that covariances satisfy the symmetry Cov(X_i, X_j) = Cov(X_j, X_i). However, the sign of the symmetric covariance suggests the overall direction of the dependence between the two variables. For example, Cov(X_i, X_j) < 0 means that when X_i goes up, X_j goes down, by and large. Most of the symmetric measures of dependence discussed above fail to provide this type of useful directional information, except Pearson’s correlation coefficient r_ij. Hence, r_ij has retained its popularity as a valuable measure of dependence for over a century, despite assuming unrealistic linearity.

Zheng et al. (2012) reject the dogma by introducing nonsymmetric generalized measures of correlation (GMC ∈ [0,1]), proving that

GMC(Y|X) \neq GMC(X|Y). (10)

Since GMCs fail to provide the directional information in covariances needed by practitioners, Vinod (2014) and Vinod (2017) extend Zheng et al. (2012) to develop a non-symmetric correlation matrix R* = {r*_ij}, where r*_ij ≠ r*_ji, while providing an R package. The R package generalCorr overcomes the linearity of r_ij by using kernel regressions from the np package by Hayfield and Racine, which can handle kernel regressions among both continuous and discrete variables.

Sometimes the research interest is focused on the strength of dependence, while the direction is ignored, perhaps because it is already established. In that case, one can use the R package generalCorr and its function depMeas(,). It is defined as the appropriately signed larger of the two generalized correlations, or

depMeas(X_{i}, X_{j}) = sgn * max(|r^{*}(i|j)|, |r^{*}(j|i)|), (11)

where sgn is the sign of the covariance between the two variables.

In general, both the strength and the general direction of quantitative dependence matter. Hence, we recommend the two asymmetric measures r*(X_i|X_j) and r*(X_j|X_i). The generalCorr package functions for computing R* elements are rstar(x,y) and gmcmtx0(mtx). The latter converts a data matrix argument (mtx) with p columns into a p × p asymmetric matrix R* of generalized correlation coefficients. Regarding the direction of dependence, the convention is that the variable named in the column is the “cause” or the right-hand regressor, and the variable named along the row is the response. Thus, the recommended measures from R* are easy to compute. See an application to forecasting the stock market index of fear (VIX) and causal path determination in Allen and Hooper (2018).
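
For example, a brief usage sketch with two mtcars variables, assuming the generalCorr package is installed (the same pair is analyzed in Section 4):

library(generalCorr)
mtx = cbind(mpg = mtcars$mpg, hp = mtcars$hp)
gmcmtx0(mtx)                    # 2 x 2 asymmetric matrix R*; the column variable is the conditioning "cause"
rstar(mtcars$mpg, mtcars$hp)    # generalized correlations for the single pair
depMeas(mtcars$mpg, mtcars$hp)  # signed larger of the two, as in (11)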

3.1 Statistical Inference for Recommended Measures

We recommend the signed generalized correlation coefficients -1 ≤ r*_ij ≠ r*_ji ≤ 1 from the R* matrix as the best dependence measures. This is because they do not adhere to the potentially misleading symmetry dogma while measuring arbitrary nonlinear dependence dictated by the data. An additional reason is their potential for more powerful (one-tail) inference, discussed in this section.

The sign of each element of the R* matrix is based on the sign of the covariance Cov_ij = Cov(X_i, X_j). A two-tail test of significance is appropriate only when Cov_ij ≈ 0. Otherwise, a one-tail test is appropriate. Any one-tailed test provides greater power to detect an effect in one direction by not testing the effect in the other direction; see Kendall and Stuart (1977), sections 22.24 and 22.28.

Since the sample correlation coefficient r_ij from a bivariate normal parent has a non-normal distribution, Fisher developed his famous z-transformation in the 1920s. He proved that the following transformed statistic r^T_ij is approximately normal with a stable variance,

r^{T}_{ij} = (1/2)\, \log\frac{(1+r_{ij})}{(1-r_{ij})} \sim N\left((1/2)\log\frac{1+\rho}{1-\rho},\ \frac{1}{n-3}\right), (12)

provided |r_ij| ≠ 1. Recent work has developed the exact distribution of a correlation coefficient. It is now possible to directly compute a confidence interval for any hypothesized value ρ of the population correlation coefficient.
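
In R, the traditional Fisher z test is a one-line computation (a sketch; atanh() implements the transform in (12), and the numbers are chosen only for illustration):

# Approximate two-tail Fisher z test of H0: rho = 0, e.g., for r = -0.78 and n = 32
r = -0.78; n = 32
z = atanh(r)                 # (1/2) log((1+r)/(1-r))
se = 1 / sqrt(n - 3)         # approximate standard error of the transform
2 * pnorm(-abs(z / se))      # two-tail p-value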

Let r be the empirical correlation of a random sample of size n from a bivariate normal parent. Theorem 1 of Taraldsen (2021) generalizes Fisher’s famous z-transformation as extended by C. R. Rao. The exact density with v = (n-1) > 1 is

f(\rho|r,v) = \frac{v(v-1)\Gamma(v-1)}{\sqrt{2\pi}\,\Gamma(v+0.5)}\, (1-r^{2})^{\frac{v-1}{2}}\, (1-\rho^{2})^{\frac{v-2}{2}}\, (1-r\rho)^{\frac{1-2v}{2}}\, F\!\left(\frac{3}{2}; -\frac{1}{2}; v+\frac{1}{2}; \frac{1+r\rho}{2}\right),

where F(.;.;.;.) denotes the Gaussian hypergeometric function, available in the R package hypergeo by R. K. S. Hankin. The following R code readily computes (3.1) over a grid of 2001 r values.

library(hypergeo); r=seq(-1,1,by=0.001)  # grid of 2001 correlation values
Tarald=function(r,v,rho,cum){ #find quantile r given cumulative probability cum
Trm1=(v*(v-1)*gamma(v-1))/((sqrt(2*pi)*gamma(v+0.5)))  # constant of the density (3.1)
Trm2=(1-r^2)^((v-1)/2)
Trm2b=((1-rho^2)^((v-2)/2))*((1-rho*r)^((1-2*v)/2))
Trm3b=hypergeo(3/2,-1/2,(v+0.5),(1+r*rho)/2)  # Gaussian hypergeometric term
y0=Re(Trm1*Trm2*Trm2b*Trm3b)  # density heights over the grid
p=y0/sum(y0)                  # rescale heights to probabilities
cup=cumsum(p)                 # numerical cumulative distribution
loc=max(which(cup<cum))+1     # first grid point reaching the cumulative probability
return(r[loc])}
Tarald(r=seq(-1,1,by=0.001),v=11,rho=0,cum=0.05) #example: n=12, 5% left-tail quantile

Assuming that the data come from a bivariate normal parent, the sampling distribution of any correlation coefficient is (3.1). Hence, the sampling distribution of the unequal off-diagonal elements of the matrix of generalized correlations R* also follows (3.1). When we test the null hypothesis H_0: ρ = 0, the relevant sampling distribution is obtained by plugging ρ = 0 into (3.1), depicted in Figure 1 for two selected sample sizes. Both distributions are centered at the null value ρ = 0.

A two-tail (95%, say) confidence interval is obtained by using the 2.5% and 97.5% quantiles of the density. If the observed correlation coefficient r is inside the confidence interval, we say that the observed r is statistically insignificant, as it could have arisen from a population where the null value ρ = 0 holds.

Figure 1: Taraldsen’s exact sampling density of a correlation coefficient under the null of ρ=0\rho=0, solid line n=50, dashed line n=15

Similarly, one can test the nonzero null hypothesis H_0: ρ = 0.5 using the density obtained by plugging ρ = 0.5 into (3.1), depicted in Figure 2.

Figure 2: Taraldsen’s exact sampling density of correlation coefficient under the null of ρ=0.5\rho=0.5, solid line n=50, dashed line n=15

Figures 1 and 2 show that the formula (3.1) and our numerical implementation are ready for practical use. These exact densities depend on the sample size and on the value of the population correlation coefficient, -1 ≤ ρ ≤ 1. Given any hypothesized ρ and sample size, a computer algorithm readily computes the exact density, similar to Figures 1 and 2. Suppose we want to help typical practitioners who want the tail areas useful for testing the null hypothesis ρ = 0. Then we need a table of typical quantiles evaluated at selected cumulative probabilities for a set of common sample sizes with ρ = 0 fixed.

Because of the complicated form of the density (3.1), it is not surprising that its (cumulative) distribution function ∫_{-1}^{r} f(ρ|r,v) obtained by analytical methods is not available in the literature. Hence, let us compute cumulative probabilities by numerical integration, defined as the rescaled area under the curve f(r,v) for ρ = 0. See Figure 1 for two choices of v (= n-1) for sample sizes (n = 50, 15). The cumulative probability becomes a sum of rescaled areas of small-width rectangles whose heights are determined by tracing the curve f(r,v). The accuracy of the numerical approximation to the area is obviously better, the larger the number of rectangles.

We use a sequence of r ∈ [-1,1] created by the R command r=seq(-1,1, by=0.001), yielding 2001 rectangles. Denote the height of f(r,v) by H_f = H_{f(r,v)}. The area between any two limits r_Lo and r_Up in [-1,1] is a summation of the areas (height times width = 0.001) of all rectangles between them. The cumulative probability in that range is

\Sigma_{r_{Lo}}^{r_{Up}} H_{f} / \Sigma_{-1}^{1} H_{f}, (14)

where the common width cancels, and where the denominator Σ_{-1}^{1} H_f converts the rectangle areas into probabilities. More generally, we can use f(ρ, r, v) for any ρ ∈ [-1,1].

Thus we have a numerical approximation to the exact (cumulative) distribution function under the bivariate normality of the parent,

F(\rho, r, v) = \int_{-1}^{r} f(\rho|r, v)\, dr.

The transform from f(.) to F(.) is called the probability integral transform, and its inverse F^{-1}(c|ρ, v) gives the relevant correlation coefficients r as quantiles for a specified cumulative probability c as the argument. A computer algorithm can readily find such quantiles.

The exact F^{-1}(c|ρ, v) allows the construction of confidence intervals based on quantiles for each ρ and sample size. For example, a 95% two-tail confidence interval uses the 2.5% quantile F^{-1}(c=0.025) as the lower limit and the 97.5% quantile F^{-1}(c=0.975) as the upper limit. These limits depend on the hypothesized ρ and the sample size. Since ρ = 0 is a common null hypothesis for correlation coefficients, we provide a table of F^{-1}(c) quantiles for eleven sample sizes (listed in the row names) and eight cumulative probabilities listed in the column titles of Table 1.

The p-values in statistical inference are defined as the probability of observing the random variable (correlation coefficient) as extreme or more extreme than the observed value of the correlation coefficient r for a given null value ρ = 0. One-tail p-values based on f(ρ|r,v) of (3.1) for arbitrary nonzero “null” values of ρ can be similarly computed by numerical integration, defined as the area under the curve. Code for the R functions Tarald(.) and pTarald(.) is included in Sections 3.1 and 4, respectively.
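
The entries of Table 1 can be reproduced directly from the Tarald() function listed above; for example (a sketch, assuming that code has already been run):

r = seq(-1, 1, by = 0.001)
Tarald(r, v = 29, rho = 0, cum = 0.975)  # n = 30: upper limit of a two-tail 95% interval (about 0.35)
Tarald(r, v = 99, rho = 0, cum = 0.95)   # n = 100: one-tail 95% critical value (about 0.16)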

Table 1: Correlation coefficients as quantiles evaluated at specified cumulative probabilities (c) using Taraldsen’s exact sampling distribution for various sample sizes, assuming ρ = 0

        c=0.01  c=0.025  c=0.05  c=0.1   c=0.9   c=0.95  c=0.975  c=0.99
n=5     -0.83   -0.75    -0.67   -0.55   0.55    0.67    0.75     0.83
n=10    -0.66   -0.58    -0.50   -0.40   0.40    0.50    0.58     0.66
n=15    -0.56   -0.48    -0.41   -0.33   0.33    0.41    0.48     0.56
n=20    -0.49   -0.42    -0.36   -0.28   0.28    0.36    0.42     0.49
n=25    -0.44   -0.38    -0.32   -0.26   0.26    0.32    0.38     0.44
n=30    -0.41   -0.35    -0.30   -0.23   0.23    0.30    0.35     0.41
n=40    -0.36   -0.30    -0.26   -0.20   0.20    0.26    0.30     0.36
n=70    -0.27   -0.23    -0.20   -0.15   0.15    0.20    0.23     0.27
n=90    -0.24   -0.20    -0.17   -0.14   0.14    0.17    0.20     0.24
n=100   -0.23   -0.20    -0.16   -0.13   0.13    0.16    0.20     0.23
n=150   -0.19   -0.16    -0.13   -0.10   0.10    0.13    0.16     0.19

For the convenience of practitioners, we explain how to use the cumulative probabilities in Table 1 in the context of testing the null hypothesis ρ = 0. The Table confirms that the distribution is symmetric around ρ = 0, as in Figure 1. Let us consider some examples. If n = 100, the critical value from Table 1 for a one-tail 95% test is 0.16 (row n=100, column c=0.95). Let the observed positive r be 0.3. Since r exceeds the critical value (r > 0.16), we reject ρ = 0. If n = 25, the critical value for a 5% left tail in Table 1 is -0.32. If the observed r = -0.44 is less than the critical value -0.32, it falls in the left tail, and we reject ρ = 0 to conclude that the correlation is significantly negative.

Table 1 can also be used to construct two-tail 95% confidence intervals. If the sample size is 30, we look along the row n=30: column c=0.025 gives -0.35 as the lower limit, and column c=0.975 gives 0.35 as the upper limit. In other words, for n = 30, any correlation coefficient smaller than 0.35 in absolute value is statistically insignificant.

If the standard bivariate normality assumption is not believed, one can use the maximum entropy bootstrap (R package meboot) designed for dependent data. A bootstrap application creates a large number, say J = 999, of versions of the data (X_{iℓ}, X_{jℓ}) for ℓ = 1, ..., J. Each version yields r*(i|j; ℓ) and r*(j|i; ℓ) values. The large set of J replicates of these correlations gives a numerical approximation to the sampling distribution of these correlations. Note that such a bootstrap sampling distribution is data-driven. It does not assume the bivariate normality needed for the construction of Table 1 based on (3.1).

Sorting the replicated r*(i|j; ℓ) and r*(j|i; ℓ) values from the smallest to the largest, one gets their “order statistics,” denoted by replacing ℓ with (ℓ). Now a left-tail 95% confidence interval for r*(i|j) leaves a 5% probability mass in the left tail. The interval is approximated by the order statistics as [r*(i|j; (50)), 1]. If the hypothesized ρ = 0 is inside the one-tail interval, one fails to reject (accepts) the null hypothesis H_0: ρ = 0.
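
A hedged sketch of this bootstrap, assuming the meboot and generalCorr packages are installed (the mtcars pair is used purely for illustration; the loop over kernel regressions is slow, so reduce J for a quick trial):

library(meboot); library(generalCorr)
x = mtcars$mpg; y = mtcars$hp
J = 999
xb = meboot(x, reps = J)$ensemble   # maximum entropy bootstrap replicates of x (one column each)
yb = meboot(y, reps = J)$ensemble
rep.rstar = numeric(J)
for (ell in 1:J) {                  # r*(x|y) from each bootstrap replicate
  rep.rstar[ell] = gmcmtx0(cbind(x = xb[, ell], y = yb[, ell]))[1, 2] }
quantile(rep.rstar, c(0.05, 0.025, 0.975))  # order-statistic interval limits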

We conclude this section by noting that the recommended measures of dependence based on the R* matrix and their formal inference are easy to implement. The tabulation of Taraldsen’s exact sampling distribution of correlation coefficients in Table 1 is new and should be of broader applicability. It is an improvement over standard significance tests of correlation coefficients based on Fisher’s z-transform. The next section illustrates with examples the use of Table 1, the newer dependence measures, and other inference tools.

4 Dependence Measure Examples & Tests

This section considers some examples of dependence measures. Our first example deals with fuel economy in automobile design. R software comes with the ‘mtcars’ data on ten aspects of automobile design and performance for 32 automobiles. We consider two design features for illustration, miles per gallon (mpg) and horsepower (hp). Vinod (2014) reports the Pearson correlation coefficient r(mpg, hp) = -0.78 in his Figure 2. The negative sign correctly shows that one gets reduced mpg when a car has larger horsepower hp. Table 2 in Vinod (2014) reports the two generalized correlation coefficients obtained by using kernel regressions as r*(mpg|hp) = -0.938 and r*(hp|mpg) = -0.853.

One can interpret these r*(X_i|X_j) values as signed strengths of the dependence of X_i on the conditioning variable X_j. The strengths are asymmetric, |r*(mpg|hp)| > |r*(hp|mpg)|, and both generalized correlation coefficients are larger in absolute value than the dependence strength suggested under linearity. Thus, Pearson’s correlation coefficient can underestimate dependence by assuming linearity.

For the ‘mtcars’ data, depMeas based on (11) is -0.938. Now consider Table 1, row n=30 and column c=0.05, for a one-tail critical value of -0.30. The observed correlation -0.938 is obviously in the left tail (rejection region) of the exact sampling distribution of the correlation coefficient. Thus, the negative dependence of fuel economy (mpg) on the car’s horsepower is statistically significant. We re-confirm the significance by computing the one-tail p-value (= 1e-16) using the R function pTarald(.). Our R code for p-values from Taraldsen’s exact density of correlation coefficients is given next.

pTarald=function(r,n,rho,obsr){ # one-tail p-value for observed correlation obsr; requires library(hypergeo)
v=n-1
if(v<=164)  Trm1=(v*(v-1)*gamma(v-1))/((sqrt(2*pi)*gamma(v+0.5)))
if(v>164)  Trm1=(164*(163)*gamma(163))/((sqrt(2*pi)*gamma(163.5)))  # winsorize to avoid gamma overflow
Trm2=(1-r^2)^((v-1)/2)
if(rho!=0)  Trm2b=((1-rho^2)^((v-2)/2))*((1-rho*r)^((1-2*v)/2))
if(rho==0)  Trm2b=1
Trm3b=Re(hypergeo(3/2,-1/2,(v+0.5),(1+r*rho)/2))
y0=Re(Trm1*Trm2*Trm2b*Trm3b)  # density heights over the grid of r values
p=y0/sum(y0)                  # rescale heights to probabilities
cup=cumsum(p)                 # numerical cumulative distribution
loc=max(which(r<obsr))+1      # grid location of the observed correlation
if(obsr<0) ans=cup[loc]       # left-tail p-value for a negative observed r
if(obsr>=0) ans=1-cup[loc]    # right-tail p-value otherwise
return(ans)}
pTarald(r=seq(-1,1,by=0.001),n=32,rho=0,obsr=-0.938)

The first term (Trm1) in the R function computing the p-values involves a ratio of two gamma (factorial) functions appearing in (3.1). For n > 164, each gamma becomes infinitely large, and Trm1 becomes ‘NaN’ or not a number. Our code winsorizes such large n values.

Since the mtcars data have n = 32 and the observed generalized correlation is r* = -0.938, we use the command on the last line of the code to get the p-value of 1e-16, or extremely small, suggesting statistical significance. If we analyze the same automobile data using GM22’s R package HellCor, we find that η = 0.845 > 0, giving no hint that mpg and hp are negatively related. If we compare numerical magnitudes, we have η > |r(mpg, hp)| = 0.78. Since η exceeds Pearson’s correlation in absolute value, η is seen to incorporate nonlinear dependence. However, η = 0.845 may still be an underestimate of the absolute value of depMeas = -0.938. We fear that η may either be failing to incorporate some nonlinear dependence or be paying an unknown penalty for adhering to the symmetry dogma.

4.1 Further Real-Data Applications in GM22

GM22 illustrate the Hellinger correlation measure of dependence using two sets of data where the Pearson correlation is statistically insignificant, yet their Hellinger correlation is significant. Their first data set refers to the population of seabirds and coral reef fish residing around n = 12 islands in the British Indian Ocean Territory of Chagos Archipelago. Ecologists and other scientists cited by GM22 have determined that fish and seabirds have an ecologically symbiotic relationship. The seabirds create an extra nutrient supply to help algae. Since fish primarily feed on those algae, the two variables should have a significantly positive dependence.

GM22 begin with the low Pearson correlation r(fish, seabirds) = 0.374 and a 95% confidence interval [-0.2548, 0.7803] that contains zero, suggesting no significant dependence. The p-value using pTarald(..,obsr=0.374) is 0.0935, which exceeds the benchmark of 0.05, confirming statistical insignificance. The wide confidence interval, which includes zero, is partly due to the small sample size (n = 12).

Our Table 1 with the exact distribution of correlations suggests that when n = 10 (more conservative than the correct n = 12), the exact two-tail 95% confidence interval (leaving 2.5% probability mass in both tails) also has a wide range, [-0.58, 0.58], which includes zero. Assuming the direction is known, the one-tail critical value with 5% in the right tail (n = 10) is 0.50. That is, only when the observed correlation is larger than 0.50 is it significantly positive (assuming a bivariate normal parent density).
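
The conservative rounding to n = 10 is avoidable: the Tarald() function of Section 3.1 gives the exact n = 12 cut-offs directly (a sketch, assuming that function has been loaded):

r = seq(-1, 1, by = 0.001)
Tarald(r, v = 11, rho = 0, cum = 0.95)   # right-tail 5% critical value for n = 12
Tarald(r, v = 11, rho = 0, cum = 0.975)  # upper limit of a two-tail 95% interval for n = 12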

Figure 3: Marginal densities of fish and seabirds data are skewed, not Normal

GM22 find that their Hellinger correlation η needs to be normalized to ensure that η ∈ [0,1], because their estimate of B can exceed unity. They denote the normalized version as η̂ and claim an easier interpretation of η̂ on the “familiar Pearson scale,” though Pearson’s scale r_ij ∈ [-1,1] admits negative values. GM22 employ considerable ingenuity to achieve the positive range [0,1] described in their Section 5.3. They state on page 650 that their range normalization “comes at the price of a lower power when it comes to test for independence.”

Using the population of seabirds and coral reef fish residing around the n = 12 islands, GM22 report the estimate η̂(fish, seabirds) = 0.744. If one assumes a bivariate normal parent distribution and uses Taraldsen’s exact density from Table 1, η̂(fish, seabirds) = 0.744 > 0.50 suggests statistical significance. The p-value using pTarald(..,obsr=0.744) is 0.0027, which is smaller than the benchmark 0.05, confirming significance.

In light of Figure 3, it is unrealistic to assume that the data come from a bivariate normal parent distribution. Hence, the evidence showing a significantly positive correlation between fish and seabirds based on Taraldsen’s exact density is suspect. Accordingly, GM22 report a bootstrap p-value of 0.045 < 0.05 as their evidence. Since this p-value is too close to 0.05, we check for unintended p-hacking. When one runs their HellCor(.) function with set.seed(99) and default settings, the bootstrap p-value becomes 0.0513 > 0.05, which exceeds the benchmark, suggesting an insignificant η̂(fish, seabirds). Then GM22’s positive Hellinger correlation estimate of η̂ = 0.744 is not statistically significant at the usual 95% level. Thus, the Hellinger correlation fails to be strongly superior to Pearson’s correlation r because η̂ is also insignificantly positive.

Now, let us compare η̂ with the off-diagonal elements of the generalized correlation matrix R* recommended here. Our gmcmtx0(cbind(fish, seabirds)) suggests that the “causal” direction (seabirds → fish) is also positive, r*(fish|seabirds) = 0.6687. The p-value using pTarald(..,obsr=0.6687) is 0.0086, which is smaller than the benchmark 0.05, confirming significance. There is no suspicion of p-hacking here. A 95% bootstrap two-tail confidence interval using the meboot R package is [0.3898, 0.9373]. A one-tail interval is [0.4394, 1], which includes the observed 0.6687, with a p-value of zero. See Figure 4, where almost the entire density has positive support. Note that the interval does not include zero, suggesting significant positive dependence consistent with what the ecologists expect. The lower limit of our meboot confidence interval is not close to zero. More importantly, our R* generalized correlation coefficients do not impose symmetric dependence and reveal the sign information borrowed from the covariance, absent in the Hellinger correlation.

Figure 4: Bootstrap density of generalized correlation coefficient r*(seabirds, fish).

The second example in GM22 has the number of births (X_1) and deaths (X_2) per year per 1000 individuals in n = 229 countries in 2020. A data scatterplot in their Figure 7 displays a C-shaped nonlinear relation. Pearson’s correlation r_12 = -0.13 is negative and insignificant at level α = 0.05. This is based on a two-tail traditional Fisher approximation to the sampling distribution of a correlation coefficient. It is reversed by our more powerful one-tail p-value using Taraldsen’s exact sampling distribution. Our pTarald(..,n=229, obsr=-0.13) is 0.0246 < 0.05, implying a statistically significant negative correlation. On the other hand, GM22 estimate η̂ = 0.69 with a two-tail 95% bootstrap confidence interval [0.474, 0.746], hiding important information about the negative direction of dependence. Since zero is outside the confidence interval, GM22 claim that they have correctly overcome an apparently incorrect inference based on traditional methods. We have shown that the traditional inference was incorrect only because the more accurate Taraldsen distribution was not used.

Our gmcmtx0(cbind(birth, death)) estimates that r*(death|birth) = -0.6083. A one-tail 95% confidence interval using the maximum entropy bootstrap (R package meboot) is [-1, -0.5693]. A somewhat less powerful two-tail interval, [-0.6251, -0.5641], is also entirely negative. The null hypothesis states that the true unknown r* is zero. Since our random interval excludes zero, the dependence is significantly negative. The p-value is zero in Figure 5, since almost the entire density has negative support. A larger birth rate significantly leads to a lower death rate in the 229 countries in 2020.

Figure 5: Bootstrap density of generalized correlation coefficient r*(birth, death).

In summary, the two examples used by GM22 to sell their Hellinger correlation show a discernible advantage over Pearson’s r_ij, but not over our generalized correlation matrix R*. The examples confirm four shortcomings of the Hellinger correlation η̂ relative to R*. (a) It imposes an unrealistic symmetry assumption. (b) It provides no information about the direction of dependence. (c) It forces the use of less powerful two-tail confidence intervals. (d) It is currently not implemented for discrete variables.

5 Final Remarks

Many scientists are interested in measuring the directions and strengths of dependence between variables. This paper surveys quantitative measures of dependence between two variables. We use four real-world examples in Section 1.1 to show that any symmetric measure of dependence alleging equal strength in both directions is unacceptable. Yet the majority of statistical dependence measures extant in the literature adhere to the symmetry dogma. A 2022 paper (GM22) proposing the Hellinger correlation develops yet another symmetric, and hence intrinsically flawed, measure of dependence.

We show that the off-diagonal elements of the asymmetric R* matrix of generalized correlation coefficients provide an intuitively sensible measure of dependence after incorporating nonlinear and nonparametric relations among the variables involved. The R package generalCorr makes it easy to implement our proposal. Its six vignettes provide ample illustrations of the theory and applications of R*.

We discuss statistical inference for the elements of the R* matrix, providing a new Table 1 of quantiles of Taraldsen’s (2021) exact density of a correlation coefficient for eleven typical sample sizes and eight cumulative probabilities. We illustrate with the two data sets used by GM22 to support their Hellinger correlation. Directional information is uniquely provided by our asymmetric measure of dependence in the form of generalized correlation coefficients {r*(i|j)}. It allows the researcher to achieve somewhat better qualitative results and more powerful one-tail tests compared to the symmetric measures of dependence in the literature.

We claim that one-tail p-values from Taraldsen’s density can overcome the inaccuracy of traditional Pearson correlation inference based on Fisher’s z-transform. We illustrate the claim using GM22’s second example, where the Pearson correlation r(birth, death) is shown to be significantly negative using Taraldsen’s density. Hence, the complicated Hellinger correlation inference is not really needed to achieve correct significance. Interestingly, both hand-picked examples designed to show the superiority of GM22’s η̂ over r_ij also show the merit of our proposal based on R* over η̂.

Almost every issue of every quantitative journal refers to correlation coefficients at least once, underlining their importance in measuring dependence. We hope that R* and our implementation of Taraldsen’s exact sampling distribution of correlation coefficients receive further attention and development.

References

  • Allen (2022) Allen, D. E. (2022), “Cryptocurrencies, Diversification and the COVID-19 Pandemic,” Journal of Risk and Financial Management, 15, URL https://www.mdpi.com/1911-8074/15/3/103.
  • Allen and Hooper (2018) Allen, D. E. and Hooper, V. (2018), “Generalized Correlation Measures of Causality and Forecasts of the VIX Using Non-Linear Models,” Sustainability, 10, 1–15, URL https://www.mdpi.com/2071-1050/10/8/2695.
  • Beare (2010) Beare, B. K. (2010), “Copulas and Temporal Dependence,” Econometrica, 78(1), 395–410.
  • Bhattacharyya (1943) Bhattacharyya, A. (1943), “On a Measure of Divergence Between Two Statistical Populations Defined by Their Probability Distributions,” Bulletin of the Calcutta Mathematical Society, 35, 99–109.
  • Bouri et al. (2020) Bouri, E., Shahzad, S. J. H., Roubaud, D., Kristoufek, L., and Lucey, B. (2020), “Bitcoin, gold, and commodities as safe havens for stocks: New insight through wavelet analysis,” The Quarterly Review of Economics and Finance, 77, 156–164, URL https://www.sciencedirect.com/science/article/pii/S1062976920300326.
  • Dette et al. (2013) Dette, H., Siburg, K. F., and Stoimenov, P. A. (2013), “A Copula-Based NonParametric Measure of Regression Dependence,” Scandinavian Journal of Statistics, 40, 21–41.
  • Geenens and de Micheaux (2022) Geenens, G. and de Micheaux, P. L. (2022), “The Hellinger Correlation,” Journal of the American Statistical Association, 117 (538), 639–653, DOI: 10.1080/01621459.2020.1791132.
  • Granger et al. (2004) Granger, C. W. J., Maasoumi, E., and Racine, J. (2004), “A Dependence Metric for Possibly Nonlinear Processes,” Journal of Time Series Analysis, 25, 649–669.
  • Janzing et al. (2013) Janzing, D., Balduzzi, D., Grosse-Wentrup, M., and Schölkopf, B. (2013), “Quantifying Causal Influences,” The Annals of Statistics, 41, 2324–2358.
  • Kendall and Stuart (1977) Kendall, M. and Stuart, A. (1977), The Advanced Theory of Statistics, vol. 2, New York: Macmillan Publishing Co., 4th ed.
  • Reimherr and Nicolae (2013) Reimherr, M. and Nicolae, D. L. (2013), “On Quantifying Dependence: A Framework for Developing Interpretable Measures,” Statistical Science, 28, 116–130, URL https://doi.org/10.1214/12-STS405.
  • Renyi (1959) Renyi, A. (1959), “On Measures of Dependence,” Acta Mathematica Academiae Scientiarum Hungarica, 10, 441–451.
  • Taraldsen (2021) Taraldsen, G. (2021), “Confidence in Correlation,” preprint, 1–7, DOI: 10.13140/RG.2.2.23673.49769.
  • Tjostheim (1996) Tjostheim, D. (1996), “Measures and tests of independence: a survey,” Statistics, 28, 249–284, URL https://arxiv.org/pdf/1809.10455.pdf.
  • Vinod (2014) Vinod, H. D. (2014), “Matrix Algebra Topics in Statistics and Economics Using R,” in “Handbook of Statistics: Computational Statistics with R,” , eds. Rao, M. B. and Rao, C. R., New York: North-Holland, Elsevier Science, vol. 34, chap. 4, pp. 143–176.
  • Vinod (2017) — (2017), “Generalized correlation and kernel causality with applications in development economics,” Communications in Statistics - Simulation and Computation, 46, 4513–4534, available online: 29 Dec 2015, URL https://doi.org/10.1080/03610918.2015.1122048.
  • Zheng et al. (2012) Zheng, S., Shi, N.-Z., and Zhang, Z. (2012), “Generalized Measures of Correlation for Asymmetry, Nonlinearity, and Beyond,” Journal of the American Statistical Association, 107, 1239–1252.