This paper was converted on www.awesomepapers.org from LaTeX by an anonymous user.
Want to know more? Visit the Converter page.

BEAUTY Powered BEAST

Kai Zhang 111Kai Zhang is Associate Professor (E-mail: [email protected]), Department of Statistics and Operations Research, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599.    Wan Zhang 222Wan Zhang is Assistant Professor (E-mail: [email protected]), The Academy of Mathematics and System Science Center for Forecasting Science, Chinese Academy of Sciences, Beijing, China.    Zhigen Zhao 333Zhigen Zhao is Associate Professor (E-mail: [email protected]), Department of Statistical Science, Temple University, Philadelphia, PA 19121.    Wen Zhou 444Wen Zhou is Associate Professor (E-mail: [email protected]), Department of Biostatistics, School of Global Public Health, New York University, New York, NY 10003.
Abstract

We study distribution-free goodness-of-fit tests with the proposed Binary Expansion Approximation of UniformiTY (BEAUTY) approach. This method generalizes the renowned Euler’s formula, and approximates the characteristic function of any copula through a linear combination of expectations of binary interactions from marginal binary expansions. This novel theory enables a unification of many important tests of independence via approximations from specific quadratic forms of symmetry statistics, where the deterministic weight matrix characterizes the power properties of each test. To achieve a robust power, we examine test statistics with data-adaptive weights, referred to as the Binary Expansion Adaptive Symmetry Test (BEAST). For any given alternative, we demonstrate that the Neyman-Pearson test can be approximated by an oracle weighted sum of symmetry statistics. The BEAST with this oracle provides a useful benchmark of feasible power. To approach this oracle power, we devise the BEAST through a regularized resampling approximation of the oracle test. The BEAST improves the empirical power of many existing tests against a wide spectrum of common alternatives and delivers a clear interpretation of dependency forms when significant.

Keywords— Nonparametric Inference; Goodness-of-fit Test; Test of Independence; Resampling; Characteristic Function; Euler’s Formula

1 Introduction

As we enter the era of data dominance, the prevalence of complex datasets presents significant challenges to traditional parametric inference methods, which often lose efficacy in practical applications, as flawless models are rarely attainable through scientific theories alone. In contrast, nonparametric methods deliver more robust inference, rendering them increasingly appealing for practical use. Hypotheses addressing particular structures of distributions are classified as goodness-of-fit tests (Lehmann and Romano, 2006). Notable progress within this domain is extensively documented in the literature, as exemplified by moment or cumulant based methods (Anderson and Darling, 1954; Stephens, 1976), likelihood ratio based methods (Cressie and Read, 1984; Fan and Huang, 2001; Zhang, 2002), empirical process based methods (Genest et al., 2006; Escanciano, 2006; Jager and Wellner, 2007; Genest et al., 2009; Kaiser and Soumendra, 2012), kernel based methods (González-Manteiga and Crujeiras, 2013; Sen and Sen, 2014), and randomization based methods (Janková et al., 2020; Kim and Ramdas, 2020; Barber and Janson, 2022), along with the references cited therein.

Without loss of generality, we consider a pp-dimensional distribution in [1,1]p[-1,1]^{p} for notation convenience. Let 𝑼=(U1,,Up)T\bm{U}=({U_{1},\ldots,U_{p}})^{T} denote a pp-dimensional vector whose marginal distributions are continuous and whose joint distribution 𝐏𝑼\mathbf{P}_{\bm{U}} has a support within [1,1]p[-1,1]^{p}. With the above notation, the goodness-of-fit test for some hypothesized distribution 𝐏\mathbf{P} can be written as follows:

H0:𝐏𝑼=𝐏v.s.H1:Dist(𝐏𝑼,𝐏)δH_{0}:\mathbf{P}_{\bm{U}}=\mathbf{P}~{}~{}v.s.~{}~{}H_{1}:\text{Dist}(\mathbf{P}_{\bm{U}},\mathbf{P})\geq\delta (1.1)

for some distance Dist(,)\text{Dist}(\cdot,\cdot) between distributions and some 0<δ1.0<\delta\leq 1. Some common choices of Dist(,)\text{Dist}(\cdot,\cdot) include the total variation (TV) distance TV(,)\text{TV}(\cdot,\cdot) (Zhang, 2019) and the L2L_{2} distance (Berrett et al., 2021).

Two important special cases of goodness-of-fit tests are the test of uniformity and the test of independence. The test of uniformity can be formulated as

H0:𝐏𝑼=𝐏0v.s.H1:Dist(𝐏𝑼,𝐏0)δ,H_{0}:\mathbf{P}_{\bm{U}}=\mathbf{P}_{0}~{}~{}v.s.~{}~{}H_{1}:\text{Dist}(\mathbf{P}_{\bm{U}},\mathbf{P}_{0})\geq\delta, (1.2)

where 𝐏0=Unif[1,1]p\mathbf{P}_{0}=\text{Unif}[-1,1]^{p} denotes the uniform distribution. The test of independence can be formulated as

H0:𝐏(𝑼1,𝑼2)=𝐏1×𝐏2v.s.H1:Dist(𝐏(𝑼1,𝑼2),𝐏1×𝐏2)δ,H_{0}:\mathbf{P}_{(\bm{U}_{1},\bm{U}_{2})}=\mathbf{P}_{1}\times\mathbf{P}_{2}~{}~{}v.s.~{}~{}H_{1}:\text{Dist}(\mathbf{P}_{(\bm{U}_{1},\bm{U}_{2})},\mathbf{P}_{1}\times\mathbf{P}_{2})\geq\delta, (1.3)

where 𝑼1\bm{U}_{1} and 𝑼2\bm{U}_{2} are p1p_{1} and p2p_{2} dimensional random vectors with distributions 𝐏1\mathbf{P}_{1} and 𝐏2\mathbf{P}_{2} respectively. Applications of the test of uniformity include Diaz Rivero and Dvorkin (2020); Liang et al. (2001). Important developments in distribution-free tests of independence include cumulative distribution function (CDF) based methods (Hoeffding, 1948; Blum et al., 1961; Genest and Verret, 2005; Kojadinovic and Holmes, 2009; Genest et al., 2019; Chatterjee, 2020; Cao and Bickel, 2020), kernel based methods (Székely et al., 2007; Gretton et al., 2007; Zheng et al., 2012; Sejdinovic et al., 2014; Pfister et al., 2016; Zhu et al., 2017; Jin and Matteson, 2018; Balakrishnan and Wasserman, 2019a; Shi et al., 2020; Deb et al., 2020; Geenens and de Micheaux, 2020; Berrett et al., 2021; Berrett and Samworth, 2021), binning based methods (Miller and Siegmund, 1982; Reshef et al., 2011; Heller et al., 2013; Kinney and Atwal, 2014; Heller et al., 2016; Heller and Heller, 2016; Ma and Mao, 2019; Lee et al., 2023), and references therein.

To facilitate the analysis of large datasets, some desirable attributes of distribution-free tests of independence include (a) a robust high power against a wide range of alternatives, (b) a clear interpretation of the form of dependency upon rejection, and (c) a computationally efficient algorithm. An example of recent development towards these goals is the binary expansion testing (BET) framework and the Max BET procedure in Zhang (2019). It was shown that the Max BET is minimax optimal in power under mild conditions, has clear interpretability of statistical significance and is implemented through computationally efficient bitwise operations Zhao et al. (2023b). Potential improvements of the Max BET include the followings: (a) The procedure is only univariate and needs to be generalized to higher dimensions. (b) The multiplicity correction is through the conservative Bonferrnoni procedure, which leaves room for further enhancement of power. In Lee et al. (2023), random projections are applied to the multivariate observations to reduce the dimension to one, so that the univariate methods of Zhang (2019) are applicable. An ensembled approach involving distance correlation is further used to improve the power towards monotone relationships.

In this paper, we develop an in-depth understanding of the BET framework and construct a class of powerful distribution-free goodness-of-fit tests, encompassing both the uniformity test and the independence test between a random vector and a random variable. While many tests have been devised for this problem, Zhang (2019) showed that uniform consistency for testing  (1.3) is generally unachievable for all alternative distributions. Analogous results for the L2L^{2}-distance was recently documented by Berrett et al. (2021). These results reaffirm earlier observations by LeCam (1973); Barron (1989); Balakrishnan and Wasserman (2019b).

Practically, this finding indicates that each test encounters a “blind spot,” resulting in a significant loss of power. To mitigate the power loss arising from non-uniform consistency, Zhang (2019) introduced the binary expansion statistics (BEStat) framework to restrict the space of alternative distributions up to a suitable finite resolution. The BEStat approach is inspired by the classical probability result of the binary expansion of a uniformly distributed random variable (Kac, 1959), as stated below.

Theorem 1.1.

If UUnif[1,1],U\sim\mathrm{Unif}[-1,1], then U=d=12dAdU=\sum_{d=1}^{\infty}2^{-d}{A_{d}} where Adi.i.d.RademacherA_{d}\stackrel{{\scriptstyle i.i.d.}}{{\sim}}\text{Rademacher}, that is Ad{1,1}A_{d}\in\{-1,1\} with equal probabilities.

Theorem 1.1 allows the approximation of the σ\sigma-field generated by UU using UD=d=1D2dAdU_{D}=\sum_{d=1}^{D}{2^{-d}}{A_{d}} for any positive integer depth DD. This filtration approach for testing uniformity facilitates a universal distribution approximation, an identifiable model, and uniformly consistent tests at any depth DD. The testing framework based on the binary expansion filtration approximation is referred to as the binary expansion testing (BET). Specifically, the BET of approximate uniformity for 𝑼D=(U1,D,,Up,D)T\bm{U}_{D}=(U_{1,D},\ldots,U_{p,D})^{T} is

H0,D:𝐏𝑼D=𝐏0,Dv.s.H1,D:Dist(𝐏𝑼D,𝐏0,D)δ,H_{0,D}:\mathbf{P}_{\bm{U}_{D}}=\mathbf{P}_{0,D}~{}~{}v.s.~{}~{}H_{1,D}:\text{Dist}(\mathbf{P}_{\bm{U}_{D}},\mathbf{P}_{0,D})\geq\delta, (1.4)

where 𝐏0,D\mathbf{P}_{0,D} is the uniform distribution over pp-dimensional dyadic rationals {2D(12D)+2D+1k,k=0,1,,2D1}p.\{2^{-D}(1-2^{D})+2^{-D+1}k,k=0,1,\ldots,2^{D}-1\}^{p}.

1.1 Our Contributions

Our study of the BET framework is inspired by the celebrated Euler’s formula,

eix=cosx+isinx,x,e^{ix}=\cos{x}+i\sin{x},~{}~{}~{}~{}~{}~{}\forall x\in\mathbb{R},

which is often regarded as one of the most beautiful equations in mathematics. In particular, when x=πx=\pi, one has Euler’s identity, eiπ+1=0e^{i\pi}+1=0, which connects the five most important numbers in mathematics 0,1,i,e,π0,1,i,e,\pi in one simple yet deep equation. Beside the beauty of this equation, how is it useful for statisticians? To see that, consider any binary variable AA (not necessarily symmetric) which takes values 1-1 or 11. Through the parity of the sine and cosine functions, one can easily show the following binary Euler’s equation. Since we were not aware of any reference of this equation in literature, we formally state it below.

Lemma 1.2 (Binary Euler’s Equation).

For any binary random variable AA with possible outcomes of 1-1 or 11, it holds that for any x,x\in\mathbb{R},

eiAx=cosx+iAsinx.e^{iAx}=\cos{x}+iA\sin{x}. (1.5)

Lemma 1.2 generalizes Euler’s formula with additional randomness from a binary variable and reduces its complex exponentiation to its linear polynomial. To the best of our knowledge, no other random variables enjoy the same remarkable attribute. Moreover, note that the random variable eiAxe^{iAx} in (1.5) is closely related to characteristic functions, particularly when it is combined with the binary expansion in Theorem 1.1. For example, for U=d=12dAdUnif[1,1]U=\sum_{d=1}^{\infty}2^{-d}{A_{d}}\sim\text{Unif}[-1,1] and for any t,t\in\mathbb{R}, we have

eiUt=eitd=1Ad2d=d=1eiAdt2d=d=1{cos(t/2d)+iAdsin(t/2d)}.e^{iUt}=e^{it\sum_{d=1}^{\infty}\frac{A_{d}}{2^{d}}}=\prod\nolimits_{d=1}^{\infty}e^{\frac{iA_{d}t}{2^{d}}}=\prod\nolimits_{d=1}^{\infty}\{\cos{(t/2^{d})}+iA_{d}\sin{(t/2^{d})}\}. (1.6)

The complex exponent of UU can be approximated by a polynomial of the binary variables in its binary expansion! Moreover, we show in Section 2 that this approximation is universal for any pp-dimensional vector supported within [1,1]p[-1,1]^{p}. We refer this universal binary interaction approximation of the complex exponent and the characteristic function as the Binary Expansion Approximation of UniformiTY (BEAUTY) in Theorem 2.2.

Based on the BEAUTY, in this paper we make the following three main novel contributions to the problem of nonparametric tests of independence:

1. A unification of important nonparamatric tests of independence. In Section 3, we show that many important tests of independence in literature can be approximated by some quadratic forms of symmetry statistics, which are shown to be complete sufficient statistics for dependence in Zhang (2019). In particular, each of these test statistics corresponds to a different deterministic weight matrix in the quadratic form, which in turn dictates the power properties of the test. Therefore, this deterministic weight in existing test statistics creates the key issue on uniformity and robustness of the test, as it may favor certain alternatives but cause a substantial loss of power for other alternatives. Following this observation, we consider a test statistic that has data-adaptive weights to make automatic adjustments under different situations so as to achieve a robust power. We refer this test as the Binary Expansion Adaptive Symmetry Test (BEAST), as described in Section 4.

2. A benchmark of feasible power from the BEAST with oracle. We begin by considering the test of uniformity. By utilizing the properties of the binary expansion filtration, we show in a heuristic asymptotic study of the BEAST a surprising fact that for any given alternative, the Neyman-Pearson test for testing uniformity can be approximated by a weighted sum of symmetry statistics. We thus develop the BEAST through an oracle approach over this Neyman-Pearson one-dimensional projection of symmetry statistics, which quantifies a boundary of feasible power performance. Numerical studies in Section 5 show that the BEAST with oracle leads a wide range of prevailing tests by a surprisingly huge margin under all alternatives we considered. This enormous margin thus provides helpful information about the potential of substantial power improvement for each alternative. To the best of our knowledge, there is no other type of similar approach or results to study the potential performance of a test of uniformity or independence. Therefore, the BEAST with oracle sets a novel and useful benchmark for the feasible power under any alternative. Moreover, it provides guidance for choosing suitable weights to boost the power of the test.

3. A powerful and robust BEAST from a regularized resampling approximation of the oracle. Motivated by the form of the BEAST with oracle, we construct the practical BEAST to approximate the optimal power by approximating the oracle weights in testing uniformity. The proposed BEAST combines the ideas of resampling and regularization to obtain data-adaptive weights that adjusts the statistic towards the oracle under each alternative. Here resampling helps the approximation of the sampling distribution of the oracle test statistic, and regularization screens the noise in the estimation of optimal weights. This test is applied to the problem to the test of independence. Simulation studies in Section 5 demonstrate that the BEAST improves the power of many existing tests of univariate or multivariate independence against many common forms of non-uniformity, particularly multimodal and nonlinear ones. Besides its robust power, the BEAST provides clear and meaningful interpretations of statistical significance, which we demonstrate in Section 6. We conclude our paper with discussions in Section 7. Details of notation, theoretical proofs and additional numerical results are deferred to Supplementary materials.

2 The BEAUTY Equation

To further understand the BET framework, we first develop the general binary expansion for any random vector supported within [1,1]p[-1,1]^{p} as in Lemma 2.1.

Lemma 2.1.

Let 𝐔=(U1,U2,,Up)T\bm{U}=(U_{1},U_{2},\cdots,U_{p})^{T} be a random vector supported within [1,1]p[-1,1]^{p}. There exists a sequence of random variables {Aj,d}\{A_{j,d}\}, j=1,2,,pj=1,2,\cdots,p, d=1,2,,Dd=1,2,\cdots,D, which only take values 1-1 and 11, such that max1jp{|UjUj,D|}0\max_{1\leq j\leq p}\{|U_{j}-U_{j,D}|\}\to 0 uniformly as DD\rightarrow\infty, where Uj,D=d=1D(Aj,d)/2d.U_{j,D}=\sum_{d=1}^{D}\left(A_{j,d}\right)/2^{d}.

A classical construction of Aj,dA_{j,d}’s is to consider the binary numeral system representation of real numbers, or data bits (Kac, 1959; Zhang, 2019). We refer the collection of variables {Aj,d}\{A_{j,d}\} as the general binary expansion of UjU_{j} and denote 𝑼D=(U1,D,U2,D,,Up,D)T\bm{U}_{D}=(U_{1,D},U_{2,D},\cdots,U_{p,D})^{T} as the depth-DD binary approximation of 𝑼\bm{U}. Let 𝐁p×D\mathbf{B}^{p\times D} denote the set of all p×Dp\times D binary matrices with entries being either 0 or 11. We use a matrix Λ=Λp×D𝐁p×D\Lambda=\Lambda^{p\times D}\in\mathbf{B}^{p\times D} to index an interaction of binary variables {Aj,d}\{A_{j,d}\} via AΛ=j=1pd=1D(Aj,d)ΛjdA_{\Lambda}=\prod\nolimits_{j=1}^{p}\prod\nolimits_{d=1}^{D}(A_{j,d})^{\Lambda_{jd}}. For the zero matrix Λ=𝟎p×D,\Lambda={\bm{0}}^{p\times D}, we define A𝟎p×D=1.A_{{\bm{0}}^{p\times D}}=1.

With the above notation, we develop the following theorem on the binary expansion approximation of uniformity (BEAUTY), which provides an approximation of the characteristic function of any distribution supported within [1,1]p[-1,1]^{p} from the expectation of a polynomial of general binary expansion interactions.

Theorem 2.2 (Binary Expansion Approximation of Uniformity, BEAUTY).

Let 𝐔\bm{U} be a pp-dimensional random vector such that Uj[1,1],jU_{j}\in[-1,1],\forall j. Let ϕ𝐔(𝐭)\phi_{\bm{U}}(\bm{t}) be the characteristic function of 𝐔\bm{U} for any 𝐭=(t1,,tp)Tp\bm{t}=(t_{1},\ldots,t_{p})^{T}\in\mathbb{R}^{p}. We have

ei𝒕T𝑼D=Λ𝐁p×DAΛΨΛ(𝒕)e^{i\bm{t}^{T}\bm{U}_{D}}=\sum\nolimits_{\Lambda\in\mathbf{B}^{p\times D}}A_{\Lambda}\Psi_{\Lambda}(\bm{t}) (2.1)

and

ϕ𝑼(𝒕)=𝐄[exp(i𝒕T𝑼)]=limDΛ𝐁p×DΨΛ(𝒕)𝐄[AΛ],\phi_{\bm{U}}(\bm{t})=\mathbf{E}[\exp(i\bm{t}^{T}\bm{U})]=\lim_{D\to\infty}\sum\nolimits_{\Lambda\in\mathbf{B}^{p\times D}}\Psi_{\Lambda}(\bm{t})\mathbf{E}[A_{\Lambda}], (2.2)

where ΨΛ(𝐭)=j=1pd=1D{cos(tj/2d)}1Λjd{isin(tj/2d)}Λjd.\Psi_{\Lambda}(\bm{t})=\prod\nolimits_{j=1}^{p}\prod\nolimits_{d=1}^{D}\{\cos(t_{j}/2^{d})\}^{1-\Lambda_{jd}}\{i\sin(t_{j}/2^{d})\}^{\Lambda_{jd}}.

Building upon Lemma 1.2, (2.1) equates a complex exponent ei𝒕T𝑼De^{i\bm{t}^{T}\bm{U}_{D}} and a polynomial of binary variable AΛA_{\Lambda}’s derived from the binary expansion of 𝑼D\bm{U}_{D}. Equation (2.2) reveals that the characteristic function of any random vector supported within [1,1]p[-1,1]^{p} can be approximated by a linear combination of ΨΛ(𝒕)\Psi_{\Lambda}(\bm{t})’s, representing products of homogeneous trigonometric functions. As per Theorem 1.3 in Zhao et al. (2023a), these functions are linearly independent. Furthermore, the coefficients of this linear combination correspond to the expectations of all binary variables in the σ\sigma-field induced by 𝑼D\bm{U}_{D}. These expectations encapsulate the distributional properties of 𝑼\bm{U}, and inference on them yields crucial insights into the distribution of 𝑼\bm{U}. Specifically, they provide clear insights about the goodness-of-fit tests by translating distributional properties into those of 𝐄[AΛ]\mathbf{E}[A_{\Lambda}]’s. Moreover, sufficient statistics for 𝐄[AΛ]\mathbf{E}[A_{\Lambda}]’s are the symmetry statistics SΛ=i=1nAΛ,iS_{\Lambda}=\sum_{i=1}^{n}A_{\Lambda,i} and equivalently S¯Λ=n1i=1nAΛ,i.\bar{S}_{\Lambda}=n^{-1}\sum_{i=1}^{n}A_{\Lambda,i}. Consequently, test statistics should be constructed as a function of S¯Λ\bar{S}_{\Lambda}’s. We list some important examples connecting goodness-of-fit tests and 𝐄[AΛ]\mathbf{E}[A_{\Lambda}]’s below:

(1) Test of uniformity. Consider the collection of non-zero Λ\Lambda’s, p,D,unif={Λ𝐁p×D:Λ𝟎p×D}\mathcal{L}_{p,D,\text{unif}}=\{\Lambda\in\mathbf{B}^{p\times D}:\Lambda\neq{\bm{0}}^{p\times D}\}. Note that 𝑼Unif[1,1]p\bm{U}\sim\text{Unif}[-1,1]^{p} if and only if 𝐄[AΛ]=0\mathbf{E}[A_{\Lambda}]=0 for Λp,D,unif\Lambda\in\mathcal{L}_{p,D,\text{unif}} where DD is any positive integer. Consequently,

𝐄[exp(i𝒕𝑼)]=limDd=1DΨ𝟎p×D(𝒕)=j=1plimDd=1D{cos(tj/2d)}=j=1p{sin(tj)/tj},\mathbf{E}[\exp(i\bm{t}\bm{U})]=\lim_{D\to\infty}\prod\nolimits_{d=1}^{D}\Psi_{{\bm{0}}^{p\times D}}(\bm{t})=\prod\nolimits_{j=1}^{p}\lim_{D\to\infty}\prod\nolimits_{d=1}^{D}\{\cos(t_{j}/2^{d})\}=\prod\nolimits_{j=1}^{p}\{\sin(t_{j})/t_{j}\},

i.e., (2.2) recovers the characteristic function of Unif[1,1]p.[-1,1]^{p}. Therefore, the test of approximate uniformity (1.4) is equivalent to test for any positive integer DD,

H0,D:𝐄[AΛ]=0, for Λp,D,unif.H_{0,D}:\mathbf{E}[A_{\Lambda}]=0,\text{ for }\Lambda\in\mathcal{L}_{p,D,\text{unif}}.

(2) Test of independence between two univariate random variables. The BEAUTY equation in Theorem 2.2 offers clear insights into the test of independence for bivariate copula up to a certain depth DD. When p1=p2=1p_{1}=p_{2}=1, it is always possible to transform U1U_{1} and U2U_{2} into uniform random variables on [1,1][-1,1] using their marginal distributions. In this context, Zhang (2019) demonstrated that test (1.3) with p1=p2=1p_{1}=p_{2}=1 considered the following test for any positive integer DD,

H0,D:𝐄[AΛ]=0 for Λ2,D,cross,H_{0,D}:\mathbf{E}[A_{\Lambda}]=0\text{ for }\Lambda\in\mathcal{L}_{2,D,\text{cross}},

where 2,D,cross={Λ=Λ1\raisebox{-.9pt} {r}⃝ Λ2:Λ11,D,unif and Λ21,D,unif}\mathcal{L}_{2,D,\text{cross}}=\{\Lambda=\Lambda_{1}\raisebox{0.5pt}{\raisebox{-.9pt} {r}⃝ }\Lambda_{2}:\Lambda_{1}\in\mathcal{L}_{1,D,\text{unif}}\text{ and }\Lambda_{2}\in\mathcal{L}_{1,D,\text{unif}}\}. Here, \raisebox{-.9pt} {r}⃝ denotes the row binding of matrices with the same number of columns (see Definition LABEL:def:_rbind in the supplement).

(3) Test of independence between response and predictors. The BEAUTY equation provides an in-depth understanding of testing the association between p1>1p_{1}>1 predictors 𝑼1\bm{U}_{1} and an arbitrary response U2U_{2}, extending beyond the scope of traditional regressions. As noticed before, U2U_{2} can always be transformed into a uniform variable on [1,1][-1,1]. Note that U2U_{2} is independent of 𝑼1\bm{U}_{1} if and only if Λ𝑩(p1+1)×D𝐄[AΛ]ΨΛ(𝒕)=0\sum_{\Lambda\in\bm{B}^{(p_{1}+1)\times D}}\mathbf{E}[A_{\Lambda}]\Psi_{\Lambda}(\bm{t})=0 for any positive integer DD. Note also that from Theorem 1.3 in Zhao et al. (2023a), the ΨΛ(t)\Psi_{\Lambda}(t)’s are linearly independent in the function space. Therefore, by comparing the characteristic function of the joint distribution and the product of marginal characteristic functions, we see that for any positive integer DD, the null hypothesis that 𝑼1,D\bm{U}_{1,D} and U2,DU_{2,D} are independent is equivalent to the following hypothesis over the expectations of AΛA_{\Lambda}’s: :

H0,D:𝐄[AΛ]=0 for Λp1+1,D,joint cross,H_{0,D}:\mathbf{E}[A_{\Lambda}]=0\text{ for }\Lambda\in\mathcal{L}_{p_{1}+1,D,~{}\text{joint cross}},

where p1+1,D,joint cross={Λ=Λ1\raisebox{-.9pt} {r}⃝ Λ2:Λ1p1,D,unif and Λ21,D,unif}\mathcal{L}_{p_{1}+1,D,~{}\text{joint cross}}=\{\Lambda=\Lambda_{1}\raisebox{0.5pt}{\raisebox{-.9pt} {r}⃝ }\Lambda_{2}:\Lambda_{1}\in\mathcal{L}_{p_{1},D,\text{unif}}\text{ and }\Lambda_{2}\in\mathcal{L}_{1,D,\text{unif}}\}.

(4) Test of independence between a uniform vector and an arbitrary vector. For random vector 𝑼1Unif[1,1]p1\bm{U}_{1}\sim\mathrm{Unif}[-1,1]^{p_{1}} and arbitrary random vector 𝑼2\bm{U}_{2} within [1,1]p2[-1,1]^{p_{2}}, by leveraging Theorem 2.2, the null hypothesis that posits the independence of 𝑼1\bm{U}_{1} and 𝑼2\bm{U}_{2} can be approximated by the following one over the expectations of AΛA_{\Lambda}’s:

H0,D:𝐄[AΛ]=0 for Λp1+p2,D,joint cross,H_{0,D}:\mathbf{E}[A_{\Lambda}]=0\text{ for }\Lambda\in\mathcal{L}_{p_{1}+p_{2},D,~{}\text{joint cross}},

where p1+p2,D,joint cross={Λ=Λ1\raisebox{-.9pt} {r}⃝ Λ2:Λ1p1,D,unif and Λ2p2,D,unif}\mathcal{L}_{p_{1}+p_{2},D,~{}\text{joint cross}}=\{\Lambda=\Lambda_{1}\raisebox{0.5pt}{\raisebox{-.9pt} {r}⃝ }\Lambda_{2}:\Lambda_{1}\in\mathcal{L}_{p_{1},D,\text{unif}}\text{ and }\Lambda_{2}\in\mathcal{L}_{p_{2},D,\text{unif}}\}.

3 Unification of Several Tests of Independence

To construct a powerful test statistic, we first study existing tests of independence and their properties under the BET framework. We consider three important test statistics: Spearman’s ρ\rho (Spearman, 1904), the χ2\chi^{2} statistics, and the distance correlation (Székely et al., 2007). We find that each of these statistics can be approximated by a certain quadratic form of symmetry statistics. We further discuss the effect of the weight matrix in the quadratic forms on their power properties.

Since each specific statistic may involve a different collection of binary interactions, we denote a collection of certain Λ\Lambda’s by .\mathcal{L}. For such a collection \mathcal{L}, we denote the vector of AΛA_{\Lambda}’s, SΛS_{\Lambda}’s and S¯Λ\bar{S}_{\Lambda}’s with Λ\Lambda\in\mathcal{L} by AA_{\mathcal{L}}, 𝑺\bm{S}_{\mathcal{L}} and 𝑺¯\bar{\bm{S}}_{\mathcal{L}}, respectively.

3.1 Spearman’s ρ\rho

As a robust version of the Pearson correlation, the Spearman’s ρ\rho statistic leads to a test with high asymptotic relative efficiency compared to the optimal test with Pearson correlation under bivariate normal distribution (Lehmann and Romano, 2006). We show below it can be approximated by a quadratic form of symmetry statistics.

When U1U_{1} and U2U_{2} are marginally uniformly distributed over [1,1][-1,1], Spearman’s ρ\rho can be written as the correlation between U1U_{1} and U2U_{2}, i.e.,

ρ=3𝐄[U1U2]=3𝐄[d1=1A1,d12d1d2=1A2,d22d2]=3limDΛ2,D,spe𝒓DT𝐄[AΛ],\rho=3\mathbf{E}[U_{1}U_{2}]=3\mathbf{E}\left[\sum_{d_{1}=1}^{\infty}\frac{A_{1,d_{1}}}{2^{d_{1}}}\sum_{d_{2}=1}^{\infty}\frac{A_{2,d_{2}}}{2^{d_{2}}}\right]=3\lim_{D\rightarrow\infty}\sum_{\Lambda\in\mathcal{L}_{2,D,\text{spe}}}\bm{r}_{D}^{T}\mathbf{E}[A_{\Lambda}], (3.1)

where 2,D,spe={Λ=Λ1\raisebox{-.9pt} {r}⃝ Λ2:Λ1,Λ2𝐁1×D, where Λ1𝟏=1 and Λ2𝟏=1}\mathcal{L}_{2,D,\text{spe}}=\{\Lambda=\Lambda_{1}\raisebox{0.5pt}{\raisebox{-.9pt} {r}⃝ }\Lambda_{2}:\Lambda_{1},\Lambda_{2}\in\mathbf{B}^{1\times D},\text{ where }\Lambda_{1}{\bm{1}}=1\text{ and }\Lambda_{2}{\bm{1}}=1\} consists of 2×D2\times D matrices whose rows are both binary vectors with only one unique 11, and the D2D^{2}-dimensional vector 𝒓D\bm{r}_{D} has entry 2(d1+d2)2^{-(d_{1}+d_{2})} corresponding to 𝐄[A1,d1A2,d2].\mathbf{E}[A_{1,d_{1}}A_{2,d_{2}}]. The test based on Spearman’s ρ\rho rejects the null when the estimate of ρ\rho has a large absolute value. This test statistic can be approximated with

Qρ,D=1n(𝒓DT𝑺2,D,spe)2=1n𝑺2,D,speT𝒓D𝒓DT𝑺2,D,spe,Q_{\rho,D}=\frac{1}{n}(\bm{r}_{D}^{T}\bm{S}_{\mathcal{L}_{2,D,\text{spe}}})^{2}=\frac{1}{n}\bm{S}_{\mathcal{L}_{2,D,\text{spe}}}^{T}\bm{r}_{D}\bm{r}_{D}^{T}\bm{S}_{\mathcal{L}_{2,D,\text{spe}}},

which is a quadratic form with a rank-one weight matrix 𝐖ρ,D=𝒓D𝒓DT.\mathbf{W}_{\rho,D}=\bm{r}_{D}\bm{r}_{D}^{T}.

Although the test based on Spearman’s ρ\rho has a higher power against the linear form of dependency particularly present in bivariate normal distributions, we see from 2,D,spe\mathcal{L}_{2,D,\text{spe}} and 𝐖ρ,D\mathbf{W}_{\rho,D} that this test only considers D2D^{2} out of (2D1)2(2^{D}-1)^{2} cross interactions of binary variables in 2,D,cross\mathcal{L}_{2,D,\text{cross}}. Thus this test is not capable of detecting complex nonlinear forms of dependency.

3.2 χ2\chi^{2} Test Statistic

When U1U_{1} and U2U_{2} are Unif[1,1]\text{Unif}[-1,1] distributed, the binary expansion up to depth DD effectively leads to a discretization of [1,1]2[-1,1]^{2} into a 2D×2D2^{D}\times 2^{D} contingency table. Classical tests for contingency tables such as χ2\chi^{2}-test can thus be applied. Similar tests include Fisher’s exact test and its extensions (Ma and Mao, 2019). Multivariate extensions of these methods include Gorsky and Ma (2018); Lee et al. (2023).

In Zhang (2019), it is shown that the χ2\chi^{2}-statistic at depth DD can be written as the sum of squares of symmetry statistics for cross interactions. Thus,

Qχ2=1n𝑺2,D,crossT𝑺2,D,crossQ_{\chi^{2}}=\frac{1}{n}\bm{S}_{\mathcal{L}_{2,D,\text{cross}}}^{T}\bm{S}_{\mathcal{L}_{2,D,\text{cross}}}

where 2,D,cross\mathcal{L}_{2,D,\text{cross}} is the collection of all cross interactions. The weight matrix for Qχ2Q_{\chi^{2}} is thus the identity matrix 𝐈(2D1)×(2D1)\mathbf{I}_{(2^{D}-1)\times(2^{D}-1)}.

The Max BET proposed in Zhang (2019) can be approximated by a quadratic form with another diagonal weight matrix, which we explain in the Supplementary Materials. These tests with diagonal weights can detect signals among the squared symmetry statistics, but might be powerless for signals from their cross products.

3.3 Distance Correlation between uniformly distributed random vectors

To study the dependency between a p1p_{1}-dimensional vector 𝑼1\bm{U}_{1} and a p2p_{2}-dimensional vector 𝑼2\bm{U}_{2}, in Székely et al. (2007), a class of measures of dependence is defined as

𝒱2(𝑼1,𝑼2)=p1+p2|ϕ(𝑼1,𝑼2)(𝒕1,𝒕2)ϕ𝑼1(𝒕1)ϕ𝑼2(𝒕2)|2w(𝒕1,𝒕2)𝑑𝒕1𝑑𝒕2,\mathcal{V}^{2}(\bm{U}_{1},\bm{U}_{2})=\int_{\mathbb{R}^{p_{1}+p_{2}}}|\phi_{(\bm{U}_{1},\bm{U}_{2})}(\bm{t}_{1},\bm{t}_{2})-\phi_{\bm{U}_{1}}(\bm{t}_{1})\phi_{\bm{U}_{2}}(\bm{t}_{2})|^{2}w(\bm{t}_{1},\bm{t}_{2})d\bm{t}_{1}d\bm{t}_{2}, (3.2)

where ϕ(𝑼1,𝑼2)(𝒕1,𝒕2)\phi_{(\bm{U}_{1},\bm{U}_{2})}(\bm{t}_{1},\bm{t}_{2}) is the characteristic function of the joint distribution of (𝑼1,𝑼2),(\bm{U}_{1},\bm{U}_{2}), w(𝒕1,𝒕2)w(\bm{t}_{1},\bm{t}_{2}) is a suitable weight function, and ϕ𝑼k(𝒕k)\phi_{\bm{U}_{k}}(\bm{t}_{k}) is the characteristic function of 𝑼k,k=1,2\bm{U}_{k},k=~{}1,2. Note that 𝒱2(𝑼1,𝑼2)=0\mathcal{V}^{2}(\bm{U}_{1},\bm{U}_{2})=0 if and only if 𝑼1\bm{U}_{1} and 𝑼2\bm{U}_{2} are independent. The distance correlation is then defined through 𝒱2(𝑼1,𝑼2)\mathcal{V}^{2}(\bm{U}_{1},\bm{U}_{2}) and admits some desirable properties such as universal consistency against alternatives with finite expectation.

When 𝑼1Unif[1,1]p1\bm{U}_{1}\sim\text{Unif}[-1,1]^{p_{1}} and 𝑼2Unif[1,1]p2\bm{U}_{2}\sim\text{Unif}[-1,1]^{p_{2}}, by Theorem 2.2, the term corresponding to Λ=𝟎\Lambda={\bm{0}} cancels with ϕ𝑼1(𝒕1)ϕ𝑼2(𝒕2)\phi_{\bm{U}_{1}}(\bm{t}_{1})\phi_{\bm{U}_{2}}(\bm{t}_{2}), and we can write (3.2) as

𝒱2(𝑼1,𝑼2)=limDp1+p2|Λp1+p2,D,unifΨΛ(𝒕)𝐄[AΛ]|2w(𝒕1,𝒕2)𝑑𝒕1𝑑𝒕2=limDΛ1,Λ2p1+p2,D,unifwΛ1,Λ2𝐄[AΛ1]𝐄[AΛ2]=limD𝐄[𝑨p1+p2,D,unif]T𝐖𝒱2,p1,p2,D𝐄[𝑨p1+p2,D,unif]\begin{split}\mathcal{V}^{2}(\bm{U}_{1},\bm{U}_{2})=&\lim_{D\rightarrow\infty}\int_{\mathbb{R}^{p_{1}+p_{2}}}\bigg{|}\sum_{\Lambda\in\mathcal{L}_{p_{1}+p_{2},D,\text{unif}}}\Psi_{\Lambda}(\bm{t})\mathbf{E}[A_{\Lambda}]\bigg{|}^{2}w(\bm{t}_{1},\bm{t}_{2})d\bm{t}_{1}d\bm{t}_{2}\\ =&\lim_{D\rightarrow\infty}\sum_{\Lambda_{1},\Lambda_{2}\in\in\mathcal{L}_{p_{1}+p_{2},D,\text{unif}}}w_{\Lambda_{1},\Lambda_{2}}\mathbf{E}[A_{\Lambda_{1}}]\mathbf{E}[A_{\Lambda_{2}}]\\ =&\lim_{D\rightarrow\infty}\mathbf{E}[\bm{A}_{\mathcal{L}_{p_{1}+p_{2},D,\text{unif}}}]^{T}\mathbf{W}_{\mathcal{V}^{2},p_{1},p_{2},D}\mathbf{E}[\bm{A}_{\mathcal{L}_{p_{1}+p_{2},D,\text{unif}}}]\end{split} (3.3)

where p1+p2,D,unif={Λ𝐁(p1+p2)×D:Λ𝟎p×D}\mathcal{L}_{p_{1}+p_{2},D,\text{unif}}=\{\Lambda\in\mathbf{B}^{(p_{1}+p_{2})\times D}:\Lambda\neq{\bm{0}}^{p\times D}\}, and the weight matrix 𝐖𝒱2,p1,p2,D\mathbf{W}_{\mathcal{V}^{2},p_{1},p_{2},D} consists of constants wΛ1,Λ2w_{\Lambda_{1},\Lambda_{2}}’s from the integration over 𝒕1\bm{t}_{1} and 𝒕2\bm{t}_{2}. The test is significant when the empirical quadratic form Q𝒱2,p1,p2,DQ_{\mathcal{V}^{2},p1,p2,D} is large, where

Q𝒱2,p1,p2,D=1n𝑺p1+p2,D,unifT𝐖𝒱2,p1,p2,D𝑺p1+p2,D,unif.Q_{\mathcal{V}^{2},p1,p2,D}=\frac{1}{n}\bm{S}_{\mathcal{L}_{p_{1}+p_{2},D,\text{unif}}}^{T}\mathbf{W}_{\mathcal{V}^{2},p_{1},p_{2},D}\bm{S}_{\mathcal{L}_{p_{1}+p_{2},D,\text{unif}}}.

Note that 𝐖𝒱2,p1,p2,D\mathbf{W}_{\mathcal{V}^{2},p_{1},p_{2},D} here depends only on the weight function w(𝒕1,𝒕2)w(\bm{t}_{1},\bm{t}_{2}) and is deterministic. Hence, the test based on Q𝒱2,p1,p2,DQ_{\mathcal{V}^{2},p1,p2,D} will have a high power when the vector of 𝐄[AΛ]\mathbf{E}[A_{\Lambda}]’s from the alternative distribution lies in the subspace spanned by eigenvectors of 𝐖𝒱2,p1,p2,D\mathbf{W}_{\mathcal{V}^{2},p_{1},p_{2},D} corresponding to its largest eigenvalues. On the other hand, if instead the signals lie in the subspace spanned by eigenvectors of 𝐖𝒱2,p1,p2,D\mathbf{W}_{\mathcal{V}^{2},p_{1},p_{2},D} corresponding to its lowest eigenvalues, then the power of the test could be considerably compromised. Therefore, a deterministic weight over symmetry statistics becomes a general uniformity issue of existing test statistics. In the next section, we study data-adaptive weights with the aim to improve the power by setting proper weights both among diagonal and off-diagonal entries in the matrix.

4 The BEAST and Its Properties

4.1 The First Two Moments of Binary Interactions

The unification in Section 3 inspires us to consider a class of nonparametric statistics for the goodness-of-fit test as a weighted sum of symmetry statistics. Since the properties of this form of statistics are closely related to the first two moments of the binary interaction variables in the filtration, we consider the collection of all nontrivial binary interactions =p,D,unif={Λ𝐁p×D:Λ𝟎p×D}\mathcal{L}=\mathcal{L}_{p,D,\text{unif}}=\{\Lambda\in\mathbf{B}^{p\times D}:\Lambda\neq{\bm{0}}^{p\times D}\} and study the moment properties of the corresponding binary random vector 𝑨.\bm{A}_{\mathcal{L}}.

We begin by studying the connection between the (2pD1)×1(2^{pD}-1)\times 1 vector 𝑨\bm{A}_{\mathcal{L}} and the multinomial distribution from the corresponding discretization with 2pD2^{pD} categories. We order the indices Λ\Lambda’s in 𝑨\bm{A}_{\mathcal{L}} by the integer corresponding to the binary vector representation vec(ΛT),vec(\Lambda^{T}), where vec()vec(\cdot) is the vectorization function. For example, the last (i.e. the (2pD1)(2^{pD}-1)th) entry in 𝑨\bm{A}_{\mathcal{L}} corresponds to the Λ=𝟏p×D\Lambda={\bm{1}}^{p\times D}. We also denote the 2pD×12^{pD}\times 1 vector of cell probabilities in the multinomial distribution by 𝒑c.\bm{p}_{c}. Label the entries in 𝒑c\bm{p}_{c} by binary matrices Λ𝐁p×D\Lambda\in\mathbf{B}^{p\times D} through Λ=Λ1\raisebox{-.9pt} {r}⃝ \raisebox{-.9pt} {r}⃝ Λp,\Lambda=\Lambda_{1}\raisebox{0.5pt}{\raisebox{-.9pt} {r}⃝ }\ldots\raisebox{0.5pt}{\raisebox{-.9pt} {r}⃝ }\Lambda_{p}, where each realization of the 2D×12^{D}\times 1 vector Λj\Lambda_{j} labels one of the 2D2^{D} intervals for dimension jj from low to high according to 1 plus the integer corresponding to the binary representation of ΛjT\Lambda_{j}^{T}. We define a 2pD×12^{pD}\times 1 random vector 𝒁=(ZΛ)Multinomial(1,𝒑c)\bm{Z}=(Z_{\Lambda})\sim Multinomial(1,\bm{p}_{c}) to denote one draw from the 2pD2^{pD} intervals from the discretization. With the above notation, we develop the general binary interaction design (BID) equation, which extends the two-dimensional case in Zhang (2019).

Theorem 4.1.

Let 𝐀c=(1,𝐀T)T,\bm{A}_{c}=(1,\bm{A}_{\mathcal{L}}^{T})^{T}, 𝛍c=𝐄[𝐀c]\bm{\mu}_{c}=\mathbf{E}[\bm{A}_{c}] and 𝚺𝛍c=𝐄[𝐀c𝐀cT].\bm{\Sigma}_{\bm{\mu}_{c}}=\mathbf{E}[\bm{A}_{c}\bm{A}_{c}^{T}]. Denote the 2pD×2pD2^{pD}\times 2^{pD} Sylvester’s Hadamard matrix by 𝐇.\mathbf{H}. We have the binary interaction design (BID) equation

𝑨c=𝐇𝒁.\bm{A}_{c}=\mathbf{H}\bm{Z}. (4.1)

In particular, we have the BID equation for the mean vector

𝝁c=𝐇𝒑c\bm{\mu}_{c}=\mathbf{H}\bm{p}_{c} (4.2)

and the corresponding BID equation for 𝚺𝛍c\bm{\Sigma}_{\bm{\mu}_{c}}

𝚺𝝁c=𝐇diag(𝒑c)𝐇,\bm{\Sigma}_{\bm{\mu}_{c}}=\mathbf{H}diag(\bm{p}_{c})\mathbf{H}, (4.3)

where diag(𝐩c)diag(\bm{p}_{c}) is the diagonal matrix with diagonal entries corresponding to 𝐩c\bm{p}_{c}.

The Hadamard matrix 𝐇\mathbf{H} is also referred to as the Walsh matrix in engineering, where the linear transformation with 𝐇\mathbf{H} is referred to as the Hadamard transform (Lynn, 1973; Golubov et al., 2012; Harmuth, 2013). The earliest referral to the Hadamard matrix we found in the statistical literature is Pearl (1971), and it is also closely related to the orthogonal full factorial design (Cox and Reid, 2000; Box et al., 2005). In our context of testing independence, the BID equation can be regarded as a transformation from the physical domain to the frequency domain, which turns the focus to global forms of non-uniformity instead of local ones. In developing statistics, this transformation facilitates regularizations through thresholding, as 𝝁=𝟎\bm{\mu}_{\mathcal{L}}={\bm{0}} is equivalent to uniformity 𝒑c=1/2pD𝟏.\bm{p}_{c}=1/2^{pD}{\bm{1}}. This transformation also enables clear interpretations of statistical significance with the form of dependency, as shown in Zhang (2019).

To study the power of the test of uniformity, we further study the properties of the first two moments of 𝑨.\bm{A}_{\mathcal{L}}. Let 𝝁=𝐄[𝑨]\bm{\mu}_{\mathcal{L}}=\mathbf{E}[\bm{A}_{\mathcal{L}}] and 𝚺𝝁=𝐄[𝑨𝑨T]\bm{\Sigma}_{\bm{\mu}_{\mathcal{L}}}=\mathbf{E}[\bm{A}_{\mathcal{L}}\bm{A}_{\mathcal{L}}^{T}] denote the vector of expectations and the matrix of second moments of 𝑨\bm{A}_{\mathcal{L}} respectively. We summarize some properties of 𝝁\bm{\mu}_{\mathcal{L}} and 𝚺𝝁\bm{\Sigma}_{\bm{\mu}_{\mathcal{L}}} in the following theorem.

Theorem 4.2.

We have the following results on the properties of first two moments of binary interaction variables in the binary expansion filtration.

  1. (a)

    The connection between the first and second moments of binary interactions:

    𝝁cT𝚺𝝁c1𝝁c=1.\bm{\mu}_{c}^{T}\bm{\Sigma}_{\bm{\mu}_{c}}^{-1}\bm{\mu}_{c}=1. (4.4)
  2. (b)

    The connection between the harmonic mean of probabilities and the Hotelling’s T2T^{2} quadratic form when pΛ>0,Λp_{\Lambda}>0,\forall\Lambda\in\mathcal{L}:

    122pDΛpΛ1=1+𝝁T(𝚺𝝁𝝁𝝁T)1𝝁=(1𝝁T𝚺𝝁1𝝁)1.\frac{1}{2^{2pD}}\sum_{\Lambda\in\mathcal{L}}p_{\Lambda}^{-1}=1+\bm{\mu}_{\mathcal{L}}^{T}(\bm{\Sigma}_{\bm{\mu}_{\mathcal{L}}}-\bm{\mu}_{\mathcal{L}}\bm{\mu}_{\mathcal{L}}^{T})^{-1}\bm{\mu}_{\mathcal{L}}=(1-\bm{\mu}_{\mathcal{L}}^{T}\bm{\Sigma}_{\bm{\mu}_{\mathcal{L}}}^{-1}\bm{\mu}_{\mathcal{L}})^{-1}. (4.5)
  3. (c)

    For 𝝁\bm{\mu}_{\mathcal{L}} with 𝝁2(2pD1)1/2,\|\bm{\mu}_{\mathcal{L}}\|_{2}\leq(2^{pD}-1)^{-1/2}, with constant cp,D=(2pD2)/2pD1,c_{p,D}=(2^{pD}-2)/\sqrt{2^{pD}-1},

    𝝁22cp,D𝝁23𝝁T𝚺𝝁𝝁𝝁22+cp,D𝝁23.\|\bm{\mu}_{\mathcal{L}}\|_{2}^{2}-c_{p,D}\|\bm{\mu}_{\mathcal{L}}\|_{2}^{3}\leq\bm{\mu}_{\mathcal{L}}^{T}\bm{\Sigma}_{\bm{\mu}_{\mathcal{L}}}\bm{\mu}_{\mathcal{L}}\leq\|\bm{\mu}_{\mathcal{L}}\|_{2}^{2}+c_{p,D}\|\bm{\mu}_{\mathcal{L}}\|_{2}^{3}. (4.6)
  4. (d)

    Denote the vector-valued function (𝚺𝝁𝝁𝝁T)1𝝁(\bm{\Sigma}_{\bm{\mu}_{\mathcal{L}}}-\bm{\mu}_{\mathcal{L}}\bm{\mu}_{\mathcal{L}}^{T})^{-1}\bm{\mu}_{\mathcal{L}} by 𝒈(𝝁)=(gΛ(𝝁))\bm{g}(\bm{\mu}_{\mathcal{L}})=(g_{\Lambda}(\bm{\mu}_{\mathcal{L}})) for each Λ\Lambda\in\mathcal{L}. As 𝝁20,\|\bm{\mu}_{\mathcal{L}}\|_{2}\rightarrow 0,

    gΛ(𝝁)=μΛ+o(𝝁2).g_{\Lambda}(\bm{\mu}_{\mathcal{L}})=\mu_{\Lambda}+o(\|\bm{\mu}_{\mathcal{L}}\|_{2}). (4.7)

To the best of our knowledge, the results in Theorem 4.2, despite their simplicity, have not been documented in literature. These simple results unveil interesting insights of the first two moments of binary variables in the filtration. The quadratic form in (4.4) characterizes the functional relationship between 𝝁c\bm{\mu}_{c} and 𝚺𝝁c\bm{\Sigma}_{\bm{\mu}_{c}}. The two equations in (4.5) show that for binary variables, the Hotelling T2T^{2} quadratic form is a monotone function of the harmonic mean of the cell probabilities in the corresponding multinomial distribution. The inequalities in (4.6) reveal the eigen structure of 𝚺𝝁\bm{\Sigma}_{\bm{\mu}_{\mathcal{L}}} when the signal 𝝁\bm{\mu}_{\mathcal{L}} is weak. The Taylor expansion in (4.7) provides the asymptotic behavior of (𝚺𝝁𝝁𝝁T)1𝝁(\bm{\Sigma}_{\bm{\mu}_{\mathcal{L}}}-\bm{\mu}_{\mathcal{L}}\bm{\mu}_{\mathcal{L}}^{T})^{-1}\bm{\mu}_{\mathcal{L}} when the joint distribution is close to the uniform distribution. These insights shed important lights on how we can develop a powerful goodness-of-fit test, as we explain in Sections 4.2 and 4.3.

4.2 An Oracle Approach for Test Construction

In this section, we study how to construct a powerful robust nonparametric test of uniformity based on what we learned in Sections 3 and 4.1. As discussed in Section 3, the deterministic weights of symmetry statistics in existing tests create an issue on the uniformity and robustness: They make the test powerful for some alternatives but not for others. Therefore, we construct a test statistic with data-adaptive weights, which allow the test to adjust itself towards the alternative to improve the power. We refer this class of statistics as the binary expansion adaptive symmetry test (BEAST).

We construct our test through an oracle approach. Suppose we know from an oracle 𝝁\bm{\mu}_{\mathcal{L}} and thus 𝚺𝝁\bm{\Sigma}_{\bm{\mu}_{\mathcal{L}}} as shown in Theorem 4.1. Note that by Theorem 4.1, 4.2 and 4.3 in Zhang (2019), under H0,DH_{0,D} of (1.4), the distribution of 𝑺¯\bar{\bm{S}}_{\mathcal{L}} is either pairwise independent centered binomial or uncorrelated centered hypergeometric, depending on whether the marginal distributions are known. Therefore, for fixed pp and DD, with a large nn and the central limit theorem on 𝑺¯=𝑺/n\bar{\bm{S}}_{\mathcal{L}}=\bm{S}_{\mathcal{L}}/n, we approximately have a simple-versus-simple hypothesis testing problem:

H0:n𝑺¯𝒩(𝟎,𝐈) v.s. H1:n(𝑺¯𝝁)𝒩(𝟎,𝚺𝝁𝝁𝝁T).H_{0}:\sqrt{n}\bar{\bm{S}}_{\mathcal{L}}\sim\mathcal{N}({\bm{0}},\mathbf{I})\text{ v.s. }H_{1}:\sqrt{n}(\bar{\bm{S}}_{\mathcal{L}}-\bm{\mu}_{\mathcal{L}})\sim\mathcal{N}({\bm{0}},\bm{\Sigma}_{\bm{\mu}_{\mathcal{L}}}-\bm{\mu}_{\mathcal{L}}\bm{\mu}_{\mathcal{L}}^{T}).

According to the fundamental Neyman-Pearson Lemma (Neyman and Pearson, 1933), the corresponding most powerful (MP) test is the likelihood ratio test. We thus consider the data-relevant part of the log-likelihood ratio of the above two distributions,

f𝑺¯(𝝁)=12n𝑺T(𝐈(𝚺𝝁𝝁𝝁T)1)𝑺+𝝁T(𝚺𝝁𝝁𝝁T)1𝑺.f_{\bar{\bm{S}}_{\mathcal{L}}}(\bm{\mu}_{\mathcal{L}})=-\frac{1}{2n}\bm{S}_{\mathcal{L}}^{T}(\mathbf{I}-(\bm{\Sigma}_{\bm{\mu}_{\mathcal{L}}}-\bm{\mu}_{\mathcal{L}}\bm{\mu}_{\mathcal{L}}^{T})^{-1})\bm{S}_{\mathcal{L}}+\bm{\mu}_{\mathcal{L}}^{T}(\bm{\Sigma}_{\bm{\mu}_{\mathcal{L}}}-\bm{\mu}_{\mathcal{L}}\bm{\mu}_{\mathcal{L}}^{T})^{-1}\bm{S}_{\mathcal{L}}.

For a large nn, the dominating term in f𝑺¯(𝝁)f_{\bar{\bm{S}}_{\mathcal{L}}}(\bm{\mu}_{\mathcal{L}}) is 𝝁T(𝚺𝝁𝝁𝝁T)1𝑺.\bm{\mu}_{\mathcal{L}}^{T}(\bm{\Sigma}_{\bm{\mu}_{\mathcal{L}}}-\bm{\mu}_{\mathcal{L}}\bm{\mu}_{\mathcal{L}}^{T})^{-1}\bm{S}_{\mathcal{L}}. By (4.7) in Theorem 4.2, the first order Taylor expansion of this term is precisely 𝝁T𝑺\bm{\mu}_{\mathcal{L}}^{T}\bm{S}_{\mathcal{L}}! This implies that the MP test rejects when 𝑺¯\bar{\bm{S}}_{\mathcal{L}} is colinear with 𝝁.\bm{\mu}_{\mathcal{L}}. The above heuristics thus suggests that we consider the oracle test statistic Boracle=𝝁T𝑺¯/𝝁2B_{\text{oracle}}=\bm{\mu}_{\mathcal{L}}^{T}\bar{\bm{S}}_{\mathcal{L}}/\|\bm{\mu}_{\mathcal{L}}\|_{2}.

In our simulation studies in Section 5, since we know the form of the alternative distribution, we can estimate 𝝁\bm{\mu}_{\mathcal{L}} with high accuracy through an independent simulation. That is, with the known alternative distribution 𝐏𝑼\mathbf{P}_{\bm{U}}, for a large KK we simulate 𝑽1,,𝑽Ki.i.d.𝐏𝑼.\bm{V}_{1},\ldots,\bm{V}_{K}\stackrel{{\scriptstyle i.i.d.}}{{\sim}}\mathbf{P}_{\bm{U}}. From the binary expansion of 𝑽1,,𝑽K,\bm{V}_{1},\ldots,\bm{V}_{K}, we obtain the vector of symmetry statistics 𝑺~\tilde{\bm{S}}_{\mathcal{L}} and an estimate of 𝝁\bm{\mu}_{\mathcal{L}} denoted by 𝝁~=𝑺~/n.\tilde{\bm{\mu}}_{\mathcal{L}}=\tilde{\bm{S}}_{\mathcal{L}}/n. The oracle test statistic from simulations is then B~oracle=𝝁~T𝑺¯/𝝁~.\widetilde{B}_{\text{oracle}}=\tilde{\bm{\mu}}_{\mathcal{L}}^{T}\bar{\bm{S}}_{\mathcal{L}}/\|\tilde{\bm{\mu}}_{\mathcal{L}}\|.

We show in simulations that even when DD is as small as 33, B~oracle\widetilde{B}_{\text{oracle}} is powerful, and numerically it outperforms all existing competitors under consideration across a wide spectrum of alternatives and noise levels. For example, for the cases when the joint distributions are Gaussian with linear dependency, the power curves of B~oracle\widetilde{B}_{\text{oracle}} dominate those of the distance correlation when p=2p=2 and the F-test when p=3,p=3, which are known to be optimal. Compared to existing tests, the gain of the BEAST with oracle in power suggests that suitably chosen deterministic weights for the alternative provide a unified yet simple solution to improve the power. To the best of our knowledge, this is the first time that such a benchmark on the feasible power performance is available for the problem of testing uniformity.

Besides the useful insight about the feasible limit of power, the oracle also provides insights on the optimal weights under each alternative. For example, in simulations we find high colinearity between the approximate oracle weight vector 𝝁~\tilde{\bm{\mu}}_{\mathcal{L}} and that of the Spearman’s ρ\rho, 𝒓D\bm{r}_{D}, as found in Section 3.1. This weight vector makes the one-sided test with B~oracle\widetilde{B}_{\text{oracle}} more powerful than the two-sided test with Spearman’s ρ.\rho.

Although the optimal weight 𝝁\bm{\mu}_{\mathcal{L}} or 𝝁~\tilde{\bm{\mu}}_{\mathcal{L}} is unknown in practice, an unbiased and asymptotically efficient estimate of 𝝁\bm{\mu}_{\mathcal{L}} is 𝑺¯.\bar{\bm{S}}_{\mathcal{L}}. This motivates us to develop an approximation of B~oracle\widetilde{B}_{\text{oracle}} through resampling and regularization, which we discuss next.

4.3 The BEAST Statistic

In practice, we are agnostic about 𝝁.\bm{\mu}_{\mathcal{L}}. Blindly replacing 𝝁~\tilde{\bm{\mu}}_{\mathcal{L}} in B~oracle\widetilde{B}_{\text{oracle}} with 𝑺¯\bar{\bm{S}}_{\mathcal{L}} will result in colinearity with itself and the statistic reduces to the classical χ2\chi^{2}-test statistic. Traditionally, the data-splitting strategy has often been employed for this type of situations to facilitate data-driven decision (Hartigan, 1969; Cox, 1975), i.e., half of the data is used to calibrate the statistical procedure such as screening the null features (Wasserman and Roeder, 2009; Barber and Candès, 2019), determining the proper weights for individual hypotheses (Ignatiadis et al., 2016), recovering the optimal projection for dimension reducion (Huang, 2015), and estimating the latent loading for factor models (Fan et al., 2019), while a statistical decision is implemented using the remaining half. However, the single data-splitting procedure only uses half of the data for decision making, which inevitably bears undesirable randomness and therefore leads to power loss for hypothesis testing. Some recent efforts have shown that this shortcoming can be lessened by using multiple splittings (Romano and DiCiccio, 2019; Liu et al., 2019; Dai et al., 2020).

Motivated by the principle of multiple splitting, we propose to approximate BoracleB_{\text{oracle}} through resampling: We replace 𝝁\bm{\mu}_{\mathcal{L}} in BoracleB_{\text{oracle}} with 𝑺¯\bar{\bm{S}}_{\mathcal{L}}, and we replace 𝑺¯\bar{\bm{S}}_{\mathcal{L}} in BoracleB_{\text{oracle}} with its resampling version 𝑺¯\bar{\bm{S}}_{\mathcal{L}}^{*}. Important resampling methods include bootstrap (Efron and Tibshirani, 1994) and subsampling (Politis et al., 1999). Bootstrap and subampling are known to have similar performance in approximating the sampling distribution of the target statistic. In this paper, we use the subsampling method to facilitate the calculation of the empirical copula distribution when the marginal distributions are unknown Nelsen (2007). In addition to the above consideration, one intuition behind this resampling approach is to help distinguish the alternative distribution from the null: Under the null, since 𝝁=𝟎,\bm{\mu}_{\mathcal{L}}={\bm{0}}, we expect the magnitude of 𝑺¯\bar{\bm{S}}_{\mathcal{L}} and 𝑺¯\bar{\bm{S}}_{\mathcal{L}}^{*} to be small and not very colinear after regularization. On the other hand, under the alternative, since 𝝁𝟎,\bm{\mu}_{\mathcal{L}}\neq{\bm{0}}, we expect the two estimations of 𝝁\bm{\mu}_{\mathcal{L}} to be both colinear with 𝝁\bm{\mu}_{\mathcal{L}} and thus to be highly colinear themselves. Therefore, the magnitude of the test statistic could be different to help distinguish the alternative distribution from the null.

In addition, we apply regularization to accommodate sparsity, i.e., the non-uniformity can be explained by a few binary interactions Λ\Lambda’s with 𝐄[AΛ]0.\mathbf{E}[A_{\Lambda}]\neq 0. This sparsity assumption is often reasonable in the BET framework, since 𝐄[AΛ]=0\mathbf{E}[A_{\Lambda}]=0 is equivalent to the symmetry of distribution according to the interaction Λ\Lambda. Thus, sparsity over 𝐄[AΛ]\mathbf{E}[A_{\Lambda}]’s is equivalent to a highly symmetric distribution. For example, if a multivariate distribution is symmetric in every direction, then each one-dimensional projection of this distribution has a real characteristic function. By Theorem 2.2, we have 𝐄[AΛ]=0\mathbf{E}[A_{\Lambda}]=0 for all Λ\Lambda involving an even number of binary variables (𝟏pTΛ𝟏D{\bm{1}}_{p}^{T}\Lambda{\bm{1}}_{D} is even). Many global forms of dependency also correspond to sparse structures in 𝝁.\bm{\mu}_{\mathcal{L}}.

The estimation of 𝝁\bm{\mu}_{\mathcal{L}} under the sparsity assumption is closely related to the normal mean problem, where many good regularization based methods are readily available (Wasserman, 2006). For example, in Donoho and Johnstone (1994), it is shown that estimation with soft thresholding is nearly optimal. We denote the vector-valued soft thresholding function by 𝒯(𝒙,λ){\mathcal{T}}(\bm{x},\lambda) for q×1q\times 1 vector 𝒙\bm{x} and threshold λ>0,\lambda>0, so that 𝒯(𝒙,λ)=sign(x)(|x|λ)+,=1,,q.{\mathcal{T}}_{\ell}(\bm{x},\lambda)=\text{sign}(x_{\ell})(|x_{\ell}|-\lambda)_{+},~{}\ell=1,\ldots,q. In construction of our test statistic, we choose to use soft-thresholding as a regularization step to screen the small observations in 𝑺¯\bar{\bm{S}}_{\mathcal{L}} and 𝑺¯\bar{\bm{S}}_{\mathcal{L}}^{*} due to the null distribution or due to the sparsity 𝐄[AΛ]=0\mathbf{E}[A_{\Lambda}]=0 for certain interaction Λ\Lambda’s under the alternative, thus improves the power of the test statistic.

In summary, we consider the approximation of BoracleB_{\text{oracle}} through subsampling, while using regularization to obtain a good estimate of the optimal weight vector 𝒯(𝑺¯,λ)/𝒯(𝑺¯,λ)2.{\mathcal{T}}(\bar{\bm{S}}_{\mathcal{L}},\lambda)/\|{\mathcal{T}}(\bar{\bm{S}}_{\mathcal{L}},\lambda)\|_{2}. The detailed steps are listed below.

  1. Step 1:

    From nn observations of 𝑼1,,,𝑼n,\bm{U}_{1},,\ldots,\bm{U}_{n}, obtain mm subsamples of size rr: 𝑼1,k,,𝑼r,k\bm{U}_{1,k}^{*},\ldots,\bm{U}_{r,k}^{*}, k=1,,m.k=1,\ldots,m. For each subsample k,k, base on the binary expansions of 𝑼1,k,,𝑼r,k\bm{U}_{1,k}^{*},\ldots,\bm{U}_{r,k}^{*}, find the vector of average symmetry statistics 𝑺¯,k\bar{\bm{S}}_{\mathcal{L},k}^{*}.

  2. Step 2:

    Take the average over mm subsamples to obtain 𝑺¯=m1k=1m𝑺¯,k.\bar{\bm{S}}_{\mathcal{L}}^{*}={m}^{-1}\sum_{k=1}^{m}\bar{\bm{S}}_{\mathcal{L},k}^{*}. Apply the soft-thresholding function to get an estimate of 𝝁\bm{\mu}_{\mathcal{L}} as 𝒯(𝑺¯,λ).{\mathcal{T}}(\bar{\bm{S}}_{\mathcal{L}}^{*},\lambda).

  3. Step 3:

    The BEAST statistic BλB_{\lambda} is obtained as

    Bλ=𝒯(𝑺¯,λ)T𝒯(𝑺¯,λ)/𝒯(𝑺¯,λ)2.B_{\lambda}={\mathcal{T}}(\bar{\bm{S}}_{\mathcal{L}},\lambda)^{T}{\mathcal{T}}(\bar{\bm{S}}^{*}_{\mathcal{L}},\lambda)/\|{\mathcal{T}}(\bar{\bm{S}}_{\mathcal{L}},\lambda)\|_{2}. (4.8)

We study the empirical power of the BEAST in Section 5, which shows that by approximating B~oracle\widetilde{B}_{\text{oracle}} with regularization and subsampling, BλB_{\lambda} has a robust power against many alternative distributions, especially complex nonlinear forms of dependency.

We now study the asymptotic distributional properties of BλB_{\lambda} under the assumption of known marginal distributions. Denote the 2pD×12^{pD}\times 1 vector of cell proportions of the discretization out of nn samples by 𝒑^c.\widehat{\bm{p}}_{c}. We have the following theorem on the distribution of the subsample symmetry statistic 𝑺¯\bar{\bm{S}}_{\mathcal{L}} condition on 𝑺¯\bar{\bm{S}}_{\mathcal{L}}.

Theorem 4.3.

Condition on 𝐒¯,\bar{\bm{S}}_{\mathcal{L}}, as m,m\rightarrow\infty, we have

m(𝑺¯𝑺¯)𝒩(𝟎,nrr(n1)(𝐇diag(𝒑^c)𝐇𝑺¯𝑺¯T)[1,1])\sqrt{m}(\bar{\bm{S}}_{\mathcal{L}}^{*}-\bar{\bm{S}}_{\mathcal{L}})\sim\mathcal{N}\bigg{(}{\bm{0}},\frac{n-r}{r(n-1)}(\mathbf{H}diag(\widehat{\bm{p}}_{c})\mathbf{H}-\bar{\bm{S}}_{\mathcal{L}}\bar{\bm{S}}_{\mathcal{L}}^{T})_{[-1,-1]}\bigg{)}

where M[1,1]M_{[-1,-1]} is the submatrix of MM with the first row and first column removed.

Theorem 4.3 holds both under the null distribution and the alternative distribution. This result thus provides useful guidance and efficient algorithms to simulate the null and alternative distributions of BλB_{\lambda} for any λ.\lambda. The detailed asymptotic distribution of BλB_{\lambda} with a positive λ\lambda and the analysis of the power function are useful for developing optimal adaptive tests and are interesting problems for future studies.

4.4 Practical Considerations

In this section, we discuss some practical considerations in applying the BEAST. The first practical issue is whether using the empirical CDF would lead to some loss of power. As discussed in Zhang (2019), the difference between using the known CDF and empirical CDF is similar to the difference between the multinomial model and the multivariate hypergeometric model for the contingency table, in which the theory and performance are similar too. In all of our numerical studies, we considered the method using the empirical CDF.

A related issue is the choice of depth DD and threshold λ\lambda in practice. In our simulations, we find that with D=3D=3, the BEAST with oracle has a higher power than the linear model based tests for Gaussian data, which indicates that D=3D=3 is sufficiently large to detect many important forms of dependency. In our simulation studies in Section 5 below and Section D in the supplmentary materials, we also show the power is relatively stable as DD increases slightly. Moreover, data studies show that using D=2D=2 can already provide many interesting findings Xiang et al. (2022) in practice. Therefore, we choose D=3D=3 for this paper. We shall also choose a λ=O(Dp/n)\lambda=O(\sqrt{Dp/n}) according to the large deviation of the symmetry statistic. That is approximately 90%90\% of these symmetry statistics under the null of uniformity will be thresholded to 0. A general optimal choice of DD and λ\lambda for some specific alternative should come from a trade-off between them and n,p,n,p, and the signal strength. This would be an interesting problem for future studies.

5 Simulation Studies

5.1 Testing Bivariate Independence

In this section, we consider the problem of testing the bivariate independence. The sample size is set to be n=128n=128. The BEAST with oracle and the BEAST are constructed with the empirical copula distribution and with 2,D,cross={Λ=Λ1\raisebox{-.9pt} {r}⃝ Λ2:Λ11,D,unif and Λ21,D,unif},m=128\mathcal{L}_{2,D,\text{cross}}=\{\Lambda=\Lambda_{1}\raisebox{0.5pt}{\raisebox{-.9pt} {r}⃝ }\Lambda_{2}:\Lambda_{1}\in\mathcal{L}_{1,D,\text{unif}}\text{ and }\Lambda_{2}\in\mathcal{L}_{1,D,\text{unif}}\},m=128, D=3D=3, r=24r=24, and λ=(pDlog2)(8n)1=0.064\lambda=\sqrt{(pD\log 2)(8n)^{-1}}=0.064. For the BEAST with oracle, we choose K=105K=10^{5} to obtain the oracle weights 𝝁~\widetilde{\bm{\mu}}_{\mathcal{L}} and B~oracle\widetilde{B}_{\text{oracle}} for each alternative distribution, and the critical region is then obtained through 10410^{4} draws from the bivariate uniform distribution over [0,1]2[0,1]^{2}. For the BEAST, the critical region is formed with 100100 BλB_{\lambda}’s permuted from the data. The level of all tests is set to be 0.1.0.1.

We compare the power of the two versions of the BEAST with the following methods: the χ2\chi^{2}-test and its improvement the UU-statistic permutation (USP) test (Berrett et al., 2021; Berrett and Samworth, 2021) with the same discretization for BλB_{\lambda}, the Fisher exact scanning (Ma and Mao, 2019), the distance correlation (Székely et al., 2007), the kk-nearest neighbor mutual information (KNN-MI, Kinney and Atwal (2014)) with the default parameters, the kk-nearest neighbour based Mutual Information Test (MINT, Berrett and Samworth (2019)) with default averaging over kk, MaxBET (Zhang, 2019), BERET (Lee et al., 2023), and the high-dimensional multinomial test (HDMultinomial) by Balakrishnan and Wasserman (2019a). Among these tests, the HDMultinomial, the MINT, and the USP test are shown to be minimax optimal in power.

The data (xi,yi),i=1,2,,n=128(x_{i},y_{i}),i=1,2,\cdots,n=128 of the alternative distributions are generated according to the four different settings in Table 1 below. Parameter κ\kappa is evenly spaced over [0,1][0,1] to represent the level of noise. The settings are chosen such that the power curves display a thorough comparison for different signal strengths. In Figure 1, 1,0001,000 simulations are conducted to calculate the empirical power of each test for each setting with a given κ\kappa.

Table 1: Simulation scenarios for p=2p=2: The following variables are all independent. ϵj𝒩(0,1)\epsilon_{j}\sim\mathcal{N}(0,1) for j=1,,8j=1,\ldots,8; UUnif[1,1]U\sim\text{Unif}[-1,1] ; ϑUnif[π,π]\vartheta\sim\text{Unif}[-\pi,\pi]; WMulti-Bern({1,2,3},(1/3,1/3,1/3))W\sim\text{Multi-Bern}(\{1,2,3\},(1/3,1/3,1/3)); V1Bern({2,4},(1/2,1/2))V_{1}\sim\text{Bern}(\{2,4\},(1/2,1/2)); and V2Multi-Bern({1,3,5},(1/3,1/3,1/3))V_{2}\sim\text{Multi-Bern}(\{1,3,5\},(1/3,1/3,1/3)). κ\kappa is evenly spaced between 0 and 1.
Scenario Generation of XX Generation of YY
Bivariate Normal X=0.40.3κϵ1+0.6+0.3κϵ2\displaystyle X=\sqrt{0.4-0.3\kappa}\epsilon_{1}+\sqrt{0.6+0.3\kappa}\epsilon_{2} Y=0.40.3κϵ1+0.6+0.3ϵ3\displaystyle Y=\sqrt{0.4-0.3\kappa}\epsilon_{1}+\sqrt{0.6+0.3}\epsilon_{3}
Parabolic X=UX=U~{}~{}~{}~{}~{} Y=0.25X2+(0.4κ+0.1)ϵ4Y=0.25X^{2}+(0.4\kappa+0.1)\epsilon_{4}
Circle X=cosϑ+(0.6κ+0.1)ϵ5X=\cos\vartheta+(0.6\kappa+0.1)\epsilon_{5}~{}~{}~{}~{}~{} Y=sinϑ+(0.6κ+0.1)ϵ6Y=\sin\vartheta+(0.6\kappa+0.1)\epsilon_{6}
Checkerboard X=W+(0.3κ+0.05)ϵ7X=W+(0.3\kappa+0.05)\epsilon_{7}~{}~{}~{}~{}~{} Y=V1𝕀(W=2)+V2𝕀(W2)+(1.2κ+0.2)ϵ8\displaystyle Y=V_{1}\mathbb{I}(W=2)+V_{2}\mathbb{I}(W\neq 2)+(1.2\kappa+0.2)\epsilon_{8}
Refer to caption
Figure 1: The power curves of various methods when testing the bivariate independence under four alternatives. The sample size n=128n=128 and the depth of the BEAST is chosen as 33. The level of significance is set to be 0.10.1. The BEAST with oracle provides a benchmark on the feasible power for all cases. The power of the BEAST consistently ranks within the top three among all tests for all cases, while being the best under the “Parabolic” and “Circle” cases.

We first comment on the performance of the BEAST with oracle. Although this test is not achievable in practice, it provides many important insights in these simulation examples. From Figure 1, we see that with a small depth D=3D=3, the BEAST with oracle achieves the highest power among all methods, for every alternative distribution and every level of noise. In particular, under the bivariate normal case, the power curve of B~oracle\widetilde{B}_{\text{oracle}} is higher than that of the distance correlation, while leaving substantial gaps to other nonparametric tests. The good performance of the distance correlation is expected, since it has been shown that it is a monotone function of Pearson correlation under normality (Székely et al., 2007). These facts thus again show that the BEAST with oracle can accurately approximate the optimal power under an alternative. Therefore, the BEAST with oracle provides a useful benchmark for the performance of tests.

Moreover, in this case we find high colinearity between the approximate oracle weight vector 𝝁~\tilde{\bm{\mu}}_{\mathcal{L}} and that of the Spearman’s ρ\rho, 𝒓D\bm{r}_{D}, as found in Section 3.1. This shows the ability of B~oracle\widetilde{B}_{\text{oracle}} to approximate the optimal weights. The higher power of B~oracle\widetilde{B}_{\text{oracle}} can be also attributed to knowing the sign of correlation under this oracle.

The optimality of the BEAST with oracle is further demonstrated in other three more complicated scenarios with nonlinear dependency, where its power curve dominates all others by a huge margin. This result again indicates the potential of gains in power for these alternatives. To the extent of our knowledge, the BEAST with oracle is the first method in literature that evidences the potential of profound improvement in power via a suitable choice of weights.

We now turn to the comparison of BλB_{\lambda} with existing tests. The general phenomenon in Figure 1 is that every existing test has some advantageous and disadvantageous scenarios. For examples, the Spearman’s ρ\rho will have optimal power under the “Bivariate Normal” case while being powerless in the other three situations due to a zero correlation, the χ2\chi^{2}-test has a good power in the “Checkerboard” scenario but has the worst power under the “Bivariate Normal” case, and the distance correlation has a high power under the “Bivariate Normal” and “Parabolic” cases while not performing well in the other two. These phenomena about the power properties of these three tests can be explained by the deterministic weight matrices in the approximate quadratic form of symmetry statistics, as discussed in Section 3.

The empirical power of the BEAST, however, is always high against each alternative distribution and consistently ranks within the top three among all tests, for all alternatives, and for all levels of noise. In particular, the power curve of BλB_{\lambda} dominates those of other tests under the scenarios ‘Parabolic” and “Circle.” The reasons for this high power include (a) the subsampling approximation of the optimal weights 𝝁\bm{\mu}_{\mathcal{L}} and the approximate MP test statistic B~oracle\widetilde{B}_{\text{oracle}} and (b) the regularization step with soft-thresholding which takes advantage of the equivalence of sparsity and symmetry.

Note also that under the “Checkerboard” scenario, the data contain several natural clusters. This feature of the alternative distribution would favor statistical methods from the kk-nearest neighbour methods. Therefore, the good powers of KNN-MI and MINT are expected. The fact that BλB_{\lambda} has competitive power with KNN-MI and MINT under this scenario again demonstrates the ability of the BEAST to provide a high power despite being agnostic of the specific alternative.

5.2 Testing Independence of Response and Predictors

In this section, we consider the test of independence between a bivariate predictor (X1,X2)(X_{1},X_{2}) and one univariate response YY. Based on the BEAUTY equation in Theorem 2.2 and the discussions afterwards, this test at depth DD is equivalent to test H0:𝐄[AΛ]=𝟎H_{0}:\mathbf{E}[A_{\Lambda}]={\bm{0}} for Λ3,D,cross\Lambda\in\mathcal{L}_{3,D,\text{cross}} where

3,D,joint cross={Λ=Λ1\raisebox{-.9pt} {r}⃝ Λ2:Λ12,D,unif,Λ21,D,unif}.\mathcal{L}_{3,D,~{}\text{joint cross}}=\left\{\Lambda=\Lambda_{1}\raisebox{0.5pt}{\raisebox{-.9pt} {r}⃝ }\Lambda_{2}:\Lambda_{1}\in\mathcal{L}_{2,D,\text{unif}},\Lambda_{2}\in\mathcal{L}_{1,D,\text{unif}}\right\}. (5.1)

Thus, B~oracle\widetilde{B}_{\text{oracle}} and BλB_{\lambda} are constructed according to 3,D,cross.\mathcal{L}_{3,D,\text{cross}}. The critical regions of these statistics are obtained through permutations similarly to that in Section 5.1. With D=3D=3 and p=3p=3, we set λ=(pDlog2)(8n)1=0.078\lambda=\sqrt{(pD\log 2)(8n)^{-1}}=0.078 for the BEAST.

We compare B~oracle\widetilde{B}_{\text{oracle}} and BλB_{\lambda} with existing nonparametric tests of independence for vectors including the χ2\chi^{2}-test from the same discretization for BλB_{\lambda} with simulated pp-values, the FF-test from the linear model of YY against (X1,X2)(X_{1},X_{2}), the distance correlation (Székely et al., 2007), the kk-nearest neighbor mutual information (KNN-MI, Kinney and Atwal (2014)) with the default parameters, the kk-nearest neighbor based Mutual Information Test (MINT, Berrett and Samworth (2019)) with averaging over kk, BERET (Lee et al., 2023), and the multiscale Fisher’s independence test (MultiFIT, Gorsky and Ma (2018)).

The data (x1,i,x2,i,yi),i=1,2,,n=128(x_{1,i},x_{2,i},y_{i}),i=1,2,\cdots,n=128 are generated according to the settings in Table 2 below. The values of κ\kappa are evenly spaced over [0,1][0,1] to represent the strength of noise. The parameters in the scenarios are chosen such that the power curves in Figure 2 show a thorough comparison over different magnitude of signals.

Table 2: Simulation scenarios for p=3p=3: The following variables are all independent. ϵj𝒩(0,1)\epsilon_{j}\sim\mathcal{N}(0,1) for j=1,,6j=1,\ldots,6; Gj𝒩(0,1)G_{j}\sim\mathcal{N}(0,1) for j=1,2,3j=1,2,3; UjUnif[0,1]U_{j}\sim\text{Unif}[0,1] for j=1,2j=1,2; ϑUnif[π,π]\vartheta\sim\text{Unif}[-\pi,\pi]; and RBern({1,1},(1/2,1/2))R\sim\text{Bern}(\{-1,1\},(1/2,1/2)). κ\kappa is evenly spaced between 0 and 1. h(κ)=0.68+0.64κ0.32κ2.h(\kappa)=\sqrt{0.68+0.64\kappa-0.32\kappa^{2}}. In the sphere setting, 𝑮=(G12+G22+G32)1/2||\bm{G}||=(G_{1}^{2}+G_{2}^{2}+G_{3}^{2})^{1/2}. In the doule helix setting, c0=0.4κ+0.5c_{0}=0.4\kappa+0.5.
Scenario Generation of (X1,X2)(X_{1},X_{2}) Generation of YY
Linear (X1,X2)𝒩2(𝟎,𝐈2)(X_{1},X_{2})\sim\mathcal{N}_{2}({\bm{0}},\mathbf{I}_{2}) Y=0.4(1κ)(X1+X2)+h(κ)ϵ1Y=0.4(1-\kappa)(X_{1}+X_{2})+h(\kappa)\epsilon_{1}
Sphere (X1,X2)=(G1/𝑮,G2/𝑮)(X_{1},X_{2})=(G_{1}/||\bm{G}||,G_{2}/||\bm{G}||) Y=G3/𝑮+(0.7κ+0.3)ϵ2\displaystyle Y=G_{3}/||\bm{G}||+(0.7\kappa+0.3)\epsilon_{2}
Sine (X1,X2)=(U1,U2)(X_{1},X_{2})=(U_{1},U_{2}) Y=sin(4π(X1+X2))+(2κ+0.2)ϵ3\displaystyle Y=\sin\left(4\pi(X_{1}+X_{2})\right)+(2\kappa+0.2)\epsilon_{3}
Double Helix (X1,X2)=(Rcosϑ+c0ϵ4,Rsinϑ+c0ϵ5)(X_{1},X_{2})=(R\cos\vartheta+c_{0}\epsilon_{4},R\sin\vartheta+c_{0}\epsilon_{5}) Y=ϑ+c0ϵ6Y=\vartheta+c_{0}\epsilon_{6}
Refer to caption
Figure 2: The power curves of various methods when testing the independence between (X1,X2)(X_{1},X_{2}) and YY under four alternatives. The depth of the BEAST is 33 and n=128n=128. The level of significance is set to be 0.10.1. The BEAST with oracle provides a benchmark on the feasible power for all cases. The power of the BEAST is the highest among all tests for all nonlinear forms of dependency.

The messages from Figure 2 are similar to those when p=2.p=2. The BEAST with oracle leads the power under all scenarios to provide a benchmark for feasible power. In particular, under the “Linear” scenario, the gain of the power curve of BoracleB_{\text{oracle}} from those of the FF-test and the distance correlation demonstrates the ability of BoracleB_{\text{oracle}} to approximate the optimal power. Similar to what we observed in the bivariate cases, the huge margin between the power curve of BoracleB_{\text{oracle}} and other tests indicates the potential substantial gain in power with a proper choice of weights. By approximating the BEAST with oracle, BλB_{\lambda} achieves robust power against any form of alternative. The BEAST is particularly powerful against complex nonlinear forms of dependency, and its power curve leads others with a huge margin under all three nonlinear scenarios.

In summary, our simulations in this section show that BλB_{\lambda} can approximate the optimal power benchmarked by BoracleB_{\text{oracle}}. The BEAST demonstrates a robust power against many common alternatives in both dimensions p=2p=2 or 33. The BEAST is particularly powerful against a large class of complex nonlinear forms of dependency.

6 Empirical Data Analysis

In this section, we apply the BEAST method to the n=256n=256 visually brightest stars from the Hipparcos catalog (Hoffleit and Warren Jr, 1987; Perryman et al., 1997). For each star, a number of features about its location and brightness are recorded. Here, we are interested in detecting if there exists any dependence between the joint galactic coordinates (X1,X2X_{1},X_{2}) and the brightness of stars. We consider the absolute magnitude in this section, while study the visual magnitude in the Supplementary Materials. We consider the BEAST, χ2\chi^{2}-test, FF-test, distant correlation (Dist Corr), KNN-MI, MINT, and MultiFIT to this problem. The pp-values of all the approaches are summarized in Table 3. The BEAST is constructed with m=128m=128, r=48r=48, λ=(pDlog2)(8n)1=0.055\lambda=\sqrt{(pD\log 2)(8n)^{-1}}=0.055, and =3,D,joint cross\mathcal{L}=\mathcal{L}_{3,D,\text{joint cross}} defined in (5.1) where D=3D=3.

Table 3: The pp-values of various methods in testing the independence between the location and brightness of stars.
BEAST Dist Corr χ2\chi^{2}-test FF-test MultiFIT KNN-MI MINT
pp-value 0 0 0.091 6e-5 0.011 0.34 0.01

When testing the independence between the absolute magnitude and the galactic coordinates, this hypothesis is significant based on all the methods except KNN-MI. In addition to producing pp-values, the BEAST is capable to provide interpretation of the dependence while most competing methods cannot. Hence, we investigated the most important binary interaction among all possible combinations when analyzing the absolute magnitude. From each subsample, we record the most significant binary interaction. The most frequently occurred such interaction is Λ=(000110100).\Lambda=\left(\begin{array}[]{ccc}0&0&0\\ 1&1&0\\ 1&0&0\end{array}\right).

Refer to caption
Refer to caption
Refer to caption
Figure 3: Display of the binary interaction explaining the relationship between the location and brightness of stars. The left panel shows the scatter plot of galactic latitude (XX) and absolute magnitude (YY) on the original scale. The middle panel shows the empirical copula of this distribution, equipped with the most frequent binary interaction in subsamples. There are 162 points in white regions in contrast to 94 points in blue regions, resulting in a symmetry statistic is 6868 and a ZZ-statistic of 4.254.25 for testing the balance of points in white regions and blue regions. The right panel shows the scatter plot on the original scale equipped with the same binary interaction. We notice that brighter stars (lower YY) tend to fall between 16.1-16.1^{\circ} and 23.423.4^{\circ} in latitude, while darker stars (higher YY) tend to be outside this interval of XX. This pattern provides a scientifically meaningful explanation of the statistical significance.

Note that for this Λ\Lambda with a first row of 0’s, the first dimension (the galactic longitude) is not involved. In Figure 3, we plot the absolute magnitude against the galactic latitude. The left panel is the scatter plot of these two variables; the middle panel is the scatter plot after the copula transformation, grouped according to the aforementioned Λ\Lambda, with the white regions indicating positive interaction and blue regions indicating negative interaction; the right panel is the scatter plot on the original scale when grouped according to the same Λ\Lambda. The symmetry statistic for Λ\Lambda is 6868, resulting in a ZZ-statistic of 4.254.25 for testing the balance of points in white regions and blue regions. Among these stars with the most absolute magnitude, the majority of them are placed between 16.1-16.1^{\circ} and 23.423.4^{\circ} in latitude. Note that in the galactic coordinate system, the fundamental plane is approximately the galactic plane of the Milky Way galaxy. Therefore, the most frequent binary interaction Λ\Lambda makes scientific sense for the statistical significance: the bright stars in the data are around the fundamental plane of the Milky Way galaxy. This clear scientific interpretation of the statistical significance is an advantage of the BEAST and the general BET framework.

7 Summary and Discussions

We study the classical problem of nonparametric dependence detection through a novel perspective of binary expansion. The novel insights from the extension of the Euler formula and the binary expansion approximation of uniformity (BEAUTY) shed lights on the unification of important tests into the novel framework of the binary expansion adaptive symmetry test (BEAST), which considers a data-adaptively weighted sum of symmetry statistics from the binary expansion. The one-dimensional oracle on the weights leads to a benchmark of optimal power for nonparametric tests while being agnostic of the alternative. By approximating the oracle weights with resampling and regularization, the proposed BEAST demonstrates consistent performance and is effectively powerful against a variety of complex dependency forms, showcasing its potential across diverse scenarios.

Our study on powerful nonparametric tests of uniformity can be further extended and generalized to many directions. For example, extensions to general goodness-of-fit tests and two-sample tests can be investigated through the BEAST approach. Recent papers in these directions include Brown and Zhang (2023); Zhao et al. (2023a). Tests of other distributional properties related to uniformity, such as tests of Gaussianity and tests of multivariate symmetry can also be studied through the BEAST approach. Insights from these tests can be used in constructing distribution-free models of dependence Brown et al. (2022).

With Theorem 2.2, we further observe that for two arbitrary random vectors 𝑼1\bm{U}_{1} and 𝑼2\bm{U}_{2} distributed within [1,1]p1[-1,1]^{p_{1}} and [1,1]p2[-1,1]^{p_{2}} respectively, by comparing the characteristic function of the joint distribution and the product of marginal characteristic distributions, the null hypothesis that 𝑼1,D\bm{U}_{1,D} and 𝑼2,D\bm{U}_{2,D} are independent is equivalent to the following hypothesis over the expectations of AΛA_{\Lambda}’s:

H0,D:Cov(AΛ1,AΛ2)=0 for Λ1p1,D,unif and Λ2p2,D,unif.H_{0,D}:Cov(A_{\Lambda_{1}},A_{\Lambda_{2}})=0\text{ for }\Lambda_{1}\in\mathcal{L}_{p_{1},D,~{}\text{unif}}\text{ and }\Lambda_{2}\in\mathcal{L}_{p_{2},D,~{}\text{unif}}.

With this insight, we propose to study this general test of independence between random vectors in future work. In particular, we plan to study the connection between smooth forms of dependency and the order of binary interactions. A good understanding of this connection can help the development of distribution-free tests targeting at smooth forms of dependence.

Our simulation studies show a gap in empirical power between the BEAST and the BEAST with oracle. Thus the optimal trade-off between sample size, dimension, the depth of binary expansion, and the strength of the non-uniformity would be another interesting problem for investigation. The optimal subsampling and thresholding procedures are critical as well. Results on these problems would lead to a BEAST that is adaptively optimal for a wide class of distributions in power.

Software

The R function BEAST is freely available in the R package of BET.

Acknowledgement

Zhang’s research was partially supported by NSF DMS-1613112, NSF IIS-1633212, NSF DMS-1916237 and DMS-2152289. Zhao’s research was partially supported by NSF IIS-1633283 and DMS-2311216. Zhou’s research was partially supported by NSF-IIS 1545994, NSF IOS-1922701, DOE DE-SC0018344. The authors thank Dr. Xiao-Li Meng and Dr. David Donoho for their generous and insightful comments that substantially improve the paper.

References

  • Anderson and Darling [1954] T. W. Anderson and D. A. Darling. A test of goodness of fit. Journal of the American statistical association, 49(268):765–769, 1954.
  • Balakrishnan and Wasserman [2019a] S. Balakrishnan and L. Wasserman. Hypothesis testing for densities and high-dimensional multinomials: Sharp local minimax rates. The Annals of Statistics, 47(4):1893 – 1927, 2019a. doi: 10.1214/18-AOS1729. URL https://doi.org/10.1214/18-AOS1729.
  • Balakrishnan and Wasserman [2019b] S. Balakrishnan and L. Wasserman. Hypothesis testing for densities and high-dimensional multinomials: Sharp local minimax rates. The Annals of Statistics, 47(4):1893 – 1927, 2019b. doi: 10.1214/18-AOS1729. URL https://doi.org/10.1214/18-AOS1729.
  • Barber and Candès [2019] R. F. Barber and E. J. Candès. A knockoff filter for high-dimensional selective inference. The Annals of Statistics, 47(5):2504–2537, 2019.
  • Barber and Janson [2022] R. F. Barber and L. Janson. Testing goodness-of-fit and conditional independence with approximate co-sufficient sampling. The Annals of Statistics, 50(5):2514–2544, 2022.
  • Barron [1989] A. R. Barron. Uniformly powerful goodness of fit tests. The Annals of Statistics, pages 107–124, 1989.
  • Berrett and Samworth [2019] T. B. Berrett and R. J. Samworth. Nonparametric independence testing via mutual information. Biometrika, 106(3):547–566, 2019.
  • Berrett and Samworth [2021] T. B. Berrett and R. J. Samworth. Usp: an independence test that improves on pearson’s chi-squared and the g-test. Proceedings of the Royal Society A, 477(2256):20210549, 2021.
  • Berrett et al. [2021] T. B. Berrett, I. Kontoyiannis, and R. J. Samworth. Optimal rates for independence testing via u-statistic permutation tests. The Annals of Statistics, 49(5):2457–2490, 2021.
  • Blum et al. [1961] J. R. Blum, J. Kiefer, and M. Rosenblatt. Distribution Free Tests of Independence Based on the Sample Distribution Function. The Annals of Mathematical Statistics, 32(2):485 – 498, 1961. doi: 10.1214/aoms/1177705055. URL https://doi.org/10.1214/aoms/1177705055.
  • Box et al. [2005] G. E. Box, J. S. Hunter, and W. G. Hunter. Statistics for experimenters: design, innovation, and discovery, volume 2. Wiley-Interscience New York, 2005.
  • Brown and Zhang [2023] B. Brown and K. Zhang. The AUGUST two-sample test: Powerful, interpretable, and fast. The New England Journal of Statistics in Data Science, (accepted), 2023.
  • Brown et al. [2022] B. Brown, K. Zhang, and X.-L. Meng. Belief in dependence: Leveraging atomic linearity in data bits for rethinking generalized linear models. arXiv preprint arXiv:2210.10852, 2022.
  • Cao and Bickel [2020] S. Cao and P. J. Bickel. Correlations with tailored extremal properties. arXiv preprint arXiv:2008.10177, 2020.
  • Chatterjee [2020] S. Chatterjee. A new coefficient of correlation. Journal of the American Statistical Association, page in press, 2020.
  • Cox [1975] D. Cox. A note on data-splitting for the evaluation of significance levels. Biometrika, 62(2):441–444, 1975.
  • Cox and Reid [2000] D. R. Cox and N. Reid. The theory of the design of experiments. CRC Press, 2000.
  • Cressie and Read [1984] N. Cressie and T. R. Read. Multinomial goodness-of-fit tests. Journal of the Royal Statistical Society: Series B (Methodological), 46(3):440–464, 1984.
  • Dai et al. [2020] C. Dai, B. Lin, X. Xing, and J. Liu. False discovery rate control via data splitting. arXiv preprint arXiv:2002.08542v1, 2020.
  • Deb et al. [2020] N. Deb, P. Ghosal, and B. Sen. Measuring association on topological spaces using kernels and geometric graphs. arXiv preprint arXiv:2010.01768, 2020.
  • Diaz Rivero and Dvorkin [2020] A. Diaz Rivero and C. Dvorkin. Flow-based likelihoods for non-gaussian inference. Phys. Rev. D, 102:103507, Nov 2020. doi: 10.1103/PhysRevD.102.103507. URL https://link.aps.org/doi/10.1103/PhysRevD.102.103507.
  • Donoho and Johnstone [1994] D. L. Donoho and J. M. Johnstone. Ideal spatial adaptation by wavelet shrinkage. Biometrika, 81(3):425–455, 1994.
  • Efron and Tibshirani [1994] B. Efron and R. J. Tibshirani. An introduction to the bootstrap. CRC press, 1994.
  • Escanciano [2006] J. C. Escanciano. Goodness-of-fit tests for linear and nonlinear time series models. Journal of the American Statistical Association, 101(474):531–541, 2006.
  • Fan and Huang [2001] J. Fan and L.-S. Huang. Goodness-of-fit tests for parametric regression models. Journal of the American Statistical Association, 96(454):640–652, 2001.
  • Fan et al. [2019] J. Fan, Y. Ke, Q. Sun, and W. Zhou. Farmtest: Factor-adjusted robust multiple testing with approximate false discovery control. Journal of the American Statistical Association, 114(528):1880–1893, 2019.
  • Geenens and de Micheaux [2020] G. Geenens and P. de Micheaux. The hellinger correlation. Journal of the American Statistical Association, page in press, 2020.
  • Genest and Verret [2005] C. Genest and F. Verret. Locally most powerful rank tests of independence for copula models. Nonparametric Statistics, 17(5):521–539, 2005.
  • Genest et al. [2006] C. Genest, J.-F. Quessy, and B. Rémillard. Goodness-of-fit procedures for copula models based on the probability integral transformation. Scandinavian Journal of Statistics, 33(2):337–366, 2006.
  • Genest et al. [2009] C. Genest, B. Rémillard, and D. Beaudoin. Goodness-of-fit tests for copulas: A review and a power study. Insurance: Mathematics and economics, 44(2):199–213, 2009.
  • Genest et al. [2019] C. Genest, J. Nešlehová, B. Rémillard, and O. Murphy. Testing for independence in arbitrary distributions. Biometrika, 106(1):47–68, 2019.
  • Golubov et al. [2012] B. Golubov, A. Efimov, and V. Skvortsov. Walsh series and transforms: theory and applications, volume 64. Springer Science & Business Media, 2012.
  • González-Manteiga and Crujeiras [2013] W. González-Manteiga and R. M. Crujeiras. An updated review of goodness-of-fit tests for regression models. Test, 22:361–411, 2013.
  • Gorsky and Ma [2018] S. Gorsky and L. Ma. Multiscale Fisher’s independence test for multivariate dependence. arXiv preprint arXiv:1806.06777, 2018.
  • Gretton et al. [2007] A. Gretton, K. Fukumizu, C. H. Teo, L. Song, B. Schölkopf, and A. J. Smola. A kernel statistical test of independence. In Advances in neural information processing systems, pages 585–592, 2007.
  • Harmuth [2013] H. Harmuth. Transmission of Information by Orthogonal Functions. Springer Berlin Heidelberg, 2013. ISBN 9783662132272. URL https://books.google.com/books?id=O67wCAAAQBAJ.
  • Hartigan [1969] J. Hartigan. Using subsample values as typical values. Journal of the American Statistical Association, 64(328):1303–1317, 1969.
  • Heller and Heller [2016] R. Heller and Y. Heller. Multivariate tests of association based on univariate tests. In D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, editors, Advances in Neural Information Processing Systems 29, pages 208–216. Curran Associates, Inc., 2016. URL http://papers.nips.cc/paper/6220-multivariate-tests-of-association-based-on-univariate-tests.pdf.
  • Heller et al. [2013] R. Heller, Y. Heller, and M. Gorfine. A consistent multivariate test of association based on ranks of distances. Biometrika, 100(2):503–510, 2013.
  • Heller et al. [2016] R. Heller, Y. Heller, S. Kaufman, B. Brill, and M. Gorfine. Consistent distribution-free kk-sample and independence tests for univariate random variables. Journal of Machine Learning Research, 17(29):1–54, 2016.
  • Hoeffding [1948] W. Hoeffding. A non-parametric test of independence. The Annals of Mathematical Statistics, pages 546–557, 1948.
  • Hoffleit and Warren Jr [1987] D. Hoffleit and W. Warren Jr. The bright star catalogue. ADCBu, 1(4), 1987.
  • Huang [2015] Y. Huang. Projection test for high-dimensional mean vectors with optimal direction. PhD thesis, Dept. of Stat., The Pennsylvania State University, 2015.
  • Ignatiadis et al. [2016] N. Ignatiadis, B. Klaus, J. Zaugg, and W. Huber. Data-driven hypothesis weighting increases detection power in genome-scale multiple testing. Nature Methods, 13(7):577–580, 2016.
  • Jager and Wellner [2007] L. Jager and J. A. Wellner. Goodness-of-fit tests via phi-divergences. Annals of Statistics, 35(5):2018–2053, 2007.
  • Janková et al. [2020] J. Janková, R. D. Shah, P. Bühlmann, and R. J. Samworth. Goodness-of-fit testing in high dimensional generalized linear models. Journal of the Royal Statistical Society Series B: Statistical Methodology, 82(3):773–795, 2020.
  • Jin and Matteson [2018] Z. Jin and D. Matteson. Generalizing distance covariance to measure and test multivariate mutual dependence via complete and incomplete v-statistics. Journal of Multivariate Analysis, 168:304–322, 2018.
  • Kac [1959] M. Kac. Statistical independence in probability, analysis and number theory, volume 134. Mathematical Association of America, 1959.
  • Kaiser and Soumendra [2012] M. S. Kaiser and N. Soumendra. Goodness of fit tests for a class of markov random field models. The Annals of Statistics, 40(1):104–130, 2012.
  • Kim and Ramdas [2020] I. Kim and A. Ramdas. Dimension-agnostic inference. arXiv preprint arXiv:2011.05068, 2020.
  • Kinney and Atwal [2014] J. Kinney and G. Atwal. Equitability, mutual information, and the maximal information coefficient. Proceedings of the National Academy of Sciences, 111(9):3354–3359, 2014.
  • Kojadinovic and Holmes [2009] I. Kojadinovic and M. Holmes. Tests of independence among continuous random vectors based on cramér–von mises functionals of the empirical copula process. Journal of Multivariate Analysis, 100(6):1137–1154, 2009.
  • LeCam [1973] L. LeCam. Convergence of estimates under dimensionality restrictions. The Annals of Statistics, pages 38–53, 1973.
  • Lee et al. [2023] D. Lee, K. Zhang, and M. R. Kosorok. The binary expansion randomized ensemble test. Statistica Sinica, 33(4), 2023.
  • Lehmann and Romano [2006] E. L. Lehmann and J. P. Romano. Testing statistical hypotheses. Springer Science & Business Media, 2006.
  • Liang et al. [2001] J.-J. Liang, K.-T. Fang, F. Hickernell, and R. Li. Testing multivariate uniformity and its applications. Mathematics of Computation, 70(233):337–355, 2001.
  • Liu et al. [2019] W. Liu, X. Yu, and R. Li. Multiple-splitting projection test for high-dimensional mean vectors. Dept. of Stat., Stanford University, 2019. Unpublished.
  • Lynn [1973] P. A. Lynn. An introduction to the analysis and processing of signals. McMillan, 1973.
  • Ma and Mao [2019] L. Ma and J. Mao. Fisher exact scanning for dependency. Journal of the American Statistical Association, 114(525):245–258, 2019.
  • Miller and Siegmund [1982] R. Miller and D. Siegmund. Maximally selected chi square statistics. Biometrics, 38(4):1011–1016, 1982.
  • Nelsen [2007] R. B. Nelsen. An introduction to copulas. Springer Science & Business Media, 2007.
  • Neyman and Pearson [1933] J. Neyman and E. Pearson. Ix. on the problem of the most efficient tests of statistical hypotheses. Philosophical Transactions of the Royal Society of London. Series A, 231:289–337, 1933.
  • Pearl [1971] J. Pearl. Application of walsh transform to statistical analysis. IEEE Transactions on Systems, Man, and Cybernetics, SMC-1(2):111–119, 1971. ISSN 0018-9472. doi: 10.1109/TSMC.1971.4308267.
  • Perryman et al. [1997] M. Perryman, L. Lindegren, J. Kovalevsky, E. Hoeg, U. Bastian, P. Bernacca, M. Crézé, F. Donati, M. Grenon, M. Grewing, and F. Van Leeuwen. The hipparcos catalogue. Astronomy and Astrophysics, 323(1):49–52, 1997.
  • Pfister et al. [2016] N. Pfister, P. Bühlmann, B. Schölkopf, and J. Peters. Kernel-based tests for joint independence. arXiv preprint arXiv:1603.00285, 2016.
  • Politis et al. [1999] D. N. Politis, J. P. Romano, and M. Wolf. Subsampling. Springer, New York, 1999. ISBN 0387988548 9780387988542. URL http://www.worldcat.org/search?qt=worldcat_org_all&q=9780387988542.
  • Reshef et al. [2011] D. N. Reshef, Y. A. Reshef, H. K. Finucane, S. R. Grossman, G. McVean, P. J. Turnbaugh, E. S. Lander, M. Mitzenmacher, and P. C. Sabeti. Detecting novel associations in large data sets. Science, 334(6062):1518–1524, 2011.
  • Romano and DiCiccio [2019] J. Romano and C. DiCiccio. Multiple data splitting for testing. Department of Statistics, Stanford University, 2019. Unpublished manuscript.
  • Sejdinovic et al. [2014] D. Sejdinovic, B. Sriperumbudur, A. Gretton, and K. Fukumizu. Equivalence of distance-based and RKHS-based statistics in hypothesis testing. The Annals of Statistics, 10(5):2263–2291, 2014.
  • Sen and Sen [2014] A. Sen and B. Sen. Testing independence and goodness-of-fit in linear models. Biometrika, 101(4):927–942, 2014.
  • Shi et al. [2020] H. Shi, M. Hallin, M. Drton, and F. Han. Rate-optimality of consistent distribution-free tests of independence based on center-outward ranks and signs. arXiv preprint arXiv:2007.02186, 2020.
  • Spearman [1904] C. Spearman. The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72–101, 1904.
  • Stephens [1976] M. A. Stephens. Asymptotic results for goodness-of-fit statistics with unknown parameters. The Annals of Statistics, 4(2):357–369, 1976.
  • Székely et al. [2007] G. J. Székely, M. L. Rizzo, and N. K. Bakirov. Measuring and testing dependence by correlation of distances. The Annals of Statistics, 35(6):2769–2794, 12 2007. doi: 10.1214/009053607000000505. URL http://dx.doi.org/10.1214/009053607000000505.
  • Wasserman [2006] L. Wasserman. All of Nonparametric Statistics. Springer Texts in Statistics. Springer New York, 2006. ISBN 9780387306230. URL https://books.google.com/books?id=MRFlzQfRg7UC.
  • Wasserman and Roeder [2009] L. Wasserman and K. Roeder. High dimensional variable selection. The Annals of Statistics, 37(5A):2178–2201, 2009.
  • Xiang et al. [2022] S. Xiang, W. Zhang, S. Liu, C. M. Perou, K. Zhang, and J. Marron. Pairwise nonlinear dependence analysis of genomic data. Annals of Applied Statistics, 2022.
  • Zhang [2002] J. Zhang. Powerful goodness-of-fit tests based on the likelihood ratio. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 64(2):281–294, 2002.
  • Zhang [2019] K. Zhang. BET on independence. Journal of the American Statistical Association, 114(528):1620–1637, 2019. doi: 10.1080/01621459.2018.1537921. URL https://doi.org/10.1080/01621459.2018.1537921.
  • Zhao et al. [2023a] Z. Zhao, K. Zhang, and W. Zhou. Bit by bit: The universal binary coding of random variables in statistics. 2023a. Technical report.
  • Zhao et al. [2023b] Z. Zhao, W. Zhang, M. Baiocchi, Y. Li, and K. Zhang. Fast, flexible, and powerful: Introducing a scalable, bitwise framework for non-parametric testing for dependence structure. Techinical report, 2023b.
  • Zheng et al. [2012] S. Zheng, N.-Z. Shi, and Z. Zhang. Generalized measures of correlation for asymmetry, nonlinearity, and beyond. Journal of the American Statistical Association, 107(499):1239–1252, 2012.
  • Zhu et al. [2017] L. Zhu, K. Xu, R. Li, and W. Zhong. Projection correlation between two random vectors. Biometrika, 104(4):829–843, 2017.