
Delaunay Weighted Two-sample Test for High-dimensional Data by Incorporating Geometric Information

Jiaqi Gu
Department of Neurology and Neurological Sciences, Stanford University
and
Ruoxu Tan (Email: [email protected])
School of Mathematical Sciences, Key Laboratory of
Intelligent Computing and Applications (Ministry of Education),
School of Economics and Management, Tongji University
and
Guosheng Yin
Department of Statistics and Actuarial Science, University of Hong Kong
Abstract

Two-sample hypothesis testing is a fundamental problem with various applications, which faces new challenges in the high-dimensional context. To mitigate the curse of dimensionality, high-dimensional data are typically assumed to lie on a low-dimensional manifold. To incorporate the geometric information in the data, we propose to apply the Delaunay triangulation and develop the Delaunay weight to measure the geometric proximity among data points. In contrast to existing similarity measures that only utilize pairwise distances, the Delaunay weight can take both the distance and direction information into account. A detailed computation procedure to approximate the Delaunay weight for the unknown manifold is developed. We further propose a novel nonparametric test statistic using the Delaunay weight matrix to test whether the underlying distributions of two samples are the same or not. Applied to simulated data, the new test exhibits substantial power gain in detecting differences in principal directions between distributions. The proposed test also shows great power on a real dataset of human face images.


Keywords: Delaunay triangulation; geometric proximity; high dimension; manifold learning; permutation test.

1 Introduction

The two-sample test of whether two underlying distributions are the same is a fundamental problem in statistics and machine learning, with broad applications in change-point detection (Cao et al., 2018; Chen, 2019), goodness-of-fit evaluation (Chwialkowski et al., 2016; Arias-Castro et al., 2018) and experimental design (Morgan and Rubin, 2012). In the era of big data, an emerging challenge is that the power of classical two-sample tests (e.g., Hotelling's $T^2$ test, the Wald test or the likelihood ratio test) decreases or even vanishes as the data dimension increases. Under the high-dimensional setting, several methods have been developed to test the simpler hypothesis of whether two population means are the same. For example, under the sparse alternative that the mean vectors of the two samples differ only in a small number of dimensions, Cai et al. (2013b) developed a maximum-type test statistic. To enhance the power and the robustness to extreme values in detecting sparse mean differences, Chang et al. (2017) proposed a parametric bootstrap approach with feature screening, while Wang et al. (2019) incorporated the trimmed mean approach to develop a robust test. Liu et al. (2022) developed a projection test by assuming sparsity of the optimal projection direction. However, these methods are not consistent against general alternatives, e.g., when the two populations possess the same mean but differ in covariance structure.

Nonparametric two-sample testing for general differences between distributions is a historical research topic with a rich literature. For univariate data, the Kolmogorov–Smirnov test (Kolmogorov, 1933; Smirnov, 1948), the Wilcoxon–Mann–Whitney test (Wilcoxon, 1945; Mann and Whitney, 1947) and the Wald–Wolfowitz runs test (Wald and Wolfowitz, 1940) are early developments on this topic. Although such methods may be directly extended to multivariate settings (Bickel, 1969), they would perform poorly unless the total sample size grows at an exponential rate with the dimension. To circumvent this difficulty, Friedman and Rafsky (1979) developed a two-sample test based on the minimum spanning tree (MST), which connects all data points with the minimal total edge length. Given an MST of the pooled sample, the test statistic is the number of edges connecting data points from different samples, which is shown to be consistent against general alternatives. More recently, many nonparametric two-sample tests have been proposed for multivariate/high-dimensional data, including: (i) distance/energy-based tests built on pairwise distances of the pooled sample (Székely and Rizzo, 2004; Jurečková and Kalina, 2012; Marozzi, 2015); (ii) graph-based tests, e.g., based on the $k$-nearest neighbor ($k$-NN) graph (Schilling, 1986; Henze, 1988; Hall and Tajvidi, 2002) and the $k$-MST graph (Chen and Friedman, 2017; Chen et al., 2018); (iii) kernel tests via a decaying kernel of pairwise Euclidean distances of the pooled sample (Gretton et al., 2009, 2012; Cheng and Xie, 2021; Yan and Zhang, 2023); and (iv) regression tests connecting binary classification and two-sample testing (Kim et al., 2019; Hediger et al., 2022).

Some of the aforementioned works can handle high-dimensional data. To mitigate the curse of dimensionality, high-dimensional data are often assumed to lie on an unknown low-dimensional manifold, which is reasonable in many applications, such as bioinformatics (Moon et al., 2018), image data analysis (Pless and Souvenir, 2009) and natural language processing (Zhao et al., 2021). The performance of nonparametric tests under the low-dimensional manifold setting has been studied in the literature. For example, it has been shown that the kernel test (Cheng and Xie, 2021) and the local regression test (Kim et al., 2019) are adaptive to the intrinsic dimension of the manifold when the tuning parameter is properly chosen. Arias-Castro et al. (2018) also showed that the multivariate chi-squared test is adaptive to the intrinsic dimension of the manifold, though their work focuses mainly on theoretical development rather than practical implementation. However, none of these tests directly takes the manifold structure into account, and our work aims to fill this gap.

In particular, we develop a new nonparametric two-sample test, named the Delaunay weighted test, for high-dimensional data under the low-dimensional manifold assumption. Following the $k$-NN test statistic proposed by Schilling (1986), we define the test statistic as the sum of all within-group proximities of the pooled sample, where the proximity is measured by our newly developed Delaunay weight instead of the $k$-NN. By incorporating the Delaunay triangulation on a Riemannian manifold (Leibon and Letscher, 2000), the Delaunay weight measures the geometric proximity among data points by taking both the geodesic distance and the relative direction into account. In practice, the Delaunay weight cannot be computed exactly because the manifold is usually unknown. As a remedy, we compute the Delaunay weight matrix of low-dimensional manifold representations for approximate inference. We modify a robust manifold learning algorithm (Dimeglio et al., 2014) and apply classical multi-dimensional scaling (MDS, Kruskal, 1964) to obtain the manifold representations. The stereographic projected DELAUNAYSPARSE algorithm is used to find the Delaunay simplices for computing the Delaunay weight. The $p$-value of our test is obtained by a permutation procedure, and our Delaunay weighted test is applicable as long as the sample size is larger than the (estimated) intrinsic dimension of the underlying manifold. Compared with the kernel test and the local regression test, which are based on local distances and adaptive to the intrinsic dimension of the manifold, ours further takes the relative direction among data points into account, and thus it is more efficient in detecting direction differences in addition to location differences between two distributions. Numerical experiments on both synthetic and real data reveal that our Delaunay weighted test outperforms existing approaches, especially when the two distributions differ only in the principal directions of their covariance matrices.

The rest of this article is organized as follows. Section 2 formulates the two-sample testing problem for high-dimensional data on an unknown manifold and introduces the necessary geometric concepts on the manifold. The Delaunay weighted test is proposed in Section 3, including the development of the Delaunay weight and the test statistic, computational strategies and the permutation procedure for inference. Section 4 investigates the theoretical properties of the proposed test, while the numerical experiments in Section 5 demonstrate the practical advantages of our method. The supplementary material contains details of our computational strategies, proofs of the theorems and additional experiments.

2 Delaunay Triangulation on Manifold

Consider two samples $\mathbb{X}=\{\textbf{x}_{1},\ldots,\textbf{x}_{n_{1}}\}\subset\mathcal{R}^{D}$ and $\mathbb{Y}=\{\textbf{y}_{1},\ldots,\textbf{y}_{n_{0}}\}\subset\mathcal{R}^{D}$ of i.i.d. data from two distributions $F$ and $G$, respectively. Our goal is to test $H_{0}:F=G$ versus $H_{1}:F\neq G$. Let $\mathbb{Z}$ be the union of the two samples, $\mathbb{Z}=\mathbb{X}\cup\mathbb{Y}=\{\textbf{z}_{1},\ldots,\textbf{z}_{n}\}$, and let $\boldsymbol{\delta}=\{\delta_{1},\ldots,\delta_{n}\}$ denote the group indicators of the pooled sample, where $n=n_{1}+n_{0}$, $\delta_{i}=1$ if $\textbf{z}_{i}\in\mathbb{X}$ and $\delta_{i}=0$ if $\textbf{z}_{i}\in\mathbb{Y}$, for $i=1,\ldots,n$. We assume that both $F$ and $G$ are supported on a $d$-dimensional geodesically convex Riemannian manifold $\mathcal{M}$ embedded in the ambient space $\mathcal{R}^{D}$ with $d\ll D$ (Arias-Castro et al., 2018; Kim et al., 2019; Cheng and Xie, 2021).

We first review a few notions in Riemannian geometry. Let $\ell_{\mathcal{M}}(\textbf{a},\textbf{b})$ denote the shortest geodesic path between two points $\textbf{a},\textbf{b}\in\mathcal{M}$, whose length is the geodesic distance $d_{\mathcal{M}}(\textbf{a},\textbf{b})$ induced by the ambient space $\mathcal{R}^{D}$. The manifold $\mathcal{M}$ is assumed to be geodesically convex, meaning that the path $\ell_{\mathcal{M}}(\textbf{a},\textbf{b})$ exists uniquely for any $\textbf{a},\textbf{b}\in\mathcal{M}$. Let $\textbf{T}_{\textbf{a}}\mathcal{M}$ denote the tangent space of $\mathcal{M}$ at a point $\textbf{a}\in\mathcal{M}$. As illustrated in Figure 1, the logarithmic map $\log_{\textbf{a}}:\mathcal{M}\to\textbf{T}_{\textbf{a}}\mathcal{M}$ is defined such that $\log_{\textbf{a}}(\textbf{b})$ is the vector starting from $\textbf{a}$ whose length equals $d_{\mathcal{M}}(\textbf{a},\textbf{b})$ and whose direction is the derivative of $\ell_{\mathcal{M}}(\textbf{a},\textbf{b})$ at $\textbf{a}$. Since $\ell_{\mathcal{M}}(\textbf{a},\textbf{b})$ always exists uniquely for any two points $\textbf{a},\textbf{b}\in\mathcal{M}$, the domain of the logarithmic map $\log_{\textbf{a}}(\cdot)$ is the whole $\mathcal{M}$, for all $\textbf{a}\in\mathcal{M}$.

Figure 1: A graphical illustration of the logarithmic map $\log_{\textbf{a}}(\textbf{b})\in\textbf{T}_{\textbf{a}}\mathcal{M}\equiv\mathcal{R}^{2}$, where $\mathcal{M}$ is a hemisphere embedded in $\mathcal{R}^{3}$.

We can generalize the geometric concepts from the Euclidean space to the geodesically convex Riemannian manifold $\mathcal{M}$ as follows.

  • Convex set: A subset $\mathcal{C}\subset\mathcal{M}$ is convex on $\mathcal{M}$ if for any two points $\textbf{a},\textbf{b}\in\mathcal{C}$, $\ell_{\mathcal{M}}(\textbf{a},\textbf{b})\subset\mathcal{C}$.

  • Convex hull: For a set of points $\mathbb{Z}\subset\mathcal{M}$, the convex hull of $\mathbb{Z}$, denoted by $\mathcal{H}_{\mathcal{M}}(\mathbb{Z})\subset\mathcal{M}$, is the intersection of all convex sets on $\mathcal{M}$ that contain $\mathbb{Z}$.

  • Ball and sphere: A ball on $\mathcal{M}$ with center $\textbf{z}\in\mathcal{M}$ and radius $r>0$ is defined as $\mathcal{B}_{\mathcal{M}}(\textbf{z},r)=\{\textbf{z}^{*}\in\mathcal{M};d_{\mathcal{M}}(\textbf{z},\textbf{z}^{*})\leq r\}$, and the corresponding sphere is defined as $\overline{\mathcal{B}}_{\mathcal{M}}(\textbf{z},r)=\{\textbf{z}^{*}\in\mathcal{M};d_{\mathcal{M}}(\textbf{z},\textbf{z}^{*})=r\}$.

  • Simplex and circumscribing ball: For $k=0,\ldots,d$, the $k$-simplex of $k+1$ points $\textbf{z}_{1},\ldots,\textbf{z}_{k+1}\in\mathcal{M}$ is the convex hull $\mathcal{H}_{\mathcal{M}}(\{\textbf{z}_{1},\ldots,\textbf{z}_{k+1}\})$, and the circumscribing ball of $\textbf{z}_{1},\ldots,\textbf{z}_{k+1}$ is the ball of smallest radius among all balls on $\mathcal{M}$ containing $\textbf{z}_{1},\ldots,\textbf{z}_{k+1}$. In particular, the $1$-simplex of points $\{\textbf{z}_{1},\textbf{z}_{2}\}$ is the shortest geodesic path $\ell_{\mathcal{M}}(\textbf{z}_{1},\textbf{z}_{2})$.

  • Projection: For a set of points $\mathbb{Z}\subset\mathcal{M}$, the projection $\textbf{p}^{*}$ of $\textbf{p}\in\mathcal{M}$ on $\mathcal{H}_{\mathcal{M}}(\mathbb{Z})$ is defined as $\textbf{p}^{*}=\arg\min_{\textbf{q}\in\mathcal{H}_{\mathcal{M}}(\mathbb{Z})}d_{\mathcal{M}}(\textbf{p},\textbf{q})$, in the case that the right-hand side exists uniquely; otherwise, we first find $\textbf{p}^{\dagger}=\arg\min_{\textbf{q}\in\mathcal{H}_{\mathcal{R}^{D}}(\mathbb{Z})}d_{\mathcal{R}^{D}}(\textbf{p},\textbf{q})$ and define $\textbf{p}^{*}$ as the Euclidean projection of $\textbf{p}^{\dagger}$ onto $\mathcal{H}_{\mathcal{M}}(\mathbb{Z})$, which exists uniquely almost everywhere with respect to the Lebesgue measure of $\mathcal{R}^{D}$ (Bhattacharya and Patrangenaru, 2003).

Following Leibon and Letscher (2000), the Delaunay triangulation can be defined on $\mathcal{M}$ as follows.

Definition 1.

For a set of $n$ points $\mathbb{Z}=\{\textbf{z}_{1},\ldots,\textbf{z}_{n}\}\subset\mathcal{M}$, the Delaunay triangulation of $\mathbb{Z}$, denoted by $\mathcal{DT}_{\mathcal{M}}(\mathbb{Z})$, is a mesh of $m$ $d$-simplices $\{\mathcal{S}_{1},\ldots,\mathcal{S}_{m}\}$ satisfying:

  1. For $j=1,\ldots,m$, the set of $d+1$ vertices of simplex $\mathcal{S}_{j}$, denoted by $\mathbb{V}(\mathcal{S}_{j})$, is a subset of $\mathbb{Z}$ and is not contained in any $k$-simplex ($k=0,\ldots,d-1$) on $\mathcal{M}$.

  2. Different simplices are disjoint except on their shared boundaries.

  3. The union $\mathcal{S}_{1}\cup\cdots\cup\mathcal{S}_{m}$ is the convex hull of $\mathbb{Z}$ on $\mathcal{M}$, $\mathcal{H}_{\mathcal{M}}(\mathbb{Z})$.

  4. The empty ball property: for $j=1,\ldots,m$, the circumscribing ball of $\mathcal{S}_{j}$ on $\mathcal{M}$ contains no points of $\mathbb{Z}$ except on its sphere.

We refer to the simplices $\mathcal{S}_{1},\ldots,\mathcal{S}_{m}\in\mathcal{DT}_{\mathcal{M}}(\mathbb{Z})$ as the Delaunay simplices. According to Leibon and Letscher (2000), the Delaunay triangulation $\mathcal{DT}_{\mathcal{M}}(\mathbb{Z})$ is unique if and only if the pooled sample $\mathbb{Z}$ is generic, i.e., no $d+2$ points in $\mathbb{Z}$ lie on a sphere on $\mathcal{M}$ and no $k+2$ points lie in a $k$-simplex on $\mathcal{M}$ ($k=1,\ldots,d-1$). As long as both $F$ and $G$ have Lipschitz continuous densities $f$ and $g$ with respect to the measure on $\mathcal{M}$ induced by the Lebesgue measure of the ambient Euclidean space, the pooled sample $\mathbb{Z}$ is generic with probability one. Being the geometric dual of the Voronoi diagram, the Delaunay triangulation generates a mesh of simplices on $\mathcal{M}$ that are most regularized in shape. Figure 2 shows the empty ball property of the Delaunay triangulation $\mathcal{DT}_{\mathcal{M}}(\mathbb{Z})$ for $\mathcal{M}=\mathcal{R}^{2}$; moreover, $\mathcal{DT}_{\mathcal{M}}(\mathbb{Z})$ maximizes the minimum angle over all triangles ($2$-simplices) among all possible triangulations of $\mathbb{Z}$ (Sibson, 1978).

Figure 2: (a) Graphical illustration of the empty ball property of the Delaunay triangulation with $\mathcal{M}=\mathcal{R}^{2}$; (b)–(c) the Delaunay triangulation versus a random triangulation of the same $\mathbb{Z}$.

3 Delaunay Weighted Test

3.1 Delaunay Weight

Consider the simplest case $\mathcal{M}=\mathcal{R}^{d}$. For each internal point $\textbf{p}\in\mathcal{H}_{\mathcal{R}^{d}}(\mathbb{Z})=\mathcal{S}_{1}\cup\cdots\cup\mathcal{S}_{m}$, there exists a Delaunay simplex $\mathcal{S}(\textbf{p};\mathbb{Z})\in\mathcal{DT}_{\mathcal{R}^{d}}(\mathbb{Z})$ such that $\textbf{p}\in\mathcal{S}(\textbf{p};\mathbb{Z})$. We define the Delaunay weight vector of $\textbf{p}\in\mathcal{H}_{\mathcal{R}^{d}}(\mathbb{Z})$ with respect to $\mathbb{Z}$ as a vector $\textbf{w}(\textbf{p};\mathbb{Z})=(w_{1}(\textbf{p};\mathbb{Z}),\ldots,w_{n}(\textbf{p};\mathbb{Z}))^{\mathsf{T}}$ such that

  • (a) $w_{i}(\textbf{p};\mathbb{Z})\in[0,1]$ if $\textbf{z}_{i}\in\mathbb{V}(\mathcal{S}(\textbf{p};\mathbb{Z}))$, and $w_{i}(\textbf{p};\mathbb{Z})=0$ if $\textbf{z}_{i}\notin\mathbb{V}(\mathcal{S}(\textbf{p};\mathbb{Z}))$, where $\mathbb{V}(\mathcal{S}(\textbf{p};\mathbb{Z}))$ denotes the vertex set of the simplex $\mathcal{S}(\textbf{p};\mathbb{Z})$;

  • (b) $\sum_{i=1}^{n}w_{i}(\textbf{p};\mathbb{Z})=1$;

  • (c) $\sum_{i=1}^{n}w_{i}(\textbf{p};\mathbb{Z})\cdot\textbf{z}_{i}=\textbf{p}$.

According to Abadie and L'Hour (2021), the Delaunay weight vector $\textbf{w}(\textbf{p};\mathbb{Z})$ for $\textbf{p}\in\mathcal{H}_{\mathcal{R}^{d}}(\mathbb{Z})$ corresponds to the convex combination of $\mathbb{Z}$ with the smallest compound discrepancy $\sum_{i=1}^{n}w_{i}\|\textbf{p}-\textbf{z}_{i}\|^{2}$ among all convex combinations satisfying $\textbf{p}=\sum_{i=1}^{n}w_{i}\textbf{z}_{i}$. That is, $\textbf{w}(\textbf{p};\mathbb{Z})$ selects at most $d+1$ points in $\mathbb{Z}$ (those with $w_{i}>0$) that are convexly combined to form $\textbf{p}$ and closest to $\textbf{p}$ in the sense of the minimal compound discrepancy.
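To make conditions (a)–(c) concrete, the following minimal Python sketch (ours, not the authors' implementation) computes the Delaunay weight vector of an interior point in $\mathcal{R}^{d}$: the nonzero weights are the barycentric coordinates of $\textbf{p}$ within the Delaunay simplex that contains it.

```python
# A minimal sketch for M = R^d, assuming p lies inside the convex hull of Z.
import numpy as np
from scipy.spatial import Delaunay

def delaunay_weights(p, Z):
    """Delaunay weight vector w(p; Z) satisfying conditions (a)-(c)."""
    tri = Delaunay(Z)                            # Delaunay triangulation of Z
    s = int(tri.find_simplex(p[None, :])[0])     # simplex S(p; Z) containing p
    if s < 0:
        raise ValueError("p lies outside the convex hull of Z")
    d = Z.shape[1]
    T, r = tri.transform[s, :d], tri.transform[s, d]
    b = T @ (p - r)                              # first d barycentric coordinates
    bary = np.append(b, 1.0 - b.sum())           # all d+1 coordinates sum to one
    w = np.zeros(len(Z))
    w[tri.simplices[s]] = bary                   # at most d+1 nonzero entries
    return w

rng = np.random.default_rng(0)
Z = rng.normal(size=(20, 2))
p = Z.mean(axis=0)                               # an interior point
w = delaunay_weights(p, Z)
assert np.isclose(w.sum(), 1.0) and np.allclose(w @ Z, p)  # conditions (b), (c)
```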

Conditions (a)–(c) only define the Delaunay weight vector for an internal point $\textbf{p}\in\mathcal{H}_{\mathcal{R}^{d}}(\mathbb{Z})$. To generalize to an external point $\textbf{p}\notin\mathcal{H}_{\mathcal{R}^{d}}(\mathbb{Z})$, we utilize the projection idea from Section 2.3 of Abadie and L'Hour (2021). Let $\textbf{w}^{(\lambda)}(\textbf{p};\mathbb{Z})=(w^{(\lambda)}_{1}(\textbf{p};\mathbb{Z}),\ldots,w^{(\lambda)}_{n}(\textbf{p};\mathbb{Z}))^{\mathsf{T}}$ be the solution to the optimization problem,

\min\Bigg\|\textbf{p}-\sum_{i=1}^{n}w_{i}\textbf{z}_{i}\Bigg\|^{2}+\lambda\sum_{i=1}^{n}w_{i}\|\textbf{p}-\textbf{z}_{i}\|^{2},

subject to $\textbf{w}=(w_{1},\ldots,w_{n})^{\mathsf{T}}\in[0,1]^{n}$ and $\sum_{i=1}^{n}w_{i}=1$, where $\|\cdot\|$ denotes the Euclidean norm. The Delaunay weight vector $\textbf{w}(\textbf{p};\mathbb{Z})$ for $\textbf{p}\in\mathcal{H}_{\mathcal{R}^{d}}(\mathbb{Z})$ can be rewritten as $\textbf{w}(\textbf{p};\mathbb{Z})=\lim_{\lambda\to 0}\textbf{w}^{(\lambda)}(\textbf{p};\mathbb{Z})$. For $\textbf{p}\notin\mathcal{H}_{\mathcal{R}^{d}}(\mathbb{Z})$, Abadie and L'Hour (2021) showed that $\lim_{\lambda\to 0}\sum_{i=1}^{n}{w}_{i}^{(\lambda)}(\textbf{p};\mathbb{Z})\textbf{z}_{i}=\textbf{p}^{*}$, where $\textbf{p}^{*}$ is the projection of $\textbf{p}$ on the convex hull $\mathcal{H}_{\mathcal{R}^{d}}(\mathbb{Z})$. Therefore, we have

\textbf{w}(\textbf{p}^{*};\mathbb{Z})=\lim_{\lambda\to 0}\textbf{w}^{(\lambda)}(\textbf{p};\mathbb{Z}),\quad\forall\textbf{p}\in\mathcal{R}^{d}. \quad (1)

Based on (1), we can generalize the definition of the Delaunay weight to all $\textbf{p}\in\mathcal{R}^{d}$ by substituting condition (c) with

  • (c′) $\sum_{i=1}^{n}w_{i}(\textbf{p};\mathbb{Z})\cdot\textbf{z}_{i}=\textbf{p}^{*}$.
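Numerically, the limit in (1) can be approximated by solving the penalized problem with a small fixed $\lambda$. The sketch below uses a generic SLSQP solver as a stand-in for a dedicated quadratic programming routine; the function name and the choice $\lambda=10^{-6}$ are ours.

```python
# A hedged sketch of w^(lambda)(p; Z): a quadratic program over the simplex.
import numpy as np
from scipy.optimize import minimize

def penalized_weights(p, Z, lam=1e-6):
    n = len(Z)
    d2 = np.sum((Z - p) ** 2, axis=1)            # ||p - z_i||^2
    obj = lambda w: np.sum((p - w @ Z) ** 2) + lam * (w @ d2)
    cons = ({"type": "eq", "fun": lambda w: w.sum() - 1.0},)
    res = minimize(obj, np.full(n, 1.0 / n), bounds=[(0.0, 1.0)] * n,
                   constraints=cons, method="SLSQP")
    return res.x                                  # approximates w^(lambda)(p; Z)

rng = np.random.default_rng(1)
Z = rng.normal(size=(15, 2))
p = np.array([10.0, 10.0])                        # a point well outside H(Z)
p_star = penalized_weights(p, Z) @ Z              # approx. projection of p on H(Z)
```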

However, when $\mathcal{M}$ is not Euclidean, condition (c′) is no longer applicable because it relies on the linear operations of Cartesian coordinates, which are not well-defined on $\mathcal{M}$. To generalize the Delaunay weight vector to nonlinear manifolds, we make use of the logarithmic map.

Definition 2.

For a set $\mathbb{Z}=\{\textbf{z}_{1},\ldots,\textbf{z}_{n}\}\subset\mathcal{M}$, the Delaunay weight vector of $\textbf{p}\in\mathcal{M}$ is $\textbf{w}(\textbf{p};\mathbb{Z})=(w_{1}(\textbf{p};\mathbb{Z}),\ldots,w_{n}(\textbf{p};\mathbb{Z}))^{\mathsf{T}}$ satisfying (a), (b) and

  • (c″) $\sum_{i=1}^{n}w_{i}(\textbf{p};\mathbb{Z})\log_{\textbf{p}^{*}}(\textbf{z}_{i})=\mathbf{0}$,

where $\textbf{p}^{*}$ is the projection of $\textbf{p}$ on the convex hull $\mathcal{H}_{\mathcal{M}}(\mathbb{Z})$ and $\mathcal{S}(\textbf{p};\mathbb{Z})\in\mathcal{DT}_{\mathcal{M}}(\mathbb{Z})$ in (a) is the Delaunay simplex containing $\textbf{p}^{*}$.

For the special case $\mathcal{M}=\mathcal{R}^{d}$, (c′) and (c″) are equivalent because

\sum_{i=1}^{n}w_{i}(\textbf{p};\mathbb{Z})\cdot\overrightarrow{\textbf{p}^{*}\textbf{z}_{i}}=\overrightarrow{\textbf{p}^{*}\textbf{p}^{*}}\;\Longleftrightarrow\;\sum_{i=1}^{n}w_{i}(\textbf{p};\mathbb{Z})\log_{\textbf{p}^{*}}(\textbf{z}_{i})=\log_{\textbf{p}^{*}}(\textbf{p}^{*}), \quad (2)

where $\overrightarrow{\textbf{p}^{*}\textbf{z}_{i}}$ denotes the vector from $\textbf{p}^{*}$ to $\textbf{z}_{i}$. Therefore, condition (c″) is indeed a generalization of (c′) to a nonlinear manifold. Since the Delaunay triangulation $\mathcal{DT}_{\mathcal{M}}(\mathbb{Z})$ is unique for generic $\mathbb{Z}$ (Leibon and Letscher, 2000), the Delaunay weight vector $\textbf{w}(\textbf{p};\mathbb{Z})$ is unique for generic $\mathbb{Z}$ and any point $\textbf{p}\in\mathcal{M}$, as shown in Section S1 of the supplementary material.

Based on Definition 2, the Delaunay weight matrix of $\mathbb{Z}$ is defined as $\boldsymbol{\Gamma}_{\mathbb{Z}}=(\gamma_{ij})_{n\times n}$, with

\gamma_{ii}=0,\quad(\gamma_{i1},\ldots,\gamma_{i,i-1},\gamma_{i,i+1},\ldots,\gamma_{in})^{\mathsf{T}}=\textbf{w}(\textbf{z}_{i};\mathbb{Z}\setminus\{\textbf{z}_{i}\}),\quad i=1,\ldots,n, \quad (3)

where $\gamma_{ij}$ measures the geometric proximity of $\textbf{z}_{j}$ to $\textbf{z}_{i}$ and $\mathbb{Z}\setminus\{\textbf{z}_{i}\}$ denotes the set $\mathbb{Z}$ with $\textbf{z}_{i}$ removed. Because $\gamma_{ij}$ and $\gamma_{ji}$ are usually not the same, the Delaunay weight matrix $\boldsymbol{\Gamma}_{\mathbb{Z}}$ is a directed weighted adjacency matrix of $\mathbb{Z}$. Unlike the kernel, local regression and $k$-NN based approaches, which only consider local distances to $\textbf{z}_{i}$, the Delaunay weight $\textbf{w}(\textbf{z}_{i};\mathbb{Z}\setminus\{\textbf{z}_{i}\})$ evaluates the geometric proximity of the vertices of the Delaunay simplex $\mathcal{S}(\textbf{z}_{i};\mathbb{Z}\setminus\{\textbf{z}_{i}\})$ to $\textbf{z}_{i}$ using both distance and direction information. In particular, $\gamma_{ij}$ and $\gamma_{ik}$ can differ even if $\|\log_{\textbf{z}_{i}^{*}}(\textbf{z}_{j})\|=\|\log_{\textbf{z}_{i}^{*}}(\textbf{z}_{k})\|$. Indeed, we see from Definition 2 that $\textbf{w}(\textbf{z}_{i};\mathbb{Z}\setminus\{\textbf{z}_{i}\})$ depends on the vectors $\log_{\textbf{z}_{i}^{*}}(\textbf{z}_{j})$ rather than merely the distances $\|\log_{\textbf{z}_{i}^{*}}(\textbf{z}_{j})\|$, for $j\neq i$.
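In $\mathcal{R}^{d}$, the leave-one-out construction (3) can be assembled row by row, reusing the two sketches above; a point whose removal leaves it outside the reduced hull falls back to the penalized approximation of the $\lambda\to 0$ limit.

```python
# A sketch of Gamma_Z for M = R^d; recomputing the triangulation per row
# keeps the code simple at the cost of speed.
import numpy as np

def delaunay_weight_matrix(Z):
    n = len(Z)
    Gamma = np.zeros((n, n))                     # gamma_ii = 0 by construction
    for i in range(n):
        idx = np.delete(np.arange(n), i)         # indices of Z \ {z_i}
        try:
            w = delaunay_weights(Z[i], Z[idx])   # z_i inside the reduced hull
        except ValueError:
            w = penalized_weights(Z[i], Z[idx])  # exterior point: approximate
        Gamma[i, idx] = w
    return Gamma
```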

Figure 3: Graphical illustration of the Delaunay simplices (triangles) $\mathcal{S}(\textbf{z}_{i};\mathbb{Z}\setminus\{\textbf{z}_{i}\})$ for $i=1,\ldots,4$ (top panel) and for $i=5,\ldots,8$ (bottom panel), computed from a random sample $\mathbb{Z}=\{\textbf{z}_{1},\ldots,\textbf{z}_{8}\}$ on $\mathcal{R}^{2}$.

Figure 4: The Delaunay weight matrix $\boldsymbol{\Gamma}_{\mathbb{Z}}$ computed from the $\mathbb{Z}$ in Figure 3, with group indicators $\delta_{i}=1$, for $i=1,\ldots,4$, and $\delta_{i}=0$ otherwise. The test statistic $T_{\text{DW}}$ equals the sum of all the Delaunay weights in the two diagonal blocks of $\boldsymbol{\Gamma}_{\mathbb{Z}}$.

To gain more insight into the Delaunay weight, observe that, when $\textbf{z}_{i}$ is an inner point of the Delaunay simplex $\mathcal{S}(\textbf{z}_{i};\mathbb{Z}\setminus\{\textbf{z}_{i}\})$, the non-zero elements of $\textbf{w}(\textbf{z}_{i};\mathbb{Z}\setminus\{\textbf{z}_{i}\})$ correspond to points of $\mathbb{Z}\setminus\{\textbf{z}_{i}\}$ from all directions surrounding $\textbf{z}_{i}$. In contrast, under sparse or nonuniform sampling, the nearest neighbors of $\textbf{z}_{i}$ cannot cover all directions from $\textbf{z}_{i}$ because they tend to cluster in certain directions. We illustrate this point with a sample $\mathbb{Z}=\{\textbf{z}_{1},\ldots,\textbf{z}_{8}\}$ in $\mathcal{M}=\mathcal{R}^{2}$. Figure 3 shows the Delaunay simplex $\mathcal{S}(\textbf{z}_{i};\mathbb{Z}\setminus\{\textbf{z}_{i}\})$ that contains the projection of $\textbf{z}_{i}$ on the convex hull $\mathcal{H}_{\mathcal{M}}(\mathbb{Z}\setminus\{\textbf{z}_{i}\})$. Based on $\mathcal{S}(\textbf{z}_{i};\mathbb{Z}\setminus\{\textbf{z}_{i}\})$ ($i=1,\ldots,8$), the weight vectors $\textbf{w}(\textbf{z}_{i};\mathbb{Z}\setminus\{\textbf{z}_{i}\})$ are concatenated to form the Delaunay weight matrix $\boldsymbol{\Gamma}_{\mathbb{Z}}$, as shown in Figure 4. For $\textbf{z}_{2}\in\mathcal{H}_{\mathcal{M}}(\mathbb{Z}\setminus\{\textbf{z}_{2}\})$, the three nearest neighbors of $\textbf{z}_{2}$ are $\textbf{z}_{3},\textbf{z}_{7}$ and $\textbf{z}_{8}$, which all lie to the top-right of $\textbf{z}_{2}$, so the information from the bottom-left direction is not captured. In contrast, the Delaunay simplex $\mathcal{S}(\textbf{z}_{2};\mathbb{Z}\setminus\{\textbf{z}_{2}\})$ is formed by $\textbf{z}_{5},\textbf{z}_{7}$ and $\textbf{z}_{8}$, which covers all directions from $\textbf{z}_{2}$, and the relative directions $\overrightarrow{\textbf{z}_{2}\textbf{z}_{k}}$ ($k=5,7,8$) are used in constructing $\textbf{w}(\textbf{z}_{2};\mathbb{Z}\setminus\{\textbf{z}_{2}\})$. Therefore, the Delaunay weight utilizes the local geometric information more comprehensively than the $k$-NN.

3.2 Test Statistic

For the pooled sample $\mathbb{Z}=\{\textbf{z}_{1},\ldots,\textbf{z}_{n}\}$ with group indicators $\boldsymbol{\delta}=\{\delta_{1},\ldots,\delta_{n}\}$, the $k$-NN test statistic in Schilling (1986) is

\sum_{j=1}^{n}\sum_{i\neq j}\sum_{l=1}^{k}I(\textbf{z}_{j}\text{ is the }l\text{-th nearest neighbor of }\textbf{z}_{i})\times I(\delta_{i}=\delta_{j}),

which has inspired a family of nonparametric two-sample tests that detect distribution differences via a proximity graph of the pooled sample (Rosenbaum, 2005; Chen and Friedman, 2017; Chen et al., 2018). Based on the Delaunay weight $\gamma_{ij}$, we construct the test statistic

{T}_{\text{DW}}=\sum_{j=1}^{n}\sum_{i\neq j}\gamma_{ij}\times I(\delta_{i}=\delta_{j}), \quad (4)

where the new geometric proximity measure $\gamma_{ij}$ accounts for the manifold structure and the direction information.
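Given a weight matrix and the group indicators, (4) is a two-line computation; the sketch below follows the block view illustrated in Figure 4.

```python
# Statistic (4): the sum of within-group Delaunay weights.
import numpy as np

def t_dw(Gamma, delta):
    delta = np.asarray(delta)
    same = np.equal.outer(delta, delta)    # I(delta_i = delta_j)
    return float((Gamma * same).sum())     # diag(Gamma) = 0, so i = j adds nothing
```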

3.3 Approximation of Delaunay Weight

In practice, the underlying manifold $\mathcal{M}$ is typically unknown, so exact computation of the Delaunay weight matrix $\boldsymbol{\Gamma}_{\mathbb{Z}}$ via Definition 2 (and thus of $T_{\text{DW}}$) is infeasible. Because $\boldsymbol{\Gamma}_{\mathbb{Z}}$ is constructed from Delaunay weight vectors $\textbf{w}(\textbf{z}_{i};\mathbb{Z}\setminus\{\textbf{z}_{i}\})$, which only measure the geometric proximity between $\textbf{z}_{i}$ and the points locally surrounding $\textbf{z}_{i}$ (i.e., the vertices of $\mathcal{S}(\textbf{z}_{i};\mathbb{Z}\setminus\{\textbf{z}_{i}\})$), we can utilize manifold learning techniques to approximate $\boldsymbol{\Gamma}_{\mathbb{Z}}$ by $\boldsymbol{\Gamma}_{\widetilde{\mathbb{Z}}}$, where $\widetilde{\mathbb{Z}}=\{\tilde{\textbf{z}}_{1},\ldots,\tilde{\textbf{z}}_{n}\}\subset\mathcal{R}^{d}$ is a low-dimensional Euclidean representation of $\mathbb{Z}$ whose Delaunay weight matrix can be computed. Generally speaking, any nonlinear dimension reduction technique that preserves the local geometric structure of $\mathbb{Z}$ can be applied to obtain $\widetilde{\mathbb{Z}}$. However, a $k$-NN based nonlinear dimension reduction technique may yield a proximity graph of $\widetilde{\mathbb{Z}}$ with more than one connected component. In such cases, the elements of $\boldsymbol{\Gamma}_{\widetilde{\mathbb{Z}}}$ corresponding to pairs of points from different components must be zero, which incurs information loss and power deterioration. To circumvent this issue, we suggest adopting a procedure that returns a connected low-dimensional representation $\widetilde{\mathbb{Z}}$. Specifically, we modify the robust manifold learning algorithm of Dimeglio et al. (2014), as detailed in Section S2 of the supplementary material, to estimate the geodesic distances $d_{\mathcal{M}}(\textbf{z}_{i},\textbf{z}_{j})$, and apply classical MDS (Kruskal, 1964) to obtain $\widetilde{\mathbb{Z}}\subset\mathcal{R}^{d}$. For the unknown intrinsic dimension $d$, any existing estimator can be applied, e.g., that of Facco et al. (2017), as summarized in Section S2 of the supplementary material. Our numerical experiments show that our method is robust to the estimation of $d$.
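The paper's representation step modifies the algorithm of Dimeglio et al. (2014); as an illustrative substitute built on the same idea (estimate geodesic distances, then apply classical MDS), the following Isomap-style sketch uses a $k$-NN graph with shortest-path distances. The helper name, the default $k$, and the use of scikit-learn are our assumptions, and $k$ must be large enough for the graph to be connected.

```python
# A stand-in sketch: geodesic distances via a k-NN graph, then classical MDS.
import numpy as np
from scipy.sparse.csgraph import shortest_path
from sklearn.neighbors import kneighbors_graph

def mds_representation(Z, d, k=10):
    W = kneighbors_graph(Z, k, mode="distance")        # local Euclidean edges
    D = shortest_path(W, method="D", directed=False)   # geodesic estimates
    n = len(Z)                                         # (inf if disconnected)
    J = np.eye(n) - np.ones((n, n)) / n                # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                        # double-centered Gram
    vals, vecs = np.linalg.eigh(B)
    top = np.argsort(vals)[::-1][:d]                   # d leading eigenpairs
    return vecs[:, top] * np.sqrt(np.maximum(vals[top], 0.0))
```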

To compute $\boldsymbol{\Gamma}_{\widetilde{\mathbb{Z}}}$, we note the equivalence of conditions (c′) and (c″) in (2) for the low-dimensional Euclidean representation $\widetilde{\mathbb{Z}}$. Once we know the Delaunay simplex $\mathcal{S}(\tilde{\textbf{z}}_{i};\widetilde{\mathbb{Z}}\setminus\{\tilde{\textbf{z}}_{i}\})$ that contains $\tilde{\textbf{z}}_{i}^{*}$, the projection of $\tilde{\textbf{z}}_{i}$ on the convex hull $\mathcal{H}_{\mathcal{R}^{d}}(\widetilde{\mathbb{Z}}\setminus\{\tilde{\textbf{z}}_{i}\})$, we can easily compute $\textbf{w}(\tilde{\textbf{z}}_{i};\widetilde{\mathbb{Z}}\setminus\{\tilde{\textbf{z}}_{i}\})=(\tilde{\gamma}_{i1},\ldots,\tilde{\gamma}_{i,i-1},\tilde{\gamma}_{i,i+1},\ldots,\tilde{\gamma}_{in})^{\mathsf{T}}$ by solving the linear equations,

\begin{cases}\tilde{\gamma}_{ij}=0,&\forall\tilde{\textbf{z}}_{j}\notin\mathbb{V}(\mathcal{S}(\tilde{\textbf{z}}_{i};\widetilde{\mathbb{Z}}\setminus\{\tilde{\textbf{z}}_{i}\})),\\ \sum_{\tilde{\textbf{z}}_{j}\in\mathbb{V}(\mathcal{S}(\tilde{\textbf{z}}_{i};\widetilde{\mathbb{Z}}\setminus\{\tilde{\textbf{z}}_{i}\}))}\tilde{\gamma}_{ij}=1,&\\ \sum_{\tilde{\textbf{z}}_{j}\in\mathbb{V}(\mathcal{S}(\tilde{\textbf{z}}_{i};\widetilde{\mathbb{Z}}\setminus\{\tilde{\textbf{z}}_{i}\}))}\tilde{\gamma}_{ij}\tilde{\textbf{z}}_{j}=\tilde{\textbf{z}}_{i}^{*}.&\end{cases} \quad (5)

The remaining difficulty in computing $\boldsymbol{\Gamma}_{\widetilde{\mathbb{Z}}}$ lies in finding the Delaunay simplex $\mathcal{S}(\tilde{\textbf{z}}_{i};\widetilde{\mathbb{Z}}\setminus\{\tilde{\textbf{z}}_{i}\})$, for $i=1,\ldots,n$.

Although the DELAUNAYSPARSE algorithm of Chang et al. (2020) can efficiently find $\mathcal{S}(\tilde{\textbf{z}}_{i};\widetilde{\mathbb{Z}}\setminus\{\tilde{\textbf{z}}_{i}\})$ for $\tilde{\textbf{z}}_{i}\in\mathcal{H}_{\mathcal{R}^{d}}(\widetilde{\mathbb{Z}}\setminus\{\tilde{\textbf{z}}_{i}\})$, it does not work for $\tilde{\textbf{z}}_{i}\notin\mathcal{H}_{\mathcal{R}^{d}}(\widetilde{\mathbb{Z}}\setminus\{\tilde{\textbf{z}}_{i}\})$. To tackle this difficulty, we incorporate the inverse stereographic projection into the DELAUNAYSPARSE algorithm and develop the stereographic projected DELAUNAYSPARSE algorithm, which finds $\mathcal{S}(\tilde{\textbf{z}}_{i};\widetilde{\mathbb{Z}}\setminus\{\tilde{\textbf{z}}_{i}\})$ regardless of whether $\tilde{\textbf{z}}_{i}\in\mathcal{H}_{\mathcal{R}^{d}}(\widetilde{\mathbb{Z}}\setminus\{\tilde{\textbf{z}}_{i}\})$ or not. Without loss of generality, we assume the Euclidean representation $\widetilde{\mathbb{Z}}$ is centered at $\mathbf{0}_{d}$. The stereographic projected DELAUNAYSPARSE algorithm proceeds as follows.

Figure 5: Graphical illustration of the stereographic projected DELAUNAYSPARSE algorithm: (a) inverse stereographic projection; (b) Delaunay triangulation on $\overline{\mathcal{B}}_{\eta r_{\text{max}}}$; (c) projecting the Delaunay triangulation on $\overline{\mathcal{B}}_{\eta r_{\text{max}}}$ back to $\mathcal{R}^{d}$ yields an approximation of the Delaunay triangulation on $\mathcal{R}^{d}$; (d) the approximated Delaunay triangulation approaches the true one as the scaling parameter $\eta$ grows.
  • 1. We first compute $r_{\text{max}}=\max_{i=1,\ldots,n}\|\tilde{\textbf{z}}_{i}\|$, set a scaling parameter $\eta>0$ and apply the inverse stereographic projection,

\phi_{\eta r_{\text{max}}}^{-1}:\mathcal{R}^{d}\to\overline{\mathcal{B}}_{\eta r_{\text{max}}}\setminus\{(\mathbf{0}_{d},\eta r_{\text{max}})^{\mathsf{T}}\},\quad\tilde{\textbf{z}}\mapsto\Bigg(\frac{\eta^{2}r_{\text{max}}^{2}}{\eta^{2}r_{\text{max}}^{2}+\|\tilde{\textbf{z}}\|^{2}}\cdot\tilde{\textbf{z}},\;\frac{\eta r_{\text{max}}\|\tilde{\textbf{z}}\|^{2}}{\eta^{2}r_{\text{max}}^{2}+\|\tilde{\textbf{z}}\|^{2}}\Bigg)^{\mathsf{T}}, \quad (6)

to project $\widetilde{\mathbb{Z}}$ onto the $d$-sphere $\overline{\mathcal{B}}_{\eta r_{\text{max}}}=\{\textbf{z}\in\mathcal{R}^{d+1};\|\textbf{z}-(\mathbf{0}_{d},\eta r_{\text{max}}/2)^{\mathsf{T}}\|=\eta r_{\text{max}}/2\}$. Let $\check{\textbf{z}}_{i}=\phi_{\eta r_{\text{max}}}^{-1}(\tilde{\textbf{z}}_{i})$, for $i=1,\ldots,n$, and let the reference point $\check{\textbf{z}}_{0}$ be $(\mathbf{0}_{d},\eta r_{\text{max}})^{\mathsf{T}}$. We define the augmented point set as $\check{\mathbb{Z}}=\{\check{\textbf{z}}_{0},\check{\textbf{z}}_{1},\ldots,\check{\textbf{z}}_{n}\}\subset\overline{\mathcal{B}}_{\eta r_{\text{max}}}$. A graphical illustration of obtaining $\check{\mathbb{Z}}$ from $\widetilde{\mathbb{Z}}$ with $d=2$ is exhibited in Figure 5(a).

  • 2. As shown in Figure 5(b), for any $i=1,\ldots,n$, the union of the simplices in the Delaunay triangulation $\mathcal{DT}_{\overline{\mathcal{B}}_{\eta r_{\text{max}}}}(\check{\mathbb{Z}}\setminus\{\check{\textbf{z}}_{i}\})$ covers the whole sphere $\overline{\mathcal{B}}_{\eta r_{\text{max}}}$. By projecting $\mathcal{DT}_{\overline{\mathcal{B}}_{\eta r_{\text{max}}}}(\check{\mathbb{Z}}\setminus\{\check{\textbf{z}}_{i}\})$ back to $\mathcal{R}^{d}$, under the convention that the geodesic path $\ell_{\overline{\mathcal{B}}_{\eta r_{\text{max}}}}(\check{\textbf{z}}_{j},\check{\textbf{z}}_{0})$ (for any $j\neq i$) is mapped to the ray in $\mathcal{R}^{d}$ from $\tilde{\textbf{z}}_{j}$ pointing away from $\mathbf{0}_{d}$, an approximation of $\mathcal{DT}_{\mathcal{R}^{d}}(\widetilde{\mathbb{Z}}\setminus\{\tilde{\textbf{z}}_{i}\})$ is obtained, as shown in Figure 5(c). Figure 5(d) shows that this approximation converges to $\mathcal{DT}_{\mathcal{R}^{d}}(\widetilde{\mathbb{Z}}\setminus\{\tilde{\textbf{z}}_{i}\})$ as the scaling parameter $\eta$ tends to infinity.

  • 3. Because $\mathcal{DT}_{\overline{\mathcal{B}}_{\eta r_{\text{max}}}}(\check{\mathbb{Z}}\setminus\{\check{\textbf{z}}_{i}\})$ covers $\overline{\mathcal{B}}_{\eta r_{\text{max}}}$, for any $i=1,\ldots,n$ there must exist a Delaunay simplex $\mathcal{S}^{*}\in\mathcal{DT}_{\overline{\mathcal{B}}_{\eta r_{\text{max}}}}(\check{\mathbb{Z}}\setminus\{\check{\textbf{z}}_{i}\})$ that contains $\check{\textbf{z}}_{i}$ on $\overline{\mathcal{B}}_{\eta r_{\text{max}}}$. We use a breadth-first search procedure similar to the DELAUNAYSPARSE algorithm (Chang et al., 2020) to compute $\mathcal{S}^{*}$. Finally, if $\check{\textbf{z}}_{0}\notin\mathbb{V}(\mathcal{S}^{*})$, then $\mathcal{S}(\tilde{\textbf{z}}_{i};\widetilde{\mathbb{Z}}\setminus\{\tilde{\textbf{z}}_{i}\})$ on $\mathcal{R}^{d}$ is the simplex formed by $\{\tilde{\textbf{z}}_{j}:\check{\textbf{z}}_{j}\in\mathbb{V}(\mathcal{S}^{*})\}$; if $\check{\textbf{z}}_{0}\in\mathbb{V}(\mathcal{S}^{*})$, then $\mathcal{S}(\tilde{\textbf{z}}_{i};\widetilde{\mathbb{Z}}\setminus\{\tilde{\textbf{z}}_{i}\})$ is the simplex formed by $\{\tilde{\textbf{z}}_{j}:\check{\textbf{z}}_{j}\in\mathbb{V}(\mathcal{S}^{\dagger})\}$, where $\mathcal{S}^{\dagger}$ is the neighboring simplex of $\mathcal{S}^{*}$ opposite to $\check{\textbf{z}}_{0}$.

The stereographic projected DELAUNAYSPARSE algorithm is detailed in Section S3 of the supplementary material. Note that we only need to generate the local (not the full) Delaunay triangulation to find the Delaunay simplex $\mathcal{S}(\tilde{\textbf{z}}_{i};\widetilde{\mathbb{Z}}\setminus\{\tilde{\textbf{z}}_{i}\})$, so the computation is not intensive. Specifically, Section S3 of the supplementary material shows that the computational complexity of the stereographic projected DELAUNAYSPARSE algorithm is $O(n^{2+1/d}d^{2})$. As the computational complexity of solving (5) via Gaussian elimination is $O(nd^{3})$ and the sample size $n$ is usually much larger than the intrinsic dimension $d$, the overall computational complexity of computing the approximation $\boldsymbol{\Gamma}_{\widetilde{\mathbb{Z}}}$ is $O(n^{2+1/d}d^{2})$. As shown in Figure 5(d), the difference between $\mathcal{DT}_{\mathcal{R}^{d}}(\widetilde{\mathbb{Z}}\setminus\{\tilde{\textbf{z}}_{i}\})$ and its approximation with $\eta>5$ is negligible. Because the computation of the Delaunay weight does not depend on $\eta$ provided that the simplex $\mathcal{S}(\tilde{\textbf{z}}_{i};\widetilde{\mathbb{Z}}\setminus\{\tilde{\textbf{z}}_{i}\})$ is correctly identified, we suggest $\eta=10$ in practice.
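Step 1 of the algorithm is straightforward to transcribe; the sketch below implements the map (6) in numpy and checks that the projected points indeed lie on the sphere centered at $(\mathbf{0}_{d},\eta r_{\text{max}}/2)^{\mathsf{T}}$ with radius $\eta r_{\text{max}}/2$.

```python
# Equation (6): inverse stereographic projection of Z_tilde onto the d-sphere.
import numpy as np

def inverse_stereographic(Z_tilde, eta=10.0):
    r = eta * np.max(np.linalg.norm(Z_tilde, axis=1))  # eta * r_max
    sq = np.sum(Z_tilde ** 2, axis=1, keepdims=True)   # ||z_tilde||^2
    first = (r ** 2 / (r ** 2 + sq)) * Z_tilde         # first d coordinates
    last = r * sq / (r ** 2 + sq)                      # height coordinate
    return np.hstack([first, last])

rng = np.random.default_rng(2)
Zt = rng.normal(size=(50, 3))
Zc = inverse_stereographic(Zt)
r = 10.0 * np.max(np.linalg.norm(Zt, axis=1))
center = np.r_[np.zeros(3), r / 2]
assert np.allclose(np.linalg.norm(Zc - center, axis=1), r / 2)
```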

3.4 Permutation Test

The Delaunay weighted test statistic in (4), $T_{\text{DW}}=\sum_{j=1}^{n}\sum_{i\neq j}\gamma_{ij}\times I(\delta_{i}=\delta_{j})$, aims to detect differences between the underlying distributions $F$ and $G$ via the deviation of the sum of within-group weights from its null expectation. Figure 4 gives an illustrative example of the Delaunay weight matrix $\boldsymbol{\Gamma}_{\mathbb{Z}}$ with group indicators $\delta_{i}=1$, for $i=1,\ldots,4$, and $\delta_{i}=0$ otherwise. If we partition $\boldsymbol{\Gamma}_{\mathbb{Z}}$ at the 4th row and column, $T_{\text{DW}}$ equals the sum of all Delaunay weights in the two diagonal blocks of $\boldsymbol{\Gamma}_{\mathbb{Z}}$. Under the null, the off-diagonal elements of $\boldsymbol{\Gamma}_{\mathbb{Z}}$ are identically distributed. In contrast, under $H_{1}:F\neq G$, more $\textbf{z}_{i}$'s in $\mathbb{X}$ are located in the region where the density of $F$ is higher than that of $G$. For these $\textbf{z}_{i}$'s, the expected proportion of data points from $\mathbb{X}$ in $\mathbb{V}(\mathcal{S}(\textbf{z}_{i};\mathbb{Z}\setminus\{\textbf{z}_{i}\}))$ is larger than that in $\mathbb{Z}\setminus\{\textbf{z}_{i}\}$. A similar argument holds for $\mathbb{Y}$. Thus, $T_{\text{DW}}$ is stochastically larger under the alternative than under the null.

The exact null distribution of $T_{\text{DW}}$ is complex and difficult to derive, so we implement a permutation procedure to compute the $p$-value.

  1. Compute the observed value of the test statistic $T_{\text{DW}}$ via (4).

  2. For $b=1,\ldots,B$, randomly generate a permutation of the group indicators, $\boldsymbol{\delta}^{(b)}=\{\delta_{1}^{(b)},\ldots,\delta_{n}^{(b)}\}$, where $\sum_{i=1}^{n}\delta_{i}^{(b)}=n_{1}$ and $\sum_{i=1}^{n}(1-\delta_{i}^{(b)})=n_{0}$, and compute $T^{(b)}_{\text{DW}}=\sum_{j=1}^{n}\sum_{i\neq j}\gamma_{ij}\times I(\delta_{i}^{(b)}=\delta_{j}^{(b)})$ under $\boldsymbol{\delta}^{(b)}$.

  3. Compute $p\text{-value}=\{\sum_{b=1}^{B}I(T_{\text{DW}}\leq T^{(b)}_{\text{DW}})+1\}/(B+1)$, where $1$ is added to both the numerator and the denominator to ensure that, under $H_{0}$, $\Pr(p\text{-value}\leq\alpha)\leq\alpha$ for all $\alpha\in[0,1]$. Indeed, under $H_{0}$, $T_{\text{DW}},T_{\text{DW}}^{(1)},\ldots,T_{\text{DW}}^{(B)}$ are i.i.d. given the pooled sample $\mathbb{Z}$ and the sample sizes $(n_{1},n_{0})$. Therefore, the $p$-value is uniformly distributed on $\{1/(B+1),2/(B+1),\ldots,1\}$, which converges to $\text{Unif}[0,1]$ as $B\to\infty$.
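The three steps translate directly into code; a minimal sketch reusing t_dw from Section 3.2 (the function name and seed handling are ours):

```python
# Permutation p-value for the Delaunay weighted test.
import numpy as np

def permutation_pvalue(Gamma, delta, B=200, seed=0):
    rng = np.random.default_rng(seed)
    t_obs = t_dw(Gamma, delta)                               # step 1
    exceed = sum(t_obs <= t_dw(Gamma, rng.permutation(delta))
                 for _ in range(B))                          # step 2
    return (exceed + 1) / (B + 1)                            # step 3
```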

Under the unknown manifold setting where $\boldsymbol{\Gamma}_{\mathbb{Z}}$ cannot be computed exactly, we replace it by $\boldsymbol{\Gamma}_{\widetilde{\mathbb{Z}}}$ for approximate inference. Unlike most existing tests based on pairwise distances only, the Delaunay weight takes the relative directions among data points into account, and thus our test gains more power in detecting direction differences between $F$ and $G$.

4 Theoretical Analysis

For convenience, we write $\check{\gamma}_{ij}=\gamma_{ij}+\gamma_{ji}$, for all $i\neq j$, and we use the subscript "$H_{k}$" in $\text{E}_{H_{k}}$ and $\text{Var}_{H_{k}}$ to denote that $H_{k}$ is true, for $k=0,1$. To deduce asymptotic properties of the test statistic $T_{\text{DW}}$, it is common to derive the distribution of the unconditional standardized statistic. However, since $T_{\text{DW}}$ is a function of both $\boldsymbol{\delta}$ and the Delaunay weight matrix $\boldsymbol{\Gamma}_{\mathbb{Z}}$, it is difficult to obtain an explicit expression of its variance under the null due to the discrete nature of the Delaunay triangulation. Therefore, we first provide the expectation and variance of the standardized statistic conditional on $\mathbb{Z}$, for which the proximity graph based on $\boldsymbol{\Gamma}_{\mathbb{Z}}$ becomes nonrandom, as in Chen and Friedman (2017) and Chen et al. (2018).

Theorem 1.

Let $\mathbb{Z}=\{\textbf{z}_{1},\ldots,\textbf{z}_{n}\}$ be a set of generic points on a $d$-dimensional geodesically convex Riemannian manifold $\mathcal{M}\subset\mathcal{R}^{D}$. When $H_{0}$ is true, we have

\text{E}_{H_{0}}(n^{-1}T_{\text{DW}}\,|\,\mathbb{Z})=\frac{n_{1}(n_{1}-1)+n_{0}(n_{0}-1)}{n(n-1)}

and

\text{Var}_{H_{0}}(n^{-1}T_{\text{DW}}\,|\,\mathbb{Z})=\frac{n_{1}(n_{1}-1)n_{0}(n_{0}-1)}{3n^{2}}V_{0,\mathbb{Z}}+\frac{n_{1}n_{0}(n-2)}{3n^{2}}V_{1,\mathbb{Z}}+\frac{n_{1}n_{0}}{n^{2}}V_{2,\mathbb{Z}}-\frac{4n_{1}^{2}n_{0}^{2}}{(n-1)^{2}n^{2}},

where

V_{0,\mathbb{Z}}=\frac{\sum_{1\leq i<j<k<l\leq n}(\check{\gamma}_{ij}\check{\gamma}_{kl}+\check{\gamma}_{ik}\check{\gamma}_{jl}+\check{\gamma}_{il}\check{\gamma}_{jk})}{n(n-1)(n-2)(n-3)/24},
V_{1,\mathbb{Z}}=\frac{\sum_{1\leq i<j<k\leq n}(\check{\gamma}_{ij}\check{\gamma}_{ik}+\check{\gamma}_{ij}\check{\gamma}_{jk}+\check{\gamma}_{ik}\check{\gamma}_{jk})}{n(n-1)(n-2)/6},
V_{2,\mathbb{Z}}=\frac{\sum_{1\leq i<j\leq n}\check{\gamma}_{ij}^{2}}{n(n-1)/2},

which are the averages of $\{\check{\gamma}_{ij}\check{\gamma}_{kl}+\check{\gamma}_{ik}\check{\gamma}_{jl}+\check{\gamma}_{il}\check{\gamma}_{jk}\,|\,1\leq i<j<k<l\leq n\}$, $\{\check{\gamma}_{ij}\check{\gamma}_{ik}+\check{\gamma}_{ij}\check{\gamma}_{jk}+\check{\gamma}_{ik}\check{\gamma}_{jk}\,|\,1\leq i<j<k\leq n\}$ and $\{\check{\gamma}_{ij}^{2}\,|\,1\leq i<j\leq n\}$, respectively.

The proof of Theorem 1 is based on the fact that, conditional on $\mathbb{Z}$ under $H_{0}$, $\boldsymbol{\delta}$ is uniformly distributed over all $\binom{n}{n_{1}}$ arrangements containing $n_{1}$ ones and $n_{0}$ zeros, as detailed in Section S4 of the supplementary material. Given the finite-sample conditional mean $\text{E}_{H_{0}}(n^{-1}T_{\text{DW}}\,|\,\mathbb{Z})$ and conditional variance $\text{Var}_{H_{0}}(n^{-1}T_{\text{DW}}\,|\,\mathbb{Z})$, we deduce the asymptotic null distribution of the standardized version of $n^{-1}T_{\text{DW}}$.
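As a quick sanity check (ours, not from the paper): since each row of $\boldsymbol{\Gamma}_{\mathbb{Z}}$ sums to one, averaging $n^{-1}T_{\text{DW}}$ over random permutations of $\boldsymbol{\delta}$ should reproduce the conditional mean $\{n_{1}(n_{1}-1)+n_{0}(n_{0}-1)\}/\{n(n-1)\}$, which the sketch below verifies with the Section 3 functions.

```python
# Monte Carlo check of the conditional mean in Theorem 1 (M = R^2).
import numpy as np

rng = np.random.default_rng(3)
Z = rng.normal(size=(40, 2))                     # generic points, almost surely
Gamma = delaunay_weight_matrix(Z)
n1, n0 = 25, 15
n = n1 + n0
delta = np.array([1] * n1 + [0] * n0)
sims = [t_dw(Gamma, rng.permutation(delta)) / n for _ in range(5000)]
theory = (n1 * (n1 - 1) + n0 * (n0 - 1)) / (n * (n - 1))
print(np.mean(sims), theory)                     # the two should nearly agree
```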

Theorem 2.

Assume that (i) $\lim_{n\to\infty}n_{1}/n=\kappa\in(0,1)$; (ii) $F=G$ and they have Lipschitz continuous densities $f=g$ with respect to the measure on $\mathcal{M}$ induced by the Lebesgue measure on the ambient Euclidean space; and (iii) there exists $\nu<9/4$ such that $\sum_{i=1}^{n}|\mathcal{N}_{i}|^{4}=o(n^{\nu})$, where $\mathcal{N}_{i}=\cup_{\mathcal{S}\in\mathcal{DT}_{\mathcal{M}}(\mathbb{Z}):\textbf{z}_{i}\in\mathcal{S}}\mathbb{V}(\mathcal{S})$. As $n\to\infty$, we have

\frac{n^{-1}T_{\text{DW}}-\text{E}_{H_{0}}(n^{-1}T_{\text{DW}}\,|\,\mathbb{Z})}{\sqrt{\text{Var}_{H_{0}}(n^{-1}T_{\text{DW}}\,|\,\mathbb{Z})}}\overset{D}{\longrightarrow}N(0,1), \quad (7)

where $\overset{D}{\longrightarrow}$ denotes convergence in distribution.

The proof of Theorem 2 is given in Section S5 of the supplementary material. In Theorem 2, conditions (i) and (ii) are standard and mild. To gain insight into condition (iii), we call a point $\textbf{z}_{i}$ a hub if its degree in $\mathcal{DT}_{\mathcal{M}}(\mathbb{Z})$ is much larger than the average degree, which occurs only in the rare case that many data points are located on a sphere $\overline{\mathcal{B}}_{\mathcal{M}}(\textbf{z}_{i},r)$ for some $r>0$. Condition (iii) regularizes the Delaunay triangulation $\mathcal{DT}_{\mathcal{M}}(\mathbb{Z})$ and is satisfied if no hub exists in $\mathcal{DT}_{\mathcal{M}}(\mathbb{Z})$. Empirical evidence for Theorem 2 is provided in Section S7 of the supplementary material.

The consistency of the Delaunay weighted test with the permutation procedure can be shown as follows.

Theorem 3.

Under the same conditions as in Theorem 2, the Delaunay weighted test using $T_{\text{DW}}$ is consistent against all alternatives. That is, for any significance level $\alpha\in(0,1)$ and any $F\neq G$, the Delaunay weighted test with the permutation procedure rejects the null $H_{0}:F=G$ with asymptotic probability one.

The proof of Theorem 3 follows Henze and Penrose (1999), as detailed in Section S6 of the supplementary material, where we show that $n^{-1}T_{\text{DW}}-\text{E}_{H_{0}}(n^{-1}T_{\text{DW}}\,|\,\mathbb{Z})$ converges to a positive value and $\text{Var}_{H_{0}}(n^{-1}T_{\text{DW}}\,|\,\mathbb{Z})$ converges to zero a.s. as $n\to\infty$. Although the test statistic $T_{\text{DW}}$ is computed from $\boldsymbol{\Gamma}_{\widetilde{\mathbb{Z}}}$ instead of $\boldsymbol{\Gamma}_{\mathbb{Z}}$ in practice, the consistency of the test is preserved as long as the mapping from $\mathbb{Z}$ to $\widetilde{\mathbb{Z}}$ is one-to-one, a standard assumption in manifold learning.

5 Experiments

To illustrate the empirical performance of the Delaunay weighted test, we conduct extensive experiments on both simulated and real data. For comparison, we also implement several existing nonparametric tests, including the $k$-NN test (Hall and Tajvidi, 2002), the $e$-distance test (Székely and Rizzo, 2004), the covariance test (Cai et al., 2013a), the $k$-MST test (Chen and Friedman, 2017), the regression test (Kim et al., 2019) and the kernel test (Gretton et al., 2012; Cheng and Xie, 2021). Among these methods, the (local) regression test with $k$-NN regression and the kernel test are adaptive to the intrinsic dimension of the underlying manifold, and thus we adopt the $k$-NN regression with $k=n^{2/(2+d)}$ as suggested by Kim et al. (2019) in the regression test. Following the suggestion that the bandwidth should scale with $\sqrt{D}$ and $n^{-1/(d+2\beta)}$ (Cheng and Xie, 2021), we choose the bandwidth $s\cdot n^{-1/(d+2\beta)}$ for the kernel test, where $s^{2}$ is the sum of the variances of $\mathbb{Z}$ and $\beta=1$ is the Hölder constant. As the Delaunay weight $\textbf{w}(\textbf{z}_{i};\mathbb{Z}\setminus\{\textbf{z}_{i}\})$ contains at most $d+1$ nonzero components for each $\textbf{z}_{i}$, we use $k=d+1$ in the $k$-NN test and the $k$-MST test for fair comparison. The covariance test is consistent only if the covariances of $F$ and $G$ differ, but it is of interest to compare its performance in general settings. For the standard permutation procedure, we set the number of permutations to $B=200$.

5.1 Scenario 1: Euclidean Data

Figure 6: Empirical cumulative distribution function (ECDF) of the $p$-value of the Delaunay weighted test under the null for Gaussian data with different dimensions ($d$) and sample sizes ($n_{1}$ and $n_{0}$): (a) $d=20$, $n_{1}=n_{0}=50$; (b) $d=20$, $n_{1}=100$, $n_{0}=50$; (c) $d=50$, $n_{1}=n_{0}=100$. The dashed line is the reference line $y=x$.

To validate the size of the Delaunay weighted test on Euclidean data, i.e., $\mathcal{M}=\mathcal{R}^{D}$, we generate 1000 Gaussian datasets under the null $H_{0}$. In each dataset, $\mathbb{X}$ and $\mathbb{Y}$ respectively contain $n_{1}$ and $n_{0}$ i.i.d. samples from the multivariate normal distribution $\text{MVN}(\mathbf{0}_{d},\textbf{I}_{d\times d})$, where $\textbf{I}_{d\times d}$ is the $d\times d$ identity matrix and $d=D$. We consider different dimensions $d\in\{20,50\}$ and sample sizes $(n_{1},n_{0})\in\{(50,50),(100,50),(100,100)\}$. Figure 6 shows the empirical cumulative distribution functions (ECDFs) of the $p$-value of the Delaunay weighted test in several cases, while Figure S3 shows the ECDFs for the remaining cases. The ECDF of the $p$-value is close to that of the uniform distribution $\text{Unif}[0,1]$ for all dimensions and sample sizes, suggesting that the Delaunay weighted test controls the type I error rate at any target level $\alpha\in[0,1]$.

Figure 7: Empirical cumulative distribution function (ECDF) of the $p$-value of all methods under the location alternative for Gaussian data with dimensions $d\in\{20,50\}$ and sample sizes $(n_{1},n_{0})\in\{(50,50),(100,50),(100,100)\}$. The dashed line is the reference line $y=x$.

We also investigate the power of the Delaunay weighted test in distinguishing differences between distributions $F$ and $G$ on 1000 Gaussian datasets. We consider both the location alternative and the direction alternative: the former imposes a difference in the location parameters (mean or median) of $F$ and $G$, while the latter sets the same location parameter for $F$ and $G$ but lets their covariance matrices have different principal directions (directions with large variance).

Under the location alternative, $\mathbb{X}$ contains $n_{1}$ independent samples $\textbf{x}_{1},\ldots,\textbf{x}_{n_{1}}$ from $F=\text{MVN}(\mathbf{0}_{d},\textbf{I}_{d\times d})$, while $\textbf{y}_{1},\ldots,\textbf{y}_{n_{0}}$ in $\mathbb{Y}$ are generated from $G=\text{MVN}(\boldsymbol{\Delta}_{d},\textbf{I}_{d\times d})$, where $\boldsymbol{\Delta}_{d}$ is an independent variable uniformly distributed on $\overline{\mathcal{B}}_{\mathcal{R}^{d}}(\mathbf{0},0.8)$ for $d=20$ and on $\overline{\mathcal{B}}_{\mathcal{R}^{d}}(\mathbf{0},1)$ for $d=50$. Under the direction alternative, the samples $\mathbb{X}$ and $\mathbb{Y}$ are generated from the zero-mean Gaussian distributions $F=\text{MVN}(\mathbf{0}_{d},\boldsymbol{\Sigma}_{\textbf{x}})$ and $G=\text{MVN}(\mathbf{0}_{d},\boldsymbol{\Sigma}_{\textbf{y}})$, with covariance matrices,

\boldsymbol{\Sigma}_{\textbf{x}}=\begin{pmatrix}1.25^{2}\,\textbf{I}_{(d/2)\times(d/2)}&\textbf{0}_{(d/2)\times(d/2)}\\ \textbf{0}_{(d/2)\times(d/2)}&\textbf{I}_{(d/2)\times(d/2)}\end{pmatrix}\quad\text{and}\quad\boldsymbol{\Sigma}_{\textbf{y}}=\begin{pmatrix}\textbf{I}_{(d/2)\times(d/2)}&\textbf{0}_{(d/2)\times(d/2)}\\ \textbf{0}_{(d/2)\times(d/2)}&1.25^{2}\,\textbf{I}_{(d/2)\times(d/2)}\end{pmatrix}.
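For concreteness, one dataset under the direction alternative can be drawn as follows; since both covariance matrices are diagonal, coordinate-wise scaling suffices (the function name and seed handling are ours, and $d$ is assumed even).

```python
# One draw of (X, Y) under the direction alternative.
import numpy as np

def direction_alt_sample(n1, n0, d, scale=1.25, seed=0):
    rng = np.random.default_rng(seed)
    sx = np.r_[np.full(d // 2, scale), np.ones(d // 2)]  # per-coordinate sd
    X = rng.normal(size=(n1, d)) * sx                    # Sigma_x block pattern
    Y = rng.normal(size=(n0, d)) * sx[::-1]              # Sigma_y swaps the blocks
    return X, Y
```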

Figure 7 presents the ECDFs of the $p$-value of the Delaunay weighted test in comparison with other methods under the location alternative. For both $d=20$ and $d=50$, the power of the Delaunay weighted test tends to $1$ as the total sample size $n_{1}+n_{0}$ grows, demonstrating its ability to detect location differences between distributions. Compared with the existing methods, our test outperforms the $k$-NN test and the regression test and yields power comparable to the $k$-MST test, while the kernel test and the $e$-distance test possess the highest power. As expected, the covariance test is powerless, because it is not consistent against location differences.

Figure 8: Empirical cumulative distribution function (ECDF) of the $p$-value of all methods under the direction alternative for Gaussian data with dimensions $d\in\{20,50\}$ and sample sizes $(n_{1},n_{0})\in\{(50,50),(100,50),(100,100)\}$. The dashed line is the reference line $y=x$.

Figure 8 presents the empirical distributions of the $p$-value under the direction alternative, where our method outperforms all the tests in comparison, except that the covariance test has comparable power. Under the direction alternative, our test, which takes the direction information into account, is more powerful than tests that use only distance information. The advantage of our Delaunay weighted test over the others is further amplified when the dimension $d$ grows to $50$, because the information contained in the Euclidean distance diminishes as $d$ grows.

5.2 Scenario 2: Manifold Data

To validate the effectiveness of the Delaunay weighted test on high-dimensional data that lie on a low-dimensional manifold, we use a digit image from the MNIST data, as shown in Figure 9(a), to generate synthetic datasets. The size of the original image is $28\times 28$, i.e., $784$ pixels in total, with gray scales in $\{0,\ldots,255\}$. To generate synthetic data, we enlarge the original image to $40\times 40$ by padding $6$ rows (or columns) of $0$s to both the top and bottom (or left and right), as shown in Figure 9(b). Each pixel is assigned a 2D coordinate at its center via the Cartesian coordinate system, as shown in Figure 9(c). We next apply a rotation and shift distortion to the padded image, under which the pixel located at $(\xi,\eta)$ is moved to $(\xi\cos\theta-\eta\sin\theta+h,\,\xi\sin\theta+\eta\cos\theta+v)$, where $\theta$ is the rotation angle, and $h$ and $v$ are the horizontal and vertical shift distances, respectively. We then align the gray scales of all pixels of each synthetic image by rows into a vector of $D=1600$ components, which lies on a manifold with intrinsic dimension $d=3$.
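A possible implementation of this distortion with scipy.ndimage is sketched below; the interpolation order and the sign conventions for $(\theta,h,v)$ are our assumptions, since the paper defines the map on pixel coordinates rather than on the raster grid.

```python
# A sketch of the rotation-and-shift distortion of the padded 40x40 image.
import numpy as np
from scipy import ndimage

def distort(img, theta, h, v):
    out = ndimage.rotate(img, np.degrees(theta), reshape=False, order=1)
    out = ndimage.shift(out, shift=(-v, h), order=1)  # image rows run downward
    return out.reshape(-1)                            # vector of D = 1600 pixels
```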

To assess the size and power of the Delaunay weighted test on the manifold data, we randomly generate 1000 synthetic datasets with sample sizes $(n_1,n_0)\in\{(250,250),(500,250),(500,500)\}$ by applying the rotation-and-shift distortion to the padded image. The transformation parameters $(\theta,h,v)$ follow a uniform distribution over different domains under different scenarios, as follows (a sampling sketch is given after the list).

  • Null: $\{(\theta,h,v)\,|\,-\pi/8\leq\theta\leq\pi/8,\ h^{2}+v^{2}\leq 4\}$ for both $\mathbb{X}$ and $\mathbb{Y}$.

  • Location alternative: $\{(\theta,h,v)\,|\,-19\pi/160\leq\theta\leq 21\pi/160,\ (h-0.05)^{2}+(v-0.05)^{2}\leq 4\}$ for $\mathbb{X}$ and $\{(\theta,h,v)\,|\,-21\pi/160\leq\theta\leq 19\pi/160,\ (h+0.05)^{2}+(v+0.05)^{2}\leq 4\}$ for $\mathbb{Y}$.

  • Direction alternative: $\{(\theta,h,v)\,|\,-19\pi/160\leq\theta\leq 19\pi/160,\ (h/1.05)^{2}+(v/1.05)^{2}\leq 4\}$ for $\mathbb{X}$ and $\{(\theta,h,v)\,|\,-21\pi/160\leq\theta\leq 21\pi/160,\ (h/0.95)^{2}+(v/0.95)^{2}\leq 4\}$ for $\mathbb{Y}$.
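As referenced above, the sketch below draws $(\theta,h,v)$ from the null domain by rejection sampling on the disk; the alternatives only change the range of $\theta$ and shift or rescale the disk. This sampler is an illustration and may differ from the paper's implementation.

```python
import numpy as np

def sample_null_params(rng):
    """Draw (theta, h, v) uniformly from the null domain
    {-pi/8 <= theta <= pi/8, h^2 + v^2 <= 4}."""
    theta = rng.uniform(-np.pi / 8, np.pi / 8)
    while True:
        # Rejection sampling: propose on the bounding square,
        # accept when the point falls inside the radius-2 disk.
        h, v = rng.uniform(-2.0, 2.0, size=2)
        if h ** 2 + v ** 2 <= 4.0:
            return theta, h, v

rng = np.random.default_rng(0)
theta, h, v = sample_null_params(rng)
```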

To investigate how the intrinsic dimension $d$ affects the test performance, we implement the Delaunay weighted test and the kernel test with both the true $d$ and the estimate $\hat{d}$ obtained from Algorithm S1 in the supplementary material. Table 1 presents rejection proportions of all tests under the different scenarios. Although $\hat{d}$ usually overestimates $d$, the performance of our test is robust to $\hat{d}$; the test using $\hat{d}$ sometimes performs even better than that using the true $d$, possibly because rounding of the pixel values after distortion increases the intrinsic dimension. Under the direction alternative, the Delaunay weighted test outperforms all the others regardless of whether the true $d$ or the estimate $\hat{d}$ is used, which demonstrates its effectiveness for manifold data with direction differences. Under the location alternative, although the kernel test often performs best, the Delaunay weighted test still yields comparable power and outperforms all the other tests. The covariance test performs poorly, mainly because its sparsity assumption does not hold for the manifold data.
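Algorithm S1 itself is in the supplementary material and is not reproduced here. As a stand-in, the sketch below implements the TwoNN intrinsic-dimension estimator of Facco et al. (2017), a common choice in this literature; it is not necessarily the paper's Algorithm S1.

```python
import numpy as np
from scipy.spatial import cKDTree

def twonn_dimension(X):
    """TwoNN estimator (Facco et al., 2017): under local uniformity,
    the ratio mu of second- to first-nearest-neighbor distances is
    Pareto with exponent d, giving the MLE d = n / sum(log mu)."""
    dist, _ = cKDTree(X).query(X, k=3)    # dist[:, 0] is the self-distance 0
    mu = dist[:, 2] / dist[:, 1]
    # Drop degenerate ratios caused by duplicated points.
    mu = mu[np.isfinite(mu) & (mu > 1.0)]
    return len(mu) / np.sum(np.log(mu))
```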

[Figure 9 here: three panels (a)–(c).]
Figure 9: Preprocessing of the digit image: (a) the original digit image; (b) enlarging the digit image by padding zeros on all four sides; (c) assigning a 2D coordinate to each pixel via the Cartesian coordinate system.
Table 1: Rejection proportions (%) of different two-sample tests on the manifold data under the null and the two alternatives ($H_1$) at different significance levels $\alpha$; the best performance under each $H_1$ is in boldface.
| Tests | $d$ | Null 0.01 | Null 0.05 | Null 0.10 | Location $H_1$ 0.01 | Location $H_1$ 0.05 | Location $H_1$ 0.10 | Direction $H_1$ 0.01 | Direction $H_1$ 0.05 | Direction $H_1$ 0.10 |
|---|---|---|---|---|---|---|---|---|---|---|
| $n_1=250$, $n_0=250$ | | | | | | | | | | |
| Delaunay weighted | True | 1.0 | 4.4 | 9.6 | 8.7 | 26.5 | 39.3 | 15.1 | **39.3** | **54.4** |
| | Estimate | 0.6 | 4.9 | 10.1 | 9.0 | 26.7 | 39.5 | **15.6** | 33.8 | 46.7 |
| Kernel | True | 0.9 | 3.8 | 8.3 | 6.1 | 22.1 | 34.0 | 6.5 | 23.6 | 38.9 |
| | Estimate | 1.2 | 6.2 | 10.6 | **9.5** | **27.1** | **39.8** | 4.9 | 21.0 | 32.9 |
| Regression | – | 1.3 | 6.1 | 11.5 | 4.2 | 15.8 | 25.6 | 3.6 | 13.8 | 23.5 |
| $k$-NN | – | 1.3 | 5.2 | 10.1 | 3.1 | 10.9 | 20.1 | 3.8 | 13.4 | 21.8 |
| $k$-MST | – | 0.7 | 5.4 | 10.4 | 1.7 | 8.1 | 14.0 | 4.8 | 15.8 | 24.5 |
| $e$-distance | – | 0.7 | 4.3 | 9.2 | 9.1 | 20.3 | 26.6 | 1.8 | 12.4 | 21.8 |
| Covariance | – | 0.0 | 0.1 | 0.1 | 1.0 | 3.8 | 5.5 | 0.1 | 0.5 | 0.9 |
| $n_1=500$, $n_0=250$ | | | | | | | | | | |
| Delaunay weighted | True | 1.1 | 5.3 | 10.6 | 10.6 | 30.5 | 43.2 | 27.8 | 53.7 | 67.2 |
| | Estimate | 0.7 | 4.1 | 10.4 | 9.1 | 28.4 | 41.4 | **33.4** | **60.6** | **73.1** |
| Kernel | True | 0.7 | 4.5 | 9.8 | 9.4 | 27.1 | 38.6 | 11.1 | 32.7 | 47.7 |
| | Estimate | 1.5 | 5.6 | 10.4 | **14.1** | **34.7** | **47.3** | 10.6 | 30.7 | 46.8 |
| Regression | – | 1.1 | 4.5 | 8.3 | 8.8 | 20.2 | 32.0 | 7.6 | 21.2 | 32.1 |
| $k$-NN | – | 0.7 | 3.9 | 8.9 | 2.2 | 9.9 | 17.7 | 6.5 | 19.0 | 29.5 |
| $k$-MST | – | 1.0 | 6.4 | 12.2 | 2.5 | 7.7 | 13.4 | 6.4 | 17.4 | 26.6 |
| $e$-distance | – | 0.8 | 5.4 | 9.4 | 13.9 | 25.3 | 31.8 | 3.0 | 14.6 | 27.2 |
| Covariance | – | 0.6 | 1.4 | 2.5 | 12.5 | 25.8 | 34.1 | 3.1 | 7.9 | 13.3 |
| $n_1=500$, $n_0=500$ | | | | | | | | | | |
| Delaunay weighted | True | 0.7 | 4.3 | 11.4 | 20.2 | 48.5 | 60.2 | 40.7 | 69.3 | 79.3 |
| | Estimate | 1.8 | 5.5 | 14.3 | **31.0** | **56.6** | **66.9** | **43.6** | **72.7** | **84.4** |
| Kernel | True | 0.4 | 5.8 | 12.1 | 19.1 | 44.4 | 58.2 | 23.3 | 52.9 | 68.3 |
| | Estimate | 0.9 | 3.7 | 7.9 | 27.0 | 49.5 | 64.1 | 23.8 | 55.0 | 70.6 |
| Regression | – | 1.6 | 4.6 | 10.0 | 9.6 | 27.6 | 42.6 | 14.4 | 37.8 | 48.9 |
| $k$-NN | – | 1.6 | 8.3 | 12.8 | 4.1 | 15.1 | 29.3 | 9.5 | 26.2 | 38.1 |
| $k$-MST | – | 0.7 | 5.4 | 11.4 | 4.1 | 16.8 | 24.7 | 14.5 | 30.2 | 40.7 |
| $e$-distance | – | 0.9 | 4.3 | 11.0 | 23.4 | 36.9 | 42.0 | 6.2 | 23.6 | 39.5 |
| Covariance | – | 0.0 | 0.0 | 0.2 | 14.3 | 24.7 | 34.4 | 4.8 | 12.8 | 18.4 |

5.3 Real Data Analysis: Human Face Images

For illustration, we apply the Delaunay weighted test to the Human Face Image (HFI) dataset (https://www.kaggle.com/datasets/nipunarora8/age-gender-and-ethnicity-face-data-csv), a subset of the large-scale UTKFace dataset (Zhang et al., 2017). The HFI dataset contains 27305 images of individuals of different races (White, Black, Asian, Indian, and Others) and ages (from 1 to 116 years old). Each image is of size $48\times 48$, i.e., $2304$ pixels in total with gray scales in $\{0,\ldots,255\}$. We extract the images of the $7205$ individuals between 20 and 29 years old and align the gray scales of all pixels of each image by rows into a vector of $D=2304$ components.

We consider two scenarios for the extracted dataset. The first is the null scenario: in each replication, we randomly draw $n_1+n_0$ records without replacement and assign $n_1$ and $n_0$ of them to $\mathbb{X}$ and $\mathbb{Y}$, respectively. We consider sample sizes $(n_1,n_0)=(200,200)$ or $(400,400)$, under which the estimate $\hat{d}$ ranges from $15$ to $21$. The ECDF of the $p$-value of the Delaunay weighted test over 1000 replications is shown in the top panels of Figure 10. Consistent with the results in Section 5.1, the $p$-value of the Delaunay weighted test is uniformly distributed on $[0,1]$, as expected.
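The $p$-values above come from a permutation scheme; a generic sketch follows, in which `delaunay_stat` is a hypothetical placeholder for the Delaunay weighted statistic defined earlier in the paper, and we assume large statistic values indicate a departure from the null.

```python
import numpy as np

def permutation_pvalue(X, Y, stat, n_perm=1000, seed=0):
    """Generic permutation p-value for a two-sample statistic `stat`,
    which maps two samples to a scalar."""
    rng = np.random.default_rng(seed)
    pooled = np.vstack([X, Y])
    n1 = len(X)
    t_obs = stat(X, Y)
    t_perm = np.empty(n_perm)
    for b in range(n_perm):
        # Relabel the pooled sample uniformly at random.
        idx = rng.permutation(len(pooled))
        t_perm[b] = stat(pooled[idx[:n1]], pooled[idx[n1:]])
    # The +1 correction keeps the test valid at finite n_perm.
    return (1 + np.sum(t_perm >= t_obs)) / (n_perm + 1)

# Usage with the (hypothetical) Delaunay weighted statistic:
# p = permutation_pvalue(X, Y, delaunay_stat)
```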

[Figure 10 here: four panels, (a) $n_1=200$, $n_0=200$ and (b) $n_1=400$, $n_0=400$ under the null scenario; (c) $n_1=200$, $n_0=200$ and (d) $n_1=400$, $n_0=400$ under the alternative scenario.]
Figure 10: Empirical cumulative distribution function (ECDF) of the $p$-value of the Delaunay weighted test under the null scenario (top) and of all methods under the alternative scenario (bottom) for the HFI dataset with different sample sizes ($n_1$ and $n_0$). The dashed line is the reference line $y=x$.

We also consider an alternative scenario in which the two samples differ in age: $n_1$ records of individuals aged at most 25 years (or $n_0$ records of individuals aged at least 26 years) are randomly drawn without replacement to form $\mathbb{X}$ (or $\mathbb{Y}$). For sample sizes $(n_1,n_0)=(200,200)$ or $(400,400)$, empirical distributions of the $p$-value of the Delaunay weighted test and its competitors over 1000 replications are shown in the bottom panels of Figure 10. The Delaunay weighted test outperforms all the existing tests in distinguishing human faces of different ages.

References

  • Abadie and L’Hour, (2021) Abadie, A. and L’Hour, J. (2021). A Penalized Synthetic Control Estimator for Disaggregated Data. Journal of the American Statistical Association, 116(536):1817–1834.
  • Arias-Castro et al., (2018) Arias-Castro, E., Pelletier, B., and Saligrama, V. (2018). Remember the Curse of Dimensionality: The Case of Goodness-of-fit Testing in Arbitrary Dimension. Journal of Nonparametric Statistics, 30(2):448–471.
  • Bhattacharya and Patrangenaru, (2003) Bhattacharya, R. and Patrangenaru, V. (2003). Large sample theory of intrinsic and extrinsic sample means on manifolds. The Annals of Statistics, 31(1):1–29.
  • Bickel, (1969) Bickel, P. J. (1969). A Distribution Free Version of the Smirnov Two Sample Test in the $p$-Variate Case. The Annals of Mathematical Statistics, 40(1):1–23.
  • Cai et al., (2013a) Cai, T. T., Liu, W., and Xia, Y. (2013a). Two-Sample Covariance Matrix Testing and Support Recovery in High-Dimensional and Sparse Settings. Journal of the American Statistical Association, 108(501):265–277.
  • Cai et al., (2013b) Cai, T. T., Liu, W., and Xia, Y. (2013b). Two-sample Test of High Dimensional Means under Dependence. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 76(2):349–372.
  • Cao et al., (2018) Cao, Y., Nemirovski, A., Xie, Y., Guigues, V., and Juditsky, A. (2018). Change Detection via Affine and Quadratic Detectors. Electronic Journal of Statistics, 12(1):1–57.
  • Chang et al., (2017) Chang, J., Zheng, C., Zhou, W.-X., and Zhou, W. (2017). Simulation-based Hypothesis Testing of High Dimensional Means under Covariance Heterogeneity. Biometrics, 73(4):1300–1310.
  • Chang et al., (2020) Chang, T. H., Watson, L. T., Lux, T. C. H., Butt, A. R., Cameron, K. W., and Hong, Y. (2020). Algorithm 1012: DELAUNAYSPARSE: Interpolation via a Sparse Subset of the Delaunay Triangulation in Medium to High Dimensions. ACM Transactions on Mathematical Software, 46(4):1–20.
  • Chen, (2019) Chen, H. (2019). Sequential Change-point Detection based on Nearest Neighbors. The Annals of Statistics, 47(3):1381–1407.
  • Chen et al., (2018) Chen, H., Chen, X., and Su, Y. (2018). A Weighted Edge-Count Two-Sample Test for Multivariate and Object Data. Journal of the American Statistical Association, 113(523):1146–1155.
  • Chen and Friedman, (2017) Chen, H. and Friedman, J. H. (2017). A New Graph-Based Two-Sample Test for Multivariate and Object Data. Journal of the American Statistical Association, 112(517):397–409.
  • Cheng and Xie, (2021) Cheng, X. and Xie, Y. (2021). Kernel Two-Sample Tests for Manifold Data. arXiv:2105.03425.
  • Chwialkowski et al., (2016) Chwialkowski, K., Strathmann, H., and Gretton, A. (2016). A Kernel Test of Goodness of Fit. In Proceedings of The 33rd International Conference on Machine Learning, pages 2606–2615, New York, New York, USA. PMLR.
  • Dimeglio et al., (2014) Dimeglio, C., Gallón, S., Loubes, J.-M., and Maza, E. (2014). A Robust Algorithm for Template Curve Estimation based on Manifold Embedding. Computational Statistics & Data Analysis, 70:373–386.
  • Facco et al., (2017) Facco, E., d’Errico, M., Rodriguez, A., and Laio, A. (2017). Estimating the Intrinsic Dimension of Datasets by a Minimal Neighborhood Information. Scientific Reports, 7:12140.
  • Friedman and Rafsky, (1979) Friedman, J. H. and Rafsky, L. C. (1979). Multivariate Generalizations of the Wald-Wolfowitz and Smirnov Two-Sample Tests. The Annals of Statistics, 7(4):697–717.
  • Gretton et al., (2012) Gretton, A., Borgwardt, K. M., Rasch, M. J., Schölkopf, B., and Smola, A. (2012). A Kernel Two-Sample Test. Journal of Machine Learning Research, 13(25):723–773.
  • Gretton et al., (2009) Gretton, A., Fukumizu, K., Harchaoui, Z., and Sriperumbudur, B. K. (2009). A Fast, Consistent Kernel Two-Sample Test. In Proceedings of the 22nd International Conference on Neural Information Processing Systems, volume 22, page 673–681, Red Hook, New York, USA. Curran Associates, Inc.
  • Hall and Tajvidi, (2002) Hall, P. and Tajvidi, N. (2002). Permutation Tests for Equality of Distributions in High-dimensional Settings. Biometrika, 89(2):359–374.
  • Hediger et al., (2022) Hediger, S., Michel, L., and Näf, J. (2022). On the Use of Random Forest for Two-sample Testing. Computational Statistics & Data Analysis, 170:107435.
  • Henze, (1988) Henze, N. (1988). A Multivariate Two-Sample Test Based on the Number of Nearest Neighbor Type Coincidences. The Annals of Statistics, 16(2):772–783.
  • Henze and Penrose, (1999) Henze, N. and Penrose, M. D. (1999). On the Multivariate Runs Test. The Annals of Statistics, 27(1):290–298.
  • Jurečková and Kalina, (2012) Jurečková, J. and Kalina, J. (2012). Nonparametric Multivariate Rank Tests and Their Unbiasedness. Bernoulli, 18(1):229–251.
  • Kim et al., (2019) Kim, I., Lee, A. B., and Lei, J. (2019). Global and Local Two-sample Tests via Regression. Electronic Journal of Statistics, 13(2):5253–5305.
  • Kolmogorov, (1933) Kolmogorov, A. N. (1933). Sulla Determinazione Empirica di Una Legge di Distribuzione. Giornale dell’Istituto Italiano degli Attuari, 4:83–91.
  • Kruskal, (1964) Kruskal, J. (1964). Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis. Psychometrika, 29:1–27.
  • Leibon and Letscher, (2000) Leibon, G. and Letscher, D. (2000). Delaunay Triangulations and Voronoi Diagrams for Riemannian Manifolds. In Proceedings of the sixteenth annual symposium on Computational geometry, pages 341–349, New York, New York, USA. Association for Computing Machinery.
  • Liu et al., (2022) Liu, W., Yu, X., Zhong, W., and Li, R. (2022). Projection Test for Mean Vector in High Dimensions. Journal of the American Statistical Association.
  • Mann and Whitney, (1947) Mann, H. B. and Whitney, D. R. (1947). On a Test of Whether one of Two Random Variables is Stochastically Larger than the Other. The Annals of Mathematical Statistics, 18(1):50–60.
  • Marozzi, (2015) Marozzi, M. (2015). Multivariate Multidistance Tests for High-dimensional Low Sample Size Case-control Studies. Statistics in Medicine, 34(9):1511–1526.
  • Moon et al., (2018) Moon, K. R., Stanley III, J. S., Burkhardt, D., van Dijk, D., Wolf, G., and Krishnaswamy, S. (2018). Manifold Learning-based Methods for Analyzing Single-cell RNA-sequencing Data. Current Opinion in Systems Biology, 7:36–46.
  • Morgan and Rubin, (2012) Morgan, K. L. and Rubin, D. B. (2012). Rerandomization to Improve Covariate Balance in Experiments. The Annals of Statistics, 40(2):1263–1282.
  • Pless and Souvenir, (2009) Pless, R. and Souvenir, R. (2009). A Survey of Manifold Learning for Images. IPSJ Transactions on Computer Vision and Applications, 1:83–94.
  • Rosenbaum, (2005) Rosenbaum, P. R. (2005). An Exact Distribution-free Test Comparing Two Multivariate Distributions based on Adjacency. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67(4):515–530.
  • Schilling, (1986) Schilling, M. F. (1986). Multivariate Two-sample Tests based on Nearest Neighbors. Journal of the American Statistical Association, 81(395):799–806.
  • Sibson, (1978) Sibson, R. (1978). Locally Equiangular Triangulations. The Computer Journal, 21(3):243–245.
  • Smirnov, (1948) Smirnov, N. (1948). Table for Estimating the Goodness of Fit of Empirical Distributions. The Annals of Mathematical Statistics, 19(2):279–281.
  • Székely and Rizzo, (2004) Székely, G. J. and Rizzo, M. L. (2004). Testing for Equal Distributions in High Dimension. InterStat, November(5).
  • Wald and Wolfowitz, (1940) Wald, A. and Wolfowitz, J. (1940). On a Test Whether Two Samples are from the Same Population. The Annals of Mathematical Statistics, 11(2):147–162.
  • Wang et al., (2019) Wang, W., Lin, N., and Tang, X. (2019). Robust Two-sample Test of High-dimensional Mean Vectors under Dependence. Journal of Multivariate Analysis, 169:312–329.
  • Wilcoxon, (1945) Wilcoxon, F. (1945). Individual Comparisons by Ranking Methods. Biometrics Bulletin, 1(6):80–83.
  • Yan and Zhang, (2023) Yan, J. and Zhang, X. (2023). Kernel two-sample tests in high dimensions: interplay between moment discrepancy and dimension-and-sample orders. Biometrika, 110(2):411–430.
  • Zhang et al., (2017) Zhang, Z., Song, Y., and Qi, H. (2017). Age Progression/Regression by Conditional Adversarial Autoencoder. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 4352–4360. IEEE.
  • Zhao et al., (2021) Zhao, D., Wang, J., Lin, H., Chu, Y., Wang, Y., Zhang, Y., and Yang, Z. (2021). Sentence Representation with Manifold Learning for Biomedical Texts. Knowledge-Based Systems, 218:106869.