A polynomial-time iterative algorithm for random graph matching with non-vanishing correlation
Abstract
We propose an efficient algorithm for matching two correlated Erdős–Rényi graphs with vertices whose edges are correlated through a latent vertex correspondence. When the edge density for a constant , we show that our algorithm has polynomial running time and succeeds in recovering the latent matching as long as the edge correlation is non-vanishing. This is closely related to our previous work on a polynomial-time algorithm that matches two Gaussian Wigner matrices with non-vanishing correlation, and provides the first polynomial-time random graph matching algorithm (regardless of the regime of ) when the edge correlation is below the square root of Otter’s constant (Otter’s constant is approximately 0.338).
1 Introduction
In this paper, we study the algorithmic perspective for recovering the latent matching between two correlated Erdős–Rényi graphs. To be mathematically precise, we first need to choose a model for a pair of correlated Erdős–Rényi graphs, and a natural choice is that the two graphs are independently sub-sampled from a common Erdős–Rényi graph, as described more precisely next. For two vertex sets and with cardinality , let be the set of unordered pairs with and define similarly with respect to . For some model parameters , we generate a pair of correlated random graphs and with the following procedure: sample a uniform bijection , independent Bernoulli variables with parameter for as well as independent Bernoulli variables with parameter for and . Let
(1.1)
and let . It is obvious that marginally is an Erdős–Rényi graph with edge density and so is . In addition, the edge correlation is given by .
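To make the sub-sampling construction concrete, the following sketch simulates a correlated pair together with the latent matching. The parameter names n, p and s (parent edge density and sub-sampling probability) are our own placeholders for the suppressed model parameters; the marginal edge density and the edge correlation of the resulting pair are functions of p and s.

```python
import numpy as np

def sample_correlated_er(n, p, s, rng=None):
    """Sketch: sample two graphs by independently sub-sampling a common
    Erdos-Renyi parent graph, then relabel the second one by a uniform
    permutation pi (the latent matching). The names p (parent density) and
    s (sub-sampling probability) are illustrative placeholders."""
    rng = np.random.default_rng(rng)
    upper = np.triu(np.ones((n, n), dtype=bool), 1)   # unordered vertex pairs
    parent = upper & (rng.random((n, n)) < p)          # common parent graph
    A = parent & (rng.random((n, n)) < s)              # first sub-sample
    B0 = parent & (rng.random((n, n)) < s)             # second sub-sample
    A, B0 = A | A.T, B0 | B0.T                         # symmetrize
    pi = rng.permutation(n)                            # latent bijection
    B = np.zeros_like(B0)
    B[np.ix_(pi, pi)] = B0        # edge {u,v} of B0 becomes {pi[u], pi[v]}
    return A, B, pi
```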
An important question is to recover the latent matching from the observation of . More precisely, we wish to find an estimator which is measurable with respect to such that . Our main contribution is an efficient matching algorithm as incorporated in the theorem below.
Theorem 1.1.
Suppose that for some constant and that is a constant. Then there exist a constant and an algorithm (see Algorithm 2.6 below) with time-complexity that recovers the latent matching with probability . That is, this polynomial-time algorithm takes as the input and outputs an estimator such that with probability tending to 1 as .
1.1 Background and related works
The random graph matching problem is motivated by questions from various applied fields such as social network analysis [40, 41], computer vision [8, 4], computational biology [48, 49, 19] and natural language processing [30]. In biology, an important problem is to identify proteins with similar structures/functions across different species. Toward this goal, directly comparing the amino acid sequences that constitute proteins is often complicated, since genetic mutations of the species can result in significant variations of such sequences. However, despite these variations, proteins typically maintain similar functions within each species’ metabolism. In light of this, biologists employ graph-based representations, such as Protein-Protein Interaction (PPI) graphs, for each species. Under the assumption that the topological structures of PPI graphs are similar across species, researchers can then effectively match proteins with similar functions by taking advantage of such similarity (and possibly of some other domain knowledge). This approach turns out to be successful and offers a nuanced understanding of phenomena such as the evolution of protein complexes. We refer the reader to [19] for more information on the topic. In the domain of social networks, data privacy is a fundamental and complicated problem. One complication arises from the fact that a user may have accounts on multiple social network platforms, where the user shares similar content or engages in comparable activities. Thus, from such similarities across platforms, it is possible to infer the user’s identity by aligning the respective graphs representing user interactions and content consumption. That is to say, it is possible to use graph matching techniques to deanonymize private social networks using information gleaned from public social platforms. Well-known examples include the deanonymization of Netflix using data from IMDb [40] and the deanonymization of Twitter using Flickr [41]. Viewing this problem from the opposite perspective, a deeper understanding of the random graph matching problem may offer insights into better mechanisms for protecting data privacy. With the aforementioned applications in mind, the graph matching problem has recently been studied extensively from a theoretical point of view.
It is natural that different applications have their own distinct features, which mathematically boils down to a careful choice of the underlying random graph model suitable for the desired application. As in most previous works on the random graph matching problem, in this paper we consider the correlated Erdős–Rényi random graph model, which is possibly an over-idealization of any realistic network but nevertheless offers a good playground to develop insights and methods for this problem in general. Thanks to the collective efforts in [10, 9, 31, 53, 52, 27, 13, 14], it is fair to say that we have a fairly complete understanding of the information-theoretic thresholds for the problems of correlation detection and vertex matching. In contrast, our understanding of the computational aspects is far from complete, and in what follows we briefly review the progress on this front.
A huge amount of effort has been devoted to developing efficient algorithms for graph matching [42, 54, 34, 33, 24, 47, 1, 18, 21, 22, 5, 11, 12, 39, 26, 35, 36, 37, 28, 29, 38]. Prior to this work, arguably the best result was the recent work [38] (see also [28, 29] for a remarkable result on partial recovery of a similar flavor when the average degree is ), where the authors substantially improved a groundbreaking work [36] and obtained a polynomial-time algorithm which succeeds as long as the correlation is above the square root of Otter’s constant (Otter’s constant is around 0.338). In terms of methods, the present work is drastically different from these existing works; instead, it is closely related to our previous work [17] on a polynomial-time iterative algorithm that matches two correlated Gaussian Wigner matrices. The reader is encouraged to see [38, Section 1.1] and [17, Section 1.1] for a more elaborate review of previous algorithms, and [17, Section 1.2] for discussions on novel features of this iterative algorithm, especially in comparison with the message-passing algorithm [29, 43] and a recent greedy algorithm for aligning two independent graphs [15].
1.2 Our contributions
While the present work can be viewed as an extension of [17], we feel that we have overcome substantial obstacles and made substantial contributions to this topic, as we describe below.
• While in [38] (see also [29]) polynomial-time matching algorithms were obtained when the correlation is above the threshold given by Otter’s constant, our work establishes a polynomial-time algorithm as long as the average degree grows polynomially and the correlation is non-vanishing. In addition, the power in the running time only tends to as the correlation tends to (for each fixed ), and we believe that this is the best possible; this belief is supported by a recent work on the complexity of low-degree polynomials for graph matching [16].
• From a conceptual point of view, our work demonstrates the robustness of the iterative matching algorithm proposed in [17]. This type of “algorithmic universality” is closely related to the universality phenomenon in random matrix theory, which, roughly speaking, postulates that the particular distribution of the entries of a random matrix is often irrelevant to the spectral properties under investigation. Our work also encourages future study of even more ambitious notions of robustness, for instance algorithms that are robust to the assumptions on the underlying random graph model. This is of major interest since realistic networks are usually better captured by more structured graph models, such as the random geometric graph model [50], the random growing graph model [44] and the stochastic block model [45].
• In terms of techniques, our work employs a method which argues that the Gaussian approximation is typically valid. There are a couple of major challenges: (1) the Gaussian approximation is valid only in the rather weak sense that the Radon-Nikodym derivative is not too large; (2) we cannot simply ignore the atypical cases where the Gaussian approximation is invalid, since we have to analyze the conditional behavior given all the information the algorithm has exploited so far. The latter raises a major obstacle in our proof of the theoretical guarantee; see Section 3.1 for a more detailed discussion of how this obstacle is addressed.
In addition, it is natural to suspect that our work brings us one step closer to understanding computational phase transitions for random graph matching problems, as well as algorithmic aspects of other matching problems (see, e.g., [6]). We refer the interested reader to [17, Section 1.3] and omit further discussions here.
1.3 Notations
We record in this subsection some notation conventions, and we point out that a list of commonly used notations is included at the end of the paper for better reference.
Denote the identity matrix by and the zero matrix by . For a matrix , we use to denote its transpose, and let denote the Hilbert-Schmidt norm of . For , define the -norm of by . Note that when is a symmetric square matrix, we have for
We further denote the operator norm of by . If , denote and the determinant and the trace of , respectively. For -dimensional vectors and a symmetric matrix , let be the “inner product” of with respect to , and we further denote . For two vectors , we say (or equivalently ) if their entries satisfy for all . The indicator function of a set is denoted by .
Without further specification, all asymptotics are taken with respect to . We also use standard asymptotic notations: for two sequences and , we write or , if for some absolute constant and all . We write or , if ; we write or , if and ; we write or , if as . We write if .
We denote by the Bernoulli distribution with parameter , denote by the Binomial distribution with trials and success probability , denote by the normal distribution with mean and variance , and denote by the multivariate normal distribution with mean and covariance matrix . We say is a pair of correlated binomial random variables, denoted as for and , if with such that are independent with and the covariance between and is when , and that is independent of and is independent of when . We say is a sub-Gaussian variable if there exists a positive constant such that , and we use to denote its sub-Gaussian norm.
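As we read the definition above, a correlated binomial pair is a sum of independent correlated Bernoulli pairs. A minimal sketch of sampling one such Bernoulli pair with marginal parameter q and nonnegative correlation rho is given below; the joint law P(X=1, Y=1) = q^2 + rho*q*(1-q) is the standard choice with these marginals and this correlation.

```python
import numpy as np

def correlated_bernoulli_pair(q, rho, size=1, rng=None):
    """Sample pairs (X, Y) with Bernoulli(q) marginals and correlation rho >= 0.
    Summing independent copies gives a correlated binomial pair in the sense
    used in the notation above (our reading of the definition)."""
    rng = np.random.default_rng(rng)
    p11 = q * q + rho * q * (1 - q)          # P(X=1, Y=1)
    p10 = q * (1 - q) * (1 - rho)            # P(X=1, Y=0) = P(X=0, Y=1)
    p00 = 1 - p11 - 2 * p10                  # P(X=0, Y=0)
    outcome = rng.choice(4, size=size, p=[p11, p10, p10, p00])
    X = (outcome <= 1).astype(int)                        # outcomes 0:(1,1), 1:(1,0)
    Y = ((outcome == 0) | (outcome == 2)).astype(int)     # outcomes 0:(1,1), 2:(0,1)
    return X, Y
```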
Acknowledgment. We thank Zongming Ma, Yihong Wu, Jiaming Xu and Fan Yang for stimulating discussions on random graph matching problems. J. Ding is partially supported by NSFC Key Program Project No. 12231002 and the Xplorer Prize.
2 An iterative matching algorithm
We first describe the underlying heuristics of our algorithm (the reader is strongly encouraged to consult [17, Section 2] for a description on the iterative matching algorithm for correlated Gaussian Wigner matrices). Since we expect that Wigner matrices and Erdős–Rényi graphs (with sufficient edge density) should belong to the same algorithmic universality class, it is natural to try to extend the algorithm proposed in [17] to the case for correlated random graphs. As in [17], our wish is to iteratively construct a sequence of paired sets for (with and ), where each contains more true pairs of the form than the case when the two sets are sampled uniformly and independently. In addition, we may further require for convenience of analysis later.
For the initialization in [17], we obtain true pairs via brute-force search, and provided with true pairs we then, for each such pair, define to be the collections of their neighbors whose corresponding edge weights exceed a certain threshold. In this work, however, due to the sparsity of Erdős–Rényi graphs (when ) we cannot produce an efficient initialization by simply looking at the 1-neighborhoods of some true pairs. To address this, we instead look at their -neighborhoods with carefully chosen (see the definition of in (2.8) below). This requires a significantly more complicated analysis, since the initialization will influence later iterations. The idea to address this is to argue that the initialization only uses information on a small fraction of the edges; this is why will be chosen carefully.
Provided with the initialization, the iteration of the algorithm is similar to that in [17] (although we will introduce some modifications in order to facilitate our analysis later). Since each pair carries some signal, we then hope to construct more paired sets at time by considering various linear combinations of vertex degrees restricted to each (or to ). As a key novelty of this iterative algorithm, as in [17] we will use the increase in the number of paired sets to compensate for the decrease in the signal carried by each pair. Once the iteration progresses to time for some well-chosen (see (2.35) below), we will have accumulated enough total signal that we can complete the matching directly in the next step, as described in Section 2.4.
However, controlling the correlation among different iterative steps is a much more sophisticated task in this setting. In [17] we used Gaussian projection to remove the influence of conditioning on information obtained in previous steps. This is indeed a powerful technique, but it crucially relies on properties of Gaussian processes. Although there are examples where the universality of iterative algorithms has been established (see, e.g., [2, 7, 23] for developments on this front for approximate message passing), we are not sure how their techniques can help solve our problem, since dealing with a pair of correlated matrices poses a substantial and novel challenge. Instead, we compare the Bernoulli process obtained in the iterative algorithm with the corresponding Gaussian process obtained by replacing with a Gaussian process with the same mean and covariance structure. In order to facilitate such a comparison, we also apply a Gaussian smoothing to our Bernoulli process in our algorithm below (see (2.29), where we introduce external Gaussian noise for smoothing purposes). However, since we need to analyze the conditional behavior of the two processes, we need to compare their densities; this is much more demanding than controlling, e.g., the transportation distance between the two processes, and in fact the density ratio of these two processes is fairly large. To address this, on the one hand we will show that if we ignore a vanishing fraction of vertices (which is a highly non-trivial step, as we will elaborate in Section 3.1), the density ratio is under control while still being fairly large; on the other hand, we show that in the Gaussian setting some really bad event occurs with tiny probability (and thus still with small probability even after multiplying by this fairly large density ratio). We refer to Sections 3.4 and 3.5 for more detailed discussions on this point.
Finally, due to the aforementioned complications we are only able to show that our iterative algorithm constructs an almost exact matching. To obtain an exact matching, we will employ the method of seeded graph matching, as developed in previous works [1, 39, 38].
In the rest of this section, we will describe in detail our iterative algorithm, which consists of a few steps including preprocessing (see Section 2.1), initialization (see Section 2.2), iteration (see Section 2.3), finishing (see Section 2.4) and seeded graph matching (see Section 2.5). We formally present our algorithm in Section 2.6. In Section 2.7 we analyze the time complexity of the algorithm.
2.1 Preprocessing
Similarly to [17], we preprocess the random graphs so that we only need to consider graphs with directed edges. We first note, as a technical reduction, that we only need to consider the case where is a sufficiently small constant, which can easily be achieved by keeping each edge independently with a sufficiently small constant probability.
Now, we define from . For any , we do the following:
• if , then independently among all such :
• if , then set .
We define from in the same manner such that and are conditionally independent given . We continue to use the convention that . It is then straightforward to verify that and are two families of i.i.d. Bernoulli random variables with parameter . In addition, we have
Thus, are edge-correlated directed graphs, denoted as , such that and . Also note that and since . From now on we will work on the directed graph .
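A minimal sketch of the density-reduction step mentioned above (thinning each observed graph by an independent constant-probability mask) is given below; since the keep probability is a constant, the edge correlation remains a (smaller) non-vanishing constant. The subsequent splitting into directed edges is omitted here, as its exact probabilities are tied to the formulas displayed above.

```python
import numpy as np

def thin_edges(A, keep_prob, rng=None):
    """Keep each edge of the (symmetric, boolean) adjacency matrix A
    independently with probability keep_prob; applying this separately to each
    observed graph reduces the edge density by a constant factor."""
    rng = np.random.default_rng(rng)
    mask = np.triu(rng.random(A.shape) < keep_prob, 1)
    return A & (mask | mask.T)
```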
2.2 Initialization
For a pair of standard bivariate normal variables with correlation , we define by (below the number 10 is somewhat arbitrarily chosen)
(2.1)
In addition, we define
(2.2)
From the definition we know is strictly increasing, and by [17, Claims 2.6 and 2.8] we have , and thus both and are positive and finite. Also we write . Recalling that it was shown in Subsection 2.1 that can be assumed to be a sufficiently small constant, from now on we will assume that
(2.3) |
Let be a sufficiently large constant depending on whose exact value will be decided later in (2.10). Set . We then arbitrarily choose a sequence where ’s are distinct vertices in , and we list all the sequences of length with distinct elements in as where . As in [17], for each , we will run a procedure of initialization and iteration and clearly for one of them (although a priori we do not know which one it is) we are running the algorithm as if we have true pairs as seeds. For convenience, when describing the initialization and the iteration we will drop from notations, but we emphasize that this procedure will be applied to each . Having this clarified, we take a fixed and denote . In what follows, we abuse the notation and write when regarding as a set (similarly for ).
We next describe our initialization procedure. As discussed earlier, in contrast to the case of Wigner matrices, we have to investigate the neighborhood around a seed up to a certain depth in order to get information for a large number of vertices. To this end, we choose an integer as the depth such that
(2.4)
(this is possible since ). We choose such since on the one hand we wish to see a large number of vertices near the seed and on the other hand we want to reveal only a vanishing fraction of the edges. Now for , define the seeds
(2.5)
Then for , we iteratively define the -neighborhood of each seed by
(2.6)
Also, for we iteratively define
(2.7)
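For intuition, the iterative neighborhood construction in (2.6)–(2.7) can be implemented by a standard breadth-first search; the sketch below returns the vertices at each distance from a seed, while the precise sets used by the algorithm are the ones defined in the displays above.

```python
def depth_neighborhoods(adj, seed, depth):
    """Return the list [N_0, N_1, ..., N_depth], where N_k is the set of
    vertices at graph distance exactly k from `seed`; `adj` maps each vertex
    to an iterable of its neighbors. A generic BFS sketch of (2.6)-(2.7)."""
    levels = [{seed}]
    visited = {seed}
    for _ in range(depth):
        frontier = set()
        for u in levels[-1]:
            for v in adj[u]:
                if v not in visited:
                    visited.add(v)
                    frontier.add(v)
        levels.append(frontier)
    return levels
```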
We will show in Subsection 3.3 that actually we have
Let be the minimal integer such that , and set
(2.8)
And we further define
(2.9)
We may then choose sufficiently large such that and
(2.10)
In addition, we define to be matrices by
(2.11)
and in the iterative steps we will also construct and for . As in [17], the matrices and are supposed to approximate the cardinalities of the sets in the following sense. Write
(2.12)
Then somewhat informally, we expect that
(2.13) | |||
(2.14) | |||
(2.15) | |||
(2.16) |
As in [17, Lemma 2.1], in order to facilitate our analysis later we will also need an important property on the eigenvalues of and :
(2.17)
and (2.18)
We will show in Subsection 3.3 that (2.13)–(2.18) are satisfied for . The main challenge is to construct and such that (2.13)–(2.18) hold for , under the inductive assumption that (2.13)–(2.18) hold for . We conclude this subsection with some bounds on .
Lemma 2.1.
for and . Also, we have for . In addition, we have either or .
Proof.
We prove the first claim by induction. The claim is trivial for . Now suppose the claim holds up to some . Using (2.7) and Poisson approximation (note that when we have )
which verifies the claim for and thus verifies the first claim (for ). If , we have and thus ; if , we have and thus by the choice of we have and using Poisson approximation. Thus, we have for (note that the case for can be checked in a straightforward manner). In addition, if for some arbitrarily small but fixed , then and thus . This completes the proof of the lemma. ∎
2.3 Iteration
We reiterate that in this subsection we are describing the iteration for a fixed and eventually this iterative procedure will be applied to each . Define
(2.19)
for . Since we have assumed , we can then prove by induction that
(2.20)
We now suppose that and have been constructed for (which will be implemented inductively via (2.29) as we describe next). Recall that we are working under the assumption that (2.13)–(2.18) hold for . For and , define to be the “normalized degrees” of in and of in as follows:
(2.21)
Recalling (2.12), we note that there is a difference between the definition (2.21) for and ; this is because and may only contain a vanishing fraction of vertices. We also point out that similar to [17], in the above definition we used the “centered” version of and since (2.13) suggests that intuitively each vertex (respectively, ) has probability approximately to belong to (respectively, ); such centering will be useful for our proof later as it leads to additional cancellation.
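For concreteness, a centered and normalized degree statistic of the kind defined in (2.21) can be sketched as follows. The centering by q|Γ| and the scaling by sqrt(q(1−q)|Γ|) are our assumptions, chosen so that the statistic has mean roughly 0 and variance roughly 1 for a vertex unrelated to Γ; the exact normalization in (2.21) may differ.

```python
import numpy as np

def normalized_degree(A, u, Gamma, q):
    """Centered, normalized degree of vertex u into the vertex set Gamma.
    The centering and scaling constants are illustrative assumptions."""
    Gamma = np.fromiter(Gamma, dtype=int)
    deg = A[u, Gamma].sum()
    return (deg - q * len(Gamma)) / np.sqrt(q * (1 - q) * len(Gamma))
```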
Assuming Lemma 2.2, we can then write and as their spectral decompositions:
(2.22)
where
(2.23)
and are the unit eigenvectors with respect to respectively. Next, for we define to be matrices as follows:
(2.24)
These matrices actually represent the covariance matrices for random vectors of the form and . To get a rough intuition of this, we (formally incorrectly) regard as a linear combination of with deterministic coefficients (and the same applies to ). Then we can see that for all , the “correlation” between and equals
This justifies our definition of which aims to record the correlation between and . Similarly under the same simplification we have (respectively, ) is the correlation matrix between and (respectively, between and ). In addition, from (2.13)–(2.16) we expect that and . Note that are accessible by the algorithm but is not (since it relies on the latent matching). We further define two linear subspaces as follows:
(2.25)
We refer to [17, Remark 3.3] for underlying reasons of the definition above. Note that the number of linear constraints posed on are at most . So
As proved in [17, (2.10) and (2.11)], we can choose from such that
(2.26) | |||
(2.27) |
Furthermore, we must have . As in [17], we will project the degrees to a set of carefully chosen directions in the space spanned by all ’s. These directions are defined as follows: we sample as i.i.d. uniform variables on . By [17, Proposition 2.4], these ’s satisfy [17, (2.21)–(2.24)] with probability at least 0.5. As in [17], we will keep resampling until these requirements are satisfied. Define
(2.28)
We sample i.i.d. standard normal variables and complete our iteration by setting
(2.29)
In the above, we introduced a Gaussian smoothing . We believe this is not essential but provides technical convenience: on the one hand it presumably only reduces the efficiency of the algorithm since it weakens the signal, but on the other hand it facilitates the analysis since it brings the distribution closer to Gaussian. In addition, we have used the absolute value of a random variable instead of the random variable itself, with the purpose of introducing more symmetry as in [17] (e.g., to bound (3.95) below). Recall that we have explained that records the covariance matrix of for all . Thus, we expect that the correlation between and is approximately
In particular, the variance of each is approximately . Similarly, we can show the correlation between and is approximately , and the correlation between and is approximately , where
(2.30)
(Here we also used that (2.16) implies that ). Recall our desire for (2.13)–(2.16) to hold for . Thus the signal contained in each pair at time is approximately
(2.31)
By (2.27), we have that
(2.32)
Recalling (2.3), we have , and thus (recall from (2.10) that )
(2.33)
which verifies our statement that the signal in each pair is decreasing. We can then finish the iteration by defining to be matrices such that
(2.34)
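The update (2.29) combines three ingredients: a linear combination of the current normalized-degree statistics along a sampled direction, an independent Gaussian term for smoothing, and an absolute value for symmetry. The sketch below is only schematic; the weights and the noise level sigma stand in for the quantities built from (2.28) and are not the paper's exact choices.

```python
import numpy as np

def smoothed_statistic(degree_stats, weights, sigma, rng=None):
    """One new statistic in the spirit of (2.29): project the degree statistics
    onto a chosen direction, add independent Gaussian noise, take the absolute
    value. `weights` and `sigma` are illustrative placeholders."""
    rng = np.random.default_rng(rng)
    return np.abs(np.dot(weights, degree_stats) + sigma * rng.standard_normal())
```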
Next, we state a lemma which then inductively justifies (2.17) and (2.18).
Lemma 2.2.
2.4 Almost exact matching
In this subsection we describe how we get an almost exact matching once we accumulate enough signal along the iteration. To this end, define
(2.35)
Obviously . By (2.19), we have and as a result we have . In addition, recalling (2.32) we have
Using the choice of in (2.10) we see that the total signal is increasing in . We also have that
(2.36)
For each , we run the procedure of initialization and then run the iteration up to time , and then we construct a permutation (with respect to ) as follows. For and , set for . We set a prefixed ordering of and as and , initialize the set to be , and initialize the sets , and to be empty sets. The algorithm processes in the increasing order of : for each , we find the minimal such that and
(2.37)
We then define , put into and move from to . If there is no satisfying (2.37), we put into . Having processed all vertices in , we pair the vertices in and the (remaining) vertices in in an arbitrary but pre-fixed manner to get the matching .
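The greedy pairing just described can be summarized by the following sketch. The test `accept(v, w)` is a placeholder for the threshold condition (2.37), and `fallback_pairing` stands for the arbitrary but pre-fixed pairing of the leftover vertices.

```python
def greedy_match(order_V, order_W, accept, fallback_pairing):
    """Process the first vertex set in its prefixed order; match each vertex to
    the first still-available candidate satisfying the (placeholder) test
    `accept`, and pair the leftovers via `fallback_pairing` at the end."""
    matching = {}
    available = list(order_W)              # unmatched candidates, in order
    leftovers = []
    for v in order_V:
        for j, w in enumerate(available):
            if accept(v, w):               # minimal index satisfying (2.37)
                matching[v] = w
                available.pop(j)
                break
        else:
            leftovers.append(v)
    matching.update(fallback_pairing(leftovers, available))
    return matching
```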
We say a pair of sequences and is a good pair if
(2.38)
The success of our algorithm lies in the following proposition which states that starting from a good pair we have that correctly recovers almost all vertices.
Proposition 2.3.
For a pair , define if . If is a good pair, then with probability , we have
2.5 From almost exact matching to exact matching
In this subsection, we employ a seeded matching algorithm [1] (see also [39, 55]) to enhance an almost exact matching (which we denote as in what follows) to an exact matching. Our matching algorithm is a simplified version of [1, Algorithm 4].
Algorithm 1 Seeded Matching Algorithm
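To convey the idea behind Algorithm 1, the following is a schematic seeded matching step in the spirit of [1, Algorithm 4]: repeatedly match an unmatched pair whenever it has many neighbors that are already matched to each other. The threshold and the iteration rule below are illustrative and may differ from the actual Algorithm 1.

```python
def seeded_matching(adj_A, adj_B, seeds, threshold):
    """Schematic seeded matching in the spirit of [1, Algorithm 4]: starting
    from the seed matching, repeatedly match an unmatched pair (u, w) that has
    at least `threshold` pairs of neighbors already matched to each other.
    The threshold and stopping rule are illustrative placeholders."""
    matching = dict(seeds)
    matched_B = set(matching.values())
    changed = True
    while changed:
        changed = False
        for u in adj_A:
            if u in matching:
                continue
            for w in adj_B:
                if w in matched_B:
                    continue
                witnesses = sum(1 for x in adj_A[u]
                                if matching.get(x) in adj_B[w])
                if witnesses >= threshold:
                    matching[u] = w
                    matched_B.add(w)
                    changed = True
                    break
    return matching
```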
At this point, we can run Algorithm 2.5 for each (which serves as the input ), and obtain the corresponding refined matching (which is the output ). By [1, Lemma 4.2] and Proposition 2.3, we see that with probability (note that [1, Lemma 4.2] applies to an adversarially chosen input as long as agrees with on fraction of vertices). Finally, we set
(2.39)
Combined with [52, Theorem 4], it yields the following theorem.
Theorem 2.4.
With probability , we have .
2.6 Formal description of the algorithm
We are now ready to present our algorithm formally.
Algorithm 2 Random Graph Matching Algorithm
2.7 Running time analysis
In this subsection, we analyze the running time for Algorithm 2.6.
Proposition 2.5.
The running time for computing each is . Furthermore, the running time for Algorithm 2.6 is .
Proof.
We first prove the first claim. For each , it takes time to compute all and [17, Proposition 2.13] can be easily adapted to show that computing based on the initialization takes time . In addition, it is easy to see that Algorithm 2.5 runs in time . Altogether, this yields the claim.
We now prove the second claim. Since , the running time for computing all is . In addition, finding from takes time. So the total running time is . ∎
3 Analysis of the matching algorithm
The main goal of this section is to prove Proposition 2.3.
3.1 Outline of the proof
We fix a good pair . As in [17], the basic intuition is that each pair of carries signal of strength at least , and thus the total signal strength of all pairs will grow in (recall (2.36)). A natural attempt is to prove this via induction, for which a key challenge is to control correlations among different iterative steps. As a related challenge, we also need to show that the signals carried by different pairs are essentially non-repetitive. To this end, we will (more or less) follow [17] and propose the following admissible conditions on and we hope that this will allow us to verify these admissible conditions by induction. Define the targeted approximation error at time by
(3.1)
Since and , we have
(3.2)
Definition 3.1.
For and a collection of pairs with and , we say is -admissible if the following hold:
(i.) for ;
(ii.) for ;
(iii.) and for ;
(iv.) for and ;
(v.) for and ;
(vi.) for and ;
(vii.) for , and ;
(viii.) for , and ;
(ix.) for , and ;
(x.) for , and .
Here and are defined previously in Section 2.
Proofs for [17, (3.3)-(3.8)] can be easily adapted (with no essential change) to show that under the assumption of admissibility, the matrices and concentrate around and respectively with error , and have entries bounded by . For notational convenience, define
(3.3)
As hinted earlier, the main difficulty in verifying inductively is the complicated dependency among the iteration steps. In [17], much effort was dedicated to this issue even with the help of Gaussian properties. In the present work, this is much harder than in [17] due to the lack of Gaussian structure (for instance, a crucial Gaussian property is that, conditioned on linear statistics, a Gaussian process remains Gaussian). To this end, our rough intuition is to compare our process with a Gaussian process whenever possible, and (as we will see) a major challenge arises in the cases where such a comparison is out of control.
We first consider the initialization. Define
(3.4)
to be the set of vertices that we have explored for initialization either directly or indirectly (e.g., through correlation of the latent matching). We further denote . Define
(note that is measurable with respect to ). We have is measurable with respect to . We will show that conditioning on a realization of will not affect the degree of “most” vertices, thus verifying the concentration of inductively. This will eventually yield that holds with probability , as incorporated in Section 3.3.
We now consider the iterations. When comparing our process to the case of Wigner matrices, a key challenge is that at each time we can only control the behavior (i.e., show they are close to “Gaussian”) of all but a vanishing fraction of . This set of uncontrollable vertices will be inductively defined as in (3.15). However, the algorithm forces us to deal with the behavior of conditioned on all (since the algorithm has explored all these variables), and thus we also need to control the influence from these “bad” vertices. To address this problem, we will separate into two parts, one involving the bad vertices and one not. To this end, for and for , we decompose into a sum of two terms with (and similarly for the mathsf version), where
(3.5)
Further, we define to be the -field generated by
(3.6)
Then we see that is fixed under for and , and for in a sense conditioned on we may view and as the “biased” part and the “free” part, respectively. We set , and inductively define
(3.7)
and then define to be the collection of vertices that are overly biased by the set (see (3.15)). Accordingly, we will give up on controlling the behavior for vertices in .
Now we turn to the term . We will try to argue that this “free” part behaves like a suitably chosen Gaussian process via the technique of density comparison, as in Section 3.4. To this end, we sample a pair of Gaussian matrices with zero diagonal entries such that their off-diagonal entries form a family of Gaussian variables with mean 0 and variance , and the only non-zero covariance occurs on pairs of the form or (for ) where (this is analogous to the process defined in [17, Section 2.1]). In addition, we sample i.i.d. standard normal variables for . (We emphasize that we will sample only once and then stick to it throughout the analysis.) We can then define the following “Gaussian substitution”, where we replace each with for each : for , define to be a -dimensional vector whose -th entry is given by
(3.8)
and we define similarly. We will use a delicate Lindeberg’s interpolation argument to bound the ratio of the densities before and after the substitution. To be more precise, we will process each pair sequentially (in an arbitrarily prefixed order) where we replace by . Define this operation as , and the key is to bound the change of density ratio for each operation. To this end, list as in the aforementioned prefixed order, and define a corresponding operation which replaces by . For and for any random variable , define
(3.9)
where is the composition of the first operations and is the composition of the first operations . For notational convenience, we simply define to be the identity map. We will see that a crucial point of our argument is to employ suitable truncations; such truncations will be useful when establishing Lemma 3.15. To this end, we define to be the collection of vertices such that for some
(3.10)
Let . We then define
Here is defined in (3.14) below and later we will explain that this definition is valid (although we have not defined yet). For , we inductively define to be the collection of vertices such that for some
(3.11)
define , and define
Also we similarly define (using similar analogy for ). Having completed this inductive definition, we finally write . We will argue that after removing the remaining random variables have smoothed density whose change in each substitution can be bounded, thereby verifying that their original joint density is not too far away from that of a Gaussian process by a delicate Lindeberg’s argument. The details of such Lindeberg’s argument are incorporated in Section 3.4.
Thanks to the above discussions, we have essentially reduced the problem to analyzing the corresponding Gaussian process (not exactly yet since we still need to consider yet another type of bad vertices arising from Gaussian approximation as in (3.14) below). To this end, we will employ the techniques of Gaussian projection. Define where
(3.12)
We will condition on (see (3.51) below)
(3.13)
and thus is viewed as a Gaussian process. We can then obtain the conditional distribution of given (see Remark 3.22). In particular, we will show that the projection of onto has the form
Here , and is the conditional covariance matrix of given (3.13). In addition, is the conditional covariance between and , and is that between and . See (3.60) and Remark 3.22 for precise definitions of and . Furthermore, we define
and define the mathsf version similarly. We let be the collection of such that
(3.14)
and let ; we will see that removing the vertices in is crucial for establishing Lemma 3.26. Since , and are all measurable with respect to , we could have defined before defining as in (3.11); we chose to postpone the definition of until now since it is only natural to introduce it after writing down the form of the Gaussian projection. Finally, we are ready to complete the inductive definition for the “bad” set as follows:
(3.15)
We summarize in Figure 1 the logic flow for defining from and variables in , which should illustrate clearly that our definitions are not “cyclic”.

Recall (3.4) and recall . Let , and for let be the event such that for
(3.16)
By Lemma 2.1 and the fact that , we have on the event . Our hope is that, on the one hand, occurs typically (as stated in Proposition 3.2 below) so that the number of “bad” vertices is under control, and on the other hand on most vertices can be dealt with by techniques of Gaussian projection, which then allows us to control the conditional distribution.
Proposition 3.2.
We have .
3.2 Preliminaries on probability and linear algebra
In this subsection we collect some standard lemmas on probability and linear algebra that will be useful for our further analysis.
The following version of Bernstein’s inequality appeared in [20, Theorem 1.4].
Lemma 3.3.
(Bernstein’s inequality). Let , where ’s are independent random variables such that almost surely. Then, for we have
where is the variance of .
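For concreteness, a standard form of Bernstein's inequality is the following; the version in [20, Theorem 1.4] may differ in the exact constants.

```latex
% Standard Bernstein inequality: X = X_1 + ... + X_n with the X_i independent
% and |X_i| <= b almost surely; sigma^2 = Var(X). Constants may differ from
% the version cited as [20, Theorem 1.4].
\[
  \mathbb{P}\bigl(|X - \mathbb{E}X| \ge t\bigr)
  \;\le\; 2\exp\!\Bigl(-\frac{t^{2}}{2\sigma^{2} + \tfrac{2}{3}\,b\,t}\Bigr)
  \qquad \text{for all } t > 0.
\]
```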
We will also use the Hanson-Wright inequality (see [32, 51, 25] and [46, Theorem 1.1]), which is useful for controlling quadratic forms of sub-Gaussian random variables.
Lemma 3.4.
(Hanson-Wright Inequality). Let be a random vector with independent components which satisfy and for all . If is an symmetric matrix, then for an absolute constant we have
(3.17)
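For concreteness, the Hanson-Wright inequality in the form of [46, Theorem 1.1] reads as follows, with the Hilbert-Schmidt and operator norms as introduced in Section 1.3.

```latex
% Hanson-Wright inequality in the form of [46, Theorem 1.1]:
% X = (X_1, ..., X_n) has independent mean-zero components with
% ||X_i||_{psi_2} <= K, A is an n x n matrix, and c > 0 is an absolute constant.
\[
  \mathbb{P}\Bigl(\bigl|X^{\top} A X - \mathbb{E}\,X^{\top} A X\bigr| > t\Bigr)
  \;\le\; 2\exp\!\Bigl(-c \min\Bigl(\frac{t^{2}}{K^{4}\|A\|_{\mathrm{HS}}^{2}},
  \frac{t}{K^{2}\|A\|_{\mathrm{op}}}\Bigr)\Bigr)
  \qquad \text{for all } t \ge 0.
\]
```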
The following is a corollary of Hanson-Wright inequality (see [17, Lemma 3.8]).
Lemma 3.5.
Let be mean-zero variables with for all . In addition, assume for all that is independent of where is obtained from by dropping its -th component (and similarly for ). Let be an matrix with diagonal entries being 0. Then for an absolute constant and for every
The following inequality is standard for posterior estimation.
Lemma 3.6.
For a random variable and an event in the same probability space, we have for .
Proof.
We have that
completing the proof of this lemma. ∎
We also need some results in linear algebra.
Lemma 3.7.
For two matrices , if and the entries of are bounded by , then .
Proof.
It suffices to show that for any . First we consider the case when . Since , we get . Next we consider the case when . In this case . Thus, and . Therefore, , as desired. ∎
Lemma 3.8.
For an matrix , suppose that there exist two partitions and with (for ) such that for and , and that . Then we have .
Proof.
Denote and . Define such that for and that otherwise. Then, there exist two permutation matrices such that
where is a matrix with size and with entries bounded by . Thus . Also we have , and thus the result follows from the triangle inequality. ∎
3.3 Analysis of initialization
In this subsection we analyze the initialization. We will prove a concentration result for for in Lemma 3.13, which then implies that as in Lemma 3.14. As preparation, we first collect a few technical estimates on binomial variables.
Lemma 3.9.
For and , we have
Proof.
The left hand side equals . Since , a straightforward computation yields that for any fixed . Since , we also have that . Thus,
which yields the desired bound. ∎
Corollary 3.10.
For and , we have
Proof.
Let and be such that is independent of . Then the left hand side (in the corollary statement) equals
where the last inequality follows from Lemma 3.9. This gives the desired bound. ∎
Lemma 3.11.
For all and we have
Proof.
Writing , we have that the left hand side is bounded by . Applying Chebyshev’s inequality, we have . In addition, we have
(3.18)
where . Here the last inequality above can be verified as follows: if , then (and thus the bound holds); if , then by Stirling’s formula we have , as desired. ∎
Corollary 3.12.
For and , we have
Proof.
Let , , and be independent pairs of variables. Let and . Then the difference of probabilities in the statement is equal to
where the last transition follows from Chebyshev’s inequality and (3.3). ∎
Recall (2.7). Define the targeted approximation error in the -th iteration of the initialization by
(3.19)
Lemma 3.13.
The following hold with probability for all :
(1) for ;
(2) for ;
(3) for .
Proof.
The proof is by induction on . The base case for is trivial. Now suppose that Items (1), (2) and (3) hold up to some and we wish to prove that (1), (2) and (3) hold with probability for . To this end, applying (2.4) and Lemma 2.1 we have . Recall the definition of as in (3.4), which records the collection of vertices explored by our algorithm. By the induction hypothesis we know
(3.20)
where the last transition follows from Lemma 2.1. Thus, it suffices to control the concentration of in order to control that for . Note that
where by Lemma 2.1. Since the indicators in the above sum are measurable with respect to , we see that conditioned on a realization of we have that is a collection of i.i.d. Bernoulli random variables with parameter given by
By the induction hypothesis, we have . Combined with Lemma 3.9, it yields that
(3.21)
Thus, we may apply Lemma 3.3 and get that
(3.22)
Similar results hold for . We now move to Item (2). Similarly, conditioned on a realization of , we have that
is a (normalized) sum of i.i.d. Bernoulli random variables with parameter given by
where . By the induction hypothesis we have . In addition, by Lemma 2.1 we have and thus . Combined with Corollary 3.10, these yield that
Applying Lemma 3.3 again, we get that
(3.23)
The terms can be bounded in the same way. We next turn to Item (3). In order to bound , note that
Given a realization of , we have that is a collection of i.i.d. Bernoulli variables, with parameter (by Corollary 3.10 and the induction hypothesis again)
where . By Lemma 3.3 again, we get that
(3.24)
Combining (3.22), (3.23), and (3.24) and applying a union bound, we see that (assuming (1), (2) and (3) hold for ) the conditional probability for (1), (2) or (3) to fail for is at most since . Therefore, we complete the proof of Lemma 3.13 by induction (note that ). ∎
Lemma 3.14.
We have .
Proof.
By Lemma 3.13, with probability we have
Here the last inequality can be derived as follows: from Lemma 2.1, we have either or . In addition, when we have (recall (2.9))
And when we have
Provided with the preceding bound on , it suffices to analyze , whose conditional distribution given is the same as the sum of i.i.d. Bernoulli variables with parameter given by
By Lemma 3.13 again, we have with probability . Provided with this and applying Lemma 3.11 (with ), we get that
where we used (3.1) and the inequalities (by Lemma 2.1) as well as . By Lemma 3.3,
(3.25)
We can obtain the concentration for , , and similarly. For instance, for , we note that given ,
is a sum of i.i.d. Bernoulli variables with parameter given by
By Lemma 3.13, we have with probability . Provided with this and applying Corollary 3.12 again (with ), we get that . Thus we can obtain a similar concentration bound using Lemma 3.3. We omit further details due to similarity.
3.4 Density comparison
Our proof of the admissibility along the iteration relies on a direct comparison of the smoothed Bernoulli density and the Gaussian density, which then allows us to use the techniques developed in [17] for correlated Gaussian Wigner matrices. Recall (3.6), (3.8) and (3.12). Our main result in this subsection is Lemma 3.16 below, and we need to introduce more notations before its statement.
For a random variable and a -field , we denote by the conditional density of given . For a realization of and a realization of , we define vector-valued functions for and , where for the -th component is given by
(3.27)
Similarly, we define where for the -th component is given by
(3.28)
In addition, define to be the corresponding realization of , i.e., is the collection of vertices satisfying either of (3.7), (3.10), (3.11) and (3.14) with replaced by and replaced by . Define a random vector by
where , and when , and when . Define similarly by replacing with and replacing with . Let be the vector obtained from by keeping its coordinates with , and let be the vector obtained from by keeping its coordinates with . Define and with respect to similarly. We also define and . Also, define , and define . Denote by the realization of . For any fixed realization , we further define to be the conditional density as follows:
where the support of is consistent with the choice of (i.e., is a legitimate realization for ). Define similarly but with respect to . For the purpose of truncation later, we say a realization for is an amenable set-realization, if
(3.29)
Also, we say is an amenable bias-realization with respect to , if
(3.30)
In addition, we say a realization for is an amenable variable-realization with respect to and , if it is consistent with the choice of and , and (below the vector is obtained by keeping the coordinates of with )
(3.31)
(3.32)
and (3.33)
Lemma 3.15.
On the event , with probability we have that
• sampled according to is an amenable set-realization;
• sampled according to is an amenable bias-realization with respect to ;
• sampled according to is an amenable variable-realization with respect to and .
Proof.
We first consider . On we have . Note that are two subsets of and each has at most possible values. Thus,
Given an amenable set-realization , we now consider . By Lemma 3.6,
Finally we consider . Combining Markov’s inequality and (below we write )
we get that conditioned on , the (random) realization for satisfies (3.33) with probability at least . Similarly, the realization satisfies (3.32) with probability at least . In addition, for any and , recalling that is the realization for and using , (3.10) and (3.11), we derive from the triangle inequality that
(3.34)
where denotes the (first) expression (3.27) but the summation is taken for all , denotes the expression (3.27) but the summation is taken for all , and denotes the expression (3.27) but the summation is taken for all . We have similarly. Since , for choosing in (3.10) gives that
(3.35)
Combining (3.34) and (3.35) we can show that implies that . Altogether, we complete the proof of the lemma. ∎
Lemma 3.16.
For , on the event , fix an amenable set-realization , fix an amenable bias-realization with respect to , and fix an amenable variable-realization with respect to and . Then we have (below the vector is defined by keeping the coordinates of such that )
(3.36)
Remark 3.17.
Since is measurable with respect to and thus is independent of , we have that . That is, we may add conditioning on in the denominator of (3.36), but this will not change its conditional density.
The key to the proof of Lemma 3.16 is the following bound on the “joint” density.
Lemma 3.18.
For , on the event , for an amenable set-realization , an amenable bias-realization with respect to and an amenable variable-realization with respect to and , we have
(3.37)
The rest of this subsection is devoted to the proof of Lemma 3.18. Due to similarity, we only prove it for . Also, since is an amenable variable-realization, the lower bound is obvious and as a result we only need to prove the upper bound. Note that is a function of , and that is independent of and of . Since for any independent random vectors and any function
(3.38)
we can then apply (3.38) and get that (note that the forms of are different in the equality below)
(3.39)
where . For an amenable set-realization (for convenience we will drop from notations in what follows), recall (3.9) and define from via the procedure in (3.9) (and similarly define from ). For , let be the event that
(3.40)
Recalling Remark 3.17, we get from (3.39) that
Note that . Applying (3.38) with , and , we get that
Thus, for an amenable set-realization and an amenable bias-realization ,
Therefore, it remains to show that for an amenable variable-realization
(3.41)
For a random variable , define to be the density of on the event , i.e., for any . From the definition we see that is increasing in (and thus is increasing in ). Combined with the facts that and , it yields that the left hand side of (3.41) is equal to (also note that )
(3.42)
Since , we can conclude the proof of Lemma 3.18 by combining Lemma 3.19 below.
Lemma 3.19.
For an amenable set-realization and an amenable variable-realization , we have for all
The proof of Lemma 3.19 requires a couple of results on the Gaussian-smoothed density.
Lemma 3.20.
For and , let be a random vector such that for , and let be sub-Gaussian random variables independent of such that and for any . Define such that for and for (and define similarly). Then, for any positive definite matrix ,
(3.43)
Proof.
For write , where for and for . Then (3.43) is equal to . This motivates the consideration of a high-dimensional Taylor expansion. Define , and . Then,
where the remainder is bounded by
(3.44)
Since are random variables measurable with respect to , we get
(3.45)
Define to be the ’th standard basis in , and define from similar to defining from . Since
which is bounded by if for all , we have that for distinct
where arises since we split into the product of three copies of . Since for all , we have that
Combining the preceding two inequalities with
we get that . Similarly, we have
Therefore, . Combined with (3.44) and (3.45), it yields that (3.43) is bounded by
completing the proof since is independent of . ∎
The next lemma will be useful in bounding the density change when locally replacing Bernoulli variables by Gaussian variables.
Lemma 3.21.
For , let be a normal vector, and let be a sub-Gaussian vector independent of such that . Let , be random variables independent of such that all have the same law as , and . Also, suppose that and have the same covariance matrix. For any with -norms at most , define such that for and for ; we similarly define . Then for all , the joint densities of and satisfy that
Proof.
We have that , where is the expectation by averaging over . Writing and writing , we then have
(3.46)
where and
(3.47)
Here the equality in (3.46) holds since one may simply cancel out the denominator in with the factor in . Thus, it suffices to bound and separately.
We first bound . Applying Lemma 3.20 with and using and we get that
For notational convenience, we denote by and the two expectations in the preceding inequality. Since , we have and . Thus,
(3.48)
Since is a sub-Gaussian variable with sub-Gaussian norm at most , we see that . Consequently,
Combined with (3.48), it yields that
(3.49)
We next bound . Applying Jensen’s inequality, we get that
where the second inequality uses independence and the last inequality uses . Thus, . Combined with (3.49), this yields the desired bound. ∎
By Lemma 3.21, we need to employ suitable truncations in order to control the density ratio; this is why we defined as in (3.11). We now prove Lemma 3.19.
Proof of Lemma 3.19.
Fix . We now set the framework for applying Lemma 3.21. Recall (the third equality of) (3.9). Define
Let , and let
Thus we see that is the coefficient of in . Similarly denote by the coefficient of in , denote by the coefficient of in , and denote by the coefficient of in . Further, define and . Then we have , and on we have that . Also, we have since is an amenable variable-realization. Thus, by and the Cauchy-Schwarz inequality,
Finally, let be the covariance matrix of . Since is the covariance matrix of , we have . Also, and
Thus, we may apply Lemma 3.8 with
and derive that . Furthermore, we can bound by Lemma 3.7 as follows. Set (in Lemma 3.7) to be a matrix with and all other entries being 0, and set . Then , the entries of are bounded by , and is a block-diagonal matrix where each block is the covariance matrix of
So is positive semi-definite and thus . Since also the dimension of is bounded by , we have . Thus, we have that . Therefore, we can apply Lemma 3.7 with and and obtain that . Applying Lemma 3.21 and using the independence between and , we get that
3.5 Gaussian analysis
Since
(3.50)
when analyzing the process defined by (3.8) it would be convenient to and thus we will
(3.51)
As such, the following process defined by (3.8) can be viewed as a Gaussian process:
Note that our convention here is consistent with (3.36) (see also Remark 3.17). Recall that is the -field generated by (see (3.12)), which is slightly different from the above process since in we have . We will study the conditional law of given . A plausible approach is to apply techniques of Gaussian projection developed in [17] (see also e.g. [3] for important development on problems related to a single random matrix). To this end, we define the operation as follows: for any function (of the form ), define
(3.52)
Our definition of appears to be simpler than that in [17] thanks to (3.50). We emphasize that the two definitions are in fact identical, and the simplicity of the expression in (3.52) is due to the fact that we have already introduced an independent copy of the Gaussian process for the purpose of applying Lindeberg’s argument. Further, when calculating with respect to and , we regard and as the vector-valued functions defined in (3.27) and (3.28). For definiteness, we list the variables in in the following order: first we list all indexed by in the dictionary order, and then we list all indexed by in the dictionary order. Since are i.i.d. standard Gaussian variables, it suffices to calculate correlations between variables in the collection . Under this ordering, on the event we have for all
since is a collection of independent variables (and similarly for ). Also, for all (recall that for and ), on the event we have
where the last transition follows from (3.16). Similarly we have (again on )
Thus, on we have
(3.53)
(3.54)
(3.55) |
In addition, we have
(3.56)
where in the second equality above we also used that the entries of are bounded by when , and concentrate around respectively with error . In addition, we have that
(3.57)
(3.58)
(3.59) |
Recall that consists of variables in (3.12) where is a collection of standard Gaussian variables independent of . Therefore, we may write the -correlation matrix of as , such that the following hold:
• have diagonal entries in ;
• for each fixed , there are at most non-zero non-diagonal entries (and also the same for ) and these entries are all bounded by (this fact implies that );
• is the matrix with rows indexed by for and columns indexed by for , and with entries given by .
Thus, by [17, Lemma 3.10] we have
(3.60)
Here and are all dimensional vectors; and are indexed by triple with in the dictionary order; and are indexed by triple with in the dictionary order. Also and can be divided into sub-vectors as follows:
where and are dimensional vectors indexed by and , respectively. In addition, their entries are given by
Remark 3.22.
In conclusion, we have shown that
(3.61) | |||
(3.62) |
In the above is given by
(3.63)
In addition, is given by , where
is a linear combination of Gaussian variables with coefficients fixed (recall (3.51)), and is an independent copy of (and similarly for ). For notational convenience, we denote (3.61) as , which is a vector with entries given by (the analogue of below for the mathsf version also holds)
We also denote (3.62) as . We will further use the notation (note that there is no tilde here) to denote the projection obtained from by replacing each with therein (and similarly for ).
Denote . We next control norms of these matrices.
Lemma 3.23.
On the event , we have if .
Lemma 3.24.
On the event , we have .
Similar versions of Lemmas 3.23 and 3.24 were proved in [17, Lemmas 3.13 and 3.15], and the proofs can be adapted easily. Indeed, by proofs in [17], in order to bound the operator norm it suffices to use the fact that . Also, in order to bound the -norm, it suffices to show that the operator norm is bounded by a constant and that when . All of these can be easily checked and thus we omit further details.
Lemma 3.25.
On the event , we have
Proof.
Recall (3.63). By (3.53), (3.54), (3.55) and (3.56), we get for and that ; similar results also hold for . In addition, for we have that is equal to
and a similar bound applies to with and . Combined with Items (i) and (iv) in Definition 3.1 for , it yields that and .
We next bound . Applying Lemma 3.8 by setting , and , we can then derive that . ∎
Remark 3.22 provides an explicit expression for the conditional law of of . However, the projection is not easy to deal with since the expression of every variable in relies on (even for those indexed with ). A more tractable “projection” is the projection of onto for where
(3.64)
We can similarly show that (recall that the conditional expectation is the same as the projection) has the form
(3.65)
where , is the projection of onto , is defined by
(3.66)
and is defined to be the covariance matrix of . Adapting proofs of Lemmas 3.23, 3.24 and 3.25, we can show that under , we have and .
Similarly, we denote by a vector obtained from replacing the substituted Gaussian entries with the original Bernoulli variables in the projection (i.e., on the right hand side of (3.65)). The projection is more tractable than (recall its definition in Remark 3.22) since is measurable with respect to (recall (3.6)) and thus we can use induction to control . Another advantage is that the matrix is measurable with respect to .
Lemma 3.26.
On the event , we have
And similar result holds for .
Proof.
By the triangle inequality, the left hand side in the lemma-statement is given by
(3.67)
(3.68) |
By (3.14), we have
It remains to bound (3.68). Note that for and and
As in the proof of Lemma 3.25, we can choose , and (since we are on the event ). We can then apply Lemma 3.8 and get that . Again by applying Lemma 3.8 for such and , we get . Thus,
Since in addition have operator norms bounded by , we get that
This implies that , as required. ∎
3.6 Proof of Proposition 3.2
In this subsection, we prove Proposition 3.2 by induction on . Recall the definition of (see (3.3)) and (see (3.16)). For a given realization for , define vectors and where and for . We recall and define as follows:
where , and when , as well as when . In what follows, we will use and to denote realizations for and , respectively. In addition, we define a mean-zero Gaussian process
(3.69)
where each variable has variance 1 and the only non-zero covariance is given by
We defined (3.69) since eventually we will show that this is a good approximation for our actual process (see our definition of below). For , we also define
and we make analogous definitions for the mathsf version for . For , we define
Note that holds obviously. For notational convenience in the induction, we will also denote by and as the whole space. Also, by Lemma 3.13 holds with probability since . In addition, we have by Lemma 3.14. With these clarified, our inductive proof consists of the following steps:
Step 1. If holds for , then holds with probability ;
Step 2. If hold for , then holds with probability ;
Step 3. If holds for , then holds with probability ;
Step 4. If hold for , then holds with probability ;
Step 5. If hold for , then holds with probability .
3.6.1 Step 1:
In what follows, we assume holds without further notice. As we will see, the philosophy of our proof throughout this subsection is to first consider an arbitrarily fixed realization for, e.g., and , and then prove a bound on the tail probability of some “bad” event. We emphasize that we will not compute the conditional probability, as it would be difficult to implement; instead we will compute the probability (which we denote as ) by simply treating and as deterministic objects. Formally, we define the operation as follows: for any function (of the form ) and any realization for and , define
Then the operator is defined such that
Note that this definition of is consistent with that in [17]. It is also consistent with (3.52), except that for the special case considered in (3.52) simplifications were applied thanks to (3.50).
Provided with , we can now precisely define for any event . After bounding the -probability, we apply a union bound over all possible realizations, which then justifies that the bad event indeed typically does not occur; this union bound is necessary precisely because what we have computed earlier is not the conditional probability. The key to our success is that the tail probability is small enough to afford a union bound.
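In schematic form, and only as a mnemonic for the bookkeeping (with the indexing of realizations suppressed, and writing here \(\mathbb{P}^{\otimes}\) for the probability computed by treating the fixed realizations as deterministic, as described above; this symbol is our shorthand for this discussion), the scheme of this subsection is
\[
\mathbb{P}(\text{bad event})\;\le\;\sum_{\text{admissible realizations}}\mathbb{P}^{\otimes}(\text{bad event})\;\le\;\#\{\text{admissible realizations}\}\cdot\max_{\text{realizations}}\mathbb{P}^{\otimes}(\text{bad event}),
\]
so each \(\mathbb{P}^{\otimes}\)-tail bound below is required to beat the enumeration of admissible realizations; the rigorous version is carried out lemma by lemma in what follows.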
Lemma 3.27.
We have
Proof.
Recall the definition of as in (3.7). We first consider an arbitrarily fixed realization of and . Since the events over are independent of each other and by Lemma 3.3 each such event occurs with probability at most , we can then apply Lemma 3.3 (again) and derive that
Clearly, a similar estimate holds for . We next apply a union bound over all admissible realizations for . Since we need to choose at most subsets of (or ), the enumeration is bounded by . Therefore, applying a union bound over all these realizations and over , we obtain the desired estimate by recalling that . ∎
Lemma 3.28.
We have where
Proof.
Recall (3.14). For each fixed admissible realization of and a realization of , we have that the matrices and (and thus the matrix ) are fixed. Define a vector such that and for for each . Then, we have
(3.70)
or we have a version of (3.70) with replaced by . Without loss of generality, in what follows we assume that (3.70) holds. For , we have that
So the covariance matrix of (which we denote by ) is a block-diagonal matrix with each block of dimension at most and with entries bounded by . As a result, . Thus, regarding as a deterministic vector, we have that is a linear combination of , with variance given by
In addition, the coefficient of each can be bounded as follows: for denoting the coefficient of in , we have for and we have . Combined with Lemmas 3.24 and 3.25, this yields that the coefficient of in the linear combination satisfies that
Thus, recalling (3.70), we can apply Lemma 3.3 to each realization of and derive that (noting that on we have )
which is bounded by . In the above display, the factor counts the enumeration of possible realizations of . At this point, we apply a union bound over all possible realizations of and (whose enumeration is again bounded by ), completing the proof of the lemma. ∎
In the next few lemmas, we control .
Lemma 3.29.
With probability we have .
Proof.
Recall (3.10). For each fixed realization of and and for each we can apply Lemma 3.3 and obtain that
Also . Then applying a union bound over , we get that
where in the last inequality we use the bound of in Lemma 2.1. Under the -measure, the events over are independent of each other. Thus, another application of Lemma 3.3 yields that
which is bounded by . Now, a union bound over all possible realizations (whose enumeration is at most ) and over (as well as for the mathsf version) completes the proof. ∎
Lemma 3.30.
With probability we have
(3.71)
(3.72) |
Proof.
We will prove (3.71); the proof of (3.72) is similar. Recall (3.11). For each fixed realization , , we apply Lemma 3.3 and obtain that
Since under the -measure we have independence among for different , we can then apply Lemma 3.3 again and get that
Then a union bound over all possible realizations (whose enumeration is bounded by ) completes the proof. ∎
We may assume that all the typical events described in Lemmas 3.27, 3.28, 3.29 and 3.30 hold (note that this occurs with probability ). Then, we see that (by (3.72)). In addition, we have that
(3.73)
This (together with events in Lemmas 3.27 and 3.28) implies that
Combined with the induction hypothesis , this yields that
(3.74)
3.6.2 Step 2:
Before controlling , we prove a couple of lemmas as preparation.
Lemma 3.31.
For any two matrices , we have .
Proof.
Since , we have that is positive semi-definite. Thus,
which yields . Similarly we can show . ∎
Lemma 3.32.
For any , let and let be positive definite matrices. Suppose that and . Then for all
Proof.
Recalling the formula for Gaussian density, we have that
(3.75)
Let be a positive definite matrix such that . We have
(3.76)
where is the identity matrix. We next control the determinants of . Let be the eigenvalues of . Then and . Also, using and the fact that for , we have that
where the last inequality follows from Lemma 3.31. Combined with (3.75) and (3.6.2), this completes the proof of the lemma. ∎
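For the reader's convenience, we record the elementary identity behind this computation, stated in generic notation (which need not match the normalization used in Lemma 3.32): for positive definite \(d\times d\) matrices \(\Sigma_1,\Sigma_2\), the ratio of the centered Gaussian densities satisfies
\[
\frac{\varphi_{\Sigma_1}(x)}{\varphi_{\Sigma_2}(x)}
=\sqrt{\frac{\det\Sigma_2}{\det\Sigma_1}}\;
\exp\!\Big(-\tfrac12\,x^{\top}\big(\Sigma_1^{-1}-\Sigma_2^{-1}\big)x\Big),
\qquad x\in\mathbb{R}^d.
\]
Hence the proof reduces to controlling a determinant ratio and a quadratic form in the exponent, which is exactly what the eigenvalue estimates above accomplish.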
We now return to . Recall (3.36) as in Lemma 3.16. It remains to bound the density ratio between and . Recall that in Remark 3.22 we have shown
Let be the covariance matrix of the process
let be the covariance matrix of
(3.77)
and let be the covariance matrix of
Also define vectors such that for
Applying Lemma 3.32 we have
(3.78)
where the expectation is taken over . We need a few estimates on and .
Claim 3.33.
On the event , we have .
Proof.
On we know . Thus, . We can bound similarly. ∎
Claim 3.34.
We have .
Proof.
By definition, we see that is the sum of the identity matrix and a positive definite matrix, and thus we have . Similar results hold for and . ∎
Claim 3.35.
On , we have and .
Proof.
By [17, (3.12), (3.61)], which follow from standard properties of general Gaussian processes, we have
Here the coefficient does not matter since this result follows from the fact that
for a general Gaussian process . Recalling (3.62), we see that the covariance matrix of equals
Thus, we have . Combined with Lemmas 3.23 and 3.25, this yields that . In addition, applying Lemmas 3.23, 3.25 and 3.31, we get
(3.79)
Furthermore, by (3.54) and (3.57) we get that ; by (3.55) and (3.58) we get that ; by (3.56) and (3.59) we get that . Also, for , by (3.53) we have for and
Combined with Items (i) and (iv) of in Definition 3.1, this implies that . Applying Lemma 3.8 by setting , and , we can deduce that . Combined with (3.79) and the fact that , this completes the proof by the triangle inequality. ∎
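As a reminder, the standard Gaussian facts invoked here are the conditioning formulas, stated in generic notation (the specific covariance matrices are those defined in this subsection): if \((X,Y)\) is a centered jointly Gaussian vector with covariance blocks \(\Sigma_{XX},\Sigma_{XY},\Sigma_{YY}\) and \(\Sigma_{YY}\) invertible, then
\[
\mathbb{E}[X\mid Y]=\Sigma_{XY}\Sigma_{YY}^{-1}Y,
\qquad
\operatorname{Cov}(X\mid Y)=\Sigma_{XX}-\Sigma_{XY}\Sigma_{YY}^{-1}\Sigma_{YX}.
\]
In particular, the conditional expectation coincides with the linear projection onto the span of \(Y\), and the conditional covariance is the corresponding Schur complement.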
Corollary 3.36.
On the event , we have as well as .
Proof.
Corollary 3.37.
On the event we have and .
Proof.
By the definition of , we see that is a block-diagonal matrix where for we have that is a matrix with diagonal entries 1 and non-diagonal entries
Thus, . By Claim 3.35 and the triangle inequality, we get that . ∎
We are now ready to show that typically occurs. In what follows, we assume we are on the event without further notice. Formally, we abuse the notation by meaning ; we abuse notation this way (and similarly in later subsections) since it shortens the notation and the meaning should be clear from the context. Define to be the event that the realizations of and are amenable. By Lemma 3.15 we have
(3.80)
By Lemma 3.16, under we have (recall that is implied in )
Plugging Claims 3.33 and 3.34 and Corollaries 3.36 and 3.37 into (3.78), we have under
where . Altogether, we get that
Thus, to estimate the probability for it suffices to show that the preceding upper bound goes out of control only with probability . By Lemma 3.16, it suffices (as we will show) to control this probability under the measure . To this end, define
By Claims 3.33 and 3.34, we have
Since the mean is equal to the variance in this case and , we then obtain from the tail probability of the normal distribution that
(3.81)
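For concreteness, the normal tail bound we have in mind is the standard one, stated here for a generic Gaussian variable (the mean and variance entering the display above are those identified just before it): for \(Z\sim N(\mu,\sigma^2)\) and \(t\ge 0\),
\[
\mathbb{P}(Z\ge \mu+t)\;\le\;\exp\!\Big(-\frac{t^{2}}{2\sigma^{2}}\Big),
\qquad
\mathbb{P}\big(|Z-\mu|\ge t\big)\;\le\;2\exp\!\Big(-\frac{t^{2}}{2\sigma^{2}}\Big).
\]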
Next, we consider . On the event , we have
Recalling the definitions of , we have that is a mean-zero Gaussian vector. This motivates us to write
By Claim 3.33 and Corollary 3.36, is a Gaussian variable with mean 0 and variance . Then,
(3.82)
where and
It remains to bound . To this end, we see that there exist a linear transform and a standard normal random vector such that
and (so in particular ). Thus,
is a quadratic form of a standard Gaussian vector. We also have the following estimate:
where the second inequality follows from Corollaries 3.36 and 3.37. In addition, we have
where the first inequality follows from Lemma 3.31 and the second inequality follows from Corollaries 3.36 and 3.37. We can then apply Lemma 3.4 and obtain that
Plugging the estimates of into (3.82), we get that the left hand side of (3.82) is bounded by . Combined with (3.81) and Lemma 3.16, this yields that
(3.83)
Combined with (3.80), this yields that (by recalling (3.1) and noting that )
(3.84)
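As an aside on the quadratic-form step above: the tail bound we have in mind is of Hanson–Wright type (cf. [32, 46]); we state its standard form here only as a reference point, since the precise constants in Lemma 3.4 may differ. For a standard Gaussian vector \(g\) and a deterministic square matrix \(A\), there is an absolute constant \(c>0\) such that
\[
\mathbb{P}\Big(\big|g^{\top}Ag-\mathbb{E}\,g^{\top}Ag\big|\ge t\Big)
\;\le\;2\exp\!\Big(-c\min\Big\{\frac{t^{2}}{\|A\|_{F}^{2}},\,\frac{t}{\|A\|_{\mathrm{op}}}\Big\}\Big),
\qquad t\ge 0,
\]
which is why norm estimates for the relevant matrix are collected right before Lemma 3.4 is applied.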
3.6.3 Step 3:
It is straightforward to bound the probability for on the event . Indeed,
where the latter probability is bounded by using a Chernoff bound. Thus, applying a union bound over we have
(3.85)
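The Chernoff bound used here is the standard multiplicative bound for sums of independent Bernoulli variables; we record it in generic notation as a reminder. If \(X=\sum_i X_i\) with the \(X_i\in\{0,1\}\) independent and \(\mu=\mathbb{E}X\), then
\[
\mathbb{P}\big(X\ge(1+\delta)\mu\big)\;\le\;\exp\!\Big(-\frac{\delta^{2}\mu}{2+\delta}\Big)\quad(\delta\ge 0),
\qquad
\mathbb{P}\big(X\le(1-\delta)\mu\big)\;\le\;\exp\!\Big(-\frac{\delta^{2}\mu}{2}\Big)\quad(0<\delta<1).
\]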
3.6.4 Step 4:
Recall Definition 3.1 and (3.3). The goal of this subsection is to prove
(3.86)
To this end, we will verify Conditions (i.)–(x.) in Definition 3.1. Since (i.), (ii.) and (iii.) are controlled by (3.26), we focus on the other conditions. In what follows, we always assume that and hold. Crucially, thanks to , we will reduce our analysis of events on under the conditioning of and to the same events on (recalling (3.77)); the latter are much easier to estimate. To be more precise, note that
(3.87)
For , by (3.7) we have , and as a result . Thus, under the conditioning of and we have
Therefore, the conditional probability of (3.87) can be bounded by the conditional probability of the right hand side in the preceding inequality under the conditioning of and . We will tilt the measure to the same event on (as explained earlier), and on we know this tilting loses at most a factor of . Also on we have . Therefore, the conditional probability of (3.87) is bounded by the following probability up to a factor of :
(3.88)
Since is a collection of i.i.d. standard normal variables, we have , and thus
Similarly, a lower deviation for can be derived, completing the verification of (iv.). The bounds on , and (which correspond to (iv.), (v.) and (vi.), respectively) can be proved similarly.
Furthermore, we bound (which correspond to (vii.), (viii.), (ix.) and (x.), respectively). Note that under , we have that the ’s are fixed subsets for . In addition, on the event we have . Thus,
Since on the event , the above can be bounded similarly to that for . The same applies to the other two items here. We omit further details since the modifications are minor.
Putting all of the above together, we complete the proof of (3.86).
3.6.5 Step 5:
We assume that hold throughout this subsection without further notice. Thus, we have
where the first inequality follows from Lemma 3.26 and the second inequality relies on our assumption that holds. In light of this, to show it suffices to bound for each the conditional probability given and of the event
(3.89)
For notational convenience, we will write and as and in the rest of this subsection. In order to bound (3.89), we expand the matrix product in (3.65) into a summation as follows (we write and for below):
(3.90)
where the summation is taken over . So we know that is bounded up to a factor of by the maximum of
where the maximum is taken over . Thus, it suffices to bound each term in the maximum. For simplicity, we only demonstrate how to bound terms of the form:
(3.91)
Lemma 3.38.
We have for all
Since the bounds for the other terms are similar, by Lemma 3.38 and a union bound we get that . Applying a union bound then yields that
(3.92)
Proof of Lemma 3.38.
Recall (3.66). We first divide the summation in into three parts , where accounts for the summation over and can be written as
and accounts for the summation over and can be written as
and accounts for the summation over and can be written as
By the Cauchy–Schwarz inequality, we have that . We first bound . Using (3.54), on the event we have
Thus, recalling we have
(3.93)
Next we bound . A straightforward calculation yields that
We tilt the measure on to again. Write and . We will first bound
(3.94)
Note that
(3.95)
(3.96)
(3.97) |
To bound (3.95), note that the ’s for different are independent, and the entries of are bounded by (since ). Also, we have for , and thus (this is the place where we use the symmetry from taking the absolute values as discussed below (2.29); in fact would have been 0 if were 0). We then get that by a Chernoff bound. To bound (3.96), note that is a mean-zero Gaussian variable, with variance bounded by
using again. Thus we get . We now bound (3.97). Note that are mean-zero sub-Gaussian variables and are independent of . In addition, we have . Thus, we can apply Lemma 3.5 and get that
Combining this with the bounds on (3.95) and (3.96), we have . Thus, recalling the definition of and averaging over and , we have that
(3.98)
It remains to bound . Again, using the Cauchy–Schwarz inequality we have
Again, by tilting the measure to , we get that is bounded by (recall from above that )
We can write , where is given by . We see that ’s satisfy that
Using again, we have . Thus, we get that . Combined with the Azuma–Hoeffding inequality, this yields that
This then implies that (by using and averaging again)
(3.99)
Combined with (3.93) and (3.98), this completes the proof of Lemma 3.38. ∎
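For reference, the Azuma–Hoeffding inequality used in the last step has the standard form, stated generically and not tailored to the specific martingale appearing above: if \((M_k)_{0\le k\le n}\) is a martingale with \(|M_k-M_{k-1}|\le c_k\) almost surely, then
\[
\mathbb{P}\big(|M_n-M_0|\ge t\big)\;\le\;2\exp\!\Big(-\frac{t^{2}}{2\sum_{k=1}^{n}c_k^{2}}\Big),
\qquad t\ge 0.
\]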
3.6.6 Conclusion
By putting together (3.74), (3.84), (3.85), (3.86) and (3.92), we have established Steps 1–5 listed at the beginning of this subsection. In addition, since , our quantitative bounds imply that all of these hold simultaneously for with probability . By the inductive logic explained at the beginning of this subsection, this completes the proof of Proposition 3.2. We also point out that, in addition, we have shown that
(3.100)
which will be used in Section 4.
4 Almost exact matching
In this section we show that on the event , our algorithm matches all but a vanishing fraction of vertices with probability , thereby proving Proposition 2.3 (recall (3.100)). For notational convenience, we will drop from subscripts (unless we wish to emphasize it). That is, we will write instead of .
In light of (2.37), we define to be the collection of such that
and we define to be the collection of directed edges (with ) such that
It is clear that and will potentially lead to mis-matching for our algorithm in the finishing stage. As a result, our proof requires bounds on them.
Lemma 4.1.
We have .
Proof.
We assume . If , we have using . Let be a realization for . Then for . Thus,
(4.1)
We again use the tilted measure. By the definition of , the events
are independent for different , and each has probability at most (by and Lemma 3.4 again)
Recalling the definition of and recalling that , we derive that
Since the enumeration for possible realizations of is at most , this completes the proof by a simple union bound. ∎
Lemma 4.2.
On with probability we have the following: any subset of has cardinality at most if each vertex is incident to at most one edge in this subset.
Proof.
Suppose otherwise that there exists with such that for all
(4.2)
We again tilt the measure. Note that the events
are independent and each occurs with probability at most . Therefore, by similarly applying together with a union bound, we complete the proof of the lemma. ∎
We are now ready to provide the proof of Proposition 2.3.
Proof of Proposition 2.3.
By Lemmas 4.1 and 4.2, we assume without loss of generality that the events described in these two lemmas both occur. Let . Suppose that in the finishing step (i.e., the step of computing (2.37)) our algorithm processes in the order of . For , in order to have , either our algorithm assigns a wrong matching to or at the time when processing the vertex has already been matched to some other vertex. We then construct a directed graph on vertices as follows: for each , if the finishing step puts into and matches to some with , then we connect a directed edge from to . Note that our algorithm will not match a vertex twice, so all vertices have in-degree and out-degree both at most 1. Also, for , if has out-degree 0, then must have been matched to some and thus there is a directed edge . Thus, the directed graph is a collection of non-overlapping directed chains. Since there are at least edges in (recall that each is incident to at least one edge in ), we can get a matching with cardinality at least . Since , we can then get a matching restricted on with cardinality at least . By the event in Lemma 4.2, we see that , completing the proof. ∎
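The combinatorial fact used at the end of this proof is elementary; we record it in generic terms for clarity. If a directed graph is a disjoint union of directed paths (chains) \(P_{k_1},\dots,P_{k_r}\), where \(P_k\) denotes a path with \(k\) edges, then taking every other edge along each path yields a matching, so the maximum matching size \(\nu\) satisfies
\[
\nu\big(P_{k_1}\,\dot{\cup}\,\cdots\,\dot{\cup}\,P_{k_r}\big)
=\sum_{i=1}^{r}\Big\lceil\frac{k_i}{2}\Big\rceil
\;\ge\;\frac{1}{2}\sum_{i=1}^{r}k_i.
\]
That is, at least half of the edges can always be used in a matching, which is the step converting the edge count into the matching bound above.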
Appendix A Index of notation
Here we record some commonly used symbols in the paper, along with their meaning and the location where they are first defined. Local notations will not be included.
• : pre-processed graphs; Subsection 2.1.
• : pre-processed edge density; Subsection 2.1.
• : pre-processed edge correlation; Subsection 2.1.
• , : lower and upper bounds for the increment of ; (2.2).
• : number of sets generated in initialization; (2.10).
• : depth of neighborhood in initialization; (2.4).
• : fraction of -neighborhood and their interactions; (2.7).
• : fraction of sets generated at time ; (2.12).
• : number of sets generated at time in iteration; (2.19).
• : signal contained in each pair at time ; (2.31).
• : direction of projections at time ; (2.28).
• : time when the iteration stops; (2.35).
• : degree of vertices to the sets at time ; (2.21).
• : targeted approximation error at time ; (3.1).
• : matrices recording the correlation among iteration; (2.24).
• : the vertices revealed in initialization; (3.4).
• : Gaussian vectors introduced for smoothing; Section 2.3.
• : the information used up to time ; (3.6).
• : the bias part and the good part of degree; (3.5).
• : independently sampled Gaussian matrices; Subsection 3.1.
• : the Gaussian process up to time ; (3.12).
• : the information generated by the Gaussian process up to time ; (3.12).
• : the collection of bad vertices up to time ; (3.15).
• : the collection of vertices with large bias; (3.7).
• : the collection of vertices with large degree; (3.11).
• : the collection of vertices with bad projection; (3.14).
• : the degree replaced by Gaussian; (3.8).
• : covariance matrix of Gaussian variables at time ; Remark 3.22.
• : covariance matrix of revised Gaussian variables at time ; Subsection 3.5.
• : coefficient matrix of Gaussian variables at time ; Remark 3.22.
• : coefficient matrix of revised Gaussian variables at time ; Subsection 3.5.
• : Gaussian process with simple covariance structure; (3.69).
• : the revised Gaussian process up to time ; (3.64).
• : the information generated by revised Gaussian process; (3.64).
References
- [1] B. Barak, C.-N. Chou, Z. Lei, T. Schramm, and Y. Sheng. (Nearly) efficient algorithms for the graph matching problem on correlated random graphs. In Advances in Neural Information Processing Systems (NIPS), volume 32. Curran Associates, Inc., 2019.
- [2] M. Bayati, M. Lelarge, and A. Montanari. Universality in polytope phase transitions and message passing algorithms. In The Annals of Applied Probability, 25(2):753–822, 2015.
- [3] M. Bayati and A. Montanari. The dynamics of message passing on dense graphs, with applications to compressed sensing. In IEEE Transactions on Information Theory, 57(2):764–785, 2011.
- [4] A. Berg, T. Berg, and J. Malik. Shape matching and object recognition using low distortion correspondences. In 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), volume 1, pages 26–33, 2005.
- [5] M. Bozorg, S. Salehkaleybar, and M. Hashemi. Seedless graph matching via tail of degree distribution for correlated Erdős–Rényi graphs. Preprint, arXiv:1907.06334.
- [6] S. Chen, S. Jiang, Z. Ma, G. P. Nolan, and B. Zhu. One-way matching of datasets with low rank signals. Preprint, arXiv:2204.13858.
- [7] W. Chen and W. Lam. Universality of approximate message passing algorithms. In Electronic Journal of Probability, 26:1–44, 2021.
- [8] T. Cour, P. Srinivasan, and J. Shi. Balanced graph matching. In Advances in Neural Information Processing Systems (NIPS), volume 19. MIT Press, 2006.
- [9] D. Cullina and N. Kiyavash. Exact alignment recovery for correlated Erdős–Rényi graphs. Preprint, arXiv:1711.06783.
- [10] D. Cullina and N. Kiyavash. Improved achievability and converse bounds for Erdős–Rényi graph matching. In Proceedings of the 2016 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Science, pages 63–72, New York, NY, USA, 2016. Association for Computing Machinery.
- [11] D. Cullina, N. Kiyavash, P. Mittal, and H. V. Poor. Partial recovery of Erdős–Rényi graph alignment via -core alignment. In Proceedings of the 2020 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Science, pages 99–100, New York, NY, USA, 2020. Association for Computing Machinery.
- [12] O. E. Dai, D. Cullina, N. Kiyavash, and M. Grossglauser. Analysis of a canonical labeling algorithm for the alignment of correlated Erdős–Rényi graphs. In ACM SIGMETRICS Performance Evaluation Review, 47(1):96–97, 2019.
- [13] J. Ding and H. Du. Detection threshold for correlated Erdős–Rényi graphs via densest subgraph. In IEEE Transactions on Information Theory, 69(8):5289–5298, 2023.
- [14] J. Ding and H. Du. Matching recovery threshold for correlated random graphs. In Annals of Statistics, 51(4):1718–1743, 2023.
- [15] J. Ding, H. Du, and S. Gong. A polynomial-time approximation scheme for the maximal overlap of two independent Erdős–Rényi graphs. To appear in Random Structures and Algorithms.
- [16] J. Ding, H. Du, and Z. Li. Low-degree hardness of detection for correlated Erdős–Rényi graphs. Preprint, arXiv:2311.15931.
- [17] J. Ding and Z. Li. A polynomial time iterative algorithm for matching Gaussian matrices with non-vanishing correlation. Preprint, arXiv:2212.13677.
- [18] J. Ding, Z. Ma, Y. Wu, and J. Xu. Efficient random graph matching via degree profiles. In Probability Theory and Related Fields, 179(1-2):29–115, 2021.
- [19] A. Elmsallati, C. Clark, and J. Kalita. Global alignment of protein-protein interaction networks: A survey. In IEEE/ACM Transactions on Computational Biology and Bioinformatics, 13(4):689–705, 2015.
- [20] D. P. Dubhashi and A. Panconesi. Concentration of measure for the analysis of randomized algorithms. Cambridge University Press, Cambridge, 2009.
- [21] Z. Fan, C. Mao, Y. Wu, and J. Xu. Spectral graph matching and regularized quadratic relaxations: Algorithm and theory. In Foundations of Computational Mathematics, 23(5):1511–1565, 2023.
- [22] Z. Fan, C. Mao, Y. Wu, and J. Xu. Spectral graph matching and regularized quadratic relaxations II: Erdős–Rényi graphs and universality. In Foundations of Computational Mathematics, 23(5):1567–1617, 2023.
- [23] Z. Fan, T. Wang and X. Zhong. Universality of Approximate Message Passing algorithms and tensor networks. Preprint, arXiv:2206.13037.
- [24] S. Feizi, G. Quon, M. Medard, M. Kellis, and A. Jadbabaie. Spectral alignment of networks. Preprint, arXiv:1602.04181.
- [25] S. Foucart and H. Rauhut. A mathematical introduction to compressive sensing. Applied and Numerical Harmonic Analysis. Birkhäuser/Springer, New York, 2013.
- [26] L. Ganassali and L. Massoulié. From tree matching to sparse graph alignment. In Proceedings of Thirty Third Conference on Learning Theory (COLT), volume 125 of Proceedings of Machine Learning Research, pages 1633–1665. PMLR, 09–12 Jul 2020.
- [27] L. Ganassali, L. Massoulié, and M. Lelarge. Impossibility of partial recovery in the graph alignment problem. In Proceedings of Thirty Fourth Conference on Learning Theory (COLT), volume 134 of Proceedings of Machine Learning Research, pages 2080–2102. PMLR, 15–19 Aug 2021.
- [28] L. Ganassali, L. Massoulié, and M. Lelarge. Correlation detection in trees for planted graph alignment. To appear in Annals of Applied Probability.
- [29] L. Ganassali, L. Massoulié, and G. Semerjian. Statistical limits of correlation detection in trees. Preprint, arXiv:2209.13723.
- [30] A. Haghighi, A. Ng, and C. Manning. Robust textual inference via graph matching. In Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing, pages 387–394, Vancouver, British Columbia, Canada, Oct 2005.
- [31] G. Hall and L. Massoulié. Partial recovery in the graph alignment problem. In Operations Research, 71(1):259–272, 2023.
- [32] D. L. Hanson and F. T. Wright. A bound on tail probabilities for quadratic forms in independent random variables. Annals of Mathematical Statistics, 42:1079–1083, 1971.
- [33] E. Kazemi, S. H. Hassani, and M. Grossglauser. Growing a graph matching from a handful of seeds. In Proceedings of the VLDB Endowment, 8(10):1010–1021, June 2015.
- [34] V. Lyzinski, D. E. Fishkind, and C. E. Priebe. Seeded graph matching for correlated Erdős–Rényi graphs. In Journal of Machine Learning Research, 15:3513–3540, 2014.
- [35] C. Mao, M. Rudelson, and K. Tikhomirov. Random Graph Matching with Improved Noise Robustness. In Proceedings of Thirty Fourth Conference on Learning Theory (COLT), volume 134 of Proceedings of Machine Learning Research, pages 1–34, 2021.
- [36] C. Mao, M. Rudelson, and K. Tikhomirov. Exact matching of random graphs with constant correlation. In Probability Theory and Related Fields, 186(2):327–389, 2023.
- [37] C. Mao, Y. Wu, J. Xu, and S. H. Yu. Testing network correlation efficiently via counting trees. To appear in Annals of Statistics.
- [38] C. Mao, Y. Wu, J. Xu, and S. H. Yu. Random graph matching at Otter’s threshold via counting chandeliers. In Proceedings of the 55th Annual ACM Symposium on Theory of Computing (STOC), pages 1345–1356, 2023.
- [39] E. Mossel and J. Xu. Seeded graph matching via large neighborhood statistics. In Random Structures Algorithms, 57(3):570–611, 2020.
- [40] A. Narayanan and V. Shmatikov. Robust de-anonymization of large sparse datasets. In 2008 IEEE Symposium on Security and Privacy (ISSP), pages 111–125, 2008.
- [41] A. Narayanan and V. Shmatikov. De-anonymizing social networks. In 2009 30th IEEE Symposium on Security and Privacy (ISSP), pages 173–187, 2009.
- [42] P. Pedarsani and M. Grossglauser. On the privacy of anonymized networks. In Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1235–1243, New York, NY, USA, 2011. Association for Computing Machinery.
- [43] G. Piccioli, G. Semerjian, G. Sicuro, and L. Zdeborová. Aligning random graphs with a sub-tree similarity message-passing algorithm. In Journal of Statistical Mechanics: Theory and Experiment, (6):Paper No. 063401, 44, 2022.
- [44] M. Z. Racz and A. Sridhar. Correlated randomly growing graphs. In Annals of Applied Probability, 32(2):1058–1111, 2022.
- [45] M. Z. Racz and A. Sridhar. Correlated stochastic block models: Exact graph matching with applications to recovering communities. In Advances in Neural Information Processing Systems (NIPS), volume 34. Curran Associates, Inc., 2021.
- [46] M. Rudelson and R. Vershynin. Hanson-Wright inequality and sub-Gaussian concentration. In Electronic Communications in Probability, 18(24):1–9, 2013.
- [47] F. Shirani, S. Garg, and E. Erkip. Seeded graph matching: Efficient algorithms and theoretical guarantees. In 2017 51st Asilomar Conference on Signals, Systems, and Computers, pages 253–257, 2017.
- [48] R. Singh, J. Xu, and B. Berger. Global alignment of multiple protein interaction networks with application to functional orthology detection. In Proceedings of the National Academy of Sciences of the United States of America, 105:12763–12768, 2008.
- [49] J. T. Vogelstein, J. M. Conroy, V. Lyzinski, L. J. Podrazik, S. G. Kratzer, E. T. Harley, D. E. Fishkind, R. J. Vogelstein, and C. E. Priebe. Fast approximate quadratic programming for graph matching. In PLOS ONE, 10(4):1–17, 04 2015.
- [50] H. Wang, Y. Wu, J. Xu, and I. Yolou. Random graph matching in geometric models: the case of complete graphs. In Conference on Learning Theory (COLT), volume 178 of Proceedings of Machine Learning Research, pages 3441–3488, 2022.
- [51] F. T. Wright. A bound on tail probabilities for quadratic forms in independent random variables whose distributions are not necessarily symmetric. In Annals of Probability, 1(6):1068–1070, 1973.
- [52] Y. Wu, J. Xu, and S. H. Yu. Settling the sharp reconstruction thresholds of random graph matching. In IEEE Transactions on Information Theory, 68(8): 5391–5417, 2022.
- [53] Y. Wu, J. Xu, and S. H. Yu. Testing correlation of unlabeled random graphs. In The Annals of Applied Probability, 33(4): 2519–2558, 2023.
- [54] L. Yartseva and M. Grossglauser. On the performance of percolation graph matching. In Proceedings of the First ACM Conference on Online Social Networks (COSN), pages 119–130, New York, NY, USA, 2013. Association for Computing Machinery.
- [55] L. Yu, J. Xu and X. Lin. Graph matching with partially-correct seeds. In Journal of Machine Learning Research, 22(1):12829–12882, 2021.