
Settling the Sharp Reconstruction Thresholds
of Random Graph Matching

Yihong Wu, Jiaming Xu, and Sophie H. Yu

This paper was presented in part at the IEEE International Symposium on Information Theory, July 2021. Y. Wu is with the Department of Statistics and Data Science, Yale University, New Haven, CT, USA, [email protected]. J. Xu and S. H. Yu are with The Fuqua School of Business, Duke University, Durham, NC, USA, {jx77,haoyang.yu}@duke.edu. Y. Wu is supported in part by the NSF Grant CCF-1900507, an NSF CAREER award CCF-1651588, and an Alfred Sloan fellowship. J. Xu is supported by the NSF Grants IIS-1838124, CCF-1850743, and CCF-1856424.
Abstract

This paper studies the problem of recovering the hidden vertex correspondence between two edge-correlated random graphs. We focus on the Gaussian model, where the two graphs are complete graphs with correlated Gaussian weights, and the Erdős-Rényi model, where the two graphs are subsampled from a common parent Erdős-Rényi graph $\mathcal{G}(n,p)$. For dense Erdős-Rényi graphs with $p=n^{-o(1)}$, we prove that there exists a sharp threshold, above which one can correctly match all but a vanishing fraction of vertices and below which correctly matching any positive fraction is impossible, a phenomenon known as the “all-or-nothing” phase transition. Even more strikingly, in the Gaussian setting, above the threshold all vertices can be exactly matched with high probability. In contrast, for sparse Erdős-Rényi graphs with $p=n^{-\Theta(1)}$, we show that the all-or-nothing phenomenon no longer holds, and we determine the thresholds up to a constant factor. Along the way, we also derive the sharp threshold for exact recovery, sharpening the existing results in Erdős-Rényi graphs [CK16, CK17].

The proof of the negative results builds upon a tight characterization of the mutual information based on the truncated second-moment computation in [WXY20] and an “area theorem” that relates the mutual information to the integral of the reconstruction error. The positive results follow from a tight analysis of the maximum likelihood estimator that takes into account the cycle structure of the induced permutation on the edges.

1 Introduction

The problem of graph matching (or network alignment) refers to finding the underlying vertex correspondence between two graphs on the sole basis of their network topologies. Going beyond the worst-case intractability of finding the optimal correspondence (quadratic assignment problem [PW94, BCPP98]), an emerging line of research is devoted to the average-case analysis of graph matching under meaningful statistical models, focusing on either information-theoretic limits [CK16, CK17, CKMP19, HM20, WXY20, Gan20] or computationally efficient algorithms [FQRM+16, LFF+16, DMWX20, BCL+19, FMWX19a, FMWX19b, GM20]. Despite these recent advances, the sharp thresholds of graph matching remain not fully understood, especially for approximate reconstruction. The current paper aims to close this gap.

Following [PG11, DMWX20], we consider the following probabilistic model for two random graphs correlated through a hidden vertex correspondence. Let the ground truth $\pi$ be a uniformly random permutation on $[n]$. We generate two random weighted graphs on the common vertex set $[n]$ with (weighted) adjacency vectors $A=(A_{ij})_{1\leq i<j\leq n}$ and $B=(B_{ij})_{1\leq i<j\leq n}$ such that $\left(A_{\pi(i)\pi(j)},B_{ij}\right)$ are i.i.d. pairs of correlated random variables with a joint distribution $P$, which implicitly depends on $n$. Of particular interest are the following two special cases:

  • (Gaussian model): $P=\mathcal{N}\left(\left(\begin{smallmatrix}0\\ 0\end{smallmatrix}\right),\left(\begin{smallmatrix}1&\rho\\ \rho&1\end{smallmatrix}\right)\right)$ is the joint distribution of two standard Gaussian random variables with correlation coefficient $\rho>0$. In this case, we have $B=\rho A^{\pi}+\sqrt{1-\rho^{2}}Z$, where $A$ and $Z$ are independent standard normal vectors and $A^{\pi}_{ij}=A_{\pi(i)\pi(j)}$.

  • (Erdős-Rényi random graph): $P$ denotes the joint distribution of two correlated ${\rm Bern}(q)$ random variables $X$ and $Y$ such that $\mathbb{P}\left\{Y=1\mid X=1\right\}=s$, where $q\leq s\leq 1$. In this case, $A$ and $B$ are the adjacency vectors of two Erdős-Rényi random graphs $G_{1},G_{2}\sim\mathcal{G}(n,q)$, where $G_{1}^{\pi}$ (with the adjacency vector $A^{\pi}$) and $G_{2}$ are independently edge-subsampled from a common parent graph $G\sim\mathcal{G}(n,p)$ with $p=q/s$.
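The two generative models above can be simulated directly. The following is a minimal sketch (our own illustrative helpers, not code from the paper); for convenience the adjacency vectors are stored as symmetric $n\times n$ matrices, and `relabel` implements $A\mapsto A^{\pi}$.

```python
import numpy as np

def relabel(M, pi):
    """(M^pi)_{ij} = M_{pi(i) pi(j)}; adjacency vectors stored as symmetric matrices."""
    return M[np.ix_(pi, pi)]

def sym_bernoulli(n, prob, rng):
    """Symmetric 0/1 matrix with i.i.d. Bern(prob) entries above the diagonal."""
    U = np.triu((rng.random((n, n)) < prob).astype(int), 1)
    return U + U.T

def sample_gaussian(n, rho, rng):
    """Gaussian model: B = rho * A^pi + sqrt(1 - rho^2) * Z."""
    pi = rng.permutation(n)
    def sym_normal():
        U = np.triu(rng.standard_normal((n, n)), 1)
        return U + U.T
    A, Z = sym_normal(), sym_normal()
    B = rho * relabel(A, pi) + np.sqrt(1 - rho**2) * Z
    return A, B, pi

def sample_er(n, p, s, rng):
    """Erdos-Renyi model: G1^pi and G2 are independent edge-subsamplings
    (keep probability s) of a common parent G ~ G(n, p)."""
    pi = rng.permutation(n)
    parent = sym_bernoulli(n, p, rng)
    A_pi = parent * sym_bernoulli(n, s, rng)   # G1^pi
    B = parent * sym_bernoulli(n, s, rng)      # G2
    A = relabel(A_pi, np.argsort(pi))          # undo relabeling: A_{pi(i)pi(j)} = (A^pi)_{ij}
    return A, B, pi
```

By construction, the empirical correlation between $A^{\pi}$ and $B$ in the Gaussian model concentrates around $\rho$, and each edge of the parent graph survives in either child independently with probability $s$.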

Given the observations $A$ and $B$, the goal is to recover the latent vertex correspondence $\pi$ as accurately as possible. More specifically, given two permutations $\pi,\widehat{\pi}$ on $[n]$, denote the fraction of their overlap by

$\mathsf{overlap}(\pi,\widehat{\pi})\triangleq\frac{1}{n}\left|\left\{i\in[n]:\pi(i)=\widehat{\pi}(i)\right\}\right|.$
Definition 1.

We say an estimator $\widehat{\pi}$ of $\pi$ achieves, as $n\to\infty$,

  • partial recovery, if $\mathbb{P}\left\{\mathsf{overlap}\left(\widehat{\pi},\pi\right)\geq\delta\right\}=1-o(1)$ for some constant $\delta\in(0,1)$;

  • almost exact recovery, if $\mathbb{P}\left\{\mathsf{overlap}\left(\widehat{\pi},\pi\right)\geq\delta\right\}=1-o(1)$ for any constant $\delta\in(0,1)$;

  • exact recovery, if $\mathbb{P}\left\{\mathsf{overlap}\left(\widehat{\pi},\pi\right)=1\right\}=1-o(1)$.
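The overlap is a straightforward Hamming-type agreement fraction; a one-line implementation (our own helper, matching the definition above) makes the three recovery notions concrete:

```python
import numpy as np

def overlap(pi, pi_hat):
    """Fraction of vertices on which two permutations of [n] agree."""
    pi, pi_hat = np.asarray(pi), np.asarray(pi_hat)
    return float(np.mean(pi == pi_hat))
```

For instance, on $n=4$ vertices the identity and a single transposition agree on exactly half of the vertices, so their overlap is $0.5$; exact recovery corresponds to overlap $1$.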

The information-theoretic threshold of exact recovery has been determined for the Erdős-Rényi graph model [CK17] in the regime of $p=O\left(\log^{-3}n\right)$ and more recently for the Gaussian model [Gan20]; however, the results and proof techniques in [CK17] do not hold for denser graphs. In contrast, approximate recovery is far less well understood. Apart from the sharp condition for almost exact recovery in the sparse regime $p=n^{-\Omega(1)}$ [CKMP19], only upper and lower bounds are known for partial recovery [HM20]. See Section 1.2 for a detailed review of these previous results.

In this paper, we characterize the sharp reconstruction thresholds for both the Gaussian model and dense Erdős-Rényi graphs with $p=n^{-o(1)}$. Specifically, we prove that there exists a sharp threshold, above which one can estimate all but a vanishing fraction of the latent permutation and below which recovering any positive fraction is impossible, a phenomenon known as the “all-or-nothing” phase transition [RXZ19]. This phenomenon is even more striking in the Gaussian model, in the sense that above the threshold the hidden permutation can be estimated error-free with high probability. In contrast, for sparse Erdős-Rényi graphs with $p=n^{-\Theta(1)}$, we show that the all-or-nothing phenomenon no longer holds. To this end, we determine the threshold for partial recovery up to a constant factor and show that it is order-wise smaller than the almost exact recovery threshold found in [CKMP19].

Along the way, we also derive a sharp threshold for exact recovery, sharpening existing results in [CK16, CK17]. As a byproduct, the same technique yields an alternative proof of the result in [Gan20] for the Gaussian model.

1.1 Main results

Throughout the paper, we let $\epsilon>0$ denote an arbitrarily small but fixed constant. Let $\widehat{\pi}_{\sf ML}$ denote the maximum likelihood estimator, which reduces to

$\widehat{\pi}_{\sf ML}\in\arg\max_{\pi^{\prime}}\left\langle A^{\pi^{\prime}},B\right\rangle.$ (1)
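Since (1) is an instance of the quadratic assignment problem, it can only be solved by brute force for tiny $n$; a sketch of such a brute-force maximizer (our own illustrative code, with adjacency vectors stored as symmetric matrices) is:

```python
import itertools
import numpy as np

def mle_match(A, B):
    """Brute-force maximizer of <A^{pi'}, B> over all permutations pi',
    i.e., eq. (1); O(n!) time, so only feasible for very small n."""
    n = A.shape[0]
    iu = np.triu_indices(n, 1)
    best, best_val = None, -np.inf
    for perm in itertools.permutations(range(n)):
        p = np.array(perm)
        val = (A[np.ix_(p, p)][iu] * B[iu]).sum()  # <A^{pi'}, B>
        if val > best_val:
            best, best_val = p, val
    return best
```

As a sanity check, in the noiseless limit $B=A^{\pi}$ with generic continuous weights, Cauchy-Schwarz implies $\pi^{\prime}=\pi$ is the unique maximizer of $\langle A^{\pi^{\prime}},B\rangle$, so the search returns the planted permutation exactly.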
Theorem 1 (Gaussian model).

If

$\rho^{2}\geq\frac{(4+\epsilon)\log n}{n},$ (2)

then $\mathbb{P}\left\{\mathsf{overlap}\left(\widehat{\pi}_{\rm ML},\pi\right)=1\right\}=1-o(1)$.

Conversely, if

$\rho^{2}\leq\frac{(4-\epsilon)\log n}{n},$ (3)

then for any estimator $\widehat{\pi}$ and any fixed constant $\delta>0$, $\mathbb{P}\left\{\mathsf{overlap}\left(\widehat{\pi},\pi\right)\leq\delta\right\}=1-o(1)$.

Theorem 1 implies that in the Gaussian setting, the recovery of $\pi$ exhibits a sharp phase transition in terms of the limiting value of $\frac{n\rho^{2}}{\log n}$ at threshold $4$, above which exact recovery is possible and below which even partial recovery is impossible. The positive part of Theorem 1 was first shown in [Gan20]. Here we provide an alternative proof that does not rely on the Gaussian property and works for Erdős-Rényi graphs as well.

The next result determines the sharp threshold for the Erdős-Rényi model in terms of the key quantity $nps^{2}$, the average degree of the intersection graph $G_{1}\wedge G_{2}$ (whose edges are sampled by both $G_{1}$ and $G_{2}$).

Theorem 2 (Erdős-Rényi graphs, dense regime).

Assume $p$ is bounded away from $1$ and $p=n^{-o(1)}$. If

$nps^{2}\geq\frac{(2+\epsilon)\log n}{\log\frac{1}{p}-1+p},$ (4)

then for any constant $\delta<1$, $\mathbb{P}\left\{\mathsf{overlap}\left(\widehat{\pi}_{\rm ML},\pi\right)\geq\delta\right\}=1-o(1)$. Conversely, if

$nps^{2}\leq\frac{(2-\epsilon)\log n}{\log\frac{1}{p}-1+p},$ (5)

then for any estimator $\widehat{\pi}$ and any constant $\delta>0$, $\mathbb{P}\left\{\mathsf{overlap}\left(\widehat{\pi},\pi\right)\leq\delta\right\}=1-o(1)$.

Theorem 2 implies that, analogous to the Gaussian model, in dense Erdős-Rényi graphs the recovery of $\pi$ exhibits an “all-or-nothing” phase transition in terms of the limiting value of $\frac{nps^{2}\left(\log\frac{1}{p}-1+p\right)}{\log n}$ at threshold $2$, above which almost exact recovery is possible and below which even partial recovery is impossible. However, as we will see in Theorem 4, unlike the Gaussian model, this threshold differs from that of exact recovery.

Remark 1 (Entropy interpretation of the thresholds).

The sharp thresholds in Theorem 1 and Theorem 2 can be interpreted from an information-theoretic perspective. Suppose an estimator $\widehat{\pi}=\widehat{\pi}(A,B)$ achieves almost exact recovery with $\mathbb{E}[\mathsf{overlap}(\pi,\widehat{\pi})]=1-o(1)$. By a rate-distortion computation, this implies that $I(\pi;\widehat{\pi})$ must be close to the full entropy of $\pi$, that is, $I(\pi;\widehat{\pi})=(1-o(1))n\log n$. On the other hand, by the data processing inequality, we have $I(\pi;\widehat{\pi})\leq I(\pi;A,B)$. The latter can be bounded simply as (see Section 1.3.2)

$I(\pi;A,B)\leq\binom{n}{2}I(P),$ (6)

where $I(P)$ denotes the mutual information between a pair of random variables with joint distribution $P$. For the Gaussian model, we have

$I(P)=\frac{1}{2}\log\frac{1}{1-\rho^{2}}.$ (7)

For the correlated Erdős-Rényi graph,

$I(P)=qd(s\|q)+(1-q)d(\eta\|q),$ (8)

where $q=ps$, $\eta\triangleq\frac{q(1-s)}{1-q}$, and $d(s\|q)\triangleq D({\rm Bern}(s)\|{\rm Bern}(q))$ denotes the binary KL divergence. By Taylor expansion, we have $I(P)=s^{2}p\left(p-1+\log\frac{1}{p}\right)(1-o(1))$. Combining these with $\binom{n}{2}I(P)\geq(1-o(1))n\log n$ shows the impossibility of almost exact recovery under the conditions (3) and (5). The fact that conditions (3) and (5) also rule out partial recovery takes more effort to show, which we do in Section 2.
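The formulas (7) and (8) and the Taylor approximation above are easy to evaluate numerically. The following sketch (our own helper functions, named for illustration) computes them and lets one check that the exact value of $I(P)$ and the leading-order term $s^{2}p(\log\frac{1}{p}-1+p)$ agree closely when $s$ is small:

```python
from math import log

def bin_kl(a, b):
    """Binary KL divergence d(a||b) = D(Bern(a) || Bern(b))."""
    return a * log(a / b) + (1 - a) * log((1 - a) / (1 - b))

def mi_gaussian(rho):
    """Per-pair mutual information for the Gaussian model, eq. (7)."""
    return 0.5 * log(1 / (1 - rho**2))

def mi_er(p, s):
    """Per-pair mutual information for the subsampled Erdos-Renyi model, eq. (8)."""
    q = p * s
    eta = q * (1 - s) / (1 - q)
    return q * bin_kl(s, q) + (1 - q) * bin_kl(eta, q)

def mi_er_taylor(p, s):
    """Leading-order approximation s^2 p (log(1/p) - 1 + p)."""
    return s**2 * p * (log(1 / p) - 1 + p)
```

For example, at $p=0.1$ and $s=0.01$ the exact value and the leading-order term agree to within about one percent.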

Theorem 3 (Erdős-Rényi graphs, sparse regime).

Assume $p=n^{-\Omega(1)}$. If

$nps^{2}\geq(2+\epsilon)\max\left\{\frac{\log n}{\log(1/p)},2\right\},$ (9)

then there exists a constant $\delta>0$ such that $\mathbb{P}\left\{\mathsf{overlap}\left(\widehat{\pi}_{\rm ML},\pi\right)\geq\delta\right\}=1-o(1)$. Conversely, assuming $np=\omega(\log^{2}n)$, if

$nps^{2}\leq 1-\epsilon,$ (10)

then for any estimator $\widehat{\pi}$ and any constant $\delta>0$, $\mathbb{P}\left\{\mathsf{overlap}\left(\widehat{\pi},\pi\right)\leq\delta\right\}=1-o(1)$.

Theorem 3 implies that for sparse Erdős-Rényi graphs with $p=n^{-\alpha}$ for a constant $\alpha\in(0,1)$, the information-theoretic threshold for partial recovery is at $nps^{2}\asymp 1$, which is much lower than the almost exact recovery threshold $nps^{2}=\omega(1)$ established in [CKMP19]. Hence, interestingly, the all-or-nothing phenomenon no longer holds for sparse Erdős-Rényi graphs. Note that the conditions (9) and (10) differ by a constant factor. Determining the sharp threshold for partial recovery in the sparse regime remains an open question.

Finally, we address the exact recovery threshold in the Erdős-Rényi graph model. For ease of notation, we consider a general correlated Erdős-Rényi graph model specified by the joint distribution $P=(p_{ab}:a,b\in\{0,1\})$, so that $\mathbb{P}\left\{A_{\pi(i)\pi(j)}=a,B_{ij}=b\right\}=p_{ab}$ for $a,b\in\{0,1\}$. In this general Erdős-Rényi model, $\widehat{\pi}_{\sf ML}$ is again given by the maximization problem (1) if $p_{11}p_{00}>p_{01}p_{10}$ (positive correlation) and changes to minimization if $p_{11}p_{00}<p_{01}p_{10}$ (negative correlation). The subsampling model is a special case with positive correlation, where

$p_{11}=ps^{2},\quad p_{10}=p_{01}=ps(1-s),\quad p_{00}=1-2ps+ps^{2}.$ (11)
Theorem 4 (Erdős-Rényi graphs, exact recovery).

Under the subsampling model (11), if

$n\left(\sqrt{p_{00}p_{11}}-\sqrt{p_{01}p_{10}}\right)^{2}\geq\left(1+\epsilon\right)\log n,$ (12)

then $\mathbb{P}\left\{\mathsf{overlap}\left(\widehat{\pi}_{\rm ML},\pi\right)=1\right\}=1-o(1)$.

Conversely, if

$n\left(\sqrt{p_{00}p_{11}}-\sqrt{p_{01}p_{10}}\right)^{2}\leq\left(1-\epsilon\right)\log n,$ (13)

then for any estimator $\widehat{\pi}$, $\mathbb{P}\left\{\mathsf{overlap}\left(\widehat{\pi},\pi\right)=1\right\}=o(1)$.

If $p$ is bounded away from $1$, Theorem 4 implies that the exact recovery threshold is given by $\lim_{n\to\infty}\frac{nps^{2}\left(1-\sqrt{p}\right)^{2}}{\log n}=1$. Since $\log\frac{1}{p}-1+p\geq 2(1-\sqrt{p})^{2}$, with equality if and only if $p=1$, the threshold of exact recovery is strictly higher than that of almost exact recovery in the Erdős-Rényi graph model, unlike the Gaussian model. If $p=1-o(1)$, Theorem 4 implies that the exact recovery threshold is given by $\lim_{n\to\infty}\frac{n\rho^{2}}{\log n}=4$, where $\rho\triangleq\frac{s(1-p)}{1-ps}$ denotes the correlation parameter between $A_{\pi(i)\pi(j)}$ and $B_{ij}$ for any $i<j$ under the latent permutation $\pi$.

1.2 Comparisons to Prior Work

Exact recovery

The information-theoretic thresholds for exact recovery have been determined for the Gaussian model and the general Erdős-Rényi graph model in certain regimes. In particular, for the Gaussian model, it is shown in [Gan20] that if $n\rho^{2}\geq(4+\epsilon)\log n$ for any constant $\epsilon>0$, then the MLE achieves exact recovery; if instead $n\rho^{2}\leq(4-\epsilon)\log n$, then exact recovery is impossible. Theorem 1 significantly strengthens this negative result by showing that under the same condition even partial recovery is impossible.

Analogously, for Erdős-Rényi random graphs, it is shown in [CK16, CK17] that the MLE achieves exact recovery when $nps^{2}=\log n+\omega(1)$, under the additional restriction that $p=O(\log^{-3}n)$. (In fact, [CK16, CK17] studied the exact recovery condition in a more general correlated Erdős-Rényi model where $\mathbb{P}\left\{A_{\pi(i)\pi(j)}=a,B_{ij}=b\right\}=p_{ab}$ for $a,b\in\{0,1\}$, which will also be the setting in Section 4.) Conversely, exact recovery is shown in [CK16] to be information-theoretically impossible provided that $nps^{2}\leq\log n-\omega(1)$, based on the fact that the intersection graph $G_{1}\wedge G_{2}\sim\mathcal{G}(n,ps^{2})$ has many isolated nodes with high probability. These two results together imply that when $p=O(\log^{-3}n)$, the exact recovery threshold is given by $\lim\frac{nps^{2}}{\log n}=1$, coinciding with the connectivity threshold of $G_{1}\wedge G_{2}$. In comparison, Theorem 4 implies that if $p$ is bounded away from $1$, the precise exact recovery threshold is instead given by $\lim\frac{nps^{2}\left(1-\sqrt{p}\right)^{2}}{\log n}=1$, strictly higher than the connectivity threshold. In particular, deriving the tight condition (13) requires more than eliminating isolated nodes. In fact, our results show that when $p$ is bounded away from $1$ and $\log n<nps^{2}<\frac{\log n}{(1-\sqrt{p})^{2}}$, exact recovery still fails even when the intersection graph is asymmetric (no non-trivial automorphism) with high probability [Wri71]. See the discussions in Section 1.3.4 for more details.

Partial and almost exact recovery

Compared to exact recovery, the understanding of partial and almost exact recovery is less precise. It is shown in [CKMP19] that in the sparse regime $p=n^{-\Omega(1)}$, almost exact recovery is information-theoretically possible if and only if $nps^{2}=\omega(1)$. The more recent work [HM20] further investigates partial recovery. It is shown that if $nps^{2}\geq C(\delta)\max\left\{1,\frac{\log n}{\log(1/p)}\right\}$, then there exists an exponential-time estimator $\widehat{\pi}$ that achieves $\mathsf{overlap}\left(\widehat{\pi},\pi\right)\geq\delta$ with high probability, where $C(\delta)$ is some large constant that tends to $\infty$ as $\delta\to 1$; conversely, if $\frac{I(P)}{\delta}=o\left(\frac{\log n}{n}\right)$ with $I(P)$ given in (8), then no estimator can achieve $\mathsf{overlap}\left(\widehat{\pi},\pi\right)\geq\delta$ with positive probability. These conditions do not match in general and are much looser than the results in Theorems 2 and 3.

1.3 Proof Techniques

We start by introducing some preliminary definitions associated with permutations (cf. [WXY20, Section 3.1] for more details and examples).

1.3.1 Node permutation, edge permutation, and cycle decomposition

Let $\mathcal{S}_{n}$ denote the set of permutations on the node set $[n]$. Each $\sigma\in\mathcal{S}_{n}$ induces a permutation $\sigma^{\sf E}$ on the edge set $\binom{[n]}{2}$ of unordered pairs, according to

$\sigma^{\sf E}((i,j))\triangleq(\sigma(i),\sigma(j)).$ (14)

We shall refer to $\sigma$ and $\sigma^{\sf E}$ as a node permutation and an edge permutation, respectively. Each permutation can be decomposed into disjoint cycles known as orbits. Orbits of $\sigma$ (resp. $\sigma^{\sf E}$) are referred to as node orbits (resp. edge orbits). Let $n_{k}$ (resp. $N_{k}$) denote the number of $k$-node (resp. $k$-edge) orbits in $\sigma$ (resp. $\sigma^{\sf E}$). The cycle structure of $\sigma^{\sf E}$ is determined by that of $\sigma$. For example, we have

$N_{1}=\binom{n_{1}}{2}+n_{2},$ (15)

because an edge $(i,j)$ is a fixed point of $\sigma^{\sf E}$ if and only if either both $i$ and $j$ are fixed points of $\sigma$ or $(i,j)$ forms a $2$-node orbit of $\sigma$. Let $F$ be the set of fixed points of $\sigma$ with $|F|=n_{1}$. Denote by $\mathcal{O}_{1}=\binom{F}{2}\subset\mathcal{O}$ the subset of fixed points of the edge permutation $\sigma^{\sf E}$, where $\mathcal{O}$ denotes the collection of all edge orbits of $\sigma^{\sf E}$.
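The edge orbits of $\sigma^{\sf E}$ can be enumerated directly from $\sigma$, which gives a quick check of the counting identity (15). The sketch below (our own illustrative helper) follows each unordered pair under (14) until the orbit closes:

```python
from itertools import combinations
from math import comb

def edge_orbits(sigma):
    """Orbits of the edge permutation sigma^E of eq. (14), acting on
    unordered pairs {i, j} of [n]; sigma is given as a list."""
    n = len(sigma)
    orbits, seen = [], set()
    for e in combinations(range(n), 2):
        if e in seen:
            continue
        orbit, cur = [], e
        while cur not in seen:           # follow the pair until the orbit closes
            seen.add(cur)
            orbit.append(cur)
            i, j = cur
            cur = tuple(sorted((sigma[i], sigma[j])))
        orbits.append(orbit)
    return orbits
```

For instance, take $\sigma$ on $n=8$ nodes with two fixed points, one $2$-node orbit, and one $4$-node orbit, so $n_{1}=2$ and $n_{2}=1$; then (15) predicts $N_{1}=\binom{2}{2}+1=2$ fixed edges, which the enumeration confirms.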

1.3.2 Negative results on partial recovery

Let $\mathcal{P}$ denote the joint distribution of $A$ and $B$ under the correlated model. To prove our negative results, we introduce an auxiliary null model $\mathcal{Q}$, under which $A$ and $B$ are independent with the same marginals as under $\mathcal{P}$. In other words, under $\mathcal{Q}$, $(A_{ij},B_{ij})$ are i.i.d. pairs of independent random variables with a joint distribution $Q$ equal to the product of the marginals of $P$.

As the first step, we leverage the truncated second moment computation in [WXY20] to conclude that the KL divergence $D(\mathcal{P}\|\mathcal{Q})$ is negligible under the desired conditions. By expressing the mutual information as $I(\pi;A,B)=\binom{n}{2}D(P\|Q)-D(\mathcal{P}\|\mathcal{Q})$, where $D(P\|Q)=I(P)$, this readily implies that $I(\pi;A,B)=\binom{n}{2}I(P)(1+o(1))$. Next, we relate the mutual information $I(\pi;A,B)$ to the integral of the minimum mean-squared error (MMSE) of $A^{\pi}$, the weighted adjacency vector relabeled according to the ground truth. For the Gaussian model, this directly follows from the celebrated I-MMSE formula [GSV05]. For correlated Erdős-Rényi graphs, we introduce an appropriate interpolating model and obtain an analogous but more involved “area theorem”, following [MMRU09, DAM17]. These two steps together imply that the MMSE of $A^{\pi}$ given the observation $(A,B)$ is asymptotically equal to the estimation error of the trivial estimator $\mathbb{E}\left[A^{\pi}\right]$. Finally, we connect the MMSE of $A^{\pi}$ to the Hamming loss of reconstructing $\pi$, concluding the impossibility of partial recovery.

Note that by the non-negativity of $D(\mathcal{P}\|\mathcal{Q})$, we arrive at the simple upper bound (6), that is, $I(\pi;A,B)\leq\binom{n}{2}I(P)$. Interestingly, our proof relies on establishing an asymptotically matching lower bound on the mutual information $I(\pi;A,B)$. This significantly deviates from the existing results in [HM20] based on Fano's inequality: $\mathbb{P}\left\{\mathsf{overlap}(\widehat{\pi},\pi)\leq\delta\right\}\geq 1-\frac{I(\pi;A,B)+1}{\log(n!/m)}$ with $m=|\{\pi^{\prime}:\mathsf{overlap}(\pi^{\prime},\pi)\geq\delta\}|$, followed by applying the simple bound (6).

1.3.3 Positive results on partial and almost exact recovery

Our positive results follow from a large deviation analysis of the maximum likelihood estimator (1). A crucial observation is that the difference between the objective function in (1) evaluated at a given permutation $\pi^{\prime}$ and that at the ground truth $\pi$ can be decomposed across the edge orbits of $\sigma\triangleq\pi^{-1}\circ\pi^{\prime}$ as

$\left\langle A^{\pi^{\prime}}-A^{\pi},B\right\rangle=\sum_{O\in\mathcal{O}\setminus\mathcal{O}_{1}}X_{O}-\sum_{O\in\mathcal{O}\setminus\mathcal{O}_{1}}Y_{O}\triangleq X-Y,$

where $F$ is the set of fixed points of $\sigma$, $\mathcal{O}_{1}=\binom{F}{2}\subset\mathcal{O}$ is a subset of fixed points of the edge permutation $\sigma^{\sf E}$, and $X_{O}\triangleq\sum_{(i,j)\in O}A_{\pi^{\prime}(i)\pi^{\prime}(j)}B_{ij}$ and $Y_{O}\triangleq\sum_{(i,j)\in O}A_{\pi(i)\pi(j)}B_{ij}$ are independent across edge orbits $O$. Crucially, $Y$ depends on $\pi^{\prime}$ only through its fixed point set $F$, which has substantially fewer choices than $\pi^{\prime}$ itself when $n-|F|\asymp n$. Therefore, for the purpose of applying the union bound, it is beneficial to control $X$ and $Y$ separately. Indeed, we show that $Y$ is highly concentrated around its mean. Hence, it remains to analyze the large-deviation event of $X$ exceeding $\mathbb{E}\left[Y\right]$, which is accomplished by a careful computation of the moment generating function (MGF) $M_{|O|}\triangleq\mathbb{E}\left[\exp\left(tX_{O}\right)\right]$ and proving that

$M_{|O|}\leq M_{2}^{|O|/2},\text{ for }|O|\geq 2.$ (16)

Intuitively, this means that the contribution of longer edge orbits can be effectively bounded by that of the $2$-edge orbits. Capitalizing on this key finding and applying the Chernoff bound together with a union bound over $\pi^{\prime}$ yields the tight condition when $q=o(1)$. When $q=\Theta(1)$, it turns out that the correlation between $X$ and $Y$ can no longer be ignored, so that separately controlling $X$ and $Y$ leads to suboptimal results. To remedy this issue, we need to center the adjacency vectors $A$ and $B$. More precisely, define $\overline{X}_{O}\triangleq\sum_{(i,j)\in O}(A_{\pi^{\prime}(i)\pi^{\prime}(j)}-q)(B_{ij}-q)$, $\overline{Y}_{O}\triangleq\sum_{(i,j)\in O}(A_{\pi(i)\pi(j)}-q)(B_{ij}-q)$, $\overline{X}\triangleq\sum_{O\in\mathcal{O}\setminus\mathcal{O}_{1}}\overline{X}_{O}$, and $\overline{Y}\triangleq\sum_{O\in\mathcal{O}\setminus\mathcal{O}_{1}}\overline{Y}_{O}$. By definition, $X-Y=\overline{X}-\overline{Y}$. Crucially, it can be verified that $\overline{X}_{O}$ and $\overline{Y}_{O}$ (and hence $\overline{X}$ and $\overline{Y}$) are uncorrelated after the centering. Thus we can apply our aforementioned techniques to bound $\overline{X}$ and $\overline{Y}$ separately, yielding the sharp conditions for recovery.
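The decorrelation effect of centering can be verified exactly on a small example. The sketch below (our own construction, not code from the paper) takes $\pi=\mathrm{id}$ and the single $3$-edge orbit induced by the $3$-cycle $\sigma=(0\,1\,2)$, and computes $\mathrm{Cov}(X_{O},Y_{O})$ exactly by enumerating all $4^{3}$ joint edge states under the subsampling law (11):

```python
from itertools import product

def orbit_covariance(p, s, centered):
    """Exact Cov(X_O, Y_O) on the 3-edge orbit induced by the 3-cycle
    sigma = (0 1 2), taking pi = id, by enumerating all 4^3 edge states."""
    q = p * s
    # per-edge joint pmf of (A_e, B_e) under the subsampling model (11)
    pmf = {(1, 1): p * s * s, (1, 0): p * s * (1 - s),
           (0, 1): p * s * (1 - s), (0, 0): 1 - 2 * p * s + p * s * s}
    edges = [(0, 1), (1, 2), (0, 2)]
    succ = {(0, 1): (1, 2), (1, 2): (0, 2), (0, 2): (0, 1)}  # sigma^E on the orbit
    c = q if centered else 0.0
    EX = EY = EXY = 0.0
    for states in product(pmf, repeat=3):
        w = 1.0
        for st in states:
            w *= pmf[st]
        ab = dict(zip(edges, states))   # edge -> (A_e, B_e)
        X = sum((ab[succ[e]][0] - c) * (ab[e][1] - c) for e in edges)
        Y = sum((ab[e][0] - c) * (ab[e][1] - c) for e in edges)
        EX += w * X
        EY += w * Y
        EXY += w * X * Y
    return EXY - EX * EY
```

With centering the covariance vanishes exactly, whereas without centering it is strictly positive, illustrating why the centered statistics $\overline{X}$ and $\overline{Y}$ can be bounded separately.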

We remark that the partial recovery results in [HM20] are obtained by analyzing an estimator slightly different from the MLE, using the same MGF bound (16). However, there are two major differences that lead to the looseness of their results. First, their analysis does not bound $X$ and $Y$ separately. Second, the tilting parameter in their Chernoff bound is suboptimal.

1.3.4 Exact recovery

For exact recovery, we need to further consider $\pi^{\prime}$ that is close to $\pi$, i.e., $n-|F|=o(n)$. In this regime, the number of choices of $F$ is comparable to that of $\pi^{\prime}$. Hence, instead of bounding $X$ and $Y$ separately, it is more favorable to apply the Chernoff bound directly to the difference $X-Y$. Crucially, the moment generating function $\mathbb{E}\left[\exp\left(t(X_{O}-Y_{O})\right)\right]$ continues to satisfy the relation (16), and the bottleneck for exact recovery occurs at $|F|=n-2$, where $\pi^{\prime}$ differs from $\pi$ by a $2$-cycle (transposition).

Prompted by this observation, we prove a matching necessary condition for exact recovery by considering all permutations $\sigma\triangleq\pi^{-1}\circ\pi^{\prime}$ that consist of $n-2$ fixed points and a $2$-node cycle $(i,j)$ (transposition), in which case

$\left\langle A^{\pi^{\prime}}-A^{\pi},B\right\rangle=-\sum_{k\in[n]\setminus\{i,j\}}\left(A^{\pi}_{ik}-A^{\pi}_{jk}\right)\left(B_{ik}-B_{jk}\right).$ (17)
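The identity (17) is purely algebraic and can be checked numerically on a random instance. The following script (our own check, with weights stored as symmetric matrices) compares the left- and right-hand sides for a random $A$, $B$, $\pi$, and a fixed transposition $(i,j)$:

```python
import numpy as np

# Numerical check of identity (17) for pi' = pi composed with a transposition.
rng = np.random.default_rng(1)
n = 8

def sym(M):
    """Symmetrize an upper-triangular draw into a weighted adjacency matrix."""
    M = np.triu(M, 1)
    return M + M.T

A, B = sym(rng.standard_normal((n, n))), sym(rng.standard_normal((n, n)))
pi = rng.permutation(n)
i, j = 2, 5

pi2 = pi.copy()
pi2[i], pi2[j] = pi[j], pi[i]    # pi' = pi composed with (i j)

iu = np.triu_indices(n, 1)
lhs = ((A[np.ix_(pi2, pi2)] - A[np.ix_(pi, pi)])[iu] * B[iu]).sum()

A_pi = A[np.ix_(pi, pi)]         # (A^pi)_{uv} = A_{pi(u) pi(v)}
rhs = -sum((A_pi[i, k] - A_pi[j, k]) * (B[i, k] - B[j, k])
           for k in range(n) if k not in (i, j))
```

Only the $2(n-2)$ pairs containing exactly one of $i$ and $j$ contribute to the difference, since the pair $(i,j)$ itself is invariant under the swap.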

There remain two key challenges in concluding the existence of many choices of $(i,j)$ for which $\langle A^{\pi^{\prime}},B\rangle\geq\langle A^{\pi},B\rangle$. First, to derive a tight impossibility condition, we need to obtain a tight large-deviation lower estimate for this event. Second, the right-hand sides of (17) for different pairs $(i,j)$ are correlated. This dependency is addressed by restricting the choices of $(i,j)$ and applying a second moment computation.

Note that the impossibility proof of exact recovery for the Gaussian model in [Gan20] also considers the permutations that consist of a single transposition. The difference is that the large-deviation lower estimate simply follows from the Gaussian tail probability, and the correlations among different pairs $(i,j)$ are bounded by a second-moment calculation using densities of correlated Gaussians.

2 Impossibility of Partial Recovery

To start, we characterize the asymptotic value of the mutual information $I(A,B;\pi)$, a key quantity that measures the amount of information about $\pi$ provided by the observation $(A,B)$. By definition,

$I(A,B;\pi)\triangleq\mathbb{E}\left[D\left(\mathcal{P}_{A,B\mid\pi}\|\mathcal{P}_{A,B}\right)\right]=\mathbb{E}\left[D\left(\mathcal{P}_{A,B\mid\pi}\|\mathcal{Q}_{A,B}\right)\right]-D\left(\mathcal{P}_{A,B}\|\mathcal{Q}_{A,B}\right)$

for any joint distribution $\mathcal{Q}_{A,B}$ of $(A,B)$ such that $D\left(\mathcal{P}_{A,B}\|\mathcal{Q}_{A,B}\right)<\infty$. Note that $\mathcal{P}_{A,B\mid\pi}$ factorizes into a product distribution $\prod_{i<j}\mathcal{P}_{A_{\pi(i)\pi(j)},B_{ij}}=P^{\otimes\binom{n}{2}}$, where $P$ is the joint distribution of $(A_{\pi(i)\pi(j)},B_{ij})$. Thus, to exploit the tensorization property of the KL divergence, we choose $\mathcal{Q}_{A,B}$ to be a product distribution under which $A$ and $B$ are independent and $(A_{ij},B_{ij})$ are i.i.d. pairs of independent random variables with a joint distribution $Q$ with the same marginals as $P$. (We shall refer to this $\mathcal{Q}_{A,B}$ as the null model.) In particular, for the Gaussian (resp. Erdős-Rényi) model, $Q$ is the joint distribution of two independent standard normal (resp. ${\rm Bern}(q)$) random variables. Under this choice, we have $D\left(\mathcal{P}_{A,B\mid\pi}\|\mathcal{Q}_{A,B}\right)=\binom{n}{2}D(P\|Q)=\binom{n}{2}I(P)$ and hence

$I(A,B;\pi)=\binom{n}{2}I(P)-D\left(\mathcal{P}_{A,B}\|\mathcal{Q}_{A,B}\right).$

By the non-negativity of the KL divergence, we have $I(A,B;\pi)\leq\binom{n}{2}I(P)$. This bound turns out to be tight, as made precise by the following proposition.

Proposition 1.

It holds that

$I(A,B;\pi)=\binom{n}{2}I(P)-\zeta_{n},$

where

  • $\zeta_{n}=o(1)$ in the Gaussian model with $\rho^{2}\leq\frac{(4-\epsilon)\log n}{n}$;

  • $\zeta_{n}=o(1)$ in dense Erdős-Rényi graphs with $p=n^{-o(1)}$ and $nps^{2}\left(\log(1/p)-1+p\right)\leq(2-\epsilon)\log n$;

  • $\zeta_{n}=O(\log n)$ in sparse Erdős-Rényi graphs with $p=n^{-\Omega(1)}$, $np=\omega(1)$, and $nps^{2}\leq 1-\epsilon$;

for some arbitrarily small but fixed constant $\epsilon>0$.

Given the tight characterization of the mutual information in Proposition 1, we now relate it to the Bayes risk. Using the chain rule, we have

$I(A,B;\pi)=I(B;\pi\mid A)=I(B;A^{\pi}\mid A),$

where the second equality follows from the fact that $A\to A^{\pi}\to B$ forms a Markov chain. The intuition is that conditioned on $A$, $B$ is a noisy observation of $A^{\pi}$ (which is random owing to $\pi$). In such a situation, the mutual information can typically be related to an integral of the reconstruction error of the signal $A^{\pi}$. To make this precise, we first introduce a parametric model $\mathcal{P}_{\theta}$ that interpolates between the planted model $\mathcal{P}$ and the null model $\mathcal{Q}$ as $\theta$ varies. We write $\mathbb{E}_{\theta}$ to indicate expectation taken with respect to the law $\mathcal{P}_{\theta}$.

For the Gaussian model, let $\mathcal{P}_{\theta}$ denote the model under which $B=\sqrt{\theta}A^{\pi}+\sqrt{1-\theta}Z$, where $A,Z$ are two independent standard normal vectors and $\theta\in[0,1]$. Then $\theta=\rho^{2}$ corresponds to the planted model $\mathcal{P}$, while $\theta=0$ corresponds to the null model $\mathcal{Q}$. As $\theta$ increases from $0$ to $\rho^{2}$, $\mathcal{P}_{\theta}$ interpolates between $\mathcal{Q}$ and $\mathcal{P}$. Let

$\mathrm{mmse}_{\theta}(A^{\pi})\triangleq\mathbb{E}_{\theta}\left[\|A^{\pi}-\mathbb{E}_{\theta}[A^{\pi}\mid A,B]\|^{2}\right]$ (18)

denote the minimum mean-squared error (MMSE) of estimating AπA^{\pi} based on (A,B)(A,B) distributed according to 𝒫θ{\mathcal{P}}_{\theta}. The following proposition follows from the celebrated I-MMSE formula [GSV05].

Proposition 2 (Gaussian model).
I(A,B;π)=120ρ2mmseθ(Aπ)(1θ)2𝑑θ.I(A,B;\pi)=\frac{1}{2}\int_{0}^{\rho^{2}}\frac{\mathrm{mmse}_{\theta}(A^{\pi})}{(1-\theta)^{2}}d\theta\,.

The correlated Erdős-Rényi graph model requires more effort. Let us fix q=psq=ps and consider the following coupling PθP_{\theta} between two Bern(q){\rm Bern}(q) random variables with joint probability mass function pθp_{\theta}, where pθ(11)=qθp_{\theta}(11)=q\theta, pθ(01)=pθ(10)=q(1θ)p_{\theta}(01)=p_{\theta}(10)=q(1-\theta), and pθ(00)=1(2θ)qp_{\theta}(00)=1-(2-\theta)q, with θ[q,s]\theta\in[q,s]. Let 𝒫θ{\mathcal{P}}_{\theta} denote the interpolated model under which (Aπ(i)π(j),Bij)\left(A_{\pi(i)\pi(j)},B_{ij}\right) are i.i.d. pairs of correlated random variables with joint distribution PθP_{\theta}. As θ\theta increases from qq to ss, 𝒫θ{\mathcal{P}}_{\theta} interpolates between the null model 𝒬=𝒫q{\mathcal{Q}}={\mathcal{P}}_{q} and the planted model 𝒫=𝒫s{\mathcal{P}}={\mathcal{P}}_{s}. We have the following area theorem that relates I(A,B;π)I(A,B;\pi) to the MMSE of AπA^{\pi}.
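As a quick numerical sanity check on this interpolation (not part of the proof; the values of p and s below are illustrative), one can verify that p_θ is a valid joint pmf with Bern(q) marginals for every θ∈[q,s], that θ=q renders the two coordinates independent, and that θ=s reproduces the planted correlation p_θ(11)=ps²:

```python
p, s = 0.1, 0.05
q = p * s

def p_theta(theta, q):
    # joint pmf over (a, b) in {0,1}^2 for the interpolating coupling P_theta
    return {
        (1, 1): q * theta,
        (0, 1): q * (1 - theta),
        (1, 0): q * (1 - theta),
        (0, 0): 1 - (2 - theta) * q,
    }

for theta in [q, (q + s) / 2, s]:
    pmf = p_theta(theta, q)
    assert abs(sum(pmf.values()) - 1) < 1e-12            # valid pmf
    assert all(v >= 0 for v in pmf.values())
    marg_a = pmf[(1, 1)] + pmf[(1, 0)]
    marg_b = pmf[(1, 1)] + pmf[(0, 1)]
    assert abs(marg_a - q) < 1e-12 and abs(marg_b - q) < 1e-12  # Bern(q) marginals

# theta = q: independence (null model Q); theta = s: planted model P
pmf_q = p_theta(q, q)
assert abs(pmf_q[(1, 1)] - q * q) < 1e-12
pmf_s = p_theta(s, q)
assert abs(pmf_s[(1, 1)] - p * s * s) < 1e-12
```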

Proposition 3 (Erdős-Rényi random graph).

It holds that

I(A,B;π)(n2)I(P)+(n2)qs2+qsθqs(1q)2(mmseθ(Aπ)(n2)q(1q))𝑑θ.\displaystyle I(A,B;\pi)\leq\binom{n}{2}I(P)+\binom{n}{2}qs^{2}+\int_{q}^{s}\frac{\theta-q}{s(1-q)^{2}}\left(\mathrm{mmse}_{\theta}(A^{\pi})-\binom{n}{2}q(1-q)\right)d\theta.

Finally, we relate the estimation error of AπA^{\pi} to that of π\pi.

Proposition 4.

In both the Gaussian and Erdős-Rényi graph models, if

mmseθ(Aπ)𝔼[A2](1ξ),\displaystyle\mathrm{mmse}_{\theta}(A^{\pi})\geq\mathbb{E}\left[\|{A}\|^{2}\right](1-\xi), (19)

for some ξ>0\xi>0, then for any estimator π^=π^(A,B)\widehat{\pi}=\widehat{\pi}(A,B),

𝔼θ[𝗈𝗏𝖾𝗋𝗅𝖺𝗉(π^,π)]\displaystyle\mathbb{E}_{\theta}[\mathsf{overlap}(\widehat{\pi},\pi)] O(ξ1/4+(nlogn𝔼[A2])1/4).\displaystyle\leq O\left(\xi^{1/4}+\left(\frac{n\log n}{\mathbb{E}\left[\|{A}\|^{2}\right]}\right)^{1/4}\right).

Now, we are ready to prove the negative results on partial recovery. We start with the Gaussian case.

Proof of Theorem 1.

In the Gaussian model, we have

I(P)=D(𝒩((00),(1ρρ1))𝒩((00),(1001)))=12log11ρ2.I(P)=D\left({\mathcal{N}}\Big{(}\left(\begin{smallmatrix}0\\ 0\end{smallmatrix}\right),\left(\begin{smallmatrix}1&\rho\\ \rho&1\end{smallmatrix}\right)\Big{)}\Big{\|}{\mathcal{N}}\Big{(}\left(\begin{smallmatrix}0\\ 0\end{smallmatrix}\right),\left(\begin{smallmatrix}1&0\\ 0&1\end{smallmatrix}\right)\Big{)}\right)=\frac{1}{2}\log\frac{1}{1-\rho^{2}}.

Assume that ρ2=(4ϵ/2)lognn\rho^{2}=(4-\epsilon/2)\frac{\log n}{n}. Fix some θ0(0,ρ2)\theta_{0}\in(0,\rho^{2}) to be chosen. Then

(n2)12log11ρ2ζn\displaystyle\binom{n}{2}\frac{1}{2}\log\frac{1}{1-\rho^{2}}-\zeta_{n} =(a)I(A,B;π)\displaystyle\overset{\rm(a)}{=}I(A,B;\pi)
(b)12(1ρ2)20ρ2mmseθ(Aπ)dθ\displaystyle\overset{\rm(b)}{\leq}\frac{1}{2(1-\rho^{2})^{2}}\int_{0}^{\rho^{2}}\mathrm{mmse}_{\theta}(A^{\pi})d\theta
(c)12(1ρ2)2((n2)θ0+mmseθ0(Aπ)(ρ2θ0))\displaystyle\overset{\rm(c)}{\leq}\frac{1}{2(1-\rho^{2})^{2}}\left(\binom{n}{2}\theta_{0}+\mathrm{mmse}_{\theta_{0}}(A^{\pi})(\rho^{2}-\theta_{0})\right)
=12(1ρ2)2((n2)ρ2+(mmseθ0(Aπ)(n2))(ρ2θ0))\displaystyle=\frac{1}{2(1-\rho^{2})^{2}}\left(\binom{n}{2}\rho^{2}+\left(\mathrm{mmse}_{\theta_{0}}(A^{\pi})-\binom{n}{2}\right)(\rho^{2}-\theta_{0})\right)

where ζn=o(1)\zeta_{n}=o(1) and (a)(a) holds by Proposition 1; (b)(b) follows from the I-MMSE formula given in Proposition 2; (c)(c) holds because mmseθ(Aπ)𝔼[A22]=(n2)\mathrm{mmse}_{\theta}(A^{\pi})\leq\mathbb{E}\left[\|A\|_{2}^{2}\right]=\binom{n}{2} and the fact that mmseθ(Aπ)\mathrm{mmse}_{\theta}(A^{\pi}) is monotone decreasing in θ\theta. Rearranging the terms in the last displayed equation, we get

mmseθ0(Aπ)(n2)\displaystyle\mathrm{mmse}_{\theta_{0}}(A^{\pi})-\binom{n}{2} (1ρ2)2ρ2θ0((n2)(log11ρ2ρ2(1ρ2)2)2ζn)\displaystyle\geq\frac{(1-\rho^{2})^{2}}{\rho^{2}-\theta_{0}}\left(\binom{n}{2}\left(\log\frac{1}{1-\rho^{2}}-\frac{\rho^{2}}{\left(1-\rho^{2}\right)^{2}}\right)-2\zeta_{n}\right)
(1ρ2)2ρ2θ0((n2)2ρ4(1ρ2)2+2ζn),\displaystyle\geq-\frac{(1-\rho^{2})^{2}}{\rho^{2}-\theta_{0}}\left(\binom{n}{2}\frac{2\rho^{4}}{(1-\rho^{2})^{2}}+2\zeta_{n}\right),

where the last inequality holds because log(1+x)xx2\log(1+x)\geq x-x^{2} for x0x\geq 0. Choosing θ0=(4ϵ)lognn\theta_{0}=(4-\epsilon)\frac{\log n}{n}, we conclude that

mmseθ0(Aπ)(n2)(1O(ρ2+ζnn2ρ2))=(n2)(1O(lognn)),\displaystyle\mathrm{mmse}_{\theta_{0}}(A^{\pi})\geq\binom{n}{2}\left(1-O\left(\rho^{2}+\frac{\zeta_{n}}{n^{2}\rho^{2}}\right)\right)=\binom{n}{2}\left(1-O\left(\frac{\log n}{n}\right)\right), (20)

where the last equality holds because ρ2=Θ(log(n)/n)\rho^{2}=\Theta(\log(n)/n) and ζn=o(1)\zeta_{n}=o(1). Since 𝔼[A22]=(n2)\mathbb{E}\left[\|A\|_{2}^{2}\right]=\binom{n}{2}, it follows from Proposition 4 that

𝔼θ0[𝗈𝗏𝖾𝗋𝗅𝖺𝗉(π^,π)]O((lognn)1/4).\mathbb{E}_{\theta_{0}}[\mathsf{overlap}(\widehat{\pi},\pi)]\leq O\left(\left(\frac{\log n}{n}\right)^{1/4}\right).

Finally, by Markov’s inequality, for any δn=ω((lognn)1/4)\delta_{n}=\omega\left(\left(\frac{\log n}{n}\right)^{1/4}\right) (in particular, a fixed constant δn>0\delta_{n}>0), 𝒫θ0{𝗈𝗏𝖾𝗋𝗅𝖺𝗉(π^,π)δn}=o(1){\mathcal{P}}_{\theta_{0}}\{\mathsf{overlap}(\widehat{\pi},\pi)\geq\delta_{n}\}=o(1). Note that 𝒫θ0{\mathcal{P}}_{\theta_{0}} corresponds to the Gaussian model with squared correlation coefficient equal to θ0=(4ϵ)lognn\theta_{0}=(4-\epsilon)\frac{\log n}{n}. By the arbitrariness of ϵ\epsilon, this completes the proof of Theorem 1. ∎

Next, we move to the Erdős-Rényi graph model.

Proof of the negative parts in Theorems 2 and 3.

Let s2=(2ϵ)lognnp(log(1/p)1+p)s^{2}=\frac{(2-\epsilon)\log n}{np(\log(1/p)-1+p)} in the dense regime and s2=1ϵnps^{2}=\frac{1-\epsilon}{np} in the sparse regime. Then we get that

ζn(n2)qs2\displaystyle-\zeta_{n}-\binom{n}{2}qs^{2} (a)qsθqs(1q)2(mmseθ(Aπ)(n2)q(1q))dθ\displaystyle\overset{\rm(a)}{\leq}\int_{q}^{s}\frac{\theta-q}{s(1-q)^{2}}\left(\mathrm{mmse}_{\theta}(A^{\pi})-\binom{n}{2}q(1-q)\right)d\theta
(b)(1ϵ)ssθqs(1q)2(mmse(1ϵ)s(Aπ)(n2)q(1q))dθ\displaystyle\overset{\rm(b)}{\leq}\int_{(1-\epsilon)s}^{s}\frac{\theta-q}{s(1-q)^{2}}\left(\mathrm{mmse}_{(1-\epsilon)s}(A^{\pi})-\binom{n}{2}q(1-q)\right)d\theta
=s2(2ϵϵ2)2ϵsq2s(1q)2=Θ(s)(mmse(1ϵ)s(Aπ)(n2)q(1q))\displaystyle=\underbrace{\frac{s^{2}(2\epsilon-\epsilon^{2})-2\epsilon sq}{2s(1-q)^{2}}}_{=\Theta(s)}\left(\mathrm{mmse}_{(1-\epsilon)s}(A^{\pi})-\binom{n}{2}q(1-q)\right)

where ζn=o(1)\zeta_{n}=o(1) in the dense regime and ζn=O(logn)\zeta_{n}=O(\log n) in the sparse regime; (a)(a) follows from Proposition 1 and Proposition 3; (b)(b) holds because mmseθ(Aπ)𝔼[A𝔼[A]22]=(n2)q(1q)\mathrm{mmse}_{\theta}(A^{\pi})\leq\mathbb{E}\left[\|A-\mathbb{E}\left[A\right]\|_{2}^{2}\right]=\binom{n}{2}q(1-q) and mmseθ(Aπ)\mathrm{mmse}_{\theta}(A^{\pi}) is monotone decreasing in θ\theta. (The monotonicity follows from a simulation argument. Let (A,B)𝒫θ(A,B)\sim{\mathcal{P}}_{\theta} and fix θ\theta^{\prime} such that q<θ<θ<sq<\theta^{\prime}<\theta<s. Define B=(Bij)B^{\prime}=(B^{\prime}_{ij}) by passing each BijB_{ij} independently through the same (asymmetric) channel WW to obtain BijB^{\prime}_{ij}, where W(0|1)=(1q)(θθ)θqW(0|1)=\frac{(1-q)(\theta-\theta^{\prime})}{\theta-q} and W(1|0)=q(θθ)θqW(1|0)=\frac{q(\theta-\theta^{\prime})}{\theta-q} are well-defined. Then (A,B)𝒫θ(A,B^{\prime})\sim{\mathcal{P}}_{\theta^{\prime}}.) Rearranging the terms in the last displayed equation, we conclude that

mmse(1ϵ)s(Aπ)\displaystyle\mathrm{mmse}_{(1-\epsilon)s}(A^{\pi}) (n2)q(1q)(1O(ζnn2qs+s))\displaystyle\geq\binom{n}{2}q(1-q)\left(1-O\left(\frac{\zeta_{n}}{n^{2}qs}+s\right)\right)
(n2)q(1O(s)),\displaystyle\geq\binom{n}{2}q\left(1-O(s)\right),

where the last inequality holds because qsq\leq s, ζn=O(n2qs2)\zeta_{n}=O(n^{2}qs^{2}) and s2lognnplog(1/p)s^{2}\asymp\frac{\log n}{np\log(1/p)}. Since 𝔼[A22]=(n2)q\mathbb{E}\left[\|A\|_{2}^{2}\right]=\binom{n}{2}q, it follows from Proposition 4 that

𝔼(1ϵ)s[𝗈𝗏𝖾𝗋𝗅𝖺𝗉(π^,π)]O(s1/4+(lognnq)1/4)=O((lognnq)1/4),\mathbb{E}_{(1-\epsilon)s}[\mathsf{overlap}(\widehat{\pi},\pi)]\leq O\left(s^{1/4}+\left(\frac{\log n}{nq}\right)^{1/4}\right)=O\left(\left(\frac{\log n}{nq}\right)^{1/4}\right),

where the last equality holds because nqs=nps2=O(logn)nqs=nps^{2}=O(\log n). Note that in the dense regime, since s2lognnp(log1p1+p)s^{2}\asymp\frac{\log n}{np(\log\frac{1}{p}-1+p)} and p=no(1)p=n^{-o(1)}, we have nq=nps=ω(logn)nq=nps=\omega(\log n). This also holds in the sparse regime when s21nps^{2}\asymp\frac{1}{np} under the extra assumption that np=ω(log2n)np=\omega\left(\log^{2}n\right). Thus, by Markov’s inequality, for any δn=ω((lognnq)1/4)\delta_{n}=\omega\left(\left(\frac{\log n}{nq}\right)^{1/4}\right), in particular, any fixed constant δn>0\delta_{n}>0, we have 𝒫(1ϵ)s{𝗈𝗏𝖾𝗋𝗅𝖺𝗉(π^,π)δn}=o(1){\mathcal{P}}_{(1-\epsilon)s}\{\mathsf{overlap}(\widehat{\pi},\pi)\geq\delta_{n}\}=o(1). In other words, we have shown the desired impossibility result under the distribution 𝒫(1ϵ)s{\mathcal{P}}_{(1-\epsilon)s}, which corresponds to the correlated Erdős-Rényi model with parameters p=p1ϵp^{\prime}=\frac{p}{1-\epsilon} and s=s(1ϵ)s^{\prime}=s(1-\epsilon). By the arbitrariness of ϵ\epsilon, this completes the proof. ∎

2.1 Proof of Proposition 1

In this subsection, we prove Proposition 1, which reduces to bounding D(𝒫A,B𝒬A,B)D\left({\mathcal{P}}_{A,B}\|{\mathcal{Q}}_{A,B}\right). It is well known that the KL divergence can be bounded by the χ2\chi^{2}-divergence (the variance of the likelihood ratio). This bound, however, is often too loose, as the second moment can be derailed by rare events. A more robust approach is via a truncated second moment, which has been carried out in [WXY20] to bound TV(𝒫A,B,𝒬A,B)\mathrm{TV}\left({\mathcal{P}}_{A,B},{\mathcal{Q}}_{A,B}\right) for the hypothesis testing problem in graph matching. Here we leverage the same construction to bound the KL divergence. To this end, we first present a general bound and then specialize it to both the Gaussian and Erdős-Rényi models.

Lemma 1.

Let 𝒫XY{\mathcal{P}}_{XY} denote the joint distribution of (X,Y)(X,Y). Let {\mathcal{E}} be an event independent of XX such that 𝒫()=1δ{\mathcal{P}}({\mathcal{E}})=1-\delta. Let 𝒬Y{\mathcal{Q}}_{Y} be an auxiliary distribution such that 𝒫Y|X𝒬Y{\mathcal{P}}_{Y|X}\ll{\mathcal{Q}}_{Y} 𝒫X{\mathcal{P}}_{X}-a.s. Then

D(𝒫Y𝒬Y)log(1+χ2(𝒫Y|𝒬Y))+δ(log1δ+𝔼[D(𝒫Y|X𝒬Y)])+δVar(logd𝒫Y|Xd𝒬Y),D({\mathcal{P}}_{Y}\|{\mathcal{Q}}_{Y})\leq\log(1+\chi^{2}({\mathcal{P}}_{Y|{\mathcal{E}}}\|{\mathcal{Q}}_{Y}))+\delta\left(\log\frac{1}{\delta}+\mathbb{E}[D({\mathcal{P}}_{Y|X}\|{\mathcal{Q}}_{Y})]\right)+\sqrt{\delta\cdot\mathrm{Var}\left(\log\frac{d{\mathcal{P}}_{Y|X}}{d{\mathcal{Q}}_{Y}}\right)}, (21)

where 𝒫Y|{\mathcal{P}}_{Y|{\mathcal{E}}} denotes the distribution of YY conditioned on {\mathcal{E}}, and the χ2\chi^{2}-divergence is defined as χ2(PQ)=𝔼Q[(dPdQ)2]1\chi^{2}(P\|Q)=\mathbb{E}_{Q}[(\frac{dP}{dQ})^{2}]-1 if PQP\ll Q and \infty otherwise.

Proof.

Note that 𝒫Y=(1δ)𝒫Y|+δ𝒫Y|c{\mathcal{P}}_{Y}=(1-\delta){\mathcal{P}}_{Y|{\mathcal{E}}}+\delta{\mathcal{P}}_{Y|{\mathcal{E}}^{c}}. Thanks to the convexity of the KL divergence, Jensen’s inequality yields

D(𝒫Y𝒬Y)(1δ)D(𝒫Y|𝒬Y)+δD(𝒫Y|c𝒬Y).D({\mathcal{P}}_{Y}\|{\mathcal{Q}}_{Y})\leq(1-\delta)D({\mathcal{P}}_{Y|{\mathcal{E}}}\|{\mathcal{Q}}_{Y})+\delta D({\mathcal{P}}_{Y|{\mathcal{E}}^{c}}\|{\mathcal{Q}}_{Y}).

The first term can be bounded using the generic fact that D(PQ)log𝔼Q[(dPdQ)2]=log(1+χ2(PQ))D(P\|Q)\leq\log\mathbb{E}_{Q}[(\frac{dP}{dQ})^{2}]=\log(1+\chi^{2}(P\|Q)). Let g(X,Y)=logd𝒫Y|Xd𝒬Yg(X,Y)=\log\frac{d{\mathcal{P}}_{Y|X}}{d{\mathcal{Q}}_{Y}}. Using the convexity of KL divergence and the independence of {\mathcal{E}} and XX, we bound the second term as follows:

D(𝒫Y|c𝒬Y)\displaystyle D({\mathcal{P}}_{Y|{\mathcal{E}}^{c}}\|{\mathcal{Q}}_{Y})\leq 𝔼[D(𝒫Y|X,c𝒬Y)]\displaystyle~{}\mathbb{E}[D({\mathcal{P}}_{Y|X,{\mathcal{E}}^{c}}\|{\mathcal{Q}}_{Y})]
=\displaystyle= 𝔼[𝑑𝒫Y|X,clog(d𝒫Y|X,cd𝒬Y)]\displaystyle~{}\mathbb{E}\left[\int d{\mathcal{P}}_{Y|X,{\mathcal{E}}^{c}}\log\left(\frac{d{\mathcal{P}}_{Y|X,{\mathcal{E}}^{c}}}{d{\mathcal{Q}}_{Y}}\right)\right]
=\displaystyle= 𝔼[𝑑𝒫Y|X𝟏{(X,Y)c}𝒫(c)log(d𝒫Y|Xd𝒬Y𝒫(c))]\displaystyle~{}\mathbb{E}\left[\int d{\mathcal{P}}_{Y|X}\frac{{\mathbf{1}_{\left\{{(X,Y)\in{\mathcal{E}}^{c}}\right\}}}}{{\mathcal{P}}({\mathcal{E}}^{c})}\log\left(\frac{d{\mathcal{P}}_{Y|X}}{d{\mathcal{Q}}_{Y}{\mathcal{P}}({\mathcal{E}}^{c})}\right)\right]
=\displaystyle= log1δ+1δ𝔼[g(X,Y)𝟏{(X,Y)c}]~{}\log\frac{1}{\delta}+\frac{1}{\delta}\mathbb{E}[g(X,Y){\mathbf{1}_{\left\{{(X,Y)\in{\mathcal{E}}^{c}}\right\}}}]
=\displaystyle= log1δ+𝔼[g(X,Y)]+1δ𝔼[(g(X,Y)𝔼[g(X,Y)])𝟏{(X,Y)c}].\displaystyle~{}\log\frac{1}{\delta}+\mathbb{E}[g(X,Y)]+\frac{1}{\delta}\mathbb{E}\left[\left(g(X,Y)-\mathbb{E}[g(X,Y)]\right){\mathbf{1}_{\left\{{(X,Y)\in{\mathcal{E}}^{c}}\right\}}}\right].

Applying Cauchy-Schwarz to the last term completes the proof. ∎
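To see the moving parts of Lemma 1 in action, the following sketch evaluates both sides on a toy binary example (all distributions here are illustrative choices, not from the paper); the event ℰ is constructed to be independent of X by thinning the event {X=Y}:

```python
import math

# Toy setup: X ~ Bern(0.3); P(Y=1|X=1)=0.9, P(Y=1|X=0)=0.2; Q_Y = Bern(0.4).
px = {1: 0.3, 0: 0.7}
py_x = {1: 0.9, 0: 0.2}          # P(Y=1 | X=x)
qy = {1: 0.4, 0: 0.6}

# Event E: keep (0,0) w.p. 1 and (1,1) w.p. 8/9, so P(E|X=0)=P(E|X=1)=0.8.
keep = {(0, 0): 1.0, (1, 1): 8.0 / 9.0, (0, 1): 0.0, (1, 0): 0.0}

pxy = {(x, y): px[x] * (py_x[x] if y == 1 else 1 - py_x[x])
       for x in (0, 1) for y in (0, 1)}
delta = 1 - sum(pxy[e] * keep[e] for e in pxy)            # P(E^c)
# E is independent of X:
p_e_given_x1 = sum(pxy[(1, y)] * keep[(1, y)] for y in (0, 1)) / px[1]
assert abs(p_e_given_x1 - (1 - delta)) < 1e-12

def kl(p, q):  # KL divergence between {outcome: prob} dicts
    return sum(p[a] * math.log(p[a] / q[a]) for a in p if p[a] > 0)

py = {y: pxy[(0, y)] + pxy[(1, y)] for y in (0, 1)}
py_e = {y: sum(pxy[(x, y)] * keep[(x, y)] for x in (0, 1)) / (1 - delta)
        for y in (0, 1)}
chi2 = sum(py_e[y] ** 2 / qy[y] for y in (0, 1)) - 1

g = {(x, y): math.log((py_x[x] if y == 1 else 1 - py_x[x]) / qy[y])
     for (x, y) in pxy}
mean_g = sum(pxy[e] * g[e] for e in pxy)                  # = E[D(P_{Y|X} || Q_Y)]
var_g = sum(pxy[e] * (g[e] - mean_g) ** 2 for e in pxy)

lhs = kl(py, qy)
rhs = math.log(1 + chi2) + delta * (math.log(1 / delta) + mean_g) \
      + math.sqrt(delta * var_g)
assert lhs <= rhs                                         # the bound (21)
```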

Next we apply Lemma 1 in the context of the random graph matching by identifying XX and YY with the latent π\pi and the observation (A,B)(A,B) respectively. Let {\mathcal{E}} be a certain high-probability event independent of π\pi. Then

𝒫A,B|=1𝒫()n!π𝒮n𝒫A,Bπ𝟏{(A,B,π)}.{\mathcal{P}}_{A,B|{\mathcal{E}}}=\frac{1}{{\mathcal{P}}({\mathcal{E}})\ n!}\sum_{\pi\in{\mathcal{S}}_{n}}{\mathcal{P}}_{A,B\mid\pi}{\mathbf{1}_{\left\{{(A,B,\pi)\in{\mathcal{E}}}\right\}}}.

Recall that the null model is chosen to be 𝒬A,B=𝒫A𝒫B{\mathcal{Q}}_{A,B}={\mathcal{P}}_{A}\otimes{\mathcal{P}}_{B}. As shown in [WXY20], for both the Gaussian and Erdős-Rényi graph model, it is possible to construct a high-probability event {\mathcal{E}} satisfying the symmetry condition 𝒫(π)=𝒫(){\mathcal{P}}({\mathcal{E}}\mid\pi)={\mathcal{P}}({\mathcal{E}}), such that χ2(𝒫A,B|𝒬A,B)=o(1)\chi^{2}({\mathcal{P}}_{A,B|{\mathcal{E}}}\|{\mathcal{Q}}_{A,B})=o(1). This bounds the first term in (21). For the second term, since both AA and BB are individually independent of π\pi, we have

𝔼[logd𝒫A,Bπd𝒬A,B]=I(A;B|π)=(n2)D(PQ)=(n2)I(P),\mathbb{E}\left[\log\frac{d{\mathcal{P}}_{A,B\mid\pi}}{d{\mathcal{Q}}_{A,B}}\right]=I(A;B|\pi)=\binom{n}{2}D(P\|Q)=\binom{n}{2}I(P), (22)

where the last equality holds because QQ is the product of the marginals of PP. The third term in (21) can be computed explicitly. Next we give details on how to complete the proof of Proposition 1.

Gaussian model

It is shown in [WXY20, Section 4.1, Lemma 1] that there exists an event {\mathcal{E}} independent of π\pi such that 𝒫(c)=eΩ(n){\mathcal{P}}({\mathcal{E}}^{c})=e^{-\Omega(n)} and χ2(𝒫𝒬)=o(1),\chi^{2}\left({\mathcal{P}}_{\mathcal{E}}\|{\mathcal{Q}}\right)=o(1), provided that ρ2(4ϵ)lognn\rho^{2}\leq\frac{(4-\epsilon)\log n}{n}. Furthermore, by (22) and (7), we have 𝔼[logd𝒫A,Bπd𝒬A,B]=(n2)12log11ρ2=O(nlogn)\mathbb{E}\left[\log\frac{d{\mathcal{P}}_{A,B\mid\pi}}{d{\mathcal{Q}}_{A,B}}\right]=\binom{n}{2}\frac{1}{2}\log\frac{1}{1-\rho^{2}}=O(n\log n). To compute the variance, note that

logd𝒫A,Bπd𝒬A,B=12(n2)log(1ρ2)h(A,B,π)2(1ρ2),\log\frac{d{\mathcal{P}}_{A,B\mid\pi}}{d{\mathcal{Q}}_{A,B}}=-\frac{1}{2}\binom{n}{2}\log(1-\rho^{2})-\frac{h(A,B,\pi)}{2(1-\rho^{2})},

where

h(A,B,π)ρ2A2+ρ2B22ρAπ,B.h(A,B,\pi)\triangleq\rho^{2}\|{A}\|^{2}+\rho^{2}\|{B}\|^{2}-2\rho\left\langle A^{\pi},B\right\rangle.

Thus Var(logd𝒫A,Bπd𝒬A,B)=14(1ρ2)2Var(h(A,B,π))\mathrm{Var}(\log\frac{d{\mathcal{P}}_{A,B\mid\pi}}{d{\mathcal{Q}}_{A,B}})=\frac{1}{4(1-\rho^{2})^{2}}\mathrm{Var}(h(A,B,\pi)). Writing B=ρAπ+1ρ2ZB=\rho A^{\pi}+\sqrt{1-\rho^{2}}Z, where ZZ is an independent copy of AA, we have h(A,B,π)=ρ2(B2A2)2ρ1ρ2Aπ,Zh(A,B,\pi)=\rho^{2}(\|{B}\|^{2}-\|{A}\|^{2})-2\rho\sqrt{1-\rho^{2}}\left\langle A^{\pi},Z\right\rangle. Here both A2\|{A}\|^{2} and B2\|{B}\|^{2} are distributed as χ(n2)2\chi^{2}_{\binom{n}{2}}, with variance equal to 2(n2)2\binom{n}{2}. Furthermore, Var(Aπ,Z)=(n2)\mathrm{Var}(\left\langle A^{\pi},Z\right\rangle)=\binom{n}{2}. Thus Var(h(A,B,π))=O(nlogn)\mathrm{Var}(h(A,B,\pi))=O(n\log n). Applying Lemma 1, we conclude that D(𝒫A,B𝒬A,B)=o(1)D({\mathcal{P}}_{A,B}\|{\mathcal{Q}}_{A,B})=o(1).

Erdős-Rényi Graphs

In the dense regime of p=no(1)p=n^{-o(1)} and p=1Ω(1)p=1-\Omega(1), it is shown in [WXY20, Section A.3, Lemma 6] that there exists an event {\mathcal{E}} such that 𝒫(c)=en1o(1){\mathcal{P}}({\mathcal{E}}^{c})=e^{-n^{1-o(1)}} and χ2(𝒫𝒬)=o(1)\chi^{2}\left({\mathcal{P}}_{\mathcal{E}}\|{\mathcal{Q}}\right)=o(1), provided that nps2(log(1/p)1+p)(2ϵ)log(n)nps^{2}\left(\log(1/p)-1+p\right)\leq(2-\epsilon)\log(n). In the sparse regime (see [WXY20, Section 5, Lemma 2]), it is possible to choose {\mathcal{E}} such that 𝒫(c)=O(1n){\mathcal{P}}({\mathcal{E}}^{c})=O(\frac{1}{n}) and χ2(𝒫𝒬)=o(1)\chi^{2}\left({\mathcal{P}}_{\mathcal{E}}\|{\mathcal{Q}}\right)=o(1), provided that nps21ϵnps^{2}\leq 1-\epsilon and np=ω(1)np=\omega(1).

By (22) and (8), we have 𝔼[logd𝒫A,Bπd𝒬A,B]=(n2)I(P)\mathbb{E}\left[\log\frac{d{\mathcal{P}}_{A,B\mid\pi}}{d{\mathcal{Q}}_{A,B}}\right]=\binom{n}{2}I(P), where I(P)=qd(sq)+(1q)d(ηq)I(P)=qd(s\|q)+(1-q)d(\eta\|q), with q=psq=ps and η=q(1s)1q\eta=\frac{q(1-s)}{1-q}. In both the dense and sparse regime, one can verify that

I(P)=s2p(p1+log1p)(1+o(1)).I(P)=s^{2}p\left(p-1+\log\frac{1}{p}\right)(1+o(1)).

As a result, we have 𝔼[logd𝒫A,Bπd𝒬A,B]=O(nlogn)\mathbb{E}\left[\log\frac{d{\mathcal{P}}_{A,B\mid\pi}}{d{\mathcal{Q}}_{A,B}}\right]=O(n\log n) in both cases.
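The asymptotic expression for I(P) can be spot-checked numerically (a minimal sketch, not part of the proof; the values of p and s and the 5% tolerance are illustrative):

```python
import math

def d_kl(a, b):
    # binary KL divergence d(a || b)
    return a * math.log(a / b) + (1 - a) * math.log((1 - a) / (1 - b))

p, s = 1e-3, 1e-2
q = p * s
eta = q * (1 - s) / (1 - q)
i_p = q * d_kl(s, q) + (1 - q) * d_kl(eta, q)          # exact I(P)
approx = s ** 2 * p * (p - 1 + math.log(1 / p))        # claimed asymptotic
assert abs(i_p / approx - 1) < 0.05
```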

It remains to bound the variance in (21). Note that

log𝒫(A,Bπ)𝒬(A,B)=(n2)log1η1ps+h(A,B,π),\log\frac{{\mathcal{P}}(A,B\mid\pi)}{{\mathcal{Q}}(A,B)}=\binom{n}{2}\log\frac{1-\eta}{1-ps}+h(A,B,\pi),

where

h(A,B,π)log1s1η(A2+B2)+logs(1η)η(1s)Aπ,B.h(A,B,\pi)\triangleq\log\frac{1-s}{1-\eta}\left(\|{A}\|^{2}+\|{B}\|^{2}\right)+\log\frac{s(1-\eta)}{\eta(1-s)}\left\langle A^{\pi},B\right\rangle.

Since pp is bounded away from 11 and s=o(1)s=o(1) in both the dense regime (p=no(1)p=n^{-o(1)}) and sparse regime (p=nΩ(1)p=n^{-\Omega(1)} and np=ω(1)np=\omega(1)), it follows that

log1η1s\displaystyle\log\frac{1-\eta}{1-s} =log(1+s(1p)(1s)(1ps))=(1+o(1))s(1p)\displaystyle=\log\left(1+\frac{s(1-p)}{(1-s)(1-ps)}\right)=\left(1+o(1)\right)s(1-p)
s(1η)η(1s)\displaystyle\frac{s(1-\eta)}{\eta(1-s)} =(1η)(1ps)p(1s)2=1+o(1)p.\displaystyle=\frac{(1-\eta)(1-ps)}{p(1-s)^{2}}=\frac{1+o(1)}{p}.

Note that A2,B2Binom((n2),ps)\|{A}\|^{2},\|{B}\|^{2}\sim{\rm Binom}(\binom{n}{2},ps) and Aπ,BBinom((n2),ps2)\left\langle A^{\pi},B\right\rangle\sim{\rm Binom}(\binom{n}{2},ps^{2}). We have Var(h)=O(n2ps2log21p)\mathrm{Var}(h)=O(n^{2}ps^{2}\log^{2}\frac{1}{p}). Consequently, Var(h)𝒫(c)\sqrt{\mathrm{Var}(h){\mathcal{P}}({\mathcal{E}}^{c})} is o(1)o(1) in the dense case and O(logn)O(\log n) in the sparse case. Applying Lemma 1 then yields the upper bounds on D(𝒫A,B𝒬A,B)D({\mathcal{P}}_{A,B}\|{\mathcal{Q}}_{A,B}) claimed in Proposition 1.

2.2 Proof of Proposition 2

The proof follows from a simple application of the I-MMSE formula for the additive Gaussian channel. Note that under the interpolated model 𝒫θ{\mathcal{P}}_{\theta}, we have

B1θ=θ1θAπ+Z,\frac{B}{\sqrt{1-\theta}}=\sqrt{\frac{\theta}{1-\theta}}A^{\pi}+Z,

which is the output of an additive Gaussian channel with input AπA^{\pi} and the standard Gaussian noise ZZ. Letting I(θ)=I(B;AπA)=I(A,B;π)I(\theta)=I(B;A^{\pi}\mid A)=I(A,B;\pi) and using the I-MMSE formula [GSV05, Theorem 2], we have

dI(θ)d(θ/(1θ))=12mmseθ(Aπ).\displaystyle\frac{dI(\theta)}{d\left(\theta/(1-\theta)\right)}=\frac{1}{2}\mathrm{mmse}_{\theta}\left(A^{\pi}\right). (23)

Thus dI(θ)dθ=12(1θ)2mmseθ(Aπ)\frac{dI(\theta)}{d\theta}=\frac{1}{2(1-\theta)^{2}}\mathrm{mmse}_{\theta}\left(A^{\pi}\right). Integrating over θ\theta from 0 to ρ2\rho^{2} and noting I(0)=0I(0)=0, we arrive at

I(ρ2)=0ρ212(1θ)2mmseθ(Aπ)𝑑θ.I(\rho^{2})=\int_{0}^{\rho^{2}}\frac{1}{2(1-\theta)^{2}}\mathrm{mmse}_{\theta}\left(A^{\pi}\right)d\theta.
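As a sanity check on this computation (not part of the proof), consider the scalar analogue: for X∼N(0,1) and B=√θX+√(1−θ)Z, the MMSE is 1−θ, and numerically integrating mmse_θ/(2(1−θ)²) over [0,ρ²] recovers I(ρ²)=½log(1/(1−ρ²)). A minimal sketch using the trapezoidal rule (the value of ρ² is illustrative):

```python
import math

def mmse(theta):
    # scalar Gaussian: X ~ N(0,1), B = sqrt(theta) X + sqrt(1-theta) Z
    # => snr = theta/(1-theta), mmse = 1/(1+snr) = 1 - theta
    return 1 - theta

rho2 = 0.5
n = 100_000
h = rho2 / n
# trapezoidal integration of mmse(theta) / (2 (1-theta)^2) over [0, rho2]
integral = sum(
    (mmse(k * h) / (2 * (1 - k * h) ** 2)
     + mmse((k + 1) * h) / (2 * (1 - (k + 1) * h) ** 2)) * h / 2
    for k in range(n)
)
assert abs(integral - 0.5 * math.log(1 / (1 - rho2))) < 1e-8
```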

2.3 Proof of Proposition 3

Note that under the interpolated model 𝒫θ{\mathcal{P}}_{\theta}, we have

pθ(y|x)={θx=1,y=11θx=1,y=0ηx=0,y=11ηx=0,y=0,η=q(1θ)1q.p_{\theta}(y|x)=\begin{cases}\theta&x=1,y=1\\ 1-\theta&x=1,y=0\\ \eta&x=0,y=1\\ 1-\eta&x=0,y=0\\ \end{cases},\quad\eta=\frac{q(1-\theta)}{1-q}.

Let g(θ)D(PθQ)=qd(θq)+(1q)d(ηq)g(\theta)\triangleq D(P_{\theta}\|Q)=q\cdot d(\theta\|q)+(1-q)d(\eta\|q). Then g(s)=D(PQ)g(s)=D(P\|Q) and g(q)=0g(q)=0. Let I(θ)=Iθ(A,B;π)I(\theta)=I_{\theta}(A,B;\pi), where the subscript θ\theta indicates that AπA^{\pi} and BB are distributed according to 𝒫θ{\mathcal{P}}_{\theta}. Then

Is(Aπ;B|A)=Hs(Aπ|A)Hs(Aπ|B,A)=Hq(Aπ|B,A)Hs(Aπ|B,A)=qsddθHθ(Aπ|B,A)𝑑θ,I_{s}(A^{\pi};B|A)=H_{s}(A^{\pi}|A)-H_{s}(A^{\pi}|B,A)=H_{q}(A^{\pi}|B,A)-H_{s}(A^{\pi}|B,A)=-\int_{q}^{s}\frac{d}{d\theta}H_{\theta}(A^{\pi}|B,A)d\theta,

where the second equality holds because for a fixed qq, Hθ(Aπ|A)H_{\theta}(A^{\pi}|A) does not change with θ\theta and when θ=q\theta=q, AπA^{\pi} and BB are independent and hence Iq(Aπ;B|A)=0I_{q}(A^{\pi};B|A)=0 so that Hq(Aπ|A)=Hq(Aπ|B,A)H_{q}(A^{\pi}|A)=H_{q}(A^{\pi}|B,A). By [DAM17, Lemma 7.1], we have

ddθHθ(Aπ|B,A)=(I)+(n2)ddθ(h(q)g(θ))=(I)(n2)g(θ),\frac{d}{d\theta}H_{\theta}(A^{\pi}|B,A)=\text{(I)}+\binom{n}{2}\frac{d}{d\theta}\left(h(q)-g(\theta)\right)=\text{(I)}-\binom{n}{2}g^{\prime}(\theta),

where h(q)=qlogq(1q)log(1q)h(q)=-q\log q-(1-q)\log(1-q) is the binary entropy function,

(I)=e([n]2)xe,yepθ(ye|xe)θ𝔼[μe(xe|B\e,A)logxepθ(ye|xe)μe(xe|B\e,A)],\text{(I)}=\sum_{e\in\binom{[n]}{2}}\sum_{x_{e},y_{e}}\frac{\partial p_{\theta}(y_{e}|x_{e})}{\partial\theta}\mathbb{E}\left[\mu_{e}(x_{e}|B_{\backslash e},A)\log\sum_{x_{e}^{\prime}}p_{\theta}(y_{e}|x_{e}^{\prime})\mu_{e}(x_{e}^{\prime}|B_{\backslash e},A)\right],

B\eB_{\backslash e} denotes the adjacency vector BB excluding BeB_{e}, and μe(B\e,A)\mu_{e}(\cdot\mid B_{\backslash e},A) is the distribution of AeπA_{e}^{\pi} conditional on (B\e,A)(B_{\backslash e},A) under 𝒫θ{\mathcal{P}}_{\theta}. Since g(q)=0g(q)=0, we have

Is(Aπ;B|A)=qs(I)𝑑θ+(n2)g(s)=qs(I)𝑑θ+(n2)D(PQ).I_{s}(A^{\pi};B|A)=-\int_{q}^{s}\text{(I)}\,d\theta+\binom{n}{2}g(s)=-\int_{q}^{s}\text{(I)}\,d\theta+\binom{n}{2}D(P\|Q). (24)

It remains to relate (I) to the reconstruction error. Note that for x,y{0,1}x,y\in\{0,1\},

pθ(y|x)=α(x)y+(1α(x))(1y),α(x)=θx+η(1x)p_{\theta}(y|x)=\alpha(x)y+(1-\alpha(x))(1-y),\quad\alpha(x)=\theta x+\eta(1-x) (25)

and

pθ(1|x)θ=α(x)θ=x+ηθ(1x)=pθ(0|x)θ.\frac{\partial p_{\theta}(1|x)}{\partial\theta}=\frac{\partial\alpha(x)}{\partial\theta}=x+\frac{\partial\eta}{\partial\theta}(1-x)=-\frac{\partial p_{\theta}(0|x)}{\partial\theta}.

Thus for each xex_{e},

ye=0,1pθ(ye|xe)θ𝔼[μe(xe|B\e,A)logxe=0,1pθ(ye|xe)μe(xe|B\e,A)]\displaystyle~{}\sum_{y_{e}=0,1}\frac{\partial p_{\theta}(y_{e}|x_{e})}{\partial\theta}\mathbb{E}\left[\mu_{e}(x_{e}|B_{\backslash e},A)\log\sum_{x_{e}^{\prime}=0,1}p_{\theta}(y_{e}|x_{e}^{\prime})\mu_{e}(x_{e}^{\prime}|B_{\backslash e},A)\right]
=\displaystyle= pθ(1|xe)θ𝔼[μe(xe|B\e,A)logxe=0,1pθ(1|xe)μe(xe|B\e,A)1xe=0,1pθ(1|xe)μe(xe|B\e,A)]\displaystyle~{}\frac{\partial p_{\theta}(1|x_{e})}{\partial\theta}\mathbb{E}\left[\mu_{e}(x_{e}|B_{\backslash e},A)\log\frac{\sum_{x_{e}^{\prime}=0,1}p_{\theta}(1|x_{e}^{\prime})\mu_{e}(x_{e}^{\prime}|B_{\backslash e},A)}{1-\sum_{x_{e}^{\prime}=0,1}p_{\theta}(1|x_{e}^{\prime})\mu_{e}(x_{e}^{\prime}|B_{\backslash e},A)}\right]
=\displaystyle= pθ(1|xe)θ𝔼[μe(xe|B\e,A)h(ye)],\displaystyle~{}-\frac{\partial p_{\theta}(1|x_{e})}{\partial\theta}\mathbb{E}\left[\mu_{e}(x_{e}|B_{\backslash e},A)h^{\prime}(y_{e})\right],

where we defined

yeye(B\e,A)xe=0,1pθ(1|xe)μe(xe|B\e,A).y_{e}\equiv y_{e}(B_{\backslash e},A)\triangleq\sum_{x_{e}=0,1}p_{\theta}(1|x_{e})\mu_{e}(x_{e}|B_{\backslash e},A).

and used h(x)=log1xxh^{\prime}(x)=\log\frac{1-x}{x}. Then we have

xe=0,1ye=0,1pθ(ye|xe)θ𝔼[μe(xe|B\e,A)logxe=0,1pθ(ye|xe)μe(xe|B\e,A)]=𝔼[yeθh(ye)]\displaystyle~{}\sum_{x_{e}=0,1}\sum_{y_{e}=0,1}\frac{\partial p_{\theta}(y_{e}|x_{e})}{\partial\theta}\mathbb{E}\left[\mu_{e}(x_{e}|B_{\backslash e},A)\log\sum_{x_{e}^{\prime}=0,1}p_{\theta}(y_{e}|x_{e}^{\prime})\mu_{e}(x_{e}^{\prime}|B_{\backslash e},A)\right]=-\mathbb{E}\left[\frac{\partial y_{e}}{\partial\theta}h^{\prime}(y_{e})\right]

Let x~e=𝔼[Aeπ|B\e,A]\widetilde{x}_{e}=\mathbb{E}[A^{\pi}_{e}|B_{\backslash e},A]. Then ye=θx~e+η(1x~e)y_{e}=\theta\widetilde{x}_{e}+\eta(1-\widetilde{x}_{e}). Let

Δe=yeq=(θη)(x~eq)=θq1q(x~eq).\Delta_{e}=y_{e}-q=(\theta-\eta)(\widetilde{x}_{e}-q)=\frac{\theta-q}{1-q}(\widetilde{x}_{e}-q).

Then 𝔼[Δe]=0\mathbb{E}[\Delta_{e}]=0 and 𝔼[Δe2]=(θq1q)2Var(x~e)\mathbb{E}[\Delta_{e}^{2}]=(\frac{\theta-q}{1-q})^{2}\mathrm{Var}(\widetilde{x}_{e}). Furthermore,

yeθ=11q(x~eq).\frac{\partial y_{e}}{\partial\theta}=\frac{1}{1-q}(\widetilde{x}_{e}-q).
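The identities y_e−q=((θ−q)/(1−q))(x̃_e−q) and ∂y_e/∂θ=(x̃_e−q)/(1−q) are pure algebra in θ and can be confirmed by a finite-difference sketch (the values of q, θ, x̃_e below are illustrative):

```python
q = 0.02

def eta(theta):            # eta = q(1-theta)/(1-q)
    return q * (1 - theta) / (1 - q)

def y_e(theta, x_tilde):   # y_e = theta * x_tilde + eta(theta) * (1 - x_tilde)
    return theta * x_tilde + eta(theta) * (1 - x_tilde)

theta, x_tilde, eps = 0.1, 0.37, 1e-7
# Delta_e = y_e - q = (theta - q)/(1 - q) * (x_tilde - q)
assert abs((y_e(theta, x_tilde) - q)
           - (theta - q) / (1 - q) * (x_tilde - q)) < 1e-12
# d y_e / d theta = (x_tilde - q)/(1 - q), checked by central difference
fd = (y_e(theta + eps, x_tilde) - y_e(theta - eps, x_tilde)) / (2 * eps)
assert abs(fd - (x_tilde - q) / (1 - q)) < 1e-6
```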

Using h′′(x)=1x(1x)h^{\prime\prime}(x)=-\frac{1}{x(1-x)}, we get h(ye)=h(q)1ξ(1ξ)Δeh^{\prime}(y_{e})=h^{\prime}(q)-\frac{1}{\xi(1-\xi)}\Delta_{e} for some ξ\xi between yey_{e} and qq. Note that ηqθs\eta\leq q\leq\theta\leq s and ye[η,θ]y_{e}\in[\eta,\theta]. Thus ηξs\eta\leq\xi\leq s. So

𝔼[yeθh(ye)]=\displaystyle-\mathbb{E}\left[\frac{\partial y_{e}}{\partial\theta}h^{\prime}(y_{e})\right]= h(q)1q𝔼[x~eq]+θq(1q)2𝔼[(x~eq)2ξ(1ξ)]\displaystyle~{}-\frac{h^{\prime}(q)}{1-q}\mathbb{E}\left[\widetilde{x}_{e}-q\right]+\frac{\theta-q}{(1-q)^{2}}\mathbb{E}\left[\frac{(\widetilde{x}_{e}-q)^{2}}{\xi(1-\xi)}\right]
\displaystyle\geq θqs(1q)2Varθ(x~e).\displaystyle~{}\frac{\theta-q}{s(1-q)^{2}}\mathrm{Var}_{\theta}(\widetilde{x}_{e}).

Integrating over θ\theta we get

qs(I)=\displaystyle\int_{q}^{s}\text{(I)}= eqs𝑑θ(𝔼[yeθh(ye)])\displaystyle~{}\sum_{e}\int_{q}^{s}d\theta\left(-\mathbb{E}\left[\frac{\partial y_{e}}{\partial\theta}h^{\prime}(y_{e})\right]\right) (26)
\displaystyle\geq eqs𝑑θθqs(1q)2Varθ(x~e)\displaystyle~{}\sum_{e}\int_{q}^{s}d\theta\frac{\theta-q}{s(1-q)^{2}}\mathrm{Var}_{\theta}(\widetilde{x}_{e}) (27)

Finally, note that the above bound pertains to x~e=𝔼[Aeπ|B\e,A]\widetilde{x}_{e}=\mathbb{E}[A^{\pi}_{e}|B_{\backslash e},A], which we now relate to x^e=𝔼[Aeπ|B,A]\widehat{x}_{e}=\mathbb{E}[A^{\pi}_{e}|B,A]. Denote by μe(|B,A)\mu_{e}(\cdot|B,A) the full posterior law of AeπA^{\pi}_{e}. Note that

x^e=xexeμe(xe|B,A)=xexeμe(xe|B\e,A)pθ(Be|xe)xeμe(xe|B\e,A)pθ(Be|xe)\widehat{x}_{e}=\sum_{x_{e}}x_{e}\mu_{e}(x_{e}|B,A)=\frac{\sum_{x_{e}}x_{e}\mu_{e}(x_{e}|B_{\backslash e},A)p_{\theta}(B_{e}|x_{e})}{\sum_{x_{e}}\mu_{e}(x_{e}|B_{\backslash e},A)p_{\theta}(B_{e}|x_{e})}

By (25), we have

pθ(y|x)=1y+(θx+η(1x))(2y1).p_{\theta}(y|x)=1-y+\left(\theta x+\eta(1-x)\right)(2y-1).

After some simplification, we have

x^e=xexeμe(xe|B,A)=x~e(θBe+(1θ)(1Be))1η(12η)Be+x~e(2Be1)(θη)={(1θ)x~e1ηx~e(θη)Be=0θx~eη+x~e(θη)Be=1.\widehat{x}_{e}=\sum_{x_{e}}x_{e}\mu_{e}(x_{e}|B,A)=\frac{\widetilde{x}_{e}(\theta B_{e}+(1-\theta)(1-B_{e}))}{1-\eta-(1-2\eta)B_{e}+\widetilde{x}_{e}(2B_{e}-1)(\theta-\eta)}=\begin{cases}\frac{(1-\theta)\widetilde{x}_{e}}{1-\eta-\widetilde{x}_{e}(\theta-\eta)}&B_{e}=0\\ \frac{\theta\widetilde{x}_{e}}{\eta+\widetilde{x}_{e}(\theta-\eta)}&B_{e}=1\\ \end{cases}.
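The two-case formula for x̂_e is simply Bayes' rule applied to the channel p_θ(⋅|⋅) with prior x̃_e on A^π_e; a brute-force sketch confirms it (the parameter values are illustrative):

```python
q = 0.02
theta = 0.1
eta = q * (1 - theta) / (1 - q)
x_tilde = 0.37                     # stands in for P(A^pi_e = 1 | B_\e, A)

def p_channel(y, x):               # p_theta(y | x) as in (25)
    alpha = theta * x + eta * (1 - x)
    return alpha if y == 1 else 1 - alpha

for b_e in (0, 1):
    # brute-force Bayes update of the prior x_tilde by the likelihood of B_e
    prior = {1: x_tilde, 0: 1 - x_tilde}
    post_num = prior[1] * p_channel(b_e, 1)
    post_den = sum(prior[x] * p_channel(b_e, x) for x in (0, 1))
    x_hat_bruteforce = post_num / post_den
    # closed form from the displayed equation
    if b_e == 0:
        x_hat = (1 - theta) * x_tilde / (1 - eta - x_tilde * (theta - eta))
    else:
        x_hat = theta * x_tilde / (eta + x_tilde * (theta - eta))
    assert abs(x_hat - x_hat_bruteforce) < 1e-12
```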

Since ηqθs\eta\leq q\leq\theta\leq s, we have

x^eBemin(1,sηx~e)+(1Be)x~e\widehat{x}_{e}\leq B_{e}\min\left(1,\frac{s}{\eta}\widetilde{x}_{e}\right)+(1-B_{e})\widetilde{x}_{e}

and hence

𝔼[x^e2]\displaystyle\mathbb{E}[\widehat{x}_{e}^{2}]\leq 𝔼[min(1,sηx~e)2Be]+𝔼[x~e2].\displaystyle\mathbb{E}\left[\min\left(1,\frac{s}{\eta}\widetilde{x}_{e}\right)^{2}B_{e}\right]+\mathbb{E}[\widetilde{x}_{e}^{2}].

Note that

𝔼[min(1,sηx~e)2Be]=(a)\displaystyle\mathbb{E}\left[\min\left(1,\frac{s}{\eta}\widetilde{x}_{e}\right)^{2}B_{e}\right]\overset{\rm(a)}{=} 𝔼[𝔼[min(1,sηx~e)2|Aeπ]𝔼[BeAeπ]]\displaystyle~{}\mathbb{E}\left[\mathbb{E}\left[\min\left(1,\frac{s}{\eta}\widetilde{x}_{e}\right)^{2}\Big{|}A^{\pi}_{e}\right]\mathbb{E}[B_{e}\mid A^{\pi}_{e}]\right]
=(b)\displaystyle\overset{\rm(b)}{=} 𝔼[𝔼[min(1,sηx~e)2|Aeπ](sAeπ+η(1Aeπ))]\displaystyle~{}\mathbb{E}\left[\mathbb{E}\left[\min\left(1,\frac{s}{\eta}\widetilde{x}_{e}\right)^{2}\Big{|}A^{\pi}_{e}\right](sA^{\pi}_{e}+\eta(1-A^{\pi}_{e}))\right]
(c)\displaystyle\overset{\rm(c)}{\leq} s𝔼[Aeπ]+s𝔼[𝔼[x~e|Aeπ](1Aeπ)]\displaystyle~{}s\mathbb{E}\left[A^{\pi}_{e}\right]+s\mathbb{E}\left[\mathbb{E}\left[\widetilde{x}_{e}\Big{|}A^{\pi}_{e}\right](1-A^{\pi}_{e})\right]
\displaystyle\leq sq+s𝔼[x~e]=2sq,\displaystyle~{}sq+s\mathbb{E}[\widetilde{x}_{e}]=2sq,

where (a) follows from the conditional independence of x~e\widetilde{x}_{e} (which depends on (A,B\e)(A,B_{\backslash e})) and BeB_{e} given AeπA^{\pi}_{e}; (b) follows from (25); (c) follows by using min(1,sηx~e)21\min\left(1,\frac{s}{\eta}\widetilde{x}_{e}\right)^{2}\leq 1 to get the first term and min(1,sηx~e)2sηx~e\min\left(1,\frac{s}{\eta}\widetilde{x}_{e}\right)^{2}\leq\frac{s}{\eta}\widetilde{x}_{e} to get the second term. Combining the previous two displays yields that

𝔼θ[x^e2]2sq+𝔼[x~e2]2sq+q2+Var(x~e).\displaystyle\mathbb{E}_{\theta}[\widehat{x}_{e}^{2}]\leq 2sq+\mathbb{E}\left[\widetilde{x}_{e}^{2}\right]\leq 2sq+q^{2}+\mathrm{Var}\left(\widetilde{x}_{e}\right).

It follows that

mmse(Aπ)=e𝔼[(xex^e)2]=e(𝔼[xe2]𝔼[x^e2])(n2)q(1q)2(n2)sqeVar(x~e).\displaystyle\mathrm{mmse}(A^{\pi})=\sum_{e}\mathbb{E}\left[\left(x_{e}-\widehat{x}_{e}\right)^{2}\right]=\sum_{e}\left(\mathbb{E}\left[x_{e}^{2}\right]-\mathbb{E}\left[\widehat{x}^{2}_{e}\right]\right)\geq\binom{n}{2}q(1-q)-2\binom{n}{2}sq-\sum_{e}\mathrm{Var}\left(\widetilde{x}_{e}\right). (28)

Combining (27) with (28) yields that

qs(I)\displaystyle\int_{q}^{s}\text{(I)} qs𝑑θθqs(1q)2((n2)q(1q)2(n2)sqmmseθ(Aπ))\geq\int_{q}^{s}d\theta\frac{\theta-q}{s(1-q)^{2}}\left(\binom{n}{2}q(1-q)-2\binom{n}{2}sq-\mathrm{mmse}_{\theta}(A^{\pi})\right)
=qs𝑑θθqs(1q)2((n2)q(1q)mmseθ(Aπ))(n2)(sq)2q(1q)2\displaystyle=\int_{q}^{s}d\theta\frac{\theta-q}{s(1-q)^{2}}\left(\binom{n}{2}q(1-q)-\mathrm{mmse}_{\theta}(A^{\pi})\right)-\binom{n}{2}\frac{(s-q)^{2}q}{(1-q)^{2}}
qs𝑑θθqs(1q)2((n2)q(1q)mmseθ(Aπ))(n2)qs2\displaystyle\geq\int_{q}^{s}d\theta\frac{\theta-q}{s(1-q)^{2}}\left(\binom{n}{2}q(1-q)-\mathrm{mmse}_{\theta}(A^{\pi})\right)-\binom{n}{2}qs^{2}

The conclusion follows by combining the last display with (24).

2.4 Proof of Proposition 4

In this section, we prove Proposition 4 by connecting the MSE of AπA^{\pi} to the Hamming risk of estimating π\pi. In particular, assuming (19), that is, mmse(Aπ)𝔼[A2](1ξ)\mathrm{mmse}(A^{\pi})\geq\mathbb{E}\left[\|{A}\|^{2}\right](1-\xi), we aim to show that 𝔼[𝗈𝗏𝖾𝗋𝗅𝖺𝗉(π,π^)]=O(ξ1/4+(nlogn𝔼[A2])1/4)\mathbb{E}\left[\mathsf{overlap}(\pi,\widehat{\pi})\right]=O\left(\xi^{1/4}+(\frac{n\log n}{\mathbb{E}\left[\|{A}\|^{2}\right]})^{1/4}\right) for any estimator π^(A,B)\widehat{\pi}(A,B). We first present a general program and then specialize the argument to the Gaussian and Erdős-Rényi graph model.

Recall that 𝗈𝗏𝖾𝗋𝗅𝖺𝗉(π,π^)\mathsf{overlap}(\pi,\widehat{\pi}) denotes the fraction of fixed points of σπ1π^\sigma\triangleq\pi^{-1}\circ\widehat{\pi}. Let α(π,π^)\alpha(\pi,\widehat{\pi}) denote the fraction of fixed points of the edge permutation σ𝖤\sigma^{\sf E} induced by the node permutation σ\sigma (cf. Section 1.3.1). The following simple lemma relates α(π,π^)\alpha(\pi,\widehat{\pi}) to 𝗈𝗏𝖾𝗋𝗅𝖺𝗉(π,π^)\mathsf{overlap}(\pi,\widehat{\pi}).

Lemma 2.

It holds that

𝔼[𝗈𝗏𝖾𝗋𝗅𝖺𝗉(π,π^)]𝔼[α(π,π^)]+1n.\mathbb{E}\left[\mathsf{overlap}(\pi,\widehat{\pi})\right]\leq\sqrt{\mathbb{E}\left[\alpha(\pi,\widehat{\pi})\right]}+\frac{1}{n}.
Proof.

In view of (15),

(n𝗈𝗏𝖾𝗋𝗅𝖺𝗉(π,π^)2)(n2)α(π,π^).\binom{n\mathsf{overlap}(\pi,\widehat{\pi})}{2}\leq\binom{n}{2}\alpha(\pi,\widehat{\pi}).

By Jensen’s inequality,

(n𝔼[𝗈𝗏𝖾𝗋𝗅𝖺𝗉(π,π^)]2)𝔼[(n𝗈𝗏𝖾𝗋𝗅𝖺𝗉(π,π^)2)](n2)𝔼[α(π,π^)].\binom{n\mathbb{E}\left[\mathsf{overlap}(\pi,\widehat{\pi})\right]}{2}\leq\mathbb{E}\left[\binom{n\mathsf{overlap}(\pi,\widehat{\pi})}{2}\right]\leq\binom{n}{2}\mathbb{E}\left[\alpha(\pi,\widehat{\pi})\right].

The desired conclusion follows because for x,y0x,y\geq 0, (nx2)(n2)y\binom{nx}{2}\leq\binom{n}{2}y \iff nx2x(n1)y0nx^{2}-x-(n-1)y\leq 0 \implies x1+1+4n(n1)y2ny+1nx\leq\frac{1+\sqrt{1+4n(n-1)y}}{2n}\leq\sqrt{y}+\frac{1}{n}. ∎
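As a sanity check on the elementary step above (an illustrative aside, not part of the proof), the following Python sketch verifies numerically that the largest x\geq 0 satisfying nx^{2}-x-(n-1)y\leq 0 is indeed at most \sqrt{y}+1/n; the function name overlap_bound is ours.

```python
import math

def overlap_bound(y, n):
    """Positive root of n x^2 - x - (n-1) y = 0 and the relaxation sqrt(y) + 1/n."""
    root = (1 + math.sqrt(1 + 4 * n * (n - 1) * y)) / (2 * n)
    return root, math.sqrt(y) + 1 / n

# Spot-check the chain of inequalities in the proof of Lemma 2.
for n in (10, 100, 1000):
    for y in (0.0, 0.01, 0.25, 1.0):
        root, relaxed = overlap_bound(y, n)
        # The root satisfies the quadratic inequality with equality ...
        assert n * root**2 - root - (n - 1) * y <= 1e-9
        # ... and is dominated by sqrt(y) + 1/n.
        assert root <= relaxed + 1e-12
```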

In view of Lemma 2 and the fact that \mathbb{E}\left[\|{A}\|^{2}\right]\leq n^{2}, it suffices to show \mathbb{E}\left[\alpha(\pi,\widehat{\pi})\right]=O\left(\xi^{1/2}+(\frac{n\log n}{\mathbb{E}\left[\|{A}\|^{2}\right]})^{1/2}\right). Let \alpha_{0}=\mathbb{E}\left[\alpha(\pi,\widehat{\pi})\right] and define an estimator of A^{\pi} by \widehat{A}=\alpha_{0}A^{\widehat{\pi}}+(1-\alpha_{0})\mathbb{E}\left[A\right]. This is well-defined since \alpha_{0} is deterministic and \widehat{\pi} only depends on (A,B). Intuitively, \widehat{A} can be viewed as an interpolation between the “plug-in” estimator A^{\widehat{\pi}} and the trivial estimator \mathbb{E}\left[A^{\pi}\right]=\mathbb{E}\left[A\right]. We remark that to derive the desired upper bound on \alpha_{0}, it is crucial to use the interpolated estimator \widehat{A} rather than the “plug-in” estimator, because we expect \alpha_{0} to be small and \widehat{\pi} to be only slightly correlated with \pi.

On the one hand, by definition of the MMSE and the assumption (19),

𝔼[AπA^2]mmse(Aπ)𝔼[A2](1ξ).\displaystyle\mathbb{E}\left[\|{A^{\pi}-\widehat{A}}\|^{2}\right]\geq\mathrm{mmse}(A^{\pi})\geq\mathbb{E}\left[\|{A}\|^{2}\right](1-\xi). (29)

On the other hand, we claim that in both the Gaussian and Erdős-Rényi model,

𝔼[Aπ,Aπ^]𝔼[A22]α0O(𝔼[A22]nlogn),\displaystyle\mathbb{E}\left[\left\langle A^{\pi},A^{\widehat{\pi}}\right\rangle\right]\geq\mathbb{E}\left[\|A\|_{2}^{2}\right]\alpha_{0}-O\left(\sqrt{\mathbb{E}\left[\|A\|_{2}^{2}\right]n\log n}\right), (30)

so that

𝔼[AπA^2]\displaystyle\mathbb{E}\left[\|{A^{\pi}-\widehat{A}}\|^{2}\right] =𝔼[Aπ2]+𝔼[A^2]2𝔼[Aπ,A^]\displaystyle=\mathbb{E}\left[\|{A^{\pi}}\|^{2}\right]+\mathbb{E}\left[\|{\widehat{A}}\|^{2}\right]-2\mathbb{E}\left[\left\langle A^{\pi},\widehat{A}\right\rangle\right]
=(a)(1+α02)𝔼[A2](1α0)2𝔼[A]22α0𝔼[Aπ,Aπ^]\displaystyle\overset{\rm(a)}{=}(1+\alpha_{0}^{2})\mathbb{E}\left[\|{A}\|^{2}\right]-(1-\alpha_{0})^{2}\|{\mathbb{E}\left[A\right]}\|^{2}-2\alpha_{0}\mathbb{E}\left[\left\langle A^{\pi},A^{\widehat{\pi}}\right\rangle\right]
(1α02)𝔼[A2]+O(α0𝔼[A22]nlogn),\displaystyle\leq(1-\alpha_{0}^{2})\mathbb{E}\left[\|{A}\|^{2}\right]+O\left(\alpha_{0}\sqrt{\mathbb{E}\left[\|A\|_{2}^{2}\right]n\log n}\right), (31)

where in (a) we used the fact that 𝔼[Aπ]=𝔼[A]\mathbb{E}\left[A^{\pi}\right]=\mathbb{E}\left[A\right] is entrywise constant and 𝔼[A],Aπ^=𝔼[A12]i<jAπ^(i),π^(j)=𝔼[A12]i<jAi,j=𝔼[A],A\left\langle\mathbb{E}\left[A\right],A^{\widehat{\pi}}\right\rangle=\mathbb{E}[A_{12}]\sum_{i<j}A_{\widehat{\pi}(i),\widehat{\pi}(j)}=\mathbb{E}[A_{12}]\sum_{i<j}A_{i,j}=\left\langle\mathbb{E}\left[A\right],A\right\rangle so that 𝔼[A],𝔼[Aπ^]=𝔼[A]2\left\langle\mathbb{E}\left[A\right],\mathbb{E}[A^{\widehat{\pi}}]\right\rangle=\|\mathbb{E}[A]\|^{2}. Combining (29) and (31) yields that

α02ξ+O(α0nlogn𝔼[A2])α0=O(ξ1/2+nlogn𝔼[A2]).\displaystyle\alpha_{0}^{2}\leq\xi+O\left(\alpha_{0}\sqrt{\frac{n\log n}{\mathbb{E}\left[\|{A}\|^{2}\right]}}\right)\Longrightarrow\alpha_{0}=O\left(\xi^{1/2}+\sqrt{\frac{n\log n}{\mathbb{E}\left[\|{A}\|^{2}\right]}}\right). (32)
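The implication in (32) is the elementary fact that a quadratic inequality a^{2}\leq\xi+Cba with a,b,\xi\geq 0 forces a\leq Cb+\sqrt{\xi}. The following Python sketch (illustrative; the name largest_alpha and the sampled constant C stand for the absolute constant hidden in the O(\cdot)) checks this numerically.

```python
import math
import random

def largest_alpha(xi, b, C):
    """Largest a >= 0 satisfying a^2 <= xi + C*b*a (positive root of the quadratic)."""
    return (C * b + math.sqrt(C**2 * b**2 + 4 * xi)) / 2

# Randomized check that the root is dominated by C*b + sqrt(xi).
random.seed(0)
for _ in range(10000):
    xi = random.uniform(0, 1)
    b = random.uniform(0, 1)
    C = random.uniform(0, 10)
    assert largest_alpha(xi, b, C) <= C * b + math.sqrt(xi) + 1e-9
```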

To finish the proof, it remains to prove the claim (30).

Proof of Claim (30).

Let C be a sufficiently large constant. For each permutation \pi^{\prime}\in{\mathcal{S}}_{n}, define an event

π={Aπ,Aπ𝔼[A2]α(π,π)C𝔼[A2]nlogn}{\mathcal{F}}_{\pi^{\prime}}=\left\{\langle A^{\pi},A^{\pi^{\prime}}\rangle\geq\mathbb{E}\left[\|{A}\|^{2}\right]\alpha(\pi,\pi^{\prime})-C\sqrt{\mathbb{E}\left[\|{A}\|^{2}\right]n\log n}\right\} (33)

and set π𝒮nπ{\mathcal{F}}\triangleq\cap_{\pi^{\prime}\in{\mathcal{S}}_{n}}{\mathcal{F}}_{\pi^{\prime}}. It follows that

𝔼[Aπ,Aπ^]\displaystyle\mathbb{E}\left[\left\langle A^{\pi},A^{\widehat{\pi}}\right\rangle\right] =𝔼[Aπ,Aπ^𝟏{}]+𝔼[Aπ,Aπ^𝟏{c}]\displaystyle=\mathbb{E}\left[\left\langle A^{\pi},A^{\widehat{\pi}}\right\rangle{\mathbf{1}_{\left\{{{\mathcal{F}}}\right\}}}\right]+\mathbb{E}\left[\left\langle A^{\pi},A^{\widehat{\pi}}\right\rangle{\mathbf{1}_{\left\{{{\mathcal{F}}^{c}}\right\}}}\right]
𝔼[A2]𝔼[α(π,π^)𝟏{}]C𝔼[A2]nlogn𝔼[A2𝟏{c}]\displaystyle\geq\mathbb{E}\left[\|{A}\|^{2}\right]\mathbb{E}\left[\alpha(\pi,\widehat{\pi}){\mathbf{1}_{\left\{{{\mathcal{F}}}\right\}}}\right]-C\sqrt{\mathbb{E}\left[\|{A}\|^{2}\right]n\log n}-\mathbb{E}\left[\|{A}\|^{2}{\mathbf{1}_{\left\{{{\mathcal{F}}^{c}}\right\}}}\right]
𝔼[A2](α0{c})C𝔼[A2]nlogn𝔼[A4]{c},\displaystyle\geq\mathbb{E}\left[\|{A}\|^{2}\right]\left(\alpha_{0}-\mathbb{P}\left\{{\mathcal{F}}^{c}\right\}\right)-C\sqrt{\mathbb{E}\left[\|{A}\|^{2}\right]n\log n}-\sqrt{\mathbb{E}\left[\|{A}\|^{4}\right]\mathbb{P}\left\{{\mathcal{F}}^{c}\right\}},

where the last inequality holds because

𝔼[α(π,π^)𝟏{}]=𝔼[α(π,π^)]𝔼[α(π,π^)𝟏{c}]α0{c},\mathbb{E}\left[\alpha(\pi,\widehat{\pi}){\mathbf{1}_{\left\{{{\mathcal{F}}}\right\}}}\right]=\mathbb{E}\left[\alpha(\pi,\widehat{\pi})\right]-\mathbb{E}\left[\alpha(\pi,\widehat{\pi}){\mathbf{1}_{\left\{{{\mathcal{F}}^{c}}\right\}}}\right]\geq\alpha_{0}-\mathbb{P}\left\{{\mathcal{F}}^{c}\right\},

and 𝔼[A2𝟏{c}]𝔼[A4]{c}\mathbb{E}\left[\|{A}\|^{2}{\mathbf{1}_{\left\{{{\mathcal{F}}^{c}}\right\}}}\right]\leq\sqrt{\mathbb{E}\left[\|{A}\|^{4}\right]\mathbb{P}\left\{{\mathcal{F}}^{c}\right\}} by the Cauchy-Schwarz inequality. Note that 𝔼[A4]=O(n4)\mathbb{E}\left[\|{A}\|^{4}\right]=O(n^{4}), and 𝔼[A2]\mathbb{E}\left[\|{A}\|^{2}\right] is equal to (n2)\binom{n}{2} in the Gaussian case and (n2)q\binom{n}{2}q in the Erdős-Rényi case (with qnO(1)q\geq n^{-O(1)}). To get (30), it suffices to prove {c}enlogn\mathbb{P}\left\{{\mathcal{F}}^{c}\right\}\leq e^{-n\log n}, which, by union bound, further reduces to showing that {πc}e2nlogn\mathbb{P}\left\{{\mathcal{F}}_{\pi^{\prime}}^{c}\right\}\leq e^{-2n\log n} for any permutation π𝒮n\pi^{\prime}\in{\mathcal{S}}_{n}. To this end, we consider the Gaussian and Erdős-Rényi graph model separately.

For the Gaussian Wigner model, let M\in\{0,1\}^{\binom{[n]}{2}\times\binom{[n]}{2}} denote the permutation matrix corresponding to the edge permutation \sigma^{\sf E} induced by \sigma=\pi^{-1}\circ\pi^{\prime}. Recalling that A denotes the weighted adjacency vector, we have \langle A^{\pi},A^{\pi^{\prime}}\rangle=A^{\top}MA. By the Hanson-Wright inequality (Lemma 9), with probability at least 1-\delta,

A^{\top}MA\geq\mathsf{Tr}(M)-C^{\prime}\left(\|M\|_{F}\sqrt{\log(1/\delta)}+\|M\|_{2}\log(1/\delta)\right),

where C>0C^{\prime}>0 is a universal constant. Since 𝖳𝗋(M)=(n2)α(π,π)\mathsf{Tr}(M)=\binom{n}{2}\alpha(\pi,\pi^{\prime}), MF2=(n2)\|M\|_{F}^{2}=\binom{n}{2}, and the spectral norm M2=1\|M\|_{2}=1, it follows that with probability at least 1δ1-\delta,

AMA(n2)α(π,π)C(nlog(1/δ)+log(1/δ)).A^{\top}MA\geq\binom{n}{2}\alpha(\pi,\pi^{\prime})-C^{\prime}\left(n\sqrt{\log(1/\delta)}+\log(1/\delta)\right).

Choosing δ=e2nlogn\delta=e^{-2n\log n} and CC in (33) to be a large enough constant, we get that {π}1e2nlogn.\mathbb{P}\left\{{\mathcal{F}}_{\pi^{\prime}}\right\}\geq 1-e^{-2n\log n}.
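The deterministic facts invoked above — \mathsf{Tr}(M) counts the fixed points of the edge permutation, \|M\|_{F}^{2}=\binom{n}{2} for any permutation matrix, and the inequality (15) relating fixed nodes to fixed edges — can be checked directly. A Python sketch (the helper edge_fixed_points is ours) does so on random node permutations.

```python
import itertools
import math
import random

def edge_fixed_points(sigma):
    """Number of fixed points of the edge permutation induced by a node
    permutation sigma, i.e. pairs {i,j} with {sigma(i), sigma(j)} = {i,j}."""
    n = len(sigma)
    return sum(1 for i, j in itertools.combinations(range(n), 2)
               if {sigma[i], sigma[j]} == {i, j})

random.seed(1)
for n in (5, 8, 12):
    for _ in range(50):
        sigma = list(range(n))
        random.shuffle(sigma)
        tr = edge_fixed_points(sigma)  # equals Tr(M) for the edge permutation matrix M
        fixed_nodes = sum(sigma[i] == i for i in range(n))
        # Inequality of type (15): C(#fixed nodes, 2) <= #fixed edges <= C(n, 2),
        # the upper bound being ||M||_F^2 for the permutation matrix M.
        assert math.comb(fixed_nodes, 2) <= tr <= math.comb(n, 2)
```

Note that a transposition also fixes the edge it swaps, so the first inequality can be strict.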

Next, we move to the Erdős-Rényi graph model. Fix any permutation π𝒮n\pi^{\prime}\in{\mathcal{S}}_{n}. Let O1O_{1} denote the set of fixed points of the edge permutation induced by ππ1\pi^{\prime}\circ\pi^{-1}. By definition, |O1|=(n2)α(π,π)|O_{1}|=\binom{n}{2}\alpha(\pi,\pi^{\prime}) and

Aπ,Aπ(i,j)O1AijBinom((n2)α(π,π),q).\left\langle A^{\pi},A^{\pi^{\prime}}\right\rangle\geq\sum_{(i,j)\in O_{1}}A_{ij}\sim{\rm Binom}\left(\binom{n}{2}\alpha(\pi,\pi^{\prime}),q\right).

By Bernstein’s inequality, with probability at least 1δ1-\delta,

Aπ,Aπ\displaystyle\left\langle A^{\pi},A^{\pi^{\prime}}\right\rangle (n2)α(π,π)qC((n2)α(π,π)qlog(1/δ)+log(1/δ))\displaystyle\geq\binom{n}{2}\alpha(\pi,\pi^{\prime})q-C^{\prime}\left(\sqrt{\binom{n}{2}\alpha(\pi,\pi^{\prime})q\log(1/\delta)}+\log(1/\delta)\right)
(n2)α(π,π)qC(n2qlog(1/δ)+log(1/δ)),\displaystyle\geq\binom{n}{2}\alpha(\pi,\pi^{\prime})q-C^{\prime}\left(\sqrt{n^{2}q\log(1/\delta)}+\log(1/\delta)\right),

where C>0C^{\prime}>0 is a universal constant. Choosing δ=e2nlogn\delta=e^{-2n\log n} and CC to be a large enough constant, we get that {π}1e2nlogn.\mathbb{P}\left\{{\mathcal{F}}_{\pi^{\prime}}\right\}\geq 1-e^{-2n\log n}.

3 Possibility of Partial and Almost Exact Recovery

In this section, we prove the positive parts of Theorem 2 and Theorem 3.

We first argue that we can assume ps\leq 1/2 without loss of generality. Since the presence or absence of an edge is merely a matter of representation, the two encodings are mathematically equivalent. Consequently, by flipping 0 and 1, the model with parameters (n,p,s) is equivalent to the model with parameters (n,p^{\prime},s^{\prime}) for an appropriate choice of p^{\prime} and s^{\prime} such that p^{\prime}s^{\prime}=1-ps, where

p=(1ps)212ps+ps2,s\displaystyle p^{\prime}=\frac{\left(1-ps\right)^{2}}{1-2ps+ps^{2}},\quad s^{\prime} =12ps+ps21ps.\displaystyle=\frac{1-2ps+ps^{2}}{1-ps}\,. (34)

Therefore when ps>12ps>\frac{1}{2}, we can replace the model (n,p,s)(n,p,s) with (n,p,s)(n,p^{\prime},s^{\prime}) so that ps=1ps<12p^{\prime}s^{\prime}=1-ps<\frac{1}{2}.
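The reparametrization (34) can be verified numerically: besides p^{\prime}s^{\prime}=1-ps, the complement model also reproduces the joint edge probability, since p^{\prime}s^{\prime 2}=1-2ps+ps^{2}=\mathbb{P}\{A_{ij}=0,B_{ij}=0\}. A brief Python sketch (the helper flip_params is ours):

```python
import random

def flip_params(p, s):
    """Parameters (p', s') of the complemented model, as in (34)."""
    ps = p * s
    pp = (1 - ps) ** 2 / (1 - 2 * ps + p * s * s)
    sp = (1 - 2 * ps + p * s * s) / (1 - ps)
    return pp, sp

random.seed(2)
for _ in range(1000):
    p = random.uniform(0.01, 0.99)
    s = random.uniform(0.01, 0.99)
    pp, sp = flip_params(p, s)
    assert 0 <= pp <= 1 and 0 <= sp <= 1          # valid parameters
    assert abs(pp * sp - (1 - p * s)) < 1e-12      # marginal: p's' = 1 - ps
    # Joint: P(A'=B'=1) = P(A=B=0) = 1 - 2ps + ps^2.
    assert abs(pp * sp ** 2 - (1 - 2 * p * s + p * s * s)) < 1e-12
```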

For any two permutations \pi,{\pi^{\prime}}\in{\mathcal{S}}_{n}, let d(\pi,{\pi^{\prime}}) denote the number of non-fixed points of \pi^{\prime}\circ\pi^{-1}. The following proposition provides sufficient conditions for \widehat{\pi}_{\sf ML} defined in (1) to achieve partial recovery and almost exact recovery in Erdős-Rényi graphs.

Proposition 5.

Let ps12ps\leq\frac{1}{2}. If p=1o(1)p=1-o(1),

ns2(1p)2(1ps)2(4+ϵ)logn,\displaystyle\frac{ns^{2}(1-p)^{2}}{(1-ps)^{2}}\geq(4+\epsilon)\log n\,, (35)

or if p=1Ω(1)p=1-\Omega(1),

nps2{(2+ϵ)lognlog(1/p)1+p if pn124+ϵ if p<n12,\displaystyle nps^{2}\geq\begin{cases}\frac{(2+\epsilon)\log n}{\log(1/p)-1+p}&\text{ if }p\geq n^{-\frac{1}{2}}\\ 4+\epsilon&\text{ if }p<n^{-\frac{1}{2}}\end{cases}\,, (36)

for any arbitrarily small constant ϵ>0\epsilon>0, there exists a constant 0<δ<10<\delta<1 such that

{d(π^𝖬𝖫,π)<δn}1n1+o(1),\mathbb{P}\left\{d\left(\widehat{\pi}_{\sf ML},\pi\right)<\delta n\right\}\geq 1-n^{-1+o(1)},

that is, π^𝖬𝖫\widehat{\pi}_{\sf ML} achieves partial recovery.

If in addition nps2(1p)2=ω(1)nps^{2}(1-p)^{2}=\omega(1), then for any constant δ>0\delta>0,

{d(π^𝖬𝖫,π)<δn}1n1+o(1),\mathbb{P}\left\{d\left(\widehat{\pi}_{\sf ML},\pi\right)<\delta n\right\}\geq 1-n^{-1+o(1)},

that is, π^𝖬𝖫\widehat{\pi}_{\sf ML} achieves almost exact recovery.

Remark 2.

We explain how to prove the positive parts of Theorem 2 and Theorem 3 using Proposition 5.

  • In the dense regime of p=n^{-o(1)}, either (35) or (36) already implies that nps^{2}(1-p)^{2}=\omega(1). Hence the MLE achieves almost exact recovery under condition (35) when p=1-o(1), or under nps^{2}\geq\frac{(2+\epsilon)\log n}{\log(1/p)-1+p} when p=1-\Omega(1); this proves the positive part of Theorem 2.

  • In the sparse regime of p=n^{-\Omega(1)}, condition (9) implies (36) and hence the MLE achieves partial recovery; this proves the positive part of Theorem 3. Furthermore, since nps^{2}=\omega(1) implies (36), the MLE achieves almost exact recovery provided that nps^{2}=\omega(1), which is in fact necessary for any estimator to succeed [CKMP19].

To prove Proposition 5, we need the following intermediate lemma, which bounds the probability that the ML estimator (1) makes a given number of errors.

Lemma 3.

Let ϵ(0,1)\epsilon\in(0,1) be an arbitrary constant and ps12ps\leq\frac{1}{2}.

  • For the Erdős-Rényi model, suppose that either (35) or (36) holds. Then there exists some constant 0<δ<10<\delta<1 such that for any kδnk\geq\delta n,

    \mathbb{P}\left\{d\left(\widehat{\pi}_{\sf ML},\pi\right)=k\right\}\leq 2\exp\left(-nh\left(\frac{k}{n}\right)\right){\mathbf{1}_{\left\{{k\leq n-1}\right\}}}+e^{-2\log n}{\mathbf{1}_{\left\{{k=n}\right\}}}+\exp\left(-\frac{1}{64}\epsilon k\log n\right), (37)

    where h(x)=xlogx(1x)log(1x)h(x)=-x\log x-(1-x)\log(1-x) is the binary entropy function.

    If in addition nps2(1p)=ω(1)nps^{2}(1-p)=\omega(1), then (37) holds for any constant 0<δ<10<\delta<1 and all kδn.k\geq\delta n.

  • For the Gaussian model, suppose that nρ2(4+ϵ)lognn\rho^{2}\geq(4+\epsilon)\log n. Then (37) holds for any constant 0<δ<10<\delta<1 and all kδnk\geq\delta n.

Note that Lemma 3 also includes the Gaussian case. In fact, analogous to Proposition 5, we can apply Lemma 3 to show that the MLE attains almost exact recovery when nρ2(4+ϵ)lognn\rho^{2}\geq(4+\epsilon)\log n. We will not do it here; instead in the next section, we will directly prove a stronger result, showing that the MLE attains exact recovery under the same condition.

Now, we proceed to prove Proposition 5.

Proof of Proposition 5.

Applying Lemma 3 with a union bound yields that

{d(π^𝖬𝖫,π)δn}\displaystyle\mathbb{P}\left\{d\left(\widehat{\pi}_{\sf ML},\pi\right)\geq\delta n\right\} kδnn{d(π^𝖬𝖫,π)=k}\displaystyle\leq\sum_{k\geq\delta n}^{n}\mathbb{P}\left\{d\left(\widehat{\pi}_{\sf ML},\pi\right)=k\right\}
exp(2logn)+2kδnn1exp(nh(kn))+kδnexp(164ϵklogn)\displaystyle\leq\exp\left(-2\log n\right)+2\sum_{k\geq\delta n}^{n-1}\exp\left(-nh\left(\frac{k}{n}\right)\right)+\sum_{k\geq\delta n}\exp\left(-\frac{1}{64}\epsilon k\log n\right)
=n1+o(1),\displaystyle=n^{-1+o(1)}, (38)

where the last inequality follows from kδnexp(164ϵklogn)exp(ϵ64δnlogn)1exp(ϵ64logn)=nΩ(n)\sum_{k\geq\delta n}\exp\left(-\frac{1}{64}\epsilon k\log n\right)\leq\frac{\exp\left(-\frac{\epsilon}{64}\delta n\log n\right)}{1-\exp\left(-\frac{\epsilon}{64}\log n\right)}=n^{-\Omega(n)} for any fixed constant δ>0\delta>0, and

k1n1exp(nh(kn))(a)\displaystyle\sum_{k\geq 1}^{n-1}\exp\left(-nh\left(\frac{k}{n}\right)\right)\overset{\rm(a)}{\leq} 21kn/2exp(klognk)\displaystyle~{}2\sum_{1\leq k\leq n/2}\exp\left(-k\log\frac{n}{k}\right)
\displaystyle\leq 2k=110lognexp(klognk)+210lognkn/22k\displaystyle~{}2\sum_{k=1}^{10\log n}\exp\left(-k\log\frac{n}{k}\right)+2\sum_{10\log n\leq k\leq n/2}2^{-k}
\displaystyle\leq 2elogn×10logn+4×210logn=n1+o(1),\displaystyle~{}2e^{-\log n}\times 10\log n+4\times 2^{-10\log n}=n^{-1+o(1)},

where (a) follows from h(x)=h(1x)h(x)=h(1-x) and h(x)xlog1xh(x)\geq x\log\frac{1}{x}. ∎
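The two entropy facts used in step (a), h(x)=h(1-x) and h(x)\geq x\log\frac{1}{x}, together with the resulting decay of the entropy sum, can be checked numerically. The following sketch (illustrative only) works in nats, matching the convention above; the generous constant 10 in the final bound is ours.

```python
import math

def h(x):
    """Binary entropy function in nats."""
    if x in (0.0, 1.0):
        return 0.0
    return -x * math.log(x) - (1 - x) * math.log(1 - x)

# Facts used in step (a).
for k in range(1, 200):
    x = k / 200
    assert abs(h(x) - h(1 - x)) < 1e-12          # symmetry h(x) = h(1-x)
    assert h(x) >= x * math.log(1 / x) - 1e-12   # h(x) >= x log(1/x)

# Resulting decay: sum_{k=1}^{n-1} exp(-n h(k/n)) is of order n^{-1+o(1)}.
n = 2000
total = sum(math.exp(-n * h(k / n)) for k in range(1, n))
assert total <= 10 * math.log(n) / n
```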

3.1 Proof of Lemma 3

Without loss of generality, we assume ϵ<1\epsilon<1. Fix k[n]k\in[n]. Let 𝒯k{\mathcal{T}}_{k} denote the set of permutations π{\pi^{\prime}} such that d(π,π)=kd(\pi,{\pi^{\prime}})=k. Recall that FF is the set of fixed points of σπ1π\sigma\triangleq\pi^{-1}\circ{\pi^{\prime}} with |F|=nk|F|=n-k and 𝒪1=(F2){\mathcal{O}}_{1}=\binom{F}{2} is a subset of fixed points of edge permutation σ𝖤\sigma^{\sf E}. Let A¯=(A¯ij)1i<jn=(Aij𝔼[Aij])1i<jn\overline{A}=(\overline{A}_{ij})_{1\leq i<j\leq n}=(A_{ij}-\mathbb{E}\left[A_{ij}\right])_{1\leq i<j\leq n} and B¯=(B¯ij)1i<jn=(Bij𝔼[Bij])1i<jn\overline{B}=(\overline{B}_{ij})_{1\leq i<j\leq n}=(B_{ij}-\mathbb{E}\left[B_{ij}\right])_{1\leq i<j\leq n} denote the centered adjacency vectors of AA and BB respectively. It follows from the definition of MLE in (1) that for any τ\tau\in{\mathbb{R}},

{d(π^𝖬𝖫,π)=k}\displaystyle\left\{d\left(\widehat{\pi}_{\sf ML},\pi\right)=k\right\} {π𝒯k:Aπ,BAπ,B0}\displaystyle\subset\left\{\exists{\pi^{\prime}}\in{\mathcal{T}}_{k}:\left\langle A^{\pi},B\right\rangle-\left\langle A^{\pi^{\prime}},B\right\rangle\leq 0\right\}
(a){π𝒯k:i<jA¯π(i)π(j)B¯iji<jA¯π(i)π(j)B¯ij0}\displaystyle\overset{(a)}{\subset}\left\{\exists{\pi^{\prime}}\in{\mathcal{T}}_{k}:\sum_{i<j}\overline{A}_{\pi(i)\pi(j)}\overline{B}_{ij}-\sum_{i<j}\overline{A}_{{\pi^{\prime}}(i){\pi^{\prime}}(j)}\overline{B}_{ij}\leq 0\right\}
={π𝒯k:(i,j)𝒪1A¯π(i)π(j)B¯ij(i,j)𝒪1A¯π(i)π(j)B¯ij0}\displaystyle=\left\{\exists{\pi^{\prime}}\in{\mathcal{T}}_{k}:\sum_{(i,j)\notin{\mathcal{O}}_{1}}\overline{A}_{\pi(i)\pi(j)}\overline{B}_{ij}-\sum_{(i,j)\notin{\mathcal{O}}_{1}}\overline{A}_{{\pi^{\prime}}(i){\pi^{\prime}}(j)}\overline{B}_{ij}\leq 0\right\}
\subset\left\{\exists{\pi^{\prime}}\in{\mathcal{T}}_{k}:\sum_{(i,j)\notin{\mathcal{O}}_{1}}\overline{A}_{\pi(i)\pi(j)}\overline{B}_{ij}<\tau\right\}\cup\left\{\exists{\pi^{\prime}}\in{\mathcal{T}}_{k}:\sum_{(i,j)\notin{\mathcal{O}}_{1}}\overline{A}_{{\pi^{\prime}}(i){\pi^{\prime}}(j)}\overline{B}_{ij}\geq\tau\right\},

where (a)(a) holds because Aπ,BAπ,B=i<jA¯π(i)π(j)B¯iji<jA¯π(i)π(j)B¯ij\left\langle A^{\pi},B\right\rangle-\left\langle A^{\pi^{\prime}},B\right\rangle=\sum_{i<j}\overline{A}_{\pi(i)\pi(j)}\overline{B}_{ij}-\sum_{i<j}\overline{A}_{{\pi^{\prime}}(i){\pi^{\prime}}(j)}\overline{B}_{ij}. Note that

{π𝒯k:(i,j)𝒪1A¯π(i)π(j)B¯ij<τ}={F[n]:|F|=nk,(i,j)(F2)A¯π(i)π(j)B¯ij<τ}.\left\{\exists{\pi^{\prime}}\in{\mathcal{T}}_{k}:\sum_{(i,j)\notin{\mathcal{O}}_{1}}\overline{A}_{\pi(i)\pi(j)}\overline{B}_{ij}<\tau\right\}=\left\{\exists F\subset[n]:|F|=n-k,\;\sum_{(i,j)\notin\binom{F}{2}}\overline{A}_{\pi(i)\pi(j)}\overline{B}_{ij}<\tau\right\}.

Thus, by the union bound,

{d(π^𝖬𝖫,π)=k}\displaystyle\mathbb{P}\left\{d\left(\widehat{\pi}_{\sf ML},\pi\right)=k\right\} F[n]:|F|=nk{(i,j)(F2)A¯π(i)π(j)B¯ij<τ}+π𝒯k{(i,j)𝒪1A¯π(i)π(j)B¯ijτ}.\displaystyle\leq\sum_{F\subset[n]:|F|=n-k}\mathbb{P}\left\{\sum_{(i,j)\notin\binom{F}{2}}\overline{A}_{\pi(i)\pi(j)}\overline{B}_{ij}<\tau\right\}+\sum_{{\pi^{\prime}}\in{\mathcal{T}}_{k}}\mathbb{P}\left\{\sum_{(i,j)\notin{\mathcal{O}}_{1}}\overline{A}_{{\pi^{\prime}}(i){\pi^{\prime}}(j)}\overline{B}_{ij}\geq\tau\right\}.

Let Y=(i,j)𝒪1A¯π(i)π(j)B¯ijY=\sum_{(i,j)\notin{\mathcal{O}}_{1}}\overline{A}_{\pi(i)\pi(j)}\overline{B}_{ij} and Xπ=(i,j)𝒪1A¯π(i)π(j)B¯ijX_{\pi^{\prime}}=\sum_{(i,j)\notin{\mathcal{O}}_{1}}\overline{A}_{{\pi^{\prime}}(i){\pi^{\prime}}(j)}\overline{B}_{ij}. Then

{d(π^𝖬𝖫,π)=k}(nk){Yτ}+π𝒯k{Xπτ}(I)+(II).\displaystyle\mathbb{P}\left\{d\left(\widehat{\pi}_{\sf ML},\pi\right)=k\right\}\leq\binom{n}{k}\mathbb{P}\left\{Y\leq\tau\right\}+\sum_{{\pi^{\prime}}\in{\mathcal{T}}_{k}}\mathbb{P}\left\{X_{\pi^{\prime}}\geq\tau\right\}\triangleq\text{(I)}+\text{(II)}. (39)

For (I), we claim that, for both the Erdős-Rényi and the Gaussian model and an appropriate choice of \tau,

(I)exp(nh(kn))+exp(2logn)𝟏{k=n}.\displaystyle\text{(I)}\leq\exp\left(-nh\left(\frac{k}{n}\right)\right)+\exp\left(-2\log n\right){\mathbf{1}_{\left\{{k=n}\right\}}}\,. (40)

For (II), applying the Chernoff bound together with the union bound yields that for any t>0t>0,

(II)π𝒯k{Xπτ}nkexp(tτ)𝔼[exp(tXπ)].\displaystyle\text{(II)}\leq\sum_{{\pi^{\prime}}\in{\mathcal{T}}_{k}}\mathbb{P}\left\{X_{\pi^{\prime}}\geq\tau\right\}\leq n^{k}\exp\left({-t\tau}\right)\mathbb{E}\left[\exp\left({tX_{\pi^{\prime}}}\right)\right]. (41)

Note that

Xπ=O𝒪\𝒪1(i,j)OA¯π(i)π(j)B¯ijX¯O,X_{\pi^{\prime}}=\sum_{O\in{\mathcal{O}}\backslash{\mathcal{O}}_{1}}\underbrace{\sum_{(i,j)\in O}\overline{A}_{{\pi^{\prime}}(i){\pi^{\prime}}(j)}\overline{B}_{ij}}_{\triangleq\overline{X}_{O}},

where 𝒪{\mathcal{O}} denotes the collection of all edge orbits of σ\sigma. Since edge orbits are disjoint, it follows that X¯O\overline{X}_{O} are mutually independent across different OO. Therefore,

𝔼[exp(tXπ)]=O𝒪\𝒪1𝔼[exp(tX¯O)].\displaystyle\mathbb{E}\left[\exp\left({tX_{\pi^{\prime}}}\right)\right]=\prod_{O\in{\mathcal{O}}\backslash{\mathcal{O}}_{1}}\mathbb{E}\left[\exp\left(t\overline{X}_{O}\right)\right]. (42)

It turns out that the MGF of \overline{X}_{O} can be explicitly computed for both the Erdős-Rényi and the Gaussian model. In particular, applying Lemma 7 yields that \mathbb{E}\left[e^{t\overline{X}_{O}}\right]=M_{|O|}, and M_{\ell}\leq M_{2}^{\ell/2} for \ell\geq 2. It follows that

𝔼[exp(tXπ)]=(a)M1n2=2(n2)MN(b)M1n2=2(n2)M2N/2=(c)(M12M2)12n2M2(n2)(n12)2=(d)(M12M2)k2M2m2,\mathbb{E}\left[\exp\left({tX_{\pi^{\prime}}}\right)\right]\overset{\rm(a)}{=}M_{1}^{n_{2}}\prod_{\ell=2}^{\binom{n}{2}}M_{\ell}^{N_{\ell}}\overset{\rm(b)}{\leq}M_{1}^{n_{2}}\prod_{\ell=2}^{\binom{n}{2}}M_{2}^{\ell N_{\ell}/2}\overset{\rm(c)}{=}\left(\frac{M_{1}^{2}}{M_{2}}\right)^{\frac{1}{2}n_{2}}M_{2}^{\frac{\binom{n}{2}-\binom{n_{1}}{2}}{2}}\overset{\rm(d)}{=}\left(\frac{M_{1}^{2}}{M_{2}}\right)^{\frac{k}{2}}M_{2}^{\frac{m}{2}}\,,

where (a) follows from (42) and N1=(n12)+n2N_{1}=\binom{n_{1}}{2}+n_{2} in view of (15); (b) follows from MM2/2M_{\ell}\leq M_{2}^{\ell/2} for 2\ell\geq 2; (c) follows from =1(n2)N=(n2)\sum_{\ell=1}^{\binom{n}{2}}\ell N_{\ell}=\binom{n}{2}; (d) follows from n2kn_{2}\leq k, n1=nkn_{1}=n-k, and m(n2)(nk2)m\triangleq\binom{n}{2}-\binom{n-k}{2}. Combining the last displayed equation with (41) yields that

(II) exp(klogntτ+k2logM12M2+m2logM2)\displaystyle\leq\exp\left(k\log n-t\tau+\frac{k}{2}\log\frac{M_{1}^{2}}{M_{2}}+\frac{m}{2}\log M_{2}\right)
(a)exp(klogntτ+k2logM12M2+m2(M21))(b)exp(ϵ64klogn),\displaystyle\overset{(a)}{\leq}\exp\left(k\log n-t\tau+\frac{k}{2}\log\frac{M_{1}^{2}}{M_{2}}+\frac{m}{2}\left(M_{2}-1\right)\right)\overset{(b)}{\leq}\exp\left(-\frac{\epsilon}{64}k\log n\right),

where (a)(a) holds because log(1+x)x\log(1+x)\leq x for any x>1x>-1; (b)(b) follows from the claim that

inft0{tτ+k2logM12M2+m2(M21)}(1+ϵ64)klogn.\displaystyle\inf_{t\geq 0}\left\{-t\tau+\frac{k}{2}\log\frac{M_{1}^{2}}{M_{2}}+\frac{m}{2}\left(M_{2}-1\right)\right\}\leq-\left(1+\frac{\epsilon}{64}\right)k\log n. (43)

It remains to specify τ\tau and verify (40) and (43) for Erdős-Rényi and Gaussian models separately. In the following, we will use the inequality

12(11n)knmkn,\displaystyle\frac{1}{2}\left(1-\frac{1}{n}\right)kn\leq m\leq kn\,, (44)

which follows from m=(n2)(nk2)=12kn(2k+1n)m=\binom{n}{2}-\binom{n-k}{2}=\frac{1}{2}kn\left(2-\frac{k+1}{n}\right).
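The identity for m and the two-sided bound (44) can be confirmed exhaustively for small n; a brief Python sketch:

```python
import math

# Verify m := C(n,2) - C(n-k,2) = (1/2) k n (2 - (k+1)/n) and the
# two-sided bound (44): (1/2)(1 - 1/n) k n <= m <= k n.
for n in (2, 5, 50, 500):
    for k in range(1, n + 1):
        m = math.comb(n, 2) - math.comb(n - k, 2)
        assert abs(m - 0.5 * k * n * (2 - (k + 1) / n)) < 1e-9
        assert 0.5 * (1 - 1 / n) * k * n - 1e-9 <= m <= k * n + 1e-9
```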

  • For Erdős-Rényi random graphs, A¯=(Aijq)1i<jn\overline{A}=(A_{ij}-q)_{1\leq i<j\leq n} and B¯=(Bijq)1i<jn\overline{B}=(B_{ij}-q)_{1\leq i<j\leq n}, where q=psq=ps. Then for any F[n]F\subset[n] with |F|=nk|F|=n-k,

    Y=(i,j)(F2)(Aπ(i)π(j)q)(Bijq).Y=\sum_{(i,j)\notin\binom{F}{2}}(A_{\pi(i)\pi(j)}-q)(B_{ij}-q)\,.

    Note that \mathbb{E}\left[(A_{\pi(i)\pi(j)}-q)(B_{ij}-q)\right]=ps^{2}(1-p) for any i<j.

    Let μ𝔼[Y]=mps2(1p)\mu\triangleq\mathbb{E}\left[Y\right]=mps^{2}(1-p) and set τ=(1γ)μ\tau=(1-\gamma)\mu where

    γ{16h(k/n)kps2(1p)2kn116lognn(n1)ps2(1p)2k=n.\gamma\triangleq\begin{cases}\sqrt{\frac{16h(k/n)}{kps^{2}(1-p)^{2}}}&k\leq n-1\\ \sqrt{\frac{16\log n}{n(n-1)ps^{2}(1-p)^{2}}}&k=n\end{cases}\,.

    We next choose a constant 0<\delta<1 such that h\left(\delta\right)/\delta\leq\frac{\epsilon^{2}}{2^{12}}nps^{2}(1-p)^{2}. Since h(x)/x is monotonically decreasing in x and converges to 0 as x\to 1, such a \delta exists under either condition (35) or (36); if furthermore nps^{2}(1-p)^{2}=\omega(1), then any constant 0<\delta<1 works. Hence, for all \delta n\leq k\leq n-1, we have \frac{n}{k}h\left(\frac{k}{n}\right)\leq h\left(\delta\right)/\delta\leq\frac{\epsilon^{2}}{2^{12}}nps^{2}(1-p)^{2}, and thus \gamma\leq\sqrt{\frac{16h(\delta)/\delta}{nps^{2}(1-p)^{2}}}<\frac{\epsilon}{16}; if k=n, then since nps^{2}(1-p)^{2}=\Omega(1) under condition (35) or (36), we also have \gamma<\frac{\epsilon}{16} for all sufficiently large n. In conclusion, \gamma<\frac{\epsilon}{16}.

    Applying the Bernstein inequality yields

    {Yτ}(a)exp(12γ2μ2mps2+13γμ)\displaystyle\mathbb{P}\left\{Y\leq\tau\right\}\overset{(a)}{\leq}\exp\left(-\frac{\frac{1}{2}\gamma^{2}\mu^{2}}{mps^{2}+\frac{1}{3}\gamma\mu}\right) (b)exp(12+ϵ24γ2μ(1p))\displaystyle\overset{(b)}{\leq}\exp\left(-\frac{1}{2+\frac{\epsilon}{24}}\gamma^{2}\mu\left(1-p\right)\right)
    exp(14γ2μ(1p)),\displaystyle\leq\exp\left(-\frac{1}{4}\gamma^{2}\mu(1-p)\right)\,, (45)

    where (a)(a) holds because for any i<ji<j

    𝔼[(Aπ(i)π(j)q)2(Bijq)2]\displaystyle\mathbb{E}\left[(A_{\pi(i)\pi(j)}-q)^{2}(B_{ij}-q)^{2}\right] =ps2(14ps+2p2s+4p2s23p3s2)\displaystyle=ps^{2}\left(1-4ps+2p^{2}s+4p^{2}s^{2}-3p^{3}s^{2}\right)
    ps2(12ps+4p2s2)ps2,\displaystyle\leq ps^{2}\left(1-2ps+4p^{2}s^{2}\right)\leq ps^{2}\,,

    in view of ps12ps\leq\frac{1}{2}; (b)(b) holds because mps2=μ1pmps^{2}=\frac{\mu}{1-p} and γ<ϵ16\gamma<\frac{\epsilon}{16}. By (44), we have

    \gamma^{2}\mu(1-p)\geq\begin{cases}8(n-1)h(k/n)&k\leq n-1\\ 8\log n&k=n\end{cases}\,.

    Finally, applying (45) together with (nk)enh(kn)\binom{n}{k}\leq e^{nh\left(\frac{k}{n}\right)}, we arrive at the desired claim (40).
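The moment computations underlying step (a) of (45) — the mean ps^{2}(1-p) and the second-moment bound \mathbb{E}[(A-q)^{2}(B-q)^{2}]\leq ps^{2} under ps\leq\frac{1}{2} — can be verified by direct enumeration of the joint edge distribution; the following sketch (the helper cross_moments is ours) does so.

```python
def cross_moments(p, s):
    """E[(A-q)(B-q)] and E[(A-q)^2 (B-q)^2] for a correlated Bernoulli edge
    pair with P(A=B=1) = p s^2 and P(A=1) = P(B=1) = q = p s."""
    q = p * s
    joint = {(1, 1): p * s * s, (1, 0): q - p * s * s,
             (0, 1): q - p * s * s, (0, 0): 1 - 2 * q + p * s * s}
    m1 = sum(pr * (a - q) * (b - q) for (a, b), pr in joint.items())
    m2 = sum(pr * (a - q) ** 2 * (b - q) ** 2 for (a, b), pr in joint.items())
    return m1, m2

for p in (0.1, 0.3, 0.5, 0.9):
    for s in (0.1, 0.5, 0.9):
        if p * s > 0.5:
            continue  # the proof assumes ps <= 1/2
        m1, m2 = cross_moments(p, s)
        assert abs(m1 - p * s * s * (1 - p)) < 1e-12       # mean used for mu
        poly = 1 - 4 * p * s + 2 * p * p * s + 4 * p * p * s * s - 3 * p ** 3 * s * s
        assert abs(m2 - p * s * s * poly) < 1e-12          # exact second moment
        assert m2 <= p * s * s + 1e-12                     # variance proxy in (a)
```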

    Next, we verify (43) by choosing t>0t>0 appropriately. Recall that q=psq=ps. In view of (65) in Lemma 7, M2M_{2} depends on tt and qq. When qq is small and pp is bounded away from 11, we will choose tlog1pt\leq\log\frac{1}{p}; otherwise, we will choose t1210t\leq\frac{1}{2^{10}}. As such, we bound M2M_{2} using its Taylor expansion around q=0q=0 and t=0t=0, respectively, resulting in the following lemma.

    Lemma 4.

    Assume ps12ps\leq\frac{1}{2}.

    • If tlog1pt\leq\log\frac{1}{p}, then M12M2e2\frac{M_{1}^{2}}{M_{2}}\leq e^{2}, and

      M2\displaystyle M_{2} 1+q2s22q2+10q3t(1+t)+2etq2(1s2)+e2tq2s2.\displaystyle\leq 1+q^{2}s^{2}-2q^{2}+10q^{3}t(1+t)+2e^{t}q^{2}(1-s^{2})+e^{2t}q^{2}s^{2}\,. (46)
    • If t1210t\leq\frac{1}{2^{10}}, then M12M2e2\frac{M_{1}^{2}}{M_{2}}\leq e^{2}, and

      M21+t2σ4(1+ρ2)+8t3qs,\displaystyle M_{2}\leq 1+t^{2}\sigma^{4}\left(1+\rho^{2}\right)+8t^{3}qs\,, (47)

      where ρ=sq1q\rho=\frac{s-q}{1-q} and σ2=q(1q)\sigma^{2}=q(1-q).

    The proof of Lemma 4 is deferred to Appendix B. With Lemma 4, we separately consider two cases, depending on whether qq is small and pp is bounded away from 11. Pick q0q_{0} and c0c_{0} to be sufficiently small constants such that

    c0ϵ213,q0ϵ480ϕ(1c0),c_{0}\leq\frac{\epsilon}{2^{13}},\quad q_{0}\leq\frac{\epsilon}{480}\phi\left(1-c_{0}\right)\,,

    where for any x(0,1)x\in(0,1), we define

    ϕ(x)log1x1+xx(log1x)(1+log1x)\displaystyle\phi\left(x\right)\triangleq\frac{\log\frac{1}{x}-1+x}{x\left(\log\frac{1}{x}\right)\left(1+\log\frac{1}{x}\right)}\, (48)

    This function is monotonically decreasing on (0,1)(0,1) (see Appendix C for a proof).
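As a numerical sanity check of the claimed monotonicity (the proof is deferred to the appendix), the following sketch evaluates \phi on a fine grid:

```python
import math

def phi(x):
    """phi(x) = (log(1/x) - 1 + x) / (x log(1/x) (1 + log(1/x))) for 0 < x < 1."""
    L = math.log(1 / x)
    return (L - 1 + x) / (x * L * (1 + L))

# Confirm numerically that phi is decreasing on (0,1).
xs = [i / 1000 for i in range(1, 1000)]
vals = [phi(x) for x in xs]
assert all(a >= b - 1e-12 for a, b in zip(vals, vals[1:]))
assert vals[-1] > 0  # phi stays positive, vanishing only as x -> 1
```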

    Then we separately consider two cases.

    Case 1: qq0q\leq q_{0} and p1c0p\leq 1-c_{0}. Here we will pick tlog1pt\leq\log\frac{1}{p}. Then, by (46) in Lemma 4,

    tτ+k2logM12M2+m2(M21)\displaystyle-t\tau+\frac{k}{2}\log\frac{M_{1}^{2}}{M_{2}}+\frac{m}{2}\left(M_{2}-1\right) (a)t(1γ)mps2(1p)+k+m2(M21)\displaystyle\overset{(a)}{\leq}-t(1-\gamma)mps^{2}(1-p)+k+\frac{m}{2}\left(M_{2}-1\right)
    (b)k+12mps2(f(t)+10p2st(1+t)),\displaystyle\overset{(b)}{\leq}k+\frac{1}{2}mps^{2}\left(f(t)+10p^{2}st(1+t)\right)\,,

    where

    f(t)2(1γ)t+e2tps2+2etp(1s2)+ps22p;f(t)\triangleq-2\left(1-\gamma\right)t+e^{2t}ps^{2}+2e^{t}p(1-s^{2})+ps^{2}-2p\,;

    (a)(a) holds by M12M2e2\frac{M_{1}^{2}}{M_{2}}\leq e^{2}; (b)(b) holds by (46). Note that for tlog1pt\leq\log\frac{1}{p},

    10p2st(1+t)10qp(log1p)(1+log1p)ϵ48(log1p1+p),\displaystyle 10p^{2}st(1+t)\leq 10qp\left(\log\frac{1}{p}\right)\left(1+\log\frac{1}{p}\right)\leq\frac{\epsilon}{48}\left(\log\frac{1}{p}-1+p\right)\,, (49)

    where the last inequality holds because qq0ϵ480ϕ(1c0)ϵ480ϕ(p)q\leq q_{0}\leq\frac{\epsilon}{480}\phi(1-c_{0})\leq\frac{\epsilon}{480}\phi(p), and ϕ(x)\phi(x) is a monotone decreasing function on x(0,1)x\in(0,1) by Lemma 10.

    Therefore, to verify (43), it suffices to check

    12mps2(f(t)+ϵ48(log1p1+p))(1+ϵ64)klognk.\displaystyle\frac{1}{2}mps^{2}\left(f(t)+\frac{\epsilon}{48}\left(\log\frac{1}{p}-1+p\right)\right)\leq-\left(1+\frac{\epsilon}{64}\right)k\log n-k. (50)

    To this end, we need the following simple claim.

    Claim 1.

    Suppose nps2(log1p1+p)(2+ϵ)lognnps^{2}\left(\log\frac{1}{p}-1+p\right)\geq\left(2+\epsilon\right)\log n. Then, for all sufficiently large nn and all 0<p1c00<p\leq 1-c_{0},

    γ(1p)ϵ48(log1p1+p).\displaystyle\gamma(1-p)\leq\frac{\epsilon}{48}\left(\log\frac{1}{p}-1+p\right)\,. (51)

    We defer the proof of Claim 1 and first finish the proof of (50), by choosing different values of t[0,log1p]t\in[0,\log\frac{1}{p}] according to different cases.

    • Case 1(a): nps2(log(1/p)1+p)(4+ϵ)lognnps^{2}\left(\log(1/p)-1+p\right)\geq(4+\epsilon)\log n. We pick t=12log1pt=\frac{1}{2}\log\frac{1}{p}. Then

      f(t)\displaystyle f(t) =(1γ)log1p+s2(p1)2+2p2p\displaystyle=-(1-\gamma)\log\frac{1}{p}+s^{2}\left(\sqrt{p}-1\right)^{2}+2\sqrt{p}-2p
      (a)(1γ)(log1p+1p)+γ(1p)\displaystyle\overset{(a)}{\leq}-(1-\gamma)\left(\log\frac{1}{p}+1-p\right)+\gamma(1-p)
      (b)(1γϵ48)(log1p1+p),\displaystyle\overset{(b)}{\leq}-\left(1-\gamma-\frac{\epsilon}{48}\right)\left(\log\frac{1}{p}-1+p\right),

      where (a)(a) holds because s21s^{2}\leq 1; (b)(b) holds by (51) in Claim 1. Hence,

      12mps2(f(t)+ϵ48(log1p1+p))\displaystyle\frac{1}{2}mps^{2}\left(f(t)+\frac{\epsilon}{48}\left(\log\frac{1}{p}-1+p\right)\right) 12mps2(1γϵ24)(log1p1+p)\displaystyle\leq-\frac{1}{2}mps^{2}\left(1-\gamma-\frac{\epsilon}{24}\right)\left(\log\frac{1}{p}-1+p\right)
      \leq-\left(1+\frac{\epsilon}{4}\right)\left(1-\frac{1}{n}\right)\left(1-\gamma-\frac{\epsilon}{24}\right)k\log n,

      where the last inequality follows from (44) and our assumption nps2(log(1/p)1+p)(4+ϵ)lognnps^{2}\left(\log(1/p)-1+p\right)\geq(4+\epsilon)\log n.

    • Case 1(b): (2+ϵ)lognnps2(log(1/p)1+p)(4+ϵ)logn(2+\epsilon)\log n\leq nps^{2}\left(\log(1/p)-1+p\right)\leq(4+\epsilon)\log n and pn12p\geq n^{-\frac{1}{2}}. We pick t=log1pt=\log\frac{1}{p} and get

      f(t)\displaystyle f(t) 2(1γ)log1p+s2p+2(1s2)+ps22p\displaystyle\leq-2\left(1-\gamma\right)\log\frac{1}{p}+\frac{s^{2}}{p}+2(1-s^{2})+ps^{2}-2p
      =2(1γ)(log1p1+p)+s2p+2γ(1p)+s2(p2)\displaystyle=-2\left(1-\gamma\right)\left(\log\frac{1}{p}-1+p\right)+\frac{s^{2}}{p}+2\gamma\left(1-p\right)+s^{2}(p-2)
      (a)2(1γϵ48)(log1p1+p)+2γ(1p)\displaystyle\overset{(a)}{\leq}-2\left(1-\gamma-\frac{\epsilon}{48}\right)\left(\log\frac{1}{p}-1+p\right)+2\gamma\left(1-p\right)
      (b)2(1γϵ24)(log1p1+p),\displaystyle\overset{(b)}{\leq}-2\left(1-\gamma-\frac{\epsilon}{24}\right)\left(\log\frac{1}{p}-1+p\right),

      where (a)(a) holds because s2p(4+ϵ)lognnp2(log(1/p)1+p)ϵ48(log(1/p)1+p)\frac{s^{2}}{p}\leq\frac{(4+\epsilon)\log n}{np^{2}\left(\log(1/p)-1+p\right)}\leq\frac{\epsilon}{48}\left(\log(1/p)-1+p\right) for all sufficiently large nn when n12p1c0n^{-\frac{1}{2}}\leq p\leq 1-c_{0}; (b)(b) holds in view of (51) in Claim 1. Then under the assumption that nps2(log(1/p)1+p)(2+ϵ)lognnps^{2}\left(\log(1/p)-1+p\right)\geq(2+\epsilon)\log n, and by (44),

      12mps2(f(t)+ϵ48(log1p1+p))\displaystyle\frac{1}{2}mps^{2}\left(f(t)+\frac{\epsilon}{48}\left(\log\frac{1}{p}-1+p\right)\right) (1+ϵ2)(11n)(1γϵ16)klogn.\displaystyle\leq-\left(1+\frac{\epsilon}{2}\right)\left(1-\frac{1}{n}\right)\left(1-\gamma-\frac{\epsilon}{16}\right)k\log n.
    • Case 1(c): $nps^{2}\geq 4+\epsilon$ and $p<n^{-\frac{1}{2}}$. We pick $t=\frac{1}{2}\log\frac{1}{ps^{2}}\leq\log\frac{1}{p}$, where the inequality follows because $s^{2}\geq p$, in view of $nps^{2}\geq 4+\epsilon$ and $p<n^{-\frac{1}{2}}$. Then

      f(t) = -(1-\gamma)\log\frac{1}{ps^{2}}+1+2\sqrt{\frac{p}{s^{2}}}(1-s^{2})-2p+ps^{2}
           \leq -(1-\gamma)\log\frac{1}{ps^{2}}+1+2\sqrt{\frac{p}{s^{2}}}
           \leq -\left(1-\gamma-\frac{\epsilon}{48}\right)\log\frac{1}{ps^{2}},

      where the last inequality holds for $n$ sufficiently large because $\frac{p}{s^{2}}\leq\frac{np^{2}}{4+\epsilon}\leq\frac{1}{4+\epsilon}$ and $\log\frac{1}{ps^{2}}\geq\frac{1}{2}\log n$ when $nps^{2}\geq 4+\epsilon$ and $p<n^{-\frac{1}{2}}$. Then,

      \frac{1}{2}mps^{2}\left(f(t)+\frac{\epsilon}{48}\left(\log\frac{1}{p}-1+p\right)\right) \overset{(a)}{\leq} -\frac{1}{4}\left(1-\frac{1}{n}\right)\left(1-\gamma-\frac{\epsilon}{24}\right)knps^{2}\log\frac{1}{ps^{2}}
           \overset{(b)}{\leq} -\left(1+\frac{\epsilon}{4}\right)\left(1-\frac{1}{n}\right)\left(1-\gamma-\frac{\epsilon}{24}\right)k\log\frac{n}{4+\epsilon},

      where $(a)$ holds by (44) and $\log\frac{1}{p}-1+p\leq\log\frac{1}{ps^{2}}$; $(b)$ holds due to $nps^{2}\geq 4+\epsilon$.

    Hence, in view of $\gamma<\frac{\epsilon}{16}$ for $\delta n\leq k\leq n$, combining the three cases, we get

    \frac{1}{2}mps^{2}\inf_{0\leq t\leq\log(1/ps^{2})}f(t) < -\left(1+\frac{\epsilon}{4}\right)\left(1-\frac{1}{n}\right)\left(1-\frac{\epsilon}{8}\right)k\log\frac{n}{4+\epsilon}
         \leq -\left(1+\frac{\epsilon}{64}\right)k\log n-k,

    where the last inequality holds for all sufficiently large $n$. Thus we arrive at the desired (50), which further implies (43).

    Now we are left to prove Claim 1.

    Proof of Claim 1.

    Note that for any $0<p\leq 1-c_{0}$,

    \gamma(1-p) \overset{(a)}{\leq} \sqrt{\frac{16h(\delta)/\delta}{nps^{2}}} \overset{(b)}{\leq} \sqrt{\frac{8h(\delta)/\delta\left(\log\frac{1}{p}-1+p\right)}{\log n}} \overset{(c)}{\leq} \sqrt{\frac{16h(\delta)/\delta}{\left(\log\frac{1}{1-c_{0}}-c_{0}\right)\log n}}\left(\log\frac{1}{p}-1+p\right) \overset{(d)}{\leq} \frac{\epsilon}{48}\left(\log\frac{1}{p}-1+p\right),

    where $(a)$ holds because if $\delta n\leq k\leq n-1$, then $h(x)/x$ is monotone decreasing on $0<x\leq 1$ and hence $\frac{n}{k}h\left(\frac{k}{n}\right)\leq h(\delta)/\delta$, while if $k=n$, then $\frac{\log n}{n-1}\leq h(\delta)/\delta$ for $n$ sufficiently large; $(b)$ holds by the assumption $nps^{2}\left(\log\frac{1}{p}-1+p\right)\geq(2+\epsilon)\log n$; $(c)$ holds because $\log\frac{1}{x}-1+x$ is monotone decreasing on $x\in(0,1)$ and $p\leq 1-c_{0}$; $(d)$ holds for $n$ sufficiently large. Hence, the claim (51) follows. ∎
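Step $(a)$ above rests on the monotonicity of $h(x)/x$, where $h$ is the binary entropy function in nats. As a quick numerical sanity check of this fact (a sketch for the reader, not part of the proof):

```python
import math

def h(x):
    """Binary entropy in nats."""
    return -x * math.log(x) - (1 - x) * math.log(1 - x)

# h(x)/x should be strictly decreasing on (0, 1),
# consistent with (n/k) h(k/n) <= h(delta)/delta for delta*n <= k <= n-1.
xs = [i / 1000 for i in range(1, 1000)]
ratios = [h(x) / x for x in xs]
assert all(r1 > r2 for r1, r2 in zip(ratios, ratios[1:]))
# It also tends to 0 as x -> 1, matching the k = n case.
assert h(0.999) / 0.999 < 0.01
```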

    Case 2: $q\geq q_{0}$ or $p\geq 1-c_{0}$. Here we will pick $t\leq\frac{1}{2^{10}}$. Then, by Lemma 4, we get

    -t\tau+\frac{k}{2}\log\frac{M_{1}^{2}}{M_{2}}+\frac{m}{2}\left(M_{2}-1\right) \overset{(a)}{\leq} k-t\tau+\frac{m}{2}\left(M_{2}-1\right)
         \overset{(b)}{\leq} k+mf(t),

    where

    f(t)\triangleq -t\rho\sigma^{2}\left(1-\frac{\epsilon}{16}\right)+\frac{1}{2}\left(M_{2}-1\right);

    $(a)$ holds by $\frac{M_{1}^{2}}{M_{2}}\leq e^{2}$ and (47), given $t\leq\frac{1}{2^{10}}$; $(b)$ holds because $\tau\geq\rho\sigma^{2}m\left(1-\frac{\epsilon}{16}\right)$, in view of $\tau=(1-\gamma)mps^{2}(1-p)$, $\rho=\frac{s(1-p)}{1-ps}$, $\sigma^{2}=ps(1-ps)$, and $\gamma\leq\frac{\epsilon}{16}$.

    Therefore, to verify (43), it suffices to check

    mf(t)\leq-\left(1+\frac{\epsilon}{64}\right)k\log n-k. (52)
    • Case 2(a): $p>1-c_{0}$. We pick $t=\frac{\rho}{\sigma^{2}\left(1+\rho^{2}\right)}=\frac{1-p}{p(1-ps)^{2}(1+\rho^{2})}\overset{(a)}{<}\frac{4c_{0}}{1-c_{0}}\leq\frac{1}{2^{10}}$, where $(a)$ holds in view of $p>1-c_{0}$ and $ps\leq\frac{1}{2}$. By (47),

      M_{2}\leq 1+\frac{\rho^{2}}{1+\rho^{2}}+\frac{8\rho^{3}qs}{\sigma^{6}(1+\rho^{2})^{3}}\leq 1+\frac{\rho^{2}}{1+\rho^{2}}\left(1+\frac{\epsilon}{16}\right),

      where the last inequality holds because $\frac{8\rho qs}{\sigma^{6}(1+\rho^{2})^{2}}=\frac{8(1-p)}{p^{2}(1-ps)^{4}(1+\rho^{2})^{2}}\leq\frac{\epsilon}{32}$ given $1-p\leq c_{0}\leq\frac{\epsilon}{2^{13}}$ and $ps\leq\frac{1}{2}$. Then, we have

      f(t) \leq -\frac{\rho^{2}}{1+\rho^{2}}\left(1-\frac{\epsilon}{16}-\frac{1}{2}\left(1+\frac{\epsilon}{32}\right)\right)
           = -\frac{\rho^{2}}{1+\rho^{2}}\left(\frac{1}{2}-\frac{5\epsilon}{64}\right).

      Therefore,

      mf(t) \overset{(a)}{\leq} -\frac{1}{2}\left(1-\frac{1}{n}\right)kn\left(\frac{\rho^{2}}{1+\rho^{2}}\right)\left(\frac{1}{2}-\frac{5\epsilon}{64}\right)\overset{(b)}{\leq}-\left(1+\frac{\epsilon}{64}\right)k\log n-k,

      where $(a)$ holds by (44); $(b)$ holds under the assumption that $\rho^{2}=\frac{s^{2}(1-p)^{2}}{(1-ps)^{2}}\geq(4+\epsilon)\frac{\log n}{n}$ in (35) for $n$ sufficiently large.

    • Case 2(b): $q\geq q_{0}$ and $p\leq 1-c_{0}$. We pick $t=\frac{4\log n}{\rho\sigma^{2}n}=\frac{4\log n}{ps^{2}(1-p)n}\leq\frac{1}{2^{10}}$, where the last inequality holds for $n$ sufficiently large. By (47),

      M_{2}\leq 1+t^{2}\sigma^{4}\left(1+\rho^{2}+\frac{8tqs}{\sigma^{4}(1+\rho^{2})}\right) \leq 1+t^{2}\sigma^{4}(1+\rho^{2})\left(1+\frac{\epsilon}{64}\right),

      where the last inequality holds because $\frac{8tqs}{\sigma^{4}(1+\rho^{2})^{2}}=\frac{32\log n}{(1-p)q^{2}(1-q)^{2}\left(1+\rho^{2}\right)^{2}n}\leq\frac{\epsilon}{64}$ for $n$ sufficiently large, given $q_{0}\leq q\leq\frac{1}{2}$ and $p\leq 1-c_{0}$. Then, we have

      f(t) \leq -\frac{4\log n}{n}\left(1-\frac{\epsilon}{32}\right)+\frac{8\log^{2}n}{n^{2}}\cdot\frac{1+\rho^{2}}{\rho^{2}}\left(1+\frac{\epsilon}{64}\right)
           \leq -\frac{4\log n}{n}\left(1-\frac{\epsilon}{32}\right)+\frac{32\log^{2}n}{\rho^{2}n^{2}}
           \leq -\frac{3\log n}{n},

      where the last inequality holds for $n$ sufficiently large because $\rho=\frac{s(1-p)}{1-ps}\geq\frac{q_{0}c_{0}}{1-q_{0}}$. Therefore,

      mf(t) \overset{(a)}{\leq} -\frac{3}{2}\left(1-\frac{1}{n}\right)k\log n \overset{(b)}{\leq} -\left(1+\frac{\epsilon}{64}\right)k\log n-k,

      where $(a)$ holds by (44); $(b)$ holds for $n$ sufficiently large.

  • For the Gaussian model, set $\tau=\rho m-a_{k}$, where

    a_{k}=\begin{cases}C\sqrt{2h\left(\frac{k}{n}\right)nm}&k\leq n-1\\ Cn\sqrt{\log n}&k=n\end{cases}

    for some universal constant $C>0$ to be specified later. Recall that $h(x)/x$ is monotone decreasing in $x$ and converges to $0$ as $x\to 1$. Under the condition $n\rho^{2}\geq(4+\epsilon)\log n$, for any constant $0<\delta<1$ we have $h(\delta)/\delta\leq\frac{\epsilon^{2}}{2^{13}C^{2}}n\rho^{2}$ when $n$ is sufficiently large. Therefore, for $\delta n\leq k\leq n-1$, $\frac{n}{k}h\left(\frac{k}{n}\right)\leq h(\delta)/\delta\leq\frac{\epsilon^{2}}{2^{13}C^{2}}n\rho^{2}$. Since $m=kn\left(1-\frac{k+1}{2n}\right)\geq\frac{1}{4}kn$, we have $a_{k}\leq\frac{\epsilon}{64}\rho\sqrt{knm}\leq\frac{\epsilon}{32}\rho m$. For $k=n$, by the assumption that $n\rho^{2}\geq(4+\epsilon)\log n$, $a_{k}\leq\frac{\epsilon}{32}\rho m$. In conclusion, we have $\tau\geq\rho m\left(1-\frac{\epsilon}{32}\right)$.

    Note that $(A_{\pi(i)\pi(j)},B_{ij})$ are i.i.d. jointly Gaussian pairs with standard normal marginals and correlation coefficient $\rho$. Therefore, $\overline{A}=A$ and $\overline{B}=B$. It follows that $Y=\sum_{(i,j)\not\in\binom{F}{2}}A_{\pi(i)\pi(j)}B_{ij}$.

    First, we verify (40) using the Hanson–Wright inequality stated in Lemma 9. Pick $C=2c$, where $c>0$ is the universal constant in Lemma 9. If $k=n$, applying Lemma 9 with $M=I_{m}$ to $Y=\sum_{(i,j)\not\in\binom{F}{2}}A_{\pi(i)\pi(j)}B_{ij}$ yields that $\mathbb{P}\left(Y\leq\tau\right)\leq e^{-2\log n}$. If $k\leq n-1$, applying Lemma 9 again yields that

    \binom{n}{k}\mathbb{P}\left(Y\leq\tau\right)\leq\binom{n}{k}\exp\left(-2nh\left(\frac{k}{n}\right)\right)\leq\exp\left(-nh\left(\frac{k}{n}\right)\right).

    Next, we check (43). In view of (66) in Lemma 7, $M_{1}=\frac{1}{\lambda_{2}}$ and $M_{2}=\frac{1}{\lambda_{1}\lambda_{2}}$, where $\lambda_{1}=\sqrt{\left(1+\rho t\right)^{2}-t^{2}}$ and $\lambda_{2}=\sqrt{\left(1-\rho t\right)^{2}-t^{2}}$ for $0\leq t\leq\frac{1}{1+\rho}$. Thus for $0\leq t\leq\frac{\rho}{1+\rho^{2}}$, $\frac{M_{1}}{M_{2}}=\lambda_{1}\leq 1+\rho^{2}\leq 2$. It follows that

    \inf_{t\geq 0}\left(-t\tau+\frac{k}{2}\log\frac{M_{1}^{2}}{M_{2}}+\frac{m}{2}\left(M_{2}-1\right)\right) \leq \inf_{t\geq 0}\left(-t\tau+k\log\frac{M_{1}}{M_{2}}+\frac{m+k}{2}\left(M_{2}-1\right)\right)
         \leq k+\inf_{0\leq t\leq\frac{\rho}{1+\rho^{2}}}\left(-t\tau+\frac{m+k}{2}\left(M_{2}-1\right)\right)
         \leq k+\frac{1}{2}k(n-1)\rho\inf_{0\leq t\leq\frac{\rho}{1+\rho^{2}}}f(t),

    where

    f(t)\triangleq -t\left(1-\frac{\epsilon}{32}\right)+\frac{1}{2\rho}\left(1+\frac{2}{n-1}\right)\left(M_{2}-1\right),

    and the last inequality holds because $\tau\geq\rho m\left(1-\frac{\epsilon}{32}\right)$ and (44).

    Therefore, to verify (43), it suffices to check

    \frac{1}{2}(n-1)\rho\inf_{0\leq t\leq\frac{\rho}{1+\rho^{2}}}f(t)\leq-\left(1+\frac{\epsilon}{64}\right)\log n-1. (53)

    Case 1: $\rho^{2}\geq\frac{\epsilon^{2}}{256}$. We pick $t=\frac{\rho}{1+2\rho^{2}}$. Then,

    M_{2}=\frac{\left(1+2\rho^{2}\right)^{2}}{\sqrt{1+\rho^{2}+\rho^{4}}\sqrt{1+5\rho^{2}+9\rho^{4}}}\leq\left(\frac{1+2\rho^{2}}{1+\frac{3}{2}\rho^{2}}\right)^{2}\leq 1+\frac{2\rho^{2}}{2+3\rho^{2}},

    where the first inequality holds because $(1+\rho^{2}+\rho^{4})(1+5\rho^{2}+9\rho^{4})\geq\left(1+\frac{3}{2}\rho^{2}\right)^{4}$. Therefore,

    f(t) \leq -\frac{\rho}{1+2\rho^{2}}\left(1-\frac{\epsilon}{32}\right)+\frac{\rho}{2+3\rho^{2}}\left(1+\frac{2}{n-1}\right)
         = -\frac{\rho}{2+3\rho^{2}}\left(\left(2-\frac{\rho^{2}}{1+2\rho^{2}}\right)\left(1-\frac{\epsilon}{32}\right)-\left(1+\frac{2}{n-1}\right)\right)
         \leq -\frac{\rho}{2+3\rho^{2}}\left(\frac{5}{3}\left(1-\frac{\epsilon}{32}\right)-\left(1+\frac{2}{n-1}\right)\right)
         \leq -\frac{\rho}{5}\left(\frac{2}{3}-\frac{5\epsilon}{96}-\frac{2}{n-1}\right),

    where the first inequality holds because $\frac{\rho^{2}}{1+2\rho^{2}}\leq\frac{1}{3}$ and the last inequality follows from $2+3\rho^{2}\leq 5$. Hence,

    \frac{1}{2}(n-1)\rho f(t)\leq-(n-1)\frac{\rho^{2}}{10}\left(\frac{2}{3}-\frac{5\epsilon}{96}-\frac{2}{n-1}\right).

    Thus it follows from $\rho^{2}\geq\frac{\epsilon^{2}}{256}$ that (53) holds for all sufficiently large $n$.
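With $t=\frac{\rho}{1+2\rho^{2}}$, the resulting bound $M_{2}\leq 1+\frac{2\rho^{2}}{2+3\rho^{2}}$ used in Case 1 can be spot-checked numerically from the closed form of $M_{2}$ above (a sanity-check sketch, not part of the proof):

```python
def M2_case1(r2):
    # Closed form of M_2 at t = rho/(1 + 2 rho^2), in terms of r2 = rho^2.
    return (1 + 2 * r2) ** 2 / ((1 + r2 + r2 ** 2) * (1 + 5 * r2 + 9 * r2 ** 2)) ** 0.5

# Check M_2 <= 1 + 2 rho^2 / (2 + 3 rho^2) over a grid of rho^2 in (0, 1].
for i in range(1, 101):
    r2 = i / 100
    assert M2_case1(r2) <= 1 + 2 * r2 / (2 + 3 * r2) + 1e-12
```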

    Case 2: $(4+\epsilon)\frac{\log n}{n}\leq\rho^{2}<\frac{\epsilon^{2}}{256}$. We pick $t=\frac{\rho}{1+\rho^{2}}$. Then,

    M_{2}=\frac{\left(1+\rho^{2}\right)^{2}}{\sqrt{1+3\rho^{2}+4\rho^{4}}\sqrt{1-\rho^{2}}}\leq\frac{\left(1+\rho^{2}\right)^{2}}{1+\rho^{2}-\rho^{4}}\leq 1+\rho^{2}+2\rho^{4},

    where the first inequality holds because $(1+3\rho^{2}+4\rho^{4})(1-\rho^{2})-(1+\rho^{2}-\rho^{4})^{2}=\rho^{4}(2-2\rho^{2}-\rho^{4})\geq 0$ under the assumption that $\rho^{2}\leq\epsilon^{2}/256$. Therefore,

    f(t)\leq-\frac{\rho}{1+\rho^{2}}\left(1-\frac{\epsilon}{32}\right)+\frac{\rho(1+2\rho^{2})}{2}\left(1+\frac{2}{n-1}\right)\leq-\rho\left(\frac{1}{2}-\frac{\epsilon}{16}\right)\left(1-\frac{4}{n-1}\right),

    where the first inequality holds because $\rho^{2}<\frac{\epsilon^{2}}{256}$. Hence,

    \frac{1}{2}(n-1)\rho f(t)\leq-\frac{1}{4}(n-1)\rho^{2}\left(1-\frac{\epsilon}{32}\right)\left(1-\frac{4}{n-1}\right)\leq-\left(1+\frac{\epsilon}{32}\right)\left(1-\frac{1}{n}\right)\left(1-\frac{4}{n-1}\right)\log n,

    where the last inequality follows from the assumption that $\rho^{2}\geq(4+\epsilon)\frac{\log n}{n}$. Thus (53) holds for all sufficiently large $n$.
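Similarly, with $t=\frac{\rho}{1+\rho^{2}}$, the Case 2 bound $M_{2}\leq 1+\rho^{2}+2\rho^{4}$ can be spot-checked numerically in the relevant regime $\rho^{2}<\frac{\epsilon^{2}}{256}<\frac{1}{256}$ (a sanity-check sketch, not part of the proof):

```python
def M2_case2(r2):
    # Closed form of M_2 at t = rho/(1 + rho^2), in terms of r2 = rho^2.
    return (1 + r2) ** 2 / ((1 + 3 * r2 + 4 * r2 ** 2) * (1 - r2)) ** 0.5

# Check M_2 <= 1 + rho^2 + 2 rho^4 for rho^2 on a grid below 1/256.
for i in range(1, 40):
    r2 = i / 10240  # grid up to ~0.0038 < 1/256
    assert M2_case2(r2) <= 1 + r2 + 2 * r2 ** 2 + 1e-15
```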

4 Exact Recovery

4.1 Possibility of Exact Recovery

Building upon the almost exact recovery results in the preceding section, we now analyze the MLE for exact recovery. Improving upon Lemma 3, the following lemma gives a tighter bound on the probability that the MLE makes a small number of errors with respect to the general correlated Erdős-Rényi graph model specified by the joint distribution $P=(p_{ab}:a,b\in\{0,1\})$.

Lemma 5.

Suppose that, for a given constant $0<\epsilon<1$,

  • for general Erdős-Rényi random graphs, if $n\left(\sqrt{p_{00}p_{11}}-\sqrt{p_{01}p_{10}}\right)^{2}\geq\left(1+\epsilon\right)\log n$;

  • for the Gaussian model, if $n\rho^{2}\geq(4+\epsilon)\log n$;

then for any $k\in[n]$ such that $k\leq\frac{\epsilon}{16}n$,

\mathbb{P}\left\{d\left(\widehat{\pi}_{\sf ML},\pi\right)=k\right\}\leq\exp\left(-\frac{\epsilon}{8}k\log n\right). (54)

Note that Lemma 5 only holds when $k/n$ is small but requires less stringent conditions than Lemma 3. Inspecting the proof of Lemma 5, one can see that if the condition is strengthened by a factor of $2$ to $n\left(\sqrt{p_{00}p_{11}}-\sqrt{p_{01}p_{10}}\right)^{2}\geq\left(2+\epsilon\right)\log n$ for the Erdős-Rényi graphs (which is the assumption of [CK16]), then (54) holds for any $k\in[n]$. Conversely, we will prove in Section 4.2 that exact recovery is impossible if $n\left(\sqrt{p_{00}p_{11}}-\sqrt{p_{01}p_{10}}\right)^{2}\leq\left(1-\epsilon\right)\log n$. Closing this factor-of-two gap for general Erdős-Rényi graphs is an open problem.

In the following, we apply Lemma 5 for small $k$ and Lemma 3 (or Proposition 5) for large $k$ to show the success of the MLE in exact recovery.

Proof of positive parts in Theorem 1 and Theorem 4.

Without loss of generality, we assume $\epsilon<1$. For all $k\leq\frac{\epsilon}{16}n$, by Lemma 5, for both Erdős-Rényi random graphs and the Gaussian model,

\sum_{k=1}^{\frac{\epsilon}{16}n}\mathbb{P}\left\{d\left(\widehat{\pi}_{\sf ML},\pi\right)=k\right\}\leq\sum_{k=1}^{\frac{\epsilon}{16}n}\exp\left(-\frac{\epsilon}{8}k\log n\right)\leq\frac{\exp\left(-\frac{\epsilon}{8}\log n\right)}{1-\exp\left(-\frac{\epsilon}{8}\log n\right)}=n^{-\frac{\epsilon}{8}+o(1)}.

Moreover,

  • For the Gaussian model, pick $\delta=\frac{\epsilon}{16}$. Thus by (37) in Lemma 3 and (38), (58) follows.

  • For the subsampling Erdős-Rényi random graphs, we divide the proof into two cases depending on whether $ps\leq 1/2$.

    Case 1: $ps\leq\frac{1}{2}$. Note that condition (12) is equivalent to

    n\left(\sqrt{p_{11}p_{00}}-\sqrt{p_{01}p_{10}}\right)^{2}=nps^{2}f(p,s)\geq(1+\epsilon)\log n, (55)

    where

    f(p,s)\triangleq\left(\sqrt{1-2ps+ps^{2}}-\sqrt{p}(1-s)\right)^{2}.

    In the sequel, we aim to show that if (55) holds, then $nps^{2}(1-p)=\omega(1)$, (35) holds for $p=1-o(1)$, and (36) holds for $p=1-\Omega(1)$. To this end, we will use the inequality

    0\leq\frac{\partial f(p,s)}{\partial s}\leq 2\sqrt{p}, (56)

    which follows from

    \frac{\partial f(p,s)}{\partial s}=\frac{2\sqrt{p}\left(\sqrt{1-2ps+ps^{2}}-\sqrt{p}\left(1-s\right)\right)^{2}}{\sqrt{1-2ps+ps^{2}}}.
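The closed form of $\partial f/\partial s$ and the two-sided bound (56) can be sanity-checked against a finite-difference approximation (a numerical sketch, not part of the proof):

```python
import math

def f(p, s):
    return (math.sqrt(1 - 2 * p * s + p * s ** 2) - math.sqrt(p) * (1 - s)) ** 2

def dfds(p, s):
    # Closed-form partial derivative displayed above.
    g = math.sqrt(1 - 2 * p * s + p * s ** 2) - math.sqrt(p) * (1 - s)
    return 2 * math.sqrt(p) * g ** 2 / math.sqrt(1 - 2 * p * s + p * s ** 2)

eps = 1e-6
for p in (0.05, 0.3, 0.6):
    for s in (0.1, 0.4, 0.8):
        fd = (f(p, s + eps) - f(p, s - eps)) / (2 * eps)  # central difference
        assert abs(fd - dfds(p, s)) < 1e-4                # matches the closed form
        assert 0 <= dfds(p, s) <= 2 * math.sqrt(p)        # the bound (56)
```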
    • By (56), $f(p,s)$ is monotone increasing in $s$ on $(0,1)$. Therefore, $f(p,s)\leq 1-p$. Hence, if (55) holds, then $nps^{2}\left(1-p\right)=\omega(1)$.

    • Suppose $p=1-o(1)$. We have

      f(p,s)=p(1-s)^{2}\left(\sqrt{1+\frac{1-p}{p(1-s)^{2}}}-1\right)^{2}\leq\frac{(1-p)^{2}}{4p(1-s)^{2}},

      where the last inequality holds because $\sqrt{1+x}\leq 1+\frac{x}{2}$. Thus, if (55) holds, then

      \frac{n\left(1-p\right)^{2}}{\left(1-ps\right)^{2}}\geq\frac{n\left(1-p\right)^{2}}{\left(1+\frac{\epsilon}{2}\right)p\left(1-s\right)^{2}}\geq(4+\epsilon)\log n,

      where the first inequality holds because $p=1-o(1)$ and $ps\leq\frac{1}{2}$. Hence, (35) follows.

    • Suppose $p=1-\Omega(1)$. By the mean value theorem and (56), we have

      f(p,s)\leq f(p,0)+s\sup_{s:ps\leq 1/2}\frac{\partial f(p,s)}{\partial s}\leq\left(1-\sqrt{p}\right)^{2}+2\sqrt{p}s.

      Now, suppose $s=o(1)$ or $p=o(1)$. Then

      \left(1-\sqrt{p}\right)^{2}+2\sqrt{p}s\leq\left(1-\sqrt{p}\right)^{2}\left(1+\frac{\epsilon}{4}\right)

      for all sufficiently large $n$. Therefore, if (55) holds, then

      nps^{2}(1-\sqrt{p})^{2}\geq\left(1+\frac{\epsilon}{2}\right)\log n. (57)

      If instead $s=\Omega(1)$ and $p=\Omega(1)$, then (57) follows directly.

      In conclusion, for both cases, since $\log\frac{1}{p}-1+p\geq 2(1-\sqrt{p})^{2}$, we get that (36) holds.
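The elementary inequality $\log\frac{1}{p}-1+p\geq 2(1-\sqrt{p})^{2}$ invoked here holds on all of $(0,1]$: both sides vanish at $p=1$, and their difference is decreasing in $p$. A quick grid check (a numerical sketch, not part of the proof):

```python
import math

# Check log(1/p) - 1 + p >= 2 (1 - sqrt(p))^2 on a grid of p in (0, 1).
for i in range(1, 1000):
    p = i / 1000
    lhs = math.log(1 / p) - 1 + p
    rhs = 2 * (1 - math.sqrt(p)) ** 2
    assert lhs >= rhs - 1e-12
```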

    Finally, by Proposition 5, we have that

    \mathbb{P}\left\{d\left(\widehat{\pi}_{\sf ML},\pi\right)>\frac{\epsilon}{16}n\right\}\leq n^{-1+o(1)}. (58)

    Case 2: $ps>\frac{1}{2}$. In this case, we consider the equivalent model $(n,p^{\prime},s^{\prime})$ as specified in (34) with $p^{\prime}s^{\prime}=1-ps\leq 1/2$, obtained by flipping 0 and 1. Note that after flipping 0 and 1, both the value of $\sqrt{p_{11}p_{00}}-\sqrt{p_{01}p_{10}}$ and the MLE are clearly unchanged. Therefore, applying Case 1 of Theorem 4 to the model $(n,p^{\prime},s^{\prime})$, we get that (58) holds for the model $(n,p,s)$.

Hence, for both the Erdős-Rényi random graph and Gaussian model,

\mathbb{P}\left\{d\left(\widehat{\pi}_{\sf ML},\pi\right)>0\right\}\leq\sum_{k=1}^{\frac{\epsilon}{16}n}\mathbb{P}\left\{d\left(\widehat{\pi}_{\sf ML},\pi\right)=k\right\}+\mathbb{P}\left\{d\left(\widehat{\pi}_{\sf ML},\pi\right)>\frac{\epsilon}{16}n\right\}\leq n^{-\frac{\epsilon}{8}+o(1)}.

4.1.1 Proof of Lemma 5

In this proof we focus on the case with positive correlation, i.e., $p_{00}p_{11}\geq p_{01}p_{10}$ in the general Erdős-Rényi graphs and $\rho\geq 0$ in the Gaussian model. The case with negative correlation can be proved analogously by bounding the probability that $\left\langle A^{\pi^{\prime}}-A^{\pi},B\right\rangle\leq 0$.

Fix $k\in[n]$ and let ${\mathcal{T}}_{k}$ denote the set of permutations ${\pi^{\prime}}$ such that $d(\pi,{\pi^{\prime}})=k$. Let ${\mathcal{O}}_{1}^{\prime}$ be the set of fixed points of the edge permutation $\sigma^{\sf E}$, where in view of (15),

|{\mathcal{O}}_{1}^{\prime}|=N_{1}=\binom{n_{1}}{2}+n_{2}=\binom{n-k}{2}+n_{2}.

Then, applying the Chernoff bound together with the union bound yields that for any $t\geq 0$,

\mathbb{P}\left\{d\left(\widehat{\pi}_{\sf ML},\pi\right)=k\right\} \leq \sum_{{\pi^{\prime}}\in{\mathcal{T}}_{k}}\mathbb{P}\left\{Z_{\pi^{\prime}}\geq 0\right\}\leq n^{k}\mathbb{E}\left[\exp\left(tZ_{\pi^{\prime}}\right)\right], (59)

where

Z_{\pi^{\prime}} \triangleq \sum_{i<j}A_{{\pi^{\prime}}(i){\pi^{\prime}}(j)}B_{ij}-\sum_{i<j}A_{\pi(i)\pi(j)}B_{ij}
    = \sum_{O\in{\mathcal{O}}\backslash{\mathcal{O}}_{1}^{\prime}}\left(\sum_{(i,j)\in O}A_{{\pi^{\prime}}(i){\pi^{\prime}}(j)}B_{ij}-\sum_{(i,j)\in O}A_{\pi(i)\pi(j)}B_{ij}\right)\triangleq\sum_{O\in{\mathcal{O}}\backslash{\mathcal{O}}_{1}^{\prime}}Z_{O},

where

Z_{O}=X_{O}-Y_{O}.

Since edge orbits are disjoint, it follows that the $Z_{O}$ are mutually independent across different $O$. Therefore, for any $t>0$,

\mathbb{E}\left[\exp\left(t\sum_{O\in{\mathcal{O}}\backslash{\mathcal{O}}_{1}^{\prime}}Z_{O}\right)\right]=\prod_{O\in{\mathcal{O}}\backslash{\mathcal{O}}^{\prime}_{1}}\mathbb{E}\left[\exp\left(tZ_{O}\right)\right].

It turns out that the MGF of $Z_{O}$ can be explicitly computed in both the Erdős-Rényi random graph and the Gaussian model. In particular, applying Lemma 8 yields that $\mathbb{E}\left[e^{tZ_{O}}\right]=L_{|O|}$ and $L_{\ell}\leq L_{2}^{\ell/2}$ for $\ell\geq 2$. It follows that

\mathbb{E}\left[\exp\left(tZ_{\pi^{\prime}}\right)\right]=\prod_{\ell=2}^{\binom{n}{2}}L_{\ell}^{N_{\ell}}\leq\prod_{\ell=2}^{\binom{n}{2}}L_{2}^{\ell N_{\ell}/2}\leq L_{2}^{\frac{m}{2}},

where $m\triangleq\binom{n}{2}-\binom{n-k}{2}-n_{2}$ and the last inequality follows from $\sum_{\ell=1}^{\binom{n}{2}}\ell N_{\ell}=\binom{n}{2}$ and $N_{1}=\binom{n_{1}}{2}+n_{2}=\binom{n-k}{2}+n_{2}$. Hence,

\mathbb{P}\left\{d\left(\widehat{\pi}_{\sf ML},\pi\right)=k\right\} \overset{(a)}{\leq} \exp\left(k\log n+\frac{1}{2}kn\left(1-\frac{k+2}{2n}\right)\log L_{2}\right)
    \overset{(b)}{\leq} \exp\left(-\left(\frac{\epsilon}{4}-\frac{k+2}{2n}\left(1+\frac{\epsilon}{4}\right)\right)k\log n\right)
    \overset{(c)}{\leq} \exp\left(-\frac{\epsilon}{8}k\log n\right),

where $(a)$ holds because $n_{2}\leq k/2$ and hence $m\geq kn\left(1-\frac{k+2}{2n}\right)$; $(b)$ holds by the claim that $n\log L_{2}\leq-2(1+\epsilon/4)\log n$ for appropriately chosen $t$; $(c)$ holds for all sufficiently large $n$, given $k/n\leq\frac{\epsilon}{16}$ and $0<\epsilon<1$. It remains to check $n\log L_{2}\leq-2(1+\epsilon/4)\log n$ by choosing $t$ appropriately for the Erdős-Rényi random graphs and the Gaussian model separately.

  • For Erdős-Rényi random graphs, in view of (67) in Lemma 8,

    L_{2}=1+2\left(p_{01}p_{10}\left(e^{t}-1\right)+p_{00}p_{11}\left(e^{-t}-1\right)\right).

    Since $p_{00}p_{11}\geq p_{01}p_{10}$, choosing the optimal $t\geq 0$ such that $e^{t}=\sqrt{\frac{p_{00}p_{11}}{p_{01}p_{10}}}$ gives $L_{2}=1-2\left(\sqrt{p_{00}p_{11}}-\sqrt{p_{01}p_{10}}\right)^{2}$, and hence

    n\log L_{2}\leq-2n\left(\sqrt{p_{00}p_{11}}-\sqrt{p_{01}p_{10}}\right)^{2}\leq-2(1+\epsilon)\log n,

    where the last inequality holds by the assumption that $n\left(\sqrt{p_{00}p_{11}}-\sqrt{p_{01}p_{10}}\right)^{2}\geq(1+\epsilon)\log n$.

  • For Gaussian model, in view of (68) in Lemma 8,

    L_{2}=\frac{1}{\sqrt{1+4t\rho-4t^{2}\left(1-\rho^{2}\right)}}.

    Choosing the optimal $t=\frac{\rho}{2\left(1-\rho^{2}\right)}\geq 0$ gives $L_{2}=\sqrt{1-\rho^{2}}$, and hence

    n\log L_{2}\leq-\frac{1}{2}n\rho^{2}\leq-2(1+\epsilon/4)\log n,

    where the last inequality holds by the assumption that $n\rho^{2}\geq(4+\epsilon)\log n$.
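The two optimizations of $L_{2}$ above can be replayed numerically; the joint probabilities below are hypothetical values chosen only for illustration (a sanity-check sketch, not part of the proof):

```python
import math

# Erdos-Renyi case: L2(t) as above, minimized at e^t = sqrt(p00 p11 / (p01 p10)).
p00, p01, p10, p11 = 0.4, 0.1, 0.2, 0.3  # hypothetical joint pmf (sums to 1)
L2 = lambda t: 1 + 2 * (p01 * p10 * (math.exp(t) - 1) + p00 * p11 * (math.exp(-t) - 1))
t_star = 0.5 * math.log(p00 * p11 / (p01 * p10))
closed_form = 1 - 2 * (math.sqrt(p00 * p11) - math.sqrt(p01 * p10)) ** 2
assert abs(L2(t_star) - closed_form) < 1e-12
assert all(L2(t_star) <= L2(t_star + d) for d in (-0.1, 0.1))  # t_star minimizes L2

# Gaussian case: L2(t) as above, minimized at t = rho / (2 (1 - rho^2)).
rho = 0.6
L2g = lambda t: 1 / math.sqrt(1 + 4 * t * rho - 4 * t ** 2 * (1 - rho ** 2))
assert abs(L2g(rho / (2 * (1 - rho ** 2))) - math.sqrt(1 - rho ** 2)) < 1e-12
```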

4.2 Impossibility of Exact Recovery

In this subsection we prove the negative result in Theorem 4. As in Section 4.1, we consider a general correlated Erdős-Rényi graph model, where $\mathbb{P}\left\{A_{\pi(i)\pi(j)}=a,B_{ij}=b\right\}=p_{ab}$. We aim to show that if

n\left(\sqrt{p_{00}p_{11}}-\sqrt{p_{01}p_{10}}\right)^{2}\leq\left(1-\epsilon\right)\log n, (60)

then the exact recovery of the latent permutation $\pi$ is impossible. Particularizing to the subsampling model parameterized by (11), the condition (60) reduces to

nps^{2}\left(\sqrt{1-2ps+ps^{2}}-\sqrt{p}(1-s)\right)^{2}\leq(1-\epsilon)\log n,

which, under the assumption that p=1Ω(1)p=1-\Omega(1) in Theorem 4, is further equivalent to the desired (13).

Since the true permutation $\pi$ is uniformly distributed, the MLE $\widehat{\pi}_{\sf ML}$ minimizes the error probability among all estimators. In the sequel, we focus on the case of positive correlation, as the other case is entirely analogous. Without loss of generality, the latent permutation $\pi$ is assumed to be the identity permutation $\mathrm{id}$.

To prove the failure of the MLE, it suffices to show the existence of a permutation $\pi^{\prime}$ that achieves a higher likelihood than the true permutation $\pi=\mathrm{id}$. To this end, we consider the set ${\mathcal{T}}_{2}$ of permutations $\pi^{\prime}$ such that $d(\pi^{\prime},\pi)=2$; in other words, $\pi^{\prime}$ coincides with $\pi$ except for swapping two vertices. Then the cycle decomposition of $\pi^{\prime}$ consists of $n-2$ fixed points and a $2$-node orbit $(i,j)$ for some pair of vertices $(i,j)$. It follows that

\left\langle A,B\right\rangle-\langle A^{{\pi^{\prime}}},B\rangle=\sum_{k\in[n]\backslash\{i,j\}}\left(A_{ik}-A_{jk}\right)\left(B_{ik}-B_{jk}\right)\triangleq\sum_{k\in[n]\backslash\{i,j\}}X_{ij,k},

where $X_{ij,k}\stackrel{\text{i.i.d.}}{\sim}a\delta_{+1}+b\delta_{-1}+\left(1-a-b\right)\delta_{0}$ for $k\in[n]\backslash\{i,j\}$, with $a\triangleq 2p_{00}p_{11}$, $b\triangleq 2p_{01}p_{10}$, and $a\geq b$ by assumption. We aim to show the existence of $\pi^{\prime}\in{\mathcal{T}}_{2}$ such that $\left\langle A,B\right\rangle\leq\langle A^{{\pi^{\prime}}},B\rangle$, which further implies that $\pi^{\prime}$ has at least as high a likelihood as $\pi$. We divide the remaining analysis into two cases depending on whether $na\geq(2-\epsilon)\log n$.
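The distribution of $X_{ij,k}$ can be confirmed by brute-force enumeration over the two independent edge pairs $(A_{ik},B_{ik})$ and $(A_{jk},B_{jk})$; the joint pmf below is a hypothetical example (a sanity-check sketch, not part of the proof):

```python
from itertools import product

# Hypothetical joint pmf p[a, b] = P{A-edge = a, B-edge = b}; entries sum to 1.
p = {(0, 0): 0.5, (0, 1): 0.1, (1, 0): 0.15, (1, 1): 0.25}

# X = (A_ik - A_jk)(B_ik - B_jk), the two edge pairs being independent.
dist = {-1: 0.0, 0: 0.0, 1: 0.0}
for (a1, b1), (a2, b2) in product(p, repeat=2):
    dist[(a1 - a2) * (b1 - b2)] += p[(a1, b1)] * p[(a2, b2)]

assert abs(dist[+1] - 2 * p[(0, 0)] * p[(1, 1)]) < 1e-12  # a = 2 p00 p11
assert abs(dist[-1] - 2 * p[(0, 1)] * p[(1, 0)]) < 1e-12  # b = 2 p01 p10
assert abs(sum(dist.values()) - 1) < 1e-12
```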

First we consider the case where $na\leq(2-\epsilon)\log n$. We show that with probability at least $1-n^{-\Omega(\epsilon)}$, there are at least $n^{\Omega(\epsilon)}$ distinct $\pi^{\prime}\in{\mathcal{T}}_{2}$ such that $\left\langle A,B\right\rangle\leq\langle A^{\pi^{\prime}},B\rangle$. This implies that the MLE $\widehat{\pi}_{\rm ML}$ coincides with $\pi$ with probability at most $n^{-\Omega(\epsilon)}$. Specifically, define $\chi_{ij}$ as the indicator random variable which equals $1$ if $X_{ij,k}\leq 0$ for all $k\neq i,j$, and $0$ otherwise. Then

\mathbb{P}\left\{\chi_{ij}=1\right\}=\prod_{k\neq i,j}\mathbb{P}\left\{X_{ij,k}\leq 0\right\}=\left(1-a\right)^{n-2}\geq n^{-2+\epsilon-o(1)},

where the last inequality holds because $an\leq(2-\epsilon)\log n$. Let $I=\sum_{1\leq i\leq n/2}\sum_{n/2<j\leq n}\chi_{ij}$. Then $\mathbb{E}\left[I\right]\geq n^{\epsilon-o(1)}$. Next, we show $\mathsf{var}(I)=o\left(\mathbb{E}\left[I\right]^{2}\right)$, which, by Chebyshev’s inequality, implies that $I\geq\frac{1}{2}\mathbb{E}\left[I\right]$ with high probability. Note that

\mathsf{var}(I)=\sum_{1\leq i,i^{\prime}\leq n/2}\;\sum_{n/2<j,j^{\prime}\leq n}\mathbb{P}\left\{\chi_{ij}=1,\chi_{i^{\prime}j^{\prime}}=1\right\}-\mathbb{P}\left\{\chi_{ij}=1\right\}\mathbb{P}\left\{\chi_{i^{\prime}j^{\prime}}=1\right\}.

We break the analysis into the following four cases depending on the choice of (i,j)(i,j) and (i,j)(i^{\prime},j^{\prime}).

Case 1: $i=i^{\prime}$ and $j=j^{\prime}$. In this case,

\mathbb{P}\left\{\chi_{ij}=1,\chi_{i^{\prime}j^{\prime}}=1\right\}-\mathbb{P}\left\{\chi_{ij}=1\right\}\mathbb{P}\left\{\chi_{i^{\prime}j^{\prime}}=1\right\}\leq\mathbb{P}\left\{\chi_{ij}=1\right\}.

Case 2: $i=i^{\prime}$ and $j\neq j^{\prime}$. In this case, note that

\mathbb{P}\left\{\chi_{ij^{\prime}}=1\mid\chi_{ij}=1\right\} = \prod_{k\neq i,j^{\prime}}\mathbb{P}\left\{X_{ij^{\prime},k}\leq 0\mid\chi_{ij}=1\right\}
    = \mathbb{P}\left\{X_{ij^{\prime},j}\leq 0\mid X_{ij,j^{\prime}}\leq 0\right\}\times\prod_{k\neq i,j,j^{\prime}}\mathbb{P}\left\{X_{ij^{\prime},k}\leq 0\mid X_{ij,k}\leq 0\right\}
    \leq \left(1-p_{00}p_{11}\right)^{n-2},

where the last inequality holds because

\mathbb{P}\left\{X_{ij^{\prime},k}\leq 0\mid X_{ij,k}\leq 0\right\} = \frac{p_{11}\left(1-p_{00}\right)^{2}+p_{01}+p_{10}+p_{00}\left(1-p_{11}\right)^{2}}{1-2p_{00}p_{11}}
    = \frac{1-4p_{00}p_{11}+p_{00}p_{11}\left(p_{00}+p_{11}\right)}{1-2p_{00}p_{11}}
    \leq \frac{1-3p_{00}p_{11}}{1-2p_{00}p_{11}}\leq 1-p_{00}p_{11},

and similarly for $\mathbb{P}\left\{X_{ij^{\prime},j}\leq 0\mid X_{ij,j^{\prime}}\leq 0\right\}$.
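The chain of inequalities above reduces to elementary facts about $c=p_{00}p_{11}$ when $p_{00}+p_{11}\leq 1$; a grid check (a numerical sketch, not part of the proof):

```python
# With c = p00 * p11 and p00 + p11 <= 1 (so c <= 1/4 and 1 - 2c > 0), check
# (1 - 4c + c (p00 + p11)) / (1 - 2c) <= (1 - 3c) / (1 - 2c) <= 1 - c.
for i in range(1, 100):
    for j in range(1, 100 - i):
        p00, p11 = i / 100, j / 100   # p00 + p11 < 1
        c = p00 * p11
        lhs = (1 - 4 * c + c * (p00 + p11)) / (1 - 2 * c)
        mid = (1 - 3 * c) / (1 - 2 * c)
        assert lhs <= mid + 1e-12
        assert mid <= 1 - c + 1e-12
```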

Case 3: $i\neq i^{\prime}$ and $j=j^{\prime}$. Analogous to Case 2, we have

\mathbb{P}\left\{\chi_{i^{\prime}j}=1\mid\chi_{ij}=1\right\}\leq\left(1-p_{00}p_{11}\right)^{n-2}.

Case 4: $i\neq i^{\prime}$ and $j\neq j^{\prime}$. In this case,

\mathbb{P}\left\{\chi_{i^{\prime}j^{\prime}}=1\mid\chi_{ij}=1\right\} = \prod_{k\neq i^{\prime},j^{\prime}}\mathbb{P}\left\{X_{i^{\prime}j^{\prime},k}\leq 0\mid\chi_{ij}=1\right\}
    \leq \prod_{k\neq i,i^{\prime},j,j^{\prime}}\mathbb{P}\left\{X_{i^{\prime}j^{\prime},k}\leq 0\right\}
    = \left(1-2p_{00}p_{11}\right)^{n-4},

where the first inequality holds because for all $k\neq i,i^{\prime},j,j^{\prime}$, the $X_{i^{\prime}j^{\prime},k}$ are independent of $\{X_{ij,k}\}_{k\neq i,j}$. Therefore,

\mathbb{P}\left\{\chi_{ij}=1,\chi_{i^{\prime}j^{\prime}}=1\right\}-\mathbb{P}\left\{\chi_{ij}=1\right\}\mathbb{P}\left\{\chi_{i^{\prime}j^{\prime}}=1\right\} \leq \left(1-2p_{00}p_{11}\right)^{2n-6}\left(1-\left(1-2p_{00}p_{11}\right)^{2}\right)
    \leq 4p_{00}p_{11}\left(1-2p_{00}p_{11}\right)^{2n-6}.

Combining all four cases above, we get that

𝗏𝖺𝗋(I)𝔼[I](1+n(1p00p11)n2+n2p00p11(12p00p11)n4)\displaystyle\mathsf{var}(I)\leq\mathbb{E}\left[I\right]\left(1+n\left(1-p_{00}p_{11}\right)^{n-2}+n^{2}p_{00}p_{11}\left(1-2p_{00}p_{11}\right)^{n-4}\right)

Therefore,

𝗏𝖺𝗋(I)𝔼[I]21+n(1p00p11)n2+n2p00p11(12p00p11)n4n2(12p00p11)n2=O(nΩ(ϵ)),\frac{\mathsf{var}(I)}{\mathbb{E}\left[I\right]^{2}}\leq\frac{1+n\left(1-p_{00}p_{11}\right)^{n-2}+n^{2}p_{00}p_{11}\left(1-2p_{00}p_{11}\right)^{n-4}}{n^{2}\left(1-2p_{00}p_{11}\right)^{n-2}}=O\left(n^{-\Omega(\epsilon)}\right),

where the last inequality holds because np00p11(1ϵ/2)lognnp_{00}p_{11}\leq(1-\epsilon/2)\log n. In conclusion, we have shown that InΩ(ϵ)I\geq n^{\Omega(\epsilon)} with probability at least 1nΩ(ϵ)1-n^{-\Omega(\epsilon)}. This implies that the MLE exactly recovers the true permutation with probability at most nΩ(ϵ)n^{-\Omega(\epsilon)}.

Next, we turn to the case where na\geq(2-\epsilon)\log n. Since the assumption (60) translates to n(\sqrt{a}-\sqrt{b})^{2}\leq 2(1-\epsilon)\log n, we have

(2ϵ)logn(1ba)2na(1ba)22(1ϵ)logn.(2-\epsilon)\log n\left(1-\sqrt{\frac{b}{a}}\right)^{2}\leq na\left(1-\sqrt{\frac{b}{a}}\right)^{2}\leq 2(1-\epsilon)\log n.

It follows that

ba12(1ϵ)2ϵ=11ϵ2ϵϵ2(2ϵ)ϵ4\sqrt{\frac{b}{a}}\geq 1-\sqrt{\frac{2(1-\epsilon)}{2-\epsilon}}=1-\sqrt{1-\frac{\epsilon}{2-\epsilon}}\geq\frac{\epsilon}{2(2-\epsilon)}\geq\frac{\epsilon}{4}

and hence \frac{\epsilon^{2}}{16}\leq\frac{b}{a}\leq 1. Let T be a set of 2m vertices, where m=\left\lfloor n/\log^{2}n\right\rfloor. We further partition T into T_{1}\cup T_{2}, where |T_{1}|=|T_{2}|=m. Let {\mathcal{S}} denote the set of permutations {\pi^{\prime}} that consist of n-2 fixed points and a 2-node orbit (i,j) for some i\in T_{1} and j\in T_{2}. Then |{\mathcal{S}}|=m^{2}. Next we show that

{π𝒮 s.t. A,BAπ,B<0}=1o(1).\displaystyle\mathbb{P}\left\{\exists\pi^{\prime}\in{\mathcal{S}}\text{ s.t. }\left\langle A,B\right\rangle-\left\langle A^{{\pi^{\prime}}},B\right\rangle<0\right\}=1-o(1). (61)

Fix a π𝒮{\pi^{\prime}}\in{\mathcal{S}} with (i,j)(i,j) as its 22-node orbit, i.e., π(i)=j{\pi^{\prime}}(i)=j, π(j)=i{\pi^{\prime}}(j)=i, and π(k)=k{\pi^{\prime}}(k)=k for any k[n]\{i,j}k\in[n]\backslash\{i,j\}. Then A,BAπ,B=Xij+Yij,\left\langle A,B\right\rangle-\left\langle A^{{\pi^{\prime}}},B\right\rangle=X_{ij}+Y_{ij}, where

XijkTcXij,k, and YijkT\{i,j}Xij,k.X_{ij}\triangleq\sum_{k\in T^{c}}X_{ij,k},\quad\text{ and }\quad Y_{ij}\triangleq\sum_{k\in T\backslash\{i,j\}}X_{ij,k}.

Note that 𝔼[Yij]=(2m2)(ab)\mathbb{E}\left[Y_{ij}\right]=\left(2m-2\right)(a-b) and Var[Yij](2m2)(a+b)\mathrm{Var}{\left[Y_{ij}\right]}\leq\left(2m-2\right)\left(a+b\right). Letting

τ(2m2)(ab)+(2m2)(a+b)logn,\tau\triangleq\left(2m-2\right)(a-b)+\sqrt{\left(2m-2\right)\left(a+b\right)\log n},

by Chebyshev’s inequality,

{Yijτ}1logn.\displaystyle\mathbb{P}\left\{Y_{ij}\geq\tau\right\}\leq\frac{1}{\log n}. (62)

Define

T={(i,j)T1×T2:Yij<τ}.T^{\prime}=\left\{(i,j)\in T_{1}\times T_{2}:Y_{ij}<\tau\right\}.

Then \mathbb{E}\left[\left|(T_{1}\times T_{2})\backslash T^{\prime}\right|\right]\leq m^{2}/\log n. Hence, by Markov's inequality, \left|(T_{1}\times T_{2})\backslash T^{\prime}\right|\geq\frac{1}{2}m^{2} with probability at most \frac{2}{\log n}; equivalently, |T^{\prime}|\geq\frac{1}{2}m^{2} with probability at least 1-\frac{2}{\log n}.

Note that, crucially, T^{\prime} is independent of \{X_{ij}\}_{i\in T_{1},j\in T_{2}}. Thus, we condition on the event that |T^{\prime}|\geq\frac{1}{2}m^{2} in the sequel. Define I_{n}=\sum_{(i,j)\in T^{\prime}}{\mathbf{1}_{\left\{{X_{ij}\leq-\tau}\right\}}}. To bound \mathbb{P}\{X_{ij}\leq-\tau\} from below, we need the following reverse large-deviation estimate (proved at the end of this subsection):

Lemma 6.

Suppose \{X_{k}\}_{k\in[n]}{\stackrel{{\scriptstyle\text{i.i.d.}}}{{\sim}}}a\delta_{+1}+b\delta_{-1}+(1-a-b)\delta_{0} for some a,b\in[0,1] such that a+b\leq 1, 1\leq\frac{a}{b}=\Theta(1), an=\omega(1), and \left(\sqrt{a}-\sqrt{b}\right)^{2}\leq 2\frac{\log n}{n}. For all \tau such that \tau=o\left(\sqrt{an\log n}\right) and \tau=o(an), and any constant 0<\delta<1, there exists n_{0} such that for all n\geq n_{0},

{k=1nXkτ}exp(n(ab)2δ2logn).\displaystyle\mathbb{P}\left\{\sum_{k=1}^{n}X_{k}\leq-\tau\right\}\geq\exp\left(-n\left(\sqrt{a}-\sqrt{b}\right)^{2}-\frac{\delta}{2}\log n\right).
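Lemma 6 can be checked numerically at moderate n, since the law of \sum_{k}X_{k} is computable exactly by convolving the pmf n times. The sketch below uses illustrative parameters satisfying the lemma's assumptions (not quantities from the proof):

```python
import math

# Illustrative parameters: a/b = 2 = Theta(1), an = 10, tau = 0, and
# (sqrt(a) - sqrt(b))^2 <= 2 log(n) / n, as required by Lemma 6.
n, a, b, delta, tau = 500, 0.02, 0.01, 0.9, 0

def tail_prob(n, a, b, tau):
    """Exact P( sum_{k=1}^n X_k <= -tau ) for i.i.d. X_k in {-1, 0, +1}
    with P(X=+1) = a, P(X=-1) = b, via repeated convolution of the pmf."""
    base = [b, 1 - a - b, a]          # probabilities of the values -1, 0, +1
    dist = [1.0]                      # point mass at 0 (empty sum)
    for _ in range(n):
        new = [0.0] * (len(dist) + 2)
        for i, pi in enumerate(dist):
            for j, pj in enumerate(base):
                new[i + j] += pi * pj
        dist = new
    # entry i of dist is P(sum = i - n); keep the values <= -tau
    return sum(dist[: n - tau + 1])

lhs = tail_prob(n, a, b, tau)
rhs = math.exp(-n * (math.sqrt(a) - math.sqrt(b)) ** 2 - delta / 2 * math.log(n))
assert lhs >= rhs     # the lower bound of Lemma 6 holds at these parameters
```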

To apply this lemma, note that τ=O(n(ab)log2n+nalogn)\tau=O\left(\frac{n(a-b)}{\log^{2}n}+\sqrt{\frac{na}{\log n}}\right), so

τanlogn\displaystyle\frac{\tau}{\sqrt{an\log n}} =O((ab)nlog5n+1logn)=O(1logn)\displaystyle=O\left(\left(\sqrt{a}-\sqrt{b}\right)\sqrt{\frac{n}{\log^{5}n}}+\frac{1}{\log n}\right)=O\left(\frac{1}{\log n}\right)
τan\displaystyle\frac{\tau}{an} =O(1log2n+1anlogn)=O(1logn),\displaystyle=O\left(\frac{1}{\log^{2}n}+\frac{1}{\sqrt{an\log n}}\right)=O\left(\frac{1}{\sqrt{\log n}}\right),

where we used the fact that mn/log2nm\leq n/\log^{2}n, bab\leq a, ab=O(log(n)/n)\sqrt{a}-\sqrt{b}=O(\sqrt{\log(n)/n}), and an=ω(1)an=\omega(1). Then, applying Lemma 6 with δ=ϵ4\delta=\frac{\epsilon}{4} yields

𝔼[In]12m2exp(|Tc|(ab)2ϵ8logn)nϵo(1),\displaystyle\mathbb{E}\left[I_{n}\right]\geq\frac{1}{2}m^{2}\exp\left(-|T^{c}|\left(\sqrt{a}-\sqrt{b}\right)^{2}-\frac{\epsilon}{8}\log n\right)\geq n^{\epsilon-o(1)}, (63)

where the last inequality holds by assumption that (ab)22(1ϵ)lognn\left(\sqrt{a}-\sqrt{b}\right)^{2}\leq 2\left(1-\epsilon\right)\frac{\log n}{n}. Next, we show that Var[In]/𝔼[In]2=o(1)\mathrm{Var}{\left[I_{n}\right]}/\mathbb{E}\left[I_{n}\right]^{2}=o(1), which, by Chebyshev’s inequality, implies that In12𝔼[In]I_{n}\geq\frac{1}{2}\mathbb{E}\left[I_{n}\right] with high probability.

Write

Var[In]\displaystyle\mathrm{Var}{\left[I_{n}\right]} =(i,j),(i,j)T({Xijτ,Xijτ}{Xijτ}{Xijτ})(I)+(II),\displaystyle=\sum_{(i,j),(i^{\prime},j^{\prime})\in T^{\prime}}\left(\mathbb{P}\left\{X_{ij}\leq-\tau,X_{i^{\prime}j^{\prime}}\leq-\tau\right\}-\mathbb{P}\left\{X_{ij}\leq-\tau\right\}\mathbb{P}\left\{X_{i^{\prime}j^{\prime}}\leq-\tau\right\}\right)\leq\text{(I)}+\text{(II)},

where

(I) =(i,j)T{Xijτ}=𝔼[In],\displaystyle=\sum_{(i,j)\in T^{\prime}}\mathbb{P}\left\{X_{ij}\leq-\tau\right\}=\mathbb{E}\left[I_{n}\right],
(II) =(i,j),(i,j)T,jj{Xij0,Xij0}+(i,j),(i,j)T,ii{Xij0,Xij0}.\displaystyle=\sum_{(i,j),(i,j^{\prime})\in T^{\prime},j\neq j^{\prime}}\mathbb{P}\left\{X_{ij}\leq 0,X_{ij^{\prime}}\leq 0\right\}+\sum_{(i,j),(i^{\prime},j)\in T^{\prime},i\neq i^{\prime}}\mathbb{P}\left\{X_{ij}\leq 0,X_{i^{\prime}j}\leq 0\right\}.

To bound (II), fix (i,j),(i,j^{\prime})\in T^{\prime} such that j\neq j^{\prime}. Note that, conditional on \{(A_{ik},B_{ik})\}_{k\in T^{c}}, the variables \{X_{ij,k}\}_{k\in T^{c}} are independent, with X_{ij,k}\sim{\rm Bern}\left(p_{00}\right) conditional on A_{ik}=1,B_{ik}=1; -X_{ij,k}\sim{\rm Bern}\left(p_{10}\right) conditional on A_{ik}=1,B_{ik}=0; -X_{ij,k}\sim{\rm Bern}\left(p_{01}\right) conditional on A_{ik}=0,B_{ik}=1; and X_{ij,k}\sim{\rm Bern}\left(p_{11}\right) conditional on A_{ik}=0,B_{ik}=0. Then, for \ell,\ell^{\prime}\in\{0,1\}, we define

M=|{kTc|Aik=,Bik=}|,M_{\ell\ell^{\prime}}=\left|\left\{k\in T^{c}|A_{ik}=\ell,B_{ik}=\ell^{\prime}\right\}\right|,

and get that for any λ0\lambda\geq 0,

{kTcXij,k0,kTcXij,k0}\displaystyle\mathbb{P}\left\{\sum_{k\in T^{c}}X_{ij,k}\leq 0,\sum_{k\in T^{c}}X_{ij^{\prime},k}\leq 0\right\}
\displaystyle\overset{(a)}{=}\mathbb{E}\left[\mathbb{P}\left\{\sum_{k\in T^{c}}X_{ij,k}\leq 0\bigg{|}M_{11},M_{10},M_{01},M_{00}\right\}\mathbb{P}\left\{\sum_{k\in T^{c}}X_{ij^{\prime},k}\leq 0\bigg{|}M_{11},M_{10},M_{01},M_{00}\right\}\right]
\displaystyle\overset{(b)}{=}\mathbb{E}\left[\gamma_{00}^{2M_{11}}\gamma_{01}^{2M_{10}}\gamma_{10}^{2M_{01}}\gamma_{11}^{2M_{00}}\right]
=(c)(γ002p11+γ012p10+γ102p01+γ112p00)|Tc|,\displaystyle\overset{(c)}{=}\left(\gamma_{00}^{2}p_{11}+\gamma_{01}^{2}p_{10}+\gamma_{10}^{2}p_{01}+\gamma_{11}^{2}p_{00}\right)^{|T^{c}|},

where \gamma_{\ell\ell^{\prime}}=1-p_{\ell\ell^{\prime}}+p_{\ell\ell^{\prime}}e^{(2|\ell-\ell^{\prime}|-1)\lambda}; (a) holds because \sum_{k\in T^{c}}X_{ij,k} and \sum_{k\in T^{c}}X_{ij^{\prime},k} are independent conditional on M_{11},M_{10},M_{01},M_{00}; (b) holds by applying the Chernoff bound; (c) holds by applying the MGF of the multinomial distribution, since (M_{11},M_{10},M_{01},M_{00})\sim\mathrm{Multi}(|T^{c}|,(p_{11},p_{10},p_{01},p_{00})). Choosing e^{\lambda}=\sqrt{\frac{p_{00}p_{11}}{p_{01}p_{10}}}=\sqrt{\frac{a}{b}} where a=2p_{00}p_{11} and b=2p_{01}p_{10}, we have

{kTcXij,k0,kTcXij,k0}(132(ab)2)|Tc|.\displaystyle\mathbb{P}\left\{\sum_{k\in T^{c}}X_{ij,k}\leq 0,\sum_{k\in T^{c}}X_{ij^{\prime},k}\leq 0\right\}\leq\left(1-\frac{3}{2}\left(\sqrt{a}-\sqrt{b}\right)^{2}\right)^{|T^{c}|}\,.

The same upper bound applies to any (i,j),(i,j)T(i,j),(i^{\prime},j)\in T^{\prime} such that iii\neq i^{\prime}.
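As a sanity check of this Chernoff step: since p_{00}+p_{01}+p_{10}+p_{11}=1, at the chosen value of e^{\lambda} the base on the right-hand side is in fact exactly 1-\frac{3}{2}(\sqrt{a}-\sqrt{b})^{2}. A short Python sketch with illustrative probabilities (not values from the paper):

```python
import math

def chernoff_base(p00, p01, p10, p11):
    """gamma_00^2 p11 + gamma_01^2 p10 + gamma_10^2 p01 + gamma_11^2 p00,
    evaluated at e^lambda = sqrt(p00 p11 / (p01 p10))."""
    el = math.sqrt((p00 * p11) / (p01 * p10))
    g00 = 1 - p00 + p00 / el         # l = l': exponent -lambda
    g11 = 1 - p11 + p11 / el
    g01 = 1 - p01 + p01 * el         # l != l': exponent +lambda
    g10 = 1 - p10 + p10 * el
    return g00**2 * p11 + g01**2 * p10 + g10**2 * p01 + g11**2 * p00

# Illustrative probabilities summing to one (marginals of p = 0.3, s = 0.7).
q = 0.3 * 0.7
p11, p10, p01 = q * 0.7, q * 0.3, q * 0.3
p00 = 1 - p11 - p10 - p01
a, b = 2 * p00 * p11, 2 * p01 * p10
target = 1 - 1.5 * (math.sqrt(a) - math.sqrt(b)) ** 2
assert abs(chernoff_base(p00, p01, p10, p11) - target) < 1e-10
```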

Therefore, we get that

(II)2m3(132(ab)2)|Tc|\displaystyle\text{(II)}\leq 2m^{3}\left(1-\frac{3}{2}\left(\sqrt{a}-\sqrt{b}\right)^{2}\right)^{|T^{c}|} 2m3exp(32(ab)2|Tc|).\displaystyle\leq 2m^{3}\exp\left(-\frac{3}{2}\left(\sqrt{a}-\sqrt{b}\right)^{2}|T^{c}|\right).

Hence, by Chebyshev’s inequality,

{In12𝔼[In]}\displaystyle\mathbb{P}\left\{I_{n}\leq\frac{1}{2}\mathbb{E}\left[I_{n}\right]\right\} Var[In]14(𝔼[In])2\displaystyle\leq\frac{\mathrm{Var}{\left[I_{n}\right]}}{\frac{1}{4}\left(\mathbb{E}\left[I_{n}\right]\right)^{2}}
4𝔼[In]+8×(II)(𝔼[In])2\displaystyle\leq\frac{4}{\mathbb{E}\left[I_{n}\right]}+\frac{8\times\text{(II)}}{\left(\mathbb{E}\left[I_{n}\right]\right)^{2}}
\displaystyle\overset{(a)}{\leq}n^{-\epsilon+o(1)}+\frac{64}{m}\exp\left(\frac{1}{2}\left(\sqrt{a}-\sqrt{b}\right)^{2}|T^{c}|+\frac{\epsilon}{4}\log n\right)
\displaystyle\overset{(b)}{\leq}n^{-3\epsilon/4+o(1)},

where (a)(a) is due to (63); (b)(b) holds by the assumption that (ab)22(1ϵ)lognn\left(\sqrt{a}-\sqrt{b}\right)^{2}\leq 2\left(1-\epsilon\right)\frac{\log n}{n}.

Therefore, with probability 1nΩ(ϵ)1-n^{-\Omega(\epsilon)} there exists some (i,j)T(i,j)\in T^{\prime} such that XijτX_{ij}\leq-\tau. By definition of TT^{\prime}, we have Yij<τY_{ij}<\tau and hence Xij+Yij<0X_{ij}+Y_{ij}<0. Thus, we arrive at the desired claim (61).

Proof of Lemma 6.

Let En={k=1nXkτ}E_{n}=\left\{\sum_{k=1}^{n}X_{k}\leq-\tau\right\}. Let QQ denote the distribution of XkX_{k}. The following large-deviation lower estimate based on the data processing inequality is well-known (see [CK82, Eq. (5.21), p. 167] and [HWX17, Lemma 3]): For any QQ^{\prime},

Q(En)exp(nD(QQ)log2Q(En)).\displaystyle Q\left(E_{n}\right)\geq\exp\left(\frac{-nD\left(Q^{\prime}\|Q\right)-\log 2}{Q^{\prime}(E_{n})}\right). (64)

Choose

Q=αβ2δ+1+α+β2δ1+(1α)δ0,Q^{\prime}=\frac{\alpha-\beta}{2}\delta_{+1}+\frac{\alpha+\beta}{2}\delta_{-1}+(1-\alpha)\delta_{0}\,,

where

α2ab1(ab)2, and βmin{α2,δ8blognn}.\alpha\triangleq\frac{2\sqrt{ab}}{1-\left(\sqrt{a}-\sqrt{b}\right)^{2}},\quad\text{ and }\quad\beta\triangleq\min\left\{\frac{\alpha}{2},\,\frac{\delta}{8}\sqrt{\frac{b\log n}{n}}\right\}\,.

Note that under the assumption that 1\leq a/b=\Theta(1) and \left(\sqrt{a}-\sqrt{b}\right)^{2}\leq\frac{2\log n}{n}, we have 2b\leq\alpha=\Theta(a). Moreover, since \tau=o(\sqrt{an\log n}) and \tau=o(an), it follows that \tau=o(\beta n). Then,

Q(Enc)=Q(k=1nYkτ)\displaystyle Q^{\prime}(E_{n}^{c})=Q^{\prime}\left(\sum_{k=1}^{n}Y_{k}\geq-\tau\right) (a)Q(k=1n(Yk𝔼[Yk])βnτ)\displaystyle\overset{\rm(a)}{\leq}Q^{\prime}\left(\sum_{k=1}^{n}\left(Y_{k}-\mathbb{E}\left[Y_{k}\right]\right)\geq\beta n-\tau\right)
(b)k=1nVar[Yk](βnτ)2\displaystyle\overset{\rm(b)}{\leq}\frac{\sum_{k=1}^{n}\mathrm{Var}{\left[Y_{k}\right]}}{\left(\beta n-\tau\right)^{2}}
(c)αn(βnτ)2=(d)o(1),\displaystyle\overset{\rm(c)}{\leq}\frac{\alpha n}{\left(\beta n-\tau\right)^{2}}\overset{\rm(d)}{=}o(1)\,,

where (a) holds because \mathbb{E}\left[Y_{k}\right]=-\beta; (b) follows from Chebyshev's inequality, since \{Y_{k}\}_{k\in[n]} are independent and \tau=o(\beta n); (c) holds because \sum_{k=1}^{n}\mathrm{Var}{\left[Y_{k}\right]}\leq\sum_{k=1}^{n}\mathbb{E}\left[Y_{k}^{2}\right]\leq\alpha n; (d) holds because \tau=o\left(\beta n\right), and \frac{\alpha}{\beta^{2}n}=\max\left\{\frac{64\alpha}{\delta^{2}b\log n},\frac{4}{\alpha n}\right\}=o(1), in view of \delta=\Theta(1), \alpha=\Theta(a)=\Theta(b) and \alpha n=\Theta(an)=\omega(1).

Next, we upper bound D(QQ)D\left(Q^{\prime}\|Q\right). We get

D(QQ)\displaystyle D\left(Q^{\prime}\|Q\right) =αβ2logαβ2a+α+β2logα+β2b+(1α)log1α1ab\displaystyle=\frac{\alpha-\beta}{2}\log\frac{\alpha-\beta}{2a}+\frac{\alpha+\beta}{2}\log\frac{\alpha+\beta}{2b}+(1-\alpha)\log\frac{1-\alpha}{1-a-b}
=(a)log(1(ab)2)+α2log(α2β2)α2+β2loga(α+β)b(αβ)\displaystyle\overset{\rm(a)}{=}-\log\left(1-\left(\sqrt{a}-\sqrt{b}\right)^{2}\right)+\frac{\alpha}{2}\log\frac{\left(\alpha^{2}-\beta^{2}\right)}{\alpha^{2}}+\frac{\beta}{2}\log\frac{a(\alpha+\beta)}{b(\alpha-\beta)}
(b)log(1(ab)2)+β2αβ+βabb\displaystyle\overset{\rm(b)}{\leq}-\log\left(1-\left(\sqrt{a}-\sqrt{b}\right)^{2}\right)+\frac{\beta^{2}}{\alpha-\beta}+\beta\frac{\sqrt{a}-\sqrt{b}}{\sqrt{b}}
(c)(ab)2+δlogn4n,\displaystyle\overset{\rm(c)}{\leq}\left(\sqrt{a}-\sqrt{b}\right)^{2}+\frac{\delta\log n}{4n}\,,

where (a)(a) holds because α=2ab1(ab)2\alpha=\frac{2\sqrt{ab}}{1-\left(\sqrt{a}-\sqrt{b}\right)^{2}}; (b)(b) holds because logα+βαβ2βαβ\log\frac{\alpha+\beta}{\alpha-\beta}\leq\frac{2\beta}{\alpha-\beta} and 12logababb\frac{1}{2}\log\frac{a}{b}\leq\frac{\sqrt{a}-\sqrt{b}}{\sqrt{b}}, in view of log(1+x)x\log(1+x)\leq x; (c)(c) holds for all sufficiently large nn because (ab)22lognn\left(\sqrt{a}-\sqrt{b}\right)^{2}\leq 2\frac{\log n}{n} so that log(1(ab)2)=(ab)2+O(log2(n)/n2)\log(1-(\sqrt{a}-\sqrt{b})^{2})=-(\sqrt{a}-\sqrt{b})^{2}+O\left(\log^{2}(n)/n^{2}\right), β12α\beta\leq\frac{1}{2}\alpha and βδ8blognn\beta\leq\frac{\delta}{8}\sqrt{\frac{b\log n}{n}} so that β2αβ2β2αδ2blogn32αnδ2logn32n\frac{\beta^{2}}{\alpha-\beta}\leq\frac{2\beta^{2}}{\alpha}\leq\frac{\delta^{2}b\log n}{32\alpha n}\leq\frac{\delta^{2}\log n}{32n}, and βabbδ8lognn(ab)2δlogn8n\beta\frac{\sqrt{a}-\sqrt{b}}{\sqrt{b}}\leq\frac{\delta}{8}\sqrt{\frac{\log n}{n}}\left(\sqrt{a}-\sqrt{b}\right)\leq\frac{\sqrt{2}\delta\log n}{8n}.
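The bound on D\left(Q^{\prime}\|Q\right) can be spot-checked numerically; the sketch below uses illustrative parameters consistent with the assumptions of Lemma 6 (not values from the proof):

```python
import math

def kl(qp, q):
    """KL divergence D(Q' || Q) for distributions given as probability lists."""
    return sum(x * math.log(x / y) for x, y in zip(qp, q) if x > 0)

# Illustrative parameters satisfying the assumptions of Lemma 6.
n, a, b, delta = 500, 0.02, 0.01, 0.9
gap = (math.sqrt(a) - math.sqrt(b)) ** 2
alpha = 2 * math.sqrt(a * b) / (1 - gap)
beta = min(alpha / 2, delta / 8 * math.sqrt(b * math.log(n) / n))

Q  = [a, b, 1 - a - b]                                   # masses of +1, -1, 0
Qp = [(alpha - beta) / 2, (alpha + beta) / 2, 1 - alpha]

bound = gap + delta * math.log(n) / (4 * n)
assert 0 <= kl(Qp, Q) <= bound
```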

In conclusion, it follows from (64) that

Q(En)\displaystyle Q\left(E_{n}\right) exp((1+o(1))(n(ab)2+δ4logn+log2))\displaystyle\geq\exp\left(-\left(1+o(1)\right)\left(n\left(\sqrt{a}-\sqrt{b}\right)^{2}+\frac{\delta}{4}\log n+\log 2\right)\right)
exp(n(ab)2δ2logn),\displaystyle\geq\exp\left(-n\left(\sqrt{a}-\sqrt{b}\right)^{2}-\frac{\delta}{2}\log n\right),

where the last inequality holds for all sufficiently large nn in view of n(ab)22lognn\left(\sqrt{a}-\sqrt{b}\right)^{2}\leq 2\log n. ∎

Appendix A Moment Generating Functions Associated with Edge Orbits

Lemma 7.

Fix \pi and {\pi^{\prime}}, where \pi is the latent permutation under {\mathcal{P}}. For any edge orbit O of \sigma=\pi^{-1}\circ{\pi^{\prime}} with |O|=k and t\geq 0, let M_{k}\triangleq\mathbb{E}\left[\exp\left(t\sum_{(i,j)\in O}\overline{A}_{{\pi^{\prime}}(i){\pi^{\prime}}(j)}\overline{B}_{ij}\right)\right], where \overline{A}_{ij}=A_{ij}-\mathbb{E}A_{ij} and \overline{B}_{ij}=B_{ij}-\mathbb{E}B_{ij}.

  • For Erdős-Rényi random graphs,

    Mk=(TT24D2)k+(T+T24D2)k,\displaystyle M_{k}=\left(\frac{T-\sqrt{T^{2}-4D}}{2}\right)^{k}+\left(\frac{T+\sqrt{T^{2}-4D}}{2}\right)^{k}, (65)

    where

    T\displaystyle T =et(1q)2qs+2etq(1q)q(1s)+etq2(12q+qs)\displaystyle=e^{t(1-q)^{2}}qs+2e^{-tq(1-q)}q(1-s)+e^{tq^{2}}(1-2q+qs)
    D\displaystyle D =(et(1q)2+tq2e2tq(1q))q(sq)\displaystyle=\left(e^{t(1-q)^{2}+tq^{2}}-e^{-2tq(1-q)}\right)q(s-q)

    and qpsq\triangleq ps.

  • For the Gaussian model, for t\leq\frac{1}{1+\rho},

    Mk=[(λ1+λ22)k(λ1λ22)k]1\displaystyle M_{k}=\left[\left(\frac{\lambda_{1}+\lambda_{2}}{2}\right)^{k}-\left(\frac{\lambda_{1}-\lambda_{2}}{2}\right)^{k}\right]^{-1} (66)

    where λ1=(1+ρt)2t2\lambda_{1}=\sqrt{(1+\rho t)^{2}-t^{2}} and λ2=(1ρt)2t2\lambda_{2}=\sqrt{(1-\rho t)^{2}-t^{2}}.

Moreover, MkM2k/2M_{k}\leq M_{2}^{k/2} for all k2k\geq 2.

Proof.

For ease of notation, let {(ai,bi)}i=1k\{(a_{{i}},b_{{i}})\}_{{i}=1}^{k} be independently and identically distributed pairs of random variables with joint distribution PP. Let ak+1=a1a_{k+1}=a_{1} and bk+1=b1b_{k+1}=b_{1}. Since OO is an edge orbit of σ𝖤\sigma^{\sf E}, we have {Aπ(i)π(j)}(i,j)O={Aπ(i)π(j)}(i,j)O\{A_{\pi(i)\pi(j)}\}_{(i,j)\in O}=\{A_{{\pi^{\prime}}(i){\pi^{\prime}}(j)}\}_{(i,j)\in O} and (π(i),π(j))=(π(σ(i)),π(σ(j)))({\pi^{\prime}}(i),{\pi^{\prime}}(j))=(\pi(\sigma(i)),\pi(\sigma(j))). Then, we have that

M_{k} \displaystyle=\mathbb{E}\left[\exp\left(\sum_{{i}=1}^{k}t(a_{{i}+1}-\mathbb{E}\left[a_{{i}+1}\right])(b_{{i}}-\mathbb{E}\left[b_{{i}}\right])\right)\right]
\displaystyle=\mathbb{E}\left[\mathbb{E}\left[\exp\left(\sum_{{i}=1}^{k}t(a_{{i}+1}-\mathbb{E}\left[a_{{i}+1}\right])(b_{{i}}-\mathbb{E}\left[b_{{i}}\right])\right)\bigg{|}a_{1},a_{2},\cdots,a_{k}\right]\right]
\displaystyle=\mathbb{E}\left[\prod_{{i}=1}^{k}\mathbb{E}\left[\exp\left(t\left(a_{{i}+1}-\mathbb{E}\left[a_{{i}+1}\right]\right)\left(b_{{i}}-\mathbb{E}\left[b_{{i}}\right]\right)\right)\bigg{|}a_{{i}},a_{{i}+1}\right]\right].
  • For Erdős-Rényi random graphs, \mathbb{E}[a_{i}]=\mathbb{E}[b_{i}]=q, and M_{k}=\mathrm{tr}\left(M^{k}\right), where M is a 2\times 2 matrix with rows and columns indexed by \{0,1\} and

    M(\ell,m) \displaystyle=\mathbb{E}\left[\exp\left(t\left(a_{{i}+1}-q\right)\left(b_{{i}}-q\right)\right)\bigg{|}a_{{i}}=\ell,a_{{i}+1}=m\right]\mathbb{P}\left(a_{{i}+1}=m\right)
    \displaystyle=\mathbb{E}\left[\exp\left(t\left(m-q\right)\left(b_{{i}}-q\right)\right)\bigg{|}a_{{i}}=\ell\right]\mathbb{P}\left(a_{{i}+1}=m\right).

    Explicitly, we have

    M=(etq2(1η)(1q)+etq(1q)η(1q)etq(1q)(1η)q+et(1q)2ηqetq2s(1q)+etq(1q)(1s)(1q)etq(1q)(1s)q+et(1q)2sq),M=\begin{pmatrix}e^{tq^{2}}\left(1-\eta\right)\left(1-q\right)+e^{-tq(1-q)}\eta\left(1-q\right)&e^{-tq(1-q)}\left(1-\eta\right)q+e^{t(1-q)^{2}}\eta q\\ e^{tq^{2}}s\left(1-q\right)+e^{-tq(1-q)}\left(1-s\right)\left(1-q\right)&e^{-tq(1-q)}\left(1-s\right)q+e^{t(1-q)^{2}}sq\end{pmatrix}\,,

    where η=q(1s)1q\eta=\frac{q(1-s)}{1-q}. The eigenvalues of MM are TT24D2\frac{T-\sqrt{T^{2}-4D}}{2} and T+T24D2\frac{T+\sqrt{T^{2}-4D}}{2}, where

    T\displaystyle T Tr(M)=et(1q)2qs+2etq(1q)q(1s)+etq2(12q+qs)\displaystyle\triangleq\mathrm{Tr}(M)=e^{t(1-q)^{2}}qs+2e^{-tq(1-q)}q(1-s)+e^{tq^{2}}(1-2q+qs)
    D\displaystyle D det(M)=(et(1q)2+tq2e2tq(1q))q(sq).\displaystyle\triangleq\det(M)=\left(e^{t(1-q)^{2}+tq^{2}}-e^{-2tq(1-q)}\right)q(s-q)\,.
  • For the Gaussian model, \mathbb{E}[a_{i}]=\mathbb{E}[b_{i}]=0, and

    Mk=𝔼[i=1kexp(tρaiai+1+12t2(1ρ2)ai+12)],\displaystyle M_{k}=\mathbb{E}\left[\prod_{{i}=1}^{k}\exp\left(t\rho a_{{i}}a_{{i}+1}\,+\frac{1}{2}t^{2}\left(1-\rho^{2}\right)a_{{i}+1}^{2}\right)\right],

    where the equality follows from bi𝒩(ρai,1ρ2)b_{{i}}\sim{\mathcal{N}}(\rho a_{{i}},1-\rho^{2}) conditional on aia_{{i}} and 𝔼[exp(tZ)]=exp(tμ+t2ν2/2)\mathbb{E}\left[\exp\left(tZ\right)\right]=\exp\left(t\mu+t^{2}\nu^{2}/2\right) for Z𝒩(μ,ν2)Z\sim{\mathcal{N}}(\mu,\nu^{2}).

    Let λ1=(1+ρt)2t2\lambda_{1}=\sqrt{\left(1+\rho t\right)^{2}-t^{2}} and λ2=(1ρt)2t2\lambda_{2}=\sqrt{\left(1-\rho t\right)^{2}-t^{2}}, where t11+ρt\leq\frac{1}{1+\rho}. By change of variables,

    Mk\displaystyle M_{k} =i=1k12πexp((λ1+λ22ai+λ1λ22ai+1)22)da1dak\displaystyle=\int\cdots\int\prod_{{i}=1}^{k}\frac{1}{\sqrt{2\pi}}\exp\left(-\frac{\left(\frac{\lambda_{1}+\lambda_{2}}{2}a_{{i}}+\frac{\lambda_{1}-\lambda_{2}}{2}a_{{i}+1}\right)^{2}}{2}\right)\mathrm{d}a_{1}\cdots\mathrm{d}a_{k}
    =(a)i=1k12πexp(Xi22)dX1dXkdet(J1)\displaystyle\overset{\rm(a)}{=}\int\cdots\int\prod_{{i}=1}^{k}\frac{1}{\sqrt{2\pi}}\exp\left(-\frac{X_{{i}}^{2}}{2}\right)\mathrm{d}X_{1}\cdots\mathrm{d}X_{k}\det(J^{-1})
    =(b)1(λ1+λ22)k(λ1λ22)k,\displaystyle\overset{\rm(b)}{=}\frac{1}{\left(\frac{\lambda_{1}+\lambda_{2}}{2}\right)^{k}-\left(\frac{\lambda_{1}-\lambda_{2}}{2}\right)^{k}},

    where (a) holds by letting X_{{i}}=\frac{\lambda_{1}+\lambda_{2}}{2}a_{{i}}+\frac{\lambda_{1}-\lambda_{2}}{2}a_{{i}+1} and denoting by J the Jacobian matrix with J_{ij}=\frac{\partial X_{i}}{\partial a_{j}}, namely

    \displaystyle J=\begin{pmatrix}\frac{\lambda_{1}+\lambda_{2}}{2}&\frac{\lambda_{1}-\lambda_{2}}{2}&0&\cdots&0\\ 0&\frac{\lambda_{1}+\lambda_{2}}{2}&\frac{\lambda_{1}-\lambda_{2}}{2}&\cdots&0\\ \vdots&\vdots&\ddots&\ddots&\vdots\\ 0&0&\cdots&\frac{\lambda_{1}+\lambda_{2}}{2}&\frac{\lambda_{1}-\lambda_{2}}{2}\\ \frac{\lambda_{1}-\lambda_{2}}{2}&0&\cdots&0&\frac{\lambda_{1}+\lambda_{2}}{2}\end{pmatrix}\,;

    (b) holds because det(J1)=det(J)1\det(J^{-1})=\det(J)^{-1}, where det(J)=(λ1+λ22)k(λ1λ22)k\det(J)=\left(\frac{\lambda_{1}+\lambda_{2}}{2}\right)^{k}-\left(\frac{\lambda_{1}-\lambda_{2}}{2}\right)^{k}.

Finally, we prove that M_{k}\leq(M_{2})^{k/2} for k\geq 2. Indeed, for the Erdős-Rényi graphs, this simply follows from x^{k}+y^{k}\leq(x^{2}+y^{2})^{k/2} for x,y\geq 0 and k\in{\mathbb{N}}. Analogously, for the Gaussian model, this follows from (a+b)^{k}-(a-b)^{k}\geq(4ab)^{k/2}, which holds by writing (a+b)^{2}=(a-b)^{2}+4ab and letting x=(a-b)^{2} and y=4ab: then (a+b)^{k}-(a-b)^{k}\geq(x+y)^{k/2}-x^{k/2}\geq y^{k/2}, where the last step uses the superadditivity of z\mapsto z^{k/2} for k\geq 2. ∎
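Both closed forms in Lemma 7 admit a quick numerical sanity check (a Python sketch with illustrative parameter values, not quantities from the paper). The Erdős-Rényi formula (65) is compared against a brute-force evaluation of the expectation over all 2^k configurations of (a_1, …, a_k); the Gaussian formula (66) is compared against the determinant of the cyclic precision matrix arising from the Gaussian integral:

```python
import math
from itertools import product

# --- Erdos-Renyi case of (65): brute-force transfer-matrix sum vs eigenvalues.
p, s, t, k = 0.3, 0.7, 0.2, 4        # illustrative parameters
q = p * s
eta = q * (1 - s) / (1 - q)          # P(b_i = 1 | a_i = 0)

def g(l, m):
    """E[ exp(t (a_{i+1}-q)(b_i-q)) | a_i = l, a_{i+1} = m ]."""
    pb = s if l == 1 else eta        # P(b_i = 1 | a_i = l)
    return (1 - pb) * math.exp(-t * (m - q) * q) + pb * math.exp(t * (m - q) * (1 - q))

brute = sum(
    math.prod((q if a[i] else 1 - q) * g(a[i], a[(i + 1) % k]) for i in range(k))
    for a in product((0, 1), repeat=k)
)

T = math.exp(t*(1-q)**2)*q*s + 2*math.exp(-t*q*(1-q))*q*(1-s) + math.exp(t*q**2)*(1-2*q+q*s)
D = (math.exp(t*(1-q)**2 + t*q**2) - math.exp(-2*t*q*(1-q))) * q * (s - q)
disc = math.sqrt(T * T - 4 * D)
Mk_er = ((T - disc) / 2) ** k + ((T + disc) / 2) ** k
assert abs(brute - Mk_er) < 1e-9 * Mk_er

# --- Gaussian case of (66): M_k = det(K)^{-1/2}, where K is the cyclic
# precision matrix with diagonal 1 - t^2(1-rho^2) and neighbors -t*rho;
# its circulant eigenvalues give the determinant in closed form (k >= 3).
rho, tg, kg = 0.6, 0.3, 5            # illustrative; tg <= 1/(1+rho)
kappa = 1 - tg * tg * (1 - rho * rho)
detK = math.prod(kappa - 2 * tg * rho * math.cos(2 * math.pi * j / kg) for j in range(kg))
l1 = math.sqrt((1 + rho * tg) ** 2 - tg * tg)
l2 = math.sqrt((1 - rho * tg) ** 2 - tg * tg)
Mk_gauss = 1 / (((l1 + l2) / 2) ** kg - ((l1 - l2) / 2) ** kg)
assert abs(detK ** -0.5 - Mk_gauss) < 1e-9
```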

Lemma 8.

Fix \pi and {\pi^{\prime}}, where \pi is the latent permutation under {\mathcal{P}}. For any edge orbit O of \sigma=\pi^{-1}\circ{\pi^{\prime}} with |O|=k and t>0, let L_{k}\triangleq\mathbb{E}\left[\exp\left(t\left(\sum_{(i,j)\in O}A_{{\pi^{\prime}}(i){\pi^{\prime}}(j)}B_{ij}-\sum_{(i,j)\in O}A_{\pi(i)\pi(j)}B_{ij}\right)\right)\right].

  • For the general Erdős-Rényi graph model,

    Lk\displaystyle L_{k} =(TT24D2)k+(T+T24D2)k,\displaystyle=\left(\frac{T-\sqrt{T^{2}-4D}}{2}\right)^{k}+\left(\frac{T+\sqrt{T^{2}-4D}}{2}\right)^{k}, (67)

    where T=1T=1 and D=(p01p10(et1)+p00p11(et1))D=-\left(p_{01}p_{10}\left(e^{t}-1\right)+p_{00}p_{11}\left(e^{-t}-1\right)\right).

  • For the Gaussian model, for t\leq\frac{1}{2(1-\rho)},

    Lk=[(λ1+λ22)k(λ1λ22)k]1,\displaystyle L_{k}=\left[\left(\frac{\lambda_{1}+\lambda_{2}}{2}\right)^{k}-\left(\frac{\lambda_{1}-\lambda_{2}}{2}\right)^{k}\right]^{-1}, (68)

    where λ1=1\lambda_{1}=1 and λ2=1+4tρ4t2(1ρ2)\lambda_{2}=\sqrt{1+4t\rho-4t^{2}\left(1-\rho^{2}\right)}.

Moreover, LkL2k/2L_{k}\leq L_{2}^{k/2} for all k2k\geq 2.

Proof.

For ease of notation, let \{(a_{{i}},b_{{i}})\}_{{i}=1}^{k} be independently and identically distributed pairs of random variables with joint distribution P. Let a_{k+1}=a_{1} and b_{k+1}=b_{1}. Since O is an edge orbit of \sigma^{\sf E}, we have \{A_{\pi(i)\pi(j)}\}_{(i,j)\in O}=\{A_{{\pi^{\prime}}(i){\pi^{\prime}}(j)}\}_{(i,j)\in O} and ({\pi^{\prime}}(i),{\pi^{\prime}}(j))=(\pi(\sigma(i)),\pi(\sigma(j))). Then, we have that

L_{k} \displaystyle=\mathbb{E}\left[\exp\left(t\sum_{(i,j)\in O}\left(A_{{\pi^{\prime}}(i){\pi^{\prime}}(j)}-A_{\pi(i)\pi(j)}\right)B_{ij}\right)\right]
=𝔼[𝔼[exp(i=1kt(ai+1ai)bi)|a1,a2,,ak]]\displaystyle=\mathbb{E}\left[\mathbb{E}\left[\exp\left(\sum_{{i}=1}^{k}t\left(a_{{i}+1}-a_{{i}}\right)b_{{i}}\right)\bigg{|}a_{1},a_{2},\cdots,a_{k}\right]\right]
=𝔼[i=1k𝔼[exp(t(ai+1ai)bi)|ai,ai+1]].\displaystyle=\mathbb{E}\left[\prod_{{i}=1}^{k}\mathbb{E}\left[\exp\left(t\left(a_{{i}+1}-a_{{i}}\right)b_{{i}}\right)\bigg{|}a_{{i}},a_{{i}+1}\right]\right].
  • For Erdős-Rényi random graphs, L_{k}=\mathrm{tr}\left(L^{k}\right), where L is a 2\times 2 matrix with rows and columns indexed by \{0,1\} and

    L(\ell,m) \displaystyle=\mathbb{E}\left[\exp\left(t\left(a_{{i}+1}-a_{{i}}\right)b_{{i}}\right)|a_{{i}}=\ell,a_{{i}+1}=m\right]\mathbb{P}\left(a_{{i}+1}=m\right)
    \displaystyle=\mathbb{E}\left[\exp\left(t(m-\ell)b_{{i}}\right)|a_{{i}}=\ell\right]\mathbb{P}\left(a_{{i}+1}=m\right).

    Explicitly, we have

    L=\begin{pmatrix}1-p_{10}-p_{11}&\left(\frac{p_{01}}{1-p_{10}-p_{11}}e^{t}+\frac{p_{00}}{1-p_{10}-p_{11}}\right)(p_{10}+p_{11})\\ \left(\frac{p_{11}}{p_{10}+p_{11}}(e^{-t}-1)+1\right)(1-p_{10}-p_{11})&p_{10}+p_{11}\end{pmatrix}\,.

    The eigenvalues of LL are TT24D2\frac{T-\sqrt{T^{2}-4D}}{2} and T+T24D2\frac{T+\sqrt{T^{2}-4D}}{2}, where T=1T=1 and

    D=(p01p10(et1)+p00p11(et1)).D=-\left(p_{01}p_{10}\left(e^{t}-1\right)+p_{00}p_{11}\left(e^{-t}-1\right)\right).
  • For the Gaussian model,

    Lk\displaystyle L_{k} =𝔼[i=1kexp(t(ai+1ai)ρai+t2(ai+1ai)2(1ρ2)/2)]\displaystyle=\mathbb{E}\left[\prod_{{i}=1}^{k}\exp\left(t\left(a_{{i}+1}-a_{{i}}\right)\rho a_{{i}}+t^{2}\left(a_{{i}+1}-a_{{i}}\right)^{2}\left(1-\rho^{2}\right)/2\right)\right]
    =𝔼[exp((t2(1ρ2)tρ)i=1k(ai2aiai+1))]\displaystyle=\mathbb{E}\left[\exp\left(\left(t^{2}\left(1-\rho^{2}\right)-t\rho\right)\sum_{{i}=1}^{k}\left(a_{{i}}^{2}-a_{{i}}a_{{i}+1}\right)\right)\right]
    \displaystyle=\frac{1}{\left(\frac{\lambda_{1}+\lambda_{2}}{2}\right)^{k}-\left(\frac{\lambda_{1}-\lambda_{2}}{2}\right)^{k}},

    where the first equality follows from b_{{i}}\sim{\mathcal{N}}(\rho a_{{i}},1-\rho^{2}) conditional on a_{{i}} and \mathbb{E}\left[\exp\left(tZ\right)\right]=\exp\left(t\mu+t^{2}\nu^{2}/2\right) for Z\sim{\mathcal{N}}(\mu,\nu^{2}); the last equality holds by a change of variables and a Gaussian integral analogous to the proof of Lemma 7.

Finally, Lk(L2)k/2L_{k}\leq(L_{2})^{k/2} for k2k\geq 2 follows from the same reasoning as in Lemma 7. ∎
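Formula (67) can be sanity-checked in the same way as in Lemma 7, by brute-forcing the expectation over all 2^k configurations of (a_1, …, a_k); the joint law of (a_i, b_i) below is an illustrative correlated Erdős-Rényi choice, not values from the paper:

```python
import math
from itertools import product

# Illustrative joint law of (a_i, b_i) with p = 0.3, s = 0.7, so q = ps.
p, s, t, k = 0.3, 0.7, 0.2, 5
q = p * s
p11, p10 = p * s * s, p * s * (1 - s)
p01, p00 = p * s * (1 - s), 1 - 2 * q + q * s
eta = p01 / (p00 + p01)              # P(b_i = 1 | a_i = 0)

def h(l, m):
    """E[ exp(t (a_{i+1} - a_i) b_i) | a_i = l, a_{i+1} = m ]."""
    pb = s if l == 1 else eta        # P(b_i = 1 | a_i = l)
    return (1 - pb) + pb * math.exp(t * (m - l))

brute = sum(
    math.prod((q if a[i] else 1 - q) * h(a[i], a[(i + 1) % k]) for i in range(k))
    for a in product((0, 1), repeat=k)
)

D = -(p01 * p10 * (math.exp(t) - 1) + p00 * p11 * (math.exp(-t) - 1))
disc = math.sqrt(1 - 4 * D)          # T = 1
Lk = ((1 - disc) / 2) ** k + ((1 + disc) / 2) ** k
assert abs(brute - Lk) < 1e-9
```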

Appendix B Proof of Lemma 4

In view of (65) in Lemma 7, M1=TM_{1}=T and

M2=T22D,\displaystyle M_{2}=T^{2}-2D\,, (69)

where by letting a=et(1q)2a=e^{t(1-q)^{2}}, b=etq(1q)b=e^{-tq(1-q)}, and c=etq2c=e^{tq^{2}}, we get

T=aqs+2bq(1s)+c(12q+qs),D=(acb2)q(sq).T=aqs+2bq(1-s)+c(1-2q+qs),\quad D=\left(ac-b^{2}\right)q(s-q)\,.
  • Suppose tlog1pt\leq\log\frac{1}{p}. Since 1+xex1+x+x21+x\leq e^{x}\leq 1+x+x^{2} for 0x10\leq x\leq 1 and 1+x+x2/2+x3ex1+x+x2/21+x+x^{2}/2+x^{3}\leq e^{x}\leq 1+x+x^{2}/2 for 1x<0-1\leq x<0, we get

    1tq(1q)+12t2q2(1q)2t3q3\displaystyle 1-tq(1-q)+\frac{1}{2}t^{2}q^{2}(1-q)^{2}-t^{3}q^{3} b1tq(1q)+12t2q2(1q)2\displaystyle\leq b\leq 1-tq(1-q)+\frac{1}{2}t^{2}q^{2}(1-q)^{2}
    1+tq2\displaystyle 1+tq^{2} c1+tq2+t2q4.\displaystyle\leq c\leq 1+tq^{2}+t^{2}q^{4}\,.

    It follows that

    T\displaystyle T et(1q)2qs+2(1tq(1q)+12t2q2(1q)2)q(1s)+(1+tq2+t2q4)(12q+qs)\displaystyle\leq e^{t(1-q)^{2}}qs+2\left(1-tq(1-q)+\frac{1}{2}t^{2}q^{2}(1-q)^{2}\right)q(1-s)+\left(1+tq^{2}+t^{2}q^{4}\right)(1-2q+qs)
    =α0+α1t+α2t2,\displaystyle=\alpha_{0}+\alpha_{1}t+\alpha_{2}t^{2}\,,

    where

    α0\displaystyle\alpha_{0} =2q(1s)+(12q+qs)+et(1q)2qs,\displaystyle=2q(1-s)+(1-2q+qs)+e^{t(1-q)^{2}}qs\,, (70)
    α1\displaystyle\alpha_{1} =2q(1q)q(1s)+q2(12q+qs),\displaystyle=-2q(1-q)q(1-s)+q^{2}(1-2q+qs)\,, (71)
    α2\displaystyle\alpha_{2} =q2(1q)2q(1s)+q4(12q+qs).\displaystyle=q^{2}(1-q)^{2}q(1-s)+q^{4}(1-2q+qs)\,. (72)

    Moreover,

    D\displaystyle D et(1q)2(1+tq2)q(sq)(1tq(1q)+12t2q2(1q)2)2q(sq)\displaystyle\geq e^{t(1-q)^{2}}\left(1+tq^{2}\right)q(s-q)-\left(1-tq(1-q)+\frac{1}{2}t^{2}q^{2}(1-q)^{2}\right)^{2}q(s-q)
    =β0+β1t+β2t2+β3t3+β4t4,\displaystyle=\beta_{0}+\beta_{1}t+\beta_{2}t^{2}+\beta_{3}t^{3}+\beta_{4}t^{4}\,,

    where

    β0\displaystyle\beta_{0} =q(sq)+et(1q)2q(sq),\displaystyle=-q(s-q)+e^{t(1-q)^{2}}q(s-q)\,, (73)
    β1\displaystyle\beta_{1} =2q2(1q)(sq)+et(1q)2q3(sq),\displaystyle=2q^{2}(1-q)(s-q)+e^{t(1-q)^{2}}q^{3}(s-q)\,, (74)
    β2\displaystyle\beta_{2} =2q3(1q)2(sq),\displaystyle=-2q^{3}(1-q)^{2}(s-q)\,, (75)
    β3\displaystyle\beta_{3} =q4(1q)3(sq),\displaystyle=q^{4}\left(1-q\right)^{3}(s-q)\,, (76)
    β4\displaystyle\beta_{4} =14q5(1q)4(sq).\displaystyle=-\frac{1}{4}q^{5}\left(1-q\right)^{4}(s-q)\,. (77)
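The expansion of the lower bound on D into \beta_{0}+\beta_{1}t+\cdots+\beta_{4}t^{4} is an exact algebraic identity, which can be spot-checked in floating point at illustrative values of q, s, and t (a sketch, not part of the proof):

```python
import math

def beta_expansion(q, s, t):
    """Return (lhs, rhs): the lower bound on D and its expansion via (73)-(77)."""
    E = math.exp(t * (1 - q) ** 2)
    lhs = (E * (1 + t * q * q)
           - (1 - t * q * (1 - q) + 0.5 * (t * q * (1 - q)) ** 2) ** 2) * q * (s - q)
    betas = [
        -q * (s - q) + E * q * (s - q),                      # beta_0
        2 * q**2 * (1 - q) * (s - q) + E * q**3 * (s - q),   # beta_1
        -2 * q**3 * (1 - q)**2 * (s - q),                    # beta_2
        q**4 * (1 - q)**3 * (s - q),                         # beta_3
        -0.25 * q**5 * (1 - q)**4 * (s - q),                 # beta_4
    ]
    rhs = sum(bi * t**i for i, bi in enumerate(betas))
    return lhs, rhs

for q, s, t in [(0.1, 0.8, 0.5), (0.05, 0.5, 1.0), (0.2, 0.9, 0.3)]:
    lhs, rhs = beta_expansion(q, s, t)
    assert abs(lhs - rhs) < 1e-12
```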

    By (69), we get

    M2=T22D\displaystyle M_{2}=T^{2}-2D (α0+α1t+α2t2)22(β0+β1t+β2t2+β3t3+β4t4)\displaystyle\leq\left(\alpha_{0}+\alpha_{1}t+\alpha_{2}t^{2}\right)^{2}-2\left(\beta_{0}+\beta_{1}t+\beta_{2}t^{2}+\beta_{3}t^{3}+\beta_{4}t^{4}\right)
    =γ0+γ1t+γ2t2+γ3t3+γ4t4\displaystyle=\gamma_{0}+\gamma_{1}t+\gamma_{2}t^{2}+\gamma_{3}t^{3}+\gamma_{4}t^{4}

    where

    γ0\displaystyle\gamma_{0} =α022β0\displaystyle=\alpha_{0}^{2}-2\beta_{0} (78)
    γ1\displaystyle\gamma_{1} =2α0α12β1\displaystyle=2\alpha_{0}\alpha_{1}-2\beta_{1} (79)
    γ2\displaystyle\gamma_{2} =α12+2α0α22β2\displaystyle=\alpha_{1}^{2}+2\alpha_{0}\alpha_{2}-2\beta_{2} (80)
    γ3\displaystyle\gamma_{3} =2α1α22β3\displaystyle=2\alpha_{1}\alpha_{2}-2\beta_{3} (81)
    γ4\displaystyle\gamma_{4} =α222β4\displaystyle=\alpha_{2}^{2}-2\beta_{4} (82)

    By (70), (73) and (78), we get

    γ0\displaystyle\gamma_{0} =α022β0\displaystyle=\alpha_{0}^{2}-2\beta_{0}
    =(1qs+et(1q)2qs)22(q(sq)+et(1q)2q(sq))\displaystyle=\left(1-qs+e^{t(1-q)^{2}}qs\right)^{2}-2\left(-q(s-q)+e^{t(1-q)^{2}}q(s-q)\right)
    =12q2+q2s2+2et(1q)2q2(1s2)+e2t(1q)2q2s2,\displaystyle=1-2q^{2}+q^{2}s^{2}+2e^{t\left(1-q\right)^{2}}q^{2}\left(1-s^{2}\right)+e^{2t\left(1-q\right)^{2}}q^{2}s^{2}\,,
    12q2+q2s2+2etq2(1s2)+e2tq2s2.\displaystyle\leq 1-2q^{2}+q^{2}s^{2}+2e^{t}q^{2}\left(1-s^{2}\right)+e^{2t}q^{2}s^{2}\,.
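The simplification of \gamma_{0} above is likewise an exact identity in q, s, and e^{t(1-q)^{2}}, which can be spot-checked numerically (illustrative values of q, s, t; a sketch, not part of the proof):

```python
import math

def gamma0(q, s, t):
    """gamma_0 = alpha_0^2 - 2 beta_0, with alpha_0, beta_0 from (70) and (73)."""
    E = math.exp(t * (1 - q) ** 2)
    a0 = 2 * q * (1 - s) + (1 - 2 * q + q * s) + E * q * s
    b0 = -q * (s - q) + E * q * (s - q)
    return a0 * a0 - 2 * b0

def gamma0_closed(q, s, t):
    """Closed form obtained in the display above."""
    E = math.exp(t * (1 - q) ** 2)
    return 1 - 2*q*q + q*q*s*s + 2*E*q*q*(1 - s*s) + E*E*q*q*s*s

for q, s, t in [(0.1, 0.8, 0.5), (0.05, 0.5, 1.0), (0.2, 0.9, 0.3)]:
    assert abs(gamma0(q, s, t) - gamma0_closed(q, s, t)) < 1e-12
    # replacing e^{t(1-q)^2} by e^t can only increase the expression
    Et = math.exp(t)
    upper = 1 - 2*q*q + q*q*s*s + 2*Et*q*q*(1 - s*s) + Et*Et*q*q*s*s
    assert gamma0_closed(q, s, t) <= upper + 1e-12
```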

    Note that the above bound on γ0\gamma_{0} determines the main terms in our final bound on M2M_{2}. For the remaining terms, we will show i=14γiti=O(q3t(1+t))\sum_{i=1}^{4}\gamma_{i}t^{i}=O(q^{3}t(1+t)) so that they can be made negligible later when we apply the bound on M2M_{2} in the proof of Lemma 3 (see (49)).

    By (70), (71), (74) and (79), we get

    γ1\displaystyle\gamma_{1} =2α0α12β1\displaystyle=2\alpha_{0}\alpha_{1}-2\beta_{1}
    =2q2+4q3+4q3s4q3s24q4+2q4s2+(4q3s+4q3s2+2q42q4s2)et(1q)2\displaystyle=-2q^{2}+4q^{3}+4q^{3}s-4q^{3}s^{2}-4q^{4}+2q^{4}s^{2}+\left(-4q^{3}s+4q^{3}s^{2}+2q^{4}-2q^{4}s^{2}\right)e^{t\left(1-q\right)^{2}}
    (a)4q3+2et(1q)2q4(b)6q3,\displaystyle\overset{(a)}{\leq}4q^{3}+2e^{t(1-q)^{2}}q^{4}\overset{(b)}{\leq}6q^{3}\,,

    where (a)(a) holds because 2q2+4q30-2q^{2}+4q^{3}\leq 0 given q12q\leq\frac{1}{2}, 4q3s24q4+2q4s20-4q^{3}s^{2}-4q^{4}+2q^{4}s^{2}\leq 0 and 4q3s+4q3s2+2q42q4s22q4-4q^{3}s+4q^{3}s^{2}+2q^{4}-2q^{4}s^{2}\leq 2q^{4}; (b)(b) holds because 2et(1q)2q42etq42q32e^{t\left(1-q\right)^{2}}q^{4}\leq 2e^{t}q^{4}\leq 2q^{3}, given tlog1pt\leq\log\frac{1}{p}. By (70), (71), (72), (75) and (80), we get

    γ2\displaystyle\gamma_{2} =α12+2α0α22β2\displaystyle=\alpha_{1}^{2}+2\alpha_{0}\alpha_{2}-2\beta_{2}
    =2q32q3sq44q4s+6q4s2+10q5s8q5s26q64q6s+q6s2+6q7+2q7s2q8\displaystyle=2q^{3}-2q^{3}s-q^{4}-4q^{4}s+6q^{4}s^{2}+10q^{5}s-8q^{5}s^{2}-6q^{6}-4q^{6}s+q^{6}s^{2}+6q^{7}+2q^{7}s-2q^{8}
    +et(1q)2(2q4s2q4s22q5s+4q5s2+2q6s)\displaystyle~{}~{}~{}~{}+e^{t\left(1-q\right)^{2}}\left(2q^{4}s-2q^{4}s^{2}-2q^{5}s+4q^{5}s^{2}+2q^{6}s\right)
    (a)2q32q3s+q4s2+10q5s+2etq4s(b)6q3,\displaystyle\overset{(a)}{\leq}2q^{3}-2q^{3}s+q^{4}s^{2}+10q^{5}s+2e^{t}q^{4}s\overset{(b)}{\leq}6q^{3}\,,

    where (a)(a) holds because q44q4s+6q4s2q4s2-q^{4}-4q^{4}s+6q^{4}s^{2}\leq q^{4}s^{2}, 8q5s26q64q6s+q6s2+6q7+2q7s2q80-8q^{5}s^{2}-6q^{6}-4q^{6}s+q^{6}s^{2}+6q^{7}+2q^{7}s-2q^{8}\leq 0, and 2q4s22q5s+4q5s2+2q6s0-2q^{4}s^{2}-2q^{5}s+4q^{5}s^{2}+2q^{6}s\leq 0 given q12q\leq\frac{1}{2}; (b)(b) holds because 2q3s+q4s2+10q5s2q3-2q^{3}s+q^{4}s^{2}+10q^{5}s\leq 2q^{3} given q12q\leq\frac{1}{2}, 2etq4s2q3s2q32e^{t}q^{4}s\leq 2q^{3}s\leq 2q^{3} given tlog1pt\leq\log\frac{1}{p}. By (71), (72), (76) and (81), we get

    γ3\displaystyle\gamma_{3} =2α1α22β3\displaystyle=2\alpha_{1}\alpha_{2}-2\beta_{3}
    =2q4s+12q5s4q5s24q616q6s+10q6s2+8q74q7s22q8+2q8s\displaystyle=-2q^{4}s+12q^{5}s-4q^{5}s^{2}-4q^{6}-16q^{6}s+10q^{6}s^{2}+8q^{7}-4q^{7}s^{2}-2q^{8}+2q^{8}s
    (a)2q4s+12q5s(b)4q4,\displaystyle\overset{(a)}{\leq}-2q^{4}s+12q^{5}s\overset{(b)}{\leq}4q^{4}\,,

    where (a)(a) holds because 4q5s24q616q6s+10q6s2+8q70-4q^{5}s^{2}-4q^{6}-16q^{6}s+10q^{6}s^{2}+8q^{7}\leq 0 and 4q7s22q8+2q8s0-4q^{7}s^{2}-2q^{8}+2q^{8}s\leq 0; (b)(b) holds by q12q\leq\frac{1}{2}. By (72), (77) and (82), we get

    γ4\displaystyle\gamma_{4} =α222β4\displaystyle=\alpha_{2}^{2}-2\beta_{4}
    =12(q5s+q68q6s+2q6s2+18q7s8q7s28q88q8s+8q8s2+8q97q9s+q10)\displaystyle=\frac{1}{2}(q^{5}s+q^{6}-8q^{6}s+2q^{6}s^{2}+18q^{7}s-8q^{7}s^{2}-8q^{8}-8q^{8}s+8q^{8}s^{2}+8q^{9}-7q^{9}s+q^{10})
    (a)12(q5s+q68q6s+2q6s2+18q7s)(b)2q5,\displaystyle\overset{(a)}{\leq}\frac{1}{2}(q^{5}s+q^{6}-8q^{6}s+2q^{6}s^{2}+18q^{7}s)\overset{(b)}{\leq}2q^{5}\,,

    where (a)(a) holds because 8q7s28q88q8s+8q8s2+8q90-8q^{7}s^{2}-8q^{8}-8q^{8}s+8q^{8}s^{2}+8q^{9}\leq 0, and 7q9s+q100-7q^{9}s+q^{10}\leq 0; (b)(b) holds because 8q6s+2q6s2+18q7s4q6s-8q^{6}s+2q^{6}s^{2}+18q^{7}s\leq 4q^{6}s and q5s+q6+4q6s4q5q^{5}s+q^{6}+4q^{6}s\leq 4q^{5}, given q12q\leq\frac{1}{2}.
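As a numerical sanity check on the four bounds γ1 ≤ 6q³, γ2 ≤ 6q³, γ3 ≤ 4q⁴ and γ4 ≤ 2q⁵, the sketch below transcribes the polynomial expansions from the displays above and samples parameters with q = ps ≤ 1/2 and t ≤ log(1/p); it is illustrative only, not part of the proof.

```python
import math
import random

def case1_gammas(q, s, t):
    # gamma_1..gamma_4, transcribed from the displays above; E = e^{t(1-q)^2}.
    E = math.exp(t * (1 - q) ** 2)
    g1 = (-2*q**2 + 4*q**3 + 4*q**3*s - 4*q**3*s**2 - 4*q**4 + 2*q**4*s**2
          + (-4*q**3*s + 4*q**3*s**2 + 2*q**4 - 2*q**4*s**2) * E)
    g2 = (2*q**3 - 2*q**3*s - q**4 - 4*q**4*s + 6*q**4*s**2 + 10*q**5*s
          - 8*q**5*s**2 - 6*q**6 - 4*q**6*s + q**6*s**2 + 6*q**7
          + 2*q**7*s - 2*q**8
          + (2*q**4*s - 2*q**4*s**2 - 2*q**5*s + 4*q**5*s**2 + 2*q**6*s) * E)
    g3 = (-2*q**4*s + 12*q**5*s - 4*q**5*s**2 - 4*q**6 - 16*q**6*s
          + 10*q**6*s**2 + 8*q**7 - 4*q**7*s**2 - 2*q**8 + 2*q**8*s)
    g4 = 0.5 * (q**5*s + q**6 - 8*q**6*s + 2*q**6*s**2 + 18*q**7*s
                - 8*q**7*s**2 - 8*q**8 - 8*q**8*s + 8*q**8*s**2
                + 8*q**9 - 7*q**9*s + q**10)
    return g1, g2, g3, g4

random.seed(1)
for _ in range(2000):
    p = random.uniform(0.01, 0.99)
    s = random.uniform(0.01, 0.99)
    q = p * s          # q = ps (assumption); keep only samples with q <= 1/2
    if q > 0.5:
        continue
    t = random.uniform(0.0, math.log(1 / p))
    g1, g2, g3, g4 = case1_gammas(q, s, t)
    assert g1 <= 6 * q**3 + 1e-12
    assert g2 <= 6 * q**3 + 1e-12
    assert g3 <= 4 * q**4 + 1e-12
    assert g4 <= 2 * q**5 + 1e-12
```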

    Combining the above bounds, we have

    M2\displaystyle M_{2} 12q2+q2s2+6q3t+6q3t2+4q4t3+2q5t4+2etq2(1s2)+e2tq2s2\displaystyle\leq 1-2q^{2}+q^{2}s^{2}+6q^{3}t+6q^{3}t^{2}+4q^{4}t^{3}+2q^{5}t^{4}+2e^{t}q^{2}(1-s^{2})+e^{2t}q^{2}s^{2}
    =12q2+q2s2+q3t(6+6t+4qt2+2q2t3)+2etq2(1s2)+e2tq2s2\displaystyle=1-2q^{2}+q^{2}s^{2}+q^{3}t\left(6+6t+4qt^{2}+2q^{2}t^{3}\right)+2e^{t}q^{2}(1-s^{2})+e^{2t}q^{2}s^{2}
    12q2+q2s2+10q3t(1+t)+2etq2(1s2)+e2tq2s2,\displaystyle\leq 1-2q^{2}+q^{2}s^{2}+10q^{3}t(1+t)+2e^{t}q^{2}(1-s^{2})+e^{2t}q^{2}s^{2}\,,

where the last inequality holds because qtptplog1p1eqt\leq pt\leq p\log\frac{1}{p}\leq\frac{1}{e} given tlog1pt\leq\log\frac{1}{p}, so that 4qt2+2q2t3(4e+2e2)t2t4qt^{2}+2q^{2}t^{3}\leq\left(\frac{4}{e}+\frac{2}{e^{2}}\right)t\leq 2t and hence 6+6t+4qt2+2q2t310(1+t)6+6t+4qt^{2}+2q^{2}t^{3}\leq 10(1+t).

    Moreover, given M1=TM_{1}=T and M2=T22DM_{2}=T^{2}-2D,

    M12M2=1+2DM21+2De2,\displaystyle\frac{M_{1}^{2}}{M_{2}}=1+\frac{2D}{M_{2}}\leq 1+2D\leq e^{2}\,,

    where the first inequality holds because M21M_{2}\geq 1 by definition, and the last inequality holds because

    D=(acb2)q(sq)(a)acq(sq)\displaystyle D=(ac-b^{2})q(s-q)\overset{(a)}{\leq}acq(s-q) et(1+tq2+t2q4)q(sq)(b)1+tq2+t2q4(c)3,\displaystyle\leq e^{t}\left(1+tq^{2}+t^{2}q^{4}\right)q(s-q)\overset{(b)}{\leq}1+tq^{2}+t^{2}q^{4}\overset{(c)}{\leq}3\,,

    where (a)(a) holds because b2q(sq)0b^{2}q(s-q)\geq 0; (b)(b) holds because etq1e^{t}q\leq 1 given tlog1pt\leq\log\frac{1}{p}; (c)(c) holds because qt1qt\leq 1, given tlog1pt\leq\log\frac{1}{p}.
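The two closing estimates of this case can also be checked numerically. The sketch below takes a = e^{t(1-q)²}, b = e^{-tq(1-q)} and c = e^{tq²} (assumed forms, consistent with the Taylor sandwiches used in the next case) and q = ps; it is illustrative only.

```python
import math
import random

def case1_closing_ok(p, s, t):
    # q = ps (paper's notation); a, b, c taken as the exponentials above.
    q = p * s
    a = math.exp(t * (1 - q) ** 2)
    b = math.exp(-t * q * (1 - q))
    c = math.exp(t * q ** 2)
    D = (a * c - b ** 2) * q * (s - q)
    # Coefficient comparison behind M_2 <= ... + 10 q^3 t (1 + t):
    coeff_ok = 6 + 6*t + 4*q*t**2 + 2*q**2*t**3 <= 10 * (1 + t) + 1e-12
    # Bound D <= 3 used for M_1^2 / M_2 <= e^2:
    return coeff_ok and D <= 3 + 1e-12

random.seed(2)
for _ in range(2000):
    p = random.uniform(0.01, 0.99)
    s = random.uniform(0.01, 0.99)
    t = random.uniform(0.0, math.log(1 / p))  # this case: t <= log(1/p)
    assert case1_closing_ok(p, s, t)
```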

  • Suppose t1210t\leq\frac{1}{2^{10}}. Since 1+x+x2/2ex1+x+x2/2+x31+x+x^{2}/2\leq e^{x}\leq 1+x+x^{2}/2+x^{3} for 0x10\leq x\leq 1 and 1+x+x2/2+x3ex1+x+x2/21+x+x^{2}/2+x^{3}\leq e^{x}\leq 1+x+x^{2}/2 for 1x<0-1\leq x<0, we get

    1+t(1q)2+12t2(1q)4\displaystyle 1+t(1-q)^{2}+\frac{1}{2}t^{2}(1-q)^{4} a1+t(1q)2+12t2(1q)4+t3(1q)6,\displaystyle\leq a\leq 1+t(1-q)^{2}+\frac{1}{2}t^{2}(1-q)^{4}+t^{3}(1-q)^{6}\,,
    1tq(1q)+12t2q2(1q)2t3q3\displaystyle 1-tq(1-q)+\frac{1}{2}t^{2}q^{2}(1-q)^{2}-t^{3}q^{3} b1tq(1q)+12t2q2(1q)2,\displaystyle\leq b\leq 1-tq(1-q)+\frac{1}{2}t^{2}q^{2}(1-q)^{2}\,,
    1+tq2+12t2q4\displaystyle 1+tq^{2}+\frac{1}{2}t^{2}q^{4} c1+tq2+12t2q4+t3q6.\displaystyle\leq c\leq 1+tq^{2}+\frac{1}{2}t^{2}q^{4}+t^{3}q^{6}\,.

    It follows that

    T\displaystyle T (1+t(1q)2+12t2(1q)4+t3(1q)6)qs\displaystyle\leq\left(1+t\left(1-q\right)^{2}+\frac{1}{2}t^{2}\left(1-q\right)^{4}+t^{3}\left(1-q\right)^{6}\right)qs
    +2(1tq(1q)+12t2q2(1q)2)q(1s)\displaystyle~{}~{}~{}~{}+2\left(1-tq\left(1-q\right)+\frac{1}{2}t^{2}q^{2}\left(1-q\right)^{2}\right)q\left(1-s\right)
    +(1+tq2+12t2q4+t3q6)(12q+qs)\displaystyle~{}~{}~{}~{}+\left(1+tq^{2}+\frac{1}{2}t^{2}q^{4}+t^{3}q^{6}\right)\left(1-2q+qs\right)
    =1+α1t+α2t2+α3t3,\displaystyle=1+\alpha_{1}t+\alpha_{2}t^{2}+\alpha_{3}t^{3}\,,

    where

    α1\displaystyle\alpha_{1} =(1q)2qs2q(1q)q(1s)+q2(12q+qs)=q(sq),\displaystyle=\left(1-q\right)^{2}qs-2q\left(1-q\right)q\left(1-s\right)+q^{2}\left(1-2q+qs\right)=q(s-q)\,, (83)
    α2\displaystyle\alpha_{2} =12(1q)4qs+q2(1q)2q(1s)+12q4(12q+qs),\displaystyle=\frac{1}{2}\left(1-q\right)^{4}qs+q^{2}\left(1-q\right)^{2}q\left(1-s\right)+\frac{1}{2}q^{4}\left(1-2q+qs\right)\,, (84)
    α3\displaystyle\alpha_{3} =(1q)6qs+q6(12q+qs).\displaystyle=\left(1-q\right)^{6}qs+q^{6}\left(1-2q+qs\right)\,. (85)
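The cubic form of the upper bound on T, with α1, α2, α3 as in (83)-(85), is an exact polynomial identity, so it can be verified numerically term by term. A small illustrative sketch (not part of the proof):

```python
import math
import random

def T_upper(q, s, t):
    # Right-hand side of the displayed upper bound on T.
    return ((1 + t*(1 - q)**2 + 0.5*t**2*(1 - q)**4 + t**3*(1 - q)**6) * q * s
            + 2 * (1 - t*q*(1 - q) + 0.5*t**2*q**2*(1 - q)**2) * q * (1 - s)
            + (1 + t*q**2 + 0.5*t**2*q**4 + t**3*q**6) * (1 - 2*q + q*s))

def T_cubic(q, s, t):
    # 1 + alpha_1 t + alpha_2 t^2 + alpha_3 t^3, with (83)-(85).
    a1 = q * (s - q)
    a2 = (0.5*(1 - q)**4*q*s + q**2*(1 - q)**2*q*(1 - s)
          + 0.5*q**4*(1 - 2*q + q*s))
    a3 = (1 - q)**6*q*s + q**6*(1 - 2*q + q*s)
    return 1 + a1*t + a2*t**2 + a3*t**3

random.seed(3)
for _ in range(2000):
    q = random.uniform(0.0, 0.5)
    s = random.uniform(q, 1.0)        # s >= q since q = ps
    t = random.uniform(0.0, 2**-10)   # this case: t <= 2^{-10}
    assert math.isclose(T_upper(q, s, t), T_cubic(q, s, t), rel_tol=1e-9)
    # The simplification in (83) indeed collapses to q(s - q):
    a1_raw = (1 - q)**2*q*s - 2*q*(1 - q)*q*(1 - s) + q**2*(1 - 2*q + q*s)
    assert math.isclose(a1_raw, q*(s - q), rel_tol=1e-9, abs_tol=1e-15)
```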

Similarly,

    D\displaystyle D (1+t(1q)2+12t2(1q)4)(1+tq2+12t2q4)q(sq)\displaystyle\geq\left(1+t\left(1-q\right)^{2}+\frac{1}{2}t^{2}\left(1-q\right)^{4}\right)\left(1+tq^{2}+\frac{1}{2}t^{2}q^{4}\right)q(s-q)
    (1tq(1q)+12t2q2(1q)2)2q(sq)\displaystyle~{}~{}~{}~{}-\left(1-tq\left(1-q\right)+\frac{1}{2}t^{2}q^{2}\left(1-q\right)^{2}\right)^{2}q(s-q)
    =β1t+β2t2+β3t3+β4t4,\displaystyle=\beta_{1}t+\beta_{2}t^{2}+\beta_{3}t^{3}+\beta_{4}t^{4}\,,

    where

    β1\displaystyle\beta_{1} =(1q)2q(sq)+q3(sq)+2q(1q)q(sq)=q(sq),\displaystyle=\left(1-q\right)^{2}q(s-q)+q^{3}(s-q)+2q\left(1-q\right)q(s-q)=q(s-q)\,, (86)
    β2\displaystyle\beta_{2} =12(1q)4q(sq)+12q5(sq)+(1q)2q3(sq)\displaystyle=\frac{1}{2}\left(1-q\right)^{4}q(s-q)+\frac{1}{2}q^{5}(s-q)+\left(1-q\right)^{2}q^{3}(s-q)
    q2(1q)2q(sq)q2(1q)2q(sq)\displaystyle~{}~{}~{}~{}-q^{2}\left(1-q\right)^{2}q(s-q)-q^{2}\left(1-q\right)^{2}q(s-q)
    =12(14q+4q2)q(sq),\displaystyle=\frac{1}{2}\left(1-4q+4q^{2}\right)q(s-q)\,, (87)
    β3\displaystyle\beta_{3} =12(1q)2q5(sq)+12(1q)4q3(sq)+q(1q)q2(1q)2q(sq)\displaystyle=\frac{1}{2}(1-q)^{2}q^{5}(s-q)+\frac{1}{2}(1-q)^{4}q^{3}(s-q)+q(1-q)q^{2}(1-q)^{2}q(s-q)
    =12(q22q3+q4)q(sq),\displaystyle=\frac{1}{2}\left(q^{2}-2q^{3}+q^{4}\right)q(s-q)\,, (88)
    β4\displaystyle\beta_{4} =14(1q)4q5(sq)14(1q)4q5(sq)=0.\displaystyle=\frac{1}{4}(1-q)^{4}q^{5}(s-q)-\frac{1}{4}(1-q)^{4}q^{5}(s-q)=0\,. (89)
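Likewise, the quartic form of the lower bound on D, with β1 through β4 as in (86)-(89) (including the exact cancellation β4 = 0), is a polynomial identity that can be verified numerically. An illustrative sketch:

```python
import math
import random

def D_lower(q, s, t):
    # Right-hand side of the displayed lower bound on D.
    f1 = 1 + t*(1 - q)**2 + 0.5*t**2*(1 - q)**4
    f2 = 1 + t*q**2 + 0.5*t**2*q**4
    g = 1 - t*q*(1 - q) + 0.5*t**2*q**2*(1 - q)**2
    return (f1 * f2 - g**2) * q * (s - q)

def D_quartic(q, s, t):
    # beta_1 t + beta_2 t^2 + beta_3 t^3 + beta_4 t^4, with (86)-(89).
    b1 = q * (s - q)
    b2 = 0.5 * (1 - 4*q + 4*q**2) * q * (s - q)
    b3 = 0.5 * (q**2 - 2*q**3 + q**4) * q * (s - q)
    b4 = 0.0  # the t^4 terms cancel exactly, cf. (89)
    return b1*t + b2*t**2 + b3*t**3 + b4*t**4

random.seed(4)
for _ in range(2000):
    q = random.uniform(0.0, 0.5)
    s = random.uniform(q, 1.0)          # s >= q since q = ps
    t = random.uniform(1e-6, 2**-10)
    assert math.isclose(D_lower(q, s, t), D_quartic(q, s, t),
                        rel_tol=1e-6, abs_tol=1e-18)
```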

    By (69), we get that

    M2=T22D\displaystyle M_{2}=T^{2}-2D (1+α1t+α2t2+α3t3)22(β1t+β2t2+β3t3+β4t4)\displaystyle\leq\left(1+\alpha_{1}t+\alpha_{2}t^{2}+\alpha_{3}t^{3}\right)^{2}-2\left(\beta_{1}t+\beta_{2}t^{2}+\beta_{3}t^{3}+\beta_{4}t^{4}\right)
=1+γ1t+γ2t2+γ3t3+γ4t4+γ5t5+γ6t6,\displaystyle=1+\gamma_{1}t+\gamma_{2}t^{2}+\gamma_{3}t^{3}+\gamma_{4}t^{4}+\gamma_{5}t^{5}+\gamma_{6}t^{6}\,,

    where

    γ1\displaystyle\gamma_{1} =2α12β1,\displaystyle=2\alpha_{1}-2\beta_{1}\,, (90)
    γ2\displaystyle\gamma_{2} =α12+2α22β2,\displaystyle=\alpha_{1}^{2}+2\alpha_{2}-2\beta_{2}\,, (91)
    γ3\displaystyle\gamma_{3} =2α1α2+2α32β3,\displaystyle=2\alpha_{1}\alpha_{2}+2\alpha_{3}-2\beta_{3}\,, (92)
    γ4\displaystyle\gamma_{4} =α22+2α1α32β4,\displaystyle=\alpha_{2}^{2}+2\alpha_{1}\alpha_{3}-2\beta_{4}\,, (93)
    γ5\displaystyle\gamma_{5} =2α2α3,\displaystyle=2\alpha_{2}\alpha_{3}\,, (94)
    γ6\displaystyle\gamma_{6} =α32.\displaystyle=\alpha_{3}^{2}\,. (95)

    Next, we show γ1=0\gamma_{1}=0 and derive a tight bound on γ2\gamma_{2}, which governs our final bound on M2M_{2}. For the remaining terms, we will show i=36γiti=O(t3qs)\sum_{i=3}^{6}\gamma_{i}t^{i}=O(t^{3}qs) so that they can be made negligible later when we apply the bound on M2M_{2} in the proof of Lemma 3 (see the proof of Case 2 of Erdős-Rényi graphs).

    By (83), (86) and (90), we get

    γ1\displaystyle\gamma_{1} =2α12β1=0.\displaystyle=2\alpha_{1}-2\beta_{1}=0\,.

    Recall that ρ=s(1p)1ps\rho=\frac{s(1-p)}{1-ps} and σ2=ps(1ps)\sigma^{2}=ps(1-ps). By (83), (84), (87) and (91), we get

    γ2\displaystyle\gamma_{2} =α12+2α22β2=q2+q2s22q32q3s+2q4=σ4(1+ρ2).\displaystyle=\alpha_{1}^{2}+2\alpha_{2}-2\beta_{2}=q^{2}+q^{2}s^{2}-2q^{3}-2q^{3}s+2q^{4}=\sigma^{4}\left(1+\rho^{2}\right)\,.
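The identity γ2 = σ⁴(1+ρ²) can be double-checked numerically; the sketch below takes q = ps, so that σ² = q(1-q) and ρ = s(1-p)/(1-q). It is illustrative only.

```python
import math
import random

def gamma2_from_alphas(q, s):
    # alpha_1^2 + 2 alpha_2 - 2 beta_2, using (83), (84) and (87).
    a1 = q * (s - q)
    a2 = (0.5*(1 - q)**4*q*s + q**2*(1 - q)**2*q*(1 - s)
          + 0.5*q**4*(1 - 2*q + q*s))
    b2 = 0.5 * (1 - 4*q + 4*q**2) * q * (s - q)
    return a1**2 + 2*a2 - 2*b2

def gamma2_closed(p, s):
    # sigma^4 (1 + rho^2) with q = ps, sigma^2 = ps(1-ps), rho = s(1-p)/(1-ps).
    q = p * s
    sigma2 = q * (1 - q)
    rho = s * (1 - p) / (1 - q)
    return sigma2**2 * (1 + rho**2)

random.seed(5)
for _ in range(2000):
    p = random.uniform(0.01, 0.99)
    s = random.uniform(0.01, 0.99)
    q = p * s
    poly = q**2 + q**2*s**2 - 2*q**3 - 2*q**3*s + 2*q**4
    assert math.isclose(gamma2_from_alphas(q, s), poly,
                        rel_tol=1e-9, abs_tol=1e-15)
    assert math.isclose(poly, gamma2_closed(p, s), rel_tol=1e-9, abs_tol=1e-15)
```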

    By (83), (84), (85),(88) and (92), we get

    γ3\displaystyle\gamma_{3} =2α1α2+2α32β3\displaystyle=2\alpha_{1}\alpha_{2}+2\alpha_{3}-2\beta_{3}
    =2qs12q2s+q2s2+28q3s4q3s2+q432q4s+4q4s24q5+22q5s+6q612q6s4q7+4q7s\displaystyle=2qs-12q^{2}s+q^{2}s^{2}+28q^{3}s-4q^{3}s^{2}+q^{4}-32q^{4}s+4q^{4}s^{2}-4q^{5}+22q^{5}s+6q^{6}-12q^{6}s-4q^{7}+4q^{7}s
    (a)2qs12q2s+q2s2+28q3s(b)4qs,\displaystyle\overset{(a)}{\leq}2qs-12q^{2}s+q^{2}s^{2}+28q^{3}s\overset{(b)}{\leq}4qs\,,

    where (a)(a) holds because 4q3s2+q40-4q^{3}s^{2}+q^{4}\leq 0, 32q4s+4q4s24q5+22q5s+6q60-32q^{4}s+4q^{4}s^{2}-4q^{5}+22q^{5}s+6q^{6}\leq 0 and 12q6s4q7+4q7s0-12q^{6}s-4q^{7}+4q^{7}s\leq 0; (b)(b) holds because 12q2s+q2s2+28q3s12q2s+q2s2+14q2s4q2s2qs-12q^{2}s+q^{2}s^{2}+28q^{3}s\leq-12q^{2}s+q^{2}s^{2}+14q^{2}s\leq 4q^{2}s\leq 2qs given q12q\leq\frac{1}{2}. By (83), (84), (85), (89) and (93), we get

    γ4\displaystyle\gamma_{4} =α22+2α1α32β4\displaystyle=\alpha_{2}^{2}+2\alpha_{1}\alpha_{3}-2\beta_{4}
    =14(9q2s212q3s56q3s2+4q4+60q4s+144q4s28q5146q5s192q5s2+8q6\displaystyle=\frac{1}{4}(9q^{2}s^{2}-12q^{3}s-56q^{3}s^{2}+4q^{4}+60q^{4}s+144q^{4}s^{2}-8q^{5}-146q^{5}s-192q^{5}s^{2}+8q^{6}
    +200q6s+136q6s212q7136q7s48q7s2+q8+32q8s+16q8s2+16q916q9s)\displaystyle~{}~{}~{}~{}+200q^{6}s+136q^{6}s^{2}-12q^{7}-136q^{7}s-48q^{7}s^{2}+q^{8}+32q^{8}s+16q^{8}s^{2}+16q^{9}-16q^{9}s)
    (a)14(9q2s212q3s56q3s2+4q4+60q4s+144q4s2)(b)13q2s2,\displaystyle\overset{(a)}{\leq}\frac{1}{4}(9q^{2}s^{2}-12q^{3}s-56q^{3}s^{2}+4q^{4}+60q^{4}s+144q^{4}s^{2})\overset{(b)}{\leq}13q^{2}s^{2}\,,

    where (a)(a) holds because

    8q5146q5s192q5s2+8q6+200q6s+136q6s2\displaystyle-8q^{5}-146q^{5}s-192q^{5}s^{2}+8q^{6}+200q^{6}s+136q^{6}s^{2}
    =(8q5+8q6)+(146q5s192q5s2)+(200q6s+136q6s2)\displaystyle=\left(-8q^{5}+8q^{6}\right)+\left(-146q^{5}s-192q^{5}s^{2}\right)+\left(200q^{6}s+136q^{6}s^{2}\right)
    338q5s2+336q6s0,\displaystyle\leq-338q^{5}s^{2}+336q^{6}s\leq 0\,,

    and

    12q7136q7s48q7s2+q8+32q8s+16q8s2+16q9\displaystyle-12q^{7}-136q^{7}s-48q^{7}s^{2}+q^{8}+32q^{8}s+16q^{8}s^{2}+16q^{9}
    =(12q7+q8)+(136q7s48q7s2)+(32q8s+16q8s2+16q9)\displaystyle=\left(-12q^{7}+q^{8}\right)+\left(-136q^{7}s-48q^{7}s^{2}\right)+\left(32q^{8}s+16q^{8}s^{2}+16q^{9}\right)
    184q7s2+64q8s0;\displaystyle\leq-184q^{7}s^{2}+64q^{8}s\leq 0\,;

    (b)(b) holds because 12q3s+4q48q3s-12q^{3}s+4q^{4}\leq-8q^{3}s and 60q4s+144q4s230q3s+72q3s260q^{4}s+144q^{4}s^{2}\leq 30q^{3}s+72q^{3}s^{2}, and then 8q3s56q3s2+30q3s+72q3s224q3s+16q3s232q2s2-8q^{3}s-56q^{3}s^{2}+30q^{3}s+72q^{3}s^{2}\leq 24q^{3}s+16q^{3}s^{2}\leq 32q^{2}s^{2}, given qsq\leq s and q12q\leq\frac{1}{2}. By (84), (85) and (94), we get

    γ5\displaystyle\gamma_{5} =2α2α3\displaystyle=2\alpha_{2}\alpha_{3}
    =q2(s4qs+2q2+4q2s3q3)\displaystyle=q^{2}\left(s-4qs+2q^{2}+4q^{2}s-3q^{3}\right)
    (s6qs+15q2s20q3s+15q4s6q5s+q52q6+2q6s)\displaystyle~{}~{}~{}~{}\left(s-6qs+15q^{2}s-20q^{3}s+15q^{4}s-6q^{5}s+q^{5}-2q^{6}+2q^{6}s\right)
    q2(s+4qs+2q2+4q2s+3q3)\displaystyle\leq q^{2}\left(s+4qs+2q^{2}+4q^{2}s+3q^{3}\right)
    (s+6qs+15q2s+20q3s+15q4s+6q5s+q5+2q6+2q6s)\displaystyle~{}~{}~{}~{}\left(s+6qs+15q^{2}s+20q^{3}s+15q^{4}s+6q^{5}s+q^{5}+2q^{6}+2q^{6}s\right)
    =q2s2(1+4q+2qp+4q2+3q2p)(1+6q+15q2+20q3+15q4+6q5+q4p+2q5p+2q6)\displaystyle=q^{2}s^{2}\left(1+4q+2qp+4q^{2}+3q^{2}p\right)\left(1+6q+15q^{2}+20q^{3}+15q^{4}+6q^{5}+q^{4}p+2q^{5}p+2q^{6}\right)
    (1468)q2s2210q2s2.\displaystyle\leq\left(14\cdot 68\right)q^{2}s^{2}\leq 2^{10}q^{2}s^{2}\,.

    By (85) and (95), we get

    γ6\displaystyle\gamma_{6} =α32=q2(s6qs+15q2s20q3s+15q4s6q5s+q52q6+2q6s)2\displaystyle=\alpha_{3}^{2}=q^{2}\left(s-6qs+15q^{2}s-20q^{3}s+15q^{4}s-6q^{5}s+q^{5}-2q^{6}+2q^{6}s\right)^{2}
    q2(s+6qs+15q2s+20q3s+15q4s+6q5s+q5+2q6+2q6s)2\displaystyle\leq q^{2}\left(s+6qs+15q^{2}s+20q^{3}s+15q^{4}s+6q^{5}s+q^{5}+2q^{6}+2q^{6}s\right)^{2}
    q2s2(1+6q+15q2+20q3+15q4+6q5+q4p+2q5p+2q6)2(68)2q2s2213q2s2.\displaystyle\leq q^{2}s^{2}\left(1+6q+15q^{2}+20q^{3}+15q^{4}+6q^{5}+q^{4}p+2q^{5}p+2q^{6}\right)^{2}\leq(68)^{2}q^{2}s^{2}\leq 2^{13}q^{2}s^{2}\,.

    Then, we have

    M2\displaystyle M_{2} 1+t2σ4(1+ρ2)+4t3qs+13t4q2s2+210t5q2s2+213t6q2s2\displaystyle\leq 1+t^{2}\sigma^{4}\left(1+\rho^{2}\right)+4t^{3}qs+13t^{4}q^{2}s^{2}+2^{10}t^{5}q^{2}s^{2}+2^{13}t^{6}q^{2}s^{2}
    1+t2σ4(1+ρ2)+8t3qs,\displaystyle\leq 1+t^{2}\sigma^{4}\left(1+\rho^{2}\right)+8t^{3}qs\,,

    where the last inequality holds because q2s2qsq^{2}s^{2}\leq qs and 13t4+210t5+213t64t313t^{4}+2^{10}t^{5}+2^{13}t^{6}\leq 4t^{3} given t1210t\leq\frac{1}{2^{10}}.
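The bounds γ3 ≤ 4qs, γ4 ≤ 13q²s², γ5 ≤ 2¹⁰q²s² and γ6 ≤ 2¹³q²s², as well as the absorption of the higher-order terms into 8t³qs, can be checked numerically from the closed forms (83)-(89) and (92)-(95). The sketch below is illustrative and assumes q = ps ≤ 1/2.

```python
import random

def case2_gammas(q, s):
    # gamma_3..gamma_6 from the closed forms (83)-(89) and (92)-(95).
    a1 = q * (s - q)
    a2 = (0.5*(1 - q)**4*q*s + q**2*(1 - q)**2*q*(1 - s)
          + 0.5*q**4*(1 - 2*q + q*s))
    a3 = (1 - q)**6*q*s + q**6*(1 - 2*q + q*s)
    b3 = 0.5 * (q**2 - 2*q**3 + q**4) * q * (s - q)
    b4 = 0.0
    g3 = 2*a1*a2 + 2*a3 - 2*b3
    g4 = a2**2 + 2*a1*a3 - 2*b4
    g5 = 2*a2*a3
    g6 = a3**2
    return g3, g4, g5, g6

random.seed(6)
for _ in range(2000):
    p = random.uniform(0.01, 0.99)
    s = random.uniform(0.01, 0.99)
    q = p * s          # q = ps (assumption); keep only samples with q <= 1/2
    if q > 0.5:
        continue
    g3, g4, g5, g6 = case2_gammas(q, s)
    assert g3 <= 4 * q * s + 1e-12
    assert g4 <= 13 * q**2 * s**2 + 1e-12
    assert g5 <= 2**10 * q**2 * s**2 + 1e-12
    assert g6 <= 2**13 * q**2 * s**2 + 1e-12
    # Tail absorption: 13 t^4 + 2^10 t^5 + 2^13 t^6 <= 4 t^3 for t <= 2^{-10}.
    t = random.uniform(0.0, 2**-10)
    assert 13*t**4 + 2**10*t**5 + 2**13*t**6 <= 4*t**3 + 1e-18
```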

    Moreover, given M1=TM_{1}=T and M2=T22DM_{2}=T^{2}-2D,

    M12M2=1+2DM21+2De2,\displaystyle\frac{M_{1}^{2}}{M_{2}}=1+\frac{2D}{M_{2}}\leq 1+2D\leq e^{2}\,,

    where the first inequality holds because M21M_{2}\geq 1 by definition, and the second inequality holds because

    D\displaystyle D =(acb2)q(sq)\displaystyle=(ac-b^{2})q(s-q)
    (a)acq(sq)\displaystyle\overset{\rm(a)}{\leq}acq(s-q)
    (1+t(1q)2+12t2(1q)4+t3(1q)6)(1+tq2+12t2q4+t3q6)q(sq)(b)3,\displaystyle\leq\left(1+t\left(1-q\right)^{2}+\frac{1}{2}t^{2}\left(1-q\right)^{4}+t^{3}\left(1-q\right)^{6}\right)\left(1+tq^{2}+\frac{1}{2}t^{2}q^{4}+t^{3}q^{6}\right)q(s-q)\overset{\rm(b)}{\leq}3\,,

    where (a)(a) holds because b2q(sq)0b^{2}q(s-q)\geq 0; (b)(b) holds because q(sq)1q(s-q)\leq 1 and each of the two factors is at most 1+t+12t2+t3321+t+\frac{1}{2}t^{2}+t^{3}\leq\frac{3}{2} given t1210t\leq\frac{1}{2^{10}}, so that the product is at most 943\frac{9}{4}\leq 3.

Appendix C Auxiliary results

The following lemma, stated in [WXY20, Lemma 10], follows from the Hanson-Wright inequality [HW71, RV13].

Lemma 9 (Hanson-Wright inequality).

Let U,VnU,V\in{\mathbb{R}}^{n} be standard Gaussian vectors such that the pairs (Uk,Vk)𝒩((00),(1ρρ1))(U_{k},V_{k})\sim{\mathcal{N}}\Big{(}\left(\begin{smallmatrix}0\\ 0\end{smallmatrix}\right),\left(\begin{smallmatrix}1&\rho\\ \rho&1\end{smallmatrix}\right)\Big{)} are independent for k=1,,nk=1,\ldots,n. Let Mn×nM\in{\mathbb{R}}^{n\times n} be any deterministic matrix. There exists some universal constant c>0c>0 such that with probability at least 1δ1-\delta,

|UMVρ𝖳𝗋(M)|c(MFlog(1/δ)+M2log(1/δ)).\displaystyle\left|U^{\top}MV-\rho\mathsf{Tr}(M)\right|\leq c\left(\|M\|_{F}\sqrt{\log(1/\delta)}+\|M\|_{2}\log(1/\delta)\right). (96)
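As an illustration (not part of the proof), one can simulate (96) for the simplest choice M = I, for which 𝖳𝗋(M) = n, ‖M‖_F = √n and ‖M‖₂ = 1, so that UᵀMV concentrates around ρn. A minimal pure-Python sketch with a fixed seed and a generous constant c:

```python
import math
import random

def correlated_quadratic_form(n, rho, seed=0):
    # Sample i.i.d. pairs (U_k, V_k) with correlation rho and return
    # U^T M V for M = I, i.e. sum_k U_k V_k.
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        u = rng.gauss(0.0, 1.0)
        z = rng.gauss(0.0, 1.0)
        v = rho * u + math.sqrt(1 - rho ** 2) * z  # Corr(u, v) = rho
        total += u * v
    return total

# With M = I, (96) predicts |S - rho n| = O(sqrt(n log(1/delta)) + log(1/delta))
# with probability at least 1 - delta.
n, rho, delta = 500, 0.5, 0.01
S = correlated_quadratic_form(n, rho, seed=42)
scale = math.sqrt(n * math.log(1 / delta)) + math.log(1 / delta)
assert abs(S - rho * n) <= 10 * scale  # generous universal constant c = 10
```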
Lemma 10.

The function ϕ\phi in (48), namely

ϕ(x)log1x1+xx(log1x)(1+log1x)\displaystyle\phi\left(x\right)\triangleq\frac{\log\frac{1}{x}-1+x}{x\left(\log\frac{1}{x}\right)\left(1+\log\frac{1}{x}\right)}\,

is monotonically decreasing on x(0,1)x\in(0,1).

Proof.

We have

ϕ(x)=ψ(x)x2(log(x)1)2log2(x),\displaystyle\phi^{\prime}(x)=\dfrac{\psi(x)}{x^{2}\left(\log\left(x\right)-1\right)^{2}\log^{2}\left(x\right)}\,, (97)

where

ψ(x)log3(x)+log2(x)+(12x)log(x)+x1.\psi(x)\triangleq\log^{3}\left(x\right)+\log^{2}\left(x\right)+\left(1-2x\right)\log\left(x\right)+x-1\,.

Next, write

ψ(x)=κ(x)x,\displaystyle\psi^{\prime}(x)=\dfrac{\kappa(x)}{x}\,, (98)

where

κ(x)3log2(x)+(22x)log(x)x+1.\kappa(x)\triangleq 3\log^{2}\left(x\right)+\left(2-2x\right)\log\left(x\right)-x+1\,.

Note that for x(0,1)x\in(0,1),

κ(x)=2xlog(x)6log(x)+3x2x0,\kappa^{\prime}(x)=-\dfrac{2x\log\left(x\right)-6\log\left(x\right)+3x-2}{x}\leq 0\,,

where the last inequality holds because 2xlog(x)12x\log\left(x\right)\geq-1 and 6log(x)+3x3-6\log\left(x\right)+3x\geq 3 for x(0,1)x\in(0,1). Thus, κ(x)\kappa(x) is monotone decreasing on x(0,1)x\in(0,1) and then κ(x)>κ(1)=0\kappa(x)>\kappa(1)=0 for x(0,1)x\in(0,1). By (98), ψ(x)0\psi^{\prime}(x)\geq 0 for x(0,1)x\in(0,1), and then ψ(x)\psi(x) is monotone increasing on x(0,1)x\in(0,1) with ψ(x)<ψ(1)=0\psi(x)<\psi(1)=0. Then, by (97), ϕ(x)<0\phi^{\prime}(x)<0 on x(0,1)x\in(0,1), as desired. ∎
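The monotonicity of φ and the two sign facts driving the proof (κ > 0 and ψ < 0 on (0,1)) can be confirmed numerically on a grid. A small illustrative check:

```python
import math

def phi(x):
    # phi(x) = (log(1/x) - 1 + x) / (x log(1/x) (1 + log(1/x))), as in (48).
    L = math.log(1 / x)
    return (L - 1 + x) / (x * L * (1 + L))

def psi(x):
    # psi(x) = log^3 x + log^2 x + (1 - 2x) log x + x - 1
    L = math.log(x)
    return L**3 + L**2 + (1 - 2 * x) * L + x - 1

def kappa(x):
    # kappa(x) = 3 log^2 x + (2 - 2x) log x - x + 1
    L = math.log(x)
    return 3 * L**2 + (2 - 2 * x) * L - x + 1

xs = [i / 1000 for i in range(1, 1000)]
for x0, x1 in zip(xs, xs[1:]):
    assert phi(x0) > phi(x1)          # phi strictly decreasing on (0,1)
for x in xs:
    assert kappa(x) > 0               # kappa > kappa(1) = 0
    assert psi(x) < 0                 # psi < psi(1) = 0
```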

Acknowledgement

The authors thank Lele Wang for pointing out an error in the earlier proof of (62).

References

  • [BCL+19] Boaz Barak, Chi-Ning Chou, Zhixian Lei, Tselil Schramm, and Yueqi Sheng. (Nearly) efficient algorithms for the graph matching problem on correlated random graphs. In Advances in Neural Information Processing Systems, pages 9186–9194, 2019.
  • [BCPP98] Rainer E Burkard, Eranda Cela, Panos M Pardalos, and Leonidas S Pitsoulis. The quadratic assignment problem. In Handbook of combinatorial optimization, pages 1713–1809. Springer, 1998.
  • [CK82] Imre Csiszár and János Körner. Information Theory: Coding Theorems for Discrete Memoryless Systems. Academic Press, Inc., 1982.
  • [CK16] Daniel Cullina and Negar Kiyavash. Improved achievability and converse bounds for Erdős-Rényi graph matching. ACM SIGMETRICS Performance Evaluation Review, 44(1):63–72, 2016.
  • [CK17] Daniel Cullina and Negar Kiyavash. Exact alignment recovery for correlated Erdős-Rényi graphs. arXiv preprint arXiv:1711.06783, 2017.
  • [CKMP19] Daniel Cullina, Negar Kiyavash, Prateek Mittal, and H Vincent Poor. Partial recovery of Erdős-Rényi graph alignment via k-core alignment. Proceedings of the ACM on Measurement and Analysis of Computing Systems, 3(3):1–21, 2019.
  • [DAM17] Yash Deshpande, Emmanuel Abbe, and Andrea Montanari. Asymptotic mutual information for the balanced binary stochastic block model. Information and Inference: A Journal of the IMA, 6(2):125–170, 2017.
  • [DMWX20] Jian Ding, Zongming Ma, Yihong Wu, and Jiaming Xu. Efficient random graph matching via degree profiles. Probability Theory and Related Fields, pages 1–87, Sep 2020.
  • [FMWX19a] Zhou Fan, Cheng Mao, Yihong Wu, and Jiaming Xu. Spectral graph matching and regularized quadratic relaxations I: The Gaussian model. arXiv preprint arXiv:1907.08880, 2019.
  • [FMWX19b] Zhou Fan, Cheng Mao, Yihong Wu, and Jiaming Xu. Spectral graph matching and regularized quadratic relaxations II: Erdős-Rényi graphs and universality. arXiv preprint arXiv:1907.08883, 2019.
  • [FQRM+16] Soheil Feizi, Gerald Quon, Mariana Recamonde-Mendoza, Muriel Médard, Manolis Kellis, and Ali Jadbabaie. Spectral alignment of networks. arXiv preprint arXiv:1602.04181, 2016.
  • [Gan20] Luca Ganassali. Sharp threshold for alignment of graph databases with Gaussian weights. arXiv preprint arXiv:2010.16295, 2020.
  • [GM20] Luca Ganassali and Laurent Massoulié. From tree matching to sparse graph alignment. In Conference on Learning Theory, pages 1633–1665. PMLR, 2020.
  • [GSV05] Dongning Guo, Shlomo Shamai, and Sergio Verdú. Mutual information and minimum mean-square error in Gaussian channels. IEEE Trans. on Information Theory, 51, 2005.
  • [HM20] Georgina Hall and Laurent Massoulié. Partial recovery in the graph alignment problem. arXiv preprint arXiv:2007.00533, 2020.
  • [HW71] David Lee Hanson and Farroll Tim Wright. A bound on tail probabilities for quadratic forms in independent random variables. The Annals of Mathematical Statistics, 42(3):1079–1083, 1971.
  • [HWX17] Bruce Hajek, Yihong Wu, and Jiaming Xu. Information limits for recovering a hidden community. IEEE Trans. on Information Theory, 63(8):4729 – 4745, 2017.
  • [LFF+16] Vince Lyzinski, Donniell Fishkind, Marcelo Fiori, Joshua Vogelstein, Carey Priebe, and Guillermo Sapiro. Graph matching: Relax at your own risk. IEEE Transactions on Pattern Analysis & Machine Intelligence, 38(1):60–73, 2016.
  • [MMRU09] Cyril Méasson, Andrea Montanari, Thomas J Richardson, and Rüdiger Urbanke. The generalized area theorem and some of its consequences. IEEE Transactions on Information Theory, 55(11):4793–4821, 2009.
  • [PG11] Pedram Pedarsani and Matthias Grossglauser. On the privacy of anonymized networks. In Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 1235–1243, 2011.
  • [PW94] Panos M Pardalos and Henry Wolkowicz. Quadratic Assignment and Related Problems: DIMACS Workshop, May 20-21, 1993, volume 16. American Mathematical Soc., 1994.
  • [RV13] Mark Rudelson and Roman Vershynin. Hanson-Wright inequality and sub-Gaussian concentration. Electronic Communications in Probability, 18, 2013.
  • [RXZ19] Galen Reeves, Jiaming Xu, and Ilias Zadik. The all-or-nothing phenomenon in sparse linear regression. In Conference on Learning Theory, pages 2652–2663. PMLR, 2019.
  • [Wri71] Edward M Wright. Graphs on unlabelled nodes with a given number of edges. Acta Mathematica, 126(1):1–9, 1971.
  • [WXY20] Yihong Wu, Jiaming Xu, and Sophie H. Yu. Testing correlation of unlabeled random graphs. arXiv preprint arXiv:2008.10097, 2020.