This paper was converted on www.awesomepapers.org from LaTeX by an anonymous user.
Want to know more? Visit the Converter page.

On the permanent of a random symmetric matrix

Matthew Kwan Department of Mathematics, Stanford University, Stanford, CA. Email: [email protected]. Research supported by NSF Award DMS-1953990.    Lisa Sauermann School of Mathematics, Institute for Advanced Study, Princeton, NJ. Email: [email protected]. Research supported by NSF Grant CCF-1900460 and NSF Award DMS-1953772. Part of this work was completed while this author was a Szegő Assistant Professor at Stanford University.
Abstract

Let MnM_{n} denote a random symmetric n×nn\times n matrix, whose entries on and above the diagonal are i.i.d. Rademacher random variables (taking values ±1\pm 1 with probability 1/21/2 each). Resolving a conjecture of Vu, we prove that the permanent of MnM_{n} has magnitude nn/2+o(n)n^{n/2+o(n)} with probability 1o(1)1-o(1). Our result can also be extended to more general models of random matrices. In our proof, we build on and extend some techniques introduced by Tao and Vu, studying the evolution of permanents of submatrices in a random matrix process.

Keywords: random symmetric matrix, permanent.
MSC Subject Classification: 60B20, 15A15.

1 Introduction

Two of the most basic matrix parameters are the determinant and the permanent: for an n×nn\times n matrix M=(xi,j)i,jM=(x_{i,j})_{i,j}, define

det(M)=πSnsign(π)i=1nxi,π(i)andper(M)=πSni=1nxi,π(i).\det(M)=\sum_{\pi\in S_{n}}\operatorname{sign}(\pi)\prod_{i=1}^{n}x_{i,\pi(i)}\quad\textnormal{and}\quad\operatorname{per}(M)=\sum_{\pi\in S_{n}}\prod_{i=1}^{n}x_{i,\pi(i)}. (1.1)

A central direction of research in probabilistic combinatorics and random matrix theory is to understand the determinant and permanent of different types of random matrices. For example, let AnA_{n} be an n×nn\times n matrix whose entries are i.i.d. Rademacher-distributed random variables, taking values ±1\pm 1 with probability 1/21/2 each (this is often called a random Bernoulli matrix). A classical theorem of Komlós [19] (perhaps the foundational theorem in discrete random matrix theory) is that Pr(detAn=0)=o(1)\Pr(\det A_{n}=0)=o(1) as nn\to\infty. That is to say, AnA_{n} is asymptotically almost surely nonsingular. Since then, there has been intensive effort to refine our understanding of the singularity probability (see [18, 31, 32, 28, 5]), culminating in a recent breakthrough of Tikhomirov [36] proving that Pr(detAn=0)=2n+o(n)\Pr(\det A_{n}=0)=2^{-n+o(n)}. The problem of estimating the order of magnitude of detAn\det A_{n} has also received significant attention: Tao and Vu [31] proved that with probability 1o(1)1-o(1) we have |detAn|=nn/2+o(n)\mathopen{}\mathclose{{}\left|\det A_{n}}\right|=n^{n/2+o(n)}, and later Nguyen and Vu [26] proved a central limit theorem for log|detAn|\log|\det A_{n}| (see also [15, 16]).

Most of the above-mentioned results generalise readily to more general types of random matrices, where the entries are independently sampled from any subgaussian distribution. There has also been intensive interest in random matrices with dependence between the entries. Perhaps the most prominent examples are symmetric random matrices. Let MnM_{n} be the random matrix whose entries on and above the diagonal are independent Rademacher random variables, and the entries below the diagonal are chosen to make the matrix symmetric (equivalently, we can choose a random matrix AnA_{n} as above and condition on the event that AnA_{n} is symmetric). The study of random symmetric matrices has necessitated the development of new tools, but by now there is a fairly complete understanding of the determinant of a random symmetric matrix with Rademacher entries. The fact that Pr(detMn=0)=o(1)\Pr(\det M_{n}=0)=o(1) was first proved by Costello, Tao and Vu [10] (see also [12]), and stronger estimates on Pr(detMn=0)\Pr(\det M_{n}=0) were obtained by several authors [24, 38, 13, 9]. It is also known that with probability 1o(1)1-o(1) we have |detMn|=nn/2+o(n)|\det M_{n}|=n^{n/2+o(n)} (this follows from work on the least singular value of MnM_{n} due to Nguyen [25] and Rudelson [38], together with Wigner’s celebrated semicircle law [43, 44]), and a central limit theorem for log|detMn|\log|\det M_{n}| was proved by Bourgade and Mody [4] (see also [34]).

It is widely believed that for all the above-mentioned theorems concerning determinants of random matrices (symmetric or not), there should be analogous theorems for permanents. However, the permanent appears to be a much more challenging parameter to study. For example, while the determinant encodes information about linear dependence and can be interpreted as the (signed) volume of a certain parallelepiped, it only seems possible to attack the permanent from a “combinatorial” point of view, directly considering the formal definition in Equation 1.1. The fact that the permanent is harder to study is maybe not surprising, since (in contrast to the determinant) the permanent of a matrix is #\#P-hard to compute (as was famously proved by Valiant [37]). Even the analogue of the singularity problem, to show that Pr(perAn=0)=o(1)\Pr(\operatorname{per}A_{n}=0)=o(1), was open for a long time. In 2009, Tao and Vu [33] finally resolved this problem and also estimated the typical magnitude of perAn\operatorname{per}A_{n}: namely, they proved that asymptotically almost surely |perAn|=nn/2+o(n)\mathopen{}\mathclose{{}\left|\operatorname{per}A_{n}}\right|=n^{n/2+o(n)}. Perhaps surprisingly, permanents of random matrices turn out to be of interest in quantum computing: Aaronson and Arkhipov [2] proved that quantum computers cannot be efficiently simulated by classical computers, conditional on a conjecture which strengthens the aforementioned Tao–Vu permanent theorem (see also [1, 11, 22]).

In this paper we study the permanent of random symmetric matrices. More precisely, we estimate the typical magnitude of the permanent of a random symmetric matrix with Rademacher entries.

Theorem 1.1.

Asymptotically almost surely, |perMn|=nn/2+o(n)|\operatorname{per}M_{n}|=n^{n/2+o(n)}.

The study of the permanent of a random symmetric matrix seems to have first been explicitly suggested by Tao and Vu (see [33, Remark 1.6]). They observed that their arguments for the permanent of a (not necessarily symmetric) random matrix “do not seem to easily yield any non-trivial result for the permanent of a random symmetric Bernoulli matrix”. The statement of Theorem 1.1 has been conjectured by Vu in 2009 (see [39]). He also mentioned the conjecture in a recent survey [40, Conjecture 6.11], and described it as “the still missing piece of the picture” regarding determinants and permanents of random discrete matrices.

Theorem 1.1 is actually a combination of two different results. First, the following proposition gives an upper bound on |perMn||\operatorname{per}M_{n}|.

Proposition 1.2.

For any ε>0\varepsilon>0, if nn is sufficiently large with respect to ε\varepsilon, we have

Pr(|perMn|nn/2+εn)nεn.\Pr\mathopen{}\mathclose{{}\left(|\operatorname{per}M_{n}|\geq n^{n/2+\varepsilon n}}\right)\leq n^{-\varepsilon n}.

Proposition 1.2 is easily proved using an estimate for 𝔼[(perMn)2]\mathbb{E}[(\operatorname{per}M_{n})^{2}] and Markov’s inequality; see Section 2. The main role of this paper is to prove the following lower bound on |perMn||\operatorname{per}M_{n}|.

Theorem 1.3.

There is a positive constant c>0c>0 such that for any ε>0\varepsilon>0 the following holds. If nn is sufficiently large with respect to ε\varepsilon, we have

Pr(|perMn|nn/2εn)nc.\Pr\mathopen{}\mathclose{{}\left(|\operatorname{per}M_{n}|\leq n^{n/2-\varepsilon n}}\right)\leq n^{-c}.

We remark that the constant cc in Theorem 1.3 can be made explicit. For example, c=1/150c=1/150 certainly suffices, though this can be improved substantially simply by optimising constants throughout the proof. However, without new ideas our methods do not seem to be capable of proving the statement of Theorem 1.3 with any c1/2c\geq 1/2. This state of affairs is essentially the same as for the non-symmetric case previously considered by Tao and Vu [33].

Of course, Theorem 1.3 also shows that the probability of having perMn=0\operatorname{per}M_{n}=0 is polynomially small. Before this paper no nontrivial bounds for this probability were known, except when n=2m1n=2^{m}-1 with mm\in\mathbb{N}, where for elementary number-theoretic reasons it is actually impossible to have perMn=0\operatorname{per}M_{n}=0 (see [30]111Actually, this result is part of an extensive body of research concerning permanents of (non-random) matrices with ±1\pm 1 entries; see for example [41, 6, 7, 8, 30, 42, 20, 29, 21].).

Finally, we remark that the methods used to prove Theorem 1.1 are quite robust, and analogous theorems can be proved for much more general distributions. For example, consider any fixed real probability distributions μ\mu and ν\nu, and let Mnμ,νM_{n}^{\mu,\nu} be the random symmetric matrix whose diagonal entries have distribution ν\nu and whose off-diagonal entries have distribution μ\mu (and whose entries on and above the diagonal are mutually independent). With some fairly mild assumptions on μ\mu and ν\nu it is a routine matter to prove an analogue of Proposition 1.2 for Mnμ,νM_{n}^{\mu,\nu}, and in Section 8 we sketch how to make some minor adaptations to the proof of Theorem 1.3 to obtain a version that holds for very general distributions (we only require that μ\mu has nontrivial support). In particular, one can prove an analogue of Theorem 1.1 for random symmetric Gaussian matrices such as the Gaussian Orthogonal Ensemble (GOE).

Notation. In this paper, we use the notation ={1,2,}\mathbb{N}=\{1,2,\dots\} for the positive integers. All logarithms are to base ee. For functions f:f:\mathbb{N}\to\mathbb{R} and g:>0g:\mathbb{N}\to\mathbb{R}_{>0}, we write f=o(g)f=o(g) if f(n)/g(n)0f(n)/g(n)\to 0 as nn\to\infty.

2 Proof of Proposition 1.2: the second moment of the permanent

In this section we provide the simple proof of Proposition 1.2. It will be an immediate consequence of Markov’s inequality and the following lemma.

Lemma 2.1.

𝔼[(perMn)2]nn+o(n)\mathbb{E}[(\operatorname{per}M_{n})^{2}]\leq n^{n+o(n)}.

Proof.

Write xi,jx_{i,j} for the (i,j)(i,j) entry of MnM_{n}. For a permutation πSn\pi\in S_{n}, let Xπ=i=1nxi,π(i)X_{\pi}=\prod_{i=1}^{n}x_{i,\pi(i)}, so that perMn=πXπ\operatorname{per}M_{n}=\sum_{\pi}X_{\pi}. It will not be necessary for the proof, but we remark that

𝔼Xπ={1if π consists only of 2-cycles,0otherwise.\mathbb{E}X_{\pi}=\begin{cases}1&\text{if }\pi\text{ consists only of 2-cycles,}\\ 0&\text{otherwise}.\end{cases}

Furthermore, let Iπ{1,,n}I_{\pi}\subseteq\{1,\dots,n\} be the set of indices which appear in 2-cycles of π\pi, and let FπF_{\pi} be the family of sets {i,π(i)}\{i,\pi(i)\}, for iIπi\notin I_{\pi}. Then for two permutations π,πSn\pi,\pi^{\prime}\in S_{n}, we have

𝔼[XπXπ]={1if (Iπ,Fπ)=(Iπ,Fπ),0otherwise.\mathbb{E}[X_{\pi}X_{\pi^{\prime}}]=\begin{cases}1&\text{if }(I_{\pi},F_{\pi})=(I_{\pi^{\prime}},F_{\pi^{\prime}}),\\ 0&\text{otherwise}.\end{cases}

For k=0,,nk=0,\dots,n, let 𝒬k\mathcal{Q}_{k} be the set of all permutations πSn\pi\in S_{n} satisfying |Iπ|=k|I_{\pi}|=k, and note that |𝒬k|(nk)kk/2(nk)!2nkk/2nnk|\mathcal{Q}_{k}|\leq\binom{n}{k}k^{k/2}(n-k)!\leq 2^{n}k^{k/2}n^{n-k}. Indeed, for any choice of a set I{1,,n}I\subseteq\{1,\dots,n\} of size kk, there are at most kk/2k^{k/2} ways to partition II into 2-cycles, and at most (nk)!(n-k)! ways to choose a permutation of {1,,n}I\{1,\dots,n\}\setminus I.

Now, for any 0kn0\leq k\leq n and any π𝒬k\pi\in\mathcal{Q}_{k}, there are at most 2nkkk/22^{n-k}k^{k/2} choices of π\pi^{\prime} satisfying (Iπ,Fπ)=(Iπ,Fπ)(I_{\pi},F_{\pi})=(I_{\pi^{\prime}},F_{\pi^{\prime}}). Indeed, for such π\pi^{\prime}, the restriction of π\pi^{\prime} to IπI_{\pi} must be a permutation of IπI_{\pi} consisting only of 2-cycles (so there are at most kk/2k^{k/2} ways to choose this restriction), and for each iIπi\notin I_{\pi} we must have π(i){π(i),π1(i)}\pi^{\prime}(i)\in\{\pi(i),\pi^{-1}(i)\}. It follows that

𝔼[(perMn)2]=π,πSn𝔼XπXπk=0n|𝒬k|2nkkk/2k=0n4nkknnknn+o(n),\mathbb{E}[(\operatorname{per}M_{n})^{2}]=\sum_{\pi,\pi^{\prime}\in S_{n}}\mathbb{E}X_{\pi}X_{\pi^{\prime}}\leq\sum_{k=0}^{n}|\mathcal{Q}_{k}|\cdot 2^{n-k}k^{k/2}\leq\sum_{k=0}^{n}4^{n}k^{k}n^{n-k}\leq n^{n+o(n)},

as claimed. ∎

Proof of Proposition 1.2.

Let ε>0\varepsilon>0, and suppose that nn is sufficiently large such that the o(n)o(n)-term in Lemma 2.1 is at most εn\varepsilon n. Then we have 𝔼[(perMn)2]nn+εn\mathbb{E}[(\operatorname{per}M_{n})^{2}]\leq n^{n+\varepsilon n} and consequently by Markov’s inequality

Pr(|perMn|nn/2+εn)=Pr((perMn)2nn+2εn)𝔼[(perMn)2]nn+2εnnεn,\Pr\mathopen{}\mathclose{{}\left(|\operatorname{per}M_{n}|\geq n^{n/2+\varepsilon n}}\right)=\Pr\mathopen{}\mathclose{{}\left((\operatorname{per}M_{n})^{2}\geq n^{n+2\varepsilon n}}\right)\leq\frac{\mathbb{E}[(\operatorname{per}M_{n})^{2}]}{n^{n+2\varepsilon n}}\leq n^{-\varepsilon n},

as desired. ∎

3 Structure of the proof of Theorem 1.3

The rest of this paper is devoted to the proof of Theorem 1.3. In this section we outline the high-level strategy of the proof, stating two key lemmas and deducing Theorem 1.3 from them.

We couple the distributions of the matrices MnM_{n} for all nn\in\mathbb{N} by viewing each MnM_{n} as containing the first nn rows and columns of an infinite random symmetric matrix (with Rademacher entries). Say that subsets of a given ground set are complement-disjoint if their complements are disjoint. For A,B{1,,n}A,B\subseteq\{1,\dots,n\}, let Mn[A,B]M_{n}[A,B] be the submatrix of MnM_{n} consisting of the rows in AA and the columns in BB. For λ0\lambda\geq 0, we say that a matrix is λ\lambda-heavy if its permanent has absolute value at least λ\lambda.

The following lemma shows that with high probability there exists a heavy submatrix of MnM_{n} consisting of almost all the rows and columns of MnM_{n}, and moreover we have some control over which rows and columns are not included. Roughly speaking, it is proved by studying how permanents of submatrices evolve in the sequence of random matrices M1,,MnM_{1},\dots,M_{n}.

Lemma 3.1.

There is a positive constant c>0c>0 such that for any ε>0\varepsilon>0 the following holds. Let nn\in\mathbb{N} be sufficiently large with respect to ε\varepsilon, and let L=(logn)/10L=\lfloor(\log n)/10\rfloor. Let XX and YY be disjoint subsets of {1,,n}\{1,\dots,n\} with sizes |X|=L|X|=L and |Y|=3L|Y|=3L. Then with probability at least 1(1/4)nc1-(1/4)\cdot n^{-c} there is a set BB satisfying |B|=nL|B|=n-L and {1,,n}YB{1,,n}\{1,\dots,n\}\setminus Y\subseteq B\subseteq\{1,\dots,n\}, such that Mn[{1,,n}X,B]M_{n}[\{1,\dots,n\}\setminus X,B] is n(1ε)n/2n^{(1-\varepsilon)n/2}-heavy.

By applying Lemma 3.1 with various different choices of XX and YY, we can obtain many heavy submatrices Mn[A1,B1],,Mn[Am,Bm]M_{n}[A_{1},B_{1}],\dots,M_{n}[A_{m},B_{m}] such that the sets A1,,Am,B1,,BmA_{1},\dots,A_{m},B_{1},\dots,B_{m} are complement-disjoint. The next lemma states that in such a situation, if we sample Mn+1M_{n+1} by adding a random row and column to MnM_{n}, then a large proportion of our submatrices can be transformed into larger submatrices without losing much of their heaviness. We will apply this lemma repeatedly, each step decreasing by 1 the number of rows and columns that our submatrices are missing.

Lemma 3.2.

Let mm\in\mathbb{N} be sufficiently large. Let λ>0\lambda>0, let 1L<n1\leq L<n be integers, and let A1,,Am,B1,,BmA_{1},\dots,A_{m},B_{1},\dots,B_{m} be complement-disjoint subsets of {1,,n}\{1,\dots,n\} of size nLn-L. Let us condition on an outcome of MnM_{n} such that all the submatrices Mn[A,B]M_{n}[A_{\ell},B_{\ell}], for =1,,m\ell=1,\dots,m, are λ\lambda-heavy.

Then, with probability at least 1m1/241-m^{-1/24}, for m=m/36m^{\prime}=\mathopen{}\mathclose{{}\left\lceil m/36}\right\rceil there are complement-disjoint subsets A1,,Am,B1,,Bm{1,,n+1}A_{1}^{\prime},\dots,A_{m^{\prime}}^{\prime},B_{1}^{\prime},\dots,B_{m^{\prime}}^{\prime}\subseteq\{1,\dots,n+1\} of size nL+2n-L+2, such that for all =1,,m\ell=1,\dots,m^{\prime} the submatrices Mn+1[A,B]M_{n+1}[A_{\ell}^{\prime},B_{\ell}^{\prime}] are λ/(4n4)\lambda/(4n^{4})-heavy.

We now show how to deduce Theorem 1.3 from Lemmas 3.1 and 3.2.

Proof of Theorem 1.3.

Choose an absolute constant 0<c<1/500<c<1/50, such that the statement in Lemma 3.1 is satisfied. Fix ε>0\varepsilon>0.

Let L=(logn)/10L=\lfloor(\log n)/10\rfloor and m=(nL)/(4L)m=\lfloor(n-L)/(4L)\rfloor, and consider disjoint sets X1,,Xm,Y1,,Ym{1,,nL}X_{1},\dots,X_{m},Y_{1},\dots,Y_{m}\subseteq\{1,\dots,n-L\} with |X1|==|Xm|=L|X_{1}|=\dots=|X_{m}|=L and |Y1|==|Ym|=3L|Y_{1}|=\dots=|Y_{m}|=3L.

For each =1,,m\ell=1,\dots,m, we apply Lemma 3.1 to the subsets X,Y{1,,nL}X_{\ell},Y_{\ell}\subseteq\{1,\dots,n-L\}. Each application fails with probability at most nc/4n^{-c}/4, so it follows from Markov’s inequality (see for example Lemma 4.8) that with probability at least 1(1/2)nc1-(1/2)\cdot n^{-c}, at least m/2m/2 applications succeed. That is to say, with m=m/2m^{\prime}=\mathopen{}\mathclose{{}\left\lceil m/2}\right\rceil and λ=(nL)(1ε)(nL)/2\lambda=(n-L)^{(1-\varepsilon)(n-L)/2}, we obtain complement-disjoint sets A1,,Am,B1,,Bm{1,,nL}A_{1},\dots,A_{m^{\prime}},B_{1},\dots,B_{m^{\prime}}\subseteq\{1,\dots,n-L\} of size n2Ln-2L such that for all =1,,m\ell=1,\dots,m^{\prime}, the matrices MnL[A,B]M_{n-L}[A_{\ell},B_{\ell}] are λ\lambda-heavy. Note that if nn is sufficiently large with respect to ε\varepsilon, then mn9/10m^{\prime}\geq n^{9/10} and λnn/2(3/4)εn\lambda\geq n^{n/2-(3/4)\varepsilon n}.

Now, we wish to iteratively apply Lemma 3.2, LL times in total. After each application, mm^{\prime} decreases by a factor of 36<e436<e^{4}, so after L=(logn)/10L=\lfloor(\log n)/10\rfloor steps the value of mm^{\prime} will still be at least n\sqrt{n}. Each of the LL applications of Lemma 3.2 succeeds with probability at least 1n1/481-n^{-1/48}. Thus, with probability at least 1Ln1/481(1/2)nc1-L\cdot n^{-1/48}\geq 1-(1/2)\cdot n^{-c} (for sufficiently large nn) we can indeed apply the lemma LL times. In the end we obtain subsets A,B{1,,n}A^{\prime},B^{\prime}\subseteq\{1,\dots,n\} of size nn such that the matrix Mn[A,B]M_{n}[A^{\prime},B^{\prime}] is λ\lambda^{\prime}-heavy, where

λ=λ4(nL)44(nL+1)44(n1)4λ(4n4)Lλn5Lnn/2(3/4)εnlogn/2nn/2εn\lambda^{\prime}=\frac{\lambda}{4(n-L)^{4}\cdot 4(n-L+1)^{4}\dotsm 4(n-1)^{4}}\geq\frac{\lambda}{(4n^{4})^{L}}\geq\frac{\lambda}{n^{5L}}\geq n^{n/2-(3/4)\varepsilon n-\log n/2}\geq n^{n/2-\varepsilon n}

(again assuming that nn is sufficiently large with respect to ε\varepsilon). But note that we must have A=B={1,,n}A^{\prime}=B^{\prime}=\{1,\dots,n\}, so this means that MnM_{n} itself is λ\lambda^{\prime}-heavy. In summary, with probability at least 1nc1-n^{-c} we have |perMn|λnn/2εn|\operatorname{per}M_{n}|\geq\lambda^{\prime}\geq n^{n/2-\varepsilon n}, as desired. ∎

We remark that the overall structure of our proof is similar to the work of Tao and Vu [33] on permanents of (not necessarily symmetric) random matrices. Indeed, Tao and Vu’s proof can also be broken up into two parts analogous to Lemmas 3.1 and 3.2. However, in Tao and Vu’s setting, all entries of the random matrix are independent, allowing them to expose the entries row by row. After exposing kk rows, they consider k×kk\times k submatrices that consist of all the kk exposed rows (and of kk of the nn columns). When exposing the kk-th row, the permanent of any such k×kk\times k submatrix can be described as a linear polynomial in some of the entries of the new row, where the coefficients are given by the permanents of certain (k1)×(k1)(k-1)\times(k-1) submatrices in the first k1k-1 rows. In contrast, in our setting with the random symmetric matrix MnM_{n}, we are forced to expose the entries of our matrix in a different way: at the kk-th step we reveal the entries in MkM_{k} that are not present in Mk1M_{k-1} (that is, we add a new random row and column, with equal entries, to the matrix considered so far222This type of exposure is standard in the study of symmetric random matrices (see for example [10]).). Since there is only one k×kk\times k submatrix in MkM_{k} (namely MkM_{k} itself), in our setting we also need to consider the permanents of (substantially) smaller submatrices of MkM_{k}.

This more intricate strategy introduces significant challenges. Most notably, the permanents of the submatrices of MkM_{k} are described by quadratic polynomials in the new matrix entries, where the coefficients depend on the permanents of certain submatrices of Mk1M_{k-1} (this is in contrast to Tao and Vu’s setting, where the permanents are described by linear polynomials in the entries of the new row). This necessitates the use of some more sophisticated probabilistic tools. Furthermore, there can be certain types of cancellations within these quadratic polynomials, which are not possible for the linear polynomials in the Tao–Vu setting. For example, even if all submatrices of Mk1M_{k-1} have non-zero permanent, it can happen that the polynomial describing the permanent of some submatrix of MkM_{k} has only very few nonzero coefficients. Handling these types of cancellations requires key new ideas.

Organization of the rest of the paper. Lemma 3.1 will be proved in Section 6, and Lemma 3.2 will be proved in Section 7. As preparation, in Section 4 we collect some probabilistic tools that we will use in the proofs, and in Section 5 we collect some lemmas that can be obtained by studying permanent expansion formulas.

4 Probabilistic tools

This section collects some theorems and simple facts that will be needed for proving Lemmas 3.1 and 3.2. We start with some basic anti-concentration estimates for linear forms. The first of these is the famous Erdős–Littlewood–Offord inequality (see for example [35, Corollary 7.8]).

Theorem 4.1.

Let t1t\geq 1 be a real number, and let ff be a linear polynomial in nn variables, in which at least mm degree-11 coefficients have absolute value at least rr. Then for uniformly random 𝛏{1,1}n\boldsymbol{\xi}\in\{-1,1\}^{n} we have

Pr(|f(𝝃)|tr)(t+1)(mm/2)2m3tm.\Pr\mathopen{}\mathclose{{}\left(|f(\boldsymbol{\xi})|\leq t\cdot r}\right)\leq(\lceil t\rceil+1)\cdot\binom{m}{\lfloor m/2\rfloor}\cdot 2^{-m}\leq\frac{3t}{\sqrt{m}}.

We will also need the following very easy fact.

Fact 4.2.

Let ff be a linear polynomial in nn variables, which has at least one coefficient with absolute value at least rr. Then for uniformly random 𝛏{1,1}n\boldsymbol{\xi}\in\{-1,1\}^{n} we have

Pr(|f(𝝃)|<r)12.\Pr\mathopen{}\mathclose{{}\left(|f(\boldsymbol{\xi})|<r}\right)\leq\frac{1}{2}.
Proof.

First, suppose that the constant coefficient of ff has absolute value at least rr, and suppose without loss of generality that f(𝝃)=a1ξ1++anξn+cf(\boldsymbol{\xi})=a_{1}\xi_{1}+\dots+a_{n}\xi_{n}+c with crc\geq r. Then, by symmetry we have that Pr(a1ξ1++anξn<0)1/2\Pr(a_{1}\xi_{1}+\dots+a_{n}\xi_{n}<0)\leq 1/2 and therefore Pr(f(𝝃)<r)1/2\Pr(f(\boldsymbol{\xi})<r)\leq 1/2.

Otherwise, for some i{1,,n}i\in\{1,\dots,n\}, the coefficient of ξi\xi_{i} in f(𝝃)f(\boldsymbol{\xi}) has absolute value at least rr, and suppose without loss of generality that i=1i=1. Condition on any outcomes of the variables ξ2,,ξn\xi_{2},\dots,\xi_{n}, and observe that then we can have |f(𝝃)|<r|f(\boldsymbol{\xi})|<r for at most one of the two possible outcomes of ξ1\xi_{1}. ∎

We will also need counterparts of both the above statements for quadratic polynomials. The quadratic counterpart of Fact 4.2 is again easy to prove.

Fact 4.3.

Let ff be a quadratic polynomial in nn variables, which has at least one multilinear degree-2 coefficient with absolute value at least rr. Then for uniformly random 𝛏{1,1}n\boldsymbol{\xi}\in\{-1,1\}^{n} we have

Pr(|f(𝝃)|<r)34.\Pr(|f(\boldsymbol{\xi})|<r)\leq\frac{3}{4}.
Proof.

We may assume that ff is multilinear (every term of the form ξi2\xi_{i}^{2} can be replaced by the constant 11 without changing the behaviour of f(𝝃)f(\boldsymbol{\xi})). Suppose without loss of generality that the coefficient a12a_{12} of ξ1ξ2\xi_{1}\xi_{2} satisfies |a12|r|a_{12}|\geq r, and write f(ξ1,,ξn)=ξ1(a12ξ2+g(ξ3,,ξn))+h(ξ2,,ξn)f(\xi_{1},\dots,\xi_{n})=\xi_{1}\cdot(a_{12}\xi_{2}+g(\xi_{3},\dots,\xi_{n}))+h(\xi_{2},\dots,\xi_{n}). Conditioning on any outcomes of ξ3,,ξn\xi_{3},\dots,\xi_{n}, with probability at least 1/21/2 we have |a12ξ2+g(ξ3,,ξn)|r|a_{12}\xi_{2}+g(\xi_{3},\dots,\xi_{n})|\geq r. Then, conditioning on such an outcome of ξ2\xi_{2}, we have |f(𝝃)|r|f(\boldsymbol{\xi})|\geq r with probability at least 1/21/2. ∎

It is more delicate to generalise the Erdős–Littlewood–Offord inequality to quadratic polynomials. For a multilinear quadratic polynomial ff in the variables x1,,xnx_{1},\dots,x_{n} and for a real number r>0r>0, let G(r)(f)G^{(r)}(f) be the graph with vertex set {1,,n}\{1,\dots,n\} having an edge ijij whenever the coefficient of xixjx_{i}x_{j} in ff has absolute value at least rr. Let ν(G)\nu(G) be the matching number333The matching number of a graph GG is the largest number ν\nu such that one can find ν\nu disjoint edges in GG. of a graph GG. The following is a special case of a theorem proved by Meka, Nguyen and Vu [23, Theorem 1.6].

Theorem 4.4.

Let r>0r>0, let ff be a multilinear quadratic polynomial in nn variables, and let ν=ν(G(r)(f))3\nu=\nu(G^{(r)}(f))\geq 3. Then for uniformly random 𝛏{1,1}n\boldsymbol{\xi}\in\{-1,1\}^{n} we have

Pr(|f(𝝃)|r)(logν)Cν1/2,\Pr(|f(\boldsymbol{\xi})|\leq r)\leq\frac{(\log\nu)^{C}}{\nu^{1/2}},

where CC is an absolute constant.

The following concentration inequality is a special case of the Azuma–Hoeffding martingale concentration inequality, and is sometimes known as McDiarmid’s inequality (see for example [35, Lemma 1.34]).

Lemma 4.5.

Let c>0c>0 and let XX be a random variable defined in terms of independent random variables ξ1,,ξn\xi_{1},\dots,\xi_{n}, having the property that varying any individual ξi\xi_{i} affects the value of XX by at most cc. Then for any t0t\geq 0 we have

Pr(|X𝔼X|t)2et2/(2nc2).\Pr\mathopen{}\mathclose{{}\left(|X-\mathbb{E}X|\geq t}\right)\leq 2e^{-t^{2}/(2nc^{2})}.

The next inequality is a one-sided version of the Azuma–Hoeffding inequality for supermartingales (see [33, Lemma 2.3]).

Lemma 4.6.

Let c>0c>0. In a probability space, let Z1,,ZnZ_{1},\dots,Z_{n} be a sequence of random objects, and let W1,,WnW_{1},\dots,W_{n} be a sequence of random variables, such that for each kk, all of Z1,,Zk,W1,,WkZ_{1},\dots,Z_{k},W_{1},\dots,W_{k} are fully determined by ZkZ_{k}, and such that |Wk+1Wk|c|W_{k+1}-W_{k}|\leq c for all k=1,,n1k=1,\dots,n-1. Suppose that the supermartingale property 𝔼[Wk+1|Zk]Wk\mathbb{E}[W_{k+1}|Z_{k}]\leq W_{k} is satisfied for k=1,,n1k=1,\dots,n-1. Then for any t>0t>0 we have

Pr(WnW1t)et2/(2nc2).\Pr\mathopen{}\mathclose{{}\left(W_{n}-W_{1}\geq t}\right)\leq e^{-t^{2}/(2nc^{2})}.

Recall that for 0<p<10<p<1, a Bernoulli random variable χBer(p)\chi\sim\operatorname{Ber}(p) is a random variable taking values 0 and 11 with Pr(χ=1)=p\Pr(\chi=1)=p and Pr(χ=0)=1p\Pr(\chi=0)=1-p. The following lemma is a version of the Chernoff concentration bound for sums of Bernoulli random variables (see for example [3, Theorem A.1.4]).

Lemma 4.7.

Let χ1,,χm\chi_{1},\dots,\chi_{m} be independent Bernoulli random variables, where for each i=1,,mi=1,\dots,m we have χiBer(pi)\chi_{i}\sim\operatorname{Ber}(p_{i}) for some 0<pi<10<p_{i}<1. Then for any t>0t>0 the sum X=χ1++χmX=\chi_{1}+\dots+\chi_{m} satisfies

Pr(X𝔼X>t)<e2t2/nandPr(X𝔼X<t)<e2t2/n.\Pr\mathopen{}\mathclose{{}\left(X-\mathbb{E}X>t}\right)<e^{-2t^{2}/n}\quad\text{and}\quad\Pr\mathopen{}\mathclose{{}\left(X-\mathbb{E}X<-t}\right)<e^{-2t^{2}/n}.

Sometimes we will encounter random variables that stochastically dominate a sum of Bernoulli random variables (we say that a random variable XX stochastically dominates another random variable YY if there is a coupling of XX and YY such that we always have XYX\geq Y). For example, consider a random process with nn steps where each step satisfies a certain property with probability at least 1/21/2, even when conditioning on any outcome of the previous steps. Then the number XX of steps with this property stochastically dominates a sum of nn independent Ber(1/2)\operatorname{Ber}(1/2) random variables. Denoting this sum by YY, we therefore have Pr(X(n/2)<t)Pr(Y(n/2)<t)<e2t2/n\Pr\mathopen{}\mathclose{{}\left(X-(n/2)<-t}\right)\leq\Pr\mathopen{}\mathclose{{}\left(Y-(n/2)<-t}\right)<e^{-2t^{2}/n} for any t>0t>0 by Lemma 4.7. Hence we can use Lemma 4.7 to show that a random variable is very likely reasonably large if it stochastically dominates a sum of Bernoulli random variables.

Finally, the following lemma is an easy consequence of Markov’s inequality (see, for example, [33, Lemma 2.1]).

Lemma 4.8.

Let 1>p>q>01>p>q>0, and let E1,,EmE_{1},\dots,E_{m} be events (not necessarily independent), each of which occurs with probability at least pp. Then the probability that at least qmqm of the events E1,,EmE_{1},\dots,E_{m} occur simultaneously is at least (pq)/(1q)(p-q)/(1-q).

5 Permanent expansion formulas

Just as for the determinant, it is possible to expand the permanent of a matrix in terms of permanents of submatrices. Below we record two such expansions, which we will use in the proofs of Lemmas 3.1 and 3.2.

Fact 5.1.

Let MM be an n×nn\times n matrix. Add a new row (x1,,xn)(x_{1},\dots,x_{n}) to obtain an (n+1)×n(n+1)\times n matrix MM^{\prime}. Then for any subsets A,B{1,,n}A,B\subseteq\{1,\dots,n\} with |B|=|A|+1|B|=|A|+1, we have

perM[A{n+1},B]=iBxiperM[A,B{i}].\operatorname{per}M^{\prime}[A\cup\{n+1\},B]=\sum_{i\in B}x_{i}\operatorname{per}M[A,B\setminus\{i\}].

For a matrix MM, let M(i,j)M^{(i,j)} be the submatrix of MM obtained by removing row ii and column jj.

Fact 5.2.

Let MM be an n×nn\times n matrix. Add a new row (x1,,xn,z)(x_{1},\dots,x_{n},z) and a new column (y1,,yn,z)(y_{1},\dots,y_{n},z) to obtain an (n+1)×(n+1)(n+1)\times(n+1) matrix MM^{\prime}. Then for any subsets A,B{1,,n}A,B\subseteq\{1,\dots,n\} with |A|=|B||A|=|B|, we have

perM[A{n+1},B{n+1}]=zperM[A,B]+iA,jBxjyiperM[A,B](i,j).\operatorname{per}M^{\prime}[A\cup\{n+1\},B\cup\{n+1\}]=z\operatorname{per}M[A,B]+\sum_{i\in A,j\in B}x_{j}y_{i}\operatorname{per}M[A,B]^{(i,j)}.

We will use Fact 5.1 in combination with the linear anti-concentration inequalities in Fact 4.2 and Theorem 4.1, and we will use Fact 5.2 in combination with the quadratic anti-concentration inequalities in Fact 4.3 and Theorem 4.4

Observe in particular that the formula in Fact 5.2 gives an expression for the permanent of a (k+1)×(k+1)(k+1)\times(k+1) matrix in terms of permanents of (k1)×(k1)(k-1)\times(k-1) submatrices (and one k×kk\times k submatrix). This means that, for example, when we add a new row and column to a matrix, the size of the largest submatrix with nonzero permanent can increase by two. This observation will be crucial for the proof of Lemma 3.2.

In the proof of Lemma 3.2, we will apply Fact 5.2 with the symmetric matrices M=MnM=M_{n} and M=Mn+1M^{\prime}=M_{n+1}. That is to say, (x1,,xn,z)=(y1,,yn,z)(x_{1},\dots,x_{n},z)=(y_{1},\dots,y_{n},z), so the formula in Fact 5.2 can be interpreted as a quadratic polynomial in x1,,xnx_{1},\dots,x_{n} (after conditioning on the value of zz). In order to apply Fact 4.3 to this polynomial, we need this polynomial to have a multilinear degree-2 coefficient with large absolute value. For this, it suffices that perM[A,B](i,j)+perM[A,B](j,i)\operatorname{per}M[A,B]^{(i,j)}+\operatorname{per}M[A,B]^{(j,i)} has large absolute value for some iji\neq j (with i,jABi,j\in A\cap B). The following lemma will be useful for ensuring this condition.

Lemma 5.3.

Let MM be an n×nn\times n matrix and let A,B{1,,n}A,B\subseteq\{1,\dots,n\} be subsets with |A|=|B||A|=|B| such that M[A,B]M[A,B] is λ\lambda-heavy. Suppose we are given an element aBAa\in B\setminus A and distinct elements b1,b2ABb_{1},b_{2}\in A\setminus B. Then there are distinct i,j{a,b1,b2}i,j\in\{a,b_{1},b_{2}\} such that

|perM[A,B](i,j)+perM[A,B](j,i)|λ/2,\mathopen{}\mathclose{{}\left|\operatorname{per}M[A^{\prime},B^{\prime}]^{(i,j)}+\operatorname{per}M[A^{\prime},B^{\prime}]^{(j,i)}}\right|\geq\lambda/2,

where A=A{a}A^{\prime}=A\cup\{a\} and B=(B{a}){i,j}B^{\prime}=(B\setminus\{a\})\cup\{i,j\}.

Proof.

Suppose without loss of generality that perM[A,B]λ\operatorname{per}M[A,B]\geq\lambda. If we have

perM[(A{bs}){a},(B{a}){bs}]λ/2\operatorname{per}M[(A\setminus\{b_{s}\})\cup\{a\},(B\setminus\{a\})\cup\{b_{s}\}]\geq-\lambda/2

for some s{1,2}s\in\{1,2\}, then we can take i=bsi=b_{s} and j=aj=a. Indeed, then we have A=A{a}A^{\prime}=A\cup\{a\} and B=B{bs}B^{\prime}=B\cup\{b_{s}\}, and obtain perM[A,B](i,j)+perM[A,B](j,i)(λ/2)+λ=λ/2\operatorname{per}M[A^{\prime},B^{\prime}]^{(i,j)}+\operatorname{per}M[A^{\prime},B^{\prime}]^{(j,i)}\geq(-\lambda/2)+\lambda=\lambda/2.

Otherwise, if there is no such s{1,2}s\in\{1,2\}, we can take i=b1i=b_{1} and j=b2j=b_{2}. Then we have A=A{a}A^{\prime}=A\cup\{a\} and B=(B{a}){b1,b2}B^{\prime}=(B\setminus\{a\})\cup\{b_{1},b_{2}\}, and obtain perM[A,B](i,j)+perM[A,B](j,i)<(λ/2)+(λ/2)=λ\operatorname{per}M[A^{\prime},B^{\prime}]^{(i,j)}+\operatorname{per}M[A^{\prime},B^{\prime}]^{(j,i)}<(-\lambda/2)+(-\lambda/2)=-\lambda. ∎

We end this section with two simple lemmas that illustrate how to apply Facts 5.1, 5.2 and 5.3 to “grow” heavy minors in a random symmetric matrix. Recall that MnM_{n} is a random symmetric matrix, and that Mn1M_{n-1} contains the first n1n-1 rows and columns of MnM_{n}.

Lemma 5.4.

Consider A,B{1,,n1}A,B\subseteq\{1,\dots,n-1\} with |A|=|B||A|=|B|, and fix any nonempty I{1,,n1}BI\subseteq\{1,\dots,n-1\}\setminus B. Consider any outcome MM of Mn1M_{n-1} such that M[A,B]M[A,B] is λ\lambda-heavy, for some λ>0\lambda>0. Then

Pr(Mn[A{n},B{i}] is λ-heavy for some iI|Mn1=M)12|I|.\Pr\mathopen{}\mathclose{{}\left(\text{$M_{n}[A\cup\{n\},B\cup\{i\}]$ is $\lambda$-heavy for some $i\in I$}\,\big{|}\,M_{n-1}=M}\right)\geq 1-2^{-|I|}.
Proof.

We condition on Mn1=MM_{n-1}=M. Let x1,,xnx_{1},\dots,x_{n} be the entries in the last row of MnM_{n}. Let us also condition on any outcome of the variables xbx_{b} for bBb\in B. Now, by Fact 5.1, for each iIi\in I we have

perMn[A{n},B{i}]=xiperMn1[A,B]+bBxbperMn1[A,(B{i}){b}].\operatorname{per}M_{n}[A\cup\{n\},B\cup\{i\}]=x_{i}\operatorname{per}M_{n-1}[A,B]+\sum_{b\in B}x_{b}\operatorname{per}M_{n-1}[A,(B\cup\{i\})\setminus\{b\}].

Since |perMn1[A,B]|λ|\operatorname{per}M_{n-1}[A,B]|\geq\lambda, each iIi\in I satisfies the desired condition |perMn[A{n},B{i}]|λ|\operatorname{per}M_{n}[A\cup\{n\},B\cup\{i\}]|\geq\lambda with probability at least 1/21/2, and (by our conditioning on the variables xbx_{b} for bBb\in B) this happens independently for all iIi\in I. ∎

Lemma 5.5.

Consider A,B{1,,n1}A,B\subseteq\{1,\dots,n-1\} with |A|=|B||A|=|B|, and consider an outcome MM of Mn1M_{n-1} such that M[A,B]M[A,B] is λ\lambda-heavy, for some λ>0\lambda>0. Then for any aBAa\in B\setminus A and any distinct b1,b2ABb_{1},b_{2}\in A\setminus B, we can choose distinct i,j{a,b1,b2}i,j\in\{a,b_{1},b_{2}\} such that

Pr(Mn[A{a,n},(B{a}){i,j,n}] is (λ/2)-heavy|Mn1=M)14.\Pr\mathopen{}\mathclose{{}\left(M_{n}[A\cup\{a,n\},(B\setminus\{a\})\cup\{i,j,n\}]\text{ is }(\lambda/2)\text{-heavy}\,\big{|}\,M_{n-1}=M}\right)\geq\frac{1}{4}.
Proof.

Let us condition on Mn1=MM_{n-1}=M. By Lemma 5.3, we can choose distinct i,j{a,b1,b2}i,j\in\{a,b_{1},b_{2}\} such that

|perMn1[A,B](i,j)+perMn1[A,B](j,i)|λ/2,\mathopen{}\mathclose{{}\left|\operatorname{per}M_{n-1}[A^{\prime},B^{\prime}]^{(i,j)}+\operatorname{per}M_{n-1}[A^{\prime},B^{\prime}]^{(j,i)}}\right|\geq\lambda/2, (5.1)

where A=A{a}A^{\prime}=A\cup\{a\} and B=(B{a}){i,j}B^{\prime}=(B\setminus\{a\})\cup\{i,j\}. Note that {i,j}{a,b1,b2}A\{i,j\}\subseteq\{a,b_{1},b_{2}\}\subseteq A^{\prime} and that clearly {i,j}B\{i,j\}\subseteq B^{\prime}.

Now, perMn[A{a,n},(B{a}){i,j,n}]=perMn[A{n},B{n}]\operatorname{per}M_{n}[A\cup\{a,n\},(B\setminus\{a\})\cup\{i,j,n\}]=\operatorname{per}M_{n}[A^{\prime}\cup\{n\},B^{\prime}\cup\{n\}], so it suffices to show that with probability at least 1/41/4 we have |perMn[A{n},B{n}]|λ/2|\operatorname{per}M_{n}[A^{\prime}\cup\{n\},B^{\prime}\cup\{n\}]|\geq\lambda/2.

Let (x1,,xn1,z)(x_{1},\dots,x_{n-1},z) be the random entries of the last row (and the last column) of MnM_{n}. By Fact 5.2, we have

perMn[A{n},B{n}]=zperMn1[A,B]+kA,BxkxperMn1[A,B](k,).\operatorname{per}M_{n}[A^{\prime}\cup\{n\},B^{\prime}\cup\{n\}]=z\operatorname{per}M_{n-1}[A^{\prime},B^{\prime}]+\sum_{k\in A^{\prime},\ell\in B^{\prime}}x_{k}x_{\ell}\operatorname{per}M_{n-1}[A^{\prime},B^{\prime}]^{(k,\ell)}.

Note that this is a quadratic polynomial in the variables x1,,xn1,zx_{1},\dots,x_{n-1},z, and the coefficient of xixjx_{i}x_{j} is precisely perMn1[A,B](i,j)+perMn1[A,B](j,i)\operatorname{per}M_{n-1}[A^{\prime},B^{\prime}]^{(i,j)}+\operatorname{per}M_{n-1}[A^{\prime},B^{\prime}]^{(j,i)}. Recalling iji\neq j and Equation 5.1, Fact 4.3 now implies that Pr(|perMn[A{n},B{n}]|<λ/2)3/4\Pr(|\operatorname{per}M_{n}[A^{\prime}\cup\{n\},B^{\prime}\cup\{n\}]|<\lambda/2)\leq 3/4. This finishes the proof of Lemma 5.5. ∎

6 Proof of Lemma 3.1: growing a single heavy submatrix

In this section we prove Lemma 3.1. Recall that L=(logn)/10L=\lfloor(\log n)/10\rfloor and that X,Y{1,,n}X,Y\subseteq\{1,\dots,n\} are disjoint subsets with |X|=L|X|=L and |Y|=3L|Y|=3L. By reordering the rows and columns, we can assume without loss of generality that X={1,,L}X=\{1,\dots,L\} and Y={n3L+1,,n}Y=\{n-3L+1,\dots,n\}.

Lemma 3.1 will be a consequence of the following two lemmas. The first of these lemmas is itself a weaker version of Lemma 3.1 (it also produces a heavy submatrix, but with less control over where it lies, and not with dimensions as close to n×nn\times n).

Lemma 6.1.

For any fixed 0<δ<1/160<\delta<1/16, the following holds for all integers nn\in\mathbb{N} that are sufficiently large with respect to δ\delta. Let λ=n(1/28δ)n\lambda=n^{(1/2-8\delta)n} and suppose that RR\in\mathbb{N} satisfies δnR2δn\delta n\leq R\leq 2\delta n. Then with probability at least 1eδ2n1-e^{-\delta^{2}n} there is a subset B{1,,n}B\subseteq\{1,\dots,n\} of size nRn-R such that Mn[{R+1,,n},B]M_{n}[\{R+1,\dots,n\},B] is λ\lambda-heavy.

To prove Lemma 6.1 we adapt an argument in Tao and Vu’s work [33], simultaneously tracking the propagation and growth of many heavy submatrices.

Our second lemma takes a heavy submatrix of a certain form with dimensions reasonably close to n×nn\times n, and produces a slightly less heavy submatrix with dimensions much closer to n×nn\times n, whose row and column sets satisfy the desired conditions in Lemma 3.1 (recall that we are assuming that X={1,,L}X=\{1,\dots,L\} and Y={n3L+1,,n}Y=\{n-3L+1,\dots,n\}). To be more precise, we actually start with a submatrix contained inside the n×nn^{\prime}\times n^{\prime} matrix MnM_{n^{\prime}} for some nn^{\prime} slightly smaller than nn, and, conditioning on the outcome of MnM_{n^{\prime}}, we only use the randomness from the additional rows and columns exposed when extending MnM_{n^{\prime}} to MnM_{n}

Lemma 6.2.

There is an absolute constant c>0c>0 such that the following holds for all sufficiently large integers nn\in\mathbb{N}. Consider λ>0\lambda>0 and integers LL and RR satisfying (logn)/20<L<L2<R<(n5L23L)/9(\log n)/20<L<L^{2}<R<(n-5L^{2}-3L)/9, and let n=n8R5L23Ln^{\prime}=n-8R-5L^{2}-3L and λ=λ/2RL\lambda^{\prime}=\lambda/2^{R-L}. Condition on an outcome of MnM_{n^{\prime}} for which there is a subset B{1,,n}B\subseteq\{1,\dots,n^{\prime}\} of size nRn^{\prime}-R such that Mn[{R+1,,n},B]M_{n^{\prime}}[\{R+1,\dots,n^{\prime}\},B] is λ\lambda-heavy. Then with probability at least 1(1/8)nc1-(1/8)\cdot n^{-c}, there is a set BB^{\prime} of size nLn-L with {1,,n3L}B{1,,n}\{1,\dots,n-3L\}\subseteq B^{\prime}\subseteq\{1,\dots,n\}, such that Mn[{L+1,,n},B]M_{n}[\{L+1,\dots,n\},B^{\prime}] is λ\lambda^{\prime}-heavy.

It is now easy to deduce Lemma 3.1 from the two lemmas above.

Proof of Lemma 3.1.

Let c>0c>0 be the constant in Lemma 6.2. Recall that we are considering some ε>0\varepsilon>0 and that we are assuming that nn is sufficiently large with respect to ε\varepsilon. Then in particular L=(logn)/10>(logn)/20L=\lfloor(\log n)/10\rfloor>(\log n)/20. As mentioned at the beginning of this section, we may assume that X={1,,L}X=\{1,\dots,L\} and Y={n3L+1,,n}Y=\{n-3L+1,\dots,n\}.

Let δ=ε/32\delta=\varepsilon/32, and R=δnR=\lceil\delta n\rceil, and note that by our assumption that nn is large with respect to ε\varepsilon we have L<L2<R<(n5L23L)/9L<L^{2}<R<(n-5L^{2}-3L)/9. Now let n=n8R5L23L(19δ)nn^{\prime}=n-8R-5L^{2}-3L\geq(1-9\delta)n and λ=(n)(1/28δ)nn(1/215δ)n\lambda=(n^{\prime})^{(1/2-8\delta)n^{\prime}}\geq n^{(1/2-15\delta)n} (again recalling that we assume nn to be large with respect to ε\varepsilon). Note that then δnR2δn\delta n^{\prime}\leq R\leq 2\delta n^{\prime}.

By Lemma 6.1, with probability at least 1eδ2n1(1/8)nc1-e^{-\delta^{2}n^{\prime}}\geq 1-(1/8)\cdot n^{-c} (for nn sufficiently large with respect to ε\varepsilon) there is a subset B{1,,n}B\subseteq\{1,\dots,n^{\prime}\} of size nRn^{\prime}-R such that Mn[{R+1,,n},B]M_{n^{\prime}}[\{R+1,\dots,n^{\prime}\},B] is λ\lambda-heavy. Then by Lemma 6.2 and our choice of cc, with probability at least 1(1/8)nc1-(1/8)\cdot n^{-c} there is a set BB^{\prime} of size nLn-L with {1,,n3L}B{1,,n}\{1,\dots,n-3L\}\subseteq B^{\prime}\subseteq\{1,\dots,n\} such that Mn[{L+1,,n},B]M_{n}[\{L+1,\dots,n\},B^{\prime}] is λ\lambda^{\prime}-heavy, where λ=λ/2RLn(1/215δ)n/2δnn(1/216δ)n=n(1ε)n/2\lambda^{\prime}=\lambda/2^{R-L}\geq n^{(1/2-15\delta)n}/2^{\delta n}\geq n^{(1/2-16\delta)n}=n^{(1-\varepsilon)n/2}. Thus, the total probability that such a set BB^{\prime} exists is at least 1(1/4)nc1-(1/4)\cdot n^{-c}, as desired. ∎

Lemma 6.1 will be proved in Subsection 6.1 and Lemma 6.2 will be proved in Subsection 6.2.

6.1 Proof of Lemma 6.1: propagation of heavy submatrices

In this subsection we prove Lemma 6.1, adapting an argument from [33] to simultaneously track the propagation and growth of heavy submatrices as we expose more rows and columns of our random matrix. Roughly speaking, at each step we track submatrices of a certain form (with dimension growing by 1 at each step). At a given step, if we are guaranteed that many of our submatrices under consideration are heavy, then it is extremely likely that at the next step there will also be reasonably many heavy submatrices of the desired form. Moreover, depending on the structure of our random matrix at this step, one of the following is true, which will likely improve our situation in one of two ways. Either we have a good chance to dramatically increase the number of heavy submatrices in the next step, or we have a (very) good chance to have many submatrices in the next step which are much heavier than before. Lemma 6.3 below makes this precise.

After having proved Lemma 6.3, we will deduce Lemma 6.1 by iteratively applying Lemma 6.3, adding a new row and a new column to our random matrix at every step. Most likely, there will be many steps where our situation improves in one of the two ways described above. However, there is an upper bound for the number of heavy submatrices that we can have at the end of the process (simply by counting the total number of submatrices of the form that we consider). Hence the first type of improvement, which significantly increases the number of heavy submatrices, cannot occur too many times. So, among the two ways we can “improve the situation”, the second type of improvement must happen most of the time. This means that during our process we get submatrices that are more and more heavy, and at the end we find a reasonably large number of very heavy submatrices in our final matrix MnM_{n} (in fact, we only need one such very heavy submatrix).

Lemma 6.3.

Fix R,nR,n\in\mathbb{N}. For kk\in\mathbb{N} and real numbers N>0N>0 and λ>0\lambda>0, let E(k,N,λ)E(k,N,\lambda) denote the event that there are at least NN different subsets B{1,,k+R}B\subseteq\{1,\dots,k+R\} with |B|=k|B|=k such that the matrix Mk+R[{R+1,,k+R},B]M_{k+R}[\{R+1,\dots,k+R\},B] is λ\lambda-heavy.

Then for any kk\in\mathbb{N} with k+Rnk+R\leq n, and any real numbers 0<δ<1/20<\delta<1/2 as well as K>1K>1, λ>0\lambda>0 and N>0N>0, there is a partition E(k,N,λ)=E(k,N,λ)E′′(k,N,λ)E(k,N,\lambda)=E^{\prime}(k,N,\lambda)\cup E^{\prime\prime}(k,N,\lambda) of the event E(k,N,λ)E(k,N,\lambda) such that the following holds. Let N+=RN/(8K)N^{+}=RN/(8K), N=RN/(8n)N^{-}=RN/(8n) and λ+=K1/2δλ\lambda^{+}=K^{1/2-\delta}\lambda, and let M,M,M′′M,M^{\prime},M^{\prime\prime} be any possible outcomes of Mk+RM_{k+R} satisfying E(k,N,λ)E(k,N,\lambda), E(k,N,λ)E^{\prime}(k,N,\lambda) and E′′(k,N,λ)E^{\prime\prime}(k,N,\lambda) respectively. Then

Pr(E(k+1,N,λ)|Mk+R=M)\displaystyle\Pr\mathopen{}\mathclose{{}\left(E(k+1,N^{-},\lambda)\,\middle|\,M_{k+R}=M}\right) 12eR/8.\displaystyle\geq 1-2e^{-R/8}. (6.1)
Pr(E(k+1,N+,λ)|Mk+R=M)\displaystyle\Pr\mathopen{}\mathclose{{}\left(E(k+1,N^{+},\lambda)\,\middle|\,M_{k+R}=M^{\prime}}\right) 1/3.\displaystyle\geq 1/3. (6.2)
Pr(E(k+1,N,λ+)|Mk+R=M′′)\displaystyle\Pr\mathopen{}\mathclose{{}\left(E(k+1,N^{-},\lambda^{+})\,\middle|\,M_{k+R}=M^{\prime\prime}}\right) 14Kδ.\displaystyle\geq 1-4K^{-\delta}. (6.3)
Proof.

We may assume without loss of generality that N>0N>0 is an integer (indeed, otherwise we can replace NN by N\mathopen{}\mathclose{{}\left\lceil N}\right\rceil, noting that the statement for N\mathopen{}\mathclose{{}\left\lceil N}\right\rceil implies the statement for NN).

Let x1,,xk+R+1x_{1},\dots,x_{k+R+1} be the entries in the last row of Mk+R+1M_{k+R+1}. For subsets BB{1,,k+R}B\subseteq B^{\prime}\subseteq\{1,\dots,k+R\} with sizes kk and k+1k+1 respectively, we say that BB is a parent of BB^{\prime} and that BB^{\prime} is a child of BB. For a subset B{1,,k+R}B^{\prime}\subseteq\{1,\dots,k+R\} of size k+1k+1, note that by Fact 5.1 we have

perMk+R+1[{R+1,,k+R+1},B]=BperMk+R[{R+1,,k+R},B]xBB,\operatorname{per}M_{k+R+1}[\{R+1,\dots,k+R+1\},B^{\prime}]=\sum_{B}\operatorname{per}M_{k+R}[\{R+1,\dots,k+R\},B]\cdot x_{B^{\prime}\setminus B}, (6.4)

where the sum is over all parents BB of BB^{\prime} (here, with slight abuse of notation we write x{i}x_{\{i\}} instead of xix_{i} for i{1,,k+R}i\in\{1,\dots,k+R\}).

For each outcome of Mk+RM_{k+R} such that E(k,N,λ)E(k,N,\lambda) holds, let us fix subsets B1,,BNB_{1},\dots,B_{N} as in the definition of E(k,N,λ)E(k,N,\lambda). Note that we always have |perMk+R[{R+1,,k+R},Bi]|λ|\operatorname{per}M_{k+R}[\{R+1,\dots,k+R\},B_{i}]|\geq\lambda for i=1,,Ni=1,\dots,N.

Furthermore, for each outcome of Mk+RM_{k+R} satisfying E(k,N,λ)E(k,N,\lambda), let SqS_{q} denote the collection of subsets of {1,,k+R}\{1,\dots,k+R\} of size k+1k+1 which have exactly qq parents among the sets B1,,BNB_{1},\dots,B_{N}, and let S=S1SnS=S_{1}\cup\dots\cup S_{n} be the collection of all such subsets which have at least one parent among B1,,BNB_{1},\dots,B_{N}. Furthermore, let SK=SKSnS_{\geq K}=S_{\lceil K\rceil}\cup\dots\cup S_{n} be the collection of all such subsets which have at least KK parents among B1,,BNB_{1},\dots,B_{N}. We say that BSB^{\prime}\in S is λ\lambda^{\prime}-heavy for some λ>0\lambda^{\prime}>0 if Mk+R+1[{R+1,,k+R+1},B]M_{k+R+1}[\{R+1,\dots,k+R+1\},B^{\prime}] is λ\lambda^{\prime}-heavy.

Since each of the sets B1,,BNB_{1},\dots,B_{N} is a parent of exactly RR different sets BSB^{\prime}\in S, a double-counting argument shows that we have

q=1nq|Sq|=RN\sum_{q=1}^{n}q|S_{q}|=RN

for each outcome of Mk+RM_{k+R} such that E(k,N,λ)E(k,N,\lambda) holds.

Now, let E(k,N,λ)E(k,N,λ)E^{\prime}(k,N,\lambda)\subseteq E(k,N,\lambda) be the event that q<Kq|Sq|RN/2\sum_{q<K}q|S_{q}|\geq RN/2, and condition on any outcome MM^{\prime} of Mk+RM_{k+R} satisfying E(k,N,λ)E^{\prime}(k,N,\lambda). Note that we then have |S|q<K|Sq|>RN/(2K)|S|\geq\sum_{q<K}|S_{q}|>RN/(2K). Furthermore note that for each BSB^{\prime}\in S, at least one of the terms perMk+R[{R+1,,k+R},B]\operatorname{per}M_{k+R}[\{R+1,\dots,k+R\},B] on the left-hand side of Equation 6.4 has absolute value at least λ\lambda (since BB^{\prime} has at least one parent among B1,,BNB_{1},\dots,B_{N}). Hence, by Fact 4.2, each BSB^{\prime}\in S is λ\lambda-heavy with probability at least 1/21/2, and Equation 6.2 follows from Markov’s inequality (to be precise, it follows from Lemma 4.8 applied with p=1/2p=1/2 and q=1/4q=1/4).

On the other hand, let E′′(k,N,λ)=E(k,N,λ)E(k,N,λ)E^{\prime\prime}(k,N,\lambda)=E(k,N,\lambda)\setminus E^{\prime}(k,N,\lambda) be the complementary event to E(k,N,λ)E^{\prime}(k,N,\lambda) within E(k,N,λ)E(k,N,\lambda), i.e. the event that qKq|Sq|>RN/2\sum_{q\geq K}q|S_{q}|>RN/2. Condition on any outcome M′′M^{\prime\prime} of Mk+RM_{k+R} satisfying E′′(k,N,λ)E^{\prime\prime}(k,N,\lambda), and note that then |SK|RN/(2n)|S_{\geq K}|\geq RN/(2n). Also note that for each BSKB^{\prime}\in S_{\geq K}, at least KK of the terms perMk+R[{R+1,,k+R},B]\operatorname{per}M_{k+R}[\{R+1,\dots,k+R\},B] on the left-hand side of Equation 6.4 have absolute value at least λ\lambda. Hence, by the Erdős–Littlewood–Offord inequality (specifically, Theorem 4.1, applied with m=Km=K, r=λr=\lambda and t=K1/2δt=K^{1/2-\delta}), each BSKB^{\prime}\in S_{\geq K} is λ+\lambda^{+}-heavy with probability at least 13Kδ1-3K^{-\delta}. Then, Equation 6.3 follows from Markov’s inequality (specifically, we apply Lemma 4.8 with p=13Kδp=1-3K^{-\delta} and q=1/4q=1/4).

Finally, to prove Equation 6.1, let us condition on any outcome MM of Mk+RM_{k+R} satisfying E(k,N,λ)E(k,N,\lambda). Say that for i=1,,Ni=1,\dots,N, the set BiB_{i} is good if at least R/4R/4 of its RR children BSB^{\prime}\in S are λ\lambda-heavy. We claim that each BiB_{i} is good with probability at least 1eR/81-e^{-R/8}. Indeed, consider some fixed i{1,,N}i\in\{1,\dots,N\}, and condition on any outcome of the variables xbx_{b} for bBib\in B_{i}. Now for each child BSB^{\prime}\in S of BiB_{i}, the sum in Equation 6.4 depends only on the outcome of xBBix_{B^{\prime}\setminus B_{i}} (since for all other elements of bBb\in B^{\prime} the corresponding variable xbx_{b} has already been fixed). Since |perMk+R[{R+1,,k+R},Bi]|λ|\operatorname{per}M_{k+R}[\{R+1,\dots,k+R\},B_{i}]|\geq\lambda, each child BSB^{\prime}\in S of BiB_{i} is λ\lambda-heavy with probability at least 1/21/2, independently for all children BSB^{\prime}\in S. So, by the Chernoff bound (Lemma 4.7), the set BiB_{i} is indeed good with probability at least 1e2(R/4)2/R=1eR/81-e^{-2(R/4)^{2}/R}=1-e^{-R/8}, as claimed.

Now, by Markov’s inequality (specifically, Lemma 4.8, applied with p=1eR/8p=1-e^{-R/8} and q=1/2q=1/2), with probability at least 12eR/81-2e^{-R/8} at least N/2N/2 of the sets B1,,BNB_{1},\dots,B_{N} are good. Whenever this is the case, there are at least (N/2)(R/4)/n=RN/(8n)(N/2)\cdot(R/4)/n=RN/(8n) different λ\lambda-heavy sets BSB^{\prime}\in S (since each such set BSB^{\prime}\in S is a child of at most k+1nk+1\leq n different sets BiB_{i}). This proves Equation 6.1. ∎

Now we deduce Lemma 6.1.

Proof of Lemma 6.1.

As in the lemma statement, let 0<δ<1/160<\delta<1/16 and assume that nn\in\mathbb{N} is sufficiently large with respect to δ\delta (sufficiently large to satisfy certain inequalities later in the proof). Let RR\in\mathbb{N} be an integer satisfying δnR2δn\delta n\leq R\leq 2\delta n, and let K=n1δK=n^{1-\delta}. Furthermore, recall the notation from the statement of Lemma 6.3. We define random sequences N1,,NnRN_{1},\dots,N_{n-R} and λ1,,λnR\lambda_{1},\dots,\lambda_{n-R} of positive real numbers by an iterative process. Let N1=λ1=1N_{1}=\lambda_{1}=1 and for each 1knR11\leq k\leq n-R-1 define Nk+1N_{k+1} and λk+1\lambda_{k+1} as follows:

  • (i)

    if E(k,Nk,λk)E^{\prime}(k,N_{k},\lambda_{k}) and E(k+1,Nk+,λk)E(k+1,N_{k}^{+},\lambda_{k}) both hold, then let Nk+1=Nk+N_{k+1}=N_{k}^{+} and λk+1=λk\lambda_{k+1}=\lambda_{k};

  • (ii)

    if E′′(k,Nk,λk)E^{\prime\prime}(k,N_{k},\lambda_{k}) and E(k+1,Nk,λk+)E(k+1,N_{k}^{-},\lambda_{k}^{+}) both hold, then let Nk+1=NkN_{k+1}=N_{k}^{-} and λk+1=λk+\lambda_{k+1}=\lambda_{k}^{+};

  • (iii)

    if neither (i) nor (ii) holds, but E(k,Nk,λk)E(k,N_{k},\lambda_{k}) and E(k+1,Nk,λk)E(k+1,N_{k}^{-},\lambda_{k}) both hold, then let Nk+1=NkN_{k+1}=N_{k}^{-} and λk+1=λk\lambda_{k+1}=\lambda_{k};

  • (iv)

    otherwise, abort (and then our sequences are not well-defined).

Note that the event E(1,N1,λ1)E(1,N_{1},\lambda_{1}) always holds. If we do not abort at any point in the above process, then E(k,Nk,λk)E(k,N_{k},\lambda_{k}) holds for each kk, and in particular there is a subset B{1,,n}B\subseteq\{1,\dots,n\} of size |B|=nR|B|=n-R such that Mn[{R+1,,n},B]M_{n}[\{R+1,\dots,n\},B] is λnR\lambda_{n-R}-heavy. Thus, in order for the desired event in Lemma 6.1 to hold, it is sufficient that the process does not abort and that λnRn(1/28δ)n\lambda_{n-R}\geq n^{(1/2-8\delta)n}. We will show that this happens with probability at least 1eδ2n1-e^{-\delta^{2}n}.

The main observation is that case (i) cannot occur too many times, simply because it is not possible for NkN_{k} to ever be larger than 2n2^{n}. Roughly speaking, it will follow from this observation and Equation 6.2 that E(k,Nk,λk)E^{\prime}(k,N_{k},\lambda_{k}) is unlikely to occur too many times. This will then imply that case (ii) is likely to occur many times, meaning that λnR\lambda_{n-R} is large.

Claim 6.4.

Case (i) in the above process occurs for fewer than δn\delta n different values of kk.

Proof.

Note that whenever (i) holds, we have Nk+1/Nk=Nk+/Nk=R/(8K)δn/8n1δ=nδ/8N_{k+1}/N_{k}=N_{k}^{+}/N_{k}=R/(8K)\geq\delta n/8n^{1-\delta}=n^{\delta}/8. On the other hand, whenever (ii) or (iii) holds, we have Nk+1/Nk=Nk/Nk=R/(8n)δn/8n=δ/8N_{k+1}/N_{k}=N_{k}^{-}/N_{k}=R/(8n)\geq\delta n/8n=\delta/8. Now suppose for the purpose of contradiction that (i) holds for at least δn\delta n different kk, and let us define m=k+1m=k+1 for the last such value kk. Note that then we have

Nm(nδ/8)δn(δ/8)mδnnδ2n(δ/8)mnδ2n(δ/8)n>2nN_{m}\geq(n^{\delta}/8)^{\delta n}\cdot(\delta/8)^{m-\delta n}\geq n^{\delta^{2}n}\cdot(\delta/8)^{m}\geq n^{\delta^{2}n}\cdot(\delta/8)^{n}>2^{n}

for sufficiently large nn. On the other hand, by our choice of mm, case (i) holds for k=m1k=m-1, and so in particular the event E(k+1,Nk+,λk)=E(m,Nm,λm)E(k+1,N_{k}^{+},\lambda_{k})=E(m,N_{m},\lambda_{m}) holds. This means that there are at least Nm>2nN_{m}>2^{n} different subsets B{1,,m+R}{1,,n}B\subseteq\{1,\dots,m+R\}\subseteq\{1,\dots,n\} satisfying the conditions in the definition of the event E(m,Nm,λm)E(m,N_{m},\lambda_{m}). But this is clearly a contradiction, since the total number of subsets of {1,,n}\{1,\dots,n\} is only 2n2^{n}. ∎

The next observation is that if case (ii) occurs many times, then we are done.

Claim 6.5.

If we do not abort at any point, and case (ii) occurs for at least n12δnn-12\delta n different values of kk, then λnRn(1/28δ)n\lambda_{n-R}\geq n^{(1/2-8\delta)n}.

Proof.

Whenever (ii) holds, we have λk+1/λk=λk+/λk=K1/2δ\lambda_{k+1}/\lambda_{k}=\lambda_{k}^{+}/\lambda_{k}=K^{1/2-\delta}. On the other hand, whenever (i) or (iii) holds, we have λk+1=λk\lambda_{k+1}=\lambda_{k}. So, if case (ii) occurs for at least n12δnn-12\delta n values of kk, then

λnR(K1/2δ)n12δnn(1δ)(1/2δ)(n12δn)n(1/22δ)(n12δn)n(1/28δ)n.\lambda_{n-R}\geq(K^{1/2-\delta})^{n-12\delta n}\geq n^{(1-\delta)\cdot(1/2-\delta)\cdot(n-12\delta n)}\geq n^{(1/2-2\delta)\cdot(n-12\delta n)}\geq n^{(1/2-8\delta)n}.

It now suffices to show that with probability at least 1eδ2n1-e^{-\delta^{2}n}, we do not abort and case (ii) occurs at least n12δnn-12\delta n times. To this end, we define an auxiliary random process W1,,WnRW_{1},\dots,W_{n-R} that evolves in parallel with N1,,NnRN_{1},\dots,N_{n-R} and λ1,,λnR\lambda_{1},\dots,\lambda_{n-R}. Namely, let W1=0W_{1}=0, and for 1knR11\leq k\leq n-R-1 let

Wk+1=Wk+(1δ){3in case (i),1in case (ii),0in case (iii) or (iv).W_{k+1}=W_{k}+(1-\delta)-\begin{cases}3&\text{in case (i),}\\ 1&\text{in case (ii),}\\ 0&\text{in case (iii) or (iv).}\end{cases}

Furthermore, if case (iv) occurs then let Wk+2=Wk+3==WnRW_{k+2}=W_{k+3}=\dots=W_{n-R} all be equal to the value of Wk+1W_{k+1} just defined (that is to say, we “freeze” the value of WkW_{k} after the process aborts).

Note that W1,,WkW_{1},\dots,W_{k} are fully determined by the random matrix Mk+RM_{k+R} (which also determines its submatrices MR+1,,Mk+RM_{R+1},\dots,M_{k+R}). Moreover, this defines a supermartingale, in the sense that 𝔼[Wk+1|Mk+R]Wk\mathbb{E}[W_{k+1}|M_{k+R}]\leq W_{k} for each kk (provided nn is sufficiently large). To see this, consider any outcome of MkM_{k} for which we have not yet aborted (meaning in particular that the event E(k,Nk,λk)=E(k,Nk,λk)E′′(k,Nk,λk)E(k,N_{k},\lambda_{k})=E^{\prime}(k,N_{k},\lambda_{k})\cup E^{\prime\prime}(k,N_{k},\lambda_{k}) holds). If E(k,Nk,λk)E^{\prime}(k,N_{k},\lambda_{k}) holds, then 𝔼[Wk+1Wk|Mk](1δ)(1/3)3δ\mathbb{E}[W_{k+1}-W_{k}|M_{k}]\leq(1-\delta)-(1/3)\cdot 3\leq-\delta by Equation 6.2. On the other hand, if E′′(k,Nk,λk)E^{\prime\prime}(k,N_{k},\lambda_{k}) holds, then 𝔼[Wk+1Wk|Mk](1δ)(14Kδ)=δ+4Kδ0\mathbb{E}[W_{k+1}-W_{k}|M_{k}]\leq(1-\delta)-(1-4K^{-\delta})=-\delta+4K^{-\delta}\leq 0 for sufficiently large nn, by Equation 6.3. In addition, observe that |WiWi1|3|W_{i}-W_{i-1}|\leq 3 for each 1<inR1<i\leq n-R.

By Lemma 4.6 (with Zk=Mk+RZ_{k}=M_{k+R} for k=1,,nRk=1,\dots,n-R, and c=3c=3) we have WnR5δnW_{n-R}\leq 5\delta n with probability at least 1e(25/18)δ2n1(1/2)eδ2n1-e^{-(25/18)\delta^{2}n}\geq 1-(1/2)e^{-\delta^{2}n}. Also, by Equation 6.1 and the union bound, the probability that we ever abort is bounded by (nR)2eR/8n2eδn/8(1/2)eδ2n(n-R)\cdot 2e^{-R/8}\leq n\cdot 2e^{-\delta n/8}\leq(1/2)e^{-\delta^{2}n}. But note that if we never abort, then

WnR=(nR1)(1δ)3X(i)X(ii),W_{n-R}=(n-R-1)(1-\delta)-3X_{\text{(i)}}-X_{\text{(ii)}},

where X(i)X_{\text{(i)}} is the number of times that case (i) occurs, and X(ii)X_{\text{(ii)}} is the number of times that case (ii) occurs. Recall that X(i)δnX_{\text{(i)}}\leq\delta n by Claim 6.4. Hence, if WnR5δnW_{n-R}\leq 5\delta n and the process does not abort, then case (ii) occurs X(ii)(nR1)(1δ)3δn5δnn12δnX_{\text{(ii)}}\geq(n-R-1)(1-\delta)-3\delta n-5\delta n\geq n-12\delta n times, which by Claim 6.4 implies that λnRn(1/28δ)n\lambda_{n-R}\geq n^{(1/2-8\delta)n}. Thus, we have indeed shown that with probability at least 1eδ2n1-e^{-\delta^{2}n} the process does not abort and we have λnRn(1/28δ)n\lambda_{n-R}\geq n^{(1/2-8\delta)n}. ∎

6.2 Proof of Lemma 6.2: “filling out” a single heavy submatrix

In this subsection we prove Lemma 6.2. It will be a consequence of the following two lemmas, which (in two slightly different ways) “grow” a heavy submatrix by exposing a few additional rows and columns.

Lemma 6.6.

Let 1S<n1\leq S<n and λ>0\lambda>0, and condition on an outcome of MnM_{n} for which there is a subset B{1,,n}B\subseteq\{1,\dots,n\} of size nSn-S such that Mn[{S+1,,n},B]M_{n}[\{S+1,\dots,n\},B] is λ\lambda-heavy. Then with probability at least 13S2SeS/61-3S\cdot 2^{-S}-e^{-S/6}, there is a set BB^{\prime} of size n+2Sn+2S with {1,,n}B{1,,n+3S}\{1,\dots,n\}\subseteq B^{\prime}\subseteq\{1,\dots,n+3S\} such that the matrix Mn+3S[{S+1,,n+3S},B]M_{n+3S}[\{S+1,\dots,n+3S\},B^{\prime}] is λ\lambda-heavy.

Lemma 6.7.

Let 2T<S<n2\leq T<S<n and λ>0\lambda>0, and condition on an outcome of MnM_{n} for which there is a set BB of size nSn-S with {1,,S}B{1,,n}\{1,\dots,S\}\subseteq B\subseteq\{1,\dots,n\} such that Mn[{S+1,,n},B]M_{n}[\{S+1,\dots,n\},B] is λ\lambda-heavy. Then with probability at least 15S2TeS/401-5S\cdot 2^{-T}-e^{-S/40}, there is a set BB^{\prime} of size n+5STn+5S-T with {1,,T}B{1,,n+5S}\{1,\dots,T\}\subseteq B\subseteq\{1,\dots,n+5S\} such that Mn+5S[{T+1,,n+5S},B]M_{n+5S}[\{T+1,\dots,n+5S\},B^{\prime}] is λ/2ST\lambda/2^{S-T}-heavy.

Before proving Lemmas 6.6 and 6.7, we deduce Lemma 6.2.

Proof of Lemma 6.2.

Recall that n=n8R5L23Ln^{\prime}=n-8R-5L^{2}-3L, and that we are conditioning on an outcome of MnM_{n^{\prime}} for which there is a subset B{1,,n}B\subseteq\{1,\dots,n^{\prime}\} of size nRn^{\prime}-R such that Mn[{R+1,,n},B]M_{n^{\prime}}[\{R+1,\dots,n^{\prime}\},B] is λ\lambda-heavy.

First, by Lemma 6.6 (applied with S=R<nS=R<n^{\prime}), with probability at least 13R2ReR/61-3R\cdot 2^{-R}-e^{-R/6}, there is a set B1B_{1} of size n+2Rn^{\prime}+2R with {1,,R}{1,,n}B{1,,n+3R}\{1,\dots,R\}\subseteq\{1,\dots,n^{\prime}\}\subseteq B^{\prime}\subseteq\{1,\dots,n^{\prime}+3R\} such that Mn+3R[{R+1,,n+3R},B1]M_{n^{\prime}+3R}[\{R+1,\dots,n^{\prime}+3R\},B_{1}] is λ\lambda-heavy. Let us now condition on such an outcome for Mn+3RM_{n^{\prime}+3R}.

Then, by Lemma 6.7 (applied with T=L2T=L^{2} and S=R<n+3RS=R<n^{\prime}+3R), we obtain that with probability at least 15R2L2eR/401-5R\cdot 2^{-L^{2}}-e^{-R/40} there is a set B2B_{2} of size n+8RL2n^{\prime}+8R-L^{2} with {1,,L2}B2{1,,n+8R}\{1,\dots,L^{2}\}\subseteq B_{2}\subseteq\{1,\dots,n^{\prime}+8R\} such that that the matrix Mn+8R[{L2+1,,n+8R},B2]M_{n^{\prime}+8R}[\{L^{2}+1,\dots,n^{\prime}+8R\},B_{2}] is (λ/2RL2)(\lambda/2^{R-L^{2}})-heavy. Let us condition on such an outcome for Mn+8RM_{n^{\prime}+8R}.

Applying Lemma 6.7 again (this time with T=LT=L and S=L2<n+8RS=L^{2}<n^{\prime}+8R), we now get that with probability at least 15L22LeL2/401-5L^{2}\cdot 2^{-L}-e^{-L^{2}/40}, there is a set B3B_{3} of size n+8R+5L2Ln^{\prime}+8R+5L^{2}-L with {1,,L}B3{1,,n+8R+5L2}\{1,\dots,L\}\subseteq B_{3}\subseteq\{1,\dots,n^{\prime}+8R+5L^{2}\} such that that Mn+8R+5L2[{L+1,,n+8R+5L2},B3]M_{n^{\prime}+8R+5L^{2}}[\{L+1,\dots,n^{\prime}+8R+5L^{2}\},B_{3}] is λ\lambda^{\prime}-heavy, where λ=(λ/2RL2)/2L2L=λ/2RL\lambda^{\prime}=(\lambda/2^{R-L^{2}})/2^{L^{2}-L}=\lambda/2^{R-L}. Let us condition on such an outcome for Mn+8R+5L2M_{n^{\prime}+8R+5L^{2}}

Finally, by Lemma 6.6 (applied with S=L<n+8R+5L2S=L<n^{\prime}+8R+5L^{2}), with probability at least 13L2LeL/61-3L\cdot 2^{-L}-e^{-L/6}, there is a set BB^{\prime} of size n+8R+5L2+2L=nLn^{\prime}+8R+5L^{2}+2L=n-L with

{1,,n3L}={1,,n+8R+5L2}B{1,,n+8R+5L2+3L}={1,,n}\{1,\dots,n-3L\}=\{1,\dots,n^{\prime}+8R+5L^{2}\}\subseteq B^{\prime}\subseteq\{1,\dots,n^{\prime}+8R+5L^{2}+3L\}=\{1,\dots,n\}

such that Mn[{L+1,,n},B]M_{n}[\{L+1,\dots,n\},B^{\prime}] is λ\lambda^{\prime}-heavy.

The probability that all four steps succeed is at least

1(3R2R+eR/6)(5R2L2+eR/40)(5L22L+eL2/40)(3L2L+eL/6)1(1/8)nc1-(3R\cdot 2^{-R}+e^{-R/6})-(5R\cdot 2^{-L^{2}}+e^{-R/40})-(5L^{2}\cdot 2^{-L}+e^{-L^{2}/40})-(3L\cdot 2^{-L}+e^{-L/6})\geq 1-(1/8)\cdot n^{-c}

for some small constant c>0c>0 (recall that (logn)/20<L<L2<R<n(\log n)/20<L<L^{2}<R<n and that nn is sufficiently large). ∎

We now prove Lemma 6.6.

Proof of Lemma 6.6.

For any mm\in\mathbb{N} with nmn+3Sn\leq m\leq n+3S, define the random variable QmQ_{m} to be the minimum value of |{1,,n}B||\{1,\dots,n\}\setminus B^{\prime}| among all subsets B{1,,m}B^{\prime}\subseteq\{1,\dots,m\} of size mSm-S such that Mm[{S+1,,m},B]M_{m}[\{S+1,\dots,m\},B^{\prime}] is λ\lambda-heavy. If no such subset BB^{\prime} exists, let Qm=Q_{m}=\infty.

Recall that in Lemma 6.6 we are conditioning on an outcome of MnM_{n} for which there is a subset B{1,,n}B\subseteq\{1,\dots,n\} of size nSn-S such that Mn[{S+1,,n},B]M_{n}[\{S+1,\dots,n\},B] is λ\lambda-heavy. This means that Qn|{1,,n}B|=SQ_{n}\leq|\{1,\dots,n\}\setminus B|=S.

For n<mn+3Sn<m\leq n+3S, we say that step mm is a failure if Qm>Qm1Q_{m}>Q_{m-1}. We say that step mm is progress if Qm<Qm1Q_{m}<Q_{m-1} or if Qm=Qm1=0Q_{m}=Q_{m-1}=0 or if Qm1=Q_{m-1}=\infty.

For any n<mn+3Sn<m\leq n+3S, when conditioning on any outcome of Mm1M_{m-1}, we claim that step mm is a failure with probability at most 2S2^{-S}. Indeed, if Qm1<Q_{m-1}<\infty, let B{1,,m1}B^{\prime}\subseteq\{1,\dots,m-1\} be a subset of size m1Sm-1-S with |{1,,n}B|=Qm1|\{1,\dots,n\}\setminus B^{\prime}|=Q_{m-1} such that Mm1[{S+1,,m1},B]M_{m-1}[\{S+1,\dots,m-1\},B^{\prime}] is λ\lambda-heavy. By applying Lemma 5.4 with I={1,,m1}BI=\{1,\dots,m-1\}\setminus B^{\prime}, we see that with probability at least 12S1-2^{-S} there exists some i{1,,m1}Bi\in\{1,\dots,m-1\}\setminus B^{\prime} such that Mm[{S+1,,m},B{i}]M_{m}[\{S+1,\dots,m\},B^{\prime}\cup\{i\}] is λ\lambda-heavy (which in particular implies Qm|{1,,n}(B{i})|Qm1Q_{m}\leq|\{1,\dots,n\}\setminus(B^{\prime}\cup\{i\})|\leq Q_{m-1}). On the other hand, if Qm1=Q_{m-1}=\infty, then step mm cannot be a failure. So this indeed shows that in any case (when conditioning on any outcome of Mm1M_{m-1}), step mm is a failure with probability at most 2S2^{-S}.

Furthermore, for any n<mn+3Sn<m\leq n+3S, when conditioning on any outcome of Mm1M_{m-1}, we claim that step mm is progress with probability at least 1/21/2. Indeed, if Qm1{0,}Q_{m-1}\notin\{0,\infty\}, let B{1,,m1}B^{\prime}\subseteq\{1,\dots,m-1\} be a subset of size m1Sm-1-S with |{1,,n}B|=Qm1|\{1,\dots,n\}\setminus B^{\prime}|=Q_{m-1} such that Mm1[{S+1,,m1},B]M_{m-1}[\{S+1,\dots,m-1\},B^{\prime}] is λ\lambda-heavy. We can then apply Lemma 5.4 with I={1,,n}BI=\{1,\dots,n\}\setminus B^{\prime}, to see that with probability at least 1/21/2 there exists some i{1,,n}Bi\in\{1,\dots,n\}\setminus B^{\prime} such that Mm[{S+1,,m},B{i}]M_{m}[\{S+1,\dots,m\},B^{\prime}\cup\{i\}] is λ\lambda-heavy (which in particular implies Qm|{1,,n}(B{i})|<Qm1Q_{m}\leq|\{1,\dots,n\}\setminus(B^{\prime}\cup\{i\})|<Q_{m-1}). If Qm1=Q_{m-1}=\infty, by definition step mm is always progress. If Qm1=0Q_{m-1}=0, then step mm is progress if and only if it is not failure, and we already showed that it is failure with probability at most 2S1/22^{-S}\leq 1/2. This shows that in any case (when conditioning on any outcome of Mm1M_{m-1}), step mm is progress with probability at least 1/21/2.

Hence the number of progress steps among the 3S3S steps m{n+1,,n+3S}m\in\{n+1,\dots,n+3S\} stochastically dominates a sum of 3S3S independent Ber(1/2)\operatorname{Ber}(1/2) random variables. By the Chernoff bound (Lemma 4.7) such a sum is at least SS with probability at least 1e2(S/2)2/(3S)=1eS/61-e^{-2(S/2)^{2}/(3S)}=1-e^{-S/6}. Thus, the number of progress steps is also at least SS with probability at least 1eS/61-e^{-S/6}. Furthermore, note that by the union bound, with probability at least 13S2S1-3S\cdot 2^{-S} none of the 3S3S steps m{n+1,,n+3S}m\in\{n+1,\dots,n+3S\} is a failure.

If there are no failures and at least SS progress steps, then we must have Qn+3S=0Q_{n+3S}=0. Hence, with probability at least 13S2SeS/61-3S\cdot 2^{-S}-e^{-S/6} there exists a subset B{1,,n+3S}B^{\prime}\subseteq\{1,\dots,n+3S\} of size n+2Sn+2S such that Mn+3S[{S+1,,n+3S},B]M_{n+3S}[\{S+1,\dots,n+3S\},B^{\prime}] is λ\lambda-heavy and |{1,,n}B|=0|\{1,\dots,n\}\setminus B^{\prime}|=0 (meaning that {1,,n}B\{1,\dots,n\}\subseteq B^{\prime}). ∎

The proof of Lemma 6.7 follows a similar strategy as the proof of Lemma 6.6 above.

Proof of Lemma 6.7.

For any mm\in\mathbb{N} with nmn+5Sn\leq m\leq n+5S, define the random variable QmQ_{m} to be the minimal number Q{T,T+1,T+2,}Q\in\{T,T+1,T+2,\dots\} such that there is a set BB^{\prime} of size mQm-Q with {1,,Q}B{1,,m}\{1,\dots,Q\}\subseteq B^{\prime}\subseteq\{1,\dots,m\} such that the matrix Mm[{Q+1,,m},B]M_{m}[\{Q+1,\dots,m\},B^{\prime}] is (λ/2SQ)(\lambda/2^{S-Q})-heavy. If there is no Q{T,T+1,T+2,}Q\in\{T,T+1,T+2,\dots\} for which such a set BB^{\prime} exists, we define Qm=Q_{m}=\infty.

Recall that in Lemma 6.7, we are conditioning on an outcome of MnM_{n} for which there is a set BB of size nSn-S with {1,,S}B{1,,n}\{1,\dots,S\}\subseteq B\subseteq\{1,\dots,n\} such that that Mn[{S+1,,n},B]M_{n}[\{S+1,\dots,n\},B] is λ\lambda-heavy. This means that QnSQ_{n}\leq S (recall that S>TS>T).

For n<mn+5Sn<m\leq n+5S, we say that step mm is a failure if Qm>Qm1Q_{m}>Q_{m-1}. We say that step mm is progress if Qm<Qm1Q_{m}<Q_{m-1} or if Qm=Qm1=TQ_{m}=Q_{m-1}=T or if Qm1=Q_{m-1}=\infty.

For any n<mn+5Sn<m\leq n+5S, when conditioning on any outcome of Mm1M_{m-1}, we claim that step mm is a failure with probability at most 2T2^{-T}. Indeed, if Qm1<Q_{m-1}<\infty, let BB^{\prime} be a set of size m1Qm1m-1-Q_{m-1} with {1,,Qm1}B{1,,m1}\{1,\dots,Q_{m-1}\}\subseteq B^{\prime}\subseteq\{1,\dots,m-1\} such that the matrix Mm1[{Qm1+1,,m1},B]M_{m-1}[\{Q_{m-1}+1,\dots,m-1\},B^{\prime}] is (λ/2SQm1)(\lambda/2^{S-Q_{m-1}})-heavy. By applying Lemma 5.4 with I={1,,m1}BI=\{1,\dots,m-1\}\setminus B^{\prime}, we obtain that with probability at least 12Qm112T1-2^{-Q_{m-1}}\geq 1-2^{-T} there exists some i{1,,m1}Bi\in\{1,\dots,m-1\}\setminus B^{\prime} such that the matrix Mm[{Qm1+1,,m},B{i}]M_{m}[\{Q_{m-1}+1,\dots,m\},B^{\prime}\cup\{i\}] is (λ/2SQm1)(\lambda/2^{S-Q_{m-1}})-heavy (which in particular implies that QmQm1Q_{m}\leq Q_{m-1}). On the other hand, if Qm1=Q_{m-1}=\infty, then step mm cannot be a failure.

We furthermore claim that for any n<mn+5Sn<m\leq n+5S, when conditioning on any outcome of Mm1M_{m-1}, step mm is progress with probability at least 1/41/4. First assume that Qm1{T,}Q_{m-1}\not\in\{T,\infty\}, and let BB^{\prime} be a set of size m1Qm1m-1-Q_{m-1} with {1,,Qm1}B{1,,m1}\{1,\dots,Q_{m-1}\}\subseteq B^{\prime}\subseteq\{1,\dots,m-1\} such that Mm1[{Qm1+1,,m1},B]M_{m-1}[\{Q_{m-1}+1,\dots,m-1\},B^{\prime}] is (λ/2SQm1)(\lambda/2^{S-Q_{m-1}})-heavy. We can then apply Lemma 5.5 to the sets {Qm1+1,,m1}\{Q_{m-1}+1,\dots,m-1\} and BB^{\prime}. Since {1,,Qm1}B\{1,\dots,Q_{m-1}\}\subseteq B^{\prime} and |B|=m1Qm1|B^{\prime}|=m-1-Q_{m-1}, we have |{Qm1+1,,m1}B|=Qm1T2|\{Q_{m-1}+1,\dots,m-1\}\setminus B^{\prime}|=Q_{m-1}\geq T\geq 2. So we can find two distinct elements b1,b2{Qm1+1,,m1}Bb_{1},b_{2}\in\{Q_{m-1}+1,\dots,m-1\}\setminus B^{\prime}. Let us furthermore take a=Qm1B{Qm1+1,,m1}a=Q_{m-1}\in B^{\prime}\setminus\{Q_{m-1}+1,\dots,m-1\}. By Lemma 5.5, there exist distinct elements i,j{a,b1,b2}i,j\in\{a,b_{1},b_{2}\} such that the set B=(B{a}){i,j,m}B^{*}=(B^{\prime}\setminus\{a\})\cup\{i,j,m\} has the property that the matrix Mm[{Qm1,,m},B]M_{m}[\{Q_{m-1},\dots,m\},B^{*}] is (λ/2SQm1+1)(\lambda/2^{S-Q_{m-1}+1})-heavy with probability at least 1/41/4. Also note that |B|=m(Qm1)|B^{*}|=m-(Q_{m}-1) and {1,,Qm11}B{1,,m}\{1,\dots,Q_{m-1}-1\}\subseteq B^{*}\subseteq\{1,\dots,m\}. So we can conclude that with probability at least 1/41/4 we have QmQm11Q_{m}\leq Q_{m-1}-1, meaning that step mm is progress.

In order to finish proving the claim that for any n<mn+5Sn<m\leq n+5S (when conditioning on any outcome of Mm1M_{m-1}), step mm is progress with probability at least 1/41/4, it only remains to consider the cases Qm1=Q_{m-1}=\infty and Qm1=TQ_{m-1}=T. If Qm1=Q_{m-1}=\infty, then step mm is always progress. If Qm1=TQ_{m-1}=T, then step mm is progress if and only if it is not failure, and we already proved that step mm is failure with probability at most 2T1/43/42^{-T}\leq 1/4\leq 3/4.

Having proved this claim, we can now conclude that the number of progress steps among the 5S5S steps m{n+1,,n+5S}m\in\{n+1,\dots,n+5S\} stochastically dominates a sum of 5S5S independent Ber(1/4)\operatorname{Ber}(1/4) random variables. By the Chernoff bound (Lemma 4.7) such a sum is at least SS with probability at least 1e2(S/4)2/(5S)=1eS/401-e^{-2(S/4)^{2}/(5S)}=1-e^{-S/40}. Thus, the number of progress steps is also at least SS with probability at least 1eS/401-e^{-S/40}. Furthermore, note that by the union bound, with probability at least 15S2T1-5S\cdot 2^{-T} none of the 5S5S steps m{n+1,,n+5S}m\in\{n+1,\dots,n+5S\} is a failure.

If there are no failures and at least SSTS\geq S-T progress steps, then we must have Qn+5S=TQ_{n+5S}=T. Hence, with probability at least 15S2TeS/401-5S\cdot 2^{-T}-e^{-S/40} there is a set BB^{\prime} of size n+5STn+5S-T with {1,,T}B{1,,n+5S}\{1,\dots,T\}\subseteq B^{\prime}\subseteq\{1,\dots,n+5S\} such that the matrix Mn+5S[{T+1,,n+5S},B]M_{n+5S}[\{T+1,\dots,n+5S\},B^{\prime}] is (λ/2ST)(\lambda/2^{S-T})-heavy. ∎

7 Proof of Lemma 3.2: survival of heavy submatrices

In this section we prove Lemma 3.2. Recall that we call subsets S1,,SmS_{1},\dots,S_{m} of some ground set SS complement-disjoint, if their complements SS1,,SSmS\setminus S_{1},\dots,S\setminus S_{m} are disjoint (and note that this condition is in particular satisfied if S1==Sm=SS_{1}=\dots=S_{m}=S).

As in the lemma statement, let λ>0\lambda>0 and let A1,,Am,B1,,BmA_{1},\dots,A_{m},B_{1},\dots,B_{m} be complement-disjoint subsets of {1,,n}\{1,\dots,n\} of size nLn-L. Recall that we are conditioning on an outcome of the matrix MnM_{n} such that we have |perMn[A,B]|λ|\operatorname{per}M_{n}[A_{\ell},B_{\ell}]|\geq\lambda for =1,,m\ell=1,\dots,m. Also recall that we are assuming that mm is large.

Let x1,,xn,zx_{1},\dots,x_{n},z be the entries of the new row and column in Mn+1M_{n+1}, and let us condition on any fixed outcome of zz (which we no longer view as being random).

First, starting from the complement-disjoint subsets A1,,Am,B1,,Bm{1,,n}A_{1},\dots,A_{m},B_{1},\dots,B_{m}\subseteq\{1,\dots,n\} of size nLn-L, we will construct certain complement-disjoint subsets A1,,Am,B1,,Bm{1,,n}A_{1}^{*},\dots,A_{m}^{*},B_{1}^{*},\dots,B_{m}^{*}\subseteq\{1,\dots,n\} of size nL+1n-L+1. The plan is then to choose the desired subsets A1,,Am,B1,,BmA_{1}^{\prime},\dots,A_{m^{\prime}}^{\prime},B_{1}^{\prime},\dots,B_{m^{\prime}}^{\prime} in Lemma 3.2 to each be of the form Ai{n+1}A_{i}^{*}\cup\{n+1\} or Bi{n+1}B_{i}^{*}\cup\{n+1\}, for suitably chosen i{1,,m}i\in\{1,\dots,m\}.

Claim 7.1.

We can find quadruples (A,B,i,j)(A^{*}_{\ell},B^{*}_{\ell},i_{\ell},j_{\ell}) for {1,,m}\ell\in\{1,\dots,m\}, satisfying the following conditions.

  • For each {1,,m}\ell\in\{1,\dots,m\}, we have A,B{1,,n}A_{\ell}^{*},B_{\ell}^{*}\subseteq\{1,\dots,n\} and |A|=|B|=nL+1|A^{*}_{\ell}|=|B^{*}_{\ell}|=n-L+1, and furthermore i,jABi_{\ell},j_{\ell}\in A_{\ell}^{*}\cap B_{\ell}^{*}.

  • The elements i1,j1,,im,jm{1,,n}i_{1},j_{1},\dots,i_{m},j_{m}\in\{1,\dots,n\} are distinct.

  • The sets A1,B1,,Am,BmA_{1}^{*},B_{1}^{*},\dots,A_{m}^{*},B_{m}^{*} are complement-disjoint (over the ground set {1,,n}\{1,\dots,n\}).

  • For each {1,,m}\ell\in\{1,\dots,m\}, if we view perMn+1[A{n+1},B{n+1}]\operatorname{per}M_{n+1}[A_{\ell}^{*}\cup\{n+1\},B_{\ell}^{*}\cup\{n+1\}] as a polynomial in x1,,xnx_{1},\dots,x_{n}, then the coefficient of xixjx_{i_{\ell}}x_{j_{\ell}} has absolute value at least λ/2\lambda/2.

Proof.

First consider the case L=1L=1. For every {1,,m}\ell\in\{1,\dots,m\}, let us take A=B={1,,n}A_{\ell}^{*}=B_{\ell}^{*}=\{1,\dots,n\}, let ii_{\ell} be the single element of {1,,n}A\{1,\dots,n\}\setminus A_{\ell}, and let jj_{\ell} be the single element of {1,,n}B\{1,\dots,n\}\setminus B_{\ell}. Note that the second condition is satisfied by the complement-disjointness of the sets A1,B1,,Am,BmA_{1},B_{1},\dots,A_{m},B_{m}. For the last condition, observe that by Fact 5.2 and the symmetry of our matrices, the coefficient of xixjx_{i_{\ell}}x_{j_{\ell}} in perMn+1\operatorname{per}M_{n+1} equals perMn[A,B]+perMn[B,A]=2perMn[A,B]\operatorname{per}M_{n}[A_{\ell},B_{\ell}]+\operatorname{per}M_{n}[B_{\ell},A_{\ell}]=2\operatorname{per}M_{n}[A_{\ell},B_{\ell}], which has absolute value at least 2λλ/22\lambda\geq\lambda/2.

Now, we consider the case L2L\geq 2. For each {1,,m}\ell\in\{1,\dots,m\}, choose aBAa_{\ell}\in B_{\ell}\setminus A_{\ell} and distinct b,bABb_{\ell},b_{\ell}^{\prime}\in A_{\ell}\setminus B_{\ell} (this is possible since AA_{\ell} and BB_{\ell} are complement-disjoint and have size at most n2n-2). Now let i,j{a,b,b}i_{\ell},j_{\ell}\in\{a_{\ell},b_{\ell},b_{\ell}^{\prime}\} be as in Lemma 5.3, and let A=A{a}A_{\ell}^{*}=A_{\ell}\cup\{a_{\ell}\} and B=(B{a}){i,j}B_{\ell}^{*}=(B_{\ell}\setminus\{a_{\ell}\})\cup\{i_{\ell},j_{\ell}\}. For the last condition, note that by Fact 5.2 the coefficient of xixjx_{i_{\ell}}x_{j_{\ell}} in perMn+1[A{n+1},B{n+1}]\operatorname{per}M_{n+1}[A_{\ell}^{*}\cup\{n+1\},B_{\ell}^{*}\cup\{n+1\}] equals perMn[A,B](i,j)+perMn[B,A](j,i)\operatorname{per}M_{n}[A_{\ell}^{*},B_{\ell}^{*}]^{(i_{\ell},j_{\ell})}+\operatorname{per}M_{n}[B_{\ell}^{*},A_{\ell}^{*}]^{(j_{\ell},i_{\ell})}, which has absolute value at least λ/2\lambda/2 by the conclusion of Lemma 5.3. ∎

Fix quadruples (A,B,i,j)(A^{*}_{\ell},B^{*}_{\ell},i_{\ell},j_{\ell}) for {1,,m}\ell\in\{1,\dots,m\} as in Claim 7.1. Let I={i1,j1,,im,jm}{1,,n}I=\{i_{1},j_{1},\dots,i_{m},j_{m}\}\subseteq\{1,\dots,n\}, and let us condition on any outcome for all the variables xix_{i} with iIi\notin I (which we will no longer view as being random). For every =1,,m\ell=1,\dots,m, define P=perMn+1[A{n+1},B{n+1}]P_{\ell}=\operatorname{per}M_{n+1}[A_{\ell}^{*}\cup\{n+1\},B_{\ell}^{*}\cup\{n+1\}], viewed as a polynomial in the variables xix_{i} for iIi\in I, but with all quadratic terms xi2x_{i}^{2} replaced by 11 (recall that our variables xix_{i} are chosen in {1,1}\{-1,1\}). Then PP_{\ell} is a multilinear quadratic polynomial and the coefficient of xixjx_{i_{\ell}}x_{j_{\ell}} has absolute value at least λ/2\lambda/2.

Now, after our conditioning, the only remaining randomness comes from the 2m2m variables xix_{i} for iIi\in I. It suffices to show that for sufficiently large mm we have

Pr(|P|λ/(4n4) for at least m/36 indices {1,,m})1m1/24.\Pr\mathopen{}\mathclose{{}\left(|P_{\ell}|\geq\lambda/(4n^{4})\text{ for at least }m/36\text{ indices }\ell\in\{1,\dots,m\}}\right)\geq 1-m^{-1/24}. (7.1)

Indeed, if the event in Equation 7.1 holds, then we can take the sets A1,,Am,B1,,BmA_{1}^{\prime},\dots,A_{m^{\prime}},B_{1}^{\prime},\dots,B_{m^{\prime}} in Lemma 3.2 to be the sets A{n+1}A_{\ell}^{*}\cup\{n+1\} and B{n+1}B_{\ell}^{*}\cup\{n+1\} for m=m/36m^{\prime}=\mathopen{}\mathclose{{}\left\lceil m/36}\right\rceil different indices {1,,n}\ell\in\{1,\dots,n\} for which we have |P|=|perMn+1[A{n+1},B{n+1}]|λ/(4n4)|P_{\ell}|=|\operatorname{per}M_{n+1}[A_{\ell}^{*}\cup\{n+1\},B_{\ell}^{*}\cup\{n+1\}]|\geq\lambda/(4n^{4}).

Let σ=λ/(4n2)\sigma=\lambda/(4n^{2}) and τ=λ/(4n4)\tau=\lambda/(4n^{4}). Using the notation introduced above Theorem 4.4, for each {1,,m}\ell\in\{1,\dots,m\} we consider the graph G=G(τ)(P)G_{\ell}=G^{(\tau)}(P_{\ell}) on the vertex set II whose edges correspond to the coefficients of the polynomial PP_{\ell} of absolute value at least τ\tau. We say that the index {1,,m}\ell\in\{1,\dots,m\} is easy if the graph GG_{\ell} has matching number ν(G)m1/6\nu(G_{\ell})\geq m^{1/6}. If there are many easy indices, then we can prove Equation 7.1 using the Meka–Nguyen–Vu polynomial anti-concentration inequality (Theorem 4.4), as follows.

Claim 7.2.

If there are at least m/3m/3 easy indices {1,,m}\ell\in\{1,\dots,m\}, then Equation 7.1 holds.

Proof.

Recall that for each easy index \ell, we have ν(G(τ)(P))=ν(G)m1/6\nu(G^{(\tau)}(P_{\ell}))=\nu(G_{\ell})\geq m^{1/6}. Hence by Theorem 4.4 we have that

Pr(|P|τ)1(logν(G))Cν(G)1/21ν(G)1/31m1/18\Pr(|P_{\ell}|\geq\tau)\geq 1-\frac{(\log\nu(G_{\ell}))^{C}}{\nu(G_{\ell})^{1/2}}\geq 1-\nu(G_{\ell})^{-1/3}\geq 1-m^{-1/18}

for sufficiently large mm (where CC is the absolute constant appearing in the statement of Theorem 4.4). Hence by Markov’s inequality (specifically Lemma 4.8 applied with p=1m1/18p=1-m^{-1/18} and q=1/2q=1/2), with probability at least 12m1/181m1/241-2m^{-1/18}\geq 1-m^{-1/24} we have |P|τ=λ/(4n4)|P_{\ell}|\geq\tau=\lambda/(4n^{4}) for at least (1/2)(m/3)=m/6(1/2)\cdot(m/3)=m/6 easy indices {1,,m}\ell\in\{1,\dots,m\} (again assuming that mm is sufficiently large). This in particular proves Equation 7.1. ∎

Let us from now on assume that there are at least 2m/32m/3 indices {1,,m}\ell\in\{1,\dots,m\} which are not easy. For each of these non-easy \ell, since GG_{\ell} has no large matching it must have a large vertex cover, as follows.

Claim 7.3.

For every non-easy index {1,,m}\ell\in\{1,\dots,m\}, there is a subset SIS_{\ell}\subseteq I of size |S|2m1/6|S_{\ell}|\leq 2m^{1/6} such that each edge of the graph GG_{\ell} contains at least one vertex in SS_{\ell} (in other words, SS_{\ell} is a vertex cover of the graph GG_{\ell}).

Proof.

Let us take a maximal collection of disjoint edges in GG_{\ell} (this collection consists of at most ν(G)<m1/6\nu(G_{\ell})<m^{1/6} edges), and let SS_{\ell} consist of all the vertices contained in one of these edges. Then by the maximality of the chosen edge collection, each edge of GG_{\ell} must contain at least one vertex in SS_{\ell}. ∎

For each non-easy \ell, fix a subset SIS_{\ell}\subseteq I as in Claim 7.3. We now briefly describe the strategy of the remainder of the proof. The idea is that all the degree-2 terms of PP_{\ell} whose coefficient has large absolute value must contain a variable whose index is in SS_{\ell}. So if we condition on outcomes of xix_{i} for iSi\in S_{\ell}, then PP_{\ell} “essentially” becomes a linear polynomial (apart from some small terms that we can ignore). If this linear polynomial has many coefficients with large absolute value, then we can apply the Erdős–Littlewood–Offord inequality to show that |P||P_{\ell}| is typically quite large. However, it is possible that for most of the non-easy \ell, we end up with linear polynomials which have few coefficients of large absolute value (in which case we will not be able to use such an argument). It turns out that this is unlikely to happen unless for many \ell the polynomial PP_{\ell} each essentially depend on only a few of the variables xix_{i} (we will call such indices \ell short). We will be able to handle the case that there are many such indices \ell using the Azuma–Hoeffding inequality.

Let us say that a variable xix_{i} with iIi\in I is bad if we have iSi\in S_{\ell} for at least m1/3m^{1/3} non-easy indices {1,,m}\ell\in\{1,\dots,m\}. Note that by a simple counting argument, there are at most m2m1/6/m1/3=2m5/6m\cdot 2m^{1/6}/m^{1/3}=2m^{5/6} bad variables. We say that a variable xix_{i} with iIi\in I is good if it is not bad. Let IgoodII_{\text{good}}\subseteq I be the set of all iIi\in I such that xix_{i} is good (and note that |Igood||I|=2m|I_{\text{good}}|\leq|I|=2m).

In addition to all of our previous conditioning, let us now also condition on any fixed outcome of all bad variables xix_{i}. This means that at this point the only remaining randomness comes from the variables xix_{i} with iIgoodi\in I_{\text{good}}. After fixing the outcomes for the bad variables, we can interpret each polynomial PP_{\ell} (for {1,,m}\ell\in\{1,\dots,m\}) as a polynomial in the variables xix_{i} with iIgoodi\in I_{\text{good}}. It suffices to prove Equation 7.1 with this additional conditioning on the outcomes of the bad variables.

Note that for each {1,,m}\ell\in\{1,\dots,m\} which is not easy, for any distinct i,jIgoodSi,j\in I_{\text{good}}\setminus S_{\ell} the coefficient of xixjx_{i}x_{j} in PP_{\ell} has absolute value less than τ\tau. Indeed, this follows from the definition of the graph G=G(τ)(P)G_{\ell}=G^{(\tau)}(P_{\ell}) and the fact that every edge of GG_{\ell} contains at least one vertex in SS_{\ell}.

Recall that σ=λ/(4n2)\sigma=\lambda/(4n^{2}). We say that an index {1,,m}\ell\in\{1,\dots,m\} is short if there are at most 6m1/66m^{1/6} good variables xix_{i} such that xix_{i} appears in a term of PP_{\ell} whose coefficient has absolute value at least σ\sigma.

Claim 7.4.

If there are at least m/3m/3 short indices {1,,m}\ell\in\{1,\dots,m\}, then Equation 7.1 holds.

Proof.

Recall that we are viewing each PP_{\ell} as a polynomial in the variables xix_{i} with iIgoodi\in I_{\text{good}}. Now, let PP_{\ell}^{\prime} be the polynomial obtained from PP_{\ell} by deleting all terms whose coefficients have absolute value less than σ\sigma. Note that for each short index {1,,m}\ell\in\{1,\dots,m\}, the polynomial PP_{\ell}^{\prime} contains at most 6m1/66m^{1/6} different variables xix_{i}. Also note that we always have |PP|<σn2=λ/4|P_{\ell}-P_{\ell}^{\prime}|<\sigma n^{2}=\lambda/4 for any outcomes of the good variables xi{1,1}x_{i}\in\{-1,1\} (this is because at most n2n^{2} terms get deleted in PP_{\ell}^{\prime}, each with absolute value less than σ\sigma).

We say that a good variable is short-popular if it appears in the polynomial PP_{\ell}^{\prime} for at least m1/3m^{1/3} different short indices {1,,m}\ell\in\{1,\dots,m\}. Note that there are at most m6m1/6/m1/3=6m5/6m\cdot 6m^{1/6}/m^{1/3}=6m^{5/6} short-popular variables. For the remainder of the proof of this claim, let us now also condition on any fixed outcomes for all short-popular variables xix_{i}, which we no longer view as being random.

Since there are at most 2m5/62m^{5/6} bad variables, and at most 6m5/66m^{5/6} short-popular variables, there are at least m/38m5/6m/4m/3-8m^{5/6}\geq m/4 short indices {1,,m}\ell\in\{1,\dots,m\} for which both of the variables xix_{i_{\ell}} and xjx_{j_{\ell}} are good and not short-popular (for the inequality here, we are assuming that mm is sufficiently large). For any such index \ell, the coefficient of xixjx_{i_{\ell}}x_{j_{\ell}} in the polynomial PP_{\ell}^{\prime} has absolute value at least λ/2\lambda/2 (since this is also the case for PP_{\ell}, and λ/2>σ\lambda/2>\sigma). Hence we can apply Fact 4.3 to find that Pr(|P|λ/2)1/4\Pr(|P_{\ell}^{\prime}|\geq\lambda/2)\geq 1/4.

So, if YY is the number of short indices \ell with |P|λ/2|P_{\ell}^{\prime}|\geq\lambda/2, then 𝔼Y(m/4)(1/4)=m/16\mathbb{E}Y\geq(m/4)\cdot(1/4)=m/16. Recall that we already conditioned on outcomes of all the short-popular variables. So each of the remaining random variables occur in PP_{\ell}^{\prime} for at most m1/3m^{1/3} short indices \ell, and hence varying each individual variable affects the value of YY by at most m1/3m^{1/3}. Therefore, by Lemma 4.5 we have Ym/32Y\geq m/32 with probability at least

12exp((m/32)222mm2/3)=12exp(m1/3212)1m1/24,1-2\exp\mathopen{}\mathclose{{}\left(-\frac{(m/32)^{2}}{2\cdot 2m\cdot m^{2/3}}}\right)=1-2\exp\mathopen{}\mathclose{{}\left(-\frac{m^{1/3}}{2^{12}}}\right)\geq 1-m^{-1/24},

assuming that mm is sufficiently large. Hence with probability at least 1m1/241-m^{-1/24} there are at least m/32m/32 short indices \ell with |P|λ/2|P_{\ell}^{\prime}|\geq\lambda/2, which implies that |P|λ/2λ/4λ/(4n4)|P_{\ell}|\geq\lambda/2-\lambda/4\geq\lambda/(4n^{4}). This in particular proves Equation 7.1. ∎

We may from now on assume that there are at least m/3m/3 indices {1,,m}\ell\in\{1,\dots,m\} which are not easy and not short. Let us call such indices interesting.

For every interesting index {1,,m}\ell\in\{1,\dots,m\}, recall that PP_{\ell} is a multilinear polynomial in the variables xix_{i} with iIgoodi\in I_{\text{good}}. Furthermore, recall that for any distinct i,jIgoodSi,j\in I_{\text{good}}\setminus S_{\ell} the coefficient of xixjx_{i}x_{j} in PP_{\ell} has absolute value less than τ\tau. Let PP_{\ell}^{*} be the polynomial obtained from PP_{\ell} by deleting all terms of the form xixjx_{i}x_{j} for i,jIgoodSi,j\in I_{\text{good}}\setminus S_{\ell}. Note that we always have |PP|τn(n1)/2τ(n21)=στ|P_{\ell}-P_{\ell}^{*}|\leq\tau n(n-1)/2\leq\tau(n^{2}-1)=\sigma-\tau (for all outcomes of the xix_{i}).

For every interesting {1,,m}\ell\in\{1,\dots,m\}, there are at least 6m1/66m^{1/6} good variables which appear in a term of PP_{\ell} whose coefficient has absolute value at least σ\sigma. Since only terms with coefficient less than τ<σ\tau<\sigma get deleted in PP_{\ell}^{*}, this means that there are also at least 6m1/66m^{1/6} good variables which appear in a term of PP_{\ell}^{*} whose coefficient has absolute value at least σ\sigma.

Now, for every interesting {1,,m}\ell\in\{1,\dots,m\}, let us interpret P[xi,iIgood]P_{\ell}^{*}\in\mathbb{R}[x_{i},i\in I_{\text{good}}] as a polynomial Q[xi,iS][xi,iIgoodS]Q_{\ell}\in\mathbb{R}[x_{i},i\in S_{\ell}][x_{i},i\in I_{\text{good}}\setminus S_{\ell}], i.e. as a polynomial in the variables xix_{i} for iIgoodSi\in I_{\text{good}}\setminus S_{\ell} whose coefficients are polynomials in the variables xix_{i} for iSi\in S_{\ell}. Then QQ_{\ell} is a linear polynomial (in the variables xix_{i} for iIgoodSi\in I_{\text{good}}\setminus S_{\ell}). Its constant coefficient is a quadratic polynomial, and its other coefficients are linear polynomials (in the variables xix_{i} for iSi\in S_{\ell}).

Let TT_{\ell} be the number of degree-11 coefficients of the linear polynomial QQ_{\ell} (in the variables xix_{i} for iIgoodSi\in I_{\text{good}}\setminus S_{\ell}) which have absolute value at least σ\sigma. This is a random variable depending on the outcomes of the xix_{i} with iSi\in S_{\ell}.

Claim 7.5.

For each interesting index {1,,m}\ell\in\{1,\dots,m\}, we have Pr(Tm1/6)1/3\Pr\mathopen{}\mathclose{{}\left(T_{\ell}\geq m^{1/6}}\right)\geq 1/3.

Proof.

Fix an interesting {1,,m}\ell\in\{1,\dots,m\}. Recall that there are at least 6m1/66m^{1/6} good variables xix_{i} which appear in a term of PP_{\ell}^{*} whose coefficient has absolute value at least σ\sigma. Since |S|2m1/6|S_{\ell}|\leq 2m^{1/6}, at least 4m1/64m^{1/6} of these variables satisfy iIgoodSi\in I_{\text{good}}\setminus S_{\ell}. For each such ii, the coefficient of xix_{i} in QQ_{\ell} is a linear polynomial in the variables xjx_{j} for jSj\in S_{\ell}, with at least one coefficient of absolute value at least σ\sigma. Thus, by Fact 4.2, with probability at least 1/21/2 the coefficient of xix_{i} in QQ_{\ell} has absolute value at least σ\sigma. By Markov’s inequality (specifically Lemma 4.8 applied with p=1/2p=1/2 and q=1/4q=1/4), with probability at least 1/31/3 there are at least (1/4)4m1/6=m1/6(1/4)\cdot 4m^{1/6}=m^{1/6} different iIgoodSi\in I_{\text{good}}\setminus S_{\ell} such that the coefficient of xix_{i} in QQ_{\ell} has absolute value at least σ\sigma. In other words, with probability at least 1/31/3, we have Tm1/6T_{\ell}\geq m^{1/6}. ∎

Claim 7.6.

For each interesting index {1,,m}\ell\in\{1,\dots,m\}, we have Pr(Tm1/6 and |P|<σ)3m1/12\Pr\mathopen{}\mathclose{{}\left(T_{\ell}\geq m^{1/6}\text{ and }|P_{\ell}^{*}|<\sigma}\right)\leq 3m^{-1/12}.

Proof.

Fix an interesting {1,,m}\ell\in\{1,\dots,m\}. Recall that TT_{\ell} depends only on the variables xix_{i} with iSi\in S_{\ell}. So, for the proof of this claim, let us condition on some outcome for the variables xix_{i} with iSi\in S_{\ell} such that we have Tm1/6T_{\ell}\geq m^{1/6}. Under this conditioning the polynomial PP_{\ell}^{*} becomes precisely the polynomial QQ_{\ell}, which is a linear polynomial in the remaining variables xix_{i} for iIgoodSi\in I_{\text{good}}\setminus S_{\ell}, having Tm1/6T_{\ell}\geq m^{1/6} degree-11 coefficients with absolute value at least σ\sigma. Now, by the Erdős–Littlewood–Offord inequality (Theorem 4.1, applied with t=1t=1) we indeed have Pr(|P|<σ)3/T1/23m1/12\Pr\mathopen{}\mathclose{{}\left(|P_{\ell}^{*}|<\sigma}\right)\leq 3/T_{\ell}^{1/2}\leq 3m^{-1/12}. ∎

Next, define the random variable XX as the number of interesting {1,,m}\ell\in\{1,\dots,m\} such that Tm1/6T_{\ell}\geq m^{1/6}. Furthermore, define the random variable YY as the number of interesting {1,,m}\ell\in\{1,\dots,m\} such that Tm1/6T_{\ell}\geq m^{1/6} and |P|<σ|P_{\ell}^{*}|<\sigma.

Claim 7.7.

With probability at least 1m1/121-m^{-1/12} we have Xm/18X\geq m/18.

Proof.

First, note that by Claim 7.5 we have 𝔼X(m/3)(1/3)=m/9\mathbb{E}X\geq(m/3)\cdot(1/3)=m/9. Also recall that for each interesting \ell, the event Tm1/6T_{\ell}\geq m^{1/6} only depends on the outcomes of xix_{i} for iSi\in S_{\ell}. This means that changing the outcome of any of the good random variables xix_{i} can affect XX by at most m1/3m^{1/3} (recall that for each good xix_{i}, we have iSi\in S_{\ell} for at most m1/3m^{1/3} different \ell). Hence by Lemma 4.5, we have Xm/18X\geq m/18 with probability at least

12exp((m/18)222mm2/3)=12exp(m1/3362)1m1/121-2\exp\mathopen{}\mathclose{{}\left(-\frac{(m/18)^{2}}{2\cdot 2m\cdot m^{2/3}}}\right)=1-2\exp\mathopen{}\mathclose{{}\left(-\frac{m^{1/3}}{36^{2}}}\right)\geq 1-m^{-1/12}

(for sufficiently large mm), as desired. ∎

Claim 7.8.

With probability at least 1108m1/121-108m^{-1/12} we have Ym/36Y\leq m/36.

Proof.

By Claim 7.6 we have 𝔼Ym3m1/12=3m11/12\mathbb{E}Y\leq m\cdot 3m^{-1/12}=3m^{11/12}. Hence, by Markov’s inequality we have Ym/36Y\geq m/36 with probability at most 108m1/12108m^{-1/12}. ∎

From the previous two claims, we conclude that if mm is sufficiently large, then with probability at least 1109m1/121m1/241-109m^{-1/12}\geq 1-m^{-1/24} we have XYm/36X-Y\geq m/36. But note that whenever XYm/36X-Y\geq m/36, there are at least m/36m/36 interesting {1,,m}\ell\in\{1,\dots,m\} such that |P|σ|P_{\ell}^{*}|\geq\sigma, which implies |P|σ(στ)=τ=λ/(4n4)|P_{\ell}|\geq\sigma-(\sigma-\tau)=\tau=\lambda/(4n^{4}). This proves Equation 7.1, and finishes the proof of Lemma 3.2.

8 Concluding remarks

We have proved that the permanent of a random symmetric ±1\pm 1 matrix typically has magnitude nn/2+o(n)n^{n/2+o(n)}. This encapsulates the upper bound in Proposition 1.2 and the lower bound in Theorem 1.3.

The upper bound and lower bound both permit some fairly immediate generalisations. For example, in the setting of Theorem 1.3 we actually have Pr(perMn=a)Pr(|perMna|nn/2εn)nc\Pr(\operatorname{per}M_{n}=a)\leq\Pr(|\operatorname{per}M_{n}-a|\leq n^{n/2-\varepsilon n})\leq n^{-c} for all aa (not just a=0a=0). To see this, recall that the proof of Theorem 1.3 concludes by repeatedly applying Lemma 3.2, where the final application shows that, conditional on a typical outcome of perMn1\operatorname{per}M_{n-1}, it is very likely that |perMn|nn/2εn|\operatorname{per}M_{n}|\leq n^{n/2-\varepsilon n}. In this final application we can instead apply a slight generalisation of Lemma 3.2 (proved in the same way), showing that for any aa in fact it is very likely that |perMna|nn/2εn|\operatorname{per}M_{n}-a|\leq n^{n/2-\varepsilon n}.

We can also permit the entries of our random matrix to take more general distributions. As defined in the introduction, consider any real probability distributions μ\mu and ν\nu, and let Mnμ,νM_{n}^{\mu,\nu} be the random symmetric matrix whose diagonal entries have distribution ν\nu and whose off-diagonal entries have distribution μ\mu (and whose entries on and above the diagonal are mutually independent). If μ\mu and ν\nu are fixed (not depending on nn), ν\nu has finite second moment and μ\mu has vanishing first moment and finite fourth moment, then essentially the same proof as for Proposition 1.2 shows that Pr(|perMnμ,ν|nn/2+εn)nεn\Pr\mathopen{}\mathclose{{}\left(|\operatorname{per}M^{\mu,\nu}_{n}|\geq n^{n/2+\varepsilon n}}\right)\leq n^{-\varepsilon n} for nn sufficiently large with respect to ε\varepsilon, μ\mu and ν\nu.

The conclusion of Theorem 1.3 can be even more freely generalised to any fixed distributions μ\mu and ν\nu such that μ\mu is supported on at least two points (not requiring any moment assumptions at all). In the case where μ\mu is supported on two points, any quadratic polynomial in independent μ\mu-distributed random variables can be rewritten as a multilinear polynomial. One can then use essentially the same proof as for Theorem 1.3 (only changing the various constants in the lemma statements) to prove that there is a constant cc (depending on μ\mu but not ν\nu) such that for all ε>0\varepsilon>0 and all nn sufficiently large with respect to ε\varepsilon, we have Pr(|perMnμ,ν|nn/2εn)nc\Pr(|\operatorname{per}M^{\mu,\nu}_{n}|\leq n^{n/2-\varepsilon n})\leq n^{-c}. Note that changing ν\nu has no effect on our arguments, because we never actually use the randomness of the diagonal entries. One needs some slight generalisations of the anti-concentration lemmas in Section 4 for μ\mu-distributed random variables, but these are indeed available; see [5, Theorem A.1] and [23, Theorem 1.8].

In the case where μ\mu is supported on more than two points, then it is necessary to make slightly more involved changes to the proof of Theorem 1.3, but the same result does hold. To give a brief sketch: in this case, the main issue is that in the proof of Lemma 3.2 we are no longer able to assume that the relevant quadratic polynomials are multilinear, so we must treat the square terms xi2x_{i}^{2} in basically the same way we treat the linear terms xix_{i}. To be specific, in the proof of Lemma 3.2, we must allow the polynomials QQ_{\ell} to contain square terms xi2x_{i}^{2} in addition to linear terms. There are several aspects to this. First, we need to generalise Fact 4.3 and Theorem 4.4 to quadratic polynomials of independent μ\mu-distributed random variables. In particular, we need a generalisation of Theorem 4.4 for quadratic polynomials that are not necessarily multilinear, still giving a bound in terms of the graph matching number ν(G(r)(f))\nu(G^{(r)}(f)) (where we ignore the square terms in ff for the purpose of constructing the graph G(r)(f)G^{(r)}(f)). Suitable generalisations of Fact 4.3 and Theorem 4.4 can be proved with the methods in [23, Section 4] (in fact, in this section the authors prove [23, Theorem 1.8], which is essentially the required generalisation of Theorem 4.4 but with slightly different assumptions).

Second, we need a generalisation of Fact 4.2 for μ\mu-distributed random variables (for which we can make fairly trivial changes to the proof of Fact 4.2). And lastly, we need a generalisation of the Erdős–Littlewood–Offord theorem (Theorem 4.1) for polynomials of independent μ\mu-distributed random variables (when μ\mu is supported on at least three values) which applies not just to linear polynomials, but also to quadratic polynomials consisting of both linear and square terms (having no multilinear degree-2 terms). Such polynomials can be interpreted as linear polynomials in independent (but not identically distributed) random variables, and therefore the appropriate generalisation of Theorem 4.1 follows from the Doeblin–Lévy–Kolmogorov–Rogozin inequality [27] (or alternatively the method in [23, Section 4]), together with a basic single-variable quadratic anti-concentration bound. Namely, we need the fact that for any real random variable xx supported on at least 3 values, and any r>0r>0, there are s>0s>0 and p>0p>0 such that we have Pr(|f(x)|<s)<1p\Pr(|f(x)|<s)<1-p for any one-variable quadratic polynomial ff whose linear or quadratic coefficient has absolute value at least rr.

So, for example, both Proposition 1.2 and Theorem 1.3, and therefore Theorem 1.1, can be generalised to the case where μ\mu is a centred Gaussian distribution (as long as μ\mu and ν\nu do not depend on nn, and ν\nu has finite second moment). The Gaussian Orthogonal Ensemble (GOE) is an important special case. Another case that may be of particular interest is the case where the support of μ\mu is {0,1}\{0,1\}, and ν\nu is the trivial distribution always taking the value zero (Proposition 1.2 does not hold in this case, but Theorem 1.3 does). In this case Mnμ,νM^{\mu,\nu}_{n} can be interpreted as the adjacency matrix of a random graph. However, we note that in this case the statement in Theorem 1.3 can be proved in a much simpler way: we can take advantage of the fact that changing any off-diagonal entry in Mnμ,νM^{\mu,\nu}_{n} from 0 to 11 typically causes a large increase in the value of perMnμ,ν\operatorname{per}M^{\mu,\nu}_{n}, and apply a more general anti-concentration inequality for functions with this property [14, Theorem 1.2]. Actually, in this setting we suspect that it is possible to prove a limit law for perMnμ,ν\operatorname{per}M^{\mu,\nu}_{n}, using the ideas in [17].

Regarding further directions for research, it would be very interesting to prove stronger upper bounds for the concentration probabilities of the permanent, in both the i.i.d. and the symmetric case. It is currently only known that Pr(perMn=a)\Pr(\operatorname{per}M_{n}=a) and Pr(perAn=a)\Pr(\operatorname{per}A_{n}=a) are bounded by ncn^{-c} for some constant cc. It would be very interesting to prove bounds of the form nω(1)n^{-\omega(1)}, where ω(1)\omega(1) is any function going to infinity with nn. Actually, Vu conjectured (see [40, Conjecture 6.12]) that Pr(perAn=0)\Pr(\operatorname{per}A_{n}=0) is of the form ω(1)n\omega(1)^{-n} (in contrast to the situation for the determinant, where we have Pr(detAn=0)=2n+o(n)\Pr(\det A_{n}=0)=2^{-n+o(n)}). It seems reasonable to conjecture that in fact all probabilities of the form Pr(perMn=a)\Pr(\operatorname{per}M_{n}=a) or Pr(perAn=a)\Pr(\operatorname{per}A_{n}=a) are upper-bounded by ncnn^{-cn}, for some constant cc. However, essentially all known tools for studying permanents of random matrices also apply to determinants, so significant new ideas would be required to prove such strong results.

Acknowledgements. We would like to thank Asaf Ferber for insightful discussions. We are also grateful to the referee for their careful reading of the paper, and their useful comments and suggestions.

References

  • [1] S. Aaronson, Anti-concentration bound for permanents of Gaussian matrices?, MathOverflow post, https://mathoverflow.net/q/45822, 2010.
  • [2] S. Aaronson and A. Arkhipov, The computational complexity of linear optics, Theory Comput. 9 (2013), 143–252.
  • [3] N. Alon and J. H. Spencer, The probabilistic method, fourth ed., Wiley Series in Discrete Mathematics and Optimization, John Wiley & Sons, Inc., Hoboken, NJ, 2016.
  • [4] P. Bourgade and K. Mody, Gaussian fluctuations of the determinant of Wigner matrices, Electron. J. Probab. 24 (2019), Paper No. 96.
  • [5] J. Bourgain, V. H. Vu, and P. M. Wood, On the singularity probability of discrete random matrices, J. Funct. Anal. 258 (2010), no. 2, 559–603.
  • [6] M. V. Budrevich and A. E. Guterman, Kräuter conjecture on permanents is true, J. Combin. Theory Ser. A 162 (2019), 306–343.
  • [7] M. V. Budrevich, A. E. Guterman, and K. A. Taranin, On the divisibility of the permanent of (±1)(\pm 1)-matrices, Zap. Nauchn. Sem. S.-Peterburg. Otdel. Mat. Inst. Steklov. (POMI) 439 (2015), no. Chislennye Metody i Voprosy Organizatsii Vychisleniĭ. XXVIII, 26–37.
  • [8] M. V. Budrevich, A. E. Guterman, and K. A. Taranin, On the Kräuter-Seifter theorem on the divisibility of permanents, Zap. Nauchn. Sem. S.-Peterburg. Otdel. Mat. Inst. Steklov. (POMI) 463 (2017), no. Chislennye Metody i Voprosy Organizatsii Vychisleniĭ. XXX, 25–35.
  • [9] M. Campos, L. Mattos, R. Morris, and N. Morrison, On the singularity of random symmetric matrices, Duke Math. J. 170 (2021), no. 5, 881–907.
  • [10] K. P. Costello, T. Tao, and V. Vu, Random symmetric matrices are almost surely nonsingular, Duke Math. J. 135 (2006), no. 2, 395–413.
  • [11] L. Eldar and S. Mehraban, Approximating the permanent of a random matrix with vanishing mean, 59th Annual IEEE Symposium on Foundations of Computer Science—FOCS 2018, IEEE Computer Soc., Los Alamitos, CA, 2018, pp. 23–34.
  • [12] A. Ferber, Singularity of random symmetric matrices – simple proof, C. R. Math. Acad. Sci. Paris 359 (2021), no. 6, 743–747.
  • [13] A. Ferber and V. Jain, Singularity of random symmetric matrices—a combinatorial approach to improved bounds, Forum Math. Sigma 7 (2019), e22.
  • [14] J. Fox, M. Kwan, and L. Sauermann, Combinatorial anti-concentration inequalities, with applications, Math. Proc. Cambridge Philos. Soc. 171 (2021), no. 2, 227–248.
  • [15] V. L. Girko, A central limit theorem for random determinants, Teor. Veroyatnost. i Primenen. 24 (1979), no. 4, 728–740.
  • [16] V. L. Girko, A refinement of the central limit theorem for random determinants, Teor. Veroyatnost. i Primenen. 42 (1997), no. 1, 63–73.
  • [17] S. Janson, The numbers of spanning trees, Hamilton cycles and perfect matchings in a random graph, Combin. Probab. Comput. 3 (1994), no. 1, 97–126.
  • [18] J. Kahn, J. Komlós, and E. Szemerédi, On the probability that a random ±1\pm 1-matrix is singular, J. Amer. Math. Soc. 8 (1995), no. 1, 223–240.
  • [19] J. Komlós, On the determinant of (0, 1)(0,\,1) matrices, Studia Sci. Math. Hungar. 2 (1967), 7–21.
  • [20] A. R. Kräuter and N. Seifter, On some questions concerning permanents of (1,1)(1,\,-1)-matrices, Israel J. Math. 45 (1983), no. 1, 53–62.
  • [21] A. R. Kräuter and N. Seifter, Some properties of the permanent of (1,1)(1,\,-1)-matrices, Linear and Multilinear Algebra 15 (1984), no. 3-4, 207–223.
  • [22] P. Lundow and K. Markström, Efficient computation of permanents, with applications to boson sampling and random matrices, arXiv preprint arXiv:1904.06229 (2019).
  • [23] R. Meka, O. Nguyen, and V. Vu, Anti-concentration for polynomials of independent random variables, Theory Comput. 12 (2016), Paper No. 11.
  • [24] H. H. Nguyen, Inverse Littlewood-Offord problems and the singularity of random symmetric matrices, Duke Math. J. 161 (2012), no. 4, 545–586.
  • [25] H. H. Nguyen, On the least singular value of random symmetric matrices, Electron. J. Probab. 17 (2012), Paper No. 53.
  • [26] H. H. Nguyen and V. Vu, Random matrices: law of the determinant, Ann. Probab. 42 (2014), no. 1, 146–167.
  • [27] B. A. Rogozin, On the increase of dispersion of sums of independent random variables, Teor. Verojatnost. i Primenen 6 (1961), 106–108.
  • [28] M. Rudelson and R. Vershynin, The Littlewood-Offord problem and invertibility of random matrices, Adv. Math. 218 (2008), no. 2, 600–633.
  • [29] N. Seifter, Upper bounds for permanents of (1,1)(1,-1)-matrices, Israel J. Math. 48 (1984), no. 1, 69–78.
  • [30] R. Simion and F. W. Schmidt, On (+1,1)(+1,-1)-matrices with vanishing permanent, Discrete Math. 46 (1983), no. 1, 107–108.
  • [31] T. Tao and V. Vu, On random ±1\pm 1 matrices: singularity and determinant, Random Structures Algorithms 28 (2006), no. 1, 1–23.
  • [32] T. Tao and V. Vu, On the singularity probability of random Bernoulli matrices, J. Amer. Math. Soc. 20 (2007), no. 3, 603–628.
  • [33] T. Tao and V. Vu, On the permanent of random Bernoulli matrices, Adv. Math. 220 (2009), no. 3, 657–669.
  • [34] T. Tao and V. Vu, A central limit theorem for the determinant of a Wigner matrix, Adv. Math. 231 (2012), no. 1, 74–101.
  • [35] T. Tao and V. H. Vu, Additive combinatorics, Cambridge Studies in Advanced Mathematics, vol. 105, Cambridge University Press, Cambridge, 2010, Paperback edition.
  • [36] K. Tikhomirov, Singularity of random Bernoulli matrices, Ann. of Math. (2) 191 (2020), no. 2, 593–634.
  • [37] L. G. Valiant, The complexity of computing the permanent, Theoret. Comput. Sci. 8 (1979), no. 2, 189–201.
  • [38] R. Vershynin, Invertibility of symmetric random matrices, Random Structures Algorithms 44 (2014), no. 2, 135–182.
  • [39] V. Vu, Random matrices: a survey, slides for a presentation at an IPAM workshop, http://campuspress.yale.edu/vanvu/files/2013/01/ipam09.pdf, 2009.
  • [40] V. H. Vu, Recent progress in combinatorial random matrix theory, Probab. Surv. 18 (2021), 179–200.
  • [41] E. T.-H. Wang, On permanents of (1,1)(1,\,-1)-matrices, Israel J. Math. 18 (1974), 353–361.
  • [42] I. M. Wanless, Permanents of matrices of signed ones, Linear Multilinear Algebra 53 (2005), no. 6, 427–433.
  • [43] E. P. Wigner, Characteristic vectors of bordered matrices with infinite dimensions, Ann. of Math. (2) 62 (1955), 548–564.
  • [44] E. P. Wigner, On the distribution of the roots of certain symmetric matrices, Ann. of Math. (2) 67 (1958), 325–327.