
Fourier Growth of Communication Protocols for XOR Functions

Uma Girish (Princeton University; email: [email protected]), Makrand Sinha (Simons Institute and University of California at Berkeley; email: [email protected]), Avishay Tal (University of California at Berkeley; email: [email protected]), Kewen Wu (University of California at Berkeley; email: [email protected])
Abstract

The level-$k$ $\ell_{1}$-Fourier weight of a Boolean function refers to the sum of absolute values of its level-$k$ Fourier coefficients. Fourier growth refers to the growth of these weights as $k$ grows. It has been extensively studied for various computational models, and bounds on the Fourier growth, even for the first few levels, have proven useful in learning theory, circuit lower bounds, pseudorandomness, and quantum-classical separations.

In this work, we investigate the Fourier growth of certain functions that naturally arise from communication protocols for XOR functions (partial functions evaluated on the bitwise XOR of the inputs $x$ and $y$ to Alice and Bob). If a protocol $\mathcal{C}$ computes an XOR function, then $\mathcal{C}(x,y)$ is a function of the parity $x\oplus y$. This motivates us to analyze the XOR-fiber of the communication protocol $\mathcal{C}$, defined as $h(z):=\mathbb{E}_{\bm{x},\bm{y}}[\mathcal{C}(\bm{x},\bm{y})\mid\bm{x}\oplus\bm{y}=z]$.

We present improved Fourier growth bounds for the XOR-fibers of randomized protocols that communicate $d$ bits. For the first level, we show a tight $O(\sqrt{d})$ bound and obtain a new coin theorem, as well as an alternative proof of the tight randomized communication lower bound for the Gap-Hamming problem. For the second level, we show a $d^{3/2}\cdot\mathrm{polylog}(n)$ bound, which improves the previous $O(d^{2})$ bound by Girish, Raz, and Tal (ITCS 2021) and implies a polynomial improvement on the randomized communication lower bound for the XOR-lift of the Forrelation problem, which extends the quantum-classical gap for this problem.

Our analysis is based on a new way of adaptively partitioning a relatively large set in Gaussian space to control its moments in all directions. We achieve this via martingale arguments and by allowing protocols to transmit real values. We also show a connection between Fourier growth and lifting theorems with constant-sized gadgets as a potential approach to proving optimal bounds for the second level and beyond.

1 Introduction

The Fourier spectrum of Boolean functions and its various properties have played an important role in many areas of mathematics and theoretical computer science. In this work, we study a notion called $\ell_{1}$-Fourier growth, which captures the scaling of the sum of absolute values of the level-$k$ Fourier coefficients of a function. In a nutshell, functions with small Fourier growth cannot aggregate many weak signals in the input to obtain a considerable effect on the output. In contrast, the Majority function, which can amplify weak biases, is an example of a Boolean function with extremely high Fourier growth.

To formally define Fourier growth, we recall that every Boolean function $f\colon\{\pm 1\}^{n}\to[-1,1]$ can be uniquely represented as a multilinear polynomial

f(x)=\sum_{S\subseteq[n]}\widehat{f}(S)\cdot\prod_{i\in S}x_{i},

where the coefficients $\widehat{f}(S)\in\mathbb{R}$ of the polynomial are called the Fourier coefficients of $f$, and they satisfy $\widehat{f}(S)=\operatorname*{\mathbb{E}}[f(\bm{x})\cdot\prod_{i\in S}\bm{x}_{i}]$ for a uniformly random $\bm{x}\in\{\pm 1\}^{n}$. The level-$k$ $\ell_{1}$-Fourier growth of $f$ is the sum of the absolute values of its level-$k$ Fourier coefficients,

L_{1,k}(f):=\sum_{S\subseteq[n]:|S|=k}\left|\widehat{f}(S)\right|.
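
As a concrete illustration of these definitions (not part of the paper's formal development), the following brute-force sketch computes the Fourier coefficients of a function on $\{\pm 1\}^{n}$ and its level-$k$ weight; the test function `maj3` is a hypothetical example input.

```python
import itertools
import numpy as np

def fourier_coefficients(f, n):
    """Brute-force Fourier coefficients of f: {±1}^n -> R.

    Returns a dict mapping each S (as a frozenset) to
    f^(S) = E_x[f(x) * prod_{i in S} x_i] over uniform x in {±1}^n.
    """
    points = list(itertools.product([1, -1], repeat=n))
    coeffs = {}
    for r in range(n + 1):
        for S in itertools.combinations(range(n), r):
            values = [f(x) * np.prod([x[i] for i in S]) for x in points]
            coeffs[frozenset(S)] = float(np.mean(values))
    return coeffs

def l1_level_k(coeffs, k):
    """Level-k l1-Fourier weight: sum of |f^(S)| over |S| = k."""
    return sum(abs(v) for S, v in coeffs.items() if len(S) == k)

# Example: majority on 3 bits has f^({i}) = 1/2, so L_{1,1} = 3/2.
maj3 = lambda x: 1 if sum(x) > 0 else -1
print(l1_level_k(fourier_coefficients(maj3, 3), 1))  # 1.5
```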

The study of Fourier growth dates back to the work of Mansour [42], who used it in the context of learning algorithms. Since then, several works have shown that upper bounds on the Fourier growth, even for the first few Fourier levels, have applications to pseudorandomness, circuit complexity, and quantum-classical separations. For example:

  • A bound on the level-one Fourier growth is sufficient to control the advantage of distinguishing biased coins from unbiased ones [4].

  • A bound on the level-two Fourier growth already gives pseudorandom generators [15], oracle separations between BQP and PH [52, 68], and separations between efficient quantum communication and randomized classical communication [27].

Meanwhile, Fourier growth bounds have been extensively studied and established for various computational models, including small-width DNFs/CNFs [42], $\mathsf{AC}^{0}$ circuits [63], low-sensitivity Boolean functions [29], small-width branching programs [51, 59, 16, 38], small-depth decision trees [45, 64, 57], functions related to small-cost communication protocols [28, 27], low-degree $\mathsf{GF}(2)$ polynomials [14, 15, 7], product tests [36], small-depth parity decision trees [10, 30], low-degree bounded functions [33], and more.

For any Boolean function $f$ with outputs in $[-1,1]$, the level-$k$ Fourier growth $L_{1,k}(f)$ is at most $\sqrt{\binom{n}{k}}$. However, for many natural classes of Boolean functions, this bound is far from tight and not good enough for applications. Establishing better bounds requires exploring structural properties of the specific class of functions in question. Even for low Fourier levels, this can be highly non-trivial, and tight bounds remain elusive in many cases. For example, for degree-$d$ $\mathsf{GF}(2)$ polynomials (which well-approximate $\mathsf{AC}^{0}[\oplus]$ when we set $d=\mathrm{polylog}(n)$ [46, 56]), while we know a level-one bound of $L_{1,1}(f)\leq O(d)$ due to [15], the current best bound for levels $k\geq 2$ is roughly $2^{O(dk)}$ [14], whereas the conjectured bound is $d^{O(k)}$. Validating such a bound, even for the second level $k=2$, would imply unconditional pseudorandom generators of polylogarithmic seed length for $\mathsf{AC}^{0}[\oplus]$ [15], a longstanding open problem in circuit complexity and pseudorandomness.

XOR Functions.

In this work, we study the Fourier growth of certain functions that naturally arise from communication protocols for XOR-lifted functions, also referred to as XOR functions. XOR functions are an important and well-studied class of functions in communication complexity with connections to the log-rank conjecture and quantum versus classical separations [43, 31, 65, 60, 70].

In this setting, Alice gets an input $x\in\{\pm 1\}^{n}$ and Bob gets an input $y\in\{\pm 1\}^{n}$, and they wish to compute $f(x\odot y)$ where $f$ is some partial Boolean function and $x\odot y$ is in the domain of $f$. Here, $x\odot y$ denotes the pointwise product of $x$ and $y$. Given any communication protocol $\mathcal{C}$ that computes an XOR function exactly, the output $\mathcal{C}(x,y)$ of the protocol depends only on the parity $x\odot y$, whenever $f$ is defined on $x\odot y$. This gives a natural motivation to analyze the XOR-fiber of a communication protocol defined below. We note that a similar notion first appeared in an earlier work of Raz [47].

Definition 1.1.

Let $\mathcal{C}\colon\{\pm 1\}^{n}\times\{\pm 1\}^{n}\to\{\pm 1\}$ be any deterministic communication protocol. The XOR-fiber of the communication protocol $\mathcal{C}$ is the function $h\colon\{\pm 1\}^{n}\to[-1,1]$ defined at $z\in\{\pm 1\}^{n}$ as

h(z)=\operatorname*{\mathbb{E}}_{\bm{x},\bm{y}\sim\nu}[\mathcal{C}(\bm{x},\bm{y})\mid\bm{x}\odot\bm{y}=z],

where $\odot$ is the entrywise product and $\nu$ is the uniform distribution over $\{\pm 1\}^{n}$.
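
For intuition, here is a brute-force sketch of Definition 1.1 (illustrative only; `protocol` stands in for the leaf value $\mathcal{C}(x,y)$ of a hypothetical protocol). It uses the observation that conditioning on $\bm{x}\odot\bm{y}=z$ is the same as taking $\bm{x}$ uniform and setting $\bm{y}=\bm{x}\odot z$.

```python
import itertools

def xor_fiber(protocol, n, z):
    """XOR-fiber h(z) = E[C(x, y) | x ⊙ y = z] for uniform x, y in {±1}^n.

    Conditioned on x ⊙ y = z, the pair (x, y) is distributed as
    (x, x ⊙ z) with x uniform, so we average C(x, x ⊙ z) over all x.
    """
    total = 0.0
    for x in itertools.product([1, -1], repeat=n):
        y = tuple(xi * zi for xi, zi in zip(x, z))  # y = x ⊙ z
        total += protocol(x, y)
    return total / 2**n

# Example: C(x, y) = x_1 * y_1 (a 2-bit protocol); its XOR-fiber is h(z) = z_1.
C = lambda x, y: x[0] * y[0]
print(xor_fiber(C, 3, (1, -1, 1)), xor_fiber(C, 3, (-1, -1, 1)))  # 1.0 -1.0
```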

We remark that the XOR-fiber is the “inverse” of the XOR-lift of a function: if $\mathcal{C}$ computes the XOR function of $f$, then the XOR-fiber $h$ of $\mathcal{C}$ is equal to $f$ on the domain of $f$.

In this work, we investigate the Fourier growth of XOR-fibers of small-cost communication protocols and apply these bounds in several contexts. Before stating our results, we first discuss several related works.

Related Works.

Showing optimal Fourier growth bounds for XOR-fibers is a complex undertaking in general, and a first step towards this end is to obtain optimal Fourier growth bounds for parity decision trees. This is because a parity decision tree for a Boolean function $f$ naturally gives rise to a structured communication protocol for the XOR function corresponding to $f$. This protocol perfectly simulates the parity decision tree by having Alice and Bob exchange one bit each to simulate a parity query. Moreover, the XOR-fiber of this protocol exactly computes the function computed by the parity decision tree. As such, parity decision trees can be seen as a special case of communication protocols, and Fourier growth bounds on XOR-fibers of communication protocols immediately imply Fourier growth bounds on parity decision trees.

Fourier growth bounds for decision trees and parity decision trees are well-studied. It is not too difficult to obtain a level-$k$ bound of $O(d)^{k}$ for parity decision trees of depth $d$; however, obtaining improved bounds is significantly more challenging. For decision trees of depth $d$ (which form a subclass of parity decision trees of depth $d$), O'Donnell and Servedio [45] proved a tight bound of $O(\sqrt{d})$ on the level-one Fourier growth. By inductive tree decompositions, Tal [64] obtained bounds for the higher levels of the form $L_{1,k}(f)\leq\sqrt{d^{k}\cdot O(\log(n))^{k-1}}$. This was later sharpened by Sherstov, Storozhenko, and Wu [57] to the asymptotically tight bound of $L_{1,k}(f)\leq\sqrt{\binom{d}{k}\cdot O(\log(n))^{k-1}}$ using a more sophisticated layered partitioning strategy on the tree.

When it comes to parity decision trees, despite all the similarities, the structural decomposition approach does not seem to carry over due to the correlations between the parity queries. For parity decision trees of depth $d$, Blais, Tan, and Wan [10] proved a tight level-one bound of $O(\sqrt{d})$. For higher levels, Girish, Tal, and Wu [30] showed that $L_{1,k}(f)\leq\sqrt{d^{k}\cdot O(k\log(n))^{2k}}$. These works imply almost tight Fourier growth bounds on the XOR-fibers of structured protocols that arise from simulating decision trees or parity decision trees.

For the case of XOR-fibers of arbitrary deterministic/randomized communication protocols (which do not necessarily simulate parity decision trees or decision trees), Girish, Raz, and Tal [27] showed an $O(d^{k})$ Fourier growth bound (technically, [27] only proved a level-two bound, as it suffices for their analysis, but a level-$k$ bound follows easily from their proof approach, as noted in [28]). For level one and level two, these bounds are $O(d)$ and $O(d^{2})$ respectively and are sub-optimal: as mentioned previously, such weaker bounds for parity decision trees are easy to obtain, while obtaining optimal bounds (for parity decision trees) of $O(\sqrt{d})$ for level one and $d\cdot\mathrm{polylog}(n)$ for level two already requires sophisticated ideas.

The bounds in [27] follow by analyzing the Fourier growth of XOR-fibers of communication rectangles of measure $\approx 2^{-d}$ and then adding up the contributions from all the leaf rectangles induced by the protocol. Such a per-rectangle approach cannot give better bounds than the ones in [27]; the authors also conjectured that the optimal Fourier growth of XOR-fibers of arbitrary protocols should match the growth for parity decision trees.

Showing the above is a challenging task even for the first two Fourier levels. The difficulty arises primarily because, in the absence of a per-rectangle argument, one has to crucially leverage cancellations between different rectangles induced by the communication protocol. In the simpler case of parity decision trees (or protocols that exchange parities), such cancellations are leveraged in [30] by ensuring $k$-wise independence at each node of the tree — this can be achieved by adding extra parity queries. In a general protocol, the parties can send arbitrary partial information about their inputs and correlate the coordinates in ways so complicated that such methods break down. This is one of the key difficulties we face in this paper.

1.1 Main Results

We prove new and improved bounds on the Fourier growth of the XOR-fibers associated with small-cost protocols for levels $k=1$ and $k=2$.

Theorem 1.2.

Let $\mathcal{C}\colon\{\pm 1\}^{n}\times\{\pm 1\}^{n}\to\{\pm 1\}$ be a deterministic communication protocol with at most $d$ bits of communication. Let $h$ be its XOR-fiber as in Definition 1.1. Then, $L_{1,1}(h)=O\left(\sqrt{d}\right)$.

Theorem 1.3.

Let $\mathcal{C}\colon\{\pm 1\}^{n}\times\{\pm 1\}^{n}\to\{\pm 1\}$ be a deterministic protocol communicating at most $d$ bits. Let $h$ be its XOR-fiber as in Definition 1.1. Then, $L_{1,2}(h)=O\left(d^{3/2}\log^{3}(n)\right)$.

Our bounds in Theorems 1.2 and 1.3 extend directly to randomized communication protocols. This is because $L_{1,k}$ is convex and any randomized protocol is a convex combination of deterministic protocols with the same cost. Moreover, we can use Fourier growth reductions, as described in Subsection 1.2.3, to demonstrate that these bounds apply to general constant-sized gadgets $g$ and the corresponding $g$-fibers.

Our level-one and level-two bounds improve previous bounds in [27] by polynomial factors. Additionally, our level-one bound is tight since a deterministic protocol with $d+1$ bits of communication can compute the majority vote of $x_{1}\cdot y_{1},\ldots,x_{d}\cdot y_{d}$, which corresponds to $h(z)=\mathrm{MAJ}(z_{1},\ldots,z_{d})$ with $L_{1,1}(h)=\Theta(\sqrt{d})$. Furthermore, as we discuss later in Subsection 1.2, level-one and level-two bounds are already sufficient for many interesting applications.
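
To spell out the tightness calculation in this example (a standard fact about majority, sketched here for completeness): for odd $d$, each singleton Fourier coefficient of $\mathrm{MAJ}$ on $d$ bits has magnitude $\Theta(1/\sqrt{d})$, so

\widehat{\mathrm{MAJ}}(\{i\})=\operatorname*{\mathbb{E}}[\mathrm{MAJ}(\bm{z})\cdot\bm{z}_{i}]=\Theta\left(\tfrac{1}{\sqrt{d}}\right)\quad\text{for all }i\in[d],\qquad\text{hence}\quad L_{1,1}(h)=\sum_{i=1}^{d}\left|\widehat{\mathrm{MAJ}}(\{i\})\right|=\Theta(\sqrt{d}).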

In terms of techniques, our analysis presents a key new idea that enables us to exploit cancellations between different rectangles induced by the protocol. This idea involves using a novel process to adaptively partition a relatively large set in Gaussian space, which enables us to control its $k$-wise moments in all directions — this can be thought of as a spectral notion of almost $k$-wise independence. We achieve this by utilizing martingale arguments and allowing protocols to transmit real values rather than just discrete bits. This notion and procedure may be of independent interest. See Section 2 for a detailed discussion.

1.2 Applications and Connections

Our main theorem has applications to XOR functions, and more generally to functions lifted with constant-sized gadgets. In this setting, there is a simple gadget $g\colon\Sigma\times\Sigma\to\{\pm 1\}$ and a Boolean function $f$ defined on inputs $z\in\{\pm 1\}^{n}$. The lifted function $f\circ g$ is defined on $n$ pairs of symbols $(x_{1},y_{1}),\ldots,(x_{n},y_{n})\in\Sigma\times\Sigma$ such that $(f\circ g)(x,y)=f(g(x_{1},y_{1}),\ldots,g(x_{n},y_{n}))$. The function $f\circ g$ naturally defines a communication problem where Alice is given $x=(x_{1},\ldots,x_{n})$, Bob is given $y=(y_{1},\ldots,y_{n})$, and they are asked to compute $(f\circ g)(x,y)$.

Since XOR functions are functions lifted with the XOR gadget, our main theorem implies lower bounds on the communication complexity of specific XOR functions. Additionally, we show connections between XOR-lifting and lifting with any constant-sized gadget. Next, we describe these lower bounds and connections, with further context.

1.2.1 The Coin Problem and the Gap-Hamming Problem

The coin problem studies the advantage that a class of Boolean functions has in distinguishing biased coins from unbiased ones. More formally, let $\mathcal{F}$ be a class of $n$-variate Boolean functions. Let $\rho\in[-1,1]$ and let $\pi^{\otimes n}_{\rho}$ denote the product distribution over $\{\pm 1\}^{n}$ where each coordinate has expectation $\rho$. The Coin Problem asks what is the maximum advantage that functions in $\mathcal{F}$ have in distinguishing $\pi^{\otimes n}_{\rho}$ from the uniform distribution $\pi^{\otimes n}_{0}$.

This quantity essentially captures how well $\mathcal{F}$ can approximate threshold functions, and in particular, the majority function. The coin problem has been studied for various models of computation including branching programs [11], $\mathsf{AC}^{0}$ and $\mathsf{AC}^{0}[\oplus]$ circuits [13, 40], product tests [41], and more. Recently, Agrawal [4] showed that the coin problem is closely related to the level-one Fourier growth of functions in $\mathcal{F}$.

Lemma 1.4 ([4, Lemma 3.2]).

Assume that $\mathcal{F}$ is closed under restrictions and satisfies $L_{1,1}(f)\leq t$ for all $f\in\mathcal{F}$. Then, for all $\rho\in(-1,1)$ and $f\in\mathcal{F}$,

\left|\operatorname*{\mathbb{E}}_{z\sim\pi^{\otimes n}_{\rho}}[f(z)]-\operatorname*{\mathbb{E}}_{z\sim\pi^{\otimes n}_{0}}[f(z)]\right|\leq\ln\left(\tfrac{1}{1-|\rho|}\right)\cdot t.

Note that communication protocols of small cost are closed under restrictions, and so are their XOR-fibers (see [27, Lemma 5.5]). By noting that $\ln\left(\frac{1}{1-|\rho|}\right)\approx|\rho|$ for small values of $\rho$, we obtain the following corollary. (Here we also use the fact that the upper bound $O(|\rho|\cdot\sqrt{d})$ is vacuous for large enough $\rho$ as it is larger than $1$. We also remark that, using the Fourier growth reductions (see Subsection 1.2.3), Theorem 1.5 can be established for general gadgets of small size.)

Theorem 1.5.

Let $h$ be the XOR-fiber of a protocol with total communication $d$. Then for all $\rho$,

\left|\operatorname*{\mathbb{E}}_{z\sim\pi^{\otimes n}_{\rho}}[h(z)]-\operatorname*{\mathbb{E}}_{z\sim\pi^{\otimes n}_{0}}[h(z)]\right|\leq O\left(|\rho|\cdot\sqrt{d}\right).

In particular, consider the following distinguishing task: Alice and Bob either receive two uniformly random strings in $\{\pm 1\}^{n}$, or they receive two uniformly random strings in $\{\pm 1\}^{n}$ conditioned on their XOR being distributed according to $\pi^{\otimes n}_{\rho}$ for $\rho=1/\sqrt{n}$ (the latter are often referred to as $\rho$-correlated strings). Theorem 1.5 implies that any protocol communicating $o(n)$ bits cannot distinguish these two distributions with constant advantage. This is essentially a communication lower bound for the well-known Gap-Hamming Problem.
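
Concretely, plugging $\rho=1/\sqrt{n}$ into Theorem 1.5, the distinguishing advantage of any $d$-bit protocol is at most

O\left(|\rho|\cdot\sqrt{d}\right)=O\left(\sqrt{d/n}\right),

which is $o(1)$ whenever $d=o(n)$.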

The Gap-Hamming Problem.

In the Gap-Hamming Problem, Alice and Bob receive strings $x,y\in\{\pm 1\}^{n}$ respectively, and they want to distinguish whether $\left\langle x,y\right\rangle\leq-\sqrt{n}$ or $\left\langle x,y\right\rangle\geq\sqrt{n}$.

This is essentially the XOR-lift of the Coin Problem with $\rho=\pm 1/\sqrt{n}$, because the distribution of $(x,y)$ conditioned on $x\odot y\sim\pi^{\otimes n}_{\rho}$ with $\rho=-1/\sqrt{n}$ and $\rho=1/\sqrt{n}$ is mostly supported on the Yes and No instances of Gap-Hamming respectively. Thus, immediately from Theorem 1.5, we derive a new proof of the $\Omega(n)$ lower bound on the communication complexity of the Gap-Hamming Problem. The proof is deferred to Appendix A.

Theorem 1.6.

The randomized communication complexity of the Gap-Hamming Problem is $\Omega(n)$.

We note that there are several different proofs [18, 55, 67, 53] that obtain the above lower bound, but the perspective taken here is perhaps conceptually simpler: (1) Gap-Hamming is essentially the XOR-lift of the Gap-Majority function, and (2) any function that approximates the Gap-Majority function must have large level-one Fourier growth, whereas XOR-fibers of small-cost protocols have small Fourier growth.

1.2.2 Quantum versus Classical Communication Separation via Lifting

One natural approach to proving quantum versus classical separations in communication complexity is via lifting: Consider a function $f$ separating quantum and classical query complexity and lift it using a gadget $g$. Naturally, an algorithm computing $f$ with few queries to $z$ can be translated into a communication protocol computing $f\circ g$ where we replace each query to a bit $z_{i}$ with a short conversation that allows the calculation of $z_{i}=g(x_{i},y_{i})$. Göös, Pitassi, and Watson [26] showed that for randomized query/communication complexity and for various gadgets, this is essentially the best possible. Such results are referred to as lifting theorems.

Lifting theorems apply to different models of computation, such as deterministic decision trees [48, 25], randomized decision trees [26, 12], and more. A beautiful line of work shows how to “lift” many lower bounds in the query model to the communication model [48, 25, 23, 24, 19, 31, 69, 17, 35, 61, 54, 50, 49, 22, 39]. For quantum query complexity, only one direction (considered the “easier” direction) is known: Any quantum query algorithm for $f$ can be translated to a communication protocol for $f\circ g$ with a small logarithmic overhead [6]. It remains widely open whether the other direction holds as well. However, this query-to-communication direction for quantum, combined with the communication-to-query direction for classical, is already sufficient for lifting quantum versus classical separations from the query model to the communication model.

One drawback of this approach to proving communication complexity separations is that the state-of-the-art lifting results [12, 37] work for gadgets with alphabet size at least $n$ (recall that $n$ denotes $f$'s input length), and it is a significant challenge to reduce the alphabet size to $O(1)$ or even $\mathrm{polylog}(n)$. These large gadgets usually result in larger overheads in terms of communication rounds, communication bits, and computations for both parties. As demonstrated next, lifting with simpler gadgets like XOR allows for a simpler quantum protocol for the lifted problem.

Lifting Forrelation with XOR.

The Forrelation function introduced by [2] is defined as follows: on input $x=(x_{1},x_{2})\in\{\pm 1\}^{n}$ where $n$ is a power of $2$,

\mathrm{Forr}(x)=\frac{2}{n}\left\langle Hx_{1},x_{2}\right\rangle,

where $H$ denotes the $(n/2)\times(n/2)$ (unitary) Hadamard matrix.
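
As a small illustration of this definition (a sketch, not from the paper: it builds $H$ via the Sylvester construction, normalized to be unitary, and evaluates $\mathrm{Forr}$ on a correlated input):

```python
import numpy as np

def hadamard(m):
    """Unitary Sylvester Hadamard matrix of size m x m (m a power of 2)."""
    H = np.array([[1.0]])
    while H.shape[0] < m:
        H = np.block([[H, H], [H, -H]])
    return H / np.sqrt(m)

def forrelation(x):
    """Forr(x) = (2/n) * <H x1, x2> for x in {±1}^n, n a power of 2."""
    n = len(x)
    x1, x2 = np.asarray(x[: n // 2], float), np.asarray(x[n // 2:], float)
    return (2.0 / n) * np.dot(hadamard(n // 2) @ x1, x2)

# A correlated instance: choosing x2 to match the signs of H x1 makes
# Forr large, while independent random halves make it close to 0.
rng = np.random.default_rng(0)
x1 = rng.choice([1.0, -1.0], size=8)
x2 = np.where(hadamard(8) @ x1 >= 0, 1.0, -1.0)
print(forrelation(np.concatenate([x1, x2])))
```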

Girish, Raz, and Tal [27] studied the XOR-lift of the Forrelation problem and obtained new separations between quantum and randomized communication protocols. In more detail, they considered the partial function $\mathrm{Forr}\circ\mathrm{XOR}\colon\{\pm 1\}^{n}\times\{\pm 1\}^{n}\to\{\pm 1\}$ (overloading notation: technically, $\mathrm{Forr}\circ\mathrm{XOR}$ is the XOR-lift of the partial Boolean function which on input $x$ outputs $1$ if $\mathrm{Forr}(x)$ is large and $-1$ if $\mathrm{Forr}(x)$ is small) defined as

\mathrm{Forr}\circ\mathrm{XOR}(x,y)=\begin{cases}1&\mathrm{Forr}(x\odot y)\geq\frac{1}{200\ln(n/2)},\\ -1&\mathrm{Forr}(x\odot y)\leq\frac{1}{400\ln(n/2)},\end{cases}

and showed that if Alice and Bob use a randomized communication protocol, then they must communicate at least $\widetilde{\Omega}(n^{1/4})$ bits to compute $\mathrm{Forr}\circ\mathrm{XOR}$. In contrast, it can be solved by two entangled parties in the quantum simultaneous message passing model with a $\mathrm{polylog}(n)$-qubit communication protocol; additionally, the parties can be implemented with efficient quantum circuits.

The lower bound in [27] was obtained from a level-two Fourier growth bound (higher levels are not needed) on the XOR-fibers of classical communication protocols. Our level-two bound strengthens their bound and immediately gives an improved communication lower bound.

Theorem 1.7.

The randomized communication complexity of $\mathrm{Forr}\circ\mathrm{XOR}$ is $\widetilde{\Omega}(n^{1/3})$.

Theorem 1.7 above gives a $\mathrm{polylog}(n)$ versus $\widetilde{\Omega}(n^{1/3})$ separation between the above quantum communication model and the randomized two-party communication model, improving upon the $\mathrm{polylog}(n)$ versus $\widetilde{\Omega}(n^{1/4})$ separation from [27]. We emphasize that our separations are for players with efficient quantum running time, where the only prior separation was shown by the aforementioned work [27]. Such efficiency features can also benefit real-world implementations to demonstrate quantum advantage in experiments; for instance, one such proposal was introduced recently by Aaronson, Buhrman, and Kretschmer [3]. Without the efficiency assumption, a better $\mathrm{polylog}(n)$ versus $\widetilde{\Omega}(\sqrt{n})$ separation is known [21] (see [27, Section 1.1] for a more detailed comparison). Optimal Fourier growth bounds of $d\cdot\mathrm{polylog}(n)$ for level two, which we state later in Conjecture 1.8, would also imply such a separation with the XOR-lift of Forrelation.

Lifting $k$-Fold Forrelation with XOR.

$k$-Fold Forrelation [1] is a generalization of the Forrelation problem and was originally conjectured to be a candidate that exhibits a maximal separation between quantum and classical query complexity. In a recent work, [9] showed that the randomized query complexity of $k$-Fold Forrelation is $\widetilde{\Omega}(n^{1-1/k})$, confirming this conjecture, and a similar separation was proven in [57] for variants of $k$-Fold Forrelation. These separations, together with lifting theorems with the inner product gadget [12], imply an $O(k\log(n))$ vs $\widetilde{\Omega}(n^{1-1/k})$ separation between two-party quantum and classical communication complexity, where additionally, the number of rounds in the two-party quantum protocol is $2\cdot\lceil k/2\rceil$ (we remark that for $k=2$, this is exactly the XOR-lift of the Forrelation problem and can even be computed in the quantum simultaneous model, as shown in [27]).

Replacing the inner product gadget with the XOR gadget above would yield an improved quantum-classical communication separation where the gadget is simpler and the number of rounds required by the quantum protocol to achieve the same quantitative separation is reduced by half. Bansal and Sinha [9] showed that for any computational model, small Fourier growth for the first $O(k^{2})$ levels implies hardness of $k$-Fold Forrelation in that particular model. Thus, in conjunction with their results, to prove the above XOR lifting result for the $k$-Fold Forrelation problem, it suffices to prove the following Fourier growth bounds for XOR-fibers.

Conjecture 1.8.

Let $\mathcal{C}\colon\{\pm 1\}^{n}\times\{\pm 1\}^{n}\to\{\pm 1\}$ be a deterministic communication protocol with at most $d$ bits of communication. Let $h$ be its XOR-fiber as in Definition 1.1. Then for all $k\in\mathbb{N}$, we have that $L_{1,k}(h)\leq(\sqrt{d}\cdot\mathrm{poly}(k,\log(n)))^{k}$.

Note that these bounds are consistent with the Fourier growth of parity decision trees (or protocols that only send parities) as shown in [30].

We prove the above conjecture for the case $k=1$ and make progress on the case $k=2$. While our techniques can be extended to higher levels in a straightforward manner, the bounds obtained are farther from the conjectured ones. Thus, we defer dealing with higher levels to future work, as we believe one first needs to prove the optimal bound for level $k=2$.

In the next subsection, we give another motivation to study the above conjecture by showing a connection to lifting theorems for constant-sized gadgets.

1.2.3 General Gadgets and Fourier Growth from Lifting

Our main results are Fourier growth bounds for XOR-fibers, which correspond to XOR-lifts of functions. To complement this, we show that similar bounds hold for general lifted functions.

Let $g\colon\Sigma\times\Sigma\to\{\pm 1\}$ be a gadget and $\mathcal{C}\colon\Sigma^{n}\times\Sigma^{n}\to\{\pm 1\}$ be a communication protocol. Define the $g$-fiber of $\mathcal{C}$, denoted by $\mathcal{C}_{\downarrow g}\colon\{\pm 1\}^{n}\to[-1,1]$, as

\mathcal{C}_{\downarrow g}(z)=\operatorname*{\mathbb{E}}\left[\mathcal{C}(\bm{x},\bm{y})\,\middle|\,g(\bm{x}_{i},\bm{y}_{i})=z_{i},~\forall i\right],

where $\bm{x}$ and $\bm{y}$ are uniform over $\Sigma^{n}$. We use $L_{1,k}(g,d)$ to denote the best upper bound on the level-$k$ Fourier growth of the $g$-fibers of protocols with at most $d$ bits of communication. Using this notation, the XOR-fiber of $\mathcal{C}$ is simply $\mathcal{C}_{\downarrow\mathrm{XOR}}$, and our main results Theorems 1.2 and 1.3 can be rephrased as

L_{1,1}(\mathrm{XOR},d)\leq O\left(\sqrt{d}\right)\quad\text{and}\quad L_{1,2}(\mathrm{XOR},d)\leq O\left(d^{3/2}\log^{3}(n)\right).

In Section 7, we relate $L_{1,k}(g,d)$ to $L_{1,k}(\mathrm{XOR},d)$; the main takeaway is that, for the purpose of Fourier growth bounds, all constant-sized gadgets are equivalent.

Theorem 1.9 (Informal, see Theorem 7.5 and Theorem 7.6).

Let $g\colon\Sigma\times\Sigma\to\{\pm 1\}$ be a “balanced” gadget. Then

|\Sigma|^{-k}\cdot L_{1,k}(\mathrm{XOR},d)\leq L_{1,k}(g,d)\leq|\Sigma|^{k}\cdot L_{1,k}(\mathrm{XOR},d).

Theorem 1.9 also suggests a different approach towards Conjecture 1.8: it suffices to establish a tight Fourier growth bound for $g$-fibers for some constant-sized (in fact, polylogarithmic-sized suffices) gadget $g$, and then apply the reduction. The benefit of switching to a different gadget is that we can perhaps first prove a lifting theorem, and then appeal to the known Fourier growth bounds of (randomized) decision trees [64, 57]. See Subsection 8.1 for details.

As mentioned earlier, lifting theorems show how to simulate communication protocols of cost $d$ for lifted functions with decision trees of depth at most $O(d)$ (see, e.g., [26]). A problem at the frontier of this fruitful line of work has been establishing lifting theorems for decision trees with constant-sized gadgets. Note that the XOR gadget itself cannot admit such a generic lifting result: indeed, the parity function serves as a counterexample, since its XOR-lift has a 2-bit protocol while the parity function itself requires decision trees of depth $n$. Nevertheless, it is plausible that some larger gadget works, which would suffice for our purposes (in terms of the separations between quantum and classical communication, even restricted lifting results for the specific outer function being the Forrelation function would suffice). On the other hand, for lifting from parity decision trees, we do know an XOR-lifting theorem [31]. However, it only holds for deterministic communication protocols and has a sextic blowup in the cost.

Thus, one can see Conjecture 1.8 either as a further motivation for establishing lifting results for decision trees with constant-sized gadgets, or as a necessary milestone before proving such lifting results.

1.2.4 Pseudorandomness for Communication Protocols

We say $G\colon\{\pm 1\}^{\ell}\to\{\pm 1\}^{n}\times\{\pm 1\}^{n}$ is a pseudorandom generator (PRG) for a (randomized) communication protocol $\mathcal{C}\colon\{\pm 1\}^{n}\times\{\pm 1\}^{n}\to[-1,1]$ with error $\varepsilon$ and seed length $\ell$ if

\left|\operatorname*{\mathbb{E}}_{\bm{x},\bm{y}\sim\nu}[\mathcal{C}(\bm{x},\bm{y})]-\operatorname*{\mathbb{E}}_{\bm{r}\sim\{\pm 1\}^{\ell}}[\mathcal{C}(G(\bm{r}))]\right|\leq\varepsilon.

[32] showed that for the class of protocols sending at most $d$ communication bits, there exists an explicit PRG of error $2^{-d}$ and seed length $n+O(d)$ based on expander graphs. Note that the overhead of $n$ is inevitable even if the protocol sends only one bit, since that bit can depend arbitrarily on Alice's or Bob's input.

Combining Conjecture 1.8 with the PRG construction from [14, Theorem 4.5], we would obtain a completely different explicit PRG for this class with error $\varepsilon$ and seed length $n+d\cdot\mathrm{polylog}(n/\varepsilon)$.

Paper Organization.

An overview of our proofs is given in Section 2. In Section 3 we define necessary notation and recall useful inequalities. Section 4 explains a way to associate the Fourier growth with a martingale process. The proof of the level-one bound (Theorem 1.2) is given in Section 5, and the level-two bound (Theorem 1.3) in Section 6. The Fourier growth reductions between general gadgets are presented in Section 7. Future directions are discussed in Section 8. Missing proofs can be found in the appendix.

2 Proof Overview

We first briefly outline the proof strategy, which consists of three main components:

  • First, we show that the level-one bound can be characterized as the expected absolute value of a martingale defined as follows: Consider the random walk induced on the protocol tree when Alice and Bob are given inputs $\bm{x}$ and $\bm{y}$ uniformly from $\{\pm 1\}^{n}$. Let $\bm{X}^{(t)}\times\bm{Y}^{(t)}$ be the rectangle associated with the random walk at time $t$. The martingale process tracks the inner product $\left\langle\mu(\bm{X}^{(t)}),\mu(\bm{Y}^{(t)})\right\rangle$ where $\mu(\bm{X}^{(t)})=\operatorname*{\mathbb{E}}\left[\bm{x}\,\middle|\,\bm{x}\in\bm{X}^{(t)}\right]$ and $\mu(\bm{Y}^{(t)})=\operatorname*{\mathbb{E}}\left[\bm{y}\,\middle|\,\bm{y}\in\bm{Y}^{(t)}\right]$ are Alice's and Bob's centers of mass.

  • Second, to bound the value of the martingale, it is necessary to ensure that neither $\bm{X}^{(t)}$ nor $\bm{Y}^{(t)}$ becomes excessively elongated in any direction during the protocol execution. To measure the length of $\bm{X}^{(t)}$ in a particular direction $\theta\in\mathbb{S}^{n-1}$, we calculate $\mathbb{V}\mathrm{ar}\left[\left\langle\bm{x},\theta\right\rangle\,\middle|\,\bm{x}\in\bm{X}^{(t)}\right]$, i.e., the variance of a uniformly random $\bm{x}\in\bm{X}^{(t)}$ in the direction $\theta$. If the set is not elongated in any direction, this can be thought of as a spectral notion of almost pairwise independence. Such a notion also generalizes to almost $k$-wise independence by considering higher moments.

    To achieve the property that the sets are not elongated, one of the main novel ideas in our paper is to modify the original protocol to a new one that incorporates additional cleanup steps where the parties communicate real values $\left\langle\bm{x},\theta\right\rangle$. Through these communication steps, the sets $\bm{X}^{(t)}$ and $\bm{Y}^{(t)}$ are recursively divided into affine slices along problematic directions.

  • Last, one needs to show that the number of cleanup steps is small in order to bound the value of the martingale for the new protocol. This is the most involved part of our proof and requires considerable effort because the cleanup steps are real-valued and adaptively depend on the entire history, including the previous real values communicated.

The strategy outlined above also generalizes to level-two Fourier growth by considering higher moments and sending values of quadratic forms in the inputs. We also remark that since we view the sets $\bm{X}^{(t)}$ and $\bm{Y}^{(t)}$ above as embedded in $\mathbb{R}^{n}$ and allow the protocol to send real values, it is more natural for us to work in Gaussian space by doing a standard transformation. The rotational invariance of the Gaussian space also seems to be essential for us to obtain the optimal level-one bound without losing additional polylogarithmic factors.

We now elaborate on the above components in detail and also highlight the differences between the level-one and level-two settings. For conciseness, in the following overview we use $f\lesssim g$ to denote $f=O(g)$ and $f\gtrsim g$ to denote $f=\Omega(g)$, where $O$ and $\Omega$ only hide absolute constants.

2.1 Level-One Fourier Growth

The level-one Fourier growth of the XOR-fiber $h$ is given by

L_{1,1}(h)=\sum_{i=1}^{n}\left|\widehat{h}(\{i\})\right|=\sum_{i=1}^{n}\left|\operatorname*{\mathbb{E}}_{\bm{z}\sim\nu}[h(\bm{z})\bm{z}_{i}]\right|=\sum_{i=1}^{n}\left|\operatorname*{\mathbb{E}}_{\bm{x},\bm{y}\sim\nu}[\mathcal{C}(\bm{x},\bm{y})\bm{x}_{i}\bm{y}_{i}]\right|.

To bound the above, it suffices to bound $\sum_{i=1}^{n}\eta_{i}\cdot\operatorname*{\mathbb{E}}[\mathcal{C}(\bm{x},\bm{y})\bm{x}_{i}\bm{y}_{i}]$ for any sign vector $\eta\in\{\pm 1\}^{n}$. Here for simplicity we assume $\eta_{i}\equiv 1$ and that the probability of reaching every leaf is $\approx 2^{-d}$.
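
The reduction to sign vectors is the standard identity (for any reals $a_{1},\ldots,a_{n}$):

\sum_{i=1}^{n}|a_{i}|=\max_{\eta\in\{\pm 1\}^{n}}\sum_{i=1}^{n}\eta_{i}a_{i},\qquad\text{applied here with }a_{i}=\operatorname*{\mathbb{E}}[\mathcal{C}(\bm{x},\bm{y})\bm{x}_{i}\bm{y}_{i}],

where the maximum is attained at $\eta_{i}=\mathrm{sgn}(a_{i})$.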

A Martingale Perspective.

To evaluate the quantity $\sum_{i=1}^{n}\operatorname*{\mathbb{E}}[\mathcal{C}(\bm{x},\bm{y})\bm{x}_{i}\bm{y}_{i}]$, consider a random leaf $\bm{\ell}$ of the protocol and let $\bm{X}_{\bm{\ell}}\times\bm{Y}_{\bm{\ell}}$ be the corresponding rectangle. Since the leaf determines the answer of the protocol, denoted by $\mathcal{C}(\bm{\ell})$, the quantity above equals

\sum_{i=1}^{n}\operatorname*{\mathbb{E}}_{\bm{\ell}}\left[\mathcal{C}(\bm{\ell})\cdot\operatorname*{\mathbb{E}}[\bm{x}_{i}\mid\bm{x}\in\bm{X}_{\bm{\ell}}]\cdot\operatorname*{\mathbb{E}}[\bm{y}_{i}\mid\bm{y}\in\bm{Y}_{\bm{\ell}}]\right]=\operatorname*{\mathbb{E}}_{\bm{\ell}}[\mathcal{C}(\bm{\ell})\cdot\left\langle\mu(\bm{X}_{\bm{\ell}}),\mu(\bm{Y}_{\bm{\ell}})\right\rangle]\leq\operatorname*{\mathbb{E}}_{\bm{\ell}}[|\left\langle\mu(\bm{X}_{\bm{\ell}}),\mu(\bm{Y}_{\bm{\ell}})\right\rangle|],

where $\mu(\bm{X}_{\bm{\ell}})=\operatorname*{\mathbb{E}}\left[\bm{x}\,\middle|\,\bm{x}\in\bm{X}_{\bm{\ell}}\right]$ and $\mu(\bm{Y}_{\bm{\ell}})=\operatorname*{\mathbb{E}}\left[\bm{y}\,\middle|\,\bm{y}\in\bm{Y}_{\bm{\ell}}\right]$ are the centers of mass of the rectangle. Our goal is to bound the magnitude of the random variable $\bm{z}=\left\langle\mu(\bm{X}_{\bm{\ell}}),\mu(\bm{Y}_{\bm{\ell}})\right\rangle$.

We shall show that $\operatorname*{\mathbb{E}}_{\bm{\ell}}[|\bm{z}|]\lesssim\sqrt{d}$. Note that $|\bm{z}|$ can be as large as $d$ in the worst case — for instance, if the first $d$ coordinates of $\bm{X}_{\bm{\ell}}$ and $\bm{Y}_{\bm{\ell}}$ are fixed to the same value — thus we cannot argue for each leaf separately.

To analyze it for a random leaf, we first characterize the above as a martingale process using the tree structure of the protocol. The martingale process is defined as $\left(\bm{z}^{(t)}\right)_{t}$ where $\bm{z}^{(t)}:=\left\langle\mu(\bm{X}^{(t)}),\mu(\bm{Y}^{(t)})\right\rangle$ tracks the inner product between the centers of mass $\mu(\bm{X}^{(t)})$ and $\mu(\bm{Y}^{(t)})$ of the current rectangle $\bm{X}^{(t)}\times\bm{Y}^{(t)}$ at step $t$. Denote the martingale differences by $\Delta\bm{z}^{(t+1)}=\bm{z}^{(t+1)}-\bm{z}^{(t)}$ and note that if in the $t^{\text{th}}$ step Alice sends a message, then

\Delta\bm{z}^{(t+1)}=\left\langle\Delta\mu(\bm{X}^{(t+1)}),\mu(\bm{Y}^{(t+1)})\right\rangle,

where $\Delta\mu(\bm{X}^{(t+1)})=\mu(\bm{X}^{(t+1)})-\mu(\bm{X}^{(t)})$ is the change in Alice's center of mass. A similar expression holds if Bob sends a message. Then it suffices to bound the expected quadratic variation (see Section 3) since

\left(\operatorname*{\mathbb{E}}\left[\left|\bm{z}^{(d)}\right|\right]\right)^{2}\leq\operatorname*{\mathbb{E}}\left[\left(\bm{z}^{(d)}\right)^{2}\right]=\operatorname*{\mathbb{E}}\left[\sum_{t=0}^{d-1}\left(\Delta\bm{z}^{(t+1)}\right)^{2}\right], (2.1)

where the equality holds due to the martingale property: $\operatorname*{\mathbb{E}}\left[\Delta\bm{z}^{(t+1)}\,\middle|\,\bm{z}^{(1)},\ldots,\bm{z}^{(t)}\right]=0$.
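
To spell out the equality in Equation 2.1: since $\bm{z}^{(0)}=\left\langle\mu(\{\pm 1\}^{n}),\mu(\{\pm 1\}^{n})\right\rangle=0$, telescoping gives $\bm{z}^{(d)}=\sum_{t=0}^{d-1}\Delta\bm{z}^{(t+1)}$, and expanding the square, the cross terms vanish:

\operatorname*{\mathbb{E}}\left[\left(\bm{z}^{(d)}\right)^{2}\right]=\sum_{s,t}\operatorname*{\mathbb{E}}\left[\Delta\bm{z}^{(s+1)}\Delta\bm{z}^{(t+1)}\right]=\sum_{t=0}^{d-1}\operatorname*{\mathbb{E}}\left[\left(\Delta\bm{z}^{(t+1)}\right)^{2}\right],

since for $s<t$, $\operatorname*{\mathbb{E}}\left[\Delta\bm{z}^{(s+1)}\Delta\bm{z}^{(t+1)}\right]=\operatorname*{\mathbb{E}}\left[\Delta\bm{z}^{(s+1)}\cdot\operatorname*{\mathbb{E}}\left[\Delta\bm{z}^{(t+1)}\,\middle|\,\bm{z}^{(1)},\ldots,\bm{z}^{(t)}\right]\right]=0$.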

To obtain the desired bound, we need to bound the expected quadratic variation by $O(d)$. Note that it could be the case that a single $\Delta\bm{z}^{(t+1)}$ scales like $\sqrt{d}$. For instance, if Bob first announces his first $d$ coordinates, $y_{1},\ldots,y_{d}$, and then Alice sends the majority of $x_{1}\cdot y_{1},\ldots,x_{d}\cdot y_{d}$, then in the last step Alice's center of mass $\mu(\bm{X}^{(t+1)})$ changes by $\approx 1/\sqrt{d}$ in each of the first $d$ coordinates, and the inner product with Bob's center of mass changes by $\approx\sqrt{d}$ in a single step.

Such cases make it difficult to directly control the individual step sizes of the martingale, and we will only be able to obtain an amortized bound. It turns out, as we explain later, that such an amortized bound on the martingale can be obtained if Alice's and Bob's sets are not elongated in any direction. Therefore, we will transform the original protocol into a clean protocol by introducing real communication steps that slice along the elongated directions. For this, it will be convenient to work in Gaussian space, which also turns out to be essential in proving the optimal $O(\sqrt{d})$ bound.

Protocols in Gaussian Space.

A communication protocol in Gaussian space takes as inputs $\bm{x},\bm{y}\in\mathbb{R}^{n}$ where $\bm{x},\bm{y}$ are independently sampled from the Gaussian distribution $\gamma_{n}$. One can embed the original Boolean protocol in the Gaussian space by running the protocol on the uniformly distributed Boolean inputs $\mathrm{sgn}(\bm{x})$ and $\mathrm{sgn}(\bm{y})$, where $\mathrm{sgn}(\cdot)$ takes the sign of each coordinate. Note that any node of the protocol tree in the Gaussian space corresponds to a rectangle $X\times Y$ where $X,Y\subseteq\mathbb{R}^{n}$. Abusing notation and defining the Gaussian centers of mass as $\mu(X)=\operatorname*{\mathbb{E}}_{\bm{x}\sim\gamma_{n}}\left[\bm{x}\,\middle|\,\bm{x}\in X\right]$ and $\mu(Y)=\operatorname*{\mathbb{E}}_{\bm{y}\sim\gamma_{n}}\left[\bm{y}\,\middle|\,\bm{y}\in Y\right]$, one can associate the same martingale $(\bm{z}^{(t)})_{t}$ with the protocol in the Gaussian space:

\bm{z}^{(t)}=\left\langle\mu(\bm{X}^{(t)}),\mu(\bm{Y}^{(t)})\right\rangle.

It turns out that bounding the quadratic variation of this martingale suffices to give a bound on $L_{1,1}(h)$ (see Section 4), so we will stick to the Gaussian setting. We now describe the ideas behind the cleanup process so that the step sizes can be controlled more easily.

Cleanup with Real Communication.

The cleanup protocol runs the original protocol interspersed with some cleanup steps where Alice and Bob send real values. As outlined before, one of the goals of these cleanup steps is to ensure that the sets are not elongated in any direction, in order to control the martingale steps. In more detail, recall that we want to control

\operatorname*{\mathbb{E}}\left[(\Delta\bm{z}^{(t+1)})^{2}\,\middle|\,\bm{z}^{(1)},\ldots,\bm{z}^{(t)}\right]=\operatorname*{\mathbb{E}}\left[\left\langle\Delta\mu(\bm{X}^{(t+1)}),\mu(\bm{Y}^{(t+1)})\right\rangle^{2}\,\middle|\,\bm{z}^{(1)},\ldots,\bm{z}^{(t)}\right]

in the $t^{\text{th}}$ step where Alice speaks. There are two key underlying ideas for the cleanup steps:

  • Gram-Schmidt Orthogonalization: At each round, if the current rectangle is $\bm{X}\times\bm{Y}$, before Alice sends the actual message, she sends the inner product $\left\langle x,\mu(\bm{Y})\right\rangle$ between her input and Bob's current center of mass $\mu(\bm{Y})$. This partitions Alice's set $\bm{X}$ into affine slices orthogonal to Bob's current center of mass $\mu(\bm{Y})$. Thus the change in Alice's center of mass in later rounds is orthogonal to $\mu(\bm{Y})$, since it only takes place inside the affine slice.

    Recall that the martingale $\bm{z}^{(t)}$ is the inner product of Alice's and Bob's centers of mass, and Bob's center of mass does not change when Alice speaks. The original communication steps now do not contribute to the martingale; only the steps where the inner products are revealed do. In particular, if $t_{\mathrm{prev}}<t$ are two consecutive times where Alice revealed the inner product, then the change in Alice's center of mass is orthogonal to the change in Bob's center of mass between times $t_{\mathrm{prev}}$ and $t$. Thus, conditioned on the rectangle $\bm{X}^{(t)}\times\bm{Y}^{(t)}$ fixed by the messages until time $t$, we have, by Jensen's inequality,

    \operatorname*{\mathbb{E}}\left[(\Delta\bm{z}^{(t+1)})^{2}\,\middle|\,\bm{X}^{(t)},\bm{Y}^{(t)}\right]=\operatorname*{\mathbb{E}}\left[\left\langle\Delta\mu(\bm{X}^{(t+1)}),\mu(\bm{Y}^{(t)})-\mu(\bm{Y}^{(t_{\mathrm{prev}})})\right\rangle^{2}\,\middle|\,\bm{X}^{(t)},\bm{Y}^{(t)}\right]
    \leq\operatorname*{\mathbb{E}}\left[\left\langle\bm{x}-\mu(\bm{X}^{(t)}),\mu(\bm{Y}^{(t)})-\mu(\bm{Y}^{(t_{\mathrm{prev}})})\right\rangle^{2}\,\middle|\,\bm{X}^{(t)},\bm{Y}^{(t)}\right]. (2.2)

    Note that the quantity on the right-hand side above is the second moment of an expression of the form $\left\langle\bm{x}-\operatorname*{\mathbb{E}}[\bm{x}],v\right\rangle$; in other words, it is the variance of the random vector $\bm{x}$ along the direction $v$. To maintain a bound on this quantity, we introduce the notion of “not being elongated in any direction”.

  • Not elongated in any direction: We define the following notion to capture the fact that the random vector is not elongated in any direction: we say that a mean-zero random vector $\bm{x}^{\prime}=\bm{x}-\operatorname*{\mathbb{E}}[\bm{x}]$ in $\mathbb{R}^{n}$ is $\lambda$-pairwise clean if for every $v\in\mathbb{R}^{n}$,

    \operatorname*{\mathbb{E}}\left[\left\langle\bm{x}^{\prime},v\right\rangle^{2}\right]\leq\lambda\cdot\|v\|^{2}, (2.3)

    or equivalently, the operator norm of the covariance matrix $\operatorname*{\mathbb{E}}[\bm{x}^{\prime}\bm{x}^{\prime\top}]$ is at most $\lambda$. This can be considered a spectral notion of almost pairwise independence, since the pairwise moments are well-behaved in every direction.

If the input distribution conditioned on Alice's set $\bm{X}^{(t)}$ is $O(1)$-pairwise clean, we say that her set is pairwise clean. Based on the above ideas, after Alice sends the initial message, if her set is not yet clean, she partitions it recursively by taking affine slices and transmitting real values. More precisely, while there is a direction $\theta\in\mathbb{S}^{n-1}$ violating Equation 2.3, Alice does a cleanup of her set by sending the inner product $\left\langle x,\theta\right\rangle$. This direction is known to Bob as it only depends on Alice's current space. In addition, this cleanup does not contribute to the martingale in the future because the inner product along this direction is now fixed.

The resulting protocol is pairwise clean in the sense that at each step, Alice's current set is pairwise clean (we remark that the sets are only clean at intermediate steps where a cleanup phase ends, but we show that, because of the orthogonalization step, the other steps do not contribute to the value of the martingale). Similar arguments work for Bob.
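
The following toy sketch illustrates the cleanup loop on a finite point set (our simplifying assumptions: the set is represented by a sample of points, and the real value $\left\langle x,\theta\right\rangle$ is discretized into a binary split at the median rather than revealed exactly):

```python
import numpy as np

def cleanup(points, lam):
    """Recursively slice a point set until it is lam-pairwise clean.

    While the top eigenvalue of the centered empirical covariance exceeds
    lam, split the set along the most elongated direction theta (a coarse
    binary stand-in for Alice revealing the real value <x, theta>).
    Returns a list of clean subsets.
    """
    centered = points - points.mean(axis=0)
    cov = centered.T @ centered / len(points)
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    if eigvals[-1] <= lam:                   # lam-pairwise clean: stop
        return [points]
    theta = eigvecs[:, -1]                   # most elongated direction
    proj = points @ theta
    med = np.median(proj)
    left, right = proj <= med, proj > med
    if not right.any():                      # guard against degenerate ties
        left, right = proj < med, proj >= med
    return cleanup(points[left], lam) + cleanup(points[right], lam)

# Toy example: a Gaussian cloud stretched 10x along one axis gets sliced
# until every piece has covariance operator norm at most lam.
rng = np.random.default_rng(1)
X = rng.normal(size=(4000, 3)) * np.array([10.0, 1.0, 1.0])
parts = cleanup(X, lam=2.0)
print(len(parts))  # several slices, each one 2.0-pairwise clean
```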

Let $\bm{d}$ be the total number of communication rounds including all the cleanup steps. Then, by the above argument, and denoting by $(\bm{\tau}_{m})_{m}$ and $(\bm{\tau}^{\prime}_{m})_{m}$ the indices of the inner product steps for Alice and Bob, we can ultimately bound

\operatorname*{\mathbb{E}}\left[(\bm{z}^{(\bm{d})})^{2}\right]\lesssim\operatorname*{\mathbb{E}}\left[\sum_{m}\left\|\mu(\bm{X}^{(\bm{\tau}_{m})})-\mu(\bm{X}^{(\bm{\tau}_{m-1})})\right\|^{2}+\left\|\mu(\bm{Y}^{(\bm{\tau}^{\prime}_{m})})-\mu(\bm{Y}^{(\bm{\tau}^{\prime}_{m-1})})\right\|^{2}\right]=\operatorname*{\mathbb{E}}\left[\left\|\mu(\bm{X}^{(\bm{d})})\right\|^{2}+\left\|\mu(\bm{Y}^{(\bm{d})})\right\|^{2}\right], (2.4)

where again, the last equality follows from the martingale property. The right-hand side above can be bounded by the expected number of communication rounds $\operatorname*{\mathbb{E}}[\bm{d}]$ using the level-one inequality (see Theorem 3.1) — this inequality bounds the Euclidean norm of the center of mass of a set in terms of its Gaussian measure.
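
For orientation, the level-one inequality referred to here takes the following standard form (our paraphrase; see Theorem 3.1 in the paper for the precise statement): for a measurable $X\subseteq\mathbb{R}^{n}$ with Gaussian center of mass $\mu(X)$,

\left\|\mu(X)\right\|^{2}\lesssim\log\frac{1}{\gamma_{n}(X)},

so, for instance, a rectangle reached with probability $\approx 2^{-d}$ satisfies $\left\|\mu(\bm{X}^{(\bm{d})})\right\|^{2}+\left\|\mu(\bm{Y}^{(\bm{d})})\right\|^{2}\lesssim d$.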

Expected Number of Cleanup Steps.

Since the original communication only consists of $d$ rounds, the analysis essentially reduces to bounding the expected number of cleanup steps by $O(d)$, which is technically the most involved part of the proof.

It is implicit in the previous works on the Gap-Hamming Problem [18, 67] that large sets are not elongated in many directions: if a set $X\subseteq\mathbb{R}^{n}$ has Gaussian measure $\approx 2^{-d}$, then for a random vector $\bm{x}$ sampled from $X$, there are at most $m\lesssim d$ orthogonal directions $\theta_{1},\ldots,\theta_{m}$ such that $\operatorname*{\mathbb{E}}[\left\langle\bm{x}^{\prime},\theta_{i}\right\rangle^{2}]\gtrsim 1$ where $\bm{x}^{\prime}=\bm{x}-\operatorname*{\mathbb{E}}[\bm{x}]$. This is a consequence of the fact that the expectation of $\bm{q}=\sum_{i=1}^{m}\left\langle\bm{x}^{\prime},\theta_{i}\right\rangle^{2}$ can be bounded by $O(d)$ provided that $X$ has measure $\approx 2^{-d}$.

The above argument suggests that maybe we can clean up the set $X$ along these $O(d)$ bad orthogonal directions. However, this is not enough for our purposes: after taking an affine slice, the set may not be clean in a direction where it was clean before. Moreover, since the parties take turns to send messages and clean up, the bad directions will also depend on the entire history of the protocol, including the previous real and Boolean communication. This adaptivity makes the analysis more delicate, and to prove the optimal bound we crucially utilize the rotational symmetry of the Gaussian distribution. Indeed, the fact that a large set is not elongated in many directions holds even when we replace the Gaussian distribution with the uniform distribution on $\{\pm 1\}^{n}$, but it is unclear how to obtain an optimal level-one bound using the latter.

In the final protocol, since the parties only send Boolean bits and linear forms of their inputs, conditioned on the history of the martingale, one can still say what the distribution of the next cleanup value $\left\langle\bm{x},\theta\right\rangle$ looks like, as the Gaussian distribution is well-behaved under linear projections. We then use martingale concentration and stopping time arguments to show that the expected number of cleanup steps is indeed bounded by $O(d)$ even if the cleanup is adaptive.

We make two remarks in passing. First, we can also prove the optimal level-one bound using information-theoretic ideas, but they do not seem to generalize to the level-two setting; we therefore adopt the alternative concentration-based approach here, which is similar in spirit. Second, it is possible from our proof approach (in particular, the approach for level two described next) to derive a weaker upper bound of $\sqrt{d}\cdot\mathrm{polylog}(n)$ for level one while working directly with the uniform distribution on the hypercube.

2.2 Level-Two Fourier Growth

We start by noting that the level-two Fourier growth of the XOR-fiber $h$ is given by

L_{1,2}(h)=\sum_{i\neq j}\left|\widehat{h}(\{i,j\})\right|=\sum_{i\neq j}\left|\operatorname*{\mathbb{E}}_{\bm{z}\sim\nu}[h(\bm{z})\bm{z}_{i}\bm{z}_{j}]\right|=\sum_{i\neq j}\left|\operatorname*{\mathbb{E}}_{\bm{x},\bm{y}\sim\nu}[\mathcal{C}(\bm{x},\bm{y})\bm{x}_{i}\bm{x}_{j}\bm{y}_{i}\bm{y}_{j}]\right|.

To bound the above, it suffices to bound $\sum_{i\neq j}\eta_{ij}\cdot\operatorname*{\mathbb{E}}[\mathcal{C}(\bm{x},\bm{y})\bm{x}_{i}\bm{x}_{j}\bm{y}_{i}\bm{y}_{j}]$ for any symmetric sign matrix $(\eta_{ij})$. For this proof overview, we assume for simplicity that $\eta_{ij}\equiv 1$.

Martingales and Gram-Schmidt Orthogonalization.

Similar to the case of level one, the level-two Fourier growth also has a martingale formulation. In particular, let $\bm{X}^{(t)}$ and $\bm{Y}^{(t)}$ be Alice's and Bob's sets at time $t$ as before, and define $\sigma(\bm{X}^{(t)})=\operatorname*{\mathbb{E}}\left[\bm{x}\overset{\bullet}{\otimes}\bm{x}\,\middle|\,\bm{x}\in\bm{X}^{(t)}\right]$ and $\sigma(\bm{Y}^{(t)})=\operatorname*{\mathbb{E}}\left[\bm{y}\overset{\bullet}{\otimes}\bm{y}\,\middle|\,\bm{y}\in\bm{Y}^{(t)}\right]$ to be the $n\times n$ matrices that represent the level-two centers of mass of the two sets. Here $\bm{x}\overset{\bullet}{\otimes}\bm{y}$ denotes the tensor product $\bm{x}\otimes\bm{y}$ with the diagonal zeroed out (an $n\times n$ matrix; we will also interchangeably view $n\times n$ matrices as vectors of length $n^{2}$). To bound the level-two Fourier growth, it suffices to bound the expected quadratic variation of the martingale $\left(\bm{z}^{(t)}\right)_{t}$ defined by taking the inner product of the level-two centers of mass, $\bm{z}^{(t)}:=\left\langle\sigma(\bm{X}^{(t)}),\sigma(\bm{Y}^{(t)})\right\rangle$, where $\left\langle\cdot,\cdot\right\rangle$ is the inner product of two matrices viewed as vectors.

To this end, we again move to Gaussian space, where the inputs are $x,y\in\mathbb{R}^{n}$, and transform the protocol into a clean protocol. First, we need an analog of the Gram-Schmidt orthogonalization step — this is achieved in a natural way by Alice sending the inner product $\left\langle x\overset{\bullet}{\otimes}x,\sigma(\bm{Y}^{(t)})\right\rangle$ of $x\overset{\bullet}{\otimes}x$ with Bob's level-two center of mass, and Bob doing the same. Note that Alice and Bob are now exchanging values of quadratic polynomials in their inputs. Thus, to control the step sizes, we now need to control the second moments of quadratic forms, which naturally motivates the following spectral analogue of $4$-wise independence.

4-wise Cleanup with Quadratic Forms.

We say a random vector $\bm{x}$ is $4$-wise clean with parameter $\lambda$ if the operator norm of the $n^{2}\times n^{2}$ covariance matrix

\operatorname*{\mathbb{E}}\left[\left(\bm{x}\overset{\bullet}{\otimes}\bm{x}-\operatorname*{\mathbb{E}}\left[\bm{x}\overset{\bullet}{\otimes}\bm{x}\right]\right)\left(\bm{x}\overset{\bullet}{\otimes}\bm{x}-\operatorname*{\mathbb{E}}\left[\bm{x}\overset{\bullet}{\otimes}\bm{x}\right]\right)^{\top}\right]

is at most $\lambda$, where we view $\bm{x}\overset{\bullet}{\otimes}\bm{x}-\mathbb{E}[\bm{x}\overset{\bullet}{\otimes}\bm{x}]$ as an $n^{2}$-dimensional vector. This is equivalent to saying that for any quadratic form $\left\langle M,\bm{x}\overset{\bullet}{\otimes}\bm{x}\right\rangle$,

\mathbb{E}\left[\left\langle M,\bm{x}\overset{\bullet}{\otimes}\bm{x}-\mathbb{E}\left[\bm{x}\overset{\bullet}{\otimes}\bm{x}\right]\right\rangle^{2}\right]\leq\lambda\left\|M\right\|^{2}, (2.5)

where $\left\|M\right\|$ denotes the Euclidean norm of $M$ when viewed as a vector. Thus, this allows us to control the second moment of any quadratic polynomial (and, in particular, fourth moments of linear functions). We note that one can generalize the above spectral notion to $k$-wise independence in the natural way by looking at the covariance matrix of the tensor $\bm{x}^{\overset{\bullet}{\otimes}k}$.
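For concreteness, the following minimal Python sketch (ours, not from the paper; all function names are illustrative) estimates the smallest valid cleanness parameter in Equation 2.5 for a conditioned set, by computing the top eigenvalue of the empirical covariance matrix of $\mathrm{vec}(\bm{x}\overset{\bullet}{\otimes}\bm{x})$:

```python
# Illustrative sketch: estimating the 4-wise cleanness parameter (Eq. 2.5)
# of a set X from samples of the Gaussian conditioned on X.
import numpy as np

def dotted_tensor(x):
    """x (dotted-otimes) x: the outer product with the diagonal zeroed out."""
    M = np.outer(x, x)
    np.fill_diagonal(M, 0.0)
    return M

def fourwise_cleanness(samples):
    """Top eigenvalue of the covariance of vec(x (dotted-otimes) x).

    This is the smallest lambda satisfying Equation 2.5 over all unit-norm
    matrices M (with the empirical measure in place of the Gaussian).
    """
    vecs = np.array([dotted_tensor(x).ravel() for x in samples])
    vecs -= vecs.mean(axis=0)
    cov = vecs.T @ vecs / len(vecs)
    return np.linalg.eigvalsh(cov).max()

rng = np.random.default_rng(0)
n = 5
print(f"R^n itself: {fourwise_cleanness(rng.standard_normal((20000, n))):.2f}")
pts = rng.standard_normal((200000, n))
print(f"halfspace x_1 >= 2: {fourwise_cleanness(pts[pts[:, 0] >= 2.0]):.2f}")
```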

We say a set is $4$-wise clean with parameter $\lambda$ if Equation 2.5 holds for all $M$ with zero diagonal. (The zero-diagonal requirement is for analysis purposes only and can be assumed without loss of generality, since $\bm{x}\overset{\bullet}{\otimes}\bm{x}$ has zero diagonal anyway.) With this notion, one can define the cleanup in a manner analogous to the level-one cleanup: while there exists some $M\in\mathbb{R}^{n\times n}$ violating Equation 2.5, Alice sends the quadratic form $\left\langle x\overset{\bullet}{\otimes}x,M\right\rangle$ to Bob, until her set is $4$-wise clean with parameter $\lambda$.

Cleanup Analysis via Hanson-Wright Inequalities.

The crux of the proof is to bound the number of cleanup steps, which, together with an analysis similar to the level-one case, gives us the desired bound. We show that $m\lesssim d$ cleanup steps suffice in expectation to make the sets $4$-wise clean for $\lambda\leq d\cdot\mathrm{polylog}(n)$. Analogous to Equation 2.1 and Subsection 2.1, this gives a bound of $d^{3}\cdot\mathrm{polylog}(n)$ on the expected quadratic variation and implies $L_{1,2}(h)\leq d^{3/2}\cdot\mathrm{polylog}(n)$.

Since the parties now send values of quadratic forms, the analysis here is significantly more involved than in the level-one case, even after moving to the Gaussian setting, where one could previously use the fact that the Gaussian distribution behaves nicely under linear projections. We rely on a powerful generalization of the Hanson-Wright inequality to a Banach-space-valued setting due to Adamczak, Latała, and Meller [5]. This inequality gives a tail bound for sums of squares of quadratic forms: in particular, if $M_{1},\ldots,M_{m}$ are matrices with zero diagonal which form an orthonormal set when viewed as $n^{2}$-dimensional vectors, then the random variable $\bm{q}=\sum_{i=1}^{m}\left\langle\bm{x}\overset{\bullet}{\otimes}\bm{x},M_{i}\right\rangle^{2}$ satisfies $\operatorname{\mathbf{Pr}}_{\bm{x}\sim\gamma_{n}}[\bm{q}\geq t]\leq e^{-\Omega(\sqrt{t})}$ for any $t\gtrsim m^{2}$ (see Theorem 3.3 for a precise statement). We remark that this tail bound relies on the orthogonality of the quadratic forms and is much sharper than, for example, the bound obtained from hypercontractivity or other standard polynomial concentration inequalities.

In our setting, the matrices are chosen adaptively. In addition, the parties are sending quadratic forms in their inputs, and the distribution of the next $\left\langle\bm{x}\overset{\bullet}{\otimes}\bm{x},M\right\rangle$ conditioned on the history is hard to determine, unlike in the level-one case. To handle this, we replace the real communication with Boolean communication of finite precision $\pm 1/\mathrm{poly}(n)$. This means that whenever Alice wants to perform a cleanup $\left\langle\bm{x}\overset{\bullet}{\otimes}\bm{x},M\right\rangle$ for some $M$ known to both parties, she sends only $O(\log(n))$ bits. On the one hand, this modification is similar enough to the cleanup protocol with real messages that most of the argument carries through. On the other hand, the protocol is now completely discrete, which allows us to condition on any particular transcript.

For intuition, fix a transcript of $L=d+O(m\log(n))$ bits which has gone through $m$ cleanups. Typically, this transcript should capture $\approx 2^{-L}$ of the probability mass. More crucially, the matrices $M_{1},\ldots,M_{m}$ for the cleanups are also fixed along the transcript, and one can apply the aforementioned Hanson-Wright inequality to $\bm{q}=\sum_{i=1}^{m}\left\langle\bm{x}\overset{\bullet}{\otimes}\bm{x},M_{i}\right\rangle^{2}$. Combining the two facts, we can apply the non-adaptive tail bound above and then condition on obtaining such a typical transcript. This shows $\mathbb{E}[\bm{q}]\leq d^{2}\cdot\mathrm{polylog}(n)$. However, each quadratic form comes from a violation of Equation 2.5 and contributes at least $\lambda$ to $\bm{q}$ in expectation. This implies that $\mathbb{E}[\bm{q}]\geq\lambda\cdot m$, and by taking $\lambda=d\cdot\mathrm{polylog}(n)$, we derive that the number of cleanup steps satisfies $m\lesssim d$. This shows that the level-two Fourier growth is $O((m+d)\cdot\sqrt{\lambda})=d^{3/2}\cdot\mathrm{polylog}(n)$, completing the proof.

Note that if we could take $\lambda=\mathrm{polylog}(n)$ while keeping the same number of cleanup steps $m=d\cdot\mathrm{polylog}(n)$, then we would obtain an optimal level-two bound of $d\cdot\mathrm{polylog}(n)$. However, it is not clear how to show this with the current approach. In Subsection 8.2, we identify examples showing the tightness of our current analysis and also discuss potential ways to circumvent the obstacles within.

We remark that by replacing the Hanson-Wright inequality with its higher-degree variants and performing level-$k$ cleanups, one can analyze the level-$k$ Fourier growth in a similar way. However, since the first two levels already suffice for our applications and we believe that our level-two bound can be further improved, we do not generalize to higher levels here.

3 Preliminaries

Notation.

Throughout, $\log(\cdot)$ and $\ln(\cdot)$ denote logarithms with base $2$ and $e$ respectively. We use $\mathbb{N}=\{0,1,2,\ldots\}$ to denote the set of natural numbers including $0$. For $n\in\mathbb{N}$, we write $[n]$ to denote the set $\{1,2,\ldots,n\}$. We use the standard $O(\cdot),\Omega(\cdot),\Theta(\cdot)$ notation, and emphasize that in this paper they only hide universal constants that do not depend on any parameter.

We write $\odot$ to denote the entrywise product for vectors and matrices: in particular, for any $x,y\in\mathbb{R}^{n}$, we define $x\odot y\in\mathbb{R}^{n}$ to be the vector with $(x\odot y)_{i}=x_{i}y_{i}$ for $i\in[n]$, and similarly, for any $X,Y\in\mathbb{R}^{n\times m}$, we define $X\odot Y\in\mathbb{R}^{n\times m}$ to be the matrix with $(X\odot Y)_{ij}=X_{ij}Y_{ij}$ for $i\in[n],j\in[m]$. We use $\overset{\bullet}{\otimes}$ to denote a tensor with zeros on the diagonal, i.e., for any $x\in\mathbb{R}^{n}$, $x\overset{\bullet}{\otimes}x$ is the $n\times n$ matrix with $(x\overset{\bullet}{\otimes}x)_{ij}=x_{i}x_{j}$ if $i\neq j$ and zero if $i=j$.
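As a quick illustration of this notation (ours, not part of the paper), the two products can be implemented directly:

```python
# Illustrative implementations of the entrywise product and the
# zero-diagonal tensor defined above.
import numpy as np

def odot(A, B):
    """Entrywise product of two vectors or matrices of the same shape."""
    return A * B

def dotted_tensor(x):
    """(x (dotted-otimes) x)_{ij} = x_i x_j for i != j, and 0 if i = j."""
    M = np.outer(x, x)
    np.fill_diagonal(M, 0.0)
    return M

x = np.array([1.0, 2.0, 3.0])
print(odot(x, x))        # [1. 4. 9.]
print(dotted_tensor(x))  # off-diagonal entries x_i * x_j, zero diagonal
```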

For a vector $x\in\mathbb{R}^{n}$, we use $\left\|x\right\|$ to denote its Euclidean norm. Similarly, for a matrix $X\in\mathbb{R}^{n\times n}$, we use $\left\|X\right\|$ to denote its Euclidean norm viewing $X$ as an $n^{2}$-dimensional vector. For nonzero $x\in\mathbb{R}^{n}$ or $X\in\mathbb{R}^{n\times n}$, we define $\mathrm{unit}(x)\in\mathbb{R}^{n}$ and $\mathrm{unit}(X)\in\mathbb{R}^{n\times n}$ as the unit vectors along the directions $x$ and $X$ respectively: $\mathrm{unit}(x)=x/\left\|x\right\|$ and $\mathrm{unit}(X)=X/\left\|X\right\|$. We write $\mathbb{S}^{n-1}$ for the unit sphere in $\mathbb{R}^{n}$, and write $\mathbb{S}^{n\times n-1}$ for the unit sphere in $\mathbb{R}^{n\times n}$ where additionally the diagonal entries of the $n\times n$ matrices are zero. We use $\left\langle x,y\right\rangle$ to denote the inner product between vectors $x,y\in\mathbb{R}^{n}$, and $\left\langle X,Y\right\rangle$ to denote the inner product between matrices $X,Y\in\mathbb{R}^{n\times n}$ viewed as $n^{2}$-dimensional vectors.

Probability.

A probability space is a triple $(\Omega,\mathcal{F},\xi)$ where $\Omega$ is the sample space, $\mathcal{F}$ is a $\sigma$-algebra which describes the measurable sets (or events) in the probability space, and $\xi$ is a probability measure. We use $\bm{x}\sim\xi$ to denote a random sample distributed according to $\xi$ and $\mathbb{E}_{\bm{x}\sim\xi}[f(\bm{x})]$ to denote the expectation of a function $f$ under the measure $\xi$. For any event $S\in\mathcal{F}$, we use $\xi(S)$ to denote the measure of $S$ under $\xi$. We say an event $S$ holds almost surely if $\xi(S)=1$, i.e., the exceptions to the event have measure zero. For a measurable event $\mathcal{E}\in\mathcal{F}$, we write $\mathcal{F}\cap\{\mathcal{E}\}$ to denote the intersection of the $\sigma$-algebra $\mathcal{F}$ and the $\sigma$-algebra generated by $\mathcal{E}$.

We use $\nu_{n}$ to denote the uniform probability measure over $\{\pm 1\}^{n}$ and $\gamma_{n}$ to denote the $n$-dimensional standard Gaussian measure on $\mathbb{R}^{n}$. We say a random variable $\bm{x}\in\mathbb{R}^{n}$ is a standard Gaussian in $\mathbb{R}^{n}$ if its probability distribution is $\gamma_{n}$. We drop the subscript when the dimension is clear from context. We will also need lower-dimensional Gaussian measures: given a linear subspace $V$ of dimension $k$, there is a $k$-dimensional standard Gaussian measure on it, which we denote by $\gamma_{V}$. For any measurable subset $S\subseteq\mathbb{R}^{n}$, we define its ambient space to be the smallest affine subspace $V+t$ that contains it, where $V$ is a linear subspace of $\mathbb{R}^{n}$ and $t\in\mathbb{R}^{n}$. The relative Gaussian measure of $S$, denoted by $\gamma_{\mathrm{rel}}(S)$, is then defined to be the Gaussian measure of the set $S-t$ under $\gamma_{V}$.

Martingales.

Given a sequence of real-valued random variables $\bm{x}_{1},\bm{x}_{2},\ldots,\bm{x}_{n}$ in a probability space $(\Omega,\mathcal{F},\xi)$ and a function $f(\bm{x}_{1},\ldots,\bm{x}_{n})$ satisfying $\mathbb{E}\left[|f(\bm{x}_{1},\ldots,\bm{x}_{n})|\right]<\infty$, the sequence of random variables $\bm{z}^{(t)}=\mathbb{E}\left[f(\bm{x}_{1},\ldots,\bm{x}_{n})\,\middle|\,\mathcal{F}^{(t-1)}\right]$ is called the Doob martingale, where $\mathcal{F}^{(t-1)}$ is the $\sigma$-algebra generated by $\bm{x}_{1},\ldots,\bm{x}_{t-1}$, which should be viewed as a record of the randomness of the process until time $t-1$. The sequence $(\mathcal{F}^{(t)})_{t}$ is called a filtration. A sequence of random variables $(\bm{z}^{(t)})_{t}$ is called predictable (or adapted) with respect to $(\mathcal{F}^{(t)})_{t}$ if $\bm{z}^{(t)}$ is $\mathcal{F}^{(t)}$-measurable for every $t$, meaning that it is determined by the randomness in $\mathcal{F}^{(t)}$.

A discrete random variable $\bm{\tau}\in\mathbb{N}$ is called a stopping time with respect to the filtration $(\mathcal{F}^{(t)})_{t}$ if the event $\{\bm{\tau}=t\}\in\mathcal{F}^{(t)}$ for all $t\in\mathbb{N}$; in words, whether the event $\bm{\tau}=t$ occurs is determined by the history of the process until time $t$. All stopping times considered in this paper will be finite. The $\sigma$-algebra $\mathcal{F}^{(\bm{\tau})}$, which contains all events that imply the stopping condition, is defined as the set of all events $\mathcal{E}$ such that $\mathcal{E}\cap\{\bm{\tau}=t\}\in\mathcal{F}^{(t)}$ for all $t\in\mathbb{N}$. We also note that if one takes an increasing sequence of stopping times $(\bm{\tau}_{m})_{m}$, then the process defined by $(\bm{z}^{(\bm{\tau}_{m})})_{m}$ is also a martingale.

Let $\Delta\bm{z}^{(t)}:=\bm{z}^{(t)}-\bm{z}^{(t-1)}$ be the martingale differences. Note that $\mathbb{E}\left[\Delta\bm{z}^{(t)}\,\middle|\,\mathcal{F}^{(t-1)}\right]=0$ and thus

\mathbb{E}\left[\left(\bm{z}^{(n)}-\bm{z}^{(0)}\right)^{2}\right]=\mathbb{E}\left[\left(\sum_{t=1}^{n}\Delta\bm{z}^{(t)}\right)^{2}\right]=\mathbb{E}\left[\sum_{t=1}^{n}\left(\Delta\bm{z}^{(t)}\right)^{2}\right], (3.1)

where the cross terms disappear upon taking expectations. In other words, the martingale differences are orthogonal under taking expectations. The right-hand side above is the expected quadratic variation of the martingale $(\bm{z}^{(t)})_{t}$. If the sequence $(\bm{z}^{(t)})_{t}$ is vector-valued (resp., matrix-valued) and satisfies $\mathbb{E}\left[\Delta\bm{z}^{(t)}\,\middle|\,\mathcal{F}^{(t-1)}\right]=0$, where $0$ is the zero vector (resp., matrix), then we say it is a vector-valued (resp., matrix-valued) martingale with respect to $(\mathcal{F}^{(t)})_{t}$. Since each coordinate of a vector- or matrix-valued martingale is itself a real-valued martingale, vector- or matrix-valued martingale differences are also orthogonal under Euclidean norms:

\mathbb{E}\left[\left\|\bm{z}^{(n)}-\bm{z}^{(0)}\right\|^{2}\right]=\mathbb{E}\left[\left\|\sum_{t=1}^{n}\Delta\bm{z}^{(t)}\right\|^{2}\right]=\mathbb{E}\left[\sum_{t=1}^{n}\left\|\Delta\bm{z}^{(t)}\right\|^{2}\right]. (3.2)
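The following small simulation (ours, purely illustrative) checks the identity in Equation 3.1 for a Doob martingale with $\bm{z}^{(0)}=\mathbb{E}[f]=0$: the second moment of the final value matches the expected quadratic variation.

```python
# Illustrative check of Equation 3.1 for a Doob martingale.
import numpy as np

rng = np.random.default_rng(1)
n, trials = 8, 200000
x = rng.choice([-1.0, 1.0], size=(trials, n))
f = x.prod(axis=1) + x.sum(axis=1)   # f(x_1,...,x_n); E[f] = 0

# z^(t) = E[f | x_1..x_t] has a closed form here: the conditional
# expectation of the product is 0 unless t = n, and the conditional
# expectation of the sum is the partial sum.
z = np.zeros((trials, n + 1))        # z^(0) = E[f] = 0
for t in range(1, n + 1):
    z[:, t] = x[:, :t].sum(axis=1) + (x.prod(axis=1) if t == n else 0.0)

dz = np.diff(z, axis=1)              # martingale differences
lhs = (z[:, -1] ** 2).mean()         # E[(z^(n))^2], since z^(0) = 0
rhs = (dz ** 2).sum(axis=1).mean()   # expected quadratic variation
print(f"E[(z^(n))^2] ~ {lhs:.3f}, sum of squared increments ~ {rhs:.3f}")
```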
Useful Inequalities.

We will use the well-known level-$k$ inequality [62, 34] (see, e.g., [44, Level-$k$ Inequalities]). A statement in the Gaussian setting can be found in, e.g., [20, Lemma 2.2]. We remark that we will only use the cases $k=1$ and $k=2$ here, which we state below. (Our Theorem 3.1 is slightly different from the references, which additionally require $\mu\leq 1/e$. By Parseval's identity, the left-hand side is always at most one, so we use a slightly worse bound on the right-hand side to allow for the whole range of $\mu$.)

Below we write $\mathbf{1}_{A}$ for the indicator function of a set and $x_{S}=\prod_{i\in S}x_{i}$ for a monomial.

Theorem 3.1 (Level-$k$ Inequality).

Let $k\in\{1,2\}$. Assume $A\subseteq\mathbb{R}^{n}$ is measurable and let $\mu:=\mathbb{E}_{\bm{x}\sim\gamma}[\mathbf{1}_{A}(\bm{x})]$. Then, we have

\sum_{|S|=k}\left(\mathbb{E}_{\bm{x}\sim\gamma}\left[\mathbf{1}_{A}(\bm{x})\,\bm{x}_{S}\right]\right)^{2}\leq 2e^{2}\mu^{2}\cdot\ln^{k}(e/\mu).

In particular, if $\mu$ is nonzero, dividing both sides by $\mu^{2}$ gives the following more convenient form for $k\in\{1,2\}$:

\sum_{|S|=k}\left(\mathbb{E}_{\bm{x}\sim\gamma}\left[\bm{x}_{S}\,\middle|\,\bm{x}\in A\right]\right)^{2}\leq 2e^{2}\cdot\ln^{k}(e/\mu).
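As a sanity check (ours, not from the paper), for a halfspace $A=\{x:x_{1}\geq c\}$ both sides of Theorem 3.1 with $k=1$ have closed forms: the only nonzero level-one coefficient is $\mathbb{E}[\mathbf{1}_{A}(\bm{x})\bm{x}_{1}]=e^{-c^{2}/2}/\sqrt{2\pi}$, and $\mu=\tfrac{1}{2}\mathrm{erfc}(c/\sqrt{2})$.

```python
# Illustrative numerical check of the level-one inequality for a halfspace.
import math

def level_one_check(c):
    mu = 0.5 * math.erfc(c / math.sqrt(2))                 # Gaussian measure of A
    coeff = math.exp(-c * c / 2) / math.sqrt(2 * math.pi)  # E[1_A(x) x_1]
    lhs = coeff ** 2                                       # only S = {1} is nonzero
    rhs = 2 * math.e ** 2 * mu ** 2 * math.log(math.e / mu)
    return lhs, rhs

for c in [0.0, 1.0, 2.0, 4.0]:
    lhs, rhs = level_one_check(c)
    print(f"c={c}: sum of squared level-1 coeffs {lhs:.3e} <= bound {rhs:.3e}")
```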

We also make use of the following standard concentration inequality for sums of squares of independent standard Gaussians (see [66]).

Fact 3.2.

Let $m\in\mathbb{N}$ be arbitrary. For any $r\geq 2m$, we have $\operatorname{\mathbf{Pr}}_{\bm{x}\sim\gamma_{m}}\left[\sum_{i=1}^{m}\bm{x}_{i}^{2}\geq r\right]\leq e^{-r/4}$.

We also need a concentration inequality for sums of squares of orthogonal quadratic forms in Gaussian random variables. In particular, we prove the following inequality, which follows from a generalization of the Hanson-Wright inequality to a Banach-space-valued setting [5, Theorem 6]. Since we only need a special case that is easier to prove, we include a self-contained proof using the Gaussian isoperimetric inequality in Appendix B, following [5, Proposition 23].

Theorem 3.3.

Let $m\in\mathbb{N}$ be arbitrary. Let $M_{1},\ldots,M_{m}$ be $n\times n$ real matrices where each $M_{i}$ has zero diagonal, $\left\langle M_{i},M_{i}\right\rangle=1$, and $\left\langle M_{i},M_{j}\right\rangle=0$ for $i\neq j$. Then for any $r\geq 98m$, we have

\operatorname{\mathbf{Pr}}_{\bm{x}\sim\gamma_{n}}\left[\sum_{i=1}^{m}\left\langle\bm{x}\overset{\bullet}{\otimes}\bm{x},M_{i}\right\rangle^{2}\geq r\right]\leq\exp\left\{-\Omega\left(\frac{r}{m+\sqrt{r}}\right)\right\}.

We remark that the tail bound above holds more generally for sub-Gaussian random variables $\bm{x}$ (see [5]).
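A quick Monte Carlo experiment (ours, illustrative only; it probes moderate $r$ rather than the theorem's regime $r\geq 98m$, where the empirical tail is essentially zero at feasible sample sizes) visualizes the fast decay of the tail of $\sum_{i}\langle\bm{x}\overset{\bullet}{\otimes}\bm{x},M_{i}\rangle^{2}$ for a small orthonormal family of zero-diagonal matrices.

```python
# Illustrative Monte Carlo for Theorem 3.3: tail of a sum of squares of
# orthonormal zero-diagonal quadratic forms in a standard Gaussian vector.
import numpy as np

rng = np.random.default_rng(4)
n, m, trials = 12, 3, 200000

# Gram-Schmidt a family of m random zero-diagonal matrices.
mats = []
for _ in range(m):
    M = rng.standard_normal((n, n))
    np.fill_diagonal(M, 0.0)
    for P in mats:
        M = M - np.sum(M * P) * P   # project away previous directions
    mats.append(M / np.linalg.norm(M))

x = rng.standard_normal((trials, n))
# <x (dotted-otimes) x, M> = x^T M x, since diag(M) = 0.
q = sum(np.einsum("ti,ij,tj->t", x, M, x) ** 2 for M in mats)

for r in [20, 40, 80]:
    print(f"Pr[q >= {r}] ~ {(q >= r).mean():.1e}")
```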

4 Fourier Growth via Martingales in Gaussian Space

In this section, we reduce the question of bounding the level-one and level-two Fourier growth to that of bounding the expected quadratic variation of certain martingales. To analyze these martingales, and to prove the optimal bound in the level-one setting, it seems crucial to work in the Gaussian setting, so we first give a generic transformation from the Boolean to the Gaussian setting. We additionally allow protocols that communicate real numbers, which makes the analysis easier.

4.1 Communication Protocols in Gaussian Space

Let $\mathcal{C}:\{\pm 1\}^{n}\times\{\pm 1\}^{n}\to\{\pm 1\}$ be a communication protocol with total communication $d$ and let $h$ be its XOR-fiber as defined in Definition 1.1.

We embed the protocol in Gaussian space by allowing Alice's and Bob's inputs, $x$ and $y$ respectively, to be real vectors in $\mathbb{R}^{n}$ — the new protocol $\widetilde{\mathcal{C}}$ runs the original protocol $\mathcal{C}$ on the Boolean inputs $\mathrm{sgn}(x)$ and $\mathrm{sgn}(y)$, where $\mathrm{sgn}(v)=(\mathrm{sgn}(v_{1}),\ldots,\mathrm{sgn}(v_{n}))$ denotes the sign function applied pointwise to the coordinates of a vector $v\in\mathbb{R}^{n}$. The behavior of the protocol $\widetilde{\mathcal{C}}$ can be defined arbitrarily if any coordinate of $\mathrm{sgn}(x)$ or $\mathrm{sgn}(y)$ is zero, since such points have measure zero under the standard $n$-dimensional Gaussian measure $\gamma_{n}$.

This translation from the Boolean hypercube to Gaussian space preserves the measure of sets: for any subset $S\subseteq\{\pm 1\}^{n}$, we have $\nu_{n}(S)=\gamma_{n}\left(\left\{x\in\mathbb{R}^{n}\,\middle|\,\mathrm{sgn}(x)\in S\right\}\right)$, where $\nu_{n}$ is the uniform measure over $\{\pm 1\}^{n}$. Moreover, up to a normalizing factor, the Fourier coefficients of $h$ can also be computed from Gaussian inputs. In particular, writing $x_{S}=\prod_{i\in S}x_{i}$ for a subset $S\subseteq[n]$, we have the following fact.

Fact 4.1.

For all $S\subseteq[n]$, we have $\mathbb{E}_{\bm{z}\sim\nu_{n}}\left[h(\bm{z})\bm{z}_{S}\right]=(\pi/2)^{|S|}\,\mathbb{E}_{\bm{x},\bm{y}\sim\gamma_{n}}\left[\widetilde{\mathcal{C}}(\bm{x},\bm{y})\,\bm{x}_{S}\bm{y}_{S}\right]$.

Proof.

Note that for $\bm{x}\sim\gamma_{n}$, the random variable $\mathrm{sgn}(\bm{x})$ is distributed according to $\nu_{n}$. Thus, by the definition of the XOR-fiber $h$ and the protocol $\widetilde{\mathcal{C}}$, we have

\begin{aligned}
\mathbb{E}_{\bm{z}\sim\nu_{n}}\left[h(\bm{z})\bm{z}_{S}\right]&=\mathbb{E}_{\bm{x},\bm{y}\sim\gamma_{n}}\left[\mathcal{C}(\mathrm{sgn}(\bm{x}),\mathrm{sgn}(\bm{y}))\cdot\prod_{i\in S}\mathrm{sgn}(\bm{x}_{i})\cdot\mathrm{sgn}(\bm{y}_{i})\right]\\
&=(\pi/2)^{|S|}\,\mathbb{E}_{\bm{x},\bm{y}\sim\gamma_{n}}\left[\mathcal{C}(\mathrm{sgn}(\bm{x}),\mathrm{sgn}(\bm{y}))\cdot\prod_{i\in S}\bm{x}_{i}\cdot\bm{y}_{i}\right]\\
&=(\pi/2)^{|S|}\,\mathbb{E}_{\bm{x},\bm{y}\sim\gamma_{n}}\left[\widetilde{\mathcal{C}}(\bm{x},\bm{y})\,\bm{x}_{S}\bm{y}_{S}\right],
\end{aligned}

where the second line follows since the expected value of a standard Gaussian in $\mathbb{R}$, conditioned on its sign being fixed to $\eta$, is $\sqrt{2/\pi}\cdot\eta$ by the following calculation:

\mathbb{E}_{\bm{x}_{i}\sim\gamma}\left[\bm{x}_{i}\,\middle|\,\mathrm{sgn}(\bm{x}_{i})=\eta\right]=\eta\cdot\int_{0}^{\infty}\sqrt{\frac{2}{\pi}}\cdot r\cdot e^{-r^{2}/2}\,\mathrm{d}r=\sqrt{\frac{2}{\pi}}\cdot\eta. ∎
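To illustrate Fact 4.1 (our toy example, not from the paper), consider the one-bit protocol $\mathcal{C}(x,y)=x_{1}y_{1}$, whose XOR-fiber is $h(z)=z_{1}$, so the Fourier coefficient at $S=\{1\}$ equals $1$; a Monte Carlo estimate of $(\pi/2)\,\mathbb{E}[\widetilde{\mathcal{C}}(\bm{x},\bm{y})\bm{x}_{1}\bm{y}_{1}]$ recovers it.

```python
# Illustrative Monte Carlo check of Fact 4.1 for C(x, y) = x_1 * y_1.
import numpy as np

rng = np.random.default_rng(5)
trials = 2_000_000
x = rng.standard_normal(trials)
y = rng.standard_normal(trials)

# C~(x, y) = sgn(x_1) * sgn(y_1); estimate (pi/2) * E[C~ * x_1 * y_1].
est = (np.pi / 2) * np.mean(np.sign(x) * np.sign(y) * x * y)
print(f"(pi/2) E[C~ x1 y1] ~ {est:.3f}  (exact Fourier coefficient: 1)")
# Sanity: E[x * sgn(x)] = E|x| = sqrt(2/pi) for a standard Gaussian.
print(f"E[|x|] ~ {np.mean(np.abs(x)):.4f}, sqrt(2/pi) = {np.sqrt(2/np.pi):.4f}")
```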
Remark 4.2.

We remark that instead of the Gaussian distribution above, one can work with any distribution where the coordinates are i.i.d. and symmetric around zero. In particular, if $\xi$ is a symmetric probability measure on the real line, and $\bm{x},\bm{y}$ are independently drawn vectors in $\mathbb{R}^{n}$ with each coordinate sampled i.i.d. from $\xi$, then $\mathbb{E}_{\bm{z}\sim\nu_{n}}\left[h(\bm{z})\bm{z}_{S}\right]=c_{\xi}^{|S|}\,\mathbb{E}_{\bm{x},\bm{y}\sim\xi^{\otimes n}}\left[\widetilde{\mathcal{C}}(\bm{x},\bm{y})\,\bm{x}_{S}\bm{y}_{S}\right]$ where $c_{\xi}=(\mathbb{E}_{\bm{x}_{i}\sim\xi}[|\bm{x}_{i}|])^{-2}$. In the level-two case, we will need to work with the truncated Gaussian distribution, where each coordinate is sampled independently from the one-dimensional standard Gaussian conditioned on lying in some interval $[-T,T]$ for $T=\Omega(1)$, in which case $c_{\xi}$ is upper bounded by a universal constant.

4.2 Generalized Communication Protocols

In the protocol $\widetilde{\mathcal{C}}$ defined above, Alice's and Bob's inputs $x$ and $y$ are real vectors in $\mathbb{R}^{n}$, but in each round they still exchange a single bit based on $\mathrm{sgn}(x)$ and $\mathrm{sgn}(y)$. In order to bound the Fourier growth, it will be more convenient to define a notion of generalized communication protocols where the parties are also allowed to send real numbers with arbitrary precision in each round. To define this formally, we place certain restrictions on the real communication: in a generalized communication protocol, in each round a player with input $z\in\mathbb{R}^{n}$ can send either

(i) a bit in $\{0,1\}$ which is purely a function of the Boolean input $\mathrm{sgn}(z)$ and the previous Boolean messages, or

(ii) a real number that is a measurable function of $z$ and the previous (real or Boolean) messages.

The depth of a generalized communication protocol is defined to be the maximum number of rounds of communication.

Note that a generalized protocol also generates a "protocol tree" where, if a real number is sent in some round, the "children" of that particular "node" are indexed by all possible values in $\mathbb{R}$. A "transcript" of the protocol can be defined in an analogous way. The set of inputs that reach a particular node of this generalized protocol tree still forms a rectangle $X\times Y$ where $X,Y\subseteq\mathbb{R}^{n}$. We say that a generalized protocol $\overline{\mathcal{C}}$ is equivalent to the protocol $\widetilde{\mathcal{C}}$ if $\overline{\mathcal{C}}(x,y)=\widetilde{\mathcal{C}}(x,y)$ for every $x,y\in\mathbb{R}^{n}$ except on a set of measure zero.

We will be interested in random walks on such generalized protocol trees when the inputs $\bm{x}$ and $\bm{y}$ are sampled from a product measure $\xi_{x}\times\xi_{y}$ on $\mathbb{R}^{n}\times\mathbb{R}^{n}$ and the parties send messages according to the protocol until they reach a "leaf". The random variables corresponding to the messages until any time $t$ generate a filtration $(\mathcal{F}^{(t)})_{t}$ — this filtration can be thought of as specifying a particular node of the generalized protocol tree at depth $t$ (equivalently, a partial transcript of the protocol up to time $t$) that was sampled by the process. Conditioned on any event in $\mathcal{F}^{(t)}$ (e.g., any realization of the transcript up to time $t$), almost surely the conditional probability measure on the inputs $\bm{x},\bm{y}$ is some product measure $\xi_{x}^{(t)}\times\xi_{y}^{(t)}$ supported on a rectangle $\bm{X}^{(t)}\times\bm{Y}^{(t)}$ where $\bm{X}^{(t)},\bm{Y}^{(t)}\subseteq\mathbb{R}^{n}$. We refer to the random variable $\bm{X}^{(t)}\times\bm{Y}^{(t)}$ as the current rectangle determined by $\mathcal{F}^{(t)}$. Since we will be working with product measures on the inputs $\bm{x},\bm{y}$, the reader can think of conditioning on the filtration $\mathcal{F}^{(t)}$ as essentially conditioning on the inputs being in the rectangle $\bm{X}^{(t)}\times\bm{Y}^{(t)}$, or equivalently on a partial transcript up to time $t$.

4.3 Fourier Growth via Martingales

We now relate the Fourier growth to the quadratic variation of a martingale. Towards this end, we first note that in light of Fact 4.1, the level-$k$ Fourier growth of the XOR-fiber $h$ of the original communication protocol is given by

\begin{aligned}
L_{1,k}(h)=\sum_{S\subseteq[n],\,|S|=k}\left|\mathbb{E}_{\bm{z}\sim\nu_{n}}[h(\bm{z})\bm{z}_{S}]\right|&=(\pi/2)^{k}\sum_{S\subseteq[n],\,|S|=k}\left|\mathbb{E}_{\bm{x},\bm{y}\sim\gamma_{n}}[\overline{\mathcal{C}}(\bm{x},\bm{y})\,\bm{x}_{S}\bm{y}_{S}]\right|\\
&=(\pi/2)^{k}\max_{(\eta_{S})_{|S|=k}}\sum_{S\subseteq[n],\,|S|=k}\eta_{S}\,\mathbb{E}_{\bm{x},\bm{y}\sim\gamma_{n}}\left[\overline{\mathcal{C}}(\bm{x},\bm{y})\,\bm{x}_{S}\bm{y}_{S}\right],
\end{aligned} (4.1)

where $\overline{\mathcal{C}}$ is any generalized protocol that is equivalent to $\widetilde{\mathcal{C}}$ and $\eta_{S}\in\{\pm 1\}$.

We now express the right-hand side above as an inner product. Let $\bm{\ell}$ be a random leaf of the generalized protocol tree of $\overline{\mathcal{C}}$ induced by taking $\bm{x},\bm{y}\sim\gamma_{n}$, and let $\bm{X}_{\bm{\ell}}\times\bm{Y}_{\bm{\ell}}$ be the corresponding rectangle in the generalized protocol tree. Then,

\begin{aligned}
\sum_{S\subseteq[n],\,|S|=k}\eta_{S}\,\mathbb{E}_{\bm{x},\bm{y}\sim\gamma_{n}}\left[\overline{\mathcal{C}}(\bm{x},\bm{y})\,\bm{x}_{S}\bm{y}_{S}\right]&=\mathbb{E}_{\bm{\ell}}\left[\mathbb{E}_{\bm{x},\bm{y}\sim\gamma}\left[\sum_{S\subseteq[n],\,|S|=k}\eta_{S}\cdot\overline{\mathcal{C}}(\bm{x},\bm{y})\,\bm{x}_{S}\bm{y}_{S}\,\middle|\,(\bm{x},\bm{y})\in\bm{X}_{\bm{\ell}}\times\bm{Y}_{\bm{\ell}}\right]\right]\\
&=\mathbb{E}_{\bm{\ell}}\left[\overline{\mathcal{C}}(\bm{\ell})\,\mathbb{E}_{\bm{x},\bm{y}\sim\gamma}\left[\sum_{S\subseteq[n],\,|S|=k}\eta_{S}\cdot\bm{x}_{S}\bm{y}_{S}\,\middle|\,(\bm{x},\bm{y})\in\bm{X}_{\bm{\ell}}\times\bm{Y}_{\bm{\ell}}\right]\right]\\
&\leq\mathbb{E}_{\bm{\ell}}\left[\,\left|\sum_{S\subseteq[n],\,|S|=k}\eta_{S}\,\mathbb{E}\left[\bm{x}_{S}\,\middle|\,\bm{x}\in\bm{X}_{\bm{\ell}}\right]\cdot\mathbb{E}\left[\bm{y}_{S}\,\middle|\,\bm{y}\in\bm{Y}_{\bm{\ell}}\right]\right|\,\right],
\end{aligned} (4.2)

where the second line follows since $\bm{\ell}$ is a leaf and hence determines the output, and the third line follows since $\bm{x}$ and $\bm{y}$ are independent conditioned on being in the rectangle $\bm{X}_{\bm{\ell}}\times\bm{Y}_{\bm{\ell}}$.

Thus, specializing Equations 4.1 and 4.2 to the level-one ($k=1$) and level-two ($k=2$) cases, we get that

\begin{aligned}
L_{1,1}(h)&\leq\frac{\pi}{2}\cdot\max_{\eta}\,\mathbb{E}_{\bm{\ell}}\left[\,\left|\sum_{i=1}^{n}\eta_{i}\cdot\mathbb{E}\left[\bm{x}_{i}\,\middle|\,\bm{x}\in\bm{X}_{\bm{\ell}}\right]\cdot\mathbb{E}\left[\bm{y}_{i}\,\middle|\,\bm{y}\in\bm{Y}_{\bm{\ell}}\right]\right|\,\right],\\
L_{1,2}(h)&\leq\frac{\pi^{2}}{4}\cdot\max_{\eta}\,\mathbb{E}_{\bm{\ell}}\left[\,\left|\sum_{i,j=1}^{n}\eta_{ij}\cdot\mathbb{E}\left[\bm{x}_{i}\bm{x}_{j}\,\middle|\,\bm{x}\in\bm{X}_{\bm{\ell}}\right]\cdot\mathbb{E}\left[\bm{y}_{i}\bm{y}_{j}\,\middle|\,\bm{y}\in\bm{Y}_{\bm{\ell}}\right]\right|\,\right],
\end{aligned}

where for $L_{1,1}$ we optimize over $\eta\in\{\pm 1\}^{n}$, and for $L_{1,2}$ we optimize over $\eta$ ranging over $n\times n$ symmetric matrices with zeros on the diagonal and $\pm 1$ entries elsewhere.

To make the above more compact, we define $\mu(X)\in\mathbb{R}^{n}$ and $\sigma(X)\in\mathbb{R}^{n\times n}$ to be, respectively, the level-one and level-two centers of mass of a set $X\subseteq\mathbb{R}^{n}$:

\mu(X)=\mathbb{E}_{\bm{x}\sim\gamma_{n}}\left[\bm{x}\,\middle|\,\bm{x}\in X\right]\quad\text{and}\quad\sigma(X)=\mathbb{E}_{\bm{x}\sim\gamma_{n}}\left[\bm{x}\overset{\bullet}{\otimes}\bm{x}\,\middle|\,\bm{x}\in X\right]. (4.3)

Then, upper bounding the constants in the above inequalities ($\pi/2$ and $\pi^{2}/4$) by $4$, we get

\begin{aligned}
L_{1,1}(h)&\leq 4\cdot\max_{\eta}\,\mathbb{E}_{\bm{\ell}}\left[\left|\left\langle\mu(\bm{X}_{\bm{\ell}}),\eta\odot\mu(\bm{Y}_{\bm{\ell}})\right\rangle\right|\right],\\
L_{1,2}(h)&\leq 4\cdot\max_{\eta}\,\mathbb{E}_{\bm{\ell}}\left[\left|\left\langle\sigma(\bm{X}_{\bm{\ell}}),\eta\odot\sigma(\bm{Y}_{\bm{\ell}})\right\rangle\right|\right],
\end{aligned} (4.4)

where $\eta$ is understood to be the same as before.

Moving forward, we fix an arbitrary $\eta$ for both cases $k\in\{1,2\}$ and define a martingale process $(\bm{z}^{(t)}_{k})_{t}$ that captures the right-hand side above. For this, we note that a generalized communication protocol, where Alice's and Bob's inputs are sampled from the Gaussian distribution, naturally induces a discrete-time random walk on the corresponding (generalized) protocol tree, where at time $t$ we are at a node of depth $t$ with corresponding rectangle $\bm{X}^{(t)}\times\bm{Y}^{(t)}$. Then, we have the following proposition.

Proposition 4.3.

$\mu(\bm{X}^{(t)})$ and $\mu(\bm{Y}^{(t)})$ are vector-valued martingales taking values in $\mathbb{R}^{n}$, and $\sigma(\bm{X}^{(t)})$ and $\sigma(\bm{Y}^{(t)})$ are matrix-valued martingales taking values in $\mathbb{R}^{n\times n}$.

Note that if Alice speaks in the $t^{\text{th}}$ round, then $\mu(\bm{Y}^{(t)})$ and $\sigma(\bm{Y}^{(t)})$ do not change, and similarly, if Bob speaks, then $\mu(\bm{X}^{(t)})$ and $\sigma(\bm{X}^{(t)})$ do not change. The above proposition implies that the real-valued processes

\bm{z}^{(t)}_{1}=\left\langle\mu(\bm{X}^{(t)}),\eta\odot\mu(\bm{Y}^{(t)})\right\rangle\quad\text{and}\quad\bm{z}^{(t)}_{2}=\left\langle\sigma(\bm{X}^{(t)}),\eta\odot\sigma(\bm{Y}^{(t)})\right\rangle, (4.5)

each form a Doob martingale with respect to the natural filtration induced by the random walk on the protocol tree. Note that taking a random walk on the tree until we hit a leaf generates the marginal distribution on $\bm{\ell}$ appearing in Equation 4.4. Let $\bm{d}$ be the stopping time at which this martingale hits a leaf and stops (i.e., the depth of the random leaf). Thus, by the orthogonality of the martingale differences $\Delta\bm{z}^{(t)}_{k}=\bm{z}^{(t)}_{k}-\bm{z}^{(t-1)}_{k}$ from Equation 3.1 (noting that $\bm{z}^{(0)}_{k}=0$, since the initial centers of mass are zero), for $k\in\{1,2\}$ one can upper bound the Fourier growth in terms of the expected quadratic variation of the above martingales:

Proposition 4.4.

For $k\in\{1,2\}$, $\frac{1}{4}\cdot L_{1,k}(h)\leq\max_{\eta}\sqrt{\mathbb{E}\left[\left(\bm{z}^{(\bm{d})}_{k}\right)^{2}\right]}=\max_{\eta}\sqrt{\mathbb{E}\left[\sum_{t=1}^{\bm{d}}\left(\Delta\bm{z}^{(t)}_{k}\right)^{2}\right]}$.

The martingale implicitly depends on $\eta$ as used in Equation 4.4, hence the maximum over $\eta$. Moreover, the martingale also depends on the underlying generalized communication protocol $\overline{\mathcal{C}}$. In the next two sections, we will show that after transforming the original communication protocol into a "clean" protocol, the expected quadratic variations of $(\bm{z}^{(t)}_{1})_{t}$ and $(\bm{z}^{(t)}_{2})_{t}$ are $O(d)$ and $O(d^{3})\cdot\mathrm{polylog}(n)$ respectively. This will then imply our main theorems.
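As a worked toy instance of Proposition 4.4 (ours, illustrative), take the two-round protocol where Alice sends $\mathrm{sgn}(x_{1})$ and Bob sends $\mathrm{sgn}(y_{1})$ and the output is their product, so that $h(z)=z_{1}$ and $L_{1,1}(h)=1$. Here $\bm{z}^{(1)}_{1}=0$ and $\bm{z}^{(2)}_{1}=\mathrm{sgn}(\bm{x}_{1})\mathrm{sgn}(\bm{y}_{1})\cdot(2/\pi)$ (taking $\eta$ to be all ones), and the bound can be checked numerically:

```python
# Illustrative check of Proposition 4.4 on a toy two-round protocol.
import numpy as np

rng = np.random.default_rng(6)
trials = 100000
x1 = rng.standard_normal(trials)
y1 = rng.standard_normal(trials)
s = np.sqrt(2 / np.pi)  # |E[x_1 | sgn(x_1)]| for a standard Gaussian

# After Alice's bit: mu(X) = sgn(x_1) * s * e_1 and mu(Y) = 0, so z^(1) = 0.
# After Bob's bit:   z^(2) = sgn(x_1) * sgn(y_1) * s^2.
z_final = np.sign(x1) * np.sign(y1) * s * s
qv = np.mean(z_final ** 2)  # expected quadratic variation (z^(1) = 0)
print(f"4 * sqrt(E[(z^(d))^2]) ~ {4 * np.sqrt(qv):.3f} >= L_{{1,1}}(h) = 1")
```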

Remark 4.5.

Note that Proposition 4.3 still holds even if the input distribution is not the Gaussian distribution but some other product probability measure on the inputs $\bm{x},\bm{y}$. This also implies that $\bm{z}^{(t)}_{k}$ for $k\in\{1,2\}$ is a martingale. In particular, for the level-two case, we will need to use a truncated Gaussian distribution. In light of Remark 4.2, Proposition 4.4 still suffices for us, with a different constant in place of $1/4$. We also remark that in the level-two case we shall need to truncate the real messages used in the protocol to a finite precision, so the generalized protocols for the level-two case only have Boolean communication. However, to obtain the optimal level-one bound, allowing generalized protocols that communicate real values seems to be crucial.

5 Level-One Fourier Growth

In this section, we give a proof of Theorem 1.2, namely that $L_{1,1}(h)=O(\sqrt{d})$. We start with a $d$-round communication protocol $\widetilde{\mathcal{C}}$ over Gaussian space as defined in Subsection 4.1. Given the discussion in the previous section and Proposition 4.4, our task ultimately reduces to bounding the expected quadratic variation of the martingale that results from the protocol $\overline{\mathcal{C}}$. For example, one can simply take $\overline{\mathcal{C}}=\widetilde{\mathcal{C}}$, but, as discussed in Section 2, the individual step sizes of this martingale can be quite large in the worst case, and it is not easy to leverage cancellations to bound the quadratic variation by $O(d)$.

So, we first define a generalized communication protocol $\overline{\mathcal{C}}$ that is equivalent to the original protocol $\widetilde{\mathcal{C}}$ but has additional "cleanup" rounds where Alice and Bob reveal certain linear forms of their inputs, so that their sets are pairwise clean in the sense described in the overview. These cleanup steps allow us to keep track of the quadratic variation more easily.

5.1 Pairwise Clean Protocols

To define a clean protocol, we first define the notion of a pairwise clean set. Let $X\subseteq\mathbb{R}^{n}$. We say that the set $X$ is pairwise clean in a direction $a\in\mathbb{S}^{n-1}$ with parameter $\lambda$ if

\mathbb{E}_{\bm{x}\sim\gamma}\left[\left\langle\bm{x}-\mu(X),a\right\rangle^{2}\,\middle|\,\bm{x}\in X\right]\leq\lambda, (5.1)

where we recall that $\mu(X)=\mathbb{E}_{\bm{x}\sim\gamma}\left[\bm{x}\,\middle|\,\bm{x}\in X\right]$ is the level-one center of mass of $X$.

The above condition says that for a random vector $\bm{x}$ sampled from $\gamma$ conditioned on being in $X$, the variance along the direction $a$ is bounded by $\lambda$. We say that the set $X$ is pairwise clean (with parameter $\lambda$) if it is clean in every direction $a\in\mathbb{S}^{n-1}$. Equivalently, the operator norm of the covariance matrix of the random vector $\bm{x}$ is bounded by $\lambda$.
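The following snippet (ours, illustrative) estimates this operator norm from samples; note that conditioning can make a set dirty, e.g., the union of two far-apart slabs has large variance along $e_{1}$:

```python
# Illustrative sketch of pairwise cleanness: the top eigenvalue of the
# conditional covariance matrix is the smallest lambda for which
# Equation 5.1 holds in every direction a.
import numpy as np

def pairwise_cleanness(X):
    """Top eigenvalue of the empirical covariance of the samples in X."""
    return np.linalg.eigvalsh(np.cov(X.T)).max()

rng = np.random.default_rng(7)
pts = rng.standard_normal((400000, 4))
print(f"all of R^n:          {pairwise_cleanness(pts):.2f}")   # ~ 1
print(f"halfspace x_1 >= 2:  {pairwise_cleanness(pts[pts[:, 0] >= 2]):.2f}")
print(f"two slabs |x_1| >= 2: {pairwise_cleanness(pts[np.abs(pts[:, 0]) >= 2]):.2f}")
```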

We call a generalized communication protocol pairwise clean with parameter $\lambda$ if at the start of each new "phase" of the protocol, the corresponding rectangle $X\times Y$ is such that both $X$ and $Y$ are pairwise clean. Starting from a communication protocol $\widetilde{\mathcal{C}}$ in Gaussian space, we transform it into a pairwise clean protocol $\overline{\mathcal{C}}$ by proceeding from top to bottom and adding certain Gram-Schmidt orthogonalization and cleanup steps.

In particular, consider an intermediate node in the protocol tree of $\widetilde{\mathcal{C}}$. Before Alice sends her bit as in the original protocol $\widetilde{\mathcal{C}}$, she first performs an orthogonalization step by revealing the inner product between her input and Bob's current level-one center of mass. After this, she sends her bit according to the original protocol, and afterwards she repeatedly cleans her current set $X$ by revealing $\left\langle x,a\right\rangle\in\mathbb{R}$ while $X$ is not clean along some direction $a$ orthogonal to the previous directions. Once $X$ becomes clean, they proceed to the next round. We now describe this formally.

Construction of the pairwise clean protocol $\overline{\mathcal{C}}$ from $\widetilde{\mathcal{C}}$.

We set $\lambda=100$. The construction of the new protocol is recursive, and we first define some notation. Consider an intermediate node of the new protocol $\overline{\mathcal{C}}$ at depth $t$. We use the random variable $\bm{X}^{(t)}\subseteq\mathbb{R}^{n}$ (resp., $\bm{Y}^{(t)}\subseteq\mathbb{R}^{n}$) to denote the set of Alice's (resp., Bob's) inputs reaching the node. If Alice reveals a linear form in this step, we use $\bm{a}^{(t)}\in\mathbb{R}^{n}$ to denote the vector of the linear form; otherwise, we set $\bm{a}^{(t)}$ to be the all-zeros vector. We define $\bm{b}^{(t)}$ similarly for Bob. Throughout the protocol, we abbreviate $\bm{u}^{(t)}=\mu(\bm{X}^{(t)})$ and $\bm{v}^{(t)}=\mu(\bm{Y}^{(t)})$ for Alice's and Bob's current centers of mass respectively.

1. At the beginning, Alice receives an input $x\in\mathbb{R}^{n}$ and Bob receives an input $y\in\mathbb{R}^{n}$.

2. We initialize $t\leftarrow 0$, $\bm{X}^{(0)},\bm{Y}^{(0)}\leftarrow\mathbb{R}^{n}$, and $\bm{a}^{(0)},\bm{b}^{(0)}\leftarrow 0^{n}$.

3. For each phase $i=1,2,\ldots,d$: suppose we are starting the cleanup for a node at depth $i$ in the original protocol $\widetilde{\mathcal{C}}$ and we are at a node of depth $t$ in the new protocol $\overline{\mathcal{C}}$. If it is Alice's turn to speak in $\widetilde{\mathcal{C}}$:

(a) Orthogonalization by revealing the correlation with Bob's center of mass. Alice begins by revealing the inner product of her input $x$ with Bob's current (signed) center of mass $\eta\odot\bm{v}^{(t)}$. Since in the previous steps she has already revealed the inner products with Bob's previous centers of mass, for technical reasons we only have Alice announce the inner product with the component of $\eta\odot\bm{v}^{(t)}$ that is orthogonal to the previous directions along which she announced inner products. More formally, let $\bm{a}^{(t+1)}$ be the unit vector along the component of $\eta\odot\bm{v}^{(t)}$ orthogonal to all previous directions $\bm{a}^{(1)},\dots,\bm{a}^{(t)}$, i.e.,

\bm{a}^{(t+1)}=\mathrm{unit}\left(\eta\odot\bm{v}^{(t)}-\sum_{\tau=1}^{t}\left\langle\eta\odot\bm{v}^{(t)},\bm{a}^{(\tau)}\right\rangle\cdot\bm{a}^{(\tau)}\right).

Alice computes $\overline{\bm{c}}^{(t+1)}\leftarrow\left\langle x,\bm{a}^{(t+1)}\right\rangle$ and sends $\overline{\bm{c}}^{(t+1)}$ to Bob. Set $\bm{b}^{(t+1)}\leftarrow 0^{n}$. Increment $t$ by $1$ and go to step (b).

(b) Original communication. Alice sends the bit $\overline{\bm{c}}^{(t+1)}$ that she was supposed to send in $\widetilde{\mathcal{C}}$ based on the previous messages and the input $x$. Set $\bm{a}^{(t+1)},\bm{b}^{(t+1)}\leftarrow 0^{n}$. Increment $t$ by $1$ and go to step (c).

(c) Cleanup steps. While there exists some direction $a\in\mathbb{S}^{n-1}$ orthogonal to the previous directions (i.e., satisfying $\left\langle a,\bm{a}^{(\tau)}\right\rangle=0$ for all $\tau\in[t]$) such that $\bm{X}^{(t)}$ is not pairwise clean in direction $a$, Alice computes $\overline{\bm{c}}^{(t+1)}\leftarrow\left\langle x,a\right\rangle$ and sends it to Bob. Set $\bm{a}^{(t+1)}\leftarrow a$ and $\bm{b}^{(t+1)}\leftarrow 0^{n}$. Increment $t$ by $1$. Repeat step (c) as long as $\bm{X}^{(t)}$ is not pairwise clean; otherwise, increment $i$ by $1$ and go back to the for-loop in step 3, which starts a new phase.

If it is Bob's turn to speak, we define everything similarly with the roles of $x,\bm{a},\bm{X},\bm{v}$ switched with $y,\bm{b},\bm{Y},\bm{u}$.

4. Finally, at the end of the protocol, the value $\overline{\mathcal{C}}(x,y)$ is determined based on all the previous communication and the corresponding output it defines in $\widetilde{\mathcal{C}}$.

We note some basic properties that follow directly from the description. First, the steps 3(a), 3(b), and 3(c) always occur in sequence for each party, and we refer to such a sequence of steps as a phase for that party; there are at most $d$ phases. If a new phase starts at time $t$, then the current rectangle $\bm{X}^{(t)}\times\bm{Y}^{(t)}$ is pairwise clean for both parties by construction. Also, note that the non-zero vectors in the sequence $(\bm{a}^{(t)})_{t}$ (resp., $(\bm{b}^{(t)})_{t}$) form an orthonormal set. Finally, the Boolean communication in step 3(b) is solely determined by the original protocol and hence depends only on the previous Boolean messages.

Lastly, each phase has one step 3(a) and one step 3(b), followed by potentially many steps 3(c). The following claim shows that their number is nevertheless always finite.

Claim 5.1.

Let $\ell$ be an arbitrary leaf of the protocol $\overline{\mathcal{C}}$ and let $D(\ell)$ be its depth. Then $D(\ell)\leq 2n+2d$. Moreover, along this path there are at most $2d$ many steps 3(a) and 3(b).

Proof.

We count the number of communication steps separately:

• Steps 3(a) and 3(b). Steps 3(a) and 3(b) each occur once per phase, and there are at most $d$ phases, giving at most $2d$ such steps in total.

• Step 3(c). Each time Alice communicates $\left\langle x,a\right\rangle$ in step 3(c), the direction $a\in\mathbb{R}^{n}$ is orthogonal to all previous $\bm{a}^{(t)}$'s. Since the dimension of $\mathbb{R}^{n}$ is $n$, this happens at most $n$ times. A similar argument works for Bob.

Thus, in total we have at most $2n+2d$ steps. ∎

We will eventually show that the expected depth of the protocol $\overline{\mathcal{C}}$ is $O(d)$ when $\bm{x},\bm{y}\sim\gamma_{n}$.
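To make the cleanup dynamic concrete, here is a simplified numerical sketch (ours, not the paper's exact construction: it models the conditioned set by a finite sample cloud, models revealing the real value $\left\langle x,a\right\rangle$ by conditioning up to a small precision, and does not enforce orthogonality to previous directions):

```python
# Illustrative sketch of the cleanup loop in step 3(c), on a sample cloud.
import numpy as np

def cleanup(X, x_true, lam, eps=0.25, max_steps=50):
    """While X is not pairwise clean, reveal <x, a> along the dirtiest
    direction a (top eigenvector of the covariance), conditioning the
    cloud on the revealed value up to precision eps."""
    directions = []
    for _ in range(max_steps):
        w, V = np.linalg.eigh(np.cov(X.T))
        if w[-1] <= lam:                          # pairwise clean
            break
        a = V[:, -1]                              # dirtiest direction
        X = X[np.abs(X @ a - x_true @ a) <= eps]  # "send" <x, a>
        directions.append(a)
    return X, directions

rng = np.random.default_rng(3)
cloud = rng.standard_normal((400000, 4))
cloud = cloud[np.abs(cloud[:, 0]) >= 2.0]   # a dirty set: two far slabs
X, dirs = cleanup(cloud, x_true=cloud[0], lam=2.0)
print(f"{len(dirs)} cleanup step(s), final cleanness "
      f"{np.linalg.eigvalsh(np.cov(X.T)).max():.2f}")
```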

5.2 Bounding the Expected Quadratic Variation

Consider a random walk on the protocol tree generated by the new protocol $\overline{\mathcal{C}}$ when the parties are given independent inputs $\bm{x},\bm{y}\sim\gamma_{n}$, and consider the corresponding level-one martingale process defined in Equation 4.5. Formally, at time $t$ the process is given by

\bm{z}^{(t)}_{1}=\left\langle\bm{u}^{(t)},\eta\odot\bm{v}^{(t)}\right\rangle,

where we recall that $\bm{u}^{(t)}=\mu(\bm{X}^{(t)})$ and $\bm{v}^{(t)}=\mu(\bm{Y}^{(t)})$, and $\eta\in\{\pm 1\}^{n}$ is a fixed sign vector.

The martingale process stops once it hits a leaf of the protocol $\overline{\mathcal{C}}$. Let $\bm{d}$ denote the (stopping) time when this happens. Note that $\mathbb{E}[\bm{d}]$ is exactly the expected depth of the protocol $\overline{\mathcal{C}}$. Then, in light of Proposition 4.4, to prove Theorem 1.2 it suffices to prove the following.

Lemma 5.2.

$\mathbb{E}\left[\sum_{t=1}^{\bm{d}}\left(\Delta\bm{z}^{(t)}_{1}\right)^{2}\right]=O(d)$.

We prove this in two steps. We first show that the value of the martingale changes only during the orthogonalization step 3(a). This is because, in each phase, Alice's change of center of mass in steps 3(b) and 3(c) is always orthogonal to $\eta\odot\bm{v}^{(t)}$, so these steps do not change the value of the martingale $\bm{z}^{(t)}_{1}$, as discussed in Section 2. Moreover, recalling Subsection 2.1, since Alice's set was pairwise clean just before she sent the message in step 3(a), the expected change $\mathbb{E}\left[\left(\Delta\bm{z}^{(t+1)}_{1}\right)^{2}\right]$ can be bounded in terms of the squared norm of the change that occurred in $\bm{u}^{(t)}$ between the current round and the last round where Alice was in step 3(a). A similar argument works for Bob.

Formally, this is encapsulated by the next lemma, for which we need some additional definitions. Let $(\mathcal{F}^{(t)})_{t}$ be the natural filtration induced by the random walk on the generalized protocol tree, with respect to which $\bm{z}^{(t)}_{1}$ is a Doob martingale and $\bm{u}^{(t)},\bm{v}^{(t)}$ form vector-valued martingales (recall Proposition 4.3). Note that $\mathcal{F}^{(t)}$ fixes all the rectangles encountered during times $0,\ldots,t$, and thus for $\tau\leq t$ the random variables $\bm{u}^{(\tau)},\bm{v}^{(\tau)},\bm{z}^{(\tau)}_{1}$ are determined; in particular, they are $\mathcal{F}^{(t)}$-measurable. Recall that $\lambda=100$ is the cleanup parameter. Below we assume without loss of generality that Alice speaks first, and in particular, Alice speaks in step 3(a) for the first time at time zero.

Lemma 5.3 (Step Size).

Let $0=\bm{\tau}_{1}<\bm{\tau}_{2}<\cdots\leq\bm{d}$ be the sequence of stopping times with $\bm{\tau}_{m}$ being the index of the round where Alice speaks in step 3(a) for the $m^{\text{th}}$ time, or $\bm{d}$ if there is no such round. Then, for any integer $m\geq 2$,

\mathbb{E}\left[\left(\Delta\bm{z}^{(\bm{\tau}_{m}+1)}_{1}\right)^{2}\,\middle|\,\mathcal{F}^{(\bm{\tau}_{m})}\right]\leq\lambda\cdot\left\|\bm{v}^{(\bm{\tau}_{m})}-\bm{v}^{(\bm{\tau}_{m-1})}\right\|^{2},

and moreover, for any $t\in\mathbb{N}$, we have that

\mathbb{E}\left[\left(\Delta\bm{z}^{(t+1)}_{1}\right)^{2}\,\middle|\,\mathcal{F}^{(t)},\,\bm{\tau}_{m-1}<t<\bm{\tau}_{m},\,\text{Alice speaks at time }t\right]=0.

A similar statement holds when Bob speaks, with $\bm{v}$ replaced by $\bm{u}$ and the sequence $(\bm{\tau}_{m})$ replaced by $(\bm{\tau}^{\prime}_{m})$, where $\bm{\tau}^{\prime}_{m}$ is the index of the round where Bob speaks in step 3(a) for the $m^{\text{th}}$ time, or $\bm{d}$ if there is no such round.

In particular, the steps 3(b) and 3(c) do not contribute to the quadratic variation; only the steps 3(a) do. Also, since Alice and Bob each start in step 3(a) the first time they speak, we note that $\bm{u}^{(\bm{\tau}_{1})}$ and $\bm{v}^{(\bm{\tau}^{\prime}_{1})}$ are their initial centers of mass, which are both zero.

We prove the above lemma in Subsection 5.3 and continue here with the bound on the quadratic variation. Using Lemma 5.3, we have

\mathbb{E}\left[\sum_{t=1}^{\bm{d}}\left(\Delta\bm{z}^{(t)}_{1}\right)^{2}\right]\leq\lambda\cdot\mathbb{E}\left[\sum_{m\geq 2}\left\|\bm{v}^{(\bm{\tau}_{m})}-\bm{v}^{(\bm{\tau}_{m-1})}\right\|^{2}+\left\|\bm{u}^{(\bm{\tau}^{\prime}_{m})}-\bm{u}^{(\bm{\tau}^{\prime}_{m-1})}\right\|^{2}\right].

On the other hand, by the orthogonality of vector-valued martingale differences from Equation 3.2, we have

\mathbb{E}\left[\sum_{m\geq 2}\left\|\bm{v}^{(\bm{\tau}_{m})}-\bm{v}^{(\bm{\tau}_{m-1})}\right\|^{2}\right]=\mathbb{E}\left[\left\|\bm{v}^{(\bm{d})}\right\|^{2}\right].

A similar statement holds for $(\bm{u}^{(t)})_{t}$. Therefore,

\mathbb{E}\left[\sum_{t=1}^{\bm{d}}\left(\Delta\bm{z}^{(t)}_{1}\right)^{2}\right]\leq\lambda\cdot\left(\mathbb{E}\left[\left\|\bm{u}^{(\bm{d})}\right\|^{2}\right]+\mathbb{E}\left[\left\|\bm{v}^{(\bm{d})}\right\|^{2}\right]\right). (5.2)

We prove the following lemma in Subsection 5.4 to upper bound the quantity on the right-hand side above. Loosely speaking, by an application of the level-one inequality (see Theorem 3.1), the lemma below ultimately boils down to a bound on the expected number of cleanup steps.

Lemma 5.4 (Final Center of Mass).

$\mathbb{E}\left[\left\|\bm{u}^{(\bm{d})}\right\|^{2}+\left\|\bm{v}^{(\bm{d})}\right\|^{2}\right]=O(d)$.

Since $\lambda=100$, plugging the bound from the above lemma into Equation 5.2 readily implies Lemma 5.2. Together with Proposition 4.4, this completes the proof of Theorem 1.2.

5.3 Bounds on Step Sizes (Proof of Lemma 5.3)

Let us abbreviate $\bm{\tau}=\bm{\tau}_{m}$. Observe that

\begin{aligned}
\mathbb{E}\left[\left(\Delta\bm{z}^{(\bm{\tau}+1)}_{1}\right)^{2}\,\middle|\,\mathcal{F}^{(\bm{\tau})}\right]&=\mathbb{E}\left[\left\langle\bm{u}^{(\bm{\tau}+1)}-\bm{u}^{(\bm{\tau})},\eta\odot\bm{v}^{(\bm{\tau})}\right\rangle^{2}\,\middle|\,\mathcal{F}^{(\bm{\tau})}\right]\\
&=\mathbb{E}\left[\left\langle\bm{u}^{(\bm{\tau}+1)},\eta\odot\bm{v}^{(\bm{\tau})}\right\rangle^{2}-\left\langle\bm{u}^{(\bm{\tau})},\eta\odot\bm{v}^{(\bm{\tau})}\right\rangle^{2}\,\middle|\,\mathcal{F}^{(\bm{\tau})}\right],
\end{aligned} (5.3)

where the second line uses that $(\bm{u}^{(t)})_{t}$ is a vector-valued martingale and thus $\mathbb{E}\left[\bm{u}^{(\bm{\tau}+1)}\,\middle|\,\mathcal{F}^{(\bm{\tau})}\right]=\bm{u}^{(\bm{\tau})}$.

We first consider the case that at time $\bm{\tau}$ a new phase starts for Alice. By construction, this means that the current rectangle $\bm{X}^{(\bm{\tau})}\times\bm{Y}^{(\bm{\tau})}$ determined by $\mathcal{F}^{(\bm{\tau})}$ is pairwise clean with parameter $\lambda$, and since Alice is in step 3(a) at the start of a new phase, $\bm{a}^{(\bm{\tau}+1)}$ is chosen to be the (normalized) component of $\eta\odot\bm{v}^{(\bm{\tau})}$ that is orthogonal to the previous directions $\bm{a}^{(0)},\ldots,\bm{a}^{(\bm{\tau})}$. Let $\bm{\beta}^{(\bm{\tau}+1)}:=\left\langle\eta\odot\bm{v}^{(\bm{\tau})},\bm{a}^{(\bm{\tau}+1)}\right\rangle$ be the length of this component before normalization. Note that $\bm{\beta}^{(\bm{\tau}+1)}$ is $\mathcal{F}^{(\bm{\tau})}$-measurable (i.e., it is determined by $\mathcal{F}^{(\bm{\tau})}$).

We now claim that the components of $\bm{u}^{(\bm{\tau}+1)}$ and $\bm{u}^{(\bm{\tau})}$ are the same along any of the previous directions $\bm{a}^{(0)},\ldots,\bm{a}^{(\bm{\tau})}$, so in Equation 5.3 they cancel out and the only relevant quantity is the component in the direction $\bm{a}^{(\bm{\tau}+1)}$. This follows since, in all previous steps $t\leq\bm{\tau}$, Alice has already fixed $\langle x,\bm{a}^{(t)}\rangle$. This implies that for the sets $\bm{X}^{(\bm{\tau})}$ and $\bm{X}^{(\bm{\tau}+1)}$ determined by $\mathcal{F}^{(\bm{\tau}+1)}$, the inner products with all the previous directions $\bm{a}^{(0)},\ldots,\bm{a}^{(\bm{\tau})}$ are fixed over the choice of $x$ from either set. Formally, for any $x\in\bm{X}^{(\bm{\tau})}$ and $x^{\prime}\in\bm{X}^{(\bm{\tau}+1)}$, it holds that $\langle x,\bm{a}^{(t)}\rangle=\langle x^{\prime},\bm{a}^{(t)}\rangle$ for any $t\leq\bm{\tau}$. In particular, since $\bm{u}^{(\bm{\tau})}=\mu(\bm{X}^{(\bm{\tau})})$ and $\bm{u}^{(\bm{\tau}+1)}=\mu(\bm{X}^{(\bm{\tau}+1)})$ are the corresponding centers of mass, we have that

\left\langle\bm{u}^{(\bm{\tau}+1)},\bm{a}^{(t)}\right\rangle=\left\langle\bm{u}^{(\bm{\tau})},\bm{a}^{(t)}\right\rangle\text{ for all }t\leq\bm{\tau}. (5.4)

This, together with Equation 5.3 and the fact that $\bm{\beta}^{(\bm{\tau}+1)}$ is determined by $\mathcal{F}^{(\bm{\tau})}$, implies that

\mathbb{E}\left[\left(\Delta\bm{z}^{(\bm{\tau}+1)}_{1}\right)^{2}\,\middle|\,\mathcal{F}^{(\bm{\tau})}\right]=\left(\bm{\beta}^{(\bm{\tau}+1)}\right)^{2}\cdot\mathbb{E}\left[\left\langle\bm{u}^{(\bm{\tau}+1)},\bm{a}^{(\bm{\tau}+1)}\right\rangle^{2}-\left\langle\bm{u}^{(\bm{\tau})},\bm{a}^{(\bm{\tau}+1)}\right\rangle^{2}\,\middle|\,\mathcal{F}^{(\bm{\tau})}\right]. (5.5)

We now bound the term outside the expectation by the change in the center of mass 𝒗()\bm{v}^{(\cdot)} and the term inside the expectation by the fact that the set is pairwise clean.

Term Outside the Expectation.

Recall that 𝒂(𝝉+1)\bm{a}^{(\bm{\tau}+1)} is chosen to be the (normalized) component of η𝒗(𝝉)\eta\odot\bm{v}^{(\bm{\tau})} that is orthogonal to the span of 𝒂(0),,𝒂(𝝉)\bm{a}^{(0)},\ldots,\bm{a}^{(\bm{\tau})}. Since η𝒗(𝝉m1)\eta\odot\bm{v}^{(\bm{\tau}_{m-1})} is in the span of 𝒂(1),,𝒂(𝝉m1+1)\bm{a}^{(1)},\ldots,\bm{a}^{(\bm{\tau}_{m-1}+1)} and 𝝉m1+1𝝉=𝝉m\bm{\tau}_{m-1}+1\leq\bm{\tau}=\bm{\tau}_{m}, it is orthogonal to 𝒂(𝝉+1)\bm{a}^{(\bm{\tau}+1)}. Hence,

𝜷(𝝉+1)=η𝒗(𝝉),𝒂(𝝉+1)=η(𝒗(𝝉)𝒗(𝝉m1)),𝒂(𝝉+1).\bm{\beta}^{(\bm{\tau}+1)}=\left\langle\eta\odot\bm{v}^{(\bm{\tau})},\bm{a}^{(\bm{\tau}+1)}\right\rangle=\left\langle\eta\odot\left(\bm{v}^{(\bm{\tau})}-\bm{v}^{(\bm{\tau}_{m-1})}\right),\bm{a}^{(\bm{\tau}+1)}\right\rangle.

Since 𝒂(𝝉+1)\bm{a}^{(\bm{\tau}+1)} is a unit vector and each entry of η\eta is in {±1}\{\pm 1\}, this implies that

(𝜷(𝝉+1))2𝒗(𝝉)𝒗(𝝉m1)2.\left(\bm{\beta}^{(\bm{\tau}+1)}\right)^{2}\leq\left\|\bm{v}^{(\bm{\tau})}-\bm{v}^{(\bm{\tau}_{m-1})}\right\|^{2}. (5.6)
Term Inside the Expectation.

Since (𝒖(τ))(\bm{u}^{(\tau)}) is a vector-valued martingale with respect to (τ)\mathcal{F}^{(\tau)}, and 𝒂(τ+1)\bm{a}^{(\tau+1)} is (τ)\mathcal{F}^{(\tau)}-measurable (determined by (τ)\mathcal{F}^{(\tau)}), we have that

𝔼[𝒖(𝝉+1),𝒂(𝝉+1)2𝒖(𝝉),𝒂(𝝉+1)2|(𝝉)]=𝔼[𝒖(τ+1)𝒖(τ),𝒂(𝝉+1)2|(τ)].\displaystyle\operatorname*{\mathbb{E}}\left[\left\langle\bm{u}^{(\bm{\tau}+1)},\bm{a}^{(\bm{\tau}+1)}\right\rangle^{2}-\left\langle\bm{u}^{(\bm{\tau})},\bm{a}^{(\bm{\tau}+1)}\right\rangle^{2}\,\middle|\,\mathcal{F}^{(\bm{\tau})}\right]=\operatorname*{\mathbb{E}}\left[\left\langle\bm{u}^{(\tau+1)}-\bm{u}^{(\tau)},\bm{a}^{(\bm{\tau}+1)}\right\rangle^{2}\,\middle|\,\mathcal{F}^{(\tau)}\right].

Since Alice is in step 3(a), her message fixes x,𝒂(𝝉+1)\left\langle x,\bm{a}^{(\bm{\tau}+1)}\right\rangle at time 𝝉\bm{\tau} for every x𝑿(𝝉+1)x\in\bm{X}^{(\bm{\tau}+1)}. Thus,

𝔼[𝒖(𝝉+1)𝒖(𝝉),𝒂(𝝉+1)2|(𝝉)]\displaystyle\operatorname*{\mathbb{E}}\left[\left\langle\bm{u}^{(\bm{\tau}+1)}-\bm{u}^{(\bm{\tau})},\bm{a}^{(\bm{\tau}+1)}\right\rangle^{2}\,\middle|\,\mathcal{F}^{(\bm{\tau})}\right] =𝔼[𝔼𝒙γ[𝒙|𝒙𝑿(𝝉+1)]𝒖(τ),𝒂(𝝉+1)2|(𝝉)]\displaystyle=\operatorname*{\mathbb{E}}\left[\left\langle\operatorname*{\mathbb{E}}_{\bm{x}\sim\gamma}\left[\bm{x}\,\middle|\,\bm{x}\in\bm{X}^{(\bm{\tau}+1)}\right]-\bm{u}^{(\tau)},\bm{a}^{(\bm{\tau}+1)}\right\rangle^{2}\,\middle|\,\mathcal{F}^{(\bm{\tau})}\right]
=𝔼[𝔼𝒙γ[𝒙𝒖(𝝉),𝒂(𝝉+1)2|𝒙𝑿(𝝉+1)]|(𝝉)]\displaystyle=\operatorname*{\mathbb{E}}\left[\operatorname*{\mathbb{E}}_{\bm{x}\sim\gamma}\left[\left\langle\bm{x}-\bm{u}^{(\bm{\tau})},\bm{a}^{(\bm{\tau}+1)}\right\rangle^{2}\,\middle|\,\bm{x}\in\bm{X}^{(\bm{\tau}+1)}\right]\,\middle|\,\mathcal{F}^{(\bm{\tau})}\right]
=𝔼[𝒙𝒖(𝝉),𝒂(𝝉+1)2|(𝝉)],\displaystyle=\operatorname*{\mathbb{E}}\left[\left\langle\bm{x}-\bm{u}^{(\bm{\tau})},\bm{a}^{(\bm{\tau}+1)}\right\rangle^{2}\,\middle|\,\mathcal{F}^{(\bm{\tau})}\right], (5.7)

where the last line follows from the tower property of conditional expectation.

Recall that 𝒖(𝝉)=μ(𝑿(𝝉))\bm{u}^{(\bm{\tau})}=\mu(\bm{X}^{(\bm{\tau})}) is the center of mass. Moreover, the unit vector 𝒂(τ+1)\bm{a}^{(\tau+1)} is determined by (τ)\mathcal{F}^{(\tau)} and also the conditional distribution of 𝒙\bm{x} conditioned on (τ)\mathcal{F}^{(\tau)} is that of 𝒙γ\bm{x}\sim\gamma conditioned on 𝒙𝑿(τ)\bm{x}\in\bm{X}^{(\tau)}. Thus, using the fact that 𝑿(𝝉)\bm{X}^{(\bm{\tau})} is pairwise clean since Alice is in step 3(a), the right hand side in Equation 5.7 is at most λ\lambda.

Final Bound.

Substituting the above in Equation 5.5, we have

𝔼[(Δ𝒛1(𝝉+1))2|(𝝉)]\displaystyle\operatorname*{\mathbb{E}}\left[\left(\Delta\bm{z}^{(\bm{\tau}+1)}_{1}\right)^{2}\,\middle|\,\mathcal{F}^{(\bm{\tau})}\right] λ(𝜷(𝝉+1))2λ𝒗(𝝉)𝒗(𝝉m1)2,\displaystyle\leq\lambda\cdot\left(\bm{\beta}^{(\bm{\tau}+1)}\right)^{2}\leq\lambda\cdot\left\|\bm{v}^{(\bm{\tau})}-\bm{v}^{(\bm{\tau}_{m-1})}\right\|^{2},

where the second inequality follows from Equation 5.6. This completes the proof of the first statement.

For the moreover part, let us condition on the event 𝝉m1<t<𝝉m\bm{\tau}_{m-1}<t<\bm{\tau}_{m} where Alice speaks at time tt. Note that such tt must all lie in the same phase of the protocol where Alice is the only one speaking. So, Bob’s center of mass does not change from the time 𝝉m1\bm{\tau}_{m-1} till tt, i.e., 𝒗(t+1)=𝒗(𝝉m1)\bm{v}^{(t+1)}=\bm{v}^{(\bm{\tau}_{m-1})}. Thus we have Δ𝒛1(t+1)=𝒖(t+1)𝒖(t),η𝒗(𝝉m1)\Delta\bm{z}^{(t+1)}_{1}=\left\langle\bm{u}^{(t+1)}-\bm{u}^{(t)},\eta\odot\bm{v}^{(\bm{\tau}_{m-1})}\right\rangle. Analogous to Equation 5.4, the components of Alice’s center of mass along the previous directions are fixed. Thus 𝒖(t+1),𝒂(r)=𝒖(t),𝒂(r)\left\langle\bm{u}^{(t+1)},\bm{a}^{(r)}\right\rangle=\left\langle\bm{u}^{(t)},\bm{a}^{(r)}\right\rangle for all rtr\leq t. Furthermore, by construction, η𝒗(𝝉m1)\eta\odot\bm{v}^{(\bm{\tau}_{m-1})} lies in the linear subspace spanned by 𝒂(0),,𝒂(𝝉m1+1)\bm{a}^{(0)},\ldots,\bm{a}^{(\bm{\tau}_{m-1}+1)}. Therefore, since 𝝉m1+1t\bm{\tau}_{m-1}+1\leq t, it follows that Δ𝒛1(t+1)=0\Delta\bm{z}^{(t+1)}_{1}=0.

5.4 Expected Norm of Final Center of Mass (Proof of Lemma 5.4)

Let 𝑯A=𝑯A(𝒅)\bm{H}_{A}=\bm{H}_{A}^{(\bm{d})} be the (random) linear subspace spanned by the vectors 𝒂(0),,𝒂(𝒅)\bm{a}^{(0)},\ldots,\bm{a}^{(\bm{d})} and similarly, let 𝑯B=𝑯B(𝒅)\bm{H}_{B}=\bm{H}_{B}^{(\bm{d})} be the linear subspace spanned by the vectors 𝒃(0),,𝒃(𝒅)\bm{b}^{(0)},\ldots,\bm{b}^{(\bm{d})}. For any linear subspace VV of n\mathbb{R}^{n}, we denote by 𝚷V\bm{\Pi}_{V} and 𝚷V\bm{\Pi}_{V^{\bot}} the projectors on the subspace VV and its orthogonal complement VV^{\bot} respectively. Then, we have that

𝒖(𝒅)2=𝚷HA𝒖(𝒅)2+𝚷HA𝒖(𝒅)2 and 𝒗(𝒅)2=𝚷HB𝒗(𝒅)2+𝚷HB𝒗(𝒅)2.\left\|\bm{u}^{(\bm{d})}\right\|^{2}=\left\|\bm{\Pi}_{H_{A}}\bm{u}^{(\bm{d})}\right\|^{2}+\left\|\bm{\Pi}_{H_{A}^{\bot}}\bm{u}^{(\bm{d})}\right\|^{2}\text{ and }\left\|\bm{v}^{(\bm{d})}\right\|^{2}=\left\|\bm{\Pi}_{H_{B}}\bm{v}^{(\bm{d})}\right\|^{2}+\left\|\bm{\Pi}_{H_{B}^{\bot}}\bm{v}^{(\bm{d})}\right\|^{2}.
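This is simply the Pythagorean identity for orthogonal projections. A minimal numpy sketch (with an arbitrary random subspace standing in for 𝑯A\bm{H}_{A}) verifying it:

import numpy as np

rng = np.random.default_rng(1)
n, k = 8, 3
Q, _ = np.linalg.qr(rng.normal(size=(n, k)))    # orthonormal basis of a k-dimensional subspace H
P = Q @ Q.T                                     # projector Pi_H
u = rng.normal(size=n)

total = np.dot(u, u)
split = np.dot(P @ u, P @ u) + np.dot(u - P @ u, u - P @ u)
assert np.isclose(total, split)                 # ||u||^2 = ||Pi_H u||^2 + ||Pi_{H^perp} u||^2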

Note that the non-zero vectors in (𝒂(t))t(\bm{a}^{(t)})_{t} and (𝒃(t))t(\bm{b}^{(t)})_{t} form an orthonormal basis for the subspaces 𝑯A\bm{H}_{A} and 𝑯B\bm{H}_{B} respectively. Moreover, for each t𝒅t\leq\bm{d}, the inner product x,𝒂(t)\left\langle x,\bm{a}^{(t)}\right\rangle is fixed for every x𝑿(𝒅)x\in\bm{X}^{(\bm{d})} and the inner product y,𝒃(t)\left\langle y,\bm{b}^{(t)}\right\rangle is also fixed for every y𝒀(𝒅)y\in\bm{Y}^{(\bm{d})} where 𝑿(𝒅)×𝒀(𝒅)\bm{X}^{(\bm{d})}\times\bm{Y}^{(\bm{d})} is the current rectangle determined by (𝒅)\mathcal{F}^{(\bm{d})}. In particular, since 𝒖(𝒅)\bm{u}^{(\bm{d})} is the center of mass of 𝑿(𝒅)\bm{X}^{(\bm{d})}, this implies that

𝚷HA𝒖(𝒅)2=t=1𝒅𝒖(𝒅),𝒂(t)2\displaystyle\left\|\bm{\Pi}_{H_{A}}\bm{u}^{(\bm{d})}\right\|^{2}=\sum_{t=1}^{\bm{d}}\left\langle\bm{u}^{(\bm{d})},\bm{a}^{(t)}\right\rangle^{2} =t=1𝒅(𝔼𝒙γ[𝒙,𝒂(t)|𝒙𝑿(𝒅)])2\displaystyle=\sum_{t=1}^{\bm{d}}\left(\operatorname*{\mathbb{E}}_{\bm{x}\sim\gamma}\left[\left\langle\bm{x},\bm{a}^{(t)}\right\rangle\,\middle|\,\bm{x}\in\bm{X}^{(\bm{d})}\right]\right)^{2}
=t=1𝒅𝔼𝒙γ[𝒙,𝒂(t)2|𝒙𝑿(𝒅)],\displaystyle=\sum_{t=1}^{\bm{d}}\operatorname*{\mathbb{E}}_{\bm{x}\sim\gamma}\left[\left\langle\bm{x},\bm{a}^{(t)}\right\rangle^{2}\,\middle|\,\bm{x}\in\bm{X}^{(\bm{d})}\right],

where the second line follows from the inner product being fixed in 𝑿(𝒅)\bm{X}^{(\bm{d})}. Therefore, we have

𝒖(𝒅)2=t=1𝒅𝔼𝒙γ[𝒙,𝒂(t)2|𝒙𝑿(𝒅)]𝒑A+𝚷HA𝒖(𝒅)2𝒒A.\left\|\bm{u}^{(\bm{d})}\right\|^{2}=\underbrace{\sum_{t=1}^{\bm{d}}{\operatorname*{\mathbb{E}}_{\bm{x}\sim\gamma}\left[\left\langle\bm{x},\bm{a}^{(t)}\right\rangle^{2}\,\middle|\,\bm{x}\in\bm{X}^{(\bm{d})}\right]}}_{\bm{p}_{A}}+\underbrace{\left\|\bm{\Pi}_{H_{A}^{\bot}}\bm{u}^{(\bm{d})}\right\|^{2}}_{\bm{q}_{A}}.

In an analogous fashion,

𝒗(𝒅)2=t=1𝒅𝔼𝒚γ[𝒚,𝒃(t)2|𝒚𝒀(𝒅)]𝒑B+𝚷HB𝒗(𝒅)2𝒒B.\left\|\bm{v}^{(\bm{d})}\right\|^{2}=\underbrace{\sum_{t=1}^{\bm{d}}{\operatorname*{\mathbb{E}}_{\bm{y}\sim\gamma}\left[\left\langle\bm{y},\bm{b}^{(t)}\right\rangle^{2}\,\middle|\,\bm{y}\in\bm{Y}^{(\bm{d})}\right]}}_{\bm{p}_{B}}+\underbrace{\left\|\bm{\Pi}_{H_{B}^{\bot}}\bm{v}^{(\bm{d})}\right\|^{2}}_{\bm{q}_{B}}.

We next show that both 𝔼[𝒑A+𝒑B]\operatorname*{\mathbb{E}}[\bm{p}_{A}+\bm{p}_{B}] and 𝔼[𝒒A+𝒒B]\operatorname*{\mathbb{E}}[\bm{q}_{A}+\bm{q}_{B}] are at most O(d)O(d). The former follows from the stopping-time and concentration arguments laid out in the overview, which show that there cannot be too many orthogonal directions in which 𝔼[𝒙,𝒂(t)2]\operatorname*{\mathbb{E}}\left[\left\langle\bm{x},\bm{a}^{(t)}\right\rangle^{2}\right] is large. The latter follows from an application of level-one inequalities.

We will bound the norm of the projection on the subspaces 𝑯A\bm{H}_{A} and 𝑯B\bm{H}_{B}, which corresponds to the quantity 𝔼[𝒑A+𝒑B]\operatorname*{\mathbb{E}}[\bm{p}_{A}+\bm{p}_{B}], in Subsection 5.4.1 and bound the norm of the projection on the orthogonal subspaces 𝑯A\bm{H}_{A}^{\bot} and 𝑯B\bm{H}_{B}^{\bot}, which corresponds to the quantity 𝔼[𝒒A+𝒒B]\operatorname*{\mathbb{E}}[\bm{q}_{A}+\bm{q}_{B}], in Subsection 5.4.2.

5.4.1 Projection on the Subspaces 𝑯A\bm{H}_{A} and 𝑯B\bm{H}_{B}

We shall show that the expected squared norm of the final center of mass when projected on the subspaces 𝑯A\bm{H}_{A} and 𝑯B\bm{H}_{B} is

𝔼[𝒑A+𝒑B]=O(d).\operatorname*{\mathbb{E}}[\bm{p}_{A}+\bm{p}_{B}]=O(d).

Towards this end, define the random variable 𝒌t=𝒌t(𝒙,𝒚)=𝒙,𝒂(t)2+𝒚,𝒃(t)2\bm{k}_{t}=\bm{k}_{t}(\bm{x},\bm{y})=\left\langle\bm{x},\bm{a}^{(t)}\right\rangle^{2}+\left\langle\bm{y},\bm{b}^{(t)}\right\rangle^{2} for each tt\in\mathbb{N}. Note that the vectors 𝒂(t)\bm{a}^{(t)} are chosen adaptively, depending on the previous inner products 𝒙,𝒂(τ)\left\langle\bm{x},\bm{a}^{(\tau)}\right\rangle for τ<t\tau<t as well as the Boolean communication bits from step 3(b), and are thus functions of 𝒙\bm{x} and 𝒚\bm{y} here as well. Observe that

𝔼[𝒑A+𝒑B]=𝔼[t=1𝒅𝔼[𝒌t|(𝒅)]]=𝔼𝒙,𝒚γ[t=1𝒅𝒌t].\operatorname*{\mathbb{E}}\left[\bm{p}_{A}+\bm{p}_{B}\right]=\operatorname*{\mathbb{E}}\left[\sum_{t=1}^{\bm{d}}\operatorname*{\mathbb{E}}\left[\bm{k}_{t}\,\middle|\,\mathcal{F}^{(\bm{d})}\right]\right]=\operatorname*{\mathbb{E}}_{\bm{x},\bm{y}\sim\gamma}\left[\sum_{t=1}^{\bm{d}}\bm{k}_{t}\right].

We now divide the time sequence into successive intervals of different lengths r4dr\cdot 4d for r=1,2,r=1,2,\ldots. Then we bound the expected sum of 𝒌t\bm{k}_{t} within each time interval by O(rd)O(rd). We further argue that the probability that the stopping time 𝒅\bm{d} lies in the rr-th interval is at most 22r2\cdot 2^{-r}. In particular, for rr\in\mathbb{N}, letting interval Ir={(r2)4d+1,,(r+12)4d}I_{r}=\left\{\binom{r}{2}\cdot 4d+1,\ldots,\binom{r+1}{2}\cdot 4d\right\}, which is of length 4dr4dr, we show the following.

Claim 5.5.

For any rr\in\mathbb{N}, we have

𝔼𝒙,𝒚γ[tIr𝒌t|𝒅>(r2)4d]20dr+4ln(1𝐏𝐫[𝒅>(r2)4d]).\operatorname*{\mathbb{E}}_{\bm{x},\bm{y}\sim\gamma}\left[\sum_{t\in I_{r}}\bm{k}_{t}\,\middle|\,\bm{d}>\binom{r}{2}\cdot 4d\right]\leq 20dr+4\ln\left(\dfrac{1}{\operatorname*{\mathbf{Pr}}\left[\bm{d}>\binom{r}{2}\cdot 4d\right]}\right).

We shall prove the above claim later since it is the most involved part of the proof. The previous claim readily implies the following probability bounds.

Claim 5.6.

For any rr\in\mathbb{N}, we have 𝐏𝐫[𝒅>(r2)4d]22r\operatorname*{\mathbf{Pr}}\left[\bm{d}>\binom{r}{2}\cdot 4d\right]\leq 2\cdot 2^{-r}.

Proof of Claim 5.6.

We bound 𝐏𝐫[𝒅>(r2)4d]\operatorname*{\mathbf{Pr}}\left[\bm{d}>\binom{r}{2}\cdot 4d\right] by induction on rr. The claim trivially holds for r=1r=1.

Now we proceed to analyze the event 𝒅>(r+12)4d\bm{d}>\tbinom{r+1}{2}\cdot 4d. Observe that Claim 5.1 implies that there are at most 2d2d many steps 3(a) and 3(b) throughout the protocol. Thus if the event above occurs, there are at least 4dr2d2dr4dr-2d\geq 2dr many time steps tIrt\in I_{r} where the process is in step 3(c).

By the definition of the cleanup step, if X×YX\times Y is a rectangle determined by (t1){𝒅>(r2)4d}\mathcal{F}^{(t-1)}\cap\{\bm{d}>\binom{r}{2}\cdot 4d\} where the process is in step 3(c) and Alice speaks (it suffices to consider such events since we have a product measure on 𝑿(t)×𝒀(t)\bm{X}^{(t)}\times\bm{Y}^{(t)} conditioned on (t)\mathcal{F}^{(t)}, and 𝒅\bm{d} is a stopping time and hence (t)\mathcal{F}^{(t)}-measurable, i.e., determined by the randomness in (t)\mathcal{F}^{(t)}), then

𝔼𝒙γ[𝒌t|(𝒙,𝒚)X×Y]=𝔼𝒙γ[𝒙,𝒂(t)2|𝒙X]𝔼𝒙γ[𝒙μ(X),𝒂(t)2|𝒙X]λ,\operatorname*{\mathbb{E}}_{\bm{x}\sim\gamma}\left[\bm{k}_{t}\,\middle|\,(\bm{x},\bm{y})\in X\times Y\right]=\operatorname*{\mathbb{E}}_{\bm{x}\sim\gamma}\left[\left\langle\bm{x},\bm{a}^{(t)}\right\rangle^{2}\,\middle|\,\bm{x}\in X\right]\geq\operatorname*{\mathbb{E}}_{\bm{x}\sim\gamma}\left[\left\langle\bm{x}-\mu(X),\bm{a}^{(t)}\right\rangle^{2}\,\middle|\,\bm{x}\in X\right]\geq\lambda,

where λ=100\lambda=100 is the cleanup parameter and μ(X)=𝔼𝒙γ[𝒙|𝒙X]\mu(X)=\operatorname*{\mathbb{E}}_{\bm{x}\sim\gamma}[\bm{x}~{}|~{}\bm{x}\in X] is the center of mass. This is because 𝒂(t)\bm{a}^{(t)} is chosen to be a unit vector in a direction where the current set (conditioned on the history) is not pairwise clean. A similar statement holds if Bob speaks in step 3(c) for the random variable 𝒚,𝒃(t)2\left\langle\bm{y},\bm{b}^{(t)}\right\rangle^{2} where 𝒚\bm{y} is sampled from γ\gamma conditioned on YY.

By the tower property of conditional expectation, the above implies that

1002dr𝐏𝐫[𝒅>(r+12)4d|𝒅>(r2)4d]𝔼[tIr𝒌t|𝒅>(r2)4d].100\cdot 2dr\cdot\operatorname*{\mathbf{Pr}}\left[\bm{d}>{\textstyle\binom{r+1}{2}}\cdot 4d\,\middle|\,\bm{d}>{\textstyle\binom{r}{2}}\cdot 4d\right]\leq\operatorname*{\mathbb{E}}\left[\sum_{t\in I_{r}}\bm{k}_{t}\,\middle|\,\bm{d}>{\textstyle\binom{r}{2}}\cdot 4d\right].

Recall that Claim 5.5 implies that the right hand side is at most 20dr+4ln(1𝐏𝐫[𝒅>(r2)4d])\leq 20dr+4\ln\left(\frac{1}{\operatorname*{\mathbf{Pr}}[\bm{d}>\tbinom{r}{2}\cdot 4d]}\right). We consider two cases:

  1. (i)

    if 𝐏𝐫[𝒅>(r2)4d]2r\operatorname*{\mathbf{Pr}}[\bm{d}>\binom{r}{2}\cdot 4d]\leq 2^{-r}, then clearly 𝐏𝐫[𝒅>(r+12)4d]2r\operatorname*{\mathbf{Pr}}[\bm{d}>\binom{r+1}{2}\cdot 4d]\leq 2^{-r} as well, as required;

  2. (ii)

    otherwise 𝐏𝐫[𝒅>(r2)4d]2r\operatorname*{\mathbf{Pr}}[\bm{d}>\binom{r}{2}\cdot 4d]\geq 2^{-r}, in which case 20dr+4ln(1𝐏𝐫[𝒅>(r2)4d])20dr+4r20dr+4\ln\left(\frac{1}{\operatorname*{\mathbf{Pr}}[\bm{d}>\tbinom{r}{2}\cdot 4d]}\right)\leq 20dr+4r, and it follows that

    𝐏𝐫[𝒅>(r+12)4d|𝒅>(r2)4d]1/2,\operatorname*{\mathbf{Pr}}\left[\bm{d}>\textstyle\binom{r+1}{2}\cdot 4d\,\middle|\,\bm{d}>\textstyle\binom{r}{2}\cdot 4d\right]\leq 1/2,

    and by induction this implies that 𝐏𝐫[𝒅>(r+12)4d]1/2𝐏𝐫[𝒅>(r2)4d]2r\operatorname*{\mathbf{Pr}}\left[\bm{d}>\textstyle\binom{r+1}{2}\cdot 4d\right]\leq 1/2\cdot\operatorname*{\mathbf{Pr}}\left[\bm{d}>\textstyle\binom{r}{2}\cdot 4d\right]\leq 2^{-r}.∎
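The induction pattern used in this proof, namely a non-increasing sequence of probabilities that at least halves whenever it exceeds 2^{-r}, mechanically forces the claimed 2·2^{-r} decay; a small illustrative sketch (the random decay factors below are hypothetical stand-ins for the actual conditional probabilities):

import random

random.seed(0)
for _ in range(1000):
    p = 1.0                               # p plays the role of Pr[d > C(r,2)*4d]
    for r in range(1, 30):
        assert p <= 2 * 2.0 ** -r         # the conclusion of Claim 5.6
        if p >= 2.0 ** -r:
            p *= 0.5 * random.random()    # case (ii): the tail probability at least halves
        else:
            p *= random.random()          # case (i): any non-increase suffices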

These claims imply that

𝔼[𝒑A+𝒑B]\displaystyle\operatorname*{\mathbb{E}}[\bm{p}_{A}+\bm{p}_{B}] 𝔼[r=01[𝒅>(r2)4d]tIr𝒌t]\displaystyle\leq\operatorname*{\mathbb{E}}\left[\sum_{r=0}^{\infty}1\left[\bm{d}>{\textstyle\binom{r}{2}}\cdot 4d\right]\cdot\sum_{t\in I_{r}}\bm{k}_{t}\right]
=r=0𝐏𝐫[𝒅>(r2)4d]𝔼[tIr𝒌t|𝒅>(r2)4d]\displaystyle=\sum_{r=0}^{\infty}\operatorname*{\mathbf{Pr}}[\bm{d}>{\textstyle\binom{r}{2}}\cdot 4d]\cdot\operatorname*{\mathbb{E}}\left[\sum_{t\in I_{r}}\bm{k}_{t}\,\middle|\,\bm{d}>{\textstyle\binom{r}{2}}\cdot 4d\right]
r=0(21rO(rd)+4𝐏𝐫[𝒅>(r2)4d]ln(1𝐏𝐫[𝒅>(r2)4d]))\displaystyle\leq\sum_{r=0}^{\infty}\left(2^{1-r}\cdot O(rd)+4\cdot\operatorname*{\mathbf{Pr}}[\bm{d}>{\textstyle\binom{r}{2}}\cdot 4d]\cdot\ln\left(\tfrac{1}{\operatorname*{\mathbf{Pr}}\left[\bm{d}>{\textstyle\binom{r}{2}}\cdot 4d\right]}\right)\right)
r=0(21rO(rd)+O((r+1)2r))O(d),\displaystyle\leq\sum_{r=0}^{\infty}\left(2^{1-r}\cdot O(rd)+O\left((r+1)2^{-r}\right)\right)\leq O(d),

where the last line uses the fact that xln(1/x)O((r+1)2r)x\ln(1/x)\leq O((r+1)2^{-r}) for 0x22r0\leq x\leq 2\cdot 2^{-r} and rr\in\mathbb{N}. This proves the desired bound on 𝔼[𝒑A+𝒑B]\operatorname*{\mathbb{E}}[\bm{p}_{A}+\bm{p}_{B}] assuming Claim 5.5, which we prove next.
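Before turning to that proof, the elementary estimate used in the last step can be checked numerically; a quick sketch (the constant 2 below is an ad hoc choice that suffices):

import numpy as np

# check x*ln(1/x) <= 2*(r+1)*2^{-r} on 0 <= x <= 2*2^{-r}
for r in range(1, 40):
    xs = np.linspace(1e-12, 2 * 2.0 ** -r, 10_000)
    assert np.all(xs * np.log(1 / xs) <= 2 * (r + 1) * 2.0 ** -r)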

Proof of Claim 5.5.

To prove the claim, we need to analyze the expectation of tIr𝒌t\sum_{t\in I_{r}}\bm{k}_{t} under 𝒙,𝒚\bm{x},\bm{y} sampled from γ\gamma conditioned on the event 𝒅>(r2)4d\bm{d}>\binom{r}{2}\cdot 4d.

We first describe an equivalent way of sampling from this distribution which will be easier for analysis. First, we recall that the definition of the cleanup protocol implies that the Boolean communication in 𝒞¯\overline{\mathcal{C}} is solely determined by the previous Boolean communication, since it is specified by the original protocol 𝒞~\widetilde{\mathcal{C}} (and thus 𝒞\mathcal{C}) before the cleanup.

Let us fix any Boolean string c{0,1}c\in\{0,1\}^{*} that is a valid Boolean transcript in the original communication protocol 𝒞~\widetilde{\mathcal{C}}. This defines a rectangle Xc×Ycn×nX_{c}\times Y_{c}\subseteq\mathbb{R}^{n}\times\mathbb{R}^{n} consisting of all pairs of inputs to Alice and Bob that result in the Boolean transcript cc in 𝒞~\widetilde{\mathcal{C}}. If we sample 𝒙,𝒚γ\bm{x},\bm{y}\sim\gamma conditioned on 𝒅>(r2)4d\bm{d}>\binom{r}{2}\cdot 4d and output the unique (𝑿c,𝒀c)(\bm{X}_{c},\bm{Y}_{c}) such that (𝒙,𝒚)𝑿c×𝒀c(\bm{x},\bm{y})\in\bm{X}_{c}\times\bm{Y}_{c}, we obtain a distribution on rectangles. We use γ(Xc×Yc|𝒅>(r2)4d)\gamma(X_{c}\times Y_{c}\,|\,\bm{d}>\binom{r}{2}\cdot 4d) to denote the probability of obtaining Xc×YcX_{c}\times Y_{c} by this sampling process so that cγ(Xc×Yc|𝒅>(r2)4d)=1\sum_{c}\gamma(X_{c}\times Y_{c}\,|\,\bm{d}>\binom{r}{2}\cdot 4d)=1.

Now consider the following two-stage sampling process. First, we sample a rectangle Xc×YcX_{c}\times Y_{c} according to the above distribution, and then we sample the inputs 𝒙,𝒚\bm{x},\bm{y} from γn\gamma_{n} conditioned on the event that {(𝒙,𝒚)Xc×Yc}{𝒅>(r2)4d}\{(\bm{x},\bm{y})\in X_{c}\times Y_{c}\}\wedge\{\bm{d}>\binom{r}{2}\cdot 4d\}. We shall show the following claim for any rectangle Xc×YcX_{c}\times Y_{c} that could be sampled in the first step.

Claim 5.7.

𝔼𝒙,𝒚γ[tIr𝒌t|𝒅>4d(r2),(𝒙,𝒚)Xc×Yc]12dr+4ln(1𝐏𝐫[𝒅>4d(r2),(𝒙,𝒚)Xc×Yc])\operatorname*{\mathbb{E}}_{\bm{x},\bm{y}\sim\gamma}\left[\sum_{t\in I_{r}}\bm{k}_{t}\,\middle|\,\bm{d}>4d\tbinom{r}{2},(\bm{x},\bm{y})\in X_{c}\times Y_{c}\right]\leq 12dr+4\ln\left(\tfrac{1}{\operatorname*{\mathbf{Pr}}[\bm{d}>4d\tbinom{r}{2},(\bm{x},\bm{y})\in X_{c}\times Y_{c}]}\right).

Assuming the above, and taking an expectation over Xc×YcX_{c}\times Y_{c} drawn with probability γ(Xc×Yc|𝒅>(r2)4d)\gamma(X_{c}\times Y_{c}\,|\,\bm{d}>\binom{r}{2}\cdot 4d), we immediately obtain Claim 5.5:

𝔼𝒙,𝒚γ[tIr𝒌t|𝒅>(r2)4d]\displaystyle\operatorname*{\mathbb{E}}_{\bm{x},\bm{y}\sim\gamma}\left[\sum_{t\in I_{r}}\bm{k}_{t}\,\middle|\,\bm{d}>{\textstyle\binom{r}{2}}\cdot 4d\right]
12dr+4c{0,1},|c|dγ(Xc×Yc|𝒅>(r2)4d)(ln(1γ(Xc×Yc|𝒅>(r2)4d))+ln(1𝐏𝐫[𝒅>(r2)4d]))\displaystyle\leq 12dr+4\cdot\sum_{\begin{subarray}{c}c\in\{0,1\}^{*},|c|\leq d\end{subarray}}\gamma(X_{c}\times Y_{c}|\bm{d}>{\textstyle\binom{r}{2}}\cdot 4d)\cdot\left(\ln\left(\tfrac{1}{\gamma(X_{c}\times Y_{c}|\bm{d}>\binom{r}{2}\cdot 4d)}\right)+\ln\left(\tfrac{1}{\operatorname*{\mathbf{Pr}}[\bm{d}>\binom{r}{2}\cdot 4d]}\right)\right)
12dr+4ln(3d)+4ln(1𝐏𝐫[𝒅>(r2)4d])\displaystyle\leq 12dr+4\cdot\ln(3^{d})+4\cdot\ln\left(\tfrac{1}{\operatorname*{\mathbf{Pr}}[\bm{d}>\binom{r}{2}\cdot 4d]}\right) (by concavity of ln()\ln(\cdot))
20dr+4ln(1𝐏𝐫[𝒅>(r2)4d]).\displaystyle\leq 20dr+4\cdot\ln\left(\tfrac{1}{\operatorname*{\mathbf{Pr}}[\bm{d}>\binom{r}{2}\cdot 4d]}\right).

To complete the proof, we now prove Claim 5.7.

Proof of Claim 5.7.

Fix any cc such that γ(Xc×Yc|𝒅>(r2)4d)>0\gamma(X_{c}\times Y_{c}\,|\,\bm{d}>\binom{r}{2}\cdot 4d)>0. We will bound the expectation of the quantity tIr𝒌t=tIr𝒙,𝒂(t)2+𝒚,𝒃(t)2\sum_{t\in I_{r}}\bm{k}_{t}=\sum_{t\in I_{r}}\left\langle\bm{x},\bm{a}^{(t)}\right\rangle^{2}+\left\langle\bm{y},\bm{b}^{(t)}\right\rangle^{2} where 𝒙,𝒚\bm{x},\bm{y} are sampled from γn\gamma_{n} conditioned on the event that {(𝒙,𝒚)Xc×Yc}{𝒅>(r2)4d}\{(\bm{x},\bm{y})\in X_{c}\times Y_{c}\}\wedge\{\bm{d}>\binom{r}{2}\cdot 4d\}. Note that 𝒂(t),𝒃(t),𝒅\bm{a}^{(t)},\bm{b}^{(t)},\bm{d} are functions of the previous messages of the protocol and hence also of the inputs 𝒙,𝒚\bm{x},\bm{y}. Once we condition on the above event, the Boolean communication is also fixed to be cc.

To analyze the above conditioning, we first do a thought experiment and consider a different protocol that takes standard Gaussian inputs (without any conditioning) and show a tail bound for the random variable tIr𝒌t\sum_{t\in I_{r}}\bm{k}_{t} for this new protocol. In the last step, we will use it to compute the expectation we ultimately want.

Protocol 𝒞c\mathcal{C}_{c}.

The protocol 𝒞c\mathcal{C}_{c} always communicates according to the fixed transcript cc in a Boolean communication step and otherwise according to the cleanup protocol 𝒞¯\overline{\mathcal{C}} on any input x,yx,y. Consider the random walk on this new protocol tree where the inputs 𝒙,𝒚γ\bm{x},\bm{y}\sim\gamma (without any conditioning). Let (𝒢(t))t(\mathcal{G}^{(t)})_{t} be the associated filtration of the new protocol 𝒞c\mathcal{C}_{c} which can be identified with the collection of all partial transcripts till time tt. Note that the vectors 𝒂(t)\bm{a}^{(t)} and 𝒃(t)\bm{b}^{(t)} in this new protocol are determined only by the previous real communication since the Boolean communication is fixed to cc. This also implies that the vectors 𝒂(t)\bm{a}^{(t)} and 𝒃(t)\bm{b}^{(t)} form a predictable sequence with respect to the filtration (𝒢(t))t(\mathcal{G}^{(t)})_{t}. Moreover, by the definition of the protocol the next non-zero vector 𝒂()\bm{a}^{(\cdot)} is chosen to be a unit vector orthogonal to the previously chosen 𝒂()\bm{a}^{(\cdot)}’s and the same holds for the vectors 𝒃()\bm{b}^{(\cdot)}.

We denote by 𝒌t(c)\bm{k}_{t}^{(c)} the random variable that captures 𝒌t\bm{k}_{t} for the protocol 𝒞c\mathcal{C}_{c}, i.e., 𝒌t(c)=𝒙,𝒂(t)2+𝒚,𝒃(t)2\bm{k}_{t}^{(c)}=\left\langle\bm{x},\bm{a}^{(t)}\right\rangle^{2}+\left\langle\bm{y},\bm{b}^{(t)}\right\rangle^{2} for 𝒙,𝒚γ\bm{x},\bm{y}\sim\gamma and 𝒂(t),𝒃(t)\bm{a}^{(t)},\bm{b}^{(t)} defined by the protocol 𝒞c\mathcal{C}_{c}. Observe that if (𝒙,𝒚)Xc×Yc(\bm{x},\bm{y})\in X_{c}\times Y_{c} then 𝒌t(c)=𝒌t\bm{k}_{t}^{(c)}=\bm{k}_{t}.

Consider the behavior of the protocol 𝒞c\mathcal{C}_{c} at some fixed time tt. The nice thing about the protocol 𝒞c\mathcal{C}_{c} is that conditioned on all previous real messages for τ<t\tau<t, both 𝒙\bm{x} and 𝒚\bm{y} are distributed as standard Gaussians on affine subspaces of n\mathbb{R}^{n} (defined by the previous messages). Then, at time tt, since 𝒂(t)\bm{a}^{(t)} is orthogonal to the directions used in all previous real messages, it follows that the distribution of 𝒙,𝒂(t)\left\langle\bm{x},\bm{a}^{(t)}\right\rangle conditioned on any event in 𝒢(t1)\mathcal{G}^{(t-1)} is a standard Gaussian, independent of the history, whenever 𝒂(t)\bm{a}^{(t)} is non-zero. The same holds for 𝒚,𝒃(t)\left\langle\bm{y},\bm{b}^{(t)}\right\rangle as well. This last fact uses that the projections of a multi-variate standard Gaussian γn\gamma_{n} in orthonormal directions yield independent real-valued standard Gaussians.

This implies that each new 𝒙,𝒂(t)2\left\langle\bm{x},\bm{a}^{(t)}\right\rangle^{2} and 𝒚,𝒃(t)2\left\langle\bm{y},\bm{b}^{(t)}\right\rangle^{2} is an independent chi-squared random variable conditioned on the history (up to depth (r2)4d\binom{r}{2}\cdot 4d) of the random walk. Therefore, Fact 3.2 implies that

𝐏𝐫𝒙,𝒚γ[tIr𝒌t(c)(𝒙,𝒚)2|Ir|+s|𝒢((r2)4d)]es/4.\operatorname*{\mathbf{Pr}}_{\bm{x},\bm{y}\sim\gamma}\left[\sum_{t\in I_{r}}\bm{k}_{t}^{(c)}(\bm{x},\bm{y})\geq 2|I_{r}|+s\,\middle|\,\mathcal{G}^{(\binom{r}{2}\cdot 4d)}\right]\leq e^{-s/4}.

Since |Ir|4dr|I_{r}|\leq 4dr, we have 𝐏𝐫𝒙,𝒚γ[tIr𝒌t(c)(𝒙,𝒚)8dr+s]es/4.\operatorname*{\mathbf{Pr}}_{\bm{x},\bm{y}\sim\gamma}\left[\sum_{t\in I_{r}}\bm{k}^{(c)}_{t}(\bm{x},\bm{y})\geq 8dr+s\right]\leq e^{-s/4}.
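The chi-squared tail invoked here can be sanity-checked by simulation; a small Monte Carlo sketch with placeholder values for the interval length and the slack parameter:

import numpy as np

rng = np.random.default_rng(2)
m, s = 40, 20.0                                     # |I_r| and the slack s
sums = rng.chisquare(df=1, size=(100_000, m)).sum(axis=1)
print(np.mean(sums >= 2 * m + s), np.exp(-s / 4))   # empirical tail vs. the e^{-s/4} bound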

Computing the Original Expectation.

Let us compare the probability of the above tail event in the original protocol 𝒞¯\overline{\mathcal{C}} where inputs 𝒙,𝒚\bm{x},\bm{y} are sampled from γ\gamma conditioned on the event that {(𝒙,𝒚)Xc×Yc}{𝒅>(r2)4d}\{(\bm{x},\bm{y})\in X_{c}\times Y_{c}\}\wedge\{\bm{d}>\binom{r}{2}\cdot 4d\}. We can write

𝐏𝐫(𝒙,𝒚)γ[tIr𝒌t(𝒙,𝒚)8dr+s|𝒅>(r2)4d,(𝒙,𝒚)Xc×Yc]\displaystyle\phantom{\leq}\operatorname*{\mathbf{Pr}}_{(\bm{x},\bm{y})\sim\gamma}\left[\sum_{t\in I_{r}}\bm{k}_{t}(\bm{x},\bm{y})\geq 8dr+s\,\middle|\,\bm{d}>{\textstyle\binom{r}{2}}\cdot 4d,(\bm{x},\bm{y})\in X_{c}\times Y_{c}\right] (5.8)
=𝐏𝐫𝒙,𝒚γ[tIr𝒌t(𝒙,𝒚)8dr+s,(𝒙,𝒚)Xc×Yc,𝒅>(r2)4d]𝐏𝐫𝒙,𝒚γ[(𝒙,𝒚)Xc×Yc,𝒅>(r2)4d].\displaystyle=\frac{\operatorname*{\mathbf{Pr}}_{\bm{x},\bm{y}\sim\gamma}\left[\sum_{t\in I_{r}}\bm{k}_{t}(\bm{x},\bm{y})\geq 8dr+s,(\bm{x},\bm{y})\in X_{c}\times Y_{c},\bm{d}>\binom{r}{2}\cdot 4d\right]}{\operatorname*{\mathbf{Pr}}_{\bm{x},\bm{y}\sim\gamma}\left[(\bm{x},\bm{y})\in X_{c}\times Y_{c},\bm{d}>\binom{r}{2}\cdot 4d\right]}.

We then bound the numerator by

𝐏𝐫𝒙,𝒚γ[tIr𝒌t(𝒙,𝒚)8dr+s,(𝒙,𝒚)Xc×Yc,𝒅>(r2)4d]\displaystyle\phantom{\leq}\operatorname*{\mathbf{Pr}}_{\bm{x},\bm{y}\sim\gamma}\left[\sum_{t\in I_{r}}\bm{k}_{t}(\bm{x},\bm{y})\geq 8dr+s,(\bm{x},\bm{y})\in X_{c}\times Y_{c},\bm{d}>{\textstyle\binom{r}{2}}\cdot 4d\right]
=𝐏𝐫𝒙,𝒚γ[tIr𝒌t(c)(𝒙,𝒚)8dr+s,(𝒙,𝒚)Xc×Yc,𝒅>(r2)4d]\displaystyle=\operatorname*{\mathbf{Pr}}_{\bm{x},\bm{y}\sim\gamma}\left[\sum_{t\in I_{r}}\bm{k}_{t}^{(c)}(\bm{x},\bm{y})\geq 8dr+s,(\bm{x},\bm{y})\in X_{c}\times Y_{c},\bm{d}>{\textstyle\binom{r}{2}}\cdot 4d\right] (if (𝒙,𝒚)Xc×Yc(\bm{x},\bm{y})\in X_{c}\times Y_{c} then 𝒌t(c)=𝒌t\bm{k}_{t}^{(c)}=\bm{k}_{t})
𝐏𝐫𝒙,𝒚γ[tIr𝒌t(c)(𝒙,𝒚)8dr+s]es/4.\displaystyle\leq\operatorname*{\mathbf{Pr}}_{\bm{x},\bm{y}\sim\gamma}\left[\sum_{t\in I_{r}}\bm{k}_{t}^{(c)}(\bm{x},\bm{y})\geq 8dr+s\right]\leq e^{-s/4}.

Note that the inequality gives us an exponential tail on Equation 5.8:

Equation 5.8es/4(𝐏𝐫𝒙,𝒚γ[(𝒙,𝒚)Xc×Yc,𝒅>(r2)4d])1.\lx@cref{creftypecap~refnum}{eq:tail_of_kt}\leq e^{-s/4}\cdot\left(\operatorname*{\mathbf{Pr}}_{\bm{x},\bm{y}\sim\gamma}\left[(\bm{x},\bm{y})\in X_{c}\times Y_{c},\bm{d}>\binom{r}{2}\cdot 4d\right]\right)^{-1}.

We can now integrate the above inequality to get an upper bound on the expected value of tIr𝒌t\sum_{t\in I_{r}}\bm{k}_{t} under the distribution of interest. In particular, since for any non-negative random variable 𝒘\bm{w}, the following holds for any parameter α0\alpha\geq 0:

𝔼[𝒘]=0+𝐏𝐫[𝒘z]dzα+α+𝐏𝐫[𝒘z]dz=α+0+𝐏𝐫[𝒘α+z]dz,\operatorname*{\mathbb{E}}[\bm{w}]=\int_{0}^{+\infty}\operatorname*{\mathbf{Pr}}[\bm{w}\geq z]\mathrm{d}z\leq\alpha+\int_{\alpha}^{+\infty}\operatorname*{\mathbf{Pr}}[\bm{w}\geq z]\mathrm{d}z=\alpha+\int_{0}^{+\infty}\operatorname*{\mathbf{Pr}}[\bm{w}\geq\alpha+z]\mathrm{d}z,

we derive the following by taking α=8dr+4ln(1𝐏𝐫𝒙,𝒚γ[(𝒙,𝒚)Xc×Yc,𝒅>(r2)4d])\alpha=8dr+4\ln\left(\frac{1}{\operatorname*{\mathbf{Pr}}_{\bm{x},\bm{y}\sim\gamma}\left[(\bm{x},\bm{y})\in X_{c}\times Y_{c},\bm{d}>\binom{r}{2}\cdot 4d\right]}\right):

𝔼(𝒙,𝒚)γ[tIr𝒌t(𝒙,𝒚)|𝒅>(r2)4d,(𝒙,𝒚)Xc×Yc]\displaystyle\operatorname*{\mathbb{E}}_{(\bm{x},\bm{y})\sim\gamma}\left[\sum_{t\in I_{r}}\bm{k}_{t}(\bm{x},\bm{y})\,\middle|\,\bm{d}>{\textstyle\binom{r}{2}}\cdot 4d,(\bm{x},\bm{y})\in X_{c}\times Y_{c}\right]
α+0+ez/4dz=α+4\displaystyle\qquad\leq\alpha+\int_{0}^{+\infty}e^{-z/4}\mathrm{d}z=\alpha+4
12dr+4ln(1𝐏𝐫𝒙,𝒚γ[(𝒙,𝒚)Xc×Yc,𝒅>(r2)4d]).\displaystyle\qquad\leq 12dr+4\ln\left(\dfrac{1}{\operatorname*{\mathbf{Pr}}_{\bm{x},\bm{y}\sim\gamma}\left[(\bm{x},\bm{y})\in X_{c}\times Y_{c},\bm{d}>\binom{r}{2}\cdot 4d\right]}\right).

This completes the proof of Claim 5.7. ∎
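As an aside, the integration step above is tight for an exact exponential tail: if 𝐏𝐫[𝒘α+z]=ez/4\operatorname*{\mathbf{Pr}}[\bm{w}\geq\alpha+z]=e^{-z/4} exactly, then 𝔼[𝒘]=α+4\operatorname*{\mathbb{E}}[\bm{w}]=\alpha+4. A quick numerical illustration (the value of α\alpha below is arbitrary):

import numpy as np

rng = np.random.default_rng(3)
alpha = 8.0
w = alpha + rng.exponential(scale=4.0, size=10**6)    # Pr[w >= alpha + z] = e^{-z/4}
zs = np.linspace(0.0, 200.0, 10**6)
integral = np.sum(np.exp(-zs / 4)) * (zs[1] - zs[0])  # ~ int_0^infty e^{-z/4} dz = 4
print(w.mean(), alpha + integral)                     # both are close to alpha + 4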

5.4.2 Projection on the Orthogonal Subspaces 𝑯A\bm{H}_{A}^{\bot} and 𝑯B\bm{H}_{B}^{\bot}

We shall show that the expected squared norm of the final center of mass when projected on the subspaces 𝑯A\bm{H}_{A}^{\bot} and 𝑯B\bm{H}_{B}^{\bot} is

𝔼[𝒒A+𝒒B]=O(d).\operatorname*{\mathbb{E}}[\bm{q}_{A}+\bm{q}_{B}]=O(d).

Recall that 𝒒A=𝚷𝑯A𝒖(𝒅)2\bm{q}_{A}=\left\|\bm{\Pi}_{\bm{H}_{A}^{\bot}}\bm{u}^{(\bm{d})}\right\|^{2} where 𝑯A\bm{H}_{A} is the (random) linear subspace spanned by the orthonormal set of vectors 𝒂(0),,𝒂(𝒅)\bm{a}^{(0)},\ldots,\bm{a}^{(\bm{d})} and 𝑯A\bm{H}_{A}^{\bot} its orthogonal complement. Moreover, the vectors 𝒂(t)\bm{a}^{(t)} are determined by the previous Boolean and real communication. A similar statement holds for 𝒒B\bm{q}_{B} and the vectors 𝒃(t)\bm{b}^{(t)} as well.

The proof will follow in two steps. We will first show that one can bound the norm of the projection 𝚷𝑯A𝒖(d)\bm{\Pi}_{\bm{H}_{A}^{\bot}}\bm{u}^{(d)}, which turns out to be the Gaussian center of mass of a set that lives in the subspace 𝑯A\bm{H}_{A}^{\bot}, in terms of the logarithm of the inverse relative measure with respect to the subspace. Note that the Gaussian measure here is the Gaussian measure γ𝑯A\gamma_{\bm{H}_{A}^{\bot}} on the subspace 𝑯A\bm{H}_{A}^{\bot}. The case for 𝚷𝑯B𝒗(d)\bm{\Pi}_{\bm{H}_{B}^{\bot}}\bm{v}^{(d)} will be similar. The second step uses an information-theoretic convexity argument to show that, on average, the logarithm of the inverse relative measure is small.

For the first part, we observe that if we sample 𝒙,𝒚γ\bm{x},\bm{y}\sim\gamma and take a random walk on this protocol tree, we obtain a probability measure over transcripts which includes both real and Boolean values. Recall that the Boolean transcript is determined by the original protocol and only depends on the previous Boolean communication, and the real communication is interleaved with the Boolean communication. Let =(𝒄,𝒓)\bm{\ell}=(\bm{c},\bm{r}) denote the random variable representing the full transcript of the generalized protocol where 𝒄\bm{c} is the Boolean communication and 𝒓\bm{r} is the real communication. For any given transcript \bm{\ell}, let 𝑿×𝒀\bm{X}_{\bm{\ell}}\times\bm{Y}_{\bm{\ell}} denote the corresponding rectangle consisting of the inputs reaching the leaf, and let 𝑿𝒄×𝒀𝒄\bm{X}_{\bm{c}}\times\bm{Y}_{\bm{c}} (for 𝑿𝒄,𝒀𝒄n\bm{X}_{\bm{c}},\bm{Y}_{\bm{c}}\subseteq\mathbb{R}^{n}) denote the rectangle consisting of all pairs of inputs to Alice and Bob that result in the Boolean transcript 𝒄\bm{c}. Note that the real communication 𝒓\bm{r} together with 𝒄\bm{c} fixes the subspaces 𝑯A\bm{H}_{A} and 𝑯B\bm{H}_{B} and particular shift vectors 𝒔A\bm{s}_{A} and 𝒔B\bm{s}_{B}, depending on the values of the inner products determined by the full transcript. In particular, the rectangle 𝑿×𝒀\bm{X}_{\bm{\ell}}\times\bm{Y}_{\bm{\ell}} consistent with the full transcript =(𝒄,𝒓)\bm{\ell}=(\bm{c},\bm{r}) is given by 𝑿=𝑿𝒄(𝑯A+𝒔A)\bm{X}_{\bm{\ell}}=\bm{X}_{\bm{c}}\cap(\bm{H}_{A}^{\bot}+\bm{s}_{A}) and 𝒀=𝒀𝒄(𝑯B+𝒔B)\bm{Y}_{\bm{\ell}}=\bm{Y}_{\bm{c}}\cap(\bm{H}_{B}^{\bot}+\bm{s}_{B}), i.e., taking (random) affine slices of the original sets.

Note that 𝒖(𝒅)\bm{u}^{(\bm{d})} and 𝒗(𝒅)\bm{v}^{(\bm{d})} are distributed as the centers of mass of the final rectangle 𝑿×𝒀\bm{X}_{\bm{\ell}}\times\bm{Y}_{\bm{\ell}}, and thus it suffices to look at the rectangles for the rest of the argument. Since 𝑿\bm{X}_{\bm{\ell}} (resp., 𝒀\bm{Y}_{\bm{\ell}}) lies in some affine shift of 𝑯A\bm{H}_{A}^{\bot} (resp., 𝑯B\bm{H}_{B}^{\bot}), defining the relative center of mass for a set AA that lives in the ambient linear subspace VV, as μV(A)=𝔼𝒙γV[𝒙|𝒙A]\mu_{V}(A)=\operatorname*{\mathbb{E}}_{\bm{x}\sim\gamma_{V}}[\bm{x}~{}|~{}\bm{x}\in A] where the Gaussian measure γV\gamma_{V} is on the subspace VV, it follows that

𝔼[𝒒A+𝒒B]\displaystyle\operatorname*{\mathbb{E}}\left[\bm{q}_{A}+\bm{q}_{B}\right] =𝔼[𝚷𝑯A𝒖(𝒅)2+𝚷𝑯B𝒗(𝒅)2]=𝔼[μ𝑯A(𝚷𝑯A𝑿)2+μ𝑯B(𝚷𝑯B𝒀)2].\displaystyle=\operatorname*{\mathbb{E}}\left[\left\|\bm{\Pi}_{\bm{H}_{A}^{\bot}}\bm{u}^{(\bm{d})}\right\|^{2}+\left\|\bm{\Pi}_{\bm{H}_{B}^{\bot}}\bm{v}^{(\bm{d})}\right\|^{2}\right]=\operatorname*{\mathbb{E}}_{\bm{\ell}}\left[\|\mu_{\bm{H}_{A}^{\perp}}(\bm{\Pi}_{\bm{H}_{A}^{\bot}}\bm{X}_{\bm{\ell}})\|^{2}+\|\mu_{\bm{H}_{B}^{\perp}}(\bm{\Pi}_{\bm{H}_{B}^{\bot}}\bm{Y}_{\bm{\ell}})\|^{2}\right].

Recalling that γrel\gamma_{\mathrm{rel}} is the Gaussian measure of a set relative to its ambient space, we will show:

Claim 5.8.

μ𝑯A(𝚷𝑯A𝑿)22e2ln(eγrel(𝑿))\|\mu_{\bm{H}_{A}^{\perp}}(\bm{\Pi}_{\bm{H}_{A}^{\bot}}\bm{X}_{\bm{\ell}})\|^{2}\leq 2e^{2}\ln\left(\dfrac{e}{\gamma_{\mathrm{rel}}\left(\bm{X}_{\bm{\ell}}\right)}\right) and μ𝑯B(𝚷𝑯B𝒀)22e2ln(eγrel(𝒀))\|\mu_{\bm{H}_{B}^{\perp}}(\bm{\Pi}_{\bm{H}_{B}^{\bot}}\bm{Y}_{\bm{\ell}})\|^{2}\leq 2e^{2}\ln\left(\dfrac{e}{\gamma_{\mathrm{rel}}\left(\bm{Y}_{\bm{\ell}}\right)}\right).

Note that we can ignore the case when γrel(𝑿)\gamma_{\mathrm{rel}}(\bm{X}_{\bm{\ell}}) is zero above, since we will eventually take an expectation over \bm{\ell} and almost surely this measure is non-zero.

Using the previous claim,

𝔼[𝒒A+𝒒B]\displaystyle\operatorname*{\mathbb{E}}\left[\bm{q}_{A}+\bm{q}_{B}\right] =𝔼[𝚷𝑯A𝒖(𝒅)2+𝚷𝑯B𝒗(𝒅)2]2e2𝔼[ln(eγrel(𝑿×𝒀))].\displaystyle=\operatorname*{\mathbb{E}}\left[\left\|\bm{\Pi}_{\bm{H}_{A}^{\bot}}\bm{u}^{(\bm{d})}\right\|^{2}+\left\|\bm{\Pi}_{\bm{H}_{B}^{\bot}}\bm{v}^{(\bm{d})}\right\|^{2}\right]\leq 2e^{2}\cdot\operatorname*{\mathbb{E}}_{\bm{\ell}}\left[\ln\left(\frac{e}{\gamma_{\mathrm{rel}}\left({\bm{X}_{\bm{\ell}}\times\bm{Y}_{\bm{\ell}}}\right)}\right)\right].

For the second step of the proof, we show the next claim which relies on convexity arguments to bound the right hand side above by O(d)O(d). This is similar in spirit to chain-style arguments from information theory.

Claim 5.9.

𝔼[ln(eγrel(𝑿×𝒀))]=O(d)\operatorname*{\mathbb{E}}_{\bm{\ell}}\left[\ln\left(\dfrac{e}{\gamma_{\mathrm{rel}}\left({\bm{X}_{\bm{\ell}}\times\bm{Y}_{\bm{\ell}}}\right)}\right)\right]=O(d).

This gives us the final bound 𝔼[𝒒A+𝒒B]=O(d)\operatorname*{\mathbb{E}}\left[\bm{q}_{A}+\bm{q}_{B}\right]=O(d) assuming the claims which we now prove.

Proof of Claim 5.8.

We can bound the norm of the above projection by an application of the Gaussian level-one inequality (Theorem 3.1), which, by rotational symmetry, implies that if AA is a subset of a linear subspace VV with non-zero measure, then

μV(A)22e2ln(eγV(A)),\displaystyle\|\mu_{V}(A)\|^{2}\leq 2e^{2}\ln\left(\frac{e}{\gamma_{V}(A)}\right), (5.9)

where recall that μV(A)=𝔼𝒙γV[𝒙|𝒙A]\mu_{V}(A)=\operatorname*{\mathbb{E}}_{\bm{x}\sim\gamma_{V}}[\bm{x}~{}|~{}\bm{x}\in A] is the center of mass with respect to the Gaussian measure γV\gamma_{V} on the subspace VV.
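As a sanity check of Equation 5.9, both sides can be estimated by Monte Carlo for a simple test set such as a halfspace; a short sketch (the dimension and the threshold 1.5 are arbitrary choices):

import numpy as np

rng = np.random.default_rng(4)
x = rng.normal(size=(10**6, 4))
A = x[x[:, 0] >= 1.5]                     # test set: the halfspace {x_1 >= 1.5}
measure = len(A) / len(x)                 # ~ gamma(A)
mu = A.mean(axis=0)                       # center of mass mu(A)
lhs = np.dot(mu, mu)
rhs = 2 * np.e**2 * np.log(np.e / measure)
print(lhs, rhs)                           # lhs <= rhs (the bound is quite loose here)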

If we run the generalized protocol on 𝒙,𝒚γ\bm{x},\bm{y}\sim\gamma and condition on getting the full transcript \bm{\ell}, the conditional probability measure on 𝚷𝑯A𝒙\bm{\Pi}_{\bm{H}_{A}^{\bot}}\bm{x} is that of the Gaussian measure γ𝑯A\gamma_{\bm{H}_{A}^{\bot}} conditioned on 𝒙𝑿𝒔A\bm{x}\in\bm{X}_{\bm{\ell}}-\bm{s}_{A}, and the conditional probability measure on 𝚷𝑯B𝒚\bm{\Pi}_{\bm{H}_{B}^{\bot}}\bm{y} is that of the Gaussian measure γ𝑯B\gamma_{\bm{H}_{B}^{\bot}} conditioned on 𝒚𝒀𝒔B\bm{y}\in\bm{Y}_{\bm{\ell}}-\bm{s}_{B}, and they are independent. This follows from the fact that so far the parties have fixed inner products along a basis of the subspaces 𝑯A\bm{H}_{A} and 𝑯B\bm{H}_{B}, and the fact that the projections of a standard Gaussian onto orthogonal subspaces are independent.

Thus, applying Equation 5.9, we have

μ𝑯A(𝚷𝑯A𝑿)22e2ln(eγ𝑯A(𝑿𝒔A))=2e2ln(eγrel(𝑿)),\displaystyle\|\mu_{\bm{H}_{A}^{\bot}}(\bm{\Pi}_{\bm{H}_{A}^{\bot}}\bm{X}_{\bm{\ell}})\|^{2}\leq 2e^{2}\ln\left(\frac{e}{\gamma_{\bm{H}_{A}^{\bot}}(\bm{X}_{\bm{\ell}}-\bm{s}_{A})}\right)=2e^{2}\ln\left(\frac{e}{\gamma_{\mathrm{rel}}(\bm{X}_{\bm{\ell}})}\right),

where the last line follows since 𝑯A+𝒔A\bm{H}_{A}^{\bot}+\bm{s}_{A} is the ambient space for 𝑿\bm{X}_{\bm{\ell}} (this holds almost surely) and γrel(S)=γV(St)\gamma_{\mathrm{rel}}(S)=\gamma_{V}(S-t) if V+tV+t is the ambient space of SS. A similar argument proves the bound on μ𝑯B(𝚷𝑯B𝒀)2\|\mu_{\bm{H}_{B}^{\bot}}(\bm{\Pi}_{\bm{H}_{B}^{\bot}}\bm{Y}_{\bm{\ell}})\|^{2}. ∎

Proof of Claim 5.9.

For this claim, it will be convenient to consider a different generalized protocol 𝒞\mathcal{C}^{\prime} that generates the same distribution on the leaves \bm{\ell}. In particular, since the Boolean messages in the generalized protocol 𝒞¯\overline{\mathcal{C}} only depend on the previous Boolean messages, one can first send all the Boolean messages 𝒄\bm{c}, and then send all the real messages 𝒓\bm{r}, choosing them according to the protocol 𝒞¯\overline{\mathcal{C}} depending on the previous real messages and the (partial) Boolean transcript. Note that the protocol 𝒞\mathcal{C}^{\prime} generates the same distribution on the leaves \bm{\ell} when the inputs 𝒙,𝒚γn\bm{x},\bm{y}\sim\gamma_{n}. In particular, the real communication only partitions each rectangle Xc×YcX_{c}\times Y_{c} that corresponds to the Boolean transcript cc into affine slices. (We remark that the protocol 𝒞\mathcal{C}^{\prime} suffices for proving this claim since we are looking only at the leaves. However, unlike Lemma 5.3, directly bounding the expected quadratic variation of the martingale corresponding to the protocol 𝒞\mathcal{C}^{\prime} is difficult.)

For the rest of the claim, we work with the protocol 𝒞\mathcal{C}^{\prime} where the Boolean communication happens first. To prove the claim, we condition on a Boolean transcript 𝒄=c\bm{c}=c and show by induction that

𝔼𝒓[ln(eγrel(𝑿(c,𝒓)×𝒀(c,𝒓)))|𝒄=c]ln(eγrel(Xc×Yc)),\displaystyle\operatorname*{\mathbb{E}}_{\bm{r}}\left[\ln\left(\dfrac{e}{\gamma_{\mathrm{rel}}(\bm{X}_{(c,\bm{r})}\times\bm{Y}_{(c,\bm{r})})}\right)\,\middle|\,\bm{c}=c\right]\leq\ln\left(\dfrac{e}{\gamma_{\mathrm{rel}}(X_{c}\times Y_{c})}\right), (5.10)

where (c,r)(c,r) is the full transcript and Xc×YcX_{c}\times Y_{c} is the rectangle containing all the inputs such that the Boolean transcript is cc. Note that γrel(Xc×Yc)\gamma_{\mathrm{rel}}(X_{c}\times Y_{c}) is the probability of obtaining the Boolean transcript cc since the ambient space of XcX_{c} and YcY_{c} is n\mathbb{R}^{n}.

Then, taking expectation over the Boolean transcript cc,

𝔼[ln(eγrel(𝑿×𝒀))]\displaystyle\operatorname*{\mathbb{E}}_{\bm{\ell}}\left[\ln\left(\dfrac{e}{\gamma_{\mathrm{rel}}(\bm{X}_{\bm{\ell}}\times\bm{Y}_{\bm{\ell}})}\right)\right] 𝔼𝒄[ln(eγrel(𝑿𝒄×𝒀𝒄))]\displaystyle\leq\operatorname*{\mathbb{E}}_{\bm{c}}\left[\ln\left(\dfrac{e}{\gamma_{\mathrm{rel}}(\bm{X}_{\bm{c}}\times\bm{Y}_{\bm{c}})}\right)\right]
=c{0,1},|c|d𝐏𝐫[𝒄=c]ln(e𝐏𝐫[𝒄=c])\displaystyle=\sum_{\begin{subarray}{c}c\in\{0,1\}^{*},|c|\leq d\end{subarray}}\operatorname*{\mathbf{Pr}}[\bm{c}=c]\ln\left(\dfrac{e}{\operatorname*{\mathbf{Pr}}[\bm{c}=c]}\right)
ln(2e2d)=O(d),\displaystyle\leq\ln(2e\cdot 2^{d})=O(d),

where the last line follows from concavity.
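The concavity step is the usual entropy bound cpcln(e/pc)=1+H(p)1+ln(#transcripts)\sum_{c}p_{c}\ln(e/p_{c})=1+H(p)\leq 1+\ln(\#\text{transcripts}); a brief numerical check with an arbitrary distribution on the transcripts:

import numpy as np

rng = np.random.default_rng(5)
d = 8
m = 2 ** (d + 1) - 2                      # number of Boolean transcripts of length <= d
p = rng.dirichlet(np.ones(m))             # an arbitrary distribution over transcripts
lhs = np.sum(p * np.log(np.e / p))
print(lhs, np.log(2 * np.e * 2 ** d))     # lhs <= ln(2e * 2^d)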

Induction.

To complete the proof, we now show Equation 5.10 by induction. For this, let us look at an intermediate step tt in 𝒞\mathcal{C}^{\prime} where the Boolean communication is fixed to cc and Alice and Bob have exchanged some real messages r<t:=r1,,rt1r_{<t}:=r_{1},\ldots,r_{t-1}. Let the current rectangle be X(c,r<t)×Y(c,r<t)X_{(c,r_{<t})}\times Y_{(c,r_{<t})}, and suppose it is Alice’s turn to speak. Note that X(c,r<t)X_{(c,r_{<t})} and Y(c,r<t)Y_{(c,r_{<t})} live in some affine subspaces at this point, and in the current round Alice sends the inner product of her input xx with a vector a(t)a^{(t)} that is determined by the previous messages and orthogonal to all the previous directions a(1),,a(t1)a^{(1)},\ldots,a^{(t-1)}. At this step, Bob’s set Y(c,r<t)Y_{(c,r_{<t})} does not change at all. We shall show that in each step, the log of the inverse of the relative measure of the current rectangle does not increase on average over the next message:

𝔼𝒓t[ln(eγrel(𝑿(c,𝒓t)))|𝒄=c,𝒓<t=r<t]ln(eγrel(X(c,r<t))),\displaystyle\operatorname*{\mathbb{E}}_{\bm{r}_{\leq t}}\left[\ln\left(\dfrac{e}{\gamma_{\mathrm{rel}}(\bm{X}_{(c,\bm{r}_{\leq t})})}\right)\,\middle|\,\bm{c}=c,\bm{r}_{<t}=r_{<t}\right]\leq\ln\left(\dfrac{e}{\gamma_{\mathrm{rel}}(X_{(c,r_{<t})})}\right), (5.11)

and an analogous statement holds when Bob speaks. Taking an expectation over 𝒓<t\bm{r}_{<t}, the above directly implies (5.10) by a straightforward backward induction:

𝔼𝒓t[ln(eγrel(𝑿(c,𝒓t)×𝒀(c,𝒓t)))|𝒄=c]\displaystyle\operatorname*{\mathbb{E}}_{\bm{r}_{\leq t}}\left[\ln\left(\dfrac{e}{\gamma_{\mathrm{rel}}(\bm{X}_{(c,\bm{r}_{\leq t})}\times\bm{Y}_{(c,{\bm{r}_{\leq t})}})}\right)\,\middle|\,\bm{c}=c\right] 𝔼𝒓<t[ln(eγrel(𝑿(c,𝒓<t)×𝒀(c,𝒓<t)))|𝒄=c]\displaystyle\leq\operatorname*{\mathbb{E}}_{\bm{r}_{<t}}\left[\ln\left(\dfrac{e}{\gamma_{\mathrm{rel}}(\bm{X}_{(c,\bm{r}_{<t})}\times\bm{Y}_{(c,{\bm{r}_{<t})}})}\right)\,\middle|\,\bm{c}=c\right]
ln(eγrel(Xc×Yc)).\displaystyle\leq\cdots\leq\ln\left(\dfrac{e}{\gamma_{\mathrm{rel}}(X_{c}\times Y_{c})}\right).

To see Equation 5.11, let us write X:=X(c,r<t)X:=X_{(c,r_{<t})} for Alice’s current set. Recall that since we have fixed the history, Alice has fixed inner products with some orthogonal directions a(1),,a(t1)a^{(1)},\ldots,a^{(t-1)} and she has decided on the next direction a:=a(t)a:=a^{(t)} along which she will send the next inner product. Thus, XX lives in some fixed affine subspace V+sV^{\bot}+s where VV is the span of a(1),,a(t1)a^{(1)},\ldots,a^{(t-1)}, and the next message is r:=rt=x,ar:=r_{t}=\left\langle x,a\right\rangle. Moreover, conditioned on the history till this point, the conditional probability distribution on Alice’s input 𝒙n\bm{x}\in\mathbb{R}^{n} can be described as follows: the projections corresponding to the non-zero vectors in the sequence a(1),,a(t1)a^{(1)},\ldots,a^{(t-1)}, i.e., the inner products 𝒙,a(τ)\left\langle\bm{x},a^{(\tau)}\right\rangle where a(τ)0a^{(\tau)}\neq 0 for τ<t\tau<t, are fixed according to the shift ss, while the distribution on the orthogonal complement VV^{\bot} is that of the Gaussian measure γV\gamma_{V^{\bot}} on the subspace VV^{\bot} after conditioning on the event that 𝒙Xs\bm{x}\in X-s (which lives in VV^{\bot}). This uses that projections of a standard nn-dimensional Gaussian in orthogonal directions are independent.

Let kk be the dimension of VV where k<nk<n. Then, by doing a linear transformation, we may assume that V=nkV^{\bot}=\mathbb{R}^{n-k} (and thus XnkX\subseteq\mathbb{R}^{n-k} and the shift ss fixes the coordinates nk+1n-k+1 through nn) and a=e1a=e_{1}, i.e., in the current message Alice reveals the first coordinate of 𝒙nk\bm{x}\in\mathbb{R}^{n-k} where 𝒙\bm{x} is sampled from γnk\gamma_{n-k} conditioned on 𝒙X\bm{x}\in X. In this case, γrel\gamma_{\mathrm{rel}} in the left hand side of Equation 5.11 is exactly γrel(X{x1=r})\gamma_{\mathrm{rel}}(X\cap\{x_{1}=r\}) if Alice sends rr as the message, while for the right hand side of Equation 5.11, we have γrel(X)=γnk(X)\gamma_{\mathrm{rel}}(X)=\gamma_{n-k}(X). Denoting by dμx1\mathrm{d}\mu_{x_{1}} the probability density function of 𝒙1\bm{x}_{1}, our statement boils down to showing

ln(eγrel(X{x1=r}))dμx1(r)\displaystyle\int_{\mathbb{R}}\ln\left(\dfrac{e}{\gamma_{\mathrm{rel}}(X\cap\{x_{1}=r\})}\right)\mathrm{d}\mu_{x_{1}}(r) ln(eγnk(X)).\displaystyle\leq\ln\left(\dfrac{e}{\gamma_{n-k}(X)}\right).

We show the above by explicitly writing the probability density function dμx1\mathrm{d}\mu_{x_{1}}. Denote by dγnk(x1,,xnk)\mathrm{d}\gamma_{n-k}(x_{1},\ldots,x_{n-k}) the standard Gaussian density function in nk\mathbb{R}^{n-k} (explicitly, dγm(x1,,xm)=i=1mdγ1(xi)\mathrm{d}\gamma_{m}(x_{1},\ldots,x_{m})=\prod_{i=1}^{m}\mathrm{d}\gamma_{1}(x_{i}) where dγ1(r)=12πer2/2\mathrm{d}\gamma_{1}(r)=\frac{1}{\sqrt{2\pi}}e^{-r^{2}/2} is the density function of the one-dimensional standard Gaussian). The density function of the random vector 𝒙\bm{x} sampled from γnk\gamma_{n-k} conditioned on xXx\in X is given by γnk(X)1dγnk(x1,,xnk){\gamma_{n-k}(X)}^{-1}\cdot{\mathrm{d}\gamma_{n-k}(x_{1},\ldots,x_{n-k})} for xXx\in X and zero outside. Thus, we have

dμx1(r)\displaystyle\mathrm{d}\mu_{x_{1}}(r) =X{x1=r}dγnk(x1,,xnk)γnk(X)\displaystyle=\frac{\int_{X\cap\{x_{1}=r\}}\mathrm{d}\gamma_{n-k}(x_{1},\ldots,x_{n-k})}{\gamma_{n-k}(X)}
=dγ1(r)X{x1=r}dγnk1(x2,,xnk)γnk(X)=dγ1(r)γrel(X{x1=r})γnk(X).\displaystyle=\mathrm{d}\gamma_{1}(r)\cdot\frac{\int_{X\cap\{x_{1}=r\}}\mathrm{d}\gamma_{n-k-1}(x_{2},\ldots,x_{n-k})}{\gamma_{n-k}(X)}=\mathrm{d}\gamma_{1}(r)\cdot\frac{\gamma_{\mathrm{rel}}(X\cap\{x_{1}=r\})}{\gamma_{n-k}(X)}.

Then, by concavity, the left hand side of Equation 5.11 is exactly given by

ln(eγrel(X{x1=r}))dμx1(r)\displaystyle\int_{\mathbb{R}}\ln\left(\dfrac{e}{\gamma_{\mathrm{rel}}(X\cap\{x_{1}=r\})}\right)\mathrm{d}\mu_{x_{1}}(r) ln(eγrel(X{x1=r})dμx1(r))\displaystyle\leq\ln\left(\int_{\mathbb{R}}\dfrac{e}{\gamma_{\mathrm{rel}}(X\cap\{x_{1}=r\})}\mathrm{d}\mu_{x_{1}}(r)\right)
=ln(eγnk(X)dγ1(r))=ln(eγnk(X)).\displaystyle=\ln\left(\dfrac{e}{\gamma_{n-k}(X)}\int_{\mathbb{R}}\mathrm{d}\gamma_{1}(r)\right)=\ln\left(\dfrac{e}{\gamma_{n-k}(X)}\right).\qed
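To illustrate Equation 5.11 concretely, take nk=2n-k=2 and X={x:x1+x20}X=\{x:x_{1}+x_{2}\geq 0\}, so that γ(X)=1/2\gamma(X)=1/2 and the slice X{x1=r}X\cap\{x_{1}=r\} has relative measure Φ(r)\Phi(r), the standard normal CDF. A Monte Carlo sketch comparing the two sides (using scipy's normal CDF):

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(6)
x = rng.normal(size=(10**6, 2))
X = x[x[:, 0] + x[:, 1] >= 0]             # gamma(X) = 1/2
# conditioned on x in X, the first coordinate has exactly the law mu_{x_1}
lhs = np.mean(np.log(np.e / norm.cdf(X[:, 0])))
print(lhs, np.log(np.e / 0.5))            # lhs (about 1.5) <= ln(2e) (about 1.69)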

6 Level-Two Fourier Growth

In this section, we prove Theorem 1.3 that L1,2(h)=O(d3/2log3(n))L_{1,2}(h)=O\left(d^{3/2}\log^{3}(n)\right). Similar to the proof of the level-one bound (Theorem 1.2), we start with a dd-round communication protocol 𝒞~\widetilde{\mathcal{C}} over the Gaussian space as defined in Section 4. Note that 𝒞~\widetilde{\mathcal{C}} in turn comes from the original Boolean communication protocol 𝒞\mathcal{C}. Thus in the following we assume without loss of generality dnd\leq n.

Given the discussion in Subsection 4.3, to bound the second-level Fourier growth, one can attempt to bound the expected quadratic variation of the martingale that results from the protocol 𝒞¯\overline{\mathcal{C}} directly, but similar to the level-one case, it is hard to leverage cancellations here to prove the bound we aim for. So, starting from 𝒞~\widetilde{\mathcal{C}}, we will define a communication protocol 𝒞¯\overline{\mathcal{C}} that computes the same function as 𝒞~\widetilde{\mathcal{C}}, but satisfies an additional “clean” property under which it is easier to control the quadratic variation. This new protocol will differ from 𝒞~\widetilde{\mathcal{C}} in two ways. Firstly, the protocol 𝒞¯\overline{\mathcal{C}} will consist of additional “cleanup steps” where Alice and Bob reveal certain quadratic forms of their input. Secondly, the protocol 𝒞¯\overline{\mathcal{C}} will send the real value of the quadratic form with certain precision. Note that this protocol will not involve sending real messages at all; instead, any potential real messages will be truncated to a few bits of precision and sent as Boolean messages.

We emphasize that the main difference in the protocol 𝒞¯\overline{\mathcal{C}} from the corresponding level-one variant comes from the precision control, which is not needed there since the Gaussian distribution remains a (lower-dimensional) Gaussian under linear projections. For technical reasons we shall also need to analyze the martingale under a truncated Gaussian distribution, where all coordinates are bounded in some large interval [T,T][-T,T]. This intuitively does not incur a noticeable difference in the distribution, since coordinates drawn from the Gaussian distribution are highly unlikely to fall outside such intervals; recalling Remark 4.2 and Proposition 4.4, it still suffices to analyze the corresponding martingale under the truncated Gaussian distribution.

We next define the notion of a 44-wise clean protocol.

6.1 44-Wise Clean Protocols

Consider an intermediate node in the protocol and let XnX\subseteq\mathbb{R}^{n} refer to the set of Alice’s inputs reaching this node. We denote by 𝕊n×n1\mathbb{S}^{n\times n-1} the set of all matrices in n×n\mathbb{R}^{n\times n} with zero diagonal and unit norm (when viewed as n2n^{2}-dimensional vectors). For a parameter λ>0\lambda>0, we say that the set XX is 44-wise clean in a direction a𝕊n×n1a\in\mathbb{S}^{n\times n-1} if

𝔼𝒙γ[𝒙𝒙σ(X),a2|𝒙X]<λ,\operatorname*{\mathbb{E}}_{\bm{x}\sim\gamma}\left[\left\langle\bm{x}\overset{\mathchoice{\mathbin{{\hbox{\scalebox{0.5}{$\displaystyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\textstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptscriptstyle\bullet$}}}}}}{\otimes}\bm{x}-\sigma(X),a\right\rangle^{2}\,\middle|\,\bm{x}\in X\right]<\lambda,

where we recall that σ(X)=𝔼𝒙γ[𝒙𝒙|𝒙X]\sigma(X)=\operatorname*{\mathbb{E}}_{\bm{x}\sim\gamma}\left[\bm{x}\overset{\mathchoice{\mathbin{{\hbox{\scalebox{0.5}{$\displaystyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\textstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptscriptstyle\bullet$}}}}}}{\otimes}\bm{x}\,\middle|\,\bm{x}\in X\right] is the level-two center of mass of XX under the Gaussian measure. We say that the set XX is 44-wise clean if it is 44-wise clean in every direction aa. Our new protocol will consist of the original protocol, interspersed with cleaning steps. Once Alice sends her bit as in the original protocol, she cleans XX by revealing xx,a\left\langle x\overset{\mathchoice{\mathbin{{\hbox{\scalebox{0.5}{$\displaystyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\textstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptscriptstyle\bullet$}}}}}}{\otimes}x,a\right\rangle with a few bits of precision while there exists a direction a𝕊n×n1a\in\mathbb{S}^{n\times n-1} such that XX is not clean in direction aa. Once XX becomes clean, Alice proceeds to the next round and Bob does an analogous cleanup. We now describe this formally.

Communication with Finite Precision.

Let the positive integer LL be a precision parameter that we will use for truncation. In our new communication protocol, we will send real numbers with precision 2L2^{-L}. This is formalized via the truncL(z)\mathrm{trunc}_{L}(z) function, defined for zz\in\mathbb{R} as

truncL(z)=z2L/2L.\mathrm{trunc}_{L}(z)=\left\lfloor z\cdot 2^{L}\right\rfloor/2^{L}.
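In code, this truncation is simply (a two-line sketch):

import math

def trunc(z, L):
    return math.floor(z * 2 ** L) / 2 ** L    # round z down to L bits of precision

assert trunc(math.pi, 4) == 3.125             # floor(pi * 16) / 16 = 50/16
assert 0 <= math.pi - trunc(math.pi, 4) < 2 ** -4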
Construct 𝒞¯\overline{\mathcal{C}} from 𝒞~\widetilde{\mathcal{C}}.

As described before, 𝒞¯\overline{\mathcal{C}} will consist of the original protocol along with extra steps where Alice or Bob reveal the (approximate) value of a quadratic form on their input. Consider an intermediate node of this new protocol at depth tt. We always use the random variable 𝑿(t)\bm{X}^{(t)} (resp., 𝒀(t)\bm{Y}^{(t)}) to denote the set of inputs of Alice (resp., Bob) reaching the node. If Alice is revealing a quadratic form in this step, we use 𝒂(t)\bm{a}^{(t)} to denote the matrix of the quadratic form revealed at this node, otherwise set 𝒂(t)\bm{a}^{(t)} to be the all-zeroes matrix. We define 𝒃(t)\bm{b}^{(t)} similarly for Bob. Throughout the protocol, we will always set 𝒖(t)\bm{u}^{(t)} and 𝒗(t)\bm{v}^{(t)} to denote σ(𝑿(t))\sigma(\bm{X}^{(t)}) and σ(𝒀(t))\sigma(\bm{Y}^{(t)}) respectively.

Recall that λ>0\lambda>0 is the parameter for cleanup to be optimized later. Since we will now send real numbers (with certain precision) as bit-strings, their magnitudes should also be well controlled to guarantee bounded message length. This is managed by a parameter T>0T>0 and we will restrict the inputs to the parties in 𝒞¯\overline{\mathcal{C}} to come from the box [T,T]n[-T,T]^{n}. Note that, by Gaussian concentration, T=Θ(log(n))T=\Theta\left(\sqrt{\log(n)}\right) suffices.

  1. 1.

    At the beginning, Alice receives an input x[T,T]nx\in[-T,T]^{n} and Bob receives an input y[T,T]ny\in[-T,T]^{n}.

  2. 2.

    We initialize t0t\leftarrow 0, 𝑿(0),𝒀(0)[T,T]n\bm{X}^{(0)},\bm{Y}^{(0)}\leftarrow[-T,T]^{n}, and 𝒂(0),𝒃(0)0n×n\bm{a}^{(0)},\bm{b}^{(0)}\leftarrow 0^{n\times n}.

  3. 3.

    For each phase i=1,2,,di=1,2,\ldots,d: suppose we are starting the cleanup for a node at depth ii in the original protocol 𝒞~\widetilde{\mathcal{C}} and suppose we are at a node of depth tt in the new protocol 𝒞¯\overline{\mathcal{C}}. If it is Alice’s turn to speak in 𝒞~\widetilde{\mathcal{C}}:

    1. (a)

      Orthogonalization by revealing the correlation with Bob’s center of mass.
      Alice begins by revealing the inner product of her input xx with Bob’s current (signed) level-two center of mass η𝒗(t)\eta\odot\bm{v}^{(t)}. Since in the previous steps, she has already revealed the inner product with Bob’s previous centers of mass, for technical reasons, we will only have Alice announce the inner product with the component of η𝒗(t)\eta\odot\bm{v}^{(t)} that is orthogonal to the previous directions along which Alice announced the inner product. More formally, let 𝒂(t+1)\bm{a}^{(t+1)} be the component of η𝒗(t)\eta\odot\bm{v}^{(t)} that is orthonormal to the span of the previous directions 𝒂(τ)\bm{a}^{(\tau)} for τt\tau\leq t, i.e.,

      𝒂(t+1)=unit(η𝒗(t)τ=1tη𝒗(t),𝒂(τ)𝒂(τ)).\textstyle\bm{a}^{(t+1)}=\mathrm{unit}\left(\eta\odot\bm{v}^{(t)}-\sum_{\tau=1}^{t}\left\langle\eta\odot\bm{v}^{(t)},\bm{a}^{(\tau)}\right\rangle\cdot\bm{a}^{(\tau)}\right).

      Alice computes 𝒄¯(t+1)truncL(xx,𝒂(t+1))\overline{\bm{c}}^{(t+1)}\leftarrow\mathrm{trunc}_{L}\left(\left\langle x\overset{\mathchoice{\mathbin{{\hbox{\scalebox{0.5}{$\displaystyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\textstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptscriptstyle\bullet$}}}}}}{\otimes}x,\bm{a}^{(t+1)}\right\rangle\right) and sends 𝒄¯(t+1)\overline{\bm{c}}^{(t+1)} to Bob. Set 𝒃(t+1)0n×n\bm{b}^{(t+1)}\leftarrow 0^{n\times n}. Increment tt by 11 and go to step (b).

    2. (b)

      Original communication. Alice sends the bit 𝒄¯(t+1)\overline{\bm{c}}^{(t+1)} that she was supposed to send in 𝒞~\widetilde{\mathcal{C}} based on previous messages and xx. Set 𝒂(t+1),𝒃(t+1)0n×n\bm{a}^{(t+1)},\bm{b}^{(t+1)}\leftarrow 0^{n\times n}. Increment tt by 1 and go to step (c).

    3. (c)

      Cleanup steps. While there exists some direction a𝕊n×n1a\in\mathbb{S}^{n\times n-1} orthogonal to previous directions, i.e., a,𝒂(τ)=0\left\langle a,\bm{a}^{(\tau)}\right\rangle=0 for all τt\tau\leq t, and 𝑿(t)\bm{X}^{(t)} is not 44-wise clean in direction aa, Alice computes 𝒄¯(t+1)truncL(xx,a)\overline{\bm{c}}^{(t+1)}\leftarrow\mathrm{trunc}_{L}\left(\left\langle x\overset{\mathchoice{\mathbin{{\hbox{\scalebox{0.5}{$\displaystyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\textstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptscriptstyle\bullet$}}}}}}{\otimes}x,a\right\rangle\right) and sends 𝒄¯(t+1)\overline{\bm{c}}^{(t+1)} to Bob. Set 𝒂(t+1)a\bm{a}^{(t+1)}\leftarrow a and 𝒃(t+1)0n×n\bm{b}^{(t+1)}\leftarrow 0^{n\times n}. Increment tt by 1. Repeat step (c) while 𝑿(t)\bm{X}^{(t)} is not 44-wise clean; otherwise, increment ii by 1 and go back to the for-loop in step 3 which starts a new phase.

    If it is Bob’s turn to speak, we define everything similarly with the role of x,𝒂,𝑿,𝒖x,\bm{a},\bm{X},\bm{u} switched with y,𝒃,𝒀,𝒗y,\bm{b},\bm{Y},\bm{v}.

  4. 4.

    Finally at the end of the protocol, the value 𝒞¯(x,y)\overline{\mathcal{C}}(x,y) is determined based on all the previous communication and the corresponding output it defines in 𝒞~\widetilde{\mathcal{C}}.

Remark 6.1.

Note that by construction, the non-zero matrices among 𝒂(1),𝒂(2),\bm{a}^{(1)},\bm{a}^{(2)},\ldots form an orthonormal set when viewed as n2n^{2}-dimensional vectors (similarly for 𝒃(1),𝒃(2),\bm{b}^{(1)},\bm{b}^{(2)},\ldots) and moreover, their diagonals are zero. Lastly, 𝒂(t)\bm{a}^{(t)} and 𝒃(t)\bm{b}^{(t)} are known to both Alice and Bob as they are canonically determined by previous messages.

We remark that the steps 3(a), 3(b), and 3(c) always occur in sequence for each party and we refer to such a sequence of steps as a phase for that party. Note that there are at most dd phases. If a new phase starts at time tt, then the current rectangle 𝑿(t)×𝒀(t)\bm{X}^{(t)}\times\bm{Y}^{(t)} is 44-wise clean for both parties by construction.
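A Monte Carlo version of the cleanliness test in step 3(c) might look as follows; the membership oracle in_X below is a hypothetical stand-in for Alice's current set, and the snippet is illustrative only.

import numpy as np

rng = np.random.default_rng(7)
n, lam = 6, 100.0

def clean_in_direction(in_X, a, samples=10**5):
    # estimate E[<x (.) x - sigma(X), a>^2 | x in X], where x (.) x denotes
    # the outer product of x with itself with the diagonal zeroed out
    x = rng.normal(size=(samples, n))
    x = x[in_X(x)]                                    # condition on x in X
    outer = x[:, :, None] * x[:, None, :]
    outer[:, np.arange(n), np.arange(n)] = 0.0        # zero out the diagonal
    sigma = outer.mean(axis=0)                        # level-two center of mass sigma(X)
    vals = ((outer - sigma) * a).sum(axis=(1, 2))     # <x (.) x - sigma(X), a>
    return np.mean(vals ** 2) < lam

a = rng.normal(size=(n, n))
np.fill_diagonal(a, 0.0)
a /= np.linalg.norm(a)                                # a unit direction with zero diagonal
print(clean_in_direction(lambda x: x[:, 0] > 0, a))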

Now we formalize a few useful properties regarding the communication protocol 𝒞¯\overline{\mathcal{C}}. The first fact below follows since each 𝒖(t)\bm{u}^{(t)} is an expectation of 𝒙𝒙\bm{x}\overset{\mathchoice{\mathbin{{\hbox{\scalebox{0.5}{$\displaystyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\textstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptscriptstyle\bullet$}}}}}}{\otimes}\bm{x} over some distribution and 𝒙𝒙\bm{x}\overset{\mathchoice{\mathbin{{\hbox{\scalebox{0.5}{$\displaystyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\textstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptscriptstyle\bullet$}}}}}}{\otimes}\bm{x} has zero diagonal.

Fact 6.2.

𝒖(0)=𝒗(0)=0n×n\bm{u}^{(0)}=\bm{v}^{(0)}=0^{n\times n} and each 𝐮(t),𝐯(t)\bm{u}^{(t)},\bm{v}^{(t)} has zero diagonal.

The following follows from tail bounds for the univariate standard normal distribution.

Fact 6.3.

Let γ=γ(𝐗(0))γ(𝐘(0))\gamma^{*}=\gamma(\bm{X}^{(0)})\cdot\gamma(\bm{Y}^{(0)}). Then γ1O(neT2/2)\gamma^{*}\geq 1-O\left(n\cdot e^{-T^{2}/2}\right).

The next fact says that when a node fixes a quadratic form with 2L2^{-L} precision, for any two inputs that reach this node, the quadratic forms differ by at most 2L2^{-L}.

Fact 6.4.

In step 3(a) and 3(c), any x,x𝐗(t+1)x,x^{\prime}\in\bm{X}^{(t+1)} satisfies |xx,𝐚(t+1)xx,𝐚(t+1)|<2L\left|\left\langle x\overset{\mathchoice{\mathbin{{\hbox{\scalebox{0.5}{$\displaystyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\textstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptscriptstyle\bullet$}}}}}}{\otimes}x,\bm{a}^{(t+1)}\right\rangle-\left\langle x^{\prime}\overset{\mathchoice{\mathbin{{\hbox{\scalebox{0.5}{$\displaystyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\textstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptscriptstyle\bullet$}}}}}}{\otimes}x^{\prime},\bm{a}^{(t+1)}\right\rangle\right|<2^{-L}. Similarly any y,y𝐘(t+1)y,y^{\prime}\in\bm{Y}^{(t+1)} satisfies |yy,𝐛(t+1)yy,𝐛(t+1)|<2L\left|\left\langle y\overset{\mathchoice{\mathbin{{\hbox{\scalebox{0.5}{$\displaystyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\textstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptscriptstyle\bullet$}}}}}}{\otimes}y,\bm{b}^{(t+1)}\right\rangle-\left\langle y^{\prime}\overset{\mathchoice{\mathbin{{\hbox{\scalebox{0.5}{$\displaystyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\textstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptscriptstyle\bullet$}}}}}}{\otimes}y^{\prime},\bm{b}^{(t+1)}\right\rangle\right|<2^{-L}.
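As a quick numerical illustration of this fact (reading trunc_L as rounding down to a multiple of 2^{-L}, which is one natural implementation), any two inputs that produce the same truncated message have quadratic forms within 2^{-L} of each other:

```python
import numpy as np

def trunc(value, L):
    # One natural reading of trunc_L: round down to a multiple of 2^-L.
    return np.floor(value * 2**L) / 2**L

rng = np.random.default_rng(0)
L, n = 10, 8
a = rng.standard_normal((n, n))
np.fill_diagonal(a, 0.0)
a /= np.linalg.norm(a)               # a unit direction with zero diagonal

def quad(x):                         # the quadratic form of x along a
    X = np.outer(x, x)
    np.fill_diagonal(X, 0.0)
    return float(np.sum(X * a))

# Bucket random inputs by their truncated message; within a bucket (i.e.,
# among inputs that reach the same node), the forms differ by < 2^-L.
vals = np.array([quad(x) for x in rng.standard_normal((20000, n))])
msgs = trunc(vals, L)
spreads = [np.ptp(vals[msgs == m]) for m in np.unique(msgs)]
print(max(spreads) < 2**-L)          # True
```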

The next claim bounds the maximum attainable norms of Alice’s and Bob’s level-two centers of mass at any point in the protocol. It uses the fact that the inputs come from the truncated Gaussian distribution.

Claim 6.5.

𝒖(t)=η𝒖(t)<nT\left\|\bm{u}^{(t)}\right\|=\left\|\eta\odot\bm{u}^{(t)}\right\|<nT and 𝒗(t)=η𝒗(t)<nT\left\|\bm{v}^{(t)}\right\|=\left\|\eta\odot\bm{v}^{(t)}\right\|<nT for all possible tt and 𝒖(t),𝒗(t)\bm{u}^{(t)},\bm{v}^{(t)} throughout the communication.

Proof.

Since η\eta is a matrix with zero diagonal and {±1}\{\pm 1\} entries off diagonal and 𝒖(t)\bm{u}^{(t)} also has zero diagonal, 𝒖(t)=η𝒖(t)\left\|\bm{u}^{(t)}\right\|=\left\|\eta\odot\bm{u}^{(t)}\right\|. In addition, since 𝑿(t)𝑿(0)=[T,T]n\bm{X}^{(t)}\subseteq\bm{X}^{(0)}=[-T,T]^{n}, we have

𝒖(t)𝔼𝒙γ[(𝒙𝒙)|𝒙𝑿(t)](n2n)T2<nT.\left\|\bm{u}^{(t)}\right\|\leq\operatorname*{\mathbb{E}}_{\bm{x}\sim\gamma}\left[\left\|\left(\bm{x}\overset{\mathchoice{\mathbin{{\hbox{\scalebox{0.5}{$\displaystyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\textstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptscriptstyle\bullet$}}}}}}{\otimes}\bm{x}\right)\right\|\,\middle|\,\bm{x}\in\bm{X}^{(t)}\right]\leq\sqrt{(n^{2}-n)\cdot T^{2}}<nT.

A similar analysis works for 𝒗(t)\bm{v}^{(t)}. ∎

The next claim gives a bound on the length of any message in the protocol 𝒞¯\overline{\mathcal{C}}.

Claim 6.6.

For any x𝑿(0)x\in\bm{X}^{(0)} and y𝒀(0)y\in\bm{Y}^{(0)}, any message in 𝒞¯(x,y)\overline{\mathcal{C}}(x,y) consists of at most L+log(Tn)L+\log(Tn) many bits.

Proof.

Assume without loss of generality that it is Alice’s turn to speak. In step 3(b) she sends a single bit. In steps 3(a) and 3(c), she computes truncL(xx,a)\mathrm{trunc}_{L}(\langle x\overset{\mathchoice{\mathbin{{\hbox{\scalebox{0.5}{$\displaystyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\textstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptscriptstyle\bullet$}}}}}}{\otimes}x,a\rangle) for some a𝕊n×n1a\in\mathbb{S}^{n\times n-1} and sends the result. Since

|xx,a|xxa(n2n)T2<nT,\left|\left\langle x\overset{\mathchoice{\mathbin{{\hbox{\scalebox{0.5}{$\displaystyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\textstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptscriptstyle\bullet$}}}}}}{\otimes}x,a\right\rangle\right|\leq\left\|x\overset{\mathchoice{\mathbin{{\hbox{\scalebox{0.5}{$\displaystyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\textstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptscriptstyle\bullet$}}}}}}{\otimes}x\right\|\cdot\left\|a\right\|\leq\sqrt{(n^{2}-n)\cdot T^{2}}<nT,

and the message is a multiple of 2L2^{-L}, the truncation truncL\mathrm{trunc}_{L} yields a message of at most L+log(nT)L+\log(nT) bits. ∎

The last claim bounds the maximum depth of the new protocol 𝒞¯\overline{\mathcal{C}}.

Claim 6.7.

Let \ell be an arbitrary leaf of the protocol 𝒞¯\overline{\mathcal{C}} and D()D(\ell) be its depth. Then D()2n2D(\ell)\leq 2n^{2}. Moreover, along this path there are at most n2nn^{2}-n many non-zero 𝒂(t)\bm{a}^{(t)} and at most n2nn^{2}-n many non-zero 𝒃(t)\bm{b}^{(t)} for t{1,,D()}t\in\{1,\ldots,D(\ell)\}.

Proof.

We count the number of communication steps separately:

  • Steps 3(a) and 3(b). Steps 3(a) and 3(b) occur once in every phase, thus at most dd times.

  • Step 3(c). For Alice, each time she communicates at step 3(c), the direction a𝕊n×n1a\in\mathbb{S}^{n\times n-1} is non-zero and orthogonal to all previous 𝒂(t)\bm{a}^{(t)}’s. Since the dimension of 𝕊n×n1\mathbb{S}^{n\times n-1} is n2nn^{2}-n, this happens at most n2nn^{2}-n times. A similar argument works for Bob.

Thus in total we have at most 2(n2n)+2d2n22(n^{2}-n)+2d\leq 2n^{2} steps. ∎

We will eventually show that, with a suitable choice of λ,T,L\lambda,T,L, the depth D()D(\ell) is typically at most dpolylog(n)d\cdot\mathrm{polylog}(n).

6.2 Bounding the Expected Quadratic Variation

Consider the martingale process defined in Equation 4.5 from a random walk on the protocol tree generated by 𝒞¯\overline{\mathcal{C}} where the inputs 𝒙,𝒚\bm{x},\bm{y} are sampled from γn\gamma_{n} conditioned on being in the bounded cube [T,T]n[-T,T]^{n}. Recall that Proposition 4.3 still holds (see Remark 4.5).

Formally, at time tt the process is defined by

𝒛2(t)=𝒖(t),η𝒗(t),\bm{z}^{(t)}_{2}=\left\langle\bm{u}^{(t)},\eta\odot\bm{v}^{(t)}\right\rangle,

where we recall that 𝒖(t)=σ(𝑿(t))\bm{u}^{(t)}=\sigma(\bm{X}^{(t)}) and 𝒗(t)=σ(𝒀(t))\bm{v}^{(t)}=\sigma(\bm{Y}^{(t)}) and η\eta is a fixed sign matrix with a zero diagonal. The martingale process stops once it hits a leaf of 𝒞¯\overline{\mathcal{C}}. Let 𝒅\bm{d} denote the (stopping) time when this happens. Note that 𝔼[𝒅]\operatorname*{\mathbb{E}}[\bm{d}] is exactly the expected depth of the protocol 𝒞¯\overline{\mathcal{C}}.

In light of Remark 4.2 and Proposition 4.4, to prove Theorem 1.3, it suffices to prove the following.

Lemma 6.8.

𝔼[t=1𝒅(Δ𝒛2(t))2]=O(d3log6(n)).\operatorname*{\mathbb{E}}\left[\sum_{t=1}^{\bm{d}}\left(\Delta\bm{z}^{(t)}_{2}\right)^{2}\right]=O\left(d^{3}\log^{6}(n)\right).

Lemma 6.8 is proved in three steps. We first show that essentially the only change in the value of the martingale is the orthogonalization step 3(a). The reason is the same as for the level-one bound: Alice’s messages sent in steps 3(b) and 3(c) are always near-orthogonal to Bob’s current level-two center of mass, thus they do not change the value of the martingale 𝒛2(t)\bm{z}^{(t)}_{2} much. Moreover, by the level-two analog of Subsection 2.1, since Alice’s current node was clean just before Alice sent the message in step 3(a), the expected change 𝔼[(Δ𝒛2(t+1))2]\operatorname*{\mathbb{E}}\left[\left(\Delta\bm{z}^{(t+1)}_{2}\right)^{2}\right] can be bounded in terms of the squared norm of the change that occurred in 𝒖(t)\bm{u}^{(t)} (or 𝒗(t)\bm{v}^{(t)}) between the current round and the last round where Alice was in step 3(a). A similar argument works for Bob.

Formally, this is encapsulated by the next lemma, for which we need some additional definitions. Let ((t))t(\mathcal{F}^{(t)})_{t} denote the natural filtration induced by the random walk on the generalized protocol tree with respect to which 𝒛2(t)\bm{z}^{(t)}_{2} is a Doob martingale and also 𝒖(t),𝒗(t)\bm{u}^{(t)},\bm{v}^{(t)} form vector-valued martingales (recall Proposition 4.3). Note that (t)\mathcal{F}^{(t)} fixes all the rectangles encountered during times 0,,t0,\ldots,t, and thus, for τt\tau\leq t, the random variables 𝒖(τ),𝒗(τ),𝒛2(τ)\bm{u}^{(\tau)},\bm{v}^{(\tau)},\bm{z}^{(\tau)}_{2} are determined; in particular, they are (t)\mathcal{F}^{(t)}-measurable. Recalling that λ\lambda is the cleanup parameter to be optimized later, we then have the following. Below we assume, without loss of generality, that Alice speaks first; in particular, Alice speaks in step 3(a) for the first time at time zero, when both Alice’s and Bob’s centers of mass are zero: 𝒖(0)=𝒗(0)=0\bm{u}^{(0)}=\bm{v}^{(0)}=0.

Lemma 6.9 (Step Size).

Let 0=𝛕1<𝛕2<𝐝0=\bm{\tau}_{1}<\bm{\tau}_{2}<\cdots\leq\bm{d} be a sequence of stopping times with 𝛕m\bm{\tau}_{m} being the index of the round where Alice speaks in step 3(a) for the mthm^{\text{th}} time or 𝐝\bm{d} if there is no such round. Then, for any integer m2m\geq 2,

𝔼[(Δ𝒛2(𝝉m+1))2|(𝝉m)]λ𝒗(𝝉m)𝒗(𝝉m1)2+16n7T32L.\operatorname*{\mathbb{E}}\left[\left(\Delta\bm{z}^{(\bm{\tau}_{m}+1)}_{2}\right)^{2}\,\middle|\,\mathcal{F}^{(\bm{\tau}_{m})}\right]\leq\lambda\cdot\left\|\bm{v}^{(\bm{\tau}_{m})}-\bm{v}^{(\bm{\tau}_{m-1})}\right\|^{2}+16n^{7}T^{3}\cdot 2^{-L}.

Moreover, for any tt\in\mathbb{N}, we have

𝔼[(Δ𝒛2(t+1))2|(t),𝝉m1<t<𝝉m,Alice speaks at time t]4n6T222L\operatorname*{\mathbb{E}}\left[\left(\Delta\bm{z}^{(t+1)}_{2}\right)^{2}\,\middle|\,\mathcal{F}^{(t)},\bm{\tau}_{m-1}<t<\bm{\tau}_{m},\text{Alice speaks at time }t\right]\leq 4n^{6}T^{2}\cdot 2^{-2L}

A similar statement also holds if Bob speaks where 𝐯\bm{v} is replaced by 𝐮\bm{u} and the sequence (𝛕m)(\bm{\tau}_{m}) is replaced by (𝛕m)(\bm{\tau}^{\prime}_{m}) where 𝛕m\bm{\tau}^{\prime}_{m} is the index of the round where Bob speaks in step 3(a) for the mthm^{\text{th}} time or 𝐝\bm{d} if there is no such round.

We indeed see that, if L=Ω(log(n))L=\Omega(\log(n)) and T=O(log(n))T=O(\sqrt{\log(n)}), then poly(T,n)2L=o(1)\mathrm{poly}(T,n)\cdot 2^{-L}=o(1), so steps 3(b) and 3(c) do not contribute much to the quadratic variation; only the steps 3(a) do. Also, since Alice and Bob each start their first phase in step 3(a), the vectors 𝒖(𝝉1)\bm{u}^{(\bm{\tau}_{1})} and 𝒗(𝝉1)\bm{v}^{(\bm{\tau}^{\prime}_{1})} are their initial centers of mass, which are both zero.
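To get a feel for the parameter regime, the following snippet plugs in illustrative values of n and d (chosen here arbitrarily) and checks that both error terms from Lemma 6.9 are negligible once L is a sufficiently large multiple of log n:

```python
import math

n, d = 10**6, 10**3                       # illustrative sizes only
T = math.sqrt(math.log(n))                # T = Theta(sqrt(log n))
L = math.ceil(10 * math.log2(n))          # L = Theta(log n); 10 gives slack

err_3a = 16 * n**7 * T**3 * 2**-L         # additive error in the 3(a) bound
err_3bc = 4 * n**6 * T**2 * 2**(-2 * L)   # step size in steps 3(b), 3(c)
print(err_3a, err_3bc)                    # both are far below 1, i.e. o(1)
assert err_3a < 1e-6 and err_3bc < 1e-6
```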

We shall prove the above lemma in Subsection 6.3 and continue with the bound on the quadratic variation here. Using the bounds on the step sizes from Lemma 6.9,

𝔼[t=1𝒅(Δ𝒛2(t))2]\displaystyle\operatorname*{\mathbb{E}}\left[\sum_{t=1}^{\bm{d}}\left(\Delta\bm{z}^{(t)}_{2}\right)^{2}\right] λ𝔼[m2𝒗(𝝉m)𝒗(𝝉m1)2+𝒖(𝝉m)𝒖(𝝉m1)2]+16n7T32L𝔼[𝒅]\displaystyle\leq\lambda\cdot\operatorname*{\mathbb{E}}\left[\sum_{m\geq 2}\left\|\bm{v}^{(\bm{\tau}_{m})}-\bm{v}^{(\bm{\tau}_{m-1})}\right\|^{2}+\left\|\bm{u}^{(\bm{\tau}^{\prime}_{m})}-\bm{u}^{(\bm{\tau}^{\prime}_{m-1})}\right\|^{2}\right]+16n^{7}T^{3}\cdot 2^{-L}\cdot\operatorname*{\mathbb{E}}[\bm{d}]
λ𝔼[m2𝒗(𝝉m)𝒗(𝝉m1)2+𝒖(𝝉m)𝒖(𝝉m1)2]+16n7T32L2n2.\displaystyle\leq\lambda\cdot\operatorname*{\mathbb{E}}\left[\sum_{m\geq 2}\left\|\bm{v}^{(\bm{\tau}_{m})}-\bm{v}^{(\bm{\tau}_{m-1})}\right\|^{2}+\left\|\bm{u}^{(\bm{\tau}^{\prime}_{m})}-\bm{u}^{(\bm{\tau}^{\prime}_{m-1})}\right\|^{2}\right]+16n^{7}T^{3}\cdot 2^{-L}\cdot 2n^{2}. (by Claim 6.7)

On the other hand, using the orthogonality of vector-valued martingale differences from Equation 3.2,

𝔼[m2𝒗(𝝉m)𝒗(𝝉m1)2]=𝔼[𝒗(𝒅)2].\displaystyle\operatorname*{\mathbb{E}}\left[\sum_{m\geq 2}\left\|\bm{v}^{(\bm{\tau}_{m})}-\bm{v}^{(\bm{\tau}_{m-1})}\right\|^{2}\right]=\operatorname*{\mathbb{E}}\left[\left\|\bm{v}^{(\bm{d})}\right\|^{2}\right].

A similar statement holds for (𝒖(t))(\bm{u}^{(t)}) as well. Therefore,

𝔼[t=1𝒅(Δ𝒛2(t))2]λ(𝔼[𝒖(𝒅)2]+𝔼[𝒗(𝒅)2])+64n9T32L.\displaystyle\operatorname*{\mathbb{E}}\left[\sum_{t=1}^{\bm{d}}\left(\Delta\bm{z}^{(t)}_{2}\right)^{2}\right]\leq\lambda\cdot\left(\operatorname*{\mathbb{E}}\left[\left\|\bm{u}^{(\bm{d})}\right\|^{2}\right]+\operatorname*{\mathbb{E}}\left[\left\|\bm{v}^{(\bm{d})}\right\|^{2}\right]\right)+64n^{9}T^{3}\cdot 2^{-L}. (6.1)
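The orthogonality identity used above (Equation 3.2) is just the Pythagorean theorem for martingale increments: for a vector-valued martingale started at zero, the expected squared norm of the final value equals the expected sum of squared increment norms, since the cross terms vanish in expectation. The toy simulation below, with dynamics unrelated to the protocol, illustrates the identity:

```python
import numpy as np

rng = np.random.default_rng(1)
n, steps, trials = 5, 12, 20_000

lhs = rhs = 0.0
for _ in range(trials):
    v = np.zeros(n)
    sum_sq = 0.0
    for t in range(steps):
        e = np.zeros(n)
        e[t % n] = 1.0
        # A martingale difference: a fair sign times a predictable vector.
        step = rng.choice((-1.0, 1.0)) * (0.3 * v + e)
        sum_sq += float(np.sum(step**2))
        v = v + step
    lhs += sum_sq
    rhs += float(np.sum(v**2))

print(lhs / trials, rhs / trials)  # the two averages agree up to noise
```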

Then in Subsection 6.4 we will apply level-two inequalities (see Theorem 3.1) to reduce bounding 𝔼[𝒖(𝒅)2+𝒗(𝒅)2]\operatorname*{\mathbb{E}}\left[\left\|\bm{u}^{(\bm{d})}\right\|^{2}+\left\|\bm{v}^{(\bm{d})}\right\|^{2}\right] to bounding the second moment 𝔼[𝒅2]\operatorname*{\mathbb{E}}[\bm{d}^{2}]. This reduction is formalized as Lemma 6.10 below, and its proof is similar to [27, Claim 1].

For each leaf \ell, let γ()=γ(𝑿(D()))γ(𝒀(D()))\gamma(\ell)=\gamma(\bm{X}^{(D(\ell))})\cdot\gamma(\bm{Y}^{(D(\ell))}) be the Gaussian measure of the rectangle at \ell. Recall γ=γ(𝑿(0))γ(𝒀(0))\gamma^{*}=\gamma(\bm{X}^{(0)})\cdot\gamma(\bm{Y}^{(0)}).

Lemma 6.10.

𝔼[𝒖(𝒅)2+𝒗(𝒅)2]O(1γ+L2𝔼[𝒅2])\operatorname*{\mathbb{E}}\left[\left\|\bm{u}^{(\bm{d})}\right\|^{2}+\left\|\bm{v}^{(\bm{d})}\right\|^{2}\right]\leq O\left(\frac{1}{\gamma^{*}}+L^{2}\operatorname*{\mathbb{E}}[\bm{d}^{2}]\right).

Finally, in Subsection 6.5, we bound the second moment 𝔼[𝒅2]\operatorname*{\mathbb{E}}[\bm{d}^{2}] for a suitable choice of parameters.

Lemma 6.11.

It holds that 𝔼[𝐝2]=O(d2)\operatorname*{\mathbb{E}}[\bm{d}^{2}]=O(d^{2}) and γ34\gamma^{*}\geq\frac{3}{4} for L=Θ(log(n))L=\Theta(\log(n)), T=Θ(log(n))T=\Theta(\sqrt{\log(n)}), and λ=Θ(dlog4(n))\lambda=\Theta(d\log^{4}(n)).

Given Lemmas 6.10 and 6.11, the proof of Lemma 6.8 follows naturally.

Proof of Lemma 6.8.

With the parameters chosen in Lemma 6.11, we have

𝔼[t=1𝒅(Δ𝒛2(t))2]\displaystyle\operatorname*{\mathbb{E}}\left[\sum_{t=1}^{\bm{d}}\left(\Delta\bm{z}^{(t)}_{2}\right)^{2}\right] O(dlog4(n))(𝔼[𝒖(𝒅)2]+𝔼[𝒗(𝒅)2])+1\displaystyle\leq O(d\log^{4}(n))\cdot\left(\operatorname*{\mathbb{E}}\left[\left\|\bm{u}^{(\bm{d})}\right\|^{2}\right]+\operatorname*{\mathbb{E}}\left[\left\|\bm{v}^{(\bm{d})}\right\|^{2}\right]\right)+1 (by Equation 6.1)
O(dlog4(n))(1+log2(n)𝔼[𝒅2])+1\displaystyle\leq O(d\log^{4}(n))\cdot\left(1+\log^{2}(n)\cdot\operatorname*{\mathbb{E}}[\bm{d}^{2}]\right)+1 (by Lemma 6.10)
O(dlog4(n))(1+log2(n)d2)+1\displaystyle\leq O(d\log^{4}(n))\cdot\left(1+\log^{2}(n)\cdot d^{2}\right)+1 (by Lemma 6.11)
=O(d3log6(n)).\displaystyle=O(d^{3}\log^{6}(n)). ∎
Remark 6.12.

Note that our proof for level-two Fourier growth actually holds for a slightly more general setting, where Alice and Bob are allowed to send O(L)=O(log(n))O(L)=O(\log(n)) bits during each original communication round. This can be viewed as balancing the length of the messages in step 3(b) with step 3(a) and step 3(c).

Since one can always convert a dd-round 11-bit communication protocol into a 2dloglog(n)\frac{2d}{\log\log(n)}-round log(n)\log(n)-bit communication protocol, we obtain a slightly better level-two Fourier growth bound of

O(d3/2log3(n)(loglog(n))3/2).O\left(\frac{d^{3/2}\log^{3}(n)}{\left(\log\log(n)\right)^{3/2}}\right).

The conversion is done by Alice (resp., Bob) enumerating all possibilities for the next loglog(n)/2\log\log(n)/2 bits from Bob (resp., Alice) and providing her (resp., his) loglog(n)/2\log\log(n)/2-bit responses for each possibility, as sketched below.
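The sketch below illustrates this conversion under the simplifying assumption that the parties strictly alternate: in one big message, a party commits to its strategy tree for the next b rounds, i.e., its reply to every possible behavior of the other party. Here my_next_bit is a hypothetical stand-in for the original protocol's next-message function.

```python
from itertools import product

def strategy_tree(my_next_bit, b):
    """One big message: my next-bit reply after every possible sequence of
    up to b - 1 bits from the other party.  The tree has 2^b - 1 nodes, so
    the message costs fewer than 2^b bits; with b = loglog(n)/2 that is
    about sqrt(log n) <= log(n) bits, yet it simulates b original rounds."""
    return {bits: my_next_bit(bits)
            for depth in range(b)
            for bits in product((0, 1), repeat=depth)}

# Toy usage: the original next bit is the parity of the bits seen so far.
tree = strategy_tree(lambda bits: sum(bits) % 2, b=3)
print(len(tree))  # 7 = 2^3 - 1 strategy bits replace 3 interactive rounds
```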

It is also possible to improve the log3(n)\log^{3}(n) factor to log2(n)\log^{2}(n) by varying the cleanup parameter λ\lambda with depth. For example, for depth in the interval [4rd,4(r+1)d][4rd,4(r+1)d], one could pick λr=Θ(dlog2(n)r2)\lambda_{r}=\Theta(d\cdot\log^{2}(n)\cdot r^{2}). Since our focus is mostly on improving the polynomial dependence in dd where there is still room for improvement, we do not make an effort here to improve the polylog terms.

6.3 Bounds on Step Sizes (Proof of Lemma 6.9)

Let us abbreviate 𝝉=𝝉m\bm{\tau}=\bm{\tau}_{m} and note that at time 𝝉\bm{\tau} a new phase starts for Alice. By construction, this means that the current rectangle 𝑿(𝝉)×𝒀(𝝉)\bm{X}^{(\bm{\tau})}\times\bm{Y}^{(\bm{\tau})} determined by (𝝉)\mathcal{F}^{(\bm{\tau})} is 44-wise clean with parameter λ\lambda, and since Alice is in step 3(a) at the start of a new phase, 𝒂(𝝉+1)\bm{a}^{(\bm{\tau}+1)} is chosen to be the (normalized) component of η𝒗(𝝉)\eta\odot\bm{v}^{(\bm{\tau})} that is orthogonal to previous directions 𝒂(1),,𝒂(𝝉)\bm{a}^{(1)},\ldots,\bm{a}^{(\bm{\tau})}.
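In other words, step 3(a) performs a single Gram–Schmidt step. A minimal NumPy sketch of the choice of the new direction (directions stored as matrices, the used family assumed orthonormal):

```python
import numpy as np

def next_direction(eta_v, used, tol=1e-12):
    """Normalized component of eta_v orthogonal to the used directions;
    returns None when eta_v already lies in their span (then the new
    coefficient would be zero and no direction is added)."""
    r = eta_v.copy()
    for a in used:
        r -= np.sum(r * a) * a      # subtract the projection onto a
    norm = np.linalg.norm(r)
    return None if norm < tol else r / norm
```

With this choice, the coefficient 𝜷(𝝉+1)\bm{\beta}^{(\bm{\tau}+1)} defined next is exactly the norm of the residual r.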

For each r=1,,𝝉+1r=1,\ldots,\bm{\tau}+1, let 𝜷(r):=η𝒗(𝝉),𝒂(r)\bm{\beta}^{(r)}:=\left\langle\eta\odot\bm{v}^{(\bm{\tau})},\bm{a}^{(r)}\right\rangle be the length of η𝒗(𝝉)\eta\odot\bm{v}^{(\bm{\tau})} along direction 𝒂(r)\bm{a}^{(r)}. Each 𝜷(r)\bm{\beta}^{(r)} is (𝝉)\mathcal{F}^{(\bm{\tau})}-measurable (i.e., it is determined by (𝝉)\mathcal{F}^{(\bm{\tau})}) and η𝒗(𝝉)=r𝝉+1𝜷(r)𝒂(r)\eta\odot\bm{v}^{(\bm{\tau})}=\sum_{r\leq\bm{\tau}+1}\bm{\beta}^{(r)}\cdot\bm{a}^{(r)}. In this case, we have

𝔼[(Δ𝒛2(𝝉+1))2|(𝝉)]\displaystyle\operatorname*{\mathbb{E}}\left[\left(\Delta\bm{z}^{(\bm{\tau}+1)}_{2}\right)^{2}\,\middle|\,\mathcal{F}^{(\bm{\tau})}\right] =𝔼[𝒖(𝝉+1)𝒖(𝝉),η𝒗(𝝉)2|(𝝉)]\displaystyle=\operatorname*{\mathbb{E}}\left[\left\langle\bm{u}^{(\bm{\tau}+1)}-\bm{u}^{(\bm{\tau})},\eta\odot\bm{v}^{(\bm{\tau})}\right\rangle^{2}\,\middle|\,\mathcal{F}^{(\bm{\tau})}\right]
=𝔼[(r=1𝝉+1𝜷(r)𝒖(𝝉+1)𝒖(𝝉),𝒂(r))2|(𝝉)].\displaystyle=\operatorname*{\mathbb{E}}\left[\left(\sum_{r=1}^{\bm{\tau}+1}\bm{\beta}^{(r)}\cdot\left\langle\bm{u}^{(\bm{\tau}+1)}-\bm{u}^{(\bm{\tau})},\bm{a}^{(r)}\right\rangle\right)^{2}\,\middle|\,\mathcal{F}^{(\bm{\tau})}\right]. (6.2)

Similar to the level-one proof, the components of 𝒖(𝝉+1)\bm{u}^{(\bm{\tau}+1)} and 𝒖(𝝉)\bm{u}^{(\bm{\tau})} are roughly the same along any of the previous directions 𝒂(1),,𝒂(𝝉)\bm{a}^{(1)},\ldots,\bm{a}^{(\bm{\tau})} and so they almost cancel out and the major quantity is in the direction 𝒂(𝝉+1)\bm{a}^{(\bm{\tau}+1)}. This follows since, in all the previous steps r𝝉r\leq\bm{\tau}, Alice has already fixed xx,𝒂(r)\left\langle x\overset{\mathchoice{\mathbin{{\hbox{\scalebox{0.5}{$\displaystyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\textstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptscriptstyle\bullet$}}}}}}{\otimes}x,\bm{a}^{(r)}\right\rangle with precision 2L2^{-L}. This implies that for any 𝑿(𝝉)\bm{X}^{(\bm{\tau})} and 𝑿(𝝉+1)\bm{X}^{(\bm{\tau}+1)} that are determined by (𝝉+1)\mathcal{F}^{(\bm{\tau}+1)}, the inner product with all the previous 𝒂(1),,𝒂(𝝉)\bm{a}^{(1)},\ldots,\bm{a}^{(\bm{\tau})} is fixed with precision 2L2^{-L} over the choice of xx. Formally, by Fact 6.4, we have that for any x𝑿(𝝉)x\in\bm{X}^{(\bm{\tau})} and x𝑿(𝝉+1)x^{\prime}\in\bm{X}^{(\bm{\tau}+1)}, it holds that |xx,𝒂(r)xx,𝒂(r)|2L\left|\left\langle x\overset{\mathchoice{\mathbin{{\hbox{\scalebox{0.5}{$\displaystyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\textstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptscriptstyle\bullet$}}}}}}{\otimes}x,\bm{a}^{(r)}\right\rangle-\left\langle x^{\prime}\overset{\mathchoice{\mathbin{{\hbox{\scalebox{0.5}{$\displaystyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\textstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptscriptstyle\bullet$}}}}}}{\otimes}x^{\prime},\bm{a}^{(r)}\right\rangle\right|\leq 2^{-L} for all r𝝉r\leq\bm{\tau}. In particular, since 𝒖(𝝉)=σ(𝑿(𝝉))\bm{u}^{(\bm{\tau})}=\sigma(\bm{X}^{(\bm{\tau})}) and 𝒖(𝝉+1)=σ(𝑿(𝝉+1))\bm{u}^{(\bm{\tau}+1)}=\sigma(\bm{X}^{(\bm{\tau}+1)}) are the corresponding centers of mass, we have that

|𝒖(𝝉+1)𝒖(𝝉),𝒂(r)|2Lfor all r𝝉.\left|\left\langle\bm{u}^{(\bm{\tau}+1)}-\bm{u}^{(\bm{\tau})},\bm{a}^{(r)}\right\rangle\right|\leq 2^{-L}\quad\text{for all $r\leq\bm{\tau}$.} (6.3)

On the other hand, since 𝑿(𝝉+1)𝑿(𝝉)𝑿(0)=[T,T]n\bm{X}^{(\bm{\tau}+1)}\subseteq\bm{X}^{(\bm{\tau})}\subseteq\bm{X}^{(0)}=[-T,T]^{n} and 𝒂(𝝉+1)\bm{a}^{(\bm{\tau}+1)} is a unit direction, we have

|𝒖(𝝉+1)𝒖(𝝉),𝒂(𝝉+1)|𝒖(𝝉+1)𝒖(𝝉)2nT.\left|\left\langle\bm{u}^{(\bm{\tau}+1)}-\bm{u}^{(\bm{\tau})},\bm{a}^{(\bm{\tau}+1)}\right\rangle\right|\leq\left\|\bm{u}^{(\bm{\tau}+1)}-\bm{u}^{(\bm{\tau})}\right\|\leq 2nT. (6.4)

Similarly, noting that η\eta is a sign matrix, we can bound

|𝜷(r)|=|η𝒗(𝝉),𝒂(r)|η𝒗(𝝉)𝒗(𝝉)nTfor all r𝝉+1.\left|\bm{\beta}^{(r)}\right|=\left|\left\langle\eta\odot\bm{v}^{(\bm{\tau})},\bm{a}^{(r)}\right\rangle\right|\leq\left\|\eta\odot\bm{v}^{(\bm{\tau})}\right\|\leq\left\|\bm{v}^{(\bm{\tau})}\right\|\leq nT\quad\text{for all $r\leq\bm{\tau}+1$.} (6.5)

Expanding the square in Equation 6.2 and plugging these estimates into each of the (𝝉+1)2(\bm{\tau}+1)^{2} terms gives

𝔼[(Δ𝒛2(𝝉+1))2|(𝝉)]\displaystyle\operatorname*{\mathbb{E}}\left[\left(\Delta\bm{z}^{(\bm{\tau}+1)}_{2}\right)^{2}\,\middle|\,\mathcal{F}^{(\bm{\tau})}\right] 𝔼[(𝜷(𝝉+1))2𝒖(𝝉+1)𝒖(𝝉),𝒂(𝝉+1)2+((𝝉+1)21)2(nT)32L|(𝝉)]\displaystyle\leq\operatorname*{\mathbb{E}}\left[\left(\bm{\beta}^{(\bm{\tau}+1)}\right)^{2}\left\langle\bm{u}^{(\bm{\tau}+1)}-\bm{u}^{(\bm{\tau})},\bm{a}^{(\bm{\tau}+1)}\right\rangle^{2}+((\bm{\tau}+1)^{2}-1)\cdot\tfrac{2(nT)^{3}}{2^{L}}\,\middle|\,\mathcal{F}^{(\bm{\tau})}\right]
(𝜷(𝝉+1))2𝔼[𝒖(𝝉+1)𝒖(𝝉),𝒂(𝝉+1)2|(𝝉)]+12n7T32L,\displaystyle\leq\left(\bm{\beta}^{(\bm{\tau}+1)}\right)^{2}\operatorname*{\mathbb{E}}\left[\left\langle\bm{u}^{(\bm{\tau}+1)}-\bm{u}^{(\bm{\tau})},\bm{a}^{(\bm{\tau}+1)}\right\rangle^{2}\,\middle|\,\mathcal{F}^{(\bm{\tau})}\right]+12n^{7}T^{3}\cdot 2^{-L}, (6.6)

where the second line follows from Claim 6.7.

We now bound the term outside the expectation by the change in the center of mass 𝒗()\bm{v}^{(\cdot)} and the term inside the expectation by the fact that the set is 44-wise clean.

Term Outside the Expectation.

Recall that 𝒂(𝝉+1)\bm{a}^{(\bm{\tau}+1)} is chosen to be the (normalized) component of η𝒗(𝝉)\eta\odot\bm{v}^{(\bm{\tau})} that is orthogonal to the span of 𝒂(1),,𝒂(𝝉)\bm{a}^{(1)},\ldots,\bm{a}^{(\bm{\tau})}. Since η𝒗(𝝉m1)\eta\odot\bm{v}^{(\bm{\tau}_{m-1})} is in the span of 𝒂(1),,𝒂(𝝉m1+1)\bm{a}^{(1)},\ldots,\bm{a}^{(\bm{\tau}_{m-1}+1)} and 𝝉m1+1𝝉=𝝉m\bm{\tau}_{m-1}+1\leq\bm{\tau}=\bm{\tau}_{m}, it is orthogonal to 𝒂(𝝉+1)\bm{a}^{(\bm{\tau}+1)}. Hence

𝜷(𝝉+1)=η𝒗(𝝉),𝒂(𝝉+1)=η(𝒗(𝝉)𝒗(𝝉m1)),𝒂(𝝉+1).\bm{\beta}^{(\bm{\tau}+1)}=\left\langle\eta\odot\bm{v}^{(\bm{\tau})},\bm{a}^{(\bm{\tau}+1)}\right\rangle=\left\langle\eta\odot\left(\bm{v}^{(\bm{\tau})}-\bm{v}^{(\bm{\tau}_{m-1})}\right),\bm{a}^{(\bm{\tau}+1)}\right\rangle.

Since 𝒂(𝝉+1)\bm{a}^{(\bm{\tau}+1)} is a unit direction and η\eta is a sign matrix, this implies that

(𝜷(𝝉+1))2𝒗(𝝉)𝒗(𝝉m1)2.\left(\bm{\beta}^{(\bm{\tau}+1)}\right)^{2}\leq\left\|\bm{v}^{(\bm{\tau})}-\bm{v}^{(\bm{\tau}_{m-1})}\right\|^{2}. (6.7)
Term Inside the Expectation.

Recall that Alice is in step 3(a): she sends xx,𝒂(𝝉+1)\left\langle x\overset{\mathchoice{\mathbin{{\hbox{\scalebox{0.5}{$\displaystyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\textstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptscriptstyle\bullet$}}}}}}{\otimes}x,\bm{a}^{(\bm{\tau}+1)}\right\rangle with precision 2L2^{-L} at time 𝝉\bm{\tau}, and thus the same inner product with 𝒂(𝝉+1)\bm{a}^{(\bm{\tau}+1)} is fixed with precision 2L2^{-L} for every point in 𝑿(𝝉+1)\bm{X}^{(\bm{\tau}+1)} determined by (𝝉+1)\mathcal{F}^{(\bm{\tau}+1)}. Thus

𝒖(𝝉+1),𝒂(𝝉+1)2\displaystyle\left\langle\bm{u}^{(\bm{\tau}+1)},\bm{a}^{(\bm{\tau}+1)}\right\rangle^{2} =(𝔼𝒙γ[𝒙𝒙,𝒂(𝝉+1)|𝒙𝑿(𝝉+1)])2\displaystyle=\left(\operatorname*{\mathbb{E}}_{\bm{x}\sim\gamma}\left[\left\langle\bm{x}\overset{\mathchoice{\mathbin{{\hbox{\scalebox{0.5}{$\displaystyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\textstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptscriptstyle\bullet$}}}}}}{\otimes}\bm{x},\bm{a}^{(\bm{\tau}+1)}\right\rangle\,\middle|\,\bm{x}\in\bm{X}^{(\bm{\tau}+1)}\right]\right)^{2}
=(xx,𝒂(𝝉+1)+𝔼𝒙γ[ε𝒙|𝒙𝑿(𝝉+1)])2\displaystyle=\left(\left\langle x\overset{\mathchoice{\mathbin{{\hbox{\scalebox{0.5}{$\displaystyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\textstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptscriptstyle\bullet$}}}}}}{\otimes}x,\bm{a}^{(\bm{\tau}+1)}\right\rangle+\operatorname*{\mathbb{E}}_{\bm{x}\sim\gamma}\left[\varepsilon_{\bm{x}}\,\middle|\,\bm{x}\in\bm{X}^{(\bm{\tau}+1)}\right]\right)^{2} (|ε𝒙|2L|\varepsilon_{\bm{x}}|\leq 2^{-L} is the truncation error by Fact 6.4)
xx,𝒂(𝝉+1)2+22L+21L|xx,𝒂(𝝉+1)|\displaystyle\leq\left\langle x\overset{\mathchoice{\mathbin{{\hbox{\scalebox{0.5}{$\displaystyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\textstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptscriptstyle\bullet$}}}}}}{\otimes}x,\bm{a}^{(\bm{\tau}+1)}\right\rangle^{2}+2^{-2L}+2^{1-L}\cdot\left|\left\langle x\overset{\mathchoice{\mathbin{{\hbox{\scalebox{0.5}{$\displaystyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\textstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptscriptstyle\bullet$}}}}}}{\otimes}x,\bm{a}^{(\bm{\tau}+1)}\right\rangle\right|
xx,𝒂(𝝉+1)2+nT22L,\displaystyle\leq\left\langle x\overset{\mathchoice{\mathbin{{\hbox{\scalebox{0.5}{$\displaystyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\textstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptscriptstyle\bullet$}}}}}}{\otimes}x,\bm{a}^{(\bm{\tau}+1)}\right\rangle^{2}+nT\cdot 2^{2-L}, (6.8)

where the last line follows from |xx,𝒂(𝝉+1)|xx\left|\left\langle x\overset{\mathchoice{\mathbin{{\hbox{\scalebox{0.5}{$\displaystyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\textstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptscriptstyle\bullet$}}}}}}{\otimes}x,\bm{a}^{(\bm{\tau}+1)}\right\rangle\right|\leq\left\|x\overset{\mathchoice{\mathbin{{\hbox{\scalebox{0.5}{$\displaystyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\textstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptscriptstyle\bullet$}}}}}}{\otimes}x\right\| and x𝑿(0)=[T,T]nx\in\bm{X}^{(0)}=[-T,T]^{n}.

Final Bound.

Since (𝒖(r))r(\bm{u}^{(r)})_{r} is a matrix-valued martingale and thus 𝔼[𝒖(𝝉+1)|(𝝉)]=𝒖(𝝉)\operatorname*{\mathbb{E}}\left[\bm{u}^{(\bm{\tau}+1)}\,\middle|\,\mathcal{F}^{(\bm{\tau})}\right]=\bm{u}^{(\bm{\tau})}, we have

𝔼[𝒖(𝝉+1)𝒖(𝝉),𝒂(𝝉+1)2|(𝝉)]=𝔼[𝒖(𝝉+1),𝒂(𝝉+1)2𝒖(𝝉),𝒂(𝝉+1)2|(𝝉)]\operatorname*{\mathbb{E}}\left[\left\langle\bm{u}^{(\bm{\tau}+1)}-\bm{u}^{(\bm{\tau})},\bm{a}^{(\bm{\tau}+1)}\right\rangle^{2}\,\middle|\,\mathcal{F}^{(\bm{\tau})}\right]=\operatorname*{\mathbb{E}}\left[\left\langle\bm{u}^{(\bm{\tau}+1)},\bm{a}^{(\bm{\tau}+1)}\right\rangle^{2}-\left\langle\bm{u}^{(\bm{\tau})},\bm{a}^{(\bm{\tau}+1)}\right\rangle^{2}\,\middle|\,\mathcal{F}^{(\bm{\tau})}\right]

Then by Equation 6.8, we upper bound the right hand side by

nT22L+𝔼𝒙γ[𝒙𝒙,𝒂(𝝉+1)2𝒖(𝝉),𝒂(𝝉+1)2|(𝝉)].\displaystyle nT\cdot 2^{2-L}+\operatorname*{\mathbb{E}}_{\bm{x}\sim\gamma}\left[\left\langle\bm{x}\overset{\mathchoice{\mathbin{{\hbox{\scalebox{0.5}{$\displaystyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\textstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptscriptstyle\bullet$}}}}}}{\otimes}\bm{x},\bm{a}^{(\bm{\tau}+1)}\right\rangle^{2}-\left\langle\bm{u}^{(\bm{\tau})},\bm{a}^{(\bm{\tau}+1)}\right\rangle^{2}\,\middle|\,\mathcal{F}^{(\bm{\tau})}\right].

Since 𝑿(𝝉)\bm{X}^{(\bm{\tau})} is 44-wise clean with parameter λ\lambda, the latter expectation can be bounded by nT22L+λnT\cdot 2^{2-L}+\lambda:

𝔼[𝒖(𝝉+1)𝒖(𝝉),𝒂(𝝉+1)2|(𝝉)]nT22L+λ\operatorname*{\mathbb{E}}\left[\left\langle\bm{u}^{(\bm{\tau}+1)}-\bm{u}^{(\bm{\tau})},\bm{a}^{(\bm{\tau}+1)}\right\rangle^{2}\,\middle|\,\mathcal{F}^{(\bm{\tau})}\right]\leq nT\cdot 2^{2-L}+\lambda (6.9)

Putting everything together, we have

𝔼[(Δ𝒛2(𝝉+1))2|(𝝉)]\displaystyle\operatorname*{\mathbb{E}}\left[\left(\Delta\bm{z}^{(\bm{\tau}+1)}_{2}\right)^{2}\,\middle|\,\mathcal{F}^{(\bm{\tau})}\right] (𝜷(𝝉+1))2𝔼[𝒖(𝝉+1)𝒖(𝝉),𝒂(𝝉+1)2|(𝝉)]+12n7T32L\displaystyle\leq\left(\bm{\beta}^{(\bm{\tau}+1)}\right)^{2}\operatorname*{\mathbb{E}}\left[\left\langle\bm{u}^{(\bm{\tau}+1)}-\bm{u}^{(\bm{\tau})},\bm{a}^{(\bm{\tau}+1)}\right\rangle^{2}\,\middle|\,\mathcal{F}^{(\bm{\tau})}\right]+12n^{7}T^{3}\cdot 2^{-L} (by Equation 6.6)
(𝜷(𝝉+1))2(nT22L+λ)+12n7T32L\displaystyle\leq\left(\bm{\beta}^{(\bm{\tau}+1)}\right)^{2}\cdot\left(nT\cdot 2^{2-L}+\lambda\right)+12n^{7}T^{3}\cdot 2^{-L} (by Equation 6.9)
λ(𝜷(𝝉+1))2+n3T322L+12n7T32L\displaystyle\leq\lambda\cdot\left(\bm{\beta}^{(\bm{\tau}+1)}\right)^{2}+n^{3}T^{3}\cdot 2^{2-L}+12n^{7}T^{3}\cdot 2^{-L} (by Equation 6.5)
λ𝒗(𝝉)𝒗(𝝉m1)2+n3T322L+12n7T32L\displaystyle\leq\lambda\cdot\left\|\bm{v}^{(\bm{\tau})}-\bm{v}^{(\bm{\tau}_{m-1})}\right\|^{2}+n^{3}T^{3}\cdot 2^{2-L}+12n^{7}T^{3}\cdot 2^{-L} (by Equation 6.7)
λ𝒗(𝝉)𝒗(𝝉m1)2+16n7T32L.\displaystyle\leq\lambda\cdot\left\|\bm{v}^{(\bm{\tau})}-\bm{v}^{(\bm{\tau}_{m-1})}\right\|^{2}+16n^{7}T^{3}\cdot 2^{-L}.

This completes the proof of the first statement in the lemma.

For the moreover part, let us condition on the event 𝝉m1<t<𝝉m\bm{\tau}_{m-1}<t<\bm{\tau}_{m} where Alice speaks at time tt. Note that all such tt lie in the same phase of the protocol, during which Alice is the only one speaking. So Bob’s center of mass does not change from time 𝝉m1\bm{\tau}_{m-1} through t+1t+1, i.e., 𝒗(t+1)=𝒗(𝝉m1)\bm{v}^{(t+1)}=\bm{v}^{(\bm{\tau}_{m-1})}. Thus we have

Δ𝒛2(t+1)=𝒖(t+1)𝒖(t),η𝒗(𝝉m1).\Delta\bm{z}^{(t+1)}_{2}=\left\langle\bm{u}^{(t+1)}-\bm{u}^{(t)},\eta\odot\bm{v}^{(\bm{\tau}_{m-1})}\right\rangle. (6.10)

Analogously to Equation 6.3, the components of Alice’s center of mass along the previous directions are fixed with precision 2L2^{-L}. Thus by Fact 6.4,

|𝒖(t+1)𝒖(t),𝒂(r)|2Lfor all rt.\left|\left\langle\bm{u}^{(t+1)}-\bm{u}^{(t)},\bm{a}^{(r)}\right\rangle\right|\leq 2^{-L}\quad\text{for all $r\leq t$.} (6.11)

Furthermore, by construction, η𝒗(𝝉m1)\eta\odot\bm{v}^{(\bm{\tau}_{m-1})} lies in the space spanned by 𝒂(1),,𝒂(𝝉m1+1)\bm{a}^{(1)},\ldots,\bm{a}^{(\bm{\tau}_{m-1}+1)}. Note that 𝝉m1+1t\bm{\tau}_{m-1}+1\leq t. Similar to the previous analysis, for each r=1,,tr=1,\ldots,t, let 𝜷(r):=η𝒗(t),𝒂(r)\bm{\beta}^{(r)}:=\left\langle\eta\odot\bm{v}^{(t)},\bm{a}^{(r)}\right\rangle be the length of η𝒗(t)\eta\odot\bm{v}^{(t)} along direction 𝒂(r)\bm{a}^{(r)}. Then Equation 6.5 also holds here. Therefore

|Δ𝒛2(t+1)|\displaystyle\left|\Delta\bm{z}^{(t+1)}_{2}\right| =|r=1t𝜷(r)𝒖(t+1)𝒖(t),𝒂(r)|\displaystyle=\left|\sum_{r=1}^{t}\bm{\beta}^{(r)}\cdot\left\langle\bm{u}^{(t+1)}-\bm{u}^{(t)},\bm{a}^{(r)}\right\rangle\right| (by Equation 6.10)
r=1t|𝜷(r)||𝒖(t+1)𝒖(t),𝒂(r)|r=1tnT2L\displaystyle\leq\sum_{r=1}^{t}\left|\bm{\beta}^{(r)}\right|\cdot\left|\left\langle\bm{u}^{(t+1)}-\bm{u}^{(t)},\bm{a}^{(r)}\right\rangle\right|\leq\sum_{r=1}^{t}nT\cdot 2^{-L} (by Equation 6.5 and Equation 6.11)
2n3T2L.\displaystyle\leq 2n^{3}T\cdot 2^{-L}. (by Claim 6.7)

Squaring this pointwise bound gives (Δ𝒛2(t+1))24n6T222L(\Delta\bm{z}^{(t+1)}_{2})^{2}\leq 4n^{6}T^{2}\cdot 2^{-2L}, which implies the moreover part of the lemma. ∎

6.4 Conversion to Second Moment Bounds of the Depth (Proof of Lemma 6.10)

Recall γ=γ(𝑿(0))γ(𝒀(0))\gamma^{*}=\gamma(\bm{X}^{(0)})\cdot\gamma(\bm{Y}^{(0)}) and γ()=γ(𝑿(D()))γ(𝒀(D()))\gamma(\ell)=\gamma(\bm{X}^{(D(\ell))})\cdot\gamma(\bm{Y}^{(D(\ell))}) for each leaf \ell. The goal of this subsection is to prove Lemma 6.10.

We first note the following basic fact.

Fact 6.13.

γ()=γ\sum_{\ell}\gamma(\ell)=\gamma^{*} and

𝐏𝐫𝒙𝑿(0),𝒚𝒀(0)[𝒞¯(𝒙,𝒚) reaches leaf ]=γ()/γ.\operatorname*{\mathbf{Pr}}_{\bm{x}\sim\bm{X}^{(0)},\bm{y}\sim\bm{Y}^{(0)}}\left[\overline{\mathcal{C}}(\bm{x},\bm{y})\text{ reaches leaf }\ell\right]=\gamma(\ell)/\gamma^{*}.

Now we apply Theorem 3.1 with k=2k=2 to relate the LHS of Lemma 6.10 with an entropy-type bound.

Lemma 6.14.

𝔼[𝒖(𝒅)2+𝒗(𝒅)2]4e2γγ()ln2(eγ())\operatorname*{\mathbb{E}}\left[\left\|\bm{u}^{(\bm{d})}\right\|^{2}+\left\|\bm{v}^{(\bm{d})}\right\|^{2}\right]\leq\frac{4e^{2}}{\gamma^{*}}\sum_{\ell}\gamma(\ell)\cdot\ln^{2}\left(\frac{e}{\gamma(\ell)}\right).

Proof.

Let \ell be a fixed leaf and D=D()D=D(\ell) be its depth. Note that this also fixes the rectangle X(D)×Y(D)X^{(D)}\times Y^{(D)} and thus the centers of mass u(D),v(D)u^{(D)},v^{(D)}. Define the indicator function 𝟏:2n{0,1}\mathbf{1}_{\ell}\colon\mathbb{R}^{2n}\to\{0,1\} by

𝟏(x,y)={1(x,y)X(D)×Y(D),0otherwise.\mathbf{1}_{\ell}(x,y)=\begin{cases}1&(x,y)\in X^{(D)}\times Y^{(D)},\\ 0&\text{otherwise.}\end{cases}

Then we have

u(D)2+v(D)2\displaystyle\phantom{\leq}\left\|u^{(D)}\right\|^{2}+\left\|v^{(D)}\right\|^{2}
=𝔼𝒙γ[𝒙𝒙|𝒙X(D)]2+𝔼𝒚γ[𝒚𝒚|𝒚Y(D)]2\displaystyle=\left\|\operatorname*{\mathbb{E}}_{\bm{x}\sim\gamma}\left[\bm{x}\overset{\mathchoice{\mathbin{{\hbox{\scalebox{0.5}{$\displaystyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\textstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptscriptstyle\bullet$}}}}}}{\otimes}\bm{x}\,\middle|\,\bm{x}\in X^{(D)}\right]\right\|^{2}+\left\|\operatorname*{\mathbb{E}}_{\bm{y}\sim\gamma}\left[\bm{y}\overset{\mathchoice{\mathbin{{\hbox{\scalebox{0.5}{$\displaystyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\textstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptscriptstyle\bullet$}}}}}}{\otimes}\bm{y}\,\middle|\,\bm{y}\in Y^{(D)}\right]\right\|^{2}
=i,j=1ijn(𝔼𝒙γ[𝒙i𝒙j|𝒙X(D)])2+i,j=1ijn(𝔼𝒚γ[𝒚i𝒚j|𝒚Y(D)])2\displaystyle=\sum_{\begin{subarray}{c}i,j=1\\ i\neq j\end{subarray}}^{n}\left(\operatorname*{\mathbb{E}}_{\bm{x}\sim\gamma}\left[\bm{x}_{i}\bm{x}_{j}\,\middle|\,\bm{x}\in X^{(D)}\right]\right)^{2}+\sum_{\begin{subarray}{c}i,j=1\\ i\neq j\end{subarray}}^{n}\left(\operatorname*{\mathbb{E}}_{\bm{y}\sim\gamma}\left[\bm{y}_{i}\bm{y}_{j}\,\middle|\,\bm{y}\in Y^{(D)}\right]\right)^{2}
=i,j=1ijn(𝔼𝒙,𝒚γ[𝒙i𝒙j|(𝒙,𝒚)X(D)×Y(D)])2+i,j=1ijn(𝔼𝒙,𝒚γ[𝒚i𝒚j|(𝒙,𝒚)X(D)×Y(D)])2\displaystyle=\sum_{\begin{subarray}{c}i,j=1\\ i\neq j\end{subarray}}^{n}\left(\operatorname*{\mathbb{E}}_{\bm{x},\bm{y}\sim\gamma}\left[\bm{x}_{i}\bm{x}_{j}\,\middle|\,(\bm{x},\bm{y})\in X^{(D)}\times Y^{(D)}\right]\right)^{2}+\sum_{\begin{subarray}{c}i,j=1\\ i\neq j\end{subarray}}^{n}\left(\operatorname*{\mathbb{E}}_{\bm{x},\bm{y}\sim\gamma}\left[\bm{y}_{i}\bm{y}_{j}\,\middle|\,(\bm{x},\bm{y})\in X^{(D)}\times Y^{(D)}\right]\right)^{2}
=2γ()2(S([n]2)(𝔼𝒙γ,𝒚γ[𝟏(𝒙,𝒚)𝒙S])2+S([n]2)(𝔼𝒙γ,𝒚γ[𝟏(𝒙,𝒚)𝒚S])2)\displaystyle=\frac{2}{\gamma(\ell)^{2}}\left(\sum_{S\in\binom{[n]}{2}}\left(\operatorname*{\mathbb{E}}_{\bm{x}\sim\gamma,\bm{y}\sim\gamma}\left[\mathbf{1}_{\ell}(\bm{x},\bm{y})\bm{x}_{S}\right]\right)^{2}+\sum_{S\in\binom{[n]}{2}}\left(\operatorname*{\mathbb{E}}_{\bm{x}\sim\gamma,\bm{y}\sim\gamma}\left[\mathbf{1}_{\ell}(\bm{x},\bm{y})\bm{y}_{S}\right]\right)^{2}\right)
2γ()2S([2n]2)(𝔼𝒘γn×γn[𝟏(𝒘)𝒘S])2\displaystyle\leq\frac{2}{\gamma(\ell)^{2}}\sum_{S\in\binom{[2n]}{2}}\left(\operatorname*{\mathbb{E}}_{\bm{w}\sim\gamma_{n}\times\gamma_{n}}\left[\mathbf{1}_{\ell}(\bm{w})\bm{w}_{S}\right]\right)^{2}
2γ()22e2γ()2ln2(eγ())\displaystyle\leq\frac{2}{\gamma(\ell)^{2}}\cdot 2e^{2}\gamma(\ell)^{2}\cdot\ln^{2}\left(\frac{e}{\gamma(\ell)}\right) (by Theorem 3.1)
=4e2ln2(eγ()).\displaystyle=4e^{2}\cdot\ln^{2}\left(\frac{e}{\gamma(\ell)}\right).

Therefore taking expectation over a random \ell, by Fact 6.13, we have

𝔼[𝒖(𝒅)2+𝒗(𝒅)2]4e2𝔼[ln2(eγ())]=4e2γγ()ln2(eγ()).\operatorname*{\mathbb{E}}\left[\left\|\bm{u}^{(\bm{d})}\right\|^{2}+\left\|\bm{v}^{(\bm{d})}\right\|^{2}\right]\leq 4e^{2}\cdot\operatorname*{\mathbb{E}}_{\bm{\ell}}\left[\ln^{2}\left(\frac{e}{\gamma(\bm{\ell})}\right)\right]=\frac{4e^{2}}{\gamma^{*}}\sum_{\ell}\gamma(\ell)\cdot\ln^{2}\left(\frac{e}{\gamma(\ell)}\right). ∎

Now in the next lemma, we bound the right hand side of Lemma 6.14 in terms of the second moment of the depth, which immediately proves Lemma 6.10.

Lemma 6.15.

Assume that Tn2LTn\leq 2^{L}. Then, γ()ln2(e/γ())O(1+γL2𝔼[𝐝2])\sum_{\ell}\gamma(\ell)\cdot\ln^{2}\left(e/{\gamma(\ell)}\right)\leq O(1+\gamma^{*}\cdot L^{2}\operatorname*{\mathbb{E}}[\bm{d}^{2}]).

Proof.

By Claim 6.6 and the assumption Tn2LTn\leq 2^{L}, each message has length at most L+log(Tn)2LL+\log(Tn)\leq 2L. We divide the leaves \ell into two cases based on γ()\gamma(\ell):

:γ()<23LD()γ()ln2(eγ())\displaystyle\sum_{\ell:\gamma(\ell)<2^{-3L\cdot D(\ell)}}\gamma(\ell)\cdot\ln^{2}\left(\frac{e}{\gamma(\ell)}\right)
:γ()<23LD()23LD()ln2(e23LD())\displaystyle\leq\sum_{\ell:\gamma(\ell)<2^{-3L\cdot D(\ell)}}2^{-3L\cdot D(\ell)}\cdot\ln^{2}\left(e\cdot 2^{3L\cdot D(\ell)}\right) (xln2(e/x)x\ln^{2}(e/x) is increasing when 0x0.20\leq x\leq 0.2)
t=123Lt2(9L2t2+1)|{:D()=t}|\displaystyle\leq\sum_{t=1}^{\infty}2^{-3L\cdot t}\cdot 2(9L^{2}t^{2}+1)\cdot\left|\left\{\ell:D(\ell)=t\right\}\right| (since ln2(ab)2ln2(a)+2ln2(b)\ln^{2}(ab)\leq 2\ln^{2}(a)+2\ln^{2}(b))
t=123Lt2(9L2t2+1)2(2L)t\displaystyle\leq\sum_{t=1}^{\infty}2^{-3L\cdot t}\cdot 2(9L^{2}t^{2}+1)\cdot 2^{(2L)\cdot t} (each message is of length 2L\leq 2L)
t=12(9L2t2+1)2Lt=O(1)\displaystyle\leq\sum_{t=1}^{\infty}2(9L^{2}t^{2}+1)\cdot 2^{-Lt}=O(1) (since L2L\geq 2)

and

:γ()23LD()γ()ln2(eγ())\displaystyle\sum_{\ell:\gamma(\ell)\geq 2^{-3L\cdot D(\ell)}}\gamma(\ell)\cdot\ln^{2}\left(\frac{e}{\gamma(\ell)}\right) :γ()23LD()γ()ln2(e23LD())\displaystyle\leq\sum_{\ell:\gamma(\ell)\geq 2^{-3L\cdot D(\ell)}}\gamma(\ell)\cdot\ln^{2}\left(e\cdot 2^{3L\cdot D(\ell)}\right)
29L2γ()D()2+2γ()\displaystyle\leq 2\cdot 9L^{2}\sum_{\ell}\gamma(\ell)D(\ell)^{2}+2\sum_{\ell}\gamma(\ell)
=18L2γ𝔼[D()2]+2\displaystyle=18L^{2}\gamma^{*}\cdot\operatorname*{\mathbb{E}}_{\bm{\ell}}\left[D(\bm{\ell})^{2}\right]+2
=18L2γ𝔼[𝒅2]+2.\displaystyle=18L^{2}\gamma^{*}\cdot\operatorname*{\mathbb{E}}\left[\bm{d}^{2}\right]+2.

Adding up the two estimates above gives the desired bound. ∎
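For concreteness, the series bounding the low-measure leaves can be summed numerically; it is bounded by an absolute constant for every L at least 2, as the snippet below illustrates:

```python
def small_leaf_series(L, terms=300):
    """sum_{t >= 1} 2 (9 L^2 t^2 + 1) 2^(-L t): the total contribution of
    the leaves with Gaussian measure below 2^(-3 L depth)."""
    return sum(2 * (9 * L**2 * t**2 + 1) * 2.0 ** (-L * t)
               for t in range(1, terms + 1))

for L in (2, 5, 10, 20):
    print(L, small_leaf_series(L))  # ~54.0, ~16.0, ~1.77, ~0.0069: all O(1)
```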

6.5 Second Moment Bounds for the Depth (Proof of Lemma 6.11)

The final ingredient is an estimate for the second moment 𝔼[𝒅2]\operatorname*{\mathbb{E}}[\bm{d}^{2}]. This subsection is devoted to this goal and proving Lemma 6.11.

For messages =(𝒄¯(1),,𝒄¯(t))\ell^{\prime}=(\overline{\bm{c}}^{(1)},\ldots,\overline{\bm{c}}^{(t)}), we define γ()=γ(𝑿(t))γ(𝒀(t))\gamma(\ell^{\prime})=\gamma(\bm{X}^{(t)})\cdot\gamma(\bm{Y}^{(t)}) where 𝑿(t),𝒀(t)\bm{X}^{(t)},\bm{Y}^{(t)} are determined by the protocol from the messages \ell^{\prime}. Note that this definition is consistent with γ()\gamma(\ell) from Subsection 6.4 for a leaf \ell.

Lemma 6.16.

There exists a universal constant α>0\alpha>0 such that the following holds. Let 0d1<d20\leq d_{1}<d_{2} be two arbitrary integers with d2d12d+1d_{2}-d_{1}\geq 2d+1. Let =(𝐜¯(1),,𝐜¯(d1))\ell^{*}=(\overline{\bm{c}}^{(1)},\ldots,\overline{\bm{c}}^{(d_{1})}) be arbitrary messages of the first d1d_{1} communication steps. Assume 2L8n4T22^{L}\geq 8n^{4}T^{2}. Then

𝐏𝐫[𝒅d2|]αd22L2λ(d2d12d)+1423Ld1γ().\operatorname*{\mathbf{Pr}}\left[\bm{d}\geq d_{2}\,\middle|\,\ell^{*}\right]\leq\frac{\alpha\cdot d_{2}^{2}L^{2}}{\lambda\cdot(d_{2}-d_{1}-2d)}+\frac{1}{4}\cdot\frac{2^{-3L\cdot d_{1}}}{\gamma(\ell^{*})}.
Proof.

Let 𝒙,𝒚\bm{x},\bm{y} be sampled from γ\gamma conditioned on 𝒙𝑿(0),𝒚𝒀(0)\bm{x}\in\bm{X}^{(0)},\bm{y}\in\bm{Y}^{(0)}. Let \bm{\ell} be its corresponding leaf in 𝒞¯\overline{\mathcal{C}} and 𝒅\bm{d} be the depth of \bm{\ell}. By Claim 6.7, \bm{\ell} always has finite depth. We extend 𝒂(t)=𝒃(t)=0n×n\bm{a}^{(t)}=\bm{b}^{(t)}=0^{n\times n} and 𝑿(t)=𝑿(𝒅),𝒀(t)=𝒀(𝒅)\bm{X}^{(t)}=\bm{X}^{(\bm{d})},\bm{Y}^{(t)}=\bm{Y}^{(\bm{d})} for all t>𝒅t>\bm{d}. Then define

𝒌(𝒙,𝒚)=t=d1+1d2(𝒙𝒙,𝒂(t)2+𝒚𝒚,𝒃(t)2)andK=𝔼𝒙,𝒚γ[𝒌(𝒙,𝒚)|],\bm{k}(\bm{x},\bm{y})=\sum_{t=d_{1}+1}^{d_{2}}\left(\left\langle\bm{x}\overset{\mathchoice{\mathbin{{\hbox{\scalebox{0.5}{$\displaystyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\textstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptscriptstyle\bullet$}}}}}}{\otimes}\bm{x},\bm{a}^{(t)}\right\rangle^{2}+\left\langle\bm{y}\overset{\mathchoice{\mathbin{{\hbox{\scalebox{0.5}{$\displaystyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\textstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptscriptstyle\bullet$}}}}}}{\otimes}\bm{y},\bm{b}^{(t)}\right\rangle^{2}\right)\quad\text{and}\quad K=\operatorname*{\mathbb{E}}_{\bm{x},\bm{y}\sim\gamma}\left[\bm{k}(\bm{x},\bm{y})\,\middle|\,\ell^{*}\right],

where 𝒂()\bm{a}^{(\cdot)}’s and 𝒃()\bm{b}^{(\cdot)}’s depend only on \bm{\ell}. (Note that \bm{\ell} specifies all the communication messages, which allows us to simulate the protocol and obtain each 𝒂()\bm{a}^{(\cdot)} and 𝒃()\bm{b}^{(\cdot)}.) Equivalently, we can write KK as

K=𝔼𝒙,𝒚γ[𝒌(𝒙,𝒚)|(𝒙,𝒚)X(d1)×Y(d1)],K=\operatorname*{\mathbb{E}}_{\bm{x},\bm{y}\sim\gamma}\left[\bm{k}(\bm{x},\bm{y})\,\middle|\,(\bm{x},\bm{y})\in X^{(d_{1})}\times Y^{(d_{1})}\right],

where X(d1)X^{(d_{1})} and Y(d1)Y^{(d_{1})} are fixed due to \ell^{*}.

Observe that for any fixed td1t\geq d_{1}, the rectangles 𝑿(t)×𝒀(t)\bm{X}^{(t)}\times\bm{Y}^{(t)} induced by the different \bm{\ell} consistent with \ell^{*} form a disjoint partition of X(d1)×Y(d1)X^{(d_{1})}\times Y^{(d_{1})}. Therefore sampling 𝒙,𝒚γ\bm{x},\bm{y}\sim\gamma conditioned on (𝒙,𝒚)X(d1)×Y(d1)(\bm{x},\bm{y})\in X^{(d_{1})}\times Y^{(d_{1})} is equivalent to

  • first sample random messages =(𝒄¯(d1+1),,𝒄¯(t))\bm{\ell}^{\prime}=(\overline{\bm{c}}^{(d_{1}+1)},\ldots,\overline{\bm{c}}^{(t)}) conditioned on \ell^{*},

  • then sample 𝒙,𝒚γ\bm{x},\bm{y}\sim\gamma conditioned on (𝒙,𝒚)𝑿(t)×𝒀(t)(\bm{x},\bm{y})\in\bm{X}^{(t)}\times\bm{Y}^{(t)} given \bm{\ell}^{\prime}.

Note that we can further expand \bm{\ell}^{\prime} to a leaf \bm{\ell} as a full communication path, and obtain the following equivalent sampling process:

  • Sample a random leaf \bm{\ell} conditioned on \ell^{*}.

  • Sample 𝒙,𝒚γ\bm{x},\bm{y}\sim\gamma conditioned on (𝒙,𝒚)𝑿(t)×𝒀(t)(\bm{x},\bm{y})\in\bm{X}^{(t)}\times\bm{Y}^{(t)} defined by the first tt messages of \bm{\ell}.

As a result, we have

K\displaystyle K =t=d1+1d2𝔼[𝔼𝒙,𝒚γ[𝒙𝒙,𝒂(t)2+𝒚𝒚,𝒃(t)2|(𝒙,𝒚)𝑿(t)×𝒀(t)]|]\displaystyle=\sum_{t=d_{1}+1}^{d_{2}}\operatorname*{\mathbb{E}}_{\bm{\ell}}\left[\operatorname*{\mathbb{E}}_{\bm{x},\bm{y}\sim\gamma}\left[\left\langle\bm{x}\overset{\mathchoice{\mathbin{{\hbox{\scalebox{0.5}{$\displaystyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\textstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptscriptstyle\bullet$}}}}}}{\otimes}\bm{x},\bm{a}^{(t)}\right\rangle^{2}+\left\langle\bm{y}\overset{\mathchoice{\mathbin{{\hbox{\scalebox{0.5}{$\displaystyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\textstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptscriptstyle\bullet$}}}}}}{\otimes}\bm{y},\bm{b}^{(t)}\right\rangle^{2}\,\middle|\,(\bm{x},\bm{y})\in\bm{X}^{(t)}\times\bm{Y}^{(t)}\right]\,\middle|\,\ell^{*}\right]
=𝔼[t=d1+1d2𝔼𝒙γ[𝒙𝒙,𝒂(t)2|𝒙𝑿(t)]+𝔼𝒚γ[𝒚𝒚,𝒃(t)2|𝒚𝒀(t)]|].\displaystyle=\operatorname*{\mathbb{E}}_{\bm{\ell}}\left[\sum_{t=d_{1}+1}^{d_{2}}\operatorname*{\mathbb{E}}_{\bm{x}\sim\gamma}\left[\left\langle\bm{x}\overset{\mathchoice{\mathbin{{\hbox{\scalebox{0.5}{$\displaystyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\textstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptscriptstyle\bullet$}}}}}}{\otimes}\bm{x},\bm{a}^{(t)}\right\rangle^{2}\,\middle|\,\bm{x}\in\bm{X}^{(t)}\right]+\operatorname*{\mathbb{E}}_{\bm{y}\sim\gamma}\left[\left\langle\bm{y}\overset{\mathchoice{\mathbin{{\hbox{\scalebox{0.5}{$\displaystyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\textstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptscriptstyle\bullet$}}}}}}{\otimes}\bm{y},\bm{b}^{(t)}\right\rangle^{2}\,\middle|\,\bm{y}\in\bm{Y}^{(t)}\right]\,\middle|\,\ell^{*}\right].

Observe that there are at most 2d2d steps 3(a) and 3(b) in \bm{\ell}. This means that, if 𝒅d2\bm{d}\geq d_{2}, then from the (d1+1)(d_{1}+1)-th to the d2d_{2}-th communication step, there are at least d2d12dd_{2}-d_{1}-2d cleanup steps (i.e., steps 3(c)), each of which contributes at least λ\lambda to KK. Thus we can lower bound KK by

Kλ(d2d12d)𝐏𝐫[𝒅d2|].K\geq\lambda\cdot(d_{2}-d_{1}-2d)\cdot\operatorname*{\mathbf{Pr}}\left[\bm{d}\geq d_{2}\,\middle|\,\ell^{*}\right]. (6.12)

On the other hand by Claim 6.7, there are at most n2n^{2} non-zero 𝒂()\bm{a}^{(\cdot)}’s and at most n2n^{2} non-zero 𝒃()\bm{b}^{(\cdot)}’s in each communication path. Thus

𝒌(𝒙,𝒚)n2(maxx𝑿(0)xx2+maxy𝒀(0)yy2)<2n4T2.\bm{k}(\bm{x},\bm{y})\leq n^{2}\cdot\left(\max_{x\in\bm{X}^{(0)}}\left\|x\overset{\mathchoice{\mathbin{{\hbox{\scalebox{0.5}{$\displaystyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\textstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptscriptstyle\bullet$}}}}}}{\otimes}x\right\|^{2}+\max_{y\in\bm{Y}^{(0)}}\left\|y\overset{\mathchoice{\mathbin{{\hbox{\scalebox{0.5}{$\displaystyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\textstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptscriptstyle\bullet$}}}}}}{\otimes}y\right\|^{2}\right)<2n^{4}T^{2}. (6.13)

We now obtain another upper bound using Theorem 3.3. Let ¯=(𝒄¯(1),,𝒄¯(d2))\overline{\bm{\ell}}=(\overline{\bm{c}}^{(1)},\ldots,\overline{\bm{c}}^{(d_{2})}) extend \ell^{*} for the next d2d1d_{2}-d_{1} messages (if ¯\overline{\bm{\ell}} becomes a leaf before d2d_{2}, we simply pad it with dummy messages). Then K=𝔼¯[𝒌(¯)|]K=\operatorname*{\mathbb{E}}_{\overline{\bm{\ell}}}\left[\bm{k}(\overline{\bm{\ell}})\,\middle|\,\ell^{*}\right] where 𝒌(¯):=𝔼𝒙,𝒚γ[𝒌(𝒙,𝒚)|¯]\bm{k}(\overline{\ell}):=\operatorname*{\mathbb{E}}_{\bm{x},\bm{y}\sim\gamma}\left[\bm{k}(\bm{x},\bm{y})\,\middle|\,\overline{\ell}\right]. Note that ¯\overline{\ell} fixes a()a^{(\cdot)}’s and b()b^{(\cdot)}’s in 𝒌(𝒙,𝒚)\bm{k}(\bm{x},\bm{y}). Therefore we use 𝒌¯(𝒙,𝒚)\bm{k}_{\overline{\ell}}(\bm{x},\bm{y}) to denote 𝒌(𝒙,𝒚)\bm{k}(\bm{x},\bm{y}) with the directions a()a^{(\cdot)}’s and b()b^{(\cdot)}’s fixed by ¯\overline{\ell}. We now bound 𝒌(¯)\bm{k}(\overline{\ell}):

𝒌(¯)\displaystyle\bm{k}(\overline{\ell}) t=0𝐏𝐫𝒙,𝒚γ[𝒌¯(𝒙,𝒚)t|¯]=t=0𝐏𝐫𝒙,𝒚γ[𝒌¯(𝒙,𝒚)t,¯]𝐏𝐫𝒙,𝒚γ[¯]\displaystyle\leq\sum_{t=0}^{\infty}\operatorname*{\mathbf{Pr}}_{\bm{x},\bm{y}\sim\gamma}\left[\bm{k}_{\overline{\ell}}(\bm{x},\bm{y})\geq t\,\middle|\,\overline{\ell}\right]=\sum_{t=0}^{\infty}\frac{\operatorname*{\mathbf{Pr}}_{\bm{x},\bm{y}\sim\gamma}\left[\bm{k}_{\overline{\ell}}(\bm{x},\bm{y})\geq t,\overline{\ell}\right]}{\operatorname*{\mathbf{Pr}}_{\bm{x},\bm{y}\sim\gamma}\left[\overline{\ell}\right]}
=t=0min{1,𝐏𝐫𝒙,𝒚γ[𝒌¯(𝒙,𝒚)t,¯]γ(¯)}\displaystyle=\sum_{t=0}^{\infty}\min\left\{1,\frac{\operatorname*{\mathbf{Pr}}_{\bm{x},\bm{y}\sim\gamma}\left[\bm{k}_{\overline{\ell}}(\bm{x},\bm{y})\geq t,\overline{\ell}\right]}{\gamma(\overline{\ell})}\right\} (by the definition of γ()\gamma(\cdot))
t=0min{1,𝐏𝐫𝒙,𝒚γ[𝒌¯(𝒙,𝒚)t]γ(¯)}.\displaystyle\leq\sum_{t=0}^{\infty}\min\left\{1,\frac{\operatorname*{\mathbf{Pr}}_{\bm{x},\bm{y}\sim\gamma}\left[\bm{k}_{\overline{\ell}}(\bm{x},\bm{y})\geq t\right]}{\gamma(\overline{\ell})}\right\}. (6.14)

We now analyze 𝐏𝐫𝒙,𝒚γ[𝒌¯(𝒙,𝒚)t]\operatorname*{\mathbf{Pr}}_{\bm{x},\bm{y}\sim\gamma}\left[\bm{k}_{\overline{\ell}}(\bm{x},\bm{y})\geq t\right] using Theorem 3.3. Since a(t),b(t)a^{(t)},b^{(t)} cannot be non-zero simultaneously, we rearrange the matrices and assume a(d1+1),,a(d),b(d+1),,b(d′′)a^{(d_{1}+1)},\ldots,a^{(d^{\prime})},b^{(d^{\prime}+1)},\ldots,b^{(d^{\prime\prime})} are the only non-zero matrices where d′′d2d^{\prime\prime}\leq d_{2}. Then

𝒌¯(𝒙,𝒚)=t=d1+1d𝒙𝒙,a(t)2+t=d+1d′′𝒚𝒚,b(t)2.\bm{k}_{\overline{\ell}}(\bm{x},\bm{y})=\sum_{t=d_{1}+1}^{d^{\prime}}\left\langle\bm{x}\overset{\mathchoice{\mathbin{{\hbox{\scalebox{0.5}{$\displaystyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\textstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptscriptstyle\bullet$}}}}}}{\otimes}\bm{x},a^{(t)}\right\rangle^{2}+\sum_{t=d^{\prime}+1}^{d^{\prime\prime}}\left\langle\bm{y}\overset{\mathchoice{\mathbin{{\hbox{\scalebox{0.5}{$\displaystyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\textstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptscriptstyle\bullet$}}}}}}{\otimes}\bm{y},b^{(t)}\right\rangle^{2}.

Note that the aa’s (resp., bb’s) satisfy the condition in Theorem 3.3. Let 1/κ1/\kappa be the constant inside the Ω\Omega in Theorem 3.3 (in particular, κ=56448\kappa=56448 suffices from our proof in Appendix B). Hence

𝐏𝐫[𝒌¯(𝒙,𝒚)t]\displaystyle\operatorname*{\mathbf{Pr}}\left[\bm{k}_{\overline{\ell}}(\bm{x},\bm{y})\geq t\right] 𝐏𝐫[t=d1+1d𝒙𝒙,a(t)2t/2]+𝐏𝐫[t=d+1d′′𝒚𝒚,b(t)2t/2]\displaystyle\leq\operatorname*{\mathbf{Pr}}\left[\sum_{t=d_{1}+1}^{d^{\prime}}\left\langle\bm{x}\overset{\mathchoice{\mathbin{{\hbox{\scalebox{0.5}{$\displaystyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\textstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptscriptstyle\bullet$}}}}}}{\otimes}\bm{x},a^{(t)}\right\rangle^{2}\geq t/2\right]+\operatorname*{\mathbf{Pr}}\left[\sum_{t=d^{\prime}+1}^{d^{\prime\prime}}\left\langle\bm{y}\overset{\mathchoice{\mathbin{{\hbox{\scalebox{0.5}{$\displaystyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\textstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptscriptstyle\bullet$}}}}}}{\otimes}\bm{y},b^{(t)}\right\rangle^{2}\geq t/2\right]
2exp{1κt/2dd1+t/2}+2exp{1κt/2d′′d+t/2}\displaystyle\leq 2\exp\left\{-\frac{1}{\kappa}\cdot\frac{t/2}{d^{\prime}-d_{1}+\sqrt{t/2}}\right\}+2\exp\left\{-\frac{1}{\kappa}\cdot\frac{t/2}{d^{\prime\prime}-d^{\prime}+\sqrt{t/2}}\right\} (by Theorem 3.3 and assuming t196max{dd1,d′′d}t\geq 196\cdot\max\left\{d^{\prime}-d_{1},d^{\prime\prime}-d^{\prime}\right\})
4exp{1κt/2d2d1+t/2}.\displaystyle\leq 4\exp\left\{-\frac{1}{\kappa}\cdot\frac{t/2}{d_{2}-d_{1}+\sqrt{t/2}}\right\}. (since d1dd′′d2d_{1}\leq d^{\prime}\leq d^{\prime\prime}\leq d_{2})

Thus for any t196(d2d1)196max{dd1,d′′d}t\geq 196\cdot(d_{2}-d_{1})\geq 196\cdot\max\left\{d^{\prime}-d_{1},d^{\prime\prime}-d^{\prime}\right\}, we have

𝐏𝐫[𝒌¯(𝒙,𝒚)t]4exp{1κt/2d2d1+t/2}.\operatorname*{\mathbf{Pr}}\left[\bm{k}_{\overline{\ell}}(\bm{x},\bm{y})\geq t\right]\leq 4\exp\left\{-\frac{1}{\kappa}\cdot\frac{t/2}{d_{2}-d_{1}+\sqrt{t/2}}\right\}. (6.15)

For γ(¯)23Ld2\gamma(\overline{\ell})\geq 2^{-3L\cdot d_{2}}, we plug Equation 6.15 into Equation 6.14 and obtain

𝒌(¯)\displaystyle\bm{k}(\overline{\ell}) t=0196(d2d1)21+t>196(d2d1)2min{1,23Ld2+1exp{1κt/2d2d1+t/2}}\displaystyle\leq\sum_{t=0}^{196\cdot(d_{2}-d_{1})^{2}}1+\sum_{t>196\cdot(d_{2}-d_{1})^{2}}\min\left\{1,2^{3L\cdot d_{2}+1}\cdot\exp\left\{-\frac{1}{\kappa}\cdot\frac{t/2}{d_{2}-d_{1}+\sqrt{t/2}}\right\}\right\} (by Equation 6.15)
196(d2d1)2+1+t196(d2d1)2min{1,23Ld2+1e1κt/22t/2}\displaystyle\leq 196\cdot(d_{2}-d_{1})^{2}+1+\sum_{t\geq 196\cdot(d_{2}-d_{1})^{2}}\min\left\{1,2^{3L\cdot d_{2}+1}\cdot e^{-\frac{1}{\kappa}\cdot\frac{t/2}{2\sqrt{t/2}}}\right\}
197d22+t1min{1,23Ld2+1et/22κ}\displaystyle\leq 197\cdot d_{2}^{2}+\sum_{t\geq 1}\min\left\{1,2^{3L\cdot d_{2}+1}\cdot e^{-\frac{\sqrt{t/2}}{2\kappa}}\right\}
αd22L2,\displaystyle\leq\alpha\cdot d_{2}^{2}L^{2}, (6.16)

where α\alpha is another universal constant. Now we have

K=𝔼¯[𝒌(¯)|]=¯γ(¯)γ()𝒌(¯)=¯:γ(¯)<23Ld2γ(¯)γ()𝒌(¯)+¯:γ(¯)23Ld2γ(¯)γ()𝒌(¯),K=\operatorname*{\mathbb{E}}_{\overline{\bm{\ell}}}\left[\bm{k}(\overline{\bm{\ell}})\,\middle|\,\ell^{*}\right]=\sum_{\overline{\ell}}\frac{\gamma(\overline{\ell})}{\gamma(\ell^{*})}\cdot\bm{k}(\overline{\ell})=\sum_{\overline{\ell}:\gamma(\overline{\ell})<2^{-3L\cdot d_{2}}}\frac{\gamma(\overline{\ell})}{\gamma(\ell^{*})}\cdot\bm{k}(\overline{\ell})+\sum_{\overline{\ell}:\gamma(\overline{\ell})\geq 2^{-3L\cdot d_{2}}}\frac{\gamma(\overline{\ell})}{\gamma(\ell^{*})}\cdot\bm{k}(\overline{\ell}),

where the first summation can be bounded by

¯:γ(¯)<23Ld2γ(¯)γ()𝒌(¯)\displaystyle\sum_{\overline{\ell}:\gamma(\overline{\ell})<2^{-3L\cdot d_{2}}}\frac{\gamma(\overline{\ell})}{\gamma(\ell^{*})}\cdot\bm{k}(\overline{\ell}) 23Ld1γ()¯23L(d2d1)2n4T2\displaystyle\leq\frac{2^{-3L\cdot d_{1}}}{\gamma(\ell^{*})}\cdot\sum_{\overline{\ell}}2^{-3L\cdot(d_{2}-d_{1})}\cdot 2n^{4}T^{2} (by Equation 6.13)
23Ld1γ()22L(d2d1)23L(d2d1)2n4T2\displaystyle\leq\frac{2^{-3L\cdot d_{1}}}{\gamma(\ell^{*})}\cdot 2^{2L\cdot(d_{2}-d_{1})}\cdot 2^{-3L\cdot(d_{2}-d_{1})}\cdot 2n^{4}T^{2} (since \ell^{*} is fixed and each message is at most 2L2L bits)
23Ld1γ()2n4T22L\displaystyle\leq\frac{2^{-3L\cdot d_{1}}}{\gamma(\ell^{*})}\cdot\frac{2n^{4}T^{2}}{2^{L}} (since d2d11d_{2}-d_{1}\geq 1)

and the second summation is bounded by

¯:γ(¯)23Ld2γ(¯)γ()𝒌(¯)¯γ(¯)γ()αd22L2=αd22L2.\sum_{\overline{\ell}:\gamma(\overline{\ell})\geq 2^{-3L\cdot d_{2}}}\frac{\gamma(\overline{\ell})}{\gamma(\ell^{*})}\cdot\bm{k}(\overline{\ell})\leq\sum_{\overline{\ell}}\frac{\gamma(\overline{\ell})}{\gamma(\ell^{*})}\cdot\alpha\cdot d_{2}^{2}L^{2}=\alpha\cdot d_{2}^{2}L^{2}. (by Equation 6.16)

Then combining Equation 6.12, we have

λ(d2d12d)𝐏𝐫[𝒅d2|]αd22L2+23Ld1γ()2n4T22L.\lambda\cdot(d_{2}-d_{1}-2d)\cdot\operatorname*{\mathbf{Pr}}\left[\bm{d}\geq d_{2}\,\middle|\,\ell^{*}\right]\leq\alpha\cdot d_{2}^{2}L^{2}+\frac{2^{-3L\cdot d_{1}}}{\gamma(\ell^{*})}\cdot\frac{2n^{4}T^{2}}{2^{L}}.

Since 2L8n4T22^{L}\geq 8n^{4}T^{2} and d2d12d+1d_{2}-d_{1}\geq 2d+1 by assumption, we conclude that

𝐏𝐫[𝒅d2|]αd22L2λ(d2d12d)+1423Ld1γ().\operatorname*{\mathbf{Pr}}\left[\bm{d}\geq d_{2}\,\middle|\,\ell^{*}\right]\leq\frac{\alpha\cdot d_{2}^{2}L^{2}}{\lambda\cdot(d_{2}-d_{1}-2d)}+\frac{1}{4}\cdot\frac{2^{-3L\cdot d_{1}}}{\gamma(\ell^{*})}. ∎
Corollary 6.17.

Assume γ3/4\gamma^{*}\geq 3/4, TnT\leq n, LΘ(log(n))L\geq\Theta(\log(n)), and λΘ(dL2log2(n))\lambda\geq\Theta(dL^{2}\log^{2}(n)). Then for each k=0,1,,4log(n)k=0,1,\ldots,4\log(n), we have

𝐏𝐫[𝒅4kd]2k+kn5.\operatorname*{\mathbf{Pr}}\left[\bm{d}\geq 4kd\right]\leq 2^{-k}+\frac{k}{n^{5}}.
Proof.

We prove the bound by induction on kk. The base case k=0k=0 is trivial. For the inductive case, let \ell^{*} be the first 4(k1)d4(k-1)d communication messages. Then we bound

P:=:γ()/γ<23L4(k1)dγ()γ𝐏𝐫[𝒅4kd|]P:=\sum_{\ell^{*}:\gamma(\ell^{*})/\gamma^{*}<2^{-3L\cdot 4(k-1)d}}\frac{\gamma(\ell^{*})}{\gamma^{*}}\cdot\operatorname*{\mathbf{Pr}}\left[\bm{d}\geq 4kd\,\middle|\,\ell^{*}\right]

and

Q:=:γ()/γ23L4(k1)dγ()γ𝐏𝐫[𝒅4kd|]Q:=\sum_{\ell^{*}:\gamma(\ell^{*})/\gamma^{*}\geq 2^{-3L\cdot 4(k-1)d}}\frac{\gamma(\ell^{*})}{\gamma^{*}}\cdot\operatorname*{\mathbf{Pr}}\left[\bm{d}\geq 4kd\,\middle|\,\ell^{*}\right]

separately.

For PP, observe that if k=1k=1 then \ell^{*} is the root of the protocol, thus γ()=γ\gamma(\ell^{*})=\gamma^{*} and P=0P=0. On the other hand, if k2k\geq 2, then

P\displaystyle P :γ()/γ<23L4(k1)d23L4(k1)d23L4(k1)d\displaystyle\leq\sum_{\ell^{*}:\gamma(\ell^{*})/\gamma^{*}<2^{-3L\cdot 4(k-1)d}}2^{-3L\cdot 4(k-1)d}\leq\sum_{\ell^{*}}2^{-3L\cdot 4(k-1)d}
22L4(k1)d23L4(k1)d\displaystyle\leq 2^{2L\cdot 4(k-1)d}\cdot 2^{-3L\cdot 4(k-1)d} (each communication message is at most 2L2L bits)
=2L4(k1)dn5.\displaystyle=2^{-L\cdot 4(k-1)d}\leq n^{-5}. (since k2k\geq 2 and LΘ(log(n))L\geq\Theta(\log(n)))

Now we turn to QQ. Applying Lemma 6.16 with \ell^{*} and d1=4(k1)d,d2=4kdd_{1}=4(k-1)d,d_{2}=4kd, we have

Q\displaystyle Q :γ()/γ23L4(k1)dγ()γ(16αk2d2L22dλ+1423L4(k1)dγ())\displaystyle\leq\sum_{\ell^{*}:\gamma(\ell^{*})/\gamma^{*}\geq 2^{-3L\cdot 4(k-1)d}}\frac{\gamma(\ell^{*})}{\gamma^{*}}\cdot\left(\frac{16\alpha\cdot k^{2}d^{2}L^{2}}{2d\lambda}+\frac{1}{4}\cdot\frac{2^{-3L\cdot 4(k-1)d}}{\gamma(\ell^{*})}\right)
γ()γ(8αk2dL2λ+14γ)\displaystyle\leq\sum_{\ell^{*}}\frac{\gamma(\ell^{*})}{\gamma^{*}}\cdot\left(\frac{8\alpha\cdot k^{2}dL^{2}}{\lambda}+\frac{1}{4\gamma^{*}}\right)
=𝐏𝐫[𝒅4(k1)d](8αk2dL2λ+14γ)\displaystyle=\operatorname*{\mathbf{Pr}}\left[\bm{d}\geq 4(k-1)d\right]\cdot\left(\frac{8\alpha\cdot k^{2}dL^{2}}{\lambda}+\frac{1}{4\gamma^{*}}\right)
𝐏𝐫[𝒅4(k1)d]12\displaystyle\leq\operatorname*{\mathbf{Pr}}\left[\bm{d}\geq 4(k-1)d\right]\cdot\frac{1}{2} (since γ3/4\gamma^{*}\geq 3/4 and λΘ(dL2log2(n)),k4log(n)\lambda\geq\Theta(dL^{2}\log^{2}(n)),k\leq 4\log(n))
(2(k1)+k1n5)122k+k1n5.\displaystyle\leq\left(2^{-(k-1)}+\frac{k-1}{n^{5}}\right)\cdot\frac{1}{2}\leq 2^{-k}+\frac{k-1}{n^{5}}. (by induction hypothesis)

By adding up PP and QQ, we complete the induction. ∎

Given Corollary 6.17 and a suitable choice of parameters, we now prove the second moment bound.

Proof of Lemma 6.11.

With $L=\Theta(\log(n))$, $T=\Theta(\sqrt{\log(n)})$, and $\lambda=\Theta(d\log^{4}(n))$, by Fact 6.3, we have $\gamma^{*}\geq 3/4$. Therefore the second moment of $\bm{d}$ is

\operatorname*{\mathbb{E}}[\bm{d}^{2}]\leq\sum_{k=0}^{4\log(n)}\left(4(k+1)d\right)^{2}\cdot\operatorname*{\mathbf{Pr}}\left[\bm{d}\geq 4kd\right]+\operatorname*{\mathbf{Pr}}\left[\bm{d}\geq 16d\log(n)\right]\cdot(2n^{2})^{2} (by Claim 6.7)
\leq\sum_{k=0}^{4\log(n)}\left(4(k+1)d\right)^{2}\cdot\left(2^{-k}+\frac{k}{n^{5}}\right)+\left(n^{-4}+\frac{4\log(n)}{n^{5}}\right)\cdot(2n^{2})^{2} (by Corollary 6.17)
=O(d^{2}). ∎
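As a quick numerical sanity check (our own illustration, not part of the proof), the dominant contribution above is the geometrically weighted series, and one can confirm it equals a fixed constant times $d^{2}$:

# Sanity check (illustrative): sum_{k>=0} (4(k+1)d)^2 * 2^{-k}
# = 16 d^2 * sum_{k>=0} (k+1)^2 * 2^{-k} = 16 d^2 * 12 = 192 d^2 = O(d^2).
def dominant_series(d, kmax=200):
    return sum((4 * (k + 1) * d) ** 2 * 2 ** (-k) for k in range(kmax + 1))

for d in [1, 10, 100]:
    print(d, dominant_series(d) / d ** 2)   # prints 192.0 for every d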

7 Fourier Growth Reductions For General Gadgets

In this section, we show that Fourier growth bounds for communication protocols with general (constant-sized) gadgets can be reduced to bounds for the XOR-fiber, and vice versa. This implies that, for the purposes of studying Fourier growth, all such gadgets are essentially equivalent.

Let $m_{1},m_{2}$ be two positive integers. Let $g\colon\{\pm 1\}^{m_{1}}\times\{\pm 1\}^{m_{2}}\to\{\pm 1\}$ be a gadget. Recall that $\nu$ is the uniform distribution over $\{\pm 1\}^{n}$. We now use $\nu_{1},\nu_{2},\overline{\nu}_{1},\overline{\nu}_{2}$ to denote the uniform distributions over $\{\pm 1\}^{m_{1}},\{\pm 1\}^{m_{2}},(\{\pm 1\}^{m_{1}})^{n},(\{\pm 1\}^{m_{2}})^{n}$ respectively. We define the $g$-fiber of communication protocols similarly to the XOR-fiber:

Definition 7.1.

For any randomized two-party protocol $\mathcal{C}\colon(\{\pm 1\}^{m_{1}})^{n}\times(\{\pm 1\}^{m_{2}})^{n}\to[-1,1]$, its $g$-fiber, denoted by $\mathcal{C}_{\downarrow g}\colon\{\pm 1\}^{n}\to[-1,1]$, is defined by

\mathcal{C}_{\downarrow g}(z)=\operatorname*{\mathbb{E}}_{\bm{x}\sim\overline{\nu}_{1},\bm{y}\sim\overline{\nu}_{2}}\left[\mathcal{C}(\bm{x},\bm{y})\,\middle|\,g(\bm{x}_{i},\bm{y}_{i})=z_{i}~\forall i\right],

where the expectation is also over the internal randomness of $\mathcal{C}$.
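To make the definition concrete, here is a minimal brute-force sketch (our own illustration; the choices of $\mathcal{C}$, $n$, $m_{1}$, $m_{2}$ are arbitrary) that computes a $g$-fiber by direct conditioning:

# Brute-force sketch of Definition 7.1 (illustrative only): compute the g-fiber
# of a bounded two-party function C by conditioning on g(x_i, y_i) = z_i for all i.
import itertools

def g_fiber(C, g, n, m1, m2):
    blocks1 = list(itertools.product([1, -1], repeat=m1))
    blocks2 = list(itertools.product([1, -1], repeat=m2))
    fiber = {}
    for z in itertools.product([1, -1], repeat=n):
        total, count = 0.0, 0
        for x in itertools.product(blocks1, repeat=n):
            for y in itertools.product(blocks2, repeat=n):
                if all(g(x[i], y[i]) == z[i] for i in range(n)):
                    total += C(x, y)
                    count += 1
        fiber[z] = total / count
    return fiber

# Example: the XOR gadget on single bits (in ±1 notation XOR is multiplication)
# and C(x, y) = x_1 * y_1 with n = 2; the fiber is h(z) = z_1.
xor = lambda xi, yi: xi[0] * yi[0]
C = lambda x, y: x[0][0] * y[0][0]
print(g_fiber(C, xor, n=2, m1=1, m2=1))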

To compare Fourier growth bounds between gadgets, we use $L_{1,k}(g,d,m_{1},m_{2},n)$ to denote the upper bound on the level-$k$ Fourier growth of the $g$-fiber of an arbitrary randomized communication protocol $\mathcal{C}\colon(\{\pm 1\}^{m_{1}})^{n}\times(\{\pm 1\}^{m_{2}})^{n}\to[-1,1]$ with at most $d$ bits of communication, where $g\colon\{\pm 1\}^{m_{1}}\times\{\pm 1\}^{m_{2}}\to\{\pm 1\}$ is the gadget. Since randomized protocols are convex combinations of deterministic protocols of the same cost, using this notation, our main results Theorems 1.2 and 1.3 can be rephrased as

L_{1,1}(\mathrm{XOR},d,1,1,n)\leq O\left(\sqrt{d}\right)\quad\text{and}\quad L_{1,2}(\mathrm{XOR},d,1,1,n)\leq O\left(d^{3/2}\log^{3}(n)\right).

For any set $S\subseteq[m_{1}]$, define $x_{S}=\prod_{i\in S}x_{i}$, and similarly define $y_{T}$ for $T\subseteq[m_{2}]$. Analogous to the standard Fourier representation of Boolean functions, the gadget $g$, being a two-party function, also has a Fourier representation:

g(x,y)=\sum_{S\subseteq[m_{1}],T\subseteq[m_{2}]}\widehat{g}(S,T)\cdot x_{S}y_{T},\quad\text{where}\quad\widehat{g}(S,T)=\operatorname*{\mathbb{E}}_{\bm{x}\sim\nu_{1},\bm{y}\sim\nu_{2}}\left[g(\bm{x},\bm{y})\cdot\bm{x}_{S}\bm{y}_{T}\right].

For convenience, we will assume that $g$ satisfies the following assumption; it is easy to see that the XOR gadget satisfies it.

Assumption 7.2.

$\widehat{g}(S,T)=0$ if $S=\emptyset$ or $T=\emptyset$.

Remark 7.3.

This assumption is equivalent to saying that, after fixing any input on Alice's side, the remaining function on Bob's side is balanced, and vice versa.

Even if $g$ does not satisfy the assumption, we can embed it inside a similar gadget $g^{\prime}\colon\{\pm 1\}^{m_{1}+1}\times\{\pm 1\}^{m_{2}+1}\to\{\pm 1\}$, obtained by XORing the last bit of Alice and the last bit of Bob with the old gadget $g$ applied to Alice's first $m_{1}$ bits and Bob's first $m_{2}$ bits, i.e.,

g^{\prime}(x,y)=x_{m_{1}+1}y_{m_{2}+1}\cdot g(x_{\leq m_{1}},y_{\leq m_{2}}).

Then $g^{\prime}$ satisfies the assumption and inherits the properties of $g$ that are relevant for communication complexity tasks, as the sketch below illustrates.
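The following sketch (our own illustration; the unbalanced gadget $g$ used here is an arbitrary example) builds $g^{\prime}$ from $g$ and verifies Assumption 7.2 by computing all Fourier coefficients of $g^{\prime}$ by brute force:

# Illustrative check: g'(x, y) = x_{m1+1} * y_{m2+1} * g(x_{<=m1}, y_{<=m2})
# satisfies Assumption 7.2 even when g itself does not.
import itertools

def fourier_coeff(f, m1, m2, S, T):
    total = 0.0
    for x in itertools.product([1, -1], repeat=m1):
        for y in itertools.product([1, -1], repeat=m2):
            chi = 1
            for i in S:
                chi *= x[i]
            for j in T:
                chi *= y[j]
            total += f(x, y) * chi
    return total / 2 ** (m1 + m2)

g = lambda x, y: -1 if (x[0] == -1 and y[0] == -1) else 1   # unbalanced: ĝ(∅,∅) = 1/2
g_prime = lambda x, y: x[-1] * y[-1] * g(x[:-1], y[:-1])    # one extra bit per side

subsets = [(), (0,), (1,), (0, 1)]
for S in subsets:
    for T in subsets:
        if not S or not T:
            # every coefficient of g' with S = ∅ or T = ∅ vanishes
            assert abs(fourier_coeff(g_prime, 2, 2, S, T)) < 1e-9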

Now consider a protocol $\mathcal{C}\colon(\{\pm 1\}^{m_{1}})^{n}\times(\{\pm 1\}^{m_{2}})^{n}\to[-1,1]$; it is also a two-party function and thus admits a similar Fourier representation. We view an input from $(\{\pm 1\}^{m_{1}})^{n}$ as indexed by tuples in $[n]\times[m_{1}]$. Therefore any subset of $[n]\times[m_{1}]$ is uniquely identified with $\bigcup_{i\in[n]}\left\{i\right\}\times S_{i}$, where each $S_{i}\subseteq[m_{1}]$. We use $S^{[n]}$ to denote $(S_{i})_{i\in[n]}$. Thus the Fourier coefficients of $\mathcal{C}$ can be written as

\widehat{\mathcal{C}}(S^{[n]},T^{[n]}):=\widehat{\mathcal{C}}\left(\bigcup_{i\in[n]}\left\{i\right\}\times S_{i},\bigcup_{i\in[n]}\left\{i\right\}\times T_{i}\right),

and the Fourier representation of $\mathcal{C}$ is

\mathcal{C}(x,y)=\sum_{S^{[n]},T^{[n]}}\widehat{\mathcal{C}}(S^{[n]},T^{[n]})\cdot\prod_{i\in[n]}x_{i,S_{i}}\cdot\prod_{j\in[n]}y_{j,T_{j}},

where $x_{i,S}=\prod_{j\in S}x_{i,j}$ and similarly for $y_{j,T}$.

Under this notation, and given Assumption 7.2, we can explicitly compute the Fourier coefficients of any $g$-fiber.

Fact 7.4.

Assume the gadget $g\colon\{\pm 1\}^{m_{1}}\times\{\pm 1\}^{m_{2}}\to\{\pm 1\}$ satisfies Assumption 7.2. Then we have

\widehat{\mathcal{C}_{\downarrow g}}(I)=\sum_{\begin{subarray}{c}S^{I},T^{I}\\ S_{i}\neq\emptyset,T_{i}\neq\emptyset,\forall i\in I\end{subarray}}\widehat{\mathcal{C}}(S^{I},T^{I})\cdot\prod_{i\in I}\widehat{g}(S_{i},T_{i})\quad\text{for any $I\subseteq[n]$,}

where we use $S^{I}$ to denote $S^{[n]}$ with $S_{j}$ fixed to $\emptyset$ for all $j\notin I$.

Proof.

Observe that

\widehat{\mathcal{C}_{\downarrow g}}(I)=\operatorname*{\mathbb{E}}_{\bm{z}\sim\nu}\left[\mathcal{C}_{\downarrow g}(\bm{z})\cdot\prod_{i\in I}\bm{z}_{i}\right]
=\operatorname*{\mathbb{E}}_{\bm{z}\sim\nu}\left[\operatorname*{\mathbb{E}}_{\bm{x}\sim\overline{\nu}_{1},\bm{y}\sim\overline{\nu}_{2}}\left[\mathcal{C}(\bm{x},\bm{y})\,\middle|\,g(\bm{x}_{i},\bm{y}_{i})=\bm{z}_{i}~\forall i\right]\cdot\prod_{i\in I}\bm{z}_{i}\right]
=\operatorname*{\mathbb{E}}_{\bm{z}\sim\nu}\left[\operatorname*{\mathbb{E}}_{\bm{x}\sim\overline{\nu}_{1},\bm{y}\sim\overline{\nu}_{2}}\left[\mathcal{C}(\bm{x},\bm{y})\cdot\prod_{i\in I}g(\bm{x}_{i},\bm{y}_{i})\,\middle|\,g(\bm{x}_{i},\bm{y}_{i})=\bm{z}_{i}~\forall i\right]\right].

Since $\widehat{g}(\emptyset,\emptyset)=0$ by Assumption 7.2, the gadget $g$ is balanced, so every fiber $\{(x,y):g(x_{i},y_{i})=z_{i}~\forall i\}$ has the same size; hence, averaging over $\bm{z}$, every pair $(x,y)$ is sampled with the same probability under the conditional distribution. Thus we get

\widehat{\mathcal{C}_{\downarrow g}}(I)=\operatorname*{\mathbb{E}}_{\bm{x}\sim\overline{\nu}_{1},\bm{y}\sim\overline{\nu}_{2}}\left[\mathcal{C}(\bm{x},\bm{y})\cdot\prod_{i\in I}g(\bm{x}_{i},\bm{y}_{i})\right].

Now we expand 𝒞\mathcal{C} and gg in the Fourier basis and obtain

𝒞g^(I)\displaystyle\widehat{\mathcal{C}_{\downarrow g}}(I) =𝔼𝒙ν¯1,𝒚ν¯2[(S[n],T[n]𝒞^(S[n],T[n])i[n]𝒙i,Sij[n]𝒚j,Tj)iI(Si,Tig^(Si,Ti)𝒙i,Si𝒚i,Ti)]\displaystyle=\operatorname*{\mathbb{E}}_{\bm{x}\sim\overline{\nu}_{1},\bm{y}\sim\overline{\nu}_{2}}\left[\left(\sum_{S^{[n]},T^{[n]}}\widehat{\mathcal{C}}(S^{[n]},T^{[n]})\prod_{i\in[n]}\bm{x}_{i,S_{i}}\prod_{j\in[n]}\bm{y}_{j,T_{j}}\right)\cdot\prod_{i\in I}\left(\sum_{S_{i},T_{i}}\widehat{g}(S_{i},T_{i})\bm{x}_{i,S_{i}}\bm{y}_{i,T_{i}}\right)\right]
=𝔼𝒙ν¯1,𝒚ν¯2[(S[n],T[n]𝒞^(S[n],T[n])i[n]𝒙i,Sij[n]𝒚j,Tj)(SI,TIiIg^(Si,Ti)𝒙i,Si𝒚i,Ti)]\displaystyle=\operatorname*{\mathbb{E}}_{\bm{x}\sim\overline{\nu}_{1},\bm{y}\sim\overline{\nu}_{2}}\left[\left(\sum_{S^{[n]},T^{[n]}}\widehat{\mathcal{C}}(S^{[n]},T^{[n]})\prod_{i\in[n]}\bm{x}_{i,S_{i}}\prod_{j\in[n]}\bm{y}_{j,T_{j}}\right)\left(\sum_{S^{I},T^{I}}\prod_{i\in I}\widehat{g}(S_{i},T_{i})\bm{x}_{i,S_{i}}\bm{y}_{i,T_{i}}\right)\right]
=SI,TI𝒞^(SI,TI)iIg^(Si,Ti)\displaystyle=\sum_{S^{I},T^{I}}\widehat{\mathcal{C}}(S^{I},T^{I})\cdot\prod_{i\in I}\widehat{g}(S_{i},T_{i})
=SI,TISi,Ti,iI𝒞^(SI,TI)iIg^(Si,Ti),\displaystyle=\sum_{\begin{subarray}{c}S^{I},T^{I}\\ S_{i}\neq\emptyset,T_{i}\neq\emptyset,\forall i\in I\end{subarray}}\widehat{\mathcal{C}}(S^{I},T^{I})\cdot\prod_{i\in I}\widehat{g}(S_{i},T_{i}), (by Assumption 7.2)

as desired. ∎
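As a quick numerical sanity check of Fact 7.4 (our own sketch; the proof above uses only the Fourier expansion of $\mathcal{C}$, so any bounded two-party function works), one can verify for the XOR gadget on single bits, whose only nonzero coefficient is $\widehat{g}(\{1\},\{1\})=1$, that $\widehat{\mathcal{C}_{\downarrow\mathrm{XOR}}}(I)=\widehat{\mathcal{C}}(1^{I},1^{I})$:

# Sanity check of Fact 7.4 for g = XOR on single bits (m1 = m2 = 1).
import itertools, random

n = 3
pts = list(itertools.product([1, -1], repeat=n))
C = {(x, y): random.uniform(-1, 1) for x in pts for y in pts}

def fiber_coeff(I):                  # left side: E_z[C_xor(z) * prod_{i in I} z_i]
    total = 0.0
    for z in pts:
        # conditioned on x_i * y_i = z_i, y is determined by x: y_i = x_i * z_i
        hz = sum(C[x, tuple(x[i] * z[i] for i in range(n))] for x in pts) / len(pts)
        chi = 1
        for i in I:
            chi *= z[i]
        total += hz * chi
    return total / len(pts)

def protocol_coeff(I):               # right side: E_{x,y}[C(x,y) * prod_{i in I} x_i y_i]
    total = 0.0
    for x in pts:
        for y in pts:
            chi = 1
            for i in I:
                chi *= x[i] * y[i]
            total += C[x, y] * chi
    return total / len(pts) ** 2

print(fiber_coeff((0, 2)), protocol_coeff((0, 2)))   # the two values agree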

Now we present the reduction from the XOR-fiber to a general $g$-fiber.

Theorem 7.5.

Assume the gadget $g\colon\{\pm 1\}^{m_{1}}\times\{\pm 1\}^{m_{2}}\to\{\pm 1\}$ satisfies Assumption 7.2. Then

L_{1,k}(\mathrm{XOR},d,1,1,n)\leq\left(\max_{S,T}|\widehat{g}(S,T)|\right)^{-k}\cdot L_{1,k}(g,d,m_{1},m_{2},n)
\leq 2^{(m_{1}+m_{2})\cdot k/2}\cdot L_{1,k}(g,d,m_{1},m_{2},n).
Proof.

Let $\mathcal{C}\colon\{\pm 1\}^{n}\times\{\pm 1\}^{n}\to[-1,1]$ be an arbitrary protocol of cost at most $d$. Then for a fixed set $I\subseteq[n]$, by Fact 7.4 applied to the XOR gadget, we have

\widehat{\mathcal{C}_{\downarrow\mathrm{XOR}}}(I)=\widehat{\mathcal{C}}(1^{I},1^{I}). (7.1)

Let $S\subseteq[m_{1}]$ and $T\subseteq[m_{2}]$ maximize $|\widehat{g}(S,T)|$. Since $g$ satisfies Assumption 7.2, we know $S$ and $T$ are nonempty.

Now define a different protocol $\mathcal{C}^{\prime}\colon(\{\pm 1\}^{m_{1}})^{n}\times(\{\pm 1\}^{m_{2}})^{n}\to[-1,1]$ as follows: after receiving input $x$, Alice computes $x^{\prime}_{i}=x_{i,S}$ for each block $x_{i}$; Bob similarly computes $y^{\prime}_{i}=y_{i,T}$ upon receiving input $y$. Then they execute the protocol $\mathcal{C}$ on $x^{\prime}$ and $y^{\prime}$. That is, $\mathcal{C}^{\prime}(x,y)=\mathcal{C}(x^{\prime},y^{\prime})$. Therefore, for any $I\subseteq[n]$ and $S^{I},T^{I}$ satisfying $S_{i}\neq\emptyset,T_{i}\neq\emptyset$ for $i\in I$, we have

\widehat{\mathcal{C}^{\prime}}(S^{I},T^{I})=\begin{cases}\widehat{\mathcal{C}}(1^{I},1^{I})&S_{i}=S,T_{i}=T,~\forall i\in I,\\ 0&\text{otherwise.}\end{cases}

Then by Equation 7.1 and Fact 7.4 applied to $\mathcal{C}^{\prime}$ with gadget $g$, we have

\widehat{\mathcal{C}_{\downarrow g}^{\prime}}(I)=\widehat{\mathcal{C}}(1^{I},1^{I})\cdot\widehat{g}(S,T)^{|I|}=\widehat{\mathcal{C}_{\downarrow\mathrm{XOR}}}(I)\cdot\widehat{g}(S,T)^{|I|}.

Now summing over all $I\subseteq[n]$ of size $k$, we have

L_{1,k}(\mathcal{C}_{\downarrow\mathrm{XOR}})=\sum_{I\subseteq[n]:|I|=k}\left|\widehat{\mathcal{C}_{\downarrow\mathrm{XOR}}}(I)\right|=|\widehat{g}(S,T)|^{-k}\cdot\sum_{I\subseteq[n]:|I|=k}\left|\widehat{\mathcal{C}_{\downarrow g}^{\prime}}(I)\right|=|\widehat{g}(S,T)|^{-k}\cdot L_{1,k}(\mathcal{C}^{\prime}_{\downarrow g})
\leq|\widehat{g}(S,T)|^{-k}\cdot L_{1,k}(g,d,m_{1},m_{2},n). (since $\mathcal{C}^{\prime}$ has cost at most $d$)

Since 𝒞\mathcal{C} is arbitrary, this proves the first half of Theorem 7.5. To prove the second half, we use an averaging argument and Parseval’s identity on gg:

|g^(S,T)|2m1m2S,Tg^(S,T)2=2m1m2.|\widehat{g}(S,T)|\geq\sqrt{2^{-m_{1}-m_{2}}\sum_{S^{\prime},T^{\prime}}\widehat{g}(S^{\prime},T^{\prime})^{2}}=\sqrt{2^{-m_{1}-m_{2}}}.

Using a similar analysis, we also obtain a reduction in the other direction, from a general $g$-fiber to the XOR-fiber.

Theorem 7.6.

Assume the gadget $g\colon\{\pm 1\}^{m_{1}}\times\{\pm 1\}^{m_{2}}\to\{\pm 1\}$ satisfies Assumption 7.2. Then

L_{1,k}(g,d,m_{1},m_{2},n)\leq\left(\sum_{S,T}|\widehat{g}(S,T)|\right)^{k}\cdot L_{1,k}(\mathrm{XOR},d,1,1,n)
\leq 2^{(m_{1}+m_{2})\cdot k/2}\cdot L_{1,k}(\mathrm{XOR},d,1,1,n).
Proof.

Let $\mathcal{C}\colon(\{\pm 1\}^{m_{1}})^{n}\times(\{\pm 1\}^{m_{2}})^{n}\to[-1,1]$ be an arbitrary protocol of cost at most $d$. Then for a fixed set $I\subseteq[n]$, by Fact 7.4 applied to the gadget $g$ and using Assumption 7.2, we have

\widehat{\mathcal{C}_{\downarrow g}}(I)=\sum_{S^{I},T^{I}}\widehat{\mathcal{C}}(S^{I},T^{I})\cdot\prod_{i\in I}\widehat{g}(S_{i},T_{i}).

Therefore

L_{1,k}(\mathcal{C}_{\downarrow g})\leq\sum_{I\subseteq[n]:|I|=k}\sum_{S^{I},T^{I}}\left|\widehat{\mathcal{C}}(S^{I},T^{I})\right|\cdot\left|\prod_{i\in I}\widehat{g}(S_{i},T_{i})\right|.

Now let $M=\sum_{S,T}|\widehat{g}(S,T)|$, and let $\rho$ be the distribution over pairs $(S,T)$ with $S\subseteq[m_{1}]$ and $T\subseteq[m_{2}]$ whose probability mass function is

\rho(S,T)=|\widehat{g}(S,T)|/M.

Then we can rewrite the bound on $L_{1,k}(\mathcal{C}_{\downarrow g})$ as

L_{1,k}(\mathcal{C}_{\downarrow g})\leq\sum_{I\subseteq[n]:|I|=k}\operatorname*{\mathbb{E}}_{(\bm{S}^{I},\bm{T}^{I})\sim\rho^{I}}\left[\left|\widehat{\mathcal{C}}(\bm{S}^{I},\bm{T}^{I})\right|\cdot M^{k}\right]
=M^{k}\cdot\operatorname*{\mathbb{E}}_{(\bm{S}^{[n]},\bm{T}^{[n]})\sim\rho^{[n]}}\left[\sum_{I\subseteq[n]:|I|=k}\left|\widehat{\mathcal{C}}(\bm{S}^{I},\bm{T}^{I})\right|\right]. (7.2)

Now we fix an arbitrary $(S^{[n]},T^{[n]})$ sampled from $\rho^{[n]}$. Note that $S_{i}$ and $T_{i}$ are nonempty by the definition of $\rho$ and Assumption 7.2. Define a different protocol $\mathcal{C}^{\prime}\colon\{\pm 1\}^{n}\times\{\pm 1\}^{n}\to[-1,1]$ as follows: after receiving input $x$, Alice samples $x^{\prime}\in(\{\pm 1\}^{m_{1}})^{n}$ uniformly conditioned on $x^{\prime}_{i,S_{i}}=x_{i}$ for all $i\in[n]$; Bob similarly samples $y^{\prime}\in(\{\pm 1\}^{m_{2}})^{n}$ conditioned on $y^{\prime}_{i,T_{i}}=y_{i}$ for all $i\in[n]$. Then they execute the protocol $\mathcal{C}$ on $x^{\prime}$ and $y^{\prime}$. That is, $\mathcal{C}^{\prime}(x,y)=\operatorname*{\mathbb{E}}_{\bm{x}^{\prime},\bm{y}^{\prime}}[\mathcal{C}(\bm{x}^{\prime},\bm{y}^{\prime})]$. Therefore, for any $I\subseteq[n]$, we have

\widehat{\mathcal{C}^{\prime}}(1^{I},1^{I})=\widehat{\mathcal{C}}(S^{I},T^{I}).

By Fact 7.4 applied to $\mathcal{C}^{\prime}$ and the XOR gadget, we have

\widehat{\mathcal{C}^{\prime}_{\downarrow\mathrm{XOR}}}(I)=\widehat{\mathcal{C}^{\prime}}(1^{I},1^{I})=\widehat{\mathcal{C}}(S^{I},T^{I}).

Since $\mathcal{C}^{\prime}$ has cost at most $d$, we have

\sum_{I\subseteq[n]:|I|=k}\left|\widehat{\mathcal{C}}(S^{I},T^{I})\right|=\sum_{I\subseteq[n]:|I|=k}\left|\widehat{\mathcal{C}^{\prime}_{\downarrow\mathrm{XOR}}}(I)\right|=L_{1,k}(\mathcal{C}^{\prime}_{\downarrow\mathrm{XOR}})\leq L_{1,k}(\mathrm{XOR},d,1,1,n).

Plugging this back into Equation 7.2, we have

L_{1,k}(\mathcal{C}_{\downarrow g})\leq M^{k}\cdot L_{1,k}(\mathrm{XOR},d,1,1,n),

which proves the first half of Theorem 7.6 since $\mathcal{C}$ is arbitrary. To prove the second half, we use the Cauchy–Schwarz inequality and Parseval's identity on $g$:

M=\sum_{S,T}|\widehat{g}(S,T)|\leq\sqrt{2^{m_{1}+m_{2}}\sum_{S,T}\widehat{g}(S,T)^{2}}=\sqrt{2^{m_{1}+m_{2}}}. ∎
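Both Parseval-based estimates used in Theorems 7.5 and 7.6 are easy to sanity-check numerically (a sketch with an arbitrary random gadget; the parameters are illustrative):

# For a random ±1 gadget on m1 + m2 bits, Parseval gives sum ĝ(S,T)^2 = 1, hence
# max |ĝ| >= 2^{-(m1+m2)/2} (averaging) and sum |ĝ| <= 2^{(m1+m2)/2} (Cauchy-Schwarz).
import itertools, random

m1, m2 = 2, 2
table = {b: random.choice([1, -1])
         for b in itertools.product([1, -1], repeat=m1 + m2)}

coeffs = []
for S in itertools.product([0, 1], repeat=m1):       # 0/1 indicator of each subset
    for T in itertools.product([0, 1], repeat=m2):
        c = 0.0
        for bits, val in table.items():
            chi = 1
            for bit, sel in zip(bits, S + T):
                if sel:
                    chi *= bit
            c += val * chi
        coeffs.append(c / 2 ** (m1 + m2))

assert max(abs(c) for c in coeffs) >= 2 ** (-(m1 + m2) / 2) - 1e-9
assert sum(abs(c) for c in coeffs) <= 2 ** ((m1 + m2) / 2) + 1e-9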

As a corollary, in studying Fourier growth bounds we can conveniently switch between gadgets, as long as the gadgets are of small size.

Corollary 7.7.

Assume the gadgets $g\colon\{\pm 1\}^{m_{1}}\times\{\pm 1\}^{m_{2}}\to\{\pm 1\}$ and $g^{\prime}\colon\{\pm 1\}^{m_{1}^{\prime}}\times\{\pm 1\}^{m_{2}^{\prime}}\to\{\pm 1\}$ satisfy Assumption 7.2. Then

L_{1,k}(g,d,m_{1},m_{2},n)\leq 2^{(m_{1}+m_{2}+m_{1}^{\prime}+m_{2}^{\prime})\cdot k/2}\cdot L_{1,k}(g^{\prime},d,m_{1}^{\prime},m_{2}^{\prime},n).

8 Directions Towards Further Improvements

In this section we propose potential directions for further improving our second-level bounds. In Subsection 8.1, we show that better Fourier growth bounds can be obtained from strong lifting theorems in a black-box way; this relies on the Fourier growth reductions of Section 7. In Subsection 8.2, we examine the bottleneck in our analysis and identify the major obstacles within it.

8.1 Better Lifting Theorems Imply Better Fourier Growth

Let $f:\{\pm 1\}^{n}\to\{\pm 1\}$ be a Boolean function and let $g:\{\pm 1\}^{m_{1}}\times\{\pm 1\}^{m_{2}}\to\{\pm 1\}$ be a gadget. A lifting theorem connects the communication complexity of $f\circ g$ with the query complexity of $f$. Some lifting theorems show that a low-cost communication protocol can be simulated by a low-cost query algorithm.

To be more precise, let $\mathcal{C}:(\{\pm 1\}^{m_{1}})^{n}\times(\{\pm 1\}^{m_{2}})^{n}\to[-1,1]$ be a randomized two-party protocol. Recall from Definition 7.1 that the $g$-fiber of $\mathcal{C}$, denoted $\mathcal{C}_{\downarrow g}\colon\{\pm 1\}^{n}\to[-1,1]$, is defined by

\mathcal{C}_{\downarrow g}(z)=\operatorname*{\mathbb{E}}_{\bm{x}\sim\overline{\nu}_{1},\bm{y}\sim\overline{\nu}_{2}}\left[\mathcal{C}(\bm{x},\bm{y})\,\middle|\,g(\bm{x}_{i},\bm{y}_{i})=z_{i}~\forall i\right].

We say that $g$ satisfies a strong lifting theorem if for every randomized protocol $\mathcal{C}$ with small communication, there is a randomized decision tree of small depth that approximates $\mathcal{C}_{\downarrow g}$ on each input with error $1/\mathrm{poly}(n)$ (see, e.g., [26]).

Theorem 8.1.

Assume the gadget $g\colon\{\pm 1\}^{m_{1}}\times\{\pm 1\}^{m_{2}}\to\{\pm 1\}$ satisfies Assumption 7.2. Assume that for any randomized protocol $\mathcal{C}\colon(\{\pm 1\}^{m_{1}})^{n}\times(\{\pm 1\}^{m_{2}})^{n}\to[-1,1]$ with at most $d$ bits of communication, there exists a randomized decision tree $\mathcal{T}$ of depth at most $D$ that approximates $\mathcal{C}_{\downarrow g}$ with pointwise error at most $1/n^{k}$, i.e.,

\left|\mathcal{T}(z)-\mathcal{C}_{\downarrow g}(z)\right|\leq n^{-k}\quad\forall z\in\{\pm 1\}^{n}.

Then, for any randomized protocol $\mathcal{C}^{\prime}\colon\{\pm 1\}^{n}\times\{\pm 1\}^{n}\to[-1,1]$ with at most $d$ bits of communication, its XOR-fiber $\mathcal{C}^{\prime}_{\downarrow\mathrm{XOR}}$ has level-$k$ Fourier growth

L_{1,k}(\mathcal{C}^{\prime}_{\downarrow\mathrm{XOR}})\leq\left(\max_{S,T}|\widehat{g}(S,T)|\right)^{-k}\cdot\sqrt{D^{k}\cdot O\left(\log(n)\right)^{k-1}}
\leq 2^{(m_{1}+m_{2})\cdot k/2}\cdot\sqrt{D^{k}\cdot O\left(\log(n)\right)^{k-1}}.

As a simple corollary, if the assumption of Theorem 8.1 holds with $k=2$, $D=d\cdot\mathrm{polylog}(n)$, and a polylogarithmic-sized gadget $g$ (i.e., $2^{m_{1}},2^{m_{2}}\leq\mathrm{polylog}(n)$), then the second-level Fourier growth of the XOR-fiber of any randomized protocol of cost $d$ is at most $d\cdot\mathrm{polylog}(n)$, as desired.

We also remark that state-of-the-art lifting results hold with the gadget $g$ being either:

  • The inner product on $m_{1}=m_{2}=O(\log(n))$ bits [12]. However, for such $g$ the largest Fourier coefficient squared is $1/\mathrm{poly}(n)$, which yields a trivial bound in Theorem 8.1.

  • The index function with $m_{1}=\mathrm{poly}(n)$, $m_{2}=\log(m_{1})$ [26]. (For deterministic lifting, a better bound of $m_{1}=O(n\log(n))$ is known [37], but it does not suffice for our reduction.) In this case the largest Fourier coefficient squared is $1/m_{1}^{2}$, which again yields a trivial bound in Theorem 8.1. Nonetheless, even a polynomial improvement on $m_{1}$, say $m_{1}=n^{0.01}$, would give new non-trivial bounds in Theorem 8.1 and in turn improve our lower bound on the XOR-lift of Forrelation.

Proof of Theorem 8.1.

Let $\mathcal{C}\colon(\{\pm 1\}^{m_{1}})^{n}\times(\{\pm 1\}^{m_{2}})^{n}\to[-1,1]$ be a randomized protocol of cost at most $d$. Then by assumption, $\mathcal{C}_{\downarrow g}$ can be approximated up to pointwise error $1/n^{k}$ by a randomized decision tree $\mathcal{T}$ of depth at most $D$. Thus every Fourier coefficient of $\mathcal{C}_{\downarrow g}$ differs from the corresponding coefficient of $\mathcal{T}$ by at most $1/n^{k}$. Therefore, by the level-$k$ Fourier growth bounds on randomized decision trees [64, 57], we have

L_{1,k}(\mathcal{C}_{\downarrow g})\leq\sum_{S\subseteq[n]:|S|=k}\left(n^{-k}+\left|\widehat{\mathcal{T}}(S)\right|\right)\leq\sqrt{D^{k}\cdot O(\log(n))^{k-1}},

where the first term contributes at most $\binom{n}{k}\cdot n^{-k}\leq 1$ and is absorbed into the bound.

Since 𝒞\mathcal{C} is arbitrary, the claimed bound for 𝒞XOR\mathcal{C}^{\prime}_{\downarrow\mathrm{XOR}} follows from Theorem 7.5. ∎

8.2 Sums of Squares of Quadratic Forms for Pairwise Clean Sets

In our analysis for the level-two bound, we showed that one can transform a general protocol into a $4$-wise clean protocol with parameter $\lambda=d\cdot\mathrm{polylog}(n)$ by adding $O(d)$ additional cleanup steps in expectation. If one could show that, with essentially the same number of steps, one could take $\lambda=\mathrm{polylog}(n)$, then we would obtain the optimal level-two bound of $d\cdot\mathrm{polylog}(n)$.

We recall that to bound the number of cleanup steps, we rely on a concentration inequality for sums of squares of orthonormal quadratic forms (Theorem 3.3), which says that if $M_{1},\ldots,M_{m}$ are matrices with zero diagonal that form an orthonormal set when viewed as $n^{2}$-dimensional vectors, then the random variable $\bm{q}=\sum_{i=1}^{m}\langle\bm{x}\mathbin{\overset{\bullet}{\otimes}}\bm{x},M_{i}\rangle^{2}$ satisfies $\operatorname*{\mathbf{Pr}}_{\bm{x}\sim\gamma_{n}}[\bm{q}\geq t]\leq e^{-\Omega(\sqrt{t})}$ for any $t\gtrsim m^{2}$. Using this tail bound for $m=\Theta(d)$ and conditioning on $\bm{x}\in X$, where $X$ is an arbitrary subset of $\mathbb{R}^{n}$ with Gaussian measure $\approx 2^{-d}$, we obtained the bound $\operatorname*{\mathbb{E}}_{\bm{x}\sim\gamma}[\bm{q}\,|\,\bm{x}\in X]\lesssim d^{2}$. This shows that there can be at most $O(d)$ quadratic forms $M_{i}$ for which $\operatorname*{\mathbb{E}}_{\bm{x}\sim\gamma}[\langle\bm{x}\mathbin{\overset{\bullet}{\otimes}}\bm{x},M_{i}\rangle^{2}\,|\,\bm{x}\in X]$ exceeds $d$, and hence the reason we can only take $\lambda\approx d$. We note that the argument just described is for the non-adaptive setting, while in our case the $M_{i}$'s are also being chosen adaptively, so additional work is needed.

The next example shows that the aforementioned statement is tight even in the non-adaptive setting where the $M_{i}$'s are fixed: in particular, there is a set $X$ of Gaussian measure $2^{-\Theta(d)}$ and $\approx d$ such orthonormal quadratic forms where the above expectation after conditioning on $\bm{x}\in X$ is $\Theta(d^{2})$.

Example 8.2.

For $1\leq i<j\leq\sqrt{d}$, let $M_{ij}=E_{ij}$, where $E_{ij}$ denotes the $n\times n$ matrix whose only nonzero entry is a one in position $(i,j)$. Note that the matrices $M_{ij}$ form an orthonormal set and they all have zero diagonal. Let $X=\left\{x\in\mathbb{R}^{n}\,:\,|x_{i}|\gtrsim d^{1/4}\text{ for all }i\leq d^{1/2}\right\}$. Then the Gaussian measure is $\gamma(X)=2^{-\Theta(d)}$, but

\operatorname*{\mathbb{E}}_{\bm{x}\sim\gamma}\left[\sum_{1\leq i<j\leq\sqrt{d}}\left\langle\bm{x}\mathbin{\overset{\bullet}{\otimes}}\bm{x},M_{ij}\right\rangle^{2}\,\middle|\,\bm{x}\in X\right]=\Theta(d^{2}).
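Since $X$ is a product set, the conditional law is a product of truncated Gaussians, so the example is easy to check by simulation (an illustrative sketch; the truncation constant is arbitrary, and note $\langle x\mathbin{\overset{\bullet}{\otimes}}x,E_{ij}\rangle=x_{i}x_{j}$):

# Monte-Carlo illustration of Example 8.2: conditioned on |x_i| >= d^{1/4} for
# i <= sqrt(d), the sum over pairs i < j of (x_i * x_j)^2 is Theta(d^2).
import random
from statistics import NormalDist

N = NormalDist()

def truncated_tail_sample(a):
    # |x| >= a: inverse-CDF sample from the positive tail, then a random sign
    p = N.cdf(a) + random.random() * (1 - N.cdf(a))
    x = N.inv_cdf(min(p, 1 - 1e-12))   # clamp for numerical safety
    return x if random.random() < 0.5 else -x

def example_sum(d):
    k = int(d ** 0.5)                  # only the first sqrt(d) coordinates matter
    xs = [truncated_tail_sample(d ** 0.25) for _ in range(k)]
    s2 = sum(x * x for x in xs)
    s4 = sum(x ** 4 for x in xs)
    return (s2 * s2 - s4) / 2          # = sum_{i<j} x_i^2 x_j^2

for d in [64, 256, 1024]:
    avg = sum(example_sum(d) for _ in range(200)) / 200
    print(d, avg / d ** 2)             # ratio stays Theta(1), approaching 1/2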

Note that the set $X$ in the example above is not pairwise clean, and for our application one can get around this by first ensuring that the protocol is pairwise clean and then proceeding with the $4$-wise cleanup process. Motivated by this, we speculate that when the set is pairwise clean, the expected value of the sum of squares of orthonormal quadratic forms is much smaller, unlike in the example above. Assuming such a statement and combining it with our ideas for handling the adaptivity suggests a potential way of improving the level-two bounds.

References

  • AA [18] Scott Aaronson and Andris Ambainis. Forrelation: A problem that optimally separates quantum from classical computing. SIAM J. Comput., 47(3):982–1038, 2018.
  • Aar [10] Scott Aaronson. BQP and the polynomial hierarchy. In STOC, pages 141–150, 2010.
  • ABK [23] Scott Aaronson, Harry Buhrman, and William Kretschmer. A qubit, a coin, and an advice string walk into a relational problem. arXiv preprint arXiv:2302.10332, 2023.
  • Agr [20] Rohit Agrawal. Coin theorems and the fourier expansion. Chic. J. Theor. Comput. Sci., 2020, 2020.
  • ALM [20] Radosław Adamczak, Rafał Latała, and Rafał Meller. Hanson–wright inequality in banach spaces. Annales de l'Institut Henri Poincaré, Probabilités et Statistiques, 56(4), nov 2020.
  • BCW [98] Harry Buhrman, Richard Cleve, and Avi Wigderson. Quantum vs. classical communication and computation. In STOC, pages 63–68. ACM, 1998.
  • BIJ+ [21] Jarosław Błasiok, Peter Ivanov, Yaonan Jin, Chin Ho Lee, Rocco A Servedio, and Emanuele Viola. Fourier growth of structured $\mathbb{F}_{2}$-polynomials and applications. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2021). Schloss Dagstuhl-Leibniz-Zentrum für Informatik, 2021.
  • Bor [75] Christer Borell. The brunn-minkowski inequality in gauss space. Inventiones mathematicae, 30(2):207–216, 1975.
  • BS [21] Nikhil Bansal and Makrand Sinha. k-forrelation optimally separates quantum and classical query complexity. In Proceedings of the 53rd Annual ACM SIGACT Symposium on Theory of Computing, pages 1303–1316, 2021.
  • BTW [15] Eric Blais, Li-Yang Tan, and Andrew Wan. An inequality for the fourier spectrum of parity decision trees. CoRR, abs/1506.01055, 2015.
  • BV [10] Joshua Brody and Elad Verbin. The coin problem and pseudorandomness for branching programs. In 2010 IEEE 51st Annual Symposium on Foundations of Computer Science, pages 30–39, 2010.
  • CFK+ [19] Arkadev Chattopadhyay, Yuval Filmus, Sajin Koroth, Or Meir, and Toniann Pitassi. Query-to-communication lifting for bpp using inner product. In ICALP, 2019.
  • CGR [14] Gil Cohen, Anat Ganor, and Ran Raz. Two sides of the coin problem. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2014). Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2014.
  • CHHL [19] Eshan Chattopadhyay, Pooya Hatami, Kaave Hosseini, and Shachar Lovett. Pseudorandom generators from polarizing random walks. Theory Comput., 15:1–26, 2019.
  • CHLT [18] Eshan Chattopadhyay, Pooya Hatami, Shachar Lovett, and Avishay Tal. Pseudorandom generators from the second fourier level and applications to ac0 with parity gates. In 10th Innovations in Theoretical Computer Science Conference (ITCS 2019). Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2018.
  • CHRT [18] Eshan Chattopadhyay, Pooya Hatami, Omer Reingold, and Avishay Tal. Improved pseudorandomness for unordered branching programs through local monotonicity. In STOC, pages 363–375. ACM, 2018.
  • CKLM [19] Arkadev Chattopadhyay, Michal Koucký, Bruno Loff, and Sagnik Mukhopadhyay. Simulation theorems via pseudo-random properties. Comput. Complex., 28(4):617–659, 2019.
  • CR [12] Amit Chakrabarti and Oded Regev. An optimal lower bound on the communication complexity of gap-hamming-distance. SIAM J. Comput., 41(5):1299–1317, 2012.
  • dRNV [16] Susanna F. de Rezende, Jakob Nordström, and Marc Vinyals. How limited interaction hinders real communication (and what it means for proof and circuit complexity). In FOCS, pages 295–304. IEEE Computer Society, 2016.
  • EM [22] Ronen Eldan and Dana Moshkovitz. Reduction from non-unique games to boolean unique games. In 13th Innovations in Theoretical Computer Science Conference (ITCS 2022). Schloss Dagstuhl-Leibniz-Zentrum für Informatik, 2022.
  • Gav [20] Dmitry Gavinsky. Entangled simultaneity versus classical interactivity in communication complexity. IEEE Trans. Inf. Theory, 66(7):4641–4651, 2020.
  • GKPW [19] Mika Göös, Pritish Kamath, Toniann Pitassi, and Thomas Watson. Query-to-communication lifting for $\mathrm{P}^{\mathrm{NP}}$. Comput. Complex., 28(1):113–144, 2019.
  • GLM+ [15] Mika Göös, Shachar Lovett, Raghu Meka, Thomas Watson, and David Zuckerman. Rectangles are nonnegative juntas. In STOC, pages 257–266. ACM, 2015.
  • Göö [15] Mika Göös. Lower bounds for clique vs. independent set. In FOCS, pages 1066–1076. IEEE Computer Society, 2015.
  • GPW [15] Mika Göös, Toniann Pitassi, and Thomas Watson. Deterministic communication vs. partition number. In FOCS, pages 1077–1088. IEEE Computer Society, 2015.
  • GPW [20] Mika Göös, Toniann Pitassi, and Thomas Watson. Query-to-communication lifting for BPP. SIAM J. Comput., 49(4), 2020.
  • GRT [21] Uma Girish, Ran Raz, and Avishay Tal. Quantum versus randomized communication complexity, with efficient players. In ITCS, volume 185 of LIPIcs, pages 54:1–54:20, 2021. Presented in QIP, 2020 as a contributed talk.
  • GRZ [21] Uma Girish, Ran Raz, and Wei Zhan. Lower bounds for xor of forrelations. Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, 2021.
  • GSTW [16] Parikshit Gopalan, Rocco A. Servedio, Avishay Tal, and Avi Wigderson. Degree and sensitivity: tails of two distributions. CoRR, abs/1604.07432, 2016.
  • GTW [21] Uma Girish, Avishay Tal, and Kewen Wu. Fourier growth of parity decision trees. In 36th Computational Complexity Conference (CCC 2021). Schloss Dagstuhl-Leibniz-Zentrum für Informatik, 2021.
  • HHL [18] Hamed Hatami, Kaave Hosseini, and Shachar Lovett. Structure of protocols for XOR functions. SIAM J. Comput., 47(1):208–217, 2018.
  • INW [94] Russell Impagliazzo, Noam Nisan, and Avi Wigderson. Pseudorandomness for network algorithms. In STOC, pages 356–364. ACM, 1994.
  • IRR+ [21] Siddharth Iyer, Anup Rao, Victor Reis, Thomas Rothvoss, and Amir Yehudayoff. Tight bounds on the fourier growth of bounded functions on the hypercube. arXiv preprint arXiv:2107.06309, 2021.
  • KKL [88] Jeff Kahn, Gil Kalai, and Nathan Linial. The influence of variables on boolean functions (extended abstract). In 29th Annual Symposium on Foundations of Computer Science, White Plains, New York, USA, 24-26 October 1988, pages 68–80. IEEE Computer Society, 1988.
  • KMR [17] Pravesh K. Kothari, Raghu Meka, and Prasad Raghavendra. Approximating rectangles by juntas and weakly-exponential lower bounds for LP relaxations of csps. In STOC, pages 590–603. ACM, 2017.
  • Lee [19] Chin Ho Lee. Fourier bounds and pseudorandom generators for product tests. In Amir Shpilka, editor, 34th Computational Complexity Conference, CCC 2019, July 18-20, 2019, New Brunswick, NJ, USA, volume 137 of LIPIcs, pages 7:1–7:25. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2019.
  • LMM+ [22] Shachar Lovett, Raghu Meka, Ian Mertz, Toniann Pitassi, and Jiapeng Zhang. Lifting with sunflowers. In 13th Innovations in Theoretical Computer Science Conference (ITCS 2022). Schloss Dagstuhl-Leibniz-Zentrum für Informatik, 2022.
  • LPV [22] Chin Ho Lee, Edward Pyne, and Salil P. Vadhan. Fourier growth of regular branching programs. In Amit Chakrabarti and Chaitanya Swamy, editors, Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, APPROX/RANDOM 2022, September 19-21, 2022, University of Illinois, Urbana-Champaign, USA (Virtual Conference), volume 245 of LIPIcs, pages 2:1–2:21. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2022.
  • LRS [15] James R. Lee, Prasad Raghavendra, and David Steurer. Lower bounds on the size of semidefinite programming relaxations. In STOC, pages 567–576. ACM, 2015.
  • LSS+ [19] Nutan Limaye, Karteek Sreenivasaiah, Srikanth Srinivasan, Utkarsh Tripathi, and S Venkitesh. A fixed-depth size-hierarchy theorem for $\mathrm{AC}^{0}[\oplus]$ via the coin problem. In Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing, pages 442–453, 2019.
  • LV [18] Chin Ho Lee and Emanuele Viola. The coin problem for product tests. ACM Transactions on Computation Theory (TOCT), 10(3):1–10, 2018.
  • Man [95] Yishay Mansour. An $O(n^{\log\log n})$ learning algorithm for DNF under the uniform distribution. J. Comput. Syst. Sci., 50(3):543–550, 1995. Appeared in COLT, 1992.
  • MO [10] Ashley Montanaro and Tobias Osborne. On the communication complexity of xor functions, 2010.
  • O’D [14] Ryan O’Donnell. Analysis of Boolean Functions. Cambridge University Press, 2014.
  • OS [07] Ryan O’Donnell and Rocco A. Servedio. Learning monotone decision trees in polynomial time. SIAM Journal on Computing, 37(3):827–844, 2007.
  • Raz [87] Alexander A Razborov. Lower bounds on the size of bounded depth circuits over a complete basis with logical addition. Mathematical Notes of the Academy of Sciences of the USSR, 41(4):333–338, 1987.
  • Raz [95] Ran Raz. Fourier analysis for probabilistic communication complexity. Comput. Complex., 5(3/4):205–221, 1995.
  • RM [99] Ran Raz and Pierre McKenzie. Separation of the monotone NC hierarchy. Comb., 19(3):403–435, 1999.
  • RPRC [16] Robert Robere, Toniann Pitassi, Benjamin Rossman, and Stephen A. Cook. Exponential lower bounds for monotone span programs. In FOCS, pages 406–415. IEEE Computer Society, 2016.
  • RS [10] Alexander A. Razborov and Alexander A. Sherstov. The sign-rank of $\mathrm{AC}^{0}$. SIAM J. Comput., 39(5):1833–1855, 2010.
  • RSV [13] Omer Reingold, Thomas Steinke, and Salil P. Vadhan. Pseudorandomness for regular branching programs via Fourier analysis. In APPROX-RANDOM, pages 655–670. Springer, 2013.
  • RT [19] Ran Raz and Avishay Tal. Oracle separation of BQP and PH. In STOC, pages 13–23. ACM, 2019. Presented in QIP, 2019 as a plenary talk. Accepted to the Journal of the ACM.
  • RY [22] Anup Rao and Amir Yehudayoff. Anticoncentration and the exact gap-hamming problem. SIAM Journal on Discrete Mathematics, 36(2):1071–1092, 2022.
  • She [11] Alexander A. Sherstov. The pattern matrix method. SIAM J. Comput., 40(6):1969–2000, 2011.
  • She [12] Alexander A. Sherstov. The communication complexity of gap hamming distance. Theory Comput., 8(1):197–208, 2012.
  • Smo [87] Roman Smolensky. Algebraic methods in the theory of lower bounds for boolean circuit complexity. In Alfred V. Aho, editor, Proceedings of the 19th Annual ACM Symposium on Theory of Computing, 1987, New York, New York, USA, pages 77–82. ACM, 1987.
  • SSW [21] Alexander A Sherstov, Andrey A Storozhenko, and Pei Wu. An optimal separation of randomized and quantum query complexity. In Proceedings of the 53rd Annual ACM SIGACT Symposium on Theory of Computing, pages 1289–1302, 2021.
  • ST [78] Vladimir N Sudakov and Boris S Tsirel’son. Extremal properties of half-spaces for spherically invariant measures. Journal of Soviet Mathematics, 9(1):9–18, 1978.
  • SVW [17] Thomas Steinke, Salil P. Vadhan, and Andrew Wan. Pseudorandomness and Fourier-growth bounds for width-3 branching programs. Theory of Computing, 13(1):1–50, 2017. Appeared in APPROX-RANDOM, 2014.
  • SZ [08] Yaoyun Shi and Zhiqiang Zhang. Communication complexities of xor functions. arXiv preprint arXiv:0808.1762, 2008.
  • SZ [09] Yaoyun Shi and Yufan Zhu. Quantum communication complexity of block-composed functions. Quantum Inf. Comput., 9(5&6):444–460, 2009.
  • Tal [96] Michel Talagrand. How much are increasing sets positively correlated? Comb., 16(2):243–258, 1996.
  • Tal [17] Avishay Tal. Tight bounds on the Fourier spectrum of AC0. In Computational Complexity Conference, volume 79 of LIPIcs, pages 15:1–15:31. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik, 2017.
  • Tal [20] Avishay Tal. Towards optimal separations between quantum and randomized query complexities. In FOCS, pages 228–239. IEEE, 2020.
  • TWXZ [13] Hing Yin Tsang, Chung Hoi Wong, Ning Xie, and Shengyu Zhang. Fourier sparsity, spectral norm, and the log-rank conjecture. In 2013 IEEE 54th Annual Symposium on Foundations of Computer Science, pages 658–667, 2013.
  • Ver [18] Roman Vershynin. High-dimensional probability: An introduction with applications in data science, volume 47. Cambridge university press, 2018.
  • Vid [12] Thomas Vidick. A concentration inequality for the overlap of a vector on a large set, with application to the communication complexity of the gap-hamming-distance problem. Chic. J. Theor. Comput. Sci., 2012, 2012.
  • Wu [22] Xinyu Wu. A stochastic calculus approach to the oracle separation of BQP and PH. Theory Comput., 18:1–11, 2022.
  • WYY [17] Xiaodi Wu, Penghui Yao, and Henry S. Yuen. Raz-mckenzie simulation with the inner product gadget. Electron. Colloquium Comput. Complex., 24:10, 2017.
  • Zha [14] Shengyu Zhang. Efficient quantum protocols for xor functions. In Proceedings of the twenty-fifth annual ACM-SIAM symposium on Discrete algorithms, pages 1878–1885. SIAM, 2014.

Appendix A Gap-Hamming Lower Bounds

As an immediate consequence of Theorem 1.5, we can derive the optimal lower bound for the Gap-Hamming problem stated in Theorem 1.6.

Proof of Theorem 1.6.

Set $\rho=10/\sqrt{n}$. Fix the randomness to be any $r\in\{0,1\}^{*}$ and let $\mathcal{C}_{r}$ refer to the deterministic protocol $\mathcal{C}$ with randomness fixed to $r$. Suppose $d\leq\tau\cdot n$ for a sufficiently small constant $\tau$. We apply Theorem 1.5 on $\rho$ as well as $-\rho$, and apply the triangle inequality to conclude that

\left|\operatorname*{\mathbb{E}}_{\bm{z}\sim\pi^{\otimes n}_{\rho}}[h_{r}(\bm{z})]-\operatorname*{\mathbb{E}}_{\bm{z}\sim\pi^{\otimes n}_{-\rho}}[h_{r}(\bm{z})]\right|\leq 2\cdot O\left(\sqrt{d/n}\right)<1/9.

Let $\sigma_{\rho}$ be the distribution of $(\bm{x},\bm{y})$ induced by sampling $\bm{x}\sim\pi^{\otimes n}_{0}$ and $\bm{z}\sim\pi^{\otimes n}_{\rho}$ and letting $\bm{y}=\bm{x}\odot\bm{z}$; similarly define $\sigma_{-\rho}$ but with $\bm{z}\sim\pi^{\otimes n}_{-\rho}$. We now expand $h_{r}(z)$ in terms of $\mathcal{C}(x,y)$, take an expectation over $r$, and apply the triangle inequality to conclude that

\left|\operatorname*{\mathbb{E}}_{(\bm{x},\bm{y})\sim\sigma_{\rho}}[\mathcal{C}(\bm{x},\bm{y})]-\operatorname*{\mathbb{E}}_{(\bm{x},\bm{y})\sim\sigma_{-\rho}}[\mathcal{C}(\bm{x},\bm{y})]\right|<1/9. (A.1)

Hoeffding's inequality implies that for $\bm{z}\sim\pi^{\otimes n}_{\rho}$, we have

\operatorname*{\mathbf{Pr}}\left[\left|\sum_{i}\bm{z}_{i}-10\sqrt{n}\right|\geq 5\sqrt{n}\right]\leq 2\exp\left\{\tfrac{-2\cdot(5\sqrt{n})^{2}}{4n}\right\}<1/18.
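The last step is just arithmetic: $2\exp\{-2\cdot(5\sqrt{n})^{2}/(4n)\}=2e^{-12.5}\approx 7.5\times 10^{-6}$, well below $1/18\approx 0.056$ (a one-line check, purely illustrative):

# Arithmetic check: 2 * exp(-2 * (5*sqrt(n))^2 / (4*n)) = 2 * exp(-12.5) < 1/18.
import math
print(2 * math.exp(-12.5), 1 / 18)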

This implies that a random $(\bm{x},\bm{y})\sim\sigma_{\rho}$ is a Yes instance of the Gap-Hamming problem with probability larger than $17/18$. Let $\widetilde{\sigma}_{\rho}$ denote $\sigma_{\rho}$ conditioned on Yes instances of the Gap-Hamming problem. Similarly define $\widetilde{\sigma}_{-\rho}$ to be $\sigma_{-\rho}$ conditioned on No instances of the Gap-Hamming problem. Since $\mathcal{C}(x,y)$ has outputs in $[-1,1]$, we have

\left|\operatorname*{\mathbb{E}}_{(\bm{x},\bm{y})\sim\sigma_{\rho}}[\mathcal{C}(\bm{x},\bm{y})]-\operatorname*{\mathbb{E}}_{(\bm{x},\bm{y})\sim\widetilde{\sigma}_{\rho}}[\mathcal{C}(\bm{x},\bm{y})]\right|<1/9

and

\left|\operatorname*{\mathbb{E}}_{(\bm{x},\bm{y})\sim\sigma_{-\rho}}[\mathcal{C}(\bm{x},\bm{y})]-\operatorname*{\mathbb{E}}_{(\bm{x},\bm{y})\sim\widetilde{\sigma}_{-\rho}}[\mathcal{C}(\bm{x},\bm{y})]\right|<1/9.

This, along with Equation A.1 and the triangle inequality, implies that

\left|\operatorname*{\mathbb{E}}_{(\bm{x},\bm{y})\sim\widetilde{\sigma}_{\rho}}[\mathcal{C}(\bm{x},\bm{y})]-\operatorname*{\mathbb{E}}_{(\bm{x},\bm{y})\sim\widetilde{\sigma}_{-\rho}}[\mathcal{C}(\bm{x},\bm{y})]\right|<1/3.

However, this contradicts the assumption that the protocol $\mathcal{C}$ solves the Gap-Hamming problem with advantage at least $2/3$. ∎

Appendix B Concentration for Sum of Squares of Quadratic Forms

Here we prove Theorem 3.3. While it follows from [5, Theorem 6], which is a Banach-space-valued version of the Hanson–Wright inequality, in our setting a weaker statement suffices, for which we give a self-contained proof following [5].

For any integer $n\geq 1$, we use $\mathcal{B}^{n}=\left\{x\in\mathbb{R}^{n}\,:\,\left\|x\right\|\leq 1\right\}$ to denote the unit Euclidean ball in $\mathbb{R}^{n}$. For any two sets $A,B\subseteq\mathbb{R}^{n}$, we define $A+B=\left\{x+y\,:\,x\in A,y\in B\right\}$. For any set $A\subseteq\mathbb{R}^{n}$ and any number $t\in\mathbb{R}$, we define $tA=\left\{t\cdot x\,:\,x\in A\right\}$. Let $\Phi\colon\mathbb{R}\to[0,1]$ be the cumulative distribution function of the standard Gaussian distribution, i.e., $\Phi(a)=\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{a}e^{-u^{2}/2}\,\mathrm{d}u$.

Now we cite the famous Gaussian isoperimetric inequality [8, 58].

Theorem B.1 (Gaussian Isoperimetric Inequality).

Let $A\subseteq\mathbb{R}^{n}$ be a measurable set and assume $\gamma_{n}(A)\geq\Phi(a)$ for some $a\in\mathbb{R}$. Then for any $t\geq 0$, we have $\gamma_{n}(A+t\mathcal{B}^{n})\geq\Phi(a+t)$.

In particular, if $\gamma_{n}(A)\geq 1/2$, then we can pick $a=0$ in Theorem B.1 and obtain

\gamma_{n}(A+t\mathcal{B}^{n})\geq\Phi(t)\geq 1-e^{-t^{2}/2}. (B.1)
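The second inequality is the standard Gaussian tail estimate $1-\Phi(t)\leq\frac{1}{2}e^{-t^{2}/2}\leq e^{-t^{2}/2}$ for $t\geq 0$; a quick numeric check (purely illustrative):

# Numeric check of Phi(t) >= 1 - exp(-t^2/2) for t in [0, 10].
import math
from statistics import NormalDist

N = NormalDist()
assert all(N.cdf(t) >= 1 - math.exp(-t * t / 2) for t in (i / 100 for i in range(1001)))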

Now we are ready to prove Theorem 3.3.

Proof of Theorem 3.3.

Note that the bound is trivial when $m=0$. Thus from now on we assume without loss of generality that $m\geq 1$.

For each $x\in\mathbb{R}^{n}$, let $K_{x}=\sum_{i=1}^{m}\left\langle x\mathbin{\overset{\bullet}{\otimes}}x,M_{i}\right\rangle^{2}$. We first write $K_{x}$ as the squared Euclidean norm of a vector:

  • For $i\in[m]$, we view $M_{i}$ as a length-$n^{2}$ row vector.

  • Let $M\in\mathbb{R}^{m\times n^{2}}$ be the matrix whose $i$-th row is $M_{i}$.

Therefore we have

K_{x}=\left\|M(x\mathbin{\overset{\bullet}{\otimes}}x)\right\|^{2}=\left\|M(x\otimes x)\right\|^{2}, (B.2)

where $\otimes$ is the standard tensor product and the second equality follows since each $M_{i}$ has zero diagonal.

Define $f(y)=\left\|M(y\otimes y)\right\|$, $g(y)=\sup_{z\in\mathbb{S}^{n-1}}\left\|M(z\otimes y)\right\|$, and $h(y)=\sup_{z\in\mathbb{S}^{n-1}}\left\|M(y\otimes z)\right\|$. Let $F=\operatorname*{\mathbb{E}}_{\bm{y}\sim\gamma_{n}}[f(\bm{y})]$, $G=\operatorname*{\mathbb{E}}_{\bm{y}\sim\gamma_{n}}[g(\bm{y})]$, and $H=\operatorname*{\mathbb{E}}_{\bm{y}\sim\gamma_{n}}[h(\bm{y})]$ be their means. Define the set

A=\left\{y\in\mathbb{R}^{n}\,:\,f(y)<6F,\ g(y)<6G,\text{ and }h(y)<6H\right\}.

By Markov's inequality and a union bound, the Gaussian measure of $A$ satisfies $\gamma_{n}(A)\geq 1/2$. Then by Equation B.1, we have

\gamma_{n}(A+t\mathcal{B}^{n})\geq 1-e^{-t^{2}/2}\quad\text{for all }t\geq 0. (B.3)

Now for an arbitrary $x\in A+t\mathcal{B}^{n}$, we write $x=y+tz$ where $y\in A$ and $z\in\mathcal{B}^{n}$. Then

\left\|M(x\otimes x)\right\|\leq\left\|M(y\otimes y)\right\|+t\cdot\left\|M(y\otimes z)\right\|+t\cdot\left\|M(z\otimes y)\right\|+t^{2}\cdot\left\|M(z\otimes z)\right\|
<6F+6t(G+H)+t^{2}V,

where $V=\sup_{z\in\mathbb{S}^{n-1}}\left\|M(z\otimes z)\right\|$. This, together with Equation B.2 and Equation B.3, implies

\operatorname*{\mathbf{Pr}}_{\bm{x}\sim\gamma_{n}}\left[K_{\bm{x}}\geq\left(6F+6t(G+H)+t^{2}V\right)^{2}\right]\leq\operatorname*{\mathbf{Pr}}_{\bm{x}\sim\gamma_{n}}\left[\bm{x}\notin A+t\mathcal{B}^{n}\right]=1-\gamma_{n}(A+t\mathcal{B}^{n})\leq e^{-t^{2}/2}. (B.4)

Now we bound $F,G,H,V$ in the following claim, whose proof is presented later.

Claim B.2.

$F\leq\sqrt{2m}$, $G,H\leq\sqrt{m}$, and $V\leq 1$.

Plugging Claim B.2 into Equation B.4, we have

\operatorname*{\mathbf{Pr}}_{\bm{x}\sim\gamma_{n}}\left[K_{\bm{x}}\geq\left(6\sqrt{2m}+12t\sqrt{m}+t^{2}\right)^{2}\right]\leq e^{-t^{2}/2}\quad\text{for any }t\geq 0.

Now we set

t=\frac{1}{168}\sqrt{\frac{r}{m+\sqrt{r}}}\geq 0

and assume $r\geq 98m$. Then $6\sqrt{2m}\leq\frac{6}{7}\sqrt{r}$, $12t\sqrt{m}\leq\frac{1}{14}\sqrt{r}$, and $t^{2}\leq\frac{1}{14}\sqrt{r}$, so $6\sqrt{2m}+12t\sqrt{m}+t^{2}\leq\sqrt{r}$. Therefore

\operatorname*{\mathbf{Pr}}_{\bm{x}\sim\gamma_{n}}\left[\sum_{i=1}^{m}\left\langle\bm{x}\mathbin{\overset{\bullet}{\otimes}}\bm{x},M_{i}\right\rangle^{2}\geq r\right]=\operatorname*{\mathbf{Pr}}_{\bm{x}\sim\gamma_{n}}\left[K_{\bm{x}}\geq r\right]\leq e^{-t^{2}/2}=\exp\left\{-\frac{1}{56448}\cdot\frac{r}{m+\sqrt{r}}\right\}. ∎
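The parameter estimates above are elementary but easy to get wrong; here is a quick numeric verification over a grid (purely illustrative):

# Check the three estimates for t = sqrt(r/(m+sqrt(r)))/168 whenever r >= 98m.
import math

for m in [1, 10, 1000]:
    for r in [99 * m, 1000 * m, 10 ** 6 * m]:
        t = math.sqrt(r / (m + math.sqrt(r))) / 168
        assert 6 * math.sqrt(2 * m) <= (6 / 7) * math.sqrt(r)
        assert 12 * t * math.sqrt(m) <= math.sqrt(r) / 14
        assert t * t <= math.sqrt(r) / 14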

Finally we present the missing proof of Claim B.2.

Proof of Claim B.2.

First we observe that the rows of $M$ are unit vectors; therefore

\left\|M\right\|=\sqrt{m}. (B.5)

In addition, the rows of $M$ are orthogonal to each other; therefore the operator norm of $M$ satisfies

\left\|M\right\|_{\mathrm{op}}\leq 1. (B.6)

We index the columns of $M$ by $[n]^{2}$ and let the column vectors of $M$ be $\left(b_{i,j}\right)_{i,j\in[n]}$. Since the rows of $M$ are flattened matrices with zero diagonal, we have

b_{i,i}=0^{m}\quad\text{for all }i\in[n]. (B.7)

Now we bound $F,G,H,V$ separately.

Bounding $F$.

Observe that

F^{2}=\left(\operatorname*{\mathbb{E}}_{\bm{y}\sim\gamma_{n}}\left[\left\|M(\bm{y}\otimes\bm{y})\right\|\right]\right)^{2}\leq\operatorname*{\mathbb{E}}_{\bm{y}\sim\gamma_{n}}\left[\left\|M(\bm{y}\otimes\bm{y})\right\|^{2}\right]=\operatorname*{\mathbb{E}}_{\bm{y}\sim\gamma_{n}}\left[\left\|\sum_{i,j\in[n]}b_{i,j}\bm{y}_{i}\bm{y}_{j}\right\|^{2}\right] (by convexity)
=\operatorname*{\mathbb{E}}_{\bm{y}\sim\gamma_{n}}\left[\sum_{i,j,i^{\prime},j^{\prime}\in[n]}\left\langle b_{i,j},b_{i^{\prime},j^{\prime}}\right\rangle\bm{y}_{i}\bm{y}_{j}\bm{y}_{i^{\prime}}\bm{y}_{j^{\prime}}\right]=\sum_{i,j\in[n]}\left(\left\|b_{i,j}\right\|^{2}+\left\langle b_{i,j},b_{j,i}\right\rangle\right) (by Equation B.7)
\leq\sum_{i,j\in[n]}\left(\left\|b_{i,j}\right\|^{2}+\frac{1}{2}\left(\left\|b_{i,j}\right\|^{2}+\left\|b_{j,i}\right\|^{2}\right)\right)=2\sum_{i,j\in[n]}\left\|b_{i,j}\right\|^{2}
=2\left\|M\right\|^{2}=2m. (by Equation B.5)
Bounding $G$ and $H$.

Fix an arbitrary $y\in\mathbb{R}^{n}$; we first simplify $g(y)$. For each $i\in[n]$, define the vector $b_{i}=\sum_{j\in[n]}b_{i,j}y_{j}$ and let $B$ be the matrix with the $b_{i}$'s as column vectors. Then

g(y)=\sup_{z\in\mathbb{S}^{n-1}}\left\|\sum_{i,j\in[n]}b_{i,j}z_{i}y_{j}\right\|=\sup_{z\in\mathbb{S}^{n-1}}\left\|\sum_{i\in[n]}b_{i}z_{i}\right\|=\left\|B\right\|_{\mathrm{op}}\leq\left\|B\right\|=\sqrt{\sum_{i\in[n]}\left\|\sum_{j\in[n]}b_{i,j}y_{j}\right\|^{2}}. (B.8)

Now we bound $G$:

G^{2}=\left(\operatorname*{\mathbb{E}}_{\bm{y}\sim\gamma_{n}}\left[g(\bm{y})\right]\right)^{2}\leq\operatorname*{\mathbb{E}}_{\bm{y}\sim\gamma_{n}}\left[g(\bm{y})^{2}\right] (by convexity)
\leq\operatorname*{\mathbb{E}}_{\bm{y}\sim\gamma_{n}}\left[\sum_{i\in[n]}\left\|\sum_{j\in[n]}b_{i,j}\bm{y}_{j}\right\|^{2}\right]=\operatorname*{\mathbb{E}}_{\bm{y}\sim\gamma_{n}}\left[\sum_{i\in[n]}\sum_{j,j^{\prime}\in[n]}\left\langle b_{i,j},b_{i,j^{\prime}}\right\rangle\bm{y}_{j}\bm{y}_{j^{\prime}}\right] (by Equation B.8)
=\sum_{i,j\in[n]}\left\|b_{i,j}\right\|^{2}=\left\|M\right\|^{2}=m. (by Equation B.5)

A similar argument works for $H$.

Bounding $V$.

Note that for any $z\in\mathbb{S}^{n-1}$, we have $\left\|z\otimes z\right\|=\left\|z\right\|^{2}=1$. Thus, by Equation B.6, we have

V=\sup_{z\in\mathbb{S}^{n-1}}\left\|M(z\otimes z)\right\|\leq\left\|M\right\|_{\mathrm{op}}\leq 1. ∎