
Fourier Growth of Communication Protocols for XOR Functions

Uma Girish (Princeton University; email: [email protected]), Makrand Sinha (Simons Institute and University of California at Berkeley; email: [email protected]), Avishay Tal (University of California at Berkeley; email: [email protected]), Kewen Wu (University of California at Berkeley; email: [email protected])
Abstract

The level-$k$ $\ell_{1}$-Fourier weight of a Boolean function refers to the sum of absolute values of its level-$k$ Fourier coefficients. Fourier growth refers to the growth of these weights as $k$ grows. It has been extensively studied for various computational models, and bounds on the Fourier growth, even for the first few levels, have proven useful in learning theory, circuit lower bounds, pseudorandomness, and quantum-classical separations.

In this work, we investigate the Fourier growth of certain functions that naturally arise from communication protocols for XOR functions (partial functions evaluated on the bitwise XOR of the inputs $x$ and $y$ to Alice and Bob). If a protocol $\mathcal{C}$ computes an XOR function, then $\mathcal{C}(x,y)$ is a function of the parity $x\oplus y$. This motivates us to analyze the XOR-fiber of the communication protocol $\mathcal{C}$, defined as $h(z):=\mathbb{E}_{\bm{x},\bm{y}}[\mathcal{C}(\bm{x},\bm{y})\mid\bm{x}\oplus\bm{y}=z]$.

We present improved Fourier growth bounds for the XOR-fibers of randomized protocols that communicate $d$ bits. For the first level, we show a tight $O(\sqrt{d})$ bound and obtain a new coin theorem, as well as an alternative proof of the tight randomized communication lower bound for the Gap-Hamming problem. For the second level, we show a $d^{3/2}\cdot\mathrm{polylog}(n)$ bound, which improves the previous $O(d^{2})$ bound by Girish, Raz, and Tal (ITCS 2021) and implies a polynomial improvement on the randomized communication lower bound for the XOR-lift of the Forrelation problem, which extends the quantum-classical gap for this problem.

Our analysis is based on a new way of adaptively partitioning a relatively large set in Gaussian space to control its moments in all directions. We achieve this via martingale arguments and by allowing protocols to transmit real values. We also show a connection between Fourier growth and lifting theorems with constant-sized gadgets as a potential approach to proving optimal bounds for the second level and beyond.

1 Introduction

The Fourier spectrum of Boolean functions and its various properties have played an important role in many areas of mathematics and theoretical computer science. In this work, we study a notion called $\ell_{1}$-Fourier growth, which captures the scaling of the sum of absolute values of the level-$k$ Fourier coefficients of a function. In a nutshell, functions with small Fourier growth cannot aggregate many weak signals in the input to obtain a considerable effect on the output. In contrast, the Majority function, which can amplify weak biases, is an example of a Boolean function with extremely high Fourier growth.

To formally define Fourier growth, we recall that every Boolean function $f\colon\{\pm 1\}^{n}\to[-1,1]$ can be uniquely represented as a multilinear polynomial

f(x)=\sum_{S\subseteq[n]}\widehat{f}(S)\cdot\prod_{i\in S}x_{i},

where the coefficients $\widehat{f}(S)\in\mathbb{R}$ of the polynomial are called the Fourier coefficients of $f$, and they satisfy $\widehat{f}(S)=\operatorname*{\mathbb{E}}[f(\bm{x})\cdot\prod_{i\in S}\bm{x}_{i}]$ for a uniformly random $\bm{x}\in\{\pm 1\}^{n}$. The level-$k$ $\ell_{1}$-Fourier growth of $f$ is the sum of the absolute values of its level-$k$ Fourier coefficients,

L_{1,k}(f):=\sum_{S\subseteq[n]:|S|=k}\left|\widehat{f}(S)\right|.
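
As a concrete illustration of these definitions (not part of the paper's formal development), the following brute-force sketch computes the Fourier coefficients of a function on $\{\pm 1\}^{n}$ and its level-$k$ weight; the test function `maj3` is a hypothetical example input.

```python
import itertools
import numpy as np

def fourier_coefficients(f, n):
    """Brute-force Fourier coefficients of f: {±1}^n -> R.

    Returns a dict mapping each S (as a frozenset) to
    f^(S) = E_x[f(x) * prod_{i in S} x_i] over uniform x in {±1}^n.
    """
    points = list(itertools.product([1, -1], repeat=n))
    coeffs = {}
    for r in range(n + 1):
        for S in itertools.combinations(range(n), r):
            values = [f(x) * np.prod([x[i] for i in S]) for x in points]
            coeffs[frozenset(S)] = float(np.mean(values))
    return coeffs

def l1_level_k(coeffs, k):
    """Level-k l1-Fourier weight: sum of |f^(S)| over |S| = k."""
    return sum(abs(v) for S, v in coeffs.items() if len(S) == k)

# Example: majority on 3 bits has f^({i}) = 1/2, so L_{1,1} = 3/2.
maj3 = lambda x: 1 if sum(x) > 0 else -1
print(l1_level_k(fourier_coefficients(maj3, 3), 1))  # 1.5
```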

The study of Fourier growth dates back to the work of Mansour [42], who used it in the context of learning algorithms. Since then, several works have shown that upper bounds on the Fourier growth, even for the first few Fourier levels, have applications to pseudorandomness, circuit complexity, and quantum-classical separations. For example:

  • A bound on the level-one Fourier growth is sufficient to control the advantage of distinguishing biased coins from unbiased ones [4].

  • A bound on the level-two Fourier growth already gives pseudorandom generators [15], oracle separations between BQP and PH [52, 68], and separations between efficient quantum communication and randomized classical communication [27].

Meanwhile, Fourier growth bounds have been extensively studied and established for various computational models, including small-width DNFs/CNFs [42], $\mathsf{AC}^{0}$ circuits [63], low-sensitivity Boolean functions [29], small-width branching programs [51, 59, 16, 38], small-depth decision trees [45, 64, 57], functions related to small-cost communication protocols [28, 27], low-degree $\mathsf{GF}(2)$ polynomials [14, 15, 7], product tests [36], small-depth parity decision trees [10, 30], low-degree bounded functions [33], and more.

For any Boolean function $f$ with outputs in $[-1,1]$, the level-$k$ Fourier growth $L_{1,k}(f)$ is at most $\sqrt{\binom{n}{k}}$. However, for many natural classes of Boolean functions, this bound is far from tight and not good enough for applications. Establishing better bounds requires exploring structural properties of the specific class of functions in question. Even for low Fourier levels, this can be highly non-trivial, and tight bounds remain elusive in many cases. For example, for degree-$d$ $\mathsf{GF}(2)$ polynomials (which well-approximate $\mathsf{AC}^{0}[\oplus]$ when we set $d=\mathrm{polylog}(n)$ [46, 56]), while we know a level-one bound of $L_{1,1}(f)\leq O(d)$ due to [15], the current best bound for levels $k\geq 2$ is roughly $2^{O(dk)}$ [14], whereas the conjectured bound is $d^{O(k)}$. Validating such a bound, even for the second level $k=2$, would imply unconditional pseudorandom generators of polylogarithmic seed length for $\mathsf{AC}^{0}[\oplus]$ [15], a longstanding open problem in circuit complexity and pseudorandomness.

XOR Functions.

In this work, we study the Fourier growth of certain functions that naturally arise from communication protocols for XOR-lifted functions, also referred to as XOR functions. XOR functions are an important and well-studied class of functions in communication complexity with connections to the log-rank conjecture and quantum versus classical separations [43, 31, 65, 60, 70].

In this setting, Alice gets an input $x\in\{\pm 1\}^{n}$ and Bob gets an input $y\in\{\pm 1\}^{n}$, and they wish to compute $f(x\odot y)$ where $f$ is some partial Boolean function and $x\odot y$ is in the domain of $f$. Here, $x\odot y$ denotes the pointwise product of $x$ and $y$. Given any communication protocol $\mathcal{C}$ that computes an XOR function exactly, the output $\mathcal{C}(x,y)$ of the protocol depends only on the parity $x\odot y$, whenever $f$ is defined on $x\odot y$. This gives a natural motivation to analyze the XOR-fiber of a communication protocol defined below. We note that a similar notion first appeared in an earlier work of Raz [47].

Definition 1.1.

Let $\mathcal{C}\colon\{\pm 1\}^{n}\times\{\pm 1\}^{n}\to\{\pm 1\}$ be any deterministic communication protocol. The XOR-fiber of the communication protocol $\mathcal{C}$ is the function $h\colon\{\pm 1\}^{n}\to[-1,1]$ defined at $z\in\{\pm 1\}^{n}$ as

h(z)=\operatorname*{\mathbb{E}}_{\bm{x},\bm{y}\sim\nu}[\mathcal{C}(\bm{x},\bm{y})\mid\bm{x}\odot\bm{y}=z],

where $\odot$ is the entrywise product and $\nu$ is the uniform distribution over $\{\pm 1\}^{n}$.
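
For intuition, here is a brute-force sketch of Definition 1.1 (illustrative only; `protocol` stands in for the leaf value $\mathcal{C}(x,y)$ of a hypothetical protocol). It uses the observation that conditioning on $\bm{x}\odot\bm{y}=z$ is the same as taking $\bm{x}$ uniform and setting $\bm{y}=\bm{x}\odot z$.

```python
import itertools

def xor_fiber(protocol, n, z):
    """XOR-fiber h(z) = E[C(x, y) | x ⊙ y = z] for uniform x, y in {±1}^n.

    Conditioned on x ⊙ y = z, the pair (x, y) is distributed as
    (x, x ⊙ z) with x uniform, so we average C(x, x ⊙ z) over all x.
    """
    total = 0.0
    for x in itertools.product([1, -1], repeat=n):
        y = tuple(xi * zi for xi, zi in zip(x, z))  # y = x ⊙ z
        total += protocol(x, y)
    return total / 2**n

# Example: C(x, y) = x_1 * y_1 (a 2-bit protocol); its XOR-fiber is h(z) = z_1.
C = lambda x, y: x[0] * y[0]
print(xor_fiber(C, 3, (1, -1, 1)), xor_fiber(C, 3, (-1, -1, 1)))  # 1.0 -1.0
```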

We remark that the XOR-fiber is the “inverse” of the XOR-lift of a function: if $\mathcal{C}$ computes the XOR function of $f$, then the XOR-fiber $h$ of $\mathcal{C}$ is equal to $f$ on the domain of $f$.

In this work, we investigate the Fourier growth of XOR-fibers of small-cost communication protocols and apply these bounds in several contexts. Before stating our results, we first discuss several related works.

Related Works.

Showing optimal Fourier growth bounds for XOR-fibers is a complex undertaking in general, and a first step towards this end is to obtain optimal Fourier growth bounds for parity decision trees. This is because a parity decision tree for a Boolean function $f$ naturally gives rise to a structured communication protocol for the XOR function corresponding to $f$. This protocol perfectly simulates the parity decision tree by having Alice and Bob exchange one bit each to simulate a parity query. Moreover, the XOR-fiber of this protocol exactly computes the function computed by the parity decision tree. As such, parity decision trees can be seen as a special case of communication protocols, and Fourier growth bounds on XOR-fibers of communication protocols immediately imply Fourier growth bounds on parity decision trees.

Fourier growth bounds for decision trees and parity decision trees are well-studied. It is not too difficult to obtain a level-$k$ bound of $O(d)^{k}$ for parity decision trees of depth $d$; however, obtaining improved bounds is significantly more challenging. For decision trees of depth $d$ (which form a subclass of parity decision trees of depth $d$), O'Donnell and Servedio [45] proved a tight bound of $O(\sqrt{d})$ on the level-one Fourier growth. By inductive tree decompositions, Tal [64] obtained bounds for the higher levels of the form $L_{1,k}(f)\leq\sqrt{d^{k}\cdot O(\log(n))^{k-1}}$. This was later sharpened by Sherstov, Storozhenko, and Wu [57] to the asymptotically tight bound of $L_{1,k}(f)\leq\sqrt{\binom{d}{k}\cdot O(\log(n))^{k-1}}$ using a more sophisticated layered partitioning strategy on the tree.

When it comes to parity decision trees, despite all the similarities, the structural decomposition approach does not seem to carry over due to the correlations between the parity queries. For parity decision trees of depth $d$, Blais, Tan, and Wan [10] proved a tight level-one bound of $O(\sqrt{d})$. For higher levels, Girish, Tal, and Wu [30] showed that $L_{1,k}(f)\leq\sqrt{d^{k}\cdot O(k\log(n))^{2k}}$. These works imply almost tight Fourier growth bounds on the XOR-fibers of structured protocols that arise from simulating decision trees or parity decision trees.

For the case of XOR-fibers of arbitrary deterministic/randomized communication protocols (which do not necessarily simulate parity decision trees or decision trees), Girish, Raz, and Tal [27] showed an $O(d^{k})$ Fourier growth bound (technically, [27] only proved a level-two bound, as it suffices for their analysis, but a level-$k$ bound follows easily from their proof approach, as noted in [28]). For level one and level two, these bounds are $O(d)$ and $O(d^{2})$ respectively and are sub-optimal: as mentioned previously, such weaker bounds for parity decision trees are easy to obtain, while obtaining optimal bounds (for parity decision trees) of $O(\sqrt{d})$ for level one and $d\cdot\mathrm{polylog}(n)$ for level two already requires sophisticated ideas.

The bounds in [27] follow by analyzing the Fourier growth of XOR-fibers of communication rectangles of measure $\approx 2^{-d}$ and then adding up the contributions from all the leaf rectangles induced by the protocol. Such a per-rectangle approach cannot give better bounds than the ones in [27]; the authors also conjectured that the optimal Fourier growth of XOR-fibers of arbitrary protocols should match the growth for parity decision trees.

Showing the above is a challenging task even for the first two Fourier levels. The difficulty arises primarily because, in the absence of a per-rectangle argument, one has to crucially leverage cancellations between different rectangles induced by the communication protocol. In the simpler case of parity decision trees (or protocols that exchange parities), such cancellations are leveraged in [30] by ensuring $k$-wise independence at each node of the tree — this can be achieved by adding extra parity queries. In a general protocol, the parties can send arbitrary partial information about their inputs and correlate the coordinates in ways so complicated that such methods break down. This is one of the key difficulties we face in this paper.

1.1 Main Results

We prove new and improved bounds on the Fourier growth of the XOR-fibers associated with small-cost protocols for levels $k=1$ and $k=2$.

Theorem 1.2.

Let $\mathcal{C}\colon\{\pm 1\}^{n}\times\{\pm 1\}^{n}\to\{\pm 1\}$ be a deterministic communication protocol with at most $d$ bits of communication. Let $h$ be its XOR-fiber as in Definition 1.1. Then, $L_{1,1}(h)=O\left(\sqrt{d}\right)$.

Theorem 1.3.

Let $\mathcal{C}\colon\{\pm 1\}^{n}\times\{\pm 1\}^{n}\to\{\pm 1\}$ be a deterministic protocol communicating at most $d$ bits. Let $h$ be its XOR-fiber as in Definition 1.1. Then, $L_{1,2}(h)=O\left(d^{3/2}\log^{3}(n)\right)$.

Our bounds in Theorems 1.2 and 1.3 extend directly to randomized communication protocols. This is because $L_{1,k}$ is convex and any randomized protocol is a convex combination of deterministic protocols with the same cost. Moreover, we can use Fourier growth reductions, as described in Subsection 1.2.3, to demonstrate that these bounds apply to general constant-sized gadgets $g$ and the corresponding $g$-fibers.

Our level-one and level-two bounds improve previous bounds in [27] by polynomial factors. Additionally, our level-one bound is tight since a deterministic protocol with $d+1$ bits of communication can compute the majority vote of $x_{1}\cdot y_{1},\ldots,x_{d}\cdot y_{d}$, which corresponds to $h(z)=\mathrm{MAJ}(z_{1},\ldots,z_{d})$ with $L_{1,1}(h)=\Theta(\sqrt{d})$. Furthermore, as we discuss later in Subsection 1.2, level-one and level-two bounds are already sufficient for many interesting applications.
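
To spell out the tightness calculation in this example (a standard fact about majority, sketched here for completeness): for odd $d$, each singleton Fourier coefficient of $\mathrm{MAJ}$ on $d$ bits has magnitude $\Theta(1/\sqrt{d})$, so

\widehat{\mathrm{MAJ}}(\{i\})=\operatorname*{\mathbb{E}}[\mathrm{MAJ}(\bm{z})\cdot\bm{z}_{i}]=\Theta\left(\tfrac{1}{\sqrt{d}}\right)\quad\text{for all }i\in[d],\qquad\text{hence}\quad L_{1,1}(h)=\sum_{i=1}^{d}\left|\widehat{\mathrm{MAJ}}(\{i\})\right|=\Theta(\sqrt{d}).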

In terms of techniques, our analysis presents a key new idea that enables us to exploit cancellations between different rectangles induced by the protocol. This idea involves using a novel process to adaptively partition a relatively large set in Gaussian space, which enables us to control its $k$-wise moments in all directions — this can be thought of as a spectral notion of almost $k$-wise independence. We achieve this by utilizing martingale arguments and allowing protocols to transmit real values rather than just discrete bits. This notion and procedure may be of independent interest. See Section 2 for a detailed discussion.

1.2 Applications and Connections

Our main theorem has applications to XOR functions, and more generally to functions lifted with constant-sized gadgets. In this setting, there is a simple gadget $g\colon\Sigma\times\Sigma\to\{\pm 1\}$ and a Boolean function $f$ defined on inputs $z\in\{\pm 1\}^{n}$. The lifted function $f\circ g$ is defined on $n$ pairs of symbols $(x_{1},y_{1}),\ldots,(x_{n},y_{n})\in\Sigma\times\Sigma$ such that $(f\circ g)(x,y)=f(g(x_{1},y_{1}),\ldots,g(x_{n},y_{n}))$. The function $f\circ g$ naturally defines a communication problem where Alice is given $x=(x_{1},\ldots,x_{n})$, Bob is given $y=(y_{1},\ldots,y_{n})$, and they are asked to compute $(f\circ g)(x,y)$.

Since XOR functions are functions lifted with the XOR gadget, our main theorem implies lower bounds on the communication complexity of specific XOR functions. Additionally, we show connections between XOR-lifting and lifting with any constant-sized gadget. Next, we describe these lower bounds and connections, with further context.

1.2.1 The Coin Problem and the Gap-Hamming Problem

The coin problem studies the advantage that a class of Boolean functions has in distinguishing biased coins from unbiased ones. More formally, let $\mathcal{F}$ be a class of $n$-variate Boolean functions. Let $\rho\in[-1,1]$ and let $\pi^{\otimes n}_{\rho}$ denote the product distribution over $\{\pm 1\}^{n}$ where each coordinate has expectation $\rho$. The Coin Problem asks what is the maximum advantage that functions in $\mathcal{F}$ have in distinguishing $\pi^{\otimes n}_{\rho}$ from the uniform distribution $\pi^{\otimes n}_{0}$.

This quantity essentially captures how well $\mathcal{F}$ can approximate threshold functions, and in particular, the majority function. The coin problem has been studied for various models of computation including branching programs [11], $\mathsf{AC}^{0}$ and $\mathsf{AC}^{0}[\oplus]$ circuits [13, 40], product tests [41], and more. Recently, Agrawal [4] showed that the coin problem is closely related to the level-one Fourier growth of functions in $\mathcal{F}$.

Lemma 1.4 ([4, Lemma 3.2]).

Assume that $\mathcal{F}$ is closed under restrictions and satisfies $L_{1,1}(f)\leq t$ for all $f\in\mathcal{F}$. Then, for all $\rho\in(-1,1)$ and $f\in\mathcal{F}$,

\left|\operatorname*{\mathbb{E}}_{z\sim\pi^{\otimes n}_{\rho}}[f(z)]-\operatorname*{\mathbb{E}}_{z\sim\pi^{\otimes n}_{0}}[f(z)]\right|\leq\ln\left(\tfrac{1}{1-|\rho|}\right)\cdot t.

Note that communication protocols of small cost are closed under restrictions, and so are their XOR-fibers (see [27, Lemma 5.5]). By noting that $\ln\left(\frac{1}{1-|\rho|}\right)\approx|\rho|$ for small values of $\rho$, we obtain the following corollary. (Here we also use the fact that the upper bound $O(|\rho|\cdot\sqrt{d})$ is vacuous for large enough $\rho$ as it is larger than $1$. We also remark that, using the Fourier growth reductions (see Subsection 1.2.3), Theorem 1.5 can be established for general gadgets of small size.)

Theorem 1.5.

Let $h$ be the XOR-fiber of a protocol with total communication $d$. Then for all $\rho$,

\left|\operatorname*{\mathbb{E}}_{z\sim\pi^{\otimes n}_{\rho}}[h(z)]-\operatorname*{\mathbb{E}}_{z\sim\pi^{\otimes n}_{0}}[h(z)]\right|\leq O\left(|\rho|\cdot\sqrt{d}\right).

In particular, consider the following distinguishing task: Alice and Bob either receive two uniformly random strings in $\{\pm 1\}^{n}$, or they receive two uniformly random strings in $\{\pm 1\}^{n}$ conditioned on their XOR being distributed according to $\pi^{\otimes n}_{\rho}$ for $\rho=1/\sqrt{n}$ (the latter are often referred to as $\rho$-correlated strings). Theorem 1.5 implies that any protocol communicating $o(n)$ bits cannot distinguish these two distributions with constant advantage. This is essentially a communication lower bound for the well-known Gap-Hamming Problem.
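
Concretely, plugging $\rho=1/\sqrt{n}$ into Theorem 1.5, the distinguishing advantage of any $d$-bit protocol is at most

O\left(|\rho|\cdot\sqrt{d}\right)=O\left(\sqrt{d/n}\right),

which is $o(1)$ whenever $d=o(n)$.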

The Gap-Hamming Problem.

In the Gap-Hamming Problem, Alice and Bob receive strings $x,y\in\{\pm 1\}^{n}$ respectively, and they want to distinguish whether $\left\langle x,y\right\rangle\leq-\sqrt{n}$ or $\left\langle x,y\right\rangle\geq\sqrt{n}$.

This is essentially the XOR-lift of the Coin Problem with $\rho=\pm 1/\sqrt{n}$, because the distribution of $(x,y)$ conditioned on $x\odot y\sim\pi^{\otimes n}_{\rho}$ with $\rho=-1/\sqrt{n}$ and $\rho=1/\sqrt{n}$ is mostly supported on the Yes and No instances of Gap-Hamming respectively. Thus, immediately from Theorem 1.5, we derive a new proof of the $\Omega(n)$ lower bound on the communication complexity of the Gap-Hamming Problem. The proof is deferred to Appendix A.

Theorem 1.6.

The randomized communication complexity of the Gap-Hamming Problem is $\Omega(n)$.

We note that there are several different proofs [18, 55, 67, 53] that obtain the above lower bound, but the perspective taken here is perhaps conceptually simpler: (1) Gap-Hamming is essentially the XOR-lift of the Gap-Majority function, and (2) any function that approximates the Gap-Majority function must have large level-one Fourier growth, whereas XOR-fibers of small-cost protocols have small Fourier growth.

1.2.2 Quantum versus Classical Communication Separation via Lifting

One natural approach to proving quantum versus classical separations in communication complexity is via lifting: Consider a function $f$ separating quantum and classical query complexity and lift it using a gadget $g$. Naturally, an algorithm computing $f$ with few queries to $z$ can be translated into a communication protocol computing $f\circ g$ where we replace each query to a bit $z_{i}$ with a short conversation that allows the calculation of $z_{i}=g(x_{i},y_{i})$. Göös, Pitassi, and Watson [26] showed that for randomized query/communication complexity and for various gadgets, this is essentially the best possible. Such results are referred to as lifting theorems.

Lifting theorems apply to different models of computation, such as deterministic decision trees [48, 25], randomized decision trees [26, 12], and more. A beautiful line of work shows how to “lift” many lower bounds in the query model to the communication model [48, 25, 23, 24, 19, 31, 69, 17, 35, 61, 54, 50, 49, 22, 39]. For quantum query complexity, only one direction (considered the “easier” direction) is known: Any quantum query algorithm for $f$ can be translated to a communication protocol for $f\circ g$ with a small logarithmic overhead [6]. It remains widely open whether the other direction holds as well. However, this query-to-communication direction for quantum, combined with the communication-to-query direction for classical, is already sufficient for lifting quantum versus classical separations from the query model to the communication model.

One drawback of this approach to proving communication complexity separations is that the state-of-the-art lifting results [12, 37] work for gadgets with alphabet size at least $n$ (recall that $n$ denotes $f$'s input length), and it is a significant challenge to reduce the alphabet size to $O(1)$ or even $\mathrm{polylog}(n)$. These large gadgets usually result in larger overheads in terms of communication rounds, communication bits, and computations for both parties. As demonstrated next, lifting with simpler gadgets like XOR allows for a simpler quantum protocol for the lifted problem.

Lifting Forrelation with XOR.

The Forrelation function introduced by [2] is defined as follows: on input $x=(x_{1},x_{2})\in\{\pm 1\}^{n}$ where $n$ is a power of $2$,

\mathrm{Forr}(x)=\frac{2}{n}\left\langle Hx_{1},x_{2}\right\rangle,

where $H$ denotes the $(n/2)\times(n/2)$ (unitary) Hadamard matrix.
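
As a small illustration of this definition (a sketch, not from the paper: it builds $H$ via the Sylvester construction, normalized to be unitary, and evaluates $\mathrm{Forr}$ on a correlated input):

```python
import numpy as np

def hadamard(m):
    """Unitary Sylvester Hadamard matrix of size m x m (m a power of 2)."""
    H = np.array([[1.0]])
    while H.shape[0] < m:
        H = np.block([[H, H], [H, -H]])
    return H / np.sqrt(m)

def forrelation(x):
    """Forr(x) = (2/n) * <H x1, x2> for x in {±1}^n, n a power of 2."""
    n = len(x)
    x1, x2 = np.asarray(x[: n // 2], float), np.asarray(x[n // 2:], float)
    return (2.0 / n) * np.dot(hadamard(n // 2) @ x1, x2)

# A correlated instance: choosing x2 to match the signs of H x1 makes
# Forr large, while independent random halves make it close to 0.
rng = np.random.default_rng(0)
x1 = rng.choice([1.0, -1.0], size=8)
x2 = np.where(hadamard(8) @ x1 >= 0, 1.0, -1.0)
print(forrelation(np.concatenate([x1, x2])))
```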

Girish, Raz, and Tal [27] studied the XOR-lift of the Forrelation problem and obtained new separations between quantum and randomized communication protocols. In more detail, they considered the partial function $\mathrm{Forr}\circ\mathrm{XOR}\colon\{\pm 1\}^{n}\times\{\pm 1\}^{n}\to\{\pm 1\}$ (overloading notation: technically, $\mathrm{Forr}\circ\mathrm{XOR}$ is the XOR-lift of the partial Boolean function which on input $x$ outputs $1$ if $\mathrm{Forr}(x)$ is large and $-1$ if $\mathrm{Forr}(x)$ is small) defined as

\mathrm{Forr}\circ\mathrm{XOR}(x,y)=\begin{cases}1&\mathrm{Forr}(x\odot y)\geq\frac{1}{200\ln(n/2)},\\ -1&\mathrm{Forr}(x\odot y)\leq\frac{1}{400\ln(n/2)},\end{cases}

and showed that if Alice and Bob use a randomized communication protocol, then they must communicate at least $\widetilde{\Omega}(n^{1/4})$ bits to compute $\mathrm{Forr}\circ\mathrm{XOR}$. In contrast, it can be solved by two entangled parties in the quantum simultaneous message passing model with a $\mathrm{polylog}(n)$-qubit communication protocol; additionally, the parties can be implemented with efficient quantum circuits.

The lower bound in [27] was obtained from a level-two Fourier growth bound (higher levels are not needed) on the XOR-fibers of classical communication protocols. Our level-two bound strengthens their bound and immediately gives an improved communication lower bound.

Theorem 1.7.

The randomized communication complexity of $\mathrm{Forr}\circ\mathrm{XOR}$ is $\widetilde{\Omega}(n^{1/3})$.

Theorem 1.7 above gives a $\mathrm{polylog}(n)$ versus $\widetilde{\Omega}(n^{1/3})$ separation between the above quantum communication model and the randomized two-party communication model, improving upon the $\mathrm{polylog}(n)$ versus $\widetilde{\Omega}(n^{1/4})$ separation from [27]. We emphasize that our separations are for players with efficient quantum running time, where the only prior separation was shown by the aforementioned work [27]. Such efficiency features can also benefit real-world implementations to demonstrate quantum advantage in experiments; for instance, one such proposal was introduced recently by Aaronson, Buhrman, and Kretschmer [3]. Without the efficiency assumption, a better $\mathrm{polylog}(n)$ versus $\widetilde{\Omega}(\sqrt{n})$ separation is known [21] (see [27, Section 1.1] for a more detailed comparison). Optimal Fourier growth bounds of $d\cdot\mathrm{polylog}(n)$ for level two, which we state later in Conjecture 1.8, would also imply such a separation with the XOR-lift of Forrelation.

Lifting $k$-Fold Forrelation with XOR.

$k$-Fold Forrelation [1] is a generalization of the Forrelation problem and was originally conjectured to be a candidate that exhibits a maximal separation between quantum and classical query complexity. In a recent work, [9] showed that the randomized query complexity of $k$-Fold Forrelation is $\widetilde{\Omega}(n^{1-1/k})$, confirming this conjecture, and a similar separation was proven in [57] for variants of $k$-Fold Forrelation. These separations, together with lifting theorems with the inner product gadget [12], imply an $O(k\log(n))$ vs $\widetilde{\Omega}(n^{1-1/k})$ separation between two-party quantum and classical communication complexity, where additionally, the number of rounds in the two-party quantum protocol is $2\cdot\lceil k/2\rceil$ (we remark that for $k=2$, this is exactly the XOR-lift of the Forrelation problem and can even be computed in the quantum simultaneous model, as shown in [27]).

Replacing the inner product gadget with the XOR gadget above would yield an improved quantum-classical communication separation where the gadget is simpler and the number of rounds required by the quantum protocol to achieve the same quantitative separation is reduced by half. Bansal and Sinha [9] showed that for any computational model, small Fourier growth for the first $O(k^{2})$ levels implies hardness of $k$-Fold Forrelation in that particular model. Thus, in conjunction with their results, to prove the above XOR lifting result for the $k$-Fold Forrelation problem, it suffices to prove the following Fourier growth bounds for XOR-fibers.

Conjecture 1.8.

Let $\mathcal{C}\colon\{\pm 1\}^{n}\times\{\pm 1\}^{n}\to\{\pm 1\}$ be a deterministic communication protocol with at most $d$ bits of communication. Let $h$ be its XOR-fiber as in Definition 1.1. Then for all $k\in\mathbb{N}$, we have that $L_{1,k}(h)\leq(\sqrt{d}\cdot\mathrm{poly}(k,\log(n)))^{k}$.

Note that these bounds are consistent with the Fourier growth of parity decision trees (or protocols that only send parities) as shown in [30].

We prove the above conjecture for the case $k=1$ and make progress on the case $k=2$. While our techniques can be extended to higher levels in a straightforward manner, the bounds obtained are farther from the conjectured ones. Thus, we defer dealing with higher levels to future work, as we believe one first needs to prove the optimal bound for level $k=2$.

In the next subsection, we give another motivation to study the above conjecture by showing a connection to lifting theorems for constant-sized gadgets.

1.2.3 General Gadgets and Fourier Growth from Lifting

Our main results are Fourier growth bounds for XOR-fibers, which correspond to XOR-lifts of functions. To complement this, we show that similar bounds hold for general lifted functions.

Let $g\colon\Sigma\times\Sigma\to\{\pm 1\}$ be a gadget and $\mathcal{C}\colon\Sigma^{n}\times\Sigma^{n}\to\{\pm 1\}$ be a communication protocol. Define the $g$-fiber of $\mathcal{C}$, denoted by $\mathcal{C}_{\downarrow g}\colon\{\pm 1\}^{n}\to[-1,1]$, as

\mathcal{C}_{\downarrow g}(z)=\operatorname*{\mathbb{E}}\left[\mathcal{C}(\bm{x},\bm{y})\,\middle|\,g(\bm{x}_{i},\bm{y}_{i})=z_{i},~\forall i\right],

where $\bm{x}$ and $\bm{y}$ are uniform over $\Sigma^{n}$. We use $L_{1,k}(g,d)$ to denote the best upper bound on the level-$k$ Fourier growth of the $g$-fibers of protocols with at most $d$ bits of communication. Using this notation, the XOR-fiber of $\mathcal{C}$ is simply $\mathcal{C}_{\downarrow\mathrm{XOR}}$, and our main results Theorems 1.2 and 1.3 can be rephrased as

L_{1,1}(\mathrm{XOR},d)\leq O\left(\sqrt{d}\right)\quad\text{and}\quad L_{1,2}(\mathrm{XOR},d)\leq O\left(d^{3/2}\log^{3}(n)\right).

In Section 7, we relate $L_{1,k}(g,d)$ to $L_{1,k}(\mathrm{XOR},d)$; the main takeaway is that, for the purpose of Fourier growth bounds, all constant-sized gadgets are equivalent.

Theorem 1.9 (Informal, see Theorem 7.5 and Theorem 7.6).

Let $g\colon\Sigma\times\Sigma\to\{\pm 1\}$ be a “balanced” gadget. Then

|\Sigma|^{-k}\cdot L_{1,k}(\mathrm{XOR},d)\leq L_{1,k}(g,d)\leq|\Sigma|^{k}\cdot L_{1,k}(\mathrm{XOR},d).

Theorem 1.9 also suggests a different approach towards Conjecture 1.8: it suffices to establish a tight Fourier growth bound for $g$-fibers for some constant-sized (in fact, polylogarithmic-sized suffices) gadget $g$, and then apply the reduction. The benefit of switching to a different gadget is that we can perhaps first prove a lifting theorem, and then appeal to the known Fourier growth bounds of (randomized) decision trees [64, 57]. See Subsection 8.1 for details.

As mentioned earlier, lifting theorems show how to simulate communication protocols of cost $d$ for lifted functions with decision trees of depth at most $O(d)$ (see, e.g., [26]). A problem at the frontier of this fruitful line of work has been establishing lifting theorems for decision trees with constant-sized gadgets. Note that the XOR gadget itself cannot admit such a generic lifting result: indeed, the parity function serves as a counterexample, since its XOR-lift has a 2-bit protocol while the parity function itself requires decision trees of depth $n$. Nevertheless, it is plausible that some larger gadget works, which would suffice for our purposes (in terms of the separations between quantum and classical communication, even restricted lifting results for the specific outer function being the Forrelation function would suffice). On the other hand, for lifting from parity decision trees, we do know an XOR-lifting theorem [31]. However, it only holds for deterministic communication protocols and has a sextic blowup in the cost.

Thus, one can see Conjecture 1.8 either as a further motivation for establishing lifting results for decision trees with constant-sized gadgets, or as a necessary milestone before proving such lifting results.

1.2.4 Pseudorandomness for Communication Protocols

We say $G\colon\{\pm 1\}^{\ell}\to\{\pm 1\}^{n}\times\{\pm 1\}^{n}$ is a pseudorandom generator (PRG) for a (randomized) communication protocol $\mathcal{C}\colon\{\pm 1\}^{n}\times\{\pm 1\}^{n}\to[-1,1]$ with error $\varepsilon$ and seed length $\ell$ if

\left|\operatorname*{\mathbb{E}}_{\bm{x},\bm{y}\sim\nu}[\mathcal{C}(\bm{x},\bm{y})]-\operatorname*{\mathbb{E}}_{\bm{r}\sim\{\pm 1\}^{\ell}}[\mathcal{C}(G(\bm{r}))]\right|\leq\varepsilon.

[32] showed that for the class of protocols sending at most $d$ communication bits, there exists an explicit PRG of error $2^{-d}$ and seed length $n+O(d)$ based on expander graphs. Note that the overhead of $n$ is inevitable even if the protocol sends only one bit, since that bit can depend arbitrarily on Alice's or Bob's input.

Combining Conjecture 1.8 with the PRG construction from [14, Theorem 4.5], we would obtain a completely different explicit PRG for this class with error $\varepsilon$ and seed length $n+d\cdot\mathrm{polylog}(n/\varepsilon)$.

Paper Organization.

An overview of our proofs is given in Section 2. In Section 3 we define necessary notation and recall useful inequalities. Section 4 explains a way to associate the Fourier growth with a martingale process. The proof of the level-one bound (Theorem 1.2) is given in Section 5, and the level-two bound (Theorem 1.3) in Section 6. The Fourier growth reductions between general gadgets are presented in Section 7. Future directions are discussed in Section 8. Missing proofs can be found in the appendix.

2 Proof Overview

We first briefly outline the proof strategy, which consists of three main components:

  • First, we show that the level-one bound can be characterized as the expected absolute value of a martingale defined as follows: Consider the random walk induced on the protocol tree when Alice and Bob are given inputs $\bm{x}$ and $\bm{y}$ uniformly from $\{\pm 1\}^{n}$. Let $\bm{X}^{(t)}\times\bm{Y}^{(t)}$ be the rectangle associated with the random walk at time $t$. The martingale process tracks the inner product $\left\langle\mu(\bm{X}^{(t)}),\mu(\bm{Y}^{(t)})\right\rangle$ where $\mu(\bm{X}^{(t)})=\operatorname*{\mathbb{E}}\left[\bm{x}\,\middle|\,\bm{x}\in\bm{X}^{(t)}\right]$ and $\mu(\bm{Y}^{(t)})=\operatorname*{\mathbb{E}}\left[\bm{y}\,\middle|\,\bm{y}\in\bm{Y}^{(t)}\right]$ are Alice's and Bob's centers of mass.

  • Second, to bound the value of the martingale, it is necessary to ensure that neither $\bm{X}^{(t)}$ nor $\bm{Y}^{(t)}$ becomes excessively elongated in any direction during the protocol execution. To measure the length of $\bm{X}^{(t)}$ in a particular direction $\theta\in\mathbb{S}^{n-1}$, we calculate $\mathbb{V}\mathrm{ar}\left[\left\langle\bm{x},\theta\right\rangle\,\middle|\,\bm{x}\in\bm{X}^{(t)}\right]$, i.e., the variance of a uniformly random $\bm{x}\in\bm{X}^{(t)}$ in the direction $\theta$. If the set is not elongated in any direction, this can be thought of as a spectral notion of almost pairwise independence. Such a notion also generalizes to almost $k$-wise independence by considering higher moments.

    To achieve the property that the sets are not elongated, one of the main novel ideas in our paper is to modify the original protocol to a new one that incorporates additional cleanup steps where the parties communicate real values $\left\langle\bm{x},\theta\right\rangle$. Through these communication steps, the sets $\bm{X}^{(t)}$ and $\bm{Y}^{(t)}$ are recursively divided into affine slices along problematic directions.

  • Last, one needs to show that the number of cleanup steps is small in order to bound the value of the martingale for the new protocol. This is the most involved part of our proof and requires considerable effort because the cleanup steps are real-valued and adaptively depend on the entire history, including the previous real values communicated.

The strategy outlined above also generalizes to level-two Fourier growth by considering higher moments and sending values of quadratic forms in the inputs. We also remark that since we view the sets $\bm{X}^{(t)}$ and $\bm{Y}^{(t)}$ above as embedded in $\mathbb{R}^{n}$ and allow the protocol to send real values, it is more natural for us to work in Gaussian space by doing a standard transformation. The rotational invariance of the Gaussian space also seems to be essential for us to obtain the optimal level-one bound without losing additional polylogarithmic factors.

We now elaborate on the above components in detail and also highlight the differences between the level-one and level-two settings. For conciseness, in the following overview we use $f\lesssim g$ to denote $f=O(g)$ and $f\gtrsim g$ to denote $f=\Omega(g)$, where $O$ and $\Omega$ only hide absolute constants.

2.1 Level-One Fourier Growth

The level-one Fourier growth of the XOR-fiber $h$ is given by

L_{1,1}(h)=\sum_{i=1}^{n}\left|\widehat{h}(\{i\})\right|=\sum_{i=1}^{n}\left|\operatorname*{\mathbb{E}}_{\bm{z}\sim\nu}[h(\bm{z})\bm{z}_{i}]\right|=\sum_{i=1}^{n}\left|\operatorname*{\mathbb{E}}_{\bm{x},\bm{y}\sim\nu}[\mathcal{C}(\bm{x},\bm{y})\bm{x}_{i}\bm{y}_{i}]\right|.

To bound the above, it suffices to bound $\sum_{i=1}^{n}\eta_{i}\cdot\operatorname*{\mathbb{E}}[\mathcal{C}(\bm{x},\bm{y})\bm{x}_{i}\bm{y}_{i}]$ for any sign vector $\eta\in\{\pm 1\}^{n}$. Here for simplicity we assume $\eta_{i}\equiv 1$ and that the probability of reaching every leaf is $\approx 2^{-d}$.
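
The reduction to sign vectors is the standard identity (for any reals $a_{1},\ldots,a_{n}$):

\sum_{i=1}^{n}|a_{i}|=\max_{\eta\in\{\pm 1\}^{n}}\sum_{i=1}^{n}\eta_{i}a_{i},\qquad\text{applied here with }a_{i}=\operatorname*{\mathbb{E}}[\mathcal{C}(\bm{x},\bm{y})\bm{x}_{i}\bm{y}_{i}],

where the maximum is attained at $\eta_{i}=\mathrm{sgn}(a_{i})$.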

A Martingale Perspective.

To evaluate the quantity $\sum_{i=1}^{n}\operatorname*{\mathbb{E}}[\mathcal{C}(\bm{x},\bm{y})\bm{x}_{i}\bm{y}_{i}]$, consider a random leaf $\bm{\ell}$ of the protocol and let $\bm{X}_{\bm{\ell}}\times\bm{Y}_{\bm{\ell}}$ be the corresponding rectangle. Since the leaf determines the answer of the protocol, denoted by $\mathcal{C}(\bm{\ell})$, the quantity above equals

\sum_{i=1}^{n}\operatorname*{\mathbb{E}}_{\bm{\ell}}\left[\mathcal{C}(\bm{\ell})\cdot\operatorname*{\mathbb{E}}[\bm{x}_{i}\mid\bm{x}\in\bm{X}_{\bm{\ell}}]\cdot\operatorname*{\mathbb{E}}[\bm{y}_{i}\mid\bm{y}\in\bm{Y}_{\bm{\ell}}]\right]=\operatorname*{\mathbb{E}}_{\bm{\ell}}[\mathcal{C}(\bm{\ell})\cdot\left\langle\mu(\bm{X}_{\bm{\ell}}),\mu(\bm{Y}_{\bm{\ell}})\right\rangle]\leq\operatorname*{\mathbb{E}}_{\bm{\ell}}[|\left\langle\mu(\bm{X}_{\bm{\ell}}),\mu(\bm{Y}_{\bm{\ell}})\right\rangle|],

where $\mu(\bm{X}_{\bm{\ell}})=\operatorname*{\mathbb{E}}\left[\bm{x}\,\middle|\,\bm{x}\in\bm{X}_{\bm{\ell}}\right]$ and $\mu(\bm{Y}_{\bm{\ell}})=\operatorname*{\mathbb{E}}\left[\bm{y}\,\middle|\,\bm{y}\in\bm{Y}_{\bm{\ell}}\right]$ are the centers of mass of the rectangle. Our goal is to bound the magnitude of the random variable $\bm{z}=\left\langle\mu(\bm{X}_{\bm{\ell}}),\mu(\bm{Y}_{\bm{\ell}})\right\rangle$.

We shall show that $\operatorname*{\mathbb{E}}_{\bm{\ell}}[|\bm{z}|]\lesssim\sqrt{d}$. Note that $|\bm{z}|$ can be as large as $d$ in the worst case — for instance, if the first $d$ coordinates of $\bm{X}_{\bm{\ell}}$ and $\bm{Y}_{\bm{\ell}}$ are fixed to the same value — thus we cannot argue for each leaf separately.

To analyze it for a random leaf, we first characterize the above as a martingale process using the tree structure of the protocol. The martingale process is defined as $\left(\bm{z}^{(t)}\right)_{t}$ where $\bm{z}^{(t)}:=\left\langle\mu(\bm{X}^{(t)}),\mu(\bm{Y}^{(t)})\right\rangle$ tracks the inner product between the centers of mass $\mu(\bm{X}^{(t)})$ and $\mu(\bm{Y}^{(t)})$ of the current rectangle $\bm{X}^{(t)}\times\bm{Y}^{(t)}$ at step $t$. Denote the martingale differences by $\Delta\bm{z}^{(t+1)}=\bm{z}^{(t+1)}-\bm{z}^{(t)}$ and note that if in the $t^{\text{th}}$ step Alice sends a message, then

\Delta\bm{z}^{(t+1)}=\left\langle\Delta\mu(\bm{X}^{(t+1)}),\mu(\bm{Y}^{(t+1)})\right\rangle,

where $\Delta\mu(\bm{X}^{(t+1)})=\mu(\bm{X}^{(t+1)})-\mu(\bm{X}^{(t)})$ is the change in Alice's center of mass. A similar expression holds if Bob sends a message. Then it suffices to bound the expected quadratic variation (see Section 3) since

\left(\operatorname*{\mathbb{E}}\left[\left|\bm{z}^{(d)}\right|\right]\right)^{2}\leq\operatorname*{\mathbb{E}}\left[\left(\bm{z}^{(d)}\right)^{2}\right]=\operatorname*{\mathbb{E}}\left[\sum_{t=0}^{d-1}\left(\Delta\bm{z}^{(t+1)}\right)^{2}\right], (2.1)

where the equality holds due to the martingale property: $\operatorname*{\mathbb{E}}\left[\Delta\bm{z}^{(t+1)}\,\middle|\,\bm{z}^{(1)},\ldots,\bm{z}^{(t)}\right]=0$.
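
To spell out the equality in Equation 2.1: since $\bm{z}^{(0)}=\left\langle\mu(\{\pm 1\}^{n}),\mu(\{\pm 1\}^{n})\right\rangle=0$, telescoping gives $\bm{z}^{(d)}=\sum_{t=0}^{d-1}\Delta\bm{z}^{(t+1)}$, and expanding the square, the cross terms vanish:

\operatorname*{\mathbb{E}}\left[\left(\bm{z}^{(d)}\right)^{2}\right]=\sum_{s,t}\operatorname*{\mathbb{E}}\left[\Delta\bm{z}^{(s+1)}\Delta\bm{z}^{(t+1)}\right]=\sum_{t=0}^{d-1}\operatorname*{\mathbb{E}}\left[\left(\Delta\bm{z}^{(t+1)}\right)^{2}\right],

since for $s<t$, $\operatorname*{\mathbb{E}}\left[\Delta\bm{z}^{(s+1)}\Delta\bm{z}^{(t+1)}\right]=\operatorname*{\mathbb{E}}\left[\Delta\bm{z}^{(s+1)}\cdot\operatorname*{\mathbb{E}}\left[\Delta\bm{z}^{(t+1)}\,\middle|\,\bm{z}^{(1)},\ldots,\bm{z}^{(t)}\right]\right]=0$.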

To obtain the desired bound, we need to bound the expected quadratic variation by $O(d)$. Note that it could be the case that a single $\Delta\bm{z}^{(t+1)}$ scales like $\sqrt{d}$. For instance, if Bob first announces his first $d$ coordinates, $y_{1},\ldots,y_{d}$, and then Alice sends the majority of $x_{1}\cdot y_{1},\ldots,x_{d}\cdot y_{d}$, then in the last step Alice's center of mass $\mu(\bm{X}^{(t+1)})$ changes by $\approx 1/\sqrt{d}$ in each of the first $d$ coordinates, and the inner product with Bob's center of mass changes by $\approx\sqrt{d}$ in a single step.

Such cases make it difficult to directly control the individual step sizes of the martingale, and we will only be able to obtain an amortized bound. It turns out, as we explain later, that such an amortized bound on the martingale can be obtained if Alice's and Bob's sets are not elongated in any direction. Therefore, we will transform the original protocol into a clean protocol by introducing real communication steps that slice along the elongated directions. For this, it will be convenient to work in Gaussian space, which also turns out to be essential in proving the optimal $O(\sqrt{d})$ bound.

Protocols in Gaussian Space.

A communication protocol in Gaussian space takes as inputs $\bm{x},\bm{y}\in\mathbb{R}^{n}$ where $\bm{x},\bm{y}$ are independently sampled from the Gaussian distribution $\gamma_{n}$. One can embed the original Boolean protocol in the Gaussian space by running the protocol on the uniformly distributed Boolean inputs $\mathrm{sgn}(\bm{x})$ and $\mathrm{sgn}(\bm{y})$, where $\mathrm{sgn}(\cdot)$ takes the sign of each coordinate. Note that any node of the protocol tree in the Gaussian space corresponds to a rectangle $X\times Y$ where $X,Y\subseteq\mathbb{R}^{n}$. Abusing notation and defining the Gaussian centers of mass as $\mu(X)=\operatorname*{\mathbb{E}}_{\bm{x}\sim\gamma_{n}}\left[\bm{x}\,\middle|\,\bm{x}\in X\right]$ and $\mu(Y)=\operatorname*{\mathbb{E}}_{\bm{y}\sim\gamma_{n}}\left[\bm{y}\,\middle|\,\bm{y}\in Y\right]$, one can associate the same martingale $(\bm{z}^{(t)})_{t}$ with the protocol in the Gaussian space:

\bm{z}^{(t)}=\left\langle\mu(\bm{X}^{(t)}),\mu(\bm{Y}^{(t)})\right\rangle.

It turns out that bounding the quadratic variation of this martingale suffices to give a bound on $L_{1,1}(h)$ (see Section 4), so we will stick to the Gaussian setting. We now describe the ideas behind the cleanup process so that the step sizes can be controlled more easily.

Cleanup with Real Communication.

The cleanup protocol runs the original protocol interspersed with some cleanup steps where Alice and Bob send real values. As outlined before, one of the goals of these cleanup steps is to ensure that the sets are not elongated in any direction, in order to control the martingale steps. In more detail, recall that we want to control

\operatorname*{\mathbb{E}}\left[(\Delta\bm{z}^{(t+1)})^{2}\,\middle|\,\bm{z}^{(1)},\ldots,\bm{z}^{(t)}\right]=\operatorname*{\mathbb{E}}\left[\left\langle\Delta\mu(\bm{X}^{(t+1)}),\mu(\bm{Y}^{(t+1)})\right\rangle^{2}\,\middle|\,\bm{z}^{(1)},\ldots,\bm{z}^{(t)}\right]

in the $t^{\text{th}}$ step where Alice speaks. There are two key underlying ideas for the cleanup steps:

  • Gram-Schmidt Orthogonalization: At each round, if the current rectangle is $\bm{X}\times\bm{Y}$, before Alice sends the actual message, she sends the inner product $\left\langle x,\mu(\bm{Y})\right\rangle$ between her input and Bob's current center of mass $\mu(\bm{Y})$. This partitions Alice's set $\bm{X}$ into affine slices orthogonal to Bob's current center of mass $\mu(\bm{Y})$. Thus the change in Alice's center of mass in later rounds is orthogonal to $\mu(\bm{Y})$, since it only takes place inside the affine slice.

    Recall that the martingale $\bm{z}^{(t)}$ is the inner product of Alice's and Bob's centers of mass, and Bob's center of mass does not change when Alice speaks. The original communication steps now do not contribute to the martingale; only the steps where the inner products are revealed do. In particular, if $t_{\mathrm{prev}}<t$ are two consecutive times where Alice revealed the inner product, then the change in Alice's center of mass is orthogonal to the change in Bob's center of mass between times $t_{\mathrm{prev}}$ and $t$. Thus, conditioned on the rectangle $\bm{X}^{(t)}\times\bm{Y}^{(t)}$ fixed by the messages until time $t$, we have, by Jensen's inequality,

    \operatorname*{\mathbb{E}}\left[(\Delta\bm{z}^{(t+1)})^{2}\,\middle|\,\bm{X}^{(t)},\bm{Y}^{(t)}\right]=\operatorname*{\mathbb{E}}\left[\left\langle\Delta\mu(\bm{X}^{(t+1)}),\mu(\bm{Y}^{(t)})-\mu(\bm{Y}^{(t_{\mathrm{prev}})})\right\rangle^{2}\,\middle|\,\bm{X}^{(t)},\bm{Y}^{(t)}\right]
    \leq\operatorname*{\mathbb{E}}\left[\left\langle\bm{x}-\mu(\bm{X}^{(t)}),\mu(\bm{Y}^{(t)})-\mu(\bm{Y}^{(t_{\mathrm{prev}})})\right\rangle^{2}\,\middle|\,\bm{X}^{(t)},\bm{Y}^{(t)}\right]. (2.2)

    Note that the quantity on the right-hand side above is the second moment of an expression of the form $\left\langle\bm{x}-\operatorname*{\mathbb{E}}[\bm{x}],v\right\rangle$; in other words, it is the variance of the random vector $\bm{x}$ along the direction $v$. To maintain a bound on this quantity, we introduce the notion of “not being elongated in any direction”.

  • Not elongated in any direction: We define the following notion to capture the fact that the random vector is not elongated in any direction: we say that a mean-zero random vector $\bm{x}^{\prime}=\bm{x}-\operatorname*{\mathbb{E}}[\bm{x}]$ in $\mathbb{R}^{n}$ is $\lambda$-pairwise clean if for every $v\in\mathbb{R}^{n}$,

    \operatorname*{\mathbb{E}}\left[\left\langle\bm{x}^{\prime},v\right\rangle^{2}\right]\leq\lambda\cdot\|v\|^{2}, (2.3)

    or equivalently, the operator norm of the covariance matrix $\operatorname*{\mathbb{E}}[\bm{x}^{\prime}\bm{x}^{\prime\top}]$ is at most $\lambda$. This can be considered a spectral notion of almost pairwise independence, since the pairwise moments are well-behaved in every direction.

If the input distribution conditioned on Alice's set $\bm{X}^{(t)}$ is $O(1)$-pairwise clean, we say that her set is pairwise clean. Based on the above ideas, after Alice sends the initial message, if her set is not yet clean, she partitions it recursively by taking affine slices and transmitting real values. More precisely, while there is a direction $\theta\in\mathbb{S}^{n-1}$ violating Equation 2.3, Alice does a cleanup of her set by sending the inner product $\left\langle x,\theta\right\rangle$. This direction is known to Bob as it only depends on Alice's current space. In addition, this cleanup does not contribute to the martingale in the future because the inner product along this direction is now fixed.

The resulting protocol is pairwise clean in the sense that at each step, Alice's current set is pairwise clean (we remark that the sets are only clean at intermediate steps where a cleanup phase ends, but we show that, because of the orthogonalization step, the other steps do not contribute to the value of the martingale). Similar arguments work for Bob.
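
The following toy sketch illustrates the cleanup loop on a finite point set (our simplifying assumptions: the set is represented by a sample of points, and the real value $\left\langle x,\theta\right\rangle$ is discretized into a binary split at the median rather than revealed exactly):

```python
import numpy as np

def cleanup(points, lam):
    """Recursively slice a point set until it is lam-pairwise clean.

    While the top eigenvalue of the centered empirical covariance exceeds
    lam, split the set along the most elongated direction theta (a coarse
    binary stand-in for Alice revealing the real value <x, theta>).
    Returns a list of clean subsets.
    """
    centered = points - points.mean(axis=0)
    cov = centered.T @ centered / len(points)
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    if eigvals[-1] <= lam:                   # lam-pairwise clean: stop
        return [points]
    theta = eigvecs[:, -1]                   # most elongated direction
    proj = points @ theta
    med = np.median(proj)
    left, right = proj <= med, proj > med
    if not right.any():                      # guard against degenerate ties
        left, right = proj < med, proj >= med
    return cleanup(points[left], lam) + cleanup(points[right], lam)

# Toy example: a Gaussian cloud stretched 10x along one axis gets sliced
# until every piece has covariance operator norm at most lam.
rng = np.random.default_rng(1)
X = rng.normal(size=(4000, 3)) * np.array([10.0, 1.0, 1.0])
parts = cleanup(X, lam=2.0)
print(len(parts))  # several slices, each one 2.0-pairwise clean
```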

Let $\bm{d}$ be the total number of communication rounds including all the cleanup steps. Then, by the above argument, and denoting by $(\bm{\tau}_{m})_{m}$ and $(\bm{\tau}^{\prime}_{m})_{m}$ the indices of the inner product steps for Alice and Bob, we can ultimately bound

\operatorname*{\mathbb{E}}\left[(\bm{z}^{(\bm{d})})^{2}\right]\lesssim\operatorname*{\mathbb{E}}\left[\sum_{m}\left\|\mu(\bm{X}^{(\bm{\tau}_{m})})-\mu(\bm{X}^{(\bm{\tau}_{m-1})})\right\|^{2}+\left\|\mu(\bm{Y}^{(\bm{\tau}^{\prime}_{m})})-\mu(\bm{Y}^{(\bm{\tau}^{\prime}_{m-1})})\right\|^{2}\right]=\operatorname*{\mathbb{E}}\left[\left\|\mu(\bm{X}^{(\bm{d})})\right\|^{2}+\left\|\mu(\bm{Y}^{(\bm{d})})\right\|^{2}\right], (2.4)

where again, the last equality follows from the martingale property. The right-hand side above can be bounded by the expected number of communication rounds $\operatorname*{\mathbb{E}}[\bm{d}]$ using the level-one inequality (see Theorem 3.1) — this inequality bounds the Euclidean norm of the center of mass of a set in terms of its Gaussian measure.
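
For orientation, the level-one inequality referred to here takes the following standard form (our paraphrase; see Theorem 3.1 in the paper for the precise statement): for a measurable $X\subseteq\mathbb{R}^{n}$ with Gaussian center of mass $\mu(X)$,

\left\|\mu(X)\right\|^{2}\lesssim\log\frac{1}{\gamma_{n}(X)},

so, for instance, a rectangle reached with probability $\approx 2^{-d}$ satisfies $\left\|\mu(\bm{X}^{(\bm{d})})\right\|^{2}+\left\|\mu(\bm{Y}^{(\bm{d})})\right\|^{2}\lesssim d$.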

Expected Number of Cleanup Steps.

Since the original communication only consists of $d$ rounds, the analysis essentially reduces to bounding the expected number of cleanup steps by $O(d)$, which is technically the most involved part of the proof.

It is implicit in the previous works on the Gap-Hamming Problem [18, 67] that large sets are not elongated in many directions: if a set $X\subseteq\mathbb{R}^{n}$ has Gaussian measure $\approx 2^{-d}$, then for a random vector $\bm{x}$ sampled from $X$, there are at most $m\lesssim d$ orthogonal directions $\theta_{1},\ldots,\theta_{m}$ such that $\operatorname*{\mathbb{E}}[\left\langle\bm{x}^{\prime},\theta_{i}\right\rangle^{2}]\gtrsim 1$ where $\bm{x}^{\prime}=\bm{x}-\operatorname*{\mathbb{E}}[\bm{x}]$. This is a consequence of the fact that the expectation of $\bm{q}=\sum_{i=1}^{m}\left\langle\bm{x}^{\prime},\theta_{i}\right\rangle^{2}$ can be bounded by $O(d)$ provided that $X$ has measure $\approx 2^{-d}$.

The above argument suggests that maybe we can clean up the set $X$ along these $O(d)$ bad orthogonal directions. However, this is not enough for our purposes: after taking an affine slice, the set may not be clean in a direction where it was clean before. Moreover, since the parties take turns to send messages and clean up, the bad directions will also depend on the entire history of the protocol, including the previous real and Boolean communication. This adaptivity makes the analysis more delicate, and to prove the optimal bound we crucially utilize the rotational symmetry of the Gaussian distribution. Indeed, the fact that a large set is not elongated in many directions holds even when we replace the Gaussian distribution with the uniform distribution on $\{\pm 1\}^{n}$, but it is unclear how to obtain an optimal level-one bound using the latter.

In the final protocol, since the parties only send Boolean bits and linear forms of their inputs, conditioned on the history of the martingale, one can still say what the distribution of the next cleanup value $\left\langle\bm{x},\theta\right\rangle$ looks like, as the Gaussian distribution is well-behaved under linear projections. We then use martingale concentration and stopping time arguments to show that the expected number of cleanup steps is indeed bounded by $O(d)$ even if the cleanup is adaptive.

We make two remarks in passing. First, we can also prove the optimal level-one bound using information-theoretic ideas, but they do not seem to generalize to the level-two setting; we therefore adopt the alternative concentration-based approach here, which is similar in spirit. Second, it is possible from our proof approach (in particular, the approach for level two described next) to derive a weaker upper bound of $\sqrt{d}\cdot\mathrm{polylog}(n)$ for level one while working directly with the uniform distribution on the hypercube.

2.2 Level-Two Fourier Growth

We start by noting that the level-two Fourier growth of the XOR-fiber $h$ is given by

L_{1,2}(h)=\sum_{i\neq j}\left|\widehat{h}(\{i,j\})\right|=\sum_{i\neq j}\left|\operatorname*{\mathbb{E}}_{\bm{z}\sim\nu}[h(\bm{z})\bm{z}_{i}\bm{z}_{j}]\right|=\sum_{i\neq j}\left|\operatorname*{\mathbb{E}}_{\bm{x},\bm{y}\sim\nu}[\mathcal{C}(\bm{x},\bm{y})\bm{x}_{i}\bm{x}_{j}\bm{y}_{i}\bm{y}_{j}]\right|.

To bound the above, it suffices to bound $\sum_{i\neq j}\eta_{ij}\cdot\operatorname*{\mathbb{E}}[\mathcal{C}(\bm{x},\bm{y})\bm{x}_{i}\bm{x}_{j}\bm{y}_{i}\bm{y}_{j}]$ for any symmetric sign matrix $(\eta_{ij})$. For this proof overview, we assume for simplicity that $\eta_{ij}\equiv 1$.

Martingales and Gram-Schmidt Orthogonalization.

Similar to the case of level one, the level-two Fourier growth also has a martingale formulation. In particular, let $\bm{X}^{(t)}$ and $\bm{Y}^{(t)}$ be Alice's and Bob's sets at time $t$ as before, and define $\sigma(\bm{X}^{(t)})=\operatorname*{\mathbb{E}}\left[\bm{x}\overset{\bullet}{\otimes}\bm{x}\,\middle|\,\bm{x}\in\bm{X}^{(t)}\right]$ and $\sigma(\bm{Y}^{(t)})=\operatorname*{\mathbb{E}}\left[\bm{y}\overset{\bullet}{\otimes}\bm{y}\,\middle|\,\bm{y}\in\bm{Y}^{(t)}\right]$ to be the $n\times n$ matrices that represent the level-two centers of mass of the two sets. Here $\bm{x}\overset{\bullet}{\otimes}\bm{y}$ denotes the tensor product $\bm{x}\otimes\bm{y}$ with the diagonal zeroed out (an $n\times n$ matrix; we will also interchangeably view $n\times n$ matrices as vectors of length $n^{2}$). To bound the level-two Fourier growth, it suffices to bound the expected quadratic variation of the martingale $\left(\bm{z}^{(t)}\right)_{t}$ defined by taking the inner product of the level-two centers of mass, $\bm{z}^{(t)}:=\left\langle\sigma(\bm{X}^{(t)}),\sigma(\bm{Y}^{(t)})\right\rangle$, where $\left\langle\cdot,\cdot\right\rangle$ is the inner product of two matrices viewed as vectors.

To this end, we again move to Gaussian space, where the inputs are $x,y\in\mathbb{R}^{n}$, and transform the protocol into a clean protocol. First, we need an analog of the Gram-Schmidt orthogonalization step — this is achieved in a natural way by Alice sending the inner product $\left\langle x\overset{\bullet}{\otimes}x,\sigma(\bm{Y}^{(t)})\right\rangle$ of $x\overset{\bullet}{\otimes}x$ with Bob's level-two center of mass, and Bob doing the same. Note that Alice and Bob are now exchanging values of quadratic polynomials in their inputs. Thus, to control the step sizes, we now need to control the second moments of quadratic forms, which naturally motivates the following spectral analogue of $4$-wise independence.

4-wise Cleanup with Quadratic Forms.

We say a random vector $\bm{x}$ is $4$-wise clean with parameter $\lambda$ if the operator norm of the $n^{2}\times n^{2}$ covariance matrix

\operatorname*{\mathbb{E}}\left[\left(\bm{x}\overset{\bullet}{\otimes}\bm{x}-\operatorname*{\mathbb{E}}\left[\bm{x}\overset{\bullet}{\otimes}\bm{x}\right]\right)\left(\bm{x}\overset{\bullet}{\otimes}\bm{x}-\operatorname*{\mathbb{E}}\left[\bm{x}\overset{\bullet}{\otimes}\bm{x}\right]\right)^{\top}\right]

is at most $\lambda$, where we view $\bm{x}\overset{\bullet}{\otimes}\bm{x}-\mathbb{E}[\bm{x}\overset{\bullet}{\otimes}\bm{x}]$ as an $n^{2}$-dimensional vector. This is equivalent to saying that for any quadratic form $\left\langle M,\bm{x}\overset{\bullet}{\otimes}\bm{x}\right\rangle$,

\mathbb{E}\left[\left\langle M,\bm{x}\overset{\bullet}{\otimes}\bm{x}-\mathbb{E}\left[\bm{x}\overset{\bullet}{\otimes}\bm{x}\right]\right\rangle^{2}\right]\leq\lambda\left\|M\right\|^{2}, (2.5)

where $\left\|M\right\|$ denotes the Euclidean norm of $M$ when viewed as a vector. Thus, this allows us to control the second moment of any quadratic polynomial (and, in particular, fourth moments of linear functions). We note that one can generalize the above spectral notion to $k$-wise independence in the natural way by looking at the covariance matrix of the tensor $\bm{x}^{\overset{\bullet}{\otimes}k}$.
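For concreteness, the following minimal Python sketch (ours, not from the paper; all function names are illustrative) estimates the smallest valid cleanness parameter in Equation 2.5 for a conditioned set, by computing the top eigenvalue of the empirical covariance matrix of $\mathrm{vec}(\bm{x}\overset{\bullet}{\otimes}\bm{x})$:

```python
# Illustrative sketch: estimating the 4-wise cleanness parameter (Eq. 2.5)
# of a set X from samples of the Gaussian conditioned on X.
import numpy as np

def dotted_tensor(x):
    """x (dotted-otimes) x: the outer product with the diagonal zeroed out."""
    M = np.outer(x, x)
    np.fill_diagonal(M, 0.0)
    return M

def fourwise_cleanness(samples):
    """Top eigenvalue of the covariance of vec(x (dotted-otimes) x).

    This is the smallest lambda satisfying Equation 2.5 over all unit-norm
    matrices M (with the empirical measure in place of the Gaussian).
    """
    vecs = np.array([dotted_tensor(x).ravel() for x in samples])
    vecs -= vecs.mean(axis=0)
    cov = vecs.T @ vecs / len(vecs)
    return np.linalg.eigvalsh(cov).max()

rng = np.random.default_rng(0)
n = 5
print(f"R^n itself: {fourwise_cleanness(rng.standard_normal((20000, n))):.2f}")
pts = rng.standard_normal((200000, n))
print(f"halfspace x_1 >= 2: {fourwise_cleanness(pts[pts[:, 0] >= 2.0]):.2f}")
```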

We say a set is $4$-wise clean with parameter $\lambda$ if Equation 2.5 holds for all $M$ with zero diagonal. (The zero-diagonal requirement is for analysis purposes only and can be assumed without loss of generality, since $\bm{x}\overset{\bullet}{\otimes}\bm{x}$ has zero diagonal anyway.) With this notion, one can define the cleanup in a manner analogous to the level-one cleanup: while there exists some $M\in\mathbb{R}^{n\times n}$ violating Equation 2.5, Alice sends the quadratic form $\left\langle x\overset{\bullet}{\otimes}x,M\right\rangle$ to Bob, until her set is $4$-wise clean with parameter $\lambda$.

Cleanup Analysis via Hanson-Wright Inequalities.

The crux of the proof is to bound the number of cleanup steps, which, together with an analysis similar to the level-one case, gives us the desired bound. We show that $m\lesssim d$ cleanup steps suffice in expectation to make the sets $4$-wise clean for $\lambda\leq d\cdot\mathrm{polylog}(n)$. Analogous to Equation 2.1 and Subsection 2.1, this gives a bound of $d^{3}\cdot\mathrm{polylog}(n)$ on the expected quadratic variation and implies $L_{1,2}(h)\leq d^{3/2}\cdot\mathrm{polylog}(n)$.

Since the parties now send values of quadratic forms, the analysis here is significantly more involved than in the level-one case, even after moving to the Gaussian setting, where one could previously use the fact that the Gaussian distribution behaves nicely under linear projections. We rely on a powerful generalization of the Hanson-Wright inequality to a Banach-space-valued setting due to Adamczak, Latała, and Meller [5]. This inequality gives a tail bound for sums of squares of quadratic forms: in particular, if $M_{1},\ldots,M_{m}$ are matrices with zero diagonal which form an orthonormal set when viewed as $n^{2}$-dimensional vectors, then the random variable $\bm{q}=\sum_{i=1}^{m}\left\langle\bm{x}\overset{\bullet}{\otimes}\bm{x},M_{i}\right\rangle^{2}$ satisfies $\operatorname{\mathbf{Pr}}_{\bm{x}\sim\gamma_{n}}[\bm{q}\geq t]\leq e^{-\Omega(\sqrt{t})}$ for any $t\gtrsim m^{2}$ (see Theorem 3.3 for a precise statement). We remark that this tail bound relies on the orthogonality of the quadratic forms and is much sharper than, for example, the bound obtained from hypercontractivity or other standard polynomial concentration inequalities.

In our setting, the matrices are chosen adaptively. In addition, the parties are sending quadratic forms in their inputs, and the distribution of the next $\left\langle\bm{x}\overset{\bullet}{\otimes}\bm{x},M\right\rangle$ conditioned on the history is hard to determine, unlike in the level-one case. To handle this, we replace the real communication with Boolean communication of finite precision $\pm 1/\mathrm{poly}(n)$. This means that whenever Alice wants to perform a cleanup $\left\langle\bm{x}\overset{\bullet}{\otimes}\bm{x},M\right\rangle$ for some $M$ known to both parties, she sends only $O(\log(n))$ bits. On the one hand, this modification is similar enough to the cleanup protocol with real messages that most of the argument carries through. On the other hand, the protocol is now completely discrete, which allows us to condition on any particular transcript.

For intuition, fix a transcript of $L=d+O(m\log(n))$ bits which has gone through $m$ cleanups. Typically, this transcript should capture $\approx 2^{-L}$ of the probability mass. More crucially, the matrices $M_{1},\ldots,M_{m}$ for the cleanups are also fixed along the transcript, and one can apply the aforementioned Hanson-Wright inequality to $\bm{q}=\sum_{i=1}^{m}\left\langle\bm{x}\overset{\bullet}{\otimes}\bm{x},M_{i}\right\rangle^{2}$. Combining the two facts, we can apply the non-adaptive tail bound above and then condition on obtaining such a typical transcript. This shows $\mathbb{E}[\bm{q}]\leq d^{2}\cdot\mathrm{polylog}(n)$. However, each quadratic form comes from a violation of Equation 2.5 and contributes at least $\lambda$ to $\bm{q}$ in expectation. This implies that $\mathbb{E}[\bm{q}]\geq\lambda\cdot m$, and by taking $\lambda=d\cdot\mathrm{polylog}(n)$, we derive that the number of cleanup steps satisfies $m\lesssim d$. This shows that the level-two Fourier growth is $O((m+d)\cdot\sqrt{\lambda})=d^{3/2}\cdot\mathrm{polylog}(n)$, completing the proof.

Note that if we could take $\lambda=\mathrm{polylog}(n)$ while keeping the same number of cleanup steps $m=d\cdot\mathrm{polylog}(n)$, then we would obtain an optimal level-two bound of $d\cdot\mathrm{polylog}(n)$. However, it is not clear how to show this with the current approach. In Subsection 8.2, we identify examples showing the tightness of our current analysis and also discuss potential ways to circumvent the obstacles within.

We remark that by replacing the Hanson-Wright inequality with its higher-degree variants and performing level-$k$ cleanups, one can analyze the level-$k$ Fourier growth in a similar way. However, since the first two levels already suffice for our applications and we believe that our level-two bound can be further improved, we do not generalize to higher levels here.

3 Preliminaries

Notation.

Throughout, $\log(\cdot)$ and $\ln(\cdot)$ denote logarithms with base $2$ and $e$ respectively. We use $\mathbb{N}=\{0,1,2,\ldots\}$ to denote the set of natural numbers including $0$. For $n\in\mathbb{N}$, we write $[n]$ to denote the set $\{1,2,\ldots,n\}$. We use the standard $O(\cdot),\Omega(\cdot),\Theta(\cdot)$ notation, and emphasize that in this paper they only hide universal constants that do not depend on any parameter.

We write $\odot$ to denote the entrywise product for vectors and matrices: in particular, for any $x,y\in\mathbb{R}^{n}$, we define $x\odot y\in\mathbb{R}^{n}$ to be the vector with $(x\odot y)_{i}=x_{i}y_{i}$ for $i\in[n]$, and similarly, for any $X,Y\in\mathbb{R}^{n\times m}$, we define $X\odot Y\in\mathbb{R}^{n\times m}$ to be the matrix with $(X\odot Y)_{ij}=X_{ij}Y_{ij}$ for $i\in[n],j\in[m]$. We use $\overset{\bullet}{\otimes}$ to denote a tensor with zeros on the diagonal, i.e., for any $x\in\mathbb{R}^{n}$, $x\overset{\bullet}{\otimes}x$ is the $n\times n$ matrix with $(x\overset{\bullet}{\otimes}x)_{ij}=x_{i}x_{j}$ if $i\neq j$ and zero if $i=j$.
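As a quick illustration of this notation (ours, not part of the paper), the two products can be implemented directly:

```python
# Illustrative implementations of the entrywise product and the
# zero-diagonal tensor defined above.
import numpy as np

def odot(A, B):
    """Entrywise product of two vectors or matrices of the same shape."""
    return A * B

def dotted_tensor(x):
    """(x (dotted-otimes) x)_{ij} = x_i x_j for i != j, and 0 if i = j."""
    M = np.outer(x, x)
    np.fill_diagonal(M, 0.0)
    return M

x = np.array([1.0, 2.0, 3.0])
print(odot(x, x))        # [1. 4. 9.]
print(dotted_tensor(x))  # off-diagonal entries x_i * x_j, zero diagonal
```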

For a vector $x\in\mathbb{R}^{n}$, we use $\left\|x\right\|$ to denote its Euclidean norm. Similarly, for a matrix $X\in\mathbb{R}^{n\times n}$, we use $\left\|X\right\|$ to denote its Euclidean norm viewing $X$ as an $n^{2}$-dimensional vector. For nonzero $x\in\mathbb{R}^{n}$ or $X\in\mathbb{R}^{n\times n}$, we define $\mathrm{unit}(x)\in\mathbb{R}^{n}$ and $\mathrm{unit}(X)\in\mathbb{R}^{n\times n}$ as the unit vectors along the directions $x$ and $X$ respectively: $\mathrm{unit}(x)=x/\left\|x\right\|$ and $\mathrm{unit}(X)=X/\left\|X\right\|$. We write $\mathbb{S}^{n-1}$ for the unit sphere in $\mathbb{R}^{n}$, and write $\mathbb{S}^{n\times n-1}$ for the unit sphere in $\mathbb{R}^{n\times n}$ where additionally the diagonal entries of the $n\times n$ matrices are zero. We use $\left\langle x,y\right\rangle$ to denote the inner product between vectors $x,y\in\mathbb{R}^{n}$, and $\left\langle X,Y\right\rangle$ to denote the inner product between matrices $X,Y\in\mathbb{R}^{n\times n}$ viewed as $n^{2}$-dimensional vectors.

Probability.

A probability space is a triple $(\Omega,\mathcal{F},\xi)$ where $\Omega$ is the sample space, $\mathcal{F}$ is a $\sigma$-algebra which describes the measurable sets (or events) in the probability space, and $\xi$ is a probability measure. We use $\bm{x}\sim\xi$ to denote a random sample distributed according to $\xi$ and $\mathbb{E}_{\bm{x}\sim\xi}[f(\bm{x})]$ to denote the expectation of a function $f$ under the measure $\xi$. For any event $S\in\mathcal{F}$, we use $\xi(S)$ to denote the measure of $S$ under $\xi$. We say an event $S$ holds almost surely if $\xi(S)=1$, i.e., the exceptions to the event have measure zero. For a measurable event $\mathcal{E}\in\mathcal{F}$, we write $\mathcal{F}\cap\{\mathcal{E}\}$ to denote the intersection of the $\sigma$-algebra $\mathcal{F}$ and the $\sigma$-algebra generated by $\mathcal{E}$.

We use $\nu_{n}$ to denote the uniform probability measure over $\{\pm 1\}^{n}$ and $\gamma_{n}$ to denote the $n$-dimensional standard Gaussian measure on $\mathbb{R}^{n}$. We say a random variable $\bm{x}\in\mathbb{R}^{n}$ is a standard Gaussian in $\mathbb{R}^{n}$ if its probability distribution is $\gamma_{n}$. We drop the subscript when the dimension is clear from context. We will also need lower-dimensional Gaussian measures: given a linear subspace $V$ of dimension $k$, there is a $k$-dimensional standard Gaussian measure on it, which we denote by $\gamma_{V}$. For any measurable subset $S\subseteq\mathbb{R}^{n}$, we define its ambient space to be the smallest affine subspace $V+t$ that contains it, where $V$ is a linear subspace of $\mathbb{R}^{n}$ and $t\in\mathbb{R}^{n}$. The relative Gaussian measure of $S$, denoted by $\gamma_{\mathrm{rel}}(S)$, is then defined to be the Gaussian measure of the set $S-t$ under $\gamma_{V}$.

Martingales.

Given a sequence of real-valued random variables $\bm{x}_{1},\bm{x}_{2},\ldots,\bm{x}_{n}$ in a probability space $(\Omega,\mathcal{F},\xi)$ and a function $f(\bm{x}_{1},\ldots,\bm{x}_{n})$ satisfying $\mathbb{E}\left[|f(\bm{x}_{1},\ldots,\bm{x}_{n})|\right]<\infty$, the sequence of random variables $\bm{z}^{(t)}=\mathbb{E}\left[f(\bm{x}_{1},\ldots,\bm{x}_{n})\,\middle|\,\mathcal{F}^{(t-1)}\right]$ is called the Doob martingale, where $\mathcal{F}^{(t-1)}$ is the $\sigma$-algebra generated by $\bm{x}_{1},\ldots,\bm{x}_{t-1}$, which should be viewed as a record of the randomness of the process until time $t-1$. The sequence $(\mathcal{F}^{(t)})_{t}$ is called a filtration. A sequence of random variables $(\bm{z}^{(t)})_{t}$ is called predictable (or adapted) with respect to $(\mathcal{F}^{(t)})_{t}$ if $\bm{z}^{(t)}$ is $\mathcal{F}^{(t)}$-measurable for every $t$, meaning that it is determined by the randomness in $\mathcal{F}^{(t)}$.

A discrete random variable $\bm{\tau}\in\mathbb{N}$ is called a stopping time with respect to the filtration $(\mathcal{F}^{(t)})_{t}$ if the event $\{\bm{\tau}=t\}\in\mathcal{F}^{(t)}$ for all $t\in\mathbb{N}$; in words, whether the event $\bm{\tau}=t$ occurs is determined by the history of the process until time $t$. All stopping times considered in this paper will be finite. The $\sigma$-algebra $\mathcal{F}^{(\bm{\tau})}$, which contains all events that imply the stopping condition, is defined as the set of all events $\mathcal{E}$ such that $\mathcal{E}\cap\{\bm{\tau}=t\}\in\mathcal{F}^{(t)}$ for all $t\in\mathbb{N}$. We also note that if one takes an increasing sequence of stopping times $(\bm{\tau}_{m})_{m}$, then the process defined by $(\bm{z}^{(\bm{\tau}_{m})})_{m}$ is also a martingale.

Let $\Delta\bm{z}^{(t)}:=\bm{z}^{(t)}-\bm{z}^{(t-1)}$ be the martingale differences. Note that $\mathbb{E}\left[\Delta\bm{z}^{(t)}\,\middle|\,\mathcal{F}^{(t-1)}\right]=0$ and thus

\mathbb{E}\left[\left(\bm{z}^{(n)}-\bm{z}^{(0)}\right)^{2}\right]=\mathbb{E}\left[\left(\sum_{t=1}^{n}\Delta\bm{z}^{(t)}\right)^{2}\right]=\mathbb{E}\left[\sum_{t=1}^{n}\left(\Delta\bm{z}^{(t)}\right)^{2}\right], (3.1)

where the cross terms disappear upon taking expectations. In other words, the martingale differences are orthogonal under taking expectations. The right-hand side above is the expected quadratic variation of the martingale $(\bm{z}^{(t)})_{t}$. If the sequence $(\bm{z}^{(t)})_{t}$ is vector-valued (resp., matrix-valued) and satisfies $\mathbb{E}\left[\Delta\bm{z}^{(t)}\,\middle|\,\mathcal{F}^{(t-1)}\right]=0$, where $0$ is the zero vector (resp., matrix), then we say it is a vector-valued (resp., matrix-valued) martingale with respect to $(\mathcal{F}^{(t)})_{t}$. Since each coordinate of a vector- or matrix-valued martingale is itself a real-valued martingale, vector- or matrix-valued martingale differences are also orthogonal under Euclidean norms:

\mathbb{E}\left[\left\|\bm{z}^{(n)}-\bm{z}^{(0)}\right\|^{2}\right]=\mathbb{E}\left[\left\|\sum_{t=1}^{n}\Delta\bm{z}^{(t)}\right\|^{2}\right]=\mathbb{E}\left[\sum_{t=1}^{n}\left\|\Delta\bm{z}^{(t)}\right\|^{2}\right]. (3.2)
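The following small simulation (ours, purely illustrative) checks the identity in Equation 3.1 for a Doob martingale with $\bm{z}^{(0)}=\mathbb{E}[f]=0$: the second moment of the final value matches the expected quadratic variation.

```python
# Illustrative check of Equation 3.1 for a Doob martingale.
import numpy as np

rng = np.random.default_rng(1)
n, trials = 8, 200000
x = rng.choice([-1.0, 1.0], size=(trials, n))
f = x.prod(axis=1) + x.sum(axis=1)   # f(x_1,...,x_n); E[f] = 0

# z^(t) = E[f | x_1..x_t] has a closed form here: the conditional
# expectation of the product is 0 unless t = n, and the conditional
# expectation of the sum is the partial sum.
z = np.zeros((trials, n + 1))        # z^(0) = E[f] = 0
for t in range(1, n + 1):
    z[:, t] = x[:, :t].sum(axis=1) + (x.prod(axis=1) if t == n else 0.0)

dz = np.diff(z, axis=1)              # martingale differences
lhs = (z[:, -1] ** 2).mean()         # E[(z^(n))^2], since z^(0) = 0
rhs = (dz ** 2).sum(axis=1).mean()   # expected quadratic variation
print(f"E[(z^(n))^2] ~ {lhs:.3f}, sum of squared increments ~ {rhs:.3f}")
```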
Useful Inequalities.

We will use the well-known level-$k$ inequality [62, 34] (see, e.g., [44, Level-$k$ Inequalities]). A statement in the Gaussian setting can be found in, e.g., [20, Lemma 2.2]. We remark that we will only use the cases $k=1$ and $k=2$ here, which we state below. (Our Theorem 3.1 is slightly different from the references, which additionally require $\mu\leq 1/e$. By Parseval's identity, the left-hand side is always at most one, so we use a slightly worse bound on the right-hand side to allow for the whole range of $\mu$.)

Below we write $\mathbf{1}_{A}$ for the indicator function of a set and $x_{S}=\prod_{i\in S}x_{i}$ for a monomial.

Theorem 3.1 (Level-$k$ Inequality).

Let $k\in\{1,2\}$. Assume $A\subseteq\mathbb{R}^{n}$ is measurable and let $\mu:=\mathbb{E}_{\bm{x}\sim\gamma}[\mathbf{1}_{A}(\bm{x})]$. Then, we have

\sum_{|S|=k}\left(\mathbb{E}_{\bm{x}\sim\gamma}\left[\mathbf{1}_{A}(\bm{x})\,\bm{x}_{S}\right]\right)^{2}\leq 2e^{2}\mu^{2}\cdot\ln^{k}(e/\mu).

In particular, if $\mu$ is nonzero, dividing both sides by $\mu^{2}$ gives the following more convenient form for $k\in\{1,2\}$:

\sum_{|S|=k}\left(\mathbb{E}_{\bm{x}\sim\gamma}\left[\bm{x}_{S}\,\middle|\,\bm{x}\in A\right]\right)^{2}\leq 2e^{2}\cdot\ln^{k}(e/\mu).
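As a sanity check (ours, not from the paper), for a halfspace $A=\{x:x_{1}\geq c\}$ both sides of Theorem 3.1 with $k=1$ have closed forms: the only nonzero level-one coefficient is $\mathbb{E}[\mathbf{1}_{A}(\bm{x})\bm{x}_{1}]=e^{-c^{2}/2}/\sqrt{2\pi}$, and $\mu=\tfrac{1}{2}\mathrm{erfc}(c/\sqrt{2})$.

```python
# Illustrative numerical check of the level-one inequality for a halfspace.
import math

def level_one_check(c):
    mu = 0.5 * math.erfc(c / math.sqrt(2))                 # Gaussian measure of A
    coeff = math.exp(-c * c / 2) / math.sqrt(2 * math.pi)  # E[1_A(x) x_1]
    lhs = coeff ** 2                                       # only S = {1} is nonzero
    rhs = 2 * math.e ** 2 * mu ** 2 * math.log(math.e / mu)
    return lhs, rhs

for c in [0.0, 1.0, 2.0, 4.0]:
    lhs, rhs = level_one_check(c)
    print(f"c={c}: sum of squared level-1 coeffs {lhs:.3e} <= bound {rhs:.3e}")
```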

We also make use of the following standard concentration inequality for sums of squares of independent standard Gaussians (see [66]).

Fact 3.2.

Let $m\in\mathbb{N}$ be arbitrary. For any $r\geq 2m$, we have $\operatorname{\mathbf{Pr}}_{\bm{x}\sim\gamma_{m}}\left[\sum_{i=1}^{m}\bm{x}_{i}^{2}\geq r\right]\leq e^{-r/4}$.

We also need a concentration inequality for sums of squares of orthogonal quadratic forms in Gaussian random variables. In particular, we prove the following inequality, which follows from a generalization of the Hanson-Wright inequality to a Banach-space-valued setting [5, Theorem 6]. Since we only need a special case that is easier to prove, we include a self-contained proof using the Gaussian isoperimetric inequality in Appendix B, following [5, Proposition 23].

Theorem 3.3.

Let $m\in\mathbb{N}$ be arbitrary. Let $M_{1},\ldots,M_{m}$ be $n\times n$ real matrices where each $M_{i}$ has zero diagonal, $\left\langle M_{i},M_{i}\right\rangle=1$, and $\left\langle M_{i},M_{j}\right\rangle=0$ for $i\neq j$. Then for any $r\geq 98m$, we have

\operatorname{\mathbf{Pr}}_{\bm{x}\sim\gamma_{n}}\left[\sum_{i=1}^{m}\left\langle\bm{x}\overset{\bullet}{\otimes}\bm{x},M_{i}\right\rangle^{2}\geq r\right]\leq\exp\left\{-\Omega\left(\frac{r}{m+\sqrt{r}}\right)\right\}.

We remark that the tail bound above holds more generally for sub-Gaussian random variables $\bm{x}$ (see [5]).
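A quick Monte Carlo experiment (ours, illustrative only; it probes moderate $r$ rather than the theorem's regime $r\geq 98m$, where the empirical tail is essentially zero at feasible sample sizes) visualizes the fast decay of the tail of $\sum_{i}\langle\bm{x}\overset{\bullet}{\otimes}\bm{x},M_{i}\rangle^{2}$ for a small orthonormal family of zero-diagonal matrices.

```python
# Illustrative Monte Carlo for Theorem 3.3: tail of a sum of squares of
# orthonormal zero-diagonal quadratic forms in a standard Gaussian vector.
import numpy as np

rng = np.random.default_rng(4)
n, m, trials = 12, 3, 200000

# Gram-Schmidt a family of m random zero-diagonal matrices.
mats = []
for _ in range(m):
    M = rng.standard_normal((n, n))
    np.fill_diagonal(M, 0.0)
    for P in mats:
        M = M - np.sum(M * P) * P   # project away previous directions
    mats.append(M / np.linalg.norm(M))

x = rng.standard_normal((trials, n))
# <x (dotted-otimes) x, M> = x^T M x, since diag(M) = 0.
q = sum(np.einsum("ti,ij,tj->t", x, M, x) ** 2 for M in mats)

for r in [20, 40, 80]:
    print(f"Pr[q >= {r}] ~ {(q >= r).mean():.1e}")
```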

4 Fourier Growth via Martingales in Gaussian Space

In this section, we reduce the question of bounding the level-one and level-two Fourier growth to that of bounding the expected quadratic variation of certain martingales. To analyze these martingales, and to prove the optimal bound in the level-one setting, it seems crucial to work in the Gaussian setting, so we first give a generic transformation from the Boolean to the Gaussian setting. We additionally allow protocols that communicate real numbers, which makes the analysis easier.

4.1 Communication Protocols in Gaussian Space

Let $\mathcal{C}:\{\pm 1\}^{n}\times\{\pm 1\}^{n}\to\{\pm 1\}$ be a communication protocol with total communication $d$ and let $h$ be its XOR-fiber as defined in Definition 1.1.

We embed the protocol in Gaussian space by allowing Alice's and Bob's inputs, $x$ and $y$ respectively, to be real vectors in $\mathbb{R}^{n}$ — the new protocol $\widetilde{\mathcal{C}}$ runs the original protocol $\mathcal{C}$ on the Boolean inputs $\mathrm{sgn}(x)$ and $\mathrm{sgn}(y)$, where $\mathrm{sgn}(v)=(\mathrm{sgn}(v_{1}),\ldots,\mathrm{sgn}(v_{n}))$ denotes the sign function applied pointwise to the coordinates of a vector $v\in\mathbb{R}^{n}$. The behavior of the protocol $\widetilde{\mathcal{C}}$ can be defined arbitrarily if any coordinate of $\mathrm{sgn}(x)$ or $\mathrm{sgn}(y)$ is zero, since such points have measure zero under the standard $n$-dimensional Gaussian measure $\gamma_{n}$.

This translation from the Boolean hypercube to Gaussian space preserves the measure of sets: for any subset $S\subseteq\{\pm 1\}^{n}$, we have $\nu_{n}(S)=\gamma_{n}\left(\left\{x\in\mathbb{R}^{n}\,\middle|\,\mathrm{sgn}(x)\in S\right\}\right)$, where $\nu_{n}$ is the uniform measure over $\{\pm 1\}^{n}$. Moreover, up to a normalizing factor, the Fourier coefficients of $h$ can also be computed from Gaussian inputs. In particular, writing $x_{S}=\prod_{i\in S}x_{i}$ for a subset $S\subseteq[n]$, we have the following fact.

Fact 4.1.

For all $S\subseteq[n]$, we have $\mathbb{E}_{\bm{z}\sim\nu_{n}}\left[h(\bm{z})\bm{z}_{S}\right]=(\pi/2)^{|S|}\,\mathbb{E}_{\bm{x},\bm{y}\sim\gamma_{n}}\left[\widetilde{\mathcal{C}}(\bm{x},\bm{y})\,\bm{x}_{S}\bm{y}_{S}\right]$.

Proof.

Note that for $\bm{x}\sim\gamma_{n}$, the random variable $\mathrm{sgn}(\bm{x})$ is distributed according to $\nu_{n}$. Thus, by the definition of the XOR-fiber $h$ and the protocol $\widetilde{\mathcal{C}}$, we have

\begin{aligned}
\mathbb{E}_{\bm{z}\sim\nu_{n}}\left[h(\bm{z})\bm{z}_{S}\right]&=\mathbb{E}_{\bm{x},\bm{y}\sim\gamma_{n}}\left[\mathcal{C}(\mathrm{sgn}(\bm{x}),\mathrm{sgn}(\bm{y}))\cdot\prod_{i\in S}\mathrm{sgn}(\bm{x}_{i})\cdot\mathrm{sgn}(\bm{y}_{i})\right]\\
&=(\pi/2)^{|S|}\,\mathbb{E}_{\bm{x},\bm{y}\sim\gamma_{n}}\left[\mathcal{C}(\mathrm{sgn}(\bm{x}),\mathrm{sgn}(\bm{y}))\cdot\prod_{i\in S}\bm{x}_{i}\cdot\bm{y}_{i}\right]\\
&=(\pi/2)^{|S|}\,\mathbb{E}_{\bm{x},\bm{y}\sim\gamma_{n}}\left[\widetilde{\mathcal{C}}(\bm{x},\bm{y})\,\bm{x}_{S}\bm{y}_{S}\right],
\end{aligned}

where the second line follows since the expected value of a standard Gaussian in $\mathbb{R}$, conditioned on its sign being fixed to $\eta$, is $\sqrt{2/\pi}\cdot\eta$ by the following calculation:

\mathbb{E}_{\bm{x}_{i}\sim\gamma}\left[\bm{x}_{i}\,\middle|\,\mathrm{sgn}(\bm{x}_{i})=\eta\right]=\eta\cdot\int_{0}^{\infty}\sqrt{\frac{2}{\pi}}\cdot r\cdot e^{-r^{2}/2}\,\mathrm{d}r=\sqrt{\frac{2}{\pi}}\cdot\eta. ∎
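To illustrate Fact 4.1 (our toy example, not from the paper), consider the one-bit protocol $\mathcal{C}(x,y)=x_{1}y_{1}$, whose XOR-fiber is $h(z)=z_{1}$, so the Fourier coefficient at $S=\{1\}$ equals $1$; a Monte Carlo estimate of $(\pi/2)\,\mathbb{E}[\widetilde{\mathcal{C}}(\bm{x},\bm{y})\bm{x}_{1}\bm{y}_{1}]$ recovers it.

```python
# Illustrative Monte Carlo check of Fact 4.1 for C(x, y) = x_1 * y_1.
import numpy as np

rng = np.random.default_rng(5)
trials = 2_000_000
x = rng.standard_normal(trials)
y = rng.standard_normal(trials)

# C~(x, y) = sgn(x_1) * sgn(y_1); estimate (pi/2) * E[C~ * x_1 * y_1].
est = (np.pi / 2) * np.mean(np.sign(x) * np.sign(y) * x * y)
print(f"(pi/2) E[C~ x1 y1] ~ {est:.3f}  (exact Fourier coefficient: 1)")
# Sanity: E[x * sgn(x)] = E|x| = sqrt(2/pi) for a standard Gaussian.
print(f"E[|x|] ~ {np.mean(np.abs(x)):.4f}, sqrt(2/pi) = {np.sqrt(2/np.pi):.4f}")
```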
Remark 4.2.

We remark that instead of the Gaussian distribution above, one can work with any distribution where the coordinates are i.i.d. and symmetric around zero. In particular, if $\xi$ is a symmetric probability measure on the real line, and $\bm{x},\bm{y}$ are independently drawn vectors in $\mathbb{R}^{n}$ with each coordinate sampled i.i.d. from $\xi$, then $\mathbb{E}_{\bm{z}\sim\nu_{n}}\left[h(\bm{z})\bm{z}_{S}\right]=c_{\xi}^{|S|}\,\mathbb{E}_{\bm{x},\bm{y}\sim\xi^{\otimes n}}\left[\widetilde{\mathcal{C}}(\bm{x},\bm{y})\,\bm{x}_{S}\bm{y}_{S}\right]$ where $c_{\xi}=(\mathbb{E}_{\bm{x}_{i}\sim\xi}[|\bm{x}_{i}|])^{-2}$. In the level-two case, we will need to work with the truncated Gaussian distribution, where each coordinate is sampled independently from the one-dimensional standard Gaussian conditioned on lying in some interval $[-T,T]$ for $T=\Omega(1)$, in which case $c_{\xi}$ is upper bounded by a universal constant.

4.2 Generalized Communication Protocols

In the protocol $\widetilde{\mathcal{C}}$ defined above, Alice's and Bob's inputs $x$ and $y$ are real vectors in $\mathbb{R}^{n}$, but in each round they still exchange a single bit based on $\mathrm{sgn}(x)$ and $\mathrm{sgn}(y)$. In order to bound the Fourier growth, it will be more convenient to define a notion of generalized communication protocols where the parties are also allowed to send real numbers with arbitrary precision in each round. To define this formally, we place certain restrictions on the real communication: in a generalized communication protocol, in each round a player with input $z\in\mathbb{R}^{n}$ can send either

(i) a bit in $\{0,1\}$ which is purely a function of the Boolean input $\mathrm{sgn}(z)$ and the previous Boolean messages, or

(ii) a real number that is a measurable function of $z$ and the previous (real or Boolean) messages.

The depth of a generalized communication protocol is defined to be the maximum number of rounds of communication.

Note that a generalized protocol also generates a "protocol tree" where, if a real number is sent in some round, the "children" of that particular "node" are indexed by all possible values in $\mathbb{R}$. A "transcript" of the protocol can be defined in an analogous way. The set of inputs that reach a particular node of this generalized protocol tree still forms a rectangle $X\times Y$ where $X,Y\subseteq\mathbb{R}^{n}$. We say that a generalized protocol $\overline{\mathcal{C}}$ is equivalent to the protocol $\widetilde{\mathcal{C}}$ if $\overline{\mathcal{C}}(x,y)=\widetilde{\mathcal{C}}(x,y)$ for every $x,y\in\mathbb{R}^{n}$ except on a set of measure zero.

We will be interested in random walks on such generalized protocol trees when the inputs $\bm{x}$ and $\bm{y}$ are sampled from a product measure $\xi_{x}\times\xi_{y}$ on $\mathbb{R}^{n}\times\mathbb{R}^{n}$ and the parties send messages according to the protocol until they reach a "leaf". The random variables corresponding to the messages until any time $t$ generate a filtration $(\mathcal{F}^{(t)})_{t}$ — this filtration can be thought of as specifying a particular node of the generalized protocol tree at depth $t$ (equivalently, a partial transcript of the protocol up to time $t$) that was sampled by the process. Conditioned on any event in $\mathcal{F}^{(t)}$ (e.g., any realization of the transcript up to time $t$), almost surely the conditional probability measure on the inputs $\bm{x},\bm{y}$ is some product measure $\xi_{x}^{(t)}\times\xi_{y}^{(t)}$ supported on a rectangle $\bm{X}^{(t)}\times\bm{Y}^{(t)}$ where $\bm{X}^{(t)},\bm{Y}^{(t)}\subseteq\mathbb{R}^{n}$. We refer to the random variable $\bm{X}^{(t)}\times\bm{Y}^{(t)}$ as the current rectangle determined by $\mathcal{F}^{(t)}$. Since we will be working with product measures on the inputs $\bm{x},\bm{y}$, the reader can think of conditioning on the filtration $\mathcal{F}^{(t)}$ as essentially conditioning on the inputs being in the rectangle $\bm{X}^{(t)}\times\bm{Y}^{(t)}$, or equivalently on a partial transcript up to time $t$.

4.3 Fourier Growth via Martingales

We now relate the Fourier growth to the quadratic variation of a martingale. Towards this end, we first note that in light of Fact 4.1, the level-$k$ Fourier growth of the XOR-fiber $h$ of the original communication protocol is given by

\begin{aligned}
L_{1,k}(h)=\sum_{S\subseteq[n],\,|S|=k}\left|\mathbb{E}_{\bm{z}\sim\nu_{n}}[h(\bm{z})\bm{z}_{S}]\right|&=(\pi/2)^{k}\sum_{S\subseteq[n],\,|S|=k}\left|\mathbb{E}_{\bm{x},\bm{y}\sim\gamma_{n}}[\overline{\mathcal{C}}(\bm{x},\bm{y})\,\bm{x}_{S}\bm{y}_{S}]\right|\\
&=(\pi/2)^{k}\max_{(\eta_{S})_{|S|=k}}\sum_{S\subseteq[n],\,|S|=k}\eta_{S}\,\mathbb{E}_{\bm{x},\bm{y}\sim\gamma_{n}}\left[\overline{\mathcal{C}}(\bm{x},\bm{y})\,\bm{x}_{S}\bm{y}_{S}\right],
\end{aligned} (4.1)

where $\overline{\mathcal{C}}$ is any generalized protocol that is equivalent to $\widetilde{\mathcal{C}}$ and $\eta_{S}\in\{\pm 1\}$.

We now express the right-hand side above as an inner product. Let $\bm{\ell}$ be a random leaf of the generalized protocol tree of $\overline{\mathcal{C}}$ induced by taking $\bm{x},\bm{y}\sim\gamma_{n}$, and let $\bm{X}_{\bm{\ell}}\times\bm{Y}_{\bm{\ell}}$ be the corresponding rectangle in the generalized protocol tree. Then,

\begin{aligned}
\sum_{S\subseteq[n],\,|S|=k}\eta_{S}\,\mathbb{E}_{\bm{x},\bm{y}\sim\gamma_{n}}\left[\overline{\mathcal{C}}(\bm{x},\bm{y})\,\bm{x}_{S}\bm{y}_{S}\right]&=\mathbb{E}_{\bm{\ell}}\left[\mathbb{E}_{\bm{x},\bm{y}\sim\gamma}\left[\sum_{S\subseteq[n],\,|S|=k}\eta_{S}\cdot\overline{\mathcal{C}}(\bm{x},\bm{y})\,\bm{x}_{S}\bm{y}_{S}\,\middle|\,(\bm{x},\bm{y})\in\bm{X}_{\bm{\ell}}\times\bm{Y}_{\bm{\ell}}\right]\right]\\
&=\mathbb{E}_{\bm{\ell}}\left[\overline{\mathcal{C}}(\bm{\ell})\,\mathbb{E}_{\bm{x},\bm{y}\sim\gamma}\left[\sum_{S\subseteq[n],\,|S|=k}\eta_{S}\cdot\bm{x}_{S}\bm{y}_{S}\,\middle|\,(\bm{x},\bm{y})\in\bm{X}_{\bm{\ell}}\times\bm{Y}_{\bm{\ell}}\right]\right]\\
&\leq\mathbb{E}_{\bm{\ell}}\left[\,\left|\sum_{S\subseteq[n],\,|S|=k}\eta_{S}\,\mathbb{E}\left[\bm{x}_{S}\,\middle|\,\bm{x}\in\bm{X}_{\bm{\ell}}\right]\cdot\mathbb{E}\left[\bm{y}_{S}\,\middle|\,\bm{y}\in\bm{Y}_{\bm{\ell}}\right]\right|\,\right],
\end{aligned} (4.2)

where the second line follows since $\bm{\ell}$ is a leaf and hence determines the output, and the third line follows since $\bm{x}$ and $\bm{y}$ are independent conditioned on being in the rectangle $\bm{X}_{\bm{\ell}}\times\bm{Y}_{\bm{\ell}}$.

Thus, specializing Equations 4.1 and 4.2 to the level-one ($k=1$) and level-two ($k=2$) cases, we get that

\begin{aligned}
L_{1,1}(h)&\leq\frac{\pi}{2}\cdot\max_{\eta}\,\mathbb{E}_{\bm{\ell}}\left[\,\left|\sum_{i=1}^{n}\eta_{i}\cdot\mathbb{E}\left[\bm{x}_{i}\,\middle|\,\bm{x}\in\bm{X}_{\bm{\ell}}\right]\cdot\mathbb{E}\left[\bm{y}_{i}\,\middle|\,\bm{y}\in\bm{Y}_{\bm{\ell}}\right]\right|\,\right],\\
L_{1,2}(h)&\leq\frac{\pi^{2}}{4}\cdot\max_{\eta}\,\mathbb{E}_{\bm{\ell}}\left[\,\left|\sum_{i,j=1}^{n}\eta_{ij}\cdot\mathbb{E}\left[\bm{x}_{i}\bm{x}_{j}\,\middle|\,\bm{x}\in\bm{X}_{\bm{\ell}}\right]\cdot\mathbb{E}\left[\bm{y}_{i}\bm{y}_{j}\,\middle|\,\bm{y}\in\bm{Y}_{\bm{\ell}}\right]\right|\,\right],
\end{aligned}

where for $L_{1,1}$ we optimize over $\eta\in\{\pm 1\}^{n}$, and for $L_{1,2}$ we optimize over $\eta$ ranging over $n\times n$ symmetric matrices with zeros on the diagonal and $\pm 1$ entries elsewhere.

To make the above more compact, we define $\mu(X)\in\mathbb{R}^{n}$ and $\sigma(X)\in\mathbb{R}^{n\times n}$ to be, respectively, the level-one and level-two centers of mass of a set $X\subseteq\mathbb{R}^{n}$:

\mu(X)=\mathbb{E}_{\bm{x}\sim\gamma_{n}}\left[\bm{x}\,\middle|\,\bm{x}\in X\right]\quad\text{and}\quad\sigma(X)=\mathbb{E}_{\bm{x}\sim\gamma_{n}}\left[\bm{x}\overset{\bullet}{\otimes}\bm{x}\,\middle|\,\bm{x}\in X\right]. (4.3)

Then, upper bounding the constants in the above inequalities ($\pi/2$ and $\pi^{2}/4$) by $4$, we get

\begin{aligned}
L_{1,1}(h)&\leq 4\cdot\max_{\eta}\,\mathbb{E}_{\bm{\ell}}\left[\left|\left\langle\mu(\bm{X}_{\bm{\ell}}),\eta\odot\mu(\bm{Y}_{\bm{\ell}})\right\rangle\right|\right],\\
L_{1,2}(h)&\leq 4\cdot\max_{\eta}\,\mathbb{E}_{\bm{\ell}}\left[\left|\left\langle\sigma(\bm{X}_{\bm{\ell}}),\eta\odot\sigma(\bm{Y}_{\bm{\ell}})\right\rangle\right|\right],
\end{aligned} (4.4)

where $\eta$ is understood to be the same as before.

Moving forward, we fix an arbitrary $\eta$ for both cases $k\in\{1,2\}$ and define a martingale process $(\bm{z}^{(t)}_{k})_{t}$ that captures the right-hand side above. For this, we note that a generalized communication protocol, where Alice's and Bob's inputs are sampled from the Gaussian distribution, naturally induces a discrete-time random walk on the corresponding (generalized) protocol tree, where at time $t$ we are at a node of depth $t$ with corresponding rectangle $\bm{X}^{(t)}\times\bm{Y}^{(t)}$. Then, we have the following proposition.

Proposition 4.3.

$\mu(\bm{X}^{(t)})$ and $\mu(\bm{Y}^{(t)})$ are vector-valued martingales taking values in $\mathbb{R}^{n}$, and $\sigma(\bm{X}^{(t)})$ and $\sigma(\bm{Y}^{(t)})$ are matrix-valued martingales taking values in $\mathbb{R}^{n\times n}$.

Note that if Alice speaks in the $t^{\text{th}}$ round, then $\mu(\bm{Y}^{(t)})$ and $\sigma(\bm{Y}^{(t)})$ do not change, and similarly, if Bob speaks, then $\mu(\bm{X}^{(t)})$ and $\sigma(\bm{X}^{(t)})$ do not change. The above proposition implies that the real-valued processes

\bm{z}^{(t)}_{1}=\left\langle\mu(\bm{X}^{(t)}),\eta\odot\mu(\bm{Y}^{(t)})\right\rangle\quad\text{and}\quad\bm{z}^{(t)}_{2}=\left\langle\sigma(\bm{X}^{(t)}),\eta\odot\sigma(\bm{Y}^{(t)})\right\rangle, (4.5)

each form a Doob martingale with respect to the natural filtration induced by the random walk on the protocol tree. Note that taking a random walk on the tree until we hit a leaf generates the marginal distribution on $\bm{\ell}$ appearing in Equation 4.4. Let $\bm{d}$ be the stopping time at which this martingale hits a leaf and stops (i.e., the depth of the random leaf). Thus, by the orthogonality of the martingale differences $\Delta\bm{z}^{(t)}_{k}=\bm{z}^{(t)}_{k}-\bm{z}^{(t-1)}_{k}$ from Equation 3.1 (noting that $\bm{z}^{(0)}_{k}=0$, since the initial centers of mass are zero), for $k\in\{1,2\}$ one can upper bound the Fourier growth in terms of the expected quadratic variation of the above martingales:

Proposition 4.4.

For $k\in\{1,2\}$, $\frac{1}{4}\cdot L_{1,k}(h)\leq\max_{\eta}\sqrt{\mathbb{E}\left[\left(\bm{z}^{(\bm{d})}_{k}\right)^{2}\right]}=\max_{\eta}\sqrt{\mathbb{E}\left[\sum_{t=1}^{\bm{d}}\left(\Delta\bm{z}^{(t)}_{k}\right)^{2}\right]}$.

The martingale implicitly depends on $\eta$ as used in Equation 4.4, hence the maximum over $\eta$. Moreover, the martingale also depends on the underlying generalized communication protocol $\overline{\mathcal{C}}$. In the next two sections, we will show that after transforming the original communication protocol into a "clean" protocol, the expected quadratic variations of $(\bm{z}^{(t)}_{1})_{t}$ and $(\bm{z}^{(t)}_{2})_{t}$ are $O(d)$ and $O(d^{3})\cdot\mathrm{polylog}(n)$ respectively. This will then imply our main theorems.
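As a worked toy instance of Proposition 4.4 (ours, illustrative), take the two-round protocol where Alice sends $\mathrm{sgn}(x_{1})$ and Bob sends $\mathrm{sgn}(y_{1})$ and the output is their product, so that $h(z)=z_{1}$ and $L_{1,1}(h)=1$. Here $\bm{z}^{(1)}_{1}=0$ and $\bm{z}^{(2)}_{1}=\mathrm{sgn}(\bm{x}_{1})\mathrm{sgn}(\bm{y}_{1})\cdot(2/\pi)$ (taking $\eta$ to be all ones), and the bound can be checked numerically:

```python
# Illustrative check of Proposition 4.4 on a toy two-round protocol.
import numpy as np

rng = np.random.default_rng(6)
trials = 100000
x1 = rng.standard_normal(trials)
y1 = rng.standard_normal(trials)
s = np.sqrt(2 / np.pi)  # |E[x_1 | sgn(x_1)]| for a standard Gaussian

# After Alice's bit: mu(X) = sgn(x_1) * s * e_1 and mu(Y) = 0, so z^(1) = 0.
# After Bob's bit:   z^(2) = sgn(x_1) * sgn(y_1) * s^2.
z_final = np.sign(x1) * np.sign(y1) * s * s
qv = np.mean(z_final ** 2)  # expected quadratic variation (z^(1) = 0)
print(f"4 * sqrt(E[(z^(d))^2]) ~ {4 * np.sqrt(qv):.3f} >= L_{{1,1}}(h) = 1")
```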

Remark 4.5.

Note that Proposition 4.3 still holds even if the input distribution is not the Gaussian distribution but some other product probability measure on the inputs $\bm{x},\bm{y}$. This also implies that $\bm{z}^{(t)}_{k}$ for $k\in\{1,2\}$ is a martingale. In particular, for the level-two case, we will need to use a truncated Gaussian distribution. In light of Remark 4.2, Proposition 4.4 still suffices for us, with a different constant in place of $1/4$. We also remark that in the level-two case we shall need to truncate the real messages used in the protocol to a finite precision, so the generalized protocols for the level-two case only have Boolean communication. However, to obtain the optimal level-one bound, allowing generalized protocols that communicate real values seems to be crucial.

5 Level-One Fourier Growth

In this section, we give a proof of Theorem 1.2, namely that $L_{1,1}(h)=O(\sqrt{d})$. We start with a $d$-round communication protocol $\widetilde{\mathcal{C}}$ over Gaussian space as defined in Subsection 4.1. Given the discussion in the previous section and Proposition 4.4, our task ultimately reduces to bounding the expected quadratic variation of the martingale that results from the protocol $\overline{\mathcal{C}}$. For example, one can simply take $\overline{\mathcal{C}}=\widetilde{\mathcal{C}}$, but, as discussed in Section 2, the individual step sizes of this martingale can be quite large in the worst case, and it is not easy to leverage cancellations to bound the quadratic variation by $O(d)$.

So, we first define a generalized communication protocol $\overline{\mathcal{C}}$ that is equivalent to the original protocol $\widetilde{\mathcal{C}}$ but has additional "cleanup" rounds where Alice and Bob reveal certain linear forms of their inputs, so that their sets are pairwise clean in the sense described in the overview. These cleanup steps allow us to keep track of the quadratic variation more easily.

5.1 Pairwise Clean Protocols

To define a clean protocol, we first define the notion of a pairwise clean set. Let $X\subseteq\mathbb{R}^{n}$. We say that the set $X$ is pairwise clean in a direction $a\in\mathbb{S}^{n-1}$ with parameter $\lambda$ if

\mathbb{E}_{\bm{x}\sim\gamma}\left[\left\langle\bm{x}-\mu(X),a\right\rangle^{2}\,\middle|\,\bm{x}\in X\right]\leq\lambda, (5.1)

where we recall that $\mu(X)=\mathbb{E}_{\bm{x}\sim\gamma}\left[\bm{x}\,\middle|\,\bm{x}\in X\right]$ is the level-one center of mass of $X$.

The above condition says that for a random vector $\bm{x}$ sampled from $\gamma$ conditioned on being in $X$, the variance along the direction $a$ is bounded by $\lambda$. We say that the set $X$ is pairwise clean (with parameter $\lambda$) if it is clean in every direction $a\in\mathbb{S}^{n-1}$. Equivalently, the operator norm of the covariance matrix of the random vector $\bm{x}$ is bounded by $\lambda$.
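The following snippet (ours, illustrative) estimates this operator norm from samples; note that conditioning can make a set dirty, e.g., the union of two far-apart slabs has large variance along $e_{1}$:

```python
# Illustrative sketch of pairwise cleanness: the top eigenvalue of the
# conditional covariance matrix is the smallest lambda for which
# Equation 5.1 holds in every direction a.
import numpy as np

def pairwise_cleanness(X):
    """Top eigenvalue of the empirical covariance of the samples in X."""
    return np.linalg.eigvalsh(np.cov(X.T)).max()

rng = np.random.default_rng(7)
pts = rng.standard_normal((400000, 4))
print(f"all of R^n:          {pairwise_cleanness(pts):.2f}")   # ~ 1
print(f"halfspace x_1 >= 2:  {pairwise_cleanness(pts[pts[:, 0] >= 2]):.2f}")
print(f"two slabs |x_1| >= 2: {pairwise_cleanness(pts[np.abs(pts[:, 0]) >= 2]):.2f}")
```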

We call a generalized communication protocol pairwise clean with parameter $\lambda$ if at the start of each new "phase" of the protocol, the corresponding rectangle $X\times Y$ is such that both $X$ and $Y$ are pairwise clean. Starting from a communication protocol $\widetilde{\mathcal{C}}$ in Gaussian space, we transform it into a pairwise clean protocol $\overline{\mathcal{C}}$ by proceeding from top to bottom and adding certain Gram-Schmidt orthogonalization and cleanup steps.

In particular, consider an intermediate node in the protocol tree of $\widetilde{\mathcal{C}}$. Before Alice sends her bit as in the original protocol $\widetilde{\mathcal{C}}$, she first performs an orthogonalization step by revealing the inner product between her input and Bob's current level-one center of mass. After this, she sends her bit according to the original protocol, and afterwards she repeatedly cleans her current set $X$ by revealing $\left\langle x,a\right\rangle\in\mathbb{R}$ while $X$ is not clean along some direction $a$ orthogonal to the previous directions. Once $X$ becomes clean, they proceed to the next round. We now describe this formally.

Construction of the pairwise clean protocol $\overline{\mathcal{C}}$ from $\widetilde{\mathcal{C}}$.

We set $\lambda=100$. The construction of the new protocol is recursive, and we first define some notation. Consider an intermediate node of the new protocol $\overline{\mathcal{C}}$ at depth $t$. We use the random variable $\bm{X}^{(t)}\subseteq\mathbb{R}^{n}$ (resp., $\bm{Y}^{(t)}\subseteq\mathbb{R}^{n}$) to denote the set of Alice's (resp., Bob's) inputs reaching the node. If Alice reveals a linear form in this step, we use $\bm{a}^{(t)}\in\mathbb{R}^{n}$ to denote the vector of the linear form; otherwise, we set $\bm{a}^{(t)}$ to be the all-zeros vector. We define $\bm{b}^{(t)}$ similarly for Bob. Throughout the protocol, we abbreviate $\bm{u}^{(t)}=\mu(\bm{X}^{(t)})$ and $\bm{v}^{(t)}=\mu(\bm{Y}^{(t)})$ for Alice's and Bob's current centers of mass respectively.

1. At the beginning, Alice receives an input $x\in\mathbb{R}^{n}$ and Bob receives an input $y\in\mathbb{R}^{n}$.

2. We initialize $t\leftarrow 0$, $\bm{X}^{(0)},\bm{Y}^{(0)}\leftarrow\mathbb{R}^{n}$, and $\bm{a}^{(0)},\bm{b}^{(0)}\leftarrow 0^{n}$.

3. For each phase $i=1,2,\ldots,d$: suppose we are starting the cleanup for a node at depth $i$ in the original protocol $\widetilde{\mathcal{C}}$ and we are at a node of depth $t$ in the new protocol $\overline{\mathcal{C}}$. If it is Alice's turn to speak in $\widetilde{\mathcal{C}}$:

(a) Orthogonalization by revealing the correlation with Bob's center of mass. Alice begins by revealing the inner product of her input $x$ with Bob's current (signed) center of mass $\eta\odot\bm{v}^{(t)}$. Since in the previous steps she has already revealed the inner products with Bob's previous centers of mass, for technical reasons we only have Alice announce the inner product with the component of $\eta\odot\bm{v}^{(t)}$ that is orthogonal to the previous directions along which she announced inner products. More formally, let $\bm{a}^{(t+1)}$ be the unit vector along the component of $\eta\odot\bm{v}^{(t)}$ orthogonal to all previous directions $\bm{a}^{(1)},\dots,\bm{a}^{(t)}$, i.e.,

\bm{a}^{(t+1)}=\mathrm{unit}\left(\eta\odot\bm{v}^{(t)}-\sum_{\tau=1}^{t}\left\langle\eta\odot\bm{v}^{(t)},\bm{a}^{(\tau)}\right\rangle\cdot\bm{a}^{(\tau)}\right).

Alice computes $\overline{\bm{c}}^{(t+1)}\leftarrow\left\langle x,\bm{a}^{(t+1)}\right\rangle$ and sends $\overline{\bm{c}}^{(t+1)}$ to Bob. Set $\bm{b}^{(t+1)}\leftarrow 0^{n}$. Increment $t$ by $1$ and go to step (b).

(b) Original communication. Alice sends the bit $\overline{\bm{c}}^{(t+1)}$ that she was supposed to send in $\widetilde{\mathcal{C}}$ based on the previous messages and the input $x$. Set $\bm{a}^{(t+1)},\bm{b}^{(t+1)}\leftarrow 0^{n}$. Increment $t$ by $1$ and go to step (c).

(c) Cleanup steps. While there exists some direction $a\in\mathbb{S}^{n-1}$ orthogonal to the previous directions (i.e., satisfying $\left\langle a,\bm{a}^{(\tau)}\right\rangle=0$ for all $\tau\in[t]$) such that $\bm{X}^{(t)}$ is not pairwise clean in direction $a$, Alice computes $\overline{\bm{c}}^{(t+1)}\leftarrow\left\langle x,a\right\rangle$ and sends it to Bob. Set $\bm{a}^{(t+1)}\leftarrow a$ and $\bm{b}^{(t+1)}\leftarrow 0^{n}$. Increment $t$ by $1$. Repeat step (c) as long as $\bm{X}^{(t)}$ is not pairwise clean; otherwise, increment $i$ by $1$ and go back to the for-loop in step 3, which starts a new phase.

If it is Bob's turn to speak, we define everything similarly with the roles of $x,\bm{a},\bm{X},\bm{v}$ switched with $y,\bm{b},\bm{Y},\bm{u}$.

4. Finally, at the end of the protocol, the value $\overline{\mathcal{C}}(x,y)$ is determined based on all the previous communication and the corresponding output it defines in $\widetilde{\mathcal{C}}$.

We note some basic properties that follow directly from the description. First, the steps 3(a), 3(b), and 3(c) always occur in sequence for each party, and we refer to such a sequence of steps as a phase for that party; there are at most $d$ phases. If a new phase starts at time $t$, then the current rectangle $\bm{X}^{(t)}\times\bm{Y}^{(t)}$ is pairwise clean for both parties by construction. Also, note that the non-zero vectors in the sequence $(\bm{a}^{(t)})_{t}$ (resp., $(\bm{b}^{(t)})_{t}$) form an orthonormal set. Finally, the Boolean communication in step 3(b) is solely determined by the original protocol and hence depends only on the previous Boolean messages.

Lastly, each phase has one step 3(a) and one step 3(b), followed by potentially many steps 3(c). The following claim shows that their number is nevertheless always finite.

Claim 5.1.

Let $\ell$ be an arbitrary leaf of the protocol $\overline{\mathcal{C}}$ and let $D(\ell)$ be its depth. Then $D(\ell)\leq 2n+2d$. Moreover, along this path there are at most $2d$ many steps 3(a) and 3(b).

Proof.

We count the number of communication steps separately:

• Steps 3(a) and 3(b). Steps 3(a) and 3(b) each occur once per phase, and there are at most $d$ phases, giving at most $2d$ such steps in total.

• Step 3(c). Each time Alice communicates $\left\langle x,a\right\rangle$ in step 3(c), the direction $a\in\mathbb{R}^{n}$ is orthogonal to all previous $\bm{a}^{(t)}$'s. Since the dimension of $\mathbb{R}^{n}$ is $n$, this happens at most $n$ times. A similar argument works for Bob.

Thus, in total we have at most $2n+2d$ steps. ∎

We will eventually show that the expected depth of the protocol $\overline{\mathcal{C}}$ is $O(d)$ when $\bm{x},\bm{y}\sim\gamma_{n}$.
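To make the cleanup dynamic concrete, here is a simplified numerical sketch (ours, not the paper's exact construction: it models the conditioned set by a finite sample cloud, models revealing the real value $\left\langle x,a\right\rangle$ by conditioning up to a small precision, and does not enforce orthogonality to previous directions):

```python
# Illustrative sketch of the cleanup loop in step 3(c), on a sample cloud.
import numpy as np

def cleanup(X, x_true, lam, eps=0.25, max_steps=50):
    """While X is not pairwise clean, reveal <x, a> along the dirtiest
    direction a (top eigenvector of the covariance), conditioning the
    cloud on the revealed value up to precision eps."""
    directions = []
    for _ in range(max_steps):
        w, V = np.linalg.eigh(np.cov(X.T))
        if w[-1] <= lam:                          # pairwise clean
            break
        a = V[:, -1]                              # dirtiest direction
        X = X[np.abs(X @ a - x_true @ a) <= eps]  # "send" <x, a>
        directions.append(a)
    return X, directions

rng = np.random.default_rng(3)
cloud = rng.standard_normal((400000, 4))
cloud = cloud[np.abs(cloud[:, 0]) >= 2.0]   # a dirty set: two far slabs
X, dirs = cleanup(cloud, x_true=cloud[0], lam=2.0)
print(f"{len(dirs)} cleanup step(s), final cleanness "
      f"{np.linalg.eigvalsh(np.cov(X.T)).max():.2f}")
```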

5.2 Bounding the Expected Quadratic Variation

Consider a random walk on the protocol tree generated by the new protocol $\overline{\mathcal{C}}$ when the parties are given independent inputs $\bm{x},\bm{y}\sim\gamma_{n}$, and consider the corresponding level-one martingale process defined in Equation 4.5. Formally, at time $t$ the process is given by

\bm{z}^{(t)}_{1}=\left\langle\bm{u}^{(t)},\eta\odot\bm{v}^{(t)}\right\rangle,

where we recall that $\bm{u}^{(t)}=\mu(\bm{X}^{(t)})$ and $\bm{v}^{(t)}=\mu(\bm{Y}^{(t)})$, and $\eta\in\{\pm 1\}^{n}$ is a fixed sign vector.

The martingale process stops once it hits a leaf of the protocol $\overline{\mathcal{C}}$. Let $\bm{d}$ denote the (stopping) time when this happens. Note that $\mathbb{E}[\bm{d}]$ is exactly the expected depth of the protocol $\overline{\mathcal{C}}$. Then, in light of Proposition 4.4, to prove Theorem 1.2 it suffices to prove the following.

Lemma 5.2.

$\mathbb{E}\left[\sum_{t=1}^{\bm{d}}\left(\Delta\bm{z}^{(t)}_{1}\right)^{2}\right]=O(d)$.

We prove this in two steps. We first show that the value of the martingale changes only during the orthogonalization step 3(a). This is because, in each phase, Alice's change of center of mass in steps 3(b) and 3(c) is always orthogonal to $\eta\odot\bm{v}^{(t)}$, so these steps do not change the value of the martingale $\bm{z}^{(t)}_{1}$, as discussed in Section 2. Moreover, recalling Subsection 2.1, since Alice's set was pairwise clean just before she sent the message in step 3(a), the expected change $\mathbb{E}\left[\left(\Delta\bm{z}^{(t+1)}_{1}\right)^{2}\right]$ can be bounded in terms of the squared norm of the change that occurred in $\bm{u}^{(t)}$ between the current round and the last round where Alice was in step 3(a). A similar argument works for Bob.

Formally, this is encapsulated by the next lemma, for which we need some additional definitions. Let $(\mathcal{F}^{(t)})_{t}$ be the natural filtration induced by the random walk on the generalized protocol tree, with respect to which $\bm{z}^{(t)}_{1}$ is a Doob martingale and $\bm{u}^{(t)},\bm{v}^{(t)}$ form vector-valued martingales (recall Proposition 4.3). Note that $\mathcal{F}^{(t)}$ fixes all the rectangles encountered during times $0,\ldots,t$, and thus for $\tau\leq t$ the random variables $\bm{u}^{(\tau)},\bm{v}^{(\tau)},\bm{z}^{(\tau)}_{1}$ are determined; in particular, they are $\mathcal{F}^{(t)}$-measurable. Recall that $\lambda=100$ is the cleanup parameter. Below we assume without loss of generality that Alice speaks first, and in particular, Alice speaks in step 3(a) for the first time at time zero.

Lemma 5.3 (Step Size).

Let $0=\bm{\tau}_{1}<\bm{\tau}_{2}<\cdots\leq\bm{d}$ be the sequence of stopping times with $\bm{\tau}_{m}$ being the index of the round where Alice speaks in step 3(a) for the $m^{\text{th}}$ time, or $\bm{d}$ if there is no such round. Then, for any integer $m\geq 2$,

\mathbb{E}\left[\left(\Delta\bm{z}^{(\bm{\tau}_{m}+1)}_{1}\right)^{2}\,\middle|\,\mathcal{F}^{(\bm{\tau}_{m})}\right]\leq\lambda\cdot\left\|\bm{v}^{(\bm{\tau}_{m})}-\bm{v}^{(\bm{\tau}_{m-1})}\right\|^{2},

and moreover, for any $t\in\mathbb{N}$, we have that

\mathbb{E}\left[\left(\Delta\bm{z}^{(t+1)}_{1}\right)^{2}\,\middle|\,\mathcal{F}^{(t)},\,\bm{\tau}_{m-1}<t<\bm{\tau}_{m},\,\text{Alice speaks at time }t\right]=0.

A similar statement holds when Bob speaks, with $\bm{v}$ replaced by $\bm{u}$ and the sequence $(\bm{\tau}_{m})$ replaced by $(\bm{\tau}^{\prime}_{m})$, where $\bm{\tau}^{\prime}_{m}$ is the index of the round where Bob speaks in step 3(a) for the $m^{\text{th}}$ time, or $\bm{d}$ if there is no such round.

In particular, the steps 3(b) and 3(c) do not contribute to the quadratic variation; only the steps 3(a) do. Also, since Alice and Bob each start in step 3(a) the first time they speak, we note that $\bm{u}^{(\bm{\tau}_{1})}$ and $\bm{v}^{(\bm{\tau}^{\prime}_{1})}$ are their initial centers of mass, which are both zero.

We prove the above lemma in Subsection 5.3 and continue here with the bound on the quadratic variation. Using Lemma 5.3, we have

\mathbb{E}\left[\sum_{t=1}^{\bm{d}}\left(\Delta\bm{z}^{(t)}_{1}\right)^{2}\right]\leq\lambda\cdot\mathbb{E}\left[\sum_{m\geq 2}\left\|\bm{v}^{(\bm{\tau}_{m})}-\bm{v}^{(\bm{\tau}_{m-1})}\right\|^{2}+\left\|\bm{u}^{(\bm{\tau}^{\prime}_{m})}-\bm{u}^{(\bm{\tau}^{\prime}_{m-1})}\right\|^{2}\right].

On the other hand, by the orthogonality of vector-valued martingale differences from Equation 3.2, we have

\mathbb{E}\left[\sum_{m\geq 2}\left\|\bm{v}^{(\bm{\tau}_{m})}-\bm{v}^{(\bm{\tau}_{m-1})}\right\|^{2}\right]=\mathbb{E}\left[\left\|\bm{v}^{(\bm{d})}\right\|^{2}\right].

A similar statement holds for $(\bm{u}^{(t)})_{t}$. Therefore,

\mathbb{E}\left[\sum_{t=1}^{\bm{d}}\left(\Delta\bm{z}^{(t)}_{1}\right)^{2}\right]\leq\lambda\cdot\left(\mathbb{E}\left[\left\|\bm{u}^{(\bm{d})}\right\|^{2}\right]+\mathbb{E}\left[\left\|\bm{v}^{(\bm{d})}\right\|^{2}\right]\right). (5.2)

We prove the following lemma in Subsection 5.4 to upper bound the quantity on the right-hand side above. Loosely speaking, by an application of the level-one inequality (see Theorem 3.1), the lemma below ultimately boils down to a bound on the expected number of cleanup steps.

Lemma 5.4 (Final Center of Mass).

$\mathbb{E}\left[\left\|\bm{u}^{(\bm{d})}\right\|^{2}+\left\|\bm{v}^{(\bm{d})}\right\|^{2}\right]=O(d)$.

Since $\lambda=100$, plugging the bound from the above lemma into Equation 5.2 readily implies Lemma 5.2. Together with Proposition 4.4, this completes the proof of Theorem 1.2.

5.3 Bounds on Step Sizes (Proof of Lemma 5.3)

Let us abbreviate $\bm{\tau}=\bm{\tau}_{m}$. Observe that

\begin{aligned}
\mathbb{E}\left[\left(\Delta\bm{z}^{(\bm{\tau}+1)}_{1}\right)^{2}\,\middle|\,\mathcal{F}^{(\bm{\tau})}\right]&=\mathbb{E}\left[\left\langle\bm{u}^{(\bm{\tau}+1)}-\bm{u}^{(\bm{\tau})},\eta\odot\bm{v}^{(\bm{\tau})}\right\rangle^{2}\,\middle|\,\mathcal{F}^{(\bm{\tau})}\right]\\
&=\mathbb{E}\left[\left\langle\bm{u}^{(\bm{\tau}+1)},\eta\odot\bm{v}^{(\bm{\tau})}\right\rangle^{2}-\left\langle\bm{u}^{(\bm{\tau})},\eta\odot\bm{v}^{(\bm{\tau})}\right\rangle^{2}\,\middle|\,\mathcal{F}^{(\bm{\tau})}\right],
\end{aligned} (5.3)

where the second line uses that $(\bm{u}^{(t)})_{t}$ is a vector-valued martingale and thus $\mathbb{E}\left[\bm{u}^{(\bm{\tau}+1)}\,\middle|\,\mathcal{F}^{(\bm{\tau})}\right]=\bm{u}^{(\bm{\tau})}$.

We first consider the case that at time $\bm{\tau}$ a new phase starts for Alice. By construction, this means that the current rectangle $\bm{X}^{(\bm{\tau})}\times\bm{Y}^{(\bm{\tau})}$ determined by $\mathcal{F}^{(\bm{\tau})}$ is pairwise clean with parameter $\lambda$, and since Alice is in step 3(a) at the start of a new phase, $\bm{a}^{(\bm{\tau}+1)}$ is chosen to be the (normalized) component of $\eta\odot\bm{v}^{(\bm{\tau})}$ that is orthogonal to the previous directions $\bm{a}^{(0)},\ldots,\bm{a}^{(\bm{\tau})}$. Let $\bm{\beta}^{(\bm{\tau}+1)}:=\left\langle\eta\odot\bm{v}^{(\bm{\tau})},\bm{a}^{(\bm{\tau}+1)}\right\rangle$ be the length of this component before normalization. Note that $\bm{\beta}^{(\bm{\tau}+1)}$ is $\mathcal{F}^{(\bm{\tau})}$-measurable (i.e., it is determined by $\mathcal{F}^{(\bm{\tau})}$).

We now claim that the components of $\bm{u}^{(\bm{\tau}+1)}$ and $\bm{u}^{(\bm{\tau})}$ are the same along any of the previous directions $\bm{a}^{(0)},\ldots,\bm{a}^{(\bm{\tau})}$, so in Equation 5.3 they cancel out and the only relevant quantity is the component in the direction $\bm{a}^{(\bm{\tau}+1)}$. This follows since, in all previous steps $t\leq\bm{\tau}$, Alice has already fixed $\langle x,\bm{a}^{(t)}\rangle$. This implies that for the sets $\bm{X}^{(\bm{\tau})}$ and $\bm{X}^{(\bm{\tau}+1)}$ determined by $\mathcal{F}^{(\bm{\tau}+1)}$, the inner products with all the previous directions $\bm{a}^{(0)},\ldots,\bm{a}^{(\bm{\tau})}$ are fixed over the choice of $x$ from either set. Formally, for any $x\in\bm{X}^{(\bm{\tau})}$ and $x^{\prime}\in\bm{X}^{(\bm{\tau}+1)}$, it holds that $\langle x,\bm{a}^{(t)}\rangle=\langle x^{\prime},\bm{a}^{(t)}\rangle$ for any $t\leq\bm{\tau}$. In particular, since $\bm{u}^{(\bm{\tau})}=\mu(\bm{X}^{(\bm{\tau})})$ and $\bm{u}^{(\bm{\tau}+1)}=\mu(\bm{X}^{(\bm{\tau}+1)})$ are the corresponding centers of mass, we have that

\left\langle\bm{u}^{(\bm{\tau}+1)},\bm{a}^{(t)}\right\rangle=\left\langle\bm{u}^{(\bm{\tau})},\bm{a}^{(t)}\right\rangle\text{ for all }t\leq\bm{\tau}. (5.4)

This, together with Equation 5.3 and the fact that $\bm{\beta}^{(\bm{\tau}+1)}$ is determined by $\mathcal{F}^{(\bm{\tau})}$, implies that

\mathbb{E}\left[\left(\Delta\bm{z}^{(\bm{\tau}+1)}_{1}\right)^{2}\,\middle|\,\mathcal{F}^{(\bm{\tau})}\right]=\left(\bm{\beta}^{(\bm{\tau}+1)}\right)^{2}\cdot\mathbb{E}\left[\left\langle\bm{u}^{(\bm{\tau}+1)},\bm{a}^{(\bm{\tau}+1)}\right\rangle^{2}-\left\langle\bm{u}^{(\bm{\tau})},\bm{a}^{(\bm{\tau}+1)}\right\rangle^{2}\,\middle|\,\mathcal{F}^{(\bm{\tau})}\right]. (5.5)

We now bound the term outside the expectation by the change in the center of mass 𝒗()\bm{v}^{(\cdot)} and the term inside the expectation by the fact that the set is pairwise clean.

Term Outside the Expectation.

Recall that 𝒂(𝝉+1)\bm{a}^{(\bm{\tau}+1)} is chosen to be the (normalized) component of η𝒗(𝝉)\eta\odot\bm{v}^{(\bm{\tau})} that is orthogonal to the span of 𝒂(0),,𝒂(𝝉)\bm{a}^{(0)},\ldots,\bm{a}^{(\bm{\tau})}. Since η𝒗(𝝉m1)\eta\odot\bm{v}^{(\bm{\tau}_{m-1})} is in the span of 𝒂(1),,𝒂(𝝉m1+1)\bm{a}^{(1)},\ldots,\bm{a}^{(\bm{\tau}_{m-1}+1)} and 𝝉m1+1𝝉=𝝉m\bm{\tau}_{m-1}+1\leq\bm{\tau}=\bm{\tau}_{m}, it is orthogonal to 𝒂(𝝉+1)\bm{a}^{(\bm{\tau}+1)}. Hence,

𝜷(𝝉+1)=η𝒗(𝝉),𝒂(𝝉+1)=η(𝒗(𝝉)𝒗(𝝉m1)),𝒂(𝝉+1).\bm{\beta}^{(\bm{\tau}+1)}=\left\langle\eta\odot\bm{v}^{(\bm{\tau})},\bm{a}^{(\bm{\tau}+1)}\right\rangle=\left\langle\eta\odot\left(\bm{v}^{(\bm{\tau})}-\bm{v}^{(\bm{\tau}_{m-1})}\right),\bm{a}^{(\bm{\tau}+1)}\right\rangle.

Since 𝒂(𝝉+1)\bm{a}^{(\bm{\tau}+1)} is a unit vector and each entry of η\eta is in {±1}\{\pm 1\}, this implies that

(𝜷(𝝉+1))2𝒗(𝝉)𝒗(𝝉m1)2.\left(\bm{\beta}^{(\bm{\tau}+1)}\right)^{2}\leq\left\|\bm{v}^{(\bm{\tau})}-\bm{v}^{(\bm{\tau}_{m-1})}\right\|^{2}. (5.6)
Term Inside the Expectation.

Since (𝒖(τ))(\bm{u}^{(\tau)}) is a vector-valued martingale with respect to (τ)\mathcal{F}^{(\tau)}, and 𝒂(τ+1)\bm{a}^{(\tau+1)} is (τ)\mathcal{F}^{(\tau)}-measurable (determined by (τ)\mathcal{F}^{(\tau)}), we have that

𝔼[𝒖(𝝉+1),𝒂(𝝉+1)2𝒖(𝝉),𝒂(𝝉+1)2|(𝝉)]=𝔼[𝒖(τ+1)𝒖(τ),𝒂(𝝉+1)2|(τ)].\displaystyle\operatorname*{\mathbb{E}}\left[\left\langle\bm{u}^{(\bm{\tau}+1)},\bm{a}^{(\bm{\tau}+1)}\right\rangle^{2}-\left\langle\bm{u}^{(\bm{\tau})},\bm{a}^{(\bm{\tau}+1)}\right\rangle^{2}\,\middle|\,\mathcal{F}^{(\bm{\tau})}\right]=\operatorname*{\mathbb{E}}\left[\left\langle\bm{u}^{(\tau+1)}-\bm{u}^{(\tau)},\bm{a}^{(\bm{\tau}+1)}\right\rangle^{2}\,\middle|\,\mathcal{F}^{(\tau)}\right].

Since Alice is in step 3(a), her message fixes x,𝒂(𝝉+1)\left\langle x,\bm{a}^{(\bm{\tau}+1)}\right\rangle at time 𝝉\bm{\tau} for every x𝑿(𝝉+1)x\in\bm{X}^{(\bm{\tau}+1)}. Thus,

𝔼[𝒖(𝝉+1)𝒖(𝝉),𝒂(𝝉+1)2|(𝝉)]\displaystyle\operatorname*{\mathbb{E}}\left[\left\langle\bm{u}^{(\bm{\tau}+1)}-\bm{u}^{(\bm{\tau})},\bm{a}^{(\bm{\tau}+1)}\right\rangle^{2}\,\middle|\,\mathcal{F}^{(\bm{\tau})}\right] =𝔼[𝔼𝒙γ[𝒙|𝒙𝑿(𝝉+1)]𝒖(τ),𝒂(𝝉+1)2|(𝝉)]\displaystyle=\operatorname*{\mathbb{E}}\left[\left\langle\operatorname*{\mathbb{E}}_{\bm{x}\sim\gamma}\left[\bm{x}\,\middle|\,\bm{x}\in\bm{X}^{(\bm{\tau}+1)}\right]-\bm{u}^{(\tau)},\bm{a}^{(\bm{\tau}+1)}\right\rangle^{2}\,\middle|\,\mathcal{F}^{(\bm{\tau})}\right]
=𝔼[𝔼𝒙γ[𝒙𝒖(𝝉),𝒂(𝝉+1)2|𝒙𝑿(𝝉+1)]|(𝝉)]\displaystyle=\operatorname*{\mathbb{E}}\left[\operatorname*{\mathbb{E}}_{\bm{x}\sim\gamma}\left[\left\langle\bm{x}-\bm{u}^{(\bm{\tau})},\bm{a}^{(\bm{\tau}+1)}\right\rangle^{2}\,\middle|\,\bm{x}\in\bm{X}^{(\bm{\tau}+1)}\right]\,\middle|\,\mathcal{F}^{(\bm{\tau})}\right]
=𝔼[𝒙𝒖(𝝉),𝒂(𝝉+1)2|(𝝉)],\displaystyle=\operatorname*{\mathbb{E}}\left[\left\langle\bm{x}-\bm{u}^{(\bm{\tau})},\bm{a}^{(\bm{\tau}+1)}\right\rangle^{2}\,\middle|\,\mathcal{F}^{(\bm{\tau})}\right], (5.7)

where the last line follows from the tower property of conditional expectation.

Recall that 𝒖(𝝉)=μ(𝑿(𝝉))\bm{u}^{(\bm{\tau})}=\mu(\bm{X}^{(\bm{\tau})}) is the center of mass. Moreover, the unit vector 𝒂(τ+1)\bm{a}^{(\tau+1)} is determined by (τ)\mathcal{F}^{(\tau)} and also the conditional distribution of 𝒙\bm{x} conditioned on (τ)\mathcal{F}^{(\tau)} is that of 𝒙γ\bm{x}\sim\gamma conditioned on 𝒙𝑿(τ)\bm{x}\in\bm{X}^{(\tau)}. Thus, using the fact that 𝑿(𝝉)\bm{X}^{(\bm{\tau})} is pairwise clean since Alice is in step 3(a), the right hand side in Equation 5.7 is at most λ\lambda.

Final Bound.

Substituting the above in Equation 5.5, we have

𝔼[(Δ𝒛1(𝝉+1))2|(𝝉)]\displaystyle\operatorname*{\mathbb{E}}\left[\left(\Delta\bm{z}^{(\bm{\tau}+1)}_{1}\right)^{2}\,\middle|\,\mathcal{F}^{(\bm{\tau})}\right] λ(𝜷(𝝉+1))2λ𝒗(𝝉)𝒗(𝝉m1)2,\displaystyle\leq\lambda\cdot\left(\bm{\beta}^{(\bm{\tau}+1)}\right)^{2}\leq\lambda\cdot\left\|\bm{v}^{(\bm{\tau})}-\bm{v}^{(\bm{\tau}_{m-1})}\right\|^{2},

where the second inequality follows from Equation 5.6. This completes the proof of the first statement.

For the moreover part, let us condition on the event 𝝉m1<t<𝝉m\bm{\tau}_{m-1}<t<\bm{\tau}_{m} where Alice speaks at time tt. Note that such tt must all lie in the same phase of the protocol where Alice is the only one speaking. So, Bob’s center of mass does not change from the time 𝝉m1\bm{\tau}_{m-1} till tt, i.e., 𝒗(t+1)=𝒗(𝝉m1)\bm{v}^{(t+1)}=\bm{v}^{(\bm{\tau}_{m-1})}. Thus we have Δ𝒛1(t+1)=𝒖(t+1)𝒖(t),η𝒗(𝝉m1)\Delta\bm{z}^{(t+1)}_{1}=\left\langle\bm{u}^{(t+1)}-\bm{u}^{(t)},\eta\odot\bm{v}^{(\bm{\tau}_{m-1})}\right\rangle. Analogous to Equation 5.4, the components of Alice’s center of mass along the previous directions are fixed. Thus 𝒖(t+1),𝒂(r)=𝒖(t),𝒂(r)\left\langle\bm{u}^{(t+1)},\bm{a}^{(r)}\right\rangle=\left\langle\bm{u}^{(t)},\bm{a}^{(r)}\right\rangle for all rtr\leq t. Furthermore, by construction, η𝒗(𝝉m1)\eta\odot\bm{v}^{(\bm{\tau}_{m-1})} lies in the linear subspace spanned by 𝒂(0),,𝒂(𝝉m1+1)\bm{a}^{(0)},\ldots,\bm{a}^{(\bm{\tau}_{m-1}+1)}. Therefore, since 𝝉m1+1t\bm{\tau}_{m-1}+1\leq t, it follows that Δ𝒛1(t+1)=0\Delta\bm{z}^{(t+1)}_{1}=0.

5.4 Expected Norm of Final Center of Mass (Proof of Lemma 5.4)

Let 𝑯A=𝑯A(𝒅)\bm{H}_{A}=\bm{H}_{A}^{(\bm{d})} be the (random) linear subspace spanned by the vectors 𝒂(0),,𝒂(𝒅)\bm{a}^{(0)},\ldots,\bm{a}^{(\bm{d})} and similarly, let 𝑯B=𝑯B(𝒅)\bm{H}_{B}=\bm{H}_{B}^{(\bm{d})} be the linear subspace spanned by the vectors 𝒃(0),,𝒃(𝒅)\bm{b}^{(0)},\ldots,\bm{b}^{(\bm{d})}. For any linear subspace VV of n\mathbb{R}^{n}, we denote by 𝚷V\bm{\Pi}_{V} and 𝚷V\bm{\Pi}_{V^{\bot}} the projectors on the subspace VV and its orthogonal complement VV^{\bot} respectively. Then, we have that

𝒖(𝒅)2=𝚷HA𝒖(𝒅)2+𝚷HA𝒖(𝒅)2 and 𝒗(𝒅)2=𝚷HB𝒗(𝒅)2+𝚷HB𝒗(𝒅)2.\left\|\bm{u}^{(\bm{d})}\right\|^{2}=\left\|\bm{\Pi}_{H_{A}}\bm{u}^{(\bm{d})}\right\|^{2}+\left\|\bm{\Pi}_{H_{A}^{\bot}}\bm{u}^{(\bm{d})}\right\|^{2}\text{ and }\left\|\bm{v}^{(\bm{d})}\right\|^{2}=\left\|\bm{\Pi}_{H_{B}}\bm{v}^{(\bm{d})}\right\|^{2}+\left\|\bm{\Pi}_{H_{B}^{\bot}}\bm{v}^{(\bm{d})}\right\|^{2}.
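This is simply the Pythagorean identity for orthogonal projections. A minimal numpy sketch (with an arbitrary random subspace standing in for 𝑯A\bm{H}_{A}) verifying it:

import numpy as np

rng = np.random.default_rng(1)
n, k = 8, 3
Q, _ = np.linalg.qr(rng.normal(size=(n, k)))    # orthonormal basis of a k-dimensional subspace H
P = Q @ Q.T                                     # projector Pi_H
u = rng.normal(size=n)

total = np.dot(u, u)
split = np.dot(P @ u, P @ u) + np.dot(u - P @ u, u - P @ u)
assert np.isclose(total, split)                 # ||u||^2 = ||Pi_H u||^2 + ||Pi_{H^perp} u||^2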

Note that the non-zero vectors in (𝒂(t))t(\bm{a}^{(t)})_{t} and (𝒃(t))t(\bm{b}^{(t)})_{t} form an orthonormal basis for the subspaces 𝑯A\bm{H}_{A} and 𝑯B\bm{H}_{B} respectively. Moreover, for each t𝒅t\leq\bm{d}, the inner product x,𝒂(t)\left\langle x,\bm{a}^{(t)}\right\rangle is fixed for every x𝑿(𝒅)x\in\bm{X}^{(\bm{d})} and the inner product y,𝒃(t)\left\langle y,\bm{b}^{(t)}\right\rangle is also fixed for every y𝒀(𝒅)y\in\bm{Y}^{(\bm{d})} where 𝑿(𝒅)×𝒀(𝒅)\bm{X}^{(\bm{d})}\times\bm{Y}^{(\bm{d})} is the current rectangle determined by (𝒅)\mathcal{F}^{(\bm{d})}. In particular, since 𝒖(𝒅)\bm{u}^{(\bm{d})} is the center of mass of 𝑿(𝒅)\bm{X}^{(\bm{d})}, this implies that

𝚷HA𝒖(𝒅)2=t=1𝒅𝒖(𝒅),𝒂(t)2\displaystyle\left\|\bm{\Pi}_{H_{A}}\bm{u}^{(\bm{d})}\right\|^{2}=\sum_{t=1}^{\bm{d}}\left\langle\bm{u}^{(\bm{d})},\bm{a}^{(t)}\right\rangle^{2} =t=1𝒅(𝔼𝒙γ[𝒙,𝒂(t)|𝒙𝑿(𝒅)])2\displaystyle=\sum_{t=1}^{\bm{d}}\left(\operatorname*{\mathbb{E}}_{\bm{x}\sim\gamma}\left[\left\langle\bm{x},\bm{a}^{(t)}\right\rangle\,\middle|\,\bm{x}\in\bm{X}^{(\bm{d})}\right]\right)^{2}
=t=1𝒅𝔼𝒙γ[𝒙,𝒂(t)2|𝒙𝑿(𝒅)],\displaystyle=\sum_{t=1}^{\bm{d}}\operatorname*{\mathbb{E}}_{\bm{x}\sim\gamma}\left[\left\langle\bm{x},\bm{a}^{(t)}\right\rangle^{2}\,\middle|\,\bm{x}\in\bm{X}^{(\bm{d})}\right],

where the second line follows from the inner product being fixed in 𝑿(𝒅)\bm{X}^{(\bm{d})}. Therefore, we have

𝒖(𝒅)2=t=1𝒅𝔼𝒙γ[𝒙,𝒂(t)2|𝒙𝑿(𝒅)]𝒑A+𝚷HA𝒖(𝒅)2𝒒A.\left\|\bm{u}^{(\bm{d})}\right\|^{2}=\underbrace{\sum_{t=1}^{\bm{d}}{\operatorname*{\mathbb{E}}_{\bm{x}\sim\gamma}\left[\left\langle\bm{x},\bm{a}^{(t)}\right\rangle^{2}\,\middle|\,\bm{x}\in\bm{X}^{(\bm{d})}\right]}}_{\bm{p}_{A}}+\underbrace{\left\|\bm{\Pi}_{H_{A}^{\bot}}\bm{u}^{(\bm{d})}\right\|^{2}}_{\bm{q}_{A}}.

In an analogous fashion,

𝒗(𝒅)2=t=1𝒅𝔼𝒚γ[𝒚,𝒃(t)2|𝒚𝒀(𝒅)]𝒑B+𝚷HB𝒗(𝒅)2𝒒B.\left\|\bm{v}^{(\bm{d})}\right\|^{2}=\underbrace{\sum_{t=1}^{\bm{d}}{\operatorname*{\mathbb{E}}_{\bm{y}\sim\gamma}\left[\left\langle\bm{y},\bm{b}^{(t)}\right\rangle^{2}\,\middle|\,\bm{y}\in\bm{Y}^{(\bm{d})}\right]}}_{\bm{p}_{B}}+\underbrace{\left\|\bm{\Pi}_{H_{B}^{\bot}}\bm{v}^{(\bm{d})}\right\|^{2}}_{\bm{q}_{B}}.

We next show that both 𝔼[𝒑A+𝒑B]\operatorname*{\mathbb{E}}[\bm{p}_{A}+\bm{p}_{B}] and 𝔼[𝒒A+𝒒B]\operatorname*{\mathbb{E}}[\bm{q}_{A}+\bm{q}_{B}] are at most O(d)O(d). The former follows from the stopping-time and concentration arguments laid out in the overview, which show that there cannot be too many orthogonal directions in which 𝔼[𝒙,𝒂(t)2]\operatorname*{\mathbb{E}}\left[\left\langle\bm{x},\bm{a}^{(t)}\right\rangle^{2}\right] is large. The latter follows from an application of level-one inequalities.

We will bound the norm of the projection on the subspaces 𝑯A\bm{H}_{A} and 𝑯B\bm{H}_{B}, which corresponds to the quantity 𝔼[𝒑A+𝒑B]\operatorname*{\mathbb{E}}[\bm{p}_{A}+\bm{p}_{B}], in Subsection 5.4.1 and bound the norm of the projection on the orthogonal subspaces 𝑯A\bm{H}_{A}^{\bot} and 𝑯B\bm{H}_{B}^{\bot}, which corresponds to the quantity 𝔼[𝒒A+𝒒B]\operatorname*{\mathbb{E}}[\bm{q}_{A}+\bm{q}_{B}], in Subsection 5.4.2.

5.4.1 Projection on the Subspaces 𝑯A\bm{H}_{A} and 𝑯B\bm{H}_{B}

We shall show that the expected squared norm of the final center of mass when projected on the subspaces 𝑯A\bm{H}_{A} and 𝑯B\bm{H}_{B} is

𝔼[𝒑A+𝒑B]=O(d).\operatorname*{\mathbb{E}}[\bm{p}_{A}+\bm{p}_{B}]=O(d).

Towards this end, define the random variable 𝒌t=𝒌t(𝒙,𝒚)=𝒙,𝒂(t)2+𝒚,𝒃(t)2\bm{k}_{t}=\bm{k}_{t}(\bm{x},\bm{y})=\left\langle\bm{x},\bm{a}^{(t)}\right\rangle^{2}+\left\langle\bm{y},\bm{b}^{(t)}\right\rangle^{2} for each tt\in\mathbb{N}. Note that the vectors 𝒂(t)\bm{a}^{(t)} are chosen adaptively, depending on the previous inner products 𝒙,𝒂(τ)\left\langle\bm{x},\bm{a}^{(\tau)}\right\rangle for τ<t\tau<t as well as the Boolean communication bits from step 3(b), and are thus functions of 𝒙\bm{x} and 𝒚\bm{y} here as well. Observe that

𝔼[𝒑A+𝒑B]=𝔼[t=1𝒅𝔼[𝒌t|(𝒅)]]=𝔼𝒙,𝒚γ[t=1𝒅𝒌t].\operatorname*{\mathbb{E}}\left[\bm{p}_{A}+\bm{p}_{B}\right]=\operatorname*{\mathbb{E}}\left[\sum_{t=1}^{\bm{d}}\operatorname*{\mathbb{E}}\left[\bm{k}_{t}\,\middle|\,\mathcal{F}^{(\bm{d})}\right]\right]=\operatorname*{\mathbb{E}}_{\bm{x},\bm{y}\sim\gamma}\left[\sum_{t=1}^{\bm{d}}\bm{k}_{t}\right].

We now divide the time sequence into successive intervals of different lengths r4dr\cdot 4d for r=1,2,r=1,2,\ldots. Then we bound the expected sum of 𝒌t\bm{k}_{t} within each time interval by O(rd)O(rd). We further argue that the probability that the stopping time 𝒅\bm{d} lies in the rr-th interval is at most 22r2\cdot 2^{-r}. In particular, for rr\in\mathbb{N}, letting interval Ir={(r2)4d+1,,(r+12)4d}I_{r}=\left\{\binom{r}{2}\cdot 4d+1,\ldots,\binom{r+1}{2}\cdot 4d\right\}, which is of length 4dr4dr, we show the following.

Claim 5.5.

For any rr\in\mathbb{N}, we have

𝔼𝒙,𝒚γ[tIr𝒌t|𝒅>(r2)4d]20dr+4ln(1𝐏𝐫[𝒅>(r2)4d]).\operatorname*{\mathbb{E}}_{\bm{x},\bm{y}\sim\gamma}\left[\sum_{t\in I_{r}}\bm{k}_{t}\,\middle|\,\bm{d}>\binom{r}{2}\cdot 4d\right]\leq 20dr+4\ln\left(\dfrac{1}{\operatorname*{\mathbf{Pr}}\left[\bm{d}>\binom{r}{2}\cdot 4d\right]}\right).

We shall prove the above claim later since it is the most involved part of the proof. The previous claim readily implies the following probability bounds.

Claim 5.6.

For any rr\in\mathbb{N}, we have 𝐏𝐫[𝒅>(r2)4d]22r\operatorname*{\mathbf{Pr}}\left[\bm{d}>\binom{r}{2}\cdot 4d\right]\leq 2\cdot 2^{-r}.

Proof of Claim 5.6.

We bound 𝐏𝐫[𝒅>(r2)4d]\operatorname*{\mathbf{Pr}}\left[\bm{d}>\binom{r}{2}\cdot 4d\right] by induction on rr. The claim trivially holds for r=1r=1.

Now we proceed to analyze the event 𝒅>(r+12)4d\bm{d}>\tbinom{r+1}{2}\cdot 4d. Observe that Claim 5.1 implies that there are at most 2d2d many steps 3(a) and 3(b) throughout the protocol. Thus if the event above occurs, there are at least 4dr2d2dr4dr-2d\geq 2dr many time steps tIrt\in I_{r} where the process is in step 3(c).

By the definition of the cleanup step, if X×YX\times Y is a rectangle determined by (t1){𝒅>(r2)4d}\mathcal{F}^{(t-1)}\cap\{\bm{d}>\binom{r}{2}\cdot 4d\} where the process is in step 3(c) and Alice speaks (it suffices to consider such events since we have a product measure on 𝑿(t)×𝒀(t)\bm{X}^{(t)}\times\bm{Y}^{(t)} conditioned on (t)\mathcal{F}^{(t)}, and 𝒅\bm{d} is a stopping time and hence (t)\mathcal{F}^{(t)}-measurable, i.e., determined by the randomness in (t)\mathcal{F}^{(t)}), then

𝔼𝒙γ[𝒌t|(𝒙,𝒚)X×Y]=𝔼𝒙γ[𝒙,𝒂(t)2|𝒙X]𝔼𝒙γ[𝒙μ(X),𝒂(t)2|𝒙X]λ,\operatorname*{\mathbb{E}}_{\bm{x}\sim\gamma}\left[\bm{k}_{t}\,\middle|\,(\bm{x},\bm{y})\in X\times Y\right]=\operatorname*{\mathbb{E}}_{\bm{x}\sim\gamma}\left[\left\langle\bm{x},\bm{a}^{(t)}\right\rangle^{2}\,\middle|\,\bm{x}\in X\right]\geq\operatorname*{\mathbb{E}}_{\bm{x}\sim\gamma}\left[\left\langle\bm{x}-\mu(X),\bm{a}^{(t)}\right\rangle^{2}\,\middle|\,\bm{x}\in X\right]\geq\lambda,

where λ=100\lambda=100 is the cleanup parameter and μ(X)=𝔼𝒙γ[𝒙|𝒙X]\mu(X)=\operatorname*{\mathbb{E}}_{\bm{x}\sim\gamma}[\bm{x}~{}|~{}\bm{x}\in X] is the center of mass. This is because 𝒂(t)\bm{a}^{(t)} is chosen to be a unit vector in a direction where the current set (conditioned on the history) is not pairwise clean. A similar statement holds if Bob speaks in step 3(c) for the random variable 𝒚,𝒃(t)2\left\langle\bm{y},\bm{b}^{(t)}\right\rangle^{2} where 𝒚\bm{y} is sampled from γ\gamma conditioned on YY.

By the tower property of conditional expectation, the above implies that

1002dr𝐏𝐫[𝒅>(r+12)4d|𝒅>(r2)4d]𝔼[tIr𝒌t|𝒅>(r2)4d].100\cdot 2dr\cdot\operatorname*{\mathbf{Pr}}\left[\bm{d}>{\textstyle\binom{r+1}{2}}\cdot 4d\,\middle|\,\bm{d}>{\textstyle\binom{r}{2}}\cdot 4d\right]\leq\operatorname*{\mathbb{E}}\left[\sum_{t\in I_{r}}\bm{k}_{t}\,\middle|\,\bm{d}>{\textstyle\binom{r}{2}}\cdot 4d\right].

Recall that Claim 5.5 implies that the right hand side is at most 20dr+4ln(1𝐏𝐫[𝒅>(r2)4d])\leq 20dr+4\ln\left(\frac{1}{\operatorname*{\mathbf{Pr}}[\bm{d}>\tbinom{r}{2}\cdot 4d]}\right). We consider two cases:

  1. (i)

    if 𝐏𝐫[𝒅>(r2)4d]2r\operatorname*{\mathbf{Pr}}[\bm{d}>\binom{r}{2}\cdot 4d]\leq 2^{-r}, then clearly 𝐏𝐫[𝒅>(r+12)4d]2r\operatorname*{\mathbf{Pr}}[\bm{d}>\binom{r+1}{2}\cdot 4d]\leq 2^{-r} as well, as required;

  2. (ii)

    otherwise 𝐏𝐫[𝒅>(r2)4d]2r\operatorname*{\mathbf{Pr}}[\bm{d}>\binom{r}{2}\cdot 4d]\geq 2^{-r}, in which case 20dr+4ln(1𝐏𝐫[𝒅>(r2)4d])20dr+4r20dr+4\ln\left(\frac{1}{\operatorname*{\mathbf{Pr}}[\bm{d}>\tbinom{r}{2}\cdot 4d]}\right)\leq 20dr+4r, and it follows that

    𝐏𝐫[𝒅>(r+12)4d|𝒅>(r2)4d]1/2,\operatorname*{\mathbf{Pr}}\left[\bm{d}>\textstyle\binom{r+1}{2}\cdot 4d\,\middle|\,\bm{d}>\textstyle\binom{r}{2}\cdot 4d\right]\leq 1/2,

    and by induction this implies that 𝐏𝐫[𝒅>(r+12)4d]1/2𝐏𝐫[𝒅>(r2)4d]2r\operatorname*{\mathbf{Pr}}\left[\bm{d}>\textstyle\binom{r+1}{2}\cdot 4d\right]\leq 1/2\cdot\operatorname*{\mathbf{Pr}}\left[\bm{d}>\textstyle\binom{r}{2}\cdot 4d\right]\leq 2^{-r}.∎
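The induction pattern used in this proof, namely a non-increasing sequence of probabilities that at least halves whenever it exceeds 2^{-r}, mechanically forces the claimed 2·2^{-r} decay; a small illustrative sketch (the random decay factors below are hypothetical stand-ins for the actual conditional probabilities):

import random

random.seed(0)
for _ in range(1000):
    p = 1.0                               # p plays the role of Pr[d > C(r,2)*4d]
    for r in range(1, 30):
        assert p <= 2 * 2.0 ** -r         # the conclusion of Claim 5.6
        if p >= 2.0 ** -r:
            p *= 0.5 * random.random()    # case (ii): the tail probability at least halves
        else:
            p *= random.random()          # case (i): any non-increase suffices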

These claims imply that

𝔼[𝒑A+𝒑B]\displaystyle\operatorname*{\mathbb{E}}[\bm{p}_{A}+\bm{p}_{B}] 𝔼[r=01[𝒅>(r2)4d]tIr𝒌t]\displaystyle\leq\operatorname*{\mathbb{E}}\left[\sum_{r=0}^{\infty}1\left[\bm{d}>{\textstyle\binom{r}{2}}\cdot 4d\right]\cdot\sum_{t\in I_{r}}\bm{k}_{t}\right]
=r=0𝐏𝐫[𝒅>(r2)4d]𝔼[tIr𝒌t|𝒅>(r2)4d]\displaystyle=\sum_{r=0}^{\infty}\operatorname*{\mathbf{Pr}}[\bm{d}>{\textstyle\binom{r}{2}}\cdot 4d]\cdot\operatorname*{\mathbb{E}}\left[\sum_{t\in I_{r}}\bm{k}_{t}\,\middle|\,\bm{d}>{\textstyle\binom{r}{2}}\cdot 4d\right]
r=0(21rO(rd)+4𝐏𝐫[𝒅>(r2)4d]ln(1𝐏𝐫[𝒅>(r2)4d]))\displaystyle\leq\sum_{r=0}^{\infty}\left(2^{1-r}\cdot O(rd)+4\cdot\operatorname*{\mathbf{Pr}}[\bm{d}>{\textstyle\binom{r}{2}}\cdot 4d]\cdot\ln\left(\tfrac{1}{\operatorname*{\mathbf{Pr}}\left[\bm{d}>{\textstyle\binom{r}{2}}\cdot 4d\right]}\right)\right)
r=0(21rO(rd)+O((r+1)2r))O(d),\displaystyle\leq\sum_{r=0}^{\infty}\left(2^{1-r}\cdot O(rd)+O\left((r+1)2^{-r}\right)\right)\leq O(d),

where the last line uses the fact that xln(1/x)O((r+1)2r)x\ln(1/x)\leq O((r+1)2^{-r}) for 0x22r0\leq x\leq 2\cdot 2^{-r} and rr\in\mathbb{N}. This proves the desired bound on 𝔼[𝒑A+𝒑B]\operatorname*{\mathbb{E}}[\bm{p}_{A}+\bm{p}_{B}] assuming Claim 5.5, which we prove next.
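Before turning to that proof, the elementary estimate used in the last step can be checked numerically; a quick sketch (the constant 2 below is an ad hoc choice that suffices):

import numpy as np

# check x*ln(1/x) <= 2*(r+1)*2^{-r} on 0 <= x <= 2*2^{-r}
for r in range(1, 40):
    xs = np.linspace(1e-12, 2 * 2.0 ** -r, 10_000)
    assert np.all(xs * np.log(1 / xs) <= 2 * (r + 1) * 2.0 ** -r)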

Proof of Claim 5.5.

To prove the claim, we need to analyze the expectation of tIr𝒌t\sum_{t\in I_{r}}\bm{k}_{t} under 𝒙,𝒚\bm{x},\bm{y} sampled from γ\gamma conditioned on the event 𝒅>(r2)4d\bm{d}>\binom{r}{2}\cdot 4d.

We first describe an equivalent way of sampling from this distribution which will be easier for analysis. First, we recall that the definition of the cleanup protocol implies that the Boolean communication in 𝒞¯\overline{\mathcal{C}} is solely determined by the previous Boolean communication, since it is specified by the original protocol 𝒞~\widetilde{\mathcal{C}} (and thus 𝒞\mathcal{C}) before the cleanup.

Let us fix any Boolean string c{0,1}c\in\{0,1\}^{*} that is a valid Boolean transcript in the original communication protocol 𝒞~\widetilde{\mathcal{C}}. This defines a rectangle Xc×Ycn×nX_{c}\times Y_{c}\subseteq\mathbb{R}^{n}\times\mathbb{R}^{n} consisting of all pairs of inputs to Alice and Bob that result in the Boolean transcript cc in 𝒞~\widetilde{\mathcal{C}}. If we sample 𝒙,𝒚γ\bm{x},\bm{y}\sim\gamma conditioned on 𝒅>(r2)4d\bm{d}>\binom{r}{2}\cdot 4d and output the unique (𝑿c,𝒀c)(\bm{X}_{c},\bm{Y}_{c}) such that (𝒙,𝒚)𝑿c×𝒀c(\bm{x},\bm{y})\in\bm{X}_{c}\times\bm{Y}_{c}, we obtain a distribution on rectangles. We use γ(Xc×Yc|𝒅>(r2)4d)\gamma(X_{c}\times Y_{c}\,|\,\bm{d}>\binom{r}{2}\cdot 4d) to denote the probability of obtaining Xc×YcX_{c}\times Y_{c} by this sampling process so that cγ(Xc×Yc|𝒅>(r2)4d)=1\sum_{c}\gamma(X_{c}\times Y_{c}\,|\,\bm{d}>\binom{r}{2}\cdot 4d)=1.

Now consider the following two-stage sampling process. First, we sample a rectangle Xc×YcX_{c}\times Y_{c} according to the above distribution, and then we sample the inputs 𝒙,𝒚\bm{x},\bm{y} from γn\gamma_{n} conditioned on the event that {(𝒙,𝒚)Xc×Yc}{𝒅>(r2)4d}\{(\bm{x},\bm{y})\in X_{c}\times Y_{c}\}\wedge\{\bm{d}>\binom{r}{2}\cdot 4d\}. We shall show the following claim for any rectangle Xc×YcX_{c}\times Y_{c} that could be sampled in the first step.

Claim 5.7.

𝔼𝒙,𝒚γ[tIr𝒌t|𝒅>4d(r2),(𝒙,𝒚)Xc×Yc]12dr+4ln(1𝐏𝐫[𝒅>4d(r2),(𝒙,𝒚)Xc×Yc])\operatorname*{\mathbb{E}}_{\bm{x},\bm{y}\sim\gamma}\left[\sum_{t\in I_{r}}\bm{k}_{t}\,\middle|\,\bm{d}>4d\tbinom{r}{2},(\bm{x},\bm{y})\in X_{c}\times Y_{c}\right]\leq 12dr+4\ln\left(\tfrac{1}{\operatorname*{\mathbf{Pr}}[\bm{d}>4d\tbinom{r}{2},(\bm{x},\bm{y})\in X_{c}\times Y_{c}]}\right).

Assuming the above, and taking an expectation over Xc×YcX_{c}\times Y_{c} drawn with probability γ(Xc×Yc|𝒅>(r2)4d)\gamma(X_{c}\times Y_{c}\,|\,\bm{d}>\binom{r}{2}\cdot 4d), we immediately obtain Claim 5.5:

𝔼𝒙,𝒚γ[tIr𝒌t|𝒅>(r2)4d]\displaystyle\operatorname*{\mathbb{E}}_{\bm{x},\bm{y}\sim\gamma}\left[\sum_{t\in I_{r}}\bm{k}_{t}\,\middle|\,\bm{d}>{\textstyle\binom{r}{2}}\cdot 4d\right]
12dr+4c{0,1},|c|dγ(Xc×Yc|𝒅>(r2)4d)(ln(1γ(Xc×Yc|𝒅>(r2)4d))+ln(1𝐏𝐫[𝒅>(r2)4d]))\displaystyle\leq 12dr+4\cdot\sum_{\begin{subarray}{c}c\in\{0,1\}^{*},|c|\leq d\end{subarray}}\gamma(X_{c}\times Y_{c}|\bm{d}>{\textstyle\binom{r}{2}}\cdot 4d)\cdot\left(\ln\left(\tfrac{1}{\gamma(X_{c}\times Y_{c}|\bm{d}>\binom{r}{2}\cdot 4d)}\right)+\ln\left(\tfrac{1}{\operatorname*{\mathbf{Pr}}[\bm{d}>\binom{r}{2}\cdot 4d]}\right)\right)
12dr+4ln(3d)+4ln(1𝐏𝐫[𝒅>(r2)4d])\displaystyle\leq 12dr+4\cdot\ln(3^{d})+4\cdot\ln\left(\tfrac{1}{\operatorname*{\mathbf{Pr}}[\bm{d}>\binom{r}{2}\cdot 4d]}\right) (by concavity of ln()\ln(\cdot))
20dr+4ln(1𝐏𝐫[𝒅>(r2)4d]).\displaystyle\leq 20dr+4\cdot\ln\left(\tfrac{1}{\operatorname*{\mathbf{Pr}}[\bm{d}>\binom{r}{2}\cdot 4d]}\right).

To complete the proof, we now prove Claim 5.7.

Proof of Claim 5.7.

Fix any cc such that γ(Xc×Yc|𝒅>(r2)4d)>0\gamma(X_{c}\times Y_{c}\,|\,\bm{d}>\binom{r}{2}\cdot 4d)>0. We will bound the expectation of the quantity tIr𝒌t=tIr𝒙,𝒂(t)2+𝒚,𝒃(t)2\sum_{t\in I_{r}}\bm{k}_{t}=\sum_{t\in I_{r}}\left\langle\bm{x},\bm{a}^{(t)}\right\rangle^{2}+\left\langle\bm{y},\bm{b}^{(t)}\right\rangle^{2} where 𝒙,𝒚\bm{x},\bm{y} are sampled from γn\gamma_{n} conditioned on the event that {(𝒙,𝒚)Xc×Yc}{𝒅>(r2)4d}\{(\bm{x},\bm{y})\in X_{c}\times Y_{c}\}\wedge\{\bm{d}>\binom{r}{2}\cdot 4d\}. Note that 𝒂(t),𝒃(t),𝒅\bm{a}^{(t)},\bm{b}^{(t)},\bm{d} are functions of the previous messages of the protocol and hence also of the inputs 𝒙,𝒚\bm{x},\bm{y}. Once we condition on the above event, the Boolean communication is also fixed to be cc.

To analyze the above conditioning, we first do a thought experiment and consider a different protocol that takes standard Gaussian inputs (without any conditioning) and show a tail bound for the random variable tIr𝒌t\sum_{t\in I_{r}}\bm{k}_{t} for this new protocol. In the last step, we will use it to compute the expectation we ultimately want.

Protocol 𝒞c\mathcal{C}_{c}.

The protocol 𝒞c\mathcal{C}_{c} always communicates according to the fixed transcript cc in a Boolean communication step and otherwise according to the cleanup protocol 𝒞¯\overline{\mathcal{C}} on any input x,yx,y. Consider the random walk on this new protocol tree where the inputs 𝒙,𝒚γ\bm{x},\bm{y}\sim\gamma (without any conditioning). Let (𝒢(t))t(\mathcal{G}^{(t)})_{t} be the associated filtration of the new protocol 𝒞c\mathcal{C}_{c} which can be identified with the collection of all partial transcripts till time tt. Note that the vectors 𝒂(t)\bm{a}^{(t)} and 𝒃(t)\bm{b}^{(t)} in this new protocol are determined only by the previous real communication since the Boolean communication is fixed to cc. This also implies that the vectors 𝒂(t)\bm{a}^{(t)} and 𝒃(t)\bm{b}^{(t)} form a predictable sequence with respect to the filtration (𝒢(t))t(\mathcal{G}^{(t)})_{t}. Moreover, by the definition of the protocol the next non-zero vector 𝒂()\bm{a}^{(\cdot)} is chosen to be a unit vector orthogonal to the previously chosen 𝒂()\bm{a}^{(\cdot)}’s and the same holds for the vectors 𝒃()\bm{b}^{(\cdot)}.

We denote by 𝒌t(c)\bm{k}_{t}^{(c)} the random variable that captures 𝒌t\bm{k}_{t} for the protocol 𝒞c\mathcal{C}_{c}, i.e., 𝒌t(c)=𝒙,𝒂(t)2+𝒚,𝒃(t)2\bm{k}_{t}^{(c)}=\left\langle\bm{x},\bm{a}^{(t)}\right\rangle^{2}+\left\langle\bm{y},\bm{b}^{(t)}\right\rangle^{2} for 𝒙,𝒚γ\bm{x},\bm{y}\sim\gamma and 𝒂(t),𝒃(t)\bm{a}^{(t)},\bm{b}^{(t)} defined by the protocol 𝒞c\mathcal{C}_{c}. Observe that if (𝒙,𝒚)Xc×Yc(\bm{x},\bm{y})\in X_{c}\times Y_{c} then 𝒌t(c)=𝒌t\bm{k}_{t}^{(c)}=\bm{k}_{t}.

Consider the behavior of the protocol 𝒞c\mathcal{C}_{c} at some fixed time tt. The nice thing about the protocol 𝒞c\mathcal{C}_{c} is that conditioned on all previous real messages for τ<t\tau<t, both 𝒙\bm{x} and 𝒚\bm{y} are distributed as standard Gaussians on affine subspaces of n\mathbb{R}^{n} (defined by the previous messages). Then, at time tt, since 𝒂(t)\bm{a}^{(t)} is orthogonal to the directions used in all previous real messages, it follows that the distribution of 𝒙,𝒂(t)\left\langle\bm{x},\bm{a}^{(t)}\right\rangle conditioned on any event in 𝒢(t1)\mathcal{G}^{(t-1)} is a standard Gaussian, independent of the history, whenever 𝒂(t)\bm{a}^{(t)} is non-zero. The same holds for 𝒚,𝒃(t)\left\langle\bm{y},\bm{b}^{(t)}\right\rangle as well. This last fact uses that the projections of a multi-variate standard Gaussian γn\gamma_{n} in orthonormal directions yield independent real-valued standard Gaussians.

This implies that each new 𝒙,𝒂(t)2\left\langle\bm{x},\bm{a}^{(t)}\right\rangle^{2} and 𝒚,𝒃(t)2\left\langle\bm{y},\bm{b}^{(t)}\right\rangle^{2} is an independent chi-squared random variable conditioned on the history (up to depth (r2)4d\binom{r}{2}\cdot 4d) of the random walk. Therefore, Fact 3.2 implies that

𝐏𝐫𝒙,𝒚γ[tIr𝒌t(c)(𝒙,𝒚)2|Ir|+s|𝒢((r2)4d)]es/4.\operatorname*{\mathbf{Pr}}_{\bm{x},\bm{y}\sim\gamma}\left[\sum_{t\in I_{r}}\bm{k}_{t}^{(c)}(\bm{x},\bm{y})\geq 2|I_{r}|+s\,\middle|\,\mathcal{G}^{(\binom{r}{2}\cdot 4d)}\right]\leq e^{-s/4}.

Since |Ir|4dr|I_{r}|\leq 4dr, we have 𝐏𝐫𝒙,𝒚γ[tIr𝒌t(c)(𝒙,𝒚)8dr+s]es/4.\operatorname*{\mathbf{Pr}}_{\bm{x},\bm{y}\sim\gamma}\left[\sum_{t\in I_{r}}\bm{k}^{(c)}_{t}(\bm{x},\bm{y})\geq 8dr+s\right]\leq e^{-s/4}.
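The chi-squared tail invoked here can be sanity-checked by simulation; a small Monte Carlo sketch with placeholder values for the interval length and the slack parameter:

import numpy as np

rng = np.random.default_rng(2)
m, s = 40, 20.0                                     # |I_r| and the slack s
sums = rng.chisquare(df=1, size=(100_000, m)).sum(axis=1)
print(np.mean(sums >= 2 * m + s), np.exp(-s / 4))   # empirical tail vs. the e^{-s/4} bound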

Computing the Original Expectation.

Let us compare the probability of the above tail event in the original protocol 𝒞¯\overline{\mathcal{C}} where inputs 𝒙,𝒚\bm{x},\bm{y} are sampled from γ\gamma conditioned on the event that {(𝒙,𝒚)Xc×Yc}{𝒅>(r2)4d}\{(\bm{x},\bm{y})\in X_{c}\times Y_{c}\}\wedge\{\bm{d}>\binom{r}{2}\cdot 4d\}. We can write

𝐏𝐫(𝒙,𝒚)γ[tIr𝒌t(𝒙,𝒚)8dr+s|𝒅>(r2)4d,(𝒙,𝒚)Xc×Yc]\displaystyle\phantom{\leq}\operatorname*{\mathbf{Pr}}_{(\bm{x},\bm{y})\sim\gamma}\left[\sum_{t\in I_{r}}\bm{k}_{t}(\bm{x},\bm{y})\geq 8dr+s\,\middle|\,\bm{d}>{\textstyle\binom{r}{2}}\cdot 4d,(\bm{x},\bm{y})\in X_{c}\times Y_{c}\right] (5.8)
=𝐏𝐫𝒙,𝒚γ[tIr𝒌t(𝒙,𝒚)8dr+s,(𝒙,𝒚)Xc×Yc,𝒅>(r2)4d]𝐏𝐫𝒙,𝒚γ[(𝒙,𝒚)Xc×Yc,𝒅>(r2)4d].\displaystyle=\frac{\operatorname*{\mathbf{Pr}}_{\bm{x},\bm{y}\sim\gamma}\left[\sum_{t\in I_{r}}\bm{k}_{t}(\bm{x},\bm{y})\geq 8dr+s,(\bm{x},\bm{y})\in X_{c}\times Y_{c},\bm{d}>\binom{r}{2}\cdot 4d\right]}{\operatorname*{\mathbf{Pr}}_{\bm{x},\bm{y}\sim\gamma}\left[(\bm{x},\bm{y})\in X_{c}\times Y_{c},\bm{d}>\binom{r}{2}\cdot 4d\right]}.

We then bound the numerator by

𝐏𝐫𝒙,𝒚γ[tIr𝒌t(𝒙,𝒚)8dr+s,(𝒙,𝒚)Xc×Yc,𝒅>(r2)4d]\displaystyle\phantom{\leq}\operatorname*{\mathbf{Pr}}_{\bm{x},\bm{y}\sim\gamma}\left[\sum_{t\in I_{r}}\bm{k}_{t}(\bm{x},\bm{y})\geq 8dr+s,(\bm{x},\bm{y})\in X_{c}\times Y_{c},\bm{d}>{\textstyle\binom{r}{2}}\cdot 4d\right]
=𝐏𝐫𝒙,𝒚γ[tIr𝒌t(c)(𝒙,𝒚)8dr+s,(𝒙,𝒚)Xc×Yc,𝒅>(r2)4d]\displaystyle=\operatorname*{\mathbf{Pr}}_{\bm{x},\bm{y}\sim\gamma}\left[\sum_{t\in I_{r}}\bm{k}_{t}^{(c)}(\bm{x},\bm{y})\geq 8dr+s,(\bm{x},\bm{y})\in X_{c}\times Y_{c},\bm{d}>{\textstyle\binom{r}{2}}\cdot 4d\right] (if (𝒙,𝒚)Xc×Yc(\bm{x},\bm{y})\in X_{c}\times Y_{c} then 𝒌t(c)=𝒌t\bm{k}_{t}^{(c)}=\bm{k}_{t})
𝐏𝐫𝒙,𝒚γ[tIr𝒌t(c)(𝒙,𝒚)8dr+s]es/4.\displaystyle\leq\operatorname*{\mathbf{Pr}}_{\bm{x},\bm{y}\sim\gamma}\left[\sum_{t\in I_{r}}\bm{k}_{t}^{(c)}(\bm{x},\bm{y})\geq 8dr+s\right]\leq e^{-s/4}.

Note that the inequality gives us an exponential tail on Equation 5.8:

Equation 5.8es/4(𝐏𝐫𝒙,𝒚γ[(𝒙,𝒚)Xc×Yc,𝒅>(r2)4d])1.\lx@cref{creftypecap~refnum}{eq:tail_of_kt}\leq e^{-s/4}\cdot\left(\operatorname*{\mathbf{Pr}}_{\bm{x},\bm{y}\sim\gamma}\left[(\bm{x},\bm{y})\in X_{c}\times Y_{c},\bm{d}>\binom{r}{2}\cdot 4d\right]\right)^{-1}.

We can now integrate the above inequality to get an upper bound on the expected value of tIr𝒌t\sum_{t\in I_{r}}\bm{k}_{t} under the distribution of interest. In particular, since for any non-negative random variable 𝒘\bm{w}, the following holds for any parameter α0\alpha\geq 0:

𝔼[𝒘]=0+𝐏𝐫[𝒘z]dzα+α+𝐏𝐫[𝒘z]dz=α+0+𝐏𝐫[𝒘α+z]dz,\operatorname*{\mathbb{E}}[\bm{w}]=\int_{0}^{+\infty}\operatorname*{\mathbf{Pr}}[\bm{w}\geq z]\mathrm{d}z\leq\alpha+\int_{\alpha}^{+\infty}\operatorname*{\mathbf{Pr}}[\bm{w}\geq z]\mathrm{d}z=\alpha+\int_{0}^{+\infty}\operatorname*{\mathbf{Pr}}[\bm{w}\geq\alpha+z]\mathrm{d}z,

we derive the following by taking α=8dr+4ln(1𝐏𝐫𝒙,𝒚γ[(𝒙,𝒚)Xc×Yc,𝒅>(r2)4d])\alpha=8dr+4\ln\left(\frac{1}{\operatorname*{\mathbf{Pr}}_{\bm{x},\bm{y}\sim\gamma}\left[(\bm{x},\bm{y})\in X_{c}\times Y_{c},\bm{d}>\binom{r}{2}\cdot 4d\right]}\right):

𝔼(𝒙,𝒚)γ[tIr𝒌t(𝒙,𝒚)|𝒅>(r2)4d,(𝒙,𝒚)Xc×Yc]\displaystyle\operatorname*{\mathbb{E}}_{(\bm{x},\bm{y})\sim\gamma}\left[\sum_{t\in I_{r}}\bm{k}_{t}(\bm{x},\bm{y})\,\middle|\,\bm{d}>{\textstyle\binom{r}{2}}\cdot 4d,(\bm{x},\bm{y})\in X_{c}\times Y_{c}\right]
α+0+ez/4dz=α+4\displaystyle\qquad\leq\alpha+\int_{0}^{+\infty}e^{-z/4}\mathrm{d}z=\alpha+4
12dr+4ln(1𝐏𝐫𝒙,𝒚γ[(𝒙,𝒚)Xc×Yc,𝒅>(r2)4d]).\displaystyle\qquad\leq 12dr+4\ln\left(\dfrac{1}{\operatorname*{\mathbf{Pr}}_{\bm{x},\bm{y}\sim\gamma}\left[(\bm{x},\bm{y})\in X_{c}\times Y_{c},\bm{d}>\binom{r}{2}\cdot 4d\right]}\right).

This completes the proof of Claim 5.7. ∎
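As an aside, the integration step above is tight for an exact exponential tail: if 𝐏𝐫[𝒘α+z]=ez/4\operatorname*{\mathbf{Pr}}[\bm{w}\geq\alpha+z]=e^{-z/4} exactly, then 𝔼[𝒘]=α+4\operatorname*{\mathbb{E}}[\bm{w}]=\alpha+4. A quick numerical illustration (the value of α\alpha below is arbitrary):

import numpy as np

rng = np.random.default_rng(3)
alpha = 8.0
w = alpha + rng.exponential(scale=4.0, size=10**6)    # Pr[w >= alpha + z] = e^{-z/4}
zs = np.linspace(0.0, 200.0, 10**6)
integral = np.sum(np.exp(-zs / 4)) * (zs[1] - zs[0])  # ~ int_0^infty e^{-z/4} dz = 4
print(w.mean(), alpha + integral)                     # both are close to alpha + 4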

5.4.2 Projection on the Orthogonal Subspaces 𝑯A\bm{H}_{A}^{\bot} and 𝑯B\bm{H}_{B}^{\bot}

We shall show that the expected squared norm of the final center of mass when projected on the subspaces 𝑯A\bm{H}_{A}^{\bot} and 𝑯B\bm{H}_{B}^{\bot} is

𝔼[𝒒A+𝒒B]=O(d).\operatorname*{\mathbb{E}}[\bm{q}_{A}+\bm{q}_{B}]=O(d).

Recall that 𝒒A=𝚷𝑯A𝒖(𝒅)2\bm{q}_{A}=\left\|\bm{\Pi}_{\bm{H}_{A}^{\bot}}\bm{u}^{(\bm{d})}\right\|^{2} where 𝑯A\bm{H}_{A} is the (random) linear subspace spanned by the orthonormal set of vectors 𝒂(0),,𝒂(𝒅)\bm{a}^{(0)},\ldots,\bm{a}^{(\bm{d})} and 𝑯A\bm{H}_{A}^{\bot} its orthogonal complement. Moreover, the vectors 𝒂(t)\bm{a}^{(t)} are determined by the previous Boolean and real communication. A similar statement holds for 𝒒B\bm{q}_{B} and the vectors 𝒃(t)\bm{b}^{(t)} as well.

The proof will follow in two steps. We will first show that one can bound the norm of the projection 𝚷𝑯A𝒖(d)\bm{\Pi}_{\bm{H}_{A}^{\bot}}\bm{u}^{(d)}, which turns out to be the Gaussian center of mass of a set that lives in the subspace 𝑯A\bm{H}_{A}^{\bot}, in terms of the logarithm of the inverse relative measure with respect to the subspace. Note that the Gaussian measure here is the Gaussian measure γ𝑯A\gamma_{\bm{H}_{A}^{\bot}} on the subspace 𝑯A\bm{H}_{A}^{\bot}. The case for 𝚷𝑯B𝒗(d)\bm{\Pi}_{\bm{H}_{B}^{\bot}}\bm{v}^{(d)} will be similar. The second step uses an information-theoretic convexity argument to show that, on average, the logarithm of the inverse relative measure is small.

For the first part, we observe that if we sample 𝒙,𝒚γ\bm{x},\bm{y}\sim\gamma and take a random walk on this protocol tree, we obtain a probability measure over transcripts which includes both real and Boolean values. Recall that the Boolean transcript is determined by the original protocol and only depends on the previous Boolean communication, and the real communication is interleaved with the Boolean communication. Let =(𝒄,𝒓)\bm{\ell}=(\bm{c},\bm{r}) denote the random variable representing the full transcript of the generalized protocol where 𝒄\bm{c} is the Boolean communication and 𝒓\bm{r} is the real communication. For any given transcript \bm{\ell}, let 𝑿×𝒀\bm{X}_{\bm{\ell}}\times\bm{Y}_{\bm{\ell}} denote the corresponding rectangle consisting of the inputs reaching the leaf, and let 𝑿𝒄×𝒀𝒄\bm{X}_{\bm{c}}\times\bm{Y}_{\bm{c}} (for 𝑿𝒄,𝒀𝒄n\bm{X}_{\bm{c}},\bm{Y}_{\bm{c}}\subseteq\mathbb{R}^{n}) denote the rectangle consisting of all pairs of inputs to Alice and Bob that result in the Boolean transcript 𝒄\bm{c}. Note that the real communication 𝒓\bm{r} together with 𝒄\bm{c} fixes the subspaces 𝑯A\bm{H}_{A} and 𝑯B\bm{H}_{B} and particular shift vectors 𝒔A\bm{s}_{A} and 𝒔B\bm{s}_{B}, depending on the values of the inner products determined by the full transcript. In particular, the rectangle 𝑿×𝒀\bm{X}_{\bm{\ell}}\times\bm{Y}_{\bm{\ell}} consistent with the full transcript =(𝒄,𝒓)\bm{\ell}=(\bm{c},\bm{r}) is given by 𝑿=𝑿𝒄(𝑯A+𝒔A)\bm{X}_{\bm{\ell}}=\bm{X}_{\bm{c}}\cap(\bm{H}_{A}^{\bot}+\bm{s}_{A}) and 𝒀=𝒀𝒄(𝑯B+𝒔B)\bm{Y}_{\bm{\ell}}=\bm{Y}_{\bm{c}}\cap(\bm{H}_{B}^{\bot}+\bm{s}_{B}), i.e., taking (random) affine slices of the original sets.

Note that 𝒖(𝒅)\bm{u}^{(\bm{d})} and 𝒗(𝒅)\bm{v}^{(\bm{d})} are distributed as the centers of mass of the final rectangle 𝑿×𝒀\bm{X}_{\bm{\ell}}\times\bm{Y}_{\bm{\ell}}, and thus it suffices to look at the rectangles for the rest of the argument. Since 𝑿\bm{X}_{\bm{\ell}} (resp., 𝒀\bm{Y}_{\bm{\ell}}) lies in some affine shift of 𝑯A\bm{H}_{A}^{\bot} (resp., 𝑯B\bm{H}_{B}^{\bot}), defining the relative center of mass for a set AA that lives in the ambient linear subspace VV, as μV(A)=𝔼𝒙γV[𝒙|𝒙A]\mu_{V}(A)=\operatorname*{\mathbb{E}}_{\bm{x}\sim\gamma_{V}}[\bm{x}~{}|~{}\bm{x}\in A] where the Gaussian measure γV\gamma_{V} is on the subspace VV, it follows that

𝔼[𝒒A+𝒒B]\displaystyle\operatorname*{\mathbb{E}}\left[\bm{q}_{A}+\bm{q}_{B}\right] =𝔼[𝚷𝑯A𝒖(𝒅)2+𝚷𝑯B𝒗(𝒅)2]=𝔼[μ𝑯A(𝚷𝑯A𝑿)2+μ𝑯B(𝚷𝑯B𝒀)2].\displaystyle=\operatorname*{\mathbb{E}}\left[\left\|\bm{\Pi}_{\bm{H}_{A}^{\bot}}\bm{u}^{(\bm{d})}\right\|^{2}+\left\|\bm{\Pi}_{\bm{H}_{B}^{\bot}}\bm{v}^{(\bm{d})}\right\|^{2}\right]=\operatorname*{\mathbb{E}}_{\bm{\ell}}\left[\|\mu_{\bm{H}_{A}^{\perp}}(\bm{\Pi}_{\bm{H}_{A}^{\bot}}\bm{X}_{\bm{\ell}})\|^{2}+\|\mu_{\bm{H}_{B}^{\perp}}(\bm{\Pi}_{\bm{H}_{B}^{\bot}}\bm{Y}_{\bm{\ell}})\|^{2}\right].

Recalling that γrel\gamma_{\mathrm{rel}} is the Gaussian measure of a set relative to its ambient space, we will show:

Claim 5.8.

μ𝑯A(𝚷𝑯A𝑿)22e2ln(eγrel(𝑿))\|\mu_{\bm{H}_{A}^{\perp}}(\bm{\Pi}_{\bm{H}_{A}^{\bot}}\bm{X}_{\bm{\ell}})\|^{2}\leq 2e^{2}\ln\left(\dfrac{e}{\gamma_{\mathrm{rel}}\left(\bm{X}_{\bm{\ell}}\right)}\right) and μ𝑯B(𝚷𝑯B𝒀)22e2ln(eγrel(𝒀))\|\mu_{\bm{H}_{B}^{\perp}}(\bm{\Pi}_{\bm{H}_{B}^{\bot}}\bm{Y}_{\bm{\ell}})\|^{2}\leq 2e^{2}\ln\left(\dfrac{e}{\gamma_{\mathrm{rel}}\left(\bm{Y}_{\bm{\ell}}\right)}\right).

Note that we can ignore the case when γrel(𝑿)\gamma_{\mathrm{rel}}(\bm{X}_{\bm{\ell}}) is zero above, since we will eventually take an expectation over \bm{\ell} and almost surely this measure is non-zero.

Using the previous claim,

𝔼[𝒒A+𝒒B]\displaystyle\operatorname*{\mathbb{E}}\left[\bm{q}_{A}+\bm{q}_{B}\right] =𝔼[𝚷𝑯A𝒖(𝒅)2+𝚷𝑯B𝒗(𝒅)2]2e2𝔼[ln(eγrel(𝑿×𝒀))].\displaystyle=\operatorname*{\mathbb{E}}\left[\left\|\bm{\Pi}_{\bm{H}_{A}^{\bot}}\bm{u}^{(\bm{d})}\right\|^{2}+\left\|\bm{\Pi}_{\bm{H}_{B}^{\bot}}\bm{v}^{(\bm{d})}\right\|^{2}\right]\leq 2e^{2}\cdot\operatorname*{\mathbb{E}}_{\bm{\ell}}\left[\ln\left(\frac{e}{\gamma_{\mathrm{rel}}\left({\bm{X}_{\bm{\ell}}\times\bm{Y}_{\bm{\ell}}}\right)}\right)\right].

For the second step of the proof, we show the next claim which relies on convexity arguments to bound the right hand side above by O(d)O(d). This is similar in spirit to chain-style arguments from information theory.

Claim 5.9.

𝔼[ln(eγrel(𝑿×𝒀))]=O(d)\operatorname*{\mathbb{E}}_{\bm{\ell}}\left[\ln\left(\dfrac{e}{\gamma_{\mathrm{rel}}\left({\bm{X}_{\bm{\ell}}\times\bm{Y}_{\bm{\ell}}}\right)}\right)\right]=O(d).

This gives us the final bound 𝔼[𝒒A+𝒒B]=O(d)\operatorname*{\mathbb{E}}\left[\bm{q}_{A}+\bm{q}_{B}\right]=O(d) assuming the claims which we now prove.

Proof of Claim 5.8.

We can bound the norm of the above projection by an application of the Gaussian level-one inequality (Theorem 3.1), which, by rotational symmetry, implies that if AA is a subset of a linear subspace VV with non-zero measure, then

μV(A)22e2ln(eγV(A)),\displaystyle\|\mu_{V}(A)\|^{2}\leq 2e^{2}\ln\left(\frac{e}{\gamma_{V}(A)}\right), (5.9)

where recall that μV(A)=𝔼𝒙γV[𝒙|𝒙A]\mu_{V}(A)=\operatorname*{\mathbb{E}}_{\bm{x}\sim\gamma_{V}}[\bm{x}~{}|~{}\bm{x}\in A] is the center of mass with respect to the Gaussian measure γV\gamma_{V} on the subspace VV.
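As a sanity check of Equation 5.9, both sides can be estimated by Monte Carlo for a simple test set such as a halfspace; a short sketch (the dimension and the threshold 1.5 are arbitrary choices):

import numpy as np

rng = np.random.default_rng(4)
x = rng.normal(size=(10**6, 4))
A = x[x[:, 0] >= 1.5]                     # test set: the halfspace {x_1 >= 1.5}
measure = len(A) / len(x)                 # ~ gamma(A)
mu = A.mean(axis=0)                       # center of mass mu(A)
lhs = np.dot(mu, mu)
rhs = 2 * np.e**2 * np.log(np.e / measure)
print(lhs, rhs)                           # lhs <= rhs (the bound is quite loose here)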

If we run the generalized protocol on 𝒙,𝒚γ\bm{x},\bm{y}\sim\gamma and condition on getting the full transcript \bm{\ell}, the conditional probability measure on 𝚷𝑯A𝒙\bm{\Pi}_{\bm{H}_{A}^{\bot}}\bm{x} is that of the Gaussian measure γ𝑯A\gamma_{\bm{H}_{A}^{\bot}} conditioned on 𝒙𝑿𝒔A\bm{x}\in\bm{X}_{\bm{\ell}}-\bm{s}_{A}, and the conditional probability measure on 𝚷𝑯B𝒚\bm{\Pi}_{\bm{H}_{B}^{\bot}}\bm{y} is that of the Gaussian measure γ𝑯B\gamma_{\bm{H}_{B}^{\bot}} conditioned on 𝒚𝒀𝒔B\bm{y}\in\bm{Y}_{\bm{\ell}}-\bm{s}_{B}, and they are independent. This follows from the fact that so far the parties have fixed inner products along a basis of the subspaces 𝑯A\bm{H}_{A} and 𝑯B\bm{H}_{B}, and the fact that the projections of a standard Gaussian onto orthogonal subspaces are independent.

Thus, applying Equation 5.9, we have

μ𝑯A(𝚷𝑯A𝑿)22e2ln(eγ𝑯A(𝑿𝒔A))=2e2ln(eγrel(𝑿)),\displaystyle\|\mu_{\bm{H}_{A}^{\bot}}(\bm{\Pi}_{\bm{H}_{A}^{\bot}}\bm{X}_{\bm{\ell}})\|^{2}\leq 2e^{2}\ln\left(\frac{e}{\gamma_{\bm{H}_{A}^{\bot}}(\bm{X}_{\bm{\ell}}-\bm{s}_{A})}\right)=2e^{2}\ln\left(\frac{e}{\gamma_{\mathrm{rel}}(\bm{X}_{\bm{\ell}})}\right),

where the last line follows since 𝑯A+𝒔A\bm{H}_{A}^{\bot}+\bm{s}_{A} is the ambient space for 𝑿\bm{X}_{\bm{\ell}} (this holds almost surely) and γrel(S)=γV(St)\gamma_{\mathrm{rel}}(S)=\gamma_{V}(S-t) if V+tV+t is the ambient space of SS. A similar argument proves the bound on μ𝑯B(𝚷𝑯B𝒀)2\|\mu_{\bm{H}_{B}^{\bot}}(\bm{\Pi}_{\bm{H}_{B}^{\bot}}\bm{Y}_{\bm{\ell}})\|^{2}. ∎

Proof of Claim 5.9.

For this claim, it will be convenient to consider a different generalized protocol 𝒞\mathcal{C}^{\prime} that generates the same distribution on the leaves \bm{\ell}. In particular, since the Boolean messages in the generalized protocol 𝒞¯\overline{\mathcal{C}} only depend on the previous Boolean messages, one can first send all the Boolean messages 𝒄\bm{c}, and then send all the real messages 𝒓\bm{r}, choosing them according to the protocol 𝒞¯\overline{\mathcal{C}} depending on the previous real messages and the (partial) Boolean transcript. Note that the protocol 𝒞\mathcal{C}^{\prime} generates the same distribution on the leaves \bm{\ell} when the inputs 𝒙,𝒚γn\bm{x},\bm{y}\sim\gamma_{n}. In particular, the real communication only partitions each rectangle Xc×YcX_{c}\times Y_{c} that corresponds to the Boolean transcript cc into affine slices. (We remark that the protocol 𝒞\mathcal{C}^{\prime} suffices for proving this claim since we are looking only at the leaves. However, unlike Lemma 5.3, directly bounding the expected quadratic variation of the martingale corresponding to the protocol 𝒞\mathcal{C}^{\prime} is difficult.)

For the rest of the claim, we work with the protocol 𝒞\mathcal{C}^{\prime} where the Boolean communication happens first. To prove the claim, we condition on a Boolean transcript 𝒄=c\bm{c}=c and show by induction that

𝔼𝒓[ln(eγrel(𝑿(c,𝒓)×𝒀(c,𝒓)))|𝒄=c]ln(eγrel(Xc×Yc)),\displaystyle\operatorname*{\mathbb{E}}_{\bm{r}}\left[\ln\left(\dfrac{e}{\gamma_{\mathrm{rel}}(\bm{X}_{(c,\bm{r})}\times\bm{Y}_{(c,\bm{r})})}\right)\,\middle|\,\bm{c}=c\right]\leq\ln\left(\dfrac{e}{\gamma_{\mathrm{rel}}(X_{c}\times Y_{c})}\right), (5.10)

where (c,r)(c,r) is the full transcript and Xc×YcX_{c}\times Y_{c} is the rectangle containing all the inputs such that the Boolean transcript is cc. Note that γrel(Xc×Yc)\gamma_{\mathrm{rel}}(X_{c}\times Y_{c}) is the probability of obtaining the Boolean transcript cc since the ambient space of XcX_{c} and YcY_{c} is n\mathbb{R}^{n}.

Then, taking expectation over the Boolean transcript cc,

𝔼[ln(eγrel(𝑿×𝒀))]\displaystyle\operatorname*{\mathbb{E}}_{\bm{\ell}}\left[\ln\left(\dfrac{e}{\gamma_{\mathrm{rel}}(\bm{X}_{\bm{\ell}}\times\bm{Y}_{\bm{\ell}})}\right)\right] 𝔼𝒄[ln(eγrel(𝑿𝒄×𝒀𝒄))]\displaystyle\leq\operatorname*{\mathbb{E}}_{\bm{c}}\left[\ln\left(\dfrac{e}{\gamma_{\mathrm{rel}}(\bm{X}_{\bm{c}}\times\bm{Y}_{\bm{c}})}\right)\right]
=c{0,1},|c|d𝐏𝐫[𝒄=c]ln(e𝐏𝐫[𝒄=c])\displaystyle=\sum_{\begin{subarray}{c}c\in\{0,1\}^{*},|c|\leq d\end{subarray}}\operatorname*{\mathbf{Pr}}[\bm{c}=c]\ln\left(\dfrac{e}{\operatorname*{\mathbf{Pr}}[\bm{c}=c]}\right)
ln(2e2d)=O(d),\displaystyle\leq\ln(2e\cdot 2^{d})=O(d),

where the last line follows from concavity.
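The concavity step is the usual entropy bound cpcln(e/pc)=1+H(p)1+ln(#transcripts)\sum_{c}p_{c}\ln(e/p_{c})=1+H(p)\leq 1+\ln(\#\text{transcripts}); a brief numerical check with an arbitrary distribution on the transcripts:

import numpy as np

rng = np.random.default_rng(5)
d = 8
m = 2 ** (d + 1) - 2                      # number of Boolean transcripts of length <= d
p = rng.dirichlet(np.ones(m))             # an arbitrary distribution over transcripts
lhs = np.sum(p * np.log(np.e / p))
print(lhs, np.log(2 * np.e * 2 ** d))     # lhs <= ln(2e * 2^d)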

Induction.

To complete the proof, we now show Equation 5.10 by induction. For this, let us look at an intermediate step tt in 𝒞\mathcal{C}^{\prime} where the Boolean communication is fixed to cc and Alice and Bob have exchanged some real messages r<t:=r1,,rt1r_{<t}:=r_{1},\ldots,r_{t-1}. Let the current rectangle be X(c,r<t)×Y(c,r<t)X_{(c,r_{<t})}\times Y_{(c,r_{<t})}, and suppose it is Alice’s turn to speak. Note that X(c,r<t)X_{(c,r_{<t})} and Y(c,r<t)Y_{(c,r_{<t})} live in some affine subspaces at this point, and in the current round Alice sends the inner product of her input xx with a vector a(t)a^{(t)} that is determined by the previous messages and orthogonal to all the previous directions a(1),,a(t1)a^{(1)},\ldots,a^{(t-1)}. At this step, Bob’s set Y(c,r<t)Y_{(c,r_{<t})} does not change at all. We shall show that in each step, the log of the inverse of the relative measure of the current rectangle does not increase on average over the next message:

𝔼𝒓t[ln(eγrel(𝑿(c,𝒓t)))|𝒄=c,𝒓<t=r<t]ln(eγrel(X(c,r<t))),\displaystyle\operatorname*{\mathbb{E}}_{\bm{r}_{\leq t}}\left[\ln\left(\dfrac{e}{\gamma_{\mathrm{rel}}(\bm{X}_{(c,\bm{r}_{\leq t})})}\right)\,\middle|\,\bm{c}=c,\bm{r}_{<t}=r_{<t}\right]\leq\ln\left(\dfrac{e}{\gamma_{\mathrm{rel}}(X_{(c,r_{<t})})}\right), (5.11)

and an analogous statement holds when Bob speaks. Taking an expectation over 𝒓<t\bm{r}_{<t}, the above directly implies (5.10) by a straightforward backward induction:

𝔼𝒓t[ln(eγrel(𝑿(c,𝒓t)×𝒀(c,𝒓t)))|𝒄=c]\displaystyle\operatorname*{\mathbb{E}}_{\bm{r}_{\leq t}}\left[\ln\left(\dfrac{e}{\gamma_{\mathrm{rel}}(\bm{X}_{(c,\bm{r}_{\leq t})}\times\bm{Y}_{(c,{\bm{r}_{\leq t})}})}\right)\,\middle|\,\bm{c}=c\right] 𝔼𝒓<t[ln(eγrel(𝑿(c,𝒓<t)×𝒀(c,𝒓<t)))|𝒄=c]\displaystyle\leq\operatorname*{\mathbb{E}}_{\bm{r}_{<t}}\left[\ln\left(\dfrac{e}{\gamma_{\mathrm{rel}}(\bm{X}_{(c,\bm{r}_{<t})}\times\bm{Y}_{(c,{\bm{r}_{<t})}})}\right)\,\middle|\,\bm{c}=c\right]
ln(eγrel(Xc×Yc)).\displaystyle\leq\cdots\leq\ln\left(\dfrac{e}{\gamma_{\mathrm{rel}}(X_{c}\times Y_{c})}\right).

To see Equation 5.11, let us write X:=X(c,r<t)X:=X_{(c,r_{<t})} for Alice’s current set. Recall that since we have fixed the history, Alice has fixed inner products with some orthogonal directions a(1),,a(t1)a^{(1)},\ldots,a^{(t-1)} and she has decided on the next direction a:=a(t)a:=a^{(t)} along which she will send the next inner product. Thus, XX lives in some fixed affine subspace V+sV^{\bot}+s where VV is the span of a(1),,a(t1)a^{(1)},\ldots,a^{(t-1)}, and the next message is r:=rt=x,ar:=r_{t}=\left\langle x,a\right\rangle. Moreover, conditioned on the history till this point, the conditional probability distribution on Alice’s input 𝒙n\bm{x}\in\mathbb{R}^{n} can be described as follows: the projections corresponding to the non-zero vectors in the sequence a(1),,a(t1)a^{(1)},\ldots,a^{(t-1)}, i.e., the inner products 𝒙,a(τ)\left\langle\bm{x},a^{(\tau)}\right\rangle where a(τ)0a^{(\tau)}\neq 0 for τ<t\tau<t, are fixed according to the shift ss, while the distribution on the orthogonal complement VV^{\bot} is that of the Gaussian measure γV\gamma_{V^{\bot}} on the subspace VV^{\bot} after conditioning on the event that 𝒙Xs\bm{x}\in X-s (which lives in VV^{\bot}). This uses that projections of a standard nn-dimensional Gaussian in orthogonal directions are independent.

Let kk be the dimension of VV where k<nk<n. Then, by doing a linear transformation, we may assume that V=nkV^{\bot}=\mathbb{R}^{n-k} (and thus XnkX\subseteq\mathbb{R}^{n-k} and the shift ss fixes the coordinates nk+1n-k+1 through nn) and a=e1a=e_{1}, i.e., in the current message Alice reveals the first coordinate of 𝒙nk\bm{x}\in\mathbb{R}^{n-k} where 𝒙\bm{x} is sampled from γnk\gamma_{n-k} conditioned on 𝒙X\bm{x}\in X. In this case, γrel\gamma_{\mathrm{rel}} in the left hand side of Equation 5.11 is exactly γrel(X{x1=r})\gamma_{\mathrm{rel}}(X\cap\{x_{1}=r\}) if Alice sends rr as the message, while for the right hand side of Equation 5.11, we have γrel(X)=γnk(X)\gamma_{\mathrm{rel}}(X)=\gamma_{n-k}(X). Denoting by dμx1\mathrm{d}\mu_{x_{1}} the probability density function of 𝒙1\bm{x}_{1}, our statement boils down to showing

ln(eγrel(X{x1=r}))dμx1(r)\displaystyle\int_{\mathbb{R}}\ln\left(\dfrac{e}{\gamma_{\mathrm{rel}}(X\cap\{x_{1}=r\})}\right)\mathrm{d}\mu_{x_{1}}(r) ln(eγnk(X)).\displaystyle\leq\ln\left(\dfrac{e}{\gamma_{n-k}(X)}\right).

We show the above by explicitly writing the probability density function dμx1\mathrm{d}\mu_{x_{1}}. Denote by dγnk(x1,,xnk)\mathrm{d}\gamma_{n-k}(x_{1},\ldots,x_{n-k}) the standard Gaussian density function in nk\mathbb{R}^{n-k} (explicitly, dγm(x1,,xm)=i=1mdγ1(xi)\mathrm{d}\gamma_{m}(x_{1},\ldots,x_{m})=\prod_{i=1}^{m}\mathrm{d}\gamma_{1}(x_{i}) where dγ1(r)=12πer2/2\mathrm{d}\gamma_{1}(r)=\frac{1}{\sqrt{2\pi}}e^{-r^{2}/2} is the density function of the one-dimensional standard Gaussian). The density function of the random vector 𝒙\bm{x} sampled from γnk\gamma_{n-k} conditioned on xXx\in X is given by γnk(X)1dγnk(x1,,xnk){\gamma_{n-k}(X)}^{-1}\cdot{\mathrm{d}\gamma_{n-k}(x_{1},\ldots,x_{n-k})} for xXx\in X and zero outside. Thus, we have

dμx1(r)\displaystyle\mathrm{d}\mu_{x_{1}}(r) =X{x1=r}dγnk(x1,,xnk)γnk(X)\displaystyle=\frac{\int_{X\cap\{x_{1}=r\}}\mathrm{d}\gamma_{n-k}(x_{1},\ldots,x_{n-k})}{\gamma_{n-k}(X)}
=dγ1(r)X{x1=r}dγnk1(x2,,xnk)γnk(X)=dγ1(r)γrel(X{x1=r})γnk(X).\displaystyle=\mathrm{d}\gamma_{1}(r)\cdot\frac{\int_{X\cap\{x_{1}=r\}}\mathrm{d}\gamma_{n-k-1}(x_{2},\ldots,x_{n-k})}{\gamma_{n-k}(X)}=\mathrm{d}\gamma_{1}(r)\cdot\frac{\gamma_{\mathrm{rel}}(X\cap\{x_{1}=r\})}{\gamma_{n-k}(X)}.

Then, by concavity, the left hand side of Equation 5.11 is exactly given by

ln(eγrel(X{x1=r}))dμx1(r)\displaystyle\int_{\mathbb{R}}\ln\left(\dfrac{e}{\gamma_{\mathrm{rel}}(X\cap\{x_{1}=r\})}\right)\mathrm{d}\mu_{x_{1}}(r) ln(eγrel(X{x1=r})dμx1(r))\displaystyle\leq\ln\left(\int_{\mathbb{R}}\dfrac{e}{\gamma_{\mathrm{rel}}(X\cap\{x_{1}=r\})}\mathrm{d}\mu_{x_{1}}(r)\right)
=ln(eγnk(X)dγ1(r))=ln(eγnk(X)).\displaystyle=\ln\left(\dfrac{e}{\gamma_{n-k}(X)}\int_{\mathbb{R}}\mathrm{d}\gamma_{1}(r)\right)=\ln\left(\dfrac{e}{\gamma_{n-k}(X)}\right).\qed
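To illustrate Equation 5.11 concretely, take nk=2n-k=2 and X={x:x1+x20}X=\{x:x_{1}+x_{2}\geq 0\}, so that γ(X)=1/2\gamma(X)=1/2 and the slice X{x1=r}X\cap\{x_{1}=r\} has relative measure Φ(r)\Phi(r), the standard normal CDF. A Monte Carlo sketch comparing the two sides (using scipy's normal CDF):

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(6)
x = rng.normal(size=(10**6, 2))
X = x[x[:, 0] + x[:, 1] >= 0]             # gamma(X) = 1/2
# conditioned on x in X, the first coordinate has exactly the law mu_{x_1}
lhs = np.mean(np.log(np.e / norm.cdf(X[:, 0])))
print(lhs, np.log(np.e / 0.5))            # lhs (about 1.5) <= ln(2e) (about 1.69)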

6 Level-Two Fourier Growth

In this section, we prove Theorem 1.3 that L1,2(h)=O(d3/2log3(n))L_{1,2}(h)=O\left(d^{3/2}\log^{3}(n)\right). Similar to the proof of the level-one bound (Theorem 1.2), we start with a dd-round communication protocol 𝒞~\widetilde{\mathcal{C}} over the Gaussian space as defined in Section 4. Note that 𝒞~\widetilde{\mathcal{C}} in turn comes from the original Boolean communication protocol 𝒞\mathcal{C}. Thus in the following we assume without loss of generality dnd\leq n.

Given the discussion in Subsection 4.3, to bound the second-level Fourier growth, one can attempt to bound the expected quadratic variation of the martingale that results from the protocol 𝒞¯\overline{\mathcal{C}} directly, but similar to the level-one case, it is hard to leverage cancellations here to prove the bound we aim for. So, starting from 𝒞~\widetilde{\mathcal{C}}, we will define a communication protocol 𝒞¯\overline{\mathcal{C}} that computes the same function as 𝒞~\widetilde{\mathcal{C}}, but satisfies an additional “clean” property under which it is easier to control the quadratic variation. This new protocol will differ from 𝒞~\widetilde{\mathcal{C}} in two ways. Firstly, the protocol 𝒞¯\overline{\mathcal{C}} will consist of additional “cleanup steps” where Alice and Bob reveal certain quadratic forms of their input. Secondly, the protocol 𝒞¯\overline{\mathcal{C}} will send the real value of the quadratic form with certain precision. Note that this protocol will not involve sending real messages at all; instead, any potential real messages will be truncated to a few bits of precision and sent as Boolean messages.

We emphasize that the main difference in the protocol 𝒞¯\overline{\mathcal{C}} from the corresponding level-one variant comes from the precision control, which is not needed there since the Gaussian distribution remains a (lower-dimensional) Gaussian under linear projections. For technical reasons we shall also need to analyze the martingale under a truncated Gaussian distribution, where all coordinates are bounded in some large interval [T,T][-T,T]. This intuitively does not incur a noticeable difference in the distribution, since coordinates drawn from the Gaussian distribution are highly unlikely to fall outside such intervals; recalling Remark 4.2 and Proposition 4.4, it still suffices to analyze the corresponding martingale under the truncated Gaussian distribution.

We next define the notion of a 44-wise clean protocol.

6.1 44-Wise Clean Protocols

Consider an intermediate node in the protocol and let XnX\subseteq\mathbb{R}^{n} refer to the set of Alice’s inputs reaching this node. We denote by 𝕊n×n1\mathbb{S}^{n\times n-1} the set of all matrices in n×n\mathbb{R}^{n\times n} with zero diagonal and unit norm (when viewed as n2n^{2}-dimensional vectors). For a parameter λ>0\lambda>0, we say that the set XX is 44-wise clean in a direction a𝕊n×n1a\in\mathbb{S}^{n\times n-1} if

𝔼𝒙γ[𝒙𝒙σ(X),a2|𝒙X]<λ,\operatorname*{\mathbb{E}}_{\bm{x}\sim\gamma}\left[\left\langle\bm{x}\overset{\mathchoice{\mathbin{{\hbox{\scalebox{0.5}{$\displaystyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\textstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptscriptstyle\bullet$}}}}}}{\otimes}\bm{x}-\sigma(X),a\right\rangle^{2}\,\middle|\,\bm{x}\in X\right]<\lambda,

where we recall that σ(X)=𝔼𝒙γ[𝒙𝒙|𝒙X]\sigma(X)=\operatorname*{\mathbb{E}}_{\bm{x}\sim\gamma}\left[\bm{x}\overset{\mathchoice{\mathbin{{\hbox{\scalebox{0.5}{$\displaystyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\textstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptscriptstyle\bullet$}}}}}}{\otimes}\bm{x}\,\middle|\,\bm{x}\in X\right] is the level-two center of mass of XX under the Gaussian measure. We say that the set XX is 44-wise clean if it is 44-wise clean in every direction aa. Our new protocol will consist of the original protocol, interspersed with cleaning steps. Once Alice sends her bit as in the original protocol, she cleans XX by revealing xx,a\left\langle x\overset{\mathchoice{\mathbin{{\hbox{\scalebox{0.5}{$\displaystyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\textstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptscriptstyle\bullet$}}}}}}{\otimes}x,a\right\rangle with a few bits of precision while there exists a direction a𝕊n×n1a\in\mathbb{S}^{n\times n-1} such that XX is not clean in direction aa. Once XX becomes clean, Alice proceeds to the next round and Bob does an analogous cleanup. We now describe this formally.

Communication with Finite Precision.

Let the positive integer LL be a precision parameter that we will use for truncation. In our new communication protocol, we will send real numbers with precision 2L2^{-L}. This is formalized via the truncL(z)\mathrm{trunc}_{L}(z) function, defined for zz\in\mathbb{R} as

truncL(z)=z2L/2L.\mathrm{trunc}_{L}(z)=\left\lfloor z\cdot 2^{L}\right\rfloor/2^{L}.
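In code, this truncation is simply (a two-line sketch):

import math

def trunc(z, L):
    return math.floor(z * 2 ** L) / 2 ** L    # round z down to L bits of precision

assert trunc(math.pi, 4) == 3.125             # floor(pi * 16) / 16 = 50/16
assert 0 <= math.pi - trunc(math.pi, 4) < 2 ** -4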
Construct 𝒞¯\overline{\mathcal{C}} from 𝒞~\widetilde{\mathcal{C}}.

As described before, 𝒞¯\overline{\mathcal{C}} will consist of the original protocol along with extra steps where Alice or Bob reveal the (approximate) value of a quadratic form on their input. Consider an intermediate node of this new protocol at depth tt. We always use the random variable 𝑿(t)\bm{X}^{(t)} (resp., 𝒀(t)\bm{Y}^{(t)}) to denote the set of inputs of Alice (resp., Bob) reaching the node. If Alice is revealing a quadratic form in this step, we use 𝒂(t)\bm{a}^{(t)} to denote the matrix of the quadratic form revealed at this node, otherwise set 𝒂(t)\bm{a}^{(t)} to be the all-zeroes matrix. We define 𝒃(t)\bm{b}^{(t)} similarly for Bob. Throughout the protocol, we will always set 𝒖(t)\bm{u}^{(t)} and 𝒗(t)\bm{v}^{(t)} to denote σ(𝑿(t))\sigma(\bm{X}^{(t)}) and σ(𝒀(t))\sigma(\bm{Y}^{(t)}) respectively.

Recall that λ>0\lambda>0 is the parameter for cleanup to be optimized later. Since we will now send real numbers (with certain precision) as bit-strings, their magnitudes should also be well controlled to guarantee bounded message length. This is managed by a parameter T>0T>0 and we will restrict the inputs to the parties in 𝒞¯\overline{\mathcal{C}} to come from the box [T,T]n[-T,T]^{n}. Note that, by Gaussian concentration, T=Θ(log(n))T=\Theta\left(\sqrt{\log(n)}\right) suffices.

  1. 1.

    At the beginning, Alice receives an input x[T,T]nx\in[-T,T]^{n} and Bob receives an input y[T,T]ny\in[-T,T]^{n}.

  2. 2.

    We initialize t0t\leftarrow 0, 𝑿(0),𝒀(0)[T,T]n\bm{X}^{(0)},\bm{Y}^{(0)}\leftarrow[-T,T]^{n}, and 𝒂(0),𝒃(0)0n×n\bm{a}^{(0)},\bm{b}^{(0)}\leftarrow 0^{n\times n}.

  3. 3.

    For each phase i=1,2,,di=1,2,\ldots,d: suppose we are starting the cleanup for a node at depth ii in the original protocol 𝒞~\widetilde{\mathcal{C}} and suppose we are at a node of depth tt in the new protocol 𝒞¯\overline{\mathcal{C}}. If it is Alice’s turn to speak in 𝒞~\widetilde{\mathcal{C}}:

    1. (a)

      Orthogonalization by revealing the correlation with Bob’s center of mass.
      Alice begins by revealing the inner product of her input xx with Bob’s current (signed) level-two center of mass η𝒗(t)\eta\odot\bm{v}^{(t)}. Since in the previous steps, she has already revealed the inner product with Bob’s previous centers of mass, for technical reasons, we will only have Alice announce the inner product with the component of η𝒗(t)\eta\odot\bm{v}^{(t)} that is orthogonal to the previous directions along which Alice announced the inner product. More formally, let 𝒂(t+1)\bm{a}^{(t+1)} be the component of η𝒗(t)\eta\odot\bm{v}^{(t)} that is orthonormal to the span of the previous directions 𝒂(τ)\bm{a}^{(\tau)} for τt\tau\leq t, i.e.,

      𝒂(t+1)=unit(η𝒗(t)τ=1tη𝒗(t),𝒂(τ)𝒂(τ)).\textstyle\bm{a}^{(t+1)}=\mathrm{unit}\left(\eta\odot\bm{v}^{(t)}-\sum_{\tau=1}^{t}\left\langle\eta\odot\bm{v}^{(t)},\bm{a}^{(\tau)}\right\rangle\cdot\bm{a}^{(\tau)}\right).

      Alice computes 𝒄¯(t+1)truncL(xx,𝒂(t+1))\overline{\bm{c}}^{(t+1)}\leftarrow\mathrm{trunc}_{L}\left(\left\langle x\overset{\mathchoice{\mathbin{{\hbox{\scalebox{0.5}{$\displaystyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\textstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptscriptstyle\bullet$}}}}}}{\otimes}x,\bm{a}^{(t+1)}\right\rangle\right) and sends 𝒄¯(t+1)\overline{\bm{c}}^{(t+1)} to Bob. Set 𝒃(t+1)0n×n\bm{b}^{(t+1)}\leftarrow 0^{n\times n}. Increment tt by 11 and go to step (b).

    2. (b)

      Original communication. Alice sends the bit 𝒄¯(t+1)\overline{\bm{c}}^{(t+1)} that she was supposed to send in 𝒞~\widetilde{\mathcal{C}} based on previous messages and xx. Set 𝒂(t+1),𝒃(t+1)0n×n\bm{a}^{(t+1)},\bm{b}^{(t+1)}\leftarrow 0^{n\times n}. Increment tt by 1 and go to step (c).

    3. (c)

      Cleanup steps. While there exists some direction a𝕊n×n1a\in\mathbb{S}^{n\times n-1} orthogonal to previous directions, i.e., a,𝒂(τ)=0\left\langle a,\bm{a}^{(\tau)}\right\rangle=0 for all τt\tau\leq t, and 𝑿(t)\bm{X}^{(t)} is not 44-wise clean in direction aa, Alice computes 𝒄¯(t+1)truncL(xx,a)\overline{\bm{c}}^{(t+1)}\leftarrow\mathrm{trunc}_{L}\left(\left\langle x\overset{\mathchoice{\mathbin{{\hbox{\scalebox{0.5}{$\displaystyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\textstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptscriptstyle\bullet$}}}}}}{\otimes}x,a\right\rangle\right) and sends 𝒄¯(t+1)\overline{\bm{c}}^{(t+1)} to Bob. Set 𝒂(t+1)a\bm{a}^{(t+1)}\leftarrow a and 𝒃(t+1)0n×n\bm{b}^{(t+1)}\leftarrow 0^{n\times n}. Increment tt by 1. Repeat step (c) while 𝑿(t)\bm{X}^{(t)} is not 44-wise clean; otherwise, increment ii by 1 and go back to the for-loop in step 3 which starts a new phase.

    If it is Bob’s turn to speak, we define everything similarly with the role of x,𝒂,𝑿,𝒖x,\bm{a},\bm{X},\bm{u} switched with y,𝒃,𝒀,𝒗y,\bm{b},\bm{Y},\bm{v}.

  4. 4.

    Finally at the end of the protocol, the value 𝒞¯(x,y)\overline{\mathcal{C}}(x,y) is determined based on all the previous communication and the corresponding output it defines in 𝒞~\widetilde{\mathcal{C}}.

Remark 6.1.

Note that by construction, the non-zero matrices among 𝒂(1),𝒂(2),\bm{a}^{(1)},\bm{a}^{(2)},\ldots form an orthonormal set when viewed as n2n^{2}-dimensional vectors (similarly for 𝒃(1),𝒃(2),\bm{b}^{(1)},\bm{b}^{(2)},\ldots) and moreover, their diagonals are zero. Lastly, 𝒂(t)\bm{a}^{(t)} and 𝒃(t)\bm{b}^{(t)} are known to both Alice and Bob as they are canonically determined by previous messages.

We remark that the steps 3(a), 3(b), and 3(c) always occur in sequence for each party and we refer to such a sequence of steps as a phase for that party. Note that there are at most dd phases. If a new phase starts at time tt, then the current rectangle 𝑿(t)×𝒀(t)\bm{X}^{(t)}\times\bm{Y}^{(t)} is 44-wise clean for both parties by construction.
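A Monte Carlo version of the cleanliness test in step 3(c) might look as follows; the membership oracle in_X below is a hypothetical stand-in for Alice's current set, and the snippet is illustrative only.

import numpy as np

rng = np.random.default_rng(7)
n, lam = 6, 100.0

def clean_in_direction(in_X, a, samples=10**5):
    # estimate E[<x (.) x - sigma(X), a>^2 | x in X], where x (.) x denotes
    # the outer product of x with itself with the diagonal zeroed out
    x = rng.normal(size=(samples, n))
    x = x[in_X(x)]                                    # condition on x in X
    outer = x[:, :, None] * x[:, None, :]
    outer[:, np.arange(n), np.arange(n)] = 0.0        # zero out the diagonal
    sigma = outer.mean(axis=0)                        # level-two center of mass sigma(X)
    vals = ((outer - sigma) * a).sum(axis=(1, 2))     # <x (.) x - sigma(X), a>
    return np.mean(vals ** 2) < lam

a = rng.normal(size=(n, n))
np.fill_diagonal(a, 0.0)
a /= np.linalg.norm(a)                                # a unit direction with zero diagonal
print(clean_in_direction(lambda x: x[:, 0] > 0, a))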

Now we formalize a few useful properties regarding the communication protocol 𝒞¯\overline{\mathcal{C}}. The first fact below follows since each 𝒖(t)\bm{u}^{(t)} is an expectation of 𝒙𝒙\bm{x}\overset{\mathchoice{\mathbin{{\hbox{\scalebox{0.5}{$\displaystyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\textstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptscriptstyle\bullet$}}}}}}{\otimes}\bm{x} over some distribution and 𝒙𝒙\bm{x}\overset{\mathchoice{\mathbin{{\hbox{\scalebox{0.5}{$\displaystyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\textstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptscriptstyle\bullet$}}}}}}{\otimes}\bm{x} has zero diagonal.

Fact 6.2.

𝒖(0)=𝒗(0)=0n×n\bm{u}^{(0)}=\bm{v}^{(0)}=0^{n\times n} and each 𝐮(t),𝐯(t)\bm{u}^{(t)},\bm{v}^{(t)} has zero diagonal.

The following follows from tail bounds for the univariate standard normal distribution.

Fact 6.3.

Let γ=γ(𝐗(0))γ(𝐘(0))\gamma^{*}=\gamma(\bm{X}^{(0)})\cdot\gamma(\bm{Y}^{(0)}). Then γ1O(neT2/2)\gamma^{*}\geq 1-O\left(n\cdot e^{-T^{2}/2}\right).

The next fact says that when a node fixes a quadratic form with 2L2^{-L} precision, for any two inputs that reach this node, the quadratic forms differ by at most 2L2^{-L}.

Fact 6.4.

In step 3(a) and 3(c), any x,x𝐗(t+1)x,x^{\prime}\in\bm{X}^{(t+1)} satisfies |xx,𝐚(t+1)xx,𝐚(t+1)|<2L\left|\left\langle x\overset{\mathchoice{\mathbin{{\hbox{\scalebox{0.5}{$\displaystyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\textstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptscriptstyle\bullet$}}}}}}{\otimes}x,\bm{a}^{(t+1)}\right\rangle-\left\langle x^{\prime}\overset{\mathchoice{\mathbin{{\hbox{\scalebox{0.5}{$\displaystyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\textstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptscriptstyle\bullet$}}}}}}{\otimes}x^{\prime},\bm{a}^{(t+1)}\right\rangle\right|<2^{-L}. Similarly any y,y𝐘(t+1)y,y^{\prime}\in\bm{Y}^{(t+1)} satisfies |yy,𝐛(t+1)yy,𝐛(t+1)|<2L\left|\left\langle y\overset{\mathchoice{\mathbin{{\hbox{\scalebox{0.5}{$\displaystyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\textstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptscriptstyle\bullet$}}}}}}{\otimes}y,\bm{b}^{(t+1)}\right\rangle-\left\langle y^{\prime}\overset{\mathchoice{\mathbin{{\hbox{\scalebox{0.5}{$\displaystyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\textstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptscriptstyle\bullet$}}}}}}{\otimes}y^{\prime},\bm{b}^{(t+1)}\right\rangle\right|<2^{-L}.
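As a quick numerical illustration of this fact (reading trunc_L as rounding down to a multiple of 2^{-L}, which is one natural implementation), any two inputs that produce the same truncated message have quadratic forms within 2^{-L} of each other:

```python
import numpy as np

def trunc(value, L):
    # One natural reading of trunc_L: round down to a multiple of 2^-L.
    return np.floor(value * 2**L) / 2**L

rng = np.random.default_rng(0)
L, n = 10, 8
a = rng.standard_normal((n, n))
np.fill_diagonal(a, 0.0)
a /= np.linalg.norm(a)               # a unit direction with zero diagonal

def quad(x):                         # the quadratic form of x along a
    X = np.outer(x, x)
    np.fill_diagonal(X, 0.0)
    return float(np.sum(X * a))

# Bucket random inputs by their truncated message; within a bucket (i.e.,
# among inputs that reach the same node), the forms differ by < 2^-L.
vals = np.array([quad(x) for x in rng.standard_normal((20000, n))])
msgs = trunc(vals, L)
spreads = [np.ptp(vals[msgs == m]) for m in np.unique(msgs)]
print(max(spreads) < 2**-L)          # True
```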

The next claim bounds the maximum attainable norms of Alice’s and Bob’s level-two centers of mass at any point in the protocol. It uses the fact that the inputs come from the truncated Gaussian distribution.

Claim 6.5.

𝒖(t)=η𝒖(t)<nT\left\|\bm{u}^{(t)}\right\|=\left\|\eta\odot\bm{u}^{(t)}\right\|<nT and 𝒗(t)=η𝒗(t)<nT\left\|\bm{v}^{(t)}\right\|=\left\|\eta\odot\bm{v}^{(t)}\right\|<nT for all possible tt and 𝒖(t),𝒗(t)\bm{u}^{(t)},\bm{v}^{(t)} throughout the communication.

Proof.

Since η\eta is a matrix with zero diagonal and {±1}\{\pm 1\} entries off diagonal and 𝒖(t)\bm{u}^{(t)} also has zero diagonal, 𝒖(t)=η𝒖(t)\left\|\bm{u}^{(t)}\right\|=\left\|\eta\odot\bm{u}^{(t)}\right\|. In addition, since 𝑿(t)𝑿(0)=[T,T]n\bm{X}^{(t)}\subseteq\bm{X}^{(0)}=[-T,T]^{n}, we have

𝒖(t)𝔼𝒙γ[(𝒙𝒙)|𝒙𝑿(t)](n2n)T2<nT.\left\|\bm{u}^{(t)}\right\|\leq\operatorname*{\mathbb{E}}_{\bm{x}\sim\gamma}\left[\left\|\left(\bm{x}\overset{\mathchoice{\mathbin{{\hbox{\scalebox{0.5}{$\displaystyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\textstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptscriptstyle\bullet$}}}}}}{\otimes}\bm{x}\right)\right\|\,\middle|\,\bm{x}\in\bm{X}^{(t)}\right]\leq\sqrt{(n^{2}-n)\cdot T^{2}}<nT.

A similar analysis works for 𝒗(t)\bm{v}^{(t)}. ∎

The next claim gives a bound on the length of any message in the protocol 𝒞¯\overline{\mathcal{C}}.

Claim 6.6.

For any x𝑿(0)x\in\bm{X}^{(0)} and y𝒀(0)y\in\bm{Y}^{(0)}, any message in 𝒞¯(x,y)\overline{\mathcal{C}}(x,y) consists of at most L+log(Tn)L+\log(Tn) many bits.

Proof.

Assume without loss of generality that it is Alice’s turn to speak. In step 3(b) she sends a single bit. In steps 3(a) and 3(c), she computes truncL(xx,a)\mathrm{trunc}_{L}(\langle x\overset{\mathchoice{\mathbin{{\hbox{\scalebox{0.5}{$\displaystyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\textstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptscriptstyle\bullet$}}}}}}{\otimes}x,a\rangle) for some a𝕊n×n1a\in\mathbb{S}^{n\times n-1} and sends the result. Since

|xx,a|xxa(n2n)T2<nT,\left|\left\langle x\overset{\mathchoice{\mathbin{{\hbox{\scalebox{0.5}{$\displaystyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\textstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptscriptstyle\bullet$}}}}}}{\otimes}x,a\right\rangle\right|\leq\left\|x\overset{\mathchoice{\mathbin{{\hbox{\scalebox{0.5}{$\displaystyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\textstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptscriptstyle\bullet$}}}}}}{\otimes}x\right\|\cdot\left\|a\right\|\leq\sqrt{(n^{2}-n)\cdot T^{2}}<nT,

and the message is a multiple of 2L2^{-L}, the truncation truncL\mathrm{trunc}_{L} yields a message of at most L+log(nT)L+\log(nT) bits. ∎

The last claim bounds the maximum depth of the new protocol 𝒞¯\overline{\mathcal{C}}.

Claim 6.7.

Let \ell be an arbitrary leaf of the protocol 𝒞¯\overline{\mathcal{C}} and D()D(\ell) be its depth. Then D()2n2D(\ell)\leq 2n^{2}. Moreover, along this path there are at most n2nn^{2}-n many non-zero 𝒂(t)\bm{a}^{(t)} and at most n2nn^{2}-n many non-zero 𝒃(t)\bm{b}^{(t)} for t{1,,D()}t\in\{1,\ldots,D(\ell)\}.

Proof.

We count the number of communication steps separately:

  • Steps 3(a) and 3(b). Steps 3(a) and 3(b) occur once in every phase, thus at most dd times.

  • Step 3(c). For Alice, each time she communicates at step 3(c), the direction a𝕊n×n1a\in\mathbb{S}^{n\times n-1} is non-zero and orthogonal to all previous 𝒂(t)\bm{a}^{(t)}’s. Since the dimension of 𝕊n×n1\mathbb{S}^{n\times n-1} is n2nn^{2}-n, this happens at most n2nn^{2}-n times. A similar argument works for Bob.

Thus in total we have at most 2(n2n)+2d2n22(n^{2}-n)+2d\leq 2n^{2} steps. ∎

We will eventually show that, with a suitable choice of λ,T,L\lambda,T,L, the depth D()D(\ell) is typically at most dpolylog(n)d\cdot\mathrm{polylog}(n).

6.2 Bounding the Expected Quadratic Variation

Consider the martingale process defined in Equation 4.5 from a random walk on the protocol tree generated by 𝒞¯\overline{\mathcal{C}} where the inputs 𝒙,𝒚\bm{x},\bm{y} are sampled from γn\gamma_{n} conditioned on being in the bounded cube [T,T]n[-T,T]^{n}. Recall that Proposition 4.3 still holds (see Remark 4.5).

Formally, at time tt the process is defined by

𝒛2(t)=𝒖(t),η𝒗(t),\bm{z}^{(t)}_{2}=\left\langle\bm{u}^{(t)},\eta\odot\bm{v}^{(t)}\right\rangle,

where we recall that 𝒖(t)=σ(𝑿(t))\bm{u}^{(t)}=\sigma(\bm{X}^{(t)}) and 𝒗(t)=σ(𝒀(t))\bm{v}^{(t)}=\sigma(\bm{Y}^{(t)}) and η\eta is a fixed sign matrix with a zero diagonal. The martingale process stops once it hits a leaf of 𝒞¯\overline{\mathcal{C}}. Let 𝒅\bm{d} denote the (stopping) time when this happens. Note that 𝔼[𝒅]\operatorname*{\mathbb{E}}[\bm{d}] is exactly the expected depth of the protocol 𝒞¯\overline{\mathcal{C}}.

In light of Remark 4.2 and Proposition 4.4, to prove Theorem 1.3, it suffices to prove the following.

Lemma 6.8.

𝔼[t=1𝒅(Δ𝒛2(t))2]=O(d3log6(n)).\operatorname*{\mathbb{E}}\left[\sum_{t=1}^{\bm{d}}\left(\Delta\bm{z}^{(t)}_{2}\right)^{2}\right]=O\left(d^{3}\log^{6}(n)\right).

Lemma 6.8 is proved in three steps. We first show that essentially the only change in the value of the martingale is the orthogonalization step 3(a). The reason is the same as for the level-one bound: Alice’s messages sent in steps 3(b) and 3(c) are always near-orthogonal to Bob’s current level-two center of mass, thus they do not change the value of the martingale 𝒛2(t)\bm{z}^{(t)}_{2} much. Moreover, by the level-two analog of Subsection 2.1, since Alice’s current node was clean just before Alice sent the message in step 3(a), the expected change 𝔼[(Δ𝒛2(t+1))2]\operatorname*{\mathbb{E}}\left[\left(\Delta\bm{z}^{(t+1)}_{2}\right)^{2}\right] can be bounded in terms of the squared norm of the change that occurred in 𝒖(t)\bm{u}^{(t)} (or 𝒗(t)\bm{v}^{(t)}) between the current round and the last round where Alice was in step 3(a). A similar argument works for Bob.

Formally, this is encapsulated by the next lemma, for which we need some additional definitions. Let ((t))t(\mathcal{F}^{(t)})_{t} denote the natural filtration induced by the random walk on the generalized protocol tree with respect to which 𝒛2(t)\bm{z}^{(t)}_{2} is a Doob martingale and also 𝒖(t),𝒗(t)\bm{u}^{(t)},\bm{v}^{(t)} form vector-valued martingales (recall Proposition 4.3). Note that (t)\mathcal{F}^{(t)} fixes all the rectangles encountered during times 0,,t0,\ldots,t, and thus, for τt\tau\leq t, the random variables 𝒖(τ),𝒗(τ),𝒛2(τ)\bm{u}^{(\tau)},\bm{v}^{(\tau)},\bm{z}^{(\tau)}_{2} are determined; in particular, they are (t)\mathcal{F}^{(t)}-measurable. Recalling that λ\lambda is the cleanup parameter to be optimized later, we then have the following. Below we assume, without loss of generality, that Alice speaks first; in particular, Alice speaks in step 3(a) for the first time at time zero, when both Alice’s and Bob’s centers of mass are zero: 𝒖(0)=𝒗(0)=0\bm{u}^{(0)}=\bm{v}^{(0)}=0.

Lemma 6.9 (Step Size).

Let 0=𝛕1<𝛕2<𝐝0=\bm{\tau}_{1}<\bm{\tau}_{2}<\cdots\leq\bm{d} be a sequence of stopping times with 𝛕m\bm{\tau}_{m} being the index of the round where Alice speaks in step 3(a) for the mthm^{\text{th}} time or 𝐝\bm{d} if there is no such round. Then, for any integer m2m\geq 2,

𝔼[(Δ𝒛2(𝝉m+1))2|(𝝉m)]λ𝒗(𝝉m)𝒗(𝝉m1)2+16n7T32L.\operatorname*{\mathbb{E}}\left[\left(\Delta\bm{z}^{(\bm{\tau}_{m}+1)}_{2}\right)^{2}\,\middle|\,\mathcal{F}^{(\bm{\tau}_{m})}\right]\leq\lambda\cdot\left\|\bm{v}^{(\bm{\tau}_{m})}-\bm{v}^{(\bm{\tau}_{m-1})}\right\|^{2}+16n^{7}T^{3}\cdot 2^{-L}.

Moreover, for any tt\in\mathbb{N}, we have

𝔼[(Δ𝒛2(t+1))2|(t),𝝉m1<t<𝝉m,Alice speaks at time t]4n6T222L\operatorname*{\mathbb{E}}\left[\left(\Delta\bm{z}^{(t+1)}_{2}\right)^{2}\,\middle|\,\mathcal{F}^{(t)},\bm{\tau}_{m-1}<t<\bm{\tau}_{m},\text{Alice speaks at time }t\right]\leq 4n^{6}T^{2}\cdot 2^{-2L}

A similar statement also holds if Bob speaks where 𝐯\bm{v} is replaced by 𝐮\bm{u} and the sequence (𝛕m)(\bm{\tau}_{m}) is replaced by (𝛕m)(\bm{\tau}^{\prime}_{m}) where 𝛕m\bm{\tau}^{\prime}_{m} is the index of the round where Bob speaks in step 3(a) for the mthm^{\text{th}} time or 𝐝\bm{d} if there is no such round.

We indeed see that, if L=Ω(log(n))L=\Omega(\log(n)) and T=O(log(n))T=O(\sqrt{\log(n)}), then poly(T,n)2L=o(1)\mathrm{poly}(T,n)\cdot 2^{-L}=o(1), so steps 3(b) and 3(c) do not contribute much to the quadratic variation; only the steps 3(a) do. Also, since Alice and Bob each start their first phase in step 3(a), the vectors 𝒖(𝝉1)\bm{u}^{(\bm{\tau}_{1})} and 𝒗(𝝉1)\bm{v}^{(\bm{\tau}^{\prime}_{1})} are their initial centers of mass, which are both zero.
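To get a feel for the parameter regime, the following snippet plugs in illustrative values of n and d (chosen here arbitrarily) and checks that both error terms from Lemma 6.9 are negligible once L is a sufficiently large multiple of log n:

```python
import math

n, d = 10**6, 10**3                       # illustrative sizes only
T = math.sqrt(math.log(n))                # T = Theta(sqrt(log n))
L = math.ceil(10 * math.log2(n))          # L = Theta(log n); 10 gives slack

err_3a = 16 * n**7 * T**3 * 2**-L         # additive error in the 3(a) bound
err_3bc = 4 * n**6 * T**2 * 2**(-2 * L)   # step size in steps 3(b), 3(c)
print(err_3a, err_3bc)                    # both are far below 1, i.e. o(1)
assert err_3a < 1e-6 and err_3bc < 1e-6
```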

We shall prove the above lemma in Subsection 6.3 and continue with the bound on the quadratic variation here. Using the bounds on the step sizes from Lemma 6.9,

𝔼[t=1𝒅(Δ𝒛2(t))2]\displaystyle\operatorname*{\mathbb{E}}\left[\sum_{t=1}^{\bm{d}}\left(\Delta\bm{z}^{(t)}_{2}\right)^{2}\right] λ𝔼[m2𝒗(𝝉m)𝒗(𝝉m1)2+𝒖(𝝉m)𝒖(𝝉m1)2]+16n7T32L𝔼[𝒅]\displaystyle\leq\lambda\cdot\operatorname*{\mathbb{E}}\left[\sum_{m\geq 2}\left\|\bm{v}^{(\bm{\tau}_{m})}-\bm{v}^{(\bm{\tau}_{m-1})}\right\|^{2}+\left\|\bm{u}^{(\bm{\tau}^{\prime}_{m})}-\bm{u}^{(\bm{\tau}^{\prime}_{m-1})}\right\|^{2}\right]+16n^{7}T^{3}\cdot 2^{-L}\cdot\operatorname*{\mathbb{E}}[\bm{d}]
λ𝔼[m2𝒗(𝝉m)𝒗(𝝉m1)2+𝒖(𝝉m)𝒖(𝝉m1)2]+16n7T32L2n2.\displaystyle\leq\lambda\cdot\operatorname*{\mathbb{E}}\left[\sum_{m\geq 2}\left\|\bm{v}^{(\bm{\tau}_{m})}-\bm{v}^{(\bm{\tau}_{m-1})}\right\|^{2}+\left\|\bm{u}^{(\bm{\tau}^{\prime}_{m})}-\bm{u}^{(\bm{\tau}^{\prime}_{m-1})}\right\|^{2}\right]+16n^{7}T^{3}\cdot 2^{-L}\cdot 2n^{2}. (by Claim 6.7)

On the other hand, using the orthogonality of vector-valued martingale differences from Equation 3.2,

𝔼[m2𝒗(𝝉m)𝒗(𝝉m1)2]=𝔼[𝒗(𝒅)2].\displaystyle\operatorname*{\mathbb{E}}\left[\sum_{m\geq 2}\left\|\bm{v}^{(\bm{\tau}_{m})}-\bm{v}^{(\bm{\tau}_{m-1})}\right\|^{2}\right]=\operatorname*{\mathbb{E}}\left[\left\|\bm{v}^{(\bm{d})}\right\|^{2}\right].

A similar statement holds for (𝒖(t))(\bm{u}^{(t)}) as well. Therefore,

𝔼[t=1𝒅(Δ𝒛2(t))2]λ(𝔼[𝒖(𝒅)2]+𝔼[𝒗(𝒅)2])+64n9T32L.\displaystyle\operatorname*{\mathbb{E}}\left[\sum_{t=1}^{\bm{d}}\left(\Delta\bm{z}^{(t)}_{2}\right)^{2}\right]\leq\lambda\cdot\left(\operatorname*{\mathbb{E}}\left[\left\|\bm{u}^{(\bm{d})}\right\|^{2}\right]+\operatorname*{\mathbb{E}}\left[\left\|\bm{v}^{(\bm{d})}\right\|^{2}\right]\right)+64n^{9}T^{3}\cdot 2^{-L}. (6.1)
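The orthogonality identity used above (Equation 3.2) is just the Pythagorean theorem for martingale increments: for a vector-valued martingale started at zero, the expected squared norm of the final value equals the expected sum of squared increment norms, since the cross terms vanish in expectation. The toy simulation below, with dynamics unrelated to the protocol, illustrates the identity:

```python
import numpy as np

rng = np.random.default_rng(1)
n, steps, trials = 5, 12, 20_000

lhs = rhs = 0.0
for _ in range(trials):
    v = np.zeros(n)
    sum_sq = 0.0
    for t in range(steps):
        e = np.zeros(n)
        e[t % n] = 1.0
        # A martingale difference: a fair sign times a predictable vector.
        step = rng.choice((-1.0, 1.0)) * (0.3 * v + e)
        sum_sq += float(np.sum(step**2))
        v = v + step
    lhs += sum_sq
    rhs += float(np.sum(v**2))

print(lhs / trials, rhs / trials)  # the two averages agree up to noise
```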

Then in Subsection 6.4 we will apply level-two inequalities (see Theorem 3.1) to reduce bounding 𝔼[𝒖(𝒅)2+𝒗(𝒅)2]\operatorname*{\mathbb{E}}\left[\left\|\bm{u}^{(\bm{d})}\right\|^{2}+\left\|\bm{v}^{(\bm{d})}\right\|^{2}\right] to bounding the second moment 𝔼[𝒅2]\operatorname*{\mathbb{E}}[\bm{d}^{2}]. This reduction is formalized as Lemma 6.10 below, and its proof is similar to [27, Claim 1].

For each leaf \ell, let γ()=γ(𝑿(D()))γ(𝒀(D()))\gamma(\ell)=\gamma(\bm{X}^{(D(\ell))})\cdot\gamma(\bm{Y}^{(D(\ell))}) be the Gaussian measure of the rectangle at \ell. Recall γ=γ(𝑿(0))γ(𝒀(0))\gamma^{*}=\gamma(\bm{X}^{(0)})\cdot\gamma(\bm{Y}^{(0)}).

Lemma 6.10.

𝔼[𝒖(𝒅)2+𝒗(𝒅)2]O(1γ+L2𝔼[𝒅2])\operatorname*{\mathbb{E}}\left[\left\|\bm{u}^{(\bm{d})}\right\|^{2}+\left\|\bm{v}^{(\bm{d})}\right\|^{2}\right]\leq O\left(\frac{1}{\gamma^{*}}+L^{2}\operatorname*{\mathbb{E}}[\bm{d}^{2}]\right).

Finally, in Subsection 6.5, we bound the second moment 𝔼[𝒅2]\operatorname*{\mathbb{E}}[\bm{d}^{2}] for a suitable choice of parameters.

Lemma 6.11.

It holds that 𝔼[𝐝2]=O(d2)\operatorname*{\mathbb{E}}[\bm{d}^{2}]=O(d^{2}) and γ34\gamma^{*}\geq\frac{3}{4} for L=Θ(log(n))L=\Theta(\log(n)), T=Θ(log(n))T=\Theta(\sqrt{\log(n)}), and λ=Θ(dlog4(n))\lambda=\Theta(d\log^{4}(n)).

Given Lemmas 6.10 and 6.11, the proof of Lemma 6.8 follows naturally.

Proof of Lemma 6.8.

With the parameters chosen in Lemma 6.11, we have

𝔼[t=1𝒅(Δ𝒛2(t))2]\displaystyle\operatorname*{\mathbb{E}}\left[\sum_{t=1}^{\bm{d}}\left(\Delta\bm{z}^{(t)}_{2}\right)^{2}\right] O(dlog4(n))(𝔼[𝒖(𝒅)2]+𝔼[𝒗(𝒅)2])+1\displaystyle\leq O(d\log^{4}(n))\cdot\left(\operatorname*{\mathbb{E}}\left[\left\|\bm{u}^{(\bm{d})}\right\|^{2}\right]+\operatorname*{\mathbb{E}}\left[\left\|\bm{v}^{(\bm{d})}\right\|^{2}\right]\right)+1 (by Equation 6.1)
O(dlog4(n))(1+log2(n)𝔼[𝒅2])+1\displaystyle\leq O(d\log^{4}(n))\cdot\left(1+\log^{2}(n)\cdot\operatorname*{\mathbb{E}}[\bm{d}^{2}]\right)+1 (by Lemma 6.10)
O(dlog4(n))(1+log2(n)d2)+1\displaystyle\leq O(d\log^{4}(n))\cdot\left(1+\log^{2}(n)\cdot d^{2}\right)+1 (by Lemma 6.11)
=O(d3log6(n)).\displaystyle=O(d^{3}\log^{6}(n)). ∎
Remark 6.12.

Note that our proof for level-two Fourier growth actually holds for a slightly more general setting, where Alice and Bob are allowed to send O(L)=O(log(n))O(L)=O(\log(n)) bits during each original communication round. This can be viewed as balancing the length of the messages in step 3(b) with step 3(a) and step 3(c).

Since one can always convert a dd-round 11-bit communication protocol into a 2dloglog(n)\frac{2d}{\log\log(n)}-round log(n)\log(n)-bit communication protocol, we obtain a slightly better level-two Fourier growth bound of

O(d3/2log3(n)(loglog(n))3/2).O\left(\frac{d^{3/2}\log^{3}(n)}{\left(\log\log(n)\right)^{3/2}}\right).

The conversion is done by Alice (resp., Bob) enumerating all possibilities for the next loglog(n)/2\log\log(n)/2 bits from Bob (resp., Alice) and providing her (resp., his) loglog(n)/2\log\log(n)/2-bit responses for each possibility, as sketched below.
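The sketch below illustrates this conversion under the simplifying assumption that the parties strictly alternate: in one big message, a party commits to its strategy tree for the next b rounds, i.e., its reply to every possible behavior of the other party. Here my_next_bit is a hypothetical stand-in for the original protocol's next-message function.

```python
from itertools import product

def strategy_tree(my_next_bit, b):
    """One big message: my next-bit reply after every possible sequence of
    up to b - 1 bits from the other party.  The tree has 2^b - 1 nodes, so
    the message costs fewer than 2^b bits; with b = loglog(n)/2 that is
    about sqrt(log n) <= log(n) bits, yet it simulates b original rounds."""
    return {bits: my_next_bit(bits)
            for depth in range(b)
            for bits in product((0, 1), repeat=depth)}

# Toy usage: the original next bit is the parity of the bits seen so far.
tree = strategy_tree(lambda bits: sum(bits) % 2, b=3)
print(len(tree))  # 7 = 2^3 - 1 strategy bits replace 3 interactive rounds
```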

It is also possible to improve the log3(n)\log^{3}(n) factor to log2(n)\log^{2}(n) by varying the cleanup parameter λ\lambda with depth. For example, for depth in the interval [4rd,4(r+1)d][4rd,4(r+1)d], one could pick λr=Θ(dlog2(n)r2)\lambda_{r}=\Theta(d\cdot\log^{2}(n)\cdot r^{2}). Since our focus is mostly on improving the polynomial dependence in dd where there is still room for improvement, we do not make an effort here to improve the polylog terms.

6.3 Bounds on Step Sizes (Proof of Lemma 6.9)

Let us abbreviate 𝝉=𝝉m\bm{\tau}=\bm{\tau}_{m} and note that at time 𝝉\bm{\tau} a new phase starts for Alice. By construction, this means that the current rectangle 𝑿(𝝉)×𝒀(𝝉)\bm{X}^{(\bm{\tau})}\times\bm{Y}^{(\bm{\tau})} determined by (𝝉)\mathcal{F}^{(\bm{\tau})} is 44-wise clean with parameter λ\lambda, and since Alice is in step 3(a) at the start of a new phase, 𝒂(𝝉+1)\bm{a}^{(\bm{\tau}+1)} is chosen to be the (normalized) component of η𝒗(𝝉)\eta\odot\bm{v}^{(\bm{\tau})} that is orthogonal to previous directions 𝒂(1),,𝒂(𝝉)\bm{a}^{(1)},\ldots,\bm{a}^{(\bm{\tau})}.
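In other words, step 3(a) performs a single Gram–Schmidt step. A minimal NumPy sketch of the choice of the new direction (directions stored as matrices, the used family assumed orthonormal):

```python
import numpy as np

def next_direction(eta_v, used, tol=1e-12):
    """Normalized component of eta_v orthogonal to the used directions;
    returns None when eta_v already lies in their span (then the new
    coefficient would be zero and no direction is added)."""
    r = eta_v.copy()
    for a in used:
        r -= np.sum(r * a) * a      # subtract the projection onto a
    norm = np.linalg.norm(r)
    return None if norm < tol else r / norm
```

With this choice, the coefficient 𝜷(𝝉+1)\bm{\beta}^{(\bm{\tau}+1)} defined next is exactly the norm of the residual r.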

For each r=1,,𝝉+1r=1,\ldots,\bm{\tau}+1, let 𝜷(r):=η𝒗(𝝉),𝒂(r)\bm{\beta}^{(r)}:=\left\langle\eta\odot\bm{v}^{(\bm{\tau})},\bm{a}^{(r)}\right\rangle be the length of η𝒗(𝝉)\eta\odot\bm{v}^{(\bm{\tau})} along direction 𝒂(r)\bm{a}^{(r)}. Each 𝜷(r)\bm{\beta}^{(r)} is (𝝉)\mathcal{F}^{(\bm{\tau})}-measurable (i.e., it is determined by (𝝉)\mathcal{F}^{(\bm{\tau})}) and η𝒗(𝝉)=r𝝉+1𝜷(r)𝒂(r)\eta\odot\bm{v}^{(\bm{\tau})}=\sum_{r\leq\bm{\tau}+1}\bm{\beta}^{(r)}\cdot\bm{a}^{(r)}. In this case, we have

𝔼[(Δ𝒛2(𝝉+1))2|(𝝉)]\displaystyle\operatorname*{\mathbb{E}}\left[\left(\Delta\bm{z}^{(\bm{\tau}+1)}_{2}\right)^{2}\,\middle|\,\mathcal{F}^{(\bm{\tau})}\right] =𝔼[𝒖(𝝉+1)𝒖(𝝉),η𝒗(𝝉)2|(𝝉)]\displaystyle=\operatorname*{\mathbb{E}}\left[\left\langle\bm{u}^{(\bm{\tau}+1)}-\bm{u}^{(\bm{\tau})},\eta\odot\bm{v}^{(\bm{\tau})}\right\rangle^{2}\,\middle|\,\mathcal{F}^{(\bm{\tau})}\right]
=𝔼[(r=1𝝉+1𝜷(r)𝒖(𝝉+1)𝒖(𝝉),𝒂(r))2|(𝝉)].\displaystyle=\operatorname*{\mathbb{E}}\left[\left(\sum_{r=1}^{\bm{\tau}+1}\bm{\beta}^{(r)}\cdot\left\langle\bm{u}^{(\bm{\tau}+1)}-\bm{u}^{(\bm{\tau})},\bm{a}^{(r)}\right\rangle\right)^{2}\,\middle|\,\mathcal{F}^{(\bm{\tau})}\right]. (6.2)

Similar to the level-one proof, the components of 𝒖(𝝉+1)\bm{u}^{(\bm{\tau}+1)} and 𝒖(𝝉)\bm{u}^{(\bm{\tau})} are roughly the same along any of the previous directions 𝒂(1),,𝒂(𝝉)\bm{a}^{(1)},\ldots,\bm{a}^{(\bm{\tau})} and so they almost cancel out and the major quantity is in the direction 𝒂(𝝉+1)\bm{a}^{(\bm{\tau}+1)}. This follows since, in all the previous steps r𝝉r\leq\bm{\tau}, Alice has already fixed xx,𝒂(r)\left\langle x\overset{\mathchoice{\mathbin{{\hbox{\scalebox{0.5}{$\displaystyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\textstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptscriptstyle\bullet$}}}}}}{\otimes}x,\bm{a}^{(r)}\right\rangle with precision 2L2^{-L}. This implies that for any 𝑿(𝝉)\bm{X}^{(\bm{\tau})} and 𝑿(𝝉+1)\bm{X}^{(\bm{\tau}+1)} that are determined by (𝝉+1)\mathcal{F}^{(\bm{\tau}+1)}, the inner product with all the previous 𝒂(1),,𝒂(𝝉)\bm{a}^{(1)},\ldots,\bm{a}^{(\bm{\tau})} is fixed with precision 2L2^{-L} over the choice of xx. Formally, by Fact 6.4, we have that for any x𝑿(𝝉)x\in\bm{X}^{(\bm{\tau})} and x𝑿(𝝉+1)x^{\prime}\in\bm{X}^{(\bm{\tau}+1)}, it holds that |xx,𝒂(r)xx,𝒂(r)|2L\left|\left\langle x\overset{\mathchoice{\mathbin{{\hbox{\scalebox{0.5}{$\displaystyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\textstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptscriptstyle\bullet$}}}}}}{\otimes}x,\bm{a}^{(r)}\right\rangle-\left\langle x^{\prime}\overset{\mathchoice{\mathbin{{\hbox{\scalebox{0.5}{$\displaystyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\textstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptscriptstyle\bullet$}}}}}}{\otimes}x^{\prime},\bm{a}^{(r)}\right\rangle\right|\leq 2^{-L} for all r𝝉r\leq\bm{\tau}. In particular, since 𝒖(𝝉)=σ(𝑿(𝝉))\bm{u}^{(\bm{\tau})}=\sigma(\bm{X}^{(\bm{\tau})}) and 𝒖(𝝉+1)=σ(𝑿(𝝉+1))\bm{u}^{(\bm{\tau}+1)}=\sigma(\bm{X}^{(\bm{\tau}+1)}) are the corresponding centers of mass, we have that

|𝒖(𝝉+1)𝒖(𝝉),𝒂(r)|2Lfor all r𝝉.\left|\left\langle\bm{u}^{(\bm{\tau}+1)}-\bm{u}^{(\bm{\tau})},\bm{a}^{(r)}\right\rangle\right|\leq 2^{-L}\quad\text{for all $r\leq\bm{\tau}$.} (6.3)

On the other hand, since 𝑿(𝝉+1)𝑿(𝝉)𝑿(0)=[T,T]n\bm{X}^{(\bm{\tau}+1)}\subseteq\bm{X}^{(\bm{\tau})}\subseteq\bm{X}^{(0)}=[-T,T]^{n} and 𝒂(𝝉+1)\bm{a}^{(\bm{\tau}+1)} is a unit direction, we have

|𝒖(𝝉+1)𝒖(𝝉),𝒂(𝝉+1)|𝒖(𝝉+1)𝒖(𝝉)2nT.\left|\left\langle\bm{u}^{(\bm{\tau}+1)}-\bm{u}^{(\bm{\tau})},\bm{a}^{(\bm{\tau}+1)}\right\rangle\right|\leq\left\|\bm{u}^{(\bm{\tau}+1)}-\bm{u}^{(\bm{\tau})}\right\|\leq 2nT. (6.4)

Similarly, noting that η\eta is a sign matrix, we can bound

|𝜷(r)|=|η𝒗(𝝉),𝒂(r)|η𝒗(𝝉)𝒗(𝝉)nTfor all r𝝉+1.\left|\bm{\beta}^{(r)}\right|=\left|\left\langle\eta\odot\bm{v}^{(\bm{\tau})},\bm{a}^{(r)}\right\rangle\right|\leq\left\|\eta\odot\bm{v}^{(\bm{\tau})}\right\|\leq\left\|\bm{v}^{(\bm{\tau})}\right\|\leq nT\quad\text{for all $r\leq\bm{\tau}+1$.} (6.5)

Expanding the square in Equation 6.2 and plugging these estimates into each of the (𝝉+1)2(\bm{\tau}+1)^{2} terms gives

𝔼[(Δ𝒛2(𝝉+1))2|(𝝉)]\displaystyle\operatorname*{\mathbb{E}}\left[\left(\Delta\bm{z}^{(\bm{\tau}+1)}_{2}\right)^{2}\,\middle|\,\mathcal{F}^{(\bm{\tau})}\right] 𝔼[(𝜷(𝝉+1))2𝒖(𝝉+1)𝒖(𝝉),𝒂(𝝉+1)2+((𝝉+1)21)2(nT)32L|(𝝉)]\displaystyle\leq\operatorname*{\mathbb{E}}\left[\left(\bm{\beta}^{(\bm{\tau}+1)}\right)^{2}\left\langle\bm{u}^{(\bm{\tau}+1)}-\bm{u}^{(\bm{\tau})},\bm{a}^{(\bm{\tau}+1)}\right\rangle^{2}+((\bm{\tau}+1)^{2}-1)\cdot\tfrac{2(nT)^{3}}{2^{L}}\,\middle|\,\mathcal{F}^{(\bm{\tau})}\right]
(𝜷(𝝉+1))2𝔼[𝒖(𝝉+1)𝒖(𝝉),𝒂(𝝉+1)2|(𝝉)]+12n7T32L,\displaystyle\leq\left(\bm{\beta}^{(\bm{\tau}+1)}\right)^{2}\operatorname*{\mathbb{E}}\left[\left\langle\bm{u}^{(\bm{\tau}+1)}-\bm{u}^{(\bm{\tau})},\bm{a}^{(\bm{\tau}+1)}\right\rangle^{2}\,\middle|\,\mathcal{F}^{(\bm{\tau})}\right]+12n^{7}T^{3}\cdot 2^{-L}, (6.6)

where the second line follows from Claim 6.7.

We now bound the term outside the expectation by the change in the center of mass 𝒗()\bm{v}^{(\cdot)} and the term inside the expectation by the fact that the set is 44-wise clean.

Term Outside the Expectation.

Recall that 𝒂(𝝉+1)\bm{a}^{(\bm{\tau}+1)} is chosen to be the (normalized) component of η𝒗(𝝉)\eta\odot\bm{v}^{(\bm{\tau})} that is orthogonal to the span of 𝒂(1),,𝒂(𝝉)\bm{a}^{(1)},\ldots,\bm{a}^{(\bm{\tau})}. Since η𝒗(𝝉m1)\eta\odot\bm{v}^{(\bm{\tau}_{m-1})} is in the span of 𝒂(1),,𝒂(𝝉m1+1)\bm{a}^{(1)},\ldots,\bm{a}^{(\bm{\tau}_{m-1}+1)} and 𝝉m1+1𝝉=𝝉m\bm{\tau}_{m-1}+1\leq\bm{\tau}=\bm{\tau}_{m}, it is orthogonal to 𝒂(𝝉+1)\bm{a}^{(\bm{\tau}+1)}. Hence

𝜷(𝝉+1)=η𝒗(𝝉),𝒂(𝝉+1)=η(𝒗(𝝉)𝒗(𝝉m1)),𝒂(𝝉+1).\bm{\beta}^{(\bm{\tau}+1)}=\left\langle\eta\odot\bm{v}^{(\bm{\tau})},\bm{a}^{(\bm{\tau}+1)}\right\rangle=\left\langle\eta\odot\left(\bm{v}^{(\bm{\tau})}-\bm{v}^{(\bm{\tau}_{m-1})}\right),\bm{a}^{(\bm{\tau}+1)}\right\rangle.

Since 𝒂(𝝉+1)\bm{a}^{(\bm{\tau}+1)} is a unit direction and η\eta is a sign matrix, this implies that

(𝜷(𝝉+1))2𝒗(𝝉)𝒗(𝝉m1)2.\left(\bm{\beta}^{(\bm{\tau}+1)}\right)^{2}\leq\left\|\bm{v}^{(\bm{\tau})}-\bm{v}^{(\bm{\tau}_{m-1})}\right\|^{2}. (6.7)
Term Inside the Expectation.

Recall that Alice is in step 3(a): she sends xx,𝒂(𝝉+1)\left\langle x\overset{\mathchoice{\mathbin{{\hbox{\scalebox{0.5}{$\displaystyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\textstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptscriptstyle\bullet$}}}}}}{\otimes}x,\bm{a}^{(\bm{\tau}+1)}\right\rangle with precision 2L2^{-L} at time 𝝉\bm{\tau}, and thus the same inner product with 𝒂(𝝉+1)\bm{a}^{(\bm{\tau}+1)} is fixed with precision 2L2^{-L} for every point in 𝑿(𝝉+1)\bm{X}^{(\bm{\tau}+1)} determined by (𝝉+1)\mathcal{F}^{(\bm{\tau}+1)}. Thus

𝒖(𝝉+1),𝒂(𝝉+1)2\displaystyle\left\langle\bm{u}^{(\bm{\tau}+1)},\bm{a}^{(\bm{\tau}+1)}\right\rangle^{2} =(𝔼𝒙γ[𝒙𝒙,𝒂(𝝉+1)|𝒙𝑿(𝝉+1)])2\displaystyle=\left(\operatorname*{\mathbb{E}}_{\bm{x}\sim\gamma}\left[\left\langle\bm{x}\overset{\mathchoice{\mathbin{{\hbox{\scalebox{0.5}{$\displaystyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\textstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptscriptstyle\bullet$}}}}}}{\otimes}\bm{x},\bm{a}^{(\bm{\tau}+1)}\right\rangle\,\middle|\,\bm{x}\in\bm{X}^{(\bm{\tau}+1)}\right]\right)^{2}
=(xx,𝒂(𝝉+1)+𝔼𝒙γ[ε𝒙|𝒙𝑿(𝝉+1)])2\displaystyle=\left(\left\langle x\overset{\mathchoice{\mathbin{{\hbox{\scalebox{0.5}{$\displaystyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\textstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptscriptstyle\bullet$}}}}}}{\otimes}x,\bm{a}^{(\bm{\tau}+1)}\right\rangle+\operatorname*{\mathbb{E}}_{\bm{x}\sim\gamma}\left[\varepsilon_{\bm{x}}\,\middle|\,\bm{x}\in\bm{X}^{(\bm{\tau}+1)}\right]\right)^{2} (|ε𝒙|2L|\varepsilon_{\bm{x}}|\leq 2^{-L} is the truncation error by Fact 6.4)
xx,𝒂(𝝉+1)2+22L+21L|xx,𝒂(𝝉+1)|\displaystyle\leq\left\langle x\overset{\mathchoice{\mathbin{{\hbox{\scalebox{0.5}{$\displaystyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\textstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptscriptstyle\bullet$}}}}}}{\otimes}x,\bm{a}^{(\bm{\tau}+1)}\right\rangle^{2}+2^{-2L}+2^{1-L}\cdot\left|\left\langle x\overset{\mathchoice{\mathbin{{\hbox{\scalebox{0.5}{$\displaystyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\textstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptscriptstyle\bullet$}}}}}}{\otimes}x,\bm{a}^{(\bm{\tau}+1)}\right\rangle\right|
xx,𝒂(𝝉+1)2+nT22L,\displaystyle\leq\left\langle x\overset{\mathchoice{\mathbin{{\hbox{\scalebox{0.5}{$\displaystyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\textstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptscriptstyle\bullet$}}}}}}{\otimes}x,\bm{a}^{(\bm{\tau}+1)}\right\rangle^{2}+nT\cdot 2^{2-L}, (6.8)

where the last line follows from |xx,𝒂(𝝉+1)|xx\left|\left\langle x\overset{\mathchoice{\mathbin{{\hbox{\scalebox{0.5}{$\displaystyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\textstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptscriptstyle\bullet$}}}}}}{\otimes}x,\bm{a}^{(\bm{\tau}+1)}\right\rangle\right|\leq\left\|x\overset{\mathchoice{\mathbin{{\hbox{\scalebox{0.5}{$\displaystyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\textstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptscriptstyle\bullet$}}}}}}{\otimes}x\right\| and x𝑿(0)=[T,T]nx\in\bm{X}^{(0)}=[-T,T]^{n}.

Final Bound.

Since (𝒖(r))r(\bm{u}^{(r)})_{r} is a matrix-valued martingale and thus 𝔼[𝒖(𝝉+1)|(𝝉)]=𝒖(𝝉)\operatorname*{\mathbb{E}}\left[\bm{u}^{(\bm{\tau}+1)}\,\middle|\,\mathcal{F}^{(\bm{\tau})}\right]=\bm{u}^{(\bm{\tau})}, we have

𝔼[𝒖(𝝉+1)𝒖(𝝉),𝒂(𝝉+1)2|(𝝉)]=𝔼[𝒖(𝝉+1),𝒂(𝝉+1)2𝒖(𝝉),𝒂(𝝉+1)2|(𝝉)]\operatorname*{\mathbb{E}}\left[\left\langle\bm{u}^{(\bm{\tau}+1)}-\bm{u}^{(\bm{\tau})},\bm{a}^{(\bm{\tau}+1)}\right\rangle^{2}\,\middle|\,\mathcal{F}^{(\bm{\tau})}\right]=\operatorname*{\mathbb{E}}\left[\left\langle\bm{u}^{(\bm{\tau}+1)},\bm{a}^{(\bm{\tau}+1)}\right\rangle^{2}-\left\langle\bm{u}^{(\bm{\tau})},\bm{a}^{(\bm{\tau}+1)}\right\rangle^{2}\,\middle|\,\mathcal{F}^{(\bm{\tau})}\right]

Then by Equation 6.8, we upper bound the right hand side by

nT22L+𝔼𝒙γ[𝒙𝒙,𝒂(𝝉+1)2𝒖(𝝉),𝒂(𝝉+1)2|(𝝉)].\displaystyle nT\cdot 2^{2-L}+\operatorname*{\mathbb{E}}_{\bm{x}\sim\gamma}\left[\left\langle\bm{x}\overset{\mathchoice{\mathbin{{\hbox{\scalebox{0.5}{$\displaystyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\textstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptscriptstyle\bullet$}}}}}}{\otimes}\bm{x},\bm{a}^{(\bm{\tau}+1)}\right\rangle^{2}-\left\langle\bm{u}^{(\bm{\tau})},\bm{a}^{(\bm{\tau}+1)}\right\rangle^{2}\,\middle|\,\mathcal{F}^{(\bm{\tau})}\right].

Since 𝑿(𝝉)\bm{X}^{(\bm{\tau})} is 44-wise clean with parameter λ\lambda, the latter expectation can be bounded by nT22L+λnT\cdot 2^{2-L}+\lambda:

𝔼[𝒖(𝝉+1)𝒖(𝝉),𝒂(𝝉+1)2|(𝝉)]nT22L+λ\operatorname*{\mathbb{E}}\left[\left\langle\bm{u}^{(\bm{\tau}+1)}-\bm{u}^{(\bm{\tau})},\bm{a}^{(\bm{\tau}+1)}\right\rangle^{2}\,\middle|\,\mathcal{F}^{(\bm{\tau})}\right]\leq nT\cdot 2^{2-L}+\lambda (6.9)

Putting everything together, we have

𝔼[(Δ𝒛2(𝝉+1))2|(𝝉)]\displaystyle\operatorname*{\mathbb{E}}\left[\left(\Delta\bm{z}^{(\bm{\tau}+1)}_{2}\right)^{2}\,\middle|\,\mathcal{F}^{(\bm{\tau})}\right] (𝜷(𝝉+1))2𝔼[𝒖(𝝉+1)𝒖(𝝉),𝒂(𝝉+1)2|(𝝉)]+12n7T32L\displaystyle\leq\left(\bm{\beta}^{(\bm{\tau}+1)}\right)^{2}\operatorname*{\mathbb{E}}\left[\left\langle\bm{u}^{(\bm{\tau}+1)}-\bm{u}^{(\bm{\tau})},\bm{a}^{(\bm{\tau}+1)}\right\rangle^{2}\,\middle|\,\mathcal{F}^{(\bm{\tau})}\right]+12n^{7}T^{3}\cdot 2^{-L} (by Equation 6.6)
(𝜷(𝝉+1))2(nT22L+λ)+12n7T32L\displaystyle\leq\left(\bm{\beta}^{(\bm{\tau}+1)}\right)^{2}\cdot\left(nT\cdot 2^{2-L}+\lambda\right)+12n^{7}T^{3}\cdot 2^{-L} (by Equation 6.9)
λ(𝜷(𝝉+1))2+n3T322L+12n7T32L\displaystyle\leq\lambda\cdot\left(\bm{\beta}^{(\bm{\tau}+1)}\right)^{2}+n^{3}T^{3}\cdot 2^{2-L}+12n^{7}T^{3}\cdot 2^{-L} (by Equation 6.5)
λ𝒗(𝝉)𝒗(𝝉m1)2+n3T322L+12n7T32L\displaystyle\leq\lambda\cdot\left\|\bm{v}^{(\bm{\tau})}-\bm{v}^{(\bm{\tau}_{m-1})}\right\|^{2}+n^{3}T^{3}\cdot 2^{2-L}+12n^{7}T^{3}\cdot 2^{-L} (by Equation 6.7)
λ𝒗(𝝉)𝒗(𝝉m1)2+16n7T32L.\displaystyle\leq\lambda\cdot\left\|\bm{v}^{(\bm{\tau})}-\bm{v}^{(\bm{\tau}_{m-1})}\right\|^{2}+16n^{7}T^{3}\cdot 2^{-L}.

This completes the proof of the first statement in the lemma.

For the moreover part, let us condition on the event 𝝉m1<t<𝝉m\bm{\tau}_{m-1}<t<\bm{\tau}_{m} where Alice speaks at time tt. Note that all such tt lie in the same phase of the protocol, during which Alice is the only one speaking. So Bob’s center of mass does not change from time 𝝉m1\bm{\tau}_{m-1} through t+1t+1, i.e., 𝒗(t+1)=𝒗(𝝉m1)\bm{v}^{(t+1)}=\bm{v}^{(\bm{\tau}_{m-1})}. Thus we have

Δ𝒛2(t+1)=𝒖(t+1)𝒖(t),η𝒗(𝝉m1).\Delta\bm{z}^{(t+1)}_{2}=\left\langle\bm{u}^{(t+1)}-\bm{u}^{(t)},\eta\odot\bm{v}^{(\bm{\tau}_{m-1})}\right\rangle. (6.10)

Analogously to Equation 6.3, the components of Alice’s center of mass along the previous directions are fixed with precision 2L2^{-L}. Thus by Fact 6.4,

|𝒖(t+1)𝒖(t),𝒂(r)|2Lfor all rt.\left|\left\langle\bm{u}^{(t+1)}-\bm{u}^{(t)},\bm{a}^{(r)}\right\rangle\right|\leq 2^{-L}\quad\text{for all $r\leq t$.} (6.11)

Furthermore, by construction, η𝒗(𝝉m1)\eta\odot\bm{v}^{(\bm{\tau}_{m-1})} lies in the space spanned by 𝒂(1),,𝒂(𝝉m1+1)\bm{a}^{(1)},\ldots,\bm{a}^{(\bm{\tau}_{m-1}+1)}. Note that 𝝉m1+1t\bm{\tau}_{m-1}+1\leq t. Similar to the previous analysis, for each r=1,,tr=1,\ldots,t, let 𝜷(r):=η𝒗(t),𝒂(r)\bm{\beta}^{(r)}:=\left\langle\eta\odot\bm{v}^{(t)},\bm{a}^{(r)}\right\rangle be the length of η𝒗(t)\eta\odot\bm{v}^{(t)} along direction 𝒂(r)\bm{a}^{(r)}. Then Equation 6.5 also holds here. Therefore

|Δ𝒛2(t+1)|\displaystyle\left|\Delta\bm{z}^{(t+1)}_{2}\right| =|r=1t𝜷(r)𝒖(t+1)𝒖(t),𝒂(r)|\displaystyle=\left|\sum_{r=1}^{t}\bm{\beta}^{(r)}\cdot\left\langle\bm{u}^{(t+1)}-\bm{u}^{(t)},\bm{a}^{(r)}\right\rangle\right| (by Equation 6.10)
r=1t|𝜷(r)||𝒖(t+1)𝒖(t),𝒂(r)|r=1tnT2L\displaystyle\leq\sum_{r=1}^{t}\left|\bm{\beta}^{(r)}\right|\cdot\left|\left\langle\bm{u}^{(t+1)}-\bm{u}^{(t)},\bm{a}^{(r)}\right\rangle\right|\leq\sum_{r=1}^{t}nT\cdot 2^{-L} (by Equation 6.5 and Equation 6.11)
2n3T2L.\displaystyle\leq 2n^{3}T\cdot 2^{-L}. (by Claim 6.7)

Squaring this pointwise bound gives (Δ𝒛2(t+1))24n6T222L(\Delta\bm{z}^{(t+1)}_{2})^{2}\leq 4n^{6}T^{2}\cdot 2^{-2L}, which implies the moreover part of the lemma. ∎

6.4 Conversion to Second Moment Bounds of the Depth (Proof of Lemma 6.10)

Recall γ=γ(𝑿(0))γ(𝒀(0))\gamma^{*}=\gamma(\bm{X}^{(0)})\cdot\gamma(\bm{Y}^{(0)}) and γ()=γ(𝑿(D()))γ(𝒀(D()))\gamma(\ell)=\gamma(\bm{X}^{(D(\ell))})\cdot\gamma(\bm{Y}^{(D(\ell))}) for each leaf \ell. The goal of this subsection is to prove Lemma 6.10.

We first note the following basic fact.

Fact 6.13.

γ()=γ\sum_{\ell}\gamma(\ell)=\gamma^{*} and

𝐏𝐫𝒙𝑿(0),𝒚𝒀(0)[𝒞¯(𝒙,𝒚) reaches leaf ]=γ()/γ.\operatorname*{\mathbf{Pr}}_{\bm{x}\sim\bm{X}^{(0)},\bm{y}\sim\bm{Y}^{(0)}}\left[\overline{\mathcal{C}}(\bm{x},\bm{y})\text{ reaches leaf }\ell\right]=\gamma(\ell)/\gamma^{*}.

Now we apply Theorem 3.1 with k=2k=2 to relate the LHS of Lemma 6.10 with an entropy-type bound.

Lemma 6.14.

𝔼[𝒖(𝒅)2+𝒗(𝒅)2]4e2γγ()ln2(eγ())\operatorname*{\mathbb{E}}\left[\left\|\bm{u}^{(\bm{d})}\right\|^{2}+\left\|\bm{v}^{(\bm{d})}\right\|^{2}\right]\leq\frac{4e^{2}}{\gamma^{*}}\sum_{\ell}\gamma(\ell)\cdot\ln^{2}\left(\frac{e}{\gamma(\ell)}\right).

Proof.

Let \ell be a fixed leaf and D=D()D=D(\ell) be its depth. Note that this also fixes the rectangle X(D)×Y(D)X^{(D)}\times Y^{(D)} and thus the centers of mass u(D),v(D)u^{(D)},v^{(D)}. Define the indicator function 𝟏:2n{0,1}\mathbf{1}_{\ell}\colon\mathbb{R}^{2n}\to\{0,1\} by

𝟏(x,y)={1(x,y)X(D)×Y(D),0otherwise.\mathbf{1}_{\ell}(x,y)=\begin{cases}1&(x,y)\in X^{(D)}\times Y^{(D)},\\ 0&\text{otherwise.}\end{cases}

Then we have

u(D)2+v(D)2\displaystyle\phantom{\leq}\left\|u^{(D)}\right\|^{2}+\left\|v^{(D)}\right\|^{2}
=𝔼𝒙γ[𝒙𝒙|𝒙X(D)]2+𝔼𝒚γ[𝒚𝒚|𝒚Y(D)]2\displaystyle=\left\|\operatorname*{\mathbb{E}}_{\bm{x}\sim\gamma}\left[\bm{x}\overset{\mathchoice{\mathbin{{\hbox{\scalebox{0.5}{$\displaystyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\textstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptscriptstyle\bullet$}}}}}}{\otimes}\bm{x}\,\middle|\,\bm{x}\in X^{(D)}\right]\right\|^{2}+\left\|\operatorname*{\mathbb{E}}_{\bm{y}\sim\gamma}\left[\bm{y}\overset{\mathchoice{\mathbin{{\hbox{\scalebox{0.5}{$\displaystyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\textstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptscriptstyle\bullet$}}}}}}{\otimes}\bm{y}\,\middle|\,\bm{y}\in Y^{(D)}\right]\right\|^{2}
=i,j=1ijn(𝔼𝒙γ[𝒙i𝒙j|𝒙X(D)])2+i,j=1ijn(𝔼𝒚γ[𝒚i𝒚j|𝒚Y(D)])2\displaystyle=\sum_{\begin{subarray}{c}i,j=1\\ i\neq j\end{subarray}}^{n}\left(\operatorname*{\mathbb{E}}_{\bm{x}\sim\gamma}\left[\bm{x}_{i}\bm{x}_{j}\,\middle|\,\bm{x}\in X^{(D)}\right]\right)^{2}+\sum_{\begin{subarray}{c}i,j=1\\ i\neq j\end{subarray}}^{n}\left(\operatorname*{\mathbb{E}}_{\bm{y}\sim\gamma}\left[\bm{y}_{i}\bm{y}_{j}\,\middle|\,\bm{y}\in Y^{(D)}\right]\right)^{2}
=i,j=1ijn(𝔼𝒙,𝒚γ[𝒙i𝒙j|(𝒙,𝒚)X(D)×Y(D)])2+i,j=1ijn(𝔼𝒙,𝒚γ[𝒚i𝒚j|(𝒙,𝒚)X(D)×Y(D)])2\displaystyle=\sum_{\begin{subarray}{c}i,j=1\\ i\neq j\end{subarray}}^{n}\left(\operatorname*{\mathbb{E}}_{\bm{x},\bm{y}\sim\gamma}\left[\bm{x}_{i}\bm{x}_{j}\,\middle|\,(\bm{x},\bm{y})\in X^{(D)}\times Y^{(D)}\right]\right)^{2}+\sum_{\begin{subarray}{c}i,j=1\\ i\neq j\end{subarray}}^{n}\left(\operatorname*{\mathbb{E}}_{\bm{x},\bm{y}\sim\gamma}\left[\bm{y}_{i}\bm{y}_{j}\,\middle|\,(\bm{x},\bm{y})\in X^{(D)}\times Y^{(D)}\right]\right)^{2}
=2γ()2(S([n]2)(𝔼𝒙γ,𝒚γ[𝟏(𝒙,𝒚)𝒙S])2+S([n]2)(𝔼𝒙γ,𝒚γ[𝟏(𝒙,𝒚)𝒚S])2)\displaystyle=\frac{2}{\gamma(\ell)^{2}}\left(\sum_{S\in\binom{[n]}{2}}\left(\operatorname*{\mathbb{E}}_{\bm{x}\sim\gamma,\bm{y}\sim\gamma}\left[\mathbf{1}_{\ell}(\bm{x},\bm{y})\bm{x}_{S}\right]\right)^{2}+\sum_{S\in\binom{[n]}{2}}\left(\operatorname*{\mathbb{E}}_{\bm{x}\sim\gamma,\bm{y}\sim\gamma}\left[\mathbf{1}_{\ell}(\bm{x},\bm{y})\bm{y}_{S}\right]\right)^{2}\right)
2γ()2S([2n]2)(𝔼𝒘γn×γn[𝟏(𝒘)𝒘S])2\displaystyle\leq\frac{2}{\gamma(\ell)^{2}}\sum_{S\in\binom{[2n]}{2}}\left(\operatorname*{\mathbb{E}}_{\bm{w}\sim\gamma_{n}\times\gamma_{n}}\left[\mathbf{1}_{\ell}(\bm{w})\bm{w}_{S}\right]\right)^{2}
2γ()22e2γ()2ln2(eγ())\displaystyle\leq\frac{2}{\gamma(\ell)^{2}}\cdot 2e^{2}\gamma(\ell)^{2}\cdot\ln^{2}\left(\frac{e}{\gamma(\ell)}\right) (by Theorem 3.1)
=4e2ln2(eγ()).\displaystyle=4e^{2}\cdot\ln^{2}\left(\frac{e}{\gamma(\ell)}\right).

Therefore taking expectation over a random \ell, by Fact 6.13, we have

𝔼[𝒖(𝒅)2+𝒗(𝒅)2]4e2𝔼[ln2(eγ())]=4e2γγ()ln2(eγ()).\operatorname*{\mathbb{E}}\left[\left\|\bm{u}^{(\bm{d})}\right\|^{2}+\left\|\bm{v}^{(\bm{d})}\right\|^{2}\right]\leq 4e^{2}\cdot\operatorname*{\mathbb{E}}_{\bm{\ell}}\left[\ln^{2}\left(\frac{e}{\gamma(\bm{\ell})}\right)\right]=\frac{4e^{2}}{\gamma^{*}}\sum_{\ell}\gamma(\ell)\cdot\ln^{2}\left(\frac{e}{\gamma(\ell)}\right). ∎

Now in the next lemma, we bound the right hand side of Lemma 6.14 in terms of the second moment of the depth, which immediately proves Lemma 6.10.

Lemma 6.15.

Assume that Tn2LTn\leq 2^{L}. Then, γ()ln2(e/γ())O(1+γL2𝔼[𝐝2])\sum_{\ell}\gamma(\ell)\cdot\ln^{2}\left(e/{\gamma(\ell)}\right)\leq O(1+\gamma^{*}\cdot L^{2}\operatorname*{\mathbb{E}}[\bm{d}^{2}]).

Proof.

By Claim 6.6 and the assumption Tn2LTn\leq 2^{L}, each message has length at most L+log(Tn)2LL+\log(Tn)\leq 2L. We divide the leaves \ell into two cases based on γ()\gamma(\ell):

:γ()<23LD()γ()ln2(eγ())\displaystyle\sum_{\ell:\gamma(\ell)<2^{-3L\cdot D(\ell)}}\gamma(\ell)\cdot\ln^{2}\left(\frac{e}{\gamma(\ell)}\right)
:γ()<23LD()23LD()ln2(e23LD())\displaystyle\leq\sum_{\ell:\gamma(\ell)<2^{-3L\cdot D(\ell)}}2^{-3L\cdot D(\ell)}\cdot\ln^{2}\left(e\cdot 2^{3L\cdot D(\ell)}\right) (xln2(e/x)x\ln^{2}(e/x) is increasing when 0x0.20\leq x\leq 0.2)
t=123Lt2(9L2t2+1)|{:D()=t}|\displaystyle\leq\sum_{t=1}^{\infty}2^{-3L\cdot t}\cdot 2(9L^{2}t^{2}+1)\cdot\left|\left\{\ell:D(\ell)=t\right\}\right| (since ln2(ab)2ln2(a)+2ln2(b)\ln^{2}(ab)\leq 2\ln^{2}(a)+2\ln^{2}(b))
t=123Lt2(9L2t2+1)2(2L)t\displaystyle\leq\sum_{t=1}^{\infty}2^{-3L\cdot t}\cdot 2(9L^{2}t^{2}+1)\cdot 2^{(2L)\cdot t} (each message is of length 2L\leq 2L)
t=12(9L2t2+1)2Lt=O(1)\displaystyle\leq\sum_{t=1}^{\infty}2(9L^{2}t^{2}+1)\cdot 2^{-Lt}=O(1) (since L2L\geq 2)

and

:γ()23LD()γ()ln2(eγ())\displaystyle\sum_{\ell:\gamma(\ell)\geq 2^{-3L\cdot D(\ell)}}\gamma(\ell)\cdot\ln^{2}\left(\frac{e}{\gamma(\ell)}\right) :γ()23LD()γ()ln2(e23LD())\displaystyle\leq\sum_{\ell:\gamma(\ell)\geq 2^{-3L\cdot D(\ell)}}\gamma(\ell)\cdot\ln^{2}\left(e\cdot 2^{3L\cdot D(\ell)}\right)
29L2γ()D()2+2γ()\displaystyle\leq 2\cdot 9L^{2}\sum_{\ell}\gamma(\ell)D(\ell)^{2}+2\sum_{\ell}\gamma(\ell)
=18L2γ𝔼[D()2]+2\displaystyle=18L^{2}\gamma^{*}\cdot\operatorname*{\mathbb{E}}_{\bm{\ell}}\left[D(\bm{\ell})^{2}\right]+2
=18L2γ𝔼[𝒅2]+2.\displaystyle=18L^{2}\gamma^{*}\cdot\operatorname*{\mathbb{E}}\left[\bm{d}^{2}\right]+2.

Adding up the two estimates above gives the desired bound. ∎
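For concreteness, the series bounding the low-measure leaves can be summed numerically; it is bounded by an absolute constant for every L at least 2, as the snippet below illustrates:

```python
def small_leaf_series(L, terms=300):
    """sum_{t >= 1} 2 (9 L^2 t^2 + 1) 2^(-L t): the total contribution of
    the leaves with Gaussian measure below 2^(-3 L depth)."""
    return sum(2 * (9 * L**2 * t**2 + 1) * 2.0 ** (-L * t)
               for t in range(1, terms + 1))

for L in (2, 5, 10, 20):
    print(L, small_leaf_series(L))  # ~54.0, ~16.0, ~1.77, ~0.0069: all O(1)
```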

6.5 Second Moment Bounds for the Depth (Proof of Lemma 6.11)

The final ingredient is an estimate for the second moment 𝔼[𝒅2]\operatorname*{\mathbb{E}}[\bm{d}^{2}]. This subsection is devoted to this goal and proving Lemma 6.11.

For messages =(𝒄¯(1),,𝒄¯(t))\ell^{\prime}=(\overline{\bm{c}}^{(1)},\ldots,\overline{\bm{c}}^{(t)}), we define γ()=γ(𝑿(t))γ(𝒀(t))\gamma(\ell^{\prime})=\gamma(\bm{X}^{(t)})\cdot\gamma(\bm{Y}^{(t)}) where 𝑿(t),𝒀(t)\bm{X}^{(t)},\bm{Y}^{(t)} are determined by the protocol from the messages \ell^{\prime}. Note that this definition is consistent with γ()\gamma(\ell) from Subsection 6.4 for a leaf \ell.

Lemma 6.16.

There exists a universal constant α>0\alpha>0 such that the following holds. Let 0d1<d20\leq d_{1}<d_{2} be two arbitrary integers with d2d12d+1d_{2}-d_{1}\geq 2d+1. Let =(𝐜¯(1),,𝐜¯(d1))\ell^{*}=(\overline{\bm{c}}^{(1)},\ldots,\overline{\bm{c}}^{(d_{1})}) be arbitrary messages of the first d1d_{1} communication steps. Assume 2L8n4T22^{L}\geq 8n^{4}T^{2}. Then

𝐏𝐫[𝒅d2|]αd22L2λ(d2d12d)+1423Ld1γ().\operatorname*{\mathbf{Pr}}\left[\bm{d}\geq d_{2}\,\middle|\,\ell^{*}\right]\leq\frac{\alpha\cdot d_{2}^{2}L^{2}}{\lambda\cdot(d_{2}-d_{1}-2d)}+\frac{1}{4}\cdot\frac{2^{-3L\cdot d_{1}}}{\gamma(\ell^{*})}.
Proof.

Let 𝒙,𝒚\bm{x},\bm{y} be sampled from γ\gamma conditioned on 𝒙𝑿(0),𝒚𝒀(0)\bm{x}\in\bm{X}^{(0)},\bm{y}\in\bm{Y}^{(0)}. Let \bm{\ell} be its corresponding leaf in 𝒞¯\overline{\mathcal{C}} and 𝒅\bm{d} be the depth of \bm{\ell}. By Claim 6.7, \bm{\ell} always has finite depth. We extend 𝒂(t)=𝒃(t)=0n×n\bm{a}^{(t)}=\bm{b}^{(t)}=0^{n\times n} and 𝑿(t)=𝑿(𝒅),𝒀(t)=𝒀(𝒅)\bm{X}^{(t)}=\bm{X}^{(\bm{d})},\bm{Y}^{(t)}=\bm{Y}^{(\bm{d})} for all t>𝒅t>\bm{d}. Then define

𝒌(𝒙,𝒚)=t=d1+1d2(𝒙𝒙,𝒂(t)2+𝒚𝒚,𝒃(t)2)andK=𝔼𝒙,𝒚γ[𝒌(𝒙,𝒚)|],\bm{k}(\bm{x},\bm{y})=\sum_{t=d_{1}+1}^{d_{2}}\left(\left\langle\bm{x}\overset{\mathchoice{\mathbin{{\hbox{\scalebox{0.5}{$\displaystyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\textstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptscriptstyle\bullet$}}}}}}{\otimes}\bm{x},\bm{a}^{(t)}\right\rangle^{2}+\left\langle\bm{y}\overset{\mathchoice{\mathbin{{\hbox{\scalebox{0.5}{$\displaystyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\textstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptscriptstyle\bullet$}}}}}}{\otimes}\bm{y},\bm{b}^{(t)}\right\rangle^{2}\right)\quad\text{and}\quad K=\operatorname*{\mathbb{E}}_{\bm{x},\bm{y}\sim\gamma}\left[\bm{k}(\bm{x},\bm{y})\,\middle|\,\ell^{*}\right],

where 𝒂()\bm{a}^{(\cdot)}’s and 𝒃()\bm{b}^{(\cdot)}’s depend only on \bm{\ell}. (Note that \bm{\ell} specifies all the communication messages, which allows us to simulate the protocol and obtain each 𝒂()\bm{a}^{(\cdot)} and 𝒃()\bm{b}^{(\cdot)}.) Equivalently, we can write KK as

K=𝔼𝒙,𝒚γ[𝒌(𝒙,𝒚)|(𝒙,𝒚)X(d1)×Y(d1)],K=\operatorname*{\mathbb{E}}_{\bm{x},\bm{y}\sim\gamma}\left[\bm{k}(\bm{x},\bm{y})\,\middle|\,(\bm{x},\bm{y})\in X^{(d_{1})}\times Y^{(d_{1})}\right],

where X(d1)X^{(d_{1})} and Y(d1)Y^{(d_{1})} are fixed due to \ell^{*}.

Observe that for any fixed td1t\geq d_{1}, the rectangles 𝑿(t)×𝒀(t)\bm{X}^{(t)}\times\bm{Y}^{(t)} induced by the different \bm{\ell} consistent with \ell^{*} form a disjoint partition of X(d1)×Y(d1)X^{(d_{1})}\times Y^{(d_{1})}. Therefore sampling 𝒙,𝒚γ\bm{x},\bm{y}\sim\gamma conditioned on (𝒙,𝒚)X(d1)×Y(d1)(\bm{x},\bm{y})\in X^{(d_{1})}\times Y^{(d_{1})} is equivalent to

  • first sample random messages =(𝒄¯(d1+1),,𝒄¯(t))\bm{\ell}^{\prime}=(\overline{\bm{c}}^{(d_{1}+1)},\ldots,\overline{\bm{c}}^{(t)}) conditioned on \ell^{*},

  • then sample 𝒙,𝒚γ\bm{x},\bm{y}\sim\gamma conditioned on (𝒙,𝒚)𝑿(t)×𝒀(t)(\bm{x},\bm{y})\in\bm{X}^{(t)}\times\bm{Y}^{(t)} given \bm{\ell}^{\prime}.

Note that we can further expand \bm{\ell}^{\prime} to a leaf \bm{\ell} as a full communication path, and obtain the following equivalent sampling process:

  • Sample a random leaf \bm{\ell} conditioned on \ell^{*}.

  • Sample 𝒙,𝒚γ\bm{x},\bm{y}\sim\gamma conditioned on (𝒙,𝒚)𝑿(t)×𝒀(t)(\bm{x},\bm{y})\in\bm{X}^{(t)}\times\bm{Y}^{(t)} defined by the first tt messages of \bm{\ell}.

As a result, we have

K\displaystyle K =t=d1+1d2𝔼[𝔼𝒙,𝒚γ[𝒙𝒙,𝒂(t)2+𝒚𝒚,𝒃(t)2|(𝒙,𝒚)𝑿(t)×𝒀(t)]|]\displaystyle=\sum_{t=d_{1}+1}^{d_{2}}\operatorname*{\mathbb{E}}_{\bm{\ell}}\left[\operatorname*{\mathbb{E}}_{\bm{x},\bm{y}\sim\gamma}\left[\left\langle\bm{x}\overset{\mathchoice{\mathbin{{\hbox{\scalebox{0.5}{$\displaystyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\textstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptscriptstyle\bullet$}}}}}}{\otimes}\bm{x},\bm{a}^{(t)}\right\rangle^{2}+\left\langle\bm{y}\overset{\mathchoice{\mathbin{{\hbox{\scalebox{0.5}{$\displaystyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\textstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptscriptstyle\bullet$}}}}}}{\otimes}\bm{y},\bm{b}^{(t)}\right\rangle^{2}\,\middle|\,(\bm{x},\bm{y})\in\bm{X}^{(t)}\times\bm{Y}^{(t)}\right]\,\middle|\,\ell^{*}\right]
=𝔼[t=d1+1d2𝔼𝒙γ[𝒙𝒙,𝒂(t)2|𝒙𝑿(t)]+𝔼𝒚γ[𝒚𝒚,𝒃(t)2|𝒚𝒀(t)]|].\displaystyle=\operatorname*{\mathbb{E}}_{\bm{\ell}}\left[\sum_{t=d_{1}+1}^{d_{2}}\operatorname*{\mathbb{E}}_{\bm{x}\sim\gamma}\left[\left\langle\bm{x}\overset{\mathchoice{\mathbin{{\hbox{\scalebox{0.5}{$\displaystyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\textstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptscriptstyle\bullet$}}}}}}{\otimes}\bm{x},\bm{a}^{(t)}\right\rangle^{2}\,\middle|\,\bm{x}\in\bm{X}^{(t)}\right]+\operatorname*{\mathbb{E}}_{\bm{y}\sim\gamma}\left[\left\langle\bm{y}\overset{\mathchoice{\mathbin{{\hbox{\scalebox{0.5}{$\displaystyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\textstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptscriptstyle\bullet$}}}}}}{\otimes}\bm{y},\bm{b}^{(t)}\right\rangle^{2}\,\middle|\,\bm{y}\in\bm{Y}^{(t)}\right]\,\middle|\,\ell^{*}\right].

Observe that there are at most 2d2d steps 3(a) and 3(b) in \bm{\ell}. This means that, if 𝒅d2\bm{d}\geq d_{2}, then from the (d1+1)(d_{1}+1)-th to the d2d_{2}-th communication step, there are at least d2d12dd_{2}-d_{1}-2d cleanup steps (i.e., steps 3(c)), each of which contributes at least λ\lambda to KK. Thus we can lower bound KK by

Kλ(d2d12d)𝐏𝐫[𝒅d2|].K\geq\lambda\cdot(d_{2}-d_{1}-2d)\cdot\operatorname*{\mathbf{Pr}}\left[\bm{d}\geq d_{2}\,\middle|\,\ell^{*}\right]. (6.12)

On the other hand by Claim 6.7, there are at most n2n^{2} non-zero 𝒂()\bm{a}^{(\cdot)}’s and at most n2n^{2} non-zero 𝒃()\bm{b}^{(\cdot)}’s in each communication path. Thus

𝒌(𝒙,𝒚)n2(maxx𝑿(0)xx2+maxy𝒀(0)yy2)<2n4T2.\bm{k}(\bm{x},\bm{y})\leq n^{2}\cdot\left(\max_{x\in\bm{X}^{(0)}}\left\|x\overset{\mathchoice{\mathbin{{\hbox{\scalebox{0.5}{$\displaystyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\textstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptscriptstyle\bullet$}}}}}}{\otimes}x\right\|^{2}+\max_{y\in\bm{Y}^{(0)}}\left\|y\overset{\mathchoice{\mathbin{{\hbox{\scalebox{0.5}{$\displaystyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\textstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptscriptstyle\bullet$}}}}}}{\otimes}y\right\|^{2}\right)<2n^{4}T^{2}. (6.13)

We now obtain another upper bound using Theorem 3.3. Let ¯=(𝒄¯(1),,𝒄¯(d2))\overline{\bm{\ell}}=(\overline{\bm{c}}^{(1)},\ldots,\overline{\bm{c}}^{(d_{2})}) extend \ell^{*} for the next d2d1d_{2}-d_{1} messages (if ¯\overline{\bm{\ell}} becomes a leaf before d2d_{2}, we simply pad it with dummy messages). Then K=𝔼¯[𝒌(¯)|]K=\operatorname*{\mathbb{E}}_{\overline{\bm{\ell}}}\left[\bm{k}(\overline{\bm{\ell}})\,\middle|\,\ell^{*}\right] where 𝒌(¯):=𝔼𝒙,𝒚γ[𝒌(𝒙,𝒚)|¯]\bm{k}(\overline{\ell}):=\operatorname*{\mathbb{E}}_{\bm{x},\bm{y}\sim\gamma}\left[\bm{k}(\bm{x},\bm{y})\,\middle|\,\overline{\ell}\right]. Note that ¯\overline{\ell} fixes a()a^{(\cdot)}’s and b()b^{(\cdot)}’s in 𝒌(𝒙,𝒚)\bm{k}(\bm{x},\bm{y}). Therefore we use 𝒌¯(𝒙,𝒚)\bm{k}_{\overline{\ell}}(\bm{x},\bm{y}) to denote 𝒌(𝒙,𝒚)\bm{k}(\bm{x},\bm{y}) with the directions a()a^{(\cdot)}’s and b()b^{(\cdot)}’s fixed by ¯\overline{\ell}. We now bound 𝒌(¯)\bm{k}(\overline{\ell}):

𝒌(¯)\displaystyle\bm{k}(\overline{\ell}) t=0𝐏𝐫𝒙,𝒚γ[𝒌¯(𝒙,𝒚)t|¯]=t=0𝐏𝐫𝒙,𝒚γ[𝒌¯(𝒙,𝒚)t,¯]𝐏𝐫𝒙,𝒚γ[¯]\displaystyle\leq\sum_{t=0}^{\infty}\operatorname*{\mathbf{Pr}}_{\bm{x},\bm{y}\sim\gamma}\left[\bm{k}_{\overline{\ell}}(\bm{x},\bm{y})\geq t\,\middle|\,\overline{\ell}\right]=\sum_{t=0}^{\infty}\frac{\operatorname*{\mathbf{Pr}}_{\bm{x},\bm{y}\sim\gamma}\left[\bm{k}_{\overline{\ell}}(\bm{x},\bm{y})\geq t,\overline{\ell}\right]}{\operatorname*{\mathbf{Pr}}_{\bm{x},\bm{y}\sim\gamma}\left[\overline{\ell}\right]}
=t=0min{1,𝐏𝐫𝒙,𝒚γ[𝒌¯(𝒙,𝒚)t,¯]γ(¯)}\displaystyle=\sum_{t=0}^{\infty}\min\left\{1,\frac{\operatorname*{\mathbf{Pr}}_{\bm{x},\bm{y}\sim\gamma}\left[\bm{k}_{\overline{\ell}}(\bm{x},\bm{y})\geq t,\overline{\ell}\right]}{\gamma(\overline{\ell})}\right\} (by the definition of γ()\gamma(\cdot))
t=0min{1,𝐏𝐫𝒙,𝒚γ[𝒌¯(𝒙,𝒚)t]γ(¯)}.\displaystyle\leq\sum_{t=0}^{\infty}\min\left\{1,\frac{\operatorname*{\mathbf{Pr}}_{\bm{x},\bm{y}\sim\gamma}\left[\bm{k}_{\overline{\ell}}(\bm{x},\bm{y})\geq t\right]}{\gamma(\overline{\ell})}\right\}. (6.14)

We now analyze 𝐏𝐫𝒙,𝒚γ[𝒌¯(𝒙,𝒚)t]\operatorname*{\mathbf{Pr}}_{\bm{x},\bm{y}\sim\gamma}\left[\bm{k}_{\overline{\ell}}(\bm{x},\bm{y})\geq t\right] using Theorem 3.3. Since a(t),b(t)a^{(t)},b^{(t)} cannot be non-zero simultaneously, we rearrange the matrices and assume a(d1+1),,a(d),b(d+1),,b(d′′)a^{(d_{1}+1)},\ldots,a^{(d^{\prime})},b^{(d^{\prime}+1)},\ldots,b^{(d^{\prime\prime})} are the only non-zero matrices where d′′d2d^{\prime\prime}\leq d_{2}. Then

𝒌¯(𝒙,𝒚)=t=d1+1d𝒙𝒙,a(t)2+t=d+1d′′𝒚𝒚,b(t)2.\bm{k}_{\overline{\ell}}(\bm{x},\bm{y})=\sum_{t=d_{1}+1}^{d^{\prime}}\left\langle\bm{x}\overset{\mathchoice{\mathbin{{\hbox{\scalebox{0.5}{$\displaystyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\textstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptscriptstyle\bullet$}}}}}}{\otimes}\bm{x},a^{(t)}\right\rangle^{2}+\sum_{t=d^{\prime}+1}^{d^{\prime\prime}}\left\langle\bm{y}\overset{\mathchoice{\mathbin{{\hbox{\scalebox{0.5}{$\displaystyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\textstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptscriptstyle\bullet$}}}}}}{\otimes}\bm{y},b^{(t)}\right\rangle^{2}.

Note that the aa’s (resp., bb’s) satisfy the condition in Theorem 3.3. Let 1/κ1/\kappa be the constant inside the Ω\Omega in Theorem 3.3 (in particular, κ=56448\kappa=56448 suffices from our proof in Appendix B). Hence

𝐏𝐫[𝒌¯(𝒙,𝒚)t]\displaystyle\operatorname*{\mathbf{Pr}}\left[\bm{k}_{\overline{\ell}}(\bm{x},\bm{y})\geq t\right] 𝐏𝐫[t=d1+1d𝒙𝒙,a(t)2t/2]+𝐏𝐫[t=d+1d′′𝒚𝒚,b(t)2t/2]\displaystyle\leq\operatorname*{\mathbf{Pr}}\left[\sum_{t=d_{1}+1}^{d^{\prime}}\left\langle\bm{x}\overset{\mathchoice{\mathbin{{\hbox{\scalebox{0.5}{$\displaystyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\textstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptscriptstyle\bullet$}}}}}}{\otimes}\bm{x},a^{(t)}\right\rangle^{2}\geq t/2\right]+\operatorname*{\mathbf{Pr}}\left[\sum_{t=d^{\prime}+1}^{d^{\prime\prime}}\left\langle\bm{y}\overset{\mathchoice{\mathbin{{\hbox{\scalebox{0.5}{$\displaystyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\textstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptstyle\bullet$}}}}}{\mathbin{{\hbox{\scalebox{0.5}{$\scriptscriptstyle\bullet$}}}}}}{\otimes}\bm{y},b^{(t)}\right\rangle^{2}\geq t/2\right]
2exp{1κt/2dd1+t/2}+2exp{1κt/2d′′d+t/2}\displaystyle\leq 2\exp\left\{-\frac{1}{\kappa}\cdot\frac{t/2}{d^{\prime}-d_{1}+\sqrt{t/2}}\right\}+2\exp\left\{-\frac{1}{\kappa}\cdot\frac{t/2}{d^{\prime\prime}-d^{\prime}+\sqrt{t/2}}\right\} (by Theorem 3.3 and assuming t196max{dd1,d′′d}t\geq 196\cdot\max\left\{d^{\prime}-d_{1},d^{\prime\prime}-d^{\prime}\right\})
4exp{1κt/2d2d1+t/2}.\displaystyle\leq 4\exp\left\{-\frac{1}{\kappa}\cdot\frac{t/2}{d_{2}-d_{1}+\sqrt{t/2}}\right\}. (since d1dd′′d2d_{1}\leq d^{\prime}\leq d^{\prime\prime}\leq d_{2})

Thus for any t196(d2d1)196max{dd1,d′′d}t\geq 196\cdot(d_{2}-d_{1})\geq 196\cdot\max\left\{d^{\prime}-d_{1},d^{\prime\prime}-d^{\prime}\right\}, we have

𝐏𝐫[𝒌¯(𝒙,𝒚)t]4exp{1κt/2d2d1+t/2}.\operatorname*{\mathbf{Pr}}\left[\bm{k}_{\overline{\ell}}(\bm{x},\bm{y})\geq t\right]\leq 4\exp\left\{-\frac{1}{\kappa}\cdot\frac{t/2}{d_{2}-d_{1}+\sqrt{t/2}}\right\}. (6.15)

For γ(¯)23Ld2\gamma(\overline{\ell})\geq 2^{-3L\cdot d_{2}}, we plug Equation 6.15 into Equation 6.14 and obtain

𝒌(¯)\displaystyle\bm{k}(\overline{\ell}) t=0196(d2d1)21+t>196(d2d1)2min{1,23Ld2+1exp{1κt/2d2d1+t/2}}\displaystyle\leq\sum_{t=0}^{196\cdot(d_{2}-d_{1})^{2}}1+\sum_{t>196\cdot(d_{2}-d_{1})^{2}}\min\left\{1,2^{3L\cdot d_{2}+1}\cdot\exp\left\{-\frac{1}{\kappa}\cdot\frac{t/2}{d_{2}-d_{1}+\sqrt{t/2}}\right\}\right\} (by Equation 6.15)
196(d2d1)2+1+t196(d2d1)2min{1,23Ld2+1e1κt/22t/2}\displaystyle\leq 196\cdot(d_{2}-d_{1})^{2}+1+\sum_{t\geq 196\cdot(d_{2}-d_{1})^{2}}\min\left\{1,2^{3L\cdot d_{2}+1}\cdot e^{-\frac{1}{\kappa}\cdot\frac{t/2}{2\sqrt{t/2}}}\right\}
197d22+t1min{1,23Ld2+1et/22κ}\displaystyle\leq 197\cdot d_{2}^{2}+\sum_{t\geq 1}\min\left\{1,2^{3L\cdot d_{2}+1}\cdot e^{-\frac{\sqrt{t/2}}{2\kappa}}\right\}
αd22L2,\displaystyle\leq\alpha\cdot d_{2}^{2}L^{2}, (6.16)

where α\alpha is another universal constant. Now we have

K=𝔼¯[𝒌(¯)|]=¯γ(¯)γ()𝒌(¯)=¯:γ(¯)<23Ld2γ(¯)γ()𝒌(¯)+¯:γ(¯)23Ld2γ(¯)γ()𝒌(¯),K=\operatorname*{\mathbb{E}}_{\overline{\bm{\ell}}}\left[\bm{k}(\overline{\bm{\ell}})\,\middle|\,\ell^{*}\right]=\sum_{\overline{\ell}}\frac{\gamma(\overline{\ell})}{\gamma(\ell^{*})}\cdot\bm{k}(\overline{\ell})=\sum_{\overline{\ell}:\gamma(\overline{\ell})<2^{-3L\cdot d_{2}}}\frac{\gamma(\overline{\ell})}{\gamma(\ell^{*})}\cdot\bm{k}(\overline{\ell})+\sum_{\overline{\ell}:\gamma(\overline{\ell})\geq 2^{-3L\cdot d_{2}}}\frac{\gamma(\overline{\ell})}{\gamma(\ell^{*})}\cdot\bm{k}(\overline{\ell}),

where the first summation can be bounded by

¯:γ(¯)<23Ld2γ(¯)γ()𝒌(¯)\displaystyle\sum_{\overline{\ell}:\gamma(\overline{\ell})<2^{-3L\cdot d_{2}}}\frac{\gamma(\overline{\ell})}{\gamma(\ell^{*})}\cdot\bm{k}(\overline{\ell}) 23Ld1γ()¯23L(d2d1)2n4T2\displaystyle\leq\frac{2^{-3L\cdot d_{1}}}{\gamma(\ell^{*})}\cdot\sum_{\overline{\ell}}2^{-3L\cdot(d_{2}-d_{1})}\cdot 2n^{4}T^{2} (by Equation 6.13)
23Ld1γ()22L(d2d1)23L(d2d1)2n4T2\displaystyle\leq\frac{2^{-3L\cdot d_{1}}}{\gamma(\ell^{*})}\cdot 2^{2L\cdot(d_{2}-d_{1})}\cdot 2^{-3L\cdot(d_{2}-d_{1})}\cdot 2n^{4}T^{2} (since \ell^{*} is fixed and each message is at most 2L2L bits)
23Ld1γ()2n4T22L\displaystyle\leq\frac{2^{-3L\cdot d_{1}}}{\gamma(\ell^{*})}\cdot\frac{2n^{4}T^{2}}{2^{L}} (since d2d11d_{2}-d_{1}\geq 1)

and the second summation is bounded by

¯:γ(¯)23Ld2γ(¯)γ()𝒌(¯)¯γ(¯)γ()αd22L2=αd22L2.\sum_{\overline{\ell}:\gamma(\overline{\ell})\geq 2^{-3L\cdot d_{2}}}\frac{\gamma(\overline{\ell})}{\gamma(\ell^{*})}\cdot\bm{k}(\overline{\ell})\leq\sum_{\overline{\ell}}\frac{\gamma(\overline{\ell})}{\gamma(\ell^{*})}\cdot\alpha\cdot d_{2}^{2}L^{2}=\alpha\cdot d_{2}^{2}L^{2}. (by Equation 6.16)

Then combining Equation 6.12, we have

λ(d2d12d)𝐏𝐫[𝒅d2|]αd22L2+23Ld1γ()2n4T22L.\lambda\cdot(d_{2}-d_{1}-2d)\cdot\operatorname*{\mathbf{Pr}}\left[\bm{d}\geq d_{2}\,\middle|\,\ell^{*}\right]\leq\alpha\cdot d_{2}^{2}L^{2}+\frac{2^{-3L\cdot d_{1}}}{\gamma(\ell^{*})}\cdot\frac{2n^{4}T^{2}}{2^{L}}.

Since 2L8n4T22^{L}\geq 8n^{4}T^{2} and d2d12d+1d_{2}-d_{1}\geq 2d+1 by assumption, we conclude that

𝐏𝐫[𝒅d2|]αd22L2λ(d2d12d)+1423Ld1γ().\operatorname*{\mathbf{Pr}}\left[\bm{d}\geq d_{2}\,\middle|\,\ell^{*}\right]\leq\frac{\alpha\cdot d_{2}^{2}L^{2}}{\lambda\cdot(d_{2}-d_{1}-2d)}+\frac{1}{4}\cdot\frac{2^{-3L\cdot d_{1}}}{\gamma(\ell^{*})}. ∎
Corollary 6.17.

Assume γ3/4\gamma^{*}\geq 3/4, TnT\leq n, LΘ(log(n))L\geq\Theta(\log(n)), and λΘ(dL2log2(n))\lambda\geq\Theta(dL^{2}\log^{2}(n)). Then for each k=0,1,,4log(n)k=0,1,\ldots,4\log(n), we have

𝐏𝐫[𝒅4kd]2k+kn5.\operatorname*{\mathbf{Pr}}\left[\bm{d}\geq 4kd\right]\leq 2^{-k}+\frac{k}{n^{5}}.
Proof.

We prove the bound by induction on kk. The base case k=0k=0 is trivial. For the inductive case, let \ell^{*} be the first 4(k1)d4(k-1)d communication messages. Then we bound

P:=:γ()/γ<23L4(k1)dγ()γ𝐏𝐫[𝒅4kd|]P:=\sum_{\ell^{*}:\gamma(\ell^{*})/\gamma^{*}<2^{-3L\cdot 4(k-1)d}}\frac{\gamma(\ell^{*})}{\gamma^{*}}\cdot\operatorname*{\mathbf{Pr}}\left[\bm{d}\geq 4kd\,\middle|\,\ell^{*}\right]

and

Q:=:γ()/γ23L4(k1)dγ()γ𝐏𝐫[𝒅4kd|]Q:=\sum_{\ell^{*}:\gamma(\ell^{*})/\gamma^{*}\geq 2^{-3L\cdot 4(k-1)d}}\frac{\gamma(\ell^{*})}{\gamma^{*}}\cdot\operatorname*{\mathbf{Pr}}\left[\bm{d}\geq 4kd\,\middle|\,\ell^{*}\right]

separately.

For PP, observe that if k=1k=1 then \ell^{*} is the root of the protocol, thus γ()=γ\gamma(\ell^{*})=\gamma^{*} and P=0P=0. On the other hand, if k2k\geq 2, then

P\displaystyle P :γ()/γ<23L4(k1)d23L4(k1)d23L4(k1)d\displaystyle\leq\sum_{\ell^{*}:\gamma(\ell^{*})/\gamma^{*}<2^{-3L\cdot 4(k-1)d}}2^{-3L\cdot 4(k-1)d}\leq\sum_{\ell^{*}}2^{-3L\cdot 4(k-1)d}
22L4(k1)d23L4(k1)d\displaystyle\leq 2^{2L\cdot 4(k-1)d}\cdot 2^{-3L\cdot 4(k-1)d} (each communication message is at most 2L2L bits)
=2L4(k1)dn5.\displaystyle=2^{-L\cdot 4(k-1)d}\leq n^{-5}. (since k2k\geq 2 and LΘ(log(n))L\geq\Theta(\log(n)))

Now we turn to QQ. Applying Lemma 6.16 with \ell^{*} and d1=4(k1)d,d2=4kdd_{1}=4(k-1)d,d_{2}=4kd, we have

Q\displaystyle Q :γ()/γ23L4(k1)dγ()γ(16αk2d2L22dλ+1423L4(k1)dγ())\displaystyle\leq\sum_{\ell^{*}:\gamma(\ell^{*})/\gamma^{*}\geq 2^{-3L\cdot 4(k-1)d}}\frac{\gamma(\ell^{*})}{\gamma^{*}}\cdot\left(\frac{16\alpha\cdot k^{2}d^{2}L^{2}}{2d\lambda}+\frac{1}{4}\cdot\frac{2^{-3L\cdot 4(k-1)d}}{\gamma(\ell^{*})}\right)
γ()γ(8αk2dL2λ+14γ)\displaystyle\leq\sum_{\ell^{*}}\frac{\gamma(\ell^{*})}{\gamma^{*}}\cdot\left(\frac{8\alpha\cdot k^{2}dL^{2}}{\lambda}+\frac{1}{4\gamma^{*}}\right)
=𝐏𝐫[𝒅4(k1)d](8αk2dL2λ+14γ)\displaystyle=\operatorname*{\mathbf{Pr}}\left[\bm{d}\geq 4(k-1)d\right]\cdot\left(\frac{8\alpha\cdot k^{2}dL^{2}}{\lambda}+\frac{1}{4\gamma^{*}}\right)
𝐏𝐫[𝒅4(k1)d]12\displaystyle\leq\operatorname*{\mathbf{Pr}}\left[\bm{d}\geq 4(k-1)d\right]\cdot\frac{1}{2} (since γ3/4\gamma^{*}\geq 3/4 and λΘ(dL2log2(n)),k4log(n)\lambda\geq\Theta(dL^{2}\log^{2}(n)),k\leq 4\log(n))
(2(k1)+k1n5)122k+k1n5.\displaystyle\leq\left(2^{-(k-1)}+\frac{k-1}{n^{5}}\right)\cdot\frac{1}{2}\leq 2^{-k}+\frac{k-1}{n^{5}}. (by induction hypothesis)

By adding up PP and QQ, we complete the induction. ∎

Given Corollary 6.17 and a suitable choice of parameters, we now prove the second moment bound.

Proof of Lemma 6.11.

With $L=\Theta(\log(n))$, $T=\Theta(\sqrt{\log(n)})$, and $\lambda=\Theta(d\log^{4}(n))$, by Fact 6.3, we have $\gamma^{*}\geq 3/4$. Therefore the second moment of $\bm{d}$ is

\operatorname*{\mathbb{E}}[\bm{d}^{2}]\leq\sum_{k=0}^{4\log(n)}\left(4(k+1)d\right)^{2}\cdot\operatorname*{\mathbf{Pr}}\left[\bm{d}\geq 4kd\right]+\operatorname*{\mathbf{Pr}}\left[\bm{d}\geq 16d\log(n)\right]\cdot(2n^{2})^{2} (by Claim 6.7)
\leq\sum_{k=0}^{4\log(n)}\left(4(k+1)d\right)^{2}\cdot\left(2^{-k}+\frac{k}{n^{5}}\right)+\left(n^{-4}+\frac{4\log(n)}{n^{5}}\right)\cdot(2n^{2})^{2} (by Corollary 6.17)
=O(d^{2}). ∎
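As a quick numerical sanity check (our own illustration, not part of the proof), the dominant contribution above is the geometrically weighted series, and one can confirm it equals a fixed constant times $d^{2}$:

# Sanity check (illustrative): sum_{k>=0} (4(k+1)d)^2 * 2^{-k}
# = 16 d^2 * sum_{k>=0} (k+1)^2 * 2^{-k} = 16 d^2 * 12 = 192 d^2 = O(d^2).
def dominant_series(d, kmax=200):
    return sum((4 * (k + 1) * d) ** 2 * 2 ** (-k) for k in range(kmax + 1))

for d in [1, 10, 100]:
    print(d, dominant_series(d) / d ** 2)   # prints 192.0 for every d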

7 Fourier Growth Reductions For General Gadgets

In this section, we show that Fourier growth bounds for communication protocols with general (constant-sized) gadgets can be reduced to bounds for the XOR-fiber, and vice versa. This implies that, for the purposes of studying Fourier growth, all such gadgets are essentially equivalent.

Let $m_{1},m_{2}$ be two positive integers. Let $g\colon\{\pm 1\}^{m_{1}}\times\{\pm 1\}^{m_{2}}\to\{\pm 1\}$ be a gadget. Recall that $\nu$ is the uniform distribution over $\{\pm 1\}^{n}$. We now use $\nu_{1},\nu_{2},\overline{\nu}_{1},\overline{\nu}_{2}$ to denote the uniform distributions over $\{\pm 1\}^{m_{1}},\{\pm 1\}^{m_{2}},(\{\pm 1\}^{m_{1}})^{n},(\{\pm 1\}^{m_{2}})^{n}$ respectively. We define the $g$-fiber of communication protocols similarly to the XOR-fiber:

Definition 7.1.

For any randomized two-party protocol $\mathcal{C}\colon(\{\pm 1\}^{m_{1}})^{n}\times(\{\pm 1\}^{m_{2}})^{n}\to[-1,1]$, its $g$-fiber, denoted by $\mathcal{C}_{\downarrow g}\colon\{\pm 1\}^{n}\to[-1,1]$, is defined by

\mathcal{C}_{\downarrow g}(z)=\operatorname*{\mathbb{E}}_{\bm{x}\sim\overline{\nu}_{1},\bm{y}\sim\overline{\nu}_{2}}\left[\mathcal{C}(\bm{x},\bm{y})\,\middle|\,g(\bm{x}_{i},\bm{y}_{i})=z_{i}~\forall i\right],

where the expectation is also over the internal randomness of $\mathcal{C}$.
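To make the definition concrete, here is a minimal brute-force sketch (our own illustration; the choices of $\mathcal{C}$, $n$, $m_{1}$, $m_{2}$ are arbitrary) that computes a $g$-fiber by direct conditioning:

# Brute-force sketch of Definition 7.1 (illustrative only): compute the g-fiber
# of a bounded two-party function C by conditioning on g(x_i, y_i) = z_i for all i.
import itertools

def g_fiber(C, g, n, m1, m2):
    blocks1 = list(itertools.product([1, -1], repeat=m1))
    blocks2 = list(itertools.product([1, -1], repeat=m2))
    fiber = {}
    for z in itertools.product([1, -1], repeat=n):
        total, count = 0.0, 0
        for x in itertools.product(blocks1, repeat=n):
            for y in itertools.product(blocks2, repeat=n):
                if all(g(x[i], y[i]) == z[i] for i in range(n)):
                    total += C(x, y)
                    count += 1
        fiber[z] = total / count
    return fiber

# Example: the XOR gadget on single bits (in ±1 notation XOR is multiplication)
# and C(x, y) = x_1 * y_1 with n = 2; the fiber is h(z) = z_1.
xor = lambda xi, yi: xi[0] * yi[0]
C = lambda x, y: x[0][0] * y[0][0]
print(g_fiber(C, xor, n=2, m1=1, m2=1))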

To compare Fourier growth bounds between gadgets, we use $L_{1,k}(g,d,m_{1},m_{2},n)$ to denote the upper bound on the level-$k$ Fourier growth of the $g$-fiber of an arbitrary randomized communication protocol $\mathcal{C}\colon(\{\pm 1\}^{m_{1}})^{n}\times(\{\pm 1\}^{m_{2}})^{n}\to[-1,1]$ with at most $d$ bits of communication, where $g\colon\{\pm 1\}^{m_{1}}\times\{\pm 1\}^{m_{2}}\to\{\pm 1\}$ is the gadget. Since randomized protocols are convex combinations of deterministic protocols of the same cost, using this notation, our main results Theorems 1.2 and 1.3 can be rephrased as

L_{1,1}(\mathrm{XOR},d,1,1,n)\leq O\left(\sqrt{d}\right)\quad\text{and}\quad L_{1,2}(\mathrm{XOR},d,1,1,n)\leq O\left(d^{3/2}\log^{3}(n)\right).

For any set $S\subseteq[m_{1}]$, define $x_{S}=\prod_{i\in S}x_{i}$, and similarly define $y_{T}$ for $T\subseteq[m_{2}]$. Analogous to the standard Fourier representation of Boolean functions, the gadget $g$, being a two-party function, also has a Fourier representation:

g(x,y)=\sum_{S\subseteq[m_{1}],T\subseteq[m_{2}]}\widehat{g}(S,T)\cdot x_{S}y_{T},\quad\text{where}\quad\widehat{g}(S,T)=\operatorname*{\mathbb{E}}_{\bm{x}\sim\nu_{1},\bm{y}\sim\nu_{2}}\left[g(\bm{x},\bm{y})\cdot\bm{x}_{S}\bm{y}_{T}\right].

For convenience, we will assume that $g$ satisfies the following assumption; it is easy to see that the XOR gadget satisfies it.

Assumption 7.2.

$\widehat{g}(S,T)=0$ if $S=\emptyset$ or $T=\emptyset$.

Remark 7.3.

This assumption is equivalent to saying that, after fixing any input on Alice's side, the remaining function on Bob's side is balanced, and vice versa.

Even if $g$ does not satisfy the assumption, we can embed it inside a similar gadget $g^{\prime}\colon\{\pm 1\}^{m_{1}+1}\times\{\pm 1\}^{m_{2}+1}\to\{\pm 1\}$, obtained by XORing the last bit of Alice and the last bit of Bob with the old gadget $g$ applied to Alice's first $m_{1}$ bits and Bob's first $m_{2}$ bits, i.e.,

g^{\prime}(x,y)=x_{m_{1}+1}y_{m_{2}+1}\cdot g(x_{\leq m_{1}},y_{\leq m_{2}}).

Then $g^{\prime}$ satisfies the assumption and inherits the properties of $g$ that are relevant for communication complexity tasks, as the sketch below illustrates.
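The following sketch (our own illustration; the unbalanced gadget $g$ used here is an arbitrary example) builds $g^{\prime}$ from $g$ and verifies Assumption 7.2 by computing all Fourier coefficients of $g^{\prime}$ by brute force:

# Illustrative check: g'(x, y) = x_{m1+1} * y_{m2+1} * g(x_{<=m1}, y_{<=m2})
# satisfies Assumption 7.2 even when g itself does not.
import itertools

def fourier_coeff(f, m1, m2, S, T):
    total = 0.0
    for x in itertools.product([1, -1], repeat=m1):
        for y in itertools.product([1, -1], repeat=m2):
            chi = 1
            for i in S:
                chi *= x[i]
            for j in T:
                chi *= y[j]
            total += f(x, y) * chi
    return total / 2 ** (m1 + m2)

g = lambda x, y: -1 if (x[0] == -1 and y[0] == -1) else 1   # unbalanced: ĝ(∅,∅) = 1/2
g_prime = lambda x, y: x[-1] * y[-1] * g(x[:-1], y[:-1])    # one extra bit per side

subsets = [(), (0,), (1,), (0, 1)]
for S in subsets:
    for T in subsets:
        if not S or not T:
            # every coefficient of g' with S = ∅ or T = ∅ vanishes
            assert abs(fourier_coeff(g_prime, 2, 2, S, T)) < 1e-9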

Now consider a protocol $\mathcal{C}\colon(\{\pm 1\}^{m_{1}})^{n}\times(\{\pm 1\}^{m_{2}})^{n}\to[-1,1]$; it is also a two-party function and thus admits a similar Fourier representation. We view an input from $(\{\pm 1\}^{m_{1}})^{n}$ as indexed by tuples in $[n]\times[m_{1}]$. Therefore any subset of $[n]\times[m_{1}]$ is uniquely identified with $\bigcup_{i\in[n]}\left\{i\right\}\times S_{i}$, where each $S_{i}\subseteq[m_{1}]$. We use $S^{[n]}$ to denote $(S_{i})_{i\in[n]}$. Thus the Fourier coefficients of $\mathcal{C}$ can be written as

\widehat{\mathcal{C}}(S^{[n]},T^{[n]}):=\widehat{\mathcal{C}}\left(\bigcup_{i\in[n]}\left\{i\right\}\times S_{i},\bigcup_{i\in[n]}\left\{i\right\}\times T_{i}\right),

and the Fourier representation of $\mathcal{C}$ is

\mathcal{C}(x,y)=\sum_{S^{[n]},T^{[n]}}\widehat{\mathcal{C}}(S^{[n]},T^{[n]})\cdot\prod_{i\in[n]}x_{i,S_{i}}\cdot\prod_{j\in[n]}y_{j,T_{j}},

where $x_{i,S}=\prod_{j\in S}x_{i,j}$ and similarly for $y_{j,T}$.

Under this notation, and given Assumption 7.2, we can explicitly compute the Fourier coefficients of any $g$-fiber.

Fact 7.4.

Assume the gadget $g\colon\{\pm 1\}^{m_{1}}\times\{\pm 1\}^{m_{2}}\to\{\pm 1\}$ satisfies Assumption 7.2. Then we have

\widehat{\mathcal{C}_{\downarrow g}}(I)=\sum_{\begin{subarray}{c}S^{I},T^{I}\\ S_{i}\neq\emptyset,T_{i}\neq\emptyset,\forall i\in I\end{subarray}}\widehat{\mathcal{C}}(S^{I},T^{I})\cdot\prod_{i\in I}\widehat{g}(S_{i},T_{i})\quad\text{for any $I\subseteq[n]$,}

where we use $S^{I}$ to denote $S^{[n]}$ with $S_{j}$ fixed to $\emptyset$ for all $j\notin I$.

Proof.

Observe that

\widehat{\mathcal{C}_{\downarrow g}}(I)=\operatorname*{\mathbb{E}}_{\bm{z}\sim\nu}\left[\mathcal{C}_{\downarrow g}(\bm{z})\cdot\prod_{i\in I}\bm{z}_{i}\right]
=\operatorname*{\mathbb{E}}_{\bm{z}\sim\nu}\left[\operatorname*{\mathbb{E}}_{\bm{x}\sim\overline{\nu}_{1},\bm{y}\sim\overline{\nu}_{2}}\left[\mathcal{C}(\bm{x},\bm{y})\,\middle|\,g(\bm{x}_{i},\bm{y}_{i})=\bm{z}_{i}~\forall i\right]\cdot\prod_{i\in I}\bm{z}_{i}\right]
=\operatorname*{\mathbb{E}}_{\bm{z}\sim\nu}\left[\operatorname*{\mathbb{E}}_{\bm{x}\sim\overline{\nu}_{1},\bm{y}\sim\overline{\nu}_{2}}\left[\mathcal{C}(\bm{x},\bm{y})\cdot\prod_{i\in I}g(\bm{x}_{i},\bm{y}_{i})\,\middle|\,g(\bm{x}_{i},\bm{y}_{i})=\bm{z}_{i}~\forall i\right]\right].

Since $\widehat{g}(\emptyset,\emptyset)=0$ by Assumption 7.2, the gadget $g$ is balanced, so every fiber $\{(x,y):g(x_{i},y_{i})=z_{i}~\forall i\}$ has the same size; hence, averaging over $\bm{z}$, every pair $(x,y)$ is sampled with the same probability under the conditional distribution. Thus we get

\widehat{\mathcal{C}_{\downarrow g}}(I)=\operatorname*{\mathbb{E}}_{\bm{x}\sim\overline{\nu}_{1},\bm{y}\sim\overline{\nu}_{2}}\left[\mathcal{C}(\bm{x},\bm{y})\cdot\prod_{i\in I}g(\bm{x}_{i},\bm{y}_{i})\right].

Now we expand 𝒞\mathcal{C} and gg in the Fourier basis and obtain

𝒞g^(I)\displaystyle\widehat{\mathcal{C}_{\downarrow g}}(I) =𝔼𝒙ν¯1,𝒚ν¯2[(S[n],T[n]𝒞^(S[n],T[n])i[n]𝒙i,Sij[n]𝒚j,Tj)iI(Si,Tig^(Si,Ti)𝒙i,Si𝒚i,Ti)]\displaystyle=\operatorname*{\mathbb{E}}_{\bm{x}\sim\overline{\nu}_{1},\bm{y}\sim\overline{\nu}_{2}}\left[\left(\sum_{S^{[n]},T^{[n]}}\widehat{\mathcal{C}}(S^{[n]},T^{[n]})\prod_{i\in[n]}\bm{x}_{i,S_{i}}\prod_{j\in[n]}\bm{y}_{j,T_{j}}\right)\cdot\prod_{i\in I}\left(\sum_{S_{i},T_{i}}\widehat{g}(S_{i},T_{i})\bm{x}_{i,S_{i}}\bm{y}_{i,T_{i}}\right)\right]
=𝔼𝒙ν¯1,𝒚ν¯2[(S[n],T[n]𝒞^(S[n],T[n])i[n]𝒙i,Sij[n]𝒚j,Tj)(SI,TIiIg^(Si,Ti)𝒙i,Si𝒚i,Ti)]\displaystyle=\operatorname*{\mathbb{E}}_{\bm{x}\sim\overline{\nu}_{1},\bm{y}\sim\overline{\nu}_{2}}\left[\left(\sum_{S^{[n]},T^{[n]}}\widehat{\mathcal{C}}(S^{[n]},T^{[n]})\prod_{i\in[n]}\bm{x}_{i,S_{i}}\prod_{j\in[n]}\bm{y}_{j,T_{j}}\right)\left(\sum_{S^{I},T^{I}}\prod_{i\in I}\widehat{g}(S_{i},T_{i})\bm{x}_{i,S_{i}}\bm{y}_{i,T_{i}}\right)\right]
=SI,TI𝒞^(SI,TI)iIg^(Si,Ti)\displaystyle=\sum_{S^{I},T^{I}}\widehat{\mathcal{C}}(S^{I},T^{I})\cdot\prod_{i\in I}\widehat{g}(S_{i},T_{i})
=SI,TISi,Ti,iI𝒞^(SI,TI)iIg^(Si,Ti),\displaystyle=\sum_{\begin{subarray}{c}S^{I},T^{I}\\ S_{i}\neq\emptyset,T_{i}\neq\emptyset,\forall i\in I\end{subarray}}\widehat{\mathcal{C}}(S^{I},T^{I})\cdot\prod_{i\in I}\widehat{g}(S_{i},T_{i}), (by Assumption 7.2)

as desired. ∎
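As a quick numerical sanity check of Fact 7.4 (our own sketch; the proof above uses only the Fourier expansion of $\mathcal{C}$, so any bounded two-party function works), one can verify for the XOR gadget on single bits, whose only nonzero coefficient is $\widehat{g}(\{1\},\{1\})=1$, that $\widehat{\mathcal{C}_{\downarrow\mathrm{XOR}}}(I)=\widehat{\mathcal{C}}(1^{I},1^{I})$:

# Sanity check of Fact 7.4 for g = XOR on single bits (m1 = m2 = 1).
import itertools, random

n = 3
pts = list(itertools.product([1, -1], repeat=n))
C = {(x, y): random.uniform(-1, 1) for x in pts for y in pts}

def fiber_coeff(I):                  # left side: E_z[C_xor(z) * prod_{i in I} z_i]
    total = 0.0
    for z in pts:
        # conditioned on x_i * y_i = z_i, y is determined by x: y_i = x_i * z_i
        hz = sum(C[x, tuple(x[i] * z[i] for i in range(n))] for x in pts) / len(pts)
        chi = 1
        for i in I:
            chi *= z[i]
        total += hz * chi
    return total / len(pts)

def protocol_coeff(I):               # right side: E_{x,y}[C(x,y) * prod_{i in I} x_i y_i]
    total = 0.0
    for x in pts:
        for y in pts:
            chi = 1
            for i in I:
                chi *= x[i] * y[i]
            total += C[x, y] * chi
    return total / len(pts) ** 2

print(fiber_coeff((0, 2)), protocol_coeff((0, 2)))   # the two values agree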

Now we present the reduction from the XOR-fiber to a general $g$-fiber.

Theorem 7.5.

Assume the gadget $g\colon\{\pm 1\}^{m_{1}}\times\{\pm 1\}^{m_{2}}\to\{\pm 1\}$ satisfies Assumption 7.2. Then

L_{1,k}(\mathrm{XOR},d,1,1,n)\leq\left(\max_{S,T}|\widehat{g}(S,T)|\right)^{-k}\cdot L_{1,k}(g,d,m_{1},m_{2},n)
\leq 2^{(m_{1}+m_{2})\cdot k/2}\cdot L_{1,k}(g,d,m_{1},m_{2},n).
Proof.

Let $\mathcal{C}\colon\{\pm 1\}^{n}\times\{\pm 1\}^{n}\to[-1,1]$ be an arbitrary protocol of cost at most $d$. Then for a fixed set $I\subseteq[n]$, by Fact 7.4 applied to the XOR gadget, we have

\widehat{\mathcal{C}_{\downarrow\mathrm{XOR}}}(I)=\widehat{\mathcal{C}}(1^{I},1^{I}). (7.1)

Let $S\subseteq[m_{1}]$ and $T\subseteq[m_{2}]$ maximize $|\widehat{g}(S,T)|$. Since $g$ satisfies Assumption 7.2, we know $S$ and $T$ are nonempty.

Now define a different protocol $\mathcal{C}^{\prime}\colon(\{\pm 1\}^{m_{1}})^{n}\times(\{\pm 1\}^{m_{2}})^{n}\to[-1,1]$ as follows: after receiving input $x$, Alice computes $x^{\prime}_{i}=x_{i,S}$ for each block $x_{i}$; Bob similarly computes $y^{\prime}_{i}=y_{i,T}$ upon receiving input $y$. Then they execute the protocol $\mathcal{C}$ on $x^{\prime}$ and $y^{\prime}$. That is, $\mathcal{C}^{\prime}(x,y)=\mathcal{C}(x^{\prime},y^{\prime})$. Therefore, for any $I\subseteq[n]$ and $S^{I},T^{I}$ satisfying $S_{i}\neq\emptyset,T_{i}\neq\emptyset$ for $i\in I$, we have

\widehat{\mathcal{C}^{\prime}}(S^{I},T^{I})=\begin{cases}\widehat{\mathcal{C}}(1^{I},1^{I})&S_{i}=S,T_{i}=T,~\forall i\in I,\\ 0&\text{otherwise.}\end{cases}

Then by Equation 7.1 and Fact 7.4 applied to $\mathcal{C}^{\prime}$ with gadget $g$, we have

\widehat{\mathcal{C}_{\downarrow g}^{\prime}}(I)=\widehat{\mathcal{C}}(1^{I},1^{I})\cdot\widehat{g}(S,T)^{|I|}=\widehat{\mathcal{C}_{\downarrow\mathrm{XOR}}}(I)\cdot\widehat{g}(S,T)^{|I|}.

Now summing over all $I\subseteq[n]$ of size $k$, we have

L_{1,k}(\mathcal{C}_{\downarrow\mathrm{XOR}})=\sum_{I\subseteq[n]:|I|=k}\left|\widehat{\mathcal{C}_{\downarrow\mathrm{XOR}}}(I)\right|=|\widehat{g}(S,T)|^{-k}\cdot\sum_{I\subseteq[n]:|I|=k}\left|\widehat{\mathcal{C}_{\downarrow g}^{\prime}}(I)\right|=|\widehat{g}(S,T)|^{-k}\cdot L_{1,k}(\mathcal{C}^{\prime}_{\downarrow g})
\leq|\widehat{g}(S,T)|^{-k}\cdot L_{1,k}(g,d,m_{1},m_{2},n). (since $\mathcal{C}^{\prime}$ has cost at most $d$)

Since 𝒞\mathcal{C} is arbitrary, this proves the first half of Theorem 7.5. To prove the second half, we use an averaging argument and Parseval’s identity on gg:

|g^(S,T)|2m1m2S,Tg^(S,T)2=2m1m2.|\widehat{g}(S,T)|\geq\sqrt{2^{-m_{1}-m_{2}}\sum_{S^{\prime},T^{\prime}}\widehat{g}(S^{\prime},T^{\prime})^{2}}=\sqrt{2^{-m_{1}-m_{2}}}.

Using a similar analysis, we also obtain a reduction in the other direction, from a general $g$-fiber to the XOR-fiber.

Theorem 7.6.

Assume the gadget $g\colon\{\pm 1\}^{m_{1}}\times\{\pm 1\}^{m_{2}}\to\{\pm 1\}$ satisfies Assumption 7.2. Then

L_{1,k}(g,d,m_{1},m_{2},n)\leq\left(\sum_{S,T}|\widehat{g}(S,T)|\right)^{k}\cdot L_{1,k}(\mathrm{XOR},d,1,1,n)
\leq 2^{(m_{1}+m_{2})\cdot k/2}\cdot L_{1,k}(\mathrm{XOR},d,1,1,n).
Proof.

Let $\mathcal{C}\colon(\{\pm 1\}^{m_{1}})^{n}\times(\{\pm 1\}^{m_{2}})^{n}\to[-1,1]$ be an arbitrary protocol of cost at most $d$. Then for a fixed set $I\subseteq[n]$, by Fact 7.4 applied to the gadget $g$ and using Assumption 7.2, we have

\widehat{\mathcal{C}_{\downarrow g}}(I)=\sum_{S^{I},T^{I}}\widehat{\mathcal{C}}(S^{I},T^{I})\cdot\prod_{i\in I}\widehat{g}(S_{i},T_{i}).

Therefore

L_{1,k}(\mathcal{C}_{\downarrow g})\leq\sum_{I\subseteq[n]:|I|=k}\sum_{S^{I},T^{I}}\left|\widehat{\mathcal{C}}(S^{I},T^{I})\right|\cdot\left|\prod_{i\in I}\widehat{g}(S_{i},T_{i})\right|.

Now let $M=\sum_{S,T}|\widehat{g}(S,T)|$, and let $\rho$ be the distribution over pairs $(S,T)$ with $S\subseteq[m_{1}]$ and $T\subseteq[m_{2}]$ whose probability mass function is

\rho(S,T)=|\widehat{g}(S,T)|/M.

Then we can rewrite the bound on $L_{1,k}(\mathcal{C}_{\downarrow g})$ as

L_{1,k}(\mathcal{C}_{\downarrow g})\leq\sum_{I\subseteq[n]:|I|=k}\operatorname*{\mathbb{E}}_{(\bm{S}^{I},\bm{T}^{I})\sim\rho^{I}}\left[\left|\widehat{\mathcal{C}}(\bm{S}^{I},\bm{T}^{I})\right|\cdot M^{k}\right]
=M^{k}\cdot\operatorname*{\mathbb{E}}_{(\bm{S}^{[n]},\bm{T}^{[n]})\sim\rho^{[n]}}\left[\sum_{I\subseteq[n]:|I|=k}\left|\widehat{\mathcal{C}}(\bm{S}^{I},\bm{T}^{I})\right|\right]. (7.2)

Now we fix an arbitrary $(S^{[n]},T^{[n]})$ sampled from $\rho^{[n]}$. Note that $S_{i}$ and $T_{i}$ are nonempty by the definition of $\rho$ and Assumption 7.2. Define a different protocol $\mathcal{C}^{\prime}\colon\{\pm 1\}^{n}\times\{\pm 1\}^{n}\to[-1,1]$ as follows: after receiving input $x$, Alice samples $x^{\prime}\in(\{\pm 1\}^{m_{1}})^{n}$ uniformly conditioned on $x^{\prime}_{i,S_{i}}=x_{i}$ for all $i\in[n]$; Bob similarly samples $y^{\prime}\in(\{\pm 1\}^{m_{2}})^{n}$ conditioned on $y^{\prime}_{i,T_{i}}=y_{i}$ for all $i\in[n]$. Then they execute the protocol $\mathcal{C}$ on $x^{\prime}$ and $y^{\prime}$. That is, $\mathcal{C}^{\prime}(x,y)=\operatorname*{\mathbb{E}}_{\bm{x}^{\prime},\bm{y}^{\prime}}[\mathcal{C}(\bm{x}^{\prime},\bm{y}^{\prime})]$. Therefore, for any $I\subseteq[n]$, we have

\widehat{\mathcal{C}^{\prime}}(1^{I},1^{I})=\widehat{\mathcal{C}}(S^{I},T^{I}).

By Fact 7.4 applied to $\mathcal{C}^{\prime}$ and the XOR gadget, we have

\widehat{\mathcal{C}^{\prime}_{\downarrow\mathrm{XOR}}}(I)=\widehat{\mathcal{C}^{\prime}}(1^{I},1^{I})=\widehat{\mathcal{C}}(S^{I},T^{I}).

Since $\mathcal{C}^{\prime}$ has cost at most $d$, we have

\sum_{I\subseteq[n]:|I|=k}\left|\widehat{\mathcal{C}}(S^{I},T^{I})\right|=\sum_{I\subseteq[n]:|I|=k}\left|\widehat{\mathcal{C}^{\prime}_{\downarrow\mathrm{XOR}}}(I)\right|=L_{1,k}(\mathcal{C}^{\prime}_{\downarrow\mathrm{XOR}})\leq L_{1,k}(\mathrm{XOR},d,1,1,n).

Plugging this back into Equation 7.2, we have

L_{1,k}(\mathcal{C}_{\downarrow g})\leq M^{k}\cdot L_{1,k}(\mathrm{XOR},d,1,1,n),

which proves the first half of Theorem 7.6 since $\mathcal{C}$ is arbitrary. To prove the second half, we use the Cauchy–Schwarz inequality and Parseval's identity on $g$:

M=\sum_{S,T}|\widehat{g}(S,T)|\leq\sqrt{2^{m_{1}+m_{2}}\sum_{S,T}\widehat{g}(S,T)^{2}}=\sqrt{2^{m_{1}+m_{2}}}. ∎
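Both Parseval-based estimates used in Theorems 7.5 and 7.6 are easy to sanity-check numerically (a sketch with an arbitrary random gadget; the parameters are illustrative):

# For a random ±1 gadget on m1 + m2 bits, Parseval gives sum ĝ(S,T)^2 = 1, hence
# max |ĝ| >= 2^{-(m1+m2)/2} (averaging) and sum |ĝ| <= 2^{(m1+m2)/2} (Cauchy-Schwarz).
import itertools, random

m1, m2 = 2, 2
table = {b: random.choice([1, -1])
         for b in itertools.product([1, -1], repeat=m1 + m2)}

coeffs = []
for S in itertools.product([0, 1], repeat=m1):       # 0/1 indicator of each subset
    for T in itertools.product([0, 1], repeat=m2):
        c = 0.0
        for bits, val in table.items():
            chi = 1
            for bit, sel in zip(bits, S + T):
                if sel:
                    chi *= bit
            c += val * chi
        coeffs.append(c / 2 ** (m1 + m2))

assert max(abs(c) for c in coeffs) >= 2 ** (-(m1 + m2) / 2) - 1e-9
assert sum(abs(c) for c in coeffs) <= 2 ** ((m1 + m2) / 2) + 1e-9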

As a corollary, in studying Fourier growth bounds we can conveniently switch between gadgets, as long as the gadgets are of small size.

Corollary 7.7.

Assume the gadgets $g\colon\{\pm 1\}^{m_{1}}\times\{\pm 1\}^{m_{2}}\to\{\pm 1\}$ and $g^{\prime}\colon\{\pm 1\}^{m_{1}^{\prime}}\times\{\pm 1\}^{m_{2}^{\prime}}\to\{\pm 1\}$ satisfy Assumption 7.2. Then

L_{1,k}(g,d,m_{1},m_{2},n)\leq 2^{(m_{1}+m_{2}+m_{1}^{\prime}+m_{2}^{\prime})\cdot k/2}\cdot L_{1,k}(g^{\prime},d,m_{1}^{\prime},m_{2}^{\prime},n).

8 Directions Towards Further Improvements

In this section we propose potential directions for further improving our second-level bounds. In Subsection 8.1, we show that better Fourier growth bounds can be obtained from strong lifting theorems in a black-box way; this relies on the Fourier growth reductions of Section 7. In Subsection 8.2, we examine the bottleneck in our analysis and identify the major obstacles within it.

8.1 Better Lifting Theorems Imply Better Fourier Growth

Let $f:\{\pm 1\}^{n}\to\{\pm 1\}$ be a Boolean function and let $g:\{\pm 1\}^{m_{1}}\times\{\pm 1\}^{m_{2}}\to\{\pm 1\}$ be a gadget. A lifting theorem connects the communication complexity of $f\circ g$ with the query complexity of $f$. Some lifting theorems show that a low-cost communication protocol can be simulated by a low-cost query algorithm.

To be more precise, let $\mathcal{C}:(\{\pm 1\}^{m_{1}})^{n}\times(\{\pm 1\}^{m_{2}})^{n}\to[-1,1]$ be a randomized two-party protocol. Recall from Definition 7.1 that the $g$-fiber of $\mathcal{C}$, denoted $\mathcal{C}_{\downarrow g}\colon\{\pm 1\}^{n}\to[-1,1]$, is defined by

\mathcal{C}_{\downarrow g}(z)=\operatorname*{\mathbb{E}}_{\bm{x}\sim\overline{\nu}_{1},\bm{y}\sim\overline{\nu}_{2}}\left[\mathcal{C}(\bm{x},\bm{y})\,\middle|\,g(\bm{x}_{i},\bm{y}_{i})=z_{i}~\forall i\right].

We say that $g$ satisfies a strong lifting theorem if for every randomized protocol $\mathcal{C}$ with small communication, there is a randomized decision tree of small depth that approximates $\mathcal{C}_{\downarrow g}$ on each input with error $1/\mathrm{poly}(n)$ (see, e.g., [26]).

Theorem 8.1.

Assume the gadget $g\colon\{\pm 1\}^{m_{1}}\times\{\pm 1\}^{m_{2}}\to\{\pm 1\}$ satisfies Assumption 7.2. Assume that for any randomized protocol $\mathcal{C}\colon(\{\pm 1\}^{m_{1}})^{n}\times(\{\pm 1\}^{m_{2}})^{n}\to[-1,1]$ with at most $d$ bits of communication, there exists a randomized decision tree $\mathcal{T}$ of depth at most $D$ that approximates $\mathcal{C}_{\downarrow g}$ with pointwise error at most $1/n^{k}$, i.e.,

\left|\mathcal{T}(z)-\mathcal{C}_{\downarrow g}(z)\right|\leq n^{-k}\quad\forall z\in\{\pm 1\}^{n}.

Then, for any randomized protocol $\mathcal{C}^{\prime}\colon\{\pm 1\}^{n}\times\{\pm 1\}^{n}\to[-1,1]$ with at most $d$ bits of communication, its XOR-fiber $\mathcal{C}^{\prime}_{\downarrow\mathrm{XOR}}$ has level-$k$ Fourier growth

L_{1,k}(\mathcal{C}^{\prime}_{\downarrow\mathrm{XOR}})\leq\left(\max_{S,T}|\widehat{g}(S,T)|\right)^{-k}\cdot\sqrt{D^{k}\cdot O\left(\log(n)\right)^{k-1}}
\leq 2^{(m_{1}+m_{2})\cdot k/2}\cdot\sqrt{D^{k}\cdot O\left(\log(n)\right)^{k-1}}.

As a simple corollary, if the assumption of Theorem 8.1 holds with $k=2$, $D=d\cdot\mathrm{polylog}(n)$, and a polylogarithmic-sized gadget $g$ (i.e., $2^{m_{1}},2^{m_{2}}\leq\mathrm{polylog}(n)$), then the second-level Fourier growth of the XOR-fiber of any randomized protocol of cost $d$ is at most $d\cdot\mathrm{polylog}(n)$, as desired.

We also remark that state-of-the-art lifting results hold with the gadget $g$ being either:

  • The inner product on $m_{1}=m_{2}=O(\log(n))$ bits [12]. However, for such $g$ the largest Fourier coefficient squared is $1/\mathrm{poly}(n)$, which yields a trivial bound in Theorem 8.1.

  • The index function with $m_{1}=\mathrm{poly}(n)$, $m_{2}=\log(m_{1})$ [26]. (For deterministic lifting, a better bound of $m_{1}=O(n\log(n))$ is known [37], but it does not suffice for our reduction.) In this case the largest Fourier coefficient squared is $1/m_{1}^{2}$, which again yields a trivial bound in Theorem 8.1. Nonetheless, even a polynomial improvement on $m_{1}$, say $m_{1}=n^{0.01}$, would give new non-trivial bounds in Theorem 8.1 and in turn improve our lower bound on the XOR-lift of Forrelation.

Proof of Theorem 8.1.

Let $\mathcal{C}\colon(\{\pm 1\}^{m_{1}})^{n}\times(\{\pm 1\}^{m_{2}})^{n}\to[-1,1]$ be a randomized protocol of cost at most $d$. Then by assumption, $\mathcal{C}_{\downarrow g}$ can be approximated up to pointwise error $1/n^{k}$ by a randomized decision tree $\mathcal{T}$ of depth at most $D$. Thus every Fourier coefficient of $\mathcal{C}_{\downarrow g}$ differs from the corresponding coefficient of $\mathcal{T}$ by at most $1/n^{k}$. Therefore, by the level-$k$ Fourier growth bounds on randomized decision trees [64, 57], we have

L_{1,k}(\mathcal{C}_{\downarrow g})\leq\sum_{S\subseteq[n]:|S|=k}\left(n^{-k}+\left|\widehat{\mathcal{T}}(S)\right|\right)\leq\sqrt{D^{k}\cdot O(\log(n))^{k-1}},

where the first term contributes at most $\binom{n}{k}\cdot n^{-k}\leq 1$ and is absorbed into the bound.

Since 𝒞\mathcal{C} is arbitrary, the claimed bound for 𝒞XOR\mathcal{C}^{\prime}_{\downarrow\mathrm{XOR}} follows from Theorem 7.5. ∎

8.2 Sums of Squares of Quadratic Forms for Pairwise Clean Sets

In our analysis for the level-two bound, we showed that one can transform a general protocol into a $4$-wise clean protocol with parameter $\lambda=d\cdot\mathrm{polylog}(n)$ by adding $O(d)$ additional cleanup steps in expectation. If one could show that, with essentially the same number of steps, one could take $\lambda=\mathrm{polylog}(n)$, then we would obtain the optimal level-two bound of $d\cdot\mathrm{polylog}(n)$.

We recall that to bound the number of cleanup steps, we rely on a concentration inequality for sums of squares of orthonormal quadratic forms (Theorem 3.3), which says that if $M_{1},\ldots,M_{m}$ are matrices with zero diagonal that form an orthonormal set when viewed as $n^{2}$-dimensional vectors, then the random variable $\bm{q}=\sum_{i=1}^{m}\langle\bm{x}\mathbin{\overset{\bullet}{\otimes}}\bm{x},M_{i}\rangle^{2}$ satisfies $\operatorname*{\mathbf{Pr}}_{\bm{x}\sim\gamma_{n}}[\bm{q}\geq t]\leq e^{-\Omega(\sqrt{t})}$ for any $t\gtrsim m^{2}$. Using this tail bound for $m=\Theta(d)$ and conditioning on $\bm{x}\in X$, where $X$ is an arbitrary subset of $\mathbb{R}^{n}$ with Gaussian measure $\approx 2^{-d}$, we obtained the bound $\operatorname*{\mathbb{E}}_{\bm{x}\sim\gamma}[\bm{q}\,|\,\bm{x}\in X]\lesssim d^{2}$. This shows that there can be at most $O(d)$ quadratic forms $M_{i}$ for which $\operatorname*{\mathbb{E}}_{\bm{x}\sim\gamma}[\langle\bm{x}\mathbin{\overset{\bullet}{\otimes}}\bm{x},M_{i}\rangle^{2}\,|\,\bm{x}\in X]$ exceeds $d$, and hence the reason we can only take $\lambda\approx d$. We note that the argument just described is for the non-adaptive setting, while in our case the $M_{i}$'s are also being chosen adaptively, so additional work is needed.

The next example shows that the aforementioned statement is tight even in the non-adaptive setting where the $M_{i}$'s are fixed: in particular, there is a set $X$ of Gaussian measure $2^{-\Theta(d)}$ and $\approx d$ such orthonormal quadratic forms where the above expectation after conditioning on $\bm{x}\in X$ is $\Theta(d^{2})$.

Example 8.2.

For $1\leq i<j\leq\sqrt{d}$, let $M_{ij}=E_{ij}$, where $E_{ij}$ denotes the $n\times n$ matrix whose only nonzero entry is a one in position $(i,j)$. Note that the matrices $M_{ij}$ form an orthonormal set and they all have zero diagonal. Let $X=\left\{x\in\mathbb{R}^{n}\,:\,|x_{i}|\gtrsim d^{1/4}\text{ for all }i\leq d^{1/2}\right\}$. Then the Gaussian measure is $\gamma(X)=2^{-\Theta(d)}$, but

\operatorname*{\mathbb{E}}_{\bm{x}\sim\gamma}\left[\sum_{1\leq i<j\leq\sqrt{d}}\left\langle\bm{x}\mathbin{\overset{\bullet}{\otimes}}\bm{x},M_{ij}\right\rangle^{2}\,\middle|\,\bm{x}\in X\right]=\Theta(d^{2}).
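Since $X$ is a product set, the conditional law is a product of truncated Gaussians, so the example is easy to check by simulation (an illustrative sketch; the truncation constant is arbitrary, and note $\langle x\mathbin{\overset{\bullet}{\otimes}}x,E_{ij}\rangle=x_{i}x_{j}$):

# Monte-Carlo illustration of Example 8.2: conditioned on |x_i| >= d^{1/4} for
# i <= sqrt(d), the sum over pairs i < j of (x_i * x_j)^2 is Theta(d^2).
import random
from statistics import NormalDist

N = NormalDist()

def truncated_tail_sample(a):
    # |x| >= a: inverse-CDF sample from the positive tail, then a random sign
    p = N.cdf(a) + random.random() * (1 - N.cdf(a))
    x = N.inv_cdf(min(p, 1 - 1e-12))   # clamp for numerical safety
    return x if random.random() < 0.5 else -x

def example_sum(d):
    k = int(d ** 0.5)                  # only the first sqrt(d) coordinates matter
    xs = [truncated_tail_sample(d ** 0.25) for _ in range(k)]
    s2 = sum(x * x for x in xs)
    s4 = sum(x ** 4 for x in xs)
    return (s2 * s2 - s4) / 2          # = sum_{i<j} x_i^2 x_j^2

for d in [64, 256, 1024]:
    avg = sum(example_sum(d) for _ in range(200)) / 200
    print(d, avg / d ** 2)             # ratio stays Theta(1), approaching 1/2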

Note that the set $X$ in the example above is not pairwise clean, and for our application one can get around this by first ensuring that the protocol is pairwise clean and then proceeding with the $4$-wise cleanup process. Motivated by this, we speculate that when the set is pairwise clean, the expected value of the sum of squares of orthonormal quadratic forms is much smaller, unlike in the example above. Assuming such a statement and combining it with our ideas for handling the adaptivity suggests a potential way of improving the level-two bounds.

References

  • AA [18] Scott Aaronson and Andris Ambainis. Forrelation: A problem that optimally separates quantum from classical computing. SIAM J. Comput., 47(3):982–1038, 2018.
  • Aar [10] Scott Aaronson. BQP and the polynomial hierarchy. In STOC, pages 141–150, 2010.
  • ABK [23] Scott Aaronson, Harry Buhrman, and William Kretschmer. A qubit, a coin, and an advice string walk into a relational problem. arXiv preprint arXiv:2302.10332, 2023.
  • Agr [20] Rohit Agrawal. Coin theorems and the fourier expansion. Chic. J. Theor. Comput. Sci., 2020, 2020.
  • ALM [20] Radosław Adamczak, Rafał Latała, and Rafał Meller. Hanson–wright inequality in banach spaces. Annales de l'Institut Henri Poincaré, Probabilités et Statistiques, 56(4), nov 2020.
  • BCW [98] Harry Buhrman, Richard Cleve, and Avi Wigderson. Quantum vs. classical communication and computation. In STOC, pages 63–68. ACM, 1998.
  • BIJ+ [21] Jarosław Błasiok, Peter Ivanov, Yaonan Jin, Chin Ho Lee, Rocco A Servedio, and Emanuele Viola. Fourier growth of structured $\mathbb{F}_{2}$-polynomials and applications. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2021). Schloss Dagstuhl-Leibniz-Zentrum für Informatik, 2021.
  • Bor [75] Christer Borell. The brunn-minkowski inequality in gauss space. Inventiones mathematicae, 30(2):207–216, 1975.
  • BS [21] Nikhil Bansal and Makrand Sinha. k-forrelation optimally separates quantum and classical query complexity. In Proceedings of the 53rd Annual ACM SIGACT Symposium on Theory of Computing, pages 1303–1316, 2021.
  • BTW [15] Eric Blais, Li-Yang Tan, and Andrew Wan. An inequality for the fourier spectrum of parity decision trees. CoRR, abs/1506.01055, 2015.
  • BV [10] Joshua Brody and Elad Verbin. The coin problem and pseudorandomness for branching programs. In 2010 IEEE 51st Annual Symposium on Foundations of Computer Science, pages 30–39, 2010.
  • CFK+ [19] Arkadev Chattopadhyay, Yuval Filmus, Sajin Koroth, Or Meir, and Toniann Pitassi. Query-to-communication lifting for bpp using inner product. In ICALP, 2019.
  • CGR [14] Gil Cohen, Anat Ganor, and Ran Raz. Two sides of the coin problem. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2014). Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2014.
  • CHHL [19] Eshan Chattopadhyay, Pooya Hatami, Kaave Hosseini, and Shachar Lovett. Pseudorandom generators from polarizing random walks. Theory Comput., 15:1–26, 2019.
  • CHLT [18] Eshan Chattopadhyay, Pooya Hatami, Shachar Lovett, and Avishay Tal. Pseudorandom generators from the second fourier level and applications to ac0 with parity gates. In 10th Innovations in Theoretical Computer Science Conference (ITCS 2019). Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2018.
  • CHRT [18] Eshan Chattopadhyay, Pooya Hatami, Omer Reingold, and Avishay Tal. Improved pseudorandomness for unordered branching programs through local monotonicity. In STOC, pages 363–375. ACM, 2018.
  • CKLM [19] Arkadev Chattopadhyay, Michal Koucký, Bruno Loff, and Sagnik Mukhopadhyay. Simulation theorems via pseudo-random properties. Comput. Complex., 28(4):617–659, 2019.
  • CR [12] Amit Chakrabarti and Oded Regev. An optimal lower bound on the communication complexity of gap-hamming-distance. SIAM J. Comput., 41(5):1299–1317, 2012.
  • dRNV [16] Susanna F. de Rezende, Jakob Nordström, and Marc Vinyals. How limited interaction hinders real communication (and what it means for proof and circuit complexity). In FOCS, pages 295–304. IEEE Computer Society, 2016.
  • EM [22] Ronen Eldan and Dana Moshkovitz. Reduction from non-unique games to boolean unique games. In 13th Innovations in Theoretical Computer Science Conference (ITCS 2022). Schloss Dagstuhl-Leibniz-Zentrum für Informatik, 2022.
  • Gav [20] Dmitry Gavinsky. Entangled simultaneity versus classical interactivity in communication complexity. IEEE Trans. Inf. Theory, 66(7):4641–4651, 2020.
  • GKPW [19] Mika Göös, Pritish Kamath, Toniann Pitassi, and Thomas Watson. Query-to-communication lifting for $\mathrm{P}^{\mathrm{NP}}$. Comput. Complex., 28(1):113–144, 2019.
  • GLM+ [15] Mika Göös, Shachar Lovett, Raghu Meka, Thomas Watson, and David Zuckerman. Rectangles are nonnegative juntas. In STOC, pages 257–266. ACM, 2015.
  • Göö [15] Mika Göös. Lower bounds for clique vs. independent set. In FOCS, pages 1066–1076. IEEE Computer Society, 2015.
  • GPW [15] Mika Göös, Toniann Pitassi, and Thomas Watson. Deterministic communication vs. partition number. In FOCS, pages 1077–1088. IEEE Computer Society, 2015.
  • GPW [20] Mika Göös, Toniann Pitassi, and Thomas Watson. Query-to-communication lifting for BPP. SIAM J. Comput., 49(4), 2020.
  • GRT [21] Uma Girish, Ran Raz, and Avishay Tal. Quantum versus randomized communication complexity, with efficient players. In ITCS, volume 185 of LIPIcs, pages 54:1–54:20, 2021. Presented in QIP, 2020 as a contributed talk.
  • GRZ [21] Uma Girish, Ran Raz, and Wei Zhan. Lower bounds for xor of forrelations. Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, 2021.
  • GSTW [16] Parikshit Gopalan, Rocco A. Servedio, Avishay Tal, and Avi Wigderson. Degree and sensitivity: tails of two distributions. CoRR, abs/1604.07432, 2016.
  • GTW [21] Uma Girish, Avishay Tal, and Kewen Wu. Fourier growth of parity decision trees. In 36th Computational Complexity Conference (CCC 2021). Schloss Dagstuhl-Leibniz-Zentrum für Informatik, 2021.
  • HHL [18] Hamed Hatami, Kaave Hosseini, and Shachar Lovett. Structure of protocols for XOR functions. SIAM J. Comput., 47(1):208–217, 2018.
  • INW [94] Russell Impagliazzo, Noam Nisan, and Avi Wigderson. Pseudorandomness for network algorithms. In STOC, pages 356–364. ACM, 1994.
  • IRR+ [21] Siddharth Iyer, Anup Rao, Victor Reis, Thomas Rothvoss, and Amir Yehudayoff. Tight bounds on the fourier growth of bounded functions on the hypercube. arXiv preprint arXiv:2107.06309, 2021.
  • KKL [88] Jeff Kahn, Gil Kalai, and Nathan Linial. The influence of variables on boolean functions (extended abstract). In 29th Annual Symposium on Foundations of Computer Science, White Plains, New York, USA, 24-26 October 1988, pages 68–80. IEEE Computer Society, 1988.
  • KMR [17] Pravesh K. Kothari, Raghu Meka, and Prasad Raghavendra. Approximating rectangles by juntas and weakly-exponential lower bounds for LP relaxations of csps. In STOC, pages 590–603. ACM, 2017.
  • Lee [19] Chin Ho Lee. Fourier bounds and pseudorandom generators for product tests. In Amir Shpilka, editor, 34th Computational Complexity Conference, CCC 2019, July 18-20, 2019, New Brunswick, NJ, USA, volume 137 of LIPIcs, pages 7:1–7:25. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2019.
  • LMM+ [22] Shachar Lovett, Raghu Meka, Ian Mertz, Toniann Pitassi, and Jiapeng Zhang. Lifting with sunflowers. In 13th Innovations in Theoretical Computer Science Conference (ITCS 2022). Schloss Dagstuhl-Leibniz-Zentrum für Informatik, 2022.
  • LPV [22] Chin Ho Lee, Edward Pyne, and Salil P. Vadhan. Fourier growth of regular branching programs. In Amit Chakrabarti and Chaitanya Swamy, editors, Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, APPROX/RANDOM 2022, September 19-21, 2022, University of Illinois, Urbana-Champaign, USA (Virtual Conference), volume 245 of LIPIcs, pages 2:1–2:21. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2022.
  • LRS [15] James R. Lee, Prasad Raghavendra, and David Steurer. Lower bounds on the size of semidefinite programming relaxations. In STOC, pages 567–576. ACM, 2015.
  • LSS+ [19] Nutan Limaye, Karteek Sreenivasaiah, Srikanth Srinivasan, Utkarsh Tripathi, and S Venkitesh. A fixed-depth size-hierarchy theorem for $\mathrm{AC}^{0}[\oplus]$ via the coin problem. In Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing, pages 442–453, 2019.
  • LV [18] Chin Ho Lee and Emanuele Viola. The coin problem for product tests. ACM Transactions on Computation Theory (TOCT), 10(3):1–10, 2018.
  • Man [95] Yishay Mansour. An $O(n^{\log\log n})$ learning algorithm for DNF under the uniform distribution. J. Comput. Syst. Sci., 50(3):543–550, 1995. Appeared in COLT, 1992.
  • MO [10] Ashley Montanaro and Tobias Osborne. On the communication complexity of xor functions, 2010.
  • O’D [14] Ryan O’Donnell. Analysis of Boolean Functions. Cambridge University Press, 2014.
  • OS [07] Ryan O’Donnell and Rocco A. Servedio. Learning monotone decision trees in polynomial time. SIAM Journal on Computing, 37(3):827–844, 2007.
  • Raz [87] Alexander A Razborov. Lower bounds on the size of bounded depth circuits over a complete basis with logical addition. Mathematical Notes of the Academy of Sciences of the USSR, 41(4):333–338, 1987.
  • Raz [95] Ran Raz. Fourier analysis for probabilistic communication complexity. Comput. Complex., 5(3/4):205–221, 1995.
  • RM [99] Ran Raz and Pierre McKenzie. Separation of the monotone NC hierarchy. Comb., 19(3):403–435, 1999.
  • RPRC [16] Robert Robere, Toniann Pitassi, Benjamin Rossman, and Stephen A. Cook. Exponential lower bounds for monotone span programs. In FOCS, pages 406–415. IEEE Computer Society, 2016.
  • RS [10] Alexander A. Razborov and Alexander A. Sherstov. The sign-rank of $\mathrm{AC}^{0}$. SIAM J. Comput., 39(5):1833–1855, 2010.
  • RSV [13] Omer Reingold, Thomas Steinke, and Salil P. Vadhan. Pseudorandomness for regular branching programs via Fourier analysis. In APPROX-RANDOM, pages 655–670. Springer, 2013.
  • RT [19] Ran Raz and Avishay Tal. Oracle separation of BQP and PH. In STOC, pages 13–23. ACM, 2019. Presented in QIP, 2019 as a plenary talk. Accepted to the Journal of the ACM.
  • RY [22] Anup Rao and Amir Yehudayoff. Anticoncentration and the exact gap-hamming problem. SIAM Journal on Discrete Mathematics, 36(2):1071–1092, 2022.
  • She [11] Alexander A. Sherstov. The pattern matrix method. SIAM J. Comput., 40(6):1969–2000, 2011.
  • She [12] Alexander A. Sherstov. The communication complexity of gap hamming distance. Theory Comput., 8(1):197–208, 2012.
  • Smo [87] Roman Smolensky. Algebraic methods in the theory of lower bounds for boolean circuit complexity. In Alfred V. Aho, editor, Proceedings of the 19th Annual ACM Symposium on Theory of Computing, 1987, New York, New York, USA, pages 77–82. ACM, 1987.
  • SSW [21] Alexander A Sherstov, Andrey A Storozhenko, and Pei Wu. An optimal separation of randomized and quantum query complexity. In Proceedings of the 53rd Annual ACM SIGACT Symposium on Theory of Computing, pages 1289–1302, 2021.
  • ST [78] Vladimir N Sudakov and Boris S Tsirel’son. Extremal properties of half-spaces for spherically invariant measures. Journal of Soviet Mathematics, 9(1):9–18, 1978.
  • SVW [17] Thomas Steinke, Salil P. Vadhan, and Andrew Wan. Pseudorandomness and Fourier-growth bounds for width-3 branching programs. Theory of Computing, 13(1):1–50, 2017. Appeared in APPROX-RANDOM, 2014.
  • SZ [08] Yaoyun Shi and Zhiqiang Zhang. Communication complexities of xor functions. arXiv preprint arXiv:0808.1762, 2008.
  • SZ [09] Yaoyun Shi and Yufan Zhu. Quantum communication complexity of block-composed functions. Quantum Inf. Comput., 9(5&6):444–460, 2009.
  • Tal [96] Michel Talagrand. How much are increasing sets positively correlated? Comb., 16(2):243–258, 1996.
  • Tal [17] Avishay Tal. Tight bounds on the Fourier spectrum of AC0. In Computational Complexity Conference, volume 79 of LIPIcs, pages 15:1–15:31. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik, 2017.
  • Tal [20] Avishay Tal. Towards optimal separations between quantum and randomized query complexities. In FOCS, pages 228–239. IEEE, 2020.
  • TWXZ [13] Hing Yin Tsang, Chung Hoi Wong, Ning Xie, and Shengyu Zhang. Fourier sparsity, spectral norm, and the log-rank conjecture. In 2013 IEEE 54th Annual Symposium on Foundations of Computer Science, pages 658–667, 2013.
  • Ver [18] Roman Vershynin. High-dimensional probability: An introduction with applications in data science, volume 47. Cambridge university press, 2018.
  • Vid [12] Thomas Vidick. A concentration inequality for the overlap of a vector on a large set, with application to the communication complexity of the gap-hamming-distance problem. Chic. J. Theor. Comput. Sci., 2012, 2012.
  • Wu [22] Xinyu Wu. A stochastic calculus approach to the oracle separation of BQP and PH. Theory Comput., 18:1–11, 2022.
  • WYY [17] Xiaodi Wu, Penghui Yao, and Henry S. Yuen. Raz-mckenzie simulation with the inner product gadget. Electron. Colloquium Comput. Complex., 24:10, 2017.
  • Zha [14] Shengyu Zhang. Efficient quantum protocols for xor functions. In Proceedings of the twenty-fifth annual ACM-SIAM symposium on Discrete algorithms, pages 1878–1885. SIAM, 2014.

Appendix A Gap-Hamming Lower Bounds

As an immediate consequence of Theorem 1.5, we can derive the optimal lower bound for the Gap-Hamming problem stated in Theorem 1.6.

Proof of Theorem 1.6.

Set $\rho=10/\sqrt{n}$. Fix the randomness to be any $r\in\{0,1\}^{*}$ and let $\mathcal{C}_{r}$ refer to the deterministic protocol $\mathcal{C}$ with randomness fixed to $r$. Suppose $d\leq\tau\cdot n$ for a sufficiently small constant $\tau$. We apply Theorem 1.5 on $\rho$ as well as $-\rho$, and apply the triangle inequality to conclude that

\left|\operatorname*{\mathbb{E}}_{\bm{z}\sim\pi^{\otimes n}_{\rho}}[h_{r}(\bm{z})]-\operatorname*{\mathbb{E}}_{\bm{z}\sim\pi^{\otimes n}_{-\rho}}[h_{r}(\bm{z})]\right|\leq 2\cdot O\left(\sqrt{d/n}\right)<1/9.

Let $\sigma_{\rho}$ be the distribution of $(\bm{x},\bm{y})$ induced by sampling $\bm{x}\sim\pi^{\otimes n}_{0}$ and $\bm{z}\sim\pi^{\otimes n}_{\rho}$ and letting $\bm{y}=\bm{x}\odot\bm{z}$; similarly define $\sigma_{-\rho}$ but with $\bm{z}\sim\pi^{\otimes n}_{-\rho}$. We now expand $h_{r}(z)$ in terms of $\mathcal{C}(x,y)$, take an expectation over $r$, and apply the triangle inequality to conclude that

\left|\operatorname*{\mathbb{E}}_{(\bm{x},\bm{y})\sim\sigma_{\rho}}[\mathcal{C}(\bm{x},\bm{y})]-\operatorname*{\mathbb{E}}_{(\bm{x},\bm{y})\sim\sigma_{-\rho}}[\mathcal{C}(\bm{x},\bm{y})]\right|<1/9. (A.1)

Hoeffding's inequality implies that for $\bm{z}\sim\pi^{\otimes n}_{\rho}$, we have

\operatorname*{\mathbf{Pr}}\left[\left|\sum_{i}\bm{z}_{i}-10\sqrt{n}\right|\geq 5\sqrt{n}\right]\leq 2\exp\left\{\tfrac{-2\cdot(5\sqrt{n})^{2}}{4n}\right\}<1/18.
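The last step is just arithmetic: $2\exp\{-2\cdot(5\sqrt{n})^{2}/(4n)\}=2e^{-12.5}\approx 7.5\times 10^{-6}$, well below $1/18\approx 0.056$ (a one-line check, purely illustrative):

# Arithmetic check: 2 * exp(-2 * (5*sqrt(n))^2 / (4*n)) = 2 * exp(-12.5) < 1/18.
import math
print(2 * math.exp(-12.5), 1 / 18)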

This implies that a random $(\bm{x},\bm{y})\sim\sigma_{\rho}$ is a Yes instance of the Gap-Hamming problem with probability larger than $17/18$. Let $\widetilde{\sigma}_{\rho}$ denote $\sigma_{\rho}$ conditioned on Yes instances of the Gap-Hamming problem. Similarly define $\widetilde{\sigma}_{-\rho}$ to be $\sigma_{-\rho}$ conditioned on No instances of the Gap-Hamming problem. Since $\mathcal{C}(x,y)$ has outputs in $[-1,1]$, we have

\left|\operatorname*{\mathbb{E}}_{(\bm{x},\bm{y})\sim\sigma_{\rho}}[\mathcal{C}(\bm{x},\bm{y})]-\operatorname*{\mathbb{E}}_{(\bm{x},\bm{y})\sim\widetilde{\sigma}_{\rho}}[\mathcal{C}(\bm{x},\bm{y})]\right|<1/9

and

\left|\operatorname*{\mathbb{E}}_{(\bm{x},\bm{y})\sim\sigma_{-\rho}}[\mathcal{C}(\bm{x},\bm{y})]-\operatorname*{\mathbb{E}}_{(\bm{x},\bm{y})\sim\widetilde{\sigma}_{-\rho}}[\mathcal{C}(\bm{x},\bm{y})]\right|<1/9.

This, along with Equation A.1 and the triangle inequality, implies that

\left|\operatorname*{\mathbb{E}}_{(\bm{x},\bm{y})\sim\widetilde{\sigma}_{\rho}}[\mathcal{C}(\bm{x},\bm{y})]-\operatorname*{\mathbb{E}}_{(\bm{x},\bm{y})\sim\widetilde{\sigma}_{-\rho}}[\mathcal{C}(\bm{x},\bm{y})]\right|<1/3.

However, this contradicts the assumption that the protocol $\mathcal{C}$ solves the Gap-Hamming problem with advantage at least $2/3$. ∎

Appendix B Concentration for Sum of Squares of Quadratic Forms

Here we prove Theorem 3.3. While it follows from [5, Theorem 6], which is a Banach-space-valued version of the Hanson–Wright inequality, in our setting a weaker statement suffices, for which we give a self-contained proof following [5].

For any integer $n\geq 1$, we use $\mathcal{B}^{n}=\left\{x\in\mathbb{R}^{n}\,:\,\left\|x\right\|\leq 1\right\}$ to denote the unit Euclidean ball in $\mathbb{R}^{n}$. For any two sets $A,B\subseteq\mathbb{R}^{n}$, we define $A+B=\left\{x+y\,:\,x\in A,y\in B\right\}$. For any set $A\subseteq\mathbb{R}^{n}$ and any number $t\in\mathbb{R}$, we define $tA=\left\{t\cdot x\,:\,x\in A\right\}$. Let $\Phi\colon\mathbb{R}\to[0,1]$ be the cumulative distribution function of the standard Gaussian distribution, i.e., $\Phi(a)=\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{a}e^{-u^{2}/2}\,\mathrm{d}u$.

Now we cite the famous Gaussian isoperimetric inequality [8, 58].

Theorem B.1 (Gaussian Isoperimetric Inequality).

Let $A\subseteq\mathbb{R}^{n}$ be a measurable set and assume $\gamma_{n}(A)\geq\Phi(a)$ for some $a\in\mathbb{R}$. Then for any $t\geq 0$, we have $\gamma_{n}(A+t\mathcal{B}^{n})\geq\Phi(a+t)$.

In particular, if $\gamma_{n}(A)\geq 1/2$, then we can pick $a=0$ in Theorem B.1 and obtain

\gamma_{n}(A+t\mathcal{B}^{n})\geq\Phi(t)\geq 1-e^{-t^{2}/2}. (B.1)
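The second inequality is the standard Gaussian tail estimate $1-\Phi(t)\leq\frac{1}{2}e^{-t^{2}/2}\leq e^{-t^{2}/2}$ for $t\geq 0$; a quick numeric check (purely illustrative):

# Numeric check of Phi(t) >= 1 - exp(-t^2/2) for t in [0, 10].
import math
from statistics import NormalDist

N = NormalDist()
assert all(N.cdf(t) >= 1 - math.exp(-t * t / 2) for t in (i / 100 for i in range(1001)))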

Now we are ready to prove Theorem 3.3.

Proof of Theorem 3.3.

Note that the bound is trivial when $m=0$. Thus from now on we assume without loss of generality that $m\geq 1$.

For each $x\in\mathbb{R}^{n}$, let $K_{x}=\sum_{i=1}^{m}\left\langle x\mathbin{\overset{\bullet}{\otimes}}x,M_{i}\right\rangle^{2}$. We first write $K_{x}$ as the squared Euclidean norm of a vector:

  • For $i\in[m]$, we view $M_{i}$ as a length-$n^{2}$ row vector.

  • Let $M\in\mathbb{R}^{m\times n^{2}}$ be the matrix whose $i$-th row is $M_{i}$.

Therefore we have

K_{x}=\left\|M(x\mathbin{\overset{\bullet}{\otimes}}x)\right\|^{2}=\left\|M(x\otimes x)\right\|^{2}, (B.2)

where $\otimes$ is the standard tensor product and the second equality follows since each $M_{i}$ has zero diagonal.

Define $f(y)=\left\|M(y\otimes y)\right\|$, $g(y)=\sup_{z\in\mathbb{S}^{n-1}}\left\|M(z\otimes y)\right\|$, and $h(y)=\sup_{z\in\mathbb{S}^{n-1}}\left\|M(y\otimes z)\right\|$. Let $F=\operatorname*{\mathbb{E}}_{\bm{y}\sim\gamma_{n}}[f(\bm{y})]$, $G=\operatorname*{\mathbb{E}}_{\bm{y}\sim\gamma_{n}}[g(\bm{y})]$, and $H=\operatorname*{\mathbb{E}}_{\bm{y}\sim\gamma_{n}}[h(\bm{y})]$ be their means. Define the set

A=\left\{y\in\mathbb{R}^{n}\,:\,f(y)<6F,\ g(y)<6G,\text{ and }h(y)<6H\right\}.

By Markov's inequality and a union bound, the Gaussian measure of $A$ satisfies $\gamma_{n}(A)\geq 1/2$. Then by Equation B.1, we have

\gamma_{n}(A+t\mathcal{B}^{n})\geq 1-e^{-t^{2}/2}\quad\text{for all }t\geq 0. (B.3)

Now for an arbitrary $x\in A+t\mathcal{B}^{n}$, we write $x=y+tz$ where $y\in A$ and $z\in\mathcal{B}^{n}$. Then

\left\|M(x\otimes x)\right\|\leq\left\|M(y\otimes y)\right\|+t\cdot\left\|M(y\otimes z)\right\|+t\cdot\left\|M(z\otimes y)\right\|+t^{2}\cdot\left\|M(z\otimes z)\right\|
<6F+6t(G+H)+t^{2}V,

where $V=\sup_{z\in\mathbb{S}^{n-1}}\left\|M(z\otimes z)\right\|$. This, together with Equation B.2 and Equation B.3, implies

\operatorname*{\mathbf{Pr}}_{\bm{x}\sim\gamma_{n}}\left[K_{\bm{x}}\geq\left(6F+6t(G+H)+t^{2}V\right)^{2}\right]\leq\operatorname*{\mathbf{Pr}}_{\bm{x}\sim\gamma_{n}}\left[\bm{x}\notin A+t\mathcal{B}^{n}\right]=1-\gamma_{n}(A+t\mathcal{B}^{n})\leq e^{-t^{2}/2}. (B.4)

Now we bound $F,G,H,V$ in the following claim, whose proof is presented later.

Claim B.2.

$F\leq\sqrt{2m}$, $G,H\leq\sqrt{m}$, and $V\leq 1$.

Plugging Claim B.2 into Equation B.4, we have

\operatorname*{\mathbf{Pr}}_{\bm{x}\sim\gamma_{n}}\left[K_{\bm{x}}\geq\left(6\sqrt{2m}+12t\sqrt{m}+t^{2}\right)^{2}\right]\leq e^{-t^{2}/2}\quad\text{for any }t\geq 0.

Now we set

t=\frac{1}{168}\sqrt{\frac{r}{m+\sqrt{r}}}\geq 0

and assume $r\geq 98m$. Then $6\sqrt{2m}\leq\frac{6}{7}\sqrt{r}$, $12t\sqrt{m}\leq\frac{1}{14}\sqrt{r}$, and $t^{2}\leq\frac{1}{14}\sqrt{r}$, so $6\sqrt{2m}+12t\sqrt{m}+t^{2}\leq\sqrt{r}$. Therefore

\operatorname*{\mathbf{Pr}}_{\bm{x}\sim\gamma_{n}}\left[\sum_{i=1}^{m}\left\langle\bm{x}\mathbin{\overset{\bullet}{\otimes}}\bm{x},M_{i}\right\rangle^{2}\geq r\right]=\operatorname*{\mathbf{Pr}}_{\bm{x}\sim\gamma_{n}}\left[K_{\bm{x}}\geq r\right]\leq e^{-t^{2}/2}=\exp\left\{-\frac{1}{56448}\cdot\frac{r}{m+\sqrt{r}}\right\}. ∎
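The parameter estimates above are elementary but easy to get wrong; here is a quick numeric verification over a grid (purely illustrative):

# Check the three estimates for t = sqrt(r/(m+sqrt(r)))/168 whenever r >= 98m.
import math

for m in [1, 10, 1000]:
    for r in [99 * m, 1000 * m, 10 ** 6 * m]:
        t = math.sqrt(r / (m + math.sqrt(r))) / 168
        assert 6 * math.sqrt(2 * m) <= (6 / 7) * math.sqrt(r)
        assert 12 * t * math.sqrt(m) <= math.sqrt(r) / 14
        assert t * t <= math.sqrt(r) / 14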

Finally we present the missing proof of Claim B.2.

Proof of Claim B.2.

First we observe that the rows of $M$ are unit vectors; therefore

\left\|M\right\|=\sqrt{m}. (B.5)

In addition, the rows of $M$ are orthogonal to each other; therefore the operator norm of $M$ satisfies

\left\|M\right\|_{\mathrm{op}}\leq 1. (B.6)

We index the columns of $M$ by $[n]^{2}$ and let the column vectors of $M$ be $\left(b_{i,j}\right)_{i,j\in[n]}$. Since the rows of $M$ are flattened matrices with zero diagonal, we have

b_{i,i}=0^{m}\quad\text{for all }i\in[n]. (B.7)

Now we bound $F,G,H,V$ separately.

Bounding $F$.

Observe that

F^{2}=\left(\operatorname*{\mathbb{E}}_{\bm{y}\sim\gamma_{n}}\left[\left\|M(\bm{y}\otimes\bm{y})\right\|\right]\right)^{2}\leq\operatorname*{\mathbb{E}}_{\bm{y}\sim\gamma_{n}}\left[\left\|M(\bm{y}\otimes\bm{y})\right\|^{2}\right]=\operatorname*{\mathbb{E}}_{\bm{y}\sim\gamma_{n}}\left[\left\|\sum_{i,j\in[n]}b_{i,j}\bm{y}_{i}\bm{y}_{j}\right\|^{2}\right] (by convexity)
=\operatorname*{\mathbb{E}}_{\bm{y}\sim\gamma_{n}}\left[\sum_{i,j,i^{\prime},j^{\prime}\in[n]}\left\langle b_{i,j},b_{i^{\prime},j^{\prime}}\right\rangle\bm{y}_{i}\bm{y}_{j}\bm{y}_{i^{\prime}}\bm{y}_{j^{\prime}}\right]=\sum_{i,j\in[n]}\left(\left\|b_{i,j}\right\|^{2}+\left\langle b_{i,j},b_{j,i}\right\rangle\right) (by Equation B.7)
\leq\sum_{i,j\in[n]}\left(\left\|b_{i,j}\right\|^{2}+\frac{1}{2}\left(\left\|b_{i,j}\right\|^{2}+\left\|b_{j,i}\right\|^{2}\right)\right)=2\sum_{i,j\in[n]}\left\|b_{i,j}\right\|^{2}
=2\left\|M\right\|^{2}=2m. (by Equation B.5)
Bounding $G$ and $H$.

Fix an arbitrary $y\in\mathbb{R}^{n}$; we first simplify $g(y)$. For each $i\in[n]$, define the vector $b_{i}=\sum_{j\in[n]}b_{i,j}y_{j}$ and let $B$ be the matrix with the $b_{i}$'s as column vectors. Then

g(y)=\sup_{z\in\mathbb{S}^{n-1}}\left\|\sum_{i,j\in[n]}b_{i,j}z_{i}y_{j}\right\|=\sup_{z\in\mathbb{S}^{n-1}}\left\|\sum_{i\in[n]}b_{i}z_{i}\right\|=\left\|B\right\|_{\mathrm{op}}\leq\left\|B\right\|=\sqrt{\sum_{i\in[n]}\left\|\sum_{j\in[n]}b_{i,j}y_{j}\right\|^{2}}. (B.8)

Now we bound $G$:

G^{2}=\left(\operatorname*{\mathbb{E}}_{\bm{y}\sim\gamma_{n}}\left[g(\bm{y})\right]\right)^{2}\leq\operatorname*{\mathbb{E}}_{\bm{y}\sim\gamma_{n}}\left[g(\bm{y})^{2}\right] (by convexity)
\leq\operatorname*{\mathbb{E}}_{\bm{y}\sim\gamma_{n}}\left[\sum_{i\in[n]}\left\|\sum_{j\in[n]}b_{i,j}\bm{y}_{j}\right\|^{2}\right]=\operatorname*{\mathbb{E}}_{\bm{y}\sim\gamma_{n}}\left[\sum_{i\in[n]}\sum_{j,j^{\prime}\in[n]}\left\langle b_{i,j},b_{i,j^{\prime}}\right\rangle\bm{y}_{j}\bm{y}_{j^{\prime}}\right] (by Equation B.8)
=\sum_{i,j\in[n]}\left\|b_{i,j}\right\|^{2}=\left\|M\right\|^{2}=m. (by Equation B.5)

A similar argument works for $H$.

Bounding $V$.

Note that for any $z\in\mathbb{S}^{n-1}$, we have $\left\|z\otimes z\right\|=\left\|z\right\|^{2}=1$. Thus, by Equation B.6, we have

V=\sup_{z\in\mathbb{S}^{n-1}}\left\|M(z\otimes z)\right\|\leq\left\|M\right\|_{\mathrm{op}}\leq 1. ∎