Fourier Growth of Communication Protocols for XOR Functions
Abstract
The level- -Fourier weight of a Boolean function refers to the sum of absolute values of its level- Fourier coefficients. Fourier growth refers to the growth of these weights as grows. It has been extensively studied for various computational models, and bounds on the Fourier growth, even for the first few levels, have proven useful in learning theory, circuit lower bounds, pseudorandomness, and quantum-classical separations.
In this work, we investigate the Fourier growth of certain functions that naturally arise from communication protocols for XOR functions (partial functions evaluated on the bitwise XOR of the inputs and to Alice and Bob). If a protocol computes an XOR function, then is a function of the parity . This motivates us to analyze the XOR-fiber of the communication protocol , defined as .
We present improved Fourier growth bounds for the XOR-fibers of randomized protocols that communicate bits. For the first level, we show a tight bound and obtain a new coin theorem, as well as an alternative proof for the tight randomized communication lower bound for the Gap-Hamming problem. For the second level, we show an bound, which improves the previous bound by Girish, Raz, and Tal (ITCS 2021) and implies a polynomial improvement on the randomized communication lower bound for the XOR-lift of the Forrelation problem, which extends the quantum-classical gap for this problem.
Our analysis is based on a new way of adaptively partitioning a relatively large set in Gaussian space to control its moments in all directions. We achieve this via martingale arguments and allowing protocols to transmit real values. We also show a connection between Fourier growth and lifting theorems with constant-sized gadgets as a potential approach to prove optimal bounds for the second level and beyond.
1 Introduction
The Fourier spectrum of Boolean functions and their various properties have played an important role in many areas of mathematics and theoretical computer science. In this work, we study a notion called -Fourier growth, which captures the scaling of the sum of absolute values of the level- Fourier coefficients of a function. In a nutshell, functions with small Fourier growth cannot aggregate many weak signals in the input to obtain a considerable effect on the output. In contrast, the Majority function, which can amplify weak biases, is an example of a Boolean function with extremely high Fourier growth.
To formally define Fourier growth, we recall that every Boolean function can be uniquely represented as a multilinear polynomial
where the coefficients of the polynomial are called the Fourier coefficients of , and they satisfy for a uniformly random . The level- -Fourier growth of is the sum of the absolute values of its level- Fourier coefficients,
The study of Fourier growth dates back to the work of Mansour [42] who used it in the context of learning algorithms. Since then, several works have shown that upper bounds on the Fourier growth, even for the first few Fourier levels, have applications to pseudorandomness, circuit complexity, and quantum-classical separations. For example:
-
•
A bound on the level-one Fourier growth is sufficient to control the advantage of distinguishing biased coins from unbiased ones [4].
- •
Meanwhile, Fourier growth bounds have been extensively studied and established for various computational models, including small-width DNFs/CNFs [42], circuits [63], low-sensitivity Boolean functions [29], small-width branching programs [51, 59, 16, 38], small-depth decision trees [45, 64, 57], functions related to small-cost communication protocols [28, 27], low-degree polynomials [14, 15, 7], product tests [36], small-depth parity decision trees [10, 30], low-degree bounded functions [33], and more.
For any Boolean function with outputs in , the level- Fourier growth is at most . However, for many natural classes of Boolean functions, this bound is far from tight and not good enough for applications. Establishing better bounds require exploring structural properties of the specific class of functions in question. Even for low Fourier levels, this can be highly non-trivial and tight bounds remain elusive in many cases. For example, for degree- polynomials (which well-approximate when we set [46, 56]), while we know a level-one bound of due to [15], the current best bound for levels is roughly [14], whereas the conjectured bound is . Validating such a bound, even for the second level , will imply unconditional pseudorandom generators of polylogarithmic seed length for [15], a longstanding open problem in circuit complexity and pseudorandomness.
XOR Functions.
In this work, we study the Fourier growth of certain functions that naturally arise from communication protocols for XOR-lifted functions, also referred to as XOR functions. XOR functions are an important and well-studied class of functions in communication complexity with connections to the log-rank conjecture and quantum versus classical separations [43, 31, 65, 60, 70].
In this setting, Alice gets an input and Bob gets an input and they wish to compute where is some partial Boolean function and is in the domain of . Here, denotes the pointwise product of and . Given any communication protocol that computes an XOR function exactly, the output of the protocol depends only on the parity , whenever is defined on . This gives a natural motivation to analyze the XOR-fiber of a communication protocol defined below. We note that a similar notion first appeared in an earlier work of Raz [47].
Definition 1.1.
Let be any deterministic communication protocol. The XOR-fiber of the communication protocol is the function defined at as
where is the entrywise product and is the uniform distribution over .
We remark that XOR-fiber is the “inverse” of XOR-lift of a function: If computes the XOR function of , then the XOR-fiber of is equal to on the domain of .
In this work, we investigate the Fourier growth of XOR-fibers of small-cost communication protocols and apply these bounds in several contexts. Before stating our results, we first discuss several related works.
Related Works.
Showing optimal Fourier growth bounds for XOR-fibers is a complex undertaking in general and a first step towards this end is to obtain optimal Fourier growth bounds for parity decision trees. This is because a parity decision tree for a Boolean function naturally gives rise to a structured communication protocol for the XOR-function corresponding to . This protocol perfectly simulates the parity decision tree by having Alice and Bob exchange one bit each to simulate a parity query. Moreover, the XOR-fiber of this protocol exactly computes the parity decision tree. As such, parity decision trees can be seen as a special case of communication protocols, and Fourier growth bounds on XOR-fibers of communication protocols immediately imply Fourier growth bounds on parity decision trees.
Fourier growth bounds for decision trees and parity decision trees are well-studied. It is not too difficult to obtain a level- bound of for parity decision trees of depth , however, obtaining improved bounds is significantly more challenging. For decision trees of depth (which form a subclass of parity decision trees of depth ), O’Donnell and Servedio [45] proved a tight bound of on the level-one Fourier growth. By inductive tree decompositions, Tal [64] obtained bounds for the higher levels of the form . This was later sharpened by Sherstov, Storozhenko, and Wu [57] to the asymptotically tight bound of using a more sophisticated layered partitioning strategy on the tree.
When it comes to parity decision trees, despite all the similarities, the structural decomposition approach does not seem to carry over due to the correlations between the parity queries. For parity decision trees of depth , Blais, Tan, and Wan [10] proved a tight level-one bound of . For higher levels, Girish, Tal, and Wu [30] showed that . These works imply almost tight Fourier growth bounds on the XOR-fibers of structured protocols that arise from simulating decision trees or parity decision trees.
For the case of XOR-fibers of arbitrary deterministic/randomized communication protocols (which do not necessarily simulate parity decision trees or decision trees), Girish, Raz, and Tal [27] showed an Fourier growth111Technically, [27] only proved a level-two bound (as it suffices for their analysis), but a level- bound follows easily from their proof approach, as noted by [28] for level-. For level-one and level-two, these bounds are and respectively and are sub-optimal — as mentioned previously, such weaker bounds for parity decision trees are easy to obtain, while obtaining optimal bounds (for parity decision trees) of for level one and for level two already requires sophisticated ideas.
The bounds in [27] follow by analyzing the Fourier growth of XOR-fibers of communication rectangles of measure and then adding up the contributions from all the leaf rectangles induced by the protocol. Such a per-rectangle-based approach cannot give better bounds than the ones in [27], while they also conjectured that the optimal Fourier growth of XOR-fibers of arbitrary protocols should match the growth for parity decision trees.
Showing the above is a challenging task even for the first two Fourier levels. The difficulty arises primarily since in the absence of a per-rectangle-based argument, one has to crucially leverage cancellations between different rectangles induced by the communication protocol. In the simpler case of parity decision trees (or protocols that exchange parities), such cancellations are leveraged in [30] by ensuring -wise independence at each node of the tree — this can be achieved by adding extra parity queries. In a general protocol, the parties can send arbitrary partial information about their inputs and correlate the coordinates in complicated ways that such methods break down. This is one of the key difficulties we face in this paper.
1.1 Main Results
We prove new and improved bounds on the Fourier growth of the XOR-fibers associated with small-cost protocols for levels and .
Theorem 1.2.
Let be a deterministic communication protocol with at most bits of communication. Let be its XOR-fiber as in Definition 1.1. Then, .
Theorem 1.3.
Let be a deterministic protocol communicating at most bits. Let be its XOR-fiber as in Definition 1.1. Then, .
Our bounds in Theorems 1.2 and 1.3 extend directly to randomized communication protocols. This is because is convex and any randomized protocol is a convex combination of deterministic protocols with the same cost. Moreover, we can use Fourier growth reductions, as described in Subsection 1.2.3, to demonstrate that these bounds apply to general constant-sized gadgets and the corresponding -fiber.
Our level-one and level-two bounds improve previous bounds in [27] by polynomial factors. Additionally, our level-one bound is tight since a deterministic protocol with bits of communication can compute the majority vote of , which corresponds to with . Furthermore, as we discuss later in Subsection 1.2, level-one and level-two bounds are already sufficient for many interesting applications.
In terms of techniques, our analysis presents a key new idea that enables us to exploit cancellations between different rectangles induced by the protocol. This idea involves using a novel process to adaptively partition a relatively large set in Gaussian space, which enables us to control its -wise moments in all directions — this can be thought of as a spectral notion of almost -wise independence. We achieve this by utilizing martingale arguments and allowing protocols to transmit real values rather than just discrete bits. This notion and procedure may be of independent interest. See Section 2 for a detailed discussion.
1.2 Applications and Connections
Our main theorem has applications to XOR functions, and in more generality to functions lifted with constant-sized gadgets. In this setting, there is a simple gadget and a Boolean function defined on inputs . The lifted function is defined on pairs of symbols such that . The function naturally defines a communication problem where Alice is given , Bob is given , and they are asked to compute .
Since XOR functions are functions lifted with the XOR gadget, our main theorem implies lower bounds on the communication complexity of specific XOR functions. Additionally, we also show connections between XOR-lifting and lifting with any constant-sized gadget. Next, we describe these lower bounds and connections, with further context.
1.2.1 The Coin Problem and the Gap-Hamming Problem
The coin problem studies the advantage that a class of Boolean functions has in distinguishing biased coins from unbiased ones. More formally, let be a class of -variate Boolean functions. Let and denote the product distribution over where each coordinate has expectation . The Coin Problem asks what is the maximum advantage that functions in have in distinguishing from the uniform distribution .
This quantity essentially captures how well can approximate threshold functions, and in particular, the majority function. The coin problem has been studied for various models of computation including branching programs [11], and circuits [13, 40], product tests [41], and more. Recently, Agrawal [4] showed that the coin problem is closely related to the level-one Fourier growth of functions in .
Lemma 1.4 ([4, Lemma 3.2]).
Assume that is closed under restrictions and satisfies for all . Then, for all and ,
Note that communication protocols of small cost are closed under restrictions, so are their XOR-fibers (see [27, Lemma 5.5]). By noting that for small values of , we obtain the following corollary.222Here we also use the fact that the upper bound is vacuous for large enough as it is larger than . We also remark that, using the Fourier growth reductions (see Subsection 1.2.3), Theorem 1.5 can be established for general gadgets of small size.
Theorem 1.5.
Let be the XOR-fiber of a protocol with total communication . Then for all ,
In particular, consider the following distinguishing task: Alice and Bob either receive two uniformly random strings in or they receive two uniformly random strings in conditioned on their XOR distributed according to for (the latter is often referred to as -correlated strings). Theorem 1.5 implies that any protocol communicating bits cannot distinguish these two distributions with constant advantage. This is essentially a communication lower bound for the well-known Gap-Hamming Problem.
The Gap-Hamming Problem.
In the Gap-Hamming Problem, Alice and Bob receive strings respectively and they want to distinguish if or .
This is essentially the XOR-lift of the Coin Problem with because the distribution of conditioned on with and is mostly supported on the Yes and No instances of Gap-Hamming respectively. Thus immediately from Theorem 1.5, we derive a new proof for the lower bound on the communication complexity of the Gap-Hamming Problem. The proof is deferred to Appendix A.
Theorem 1.6.
The randomized communication complexity of the Gap-Hamming Problem is .
We note that there are various different proofs [18, 55, 67, 53] that obtain the above lower bound but the perspective taken here is perhaps conceptually simpler: (1) Gap-Hamming is essentially the XOR-lift of the Gap-Majority function, and (2) any function that approximates the Gap-Majority function must have large level-one Fourier growth, whereas XOR-fibers of small-cost protocols have small Fourier growth.
1.2.2 Quantum versus Classical Communication Separation via Lifting
One natural approach to proving quantum versus classical separations in communication complexity is via lifting: Consider a function separating quantum and classical query complexity and lift it using a gadget . Naturally, an algorithm computing with few queries to can be translated into a communication protocol computing where we replace each query to a bit with a short conversation that allows the calculation of . Göös, Pitassi, and Watson [26] showed that for randomized query/communication complexity and for various gadgets, this is essentially the best possible. Such results are referred to as lifting theorems.
Lifting theorems apply to different models of computation, such as deterministic decision trees [48, 25], randomized decision trees [26, 12], and more. A beautiful line of work shows how to “lift” many lower bounds in the query model to the communication model [48, 25, 23, 24, 19, 31, 69, 17, 35, 61, 54, 50, 49, 22, 39]. For quantum query complexity, only one direction (considered the “easier” direction) is known: Any quantum query algorithm for can be translated to a communication protocol for with a small logarithmic overhead [6]. It remains widely open whether the other direction holds as well. However, this query-to-communication direction for quantum, combined with the communication-to-query direction for classical, is already sufficient for lifting quantum versus classical separations from the query model to the communication model.
One drawback of this approach to proving communication complexity separations is that the state-of-the-art lifting results [12, 37] work for gadgets with alphabet size at least (recall that denotes ’s input length) and it is a significant challenge to reduce the alphabet size to or even . These large gadgets will usually result in larger overheads in terms of communication rounds, communication bits, and computations for both parties. As demonstrated next, lifting with simpler gadgets like XOR allows for a simpler quantum protocol for the lifted problem.
Lifting Forrelation with XOR.
The Forrelation function introduced by [2] is defined as follows: on input where is a power of ,
where denotes the (unitary) Hadamard matrix.
Girish, Raz, and Tal [27] studied the XOR-lift of the Forrelation problem and obtained new separations between quantum and randomized communication protocols. In more detail, they considered the partial function333We are overloading the notation here: technically, is the XOR-lift of the partial boolean function which on input outputs if is large and if is small. defined as
and showed that if Alice and Bob use a randomized communication protocol, then they must communicate at least bits to compute ; while it can be solved by two entangled parties in the quantum simultaneous message passing model with a -qubit communication protocol and additionally the parties can be implemented with efficient quantum circuits.
The lower bound in [27] was obtained from a second level Fourier growth bound (higher levels are not needed) on the XOR-fiber of classical communication protocols. Our level-two bound strengthens their bound and immediately gives an improved communication lower bound.
Theorem 1.7.
The randomized communication complexity of is .
Theorem 1.7 above gives an versus separation between the above quantum communication model and the randomized two-party communication model, improving upon the versus separation from [27]. We emphasize that our separations are for players with efficient quantum running time, where the only prior separation was shown by the aforementioned work [27]. Such efficiency features can also benefit real-world implementations to demonstrate quantum advantage in experiments; for instance, one such proposal was introduced recently by Aaronson, Buhrman, and Kretschmer [3]. Without the efficiency assumption, a better versus separation is known [21] (see [27, Section 1.1] for a more detailed comparison). Optimal Fourier growth bounds of for level two, which we state later in 1.8, would also imply such a separation with XOR-lift of Forrelation.
Lifting -Fold Forrelation with XOR.
-Fold Forrelation [1] is a generalization of the Forrelation problem and was originally conjectured to be a candidate that exhibits a maximal separation between quantum and classical query complexity. In a recent work, [9] showed that the randomized query complexity of -Fold Forrelation is , confirming this conjecture, and a similar separation was proven in [57] for variants of -Fold Forrelation. These separations, together with lifting theorems with the inner product gadget [12], imply an vs separation between two-party quantum and classical communication complexity, where additionally, the number of rounds444We remark that for , this is exactly the XOR-lift of the Forrelation problem and can even be computed in the quantum simultaneous model, as shown in [27]. in the two-party quantum protocol is .
Replacing the inner product gadget with the XOR gadget above would yield an improved quantum-classical communication separation where the gadget is simpler and the number of rounds required by the quantum protocol to achieve the same quantitative separation is reduced by half. Bansal and Sinha [9] showed that for any computational model, small Fourier growth for the first -levels implies hardness of -Fold Forrelation in that particular model. Thus, in conjunction with their results, to prove the above XOR lifting result for the -Fold Forrelation problem, it suffices to prove the following Fourier growth bounds for XOR-fibers.
Conjecture 1.8.
Let be a deterministic communication protocol with at most bits of communication. Let be its XOR-fiber as in Definition 1.1. Then for all , we have that .
Note that these bounds are consistent with the Fourier growth of parity decision trees (or protocols that only send parities) as shown in [30].
We prove the above conjecture for the case and make progress for the case . While our techniques can be extended to higher levels in a straightforward manner, the bounds obtained are farther from the conjectured ones. Thus, we decided to defer dealing with higher levels to future work as we believe one needs to first prove the optimal bound for level .
In the next subsection, we give another motivation to study the above conjecture by showing a connection to lifting theorems for constant-sized gadgets.
1.2.3 General Gadgets and Fourier Growth from Lifting
Our main results are Fourier growth bounds for XOR-fibers, which corresponds to XOR-lifts of functions. To complement this, we show that similar bounds hold for general lifted functions.
Let be a gadget and be a communication protocol. Define the -fiber of , denoted by , as
where and are uniform over . We use to denote the upper bound of the level- Fourier growth for the -fibers of protocols with at most bits of communication. Using this notation, the XOR-fiber of is simply , and our main results Theorems 1.2 and 1.3 can be rephrased as
In Section 7, we relate to , and the main takeaway is, in the study of Fourier growth bounds, constant-sized gadgets are all equivalent.
Theorem 1.9 (Informal, see Theorem 7.5 and Theorem 7.6).
Let be a “balanced” gadget. Then
Theorem 1.9 also proposes a different approach towards 1.8: it suffices to establish tight Fourier growth bound for -fibers for some constant-sized (actually, polylogarithmic size suffices) gadget , and then apply the reduction. The benefit of switching to a different gadget is that we can perhaps first prove a lifting theorem, and then appeal to the known Fourier growth bounds of (randomized) decision trees [64, 57]. See Subsection 8.1 for detail.
As mentioned earlier, lifting theorems show how to simulate communication protocols of cost for lifted functions with decision trees of depth at most (see e.g., [26]). A problem at the frontier of this fruitful line of work has been establishing lifting theorems for decision trees with constant-sized gadgets. Note that the XOR gadget itself cannot have such a generic lifting result: Indeed, the parity function serves as a counterexample. Nevertheless, it is speculative that some larger gadget works, which suffices for our purposes.555In terms of the separations between quantum and classical communication, even restricted lifting results for the specific outer function being the Forrelation function would suffice. On the other hand, for lifting from parity decision trees, we do know an XOR-lifting theorem [31]. However, it only holds for deterministic communication protocols and has a sextic blowup in the cost.
Thus, one can see 1.8 as either a further motivation for establishing lifting results for decision trees with constant-sized gadgets, or as a necessary milestone before proving such lifting results.
1.2.4 Pseudorandomness for Communication Protocols
We say is a pseudorandom generator (PRG) for a (randomized) communication protocol with error and seed length if
[32] showed that for the class of protocols sending at most communication bits, there exists an explicit PRG of error and seed length from expander graphs. Note that the overhead is inevitable even if the protocol is only sending one bit, since it can depend arbitrarily on Alice/Bob’s input.
Combining 1.8 and the PRG construction from [14, Theorem 4.5], we would obtain a completely different explicit PRG for this class with error and seed length .
Paper Organization.
An overview of our proofs is given in Section 2. In Section 3 we define necessary notation and recall useful inequalities. Section 4 explains a way to associate the Fourier growth to a martingale process. The proof of level-one bound (Theorem 1.2) is given in Section 5, and the level-two bound (Theorem 1.3) in Section 6. The Fourier growth reductions between general gadgets are presented in Section 7. The future directions are discussed in Section 8. Missing proofs can be found in the appendix.
2 Proof Overview
We first briefly outline the proof strategy, which consists of three main components:
-
•
First, we show that the level-one bound can be characterized as the expected absolute value of a martingale defined as follows: Consider the random walk induced on the protocol tree when Alice and Bob are given inputs and uniformly from . Let be the rectangle associated with the random walk at time . The martingale process tracks the inner product where and are Alice’s and Bob’s center of masses.
-
•
Second, to bound the value of the martingale, it is necessary to ensure that neither nor become excessively elongated in any direction during the protocol execution. To measure the length of in a particular direction , we calculate the variance , i.e. the variance of a uniformly random in the direction . If the set is not elongated in any direction, this can be thought of as a spectral notion of almost pairwise independence. Such a notion also generalizes to almost -wise independence by considering higher moments.
To achieve the property that the sets are not elongated, one of the main novel ideas in our paper is to modify the original protocol to a new one that incorporates additional cleanup steps where the parties communicate real values . Through these communication steps, the sets and are recursively divided into affine slices along problematic directions.
-
•
Last, one needs to show that the number of cleanup steps are small in order to bound the value of the martingale for the new protocol. This is the most involved part of our proof and requires considerable effort because the cleanup steps are real-valued and adaptively depend on the entire history, including the previous real values communicated.
The strategy outlined above also generalizes to level-two Fourier growth by considering higher moments and sending values of quadratic forms in the inputs. We also remark that since we view the sets and above as embedded in and allow the protocol to send real values, it is more natural for us to work in Gaussian space by doing a standard transformation. The rotational invariance of the Gaussian space also seems to be essential for us to obtain optimal level-one bound without losing additional polylogarithmic factors.
We now elaborate on the above components in detail and also highlight the differences between the level-one and level-two settings. For conciseness, in the following overview we use to denote and to denote where and only hide absolute constants.
2.1 Level-One Fourier Growth
The level-one Fourier growth of the XOR-fiber is given by
To bound the above, it suffices to bound for any sign vector . Here for simplicity we assume and the probability of reaching every leaf is .
A Martingale Perspective.
To evaluate the quantity , consider a random leaf of the protocol and let be the corresponding rectangle. Since the leaf determines the answer of the protocol, denoted by , the quantity above equals
where and are the center of masses of the rectangle. Our goal is to bound the magnitude of the random variable .
We shall show that . Note that can be as large as in the worst case — for instance if the first coordinates of and are fixed to the same value — thus we cannot argue for each leaf separately.
To analyze it for a random leaf, we first characterize the above as a martingale process using the tree structure of the protocol. The martingale process is defined as where tracks the inner product between the center of masses and of the current rectangle at step . Denote the martingale differences by and note that if in the step Alice sends a message, then
where is the change in Alice’s center of mass. A similar expression holds if Bob sends a message. Then it suffices to bound the expected quadratic variation (see Section 3) since
(2.1) |
where the equality holds due to the martingale property: .
To obtain the desired bound, we need to bound the expected quadratic variation by . Note that it could be the case that a single scales like . For instance, if Bob first announces his first coordinates, , and then Alice sends a majority of , then in the last step Alice’s center of mass changes by in each of the first coordinates, and the inner product with Bob’s center of mass changes by in a single step.
Such cases make it difficult to directly control the individual step sizes of the martingale and we will only be able to obtain an amortized bound. It turns out, as we explain later, that such an amortized bound on the martingale can be obtained if Alice and Bob’s sets are not elongated in any direction. Therefore, we will transform the original protocol into a clean protocol by introducing real communication steps that slice the elongated directions. For this, it will be convenient to work in Gaussian space which also turns out to be essential in proving the optimal bound.
Protocols in Gaussian Space.
A communication protocol in Gaussian space takes as inputs where are independently sampled from the Gaussian distribution . One can embed the original Boolean protocol in the Gaussian space by running the protocol on the uniformly distributed Boolean inputs and where takes the sign of each coordinate. Note that any node of the protocol tree in the Gaussian space corresponds to a rectangle where . Abusing the notation and defining their Gaussian centers of masses as and , one can associate the same martingale with the protocol in the Gaussian space:
It turns out that bounding the quadratic variation of this martingale suffices to give a bound on (see Section 4), so we will stick to the Gaussian setting. We now describe the ideas behind the cleanup process so that the step sizes can be controlled more easily.
Cleanup with Real Communication.
The cleanup protocol runs the original protocol interspersed with some cleanup steps where Alice and Bob send real values. As outlined before, one of the goals of these cleanup steps is to ensure that the sets are not elongated in any direction, in order to control the martingale steps. In more detail, recall that we want to control
in the step where Alice speaks. There are two key underlying ideas for the cleanup steps:
-
•
Gram-Schmidt Orthogonalization: At each round, if the current rectangle is , before Alice sends the actual message, she sends the inner product between her input and Bob’s current center of mass . This partitions Alice’s set into affine slices orthogonal to Bob’s current center of mass . Thus the change in Alice’s center of mass in later rounds is orthogonal to since it only takes place inside the affine slice.
Recall that the martingale is the inner product of Alice and Bob’s center of masses, and Bob’s center of mass does not change when Alice speaks. The original communication steps now do not contribute to the martingale and only the steps where the inner products are revealed do. In particular, if are two consecutive times where Alice revealed the inner product, then the change in Alice’s center of mass is orthogonal to change in Bob’s center of mass between time and . Thus, conditioned on the rectangle fixed by the messages until time , we have, by Jensen’s inequality,
(2.2) Note that the quantity on the right-hand side above is of the form . In other words, it is the variance of the random vector along direction . To maintain a bound on this quantity, we introduce the notion of “not being elongated in any direction”.
-
•
Not elongated in any direction: We define the following notion to capture the fact that the random vector is not elongated in any direction: we say that a mean-zero random vector in is -pairwise clean, if for every ,
(2.3) or equivalently, the operator norm of the covariance matrix is at most . This can be considered a spectral notion of almost pairwise independence, since the pairwise moments are well-behaved in every direction.
If the input distribution conditioned on Alice’s set is -pairwise clean, we say that her set is pairwise clean. Based on the above ideas, after Alice sends the initial message, if her set is not yet clean, she partitions it recursively by taking affine slices and transmitting real values. More precisely, while there is direction violating Equation 2.3, Alice does a cleanup of her set by sending the inner product . This direction is known to Bob as it only depends on Alice’s current space. In addition, this cleanup does not contribute to the martingale in the future because the inner product along this direction is fixed now.
The resulting protocol is pairwise clean in the sense that at each step666We remark that the sets are only clean at intermediate steps where a cleanup phase ends, but we show that because of the orthogonalization step, the other steps do not contribute to the value of the martingale., Alice’s current set is pairwise clean. Similar arguments work for Bob.
Let be the total number of communication rounds including all the cleanup steps. Then, by the above argument, and denoting by and the indices of the inner product steps for Alice and Bob, we can ultimately bound
(2.4) |
where again, the last equality follows from the martingale property. The right hand side above can be bounded by the expected number of communication rounds using the level-one inequality (see Theorem 3.1) — this inequality bounds the Euclidean norm of the center of mass of a set in terms of its Gaussian measure.
Expected Number of Cleanup steps.
Since the original communication only consists of rounds, the analysis essentially reduces to bounding the expected number of cleanup steps by , which is technically the most involved part of the proof.
It is implicit in the previous works on the Gap-Hamming Problem [18, 67] that large sets are not elongated in many directions: if a set has Gaussian measure , then for a random vector sampled from , there are at most orthogonal directions such that where . This is a consequence of the fact that the expectation of can be bounded by provided that has measure .
The above argument suggests that maybe we can clean up the set along these bad orthogonal directions. However this is not enough for our purposes: after taking an affine slice, the set may not be clean in a direction where it was clean before. Moreover, since the parties take turns to send messages and clean up, the bad directions will also depend on the entire history of the protocol, including the previous real and Boolean communication. This adaptivity makes the analysis more delicate and to prove the optimal bound we crucially utilize the rotational symmetry of the Gaussian distribution. Indeed, the fact that a large set is not elongated in many directions also holds even when we replace the Gaussian distribution with the uniform distribution on , but it is unclear how to obtain an optimal level-one bound using the latter.
In the final protocol, since the parties only send Boolean bits and linear forms of their inputs, conditioned on the history of the martingale, one can still say what the distribution of the next cleanup looks like, as the Gaussian distribution is well-behaved under linear projections. We then use martingale concentration and stopping time arguments to show that the expected number of cleanup steps is indeed bounded by even if the cleanup is adaptive.
We make two remarks in passing: First, we can also prove the optimal level-one bound using information-theoretic ideas but they do not seem to generalize to the level-two setting, so we adopt the alternative concentration-based approach here and they are similar in spirit. Second, it is possible from our proof approach (in particular, the approach for level two described next) to derive a weaker upper bound of for the level one while directly working with the uniform distribution on the hypercube.
2.2 Level-Two Fourier Growth
We start by noting that the level-two Fourier growth of the XOR-fiber is given by
To bound the above, it suffices to bound for any symmetric sign matrix . For this proof overview, we assume for simplicity that .
Martingales and Gram-Schmidt Orthogonalization.
Similar to the case of level one, the level-two Fourier growth also has a martingale formulation. In particular, let and be Alice and Bob’s sets at time as before and define to be the matrices that represent the level-two center of masses of the two sets. Here denotes the tensor product with the diagonal zeroed out.777Here is an matrix. We will also interchangeably view matrices as -length vectors. To bound the level-two Fourier growth, it suffices to bound the expected quadratic variation of the martingale defined by taking the inner product of the level-two center of masses where is the inner product of two matrices viewed as vectors.
To this end, we again move to Gaussian space where the inputs and transform the protocol to a clean protocol. First, we need an analog of the Gram-Schmidt orthogonalization step — this is achieved in a natural way by Alice sending inner product with Bob’s level-two center of mass, and Bob does the same. Note that Alice and Bob are now exchanging values of quadratic polynomials in their inputs. Thus, to control the step sizes, we now need to control the second moment of quadratic forms which naturally motivates the following spectral analogue of -wise independence.
4-wise Cleanup with Quadratic Forms.
We say a random vector is -wise clean with parameter if the operator norm of the covariance matrix
is at most where we view as an -dimensional vector. This is equivalent to saying that for any quadratic form ,
(2.5) |
where denotes the Euclidean norm of when viewed as a vector. Thus, this allows us to control the second moment of any quadratic polynomial (and in particular, fourth moments of linear functions). We note that one can generalize the above spectral notion to -wise independence in the natural way by looking at the covariance matrix of the tensor .
We say a set is -wise clean with parameter if Equation 2.5 is preserved for all with zero diagonal888The requirement of zero diagonal is for analysis purposes only and can be assumed without loss of generality since is zero diagonal anyway.. Combined with this notion, one can define the cleanup in an analogous way to the level-one cleanup: While there exists some violating Equation 2.5, Alice sends the quadratic form to Bob until her set is 4-wise clean with parameter .
Cleanup Analysis via Hanson-Wright Inequalities.
The crux of the proof is to bound the number of cleanup steps which, together with a similar analysis as in the level-one case, gives us the desired bound. We show that cleanup steps suffice in expectation to make the sets -wise clean for . Analogous to Equation 2.1 and Subsection 2.1, this gives a bound of on the expected quadratic variation and implies .
Since the parties send values of quadratic forms now, the analysis here is significantly more involved compared to the level-one case, even after moving to the Gaussian setting, where one could previously use the fact that the Gaussian distribution behaves nicely under linear projections. We rely on a powerful generalization of the Hanson-Wright inequality to a Banach-space-valued setting due to Adamczak, Latała, and Meller [5]. This inequality gives a tail bound for sum of squares of quadratic forms: In particular if are matrices with zero diagonal which form an orthonormal set when viewed as dimensional vectors, then the random variable satisfies for any (see Theorem 3.3 for a precise statement). We remark that this tail bound relies on the orthogonality of the quadratic forms and is much sharper than, for example, the bound obtained from hypercontractivity or other standard polynomial concentration inequalities.
In our setting, the matrices are being chosen adaptively. In addition, the parties are sending quadratic forms in their inputs, and the distribution of the next conditioned on the history is hard to determine, unlike the level-one case. To handle this, we replace the real communication with Boolean communication of finite precision . This means that whenever Alice wants to perform cleanup for some known to both parties, she sends only bits. On the one hand, this modification is similar enough to the cleanup protocol with real messages so that most of the argument carries through. On the other hand, now the protocol is completely discrete, which allows us to condition on any particular transcript.
For intuition, assume we fix a transcript of bits which has gone through cleanups. Typically, this transcript should capture of the probability mass. More crucially, the matrices for the cleanups are also fixed along the transcript, and one can apply the aforementioned Hanson-Wright inequality on . Combining the two facts together, we can apply the non-adaptive tail bound above and then condition on obtaining such typical transcript. This shows . However, each quadratic form comes from a violation of Equation 2.5 and contributes at least to in expectation. This implies that and by taking , we derive that the number of cleanup steps . This shows that the level-two Fourier growth is completing the proof.
Note that if we could take while having the same number of cleanup steps , then we would obtain an optimal level-two bound of . However, it is not clear how to use current approach to show this. In Subsection 8.2, we identify examples showing the tightness of our current analysis and also discuss potential ways to circumvent the obstacles within.
We remark that by replacing the Hanson-Wright inequality with its higher-degree variants and performing level- cleanups, we can analyze level- Fourier growth in the similar way. However, since the first two levels already suffice for our applications and we believe that our level-two bound can be further improved, we do not make the effort of generalizing it to higher levels here.
3 Preliminaries
Notation.
Throughout, and denote logarithms with base and respectively. We use to denote the set of natural numbers including 0. For , we write to denote the set . We use the standard notation, and emphasize that in this paper they only hide universal constants that do not depend on any parameter.
We write to denote the entrywise product for vectors and matrices: in particular, for any , we define to be a vector where for and similarly for any , we define to be a matrix where for . We use to denote a tensor with zeros on the diagonal, i.e., for any , is a matrix where if and zero if .
For a vector , we use to denote its Euclidean norm. Similarly, for a matrix , we use to denote its Euclidean norm viewing the matrix as an -dimensional vector. For nonzero or , we define or as the unit vector along direction and respectively: and . We write for the unit sphere in , and write for the unit sphere in where additionally the diagonal entries of the matrices are zero. We use to denote the inner product between vectors and to denote the inner product between matrices viewing them as -dimensional vectors.
Probability.
A probability space is a triple where is the sample space, is a -algebra which describes the measurable sets (or events) in the probability space, and is a probability measure. We use to denote a random sample distributed according to and to denote the expectation of a function under the measure . For any event , we use to denote the measure of under . We say an event holds almost surely if , i.e., the exceptions to the event have measure zero. For a measurable event , we write to denote the intersection of the sigma-algebra and the sigma-algebra generated by .
We use to denote the uniform probability measure over and to denote the -dimensional standard Gaussian measure in . We say a random variable is a standard Gaussian in if its probability distribution is . We will drop the subscript if the dimension is clear from context. We will also need lower dimensional Gaussian measures: given a linear subspace of dimension , there is a -dimensional standard Gaussian measure on it, which we denote by . For any measurable subset , we define its ambient space to be the smallest affine subspace that contains it where is a linear subspace of and . The relative Gaussian measure of denoted by is then defined to be the Gaussian measure of the set under .
Martingales.
Given a sequence of real-valued random variables in a probability space and a function satisfying , the sequence of random variables is called the Doob martingale where is the -algebra generated by which should be viewed as a record of the randomness of the process until time . The sequence is called a filtration. A sequence of random variables is called predictable (or adapted) with respect to if is -measurable for every , meaning that it is determined by the randomness in .
A discrete random variable is called a stopping time with respect to the filtration if the event for all , or in words, whether the event occurs is determined by the history of the process until time . All stopping times considered in this paper will be finite. The sigma-algebra which contains all events that imply the stopping condition is defined as the set of all events such that for all . We also note if one takes an increasing sequence of stopping times then the process defined by is also a martingale.
Let be the martingale differences. Note that and thus
(3.1) |
where the cross terms disappear upon taking expectation. In other words, the martingale differences are orthogonal under taking expectations. The right hand side above is the expected quadratic variation of the martingale . If the sequence is vector-valued (resp., matrix-valued) and satisfies where is zero vector (resp., matrix), then we say it is a vector-valued (resp., matrix-valued) martingale with respect to . Since each coordinate of a vector or matrix-valued martingale is itself a real-valued martingale, vector-valued or matrix-valued martingale differences are also orthogonal under Euclidean norms:
(3.2) |
Useful Inequalities.
We will use the well-known level- inequality [62, 34] (see e.g., [44, Level- Inequalities]). A statement in the Gaussian setting can be found in, e.g., [20, Lemma 2.2]. We remark that we will only use the case for and here which we state below.999Our Theorem 3.1 is slightly different from the references, where they additionally require . By Parseval’s identity, the left hand side is always at most one. Therefore we use a slightly worse bound for the right hand side to allow for the whole range of .
Below we write for the indicator function of a set and for a monomial.
Theorem 3.1 (Level- Inequality).
Let . Assume is measurable and . Then, we have
In particular, if is non-zero, dividing both sides by , we get the following more convenient form for :
We also make use of the following standard concentration inequality for sums of squares of independent standard Gaussians (see [66]).
Fact 3.2.
Let be arbitrary. For any , we have .
We also need a concentration inequality for sums of squares of orthogonal quadratic forms over Gaussian random variables. In particular, we prove the following inequality which follows from a generalization of the Hanson-Wright inequality to a Banach space-valued setting [5, Theorem 6]. Since, we only need a special case that is easier to prove, we include a self-contained proof using the Gaussian isoperimetric inequality in Appendix B following [5, Proposition 23].
Theorem 3.3.
Let be arbitrary. Let be real matrices where each has zero diagonal, and for . Then for any , we have
We remark that the tail bound above holds more generally for sub-Gaussian random variables (see [5]).
4 Fourier Growth via Martingales in Gaussian Space
In this section, we reduce the question of bounding the level-one and level-two Fourier growth to bounding the expected quadratic variation of certain martingales. To analyze these martingales and to prove the optimal bound for the level-one setting, it seems to be crucial to work in the Gaussian setting, so first we give a generic transformation from Boolean to Gaussian. We shall also additionally allow protocols that communicate real numbers to make the analysis easier.
4.1 Communication Protocols in Gaussian Space
Let be a communication protocol with total communication and be its XOR-fiber defined in Definition 1.1.
We embed the protocol in the Gaussian space by allowing Alice’s and Bob’s inputs, and respectively, to be real vectors in — the new protocol runs the original protocol with Boolean inputs and where denotes the sign function applied pointwise to each coordinate for a vector . The behavior of the communication protocol can be defined arbitrarily if any coordinate of or is zero since such points have zero measure under the standard -dimensional Gaussian measure .
This translation from the Boolean hypercube to the Gaussian space preserves the measure of sets: for any subset , we have where is the uniform measure over . Moreover, up to some normalizing factor, the Fourier coefficients of can also be computed by looking at Gaussian inputs. In particular, denoting by for a subset , we have the following fact.
Fact 4.1.
For all , we have .
Proof.
Note that for , the random variable is distributed as . Thus, by the definition of the XOR-fiber and the protocol , we have
where the second line follows since the expected value of a standard Gaussian in conditioned on its sign being fixed to is by the following calculation:
Remark 4.2.
We remark that instead of the Gaussian distribution above, one can work with any distribution where the coordinates are i.i.d. and symmetric around zero. In particular, if is a symmetric probability measure on the real line, and are independently drawn vectors in where each coordinate is i.i.d. sampled from , then where . In the case of level-two we will need to work with the truncated Gaussian distribution where each coordinate is sampled independently from the one dimensional standard Gaussian conditioned on being in some interval for in which case is upper bounded by a universal constant.
4.2 Generalized Communication Protocols
In the protocol defined above, Alice and Bob’s inputs and are real vectors in , but in each round they still exchange a single bit based on and . In order to bound the Fourier growth, it will be more convenient for us to define a notion of generalized communication protocols where parties are also allowed to send real numbers with arbitrary precision in each round. To define this formally, we place certain restrictions on the real communication in the protocol. More formally, in a generalized communication protocol, in each round a player with input can either send:
-
(i)
a bit in which is purely a function of the Boolean input and the previous Boolean messages, or
-
(ii)
a real number that is a measurable function of and the previous (real or Boolean) messages.
The depth of a generalized communication protocol is defined to be the maximum number of rounds of communication.
Note that a generalized protocol also generates a “protocol tree” where if in a round a real number is sent, the “children” of that particular “node” are indexed by all possible values in . A “transcript” of the protocol can be defined in an analogous way. The set of inputs that reach a particular node of this generalized protocol tree still form a rectangle where . We say that a generalized protocol is equivalent to the protocol if for every except on a measure zero set.
We will be interested in random walks on such generalized protocol trees when the inputs and are sampled from a product measure on and the parties send messages according to the protocol to reach a “leaf”. The random variables corresponding to the messages until any time generate a filtration — this filtration can be thought of as specifying a particular node of the generalized protocol at depth (equivalently, a partial transcript of the protocol till time ) that was sampled by the process. In this case, conditioned on any event in , (e.g., any realization of the transcript till time ), almost surely the conditional probability measure on the inputs is some product measure on supported on a rectangle where . We shall refer to the random variable as the current rectangle determined by . Since we will be working with product measures on inputs , the reader can think of conditioning on the filtration as essentially conditioning on the inputs being in the rectangle or equivalently a partial transcript till time .
4.3 Fourier Growth via Martingales
We will now relate Fourier growth to the quadratic variation of a martingale. Towards this end, we first note that in light of Fact 4.1, the level- Fourier growth of the XOR-fiber of the original communication protocol is given by
(4.1) |
where is any generalized protocol that is equivalent to and .
We now express the right hand side above as an inner product. Let be a random leaf of the generalized protocol tree induced by taking and let be the corresponding rectangle in the generalized protocol tree. Then,
(4.2) |
where the second line follows since is a leaf and determines the answer and the third line follows since and are independent conditioned on being in the rectangle .
Thus, specializing Subsection 4.3 to the level-one () and level-two cases (), from Subsection 4.3 we get that
where for we optimize over and for we optimize over being an symmetric matrix with zeros on the diagonals and entries otherwise.
To make the above more compact, we respectively define and to be the level-one and level-two centers of mass of a set :
(4.3) |
Then, upper bounding the constants in the above inequality ( and ) by , we get
(4.4) |
where is understood to be the same as before.
Moving forward, we fix an arbitrary for both cases and define a martingale process that captures the right hand side above. For this we note that a generalized communication protocol, where Alice’s and Bob’s inputs are sampled from the Gaussian distribution, naturally induces a discrete-time random walk on the corresponding (generalized) protocol tree where at time we are at a node at depth with the corresponding rectangle . Then, we have the following proposition.
Proposition 4.3.
and are vector-valued martingales taking values in and and are matrix-valued martingales taking values in .
Note that if in the round Alice speaks, then and do not change and similarly if Bob speaks, then and do not change. The above proposition implies that the real-valued processes
(4.5) |
each form a Doob martingale with respect to the natural filtration induced by the random walk on the protocol tree. Note that taking a random walk on the tree until we hit a leaf generates the marginal distribution on given in Equation 4.4. Let be the stopping time when this martingale hits a leaf and stops (i.e., the depth of the random leaf). Thus, by the orthogonality of martingale differences from Equation 3.1, we get that for , one can upper bound the Fourier growth in terms of expected quadratic variation of the above martingales:
Proposition 4.4.
For , .
The martingale implicitly depends on as used in Equation 4.4 and hence the maximum. Moreover, the martingale also depends on the underlying generalized communication protocol . In the next two sections, we will show that after transforming the original communication protocol into “clean” protocols, the expected quadratic variations of and are and respectively. This will then imply our main theorems.
Remark 4.5.
Note that Proposition 4.3 still holds even if the input distribution is not the Gaussian distribution, but some other product probability measure on the inputs $(x, y)$. This also implies that $M_t$ is still a martingale in that case. In particular, for the level-two case, we will need to use a truncated Gaussian distribution. In light of Remark 4.2, Proposition 4.4 still suffices for us with a different constant. We also remark that we shall need to truncate the real messages used in the protocol for the level-two case to a finite precision, so the generalized protocols for the level-two case only use Boolean communication. However, to obtain the optimal level-one bound, allowing generalized protocols that communicate real values seems to be crucial.
5 Level-One Fourier Growth
In this section, we give a proof of Theorem 1.2, the $O(\sqrt{c})$ bound on the level-one Fourier growth of XOR-fibers. We start with a $c$-round communication protocol $\Pi$ over the Gaussian space as defined in Subsection 4.1. Given the discussion in the previous section and Proposition 4.4, our task ultimately reduces to bounding the expected quadratic variation of the martingale that results from the protocol $\Pi$. For example, one can simply take $\Pi$ to be the original protocol, but, as discussed in Section 2, the individual step sizes of this martingale can be quite large in the worst case, and it is not so easy to leverage cancellations here to bound the quadratic variation by $O(c)$.
So, we first define a generalized communication protocol $\Pi'$ that is equivalent to the original protocol but has additional “cleanup” rounds where Alice and Bob reveal certain linear forms of their inputs, so that their sets are pairwise clean in the sense described in the overview. These cleanup steps allow us to keep track of the quadratic variation more easily.
5.1 Pairwise Clean Protocols
To define a clean protocol, we first define the notion of a pairwise clean set. Let $A \subseteq \mathbb{R}^n$. We say that the set $A$ is pairwise clean in a direction $v \in \mathbb{R}^n$ with $\|v\| = 1$ and parameter $\lambda$ if
$$\mathbb{E}\big[\langle x - \mu_1(A), v \rangle^2 \;\big|\; x \in A\big] \le \lambda, \quad (5.1)$$
where we recall that $\mu_1(A)$ is the level-one center of mass of $A$.
The above condition says that for a random vector $x$ sampled from $\mathcal{N}(0, I_n)$ conditioned on $x \in A$, its variance along the direction $v$ is bounded by $\lambda$. We say that the set $A$ is pairwise clean (with parameter $\lambda$) if it is pairwise clean in every direction $v$. Equivalently, the operator norm of the covariance matrix of the random vector $x$ is bounded by $\lambda$.
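The equivalent operator-norm formulation is easy to test empirically. Below is a short Python check (our toy example; the halfspace set and parameter value are illustrative assumptions) that estimates the conditional covariance of a Gaussian restricted to a set and compares its top eigenvalue to the cleanliness parameter:

```python
import numpy as np

rng = np.random.default_rng(2)
n, samples, lam = 6, 400_000, 2.0

# Toy set A = {x : x_1 >= 0}: estimate the covariance of x ~ N(0, I_n)
# conditioned on x in A, and test pairwise cleanliness via the top
# eigenvalue (the maximum variance over all unit directions).
x = rng.standard_normal((samples, n))
x = x[x[:, 0] >= 0]                       # condition on membership in A
cov = np.cov(x, rowvar=False)             # covariance of conditional law
op_norm = np.linalg.eigvalsh(cov).max()
print(op_norm, op_norm <= lam)            # halfspaces are quite clean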
We call a generalized communication protocol pairwise clean with parameter $\lambda$ if at the start of each new “phase” of the protocol, the corresponding rectangle $A \times B$ satisfies that both $A$ and $B$ are pairwise clean. Starting from a communication protocol $\Pi$ in the Gaussian space, we will transform it into a pairwise clean protocol $\Pi'$ by proceeding from top to bottom and adding certain Gram–Schmidt orthogonalization and cleanup steps.
In particular, consider an intermediate node in the protocol tree of $\Pi$. Before Alice sends her bit as in the original protocol $\Pi$, she first performs an orthogonalization step by revealing the inner product between her input and Bob's current level-one center of mass. After this, she sends her bit according to the original protocol, and afterwards she repeatedly cleans her current set $A$ by revealing $\langle x, v \rangle$ while $A$ is not clean along some direction $v$ orthogonal to the previous directions. Once $A$ becomes clean, they proceed to the next round. We now describe this formally.
Construction of the pairwise clean protocol $\Pi'$ from $\Pi$.
The construction of the new protocol is recursive and we first define some notation. Consider an intermediate node of the new protocol at depth $t$. We use the random variable $A_t$ (resp., $B_t$) to denote the set of inputs of Alice (resp., Bob) reaching the node. If Alice reveals a linear form at this step, we use $u_t$ to denote the vector of the linear form; otherwise, we set $u_t$ to be the all-zeros vector. We define $w_t$ similarly for Bob. Throughout the protocol, we abbreviate $a_t := \mu_1(A_t)$ and $b_t := \mu_1(B_t)$ for Alice's and Bob's current centers of mass, respectively.
1. At the beginning, Alice receives an input $x \in \mathbb{R}^n$ and Bob receives an input $y \in \mathbb{R}^n$.
2. We initialize $t = 0$, $A_0 = \mathbb{R}^n$, and $B_0 = \mathbb{R}^n$.
3. For each phase: suppose we are starting the cleanup for a node at depth $s$ in the original protocol $\Pi$ and suppose we are at a node of depth $t$ in the new protocol $\Pi'$. If it is Alice's turn to speak in $\Pi$:
(a) Orthogonalization by revealing the correlation with Bob's center of mass. Alice begins by revealing the inner product of her input with Bob's current (signed) center of mass $\theta \odot b_t$. Since in the previous steps she has already revealed the inner products with Bob's previous centers of mass, for technical reasons, we will only have Alice announce the inner product with the component of $\theta \odot b_t$ that is orthogonal to the previous directions along which Alice announced an inner product. More formally, let $u_t$ be the unit vector in the direction of the component of $\theta \odot b_t$ that is orthogonal to all previous directions. Alice computes $\langle x, u_t \rangle$ and sends it to Bob. Increment $t$ by 1 and go to step (b).
(b) Original communication. Alice sends the bit that she was supposed to send in $\Pi$ based on the previous messages and the input $x$. Increment $t$ by 1 and go to step (c).
(c) Cleanup steps. While there exists some direction $v$ orthogonal to the previous directions (i.e., satisfying $\langle v, u_{t'} \rangle = 0$ for all $t' < t$) such that $A_t$ is not pairwise clean in direction $v$, Alice sets $u_t := v$, computes $\langle x, u_t \rangle$, and sends it to Bob. Increment $t$ by 1. Repeat step (c) as long as $A_t$ is not pairwise clean; otherwise, increment $s$ by 1 and go back to the for-loop in step 3, which starts a new phase.
If it is Bob's turn to speak, we define everything similarly with the roles of $x$ and $y$ switched.
4. Finally, at the end of the protocol, the output value is determined based on all the previous communication and the corresponding output it defines in $\Pi$.
We note some basic properties that directly follow from the description. First, we note that the steps 3(a), 3(b), and 3(c) always occur in sequence for each party, and we refer to such a sequence of steps as a phase for that party. Note that there are at most $c$ phases. If a new phase starts at time $t$, then the current rectangle $A_t \times B_t$ is pairwise clean for both parties by construction. Also, note that the non-zero vectors in the sequence $(u_t)_t$ (resp., $(w_t)_t$) form an orthonormal set. We also note that the Boolean communication in step 3(b) is solely determined by the original protocol $\Pi$ and hence only depends on the previous Boolean messages.
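The following Python sketch (a hypothetical, simplified rendering of Alice's side of one phase; the slab-style conditioning and all parameter values are our assumptions, not the paper's construction) illustrates how the cleanup loop of step 3(c) terminates: each revealed direction is orthogonal to the previous ones, so there can be at most $n$ of them.

```python
import numpy as np

rng = np.random.default_rng(3)
n, lam = 8, 1.5

def cleanup(points, dirs):
    """While the empirical covariance of the current set exceeds lam in some
    direction v orthogonal to previously revealed ones, "reveal" <x, v> by
    keeping only points whose value lies in the same unit-length cell (a
    crude stand-in for conditioning on the sent real number)."""
    while len(dirs) < n:
        P = np.eye(n) - sum(np.outer(v, v) for v in dirs)
        vals, vecs = np.linalg.eigh(np.cov(points @ P, rowvar=False))
        if vals[-1] <= lam:
            break                       # pairwise clean: the phase ends
        v = vecs[:, -1]                 # dirtiest remaining direction
        m = points @ v
        points = points[np.floor(m) == np.floor(m[0])]
        dirs.append(v)
    return points, dirs

pts = rng.standard_normal((300_000, n))
g = np.zeros(n); g[0] = 1.0
pts = pts[np.abs(pts @ g) > 1.0]        # a "message" that leaves the set
                                        # dirty: variance along g is ~2.5
pts, dirs = cleanup(pts, [])
print(f"{len(dirs)} cleanup step(s), {len(pts)} points remain")
```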
Lastly, each phase has one 3(a) and one 3(b) step, followed by potentially many 3(c) steps. However, the following claim shows that the number of steps is always finite.
Claim 5.1.
Let $L$ be an arbitrary leaf of the protocol $\Pi'$ and $d(L)$ be its depth. Then $d(L) \le 2c + 2n$. Moreover, along this path there are at most $2c$ many steps 3(a) and 3(b).
Proof.
We count the number of communication steps separately:
• Steps 3(a) and 3(b). These occur once each in every phase, thus at most $2c$ times in total.
• Step 3(c). For Alice, each time she communicates at step 3(c), the direction $u_t$ is orthogonal to all previous $u_{t'}$'s. Since the dimension of $\mathbb{R}^n$ is $n$, this happens at most $n$ times. A similar argument works for Bob.
Thus in total we have at most $2c + 2n$ steps. ∎
We will eventually show that the expected depth of the protocol is $O(c)$ for an appropriate choice of the cleanup parameter.
5.2 Bounding the Expected Quadratic Variation
Consider a random walk on the protocol tree generated by the new protocol $\Pi'$ when the parties are given independent inputs $x, y \sim \mathcal{N}(0, I_n)$. Consider the corresponding level-one martingale process defined in Equation 4.5. Formally, at time $t$ the process is defined as in Equation 4.5,
where we recall that $a_t = \mu_1(A_t)$ and $b_t = \mu_1(B_t)$, and $\theta \in \{\pm 1\}^n$ is a fixed sign vector.
The martingale process stops once it hits a leaf of the protocol $\Pi'$. Let $\tau$ denote the (stopping) time when this happens. Note that $\mathbb{E}[\tau]$ is exactly the expected depth of the protocol $\Pi'$. Then, in light of Proposition 4.4, to prove Theorem 1.2, it suffices to prove the following.
Lemma 5.2.
$\mathbb{E}\big[\sum_{t \le \tau} (M_t - M_{t-1})^2\big] \le O(c)$.
We will prove this in two steps. We first show that the only change in the value of the martingale occurs during the orthogonalization step 3(a). This is because in each phase, Alice's change of center of mass in steps 3(b) and 3(c) is always orthogonal to the signed center of mass $\theta \odot b_t$, so these steps do not change the value of the martingale, as discussed in Section 2. Moreover, recalling the discussion in Subsection 2.1, since Alice's node was pairwise clean just before Alice sent the message in step 3(a), the expected change can be bounded in terms of the squared norm of the change in $b$ that occurred between the current round and the last round where Alice was in step 3(a). A similar argument works for Bob.
Formally, this is encapsulated by the next lemma, for which we need some additional definitions. Let $(\mathcal{F}_t)_{t \ge 0}$ be the natural filtration induced by the random walk on the generalized protocol tree, with respect to which $M_t$ is a Doob martingale and $a_t, b_t$ also form vector-valued martingales (recall Proposition 4.3). Note that $\mathcal{F}_t$ fixes all the rectangles encountered during times $t' \le t$, and thus for $t' \le t$ the random variables $a_{t'}, b_{t'}, u_{t'}, w_{t'}$ are determined; in particular, they are $\mathcal{F}_t$-measurable. Recalling that $\lambda$ is the cleanup parameter, we then have the following. Below we assume without any loss of generality that Alice speaks first and, in particular, we note that Alice speaks in step 3(a) for the first time at time zero.
Lemma 5.3 (Step Size).
Let $(\tau_i)_{i \ge 1}$ be a sequence of stopping times with $\tau_i$ being the index of the round where Alice speaks in step 3(a) for the $i$-th time, or $\infty$ if there is no such round. Then, for any integer $i$,
$$\mathbb{E}\big[(M_{\tau_i + 1} - M_{\tau_i})^2 \;\big|\; \mathcal{F}_{\tau_i}\big] \le \lambda \cdot \|b_{\tau_i} - b_{\tau_{i-1}}\|^2,$$
and moreover, for any $t$ with $\tau_i < t < \tau_{i+1}$ at which Alice speaks, we have that $M_{t+1} = M_t$.
A similar statement also holds if Bob speaks, where $b$ is replaced by $a$ and the sequence $(\tau_i)$ is replaced by $(\sigma_i)$, where $\sigma_i$ is the index of the round where Bob speaks in step 3(a) for the $i$-th time, or $\infty$ if there is no such round.
In particular, we see that the steps 3(b) and 3(c) do not contribute to the quadratic variation and only the steps 3(a) do. Also, since the first time Alice and Bob speak they start in step 3(a), we also note that $a_0$ and $b_0$ are their initial centers of mass, which are both zero.
We shall prove the above lemma in Subsection 5.3 and continue with the bound on the quadratic variation here. Using Lemma 5.3, we have
On the other hand, by the orthogonality of vector-valued martingale differences from Equation 3.2, we have
A similar statement holds for $a_t$. Therefore,
(5.2)
We prove the following in Subsection 5.4 to upper bound the quantity on the right hand side above. Loosely speaking, by an application of level-one inequalities (see Theorem 3.1), the lemma below ultimately boils down to a bound on the expected number of cleanup steps.
Lemma 5.4 (Final Center of Mass).
Plugging the bounds from the above lemma into Equation 5.2 readily implies Lemma 5.2. Together with Proposition 4.4, this completes the proof of Theorem 1.2.
5.3 Bounds on Step Sizes (Proof of Lemma 5.3)
Let us fix $i \ge 1$ and abbreviate $t := \tau_i$. Observe that
(5.3)
where the second line is due to $(a_t)_t$ being a vector-valued martingale.
We first consider the case that at time $t$ a new phase starts for Alice. By construction, this means that the current rectangle determined by $\mathcal{F}_t$ is pairwise clean with parameter $\lambda$, and since Alice is in step 3(a) at the start of a new phase, $u_t$ is chosen to be the (normalized) component of $\theta \odot b_t$ that is orthogonal to the previous directions. Let $\rho$ be the length of this component before normalization. Note that $\rho$ is $\mathcal{F}_t$-measurable (i.e., it is determined by $\mathcal{F}_t$).
We now claim that the components of $a_{t+1}$ and $a_t$ are the same along any of the previous directions. So in Equation 5.3 they cancel out, and the only relevant quantity is the component in the direction $u_t$. This follows since, in all the previous steps, Alice has already fixed the inner product of her input with each previous direction. This implies that for any two sets determined by $\mathcal{F}_{t+1}$, the inner product with all the previous directions is fixed over the choice of inputs from both rectangles. In particular, since $a_t$ and $a_{t+1}$ are the corresponding centers of mass, we have that
(5.4)
This, together with Equation 5.3 and recalling that $\rho$ is determined by $\mathcal{F}_t$, implies that
(5.5)
We now bound the term outside the expectation by the change in the center of mass and the term inside the expectation by the fact that the set is pairwise clean.
Term Outside the Expectation.
Recall that $u_t$ is chosen to be the (normalized) component of $\theta \odot b_t$ that is orthogonal to the span of the previous directions. Since $\theta \odot b_{\tau_{i-1}}$ lies in that span, $u_t$ is orthogonal to it. Hence,
Since $u_t$ is a unit vector and each entry of $\theta$ is in $\{\pm 1\}$, this implies that
(5.6)
Term Inside the Expectation.
Since $(a_t)_t$ is a vector-valued martingale with respect to $(\mathcal{F}_t)_t$, and $u_t$ is $\mathcal{F}_t$-measurable (determined by $\mathcal{F}_t$), we have that
Since Alice is in step 3(a), her message fixes $\langle x, u_t \rangle$ for every $x$ in the resulting set. Thus,
(5.7)
where the last line follows from the tower property of conditional expectation.
Recall that $a_t$ is the center of mass of $A_t$. Moreover, the unit vector $u_t$ is determined by $\mathcal{F}_t$, and the conditional distribution of $x$ given $\mathcal{F}_t$ is that of a standard Gaussian conditioned on $x \in A_t$. Thus, using the fact that $A_t$ is pairwise clean since Alice is in step 3(a), the right hand side of Equation 5.7 is at most $\lambda$.
Final Bound.
Substituting the above in Equation 5.5, we have
where the second inequality follows from Equation 5.6. This completes the proof of the first statement.
For the moreover part, let us condition on the event where Alice speaks at time $t$ with $\tau_i < t < \tau_{i+1}$. Note that such $t$ must lie in the same phase of the protocol, where Alice is the only one speaking. So, Bob's center of mass does not change from time $\tau_i$ till $t$, i.e., $b_t = b_{\tau_i}$. Thus the change in the martingale is the inner product of the change in Alice's center of mass with $\theta \odot b_{\tau_i}$. Analogous to Equation 5.4, the components of Alice's center of mass along the previous directions are fixed, so $\langle a_{t+1} - a_t, u_{t'} \rangle = 0$ for all $t' \le \tau_i$. Furthermore, by construction, $\theta \odot b_{\tau_i}$ lies in the linear subspace spanned by $u_1, \dots, u_{\tau_i}$. Therefore, it follows that $M_{t+1} = M_t$.
5.4 Expected Norm of Final Center of Mass (Proof of Lemma 5.4)
Let $U$ be the (random) linear subspace spanned by the vectors $u_1, \dots, u_\tau$ and similarly, let $W$ be the linear subspace spanned by the vectors $w_1, \dots, w_\tau$. For any linear subspace $V$ of $\mathbb{R}^n$, we denote by $P_V$ and $P_{V^\perp}$ the projectors onto $V$ and its orthogonal complement, respectively. Then, we have that
Note that the non-zero vectors in $(u_t)_{t \le \tau}$ and $(w_t)_{t \le \tau}$ form orthonormal bases for the subspaces $U$ and $W$, respectively. Moreover, for each $t \le \tau$, the inner product $\langle x, u_t \rangle$ is fixed for every $x \in A_\tau$ and the inner product $\langle y, w_t \rangle$ is fixed for every $y \in B_\tau$, where $A_\tau \times B_\tau$ is the current rectangle determined by $\mathcal{F}_\tau$. In particular, since $a_\tau$ is the center of mass of $A_\tau$, this implies that
where the second line follows from the inner products being fixed on $A_\tau$. Therefore, we have
In an analogous fashion,
We next show that the two contributions above are both appropriately bounded. The former follows from the stopping time and concentration arguments laid out in the overview, showing that there cannot be too many orthogonal directions along which the inner product is large. The latter follows from an application of level-one inequalities.
We will bound the norm of the projection onto the subspaces $U$ and $W$ in Subsection 5.4.1, and the norm of the projection onto the orthogonal complements $U^\perp$ and $W^\perp$ in Subsection 5.4.2.
5.4.1 Projection onto the Subspaces $U$ and $W$
We shall show that the expected norm of the final center of mass, when projected onto the subspaces $U$ and $W$, is appropriately bounded.
Towards this end, define the random variable $g_t := \langle x, u_t \rangle$ for each $t \le \tau$. Note that the vectors $u_t$ are being chosen adaptively depending on the previous inner products $g_{t'}$ for $t' < t$, as well as the Boolean communication bits from step 3(b); thus they are functions of $x$ and $y$ as well here. Observe that
We now divide the time sequence into successive intervals $I_1, I_2, \dots$ of geometrically increasing lengths. Then we bound the expected sum of $g_t^2$ within each time interval, and we further argue that the probability that the stopping time $\tau$ lies in the $j$-th interval decays exponentially in $j$. In particular, for $j \ge 1$, letting $I_j$ be the $j$-th interval, we show the following.
Claim 5.5.
For any $j \ge 1$, we have
We shall prove the above claim later since it is the most involved part of the proof. The previous claim readily implies the following probability bounds.
Claim 5.6.
For any $j \ge 1$, the probability that $\tau$ lies in the $j$-th interval is at most $2^{-j}$.
Proof of Claim 5.6.
We bound the probability by induction on $j$. The claim trivially holds for $j = 1$.
Now we proceed to analyze the event that $\tau$ lies in the $j$-th interval. Observe that Claim 5.1 implies that there are at most $2c$ many steps 3(a) and 3(b) throughout the protocol. Thus, if the event above occurs, most of the time steps in the interval are steps where the process is in step 3(c).
By the definition of the cleanup step, if $A \times B$ is a rectangle determined by $\mathcal{F}_t$ at which the process is in step 3(c) and Alice speaks (it suffices to consider such events since we have a product measure on the inputs conditioned on $\mathcal{F}_t$, and $\tau$ is a stopping time, i.e., determined by the randomness in $\mathcal{F}_t$), then
where $\lambda$ is the cleanup parameter and $a_t$ is the center of mass. This is because $u_t$ is chosen to be a unit vector in a direction in which the current set (conditioned on the history) is not pairwise clean. A similar statement holds if Bob speaks in step 3(c), for the random variable $\langle y, w_t \rangle$ where $y$ is sampled from the Gaussian measure conditioned on $y \in B$.
By the tower property of conditional expectation, the above implies that
Recall that Claim 5.5 bounds the right hand side. We consider two cases:
(i) If the event in question has probability zero, then the claim trivially holds.
(ii) Otherwise, the above bounds imply that the probability of the event drops by a constant factor compared to the previous interval, and by induction this implies the claimed bound. ∎
These claims imply that
where the last line uses a geometric summation. This proves the desired bound, assuming Claim 5.5, which we prove next.
Proof of Claim 5.5.
To prove the claim, we need to analyze the expectation of the relevant sum of squares under $(x, y)$ sampled from the Gaussian measure conditioned on the event in question.
We first describe an equivalent way of sampling from this distribution, which will be easier to analyze. First, we recall that the definition of the cleanup protocol implies that the Boolean communication in $\Pi'$ is solely determined by the previous Boolean communication, since it is specified by the original protocol $\Pi$ before the cleanup.
Let us fix any Boolean string $\beta$ that is a valid Boolean transcript of the original communication protocol $\Pi$. This defines a rectangle $R_\beta$ consisting of all pairs of inputs to Alice and Bob that result in the Boolean transcript $\beta$ in $\Pi$. If we sample $(x, y)$ from the Gaussian measure and output the unique $\beta$ such that $(x, y) \in R_\beta$, we obtain a distribution on rectangles. We use $p_\beta$ to denote the probability of obtaining $R_\beta$ by this sampling process, so that $\sum_\beta p_\beta = 1$.
Now consider the following two-stage sampling process. First, we sample a rectangle $R_\beta$ according to the above distribution, and then we sample the inputs $(x, y)$ from the Gaussian measure conditioned on the event that $(x, y) \in R_\beta$. We shall show the following claim for any rectangle $R_\beta$ that could be sampled in the first step.
Claim 5.7.
.
Assuming the above, and taking an expectation over $\beta$ drawn with probability $p_\beta$, we immediately obtain Claim 5.5 (using the concavity of the square root). ∎
To complete the proof, we now prove Claim 5.7.
Proof of Claim 5.7.
Fix any $\beta$ such that $p_\beta > 0$. We will bound the expectation of the relevant quantity where $(x, y)$ are sampled from the Gaussian measure conditioned on the event that $(x, y) \in R_\beta$. Note that the directions $u_t, w_t$ are functions of the previous messages of the protocol and hence also of the inputs $(x, y)$. Once we condition on the above event, the Boolean communication is fixed to be $\beta$.
To analyze the above conditioning, we first do a thought experiment and consider a different protocol $\Pi_\beta$ that takes standard Gaussian inputs (without any conditioning), and show a tail bound for the corresponding random variable for this new protocol. In the last step, we will use it to compute the expectation we ultimately want.
Protocol $\Pi_\beta$.
The protocol $\Pi_\beta$ always communicates according to the fixed transcript $\beta$ in a Boolean communication step, and otherwise according to the cleanup protocol, on any input. Consider the random walk on this new protocol tree where the inputs are $x, y \sim \mathcal{N}(0, I_n)$ (without any conditioning). Let $(\mathcal{G}_t)_t$ be the associated filtration of the new protocol, which can be identified with the collection of all partial transcripts till time $t$. Note that the vectors $u_t$ and $w_t$ in this new protocol are determined only by the previous real communication, since the Boolean communication is fixed to $\beta$. This also implies that the vectors $u_t$ and $w_t$ form a predictable sequence with respect to the filtration. Moreover, by the definition of the protocol, the next non-zero vector $u_t$ is chosen to be a unit vector orthogonal to the previously chosen $u_{t'}$'s, and the same holds for the vectors $w_t$.
We denote by $Z$ the random variable that captures the corresponding sum of squares for the protocol $\Pi_\beta$, i.e., defined with respect to the directions chosen by $\Pi_\beta$. Observe that if $(x, y) \in R_\beta$, then $Z$ coincides with the original quantity.
Consider the behavior of the protocol $\Pi_\beta$ at some fixed time $t$. The nice thing about the protocol $\Pi_\beta$ is that, conditioned on all previous real messages, both $x$ and $y$ are standard Gaussian distributions on an affine subspace of $\mathbb{R}^n$ (defined by the previous messages). Then, at time $t$, since $u_t$ is orthogonal to the directions used in all previous real messages, it follows that the distribution of $\langle x, u_t \rangle$ conditioned on any event in $\mathcal{G}_t$ is an independent standard Gaussian whenever $u_t$ is non-zero. The same holds for $\langle y, w_t \rangle$ as well. This last fact uses that the projection of a multivariate standard Gaussian onto orthonormal directions yields independent real-valued standard Gaussians.
This implies that each new $\langle x, u_t \rangle^2$ and $\langle y, w_t \rangle^2$ is an independent chi-squared random variable conditioned on the history (up to depth $t$) of the random walk. Therefore, Fact 3.2 implies an exponential tail bound for the sum of these chi-squared random variables, and hence for $Z$.
Computing the Original Expectation.
Let us compare the probability of the above tail event in the original protocol, where inputs are sampled from the Gaussian measure conditioned on the event that $(x, y) \in R_\beta$. We can write
(5.8)
We then bound the numerator by the unconditional tail probability of $Z$ (using that on the event $(x, y) \in R_\beta$ the two quantities coincide).
Note that the resulting inequality gives us an exponential tail on Equation 5.8:
We can now integrate the above inequality to get an upper bound on the expected value under the distribution of interest. In particular, since for any non-negative random variable $X$ and any parameter $a \ge 0$ it holds that $\mathbb{E}[X] \le a + \int_a^\infty \Pr[X \ge s]\, ds$, we derive the claimed bound by choosing $a$ appropriately:
This completes the proof of Claim 5.7. ∎
5.4.2 Projection onto the Orthogonal Subspaces $U^\perp$ and $W^\perp$
We shall show that the expected norm of the final center of mass, when projected onto the orthogonal subspaces $U^\perp$ and $W^\perp$, is appropriately bounded.
Recall that $U$ is the (random) linear subspace spanned by the orthonormal set of non-zero vectors among the $u_t$'s and $U^\perp$ is its orthogonal complement. Moreover, the vectors $u_t$ are determined by the previous Boolean and real communication. A similar statement holds for $W$ and the vectors $w_t$ as well.
The proof will follow in two steps. We will first show that one can bound the norm of the projection of the center of mass onto the orthogonal complement, which turns out to be the Gaussian center of mass of a set that lives in (an affine shift of) the subspace $U^\perp$, in terms of the logarithm of the inverse relative measure of that set with respect to the subspace. Note that the Gaussian measure here is the Gaussian measure on the subspace $U^\perp$. The case for $W^\perp$ will be similar. The second step uses an information-theoretic convexity argument to show that on average the logarithm of the inverse relative measure is small.
For the first part, we observe that if we sample $x, y \sim \mathcal{N}(0, I_n)$ and take a random walk on this protocol tree, we obtain a probability measure over transcripts which includes both real and Boolean values. Recall that the Boolean transcript is determined by the original protocol and only depends on the previous Boolean communication, while the real transcript is sandwiched between the Boolean communication. Let $T = (\beta, r)$ denote the random variable representing the full transcript of the generalized protocol, where $\beta$ is the Boolean communication and $r$ is the real communication. For any given transcript $T$, let $A_T \times B_T$ denote the corresponding rectangle consisting of the inputs reaching the leaf, and let $R_\beta$ denote the rectangle consisting of all pairs of inputs to Alice and Bob that result in the Boolean transcript $\beta$. Note that the real communication $r$ together with $\beta$ fixes the subspaces $U$ and $W$ and particular affine shifts of those subspaces, depending on the values of the inner products determined by the full transcript. In particular, the rectangle consistent with the full transcript is given by taking (random) affine slices of $R_\beta$.
Note that $a_\tau$ and $b_\tau$ are distributed as the centers of mass of the final rectangle $A_T \times B_T$, and thus it suffices to look at these rectangles for the rest of the argument. Since $A_T$ (resp., $B_T$) lies in some affine shift of $U^\perp$ (resp., $W^\perp$), defining the relative center of mass $\mu(K)$ for a set $K$ that lives in an ambient linear subspace as its center of mass under the Gaussian measure on that subspace, it follows that
Recalling that $\gamma(\cdot)$ is the Gaussian measure of a set relative to its ambient space, we will show:
Claim 5.8.
$\|\mu(A_T)\| \le O\big(\sqrt{\log(1/\gamma(A_T))}\big)$ and $\|\mu(B_T)\| \le O\big(\sqrt{\log(1/\gamma(B_T))}\big)$.
Note that we can ignore the case when the measure $\gamma(A_T)$ is zero above, since we will eventually take an expectation over $T$ and almost surely this measure is non-zero.
Using the previous claim,
For the second step of the proof, we show the next claim, which relies on convexity arguments to bound the right hand side above. This is similar in spirit to chain-rule-style arguments from information theory.
Claim 5.9.
$\mathbb{E}_T\big[\log(1/\gamma(A_T)) + \log(1/\gamma(B_T))\big] \le O(c)$.
This gives us the final bound, assuming the claims, which we now prove.
Proof of Claim 5.8.
We can bound the norm of the above projection by an application of the Gaussian level-one inequality (Theorem 3.1), which, by rotational symmetry, implies that if $K$ is a subset of a linear subspace $V$ with non-zero relative measure, then
(5.9)
where we recall that $\mu(K)$ is the center of mass with respect to the Gaussian measure on the subspace $V$.
If we run the generalized protocol on $x, y \sim \mathcal{N}(0, I_n)$ and condition on getting the full transcript $T$, the conditional probability measure on $x$ is that of the Gaussian measure conditioned on $x \in A_T$, that on $y$ is that of the Gaussian measure conditioned on $y \in B_T$, and they are independent. This follows from the fact that so far the parties have fixed inner products along bases for the subspaces $U$ and $W$, and the fact that the projections of a standard Gaussian onto orthogonal subspaces are independent.
Thus, applying Equation 5.9, we have
where the last line follows since, almost surely, the corresponding affine subspace is the ambient space for $A_T$. A similar argument proves the bound for $B_T$. ∎
Proof of Claim 5.9.
For this claim, it will be convenient to consider a different generalized protocol that generates the same distribution on the leaves. In particular, since the Boolean messages in the generalized protocol only depend on the previous Boolean messages, one can first send all the Boolean messages, and then send all the real messages, choosing them according to the protocol depending on the previous real messages and the (partial) Boolean transcript. This modified protocol generates the same distribution on the leaves when the inputs are Gaussian. In particular, the real communication only partitions each rectangle $R_\beta$ corresponding to a Boolean transcript $\beta$ into affine slices. (We remark that this modified protocol suffices for proving this claim since we are looking only at the leaves. However, unlike Lemma 5.3, directly bounding the expected quadratic variation of the martingale corresponding to the modified protocol is difficult.)
For the rest of the proof, we work with the protocol where the Boolean communication happens first. To prove the claim, we condition on a Boolean transcript $\beta$ and show by induction that
(5.10)
where $T$ is the full transcript and $R_\beta$ is the rectangle containing all the inputs whose Boolean transcript is $\beta$. Note that $\gamma(R_\beta)$ is the probability $p_\beta$ of obtaining the Boolean transcript $\beta$, since the ambient space of $R_\beta$ is the full space.
Then, taking expectation over the Boolean transcript $\beta$,
where the last line follows from concavity.
Induction.
To complete the proof, we now show Equation 5.10 by induction. For this, let us look at an intermediate step where the Boolean communication is fixed to $\beta$ and Alice and Bob have exchanged some real messages. Let the current rectangle be $A \times B$ and suppose it is Alice's turn to speak. Note that $A$ and $B$ live in some affine subspaces at this point, and in the current round Alice sends the inner product of her input with a unit vector $u$ that is determined by the previous messages and orthogonal to the previously revealed directions. At this step, Bob's set does not change at all. We shall show that in each step, the log of the inverse of the relative measure of the current rectangle does not increase on average over the next message:
(5.11)
and an analogous statement holds when Bob speaks. Taking an expectation over the messages, the above directly implies Equation 5.10 by a straightforward backward induction:
To see Equation 5.11, let us write $K$ for Alice's current set. Recall that since we have fixed the history, Alice has fixed inner products with some orthogonal directions, and she has decided on the next direction $u$ along which she will send the next inner product. Thus, $K$ lives in some fixed affine subspace. Moreover, conditioned on the history till this point, the conditional probability distribution on Alice's input can be described as follows: the projections corresponding to the previously revealed non-zero directions are fixed according to the affine shift, while the distribution on the orthogonal complement is that of the Gaussian measure on the corresponding subspace conditioned on lying in $K$. This uses that projections of a standard $n$-dimensional Gaussian in orthogonal directions are independent.
Let $d$ be the dimension of the ambient subspace of $K$. By applying a linear transformation, we may assume that the ambient subspace is $\mathbb{R}^d$ and $u = e_1$, i.e., in the current message Alice reveals the first coordinate of $x$, where $x$ is sampled from the Gaussian measure on $\mathbb{R}^d$ conditioned on $x \in K$. In this case, the relative measure on the left hand side of Equation 5.11 is that of the slice $K_s := \{x' \in \mathbb{R}^{d-1} : (s, x') \in K\}$ when Alice sends $s$ as the message, while on the right hand side we have the relative measure $\gamma(K)$. Denoting by $p$ the probability density function of the first coordinate of $x$, our statement boils down to showing
We show the above by explicitly writing the probability density function $p$. Denote by $\varphi_d$ the standard Gaussian density function in $\mathbb{R}^d$ (explicitly, $\varphi_d(x) = \prod_{i=1}^d \varphi(x_i)$, where $\varphi$ is the density of a one-dimensional standard Gaussian). The density function of the random vector sampled from the Gaussian measure conditioned on lying in $K$ is $\varphi_d(x)/\gamma(K)$ for $x \in K$ and zero outside. Integrating out the last $d - 1$ coordinates gives $p(s) = \varphi(s)\,\gamma(K_s)/\gamma(K)$, i.e., $\gamma(K_s) = p(s)\,\gamma(K)/\varphi(s)$. Then the left hand side of Equation 5.11 equals
$$\mathbb{E}_{s \sim p}\Big[\log \tfrac{1}{\gamma(K_s)}\Big] = \log \tfrac{1}{\gamma(K)} + \mathbb{E}_{s \sim p}\Big[\log \tfrac{\varphi(s)}{p(s)}\Big] \le \log \tfrac{1}{\gamma(K)} + \log \mathbb{E}_{s \sim p}\Big[\tfrac{\varphi(s)}{p(s)}\Big] \le \log \tfrac{1}{\gamma(K)},$$
where the first inequality is Jensen's inequality applied to the concave function $\log$, and the last step uses $\int \varphi \le 1$. This proves Equation 5.11. ∎
6 Level-Two Fourier Growth
In this section, we prove Theorem 1.3, our level-two bound. Similar to the proof of the level-one bound (Theorem 1.2), we start with a $c$-round communication protocol $\Pi$ over the Gaussian space as defined in Section 4. Note that $\Pi$ in turn comes from the original Boolean communication protocol.
Given the discussion in Subsection 4.3, to bound the level-two Fourier growth, one can attempt to bound the expected quadratic variation of the martingale that results from the protocol $\Pi$ directly, but, as in the level-one case, it is hard to leverage cancellations here to prove the bound we aim for. So, starting from $\Pi$, we will define a communication protocol $\Pi'$ that computes the same function as $\Pi$ but satisfies an additional “clean” property that makes it easier to control the quadratic variation. This new protocol differs from $\Pi$ in two ways. Firstly, the protocol consists of additional “cleanup steps” where Alice and Bob reveal certain quadratic forms of their inputs. Secondly, the protocol sends the real value of each quadratic form with a certain precision. Note that this protocol will not involve sending real messages at all: any potential real message will be truncated to a few bits of precision and sent as a Boolean message.
We emphasize that the main difference of this protocol from the corresponding level-one variant comes from the precision control, which is not needed there because the Gaussian distribution remains a (lower-dimensional) Gaussian under linear projections. For technical reasons, we shall also need to analyze the martingale under a truncated Gaussian distribution, where all coordinates are bounded in some large interval $[-B, B]$. Intuitively, this does not noticeably change the distribution, since it is highly unlikely that coordinates drawn from the Gaussian distribution fall outside such intervals; recalling Remark 4.2 and Proposition 4.4, it still suffices to analyze the corresponding martingale under the truncated Gaussian distribution.
We next define the notion of a 4-wise clean protocol.
6.1 4-Wise Clean Protocols
Consider an intermediate node in the protocol and let $A$ refer to the set of Alice's inputs reaching this node. We denote by $\mathcal{M}$ the set of all matrices in $\mathbb{R}^{n \times n}$ with zero diagonal and unit norm (when viewed as $n^2$-dimensional vectors). For a parameter $\lambda$, we say that the set $A$ is 4-wise clean in a direction $V \in \mathcal{M}$ if
where we recall that $\mu_2(A)$ is the level-two center of mass of $A$ under the Gaussian measure. We say that the set $A$ is 4-wise clean if it is 4-wise clean in every direction $V \in \mathcal{M}$. Our new protocol will consist of the original protocol, interspersed with cleaning steps. Once Alice sends her bit as in the original protocol, she cleans by revealing $x^\top V x$ with a few bits of precision while there exists a direction $V$ in which $A$ is not clean. Once $A$ becomes clean, Alice proceeds to the next round, and Bob performs an analogous cleanup. We now describe this formally.
Communication with Finite Precision.
Let the positive integer $p$ be a precision parameter that we will use for truncation. In our new communication protocol, we will send real numbers with precision $2^{-p}$. This is formalized by the truncation function $\mathsf{trunc}_p : \mathbb{R} \to \mathbb{R}$ defined at $z$ as $\mathsf{trunc}_p(z) := 2^{-p} \lfloor 2^p z \rfloor$, i.e., $z$ rounded down to an integer multiple of $2^{-p}$.
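A direct Python rendering of this map (assuming the round-down convention above; the paper's exact convention may differ, e.g. rounding to nearest):

```python
import math

def trunc(z: float, p: int) -> float:
    """Round z down to an integer multiple of 2^-p."""
    return math.floor(z * 2**p) / 2**p

# Any two reals with the same truncated value differ by less than 2^-p,
# which is the property that Fact 6.4 below relies on.
assert trunc(0.7192, 4) == trunc(0.6876, 4) == 0.6875  # both in [11/16, 12/16)
```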
Construction of $\Pi'$ from $\Pi$.
As described before, $\Pi'$ will consist of the original protocol along with extra steps where Alice or Bob reveals the (approximate) value of a quadratic form on their input. Consider an intermediate node of this new protocol at depth $t$. We always use the random variable $A_t$ (resp., $B_t$) to denote the set of inputs of Alice (resp., Bob) reaching the node. If Alice reveals a quadratic form at this step, we use $U_t$ to denote the matrix of the quadratic form revealed at this node; otherwise, we set $U_t$ to be the all-zeros matrix. We define $W_t$ similarly for Bob. Throughout the protocol, we write $\alpha_t$ and $\beta_t$ to denote the level-two centers of mass $\mu_2(A_t)$ and $\mu_2(B_t)$, respectively.
Recall that $\lambda$ is the parameter for cleanup, to be optimized later. Since we will now send real numbers (with certain precision) as bit-strings, their magnitudes should also be well controlled to guarantee bounded message length. This is managed by a parameter $B$, and we will restrict the inputs to the parties in $\Pi'$ to come from the box $[-B, B]^n$. Note that, by Gaussian concentration, taking $B$ logarithmic in the relevant parameters suffices.
1. At the beginning, Alice receives an input $x \in [-B, B]^n$ and Bob receives an input $y \in [-B, B]^n$.
2. We initialize $t = 0$, $A_0 = [-B, B]^n$, and $B_0 = [-B, B]^n$.
3. For each phase: suppose we are starting the cleanup for a node at depth $s$ in the original protocol $\Pi$ and suppose we are at a node of depth $t$ in the new protocol $\Pi'$. If it is Alice's turn to speak in $\Pi$:
(a) Orthogonalization by revealing the correlation with Bob's center of mass. Alice begins by revealing the quadratic form of her input along Bob's current (signed) level-two center of mass $\theta \odot \beta_t$. Since in the previous steps she has already revealed the quadratic forms along Bob's previous centers of mass, for technical reasons, we will only have Alice announce the quadratic form along the component of $\theta \odot \beta_t$ that is orthogonal to the span of the previous directions. More formally, let $U_t$ be the unit-norm matrix in the direction of this component. Alice computes $\mathsf{trunc}_p(x^\top U_t x)$ and sends it to Bob. Increment $t$ by 1 and go to step (b).
(b) Original communication. Alice sends the bit that she was supposed to send in $\Pi$ based on the previous messages and $x$. Increment $t$ by 1 and go to step (c).
(c) Cleanup steps. While there exists some direction $V \in \mathcal{M}$ orthogonal to the previous directions (i.e., $\langle V, U_{t'} \rangle = 0$ for all $t' < t$) such that $A_t$ is not 4-wise clean in direction $V$, Alice sets $U_t := V$, computes $\mathsf{trunc}_p(x^\top U_t x)$, and sends it to Bob. Increment $t$ by 1. Repeat step (c) while $A_t$ is not 4-wise clean; otherwise, increment $s$ by 1 and go back to the for-loop in step 3, which starts a new phase.
If it is Bob's turn to speak, we define everything similarly with the roles of $x$ and $y$ switched.
4. Finally, at the end of the protocol, the output value is determined based on all the previous communication and the corresponding output it defines in $\Pi$.
Remark 6.1.
Note that by construction, the non-zero matrices among $(U_t)_t$ form an orthonormal set when viewed as $n^2$-dimensional vectors (similarly for $(W_t)_t$), and moreover, their diagonals are zero. Lastly, $U_t$ and $W_t$ are known to both Alice and Bob as they are canonically determined by previous messages.
We remark that the steps 3(a), 3(b), and 3(c) always occur in sequence for each party, and we refer to such a sequence of steps as a phase for that party. Note that there are at most $c$ phases. If a new phase starts at time $t$, then the current rectangle $A_t \times B_t$ is 4-wise clean for both parties by construction.
Now we formalize a few useful properties of the communication protocol $\Pi'$. The first fact below follows since each $\alpha_t$ is an expectation of $x x^\top$ (with the diagonal zeroed) over some distribution.
Fact 6.2.
Each $\alpha_t$ and each $\beta_t$ has zero diagonal.
The following follows from tail bounds for the univariate standard normal distribution.
Fact 6.3.
Let $g \sim \mathcal{N}(0, 1)$. Then $\Pr[|g| > B] \le 2e^{-B^2/2}$.
The next fact says that when a node fixes a quadratic form up to precision $2^{-p}$, then for any two inputs that reach this node, the quadratic forms differ by at most $2^{-p}$.
Fact 6.4.
In steps 3(a) and 3(c), any $x, x' \in A_{t+1}$ satisfy $|x^\top U_t x - x'^\top U_t x'| \le 2^{-p}$. Similarly, any $y, y' \in B_{t+1}$ satisfy $|y^\top W_t y - y'^\top W_t y'| \le 2^{-p}$.
The next claim bounds the maximum attainable norms of Alice's and Bob's level-two centers of mass at any point in the protocol. This uses the fact that the inputs come from the truncated Gaussian distribution.
Claim 6.5.
$\|\theta \odot \alpha_t\| \le nB^2$ and $\|\theta \odot \beta_t\| \le nB^2$ for all possible $\alpha_t$ and $\beta_t$ throughout the communication.
Proof.
Since $\theta$ is a matrix with zero diagonal and $\pm 1$ entries off the diagonal, and $\alpha_t$ also has zero diagonal, $\|\theta \odot \alpha_t\| = \|\alpha_t\|$. In addition, since the inputs satisfy $x \in [-B, B]^n$, we have $\|\alpha_t\| \le \mathbb{E}\big[\|x x^\top\| \mid x \in A_t\big] \le nB^2$.
A similar analysis works for $\beta_t$. ∎
The next claim gives a bound on the length of any message in the protocol $\Pi'$.
Claim 6.6.
Any message in $\Pi'$ consists of at most $p + O(\log(nB))$ many bits.
Proof.
Assume without loss of generality that it is Alice's turn to speak. In step 3(b) she sends one bit. In steps 3(a) and 3(c), she computes $\mathsf{trunc}_p(x^\top V x)$ for some unit-norm direction $V$ and sends the result. Since
$|x^\top V x| \le \|x\|^2 \le nB^2$
and the message is a multiple of $2^{-p}$, this yields a message of at most $p + O(\log(nB))$ many bits. ∎
The last claim bounds the maximum depth of the new protocol $\Pi'$.
Claim 6.7.
Let $L$ be an arbitrary leaf of the protocol $\Pi'$ and $d(L)$ be its depth. Then $d(L) \le 2c + 2n^2$. Moreover, along this path there are at most $n^2$ many non-zero $U_t$ and at most $n^2$ many non-zero $W_t$.
Proof.
We count the number of communication steps separately:
• Steps 3(a) and 3(b). These occur once each in every phase, thus at most $2c$ times in total.
• Step 3(c). For Alice, each time she communicates at step 3(c), the direction $U_t$ is non-zero and orthogonal to all previous $U_{t'}$'s. Since the dimension of the space of matrices is $n^2$, this happens at most $n^2$ times. A similar argument works for Bob.
Thus in total we have at most $2c + 2n^2$ steps. ∎
We will eventually show that, with a suitable choice of parameters, the depth is typically $O(c)$.
6.2 Bounding the Expected Quadratic Variation
Consider the martingale process defined in Equation 4.5 from a random walk on the protocol tree generated by $\Pi'$, where the inputs are sampled from the Gaussian distribution conditioned on lying in the bounded cube $[-B, B]^n$. Recall that Proposition 4.3 still holds (see Remark 4.5).
Formally, at time $t$ the process is defined as in Equation 4.5,
where we recall that $\alpha_t = \mu_2(A_t)$ and $\beta_t = \mu_2(B_t)$, and $\theta$ is a fixed sign matrix with a zero diagonal. The martingale process stops once it hits a leaf of $\Pi'$. Let $\tau$ denote the (stopping) time when this happens. Note that $\mathbb{E}[\tau]$ is exactly the expected depth of the protocol $\Pi'$.
In light of Remark 4.2 and Proposition 4.4, to prove Theorem 1.3, it suffices to prove the following.
Lemma 6.8.
Lemma 6.8 is proved in three steps. We first show that essentially the only change in the value of the martingale occurs in the orthogonalization step 3(a). The reason is the same as for the level-one bound: Alice's messages sent in steps 3(b) and 3(c) are always near-orthogonal to Bob's current level-two center of mass, and thus they do not change the value of the martingale by much. Moreover, by the level-two analog of the argument in Subsection 2.1, since Alice's current node was clean just before Alice sent the message in step 3(a), the expected change can be bounded in terms of the squared norm of the change in $\beta$ that occurred between the current round and the last round where Alice was in step 3(a). A similar argument works for Bob.
Formally, this is encapsulated by the next lemma, for which we need some additional definitions. Let $(\mathcal{F}_t)_{t \ge 0}$ denote the natural filtration induced by the random walk on the generalized protocol tree, with respect to which $M_t$ is a Doob martingale and $\alpha_t, \beta_t$ also form matrix-valued martingales (recall Proposition 4.3). Note that $\mathcal{F}_t$ fixes all the rectangles encountered during times $t' \le t$, and thus for $t' \le t$ the random variables $\alpha_{t'}, \beta_{t'}, U_{t'}, W_{t'}$ are determined; in particular, they are $\mathcal{F}_t$-measurable. Recalling that $\lambda$ is the cleanup parameter to be optimized later, we then have the following. Below we assume without any loss of generality that Alice speaks first and, in particular, we note that Alice speaks in step 3(a) for the first time at time zero, when both Alice's and Bob's centers of mass are zero: $\alpha_0 = \beta_0 = 0$.
Lemma 6.9 (Step Size).
Let $(\tau_i)_{i \ge 1}$ be a sequence of stopping times with $\tau_i$ being the index of the round where Alice speaks in step 3(a) for the $i$-th time, or $\infty$ if there is no such round. Then, for any integer $i$,
and moreover, for any $t$ with $\tau_i < t < \tau_{i+1}$ at which Alice speaks, we have that
A similar statement also holds if Bob speaks, where $\beta$ is replaced by $\alpha$ and the sequence $(\tau_i)$ is replaced by $(\sigma_i)$, where $\sigma_i$ is the index of the round where Bob speaks in step 3(a) for the $i$-th time, or $\infty$ if there is no such round.
We indeed see that, for a suitable choice of the precision and truncation parameters, steps 3(b) and 3(c) do not contribute much to the quadratic variation and only the steps 3(a) do. Also, since the first time Alice and Bob speak they start in step 3(a), we note that $\alpha_0$ and $\beta_0$ are their initial centers of mass, which are both zero.
We shall prove the above lemma in Subsection 6.3 and continue with the bound on the quadratic variation here. Using the bounds on the step sizes from Lemma 6.9 together with Claim 6.7, we get
On the other hand, using the orthogonality of vector-valued martingale differences from Equation 3.2, we obtain the analogous bound, and a similar statement holds for $\beta_t$ as well. Therefore,
(6.1)
Then in Subsection 6.4 we apply level-two inequalities (see Theorem 3.1) to convert the task into bounding the second moment $\mathbb{E}[\tau^2]$. This reduction is formalized as Lemma 6.10 below, and its proof is similar to [27, Claim 1].
For each leaf $L$, let $\gamma(L)$ be the Gaussian measure of the rectangle at $L$. Recall that $d(L)$ denotes the depth of $L$.
Lemma 6.10.
.
Finally, in Subsection 6.5, we bound the second moment $\mathbb{E}[\tau^2]$ for a suitable choice of parameters.
Lemma 6.11.
It holds that $\mathbb{E}[\tau^2] \le O(c^2)$ for a suitable choice of the parameters $p$, $B$, and $\lambda$.
Given Lemmas 6.10 and 6.11, the proof of Lemma 6.8 follows naturally.
Proof of Lemma 6.8.
With the parameters chosen in Lemma 6.11, the claimed bound follows by combining Equation 6.1 with Lemma 6.10 and Lemma 6.11. ∎
Remark 6.12.
Note that our proof of the level-two Fourier growth bound actually holds in a slightly more general setting, where Alice and Bob are allowed to send multiple bits during each original communication round. This can be viewed as balancing the length of the messages in step 3(b) with those in steps 3(a) and 3(c).
Since one can always convert a communication protocol into one with fewer rounds of longer messages, we obtain a slightly better level-two Fourier growth bound.
The conversion is done by Alice (resp., Bob) enumerating the next few bits from Bob (resp., Alice) and providing the corresponding responses for each possibility.
It is also possible to improve the polylogarithmic factor by varying the cleanup parameter $\lambda$ with depth. Since our focus is mostly on improving the polynomial dependence on $c$, where there is still room for improvement, we do not make an effort here to optimize the polylog terms.
6.3 Bounds on Step Sizes (Proof of Lemma 6.9)
Let us abbreviate $t := \tau_i$ and note that at time $t$ a new phase starts for Alice. By construction, this means that the current rectangle determined by $\mathcal{F}_t$ is 4-wise clean with parameter $\lambda$, and since Alice is in step 3(a) at the start of a new phase, $U_t$ is chosen to be the (normalized) component of $\theta \odot \beta_t$ that is orthogonal to the previous directions.
For each previous direction $U_{t'}$, let $\rho_{t'}$ be the length of $\theta \odot \beta_t$ along $U_{t'}$. Each $\rho_{t'}$ is $\mathcal{F}_t$-measurable (i.e., it is determined by $\mathcal{F}_t$). In this case, we have
(6.2)
Similar to the level-one proof, the components of $\alpha_{t+1}$ and $\alpha_t$ are roughly the same along any of the previous directions, so they almost cancel out, and the dominant quantity is the component in the direction $U_t$. This follows since, in all the previous steps, Alice has already fixed the quadratic form along each previous direction up to precision $2^{-p}$. Formally, by Fact 6.4, for any two inputs consistent with the transcript, the quadratic forms along all the previous directions agree up to $2^{-p}$. In particular, since $\alpha_{t+1}$ and $\alpha_t$ are the corresponding centers of mass, we have that
(6.3)
On the other hand, since $U_t$ is a unit direction, we have
(6.4)
Similarly, noting that $\theta$ is a sign matrix, we can bound
(6.5)
Expanding the square in Equation 6.2 and plugging these estimates into each of the terms gives
(6.6)
where the second line follows from Claim 6.7.
We now bound the term outside the expectation by the change in the center of mass, and the term inside the expectation by the fact that the set is 4-wise clean.
Term Outside the Expectation.
Recall that $U_t$ is chosen to be the (normalized) component of $\theta \odot \beta_t$ that is orthogonal to the span of the previous directions. Since $\theta \odot \beta_{\tau_{i-1}}$ lies in that span, $U_t$ is orthogonal to it. Hence
Since $U_t$ is a unit direction and $\theta$ is a sign matrix, this implies that
(6.7)
Term Inside the Expectation.
Recall that Alice is in step 3(a); she sends $x^\top U_t x$ with precision $2^{-p}$ at time $t$, and thus this quadratic form is fixed up to $2^{-p}$ for every point in the resulting set. Thus
(where the truncation error is at most $2^{-p}$ by Fact 6.4)
(6.8)
where the last line follows from the preceding estimates.
Final Bound.
Since $(\alpha_t)_t$ is a matrix-valued martingale and thus $\mathbb{E}[\alpha_{t+1} \mid \mathcal{F}_t] = \alpha_t$, we have
Then by Equation 6.8, we upper bound the right hand side by
Since $A_t$ is 4-wise clean with parameter $\lambda$, this can be bounded by $\lambda$:
(6.9)
Putting everything together, we have
by combining Equation 6.6, Equation 6.9, Equation 6.5, and Equation 6.7 in turn. This completes the proof of the first statement in the lemma.
For the moreover part, let us condition on the event where Alice speaks at time $t$ with $\tau_i < t < \tau_{i+1}$. Note that such $t$ must lie in the same phase of the protocol, where Alice is the only one speaking. So, Bob's center of mass does not change from time $\tau_i$ till $t$, i.e., $\beta_t = \beta_{\tau_i}$. Thus we have
(6.10)
Analogous to Equation 6.3, the components of Alice's center of mass along the previous directions are fixed up to precision $2^{-p}$. Thus, by Fact 6.4,
(6.11)
Furthermore, by construction, $\theta \odot \beta_{\tau_i}$ lies in the space spanned by the previous directions. Similar to the previous analysis, for each previous direction $U_{t'}$, let $\rho_{t'}$ be the length of $\theta \odot \beta_{\tau_i}$ along $U_{t'}$; then Equation 6.5 also holds here. Therefore, the moreover part follows by combining Equation 6.10, Equation 6.5, Equation 6.11, and Claim 6.7. ∎
6.4 Conversion to Second Moment Bounds of the Depth (Proof of Lemma 6.10)
Recall that $\gamma(L)$ denotes the Gaussian measure of the rectangle at a leaf $L$ and $d(L)$ its depth. The goal of this subsection is to prove Lemma 6.10.
We first note the following basic fact.
Fact 6.13.
and
Now we apply Theorem 3.1 to relate the left hand side of Lemma 6.10 to an entropy-type bound.
Lemma 6.14.
.
Proof.
Let $L$ be a fixed leaf and $d(L)$ be its depth. Note that this also fixes the rectangle and thus the centers of mass. Define the indicator function of the rectangle by
Then we have
(by Theorem 3.1)
Therefore, taking expectation over a random leaf $L$, by Fact 6.13, we have
∎
Now in the next lemma, we bound the right hand side of Lemma 6.14 in terms of the second moment of the depth, which immediately proves Lemma 6.10.
Lemma 6.15.
Assume that . Then, .
Proof.
By Claim 6.6 and the assumption, each message has length at most $p + O(\log(nB))$. We divide the analysis into two cases based on the measure of the leaf, and bound each case using that the relevant function is increasing in the appropriate range, the bound on the message length, and the assumptions on the parameters. Adding up the two estimates gives the desired bound. ∎
6.5 Second Moment Bounds for the Depth (Proof of Lemma 6.11)
The final ingredient is an estimate of the second moment $\mathbb{E}[\tau^2]$. This subsection is devoted to this goal and to proving Lemma 6.11.
For messages $m_1, \dots, m_t$, we define $\gamma(m_1, \dots, m_t)$ to be the Gaussian measure of the rectangle defined by the protocol after the messages $m_1, \dots, m_t$. Note that this definition is consistent with $\gamma(L)$ from Subsection 6.4 for a leaf $L$.
Lemma 6.16.
There exists a universal constant such that the following holds. Let $t < T$ be two arbitrary integers. Let $m_1, \dots, m_t$ be arbitrary messages of the first $t$ communication steps that occur with positive probability. Then
Proof.
Let $(x, y)$ be sampled from the truncated Gaussian measure conditioned on the first $t$ messages being $m_1, \dots, m_t$. Let $L$ be its corresponding leaf in $\Pi'$ and $d(L)$ be the depth of $L$. By Claim 6.7, $L$ always has finite depth. We extend $U_s = W_s = 0$ for all $s > d(L)$. Then define the relevant sum of squares of quadratic forms,
where the $U_s$'s and $W_s$'s depend only on $L$. (Note that $L$ specifies all the communication messages, which allows us to simulate the protocol and obtain each $U_s$ and $W_s$.) Equivalently, we can write this quantity as a sum
in which the first $t$ directions are fixed due to the conditioning.
Observe that, for any fixed $t$, the events induced by different continuations, conditioned on $m_1, \dots, m_t$, form a disjoint partition. Therefore, sampling $(x, y)$ conditioned on $m_1, \dots, m_t$ is equivalent to the following:
• first sample random messages $m_{t+1}, m_{t+2}, \dots$ conditioned on $m_1, \dots, m_t$,
• then sample $(x, y)$ conditioned on all the sampled messages.
Note that we can further expand the messages to a leaf as a full communication path, and obtain the following equivalent sampling process:
• Sample a random leaf $L$ conditioned on $m_1, \dots, m_t$.
• Sample $(x, y)$ conditioned on the rectangle defined by the first messages of $L$.
As a result, we have
Observe that there are at most $2c$ many steps 3(a) and 3(b) in $\Pi'$. This means that, if the leaf is deep enough, then from the $(t+1)$-st to the $T$-th communication steps most steps are cleanup steps (i.e., step 3(c)), each of which contributes at least $\lambda$ to the sum. Thus we can lower bound the quantity by
(6.12)
On the other hand, by Claim 6.7, there are at most $n^2$ non-zero $U_s$'s and at most $n^2$ non-zero $W_s$'s on each communication path. Thus
(6.13)
We now obtain another upper bound using Theorem 3.3. Extend the path by the next messages up to time $T$. (If it becomes a leaf before time $T$, we simply pad it with dummy messages.) Note that this fixes the $U_s$'s and $W_s$'s, so we may consider the quantity with the directions fixed by the extended path. We now continue the bound:
(by the definition above)
(6.14)
We now analyze the expression using Theorem 3.3. Since $U_s$ and $W_s$ cannot be non-zero simultaneously, we rearrange the matrices and assume the first few matrices are the only non-zero ones. Then
Note that the $U_s$'s (resp., $W_s$'s) satisfy the condition in Theorem 3.3. Let $C_0$ be the constant in Theorem 3.3 (our proof in Appendix B shows that a moderate absolute constant suffices). Hence, by Theorem 3.3 and the assumptions on the parameters, we obtain an exponential tail bound.
Thus, in this range, we have
(6.15)
We plug Equation 6.15 into Equation 6.14 and obtain
(6.16)
where the implied constant is another universal constant. Now we have
where the first summation can be bounded using Equation 6.13, the fact that the first $t$ messages are fixed and each message consists of a bounded number of bits, and the parameter assumptions,
and the second summation is bounded using Equation 6.16.
Then, combining with Equation 6.12, we obtain the desired bound under the stated assumptions on the parameters. ∎
Corollary 6.17.
Under the stated assumptions on the parameters, for each $t$, we have
Proof.
We prove the bound by induction on $t$. The base case is trivial. For the inductive case, let $m_1, \dots, m_t$ be the first $t$ communication messages. We bound two contributions separately.
For the first contribution, observe that if $t = 0$, then we are at the root of the protocol and the contribution vanishes. Otherwise, it is bounded using the fact that each communication message consists of a bounded number of bits, together with the parameter assumptions.
For the second contribution, applying Lemma 6.16 together with the parameter assumptions and the induction hypothesis gives the required bound.
Adding up the two contributions completes the induction. ∎
Given Corollary 6.17 and a suitable choice of the parameters, we now prove the second moment bound.
Proof of Lemma 6.11.
With the parameters chosen as in the lemma statement, by Fact 6.3, the probability that an input falls outside the box $[-B, B]^n$ is negligible. Therefore, the second moment of $\tau$ is bounded by combining Claim 6.7 and Corollary 6.17. ∎
7 Fourier Growth Reductions For General Gadgets
In this section, we show that Fourier growth bounds for communication protocols with general (constant-sized) gadgets can be reduced to bounds for the XOR-fiber, and vice versa. This implies that, for the study of Fourier growth, all such gadgets are essentially equivalent.
Let $m, n$ be two positive integers and let $g : \{\pm 1\}^m \times \{\pm 1\}^m \to \{\pm 1\}$ be a gadget. We use $\mathcal{U}_m$ (resp., $\mathcal{U}_{mn}$) to denote the uniform distribution over $\{\pm 1\}^m$ (resp., $\{\pm 1\}^{mn}$). We define the $g$-fiber of communication protocols similarly to the XOR-fiber:
Definition 7.1.
For any randomized two-party protocol $\mathcal{C}$, its $g$-fiber, denoted by $\mathcal{C}_g$, is defined by
$$\mathcal{C}_g(z) := \mathbb{E}\big[\,\mathcal{C}(x, y) \;\big|\; g(x_i, y_i) = z_i \text{ for all } i \in [n]\,\big],$$
where $(x, y) \sim \mathcal{U}_{mn} \times \mathcal{U}_{mn}$ and the expectation is also over the internal randomness of $\mathcal{C}$.
To compare Fourier growth bounds between gadgets, we write $L_{1,\ell}(c, g)$ for the best upper bound on the level-$\ell$ Fourier growth of the $g$-fiber of an arbitrary randomized communication protocol with at most $c$ bits of communication. Since randomized protocols are convex combinations of deterministic protocols of the same cost, using this notation, our main results Theorems 1.2 and 1.3 can be rephrased accordingly.
For any set $S \subseteq [m]$, define $\chi_S(x) := \prod_{j \in S} x_j$, and similarly on the other side. Similar to the standard Fourier representation of Boolean functions, the gadget $g$, which is a two-party function, also has a Fourier representation:
For convenience, we will assume $g$ satisfies the following assumption. It is easy to see that the XOR gadget satisfies it.
Assumption 7.2.
$\hat{g}(S, T) = 0$ if $S = \emptyset$ or $T = \emptyset$.
Remark 7.3.
This assumption is equivalent to the fact that, for any fixed input on Alice's side, the remaining function on Bob's side is balanced, and vice versa.
Even if $g$ does not satisfy the assumption, we can embed it inside a similar gadget $g'$ on one additional bit per player, where we XOR the last bit of Alice and the last bit of Bob into the old gadget applied to Alice's first $m$ bits and Bob's first $m$ bits, i.e., in $\pm 1$ notation, $g'(xa, yb) := g(x, y) \cdot a \cdot b$ for $a, b \in \{\pm 1\}$.
Then $g'$ satisfies the assumption and inherits most properties of $g$ sufficient for the study of communication complexity tasks.
Now, a protocol $\mathcal{C}$ is also a two-party function and thus admits a similar Fourier representation. We view an input in $\{\pm 1\}^{mn}$ as indexed by a tuple in $[n] \times [m]$. Therefore, any subset of the input indices on one side is uniquely identified with a tuple $\vec{S} = (S_1, \dots, S_n)$, where each $S_i \subseteq [m]$. Thus the Fourier coefficients of $\mathcal{C}$ can be written as
and the Fourier representation of $\mathcal{C}$ follows accordingly,
where $\chi_{\vec{S}}(x) = \prod_{i \in [n]} \chi_{S_i}(x_i)$ and similarly for the $y$-side.
Under this notation and Assumption 7.2, we can explicitly compute the Fourier coefficients of any $g$-fiber.
Fact 7.4.
Proof.
Observe that
Since $g$ is balanced by Assumption 7.2, every pair $(x, y)$ consistent with $z$ is sampled with the same probability under the conditional distribution. Thus we get
Now we expand $\mathcal{C}$ and $g$ in the Fourier basis and obtain
(by Assumption 7.2)
as desired. ∎
Now we present the reduction from the XOR-fiber to a general $g$-fiber.
Theorem 7.5.
Assume the gadget $g$ satisfies Assumption 7.2. Then
Proof.
Let $\mathcal{C}$ be an arbitrary protocol of cost at most $c$. Then for a fixed set $S$, by Fact 7.4 applied to the XOR gadget, we have
(7.1)
Let the sets $T_1, \dots, T_n$ attain the maximum. Since $g$ satisfies Assumption 7.2, we know these sets are non-empty.
Now define a different protocol $\mathcal{C}'$ as follows: after receiving her input, Alice computes $\chi_{T_i}(x_i)$ for each block $i$; Bob proceeds similarly upon receiving his input. Then they execute the protocol $\mathcal{C}$ on the resulting strings. Therefore, for inputs consistent in this manner, we have
Then by Equation 7.1 and Fact 7.4 applied to $\mathcal{C}'$ with gadget $g$, we have
Now, summing over all $S$ of size $\ell$, we have
(since $\mathcal{C}'$ has cost at most $c$)
Since $\mathcal{C}$ is arbitrary, this proves the first half of Theorem 7.5. To prove the second half, we use an averaging argument and Parseval's identity on $g$:
∎
Using a similar analysis, we also have a reduction from a general $g$-fiber to the XOR-fiber.
Theorem 7.6.
Assume the gadget $g$ satisfies Assumption 7.2. Then
Proof.
Let $\mathcal{C}$ be an arbitrary protocol of cost at most $c$. Then for a fixed set $S$, by Fact 7.4 applied to the gadget $g$ and using Assumption 7.2, we have
Therefore
Now define a distribution over tuples of subsets, with probability mass function defined in terms of the squared Fourier coefficients of $g$:
Then we can rewrite the quantity of interest as
(7.2)
Now we fix an arbitrary tuple $(T_1, \dots, T_n)$ sampled from this distribution. Note that these sets are non-empty by the definition of the distribution and Assumption 7.2. Then define a different protocol $\mathcal{C}'$ as follows: after receiving input $z$, Alice samples $x$ uniformly conditioned on $\chi_{T_i}(x_i) = z_i$ for all $i$; Bob samples $y$ similarly. Then they execute the protocol $\mathcal{C}$ on $x$ and $y$. Therefore, we have
By Fact 7.4 applied to $\mathcal{C}'$ and the XOR gadget, we have
Since $\mathcal{C}'$ has cost at most $c$, we have
Plugging back into Equation 7.2, we have
which proves the first half of Theorem 7.6 since $\mathcal{C}$ is arbitrary. To prove the second half, we use the Cauchy–Schwarz inequality and Parseval's identity on $g$:
∎
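For reference, the Cauchy–Schwarz-plus-Parseval step invoked at the end of both proofs can be written as follows (a generic rendering under our normalization; the exact constants in the paper may differ):

$$\sum_{\emptyset \neq S \subseteq [m]} |\hat{g}(S, T)| \;\le\; \sqrt{2^m \cdot \sum_{S \subseteq [m]} \hat{g}(S, T)^2} \;\le\; \sqrt{2^m},$$

where the first inequality is Cauchy–Schwarz over the at most $2^m$ subsets $S$, and the second uses Parseval's identity for the Boolean gadget $g$.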
As a corollary, to study the Fourier growth bounds, we can switch between gadgets conveniently, as long as the gadgets have small size.
Corollary 7.7.
Assume gadgets $g$ and $g'$ satisfy Assumption 7.2. Then
8 Directions Towards Further Improvements
In this section, we propose potential directions for further improving our level-two bounds. In Subsection 8.1, we show that better Fourier growth bounds can be obtained from strong lifting theorems in a black-box way; this relies on the Fourier growth reductions in Section 7. In Subsection 8.2, we examine the bottleneck in our analysis and identify the major obstacles within.
8.1 Better Lifting Theorems Imply Better Fourier Growth
Let $f$ be a Boolean function and let $g$ be a gadget. A lifting theorem connects the communication complexity of the composed function $f \circ g$ with the query complexity of $f$. Some lifting theorems show that a low-cost communication protocol can be simulated by a low-cost query algorithm.
To be more precise, let $\mathcal{C}$ be a randomized two-party protocol. Recalling Definition 7.1, the $g$-fiber of $\mathcal{C}$, denoted $\mathcal{C}_g$, is defined by
We say that $g$ satisfies a strong lifting theorem if, for every randomized protocol with small communication, there is a randomized decision tree of small depth that approximates $\mathcal{C}_g$ on each input with small error (see e.g., [26]).
Theorem 8.1.
Assume the gadget $g$ satisfies Assumption 7.2. Assume that for any randomized protocol with at most $c$ bits of communication, there exists a randomized decision tree of depth at most $d$ that approximates its $g$-fiber with pointwise error at most $\varepsilon$, i.e.,
Then, for any randomized protocol with at most $c$ bits of communication, its XOR-fiber has the corresponding level-$\ell$ Fourier growth bound.
As a simple corollary, we see that if the assumption of Theorem 8.1 holds with appropriate parameters and a polylogarithmic-sized gadget, then the level-two Fourier growth of the XOR-fiber of any randomized protocol of cost $c$ is bounded as desired.
We also remark that state-of-the-art lifting results hold with the gadget being either:
- •
-
•
The index function with , [26].161616For deterministic lifting, a better bound is known [37], but it doesn’t suffice for our reduction. In this case the largest Fourier coefficient squared is , which again yields a trivial bound in Theorem 8.1. Nonetheless, even a polynomial improvement on , say , would give new non-trivial bounds in Theorem 8.1 and in turn improves our lower bound on the XOR-lift of Forrelation.
Proof of Theorem 8.1.
Let be a randomized protocol of cost at most . Then by assumption, can be approximated up to error by a randomized decision tree of depth at most . Thus any Fourier coefficient of and differs by at most . Therefore by the level- Fourier growth bounds on randomized decision trees [64, 57], we have
Since is arbitrary, the claimed bound for follows from Theorem 7.5. ∎
8.2 Sums of Squares of Quadratic Forms for Pairwise Clean Sets
In our analysis for the level-two bound, we showed that one can transform a general protocol to a -wise clean protocol with parameter by adding additional cleanup steps in expectation. If one could show that with essentially the same number of steps, one could take , then we would obtain the optimal level-two bound of .
We recall that to bound the number of cleanup steps, we rely on a concentration inequality for sums of squares of orthonormal quadratic forms (Theorem 3.3), which says that if are matrices with zero diagonal and form an orthonormal set when viewed as dimensional vectors, then the random variable satisfies for any . Using this tail bound for and conditioning on where is an arbitrary subset of with Gaussian measure , we obtained a bound . This shows that there can be at most such quadratic forms ’s where the value can be larger than and hence, the reason we can only take . We note that the argument just described is for the non-adaptive setting, while in our case the ’s are also being chosen adaptively, so additional work is needed.
The next example shows that the aforementioned statement is tight even in the non-adaptive setting where the ’s are fixed: in particular, there is a set of large measure and such orthonormal quadratic forms where the above expectation after conditioning on is .
Example 8.2.
For , let for where denotes the matrix where only the entry is one. Note that the matrices form an orthonormal set and they all have a zero diagonal. Let . Then, the Gaussian measure but
Note that the set in the example above is not pairwise clean and for our application, one can get around it by first ensuring that the protocol is pairwise clean and then proceeding with the 4-wise cleanup process. Motivated by this, we speculate that when the set is pairwise clean, then the expected value of the sum of squares of orthonormal quadratic forms is much smaller unlike the example above. Assuming such a statement and combining it with our ideas for handling the adaptivity suggests a potential way of improving the level-two bounds.
References
- AA [18] Scott Aaronson and Andris Ambainis. Forrelation: A problem that optimally separates quantum from classical computing. SIAM J. Comput., 47(3):982–1038, 2018.
- Aar [10] Scott Aaronson. BQP and the polynomial hierarchy. In STOC, pages 141–150, 2010.
- ABK [23] Scott Aaronson, Harry Buhrman, and William Kretschmer. A qubit, a coin, and an advice string walk into a relational problem. arXiv preprint arXiv:2302.10332, 2023.
- Agr [20] Rohit Agrawal. Coin theorems and the fourier expansion. Chic. J. Theor. Comput. Sci., 2020, 2020.
- ALM [20] Radosław Adamczak, Rafał Latała, and Rafał Meller. Hanson–wright inequality in banach spaces. Annales de l'Institut Henri Poincaré, Probabilités et Statistiques, 56(4), nov 2020.
- BCW [98] Harry Buhrman, Richard Cleve, and Avi Wigderson. Quantum vs. classical communication and computation. In STOC, pages 63–68. ACM, 1998.
- BIJ+ [21] Jarosław Błasiok, Peter Ivanov, Yaonan Jin, Chin Ho Lee, Rocco A Servedio, and Emanuele Viola. Fourier growth of structured -polynomials and applications. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2021). Schloss Dagstuhl-Leibniz-Zentrum für Informatik, 2021.
- Bor [75] Christer Borell. The brunn-minkowski inequality in gauss space. Inventiones mathematicae, 30(2):207–216, 1975.
- BS [21] Nikhil Bansal and Makrand Sinha. k-forrelation optimally separates quantum and classical query complexity. In Proceedings of the 53rd Annual ACM SIGACT Symposium on Theory of Computing, pages 1303–1316, 2021.
- BTW [15] Eric Blais, Li-Yang Tan, and Andrew Wan. An inequality for the fourier spectrum of parity decision trees. CoRR, abs/1506.01055, 2015.
- BV [10] Joshua Brody and Elad Verbin. The coin problem and pseudorandomness for branching programs. In 2010 IEEE 51st Annual Symposium on Foundations of Computer Science, pages 30–39, 2010.
- CFK+ [19] Arkadev Chattopadhyay, Yuval Filmus, Sajin Koroth, Or Meir, and Toniann Pitassi. Query-to-communication lifting for bpp using inner product. In ICALP, 2019.
- CGR [14] Gil Cohen, Anat Ganor, and Ran Raz. Two sides of the coin problem. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2014). Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2014.
- CHHL [19] Eshan Chattopadhyay, Pooya Hatami, Kaave Hosseini, and Shachar Lovett. Pseudorandom generators from polarizing random walks. Theory Comput., 15:1–26, 2019.
- CHLT [18] Eshan Chattopadhyay, Pooya Hatami, Shachar Lovett, and Avishay Tal. Pseudorandom generators from the second fourier level and applications to ac0 with parity gates. In 10th Innovations in Theoretical Computer Science Conference (ITCS 2019). Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2018.
- CHRT [18] Eshan Chattopadhyay, Pooya Hatami, Omer Reingold, and Avishay Tal. Improved pseudorandomness for unordered branching programs through local monotonicity. In STOC, pages 363–375. ACM, 2018.
- CKLM [19] Arkadev Chattopadhyay, Michal Koucký, Bruno Loff, and Sagnik Mukhopadhyay. Simulation theorems via pseudo-random properties. Comput. Complex., 28(4):617–659, 2019.
- CR [12] Amit Chakrabarti and Oded Regev. An optimal lower bound on the communication complexity of gap-hamming-distance. SIAM J. Comput., 41(5):1299–1317, 2012.
- dRNV [16] Susanna F. de Rezende, Jakob Nordström, and Marc Vinyals. How limited interaction hinders real communication (and what it means for proof and circuit complexity). In FOCS, pages 295–304. IEEE Computer Society, 2016.
- EM [22] Ronen Eldan and Dana Moshkovitz. Reduction from non-unique games to boolean unique games. In 13th Innovations in Theoretical Computer Science Conference (ITCS 2022). Schloss Dagstuhl-Leibniz-Zentrum für Informatik, 2022.
- Gav [20] Dmitry Gavinsky. Entangled simultaneity versus classical interactivity in communication complexity. IEEE Trans. Inf. Theory, 66(7):4641–4651, 2020.
- GKPW [19] Mika Göös, Pritish Kamath, Toniann Pitassi, and Thomas Watson. Query-to-communication lifting for P NP. Comput. Complex., 28(1):113–144, 2019.
- GLM+ [15] Mika Göös, Shachar Lovett, Raghu Meka, Thomas Watson, and David Zuckerman. Rectangles are nonnegative juntas. In STOC, pages 257–266. ACM, 2015.
- Göö [15] Mika Göös. Lower bounds for clique vs. independent set. In FOCS, pages 1066–1076. IEEE Computer Society, 2015.
- GPW [15] Mika Göös, Toniann Pitassi, and Thomas Watson. Deterministic communication vs. partition number. In FOCS, pages 1077–1088. IEEE Computer Society, 2015.
- GPW [20] Mika Göös, Toniann Pitassi, and Thomas Watson. Query-to-communication lifting for BPP. SIAM J. Comput., 49(4), 2020.
- GRT [21] Uma Girish, Ran Raz, and Avishay Tal. Quantum versus randomized communication complexity, with efficient players. In ITCS, volume 185 of LIPIcs, pages 54:1–54:20, 2021. Presented in QIP, 2020 as a contributed talk.
- GRZ [21] Uma Girish, Ran Raz, and Wei Zhan. Lower bounds for xor of forrelations. Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, 2021.
- GSTW [16] Parikshit Gopalan, Rocco A. Servedio, Avishay Tal, and Avi Wigderson. Degree and sensitivity: tails of two distributions. CoRR, abs/1604.07432, 2016.
- GTW [21] Uma Girish, Avishay Tal, and Kewen Wu. Fourier growth of parity decision trees. In 36th Computational Complexity Conference (CCC 2021). Schloss Dagstuhl-Leibniz-Zentrum für Informatik, 2021.
- HHL [18] Hamed Hatami, Kaave Hosseini, and Shachar Lovett. Structure of protocols for XOR functions. SIAM J. Comput., 47(1):208–217, 2018.
- INW [94] Russell Impagliazzo, Noam Nisan, and Avi Wigderson. Pseudorandomness for network algorithms. In STOC, pages 356–364. ACM, 1994.
- IRR+ [21] Siddharth Iyer, Anup Rao, Victor Reis, Thomas Rothvoss, and Amir Yehudayoff. Tight bounds on the fourier growth of bounded functions on the hypercube. arXiv preprint arXiv:2107.06309, 2021.
- KKL [88] Jeff Kahn, Gil Kalai, and Nathan Linial. The influence of variables on boolean functions (extended abstract). In 29th Annual Symposium on Foundations of Computer Science, White Plains, New York, USA, 24-26 October 1988, pages 68–80. IEEE Computer Society, 1988.
- KMR [17] Pravesh K. Kothari, Raghu Meka, and Prasad Raghavendra. Approximating rectangles by juntas and weakly-exponential lower bounds for LP relaxations of csps. In STOC, pages 590–603. ACM, 2017.
- Lee [19] Chin Ho Lee. Fourier bounds and pseudorandom generators for product tests. In Amir Shpilka, editor, 34th Computational Complexity Conference, CCC 2019, July 18-20, 2019, New Brunswick, NJ, USA, volume 137 of LIPIcs, pages 7:1–7:25. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2019.
- LMM+ [22] Shachar Lovett, Raghu Meka, Ian Mertz, Toniann Pitassi, and Jiapeng Zhang. Lifting with sunflowers. In 13th Innovations in Theoretical Computer Science Conference (ITCS 2022). Schloss Dagstuhl-Leibniz-Zentrum für Informatik, 2022.
- LPV [22] Chin Ho Lee, Edward Pyne, and Salil P. Vadhan. Fourier growth of regular branching programs. In Amit Chakrabarti and Chaitanya Swamy, editors, Approximation, Randomization, and Combinatorial Optimization. Algorithms DBLP:conf/approx/LeePV22and Techniques, APPROX/RANDOM 2022, September 19-21, 2022, University of Illinois, Urbana-Champaign, USA (Virtual Conference), volume 245 of LIPIcs, pages 2:1–2:21. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2022.
- LRS [15] James R. Lee, Prasad Raghavendra, and David Steurer. Lower bounds on the size of semidefinite programming relaxations. In STOC, pages 567–576. ACM, 2015.
- LSS+ [19] Nutan Limaye, Karteek Sreenivasaiah, Srikanth Srinivasan, Utkarsh Tripathi, and S Venkitesh. A fixed-depth size-hierarchy theorem for via the coin problem. In Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing, pages 442–453, 2019.
- LV [18] Chin Ho Lee and Emanuele Viola. The coin problem for product tests. ACM Transactions on Computation Theory (TOCT), 10(3):1–10, 2018.
- Man [95] Yishay Mansour. An learning algorithm for DNF under the uniform distribution. J. Comput. Syst. Sci., 50(3):543–550, 1995. Appeared in COLT, 1992.
- MO [10] Ashley Montanaro and Tobias Osborne. On the communication complexity of xor functions, 2010.
- O’D [14] Ryan O’Donnell. Analysis of Boolean Functions. Cambridge University Press, 2014.
- OS [07] Ryan O’Donnell and Rocco A. Servedio. Learning monotone decision trees in polynomial time. SIAM Journal on Computing, 37(3):827–844, 2007.
- Raz [87] Alexander A Razborov. Lower bounds on the size of bounded depth circuits over a complete basis with logical addition. Mathematical Notes of the Academy of Sciences of the USSR, 41(4):333–338, 1987.
- Raz [95] Ran Raz. Fourier analysis for probabilistic communication complexity. Comput. Complex., 5(3/4):205–221, 1995.
- RM [99] Ran Raz and Pierre McKenzie. Separation of the monotone NC hierarchy. Comb., 19(3):403–435, 1999.
- RPRC [16] Robert Robere, Toniann Pitassi, Benjamin Rossman, and Stephen A. Cook. Exponential lower bounds for monotone span programs. In FOCS, pages 406–415. IEEE Computer Society, 2016.
- RS [10] Alexander A. Razborov and Alexander A. Sherstov. The sign-rank of ac. SIAM J. Comput., 39(5):1833–1855, 2010.
- RSV [13] Omer Reingold, Thomas Steinke, and Salil P. Vadhan. Pseudorandomness for regular branching programs via Fourier analysis. In APPROX-RANDOM, pages 655–670. Springer, 2013.
- RT [19] Ran Raz and Avishay Tal. Oracle separation of BQP and PH. In STOC, pages 13–23. ACM, 2019. Presented in QIP, 2019 as a plenary talk. Accepted to the Journal of the ACM.
- RY [22] Anup Rao and Amir Yehudayoff. Anticoncentration and the exact gap-hamming problem. SIAM Journal on Discrete Mathematics, 36(2):1071–1092, 2022.
- She [11] Alexander A. Sherstov. The pattern matrix method. SIAM J. Comput., 40(6):1969–2000, 2011.
- She [12] Alexander A. Sherstov. The communication complexity of gap hamming distance. Theory Comput., 8(1):197–208, 2012.
- Smo [87] Roman Smolensky. Algebraic methods in the theory of lower bounds for boolean circuit complexity. In Alfred V. Aho, editor, Proceedings of the 19th Annual ACM Symposium on Theory of Computing, 1987, New York, New York, USA, pages 77–82. ACM, 1987.
- SSW [21] Alexander A Sherstov, Andrey A Storozhenko, and Pei Wu. An optimal separation of randomized and quantum query complexity. In Proceedings of the 53rd Annual ACM SIGACT Symposium on Theory of Computing, pages 1289–1302, 2021.
- ST [78] Vladimir N Sudakov and Boris S Tsirel’son. Extremal properties of half-spaces for spherically invariant measures. Journal of Soviet Mathematics, 9(1):9–18, 1978.
- SVW [17] Thomas Steinke, Salil P. Vadhan, and Andrew Wan. Pseudorandomness and Fourier-growth bounds for width-3 branching programs. Theory of Computing, 13(1):1–50, 2017. Appeared in APPROX-RANDOM, 2014.
- SZ [08] Yaoyun Shi and Zhiqiang Zhang. Communication complexities of xor functions. arXiv preprint arXiv:0808.1762, 2008.
- SZ [09] Yaoyun Shi and Yufan Zhu. Quantum communication complexity of block-composed functions. Quantum Inf. Comput., 9(5&6):444–460, 2009.
- Tal [96] Michel Talagrand. How much are increasing sets positively correlated? Comb., 16(2):243–258, 1996.
- Tal [17] Avishay Tal. Tight bounds on the Fourier spectrum of AC0. In Computational Complexity Conference, volume 79 of LIPIcs, pages 15:1–15:31. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik, 2017.
- Tal [20] Avishay Tal. Towards optimal separations between quantum and randomized query complexities. In FOCS, pages 228–239. IEEE, 2020.
- TWXZ [13] Hing Yin Tsang, Chung Hoi Wong, Ning Xie, and Shengyu Zhang. Fourier sparsity, spectral norm, and the log-rank conjecture. In 2013 IEEE 54th Annual Symposium on Foundations of Computer Science, pages 658–667, 2013.
- Ver [18] Roman Vershynin. High-dimensional probability: An introduction with applications in data science, volume 47. Cambridge university press, 2018.
- Vid [12] Thomas Vidick. A concentration inequality for the overlap of a vector on a large set, with application to the communication complexity of the gap-hamming-distance problem. Chic. J. Theor. Comput. Sci., 2012, 2012.
- Wu [22] Xinyu Wu. A stochastic calculus approach to the oracle separation of BQP and PH. Theory Comput., 18:1–11, 2022.
- WYY [17] Xiaodi Wu, Penghui Yao, and Henry S. Yuen. Raz-mckenzie simulation with the inner product gadget. Electron. Colloquium Comput. Complex., 24:10, 2017.
- Zha [14] Shengyu Zhang. Efficient quantum protocols for xor functions. In Proceedings of the twenty-fifth annual ACM-SIAM symposium on Discrete algorithms, pages 1878–1885. SIAM, 2014.
Appendix A Gap-Hamming Lower Bounds
As an immediate consequence of Theorem 1.5, we can derive optimal lower bounds against the Gap-Hamming problem as in Theorem 1.6.
Proof of Theorem 1.6.
Set . Fix the randomness to be any and let refer to the deterministic protocol with randomness fixed to . Suppose for a sufficiently small constant , we apply Theorem 1.5 on as well as , and apply triangle inequality to conclude that
Let be the distribution of induced by sampling and and letting , similarly define but with . We now expand in terms of , take an expectation over and apply triangle inequality to conclude that
(A.1) |
Hoeffding’s inequality implies that for , we have
This implies that a random is a yes instance of the Gap-Hamming problem with probability larger than . Let denote conditioned on Yes instances of the Gap-Hamming problem. Similarly define to be conditioned on No instances of the Gap-Hamming problem. Since has outputs in , we have
and
This, along with Equation A.1 and triangle inequality, implies that
However, this contradicts the assumption that the protocol solves the Gap-Hamming problem with advantage at least . ∎
Appendix B Concentration for Sum of Squares of Quadratic Forms
Here we prove Theorem 3.3. While it follows from [5, Theorem 6] which is a Banach space-valued version of the Hanson-Wright inequality, in our setting a weaker statement suffices, for which we give a self-contained proof following [5].
For any integer , we use to denote the unit Euclidean ball in . For any two sets , we define . For any set and any number , we define . Let be the cumulative distribution function of the standard Gaussian distribution, i.e., .
Theorem B.1 (Gaussian Isoperimetric Inequality).
Let be a measurable set and assume for some . Then for any , we have .
In particular, if , then we can pick in Theorem B.1 and have
(B.1) |
Now we are ready to prove Theorem 3.3.
Proof of Theorem 3.3.
Note that the bound is trivial when . Thus from now on we assume without loss of generality .
For each , let . We first write as a squared Euclidean norm of a vector:
-
•
For , we view as a length- row vector.
-
•
Let be a matrix where the -th row is .
Therefore we have
(B.2) |
where is the standard tensor product and the second equality follows since each has zero diagonal.
Define , , and . Let , , and be their mean. Define the set
By Markov’s inequality and union bound, we have the Gaussian measure of is . Then by Equation B.1, we have
(B.3) |
Now for an arbitrary , we write where and . Then
where . This, together with Equation B.2 and Equation B.3, implies
(B.4) |
Now we calculate in the following claim, the proof of which will be presented later.
Claim B.2.
, , and .
Finally we present the missing proof of Claim B.2.
Proof of Claim B.2.
First we observe that rows of are unit vectors, therefore
(B.5) |
In addition, rows of are orthogonal to each other, therefore the operator norm of is
(B.6) |
We index the columns of by and let the column vectors of be . Since rows of are flattened matrices with zero diagonal, we have
(B.7) |
Now we bound separately.
Bounding .
Observe that
(by convexity) | ||||
(by Equation B.7) | ||||
(by Equation B.5) |
Bounding and .
Fix an arbitrary and we first simplify . For each , define vector and let be the matrix with ’s as column vectors. Then
(B.8) |
Now we bound :
(by convexity) | ||||
(by Equation B.8) | ||||
(by Equation B.5) |
Similar argument works for .
Bounding .
Note that for any , we have . Thus, by Equation B.6, we have
∎ |