Lower Bounds for Set-Multilinear Branching Programs
Abstract
In this paper, we prove super-polynomial lower bounds for the model of sum of ordered set-multilinear algebraic branching programs, each with a possibly different ordering (). Specifically, we give an explicit -variate polynomial of degree such that any computing it must have size for as low as . Notably, this constitutes the first such lower bound in the low degree regime. Moreover, for , we demonstrate an exponential lower bound. This result generalizes the seminal work of Nisan (STOC, 1991), which proved an exponential lower bound for a single ordered set-multilinear ABP.
The significance of our lower bounds is underscored by the recent work of Bhargav, Dwivedi, and Saxena (to appear in TAMC, 2024), which showed that super-polynomial lower bounds against a sum of ordered set-multilinear branching programs – for a polynomial of sufficiently low degree – would imply super-polynomial lower bounds against general ABPs, thereby resolving Valiant’s longstanding conjecture that the permanent polynomial can not be computed efficiently by ABPs. More precisely, their work shows that if one could obtain such lower bounds when the degree is bounded by , then it would imply super-polynomial lower bounds against general ABPs.
Our results strengthen the works of Arvind & Raja (Chic. J. Theor. Comput. Sci., 2016) and Bhargav, Dwivedi & Saxena (to appear in TAMC, 2024), as well as the works of Ramya & Rao (Theor. Comput. Sci., 2020) and Ghoshal & Rao (International Computer Science Symposium in Russia, 2021), each of which established lower bounds for related or restricted versions of this model. They also strongly answer a question from the former two, which asked to prove super-polynomial lower bounds for general .
1 Introduction
1.1 Background on Algberaic Complexity
In his seminal work ([Val79]) in 1979, Valiant proposed an algebraic framework to study the computational complexity of computing polynomials. Algebraic Complexity Theory is this study of the complexity of computational problems which can be described as computing a multivariate polynomial over some elements lying in a fixed field . Several fundamental computational tasks such as computing the determinant, permanent, matrix product, etc., can be represented using this framework. The natural computational models that we investigate in this setting are models such as algebraic circuits, algebraic branching programs, and algebraic formulas.
An algebraic circuit over a field for a multivariate polynomial is a directed acyclic graph (DAG) whose internal vertices (called gates) are labeled as either (sum) or (product), and leaves (vertices of in-degree zero) are labeled by the variables or constants from . A special output gate (the root of the DAG) represents the polynomial . If the DAG happens to be a tree, such a resulting circuit is called an algebraic formula. The size of a circuit or formula is the number of nodes in the DAG. We also consider the product-depth of the circuit, which is the maximum number of product gates on a root-to-leaf path. The class (respectively, ) is then defined to be the collection of all polynomials having at most polynomially large degree which can be computed by polynomial-sized circuits (respectively, formulas).
The class is synonymous to what we understand as efficiently computable polynomials. The class , whose definition is similar to the boolean class , is in some sense a notion of what we deem as explicit. Much like the problem of proving circuit size lower bounds for explicit boolean functions, the problem of proving them for explicit polynomials (i.e., showing ) has also remained elusive for many decades. However, because the latter only deals with formal symbolic computation as opposed to modelling semantic truth-table constraints, it is widely believed to be easier to resolve than its boolean counterpart. In fact, it is even known to be a pre-requisite to the conjecture in the non-uniform setting ([Bür00]).
An algebraic branching program (ABP) is a layered DAG with two special nodes in it: a start-node and an end-node. All edges of the ABP go from layer to layer for some (say start-node is the unique node in layer and end-node is the unique node in the last layer) and are labeled by a linear polynomial. Every directed path from start-node to end-node computes the monomial , which is the product of all labels on the path . The ABP computes the polynomial , where the sum is over all paths from start-node to end-node. Its size is simply the number of nodes in the DAG, its depth is the length of the longest path from the start-node to the end-node, and width is the maximum number of nodes in any layer. The class is then defined to be the collection of all polynomials (with polynomially-bounded degree) which can be computed by polynomial-sized branching programs. ABPs are known to be of intermediate complexity between formulas and circuits; in other words, we know the inclusions .
It is conjectured that all of these inclusions are strict, and resolving any of these conjectures would represent a dramatic advancement in algebraic complexity theory, and even more broadly, in circuit complexity overall. Valiant’s original hypothesis in [Val79] pertains to showing a super-polynomial separation between the complexity of computing the determinant and the permanent polynomials. This is known to be equivalent to the conjecture, i.e., showing super-polynomial size lower bounds against ABPs computing explicit polynomials. At present, the best known lower bound against ABPs is only quadratic ([CKSV22]), and it appears as though we are quite distant from addressing this conjecture. On the other hand, as we now elaborate, while not directly improving upon this quadratic bound, this paper makes significant progress towards a different line of attack aimed at resolving Valiant’s conjecture.
1.2 Set-Multilinearity: A Key Syntactic Restriction
One key advantage that algebraic models offer over their boolean counterparts is that of syntactic restrictions. A recurring theme in algebraic complexity theory is to first efficiently convert general models of computation (such as circuits or formulas) to special kinds of syntactically-restricted models, show strong lower bounds against these restricted models, and then recover non-trivial lower bounds against the original general models owing to the efficiency of this conversion. This phenomenon is termed hardness escalation. In this subsection, we describe one crucial example of a syntactic restriction in detail, that of set-multilinearity.
A polynomial is said to be homogeneous if each monomial has the same total degree and multilinear if every variable occurs at most once in any monomial. Now, suppose that the underlying variable set is partitioned into sets . Then the polynomial is said to be set-multilinear with respect to this variable partition if each monomial in has exactly one variable from each set. Note that a set-multilinear polynomial is both multilinear and homogeneous, and has degree precisely if it is set-multilinear over sets. Next, we define different models of computation corresponding to these variants of polynomials classes. An algebraic formula/branching program/circuit is set-multilinear with respect to a variable partition if each internal node in the formula/branching program/circuit computes a set-multilinear polynomial.555Of course, a non-root node need not be set-multilinear with respect to the entire variable partition. Nevertheless, here we demand that it must be set-multilinear with respect to some subset of the collection . Multilinear and homogeneous formulas/branching programs/circuits are defined analogously.
We now describe several important hardness escalation results, each reducing general models to corresponding set-multilinear models.
Constant depth circuits.
The recent celebrated breakthrough work of Limaye, Srinivasan, and Tavenas ([LST21]) establishes super-polynomial lower bounds for general algebraic circuits for all constant-depths, a problem that was open for many decades. In order to show this, it is first shown that general low-depth algebraic formulas can be converted to set-multilinear algebraic formulas of low depth as well, and without much of a blow-up in size (as long as the degree is small). Subsequently, strong lower bounds are established for low-depth set-multilinear circuits (of small enough degree), which when combined with the first step yields the desired lower bound for general constant-depth circuits.
Formulas.
Raz [Raz13] showed that if an -variate set-multilinear polynomial of degree has an algebraic formula of size , then it also has a set-multilinear formula of size . In particular, for a set-multilinear polynomial of degree , it follows that has a formula of size if and only if has a set-multilinear formula of size . Thus, having set-multilinear formula size lower bounds for such a low degree would imply super-polynomial lower bounds for general formulas. A recent line of work by Kush and Saraf ([KS22, KS23]) can be viewed as an attempt to prove general formula lower bounds via this route.
Algebraic Branching Programs.
In fact, in the context of ABPs as well, the very recent work of Bhargav, Dwivedi, and Saxena ([BDS24]) reduces the problem of showing lower bounds against general ABPs to proving lower bounds against sums of ordered set-multlinear ABPs (again, as long as the degree is small enough). Ordered set-multilinear ABPs are, in fact, historically well-studied models and extremely well-understood. However, despite their apparent simplicity, the work [BDS24] implies that understanding their sums – a model that is far less studied – is at the forefront of understanding Valiant’s conjecture. We state their result formally as Theorem 1.5 in Section 1.3.
First however, as this is also the main model considered in this paper, we begin by formally defining ordered set-multilinear ABPs and outlining their importance.
Definition 1.1 (Ordered smABP).
Given a variable partition , we say that a set-multilinear branching program of depth is said to be ordered with respect to an ordering (or permutation) if for each , all edges of the ABP from layer to layer are labeled using a linear form over the variables in . It is simply said to be ordered if there exists an ordering such that it is ordered with respect to .
At this point, it is essential to take note of the terminology in this context: in this paper, a general (or “unordered”) set-multilinear branching program refers to an ABP for which each internal node computes a polynomial that is set-multilinear with respect to some subset of the global partition, whereas an ordered set-multilinear branching program is more specialized and has the property that any two nodes in the same layer compute polynomials that are set-multilinear with respect to the same partition.
Remark 1.2.
This notion of ordered set-multilinear branching programs turns out to be equivalent to the more commonly used notions of (i) “read-once oblivious algebraic branching programs (ROABPs)”, as well as (ii) “non-commutative algebraic branching programs” (see, for example, [FS13]). This relationship, especially with the former model, is described in more detail later in Section 1.4.
Definition 1.3 ().
Given a polynomial that is set-multilinear with respect to the variable partition , we say that is a computing if indeed , and each is an ordered set-multilinear branching program i.e., each is ordered with respect to some . We call (i.e., the number of summands in a ) its support size and define its max-width and total-width to be the maximum over the width of each and the sum of the width of each , respectively.
We have known exponential width lower bounds against a single ordered set-multilinear ABP since the foundational work of Nisan. In [Nis91], he showed that there are explicit polynomials (in fact, in ) which require any ordered set-multilinear ABP computing them to be of exponentially large width. Viewed differently, this work even shows that in the non-commutative setting, 666We briefly explain the connection between ordered set-multilinear ABPs and non-commutative computation in Section 1.4.. More crucially however, this work introduced a powerful technique – a notion known as the partial derivative method – that has been instrumental in the bulk of the major advancements in algebraic complexity theory over the past three decades (such as [NW97, Raz06, RY09, Kay12, KST16, KLSS17, KS17, LST21, TLS22], see also [SY10, Sap15]).
Despite the considerable development of the partial derivative technique over the course of these works (and many more) for proving strong lower bounds against various algebraic models, relatively little is known about a general sum of ordered set-multilinear ABPs – a simple and direct generalization of the original model considered by Nisan. There is some progress in the literature towards this goal but which still requires additional structural restrictions on either the max-width or the support size or the size of each part in the variable partition. The work [AR16] of Arvind and Raja shows that any of support size computing the permanent polynomial requires max-width (and therefore, total-width) at least . Note that for this bound to be super-polynomial, the support size needs to be heavily restricted i.e., must be sub-linear. On the other hand, the work [BDS24] also shows a super-polynomial lower bound in this context: it implies that no of polynomially-bounded total-width can compute the iterated matrix multiplication (IMM) polynomial. However, their work requires the additional assumption that the max-width of such an is , that is sub-polynomial in the number of variables.
Apart from these, Ramya and Rao ([RR20]) use the partial derivative method to show an exponential lower bound against the related model of sum of ROABPs in the multilinear setting, as well as some other structured multilinear ABPs. Their lower bounds are for a multilinear polynomial that is computable by a small multilinear circuit. Ghoshal and Rao ([GR21]) partially extend their work by proving an exponential lower bound, for a polynomial that is computable even by a small multilinear ABP, against sums of ROABPs that have polynomially bounded width. Notably, these results can be viewed as lower bounds against the model where each variable set in the variable partition has size (that is, the total number of variables is ). This is because a multilinear polynomial and any multilinear model computing it (such as circuit, formula, or branching program) can be converted, in a generic manner, to a set-multilinear polynomial and the corresponding set-multilinear model respectively, with each variable set having size (also see Section 1.5 for a discussion). However, from the perspective of hardness escalation of [BDS24] that is described above – and which is indeed the focus of our work – the setting of that is far more interesting is when it is allowed to be considerably smaller than . More precisely, the framework of [BDS24] requires (stated formally as Theorem 1.5 below). A detailed discussion about the results in [RR20], [GR21] and how they compare with our work can be found in Section 1.5.
1.3 Our Results
Our main result is in this paper is a super-polynomial lower bound against an unrestricted sum of ordered set-multilinear branching programs, for a hard polynomial with “small” degree. We first state this result formally below, and then subsequently explain the connection with the hardness escalation result of [BDS24] that is alluded to in the previous subsections.
Theorem 1.4 (“Low”-Degree Lower Bounds).
Let be growing parameters satisfying . There is a -variate degree set-multilinear polynomial in such that cannot be computed by a of total-width .
Next, we formally state the aforementioned hardness escalation result of [BDS24]. In words, in order to show , it suffices to show lower bounds for any computing a polynomial whose degree is at most about logarithmic in the number of variables. Towards this goal, our main result above (Theorem 1.4) shows a super-polynomial lower bound for any computing an explicit set-multilinear polynomial, whose degree is barely super-logarithmic in the number of variables. In this sense, it approaches the resolution of Valiant’s conjecture.
Theorem 1.5 (Hardness Escalation of [BDS24]).
Let be growing parameters with . Let be a -variate degree set-multilinear polynomial in (respectively, ). If cannot be computed by a of total-width , then (respectively, ).
Next, we also give an explicit set-multilinear polynomial (with polynomially-large degree) such that any computing it must require exponential total-width. This strongly answers a question left open in both [AR16] and [BDS24].
Theorem 1.6 (Exponential Lower Bounds for ).
There is a set-multilinear polynomial in , in variables and of degree , such that any computing requires total-width .
Theorem 1.6 and Theorem 1.4 are also true when (as defined in Section 2.4) is replaced by the appropriate Nisan-Wigderson polynomial (as defined in Section 2.3), which is known to be in . In fact, we first indeed established them for the Nisan-Wigderson polynomial, and then used some of the ideas presented in a recent work by Kush and Saraf ([KS23]) to make the hard polynomial lie in . We also acknowledge that an exponential lower bound – with weaker quantitative parameters – for the related model of multilinear ROABPs was obtained in [RR20]. For a comparison of this model with the model, see Sections 1.4 and 1.5.
With additional effort, and building upon the machinery777This is explained in more detail in Section 1.6. of [KS23] (which, in turn, uses the techniques developed in [DMPY12]), we can almost recover the same lower bounds as in Theorem 1.6 and Theorem 1.4 for a set-multilinear polynomial even in . We preferred to first state Theorem 1.6 and Theorem 1.4 in the manner above because (i) the proof is less intricate and in fact, even serves as a prelude to the proof of the latter, and (ii) to draw a direct comparison and contrast with the hardness escalation statement (Theorem 1.5). We now state these results for when the hard polynomial is the polynomial and then describe two intriguing consequences.
Theorem 1.6’.
There is a fixed constant and a set-multilinear polynomial in , in variables and of degree , such that any computing requires total-width .
Theorem 1.4’.
Let be growing parameters satisfying . There is a -variate, degree set-multilinear polynomial in such that cannot be computed by a of total-width .
The first intriguing consequence of proving the statements above is that we are able to show that the ABP set-multilinearization process given in [BDS24] is nearly tight, as is known to have a small set-multilinear branching program and yet, any computing it must have large total-width. To make this point effectively, we first state the following key ingredient in the proof of Theorem 1.5, and subsequently state our tightness result.
Lemma 1.7 (ABP Set-Multilinearization in [BDS24]).
Let be a polynomial of degree that is set-multilinear with respect to the partition where for all . If can be computed by an ABP of size , then it can also be computed by a of max-width and total-width .
Theorem 1.8 (Near-Tightness of ABP Set-Multilinearization).
For large enough integers , there is a polynomial which is set-multilinear over the variable partition with each , and such that:
-
•
it has a branching program of size ,
-
•
but any of max-width computing requires total-width .
The second intriguing consequence is the fact that Theorem 1.8 can also be viewed as an exponential separation between the model of (general) small-width set-multilinear branching programs and the model of sums of small-width ordered set-multilinear branching programs. Moreover, we can improve this bound much further in the case of a single ordered set-multilinear branching program. More precisely, in Theorem 1.9 below, we answer a question posed in [KS23] about the relative strength of an unordered and (a single) ordered set-multilinear branching program, by obtaining a near-optimal separation. A priori, as is shown in [KS23] and as mentioned earlier in the introduction, if these two models coincided (i.e., if a general set-multilinear ABP could be simulated by a small and ordered one), then it would have led to super-polynomial lower bounds for general algebraic formulas.
Theorem 1.9 (Near-Optimal Separation between Ordered and Unordered smABPs).
There is a polynomial which is set-multilinear over the variable partition with each , and such that:
-
•
it has a set-multilinear branching program of size ,
-
•
but any ordered set-multilinear branching program computing requires width .
Note that has at most monomials and so, it trivially has an ordered set-multilinear ABP of width . Therefore, the lower bound above is essentially optimal.
1.4 The ROABP Perspective
One can also view all of our results described in Section 1.3 through the lens of another well-studied model in algebraic complexity theory, namely read-once oblivious algebraic branching programs (ROABPs).
Definition 1.10 (ROABP).
For integers and a permutation , an ABP over the variables is said be a read-once oblivious algebraic branching program (ROABP) in the order of individual degree if for each , all edges from layer to are labelled by univariate polynomials in of degree at most .
ROABPs were first introduced in this form by Forbes and Shpilka in [FS13], where it is also noted that proving lower bounds against ordered set-multilinear ABPs (as in Definition 1.1) is equivalent to proving lower bounds against ROABPs as well as non-commutative ABPs.
Suppose is a set-multilinear polynomial with respect to with . Then we can define an associated polynomial as follows.
Now let us assume that can be computed by an ROABP of size that is ordered with respect to . Then a set-multilinear ABP ordered with respect to can be constructed using it, by simply replacing by and erasing any degree zero components on each edge. It is easy to check that this computes and we can use the lower bound against ordered set-multilinear ABPs for to prove a lower bound against ROABPs for . Conversely, given , we can define with by by replacing with . We could then use an ordered set-multilinear ABP computing to construct an ROABP (in the same order) computing by using the inverse transformation, thereby proving that lower bounds against ROABPs imply lower bounds against ordered set-multilinear ABPs. Furthermore, the computation that an ROABP (or an ordered set-multilinear ABP) performs can be seen to be non-commutative. This is because the variables (or linear forms) along a path get multiplied in the same order as that of the ROABP (or ordered set-multilinear ABP).
As a consequence, exponential lower bounds follow for a single ROABP from the work of Nisan ([Nis91]), and also from later works ([Jan08, KNS20]). Using the transformation described above, our lower bounds (Theorems 1.6 and 1.6’) can also be viewed as exponential lower bounds for the model of sum of ROABPs. The work of Ramya & Rao [RR20] also prove (weaker) exponential lower bounds against this model for a multilinear polynomial computable by multilinear circuits. In a follow-up work, Ghoshal & Rao [GR21] prove an exponential lower bound against sums of ROABPs with the additional restriction that the summand ROABPs have pollynomially-bounded width for a mulilinear polynomial computable by multlinear ABPs. On the other hand, the works of Arvind & Raja ([AR16]) and Bhargav, Dwivedi & Saxena ([BDS24] provide lower bounds in certain restricted versions of this model. Along with these, the work of Anderson, Forbes, Saptharishi, Shpilka, and Volk ([AFS+18]) also implies an exponential lower bound for a restricted version (for the sum of ROABPs when ).
Finally, we note that ROABPs have been studied extensively in the context of another central problem in algebraic complexity theory: that of polynomial identity testing (PIT). The PIT question for a general algebraic model is the following: Given access to an -variate polynomial of degree at most that can be computed in the model of (an appropriate measure of) complexity at most , determine whether in time. When one is given access to the model computing explicitly, this flavour of PIT is called white-box PIT, and when one is merely provided query access to , it is called black-box PIT.
The solution to the PIT problem for ROABPs in the white-box setting follows from a result by Raz and Shpilka ([RS05] – where it is stated in the equivalent language of non-commutative computation). However, the corresponding problem in the black-box setting remains open to this date, with the best-known time bound in the black-box setting still being only due to the work by Forbes and Shpilka ([FS13]), who additionally assumed that the ordering of the ROABP is known. This was matched later by Agrawal Gurjar, Korwar, and Saxena ([AGKS15]) in the unknown order setting, improving upon the work of Forbes, Saptharishi and Shpilka ([FSS14]). Guo and Gurjar improved the result further by improving the dependence on the width [GG20]. Additionally, there have been various improvements to this result in restricted settings ([GKS17, GV20, BG22]) and some other works that study PIT for a small sum of ROABPs ([GKST17, BS21, GG20]). When the number of summands is super-constant, the question of even white-box PIT remains wide open.
1.5 Related Work
In this subsection, we discuss two closely related papers, namely those of Ramya & Rao [RR20] and Ghoshal & Rao [GR21]888We thank Ben Lee Volk and Utsab Ghosal for pointing out these papers to us after the release of an initial pre-print of this article, which erroneously claimed that it was the first to show super-polynomial lower bounds in the sum of ROABPs model., which study the model of sum of ROABPs in the multilinear setting. In [RR20, Theorem 1], the authors show that there exists an explicit multilinear polynomial (computable by a small multilinear circuit) such that any sum of ROABPs computing it has exponential size. In [GR21, Theorem 2], the authors show a similar lower bound for an explicit multilinear polynomial (computable by a small multilinear ABP) – albeit, in the restricted setting where the summand ROABPs have polynomially-bounded width.
Using the transformation described in Section 1.4, one can then view these lower bounds as ones against the model in the special case that each bucket in the variable partition has size . (To see how a multilinear polynomial say over the variable set can be set-multilinearized trivially, here is a sketch: for each variable , have a variable set comprising of two fresh variables and in the new set-multilinear polynomial; here, the latter is to signify the “presence” of in any monomial of the original multilinear polynomial, whereas the former is to signify its “absence”.) Additionally, it is not hard to see that the set-multilinearized version of the hard polynomials (in the manner just described) used in [RR20, Theorem 1] and [GR21, Theorem 2] are efficiently computable by small set-multilinear circuits and set-multilinear ABPs respectively. We note, however, that even so, our result in the high-degree setting where the hard polynomial is in (Theorem 1.6) is quantitatively better than [RR20, Theorem 1]. Additionally, our result in the high-degree setting where the hard polynomial is in (1.6’) is both quantitatively as well as qualitatively better than [GR21, Theorem 2] – the latter since we do not assume any bound on the width of the individual summand ordered set-multilinear ABPs. More crucially, our techniques enable us to prove super-polynomial bounds even when the degree is vastly smaller than the number of variables – in particular, when is as low as (Theorem 1.4 and 1.4’) – which is the more interesting regime of parameters due to the work of [BDS24].
Ramya and Rao [RR20] also study another model, which they call sum of -set-multilinear ABPs. They define -set-multilinear ABPs to be ABPs with layers, where is the number of variables in the polynomial being computed. Any edge between layer and in an -set-multilinear ABP is labelled by an arbitrary multilinear polynomial over , where is a partition of the variable set. Then, for , they establish exponential lower bounds against sum of -set-multilinear ABPs for a polynomial that is multilinear, but which is not set-multilinear under the variable partition that the model respects. Hence, even though this model is more general than ordered set-multilinear ABPs, this result [RR20, Theorem 3] is also not comparable with ours as our hard polynomial is set-multilinear. Again, more crucially, the result [RR20, Theorem 3] does not handle the “low-degree” regime — a setting in which our techniques allow us to prove lower bounds.
1.6 Proof Overview
The organization of this subsection is as follows: we first describe the basics of the partial derivative method and summarize its typical application in proving lower bounds against a generic set-multilinear model of computation. Next, we briefly describe Nisan’s original partial derivative method from [Nis91] to prove lower bounds specifically against a single ordered set-multilinear branching program. We then describe an alternative approach that yields a slightly weaker bound for the same model, but nevertheless is versatile enough that we can generalize it considerably more in order to prove Theorems 1.6 and 1.4. Finally, we describe the additional ideas needed in order to situate the hard polynomial in these theorems in and in the process, establish the tightness result for ABP set-multilinearization (Theorem 1.8).
Partial Derivative Measure Basics:
The high-level idea is to work with a measure that we show to be “small” for all polynomials computed by a specified model of computation – the model against which we wish to prove lower bounds. If we can also show that there is a “hard” polynomial for which the measure is in fact “large”, then it follows that this polynomial cannot be computed by the specified model. These partial derivative measures, after the initial work ([Nis91]) by Nisan, were further developed by Nisan and Wigderson in [NW97], who used them to prove some constant-depth set-multilinear formula lower bounds. Since then, variations of these measures have also been used to prove various other stronger set-multilinear formula lower bounds (e.g., [LST21, TLS22, LST22, BDS22, KS22, KS23]).
Given a variable partition , the idea is to label each set of variables as ‘’ or ‘’ according to some rule (called a “word”) . Let and denote the set of positive and negative indices (or coordinates) respectively, and let and denote the sets of all set-multilinear monomials over and respectively. For a polynomial that is set-multilinear over the given variable partition , the measure then is simply the rank of the “partial derivative matrix” , whose rows are indexed by the elements of and columns indexed by , and the entry of this matrix corresponding to a row and a column is the coefficient of the monomial in .
For a subset , let denote the sum of those coordinates of that lie in . In other words, measures the amount of “bias” that the rule exhibits when restricted to the coordinates. Note that the rank of can never exceed . Furthermore, we have that the rank measure is multiplicative: if and are polynomials that are set-multilinear over disjoint subsets of the global partition , then the rank of is the product of the ranks of and . These two observations, combined with the sub-additivity of rank, provide a recipe for showing lower bounds against any given set-multilinear model of computation: the overall idea is to carefully split up the original model into smaller, multiplicatively disjoint parts and then argue the existence of a rule for which enough of these parts exhibit high bias. This process allows us to prove that the measure is small for the model of computation. Therefore, one can conclude that any explicit polynomial for which the measure is provably high – which needs to established separately – can not be computed by this model. It is known ([KS22, KS23]) that there is a set-multilinear polynomial in (see Section 2.3) as well as a set-multilinear polynomial in (see Section 2.4) for which the matrices have full-rank, whenever .
Nisan’s original lower bound:
Let us first summarize how Nisan’s original partial derivative method from [Nis91], as alluded to in Section 1.2, can be applied in this context to obtain lower bounds against the size of a single ordered set-multilinear ABP (ordered smABP) computing the aforementioned “full-rank” polynomials. Given any set-multilinear branching program ordered with respect to some permutation computing , the idea is to pick a word such that the labels in precisely correspond to the “left half” of the ordering , and the labels correspond to the “right half”. One can then observe that the rank of serves as a lower bound on the number of nodes in the middle layer of the ABP, yielding a near-optimal lower bound: this is because the matrix is easily seen to be the product of an and an matrix.
We now sketch an alternate proof: rather than constructing a word dependent on the ordering of variable sets in the ordered smABP as above, choose a uniformly random999We also need to suitably condition on the event that the word is symmetric (i.e., ) in order to use the full-rank property of the hard polynomial – the probability of this event is . For ease of exposition, we omit the technical details in this sketch. word from . We demonstrate that, with positive probability, the rank of is bounded by , where is the width of the middle layer in : Standard anti-concentration bounds imply that, with at least constant probability, the bias in the left and right halves of is . Since can be expressed as a sum of polynomials for , where each and are ordered smABPs with respect to disjoint subsets of the global partition, we encounter a loss of a factor of in the rank of the product polynomial due to the bias of . This, combined with the sub-additivity of rank, shows the desired bound of on the rank of . Finally, we exploit the full-rank property of with respect to such words to establish a lower bound of on the width of a single ordered smABP computing . Notably, this bound is indeed slightly worse than what one can obtain by manually defining a rule deterministically, which ensures a maximal bias of in each half of as described in the paragraph above.
Generalization of the alternative argument:
The alternative argument described above yields an exponential lower bound even for a sum of ordered smABPs, assuming the number of summands is small. Consider a of the form , of max-width , computing . For each summand , the analysis above provides an upper bound of on the rank of with constant probability. If the number of summands is a small enough constant, the union bound ensures the existence of a word such that the rank of is at most . Thus101010See footnote 9., we obtain an exponential lower bound on since this computes a full-rank polynomial. However, because of the use of the union bound in this manner, this method faces an inherent limitation – it is unable to handle more than a very small number of summands, even if we lower the bias demand from each half (e.g., from to or a smaller polynomial in ). In fact, one can construct a sum of ordered smABPs (by starting with a single smABP ordered arbitrarily and considering the cyclic shifts of this ordering) such that any unbiased word (i.e., ) has the property that for at least one of the summands, the left and right halves will have no bias! Evidently then, in order to prove lower bounds against an unrestricted number of summands, we need another method to analyze the rank of the summands. Nonetheless, a conceptual takeaway from the exercise above is that selecting a rule that is oblivious to the orderings of individual summands (and in particular, a random rule) still lets us derive strong lower bounds for the sum of multiple ordered smABPs.
Suppose instead of slicing an ordered smABP down the middle, we slice it into three roughly equal pieces. Then, it is possible to write the polynomial computed by as a sum over terms, each of the form where for each , each of depends on disjoint variable sets of the global partition. We can then perform a similar analysis as above to show enough bias across these pieces, thereby obtaining a rank deficit. More precisely, we can conclude that for a single ordered smABP , again with a constant probability, the rank of is at most . When we slice the ABP into pieces in this way, it is not immediately clear where the gain is. In fact, for a single ordered smABP, this method actually gives a worse lower bound on due to the presence of the factor of . Where we gain is in the magnitude of the probability with which we can guarantee that a single ordered smABP has a rank deficit – we will now describe how this observation allows us to take a union bound over many more summands.
In order to illustrate this trade-off more clearly, we will partition the ordered smABP into many more pieces. Suppose we slice it into pieces, each of size roughly (this is just one setting of parameters; and are suitably optimized in the final proof). Thus, the polynomial that computes can be written as a sum of at most terms, where each term is a product of polynomials – each set-multilinear over a disjoint subset of the global partition, where each piece has size . When a word is chosen randomly, each such piece again exhibits a bias of about with constant probability. The crucial observation then is that by known concentration bounds, it can be shown that with probability exponentially close to , the sum of the biases across all the pieces is . For a single ordered smABP , this shows that the rank of is at most , which is still enough to show an exponential lower bound on , even though it is worse than what we obtained by slicing into fewer pieces.
The key advantage in implementing this analysis is that it provides a way to argue that for a random word , has low rank for a single ordered smABP – with probability exponentially close to . In particular, this allows us to union bound over exponentially many ordered smABPs and show that even if we have an computing of exponential support size, with high probability, each summand will have a rank deficit. Then, again using the sub-additivity of rank, we can conclude that the sum has a rank deficit as well.
This method of analyzing the rank of an ordered smABP by partitioning it into numerous pieces and tactfully using concentration bounds is novel, and conceptually the most essential aspect of the proof. As we demonstrated above, this method of analysis indeed gives a worse bound for a single smABP. However, while mildly sacrificing what we can prove about the rank of a single ordered smABP, we are able to leverage it to still prove something meaningful about the rank of a sum with a much larger number of summands.
Our partial derivative measure draws inspiration from previously known lower bounds in the context of multilinear and set-multilinear formulas ([Raz06, KS22]). One noteworthy distinction lies in the analysis of the measure: whereas the partitioning is present intrinsically in those formula settings, in our setting of ABPs, we deliberately introduce the partitioning at the expense of a notable increase in the number of summands or the total-width (and therefore, in the number of events we union bound over). The substantial advantage gained in utilizing this partitioning for rank analysis justifies the tolerable increase in the total-width.
Tightness of ABP set-multilinearization:
In order to make the hard polynomial in Theorems 1.6 and 1.4 lie in , one might wonder if we can get away with using the same rank measure (i.e., rank of the matrix for a uniformly random word ) that was used in the analysis above for the polynomial . However, as far as we know, full-rank polynomials (in the sense described above) may also require super-polynomial sized set-multilinear ABPs. Thus, in order to prove a separation between (general) set-multilinear ABPs and (sums of) ordered set-multilinear ABPs, we seek a property that is weaker than being full-rank and yet is still useful enough for proving lower bounds against our model. For this, we rely upon the arc-partition framework that is developed in [KS23] in order to prove near-optimal set-multilinear formula lower bounds (building upon the initial ingenious construction given in [DMPY12] for the multilinear context), tailor the framework to our model, and use a more delicate concentration bound analysis in order to prove our results.
An arc-partition is a special kind of symmetric word from : we will now describe a distribution over ; the words that will have positive probability of being obtained in this distribution will be called arc-partitions. The distribution is defined according to the following (iterative) sampling algorithm. Position the variable sets on a cycle with nodes so that there is an edge between and modulo . Start with the arc (an arc is a connected path on the cycle). At step of the process, maintain a partition of the arc . “Grow” this partition by first picking a pair uniformly at random out of the three possible pairs , and then choosing a labelling (or partition) on this pair i.e., assigning one of them ‘’ and the other ‘’ uniformly at random. After steps, we have chosen a partition (i.e., a word from ) of the variable sets into two disjoint, equal-size sets of variables and . It is known from [KS23] that there exist set-multilinear polynomials (as defined in Section 2.5) that are arc-full-rank i.e., is full-rank for every arc-partition . Analogous to the proofs of Theorems 1.6 and 1.4, we establish our lower bounds by showing that with high probability, every has an appropriately large rank deficit with respect to the arc-partition distribution. However, as we now briefly explain, this analysis turns out to be significantly more intricate.
Similar to the analysis as in the case, we partition an ordered smABP into pieces of size each, and write the polynomial that it computes as a sum of at most terms. Again, the task is to show that an arc-partition exhibits a large total bias across the pieces: more precisely, we show that if the pieces are labelled as , then with probability exponentially close to , the sum (i.e., the total bias of across these pieces) is , which is polynomially large in for an appropriate setting of . This then yields the desired rank deficit similar to the analysis (albeit with mildly worse parameters).
The bias lower bound is established in the following sequence of steps:
-
•
View the partition of as a fixed “coloring” of the latter. We say that a pair – as sampled in the construction of an arc-partition described above – “violates” a color if exactly one of the elements of the pair is colored by the set . Then, we show that with probability exponentially close to , “many” colors must have “many” violations: more precisely, that at least a constant fraction of the colors (i.e., many) have at least many violations each (for some small constant ). Such a “many violations” lemma is also established in [KS23] in the context of proving set-multilinear formula lower bounds. We show that this lemma, in fact, holds for a much wider range of parameters than was previously known; this extension is indeed necessary for our use. The proof of this strengthened many violations lemma is deferred to the appendix.
-
•
We then use the strengthened many violations lemma to argue that even though is not chosen uniformly at random and as such, its coordinates are not truly independent, it possesses “enough” inherent independence that a similar concentration bound as in the analysis is applicable. More precisely, we show that with high probability, there is an ordering of a set of colors such that each such color has at least violations and a more nuanced application of standard concentration bounds shows that exhibits a total bias of at least .
1.7 Structure of the paper
In Section 2, we formally introduce the measure used to prove our lower bounds, as well as the hard polynomials for which the bounds are shown. Section 2.4 is dedicated to proving lower bounds against in both the high degree and low degree regimes for a polynomial in (Theorem 1.6, Theorem 1.4). We then prove the corresponding results for a polynomial in (1.6’, 1.4’), in Section 4, thereby proving a near-tightness of the hardness escalation result in [BDS24] (Theorem 1.8). Finally, in Section 5, we show an optimal separation between setmultilinear and ordered set-multilinear ABPs (Theorem 1.9).
2 Preliminaries
2.1 Relative Rank and its Properties
We first describe the notation that we need to define the measures that we use to prove our results described in Section 1.3. Instead of directly working with the rank of the partial derivative matrix, we work with the following normalized form.
Definition 2.1.
Let be a tuple (or word) of non-zero real numbers. For a subset , we shall refer to the sum by , and by , we will refer to the tuple obtained by considering only the elements of that are indexed by . Given a word , we denote by a tuple of sets of variables where .111111In particular, . We denote by the set of set-multilinear polynomials over the tuple of sets of variables .
Definition 2.2 (Relative Rank Measure of [LST21]).
Let be a tuple of sets of variables such that and let . Let be a tuple (or word) of non-zero real numbers such that for all . Corresponding to a word , define and . Let be the set of all set-multilinear monomials over the subset of the variable sets precisely indexed by , and similarly let be the set of all set-multilinear monomials over these variable sets indexed by .
Define the ‘partial derivative matrix’ matrix whose rows are indexed by the elements of and columns indexed by the elements of as follows: the entry of this matrix corresponding to a row and a column is the coefficient of the monomial in . We define
The following is a simple result that establishes various useful properties of the relative rank measure.
Claim 2.3 ([LST21]).
-
1.
(Imbalance) Say . Then, .
-
2.
(Sub-additivity) If , then .
-
3.
(Multiplicativity) Say and assume that for each , , where is a partition of . Then
We will repeatedly make use of the following.
Theorem 2.4 (Chernoff Bound, as stated in [MU05]).
Suppose are independent random variables taking values in . Let denote their sum and let denote the expected value of the sum. Then for any ,
2.2 Inner Product Gadget
The following observation is used crucially to construct the hard polynomials in as well as .
Observation 2.5 ([KS23]).
Let and and be two disjoint sets of variables. Then, for any symmetric word (i.e., where ) and for the inner product ‘gadget’ , i.e., is full-rank.
2.3 A Hard Set-multilinear Polynomial in
As is done in previous lower bounds using the NW polynomials (for example, see [KSS14]), we will identify the set of the first integers as elements of via an arbitrary correspondence . If is a univariate polynomial, then we abuse notation to let denote the evaluation of at the -th field element via the above correspondence i.e., . To simplify the exposition, in the following definition, we will omit the correspondence and identify a variable by the point .
Definition 2.6 (Nisan-Wigderson Polynomials).
For a prime power , let be a field of size . For an integer and the set of variables , we define the degree homogeneous polynomial over any field as
Claim 2.7 ([KS22]).
For an integer and , let with . Then i.e., has full rank.
Proof.
Fix and , so that we can also write for , and let . The condition on implies that . Observe that is a square matrix of dimension . Consider a row of indexed by a monomial . can be thought of as a map from to which sends to for each . Next, by interpolating the pairs , we know that there exists a unique polynomial of degree for which for each . As a consequence, there is a unique ‘extension’ of the monomial that appears as a term in , which is precisely . Therefore, all but one of the entries in the row corresponding to must be zero, and the remaining entry must be . Applying the same argument to the columns of , we deduce that is a permutation matrix, and so has full rank. ∎
2.4 A Hard Set-multilinear Polynomial in
Let be an even integer and let be a collection of sets of variables where each , and similarly, let be a distinct collection of sets of variables where each . We shall refer to the -variables as the auxiliary variables. For and , let denote the inner-product quadratic form . Here, we shall assume that and .
For two integers and , we denote and call such a set an interval. For every interval , we define a polynomial as follows:
These in present form were defined in [KS23], but were in turn inspired from an earlier work of Raz and Yehudayof ([RY08]) in the multilinear context. [KS23] shows that they have the following full-rank property that will be instrumental for us.
Lemma 2.8 ([KS23]).
Let and be an even integer. Over any field of characteristic zero, the polynomial as defined above satisfies the following: For any with , is full-rank when viewed as a matrix over the field , the field of rational functions over the variables.
2.5 A Hard Set-Multilinear Polynomial in
2.5.1 Arc-partition Measure Description
This subsection is adapted from Section 2 of [DMPY12]. Let , be an even integer, and let be a collection of disjoint sets of variables each. An arc-partition will be a special kind of symmetric word (i.e., a one-to-one map from to ). For the purpose of this subsection, the reader can even choose to think of the alphabet of as (i.e., one ‘positive’ and one ‘negative’ value) – we use only to remain consistent with 2.2.
Identify with the set in the natural way. Consider the -cycle graph, i.e., the graph with nodes and edges between and modulo . For two nodes in the -cycle, denote by the arc between , that is, the set of nodes on the path from to in -cycle. First, define a distribution on a family of pairings (a list of disjoint pairs of nodes in the cycle) as follows. A random pairing is constructed in steps. At the end of step , we shall have a pairing of the arc . The size of is always . The first pairing contains only with and . Given and , define the random pair (independently of previous choices) by
Define
So, is either , or , each value is obtained with probability , and similarly (but not independently) for .
The final pairing is . Denote by a pairing distributed according to .
Once a pairing has been obtained, a word is obtained by simply randomly assigning and to the indices of any pair . More formally, for every , if , let with probability , independently of all other choices,
and with probability 1/2,
Denote by a word in that is sampled using this procedure. We call such a word an arc-partition. For a pair , we refer to and as partners.
Definition 2.9 (Arc-full-rank).
We say that a polynomial that is set-multilinear over is arc-full-rank if for every arc-partition , .
2.5.2 Construction of an Arc-full-rank Polynomial
Below, we describe a simple construction of a polynomial sized ABP that computes an arc-full-rank set-multilinear polynomial. The high-level idea is to construct an ABP in which every path between start-node and end-node corresponds to a specific execution of the random process which samples arc-partitions. Each node in the ABP corresponds to an arc , which sends an edge to each of the nodes and . The edges have specially chosen labels that help guarantee full rank with respect to every arc-partition. For simplicity of presentation, we allow the edges of the program to be labeled by degree four set-multilinear polynomial polynomials over the corresponding subset of the variable partition. This assumption can be easily removed by replacing each edge with a polynomial-sized ABP computing the corresponding degree four polynomial.
Formally, the nodes of the program are even-size arcs in the -cycle, an even integer. The start-node of the program is the empty arc and the end-node is the whole cycle (both are “special” arcs). Let be a collection of sets of variables where each , and similarly, let be a distinct collection of sets of variables where each (we shall refer to the -variables as auxiliary variables). For and in , let denote the inner-product quadratic form . Here, we shall assume that and .
Construct the branching program by connecting a node/arc of size to three nodes/arcs of size . For , there is just one node , and the edge from start-node to it is labeled . For , the node of size is connected to the three nodes: , and . (It may be the case that the three nodes are the end-node.) The edge labeling is:
-
•
The edge between and is labeled .
-
•
The edge between and is labeled .
-
•
The edge between and is labeled .
Consider the ABP thus described, and the polynomial it computes. For every path from start-node to end-node in the ABP, the list of edges along yields a pairing ; every edge in corresponds to a pair of nodes in -cycle. Thus,
(1) |
where the sum is over all paths from start-node to end-node.
Remark 2.10.
There is in fact a one-to-one correspondence between pairings and such paths (this follows by induction on ). Note that this is true only because pairings are tuples i.e., they are ordered by definition. Otherwise, it is of course still possible to obtain the same set of pairs in a given pairing using multiple different orderings. The sum defining can be thought of, therefore, as over pairings .
The following statement summarizes the main useful property of .
Lemma 2.11 ([KS23]).
Over any field of characteristic zero, the polynomial defined above is arc-full-rank as a set-multilinear polynomial in the variables over the field of rational functions in .
Proof.
Let be an arc-partition. We want to show that has full rank. The arc-partition is defined from a pairing (though as discussed in 2.10, there could be multiple such ). The pairing corresponds to a path from start-node to end-node. Consider the polynomial that is obtained by setting every in such that is not a pair in , and setting every for every pair in . Then, it is easy to see that the only terms that survive in Equation 1 correspond to paths (and in turn, pairings) which have the same underlying set of pairs as . As a consequence, is simply some non-zero constant times a polynomial which is full-rank (recall 2.5). being full rank then implies that is also full-rank. ∎
3 Separation between and
In the theorem below, refers to the polynomial defined in Section 2.4. In this section, we prove Theorem 1.6 and Theorem 1.4 by first proving the following statement.
Lemma 3.1.
Given large enough integers , any of max-width and support size computing must satisfy at least one of the following:
-
•
either ,
-
•
or , for any integer in the range .
Proof.
Suppose that (so that the range in the theorem statement is indeed well-defined).
First, we observe that for any computing , we can view each summand as an ordered set-multilinear branching program with respect to only the variables. In other words, by appropriately collapsing the layers labelled using the variables, each summand is a set-multilinear branching program over the field ordered with respect to for some permutation . It is easy to see that this collapsing process does not increase the width or the size of any summand in the branching program. The edge labels do get altered however: the coefficients of the variables in any edge label can now be polynomials in the variables (and therefore, field constants in ).
Let be such a set-multilinear branching program of width and depth that is ordered with respect to 121212In general, may be ordered with respect to an arbitrary permutation , but the assumption that is the identity permutation in the discussion that follows is without loss of generality.. Recall that this means that for each , all edges of the ABP from layer to layer are labeled using a linear form in . Given a node in layer and a node in layer of , define to be the polynomial computed by the ABP restricted to the layers with the source and the sink defined by and respectively. Consider an integer in the range specified in the theorem statement and let be the largest integer such that the product i.e., we must have . Consider the following decomposition of :
where and are defined to be the source and the sink of respectively, and for , varies over all choices of nodes in layer . Note that hence, this expression contains at most terms. Also, note that each is set-multilinear over the partition where is the set 131313If is instead ordered with respect to , then is taken to be the set . of length exactly for , and has length at most . We now analyze the relative rank of each summand.
Let (where ) be an arbitrary word. By 2.3, we see that
from which we observe that the task of upper bounding this rank can be reduced to the task of lower bounding the sum , which is established below.
Choose from (where ) uniformly at random. For each , let denote the (bad) event that . Since is an interval of length , by a standard estimation of binomial coefficients, we obtain that . Then, by the Chernoff bound (Theorem 2.4), the probability that at least half of the events occur is at most . Therefore, with probability at least ,
and therefore, by the sub-additivity of ,
Now, let be a computing with max-width bounded by , and such that each is ordered set-multilinear with respect to the variable partition for some permutation . By the union bound and the discussion above, it follows that with probability at least ,
But now, we can condition on the event that (which occurs with probability )) to establish the existence of a word with such that satisfies . This is because of the given bound . Because for such a by 2.8, we conclude that , where the last inequality follows from our choice of .
∎
Proof of Theorem 1.6.
We invoke Lemma 3.1 with . If , then we trivially have that the total-width is at least , so assume . We shall show that then, , which will yield the desired result.
Set . Then clearly, . Moreover, as by assumption, we verify that
Therefore, we can use 3.1 to obtain the inequality (as for large enough ). Plugging in , we see that
∎
Proof of Theorem 1.4.
We consider cases as follows:
Case :
Suppose there is a constant such that . Set and note that . We see that by 3.1,
Note that and decays to zero as becomes larger. Furthermore, as by assumption and , . We conclude that if is bounded by a polynomial in , then must be super-polynomial in .
Case :
Suppose there is a fixed constant such that . Assume that , and set . Then indeed lies in the range to apply 3.1. We obtain the inequality
which contradicts the assumption that . Hence, which is indeed super-polynomial in whenever .
Thus, in either case, it is shown that both and cannot be polynomially bounded. Hence, the total-width of any computing cannot be polynomially bounded. ∎
4 Tightness of ABP Set-Multilinearization
In this section, we prove Theorem 1.8 and in the process, also prove 1.6’ and 1.4’. We first establish the following technical lemma that will be essential for these proofs.
Lemma 4.1.
Let be growing parameters satisfying . There exist fixed constants and such that any of max-width and support size computing must satisfy at least one of the following:
-
•
either ,
-
•
or , for any integer in the range .
Proof.
First, similar to the proof of 3.1, we observe that for any computing , we can view each summand as an ordered set-multilinear branching program with respect to only the variables. In other words, by appropriately collapsing the layers labelled using the variables, each summand is a set-multilinear branching program over the field ordered with respect to for some permutation . It is easy to see that this collapsing process does not increase the width or the size of any summand branching program. The edge labels do get altered however: the coefficients of the variables in any edge label can now be polynomials in the variables (and therefore, field constants in ).
Let be such a set-multilinear branching program of width and depth that is ordered with respect to 141414In general, may be ordered with respect to an arbitrary permutation , but the assumption that is the identity permutation in the discussion that follows is without loss of generality.. Recall that this means that for each , all edges of the ABP from layer to layer are labeled using a linear form in . Given a node in layer and a node in layer of , define to be the polynomial computed by the ABP restricted to the layers with the source and the sink defined by and respectively. Consider an integer in the range specified in the lemma statement and let be such that151515 If does not divide , then we can let be . In the discussion that immediately follows, we simply bound the relative rank of the ‘last’ component by and so, the remaining analysis is nearly identical. . Consider the following decomposition of :
where and are defined to be the source and the sink of respectively, and for , varies over all choices of nodes in layer . Note that hence, this expression contains at most terms. Also, note that each is set-multilinear over the partition where is the set 161616If is instead ordered with respect to , then would be simply defined as the set . of length exactly . We now analyze the relative rank of each summand.
By 2.3, we see that for every appropriate word ,
from which we observe that the task of upper bounding this rank can be reduced to the task of lower bounding the sum , which is established below.
Choose from the distribution , as described in Section 2.5. We now view the partition of as a fixed “coloring” of the latter set (and in turn, the -cycle, as described in Section 2.5) i.e., each node is assigned the color if and only if . For a pairing and set , define the number of -violations by
In words, it is the set of pairs in which one color is and the other color is different from . For some fixed , denote
Next, we state a technical lemma that states that with probability exponentially close to , “many” colors have “many” violations. The constants that appear in the statement below indeed define the constants that are mentioned in the statement of Lemma 4.1.
Lemma 4.2 (Many Violations Lemma).
Let be growing parameters satisfying . There exist fixed constants and , such that for all integers in the range the following holds: Let be a partition of the -cycle where each . Then,
where .
We now show, in the claim below, how the preceding lemma can be used to argue that with probability exponentially close to , an arc-partition exhibits large bias. The constant that appears below defines the constant in the statement of Lemma 4.1.
Claim 4.3.
There exists a fixed constant such that
where the probability is over the choice of .
Proof.
Let denote the event , and denote the event . From the law of total probability, it follows that , where denotes the complement of – therefore, it suffices to bound .
Fix a pairing such that the high probability event occurs. Consider an ordering of the colors in . A color is said to be bright with respect to an ordering if there are at least nodes of color such that the partner of is colored using a color that appears after in the ordering . Call an ordering of the nodes in good if there are at least bright colors with respect to . The observation is that for any ordering of the colors, either itself is good, or its reverse is good. We conclude that given any pairing , there exists a good ordering of . Fix any such good ordering and let be the collection of bright colors with respect to this ordering. Let the colors in according to this good ordering be .
Next, notice that if the sum is at most , then so is the sum . Let (which is at least if ). View the sampling of from as happening in a specific order, according to the order of : First define on pairs with at least one point with color , then define on remaining pairs with at least one point with color , and so forth. When finished with , continue to define on all other pairs.
Observe that if the sum is at most , then there exists a subset with half its size, ,171717We assume without loss of generality that is even to avoid ceilings and floors. such that every color in satisfies (otherwise, we obtain an immediate contradiction). There are at most many choices for such – fix such a choice and relabel the colors in as , where and this order respects the order described in the paragraph above. For every , define to be the event that . By choice, conditioned on , there are at least pairs so that that are not yet assigned a ‘positive’ or ‘negative’ sign (with a magnitude equal to ). For every such , the element in is assigned a positive sign with probability , and is independent of any other . Therefore, is bounded by the probability that a binomial random variable over a universe of size lies in any specific interval of size . By a standard estimation of binomial coefficients, this probability is bounded by a constant, .
Hence, for any fixed choice of , for all ,
Therefore, for any fixed choice of , by the chain rule, it follows that
There are at most choices for . We conclude that
Finally, we note that by the many violations lemma (4.2), . Thus, for some fixed constant which can be defined in terms of and . ∎
From the claim, it follows that with probability at least ,
and therefore, by the sub-additivity of ,
Now, let be a computing with max-width bounded by , and such that each is ordered set-multilinear with respect to the variable partition for some permutation . Assume that (if the second inequality does not hold, then the first item in the lemma statement is true; and we deal with the case at the end of the proof). By the union bound and the discussion above, it follows that with probability at least ,
Since by assumption, note that . Furthermore, since we are assuming , we have that and therefore, Lemma 4.2 applies and so does 4.3 along with the entire discussion above. We conclude that there exists an arc-partition such that satisfies . Because for such a by 2.11, we conclude that .
Finally, if is bounded by the constant , then we just add constantly many width- ordered set-multilinear branching programs which each compute the zero polynomial so that the new support size of the sum becomes larger than and the previous case applies, and we are able to conclude . Appropriately defining the constant then lets us conclude the desired bound . ∎
Proof of 1.6’.
Set , where is as defined in the Lemma 4.1. If , then we trivially have that the total-width is at least , so assume (so that the second item of 4.1 applies). We shall show that then, , which will yield the desired result.
By 4.1, any of max-width and support size computing satisfies the inequality , for any integer in the range . Set . Then clearly, and so, lies in the required range. Plugging in the setting for in the inequality, we see that
∎
Proof of 1.4’.
We consider cases as follows:
Case :
Suppose there is a constant such that . Set and note that . We see that by 4.1,
Note that and decays to zero as becomes larger. Furthermore, as by assumption and , . We conclude that if is bounded by a polynomial in , then must be super-polynomial in .
Case :
Suppose there is a fixed constant such that . Define to be the constant . Set and assume that . Then indeed lies in the range to apply 4.1: we have because by definition, and because of the assumption on . We obtain the inequality
which contradicts the assumption that . Hence, which is indeed super-polynomial in whenever .
Thus, in either case, it is shown that both and cannot be polynomially bounded. Hence, the total-width of any computing cannot be polynomially bounded. ∎
Finally, we observe that Theorem 1.8 follows immediately from (i) the ABP construction of given in Section 2.5, and (ii), the proof of the case in 1.4’ above.
5 Optimal Separation between Ordered and Unordered smABPs
In this section, we prove Theorem 1.9. The result is that , as described in Section 2.5, does not have small ordered set-multilinear branching programs with respect to any ordering of the s and s. More precisely, we claim that any set-multilinear branching program computing which is ordered with respect to some permutation of the sets in the collection must have size at least .
Proof of Theorem 1.9.
Let be a set-multilinear branching program computing , which is ordered with respect to a permutation of the sets in . That is is a bijective map where , for some , if all edges between layer and layer are labelled using linear forms in the variables of . Let be the ordering inherited from by the sets, that is if is the -th -set appearing in the sequence . Define a word using as follows: let
That is, the edges labelled using linear forms in appear on the ‘left’ half in , when the sets are ignored. Note that is symmetric, or in other words, .
Now, for , define to be the size of layer in . The technique used by Nisan [Nis91] can be used directly to show the following.
Lemma 5.1 ([Nis91]).
If is thought of as a polynomial only over the variables (here , the field of rational functions in ), then .181818The factor comes from using relative-rank rather than rank.
Thus, in order to prove a lower bound on the width of , it suffices to lower bound by for some . We prove such a lower bound using the following lemma.
Lemma 5.2.
Given any with , there exists an arc-partition obtained from a pairing such that ‘splits’ a constant fraction of the pairs in : more precisely, there is a set of size at least such that if , then .
Let us first prove the lower bound on assuming this lemma. Consider the polynomial that is obtained by setting every in such that is not a pair in , and setting every for every pair in (where is as in 5.2). Then, it is easy to see that the only terms that survive in Equation 1 correspond to paths (and in turn, pairings) which have the same underlying set of pairs as . As a consequence,
for a non-zero constant . Next, notice that the rank of the matrix (over the field ) serves as a lower bound on the rank of the matrix (over the field ). Indeed, if the former is then there is a non-vanishing minor of . But this implies that the determinant of the corresponding sub-matrix in must be a non-zero polynomial in as it has a non-zero evaluation. From this we conclude that also has a non-vanishing minor and therefore has rank at least . Stated in terms of relative rank, we have showed that .
Now, given the set from 5.2, we can lower bound the relative rank of as follows: first, note that if for some , the pair of is in , then by 2.5, . Secondly, if does not split some pair of (i.e., ), then is simply a one-dimensional vector (either or ) and we trivially have . Combining, and using the multiplicativity of (third item of 2.3),
Therefore, we have and hence, by 5.1, the size of is at least . ∎
We now give the proof of 5.2.
Proof of 5.2.
Let be defined as follows.
Clearly, either or . Without loss of generality, let us assume that .
Also, without loss of generality191919If this is not the case, then and we redefine to be the largest possible subsets of the originally defined , respectively, such that . we can assume that . Then , and say . Further, let
We can then define an initial pairing, . Let us also define the set . The goal is to iteratively update such that, at the end, we have a pairing corresponding to an arc-partition. We will also update at each step so that, at the end, each pair in are of opposite signs.
Let . Intuitively, are the right most and left-most points, respectively, of the partial arc-partition, , defined till the -th iteration. Given for any , we will define as follows. Note that the calculations are when .
-
Case 1:
and are both odd.
-
Case 2:
is even and is odd.
We first define
and then define
Also,
and
-
Case 3:
is odd and is even.
We first define
and then define
Also,
and
-
Case 4:
and are both even.
We first define
and then define
Also,
Note that must be odd. So finally, we define and . Firstly note that corresponds to an arc-partition, since is a partial arc-partition and it is easy to check that for each , is a valid extension of .
Since , clearly has size at least . The only thing left to complete the proof is to check that has the other required property. For any , let us assume that has the property that if , . In Case 1, clearly, continues to have this property. In Cases 2 and 4, if , then again clearly continues to have this property. When , it must mean that and so . That is, in the previous iteration Case 2 had been true and had been added to . Therefore and, also, . A similar argument for Case 3 then completes the proof. ∎
Acknowledgements
Parts of this work were done while the first author was visiting TIFR Mumbai and ICTS-TIFR Bengaluru, and she would like to thank Venkata Susmita Biswas, Ramprasad Saptharishi, Prahladh Harsha and Jaikumar Radhakrishnan for the hospitality. The first author would also like to thank Anamay Tengse for useful discussions while trying to understand [RR20] and [GR21].
References
- [AFS+18] Matthew Anderson, Michael A. Forbes, Ramprasad Saptharishi, Amir Shpilka, and Ben Lee Volk. Identity testing and lower bounds for read-k oblivious algebraic branching programs. ACM Trans. Comput. Theory, 10(1):3:1–3:30, 2018.
- [AGKS15] Manindra Agrawal, Rohit Gurjar, Arpita Korwar, and Nitin Saxena. Hitting-sets for ROABP and sum of set-multilinear circuits. SIAM J. Comput., 44(3):669–697, 2015.
- [AR16] Vikraman Arvind and S. Raja. Some lower bound results for set-multilinear arithmetic computations. Chic. J. Theor. Comput. Sci., 2016, 2016.
- [BDS22] C. S. Bhargav, Sagnik Dutta, and Nitin Saxena. Improved lower bound, and proof barrier, for constant depth algebraic circuits. In Stefan Szeider, Robert Ganian, and Alexandra Silva, editors, 47th International Symposium on Mathematical Foundations of Computer Science, MFCS 2022, August 22-26, 2022, Vienna, Austria, volume 241 of LIPIcs, pages 18:1–18:16. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2022.
- [BDS24] C.S. Bhargav, Prateek Dwivedi, and Nitin Saxena. Lower bounds for the sum of small-size algebraic branching programs. To appear in the proceedings of the Annual Conference on Theory and Applications of Models of Computation, 2024. https://www.cse.iitk.ac.in/users/nitin/papers/sumRO.pdf.
- [BG22] Vishwas Bhargava and Sumanta Ghosh. Improved hitting set for orbit of roabps. Comput. Complex., 31(2):15, 2022.
- [BS21] Pranav Bisht and Nitin Saxena. Blackbox identity testing for sum of special roabps and its border class. Comput. Complex., 30(1):8, 2021.
- [Bür00] Peter Bürgisser. Cook’s versus valiant’s hypothesis. Theor. Comput. Sci., 235(1):71–88, 2000.
- [CKSV22] Prerona Chatterjee, Mrinal Kumar, Adrian She, and Ben Lee Volk. Quadratic lower bounds for algebraic branching programs and formulas. Comput. Complex., 31(2):8, 2022.
- [DMPY12] Zeev Dvir, Guillaume Malod, Sylvain Perifel, and Amir Yehudayoff. Separating multilinear branching programs and formulas. In Howard J. Karloff and Toniann Pitassi, editors, Proceedings of the 44th Symposium on Theory of Computing Conference, STOC 2012, New York, NY, USA, May 19 - 22, 2012, pages 615–624. ACM, 2012.
- [FS13] Michael A. Forbes and Amir Shpilka. Quasipolynomial-time identity testing of non-commutative and read-once oblivious algebraic branching programs. In 54th Annual IEEE Symposium on Foundations of Computer Science, FOCS 2013, 26-29 October, 2013, Berkeley, CA, USA, pages 243–252. IEEE Computer Society, 2013.
- [FSS14] Michael A. Forbes, Ramprasad Saptharishi, and Amir Shpilka. Hitting sets for multilinear read-once algebraic branching programs, in any order. In David B. Shmoys, editor, Symposium on Theory of Computing, STOC 2014, New York, NY, USA, May 31 - June 03, 2014, pages 867–875. ACM, 2014.
- [GG20] Zeyu Guo and Rohit Gurjar. Improved explicit hitting-sets for roabps. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2020). Schloss Dagstuhl-Leibniz-Zentrum für Informatik, 2020.
- [GKS17] Rohit Gurjar, Arpita Korwar, and Nitin Saxena. Identity testing for constant-width, and any-order, read-once oblivious arithmetic branching programs. Theory Comput., 13(1):1–21, 2017.
- [GKST17] Rohit Gurjar, Arpita Korwar, Nitin Saxena, and Thomas Thierauf. Deterministic identity testing for sum of read-once oblivious arithmetic branching programs. Comput. Complex., 26(4):835–880, 2017.
- [GR21] Purnata Ghosal and B. V. Raghavendra Rao. Limitations of sums of bounded read formulas and abps. In Rahul Santhanam and Daniil Musatov, editors, Computer Science - Theory and Applications - 16th International Computer Science Symposium in Russia, CSR 2021, Sochi, Russia, June 28 - July 2, 2021, Proceedings, volume 12730 of Lecture Notes in Computer Science, pages 147–169. Springer, 2021.
- [GV20] Rohit Gurjar and Ben Lee Volk. Pseudorandom bits for oblivious branching programs. ACM Trans. Comput. Theory, 12(2):8:1–8:12, 2020.
- [Jan08] Maurice J. Jansen. Lower bounds for syntactically multilinear algebraic branching programs. In Edward Ochmanski and Jerzy Tyszkiewicz, editors, Mathematical Foundations of Computer Science 2008, 33rd International Symposium, MFCS 2008, Torun, Poland, August 25-29, 2008, Proceedings, volume 5162 of Lecture Notes in Computer Science, pages 407–418. Springer, 2008.
- [Kay12] Neeraj Kayal. An exponential lower bound for the sum of powers of bounded degree polynomials. Electron. Colloquium Comput. Complex., TR12-081, 2012.
- [KLSS17] Neeraj Kayal, Nutan Limaye, Chandan Saha, and Srikanth Srinivasan. An exponential lower bound for homogeneous depth four arithmetic formulas. SIAM J. Comput., 46(1):307–335, 2017.
- [KNS20] Neeraj Kayal, Vineet Nair, and Chandan Saha. Separation between read-once oblivious algebraic branching programs (roabps) and multilinear depth-three circuits. ACM Trans. Comput. Theory, 12(1):2:1–2:27, 2020.
- [KS17] Mrinal Kumar and Shubhangi Saraf. On the power of homogeneous depth 4 arithmetic circuits. SIAM J. Comput., 46(1):336–387, 2017.
- [KS22] Deepanshu Kush and Shubhangi Saraf. Improved low-depth set-multilinear circuit lower bounds. In Shachar Lovett, editor, 37th Computational Complexity Conference, CCC 2022, July 20-23, 2022, Philadelphia, PA, USA, volume 234 of LIPIcs, pages 38:1–38:16. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2022.
- [KS23] Deepanshu Kush and Shubhangi Saraf. Near-optimal set-multilinear formula lower bounds. In Amnon Ta-Shma, editor, 38th Computational Complexity Conference, CCC 2023, July 17-20, 2023, Warwick, UK, volume 264 of LIPIcs, pages 15:1–15:33. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2023.
- [KSS14] Neeraj Kayal, Chandan Saha, and Ramprasad Saptharishi. A super-polynomial lower bound for regular arithmetic formulas. In David B. Shmoys, editor, Symposium on Theory of Computing, STOC 2014, New York, NY, USA, May 31 - June 03, 2014, pages 146–153. ACM, 2014.
- [KST16] Neeraj Kayal, Chandan Saha, and Sébastien Tavenas. An almost cubic lower bound for depth three arithmetic circuits. In Ioannis Chatzigiannakis, Michael Mitzenmacher, Yuval Rabani, and Davide Sangiorgi, editors, 43rd International Colloquium on Automata, Languages, and Programming, ICALP 2016, July 11-15, 2016, Rome, Italy, volume 55 of LIPIcs, pages 33:1–33:15. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2016.
- [LST21] Nutan Limaye, Srikanth Srinivasan, and Sébastien Tavenas. Superpolynomial lower bounds against low-depth algebraic circuits. In 62nd IEEE Annual Symposium on Foundations of Computer Science, FOCS 2021, Denver, CO, USA, February 7-10, 2022, pages 804–814. IEEE, 2021.
- [LST22] Nutan Limaye, Srikanth Srinivasan, and Sébastien Tavenas. On the partial derivative method applied to lopsided set-multilinear polynomials. In Shachar Lovett, editor, 37th Computational Complexity Conference, CCC 2022, July 20-23, 2022, Philadelphia, PA, USA, volume 234 of LIPIcs, pages 32:1–32:23. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2022.
- [MU05] Michael Mitzenmacher and Eli Upfal. Probability and Computing: Randomized Algorithms and Probabilistic Analysis. Cambridge University Press, 2005.
- [Nis91] Noam Nisan. Lower bounds for non-commutative computation (extended abstract). In Cris Koutsougeras and Jeffrey Scott Vitter, editors, Proceedings of the 23rd Annual ACM Symposium on Theory of Computing, May 5-8, 1991, New Orleans, Louisiana, USA, pages 410–418. ACM, 1991.
- [NW97] Noam Nisan and Avi Wigderson. Lower bounds on arithmetic circuits via partial derivatives. Comput. Complex., 6(3):217–234, 1997.
- [Raz06] Ran Raz. Separation of multilinear circuit and formula size. Theory Comput., 2(6):121–135, 2006.
- [Raz13] Ran Raz. Tensor-rank and lower bounds for arithmetic formulas. J. ACM, 60(6):40:1–40:15, 2013.
- [RR20] C. Ramya and B. V. Raghavendra Rao. Lower bounds for special cases of syntactic multilinear abps. Theor. Comput. Sci., 809:1–20, 2020.
- [RS05] Ran Raz and Amir Shpilka. Deterministic polynomial identity testing in non-commutative models. Comput. Complex., 14(1):1–19, 2005.
- [RY08] Ran Raz and Amir Yehudayoff. Balancing syntactically multilinear arithmetic circuits. Comput. Complex., 17(4):515–535, 2008.
- [RY09] Ran Raz and Amir Yehudayoff. Lower bounds and separations for constant depth multilinear circuits. Comput. Complex., 18(2):171–207, 2009.
- [Sap15] Ramprasad Saptharishi. A survey of lower bounds in arithmetic circuit complexity. Github Survey, 2015.
- [SY10] Amir Shpilka and Amir Yehudayoff. Arithmetic circuits: A survey of recent results and open questions. Found. Trends Theor. Comput. Sci., 5(3-4):207–388, 2010.
- [TLS22] Sébastien Tavenas, Nutan Limaye, and Srikanth Srinivasan. Set-multilinear and non-commutative formula lower bounds for iterated matrix multiplication. In Stefano Leonardi and Anupam Gupta, editors, STOC ’22: 54th Annual ACM SIGACT Symposium on Theory of Computing, Rome, Italy, June 20 - 24, 2022, pages 416–425. ACM, 2022.
- [Val79] L. G. Valiant. Completeness classes in algebra. In Proceedings of the Eleventh Annual ACM Symposium on Theory of Computing, STOC ’79, page 249–261, New York, NY, USA, 1979. Association for Computing Machinery.
Appendix A Proof of the Many Violations Lemma
We briefly recall some notation from Section 4. Recall that is chosen from the distribution , as described in Section 2.5. For a pairing , and a set , we defined the number of -violations as
In words, it is the set of pairs in which one color is and the other color is different. We used the following notation to denote the set of colors with “many” violations (for some fixed )
As mentioned previously, this subsection is adapted from the proof of the (weaker) many violations lemma in [KS23], which is in turn adapted from Lemma 4.1 in [DMPY12].
Lemma.
(Many Violations Lemma restated) Let be growing parameters satisfying . There exist fixed constants and , such that for all integers in the range the following holds: Let be a partition of the -cycle where each .202020As explained in footnote 15 we assume w.l.o.g. . Then,
where .
Proof.
Set . Fix a partition (or a “coloring”) of the -cycle satisfying the conditions of the lemma. Think of as a function from the -cycle to the set , assigning every node its color in . is the color of . Use the following definition to partition the proof into cases. For a color , count the number of jumps in it (with respect to the partition ) to be
the set of elements of color so that has a color different from .
Case 1: Many colors with many jumps.
The high-level idea is that each color with many jumps has many violations because pairs of the form yield violations as soon as they are constructed.
Assume that for at least colors , . Denote by the set of ’s that satisfy this inequality. Then, for every in , there exists a subset of size . Let
We think of the construction of the (random) pairing as happening in epochs, depending on , as follows.
For , define the random variable
the set after removing a four-neighborhood of . For a certain sequence of time steps , we will define special nodes which lie in this small ‘cloud’ around the arc (i.e., within a distance of on either side of the arc) - it is for these special nodes that the set of pairs will provide many violations. We now formalize this intuition.
Let be the first time after so that the distance between and is at most two. The distance between and is at least five. The size of the arc increases by two at each time step. So, . Let be an element of that is of distance at most two from ; if there is more than one such , choose arbitrarily. The minimality of implies that is not in .
Let be the first time after so that the distance between and is at most two. Let be an element of that is of distance at most two from . Define , for similarly, until is empty. As long as , we have . This process, therefore, has at least steps. For , denote by the event that during the time between and the pair is added to . The pair is violating color . At time , even conditioned on all the past , in at most two steps (and before ) we can add the pair to . For every , therefore,
Next, let . We want to show that with high probability, for at least many , the event occurs. There are many ways of choosing a set of indices of size . Subsequently,
where is a universal constant. Finally, we argue that if there do exist for which the events occur, then . To see this, note that the size of every is . So, every color in can contribute at most elements to . If , then at most these many colors can contribute more than (and up to elements) - combined, at most elements. However, there are at least colors which can contribute only up to elements. Again combined, this is not sufficient to cover the elements overall (for large enough and a small enough constant that depends only on ), which is a contradiction. Hence,
and the proof follows in this case as .
Case 2: Many colors with few jumps.
The intuition is that many violations will come from pairs of the form in the construction of the pairing. Assume that for at least colors , . Denote again by the set of ’s that satisfy the above inequality. We say that a color is noticeable in the arc if
Claim A.1.
There are disjoint arcs so that for every ,
-
1.
and,
-
2.
there is a color in that is noticeable in .
Moreover, the colors can be chosen to be pairwise distinct.
Proof.
For each color in , there are at least vertices of color in the -cycle and at most jumps in the color . Therefore, there is at least one -monochromatic arc of size at least . Hence, on the -cycle, there are such monochromatic arcs for the colors in , in this order .
Consider an arc of size included in . Thus . If we “slide” the arc until it is included in , then . By continuity, there is an intermediate position for the arc such that . This provides the first arc of the claim.
Sliding an arc inside to inside shows that there exists an arc such that . The arcs and are disjoint: The distance of the largest element of and the smallest element of is at most . The distance of the smallest element of and the largest element of is at most . The size of is larger than . Proceed in this way to define . ∎
Use Claim A.1 to divide the construction of the (random) pairing into epochs. Denote by the family of arcs given by the claim. Let be the first time that the arc hits one of the arcs in . Denote by that arc that is hit at time (break ties arbitrarily). Denote by the color that is noticeable in . Let be the first time so that is contained in . Let be the subset of of arcs that have an empty intersection with . Similarly, let be the first time after that the arc hits one of the arcs in . If there are no arc in , define . Denote by that arc that is hit at time . Denote by the color that is noticeable in . Let be the first time so that is contained in . Let be the subset of of arcs that have an empty intersection with . Define for analogously. For every , denote by the event that during the time between and the number of pairs added that violate color ’s at most . (If does not hold, then and .) The main part of the proof is summarized in the following proposition, whose proof is deferred to Section A.1.
Lemma A.2 (Chessboard Lemma).
There is an absolute constant such that for every , and any choice of pairs ,
Given this lemma, let us finish the proof of Lemma 4.2. Define and let denote the event that the number of ’s for which is at least . First, we argue that occurs with high probability.
For any , consider the evolution of the arc between the time steps (when it first hits arc ) and (when it completely engulfs it). During this epoch, let us call the evolution of in the ‘direction’ of as good (labelled ‘’) and away from the direction of as bad (‘’). To this end, for any time step in this epoch, we can code the three possible choices for the evolution of as (when the arc is grown in the direction of ), (when it is grown equally on either side), or (when it is grown away from the direction of ). Consequently, the evolution of during this epoch can be realized as a sequence consisting of the symbols and .
Consider the sequence of ’s and ’s obtained by concatenating the sequences corresponding to all the epochs (ignoring the choices made at time steps that do not lie in such epochs, i.e., between and for some - as there is no corresponding notion of a ‘good’ direction outside such epochs). The intuition is that if (i.e., if does not occur), then there must be an extremely large number of ’s compared to ’s (i.e., the arc evolves disproportionately in the bad direction) in the concatenated string , which should occur only with a vanishingly small probability.
Consider the sub-string of that corresponds to the choices made only for the nodes in . Note that there are precisely many ’s in . Suppose for concreteness (the cases and are similar). This implies that there are many ’s in . Since only up to many of these ’s may appear as a result of the evolution making a choice of the form , it follows that the evolution of must make a choice of the form at least times out of a possible , in order to cover the elements of . Denote . By the union bound, this probability is at most
for some universal constant . Similarly, we have bounds for both and and it follows that for some universal constant .
Remark A.3.
The argument above for showing that occurs with high probability differs considerably from [DMPY12], where the corresponding event is sketched to occur with probability only at least , which is not strong enough for our purposes.
Next, note that
If , then at least colors in have at most many violations. Since , in particular, there must exist at least colors within the first colors (here we are using the ordering of colors as provided by Claim A.1) for which there are at most many violations. We then obtain the following by conditioning on , using the union bound.
For a fixed choice of , by the chain rule and Lemma A.2, we have
Overall, setting appropriately, we conclude that
∎
A.1 Proof of the Chessboard Lemma
To prove Lemma A.2, we use a different point of view of the random process. We begin by describing this different view, and later describe its formal connection to the distribution on pairings. This subsection is adapted from Section 5 of [DMPY12] and closely follows their argument, though with numerous parameter changes to suit our demands.
The view uses two definitions. One is a standard definition of a two-dimensional random walk, and the other is a definition of a “chessboard” configuration in the plane. The proof of the proposition will follow by analyzing the behavior of the random walk on the “chessboard”. Let be as above and be as defined in Lemma A.1. The random walk on is defined as follows. It starts at the origin, . At every step it move to one of three nodes, independently of previous choices,
At time , the -distance of from the origin is thus .
The “chessboard” is defined as follows. Let and be two Boolean functions. The functions induce a “chessboard” structure on the board . A position in the board is colored either white or black. It is colored black if and white if . We say that the “chessboard” is well-behaved if
-
1.
is far from constant:
-
2.
does not contain many jumps:
-
3.
does not contain many jumps:
Consider a random walk on top of the “chessboard” and stop it when reaching the boundary of the board (i.e., when it tries to make a step outside the board ). We define a good step to be a step of the form that lands in a black block. We will later relate good steps to violating edges. Our goal is, therefore, to show that a typical makes many good steps.
Lemma A.4.
Assume the chessboard is well-behaved. There is a constant such that the probability that makes less than good steps is at most .
We use this lemma to show Lemma A.2.
Proof of Lemma A.2 given Lemma A.4.
Recall that is an arc of size so that there is a color satisfying
(2) |
Furthermore, condition on , . Assume without loss of generality that is in (when is in , the analysis is similar). The distance of from the smallest element of is at most one (the length of “one step to the right” is between zero and two). We now grow the random interval until , i.e., as long as stays in . At the same time, performs a movement to the left. Since , there are at least steps for to take to the left before hitting . There is a one-to-one correspondence between pairings and random walks using the correspondence
Define the function to be at positions of with color , and at the other positions. Set the function as to describe the color from leftward. The “chessboard” is well-behaved by (2) and since is in the set defined in case 2 of the proof of Lemma 4.2 (so there are not many jumps for the color ). Finally, if makes a good step, then the corresponding pair added to violated color . So, if holds for , then the corresponding makes less than good steps. Formally, by Lemma 4.2,
∎
Proof of Lemma A.4.
Define three events , all of which happen with small probability, so that every that is not in their union makes many good steps.
Call a subset of the board of the form or , where is a sub-interval, a region. The width of a region is the size of . Let be the set of regions of width at least . The size of is at most . For a region in , denote by the event that the number of steps of the form that makes in is less than given that it makes at least steps in . Denote
To estimate the probability of , note that we can simply apply the Chernoff bound to a sum of Bernoulli random variables with . By the union bound, we conclude that there is a universal constant such that
Denote by the set of all points in the board with -norm at least . At time the random walk is distributed along all points in of -norm exactly . The distribution of on this set is the same as that of a random walk on that is started at , and moves at every step to the right with probability , stays in place with probability and moves to the left with probability . The probability that such a random walk on is at a specific point in at time is at most . Hence, for every point in ,
Call a point in the board a corner if both and are of the same color , but and are not of color . For a corner , denote by the -neighborhood of in -metric. Denote by the union over all , for corners in . Denote by the event that hits any point in . Since the board is well-behaved, the number of jumps in each of is at most . Therefore, the number of corners is at most . By the union bound,
where in the last step, we used . Note that plugging in, say, indeed makes the inequality true. Next, let . Define three (vertical) lines: is the line , is the line and is the line . Denote by the event that does not cross the line before stopped (i.e., hitting the boundary of the board). Chernoff’s bound implies that there is a universal constant for which
To conclude the proof by the union bound, it suffices to show that for every not in , the walk makes at least good steps. Fix such a walk . Since , we know that crosses the line .
We consider several cases. Define a block to be a maximal monochromatic rectangle in the board. The board is thus partitioned into black blocks and white blocks - which is what led [DMPY12] to calling it a “chessboard.” We now think of the board as drawn in the plane with at the bottom-left corner and at the upper-right corner.
Case 1: The walk does not hit any white block after crossing and before crossing . In this case, all steps taken in the region whose left border is and right border is are in a black area. The number of such steps is at least . Since , the claim holds.
Case 2: The walk W hits a white block after crossing and before crossing . Let us label the blocks as follows: we associate every block with a pair where is between and the number of jumps in and is between and the number of jumps in . So, the label of the “bottom-left” is , the label of the block “above” it is and the label of the block “to its right” is , etc. There are two sub-cases to consider:
Sub-case 1: At some point after crossing and before crossing , there are two white blocks of the form , so that intersects both blocks. Let be the corner between these two blocks (which must exist by definition). Since , we know that does not visit . Therefore, must walk in a black area around . Every path surrounding has length at least . Since , the claim holds.
Sub-case 2: At all times after crossing and before crossing , the walk never moves from a white block to one of the two white blocks , . Since , this is indeed the last case. The width of a combinatorial rectangle in the board is the size of its “bottom side” (i.e., the corresponding subset of ). Let be the first white block hits after crossing . Let be the family of black blocks that are to the right but on the same height as . Define as the maximal width of a rectangle of the form over all . Since the board is well-behaved, it follows (from the first condition) that the total width of the black area on the same height as is at least . Also, since we are in case 2, the left border of is to the left of . Therefore, the total width of the black area to the right of the left border of and to the left of , on the same height as is at least . Therefore, since the number of jumps is at most ,
Since we are in this sub-case, the walk must “go through” every black block it hits: it can go from bottom side to upper side or from left side to right side (but not from left side to upper side or from bottom side to right side). Consider the behaviour of after it hits : starting from a white block, because , it is guaranteed to cross . Therefore, the color of the block that “exits” from from each column must keep alternating between white and black. For each black block in , therefore, there exists a black block in the same column that crosses horizontally. Focusing on one such black block of width , since , the claim holds. ∎