Comparing computational entropies below majority
(or: When is the dense model theorem false?)
Abstract
Computational pseudorandomness studies the extent to which a random variable looks like the uniform distribution according to a class of tests . Computational entropy generalizes computational pseudorandomness by studying the extent which a random variable looks like a high entropy distribution. There are different formal definitions of computational entropy with different advantages for different applications. Because of this, it is of interest to understand when these definitions are equivalent.
We consider three notions of computational entropy which are known to be equivalent when the test class is closed under taking majorities. This equivalence constitutes (essentially) the so-called dense model theorem of Green and Tao (and later made explicit by Tao-Zeigler, Reingold et al., and Gowers). The dense model theorem plays a key role in Green and Tao’s proof that the primes contain arbitrarily long arithmetic progressions and has since been connected to a surprisingly wide range of topics in mathematics and computer science, including cryptography, computational complexity, combinatorics and machine learning. We show that, in different situations where is not closed under majority, this equivalence fails. This in turn provides examples where the dense model theorem is false.
1 Introduction
Computational pseudorandomness is a central topic in theoretical computer science. In this scenario, one has a class of boolean functions (which we’ll refer to as tests) and random variable over . We say that is -pseudorandom with respect to ) if where is the uniform distribution over and is small. In this case, we think of as ‘behaving like the uniform distribution’ according to tests in . In general, say that two random variables , -indistinguishable by if (and so -pseudorandom distributions are exactly those which are -indistinguishable from ). Constructing explicit ’s which behave like the uniform distribution according to different test classes is among the central goals of complexity theory, with sufficiently strong constructions leading to, for example, derandomization of . One way in which the theory of pseudo-randomness is rich is that there are multiple equivalent formulations of pseudo-randomness, such as Yao’s next bit test ([51]).
The various notions of pseudo-entropy and pseudo-density generalize pseudo-randomness to formalize how much randomness a distribution looks like it has as far as this class of tests can perceive. Many of these notions were first introduced as stepping stones towards pseudo-randomness, giving properties of sub-routines within constructions of pseudo-random generators. However, measuring seeming randomness quantitatively is important in many other contexts, so these notions have found wider application. For example, in mathematical subjects such as combinatorics and number theory, there is a general phenomenon of “structure vs. randomness”, where a deterministically defined object such as a graph or set of integers can be decomposed into a structured part and a random part. Pseudo-entropy quantifies how much randomness the “random part” has. Notions of pseudo-density were used in this context by Green, Tao, and Ziegler [18, 48] to show that the primes contain arbitrarily long arithmetic progressions. We can also use pseudo-entropy notions to characterize the amount of seeming randomness remains n a cryptographic key after it has been compromised with a side-channel attack. A data set used in a machine learning algorithm might not have much randomness in itself, and might not be completely random looking, but is hopefully representative of the much larger set of inputs that the results of the algorithm will be applied to, so we can use notions of pseudo-entropy to say when such algorithms will generalize. There are many possible definitions of this intuitive idea, and as with pseudo-randomness, the power of pseudo-entropy is that many of these notions have been related or proven equivalent.
In particular, the dense model theorem provides such a basic equivalence. Here, the intuitive concept we are trying to capture is the density (or relative min-entropy) of the target distribution within a larger distribution, what fraction of the larger distribution is within the target. We say that is -dense if where is density function defining (in the sense that ). One application of indistinguishability from a dense distribution is as a stepping stone to pseudorandomness: if is indistinguishable from a distribution with density within the uniform distribution, then applying a randomness extractor with min-entropy rate to is a pseudorandom distribution. A more sophisticated application comes from additive number theory. It is not hard to show that a random subset of (including each element with probability , say) contains many arithmetic progressions (which are sets of the form ). Szemerédi [45] showed that, in fact, sufficiently dense subsets of the integers also contain such arithmetic progressions: specifically, that for any , the size of the largest subsets of which doesn’t contain an arithmetic progression grows like .
So we would like some technology to reason about random variables which ‘behave like dense distributions’. It turns out, however, that formalizing what it means for to ‘behave like a dense distribution’ is subtle. Here are three perfectly legitimate candidates:
- Candidate 1:
-
behaves like a -dense distribution if it behaves like something that’s -dense. Formally, this means that is -indistinguishable from some -dense distribution. In this case, we say that has a -dense -model.
- Candidate 2:
-
behaves -dense if it’s -dense inside of something that behaves like the uniform distribution. Formally this means there’s an -pseudorandom distribution in which is -dense. In this case, we say that is -dense in an -pseudorandom set.
- Candidate 3:
-
behaves -dense if it appears to be the case that conditioning on increases the size of any set by at most (roughly) a -factor. This is an operational definition: conditioning on a (truly) dense set increases the set by at most a -fraction, so we should expect the same behavior from things that behave like a dense set. Formally, this means that for any in our test class . In this case, we say that has -pseudodensity.
Precisely which definition you pick will depend on what you know about and in what sense you would like it to behave like a -dense distribution. Indeed, each of these definitions have appeared in different applications ([25], [18], [13], respectively), so there are scenarios where each of these types of behavior is desired. In general, the first candidate is the strongest (and, arguably, the most natural), but it is sometimes hard to establish that a distribution has the property. The following claim gives some simple relationships between the definitions:
Claim 1.1.
For any , the following hold:
-
1.
If has a -dense -model, then is -dense in a -pseudorandom set.
-
2.
If is -dense in an -pseudorandom set, then has -pseudodensity.
Proof sketch.
-
1.
Let be the -dense -model for . Note that . So is -pseudorandom and is -dense within it.
-
2.
Suppose is -dense in which -pseudorandom for . Then for any , .
∎
The marvelous quality of these three candidates in particular is that, for many natural , all of them are equivalent, and so establishing even -pseudodensity is enough to guarantee the existence of a -dense -model.
This equivalence holds for which are closed under majority, meaning for any (which we can think of as for now), if then , where is if at least half of its input bits are . In fact, it holds for more general if we allow the distinguishing parameter ( in -pseudodensity) to be exponentially small (as in the original formulation, which we’ll dicuss later on). In this case, the subtelty in defining what it means to behave like a dense set vanishes. These equivalences constitute (essentially) what is known as the dense model theorem, originating in the work of Green-Tao [18] and Tao-Zeigler [48], and independently in Barak et al. [8] (though in different guises). This result has been fruitfully applied in many seemingly unrelated areas of mathematics and computer science: additive number theory [18, 48] where encodes additive information about subsets of (or possibly a more general group), graph theory [49, 38] where encodes cuts in a fixed graph, circuit complexity [49], Fourier analysis [29], machine learning [29] and leakage-resilient cryptography [14]. The ubiquity of the dense model theorem motivates a simple question: are there natural scenarios in which the dense model theorem is false?
We show that the answer to this question is yes. In particular, we show that for either implication from 1.1 there is a class and a random variable so that converse fails to hold. From the computational entropy perspective, we show that the three computational entropies we’ve discussed are inequivalent for certain test classes . Necessarily (with not exponentially small) these classes are not closed under majority and so we will need to look ‘below’ majority in order to find our counterexamples.
1.1 The dense model theorem
We turn to discuss the dense model theorem in some more detail to better contextualize our work. Restricting our attention to random variable over , the dense model theorem states the following:
Theorem 1.1 (Dense model theorem).
Let be a class of tests and a random variable over with -pseudodensity with respect to for . Then has a -dense -model with respect to .
We will generally also consider a parameter , which in this case is , the additive error in pseudodensity. To get an intuition for what this is saying, let’s conisder a setting where it’s false but for trivial reasons. As a simple example given in [52], pick a set some set as a fraction of another set of size . Then doesn’t have a -dense -model (i.e. ) with respect to ’s indicator function, which we’ll call . On the other hand, the distribution obtained by sampling with probability and sampling from ’s complement with probability is at most -distinguishable from for any function, since is simply the measure of the difference between and . In particular is -dense in the -pseudorandom (which implies, via 1.1, that it is -pseudodense). This means that the 1.1 is tight for the dependence on , in that it becomes false for . In many instances, we think of , constant (or perhaps with mild dependences on ) and .
Originally, the dense model theorem was proved with a different (and stronger) assumption; namely, that is dense in a pseudorandom set. Green and Tao, in proving that the primes contain arbitrarily long arithmetic progressions, used it to the following effect: if are the prime numbers up to , then its density is known to behave like . On the other hand, Szemerédi [45] showed that sufficiently dense subsets of contain arbitrarily long arithmetic progressions. The best bounds for Szemerédi ’s theorem require density , which is much larger than the primes (see [16] and the recent [9] for more on the rich history on this and related problems). Not all is lost, however: the only property of dense sets that we’re interested in is that they contain arithemtic progressions. So Green and Tao construct a class of tests which can ‘detect’ arithmetic progressions and under which the primes are dense inside of a -pseudorandom set (more on later). By applying the dense model theorem, we conclude that the primes ‘look like’ a dense set (themselves having long arithemtic progressions) with respect to the class . As detects arithmetic progressions, it must be the case that the primes possess them. Of course, many details need to be filled in, but we hope this example shows the reader the ‘spirit’ of the dense model theorem.
A primary source of interest in the dense model theorem is in the connections it shares with seemingly unrelated branches of mathematics and computer science. The original application was in additive number theory, but it was independently discovered and proved in the context of cryptography ([8, 14]). RTTV [38] and Gowers [17] observed proofs of the dense model theorem which use linear programming duality, which is in turn related to Nisan’s proof of the hardcore lemma from circuit complexity [28]. In fact, Impagliazzo [29] shows in unpublished work that optimal-density versions of the hardcore lemma due to Holenstein [26] actually imply the dense model theorem. Klivans and Servedio [32] famously observed the relationship betweeen the hardcore lemma and boosting, a fundamental technique for aggregating weak learners in machine learning [15]. Together with the result of Impagliazzo, this connection means that dense model theorems can be proved by a particular type of boosting algorithm. A boosting argument for the existence of dense models also gives us constructive versions of the dense model theorem, which are needed for algorithmic applications. Zhang [52] (without using Impagliazzo’s reduction from the dense model theorem to the hardcore lemma) used the boosting algorithm of [7] directly to prove the dense model theorem with optimal query complexity ().
In addition to its connections to complexity, machine learning, additive number theory and cryptography, the dense model theorem (and ideas which developed from the dense model theorem, chiefly the approximation theorem of [49]), have been used to understand the weak graph regularity lemma of Frieze and Kannan [29], notions of computational differential privacy [36] and even generalization in generative adversarial networks (GANs) [5]. We now turn to discussing the complexity-theoretic aspects of the dense model theorem, specifically regarding our question of whether the from the statement is optimal.
As alluded to earlier, Green and Tao actually worked in a setting where doesn’t ’need to compute majorities but where (that is, the distinguishing parameter in the pseudodensity assumption in the statement of 1.1) needs to be replaced by some (with experiencing a small increase). We state this result, as proved in Tao and Zeigler [48] and stated this way in RTTV [38], for comparison. For a test class , let be the set of tests of the form for .
Theorem 1.2 (Computationally simple dense-model theorem, strong assumption).
Let be a class of tests and a random variable over which is -dense in a set -pseudorandom for with and . Then has a -dense -model with respect to .
RTTV [38] observe that this proof can be adapted to work for have polynomial dependence on by restricting to the case of boolean-valued tests. Doing so, however, makes much more complicated (essentially requiring circuits of size exponential in ). In 1.1, we can obtain the best of both worlds: has polynomial dependence on and the complexity blow-up is rather small. However, in this more picturesque circumstance, we need to be able to compute majorities. Is such a tradeoff necessary? Our results suggest that the answer is yes. 1.6 (stated in the following section) tells us that if the dense model theorem is true for , then there’s a small, constant-depth circuit with -oracle gates approximating majority on bits.
Another important aspect of the dense model theorem is how the different assumptions are related. As mentioned, the original assumption was that is -dense in an -pseudorandom set, but the proof can be extended to the case where is -pseudodense. 1.1 showed that the former assumption implies that latter assumption. When the dense model theorem is true, the latter also implies the former: simply apply the dense model theorem to which is -dense to obtain a -dense -model. Then, by the first part of 1.1, we’re done.
First, we give examples of situations where these two notions are distinct. For example, we show in 1.4 and 1.5 that they are inequivalent when is constant-depth polynomial size circuits or when is a low-degree polynomial over a finite field. Note that a separation between pseudodensity and being dense in a pseudorandom set also implies a separation between pseudodensity and having a dense model, as being dense in a pseudorandom set is a necessary condition for having a dense model.
Second, we show that the dense model theorem is false even when we make the stronger assumption that the starting distribution is dense in a pseudorandom set. Specifically, in 1.3 we can show that some distributions are dense in a pseudorandom set but fail to have a dense model when consists of constant-depth, polynomial size circuits.
Having contextualized our work some, we now turn to describe our contributions in more detail.
1.2 Contributions
We separate the previously described notions of computational entropy, giving examples where the dense model theorem is false. We are able to prove different separations when is constant-depth unbounded fan-in circuits, low-degree polynomials over a finite field, and, in one case, any test class which cannot efficiently approximate majority (in some sense made explicit later on). The only known separation prior was between pseudodensity and having a dense model for bounded-width read-once branching programs, due to Barak et al. [8].
Let denote the class of unbounded fan-in, size , depth circuits. We are generally thinking of and , which corresponds to the complexity class . 1.3 shows that being -dense in an -pseudorandom set need not imply that has a -dense -model when the test class is :
Theorem 1.3.
Let be arbitrary, and
Then for , there is a random variable over with so that is -dense in an -pseudorandom set but does not have a -dense -model. In particular, the dense model theorem is false in this setting.
Recall that the dense model theorem is false when , which makes the restriction extremely mild. A common regime is , and , in which case this gives us (essentially) a lower bound of weakly exponential in .
Let denote the product distribution of Bernoulli random variables with success probability . Recall that density in a pseudorandom set readily implies pseudodensity, and one can use the dense model theorem to show the converse. We show that -pseudodensity need not imply -density in an -pseudorandom set when the test class is :
Theorem 1.4.
Fix , , and
Then over with is -pseudodense and yet is not -dense inside of any -pseudorandom set.
The dependence means that we can take exponentially smaller than and still obtain a separation. This case corresponds to being ‘very’ fooled by but still not being -dense in a ‘mildly’ pseudorandom set. This result draws on a recent line of work in the pseudorandomness literature — often referred to as ‘the coin problem’ and studied in, e.g., [42, 12, 1, 46] — which concerns the ability of a test class unable to compute majority has in distinguising and . We will discuss this connection in more detail during the proof overviews.
We prove a similar separation for degree- -polynomials (on variables), which generalizes (and uses techniques from) a recent result of Srinivasan [44] in the case where . In this case, we think of a distribution as being -pseudodense for degree- -polynomials when for any degree- polynomial (noting that we are only evaluating over ).
Theorem 1.5.
Fix a finite field with characteristic , and let where is an absolute constant. Suppose that
Then when is the -variate degree- polynomials over with , and , is -pseudodense but is not -dense inside of an -pseudorandom set.
This implies lower bounds for constant-depth circuits with gates by the classical lower bounds of Razborov [37] and Smolensky [43]. Perhaps more interestingly, this holds even over non-prime fields. Also notably, there is no dependence on , so we can take it to be arbitrarily small.
We also prove a more general separation between pseudodensity and density in a pseudorandom set. This result, drawing from the work of [42], provides a more specific characterization of the sense in which dense model theorems are ‘required’ to compute majority.
Theorem 1.6.
Let . Suppose is a test class of boolean functions with the following property: there is no -oracle circuit of size computing majority on bits.
Then is -pseudodense and yet does not have a -dense -model. In particular, when the hypotheses are met, the dense model theorem is false.
Informally, this says that any which can refute the pseudodensity of is only ‘a constant-depth circuit away’ from computing majority.
1.3 Related work
Computational entropy
Computational entropy was studied systematically in [8] and is relevant to various problems in complexity and cryptography such as leakage-resilience [14], constructions of PRGs from one-way functions [25, 21, 20]. and derandomization [13].
There are a number of definitions of computational entropy which we don’t consider in this work. For example, Yao pseudoentropy [51] (see also [8]), corresponding to random variables which are ‘compressible’ by a class of tests , in the sense that can encode and decode the random variable by encoding into a small number of bits. Yao pseudoentropy was recently used in time-efficient hardness-to-randomness tradeoffs [13], where (randomness-efficient) samplers for pseudodense distributions were used with an appropriate extractor to construct a pseudorandom distribution. Another example is inaccessible entropy of Haitner et al. [21], corresponding to the entropy of a message at some round in a two-player protocol conditioned on the prior messages and the randomness of the players, which is used in efficient constructions of statistically hiding commitment schemes from one-way functions [20].
Separating notions of computational entropy has been studied before in [8], who prove a separation of pseudodensity and having a dense model for bounded-width read-once branching programs. Separating notions of conditional computational entropy was studied in [27], showing separations between conditional variants of Yao pseudoentropy and having a dense model.
As mentioned in [27], citing [49] and personal communication with Impagliazzo, another question of interest is whether Yao pseudoentropy (corresponding to efficient encoding/decoding algorithms) implies having dense model. It is not hard to see that small Yao pseudoentropy implies small pseudodensity, with some mild restrictions on . It would be interesting to see if the techniques from this paper can be used to understand Yao pseudoentropy in more detail. We leave this to future work.
Complexity of dense model theorems and hardness amplification
Prior work on the complexity of dense model theorems has included a tight lower bound on the query complexity [52] and a lower bound on the advice complexity [50]. As far as we are aware, this is the first work to consider the computational complexity of dense model theorems.
There has also been prior work on the computational complexity of hardness amplification, establishing that various known strategies for hardness amplification require the computation of majority [34, 42, 19, 41]. It is known that a particular type of hardness amplification given by the hardcore lemma implies the dense model theorem [29].
Our results are stronger in the following sense: previous work [34, 42, 19] shows that black-box hardness amplification proofs require majority. This means that if you amplify the hardness of in some black-box way, then this can be used to compute majority. In our case, we simply show (in different settings) that the dense model theorem is false, regardless of how we tried to prove it. By the connection between the hardcore lemma and the dense model theorem, our results also provide scenarios where the hardcore lemma is false. As far as we are aware, these are the first such scenarios recorded in the literature.
1.4 Technical overview
We discuss two general themes that appear consistently in the proofs and then discuss each of the main theorems in some more detail.
1.4.1 Dense distributions have mostly unbiased bits
A commonly-used observation in theoretical computer science is that most bit positions of a -dense random variable over have bias (see, for example, the introduction of [35]). Relevant to our purposes, it provides a necessary condition for having a -dense -model with respect to any class containing the projections . has a -dense -model, then most bits of have bias . In particular, if all of the bits of have large bias, then it can’t have a dense model.
This is used directly in the proof of 1.3. In this case, we construct a distribution which is -dense in a set which is -pseudorandom for but where the each bit is noticeably biased away from .
In order to prove separations between pseudodensity and being dense in a pseudorandom set — as in 1.4, 1.5 and 1.6 — we need to consider the bias of larger subsets of variables. Considering just two bits is sufficient to prove mild concentration bounds on the weight of pseudorandom strings. This implies that the tails of dense subsets of pseudorandom sets should not be too heavy.
1.4.2 Biased coin distribution
The biased coin distribution, over is the product of Bernoulli random variables with success probability . has recently garnered significant interest in the pseudorandomness literature (see [2, 12, 46, 10, 1]). Shaltiel and Viola [42] showed that if is a test which -distinguishes from , then there is a small, constant-depth circuit with -oracle gates which computes majority on bits. A similar, but qualitatively different, connection due to Limaye et al [33] — extended to any choice of by Srinivasan [44] — shows that any -polynomial with advantage in distinguishing from must have degree . We extend some of these pseudorandomness results regarding to pseudodensity results.
First, we extend the observation of Shaltiel and Viola to apply to tests for which (which corresponds to pseudorandomness when ). This gives us unconditional pseudodensity for test classes which can’t be used in small, constant-depth oracle circuits approximating majority. We also extend the observation of [33] to show lower bounds on the -degree for any function which refutes the pseudodensity of .
In Lemma 4.1, we show that exhibits -pseudodensity for and . This can be seen as a generalization of Tal’s result, building on [12, 1, 42] that is -pseudorandom for .
Tal uses a Fourier analytic proof which becomes very simple given tail bounds on the Fourier spectrum of (the latter being the main contribution of [46]). More generally, any enjoying sufficiently strong tail bounds on the Fourier spectrum (in the norm) cannot distinguish between and uniform. It turns out, as proved by Tal and recorded in Agarwal [2], that if is closed under restrictions than even bounding the first level of the Fourier spectrum works. The proof of Lemma 4.1 based specifically on the switching lemma for constant-depth circuits. While switching lemmas can be used to show Fourier concentration, it would be intersting to find a proof which only uses the assumption of Fourier concentration (or some Fourier-analytic assumption).
1.4.3 1.3
Our goal is to construct a random variable which is dense inside of an -pseudorandom set but where each bit is biased away from . In this case, would be distinguishable from any dense set, since the average bit of a dense set is roughly unbiased. Doing so requires two steps.
The first step is constructing an appropriate distribution that fools circuits. For this we adopt a general strategy of Ajtai and Wigderson [3] (and applied in many contexts in pseudorandomness since; see, e.g., [40]): to fool a circuit , we start by producing a random restriction to simpify to a short decision tree (via the switching lemma), and then we fool the decision tree on the remaining bits using a -wise independent distribution . If we wanted to have small support size, we would need some way of producing random restrictions with a small amount randomness (which is precisely the approach of Ajtai-Wigderson and later work). Fortunately, we only care about the existence of and are therefore content to use the ‘non-derandomized’ switching lemma.
The second step is finding a dense subset of with biased bits. We do this by constructing so that each bit has bias roughly , where is a parameter. This is achieved by randomly bucketing the indices into buckets and assigning each bucket a random bit, which reduces the dimension of the problem from to . This means we can pick a -dense event in with extremal bias — met (up to constants) by the function accepting all strings with weight less than — in order to find a dense subset of with large bias. The bucketing construction introduces some error when a small set hits to distinct elements in some buckets.
1.4.4 1.4
We will show has -pseudodensity for for , . The idea is that can be sampled by first sampling a random restriction which leaves a fraction of the bits unset (and is unbiased on the restricted bits) and then setting the remaining bits with bias . Applying the switching lemma, we conclude that where is a short decision tree (which doesn’t not depend on all of its inputs). A simple calculation reveals that acceptance probability of can increase by at a most a factor when passing from the uniform distribution to . By incorporating the error from the switching lemma (i.e. the advantage lost by conditioning on the switching lemma succeeding), we get -pseudodensity.
To prove the separation, we use the fact that the Hamming weight of a random variable fooling is concentrated around its expectation. This means in particular that if were -dense in a pseudorandom distribution, then the tails of couldn’t be too heavy and therefore couldn’t be too large.
1.4.5 1.5 and 1.6
With and an arbitrary class of tests , suppose that witnesses that fails to have -pseudo-density in the sense that
[44] and [42] both make use of the following simple observation. Given two strings with and , a uniformly random index has distributed as a -biased coin and as an unbiased coin. In our case, applying to sufficiently many random samples from or ‘distinguishes’ the two of them, but in a weaker sense.
In the case of 1.6, we can amplify acceptance probabilities by increasing the size of the circuit by a factor , after which we can apply [42] saying that constant-error distinguishers between and can be used to compute majority.
For 1.5, we apply a beautiful recent result of Srinivasan [44] showing that any -variate polynomial (over a finite field) which vanishes on most points on the slice and doesn’t vanish on most points on the slice must have high degree . One way of interpreting this result is that low-degree polynomials can’t approximately solve certain ‘promise’ versions of majority.
2 Technical tools
We write and use boldface to denote random variables. Let be the set of size , depth- unbounded fan-in circuits. For a boolean function , let denote the depth of the shortest decision tree computing .
2.1 Biased coins
As before, let denote the random variable corresponding to the product of independent coins with bias . That is,
where denotes the Hamming weight of .
For a random variable over and , let . Let be the set of monotone projections. A random variable is -pseudorandom with respect to precisely when each marginal has the property that for each . In particular,
Claim 2.1.
For any , is -pseudorandom with respect to .
2.2 Information theory
The (Shannon) entropy of a random variable is defined as
where is the probability density function corresponding to . The Shannon entropy of random vector is sub-additive, in that . When and , we use to denote the binary entropy function.
The min-entropy is defined as
If is -dense inside of , then its min-entropy is and for any random variable , .
By this latter inequality and subadditivity, the average entropy of ’s bits is at least . Appealing to a quadratic approximation of binary entropy, we learn that the bias must be at most . This result has been referred to as Chang’s inequality and the Level-1 inequality, having been observed in different forms and with different proofs in, for example, [47, 11, 22, 31]. Because it is so simple, we provide a proof here:
Claim 2.2.
If is -dense in , then
Proof.
As -density is equivalent to min-entropy,
by subadditivity of entropy. The entropy of ’s bits, therefore, is at least on average. Taking the Taylor series, we can approximate the binary entropy function around by a quadratic function as . Comparing this bound with the average, we get
meaning . ∎
2.3 Random variables lacking computational entropy
It follows directly from 2.2 that if exceeds for every , then does not have a -dense -model with respect to the projections .
Lemma 2.1.
Let be a random variable with
for every . Then for any and , does not have a -dense -model with respect to .
This is used for the separation in 1.3. We would also like a necessary condition for being dense in a pseudorandom set. Towards this end, we note that pseudorandom distributions for even very simple test classes have mild concentration properties.
Claim 2.3.
Suppose can compute for every and let over be -pseudorandom for . Then
Proof.
We work over instead of to make calculations easier. We can compute the second moment as
Applying Markov’s inequality to , we see that
We use because it maps back to in . Then the conclusion follows from our second moment calculation and converting back to . ∎
The tails of a dense subset can’t be too much larger than the original distribution, by definition of density. This gives us a test for being dense in a pseudorandom set, which we specialize to .
Lemma 2.2.
Let be arbitrary. Suppose can compute for any and . Then is not -dense in any set which is -pseudorandom for .
Proof.
Under , the volume of the threshold is . Taking 2.3 in the contrapositive, we reach the desired conclusion when
∎
2.4 Random restrictions and the switching lemma
A restriction over is a function . Indices in can be thought of as unset and each other index as set. For another restriction so that , let be defined by
Define the restricted function over ’s unset indices by
Let be the distribution on restrictions over obtained by setting independently with probability , and then setting each bit not assigned to a random bit. The switching lemma we use is due to Rossman [39], building on a long line of work [3, 23, 24, 30]:
Theorem 2.1 (Rossman [39]).
Suppose . Then
By considering a random restriction over and a random variable over , the definition of a restricted function implies that
We make crucial use of two simple corollaries of the switching lemma, which allow us to reason about distinguishability for circuits in terms of distinguishability for short decision trees.
Lemma 2.3.
Suppose . Then there is a distribution over depth decision trees so that
Proof.
Let denote the optimal decision tree for . Let denote the event that has depth at most and . Let be the distribution over depth at most decision trees obtained by sampling conditioned on . Then
The right-hand term is bounded in absolute value by because is Boolean. By 2.1, . ∎
Lemma 2.4.
Suppose . Then there’s a depth decision tree so that
3 Proof of 1.3
We start by reducing the problem of constructing a pseudorandom for to constructing a pseudorandom for small-depth decision trees. This can be immediately achieved by applying Lemma 2.4.
Claim 3.1.
Let be arbitrary and suppose is a random variable over which is -pseudorandom for depth- decision trees. Then for , is -pseudorandom for for
The next lemma constructs a pseudorandom distribution for depth- decision trees with each bit having significant bias.
Lemma 3.1.
For any and , there is a -wise independent random variable over and a -dense subset of with the property that
-
1.
is -dense in .
-
2.
For all ,
-
3.
is -pseudorandom for depth- decision trees.
We will use the following standard lower bound on the lower tail of a binomial distribution:
Claim 3.2 ([6]).
For and let be independent unbiased coins (-valued). Then any with for some positive integer satisfies
Proof of Lemma 3.1.
We sample in two stages. First, randomly partition into parts for . Second, assign to each a uniformly random bit .
Let be conditioned on having weight less than . Since the ’s are unbiased random bits, we can apply 3.2 to lower bound ’s density: for any ,
This is at least when
with the upper bound in the last line following from . Hence, if the set of strings with weight at most is -dense, we have . is at most when , in which case . In particular, this lower bounds the bias of ’s bits.
To see why it’s -pseudorandom for depth- decision trees, consider a depth- decision tree . Over , we can imagine evaluting ‘on-line’ as follows: whenever queries the th bit, determine the value of by flipping an unbiased coin. Over , we can imagine evaluating similarly, where we determine the bucket that lives in and the value of that bucket.
By conditioning on not placing two distinct indices in the same bucket — call this conditioned random variable — then doesn’t have any distinguish advantage over , as all of the bits it queries are independent and uniform. By a union bound, places two distinct indices in the same bucket with probability at most . ’s distinguishing advantage is therefore at most . ∎
In principle, we could have used other pseudorandom distributions for decision trees such as the -almost -wise independent distributions from [4]. The construction here is used to obtain better dependence on the parameters of interest. We will also need a claim to expresses the bias of the bits in . The proof can be found in the appendix.
Claim 3.3.
Fix and a random variable . Let be an event which is independent from (in that the conditional distribution of is identical to the unconditioned distribution). Then
1.3, which we restate here, is obtained by an appropriate setting of parameters.
See 1.3
Proof.
Let , and . We also need by the restriction in Lemma 3.1, which explaines the restriction , simplified by using (a stronger restriction) instead. Let and be the random variables from Lemma 3.1. By 3.3, the bias of (where ) is . By 3.1 and Lemma 3.1, is pseudorandom. We can also ensure that does not have a -dense -model when , by Lemma 2.1.
By substituting, . In comparison, . Recalling that , we get that
The claim follows by solving for . ∎
4 Proof of 1.4
Lemma 4.1.
has -pseudodensity for for and .
Of note, the only additive error depends on the error from the switching lemma. Compare this with the claim that is -pseudorandom (and therefore has the same pseudodensity for ) for , due to Tal [46].
To prove the lemma, we need a few claims.
Claim 4.1.
Suppose . Then there is a depth- decision tree with the property that:
Proof.
Second, we can upper bound the extent to which the acceptance probability of a short decision tree increases when passing from the uniform distribution to the biased distribution
Claim 4.2.
Suppose is a depth- decision tree. Then
The proof amounts to a calculation, which we include in the appendix. We’re now in a position to prove the lemma.
Proof of Lemma 4.1.
We can now prove 1.4, restated here:
See 1.4
Proof of 1.4.
Let , and . These choices satisfy , meaning is not -dense in any -pseudorandom set for , by Lemma 2.2.
By Lemma 4.1, has -pseudodensity for and .
The constraint on the density implies
Plugging this value of into the expression for , we get
Note that . Solving for gives the claimed bound. ∎
5 Proofs of 1.5 and 1.6
In this section, we prove 1.5 and 1.6. We include them in the same section due to their similarity and start with 1.5 which is more involved. Indeed, 1.6 will follow directly from a result of Shaltiel and Viola [42] after an appropriate error reduction procedure.
5.1 Proof of 1.5
As in 1.4, we will prove unconditional pseudodensity for and compare it with the lower bound for given by Lemma 2.2.
Let denote the (random variable uniform over the) set of -bit strings of weight exactly . We convert a function which refutes the pseudodensity of into a (random) function which distinguishes between the two ‘slices’ of the hypercube of weight and , which extend similar ideas from [42, 33, 44].
Lemma 5.1.
Let be a function for which and let . Then for any positive integer , there is a random function so that
Moreover, is the OR of copies of (on random inputs).
Proof.
Fix an input . For , let denote the random length sequence over obtained by sampling indices independently, uniformly at random (with replacement).
The random function is defined as
We can evaluate ’s acceptance probability on and as follows:
-
1.
Suppose . Then each is distributed as on bits, meaning the ’s constitute independent samples from . By definition, accepts with probability over and so the probability that doesn’t accept in independent runs is .
-
2.
Suppose . Then is a uniformly random string, meaning we have independent samples from . For each sample, outputs independently with probability , so that this occurs once in attempts happens with probability at most . Since , we can bound this by .
∎
Lemma 5.2.
Fix and for some absolute constant Let be a field of positive characteristic and let be the set of all polynomials with degree at most . Then if has -pseudodensity with respect to , .
Note that there is no dependence on . This is a relic of the field size , which allows us to approximate large fan-in ORs with polynomials whose degree does not depend on the fan-in but only the quality of approximation.
We will proceed in the contrapositive; if isn’t pseudodense, we will use the witness to construct a polynomial which we can prove degree lower bounds on directly. By ensuring that the degrees are related by a constant factor, we get the lower bound. The proof uses the following approximation of the OR function as a low-degree polynomial over when has positive characteristic (in characteristic zero, there is dependence on the fan-in of the OR, which we want to avoid).
Claim 5.1 ([37]).
For any and any finite field with characteristic , there is a distribution on degree polynomials so that for all
where . Moreover, is supported on polynomials with for every
We also use a special case of the robust Hegëdus lemma, discovered recently by Srinivasan [44].
Lemma 5.3 (Robust Hegëdus lemma (special case), [44]).
Let be a finite field. Let where is a (small) absolute constant. Let be an integer so that . Then if is a degree polynomial for which:
-
1.
-
2.
Then .
Now we can prove the main lemma.
Proof of Lemma 5.2.
Let be a degree polynomial. We can assume takes boolean values over by replacing with ( being ’s characteristic). This increases the degree to less than . Assume towards a contradiction that can certify that is not -pseudodense, in the sense that .
Let . Applying Lemma 5.1 to (noting that is boolean on ) with specified later, we obtain a random polynomial of the form . For a fixed , let For to be determined later, let be the random polynomial from 5.1 and we remark that can be made to output boolean values on boolean inputs. Then for any , so with probability . In sum, has the property that:
Moreover, is a random polynomial over of degree at most , with the coming from possibly replacing with .
We now apply Lemma 5.3 to . Let denote the constant from the statement of the lemma. Let , , and where the inequality is because follows from . With this setting of parameters, note that, in regards to the hypotheses of Lemma 5.3, we have ,
By calculating, ’s classifies incorrectly with probability and classifies incorrectly with probability , which we can bound away from by choosing to be a sufficiently small constant.
Therefore, by Lemma 5.3, the degree of some polynomial in the support of is . Since and (’s characteristic) are constants, the same conclusion therefore holds of . ∎
We can now prove 1.5, restated here:
See 1.5
Proof of 1.5.
If is -pseudodense with respect to degree- polynomials over , then . On the other hand, taking implies that is not -dense in an -pseudorandom set. Therefore, if we take then picking will give us a separation. ∎
5.2 Proof of 1.6
In what is becoming tradition, we will show an unconditional pseudodensity result, from which 1.6 will follow by Lemma 2.2.
Lemma 5.4.
Let and be arbitrary. Suppose is a test class of boolean functions with the following property: there is no -oracle circuit of size which computes on input bits. Then is -pseudodense.
This will follow from the result of from Shaltiel and Viola:
Theorem 5.1 ([42]).
Let be a function that distinguishes between and with constant distinguishing probability. Then there is an -circuit of size using -oracle gates which computes majority on bits.
Proof of Lemma 5.4.
Suppose is not -pseudodense for as in the statement of the theorem and let witness this fact with .
Let
where each is a random permutation of . When , is distributed as and likewise for . Hence rejects samples from with probability which is constant when , this latter property following because . Additionally, accepts samples from with probability , which is at most for our choice of . Averaging, some function in ’s support distinguishes and with constant advantage. Applying 5.1 yields a small circuit computing majority, which is a contradiction. ∎
See 1.6
The proof is the same as 1.5, so we omit the details.
References
- [1] Scott Aaronson. Bqp and the polynomial hierarchy. In Proceedings of the forty-second ACM symposium on Theory of computing, pages 141–150, 2010.
- [2] Rohit Agrawal. Coin theorems and the fourier expansion. arXiv preprint arXiv:1906.03743, 2019.
- [3] Miklos Ajtai and Avi Wigderson. Deterministic simulation of probabilistic constant depth circuits. In 26th Annual Symposium on Foundations of Computer Science (sfcs 1985), pages 11–19. IEEE, 1985.
- [4] Noga Alon, Oded Goldreich, Johan Håstad, and René Peralta. Simple constructions of almost k-wise independent random variables. Random Structures & Algorithms, 3(3):289–304, 1992.
- [5] Sanjeev Arora, Rong Ge, Yingyu Liang, Tengyu Ma, and Yi Zhang. Generalization and equilibrium in generative adversarial nets (gans). In Proceedings of the 34th International Conference on Machine Learning-Volume 70, pages 224–232. JMLR. org, 2017.
- [6] R.B. Ash. Information Theory. Dover books on advanced mathematics. Dover Publications, 1990.
- [7] Boaz Barak, Moritz Hardt, and Satyen Kale. The uniform hardcore lemma via approximate bregman projections. In Proceedings of the twentieth annual ACM-SIAM symposium on Discrete algorithms, pages 1193–1200. SIAM, 2009.
- [8] Boaz Barak, Ronen Shaltiel, and Avi Wigderson. Computational analogues of entropy. In Approximation, Randomization, and Combinatorial Optimization.. Algorithms and Techniques, pages 200–215. Springer, 2003.
- [9] Thomas F Bloom and Olof Sisask. Breaking the logarithmic barrier in roth’s theorem on arithmetic progressions. arXiv preprint arXiv:2007.03528, 2020.
- [10] Joshua Brody and Elad Verbin. The coin problem and pseudorandomness for branching programs. In 2010 IEEE 51st Annual Symposium on Foundations of Computer Science, pages 30–39. IEEE, 2010.
- [11] Mei-Chu Chang et al. A polynomial bound in freiman’s theorem. Duke mathematical journal, 113(3):399–419, 2002.
- [12] Gil Cohen, Anat Ganor, and Ran Raz. Two sides of the coin problem. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2014). Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2014.
- [13] Dean Doron, Dana Moshkovitz, Justin Oh, and David Zuckerman. Nearly optimal pseudorandomness from hardness. Technical report, ECCC preprint TR19-099, 2019.
- [14] Stefan Dziembowski and Krzysztof Pietrzak. Leakage-resilient cryptography. In 2008 49th Annual IEEE Symposium on Foundations of Computer Science, pages 293–302. IEEE, 2008.
- [15] Yoav Freund and Robert Schapire. A short introduction to boosting. Journal-Japanese Society For Artificial Intelligence, 14(771-780):1612, 1999.
- [16] W. T. Gowers. A new proof of szemerédi’s theorem. Geometric & Functional Analysis GAFA, 11(3):465–588, 2001.
- [17] W. T. Gowers. Decompositions, approximate structure, transference, and the hahn–banach theorem. Bulletin of the London Mathematical Society, 42(4):573–606, 2010.
- [18] Ben Green and Terence Tao. The primes contain arbitrarily long arithmetic progressions. Annals of Mathematics, pages 481–547, 2008.
- [19] Aryeh Grinberg, Ronen Shaltiel, and Emanuele Viola. Indistinguishability by adaptive procedures with advice, and lower bounds on hardness amplification proofs. In 2018 IEEE 59th Annual Symposium on Foundations of Computer Science (FOCS), pages 956–966. IEEE, 2018.
- [20] Iftach Haitner, Omer Reingold, and Salil Vadhan. Efficiency improvements in constructing pseudorandom generators from one-way functions. SIAM Journal on Computing, 42(3):1405–1430, 2013.
- [21] Iftach Haitner, Omer Reingold, Salil Vadhan, and Hoeteck Wee. Inaccessible entropy. In Proceedings of the forty-first annual ACM symposium on Theory of computing, pages 611–620, 2009.
- [22] Lianna Hambardzumyan and Yaqiao Li. Chang’s lemma via pinsker’s inequality. Discrete Mathematics, 343(1):111496, 2020.
- [23] Johan Håstad. Computational limitations of small-depth circuits. 1987.
- [24] Johan Håstad. On the correlation of parity and small-depth circuits. SIAM Journal on Computing, 43(5):1699–1708, 2014.
- [25] Johan Håstad, Russell Impagliazzo, Leonid A Levin, and Michael Luby. A pseudorandom generator from any one-way function. SIAM Journal on Computing, 28(4):1364–1396, 1999.
- [26] Thomas Holenstein. Key agreement from weak bit agreement. In Proceedings of the thirty-seventh annual ACM symposium on Theory of computing, pages 664–673, 2005.
- [27] Chun-Yuan Hsiao, Chi-Jen Lu, and Leonid Reyzin. Conditional computational entropy, or toward separating pseudoentropy from compressibility. In Annual International Conference on the Theory and Applications of Cryptographic Techniques, pages 169–186. Springer, 2007.
- [28] Russell Impagliazzo. Hard-core distributions for somewhat hard problems. In Proceedings of IEEE 36th Annual Foundations of Computer Science, pages 538–545. IEEE, 1995.
- [29] Russell Impagliazzo. Connections between pseudo-randomness and machine learning: boosting, dense models, and regularity. 2020.
- [30] Russell Impagliazzo, William Matthews, and Ramamohan Paturi. A satisfiability algorithm for ac0. In Proceedings of the twenty-third annual ACM-SIAM symposium on Discrete Algorithms, pages 961–972. SIAM, 2012.
- [31] Russell Impagliazzo, Cristopher Moore, and Alexander Russell. An entropic proof of chang’s inequality. SIAM Journal on Discrete Mathematics, 28(1):173–176, 2014.
- [32] Adam R Klivans and Rocco A Servedio. Boosting and hard-core set construction. Machine Learning, 51(3):217–238, 2003.
- [33] Nutan Limaye, Karteek Sreenivasaiah, Srikanth Srinivasan, Utkarsh Tripathi, and S Venkitesh. A fixed-depth size-hierarchy theorem for via the coin problem. In Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing, pages 442–453, 2019.
- [34] Chi-Jen Lu, Shi-Chun Tsai, and Hsin-Lung Wu. Complexity of hard-core set proofs. computational complexity, 20(1):145–171, 2011.
- [35] Or Meir and Avi Wigderson. Prediction from partial information and hindsight, with application to circuit lower bounds. computational complexity, 28(2):145–183, 2019.
- [36] Ilya Mironov, Omkant Pandey, Omer Reingod, and Salil Vadhan. Computational differential privacy. In Annual International Cryptology Conference, pages 126–142. Springer, 2009.
- [37] Alexander A Razborov. Lower bounds on the size of bounded depth circuits over a complete basis with logical addition. Mathematical Notes of the Academy of Sciences of the USSR, 41(4):333–338, 1987.
- [38] Omer Reingold, Luca Trevisan, Madhur Tulsiani, and Salil Vadhan. Dense subsets of pseudorandom sets. In 2008 49th Annual IEEE Symposium on Foundations of Computer Science, pages 76–85. IEEE, 2008.
- [39] Benjamin Rossman. An entropy proof of the switching lemma and tight bounds on the decision-tree size of ac0, 2017.
- [40] Rocco A Servedio and Li-Yang Tan. Improved pseudorandom generators from pseudorandom multi-switching lemmas. arXiv preprint arXiv:1801.03590, 2018.
- [41] Ronen Shaltiel. Is it possible to improve yao’s xor lemma using reductions that exploit the efficiency of their oracle? In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2020). Schloss Dagstuhl-Leibniz-Zentrum für Informatik, 2020.
- [42] Ronen Shaltiel and Emanuele Viola. Hardness amplification proofs require majority. SIAM Journal on Computing, 39(7):3122–3154, 2010.
- [43] Roman Smolensky. On representations by low-degree polynomials. In Proceedings of 1993 IEEE 34th Annual Foundations of Computer Science, pages 130–138. IEEE, 1993.
- [44] Srikanth Srinivasan. A robust version of hegedus’s lemma, with applications. In Konstantin Makarychev, Yury Makarychev, Madhur Tulsiani, Gautam Kamath, and Julia Chuzhoy, editors, Proccedings of the 52nd Annual ACM SIGACT Symposium on Theory of Computing, STOC 2020, Chicago, IL, USA, June 22-26, 2020, pages 1349–1362. ACM, 2020.
- [45] Endre Szemerédi. On sets of integers containing no four elements in arithmetic progression. Acta Mathematica Academiae Scientiarum Hungarica, 20(1-2):89–104, 1969.
- [46] Avishay Tal. Tight bounds on the fourier spectrum of ac0. In 32nd Computational Complexity Conference (CCC 2017). Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2017.
- [47] Michel Talagrand. How much are increasing sets positively correlated? Combinatorica, 16(2):243–258, 1996.
- [48] Terence Tao, Tamar Ziegler, et al. The primes contain arbitrarily long polynomial progressions. Acta Mathematica, 201(2):213–305, 2008.
- [49] Luca Trevisan, Madhur Tulsiani, and Salil Vadhan. Regularity, boosting, and efficiently simulating every high-entropy distribution. In 2009 24th Annual IEEE Conference on Computational Complexity, pages 126–136. IEEE, 2009.
- [50] Thomas Watson. Advice lower bounds for the dense model theorem. ACM Transactions on Computation Theory (TOCT), 7(1):1–18, 2015.
- [51] Andrew C Yao. Theory and application of trapdoor functions. In 23rd Annual Symposium on Foundations of Computer Science (SFCS 1982), pages 80–91. IEEE, 1982.
- [52] Jiapeng Zhang. On the query complexity for showing dense model. 2011.
Appendix A Omitted proofs
A.1 Proof of 3.3
See 3.3
Proof.
Let be the event that .
where we used independence of from and the independence of from to obtain the final line. ∎
A.2 Proof of 4.2
See 4.2
Proof.
We proceed by induction. When , is constant and so the claim holds trivially. Suppose where have depth- decision trees which don’t depend on . Note that and assume that . For ,
(independence) | ||||
(induction) | ||||
(assuming ) | ||||
The argument for is the same. ∎