
Comparing computational entropies below majority
(or: When is the dense model theorem false?)

Russell Impagliazzo (UCSD CSE, [email protected]) and Sam McGuire (UCSD CSE, [email protected]). Research supported by NSF award 1909634 and a Simons Investigator award.
Abstract

Computational pseudorandomness studies the extent to which a random variable 𝐙\mathbf{Z} looks like the uniform distribution according to a class of tests \mathcal{F}. Computational entropy generalizes computational pseudorandomness by studying the extent to which a random variable looks like a high entropy distribution. There are different formal definitions of computational entropy with different advantages for different applications. Because of this, it is of interest to understand when these definitions are equivalent.

We consider three notions of computational entropy which are known to be equivalent when the test class \mathcal{F} is closed under taking majorities. This equivalence constitutes (essentially) the so-called dense model theorem of Green and Tao (and later made explicit by Tao-Ziegler, Reingold et al., and Gowers). The dense model theorem plays a key role in Green and Tao’s proof that the primes contain arbitrarily long arithmetic progressions and has since been connected to a surprisingly wide range of topics in mathematics and computer science, including cryptography, computational complexity, combinatorics and machine learning. We show that, in different situations where \mathcal{F} is not closed under majority, this equivalence fails. This in turn provides examples where the dense model theorem is false.

1 Introduction

Computational pseudorandomness is a central topic in theoretical computer science. In this scenario, one has a class \mathcal{F} of boolean functions f:{0,1}n{0,1}f:\{0,1\}^{n}\to\{0,1\} (which we’ll refer to as tests) and a random variable 𝐙\mathbf{Z} over {0,1}n\{0,1\}^{n}. We say that 𝐙\mathbf{Z} is ε\varepsilon-pseudorandom with respect to \mathcal{F} if maxf|𝔼[f(𝐙)]𝔼[f(𝐔)]|ε\max_{f\in\mathcal{F}}|\mathbb{E}[f(\mathbf{Z})]-\mathbb{E}[f(\mathbf{U})]|\leq\varepsilon where 𝐔\mathbf{U} is the uniform distribution over {0,1}n\{0,1\}^{n} and ε>0\varepsilon>0 is small. In this case, we think of 𝐙\mathbf{Z} as ‘behaving like the uniform distribution’ according to tests in \mathcal{F}. In general, we say that two random variables 𝐗\mathbf{X}, 𝐘\mathbf{Y} are ε\varepsilon-indistinguishable by \mathcal{F} if maxf|𝔼[f(𝐗)]𝔼[f(𝐘)]|ε\max_{f\in\mathcal{F}}|\mathbb{E}[f(\mathbf{X})]-\mathbb{E}[f(\mathbf{Y})]|\leq\varepsilon (and so ε\varepsilon-pseudorandom distributions are exactly those which are ε\varepsilon-indistinguishable from 𝐔\mathbf{U}). Constructing explicit 𝐙\mathbf{Z}’s which behave like the uniform distribution according to different test classes is among the central goals of complexity theory, with sufficiently strong constructions leading to, for example, derandomization of 𝖡𝖯𝖯{\mathsf{BPP}}. One way in which the theory of pseudorandomness is rich is that there are multiple equivalent formulations of pseudorandomness, such as Yao’s next bit test ([51]).
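To make these definitions concrete, here is a minimal Python sketch (ours, not from the paper) computing distinguishing advantage by brute force over explicitly represented distributions; the parity-biased distribution and the projection test class are hypothetical choices for illustration.

```python
import itertools

def advantage(f, X, Y):
    """Distinguishing advantage |E[f(X)] - E[f(Y)]| of one test f, where X and Y
    are distributions given as dicts mapping n-bit tuples to probabilities."""
    ex = sum(p * f(z) for z, p in X.items())
    ey = sum(p * f(z) for z, p in Y.items())
    return abs(ex - ey)

def is_indistinguishable(tests, X, Y, eps):
    """X and Y are eps-indistinguishable by `tests` iff max_f |E[f(X)] - E[f(Y)]| <= eps."""
    return max(advantage(f, X, Y) for f in tests) <= eps

n = 3
U = {z: 2**-n for z in itertools.product([0, 1], repeat=n)}    # uniform distribution
Z = {z: (2**-(n - 1) if sum(z) % 2 == 0 else 0.0) for z in U}  # uniform on even-parity strings
projections = [lambda z, i=i: z[i] for i in range(n)]          # tests f(z) = z_i
print(is_indistinguishable(projections, Z, U, 0.01))           # True
```

Against the class of bit projections, the distribution Z above is perfectly pseudorandom even though it is far from uniform; a parity test would distinguish the two completely, illustrating how pseudorandomness depends on the test class \mathcal{F}.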

The various notions of pseudo-entropy and pseudo-density generalize pseudo-randomness to formalize how much randomness a distribution appears to have, as far as a given class of tests can perceive. Many of these notions were first introduced as stepping stones towards pseudo-randomness, giving properties of sub-routines within constructions of pseudo-random generators. However, measuring seeming randomness quantitatively is important in many other contexts, so these notions have found wider application. For example, in mathematical subjects such as combinatorics and number theory, there is a general phenomenon of “structure vs. randomness”, where a deterministically defined object such as a graph or set of integers can be decomposed into a structured part and a random part. Pseudo-entropy quantifies how much randomness the “random part” has. Notions of pseudo-density were used in this context by Green, Tao, and Ziegler [18, 48] to show that the primes contain arbitrarily long arithmetic progressions. We can also use pseudo-entropy notions to characterize the amount of seeming randomness that remains in a cryptographic key after it has been partially compromised by a side-channel attack. A data set used in a machine learning algorithm might not have much randomness in itself, and might not be completely random looking, but is hopefully representative of the much larger set of inputs that the results of the algorithm will be applied to, so we can use notions of pseudo-entropy to say when such algorithms will generalize. There are many possible definitions of this intuitive idea, and as with pseudo-randomness, the power of pseudo-entropy is that many of these notions have been related or proven equivalent.

In particular, the dense model theorem provides such a basic equivalence. Here, the intuitive concept we are trying to capture is the density (or relative min-entropy) of the target distribution within a larger distribution: what fraction of the larger distribution the target occupies. We say that 𝐙\mathbf{Z} is δ\delta-dense if 𝔼[μ(x)]=2nxμ(x)δ\mathbb{E}[\mu(x)]=2^{-n}\sum_{x}\mu(x)\geq\delta where μ:{0,1}n[0,1]\mu:\{0,1\}^{n}\to[0,1] is the density function defining 𝐙\mathbf{Z} (in the sense that Pr[𝐙=z]=μ(z)/(2n𝔼[μ(x)])\Pr[\mathbf{Z}=z]=\mu(z)/(2^{n}\mathbb{E}[\mu(x)])). One application of indistinguishability from a dense distribution is as a stepping stone to pseudorandomness: if 𝐙\mathbf{Z} is indistinguishable from a distribution 𝐌\mathbf{M} with density δ\delta within the uniform distribution, then applying a randomness extractor for min-entropy nlog(1/δ)n-\log(1/\delta) to 𝐙\mathbf{Z} yields a pseudorandom distribution. A more sophisticated application comes from additive number theory. It is not hard to show that a random subset of [N]={1,2,,N}[N]=\{1,2,...,N\} (including each element with probability 1/21/2, say) contains many arithmetic progressions (which are sets of the form {a,a+b,a+2b,a+3b,}\{a,a+b,a+2b,a+3b,...\}). Szemerédi [45] showed that, in fact, sufficiently dense subsets of the integers also contain such arithmetic progressions: specifically, that for any kk, the size of the largest subset of [N][N] which doesn’t contain a kk-term arithmetic progression grows like o(N)o(N).

So we would like some technology to reason about random variables 𝐙\mathbf{Z} which ‘behave like dense distributions’. It turns out, however, that formalizing what it means for 𝐙\mathbf{Z} to ‘behave like a dense distribution’ is subtle. Here are three perfectly legitimate candidates:

Candidate 1:

𝐙\mathbf{Z} behaves like a δ\delta-dense distribution if it behaves like something that’s δ\delta-dense. Formally, this means that 𝐙\mathbf{Z} is ε\varepsilon-indistinguishable from some δ\delta-dense distribution. In this case, we say that 𝐙\mathbf{Z} has a δ\delta-dense ε\varepsilon-model.

Candidate 2:

𝐙\mathbf{Z} behaves like a δ\delta-dense distribution if it’s δ\delta-dense inside of something that behaves like the uniform distribution. Formally, this means there’s an ε\varepsilon-pseudorandom distribution 𝐗\mathbf{X} in which 𝐙\mathbf{Z} is δ\delta-dense. In this case, we say that 𝐙\mathbf{Z} is δ\delta-dense in an ε\varepsilon-pseudorandom set.

Candidate 3:

𝐙\mathbf{Z} behaves like a δ\delta-dense distribution if it appears to be the case that conditioning on 𝐙\mathbf{Z} increases the measure of any set by at most (roughly) a factor of 1/δ1/\delta. This is an operational definition: conditioning on a (truly) δ\delta-dense set increases the measure of any set by at most a factor of 1/δ1/\delta, so we should expect the same behavior from things that behave like a dense set. Formally, this means that δ𝔼[f(𝐙)]𝔼[f(𝐔)]+ε\delta\mathbb{E}[f(\mathbf{Z})]\leq\mathbb{E}[f(\mathbf{U})]+\varepsilon for any ff in our test class \mathcal{F} (see the sketch after this definition). In this case, we say that 𝐙\mathbf{Z} has (ε,δ)(\varepsilon,\delta)-pseudodensity.
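Candidate 3 is the easiest to check mechanically. Here is a minimal Python sketch (ours, not from the paper) that tests (ε,δ)-pseudodensity by brute force over a finite test class; the example distribution and test class are hypothetical:

```python
import itertools

def is_pseudodense(tests, Z, U, eps, delta):
    """Candidate 3: delta * E[f(Z)] <= E[f(U)] + eps for every test f."""
    for f in tests:
        ez = sum(p * f(z) for z, p in Z.items())
        eu = sum(p * f(z) for z, p in U.items())
        if delta * ez > eu + eps:
            return False  # f witnesses that Z is not pseudodense
    return True

# A truly delta-dense Z passes with eps = 0: here Z is uniform on the half
# of {0,1}^3 whose first bit is 0, so Z is 1/2-dense in U.
n = 3
U = {z: 2**-n for z in itertools.product([0, 1], repeat=n)}
Z = {z: (2**-(n - 1) if z[0] == 0 else 0.0) for z in U}
tests = [lambda z, i=i: z[i] for i in range(n)]
print(is_pseudodense(tests, Z, U, eps=0.0, delta=0.5))  # True
```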

Precisely which definition you pick will depend on what you know about 𝐙\mathbf{Z} and in what sense you would like it to behave like a δ\delta-dense distribution. Indeed, each of these definitions has appeared in different applications ([25], [18], [13], respectively), so there are scenarios where each of these types of behavior is desired. In general, the first candidate is the strongest (and, arguably, the most natural), but it is sometimes hard to establish that a distribution has the property. The following claim gives some simple relationships between the definitions:

Claim 1.1.

For any \mathcal{F}, the following hold:

1. If 𝐙\mathbf{Z} has a δ\delta-dense ε\varepsilon-model, then 𝐙\mathbf{Z} is δ\delta-dense in an ε\varepsilon-pseudorandom set.

2. If 𝐙\mathbf{Z} is δ\delta-dense in an ε\varepsilon-pseudorandom set, then 𝐙\mathbf{Z} has (ε,δ)(\varepsilon,\delta)-pseudodensity.

Proof sketch.
1. Let \mathbf{M} be the δ\delta-dense ε\varepsilon-model for \mathbf{Z}. Since \mathbf{M} is δ\delta-dense in \mathbf{U}, we can write \mathbf{U}=\delta\mathbf{M}+(1-\delta)\overline{\mathbf{M}} for some distribution \overline{\mathbf{M}}. So \mathbf{U}^{\prime}=\delta\mathbf{Z}+(1-\delta)\overline{\mathbf{M}} satisfies |\mathbb{E}[f(\mathbf{U}^{\prime})]-\mathbb{E}[f(\mathbf{U})]|=\delta|\mathbb{E}[f(\mathbf{Z})]-\mathbb{E}[f(\mathbf{M})]|\leq\delta\varepsilon for every f\in\mathcal{F}, so \mathbf{U}^{\prime} is ε\varepsilon-pseudorandom and \mathbf{Z} is δ\delta-dense within it.

2. Suppose \mathbf{Z} is δ\delta-dense in \mathbf{Z}^{\prime} which is ε\varepsilon-pseudorandom for \mathcal{F}. Then for any f\in\mathcal{F}, \delta\mathbb{E}[f(\mathbf{Z})]\leq\mathbb{E}[f(\mathbf{Z}^{\prime})]\leq\mathbb{E}[f(\mathbf{U})]+\varepsilon. ∎

The marvelous quality of these three candidates in particular is that, for many natural \mathcal{F}, all of them are equivalent, and so establishing even (ε′,δ)(\varepsilon^{\prime},\delta)-pseudodensity is enough to guarantee the existence of a δ\delta-dense ε\varepsilon-model.

This equivalence holds for \mathcal{F} which are closed under majority, meaning for any kk (which we can think of as k=O(1)k=O(1) for now), if f1,,fkf_{1},...,f_{k}\in\mathcal{F} then 𝖬𝖠𝖩k(f1,,fk)\mathsf{MAJ}_{k}(f_{1},...,f_{k})\in\mathcal{F}, where 𝖬𝖠𝖩k:{0,1}k{0,1}\mathsf{MAJ}_{k}:\{0,1\}^{k}\to\{0,1\} is 11 if at least half of its input bits are 11. In fact, it holds for more general \mathcal{F} if we allow the distinguishing parameter (ε\varepsilon^{\prime} in (ε,δ)(\varepsilon^{\prime},\delta)-pseudodensity) to be exponentially small (as in the original formulation, which we’ll discuss later on). In this case, the subtlety in defining what it means to behave like a dense set vanishes. These equivalences constitute (essentially) what is known as the dense model theorem, originating in the work of Green-Tao [18] and Tao-Ziegler [48], and independently in Barak et al. [8] (though in different guises). This result has been fruitfully applied in many seemingly unrelated areas of mathematics and computer science: additive number theory [18, 48] where \mathcal{F} encodes additive information about subsets of {1,,N}\{1,...,N\} (or possibly a more general group), graph theory [49, 38] where \mathcal{F} encodes cuts in a fixed graph, circuit complexity [49], Fourier analysis [29], machine learning [29] and leakage-resilient cryptography [14]. The ubiquity of the dense model theorem motivates a simple question: are there natural scenarios in which the dense model theorem is false?

We show that the answer to this question is yes. In particular, we show that for each implication from Claim 1.1 there is a class \mathcal{F} and a random variable 𝐙\mathbf{Z} so that the converse fails to hold. From the computational entropy perspective, we show that the three computational entropies we’ve discussed are inequivalent for certain test classes \mathcal{F}. Necessarily (with ε\varepsilon^{\prime} not exponentially small) these classes are not closed under majority and so we will need to look ‘below’ majority in order to find our counterexamples.

1.1 The dense model theorem

We turn to discuss the dense model theorem in some more detail to better contextualize our work. Restricting our attention to random variables over {0,1}n\{0,1\}^{n}, the dense model theorem states the following:

Theorem 1.1 (Dense model theorem).

Let \mathcal{F} be a class of tests f:{0,1}n{0,1}f:\{0,1\}^{n}\to\{0,1\} and 𝐙\mathbf{Z} a random variable over {0,1}n\{0,1\}^{n} with (εδ,δ)(\varepsilon\delta,\delta)-pseudodensity with respect to 𝖬𝖠𝖩k\mathsf{MAJ}_{k}\circ\mathcal{F} for k=O(log(1/δ)/ε2)k=O(\log(1/\delta)/\varepsilon^{2}). Then 𝐙\mathbf{Z} has a δ\delta-dense ε\varepsilon-model with respect to \mathcal{F}.

We will generally also consider a parameter ε\varepsilon^{\prime}, which in this case is εδ\varepsilon\delta, the additive error in pseudodensity. To get an intuition for what this is saying, let’s consider a setting where it’s false but for trivial reasons. As a simple example given in [52], pick 𝐙\mathbf{Z} to be a (1ε)(1-\varepsilon) fraction of another set 𝐒\mathbf{S} of size δ2n\delta 2^{n}. Then 𝐙\mathbf{Z} doesn’t have a δ\delta-dense ε\varepsilon-model (i.e. 𝐒\mathbf{S}) with respect to 𝐙\mathbf{Z}’s indicator function, which we’ll call ff. On the other hand, the distribution 𝐖\mathbf{W} obtained by sampling 𝐙\mathbf{Z} with probability δ\delta and sampling from 𝐒\mathbf{S}’s complement with probability 1δ1-\delta is at most εδ\varepsilon\delta-distinguishable from 𝐔\mathbf{U} for any function, since εδ\varepsilon\delta is simply the measure of the difference between 𝐒\mathbf{S} and 𝐙\mathbf{Z}. In particular 𝐙\mathbf{Z} is δ\delta-dense in the εδ\varepsilon\delta-pseudorandom 𝐖\mathbf{W} (which implies, via Claim 1.1, that it is (εδ,δ)(\varepsilon\delta,\delta)-pseudodense). This means that Theorem 1.1 is tight in its dependence on ε=εδ\varepsilon^{\prime}=\varepsilon\delta, in that it becomes false for ε=Ω(εδ)\varepsilon^{\prime}=\Omega(\varepsilon\delta). In many instances, we think of ε=1/𝗉𝗈𝗅𝗒(n)\varepsilon=1/{\mathsf{poly}}(n), δ\delta constant (or perhaps with mild dependences on nn) and ε=δε\varepsilon^{\prime}=\delta\varepsilon.

Originally, the dense model theorem was proved with a different (and stronger) assumption; namely, that 𝐙\mathbf{Z} is dense in a pseudorandom set. Green and Tao, in proving that the primes contain arbitrarily long arithmetic progressions, used it to the following effect: if 𝐙\mathbf{Z} is the set of prime numbers up to nn, then its density is known to behave like Θ(1/logn)\Theta(1/\log n). On the other hand, Szemerédi [45] showed that sufficiently dense subsets of \mathbb{Z} contain arbitrarily long arithmetic progressions. The best bounds for Szemerédi’s theorem require density ω(1/loglogn)\omega(1/\log\log n), which is much larger than the density of the primes (see [16] and the recent [9] for more on the rich history of this and related problems). Not all is lost, however: the only property of dense sets that we’re interested in is that they contain arithmetic progressions. So Green and Tao construct a class \mathcal{F} of tests which can ‘detect’ arithmetic progressions and under which the primes are dense inside of an \mathcal{F}^{\prime}-pseudorandom set (more on \mathcal{F}^{\prime} later). By applying the dense model theorem, we conclude that the primes ‘look like’ a dense set (and dense sets have long arithmetic progressions) with respect to the class \mathcal{F}. As \mathcal{F} detects arithmetic progressions, it must be the case that the primes possess them. Of course, many details need to be filled in, but we hope this example shows the reader the ‘spirit’ of the dense model theorem.

A primary source of interest in the dense model theorem is in the connections it shares with seemingly unrelated branches of mathematics and computer science. The original application was in additive number theory, but it was independently discovered and proved in the context of cryptography ([8, 14]). RTTV [38] and Gowers [17] observed proofs of the dense model theorem which use linear programming duality, which is in turn related to Nisan’s proof of the hardcore lemma from circuit complexity [28]. In fact, Impagliazzo [29] shows in unpublished work that optimal-density versions of the hardcore lemma due to Holenstein [26] actually imply the dense model theorem. Klivans and Servedio [32] famously observed the relationship between the hardcore lemma and boosting, a fundamental technique for aggregating weak learners in machine learning [15]. Together with the result of Impagliazzo, this connection means that dense model theorems can be proved by a particular type of boosting algorithm. A boosting argument for the existence of dense models also gives us constructive versions of the dense model theorem, which are needed for algorithmic applications. Zhang [52] (without using Impagliazzo’s reduction from the dense model theorem to the hardcore lemma) used the boosting algorithm of [7] directly to prove the dense model theorem with optimal query complexity (kk).

In addition to its connections to complexity, machine learning, additive number theory and cryptography, the dense model theorem (and ideas which developed from the dense model theorem, chiefly the approximation theorem of [49]), have been used to understand the weak graph regularity lemma of Frieze and Kannan [29], notions of computational differential privacy [36] and even generalization in generative adversarial networks (GANs) [5]. We now turn to discussing the complexity-theoretic aspects of the dense model theorem, specifically regarding our question of whether the 𝖬𝖠𝖩k\mathsf{MAJ}_{k} from the statement is optimal.

As alluded to earlier, Green and Tao actually worked in a setting where \mathcal{F}^{\prime} doesn’t need to compute majorities but where εδ\varepsilon\delta (that is, the distinguishing parameter in the pseudodensity assumption in the statement of Theorem 1.1) needs to be replaced by some ε=exp(𝗉𝗈𝗅𝗒(1/ε,1/δ))\varepsilon^{\prime}=\exp(-{\mathsf{poly}}(1/\varepsilon,1/\delta)) (with k=𝗉𝗈𝗅𝗒(1/δ,1/ε)k={\mathsf{poly}}(1/\delta,1/\varepsilon) experiencing a small increase). We state this result, as proved in Tao and Ziegler [48] and stated this way in RTTV [38], for comparison. For a test class \mathcal{F}, let k\prod_{k}\mathcal{F} be the set of tests of the form i[k]fi\prod_{i\in[k]}f_{i} for fif_{i}\in\mathcal{F}.

Theorem 1.2 (Computationally simple dense-model theorem, strong assumption).

Let \mathcal{F} be a class of tests f:{0,1}n[0,1]f:\{0,1\}^{n}\to[0,1] and 𝐙\mathbf{Z} a random variable over {0,1}n\{0,1\}^{n} which is δ\delta-dense in a set ε\varepsilon^{\prime}-pseudorandom for k\prod_{k}\mathcal{F} with k=𝗉𝗈𝗅𝗒(1/δ,1/ε)k={\mathsf{poly}}(1/\delta,1/\varepsilon) and \varepsilon^{\prime}=\exp(-{\mathsf{poly}}(1/\delta,1/\varepsilon)). Then 𝐙\mathbf{Z} has a δ\delta-dense ε\varepsilon-model with respect to \mathcal{F}.

RTTV [38] observe that this proof can be adapted so that ε\varepsilon^{\prime} has polynomial dependence on ε,δ\varepsilon,\delta by restricting to the case of boolean-valued tests. Doing so, however, makes \mathcal{F}^{\prime} much more complicated (essentially requiring circuits of size exponential in kk). In Theorem 1.1, we can obtain the best of both worlds: ε\varepsilon^{\prime} has polynomial dependence on ε,δ\varepsilon,\delta and the complexity blow-up is rather small. However, in this more favorable circumstance, we need to be able to compute majorities. Is such a tradeoff necessary? Our results suggest that the answer is yes. Theorem 1.6 (stated in the following section) tells us that if the dense model theorem is true for \mathcal{F}, then there’s a small, constant-depth circuit with \mathcal{F}-oracle gates approximating majority on O(1/ε2)O(1/\varepsilon^{2}) bits.

Another important aspect of the dense model theorem is how the different assumptions are related. As mentioned, the original assumption was that 𝐙\mathbf{Z} is δ\delta-dense in an ε\varepsilon-pseudorandom set, but the proof can be extended to the case where 𝐙\mathbf{Z} is (ε,δ)(\varepsilon,\delta)-pseudodense. Claim 1.1 showed that the former assumption implies the latter. When the dense model theorem is true, the latter also implies the former: simply apply the dense model theorem to 𝐙\mathbf{Z} which is (ε,δ)(\varepsilon,\delta)-pseudodense to obtain a δ\delta-dense ε\varepsilon-model. Then, by the first part of Claim 1.1, we’re done.

First, we give examples of situations where these two notions are distinct. For example, we show in Theorems 1.4 and 1.5 that they are inequivalent when \mathcal{F} is the class of constant-depth, polynomial-size circuits or the class of low-degree polynomials over a finite field. Note that a separation between pseudodensity and being dense in a pseudorandom set also implies a separation between pseudodensity and having a dense model, as being dense in a pseudorandom set is a necessary condition for having a dense model.

Second, we show that the dense model theorem is false even when we make the stronger assumption that the starting distribution 𝐙\mathbf{Z} is dense in a pseudorandom set. Specifically, in Theorem 1.3 we show that some distributions 𝐙\mathbf{Z} are dense in a pseudorandom set but fail to have a dense model when \mathcal{F} consists of constant-depth, polynomial-size circuits.

Having contextualized our work, we now describe our contributions in more detail.

1.2 Contributions

We separate the previously described notions of computational entropy, giving examples where the dense model theorem is false. We are able to prove different separations when \mathcal{F} is the class of constant-depth, unbounded fan-in circuits, the class of low-degree polynomials over a finite field, and, in one case, any test class \mathcal{F} which cannot efficiently approximate majority (in a sense made explicit later on). The only separation known prior to this work, due to Barak et al. [8], was between pseudodensity and having a dense model for bounded-width read-once branching programs.

Let 𝒞(S,d)\mathcal{C}(S,d) denote the class of unbounded fan-in, size-SS, depth-dd circuits. We are generally thinking of S=𝗉𝗈𝗅𝗒(n)S={\mathsf{poly}}(n) and d=O(1)d=O(1), which corresponds to the complexity class 𝖠𝖢0{\mathsf{AC}}^{0}. Theorem 1.3 shows that 𝐙\mathbf{Z} being δ\delta-dense in an ε\varepsilon-pseudorandom set need not imply that 𝐙\mathbf{Z} has a δ\delta-dense ε\varepsilon-model when the test class is 𝒞(S,d)\mathcal{C}(S,d):

Theorem 1.3.

Let ε,ε>0\varepsilon,\varepsilon^{\prime}>0 be arbitrary, δε/8\delta\geq\varepsilon^{\prime}/8 and

Sexp(O(εεlog(1/δ)log(1/ε))1/(d1)).S\leq\exp(O\Big{(}\frac{\sqrt{\varepsilon^{\prime}}}{\varepsilon}\cdot\frac{\sqrt{\log(1/\delta)}}{\log(1/\varepsilon^{\prime})}\Big{)}^{1/(d-1)}).

Then for =𝒞(S,d)\mathcal{F}=\mathcal{C}(S,d), there is a random variable 𝐃\mathbf{D} over {0,1}n\{0,1\}^{n} with n=O(log(1/δ)/ε2)n=O(\log(1/\delta)/\varepsilon^{2}) so that 𝐃\mathbf{D} is δ\delta-dense in an ε\varepsilon^{\prime}-pseudorandom set but does not have a δ\delta-dense ε\varepsilon-model. In particular, the dense model theorem is false in this setting.

Recall that the dense model theorem is false when ε=Ω(εδ)\varepsilon^{\prime}=\Omega(\varepsilon\delta), which makes the restriction δε/8\delta\geq\varepsilon^{\prime}/8 extremely mild. A common regime is ε=1/𝗉𝗈𝗅𝗒(n)\varepsilon=1/{\mathsf{poly}}(n), δ=O(1)\delta=O(1) and ε=δε=Θ(ε)\varepsilon^{\prime}=\delta\varepsilon=\Theta(\varepsilon), in which case this gives us (essentially) a lower bound on SS which is weakly exponential in 1/\sqrt{\varepsilon}\approx 1/\sqrt{\varepsilon^{\prime}}.

Let 𝐍α\mathbf{N}_{\alpha} denote the product distribution of nn Bernoulli random variables with success probability 1/2α1/2-\alpha. Recall that density in a pseudorandom set readily implies pseudodensity, and one can use the dense model theorem to show the converse. We show that (ε,δ)(\varepsilon,\delta)-pseudodensity need not imply δ\delta-density in an ε\varepsilon-pseudorandom set when the test class is 𝒞(S,d)\mathcal{C}(S,d):

Theorem 1.4.

Fix ε,ε,δ>0\varepsilon,\varepsilon^{\prime},\delta>0, dd\in\mathbb{N}, and

Sexp(O(δεlog(1/δ)log(1/ε))1/(d1)).S\leq\exp(O\Big{(}\frac{\sqrt{\delta}}{\sqrt{\varepsilon}}\cdot\frac{\log(1/\delta)}{\log(1/\varepsilon^{\prime})}\Big{)}^{1/(d-1)}).

Then 𝐍ε/δ\mathbf{N}_{\sqrt{\varepsilon/\delta}} over {0,1}n\{0,1\}^{n} with n=O(1/ε)n=O(1/\varepsilon) is (ε,δ)(\varepsilon^{\prime},\delta)-pseudodense and yet 𝐍ε/δ\mathbf{N}_{\sqrt{\varepsilon/\delta}} is not δ\delta-dense inside of any ε\varepsilon-pseudorandom set.

The dependence on ε\varepsilon^{\prime} means that we can take ε\varepsilon^{\prime} exponentially smaller than ε\varepsilon and still obtain a separation. This case corresponds to \mathcal{F} being ‘very’ fooled by 𝐍α\mathbf{N}_{\alpha} while 𝐍α\mathbf{N}_{\alpha} is still not δ\delta-dense in even a ‘mildly’ pseudorandom set. This result draws on a recent line of work in the pseudorandomness literature — often referred to as ‘the coin problem’ and studied in, e.g., [42, 12, 1, 46] — which concerns the ability of a test class \mathcal{F} that is unable to compute majority to distinguish 𝐍α\mathbf{N}_{\alpha} from 𝐔\mathbf{U}. We will discuss this connection in more detail during the proof overviews.

We prove a similar separation for degree-dd 𝔽p\mathbb{F}_{p}-polynomials (on nn variables), which generalizes (and uses techniques from) a recent result of Srinivasan [44] in the case where δ=1\delta=1. In this case, we think of a distribution 𝐙\mathbf{Z} as being (ε,δ)(\varepsilon^{\prime},\delta)-pseudodense for degree-dd 𝔽p\mathbb{F}_{p}-polynomials when \delta\Pr[P(\mathbf{Z})\neq 0]-\varepsilon^{\prime}\leq\Pr[P(\mathbf{U})\neq 0] for any degree-dd polynomial P𝔽p[X1,,Xn]P\in\mathbb{F}_{p}[X_{1},...,X_{n}] (noting that we are only evaluating PP over {0,1}n\{0,1\}^{n}).

Theorem 1.5.

Fix a finite field 𝔽\mathbb{F} with characteristic p=O(1)p=O(1), ε,ε>0\varepsilon,\varepsilon^{\prime}>0 and let c>δ>0c>\delta>0 where c1/200c\approx 1/200 is an absolute constant. Suppose that

dO(δ/ε)d\leq O(\sqrt{\delta/\varepsilon})

Then when \mathcal{F} is the nn-variate degree-dd polynomials over 𝔽\mathbb{F} with n=1/εn=1/\varepsilon, and α=O(ε/δ)\alpha=O(\sqrt{\varepsilon/\delta}), 𝐍α\mathbf{N}_{\alpha} is (ε,δ)(\varepsilon^{\prime},\delta)-pseudodense but is not δ\delta-dense inside of an ε\varepsilon-pseudorandom set.

This implies lower bounds for constant-depth circuits with 𝖬𝖮𝖣p{\mathsf{MOD}}_{p} gates by the classical lower bounds of Razborov [37] and Smolensky [43]. Perhaps more interestingly, this holds even over non-prime fields. Also notably, there is no dependence on ε\varepsilon^{\prime} (beyond εεδ\varepsilon^{\prime}\leq\varepsilon\delta), so we can take it to be arbitrarily small.

We also prove a more general separation between pseudodensity and density in a pseudorandom set. This result, drawing from the work of [42], provides a more specific characterization of the sense in which dense model theorems are ‘required’ to compute majority.

Theorem 1.6.

Let ε,δ>0\varepsilon,\delta>0. Suppose \mathcal{F} is a test class of boolean functions f:{0,1}n{0,1}f:\{0,1\}^{n}\to\{0,1\} with the following property: there is no 𝖠𝖢0{\mathsf{AC}}^{0} \mathcal{F}-oracle circuit of size 𝗉𝗈𝗅𝗒(nδε3/2){\mathsf{poly}}(n\cdot\frac{\sqrt{\delta}}{\varepsilon^{3/2}}) computing majority on O(δ/ε)O(\sqrt{\delta/\varepsilon}) bits.

Then 𝐍ε/δ\mathbf{N}_{\sqrt{\varepsilon/\delta}} is (εδ,δ)(\varepsilon\delta,\delta)-pseudodense and yet does not have a δ\delta-dense ε\varepsilon-model. In particular, when the hypotheses are met, the dense model theorem is false.

Informally, this says that any \mathcal{F} which can refute the pseudodensity of 𝐍α\mathbf{N}_{\alpha} is only ‘a constant-depth circuit away’ from computing majority.

1.3 Related work

Computational entropy

Computational entropy was studied systematically in [8] and is relevant to various problems in complexity and cryptography such as leakage-resilience [14], constructions of PRGs from one-way functions [25, 21, 20], and derandomization [13].

There are a number of definitions of computational entropy which we don’t consider in this work. For example, Yao pseudoentropy [51] (see also [8]) corresponds to random variables which are ‘compressible’ by a class of tests \mathcal{F}, in the sense that \mathcal{F} can encode the random variable into a small number of bits and then decode it. Yao pseudoentropy was recently used in time-efficient hardness-to-randomness tradeoffs [13], where (randomness-efficient) samplers for pseudodense distributions were used with an appropriate extractor to construct a pseudorandom distribution. Another example is the inaccessible entropy of Haitner et al. [21], corresponding to the entropy of a message at some round in a two-player protocol conditioned on the prior messages and the randomness of the players, which is used in efficient constructions of statistically hiding commitment schemes from one-way functions [20].

Separating notions of computational entropy has been studied before in [8], who prove a separation of pseudodensity and having a dense model for bounded-width read-once branching programs. Separating notions of conditional computational entropy was studied in [27], showing separations between conditional variants of Yao pseudoentropy and having a dense model.

As mentioned in [27], citing [49] and personal communication with Impagliazzo, another question of interest is whether Yao pseudoentropy (corresponding to efficient encoding/decoding algorithms) implies having a dense model. It is not hard to see that small Yao pseudoentropy implies small pseudodensity, with some mild restrictions on \mathcal{F}. It would be interesting to see if the techniques from this paper can be used to understand Yao pseudoentropy in more detail. We leave this to future work.

Complexity of dense model theorems and hardness amplification

Prior work on the complexity of dense model theorems has included a tight lower bound on the query complexity [52] and a lower bound on the advice complexity [50]. As far as we are aware, this is the first work to consider the computational complexity of dense model theorems.

There has also been prior work on the computational complexity of hardness amplification, establishing that various known strategies for hardness amplification require the computation of majority [34, 42, 19, 41]. It is known that a particular type of hardness amplification given by the hardcore lemma implies the dense model theorem [29].

Our results are stronger in the following sense: previous work [34, 42, 19] shows that black-box hardness amplification proofs require majority. This means that if you amplify the hardness of ff in some black-box way, then this can be used to compute majority. In our case, we simply show (in different settings) that the dense model theorem is false, regardless of how we tried to prove it. By the connection between the hardcore lemma and the dense model theorem, our results also provide scenarios where the hardcore lemma is false. As far as we are aware, these are the first such scenarios recorded in the literature.

1.4 Technical overview

We discuss two general themes that appear consistently in the proofs and then discuss each of the main theorems in some more detail.

1.4.1 Dense distributions have mostly unbiased bits

A commonly-used observation in theoretical computer science is that most bit positions of a δ\delta-dense random variable over {0,1}n\{0,1\}^{n} have bias O(\sqrt{\log(1/\delta)/n}) (see, for example, the introduction of [35]). Relevant to our purposes, it provides a necessary condition for having a δ\delta-dense ε\varepsilon-model with respect to any class \mathcal{F} containing the projections zziz\mapsto z_{i}: if 𝐙\mathbf{Z} has a δ\delta-dense ε\varepsilon-model, then most bits of 𝐙\mathbf{Z} have bias \varepsilon+O(\sqrt{\log(1/\delta)/n}). In particular, if all of the bits of 𝐙\mathbf{Z} have large bias, then it can’t have a dense model.

This is used directly in the proof of Theorem 1.3. In this case, we construct a distribution 𝐙\mathbf{Z} which is δ\delta-dense in a set which is ε\varepsilon-pseudorandom for 𝖠𝖢0{\mathsf{AC}}^{0} but where each bit is noticeably biased away from 1/21/2.

In order to prove separations between pseudodensity and being dense in a pseudorandom set — as in Theorems 1.4, 1.5 and 1.6 — we need to consider the bias of larger subsets of variables. Considering just two bits at a time is already sufficient to prove mild concentration bounds on the weight of pseudorandom strings. This implies that the tails of dense subsets of pseudorandom sets cannot be too heavy.

1.4.2 Biased coin distribution

The biased coin distribution 𝐍α\mathbf{N}_{\alpha} over {0,1}n\{0,1\}^{n} is the product of nn Bernoulli random variables with success probability 1/2α1/2-\alpha. 𝐍α\mathbf{N}_{\alpha} has recently garnered significant interest in the pseudorandomness literature (see [2, 12, 46, 10, 1]). Shaltiel and Viola [42] showed that if ff is a test which ε\varepsilon-distinguishes 𝐍α\mathbf{N}_{\alpha} from 𝐔\mathbf{U}, then there is a small, constant-depth circuit CC with ff-oracle gates which computes majority on O(1/ε)O(1/\varepsilon) bits. A similar, but qualitatively different, connection due to Limaye et al. [33] — extended to any choice of ε>0\varepsilon>0 by Srinivasan [44] — shows that any 𝔽p\mathbb{F}_{p}-polynomial with advantage 12ε1-2\varepsilon in distinguishing 𝐍α\mathbf{N}_{\alpha} from 𝐔\mathbf{U} must have degree Ω(log(1/ε)/α)\Omega(\log(1/\varepsilon)/\alpha). We extend some of these pseudorandomness results regarding 𝐍α\mathbf{N}_{\alpha} to pseudodensity results.

First, we extend the observation of Shaltiel and Viola to apply to tests ff for which 𝔼[f(𝐙)]δ𝔼[f(𝐔)]+ε\mathbb{E}[f(\mathbf{Z})]\geq\delta\mathbb{E}[f(\mathbf{U})]+\varepsilon (which corresponds to pseudorandomness when δ=1\delta=1). This gives us unconditional pseudodensity for test classes \mathcal{F} which can’t be used in small, constant-depth oracle circuits approximating majority. We also extend the observation of [33] to show lower bounds on the 𝔽p\mathbb{F}_{p}-degree for any function ff which refutes the pseudodensity of 𝐍α\mathbf{N}_{\alpha}.

In Lemma 4.1, we show that 𝐍α\mathbf{N}_{\alpha} exhibits (ε,δ)(\varepsilon,\delta)-pseudodensity for ε=(pO(logS)d1)k\varepsilon=(p\cdot O(\log S)^{d-1})^{k} and δ=eαk/p\delta=e^{-\alpha k/p}. This can be seen as a generalization of Tal’s result, building on [12, 1, 42], that 𝐍α\mathbf{N}_{\alpha} is 3αO(logS)d13\alpha\cdot O(\log S)^{d-1}-pseudorandom for 𝒞(S,d)\mathcal{C}(S,d).

Tal uses a Fourier analytic proof which becomes very simple given tail bounds on the Fourier spectrum of 𝖠𝖢0{\mathsf{AC}}^{0} (the latter being the main contribution of [46]). More generally, any \mathcal{F} enjoying sufficiently strong tail bounds on the Fourier spectrum (in the 1\ell_{1} norm) cannot distinguish between 𝐍α\mathbf{N}_{\alpha} and uniform. It turns out, as proved by Tal and recorded in Agarwal [2], that if \mathcal{F} is closed under restrictions then even bounding the first level of the Fourier spectrum suffices. The proof of Lemma 4.1 is based specifically on the switching lemma for constant-depth circuits. While switching lemmas can be used to show Fourier concentration, it would be interesting to find a proof which only uses the assumption of Fourier concentration (or some Fourier-analytic assumption).

1.4.3 Theorem 1.3

Our goal is to construct a random variable 𝐃\mathbf{D} which is dense inside of an 𝖠𝖢0{\mathsf{AC}}^{0}-pseudorandom set but where each bit is noticeably biased away from 1/21/2. In this case, 𝐃\mathbf{D} would be distinguishable from any dense set, since the average bit of a dense set is roughly unbiased. Doing so requires two steps.

The first step is constructing an appropriate distribution 𝐙\mathbf{Z} that fools 𝖠𝖢0{\mathsf{AC}}^{0} circuits. For this we adopt a general strategy of Ajtai and Wigderson [3] (applied in many contexts in pseudorandomness since; see, e.g., [40]): to fool a circuit CC, we start by producing a random restriction to simplify CC to a short decision tree (via the switching lemma), and then we fool the decision tree on the remaining bits using a kk-wise independent distribution 𝐒\mathbf{S}. If we wanted 𝐙\mathbf{Z} to have small support size, we would need some way of producing random restrictions with a small amount of randomness (which is precisely the approach of Ajtai-Wigderson and later work). Fortunately, we only care about the existence of 𝐙\mathbf{Z} and are therefore content to use the ‘non-derandomized’ switching lemma.

The second step is finding a dense subset 𝐃\mathbf{D} of 𝐒\mathbf{S} with biased bits. We do this by constructing 𝐒\mathbf{S} so that each bit has bias roughly log(1/δ)/K\sqrt{\log(1/\delta)/K}, where kKnk\ll K\ll n is a parameter. This is achieved by randomly bucketing the indices into KK buckets and assigning each bucket a random bit, which reduces the dimension of the problem from nn to KK. This means we can pick a δ\delta-dense event in {0,1}K\{0,1\}^{K} with extremal bias — met (up to constants) by the function accepting all strings with weight less than K/2-\sqrt{K\log(1/\delta)} — in order to find a dense subset of 𝐒\mathbf{S} with large bias. The bucketing construction introduces some error when two distinct indices of a small set I[n]I\subseteq[n] land in the same bucket.

1.4.4 Theorem 1.4

We will show that 𝐍α\mathbf{N}_{\alpha} has (ε,δ)(\varepsilon^{\prime},\delta)-pseudodensity for 𝖠𝖢0{\mathsf{AC}}^{0} with δ=ε=O(1)\delta=\varepsilon^{\prime}=O(1) and α=1/𝗉𝗈𝗅𝗒log(n)\alpha=1/{\mathsf{poly}}\log(n). The idea is that 𝐍α\mathbf{N}_{\alpha} can be sampled by first sampling a random restriction which leaves a pp fraction of the bits unset (and is unbiased on the restricted bits) and then setting the remaining bits with bias α/p\alpha/p. Applying the switching lemma, we conclude that 𝔼[f(𝐍α)]𝔼[f(𝐍α/p)]\mathbb{E}[f(\mathbf{N}_{\alpha})]\approx\mathbb{E}[f^{\prime}(\mathbf{N}_{\alpha/p})] where ff^{\prime} is a short decision tree (which does not depend on all of its inputs). A simple calculation reveals that the acceptance probability of ff^{\prime} can increase by at most a factor (1+\alpha/p)^{k}\leq e^{\alpha k/p} when passing from the uniform distribution to 𝐍α/p\mathbf{N}_{\alpha/p}, where kk is the depth of the decision tree. By incorporating the error from the switching lemma (i.e. the advantage lost by conditioning on the switching lemma succeeding), we get (ε,δ)(\varepsilon^{\prime},\delta)-pseudodensity.
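As a quick sanity check on this decomposition (our computation, not stated explicitly in the paper), the marginal of each bit under the two-stage sampling is

\Pr[(\boldsymbol{\rho}\circ\mathbf{N}_{\alpha/p})_{i}=1]=(1-p)\cdot\frac{1}{2}+p\cdot\Big(\frac{1}{2}-\frac{\alpha}{p}\Big)=\frac{1}{2}-\alpha,

and since the coordinates are independent, \boldsymbol{\rho}\circ\mathbf{N}_{\alpha/p} is distributed identically to \mathbf{N}_{\alpha}.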

To prove the separation, we use the fact that the Hamming weight of a random variable fooling 𝒞(S,d)\mathcal{C}(S,d) is concentrated around its expectation. This means in particular that if 𝐍α\mathbf{N}_{\alpha} were δ\delta-dense in a pseudorandom distribution, then the tails of 𝐍α\mathbf{N}_{\alpha} couldn’t be too heavy and therefore α\alpha couldn’t be too large.

1.4.5 Theorems 1.5 and 1.6

Theorems 1.5 and 1.6 draw from related work of Srinivasan [44] and Shaltiel-Viola [42], respectively.

With β,γ>0\beta,\gamma>0 and \mathcal{F} an arbitrary class of tests f:{0,1}n{±1}f:\{0,1\}^{n}\to\{\pm 1\}, suppose that ff\in\mathcal{F} witnesses that 𝐍β\mathbf{N}_{\beta} fails to have (γ,δ)(\gamma,\delta)-pseudodensity in the sense that

𝔼[f(𝐔)]δ𝔼[f(𝐍β)]γ.\mathbb{E}[f(\mathbf{U})]\leq\delta\mathbb{E}[f(\mathbf{N}_{\beta})]-\gamma.

[44] and [42] both make use of the following simple observation. Given two strings u,v{0,1}mu,v\in\{0,1\}^{m} with wt(u)=(1/2ε)m\mathrm{wt}(u)=(1/2-\varepsilon)m and wt(v)=m/2\mathrm{wt}(v)=m/2, a uniformly random index i[m]i\in[m] has uiu_{i} distributed as a (1/2ε)(1/2-\varepsilon)-biased coin and viv_{i} as an unbiased coin. In our case, applying ff to sufficiently many random coordinates of uu or vv ‘distinguishes’ the two of them, but in a weaker sense.

In the case of Theorem 1.6, we can amplify acceptance probabilities by increasing the size of the circuit by a factor of 1/εδ1/\varepsilon\delta, after which we can apply the result of [42] saying that constant-error distinguishers between 𝐍α\mathbf{N}_{\alpha} and 𝐔\mathbf{U} can be used to compute majority.

For Theorem 1.5, we apply a beautiful recent result of Srinivasan [44] showing that any mm-variate polynomial (over a finite field) which vanishes on most points of the slice of Hamming weight (1/2α)m(1/2-\alpha)m and doesn’t vanish on most points of the slice of weight m/2m/2 must have high degree Ω(αm)\Omega(\alpha m). One way of interpreting this result is that low-degree polynomials can’t approximately solve certain ‘promise’ versions of majority.

In this latter case, we need to open up the error reduction procedure we use for Theorem 1.6 and show how to approximate it using low-degree polynomials. This will ultimately be achieved by approximating OR with a probabilistic polynomial, as in [37, 43].

2 Technical tools

We write [n]={1,,n}[n]=\{1,...,n\} and use boldface to denote random variables. Let 𝒞(S,d)\mathcal{C}(S,d) be the set of size SS, depth-dd unbounded fan-in circuits. For a boolean function f:{0,1}n{0,1}f:\{0,1\}^{n}\to\{0,1\}, let DT(f)DT(f) denote the depth of the shortest decision tree computing ff.

2.1 Biased coins

As before, let 𝐍α\mathbf{N}_{\alpha} denote the random variable corresponding to the product of nn independent coins, each equal to 11 with probability 1/2α1/2-\alpha. That is,

Pr[𝐍α=z]=(1/2α)wt(z)(1/2+α)nwt(z)\Pr[\mathbf{N}_{\alpha}=z]=(1/2-\alpha)^{\mathrm{wt}(z)}(1/2+\alpha)^{n-\mathrm{wt}(z)}

where wt(z)\mathrm{wt}(z) denotes the Hamming weight of zz.

For a random variable 𝐙\mathbf{Z} over {0,1}n\{0,1\}^{n} and i[n]i\in[n], let biasi(𝐙)=|Pr[𝐙i=1]Pr[𝐙i=0]|/2\mathrm{bias}_{i}(\mathbf{Z})=|\Pr[\mathbf{Z}_{i}=1]-\Pr[\mathbf{Z}_{i}=0]|/2. Let ={zzi:i[n]}\mathcal{B}=\{z\mapsto z_{i}:i\in[n]\} be the set of monotone projections. A random variable 𝐙=(𝐙1,,𝐙n)\mathbf{Z}=(\mathbf{Z}_{1},...,\mathbf{Z}_{n}) is ε\varepsilon-pseudorandom with respect to \mathcal{B} precisely when biasi(𝐙)=|Pr[𝐙i=1]1/2|ε\mathrm{bias}_{i}(\mathbf{Z})=|\Pr[\mathbf{Z}_{i}=1]-1/2|\leq\varepsilon for each i[n]i\in[n]. In particular,

Claim 2.1.

For any ε>0\varepsilon>0, 𝐍ε\mathbf{N}_{\varepsilon} is ε\varepsilon-pseudorandom with respect to \mathcal{B}.
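As an illustration (ours, not from the paper), a quick Monte Carlo check of this claim in Python; the sample size and seed are arbitrary choices:

```python
import random

def sample_N(alpha, n):
    """One draw from N_alpha: n independent bits, each 1 with probability 1/2 - alpha."""
    return [1 if random.random() < 0.5 - alpha else 0 for _ in range(n)]

def empirical_bias(samples, i):
    """Estimate bias_i(Z) = |Pr[Z_i = 1] - 1/2| from a list of samples."""
    p = sum(z[i] for z in samples) / len(samples)
    return abs(p - 0.5)

random.seed(0)
alpha, n = 0.1, 8
samples = [sample_N(alpha, n) for _ in range(100_000)]
print(empirical_bias(samples, 0))  # close to alpha = 0.1, as Claim 2.1 predicts
```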

2.2 Information theory

The (Shannon) entropy of a random variable is defined as

H(𝐙)=x{0,1}np𝐙(x)logp𝐙(x),H(\mathbf{Z})=-\sum_{x\in\{0,1\}^{n}}p_{\mathbf{Z}}(x)\log p_{\mathbf{Z}}(x),

where p𝐙p_{\mathbf{Z}} is the probability density function corresponding to 𝐙\mathbf{Z}. The Shannon entropy of a random vector is sub-additive, in that H(\mathbf{Z})\leq\sum_{i\in[n]}H(\mathbf{Z}_{i}). When 𝐙{0,1}\mathbf{Z}\in\{0,1\} and Pr[𝐙=1]=p\Pr[\mathbf{Z}=1]=p, we use h(p)=H(𝐙)=(plogp+(1p)log(1p))h(p)=H(\mathbf{Z})=-(p\log p+(1-p)\log(1-p)) to denote the binary entropy function.

The min-entropy is defined as

H_{\infty}(\mathbf{Z})=\min_{x\in\{0,1\}^{n}}-\log p_{\mathbf{Z}}(x)=-\log\max_{x\in\{0,1\}^{n}}p_{\mathbf{Z}}(x)

If 𝐙\mathbf{Z} is δ\delta-dense inside of 𝐔\mathbf{U}, then its min-entropy is at least nlog(1/δ)n-\log(1/\delta), and for any random variable 𝐙\mathbf{Z}, H(𝐙)H(𝐙)H_{\infty}(\mathbf{Z})\leq H(\mathbf{Z}).
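As a small illustration (ours, not from the paper) of these quantities on a hypothetical toy distribution:

```python
import math

def shannon_entropy(p):
    """H(Z) = -sum_x p(x) log2 p(x), for a distribution given as a dict of probabilities."""
    return -sum(q * math.log2(q) for q in p.values() if q > 0)

def min_entropy(p):
    """H_inf(Z) = -log2 max_x p(x)."""
    return -math.log2(max(p.values()))

Z = {"00": 0.5, "01": 0.25, "10": 0.25}
print(shannon_entropy(Z))  # 1.5
print(min_entropy(Z))      # 1.0, consistent with H_inf(Z) <= H(Z)
```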

By this latter inequality and subadditivity, the average entropy of 𝐙\mathbf{Z}’s bits is at least 1log(1/δ)/n1-\log(1/\delta)/n. Appealing to a quadratic approximation of binary entropy, we learn that the average bias must be at most log(1/δ)/n\sqrt{\log(1/\delta)/n}. This result has been referred to as Chang’s inequality and the Level-1 inequality, having been observed in different forms and with different proofs in, for example, [47, 11, 22, 31]. Because it is so simple, we provide a proof here:

Claim 2.2.

If 𝐙\mathbf{Z} is δ\delta-dense in 𝐔\mathbf{U}, then 𝔼i[biasi(𝐙)]log(1/δ)/n\mathbb{E}_{i}[\mathrm{bias}_{i}(\mathbf{Z})]\leq\sqrt{\log(1/\delta)/n}

Proof.

As δ\delta-density is equivalent to nlog(1/δ)n-\log(1/\delta) min-entropy,

nlog(1/δ)=H(𝐙)H(𝐙)i[n]H(𝐙i),n-\log(1/\delta)=H_{\infty}(\mathbf{Z})\leq H(\mathbf{Z})\leq\sum_{i\in[n]}H(\mathbf{Z}_{i}),

by subadditivity of entropy. The entropy of 𝐙\mathbf{Z}’s bits, therefore, is at least 1log(1/δ)/n1-\log(1/\delta)/n on average. Taking the Taylor series, we can approximate the binary entropy function h(p)h(p) around 1/21/2 by a quadratic function as h(1/2+ε)1(2/ln2)ε2h(1/2+\varepsilon)\leq 1-(2/\ln 2)\varepsilon^{2}. Comparing this bound with the average, we get

1-\log(1/\delta)/n\leq 1-(2/\ln 2)\varepsilon^{2},

meaning ε(ln2/2)(log(1/δ)/n)log(1/δ)/n\varepsilon\leq\sqrt{(\ln 2/2)\cdot(\log(1/\delta)/n)}\leq\sqrt{\log(1/\delta)/n}. ∎
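For intuition, here is a small numeric check of Claim 2.2 in Python (ours; taking the lightest strings is a heuristic attempt at an extremal δ-dense set, not part of the claim):

```python
import itertools
import math

def avg_bias_of_low_weight_set(delta, n):
    """Average bias over coordinates of the flat distribution on the
    delta * 2^n lowest-weight strings in {0,1}^n."""
    m = max(1, round(delta * 2**n))
    strings = sorted(itertools.product([0, 1], repeat=n), key=sum)[:m]
    biases = [abs(sum(z[i] for z in strings) / m - 0.5) for i in range(n)]
    return sum(biases) / n

delta, n = 0.05, 12
print(avg_bias_of_low_weight_set(delta, n))  # roughly 0.29
print(math.sqrt(math.log2(1 / delta) / n))   # Claim 2.2 bound: roughly 0.60
```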

2.3 Random variables lacking computational entropy

It follows directly from Claim 2.2 that if biasi(𝐙)\mathrm{bias}_{i}(\mathbf{Z}) exceeds \varepsilon+\sqrt{\log(1/\delta)/n} for every ii, then 𝐙\mathbf{Z} does not have a δ\delta-dense ε\varepsilon-model with respect to the projections \mathcal{B}.

Lemma 2.1.

Let 𝐙\mathbf{Z} be a random variable with \mathrm{bias}_{i}(\mathbf{Z})\geq\gamma

for every i[n]i\in[n]. Then for any δ>0\delta>0 and \gamma\geq\varepsilon+\sqrt{\frac{\log(1/\delta)}{n}}, 𝐙\mathbf{Z} does not have a δ\delta-dense ε\varepsilon-model with respect to \mathcal{B}.

This is used for the separation in Theorem 1.3. We would also like a necessary condition for being dense in a pseudorandom set. Towards this end, we note that pseudorandom distributions for even very simple test classes have mild concentration properties.

Claim 2.3.

Suppose \mathcal{F} can compute xixjx_{i}\oplus x_{j} for every i,j[n]i,j\in[n] and let 𝐙\mathbf{Z} over {0,1}n\{0,1\}^{n} be ε\varepsilon-pseudorandom for \mathcal{F}. Then

Pr[i𝐙𝐢n/2αn]14α2n+ε4α2\Pr[\sum_{i}\mathbf{Z_{i}}\leq n/2-\alpha n]\leq\frac{1}{4\alpha^{2}n}+\frac{\varepsilon}{4\alpha^{2}}
Proof.

We work over {±1}\{\pm 1\} instead of {0,1}\{0,1\} to make calculations easier. We can compute the second moment as

𝔼[(i𝐙i)2]=i𝔼[𝐙i2]+ij𝔼[𝐙i𝐙j]n+εn2.\mathbb{E}[(\sum_{i}\mathbf{Z}_{i})^{2}]=\sum_{i}\mathbb{E}[\mathbf{Z}_{i}^{2}]+\sum_{i\neq j}\mathbb{E}[\mathbf{Z}_{i}\mathbf{Z}_{j}]\leq n+\varepsilon n^{2}.

Applying Markov’s inequality to (i𝐙i)2(\sum_{i}\mathbf{Z}_{i})^{2}, we see that

Pr[|i𝐙i|2αn]=Pr[(i𝐙i)2(2αn)2]𝔼[(i𝐙i)2]/(2αn)2.\Pr[|\sum_{i}\mathbf{Z}_{i}|\geq 2\alpha n]=\Pr[(\sum_{i}\mathbf{Z}_{i})^{2}\geq(2\alpha n)^{2}]\leq\mathbb{E}[(\sum_{i}\mathbf{Z}_{i})^{2}]/(2\alpha n)^{2}.

We use 2αn2\alpha n because it maps back to n/2αnn/2-\alpha n in {0,1}\{0,1\}. Then the conclusion follows from our second moment calculation and converting back to {0,1}\{0,1\}. ∎
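As an illustration (ours), a Monte Carlo check of the ε=0\varepsilon=0 case of Claim 2.3, where the bits are i.i.d. uniform:

```python
import random

def tail_prob(n, alpha, trials=50_000):
    """Empirical Pr[sum Z_i <= n/2 - alpha*n] for i.i.d. uniform bits."""
    count = 0
    for _ in range(trials):
        w = sum(random.getrandbits(1) for _ in range(n))
        if w <= n / 2 - alpha * n:
            count += 1
    return count / trials

random.seed(0)
n, alpha = 100, 0.1
print(tail_prob(n, alpha))     # roughly 0.03
print(1 / (4 * alpha**2 * n))  # Claim 2.3 bound with eps = 0: 0.25
```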

The tails of a dense subset can’t be too much larger than the original distribution, by definition of density. This gives us a test for being dense in a pseudorandom set, which we specialize to 𝐍α\mathbf{N}_{\alpha}.

Lemma 2.2.

Let ε,δ>0\varepsilon,\delta>0 be arbitrary. Suppose \mathcal{F} can compute xixjx_{i}\oplus x_{j} for any i,j[n]i,j\in[n] and α1/(8δ)(1/n+ε)\alpha\geq\sqrt{1/(8\delta)\cdot(1/n+\varepsilon)}. Then 𝐍α\mathbf{N}_{\alpha} is not δ\delta-dense in any set which is ε\varepsilon-pseudorandom for \mathcal{F}.

Proof.

Under 𝐍α\mathbf{N}_{\alpha}, the volume of the threshold 𝟏[i𝐙in/2αn]\mathbf{1}[\sum_{i}\mathbf{Z}_{i}\leq n/2-\alpha n] is at least (roughly) 1/21/2, since n/2αnn/2-\alpha n is the mean of 𝐍α\mathbf{N}_{\alpha}’s weight. If 𝐍α\mathbf{N}_{\alpha} were δ\delta-dense in an ε\varepsilon-pseudorandom set, that set would give this threshold volume at least δ/2\delta/2. Taking Claim 2.3 in the contrapositive, we reach the desired conclusion when

1/2 >\frac{1}{4\delta\alpha^{2}n}+\frac{\varepsilon}{4\delta\alpha^{2}}

\alpha^{2} >\frac{1}{8\delta}(1/n+\varepsilon)

∎

2.4 Random restrictions and the switching lemma

A restriction over [n][n] is a function ρ:[n]{0,1,}\rho:[n]\to\{0,1,*\}. Indices in ρ1()\rho^{-1}(*) can be thought of as unset and each other index as set. For another restriction zz so that ρ1()z1({0,1})\rho^{-1}(*)\subseteq z^{-1}(\{0,1\}), let ρz{0,1}n\rho\circ z\in\{0,1\}^{n} be defined by

(ρz)i={zi if iρ1(),ρi otherwise.(\rho\circ z)_{i}=\begin{cases}z_{i}\text{ if $i\in\rho^{-1}(*)$,}\\ \rho_{i}\text{ otherwise.}\end{cases}

Define the restricted function f|ρ:{0,1}ρ1(){0,1}f|_{\rho}:\{0,1\}^{\rho^{-1}(*)}\to\{0,1\} over ρ\rho’s unset indices by

f|ρ(z)=f(ρz).f|_{\rho}(z)=f(\rho\circ z).
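A minimal Python rendering of these definitions (ours; `STAR` stands in for *):

```python
STAR = "*"

def compose(rho, z):
    """(rho o z): keep rho's set bits, and fill its unset positions from z."""
    return [z[i] if rho[i] == STAR else rho[i] for i in range(len(rho))]

def restrict(f, rho):
    """f|_rho, a function of the bits z supplies at rho's unset positions."""
    return lambda z: f(compose(rho, z))

# Example: f = OR of 3 bits; rho fixes the first bit to 0, leaving the rest unset.
f = lambda x: int(any(x))
rho = [0, STAR, STAR]
g = restrict(f, rho)
print(g([None, 1, 0]))  # 1; position 0 of z is never read since rho sets it
```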

Let RpR_{p} be the distribution on restrictions over [n][n] obtained by setting ρ(i)=\rho(i)=* independently with probability pp, and then setting each index not assigned * to a uniformly random bit. The switching lemma we use is due to Rossman [39], building on a long line of work [3, 23, 24, 30]:

Theorem 2.1 (Rossman [39]).

Suppose f𝒞(S,d)f\in\mathcal{C}(S,d). Then

Pr𝝆Rp[DT(f|𝝆)k](pO(logS)d1)k\Pr_{\boldsymbol{\rho}\sim R_{p}}[DT(f|_{\boldsymbol{\rho}})\geq k]\leq(p\cdot O(\log S)^{d-1})^{k}

By considering a random restriction 𝝆Rp\boldsymbol{\rho}\sim R_{p} over [n][n] and a random variable 𝐙\mathbf{Z} over {0,1}n\{0,1\}^{n}, the definition of a restricted function implies that

𝔼[f(𝝆𝐙)]=𝔼[f|𝝆(𝐙)].\mathbb{E}[f(\boldsymbol{\rho}\circ\mathbf{Z})]=\mathbb{E}[f|_{\boldsymbol{\rho}}(\mathbf{Z})].

We make crucial use of two simple corollaries of the switching lemma, which allow us to reason about distinguishability for 𝖠𝖢0{\mathsf{AC}}^{0} circuits in terms of distinguishability for short decision trees.

Lemma 2.3.

Suppose f𝒞(S,d)f\in\mathcal{C}(S,d). Then there is a distribution over depth kk decision trees so that

|𝔼[f(𝝆𝐙)]𝔼[h𝝆(𝐙)]|(pO(logS)d1)k|\mathbb{E}[f(\boldsymbol{\rho}\circ\mathbf{Z})]-\mathbb{E}[h_{\boldsymbol{\rho}}(\mathbf{Z})]|\leq(p\cdot O(\log S)^{d-1})^{k}
Proof.

Let g𝝆g_{\boldsymbol{\rho}} denote the optimal decision tree for f|𝝆f|_{\boldsymbol{\rho}}. Let EE denote the event that g𝝆g_{\boldsymbol{\rho}} has depth at most kk and Pr[E]=1q\Pr[E]=1-q. Let h𝝆h_{\boldsymbol{\rho}} be the distribution over depth at most kk decision trees obtained by sampling g𝝆g_{\boldsymbol{\rho}} conditioned on EE. Then

𝔼[f(𝝆𝐙)]\displaystyle\mathbb{E}[f(\boldsymbol{\rho}\circ\mathbf{Z})] =𝔼[f|𝝆(𝐙)]\displaystyle=\mathbb{E}[f|_{\boldsymbol{\rho}}(\mathbf{Z})]
=(1q)𝔼[g𝝆(𝐙)|E]+q𝔼[g𝝆(𝐙)|¬E]\displaystyle=(1-q)\mathbb{E}[g_{\boldsymbol{\rho}}(\mathbf{Z})|E]+q\mathbb{E}[g_{\boldsymbol{\rho}}(\mathbf{Z})|\neg E]
=(1q)𝔼[h𝝆(𝐙)]+q𝔼[g𝝆(𝐙)|¬E]\displaystyle=(1-q)\mathbb{E}[h_{\boldsymbol{\rho}}(\mathbf{Z})]+q\mathbb{E}[g_{\boldsymbol{\rho}}(\mathbf{Z})|\neg E]
=𝔼[h𝝆(𝐙)]q(𝔼[h𝝆(𝐙)]𝔼[g𝝆(𝐙)|¬E]).\displaystyle=\mathbb{E}[h_{\boldsymbol{\rho}}(\mathbf{Z})]-q(\mathbb{E}[h_{\boldsymbol{\rho}}(\mathbf{Z})]-\mathbb{E}[g_{\boldsymbol{\rho}}(\mathbf{Z})|\neg E]).

The right-hand term is bounded in absolute value by qq because ff is Boolean. By Theorem 2.1, q(pO(logS)d1)kq\leq(p\cdot O(\log S)^{d-1})^{k}. ∎

Lemma 2.4.

Suppose f𝒞(S,d)f\in\mathcal{C}(S,d). Then there’s a depth kk decision tree hh so that

|\mathbb{E}[f(\mathbf{U})]-\mathbb{E}[f(\boldsymbol{\rho}\circ\mathbf{Z})]|\leq|\mathbb{E}[h(\mathbf{U})]-\mathbb{E}[h(\mathbf{Z})]|+(p\cdot O(\log S)^{d-1})^{k}
Proof.

Lemma 2.3 gives us the following upper bound.

|𝔼[f(𝐔)]𝔼[f(𝝆𝐙)]|\displaystyle|\mathbb{E}[f(\mathbf{U})]-\mathbb{E}[f(\boldsymbol{\rho}\circ\mathbf{Z})]| |(𝔼[h𝝆(𝐔)]±q)(𝔼[h𝝆(𝐙)]±q)|\displaystyle\leq|(\mathbb{E}[h_{\boldsymbol{\rho}}(\mathbf{U})]\pm q)-(\mathbb{E}[h_{\boldsymbol{\rho}}(\mathbf{Z})]\pm q)| (Lemma 2.3)
|𝔼[h𝝆(𝐔)]𝔼[h𝝆(𝐙)]|+2q\displaystyle\leq|\mathbb{E}[h_{\boldsymbol{\rho}}(\mathbf{U})]-\mathbb{E}[h_{\boldsymbol{\rho}}(\mathbf{Z})]|+2q (triangle inequality)

We can continue to upper bound the right-hand term by

|𝔼[h𝝆(𝐔)]𝔼[h𝝆(𝐙)]|\displaystyle|\mathbb{E}[h_{\boldsymbol{\rho}}(\mathbf{U})]-\mathbb{E}[h_{\boldsymbol{\rho}}(\mathbf{Z})]| =|𝔼𝝆[𝔼[hρ(𝐔)]𝔼[hρ(𝐙)]]|\displaystyle=|\mathbb{E}_{\boldsymbol{\rho}}[\mathbb{E}[h_{\rho}(\mathbf{U})]-\mathbb{E}[h_{\rho}(\mathbf{Z})]]|
𝔼𝝆[|𝔼[hρ(𝐔)]𝔼[hρ(𝐙)]|]\displaystyle\leq\mathbb{E}_{\boldsymbol{\rho}}[|\mathbb{E}[h_{\rho}(\mathbf{U})]-\mathbb{E}[h_{\rho}(\mathbf{Z})]|] (triangle inequality)
|𝔼[h(𝐔)]𝔼[h(𝐙)]|\displaystyle\leq|\mathbb{E}[h(\mathbf{U})]-\mathbb{E}[h(\mathbf{Z})]|

where the last line holds for some hh in the support of h𝝆h_{\boldsymbol{\rho}} by averaging. ∎

3 Proof of 1.3

We start by reducing the problem of constructing a pseudorandom 𝐙\mathbf{Z} for 𝖠𝖢0{\mathsf{AC}}^{0} to constructing a pseudorandom 𝐙\mathbf{Z} for small-depth decision trees. This can be immediately achieved by applying Lemma 2.4.

Claim 3.1.

Let p[0,1]p\in[0,1] be arbitrary and suppose 𝐙\mathbf{Z} is a random variable over {0,1}n\{0,1\}^{n} which is ϵ\epsilon-pseudorandom for depth-kk decision trees. Then for 𝛒Rp\boldsymbol{\rho}\sim R_{p}, 𝛒𝐙\boldsymbol{\rho}\circ\mathbf{Z} is ϵ\epsilon^{\prime}-pseudorandom for 𝒞(S,d)\mathcal{C}(S,d) for

ϵ=ϵ+(pO(logS)d1)k\epsilon^{\prime}=\epsilon+(p\cdot O(\log S)^{d-1})^{k}

The next lemma constructs a pseudorandom distribution for depth-kk decision trees with each bit having significant bias.

Lemma 3.1.

For any kk\in\mathbb{N}, δ>0\delta>0 and K1/2δK\geq 1/2\delta, there are random variables 𝐒\mathbf{S} and 𝐃\mathbf{D} over {0,1}n\{0,1\}^{n} with the following properties:

1. 𝐃\mathbf{D} is δ\delta-dense in 𝐒\mathbf{S}.

2. For all i[n]i\in[n], biasi(𝐃)=Ω(log(1/δ)/8K)\mathrm{bias}_{i}(\mathbf{D})=\Omega(\sqrt{\log(1/\delta)/8K}).

3. 𝐒\mathbf{S} is k2/Kk^{2}/K-pseudorandom for depth-kk decision trees.

We will use the following standard lower bound on the lower tail of a binomial distribution:

Claim 3.2 ([6]).

Let 𝐙1,,𝐙K\mathbf{Z}_{1},...,\mathbf{Z}_{K} be independent unbiased coins ({0,1}\{0,1\}-valued). Then any γ\gamma with 1/2γ=r/K1/2-\gamma=r/K for some positive integer rr satisfies

2K(1h(1/2γ))2KPr[i[K]𝐙iK/2Kγ]\frac{2^{-K(1-h(1/2-\gamma))}}{\sqrt{2K}}\leq\Pr[\sum_{i\in[K]}\mathbf{Z}_{i}\leq K/2-K\gamma]
Proof of Lemma 3.1.

We sample 𝐒\mathbf{S} in two stages. First, randomly partition [n][n] into KK parts 𝐀1,,𝐀K\mathbf{A}_{1},...,\mathbf{A}_{K} for K>k2K>k^{2}. Second, assign to each 𝐀j\mathbf{A}_{j} a uniformly random bit 𝐛j\mathbf{b}_{j}, and set 𝐒i=𝐛j\mathbf{S}_{i}=\mathbf{b}_{j} for every i𝐀ji\in\mathbf{A}_{j}.

Let 𝐃\mathbf{D} be 𝐒\mathbf{S} conditioned on 𝐛=(𝐛1,,𝐛K)\mathbf{b}=(\mathbf{b}_{1},...,\mathbf{b}_{K}) having weight less than K/2Klog(1/δ)/8K/2-\sqrt{K\log(1/\delta)/8}. Since the 𝐛i\mathbf{b}_{i}’s are unbiased random bits, we can apply Claim 3.2 to lower bound 𝐃\mathbf{D}’s density: for any γ\gamma,

\Pr[\sum_{i\in[K]}\mathbf{b}_{i}\leq K/2-\gamma K]\geq\frac{2^{-K(1-h(1/2-\gamma))}}{\sqrt{2K}}.

This is at least δ\delta when

\frac{2^{-K(1-h(1/2-\gamma))}}{\sqrt{2K}} \geq\delta

1-h(1/2-\gamma) \leq\log(1/\delta)/K-\log(2K)/2K

4\gamma^{2} \leq\log(1/\delta)/K-\log(2K)/2K

where the last line suffices since h(1/2γ)14γ2h(1/2-\gamma)\geq 1-4\gamma^{2}. Hence, the set of strings with weight at most K/2γKK/2-\gamma K is δ\delta-dense when we take \gamma=\frac{1}{2}\sqrt{\log(1/\delta)/K-\log(2K)/2K}. Moreover, log(2K)/2K\log(2K)/2K is at most log(1/δ)/2K\log(1/\delta)/2K when 2K1/δ2K\leq 1/\delta, in which case γlog(1/δ)/8K\gamma\geq\sqrt{\log(1/\delta)/8K}. Conditioned on the low-weight event, each bucket bit is 11 with probability at most 1/2γ1/2-\gamma by symmetry; in particular, this lower bounds the bias of 𝐃\mathbf{D}’s bits.

To see why it’s k2/Kk^{2}/K-pseudorandom for depth-kk decision trees, consider a depth-kk decision tree TT. Over 𝐔\mathbf{U}, we can imagine evaluating TT ‘on-line’ as follows: whenever TT queries the iith bit, determine the value of ziz_{i} by flipping an unbiased coin. Over 𝐒\mathbf{S}, we can imagine evaluating TT similarly, where we determine the bucket AjA_{j} that ii lives in and the value bjb_{j} of that bucket.

If we condition 𝐒\mathbf{S} on not placing any two distinct queried indices i,ji,j in the same bucket — call this conditioned random variable 𝐒\mathbf{S}^{\prime} — then TT doesn’t have any distinguishing advantage against 𝐒\mathbf{S}^{\prime}, as all of the bits it queries are independent and uniform. By a union bound, 𝐒\mathbf{S} places two distinct queried indices in the same bucket with probability at most k2/Kk^{2}/K. TT’s distinguishing advantage is therefore at most k2/Kk^{2}/K. ∎
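For concreteness, a small Python sketch (ours) of this two-stage sampler, with rejection sampling standing in for the conditioning that defines 𝐃\mathbf{D}:

```python
import math
import random

def sample_S(n, K):
    """One draw from S: indices are randomly bucketed into K parts and every
    index in a bucket gets that bucket's uniformly random bit."""
    bucket_of = [random.randrange(K) for _ in range(n)]
    bits = [random.getrandbits(1) for _ in range(K)]
    return [bits[bucket_of[i]] for i in range(n)], sum(bits)

def sample_D(n, K, delta):
    """One draw from D: condition S on the bucket bits having low weight."""
    thresh = K / 2 - math.sqrt(K * math.log(1 / delta) / 8)
    while True:
        z, bucket_weight = sample_S(n, K)
        if bucket_weight <= thresh:
            return z

random.seed(1)
z = sample_D(n=32, K=16, delta=0.1)
print(sum(z) / 32)  # noticeably below 1/2: each bit of D is biased toward 0
```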

In principle, we could have used other pseudorandom distributions for decision trees, such as the ε\varepsilon-almost kk-wise independent distributions from [4]. The construction here is used to obtain better dependence on the parameters of interest. We will also need a claim that expresses the bias of the bits of 𝝆𝐙\boldsymbol{\rho}\circ\mathbf{Z}. The proof can be found in the appendix.

Claim 3.3.

Fix p[0,1]p\in[0,1] and a random variable 𝐙\mathbf{Z}, and let 𝝆Rp\boldsymbol{\rho}\sim R_{p}. Let EE be an event which is independent from 𝝆\boldsymbol{\rho} (in that the conditional distribution of 𝝆\boldsymbol{\rho} given EE is identical to the unconditioned distribution). Then

Pr[(𝝆𝐙)i=1|E]=pPr[𝐙i=1|E]+(1p)/2\Pr[(\boldsymbol{\rho}\circ\mathbf{Z})_{i}=1|E]=p\Pr[\mathbf{Z}_{i}=1|E]+(1-p)/2

Theorem 1.3, which we restate here, is obtained by an appropriate setting of parameters.

See Theorem 1.3.

Proof.

Let n=log(1/δ)/ε2n=\log(1/\delta)/\varepsilon^{2}, k=log(2/ε)k=\log(2/\varepsilon^{\prime}) and K=(2k2)/εK=(2k^{2})/\varepsilon^{\prime}. We also need K1/2δK\geq 1/2\delta by the restriction in Lemma 3.1, which explains the restriction 8δk2ε8\delta k^{2}\geq\varepsilon^{\prime}, simplified by using 8δε8\delta\geq\varepsilon^{\prime} (a stronger restriction) instead. Let 𝐒\mathbf{S} and 𝐃\mathbf{D} be the random variables from Lemma 3.1, and let 𝝆Rp\boldsymbol{\rho}\sim R_{p}. By Claim 3.3, the bias of each bit of 𝝆𝐃\boldsymbol{\rho}\circ\mathbf{D} is plog(1/δ)/8Kp\sqrt{\log(1/\delta)/8K}. By Claim 3.1 and Lemma 3.1, 𝝆𝐒\boldsymbol{\rho}\circ\mathbf{S} is ε\varepsilon^{\prime}-pseudorandom for ε=k2/K+(pO(logS)d1)k\varepsilon^{\prime}=k^{2}/K+(pO(\log S)^{d-1})^{k}, and 𝝆𝐃\boldsymbol{\rho}\circ\mathbf{D} remains δ\delta-dense in 𝝆𝐒\boldsymbol{\rho}\circ\mathbf{S}. By Lemma 2.1, 𝝆𝐃\boldsymbol{\rho}\circ\mathbf{D} does not have a δ\delta-dense ε\varepsilon-model when plog(1/δ)/8Kε+log(1/δ)/np\sqrt{\log(1/\delta)/8K}\geq\varepsilon+\sqrt{\log(1/\delta)/n}.

By substituting, it suffices to take p ≥ 2√(K/n) = 2kε√(2/(ε′ log(1/δ))). In comparison, ε′ ≥ k²/K + (p·O(log S)^{d−1})^k. Recalling that k²/K = ε′/2, we get that

ε/2\displaystyle\varepsilon^{\prime}/2 (2K/nO(logS)d1)k\displaystyle\geq(2\sqrt{K/n}O(\log S)^{d-1})^{k}
n2K(ε/2)1/k\displaystyle\frac{\sqrt{n}}{2\sqrt{K}}(\varepsilon^{\prime}/2)^{1/k} O(logS)d1\displaystyle\geq O(\log S)^{d-1}
log1/δεε22k(ε/2)1/k\displaystyle\frac{\sqrt{\log 1/\delta}}{\varepsilon}\cdot\frac{\sqrt{\varepsilon^{\prime}}}{2\sqrt{2}k}\cdot(\varepsilon^{\prime}/2)^{1/k} O(logS)d1.\displaystyle\geq O(\log S)^{d-1}.
log1/δεε32log(1/ε)\displaystyle\frac{\sqrt{\log 1/\delta}}{\varepsilon}\cdot\frac{\sqrt{\varepsilon^{\prime}}}{\sqrt{32}\log(1/\varepsilon^{\prime})} O(logS)d1.\displaystyle\geq O(\log S)^{d-1}.

The claim follows by solving for SS. ∎
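For concreteness, the parameter setting in this proof can be tabulated by a short script (ours; all logarithms are base 2, and the input values are hypothetical):

import math

def params(eps, eps_p, delta):
    n = math.log2(1/delta) / eps**2
    k = math.log2(2/eps_p)
    K = 2 * k**2 / eps_p
    # left-hand side of the final displayed inequality
    bound = (math.sqrt(math.log2(1/delta)) / eps) \
            * math.sqrt(eps_p) / (math.sqrt(32) * math.log2(1/eps_p))
    return n, k, K, bound

n, k, K, bound = params(eps=1e-6, eps_p=0.01, delta=2**-20)
print(f"n={n:.2e}, k={k:.2f}, K={K:.0f}, O(log S)^(d-1) <= {bound:.0f}")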

4 Proof of 1.4

1.4 follows by combining Lemma 2.2 and the following lemma:

Lemma 4.1.

𝐍_α has (ε,δ)-pseudodensity for 𝒞(S,d) with ε = (p·O(log S)^{d−1})^k and δ = e^{−αk/p}, for any p ∈ (0,1] and positive integer k.

Of note, the only additive error is the error from the switching lemma. Compare this with the fact that 𝐍_α is (3α·O(log S)^{d−1})-pseudorandom (and therefore has the same pseudodensity with δ = 1) for 𝒞(S,d), due to Tal [46].

To prove the lemma, we need a few claims.

Claim 4.1.

Suppose f𝒞(S,d)f\in\mathcal{C}(S,d). Then there is a depth-kk decision tree hh with the property that:

𝔼[f(𝐍α)]𝔼[h(𝐍α/p)]+(pO(logS)d1)k\mathbb{E}[f(\mathbf{N}_{\alpha})]\leq\mathbb{E}[h(\mathbf{N}_{\alpha/p})]+(p\cdot O(\log S)^{d-1})^{k}
Proof.

Take 𝐙=𝐍α/p\mathbf{Z}=\mathbf{N}_{\alpha/p} in Lemma 2.3, so we have 𝝆𝐍α/p=𝐍α\boldsymbol{\rho}\circ\mathbf{N}_{\alpha/p}=\mathbf{N}_{\alpha} and

𝔼[f(𝐍α)]𝔼[h𝝆(𝐍α/p)]+(pO(logS)d1)k\mathbb{E}[f(\mathbf{N}_{\alpha})]\leq\mathbb{E}[h_{\boldsymbol{\rho}}(\mathbf{N}_{\alpha/p})]+(p\cdot O(\log S)^{d-1})^{k}

Averaging over 𝝆 yields a fixed decision tree h as claimed. ∎

Second, we upper bound the extent to which the acceptance probability of a shallow decision tree can increase when passing from the uniform distribution 𝐔 to the biased distribution 𝐍_γ.

Claim 4.2.

Suppose f: {0,1}^n → {0,1} is computed by a depth-k decision tree. Then

𝔼[f(𝐍γ)](1+γ)k𝔼[f(𝐔)]eγk𝔼[f(𝐔)]\mathbb{E}[f(\mathbf{N}_{\gamma})]\leq(1+\gamma)^{k}\cdot\mathbb{E}[f(\mathbf{U})]\leq e^{\gamma k}\cdot\mathbb{E}[f(\mathbf{U})]
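The claim can be checked by brute force on small trees (our sketch; we take 𝐍_γ to be i.i.d. bits, each equal to 1 with probability (1+γ)/2, the convention under which the appendix calculation goes through):

import itertools, random

def random_tree(depth, var=0):
    # random depth-`depth` tree; each root-to-leaf path queries distinct bits
    if depth == 0:
        return random.randrange(2)   # 0/1 leaf label
    return (var, random_tree(depth-1, var+1), random_tree(depth-1, var+1))

def evaluate(tree, z):
    while isinstance(tree, tuple):
        var, left, right = tree
        tree = right if z[var] else left
    return tree

def expectation(tree, n, p1):
    # exact E[tree(z)] when each z_i is 1 independently with probability p1
    total = 0.0
    for z in itertools.product([0, 1], repeat=n):
        weight = 1.0
        for b in z:
            weight *= p1 if b else 1 - p1
        total += weight * evaluate(tree, z)
    return total

k, gamma = 4, 0.1
t = random_tree(k)
eu = expectation(t, k, 0.5)
en = expectation(t, k, (1 + gamma) / 2)
print(en, "<=", (1 + gamma)**k * eu)   # the claimed inequality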

The proof amounts to a calculation, which we include in the appendix. We’re now in a position to prove the lemma.

Proof of Lemma 4.1.

Directly applying 4.1, we obtain a depth-k decision tree f′ with

𝔼[f(𝐍α)]𝔼[f(𝐍α/p)]+(pO(logS)d1)k\mathbb{E}[f(\mathbf{N}_{\alpha})]\leq\mathbb{E}[f^{\prime}(\mathbf{N}_{\alpha/p})]+(p\cdot O(\log S)^{d-1})^{k}

Applying 4.2 to 𝔼[f(𝐍α/p)]\mathbb{E}[f^{\prime}(\mathbf{N}_{\alpha/p})], we get

𝔼[f(𝐍α)]\displaystyle\mathbb{E}[f(\mathbf{N}_{\alpha})] (1+α/p)k𝔼[f(𝐔)]\displaystyle\leq(1+\alpha/p)^{k}\mathbb{E}[f^{\prime}(\mathbf{U})]
eαk/p𝔼[f(𝐔)].\displaystyle\leq e^{\alpha k/p}\mathbb{E}[f^{\prime}(\mathbf{U})].

Putting these together finishes the proof. ∎

We can now prove 1.4, restated here:

See 1.4

Proof of 1.4.

Let n = 1/(7ε), k = log(1/ε′) and α = √(ε/δ). These choices satisfy α ≥ √((1/8δ)(1/n + ε)), meaning 𝐍_α is not δ-dense in any ε-pseudorandom set for 𝒞(S,d), by Lemma 2.2.

By Lemma 4.1, 𝐍α\mathbf{N}_{\alpha} has (ε,δ)(\varepsilon^{\prime},\delta)-pseudodensity for δ=eαk/p\delta=e^{-\alpha k/p} and ε=(pO(logS)d1)k\varepsilon^{\prime}=(p\cdot O(\log S)^{d-1})^{k}.

The constraint on the density implies

δ\displaystyle\delta =eαk/p\displaystyle=e^{-\alpha k/p}
log(1/δ)\displaystyle\log(1/\delta) =αk/p\displaystyle=\alpha k/p
log(1/δ)\displaystyle\log(1/\delta) =ε/δlog(1/ε)/p\displaystyle=\sqrt{\varepsilon/\delta}\log(1/\varepsilon^{\prime})/p
p\displaystyle p =ε/δlog(1/ε)log(1/δ).\displaystyle=\frac{\sqrt{\varepsilon/\delta}\log(1/\varepsilon^{\prime})}{\log(1/\delta)}.

Plugging this value of pp into the expression for ε\varepsilon^{\prime}, we get

ε\displaystyle\varepsilon^{\prime} =(pO(logS)d1)k\displaystyle=(p\cdot O(\log S)^{d-1})^{k}
(ε)1/k/p\displaystyle(\varepsilon^{\prime})^{1/k}/p =O(logS)d1\displaystyle=O(\log S)^{d-1}
(ε)1/log(1/ε)δlog(1/δ)εlog(1/ε)\displaystyle(\varepsilon^{\prime})^{1/\log(1/\varepsilon^{\prime})}\cdot\frac{\sqrt{\delta}\log(1/\delta)}{\sqrt{\varepsilon}\log(1/\varepsilon^{\prime})} =O(logS)d1.\displaystyle=O(\log S)^{d-1}.

Note that (ε)1/log(1/ε)=2log(1/ε)/log(1/ε)=1/2(\varepsilon^{\prime})^{1/\log(1/\varepsilon^{\prime})}=2^{-\log(1/\varepsilon^{\prime})/\log(1/\varepsilon^{\prime})}=1/2. Solving for SS gives the claimed bound. ∎
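The identity used in the last step, together with the value of p and the resulting bound, can be checked numerically (our script; the values of ε, ε′, δ are hypothetical):

import math

eps, eps_p, delta = 1e-6, 1e-4, 0.01
k = math.log2(1/eps_p)
print(eps_p ** (1/k))                  # 0.5, for any eps_p in (0,1)
p = math.sqrt(eps/delta) * k / math.log2(1/delta)
bound = 0.5 * math.sqrt(delta) * math.log2(1/delta) / (math.sqrt(eps) * k)
print(p, bound)                        # bound stands in for O(log S)^(d-1)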

5 Proofs of 1.5 and 1.6

In this section, we prove 1.5 and 1.6. We include them in the same section due to their similarity, and we start with 1.5, which is more involved. Indeed, 1.6 will follow directly from a result of Shaltiel and Viola [42] after an appropriate error reduction procedure.

5.1 Proof of 1.5

As in 1.4, we will prove unconditional pseudodensity for 𝐍α\mathbf{N}_{\alpha} and compare it with the lower bound for α\alpha given by Lemma 2.2.

Let 𝐒𝐩_{n,k} denote the random variable uniform over the set of n-bit strings of weight exactly n/2 − k. We convert a function which refutes the pseudodensity of 𝐍_α into a (random) function which distinguishes between the two 'slices' of the hypercube of weights m/2 − αm and m/2, extending similar ideas from [42, 33, 44].

Lemma 5.1.

Let f:{0,1}n{0,1}f:\{0,1\}^{n}\to\{0,1\} be a function for which δ𝔼[f(𝐍α)]ε𝔼[f(𝐔)]\delta\mathbb{E}[f(\mathbf{N}_{\alpha})]-\varepsilon^{\prime}\geq\mathbb{E}[f(\mathbf{U})] and let q=𝔼[f(𝐍α)]q=\mathbb{E}[f(\mathbf{N}_{\alpha})]. Then for any positive integer \ell, there is a random function 𝐅:{0,1}m{0,1}\mathbf{F}:\{0,1\}^{m}\to\{0,1\} so that

Pr[𝐅(𝐒𝐩m,αm)=0]\displaystyle\Pr[\mathbf{F}(\mathbf{Sp}_{m,\alpha m})=0] eq;\displaystyle\leq e^{-\ell q};
Pr[𝐅(𝐒𝐩m,0)=1]\displaystyle\Pr[\mathbf{F}(\mathbf{Sp}_{m,0})=1] qδ.\displaystyle\leq\ell q\delta.

Moreover, 𝐅\mathbf{F} is the OR of \ell copies of ff (on random inputs).

Proof.

Fix an input z ∈ {0,1}^m. For i ∈ [ℓ], let 𝐈_i denote the random length-n sequence over [m] obtained by sampling n indices j_1, …, j_n ∈ [m] independently and uniformly at random (with replacement).

The random function is defined as

𝐅(z)=i[]f(z𝐈i)\mathbf{F}(z)=\bigvee_{i\in[\ell]}f(z_{\mathbf{I}_{i}})

We bound 𝐅's acceptance probability on 𝐒𝐩_{m,αm} and 𝐒𝐩_{m,0} as follows:

  1. Suppose z ∼ 𝐒𝐩_{m,αm}. Then each z_{𝐈_i} is distributed as 𝐍_α on n bits, meaning the z_{𝐈_i}'s constitute ℓ independent samples from 𝐍_α. By definition, f accepts with probability q over 𝐍_α, and so the probability that f accepts in none of the ℓ independent runs is (1−q)^ℓ ≤ e^{−ℓq}.

  2. Suppose z ∼ 𝐒𝐩_{m,0}. Then each z_{𝐈_i} ∈ {0,1}^n is a uniformly random string, meaning we have ℓ independent samples from 𝐔. Each run of f accepts independently with probability q′ = 𝔼[f(𝐔)], so by a union bound some run accepts with probability at most ℓq′. Since q′ ≤ qδ − ε′ ≤ qδ, we can bound this by ℓqδ. ∎
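The amplification in this proof is easy to simulate (our sketch; the distinguisher f below is a hypothetical stand-in, and the slice sampler follows the definition of 𝐒𝐩_{m,k}):

import random

def sample_slice(m, k):
    # uniform m-bit string of weight exactly m//2 - k (the slice Sp_{m,k})
    z = [0] * m
    for i in random.sample(range(m), m // 2 - k):
        z[i] = 1
    return z

def F(z, f, n, ell):
    # OR of ell copies of f, each run on n coordinates of z sampled
    # uniformly with replacement (the substring z_{I_i} from the proof)
    return max(f([z[random.randrange(len(z))] for _ in range(n)])
               for _ in range(ell))

def f(x):
    # hypothetical distinguisher: accept noticeably low-weight strings
    return 1 if sum(x) < len(x)/2 - 2*len(x)**0.5 else 0

m, n, alpha, ell, trials = 4000, 400, 0.1, 30, 200
rej = sum(1 - F(sample_slice(m, int(alpha*m)), f, n, ell) for _ in range(trials))
acc = sum(F(sample_slice(m, 0), f, n, ell) for _ in range(trials))
print("rejects biased slice:", rej/trials, "accepts middle slice:", acc/trials)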

Lemma 5.2.

Fix ε′, δ > 0 with δ < c for some absolute constant c ≈ 1/200. Let 𝔽 be a field of positive characteristic p = O(1) and let ℱ be the set of all polynomials P ∈ 𝔽[X_1, …, X_n] with degree at most d. Then 𝐍_α has (ε′,δ)-pseudodensity with respect to ℱ whenever d = O(1/α): any P ∈ ℱ witnessing the failure of (ε′,δ)-pseudodensity must have degree Ω(1/α).

Note that there is no dependence on ε′. This is a consequence of the positive characteristic p, which allows us to approximate large fan-in ORs by polynomials whose degree depends only on the quality of the approximation and not on the fan-in.

We proceed in the contrapositive: if 𝐍_α is not pseudodense, we use the witness to construct a polynomial on which we can prove degree lower bounds directly; since the two degrees are related by a constant factor, the lower bound on the witness's degree follows. The proof uses the following approximation of the OR function by a low-degree polynomial over 𝔽 when 𝔽 has positive characteristic (in characteristic zero, the degree depends on the fan-in of the OR, which we want to avoid).

Claim 5.1 ([37]).

For any n, any γ > 0, and any finite field 𝔽 with characteristic p > 0, there is a distribution 𝐑 on polynomials of degree d ≤ p·log(1/γ) so that for all z ∈ {0,1}^n,

Pr[𝐑(z) = OR(z)] ≥ 1 − γ.

Moreover, 𝐑 is supported on polynomials R with R(z) ∈ {0,1} for every z ∈ {0,1}^n.
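For intuition, here is our sketch of the standard construction behind this claim, specialized to 𝔽₂: the OR of t = log(1/γ) random parities, i.e. R(z) = 1 − ∏_j(1 + ⟨s_j, z⟩) over 𝔽₂, which has degree t ≤ p·log(1/γ) and errs on any fixed nonzero z with probability 2^{−t} ≤ γ:

import math, random

def sample_R(n, t):
    # R(z) = OR of t random parities: 1 - prod_j (1 + <s_j, z>) over F_2
    subsets = [[random.randrange(2) for _ in range(n)] for _ in range(t)]
    def R(z):
        return 1 if any(sum(s[i] & z[i] for i in range(n)) % 2
                        for s in subsets) else 0
    return R

n, gamma = 20, 0.01
t = math.ceil(math.log2(1/gamma))     # the degree of the approximator
errors = 0
for _ in range(2000):
    z = [random.randrange(2) for _ in range(n)]
    errors += sample_R(n, t)(z) != (1 if any(z) else 0)
print("empirical error:", errors / 2000, "<= gamma =", gamma)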

We also use a special case of the robust Hegedűs lemma, recently proved by Srinivasan [44].

Lemma 5.3 (Robust Hegedűs lemma (special case), [44]).

Let 𝔽 be a finite field. Let 2^{−m/100} ≤ λ ≤ c, where c < 1 is a (small) absolute constant, and let α²m be an integer so that 2^{−2α²m} ≥ λ. Suppose P: 𝔽^m → 𝔽 is a degree-d polynomial for which:

  1. Pr[P(𝐒𝐩_{m,αm}) ≠ 0] ≤ λ;

  2. Pr[P(𝐒𝐩_{m,0}) = 0] ≤ 1 − e^{−α²m/2}.

Then d=Ω(αm)d=\Omega(\alpha m).

Now we can prove the main lemma.

Proof of Lemma 5.2.

Let P: 𝔽^n → 𝔽 be a degree-d polynomial. We can assume P takes boolean values over {0,1}^n by replacing P with P^{p−1} (p being 𝔽's characteristic); this increases the degree to less than pd. Assume towards a contradiction that P certifies that 𝐍_α is not (ε′,δ)-pseudodense, in the sense that δPr[P(𝐍_α) ≠ 0] − ε′ ≥ Pr[P(𝐔) ≠ 0].

Let q = Pr[P(𝐍_α) ≠ 0]. Applying Lemma 5.1 to P (noting that P is boolean on {0,1}^n), with ℓ to be specified later, we obtain a random function 𝐅 of the form ∨_i P(z_{𝐈_i}). For a fixed z ∈ {0,1}^m, let 𝐗(z) = (z_{𝐈_i})_{i∈[ℓ]}. For γ to be determined later, let 𝐑 be the random polynomial from 5.1; we remark that 𝐑 can be made to output boolean values on boolean inputs. Then for any z ∈ {0,1}^m, 𝐅(z) = OR(𝐗(z)), so 𝐅(z) = 𝐑(𝐗(z)) with probability at least 1 − γ. In sum, 𝐑(𝐗(·)) has the property that:

Pr[𝐑(𝐗(𝐒𝐩m,αm))=0]\displaystyle\Pr[\mathbf{R}(\mathbf{X}(\mathbf{Sp}_{m,\alpha m}))=0] eq+γ;\displaystyle\leq e^{-\ell q}+\gamma;
Pr[𝐑(𝐗(𝐒𝐩m,0))=1]\displaystyle\Pr[\mathbf{R}(\mathbf{X}(\mathbf{Sp}_{m,0}))=1] qδ+γ.\displaystyle\leq\ell q\delta+\gamma.

Moreover, 𝐑(𝐗(·)) is a random polynomial over 𝔽 of degree at most dp²log(1/γ), with one factor of p coming from 5.1 and the other from possibly replacing P with P^{p−1}.

We now apply Lemma 5.3 to 𝐑(𝐗(·)). Let c denote the constant from the statement of the lemma. Set m = ln(1/c)/(2α²), γ = c/2, and ℓ = ln(2/c)/q ≤ ln(2/c)·δ/ε′, where the inequality holds because q ≥ ε′/δ follows from δq − ε′ ≥ 0. Then λ = e^{−ℓq} + γ = c. With this setting of parameters, the hypotheses of Lemma 5.3 are met (for α sufficiently small): 2^{−m/100} ≤ λ = c ≤ 2^{−2α²m}.

By calculation, 𝐑(𝐗(·)) misclassifies 𝐒𝐩_{m,αm} with probability at most e^{−ℓq} + γ = c and misclassifies 𝐒𝐩_{m,0} with probability at most ℓqδ + γ ≤ ln(2/c)δ + c/2, which we can bound away from 1 by choosing δ to be a sufficiently small constant.

Therefore, by Lemma 5.3, the degree of some polynomial in the support of 𝐑(𝐗(·)) is Ω(αm) = Ω(1/α). Since the degree of 𝐑(𝐗(·)) is at most dp²log(1/γ), and γ and p (𝔽's characteristic) are constants, the same conclusion holds for P. ∎

We can now prove 1.5, restated here:

See 1.5

Proof of 1.5.

By Lemma 5.2, 𝐍_α has (ε′,δ)-pseudodensity with respect to degree-d polynomials over 𝔽 whenever d = O(1/α). On the other hand, taking α ≥ √((1/8δ)(1/n + ε)) implies that 𝐍_α is not δ-dense in any ε-pseudorandom set. Therefore, if we take n = 1/ε, then picking α ≥ √(2ε/8δ) gives the separation. ∎

5.2 Proof of 1.6

In what is becoming tradition, we will show an unconditional pseudodensity result, from which 1.6 will follow by Lemma 2.2.

Lemma 5.4.

Let ε > 0 and 1/4 > δ > 0 be arbitrary. Suppose ℱ is a test class of boolean functions f: {0,1}^n → {0,1} with the following property: there is no 𝖠𝖢⁰ ℱ-oracle circuit of size poly(n/αε) which computes 𝖬𝖠𝖩 on O(1/α) input bits. Then 𝐍_α is (εδ, δ)-pseudodense with respect to ℱ.

This will follow from the following result of Shaltiel and Viola:

Theorem 5.1 ([42]).

Let f: {0,1}^n → {0,1} be a function that distinguishes between 𝐔 and 𝐍_α with constant advantage. Then there is an 𝖠𝖢⁰ circuit of size poly(n/α) using f-oracle gates which computes majority on O(1/α) bits.

Proof of Lemma 5.4.

Suppose 𝐍_α is not (εδ, δ)-pseudodense for ℱ as in the statement of the lemma, and let f witness this fact, with q = 𝔼[f(𝐍_α)].

Let

𝐅(z) = ⋁_{i∈[ℓ]} f(𝝆_i(z))

where each 𝝆_i is a uniformly random permutation of [n]. When z ∼ 𝐍_α, 𝝆_i(z) is distributed as 𝐍_α, and likewise for 𝐔. Hence 𝐅 rejects samples from 𝐍_α with probability (1−q)^ℓ ≤ e^{−qℓ}, which is a constant when ℓ = 1/q ≤ 1/ε; here q ≥ ε because δq − εδ > 0. Additionally, 𝐅 accepts samples from 𝐔 with probability at most ℓ·𝔼[f(𝐔)] ≤ ℓqδ, which is at most δ for our choice of ℓ. Averaging, some function in 𝐅's support distinguishes 𝐍_α and 𝐔 with constant advantage. Applying Theorem 5.1 yields a small 𝖠𝖢⁰ ℱ-oracle circuit computing majority, which is a contradiction. ∎

See 1.6

The proof is the same as that of 1.5, so we omit the details.

References

  • [1] Scott Aaronson. BQP and the polynomial hierarchy. In Proceedings of the forty-second ACM symposium on Theory of computing, pages 141–150, 2010.
  • [2] Rohit Agrawal. Coin theorems and the Fourier expansion. arXiv preprint arXiv:1906.03743, 2019.
  • [3] Miklos Ajtai and Avi Wigderson. Deterministic simulation of probabilistic constant depth circuits. In 26th Annual Symposium on Foundations of Computer Science (sfcs 1985), pages 11–19. IEEE, 1985.
  • [4] Noga Alon, Oded Goldreich, Johan Håstad, and René Peralta. Simple constructions of almost k-wise independent random variables. Random Structures & Algorithms, 3(3):289–304, 1992.
  • [5] Sanjeev Arora, Rong Ge, Yingyu Liang, Tengyu Ma, and Yi Zhang. Generalization and equilibrium in generative adversarial nets (gans). In Proceedings of the 34th International Conference on Machine Learning-Volume 70, pages 224–232. JMLR. org, 2017.
  • [6] R.B. Ash. Information Theory. Dover books on advanced mathematics. Dover Publications, 1990.
  • [7] Boaz Barak, Moritz Hardt, and Satyen Kale. The uniform hardcore lemma via approximate bregman projections. In Proceedings of the twentieth annual ACM-SIAM symposium on Discrete algorithms, pages 1193–1200. SIAM, 2009.
  • [8] Boaz Barak, Ronen Shaltiel, and Avi Wigderson. Computational analogues of entropy. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, pages 200–215. Springer, 2003.
  • [9] Thomas F Bloom and Olof Sisask. Breaking the logarithmic barrier in Roth's theorem on arithmetic progressions. arXiv preprint arXiv:2007.03528, 2020.
  • [10] Joshua Brody and Elad Verbin. The coin problem and pseudorandomness for branching programs. In 2010 IEEE 51st Annual Symposium on Foundations of Computer Science, pages 30–39. IEEE, 2010.
  • [11] Mei-Chu Chang et al. A polynomial bound in Freiman's theorem. Duke Mathematical Journal, 113(3):399–419, 2002.
  • [12] Gil Cohen, Anat Ganor, and Ran Raz. Two sides of the coin problem. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2014). Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2014.
  • [13] Dean Doron, Dana Moshkovitz, Justin Oh, and David Zuckerman. Nearly optimal pseudorandomness from hardness. Technical report, ECCC preprint TR19-099, 2019.
  • [14] Stefan Dziembowski and Krzysztof Pietrzak. Leakage-resilient cryptography. In 2008 49th Annual IEEE Symposium on Foundations of Computer Science, pages 293–302. IEEE, 2008.
  • [15] Yoav Freund and Robert Schapire. A short introduction to boosting. Journal-Japanese Society For Artificial Intelligence, 14(771-780):1612, 1999.
  • [16] W. T. Gowers. A new proof of Szemerédi's theorem. Geometric & Functional Analysis GAFA, 11(3):465–588, 2001.
  • [17] W. T. Gowers. Decompositions, approximate structure, transference, and the Hahn–Banach theorem. Bulletin of the London Mathematical Society, 42(4):573–606, 2010.
  • [18] Ben Green and Terence Tao. The primes contain arbitrarily long arithmetic progressions. Annals of Mathematics, pages 481–547, 2008.
  • [19] Aryeh Grinberg, Ronen Shaltiel, and Emanuele Viola. Indistinguishability by adaptive procedures with advice, and lower bounds on hardness amplification proofs. In 2018 IEEE 59th Annual Symposium on Foundations of Computer Science (FOCS), pages 956–966. IEEE, 2018.
  • [20] Iftach Haitner, Omer Reingold, and Salil Vadhan. Efficiency improvements in constructing pseudorandom generators from one-way functions. SIAM Journal on Computing, 42(3):1405–1430, 2013.
  • [21] Iftach Haitner, Omer Reingold, Salil Vadhan, and Hoeteck Wee. Inaccessible entropy. In Proceedings of the forty-first annual ACM symposium on Theory of computing, pages 611–620, 2009.
  • [22] Lianna Hambardzumyan and Yaqiao Li. Chang's lemma via Pinsker's inequality. Discrete Mathematics, 343(1):111496, 2020.
  • [23] Johan Håstad. Computational limitations of small-depth circuits. 1987.
  • [24] Johan Håstad. On the correlation of parity and small-depth circuits. SIAM Journal on Computing, 43(5):1699–1708, 2014.
  • [25] Johan Håstad, Russell Impagliazzo, Leonid A Levin, and Michael Luby. A pseudorandom generator from any one-way function. SIAM Journal on Computing, 28(4):1364–1396, 1999.
  • [26] Thomas Holenstein. Key agreement from weak bit agreement. In Proceedings of the thirty-seventh annual ACM symposium on Theory of computing, pages 664–673, 2005.
  • [27] Chun-Yuan Hsiao, Chi-Jen Lu, and Leonid Reyzin. Conditional computational entropy, or toward separating pseudoentropy from compressibility. In Annual International Conference on the Theory and Applications of Cryptographic Techniques, pages 169–186. Springer, 2007.
  • [28] Russell Impagliazzo. Hard-core distributions for somewhat hard problems. In Proceedings of IEEE 36th Annual Foundations of Computer Science, pages 538–545. IEEE, 1995.
  • [29] Russell Impagliazzo. Connections between pseudo-randomness and machine learning: boosting, dense models, and regularity. 2020.
  • [30] Russell Impagliazzo, William Matthews, and Ramamohan Paturi. A satisfiability algorithm for AC0. In Proceedings of the twenty-third annual ACM-SIAM symposium on Discrete Algorithms, pages 961–972. SIAM, 2012.
  • [31] Russell Impagliazzo, Cristopher Moore, and Alexander Russell. An entropic proof of chang’s inequality. SIAM Journal on Discrete Mathematics, 28(1):173–176, 2014.
  • [32] Adam R Klivans and Rocco A Servedio. Boosting and hard-core set construction. Machine Learning, 51(3):217–238, 2003.
  • [33] Nutan Limaye, Karteek Sreenivasaiah, Srikanth Srinivasan, Utkarsh Tripathi, and S Venkitesh. A fixed-depth size-hierarchy theorem for 𝖠𝖢0[]{\mathsf{AC}}^{0}[\oplus] via the coin problem. In Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing, pages 442–453, 2019.
  • [34] Chi-Jen Lu, Shi-Chun Tsai, and Hsin-Lung Wu. Complexity of hard-core set proofs. computational complexity, 20(1):145–171, 2011.
  • [35] Or Meir and Avi Wigderson. Prediction from partial information and hindsight, with application to circuit lower bounds. computational complexity, 28(2):145–183, 2019.
  • [36] Ilya Mironov, Omkant Pandey, Omer Reingold, and Salil Vadhan. Computational differential privacy. In Annual International Cryptology Conference, pages 126–142. Springer, 2009.
  • [37] Alexander A Razborov. Lower bounds on the size of bounded depth circuits over a complete basis with logical addition. Mathematical Notes of the Academy of Sciences of the USSR, 41(4):333–338, 1987.
  • [38] Omer Reingold, Luca Trevisan, Madhur Tulsiani, and Salil Vadhan. Dense subsets of pseudorandom sets. In 2008 49th Annual IEEE Symposium on Foundations of Computer Science, pages 76–85. IEEE, 2008.
  • [39] Benjamin Rossman. An entropy proof of the switching lemma and tight bounds on the decision-tree size of AC0, 2017.
  • [40] Rocco A Servedio and Li-Yang Tan. Improved pseudorandom generators from pseudorandom multi-switching lemmas. arXiv preprint arXiv:1801.03590, 2018.
  • [41] Ronen Shaltiel. Is it possible to improve Yao's XOR lemma using reductions that exploit the efficiency of their oracle? In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2020). Schloss Dagstuhl-Leibniz-Zentrum für Informatik, 2020.
  • [42] Ronen Shaltiel and Emanuele Viola. Hardness amplification proofs require majority. SIAM Journal on Computing, 39(7):3122–3154, 2010.
  • [43] Roman Smolensky. On representations by low-degree polynomials. In Proceedings of 1993 IEEE 34th Annual Foundations of Computer Science, pages 130–138. IEEE, 1993.
  • [44] Srikanth Srinivasan. A robust version of Hegedűs's lemma, with applications. In Konstantin Makarychev, Yury Makarychev, Madhur Tulsiani, Gautam Kamath, and Julia Chuzhoy, editors, Proceedings of the 52nd Annual ACM SIGACT Symposium on Theory of Computing, STOC 2020, Chicago, IL, USA, June 22-26, 2020, pages 1349–1362. ACM, 2020.
  • [45] Endre Szemerédi. On sets of integers containing no four elements in arithmetic progression. Acta Mathematica Academiae Scientiarum Hungarica, 20(1-2):89–104, 1969.
  • [46] Avishay Tal. Tight bounds on the Fourier spectrum of AC0. In 32nd Computational Complexity Conference (CCC 2017). Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2017.
  • [47] Michel Talagrand. How much are increasing sets positively correlated? Combinatorica, 16(2):243–258, 1996.
  • [48] Terence Tao, Tamar Ziegler, et al. The primes contain arbitrarily long polynomial progressions. Acta Mathematica, 201(2):213–305, 2008.
  • [49] Luca Trevisan, Madhur Tulsiani, and Salil Vadhan. Regularity, boosting, and efficiently simulating every high-entropy distribution. In 2009 24th Annual IEEE Conference on Computational Complexity, pages 126–136. IEEE, 2009.
  • [50] Thomas Watson. Advice lower bounds for the dense model theorem. ACM Transactions on Computation Theory (TOCT), 7(1):1–18, 2015.
  • [51] Andrew C Yao. Theory and application of trapdoor functions. In 23rd Annual Symposium on Foundations of Computer Science (SFCS 1982), pages 80–91. IEEE, 1982.
  • [52] Jiapeng Zhang. On the query complexity for showing dense model. 2011.

Appendix A Omitted proofs

A.1 Proof of 3.3

See 3.3

Proof.

Let RiR_{i} be the event that i𝝆1()i\in\boldsymbol{\rho}^{-1}(*).

Pr[(𝝆𝐙)i=1|E]\displaystyle\Pr[(\boldsymbol{\rho}\circ\mathbf{Z})_{i}=1|E] =Pr[Ri]Pr[𝐙i=1|Ri,E]+Pr[¬Ri]Pr[𝝆i=1|¬Ri,E]\displaystyle=\Pr[R_{i}]\Pr[\mathbf{Z}_{i}=1|R_{i},E]+\Pr[\neg R_{i}]\Pr[\boldsymbol{\rho}_{i}=1|\neg R_{i},E]
= p·Pr[𝐙_i = 1|E] + (1−p)/2

where we used Pr[R_i] = p, the independence of 𝐙_i from R_i (so Pr[𝐙_i = 1|R_i, E] = Pr[𝐙_i = 1|E]), and the independence of 𝝆 from E (so Pr[𝝆_i = 1|¬R_i, E] = 1/2) to obtain the final line. ∎

A.2 Proof of 4.2

See 4.2

Proof.

We proceed by induction on the depth, recalling that each bit of 𝐍_ε independently equals 1 with probability (1+ε)/2. When k = 0, f is constant and the claim holds trivially. Suppose f(z) = (1−z_i)g(z) + z_i h(z), where z_i is the variable queried at the root and g, h have depth-k decision trees which don't depend on z_i. Note that 𝔼[f(𝐔)] = (𝔼[g(𝐔)] + 𝔼[h(𝐔)])/2. For z ∼ 𝐍_ε,

𝔼[f(z)] = 𝔼[(1−z_i)g(z)] + 𝔼[z_i h(z)]
= (1−𝔼[z_i])𝔼[g(z)] + 𝔼[z_i]𝔼[h(z)]   (independence)
≤ ((1−ε)/2)(1+ε)^k 𝔼[g(𝐔)] + ((1+ε)/2)(1+ε)^k 𝔼[h(𝐔)]   (induction)
= (1+ε)^k[(𝔼[g(𝐔)] + 𝔼[h(𝐔)])/2 + (ε/2)(𝔼[h(𝐔)] − 𝔼[g(𝐔)])]
= (1+ε)^k[𝔼[f(𝐔)] + (ε/2)(𝔼[h(𝐔)] − 𝔼[g(𝐔)])]
≤ (1+ε)^k[𝔼[f(𝐔)] + ε𝔼[f(𝐔)]]   (since 𝔼[h(𝐔)] ≤ 2𝔼[f(𝐔)] and 𝔼[g(𝐔)] ≥ 0)
= (1+ε)^{k+1}·𝔼[f(𝐔)]. ∎