
Comparing computational entropies below majority
(or: When is the dense model theorem false?)

Russell Impagliazzo (UCSD CSE, [email protected]) and Sam McGuire (UCSD CSE, [email protected]). Research supported by NSF award 1909634 and a Simons Investigator award.
Abstract

Computational pseudorandomness studies the extent to which a random variable 𝐙\mathbf{Z} looks like the uniform distribution according to a class of tests \mathcal{F}. Computational entropy generalizes computational pseudorandomness by studying the extent to which a random variable looks like a high entropy distribution. There are different formal definitions of computational entropy with different advantages for different applications. Because of this, it is of interest to understand when these definitions are equivalent.

We consider three notions of computational entropy which are known to be equivalent when the test class \mathcal{F} is closed under taking majorities. This equivalence constitutes (essentially) the so-called dense model theorem of Green and Tao (and later made explicit by Tao-Ziegler, Reingold et al., and Gowers). The dense model theorem plays a key role in Green and Tao’s proof that the primes contain arbitrarily long arithmetic progressions and has since been connected to a surprisingly wide range of topics in mathematics and computer science, including cryptography, computational complexity, combinatorics and machine learning. We show that, in different situations where \mathcal{F} is not closed under majority, this equivalence fails. This in turn provides examples where the dense model theorem is false.

1 Introduction

Computational pseudorandomness is a central topic in theoretical computer science. In this scenario, one has a class \mathcal{F} of boolean functions f:{0,1}n{0,1}f:\{0,1\}^{n}\to\{0,1\} (which we’ll refer to as tests) and a random variable 𝐙\mathbf{Z} over {0,1}n\{0,1\}^{n}. We say that 𝐙\mathbf{Z} is ε\varepsilon-pseudorandom with respect to \mathcal{F} if maxf|𝔼[f(𝐙)]𝔼[f(𝐔)]|ε\max_{f\in\mathcal{F}}|\mathbb{E}[f(\mathbf{Z})]-\mathbb{E}[f(\mathbf{U})]|\leq\varepsilon where 𝐔\mathbf{U} is the uniform distribution over {0,1}n\{0,1\}^{n} and ε>0\varepsilon>0 is small. In this case, we think of 𝐙\mathbf{Z} as ‘behaving like the uniform distribution’ according to tests in \mathcal{F}. In general, we say that two random variables 𝐗\mathbf{X}, 𝐘\mathbf{Y} are ε\varepsilon-indistinguishable by \mathcal{F} if maxf|𝔼[f(𝐗)]𝔼[f(𝐘)]|ε\max_{f\in\mathcal{F}}|\mathbb{E}[f(\mathbf{X})]-\mathbb{E}[f(\mathbf{Y})]|\leq\varepsilon (and so ε\varepsilon-pseudorandom distributions are exactly those which are ε\varepsilon-indistinguishable from 𝐔\mathbf{U}). Constructing explicit 𝐙\mathbf{Z}’s which behave like the uniform distribution according to different test classes is among the central goals of complexity theory, with sufficiently strong constructions leading to, for example, derandomization of 𝖡𝖯𝖯{\mathsf{BPP}}. One way in which the theory of pseudorandomness is rich is that there are multiple equivalent formulations of pseudorandomness, such as Yao’s next bit test ([51]).
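To make these definitions concrete, here is a minimal Python sketch (ours, not from the paper) computing distinguishing advantage by brute force over explicitly represented distributions; the parity-biased distribution and the projection test class are hypothetical choices for illustration.

```python
import itertools

def advantage(f, X, Y):
    """Distinguishing advantage |E[f(X)] - E[f(Y)]| of one test f, where X and Y
    are distributions given as dicts mapping n-bit tuples to probabilities."""
    ex = sum(p * f(z) for z, p in X.items())
    ey = sum(p * f(z) for z, p in Y.items())
    return abs(ex - ey)

def is_indistinguishable(tests, X, Y, eps):
    """X and Y are eps-indistinguishable by `tests` iff max_f |E[f(X)] - E[f(Y)]| <= eps."""
    return max(advantage(f, X, Y) for f in tests) <= eps

n = 3
U = {z: 2**-n for z in itertools.product([0, 1], repeat=n)}    # uniform distribution
Z = {z: (2**-(n - 1) if sum(z) % 2 == 0 else 0.0) for z in U}  # uniform on even-parity strings
projections = [lambda z, i=i: z[i] for i in range(n)]          # tests f(z) = z_i
print(is_indistinguishable(projections, Z, U, 0.01))           # True
```

Against the class of bit projections, the distribution Z above is perfectly pseudorandom even though it is far from uniform; a parity test would distinguish the two completely, illustrating how pseudorandomness depends on the test class \mathcal{F}.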

The various notions of pseudo-entropy and pseudo-density generalize pseudo-randomness to formalize how much randomness a distribution appears to have, as far as a given class of tests can perceive. Many of these notions were first introduced as stepping stones towards pseudo-randomness, giving properties of sub-routines within constructions of pseudo-random generators. However, measuring seeming randomness quantitatively is important in many other contexts, so these notions have found wider application. For example, in mathematical subjects such as combinatorics and number theory, there is a general phenomenon of “structure vs. randomness”, where a deterministically defined object such as a graph or set of integers can be decomposed into a structured part and a random part. Pseudo-entropy quantifies how much randomness the “random part” has. Notions of pseudo-density were used in this context by Green, Tao, and Ziegler [18, 48] to show that the primes contain arbitrarily long arithmetic progressions. We can also use pseudo-entropy notions to characterize the amount of seeming randomness that remains in a cryptographic key after it has been partially compromised by a side-channel attack. A data set used in a machine learning algorithm might not have much randomness in itself, and might not be completely random looking, but is hopefully representative of the much larger set of inputs that the results of the algorithm will be applied to, so we can use notions of pseudo-entropy to say when such algorithms will generalize. There are many possible definitions of this intuitive idea, and as with pseudo-randomness, the power of pseudo-entropy is that many of these notions have been related or proven equivalent.

In particular, the dense model theorem provides such a basic equivalence. Here, the intuitive concept we are trying to capture is the density (or relative min-entropy) of the target distribution within a larger distribution: what fraction of the larger distribution the target occupies. We say that 𝐙\mathbf{Z} is δ\delta-dense if 𝔼[μ(x)]=2nxμ(x)δ\mathbb{E}[\mu(x)]=2^{-n}\sum_{x}\mu(x)\geq\delta where μ:{0,1}n[0,1]\mu:\{0,1\}^{n}\to[0,1] is the density function defining 𝐙\mathbf{Z} (in the sense that Pr[𝐙=z]=μ(z)/(2n𝔼[μ(x)])\Pr[\mathbf{Z}=z]=\mu(z)/(2^{n}\mathbb{E}[\mu(x)])). One application of indistinguishability from a dense distribution is as a stepping stone to pseudorandomness: if 𝐙\mathbf{Z} is indistinguishable from a distribution 𝐌\mathbf{M} with density δ\delta within the uniform distribution, then applying a randomness extractor for min-entropy nlog(1/δ)n-\log(1/\delta) to 𝐙\mathbf{Z} yields a pseudorandom distribution. A more sophisticated application comes from additive number theory. It is not hard to show that a random subset of [N]={1,2,,N}[N]=\{1,2,...,N\} (including each element with probability 1/21/2, say) contains many arithmetic progressions (which are sets of the form {a,a+b,a+2b,a+3b,}\{a,a+b,a+2b,a+3b,...\}). Szemerédi [45] showed that, in fact, sufficiently dense subsets of the integers also contain such arithmetic progressions: specifically, that for any kk, the size of the largest subset of [N][N] which doesn’t contain a kk-term arithmetic progression grows like o(N)o(N).

So we would like some technology to reason about random variables 𝐙\mathbf{Z} which ‘behave like dense distributions’. It turns out, however, that formalizing what it means for 𝐙\mathbf{Z} to ‘behave like a dense distribution’ is subtle. Here are three perfectly legitimate candidates:

Candidate 1:

𝐙\mathbf{Z} behaves like a δ\delta-dense distribution if it behaves like something that’s δ\delta-dense. Formally, this means that 𝐙\mathbf{Z} is ε\varepsilon-indistinguishable from some δ\delta-dense distribution. In this case, we say that 𝐙\mathbf{Z} has a δ\delta-dense ε\varepsilon-model.

Candidate 2:

𝐙\mathbf{Z} behaves like a δ\delta-dense distribution if it’s δ\delta-dense inside of something that behaves like the uniform distribution. Formally, this means there’s an ε\varepsilon-pseudorandom distribution 𝐗\mathbf{X} in which 𝐙\mathbf{Z} is δ\delta-dense. In this case, we say that 𝐙\mathbf{Z} is δ\delta-dense in an ε\varepsilon-pseudorandom set.

Candidate 3:

𝐙\mathbf{Z} behaves like a δ\delta-dense distribution if it appears to be the case that conditioning on 𝐙\mathbf{Z} increases the measure of any set by at most (roughly) a factor of 1/δ1/\delta. This is an operational definition: conditioning on a (truly) δ\delta-dense set increases the measure of any set by at most a factor of 1/δ1/\delta, so we should expect the same behavior from things that behave like a dense set. Formally, this means that δ𝔼[f(𝐙)]𝔼[f(𝐔)]+ε\delta\mathbb{E}[f(\mathbf{Z})]\leq\mathbb{E}[f(\mathbf{U})]+\varepsilon for any ff in our test class \mathcal{F} (see the sketch after this definition). In this case, we say that 𝐙\mathbf{Z} has (ε,δ)(\varepsilon,\delta)-pseudodensity.
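Candidate 3 is the easiest to check mechanically. Here is a minimal Python sketch (ours, not from the paper) that tests (ε,δ)-pseudodensity by brute force over a finite test class; the example distribution and test class are hypothetical:

```python
import itertools

def is_pseudodense(tests, Z, U, eps, delta):
    """Candidate 3: delta * E[f(Z)] <= E[f(U)] + eps for every test f."""
    for f in tests:
        ez = sum(p * f(z) for z, p in Z.items())
        eu = sum(p * f(z) for z, p in U.items())
        if delta * ez > eu + eps:
            return False  # f witnesses that Z is not pseudodense
    return True

# A truly delta-dense Z passes with eps = 0: here Z is uniform on the half
# of {0,1}^3 whose first bit is 0, so Z is 1/2-dense in U.
n = 3
U = {z: 2**-n for z in itertools.product([0, 1], repeat=n)}
Z = {z: (2**-(n - 1) if z[0] == 0 else 0.0) for z in U}
tests = [lambda z, i=i: z[i] for i in range(n)]
print(is_pseudodense(tests, Z, U, eps=0.0, delta=0.5))  # True
```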

Precisely which definition you pick will depend on what you know about 𝐙\mathbf{Z} and in what sense you would like it to behave like a δ\delta-dense distribution. Indeed, each of these definitions has appeared in different applications ([25], [18], [13], respectively), so there are scenarios where each of these types of behavior is desired. In general, the first candidate is the strongest (and, arguably, the most natural), but it is sometimes hard to establish that a distribution has the property. The following claim gives some simple relationships between the definitions:

Claim 1.1.

For any \mathcal{F}, the following hold:

1. If 𝐙\mathbf{Z} has a δ\delta-dense ε\varepsilon-model, then 𝐙\mathbf{Z} is δ\delta-dense in an ε\varepsilon-pseudorandom set.

2. If 𝐙\mathbf{Z} is δ\delta-dense in an ε\varepsilon-pseudorandom set, then 𝐙\mathbf{Z} has (ε,δ)(\varepsilon,\delta)-pseudodensity.

Proof sketch.
1. Let \mathbf{M} be the δ\delta-dense ε\varepsilon-model for \mathbf{Z}. Since \mathbf{M} is δ\delta-dense in \mathbf{U}, we can write \mathbf{U}=\delta\mathbf{M}+(1-\delta)\overline{\mathbf{M}} for some distribution \overline{\mathbf{M}}. So \mathbf{U}^{\prime}=\delta\mathbf{Z}+(1-\delta)\overline{\mathbf{M}} satisfies |\mathbb{E}[f(\mathbf{U}^{\prime})]-\mathbb{E}[f(\mathbf{U})]|=\delta|\mathbb{E}[f(\mathbf{Z})]-\mathbb{E}[f(\mathbf{M})]|\leq\delta\varepsilon for every f\in\mathcal{F}, so \mathbf{U}^{\prime} is ε\varepsilon-pseudorandom and \mathbf{Z} is δ\delta-dense within it.

2. Suppose \mathbf{Z} is δ\delta-dense in \mathbf{Z}^{\prime} which is ε\varepsilon-pseudorandom for \mathcal{F}. Then for any f\in\mathcal{F}, \delta\mathbb{E}[f(\mathbf{Z})]\leq\mathbb{E}[f(\mathbf{Z}^{\prime})]\leq\mathbb{E}[f(\mathbf{U})]+\varepsilon. ∎

The marvelous quality of these three candidates in particular is that, for many natural \mathcal{F}, all of them are equivalent, and so establishing even (ε′,δ)(\varepsilon^{\prime},\delta)-pseudodensity is enough to guarantee the existence of a δ\delta-dense ε\varepsilon-model.

This equivalence holds for \mathcal{F} which are closed under majority, meaning for any kk (which we can think of as k=O(1)k=O(1) for now), if f1,,fkf_{1},...,f_{k}\in\mathcal{F} then 𝖬𝖠𝖩k(f1,,fk)\mathsf{MAJ}_{k}(f_{1},...,f_{k})\in\mathcal{F}, where 𝖬𝖠𝖩k:{0,1}k{0,1}\mathsf{MAJ}_{k}:\{0,1\}^{k}\to\{0,1\} is 11 if at least half of its input bits are 11. In fact, it holds for more general \mathcal{F} if we allow the distinguishing parameter (ε\varepsilon^{\prime} in (ε,δ)(\varepsilon^{\prime},\delta)-pseudodensity) to be exponentially small (as in the original formulation, which we’ll discuss later on). In this case, the subtlety in defining what it means to behave like a dense set vanishes. These equivalences constitute (essentially) what is known as the dense model theorem, originating in the work of Green-Tao [18] and Tao-Ziegler [48], and independently in Barak et al. [8] (though in different guises). This result has been fruitfully applied in many seemingly unrelated areas of mathematics and computer science: additive number theory [18, 48] where \mathcal{F} encodes additive information about subsets of {1,,N}\{1,...,N\} (or possibly a more general group), graph theory [49, 38] where \mathcal{F} encodes cuts in a fixed graph, circuit complexity [49], Fourier analysis [29], machine learning [29] and leakage-resilient cryptography [14]. The ubiquity of the dense model theorem motivates a simple question: are there natural scenarios in which the dense model theorem is false?

We show that the answer to this question is yes. In particular, we show that for each implication from Claim 1.1 there is a class \mathcal{F} and a random variable 𝐙\mathbf{Z} so that the converse fails to hold. From the computational entropy perspective, we show that the three computational entropies we’ve discussed are inequivalent for certain test classes \mathcal{F}. Necessarily (with ε\varepsilon^{\prime} not exponentially small) these classes are not closed under majority and so we will need to look ‘below’ majority in order to find our counterexamples.

1.1 The dense model theorem

We turn to discuss the dense model theorem in some more detail to better contextualize our work. Restricting our attention to random variables over {0,1}n\{0,1\}^{n}, the dense model theorem states the following:

Theorem 1.1 (Dense model theorem).

Let \mathcal{F} be a class of tests f:{0,1}n{0,1}f:\{0,1\}^{n}\to\{0,1\} and 𝐙\mathbf{Z} a random variable over {0,1}n\{0,1\}^{n} with (εδ,δ)(\varepsilon\delta,\delta)-pseudodensity with respect to 𝖬𝖠𝖩k\mathsf{MAJ}_{k}\circ\mathcal{F} for k=O(log(1/δ)/ε2)k=O(\log(1/\delta)/\varepsilon^{2}). Then 𝐙\mathbf{Z} has a δ\delta-dense ε\varepsilon-model with respect to \mathcal{F}.

We will generally also consider a parameter ε\varepsilon^{\prime}, which in this case is εδ\varepsilon\delta, the additive error in pseudodensity. To get an intuition for what this is saying, let’s consider a setting where it’s false but for trivial reasons. As a simple example given in [52], pick 𝐙\mathbf{Z} to be a (1ε)(1-\varepsilon) fraction of another set 𝐒\mathbf{S} of size δ2n\delta 2^{n}. Then 𝐙\mathbf{Z} doesn’t have a δ\delta-dense ε\varepsilon-model (i.e. 𝐒\mathbf{S}) with respect to 𝐙\mathbf{Z}’s indicator function, which we’ll call ff. On the other hand, the distribution 𝐖\mathbf{W} obtained by sampling 𝐙\mathbf{Z} with probability δ\delta and sampling from 𝐒\mathbf{S}’s complement with probability 1δ1-\delta is at most εδ\varepsilon\delta-distinguishable from 𝐔\mathbf{U} for any function, since εδ\varepsilon\delta is simply the measure of the difference between 𝐒\mathbf{S} and 𝐙\mathbf{Z}. In particular 𝐙\mathbf{Z} is δ\delta-dense in the εδ\varepsilon\delta-pseudorandom 𝐖\mathbf{W} (which implies, via Claim 1.1, that it is (εδ,δ)(\varepsilon\delta,\delta)-pseudodense). This means that Theorem 1.1 is tight in its dependence on ε=εδ\varepsilon^{\prime}=\varepsilon\delta, in that it becomes false for ε=Ω(εδ)\varepsilon^{\prime}=\Omega(\varepsilon\delta). In many instances, we think of ε=1/𝗉𝗈𝗅𝗒(n)\varepsilon=1/{\mathsf{poly}}(n), δ\delta constant (or perhaps with mild dependences on nn) and ε=δε\varepsilon^{\prime}=\delta\varepsilon.

Originally, the dense model theorem was proved with a different (and stronger) assumption; namely, that 𝐙\mathbf{Z} is dense in a pseudorandom set. Green and Tao, in proving that the primes contain arbitrarily long arithmetic progressions, used it to the following effect: if 𝐙\mathbf{Z} is the set of prime numbers up to nn, then its density is known to behave like Θ(1/logn)\Theta(1/\log n). On the other hand, Szemerédi [45] showed that sufficiently dense subsets of \mathbb{Z} contain arbitrarily long arithmetic progressions. The best bounds for Szemerédi’s theorem require density ω(1/loglogn)\omega(1/\log\log n), which is much larger than the density of the primes (see [16] and the recent [9] for more on the rich history of this and related problems). Not all is lost, however: the only property of dense sets that we’re interested in is that they contain arithmetic progressions. So Green and Tao construct a class \mathcal{F} of tests which can ‘detect’ arithmetic progressions and under which the primes are dense inside of an \mathcal{F}^{\prime}-pseudorandom set (more on \mathcal{F}^{\prime} later). By applying the dense model theorem, we conclude that the primes ‘look like’ a dense set (and dense sets have long arithmetic progressions) with respect to the class \mathcal{F}. As \mathcal{F} detects arithmetic progressions, it must be the case that the primes possess them. Of course, many details need to be filled in, but we hope this example shows the reader the ‘spirit’ of the dense model theorem.

A primary source of interest in the dense model theorem is in the connections it shares with seemingly unrelated branches of mathematics and computer science. The original application was in additive number theory, but it was independently discovered and proved in the context of cryptography ([8, 14]). RTTV [38] and Gowers [17] observed proofs of the dense model theorem which use linear programming duality, which is in turn related to Nisan’s proof of the hardcore lemma from circuit complexity [28]. In fact, Impagliazzo [29] shows in unpublished work that optimal-density versions of the hardcore lemma due to Holenstein [26] actually imply the dense model theorem. Klivans and Servedio [32] famously observed the relationship between the hardcore lemma and boosting, a fundamental technique for aggregating weak learners in machine learning [15]. Together with the result of Impagliazzo, this connection means that dense model theorems can be proved by a particular type of boosting algorithm. A boosting argument for the existence of dense models also gives us constructive versions of the dense model theorem, which are needed for algorithmic applications. Zhang [52] (without using Impagliazzo’s reduction from the dense model theorem to the hardcore lemma) used the boosting algorithm of [7] directly to prove the dense model theorem with optimal query complexity (kk).

In addition to its connections to complexity, machine learning, additive number theory and cryptography, the dense model theorem (and ideas which developed from the dense model theorem, chiefly the approximation theorem of [49]), have been used to understand the weak graph regularity lemma of Frieze and Kannan [29], notions of computational differential privacy [36] and even generalization in generative adversarial networks (GANs) [5]. We now turn to discussing the complexity-theoretic aspects of the dense model theorem, specifically regarding our question of whether the 𝖬𝖠𝖩k\mathsf{MAJ}_{k} from the statement is optimal.

As alluded to earlier, Green and Tao actually worked in a setting where \mathcal{F}^{\prime} doesn’t need to compute majorities but where εδ\varepsilon\delta (that is, the distinguishing parameter in the pseudodensity assumption in the statement of Theorem 1.1) needs to be replaced by some ε=exp(𝗉𝗈𝗅𝗒(1/ε,1/δ))\varepsilon^{\prime}=\exp(-{\mathsf{poly}}(1/\varepsilon,1/\delta)) (with k=𝗉𝗈𝗅𝗒(1/δ,1/ε)k={\mathsf{poly}}(1/\delta,1/\varepsilon) experiencing a small increase). We state this result, as proved in Tao and Ziegler [48] and stated this way in RTTV [38], for comparison. For a test class \mathcal{F}, let k\prod_{k}\mathcal{F} be the set of tests of the form i[k]fi\prod_{i\in[k]}f_{i} for fif_{i}\in\mathcal{F}.

Theorem 1.2 (Computationally simple dense-model theorem, strong assumption).

Let \mathcal{F} be a class of tests f:{0,1}n[0,1]f:\{0,1\}^{n}\to[0,1] and 𝐙\mathbf{Z} a random variable over {0,1}n\{0,1\}^{n} which is δ\delta-dense in a set ε\varepsilon^{\prime}-pseudorandom for k\prod_{k}\mathcal{F} with k=𝗉𝗈𝗅𝗒(1/δ,1/ε)k={\mathsf{poly}}(1/\delta,1/\varepsilon) and \varepsilon^{\prime}=\exp(-{\mathsf{poly}}(1/\delta,1/\varepsilon)). Then 𝐙\mathbf{Z} has a δ\delta-dense ε\varepsilon-model with respect to \mathcal{F}.

RTTV [38] observe that this proof can be adapted so that ε\varepsilon^{\prime} has polynomial dependence on ε,δ\varepsilon,\delta by restricting to the case of boolean-valued tests. Doing so, however, makes \mathcal{F}^{\prime} much more complicated (essentially requiring circuits of size exponential in kk). In Theorem 1.1, we can obtain the best of both worlds: ε\varepsilon^{\prime} has polynomial dependence on ε,δ\varepsilon,\delta and the complexity blow-up is rather small. However, in this more favorable circumstance, we need to be able to compute majorities. Is such a tradeoff necessary? Our results suggest that the answer is yes. Theorem 1.6 (stated in the following section) tells us that if the dense model theorem is true for \mathcal{F}, then there’s a small, constant-depth circuit with \mathcal{F}-oracle gates approximating majority on O(1/ε2)O(1/\varepsilon^{2}) bits.

Another important aspect of the dense model theorem is how the different assumptions are related. As mentioned, the original assumption was that 𝐙\mathbf{Z} is δ\delta-dense in an ε\varepsilon-pseudorandom set, but the proof can be extended to the case where 𝐙\mathbf{Z} is (ε,δ)(\varepsilon,\delta)-pseudodense. Claim 1.1 showed that the former assumption implies the latter. When the dense model theorem is true, the latter also implies the former: simply apply the dense model theorem to 𝐙\mathbf{Z} which is (ε,δ)(\varepsilon,\delta)-pseudodense to obtain a δ\delta-dense ε\varepsilon-model. Then, by the first part of Claim 1.1, we’re done.

First, we give examples of situations where these two notions are distinct. For example, we show in Theorems 1.4 and 1.5 that they are inequivalent when \mathcal{F} is the class of constant-depth, polynomial-size circuits or the class of low-degree polynomials over a finite field. Note that a separation between pseudodensity and being dense in a pseudorandom set also implies a separation between pseudodensity and having a dense model, as being dense in a pseudorandom set is a necessary condition for having a dense model.

Second, we show that the dense model theorem is false even when we make the stronger assumption that the starting distribution 𝐙\mathbf{Z} is dense in a pseudorandom set. Specifically, in Theorem 1.3 we show that some distributions 𝐙\mathbf{Z} are dense in a pseudorandom set but fail to have a dense model when \mathcal{F} consists of constant-depth, polynomial-size circuits.

Having contextualized our work, we now describe our contributions in more detail.

1.2 Contributions

We separate the previously described notions of computational entropy, giving examples where the dense model theorem is false. We are able to prove different separations when \mathcal{F} is the class of constant-depth, unbounded fan-in circuits, the class of low-degree polynomials over a finite field, and, in one case, any test class \mathcal{F} which cannot efficiently approximate majority (in a sense made explicit later on). The only separation known prior to this work, due to Barak et al. [8], was between pseudodensity and having a dense model for bounded-width read-once branching programs.

Let 𝒞(S,d)\mathcal{C}(S,d) denote the class of unbounded fan-in, size-SS, depth-dd circuits. We are generally thinking of S=𝗉𝗈𝗅𝗒(n)S={\mathsf{poly}}(n) and d=O(1)d=O(1), which corresponds to the complexity class 𝖠𝖢0{\mathsf{AC}}^{0}. Theorem 1.3 shows that 𝐙\mathbf{Z} being δ\delta-dense in an ε\varepsilon-pseudorandom set need not imply that 𝐙\mathbf{Z} has a δ\delta-dense ε\varepsilon-model when the test class is 𝒞(S,d)\mathcal{C}(S,d):

Theorem 1.3.

Let ε,ε>0\varepsilon,\varepsilon^{\prime}>0 be arbitrary, δε/8\delta\geq\varepsilon^{\prime}/8 and

Sexp(O(εεlog(1/δ)log(1/ε))1/(d1)).S\leq\exp(O\Big{(}\frac{\sqrt{\varepsilon^{\prime}}}{\varepsilon}\cdot\frac{\sqrt{\log(1/\delta)}}{\log(1/\varepsilon^{\prime})}\Big{)}^{1/(d-1)}).

Then for =𝒞(S,d)\mathcal{F}=\mathcal{C}(S,d), there is a random variable 𝐃\mathbf{D} over {0,1}n\{0,1\}^{n} with n=O(log(1/δ)/ε2)n=O(\log(1/\delta)/\varepsilon^{2}) so that 𝐃\mathbf{D} is δ\delta-dense in an ε\varepsilon^{\prime}-pseudorandom set but does not have a δ\delta-dense ε\varepsilon-model. In particular, the dense model theorem is false in this setting.

Recall that the dense model theorem is false when ε=Ω(εδ)\varepsilon^{\prime}=\Omega(\varepsilon\delta), which makes the restriction δε/8\delta\geq\varepsilon^{\prime}/8 extremely mild. A common regime is ε=1/𝗉𝗈𝗅𝗒(n)\varepsilon=1/{\mathsf{poly}}(n), δ=O(1)\delta=O(1) and ε=δε=Θ(ε)\varepsilon^{\prime}=\delta\varepsilon=\Theta(\varepsilon), in which case this gives us (essentially) a lower bound on SS which is weakly exponential in 1/\sqrt{\varepsilon}\approx 1/\sqrt{\varepsilon^{\prime}}.

Let 𝐍α\mathbf{N}_{\alpha} denote the product distribution of nn Bernoulli random variables with success probability 1/2α1/2-\alpha. Recall that density in a pseudorandom set readily implies pseudodensity, and one can use the dense model theorem to show the converse. We show that (ε,δ)(\varepsilon,\delta)-pseudodensity need not imply δ\delta-density in an ε\varepsilon-pseudorandom set when the test class is 𝒞(S,d)\mathcal{C}(S,d):

Theorem 1.4.

Fix ε,ε,δ>0\varepsilon,\varepsilon^{\prime},\delta>0, dd\in\mathbb{N}, and

Sexp(O(δεlog(1/δ)log(1/ε))1/(d1)).S\leq\exp(O\Big{(}\frac{\sqrt{\delta}}{\sqrt{\varepsilon}}\cdot\frac{\log(1/\delta)}{\log(1/\varepsilon^{\prime})}\Big{)}^{1/(d-1)}).

Then 𝐍ε/δ\mathbf{N}_{\sqrt{\varepsilon/\delta}} over {0,1}n\{0,1\}^{n} with n=O(1/ε)n=O(1/\varepsilon) is (ε,δ)(\varepsilon^{\prime},\delta)-pseudodense and yet 𝐍ε/δ\mathbf{N}_{\sqrt{\varepsilon/\delta}} is not δ\delta-dense inside of any ε\varepsilon-pseudorandom set.

The dependence on ε\varepsilon^{\prime} means that we can take ε\varepsilon^{\prime} exponentially smaller than ε\varepsilon and still obtain a separation. This case corresponds to \mathcal{F} being ‘very’ fooled by 𝐍α\mathbf{N}_{\alpha} while 𝐍α\mathbf{N}_{\alpha} is still not δ\delta-dense in even a ‘mildly’ pseudorandom set. This result draws on a recent line of work in the pseudorandomness literature — often referred to as ‘the coin problem’ and studied in, e.g., [42, 12, 1, 46] — which concerns the ability of a test class \mathcal{F} that is unable to compute majority to distinguish 𝐍α\mathbf{N}_{\alpha} from 𝐔\mathbf{U}. We will discuss this connection in more detail during the proof overviews.

We prove a similar separation for degree-dd 𝔽p\mathbb{F}_{p}-polynomials (on nn variables), which generalizes (and uses techniques from) a recent result of Srinivasan [44] in the case where δ=1\delta=1. In this case, we think of a distribution 𝐙\mathbf{Z} as being (ε,δ)(\varepsilon^{\prime},\delta)-pseudodense for degree-dd 𝔽p\mathbb{F}_{p}-polynomials when \delta\Pr[P(\mathbf{Z})\neq 0]-\varepsilon^{\prime}\leq\Pr[P(\mathbf{U})\neq 0] for any degree-dd polynomial P𝔽p[X1,,Xn]P\in\mathbb{F}_{p}[X_{1},...,X_{n}] (noting that we are only evaluating PP over {0,1}n\{0,1\}^{n}).

Theorem 1.5.

Fix a finite field 𝔽\mathbb{F} with characteristic p=O(1)p=O(1), ε,ε>0\varepsilon,\varepsilon^{\prime}>0 and let c>δ>0c>\delta>0 where c1/200c\approx 1/200 is an absolute constant. Suppose that

dO(δ/ε)d\leq O(\sqrt{\delta/\varepsilon})

Then when \mathcal{F} is the nn-variate degree-dd polynomials over 𝔽\mathbb{F} with n=1/εn=1/\varepsilon, and α=O(ε/δ)\alpha=O(\sqrt{\varepsilon/\delta}), 𝐍α\mathbf{N}_{\alpha} is (ε,δ)(\varepsilon^{\prime},\delta)-pseudodense but is not δ\delta-dense inside of an ε\varepsilon-pseudorandom set.

This implies lower bounds for constant-depth circuits with 𝖬𝖮𝖣p{\mathsf{MOD}}_{p} gates by the classical lower bounds of Razborov [37] and Smolensky [43]. Perhaps more interestingly, this holds even over non-prime fields. Also notably, there is no dependence on ε\varepsilon^{\prime} (beyond εεδ\varepsilon^{\prime}\leq\varepsilon\delta), so we can take it to be arbitrarily small.

We also prove a more general separation between pseudodensity and density in a pseudorandom set. This result, drawing from the work of [42], provides a more specific characterization of the sense in which dense model theorems are ‘required’ to compute majority.

Theorem 1.6.

Let ε,δ>0\varepsilon,\delta>0. Suppose \mathcal{F} is a test class of boolean functions f:{0,1}n{0,1}f:\{0,1\}^{n}\to\{0,1\} with the following property: there is no 𝖠𝖢0{\mathsf{AC}}^{0} \mathcal{F}-oracle circuit of size 𝗉𝗈𝗅𝗒(nδε3/2){\mathsf{poly}}(n\cdot\frac{\sqrt{\delta}}{\varepsilon^{3/2}}) computing majority on O(δ/ε)O(\sqrt{\delta/\varepsilon}) bits.

Then 𝐍ε/δ\mathbf{N}_{\sqrt{\varepsilon/\delta}} is (εδ,δ)(\varepsilon\delta,\delta)-pseudodense and yet does not have a δ\delta-dense ε\varepsilon-model. In particular, when the hypotheses are met, the dense model theorem is false.

Informally, this says that any \mathcal{F} which can refute the pseudodensity of 𝐍α\mathbf{N}_{\alpha} is only ‘a constant-depth circuit away’ from computing majority.

1.3 Related work

Computational entropy

Computational entropy was studied systematically in [8] and is relevant to various problems in complexity and cryptography such as leakage-resilience [14], constructions of PRGs from one-way functions [25, 21, 20], and derandomization [13].

There are a number of definitions of computational entropy which we don’t consider in this work. For example, Yao pseudoentropy [51] (see also [8]) corresponds to random variables which are ‘compressible’ by a class of tests \mathcal{F}, in the sense that \mathcal{F} can encode the random variable into a small number of bits and then decode it. Yao pseudoentropy was recently used in time-efficient hardness-to-randomness tradeoffs [13], where (randomness-efficient) samplers for pseudodense distributions were used with an appropriate extractor to construct a pseudorandom distribution. Another example is the inaccessible entropy of Haitner et al. [21], corresponding to the entropy of a message at some round in a two-player protocol conditioned on the prior messages and the randomness of the players, which is used in efficient constructions of statistically hiding commitment schemes from one-way functions [20].

Separating notions of computational entropy has been studied before in [8], who prove a separation of pseudodensity and having a dense model for bounded-width read-once branching programs. Separating notions of conditional computational entropy was studied in [27], showing separations between conditional variants of Yao pseudoentropy and having a dense model.

As mentioned in [27], citing [49] and personal communication with Impagliazzo, another question of interest is whether Yao pseudoentropy (corresponding to efficient encoding/decoding algorithms) implies having a dense model. It is not hard to see that small Yao pseudoentropy implies small pseudodensity, with some mild restrictions on \mathcal{F}. It would be interesting to see if the techniques from this paper can be used to understand Yao pseudoentropy in more detail. We leave this to future work.

Complexity of dense model theorems and hardness amplification

Prior work on the complexity of dense model theorems has included a tight lower bound on the query complexity [52] and a lower bound on the advice complexity [50]. As far as we are aware, this is the first work to consider the computational complexity of dense model theorems.

There has also been prior work on the computational complexity of hardness amplification, establishing that various known strategies for hardness amplification require the computation of majority [34, 42, 19, 41]. It is known that a particular type of hardness amplification given by the hardcore lemma implies the dense model theorem [29].

Our results are stronger in the following sense: previous work [34, 42, 19] shows that black-box hardness amplification proofs require majority. This means that if you amplify the hardness of ff in some black-box way, then this can be used to compute majority. In our case, we simply show (in different settings) that the dense model theorem is false, regardless of how we tried to prove it. By the connection between the hardcore lemma and the dense model theorem, our results also provide scenarios where the hardcore lemma is false. As far as we are aware, these are the first such scenarios recorded in the literature.

1.4 Technical overview

We discuss two general themes that appear consistently in the proofs and then discuss each of the main theorems in some more detail.

1.4.1 Dense distributions have mostly unbiased bits

A commonly-used observation in theoretical computer science is that most bit positions of a δ\delta-dense random variable over {0,1}n\{0,1\}^{n} have bias O(\sqrt{\log(1/\delta)/n}) (see, for example, the introduction of [35]). Relevant to our purposes, it provides a necessary condition for having a δ\delta-dense ε\varepsilon-model with respect to any class \mathcal{F} containing the projections zziz\mapsto z_{i}: if 𝐙\mathbf{Z} has a δ\delta-dense ε\varepsilon-model, then most bits of 𝐙\mathbf{Z} have bias \varepsilon+O(\sqrt{\log(1/\delta)/n}). In particular, if all of the bits of 𝐙\mathbf{Z} have large bias, then it can’t have a dense model.

This is used directly in the proof of Theorem 1.3. In this case, we construct a distribution 𝐙\mathbf{Z} which is δ\delta-dense in a set which is ε\varepsilon-pseudorandom for 𝖠𝖢0{\mathsf{AC}}^{0} but where each bit is noticeably biased away from 1/21/2.

In order to prove separations between pseudodensity and being dense in a pseudorandom set — as in Theorems 1.4, 1.5 and 1.6 — we need to consider the bias of larger subsets of variables. Considering just two bits at a time is already sufficient to prove mild concentration bounds on the weight of pseudorandom strings. This implies that the tails of dense subsets of pseudorandom sets cannot be too heavy.

1.4.2 Biased coin distribution

The biased coin distribution 𝐍α\mathbf{N}_{\alpha} over {0,1}n\{0,1\}^{n} is the product of nn Bernoulli random variables with success probability 1/2α1/2-\alpha. 𝐍α\mathbf{N}_{\alpha} has recently garnered significant interest in the pseudorandomness literature (see [2, 12, 46, 10, 1]). Shaltiel and Viola [42] showed that if ff is a test which ε\varepsilon-distinguishes 𝐍α\mathbf{N}_{\alpha} from 𝐔\mathbf{U}, then there is a small, constant-depth circuit CC with ff-oracle gates which computes majority on O(1/ε)O(1/\varepsilon) bits. A similar, but qualitatively different, connection due to Limaye et al. [33] — extended to any choice of ε>0\varepsilon>0 by Srinivasan [44] — shows that any 𝔽p\mathbb{F}_{p}-polynomial with advantage 12ε1-2\varepsilon in distinguishing 𝐍α\mathbf{N}_{\alpha} from 𝐔\mathbf{U} must have degree Ω(log(1/ε)/α)\Omega(\log(1/\varepsilon)/\alpha). We extend some of these pseudorandomness results regarding 𝐍α\mathbf{N}_{\alpha} to pseudodensity results.

First, we extend the observation of Shaltiel and Viola to apply to tests ff for which 𝔼[f(𝐙)]δ𝔼[f(𝐔)]+ε\mathbb{E}[f(\mathbf{Z})]\geq\delta\mathbb{E}[f(\mathbf{U})]+\varepsilon (which corresponds to pseudorandomness when δ=1\delta=1). This gives us unconditional pseudodensity for test classes \mathcal{F} which can’t be used in small, constant-depth oracle circuits approximating majority. We also extend the observation of [33] to show lower bounds on the 𝔽p\mathbb{F}_{p}-degree for any function ff which refutes the pseudodensity of 𝐍α\mathbf{N}_{\alpha}.

In Lemma 4.1, we show that 𝐍α\mathbf{N}_{\alpha} exhibits (ε,δ)(\varepsilon,\delta)-pseudodensity for ε=(pO(logS)d1)k\varepsilon=(p\cdot O(\log S)^{d-1})^{k} and δ=eαk/p\delta=e^{-\alpha k/p}. This can be seen as a generalization of Tal’s result, building on [12, 1, 42], that 𝐍α\mathbf{N}_{\alpha} is 3αO(logS)d13\alpha\cdot O(\log S)^{d-1}-pseudorandom for 𝒞(S,d)\mathcal{C}(S,d).

Tal uses a Fourier analytic proof which becomes very simple given tail bounds on the Fourier spectrum of 𝖠𝖢0{\mathsf{AC}}^{0} (the latter being the main contribution of [46]). More generally, any \mathcal{F} enjoying sufficiently strong tail bounds on the Fourier spectrum (in the 1\ell_{1} norm) cannot distinguish between 𝐍α\mathbf{N}_{\alpha} and uniform. It turns out, as proved by Tal and recorded in Agarwal [2], that if \mathcal{F} is closed under restrictions then even bounding the first level of the Fourier spectrum suffices. The proof of Lemma 4.1 is based specifically on the switching lemma for constant-depth circuits. While switching lemmas can be used to show Fourier concentration, it would be interesting to find a proof which only uses the assumption of Fourier concentration (or some Fourier-analytic assumption).

1.4.3 Theorem 1.3

Our goal is to construct a random variable 𝐃\mathbf{D} which is dense inside of an 𝖠𝖢0{\mathsf{AC}}^{0}-pseudorandom set but where each bit is noticeably biased away from 1/21/2. In this case, 𝐃\mathbf{D} would be distinguishable from any dense set, since the average bit of a dense set is roughly unbiased. Doing so requires two steps.

The first step is constructing an appropriate distribution 𝐙\mathbf{Z} that fools 𝖠𝖢0{\mathsf{AC}}^{0} circuits. For this we adopt a general strategy of Ajtai and Wigderson [3] (applied in many contexts in pseudorandomness since; see, e.g., [40]): to fool a circuit CC, we start by producing a random restriction to simplify CC to a short decision tree (via the switching lemma), and then we fool the decision tree on the remaining bits using a kk-wise independent distribution 𝐒\mathbf{S}. If we wanted 𝐙\mathbf{Z} to have small support size, we would need some way of producing random restrictions with a small amount of randomness (which is precisely the approach of Ajtai-Wigderson and later work). Fortunately, we only care about the existence of 𝐙\mathbf{Z} and are therefore content to use the ‘non-derandomized’ switching lemma.

The second step is finding a dense subset 𝐃\mathbf{D} of 𝐒\mathbf{S} with biased bits. We do this by constructing 𝐒\mathbf{S} so that each bit has bias roughly log(1/δ)/K\sqrt{\log(1/\delta)/K}, where kKnk\ll K\ll n is a parameter. This is achieved by randomly bucketing the indices into KK buckets and assigning each bucket a random bit, which reduces the dimension of the problem from nn to KK. This means we can pick a δ\delta-dense event in {0,1}K\{0,1\}^{K} with extremal bias — met (up to constants) by the function accepting all strings with weight less than K/2-\sqrt{K\log(1/\delta)} — in order to find a dense subset of 𝐒\mathbf{S} with large bias. The bucketing construction introduces some error when two distinct indices of a small set I[n]I\subseteq[n] land in the same bucket.

1.4.4 Theorem 1.4

We will show that 𝐍α\mathbf{N}_{\alpha} has (ε,δ)(\varepsilon^{\prime},\delta)-pseudodensity for 𝖠𝖢0{\mathsf{AC}}^{0} with δ=ε=O(1)\delta=\varepsilon^{\prime}=O(1) and α=1/𝗉𝗈𝗅𝗒log(n)\alpha=1/{\mathsf{poly}}\log(n). The idea is that 𝐍α\mathbf{N}_{\alpha} can be sampled by first sampling a random restriction which leaves a pp fraction of the bits unset (and is unbiased on the restricted bits) and then setting the remaining bits with bias α/p\alpha/p. Applying the switching lemma, we conclude that 𝔼[f(𝐍α)]𝔼[f(𝐍α/p)]\mathbb{E}[f(\mathbf{N}_{\alpha})]\approx\mathbb{E}[f^{\prime}(\mathbf{N}_{\alpha/p})] where ff^{\prime} is a short decision tree (which does not depend on all of its inputs). A simple calculation reveals that the acceptance probability of ff^{\prime} can increase by at most a factor (1+\alpha/p)^{k}\leq e^{\alpha k/p} when passing from the uniform distribution to 𝐍α/p\mathbf{N}_{\alpha/p}, where kk is the depth of the decision tree. By incorporating the error from the switching lemma (i.e. the advantage lost by conditioning on the switching lemma succeeding), we get (ε,δ)(\varepsilon^{\prime},\delta)-pseudodensity.
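As a quick sanity check on this decomposition (our computation, not stated explicitly in the paper), the marginal of each bit under the two-stage sampling is

\Pr[(\boldsymbol{\rho}\circ\mathbf{N}_{\alpha/p})_{i}=1]=(1-p)\cdot\frac{1}{2}+p\cdot\Big(\frac{1}{2}-\frac{\alpha}{p}\Big)=\frac{1}{2}-\alpha,

and since the coordinates are independent, \boldsymbol{\rho}\circ\mathbf{N}_{\alpha/p} is distributed identically to \mathbf{N}_{\alpha}.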

To prove the separation, we use the fact that the Hamming weight of a random variable fooling 𝒞(S,d)\mathcal{C}(S,d) is concentrated around its expectation. This means in particular that if 𝐍α\mathbf{N}_{\alpha} were δ\delta-dense in a pseudorandom distribution, then the tails of 𝐍α\mathbf{N}_{\alpha} couldn’t be too heavy and therefore α\alpha couldn’t be too large.

1.4.5 Theorems 1.5 and 1.6

Theorems 1.5 and 1.6 draw from related work of Srinivasan [44] and Shaltiel-Viola [42], respectively.

With β,γ>0\beta,\gamma>0 and \mathcal{F} an arbitrary class of tests f:{0,1}n{±1}f:\{0,1\}^{n}\to\{\pm 1\}, suppose that ff\in\mathcal{F} witnesses that 𝐍β\mathbf{N}_{\beta} fails to have (γ,δ)(\gamma,\delta)-pseudodensity in the sense that

𝔼[f(𝐔)]δ𝔼[f(𝐍β)]γ.\mathbb{E}[f(\mathbf{U})]\leq\delta\mathbb{E}[f(\mathbf{N}_{\beta})]-\gamma.

[44] and [42] both make use of the following simple observation. Given two strings u,v{0,1}mu,v\in\{0,1\}^{m} with wt(u)=(1/2ε)m\mathrm{wt}(u)=(1/2-\varepsilon)m and wt(v)=m/2\mathrm{wt}(v)=m/2, a uniformly random index i[m]i\in[m] has uiu_{i} distributed as a (1/2ε)(1/2-\varepsilon)-biased coin and viv_{i} as an unbiased coin. In our case, applying ff to sufficiently many random coordinates of uu or vv ‘distinguishes’ the two of them, but in a weaker sense.

In the case of Theorem 1.6, we can amplify acceptance probabilities by increasing the size of the circuit by a factor of 1/εδ1/\varepsilon\delta, after which we can apply the result of [42] saying that constant-error distinguishers between 𝐍α\mathbf{N}_{\alpha} and 𝐔\mathbf{U} can be used to compute majority.

For Theorem 1.5, we apply a beautiful recent result of Srinivasan [44] showing that any mm-variate polynomial (over a finite field) which vanishes on most points of the slice of Hamming weight (1/2α)m(1/2-\alpha)m and doesn’t vanish on most points of the slice of weight m/2m/2 must have high degree Ω(αm)\Omega(\alpha m). One way of interpreting this result is that low-degree polynomials can’t approximately solve certain ‘promise’ versions of majority.

In this latter case, we need to open up the error reduction procedure we use for Theorem 1.6 and show how to approximate it using low-degree polynomials. This will ultimately be achieved by approximating OR with a probabilistic polynomial, as in [37, 43].

2 Technical tools

We write [n]={1,,n}[n]=\{1,...,n\} and use boldface to denote random variables. Let 𝒞(S,d)\mathcal{C}(S,d) be the set of size SS, depth-dd unbounded fan-in circuits. For a boolean function f:{0,1}n{0,1}f:\{0,1\}^{n}\to\{0,1\}, let DT(f)DT(f) denote the depth of the shortest decision tree computing ff.

2.1 Biased coins

As before, let 𝐍α\mathbf{N}_{\alpha} denote the random variable corresponding to the product of nn independent coins, each equal to 11 with probability 1/2α1/2-\alpha. That is,

Pr[𝐍α=z]=(1/2α)wt(z)(1/2+α)nwt(z)\Pr[\mathbf{N}_{\alpha}=z]=(1/2-\alpha)^{\mathrm{wt}(z)}(1/2+\alpha)^{n-\mathrm{wt}(z)}

where wt(z)\mathrm{wt}(z) denotes the Hamming weight of zz.

For a random variable 𝐙\mathbf{Z} over {0,1}n\{0,1\}^{n} and i[n]i\in[n], let biasi(𝐙)=|Pr[𝐙i=1]Pr[𝐙i=0]|/2\mathrm{bias}_{i}(\mathbf{Z})=|\Pr[\mathbf{Z}_{i}=1]-\Pr[\mathbf{Z}_{i}=0]|/2. Let ={zzi:i[n]}\mathcal{B}=\{z\mapsto z_{i}:i\in[n]\} be the set of monotone projections. A random variable 𝐙=(𝐙1,,𝐙n)\mathbf{Z}=(\mathbf{Z}_{1},...,\mathbf{Z}_{n}) is ε\varepsilon-pseudorandom with respect to \mathcal{B} precisely when biasi(𝐙)=|Pr[𝐙i=1]1/2|ε\mathrm{bias}_{i}(\mathbf{Z})=|\Pr[\mathbf{Z}_{i}=1]-1/2|\leq\varepsilon for each i[n]i\in[n]. In particular,

Claim 2.1.

For any ε>0\varepsilon>0, 𝐍ε\mathbf{N}_{\varepsilon} is ε\varepsilon-pseudorandom with respect to \mathcal{B}.
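As an illustration (ours, not from the paper), a quick Monte Carlo check of this claim in Python; the sample size and seed are arbitrary choices:

```python
import random

def sample_N(alpha, n):
    """One draw from N_alpha: n independent bits, each 1 with probability 1/2 - alpha."""
    return [1 if random.random() < 0.5 - alpha else 0 for _ in range(n)]

def empirical_bias(samples, i):
    """Estimate bias_i(Z) = |Pr[Z_i = 1] - 1/2| from a list of samples."""
    p = sum(z[i] for z in samples) / len(samples)
    return abs(p - 0.5)

random.seed(0)
alpha, n = 0.1, 8
samples = [sample_N(alpha, n) for _ in range(100_000)]
print(empirical_bias(samples, 0))  # close to alpha = 0.1, as Claim 2.1 predicts
```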

2.2 Information theory

The (Shannon) entropy of a random variable is defined as

H(𝐙)=x{0,1}np𝐙(x)logp𝐙(x),H(\mathbf{Z})=-\sum_{x\in\{0,1\}^{n}}p_{\mathbf{Z}}(x)\log p_{\mathbf{Z}}(x),

where p𝐙p_{\mathbf{Z}} is the probability density function corresponding to 𝐙\mathbf{Z}. The Shannon entropy of a random vector is sub-additive, in that H(\mathbf{Z})\leq\sum_{i\in[n]}H(\mathbf{Z}_{i}). When 𝐙{0,1}\mathbf{Z}\in\{0,1\} and Pr[𝐙=1]=p\Pr[\mathbf{Z}=1]=p, we use h(p)=H(𝐙)=(plogp+(1p)log(1p))h(p)=H(\mathbf{Z})=-(p\log p+(1-p)\log(1-p)) to denote the binary entropy function.

The min-entropy is defined as

H_{\infty}(\mathbf{Z})=\min_{x\in\{0,1\}^{n}}-\log p_{\mathbf{Z}}(x)=-\log\max_{x\in\{0,1\}^{n}}p_{\mathbf{Z}}(x)

If 𝐙\mathbf{Z} is δ\delta-dense inside of 𝐔\mathbf{U}, then its min-entropy is at least nlog(1/δ)n-\log(1/\delta), and for any random variable 𝐙\mathbf{Z}, H(𝐙)H(𝐙)H_{\infty}(\mathbf{Z})\leq H(\mathbf{Z}).
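As a small illustration (ours, not from the paper) of these quantities on a hypothetical toy distribution:

```python
import math

def shannon_entropy(p):
    """H(Z) = -sum_x p(x) log2 p(x), for a distribution given as a dict of probabilities."""
    return -sum(q * math.log2(q) for q in p.values() if q > 0)

def min_entropy(p):
    """H_inf(Z) = -log2 max_x p(x)."""
    return -math.log2(max(p.values()))

Z = {"00": 0.5, "01": 0.25, "10": 0.25}
print(shannon_entropy(Z))  # 1.5
print(min_entropy(Z))      # 1.0, consistent with H_inf(Z) <= H(Z)
```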

By this latter inequality and subadditivity, the average entropy of 𝐙\mathbf{Z}’s bits is at least 1log(1/δ)/n1-\log(1/\delta)/n. Appealing to a quadratic approximation of binary entropy, we learn that the average bias must be at most log(1/δ)/n\sqrt{\log(1/\delta)/n}. This result has been referred to as Chang’s inequality and the Level-1 inequality, having been observed in different forms and with different proofs in, for example, [47, 11, 22, 31]. Because it is so simple, we provide a proof here:

Claim 2.2.

If 𝐙\mathbf{Z} is δ\delta-dense in 𝐔\mathbf{U}, then 𝔼i[biasi(𝐙)]log(1/δ)/n\mathbb{E}_{i}[\mathrm{bias}_{i}(\mathbf{Z})]\leq\sqrt{\log(1/\delta)/n}

Proof.

As δ\delta-density is equivalent to nlog(1/δ)n-\log(1/\delta) min-entropy,

nlog(1/δ)=H(𝐙)H(𝐙)i[n]H(𝐙i),n-\log(1/\delta)=H_{\infty}(\mathbf{Z})\leq H(\mathbf{Z})\leq\sum_{i\in[n]}H(\mathbf{Z}_{i}),

by subadditivity of entropy. The entropy of 𝐙\mathbf{Z}’s bits, therefore, is at least 1log(1/δ)/n1-\log(1/\delta)/n on average. Taking the Taylor series, we can approximate the binary entropy function h(p)h(p) around 1/21/2 by a quadratic function as h(1/2+ε)1(2/ln2)ε2h(1/2+\varepsilon)\leq 1-(2/\ln 2)\varepsilon^{2}. Comparing this bound with the average, we get

1-\log(1/\delta)/n\leq 1-(2/\ln 2)\varepsilon^{2},

meaning ε(ln2/2)(log(1/δ)/n)log(1/δ)/n\varepsilon\leq\sqrt{(\ln 2/2)\cdot(\log(1/\delta)/n)}\leq\sqrt{\log(1/\delta)/n}. ∎
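For intuition, here is a small numeric check of Claim 2.2 in Python (ours; taking the lightest strings is a heuristic attempt at an extremal δ-dense set, not part of the claim):

```python
import itertools
import math

def avg_bias_of_low_weight_set(delta, n):
    """Average bias over coordinates of the flat distribution on the
    delta * 2^n lowest-weight strings in {0,1}^n."""
    m = max(1, round(delta * 2**n))
    strings = sorted(itertools.product([0, 1], repeat=n), key=sum)[:m]
    biases = [abs(sum(z[i] for z in strings) / m - 0.5) for i in range(n)]
    return sum(biases) / n

delta, n = 0.05, 12
print(avg_bias_of_low_weight_set(delta, n))  # roughly 0.29
print(math.sqrt(math.log2(1 / delta) / n))   # Claim 2.2 bound: roughly 0.60
```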

2.3 Random variables lacking computational entropy

It follows directly from Claim 2.2 that if biasi(𝐙)\mathrm{bias}_{i}(\mathbf{Z}) exceeds \varepsilon+\sqrt{\log(1/\delta)/n} for every ii, then 𝐙\mathbf{Z} does not have a δ\delta-dense ε\varepsilon-model with respect to the projections \mathcal{B}.

Lemma 2.1.

Let 𝐙\mathbf{Z} be a random variable with \mathrm{bias}_{i}(\mathbf{Z})\geq\gamma

for every i[n]i\in[n]. Then for any δ>0\delta>0 and \gamma\geq\varepsilon+\sqrt{\frac{\log(1/\delta)}{n}}, 𝐙\mathbf{Z} does not have a δ\delta-dense ε\varepsilon-model with respect to \mathcal{B}.

This is used for the separation in Theorem 1.3. We would also like a necessary condition for being dense in a pseudorandom set. Towards this end, we note that pseudorandom distributions for even very simple test classes have mild concentration properties.

Claim 2.3.

Suppose \mathcal{F} can compute xixjx_{i}\oplus x_{j} for every i,j[n]i,j\in[n] and let 𝐙\mathbf{Z} over {0,1}n\{0,1\}^{n} be ε\varepsilon-pseudorandom for \mathcal{F}. Then

Pr[i𝐙𝐢n/2αn]14α2n+ε4α2\Pr[\sum_{i}\mathbf{Z_{i}}\leq n/2-\alpha n]\leq\frac{1}{4\alpha^{2}n}+\frac{\varepsilon}{4\alpha^{2}}
Proof.

We work over {±1}\{\pm 1\} instead of {0,1}\{0,1\} to make calculations easier. We can compute the second moment as

𝔼[(i𝐙i)2]=i𝔼[𝐙i2]+ij𝔼[𝐙i𝐙j]n+εn2.\mathbb{E}[(\sum_{i}\mathbf{Z}_{i})^{2}]=\sum_{i}\mathbb{E}[\mathbf{Z}_{i}^{2}]+\sum_{i\neq j}\mathbb{E}[\mathbf{Z}_{i}\mathbf{Z}_{j}]\leq n+\varepsilon n^{2}.

Applying Markov’s inequality to (i𝐙i)2(\sum_{i}\mathbf{Z}_{i})^{2}, we see that

Pr[|i𝐙i|2αn]=Pr[(i𝐙i)2(2αn)2]𝔼[(i𝐙i)2]/(2αn)2.\Pr[|\sum_{i}\mathbf{Z}_{i}|\geq 2\alpha n]=\Pr[(\sum_{i}\mathbf{Z}_{i})^{2}\geq(2\alpha n)^{2}]\leq\mathbb{E}[(\sum_{i}\mathbf{Z}_{i})^{2}]/(2\alpha n)^{2}.

We use 2αn2\alpha n because it maps back to n/2αnn/2-\alpha n in {0,1}\{0,1\}. Then the conclusion follows from our second moment calculation and converting back to {0,1}\{0,1\}. ∎
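As an illustration (ours), a Monte Carlo check of the ε=0\varepsilon=0 case of Claim 2.3, where the bits are i.i.d. uniform:

```python
import random

def tail_prob(n, alpha, trials=50_000):
    """Empirical Pr[sum Z_i <= n/2 - alpha*n] for i.i.d. uniform bits."""
    count = 0
    for _ in range(trials):
        w = sum(random.getrandbits(1) for _ in range(n))
        if w <= n / 2 - alpha * n:
            count += 1
    return count / trials

random.seed(0)
n, alpha = 100, 0.1
print(tail_prob(n, alpha))     # roughly 0.03
print(1 / (4 * alpha**2 * n))  # Claim 2.3 bound with eps = 0: 0.25
```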

The tails of a dense subset can’t be too much larger than the original distribution, by definition of density. This gives us a test for being dense in a pseudorandom set, which we specialize to 𝐍α\mathbf{N}_{\alpha}.

Lemma 2.2.

Let ε,δ>0\varepsilon,\delta>0 be arbitrary. Suppose \mathcal{F} can compute xixjx_{i}\oplus x_{j} for any i,j[n]i,j\in[n] and α1/(8δ)(1/n+ε)\alpha\geq\sqrt{1/(8\delta)\cdot(1/n+\varepsilon)}. Then 𝐍α\mathbf{N}_{\alpha} is not δ\delta-dense in any set which is ε\varepsilon-pseudorandom for \mathcal{F}.

Proof.

Under 𝐍α\mathbf{N}_{\alpha}, the volume of the threshold 𝟏[i𝐙in/2αn]\mathbf{1}[\sum_{i}\mathbf{Z}_{i}\leq n/2-\alpha n] is at least (roughly) 1/21/2, since n/2αnn/2-\alpha n is the mean of 𝐍α\mathbf{N}_{\alpha}’s weight. If 𝐍α\mathbf{N}_{\alpha} were δ\delta-dense in an ε\varepsilon-pseudorandom set, that set would give this threshold volume at least δ/2\delta/2. Taking Claim 2.3 in the contrapositive, we reach the desired conclusion when

1/2 >\frac{1}{4\delta\alpha^{2}n}+\frac{\varepsilon}{4\delta\alpha^{2}}

\alpha^{2} >\frac{1}{8\delta}(1/n+\varepsilon)

∎

2.4 Random restrictions and the switching lemma

A restriction over [n][n] is a function ρ:[n]{0,1,}\rho:[n]\to\{0,1,*\}. Indices in ρ1()\rho^{-1}(*) can be thought of as unset and each other index as set. For another restriction zz so that ρ1()z1({0,1})\rho^{-1}(*)\subseteq z^{-1}(\{0,1\}), let ρz{0,1}n\rho\circ z\in\{0,1\}^{n} be defined by

(ρz)i={zi if iρ1(),ρi otherwise.(\rho\circ z)_{i}=\begin{cases}z_{i}\text{ if $i\in\rho^{-1}(*)$,}\\ \rho_{i}\text{ otherwise.}\end{cases}

Define the restricted function f|ρ:{0,1}ρ1(){0,1}f|_{\rho}:\{0,1\}^{\rho^{-1}(*)}\to\{0,1\} over ρ\rho’s unset indices by

f|ρ(z)=f(ρz).f|_{\rho}(z)=f(\rho\circ z).
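A minimal Python rendering of these definitions (ours; `STAR` stands in for *):

```python
STAR = "*"

def compose(rho, z):
    """(rho o z): keep rho's set bits, and fill its unset positions from z."""
    return [z[i] if rho[i] == STAR else rho[i] for i in range(len(rho))]

def restrict(f, rho):
    """f|_rho, a function of the bits z supplies at rho's unset positions."""
    return lambda z: f(compose(rho, z))

# Example: f = OR of 3 bits; rho fixes the first bit to 0, leaving the rest unset.
f = lambda x: int(any(x))
rho = [0, STAR, STAR]
g = restrict(f, rho)
print(g([None, 1, 0]))  # 1; position 0 of z is never read since rho sets it
```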

Let RpR_{p} be the distribution on restrictions over [n][n] obtained by setting ρ(i)=\rho(i)=* independently with probability pp, and then setting each index not assigned * to a uniformly random bit. The switching lemma we use is due to Rossman [39], building on a long line of work [3, 23, 24, 30]:

Theorem 2.1 (Rossman [39]).

Suppose f𝒞(S,d)f\in\mathcal{C}(S,d). Then

Pr𝝆Rp[DT(f|𝝆)k](pO(logS)d1)k\Pr_{\boldsymbol{\rho}\sim R_{p}}[DT(f|_{\boldsymbol{\rho}})\geq k]\leq(p\cdot O(\log S)^{d-1})^{k}

By considering a random restriction 𝝆Rp\boldsymbol{\rho}\sim R_{p} over [n][n] and a random variable 𝐙\mathbf{Z} over {0,1}n\{0,1\}^{n}, the definition of a restricted function implies that

𝔼[f(𝝆𝐙)]=𝔼[f|𝝆(𝐙)].\mathbb{E}[f(\boldsymbol{\rho}\circ\mathbf{Z})]=\mathbb{E}[f|_{\boldsymbol{\rho}}(\mathbf{Z})].

We make crucial use of two simple corollaries of the switching lemma, which allow us to reason about distinguishability for 𝖠𝖢0{\mathsf{AC}}^{0} circuits in terms of distinguishability for short decision trees.

Lemma 2.3.

Suppose f𝒞(S,d)f\in\mathcal{C}(S,d). Then there is a distribution over depth kk decision trees so that

|𝔼[f(𝝆𝐙)]𝔼[h𝝆(𝐙)]|(pO(logS)d1)k|\mathbb{E}[f(\boldsymbol{\rho}\circ\mathbf{Z})]-\mathbb{E}[h_{\boldsymbol{\rho}}(\mathbf{Z})]|\leq(p\cdot O(\log S)^{d-1})^{k}
Proof.

Let g𝝆g_{\boldsymbol{\rho}} denote the optimal decision tree for f|𝝆f|_{\boldsymbol{\rho}}. Let EE denote the event that g𝝆g_{\boldsymbol{\rho}} has depth at most kk and Pr[E]=1q\Pr[E]=1-q. Let h𝝆h_{\boldsymbol{\rho}} be the distribution over depth at most kk decision trees obtained by sampling g𝝆g_{\boldsymbol{\rho}} conditioned on EE. Then

𝔼[f(𝝆𝐙)]\displaystyle\mathbb{E}[f(\boldsymbol{\rho}\circ\mathbf{Z})] =𝔼[f|𝝆(𝐙)]\displaystyle=\mathbb{E}[f|_{\boldsymbol{\rho}}(\mathbf{Z})]
=(1q)𝔼[g𝝆(𝐙)|E]+q𝔼[g𝝆(𝐙)|¬E]\displaystyle=(1-q)\mathbb{E}[g_{\boldsymbol{\rho}}(\mathbf{Z})|E]+q\mathbb{E}[g_{\boldsymbol{\rho}}(\mathbf{Z})|\neg E]
=(1q)𝔼[h𝝆(𝐙)]+q𝔼[g𝝆(𝐙)|¬E]\displaystyle=(1-q)\mathbb{E}[h_{\boldsymbol{\rho}}(\mathbf{Z})]+q\mathbb{E}[g_{\boldsymbol{\rho}}(\mathbf{Z})|\neg E]
=𝔼[h𝝆(𝐙)]q(𝔼[h𝝆(𝐙)]𝔼[g𝝆(𝐙)|¬E]).\displaystyle=\mathbb{E}[h_{\boldsymbol{\rho}}(\mathbf{Z})]-q(\mathbb{E}[h_{\boldsymbol{\rho}}(\mathbf{Z})]-\mathbb{E}[g_{\boldsymbol{\rho}}(\mathbf{Z})|\neg E]).

The right-hand term is bounded in absolute value by qq because ff is Boolean. By Theorem 2.1, q(pO(logS)d1)kq\leq(p\cdot O(\log S)^{d-1})^{k}. ∎

Lemma 2.4.

Suppose f𝒞(S,d)f\in\mathcal{C}(S,d). Then there’s a depth kk decision tree hh so that

|\mathbb{E}[f(\mathbf{U})]-\mathbb{E}[f(\boldsymbol{\rho}\circ\mathbf{Z})]|\leq|\mathbb{E}[h(\mathbf{U})]-\mathbb{E}[h(\mathbf{Z})]|+(p\cdot O(\log S)^{d-1})^{k}
Proof.

Lemma 2.3 gives us the following upper bound.

|𝔼[f(𝐔)]𝔼[f(𝝆𝐙)]|\displaystyle|\mathbb{E}[f(\mathbf{U})]-\mathbb{E}[f(\boldsymbol{\rho}\circ\mathbf{Z})]| |(𝔼[h𝝆(𝐔)]±q)(𝔼[h𝝆(𝐙)]±q)|\displaystyle\leq|(\mathbb{E}[h_{\boldsymbol{\rho}}(\mathbf{U})]\pm q)-(\mathbb{E}[h_{\boldsymbol{\rho}}(\mathbf{Z})]\pm q)| (Lemma 2.3)
|𝔼[h𝝆(𝐔)]𝔼[h𝝆(𝐙)]|+2q\displaystyle\leq|\mathbb{E}[h_{\boldsymbol{\rho}}(\mathbf{U})]-\mathbb{E}[h_{\boldsymbol{\rho}}(\mathbf{Z})]|+2q (triangle inequality)

We can continue to upper bound the right-hand term by

|𝔼[h𝝆(𝐔)]𝔼[h𝝆(𝐙)]|\displaystyle|\mathbb{E}[h_{\boldsymbol{\rho}}(\mathbf{U})]-\mathbb{E}[h_{\boldsymbol{\rho}}(\mathbf{Z})]| =|𝔼𝝆[𝔼[hρ(𝐔)]𝔼[hρ(𝐙)]]|\displaystyle=|\mathbb{E}_{\boldsymbol{\rho}}[\mathbb{E}[h_{\rho}(\mathbf{U})]-\mathbb{E}[h_{\rho}(\mathbf{Z})]]|
𝔼𝝆[|𝔼[hρ(𝐔)]𝔼[hρ(𝐙)]|]\displaystyle\leq\mathbb{E}_{\boldsymbol{\rho}}[|\mathbb{E}[h_{\rho}(\mathbf{U})]-\mathbb{E}[h_{\rho}(\mathbf{Z})]|] (triangle inequality)
|𝔼[h(𝐔)]𝔼[h(𝐙)]|\displaystyle\leq|\mathbb{E}[h(\mathbf{U})]-\mathbb{E}[h(\mathbf{Z})]|

where the last line holds for some hh in the support of h𝝆h_{\boldsymbol{\rho}} by averaging. ∎

3 Proof of 1.3

We start by reducing the problem of constructing a pseudorandom 𝐙\mathbf{Z} for 𝖠𝖢0{\mathsf{AC}}^{0} to constructing a pseudorandom 𝐙\mathbf{Z} for small-depth decision trees. This can be immediately achieved by applying Lemma 2.4.

Claim 3.1.

Let p[0,1]p\in[0,1] be arbitrary and suppose 𝐙\mathbf{Z} is a random variable over {0,1}n\{0,1\}^{n} which is ϵ\epsilon-pseudorandom for depth-kk decision trees. Then for 𝛒Rp\boldsymbol{\rho}\sim R_{p}, 𝛒𝐙\boldsymbol{\rho}\circ\mathbf{Z} is ϵ\epsilon^{\prime}-pseudorandom for 𝒞(S,d)\mathcal{C}(S,d) for

ϵ=ϵ+(pO(logS)d1)k\epsilon^{\prime}=\epsilon+(p\cdot O(\log S)^{d-1})^{k}

The next lemma constructs a pseudorandom distribution for depth-kk decision trees with each bit having significant bias.

Lemma 3.1.

For any kk\in\mathbb{N}, δ>0\delta>0 and K1/2δK\geq 1/2\delta, there are random variables 𝐒\mathbf{S} and 𝐃\mathbf{D} over {0,1}n\{0,1\}^{n} with the following properties:

1. 𝐃\mathbf{D} is δ\delta-dense in 𝐒\mathbf{S}.

2. For all i[n]i\in[n], biasi(𝐃)=Ω(log(1/δ)/8K)\mathrm{bias}_{i}(\mathbf{D})=\Omega(\sqrt{\log(1/\delta)/8K}).

3. 𝐒\mathbf{S} is k2/Kk^{2}/K-pseudorandom for depth-kk decision trees.

We will use the following standard lower bound on the lower tail of a binomial distribution:

Claim 3.2 ([6]).

Let 𝐙1,,𝐙K\mathbf{Z}_{1},...,\mathbf{Z}_{K} be independent unbiased coins ({0,1}\{0,1\}-valued). Then any γ\gamma with 1/2γ=r/K1/2-\gamma=r/K for some positive integer rr satisfies

2K(1h(1/2γ))2KPr[i[K]𝐙iK/2Kγ]\frac{2^{-K(1-h(1/2-\gamma))}}{\sqrt{2K}}\leq\Pr[\sum_{i\in[K]}\mathbf{Z}_{i}\leq K/2-K\gamma]
Proof of Lemma 3.1.

We sample 𝐒\mathbf{S} in two stages. First, randomly partition [n][n] into KK parts 𝐀1,,𝐀K\mathbf{A}_{1},...,\mathbf{A}_{K} for K>k2K>k^{2}. Second, assign to each 𝐀j\mathbf{A}_{j} a uniformly random bit 𝐛j\mathbf{b}_{j}, and set 𝐒i=𝐛j\mathbf{S}_{i}=\mathbf{b}_{j} for every i𝐀ji\in\mathbf{A}_{j}.

Let 𝐃\mathbf{D} be 𝐒\mathbf{S} conditioned on 𝐛=(𝐛1,,𝐛K)\mathbf{b}=(\mathbf{b}_{1},...,\mathbf{b}_{K}) having weight less than K/2Klog(1/δ)/8K/2-\sqrt{K\log(1/\delta)/8}. Since the 𝐛i\mathbf{b}_{i}’s are unbiased random bits, we can apply Claim 3.2 to lower bound 𝐃\mathbf{D}’s density: for any γ\gamma,

\Pr[\sum_{i\in[K]}\mathbf{b}_{i}\leq K/2-\gamma K]\geq\frac{2^{-K(1-h(1/2-\gamma))}}{\sqrt{2K}}.

This is at least δ\delta when

\frac{2^{-K(1-h(1/2-\gamma))}}{\sqrt{2K}} \geq\delta

1-h(1/2-\gamma) \leq\log(1/\delta)/K-\log(2K)/2K

4\gamma^{2} \leq\log(1/\delta)/K-\log(2K)/2K

where the last line suffices since h(1/2γ)14γ2h(1/2-\gamma)\geq 1-4\gamma^{2}. Hence, the set of strings with weight at most K/2γKK/2-\gamma K is δ\delta-dense when we take \gamma=\frac{1}{2}\sqrt{\log(1/\delta)/K-\log(2K)/2K}. Moreover, log(2K)/2K\log(2K)/2K is at most log(1/δ)/2K\log(1/\delta)/2K when 2K1/δ2K\leq 1/\delta, in which case γlog(1/δ)/8K\gamma\geq\sqrt{\log(1/\delta)/8K}. Conditioned on the low-weight event, each bucket bit is 11 with probability at most 1/2γ1/2-\gamma by symmetry; in particular, this lower bounds the bias of 𝐃\mathbf{D}’s bits.

To see why it’s k2/Kk^{2}/K-pseudorandom for depth-kk decision trees, consider a depth-kk decision tree TT. Over 𝐔\mathbf{U}, we can imagine evaluating TT ‘on-line’ as follows: whenever TT queries the iith bit, determine the value of ziz_{i} by flipping an unbiased coin. Over 𝐒\mathbf{S}, we can imagine evaluating TT similarly, where we determine the bucket AjA_{j} that ii lives in and the value bjb_{j} of that bucket.

If we condition 𝐒\mathbf{S} on not placing any two distinct queried indices i,ji,j in the same bucket — call this conditioned random variable 𝐒\mathbf{S}^{\prime} — then TT doesn’t have any distinguishing advantage against 𝐒\mathbf{S}^{\prime}, as all of the bits it queries are independent and uniform. By a union bound, 𝐒\mathbf{S} places two distinct queried indices in the same bucket with probability at most k2/Kk^{2}/K. TT’s distinguishing advantage is therefore at most k2/Kk^{2}/K. ∎
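For concreteness, a small Python sketch (ours) of this two-stage sampler, with rejection sampling standing in for the conditioning that defines 𝐃\mathbf{D}:

```python
import math
import random

def sample_S(n, K):
    """One draw from S: indices are randomly bucketed into K parts and every
    index in a bucket gets that bucket's uniformly random bit."""
    bucket_of = [random.randrange(K) for _ in range(n)]
    bits = [random.getrandbits(1) for _ in range(K)]
    return [bits[bucket_of[i]] for i in range(n)], sum(bits)

def sample_D(n, K, delta):
    """One draw from D: condition S on the bucket bits having low weight."""
    thresh = K / 2 - math.sqrt(K * math.log(1 / delta) / 8)
    while True:
        z, bucket_weight = sample_S(n, K)
        if bucket_weight <= thresh:
            return z

random.seed(1)
z = sample_D(n=32, K=16, delta=0.1)
print(sum(z) / 32)  # noticeably below 1/2: each bit of D is biased toward 0
```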

In principle, we could have used other pseudorandom distributions for decision trees, such as the ε\varepsilon-almost kk-wise independent distributions from [4]. The construction here is used to obtain better dependence on the parameters of interest. We will also need a claim that expresses the bias of the bits of 𝝆𝐙\boldsymbol{\rho}\circ\mathbf{Z}. The proof can be found in the appendix.

Claim 3.3.

Fix p[0,1]p\in[0,1] and a random variable 𝐙\mathbf{Z}, and let 𝝆Rp\boldsymbol{\rho}\sim R_{p}. Let EE be an event which is independent from 𝝆\boldsymbol{\rho} (in that the conditional distribution of 𝝆\boldsymbol{\rho} given EE is identical to the unconditioned distribution). Then

Pr[(𝝆𝐙)i=1|E]=pPr[𝐙i=1|E]+(1p)/2\Pr[(\boldsymbol{\rho}\circ\mathbf{Z})_{i}=1|E]=p\Pr[\mathbf{Z}_{i}=1|E]+(1-p)/2

Theorem 1.3, which we restate here, is obtained by an appropriate setting of parameters.

See Theorem 1.3.

Proof.

Let n=log(1/δ)/ε2n=\log(1/\delta)/\varepsilon^{2}, k=log(2/ε)k=\log(2/\varepsilon^{\prime}) and K=(2k2)/εK=(2k^{2})/\varepsilon^{\prime}. We also need K1/2δK\geq 1/2\delta by the restriction in Lemma 3.1, which explains the restriction 8δk2ε8\delta k^{2}\geq\varepsilon^{\prime}, simplified by using 8δε8\delta\geq\varepsilon^{\prime} (a stronger restriction) instead. Let 𝐒\mathbf{S} and 𝐃\mathbf{D} be the random variables from Lemma 3.1, and let 𝝆Rp\boldsymbol{\rho}\sim R_{p}. By Claim 3.3, the bias of each bit of 𝝆𝐃\boldsymbol{\rho}\circ\mathbf{D} is plog(1/δ)/8Kp\sqrt{\log(1/\delta)/8K}. By Claim 3.1 and Lemma 3.1, 𝝆𝐒\boldsymbol{\rho}\circ\mathbf{S} is ε\varepsilon^{\prime}-pseudorandom for ε=k2/K+(pO(logS)d1)k\varepsilon^{\prime}=k^{2}/K+(pO(\log S)^{d-1})^{k}, and 𝝆𝐃\boldsymbol{\rho}\circ\mathbf{D} remains δ\delta-dense in 𝝆𝐒\boldsymbol{\rho}\circ\mathbf{S}. By Lemma 2.1, 𝝆𝐃\boldsymbol{\rho}\circ\mathbf{D} does not have a δ\delta-dense ε\varepsilon-model when plog(1/δ)/8Kε+log(1/δ)/np\sqrt{\log(1/\delta)/8K}\geq\varepsilon+\sqrt{\log(1/\delta)/n}.

By substituting, it suffices to take p ≥ 2√(K/n) = 2kε√(2/(ε′ log(1/δ))). In comparison, ε′ ≥ k²/K + (p·O(log S)^{d−1})^k. Recalling that k²/K = ε′/2, we get that

ε/2\displaystyle\varepsilon^{\prime}/2 (2K/nO(logS)d1)k\displaystyle\geq(2\sqrt{K/n}O(\log S)^{d-1})^{k}
n2K(ε/2)1/k\displaystyle\frac{\sqrt{n}}{2\sqrt{K}}(\varepsilon^{\prime}/2)^{1/k} O(logS)d1\displaystyle\geq O(\log S)^{d-1}
log1/δεε22k(ε/2)1/k\displaystyle\frac{\sqrt{\log 1/\delta}}{\varepsilon}\cdot\frac{\sqrt{\varepsilon^{\prime}}}{2\sqrt{2}k}\cdot(\varepsilon^{\prime}/2)^{1/k} O(logS)d1.\displaystyle\geq O(\log S)^{d-1}.
log1/δεε32log(1/ε)\displaystyle\frac{\sqrt{\log 1/\delta}}{\varepsilon}\cdot\frac{\sqrt{\varepsilon^{\prime}}}{\sqrt{32}\log(1/\varepsilon^{\prime})} O(logS)d1.\displaystyle\geq O(\log S)^{d-1}.

The claim follows by solving for SS. ∎
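For concreteness, the parameter setting in this proof can be tabulated by a short script (ours; all logarithms are base 2, and the input values are hypothetical):

import math

def params(eps, eps_p, delta):
    n = math.log2(1/delta) / eps**2
    k = math.log2(2/eps_p)
    K = 2 * k**2 / eps_p
    # left-hand side of the final displayed inequality
    bound = (math.sqrt(math.log2(1/delta)) / eps) \
            * math.sqrt(eps_p) / (math.sqrt(32) * math.log2(1/eps_p))
    return n, k, K, bound

n, k, K, bound = params(eps=1e-6, eps_p=0.01, delta=2**-20)
print(f"n={n:.2e}, k={k:.2f}, K={K:.0f}, O(log S)^(d-1) <= {bound:.0f}")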

4 Proof of 1.4

1.4 follows by combining Lemma 2.2 and the following lemma:

Lemma 4.1.

𝐍_α has (ε,δ)-pseudodensity for 𝒞(S,d) with ε = (p·O(log S)^{d−1})^k and δ = e^{−αk/p}, for any p ∈ (0,1] and positive integer k.

Of note, the only additive error is the error from the switching lemma. Compare this with the fact that 𝐍_α is (3α·O(log S)^{d−1})-pseudorandom (and therefore has the same pseudodensity with δ = 1) for 𝒞(S,d), due to Tal [46].

To prove the lemma, we need a few claims.

Claim 4.1.

Suppose f𝒞(S,d)f\in\mathcal{C}(S,d). Then there is a depth-kk decision tree hh with the property that:

𝔼[f(𝐍α)]𝔼[h(𝐍α/p)]+(pO(logS)d1)k\mathbb{E}[f(\mathbf{N}_{\alpha})]\leq\mathbb{E}[h(\mathbf{N}_{\alpha/p})]+(p\cdot O(\log S)^{d-1})^{k}
Proof.

Take 𝐙=𝐍α/p\mathbf{Z}=\mathbf{N}_{\alpha/p} in Lemma 2.3, so we have 𝝆𝐍α/p=𝐍α\boldsymbol{\rho}\circ\mathbf{N}_{\alpha/p}=\mathbf{N}_{\alpha} and

𝔼[f(𝐍α)]𝔼[h𝝆(𝐍α/p)]+(pO(logS)d1)k\mathbb{E}[f(\mathbf{N}_{\alpha})]\leq\mathbb{E}[h_{\boldsymbol{\rho}}(\mathbf{N}_{\alpha/p})]+(p\cdot O(\log S)^{d-1})^{k}

Averaging over 𝝆 yields a fixed decision tree h as claimed. ∎

Second, we upper bound the extent to which the acceptance probability of a shallow decision tree can increase when passing from the uniform distribution 𝐔 to the biased distribution 𝐍_γ.

Claim 4.2.

Suppose f: {0,1}^n → {0,1} is computed by a depth-k decision tree. Then

𝔼[f(𝐍γ)](1+γ)k𝔼[f(𝐔)]eγk𝔼[f(𝐔)]\mathbb{E}[f(\mathbf{N}_{\gamma})]\leq(1+\gamma)^{k}\cdot\mathbb{E}[f(\mathbf{U})]\leq e^{\gamma k}\cdot\mathbb{E}[f(\mathbf{U})]
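The claim can be checked by brute force on small trees (our sketch; we take 𝐍_γ to be i.i.d. bits, each equal to 1 with probability (1+γ)/2, the convention under which the appendix calculation goes through):

import itertools, random

def random_tree(depth, var=0):
    # random depth-`depth` tree; each root-to-leaf path queries distinct bits
    if depth == 0:
        return random.randrange(2)   # 0/1 leaf label
    return (var, random_tree(depth-1, var+1), random_tree(depth-1, var+1))

def evaluate(tree, z):
    while isinstance(tree, tuple):
        var, left, right = tree
        tree = right if z[var] else left
    return tree

def expectation(tree, n, p1):
    # exact E[tree(z)] when each z_i is 1 independently with probability p1
    total = 0.0
    for z in itertools.product([0, 1], repeat=n):
        weight = 1.0
        for b in z:
            weight *= p1 if b else 1 - p1
        total += weight * evaluate(tree, z)
    return total

k, gamma = 4, 0.1
t = random_tree(k)
eu = expectation(t, k, 0.5)
en = expectation(t, k, (1 + gamma) / 2)
print(en, "<=", (1 + gamma)**k * eu)   # the claimed inequality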

The proof amounts to a calculation, which we include in the appendix. We’re now in a position to prove the lemma.

Proof of Lemma 4.1.

Directly applying 4.1, we obtain a depth-k decision tree f′ with

𝔼[f(𝐍α)]𝔼[f(𝐍α/p)]+(pO(logS)d1)k\mathbb{E}[f(\mathbf{N}_{\alpha})]\leq\mathbb{E}[f^{\prime}(\mathbf{N}_{\alpha/p})]+(p\cdot O(\log S)^{d-1})^{k}

Applying 4.2 to 𝔼[f(𝐍α/p)]\mathbb{E}[f^{\prime}(\mathbf{N}_{\alpha/p})], we get

𝔼[f(𝐍α)]\displaystyle\mathbb{E}[f(\mathbf{N}_{\alpha})] (1+α/p)k𝔼[f(𝐔)]\displaystyle\leq(1+\alpha/p)^{k}\mathbb{E}[f^{\prime}(\mathbf{U})]
eαk/p𝔼[f(𝐔)].\displaystyle\leq e^{\alpha k/p}\mathbb{E}[f^{\prime}(\mathbf{U})].

Putting these together finishes the proof. ∎

We can now prove 1.4, restated here:

See 1.4

Proof of 1.4.

Let n = 1/(7ε), k = log(1/ε′) and α = √(ε/δ). These choices satisfy α ≥ √((1/8δ)(1/n + ε)), meaning 𝐍_α is not δ-dense in any ε-pseudorandom set for 𝒞(S,d), by Lemma 2.2.

By Lemma 4.1, 𝐍α\mathbf{N}_{\alpha} has (ε,δ)(\varepsilon^{\prime},\delta)-pseudodensity for δ=eαk/p\delta=e^{-\alpha k/p} and ε=(pO(logS)d1)k\varepsilon^{\prime}=(p\cdot O(\log S)^{d-1})^{k}.

The constraint on the density implies

δ\displaystyle\delta =eαk/p\displaystyle=e^{-\alpha k/p}
log(1/δ)\displaystyle\log(1/\delta) =αk/p\displaystyle=\alpha k/p
log(1/δ)\displaystyle\log(1/\delta) =ε/δlog(1/ε)/p\displaystyle=\sqrt{\varepsilon/\delta}\log(1/\varepsilon^{\prime})/p
p\displaystyle p =ε/δlog(1/ε)log(1/δ).\displaystyle=\frac{\sqrt{\varepsilon/\delta}\log(1/\varepsilon^{\prime})}{\log(1/\delta)}.

Plugging this value of pp into the expression for ε\varepsilon^{\prime}, we get

ε\displaystyle\varepsilon^{\prime} =(pO(logS)d1)k\displaystyle=(p\cdot O(\log S)^{d-1})^{k}
(ε)1/k/p\displaystyle(\varepsilon^{\prime})^{1/k}/p =O(logS)d1\displaystyle=O(\log S)^{d-1}
(ε)1/log(1/ε)δlog(1/δ)εlog(1/ε)\displaystyle(\varepsilon^{\prime})^{1/\log(1/\varepsilon^{\prime})}\cdot\frac{\sqrt{\delta}\log(1/\delta)}{\sqrt{\varepsilon}\log(1/\varepsilon^{\prime})} =O(logS)d1.\displaystyle=O(\log S)^{d-1}.

Note that (ε)1/log(1/ε)=2log(1/ε)/log(1/ε)=1/2(\varepsilon^{\prime})^{1/\log(1/\varepsilon^{\prime})}=2^{-\log(1/\varepsilon^{\prime})/\log(1/\varepsilon^{\prime})}=1/2. Solving for SS gives the claimed bound. ∎
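The identity used in the last step, together with the value of p and the resulting bound, can be checked numerically (our script; the values of ε, ε′, δ are hypothetical):

import math

eps, eps_p, delta = 1e-6, 1e-4, 0.01
k = math.log2(1/eps_p)
print(eps_p ** (1/k))                  # 0.5, for any eps_p in (0,1)
p = math.sqrt(eps/delta) * k / math.log2(1/delta)
bound = 0.5 * math.sqrt(delta) * math.log2(1/delta) / (math.sqrt(eps) * k)
print(p, bound)                        # bound stands in for O(log S)^(d-1)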

5 Proofs of 1.5 and 1.6

In this section, we prove 1.5 and 1.6. We include them in the same section due to their similarity, and we start with 1.5, which is more involved. Indeed, 1.6 will follow directly from a result of Shaltiel and Viola [42] after an appropriate error reduction procedure.

5.1 Proof of 1.5

As in 1.4, we will prove unconditional pseudodensity for 𝐍α\mathbf{N}_{\alpha} and compare it with the lower bound for α\alpha given by Lemma 2.2.

Let 𝐒𝐩_{n,k} denote the random variable uniform over the set of n-bit strings of weight exactly n/2 − k. We convert a function which refutes the pseudodensity of 𝐍_α into a (random) function which distinguishes between the two 'slices' of the hypercube of weights m/2 − αm and m/2, extending similar ideas from [42, 33, 44].

Lemma 5.1.

Let f:{0,1}n{0,1}f:\{0,1\}^{n}\to\{0,1\} be a function for which δ𝔼[f(𝐍α)]ε𝔼[f(𝐔)]\delta\mathbb{E}[f(\mathbf{N}_{\alpha})]-\varepsilon^{\prime}\geq\mathbb{E}[f(\mathbf{U})] and let q=𝔼[f(𝐍α)]q=\mathbb{E}[f(\mathbf{N}_{\alpha})]. Then for any positive integer \ell, there is a random function 𝐅:{0,1}m{0,1}\mathbf{F}:\{0,1\}^{m}\to\{0,1\} so that

Pr[𝐅(𝐒𝐩m,αm)=0]\displaystyle\Pr[\mathbf{F}(\mathbf{Sp}_{m,\alpha m})=0] eq;\displaystyle\leq e^{-\ell q};
Pr[𝐅(𝐒𝐩m,0)=1]\displaystyle\Pr[\mathbf{F}(\mathbf{Sp}_{m,0})=1] qδ.\displaystyle\leq\ell q\delta.

Moreover, 𝐅\mathbf{F} is the OR of \ell copies of ff (on random inputs).

Proof.

Fix an input z ∈ {0,1}^m. For i ∈ [ℓ], let 𝐈_i denote the random length-n sequence over [m] obtained by sampling n indices j_1, …, j_n ∈ [m] independently and uniformly at random (with replacement).

The random function is defined as

𝐅(z)=i[]f(z𝐈i)\mathbf{F}(z)=\bigvee_{i\in[\ell]}f(z_{\mathbf{I}_{i}})

We bound 𝐅's acceptance probability on 𝐒𝐩_{m,αm} and 𝐒𝐩_{m,0} as follows:

  1. Suppose z ∼ 𝐒𝐩_{m,αm}. Then each z_{𝐈_i} is distributed as 𝐍_α on n bits, meaning the z_{𝐈_i}'s constitute ℓ independent samples from 𝐍_α. By definition, f accepts with probability q over 𝐍_α, and so the probability that f accepts in none of the ℓ independent runs is (1−q)^ℓ ≤ e^{−ℓq}.

  2. Suppose z ∼ 𝐒𝐩_{m,0}. Then each z_{𝐈_i} ∈ {0,1}^n is a uniformly random string, meaning we have ℓ independent samples from 𝐔. Each run of f accepts independently with probability q′ = 𝔼[f(𝐔)], so by a union bound some run accepts with probability at most ℓq′. Since q′ ≤ qδ − ε′ ≤ qδ, we can bound this by ℓqδ. ∎
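The amplification in this proof is easy to simulate (our sketch; the distinguisher f below is a hypothetical stand-in, and the slice sampler follows the definition of 𝐒𝐩_{m,k}):

import random

def sample_slice(m, k):
    # uniform m-bit string of weight exactly m//2 - k (the slice Sp_{m,k})
    z = [0] * m
    for i in random.sample(range(m), m // 2 - k):
        z[i] = 1
    return z

def F(z, f, n, ell):
    # OR of ell copies of f, each run on n coordinates of z sampled
    # uniformly with replacement (the substring z_{I_i} from the proof)
    return max(f([z[random.randrange(len(z))] for _ in range(n)])
               for _ in range(ell))

def f(x):
    # hypothetical distinguisher: accept noticeably low-weight strings
    return 1 if sum(x) < len(x)/2 - 2*len(x)**0.5 else 0

m, n, alpha, ell, trials = 4000, 400, 0.1, 30, 200
rej = sum(1 - F(sample_slice(m, int(alpha*m)), f, n, ell) for _ in range(trials))
acc = sum(F(sample_slice(m, 0), f, n, ell) for _ in range(trials))
print("rejects biased slice:", rej/trials, "accepts middle slice:", acc/trials)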

Lemma 5.2.

Fix ε′, δ > 0 with δ < c for some absolute constant c ≈ 1/200. Let 𝔽 be a field of positive characteristic p = O(1) and let ℱ be the set of all polynomials P ∈ 𝔽[X_1, …, X_n] with degree at most d. Then 𝐍_α has (ε′,δ)-pseudodensity with respect to ℱ whenever d = O(1/α): any P ∈ ℱ witnessing the failure of (ε′,δ)-pseudodensity must have degree Ω(1/α).

Note that there is no dependence on ε′. This is a consequence of the positive characteristic p, which allows us to approximate large fan-in ORs by polynomials whose degree depends only on the quality of the approximation and not on the fan-in.

We proceed in the contrapositive: if 𝐍_α is not pseudodense, we use the witness to construct a polynomial on which we can prove degree lower bounds directly; since the two degrees are related by a constant factor, the lower bound on the witness's degree follows. The proof uses the following approximation of the OR function by a low-degree polynomial over 𝔽 when 𝔽 has positive characteristic (in characteristic zero, the degree depends on the fan-in of the OR, which we want to avoid).

Claim 5.1 ([37]).

For any n, any γ > 0, and any finite field 𝔽 with characteristic p > 0, there is a distribution 𝐑 on polynomials of degree d ≤ p·log(1/γ) so that for all z ∈ {0,1}^n,

Pr[𝐑(z) = OR(z)] ≥ 1 − γ.

Moreover, 𝐑 is supported on polynomials R with R(z) ∈ {0,1} for every z ∈ {0,1}^n.
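For intuition, here is our sketch of the standard construction behind this claim, specialized to 𝔽₂: the OR of t = log(1/γ) random parities, i.e. R(z) = 1 − ∏_j(1 + ⟨s_j, z⟩) over 𝔽₂, which has degree t ≤ p·log(1/γ) and errs on any fixed nonzero z with probability 2^{−t} ≤ γ:

import math, random

def sample_R(n, t):
    # R(z) = OR of t random parities: 1 - prod_j (1 + <s_j, z>) over F_2
    subsets = [[random.randrange(2) for _ in range(n)] for _ in range(t)]
    def R(z):
        return 1 if any(sum(s[i] & z[i] for i in range(n)) % 2
                        for s in subsets) else 0
    return R

n, gamma = 20, 0.01
t = math.ceil(math.log2(1/gamma))     # the degree of the approximator
errors = 0
for _ in range(2000):
    z = [random.randrange(2) for _ in range(n)]
    errors += sample_R(n, t)(z) != (1 if any(z) else 0)
print("empirical error:", errors / 2000, "<= gamma =", gamma)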

We also use a special case of the robust Hegedűs lemma, recently proved by Srinivasan [44].

Lemma 5.3 (Robust Hegedűs lemma (special case), [44]).

Let 𝔽 be a finite field. Let 2^{−m/100} ≤ λ ≤ c, where c < 1 is a (small) absolute constant, and let α²m be an integer so that 2^{−2α²m} ≥ λ. Suppose P: 𝔽^m → 𝔽 is a degree-d polynomial for which:

  1. Pr[P(𝐒𝐩_{m,αm}) ≠ 0] ≤ λ;

  2. Pr[P(𝐒𝐩_{m,0}) = 0] ≤ 1 − e^{−α²m/2}.

Then d=Ω(αm)d=\Omega(\alpha m).

Now we can prove the main lemma.

Proof of Lemma 5.2.

Let P: 𝔽^n → 𝔽 be a degree-d polynomial. We can assume P takes boolean values over {0,1}^n by replacing P with P^{p−1} (p being 𝔽's characteristic); this increases the degree to less than pd. Assume towards a contradiction that P certifies that 𝐍_α is not (ε′,δ)-pseudodense, in the sense that δPr[P(𝐍_α) ≠ 0] − ε′ ≥ Pr[P(𝐔) ≠ 0].

Let q = Pr[P(𝐍_α) ≠ 0]. Applying Lemma 5.1 to P (noting that P is boolean on {0,1}^n), with ℓ to be specified later, we obtain a random function 𝐅 of the form ∨_i P(z_{𝐈_i}). For a fixed z ∈ {0,1}^m, let 𝐗(z) = (z_{𝐈_i})_{i∈[ℓ]}. For γ to be determined later, let 𝐑 be the random polynomial from 5.1; we remark that 𝐑 can be made to output boolean values on boolean inputs. Then for any z ∈ {0,1}^m, 𝐅(z) = OR(𝐗(z)), so 𝐅(z) = 𝐑(𝐗(z)) with probability at least 1 − γ. In sum, 𝐑(𝐗(·)) has the property that:

Pr[𝐑(𝐗(𝐒𝐩m,αm))=0]\displaystyle\Pr[\mathbf{R}(\mathbf{X}(\mathbf{Sp}_{m,\alpha m}))=0] eq+γ;\displaystyle\leq e^{-\ell q}+\gamma;
Pr[𝐑(𝐗(𝐒𝐩m,0))=1]\displaystyle\Pr[\mathbf{R}(\mathbf{X}(\mathbf{Sp}_{m,0}))=1] qδ+γ.\displaystyle\leq\ell q\delta+\gamma.

Moreover, 𝐑(𝐗(·)) is a random polynomial over 𝔽 of degree at most dp²log(1/γ), with one factor of p coming from 5.1 and the other from possibly replacing P with P^{p−1}.

We now apply Lemma 5.3 to 𝐑(𝐗(·)). Let c denote the constant from the statement of the lemma. Set m = ln(1/c)/(2α²), γ = c/2, and ℓ = ln(2/c)/q ≤ ln(2/c)·δ/ε′, where the inequality holds because q ≥ ε′/δ follows from δq − ε′ ≥ 0. Then λ = e^{−ℓq} + γ = c. With this setting of parameters, the hypotheses of Lemma 5.3 are met (for α sufficiently small): 2^{−m/100} ≤ λ = c ≤ 2^{−2α²m}.

By calculation, 𝐑(𝐗(·)) misclassifies 𝐒𝐩_{m,αm} with probability at most e^{−ℓq} + γ = c and misclassifies 𝐒𝐩_{m,0} with probability at most ℓqδ + γ ≤ ln(2/c)δ + c/2, which we can bound away from 1 by choosing δ to be a sufficiently small constant.

Therefore, by Lemma 5.3, the degree of some polynomial in the support of 𝐑(𝐗(·)) is Ω(αm) = Ω(1/α). Since the degree of 𝐑(𝐗(·)) is at most dp²log(1/γ), and γ and p (𝔽's characteristic) are constants, the same conclusion holds for P. ∎

We can now prove 1.5, restated here:

See 1.5

Proof of 1.5.

By Lemma 5.2, 𝐍_α has (ε′,δ)-pseudodensity with respect to degree-d polynomials over 𝔽 whenever d = O(1/α). On the other hand, taking α ≥ √((1/8δ)(1/n + ε)) implies that 𝐍_α is not δ-dense in any ε-pseudorandom set. Therefore, if we take n = 1/ε, then picking α ≥ √(2ε/8δ) gives the separation. ∎

5.2 Proof of 1.6

In what is becoming tradition, we will show an unconditional pseudodensity result, from which 1.6 will follow by Lemma 2.2.

Lemma 5.4.

Let ε > 0 and 1/4 > δ > 0 be arbitrary. Suppose ℱ is a test class of boolean functions f: {0,1}^n → {0,1} with the following property: there is no 𝖠𝖢⁰ ℱ-oracle circuit of size poly(n/αε) which computes 𝖬𝖠𝖩 on O(1/α) input bits. Then 𝐍_α is (εδ, δ)-pseudodense with respect to ℱ.

This will follow from the following result of Shaltiel and Viola:

Theorem 5.1 ([42]).

Let f: {0,1}^n → {0,1} be a function that distinguishes between 𝐔 and 𝐍_α with constant advantage. Then there is an 𝖠𝖢⁰ circuit of size poly(n/α) using f-oracle gates which computes majority on O(1/α) bits.

Proof of Lemma 5.4.

Suppose 𝐍_α is not (εδ, δ)-pseudodense for ℱ as in the statement of the lemma, and let f witness this fact, with q = 𝔼[f(𝐍_α)].

Let

𝐅(z) = ⋁_{i∈[ℓ]} f(𝝆_i(z))

where each 𝝆_i is a uniformly random permutation of [n]. When z ∼ 𝐍_α, 𝝆_i(z) is distributed as 𝐍_α, and likewise for 𝐔. Hence 𝐅 rejects samples from 𝐍_α with probability (1−q)^ℓ ≤ e^{−qℓ}, which is a constant when ℓ = 1/q ≤ 1/ε; here q ≥ ε because δq − εδ > 0. Additionally, 𝐅 accepts samples from 𝐔 with probability at most ℓ·𝔼[f(𝐔)] ≤ ℓqδ, which is at most δ for our choice of ℓ. Averaging, some function in 𝐅's support distinguishes 𝐍_α and 𝐔 with constant advantage. Applying Theorem 5.1 yields a small 𝖠𝖢⁰ ℱ-oracle circuit computing majority, which is a contradiction. ∎

See 1.6

The proof is the same as that of 1.5, so we omit the details.

References

  • [1] Scott Aaronson. BQP and the polynomial hierarchy. In Proceedings of the forty-second ACM symposium on Theory of computing, pages 141–150, 2010.
  • [2] Rohit Agrawal. Coin theorems and the Fourier expansion. arXiv preprint arXiv:1906.03743, 2019.
  • [3] Miklos Ajtai and Avi Wigderson. Deterministic simulation of probabilistic constant depth circuits. In 26th Annual Symposium on Foundations of Computer Science (sfcs 1985), pages 11–19. IEEE, 1985.
  • [4] Noga Alon, Oded Goldreich, Johan Håstad, and René Peralta. Simple constructions of almost k-wise independent random variables. Random Structures & Algorithms, 3(3):289–304, 1992.
  • [5] Sanjeev Arora, Rong Ge, Yingyu Liang, Tengyu Ma, and Yi Zhang. Generalization and equilibrium in generative adversarial nets (gans). In Proceedings of the 34th International Conference on Machine Learning-Volume 70, pages 224–232. JMLR. org, 2017.
  • [6] R.B. Ash. Information Theory. Dover books on advanced mathematics. Dover Publications, 1990.
  • [7] Boaz Barak, Moritz Hardt, and Satyen Kale. The uniform hardcore lemma via approximate bregman projections. In Proceedings of the twentieth annual ACM-SIAM symposium on Discrete algorithms, pages 1193–1200. SIAM, 2009.
  • [8] Boaz Barak, Ronen Shaltiel, and Avi Wigderson. Computational analogues of entropy. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, pages 200–215. Springer, 2003.
  • [9] Thomas F Bloom and Olof Sisask. Breaking the logarithmic barrier in Roth's theorem on arithmetic progressions. arXiv preprint arXiv:2007.03528, 2020.
  • [10] Joshua Brody and Elad Verbin. The coin problem and pseudorandomness for branching programs. In 2010 IEEE 51st Annual Symposium on Foundations of Computer Science, pages 30–39. IEEE, 2010.
  • [11] Mei-Chu Chang et al. A polynomial bound in Freiman's theorem. Duke Mathematical Journal, 113(3):399–419, 2002.
  • [12] Gil Cohen, Anat Ganor, and Ran Raz. Two sides of the coin problem. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2014). Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2014.
  • [13] Dean Doron, Dana Moshkovitz, Justin Oh, and David Zuckerman. Nearly optimal pseudorandomness from hardness. Technical report, ECCC preprint TR19-099, 2019.
  • [14] Stefan Dziembowski and Krzysztof Pietrzak. Leakage-resilient cryptography. In 2008 49th Annual IEEE Symposium on Foundations of Computer Science, pages 293–302. IEEE, 2008.
  • [15] Yoav Freund and Robert Schapire. A short introduction to boosting. Journal-Japanese Society For Artificial Intelligence, 14(771-780):1612, 1999.
  • [16] W. T. Gowers. A new proof of Szemerédi's theorem. Geometric & Functional Analysis GAFA, 11(3):465–588, 2001.
  • [17] W. T. Gowers. Decompositions, approximate structure, transference, and the Hahn–Banach theorem. Bulletin of the London Mathematical Society, 42(4):573–606, 2010.
  • [18] Ben Green and Terence Tao. The primes contain arbitrarily long arithmetic progressions. Annals of Mathematics, pages 481–547, 2008.
  • [19] Aryeh Grinberg, Ronen Shaltiel, and Emanuele Viola. Indistinguishability by adaptive procedures with advice, and lower bounds on hardness amplification proofs. In 2018 IEEE 59th Annual Symposium on Foundations of Computer Science (FOCS), pages 956–966. IEEE, 2018.
  • [20] Iftach Haitner, Omer Reingold, and Salil Vadhan. Efficiency improvements in constructing pseudorandom generators from one-way functions. SIAM Journal on Computing, 42(3):1405–1430, 2013.
  • [21] Iftach Haitner, Omer Reingold, Salil Vadhan, and Hoeteck Wee. Inaccessible entropy. In Proceedings of the forty-first annual ACM symposium on Theory of computing, pages 611–620, 2009.
  • [22] Lianna Hambardzumyan and Yaqiao Li. Chang's lemma via Pinsker's inequality. Discrete Mathematics, 343(1):111496, 2020.
  • [23] Johan Håstad. Computational limitations of small-depth circuits. 1987.
  • [24] Johan Håstad. On the correlation of parity and small-depth circuits. SIAM Journal on Computing, 43(5):1699–1708, 2014.
  • [25] Johan Håstad, Russell Impagliazzo, Leonid A Levin, and Michael Luby. A pseudorandom generator from any one-way function. SIAM Journal on Computing, 28(4):1364–1396, 1999.
  • [26] Thomas Holenstein. Key agreement from weak bit agreement. In Proceedings of the thirty-seventh annual ACM symposium on Theory of computing, pages 664–673, 2005.
  • [27] Chun-Yuan Hsiao, Chi-Jen Lu, and Leonid Reyzin. Conditional computational entropy, or toward separating pseudoentropy from compressibility. In Annual International Conference on the Theory and Applications of Cryptographic Techniques, pages 169–186. Springer, 2007.
  • [28] Russell Impagliazzo. Hard-core distributions for somewhat hard problems. In Proceedings of IEEE 36th Annual Foundations of Computer Science, pages 538–545. IEEE, 1995.
  • [29] Russell Impagliazzo. Connections between pseudo-randomness and machine learning: boosting, dense models, and regularity. 2020.
  • [30] Russell Impagliazzo, William Matthews, and Ramamohan Paturi. A satisfiability algorithm for AC0. In Proceedings of the twenty-third annual ACM-SIAM symposium on Discrete Algorithms, pages 961–972. SIAM, 2012.
  • [31] Russell Impagliazzo, Cristopher Moore, and Alexander Russell. An entropic proof of chang’s inequality. SIAM Journal on Discrete Mathematics, 28(1):173–176, 2014.
  • [32] Adam R Klivans and Rocco A Servedio. Boosting and hard-core set construction. Machine Learning, 51(3):217–238, 2003.
  • [33] Nutan Limaye, Karteek Sreenivasaiah, Srikanth Srinivasan, Utkarsh Tripathi, and S Venkitesh. A fixed-depth size-hierarchy theorem for 𝖠𝖢0[]{\mathsf{AC}}^{0}[\oplus] via the coin problem. In Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing, pages 442–453, 2019.
  • [34] Chi-Jen Lu, Shi-Chun Tsai, and Hsin-Lung Wu. Complexity of hard-core set proofs. computational complexity, 20(1):145–171, 2011.
  • [35] Or Meir and Avi Wigderson. Prediction from partial information and hindsight, with application to circuit lower bounds. computational complexity, 28(2):145–183, 2019.
  • [36] Ilya Mironov, Omkant Pandey, Omer Reingold, and Salil Vadhan. Computational differential privacy. In Annual International Cryptology Conference, pages 126–142. Springer, 2009.
  • [37] Alexander A Razborov. Lower bounds on the size of bounded depth circuits over a complete basis with logical addition. Mathematical Notes of the Academy of Sciences of the USSR, 41(4):333–338, 1987.
  • [38] Omer Reingold, Luca Trevisan, Madhur Tulsiani, and Salil Vadhan. Dense subsets of pseudorandom sets. In 2008 49th Annual IEEE Symposium on Foundations of Computer Science, pages 76–85. IEEE, 2008.
  • [39] Benjamin Rossman. An entropy proof of the switching lemma and tight bounds on the decision-tree size of AC0, 2017.
  • [40] Rocco A Servedio and Li-Yang Tan. Improved pseudorandom generators from pseudorandom multi-switching lemmas. arXiv preprint arXiv:1801.03590, 2018.
  • [41] Ronen Shaltiel. Is it possible to improve Yao's XOR lemma using reductions that exploit the efficiency of their oracle? In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2020). Schloss Dagstuhl-Leibniz-Zentrum für Informatik, 2020.
  • [42] Ronen Shaltiel and Emanuele Viola. Hardness amplification proofs require majority. SIAM Journal on Computing, 39(7):3122–3154, 2010.
  • [43] Roman Smolensky. On representations by low-degree polynomials. In Proceedings of 1993 IEEE 34th Annual Foundations of Computer Science, pages 130–138. IEEE, 1993.
  • [44] Srikanth Srinivasan. A robust version of Hegedűs's lemma, with applications. In Konstantin Makarychev, Yury Makarychev, Madhur Tulsiani, Gautam Kamath, and Julia Chuzhoy, editors, Proceedings of the 52nd Annual ACM SIGACT Symposium on Theory of Computing, STOC 2020, Chicago, IL, USA, June 22-26, 2020, pages 1349–1362. ACM, 2020.
  • [45] Endre Szemerédi. On sets of integers containing no four elements in arithmetic progression. Acta Mathematica Academiae Scientiarum Hungarica, 20(1-2):89–104, 1969.
  • [46] Avishay Tal. Tight bounds on the Fourier spectrum of AC0. In 32nd Computational Complexity Conference (CCC 2017). Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2017.
  • [47] Michel Talagrand. How much are increasing sets positively correlated? Combinatorica, 16(2):243–258, 1996.
  • [48] Terence Tao, Tamar Ziegler, et al. The primes contain arbitrarily long polynomial progressions. Acta Mathematica, 201(2):213–305, 2008.
  • [49] Luca Trevisan, Madhur Tulsiani, and Salil Vadhan. Regularity, boosting, and efficiently simulating every high-entropy distribution. In 2009 24th Annual IEEE Conference on Computational Complexity, pages 126–136. IEEE, 2009.
  • [50] Thomas Watson. Advice lower bounds for the dense model theorem. ACM Transactions on Computation Theory (TOCT), 7(1):1–18, 2015.
  • [51] Andrew C Yao. Theory and application of trapdoor functions. In 23rd Annual Symposium on Foundations of Computer Science (SFCS 1982), pages 80–91. IEEE, 1982.
  • [52] Jiapeng Zhang. On the query complexity for showing dense model. 2011.

Appendix A Omitted proofs

A.1 Proof of 3.3

See 3.3

Proof.

Let RiR_{i} be the event that i𝝆1()i\in\boldsymbol{\rho}^{-1}(*).

Pr[(𝝆𝐙)i=1|E]\displaystyle\Pr[(\boldsymbol{\rho}\circ\mathbf{Z})_{i}=1|E] =Pr[Ri]Pr[𝐙i=1|Ri,E]+Pr[¬Ri]Pr[𝝆i=1|¬Ri,E]\displaystyle=\Pr[R_{i}]\Pr[\mathbf{Z}_{i}=1|R_{i},E]+\Pr[\neg R_{i}]\Pr[\boldsymbol{\rho}_{i}=1|\neg R_{i},E]
= p·Pr[𝐙_i = 1|E] + (1−p)/2

where we used Pr[R_i] = p, the independence of 𝐙_i from R_i (so Pr[𝐙_i = 1|R_i, E] = Pr[𝐙_i = 1|E]), and the independence of 𝝆 from E (so Pr[𝝆_i = 1|¬R_i, E] = 1/2) to obtain the final line. ∎

A.2 Proof of 4.2

See 4.2

Proof.

We proceed by induction on the depth, recalling that each bit of 𝐍_ε independently equals 1 with probability (1+ε)/2. When k = 0, f is constant and the claim holds trivially. Suppose f(z) = (1−z_i)g(z) + z_i h(z), where z_i is the variable queried at the root and g, h have depth-k decision trees which don't depend on z_i. Note that 𝔼[f(𝐔)] = (𝔼[g(𝐔)] + 𝔼[h(𝐔)])/2. For z ∼ 𝐍_ε,

𝔼[f(z)] = 𝔼[(1−z_i)g(z)] + 𝔼[z_i h(z)]
= (1−𝔼[z_i])𝔼[g(z)] + 𝔼[z_i]𝔼[h(z)]   (independence)
≤ ((1−ε)/2)(1+ε)^k 𝔼[g(𝐔)] + ((1+ε)/2)(1+ε)^k 𝔼[h(𝐔)]   (induction)
= (1+ε)^k[(𝔼[g(𝐔)] + 𝔼[h(𝐔)])/2 + (ε/2)(𝔼[h(𝐔)] − 𝔼[g(𝐔)])]
= (1+ε)^k[𝔼[f(𝐔)] + (ε/2)(𝔼[h(𝐔)] − 𝔼[g(𝐔)])]
≤ (1+ε)^k[𝔼[f(𝐔)] + ε𝔼[f(𝐔)]]   (since 𝔼[h(𝐔)] ≤ 2𝔼[f(𝐔)] and 𝔼[g(𝐔)] ≥ 0)
= (1+ε)^{k+1}·𝔼[f(𝐔)]. ∎