
Isoperimetric Inequalities Made Simpler

Ronen Eldan (Department of Computer Science and Applied Mathematics, Weizmann Institute), Guy Kindler (School of Computer Science and Engineering, Hebrew University), Noam Lifshitz (School of Mathematics, Hebrew University; partially supported by ISF individual research grant 1980/22), Dor Minzer (Department of Mathematics, Massachusetts Institute of Technology; supported by a Sloan Research Fellowship and NSF CCF award 2227876)
Abstract

We give an alternative, simple method to prove isoperimetric inequalities over the hypercube. In particular, we show:

  1.

    An elementary proof of classical isoperimetric inequalities of Talagrand, as well as a stronger isoperimetric result conjectured by Talagrand and recently proved by Eldan and Gross.

  2.

    A strengthening of the Friedgut junta theorem, asserting that if the $p$-th moment of the sensitivity of a function is constant for some $1/2+\varepsilon\leqslant p\leqslant 1$, then the function is close to a junta. In this language, Friedgut’s theorem is the special case $p=1$.

1 Introduction

1.1 Isoperimetric Inequalities Over the Hypercube

Isoperimetric inequalities are fundamental results in mathematics with a myriad of applications. The main topic of this paper is discrete isoperimetric inequalities, and more specifically isoperimetric inequalities over the hypercube $\{0,1\}^n$. Classical results along these lines are due to Harper, Bernstein, Lindsey and Hart [Har66, Ber67, Lin64, Har76], Harper [Har66], Margulis [Mar74], Talagrand [Tal93, Tal97, Tal94] and Kahn, Kalai and Linial [KKL88].

All of these theorems discuss various measures of boundaries and relations between them for Boolean functions. For a function $f\colon\{0,1\}^n\to\{0,1\}$ and a point $x\in\{0,1\}^n$, we define the sensitivity of $f$ at $x$, denoted by $s_f(x)$, to be the number of coordinates $i\in[n]$ such that $f(x\oplus e_i)\neq f(x)$. The vertex boundary of a function $f$, denoted by $V_f$, is the set of all $x$’s such that $s_f(x)>0$, and the edge boundary $E_f$ of $f$ is defined to be the set of edges $(x,x\oplus e_i)$ of the hypercube whose endpoints are assigned different values by $f$. The most basic isoperimetric result related to Boolean functions, known as Poincaré’s inequality, asserts that $\frac{|E_f|}{2^n}\geqslant\mathsf{var}(f)$, i.e. the edge boundary of a function $f$ cannot be too small. (We remark that the term “Poincaré’s inequality” comes from the theory of functional inequalities and Markov chains and refers to a much more general result; see for example [KT14, Section 2]. In the case of graphs, it asserts that for a graph $G$, if the first non-trivial eigenvalue of its Laplacian $L$ is at least $\lambda$, then $\|Lf\|_2\geqslant\lambda\|f-\mathbb{E}[f]\|_2$. In our example, the Laplacian is $L=I-\frac{1}{n}A$, where $A$ is the adjacency matrix of the Boolean hypercube, and $\lambda=2/n$.) The edge isoperimetric inequality for the hypercube, due to Harper, Bernstein, Lindsey and Hart [Har66, Ber67, Lin64, Har76], is a strengthening of this statement, asserting that for a fixed value of $\mathbb{E}[f]$, the functions that minimize the edge boundary $\frac{|E_f|}{2^n}$ are indicator functions of initial segments in lexicographic orderings of the hypercube. In particular, this result implies that if $f$ is a non-constant Boolean function, then $\frac{|E_f|}{2^n}\geqslant\mathbb{E}[f]\log\left(\frac{1}{\mathbb{E}[f]}\right)$ (here and throughout, $\log$ denotes the logarithm with base $2$). The vertex isoperimetric inequality, due to Harper [Har66], asserts that among all Boolean functions of a given measure, the functions that minimize the size of the vertex boundary $V_f$ are the indicator functions of initial segments in simplicial orderings of the hypercube. Margulis [Mar74] proved a result that combines the notions of edge boundary and vertex boundary, showing that for all functions $f$ one has $\frac{|V_f|}{2^n}\cdot\frac{|E_f|}{2^n}\geqslant\Omega(\mathsf{var}(f)^2)$.
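For concreteness, these quantities are easy to compute by brute force for small $n$. The following Python sketch (our own illustration, not part of the paper) computes the normalized edge boundary and the variance of a given Boolean function, and checks Poincaré’s inequality:

```python
import itertools

def poincare_check(f, n):
    """Brute-force check of Poincare's inequality |E_f|/2^n >= var(f)
    for f: {0,1}^n -> {0,1}; exponential time, so small n only."""
    points = list(itertools.product([0, 1], repeat=n))
    mean = sum(f(x) for x in points) / 2**n
    var = mean * (1 - mean)                      # variance of a 0/1-valued function
    def flip(x, i):                              # x with coordinate i flipped
        return x[:i] + (1 - x[i],) + x[i + 1:]
    # sensitivity s_f(x): number of coordinates whose flip changes f(x)
    s = {x: sum(f(x) != f(flip(x, i)) for i in range(n)) for x in points}
    edge_boundary = sum(s.values()) / 2          # each sensitive edge has two endpoints
    assert edge_boundary / 2**n >= var
    return edge_boundary / 2**n, var

# Example: AND on 3 bits gives (0.375, 0.109375), consistent with the inequality.
print(poincare_check(lambda x: int(all(x)), 3))
```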

Talagrand considered a different notion of boundary of a function $f$, namely the quantity $\mathbb{E}_x\left[\sqrt{s_f(x)}\right]$, which we will refer to as the Talagrand boundary of $f$. Here and throughout, the distribution of $x$ is uniform over the hypercube $\{0,1\}^n$. Talagrand proved a number of results about this quantity [Tal93, Tal97], the most basic of which asserts that $\mathbb{E}_x\left[\sqrt{s_f(x)}\right]\geqslant\Omega(\mathsf{var}(f))$. A simple application of the Cauchy-Schwarz inequality shows that this result implies Margulis’ theorem. Talagrand then went on to strengthen the result for functions with small variance, proving the following theorem:

Theorem 1.1 (Talagrand [Tal93]).

For all non-constant Boolean functions $f\colon\{0,1\}^n\to\{0,1\}$ it holds that

\[ \mathbb{E}_x\left[\sqrt{s_f(x)}\right]\geqslant\Omega\left(\mathsf{var}(f)\sqrt{\log\left(\frac{1}{\mathsf{var}(f)}\right)}\right). \]

We remark that Talagrand’s result is in fact more general and slightly stronger. First, Talagrand proves that the above inequality holds for the $p$-biased measure on the cube if one inserts a factor of $\frac{\min(p,1-p)}{\sqrt{p(1-p)}}$ on the right hand side. The results and techniques of the current paper also generalize to the $p$-biased measure in a straightforward manner, and we focus on the case $p=1/2$ for simplicity. Second, Talagrand considers a variant of the quantity $s_f(x)$ wherein $s_f(x)$ is defined to be $0$ for $x$ such that $f(x)=0$. The results we state below also hold with this stronger definition.

While Theorem 1.1 can be seen to be tight up to constant factors (as can be seen by considering sub-cubes), it leaves something to be desired: it is incomparable to yet another prominent isoperimetric result, the KKL theorem [KKL88]. The KKL theorem is concerned with the influence of coordinates on a function. For a coordinate $i\in[n]$, the influence of $i$ is defined by

\[ I_i[f]=\Pr_{x\in\{0,1\}^n}\left[f(x)\neq f(x\oplus e_i)\right]. \]

In words, it is the fraction of edges along direction $i$ that lie in the edge boundary of $f$. The KKL theorem [KKL88] asserts that any function must have a variable that is at least somewhat influential; more quantitatively, they proved that $\max_i I_i[f]\geqslant\Omega\left(\frac{\log n}{n}\mathsf{var}(f)\right)$.
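Influences admit an equally direct brute-force computation; a minimal sketch (the helper `influences` is ours, for illustration):

```python
import itertools

def influences(f, n):
    """I_i[f] = Pr_x[f(x) != f(x xor e_i)], computed exactly over {0,1}^n."""
    points = list(itertools.product([0, 1], repeat=n))
    return [sum(f(x) != f(x[:i] + (1 - x[i],) + x[i + 1:]) for x in points) / 2**n
            for i in range(n)]

# Majority on 3 bits: each coordinate has influence exactly 1/2.
print(influences(lambda x: int(sum(x) >= 2), 3))  # [0.5, 0.5, 0.5]
```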

Since the isoperimetric results of Talagrand and of KKL look very different (as do the techniques that go into their proofs), Talagrand asked whether one can establish a single isoperimetric inequality that is strong enough to capture both of these results simultaneously. In [Tal97], Talagrand conjectured such a statement and made some progress towards it. Namely, he showed that there exists $\alpha\in(0,1/2]$ such that for any non-constant function $f\colon\{0,1\}^n\to\{0,1\}$ it holds that

\[ \mathbb{E}_x\left[\sqrt{s_f(x)}\right]\geqslant c\cdot\mathsf{var}(f)\left(\log\left(\frac{1}{\mathsf{var}(f)}\right)\right)^{1/2-\alpha}\left(\log\left(1+\frac{1}{\sum_{i=1}^n I_i[f]^2}\right)\right)^{\alpha}, \tag{1} \]

for some absolute constant $c>0$. This inequality can be seen to imply a weaker result along the lines of the KKL theorem (namely, that there is $\beta>0$ such that if $\mathsf{var}(f)\geqslant\Omega(1)$, then $\max_i I_i[f]\geqslant\Omega((\log n)^{\beta}/n)$), but it is not enough to fully recover it. Talagrand conjectured that the above inequality holds with $\alpha=1/2$, a statement that is strong enough to directly imply the KKL theorem.

Recently, Eldan and Gross [EG] introduced techniques from stochastic processes to the field, and used these to prove that Talagrand’s conjecture is true, i.e. that one may take $\alpha=1/2$ in (1).

1.2 Our Results

We show a unified and simple Fourier analytic approach, based on random restrictions, to establish the results mentioned above, as well as a strengthening of Friedgut’s junta theorem [Fri98]. Our results are:

  1.

    Classical isoperimetric inequalities. We give Fourier analytic proofs of the inequalities above due to Margulis, Talagrand and Eldan-Gross, as well as of robust versions of them. Our proof technique also unravels a connection between these isoperimetric inequalities and other classical results in the area, such as the Friedgut junta theorem [Fri98] and Bourgain’s tail theorem [Bou02]. The fact that such a purely Fourier analytic proof exists is quite surprising to us, as the quantity $\mathbb{E}_x\left[\sqrt{s_f(x)}\right]$ does not seem to relate well to Fourier analytic methods (indeed, Talagrand’s proof proceeds by induction, whereas the proof of Eldan and Gross uses stochastic calculus).

  2.

    Strengthened junta theorems. We establish a strengthening of Friedgut’s theorem in which the assumption that the average sensitivity of $f$ is small, i.e. that $\mathbb{E}_x[s_f(x)]$ is small, is replaced by the weaker condition that $\mathbb{E}_x[s_f(x)^p]$ is small, for $p$ slightly larger than $1/2$.

In the rest of this introductory section, we describe our results in more detail.

1.3 Classical Isoperimetric Inequalities and Robust Versions

We give simpler proofs for the results below, which are due to Talagrand [Tal93] and Eldan-Gross [EG].

Theorem 1.2.

There exists an absolute constant $c>0$ such that for all non-constant functions $f\colon\{0,1\}^n\to\{0,1\}$ we have

\[ \mathbb{E}_x\left[\sqrt{s_f(x)}\right]\geqslant c\cdot\mathsf{var}(f)\sqrt{\log\left(\frac{1}{\mathsf{var}(f)}\right)}. \]
Theorem 1.3.

There exists an absolute constant $c>0$ such that for all non-constant functions $f\colon\{0,1\}^n\to\{0,1\}$ we have

\[ \mathbb{E}_x\left[\sqrt{s_f(x)}\right]\geqslant c\cdot\mathsf{var}(f)\sqrt{\log\left(1+\frac{1}{\sum_{i=1}^n I_i[f]^2}\right)}. \]

Our technique is more general, and can be used to prove more robust versions of Theorems 1.2, 1.3. For a $2$-coloring of the edges of the hypercube $\mathsf{col}\colon E(\{0,1\}^n)\to\{\mathsf{red},\mathsf{blue}\}$, define the red sensitivity by $s_{f,\mathsf{red}}(x)=0$ if $f(x)=0$, and otherwise $s_{f,\mathsf{red}}(x)$ is the number of sensitive edges adjacent to $x$ that are red. Similarly, define the blue sensitivity by $s_{f,\mathsf{blue}}(x)=0$ if $f(x)=1$, and otherwise $s_{f,\mathsf{blue}}(x)$ is the number of sensitive edges adjacent to $x$ that are blue. Then, defining

\[ \mathsf{Talagrand}_{\mathsf{col}}(f)=\mathbb{E}_x\left[\sqrt{s_{f,\mathsf{red}}(x)}\right]+\mathbb{E}_x\left[\sqrt{s_{f,\mathsf{blue}}(x)}\right], \]

we have that the left hand side of Theorems 1.2, 1.3 can be replaced by $\mathsf{Talagrand}_{\mathsf{col}}(f)$ for any $2$-coloring $\mathsf{col}$. Such results can be used to prove the existence of nearly bi-regular structures in the sensitivity graph of $f$, which are sometimes useful in applications (for example, in [KMS18] such structures in the directed sensitivity graph are necessary to construct monotonicity testers). To get some intuition, consider the bipartite sub-graph of the hypercube consisting of the sensitive edges of $f$, and suppose that we want to find a large sub-graph of it which is nearly regular, in the sense that the degrees on one side are between $d$ and $2d$ while the degrees on the other side are at most $d$, for some $d$. Taking the coloring $\mathsf{col}_{\mathsf{red}}$ that colors all edges red, or the coloring $\mathsf{col}_{\mathsf{blue}}$ that colors all edges blue (which amount to Talagrand’s original formulation), gives us information about the typical degrees of vertices $x$ with $f(x)=1$ and of vertices $y$ with $f(y)=0$, respectively. It does not, however, give any information about the possible relation between the degrees of such $x$’s and such $y$’s. Choosing the coloring $\mathsf{col}$ appropriately, one may establish a better balance between these two degrees and extract useful information on the relation between them. Establishing such robust results using the techniques of [EG] is more challenging.

1.4 New Junta Theorems

Next, we study connections between Talagrand-type quantities such as $\mathbb{E}_x\left[\sqrt{s_f(x)}\right]$ and the notion of juntas. For a point $x\in\{0,1\}^n$ and a subset $J\subseteq[n]$, we denote by $x_J$ the point in $\{0,1\}^J$ corresponding to the restriction of $x$ to the coordinates of $J$.

Definition 1.4.

A function $f\colon\{0,1\}^n\to\{0,1\}$ is called a $j$-junta if there exist $J\subseteq[n]$ of size at most $j$ and $g\colon\{0,1\}^J\to\{0,1\}$ such that $f(x)=g(x_J)$ for all $x\in\{0,1\}^n$.

We say a function $f\colon\{0,1\}^n\to\{0,1\}$ is $\varepsilon$-close to a $j$-junta if there is a $j$-junta $g\colon\{0,1\}^n\to\{0,1\}$ such that $\Pr_x[f(x)\neq g(x)]\leqslant\varepsilon$.

On the one hand, the quantity $\mathbb{E}_x\left[\sqrt{s_f(x)}\right]$ may be constant while the function $f$ is very far from being a junta, as can be seen by taking $f$ to be the majority function. On the other hand, if we look at the expected sensitivity, that is $\mathbb{E}_x[s_f(x)]$, a well known result of Friedgut [Fri98] asserts that if $\mathbb{E}_x[s_f(x)]=O(1)$, then $f$ is close to a junta. More precisely, Friedgut’s result asserts that if $\mathbb{E}_x[s_f(x)]\leqslant A$, then for all $\varepsilon>0$, $f$ is $\varepsilon$-close to a $2^{O(A/\varepsilon)}$-junta. In light of this contrast, it is interesting to ask what happens if an intermediate moment of the sensitivity, say the $p$-th moment for $p\in(1/2,1)$, is constant.

We prove a strengthening of Friedgut’s junta theorem [Fri98] addressing this case, morally showing that for any $p>1/2$, a bounded $p$-th moment of the sensitivity implies closeness to a junta. More precisely:

Theorem 1.5.

For all $\eta>0$ there is $C=C(\eta)>0$ such that the following holds. For all $\varepsilon>0$, $A>0$ and $1/2+\eta\leqslant p\leqslant 1$, if $f\colon\{0,1\}^n\to\{0,1\}$ is a function such that $\mathbb{E}_x[s_f(x)^p]\leqslant A$, then $f$ is $\varepsilon$-close to a $J$-junta, where $J=2^{C(A/\varepsilon)^{1/p}}$.

Theorem 1.5 is tight, and can be seen as a generalization of Friedgut’s junta theorem (corresponding to the case $p=1$) that is somewhat close in spirit to Bourgain’s junta theorem. Indeed, for all $p\geqslant 1/2$ the tribes function shows it to be tight. Here, the tribes function is the function in which we take a partition $P_1\cup\ldots\cup P_m$ of $[n]$ and define $\mathsf{Tribes}(x)=1$ if there is $i$ such that $x_{P_i}=\vec{1}$, and $\mathsf{Tribes}(x)=0$ otherwise. We take the $P_i$ of equal size so that the probability that $\mathsf{Tribes}(x)=1$ is roughly $1/2$; the correct choice of parameters is $|P_i|=\log n-\log\log n+c$ and $m=\frac{n}{\log n-\log\log n+c}$, where $c=O(1)$. For this function one has $\mathbb{E}_x[s_{\mathsf{Tribes}}(x)^p]=\Theta((\log n)^p)$. Indeed, as $s_{\mathsf{Tribes}}(x)=\Theta(\log n)$ with constant probability we have the lower bound, and the upper bound follows from Jensen’s inequality and the fact that the total influence of $\mathsf{Tribes}$ is $\Theta(\log n)$.
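To get a feel for the tightness claim, one can estimate $\mathbb{E}_x[s_{\mathsf{Tribes}}(x)^p]$ by sampling. The sketch below uses a fixed tribe width rather than the precisely tuned $|P_i|=\log n-\log\log n+c$, so its output is only indicative:

```python
import random

def tribes_sensitivity(x, w):
    """Sensitivity of Tribes (an OR of ANDs over blocks of width w) at input x."""
    blocks = [x[i:i + w] for i in range(0, len(x), w)]
    full = sum(all(b) for b in blocks)     # number of all-ones tribes
    if full >= 2:
        return 0                            # no single flip can change the OR
    if full == 1:
        return w                            # every bit of the unique full tribe is pivotal
    # f(x) = 0: pivotal bits are the single zeros of tribes missing exactly one 1
    return sum(w - sum(b) == 1 for b in blocks)

def sensitivity_moment(n, w, p, samples=100_000, seed=0):
    rng = random.Random(seed)
    return sum(tribes_sensitivity(tuple(rng.randint(0, 1) for _ in range(n)), w) ** p
               for _ in range(samples)) / samples

# 146 tribes of width 7 on n = 1022 bits; E[s^p] should scale like (log n)^p.
print(sensitivity_moment(1022, 7, 0.6))
```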

As a consequence, Theorem 1.5 also implies a version of the KKL theorem for $p$-th moments of the sensitivity:

Corollary 1.6.

For all $\delta>0$ there is $c>0$ such that the following holds for all $p\in[1/2+\delta,1]$. Suppose that a function $f\colon\{0,1\}^n\to\{0,1\}$ satisfies $\mathbb{E}_x[s_f(x)^p]\leqslant A$. Then there exists $i\in[n]$ such that

\[ I_i[f]\geqslant 2^{-c(A/\mathsf{var}(f))^{1/p}}. \]
Proof.

Assume without loss of generality that $\mathbb{E}[f]\leqslant\frac{1}{2}$ (otherwise we work with $1-f$). Take $\varepsilon=\mathbb{E}[f]/10$ and apply Theorem 1.5 to get that $f$ is $\varepsilon$-close to a function $g$ which is a $J$-junta for $J=2^{\Theta((A/\mathsf{var}(f))^{1/p})}$, and let $\mathcal{J}$ be the set of coordinates that $g$ depends on. Choose $x\in\{0,1\}^n$ uniformly, and take $y\in\{0,1\}^n$ where $y_{\overline{\mathcal{J}}}=x_{\overline{\mathcal{J}}}$ and $y_{\mathcal{J}}$ is sampled uniformly from $\{0,1\}^{\mathcal{J}}$. Then $\Pr_{x,y}[g(x)=0,\,g(y)=1]=\mathbb{E}[g](1-\mathbb{E}[g])$; also, $f(x)=g(x)$ and $f(y)=g(y)$ hold with probability at least $1-2\varepsilon$, hence we conclude that

\[ \Pr_{x,y}\left[f(x)=0,\,f(y)=1\right]\geqslant\mathbb{E}[g](1-\mathbb{E}[g])-2\varepsilon\geqslant\frac{1}{2}(\mathbb{E}[f]-\varepsilon)-2\varepsilon\geqslant\frac{1}{4}\mathbb{E}[f]. \]

On the other hand, write $\mathcal{J}=\{i_1,\ldots,i_J\}$ and consider a path $x(0),\ldots,x(J)$ where $x(0)=x$, $x(J)=y$ and $x(t),x(t+1)$ may differ only on coordinate $i_{t+1}$. Then by the union bound,

\[ \frac{1}{4}\mathbb{E}[f]\leqslant\Pr_{x,y}\left[f(x)=0,\,f(y)=1\right]\leqslant\sum_{t=0}^{J-1}\Pr_{x,y}\left[f(x(t))=0,\,f(x(t+1))=1\right]=\sum_{t=0}^{J-1}I_{i_{t+1}}[f]\leqslant J\max_i I_i[f], \]

and the result follows. ∎
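The path used in the union bound above is a standard hybrid argument: interpolate from $x$ to $y$ one junta coordinate at a time. A small sketch of the interpolation (names are ours):

```python
def hybrid_path(x, y, coords):
    """x(0) = x, x(t) agrees with y on the first t coordinates of coords,
    and consecutive points differ in at most one coordinate."""
    path, cur = [tuple(x)], list(x)
    for i in coords:
        cur[i] = y[i]
        path.append(tuple(cur))
    return path

print(hybrid_path((0, 0, 0, 0), (1, 0, 1, 0), [0, 2]))
# [(0, 0, 0, 0), (1, 0, 0, 0), (1, 0, 1, 0)]
```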

We note that the KKL theorem is this statement for $p=1$, and the version for $p<1$ is only stronger than the KKL theorem, as always $\mathbb{E}_x[s_f(x)^p]\leqslant\mathbb{E}_x[s_f(x)]^p$.

Remark 1.7.

We remark that a naive application of our method establishes a sub-optimal dependency between the parameters $A$ and $J$ in Theorem 1.5, but gives an interesting relation to the noise sensitivity theorem of Bourgain [Bou02]. More precisely, in Corollary 3.11 we show that if $f\colon\{0,1\}^n\to\{0,1\}$ satisfies $\mathbb{E}_x[s_f(x)^p]\leqslant A$, then $f$ must be noise stable. More specifically, for $\varepsilon>0$, we define the $(1-\varepsilon)$-correlated distribution over pairs $(x,y)$ by taking $x\in\{0,1\}^n$ uniformly, and for each $i\in[n]$ independently setting $y_i=x_i$ with probability $1-\varepsilon$ and otherwise resampling $y_i\in\{0,1\}$ uniformly. In this language, we show that if $\mathbb{E}_x[s_f(x)^p]\leqslant A$ and $p<1$, then for all $\varepsilon>0$,

\[ \Pr_{x,y\ (1-\varepsilon)\text{-correlated}}\left[f(x)=f(y)\right]\geqslant 1-O_p(A\cdot\varepsilon^{p}). \]

From this, one may establish a weaker variant of Theorem 1.5 by appealing to Bourgain’s result [Bou02] (see also [KKO18]). The version of that theorem from [KKO18, Theorem 3.5] asserts that if the Fourier weight of a function $f$ above level $k$ is at most $\xi/\sqrt{k}$, then the function is $(1+o(1))\xi$-close to a $J=2^{O(k)}/\xi^4$-junta. To use this result, one may observe that the noise stability bound above implies that for $k=1/\varepsilon$, the Fourier weight of $f$ above level $k$ is at most $O_p(A\varepsilon^p)=O_p\left(\frac{A\varepsilon^{p-1/2}}{\sqrt{k}}\right)$, hence by [KKO18, Theorem 3.5], $f$ is $O(A\varepsilon^{p-1/2})$-close to a $J=\frac{2^{O(1/\varepsilon)}}{A\varepsilon^{p-1/2}}$-junta for all $\varepsilon>0$. In particular, setting $\varepsilon=\Theta((\delta/A)^{2/(2p-1)})$, we get that $f$ is $\delta$-close to a $J=2^{O((A/\delta)^{2/(2p-1)})}$-junta.

1.5 Proof Overview

In this section we give an outline of the proof of Theorems 1.2 and 1.3. Both results rely on the same simple observation, which, once combined with off-the-shelf Fourier concentration results, finishes the proof.

Given $f\colon\{0,1\}^n\to\{0,1\}$, we consider its Fourier coefficients $\widehat{f}(S)=\mathbb{E}_{x\in\{0,1\}^n}\left[f(x)\prod_{i\in S}(-1)^{x_i}\right]$ as well as its gradient $\nabla f\colon\{0,1\}^n\to\{-1,0,1\}^n$ defined by $\nabla f(x)=(f(x\oplus e_i)-f(x))_{i=1,\ldots,n}$. We first observe that the square root of the sensitivity of $f$ at $x$ is nothing but $\|\nabla f(x)\|_2$, and also that the average of the absolute value of the $i$-th entry of $\nabla f$, namely of the function $x\mapsto|f(x\oplus e_i)-f(x)|$, is at least the absolute value of the singleton Fourier coefficient $|\widehat{f}(\{i\})|$. This suggests that if $f$ has significant singleton Fourier coefficients, then the magnitude of the gradient should be large, and therefore the expected root of the sensitivity should also be large. Indeed, a simple application of the triangle inequality gives

\[ \mathbb{E}_{x\in\{0,1\}^n}\left[\|\nabla f(x)\|_2\right]\geqslant\left\|\mathbb{E}_{x\in\{0,1\}^n}\left[|\nabla f(x)|\right]\right\|_2\geqslant\left\|\left(|\widehat{f}(\{1\})|,\ldots,|\widehat{f}(\{n\})|\right)\right\|_2, \]

which is equal to $\sqrt{\sum_{i=1}^n\widehat{f}(\{i\})^2}$. As the sum of the squares of all of the Fourier coefficients is at most $1$, we get that $\mathbb{E}_{x\in\{0,1\}^n}\left[\|\nabla f(x)\|_2\right]\geqslant\sum_{i=1}^n\widehat{f}(\{i\})^2$. In words, via a simple application of the triangle inequality we managed to lower bound the quantity we are interested in by the level $1$ Fourier weight of the function $f$. This is already a rather interesting isoperimetric consequence, albeit a rather weak one.
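This level-$1$ bound (Lemma 3.1 below) can be checked numerically; a brute-force sketch, with all helpers our own:

```python
import itertools
from math import sqrt

def level1_check(f, n):
    """Compare E[||grad f(x)||_2] against 2*sqrt(W_1[f]) by exhaustive enumeration."""
    points = list(itertools.product([0, 1], repeat=n))
    def flip(x, i):
        return x[:i] + (1 - x[i],) + x[i + 1:]
    grad = sum(sqrt(sum(f(x) != f(flip(x, i)) for i in range(n)))
               for x in points) / 2**n
    # W_1[f] = sum_i fhat({i})^2, with fhat({i}) = E[f(x) * (-1)^{x_i}]
    w1 = sum((sum(f(x) * (-1) ** x[i] for x in points) / 2**n) ** 2 for i in range(n))
    return grad, 2 * sqrt(w1)

# Majority of 3: E[sqrt(s_f)] ~ 1.06 versus 2*sqrt(W_1) ~ 0.87.
print(level1_check(lambda x: int(sum(x) >= 2), 3))
```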

We next show how to bootstrap this basic result into Theorems 1.2 and 1.3 via random restrictions. To get some intuition, suppose that we knew our function $f$ has significant weight on level $d$ (where $d>1$); can we utilize the above logic to say anything of interest about lower bounds on $\mathbb{E}_{x\in\{0,1\}^n}\left[\|\nabla f(x)\|_2\right]$? The answer is yes, and one way to do so is via random restrictions. If $f$ has considerable weight on level $d$, we choose a subset of coordinates $J\subseteq[n]$ by independently including each coordinate in it with probability $p=1/d$, and then restrict the coordinates inside $\overline{J}$ to a uniformly chosen value $z\in\{0,1\}^{\overline{J}}$. In that case, we expect the weight of $f$ to collapse from level $d$ to roughly level $pd$, which by the choice of the parameter $p$ is roughly the first Fourier level. At that point we could apply the above logic to the restricted function and get some lower bound on the expected root of the sensitivity of the restricted function. But how does this last quantity relate to $\mathbb{E}_{x\in\{0,1\}^n}\left[\|\nabla f(x)\|_2\right]$?

Thinking about it for a moment, for a fixed point $x$, if we choose $J$ randomly as above and fix all coordinates inside $\overline{J}$, then only roughly a $1/d$ fraction of the coordinates sensitive at $x$ remain alive, hence we expect the sensitivity to drop by a factor of $1/d$ as a result of this restriction. Indeed, it can be shown that under random restrictions, the quantity $\mathbb{E}_{x\in\{0,1\}^n}\left[\|\nabla f(x)\|_2\right]$ drops by a factor of $1/\sqrt{d}$. Thus, combining this observation with the above reasoning suggests that we should get a lower bound of

\[ \mathbb{E}_{x\in\{0,1\}^n}\left[\|\nabla f(x)\|_2\right]\geqslant\Omega\left(\sqrt{d}\sum_{d\leqslant|S|\leqslant 2d}\widehat{f}(S)^2\right), \]

and indeed this is the bound we establish.

Theorems 1.2 and 1.3 then follow from known Fourier tail bounds. Specifically, it is known that (1) if $f$ has small variance, then most of its Fourier mass lies above level $d=\Omega(\log(1/\mathsf{var}(f)))$, and (2) if $f$ has small influences, then most of its Fourier mass lies above level $d=\Omega\left(\log\left(1/\sum_{i=1}^n I_i[f]^2\right)\right)$.

2 Preliminaries

Notation.

We will use big $O$ notation throughout the paper. Sometimes it will be easier for us to use the $\lesssim$ and $\gtrsim$ notations; when we write $X\lesssim Y$ we mean that there is an absolute constant $C>0$ such that $X\leqslant CY$. Analogously, when we write $X\gtrsim Y$ we mean that there is an absolute constant $C>0$ such that $X\geqslant CY$.

2.1 Fourier Analysis over the Hypercube

We consider the space of real-valued functions $f\colon\{0,1\}^n\to\mathbb{R}$, equipped with the inner product $\langle f,g\rangle=\mathbb{E}_{x\in_R\{0,1\}^n}\left[f(x)g(x)\right]$. It is well-known that the collection of functions $\{\chi_S\}_{S\subseteq[n]}$ defined by $\chi_S(x)=\prod_{i\in S}(-1)^{x_i}$ forms an orthonormal basis, so any function $f$ may be written uniquely as $f(x)=\sum_{S\subseteq[n]}\widehat{f}(S)\chi_S(x)$, where $\widehat{f}(S)=\langle f,\chi_S\rangle$. We also consider $L_p$ norms for $p\geqslant 1$, defined as $\|f\|_p=\left(\mathbb{E}_x\left[|f(x)|^p\right]\right)^{1/p}$.
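As a sanity check on the notation, the full Fourier expansion can be computed by brute force for small $n$; a minimal sketch (quadratic in $2^n$, for illustration only):

```python
import itertools

def fourier_coefficients(f, n):
    """fhat(S) = <f, chi_S> = E_x[f(x) * prod_{i in S} (-1)^{x_i}], for every S."""
    points = list(itertools.product([0, 1], repeat=n))
    coeffs = {}
    for r in range(n + 1):
        for S in itertools.combinations(range(n), r):
            coeffs[S] = sum(f(x) * (-1) ** sum(x[i] for i in S)
                            for x in points) / 2**n
    return coeffs

coeffs = fourier_coefficients(lambda x: int(sum(x) >= 2), 3)
# Parseval: sum of squared coefficients equals E[f^2] = 1/2 for majority of 3.
print(sum(c**2 for c in coeffs.values()))  # 0.5
```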

For $z\in\{0,1\}^n$, we denote by $z_{-i}\in\{0,1\}^{n-1}$ the vector resulting from dropping the $i$-th coordinate of $z$. We denote by $(x_i=1,x_{-i}=z_{-i})$ the $n$-bit vector which is $1$ in the $i$-th coordinate and equal to $z$ in every other coordinate.

Definition 2.1.

The derivative of $f$ along direction $i\in[n]$ is $\partial_i f\colon\{0,1\}^n\to\mathbb{R}$ defined by $\partial_i f(z)=f(x_i=0,x_{-i}=z_{-i})-f(x_i=1,x_{-i}=z_{-i})$. The gradient of $f$, $\nabla f\colon\{0,1\}^n\to\mathbb{R}^n$, is defined as $\nabla f(z)=(\partial_1 f(z),\ldots,\partial_n f(z))$.

Note that for a Boolean function $f$ and each $x\in\{0,1\}^n$, $\partial_i f(x)\neq 0$ if and only if the coordinate $i$ is sensitive at $x$, in which case its value is either $1$ or $-1$. Thus, $\|\nabla f(x)\|_2=\sqrt{s_f(x)}$, and it will often be convenient for us to use the $\ell_2$-norm of the gradient of $f$ in place of $\sqrt{s_f(x)}$.

Fact 2.2.

Let $f\colon\{0,1\}^n\to\mathbb{R}$, and let $i\in[n]$. Then $\partial_i f(x)=2\sum_{S\ni i}\widehat{f}(S)\chi_{S\setminus\{i\}}(x)$. In particular, we have $\mathbb{E}_x[\partial_i f(x)]=2\widehat{f}(\{i\})$.

Next, we define the level $d$ Fourier weight of a function $f$, the approximate level $d$ weight of $f$, and the Fourier weight of $f$ up to / above level $d$.

Definition 2.3.

The level $d$ Fourier weight of $f$ is defined as $W_{=d}[f]=\sum_{|S|=d}\widehat{f}(S)^2$. The approximate level $d$ weight of $f$ is defined as $W_{\approx d}[f]=\sum_{d\leqslant j<2d}W_{=j}[f]$. The weight of $f$ up to level $d$ is defined as $W_{\leqslant d}[f]=\sum_{|S|\leqslant d}\widehat{f}(S)^2$, and the weight of $f$ above level $d$ is defined as $W_{>d}[f]=\sum_{|S|>d}\widehat{f}(S)^2$.

For a function $f\colon\{0,1\}^n\to\mathbb{R}$, we also define the level at most $d$ part of $f$, denoted $f^{\leqslant d}$, as $f^{\leqslant d}(x)=\sum_{|S|\leqslant d}\widehat{f}(S)\chi_S(x)$. We note that by Parseval’s equality we have $\|f^{\leqslant d}\|_2^2=W_{\leqslant d}[f]$.
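The weights $W_{=d}[f]$ are then immediate to tabulate; a small sketch reusing the `fourier_coefficients` helper from the sketch above (again ours, not the paper’s):

```python
def level_weights(coeffs, n):
    """W_{=d}[f] for d = 0, ..., n, from a dictionary {S: fhat(S)}."""
    W = [0.0] * (n + 1)
    for S, c in coeffs.items():
        W[len(S)] += c**2
    return W

W = level_weights(fourier_coefficients(lambda x: int(sum(x) >= 2), 3), 3)
print(W)  # majority of 3: [0.25, 0.1875, 0.0, 0.0625], so W_{>1}[f] = 0.0625
```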

2.2 Random Restrictions

For a function $f\colon\{0,1\}^n\to\mathbb{R}$, a set of coordinates $J\subseteq[n]$ and $z\in\{0,1\}^{\overline{J}}$, we define the restriction of $f$ according to $(J,z)$ as the function $f_{\overline{J}\rightarrow z}\colon\{0,1\}^J\to\mathbb{R}$ defined by

\[ f_{\overline{J}\rightarrow z}(y)=f(x_{\overline{J}}=z,\,x_J=y). \]

We refer to the coordinates of $J$ as “alive”, and to the coordinates of $\overline{J}$ as fixed. We will often be interested in random restrictions of a function $f$. By that, we mean that the set $J$ is chosen randomly by including each $i\in[n]$ in it independently with some probability (which is specified), and $z\in\{0,1\}^{\overline{J}}$ is chosen uniformly. We have the following fact, which establishes a relation between the approximate level $d$ weight of $f$ and the level $1$ weight of a random restriction of $f$ in which each $i\in[n]$ is included in $J$ with probability $\frac{1}{2d}$ (see [KKO18, Fact 4.1] and [KKK+, Section 3.1]).
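Random restrictions are simple to implement; the sketch below (our own illustration) draws $(J,z)$ with alive-probability $\frac{1}{2d}$ and returns the restricted function:

```python
import random

def random_restriction(f, n, d, rng=random.Random(0)):
    """Draw (J, z): each coordinate joins J with probability 1/(2d); the
    remaining coordinates are fixed to uniformly random bits."""
    J = [i for i in range(n) if rng.random() < 1 / (2 * d)]
    z = {i: rng.randint(0, 1) for i in range(n) if i not in J}
    def restricted(y):
        # y is a tuple indexed by the alive coordinates, in the order of J
        assignment = dict(zip(J, y)) | z
        return f(tuple(assignment[i] for i in range(n)))
    return J, restricted

J, g = random_restriction(lambda x: int(sum(x) % 2), 10, 2)
print(len(J), g(tuple(0 for _ in J)))  # restricted parity = parity of the fixed bits z
```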

Fact 2.4.

Let $f\colon\{0,1\}^n\to\mathbb{R}$ and $d\in\mathbb{N}$, and let $(J,z)$ be a random restriction where each $j\in[n]$ is included in $J$ with probability $\frac{1}{2d}$, and $z\in\{0,1\}^{\bar{J}}$ is chosen uniformly. Then $\mathbb{E}_{J,z}\left[W_{=1}[f_{\bar{J}\rightarrow z}]\right]\gtrsim W_{\approx d}[f]$.

Proof.

For a fixed $J$ and $T\subseteq J$, we have $\widehat{f_{\bar{J}\rightarrow z}}(T)=\sum_{S\subseteq\bar{J}}\widehat{f}(T\cup S)\chi_S(z)$. Thus,

\[ \mathbb{E}_{J,z}\left[W_{=1}[f_{\bar{J}\rightarrow z}]\right]=\mathbb{E}_J\left[\mathbb{E}_z\left[\sum_{j\in J}\Big(\sum_{S\subseteq\bar{J}}\widehat{f}(S\cup\{j\})\chi_S(z)\Big)^2\right]\right]=\mathbb{E}_J\left[\sum_{j\in J}\sum_{S\subseteq\bar{J}}\widehat{f}(S\cup\{j\})^2\right], \]

where the last transition follows by pushing the expectation over $z$ inside and using Parseval’s equality. It follows that

\[ \mathbb{E}_{J,z}\left[W_{=1}[f_{\bar{J}\rightarrow z}]\right]=\mathbb{E}_J\left[\sum_T\widehat{f}(T)^2\,1_{|T\cap J|=1}\right]=\sum_T\widehat{f}(T)^2\,\Pr_J\left[|T\cap J|=1\right], \]

and as $\Pr_J\left[|T\cap J|=1\right]\gtrsim 1$ for all $T$ whose size is between $d$ and $2d$, and the rest of the summands are all non-negative, the proof is concluded. ∎

2.3 Bonami’s Lemma

The degree of a function $f\colon\{0,1\}^n\to\mathbb{R}$ is defined as $\mathsf{deg}(f)=\max\{|S|\mid\widehat{f}(S)\neq 0\}$. We will need the following consequence of the Bonami-Beckner-Gross hypercontractive inequality [Bec75, Bon70, Gro75] (see also [O’D14]), showing that the $4$-norm of a low-degree function $f$ is comparable to its $2$-norm.

Theorem 2.5.

Let $f\colon\{0,1\}^n\to\mathbb{R}$ be a function of degree at most $d$. Then $\|f\|_4\leqslant\sqrt{3}^{\,d}\|f\|_2$.

3 Isoperimetric Inequalities on the Hypercube

In this section, we prove Theorems 1.2 and 1.3. We begin with the following very crude bound on the Talagrand boundary, and then improve it using random restrictions.

Lemma 3.1.

Let $f\colon\{0,1\}^n\to\mathbb{R}$. Then $\mathbb{E}\left[\|\nabla f(x)\|_2\right]\geqslant 2\sqrt{W_{=1}[f]}$.

Proof.

By the triangle inequality and Fact 2.2 we have

\[ \mathbb{E}_x\left[\|\nabla f(x)\|_2\right]\geqslant\left\|\mathbb{E}_x\left[\nabla f(x)\right]\right\|_2=\left\|2\big(\widehat{f}(\{1\}),\ldots,\widehat{f}(\{n\})\big)\right\|_2=2\sqrt{\sum_{i=1}^n\widehat{f}(\{i\})^2}=2\sqrt{W_{=1}[f]}. \] ∎
Lemma 3.2.

Let $d\in\mathbb{N}$ and $f\colon\{0,1\}^n\to\{0,1\}$. Then $\mathbb{E}\left[\|\nabla f(x)\|_2\right]\gtrsim\sqrt{d}\,W_{\approx d}[f]$.

Proof.

Let $(J,z)$ be a random restriction for $f$, where each $j\in[n]$ is included in $J$ independently with probability $\frac{1}{2d}$, and $z\in\{0,1\}^{\bar{J}}$ is sampled uniformly. Fix $x\in\{0,1\}^n$, and note that

\[ \mathbb{E}_J\left[\|\nabla f_{\bar{J}\rightarrow x_{\bar{J}}}(x_J)\|_2\right]\leqslant\sqrt{\mathbb{E}_J\left[\|\nabla f_{\bar{J}\rightarrow x_{\bar{J}}}(x_J)\|_2^2\right]}=\sqrt{\mathbb{E}_J\left[\sum_j|\partial_j f(x)|^2\,1_{j\in J}\right]}=\frac{\|\nabla f(x)\|_2}{\sqrt{2d}}. \]

Therefore,

\[ \frac{1}{\sqrt{2d}}\,\mathbb{E}\left[\|\nabla f(x)\|_2\right]\geqslant\mathbb{E}_{J,z}\left[\mathbb{E}_{x\in\{0,1\}^J}\left[\|\nabla f_{\bar{J}\rightarrow z}(x)\|_2\right]\right]. \tag{2} \]

On the other hand, for each $J$ and $z\in\{0,1\}^{\bar{J}}$ we may apply Lemma 3.1 to $f_{\bar{J}\rightarrow z}$ to get that

\[ \mathbb{E}_{x\in\{0,1\}^J}\left[\|\nabla f_{\bar{J}\rightarrow z}(x)\|_2\right]\gtrsim\sqrt{W_{=1}[f_{\bar{J}\rightarrow z}]}\geqslant W_{=1}[f_{\bar{J}\rightarrow z}]. \]

Taking expectation over $J$ and $z$ we get that

\[ \mathbb{E}_{J,z}\left[\mathbb{E}_{x\in\{0,1\}^J}\left[\|\nabla f_{\bar{J}\rightarrow z}(x)\|_2\right]\right]\gtrsim\mathbb{E}_{J,z}\left[W_{=1}[f_{\bar{J}\rightarrow z}]\right]\gtrsim W_{\approx d}[f], \]

where the last inequality is by Fact 2.4. Combining this with equation (2) finishes the proof. ∎

Lemma 3.3.

Let $d\in\mathbb{N}$ and $f\colon\{0,1\}^n\to\{0,1\}$. Then $\mathbb{E}\left[\|\nabla f(x)\|_2\right]\gtrsim\sqrt{d}\,W_{\geqslant d}[f]$.

Proof.

Applying Lemma 3.2 with $d2^j$ in place of $d$ for $j=0,1,2,\ldots$, we get that

\[ 2^{-j/2}\,\mathbb{E}\left[\|\nabla f(x)\|_2\right]\gtrsim\sqrt{d}\,W_{\approx 2^j d}[f], \]

and summing over $j$ yields the result. ∎

3.1 Proof of Theorem 1.2

If $\mathsf{var}(f)\geqslant 2^{-16}$, then it is enough to prove a lower bound of $\gtrsim\mathsf{var}(f)$. Applying Lemma 3.2 with $d$ ranging over powers of $2$ between $1$ and $n$, we get that

\[ \sum_{j=0}^{\log n}\frac{1}{\sqrt{2^j}}\,\mathbb{E}\left[\|\nabla f(x)\|_2\right]\gtrsim\sum_{j=0}^{\log n}W_{\approx 2^j}[f]=\mathsf{var}(f), \]

and by summing the geometric series, the left hand side is $\lesssim\mathbb{E}\left[\|\nabla f(x)\|_2\right]$.

We thus assume that $\mathsf{var}(f)\leqslant 2^{-16}$, and without loss of generality the majority value of $f$ is $0$. Thus, $\mathbb{E}[f]\leqslant 2\,\mathsf{var}(f)$. Setting $d=\frac{1}{8}\log(1/\mathsf{var}(f))$, we have that

\[ \sum_{0<|S|\leqslant d}\widehat{f}(S)^2\leqslant\|f^{\leqslant d}\|_2^2=\langle f,f^{\leqslant d}\rangle\leqslant\|f\|_{4/3}\|f^{\leqslant d}\|_4\leqslant\sqrt{3}^{\,d}\,\mathbb{E}[f]^{3/4}\|f\|_2\leqslant 2^d\,\mathbb{E}[f]^{5/4}\leqslant 0.9\,\mathsf{var}(f), \]

where we used the bound $\|f^{\leqslant d}\|_4\leqslant\sqrt{3}^{\,d}\|f^{\leqslant d}\|_2\leqslant\sqrt{3}^{\,d}\|f\|_2$, which follows from Theorem 2.5 and Parseval’s equality. Therefore, $W_{\geqslant d}[f]\geqslant 0.1\,\mathsf{var}(f)$, and we conclude the result by Lemma 3.3. ∎

3.2 Proof of Theorem 1.3

Denote $M[f]=\sum_{i=1}^n I_i[f]^2$; we need the following result due to [Tal96, BKS99, KK13] (the precise version we use is due to Keller and Kindler [KK13]).

Theorem 3.4.

There are $c_1,c_2>0$ such that for all $f\colon\{0,1\}^n\to\{0,1\}$ we have

\[ W_{\leqslant c_1\log(1/M[f])}[f]\leqslant M[f]^{c_2}. \]

We may now give the proof of Theorem 1.3.

Proof of Theorem 1.3.

If $M[f]\geqslant\mathsf{var}(f)^{2/c_2}$, then the result follows from Theorem 1.2. Otherwise, fix $c_1,c_2$ from Theorem 3.4 and set $d=c_1\log(1/M[f])$. By Theorem 3.4 we have $W_{\leqslant d}[f]\leqslant M[f]^{c_2}\leqslant\mathsf{var}(f)^2\leqslant\frac{1}{2}\mathsf{var}(f)$, and so $W_{>d}[f]\geqslant\frac{1}{2}\mathsf{var}(f)$, and the result follows from Lemma 3.3. ∎

We remark that Theorem 1.3 implies a stability result for the KKL theorem. Namely, if all influences of a Boolean function are at most $O(\log n/n)$, then the set of points at which the gradient of $f$ is large must have measure $\Omega(\mathsf{var}(f))$. More precisely:

Corollary 3.5.

For any $K>0$ there are $c,\delta>0$ such that the following holds. Suppose that for $f\colon\{0,1\}^n\to\{0,1\}$ we have $\max_i I_i[f]\leqslant K\,\mathsf{var}(f)\frac{\log n}{n}$. Then

\[ \Pr_x\left[\|\nabla f(x)\|_2\geqslant c\,\mathsf{var}(f)\sqrt{\log n}\right]\geqslant\delta\,\mathsf{var}(f). \]
Proof.

Note that by assumption, $M[f]\leqslant K^2\frac{\log^2 n}{n}\leqslant\frac{K^2}{\sqrt{n}}$, so Theorem 1.3 implies that $Z\stackrel{def}{=}\mathbb{E}_x\left[\|\nabla f(x)\|_2\right]\geqslant K'\,\mathsf{var}(f)\sqrt{\log n}$ for some $K'$ depending only on $K$. Also, we have that

\[ \mathbb{E}_x\left[\|\nabla f(x)\|_2^2\right]=I_1[f]+\ldots+I_n[f]\leqslant K\,\mathsf{var}(f)\log n. \]

It follows from the Paley-Zygmund inequality (which asserts that $\Pr[Z'\geqslant\theta\,\mathbb{E}[Z']]\geqslant(1-\theta)^2\frac{\mathbb{E}[Z']^2}{\mathbb{E}[Z'^2]}$ for a non-negative random variable $Z'$, applied here with $\theta=1/2$) that

\[ \Pr\left[\|\nabla f(x)\|_2\geqslant\tfrac{1}{2}Z\right]\geqslant\frac{1}{4}\frac{Z^2}{\mathbb{E}_x\left[\|\nabla f(x)\|_2^2\right]}\geqslant\frac{1}{4}\frac{K'^2}{K}\,\mathsf{var}(f), \]

and the claim is proved with $c=\frac{1}{2}K'$ and $\delta=\frac{1}{4}\frac{K'^2}{K}$. ∎

3.3 Extensions

3.3.1 Other $L_p$-norms

Lemma 3.1 applies not only to the $L_2$-norm, but in fact to any $L_p$-norm:

Lemma 3.6.

Let $1\leqslant p\leqslant 2$ and $f\colon\{0,1\}^n\to\{0,1\}$. Then $\mathbb{E}\left[\|\nabla f(x)\|_p\right]\geqslant W_{=1}[f]^{1/p}$.

Proof.

By Jensen’s inequality and Lemma 3.1 we have

\[ \mathbb{E}_x\left[\|\nabla f(x)\|_p\right]=\mathbb{E}_x\left[\|\nabla f(x)\|_2^{2/p}\right]\geqslant\mathbb{E}_x\left[\|\nabla f(x)\|_2\right]^{2/p}\geqslant W_{=1}[f]^{1/p}. \] ∎

As in Lemma 3.2, the previous lemma quickly leads to the following conclusion.

Lemma 3.7.

Let $d\in\mathbb{N}$, $\alpha\in[1/2,1]$ and $f\colon\{0,1\}^n\to\{0,1\}$. Then $\mathbb{E}\left[s_f(x)^{\alpha}\right]\gtrsim d^{\alpha}\,W_{\geqslant d}[f]$.

Using the same argument as before and replacing invocations of Lemma 3.3 with Lemma 3.7, one may deduce variants of Theorems 1.2 and 1.3 in which the square root is replaced by any power $p\in[1/2,1]$. Below are the precise statements.

Theorem 3.8.

There exists an absolute constant $c>0$ such that for all non-constant functions $f\colon\{0,1\}^n\to\{0,1\}$ and all $p\in[1/2,1]$ we have

\[ \mathbb{E}_x\left[s_f(x)^p\right]\geqslant c\cdot\mathsf{var}(f)\left(\log\left(\frac{1}{\mathsf{var}(f)}\right)\right)^{p}. \]
Proof.

The proof is identical to the proof of Theorem 1.2 where we use Lemma 3.7 instead of Lemma 3.3. ∎

Theorem 3.9.

There exists an absolute constant $c>0$ such that for all non-constant functions $f\colon\{0,1\}^n\to\{0,1\}$ and all $p\in[1/2,1]$ we have

\[ \mathbb{E}_x\left[s_f(x)^p\right]\geqslant c\cdot\mathsf{var}(f)\left(\log\left(1+\frac{1}{\sum_{i=1}^n I_i[f]^2}\right)\right)^{p}. \]
Proof.

The proof is identical to the proof of Theorem 1.3 where we use Lemma 3.7 instead of Lemma 3.3. ∎

3.3.2 Talagrand Boundary and Noise Sensitivity

We next demonstrate the connection between having small Talagrand-type boundary and being noise stable, a notion that we define next.

For a parameter $\rho\in[0,1]$ and a point $x\in\{0,1\}^n$, the distribution of inputs $\rho$-correlated with $x$, denoted $y\sim_{\rho}x$, is defined by the following randomized process: for each $i\in[n]$ independently, set $y_i=x_i$ with probability $\rho$, and otherwise sample $y_i\in\{0,1\}$ uniformly. The operator $T_{\rho}\colon L_2(\{0,1\}^n)\to L_2(\{0,1\}^n)$, known as the noise operator with noise rate $1-\rho$, is defined as

\[ T_{\rho}f(x)=\mathbb{E}_{y\sim_{\rho}x}\left[f(y)\right]. \]
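The noise operator acts diagonally on the Fourier basis, $T_\rho\chi_S=\rho^{|S|}\chi_S$, and is also easy to realize by sampling; a small Monte-Carlo sketch (ours) estimating $\langle f,T_\rho f\rangle$ for a $0/1$-valued $f$:

```python
import random

def noisy_copy(x, rho, rng):
    """Sample y ~_rho x: keep each bit with probability rho, else resample."""
    return tuple(xi if rng.random() < rho else rng.randint(0, 1) for xi in x)

def stability(f, n, rho, samples=200_000, seed=0):
    """Monte-Carlo estimate of <f, T_rho f> = E[f(x) f(y)] for 0/1-valued f."""
    rng = random.Random(seed)
    total = 0
    for _ in range(samples):
        x = tuple(rng.randint(0, 1) for _ in range(n))
        total += f(x) * f(noisy_copy(x, rho, rng))
    return total / samples

# Dictator f(x) = x_0: <f, T_rho f> = (1 + rho)/4, so rho = 0.9 gives ~0.475.
print(stability(lambda x: x[0], 5, 0.9))
```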
Definition 3.10.

For $\varepsilon>0$, the noise stability of $f$ with noise rate $\varepsilon$ is defined as $\mathsf{Stab}_{1-\varepsilon}(f)=\langle f,T_{1-\varepsilon}f\rangle$.

The relation established in Lemma 3.7 between the Fourier weight of $f$ and $\mathbb{E}\left[s_f(x)^{\alpha}\right]$ for $1/2\leqslant\alpha\leqslant 1$ implies a relation between the latter quantity and noise stability, as follows:

Corollary 3.11.

There exists an absolute constant $C>0$ such that for any $\delta\in[0,1/2]$ and any function $f\colon\{0,1\}^n\to\{0,1\}$ satisfying $\mathbb{E}\left[s_f(x)^{\frac{1}{2}+\delta}\right]\leqslant A$, we have $\mathsf{Stab}_{1-\varepsilon}(f)\geqslant 1-C\cdot A\cdot\varepsilon^{\frac{1}{2}+\delta}\log(1/\varepsilon)$.

Proof.
\[ 1-\mathsf{Stab}_{1-\varepsilon}(f)=\Pr_{x,y\ (1-\varepsilon)\text{-correlated}}\left[f(x)\neq f(y)\right]=2\langle 1-f,T_{1-\varepsilon}f\rangle=2\sum_{k=0}^n\big(1-(1-\varepsilon)^k\big)W_{=k}[f]. \]

Set $d=1/\varepsilon$; we split the sum into $k<d$ and $k\geqslant d$, and upper bound the contribution of each part separately. For $k\geqslant d$, using Lemma 3.7 we have

\[ 2\sum_{k=d}^n\big(1-(1-\varepsilon)^k\big)W_{=k}[f]\leqslant 2W_{\geqslant d}[f]\lesssim A\cdot d^{-(1/2+\delta)}=A\,\varepsilon^{1/2+\delta}. \]

For $k\leqslant d$, we write

\[ \sum_{k=0}^{d}\big(1-(1-\varepsilon)^k\big)W_{=k}[f]\leqslant\sum_{k=1}^{d}k\varepsilon\,W_{=k}[f]\leqslant\varepsilon\sum_{k=1}^{d}W_{\geqslant k}[f]. \]

Using Lemma 3.7 we get $W_{\geqslant k}[f]\lesssim A\cdot k^{-(1/2+\delta)}$, and so

\[ \sum_{k=0}^{d}\big(1-(1-\varepsilon)^k\big)W_{=k}[f]\lesssim A\cdot\varepsilon\sum_{k=1}^{d}\frac{1}{k^{1/2+\delta}}\lesssim A\cdot\varepsilon\,d^{1/2-\delta}\sum_{k=1}^{d}\frac{1}{k}\lesssim A\cdot\varepsilon\,d^{1/2-\delta}\log d. \] ∎
Remark 3.12.

Inspecting the proof of Corollary 3.11, we see that if $\delta$ is bounded away from $1/2$, say $\delta\leqslant 0.49$, then the $\log(1/\varepsilon)$ term may be removed. Indeed, at the end of the proof we may replace the upper bound $\sum_{k=1}^d\frac{1}{k^{1/2+\delta}}\lesssim d^{1/2-\delta}\log d$ by the bound $\sum_{k=1}^d\frac{1}{k^{1/2+\delta}}\lesssim d^{1/2-\delta}$, which holds whenever $\delta$ is bounded away from $1/2$.

3.4 Robust Versions

Fix a $2$-coloring $\mathsf{col}\colon E(\{0,1\}^n)\to\{\mathsf{red},\mathsf{blue}\}$. Let $\nabla_{\mathsf{red}}f(x)$ be the Boolean vector whose $i$-th coordinate is $1$ if $f(x)=1$ and $(x,x\oplus e_i)$ is a red sensitive edge, and let $\nabla_{\mathsf{blue}}f(x)$ be the Boolean vector whose $i$-th coordinate is $1$ if $f(x)=0$ and $(x,x\oplus e_i)$ is a blue sensitive edge. Analogously to Lemma 3.1 we have:

Lemma 3.13.

Let $f\colon\{0,1\}^n\to\{0,1\}$, and let $\mathsf{col}\colon E(\{0,1\}^n)\to\{\mathsf{red},\mathsf{blue}\}$ be any coloring. Then

\[ \mathbb{E}\left[\|\nabla_{\mathsf{red}}f(x)\|_2+\|\nabla_{\mathsf{blue}}f(x)\|_2\right]\geqslant\frac{1}{2}\sqrt{W_{=1}[f]}. \]
Proof.

By the triangle inequality, $\mathbb{E}_x\left[\|\nabla_{\mathsf{red}}f(x)\|_2\right]\geqslant\left\|\mathbb{E}_x\left[\nabla_{\mathsf{red}}f(x)\right]\right\|_2$, and looking at the vector $v=\mathbb{E}_x\left[\nabla_{\mathsf{red}}f(x)\right]$ we see that its $i$-th entry is $v_i=\Pr_x\left[f(x)=1,\ (x,x\oplus e_i)\text{ is a red influential edge}\right]$. Similarly, $\mathbb{E}_x\left[\|\nabla_{\mathsf{blue}}f(x)\|_2\right]\geqslant\left\|\mathbb{E}_x\left[\nabla_{\mathsf{blue}}f(x)\right]\right\|_2$, and looking at the vector $u=\mathbb{E}_x\left[\nabla_{\mathsf{blue}}f(x)\right]$ we see that its $i$-th entry is $u_i=\Pr_x\left[f(x)=0,\ (x,x\oplus e_i)\text{ is a blue influential edge}\right]$. It follows that

\[ \mathbb{E}\left[\|\nabla_{\mathsf{red}}f(x)\|_2+\|\nabla_{\mathsf{blue}}f(x)\|_2\right]\geqslant\|v\|_2+\|u\|_2\geqslant\|u+v\|_2. \]

Let $R_i(x)$ be the indicator that $(x,x\oplus e_i)$ is a red influential edge and $f(x)=1$, and $B_i(x)$ the indicator that $(x,x\oplus e_i)$ is a blue influential edge and $f(x)=0$. Then

\[ u_i+v_i=\frac{1}{2^{n+1}}\sum_x\big(R_i(x)+R_i(x\oplus e_i)+B_i(x)+B_i(x\oplus e_i)\big)\geqslant\frac{1}{2^{n+1}}\sum_x 1_{(x,x\oplus e_i)\text{ influential}}=\frac{1}{2}I_i[f]. \]

Hence

\[ \|u+v\|_2\geqslant\frac{1}{2}\sqrt{\sum_{i=1}^n I_i[f]^2}\geqslant\frac{1}{2}\sqrt{W_{=1}[f]}. \] ∎

Secondly, for any $f$, coloring $\mathsf{col}$ and $x\in\{0,1\}^n$, choosing $J\subseteq[n]$ randomly by including each coordinate independently with probability $\frac{1}{2d}$, one has

\[ \mathbb{E}_J\left[\sqrt{s_{f_{\bar{J}\rightarrow x_{\bar{J}}},\mathsf{red}}(x_J)}\right]\leqslant\sqrt{\mathbb{E}_J\left[s_{f_{\bar{J}\rightarrow x_{\bar{J}}},\mathsf{red}}(x_J)\right]}=\sqrt{\frac{s_{f,\mathsf{red}}(x)}{2d}}, \]

and similarly for the blue sensitivity. This implies the analog of (2) for red and blue sensitivities, and repeating the argument of Lemmas 3.2, 3.3 we get that $\mathsf{Talagrand}_{\mathsf{col}}(f)\gtrsim\sqrt{d}\,W_{\geqslant d}[f]$ for any coloring $\mathsf{col}$. The robust versions of Theorems 1.2 and 1.3 now follow, as we have shown that $W_{\geqslant d}[f]\geqslant\Omega(\mathsf{var}(f))$, where we can take either $d=\Omega(\log(1/\mathsf{var}(f)))$ or $d=\Omega(\log(1/M[f]))$.

4 Improved Junta Theorems

4.1 Proof of Theorem 1.5

In this section, we prove Theorem 1.5. As explained in the introduction, if one does not care about getting a tight dependency between the junta size $J$ and the parameter $A$, then one may use Corollary 3.11 and Bourgain’s noise sensitivity theorem [Bou02, KKO18] and get a junta theorem with slightly worse parameters than those stated in Theorem 1.5. The main goal of this section is to obtain the optimal dependency of $J$ on $A$.

For the proof, we will need the notion of noisy influences of a function $f$; we first generalize the notion of influence to real-valued functions.

Definition 4.1.

Let $f\colon\{0,1\}^n\to\mathbb{R}$ be a function, and let $i\in[n]$ be a coordinate. We define the influence of $i$ on $f$ as $I_i[f]=\|\partial_i f\|_2^2$, where we recall that $\partial_i f$ is the discrete derivative of $f$ along direction $i$.

Definition 4.2.

The $\rho$-noisy influence of a coordinate $i$ on $f\colon\{0,1\}^n\to\mathbb{R}$ is defined as $I_i^{(\rho)}[f]=I_i[T_{\rho}f]$.

Fact 4.3.

[O’D14] For all $f\colon\{0,1\}^n\to\{0,1\}$ we have $\sum_{i=1}^n I_i^{(\rho)}[f]\lesssim\frac{1}{1-\rho}$. ∎

Let $C_1,C_2$ be two absolute constants, where we think of $C_2$ as much larger than $C_1$. Given $f\colon\{0,1\}^n\to\{0,1\}$ with $\mathbb{E}_x[s_f(x)^p]\leqslant A$, we denote $d=C_1\left(\frac{A}{\varepsilon}\right)^{1/p}$ and $\delta=2^{-\frac{C_2}{2p-1}d}$. Define the candidate set of junta coordinates as

\[ \mathcal{J}=\left\{i\in[n]\ \middle|\ I_i\big[T_{1-\frac{1}{3d}}f\big]\geqslant\delta\right\}, \]

i.e. $\mathcal{J}$ is the set of coordinates whose noisy influence on $f$ is somewhat large. Our task is to show that $f$ is $\varepsilon$-close to a $\mathcal{J}$-junta, i.e. that

\[ \sum_{S:\,S\not\subseteq\mathcal{J}}\widehat{f}(S)^2\leqslant\varepsilon, \tag{3} \]

and the rest of the proof is devoted to this goal. We remark that the junta $\mathcal{J}$ is similar in spirit to the junta in Bourgain’s original proof [Bou02], which is easier to see in [KKO18, Section 4] and in particular in [KKO18, Theorem 4.20]. Indeed, both here and in [KKO18, Section 4], the junta is taken to be the set of coordinates with significant noisy influence. There are many other similarities between the argument presented therein and the argument here. First, in both cases one considers a dyadic sort of partitioning of the Fourier coefficients: in [KKO18, Section 4] it is achieved by considering the difference in the noise sensitivity of $f$ at different noise rates (see [KKO18, Theorem 4.16]). In the current proof, we perform explicit dyadic partitioning and also accompany it by applications of suitable noise operators (the operators $S_k$, $S_k'$ and $S_k''$ below). Second, both proofs also use some relation between the noise sensitivity / moments of the sensitivity and the level $1$ weight of the function $f$ (and its restrictions). While the technical execution of the proofs is different, it is hard to pinpoint the high level difference between them. For example, the “harsh” degree truncations in [KKO18] have to be replaced by the “softer” type of degree truncations achieved by the operators $S_k$, $S_k'$ and $S_k''$ (see also the parameter $\tilde{W}_{k,d}$) in order to make the argument go through. Additionally, towards the end of the argument we are able to bound the sum of noisy influences in a non-trivial way, which ultimately allows us to choose $\delta=2^{-O_p(d)}$ as opposed to $\delta=2^{-O_p(d\log d)}$ (which would have resulted in slightly worse bounds, as in [KKO18, Theorem 4.20]); see Claim 4.5.

Towards proving (3), we partition the contribution to the left hand side according to the different scales of $|S\cap\bar{\mathcal{J}}|$, and for each $k=1,\ldots,\log d$ define

\[ W_{k,d}[f]=\sum_{\substack{S:\,|S|\leqslant d,\\ 2^{k-1}\leqslant|S\cap\bar{\mathcal{J}}|\leqslant 2^k}}\widehat{f}(S)^2. \]

Thus, we may write

\[ \sum_{S:\,S\not\subseteq\mathcal{J}}\widehat{f}(S)^2\leqslant W_{>d}[f]+\sum_{k=1}^{\log d}W_{k,d}[f]. \tag{4} \]

The rest of the proof is devoted to bounding $W_{>d}[f]$ and each $W_{k,d}[f]$. From Lemma 3.7 it follows that

\[ W_{>d}[f]\leqslant O\left(\frac{A}{d^{p}}\right)\leqslant\frac{\varepsilon}{16} \tag{5} \]

for sufficiently large $C_1$. Next, we bound $W_{k,d}[f]$. Fix $k$, and consider the operator $S_k=T_{1-1/d}^{\mathcal{J}}T_{1-1/2^k}^{\bar{\mathcal{J}}}$, i.e. the noise operator that applies $1/2^k$ noise to the coordinates outside $\mathcal{J}$ and $1/d$ noise to the coordinates of $\mathcal{J}$. Note that

\[ W_{k,d}[f]\lesssim W_{k,d}[S_k f], \]

so it suffices to bound $W_{k,d}[S_k f]$. We partition $W_{k,d}[S_k f]$ according to the contribution of each $j\in\bar{\mathcal{J}}$, getting:

\[ W_{k,d}[S_k f]=\sum_{\substack{S:\,|S|\leqslant d,\\ 2^{k-1}\leqslant|S\cap\bar{\mathcal{J}}|\leqslant 2^k}}\widehat{S_k f}(S)^2\leqslant\frac{1}{2^{k-1}}\sum_{j\notin\mathcal{J}}\ \sum_{\substack{S\ni j:\,|S|\leqslant d,\\ 2^{k-1}\leqslant|S\cap\bar{\mathcal{J}}|\leqslant 2^k}}\widehat{S_k f}(S)^2\lesssim\frac{1}{2^{k-1}}\sum_{j\notin\mathcal{J}}\|S_k\partial_j f\|_2^2\stackrel{def}{=}\tilde{W}_{k,d}[f]. \]

Thus, we have $W_{k,d}[f]\lesssim\tilde{W}_{k,d}[f]$, and the rest of the proof is devoted to upper bounding the right hand side. Define the operators $S_k'$ and $S_k''$ as

\[ S_k'=T_{1-\frac{1}{3d}}^{\mathcal{J}}T_{1-\frac{1}{2^k}+\frac{2}{3d}}^{\bar{\mathcal{J}}},\qquad S_k''=T_{1-\frac{2}{3d}}^{\mathcal{J}}T_{1-\frac{1}{2^k}+\frac{1}{3d}}^{\bar{\mathcal{J}}}. \]

We establish the following claim using random restrictions and hypercontractivity:

Claim 4.4.

For all $\tau\in(0,1)$, we have that

\[ 2^k\tilde{W}_{k,d}[f]\leqslant C\cdot d^{1-p}\tau^{2p-1}\,\mathbb{E}_x\left[s_f(x)^p\right]+C\cdot\tau^{-\frac{1}{100d}}\sum_{j\in\bar{\mathcal{J}}}\|S_k'\partial_j f\|_2^{2+\frac{1}{100d}}, \]

where CC is an absolute constant.

Proof.

Deferred to Section 4.2. ∎

Given Claim 4.4, one may pull \|S_{k}^{\prime}\partial_{j}f\|_{2}^{\frac{1}{100d}}\leqslant\delta^{\frac{1}{200d}} out of the sum, and bound the sum of the remaining noisy influences by O(d), to get a bound of the form 2^{k}\tilde{W}_{k,d}[f]\lesssim d^{1-p}\tau^{2p-1}{\mathop{\mathbb{E}}_{x}\left[{s_{f}(x)^{p}}\right]}+d\tau^{-\frac{1}{100d}}\delta^{\frac{1}{200d}}. This idea is good enough to establish a junta theorem, yet it gives worse parameters: one has to take \delta=2^{-O(d\log d)}, as opposed to \delta=2^{-O(d)} as we have taken. We remark that if one only cares about getting a larger junta, of size 2^{O(d\log d)}, the proof simplifies considerably. Instead, to get the tight result with respect to the junta size, we need a more careful bound on the sum of noisy influences outside \mathcal{J}. Specifically, we use an improved bound on the sum of noisy influences that relates it to W_{\ell,d}[f]. To state this bound we define

\varepsilon_{\ell,d}=\sum\limits_{\begin{subarray}{c}\left|{S}\right|>d\\ 2^{\ell-1}\leqslant\left|{S\cap\bar{\mathcal{J}}}\right|\leqslant 2^{\ell}\end{subarray}}\widehat{f}(S)^{2}.
Claim 4.5.

\frac{1}{2^{k}}\sum\limits_{j\in\bar{\mathcal{J}}}\|S_{k}^{\prime}\partial_{j}f\|_{2}^{2}\lesssim\sum\limits_{\ell=1}^{\log n}2^{-\left|{\ell-k}\right|}(W_{\ell,d}[f]+\varepsilon_{\ell,d}).

Proof.

By definition,

\frac{1}{2^{k}}\sum\limits_{j\in\bar{\mathcal{J}}}\|S_{k}^{\prime}\partial_{j}f\|_{2}^{2}\leqslant 4\sum\limits_{S}\frac{\left|{S\cap\bar{\mathcal{J}}}\right|}{2^{k}}\left(1-\frac{1}{2^{k}}+\frac{2}{3d}\right)^{\left|{S\cap\bar{\mathcal{J}}}\right|-1}\left(1-\frac{1}{3d}\right)^{\left|{S\cap\mathcal{J}}\right|}\widehat{f}(S)^{2}

(the noise rates appear here without the squares, which only increases the right hand side). Partitioning the last sum dyadically according to \left|{S\cap\bar{\mathcal{J}}}\right|, so that 2^{\ell-1}\leqslant\left|{S\cap\bar{\mathcal{J}}}\right|<2^{\ell}, we get that the last expression is at most

\lesssim\sum\limits_{\ell=1}^{\log n}2^{\ell-k}\left(1-\frac{1}{2^{k}}+\frac{2}{3d}\right)^{2^{\ell-1}}(W_{\ell,d}[f]+\varepsilon_{\ell,d})\lesssim\sum\limits_{\ell=1}^{\log n}2^{-\left|{\ell-k}\right|}(W_{\ell,d}[f]+\varepsilon_{\ell,d}).

Indeed, for \ell\leqslant k we have 2^{\ell-k}=2^{-\left|{\ell-k}\right|}, while for \ell>k the factor \left(1-\frac{1}{2^{k}}+\frac{2}{3d}\right)^{2^{\ell-1}}\leqslant e^{-2^{\ell-k-1}/3} (using 2^{k}\leqslant d) decays fast enough to beat 2^{\ell-k}. ∎
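A quick numerical sanity check of this last step (the value of d and the loop bounds below are arbitrary toy choices of ours, for illustration only):

```python
import math

# Check numerically that 2^{l-k} * (1 - 1/2^k + 2/(3d))^{2^{l-1}} is at most
# an absolute constant times 2^{-|l-k|}, over 1 <= k <= log d and l >= 1.
d = 64
ratios = []
for k in range(1, int(math.log2(d)) + 1):
    for l in range(1, 25):
        val = 2 ** (l - k) * (1 - 1 / 2 ** k + 2 / (3 * d)) ** (2 ** (l - 1))
        ratios.append(val * 2 ** abs(l - k))
print("max ratio:", max(ratios))  # stays bounded by an absolute constant
```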

Combining Claims 4.4 and 4.5, together with the bound \|S_{k}^{\prime}\partial_{j}f\|_{2}^{\frac{1}{100d}}\leqslant\delta^{\frac{1}{200d}} for j\not\in\mathcal{J}, we immediately get the following corollary:

Corollary 4.6.

There exists an absolute constant C>0 such that for all \tau>0,

\tilde{W}_{k,d}[f]\leqslant C\cdot 2^{-k}d^{1-p}\tau^{2p-1}{\mathop{\mathbb{E}}_{x}\left[{s_{f}(x)^{p}}\right]}+C\cdot\tau^{-\frac{1}{100d}}\delta^{\frac{1}{200d}}\left(\sum\limits_{\ell=1}^{\log d}2^{-\left|{\ell-k}\right|}W_{\ell,d}[f]+\sum\limits_{\ell=1}^{\log n}2^{-\left|{\ell-k}\right|}\varepsilon_{\ell,d}\right).

We may now give an upper bound on \sum\limits_{k=1}^{\log d}W_{k,d}[f].

Claim 4.7.

For a sufficiently large absolute constant C_{1}, and for C_{2} sufficiently large compared to C_{1}, we have that \sum\limits_{k=1}^{\log d}W_{k,d}[f]\leqslant\frac{\varepsilon}{16}.

Proof.

Let \tau>0 be a parameter to be determined. As W_{k,d}[f]\lesssim\tilde{W}_{k,d}[f], by Corollary 4.6 we have

\sum\limits_{k=1}^{\log d}W_{k,d}[f]\lesssim\sum\limits_{k=1}^{\log d}\left(2^{-k}d^{1-p}\tau^{2p-1}A+\tau^{-\frac{1}{100d}}\delta^{\frac{1}{200d}}\left(\sum\limits_{\ell=1}^{\log d}2^{-\left|{\ell-k}\right|}W_{\ell,d}[f]+\sum\limits_{\ell=1}^{\log n}2^{-\left|{\ell-k}\right|}\varepsilon_{\ell,d}\right)\right).

The first sum is clearly \lesssim d^{1-p}\tau^{2p-1}A, as \sum_{k\geqslant 1}2^{-k}\leqslant 1. The second sum is at most

\tau^{-\frac{1}{100d}}\delta^{\frac{1}{200d}}\sum\limits_{\ell=1}^{\log d}W_{\ell,d}[f]\sum\limits_{k}2^{-\left|{\ell-k}\right|}\lesssim\tau^{-\frac{1}{100d}}\delta^{\frac{1}{200d}}\sum\limits_{\ell=1}^{\log d}W_{\ell,d}[f].

Finally, the third sum is at most

\tau^{-\frac{1}{100d}}\delta^{\frac{1}{200d}}\sum\limits_{\ell=1}^{\log n}\varepsilon_{\ell,d}\sum\limits_{k=1}^{\log d}2^{-\left|{\ell-k}\right|}\lesssim\tau^{-\frac{1}{100d}}\delta^{\frac{1}{200d}}\sum\limits_{\ell=1}^{\log n}\varepsilon_{\ell,d}\lesssim\tau^{-\frac{1}{100d}}\delta^{\frac{1}{200d}}W_{>d}[f]\lesssim\tau^{-\frac{1}{100d}}\delta^{\frac{1}{200d}}\varepsilon,

where we used \sum_{\ell}\varepsilon_{\ell,d}\lesssim W_{>d}[f] together with (5). We thus get that

\sum\limits_{k=1}^{\log d}W_{k,d}[f]\leqslant C\cdot d^{1-p}\tau^{2p-1}A+C\cdot\tau^{-\frac{1}{100d}}\delta^{\frac{1}{200d}}\sum\limits_{k=1}^{\log d}W_{k,d}[f]+C\cdot\tau^{-\frac{1}{100d}}\delta^{\frac{1}{200d}}\varepsilon

for some absolute constant C>0. We choose \tau=\delta^{1/4}, so that \tau^{-\frac{1}{100d}}\delta^{\frac{1}{200d}}=\delta^{\frac{1}{400d}}, and then take C_{2} large enough so that C\tau^{-\frac{1}{100d}}\delta^{\frac{1}{200d}}\leqslant\frac{1}{2}; rearranging gives \sum\limits_{k=1}^{\log d}W_{k,d}[f]\leqslant 2C\cdot d^{1-p}\delta^{(2p-1)/4}A+2C\delta^{\frac{1}{400d}}\varepsilon, which is at most \frac{\varepsilon}{16} for large enough C_{2}. ∎

We can now prove Theorem 1.5.

Proof of Theorem 1.5.

Combining (4), (5) and Claim 4.7, we get that \sum\limits_{S:S\not\subseteq\mathcal{J}}\widehat{f}(S)^{2}\leqslant\frac{\varepsilon}{8}. Define h(x)=1 if \sum\limits_{S:S\subseteq\mathcal{J}}\widehat{f}(S)\chi_{S}(x)\geqslant 1/2, and h(x)=0 otherwise. Clearly h is a \mathcal{J}-junta, and since \left|{f(x)-\sum_{S:S\subseteq\mathcal{J}}\widehat{f}(S)\chi_{S}(x)}\right|\geqslant 1/2 whenever f(x)\neq h(x), we have that

\|f-h\|_{1}=\|f-h\|_{2}^{2}\leqslant 4\left\|f-\sum\limits_{S:S\subseteq\mathcal{J}}\widehat{f}(S)\chi_{S}\right\|_{2}^{2}=4\sum\limits_{S:S\not\subseteq\mathcal{J}}\widehat{f}(S)^{2}\leqslant\varepsilon.

Finally, note that the sum of \left(1-\frac{1}{3d}\right)-noisy influences of f is at most O(d) by Fact 4.3, while every j\in\mathcal{J} has noisy influence at least \delta by the choice of \mathcal{J}. We conclude that

\left|{\mathcal{J}}\right|\leqslant O\left(\frac{d}{\delta}\right).\qed
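To illustrate the rounding step in the proof above, the following sketch (illustrative only, reusing the toy variables points, vals, subsets, coeffs, chi, J and lhs from the sketch following (4)) projects f onto the characters inside the junta, thresholds at 1/2, and confirms the bound \|f-h\|_{1}\leqslant 4\sum_{S\not\subseteq\mathcal{J}}\widehat{f}(S)^{2}:

```python
# Rounding to a junta: project f onto characters inside J, threshold at 1/2.
proj = np.array([sum(coeffs[S] * chi(S, x) for S in subsets if set(S) <= J)
                 for x in points])
h = (proj >= 0.5).astype(float)  # the junta h: depends only on coords in J

# Wherever f != h, the projection is at distance >= 1/2 from f, whence
# ||f - h||_1 <= 4 ||f - proj||_2^2 = 4 * sum_{S not in J} fhat(S)^2.
print(np.mean(np.abs(vals - h)), "<=", 4 * lhs)
```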

4.2 Proof of Claim 4.4

For the proof of Claim 4.4 we need a version of the hypercontractive inequality stated in terms of the noise operator, as follows:

Lemma 4.8.

([O’D14]) Let q>2 and \rho\leqslant\frac{1}{\sqrt{q-1}}. Then for all f\colon\{0,1\}^{n}\to\mathbb{R} we have \|T_{\rho}f\|_{q}\leqslant\|f\|_{2}.
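As a sanity check, Lemma 4.8 can be verified numerically on a small cube; the sketch below (illustrative only) applies T_{\rho} on the Fourier side with the toy choices q=4, \rho=1/\sqrt{3} and a random real-valued f:

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
n, q = 4, 4.0
rho = 1 / np.sqrt(q - 1)  # the largest noise rate allowed by Lemma 4.8

points = list(itertools.product([0, 1], repeat=n))
f = rng.standard_normal(len(points))  # a random real-valued function
chi = lambda S, x: (-1) ** sum(x[i] for i in S)
subsets = [S for r in range(n + 1) for S in itertools.combinations(range(n), r)]
coeffs = {S: np.mean(f * np.array([chi(S, x) for x in points])) for S in subsets}

# T_rho multiplies fhat(S) by rho^{|S|}.
Tf = np.array([sum(c * rho ** len(S) * chi(S, x) for S, c in coeffs.items())
               for x in points])

print(np.mean(np.abs(Tf) ** q) ** (1 / q), "<=", np.mean(f ** 2) ** 0.5)
```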

We also need the following claim, asserting that noise only decreases Talagrand’s boundary.

Claim 4.9.

For f\colon\{0,1\}^{n}\to\mathbb{R} and any noise operator S=T_{\rho_{1}}\otimes\cdots\otimes T_{\rho_{n}} with \rho_{1},\ldots,\rho_{n}\in[0,1], we have {\mathop{\mathbb{E}}_{x}\left[{\|\nabla{Sf}(x)\|_{p}}\right]}\leqslant{\mathop{\mathbb{E}}_{x}\left[{\|\nabla{f}(x)\|_{p}}\right]}.

Proof.
{\mathop{\mathbb{E}}_{x}\left[{\|\nabla(Sf)(x)\|_{p}}\right]}\leqslant{\mathop{\mathbb{E}}_{x}\left[{\|S\nabla f(x)\|_{p}}\right]}={\mathop{\mathbb{E}}_{x}\left[{\left\|{\mathop{\mathbb{E}}_{y\sim Sx}\left[{\nabla f(y)}\right]}\right\|_{p}}\right]},

and the result follows from the triangle inequality: \left\|{\mathop{\mathbb{E}}_{y\sim Sx}\left[{\nabla f(y)}\right]}\right\|_{p}\leqslant{\mathop{\mathbb{E}}_{y\sim Sx}\left[{\|\nabla f(y)\|_{p}}\right]}, and when x is uniform so is y. ∎
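The following sketch (illustrative only) checks Claim 4.9 on a random real-valued function with independently chosen noise rates, taking the discrete derivative to be \partial_{j}f(x)=f(x^{j\to 1})-f(x^{j\to 0}), consistently with the factor 4 appearing in the computations below:

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)
n, p = 4, 1.5  # any norm index p >= 1
points = list(itertools.product([0, 1], repeat=n))
idx = {x: t for t, x in enumerate(points)}
f = rng.standard_normal(len(points))
rhos = rng.uniform(0, 1, size=n)  # one noise rate per coordinate

chi = lambda S, x: (-1) ** sum(x[i] for i in S)
subsets = [S for r in range(n + 1) for S in itertools.combinations(range(n), r)]
coeffs = {S: np.mean(f * np.array([chi(S, x) for x in points])) for S in subsets}

# S = T_{rho_1} x ... x T_{rho_n} multiplies fhat(S) by prod_{i in S} rho_i.
Sf = np.array([sum(c * np.prod([rhos[i] for i in S]) * chi(S, x)
                   for S, c in coeffs.items()) for x in points])

def avg_grad_norm(vals):
    # E_x || grad(vals)(x) ||_p with d_j(x) = vals(x^{j->1}) - vals(x^{j->0})
    total = 0.0
    for x in points:
        d = [vals[idx[tuple(1 if i == j else xi for i, xi in enumerate(x))]]
             - vals[idx[tuple(0 if i == j else xi for i, xi in enumerate(x))]]
             for j in range(n)]
        total += np.sum(np.abs(d) ** p) ** (1 / p)
    return total / len(points)

print(avg_grad_norm(Sf), "<=", avg_grad_norm(f))  # Claim 4.9
```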

Take (J,z) to be a random restriction, where each j\in[n] is included in J independently with probability \frac{1}{3d} and z\in\{0,1\}^{\overline{J}} is uniform, and consider g=S_{k}^{\prime\prime}f. We calculate the contribution of \bar{\mathcal{J}} to the level-1 weight of a restriction of g:

d{\mathop{\mathbb{E}}_{J,z}\left[{\sum\limits_{j\in J\cap\bar{\mathcal{J}}}\widehat{g_{\overline{J}\rightarrow z}}(\{j\})^{2}}\right]}=d\sum\limits_{j\in\bar{\mathcal{J}}}{\mathop{\mathbb{E}}_{J}\left[{\sum\limits_{S}\widehat{g}(S)^{2}1_{S\cap J=\{j\}}}\right]}=d\sum\limits_{S}\frac{\left|{S\cap\bar{\mathcal{J}}}\right|}{3d}\left(1-\frac{1}{3d}\right)^{\left|{S}\right|-1}\widehat{g}(S)^{2}=\frac{1}{3}\sum\limits_{S}\left|{S\cap\bar{\mathcal{J}}}\right|\left(1-\frac{1}{3d}\right)^{\left|{S}\right|-1}\widehat{g}(S)^{2}.

Also,

\sum\limits_{j\not\in\mathcal{J}}\|S_{k}\partial_{j}f\|_{2}^{2}=4\sum\limits_{S}\left|{S\cap\bar{\mathcal{J}}}\right|\left(1-\frac{1}{2^{k}}\right)^{-2}\rho_{1}^{2\left|{S\cap\mathcal{J}}\right|}\rho_{2}^{2\left|{S\cap\bar{\mathcal{J}}}\right|}\widehat{g}(S)^{2},

where \rho_{1}=\frac{1-1/d}{1-2/(3d)} and \rho_{2}=\frac{1-1/2^{k}}{1-1/2^{k}+1/(3d)}. Using \frac{a}{b}\leqslant\frac{a+c}{b+c} for 0\leqslant a\leqslant b and c\geqslant 0, we have

\rho_{1}\leqslant\frac{1-1/d+2/(3d)}{1-2/(3d)+2/(3d)}=1-\frac{1}{3d},\qquad\rho_{2}\leqslant\frac{1-1/2^{k}+(1/2^{k}-1/(3d))}{1-1/2^{k}+1/(3d)+(1/2^{k}-1/(3d))}=1-\frac{1}{3d},

and so

\sum\limits_{j\not\in\mathcal{J}}\|S_{k}\partial_{j}f\|_{2}^{2}\leqslant 12\left(1-\frac{1}{d}\right)^{-2}d{\mathop{\mathbb{E}}_{J,z}\left[{\sum\limits_{j\in J\cap\bar{\mathcal{J}}}\widehat{g_{\overline{J}\rightarrow z}}(\{j\})^{2}}\right]}\lesssim d{\mathop{\mathbb{E}}_{J,z}\left[{\sum\limits_{j\in J\cap\bar{\mathcal{J}}}\widehat{g_{\overline{J}\rightarrow z}}(\{j\})^{2}}\right]}.

Hence our goal of upper bounding the left hand side in Claim 4.4 reduces to upper bounding the contribution of \bar{\mathcal{J}} to the level-1 weight of random restrictions of g. For J,z define

L_{J,z}=\left\{\left.j\in J\cap\bar{\mathcal{J}}\;\right|\left|{\widehat{g_{\overline{J}\rightarrow z}}(\{j\})}\right|\leqslant\tau\right\},\qquad H_{J,z}=\left\{\left.j\in J\cap\bar{\mathcal{J}}\;\right|\left|{\widehat{g_{\overline{J}\rightarrow z}}(\{j\})}\right|>\tau\right\}.

Then

{\mathop{\mathbb{E}}_{J,z}\left[{\sum\limits_{j\in J\cap\bar{\mathcal{J}}}\widehat{g_{\overline{J}\rightarrow z}}(\{j\})^{2}}\right]}={\mathop{\mathbb{E}}_{J,z}\left[{\sum\limits_{j\in L_{J,z}}\widehat{g_{\overline{J}\rightarrow z}}(\{j\})^{2}}\right]}+{\mathop{\mathbb{E}}_{J,z}\left[{\sum\limits_{j\in H_{J,z}}\widehat{g_{\overline{J}\rightarrow z}}(\{j\})^{2}}\right]}.

Bounding the contribution from L_{J,z}.

For the first sum, note that \sum\limits_{j\in L_{J,z}}\widehat{g_{\overline{J}\rightarrow z}}(\{j\})^{2}\leqslant\|g_{\overline{J}\rightarrow z}\|_{2}^{2}\leqslant 1 (as g takes values in [0,1]), so raising it to the power p\leqslant 1 can only increase it; using in addition that \left|{\widehat{g_{\overline{J}\rightarrow z}}(\{j\})}\right|\leqslant\tau for j\in L_{J,z} and 2-1/p\geqslant 0, we have

\sum\limits_{j\in L_{J,z}}\widehat{g_{\overline{J}\rightarrow z}}(\{j\})^{2}\leqslant\left(\sum\limits_{j\in L_{J,z}}\widehat{g_{\overline{J}\rightarrow z}}(\{j\})^{2}\right)^{p}\leqslant\left(\sum\limits_{j\in L_{J,z}}\tau^{(2p-1)/p}\left|{\widehat{g_{\overline{J}\rightarrow z}}(\{j\})}\right|^{1/p}\right)^{p}\leqslant\tau^{2p-1}\left(\sum\limits_{j}\left|{\widehat{g_{\overline{J}\rightarrow z}}(\{j\})}\right|^{1/p}\right)^{p}.

In the last expression we have \tau^{2p-1} times the 1/p-norm of the vector (\widehat{g_{\overline{J}\rightarrow z}}(\{j\}))_{j\in J} (note that 1/p\geqslant 1, so this is indeed a norm). Thus, using the triangle inequality in a similar way to Lemma 3.1, we may bound the last expression from above by

\tau^{2p-1}{\mathop{\mathbb{E}}_{x}\left[{\|\nabla{g_{\overline{J}\rightarrow z}}(x)\|_{1/p}}\right]}. (6)

Taking expectation, we get that

{\mathop{\mathbb{E}}_{J,z}\left[{\sum\limits_{j\in L_{J,z}}\widehat{g_{\overline{J}\rightarrow z}}(\{j\})^{2}}\right]}\leqslant\tau^{2p-1}{\mathop{\mathbb{E}}_{J,z}\left[{{\mathop{\mathbb{E}}_{x}\left[{\|\nabla{g_{\overline{J}\rightarrow z}}(x)\|_{1/p}}\right]}}\right]}=\tau^{2p-1}{\mathop{\mathbb{E}}_{x}\left[{{\mathop{\mathbb{E}}_{J,z}\left[{\|\nabla g_{\overline{J}\rightarrow z}(x)\|_{1/p}}\right]}}\right]}.

Fix x,z, and note that {\mathop{\mathbb{E}}_{J}\left[{\|\nabla g_{\overline{J}\rightarrow z}(x)\|_{1/p}^{1/p}}\right]}={\mathop{\mathbb{E}}_{J}\left[{\sum\limits_{j}\left|{\partial_{j}g(x,z)}\right|^{1/p}1_{j\in J}}\right]}=\frac{1}{3d}\|\nabla g(x,z)\|_{1/p}^{1/p}. It follows by Jensen's inequality (applied to the concave function t\mapsto t^{p}) that

{\mathop{\mathbb{E}}_{J}\left[{\|\nabla g_{\overline{J}\rightarrow z}(x)\|_{1/p}}\right]}\leqslant{\mathop{\mathbb{E}}_{J}\left[{\|\nabla g_{\overline{J}\rightarrow z}(x)\|_{1/p}^{1/p}}\right]}^{p}\leqslant d^{-p}\|\nabla g(x,z)\|_{1/p}.

Therefore the expectation of (6) over J,z is upper bounded by \tau^{2p-1}d^{-p}{\mathop{\mathbb{E}}_{x,z}\left[{\|\nabla g(x,z)\|_{1/p}}\right]}. Using Claim 4.9, this may be further upper bounded by \tau^{2p-1}d^{-p}{\mathop{\mathbb{E}}_{x,z}\left[{\|\nabla f(x,z)\|_{1/p}}\right]}=\tau^{2p-1}d^{-p}{\mathop{\mathbb{E}}_{x,z}\left[{s_{f}(x,z)^{p}}\right]}, where the equality holds since f is Boolean and hence \left|{\partial_{j}f}\right|\in\{0,1\}.
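Indeed, for Boolean f each \left|{\partial_{j}f(x)}\right| is 0 or 1, so \|\nabla f(x)\|_{1/p}=\left(\sum_{j}\left|{\partial_{j}f(x)}\right|^{1/p}\right)^{p}=s_{f}(x)^{p} pointwise. A short numerical confirmation (illustrative only, on a toy majority function):

```python
import itertools

# For Boolean f, |d_j f(x)| is 0 or 1, so ||grad f(x)||_{1/p} = s_f(x)^p.
p, n = 0.7, 3
f = lambda x: int(sum(x) >= 2)  # toy: majority on three bits
for x in itertools.product([0, 1], repeat=n):
    d = [abs(f(tuple(1 - xi if i == j else xi for i, xi in enumerate(x))) - f(x))
         for j in range(n)]
    assert abs(sum(v ** (1 / p) for v in d) ** p - sum(d) ** p) < 1e-9
print("||grad f(x)||_{1/p} == s_f(x)^p at every point")
```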

Bounding the contribution from H_{J,z}.

We have, since \left|{\widehat{g_{\overline{J}\rightarrow z}}(\{j\})}\right|>\tau for every j\in H_{J,z}, that

\sum\limits_{j\in H_{J,z}}\widehat{g_{\overline{J}\rightarrow z}}(\{j\})^{2}\leqslant\tau^{-\frac{1}{100d}}\sum\limits_{j\in H_{J,z}}\left|{\widehat{g_{\overline{J}\rightarrow z}}(\{j\})}\right|^{2+\frac{1}{100d}}\leqslant\tau^{-\frac{1}{100d}}\sum\limits_{j\in J\cap\bar{\mathcal{J}}}\left|{\widehat{g_{\overline{J}\rightarrow z}}(\{j\})}\right|^{2+\frac{1}{100d}},

and we next take expectation over z:

{\mathop{\mathbb{E}}_{z}\left[{\sum\limits_{j\in H_{J,z}}\widehat{g_{\overline{J}\rightarrow z}}(\{j\})^{2}}\right]}\leqslant\tau^{-\frac{1}{100d}}\sum\limits_{j\in J\cap\bar{\mathcal{J}}}{\mathop{\mathbb{E}}_{z}\left[{\left|{\widehat{g_{\overline{J}\rightarrow z}}(\{j\})}\right|^{2+\frac{1}{100d}}}\right]}. (7)

Define the operator T=T_{\rho_{1}}^{\mathcal{J}}T_{\rho_{2}}^{\overline{\mathcal{J}}}, where now \rho_{1}=\frac{1-2/(3d)}{1-1/(3d)} and \rho_{2}=\frac{1-1/2^{k}+1/(3d)}{1-1/2^{k}+2/(3d)}, and note that by the semigroup property of the noise operator (namely, the fact that T_{\rho}T_{\rho^{\prime}}=T_{\rho\rho^{\prime}}) we may write g=TS_{k}^{\prime}f. Thus,

\widehat{g_{\overline{J}\rightarrow z}}(\{j\})=\sum\limits_{S\subseteq\overline{J}\cup\{j\},S\ni j}\widehat{g}(S)\chi_{S\setminus\{j\}}(z)=\sum\limits_{S\subseteq\overline{J}\cup\{j\},S\ni j}\rho_{1}^{\left|{S\cap\mathcal{J}}\right|}\rho_{2}^{\left|{S\cap\overline{\mathcal{J}}}\right|}\widehat{S_{k}^{\prime}f}(S)\chi_{S\setminus\{j\}}(z).

Thus, using hypercontractivity, i.e. Lemma 4.8, we have that

{\mathop{\mathbb{E}}_{z}\left[{\left|{\widehat{g_{\overline{J}\rightarrow z}}(\{j\})}\right|^{2+\frac{1}{100d}}}\right]}=\left\|\sum\limits_{S\subseteq\overline{J}\cup\{j\},S\ni j}\rho_{1}^{\left|{S\cap\mathcal{J}}\right|}\rho_{2}^{\left|{S\cap\overline{\mathcal{J}}}\right|}\widehat{S_{k}^{\prime}f}(S)\chi_{S\setminus\{j\}}\right\|_{2+\frac{1}{100d}}^{2+\frac{1}{100d}}\leqslant\left\|\sum\limits_{S\subseteq\overline{J}\cup\{j\},S\ni j}\rho_{1}^{\left|{S\cap\mathcal{J}}\right|}\rho_{2}^{\left|{S\cap\overline{\mathcal{J}}}\right|}\left(1-\frac{1}{3d}\right)^{-(\left|{S}\right|-1)}\widehat{S_{k}^{\prime}f}(S)\chi_{S\setminus\{j\}}\right\|_{2}^{2+\frac{1}{100d}}\leqslant\left\|\sum\limits_{S\ni j}\widehat{S_{k}^{\prime}f}(S)\chi_{S\setminus\{j\}}\right\|_{2}^{2+\frac{1}{100d}}. (8)

In the first inequality we applied Lemma 4.8 with noise rate 1-\frac{1}{3d}\leqslant\frac{1}{\sqrt{1+\frac{1}{100d}}}, and in the last inequality we used Parseval's equality and the fact that \rho_{1},\rho_{2}\leqslant 1-\frac{1}{3d}. Note that

\left\|\sum\limits_{S\ni j}\widehat{S_{k}^{\prime}f}(S)\chi_{S\setminus\{j\}}\right\|_{2}^{2+\frac{1}{100d}}\lesssim\|S_{k}^{\prime}\partial_{j}f\|_{2}^{2+\frac{1}{100d}},

so plugging (8) into (7) yields that

{\mathop{\mathbb{E}}_{z}\left[{\sum\limits_{j\in H_{J,z}}\widehat{g_{\overline{J}\rightarrow z}}(\{j\})^{2}}\right]}\lesssim\tau^{-\frac{1}{100d}}\sum\limits_{j\in J\cap\bar{\mathcal{J}}}\|S_{k}^{\prime}\partial_{j}f\|_{2}^{2+\frac{1}{100d}}.

Taking expectation over J now gives that

{\mathop{\mathbb{E}}_{z,J}\left[{\sum\limits_{j\in H_{J,z}}\widehat{g_{\overline{J}\rightarrow z}}(\{j\})^{2}}\right]}\lesssim\tau^{-\frac{1}{100d}}\sum\limits_{j\in\bar{\mathcal{J}}}\|S_{k}^{\prime}\partial_{j}f\|_{2}^{2+\frac{1}{100d}}{\mathop{\mathbb{E}}_{J}\left[{1_{j\in J}}\right]}=\frac{1}{3d}\tau^{-\frac{1}{100d}}\sum\limits_{j\in\bar{\mathcal{J}}}\|S_{k}^{\prime}\partial_{j}f\|_{2}^{2+\frac{1}{100d}}.

Plugging the bounds on the contributions of L_{J,z} and H_{J,z} into the reduction above, and recalling that 2^{k}\tilde{W}_{k,d}[f]\lesssim\sum_{j\not\in\mathcal{J}}\|S_{k}\partial_{j}f\|_{2}^{2}\lesssim d\,{\mathop{\mathbb{E}}_{J,z}}\left[{\sum_{j\in J\cap\bar{\mathcal{J}}}\widehat{g_{\overline{J}\rightarrow z}}(\{j\})^{2}}\right], completes the proof of Claim 4.4. ∎

5 Acknowledgements

We thank an anonymous referee for many helpful comments and corrections on an earlier version of this paper, which led to considerable improvements in all aspects.

References

  • [Bec75] William Beckner. Inequalities in Fourier analysis. Annals of Mathematics, 102:159–182, 1975.
  • [Ber67] Arthur J Bernstein. Maximally connected arrays on the n-cube. SIAM Journal on Applied Mathematics, 15(6):1485–1489, 1967.
  • [BKS99] Itai Benjamini, Gil Kalai, and Oded Schramm. Noise sensitivity of Boolean functions and applications to percolation. Publications Mathématiques de l’Institut des Hautes Etudes Scientifiques, 90(1):5–43, 1999.
  • [Bon70] Aline Bonami. Étude des coefficients de Fourier des fonctions de L^{p}(G). Annales de l’Institut Fourier, 20(2):335–402, 1970.
  • [Bou02] Jean Bourgain. On the distribution of the Fourier spectrum of Boolean functions. Israel Journal of Mathematics, 131(1):269–276, 2002.
  • [EG] Ronen Eldan and Renan Gross. Concentration on the Boolean hypercube via pathwise stochastic analysis. In STOC 2020.
  • [Fri98] Ehud Friedgut. Boolean functions with low average sensitivity depend on few coordinates. Combinatorica, 18(1):27–35, 1998.
  • [Gro75] Leonard Gross. Logarithmic Sobolev inequalities. American Journal of Mathematics, 97(4):1061–1083, 1975.
  • [Har66] Lawrence H Harper. Optimal numberings and isoperimetric problems on graphs. Journal of Combinatorial Theory, 1(3):385–393, 1966.
  • [Har76] Sergiu Hart. A note on the edges of the n-cube. Discrete Mathematics, 14(2):157–163, 1976.
  • [KK13] Nathan Keller and Guy Kindler. Quantitative relation between noise sensitivity and influences. Combinatorica, 33(1):45–71, 2013.
  • [KKK+] Esty Kelman, Subhash Khot, Guy Kindler, Dor Minzer, and Muli Safra. Theorems of KKL, Friedgut, and Talagrand via random restrictions and log-Sobolev inequality. In ITCS 2021.
  • [KKL88] Jeff Kahn, Gil Kalai, and Nathan Linial. The influence of variables on Boolean functions (extended abstract). In 29th Annual Symposium on Foundations of Computer Science, White Plains, New York, USA, 24-26 October 1988, pages 68–80, 1988.
  • [KKO18] Guy Kindler, Naomi Kirshner, and Ryan O’Donnell. Gaussian noise sensitivity and Fourier tails. Israel Journal of Mathematics, 225:71–109, 2018.
  • [KMS18] Subhash Khot, Dor Minzer, and Muli Safra. On monotonicity testing and Boolean isoperimetric-type theorems. SIAM J. Comput., 47(6):2238–2276, 2018.
  • [KT14] Christos P Kitsos and Thomas L Toulias. Inequalities for the Fisher’s information measures. Handbook of Functional Equations: Functional Inequalities, pages 281–313, 2014.
  • [Lin64] John H Lindsey. Assignment of numbers to vertices. The American Mathematical Monthly, 71(5):508–516, 1964.
  • [Mar74] Grigorii Aleksandrovich Margulis. Probabilistic characteristics of graphs with large connectivity. Problemy peredachi informatsii, 10(2):101–108, 1974.
  • [O’D14] Ryan O’Donnell. Analysis of Boolean Functions. Cambridge University Press, 2014.
  • [Tal93] Michel Talagrand. Isoperimetry, logarithmic Sobolev inequalities on the discrete cube, and Margulis’ graph connectivity theorem. Geometric & Functional Analysis GAFA, 3(3):295–314, 1993.
  • [Tal94] Michel Talagrand. On Russo’s approximate zero-one law. The Annals of Probability, 22(3):1576–1587, 1994.
  • [Tal96] Michel Talagrand. How much are increasing sets positively correlated? Combinatorica, 16(2):243–258, 1996.
  • [Tal97] Michel Talagrand. On boundaries and influences. Combinatorica, 17(2):275–285, 1997.