
Isoperimetric Inequalities Made Simpler

Ronen Eldan (Department of Computer Science and Applied Mathematics, Weizmann Institute), Guy Kindler (School of Computer Science and Engineering, Hebrew University), Noam Lifshitz (School of Mathematics, Hebrew University; partially supported by ISF individual research grant 1980/22), Dor Minzer (Department of Mathematics, Massachusetts Institute of Technology; supported by a Sloan Research Fellowship and NSF CCF award 2227876)
Abstract

We give an alternative, simple method to prove isoperimetric inequalities over the hypercube. In particular, we show:

  1.

    An elementary proof of classical isoperimetric inequalities of Talagrand, as well as a stronger isoperimetric result conjectured by Talagrand and recently proved by Eldan and Gross.

  2.

    A strengthening of the Friedgut junta theorem, asserting that if the $p$-th moment of the sensitivity of a function is constant for some $1/2+\varepsilon\leqslant p\leqslant 1$, then the function is close to a junta. In this language, Friedgut’s theorem is the special case $p=1$.

1 Introduction

1.1 Isoperimetric Inequalities Over the Hypercube

Isoperimetric inequalities are fundamental results in mathematics with a myriad of applications. The main topic of this paper is discrete isoperimetric inequalities, and more specifically isoperimetric inequalities over the hypercube $\{0,1\}^n$. Classical results along these lines are due to Harper, Bernstein, Lindsey and Hart [Har66, Ber67, Lin64, Har76], Harper [Har66], Margulis [Mar74], Talagrand [Tal93, Tal97, Tal94] and Kahn, Kalai and Linial [KKL88].

All of these theorems discuss various measures of boundaries and relations between them for Boolean functions. For a function $f\colon\{0,1\}^n\to\{0,1\}$ and a point $x\in\{0,1\}^n$, we define the sensitivity of $f$ at $x$, denoted by $s_f(x)$, to be the number of coordinates $i\in[n]$ such that $f(x\oplus e_i)\neq f(x)$. The vertex boundary of a function $f$, denoted by $V_f$, is the set of all $x$’s such that $s_f(x)>0$, and the edge boundary $E_f$ of $f$ is defined to be the set of edges $(x,x\oplus e_i)$ of the hypercube whose endpoints are assigned different values by $f$. The most basic isoperimetric result related to Boolean functions, known as Poincaré’s inequality, asserts that $\frac{|E_f|}{2^n}\geqslant\mathsf{var}(f)$, i.e. the edge boundary of a function $f$ cannot be too small. (We remark that the term “Poincaré’s inequality” comes from the theory of functional inequalities and Markov chains and refers to a much more general result; see for example [KT14, Section 2]. In the case of graphs, it asserts that for a graph $G$, if the first non-trivial eigenvalue of its Laplacian $L$ is at least $\lambda$, then $\|Lf\|_2\geqslant\lambda\|f-\mathbb{E}[f]\|_2$. In our example, the Laplacian is $L=I-\frac{1}{n}A$, where $A$ is the adjacency matrix of the Boolean hypercube, and $\lambda=2/n$.) The edge isoperimetric inequality for the hypercube, due to Harper, Bernstein, Lindsey and Hart [Har66, Ber67, Lin64, Har76], is a strengthening of this statement, asserting that for a fixed value of $\mathbb{E}[f]$, the functions that minimize the edge boundary $\frac{|E_f|}{2^n}$ are indicator functions of initial segments in lexicographic orderings of the hypercube. In particular, this result implies that if $f$ is a non-constant Boolean function, then $\frac{|E_f|}{2^n}\geqslant\mathbb{E}[f]\log\left(\frac{1}{\mathbb{E}[f]}\right)$ (here and throughout, $\log$ denotes the logarithm with base $2$). The vertex isoperimetric inequality, due to Harper [Har66], asserts that among all Boolean functions of a given measure, the functions that minimize the size of the vertex boundary $V_f$ are the indicator functions of initial segments in simplicial orderings of the hypercube. Margulis [Mar74] proved a result that combines the notions of edge boundary and vertex boundary, showing that for all functions $f$ one has $\frac{|V_f|}{2^n}\cdot\frac{|E_f|}{2^n}\geqslant\Omega(\mathsf{var}(f)^2)$.
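For concreteness, these quantities are easy to compute by brute force for small $n$. The following Python sketch (our own illustration, not part of the paper) computes the normalized edge boundary and the variance of a given Boolean function, and checks Poincaré’s inequality:

```python
import itertools

def poincare_check(f, n):
    """Brute-force check of Poincare's inequality |E_f|/2^n >= var(f)
    for f: {0,1}^n -> {0,1}; exponential time, so small n only."""
    points = list(itertools.product([0, 1], repeat=n))
    mean = sum(f(x) for x in points) / 2**n
    var = mean * (1 - mean)                      # variance of a 0/1-valued function
    def flip(x, i):                              # x with coordinate i flipped
        return x[:i] + (1 - x[i],) + x[i + 1:]
    # sensitivity s_f(x): number of coordinates whose flip changes f(x)
    s = {x: sum(f(x) != f(flip(x, i)) for i in range(n)) for x in points}
    edge_boundary = sum(s.values()) / 2          # each sensitive edge has two endpoints
    assert edge_boundary / 2**n >= var
    return edge_boundary / 2**n, var

# Example: AND on 3 bits gives (0.375, 0.109375), consistent with the inequality.
print(poincare_check(lambda x: int(all(x)), 3))
```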

Talagrand considered a different notion of boundary of a function $f$, namely the quantity $\mathbb{E}_x\left[\sqrt{s_f(x)}\right]$, which we will refer to as the Talagrand boundary of $f$. Here and throughout, the distribution of $x$ is uniform over the hypercube $\{0,1\}^n$. Talagrand proved a number of results about this quantity [Tal93, Tal97], the most basic of which asserts that $\mathbb{E}_x\left[\sqrt{s_f(x)}\right]\geqslant\Omega(\mathsf{var}(f))$. A simple application of the Cauchy-Schwarz inequality shows that this result implies Margulis’ theorem. Talagrand then went on to strengthen the result for functions with small variance, proving the following theorem:

Theorem 1.1 (Talagrand [Tal93]).

For all non-constant Boolean functions $f\colon\{0,1\}^n\to\{0,1\}$ it holds that

\[ \mathbb{E}_x\left[\sqrt{s_f(x)}\right]\geqslant\Omega\left(\mathsf{var}(f)\sqrt{\log\left(\frac{1}{\mathsf{var}(f)}\right)}\right). \]

We remark that Talagrand’s result is in fact more general and slightly stronger. First, Talagrand proves that the above inequality holds for the $p$-biased measure on the cube if one inserts a factor of $\frac{\min(p,1-p)}{\sqrt{p(1-p)}}$ on the right hand side. The results and techniques of the current paper also generalize to the $p$-biased measure in a straightforward manner, and we focus on the case $p=1/2$ for simplicity. Second, Talagrand considers a variant of the quantity $s_f(x)$ wherein $s_f(x)$ is defined to be $0$ for $x$ such that $f(x)=0$. The results we state below also hold with this stronger definition.

While Theorem 1.1 can be seen to be tight up to constant factors (as can be seen by considering sub-cubes), it leaves something to be desired: it is incomparable to yet another prominent isoperimetric result, the KKL theorem [KKL88]. The KKL theorem is concerned with the influence of coordinates on a function. For a coordinate $i\in[n]$, the influence of $i$ is defined by

\[ I_i[f]=\Pr_{x\in\{0,1\}^n}\left[f(x)\neq f(x\oplus e_i)\right]. \]

In words, it is the fraction of edges along direction $i$ that lie in the edge boundary of $f$. The KKL theorem [KKL88] asserts that any function must have a variable that is at least somewhat influential; more quantitatively, they proved that $\max_i I_i[f]\geqslant\Omega\left(\frac{\log n}{n}\mathsf{var}(f)\right)$.
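Influences admit an equally direct brute-force computation; a minimal sketch (the helper `influences` is ours, for illustration):

```python
import itertools

def influences(f, n):
    """I_i[f] = Pr_x[f(x) != f(x xor e_i)], computed exactly over {0,1}^n."""
    points = list(itertools.product([0, 1], repeat=n))
    return [sum(f(x) != f(x[:i] + (1 - x[i],) + x[i + 1:]) for x in points) / 2**n
            for i in range(n)]

# Majority on 3 bits: each coordinate has influence exactly 1/2.
print(influences(lambda x: int(sum(x) >= 2), 3))  # [0.5, 0.5, 0.5]
```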

Since the isoperimetric results of Talagrand and of KKL look very different (as do the techniques that go into their proofs), Talagrand asked whether one can establish a single isoperimetric inequality that is strong enough to capture both of these results simultaneously. In [Tal97], Talagrand conjectured such a statement and made some progress towards it. Namely, he showed that there exists $\alpha\in(0,1/2]$ such that for any non-constant function $f\colon\{0,1\}^n\to\{0,1\}$ it holds that

\[ \mathbb{E}_x\left[\sqrt{s_f(x)}\right]\geqslant c\cdot\mathsf{var}(f)\left(\log\left(\frac{1}{\mathsf{var}(f)}\right)\right)^{1/2-\alpha}\left(\log\left(1+\frac{1}{\sum_{i=1}^n I_i[f]^2}\right)\right)^{\alpha}, \tag{1} \]

for some absolute constant $c>0$. This inequality can be seen to imply a weaker result along the lines of the KKL theorem (namely, that there is $\beta>0$ such that if $\mathsf{var}(f)\geqslant\Omega(1)$, then $\max_i I_i[f]\geqslant\Omega((\log n)^{\beta}/n)$), but it is not enough to fully recover it. Talagrand conjectured that the above inequality holds with $\alpha=1/2$, a statement that is strong enough to directly imply the KKL theorem.

Recently, Eldan and Gross [EG] introduced techniques from stochastic processes to the field, and used these to prove that Talagrand’s conjecture is true, i.e. that one may take $\alpha=1/2$ in (1).

1.2 Our Results

We show a unified and simple Fourier analytic approach, based on random restrictions, to establish the results mentioned above, as well as a strengthening of Friedgut’s junta theorem [Fri98]. Our results are:

  1.

    Classical isoperimetric inequalities. We give Fourier analytic proofs of the inequalities above due to Margulis, Talagrand and Eldan-Gross, as well as of robust versions of them. Our proof technique also unravels a connection between these isoperimetric inequalities and other classical results in the area, such as the Friedgut junta theorem [Fri98] and Bourgain’s tail theorem [Bou02]. The fact that such a purely Fourier analytic proof exists is quite surprising to us, as the quantity $\mathbb{E}_x\left[\sqrt{s_f(x)}\right]$ does not seem to relate well to Fourier analytic methods (indeed, Talagrand’s proof proceeds by induction, whereas the proof of Eldan and Gross uses stochastic calculus).

  2.

    Strengthened junta theorems. We establish a strengthening of Friedgut’s theorem in which the assumption that the average sensitivity of $f$ is small, i.e. that $\mathbb{E}_x[s_f(x)]$ is small, is replaced by the weaker condition that $\mathbb{E}_x[s_f(x)^p]$ is small, for $p$ slightly larger than $1/2$.

In the rest of this introductory section, we describe our results in more detail.

1.3 Classical Isoperimetric Inequalities and Robust Versions

We give simpler proofs for the results below, which are due to Talagrand [Tal93] and Eldan-Gross [EG].

Theorem 1.2.

There exists an absolute constant $c>0$ such that for all non-constant functions $f\colon\{0,1\}^n\to\{0,1\}$ we have

\[ \mathbb{E}_x\left[\sqrt{s_f(x)}\right]\geqslant c\cdot\mathsf{var}(f)\sqrt{\log\left(\frac{1}{\mathsf{var}(f)}\right)}. \]
Theorem 1.3.

There exists an absolute constant $c>0$ such that for all non-constant functions $f\colon\{0,1\}^n\to\{0,1\}$ we have

\[ \mathbb{E}_x\left[\sqrt{s_f(x)}\right]\geqslant c\cdot\mathsf{var}(f)\sqrt{\log\left(1+\frac{1}{\sum_{i=1}^n I_i[f]^2}\right)}. \]

Our technique is more general, and can be used to prove more robust versions of Theorems 1.2, 1.3. For a $2$-coloring of the edges of the hypercube $\mathsf{col}\colon E(\{0,1\}^n)\to\{\mathsf{red},\mathsf{blue}\}$, define the red sensitivity by $s_{f,\mathsf{red}}(x)=0$ if $f(x)=0$, and otherwise $s_{f,\mathsf{red}}(x)$ is the number of sensitive edges adjacent to $x$ that are red. Similarly, define the blue sensitivity by $s_{f,\mathsf{blue}}(x)=0$ if $f(x)=1$, and otherwise $s_{f,\mathsf{blue}}(x)$ is the number of sensitive edges adjacent to $x$ that are blue. Then, defining

\[ \mathsf{Talagrand}_{\mathsf{col}}(f)=\mathbb{E}_x\left[\sqrt{s_{f,\mathsf{red}}(x)}\right]+\mathbb{E}_x\left[\sqrt{s_{f,\mathsf{blue}}(x)}\right], \]

we have that the left hand side of Theorems 1.2, 1.3 can be replaced by $\mathsf{Talagrand}_{\mathsf{col}}(f)$ for any $2$-coloring $\mathsf{col}$. Such results can be used to prove the existence of nearly bi-regular structures in the sensitivity graph of $f$, which are sometimes useful in applications (for example, in [KMS18] such structures in the directed sensitivity graph are necessary to construct monotonicity testers). To get some intuition, consider the bipartite sub-graph of the hypercube consisting of the sensitive edges of $f$, and suppose that we want to find a large sub-graph of it which is nearly regular, in the sense that the degrees on one side are between $d$ and $2d$ while the degrees on the other side are at most $d$, for some $d$. Taking the coloring $\mathsf{col}_{\mathsf{red}}$ that colors all edges red, or the coloring $\mathsf{col}_{\mathsf{blue}}$ that colors all edges blue (which amount to Talagrand’s original formulation), gives us information about the typical degrees of vertices $x$ with $f(x)=1$ and of vertices $y$ with $f(y)=0$, respectively. It does not, however, give any information about the possible relation between the degrees of such $x$’s and such $y$’s. Choosing the coloring $\mathsf{col}$ appropriately, one may establish a better balance between these two degrees and extract useful information on the relation between them. Establishing such robust results using the techniques of [EG] is more challenging.

1.4 New Junta Theorems

Next, we study connections between Talagrand-type quantities such as $\mathbb{E}_x\left[\sqrt{s_f(x)}\right]$ and the notion of juntas. For a point $x\in\{0,1\}^n$ and a subset $J\subseteq[n]$, we denote by $x_J$ the point in $\{0,1\}^J$ corresponding to the restriction of $x$ to the coordinates of $J$.

Definition 1.4.

A function $f\colon\{0,1\}^n\to\{0,1\}$ is called a $j$-junta if there exist $J\subseteq[n]$ of size at most $j$ and $g\colon\{0,1\}^J\to\{0,1\}$ such that $f(x)=g(x_J)$ for all $x\in\{0,1\}^n$.

We say a function $f\colon\{0,1\}^n\to\{0,1\}$ is $\varepsilon$-close to a $j$-junta if there is a $j$-junta $g\colon\{0,1\}^n\to\{0,1\}$ such that $\Pr_x[f(x)\neq g(x)]\leqslant\varepsilon$.

On the one hand, the quantity $\mathbb{E}_x\left[\sqrt{s_f(x)}\right]$ may be constant while the function $f$ is very far from being a junta, as can be seen by taking $f$ to be the majority function. On the other hand, if we look at the expected sensitivity, that is $\mathbb{E}_x[s_f(x)]$, a well known result of Friedgut [Fri98] asserts that if $\mathbb{E}_x[s_f(x)]=O(1)$, then $f$ is close to a junta. More precisely, Friedgut’s result asserts that if $\mathbb{E}_x[s_f(x)]\leqslant A$, then for all $\varepsilon>0$, $f$ is $\varepsilon$-close to a $2^{O(A/\varepsilon)}$-junta. In light of this contrast, it is interesting to ask what happens if an intermediate moment of the sensitivity, say the $p$-th moment for $p\in(1/2,1)$, is constant.

We prove a strengthening of Friedgut’s junta theorem [Fri98] addressing this case, morally showing that for any $p>1/2$, a bounded $p$-th moment of the sensitivity implies closeness to a junta. More precisely:

Theorem 1.5.

For all $\eta>0$ there is $C=C(\eta)>0$ such that the following holds. For all $\varepsilon>0$, $A>0$ and $1/2+\eta\leqslant p\leqslant 1$, if $f\colon\{0,1\}^n\to\{0,1\}$ is a function such that $\mathbb{E}_x[s_f(x)^p]\leqslant A$, then $f$ is $\varepsilon$-close to a $J$-junta, where $J=2^{C(A/\varepsilon)^{1/p}}$.

Theorem 1.5 is tight, and can be seen as a generalization of Friedgut’s junta theorem (corresponding to the case $p=1$) that is somewhat close in spirit to Bourgain’s junta theorem. Indeed, for all $p\geqslant 1/2$ the tribes function shows it to be tight. Here, the tribes function is the function in which we take a partition $P_1\cup\ldots\cup P_m$ of $[n]$ and define $\mathsf{Tribes}(x)=1$ if there is $i$ such that $x_{P_i}=\vec{1}$, and $\mathsf{Tribes}(x)=0$ otherwise. We take the $P_i$ of equal size so that the probability that $\mathsf{Tribes}(x)=1$ is roughly $1/2$; the correct choice of parameters is $|P_i|=\log n-\log\log n+c$ and $m=\frac{n}{\log n-\log\log n+c}$, where $c=O(1)$. For this function one has $\mathbb{E}_x[s_{\mathsf{Tribes}}(x)^p]=\Theta((\log n)^p)$. Indeed, as $s_{\mathsf{Tribes}}(x)=\Theta(\log n)$ with constant probability we have the lower bound, and the upper bound follows from Jensen’s inequality and the fact that the total influence of $\mathsf{Tribes}$ is $\Theta(\log n)$.
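To get a feel for the tightness claim, one can estimate $\mathbb{E}_x[s_{\mathsf{Tribes}}(x)^p]$ by sampling. The sketch below uses a fixed tribe width rather than the precisely tuned $|P_i|=\log n-\log\log n+c$, so its output is only indicative:

```python
import random

def tribes_sensitivity(x, w):
    """Sensitivity of Tribes (an OR of ANDs over blocks of width w) at input x."""
    blocks = [x[i:i + w] for i in range(0, len(x), w)]
    full = sum(all(b) for b in blocks)     # number of all-ones tribes
    if full >= 2:
        return 0                            # no single flip can change the OR
    if full == 1:
        return w                            # every bit of the unique full tribe is pivotal
    # f(x) = 0: pivotal bits are the single zeros of tribes missing exactly one 1
    return sum(w - sum(b) == 1 for b in blocks)

def sensitivity_moment(n, w, p, samples=100_000, seed=0):
    rng = random.Random(seed)
    return sum(tribes_sensitivity(tuple(rng.randint(0, 1) for _ in range(n)), w) ** p
               for _ in range(samples)) / samples

# 146 tribes of width 7 on n = 1022 bits; E[s^p] should scale like (log n)^p.
print(sensitivity_moment(1022, 7, 0.6))
```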

As a consequence, Theorem 1.5 also implies a version of the KKL theorem for $p$-th moments of the sensitivity:

Corollary 1.6.

For all $\delta>0$ there is $c>0$ such that the following holds for all $p\in[1/2+\delta,1]$. Suppose that a function $f\colon\{0,1\}^n\to\{0,1\}$ satisfies $\mathbb{E}_x[s_f(x)^p]\leqslant A$. Then there exists $i\in[n]$ such that

\[ I_i[f]\geqslant 2^{-c(A/\mathsf{var}(f))^{1/p}}. \]
Proof.

Assume without loss of generality that $\mathbb{E}[f]\leqslant\frac{1}{2}$ (otherwise we work with $1-f$). Take $\varepsilon=\mathbb{E}[f]/10$ and apply Theorem 1.5 to get that $f$ is $\varepsilon$-close to a function $g$ which is a $J$-junta for $J=2^{\Theta((A/\mathsf{var}(f))^{1/p})}$, and let $\mathcal{J}$ be the set of coordinates that $g$ depends on. Choose $x\in\{0,1\}^n$ uniformly, and take $y\in\{0,1\}^n$ where $y_{\overline{\mathcal{J}}}=x_{\overline{\mathcal{J}}}$ and $y_{\mathcal{J}}$ is sampled uniformly from $\{0,1\}^{\mathcal{J}}$. Then $\Pr_{x,y}[g(x)=0,\,g(y)=1]=\mathbb{E}[g](1-\mathbb{E}[g])$; also, $f(x)=g(x)$ and $f(y)=g(y)$ hold with probability at least $1-2\varepsilon$, hence we conclude that

\[ \Pr_{x,y}\left[f(x)=0,\,f(y)=1\right]\geqslant\mathbb{E}[g](1-\mathbb{E}[g])-2\varepsilon\geqslant\frac{1}{2}(\mathbb{E}[f]-\varepsilon)-2\varepsilon\geqslant\frac{1}{4}\mathbb{E}[f]. \]

On the other hand, write $\mathcal{J}=\{i_1,\ldots,i_J\}$ and consider a path $x(0),\ldots,x(J)$ where $x(0)=x$, $x(J)=y$ and $x(t),x(t+1)$ may differ only on coordinate $i_{t+1}$. Then by the union bound,

\[ \frac{1}{4}\mathbb{E}[f]\leqslant\Pr_{x,y}\left[f(x)=0,\,f(y)=1\right]\leqslant\sum_{t=0}^{J-1}\Pr_{x,y}\left[f(x(t))=0,\,f(x(t+1))=1\right]=\sum_{t=0}^{J-1}I_{i_{t+1}}[f]\leqslant J\max_i I_i[f], \]

and the result follows. ∎
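The path used in the union bound above is a standard hybrid argument: interpolate from $x$ to $y$ one junta coordinate at a time. A small sketch of the interpolation (names are ours):

```python
def hybrid_path(x, y, coords):
    """x(0) = x, x(t) agrees with y on the first t coordinates of coords,
    and consecutive points differ in at most one coordinate."""
    path, cur = [tuple(x)], list(x)
    for i in coords:
        cur[i] = y[i]
        path.append(tuple(cur))
    return path

print(hybrid_path((0, 0, 0, 0), (1, 0, 1, 0), [0, 2]))
# [(0, 0, 0, 0), (1, 0, 0, 0), (1, 0, 1, 0)]
```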

We note that the KKL theorem is this statement for $p=1$, and the version for $p<1$ is only stronger than the KKL theorem, as always $\mathbb{E}_x[s_f(x)^p]\leqslant\mathbb{E}_x[s_f(x)]^p$.

Remark 1.7.

We remark that a naive application of our method establishes a sub-optimal dependency between the parameters $A$ and $J$ in Theorem 1.5, but gives an interesting relation to the noise sensitivity theorem of Bourgain [Bou02]. More precisely, in Corollary 3.11 we show that if $f\colon\{0,1\}^n\to\{0,1\}$ satisfies $\mathbb{E}_x[s_f(x)^p]\leqslant A$, then $f$ must be noise stable. More specifically, for $\varepsilon>0$, we define the $(1-\varepsilon)$-correlated distribution over pairs $(x,y)$ by taking $x\in\{0,1\}^n$ uniformly, and for each $i\in[n]$ independently setting $y_i=x_i$ with probability $1-\varepsilon$ and otherwise resampling $y_i\in\{0,1\}$ uniformly. In this language, we show that if $\mathbb{E}_x[s_f(x)^p]\leqslant A$ and $p<1$, then for all $\varepsilon>0$,

\[ \Pr_{x,y\ (1-\varepsilon)\text{-correlated}}\left[f(x)=f(y)\right]\geqslant 1-O_p(A\cdot\varepsilon^{p}). \]

From this, one may establish a weaker variant of Theorem 1.5 by appealing to Bourgain’s result [Bou02] (see also [KKO18]). The version of that theorem from [KKO18, Theorem 3.5] asserts that if the Fourier weight of a function $f$ above level $k$ is at most $\xi/\sqrt{k}$, then the function is $(1+o(1))\xi$-close to a $J=2^{O(k)}/\xi^4$-junta. To use this result, one may observe that the noise stability bound above implies that for $k=1/\varepsilon$, the Fourier weight of $f$ above level $k$ is at most $O_p(A\varepsilon^p)=O_p\left(\frac{A\varepsilon^{p-1/2}}{\sqrt{k}}\right)$, hence by [KKO18, Theorem 3.5], $f$ is $O(A\varepsilon^{p-1/2})$-close to a $J=\frac{2^{O(1/\varepsilon)}}{A\varepsilon^{p-1/2}}$-junta for all $\varepsilon>0$. In particular, setting $\varepsilon=\Theta((\delta/A)^{2/(2p-1)})$, we get that $f$ is $\delta$-close to a $J=2^{O((A/\delta)^{2/(2p-1)})}$-junta.

1.5 Proof Overview

In this section we give an outline of the proof of Theorems 1.2 and 1.3. Both results rely on the same simple observation, which, once combined with off-the-shelf Fourier concentration results, finishes the proof.

Given $f\colon\{0,1\}^n\to\{0,1\}$, we consider its Fourier coefficients $\widehat{f}(S)=\mathbb{E}_{x\in\{0,1\}^n}\left[f(x)\prod_{i\in S}(-1)^{x_i}\right]$ as well as its gradient $\nabla f\colon\{0,1\}^n\to\{-1,0,1\}^n$ defined by $\nabla f(x)=(f(x\oplus e_i)-f(x))_{i=1,\ldots,n}$. We first observe that the square root of the sensitivity of $f$ at $x$ is nothing but $\|\nabla f(x)\|_2$, and also that the average of the absolute value of the $i$-th entry of $\nabla f$, namely of the function $x\mapsto|f(x\oplus e_i)-f(x)|$, is at least the absolute value of the singleton Fourier coefficient $|\widehat{f}(\{i\})|$. This suggests that if $f$ has significant singleton Fourier coefficients, then the magnitude of the gradient should be large, and therefore the expected root of the sensitivity should also be large. Indeed, a simple application of the triangle inequality gives

\[ \mathbb{E}_{x\in\{0,1\}^n}\left[\|\nabla f(x)\|_2\right]\geqslant\left\|\mathbb{E}_{x\in\{0,1\}^n}\left[|\nabla f(x)|\right]\right\|_2\geqslant\left\|\left(|\widehat{f}(\{1\})|,\ldots,|\widehat{f}(\{n\})|\right)\right\|_2, \]

which is equal to $\sqrt{\sum_{i=1}^n\widehat{f}(\{i\})^2}$. As the sum of the squares of all of the Fourier coefficients is at most $1$, we get that $\mathbb{E}_{x\in\{0,1\}^n}\left[\|\nabla f(x)\|_2\right]\geqslant\sum_{i=1}^n\widehat{f}(\{i\})^2$. In words, via a simple application of the triangle inequality we managed to lower bound the quantity we are interested in by the level $1$ Fourier weight of the function $f$. This is already a rather interesting isoperimetric consequence, albeit a rather weak one.
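This level-$1$ bound (Lemma 3.1 below) can be checked numerically; a brute-force sketch, with all helpers our own:

```python
import itertools
from math import sqrt

def level1_check(f, n):
    """Compare E[||grad f(x)||_2] against 2*sqrt(W_1[f]) by exhaustive enumeration."""
    points = list(itertools.product([0, 1], repeat=n))
    def flip(x, i):
        return x[:i] + (1 - x[i],) + x[i + 1:]
    grad = sum(sqrt(sum(f(x) != f(flip(x, i)) for i in range(n)))
               for x in points) / 2**n
    # W_1[f] = sum_i fhat({i})^2, with fhat({i}) = E[f(x) * (-1)^{x_i}]
    w1 = sum((sum(f(x) * (-1) ** x[i] for x in points) / 2**n) ** 2 for i in range(n))
    return grad, 2 * sqrt(w1)

# Majority of 3: E[sqrt(s_f)] ~ 1.06 versus 2*sqrt(W_1) ~ 0.87.
print(level1_check(lambda x: int(sum(x) >= 2), 3))
```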

We next show how to bootstrap this basic result into Theorems 1.2 and 1.3 via random restrictions. To get some intuition, suppose that we knew our function $f$ has significant weight on level $d$ (where $d>1$); can we utilize the above logic to say anything of interest about lower bounds on $\mathbb{E}_{x\in\{0,1\}^n}\left[\|\nabla f(x)\|_2\right]$? The answer is yes, and one way to do so is via random restrictions. If $f$ has considerable weight on level $d$, we choose a subset of coordinates $J\subseteq[n]$ by independently including each coordinate in it with probability $p=1/d$, and then restrict the coordinates inside $\overline{J}$ to a uniformly chosen value $z\in\{0,1\}^{\overline{J}}$. In that case, we expect the weight of $f$ to collapse from level $d$ to roughly level $pd$, which by the choice of the parameter $p$ is roughly the first Fourier level. At that point we could apply the above logic to the restricted function and get some lower bound on the expected root of the sensitivity of the restricted function. But how does this last quantity relate to $\mathbb{E}_{x\in\{0,1\}^n}\left[\|\nabla f(x)\|_2\right]$?

Thinking about it for a moment, for a fixed point $x$, if we choose $J$ randomly as above and fix all coordinates inside $\overline{J}$, then only roughly a $1/d$ fraction of the coordinates sensitive at $x$ remain alive, hence we expect the sensitivity to drop by a factor of $1/d$ as a result of this restriction. Indeed, it can be shown that under random restrictions, the quantity $\mathbb{E}_{x\in\{0,1\}^n}\left[\|\nabla f(x)\|_2\right]$ drops by a factor of $1/\sqrt{d}$. Thus, combining this observation with the above reasoning suggests that we should get a lower bound of

\[ \mathbb{E}_{x\in\{0,1\}^n}\left[\|\nabla f(x)\|_2\right]\geqslant\Omega\left(\sqrt{d}\sum_{d\leqslant|S|\leqslant 2d}\widehat{f}(S)^2\right), \]

and indeed this is the bound we establish.

Theorems 1.2 and 1.3 then follow from known Fourier tail bounds. Specifically, it is known that (1) if $f$ has small variance, then most of its Fourier mass lies above level $d=\Omega(\log(1/\mathsf{var}(f)))$, and (2) if $f$ has small influences, then most of its Fourier mass lies above level $d=\Omega\left(\log\left(1/\sum_{i=1}^n I_i[f]^2\right)\right)$.

2 Preliminaries

Notation.

We will use big $O$ notation throughout the paper. Sometimes it will be easier for us to use the $\lesssim$ and $\gtrsim$ notations; when we write $X\lesssim Y$ we mean that there is an absolute constant $C>0$ such that $X\leqslant CY$. Analogously, when we write $X\gtrsim Y$ we mean that there is an absolute constant $C>0$ such that $X\geqslant CY$.

2.1 Fourier Analysis over the Hypercube

We consider the space of real-valued functions $f\colon\{0,1\}^n\to\mathbb{R}$, equipped with the inner product $\langle f,g\rangle=\mathbb{E}_{x\in_R\{0,1\}^n}\left[f(x)g(x)\right]$. It is well-known that the collection of functions $\{\chi_S\}_{S\subseteq[n]}$ defined by $\chi_S(x)=\prod_{i\in S}(-1)^{x_i}$ forms an orthonormal basis, so any function $f$ may be written uniquely as $f(x)=\sum_{S\subseteq[n]}\widehat{f}(S)\chi_S(x)$, where $\widehat{f}(S)=\langle f,\chi_S\rangle$. We also consider $L_p$ norms for $p\geqslant 1$, defined as $\|f\|_p=\left(\mathbb{E}_x\left[|f(x)|^p\right]\right)^{1/p}$.
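As a sanity check on the notation, the full Fourier expansion can be computed by brute force for small $n$; a minimal sketch (quadratic in $2^n$, for illustration only):

```python
import itertools

def fourier_coefficients(f, n):
    """fhat(S) = <f, chi_S> = E_x[f(x) * prod_{i in S} (-1)^{x_i}], for every S."""
    points = list(itertools.product([0, 1], repeat=n))
    coeffs = {}
    for r in range(n + 1):
        for S in itertools.combinations(range(n), r):
            coeffs[S] = sum(f(x) * (-1) ** sum(x[i] for i in S)
                            for x in points) / 2**n
    return coeffs

coeffs = fourier_coefficients(lambda x: int(sum(x) >= 2), 3)
# Parseval: sum of squared coefficients equals E[f^2] = 1/2 for majority of 3.
print(sum(c**2 for c in coeffs.values()))  # 0.5
```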

For $z\in\{0,1\}^n$, we denote by $z_{-i}\in\{0,1\}^{n-1}$ the vector resulting from dropping the $i$-th coordinate of $z$. We denote by $(x_i=1,x_{-i}=z_{-i})$ the $n$-bit vector which is $1$ in the $i$-th coordinate and equal to $z$ in every other coordinate.

Definition 2.1.

The derivative of $f$ along direction $i\in[n]$ is $\partial_i f\colon\{0,1\}^n\to\mathbb{R}$ defined by $\partial_i f(z)=f(x_i=0,x_{-i}=z_{-i})-f(x_i=1,x_{-i}=z_{-i})$. The gradient of $f$, $\nabla f\colon\{0,1\}^n\to\mathbb{R}^n$, is defined as $\nabla f(z)=(\partial_1 f(z),\ldots,\partial_n f(z))$.

Note that for a Boolean function $f$ and each $x\in\{0,1\}^n$, $\partial_i f(x)\neq 0$ if and only if the coordinate $i$ is sensitive at $x$, in which case its value is either $1$ or $-1$. Thus, $\|\nabla f(x)\|_2=\sqrt{s_f(x)}$, and it will often be convenient for us to use the $\ell_2$-norm of the gradient of $f$ in place of $\sqrt{s_f(x)}$.

Fact 2.2.

Let $f\colon\{0,1\}^n\to\mathbb{R}$, and let $i\in[n]$. Then $\partial_i f(x)=2\sum_{S\ni i}\widehat{f}(S)\chi_{S\setminus\{i\}}(x)$. In particular, we have $\mathbb{E}_x[\partial_i f(x)]=2\widehat{f}(\{i\})$.

Next, we define the level $d$ Fourier weight of a function $f$, the approximate level $d$ weight of $f$, and the Fourier weight of $f$ up to / above level $d$.

Definition 2.3.

The level $d$ Fourier weight of $f$ is defined as $W_{=d}[f]=\sum_{|S|=d}\widehat{f}(S)^2$. The approximate level $d$ weight of $f$ is defined as $W_{\approx d}[f]=\sum_{d\leqslant j<2d}W_{=j}[f]$. The weight of $f$ up to level $d$ is defined as $W_{\leqslant d}[f]=\sum_{|S|\leqslant d}\widehat{f}(S)^2$, and the weight of $f$ above level $d$ is defined as $W_{>d}[f]=\sum_{|S|>d}\widehat{f}(S)^2$.

For a function $f\colon\{0,1\}^n\to\mathbb{R}$, we also define the level at most $d$ part of $f$, denoted $f^{\leqslant d}$, as $f^{\leqslant d}(x)=\sum_{|S|\leqslant d}\widehat{f}(S)\chi_S(x)$. We note that by Parseval’s equality we have $\|f^{\leqslant d}\|_2^2=W_{\leqslant d}[f]$.
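The weights $W_{=d}[f]$ are then immediate to tabulate; a small sketch reusing the `fourier_coefficients` helper from the sketch above (again ours, not the paper’s):

```python
def level_weights(coeffs, n):
    """W_{=d}[f] for d = 0, ..., n, from a dictionary {S: fhat(S)}."""
    W = [0.0] * (n + 1)
    for S, c in coeffs.items():
        W[len(S)] += c**2
    return W

W = level_weights(fourier_coefficients(lambda x: int(sum(x) >= 2), 3), 3)
print(W)  # majority of 3: [0.25, 0.1875, 0.0, 0.0625], so W_{>1}[f] = 0.0625
```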

2.2 Random Restrictions

For a function $f\colon\{0,1\}^n\to\mathbb{R}$, a set of coordinates $J\subseteq[n]$ and $z\in\{0,1\}^{\overline{J}}$, we define the restriction of $f$ according to $(J,z)$ as the function $f_{\overline{J}\rightarrow z}\colon\{0,1\}^J\to\mathbb{R}$ defined by

\[ f_{\overline{J}\rightarrow z}(y)=f(x_{\overline{J}}=z,\,x_J=y). \]

We refer to the coordinates of $J$ as “alive”, and to the coordinates of $\overline{J}$ as fixed. We will often be interested in random restrictions of a function $f$. By that, we mean that the set $J$ is chosen randomly by including each $i\in[n]$ in it independently with some probability (which is specified), and $z\in\{0,1\}^{\overline{J}}$ is chosen uniformly. We have the following fact, which establishes a relation between the approximate level $d$ weight of $f$ and the level $1$ weight of a random restriction of $f$ in which each $i\in[n]$ is included in $J$ with probability $\frac{1}{2d}$ (see [KKO18, Fact 4.1] and [KKK+, Section 3.1]).
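Random restrictions are simple to implement; the sketch below (our own illustration) draws $(J,z)$ with alive-probability $\frac{1}{2d}$ and returns the restricted function:

```python
import random

def random_restriction(f, n, d, rng=random.Random(0)):
    """Draw (J, z): each coordinate joins J with probability 1/(2d); the
    remaining coordinates are fixed to uniformly random bits."""
    J = [i for i in range(n) if rng.random() < 1 / (2 * d)]
    z = {i: rng.randint(0, 1) for i in range(n) if i not in J}
    def restricted(y):
        # y is a tuple indexed by the alive coordinates, in the order of J
        assignment = dict(zip(J, y)) | z
        return f(tuple(assignment[i] for i in range(n)))
    return J, restricted

J, g = random_restriction(lambda x: int(sum(x) % 2), 10, 2)
print(len(J), g(tuple(0 for _ in J)))  # restricted parity = parity of the fixed bits z
```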

Fact 2.4.

Let $f\colon\{0,1\}^n\to\mathbb{R}$ and $d\in\mathbb{N}$, and let $(J,z)$ be a random restriction where each $j\in[n]$ is included in $J$ with probability $\frac{1}{2d}$, and $z\in\{0,1\}^{\bar{J}}$ is chosen uniformly. Then $\mathbb{E}_{J,z}\left[W_{=1}[f_{\bar{J}\rightarrow z}]\right]\gtrsim W_{\approx d}[f]$.

Proof.

For a fixed $J$ and $T\subseteq J$, we have $\widehat{f_{\bar{J}\rightarrow z}}(T)=\sum_{S\subseteq\bar{J}}\widehat{f}(T\cup S)\chi_S(z)$. Thus,

\[ \mathbb{E}_{J,z}\left[W_{=1}[f_{\bar{J}\rightarrow z}]\right]=\mathbb{E}_J\left[\mathbb{E}_z\left[\sum_{j\in J}\Big(\sum_{S\subseteq\bar{J}}\widehat{f}(S\cup\{j\})\chi_S(z)\Big)^2\right]\right]=\mathbb{E}_J\left[\sum_{j\in J}\sum_{S\subseteq\bar{J}}\widehat{f}(S\cup\{j\})^2\right], \]

where the last transition follows by pushing the expectation over $z$ inside and using Parseval’s equality. It follows that

\[ \mathbb{E}_{J,z}\left[W_{=1}[f_{\bar{J}\rightarrow z}]\right]=\mathbb{E}_J\left[\sum_T\widehat{f}(T)^2\,1_{|T\cap J|=1}\right]=\sum_T\widehat{f}(T)^2\,\Pr_J\left[|T\cap J|=1\right], \]

and as $\Pr_J\left[|T\cap J|=1\right]\gtrsim 1$ for all $T$ whose size is between $d$ and $2d$, and the rest of the summands are all non-negative, the proof is concluded. ∎

2.3 Bonami’s Lemma

The degree of a function $f\colon\{0,1\}^n\to\mathbb{R}$ is defined as $\mathsf{deg}(f)=\max\{|S|\mid\widehat{f}(S)\neq 0\}$. We will need the following consequence of the Bonami-Beckner-Gross hypercontractive inequality [Bec75, Bon70, Gro75] (see also [O’D14]), showing that the $4$-norm of a low-degree function $f$ is comparable to its $2$-norm.

Theorem 2.5.

Let $f\colon\{0,1\}^n\to\mathbb{R}$ be a function of degree at most $d$. Then $\|f\|_4\leqslant\sqrt{3}^{\,d}\|f\|_2$.

3 Isoperimetric Inequalities on the Hypercube

In this section, we prove Theorems 1.2 and 1.3. We begin with the following very crude bound on the Talagrand boundary, and then improve it using random restrictions.

Lemma 3.1.

Let $f\colon\{0,1\}^n\to\mathbb{R}$. Then $\mathbb{E}\left[\|\nabla f(x)\|_2\right]\geqslant 2\sqrt{W_{=1}[f]}$.

Proof.

By the triangle inequality and Fact 2.2 we have

\[ \mathbb{E}_x\left[\|\nabla f(x)\|_2\right]\geqslant\left\|\mathbb{E}_x\left[\nabla f(x)\right]\right\|_2=\left\|2\big(\widehat{f}(\{1\}),\ldots,\widehat{f}(\{n\})\big)\right\|_2=2\sqrt{\sum_{i=1}^n\widehat{f}(\{i\})^2}=2\sqrt{W_{=1}[f]}. \] ∎
Lemma 3.2.

Let $d\in\mathbb{N}$ and $f\colon\{0,1\}^n\to\{0,1\}$. Then $\mathbb{E}\left[\|\nabla f(x)\|_2\right]\gtrsim\sqrt{d}\,W_{\approx d}[f]$.

Proof.

Let $(J,z)$ be a random restriction for $f$, where each $j\in[n]$ is included in $J$ independently with probability $\frac{1}{2d}$, and $z\in\{0,1\}^{\bar{J}}$ is sampled uniformly. Fix $x\in\{0,1\}^n$, and note that

\[ \mathbb{E}_J\left[\|\nabla f_{\bar{J}\rightarrow x_{\bar{J}}}(x_J)\|_2\right]\leqslant\sqrt{\mathbb{E}_J\left[\|\nabla f_{\bar{J}\rightarrow x_{\bar{J}}}(x_J)\|_2^2\right]}=\sqrt{\mathbb{E}_J\left[\sum_j|\partial_j f(x)|^2\,1_{j\in J}\right]}=\frac{\|\nabla f(x)\|_2}{\sqrt{2d}}. \]

Therefore,

\[ \frac{1}{\sqrt{2d}}\,\mathbb{E}\left[\|\nabla f(x)\|_2\right]\geqslant\mathbb{E}_{J,z}\left[\mathbb{E}_{x\in\{0,1\}^J}\left[\|\nabla f_{\bar{J}\rightarrow z}(x)\|_2\right]\right]. \tag{2} \]

On the other hand, for each $J$ and $z\in\{0,1\}^{\bar{J}}$ we may apply Lemma 3.1 to $f_{\bar{J}\rightarrow z}$ to get that

\[ \mathbb{E}_{x\in\{0,1\}^J}\left[\|\nabla f_{\bar{J}\rightarrow z}(x)\|_2\right]\gtrsim\sqrt{W_{=1}[f_{\bar{J}\rightarrow z}]}\geqslant W_{=1}[f_{\bar{J}\rightarrow z}]. \]

Taking expectation over $J$ and $z$ we get that

\[ \mathbb{E}_{J,z}\left[\mathbb{E}_{x\in\{0,1\}^J}\left[\|\nabla f_{\bar{J}\rightarrow z}(x)\|_2\right]\right]\gtrsim\mathbb{E}_{J,z}\left[W_{=1}[f_{\bar{J}\rightarrow z}]\right]\gtrsim W_{\approx d}[f], \]

where the last inequality is by Fact 2.4. Combining this with equation (2) finishes the proof. ∎

Lemma 3.3.

Let $d\in\mathbb{N}$ and $f\colon\{0,1\}^n\to\{0,1\}$. Then $\mathbb{E}\left[\|\nabla f(x)\|_2\right]\gtrsim\sqrt{d}\,W_{\geqslant d}[f]$.

Proof.

Applying Lemma 3.2 with $d2^j$ in place of $d$ for $j=0,1,2,\ldots$, we get that

\[ 2^{-j/2}\,\mathbb{E}\left[\|\nabla f(x)\|_2\right]\gtrsim\sqrt{d}\,W_{\approx 2^j d}[f], \]

and summing over $j$ yields the result. ∎

3.1 Proof of Theorem 1.2

If $\mathsf{var}(f)\geqslant 2^{-16}$, then it is enough to prove a lower bound of $\gtrsim\mathsf{var}(f)$. Applying Lemma 3.2 with $d$ ranging over powers of $2$ between $1$ and $n$, we get that

\[ \sum_{j=0}^{\log n}\frac{1}{\sqrt{2^j}}\,\mathbb{E}\left[\|\nabla f(x)\|_2\right]\gtrsim\sum_{j=0}^{\log n}W_{\approx 2^j}[f]=\mathsf{var}(f), \]

and by summing the geometric series, the left hand side is $\lesssim\mathbb{E}\left[\|\nabla f(x)\|_2\right]$.

We thus assume that $\mathsf{var}(f)\leqslant 2^{-16}$, and without loss of generality the majority value of $f$ is $0$. Thus, $\mathbb{E}[f]\leqslant 2\,\mathsf{var}(f)$. Setting $d=\frac{1}{8}\log(1/\mathsf{var}(f))$, we have that

\[ \sum_{0<|S|\leqslant d}\widehat{f}(S)^2\leqslant\|f^{\leqslant d}\|_2^2=\langle f,f^{\leqslant d}\rangle\leqslant\|f\|_{4/3}\|f^{\leqslant d}\|_4\leqslant\sqrt{3}^{\,d}\,\mathbb{E}[f]^{3/4}\|f\|_2\leqslant 2^d\,\mathbb{E}[f]^{5/4}\leqslant 0.9\,\mathsf{var}(f), \]

where we used the bound $\|f^{\leqslant d}\|_4\leqslant\sqrt{3}^{\,d}\|f^{\leqslant d}\|_2\leqslant\sqrt{3}^{\,d}\|f\|_2$, which follows from Theorem 2.5 and Parseval’s equality. Therefore, $W_{\geqslant d}[f]\geqslant 0.1\,\mathsf{var}(f)$, and we conclude the result by Lemma 3.3. ∎

3.2 Proof of Theorem 1.3

Denote $M[f]=\sum_{i=1}^n I_i[f]^2$; we need the following result due to [Tal96, BKS99, KK13] (the precise version we use is due to Keller and Kindler [KK13]).

Theorem 3.4.

There are $c_1,c_2>0$ such that for all $f\colon\{0,1\}^n\to\{0,1\}$ we have

\[ W_{\leqslant c_1\log(1/M[f])}[f]\leqslant M[f]^{c_2}. \]

We may now give the proof of Theorem 1.3.

Proof of Theorem 1.3.

If $M[f]\geqslant\mathsf{var}(f)^{2/c_2}$, then the result follows from Theorem 1.2. Otherwise, fix $c_1,c_2$ from Theorem 3.4 and set $d=c_1\log(1/M[f])$. By Theorem 3.4 we have $W_{\leqslant d}[f]\leqslant M[f]^{c_2}\leqslant\mathsf{var}(f)^2\leqslant\frac{1}{2}\mathsf{var}(f)$, and so $W_{>d}[f]\geqslant\frac{1}{2}\mathsf{var}(f)$, and the result follows from Lemma 3.3. ∎

We remark that Theorem 1.3 implies a stability result for the KKL theorem. Namely, if all influences of a Boolean function are at most $O(\log n/n)$, then the set of points at which the gradient of $f$ is large must have measure $\Omega(\mathsf{var}(f))$. More precisely:

Corollary 3.5.

For any $K>0$ there are $c,\delta>0$ such that the following holds. Suppose that for $f\colon\{0,1\}^n\to\{0,1\}$ we have $\max_i I_i[f]\leqslant K\,\mathsf{var}(f)\frac{\log n}{n}$. Then

\[ \Pr_x\left[\|\nabla f(x)\|_2\geqslant c\,\mathsf{var}(f)\sqrt{\log n}\right]\geqslant\delta\,\mathsf{var}(f). \]
Proof.

Note that by assumption, $M[f]\leqslant K^2\frac{\log^2 n}{n}\leqslant\frac{K^2}{\sqrt{n}}$, so Theorem 1.3 implies that $Z\stackrel{def}{=}\mathbb{E}_x\left[\|\nabla f(x)\|_2\right]\geqslant K'\,\mathsf{var}(f)\sqrt{\log n}$ for some $K'$ depending only on $K$. Also, we have that

\[ \mathbb{E}_x\left[\|\nabla f(x)\|_2^2\right]=I_1[f]+\ldots+I_n[f]\leqslant K\,\mathsf{var}(f)\log n. \]

It follows from the Paley-Zygmund inequality (which asserts that $\Pr[Z'\geqslant\theta\,\mathbb{E}[Z']]\geqslant(1-\theta)^2\frac{\mathbb{E}[Z']^2}{\mathbb{E}[Z'^2]}$ for a non-negative random variable $Z'$, applied here with $\theta=1/2$) that

\[ \Pr\left[\|\nabla f(x)\|_2\geqslant\tfrac{1}{2}Z\right]\geqslant\frac{1}{4}\frac{Z^2}{\mathbb{E}_x\left[\|\nabla f(x)\|_2^2\right]}\geqslant\frac{1}{4}\frac{K'^2}{K}\,\mathsf{var}(f), \]

and the claim is proved with $c=\frac{1}{2}K'$ and $\delta=\frac{1}{4}\frac{K'^2}{K}$. ∎

3.3 Extensions

3.3.1 Other $L_p$-norms

Lemma 3.1 applies not only to the $L_2$-norm, but in fact to any $L_p$-norm:

Lemma 3.6.

Let $1\leqslant p\leqslant 2$ and $f\colon\{0,1\}^n\to\{0,1\}$. Then $\mathbb{E}\left[\|\nabla f(x)\|_p\right]\geqslant W_{=1}[f]^{1/p}$.

Proof.

By Jensen’s inequality and Lemma 3.1 we have

\[ \mathbb{E}_x\left[\|\nabla f(x)\|_p\right]=\mathbb{E}_x\left[\|\nabla f(x)\|_2^{2/p}\right]\geqslant\mathbb{E}_x\left[\|\nabla f(x)\|_2\right]^{2/p}\geqslant W_{=1}[f]^{1/p}. \] ∎

As in Lemma 3.2, the previous lemma quickly leads to the following conclusion.

Lemma 3.7.

Let $d\in\mathbb{N}$, $\alpha\in[1/2,1]$ and $f\colon\{0,1\}^n\to\{0,1\}$. Then $\mathbb{E}\left[s_f(x)^{\alpha}\right]\gtrsim d^{\alpha}\,W_{\geqslant d}[f]$.

Using the same argument as before and replacing invocations of Lemma 3.3 with Lemma 3.7, one may deduce variants of Theorems 1.2 and 1.3 in which the square root is replaced by any power $p\in[1/2,1]$. Below are the precise statements.

Theorem 3.8.

There exists an absolute constant $c>0$ such that for all non-constant functions $f\colon\{0,1\}^n\to\{0,1\}$ and all $p\in[1/2,1]$ we have

\[ \mathbb{E}_x\left[s_f(x)^p\right]\geqslant c\cdot\mathsf{var}(f)\left(\log\left(\frac{1}{\mathsf{var}(f)}\right)\right)^{p}. \]
Proof.

The proof is identical to the proof of Theorem 1.2 where we use Lemma 3.7 instead of Lemma 3.3. ∎

Theorem 3.9.

There exists an absolute constant $c>0$ such that for all non-constant functions $f\colon\{0,1\}^n\to\{0,1\}$ and all $p\in[1/2,1]$ we have

\[ \mathbb{E}_x\left[s_f(x)^p\right]\geqslant c\cdot\mathsf{var}(f)\left(\log\left(1+\frac{1}{\sum_{i=1}^n I_i[f]^2}\right)\right)^{p}. \]
Proof.

The proof is identical to the proof of Theorem 1.3 where we use Lemma 3.7 instead of Lemma 3.3. ∎

3.3.2 Talagrand Boundary and Noise Sensitivity

We next demonstrate the connection between having small Talagrand-type boundary and being noise stable, a notion that we define next.

For a parameter $\rho\in[0,1]$ and a point $x\in\{0,1\}^n$, the distribution of inputs $\rho$-correlated with $x$, denoted $y\sim_{\rho}x$, is defined by the following randomized process: for each $i\in[n]$ independently, set $y_i=x_i$ with probability $\rho$, and otherwise sample $y_i\in\{0,1\}$ uniformly. The operator $T_{\rho}\colon L_2(\{0,1\}^n)\to L_2(\{0,1\}^n)$, known as the noise operator with noise rate $1-\rho$, is defined as

\[ T_{\rho}f(x)=\mathbb{E}_{y\sim_{\rho}x}\left[f(y)\right]. \]
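The noise operator acts diagonally on the Fourier basis, $T_\rho\chi_S=\rho^{|S|}\chi_S$, and is also easy to realize by sampling; a small Monte-Carlo sketch (ours) estimating $\langle f,T_\rho f\rangle$ for a $0/1$-valued $f$:

```python
import random

def noisy_copy(x, rho, rng):
    """Sample y ~_rho x: keep each bit with probability rho, else resample."""
    return tuple(xi if rng.random() < rho else rng.randint(0, 1) for xi in x)

def stability(f, n, rho, samples=200_000, seed=0):
    """Monte-Carlo estimate of <f, T_rho f> = E[f(x) f(y)] for 0/1-valued f."""
    rng = random.Random(seed)
    total = 0
    for _ in range(samples):
        x = tuple(rng.randint(0, 1) for _ in range(n))
        total += f(x) * f(noisy_copy(x, rho, rng))
    return total / samples

# Dictator f(x) = x_0: <f, T_rho f> = (1 + rho)/4, so rho = 0.9 gives ~0.475.
print(stability(lambda x: x[0], 5, 0.9))
```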
Definition 3.10.

For $\varepsilon>0$, the noise stability of $f$ with noise rate $\varepsilon$ is defined as $\mathsf{Stab}_{1-\varepsilon}(f)=\langle f,T_{1-\varepsilon}f\rangle$.

The relation established in Lemma 3.7 between the Fourier weight of $f$ and $\mathbb{E}\left[s_f(x)^{\alpha}\right]$ for $1/2\leqslant\alpha\leqslant 1$ implies a relation between the latter quantity and noise stability, as follows:

Corollary 3.11.

There exists an absolute constant $C>0$ such that for any $\delta\in[0,1/2]$ and any function $f\colon\{0,1\}^n\to\{0,1\}$ satisfying $\mathbb{E}\left[s_f(x)^{\frac{1}{2}+\delta}\right]\leqslant A$, we have $\mathsf{Stab}_{1-\varepsilon}(f)\geqslant 1-C\cdot A\cdot\varepsilon^{\frac{1}{2}+\delta}\log(1/\varepsilon)$.

Proof.
\[ 1-\mathsf{Stab}_{1-\varepsilon}(f)=\Pr_{x,y\ (1-\varepsilon)\text{-correlated}}\left[f(x)\neq f(y)\right]=2\langle 1-f,T_{1-\varepsilon}f\rangle=2\sum_{k=0}^n\big(1-(1-\varepsilon)^k\big)W_{=k}[f]. \]

Set $d=1/\varepsilon$; we split the sum into $k<d$ and $k\geqslant d$, and upper bound the contribution of each part separately. For $k\geqslant d$, using Lemma 3.7 we have

\[ 2\sum_{k=d}^n\big(1-(1-\varepsilon)^k\big)W_{=k}[f]\leqslant 2W_{\geqslant d}[f]\lesssim A\cdot d^{-(1/2+\delta)}=A\,\varepsilon^{1/2+\delta}. \]

For $k\leqslant d$, we write

\[ \sum_{k=0}^{d}\big(1-(1-\varepsilon)^k\big)W_{=k}[f]\leqslant\sum_{k=1}^{d}k\varepsilon\,W_{=k}[f]\leqslant\varepsilon\sum_{k=1}^{d}W_{\geqslant k}[f]. \]

Using Lemma 3.7 we get $W_{\geqslant k}[f]\lesssim A\cdot k^{-(1/2+\delta)}$, and so

\[ \sum_{k=0}^{d}\big(1-(1-\varepsilon)^k\big)W_{=k}[f]\lesssim A\cdot\varepsilon\sum_{k=1}^{d}\frac{1}{k^{1/2+\delta}}\lesssim A\cdot\varepsilon\,d^{1/2-\delta}\sum_{k=1}^{d}\frac{1}{k}\lesssim A\cdot\varepsilon\,d^{1/2-\delta}\log d. \] ∎
Remark 3.12.

Inspecting the proof of Corollary 3.11, we see that if $\delta$ is bounded away from $1/2$, say $\delta\leqslant 0.49$, then the $\log(1/\varepsilon)$ term may be removed. Indeed, at the end of the proof we may replace the upper bound $\sum_{k=1}^d\frac{1}{k^{1/2+\delta}}\lesssim d^{1/2-\delta}\log d$ by the bound $\sum_{k=1}^d\frac{1}{k^{1/2+\delta}}\lesssim d^{1/2-\delta}$, which holds whenever $\delta$ is bounded away from $1/2$.

3.4 Robust Versions

Fix a $2$-coloring $\mathsf{col}\colon E(\{0,1\}^n)\to\{\mathsf{red},\mathsf{blue}\}$. Let $\nabla_{\mathsf{red}}f(x)$ be the Boolean vector whose $i$-th coordinate is $1$ if $f(x)=1$ and $(x,x\oplus e_i)$ is a red sensitive edge, and let $\nabla_{\mathsf{blue}}f(x)$ be the Boolean vector whose $i$-th coordinate is $1$ if $f(x)=0$ and $(x,x\oplus e_i)$ is a blue sensitive edge. Analogously to Lemma 3.1 we have:

Lemma 3.13.

Let $f\colon\{0,1\}^n\to\{0,1\}$, and let $\mathsf{col}\colon E(\{0,1\}^n)\to\{\mathsf{red},\mathsf{blue}\}$ be any coloring. Then

\[ \mathbb{E}\left[\|\nabla_{\mathsf{red}}f(x)\|_2+\|\nabla_{\mathsf{blue}}f(x)\|_2\right]\geqslant\frac{1}{2}\sqrt{W_{=1}[f]}. \]
Proof.

By the triangle inequality, $\mathbb{E}_x\left[\|\nabla_{\mathsf{red}}f(x)\|_2\right]\geqslant\left\|\mathbb{E}_x\left[\nabla_{\mathsf{red}}f(x)\right]\right\|_2$, and looking at the vector $v=\mathbb{E}_x\left[\nabla_{\mathsf{red}}f(x)\right]$ we see that its $i$-th entry is $v_i=\Pr_x\left[f(x)=1,\ (x,x\oplus e_i)\text{ is a red influential edge}\right]$. Similarly, $\mathbb{E}_x\left[\|\nabla_{\mathsf{blue}}f(x)\|_2\right]\geqslant\left\|\mathbb{E}_x\left[\nabla_{\mathsf{blue}}f(x)\right]\right\|_2$, and looking at the vector $u=\mathbb{E}_x\left[\nabla_{\mathsf{blue}}f(x)\right]$ we see that its $i$-th entry is $u_i=\Pr_x\left[f(x)=0,\ (x,x\oplus e_i)\text{ is a blue influential edge}\right]$. It follows that

\[ \mathbb{E}\left[\|\nabla_{\mathsf{red}}f(x)\|_2+\|\nabla_{\mathsf{blue}}f(x)\|_2\right]\geqslant\|v\|_2+\|u\|_2\geqslant\|u+v\|_2. \]

Let $R_i(x)$ be the indicator that $(x,x\oplus e_i)$ is a red influential edge and $f(x)=1$, and $B_i(x)$ the indicator that $(x,x\oplus e_i)$ is a blue influential edge and $f(x)=0$. Then

\[ u_i+v_i=\frac{1}{2^{n+1}}\sum_x\big(R_i(x)+R_i(x\oplus e_i)+B_i(x)+B_i(x\oplus e_i)\big)\geqslant\frac{1}{2^{n+1}}\sum_x 1_{(x,x\oplus e_i)\text{ influential}}=\frac{1}{2}I_i[f]. \]

Hence

\[ \|u+v\|_2\geqslant\frac{1}{2}\sqrt{\sum_{i=1}^n I_i[f]^2}\geqslant\frac{1}{2}\sqrt{W_{=1}[f]}. \] ∎

Secondly, for any $f$, coloring $\mathsf{col}$ and $x\in\{0,1\}^n$, choosing $J\subseteq[n]$ randomly by including each coordinate independently with probability $\frac{1}{2d}$, one has

\[ \mathbb{E}_J\left[\sqrt{s_{f_{\bar{J}\rightarrow x_{\bar{J}}},\mathsf{red}}(x_J)}\right]\leqslant\sqrt{\mathbb{E}_J\left[s_{f_{\bar{J}\rightarrow x_{\bar{J}}},\mathsf{red}}(x_J)\right]}=\sqrt{\frac{s_{f,\mathsf{red}}(x)}{2d}}, \]

and similarly for the blue sensitivity. This implies the analog of (2) for red and blue sensitivities, and repeating the argument of Lemmas 3.2, 3.3 we get that $\mathsf{Talagrand}_{\mathsf{col}}(f)\gtrsim\sqrt{d}\,W_{\geqslant d}[f]$ for any coloring $\mathsf{col}$. The robust versions of Theorems 1.2 and 1.3 now follow, as we have shown that $W_{\geqslant d}[f]\geqslant\Omega(\mathsf{var}(f))$, where we can take either $d=\Omega(\log(1/\mathsf{var}(f)))$ or $d=\Omega(\log(1/M[f]))$.

4 Improved Junta Theorems

4.1 Proof of Theorem 1.5

In this section, we prove Theorem 1.5. As explained in the introduction, if one does not care about getting a tight dependency between the junta size $J$ and the parameter $A$, then one may use Corollary 3.11 and Bourgain’s noise sensitivity theorem [Bou02, KKO18] and get a junta theorem with slightly worse parameters than those stated in Theorem 1.5. The main goal of this section is to obtain the optimal dependency of $J$ on $A$.

For the proof, we will need the notion of noisy influences of a function $f$; we first generalize the notion of influence to real-valued functions.

Definition 4.1.

Let $f\colon\{0,1\}^n\to\mathbb{R}$ be a function, and let $i\in[n]$ be a coordinate. We define the influence of $i$ on $f$ as $I_i[f]=\|\partial_i f\|_2^2$, where we recall that $\partial_i f$ is the discrete derivative of $f$ along direction $i$.

Definition 4.2.

The $\rho$-noisy influence of a coordinate $i$ on $f\colon\{0,1\}^n\to\mathbb{R}$ is defined as $I_i^{(\rho)}[f]=I_i[T_{\rho}f]$.

Fact 4.3.

[O’D14] For all $f\colon\{0,1\}^n\to\{0,1\}$ we have $\sum_{i=1}^n I_i^{(\rho)}[f]\lesssim\frac{1}{1-\rho}$. ∎

Let $C_1,C_2$ be two absolute constants, where we think of $C_2$ as much larger than $C_1$. Given $f\colon\{0,1\}^n\to\{0,1\}$ with $\mathbb{E}_x[s_f(x)^p]\leqslant A$, we denote $d=C_1\left(\frac{A}{\varepsilon}\right)^{1/p}$ and $\delta=2^{-\frac{C_2}{2p-1}d}$. Define the candidate set of junta coordinates as

\[ \mathcal{J}=\left\{i\in[n]\ \middle|\ I_i\big[T_{1-\frac{1}{3d}}f\big]\geqslant\delta\right\}, \]

i.e. $\mathcal{J}$ is the set of coordinates whose noisy influence on $f$ is somewhat large. Our task is to show that $f$ is $\varepsilon$-close to a $\mathcal{J}$-junta, i.e. that

\[ \sum_{S:\,S\not\subseteq\mathcal{J}}\widehat{f}(S)^2\leqslant\varepsilon, \tag{3} \]

and the rest of the proof is devoted to this goal. We remark that the junta $\mathcal{J}$ is similar in spirit to the junta in Bourgain’s original proof [Bou02], which is easier to see in [KKO18, Section 4] and in particular in [KKO18, Theorem 4.20]. Indeed, both here and in [KKO18, Section 4], the junta is taken to be the set of coordinates with significant noisy influence. There are many other similarities between the argument presented therein and the argument here. First, in both cases one considers a dyadic sort of partitioning of the Fourier coefficients: in [KKO18, Section 4] it is achieved by considering the difference in the noise sensitivity of $f$ at different noise rates (see [KKO18, Theorem 4.16]). In the current proof, we perform explicit dyadic partitioning and also accompany it by applications of suitable noise operators (the operators $S_k$, $S_k'$ and $S_k''$ below). Second, both proofs also use some relation between the noise sensitivity / moments of the sensitivity and the level $1$ weight of the function $f$ (and its restrictions). While the technical execution of the proofs is different, it is hard to pinpoint the high level difference between them. For example, the “harsh” degree truncations in [KKO18] have to be replaced by the “softer” type of degree truncations achieved by the operators $S_k$, $S_k'$ and $S_k''$ (see also the parameter $\tilde{W}_{k,d}$) in order to make the argument go through. Additionally, towards the end of the argument we are able to bound the sum of noisy influences in a non-trivial way, which ultimately allows us to choose $\delta=2^{-O_p(d)}$ as opposed to $\delta=2^{-O_p(d\log d)}$ (which would have resulted in slightly worse bounds, as in [KKO18, Theorem 4.20]); see Claim 4.5.

Towards proving (3), we partition the contribution to the left hand side according to the different scales of $|S\cap\bar{\mathcal{J}}|$, and for each $k=1,\ldots,\log d$ define

\[ W_{k,d}[f]=\sum_{\substack{S:\,|S|\leqslant d,\\ 2^{k-1}\leqslant|S\cap\bar{\mathcal{J}}|\leqslant 2^k}}\widehat{f}(S)^2. \]

Thus, we may write

\[ \sum_{S:\,S\not\subseteq\mathcal{J}}\widehat{f}(S)^2\leqslant W_{>d}[f]+\sum_{k=1}^{\log d}W_{k,d}[f]. \tag{4} \]

The rest of the proof is devoted to bounding $W_{>d}[f]$ and each $W_{k,d}[f]$. From Lemma 3.7 it follows that

\[ W_{>d}[f]\leqslant O\left(\frac{A}{d^{p}}\right)\leqslant\frac{\varepsilon}{16} \tag{5} \]

for sufficiently large $C_1$. Next, we bound $W_{k,d}[f]$. Fix $k$, and consider the operator $S_k=T_{1-1/d}^{\mathcal{J}}T_{1-1/2^k}^{\bar{\mathcal{J}}}$, i.e. the noise operator that applies $1/2^k$ noise to the coordinates outside $\mathcal{J}$ and $1/d$ noise to the coordinates of $\mathcal{J}$. Note that

\[ W_{k,d}[f]\lesssim W_{k,d}[S_k f], \]

so it suffices to bound $W_{k,d}[S_k f]$. We partition $W_{k,d}[S_k f]$ according to the contribution of each $j\in\bar{\mathcal{J}}$, getting:

\[ W_{k,d}[S_k f]=\sum_{\substack{S:\,|S|\leqslant d,\\ 2^{k-1}\leqslant|S\cap\bar{\mathcal{J}}|\leqslant 2^k}}\widehat{S_k f}(S)^2\leqslant\frac{1}{2^{k-1}}\sum_{j\notin\mathcal{J}}\ \sum_{\substack{S\ni j:\,|S|\leqslant d,\\ 2^{k-1}\leqslant|S\cap\bar{\mathcal{J}}|\leqslant 2^k}}\widehat{S_k f}(S)^2\lesssim\frac{1}{2^{k-1}}\sum_{j\notin\mathcal{J}}\|S_k\partial_j f\|_2^2\stackrel{def}{=}\tilde{W}_{k,d}[f]. \]

Thus, we have $W_{k,d}[f]\lesssim\tilde{W}_{k,d}[f]$, and the rest of the proof is devoted to upper bounding the right hand side. Define the operators $S_k'$ and $S_k''$ as

\[ S_k'=T_{1-\frac{1}{3d}}^{\mathcal{J}}T_{1-\frac{1}{2^k}+\frac{2}{3d}}^{\bar{\mathcal{J}}},\qquad S_k''=T_{1-\frac{2}{3d}}^{\mathcal{J}}T_{1-\frac{1}{2^k}+\frac{1}{3d}}^{\bar{\mathcal{J}}}. \]

We establish the following claim using random restrictions and hypercontractivity:

Claim 4.4.

For all $\tau\in(0,1)$, we have that

\[ 2^k\tilde{W}_{k,d}[f]\leqslant C\cdot d^{1-p}\tau^{2p-1}\,\mathbb{E}_x\left[s_f(x)^p\right]+C\cdot\tau^{-\frac{1}{100d}}\sum_{j\in\bar{\mathcal{J}}}\|S_k'\partial_j f\|_2^{2+\frac{1}{100d}}, \]

where CC is an absolute constant.

Proof.

Deferred to Section 4.2. ∎

Given Claim 4.4, one may pull \|S_{k}^{\prime}\partial_{j}f\|_{2}^{\frac{1}{100d}}\leqslant\delta^{\frac{1}{200d}} out of the sum, and bound the sum of the remaining noisy influences by O(d), to get a bound of the form 2^{k}\tilde{W}_{k,d}[f]\lesssim d^{1-p}\tau^{2p-1}{\mathop{\mathbb{E}}_{x}\left[{s_{f}(x)^{p}}\right]}+d\tau^{-\frac{1}{100d}}\delta^{\frac{1}{200d}}. This idea is good enough to establish a junta theorem, yet it gives worse parameters: one has to take \delta=2^{-O(d\log d)}, as opposed to \delta=2^{-O(d)} as we have taken. We remark that if one only cares about getting a larger junta, of size 2^{O(d\log d)}, the proof simplifies considerably. Instead, to get the tight result with respect to the junta size, we need a more careful bound on the sum of noisy influences outside \mathcal{J}. Specifically, we use an improved bound on the sum of noisy influences that relates it to W_{\ell,d}[f]. To state this bound we define

\varepsilon_{\ell,d}=\sum\limits_{\begin{subarray}{c}\left|{S}\right|>d\\ 2^{\ell-1}\leqslant\left|{S\cap\bar{\mathcal{J}}}\right|\leqslant 2^{\ell}\end{subarray}}\widehat{f}(S)^{2}.
Claim 4.5.

\frac{1}{2^{k}}\sum\limits_{j\in\bar{\mathcal{J}}}\|S_{k}^{\prime}\partial_{j}f\|_{2}^{2}\lesssim\sum\limits_{\ell=1}^{\log n}2^{-\left|{\ell-k}\right|}(W_{\ell,d}[f]+\varepsilon_{\ell,d}).

Proof.

By definition,

\frac{1}{2^{k}}\sum\limits_{j\in\bar{\mathcal{J}}}\|S_{k}^{\prime}\partial_{j}f\|_{2}^{2}\leqslant 4\sum\limits_{S}\frac{\left|{S\cap\bar{\mathcal{J}}}\right|}{2^{k}}\left(1-\frac{1}{2^{k}}+\frac{2}{3d}\right)^{\left|{S\cap\bar{\mathcal{J}}}\right|-1}\left(1-\frac{1}{3d}\right)^{\left|{S\cap\mathcal{J}}\right|}\widehat{f}(S)^{2}

(the noise rates appear here without the squares, which only increases the right hand side). Partitioning the last sum dyadically according to \left|{S\cap\bar{\mathcal{J}}}\right|, so that 2^{\ell-1}\leqslant\left|{S\cap\bar{\mathcal{J}}}\right|<2^{\ell}, we get that the last expression is at most

\lesssim\sum\limits_{\ell=1}^{\log n}2^{\ell-k}\left(1-\frac{1}{2^{k}}+\frac{2}{3d}\right)^{2^{\ell-1}}(W_{\ell,d}[f]+\varepsilon_{\ell,d})\lesssim\sum\limits_{\ell=1}^{\log n}2^{-\left|{\ell-k}\right|}(W_{\ell,d}[f]+\varepsilon_{\ell,d}).

Indeed, for \ell\leqslant k we have 2^{\ell-k}=2^{-\left|{\ell-k}\right|}, while for \ell>k the factor \left(1-\frac{1}{2^{k}}+\frac{2}{3d}\right)^{2^{\ell-1}}\leqslant e^{-2^{\ell-k-1}/3} (using 2^{k}\leqslant d) decays fast enough to beat 2^{\ell-k}. ∎
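A quick numerical sanity check of this last step (the value of d and the loop bounds below are arbitrary toy choices of ours, for illustration only):

```python
import math

# Check numerically that 2^{l-k} * (1 - 1/2^k + 2/(3d))^{2^{l-1}} is at most
# an absolute constant times 2^{-|l-k|}, over 1 <= k <= log d and l >= 1.
d = 64
ratios = []
for k in range(1, int(math.log2(d)) + 1):
    for l in range(1, 25):
        val = 2 ** (l - k) * (1 - 1 / 2 ** k + 2 / (3 * d)) ** (2 ** (l - 1))
        ratios.append(val * 2 ** abs(l - k))
print("max ratio:", max(ratios))  # stays bounded by an absolute constant
```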

Combining Claims 4.4 and 4.5, together with the bound \|S_{k}^{\prime}\partial_{j}f\|_{2}^{\frac{1}{100d}}\leqslant\delta^{\frac{1}{200d}} for j\not\in\mathcal{J}, we immediately get the following corollary:

Corollary 4.6.

There exists an absolute constant C>0 such that for all \tau>0,

\tilde{W}_{k,d}[f]\leqslant C\cdot 2^{-k}d^{1-p}\tau^{2p-1}{\mathop{\mathbb{E}}_{x}\left[{s_{f}(x)^{p}}\right]}+C\cdot\tau^{-\frac{1}{100d}}\delta^{\frac{1}{200d}}\left(\sum\limits_{\ell=1}^{\log d}2^{-\left|{\ell-k}\right|}W_{\ell,d}[f]+\sum\limits_{\ell=1}^{\log n}2^{-\left|{\ell-k}\right|}\varepsilon_{\ell,d}\right).

We may now give an upper bound on \sum\limits_{k=1}^{\log d}W_{k,d}[f].

Claim 4.7.

For a sufficiently large absolute constant C_{1}, and for C_{2} sufficiently large compared to C_{1}, we have that \sum\limits_{k=1}^{\log d}W_{k,d}[f]\leqslant\frac{\varepsilon}{16}.

Proof.

Let \tau>0 be a parameter to be determined. As W_{k,d}[f]\lesssim\tilde{W}_{k,d}[f], by Corollary 4.6 we have

\sum\limits_{k=1}^{\log d}W_{k,d}[f]\lesssim\sum\limits_{k=1}^{\log d}\left(2^{-k}d^{1-p}\tau^{2p-1}A+\tau^{-\frac{1}{100d}}\delta^{\frac{1}{200d}}\left(\sum\limits_{\ell=1}^{\log d}2^{-\left|{\ell-k}\right|}W_{\ell,d}[f]+\sum\limits_{\ell=1}^{\log n}2^{-\left|{\ell-k}\right|}\varepsilon_{\ell,d}\right)\right).

The first sum is clearly \lesssim d^{1-p}\tau^{2p-1}A, as \sum_{k\geqslant 1}2^{-k}\leqslant 1. The second sum is at most

\tau^{-\frac{1}{100d}}\delta^{\frac{1}{200d}}\sum\limits_{\ell=1}^{\log d}W_{\ell,d}[f]\sum\limits_{k}2^{-\left|{\ell-k}\right|}\lesssim\tau^{-\frac{1}{100d}}\delta^{\frac{1}{200d}}\sum\limits_{\ell=1}^{\log d}W_{\ell,d}[f].

Finally, the third sum is at most

\tau^{-\frac{1}{100d}}\delta^{\frac{1}{200d}}\sum\limits_{\ell=1}^{\log n}\varepsilon_{\ell,d}\sum\limits_{k=1}^{\log d}2^{-\left|{\ell-k}\right|}\lesssim\tau^{-\frac{1}{100d}}\delta^{\frac{1}{200d}}\sum\limits_{\ell=1}^{\log n}\varepsilon_{\ell,d}\lesssim\tau^{-\frac{1}{100d}}\delta^{\frac{1}{200d}}W_{>d}[f]\lesssim\tau^{-\frac{1}{100d}}\delta^{\frac{1}{200d}}\varepsilon,

where we used \sum_{\ell}\varepsilon_{\ell,d}\lesssim W_{>d}[f] together with (5). We thus get that

\sum\limits_{k=1}^{\log d}W_{k,d}[f]\leqslant C\cdot d^{1-p}\tau^{2p-1}A+C\cdot\tau^{-\frac{1}{100d}}\delta^{\frac{1}{200d}}\sum\limits_{k=1}^{\log d}W_{k,d}[f]+C\cdot\tau^{-\frac{1}{100d}}\delta^{\frac{1}{200d}}\varepsilon

for some absolute constant C>0. We choose \tau=\delta^{1/4}, so that \tau^{-\frac{1}{100d}}\delta^{\frac{1}{200d}}=\delta^{\frac{1}{400d}}, and then take C_{2} large enough so that C\tau^{-\frac{1}{100d}}\delta^{\frac{1}{200d}}\leqslant\frac{1}{2}; rearranging gives \sum\limits_{k=1}^{\log d}W_{k,d}[f]\leqslant 2C\cdot d^{1-p}\delta^{(2p-1)/4}A+2C\delta^{\frac{1}{400d}}\varepsilon, which is at most \frac{\varepsilon}{16} for large enough C_{2}. ∎

We can now prove Theorem 1.5.

Proof of Theorem 1.5.

Combining (4), (5) and Claim 4.7, we get that \sum\limits_{S:S\not\subseteq\mathcal{J}}\widehat{f}(S)^{2}\leqslant\frac{\varepsilon}{8}. Define h(x)=1 if \sum\limits_{S:S\subseteq\mathcal{J}}\widehat{f}(S)\chi_{S}(x)\geqslant 1/2, and h(x)=0 otherwise. Clearly h is a \mathcal{J}-junta, and since \left|{f(x)-\sum_{S:S\subseteq\mathcal{J}}\widehat{f}(S)\chi_{S}(x)}\right|\geqslant 1/2 whenever f(x)\neq h(x), we have that

\|f-h\|_{1}=\|f-h\|_{2}^{2}\leqslant 4\left\|f-\sum\limits_{S:S\subseteq\mathcal{J}}\widehat{f}(S)\chi_{S}\right\|_{2}^{2}=4\sum\limits_{S:S\not\subseteq\mathcal{J}}\widehat{f}(S)^{2}\leqslant\varepsilon.

Finally, note that the sum of \left(1-\frac{1}{3d}\right)-noisy influences of f is at most O(d) by Fact 4.3, while every j\in\mathcal{J} has noisy influence at least \delta by the choice of \mathcal{J}. We conclude that

\left|{\mathcal{J}}\right|\leqslant O\left(\frac{d}{\delta}\right).\qed
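To illustrate the rounding step in the proof above, the following sketch (illustrative only, reusing the toy variables points, vals, subsets, coeffs, chi, J and lhs from the sketch following (4)) projects f onto the characters inside the junta, thresholds at 1/2, and confirms the bound \|f-h\|_{1}\leqslant 4\sum_{S\not\subseteq\mathcal{J}}\widehat{f}(S)^{2}:

```python
# Rounding to a junta: project f onto characters inside J, threshold at 1/2.
proj = np.array([sum(coeffs[S] * chi(S, x) for S in subsets if set(S) <= J)
                 for x in points])
h = (proj >= 0.5).astype(float)  # the junta h: depends only on coords in J

# Wherever f != h, the projection is at distance >= 1/2 from f, whence
# ||f - h||_1 <= 4 ||f - proj||_2^2 = 4 * sum_{S not in J} fhat(S)^2.
print(np.mean(np.abs(vals - h)), "<=", 4 * lhs)
```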

4.2 Proof of Claim 4.4

For the proof of Claim 4.4 we need a version of the hypercontractive inequality stated in terms of the noise operator, as follows:

Lemma 4.8.

([O’D14]) Let q>2 and \rho\leqslant\frac{1}{\sqrt{q-1}}. Then for all f\colon\{0,1\}^{n}\to\mathbb{R} we have \|T_{\rho}f\|_{q}\leqslant\|f\|_{2}.
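As a sanity check, Lemma 4.8 can be verified numerically on a small cube; the sketch below (illustrative only) applies T_{\rho} on the Fourier side with the toy choices q=4, \rho=1/\sqrt{3} and a random real-valued f:

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
n, q = 4, 4.0
rho = 1 / np.sqrt(q - 1)  # the largest noise rate allowed by Lemma 4.8

points = list(itertools.product([0, 1], repeat=n))
f = rng.standard_normal(len(points))  # a random real-valued function
chi = lambda S, x: (-1) ** sum(x[i] for i in S)
subsets = [S for r in range(n + 1) for S in itertools.combinations(range(n), r)]
coeffs = {S: np.mean(f * np.array([chi(S, x) for x in points])) for S in subsets}

# T_rho multiplies fhat(S) by rho^{|S|}.
Tf = np.array([sum(c * rho ** len(S) * chi(S, x) for S, c in coeffs.items())
               for x in points])

print(np.mean(np.abs(Tf) ** q) ** (1 / q), "<=", np.mean(f ** 2) ** 0.5)
```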

We also need the following claim, asserting that noise only decreases Talagrand’s boundary.

Claim 4.9.

For f\colon\{0,1\}^{n}\to\mathbb{R} and any noise operator S=T_{\rho_{1}}\otimes\cdots\otimes T_{\rho_{n}} with \rho_{1},\ldots,\rho_{n}\in[0,1], we have {\mathop{\mathbb{E}}_{x}\left[{\|\nabla{Sf}(x)\|_{p}}\right]}\leqslant{\mathop{\mathbb{E}}_{x}\left[{\|\nabla{f}(x)\|_{p}}\right]}.

Proof.
{\mathop{\mathbb{E}}_{x}\left[{\|\nabla(Sf)(x)\|_{p}}\right]}\leqslant{\mathop{\mathbb{E}}_{x}\left[{\|S\nabla f(x)\|_{p}}\right]}={\mathop{\mathbb{E}}_{x}\left[{\left\|{\mathop{\mathbb{E}}_{y\sim Sx}\left[{\nabla f(y)}\right]}\right\|_{p}}\right]},

and the result follows from the triangle inequality: \left\|{\mathop{\mathbb{E}}_{y\sim Sx}\left[{\nabla f(y)}\right]}\right\|_{p}\leqslant{\mathop{\mathbb{E}}_{y\sim Sx}\left[{\|\nabla f(y)\|_{p}}\right]}, and when x is uniform so is y. ∎
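The following sketch (illustrative only) checks Claim 4.9 on a random real-valued function with independently chosen noise rates, taking the discrete derivative to be \partial_{j}f(x)=f(x^{j\to 1})-f(x^{j\to 0}), consistently with the factor 4 appearing in the computations below:

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)
n, p = 4, 1.5  # any norm index p >= 1
points = list(itertools.product([0, 1], repeat=n))
idx = {x: t for t, x in enumerate(points)}
f = rng.standard_normal(len(points))
rhos = rng.uniform(0, 1, size=n)  # one noise rate per coordinate

chi = lambda S, x: (-1) ** sum(x[i] for i in S)
subsets = [S for r in range(n + 1) for S in itertools.combinations(range(n), r)]
coeffs = {S: np.mean(f * np.array([chi(S, x) for x in points])) for S in subsets}

# S = T_{rho_1} x ... x T_{rho_n} multiplies fhat(S) by prod_{i in S} rho_i.
Sf = np.array([sum(c * np.prod([rhos[i] for i in S]) * chi(S, x)
                   for S, c in coeffs.items()) for x in points])

def avg_grad_norm(vals):
    # E_x || grad(vals)(x) ||_p with d_j(x) = vals(x^{j->1}) - vals(x^{j->0})
    total = 0.0
    for x in points:
        d = [vals[idx[tuple(1 if i == j else xi for i, xi in enumerate(x))]]
             - vals[idx[tuple(0 if i == j else xi for i, xi in enumerate(x))]]
             for j in range(n)]
        total += np.sum(np.abs(d) ** p) ** (1 / p)
    return total / len(points)

print(avg_grad_norm(Sf), "<=", avg_grad_norm(f))  # Claim 4.9
```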

Take (J,z) to be a random restriction, where each j\in[n] is included in J independently with probability \frac{1}{3d} and z\in\{0,1\}^{\overline{J}} is uniform, and consider g=S_{k}^{\prime\prime}f. We calculate the contribution of \bar{\mathcal{J}} to the level-1 weight of a restriction of g:

d{\mathop{\mathbb{E}}_{J,z}\left[{\sum\limits_{j\in J\cap\bar{\mathcal{J}}}\widehat{g_{\overline{J}\rightarrow z}}(\{j\})^{2}}\right]}=d\sum\limits_{j\in\bar{\mathcal{J}}}{\mathop{\mathbb{E}}_{J}\left[{\sum\limits_{S}\widehat{g}(S)^{2}1_{S\cap J=\{j\}}}\right]}=d\sum\limits_{S}\frac{\left|{S\cap\bar{\mathcal{J}}}\right|}{3d}\left(1-\frac{1}{3d}\right)^{\left|{S}\right|-1}\widehat{g}(S)^{2}=\frac{1}{3}\sum\limits_{S}\left|{S\cap\bar{\mathcal{J}}}\right|\left(1-\frac{1}{3d}\right)^{\left|{S}\right|-1}\widehat{g}(S)^{2}.

Also,

\sum\limits_{j\not\in\mathcal{J}}\|S_{k}\partial_{j}f\|_{2}^{2}=4\sum\limits_{S}\left|{S\cap\bar{\mathcal{J}}}\right|\left(1-\frac{1}{2^{k}}\right)^{-2}\rho_{1}^{2\left|{S\cap\mathcal{J}}\right|}\rho_{2}^{2\left|{S\cap\bar{\mathcal{J}}}\right|}\widehat{g}(S)^{2},

where \rho_{1}=\frac{1-1/d}{1-2/(3d)} and \rho_{2}=\frac{1-1/2^{k}}{1-1/2^{k}+1/(3d)}. Using \frac{a}{b}\leqslant\frac{a+c}{b+c} for 0\leqslant a\leqslant b and c\geqslant 0, we have

\rho_{1}\leqslant\frac{1-1/d+2/(3d)}{1-2/(3d)+2/(3d)}=1-\frac{1}{3d},\qquad\rho_{2}\leqslant\frac{1-1/2^{k}+(1/2^{k}-1/(3d))}{1-1/2^{k}+1/(3d)+(1/2^{k}-1/(3d))}=1-\frac{1}{3d},

and so

\sum\limits_{j\not\in\mathcal{J}}\|S_{k}\partial_{j}f\|_{2}^{2}\leqslant 12\left(1-\frac{1}{d}\right)^{-2}d{\mathop{\mathbb{E}}_{J,z}\left[{\sum\limits_{j\in J\cap\bar{\mathcal{J}}}\widehat{g_{\overline{J}\rightarrow z}}(\{j\})^{2}}\right]}\lesssim d{\mathop{\mathbb{E}}_{J,z}\left[{\sum\limits_{j\in J\cap\bar{\mathcal{J}}}\widehat{g_{\overline{J}\rightarrow z}}(\{j\})^{2}}\right]}.

Hence our goal of upper bounding the left hand side in Claim 4.4 reduces to upper bounding the contribution of \bar{\mathcal{J}} to the level-1 weight of random restrictions of g. For J,z define

L_{J,z}=\left\{\left.j\in J\cap\bar{\mathcal{J}}\;\right|\left|{\widehat{g_{\overline{J}\rightarrow z}}(\{j\})}\right|\leqslant\tau\right\},\qquad H_{J,z}=\left\{\left.j\in J\cap\bar{\mathcal{J}}\;\right|\left|{\widehat{g_{\overline{J}\rightarrow z}}(\{j\})}\right|>\tau\right\}.

Then

{\mathop{\mathbb{E}}_{J,z}\left[{\sum\limits_{j\in J\cap\bar{\mathcal{J}}}\widehat{g_{\overline{J}\rightarrow z}}(\{j\})^{2}}\right]}={\mathop{\mathbb{E}}_{J,z}\left[{\sum\limits_{j\in L_{J,z}}\widehat{g_{\overline{J}\rightarrow z}}(\{j\})^{2}}\right]}+{\mathop{\mathbb{E}}_{J,z}\left[{\sum\limits_{j\in H_{J,z}}\widehat{g_{\overline{J}\rightarrow z}}(\{j\})^{2}}\right]}.

Bounding the contribution from L_{J,z}.

For the first sum, note that \sum\limits_{j\in L_{J,z}}\widehat{g_{\overline{J}\rightarrow z}}(\{j\})^{2}\leqslant\|g_{\overline{J}\rightarrow z}\|_{2}^{2}\leqslant 1 (as g takes values in [0,1]), so raising it to the power p\leqslant 1 can only increase it; using in addition that \left|{\widehat{g_{\overline{J}\rightarrow z}}(\{j\})}\right|\leqslant\tau for j\in L_{J,z} and 2-1/p\geqslant 0, we have

\sum\limits_{j\in L_{J,z}}\widehat{g_{\overline{J}\rightarrow z}}(\{j\})^{2}\leqslant\left(\sum\limits_{j\in L_{J,z}}\widehat{g_{\overline{J}\rightarrow z}}(\{j\})^{2}\right)^{p}\leqslant\left(\sum\limits_{j\in L_{J,z}}\tau^{(2p-1)/p}\left|{\widehat{g_{\overline{J}\rightarrow z}}(\{j\})}\right|^{1/p}\right)^{p}\leqslant\tau^{2p-1}\left(\sum\limits_{j}\left|{\widehat{g_{\overline{J}\rightarrow z}}(\{j\})}\right|^{1/p}\right)^{p}.

In the last expression we have \tau^{2p-1} times the 1/p-norm of the vector (\widehat{g_{\overline{J}\rightarrow z}}(\{j\}))_{j\in J} (note that 1/p\geqslant 1, so this is indeed a norm). Thus, using the triangle inequality in a similar way to Lemma 3.1, we may bound the last expression from above by

\tau^{2p-1}{\mathop{\mathbb{E}}_{x}\left[{\|\nabla{g_{\overline{J}\rightarrow z}}(x)\|_{1/p}}\right]}. (6)

Taking expectation, we get that

{\mathop{\mathbb{E}}_{J,z}\left[{\sum\limits_{j\in L_{J,z}}\widehat{g_{\overline{J}\rightarrow z}}(\{j\})^{2}}\right]}\leqslant\tau^{2p-1}{\mathop{\mathbb{E}}_{J,z}\left[{{\mathop{\mathbb{E}}_{x}\left[{\|\nabla{g_{\overline{J}\rightarrow z}}(x)\|_{1/p}}\right]}}\right]}=\tau^{2p-1}{\mathop{\mathbb{E}}_{x}\left[{{\mathop{\mathbb{E}}_{J,z}\left[{\|\nabla g_{\overline{J}\rightarrow z}(x)\|_{1/p}}\right]}}\right]}.

Fix x,z, and note that {\mathop{\mathbb{E}}_{J}\left[{\|\nabla g_{\overline{J}\rightarrow z}(x)\|_{1/p}^{1/p}}\right]}={\mathop{\mathbb{E}}_{J}\left[{\sum\limits_{j}\left|{\partial_{j}g(x,z)}\right|^{1/p}1_{j\in J}}\right]}=\frac{1}{3d}\|\nabla g(x,z)\|_{1/p}^{1/p}. It follows by Jensen's inequality (applied to the concave function t\mapsto t^{p}) that

{\mathop{\mathbb{E}}_{J}\left[{\|\nabla g_{\overline{J}\rightarrow z}(x)\|_{1/p}}\right]}\leqslant{\mathop{\mathbb{E}}_{J}\left[{\|\nabla g_{\overline{J}\rightarrow z}(x)\|_{1/p}^{1/p}}\right]}^{p}\leqslant d^{-p}\|\nabla g(x,z)\|_{1/p}.

Therefore the expectation of (6) over J,z is upper bounded by \tau^{2p-1}d^{-p}{\mathop{\mathbb{E}}_{x,z}\left[{\|\nabla g(x,z)\|_{1/p}}\right]}. Using Claim 4.9, this may be further upper bounded by \tau^{2p-1}d^{-p}{\mathop{\mathbb{E}}_{x,z}\left[{\|\nabla f(x,z)\|_{1/p}}\right]}=\tau^{2p-1}d^{-p}{\mathop{\mathbb{E}}_{x,z}\left[{s_{f}(x,z)^{p}}\right]}, where the equality holds since f is Boolean and hence \left|{\partial_{j}f}\right|\in\{0,1\}.
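Indeed, for Boolean f each \left|{\partial_{j}f(x)}\right| is 0 or 1, so \|\nabla f(x)\|_{1/p}=\left(\sum_{j}\left|{\partial_{j}f(x)}\right|^{1/p}\right)^{p}=s_{f}(x)^{p} pointwise. A short numerical confirmation (illustrative only, on a toy majority function):

```python
import itertools

# For Boolean f, |d_j f(x)| is 0 or 1, so ||grad f(x)||_{1/p} = s_f(x)^p.
p, n = 0.7, 3
f = lambda x: int(sum(x) >= 2)  # toy: majority on three bits
for x in itertools.product([0, 1], repeat=n):
    d = [abs(f(tuple(1 - xi if i == j else xi for i, xi in enumerate(x))) - f(x))
         for j in range(n)]
    assert abs(sum(v ** (1 / p) for v in d) ** p - sum(d) ** p) < 1e-9
print("||grad f(x)||_{1/p} == s_f(x)^p at every point")
```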

Bounding the contribution from H_{J,z}.

We have, since \left|{\widehat{g_{\overline{J}\rightarrow z}}(\{j\})}\right|>\tau for every j\in H_{J,z}, that

\sum\limits_{j\in H_{J,z}}\widehat{g_{\overline{J}\rightarrow z}}(\{j\})^{2}\leqslant\tau^{-\frac{1}{100d}}\sum\limits_{j\in H_{J,z}}\left|{\widehat{g_{\overline{J}\rightarrow z}}(\{j\})}\right|^{2+\frac{1}{100d}}\leqslant\tau^{-\frac{1}{100d}}\sum\limits_{j\in J\cap\bar{\mathcal{J}}}\left|{\widehat{g_{\overline{J}\rightarrow z}}(\{j\})}\right|^{2+\frac{1}{100d}},

and we next take expectation over z:

{\mathop{\mathbb{E}}_{z}\left[{\sum\limits_{j\in H_{J,z}}\widehat{g_{\overline{J}\rightarrow z}}(\{j\})^{2}}\right]}\leqslant\tau^{-\frac{1}{100d}}\sum\limits_{j\in J\cap\bar{\mathcal{J}}}{\mathop{\mathbb{E}}_{z}\left[{\left|{\widehat{g_{\overline{J}\rightarrow z}}(\{j\})}\right|^{2+\frac{1}{100d}}}\right]}. (7)

Define the operator T=T_{\rho_{1}}^{\mathcal{J}}T_{\rho_{2}}^{\overline{\mathcal{J}}}, where now \rho_{1}=\frac{1-2/(3d)}{1-1/(3d)} and \rho_{2}=\frac{1-1/2^{k}+1/(3d)}{1-1/2^{k}+2/(3d)}, and note that by the semigroup property of the noise operator (namely, the fact that T_{\rho}T_{\rho^{\prime}}=T_{\rho\rho^{\prime}}) we may write g=TS_{k}^{\prime}f. Thus,

\widehat{g_{\overline{J}\rightarrow z}}(\{j\})=\sum\limits_{S\subseteq\overline{J}\cup\{j\},S\ni j}\widehat{g}(S)\chi_{S\setminus\{j\}}(z)=\sum\limits_{S\subseteq\overline{J}\cup\{j\},S\ni j}\rho_{1}^{\left|{S\cap\mathcal{J}}\right|}\rho_{2}^{\left|{S\cap\overline{\mathcal{J}}}\right|}\widehat{S_{k}^{\prime}f}(S)\chi_{S\setminus\{j\}}(z).

Thus, using hypercontractivity, i.e. Lemma 4.8, we have that

{\mathop{\mathbb{E}}_{z}\left[{\left|{\widehat{g_{\overline{J}\rightarrow z}}(\{j\})}\right|^{2+\frac{1}{100d}}}\right]}=\left\|\sum\limits_{S\subseteq\overline{J}\cup\{j\},S\ni j}\rho_{1}^{\left|{S\cap\mathcal{J}}\right|}\rho_{2}^{\left|{S\cap\overline{\mathcal{J}}}\right|}\widehat{S_{k}^{\prime}f}(S)\chi_{S\setminus\{j\}}\right\|_{2+\frac{1}{100d}}^{2+\frac{1}{100d}}\leqslant\left\|\sum\limits_{S\subseteq\overline{J}\cup\{j\},S\ni j}\rho_{1}^{\left|{S\cap\mathcal{J}}\right|}\rho_{2}^{\left|{S\cap\overline{\mathcal{J}}}\right|}\left(1-\frac{1}{3d}\right)^{-(\left|{S}\right|-1)}\widehat{S_{k}^{\prime}f}(S)\chi_{S\setminus\{j\}}\right\|_{2}^{2+\frac{1}{100d}}\leqslant\left\|\sum\limits_{S\ni j}\widehat{S_{k}^{\prime}f}(S)\chi_{S\setminus\{j\}}\right\|_{2}^{2+\frac{1}{100d}}. (8)

In the first inequality we applied Lemma 4.8 with noise rate 1-\frac{1}{3d}\leqslant\frac{1}{\sqrt{1+\frac{1}{100d}}}, and in the last inequality we used Parseval's equality and the fact that \rho_{1},\rho_{2}\leqslant 1-\frac{1}{3d}. Note that

\left\|\sum\limits_{S\ni j}\widehat{S_{k}^{\prime}f}(S)\chi_{S\setminus\{j\}}\right\|_{2}^{2+\frac{1}{100d}}\lesssim\|S_{k}^{\prime}\partial_{j}f\|_{2}^{2+\frac{1}{100d}},

so plugging (8) into (7) yields that

{\mathop{\mathbb{E}}_{z}\left[{\sum\limits_{j\in H_{J,z}}\widehat{g_{\overline{J}\rightarrow z}}(\{j\})^{2}}\right]}\lesssim\tau^{-\frac{1}{100d}}\sum\limits_{j\in J\cap\bar{\mathcal{J}}}\|S_{k}^{\prime}\partial_{j}f\|_{2}^{2+\frac{1}{100d}}.

Taking expectation over J now gives that

{\mathop{\mathbb{E}}_{z,J}\left[{\sum\limits_{j\in H_{J,z}}\widehat{g_{\overline{J}\rightarrow z}}(\{j\})^{2}}\right]}\lesssim\tau^{-\frac{1}{100d}}\sum\limits_{j\in\bar{\mathcal{J}}}\|S_{k}^{\prime}\partial_{j}f\|_{2}^{2+\frac{1}{100d}}{\mathop{\mathbb{E}}_{J}\left[{1_{j\in J}}\right]}=\frac{1}{3d}\tau^{-\frac{1}{100d}}\sum\limits_{j\in\bar{\mathcal{J}}}\|S_{k}^{\prime}\partial_{j}f\|_{2}^{2+\frac{1}{100d}}.

Plugging the bounds on the contributions of L_{J,z} and H_{J,z} into the reduction above, and recalling that 2^{k}\tilde{W}_{k,d}[f]\lesssim\sum_{j\not\in\mathcal{J}}\|S_{k}\partial_{j}f\|_{2}^{2}\lesssim d\,{\mathop{\mathbb{E}}_{J,z}}\left[{\sum_{j\in J\cap\bar{\mathcal{J}}}\widehat{g_{\overline{J}\rightarrow z}}(\{j\})^{2}}\right], completes the proof of Claim 4.4. ∎

5 Acknowledgements

We thank an anonymous referee for many helpful comments and corrections on an earlier version of this paper, which led to considerable improvements in all aspects.

References

  • [Bec75] William Beckner. Inequalities in Fourier analysis. Annals of Mathematics, 102:159–182, 1975.
  • [Ber67] Arthur J Bernstein. Maximally connected arrays on the n-cube. SIAM Journal on Applied Mathematics, 15(6):1485–1489, 1967.
  • [BKS99] Itai Benjamini, Gil Kalai, and Oded Schramm. Noise sensitivity of Boolean functions and applications to percolation. Publications Mathématiques de l’Institut des Hautes Etudes Scientifiques, 90(1):5–43, 1999.
  • [Bon70] Aline Bonami. Étude des coefficients de Fourier des fonctions de L^{p}(G). Annales de l’Institut Fourier, 20(2):335–402, 1970.
  • [Bou02] Jean Bourgain. On the distribution of the Fourier spectrum of Boolean functions. Israel Journal of Mathematics, 131(1):269–276, 2002.
  • [EG] Ronen Eldan and Renan Gross. Concentration on the Boolean hypercube via pathwise stochastic analysis. In STOC 2020.
  • [Fri98] Ehud Friedgut. Boolean functions with low average sensitivity depend on few coordinates. Combinatorica, 18(1):27–35, 1998.
  • [Gro75] Leonard Gross. Logarithmic Sobolev inequalities. American Journal of Mathematics, 97(4):1061–1083, 1975.
  • [Har66] Lawrence H Harper. Optimal numberings and isoperimetric problems on graphs. Journal of Combinatorial Theory, 1(3):385–393, 1966.
  • [Har76] Sergiu Hart. A note on the edges of the n-cube. Discrete Mathematics, 14(2):157–163, 1976.
  • [KK13] Nathan Keller and Guy Kindler. Quantitative relation between noise sensitivity and influences. Combinatorica, 33(1):45–71, 2013.
  • [KKK+] Esty Kelman, Subhash Khot, Guy Kindler, Dor Minzer, and Muli Safra. Theorems of KKL, Friedgut, and Talagrand via random restrictions and log-Sobolev inequality. In ITCS 2021.
  • [KKL88] Jeff Kahn, Gil Kalai, and Nathan Linial. The influence of variables on Boolean functions (extended abstract). In 29th Annual Symposium on Foundations of Computer Science, White Plains, New York, USA, 24-26 October 1988, pages 68–80, 1988.
  • [KKO18] Guy Kindler, Naomi Kirshner, and Ryan O’Donnell. Gaussian noise sensitivity and Fourier tails. Israel Journal of Mathematics, 225:71–109, 2018.
  • [KMS18] Subhash Khot, Dor Minzer, and Muli Safra. On monotonicity testing and Boolean isoperimetric-type theorems. SIAM J. Comput., 47(6):2238–2276, 2018.
  • [KT14] Christos P Kitsos and Thomas L Toulias. Inequalities for the Fisher’s information measures. Handbook of Functional Equations: Functional Inequalities, pages 281–313, 2014.
  • [Lin64] John H Lindsey. Assignment of numbers to vertices. The American Mathematical Monthly, 71(5):508–516, 1964.
  • [Mar74] Grigorii Aleksandrovich Margulis. Probabilistic characteristics of graphs with large connectivity. Problemy peredachi informatsii, 10(2):101–108, 1974.
  • [O’D14] Ryan O’Donnell. Analysis of Boolean Functions. Cambridge University Press, 2014.
  • [Tal93] Michel Talagrand. Isoperimetry, logarithmic Sobolev inequalities on the discrete cube, and Margulis’ graph connectivity theorem. Geometric & Functional Analysis GAFA, 3(3):295–314, 1993.
  • [Tal94] Michel Talagrand. On Russo’s approximate zero-one law. The Annals of Probability, 22(3):1576–1587, 1994.
  • [Tal96] Michel Talagrand. How much are increasing sets positively correlated? Combinatorica, 16(2):243–258, 1996.
  • [Tal97] Michel Talagrand. On boundaries and influences. Combinatorica, 17(2):275–285, 1997.