
Column randomization and almost-isometric embeddings

Shahar Mendelson Centre for Mathematics and its Applications, The Australian National University. Email: [email protected]
Abstract

The matrix A:nmA:\mathbb{R}^{n}\to\mathbb{R}^{m} is (δ,k)(\delta,k)-regular if for any kk-sparse vector xx,

|Ax22x22|δkx22.\left|\|Ax\|_{2}^{2}-\|x\|_{2}^{2}\right|\leq\delta\sqrt{k}\|x\|_{2}^{2}.

We show that if AA is (δ,k)(\delta,k)-regular for 1k1/δ21\leq k\leq 1/\delta^{2}, then by multiplying the columns of AA by independent random signs, the resulting random ensemble AεA_{\varepsilon} acts on an arbitrary subset TnT\subset\mathbb{R}^{n} (almost) as if it were gaussian, and with the optimal probability estimate: if (T)\ell_{*}(T) is the gaussian mean-width of TT and dT=suptTt2d_{T}=\sup_{t\in T}\|t\|_{2}, then with probability at least 12exp(c((T)/dT)2)1-2\exp(-c(\ell_{*}(T)/d_{T})^{2}),

suptT|Aεt22t22|C(ΛdTδ(T)+(δ(T))2),\sup_{t\in T}\left|\|A_{\varepsilon}t\|_{2}^{2}-\|t\|_{2}^{2}\right|\leq C\left(\Lambda d_{T}\delta\ell_{*}(T)+(\delta\ell_{*}(T))^{2}\right),

where Λ=max{1,δ2log(nδ2)}\Lambda=\max\{1,\delta^{2}\log(n\delta^{2})\}. This estimate is optimal for 0<δ1/logn0<\delta\leq 1/\sqrt{\log n}.

1 Introduction

Linear operators that act in an almost-isometric way on subsets of n\mathbb{R}^{n} are of obvious importance. Although approximations of isometries are the only operators that almost preserve the Euclidean norm of any point in n\mathbb{R}^{n}, one may consider a more flexible alternative: a random ensemble of operators Γ\Gamma such that, for any fixed TnT\subset\mathbb{R}^{n}, with high probability, Γ\Gamma “acts well” on every element of TT. Such random ensembles have been studied extensively over the years, following the path paved by the celebrated work of Johnson and Lindenstrauss in [5]. Here we formulate the Johnson-Lindenstrauss Lemma in one of its gaussian versions:

Theorem 1.1.

There exist absolute constants c0c_{0} and c1c_{1} such that the following holds. Let 1mn1\leq m\leq n and set Γ:nm\Gamma:\mathbb{R}^{n}\to\mathbb{R}^{m} to be a random matrix whose entries are independent, standard gaussian random variables. Let TSn1T\subset S^{n-1} be of cardinality at most exp(c0m)\exp(c_{0}m). Then for m1/2log|T|<ρ<1m^{-1/2}\sqrt{\log|T|}<\rho<1, with probability at least 12exp(c1ρ2m)1-2\exp(-c_{1}\rho^{2}m), for every tTt\in T,

|m1/2Γt221|ρ.\left|\left\|m^{-1/2}\Gamma t\right\|_{2}^{2}-1\right|\leq\rho.
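As a quick numerical illustration of Theorem 1.1 (a Python sketch; the dimensions and the finite set T below are arbitrary choices, not taken from the text), one can sample a gaussian matrix and record the worst distortion over a finite set of unit vectors:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 1000, 200                      # ambient and target dimensions (illustrative)
T = rng.standard_normal((50, n))      # 50 arbitrary points...
T /= np.linalg.norm(T, axis=1, keepdims=True)   # ...normalized to lie on S^{n-1}

Gamma = rng.standard_normal((m, n))   # gaussian random matrix
proj = (T @ Gamma.T) / np.sqrt(m)     # row i is m^{-1/2} * Gamma t_i

# distortion |‖m^{-1/2} Gamma t‖_2^2 - 1| for every t in T
distortion = np.abs((proj ** 2).sum(axis=1) - 1.0)
print(distortion.max())               # typically of order sqrt(log|T| / m)
```

Here log|T|/m is about 0.02, so the observed maximal distortion is small, in line with the admissible range of rho in the theorem.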

The scope of Theorem 1.1 can be extended to more general random ensembles than the gaussian one, e.g., to a random matrix whose rows are iid copies of a centred random vector that exhibits suitable decay properties (see, e.g. [4, 8]). It is far more challenging to construct a random ensemble that, on the one hand, satisfies a version of Theorem 1.1, and on the other is based on “few random bits” or is constructed using a heavy-tailed random vector.

A significant breakthrough towards more general “Johnson-Lindenstrauss transforms” came in [7], where it was shown that a matrix that satisfies a suitable version of the restricted isometry property can be converted to the wanted random ensemble by multiplying its columns by random signs. More accurately, let \varepsilon_{1},\ldots,\varepsilon_{n} be independent, symmetric \{-1,1\}-valued random variables. Set D_{\varepsilon}={\rm diag}(\varepsilon_{1},\ldots,\varepsilon_{n}) and for a matrix A:\mathbb{R}^{n}\to\mathbb{R}^{m} define

Aε=ADε.A_{\varepsilon}=AD_{\varepsilon}.
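In coordinates, A_{\varepsilon} is simply A with its i-th column multiplied by \varepsilon_{i}. A minimal numpy sketch of the column randomization (the dimensions are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 20, 50
A = rng.standard_normal((m, n)) / np.sqrt(m)

eps = rng.choice([-1.0, 1.0], size=n)    # independent symmetric signs
A_eps = A * eps                          # column i multiplied by eps_i, i.e. A @ D_eps

# multiplying by D_eps on the right is exactly column-wise sign flipping,
# so every column norm (the action on 1-sparse vectors) is unchanged
print(np.allclose(A_eps, A @ np.diag(eps)))
print(np.allclose(np.abs(A_eps), np.abs(A)))
```

The second check makes the point of the construction: the randomness of A_{\varepsilon} lives entirely in the relative signs between columns, not in their individual norms.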

From here on we denote by Σk\Sigma_{k} the subset of Sn1S^{n-1} consisting of vectors that are supported on at most kk coordinates.

Definition 1.2.

A matrix A:nmA:\mathbb{R}^{n}\to\mathbb{R}^{m} satisfies the restricted isometry property of order kk and level δ(0,1)\delta\in(0,1) if

supxΣk|Ax221|δ.\sup_{x\in\Sigma_{k}}\left|\|Ax\|_{2}^{2}-1\right|\leq\delta.
Theorem 1.3.

[7] There is an absolute constant c such that the following holds. Let \lambda>0 and \rho\in(0,1). Consider T\subset\mathbb{R}^{n} and let k\geq c\log(e|T|/\lambda). If A satisfies the restricted isometry property of order k and at level \delta<\rho/4, then with probability at least 1-\lambda, for every t\in T,

(1-\rho)\|t\|_{2}^{2}\leq\|A_{\varepsilon}t\|_{2}^{2}\leq(1+\rho)\|t\|_{2}^{2}.

While Theorem 1.3 does not recover the probability estimate from Theorem 1.1, it does imply, at the constant probability level, that A_{\varepsilon} is an almost isometry in the random-ensemble sense: if A is a matrix that 1\pm\delta-preserves the norms of vectors that are c\log|T|-sparse, then a typical realization of the random ensemble A_{\varepsilon} 1\pm c^{\prime}\delta-preserves the norms of all the elements of T.

Various extensions of Theorem 1.1 that hold for arbitrary subsets of n\mathbb{R}^{n} have been studied over the years. In such extensions the “complexity parameter” log|T|\log|T| is replaced by more suitable counterparts. A rather general version of Theorem 1.1 follows from a functional Bernstein inequality (see, e.g., [3, 8, 2]), and to formulate that inequality in the gaussian case we require the following definition.

Definition 1.4.

Let g_{1},\ldots,g_{n} be independent, standard gaussian random variables. For T\subset\mathbb{R}^{n} set

(T)=𝔼suptT|i=1ngiti|anddT=suptTt2.\ell_{*}(T)=\mathbb{E}\sup_{t\in T}\left|\sum_{i=1}^{n}g_{i}t_{i}\right|\ \ \ {\rm and}\ \ \ d_{T}=\sup_{t\in T}\|t\|_{2}.

Let

((T)dT)2\left(\frac{\ell_{*}(T)}{d_{T}}\right)^{2}

be the critical dimension of the set TT.

The critical dimension appears naturally when studying the geometry of convex sets, for example in the context of the Dvoretzky-Milman Theorem (see [1] and references therein for more details). It is the natural alternative to \log|T|, which is suitable only for finite subsets of the sphere S^{n-1}.

Let G=(gi)i=1nG=(g_{i})_{i=1}^{n} be the standard gaussian random vector in n\mathbb{R}^{n}, set G1,,GmG_{1},...,G_{m} to be independent copies of GG and put

Γ=i=1mGi,ei\Gamma=\sum_{i=1}^{m}\left\langle G_{i},\cdot\right\rangle e_{i}

to be the random ensemble used in Theorem 1.1.

Theorem 1.5.

There exist absolute constants c0,c1c_{0},c_{1} and CC such that the following holds. If TnT\subset\mathbb{R}^{n} and uc0u\geq c_{0} then with probability at least

12exp(c1u2((T)dT)2),1-2\exp\left(-c_{1}u^{2}\left(\frac{\ell_{*}(T)}{d_{T}}\right)^{2}\right),

for every tTt\in T,

|m1/2Γt22t22|C(udT(T)m+u2((T)m)2).\left|\left\|m^{-1/2}\Gamma t\right\|_{2}^{2}-\|t\|_{2}^{2}\right|\leq C\left(ud_{T}\frac{\ell_{*}(T)}{\sqrt{m}}+u^{2}\left(\frac{\ell_{*}(T)}{\sqrt{m}}\right)^{2}\right). (1.1)

One may use Theorem 1.5 to ensure that the uniform error in (1.1) is at most max{ρ,ρ2}dT2\max\{\rho,\rho^{2}\}d_{T}^{2}. Indeed, if

(T)/dTmρ,\frac{\ell_{*}(T)/d_{T}}{\sqrt{m}}\sim\rho,

then with probability at least 12exp(c3ρ2m)1-2\exp(-c_{3}\rho^{2}m),

suptT|m1/2Γt22t22|max{ρ,ρ2}dT2,\sup_{t\in T}\left|\left\|m^{-1/2}\Gamma t\right\|_{2}^{2}-\|t\|_{2}^{2}\right|\leq\max\{\rho,\rho^{2}\}d_{T}^{2}, (1.2)

which is a natural counterpart of Theorem 1.1 once log|T|\log|T| is replaced by ((T)/dT)2(\ell_{*}(T)/d_{T})^{2}.

As it happens, a version of Theorem 1.3 that is analogous to (1.2) was proved in [9], using the notion of a multi-scale RIP.

Definition 1.6.

Let L=log2nL=\lceil\log_{2}n\rceil. For δ>0\delta>0 and s1s\geq 1 the matrix AA satisfies a multi-scale RIP with distortion δ\delta and sparsity ss if, for every 1L1\leq\ell\leq L and every xΣ2sx\in\Sigma_{2^{\ell}s}, one has

|Ax22x22|max{2/2δ,2δ2}.\left|\|Ax\|_{2}^{2}-\|x\|_{2}^{2}\right|\leq\max\left\{2^{\ell/2}\delta,2^{\ell}\delta^{2}\right\}.

Definition 1.6 implies that if ksk\geq s then

supxΣk|Ax22x22|max{kδ,kδ2}.\sup_{x\in\Sigma_{k}}\left|\|Ax\|_{2}^{2}-\|x\|_{2}^{2}\right|\leq\max\{\sqrt{k}\delta,k\delta^{2}\}.
Example 1.7.

Let Γ:nm\Gamma:\mathbb{R}^{n}\to\mathbb{R}^{m} be a gaussian matrix as above and set A=m1/2ΓA=m^{-1/2}\Gamma. It is standard to verify (using, for example, Theorem 1.5 and a well-known estimate on (Σk)\ell_{*}(\Sigma_{k})) that with probability at least 12exp(cklog(en/k))1-2\exp(-ck\log(en/k)),

supxΣk|Ax221|Cklog(en/k)m.\sup_{x\in\Sigma_{k}}\left|\|Ax\|_{2}^{2}-1\right|\leq C\sqrt{\frac{k\log(en/k)}{m}}.

By the union bound over k it follows that with a nontrivial probability, A satisfies a multi-scale RIP with s=1 and \delta\sim m^{-1/2}\sqrt{\log(en)}. Observe that the second term in the multi-scale RIP, namely k\delta^{2}, is not needed here.

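The estimate \ell_{*}(\Sigma_{k})\sim\sqrt{k\log(en/k)} invoked in Example 1.7 is easy to probe by simulation: the supremum of \left\langle g,x\right\rangle over unit k-sparse x is attained on the k largest coordinates of |g|, and hence equals the \ell_{2} norm of those top-k entries. A sketch (the sample sizes are arbitrary, and the comparison is only up to an absolute constant):

```python
import numpy as np

rng = np.random.default_rng(2)
n, k, trials = 2000, 10, 500

# sup over unit k-sparse x of <g, x> = l2 norm of the k largest entries of |g|
g = np.abs(rng.standard_normal((trials, n)))
topk = np.sort(g, axis=1)[:, -k:]
ell_star = np.linalg.norm(topk, axis=1).mean()   # Monte Carlo estimate of ell_*(Sigma_k)

prediction = np.sqrt(k * np.log(np.e * n / k))
print(ell_star, prediction)   # the two are comparable up to an absolute constant
```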
Remark 1.8.

Example 1.7 gives a good intuition on the role δ\delta has in well-behaved situations: it should scale (roughly) like 1/m1/\sqrt{m}, where mm is the number of rows of the matrix AA.

The following theorem is the starting point of this note: an estimate on the error a typical realization of the random ensemble Aε=ADεA_{\varepsilon}=AD_{\varepsilon} has when acting on an arbitrary TnT\subset\mathbb{R}^{n}, given that AA satisfies an appropriate multi-scale RIP.

Theorem 1.9.

[9] There are absolute constants c and C such that the following holds. Let \eta,\rho>0 and let A:\mathbb{R}^{n}\to\mathbb{R}^{m} be a matrix that satisfies a multi-scale RIP with sparsity level s=c(1+\eta) and distortion

δ=CρdTmax{(T),dT}.\delta=C\frac{\rho d_{T}}{\max\{\ell_{*}(T),d_{T}\}}. (1.3)

Then for TnT\subset\mathbb{R}^{n}, with probability at least 1η1-\eta,

suptT|Aεt22t22|max{ρ2,ρ}dT2.\sup_{t\in T}\left|\|A_{\varepsilon}t\|_{2}^{2}-\|t\|_{2}^{2}\right|\leq\max\{\rho^{2},\rho\}d_{T}^{2}. (1.4)

To put Theorem 1.9 in some context, if the belief is that AεA_{\varepsilon} should exhibit the same behaviour as the gaussian matrix m1/2Γm^{-1/2}\Gamma, then (keeping in mind that δ\delta should scale like 1/m1/\sqrt{m}), “a gaussian behaviour” as in Theorem 1.5 is that with high probability,

suptT|Aεt22t22|c(δdT(T)+(δ(T))2).\sup_{t\in T}\left|\|A_{\varepsilon}t\|_{2}^{2}-\|t\|_{2}^{2}\right|\leq c\left(\delta d_{T}\ell_{*}(T)+(\delta\ell_{*}(T))^{2}\right).

Observe that (T)dT\ell_{*}(T)\gtrsim d_{T}, implying by (1.3) that ρδ(T)/dT\rho\sim\delta\ell_{*}(T)/d_{T}. Hence, the error in (1.4) in terms of δ\delta is indeed

dTδ(T)+(δ(T))2.\sim d_{T}\delta\ell_{*}(T)+(\delta\ell_{*}(T))^{2}.

However, despite the “gaussian error”, the probability estimate in Theorem 1.9 is far weaker than in Theorem 1.5—it is just at the constant level.

Our main result is that using a modified, seemingly less restrictive version of the multi-scale RIP, AεA_{\varepsilon} acts on TT as if it were a gaussian operator: achieving the same distortion and probability estimate as in Theorem 1.5.

Definition 1.10.

Let A:\mathbb{R}^{n}\to\mathbb{R}^{m} be a matrix. For \delta>0 let 1\leq k_{*}\leq n be the largest integer such that A is (\delta,k)-regular for every 1\leq k\leq k_{*}; that is, for every 1\leq k\leq k_{*},

supxΣk|Ax22x22|δk.\sup_{x\in\Sigma_{k}}\left|\|Ax\|_{2}^{2}-\|x\|_{2}^{2}\right|\leq\delta\sqrt{k}.
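For small dimensions, the regularity constant of Definition 1.10 can be computed by brute force: on a fixed support J of cardinality k, the supremum over unit vectors of |\|Ax\|_{2}^{2}-1| is the spectral norm of A_{J}^{*}A_{J}-{\rm Id}. A sketch of such a check (the helper name and all sizes are our own illustrative choices):

```python
from itertools import combinations

import numpy as np

def regularity(A, k):
    """sup over k-sparse unit x of |<normsq(Ax)> - 1|, by brute force over supports:
    on a support J it equals the spectral norm of A_J^T A_J - Id."""
    n = A.shape[1]
    worst = 0.0
    for J in combinations(range(n), k):
        cols = A[:, list(J)]
        ev = np.linalg.eigvalsh(cols.T @ cols)      # eigenvalues of A_J^T A_J
        worst = max(worst, max(abs(ev[0] - 1.0), abs(ev[-1] - 1.0)))
    return worst

rng = np.random.default_rng(3)
m, n = 40, 12
A = rng.standard_normal((m, n)) / np.sqrt(m)

# A is (delta, k)-regular precisely when regularity(A, k) <= delta * sqrt(k)
for k in (1, 2, 3):
    print(k, regularity(A, k))
```

Since \Sigma_{1}\subset\Sigma_{2}\subset\cdots, the computed constants are monotone in k, which matches the requirement in Definition 1.10 that regularity holds for every k up to k_{*}.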
Theorem 1.11.

There exist absolute constants cc and CC such that the following holds. Let δ>0\delta>0 and set Λ=max{1,δ2log(nδ2)}\Lambda=\max\{1,\delta^{2}\log(n\delta^{2})\}. If k1/δ2k_{*}\geq 1/\delta^{2}, TnT\subset\mathbb{R}^{n} and u1u\geq 1, then with probability at least

12exp(cu2((T)dT)2),1-2\exp\left(-cu^{2}\left(\frac{\ell_{*}(T)}{d_{T}}\right)^{2}\right),

we have that

suptT|Aεt22t22|Cu2(ΛdTδ(T)+(δ(T))2).\sup_{t\in T}\left|\|A_{\varepsilon}t\|_{2}^{2}-\|t\|_{2}^{2}\right|\leq Cu^{2}\left(\Lambda\cdot d_{T}\delta\ell_{*}(T)+(\delta\ell_{*}(T))^{2}\right). (1.5)
Remark 1.12.

The sub-optimality in Theorem 1.11 lies in the factor Λ\Lambda—in the range where δ\delta is relatively large: at least 1/logn1/\sqrt{\log n}. For δ1/logn\delta\leq 1/\sqrt{\log n} we have that Λ=1\Lambda=1 and Theorem 1.11 recovers the functional Bernstein inequality for u1u\sim 1; that holds despite the fact that AεA_{\varepsilon} is based only on nn “random bits”.

Moreover, for the error in (1.5) to have a chance of being a nontrivial two-sided estimate, i.e., that for some 0<ρ<10<\rho<1 and every tTt\in T,

|Aεt22t22|ρdT2,\left|\|A_{\varepsilon}t\|_{2}^{2}-\|t\|_{2}^{2}\right|\leq\rho d_{T}^{2},

δ\delta has to be smaller than dT/(T)\sim d_{T}/\ell_{*}(T). In particular, if the critical dimension of TT, ((T)/dT)2(\ell_{*}(T)/d_{T})^{2} is at least logn\log n, a choice of δ(dT/(T))\delta\leq(d_{T}/\ell_{*}(T)) leads to Λ=1\Lambda=1 and thus to an optimal outcome in Theorem 1.11.

Theorem 1.11 clearly improves the probability estimate from Theorem 1.9. The other (virtual) improvement is that the matrix AA need only be (δ,k)(\delta,k)-regular for k1/δ2k\leq 1/\delta^{2}, and the way AA acts on Σk\Sigma_{k} for k>1/δ2k>1/\delta^{2} is of no importance. The reason for calling that improvement “virtual” is the following observation:

Lemma 1.13.

If k_{*}\geq 1/\delta^{2} then for any 1\leq s\leq n,

supxΣs|Ax221|4max{δs,δ2s}.\sup_{x\in\Sigma_{s}}\left|\|Ax\|_{2}^{2}-1\right|\leq 4\max\{\delta\sqrt{s},\delta^{2}s\}.

In other words, the second term in the multi-scale RIP condition follows automatically from the first one and the fact that kk^{*} is sufficiently large.

Proof.  For s\leq k_{*} the claim follows immediately from the regularity of A, so assume that s\geq k_{*}. Let x\in\Sigma_{s} and let (J_{i})_{i=1}^{\ell} be a decomposition of the support of x into coordinate blocks, each of cardinality at most k_{*}/2. Set y_{i}=P_{J_{i}}x, that is, the projection of x onto {\rm span}\{e_{m}:m\in J_{i}\}, and write x=\sum_{i=1}^{\ell}y_{i}. Note that \ell\leq 4s/k_{*} and that

Ax22=i=1Ayi2=i=1Ayi22+ijAyi,Ayj.\|Ax\|_{2}^{2}=\bigl{\|}\sum_{i=1}^{\ell}Ay_{i}\bigr{\|}^{2}=\sum_{i=1}^{\ell}\|Ay_{i}\|_{2}^{2}+\sum_{i\not=j}\left\langle Ay_{i},Ay_{j}\right\rangle.

The vectors yiy_{i} are orthogonal and so x22=i=1yi22\|x\|_{2}^{2}=\sum_{i=1}^{\ell}\|y_{i}\|_{2}^{2}. Therefore,

|Ax22x22|i=1|Ayi22yi22|+|ijAyi,Ayj|.\left|\|Ax\|_{2}^{2}-\|x\|_{2}^{2}\right|\leq\sum_{i=1}^{\ell}\left|\|Ay_{i}\|_{2}^{2}-\|y_{i}\|_{2}^{2}\right|+\left|\sum_{i\not=j}\left\langle Ay_{i},Ay_{j}\right\rangle\right|.

For the first term, as each yiy_{i} is supported on at most k/2k_{*}/2 coordinates, it follows from the regularity condition that

i=1|Ayi22yi22|δk/2i=1yi22=δk/2x22δs.\sum_{i=1}^{\ell}\left|\|Ay_{i}\|_{2}^{2}-\|y_{i}\|_{2}^{2}\right|\leq\delta\sqrt{k_{*}/2}\sum_{i=1}^{\ell}\|y_{i}\|_{2}^{2}=\delta\sqrt{k_{*}/2}\|x\|_{2}^{2}\leq\delta\sqrt{s}.

As for the second term, since yiy_{i} and yjy_{j} are orthogonal, yi+yj2=yiyj2\|y_{i}+y_{j}\|_{2}=\|y_{i}-y_{j}\|_{2} and

Ayi,Ayj=\displaystyle\left\langle Ay_{i},Ay_{j}\right\rangle= 14(A(yi+yj)22A(yiyj)22)\displaystyle\frac{1}{4}\left(\|A(y_{i}+y_{j})\|_{2}^{2}-\|A(y_{i}-y_{j})\|_{2}^{2}\right)
=\displaystyle= 14(A(yi+yj)22yi+yj22)14(yiyj22A(yiyj)22).\displaystyle\frac{1}{4}\left(\|A(y_{i}+y_{j})\|_{2}^{2}-\|y_{i}+y_{j}\|_{2}^{2}\right)-\frac{1}{4}\left(\|y_{i}-y_{j}\|_{2}^{2}-\|A(y_{i}-y_{j})\|_{2}^{2}\right).

Thus, by the regularity of AA and as |supp(yi±yj)|k|{\rm supp}(y_{i}\pm y_{j})|\leq k_{*},

|Ayi,Ayj|\displaystyle|\left\langle Ay_{i},Ay_{j}\right\rangle|\leq 14(δkyi+yj22+δkyiyj22)\displaystyle\frac{1}{4}\left(\delta\sqrt{k_{*}}\|y_{i}+y_{j}\|_{2}^{2}+\delta\sqrt{k_{*}}\|y_{i}-y_{j}\|_{2}^{2}\right)
\displaystyle\leq 12δk(yi22+yj22).\displaystyle\frac{1}{2}\delta\sqrt{k_{*}}(\|y_{i}\|_{2}^{2}+\|y_{j}\|_{2}^{2}).

Taking the sum over all pairs i\not=j, i,j\leq\ell, each factor \|y_{i}\|_{2}^{2} appears at most 2\ell times, and 2\ell\leq 8s/k_{*}. Hence, using that 1/\delta^{2}\leq k_{*},

ij|Ayi,Ayj|12δk8ski=1yi222δskx224δ2s.\sum_{i\not=j}|\left\langle Ay_{i},Ay_{j}\right\rangle|\leq\frac{1}{2}\delta\sqrt{k_{*}}\cdot\frac{8s}{k_{*}}\sum_{i=1}^{\ell}\|y_{i}\|_{2}^{2}\leq 2\delta\frac{s}{\sqrt{k_{*}}}\|x\|_{2}^{2}\leq 4\delta^{2}s.

 

Clearly, Theorem 1.11 implies a suitable version of Theorem 1.9.

Corollary 1.14.

There exist absolute constants c and c_{1} such that the following holds. Let A be as above, let T\subset\mathbb{R}^{n} and let 0<\delta<1/\sqrt{\log n}. Set \rho=c\delta\ell_{*}(T)/d_{T}. Then with probability at least 1-2\exp(-c_{1}\rho^{2}/\delta^{2}),

suptT|Aεt22t22|ρdT2.\sup_{t\in T}\left|\|A_{\varepsilon}t\|_{2}^{2}-\|t\|_{2}^{2}\right|\leq\rho d_{T}^{2}.
Remark 1.15.

Recalling the intuition that m1/δ2m\sim 1/\delta^{2}, the outcome of Corollary 1.14 coincides with the estimate in (1.2).

In Section 3 we present one simple application of Theorem 1.11. We show that column randomization of a typical realization of a Bernoulli circulant matrix (complete or partial) exhibits an almost gaussian behaviour (conditioned on the generating vector). In particular, only 2n2n random bits (nn from the generating Bernoulli vector and nn from the column randomization) are required if one wishes to create a random ensemble that is, effectively, an almost isometry.

The proof of Theorem 1.11 is based on a chaining argument. For more information on the generic chaining mechanism, see Talagrand’s treasured manuscript [10]. We only require relatively basic notions from generic chaining theory, as well as the celebrated majorizing measures theorem.

Definition 1.16.

Let TnT\subset\mathbb{R}^{n}. A collection of subsets of TT, (Ts)s0(T_{s})_{s\geq 0}, is an admissible sequence if |T0|=1|T_{0}|=1 and for s1s\geq 1, |Ts|22s|T_{s}|\leq 2^{2^{s}}. For every tTt\in T denote by πst\pi_{s}t a nearest point to tt in TsT_{s} with respect to the Euclidean distance. Set Δst=πs+1tπst\Delta_{s}t=\pi_{s+1}t-\pi_{s}t for s1s\geq 1 and let Δ0t=π0t\Delta_{0}t=\pi_{0}t.

The γ2\gamma_{2} functional with respect to the 2\ell_{2} metric is defined by

γ2(T,2)=inf(Ts)suptTs02s/2Δst2,\gamma_{2}(T,\|\ \|_{2})=\inf_{(T_{s})}\sup_{t\in T}\sum_{s\geq 0}2^{s/2}\|\Delta_{s}t\|_{2},

where the infimum is taken with respect to all admissible sequences of TT.

An application of Talagrand’s majorizing measures theorem to the gaussian process ti=1ngitit\to\sum_{i=1}^{n}g_{i}t_{i} shows that γ2(T,2)\gamma_{2}(T,\|\ \|_{2}) and (T)\ell_{*}(T) are equivalent:

Theorem 1.17.

There are absolute constants cc and CC such that for every TnT\subset\mathbb{R}^{n},

cγ2(T,2)(T)Cγ2(T,2).c\gamma_{2}(T,\|\ \|_{2})\leq\ell_{*}(T)\leq C\gamma_{2}(T,\|\ \|_{2}).

The proof of Theorem 1.17 can be found, for example, in [10].

2 Proof of Theorem 1.11

We begin the proof with a word about notation: throughout, absolute constants, that is, positive numbers that are independent of all the parameters involved in the problem, are denoted by cc, c1c_{1}, CC, etc. Their value may change from line to line.

As noted previously, the proof is based on a chaining argument. Let (T_{s})_{s\geq 0} be an optimal admissible sequence of T. Let s_{0} be such that 2^{s_{0}} is the critical dimension of T, i.e.,

2s0=((T)dT)22^{s_{0}}=\left(\frac{\ell_{*}(T)}{d_{T}}\right)^{2}

(without loss of generality we may assume that equality holds). Let s_{1}\geq s_{0} be an integer to be specified in what follows, and observe that

Aεt22=Aε(tπs1t)+Aεπs1t22=Aε(tπs1t)22+2Aε(tπs1t),Aεπs1t+Aεπs1t22.\|A_{\varepsilon}t\|_{2}^{2}=\|A_{\varepsilon}(t-\pi_{s_{1}}t)+A_{\varepsilon}\pi_{s_{1}}t\|_{2}^{2}=\|A_{\varepsilon}(t-\pi_{s_{1}}t)\|_{2}^{2}+2\left\langle A_{\varepsilon}(t-\pi_{s_{1}}t),A_{\varepsilon}\pi_{s_{1}}t\right\rangle+\|A_{\varepsilon}\pi_{s_{1}}t\|_{2}^{2}.

Writing tπs1t=ss1Δstt-\pi_{s_{1}}t=\sum_{s\geq s_{1}}\Delta_{s}t, it follows that

Aε(tπs1t)2ss1AεΔst2,\|A_{\varepsilon}(t-\pi_{s_{1}}t)\|_{2}\leq\sum_{s\geq s_{1}}\|A_{\varepsilon}\Delta_{s}t\|_{2}, (2.1)

and

|Aε(tπs1t),Aεπs1t|Aε(tπs1t)2Aεπs1t2;\left|\left\langle A_{\varepsilon}(t-\pi_{s_{1}}t),A_{\varepsilon}\pi_{s_{1}}t\right\rangle\right|\leq\|A_{\varepsilon}(t-\pi_{s_{1}}t)\|_{2}\cdot\|A_{\varepsilon}\pi_{s_{1}}t\|_{2};

Therefore, setting

Ψ2=suptT|Aεπs1t22πs1t22|andΦ=ss1AεΔst2\Psi^{2}=\sup_{t\in T}\left|\|A_{\varepsilon}\pi_{s_{1}}t\|_{2}^{2}-\|\pi_{s_{1}}t\|_{2}^{2}\right|\ \ \ {\rm and}\ \ \ \Phi=\sum_{s\geq s_{1}}\|A_{\varepsilon}\Delta_{s}t\|_{2}

we have that for every tTt\in T,

Aεπs1t2Ψ2+dT2\|A_{\varepsilon}\pi_{s_{1}}t\|_{2}\leq\sqrt{\Psi^{2}+d_{T}^{2}}

and

|Aε(tπs1t),Aεπs1t|ΦΨ2+dT2.\left|\left\langle A_{\varepsilon}(t-\pi_{s_{1}}t),A_{\varepsilon}\pi_{s_{1}}t\right\rangle\right|\leq\Phi\cdot\sqrt{\Psi^{2}+d_{T}^{2}}. (2.2)

Hence,

suptT|Aεt22t22|Ψ2+2ΦΨ2+dT2+Φ2+suptT|πs1t22t22|.\sup_{t\in T}\left|\|A_{\varepsilon}t\|_{2}^{2}-\|t\|_{2}^{2}\right|\leq\Psi^{2}+2\Phi\cdot\sqrt{\Psi^{2}+d_{T}^{2}}+\Phi^{2}+\sup_{t\in T}\left|\|\pi_{s_{1}}t\|_{2}^{2}-\|t\|_{2}^{2}\right|. (2.3)

To estimate the final term, note that for every tTt\in T,

|t22πs1t22|\displaystyle\left|\|t\|_{2}^{2}-\|\pi_{s_{1}}t\|_{2}^{2}\right|\leq tπs1t22+2|tπs1t,πs1t|tπs1t22+2tπs1t2πs1t2\displaystyle\|t-\pi_{s_{1}}t\|_{2}^{2}+2\left|\left\langle t-\pi_{s_{1}}t,\pi_{s_{1}}t\right\rangle\right|\leq\|t-\pi_{s_{1}}t\|_{2}^{2}+2\|t-\pi_{s_{1}}t\|_{2}\cdot\|\pi_{s_{1}}t\|_{2}
\displaystyle\leq (ss1Δst2)2+2dTss1Δst2.\displaystyle\bigl{(}\sum_{s\geq s_{1}}\|\Delta_{s}t\|_{2}\bigr{)}^{2}+2d_{T}\sum_{s\geq s_{1}}\|\Delta_{s}t\|_{2}.

By the definition of the γ2\gamma_{2} functional and the majorizing measures theorem, for every integer ss and every tTt\in T,

Δst22s/2γ2(T,2)c12s/2(T).\|\Delta_{s}t\|_{2}\leq 2^{-s/2}\gamma_{2}(T,\|\ \|_{2})\leq c_{1}2^{-s/2}\ell_{*}(T).

Thus,

ss1Δst2c1(T)ss12s/2c22s1/2(T),\sum_{s\geq s_{1}}\|\Delta_{s}t\|_{2}\leq c_{1}\ell_{*}(T)\sum_{s\geq s_{1}}2^{-s/2}\leq c_{2}2^{-s_{1}/2}\ell_{*}(T),

and

|πs1t22t22|c3(2s12(T)+dT2s1/2(T)).\left|\|\pi_{s_{1}}t\|_{2}^{2}-\|t\|_{2}^{2}\right|\leq c_{3}\left(2^{-s_{1}}\ell_{*}^{2}(T)+d_{T}2^{-s_{1}/2}\ell_{*}(T)\right). (2.4)

Equation (2.4) and the wanted estimate in Theorem 1.11 hint at the right choice of 2^{s_{1}}: it should be larger than 1/\delta^{2}. Recalling that s_{0}\leq s_{1} and that 2^{s_{0}}=(\ell_{*}(T)/d_{T})^{2}, set

2s1=max{1δ2,((T)dT)2,log(e(1+nδ2))}.2^{s_{1}}=\max\left\{\frac{1}{\delta^{2}},\left(\frac{\ell_{*}(T)}{d_{T}}\right)^{2},\log\left(e(1+n\delta^{2})\right)\right\}.

The reason behind the choice of the third term will become clear in what follows.

With that choice of s1s_{1},

suptT|πs1t22t22|c3(δ22(T)+dTδ(T)),\sup_{t\in T}\left|\|\pi_{s_{1}}t\|_{2}^{2}-\|t\|_{2}^{2}\right|\leq c_{3}\left(\delta^{2}\ell_{*}^{2}(T)+d_{T}\delta\ell_{*}(T)\right), (2.5)

and the nontrivial part of the proof is to control \Phi and \Psi with high probability, in a way that leads to the wanted estimate in (2.3).

2.1 A decoupling argument

For every tnt\in\mathbb{R}^{n},

Aεt22=i,jAei,Aejεiεjtitj=i=1nAei22ti2+ijAei,Aejεiεjtitj.\|A_{\varepsilon}t\|_{2}^{2}=\sum_{i,j}\left\langle Ae_{i},Ae_{j}\right\rangle\varepsilon_{i}\varepsilon_{j}t_{i}t_{j}=\sum_{i=1}^{n}\|Ae_{i}\|_{2}^{2}t_{i}^{2}+\sum_{i\not=j}\left\langle Ae_{i},Ae_{j}\right\rangle\varepsilon_{i}\varepsilon_{j}t_{i}t_{j}. (2.6)
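Identity (2.6) is elementary; a numerical sanity check (the dimensions are illustrative) splits \|A_{\varepsilon}t\|_{2}^{2} into the diagonal part, where the signs square away, and the off-diagonal chaos Z_{t}:

```python
import numpy as np

rng = np.random.default_rng(4)
m, n = 15, 8
A = rng.standard_normal((m, n)) / np.sqrt(m)
eps = rng.choice([-1.0, 1.0], size=n)
t = rng.standard_normal(n)

# left-hand side of (2.6): the squared norm of A_eps t, with A_eps = A D_eps
lhs = np.linalg.norm((A * eps) @ t) ** 2

# right-hand side: diagonal part plus the off-diagonal chaos Z_t
G = A.T @ A                                    # Gram matrix, G_ij = <Ae_i, Ae_j>
u = eps * t
diag_part = np.diag(G) @ (t ** 2)              # sum_i norm(Ae_i)^2 t_i^2 (signs cancel)
off_part = u @ (G - np.diag(np.diag(G))) @ u   # sum_{i!=j} <Ae_i,Ae_j> eps_i eps_j t_i t_j

print(np.isclose(lhs, diag_part + off_part))
```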

By the assumption that AA is (δ,1)(\delta,1)-regular,

max1in|Aei221|δ,\max_{1\leq i\leq n}\left|\|Ae_{i}\|_{2}^{2}-1\right|\leq\delta,

and noting that dTc(T)d_{T}\leq c\ell_{*}(T),

|i=1nAei22ti2t22|=|i=1n(Aei221)ti2|δt22δdT2cdTδ(T).\left|\sum_{i=1}^{n}\|Ae_{i}\|_{2}^{2}t_{i}^{2}-\|t\|_{2}^{2}\right|=\left|\sum_{i=1}^{n}\left(\|Ae_{i}\|_{2}^{2}-1\right)t_{i}^{2}\right|\leq\delta\|t\|_{2}^{2}\leq\delta d_{T}^{2}\leq cd_{T}\delta\ell_{*}(T).

Next, we turn to the “off-diagonal” term in (2.6). For tnt\in\mathbb{R}^{n} let

Zt=ijεiεjtitjAei,AejZ_{t}=\sum_{i\not=j}\varepsilon_{i}\varepsilon_{j}t_{i}t_{j}\left\langle Ae_{i},Ae_{j}\right\rangle

and let us obtain high probability estimates on supuU|Zu|\sup_{u\in U}|Z_{u}| for various sets UU. The first step in that direction is decoupling: let η1,,ηn\eta_{1},...,\eta_{n} be independent {0,1}\{0,1\}-valued random variables with mean 1/21/2. Set I={i:ηi=1}I=\{i:\eta_{i}=1\} and observe that for every (εi)i=1n(\varepsilon_{i})_{i=1}^{n},

supuU|Zu|4supuU𝔼η|A(iIεiuiei),A(jIcεjujej)|.\sup_{u\in U}|Z_{u}|\leq 4\sup_{u\in U}\mathbb{E}_{\eta}\left|\left\langle A\left(\sum_{i\in I}\varepsilon_{i}u_{i}e_{i}\right),A\left(\sum_{j\in I^{c}}\varepsilon_{j}u_{j}e_{j}\right)\right\rangle\right|. (2.7)

Indeed, for every (εi)i=1n(\varepsilon_{i})_{i=1}^{n},

\sup_{u\in U}\left|\sum_{i\not=j}\left\langle Ae_{i},Ae_{j}\right\rangle\varepsilon_{i}\varepsilon_{j}u_{i}u_{j}\right|=4\sup_{u\in U}\left|\sum_{i\not=j}\left\langle Ae_{i},Ae_{j}\right\rangle\mathbb{E}_{\eta}\,\eta_{i}(1-\eta_{j})\cdot\varepsilon_{i}\varepsilon_{j}u_{i}u_{j}\right|\leq 4\sup_{u\in U}\mathbb{E}_{\eta}\left|\sum_{i\in I,\ j\in I^{c}}\left\langle Ae_{i},Ae_{j}\right\rangle\varepsilon_{i}\varepsilon_{j}u_{i}u_{j}\right|;

and for every unu\in\mathbb{R}^{n},

iI,jIcAei,Aejεiεjuiuj=A(jIcεjujej),A(iIεiuiei).\sum_{i\in I,\ j\in I^{c}}\left\langle Ae_{i},Ae_{j}\right\rangle\varepsilon_{i}\varepsilon_{j}u_{i}u_{j}=\left\langle A\bigl{(}\sum_{j\in I^{c}}\varepsilon_{j}u_{j}e_{j}\bigr{)},A\bigl{(}\sum_{i\in I}\varepsilon_{i}u_{i}e_{i}\bigr{)}\right\rangle.
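The decoupling step rests on the fact that a given ordered pair (i,j) with i\not=j lands in I\times I^{c} with probability exactly 1/4, so the full off-diagonal sum equals four times its average over the random splits (I,I^{c}). For a small n this can be verified exactly by enumerating all 2^{n} subsets (a sketch; the matrix a stands for the generic coefficients \left\langle Ae_{i},Ae_{j}\right\rangle\varepsilon_{i}\varepsilon_{j}u_{i}u_{j}):

```python
from itertools import product

import numpy as np

rng = np.random.default_rng(5)
n = 6
a = rng.standard_normal((n, n))
np.fill_diagonal(a, 0.0)           # only the off-diagonal terms i != j matter

full = a.sum()                     # sum over all pairs i != j

# exact expectation over eta: average over all 2^n subsets I = {i : eta_i = 1}
acc = 0.0
for bits in product([0, 1], repeat=n):
    I = np.array(bits, dtype=bool)
    acc += a[np.ix_(I, ~I)].sum()  # rows in I, columns in the complement of I
avg = acc / 2 ** n

# each ordered pair (i, j), i != j, appears in exactly 2^{n-2} of the splits
print(np.isclose(full, 4 * avg))
```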

Equation (2.7) naturally leads to the following definition:

Definition 2.1.

For vnv\in\mathbb{R}^{n} and I{1,,n}I\subset\{1,...,n\}, set

Wv,I=AA(iIεiviei).W_{v,I}=A^{*}A\bigl{(}\sum_{i\in I}\varepsilon_{i}v_{i}e_{i}\bigr{)}. (2.8)

Recall that πs+1t\pi_{s+1}t is the nearest point to tt in Ts+1T_{s+1} and Δst=πs+1tπst\Delta_{s}t=\pi_{s+1}t-\pi_{s}t.

Lemma 2.2.

For every tt and every (εi)i=1n(\varepsilon_{i})_{i=1}^{n},

|Zπs1t|\displaystyle|Z_{\pi_{s_{1}}t}|\leq 4s=s0s11(𝔼η|jIcεj(Δst)j(Wπs+1t,I)j|+𝔼η|iIεi(Δst)i(Wπst,Ic)i|)\displaystyle 4\sum_{s=s_{0}}^{s_{1}-1}\left(\mathbb{E}_{\eta}\left|\sum_{j\in I^{c}}\varepsilon_{j}(\Delta_{s}t)_{j}(W_{\pi_{s+1}t,I})_{j}\right|+\mathbb{E}_{\eta}\left|\sum_{i\in I}\varepsilon_{i}(\Delta_{s}t)_{i}(W_{\pi_{s}t,I^{c}})_{i}\right|\right)
+\displaystyle+ 4𝔼η|jIcεj(πs0t)j(Wπs0t,I)j|.\displaystyle 4\mathbb{E}_{\eta}\left|\sum_{j\in I^{c}}\varepsilon_{j}(\pi_{s_{0}}t)_{j}(W_{\pi_{s_{0}}t,I})_{j}\right|. (2.9)

Proof.  Fix an integer ss. With the decoupling argument in mind, fix I{1,,n}I\subset\{1,...,n\} and observe that

A(iIεi(πs+1t)iei),A(jIcεj(πs+1t)jej)\displaystyle\left\langle A\bigl{(}\sum_{i\in I}\varepsilon_{i}(\pi_{s+1}t)_{i}e_{i}\bigr{)},A\bigl{(}\sum_{j\in I^{c}}\varepsilon_{j}(\pi_{s+1}t)_{j}e_{j}\bigr{)}\right\rangle
=\displaystyle= A(iIεi(πs+1t)iei),A(jIcεj(Δst)jej)+A(iIεi(πs+1t)iei),A(jIcεj(πst)jej)\displaystyle\left\langle A\bigl{(}\sum_{i\in I}\varepsilon_{i}(\pi_{s+1}t)_{i}e_{i}\bigr{)},A\bigl{(}\sum_{j\in I^{c}}\varepsilon_{j}(\Delta_{s}t)_{j}e_{j}\bigr{)}\right\rangle+\left\langle A\bigl{(}\sum_{i\in I}\varepsilon_{i}(\pi_{s+1}t)_{i}e_{i}\bigr{)},A\bigl{(}\sum_{j\in I^{c}}\varepsilon_{j}(\pi_{s}t)_{j}e_{j}\bigr{)}\right\rangle
=\displaystyle= A(iIεi(πs+1t)iei),A(jIcεj(Δst)jej)+A(iIεi(Δst)iei),A(jIcεj(πst)jej)\displaystyle\left\langle A\bigl{(}\sum_{i\in I}\varepsilon_{i}(\pi_{s+1}t)_{i}e_{i}\bigr{)},A\bigl{(}\sum_{j\in I^{c}}\varepsilon_{j}(\Delta_{s}t)_{j}e_{j}\bigr{)}\right\rangle+\left\langle A\bigl{(}\sum_{i\in I}\varepsilon_{i}(\Delta_{s}t)_{i}e_{i}\bigr{)},A\bigl{(}\sum_{j\in I^{c}}\varepsilon_{j}(\pi_{s}t)_{j}e_{j}\bigr{)}\right\rangle
+\displaystyle+ A(iIεi(πst)iei),A(jIcεj(πst)jej).\displaystyle\left\langle A\bigl{(}\sum_{i\in I}\varepsilon_{i}(\pi_{s}t)_{i}e_{i}\bigr{)},A\bigl{(}\sum_{j\in I^{c}}\varepsilon_{j}(\pi_{s}t)_{j}e_{j}\bigr{)}\right\rangle.

Moreover,

A(iIεi(πs+1t)iei),A(jIcεj(Δst)jej)=\displaystyle\left\langle A\bigl{(}\sum_{i\in I}\varepsilon_{i}(\pi_{s+1}t)_{i}e_{i}\bigr{)},A\bigl{(}\sum_{j\in I^{c}}\varepsilon_{j}(\Delta_{s}t)_{j}e_{j}\bigr{)}\right\rangle= AA(iIεi(πs+1t)iei),(jIcεj(Δst)jej)\displaystyle\left\langle A^{*}A\bigl{(}\sum_{i\in I}\varepsilon_{i}(\pi_{s+1}t)_{i}e_{i}\bigr{)},\bigl{(}\sum_{j\in I^{c}}\varepsilon_{j}(\Delta_{s}t)_{j}e_{j}\bigr{)}\right\rangle
=\displaystyle= jIcεj(Δst)j(Wπs+1t,I)j.\displaystyle\sum_{j\in I^{c}}\varepsilon_{j}(\Delta_{s}t)_{j}(W_{\pi_{s+1}t,I})_{j}.

and

A(iIεi(Δst)iei),A(jIcεj(πst)jej)=iIεi(Δst)i(Wπst,Ic)i.\left\langle A\bigl{(}\sum_{i\in I}\varepsilon_{i}(\Delta_{s}t)_{i}e_{i}\bigr{)},A\bigl{(}\sum_{j\in I^{c}}\varepsilon_{j}(\pi_{s}t)_{j}e_{j}\bigr{)}\right\rangle=\sum_{i\in I}\varepsilon_{i}(\Delta_{s}t)_{i}(W_{\pi_{s}t,I^{c}})_{i}.

Combining these observations, for every tTt\in T and (εi)i=1n(\varepsilon_{i})_{i=1}^{n},

14Zπs1t=\displaystyle\frac{1}{4}Z_{\pi_{s_{1}}t}= 𝔼ηA(iIεi(πs1t)iei),A(jIcεj(πs1t)jej)\displaystyle\mathbb{E}_{\eta}\left\langle A\bigl{(}\sum_{i\in I}\varepsilon_{i}(\pi_{s_{1}}t)_{i}e_{i}\bigr{)},A\bigl{(}\sum_{j\in I^{c}}\varepsilon_{j}(\pi_{s_{1}}t)_{j}e_{j}\bigr{)}\right\rangle
=\displaystyle= 𝔼ηs=s0s11iIεi(Δst)i(Wπst,Ic)i\displaystyle\mathbb{E}_{\eta}\sum_{s=s_{0}}^{s_{1}-1}\sum_{i\in I}\varepsilon_{i}(\Delta_{s}t)_{i}(W_{\pi_{s}t,I^{c}})_{i}
+\displaystyle+ 𝔼ηs=s0s11jIcεj(Δst)j(Wπs+1t,I)j\displaystyle\mathbb{E}_{\eta}\sum_{s=s_{0}}^{s_{1}-1}\sum_{j\in I^{c}}\varepsilon_{j}(\Delta_{s}t)_{j}(W_{\pi_{s+1}t,I})_{j}
+\displaystyle+ 𝔼ηjIcεj(πs0t)j(Wπs0t,I)j,\displaystyle\mathbb{E}_{\eta}\sum_{j\in I^{c}}\varepsilon_{j}(\pi_{s_{0}}t)_{j}(W_{\pi_{s_{0}}t,I})_{j},

from which the claim follows immediately.  

As part of the decoupling argument and to deal with the introduction of the random variables (ηi)i=1n(\eta_{i})_{i=1}^{n} in (2.7), we will use the following elementary fact:

Lemma 2.3.

Let ff be a function of (εi)i=1n(\varepsilon_{i})_{i=1}^{n} and (ηi)i=1n(\eta_{i})_{i=1}^{n}. If 𝔼η𝔼ε|f|qκq\mathbb{E}_{\eta}\mathbb{E}_{\varepsilon}|f|^{q}\leq\kappa^{q} then with (εi)i=1n(\varepsilon_{i})_{i=1}^{n}-probability at least 1exp(q)1-\exp(-q),

𝔼η(|f||(εi)i=1n)eκ.\mathbb{E}_{\eta}(|f|\ |(\varepsilon_{i})_{i=1}^{n})\leq e\kappa. (2.10)

Proof.  For a nonnegative function hh we have that point-wise, 𝟙{ht}hq/tq\mathbbm{1}_{\{h\geq t\}}\leq h^{q}/t^{q}. Let h=𝔼η(|f||(εi)i=1n)h=\mathbb{E}_{\eta}(|f|\ |(\varepsilon_{i})_{i=1}^{n}) and note that

Prε(𝔼η(|f||(εi)i=1n)uκ)=𝔼ε𝟙{huκ}(uκ)q(𝔼εhq).Pr_{\varepsilon}\left(\mathbb{E}_{\eta}(|f|\ |(\varepsilon_{i})_{i=1}^{n})\geq u\kappa\right)=\mathbb{E}_{\varepsilon}\mathbbm{1}_{\{h\geq u\kappa\}}\leq(u\kappa)^{-q}(\mathbb{E}_{\varepsilon}h^{q}).

By Jensen’s inequality followed by Fubini’s Theorem,

\mathbb{E}_{\varepsilon}h^{q}=\mathbb{E}_{\varepsilon}\left(\mathbb{E}_{\eta}(|f|\ |(\varepsilon_{i})_{i=1}^{n})\right)^{q}\leq\mathbb{E}_{\varepsilon}\mathbb{E}_{\eta}|f|^{q}\leq\kappa^{q},

and setting u=eu=e proves (2.10).  

We shall use Lemma 2.3 in situations where we actually have more information—namely that for any (ηi)i=1n(\eta_{i})_{i=1}^{n}, 𝔼ε(|f|q|(ηi)i=1n)κq\mathbb{E}_{\varepsilon}(|f|^{q}\ |(\eta_{i})_{i=1}^{n})\leq\kappa^{q} for a well chosen κ\kappa. As a result,

𝔼η(|f||(εi)i=1n)eκwithprobabilityatleast 1exp(q).\mathbb{E}_{\eta}\bigl{(}|f|\big{|}(\varepsilon_{i})_{i=1}^{n}\bigr{)}\leq e\kappa\ \ {\rm with\ probability\ at\ least\ }1-\exp(-q).

Taking into account Lemma 2.2 and Lemma 2.3, it follows that if one wishes to estimate suptT|Zπs1t|\sup_{t\in T}|Z_{\pi_{s_{1}}t}| using a chaining argument, it suffices to obtain, for every I{1,,n}I\subset\{1,...,n\}, bounds on moments of random variables of the form

jIcεj(Δst)j(Wπs+1t,I)j,iIεi(Δst)i(Wπst,Ic)i,andjIcεj(πs0t)j(Wπs0t,I)j,\sum_{j\in I^{c}}\varepsilon_{j}(\Delta_{s}t)_{j}(W_{\pi_{s+1}t,I})_{j},\ \ \ \sum_{i\in I}\varepsilon_{i}(\Delta_{s}t)_{i}(W_{\pi_{s}t,I^{c}})_{i},\ \ \ {\rm and}\ \ \ \sum_{j\in I^{c}}\varepsilon_{j}(\pi_{s_{0}}t)_{j}(W_{\pi_{s_{0}}t,I})_{j}, (2.11)

as that results in high probability estimates on each of the terms in (2.9). And as there are at most 2^{2^{s+3}} random variables involved in this chaining argument at stage s, the required moment in (2.11) is q\sim 2^{s} for the first two terms and q\sim 2^{s_{0}} for the third one.

2.2 Preliminary estimates

For J{1,,n}J\subset\{1,...,n\} let PJx=jJxjejP_{J}x=\sum_{j\in J}x_{j}e_{j} be the projection of xx onto span(ej)jJ{\rm span}(e_{j})_{j\in J}. The key lemma in the proof of Theorem 1.11 is:

Lemma 2.4.

There exists an absolute constant $c$ such that the following holds. Let $I\subset\{1,...,n\}$ and let $W_{v,I}$ be as in (2.8). Let $J\subset I^{c}$ be such that $|J|\leq k_{*}$. Then for $q\geq|J|$,

(𝔼PJWv,I2q)1/qcqδv2.\left(\mathbb{E}\|P_{J}W_{v,I}\|_{2}^{q}\right)^{1/q}\leq c\sqrt{q}\delta\|v\|_{2}.

Proof.  Let SJS^{J} be the Euclidean unit sphere in the subspace span(ej)jJ{\rm span}(e_{j})_{j\in J} and let UU be a maximal 1/101/10-separated subset of SJS^{J}. By a volumetric estimate (see, e.g. [1]), there is an absolute constant c0c_{0} such that |U|exp(c0|J|)|U|\leq\exp(c_{0}|J|). Moreover, a standard approximation argument shows that for every yny\in\mathbb{R}^{n},

PJy2=supxSJy,xc1maxxUy,x,\|P_{J}y\|_{2}=\sup_{x\in S^{J}}\left\langle y,x\right\rangle\leq c_{1}\max_{x\in U}\left\langle y,x\right\rangle,

where c1c_{1} is a suitable absolute constant. Therefore,

supxSJWv,I,xc1maxxUWv,I,x,\sup_{x\in S^{J}}\left\langle W_{v,I},x\right\rangle\leq c_{1}\max_{x\in U}\left\langle W_{v,I},x\right\rangle,

and it suffices to control, with high probability,

maxxUiIεiviei,AAx=maxxUiIεivi(AAx)i.\max_{x\in U}\left\langle\sum_{i\in I}\varepsilon_{i}v_{i}e_{i},A^{*}Ax\right\rangle=\max_{x\in U}\sum_{i\in I}\varepsilon_{i}v_{i}\left(A^{*}Ax\right)_{i}.

Fix $x\in U$ and recall that $A$ is $(\delta,k)$-regular for $1\leq k\leq k_{*}$. We first consider the case $1\leq q\leq k_{*}/2$.

Denote by III^{\prime}\subset I the set of indices corresponding to the qq largest values of |(AAx)i|,iI|(A^{*}Ax)_{i}|,\ i\in I. If |I|q|I|\leq q then set I=II^{\prime}=I.

It is straightforward to verify (e.g., using Hoeffding's inequality) that there is an absolute constant $c_{2}$ such that

iIεivi(AAx)iLq\displaystyle\left\|\sum_{i\in I}\varepsilon_{i}v_{i}\left(A^{*}Ax\right)_{i}\right\|_{L_{q}}\leq iI|vi(AAx)i|+c2q(iI\Ivi2(AAx)i2)1/2\displaystyle\sum_{i\in I^{\prime}}|v_{i}\left(A^{*}Ax\right)_{i}|+c_{2}\sqrt{q}\bigl{(}\sum_{i\in I\backslash I^{\prime}}v_{i}^{2}\left(A^{*}Ax\right)_{i}^{2}\bigr{)}^{1/2}
\displaystyle\leq v2PI(AAx)2+c2qPI(AAx)2qv2\displaystyle\|v\|_{2}\|P_{I^{\prime}}(A^{*}Ax)\|_{2}+c_{2}\sqrt{q}\cdot\frac{\|P_{I^{\prime}}(A^{*}Ax)\|_{2}}{\sqrt{q}}\|v\|_{2}
\displaystyle\leq c3v2PI(AAx)2,\displaystyle c_{3}\|v\|_{2}\|P_{I^{\prime}}(A^{*}Ax)\|_{2},

where we used the fact that for $i\in I\backslash I^{\prime}$,

|(AAx)i|PI(AAx)2q.|\left(A^{*}Ax\right)_{i}|\leq\frac{\|P_{I^{\prime}}(A^{*}Ax)\|_{2}}{\sqrt{q}}.

Therefore,

iIεivi(AAx)iLqc3v2maxII,|I|=qsupzSIAAx,z.\left\|\sum_{i\in I}\varepsilon_{i}v_{i}\left(A^{*}Ax\right)_{i}\right\|_{L_{q}}\leq c_{3}\|v\|_{2}\max_{I^{\prime}\subset I,\ |I^{\prime}|=q}\sup_{z\in S^{I^{\prime}}}\left\langle A^{*}Ax,z\right\rangle.

Note that $x$ is supported in $J\subset I^{c}$, while each admissible $z$ is supported in a subset of $I$; in particular, $x$ and $z$ are orthogonal, implying that for every such $z$,

x+z2=xz22and|supp(x+z)|,|supp(xz)|2q.\|x+z\|_{2}=\|x-z\|_{2}\leq 2\ \ {\rm and}\ \ |{\rm supp}(x+z)|,|{\rm supp}(x-z)|\leq 2q.

Thus, by the (δ,2q)(\delta,2q)-regularity of AA (as 2qk2q\leq k_{*}),

|AAx,z|=\displaystyle\left|\left\langle A^{*}Ax,z\right\rangle\right|= |Az,Ax|=14|A(x+z)22A(xz)22|\displaystyle\left|\left\langle Az,Ax\right\rangle\right|=\frac{1}{4}\left|\|A(x+z)\|_{2}^{2}-\|A(x-z)\|_{2}^{2}\right|
=\displaystyle= 14|(A(x+z)22x+z22)(A(xz)22xz22)|\displaystyle\frac{1}{4}\left|\left(\|A(x+z)\|_{2}^{2}-\|x+z\|_{2}^{2}\right)-\left(\|A(x-z)\|_{2}^{2}-\|x-z\|_{2}^{2}\right)\right|
\displaystyle\leq δ42qmax{x+z22,xz22}δq,\displaystyle\frac{\delta}{4}\sqrt{2q}\cdot\max\{\|x+z\|_{2}^{2},\|x-z\|_{2}^{2}\}\leq\delta\sqrt{q},

and it follows that

iIεivi(AAx)iLqc4v2δq.\left\|\sum_{i\in I}\varepsilon_{i}v_{i}\left(A^{*}Ax\right)_{i}\right\|_{L_{q}}\leq c_{4}\|v\|_{2}\delta\sqrt{q}.

Turning to the case $q\geq k_{*}/2$, let $I^{\prime}$ be the set of indices corresponding to the $k_{*}/2$ largest coordinates of $(|(A^{*}Ax)_{i}|)_{i\in I}$. Then,

\left\|\sum_{i\in I}\varepsilon_{i}v_{i}(A^{*}Ax)_{i}\right\|_{L_{q}}\leq\sum_{i\in I^{\prime}}|v_{i}(A^{*}Ax)_{i}|+c_{2}\sqrt{q}\left(\sum_{i\in I\backslash I^{\prime}}v_{i}^{2}(A^{*}Ax)_{i}^{2}\right)^{1/2}
\displaystyle\leq v2PIAAx2(1+c5qk),\displaystyle\|v\|_{2}\cdot\|P_{I^{\prime}}A^{*}Ax\|_{2}\left(1+c_{5}\sqrt{\frac{q}{k_{*}}}\right), (2.12)

using that for iI\Ii\in I\backslash I^{\prime},

|(AAx)i|PIAAx2k/2.|(A^{*}Ax)_{i}|\leq\frac{\|P_{I^{\prime}}A^{*}Ax\|_{2}}{\sqrt{k_{*}/2}}.

Recall that |supp(x)|=|J|k|{\rm supp}(x)|=|J|\leq k_{*} and that |I|=k/2|I^{\prime}|=k_{*}/2. The same argument used previously shows that

PIAAx2c6δk;\|P_{I^{\prime}}A^{*}Ax\|_{2}\leq c_{6}\delta\sqrt{k_{*}};

hence,

iIεivi(AAx)iLqc7v2δq,\left\|\sum_{i\in I}\varepsilon_{i}v_{i}\left(A^{*}Ax\right)_{i}\right\|_{L_{q}}\leq c_{7}\|v\|_{2}\delta\sqrt{q},

and the estimate holds for each q|J|q\geq|J| for that fixed xx.

Setting u1u\geq 1, it follows from Chebychev’s inequality that with probability at least 12exp(c8u2q)1-2\exp(-c_{8}u^{2}q),

|iIεivi(AAx)i|c9uv2δq,\left|\sum_{i\in I}\varepsilon_{i}v_{i}\left(A^{*}Ax\right)_{i}\right|\leq c_{9}u\|v\|_{2}\delta\sqrt{q},

and by the union bound, recalling that q|J|q\geq|J|, the same estimate holds uniformly for every xUx\in U, provided that uc10u\geq c_{10}. Thus, with probability at least 12exp(cu2q)1-2\exp(-cu^{2}q),

PJWv,I2cmaxxU|iIεivi(AAx)i|Cuv2δq,\|P_{J}W_{v,I}\|_{2}\leq c^{\prime}\max_{x\in U}\left|\sum_{i\in I}\varepsilon_{i}v_{i}(A^{*}Ax)_{i}\right|\leq Cu\|v\|_{2}\delta\sqrt{q},

and the desired estimate follows from tail integration.  

The next observation deals with more refined estimates on random variables of the form

Xa,b=iIcεiaibi.X_{a,b}=\sum_{i\in I^{c}}\varepsilon_{i}a_{i}b_{i}.

Once again, we use the fact that for any $J\subset I^{c}$,

\|X_{a,b}\|_{L_{q}}\leq\sum_{j\in J}|a_{j}b_{j}|+c\sqrt{q}\bigl{(}\sum_{j\in I^{c}\backslash J}a_{j}^{2}b_{j}^{2}\bigr{)}^{1/2}. (2.13)

As one might have guessed, $b$ will be a vector of the form $W_{\pi_{s}t,I}$; these are random vectors that are independent of the Bernoulli random variables involved in the definition of $X_{a,b}$. At the same time, $a$ will be deterministic.

Without loss of generality assume that $a_{i},b_{i}\geq 0$ for every $i$. Let $J_{1}$ be the set of indices corresponding to the $k_{1}$ largest coordinates of $(a_{i})_{i\in I^{c}}$, let $J_{2}$ be the set corresponding to the next $k_{2}$ largest coordinates, and so on. The choice of $k_{1},k_{2},...$ will be specified in what follows.

Note that for any >1\ell>1,

PJaPJ1a2|J1|=PJ1a2k1.\|P_{J_{\ell}}a\|_{\infty}\leq\frac{\|P_{J_{\ell-1}}a\|_{2}}{\sqrt{|J_{\ell-1}|}}=\frac{\|P_{J_{\ell-1}}a\|_{2}}{\sqrt{k_{\ell-1}}}.

Set J=J1J=J_{1}, and by (2.13),

\|X_{a,b}\|_{L_{q}}\leq \sum_{j\in J_{1}}|a_{j}b_{j}|+c\sqrt{q}\bigl{(}\sum_{\ell\geq 1}\sum_{j\in J_{\ell+1}}a_{j}^{2}b_{j}^{2}\bigr{)}^{1/2}
\leq \|P_{J_{1}}a\|_{2}\|P_{J_{1}}b\|_{2}+c\sqrt{q}\bigl{(}\sum_{\ell\geq 1}\|P_{J_{\ell+1}}a\|_{\infty}^{2}\|P_{J_{\ell+1}}b\|_{2}^{2}\bigr{)}^{1/2}
\leq \|P_{J_{1}}a\|_{2}\|P_{J_{1}}b\|_{2}+c\sqrt{q}\bigl{(}\sum_{\ell\geq 1}\frac{\|P_{J_{\ell}}a\|_{2}^{2}}{|J_{\ell}|}\|P_{J_{\ell+1}}b\|_{2}^{2}\bigr{)}^{1/2}
\leq \|P_{J_{1}}a\|_{2}\|P_{J_{1}}b\|_{2}+c\sqrt{q}\max_{\ell\geq 2}\frac{\|P_{J_{\ell}}b\|_{2}}{\sqrt{k_{\ell-1}}}\cdot\bigl{(}\sum_{\ell\geq 1}\|P_{J_{\ell}}a\|_{2}^{2}\bigr{)}^{1/2}
\leq \|a\|_{2}\Bigl{(}\|P_{J_{1}}b\|_{2}+c\sqrt{q}\max_{\ell\geq 2}\frac{\|P_{J_{\ell}}b\|_{2}}{\sqrt{k_{\ell-1}}}\Bigr{)}. (2.14)

And, in the case where |J|=k|J_{\ell}|=k for every \ell, it follows that

Xa,bLqa2(PJ1b2+cqkmax2PJb2).\|X_{a,b}\|_{L_{q}}\leq\|a\|_{2}\Bigl{(}\|P_{J_{1}}b\|_{2}+c\sqrt{\frac{q}{k}}\max_{\ell\geq 2}\|P_{J_{\ell}}b\|_{2}\Bigr{)}. (2.15)
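The block ("shelling") comparison that drives (2.14) and (2.15) is elementary and easy to check numerically. The following is a minimal sketch; the dimensions n and k are arbitrary choices made for illustration:

```python
import numpy as np

# Numerical check of the shelling inequality: after splitting the coordinates
# of a vector a (sorted by decreasing magnitude) into consecutive blocks
# J_1, J_2, ... of size k, the sup-norm on each block J_l is dominated by the
# normalized Euclidean norm of the previous block J_{l-1}.
rng = np.random.default_rng(1)
n, k = 64, 8
a = np.abs(rng.standard_normal(n))

order = np.argsort(-a)                        # indices by decreasing magnitude
blocks = [order[i:i + k] for i in range(0, n, k)]

for prev, cur in zip(blocks, blocks[1:]):
    lhs = a[cur].max()                              # ||P_{J_l} a||_inf
    rhs = np.linalg.norm(a[prev]) / np.sqrt(k)      # ||P_{J_{l-1}} a||_2 / sqrt(k)
    assert lhs <= rhs + 1e-12
print("shelling inequality verified on", len(blocks), "blocks")
```

The inequality holds because every entry of block $J_{\ell-1}$ dominates every entry of $J_{\ell}$, and the quadratic mean of a block dominates its minimum.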

2.3 Estimating Φ\Phi

Recall that

Φ=suptTss1AεΔst2\Phi=\sup_{t\in T}\sum_{s\geq s_{1}}\|A_{\varepsilon}\Delta_{s}t\|_{2}

and that for every tnt\in\mathbb{R}^{n},

Zt=ijεiεjtitjAei,Aej.Z_{t}=\sum_{i\not=j}\varepsilon_{i}\varepsilon_{j}t_{i}t_{j}\left\langle Ae_{i},Ae_{j}\right\rangle.
Theorem 2.5.

There are absolute constants cc and CC such that for u>1u>1, with probability at least 12exp(cu22s1)1-2\exp(-cu^{2}2^{s_{1}}),

Φ2Cu2δ22(T).\Phi^{2}\leq Cu^{2}\delta^{2}\ell_{*}^{2}(T).

Proof.  Let ss1s\geq s_{1}, and as noted previously, for every tTt\in T,

AεΔst22=\displaystyle\|A_{\varepsilon}\Delta_{s}t\|_{2}^{2}= i=1nAei22(Δst)i2+ijAei,Aejεiεj(Δst)i(Δst)j\displaystyle\sum_{i=1}^{n}\|Ae_{i}\|_{2}^{2}(\Delta_{s}t)_{i}^{2}+\sum_{i\not=j}\left\langle Ae_{i},Ae_{j}\right\rangle\varepsilon_{i}\varepsilon_{j}(\Delta_{s}t)_{i}(\Delta_{s}t)_{j}
\displaystyle\leq (1+δ)Δst22+|ijAei,Aejεiεj(Δst)i(Δst)j|\displaystyle(1+\delta)\|\Delta_{s}t\|_{2}^{2}+\left|\sum_{i\not=j}\left\langle Ae_{i},Ae_{j}\right\rangle\varepsilon_{i}\varepsilon_{j}(\Delta_{s}t)_{i}(\Delta_{s}t)_{j}\right|
\displaystyle\leq 2Δst22+|ZΔst|.\displaystyle 2\|\Delta_{s}t\|_{2}^{2}+|Z_{\Delta_{s}t}|.

Following the decoupling argument, one may fix I{1,,n}I\subset\{1,...,n\}. The core of the argument is to obtain satisfactory estimates on moments of the random variables

VΔst,I=jIcεj(Δst)j(WΔst,I)jV_{\Delta_{s}t,I}=\sum_{j\in I^{c}}\varepsilon_{j}(\Delta_{s}t)_{j}(W_{\Delta_{s}t,I})_{j}

for that (arbitrary) fixed choice of II. And, since |Ts|=22s|T_{s}|=2^{2^{s}}, for a uniform estimate that holds for every random variable of the form VΔst,I,tTV_{\Delta_{s}t,I},\ t\in T, it is enough to control the qq-th moment of each VΔst,IV_{\Delta_{s}t,I} for q2sq\sim 2^{s}.

With that in mind, denote by $\mathbb{E}_{I^{c}}$ the expectation taken with respect to $(\varepsilon_{i})_{i\in I^{c}}$, and by $\mathbb{E}_{I}$ the expectation taken with respect to $(\varepsilon_{i})_{i\in I}$.

Let us apply (2.15), with the choice

k=1/δ2k,a=Δstandb=WΔst,I.k=1/\delta^{2}\leq k_{*},\ \ a=\Delta_{s}t\ \ {\rm and}\ \ b=W_{\Delta_{s}t,I}.

Thus, for every (εi)iI(\varepsilon_{i})_{i\in I},

(𝔼Ic|VΔst,I|q)1/qΔst2(PJ1WΔst,I2+c0δqmax2PJWΔst,I2).\left(\mathbb{E}_{I^{c}}|V_{\Delta_{s}t,I}|^{q}\right)^{1/q}\leq\|\Delta_{s}t\|_{2}\left(\|P_{J_{1}}W_{\Delta_{s}t,I}\|_{2}+c_{0}\delta\sqrt{q}\cdot\max_{\ell\geq 2}\|P_{J_{\ell}}W_{\Delta_{s}t,I}\|_{2}\right).

The sets JJ_{\ell} are all of cardinality 1/δ21/\delta^{2} and so there are at most max{δ2n,1}\max\{\delta^{2}n,1\} of them. By Lemma 2.4, for each one of the sets JIcJ_{\ell}\subset I^{c} and q|J|=1/δ2q\geq|J_{\ell}|=1/\delta^{2},

(𝔼IPJWΔst,I2q)1/qc1δqΔst2,\left(\mathbb{E}_{I}\|P_{J_{\ell}}W_{\Delta_{s}t,I}\|_{2}^{q}\right)^{1/q}\leq c_{1}\delta\sqrt{q}\|\Delta_{s}t\|_{2},

implying that

\left(\mathbb{E}_{I}\max_{\ell}\|P_{J_{\ell}}W_{\Delta_{s}t,I}\|_{2}^{q}\right)^{1/q}\leq\left(\sum_{\ell}\mathbb{E}_{I}\|P_{J_{\ell}}W_{\Delta_{s}t,I}\|_{2}^{q}\right)^{1/q}\leq\left(\max\{\delta^{2}n,1\}\right)^{1/q}\cdot c_{1}\delta\sqrt{q}\|\Delta_{s}t\|_{2}\leq c_{2}\delta\sqrt{q}\|\Delta_{s}t\|_{2} (2.16)

as long as $\max\{\delta^{2}n,1\}\leq\exp(q)$. In particular, since $2^{s_{1}}\geq\log(e(1+n\delta^{2}))$, it suffices that $q\geq 2^{s_{1}}$ to ensure that (2.16) holds.

Hence, for every I{1,,n}I\subset\{1,...,n\},

(𝔼|VΔst,I|q)1/q=\displaystyle\left(\mathbb{E}|V_{\Delta_{s}t,I}|^{q}\right)^{1/q}= (𝔼I𝔼Ic|VΔst,I|q)1/q\displaystyle\left(\mathbb{E}_{I}\mathbb{E}_{I^{c}}|V_{\Delta_{s}t,I}|^{q}\right)^{1/q}
\displaystyle\leq Δst2(𝔼I(PJ1WΔst,I2+c0δqmax2PJWΔst,I2)q)1/q\displaystyle\|\Delta_{s}t\|_{2}\left(\mathbb{E}_{I}\left(\|P_{J_{1}}W_{\Delta_{s}t,I}\|_{2}+c_{0}\delta\sqrt{q}\cdot\max_{\ell\geq 2}\|P_{J_{\ell}}W_{\Delta_{s}t,I}\|_{2}\right)^{q}\right)^{1/q}
\displaystyle\leq Δst22((𝔼IPJ1WΔst,I2q)1/q+c0δq(𝔼Imax2PJWΔst,I2q)1/q)\displaystyle\|\Delta_{s}t\|_{2}\cdot 2\left(\left(\mathbb{E}_{I}\|P_{J_{1}}W_{\Delta_{s}t,I}\|_{2}^{q}\right)^{1/q}+c_{0}\delta\sqrt{q}\left(\mathbb{E}_{I}\max_{\ell\geq 2}\|P_{J_{\ell}}W_{\Delta_{s}t,I}\|_{2}^{q}\right)^{1/q}\right)
\displaystyle\leq c3Δst22(δq+δ2q)\displaystyle c_{3}\|\Delta_{s}t\|_{2}^{2}\left(\delta\sqrt{q}+\delta^{2}q\right)
\displaystyle\leq c4δ2qΔst22,\displaystyle c_{4}\delta^{2}q\|\Delta_{s}t\|_{2}^{2},

because δ2qδ22s11\delta^{2}q\geq\delta^{2}2^{s_{1}}\geq 1.

By Jensen’s inequality, for every tTt\in T,

(𝔼|ZΔst|q)1/q\displaystyle\left(\mathbb{E}|Z_{\Delta_{s}t}|^{q}\right)^{1/q}\leq 4(𝔼η𝔼ε|iIcεi(Δst)i(WΔst,I)i|q)1/q\displaystyle 4\left(\mathbb{E}_{\eta}\mathbb{E}_{\varepsilon}\left|\sum_{i\in I^{c}}\varepsilon_{i}(\Delta_{s}t)_{i}(W_{\Delta_{s}t,I})_{i}\right|^{q}\right)^{1/q}
=\displaystyle= 4(𝔼η𝔼ε|VΔst,I|q)1/qc5qδ2Δst22.\displaystyle 4\left(\mathbb{E}_{\eta}\mathbb{E}_{\varepsilon}|V_{\Delta_{s}t,I}|^{q}\right)^{1/q}\leq c_{5}q\delta^{2}\|\Delta_{s}t\|_{2}^{2}.

Setting q=u2sq=u2^{s} for u1u\geq 1 and ss1s\geq s_{1}, Chebychev’s inequality implies that with (εi)i=1n(\varepsilon_{i})_{i=1}^{n}-probability at least 12exp(c6u22s)1-2\exp(-c_{6}u^{2}2^{s}),

\|A_{\varepsilon}\Delta_{s}t\|_{2}^{2}\leq 2\|\Delta_{s}t\|_{2}^{2}+|Z_{\Delta_{s}t}|\leq 2\|\Delta_{s}t\|_{2}^{2}+c_{7}u^{2}2^{s}\delta^{2}\|\Delta_{s}t\|_{2}^{2}
\displaystyle\leq c8u2δ22sΔst22.\displaystyle c_{8}u^{2}\delta^{2}2^{s}\|\Delta_{s}t\|_{2}^{2}.

By the union bound, this estimate holds for every tTt\in T and ss1s\geq s_{1}. Thus, there are absolute constants cc and CC such that with probability at least 12exp(cu22s1)1-2\exp(-cu^{2}2^{s_{1}}),

ss1AεΔst2Cuδss12s/2Δst2Cuδ(T),\sum_{s\geq s_{1}}\|A_{\varepsilon}\Delta_{s}t\|_{2}\leq Cu\delta\sum_{s\geq s_{1}}2^{s/2}\|\Delta_{s}t\|_{2}\leq Cu\delta\ell_{*}(T),

as claimed.
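Here, the last inequality used the fact that (in the setup fixed earlier) the admissible sequence $(T_{s})$ is chosen to be almost optimal, so that by Talagrand's majorizing measures theorem (see [10]),

\sup_{t\in T}\sum_{s\geq 0}2^{s/2}\|\Delta_{s}t\|_{2}\leq c\gamma_{2}(T)\leq C\ell_{*}(T).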

 

2.4 Estimating Ψ\Psi

Next, recall that

Ψ2=suptT|Aεπs1t22πs1t22|.\Psi^{2}=\sup_{t\in T}\left|\|A_{\varepsilon}\pi_{s_{1}}t\|_{2}^{2}-\|\pi_{s_{1}}t\|_{2}^{2}\right|.

Expanding as in (2.6), the “diagonal term” is at most δ2dT2\delta^{2}d_{T}^{2}, and one has to deal with the “off-diagonal” term

suptT|Zπs1t|.\sup_{t\in T}|Z_{\pi_{s_{1}}t}|.

As observed in Lemma 2.2 combined with Lemma 2.3, it suffices to obtain, for every fixed I{1,,n}I\subset\{1,...,n\}, estimates on the moments of

\sum_{j\in I^{c}}\varepsilon_{j}(\Delta_{s}t)_{j}(W_{\pi_{s+1}t,I})_{j},\ \ \ \sum_{i\in I}\varepsilon_{i}(\Delta_{s}t)_{i}(W_{\pi_{s}t,I^{c}})_{i},\ \ {\rm and}\ \ \sum_{j\in I^{c}}\varepsilon_{j}(\pi_{s_{0}}t)_{j}(W_{\pi_{s_{0}}t,I})_{j}. (2.17)
Remark 2.6.

We shall assume throughout that ((T)/dT)21/δ2(\ell_{*}(T)/d_{T})^{2}\leq 1/\delta^{2}; the required modifications when the reverse inequality holds are straightforward and are therefore omitted.

We begin with a standard observation:

Lemma 2.7.

There is an absolute constant cc such that the following holds. Let (X)0(X_{\ell})_{\ell\geq 0} be nonnegative random variables, let q1q\geq 1 and set q=q2q_{\ell}=q2^{\ell}. If there is some κ\kappa such that for every \ell, XLqκ\|X_{\ell}\|_{L_{q_{\ell}}}\leq\kappa, then

\|\max_{\ell}X_{\ell}\|_{L_{q}}\leq c\kappa.

Proof.  By Chebychev’s inequality, for every 1\ell\geq 1 and u2u\geq 2,

Pr(|X|uκ)𝔼|X|q(uκ)quq.Pr\left(|X_{\ell}|\geq u\kappa\right)\leq\frac{\mathbb{E}|X_{\ell}|^{q_{\ell}}}{(u\kappa)^{q_{\ell}}}\leq u^{-q_{\ell}}.

Therefore,

Pr(2:|X|uκ)2uq2c1u4q,Pr\left(\exists\ell\geq 2:|X_{\ell}|\geq u\kappa\right)\leq\sum_{\ell\geq 2}u^{-q2^{\ell}}\leq c_{1}u^{-4q},

implying that

\mathbb{E}\max_{\ell\geq 2}|X_{\ell}|^{q}=\kappa^{q}\int_{0}^{\infty}qu^{q-1}Pr\left(\max_{\ell\geq 2}|X_{\ell}|>u\kappa\right)du\leq(c_{1}\kappa)^{q}.

Hence,

maxXLqX0Lq+X1Lq+max2XLqc2κ.\|\max_{\ell}X_{\ell}\|_{L_{q}}\leq\|X_{0}\|_{L_{q}}+\|X_{1}\|_{L_{q}}+\|\max_{\ell\geq 2}X_{\ell}\|_{L_{q}}\leq c_{2}\kappa.

 

The analysis is split into two cases. Case I: $\log(e(1+\delta^{2}n))\leq 1/\delta^{2}$. In this case, $2^{s_{1}}=1/\delta^{2}$. For every $s_{0}\leq s\leq s_{1}$ we invoke (2.14) for sets $J_{\ell}$ of cardinality $2^{s+\ell}$ when $s+\ell\leq s_{1}$, and of cardinality $1/\delta^{2}$ when $s+\ell>s_{1}$.

Set a=Δsta=\Delta_{s}t and b=Wπs+1t,Ib=W_{\pi_{s+1}t,I}; the treatment when b=Wπst,Icb=W_{\pi_{s}t,I^{c}} is identical and is omitted. Let q2sq\geq 2^{s} and put q=q2q_{\ell}=q2^{\ell} if s+s1s+\ell\leq s_{1}. Finally, set p1/δ2p\geq 1/\delta^{2}.

Consider \ell such that s+s1s+\ell\leq s_{1} and observe that q|J|=2s+q_{\ell}\geq|J_{\ell}|=2^{s+\ell}. By Lemma 2.4,

|J1|1/2(𝔼PJWπs+1t,I2q)1/qc1δπs+1t2q22s+c1q2sδdT.|J_{\ell-1}|^{-1/2}\left(\mathbb{E}\|P_{J_{\ell}}W_{\pi_{s+1}t,I}\|_{2}^{q_{\ell}}\right)^{1/q_{\ell}}\leq c_{1}\delta\|\pi_{s+1}t\|_{2}\cdot\sqrt{\frac{q2^{\ell}}{2^{s+\ell}}}\leq c_{1}\sqrt{\frac{q}{2^{s}}}\delta d_{T}. (2.18)

Therefore, by Lemma 2.7,

(𝔼max{:s+s1}(|J1|1/2PJWπs+1t,I2)q)1/qc2q2sδdT.\left(\mathbb{E}\max_{\{\ell:s+\ell\leq s_{1}\}}\left(|J_{\ell-1}|^{-1/2}\|P_{J_{\ell}}W_{\pi_{s+1}t,I}\|_{2}\right)^{q}\right)^{1/q}\leq c_{2}\sqrt{\frac{q}{2^{s}}}\delta d_{T}.

Next, if s+>s1s+\ell>s_{1} then p|J|=1/δ2p\geq|J_{\ell}|=1/\delta^{2} and by Lemma 2.4,

|J1|1/2(𝔼PJWπs+1t,I2p)1/pc3δπs+1t2p1/δ2c3pδ2δdT.|J_{\ell-1}|^{-1/2}\left(\mathbb{E}\|P_{J_{\ell}}W_{\pi_{s+1}t,I}\|_{2}^{p}\right)^{1/p}\leq c_{3}\delta\|\pi_{s+1}t\|_{2}\cdot\sqrt{\frac{p}{1/\delta^{2}}}\leq c_{3}\sqrt{p\delta^{2}}\cdot\delta d_{T}. (2.19)

There are at most nδ2n\delta^{2} sets JJ_{\ell} in that range, and because nδ2exp(p)n\delta^{2}\leq\exp(p), it is evident that

(𝔼max{:s+>s1}(|J1|1/2PJWπs+1t,I2)p)1/pc4pδ2δdT;\left(\mathbb{E}\max_{\{\ell:s+\ell>s_{1}\}}\left(|J_{\ell-1}|^{-1/2}\|P_{J_{\ell}}W_{\pi_{s+1}t,I}\|_{2}\right)^{p}\right)^{1/{p}}\leq c_{4}\sqrt{p\delta^{2}}\cdot\delta d_{T};

therefore, as qpq\leq p,

\left(\mathbb{E}\max_{\{\ell:s+\ell>s_{1}\}}\left(|J_{\ell-1}|^{-1/2}\|P_{J_{\ell}}W_{\pi_{s+1}t,I}\|_{2}\right)^{q}\right)^{1/{q}}\leq c_{4}\sqrt{p\delta^{2}}\cdot\delta d_{T}.

Thus, for every fixed I{1,,n}I\subset\{1,...,n\},

(𝔼max(|J1|1/2PJWπs+1t,I2)q)1/qc5max{q2s,pδ2}δdT.\left(\mathbb{E}\max_{\ell}\left(|J_{\ell-1}|^{-1/2}\|P_{J_{\ell}}W_{\pi_{s+1}t,I}\|_{2}\right)^{q}\right)^{1/{q}}\leq c_{5}\max\left\{\sqrt{\frac{q}{2^{s}}},\sqrt{p\delta^{2}}\right\}\cdot\delta d_{T}. (2.20)

Now, by (2.14),

(𝔼Ic|jIcεj(Δst)j(Wπs+1t,I)j|q)1/q\displaystyle\left(\mathbb{E}_{I^{c}}\left|\sum_{j\in I^{c}}\varepsilon_{j}(\Delta_{s}t)_{j}(W_{\pi_{s+1}t,I})_{j}\right|^{q}\right)^{1/q}
\displaystyle\leq Δst2(PJ1Wπs+1t,I2+c6qmax2PJ+1Wπs+1t,I2|J|),\displaystyle\|\Delta_{s}t\|_{2}\left(\|P_{J_{1}}W_{\pi_{s+1}t,I}\|_{2}+c_{6}\sqrt{q}\max_{\ell\geq 2}\frac{\|P_{J_{\ell+1}}W_{\pi_{s+1}t,I}\|_{2}}{\sqrt{|J_{\ell}|}}\right), (2.21)

which, combined with (2.18) and (2.20), implies that

(𝔼|jIcεj(Δst)j(Wπs+1t,I)j|q)1/qc7qΔst2(1+max{q2s,pδ2})δdT.\left(\mathbb{E}\left|\sum_{j\in I^{c}}\varepsilon_{j}(\Delta_{s}t)_{j}(W_{\pi_{s+1}t,I})_{j}\right|^{q}\right)^{1/q}\leq c_{7}\sqrt{q}\|\Delta_{s}t\|_{2}\cdot\Bigl{(}1+\max\left\{\sqrt{\frac{q}{2^{s}}},\sqrt{p\delta^{2}}\right\}\Bigr{)}\cdot\delta d_{T}. (2.22)

Clearly, there are at most 22s+32^{2^{s+3}} random variables as in (2.22). With that in mind, set u1u\geq 1 and let q=u2s,p=u/δ2q=u2^{s},\ p=u/\delta^{2}. By Lemma 2.2 and Lemma 2.3 followed by the union bound, we have that with probability at least 12exp(c8u22s)1-2\exp(-c_{8}u^{2}2^{s}), for every tTt\in T,

\mathbb{E}_{\eta}\left|\sum_{j\in I^{c}}\varepsilon_{j}(\Delta_{s}t)_{j}(W_{\pi_{s+1}t,I})_{j}\right|\leq c_{9}u^{2}2^{s/2}\|\Delta_{s}t\|_{2}\cdot\delta d_{T}. (2.23)

And, by the union bound and recalling that 2s0=((T)/dT)22^{s_{0}}=(\ell_{*}(T)/d_{T})^{2}, (2.23) holds uniformly for every tTt\in T and s0ss1s_{0}\leq s\leq s_{1} with probability at least

12exp(c10u2((T)dT)2).1-2\exp\left(-c_{10}u^{2}\left(\frac{\ell_{*}(T)}{d_{T}}\right)^{2}\right).

An identical argument shows that with probability at least

12exp(c11u22s0)=12exp(c11u2((T)/dT)2),1-2\exp(-c_{11}u^{2}2^{s_{0}})=1-2\exp(-c_{11}u^{2}(\ell_{*}(T)/d_{T})^{2}),

for every tTt\in T,

𝔼η|iIcεi(πs0t)i(Wπs0t,I)i|c12u22s0/2dTδdTc13dTδ(T).\mathbb{E}_{\eta}\left|\sum_{i\in I^{c}}\varepsilon_{i}(\pi_{s_{0}}t)_{i}(W_{\pi_{s_{0}}t,I})_{i}\right|\leq c_{12}u^{2}2^{s_{0}/2}d_{T}\cdot\delta d_{T}\leq c_{13}d_{T}\delta\ell_{*}(T).

Finally, invoking Lemma 2.2, we have that with probability at least 12exp(cu2((T)/dT)2)1-2\exp(-cu^{2}(\ell_{*}(T)/d_{T})^{2}), for every tTs1t\in T_{s_{1}},

|Zt|Cu2dTδ(T),|Z_{t}|\leq Cu^{2}d_{T}\delta\ell_{*}(T),

as required.  

Case II: log(e(1+δ2n))>1/δ2\log(e(1+\delta^{2}n))>1/\delta^{2}.

The necessary modifications are minor and we shall only sketch them. In this range, $2^{s_{1}}=\log(e(1+\delta^{2}n))$, and the problem is that for each vector $\Delta_{s}t$, the number of blocks $J_{\ell}$ of cardinality $1/\delta^{2}$, namely $n\delta^{2}$, is larger than $\exp(2^{s})$ when $s\leq s_{1}$. Therefore, setting $|J_{\ell}|=1/\delta^{2}$ for every $\ell$, the uniform estimate on

\max_{\ell}\frac{\|P_{J_{\ell+1}}W_{\pi_{s+1}t,I}\|_{2}}{\sqrt{|J_{\ell}|}}=\delta\max_{\ell}\|P_{J_{\ell}}W_{\pi_{s+1}t,I}\|_{2}

can be obtained by bounding $\left(\mathbb{E}\|P_{J_{\ell}}W_{\pi_{s+1}t,I}\|_{2}^{q}\right)^{1/q}$ for $q\geq\log(\delta^{2}n)$. Indeed, by Lemma 2.4, for every $I\subset\{1,...,n\}$ and every $t\in T$, we have that

\delta\left(\mathbb{E}\max_{\ell}\|P_{J_{\ell}}W_{\pi_{s+1}t,I}\|_{2}^{q}\right)^{1/q}\leq c\delta^{2}\sqrt{q}d_{T}.

The rest of the argument is identical to the previous one and is omitted.  

Proof of Theorem 1.11. Using the estimates we established, it follows that for uc0u\geq c_{0}, with probability at least

12exp(c1u2((T)dT)2),1-2\exp\left(-c_{1}u^{2}\left(\frac{\ell_{*}(T)}{d_{T}}\right)^{2}\right),
ΦCuδ(T)andΨ2Cu2dTδ(T)(1+δlog(e(1+nδ2))),\Phi\leq Cu\delta\ell_{*}(T)\ \ {\rm and}\ \ \Psi^{2}\leq Cu^{2}d_{T}\delta\ell_{*}(T)\left(1+\delta\sqrt{\log(e(1+n\delta^{2}))}\right),

noting that δdT2c2dTδ(T)\delta d_{T}^{2}\leq c_{2}d_{T}\delta\ell_{*}(T). Since

suptT|Aεt22t22|Ψ2+2ΦΨ2+dT2+Φ2+C((δ(T))2+dTδ(T))\sup_{t\in T}\left|\|A_{\varepsilon}t\|_{2}^{2}-\|t\|_{2}^{2}\right|\leq\Psi^{2}+2\Phi\sqrt{\Psi^{2}+d_{T}^{2}}+\Phi^{2}+C^{\prime}\left((\delta\ell_{*}(T))^{2}+d_{T}\delta\ell_{*}(T)\right)

the claim follows from a straightforward computation by separating into the cases $\Phi\leq\Psi$ and $\Psi\leq\Phi$.  

3 Application - A circulant Bernoulli matrix

Let $(\varepsilon_{i})_{i=1}^{n}$ and $(\varepsilon_{i}^{\prime})_{i=1}^{n}$ be independent Bernoulli vectors. Set $\xi=(\varepsilon_{i}^{\prime})_{i=1}^{n}$ and put $D_{\varepsilon}={\rm diag}(\varepsilon_{1},...,\varepsilon_{n})$. Let $\Gamma$ be the circulant matrix generated by the random vector $\xi$; that is, $\Gamma$ is the matrix whose $j$-th row is the shifted vector $\tau_{j}\xi$, where for every $v\in\mathbb{R}^{n}$, $\tau_{j}v=(v_{j-i})_{i=1}^{n}$ and the indices are taken modulo $n$.

Fix I{1,,n}I\subset\{1,...,n\} of cardinality mm and let

A=1mPIΓ:nmA=\sqrt{\frac{1}{m}}P_{I}\Gamma:\mathbb{R}^{n}\to\mathbb{R}^{m}

be the normalized partial circulant matrix. It follows from Theorem 3.1 in [6] and the estimates in Section 4 there that, for a typical realization of $\xi$, the matrix $A$ is $(\delta,k)$-regular for $\delta\sim m^{-1/2}\log^{2}n$:

Theorem 3.1.

[6] There exist absolute constants $c$ and $C$ such that the following holds. For $x>0$, with probability at least

12exp(cmin{x2mk,xmklog2n})1-2\exp\left(-c\min\left\{x^{2}\frac{m}{k},x\sqrt{\frac{m}{k}}\log^{2}n\right\}\right)

we have that

suptΣk|At221|C(1+x)kmlog2n.\sup_{t\in\Sigma_{k}}\left|\|At\|_{2}^{2}-1\right|\leq C(1+x)\sqrt{\frac{k}{m}}\log^{2}n.

By Theorem 3.1 and the union bound for 1k1/δ21\leq k\leq 1/\delta^{2}, there is an event Ω\Omega with probability at least 12exp(clog4n)1-2\exp(-c^{\prime}\log^{4}n) with respect to (εi)i=1n(\varepsilon_{i}^{\prime})_{i=1}^{n}, on which, for every k1/δ2k\leq 1/\delta^{2},

suptΣk|At221|δk.\sup_{t\in\Sigma_{k}}\left|\|At\|_{2}^{2}-1\right|\leq\delta\sqrt{k}.

This verifies the assumption needed in Theorem 1.11 on the event Ω\Omega. Now fix (εi)i=1nΩ(\varepsilon_{i}^{\prime})_{i=1}^{n}\in\Omega and let AA be the resulting partial circulant matrix. Set Aε=ADεA_{\varepsilon}=AD_{\varepsilon} and let TnT\subset\mathbb{R}^{n}. By Theorem 1.11, with probability at least

12exp(c((T)/dT)2)1-2\exp(-c^{\prime}(\ell_{*}(T)/d_{T})^{2})

with respect to (εi)i=1n(\varepsilon_{i})_{i=1}^{n}, we have that

suptT|Aεt22t22|C(ΛdT(T)mlog2n+((T)m)2log4n)\sup_{t\in T}\left|\|A_{\varepsilon}t\|_{2}^{2}-\|t\|_{2}^{2}\right|\leq C^{\prime}\left(\Lambda d_{T}\frac{\ell_{*}(T)}{\sqrt{m}}\cdot\log^{2}n+\left(\frac{\ell_{*}(T)}{\sqrt{m}}\right)^{2}\cdot\log^{4}n\right)

where

Λc′′max{1,log5nm}.\Lambda\leq c^{\prime\prime}\max\left\{1,\frac{\log^{5}n}{m}\right\}.

Thus, a random matrix generated by 2n2n independent random signs is a good embedding of an arbitrary subset of n\mathbb{R}^{n} with the same accuracy (up to logarithmic factors) as a gaussian matrix. Moreover, conditioned on the choice of the circulant matrix AA, the probability estimate coincides with the estimate in the gaussian case.
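For illustration only, here is a minimal numerical sketch of the construction above. The dimensions n and m, the choice of the row set I, and the test vector are arbitrary; the printed distortion is merely a sanity check and not the guarantee of the theorem:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 256, 128

xi = rng.choice([-1.0, 1.0], size=n)     # Bernoulli vector generating Gamma
eps = rng.choice([-1.0, 1.0], size=n)    # column-randomization signs

# Gamma's j-th row is the shifted vector tau_j(xi): Gamma[j, i] = xi[(j - i) mod n].
idx = np.arange(n)
Gamma = xi[(idx[:, None] - idx[None, :]) % n]

# Normalized partial circulant matrix (rows indexed by I), then A_eps = A @ diag(eps).
I = idx[:m]
A = Gamma[I] / np.sqrt(m)
A_eps = A * eps[None, :]

# Empirical distortion of the squared norm on a random unit vector.
t = rng.standard_normal(n)
t /= np.linalg.norm(t)
distortion = abs(np.linalg.norm(A_eps @ t) ** 2 - 1.0)
print(A_eps.shape, round(distortion, 3))
```

The construction uses only $2n$ random signs; in practice the product $A_{\varepsilon}t$ would be computed via the FFT rather than by forming $\Gamma$ explicitly.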

References

  • [1] Shiri Artstein-Avidan, Apostolos Giannopoulos, and Vitali D. Milman. Asymptotic geometric analysis. Part I, volume 202 of Mathematical Surveys and Monographs. American Mathematical Society, Providence, RI, 2015.
  • [2] Witold Bednorz. Concentration via chaining method and its applications. arXiv:1405.0676.
  • [3] Sjoerd Dirksen. Tail bounds via generic chaining. Electron. J. Probab., 20:no. 53, 29, 2015.
  • [4] Sjoerd Dirksen. Dimensionality reduction with subgaussian matrices: a unified theory. Found. Comput. Math., 16(5):1367–1396, 2016.
  • [5] William B. Johnson and Joram Lindenstrauss. Extensions of Lipschitz mappings into a Hilbert space. In Conference in modern analysis and probability (New Haven, Conn., 1982), volume 26 of Contemp. Math., pages 189–206. Amer. Math. Soc., Providence, RI, 1984.
  • [6] Felix Krahmer, Shahar Mendelson, and Holger Rauhut. Suprema of chaos processes and the restricted isometry property. Comm. Pure Appl. Math., 67(11):1877–1904, 2014.
  • [7] Felix Krahmer and Rachel Ward. New and improved Johnson-Lindenstrauss embeddings via the restricted isometry property. SIAM J. Math. Anal., 43(3):1269–1281, 2011.
  • [8] Shahar Mendelson. Upper bounds on product and multiplier empirical processes. Stochastic Process. Appl., 126(12):3652–3680, 2016.
  • [9] Samet Oymak, Benjamin Recht, and Mahdi Soltanolkotabi. Isometric sketching of any set via the restricted isometry property. Inf. Inference, 7(4):707–726, 2018.
  • [10] Michel Talagrand. Upper and lower bounds for stochastic processes. Springer, 2014.