
Unique Decoding of Explicit $\varepsilon$-balanced Codes Near the Gilbert–Varshamov Bound

Fernando Granha Jeronimo (University of Chicago, [email protected]; supported in part by NSF grant CCF-1816372), Dylan Quintana (University of Chicago, [email protected]), Shashank Srivastava (TTIC, [email protected]; supported in part by NSF grant CCF-1816372), Madhur Tulsiani (TTIC, [email protected]; supported by NSF grant CCF-1816372)

The Gilbert–Varshamov bound (non-constructively) establishes the existence of binary codes of distance $1/2-\varepsilon$ and rate $\Omega(\varepsilon^2)$ (where an upper bound of $O(\varepsilon^2\log(1/\varepsilon))$ is known). Ta-Shma [STOC 2017] gave an explicit construction of $\varepsilon$-balanced binary codes, where any two distinct codewords are at a distance between $1/2-\varepsilon/2$ and $1/2+\varepsilon/2$, achieving a near optimal rate of $\Omega(\varepsilon^{2+\beta})$, where $\beta\to 0$ as $\varepsilon\to 0$.

We develop unique and list decoding algorithms for (a slight modification of) the family of codes constructed by Ta-Shma, in the adversarial error model. We prove the following results for $\varepsilon$-balanced codes with block length $N$ and rate $\Omega(\varepsilon^{2+\beta})$ in this family:

- For all $\varepsilon, \beta > 0$ there are explicit codes which can be uniquely decoded up to an error of half the minimum distance in time $N^{O_{\varepsilon,\beta}(1)}$.

- For any fixed constant $\beta$ independent of $\varepsilon$, there is an explicit construction of codes which can be uniquely decoded up to an error of half the minimum distance in time $(\log(1/\varepsilon))^{O(1)} \cdot N^{O_\beta(1)}$.

- For any $\varepsilon > 0$, there are explicit $\varepsilon$-balanced codes with rate $\Omega(\varepsilon^{2+\beta})$ which can be list decoded up to error $1/2 - \varepsilon'$ in time $N^{O_{\varepsilon,\varepsilon',\beta}(1)}$, where $\varepsilon', \beta \to 0$ as $\varepsilon \to 0$.

The starting point of our algorithms is the framework for list decoding direct-sum codes developed in Alev et al. [SODA 2020], which uses the Sum-of-Squares SDP hierarchy. The rates obtained there were quasipolynomial in $\varepsilon$. Here, we show how to overcome the far from optimal rates of this framework, obtaining unique decoding algorithms for explicit binary codes of near optimal rate. These codes are based on simple modifications of Ta-Shma's construction.

1 Introduction

Binary error correcting codes have pervasive applications [Gur10, GRS19], and yet we are far from understanding some of their basic properties [Gur09]. For instance, until very recently no explicit binary code achieving distance $1/2-\varepsilon/2$ with rate near $\Omega(\varepsilon^2)$ was known, even though the existence of such codes was (non-constructively) established long ago [Gil52, Var57] in what is now referred to as the Gilbert–Varshamov (GV) bound. On the impossibility side, a rate upper bound of $O(\varepsilon^2\log(1/\varepsilon))$ is known for binary codes of distance $1/2-\varepsilon/2$ (e.g., [Del75, MRRW77, NS09]).

In a breakthrough result [TS17], Ta-Shma gave an explicit construction of binary codes achieving a nearly optimal distance versus rate trade-off, namely, binary codes of distance $1/2-\varepsilon/2$ with rate $\Omega(\varepsilon^{2+\beta})$ where $\beta$ vanishes as $\varepsilon$ vanishes (in fact, Ta-Shma obtained $\beta = \beta(\varepsilon) = \Theta(((\log\log 1/\varepsilon)/\log 1/\varepsilon)^{1/3})$, and thus $\lim_{\varepsilon\to 0}\beta(\varepsilon) = 0$). Actually, Ta-Shma obtained $\varepsilon$-balanced binary linear codes, that is, linear binary codes with the additional property that non-zero codewords have Hamming weight bounded not only below by $1/2-\varepsilon/2$ but also above by $1/2+\varepsilon/2$; this is a fundamental property in the study of pseudo-randomness [NN90, AGHP92].

While the codes constructed by Ta-Shma are explicit, they were not known to admit efficient decoding algorithms, although such results are known for codes with smaller rates. In particular, an explicit binary code due to Guruswami and Rudra [GR06] is known to be list decodable even at an error radius of $1/2-\varepsilon$ with rate $\Omega(\varepsilon^3)$. We consider the following question:

Do explicit binary codes near the GV bound admit an efficient decoding algorithm?

Here, we answer this question in the affirmative by providing an efficient (by "efficient", we mean polynomial time; given the fundamental nature of the problem of decoding nearly optimal binary codes, it is an interesting open problem to make these techniques viable in practice) unique decoding algorithm for (essentially) Ta-Shma's code construction, which we refer to as Ta-Shma codes. More precisely, by building on Ta-Shma's construction and using our unique decoding algorithm, we have the following result.

Theorem 1.1 (Unique Decoding).

For every $\varepsilon > 0$ sufficiently small, there are explicit binary linear Ta-Shma codes $\mathcal{C}_{N,\varepsilon,\beta} \subseteq \mathbb{F}_2^N$ for infinitely many values $N \in \mathbb{N}$ with

(i) distance at least $1/2-\varepsilon/2$ (actually $\varepsilon$-balanced),

(ii) rate $\Omega(\varepsilon^{2+\beta})$ where $\beta = O(1/(\log_2(1/\varepsilon))^{1/6})$, and

(iii) a unique decoding algorithm with running time $N^{O_{\varepsilon,\beta}(1)}$.

Furthermore, if instead we take $\beta > 0$ to be an arbitrary constant, the running time becomes $(\log(1/\varepsilon))^{O(1)} \cdot N^{O_\beta(1)}$ (fixed polynomial time).

We can also perform “gentle” list decoding in the following sense (note that this partially implies Theorem 1.1).

Theorem 1.2 (Gentle List Decoding).

For every $\varepsilon > 0$ sufficiently small, there are explicit binary linear Ta-Shma codes $\mathcal{C}_{N,\varepsilon,\beta} \subseteq \mathbb{F}_2^N$ for infinitely many values $N \in \mathbb{N}$ with

(i) distance at least $1/2-\varepsilon/2$ (actually $\varepsilon$-balanced),

(ii) rate $\Omega(\varepsilon^{2+\beta})$ where $\beta = O(1/(\log_2(1/\varepsilon))^{1/6})$, and

(iii) a list decoding algorithm that decodes within radius $1/2 - 2^{-\Theta((\log_2(1/\varepsilon))^{1/6})}$ in time $N^{O_{\varepsilon,\beta}(1)}$.

We observe that the exponent in the running time $N^{O_{\varepsilon,\beta}(1)}$ appearing in Theorem 1.1 and Theorem 1.2 depends on $\varepsilon$. This dependence is no worse than $O(\log\log(1/\varepsilon))$, and if $\beta > 0$ is taken to be an arbitrary constant (independent of $\varepsilon$), the running time becomes $(\log(1/\varepsilon))^{O(1)} \cdot N^{O_\beta(1)}$. Avoiding this dependence in the exponent when $\beta = \beta(\varepsilon)$ is an interesting open problem. Furthermore, obtaining a list decoding radius of $1/2-\varepsilon/2$ in Theorem 1.2 with the same rate (or even $\Omega(\varepsilon^2)$) is another very interesting open problem, related to a central open question in the adversarial error regime [Gur09].

Direct sum codes.  Our work can be viewed within the broader context of developing algorithms for the decoding of direct sum codes. Given a (say linear) code $\mathcal{C} \subseteq \mathbb{F}_2^n$ and a collection of tuples $W \subseteq [n]^t$, the code $\operatorname{dsum}_W(\mathcal{C})$ with block length $|W|$ is defined as

$$\operatorname{dsum}_W(\mathcal{C}) = \left\{ (z_{w_1} + z_{w_2} + \cdots + z_{w_t})_{w \in W} \mid z \in \mathcal{C} \right\}.$$

The direct sum operation has been used for several applications in coding and complexity theory [ABN+92, IW97, GI01, IKW09, DS14, DDG+15, Cha16, DK17, Aro02]. It is easy to see that if $\mathcal{C}$ is $\varepsilon_0$-balanced for a constant $\varepsilon_0$, then for any $\varepsilon > 0$, choosing $W$ to be a random collection of tuples of size $O(n/\varepsilon^2)$ results in $\operatorname{dsum}_W(\mathcal{C})$ being an $\varepsilon$-balanced code. The challenge in trying to construct good codes using this approach is to find explicit constructions of (sparse) collections $W$ which are "pseudorandom" enough to yield a similar distance amplification as above. On the other hand, the challenge in decoding such codes is to identify notions of "structure" in such collections $W$ which can be exploited by decoding algorithms.
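As a quick numeric illustration of the random case (a sketch with toy parameters of our choosing, not a construction from the paper), lifting a word of bias $0.2$ by uniformly random $3$-tuples drives the bias down to roughly $0.2^3$:

```python
# Sketch: random t-tuples as a parity sampler (toy parameters, ours).
import random

def bias(word):
    return abs(sum((-1) ** b for b in word)) / len(word)

random.seed(0)
n, t = 2000, 3
z = [1] * 800 + [0] * 1200            # bias(z) = |1 - 2*(800/2000)| = 0.2
random.shuffle(z)
W = [[random.randrange(n) for _ in range(t)] for _ in range(200000)]
y = [sum(z[i] for i in w) % 2 for w in W]
print(bias(z), bias(y))               # ~0.2 vs ~0.2**3 = 0.008, up to sampling noise
```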

In Ta-Shma's construction [TS17], such a pseudorandom collection $W$ was constructed by considering an expanding graph $G$ over the vertex set $[n]$, and generating $t$-tuples using sufficiently long walks of length $t-1$ over the so-called $s$-wide replacement product of $G$ with another (small) expanding graph $H$. Roughly speaking, this graph product is a generalization of the celebrated zig-zag product [RVW00], but with $s$ different steps of the zig-zag product instead of a single one. Ta-Shma's construction can also be viewed as a clever way of selecting a sub-collection of all walks in $G$, which refines an earlier construction suggested by Rozenman and Wigderson [Bog12] (and also analyzed by Ta-Shma) using all walks of length $t-1$.

Identifying structures to facilitate decoding.  For the closely related direct product construction (where the entry corresponding to $w \in W$ is the entire $t$-tuple $(z_{w_1}, \ldots, z_{w_t})$), which amplifies distance but increases the alphabet size, it was proved by Alon et al. [ABN+92] that the resulting code admits a unique decoding algorithm if the incidence graph corresponding to the collection $W$ is a good sampler. Very recently, it was proved by Dinur et al. [DHK+19] that such a direct product construction admits list decoding if the incidence graph is a "double sampler". The results of [DHK+19] also apply to direct sum, but the use of double samplers pushes the rate away from near optimality.

For the case of direct sum codes, the decoding task can be phrased as a maximum $t$-XOR problem with the additional constraint that the solution must lie in $\mathcal{C}$. More precisely, given $\tilde{y} \in \mathbb{F}_2^W$ within the unique decoding radius of $\operatorname{dsum}_W(\mathcal{C})$, we consider the following optimization problem

$$\operatorname*{arg\,min}_{z \in \mathcal{C}} \ \Delta(\tilde{y}, \operatorname{dsum}_W(z)),$$

where $\Delta(\cdot,\cdot)$ is the (normalized) Hamming distance. While maximum $t$-XOR is in general hard to solve to even any non-trivial degree of approximation [Hås97], previous work by the authors [AJQ+20] identified a structural condition on $W$ called "splittability" under which the above constraint satisfaction problem can be solved (approximately), resulting in efficient unique and list decoding algorithms. However, by itself the splittability condition is too crude to be applicable to codes such as the ones in Ta-Shma's construction. The requirements it places on the expansion of $G$ are too strong, and the framework in [AJQ+20] is only able to obtain algorithms for direct sum codes with rate $2^{-(\log(1/\varepsilon))^2} \ll \varepsilon^{2+\beta}$.

The conceptual contribution of this work can be viewed as identifying a different recursive structure in direct sums generated by expander walks, which allows us to view the construction as giving a sequence of codes $\mathcal{C}_0, \mathcal{C}_1, \ldots, \mathcal{C}_\ell$. Here, $\mathcal{C}_0$ is the starting code $\mathcal{C}$ and $\mathcal{C}_\ell$ is the final desired code, and each element in the sequence can be viewed as being obtained via a direct sum operation on the preceding code. Instead of considering a "one-shot" decoding task of finding an element of $\mathcal{C}_0$, this facilitates an iterative approach where at each step we reduce the task of decoding the code $\mathcal{C}_i$ to decoding for $\mathcal{C}_{i-1}$, using the above framework from [AJQ+20]. Such an iterative approach with a sequence of codes was also used (in a very different setting) in a work of Guruswami and Indyk [GI03] constructing codes over a large alphabet which are list decodable in linear time via spectral algorithms.

Another simple and well-known (see e.g., [GI04]) observation, which is very helpful in our setting, is the use of list decoding algorithms for unique decoding. For a code with distance $1/2-\varepsilon/2$, unique decoding can be obtained by list decoding at a much smaller error radius of (say) $1/2 - 1/8$. This permits a much more efficient application of the framework from [AJQ+20], with a milder dependence on the expansion of the graphs $G$ and $H$ in Ta-Shma's construction, resulting in higher rates. We give a more detailed overview of our approach in Section 3.

Known results for random ensembles.  While the focus in this work is on explicit constructions, there are several known (non-explicit) constructions of random ensembles of binary codes near or achieving the Gilbert–Varshamov bound (e.g., Table 1). Although it is usually straightforward to ensure the desired rate in such constructions, the distance only holds with high probability. Given a sample code from such ensembles, certifying the minimum distance is usually not known to be polynomial time in the block length. Derandomizing such constructions is also a possible avenue for obtaining optimal codes, although such results remain elusive to this date (to the best of our knowledge).

One of the simplest constructions is that of random binary linear codes in which the generator matrix is sampled uniformly. This random ensemble achieves the GV bound with high probability, but its decoding is believed to be computationally hard [MMT11].

Much progress has been made on binary codes by using results for larger alphabet codes [Gur09]. Codes over non-binary alphabets with optimal (or nearly optimal) parameters are available [vL99, Sti08, GR06], and thanks to this availability a popular approach to constructing binary codes has been to concatenate such large alphabet codes with binary ones. Thommesen [Tho83] showed that by concatenating Reed–Solomon (RS) codes with random binary codes (one random binary code for each position of the outer RS code) it is possible to achieve the GV bound. Note that Thommesen codes arise from a more structured ensemble than random binary linear codes. This additional structure enabled Guruswami and Indyk [GI04] to obtain efficient decoding algorithms for the non-explicit Thommesen codes (whose minimum distance is not known to admit efficient certification). This kind of concatenation starting from a large alphabet code and using random binary codes, which we refer to as Thommesen-like, has been an important technique in tackling binary code constructions with a variety of properties near or at the GV bound. An important drawback in several such Thommesen-like code constructions is that they end up being non-explicit (unless efficient derandomization or brute-force is viable).

Using a Thommesen-like construction, Gopi et al. [GKO+17] gave non-explicit constructions of locally testable and locally correctable binary codes approaching the GV bound. More recently, again with a Thommesen-like construction, Hemenway et al. [HRW17] obtained non-explicit, near linear time uniquely decodable codes at the GV bound, improving the running time of Guruswami and Indyk [GI04] (and also the decoding rates). We summarize the results discussed so far in Table 1.

Binary Code Results near the Gilbert–Varshamov bound

| Who? | Construction | GV | Explicit | Concatenated | Decoding | Local |
|---|---|---|---|---|---|---|
| [Gil52, Var57] | existential | yes | no | no | no | n/a |
| [Tho83] | Reed–Solomon + random binary | yes | no | yes | no | n/a |
| [GI04] | Thommesen [Tho83] | yes | no | yes | unique decoding | n/a |
| [GKO+17] | Thommesen-like | yes | no | yes | unique decoding | LTC/LCC |
| [HRW17] | Thommesen-like | yes | no | yes | near linear time unique decoding | n/a |
| [TS17] | Expander-based | $\Omega(\varepsilon^{2+\beta})$ | yes | no | no | n/a |
| this paper | Ta-Shma [TS17] | $\Omega(\varepsilon^{2+\beta})$ | yes | no | gentle list decoding | n/a |

Table 1: GV bound related results for binary codes.

There are also non-explicit constructions known to achieve list decoding capacity [GR08, MRRZ+19] (being concatenated or LDPC/Gallager [Gal62] is not an obstruction to achieving capacity). Contrary to the other results in this subsection, Guruswami and Rudra [Gur05, GR06, Gur09], also using a Thommesen-like construction, obtained explicit codes that are efficiently list decodable from radius $1/2-\varepsilon$ with rate $\Omega(\varepsilon^3)$. This was done by concatenating the so-called folded Reed–Solomon codes with a derandomization of a binary ensemble of random codes.

Results for non-adversarial error models.  All the results mentioned above are for the adversarial error model of Hamming [Ham50, Gur10]. In the setting of random corruptions (Shannon model), the situation seems to be better understood thanks to the seminal result on explicit polar codes of Arikan [Ari09]. More recently, in another breakthrough Guruswami et al. [GRY19] showed that polar codes can achieve almost linear time decoding with near optimal convergence to capacity for the binary symmetric channel. This result gives an explicit code construction achieving parameter trade-offs similar to Shannon’s randomized construction [Sha48] while also admitting very efficient encoding and decoding. Explicit capacity-achieving constructions are also known for bounded memory channels [SKS19] which restrict the power of the adversary and thus interpolate between the Shannon and Hamming models.

2 Preliminaries and Notation

2.1 Codes

We briefly recall some standard code terminology. Given $z, z' \in \mathbb{F}_2^n$, recall that the relative Hamming distance between $z$ and $z'$ is $\Delta(z, z') \coloneqq |\{i \mid z_i \neq z_i'\}|/n$. A binary code is any subset $\mathcal{C} \subseteq \mathbb{F}_2^n$. The distance of $\mathcal{C}$ is defined as $\Delta(\mathcal{C}) \coloneqq \min_{z \neq z'} \Delta(z, z')$ where $z, z' \in \mathcal{C}$. We say that $\mathcal{C}$ is a linear code if $\mathcal{C}$ is a linear subspace of $\mathbb{F}_2^n$. The rate of $\mathcal{C}$ is $\log_2(|\mathcal{C}|)/n$.

Instead of discussing the distance of a binary code, it will often be more natural to phrase results in terms of its bias.

Definition 2.1 (Bias).

The bias of a word $z \in \mathbb{F}_2^n$ is defined as $\operatorname{bias}(z) \coloneqq \left| \mathbb{E}_{i \in [n]} (-1)^{z_i} \right|$. The bias of a code $\mathcal{C}$ is the maximum bias of any non-zero codeword in $\mathcal{C}$.

Definition 2.2 (ε\varepsilon-balanced Code).

A binary code $\mathcal{C}$ is $\varepsilon$-balanced if $\operatorname{bias}(z + z') \leq \varepsilon$ for every pair of distinct $z, z' \in \mathcal{C}$.

Remark 2.3.

For a linear binary code $\mathcal{C}$, the condition $\operatorname{bias}(\mathcal{C}) \leq \varepsilon$ is equivalent to $\mathcal{C}$ being an $\varepsilon$-balanced code.
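The following few lines (our own minimal sketch) compute the bias of Definition 2.1 and check the $\varepsilon$-balanced condition of Definition 2.2 by brute force over all pairs:

```python
def bias(word):
    """bias(z) = | E_i (-1)^{z_i} | (Definition 2.1)."""
    return abs(sum((-1) ** b for b in word)) / len(word)

def is_eps_balanced(C, eps):
    """Check bias(z + z') <= eps for all distinct pairs (Definition 2.2)."""
    words = list(C)
    return all(
        bias([a ^ b for a, b in zip(words[i], words[j])]) <= eps
        for i in range(len(words)) for j in range(i + 1, len(words))
    )

print(bias((1, 1, 0)))                          # 1/3
print(is_eps_balanced({(0, 0), (1, 0)}, 0.5))   # bias(1,0) = 0, so True
```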

2.2 Direct Sum Lifts

Starting from a code $\mathcal{C} \subseteq \mathbb{F}_2^n$, we amplify its distance by considering the direct sum lifting operation based on a collection $W(k) \subseteq [n]^k$. The direct sum lifting maps each codeword of $\mathcal{C}$ to a new word in $\mathbb{F}_2^{|W(k)|}$ by taking the $k$-XOR of its entries on each element of $W(k)$.

Definition 2.4 (Direct Sum Lifting).

Let $W(k) \subseteq [n]^k$. For $z \in \mathbb{F}_2^n$, we define the direct sum lifting as $\operatorname{dsum}_{W(k)}(z) = y$ such that $y_{\mathfrak{s}} = \sum_{i \in \mathfrak{s}} z_i$ for all $\mathfrak{s} \in W(k)$. The direct sum lifting of a code $\mathcal{C} \subseteq \mathbb{F}_2^n$ is

$$\operatorname{dsum}_{W(k)}(\mathcal{C}) = \{ \operatorname{dsum}_{W(k)}(z) \mid z \in \mathcal{C} \}.$$

We will omit $W(k)$ from this notation when it is clear from context.
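A minimal sketch of Definition 2.4 (the collection $W(k)$ and word here are toy choices of ours); note that for the complete collection of 2-tuples, the lift's bias is exactly the square of the base bias, matching the parity sampler intuition formalized next:

```python
import itertools

def dsum(W, z):
    """XOR the entries of z along each tuple in W (Definition 2.4)."""
    return tuple(sum(z[i] for i in s) % 2 for s in W)

def bias(word):
    return abs(sum((-1) ** b for b in word)) / len(word)

W2 = list(itertools.product(range(3), repeat=2))   # all 2-tuples over [3]
z = (1, 1, 0)
print(bias(z), bias(dsum(W2, z)))                  # 1/3 -> 1/9: bias squares
```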

Remark 2.5.

We will be concerned with collections $W(k) \subseteq [n]^k$ arising from length-$(k-1)$ walks on expanding structures (mostly on the $s$-wide replacement product of two expander graphs).

We will be interested in cases where the direct sum lifting reduces the bias of the base code; in [TS17], structures with such a property are called parity samplers, as they emulate the reduction in bias that occurs by taking the parity of random samples.

Definition 2.6 (Parity Sampler).

A collection $W(k) \subseteq [n]^k$ is called an $(\varepsilon_0, \varepsilon)$-parity sampler if for all $z \in \mathbb{F}_2^n$ with $\operatorname{bias}(z) \leq \varepsilon_0$, we have $\operatorname{bias}(\operatorname{dsum}_{W(k)}(z)) \leq \varepsilon$.

2.3 Linear Algebra Conventions

All vectors considered in this paper are taken to be column vectors, and are multiplied on the left with any matrices or operators acting on them. Consequently, given an indexed sequence of operators $\mathsf{G}_{k_1}, \ldots, \mathsf{G}_{k_2}$ (with $k_1 \leq k_2$) corresponding to steps $k_1$ through $k_2$ of a walk, we expand the product $\prod_{i=k_1}^{k_2} \mathsf{G}_i$ as

$$\prod_{i=k_1}^{k_2} \mathsf{G}_i := \mathsf{G}_{k_2} \cdots \mathsf{G}_{k_1}.$$

Unless otherwise stated, all inner products for vectors in coordinate spaces are taken to be with respect to the (uniform) probability measure on the coordinates. Similarly, all inner products for functions are taken to be with respect to the uniform measure on the inputs. All operators considered in this paper are normalized to have singular values at most 1.

3 Proof Overview

The starting point for our work is the framework developed in [AJQ+20] for decoding direct sum codes, obtained by starting from a code $\mathcal{C} \subseteq \mathbb{F}_2^n$ and considering all parities corresponding to a set of $t$-tuples $W(t) \subseteq [n]^t$. Ta-Shma's near optimal $\varepsilon$-balanced codes are constructed by starting from a code with constant rate and constant distance and considering such a direct sum lifting. The set of tuples $W(t)$ in his construction corresponds to a set of walks of length $t-1$ on the $s$-wide replacement product of an expanding graph $G$ with vertex set $[n]$ and a smaller expanding graph $H$. The $s$-wide replacement product can be thought of here as a way of constructing a much smaller pseudorandom subset of the set of all walks of length $t-1$ on $G$, which yields a similar distance amplification for the lifted code.

The simplified construction with expander walks.

While we analyze Ta-Shma's construction later in the paper, it is instructive to first consider a $W(t)$ simply consisting of all walks of length $t-1$ on an expander. This construction, based on a suggestion of Rozenman and Wigderson [Bog12], was also analyzed by Ta-Shma [TS17] and can be used to obtain $\varepsilon$-balanced codes with rate $\Omega(\varepsilon^{4+o(1)})$. It helps to illustrate many of the conceptual ideas involved in our proof, while avoiding some technical issues.

Let $G$ be a $d$-regular expanding graph with vertex set $[n]$, with the (normalized) second singular value of the adjacency operator $\mathsf{A}_G$ being $\lambda$. Let $W(t) \subseteq [n]^t$ denote the set of $t$-tuples corresponding to all walks of length $t-1$, with $N = |W(t)| = n \cdot d^{t-1}$. Ta-Shma proves that for all $z \in \mathbb{F}_2^n$, $W(t)$ satisfies

$$\operatorname{bias}(z) \leq \varepsilon_0 \quad\Rightarrow\quad \operatorname{bias}(\operatorname{dsum}_{W(t)}(z)) \leq (\varepsilon_0 + 2\lambda)^{\lfloor (t-1)/2 \rfloor},$$

i.e., $W(t)$ is an $(\varepsilon_0, \varepsilon)$-parity sampler for $\varepsilon = (\varepsilon_0 + 2\lambda)^{\lfloor (t-1)/2 \rfloor}$. Choosing $\varepsilon_0 = 0.1$ and $\lambda = 0.05$ (say), we can choose $d = O(1)$ and obtain the $\varepsilon$-balanced code $\mathcal{C}' = \operatorname{dsum}_{W(t)}(\mathcal{C})$ with rate $d^{-(t-1)} = \varepsilon^{O(1)}$ (although the right constants matter a lot for optimal rates).
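The following toy experiment (our own; it uses $K_8$, whose normalized second singular value is $1/7$, as a stand-in expander) enumerates all walks of length $t-1$ and checks the parity sampling bound above numerically:

```python
import itertools
import numpy as np

n, t = 8, 5
A = (np.ones((n, n)) - np.eye(n)) / (n - 1)       # normalized adjacency of K_8
lam = np.linalg.svd(A, compute_uv=False)[1]       # sigma_2 = 1/7
z = np.array([1, 1, 1, 0, 0, 0, 0, 0])            # bias(z) = 0.25 = eps_0
nbrs = [[u for u in range(n) if u != v] for v in range(n)]

def all_walks(length):                            # every walk with `length` steps
    for v0 in range(n):
        for steps in itertools.product(range(n - 1), repeat=length):
            w = [v0]
            for s in steps:
                w.append(nbrs[w[-1]][s])
            yield w

signs = [(-1) ** (z[w].sum() % 2) for w in all_walks(t - 1)]
print(abs(np.mean(signs)), (0.25 + 2 * lam) ** ((t - 1) // 2))  # bias <= bound
```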

Decoding as constraint satisfaction.

The starting point for our work is the framework in [AJQ+20], which views the task of decoding $\tilde{y}$ with $\Delta(\mathcal{C}', \tilde{y}) < (1-\varepsilon)/4 - \delta$ (where the distance of $\mathcal{C}'$ is $(1-\varepsilon)/2$) as an instance of the MAX $t$-XOR problem (see Fig. 1). The goal is to find

$$\operatorname*{arg\,min}_{z \in \mathcal{C}} \ \Delta\left(\operatorname{dsum}_{W(t)}(z), \tilde{y}\right),$$

which can be rephrased as

$$\operatorname*{arg\,max}_{z \in \mathcal{C}} \ \mathbb{E}_{w = (i_1, \ldots, i_t) \in W(t)}\left[\mathds{1}_{\{z_{i_1} + \cdots + z_{i_t} = \tilde{y}_w\}}\right].$$

It is possible to ignore the condition that $z \in \mathcal{C}$ if the collection $W(t)$ is a slightly stronger parity sampler. For any solution $\tilde{z} \in \mathbb{F}_2^n$ (not necessarily in $\mathcal{C}$) such that

$$\Delta(\operatorname{dsum}_{W(t)}(\tilde{z}), \tilde{y}) < \frac{1-\varepsilon}{4} + \delta,$$

we have

$$\Delta(\operatorname{dsum}_{W(t)}(\tilde{z}), \operatorname{dsum}_{W(t)}(z)) < \frac{1-\varepsilon}{2}$$

by the triangle inequality, and thus $\operatorname{bias}(\operatorname{dsum}_{W(t)}(z - \tilde{z})) > \varepsilon$. If $W(t)$ is not just an $(\varepsilon_0, \varepsilon)$-parity sampler, but in fact a $((1+\varepsilon_0)/2, \varepsilon)$-parity sampler, this would imply $\operatorname{bias}(z - \tilde{z}) > (1+\varepsilon_0)/2$. Thus, $\Delta(z, \tilde{z}) < (1-\varepsilon_0)/4$ (or $\Delta(z, \overline{\tilde{z}}) < (1-\varepsilon_0)/4$), and we can use a unique decoding algorithm for $\mathcal{C}$ to find $z$ given $\tilde{z}$.

[Figure 1: Unique decoding ball along with error from approximation. The received word $\tilde{y}$ lies within the unique decoding radius $(1-\varepsilon)/4$ of the codeword $y$, up to a small approximation error $\delta$ (comparable to $\varepsilon$).]

The task of finding such a $z \in \mathcal{C}$ boils down to finding a solution $\tilde{z} \in \mathbb{F}_2^n$ to a MAX $t$-XOR instance, up to an additive loss of $O(\delta)$ in the fraction of constraints satisfied by the optimal solution. While this is hard to do in general [Hås01, Gri01], [AJQ+20] (building on [AJT19]) show that this can be done if the instance satisfies a special property called splittability. To define this, we let $W[t_1, t_2] \subset [n]^{t_2 - t_1 + 1}$ denote the collection of $(t_2 - t_1 + 1)$-tuples obtained by considering the indices between $t_1$ and $t_2$ for all tuples in $W(t)$. We also assume that all $w \in W[t_1, t_2]$ can be extended to the same number of tuples in $W(t)$ (which is true for walks).

Definition 3.1 (Splittability, informal).

A collection $W(t) \subseteq [n]^t$ is said to be $\tau$-splittable if $t = 1$ (base case) or there exists $t' \in [t-1]$ such that:

1. The matrix $\mathsf{S} \in \mathbb{R}^{W[1,t'] \times W[t'+1,t]}$ defined by $\mathsf{S}(w, w') = \mathds{1}_{\{ww' \in W\}}$ (where $ww'$ denotes the concatenated tuple) has normalized second singular value at most $\tau$.

2. The collections $W[1,t']$ and $W[t'+1,t]$ are $\tau$-splittable.

For example, considering walks in $G$ of length $3$ ($t = 4$) and $t' = 2$, we get that $W[1,2] = W[3,4] = E$, the set of oriented edges in $G$. Also, $\mathsf{S}(w, w') = 1$ if and only if the second vertex of $w$ and the first vertex of $w'$ are adjacent in $G$. Thus, up to a permutation of rows and columns, we can write the normalized version of $\mathsf{S}$ as $\mathsf{A}_G \otimes \mathsf{J}_d / d$, where $\mathsf{A}_G$ is the normalized adjacency matrix of $G$ and $\mathsf{J}_d$ denotes the $d \times d$ matrix of 1s. Hence such a $W(t)$ satisfies $\sigma_2(\mathsf{S}) \leq \tau$ with $\tau = \sigma_2(\mathsf{A}_G)$, and a similar proof works for walks of all lengths.
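The claim in this example can be checked numerically; the sketch below (toy graph $K_6$, our choice) builds the normalized split matrix $\mathsf{S}$ for $t = 4$, $t' = 2$ and verifies $\sigma_2(\mathsf{S}) = \sigma_2(\mathsf{A}_G)$:

```python
import numpy as np

n = 6
d = n - 1
edges = [(u, v) for u in range(n) for v in range(n) if u != v]  # oriented edges

# Normalized S: entry 1/d^2 iff (2nd vertex of w) ~ (1st vertex of w') in K_6.
S = np.zeros((len(edges), len(edges)))
for a, (u1, v1) in enumerate(edges):
    for b, (u2, v2) in enumerate(edges):
        if v1 != u2:                             # adjacency in K_6
            S[a, b] = 1.0 / d**2

AG = (np.ones((n, n)) - np.eye(n)) / (n - 1)     # normalized adjacency of K_6
print(np.linalg.svd(S, compute_uv=False)[1],     # sigma_2(S) = 0.2 ...
      np.linalg.svd(AG, compute_uv=False)[1])    # ... equals sigma_2(A_G)
```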

The framework in [AJQ+20] and [AJT19] gives that if $W(t)$ is $\tau$-splittable for $\tau = (\delta/2^t)^{O(1)}$, then the above instance of MAX $t$-XOR can be solved to additive error $O(\delta)$ using the Sum-of-Squares (SOS) SDP hierarchy. Broadly speaking, splittability allows one to (recursively) treat instances as expanding instances of problems with two "tuple variables" in each constraint, which can then be analyzed using known algorithms for 2-CSPs [BRS11, GS11] in the SOS hierarchy. Combined with parity sampling, this yields a unique decoding algorithm. Crucially, this framework can also be extended to perform list decoding up to a radius of $1/2 - \sqrt{\varepsilon} - \delta$ under a similar condition on $\tau$, which will be very useful for our application. (While unique decoding can be thought of as recovering a single solution to a constraint satisfaction problem, the goal in the list decoding setting can be thought of as obtaining a "sufficiently rich" set of solutions which forms a good cover. This is achieved in the framework by adding an entropic term to the semidefinite program, which ensures that the SDP solution satisfies such a covering property.)

While the above can yield decoding algorithms for suitably expanding $G$, the requirement on $\tau$ (and hence on $\lambda$) makes the rate much worse. We need $\delta = O(\varepsilon)$ (for unique decoding) and $t = O(\log(1/\varepsilon))$ (for parity sampling), which requires $\lambda = \varepsilon^{\Omega(1)}$, yielding only a quasipolynomial rate for the code (recall that we could take $\lambda = O(1)$ earlier, yielding polynomial rates).

Unique decoding: weakening the error requirement.

We first observe that it is possible to get rid of the dependence $\delta = O(\varepsilon)$ above by using the list decoding algorithm for unique decoding. It suffices to take $\delta = 0.1$ and return the closest element from the list of all codewords up to an error radius of $1/2 - \sqrt{\varepsilon} - 0.1$, if we are promised that $\Delta(\tilde{y}, \mathcal{C})$ is within the unique decoding radius (see Fig. 2). However, this alone does not improve the rate, as we still need the splittability (and hence $\lambda$) to be $2^{-\Omega(t)}$ with $t = O(\log(1/\varepsilon))$.

[Figure 2: Unique decoding radius $(1/4 - \varepsilon/4)$ and list decoding radius $(1/2 - \sqrt{\varepsilon})$ balls along with a constant approximation error ($0.1$). Note that the list decoding ball contains the unique decoding ball even after allowing for a relatively large amount of error.]
Code cascades: handling the dependence on walk length.

To avoid the dependence of the expansion on the length $t-1$ of the walk (and hence on $\varepsilon$), we avoid the "one-shot" decoding above, and instead consider a sequence of intermediate codes between $\mathcal{C}$ and $\mathcal{C}'$. Consider the case when $t = k^2$: instead of computing $t$-wise sums of bits in each $z \in \mathbb{F}_2^n$, we first compute $k$-wise sums according to walks of length $k-1$ on $G$, and then a $k$-wise sum of these values. In fact, the second sum can also be thought of as arising from a length $k-1$ walk on a different graph, with vertices corresponding to (directed) walks with $k$ vertices in $G$, and edges connecting $w$ and $w'$ when the last vertex of $w$ is connected to the first one in $w'$ (this is similar to the matrix considered for defining splittability). We can thus think of a sequence of codes $\mathcal{C}_0, \mathcal{C}_1, \mathcal{C}_2$ with $\mathcal{C}_0 = \mathcal{C}$ and $\mathcal{C}_2 = \mathcal{C}'$, and both $\mathcal{C}_1$ and $\mathcal{C}_2$ being $k$-wise direct sums. More generally, when $t = k^\ell$ for an appropriate constant $k$, we can think of a sequence $\mathcal{C} = \mathcal{C}_0, \mathcal{C}_1, \ldots, \mathcal{C}_\ell = \mathcal{C}'$, where each code is a $k$-wise direct sum of the previous one, obtained via walks of length $k-1$ (hence $k$ vertices) in an appropriate graph. We refer to such sequences (defined formally in Section 5) as code cascades (see Fig. 3).

[Figure 3: Code cascading. A sequence of direct sum lifts $\mathcal{C}_0 \to \mathcal{C}_1 \to \cdots \to \mathcal{C}_{i-1} \to \mathcal{C}_i \to \cdots \to \mathcal{C}_\ell$ with biases $\varepsilon_0, \varepsilon_1, \ldots, \varepsilon_{i-1}, \varepsilon_i, \ldots, \varepsilon_\ell = \varepsilon$: refined parity sampling via Ta-Shma's walk at each level, crude parity sampling via a Markov chain walk overall.]

Instead of applying the decoding framework above to directly reduce the decoding of a corrupted codeword from $\mathcal{C}'$ to the unique decoding problem in $\mathcal{C}$, we apply it at each level of a cascade, reducing the unique decoding problem in $\mathcal{C}_i$ to that in $\mathcal{C}_{i-1}$. If the direct sum at each level of the cascade is an $(\eta_0, \eta)$-parity sampler, the list decoding algorithm at radius $1/2 - \sqrt{\eta}$ suffices for unique decoding even if $\eta$ is a (sufficiently small) constant independent of $\varepsilon$. This implies that we can take $k$ to be a (suitably large) constant. This also allows the splittability (and hence $\lambda$) to be $2^{-O(k)} = \Omega(1)$, yielding polynomial rates. We present the reduction using cascades in Section 6 and the parameter choices in Section 8. The specific versions of the list decoding results from [AJQ+20] needed here are instantiated in Section 9.
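The control flow of this reduction can be sketched as follows (everything here is a toy of ours: brute force over tiny codes stands in for the SOS-based list decoder of [AJQ+20], the "list" elements are exact rather than approximate preimages, and we assume the received word is within the promised radius):

```python
import itertools

def dsum(W, z):
    return tuple(sum(z[i] for i in w) % 2 for w in W)

def dist(a, b):
    return sum(x != y for x, y in zip(a, b)) / len(a)

def full_lift(z, Ws):
    for W in Ws:
        z = dsum(W, z)
    return z

def cascade_decode(y, Ws, C0, radius=0.45):
    if not Ws:                                    # level 0: unique-decode in C0
        return min(C0, key=lambda z: dist(z, y))
    *lower, W = Ws
    C_prev = {full_lift(z, lower) for z in C0}    # materialize C_{i-1}
    # "List decode" level i: all C_{i-1} words whose lift lands near y
    # (the real decoder returns approximate preimages, hence the recursion).
    cands = [zp for zp in C_prev if dist(dsum(W, zp), y) <= radius]
    decoded = [cascade_decode(zp, lower, C0, radius) for zp in cands]
    return min(decoded, key=lambda z0: dist(full_lift(z0, Ws), y))

W1 = list(itertools.product(range(3), repeat=2))  # C0 -> C1 (length 9)
W2 = list(itertools.product(range(9), repeat=2))  # C1 -> C2 (length 81)
C0 = {(0, 0, 0), (1, 1, 0)}
y = list(full_lift((1, 1, 0), [W1, W2]))
for i in range(8):                                # corrupt 8 of 81 positions
    y[i] ^= 1
print(cascade_decode(tuple(y), [W1, W2], C0))     # recovers (1, 1, 0)
```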

While the above allows for polynomial rate, the running time of the algorithm is still exponential in the number of levels $\ell$ (which is $O(\log t) = O(\log\log(1/\varepsilon))$), since the list decoding for each level potentially produces a list of size $\mathrm{poly}(n)$ and recursively calls the decoding algorithm for the previous level on each element of the list. We obtain a fixed polynomial time algorithm by "pruning" the list at each level of the cascade before invoking the decoding algorithm for the previous level, while only slightly increasing the parity sampling requirements. The details are contained in Section 6.

Working with Ta-Shma’s construction.

Finally, to obtain near-optimal rates, we need to work with Ta-Shma's construction, where the set of tuples $W(t) \subseteq [n]^t$ corresponds to walks arising from an $s$-wide replacement product of $G$ with another expanding graph $H$. One issue that arises is that the collection of walks $W(t)$ as defined in [TS17] does not satisfy the important splittability condition required by our algorithms. However, this turns out to be easily fixable by modifying each step in Ta-Shma's construction to be exactly according to the zig-zag product of Reingold, Vadhan and Wigderson [RVW00]. We present Ta-Shma's construction and this modification in Section 4.

We also verify, in Section 7, that the tuples given by Ta-Shma's construction satisfy the conditions for applying the list decoding framework. While the sketch above stated this in terms of splittability, the results in [AJQ+20] are in terms of a more technical condition called tensoriality. We show in Section 7 that this is indeed implied by splittability, and also prove splittability for (the modified version of) Ta-Shma's construction.

4 Ta-Shma’s Construction: A Summary and Some Tweaks

In this section, we first discuss the $s$-wide replacement product that is central to Ta-Shma's construction of optimal $\varepsilon$-balanced codes, and then we describe the construction itself (we refer the reader to [TS17] for formal details beyond those we actually need here).

As mentioned before, we will also need to modify Ta-Shma's construction [TS17] a little to get splittability, a notion of expansion of a collection $W(k) \subseteq [n]^k$ (formally defined in Definition 7.9). The reason for this simple modification is that the splittability property is required by the list decoding framework. Note that we are not improving the Ta-Shma code parameters; in fact, we need to argue why, with this modification, we can still achieve Ta-Shma's parameters. Fortunately, this modification is simple enough that we will be able to essentially reuse Ta-Shma's original analysis. In Section 4.3, we will also have the opportunity to discuss, at an informal level, the intuition behind some parameter trade-offs in Ta-Shma codes, which should provide enough motivation when we instantiate these codes in Section 8.

4.1 The ss-wide Replacement Product

Ta-Shma's code construction is based on the so-called $s$-wide replacement product [TS17]. This is a derandomization of random walks on a graph $G$ that will be defined via a product operation of $G$ with another graph $H$ (see Definition 4.2 for a formal definition). We will refer to $G$ as the outer graph and $H$ as the inner graph in this construction.

Let $G$ be a $d_1$-regular graph on vertex set $[n]$ and $H$ be a $d_2$-regular graph on vertex set $[d_1]^s$, where $s$ is any positive integer. Suppose the neighbors of each vertex of $G$ are labeled $1, 2, \ldots, d_1$. For $v \in V(G)$, let $v_G[j]$ be the $j$-th neighbor of $v$. The $s$-wide replacement product is defined by replacing each vertex of $G$ with a copy of $H$, called a "cloud". While the edges within each cloud are determined by $H$, the edges between clouds are based on the edges of $G$, which we will define via operators $\mathsf{G}_0, \mathsf{G}_1, \ldots, \mathsf{G}_{s-1}$. The $i$-th operator $\mathsf{G}_i$ specifies one inter-cloud edge for each vertex $(v, (a_0, \ldots, a_{s-1})) \in V(G) \times V(H)$, which goes to the cloud whose $G$ component is $v_G[a_i]$, the neighbor of $v$ in $G$ indexed by the $i$-th coordinate of the $H$ component. (We will resolve the question of what happens to the $H$ component after taking such a step momentarily.)

Walks on the $s$-wide replacement product consist of steps with two different parts: an intra-cloud part followed by an inter-cloud part. All of the intra-cloud substeps simply move to a random neighbor in the current cloud, which corresponds to applying the operator $\mathsf{I} \otimes \mathsf{A}_H$, where $\mathsf{A}_H$ is the normalized adjacency matrix of $H$. The inter-cloud substeps are all deterministic, with the first moving according to $\mathsf{G}_0$, the second according to $\mathsf{G}_1$, and so on, returning to $\mathsf{G}_0$ for step number $s+1$. The operator for such a walk taking $t-1$ steps on the $s$-wide replacement product is

$$\prod_{i=0}^{t-2} \mathsf{G}_{i \bmod s}(\mathsf{I} \otimes \mathsf{A}_H).$$

Observe that a walk on the $s$-wide replacement product yields a walk on the outer graph $G$ by recording the $G$ component after each step of the walk. The number of $(t-1)$-step walks on the $s$-wide replacement product is

$$|V(G)| \cdot |V(H)| \cdot d_2^{t-1} = n \cdot d_1^s \cdot d_2^{t-1},$$

since a walk is completely determined by its intra-cloud steps. If $d_2$ is much smaller than $d_1$ and $t$ is large compared to $s$, this is less than $n \cdot d_1^{t-1}$, the number of $(t-1)$-step walks on $G$ itself. Thus the $s$-wide replacement product will be used to simulate random walks on $G$ while requiring a reduced amount of randomness (of course, this simulation is only possible under special conditions, namely, when we are uniformly distributed on each cloud).

To formally define the $s$-wide replacement product, we must consider the labeling of neighbors in $G$ more carefully.

Definition 4.1 (Rotation Map).

Suppose $G$ is a $d_1$-regular graph on $[n]$. For each $v \in [n]$ and $j \in [d_1]$, let $v_G[j]$ be the $j$-th neighbor of $v$ in $G$. Based on the indexing of the neighbors of each vertex, we define the rotation map $\textup{rot}_G \colon [n] \times [d_1] \to [n] \times [d_1]$ (this kind of map is called a rotation map in the zig-zag terminology [RVW00]) such that for every $(v, j) \in [n] \times [d_1]$,

$$\textup{rot}_G((v, j)) = (v', j') \Leftrightarrow v_G[j] = v' \text{ and } v'_G[j'] = v.$$

Furthermore, if there exists a bijection $\varphi \colon [d_1] \to [d_1]$ such that for every $(v, j) \in [n] \times [d_1]$,

$$\textup{rot}_G((v, j)) = (v_G[j], \varphi(j)),$$

then we call $\textup{rot}_G$ locally invertible.

If $G$ has a locally invertible rotation map, the cloud label after applying the rotation map only depends on the current cloud label, not the vertex of $G$. In the $s$-wide replacement product, this corresponds to the $H$ component of the rotation map only depending on a vertex's $H$ component, not its $G$ component. We define the $s$-wide replacement product as described before, with the inter-cloud operator $\mathsf{G}_i$ using the $i$-th coordinate of the $H$ component, which is a value in $[d_1]$, to determine the inter-cloud step.

Definition 4.2 ($s$-wide replacement product).

Suppose we are given the following:

- A $d_1$-regular graph $G = ([n], E)$ together with a locally invertible rotation map $\textup{rot}_G \colon [n] \times [d_1] \to [n] \times [d_1]$.

- A $d_2$-regular graph $H = ([d_1]^s, E')$.

And we define:

- For $i \in \{0, 1, \dots, s-1\}$, we define $\textup{Rot}_i \colon [n] \times [d_1]^s \to [n] \times [d_1]^s$ as, for every $v \in [n]$ and $(a_0, \dots, a_{s-1}) \in [d_1]^s$,

  $$\textup{Rot}_i((v, (a_0, \dots, a_{s-1}))) \coloneqq (v', (a_0, \dots, a_{i-1}, a_i', a_{i+1}, \dots, a_{s-1})),$$

  where $(v', a_i') = \textup{rot}_G(v, a_i)$.

- Denote by $\mathsf{G}_i$ the operator realizing $\textup{Rot}_i$ and let $\mathsf{A}_H$ be the normalized random walk operator of $H$. Note that $\mathsf{G}_i$ is a permutation operator corresponding to a product of transpositions.

Then $t-1$ steps of the $s$-wide replacement product are given by the operator

$$\prod_{i=0}^{t-2} \mathsf{G}_{i \bmod s}(\mathsf{I} \otimes \mathsf{A}_H).$$

Ta-Shma instantiates the $s$-wide replacement product with an outer graph $G$ that is a Cayley graph, for which locally invertible rotation maps exist generically.

Remark 4.3.

Let $R$ be a group and $A \subseteq R$ where the set $A$ is closed under inversion. For every Cayley graph $\textup{Cay}(R, A)$, the map $\varphi \colon A \to A$ defined as $\varphi(g) = g^{-1}$ gives rise to the locally invertible rotation map

$$\textup{rot}_{\textup{Cay}(R,A)}((r, a)) = (r \cdot a, a^{-1}),$$

for every $r \in R$, $a \in A$.

[Figure 4: An example of the 1-wide replacement product with outer graph $G = K_5$ and inner graph $H = C_4$. Vertices are labeled by their $H$ components. Note that the rotation map is locally invertible, with $\varphi(1) = 2$, $\varphi(2) = 1$, $\varphi(3) = 4$, and $\varphi(4) = 3$.]
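For concreteness, here is Remark 4.3 on a toy Cayley graph of our choosing (the additive group $\mathbb{Z}_{13}$ with generating set $\{1, 12, 5, 8\}$, which is closed under inversion):

```python
# Sketch of Remark 4.3 on Cay(Z_13, A) with A = {1, 12, 5, 8}; the group here
# is additive, so "r . a" is r + a and a^{-1} is -a mod 13.
R = 13
gens = [1, 12, 5, 8]

def rot(v, j):                       # rot_G((v, j)) = (v + a, index of -a)
    a = gens[j]
    return (v + a) % R, gens.index((-a) % R)

# Local invertibility: phi depends only on j, and rot is an involution
# (as Definition 4.1 requires).
assert all(rot(*rot(v, j)) == (v, j) for v in range(R) for j in range(4))
print(rot(0, 0))                     # (1, 1): move to 0+1; the return label is 12
```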

4.2 The Construction

Ta-Shma's code construction works by starting with a constant bias code $\mathcal{C}_0$ in $\mathbb{F}_2^n$ and boosting to arbitrarily small bias using direct sum liftings. Recall that the direct sum lifting is based on a collection $W(t) \subseteq [n]^t$, which Ta-Shma obtains using $t-1$ steps of random walk on the $s$-wide replacement product of two regular expander graphs $G$ and $H$. The graph $G$ is on $n$ vertices (the same as the block length of the base code), and other parameters, like the degrees $d_1$ and $d_2$ of $G$ and $H$ respectively, are chosen based on the target code parameters.

To elaborate, every $(t-1)$-length walk on the replacement product gives a sequence of $t$ outer vertices or $G$-vertices, which can be seen as an element of $[n]^t$. This gives the collection $W(t)$ with $|W(t)| = n \cdot d_1^s \cdot d_2^{t-1}$, which means the rate of the lifted code is smaller than the rate of $\mathcal{C}_0$ by a factor of $d_1^s d_2^{t-1}$. However, the collection $W(t)$ is a parity sampler, and this means that the bias decreases (or the distance increases). The relationship between this decrease in bias and decrease in rate, with some careful parameter choices, allows Ta-Shma to obtain nearly optimal $\varepsilon$-balanced codes.
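The rate accounting in this paragraph amounts to the following back-of-the-envelope computation (the numbers are arbitrary toy values of ours, not Ta-Shma's actual parameter choices):

```python
# rate(lifted) = rate(C0) * n / |W(t)| = rate(C0) / (d1^s * d2^(t-1)).
n, rate0 = 2**20, 0.05
d1, d2, s, t = 64, 8, 4, 17
walks = n * d1**s * d2**(t - 1)      # |W(t)|, the lifted block length
print(walks, rate0 * n / walks)      # block length and lifted rate
```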

4.3 Tweaking the Construction

Recall that the first $s$ steps in Ta-Shma's construction are given by the operator

$$\mathsf{G}_{s-1}(\mathsf{I} \otimes \mathsf{A}_H)\mathsf{G}_{s-2} \cdots \mathsf{G}_1(\mathsf{I} \otimes \mathsf{A}_H)\mathsf{G}_0(\mathsf{I} \otimes \mathsf{A}_H).$$

Naively decomposing the above operator into the product of operators $\prod_{i=0}^{s-1} \mathsf{G}_i(\mathsf{I} \otimes \mathsf{A}_H)$ is not good enough to obtain the splittability property, which would hold provided $\sigma_2(\mathsf{G}_i(\mathsf{I} \otimes \mathsf{A}_H))$ was small for every $i$ in $\{0, \ldots, s-1\}$. However, each $\mathsf{G}_i(\mathsf{I} \otimes \mathsf{A}_H)$ has $|V(G)|$ singular values equal to $1$, since $\mathsf{G}_i$ is an orthogonal operator and $(\mathsf{I} \otimes \mathsf{A}_H)$ has $|V(G)|$ singular values equal to $1$. To avoid this issue, we will tweak the construction to be the following product

$$\prod_{i=0}^{s-1} (\mathsf{I} \otimes \mathsf{A}_H)\mathsf{G}_i(\mathsf{I} \otimes \mathsf{A}_H).$$

The operator $(\mathsf{I} \otimes \mathsf{A}_H)\mathsf{G}_i(\mathsf{I} \otimes \mathsf{A}_H)$ is exactly the walk operator of the zig-zag product $G \textcircled{z} H$ of $G$ and $H$ with a rotation map given by the (rotation map) operator $\mathsf{G}_i$. This tweaked construction is slightly simpler in the sense that $G \textcircled{z} H$ is an undirected graph. We know from the zig-zag analysis that $(\mathsf{I} \otimes \mathsf{A}_H)\mathsf{G}_i(\mathsf{I} \otimes \mathsf{A}_H)$ is expanding as long as $G$ and $H$ are themselves expanders. More precisely, we have a bound that follows from [RVW00].

Fact 4.4.

Let $G$ be an outer graph and $H$ be an inner graph used in the $s$-wide replacement product. For any integer $0 \leq i \leq s-1$,

$$\sigma_2((\mathsf{I} \otimes \mathsf{A}_H)\mathsf{G}_i(\mathsf{I} \otimes \mathsf{A}_H)) \leq \sigma_2(G) + 2\sigma_2(H) + \sigma_2(H)^2.$$

This bound will imply splittability as shown in Section 7.2. We will need to argue that this modification still preserves the correctness of the parity sampling and that it can be achieved with similar parameter trade-offs.
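As a numeric sanity check of Fact 4.4, the sketch below builds one tweaked step on a toy instantiation of our choosing ($s = 1$, $G = K_6$ viewed as $\textup{Cay}(\mathbb{Z}_6, \{1, \ldots, 5\})$ with the rotation map of Remark 4.3, and $H = K_5$ on the $d_1 = 5$ edge labels) and compares $\sigma_2$ against the zig-zag bound:

```python
import numpy as np

nG, d1 = 6, 5
gens = [1, 2, 3, 4, 5]                             # K_6 = Cay(Z_6, gens)
AH = (np.ones((d1, d1)) - np.eye(d1)) / (d1 - 1)   # H = K_5 on the labels

G0 = np.zeros((nG * d1, nG * d1))                  # permutation operator Rot_0
for v in range(nG):
    for j in range(d1):
        v2 = (v + gens[j]) % nG                    # rot_G of Remark 4.3 ...
        j2 = gens.index((-gens[j]) % nG)           # ... with phi(a) = -a
        G0[v2 * d1 + j2, v * d1 + j] = 1.0

IAH = np.kron(np.eye(nG), AH)                      # I (x) A_H
Wop = IAH @ G0 @ IAH                               # one tweaked (zig-zag) step
s2W = np.linalg.svd(Wop, compute_uv=False)[1]
s2G, s2H = 1 / (nG - 1), 1 / (d1 - 1)
print(s2W, s2G + 2 * s2H + s2H**2)                 # sigma_2(Wop) <= 0.7625 here
```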

The formal definition of a walk on this slightly modified construction is given below.

Definition 4.5.

Let $t \in \mathbb{N}$, $G$ be a $d_1$-regular graph and $H$ be a $d_2$-regular graph on $d_1^s$ vertices. Given a starting vertex $(v, h) \in V(G) \times V(H)$, a $(t-1)$-step walk on the tweaked $s$-wide replacement product of $G$ and $H$ is a tuple $((v_0, h_0), \dots, (v_{t-1}, h_{t-1})) \in (V(G) \times V(H))^t$ such that

- $(v_0, h_0) = (v, h)$, and

- for every $0 \leq i < t-1$, the vertex $(v_i, h_i)$ is adjacent to $(v_{i+1}, h_{i+1})$ in $(\mathsf{I} \otimes \mathsf{A}_H)\mathsf{G}_{i \bmod s}(\mathsf{I} \otimes \mathsf{A}_H)$.

Note that each $(\mathsf{I} \otimes \mathsf{A}_H)\mathsf{G}_{i \bmod s}(\mathsf{I} \otimes \mathsf{A}_H)$ is the walk operator of a $d_2^2$-regular graph. Therefore, the starting vertex $(v, h)$ together with a degree sequence $(m_1, \dots, m_{t-1}) \in [d_2^2]^{t-1}$ uniquely defines a $(t-1)$-step walk.
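A short sketch of Definition 4.5 on toy graphs of our choosing (outer graph $G = C_8$ with $d_1 = 2$, where neighbor $1$ of $v$ is $v+1$ and neighbor $0$ is $v-1$, so $\varphi(j) = 1-j$; inner graph $H = C_4$ on $[d_1]^s$ with $s = 2$): each tweaked step is an $H$-step, a $\mathsf{G}_{i \bmod s}$-step, and another $H$-step.

```python
import random

nG, s = 8, 2
cycle = [(0, 0), (0, 1), (1, 1), (1, 0)]           # H = C_4 on [2]^2
H_nbrs = {v: (cycle[k - 1], cycle[(k + 1) % 4]) for k, v in enumerate(cycle)}

def rot_G(v, j):                                   # locally invertible rotation
    return ((v + 1) % nG, 0) if j == 1 else ((v - 1) % nG, 1)

def step(v, h, i):
    h = random.choice(H_nbrs[h])                   # intra-cloud (I (x) A_H)
    v, j2 = rot_G(v, h[i % s])                     # inter-cloud G_{i mod s}
    h = h[:i % s] + (j2,) + h[i % s + 1:]          # rotation updates coord i
    h = random.choice(H_nbrs[h])                   # intra-cloud (I (x) A_H)
    return v, h

random.seed(1)
walk = [(0, (0, 0))]
for i in range(6):
    walk.append(step(*walk[-1], i))
print([v for v, _ in walk])                        # the induced walk on G
```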

4.3.1 Parity Sampling

We argue informally why parity sampling still holds with similar parameter trade-offs. Later, in Section 4.3.2, we formalize a key result underlying parity sampling, and in Section 8 we compute the new trade-off between bias and rate in some regimes. In Section 4.1, the original $s$-wide replacement product was defined as a purely graph-theoretic operation. Now, we explain how Ta-Shma used this construction for parity sampling, obtaining codes near the GV bound.

For a word $z \in \mathbb{F}_2^{V(G)}$ in the base code, let $\mathsf{P}_z$ be the diagonal matrix, whose rows and columns are indexed by $V(G) \times V(H)$, with $(\mathsf{P}_z)_{(v,h),(v,h)} = (-1)^{z_v}$. Proving parity sampling requires analyzing the operator norm of the following product

$$\mathsf{P}_z \prod_{i=0}^{s-1} (\mathsf{I} \otimes \mathsf{A}_H)\mathsf{G}_i \mathsf{P}_z(\mathsf{I} \otimes \mathsf{A}_H), \qquad (1)$$

when $\operatorname{bias}(z) \leq \varepsilon_0$. Let $\mathbf{1} \in \mathbb{R}^{V(G) \times V(H)}$ be the all-ones vector and $W$ be the collection of all $(t-1)$-step walks on the tweaked $s$-wide replacement product. Ta-Shma showed (and it is not difficult to verify) that

$$\operatorname{bias}\left(\operatorname{dsum}_W(z)\right) = \left\lvert\left\langle \mathbf{1}, \mathsf{P}_z \prod_{i=0}^{t-2} (\mathsf{I} \otimes \mathsf{A}_H)\mathsf{G}_{i \bmod s} \mathsf{P}_z(\mathsf{I} \otimes \mathsf{A}_H) \mathbf{1} \right\rangle\right\rvert.$$

From the previous equation, one readily deduces that

$$\operatorname{bias}\left(\operatorname{dsum}_W(z)\right) \leq \sigma_1\left(\mathsf{P}_z \prod_{i=0}^{s-1} (\mathsf{I} \otimes \mathsf{A}_H)\mathsf{G}_i \mathsf{P}_z(\mathsf{I} \otimes \mathsf{A}_H)\right)^{\lfloor (t-1)/s \rfloor}.$$

Set $\mathsf{B} \coloneqq \mathsf{P}_z \prod_{i=0}^{s-1} (\mathsf{I} \otimes \mathsf{A}_H)\mathsf{G}_i \mathsf{P}_z(\mathsf{I} \otimes \mathsf{A}_H)$. To analyze the operator norm of $\mathsf{B}$, we will first need some notation. Note that $\mathsf{B}$ is an operator acting on the space $\mathcal{V} = \mathbb{R}^{V(G)} \otimes \mathbb{R}^{V(H)}$. Two of its subspaces play an important role in the analysis, namely,

$$\mathcal{W}^\parallel = \textup{span}\{a \otimes b \in \mathbb{R}^{V(G)} \otimes \mathbb{R}^{V(H)} \mid b = \mathbf{1}\} \quad\text{and}\quad \mathcal{W}^\perp = (\mathcal{W}^\parallel)^\perp.$$

Note that the complement subspace is with respect to the standard inner product. Observe that $\mathcal{V} = \mathcal{W}^\parallel \oplus \mathcal{W}^\perp$. Given arbitrary unit vectors $v, w \in \mathcal{V}$, Ta-Shma considers the inner product

$$\left\langle v, \prod_{i=0}^{s-1} (\mathsf{I} \otimes \mathsf{A}_H)\mathsf{G}_i \mathsf{P}_z(\mathsf{I} \otimes \mathsf{A}_H) w \right\rangle. \qquad (2)$$

Each time an operator $(\mathsf{I} \otimes \mathsf{A}_H)$ appears in the above expression, the next step of the walk can take one out of $d_2$ possibilities, and thus the rate suffers a multiplicative decrease of $1/d_2$. We think of this as "paying" $d_2$ for this step of the walk. The whole problem lies in the trade-off between rate and distance, so the crucial question now is how much the norm decreases as we pay $d_2$. For a moment, suppose that the norm always decreases by a factor of $\lambda_2 \coloneqq \sigma_2(H)$ per occurrence of $(\mathsf{I} \otimes \mathsf{A}_H)$. If in this hypothetical case we could further assume $\lambda_2 = 1/\sqrt{d_2}$, then if $\mathsf{B}$ were a product containing $\lceil \log_{\lambda_2}(\varepsilon) \rceil$ factors of $(\mathsf{I} \otimes \mathsf{A}_H)$, the final bias would be at most $\varepsilon$, the rate would have suffered a multiplicative decrease of (essentially) $\varepsilon^2$, and we would be done.
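The hypothetical accounting in this paragraph, written out as arithmetic (toy numbers of our choosing):

```python
# With lambda_2 = 1/sqrt(d2), about log(1/eps)/log(1/lambda_2) factors of
# (I (x) A_H) drive the bias below eps; each factor costs a factor d2 in rate.
import math

d2, eps = 64, 1e-6
lam2 = 1 / math.sqrt(d2)
m = math.ceil(math.log(1 / eps) / math.log(1 / lam2))   # number of factors
print(lam2**m <= eps, d2**(-m), eps**2)   # True; rate loss ~ eps^2 (same order)
```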

Of course, this was an oversimplification. The general strategy is roughly the above, but a beautiful non-trivial step is needed. Going back to the bilinear form Eq. 2, if w𝒲w\in\mathcal{W}^{\perp} (or v𝒲v\in\mathcal{W}^{\perp}), we pay d2d_{2} and we do obtain a norm decrease of λ2\lambda_{2}. More generally, note that can decompose w=w+ww=w^{\parallel}+w^{\perp} with w𝒲w^{\parallel}\in\mathcal{W}^{\parallel} and w𝒲w^{\perp}\in\mathcal{W}^{\perp} (decompose v=v+vv=v^{\parallel}+v^{\perp} similarly) and we can carry this process iteratively collecting factors of λ2\lambda_{2}. However, we are stuck with several terms of the form for 0k1k2<s0\leq k_{1}\leq k_{2}<s,

vk1,i=k1k2(𝖨𝖠H)𝖦i𝖯z(𝖨𝖠H)wk2,\left\langle v_{k_{1}}^{\parallel},\prod_{i=k_{1}}^{k_{2}}(\mathsf{I}\otimes\mathsf{A}_{H})\mathsf{G}_{i}\mathsf{P}_{z}(\mathsf{I}\otimes\mathsf{A}_{H})w_{k_{2}}^{\parallel}\right\rangle,

with vk1,wk2𝒲v_{k_{1}}^{\parallel},w_{k_{2}}^{\parallel}\in\mathcal{W}^{\parallel}, and for which the preceding naive norm decrease argument fails. This is the point in the analysis where the structure of the ss-wide replacement product is used. Since vk1,wk2𝒲v_{k_{1}}^{\parallel},w_{k_{2}}^{\parallel}\in\mathcal{W}^{\parallel}, these vectors are uniform on each “cloud”, i.e., on each copy of HH. Recall that a vertex in HH is an ss-tuple (m1,,ms)[d1]s(m_{1},\dots,m_{s})\in[d_{1}]^{s}. Ta-Shma leverages the uniformity of this tuple to implement k2k1+1k_{2}-k_{1}+1 (up to ss) steps of the random walk on GG. More precisely, he obtains the following beautiful result:

Theorem 4.6 (Adapted from Ta-Shma [TS17]).

Let GG be a locally invertible graph of degree d1d_{1}, HH be a Cayley graph on 𝔽2slogd1\mathbb{F}_{2}^{s\log d_{1}}, and 0k1k2<s0\leq k_{1}\leq k_{2}<s be integers. If v=v𝟏v^{\parallel}=v\otimes\mathbf{1} and w=w𝟏w^{\parallel}=w\otimes\mathbf{1}, then

v,i=k1k2𝖦i(𝖨𝖠H)𝖯zw=v,(𝖠G𝖬z)k2k1+1w\left\langle v^{\parallel},\prod_{i=k_{1}}^{k_{2}}\mathsf{G}_{i}(\mathsf{I}\otimes\mathsf{A}_{H})\mathsf{P}_{z}w^{\parallel}\right\rangle=\left\langle v,\left(\mathsf{A}_{G}\mathsf{M}_{z}\right)^{k_{2}-k_{1}+1}w\right\rangle

where 𝖬zV(G)×V(G)\mathsf{M}_{z}\in\mathbb{R}^{V(G)\times V(G)} is the diagonal matrix defined as (𝖬z)v,v(1)zv(\mathsf{M}_{z})_{v,v}\coloneqq(-1)^{z_{v}} for vV(G)v\in V(G).

Remark 4.7.

Note that the walk operator in this theorem corresponds to the original construction. Theorem 4.6 was used by Ta-Shma to obtain Fact 4.9; Corollary 4.10 is its analogue for the modified construction.

Ta-Shma proved Theorem 4.6 under the more general condition that HH is 0-pseudorandom. Roughly speaking, this property means that if we start with a distribution that is uniform over the clouds, and walk according to fixed HH-steps j0,j1,,js1j_{0},j_{1},\cdots,j_{s-1}, then the distribution of GG-vertices obtained will be identical to the distribution obtained if we were doing the usual random walk on GG. We will always choose HH to be a Cayley graph on 𝔽2slogd1\mathbb{F}_{2}^{s\log d_{1}}, which will imply that HH is also 0-pseudorandom. The proof of Theorem 4.6 crucially uses the product structure of 𝔽2slogd1\mathbb{F}_{2}^{s\log d_{1}}: every vertex of HH can be represented by ss registers of logd1\log d_{1} bits each, and both inter-cloud and intra-cloud steps can be seen as applying register-wise bijections using some canonical mapping between [d1][d_{1}] and 𝔽2logd1\mathbb{F}_{2}^{\log d_{1}}.

Ta-Shma’s original parity sampling proof required ε0+2θ+2σ2(G)σ2(H)2\varepsilon_{0}+2\theta+2\sigma_{2}(G)\leq\sigma_{2}(H)^{2}, where ε0\varepsilon_{0} is the initial bias and θ\theta is an error parameter arising from a number theoretic construction of Ramanujan graphs for the outer graph GG. This is because ε0+2θ+2σ2(G)\varepsilon_{0}+2\theta+2\sigma_{2}(G) is the factor by which the bias decreases every two steps of a walk on GG (see Theorem 5.2). Having ε0+2θ+2σ2(G)σ2(H)2\varepsilon_{0}+2\theta+2\sigma_{2}(G)\leq\sigma_{2}(H)^{2} ensured that, after establishing Theorem 4.6, we collected enough norm reduction to offset the d22d_{2}^{2} price we paid for two steps. In the modified construction, we now have d22d_{2}^{2} possibilities for each step in (𝖨𝖠H2)(\mathsf{I}\otimes\mathsf{A}_{H}^{2}) (so a d24d_{2}^{4} price for two steps), and so if instead we require ε0+2θ+2σ2(G)σ2(H)4\varepsilon_{0}+2\theta+2\sigma_{2}(G)\leq\sigma_{2}(H)^{4} in the modified construction, we claim that the correctness of the parity sampling analysis is preserved, as is (essentially) the trade-off between walk length and norm decay. Fortunately, Ta-Shma’s parameters decouple, and we can choose them to satisfy the above requirement.

Remark 4.8.

This modification of the ss-wide replacement product of GG and HH essentially 555Except at the first and last factors in the product of operators. amounts to taking a different inner graph HH which can be factored as H=HHH=\sqrt{H}\sqrt{H} (and is still 0-pseudorandom).

4.3.2 Spectral Analysis of the Modified Construction

We formally show that we do not lose much by going from Ta-Shma’s original ss-wide replacement product construction to its tweaked version. The key technical result obtained by Ta-Shma is the following, which is used to analyze the bias reduction as a function of the total number of walk steps t1t-1.

Fact 4.9 (Theorem 24 abridged [TS17]).

If HH is a Cayley graph on 𝔽2slogd1\mathbb{F}_{2}^{s\log d_{1}} and ε0+2θ+2σ2(G)σ2(H)2\varepsilon_{0}+2\cdot\theta+2\cdot\sigma_{2}(G)\leq\sigma_{2}(H)^{2}, then

i=0s1𝖯z𝖦i(𝖨𝖠H)opσ2(H)s+sσ2(H)s1+s2σ2(H)s3,\left\lVert\prod_{i=0}^{s-1}\mathsf{P}_{z}\mathsf{G}_{i}(\mathsf{I}\otimes\mathsf{A}_{H})\right\rVert_{\textup{op}}\leq\sigma_{2}(H)^{s}+s\cdot\sigma_{2}(H)^{s-1}+s^{2}\cdot\sigma_{2}(H)^{s-3},

where 𝖯z(V(G)×V(H))×(V(G)×V(H))\mathsf{P}_{z}\in\mathbb{R}^{(V(G)\times V(H))\times(V(G)\times V(H))} is the sign operator of an ε0\varepsilon_{0}-biased word z𝔽2V(G)z\in\mathbb{F}_{2}^{V(G)}, defined as the diagonal matrix with (𝖯z)(v,h),(v,h)=(1)zv(\mathsf{P}_{z})_{(v,h),(v,h)}=(-1)^{z_{v}} for every (v,h)V(G)×V(H)(v,h)\in V(G)\times V(H).
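For concreteness, 𝖯z\mathsf{P}_{z} is straightforward to materialize; the following is a minimal sketch (toy sizes, numpy) built directly from the definition above:

```python
import numpy as np

def sign_operator(z, n_H):
    """The sign operator P_z as a diagonal +-1 matrix over V(G) x V(H).

    The entry at ((v, h), (v, h)) is (-1)^{z_v}; it depends only on the
    G-coordinate v, so each sign is replicated across the whole cloud.
    """
    signs = np.array([(-1.0) ** int(b) for b in z])  # one sign per G-vertex
    return np.diag(np.repeat(signs, n_H))            # replicate across clouds

P_z = sign_operator([0, 1, 1], n_H=2)  # toy word: |V(G)| = 3, |V(H)| = 2
# P_z is unitary, so multiplying by it never increases operator norms.
assert np.allclose(P_z @ P_z, np.eye(6))
```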

We reduce the analysis of Ta-Shma’s tweaked construction to Fact 4.9. In doing so, we only lose one extra step, as shown below.

Corollary 4.10.

If H2H^{2} is a Cayley graph on 𝔽2slogd1\mathbb{F}_{2}^{s\log d_{1}} and ε0+2θ+2σ2(G)σ2(H)4\varepsilon_{0}+2\cdot\theta+2\cdot\sigma_{2}(G)\leq\sigma_{2}(H)^{4}, then

i=0s1(𝖨𝖠H)𝖯z𝖦i(𝖨𝖠H)opσ2(H2)s1+(s1)σ2(H2)s2+(s1)2σ2(H2)s4,\left\lVert\prod_{i=0}^{s-1}(\mathsf{I}\otimes\mathsf{A}_{H})\mathsf{P}_{z}\mathsf{G}_{i}(\mathsf{I}\otimes\mathsf{A}_{H})\right\rVert_{\textup{op}}\leq\sigma_{2}(H^{2})^{s-1}+(s-1)\cdot\sigma_{2}(H^{2})^{s-2}+(s-1)^{2}\cdot\sigma_{2}(H^{2})^{s-4},

where 𝖯z\mathsf{P}_{z} is the sign operator of an ε0\varepsilon_{0}-biased word z𝔽2V(G)z\in\mathbb{F}_{2}^{V(G)} as in Fact 4.9.

Proof.

We have

i=0s1(𝖨𝖠H)𝖯z𝖦i(𝖨𝖠H)op\displaystyle\left\lVert\prod_{i=0}^{s-1}(\mathsf{I}\otimes\mathsf{A}_{H})\mathsf{P}_{z}\mathsf{G}_{i}(\mathsf{I}\otimes\mathsf{A}_{H})\right\rVert_{\text{op}} (𝖨𝖠H)opi=1s1𝖯z𝖦i(𝖨𝖠H2)op𝖯z𝖦0(𝖨𝖠H)op\displaystyle\leq\left\lVert(\mathsf{I}\otimes\mathsf{A}_{H})\right\rVert_{\text{op}}\left\lVert\prod_{i=1}^{s-1}\mathsf{P}_{z}\mathsf{G}_{i}(\mathsf{I}\otimes\mathsf{A}_{H}^{2})\right\rVert_{\text{op}}\left\lVert\mathsf{P}_{z}\mathsf{G}_{0}(\mathsf{I}\otimes\mathsf{A}_{H})\right\rVert_{\text{op}}
i=1s1𝖯z𝖦i(𝖨𝖠H2)op\displaystyle\leq\left\lVert\prod_{i=1}^{s-1}\mathsf{P}_{z}\mathsf{G}_{i}(\mathsf{I}\otimes\mathsf{A}_{H}^{2})\right\rVert_{\text{op}}
σ2(H2)s1+(s1)σ2(H2)s2+(s1)2σ2(H2)s4,\displaystyle\leq\sigma_{2}(H^{2})^{s-1}+(s-1)\cdot\sigma_{2}(H^{2})^{s-2}+(s-1)^{2}\cdot\sigma_{2}(H^{2})^{s-4},

where the last inequality follows from Fact 4.9.

Remark 4.11.

We know that in the modified construction H2H^{2} is a Cayley graph since HH is a Cayley graph.

From this point onward, we will be working exclusively with the modified construction instead of using it in its original form. Any references to Ta-Shma’s construction or the ss-wide replacement product will actually refer to the modified versions described in this section.

5 Code Cascading

A code cascade is a sequence of codes generated by starting with a base code 𝒞0\mathcal{C}_{0} and recursively applying lifting operations.

Definition 5.1.

We say that a sequence of codes 𝒞0,𝒞1,,𝒞\mathcal{C}_{0},\mathcal{C}_{1},\ldots,\mathcal{C}_{\ell} is a code cascade provided 𝒞i=dsumWi(ti)(𝒞i1)\mathcal{C}_{i}=\operatorname{dsum}_{W_{i}(t_{i})}(\mathcal{C}_{i-1}) for every i[]i\in[\ell]. Each Wi(ti)W_{i}(t_{i}) is a subset of [ni1]ti[n_{i-1}]^{t_{i}}, where ni1=|Wi1(ti1)|n_{i-1}=|W_{i-1}(t_{i-1})| is the block length of the code 𝒞i1\mathcal{C}_{i-1}.

Let us see how code cascades may be useful for decoding. Suppose we wish to lift the code 𝒞0\mathcal{C}_{0} to 𝒞\mathcal{C}_{\ell}, and there is some W(t)[n0]tW(t)\subseteq[n_{0}]^{t} such that 𝒞=dsumW(t)(𝒞0)\mathcal{C}_{\ell}=\operatorname{dsum}_{W(t)}(\mathcal{C}_{0}). In our case of bias boosting, this tt will depend on the target bias ε\varepsilon. However, the expansion requirement of the list-decoding framework of [AJQ+20] has a poor dependence on tt. A way to work around this issue is to go from 𝒞0\mathcal{C}_{0} to 𝒞\mathcal{C}_{\ell} via a code cascade as above such that each tit_{i} is a constant independent of the final bias but i=1ti=t\prod\limits_{i=1}^{\ell}t_{i}=t (which means \ell depends on ε\varepsilon). The final code 𝒞\mathcal{C}_{\ell} of the cascade is the same as the code obtained from length-(t1)(t-1) walks. While decoding will now become an \ell-level recursive procedure, the gain from replacing tt by tit_{i} will outweigh this loss, as we discuss below.

5.1 Warm-up: Code Cascading Expander Walks

We now describe the code cascading construction and unique decoding algorithm in more detail. Let G=(V,E)G=(V,E) be a dd-regular graph with uniform distribution over the edges. Let mm be a sufficiently large positive integer, which will be the number of vertices of the walks used for the lifting between consecutive codes in the cascade. At first, it will be crucial that we can take m=O(1)m=O(1) so that the triangle inequality arising from the analysis of the lifting between two consecutive codes involves a constant number of terms. We construct a recursive family of codes as follows.

  • -

    Start with a code 𝒞0\mathcal{C}_{0} which is linear and has constant bias ε0\varepsilon_{0}.

  • -

    Define the code 𝒞1=dsumW(m)(𝒞0)\mathcal{C}_{1}=\operatorname{dsum}_{W(m)}(\mathcal{C}_{0}), which is the direct sum lifting over the collection W(m)W(m) of all length-(m1)(m-1) walks on GG using the code 𝒞0\mathcal{C}_{0}.

  • -

    Let G^i=(Vi,Ei)\widehat{G}_{i}=(V_{i},E_{i}) be the (directed) graph where ViV_{i} is the collection of all walks on mim^{i} vertices on GG with two walks (v1,,vmi)(v_{1},\dots,v_{m^{i}}) and (u1,,umi)(u_{1},\dots,u_{m^{i}}) connected iff vmiv_{m^{i}} is adjacent to u1u_{1} in GG.

  • -

Define 𝒞i\mathcal{C}_{i} to be the direct sum lifting on the collection Wi(m)W_{i}(m) of all length-(m1)(m-1) walks on G^i1\widehat{G}_{i-1} using the code 𝒞i1\mathcal{C}_{i-1}, i.e., 𝒞i=dsumWi(m)(𝒞i1)\mathcal{C}_{i}=\operatorname{dsum}_{W_{i}(m)}(\mathcal{C}_{i-1}).

  • -

    Repeat this process to yield a code cascade 𝒞0,,𝒞\mathcal{C}_{0},\dots,\mathcal{C}_{\ell}.

Thanks to the definition of the graphs G^i\widehat{G}_{i} and the recursive nature of the construction, the final code 𝒞\mathcal{C}_{\ell} is the same as the code obtained from 𝒞0\mathcal{C}_{0} by taking the direct sum lifting over all walks on t=mt=m^{\ell} vertices of GG. We can use Ta-Shma’s analysis (building on the ideas of Rozenman and Wigderson [Bog12]) for the simpler setting of walks over a single expander graph to determine the amplification in bias that occurs in going from 𝒞0\mathcal{C}_{0} all the way to 𝒞\mathcal{C}_{\ell}.
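The following is a minimal self-contained sketch of this recursion (a toy graph rather than an expander, and brute-force enumeration); it illustrates that flattening a level-ii walk recovers an ordinary walk on mim^{i} vertices of GG, and how a codeword of 𝒞i1\mathcal{C}_{i-1} is lifted by taking parities along walks:

```python
# Toy illustration of the cascade (G here is a 4-cycle, not an expander).
# A level-1 walk is a walk on m vertices of G; a level-(i+1) walk is an
# m-tuple of level-i walks, consecutive sub-walks joined by a G-edge from
# the last vertex of one to the first vertex of the next (as in the
# definition of the graphs above).

def walks_on_graph(adj, m):
    """All walks (v_1, ..., v_m) on the graph given by adjacency list adj."""
    walks = [(v,) for v in adj]
    for _ in range(m - 1):
        walks = [w + (u,) for w in walks for u in adj[w[-1]]]
    return walks

def next_level(adj, prev_walks, m):
    """All walks on m 'vertices', each vertex being a previous-level walk."""
    out = [(w,) for w in prev_walks]
    for _ in range(m - 1):
        out = [ws + (w2,) for ws in out for w2 in prev_walks
               if w2[0] in adj[ws[-1][-1]]]
    return out

def flatten(walk):
    """A level-i walk flattens to an ordinary walk on m^i vertices of G."""
    if isinstance(walk[0], int):
        return walk
    return tuple(v for sub in walk for v in flatten(sub))

def dsum_bit(z, walk):
    """Direct sum lifting: the parity of the base-code bits along the walk."""
    return sum(z[v] for v in flatten(walk)) % 2

adj = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}    # the 4-cycle
level1 = walks_on_graph(adj, 2)                       # index set of C_1
level2 = next_level(adj, level1, 2)                   # index set of C_2
lifted = [dsum_bit([0, 1, 1, 0], w) for w in level2]  # lift a base codeword
# Flattened level-2 walks are exactly the 4-vertex walks on G:
assert sorted(map(flatten, level2)) == sorted(walks_on_graph(adj, 4))
```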

Theorem 5.2 (Adapted from Ta-Shma [TS17]).

Let 𝒞\mathcal{C} be an ε0\varepsilon_{0}-balanced linear code, and let 𝒞=dsumW(t)(𝒞)\mathcal{C}^{\prime}=\operatorname{dsum}_{W(t)}(\mathcal{C}) be the direct sum lifting of 𝒞\mathcal{C} over the collection of all length-(t1)(t-1) walks W(t)W(t) on a graph GG. Then

bias(𝒞)(ε0+2σ2(G))(t1)/2.\operatorname{bias}(\mathcal{C}^{\prime})\leq(\varepsilon_{0}+2\sigma_{2}(G))^{\left\lfloor(t-1)/2\right\rfloor}.

If σ2(G)ε0/2\sigma_{2}(G)\leq\varepsilon_{0}/2 and =logm(2log2ε0(ε)+3)\ell=\left\lceil\log_{m}(2\log_{2\varepsilon_{0}}(\varepsilon)+3)\right\rceil, taking t=m2log2ε0(ε)+3t=m^{\ell}\geq 2\log_{2\varepsilon_{0}}(\varepsilon)+3 in the above theorem shows that the final code 𝒞\mathcal{C}_{\ell} is ε\varepsilon-balanced. Observe that the required expansion of the graph GG only depends on the constant initial bias ε0\varepsilon_{0}, not on the desired final bias ε\varepsilon. It will be important for being able to decode with better parameters that both σ2(G)\sigma_{2}(G) and mm are constant with respect to ε\varepsilon; only \ell depends on the final bias (with more care we can make σ2(G)\sigma_{2}(G) depend on ε\varepsilon, but we restrict this analysis to Ta-Shma’s refined construction on the ss-wide replacement product).
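The following sketch runs this parameter arithmetic on toy values (ε0\varepsilon_{0}, σ2(G)\sigma_{2}(G), mm, and the target bias are all illustrative):

```python
import math

eps0, sigma2_G, m, eps = 0.1, 0.05, 8, 1e-9   # toy values; sigma2_G <= eps0/2

# Theorem 5.2 needs floor((t-1)/2) >= log_{2*eps0}(eps), i.e. t >= 2*log + 3.
log_needed = math.log(eps) / math.log(2 * eps0)
ell = math.ceil(math.log(2 * log_needed + 3, m))   # number of cascade levels
t = m ** ell                                       # walk vertices for C_ell

bias_bound = (eps0 + 2 * sigma2_G) ** ((t - 1) // 2)
assert t >= 2 * log_needed + 3 and bias_bound <= eps
print(ell, t, bias_bound)   # ell = 2, t = 64, bias ~ 2e-22 <= eps
```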

As mentioned before, to uniquely decode 𝒞\mathcal{C}_{\ell} we will inductively employ the list decoding machinery for expander walks from [AJQ+20]. The list decoding algorithm can decode a direct sum lifting 𝒞=dsumW(m)(𝒞)\mathcal{C}^{\prime}=\operatorname{dsum}_{W(m)}(\mathcal{C}) as long as the graph GG is sufficiently expanding, the walk length m1m-1 is large enough, and the base code 𝒞\mathcal{C} has an efficient unique decoding algorithm (see Theorem 6.1 for details).

The expansion requirement ultimately depends on the desired list decoding radius of 𝒞\mathcal{C}^{\prime}, or more specifically, on how close the list decoding radius is to 1/21/2. If the distance of 𝒞\mathcal{C}^{\prime} is at most 1/21/2, its unique decoding radius is at most 1/41/4, which means list decoding at the unique decoding radius is at a constant difference from 1/21/2 and thus places only a constant requirement on the expansion of GG. In the case of the code cascade 𝒞i=dsumWi(m)(𝒞i1)\mathcal{C}_{i}=\operatorname{dsum}_{W_{i}(m)}(\mathcal{C}_{i-1}), unique decoding of 𝒞i1\mathcal{C}_{i-1} is guaranteed by the induction hypothesis. It is not too difficult to see that each graph G^i\widehat{G}_{i} will have the same second singular value as GG, so we can uniquely decode 𝒞i\mathcal{C}_{i} if GG meets the constant expansion requirement and mm is sufficiently large.

5.2 Code Cascading Ta-Shma’s Construction

We will now describe how to set up a code cascade based on walks on an ss-wide replacement product. Consider the ss-wide replacement product of the outer graph GG with the inner graph HH. The first ss walk steps are given by the walk operator

i=0s1(𝖨𝖠H)𝖦i(𝖨𝖠H).\prod_{i=0}^{s-1}(\mathsf{I}\otimes\mathsf{A}_{H})\mathsf{G}_{i}(\mathsf{I}\otimes\mathsf{A}_{H}).

Let 𝖠s1(𝖨𝖠H)𝖦s2(𝖨𝖠H)(𝖨𝖠H)𝖦0(𝖨𝖠H)\mathsf{A}_{s-1}\coloneqq(\mathsf{I}\otimes\mathsf{A}_{H})\mathsf{G}_{s-2}(\mathsf{I}\otimes\mathsf{A}_{H})\cdots(\mathsf{I}\otimes\mathsf{A}_{H})\mathsf{G}_{0}(\mathsf{I}\otimes\mathsf{A}_{H}). If the total walk length t1t-1 is a multiple of ss, the walks are generated using the operator

((𝖨𝖠H)𝖦s1(𝖨𝖠H)𝖠s1)(t1)/s.\left((\mathsf{I}\otimes\mathsf{A}_{H})\mathsf{G}_{s-1}(\mathsf{I}\otimes\mathsf{A}_{H})\mathsf{A}_{s-1}\right)^{(t-1)/s}.

Here (𝖨𝖠H)𝖦s1(𝖨𝖠H)(\mathsf{I}\otimes\mathsf{A}_{H})\mathsf{G}_{s-1}(\mathsf{I}\otimes\mathsf{A}_{H}) is used as a “binding” operator to connect two walks containing ss vertices at level 𝒞2\mathcal{C}_{2}, s2s^{2} vertices at level 𝒞3\mathcal{C}_{3}, and so on. More precisely, we form the following code cascade.

  • -

    𝒞0\mathcal{C}_{0} is an ε0\varepsilon_{0}-balanced linear code efficiently uniquely decodable from a constant radius.

  • -

    𝒞1=dsumW1(s)(𝒞0)\mathcal{C}_{1}=\operatorname{dsum}_{W_{1}(s)}(\mathcal{C}_{0}), where W1(s)W_{1}(s) is the set of length-(s-1) walks given by the operator

    (𝖨𝖠H)𝖦s2(𝖨𝖠H)(s2)th step(𝖨𝖠H)𝖦0(𝖨𝖠H)0th step.\underbrace{(\mathsf{I}\otimes\mathsf{A}_{H})\mathsf{G}_{s-2}(\mathsf{I}\otimes\mathsf{A}_{H})}_{\text{$(s-2)$th step}}\cdots\underbrace{(\mathsf{I}\otimes\mathsf{A}_{H})\mathsf{G}_{0}(\mathsf{I}\otimes\mathsf{A}_{H})}_{\text{$0$th step}}.
  • -

    𝒞2=dsumW2(s)(𝒞1)\mathcal{C}_{2}=\operatorname{dsum}_{W_{2}(s)}(\mathcal{C}_{1}), where W2(s)W_{2}(s) is the set of length-(s1)(s-1) walks over the vertex set W1(s)W_{1}(s) (with the latter being the set of length-(s1)(s-1) walks on the replacement product graph as mentioned above).

  • -

    𝒞i+1=dsumWi+1(s)(𝒞i)\mathcal{C}_{i+1}=\operatorname{dsum}_{W_{i+1}(s)}(\mathcal{C}_{i}), where Wi+1(s)W_{i+1}(s) is the set of length-(s1)(s-1) walks 666For simplicity we chose the number of vertices in all walks of the cascade to be ss, but it could naturally be some sis_{i}\in\mathbb{N} depending on ii. over the vertex set Wi(s)W_{i}(s). Similarly to the cascade of expander walks above, the lift can be thought of as being realized by taking walks using a suitable operator analogous to G^i\widehat{G}_{i}. Since its description is more technical we postpone its definition (see Definition 7.2) to Section 7.2 where it is actually used.

  • -

    𝒞\mathcal{C}_{\ell} denotes the final code in the sequence, which will later be chosen so that its bias is at most ε\varepsilon.

Figure 5: Two levels of code cascading for Ta-Shma’s construction involving codes 𝒞0\mathcal{C}_{0}, 𝒞1\mathcal{C}_{1} and 𝒞2\mathcal{C}_{2} (to make the notation compact we used HH to denote 𝖠H\mathsf{A}_{H}).

6 Unique Decoding of Ta-Shma Codes

We show how code cascading together with list decoding for each level of the cascade allow us to obtain an efficient unique decoding algorithm for Ta-Shma’s construction. We obtain a sequence of results of increasing strength culminating in Theorem 1.1 (which we recall below for convenience). The approach is as follows: we use several different instantiations of Ta-Shma’s construction, each yielding a value of ss (for the ss-wide replacement product) and expansion parameters for the family of outer and inner graphs, and show how the list decoding framework can be invoked in the associated cascade for each one.

See Theorem 1.1.

In this section, we will fit these objects and tools together assuming the parameters are chosen to achieve the required rates and the conditions for applying the list decoding results are satisfied. The concrete instantiations of Ta-Shma codes are done in Section 8. Establishing that the list decoding framework can be applied to this construction is done in Section 7 after which the framework is finally instantiated in Section 9.

Ta-Shma uses the direct sum lifting on an ss-wide replacement product graph to construct a family of ε\varepsilon-balanced codes 𝒞N,ε,β\mathcal{C}_{N,\varepsilon,\beta} with rate Ω(ε2+β)\Omega(\varepsilon^{2+\beta}) and finds parameters for such codes to have the required bias and rate. We will discuss unique decoding results for several versions of these codes. Throughout this section, we will use collections W(k)W(k) which will always be either the set of walks with k=sk=s vertices on an ss-wide replacement product graph (corresponding to the first level of the code cascade), which we denote W[0,s1]W[0,s-1], or a set of walks where the vertices are walks on a lower level of the code cascade.

6.1 Unique Decoding via Code Cascading

To perform unique decoding we will use the machinery of list decoding from Theorem 6.1 (proven later in Section 9), which relies on the list decoding framework of [AJQ+20]. Proving that this framework can be applied to Ta-Shma’s construction is one of our technical contributions.

Theorem 6.1 (List decoding direct sum lifting).

Let η0(0,1/4)\eta_{0}\in(0,1/4) be a constant, η(0,η0)\eta\in(0,\eta_{0}), and

kk0(η)Θ(log(1/η)).k\geq k_{0}(\eta)\coloneqq\Theta(\log(1/\eta)).

Suppose 𝒞𝔽2n\mathcal{C}\subseteq{\mathbb{F}}_{2}^{n} is an η0\eta_{0}-balanced linear code and 𝒞=dsumW(k)(𝒞)\mathcal{C}^{\prime}=\operatorname{dsum}_{W(k)}(\mathcal{C}) is the direct sum lifting of 𝒞\mathcal{C} on a τ\tau-splittable collection of walks W(k)W(k). There exists an absolute constant K>0K>0 such that if

ττ0(η,k)η8Kk24k,\tau\leq\tau_{0}(\eta,k)\coloneqq\frac{\eta^{8}}{K\cdot k\cdot 2^{4k}},

then the code 𝒞\mathcal{C}^{\prime} is η\eta-balanced and can be efficiently list decoded in the following sense:

If y~\tilde{y} is (1/2η)(1/2-\sqrt{\eta})-close to 𝒞\mathcal{C}^{\prime}, then we can compute the list

(y~,𝒞,𝒞){(z,dsumW(k)(z))z𝒞,Δ(dsumW(k)(z),y~)12η}\mathcal{L}(\tilde{y},\mathcal{C},\mathcal{C}^{\prime})\coloneqq\left\{(z,\operatorname{dsum}_{W(k)}(z))\mid z\in\mathcal{C},\Delta\left\lparen\operatorname{dsum}_{W(k)}(z),\tilde{y}\right\rparen\leq\frac{1}{2}-\sqrt{\eta}\right\}

in time

nO(1/τ0(η,k)4)f(n),n^{O(1/\tau_{0}(\eta,k)^{4})}\cdot f(n),

where f(n)f(n) is the running time of a unique decoding algorithm for 𝒞\mathcal{C}. Otherwise, we return (y~,𝒞,𝒞)=\mathcal{L}(\tilde{y},\mathcal{C},\mathcal{C}^{\prime})=\emptyset within the same running time as in the preceding case.

Note that the requirement on kk in the above theorem is necessary for the lifted code 𝒞\mathcal{C}^{\prime} to be η\eta-balanced. Splittability will imply that the walk collection W(k)W(k) is expanding, which gives us parity sampling for large kk. Specifically, kk must be large enough for W(k)W(k) to be a (1/2+η0/2,η)(1/2+\eta_{0}/2,\eta)-parity sampler.

Applying the list decoding tool above, we can perform unique decoding in the regime of η0\eta_{0}, η\eta, and kk being constant. With these choices the expansion required for splittability and the parity sampling strength are only required to be constants.

Lemma 6.2 (Decoding Step).

Let η0(0,1/4)\eta_{0}\in(0,1/4) and η<min{η0,1/16}\eta<\min\{\eta_{0},1/16\}. If W(k)W(k) is a walk collection on vertex set [n][n] with kk0(η)k\geq k_{0}(\eta) and splittability ττ0(η,k)\tau\leq\tau_{0}(\eta,k), where k0k_{0} and τ0\tau_{0} are as in Theorem 6.1, we have the following unique decoding property:

If 𝒞𝔽2n\mathcal{C}\subseteq\mathbb{F}_{2}^{n} is an η0\eta_{0}-balanced linear code that can be uniquely decoded in time f(n)f(n), then 𝒞=dsumW(k)(𝒞)\mathcal{C}^{\prime}=\operatorname{dsum}_{W(k)}(\mathcal{C}) is an η\eta-balanced code that can be uniquely decoded in time nO(1/τ0(η,k)4)f(n)n^{O(1/\tau_{0}(\eta,k)^{4})}\cdot f(n).

Proof.

Using Theorem 6.1, we can list decode 𝒞\mathcal{C}^{\prime} up to a radius of 1/2η1/2-\sqrt{\eta} for any η\eta if we have the appropriate parameters kk and τ\tau. Let y~𝔽2W(k)\tilde{y}\in{\mathbb{F}}_{2}^{W(k)} be a received word within the unique decoding radius of 𝒞\mathcal{C}^{\prime}. To perform unique decoding, we simply run the list decoding algorithm on y~\tilde{y} and return the codeword on the resulting list which is closest to y~\tilde{y}; this will yield the correct result as long as the list decoding radius is larger than the unique decoding radius. It suffices to have 1/2η>1/4Δ(𝒞)/21/2-\sqrt{\eta}>1/4\geq\Delta(\mathcal{C}^{\prime})/2. We choose parameters as follows:

  1. 1.

    Take η<1/16\eta<1/16 to ensure 1/2η>1/41/2-\sqrt{\eta}>1/4.

  2. 2.

    Let k0=Θ(log(1/η))k_{0}=\Theta(\log(1/\eta)) be the smallest integer satisfying the assumption in Theorem 6.1 with the chosen η\eta. Take kk0k\geq k_{0}.

  3. 3.

    Take ττ0(η,k)=η8/(Kk24k)\tau\leq\tau_{0}(\eta,k)=\eta^{8}/(K\cdot k\cdot 2^{4k}).

Note that kk and τ\tau satisfy the conditions of Theorem 6.1, so we can use this theorem to list decode a received word y~\tilde{y} in time nO(1/τ0(η,k)4)f(n)n^{O(1/\tau_{0}(\eta,k)^{4})}\cdot f(n). To uniquely decode, we return the codeword yy on the list closest to y~\tilde{y} (or report failure if the list is empty).
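Algorithmically, the decoding step is just a thin wrapper around the list decoder; in the sketch below, list_decode and dist are hypothetical stand-ins for the algorithm of Theorem 6.1 and the normalized Hamming distance:

```python
def unique_decode_step(y_tilde, list_decode, dist):
    """Unique decoding via list decoding, as in the proof of Lemma 6.2.

    Assumes list_decode(y_tilde) returns all pairs (z, y) with y within
    distance 1/2 - sqrt(eta) of y_tilde; since eta < 1/16, this radius
    exceeds the unique decoding radius 1/4 >= Delta(C')/2.
    """
    candidates = list_decode(y_tilde)
    if not candidates:
        return None                       # decoding failure
    # Return the pair whose lifted codeword is closest to the received word.
    return min(candidates, key=lambda zy: dist(zy[1], y_tilde))
```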

Iteratively using the decoding step given by Lemma 6.2 above, we obtain unique decodability of all codes in a cascade (under suitable assumptions).

Lemma 6.3 (Code Cascade Decoding).

Let η0(0,1/4)\eta_{0}\in(0,1/4) and η<min{η0,1/16}\eta<\min\{\eta_{0},1/16\}. Suppose 𝒞0𝔽2n0,𝒞1𝔽2n1,,𝒞𝔽2n\mathcal{C}_{0}\subseteq\mathbb{F}_{2}^{n_{0}},\mathcal{C}_{1}\subseteq\mathbb{F}_{2}^{n_{1}},\dots,\mathcal{C}_{\ell}\subseteq\mathbb{F}_{2}^{n_{\ell}} is a code cascade where 𝒞0\mathcal{C}_{0} is an η0\eta_{0}-balanced linear code that can be uniquely decoded in time g(n0)g(n_{0}).

If for every i[]i\in[\ell] we have that 𝒞i\mathcal{C}_{i} is obtained from 𝒞i1\mathcal{C}_{i-1} by a τi\tau_{i}-splittable walk collection Wi(ki)W_{i}(k_{i}) on vertex set [ni1][n_{i-1}] with kik0(η)k_{i}\geq k_{0}(\eta) and τiτ0(η,ki)\tau_{i}\leq\tau_{0}(\eta,k_{i}), where k0k_{0} and τ0\tau_{0} are as in Theorem 6.1, then 𝒞\mathcal{C}_{\ell} is uniquely decodable in time

g(n0)i=1ni1O(1/τ0(η,ki)4).g(n_{0})\cdot\prod_{i=1}^{\ell}n_{i-1}^{O(1/\tau_{0}(\eta,k_{i})^{4})}.
Proof.

Induct on i[]i\in[\ell] applying Lemma 6.2 as the induction step. The code 𝒞i\mathcal{C}_{i} produced during each step will have bias at most η<η0\eta<\eta_{0}, so the hypothesis of Lemma 6.2 will be met at each level of the cascade.      

We are almost ready to prove our first main theorem establishing decodability close to the Gilbert–Varshamov bound. We will need parameters for an instantiation of Ta-Shma’s code that achieves the desired distance and rate (which will be developed in Section 8.1) and a lemma relating splittability to the spectral properties of the graphs used in the construction (to be proven in Section 7.2).

Lemma 6.4 (Ta-Shma Codes I).

For any β>0\beta>0, there are infinitely many values of ε(0,1/2)\varepsilon\in(0,1/2) (with 0 as an accumulation point) such that for infinitely many values of NN\in{\mathbb{N}}, there are explicit binary Ta-Shma codes 𝒞N,ε,β𝔽2N\mathcal{C}_{N,\varepsilon,\beta}\subseteq{\mathbb{F}}_{2}^{N} with

  1. (i)

    distance at least 1/2ε/21/2-\varepsilon/2 (actually ε\varepsilon-balanced), and

  2. (ii)

    rate Ω(ε2+β)\Omega(\varepsilon^{2+\beta}).

Furthermore, 𝒞N,ε,β\mathcal{C}_{N,\varepsilon,\beta} is the direct sum lifting of a base code 𝒞0𝔽2n0\mathcal{C}_{0}\subseteq{\mathbb{F}}_{2}^{n_{0}} using the collection of walks W[0,t1]W[0,t-1] on the ss-wide replacement product of two graphs GG and HH, with the following parameters:

  • -

    ss0max{128,26/β}s\geq s_{0}\coloneqq\max\{128,26/\beta\}.

  • -

    The inner graph HH is a regular graph with σ2(H)λ2\sigma_{2}(H)\leq\lambda_{2}, where λ2=(16s3logs)/s2s2\lambda_{2}=(16s^{3}\log s)/s^{2s^{2}}.

  • -

    The outer graph GG is a regular graph with σ2(G)λ1\sigma_{2}(G)\leq\lambda_{1}, where λ1=λ24/6\lambda_{1}=\lambda_{2}^{4}/6.

  • -

The base code 𝒞0\mathcal{C}_{0} is uniquely decodable in time n0O(1)n_{0}^{O(1)} and has bias ε0λ24/3\varepsilon_{0}\leq\lambda_{2}^{4}/3.

  • -

    The number of vertices tt in the walks satisfies λ22(15/s)(11/s)(t1)ε\lambda_{2}^{2(1-5/s)(1-1/s)(t-1)}\leq\varepsilon.

Lemma 6.5.

Let W(k)W(k) be either the collection W[0,s1]W[0,s-1] of walks on ss vertices of the ss-wide replacement product with outer graph GG and inner graph HH, or the collection of walks over the vertex set W[0,r]W[0,r], where r1(mods)r\equiv-1\pmod{s}. Then W(k)W(k) is τ\tau-splittable with τ=σ2(G)+2σ2(H)+σ2(H)2\tau=\sigma_{2}(G)+2\sigma_{2}(H)+\sigma_{2}(H)^{2}.

The statement of this first decoding theorem is more technical than Theorem 1.1, but it is easier to prove; the latter will build on this theorem with a more careful tuning of parameters.

Theorem 6.6 (Main I).

For every β>0\beta>0, there are infinitely many values ε(0,1/2)\varepsilon\in(0,1/2) (with 0 an accumulation point) such that for infinitely many values of NN\in\mathbb{N} there are explicit binary linear Ta-Shma codes 𝒞N,ε,β𝔽2N\mathcal{C}_{N,\varepsilon,\beta}\subseteq\mathbb{F}_{2}^{N} with

  1. (i)

    distance at least 1/2ε/21/2-\varepsilon/2 (actually ε\varepsilon-balanced),

  2. (ii)

    rate Ω(ε2+β)\Omega(\varepsilon^{2+\beta}), and

  3. (iii)

    a unique decoding algorithm with running time NOβ(log(log(1/ε)))N^{O_{\beta}(\log(\log(1/\varepsilon)))}.

Proof.

We proceed as follows:

  1. 1.

    Let η0=1/10\eta_{0}=1/10 and η=1/100\eta=1/100 (these choices are arbitrary; we only need η0<1/4\eta_{0}<1/4, η<1/16\eta<1/16, and η<η0\eta<\eta_{0}). Let k0=k0(η)k_{0}=k_{0}(\eta) be the constant from Theorem 6.1 with this value of η\eta.

  2. 2.

    Given β>0\beta>0, Lemma 6.4 provides a value s0s_{0} such that the direct sum lifting on the ss-wide replacement product with ss0s\geq s_{0} can achieve a rate of Ω(ε2+β)\Omega(\varepsilon^{2+\beta}) for infinitely many ε(0,1/2)\varepsilon\in(0,1/2). Choose ss to be an integer larger than both k0k_{0} and s0s_{0} that also satisfies

    s2(s16)s2η84K,s^{2}\cdot\left(\frac{s}{16}\right)^{-s^{2}}\leq\frac{\eta^{8}}{4K}, (3)

    where KK is the constant from Theorem 6.1.

  3. 3.

    Use Lemma 6.4 with this value of ss to obtain graphs GG and HH and a base code 𝒞0\mathcal{C}_{0} having the specified parameters λ1\lambda_{1}, λ2\lambda_{2}, ε0\varepsilon_{0}, and tt, with the additional requirement that t=st=s^{\ell} for some integer \ell. These parameter choices ensure that the resulting code 𝒞N,ε,β\mathcal{C}_{N,\varepsilon,\beta} has the desired distance and rate. Since s128s\geq 128, we have λ2=(16s3logs)/s2s2ss2\lambda_{2}=(16s^{3}\log s)/s^{2s^{2}}\leq s^{-s^{2}}. From the choice of tt satisfying λ22(15/s)(11/s)(t1)ε\lambda_{2}^{2(1-5/s)(1-1/s)(t-1)}\leq\varepsilon, we deduce that =O(log(log(1/ε)))\ell=O(\log(\log(1/\varepsilon))). Note also that the bias ε0\varepsilon_{0} of the code 𝒞0\mathcal{C}_{0} is smaller than η0\eta_{0}.

  4. 4.

    Create a code cascade with \ell levels using the ss-wide replacement product of the graphs GG and HH as in Section 5.2, starting with 𝒞0\mathcal{C}_{0} and ending with the final code 𝒞=𝒞N,ε,β\mathcal{C}_{\ell}=\mathcal{C}_{N,\varepsilon,\beta}. As the total number of vertices in a walk is t=st=s^{\ell}, each level of the code cascade will use walks with ss vertices. Let 𝒞0,𝒞1,,𝒞\mathcal{C}_{0},\mathcal{C}_{1},\dots,\mathcal{C}_{\ell} be the sequence of codes in this cascade.

  5. 5.

    In order to satisfy the splittability requirement of Lemma 6.3, the walk collection Wi(s)W_{i}(s) at each level of the code cascade must be τ\tau-splittable, where ττ0(η,s2)\tau\leq\tau_{0}(\eta,s^{2}). (We use k=s2k=s^{2} instead of k=sk=s in the requirement for a technical reason that will be clear in Section 8.2.) The bounds on the singular values of GG and HH and Lemma 6.5 ensure that

    τ=σ2(G)+2σ2(H)+σ2(H)24λ24ss2,\tau=\sigma_{2}(G)+2\sigma_{2}(H)+\sigma_{2}(H)^{2}\leq 4\lambda_{2}\leq 4s^{-s^{2}},

which is smaller than τ0(η,s2)=η8/(Ks224s2)\tau_{0}(\eta,s^{2})=\eta^{8}/(K\cdot s^{2}\cdot 2^{4s^{2}}) by Eq. 3.

  6. 6.

    As all hypotheses of Lemma 6.3 are satisfied by the code cascade, we apply it to conclude that 𝒞N,ε,β\mathcal{C}_{N,\varepsilon,\beta} is uniquely decodable in time

    g(n0)i=1ni1O(1/τ0(η,s)4)NO(1)i=1NOβ(1)=NOβ(log(log(1/ε))),g(n_{0})\cdot\prod_{i=1}^{\ell}n_{i-1}^{O(1/\tau_{0}(\eta,s)^{4})}\leq N^{O(1)}\cdot\prod_{i=1}^{\ell}N^{O_{\beta}(1)}=N^{O_{\beta}(\log(\log(1/\varepsilon)))},

    where we use that 𝒞0\mathcal{C}_{0} is uniquely decodable in time n0O(1)n_{0}^{O(1)}, 1/τ0(η,s)=2O(1/β)1/\tau_{0}(\eta,s)=2^{O(1/\beta)}, ni1<n=Nn_{i-1}<n_{\ell}=N for every i[]i\in[\ell], and =O(log(log(1/ε)))\ell=O(\log(\log(1/\varepsilon))).

 

In the code cascade constructed in Theorem 6.6, the final number of vertices in a walk is t=st=s^{\ell}, where ss is a sufficiently large constant that does not depend on ε\varepsilon. The limited choices for tt place some restrictions on the values of the final bias ε\varepsilon that can be achieved. To achieve any bias ε\varepsilon for 𝒞\mathcal{C}_{\ell} we need to choose the parameters more carefully, which will be done in Section 8.2 to yield our next main result.
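Before turning to that refinement, we note that the parameter selection in the proof of Theorem 6.6 is mechanical. The following sketch mirrors steps 2 and 3 of that proof with stand-in constants (K and k0 play the roles of the unnamed constants from Theorem 6.1), tracking base-10 logarithms since λ2\lambda_{2} underflows floating-point arithmetic:

```python
import math

def choose_parameters(beta, eta=1/100, K=1.0, k0=64):
    s = max(128, math.ceil(26 / beta), k0)
    # Eq. 3: s^2 * (s/16)^(-s^2) <= eta^8 / (4K), checked in log10.
    while (2 * math.log10(s) - s**2 * math.log10(s / 16)
           > 8 * math.log10(eta) - math.log10(4 * K)):
        s += 1
    # lambda2 = (16 s^3 log s) / s^(2 s^2) and lambda1 = lambda2^4 / 6.
    log10_lam2 = math.log10(16 * s**3 * math.log(s)) - 2 * s**2 * math.log10(s)
    log10_lam1 = 4 * log10_lam2 - math.log10(6)
    # Splittability tau = sigma2(G) + 2*sigma2(H) + sigma2(H)^2 <= 4*lambda2
    # (Lemma 6.5), recorded here via its upper bound in log10.
    log10_tau = math.log10(4) + log10_lam2
    return s, log10_lam2, log10_lam1, log10_tau

print(choose_parameters(beta=0.2))   # s = 130, log10(lambda2) ~ -7e4, ...
```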

Theorem 6.7 (Main II).

For every β>0\beta>0 and every ε>0\varepsilon>0 with β\beta and ε\varepsilon sufficiently small, there are explicit binary linear Ta-Shma codes 𝒞N,ε,β𝔽2N\mathcal{C}_{N,\varepsilon,\beta}\subseteq\mathbb{F}_{2}^{N} for infinitely many values NN\in\mathbb{N} with

  1. (i)

    distance at least 1/2ε/21/2-\varepsilon/2 (actually ε\varepsilon-balanced),

  2. (ii)

    rate Ω(ε2+β)\Omega(\varepsilon^{2+\beta}), and

  3. (iii)

    a unique decoding algorithm with running time NOβ(log(log(1/ε)))N^{O_{\beta}(\log(\log(1/\varepsilon)))}.

Ta-Shma obtained codes of rate Ω(ε2+β)\Omega(\varepsilon^{2+\beta}) with vanishing β\beta as ε\varepsilon goes to zero. We obtain a unique decoding algorithm for this regime (with slightly slower decreasing β\beta as ε\varepsilon vanishes). More precisely, using the parameters described in Section 8.3 and the running time analysis in Section 6.2, we obtain the following theorem which is our main result for unique decoding.

Theorem 6.8 (Main Unique Decoding (restatement of Theorem 1.1)).

For every ε>0\varepsilon>0 sufficiently small, there are explicit binary linear Ta-Shma codes 𝒞N,ε,β𝔽2N\mathcal{C}_{N,\varepsilon,\beta}\subseteq\mathbb{F}_{2}^{N} for infinitely many values NN\in\mathbb{N} with

  1. (i)

    distance at least 1/2ε/21/2-\varepsilon/2 (actually ε\varepsilon-balanced),

  2. (ii)

    rate Ω(ε2+β)\Omega(\varepsilon^{2+\beta}) where β=O(1/(log2(1/ε))1/6)\beta=O(1/(\log_{2}(1/\varepsilon))^{1/6}), and

  3. (iii)

    a unique decoding algorithm with running time NOε,β(1)N^{O_{\varepsilon,\beta}(1)}.

Furthermore, if instead we take β>0\beta>0 to be an arbitrary constant, the running time becomes (log(1/ε))O(1)NOβ(1)(\log(1/\varepsilon))^{O(1)}\cdot N^{O_{\beta}(1)} (fixed polynomial time).

Theorem 1.2 about gentle list decoding is proved in Section 8.4 after instantiating Ta-Shma codes in some parameter regimes in the preceding parts of Section 8.

6.2 Fixed Polynomial Time

In Theorem 6.7, a running time of NOβ(log(log(1/ε)))N^{O_{\beta}(\log(\log(1/\varepsilon)))} was obtained to decode Ta-Shma codes 𝒞N,ε,β\mathcal{C}_{N,\varepsilon,\beta} of distance 1/2ε/21/2-\varepsilon/2 and rate Ω(ε2+β)\Omega(\varepsilon^{2+\beta}) for constant β>0\beta>0 and block length NN. The running time contains an exponent which depends on the bias ε\varepsilon and is therefore not fixed polynomial time. We show how to remove this dependence in this regime of β>0\beta>0 being an arbitrary constant. More precisely, we show the following.

Theorem 6.9 (Fixed PolyTime Unique Decoding).

Let β>0\beta>0 be an arbitrary constant. For every ε>0\varepsilon>0 sufficiently small, there are explicit binary linear Ta-Shma codes 𝒞N,ε,β𝔽2N\mathcal{C}_{N,\varepsilon,\beta}\subseteq\mathbb{F}_{2}^{N} for infinitely many values NN\in\mathbb{N} with

  1. (i)

    distance at least 1/2ε/21/2-\varepsilon/2 (actually ε\varepsilon-balanced),

  2. (ii)

    rate Ω(ε2+β)\Omega(\varepsilon^{2+\beta}), and

  3. (iii)

    a unique decoding algorithm with fixed polynomial running time (log(1/ε))O(1)NOβ(1)(\log(1/\varepsilon))^{O(1)}\cdot N^{O_{\beta}(1)}.

The list decoding framework finds a list of pairs (z,y=dsum(z))(z,y=\operatorname{dsum}(z)) of size at most N(1/τ0(η,k))O(1)N^{(1/\tau_{0}(\eta,k))^{O(1)}} at each level of the code cascade and recursively issues decoding calls to all zz in this list. Since the number of lifts in the cascade is Ω(log(log(1/ε)))\Omega(\log(\log(1/\varepsilon))), we end up with an overall running time of NOβ(log(log(1/ε)))N^{O_{\beta}(\log(\log(1/\varepsilon)))}.

We will describe a method of pruning these lists which will lead to fixed polynomial running time. Let 1/2η1/2-\sqrt{\eta}, where η>0\eta>0 is a constant, be the list decoding radius used for a unique decoding step in the cascade. To achieve fixed polynomial time we will prune this polynomially large list of words to a constant size at each inductive step in Lemma 6.3. As we are working with parameters within the Johnson bound, the actual list of codewords has constant size (1/η)O(1)(1/\eta)^{O(1)}. At every step, we will be able to find a small sublist whose size only depends on η\eta that has a certain covering property, and then issue decoding calls to this much smaller list.

Definition 6.10 (ζ\zeta-cover).

Let W(k)[n]kW(k)\subseteq[n]^{k}, 𝒞𝔽2n\mathcal{C}\subseteq\mathbb{F}_{2}^{n} be a code, A𝒞A\subseteq\mathcal{C}, and ={(z,dsumW(k)(z))zA}\mathcal{L}=\{(z,\operatorname{dsum}_{W(k)}(z))\mid z\in A\}. We say that ={(z(1),dsumW(k)(z(1))),,(z(m),dsumW(k)(z(m)))}\mathcal{L}^{\prime}=\{(z^{(1)},\operatorname{dsum}_{W(k)}(z^{(1)})),\dots,(z^{(m)},\operatorname{dsum}_{W(k)}(z^{(m)}))\} is a ζ\zeta-cover of \mathcal{L} if for every (z,y)(z,y)\in\mathcal{L}, there exists some (z,y)(z^{\prime},y^{\prime})\in\mathcal{L}^{\prime} with bias(zz)>12ζ\operatorname{bias}(z-z^{\prime})>1-2\zeta (that is, either Δ(z,z)<ζ\Delta(z,z^{\prime})<\zeta or Δ(z,z)>1ζ\Delta(z,z^{\prime})>1-\zeta).

Lemma 6.11 (Cover Compactness).

Let W(k)[n]kW(k)\subseteq[n]^{k}, 𝒞𝔽2n\mathcal{C}\subseteq\mathbb{F}_{2}^{n} be a linear η0\eta_{0}-balanced code, 𝒞=dsumW(k)(𝒞)\mathcal{C}^{\prime}=\operatorname{dsum}_{W(k)}(\mathcal{C}) be an η\eta-balanced code, and y~𝔽2W(k)\tilde{y}\in\mathbb{F}_{2}^{W(k)}. Define

(y~,𝒞,𝒞){(z,dsumW(k)(z))z𝒞,Δ(dsumW(k)(z),y~)12η}.\mathcal{L}(\tilde{y},\mathcal{C},\mathcal{C}^{\prime})\coloneqq\left\{(z,\operatorname{dsum}_{W(k)}(z))\mid z\in\mathcal{C},\Delta\left\lparen\operatorname{dsum}_{W(k)}(z),\tilde{y}\right\rparen\leq\frac{1}{2}-\sqrt{\eta}\right\}.

Suppose \mathcal{L}^{\prime} is a ζ\zeta-cover of (y~,𝒞,𝒞)\mathcal{L}(\tilde{y},\mathcal{C},\mathcal{C}^{\prime}) for some ζ<1/2\zeta<1/2. Further, suppose that for every (z,y)(z^{\prime},y^{\prime})\in\mathcal{L}^{\prime}, we have Δ(y,y~)1/2η\Delta\left\lparen y^{\prime},\tilde{y}\right\rparen\leq 1/2-\sqrt{\eta}. If W(k)W(k) is a (12ζ,η)(1-2\zeta,\eta)-parity sampler, then there exists ′′\mathcal{L}^{\prime\prime}\subseteq\mathcal{L}^{\prime} with |′′|1/η\left\lvert\mathcal{L}^{\prime\prime}\right\rvert\leq 1/\eta which is a (2ζ)(2\zeta)-cover of \mathcal{L}.

Proof.

Build a graph where the vertices are pairs (z,y)(z^{\prime},y^{\prime})\in\mathcal{L}^{\prime} and two vertices (z(i),y(i))(z^{(i)},y^{(i)}), (z(j),y(j))(z^{(j)},y^{(j)}) are connected iff bias(z(i)z(j))>12ζ\operatorname{bias}(z^{(i)}-z^{(j)})>1-2\zeta. Let ′′\mathcal{L}^{\prime\prime} be any maximal independent set of this graph. Any two vertices (z(i),y(i)),(z(j),y(j))′′(z^{(i)},y^{(i)}),(z^{(j)},y^{(j)})\in\mathcal{L}^{\prime\prime} have bias(z(i)z(j))12ζ\operatorname{bias}(z^{(i)}-z^{(j)})\leq 1-2\zeta and thus bias(y(i)y(j))η\operatorname{bias}(y^{(i)}-y^{(j)})\leq\eta since W(k)W(k) is a (12ζ,η)(1-2\zeta,\eta)-parity sampler. This means that {y′′(z′′,y′′)′′}\{y^{\prime\prime}\mid(z^{\prime\prime},y^{\prime\prime})\in\mathcal{L}^{\prime\prime}\} is a code of distance at least 1/2η/21/2-\eta/2. By the condition that Δ(y′′,y~)1/2η\Delta(y^{\prime\prime},\tilde{y})\leq 1/2-\sqrt{\eta} for all (z′′,y′′)′′(z^{\prime\prime},y^{\prime\prime})\in\mathcal{L}^{\prime\prime} and the Johnson bound, we have |′′|1/η\left\lvert\mathcal{L}^{\prime\prime}\right\rvert\leq 1/\eta.

Finally, we will show that ′′\mathcal{L}^{\prime\prime} is a (2ζ)(2\zeta)-cover of \mathcal{L}. Let (z,y)(z,y)\in\mathcal{L}. As \mathcal{L}^{\prime} is a ζ\zeta-cover of \mathcal{L}, there exists a pair (z,y)(z^{\prime},y^{\prime})\in\mathcal{L}^{\prime} with bias(zz)>12ζ\operatorname{bias}(z-z^{\prime})>1-2\zeta, so zz is within distance ζ\zeta of either zz^{\prime} or its complement z¯\overline{z^{\prime}}. The construction of ′′\mathcal{L}^{\prime\prime} as a maximal independent set ensures that there is some (z′′,y′′)′′(z^{\prime\prime},y^{\prime\prime})\in\mathcal{L}^{\prime\prime} with bias(zz′′)>12ζ\operatorname{bias}(z^{\prime}-z^{\prime\prime})>1-2\zeta, so z′′z^{\prime\prime} is also within distance ζ\zeta of either zz^{\prime} or its complement z¯\overline{z^{\prime}}. Applying the triangle inequality in all of the possible cases, we see that either Δ(z,z′′)<2ζ\Delta(z,z^{\prime\prime})<2\zeta or Δ(z,z′′)>12ζ\Delta(z,z^{\prime\prime})>1-2\zeta, which implies ′′\mathcal{L}^{\prime\prime} is a (2ζ)(2\zeta)-cover of \mathcal{L}.
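The construction in this proof is effective: a greedy scan computes a maximal independent set. A minimal sketch (naive quadratic loop; the input is the list of pairs from \mathcal{L}^{\prime}, with codewords as binary sequences):

```python
def bias_of_difference(z1, z2):
    """bias(z1 - z2) = |1 - 2*Delta(z1, z2)| for binary words of equal length."""
    frac_disagree = sum(a != b for a, b in zip(z1, z2)) / len(z1)
    return abs(1 - 2 * frac_disagree)

def prune_cover(cover, zeta):
    """Greedy maximal independent set, following the proof of Lemma 6.11.

    Vertices are pairs (z, y); two are adjacent when bias(z - z') > 1 - 2*zeta.
    The scan keeps a pair iff it is non-adjacent to everything kept so far,
    so the output is independent and maximal (every discarded pair is
    adjacent to a kept one); by Lemma 6.11 it is a (2*zeta)-cover.
    """
    kept = []
    for (z, y) in cover:
        if all(bias_of_difference(z, z2) <= 1 - 2 * zeta for (z2, _) in kept):
            kept.append((z, y))
    return kept
```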

To decode in fixed polynomial time, we use a modification of the list decoding result Theorem 6.1 that outputs a ζ\zeta-cover \mathcal{L}^{\prime} of the list of codewords \mathcal{L} instead of the list itself. Theorem 6.1 recovers the list \mathcal{L} by finding \mathcal{L}^{\prime} and unique decoding every element of it. To get \mathcal{L}^{\prime}, we use the same algorithm, but stop before the final decoding step. This removes the unique decoding time f(n)f(n) of the base code from the running time of the list decoding algorithm. We will apply Lemma 6.11 after each time we call this ζ\zeta-cover algorithm to pare the list down to a constant size before unique decoding; note that this loses a factor of 2 in the strength of the cover. To compensate for this, we will use a collection W(k)W(k) with stronger parity sampling than required for Theorem 6.1. In that theorem, W(k)W(k) was a (1/2+η0/2,η)(1/2+\eta_{0}/2,\eta)-parity sampler to ensure that we obtained words within the list decoding radius (1/4η0/4)(1/4-\eta_{0}/4) of the base code. By using a stronger parity sampler, the words in the pruned list ′′\mathcal{L}^{\prime\prime} will still be within the unique decoding radius even after accounting for the loss in the bias from cover compactness, which means decoding will still be possible at every level of the cascade. Fortunately, improving the parity sampling only requires increasing the walk length kk by a constant multiplicative factor. The cover retrieval algorithm below will be proven in Section 9.

Theorem 6.12 (Cover retrieval for direct sum lifting).

Let η0(0,1/4)\eta_{0}\in(0,1/4) be a constant, η(0,η0)\eta\in(0,\eta_{0}), ζ=1/8η0/8\zeta=1/8-\eta_{0}/8, and

kk0(η)Θ(log(1/η)).k\geq k_{0}^{\prime}(\eta)\coloneqq\Theta(\log(1/\eta)).

Suppose 𝒞𝔽2n\mathcal{C}\subseteq{\mathbb{F}}_{2}^{n} is an η0\eta_{0}-balanced linear code and 𝒞=dsumW(k)(𝒞)\mathcal{C}^{\prime}=\operatorname{dsum}_{W(k)}(\mathcal{C}) is the direct sum lifting of 𝒞\mathcal{C} on a τ\tau-splittable collection of walks W(k)W(k). There exists an absolute constant K>0K>0 such that if

ττ0(η,k)η8Kk24k,\tau\leq\tau_{0}(\eta,k)\coloneqq\frac{\eta^{8}}{K\cdot k\cdot 2^{4k}},

then the code 𝒞\mathcal{C}^{\prime} is η\eta-balanced, W(k)W(k) is a (12ζ,η)(1-2\zeta,\eta)-parity sampler, and we have the following:

If y~\tilde{y} is (1/2η)(1/2-\sqrt{\eta})-close to 𝒞\mathcal{C}^{\prime}, then we can compute a ζ\zeta-cover \mathcal{L}^{\prime} of the list

(y~,𝒞,𝒞){(z,dsumW(k)(z))z𝒞,Δ(dsumW(k)(z),y~)12η}\mathcal{L}(\tilde{y},\mathcal{C},\mathcal{C}^{\prime})\coloneqq\left\{(z,\operatorname{dsum}_{W(k)}(z))\mid z\in\mathcal{C},\Delta\left\lparen\operatorname{dsum}_{W(k)}(z),\tilde{y}\right\rparen\leq\frac{1}{2}-\sqrt{\eta}\right\}

in which Δ(y,y~)1/2η\Delta(y^{\prime},\tilde{y})\leq 1/2-\sqrt{\eta} for every (z,y)(z^{\prime},y^{\prime})\in\mathcal{L}^{\prime}, in time

nO(1/τ0(η,k)4).n^{O(1/\tau_{0}(\eta,k)^{4})}.

Otherwise, we return =\mathcal{L}^{\prime}=\emptyset within the same running time as in the preceding case.

We now have all of the pieces necessary to prove Theorem 6.9. The process is essentially the same as our earlier unique decoding algorithm, except we use the cover retrieval algorithm from Theorem 6.12 instead of the full list decoding from Theorem 6.1. This allows us to insert a list pruning step in between obtaining the ζ\zeta-cover and calling the unique decoding algorithm for the previous level of the cascade.

Proof of Theorem 6.9.

We use the code 𝒞N,ε,β\mathcal{C}_{N,\varepsilon,\beta} from Theorem 6.7 to get the desired distance and rate, with the slight modification of ensuring ss is larger than k0k_{0}^{\prime} from Theorem 6.12 rather than k0k_{0} from Theorem 6.1.

Each level of the code cascade between 𝒞i1\mathcal{C}_{i-1} and 𝒞i\mathcal{C}_{i} uses constant η0<1/4\eta_{0}<1/4 and η<min{η0,1/16}\eta<\min\{\eta_{0},1/16\}, which allows for decoding in a similar fashion to Lemma 6.2 and Lemma 6.3. The difference is that we use Theorem 6.12 as the decoding step to obtain a ζ\zeta-cover \mathcal{L}^{\prime} of the list (y~,𝒞i1,𝒞i)\mathcal{L}(\tilde{y},\mathcal{C}_{i-1},\mathcal{C}_{i}) for y~𝔽2ni\tilde{y}\in{\mathbb{F}}_{2}^{n_{i}}, where ζ=1/8η0/8\zeta=1/8-\eta_{0}/8. By Lemma 6.11 and the fact that the walk collection is a (12ζ,η)(1-2\zeta,\eta)-parity sampler, \mathcal{L} has a (2ζ)(2\zeta)-cover ′′\mathcal{L}^{\prime\prime}\subseteq\mathcal{L}^{\prime} of size at most 1/η1/\eta. The covering property says that for every (z,y)(z,y)\in\mathcal{L}, there exists some (z′′,y′′)′′(z^{\prime\prime},y^{\prime\prime})\in\mathcal{L}^{\prime\prime} such that zz is within distance 2ζ=1/4η0/42\zeta=1/4-\eta_{0}/4 of either z′′z^{\prime\prime} or its complement z′′¯\overline{z^{\prime\prime}}. This is the unique decoding radius of the η0\eta_{0}-balanced code 𝒞i1\mathcal{C}_{i-1}, so we can recursively decode the list

′′{(z′′¯,dsum(z′′¯))(z′′,dsum(z′′))′′}\mathcal{L}^{\prime\prime}\cup\{(\overline{z^{\prime\prime}},\operatorname{dsum}(\overline{z^{\prime\prime}}))\mid(z^{\prime\prime},\operatorname{dsum}(z^{\prime\prime}))\in\mathcal{L}^{\prime\prime}\}

to obtain the complete list of codewords in 𝒞i1\mathcal{C}_{i-1}.

Now we analyze the running time. On each level of the code cascade, we run the cover retrieval algorithm once to get \mathcal{L}^{\prime}, prune the cover to get ′′\mathcal{L}^{\prime\prime}, and then feed the union of ′′\mathcal{L}^{\prime\prime} and its complement (which has size at most 2/η2/\eta) into the unique decoding algorithm for the next level of the cascade. Letting Ti(ni)T_{i}(n_{i}) be the running time of unique decoding a single word in the code 𝒞i𝔽2ni\mathcal{C}_{i}\subseteq{\mathbb{F}}_{2}^{n_{i}}, we have the following recurrence:

Ti(ni)niO(1/τ0(η,k)4)+2ηTi1(ni1) and T0(n0)=n0O(1).T_{i}(n_{i})\leq n_{i}^{O(1/\tau_{0}(\eta,k)^{4})}+\frac{2}{\eta}\cdot T_{i-1}(n_{i-1})\quad\text{ and }\quad T_{0}(n_{0})=n_{0}^{O(1)}.

Note that the base code 𝒞0\mathcal{C}_{0} has constant bias ε0\varepsilon_{0} and thus it has a fixed polynomial time decoding algorithm (e.g. Theorem 6.7). The height of the recursive call tree is the number of levels in the code cascade, which is =O(log(log(1/ε)))\ell=O(\log(\log(1/\varepsilon))), as in the proof of Theorem 6.6. Each node of this tree has a constant branching factor of 2/η2/\eta. Thus, the tree has (log(1/ε))O(1)(\log(1/\varepsilon))^{O(1)} nodes, each of which costs at most niO(1/τ0(η,k)4)NO(1/τ0(η,k)4)n_{i}^{O(1/\tau_{0}(\eta,k)^{4})}\leq N^{O(1/\tau_{0}(\eta,k)^{4})} time. Furthermore, in this regime of β>0\beta>0 being a constant, kk is constant as well as η\eta, so we have NO(1/τ0(η,k)4)=NOβ(1)N^{O(1/\tau_{0}(\eta,k)^{4})}=N^{O_{\beta}(1)} and the total running time is (log(1/ε))O(1)NOβ(1)(\log(1/\varepsilon))^{O(1)}\cdot N^{O_{\beta}(1)}.      
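For intuition, the recurrence is easy to evaluate directly; in the toy sketch below, c is a stand-in for the unspecified exponent O(1/τ0(η,k)4)O(1/\tau_{0}(\eta,k)^{4}) and n is a placeholder list of block lengths:

```python
def decoding_time(i, n, eta, c):
    """T_i(n_i) <= n_i^c + (2/eta) * T_{i-1}(n_{i-1}), with T_0(n_0) = n_0^c.

    n = [n_0, n_1, ..., n_ell]. The call tree has branching factor 2/eta
    and height i, hence (2/eta)^i nodes, each costing at most N^c; for
    constant eta and c this is fixed polynomial time in N = n_ell.
    """
    if i == 0:
        return n[0] ** c
    return n[i] ** c + (2 / eta) * decoding_time(i - 1, n, eta, c)
```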

7 Satisfying the List Decoding Framework Requirements

The list decoding framework of [AJQ+20] is capable of decoding codes obtained from direct sum liftings, provided they satisfy a few requisite properties. The framework was originally shown to work for expander walks; we need to adapt it to our case of a code cascade based on walks on the ss-wide replacement product. We will start with a broad overview of the list decoding algorithm and point out where various requirements arise.

The problem of finding a list of codewords in a direct sum lifting close to a received word can be viewed as finding approximate solutions to a kk-XOR instance. This is done by solving a particular SOS program and rounding the resulting solution. The algorithm is unable to perform rounding if the kk-XOR instance is based on an arbitrary collection of walks W(k)W(k); it can only handle liftings in which W(k)W(k) satisfies a property called tensoriality. If W(k)W(k) is tensorial, the SOS local variables in the solution can be approximated by product distributions, which will allow us to obtain a list of solutions by independent rounding. Tensoriality for expander walks is a consequence of a simpler property known as splittability, which is a certain measure of the expansion of a walk collection.

Unfortunately, the list returned by the rounding process will not contain codewords directly—instead, we only get a guarantee that all of the codewords we are looking for have a weak agreement (just over 1/2) with something on this list. We will find the desired codewords by relying on the parity sampling of W(k)W(k). If W(k)W(k) is a sufficiently good parity sampler, weak agreement in the lifted space corresponds to a much stronger agreement in the ground space. This will allow us to recover the codewords using the unique decoding algorithm of the base code.

To recap, applying the list decoding framework in our setting requires doing the following:

  1. 1.

    Proving parity sampling for the walks used in the code cascade (Section 7.1).

  2. 2.

    Showing that the walk collection of the ss-wide replacement product is splittable (Section 7.2).

  3. 3.

    Making Ta-Shma’s construction compatible with the Sum-of-Squares machinery (Section 7.3) and then obtaining tensoriality from splittability (Section 7.4).

An additional complication is introduced by using a code cascade instead of a single decoding step: the above requirements need to be satisfied at every level of the cascade. The details of the proofs will often differ between the first level of the cascade, which is constructed using walks on the ss-wide replacement product, and higher levels, which are walks on a directed graph whose vertices are walks themselves. Once we have established all of the necessary properties, we will instantiate the list decoding framework in Section 9.

We will first define some convenient notation which will be used throughout this section.

Notation 7.1.

Let GG be a d1d_{1}-regular outer graph and HH be a d2d_{2}-regular inner graph used in Ta-Shma’s ss-wide replacement product.

Let 0k1k20\leq k_{1}\leq k_{2} be integers. We define W[k1,k2]W[k_{1},k_{2}] to be the set of all walks starting at time k1k_{1} and ending at time k2k_{2} in Ta-Shma’s construction. More precisely, since GG and HH are regular graphs, the collection W[k1,k2]W[k_{1},k_{2}] contains all walks obtained by sampling a uniform vertex (v,h)V(G)×V(H)(v,h)\in V(G)\times V(H) and applying the operator

(𝖨𝖠H)𝖦k21(𝖨𝖠H)(𝖨𝖠H)𝖦k1(𝖨𝖠H),(\mathsf{I}\otimes\mathsf{A}_{H})\mathsf{G}_{k_{2}-1}(\mathsf{I}\otimes\mathsf{A}_{H})\cdots(\mathsf{I}\otimes\mathsf{A}_{H})\mathsf{G}_{k_{1}}(\mathsf{I}\otimes\mathsf{A}_{H}),

where the index ii of each GiG_{i} is taken modulo ss. Observe that when k1=k2k_{1}=k_{2}, we have W[k1,k2]=V(G)×V(H)W[k_{1},k_{2}]=V(G)\times V(H).

We define a family of Markov operators which will play a similar role to the graphs G^i\widehat{G}_{i} from the cascade described in Section 5.1, but for Ta-Shma’s construction rather than expander walks.

Definition 7.2 (Split Operator).

Let 0k1k2<k30\leq k_{1}\leq k_{2}<k_{3}. We define the graph walk split operator

𝖲k1,k2,k3:W[k2+1,k3]W[k1,k2]\mathsf{S}_{k_{1},k_{2},k_{3}}\colon\mathbb{R}^{W[k_{2}+1,k_{3}]}\rightarrow\mathbb{R}^{W[k_{1},k_{2}]}

such that for every fW[k2+1,k3]f\in\mathbb{R}^{W[k_{2}+1,k_{3}]},

(𝖲k1,k2,k3(f))(w)𝔼w:wwW[k1,k3][f(w)],\displaystyle\left(\mathsf{S}_{k_{1},k_{2},k_{3}}(f)\right)(w)\coloneqq{\mathbb{E}}_{w^{\prime}:ww^{\prime}\in W[k_{1},k_{3}]}[f(w^{\prime})],

where wwww^{\prime} denotes the concatenation of the walks ww and ww^{\prime}. The operator 𝖲k1,k2,k3\mathsf{S}_{k_{1},k_{2},k_{3}} can be defined more concretely in matrix form such that for every wW[k1,k2]w\in W[k_{1},k_{2}] and wW[k2+1,k3]w^{\prime}\in W[k_{2}+1,k_{3}],

(𝖲k1,k2,k3)w,w=𝟙wwW[k1,k3]|{w~:ww~W[k1,k3]}|=𝟙wwW[k1,k3]d22(k3k2).\displaystyle\left(\mathsf{S}_{k_{1},k_{2},k_{3}}\right)_{w,w^{\prime}}=\frac{\mathbb{1}_{ww^{\prime}\in W[k_{1},k_{3}]}}{|\{\tilde{w}:w\tilde{w}\in W[k_{1},k_{3}]\}|}=\frac{\mathbb{1}_{ww^{\prime}\in W[k_{1},k_{3}]}}{d_{2}^{2(k_{3}-k_{2})}}.
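As a sanity check on this definition, the sketch below materializes the split operator as an explicit row-stochastic matrix; first, second, and joined are hypothetical stand-ins for W[k1,k2]W[k_{1},k_{2}], W[k2+1,k3]W[k_{2}+1,k_{3}], and W[k1,k3]W[k_{1},k_{3}], each given as walks encoded by tuples of vertices:

```python
import numpy as np

def split_operator(first, second, joined):
    """S as a matrix: S[w, w'] = 1/#extensions if w w' is in `joined`, else 0.

    In the regular case the number of valid extensions of every w equals
    d2^(2(k3-k2)), matching the matrix form in Definition 7.2; here we
    simply count extensions, so each nonzero row sums to 1.
    """
    S = np.zeros((len(first), len(second)))
    for i, w in enumerate(first):
        ext = [j for j, w2 in enumerate(second) if w + w2 in joined]
        for j in ext:
            S[i, j] = 1.0 / len(ext)       # (S f)(w) = E_{w'} [f(w')]
    return S
```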

7.1 Parity Sampling for the Code Cascade

To be able to apply the list decoding machinery to the code cascade 𝒞0𝔽2n0,𝒞1𝔽2n1,,𝒞𝔽2n\mathcal{C}_{0}\subseteq\mathbb{F}_{2}^{n_{0}},\mathcal{C}_{1}\subseteq\mathbb{F}_{2}^{n_{1}},\dots,\mathcal{C}_{\ell}\subseteq\mathbb{F}_{2}^{n_{\ell}}, we need the direct sum lifting at every level to be a parity sampler. The first level in the cascade uses walks directly on the ss-wide replacement product, which we can show is a good parity sampler using the spectral properties proven in Section 4.3.1. However, it will be more convenient for calculating parameters later on to prove a weaker result, which will suffice for our purposes since we only need to obtain constant bias for every level of the cascade. We analyze the parity sampling of these walks with the same strategy Ta-Shma employed to show parity sampling for walks on expander graphs (which resulted in Theorem 5.2).

Claim 7.3.

Let W[0,s1]W[0,s-1] be the collection of walks on the ss-wide replacement product of the graphs GG and HH and z𝔽2V(G)z\in\mathbb{F}_{2}^{V(G)} be a word with bias(z)η0\operatorname{bias}(z)\leq\eta_{0}. Let 𝖯z\mathsf{P}_{z} be the diagonal matrix with entries (𝖯z)(v,h),(v,h)=(1)zv(\mathsf{P}_{z})_{(v,h),(v,h)}=(-1)^{z_{v}} for (v,h)V(G)×V(H)(v,h)\in V(G)\times V(H). If σ2((𝖨𝖠H)𝖦i(𝖨𝖠H))γ\sigma_{2}((\mathsf{I}\otimes\mathsf{A}_{H})\mathsf{G}_{i}(\mathsf{I}\otimes\mathsf{A}_{H}))\leq\gamma for all 0is20\leq i\leq s-2, then

i=0s2(𝖨𝖠H)𝖦i(𝖨𝖠H)𝖯z2(η0+2γ)(s1)/2.\left\lVert\prod_{i=0}^{s-2}(\mathsf{I}\otimes\mathsf{A}_{H})\mathsf{G}_{i}(\mathsf{I}\otimes\mathsf{A}_{H})\mathsf{P}_{z}\right\rVert_{2}\leq(\eta_{0}+2\gamma)^{\lfloor(s-1)/2\rfloor}.
Proof.

Let 0j<s20\leq j<s-2 be even. Take a vector vV(G)×V(H)v\in{\mathbb{R}}^{V(G)\times V(H)} with v2=1\left\lVert v\right\rVert_{2}=1 and let vv^{\parallel} and vv^{\perp} be its parallel and orthogonal components to the all ones vector. For 0is20\leq i\leq s-2, let 𝖠i=(𝖨𝖠H)𝖦i(𝖨𝖠H)\mathsf{A}_{i}=(\mathsf{I}\otimes\mathsf{A}_{H})\mathsf{G}_{i}(\mathsf{I}\otimes\mathsf{A}_{H}). Consider two terms 𝖠j+1𝖯z𝖠j𝖯z\mathsf{A}_{j+1}\mathsf{P}_{z}\mathsf{A}_{j}\mathsf{P}_{z} of the product appearing in the claim. Since 𝖯z\mathsf{P}_{z} is unitary, 𝖠j+1𝖯z𝖠j𝖯z2=𝖠j+1𝖯z𝖠j2\left\lVert\mathsf{A}_{j+1}\mathsf{P}_{z}\mathsf{A}_{j}\mathsf{P}_{z}\right\rVert_{2}=\left\lVert\mathsf{A}_{j+1}\mathsf{P}_{z}\mathsf{A}_{j}\right\rVert_{2}. We have

𝖠j+1𝖯z𝖠jv2\displaystyle\left\lVert\mathsf{A}_{j+1}\mathsf{P}_{z}\mathsf{A}_{j}v\right\rVert_{2} 𝖠j+1𝖯z𝖠jv2+𝖠j+1𝖯z𝖠jv2\displaystyle\leq\left\lVert\mathsf{A}_{j+1}\mathsf{P}_{z}\mathsf{A}_{j}v^{\parallel}\right\rVert_{2}+\left\lVert\mathsf{A}_{j+1}\mathsf{P}_{z}\mathsf{A}_{j}v^{\perp}\right\rVert_{2}
𝖠j+1𝖯z𝖠jv2+𝖠jv2\displaystyle\leq\left\lVert\mathsf{A}_{j+1}\mathsf{P}_{z}\mathsf{A}_{j}v^{\parallel}\right\rVert_{2}+\left\lVert\mathsf{A}_{j}v^{\perp}\right\rVert_{2}
𝖠j+1𝖯zv2+σ2(𝖠j)\displaystyle\leq\left\lVert\mathsf{A}_{j+1}\mathsf{P}_{z}v^{\parallel}\right\rVert_{2}+\sigma_{2}(\mathsf{A}_{j})
𝖠j+1(𝖯zv)2+𝖠j+1(𝖯zv)2+σ2(𝖠j)\displaystyle\leq\left\lVert\mathsf{A}_{j+1}(\mathsf{P}_{z}v^{\parallel})^{\parallel}\right\rVert_{2}+\left\lVert\mathsf{A}_{j+1}(\mathsf{P}_{z}v^{\parallel})^{\perp}\right\rVert_{2}+\sigma_{2}(\mathsf{A}_{j})
(𝖯zv)2+σ2(𝖠j+1)+σ2(𝖠j)\displaystyle\leq\left\lVert(\mathsf{P}_{z}v^{\parallel})^{\parallel}\right\rVert_{2}+\sigma_{2}(\mathsf{A}_{j+1})+\sigma_{2}(\mathsf{A}_{j})
η0+2γ.\displaystyle\leq\eta_{0}+2\gamma.

Applying this inequality to every two terms of the product, the result follows.      

Corollary 7.4.

Let W[0,s1]W[0,s-1] be the collection of walks on the ss-wide replacement product of the graphs GG and HH and η0>0\eta_{0}>0. If σ2((𝖨𝖠H)𝖦i(𝖨𝖠H))γ\sigma_{2}((\mathsf{I}\otimes\mathsf{A}_{H})\mathsf{G}_{i}(\mathsf{I}\otimes\mathsf{A}_{H}))\leq\gamma for all 0is20\leq i\leq s-2, then W[0,s1]W[0,s-1] is an (η0,η)(\eta_{0},\eta)-parity sampler, where η=(η0+2γ)(s1)/2\eta=(\eta_{0}+2\gamma)^{\lfloor(s-1)/2\rfloor}.

Proof.

Let z𝔽2nz\in{\mathbb{F}}_{2}^{n} have bias at most η0\eta_{0}. The bias of dsumW[0,s1](z)\operatorname{dsum}_{W[0,s-1]}(z) is given by 777This is slightly different from the expression for the bias given in Section 4.3, but both are equal since moving on the HH component of the graph doesn’t affect the bit assigned to a vertex.

bias(dsumW[0,s1](z))=|𝟏,𝖯z(i=0s2(𝖨𝖠H)𝖦i(𝖨𝖠H)𝖯z)𝟏|,\operatorname{bias}(\operatorname{dsum}_{W[0,s-1]}(z))=\left\lvert\left\langle\mathbf{1},\mathsf{P}_{z}\left(\prod_{i=0}^{s-2}(\mathsf{I}\otimes\mathsf{A}_{H})\mathsf{G}_{i}(\mathsf{I}\otimes\mathsf{A}_{H})\mathsf{P}_{z}\right)\mathbf{1}\right\rangle\right\rvert,

where 𝖯z\mathsf{P}_{z} is the diagonal matrix with entries (𝖯z)(v,h),(v,h)=(1)zv(\mathsf{P}_{z})_{(v,h),(v,h)}=(-1)^{z_{v}} for (v,h)V(G)×V(H)(v,h)\in V(G)\times V(H) and 𝟏\mathbf{1} is the all-ones vector. Since 𝖯z\mathsf{P}_{z} is unitary, we have

bias(dsumW[0,s1](z))i=0s2(𝖨𝖠H)𝖦i(𝖨𝖠H)𝖯z2(η0+2γ)(s1)/2=η\operatorname{bias}(\operatorname{dsum}_{W[0,s-1]}(z))\leq\left\lVert\prod_{i=0}^{s-2}(\mathsf{I}\otimes\mathsf{A}_{H})\mathsf{G}_{i}(\mathsf{I}\otimes\mathsf{A}_{H})\mathsf{P}_{z}\right\rVert_{2}\leq(\eta_{0}+2\gamma)^{\lfloor(s-1)/2\rfloor}=\eta

by Claim 7.3. Hence W[0,s1]W[0,s-1] is an (η0,η)(\eta_{0},\eta)-parity sampler.      

For higher levels of the cascade, we need to prove parity sampling for collections of walks over walks. Since the walks on the first level contain ss vertices, when we take walks on higher levels, the operator linking different walks together will always use Gs1G_{s-1} as the walk operator for the GG step. Thus we can consider a more specific form of the split operator where we split at a time parameter that is one less than a multiple of ss.

Definition 7.5.

Let r1(mods)r\equiv-1\pmod{s} be a positive integer. We define the operator 𝖲r,r\mathsf{S}_{r,r}^{\bigtriangleup} as

𝖲r,r=𝖲k1,k2,k3,\mathsf{S}_{r,r}^{\bigtriangleup}=\mathsf{S}_{k_{1},k_{2},k_{3}},

where k1=0k_{1}=0, k2=rk_{2}=r, and k3=2r+1k_{3}=2r+1. In this case, W[k1,k2]=W[k2+1,k3]W[k_{1},k_{2}]=W[k_{2}+1,k_{3}].

All levels of the code cascade beyond the first use walks generated by the directed operator 𝖲r,r\mathsf{S}_{r,r}^{\bigtriangleup}. Proving parity sampling for these walks is analogous to the proof of Corollary 7.4, but slightly simpler since the walk operator doesn’t change with each step.

Claim 7.6.

Let r1(mods)r\equiv-1\pmod{s} be a positive integer and z𝔽2W[0,r]z\in\mathbb{F}_{2}^{W[0,r]} be a word with bias(z)η0\operatorname{bias}(z)\leq\eta_{0}. Let 𝖯~z\widetilde{\mathsf{P}}_{z} be the diagonal matrix with entries (𝖯~z)w,w=(1)zw(\widetilde{\mathsf{P}}_{z})_{w,w}=(-1)^{z_{w}} for wW[0,r]w\in W[0,r]. For every integer k1k\geq 1, we have

(𝖲r,r𝖯~z)k12(η0+2σ2(𝖲r,r))(k1)/2.\left\lVert\left(\mathsf{S}_{r,r}^{\bigtriangleup}\widetilde{\mathsf{P}}_{z}\right)^{k-1}\right\rVert_{2}\leq\left(\eta_{0}+2\cdot\sigma_{2}\left(\mathsf{S}_{r,r}^{\bigtriangleup}\right)\right)^{\lfloor(k-1)/2\rfloor}.
Proof.

Take a vector vW[0,r]v\in{\mathbb{R}}^{W[0,r]} with v2=1\left\lVert v\right\rVert_{2}=1 and let vv^{\parallel} and vv^{\perp} be its parallel and orthogonal components to the all ones vector. Since 𝖯~z\widetilde{\mathsf{P}}_{z} is unitary, 𝖲r,r𝖯~z𝖲r,r𝖯~z2=𝖲r,r𝖯~z𝖲r,r2\left\lVert\mathsf{S}_{r,r}^{\bigtriangleup}\widetilde{\mathsf{P}}_{z}\mathsf{S}_{r,r}^{\bigtriangleup}\widetilde{\mathsf{P}}_{z}\right\rVert_{2}=\left\lVert\mathsf{S}_{r,r}^{\bigtriangleup}\widetilde{\mathsf{P}}_{z}\mathsf{S}_{r,r}^{\bigtriangleup}\right\rVert_{2}. We have

𝖲r,r𝖯~z𝖲r,rv2\displaystyle\left\lVert\mathsf{S}_{r,r}^{\bigtriangleup}\widetilde{\mathsf{P}}_{z}\mathsf{S}_{r,r}^{\bigtriangleup}v\right\rVert_{2} 𝖲r,r𝖯~z𝖲r,rv2+𝖲r,r𝖯~z𝖲r,rv2\displaystyle\leq\left\lVert\mathsf{S}_{r,r}^{\bigtriangleup}\widetilde{\mathsf{P}}_{z}\mathsf{S}_{r,r}^{\bigtriangleup}v^{\parallel}\right\rVert_{2}+\left\lVert\mathsf{S}_{r,r}^{\bigtriangleup}\widetilde{\mathsf{P}}_{z}\mathsf{S}_{r,r}^{\bigtriangleup}v^{\perp}\right\rVert_{2}
𝖲r,r𝖯~z𝖲r,rv2+𝖲r,rv2\displaystyle\leq\left\lVert\mathsf{S}_{r,r}^{\bigtriangleup}\widetilde{\mathsf{P}}_{z}\mathsf{S}_{r,r}^{\bigtriangleup}v^{\parallel}\right\rVert_{2}+\left\lVert\mathsf{S}_{r,r}^{\bigtriangleup}v^{\perp}\right\rVert_{2}
𝖲r,r𝖯~zv2+σ2(𝖲r,r)\displaystyle\leq\left\lVert\mathsf{S}_{r,r}^{\bigtriangleup}\widetilde{\mathsf{P}}_{z}v^{\parallel}\right\rVert_{2}+\sigma_{2}(\mathsf{S}_{r,r}^{\bigtriangleup})
𝖲r,r(𝖯~zv)2+𝖲r,r(𝖯~zv)2+σ2(𝖲r,r)\displaystyle\leq\left\lVert\mathsf{S}_{r,r}^{\bigtriangleup}(\widetilde{\mathsf{P}}_{z}v^{\parallel})^{\parallel}\right\rVert_{2}+\left\lVert\mathsf{S}_{r,r}^{\bigtriangleup}(\widetilde{\mathsf{P}}_{z}v^{\parallel})^{\perp}\right\rVert_{2}+\sigma_{2}(\mathsf{S}_{r,r}^{\bigtriangleup})
(𝖯~zv)2+σ2(𝖲r,r)+σ2(𝖲r,r)\displaystyle\leq\left\lVert(\widetilde{\mathsf{P}}_{z}v^{\parallel})^{\parallel}\right\rVert_{2}+\sigma_{2}(\mathsf{S}_{r,r}^{\bigtriangleup})+\sigma_{2}(\mathsf{S}_{r,r}^{\bigtriangleup})
η0+2σ2(𝖲r,r).\displaystyle\leq\eta_{0}+2\cdot\sigma_{2}(\mathsf{S}_{r,r}^{\bigtriangleup}).

As (𝖲r,r𝖯~z)k12(𝖲r,r𝖯~z)22(k1)/2\left\lVert(\mathsf{S}_{r,r}^{\bigtriangleup}\widetilde{\mathsf{P}}_{z})^{k-1}\right\rVert_{2}\leq\left\lVert(\mathsf{S}_{r,r}^{\bigtriangleup}\widetilde{\mathsf{P}}_{z})^{2}\right\rVert_{2}^{\lfloor(k-1)/2\rfloor} by submultiplicativity of the operator norm (together with 𝖲r,r𝖯~z21\left\lVert\mathsf{S}_{r,r}^{\bigtriangleup}\widetilde{\mathsf{P}}_{z}\right\rVert_{2}\leq 1 for the single remaining factor when k1k-1 is odd), the result follows.      

Corollary 7.7.

Let r1(mods)r\equiv-1\pmod{s} be a positive integer and η0>0\eta_{0}>0. The collection of walks W(k)W(k) with kk vertices over the vertex set W[0,r]W[0,r] using random walk operator 𝖲r,r\mathsf{S}_{r,r}^{\bigtriangleup} is an (η0,η)(\eta_{0},\eta)-parity sampler, where η=(η0+2σ2(𝖲r,r))(k1)/2\eta=(\eta_{0}+2\cdot\sigma_{2}(\mathsf{S}_{r,r}^{\bigtriangleup}))^{\lfloor(k-1)/2\rfloor}.

Proof.

Let z𝔽2W[0,r]z\in{\mathbb{F}}_{2}^{W[0,r]} have bias at most η0\eta_{0}. The bias of the direct sum lifting of zz is given by

bias(dsumW(k)(z))=|𝟏,𝖯~z(𝖲r,r𝖯~z)k1𝟏|,\operatorname{bias}(\operatorname{dsum}_{W(k)}(z))=\left\lvert\left\langle\mathbf{1},\widetilde{\mathsf{P}}_{z}(\mathsf{S}_{r,r}^{\bigtriangleup}\widetilde{\mathsf{P}}_{z})^{k-1}\mathbf{1}\right\rangle\right\rvert,

where 𝖯~z\widetilde{\mathsf{P}}_{z} is the diagonal matrix with entries (𝖯~z)w,w=(1)zw(\widetilde{\mathsf{P}}_{z})_{w,w}=(-1)^{z_{w}} for wW[0,r]w\in W[0,r] and 𝟏\mathbf{1} is the all-ones vector. Since 𝖯~z\widetilde{\mathsf{P}}_{z} is unitary, we have

|𝟏,𝖯~z(𝖲r,r𝖯~z)k1𝟏|(𝖲r,r𝖯~z)k12(η0+2σ2(𝖲r,r))(k1)/2=η\left\lvert\left\langle\mathbf{1},\widetilde{\mathsf{P}}_{z}(\mathsf{S}_{r,r}^{\bigtriangleup}\widetilde{\mathsf{P}}_{z})^{k-1}\mathbf{1}\right\rangle\right\rvert\leq\left\lVert\left(\mathsf{S}_{r,r}^{\bigtriangleup}\widetilde{\mathsf{P}}_{z}\right)^{k-1}\right\rVert_{2}\leq\left(\eta_{0}+2\cdot\sigma_{2}\left(\mathsf{S}_{r,r}^{\bigtriangleup}\right)\right)^{\lfloor(k-1)/2\rfloor}=\eta

by Claim 7.6. Hence W(k)W(k) is an (η0,η)(\eta_{0},\eta)-parity sampler.      
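The bias identity used in this proof is easy to experiment with. The sketch below is a toy example with a single hypothetical walk operator standing in for 𝖲r,r\mathsf{S}_{r,r}^{\bigtriangleup}; it computes |𝟏,𝖯~z(𝖲𝖯~z)k1𝟏|\left\lvert\left\langle\mathbf{1},\widetilde{\mathsf{P}}_{z}(\mathsf{S}\widetilde{\mathsf{P}}_{z})^{k-1}\mathbf{1}\right\rangle\right\rvert with respect to the expectation inner product.

```python
import numpy as np

def walk_bias(S, z, k):
    """|<1, Pz (S Pz)^(k-1) 1>| in the expectation inner product:
    the bias of the direct sum lifting of z over k-vertex walks."""
    n = len(z)
    Pz = np.diag((-1.0) ** np.asarray(z))
    M = np.linalg.matrix_power(S @ Pz, k - 1)
    return abs(np.ones(n) @ Pz @ M @ np.ones(n)) / n

# Toy operator: complete graph with self-loops (sigma_2 = 0).  The
# lifted bias works out to exactly eta_0^k, well within the bound
# (eta_0 + 2 sigma_2)^floor((k-1)/2) of the corollary; eta_0 = 1/4.
n, k = 8, 3
S = np.full((n, n), 1.0 / n)
z = np.array([0, 0, 0, 0, 0, 1, 1, 1])
print(walk_bias(S, z, k))   # 0.015625 = (1/4)^3
```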

7.2 Splittability of Ta-Shma’s Construction

We investigate the splittability of the collection of walks generated by Ta-Shma’s construction. In order to formally define this property, we will need the concept of an interval splitting tree, which describes how a walk is split into smaller and smaller pieces.

Definition 7.8 (Interval Splitting Tree).

We say that a binary rooted tree 𝒯\mathcal{T} is a kk-interval splitting tree if it has exactly kk leaves and

  • -

    the root of 𝒯\mathcal{T} is labeled with (0,m,k1)(0,m,k-1) for some m{0,1,,k2}m\in\{0,1,\dots,k-2\}, and

  • -

    each non-leaf non-root vertex vv of 𝒯\mathcal{T} is labeled with (k1,k2,k3)(k_{1},k_{2},k_{3}) for some integer k2[k1,k31]k_{2}\in[k_{1},k_{3}-1]. Suppose (k1,k2,k3)(k_{1}^{\prime},k_{2}^{\prime},k_{3}^{\prime}) is the label assigned to the parent of vv. If vv is a left child, we must have k1=k1k_{1}=k_{1}^{\prime} and k3=k2k_{3}=k_{2}^{\prime}; otherwise, we must have k1=k2+1k_{1}=k_{2}^{\prime}+1 and k3=k3k_{3}=k_{3}^{\prime}.

Given an interval splitting tree 𝒯\mathcal{T}, we can naturally associate a split operator 𝖲k1,k2,k3\mathsf{S}_{k_{1},k_{2},k_{3}} to each internal node (k1,k2,k3)(k_{1},k_{2},k_{3}). The splittability of a collection W[0,k1]W[0,k-1] of kk-tuples is a notion of expansion at every node in the splitting tree.

Definition 7.9 ((𝒯,τ)(\mathcal{T},\tau)-splittability).

The collection W[0,k1]W[0,k-1] is said to be (𝒯,τ)(\mathcal{T},\tau)-splittable if 𝒯\mathcal{T} is a kk-interval splitting tree and

σ2(𝖲k1,k2,k3)τ\sigma_{2}(\mathsf{S}_{k_{1},k_{2},k_{3}})\leq\tau

for every internal node (k1,k2,k3)(k_{1},k_{2},k_{3}) of 𝒯\mathcal{T}.

If there exists some kk-interval splitting tree 𝒯\mathcal{T} such that W[0,k1]W[0,k-1] is (𝒯,τ)(\mathcal{T},\tau)-splittable, then W[0,k1]W[0,k-1] will be called τ\tau-splittable.
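For concreteness, one valid kk-interval splitting tree (the balanced one; Definition 7.8 allows any choice of split points) can be generated recursively, as in the following sketch.

```python
def splitting_tree(k1, k3):
    """Labels (k1, k2, k3) of the internal nodes of a balanced interval
    splitting tree over [k1, k3]; any k2 in [k1, k3 - 1] would also
    satisfy Definition 7.8."""
    if k1 == k3:                            # a single index is a leaf
        return []
    k2 = (k1 + k3) // 2                     # balanced split point
    return ([(k1, k2, k3)]
            + splitting_tree(k1, k2)        # left child:  [k1, k2]
            + splitting_tree(k2 + 1, k3))   # right child: [k2+1, k3]

# A 4-interval splitting tree over [0, 3]:
print(splitting_tree(0, 3))   # [(0, 1, 3), (0, 0, 1), (2, 2, 3)]
```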

In order to prove that the collection of walks in Ta-Shma’s construction is splittable, a split operator 𝖲k1,k2,k3\mathsf{S}_{k_{1},k_{2},k_{3}} can be related to the walk operator (I𝖠H)Gk2(I𝖠H)(I\otimes\mathsf{A}_{H})G_{k_{2}}(I\otimes\mathsf{A}_{H}) as shown below. This structural property will allow us to deduce spectral properties of 𝖲k1,k2,k3\mathsf{S}_{k_{1},k_{2},k_{3}} from the spectrum of (I𝖠H)Gk2(I𝖠H)(I\otimes\mathsf{A}_{H})G_{k_{2}}(I\otimes\mathsf{A}_{H}).

Lemma 7.10.

Let 0k1k2<k30\leq k_{1}\leq k_{2}<k_{3}. Suppose GG is a d1d_{1}-regular outer graph on vertex set [n][n] with walk operator Gk2G_{k_{2}} used at step k2k_{2} of a walk on the ss-wide replacement product and HH is a d2d_{2}-regular inner graph on vertex set [m][m] with normalized random walk operator 𝖠H\mathsf{A}_{H}. Then there are orderings of the rows and columns of the representations of 𝖲k1,k2,k3\mathsf{S}_{k_{1},k_{2},k_{3}} and 𝖠H\mathsf{A}_{H} as matrices such that

𝖲k1,k2,k3=((I𝖠H)Gk2(I𝖠H))𝖩/d22(k3k21),\mathsf{S}_{k_{1},k_{2},k_{3}}=\left((I\otimes\mathsf{A}_{H})G_{k_{2}}(I\otimes\mathsf{A}_{H})\right)\otimes\mathsf{J}/d_{2}^{2(k_{3}-k_{2}-1)},

where 𝖩[d2]2(k2k1)×[d2]2(k3k21)\mathsf{J}\in\mathbb{R}^{[d_{2}]^{2(k_{2}-k_{1})}\times[d_{2}]^{2(k_{3}-k_{2}-1)}} is the all ones matrix.

Proof.

Partition the set of walks W[k1,k2]W[k_{1},k_{2}] into the sets W1,1,,Wn,mW_{1,1},\dots,W_{n,m}, where wWi,jw\in W_{i,j} if the last vertex of the walk wk2=(vk2,hk2)w_{k_{2}}=(v_{k_{2}},h_{k_{2}}) satisfies vk2=iv_{k_{2}}=i and hk2=jh_{k_{2}}=j. Similarly, partition W[k2+1,k3]W[k_{2}+1,k_{3}] into the sets W1,1,,Wn,mW_{1,1}^{\prime},\dots,W_{n,m}^{\prime}, where wWi,jw^{\prime}\in W_{i,j}^{\prime} if the first vertex of the walk w1=(v1,h1)w^{\prime}_{1}=(v_{1},h_{1}) satisfies v1=iv_{1}=i and h1=jh_{1}=j. Note that |Wi,j|=d22(k2k1)\left\lvert W_{i,j}\right\rvert=d_{2}^{2(k_{2}-k_{1})} and |Wi,j|=d22(k3k21)\left\lvert W_{i,j}^{\prime}\right\rvert=d_{2}^{2(k_{3}-k_{2}-1)} for all (i,j)[n]×[m](i,j)\in[n]\times[m], since there are d22d_{2}^{2} choices for each step of the walk.

Now order the rows of the matrix 𝖲k1,k2,k3\mathsf{S}_{k_{1},k_{2},k_{3}} so that all of the rows corresponding to walks in W1,1W_{1,1} appear first, followed by those for walks in W1,2W_{1,2}, and so on in lexicographic order of the indices (i,j)(i,j) of Wi,jW_{i,j}, with an arbitrary order within each set. Do a similar re-ordering of the columns for the sets W1,1,,Wn,mW_{1,1}^{\prime},\dots,W_{n,m}^{\prime}. Observe that

(𝖲k1,k2,k3)w,w\displaystyle\left(\mathsf{S}_{k_{1},k_{2},k_{3}}\right)_{w,w^{\prime}} =𝟙wwW[k1,k3]d22(k3k2)\displaystyle=\frac{\mathbb{1}_{ww^{\prime}\in W[k_{1},k_{3}]}}{d_{2}^{2(k_{3}-k_{2})}}
=d22(weight of transition from (vk2,hk2) to (v1,h1) in (I𝖠H)Gk2(I𝖠H))d22(k3k2),\displaystyle=\frac{d_{2}^{2}\cdot(\text{weight of transition from }(v_{k_{2}},h_{k_{2}})\text{ to }(v_{1}^{\prime},h^{\prime}_{1})\text{ in }(I\otimes\mathsf{A}_{H})G_{k_{2}}(I\otimes\mathsf{A}_{H}))}{d_{2}^{2(k_{3}-k_{2})}},

which only depends on the adjacency of the last vertex of ww and the first vertex of ww^{\prime}. If the vertices wk2=(vk2,hk2)w_{k_{2}}=(v_{k_{2}},h_{k_{2}}) and w1=(v1,h1)w_{1}^{\prime}=(v_{1},h_{1}) are adjacent, then

(𝖲k1,k2,k3)w,w=((I𝖠H)Gk2(I𝖠H))(vk2,hk2),(v1,h1)/d22(k3k21),\left(\mathsf{S}_{k_{1},k_{2},k_{3}}\right)_{w,w^{\prime}}=\left((I\otimes\mathsf{A}_{H})G_{k_{2}}(I\otimes\mathsf{A}_{H})\right)_{(v_{k_{2}},h_{k_{2}}),(v_{1}^{\prime},h^{\prime}_{1})}/d_{2}^{2(k_{3}-k_{2}-1)},

for every wWvk2,hk2w\in W_{v_{k_{2}},h_{k_{2}}} and wWv1,h1w^{\prime}\in W_{v_{1}^{\prime},h_{1}^{\prime}}^{\prime}; otherwise, (𝖲k1,k2,k3)w,w=0\left(\mathsf{S}_{k_{1},k_{2},k_{3}}\right)_{w,w^{\prime}}=0. Since the walks in the rows and columns are sorted according to their last and first vertices, respectively, the matrix 𝖲k1,k2,k3\mathsf{S}_{k_{1},k_{2},k_{3}} exactly matches the tensor product ((I𝖠H)Gk2(I𝖠H))𝖩/d22(k3k21)((I\otimes\mathsf{A}_{H})G_{k_{2}}(I\otimes\mathsf{A}_{H}))\otimes\mathsf{J}/d_{2}^{2(k_{3}-k_{2}-1)}.      

Corollary 7.11.

Let 0k1k2<k30\leq k_{1}\leq k_{2}<k_{3}. Suppose GG is a d1d_{1}-regular outer graph with walk operator Gk2G_{k_{2}} used at step k2k_{2} of a walk on the ss-wide replacement product and HH is a d2d_{2}-regular inner graph with normalized random walk operator 𝖠H\mathsf{A}_{H}. Then

σ2(𝖲k1,k2,k3)=σ2((I𝖠H)Gk2(I𝖠H)).\sigma_{2}(\mathsf{S}_{k_{1},k_{2},k_{3}})=\sigma_{2}((I\otimes\mathsf{A}_{H})G_{k_{2}}(I\otimes\mathsf{A}_{H})).
Proof.

Using Lemma 7.10 and the fact that

σ2(((I𝖠H)Gk2(I𝖠H))𝖩/d22(k3k21))=σ2((I𝖠H)Gk2(I𝖠H)),\sigma_{2}(((I\otimes\mathsf{A}_{H})G_{k_{2}}(I\otimes\mathsf{A}_{H}))\otimes\mathsf{J}/d_{2}^{2(k_{3}-k_{2}-1)})=\sigma_{2}((I\otimes\mathsf{A}_{H})G_{k_{2}}(I\otimes\mathsf{A}_{H})),

the result follows.      

Remark 7.12.

Corollary 7.11 is what causes the splittability argument to break down for Ta-Shma’s original construction, as σ2(𝖦k2(𝖨𝖠H))=1\sigma_{2}(\mathsf{G}_{k_{2}}(\mathsf{I}\otimes\mathsf{A}_{H}))=1.

By combining this result with the spectral bound from 4.4, we find that the collection of walks of length ss on the ss-wide replacement product is (𝒯,τ)(\mathcal{T},\tau)-splittable for any splitting tree 𝒯\mathcal{T}, where τ\tau is controlled by the second singular values of the graphs GG and HH. This analysis can also be applied to walks on higher levels of the cascade where the vertex set is W[0,r]W[0,r].

Corollary 7.13 (Restatement of Lemma 6.5).

The collection of walks W[0,s1]W[0,s-1] on the ss-wide replacement product with outer graph GG and inner graph HH and the collection of walks W(k)W(k) on the vertex set W[0,r]W[0,r] with random walk operator 𝖲r,r\mathsf{S}_{r,r}^{\bigtriangleup} and r1(mods)r\equiv-1\pmod{s} are both τ\tau-splittable with τ=σ2(G)+2σ2(H)+σ2(H)2\tau=\sigma_{2}(G)+2\sigma_{2}(H)+\sigma_{2}(H)^{2}.

Proof.

By Corollary 7.11 and 4.4, the split operator 𝖲k1,k2,k3\mathsf{S}_{k_{1},k_{2},k_{3}} for any 0k1k2<k30\leq k_{1}\leq k_{2}<k_{3} satisfies

σ2(𝖲k1,k2,k3)=σ2((I𝖠H)Gk2(I𝖠H))σ2(G)+2σ2(H)+σ2(H)2,\sigma_{2}(\mathsf{S}_{k_{1},k_{2},k_{3}})=\sigma_{2}((I\otimes\mathsf{A}_{H})G_{k_{2}}(I\otimes\mathsf{A}_{H}))\leq\sigma_{2}(G)+2\sigma_{2}(H)+\sigma_{2}(H)^{2},

so W[0,s1]W[0,s-1] is τ\tau-splittable with τ=σ2(G)+2σ2(H)+σ2(H)2\tau=\sigma_{2}(G)+2\sigma_{2}(H)+\sigma_{2}(H)^{2}, as any internal node (k1,k2,k3)(k_{1},k_{2},k_{3}) of any ss-interval splitting tree will have σ2(𝖲k1,k2,k3)τ\sigma_{2}(\mathsf{S}_{k_{1},k_{2},k_{3}})\leq\tau. The split operators of any kk-interval splitting tree for the collection W(k)W(k) are of the form 𝖲k1,k2,k3\mathsf{S}_{k_{1},k_{2},k_{3}} with k10(mods)k_{1}\equiv 0\pmod{s} and k2,k31(mods)k_{2},k_{3}\equiv-1\pmod{s}, which means W(k)W(k) is τ\tau-splittable as well.      

7.3 Integration with Sum-of-Squares

Before defining tensoriality and obtaining it in our setting, we examine how the Sum-of-Squares hierarchy is used in the list decoding algorithm in more detail.

7.3.1 SOS Preliminaries: pp-local PSD Ensembles

The SOS hierarchy gives a sequence of increasingly tight semidefinite programming relaxations for several optimization problems, including CSPs. Since we will use relatively few facts about the SOS hierarchy, already developed in the analysis of Barak, Raghavendra and Steurer [BRS11], we will adapt their notation of pp-local distributions to describe the relaxations.

Solutions to a semidefinite relaxation of a CSP on nn boolean variables using pp levels of the SOS hierarchy induce probability distributions μS\mu_{S} over 𝔽2S{\mathbb{F}}_{2}^{S} for any set S[n]S\subseteq[n] with |S|p\left\lvert S\right\rvert\leq p. These distributions are consistent on intersections: for TS[n]T\subseteq S\subseteq[n], we have μS|T=μT\mu_{S|T}=\mu_{T}, where μS|T\mu_{S|T} denotes the restriction of the distribution μS\mu_{S} to the set TT. We use these distributions to define a collection of random variables 𝐙1,,𝐙n\mathbf{Z}_{1},\ldots,\mathbf{Z}_{n} taking values in 𝔽2{\mathbb{F}}_{2} such that for any set SS with |S|p\left\lvert S\right\rvert\leq p, the collection of variables {𝐙i}iS\left\{\mathbf{Z}_{i}\right\}_{i\in S} has joint distribution μS\mu_{S}. Note that the entire collection {𝐙1,,𝐙n}\{\mathbf{Z}_{1},\ldots,\mathbf{Z}_{n}\} may not have a joint distribution: this property is only true for sub-collections of size at most pp. We will refer to the collection {𝐙1,,𝐙n}\{\mathbf{Z}_{1},\ldots,\mathbf{Z}_{n}\} as a pp-local ensemble of random variables.

For any T[n]T\subseteq[n] with |T|p2\left\lvert T\right\rvert\leq p-2 and any ξ𝔽2T\xi\in{\mathbb{F}}_{2}^{T}, we can define a (p|T|)(p-\left\lvert T\right\rvert)-local ensemble {𝐙1,,𝐙n}\{\mathbf{Z}_{1}^{\prime},\ldots,\mathbf{Z}_{n}^{\prime}\} by “conditioning” the local distributions on the event 𝐙T=ξ\mathbf{Z}_{T}=\xi, where 𝐙T\mathbf{Z}_{T} is shorthand for the collection {𝐙i}iT\left\{\mathbf{Z}_{i}\right\}_{i\in T}. For any SS with |S|p|T|\left\lvert S\right\rvert\leq p-\left\lvert T\right\rvert, we define the distribution of 𝐙S\mathbf{Z}_{S}^{\prime} as μS:=μST|{𝐙T=ξ}\mu_{S}^{\prime}:=\mu_{S\cup T}|\{\mathbf{Z}_{T}=\xi\}.

Finally, the semidefinite program also ensures that for any such conditioning, the conditional covariance matrix

𝖬(S1,α1)(S2,α2)=Cov(𝟙[𝐙S1=α1],𝟙[𝐙S2=α2])\mathsf{M}_{(S_{1},\alpha_{1})(S_{2},\alpha_{2})}~{}=~{}\operatorname{\operatorname{Cov}}\left(\mathbb{1}_{[\mathbf{Z}_{S_{1}}^{\prime}=\alpha_{1}]},\mathbb{1}_{[\mathbf{Z}_{S_{2}}^{\prime}=\alpha_{2}]}\right)

is positive semidefinite, where |S1|,|S2|(p|T|)/2\left\lvert S_{1}\right\rvert,\left\lvert S_{2}\right\rvert\leq(p-\left\lvert T\right\rvert)/2. Here, for each pair S1,S2S_{1},S_{2} the covariance is computed using the joint distribution μS1S2\mu_{S_{1}\cup S_{2}}^{\prime}. In this paper, we will only consider pp-local ensembles such that for every conditioning on a set of size at most (p2)(p-2), the conditional covariance matrix is PSD. We will refer to these as pp-local PSD ensembles. We will also need a simple corollary of the above definitions.

Fact 7.14.

Let {𝐙1,,𝐙n}\{\mathbf{Z}_{1},\ldots,\mathbf{Z}_{n}\} be a pp-local PSD ensemble and W(k)[n]kW(k)\subseteq[n]^{k}. For 1i<k1\leq i<k, define W(i)[n]iW(i)\subseteq[n]^{i} to be the collection of tuples of size ii appearing in elements of W(k)W(k). For all pp/2p^{\prime}\leq p/2, the collection {𝐙set(w)}wW(p)\left\{\mathbf{Z}_{\operatorname{set}(w)}\right\}_{w\in W(\leq p^{\prime})} is a (p/p)(p/p^{\prime})-local PSD ensemble, where W(p)=i=1pW(i)W(\leq p^{\prime})=\bigcup_{i=1}^{p^{\prime}}W(i).

For random variables 𝐙S\mathbf{Z}_{S} in a pp-local PSD ensemble, we use the notation {𝐙S}\left\{\mathbf{Z}_{S}\right\} to denote the distribution of 𝐙S\mathbf{Z}_{S} (which exists when |S|p\left\lvert S\right\rvert\leq p). As we will work with ordered tuples of variables instead of sets, we define 𝐙w\mathbf{Z}_{w} for w[n]kw\in[n]^{k} based on the set Sw=set(w)S_{w}=\operatorname{set}(w), taking care that repeated elements of ww are always assigned the same value.

Definition 7.15 (Plausible assignment).

Given w=(w1,,wk)[n]kw=(w_{1},\dots,w_{k})\in[n]^{k} and an assignment α𝔽2w\alpha\in{\mathbb{F}}_{2}^{w}, we say that α\alpha is plausible for ww if there are no distinct i,j[k]i,j\in[k] such that wi=wjw_{i}=w_{j} but αiαj\alpha_{i}\neq\alpha_{j}.

The distribution {𝐙w}=μw\{\mathbf{Z}_{w}\}=\mu_{w} is defined as μw(α)=μSw(α|Sw)\mu_{w}(\alpha)=\mu_{S_{w}}(\alpha|_{S_{w}}) if α𝔽2w\alpha\in{\mathbb{F}}_{2}^{w} is plausible for ww, and μw(α)=0\mu_{w}(\alpha)=0 otherwise.
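A concrete rendering of this definition follows; the dictionary encoding of μSw\mu_{S_{w}} (assignments to set(w)\operatorname{set}(w) given as tuples of (vertex, bit) pairs) is a hypothetical interface chosen for this sketch only.

```python
def mu_w(w, mu_set):
    """Distribution {Z_w} on assignments to the tuple w, induced from the
    local distribution mu_set on F_2^{set(w)}. Lifting a set-assignment
    along w gives the same value to repeated entries of w, so only
    plausible assignments ever receive probability mass."""
    dist = {}
    for alpha_set, p in mu_set.items():
        lookup = dict(alpha_set)                 # vertex -> bit
        alpha = tuple(lookup[v] for v in w)      # assignment to the tuple
        dist[alpha] = dist.get(alpha, 0.0) + p
    return dist  # implausible alpha implicitly get mu_w(alpha) = 0

# w = (1, 2, 1) with the uniform distribution on assignments to {1, 2}:
mu_set = {((1, 0), (2, 0)): 0.25, ((1, 0), (2, 1)): 0.25,
          ((1, 1), (2, 0)): 0.25, ((1, 1), (2, 1)): 0.25}
print(mu_w((1, 2, 1), mu_set))
# {(0, 0, 0): 0.25, (0, 1, 0): 0.25, (1, 0, 1): 0.25, (1, 1, 1): 0.25}
```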

7.3.2 Tensoriality

A key algorithm in the list decoding framework is propagation rounding (7.16), which solves a CSP to find solutions close to a codeword. Suppose W(k)[n]kW(k)\subseteq[n]^{k} is a collection of walks, or more generally, a collection of any kk-tuples. The algorithm starts with a local PSD ensemble {𝐙1,,𝐙n}\{\mathbf{Z}_{1},\ldots,\mathbf{Z}_{n}\} which is the solution to an SOS program for list decoding. Propagation rounding takes this solution and conditions some of the variables according to a random assignment to these variables to yield another local PSD ensemble 𝐙\mathbf{Z}^{\prime}.

Algorithm 7.16 (Propagation Rounding Algorithm, adapted from [AJQ+20]).
Input: An (L+2k)(L+2k)-local PSD ensemble {𝐙1,,𝐙n}\{\mathbf{Z}_{1},\ldots,\mathbf{Z}_{n}\} and collection W(k)[n]kW(k)\subseteq[n]^{k}.

Output: A random assignment (σ1,,σn)𝔽2n(\sigma_{1},\ldots,\sigma_{n})\in{\mathbb{F}}_{2}^{n} and a 2k2k-local PSD ensemble 𝐙\mathbf{Z}^{\prime}.

1. Choose m{1,,L/k}m\in\left\{1,\ldots,L/k\right\} uniformly at random.

2. For j=1,,mj=1,\dots,m, sample a walk wjw_{j} independently and uniformly from W(k)W(k).

3. Write S=j=1mset(wj)S=\bigcup_{j=1}^{m}\textup{set}(w_{j}) for the set of the seed vertices.

4. Sample an assignment σ:S𝔽2\sigma:S\rightarrow{\mathbb{F}}_{2} according to the local distribution {𝐙S}\{\mathbf{Z}_{S}\}.

5. Set 𝐙={𝐙1,,𝐙n|𝐙S=σ}\mathbf{Z}^{\prime}=\{\mathbf{Z}_{1},\ldots,\mathbf{Z}_{n}|\mathbf{Z}_{S}=\sigma\}, i.e. the local ensemble 𝐙\mathbf{Z} conditioned on agreeing with σ\sigma.

6. For all i[n]i\in[n], sample independently σi{𝐙i}\sigma_{i}\sim\{\mathbf{Z}^{\prime}_{i}\}.

7. Output (σ1,,σn)(\sigma_{1},\ldots,\sigma_{n}) and 𝐙\mathbf{Z}^{\prime}.
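A schematic Python rendering of this algorithm is given below. The operations sample and condition on the ensemble are hypothetical placeholders for the corresponding manipulations of an SOS solution, not an actual SDP interface; the sketch only mirrors the control flow of 7.16.

```python
import random

def propagation_rounding(ensemble, walks, L, k, n):
    """Schematic version of Algorithm 7.16. `ensemble` is assumed to
    support two (hypothetical) operations:
      sample(S)          -- draw sigma ~ {Z_S} as a dict vertex -> bit
      condition(S, sig)  -- return the conditioned ensemble {Z | Z_S = sig}
    """
    m = random.randint(1, L // k)                       # step 1
    seeds = [random.choice(walks) for _ in range(m)]    # step 2
    S = set().union(*(set(w) for w in seeds))           # step 3: seed set
    sigma_S = ensemble.sample(S)                        # step 4
    Z_prime = ensemble.condition(S, sigma_S)            # step 5
    sigma = [Z_prime.sample({i})[i] for i in range(n)]  # step 6
    return sigma, Z_prime                               # step 7
```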

If the collection W(k)[n]kW(k)\subseteq[n]^{k} used in the direct sum lifting is amenable to SOS rounding, the conditioned ensemble 𝐙\mathbf{Z}^{\prime} will be able to recover a word close to some codeword on the list. This is quantified by the following tensorial properties. We will see shortly how splittability will be used to obtain tensoriality in our setting.

Definition 7.17 (Tensorial Walk Collection).

Let W(k)[n]kW(k)\subseteq[n]^{k}, μ[0,1]\mu\in[0,1], and LL\in\mathbb{N}. Define Ω\Omega to be the set of all tuples (m,S,σ)(m,S,\sigma) obtainable in propagation rounding (7.16) on W(k)W(k) with SOS degree parameter LL. We say that W(k)W(k) is (μ,L)(\mu,L)-tensorial if the local PSD ensemble 𝐙\mathbf{Z}^{\prime} returned by propagation rounding satisfies

𝔼Ω𝔼wW(k){𝐙w}{𝐙w(1)}{𝐙w(k)}1μ.\operatorname*{\mathbb{E}}_{\Omega}\operatorname*{\mathbb{E}}_{w\in W(k)}{\left\lVert\{\mathbf{Z}_{w}^{\prime}\}-\left\{\mathbf{Z}_{w(1)}^{\prime}\right\}\cdots\left\{\mathbf{Z}_{w(k)}^{\prime}\right\}\right\rVert_{1}}\leq\mu. (4)

The framework actually uses a strengthening of the above property, in which variables for pairs of walks chosen independently approximately behave as a product.

Definition 7.18 (Two-Step Tensorial Walk Collection).

Let W(k)[n]kW(k)\subseteq[n]^{k}, μ[0,1]\mu\in[0,1], and LL\in\mathbb{N}. Define Ω\Omega to be the set of all tuples (m,S,σ)(m,S,\sigma) obtainable in propagation rounding (7.16) on W(k)W(k) with SOS degree parameter LL. We say that W(k)W(k) is (μ,L)(\mu,L)-two-step tensorial if it is (μ,L)(\mu,L)-tensorial and the local PSD ensemble 𝐙\mathbf{Z}^{\prime} returned by propagation rounding satisfies the additional condition

𝔼Ω𝔼w,wW(k){𝐙w𝐙w}{𝐙w}{𝐙w}1μ.\operatorname*{\mathbb{E}}_{\Omega}\operatorname*{\mathbb{E}}_{w,w^{\prime}\in W(k)}{\left\lVert\{\mathbf{Z}_{w}^{\prime}\mathbf{Z}_{w^{\prime}}^{\prime}\}-\left\{\mathbf{Z}_{w}^{\prime}\right\}\left\{\mathbf{Z}_{w^{\prime}}^{\prime}\right\}\right\rVert_{1}}\leq\mu.

7.3.3 From Directed to Undirected

In order to apply the list decoding framework using the directed split operator 𝖲k1,k2,k3\mathsf{S}_{k_{1},k_{2},k_{3}}, we will replace it with the symmetrized version

𝒰(𝖲k1,k2,k3)=(0𝖲k1,k2,k3(𝖲k1,k2,k3)0)\mathcal{U}(\mathsf{S}_{k_{1},k_{2},k_{3}})=\begin{pmatrix}0&\mathsf{S}_{k_{1},k_{2},k_{3}}\\ \left(\mathsf{S}_{k_{1},k_{2},k_{3}}\right)^{{\dagger}}&0\end{pmatrix}

and show how 𝒰(𝖲k1,k2,k3)\mathcal{U}(\mathsf{S}_{k_{1},k_{2},k_{3}}) corresponds to a particular undirected graph.
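The effect of the symmetrization on the spectrum can be checked concretely: the singular values of 𝖲\mathsf{S} become, up to sign, the eigenvalues of 𝒰(𝖲)\mathcal{U}(\mathsf{S}). Below is a minimal numpy sketch, with an arbitrary toy matrix standing in for an actual split operator.

```python
import numpy as np

def symmetrize(S):
    """U(S) = [[0, S], [S^dagger, 0]]: the undirected bipartite
    symmetrization of a (generally rectangular) directed operator."""
    a, b = S.shape
    return np.block([[np.zeros((a, a)), S],
                     [S.conj().T, np.zeros((b, b))]])

S = np.array([[0.5, 0.5, 0.0],
              [0.0, 0.5, 0.5]])            # toy stand-in, not a real split operator
print(np.linalg.svd(S, compute_uv=False))  # singular values of S
print(np.linalg.eigvalsh(symmetrize(S)))   # +/- the same values, plus 0
```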

Definition 7.19.

Let 0k1k2<k30\leq k_{1}\leq k_{2}<k_{3}. We define the operator 𝔖k2,k3,k1:W[k1,k2]W[k2+1,k3]\mathfrak{S}_{k_{2},k_{3},k_{1}}\colon\mathbb{R}^{W[k_{1},k_{2}]}\rightarrow\mathbb{R}^{W[k_{2}+1,k_{3}]} such that for every fW[k1,k2]f\in{\mathbb{R}}^{W[k_{1},k_{2}]},

(𝔖k2,k3,k1(f))(w)𝔼w:wwW[k1,k3][f(w)],\left(\mathfrak{S}_{k_{2},k_{3},k_{1}}(f)\right)(w^{\prime})\coloneqq{\mathbb{E}}_{w:ww^{\prime}\in W[k_{1},k_{3}]}[f(w)],

for every wW[k2+1,k3]w^{\prime}\in W[k_{2}+1,k_{3}].

The operator 𝒰(𝖲k1,k2,k3)\mathcal{U}(\mathsf{S}_{k_{1},k_{2},k_{3}}) defines an undirected weighted bipartite graph on the vertices W[k1,k2]W[k2+1,k3]W[k_{1},k_{2}]\cup W[k_{2}+1,k_{3}]. We can see that 𝔖k2,k3,k1\mathfrak{S}_{k_{2},k_{3},k_{1}} is the adjoint of 𝖲k1,k2,k3\mathsf{S}_{k_{1},k_{2},k_{3}}, which means that each edge wwww^{\prime} in this graph is weighted according to the transition probability from one walk to the other whenever one of ww, ww^{\prime} is in W[k1,k2]W[k_{1},k_{2}] and the other is in W[k2+1,k3]W[k_{2}+1,k_{3}].

Claim 7.20.
(𝖲k1,k2,k3)=𝔖k2,k3,k1.\left(\mathsf{S}_{k_{1},k_{2},k_{3}}\right)^{{\dagger}}=\mathfrak{S}_{k_{2},k_{3},k_{1}}.
Proof.

Let fW[k1,k2]f\in{\mathbb{R}}^{W[k_{1},k_{2}]} and gW[k2+1,k3]g\in{\mathbb{R}}^{W[k_{2}+1,k_{3}]}. For iji\leq j, define Πi,j\Pi_{i,j} to be the uniform distribution on W[i,j]W[i,j]. We show that f,𝖲k1,k2,k3g=𝔖k2,k3,k1f,g\left\langle f,\mathsf{S}_{k_{1},k_{2},k_{3}}g\right\rangle=\left\langle\mathfrak{S}_{k_{2},k_{3},k_{1}}f,g\right\rangle. On one hand we have

f,𝖲k1,k2,k3g\displaystyle\left\langle f,\mathsf{S}_{k_{1},k_{2},k_{3}}g\right\rangle =𝔼wW[k1,k2][f(w)𝔼w:wwW[k1,k3][g(w)]]\displaystyle={\mathbb{E}}_{w\in W[k_{1},k_{2}]}\left[f(w){\mathbb{E}}_{w^{\prime}:ww^{\prime}\in W[k_{1},k_{3}]}[g(w^{\prime})]\right]
=𝔼wW[k1,k2][f(w)wW[k2+1,k3]Πk1,k3(ww)Πk1,k2(w)g(w)]\displaystyle={\mathbb{E}}_{w\in W[k_{1},k_{2}]}\left[f(w)\sum_{w^{\prime}\in W[k_{2}+1,k_{3}]}\frac{\Pi_{k_{1},k_{3}}(ww^{\prime})}{\Pi_{k_{1},k_{2}}(w)}g(w^{\prime})\right]
=wW[k1,k2]Πk1,k2(w)f(w)wW[k2+1,k3]Πk1,k3(ww)Πk1,k2(w)g(w)\displaystyle=\sum_{w\in W[k_{1},k_{2}]}\Pi_{k_{1},k_{2}}(w)f(w)\sum_{w^{\prime}\in W[k_{2}+1,k_{3}]}\frac{\Pi_{k_{1},k_{3}}(ww^{\prime})}{\Pi_{k_{1},k_{2}}(w)}g(w^{\prime})
=wwW[k1,k3]f(w)g(w)Πk1,k3(ww).\displaystyle=\sum_{ww^{\prime}\in W[k_{1},k_{3}]}f(w)g(w^{\prime})\Pi_{k_{1},k_{3}}(ww^{\prime}).

On the other hand we have

𝔖k2,k3,k1f,g\displaystyle\left\langle\mathfrak{S}_{k_{2},k_{3},k_{1}}f,g\right\rangle =𝔼wW[k2+1,k3][𝔼w:wwW[k1,k3][f(w)]g(w)]\displaystyle={\mathbb{E}}_{w^{\prime}\in W[k_{2}+1,k_{3}]}\left[{\mathbb{E}}_{w:ww^{\prime}\in W[k_{1},k_{3}]}[f(w)]g(w^{\prime})\right]
=𝔼wW[k2+1,k3][wW[k1,k2]Πk1,k3(ww)Πk2+1,k3(w)f(w)g(w)]\displaystyle={\mathbb{E}}_{w^{\prime}\in W[k_{2}+1,k_{3}]}\left[\sum_{w\in W[k_{1},k_{2}]}\frac{\Pi_{k_{1},k_{3}}(ww^{\prime})}{\Pi_{k_{2}+1,k_{3}}(w^{\prime})}f(w)g(w^{\prime})\right]
=wW[k2+1,k3]Πk2+1,k3(w)wW[k1,k2]Πk1,k3(ww)Πk2+1,k3(w)f(w)g(w)\displaystyle=\sum_{w^{\prime}\in W[k_{2}+1,k_{3}]}\Pi_{k_{2}+1,k_{3}}(w^{\prime})\sum_{w\in W[k_{1},k_{2}]}\frac{\Pi_{k_{1},k_{3}}(ww^{\prime})}{\Pi_{k_{2}+1,k_{3}}(w^{\prime})}f(w)g(w^{\prime})
=wwW[k1,k3]f(w)g(w)Πk1,k3(ww).\displaystyle=\sum_{ww^{\prime}\in W[k_{1},k_{3}]}f(w)g(w^{\prime})\Pi_{k_{1},k_{3}}(ww^{\prime}).

Hence, 𝔖k2,k3,k1=(𝖲k1,k2,k3)\mathfrak{S}_{k_{2},k_{3},k_{1}}=(\mathsf{S}_{k_{1},k_{2},k_{3}})^{{\dagger}} as claimed.      

7.3.4 Variables for Walks on the ss-wide Replacement Product

When analyzing walks on the ss-wide replacement product, we actually need to use two separate, but related, local PSD ensembles. In Ta-Shma’s construction, the vertices of the outer graph GG correspond to positions in the base code 𝒞0𝔽2n\mathcal{C}_{0}\subseteq{\mathbb{F}}_{2}^{n}, where n=|V(G)|n=\left\lvert V(G)\right\rvert. Given a vertex (v,h)V(G)×V(H)(v,h)\in V(G)\times V(H) in the ss-wide replacement product and codeword z𝒞0z\in\mathcal{C}_{0}, (v,h)(v,h) is assigned bit zvz_{v}, regardless of the vertex hh of the inner graph. We will enforce this property by working with variables in V(G)V(G) rather than the full V(G)×V(H)V(G)\times V(H). The local PSD ensemble 𝐙={𝐙v}vV(G)\mathbf{Z}=\{\mathbf{Z}_{v}\}_{v\in V(G)} contains one variable for every vertex of GG, with local distributions for sets of variables up to a given size. For a walk ww on the ss-wide replacement product, we will use 𝐙w\mathbf{Z}_{w} as an abbreviation for 𝐙Sw\mathbf{Z}_{S_{w}}, where SwS_{w} is the set of all GG-components of vertices visited on the walk.

The constraints of the CSP are placed on walks on the ss-wide replacement product, which do depend on the HH-component of the vertices, so we define a second local PSD ensemble 𝐘={𝐘(v,h)}(v,h)V(G)×V(H)\mathbf{Y}=\{\mathbf{Y}_{(v,h)}\}_{(v,h)\in V(G)\times V(H)} with a variable for each vertex of the ss-wide replacement product of GG and HH. It is this collection 𝐘\mathbf{Y} for which we need to prove tensoriality in order to use the list decoding framework. When we perform propagation rounding, we condition the ensemble 𝐙\mathbf{Z} on a random assignment σ\sigma to a subset SV(G)S\subseteq V(G), rather than conditioning 𝐘\mathbf{Y} on a random assignment to a subset of V(G)×V(H)V(G)\times V(H). Working with 𝐙\mathbf{Z} ensures that the rounded assignments will be consistent on each cloud of the ss-wide replacement product. Since the bit assigned to a vertex (v,h)(v,h) only depends on vv, independent rounding of {𝐙𝐙S=σ}\{\mathbf{Z}\mid\mathbf{Z}_{S}=\sigma\} will also yield the desired rounding of {𝐘𝐙S=σ}\{\mathbf{Y}\mid\mathbf{Z}_{S}=\sigma\}.

We can define 𝐘\mathbf{Y} based on the ensemble 𝐙\mathbf{Z} more concretely. Suppose SV(G)×V(H)S^{\prime}\subseteq V(G)\times V(H) is a subset of size at most pp, where pp is the locality of the ensemble, and define T={v(v,h)S}T=\{v\mid(v,h)\in S^{\prime}\}. The distribution μS\mu_{S^{\prime}} of 𝐘S\mathbf{Y}_{S^{\prime}} is defined based on the distribution μT\mu_{T} of 𝐙T\mathbf{Z}_{T} by μS(α)=μT(α|T)\mu_{S^{\prime}}(\alpha)=\mu_{T}(\alpha|_{T}), where α𝔽2S\alpha\in{\mathbb{F}}_{2}^{S^{\prime}} is an assignment to SS^{\prime} whose value on each vertex (v,h)(v,h) only depends on vv.

Observe that the introduction of the ensemble 𝐘\mathbf{Y} is only necessary on the first level of the Ta-Shma code cascade between the codes 𝒞0\mathcal{C}_{0} and 𝒞1\mathcal{C}_{1}, which takes place on the ss-wide replacement product. Higher levels of the cascade use walks on graphs whose vertices are the walks from the level below. The association of the bits of a codeword to the vertices of this graph has no consistency requirement, so we simply use a single local ensemble 𝐙\mathbf{Z} with a variable for each vertex.

7.4 Splittability Implies Tensoriality

The connection between splittability and tensoriality will be made with the help of a version of the triangle inequality.

Claim 7.21 (Triangle inequality, adapted from [AJQ+20]).

Let s+s\in\mathbb{N}^{+} and 𝒯\mathcal{T} be an ss-interval splitting tree. Then

𝔼wW[0,s1]{𝐙w}i=0s1{𝐙w(i)}1(k1,k2,k3)𝒯𝔼wW[k1,k3]{𝐙w}{𝐙w(k1,k2)}{𝐙w(k2+1,k3)}1,\operatorname*{\mathbb{E}}_{w\in W[0,s-1]}{\left\lVert\{\mathbf{Z}_{w}\}-\prod_{i=0}^{s-1}\left\{\mathbf{Z}_{w(i)}\right\}\right\rVert_{1}}\leq\sum_{(k_{1},k_{2},k_{3})\in\mathcal{T}}~{}\operatorname*{\mathbb{E}}_{w\in W[k_{1},k_{3}]}{\left\lVert\{\mathbf{Z}_{w}\}-\left\{\mathbf{Z}_{w(k_{1},k_{2})}\right\}\left\{\mathbf{Z}_{w(k_{2}+1,k_{3})}\right\}\right\rVert_{1}},

where the sum is taken over the labels of the internal nodes of 𝒯\mathcal{T}.

To prove tensoriality, we will use the method of [BRS11] and [AJT19] to show that we can break correlations over expanding collections of tuples arising in the ss-wide replacement product of the form

𝔼wwW[k1,k3]wW[k1,k2],wW[k2+1,k3]{𝐙ww}{𝐙w}{𝐙w}1\operatorname*{\mathbb{E}}_{\begin{subarray}{c}ww^{\prime}\in W[k_{1},k_{3}]\\ w\in W[k_{1},k_{2}],w^{\prime}\in W[k_{2}+1,k_{3}]\end{subarray}}{\left\lVert\{\mathbf{Z}_{ww^{\prime}}\}-\{\mathbf{Z}_{w}\}\{\mathbf{Z}_{w^{\prime}}\}\right\rVert_{1}}

appearing on the right-hand side of the triangle inequality.

7.4.1 The First Level of the Cascade

We now check the technical details to obtain tensoriality for the first lifting in the code cascade between the codes 𝒞0\mathcal{C}_{0} and 𝒞1\mathcal{C}_{1}, which corresponds to taking ss steps in Ta-Shma’s construction. Recall that in order to obtain an assignment z𝔽2nz^{\prime}\in\mathbb{F}_{2}^{n} whose lifting is consistent on vertices with the same GG-component, we need to prove tensoriality for the ensemble 𝐘\mathbf{Y} with a variable for each vertex in V(G)×V(H)V(G)\times V(H).

The proof of tensoriality will make use of a specific entropic potential function. For an arbitrary random variable 𝐗\mathbf{X} taking values in a finite set [q][q], define the function (𝐗)\mathcal{H}(\mathbf{X}) as

(𝐗)1qa[q]H(𝟙[𝐗=a])=𝔼a[q]H(𝟙[𝐗=a]),\mathcal{H}(\mathbf{X})~{}\coloneqq~{}\frac{1}{q}\sum_{a\in[q]}\textup{H}(\mathbb{1}_{[\mathbf{X}=a]})~{}=~{}{\mathbb{E}}_{a\in[q]}\textup{H}(\mathbb{1}_{[\mathbf{X}=a]}),

where H is the binary entropy function. Using this, we define a potential function for a weighted undirected graph GG.

Definition 7.22 (Graph Potential).

Let G=(V,E)G=(V,E) be a weighted graph with edge distribution ΠE\Pi_{E}. Let ΠV\Pi_{V} be the marginal distribution on VV. Suppose that {𝐘i}iV\{\mathbf{Y}_{i}\}_{i\in V} is a pp-local PSD ensemble for some p1p\geq 1. We define ΦG\Phi^{G} to be

ΦG𝔼iΠV[(𝐘i)].\Phi^{G}~{}\coloneqq~{}\mathchoice{\underset{i\sim\Pi_{V}}{\mathbb{E}}\left[\mathcal{H}(\mathbf{Y}_{i})\right]}{{\mathbb{E}}_{i\sim\Pi_{V}}[\mathcal{H}(\mathbf{Y}_{i})]}{{\mathbb{E}}_{i\sim\Pi_{V}}[\mathcal{H}(\mathbf{Y}_{i})]}{{\mathbb{E}}_{i\sim\Pi_{V}}[\mathcal{H}(\mathbf{Y}_{i})]}.

Let 𝒯\mathcal{T} be an ss-interval splitting tree associated with the ss-wide replacement product of graphs GG and HH. We define

Φ𝒯(k1,k2,k3)𝒯Φ𝒰(𝖲k1,k2,k3),\Phi^{\mathcal{T}}\coloneqq\sum_{(k_{1},k_{2},k_{3})\in\mathcal{T}}\Phi^{\mathcal{U}(\mathsf{S}_{k_{1},k_{2},k_{3}})},

where 𝒰(𝖲k1,k2,k3)\mathcal{U}(\mathsf{S}_{k_{1},k_{2},k_{3}}) is the associated bipartite undirected graph of the operator 𝖲k1,k2,k3\mathsf{S}_{k_{1},k_{2},k_{3}}.
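Both quantities are immediate to compute; the sketch below is a direct transcription of the two definitions (the marginals and vertex distribution are hypothetical inputs).

```python
import numpy as np

def binary_entropy(p):
    """H(p) in bits, with the convention 0 log 0 = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

def curly_H(mu):
    """H(X) = E_{a in [q]} H(1[X = a]) for a variable with distribution mu."""
    return float(np.mean([binary_entropy(p) for p in mu]))

def graph_potential(marginals, pi_V):
    """Phi^G = E_{i ~ Pi_V} curly_H(Y_i), given one marginal per vertex."""
    return sum(p * curly_H(mu) for p, mu in zip(pi_V, marginals))

print(curly_H([0.5, 0.5]))   # 1.0: a uniformly random bit
print(curly_H([1.0, 0.0]))   # 0.0: a deterministic variable
```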

Lemma 7.23 (Splittability Implies Tensoriality).

Let W[0,s1]W[0,s-1] be the walk collection of the ss-wide replacement product of two graphs GG and HH. If L128(s424s/μ4)L\geq 128\cdot(s^{4}\cdot 2^{4s}/\mu^{4}) and W[0,s1]W[0,s-1] is τ\tau-splittable with τμ/(4s24s)\tau\leq\mu/(4s\cdot 2^{4s}), then W[0,s1]W[0,s-1] is (μ,L)(\mu,L)-tensorial.

Proof.

We need to show that

𝔼wW[0,s1]{𝐘w}i=0s1{𝐘w(i)}1μ,\operatorname*{\mathbb{E}}_{w\in W[0,s-1]}{\left\lVert\{\mathbf{Y}_{w}^{\prime}\}-\prod_{i=0}^{s-1}\left\{\mathbf{Y}_{w(i)}^{\prime}\right\}\right\rVert_{1}}\leq\mu,

which can be proven by adapting a potential argument technique from [BRS11]. First, set the potential

Φm=𝔼SΠm𝔼σ{𝐙S}Φ𝐙S=σ𝒯,\Phi_{m}=\operatorname*{\mathbb{E}}_{S\sim\Pi_{m}}{\operatorname*{\mathbb{E}}_{\sigma\sim\{\mathbf{Z}_{S}\}}{\Phi^{\mathcal{T}}_{\mid\mathbf{Z}_{S}=\sigma}}}, (5)

where the distribution Πm\Pi_{m} on SV(G)S\subseteq V(G) is obtained from the process of choosing SS in propagation rounding (7.16) once mm has been fixed. Consider the error term

μm𝔼SΠm𝔼σ{𝐙S}D(S,σ),\mu_{m}\coloneqq\operatorname*{\mathbb{E}}_{S\sim\Pi_{m}}{\operatorname*{\mathbb{E}}_{\sigma\sim\{\mathbf{Z}_{S}\}}{D(S,\sigma)}}, (6)

where D(S,σ)𝔼wW[0,s1]{𝐘w𝐙S=σ}i=0s1{𝐘w(i)𝐙S=σ}1D(S,\sigma)\coloneqq\operatorname*{\mathbb{E}}_{w\in W[0,s-1]}\left\lVert\{\mathbf{Y}_{w}\mid\mathbf{Z}_{S}=\sigma\}-\prod_{i=0}^{s-1}\left\{\mathbf{Y}_{w(i)}\mid\mathbf{Z}_{S}=\sigma\right\}\right\rVert_{1}. If μmμ/2\mu_{m}\geq\mu/2, then

SΠm,σ{𝐙S}[D(S,σ)μm/2]μ4.\operatorname*{\mathbb{P}}_{S\sim\Pi_{m},\sigma\sim\{\mathbf{Z}_{S}\}}\left[D(S,\sigma)\geq\mu_{m}/2\right]\geq\frac{\mu}{4}.

For each choice of SS and σ\sigma such that D(S,σ)μ/2D(S,\sigma)\geq\mu/2, applying the triangle inequality from 7.21 to the conditioned variables gives us

μ2\displaystyle\frac{\mu}{2} 𝔼wW[0,s1]{𝐘w𝐙S=σ}i=0s1{𝐘w(i)𝐙S=σ}1\displaystyle\leq\operatorname*{\mathbb{E}}_{w\in W[0,s-1]}{\left\lVert\{\mathbf{Y}_{w}\mid\mathbf{Z}_{S}=\sigma\}-\prod_{i=0}^{s-1}\left\{\mathbf{Y}_{w(i)}\mid\mathbf{Z}_{S}=\sigma\right\}\right\rVert_{1}}
(k1,k2,k3)𝒯𝔼wW[k1,k3]{𝐘w𝐙S=σ}{𝐘w(k1,k2)𝐙S=σ}{𝐘w(k2+1,k3)𝐙S=σ}1.\displaystyle\leq\sum_{(k_{1},k_{2},k_{3})\in\mathcal{T}}~{}\operatorname*{\mathbb{E}}_{w\in W[k_{1},k_{3}]}{\left\lVert\{\mathbf{Y}_{w}\mid\mathbf{Z}_{S}=\sigma\}-\left\{\mathbf{Y}_{w(k_{1},k_{2})}\mid\mathbf{Z}_{S}=\sigma\right\}\left\{\mathbf{Y}_{w(k_{2}+1,k_{3})}\mid\mathbf{Z}_{S}=\sigma\right\}\right\rVert_{1}}.

Hence, there exists (k1,k2,k3)(k_{1},k_{2},k_{3}) such that

μ2s𝔼wW[k1,k3]{𝐘w𝐙S=σ}{𝐘w(k1,k2)𝐙S=σ}{𝐘w(k2+1,k3)𝐙S=σ}1.\frac{\mu}{2s}\leq\operatorname*{\mathbb{E}}_{w\in W[k_{1},k_{3}]}{\left\lVert\{\mathbf{Y}_{w}\mid\mathbf{Z}_{S}=\sigma\}-\left\{\mathbf{Y}_{w(k_{1},k_{2})}\mid\mathbf{Z}_{S}=\sigma\right\}\left\{\mathbf{Y}_{w(k_{2}+1,k_{3})}\mid\mathbf{Z}_{S}=\sigma\right\}\right\rVert_{1}}.

Note that choosing wW[0,s1]w\in W[0,s-1] uniformly and restricting to w(k1,k3)w(k_{1},k_{3}) gives a uniformly random element of W[k1,k3]W[k_{1},k_{3}]. If we choose w(k1,k2)w(k_{1},k_{2}) or w(k2+1,k3)w(k_{2}+1,k_{3}) with equal probability, then the final walk is distributed according to the stationary measure of 𝒰(𝖲k1,k2,k3)\mathcal{U}(\mathsf{S}_{k_{1},k_{2},k_{3}}). Let ww^{\prime} denote the chosen walk. Observe that 𝐘w\mathbf{Y}_{w^{\prime}} is a deterministic function of 𝐙w𝐙S=σ\mathbf{Z}_{w^{\prime}}\mid\mathbf{Z}_{S}=\sigma. Now, we sample 𝐙w𝐙S=σ\mathbf{Z}_{w^{\prime}}\mid\mathbf{Z}_{S}=\sigma, which gives us a sample of 𝐘w\mathbf{Y}_{w^{\prime}}. Applying Lemma A.1, we have

Φ|{𝐘w|𝐙S=σ}𝒰(𝖲k1,k2,k3)Φ𝐙S=σ𝒰(𝖲k1,k2,k3)μ216s224s.\Phi^{\mathcal{U}(\mathsf{S}_{k_{1},k_{2},k_{3}})}_{|\{\mathbf{Y}_{w^{\prime}}|\mathbf{Z}_{S}=\sigma\}}\leq\Phi^{\mathcal{U}(\mathsf{S}_{k_{1},k_{2},k_{3}})}_{\mathbf{Z}_{S}=\sigma}-\frac{\mu^{2}}{16s^{2}\cdot 2^{4s}}.

This conditioning on an assignment to 𝐙set(w)𝐙S=σ\mathbf{Z}_{\textup{set}(w^{\prime})}\mid\mathbf{Z}_{S}=\sigma does not increase the other terms of Φ𝒯\Phi^{\mathcal{T}} associated to split operators other than 𝒰(𝖲k1,k2,k3)\mathcal{U}(\mathsf{S}_{k_{1},k_{2},k_{3}}) since entropy is non-increasing under conditioning. Similarly, conditioning on the remaining variables that are part of ww but not ww^{\prime} does not increase Φ𝒯\Phi^{\mathcal{T}}. Then, we obtain

ΦmΦm+1SΠm,σ{𝐙S}[D(S,σ)μm/2]μ216s224s.\Phi_{m}-\Phi_{m+1}\geq\operatorname*{\mathbb{P}}_{S\sim\Pi_{m},\sigma\sim\{\mathbf{Z}_{S}\}}\left[D(S,\sigma)\geq\mu_{m}/2\right]\cdot\frac{\mu^{2}}{16s^{2}\cdot 2^{4s}}.

Since sΦ1ΦL/(s+1)0s\geq\Phi_{1}\geq\cdots\geq\Phi_{L/(s+1)}\geq 0, there can be at most 32s324s/μ332s^{3}\cdot 2^{4s}/\mu^{3} indices m[L/s]m\in[L/s] such that μmμ/2\mu_{m}\geq\mu/2. In particular, since the total number of indices is L/sL/s, we have

𝔼m[L/s][μm]μ2+sL32s324sμ3.\operatorname*{\mathbb{E}}_{m\in[L/s]}[\mu_{m}]\leq\frac{\mu}{2}+\frac{s}{L}\cdot\frac{32s^{3}\cdot 2^{4s}}{\mu^{3}}.

Our choice of LL is more than enough to ensure 𝔼m[L/s][μm]μ\mathchoice{\underset{m\in[L/s]}{\mathbb{E}}\left[\mu_{m}\right]}{{\mathbb{E}}_{m\in[L/s]}[\mu_{m}]}{{\mathbb{E}}_{m\in[L/s]}[\mu_{m}]}{{\mathbb{E}}_{m\in[L/s]}[\mu_{m}]}\leq\mu.      

Applying the list decoding framework will require the stronger property of two-step tensoriality, which we can obtain under the same assumptions.

Lemma 7.24 (Splittability Implies Two-step Tensoriality).

Let W[0,s1]W[0,s-1] be the walk collection of the ss-wide replacement product of two graphs GG and HH. If L128(s424s/μ4)L\geq 128\cdot(s^{4}\cdot 2^{4s}/\mu^{4}) and W[0,s1]W[0,s-1] is τ\tau-splittable with τμ/(4s24s)\tau\leq\mu/(4s\cdot 2^{4s}), then W[0,s1]W[0,s-1] is (μ,L)(\mu,L)-two-step tensorial.

Proof.

Under our assumptions the (μ,L)(\mu,L)-tensorial property follows from Lemma 7.23 (which is the only place where the assumption on τ\tau is used), so we only need to show

𝔼w,wW[0,s1]{𝐘w𝐘w}{𝐘w}{𝐘w}1μ,\operatorname*{\mathbb{E}}_{w,w^{\prime}\in W[0,s-1]}{\left\lVert\{\mathbf{Y}_{w}^{\prime}\mathbf{Y}_{w^{\prime}}^{\prime}\}-\left\{\mathbf{Y}_{w}^{\prime}\right\}\left\{\mathbf{Y}_{w^{\prime}}^{\prime}\right\}\right\rVert_{1}}\leq\mu,

which can be proven by adapting a potential argument technique from [BRS11]. First, set the potential

Φm=𝔼SΠm𝔼σ{𝐙S}𝔼wW[0,s1](𝐘w𝐙S=σ),\Phi_{m}=\operatorname*{\mathbb{E}}_{S\sim\Pi_{m}}{\operatorname*{\mathbb{E}}_{\sigma\sim\{\mathbf{Z}_{S}\}}{\operatorname*{\mathbb{E}}_{w\in W[0,s-1]}{\mathcal{H}(\mathbf{Y}_{w}\mid\mathbf{Z}_{S}=\sigma)}}}, (7)

where the distribution Πm\Pi_{m} on SV(G)S\subseteq V(G) is obtained from the process of choosing SS in propagation rounding (7.16) once mm has been fixed. Consider the error term

μm𝔼SΠm𝔼σ{𝐙S}D(S,σ),\mu_{m}\coloneqq\operatorname*{\mathbb{E}}_{S\sim\Pi_{m}}{\operatorname*{\mathbb{E}}_{\sigma\sim\{\mathbf{Z}_{S}\}}{D(S,\sigma)}}, (8)

where D(S,σ)𝔼w,wW[0,s1][{𝐘w𝐘w𝐙S=σ}{𝐘w|𝐙S=σ}{𝐘w|𝐙S=σ}1]D(S,\sigma)\coloneqq\mathchoice{\underset{w,w^{\prime}\in W[0,s-1]}{\mathbb{E}}\left[\left\lVert\{\mathbf{Y}_{w}\mathbf{Y}_{w^{\prime}}\mid\mathbf{Z}_{S}=\sigma\}-\{\mathbf{Y}_{w}|\mathbf{Z}_{S}=\sigma\}\{\mathbf{Y}_{w^{\prime}}|\mathbf{Z}_{S}=\sigma\}\right\rVert_{1}\right]}{{\mathbb{E}}_{w,w^{\prime}\in W[0,s-1]}[\left\lVert\{\mathbf{Y}_{w}\mathbf{Y}_{w^{\prime}}\mid\mathbf{Z}_{S}=\sigma\}-\{\mathbf{Y}_{w}|\mathbf{Z}_{S}=\sigma\}\{\mathbf{Y}_{w^{\prime}}|\mathbf{Z}_{S}=\sigma\}\right\rVert_{1}]}{{\mathbb{E}}_{w,w^{\prime}\in W[0,s-1]}[\left\lVert\{\mathbf{Y}_{w}\mathbf{Y}_{w^{\prime}}\mid\mathbf{Z}_{S}=\sigma\}-\{\mathbf{Y}_{w}|\mathbf{Z}_{S}=\sigma\}\{\mathbf{Y}_{w^{\prime}}|\mathbf{Z}_{S}=\sigma\}\right\rVert_{1}]}{{\mathbb{E}}_{w,w^{\prime}\in W[0,s-1]}[\left\lVert\{\mathbf{Y}_{w}\mathbf{Y}_{w^{\prime}}\mid\mathbf{Z}_{S}=\sigma\}-\{\mathbf{Y}_{w}|\mathbf{Z}_{S}=\sigma\}\{\mathbf{Y}_{w^{\prime}}|\mathbf{Z}_{S}=\sigma\}\right\rVert_{1}]}. If μmμ/2\mu_{m}\geq\mu/2, then

SΠm,σ{𝐙S}[D(S,σ)μm/2]μ4.\operatorname*{\mathbb{P}}_{S\sim\Pi_{m},\sigma\sim\{\mathbf{Z}_{S}\}}\left[D(S,\sigma)\geq\mu_{m}/2\right]\geq\frac{\mu}{4}.

Let G=(V=W[0,s1],E)G^{\prime}=(V=W[0,s-1],E) be the graph with edges E={{w,w}w,wW[0,s1]}E=\{\{w,w^{\prime}\}\mid w,w^{\prime}\in W[0,s-1]\}. Local correlation (expectation over the edges) on this graph GG^{\prime} is the same as global correlation (expectation over two independent copies of vertices). Then, we obtain (see [AJT19] or [BRS11] for the details)

ΦmΦm+1SΠm,σ{𝐙S}[D(S,σ)μm/2]μ2222s.\Phi_{m}-\Phi_{m+1}\geq\operatorname*{\mathbb{P}}_{S\sim\Pi_{m},\sigma\sim\{\mathbf{Z}_{S}\}}\left[D(S,\sigma)\geq\mu_{m}/2\right]\cdot\frac{\mu^{2}}{2\cdot 2^{2s}}.

Since 1Φ1ΦL/(s+1)01\geq\Phi_{1}\geq\cdots\geq\Phi_{L/(s+1)}\geq 0, there can be at most 822s/μ38\cdot 2^{2s}/\mu^{3} indices m[L/s]m\in[L/s] such that μmμ/2\mu_{m}\geq\mu/2. In particular, since the total number of indices is L/sL/s, we have

𝔼m[L/s]μmμ2+sL822sμ3.\operatorname*{\mathbb{E}}_{m\in[L/s]}{\mu_{m}}\leq\frac{\mu}{2}+\frac{s}{L}\cdot\frac{8\cdot 2^{2s}}{\mu^{3}}.

Our choice of LL is more than enough to ensure 𝔼m[L/s][μm]μ\mathchoice{\underset{m\in[L/s]}{\mathbb{E}}\left[\mu_{m}\right]}{{\mathbb{E}}_{m\in[L/s]}[\mu_{m}]}{{\mathbb{E}}_{m\in[L/s]}[\mu_{m}]}{{\mathbb{E}}_{m\in[L/s]}[\mu_{m}]}\leq\mu.      

We have already established that W[0,s1]W[0,s-1] is τ\tau-splittable with τ=σ2(G)+2σ2(H)+σ2(H)2\tau=\sigma_{2}(G)+2\sigma_{2}(H)+\sigma_{2}(H)^{2} in Corollary 7.13, so we can obtain (μ,L)(\mu,L)-two-step tensoriality for any μ\mu if this quantity is small enough.

7.4.2 Higher Levels of the Cascade

We now discuss tensoriality of the other levels of the cascade between 𝒞i1\mathcal{C}_{i-1} and 𝒞i\mathcal{C}_{i} for i2i\geq 2. Tensorial properties are simpler to establish here than on the first level of the cascade. The relevant split operators are special cases of 𝖲k1,k2,k3\mathsf{S}_{k_{1},k_{2},k_{3}} where k10(mods)k_{1}\equiv 0\pmod{s} and k2,k31(mods)k_{2},k_{3}\equiv-1\pmod{s}. The main difference now is that we can associate the parity bits of 𝒞i1\mathcal{C}_{i-1} with the vertices of 𝒰(𝖲r,r)\mathcal{U}(\mathsf{S}_{r,r}^{\bigtriangleup}), which themselves represent walks. As this association of parity bits doesn’t need to satisfy a consistency condition, we only need to work with a single ensemble 𝐙\mathbf{Z} instead of working with two different ensembles as in the previous case. The proofs of Lemma 7.23 and Lemma 7.24 with these slight modifications give us two-step tensoriality.

Lemma 7.25 (Two-step Tensoriality for Higher Levels).

Let W(k)W(k) be the set of walks defined using (k1)(k-1) steps of the operator 𝖲r,r\mathsf{S}_{r,r}^{\bigtriangleup}. If L128(k424k/μ4)L\geq 128\cdot(k^{4}\cdot 2^{4k}/\mu^{4}) and W(k)W(k) is τ\tau-splittable with τμ/(4k24k)\tau\leq\mu/(4k\cdot 2^{4k}), then W(k)W(k) is (μ,L)(\mu,L)-two-step tensorial.

We know from Corollary 7.13 that the collection of walks generated by 𝖲r,r\mathsf{S}_{r,r}^{\bigtriangleup} is (σ2(G)+2σ2(H)+σ2(H)2)(\sigma_{2}(G)+2\cdot\sigma_{2}(H)+\sigma_{2}(H)^{2})-splittable, so the parameters necessary to obtain two-step tensoriality are the same as in the first level of the cascade.

8 Choosing Parameters for Ta-Shma’s Construction

We explore how some choices of parameters for Ta-Shma’s construction interact with the requirements of our decoding algorithm. The analysis is divided into rounds of increasingly stronger decoding guarantees with later rounds relying on the codes obtained in previous rounds. Naturally, the stronger guarantees come with more delicate and technical considerations. We briefly summarize the goals of each round and some key parameters.

  1. 1.

Round I: For any constant β>0\beta>0, we obtain efficient uniquely decodable codes 𝒞\mathcal{C}_{\ell} with distance at least 1/2ε1/2-\varepsilon and rate Ω(ε2+β)\Omega(\varepsilon^{2+\beta}) for infinitely many discrete values of ε>0\varepsilon>0 (with ε\varepsilon as close to 0 as desired). In this regime, it suffices for the expansion of HH to be constant. This round leads to Theorem 6.6.

  2. 2.

    Round II: Similar to Round I, but now ε\varepsilon can be any value in an interval (0,b)(0,b) with b<1/2b<1/2 being a function of β\beta. Again the expansion of HH can be constant. This round leads to Theorem 6.7.

  3. 3.

    Round III: We want β\beta to vanish as ε\varepsilon vanishes (this is qualitatively similar to Ta-Shma’s result). In this regime, we make the expansion of HH be a function of ε\varepsilon, and we rely on the uniquely decodable codes of Round II. This round leads to Theorem 1.1.

  4. 4.

    Round IV: For any constant β0>0\beta_{0}>0, we obtain efficient list decodable codes 𝒞\mathcal{C}_{\ell} with list decoding radius 1/2β01/2-\beta_{0} and rate Ω(ε2+β)\Omega(\varepsilon^{2+\beta}) with β0\beta\rightarrow 0 as ε0\varepsilon\rightarrow 0. In this regime, we make the expansion of HH be a function of ε\varepsilon, and we rely on the uniquely decodable codes of Round III. This round leads to Theorem 1.2.

The way we choose parameters for Ta-Shma’s construction borrows heavily from Ta-Shma’s arguments in [TS17]. We fix some notation common to all rounds. A graph is said to be an (n,d,λ)(n,d,\lambda)-graph provided it has nn vertices, is dd-regular, and has second largest singular value of its normalized adjacency matrix at most λ\lambda.

Notation 8.1.

We use the following notation for the graphs GG and HH used in the ss-wide replacement product.

  • -

    The outer graph GG will be an (n,d1,λ1)(n^{\prime},d_{1},\lambda_{1})-graph.

  • -

    The inner graph HH will be a (d1s,d2,λ2)(d_{1}^{s},d_{2},\lambda_{2})-graph.

The parameters n,d1,d2,λ1,λ2n^{\prime},d_{1},d_{2},\lambda_{1},\lambda_{2} and ss will be chosen in the subsequent sections.

8.1 Round I: Initial Analysis

We are given the dimension DD of the desired code and ε(0,1/2)\varepsilon\in(0,1/2). We set a parameter α1/128\alpha\leq 1/128 such that (for convenience) 1/α1/\alpha is a power of 22 and

α54log2(1/α)1log2(1/ε).\frac{\alpha^{5}}{4\log_{2}(1/\alpha)}\geq\frac{1}{\log_{2}(1/\varepsilon)}. (9)

We can assume that some α1/128\alpha\leq 1/128 satisfies Eq. 9, since otherwise ε\varepsilon is a constant and we can use the list decodable codes from [AJQ+20]. The use of Eq. 9 will be clear shortly. It becomes a necessity from round III onward. For rounds I and II, the parameter α\alpha will be a constant, but it will be useful to establish the analysis in more generality now so that subsequent rounds can reuse it.

The inner graph HH.  The choice of HH is similar to Ta-Shma’s choice. More precisely, we set s=1/αs=1/\alpha and d2=s4s2d_{2}=s^{4s^{2}} (Ta-Shma took d2=s4sd_{2}=s^{4s}). We obtain a Cayley graph H=Cay(𝔽24slog2(d2),A)H=\textup{Cay}(\mathbb{F}_{2}^{4s\log_{2}(d_{2})},A) such that HH is an (n2=d24s,d2,λ2)(n_{2}=d_{2}^{4s},d_{2},\lambda_{2}) graph where λ2=b2/d2\lambda_{2}=b_{2}/\sqrt{d_{2}} and b2=4slog2(d2)b_{2}=4s\log_{2}(d_{2}). (The set of generators, AA, comes from a small bias code derived from a construction of Alon et al. [AGHP92], but we will rely on Ta-Shma’s analysis embodied in Lemma B.2 and not discuss it further.)

The base code 𝒞0\mathcal{C}_{0}.  Set ε0=1/d22=λ24/b24λ24/3\varepsilon_{0}=1/d_{2}^{2}=\lambda_{2}^{4}/b_{2}^{4}\leq\lambda_{2}^{4}/3 (this choice differs from Ta-Shma’s and it appears because we are essentially working with H2H^{2} rather than HH). We will choose a base code 𝒞0\mathcal{C}_{0} such that the desired code will be obtained as a direct sum lifting of 𝒞0\mathcal{C}_{0}, and because this lifting preserves the dimension, the dimension of 𝒞0\mathcal{C}_{0} should be DD. We choose 𝒞0\mathcal{C}_{0} to be an ε0\varepsilon_{0}-balanced code with dimension DD and block length n=Oε0(D)n=O_{\varepsilon_{0}}(D). For instance, we can start with any good (constant rate and relative distance) linear base code 𝒞0\mathcal{C}_{0} that has an efficient unique decoding algorithm and obtain an ε0\varepsilon_{0}-balanced lifted code that can be efficiently uniquely decoded (as long as ε0\varepsilon_{0} is constant) using the framework in [AJQ+20].

The outer graph GG.  Set d1=d24d_{1}=d_{2}^{4} so that n2=d1sn_{2}=d_{1}^{s} as required by the ss-wide replacement product. We apply Ta-Shma’s explicit Ramanujan graph Lemma B.1 with parameters nn, d1d_{1} and θ\theta to obtain an (n,d1,λ1)(n^{\prime},d_{1},\lambda_{1}) Ramanujan graph GG with λ122/d1\lambda_{1}\leq 2\sqrt{2}/\sqrt{d_{1}} and n[(1θ)n,n]n^{\prime}\in[(1-\theta)n,n] or n[(1θ)2n,2n]n^{\prime}\in[(1-\theta)2n,2n]. Here, θ\theta is an error parameter that we set as θ=λ24/6\theta=\lambda_{2}^{4}/6 (this choice of θ\theta differs from Ta-Shma). Because we can construct words with block length 2n2n (if needed) by duplicating each codeword, we may assume w.l.o.g. that nn^{\prime} is close to nn and (nn)θn2θn(n-n^{\prime})\leq\theta n\leq 2\theta n^{\prime}. See Appendix B for a more formal description of this graph.

Note that λ1λ24/6\lambda_{1}\leq\lambda_{2}^{4}/6 since λ13/d1=3/d22=3λ24/b24λ24/6\lambda_{1}\leq 3/\sqrt{d_{1}}=3/d_{2}^{2}=3\cdot\lambda_{2}^{4}/b_{2}^{4}\leq\lambda_{2}^{4}/6. Hence, ε0+2θ+2λ1λ24\varepsilon_{0}+2\theta+2\lambda_{1}\leq\lambda_{2}^{4}.

The walk length.  Set the walk length t1t-1 to be the smallest integer such that

(λ22)(15α)(1α)(t1)ε.(\lambda_{2}^{2})^{(1-5\alpha)(1-\alpha)(t-1)}\leq\varepsilon.

Using Ta-Shma’s analysis, this will imply that the bias of the final code is at most ε\varepsilon, as shown later.

s=1/α, such that α54log2(1/α)1log2(1/ε)s=1/\alpha,\text{ such that }\frac{\alpha^{5}}{4\log_{2}(1/\alpha)}\geq\frac{1}{\log_{2}(1/\varepsilon)}

H:(n2,d2,λ2),n2=d1s,d2=s4s2,λ2=b2d2,b2=4slogd2H:(n_{2},d_{2},\lambda_{2}),\quad n_{2}=d_{1}^{s},\quad d_{2}=s^{4s^{2}},\quad\lambda_{2}=\frac{b_{2}}{\sqrt{d_{2}}},\quad b_{2}=4s\log d_{2}

G:(n,d1,λ1),nn=O(D/ε0c),d1=d24,λ122d1G:(n^{\prime},d_{1},\lambda_{1}),\quad n^{\prime}\approx n=O(D/\varepsilon_{0}^{c}),\quad d_{1}=d_{2}^{4},\quad\lambda_{1}\leq\frac{2\sqrt{2}}{\sqrt{d_{1}}}

t: smallest integer such that (λ22)(15α)(1α)(t1)εt:\text{ smallest integer such that }(\lambda_{2}^{2})^{(1-5\alpha)(1-\alpha)(t-1)}\leq\varepsilon
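The arithmetic in the box above can be traced with a short computation, done in log scale since d2=s4s2d_{2}=s^{4s^{2}} is astronomically large. This is a sketch of the parameter flow only; the base-code and outer-graph sizes, which depend on DD and ε0\varepsilon_{0}, are omitted.

```python
import math

def round_one_parameters(eps, alpha):
    """Trace the Round I parameter choices in log2 scale.
    Returns (s, log2_d2, b2, log2_lam2, t)."""
    assert alpha ** 5 / (4 * math.log2(1 / alpha)) >= 1 / math.log2(1 / eps), \
        "alpha must satisfy Eq. (9) for the given bias eps"
    s = round(1 / alpha)                      # s = 1/alpha, a power of two
    log2_d2 = 4 * s * s * math.log2(s)        # d2 = s^(4 s^2)
    b2 = 4 * s * log2_d2                      # b2 = 4 s log2(d2)
    log2_lam2 = math.log2(b2) - log2_d2 / 2   # lambda2 = b2 / sqrt(d2)
    # d1 = d2^4 and lambda1 <= 2 sqrt(2) / sqrt(d1) follow as in the box.
    # Walk length: smallest t with (lam2^2)^((1-5a)(1-a)(t-1)) <= eps.
    c = (1 - 5 * alpha) * (1 - alpha)
    t = 1 + math.ceil(math.log2(1 / eps) / (2 * c * (-log2_lam2)))
    return s, log2_d2, b2, log2_lam2, t
```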

Claim 8.2.

We have t1s/α=s2t-1\geq s/\alpha=s^{2}.

Proof.

Using d2=s4s2d_{2}=s^{4s^{2}} and Eq. 9, we have

(1λ22)(15α)(1α)s/α\displaystyle\left(\frac{1}{\lambda_{2}^{2}}\right)^{(1-5\alpha)(1-\alpha)s/\alpha} (1λ22)s/α=(d2b22)s/α(d2)s/α=s4s3/α\displaystyle\leq\left(\frac{1}{\lambda_{2}^{2}}\right)^{s/\alpha}=\left(\frac{d_{2}}{b_{2}^{2}}\right)^{s/\alpha}\leq\left(d_{2}\right)^{s/\alpha}=s^{4s^{3}/\alpha}
=24s3log2(s)/α=24log2(1/α)/α42log2(1/ε)=1ε.\displaystyle=2^{4s^{3}\log_{2}(s)/\alpha}=2^{4\log_{2}(1/\alpha)/\alpha^{4}}\leq 2^{\log_{2}(1/\varepsilon)}=\frac{1}{\varepsilon}.

Hence, ε(λ22)(15α)(1α)s/α\varepsilon\leq(\lambda_{2}^{2})^{(1-5\alpha)(1-\alpha)s/\alpha} and thus t1t-1 must be at least s/αs/\alpha.      

Remark 8.3.

By our choice of tt, we have (λ22)(15α)(1α)(t2)ε(\lambda_{2}^{2})^{(1-5\alpha)(1-\alpha)(t-2)}\geq\varepsilon. Since 1/(t1)α1/(t-1)\leq\alpha, we get (λ22)(15α)(1α)2(t1)ε(\lambda_{2}^{2})^{(1-5\alpha)(1-\alpha)^{2}(t-1)}\geq\varepsilon.

Final Bias. We denote by 𝒞\mathcal{C}_{\ell} the final code obtained by tt steps of the ss-wide replacement product. The bias of 𝒞\mathcal{C}_{\ell} is given by Corollary 4.10 (which in turn is a simple corollary of Ta-Shma’s 4.9) as shown next.

Corollary 8.4.

The code 𝒞\mathcal{C}_{\ell} is ε\varepsilon-balanced.

Proof.

Using Corollary 4.10, we have that the final bias

b(σ2(H2)s1+(s1)σ2(H2)s2+(s1)2σ2(H2)s4)(t1)/s\displaystyle b\coloneqq\left(\sigma_{2}(H^{2})^{s-1}+(s-1)\cdot\sigma_{2}(H^{2})^{s-2}+(s-1)^{2}\cdot\sigma_{2}(H^{2})^{s-4}\right)^{\lfloor(t-1)/s\rfloor}

is bounded by

b\displaystyle b (3(s1)2σ2(H2)s4)((t1)/s)1\displaystyle\leq(3(s-1)^{2}\sigma_{2}(H^{2})^{s-4})^{((t-1)/s)-1} (Using σ2(H2)1/3s2)\displaystyle(\text{Using }\sigma_{2}(H^{2})\leq 1/3s^{2})
(σ2(H2)s5)(t1s)/s\displaystyle\leq(\sigma_{2}(H^{2})^{s-5})^{(t-1-s)/s}
=σ2(H2)(15/s)(1s/(t1))(t1)\displaystyle=\sigma_{2}(H^{2})^{(1-5/s)(1-s/(t-1))(t-1)}
σ2(H2)(15α)(1α)(t1)\displaystyle\leq\sigma_{2}(H^{2})^{(1-5\alpha)(1-\alpha)(t-1)}
=(λ22)(15α)(1α)(t1)ε,\displaystyle=\left(\lambda_{2}^{2}\right)^{(1-5\alpha)(1-\alpha)(t-1)}\leq\varepsilon,

where the last inequality follows from s=1/αs=1/\alpha and t1s/αt-1\geq s/\alpha, the latter from 8.2.      

Rate. The proof of the rate follows a structure similar to Ta-Shma’s original argument, except that we take ss to be a constant independent of ε\varepsilon so that ε0\varepsilon_{0}, λ1\lambda_{1}, and λ2\lambda_{2} are also constants independent of ε\varepsilon. Note that we previously said α=1/s\alpha=1/s needs to satisfy Equation 9, but that implies only an upper bound for ss, and smaller (even constant) values for ss are still permissible.

Claim 8.5.

𝒞\mathcal{C}_{\ell} has rate Ω(ε2+26α)\Omega(\varepsilon^{2+26\cdot\alpha}) provided ε0>0\varepsilon_{0}>0 is constant.

Proof.

The support size is the number of walks of length tt on the ss-wide replacement product of GG and HH (each step of the walk has d22d_{2}^{2} options), which is

|V(G)||V(H)|d22(t1)=nd1sd22(t1)\displaystyle|V(G)||V(H)|d_{2}^{2(t-1)}=n^{\prime}\cdot d_{1}^{s}\cdot d_{2}^{2(t-1)} =nd22(t1)+4snd22(t1)+4s\displaystyle=n^{\prime}\cdot d_{2}^{2(t-1)+4s}\leq n\cdot d_{2}^{2(t-1)+4s}
=Θε0(Dd22(t1)+4s)\displaystyle=\Theta_{\varepsilon_{0}}\left(D\cdot d_{2}^{2(t-1)+4s}\right)
=Θ(D(d22)t1+2s)\displaystyle=\Theta\left(D\cdot(d_{2}^{2})^{t-1+2s}\right)
=O(D(d22)(1+2α)(t1)),\displaystyle=O\left(D\cdot(d_{2}^{2})^{(1+2\alpha)(t-1)}\right),

where the penultimate equality follows from the assumption that ε0\varepsilon_{0} is a constant.

Note that d2α=d21/s=s4sb2d_{2}^{\alpha}=d_{2}^{1/s}=s^{4s}\geq b_{2} since b2=4slog2(d2)=16s3log2(s)s4b_{2}=4s\log_{2}(d_{2})=16s^{3}\log_{2}(s)\leq s^{4} (recall that s=1/α128s=1/\alpha\geq 128). Thus,

d212α=d2d22αd2b22=1σ2(H2).d_{2}^{1-2\alpha}=\frac{d_{2}}{d_{2}^{2\alpha}}\leq\frac{d_{2}}{b_{2}^{2}}=\frac{1}{\sigma_{2}(H^{2})}.

We obtain

(d22)(t1)\displaystyle(d_{2}^{2})^{(t-1)} (1σ2(H2))2(t1)12α\displaystyle\leq\left(\frac{1}{\sigma_{2}(H^{2})}\right)^{\frac{2(t-1)}{1-2\alpha}}
(1ε)2(12α)(15α)(1α)2\displaystyle\leq\left(\frac{1}{\varepsilon}\right)^{\frac{2}{(1-2\alpha)(1-5\alpha)(1-\alpha)^{2}}} (Using Remark 8.3)
(1ε)2(1+10α),\displaystyle\leq\left(\frac{1}{\varepsilon}\right)^{2(1+10\alpha)},

which implies a block length of

O(D(d22)(1+2α)(t1))=O(D(1ε)2(1+10α)(1+2α))=O(D(1ε)2(1+13α)).O\left(D\cdot(d_{2}^{2})^{(1+2\alpha)(t-1)}\right)=O\left(D\left(\frac{1}{\varepsilon}\right)^{2(1+10\alpha)(1+2\alpha)}\right)=O\left(D\left(\frac{1}{\varepsilon}\right)^{2(1+13\alpha)}\right).

 

Lemma 8.6 (Codes Near the GV bound I).

For every constant β>0\beta>0, there exists a sufficiently large constant ss in the above analysis so that for any dimension value D+D\in\mathbb{N}^{+} (sufficiently large) and ε>0\varepsilon>0 (sufficiently small) the final code 𝒞N,ε,β\mathcal{C}_{N,\varepsilon,\beta}, where NN is the block length, satisfies

  • -

    𝒞N,ε,β\mathcal{C}_{N,\varepsilon,\beta} is ε\varepsilon-balanced,

  • -

    𝒞N,ε,β\mathcal{C}_{N,\varepsilon,\beta} has rate Ω(ε2+β)\Omega(\varepsilon^{2+\beta}), and

  • -

    𝒞N,ε,β\mathcal{C}_{N,\varepsilon,\beta} is a linear code of dimension DD.

Remark 8.7.

As a consequence of code cascading, the final attainable walk lengths have the form s1s^{\ell}-1 where \ell is a positive integer. Given β>0\beta>0, we have infinitely many values of ε\varepsilon attainable by such walk lengths which gives infinitely many codes 𝒞N,ε,β\mathcal{C}_{N,\varepsilon,\beta}. This means that although the bias ε\varepsilon cannot be arbitrary, we have an infinite sequence of values of ε\varepsilon for which the rates of the codes 𝒞N,ε,β\mathcal{C}_{N,\varepsilon,\beta} are near the GV bound. In Section 8.2, we show how to bypass this artificial limitation. These codes are used in the proof of Theorem 6.6.

We can view the above analysis as defining a function Γ\Gamma that receives

  • -

    the dimension D+D\in\mathbb{N}^{+},

  • -

    the final bias ε>0\varepsilon>0,

  • -

    the approximating error α(0,1/128]\alpha\in(0,1/128] with s1/αs\coloneqq 1/\alpha being a power of two, and

  • -

    a multiplying factor Q+Q\in\mathbb{N}^{+} such that d2=s4s2Qd_{2}=s^{4s^{2}\cdot Q} (in the above QQ was 11).

and outputs a tuple of parameters (t,\varepsilon_{0},\theta,d_{1},\lambda_{1},n^{\prime}) and graphs G and H (as above), where, in particular, the number of steps t\in\mathbb{N}^{+} is such that the final code \mathcal{C}_{\ell} has bias at most \varepsilon and rate \Omega(\varepsilon^{2+26\cdot\alpha}).

In future rounds, Γ\Gamma may be called with Q=sQ=s instead of Q=1Q=1. This will cause d2d_{2} to increase from s4s2s^{4s^{2}} to s4s2Qs^{4s^{2}\cdot Q}, and so in the proof of 8.2, 24log2(1/α)/α42^{4\log_{2}(1/\alpha)/\alpha^{4}} will be replaced by 24log2(1/α)/α52^{4\log_{2}(1/\alpha)/\alpha^{5}}. This explains why Eq. 9 has a stricter requirement than needed in the Q=1Q=1 case above.
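To make the bookkeeping concrete, the following Python sketch mirrors the inner-graph side of the parameter function \Gamma under the assumptions stated above (d_{2}=s^{4s^{2}\cdot Q}, b_{2}=4s\log_{2}(d_{2}), \lambda_{2}=b_{2}/\sqrt{d_{2}}). The function name, the log_{2}-scale arithmetic (the actual numbers are astronomically large), and the use of the Round II walk-length rule of Section 8.2 (Round I drops the (1-2\alpha) factor) are our choices, not part of the construction.

import math

def gamma_parameters(log2_inv_eps, alpha, Q=1):
    """Sketch of the inner-graph side of the parameter function Gamma.

    Follows d2 = s^(4 s^2 Q), b2 = 4 s log2(d2), lambda2 = b2 / sqrt(d2),
    and picks the smallest t with (lambda2^2)^((1-5a)(1-2a)(1-a)(t-1)) <= eps
    (the Round II rule of Section 8.2). All quantities are kept in log2
    scale since d2 is astronomically large.
    """
    s = round(1.0 / alpha)                              # s = 1/alpha, a power of two
    log2_d2 = 4 * s * s * Q * math.log2(s)              # log2 of d2 = s^(4 s^2 Q)
    log2_b2 = math.log2(4 * s) + math.log2(log2_d2)     # log2 of b2 = 4 s log2(d2)
    log2_lambda2 = log2_b2 - log2_d2 / 2                # log2 of lambda2 < 0
    c = (1 - 5 * alpha) * (1 - 2 * alpha) * (1 - alpha)
    # (lambda2^2)^(c(t-1)) <= eps  <=>  t-1 >= log2(1/eps) / (2 c log2(1/lambda2))
    t = 1 + math.ceil(log2_inv_eps / (2 * c * (-log2_lambda2)))
    return s, log2_d2, log2_lambda2, t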

8.2 Round II: A More Careful Analysis

We are given the dimension of the code DD and ε(0,1/2)\varepsilon\in(0,1/2). As before, we set a parameter α1/128\alpha\leq 1/128 such that (for convenience) 1/α1/\alpha is a power of 22. Set s=1/αs=1/\alpha and Q=sQ=s.

Apply Γ\Gamma to (D,ε,α,Q)(D,\varepsilon,\alpha,Q) to obtain all parameters except tt. Choose tt to be the smallest integer satisfying

(λ22)(15α)(12α)(1α)(t1)ε,(\lambda_{2}^{2})^{(1-5\alpha)(1-2\alpha)(1-\alpha)(t-1)}\leq\varepsilon,

where we observe that an extra (1-2\alpha) factor appears in the exponent. This change in t worsens the rate, but only by a factor of \frac{1}{1-2\alpha} in the exponent, so we can still lower bound the rate: (d_{2}^{2})^{-(t-1)}=\Omega(\varepsilon^{\frac{2+26\cdot\alpha}{1-2\alpha}}).

Set +\ell\in\mathbb{N}^{+} to be the smallest value such that sts^{\ell}\geq t (here we are implicitly assuming that t>st>s). If s=ts^{\ell}=t, we are done since we can use all the parameters returned by Γ\Gamma for the construction of 𝒞\mathcal{C}_{\ell}. Now assume s>ts^{\ell}>t and let ζ=t/s1\zeta=t/s^{\ell-1}. Note that ζ(1,s)\zeta\in(1,s). Choose PP to be the integer in the interval [Q,sQ][Q,s\cdot Q] such that

0PQζ1Q.0\leq\frac{P}{Q}-\zeta\leq\frac{1}{Q}.

Because s^{\ell}>t, and only powers of s may be chosen for the walk length, we might overshoot the walk length by a multiplicative factor of s. This would cause a corresponding loss in the rate that we cannot afford. To overcome this, in the last level of the cascade between codes \mathcal{C}_{\ell-1} and \mathcal{C}_{\ell}, we perform the direct sum over walks of length P-1 instead of length s-1. The new total number of vertices is t^{\prime}=Ps^{\ell-1}. Note that P can be as large as s^{2}, so our splittability guarantee for W(P) (the walk collection from the lift between \mathcal{C}_{\ell-1} and \mathcal{C}_{\ell}) has to be strong enough to accommodate this larger arity and not only arity s. A sketch of this correction appears after the proof of 8.8 below.

Claim 8.8.

We have t1t1Q(1+2α)(t1)t-1\leq\frac{t^{\prime}-1}{Q}\leq(1+2\alpha)(t-1).

Proof.

By construction, we have the sequence of implications

0PQs1ζs1s1Q\displaystyle 0\leq\frac{P}{Q}s^{\ell-1}-\zeta s^{\ell-1}\leq\frac{s^{\ell-1}}{Q}
\displaystyle\Rightarrow 0tQts1QtQ\displaystyle 0\leq\frac{t^{\prime}}{Q}-t\leq\frac{s^{\ell-1}}{Q}\leq\frac{t}{Q}
\displaystyle\Rightarrow t1Qt1Q(t1)(1+1Q)+1,\displaystyle t-\frac{1}{Q}\leq\frac{t^{\prime}-1}{Q}\leq(t-1)\left(1+\frac{1}{Q}\right)+1,

from which we obtain

t1t1Qt1Qt-1\leq t-\frac{1}{Q}\leq\frac{t^{\prime}-1}{Q}

and

\frac{t^{\prime}-1}{Q}\leq(t-1)\left(1+\frac{1}{Q}\right)+1=(1+\alpha)(t-1)+1\leq(1+2\alpha)(t-1),

the latter using Q=s=1/\alpha together with t-1\geq s (which follows from the assumption t>s).      
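The walk-length correction referenced above can be summarized in a few lines of code. This is a minimal sketch under the assumptions of Round II (Q=s and t>s+1, so that 8.8 applies); the toy values in the example are illustrative only, since the construction needs s\geq 128.

import math

def corrected_walk_length(t, s):
    """Round II correction: pick the smallest l with s^l >= t; if s^l > t,
    take zeta = t / s^(l-1) in (1, s) and the integer P in [Q, s*Q] with
    0 <= P/Q - zeta <= 1/Q, and use t' = P * s^(l-1) vertices instead."""
    Q = s
    l = 1
    while s ** l < t:
        l += 1
    if s ** l == t:
        return t                        # attainable exactly; no correction
    zeta = t / s ** (l - 1)
    P = math.ceil(zeta * Q)             # smallest integer with P/Q >= zeta
    t_prime = P * s ** (l - 1)
    # Claim 8.8: t - 1 <= (t'-1)/Q <= (1 + 2/s)(t - 1)
    assert t - 1 <= (t_prime - 1) / Q <= (1 + 2.0 / s) * (t - 1)
    return t_prime

# Toy example (illustrative; the construction requires s >= 128):
assert corrected_walk_length(30, 4) == 128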

We apply Γ\Gamma again but this time to (D,ε,α,1)(D,\varepsilon,\alpha,1) to obtain new parameters (t′′,ε0(t^{\prime\prime},\varepsilon_{0}^{\prime}, θ\theta^{\prime}, d1d_{1}^{\prime}, λ1\lambda_{1}^{\prime}, n′′)n^{\prime\prime}), and graphs GG^{\prime} and HH^{\prime}.

Claim 8.9.

The code \mathcal{C}_{\ell}^{\prime} obtained from walks on t^{\prime} vertices (i.e., t^{\prime}-1 walk steps) on the s-wide replacement product of G^{\prime} and H^{\prime} from the second application of \Gamma has bias at most \varepsilon and rate \Omega(\varepsilon^{2+40\alpha}).

Proof.

Let d2=s4s2Qd_{2}=s^{4s^{2}\cdot Q}, b2=4slog2(d2)b_{2}=4s\log_{2}(d_{2}) and λ2=b2/d2\lambda_{2}=b_{2}/\sqrt{d_{2}} be the parameters of the first invocation of Γ\Gamma. Recall that tt was chosen to be the smallest integer satisfying

(\lambda_{2}^{2})^{(1-5\alpha)(1-2\alpha)(1-\alpha)(t-1)}\leq\varepsilon.

Let d2=s4s2d_{2}^{\prime}=s^{4s^{2}}, b2=4slog2(d2)b_{2}^{\prime}=4s\log_{2}(d_{2}^{\prime}) and λ2=b2/d2\lambda_{2}^{\prime}=b_{2}^{\prime}/\sqrt{d_{2}^{\prime}} be the parameters of the second invocation of Γ\Gamma. Observe that

(λ2)Q\displaystyle(\lambda_{2}^{\prime})^{Q} =(b2)Q(d2)Q=(b2Q)d2=(16s3log2(s))Qs2s2Q\displaystyle=\frac{(b_{2}^{\prime})^{Q}}{\sqrt{(d_{2}^{\prime})^{Q}}}=\frac{(b_{2}^{\prime Q})}{\sqrt{d_{2}}}=\frac{(16s^{3}\log_{2}(s))^{Q}}{s^{2s^{2}\cdot Q}}
\displaystyle\leq\frac{s^{4Q}}{s^{2s^{2}\cdot Q}}=\frac{1}{s^{2s^{2}\cdot Q(1-\frac{2}{s^{2}})}}\leq\left(\frac{1}{s^{2s^{2}\cdot Q}}\right)^{1-2\alpha}\leq\left(\frac{b_{2}}{\sqrt{d_{2}}}\right)^{1-2\alpha}=\lambda_{2}^{1-2\alpha}.

Then the bias of 𝒞\mathcal{C}^{\prime}_{\ell} is at most

(((λ2)Q)2)(15α)(1α)(t1)/Q\displaystyle(((\lambda_{2}^{\prime})^{Q})^{2})^{(1-5\alpha)(1-\alpha)(t^{\prime}-1)/Q} (λ22)(15α)(12α)(1α)(t1)/Q\displaystyle\leq(\lambda_{2}^{2})^{(1-5\alpha)(1-2\alpha)(1-\alpha)(t^{\prime}-1)/Q}
(λ22)(15α)(12α)(1α)(t1)ε.\displaystyle\leq(\lambda_{2}^{2})^{(1-5\alpha)(1-2\alpha)(1-\alpha)(t-1)}\leq\varepsilon.

For the rate computation of 𝒞\mathcal{C}^{\prime}_{\ell}, we will lower bound the term ((d2)2)(t1)((d^{\prime}_{2})^{2})^{-(t^{\prime}-1)}. Since d2=(d2)Qd_{2}=(d_{2}^{\prime})^{Q}, (d22)(t1)=Ω(ε2+26α12α)(d_{2}^{2})^{-(t-1)}=\Omega(\varepsilon^{\frac{2+26\cdot\alpha}{1-2\alpha}}) and t1Q(1+2α)(t1)\frac{t^{\prime}-1}{Q}\leq(1+2\alpha)(t-1) (the latter by 8.8), the rate of 𝒞\mathcal{C}_{\ell}^{\prime} is

Ω(((d2)2)(t1))=Ω((d22)(t1)/Q)=Ω((d22)(1+2α)(t1))=Ω((ε2+26α)1+2α12α)=Ω(ε2+40α).\Omega(((d^{\prime}_{2})^{2})^{-(t^{\prime}-1)})=\Omega((d_{2}^{2})^{-(t^{\prime}-1)/Q})=\Omega((d_{2}^{2})^{-(1+2\alpha)(t-1)})=\Omega((\varepsilon^{2+26\cdot\alpha})^{\frac{1+2\alpha}{1-2\alpha}})=\Omega(\varepsilon^{2+40\cdot\alpha}).

 
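The exponent manipulation at the end of this proof is again mechanical: (2+26\alpha)(1+2\alpha)/(1-2\alpha)\leq 2+40\alpha throughout the admissible range \alpha\leq 1/128. A one-line sanity check (ours, not part of the proof):

# Check (2 + 26a)(1 + 2a)/(1 - 2a) <= 2 + 40a for the admissible alphas.
for s in [128, 256, 1024, 4096]:
    a = 1.0 / s
    assert (2 + 26 * a) * (1 + 2 * a) / (1 - 2 * a) <= 2 + 40 * a, s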

8.3 Round III: Vanishing β\beta as ε\varepsilon Vanishes

We are given the dimension of the code DD and ε(0,1/2)\varepsilon\in(0,1/2). As before, we set a parameter α1/128\alpha\leq 1/128 such that (for convenience) 1/α1/\alpha is a power of 22. Set s1/αs\coloneqq 1/\alpha.

We will consider the regime where s is a function of \varepsilon. As a consequence, the parameters d_{2},\lambda_{2},d_{1},\lambda_{1},\varepsilon_{0} will also depend on \varepsilon. Since x\leq 1/\log_{2}(1/x) for x\leq 1/2 (and \alpha\leq 1/2), if \alpha satisfies \alpha^{6}/4\geq 1/\log_{2}(1/\varepsilon), it also satisfies Eq. 9 (we lose a log factor by replacing 1/\log_{2}(1/\alpha) by \alpha, but we will favor simplicity of parameters). In particular, we can set \alpha so that s is

s=Θ((log2(1/ε))1/6),s=\Theta((\log_{2}(1/\varepsilon))^{1/6}),

and satisfy Eq. 9.
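Concretely, one admissible choice is the largest power of two s with s^{6}\leq\log_{2}(1/\varepsilon)/4 (equivalently \alpha^{6}/4\geq 1/\log_{2}(1/\varepsilon)), as in the following sketch; since the relevant \varepsilon are far below floating-point range, the sketch takes \log_{2}(1/\varepsilon) as its input.

def round_three_s(log2_inv_eps):
    """Round III choice of s = 1/alpha: the largest power of two with
    s^6 <= log2(1/eps)/4, i.e., alpha^6/4 >= 1/log2(1/eps), which gives
    s = Theta((log2(1/eps))^(1/6)) and keeps Eq. 9 satisfied."""
    s = 2
    while (2 * s) ** 6 <= log2_inv_eps / 4:
        s *= 2
    return s

# Example: the smallest budget admitting s = 128 is log2(1/eps) = 4 * 128^6.
assert round_three_s(4 * 128 ** 6) == 128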

We follow the same choices as in Round II except for the base code 𝒞0\mathcal{C}_{0}.

The base code \mathcal{C}_{0}.  Set \varepsilon_{0}=1/d_{2}^{2}=\lambda_{2}^{4}/b_{2}^{4}\leq\lambda_{2}^{4}/3. We choose an \varepsilon_{0}-balanced code \mathcal{C}_{0} with support size n=O(D/\varepsilon_{0}^{c}) where c=2.001 (this choice of c is arbitrary; any fixed constant slightly larger than 2 suffices) using the construction from Round II. It is crucial that we can uniquely decode \mathcal{C}_{0} (using our algorithm), since this is required in order to apply the list decoding framework.

Note that ε0\varepsilon_{0} is no longer a constant. For this reason, we need to consider the rate computation of the final code 𝒞\mathcal{C}_{\ell} more carefully. The proof will follow an argument similar to Ta-Shma’s.

Claim 8.10.

𝒞\mathcal{C}_{\ell} has rate Ω(ε2+26α)\Omega(\varepsilon^{2+26\cdot\alpha}) where α=Θ(1/(log2(1/ε))1/6)\alpha=\Theta(1/(\log_{2}(1/\varepsilon))^{1/6}).

Proof.

The support size is the number of walks of length t1t-1 on the ss-wide replacement product of GG and HH (each step of the walk has d22d_{2}^{2} options), which is

|V(G)||V(H)|d22(t1)=nd1sd22(t1)\displaystyle|V(G)||V(H)|d_{2}^{2(t-1)}=n^{\prime}\cdot d_{1}^{s}\cdot d_{2}^{2(t-1)} =nd22(t1)+4snd22(t1)+4s\displaystyle=n^{\prime}\cdot d_{2}^{2(t-1)+4s}\leq n\cdot d_{2}^{2(t-1)+4s}
=Θ(Dε0cd22(t1)+4s)\displaystyle=\Theta\left(\frac{D}{\varepsilon_{0}^{c}}\cdot d_{2}^{2(t-1)+4s}\right)
=Θ(D(d22)(t1)+2s+2.001)\displaystyle=\Theta\left(D\cdot(d_{2}^{2})^{(t-1)+2s+2.001}\right)
=O(D(d22)(1+2α)(t1)).\displaystyle=O\left(D\cdot(d_{2}^{2})^{(1+2\alpha)(t-1)}\right).

From this point the proof continues exactly as the proof of 8.5.      

8.4 Round IV: Arbitrary Gentle List Decoding

In round III, when we take

s=Θ((log2(1/ε))1/6),s=\Theta((\log_{2}(1/\varepsilon))^{1/6}),

we will have \lambda_{2}=4s\log_{2}(s^{4s^{2}})/s^{2s^{2}}\leq s^{-s^{2}} provided s is large enough. This non-constant \lambda_{2} will allow us to perform “gentle” list decoding with radius arbitrarily close to 1/2. More precisely, we have the following.

Theorem 8.11 (Gentle List Decoding (restatement of Theorem 1.2)).

For every ε>0\varepsilon>0 sufficiently small, there are explicit binary linear Ta-Shma codes 𝒞N,ε,β𝔽2N\mathcal{C}_{N,\varepsilon,\beta}\subseteq\mathbb{F}_{2}^{N} for infinitely many values NN\in\mathbb{N} with

  1. (i)

    distance at least 1/2ε/21/2-\varepsilon/2 (actually ε\varepsilon-balanced),

  2. (ii)

    rate Ω(ε2+β)\Omega(\varepsilon^{2+\beta}) where β=O(1/(log2(1/ε))1/6)\beta=O(1/(\log_{2}(1/\varepsilon))^{1/6}), and

  3. (iii)

    a list decoding algorithm that decodes within radius 1/22Θ((log2(1/ε))1/6)1/2-2^{-\Theta((\log_{2}(1/\varepsilon))^{1/6})} in time NOε,β(1)N^{O_{\varepsilon,\beta}(1)}.

Proof.

We consider some parameter requirements in order to apply the list decoding framework Theorem 9.1 between 𝒞1\mathcal{C}_{\ell-1} and 𝒞\mathcal{C}_{\ell}. Suppose we want to list decode within radius 1/2η1/2-\sqrt{\eta}. For parity sampling, we need

sΘ(log2(1/η)).s\geq\Theta(\log_{2}(1/\eta)).

Since the number of vertices in a walk can be at most s2s^{2}, for splittability we need

η8/(s222s2)2ss2.\eta^{8}/(s^{2}\cdot 2^{2s^{2}})\geq 2\cdot s^{-s^{2}}.

In particular, we can take η=2Θ(s)\eta=2^{-\Theta(s)} and satisfy both conditions above.      
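Both conditions can be checked directly in \log_{2} scale. The sketch below verifies the splittability condition \eta^{8}/(s^{2}\cdot 2^{2s^{2}})\geq 2\cdot s^{-s^{2}} for \eta=2^{-cs} (the parity sampling condition s\geq\Theta(\log_{2}(1/\eta)) is immediate for such \eta); the constant c stands in for the hidden constant in \eta=2^{-\Theta(s)} and is a placeholder.

import math

def gentle_conditions_hold(s, c=1.0):
    """Check (in log2 scale) the Round IV splittability condition
    eta^8 / (s^2 * 2^(2 s^2)) >= 2 * s^(-s^2) for eta = 2^(-c*s).
    The constant c is a placeholder for the Theta(.) in eta = 2^(-Theta(s))."""
    log2_lhs = -8 * c * s - 2 * math.log2(s) - 2 * s * s   # log2 of eta^8/(s^2 2^(2s^2))
    log2_rhs = 1 - s * s * math.log2(s)                    # log2 of 2 * s^(-s^2)
    return log2_lhs >= log2_rhs

assert gentle_conditions_hold(128)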

9 Instantiating the List Decoding Framework

We established the tensoriality (actually two-step tensoriality) and parity sampling properties of every lifting between consecutive codes 𝒞i1\mathcal{C}_{i-1} and 𝒞i\mathcal{C}_{i} in Ta-Shma’s cascade. Using these properties, we will be able to invoke the list decoding framework from [AJQ+20] to obtain the following list decoding result.

Theorem 9.1 (Restatement of Theorem 6.1).

Let η0(0,1/4)\eta_{0}\in(0,1/4) be a constant, η(0,η0)\eta\in(0,\eta_{0}), and

kk0(η)Θ(log(1/η)).k\geq k_{0}(\eta)\coloneqq\Theta(\log(1/\eta)).

Suppose 𝒞𝔽2n\mathcal{C}\subseteq{\mathbb{F}}_{2}^{n} is an η0\eta_{0}-balanced linear code and 𝒞=dsumW(k)(𝒞)\mathcal{C}^{\prime}=\operatorname{dsum}_{W(k)}(\mathcal{C}) is the direct sum lifting of 𝒞\mathcal{C} on a τ\tau-splittable collection of walks W(k)W(k), where W(k)W(k) is either the set of walks W[0,s]W[0,s] on an ss-wide replacement product graph or a set of walks using the random walk operator 𝖲r,r\mathsf{S}_{r,r}^{\bigtriangleup}. There exists an absolute constant K>0K>0 such that if

ττ0(η,k)η8Kk24k,\tau\leq\tau_{0}(\eta,k)\coloneqq\frac{\eta^{8}}{K\cdot k\cdot 2^{4k}},

then the code 𝒞\mathcal{C}^{\prime} is η\eta-balanced and can be efficiently list decoded in the following sense:

If y~\tilde{y} is (1/2η)(1/2-\sqrt{\eta})-close to 𝒞\mathcal{C}^{\prime}, then we can compute the list

(y~,𝒞,𝒞){(z,dsumW(k)(z))z𝒞,Δ(dsumW(k)(z),y~)12η}\mathcal{L}(\tilde{y},\mathcal{C},\mathcal{C}^{\prime})\coloneqq\left\{(z,\operatorname{dsum}_{W(k)}(z))\mid z\in\mathcal{C},\Delta\left\lparen\operatorname{dsum}_{W(k)}(z),\tilde{y}\right\rparen\leq\frac{1}{2}-\sqrt{\eta}\right\}

in time

nO(1/τ0(η,k)4)f(n),n^{O(1/\tau_{0}(\eta,k)^{4})}\cdot f(n),

where f(n) is the running time of a unique decoding algorithm for \mathcal{C}. Otherwise, we return \mathcal{L}(\tilde{y},\mathcal{C},\mathcal{C}^{\prime})=\emptyset with the same running time as in the preceding case. (If \tilde{y} is not (1/2-\sqrt{\eta})-close to \mathcal{C}^{\prime} but the SOS program turns out to be feasible, some of the calls to the unique decoding algorithm of \mathcal{C} issued by the list decoding framework might be outside all unique decoding balls. Such cases may be handled by returning failure if the algorithm does not terminate by time f(n). Even if a codeword in \mathcal{C} is found, the pruning step of list decoding [AJQ+20] will return an empty list for \mathcal{L}(\tilde{y},\mathcal{C},\mathcal{C}^{\prime}) since \tilde{y} is not (1/2-\sqrt{\eta})-close to \mathcal{C}^{\prime}.)

9.1 List Decoding Framework

We recall the precise statement of the list decoding framework tailored to direct sum lifting.

Theorem 9.2 (List Decoding Theorem (Adapted from [AJQ+20])).

Suppose dsumW(k)\operatorname{dsum}_{W(k)} is an (η8/230,L)(\eta^{8}/2^{30},L)-two-step tensorial direct sum lifting from an η0\eta_{0}-balanced code 𝒞𝔽2n\mathcal{C}\subseteq\mathbb{F}_{2}^{n} to 𝒞\mathcal{C}^{\prime} on a multiset W(k)[n]kW(k)\subseteq[n]^{k} which is a (1/2+η0/2,η)(1/2+\eta_{0}/2,\eta)-parity sampler.

Let \tilde{y}\in\mathbb{F}_{2}^{W(k)} be (1/2-\sqrt{\eta})-close to \mathcal{C}^{\prime}. Then the List Decoding algorithm returns the coupled code list \mathcal{L}(\tilde{y},\mathcal{C},\mathcal{C}^{\prime}). Furthermore, the running time is n^{O(L+k)}\left(\mathrm{polylog}(1/\eta)+f(n)\right), where f(n) is the running time of a unique decoding algorithm of \mathcal{C}.

We apply the list decoding framework of Theorem 9.2 to the liftings arising in the Ta-Shma cascade to obtain Theorem 9.1. This requires choosing parameters so that both the parity sampling and tensoriality requirements are met at every level of the cascade, which we do by appealing to our results from Section 7.

Proof of Theorem 9.1.

We want to define parameters for τ\tau-splittability so that W(k)W(k) satisfies strong enough parity sampling and tensoriality assumptions to apply Theorem 9.2.

For parity sampling, we require W(k) to be a (1/2+\eta_{0}/2,\eta)-parity sampler. Suppose W(k) is \tau-splittable with \tau<1/16. By Corollary 7.4 or Corollary 7.7 and splittability, the collection of walks W(k) is an (\eta_{0}^{\prime},\eta^{\prime})-parity sampler, where \eta^{\prime}\leq(\eta_{0}^{\prime}+2\tau)^{\lfloor(k-1)/2\rfloor}. To achieve the desired parity sampling, we take \eta_{0}^{\prime}=1/2+\eta_{0}/2 and choose a value of k large enough so that \eta^{\prime}\leq\eta. Using the assumption \eta_{0}<1/4, we compute

η=(η0+2τ)(k1)/2(1/2+η0/2+2τ)k/21<(3/4)k/21,\eta^{\prime}=(\eta_{0}^{\prime}+2\tau)^{\lfloor(k-1)/2\rfloor}\leq(1/2+\eta_{0}/2+2\tau)^{k/2-1}<(3/4)^{k/2-1},

which will be smaller than η\eta as long as kk is at least

k0(η)=2(1+log(1/η)log(4/3))=Θ(log(1/η)).k_{0}(\eta)=2\left(1+\frac{\log(1/\eta)}{\log(4/3)}\right)=\Theta(\log(1/\eta)).

Achieving this level of parity sampling also ensures that the lifted code 𝒞\mathcal{C}^{\prime} is η\eta-balanced.

The list decoding theorem also requires (η8/230,L)(\eta^{8}/2^{30},L)-two-step tensoriality. Lemma 7.24 (with s=ks=k) and Lemma 7.25 each provide (μ,L)(\mu,L)-two-step tensoriality for τ\tau-splittable walk collections on the ss-wide replacement product and using 𝖲r,r\mathsf{S}_{r,r}^{\bigtriangleup}, respectively, with

L128k424kμ4 and τμ4k24k.L\geq\frac{128k^{4}\cdot 2^{4k}}{\mu^{4}}\text{\quad and \quad}\tau\leq\frac{\mu}{4k\cdot 2^{4k}}.

To get μ=η8/230\mu=\eta^{8}/2^{30}, we require

LKk424kη32 and ττ0(η,k)=η8Kk24k,L\geq\frac{K^{\prime}\cdot k^{4}\cdot 2^{4k}}{\eta^{32}}\text{\quad and \quad}\tau\leq\tau_{0}(\eta,k)=\frac{\eta^{8}}{K\cdot k\cdot 2^{4k}},

where KK and KK^{\prime} are (very large) constants. This ensures that τ\tau is small enough for the parity sampling requirement as well. With these parameters, the running time for the list decoding algorithm in Theorem 9.2 becomes

nO(L+k)(polylog(1/η)+f(n))=nO(L)f(n)=nO(1/τ0(η,k)4)f(n).n^{O(L+k)}({\mathrm{polylog}}(1/\eta)+f(n))=n^{O(L)}\cdot f(n)=n^{O(1/\tau_{0}(\eta,k)^{4})}\cdot f(n).

 
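For reference, the two thresholds appearing in this proof in executable form; K=1 below is a placeholder for the (very large) absolute constant of Theorem 9.1.

import math

def k0(eta):
    """Parity-sampling threshold from the proof:
    k0(eta) = 2 * (1 + log(1/eta)/log(4/3))."""
    return 2 * (1 + math.log(1 / eta) / math.log(4 / 3))

def tau0(eta, k, K=1.0):
    """Splittability threshold tau0(eta, k) = eta^8 / (K * k * 2^(4k)); the
    true absolute constant K is large, and K = 1 is only a placeholder."""
    return eta ** 8 / (K * k * 2 ** (4 * k))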

For decoding in fixed polynomial time, we also need a variation of list decoding in which we do not run the unique decoding algorithm of the base code and only obtain an approximate list of solutions. The proof is very similar to the proof of Theorem 9.1 above.

Theorem 9.3 (Restatement of Theorem 6.12).

Let η0(0,1/4)\eta_{0}\in(0,1/4) be a constant, η(0,η0)\eta\in(0,\eta_{0}), ζ=1/8η0/8\zeta=1/8-\eta_{0}/8, and

kk0(η)Θ(log(1/η)).k\geq k_{0}^{\prime}(\eta)\coloneqq\Theta(\log(1/\eta)).

Suppose 𝒞𝔽2n\mathcal{C}\subseteq{\mathbb{F}}_{2}^{n} is an η0\eta_{0}-balanced linear code and 𝒞=dsumW(k)(𝒞)\mathcal{C}^{\prime}=\operatorname{dsum}_{W(k)}(\mathcal{C}) is the direct sum lifting of 𝒞\mathcal{C} on a τ\tau-splittable collection of walks W(k)W(k), where W(k)W(k) is either the set of walks W[0,s]W[0,s] on an ss-wide replacement product graph or a set of walks using the random walk operator 𝖲r,r\mathsf{S}_{r,r}^{\bigtriangleup}. There exists an absolute constant K>0K>0 such that if

ττ0(η,k)η8Kk24k,\tau\leq\tau_{0}(\eta,k)\coloneqq\frac{\eta^{8}}{K\cdot k\cdot 2^{4k}},

then the code 𝒞\mathcal{C}^{\prime} is η\eta-balanced, W(k)W(k) is a (12ζ,η)(1-2\zeta,\eta)-parity sampler, and we have the following:

If y~\tilde{y} is (1/2η)(1/2-\sqrt{\eta})-close to 𝒞\mathcal{C}^{\prime}, then we can compute a ζ\zeta-cover \mathcal{L}^{\prime} of the list

(y~,𝒞,𝒞){(z,dsumW(k)(z))z𝒞,Δ(dsumW(k)(z),y~)12η}\mathcal{L}(\tilde{y},\mathcal{C},\mathcal{C}^{\prime})\coloneqq\left\{(z,\operatorname{dsum}_{W(k)}(z))\mid z\in\mathcal{C},\Delta\left\lparen\operatorname{dsum}_{W(k)}(z),\tilde{y}\right\rparen\leq\frac{1}{2}-\sqrt{\eta}\right\}

in which \Delta(y^{\prime},\tilde{y})\leq 1/2-\sqrt{\eta} for every (z^{\prime},y^{\prime})\in\mathcal{L}^{\prime} (a randomized rounding will ensure this, but see Appendix D for obtaining this property deterministically), in time

nO(1/τ0(η,k)4).n^{O(1/\tau_{0}(\eta,k)^{4})}.

Otherwise, we return =\mathcal{L}^{\prime}=\emptyset with the same running time of the preceding case.

Proof.

The list decoding framework produces a cover \mathcal{L}^{\prime} of the list (y~,𝒞,𝒞)\mathcal{L}(\tilde{y},\mathcal{C},\mathcal{C}^{\prime}), and, as its final step, corrects the cover to obtain the actual list (y~,𝒞,𝒞)\mathcal{L}(\tilde{y},\mathcal{C},\mathcal{C}^{\prime}) by running the unique decoding algorithm of 𝒞\mathcal{C} on each entry of \mathcal{L}^{\prime} (see [AJQ+20] for details). Using Theorem 9.2 with a (12ζ,η)(1-2\zeta,\eta)-parity sampler and omitting this final step of the algorithm, we can obtain the ζ\zeta-cover \mathcal{L}^{\prime} in time nO(L+k)polylog(1/η)n^{O(L+k)}{\mathrm{polylog}}(1/\eta).

The tensoriality part of the proof of Theorem 9.1 applies here unchanged, so we need only make sure kk is large enough to yield the stronger parity sampling necessary for this theorem. As in that proof, we have that W(k)W(k) is an (η0,η)(\eta_{0}^{\prime},\eta^{\prime})-parity sampler with η(η0+2τ)(k1)/2\eta^{\prime}\leq(\eta_{0}^{\prime}+2\tau)^{\lfloor(k-1)/2\rfloor}. Take η0=12ζ=3/4+η0/4\eta_{0}^{\prime}=1-2\zeta=3/4+\eta_{0}/4. Using η0<1/4\eta_{0}<1/4 and assuming τ<1/16\tau<1/16, we have

η(η0+2τ)(k1)/2(3/4+η0/4+2τ)k/21<(15/16)k/21,\eta^{\prime}\leq(\eta_{0}^{\prime}+2\tau)^{\lfloor(k-1)/2\rfloor}\leq(3/4+\eta_{0}/4+2\tau)^{k/2-1}<(15/16)^{k/2-1},

which will be smaller than η\eta as long as kk is at least

k0(η)=2(1+log(1/η)log(16/15))=Θ(log(1/η)).k_{0}^{\prime}(\eta)=2\left(1+\frac{\log(1/\eta)}{\log(16/15)}\right)=\Theta(\log(1/\eta)).

 

Acknowledgement

We thank Amnon Ta-Shma for suggesting the problem of decoding in fixed polynomial running time (with the exponent of NN independent of ε\varepsilon) which led us to think about Theorem 6.9. Part of this work was done when some of the authors were visiting the Simons Institute in Berkeley, and we thank them for their kind hospitality.

References

  • [ABN+92] N. Alon, J. Bruck, J. Naor, M. Naor, and R. Roth. Construction of asymptotically good, low-rate error-correcting codes through pseudo-random graphs. IEEE Transactions on Information Theory, 38(2):509–516, 1992.
  • [AGHP92] N. Alon, O. Goldreich, J. Håstad, and R. Peralta. Simple constructions of almost kk-wise independent random variables. Random Structures and Algorithms, 3(3):289–304, 1992.
  • [AJQ+20] Vedat Levi Alev, Fernando Granha Jeronimo, Dylan Quintana, Shashank Srivastava, and Madhur Tulsiani. List decoding of direct sum codes. In Proceedings of the 31st ACM-SIAM Symposium on Discrete Algorithms, pages 1412–1425. SIAM, 2020.
  • [AJT19] Vedat Levi Alev, Fernando Granha Jeronimo, and Madhur Tulsiani. Approximating constraint satisfaction problems on high-dimensional expanders. In Proceedings of the 60th IEEE Symposium on Foundations of Computer Science, pages 180–201, 2019.
  • [Ari09] E. Arikan. Channel polarization: A method for constructing capacity-achieving codes for symmetric binary-input memoryless channels. IEEE Transactions on Information Theory, 55(7):3051–3073, July 2009.
  • [Aro02] Sanjeev Arora. How NP got a new definition: a survey of probabilistically checkable proofs. In Proceedings of the International Congress of Mathematicians, pages 637–648, 2002. Volume 3.
  • [Bog12] Andrej Bogdanov. A different way to improve the bias via expanders. Lecture notes, April 2012. URL: http://www.cse.cuhk.edu.hk/~andrejb/csc5060/notes/12L12.pdf.
  • [BRS11] Boaz Barak, Prasad Raghavendra, and David Steurer. Rounding semidefinite programming hierarchies via global correlation. In Proceedings of the 52nd IEEE Symposium on Foundations of Computer Science, pages 472–481, 2011.
  • [Cha16] Siu On Chan. Approximation resistance from pairwise-independent subgroups. J. ACM, 63(3), August 2016.
  • [Chu97] F. R. K. Chung. Spectral Graph Theory. American Mathematical Society, 1997.
  • [CT06] Thomas M. Cover and Joy A. Thomas. Elements of Information Theory (Wiley Series in Telecommunications and Signal Processing). Wiley-Interscience, New York, NY, USA, 2006.
  • [DDG+15] Roee David, Irit Dinur, Elazar Goldenberg, Guy Kindler, and Igor Shinkar. Direct sum testing. ITCS ’15, pages 327–336, New York, NY, USA, 2015. ACM.
  • [Del75] P. Delsarte. The association schemes of coding theory. In Combinatorics, pages 143–161. Springer Netherlands, 1975.
  • [DHK+19] Irit Dinur, Prahladh Harsha, Tali Kaufman, Inbal Livni Navon, and Amnon Ta-Shma. List decoding with double samplers. In Proceedings of the 30th ACM-SIAM Symposium on Discrete Algorithms, pages 2134–2153, 2019.
  • [DK17] Irit Dinur and Tali Kaufman. High dimensional expanders imply agreement expanders. In Proceedings of the 58th IEEE Symposium on Foundations of Computer Science, pages 974–985, 2017.
  • [DS14] Irit Dinur and David Steurer. Direct product testing. In Proceedings of the 29th IEEE Conference on Computational Complexity, CCC ’14, pages 188–196, 2014.
  • [Gal62] R. Gallager. Low-density parity-check codes. IRE Transactions on Information Theory, 8(1):21–28, 1962.
  • [GI01] Venkatesan Guruswami and Piotr Indyk. Expander-based constructions of efficiently decodable codes. In Proceedings of the 42nd IEEE Symposium on Foundations of Computer Science, pages 658–667, 2001.
  • [GI03] Venkatesan Guruswami and Piotr Indyk. Linear time encodable and list decodable codes. In Proceedings of the 35th ACM Symposium on Theory of Computing, 2003.
  • [GI04] Venkatesan Guruswami and Piotr Indyk. Efficiently decodable codes meeting Gilbert-Varshamov bound for low rates. In Proceedings of the 15th ACM-SIAM Symposium on Discrete Algorithms, SODA ’04, pages 756–757, 2004.
  • [Gil52] E.N. Gilbert. A comparison of signalling alphabets. Bell System Technical Journal, 31:504–522, 1952.
  • [GKO+17] Sivakanth Gopi, Swastik Kopparty, Rafael Oliveira, Noga Ron-Zewi, and Shubhangi Saraf. Locally testable and locally correctable codes approaching the Gilbert-Varshamov bound. In Proceedings of the 28th ACM-SIAM Symposium on Discrete Algorithms, SODA ’17, pages 2073–2091, 2017.
  • [GR06] Venkatesan Guruswami and Atri Rudra. Explicit capacity-achieving list-decodable codes. In Proceedings of the 38th ACM Symposium on Theory of Computing, pages 1–10, 2006.
  • [GR08] Venkatesan Guruswami and Atri Rudra. Concatenated codes can achieve list-decoding capacity. In Proceedings of the 19th ACM-SIAM Symposium on Discrete Algorithms, SODA ’08, pages 258–267, 2008.
  • [Gri01] Dima Grigoriev. Linear lower bound on degrees of positivstellensatz calculus proofs for the parity. Theor. Comput. Sci., 259(1-2):613–622, 2001.
  • [GRS19] Venkatesan Guruswami, Atri Rudra, and Madhu Sudan. Essential coding theory. 2019.
  • [GRY19] Venkatesan Guruswami, Andrii Riazanov, and Min Ye. Arikan meets Shannon: Polar codes with near-optimal convergence to channel capacity. Electronic Colloquium on Computational Complexity (ECCC), 26:154, 2019.
  • [GS11] Venkatesan Guruswami and Ali Kemal Sinop. Lasserre hierarchy, higher eigenvalues, and approximation schemes for graph partitioning and quadratic integer programming with psd objectives. In FOCS, pages 482–491, 2011.
  • [Gur05] Venkatesan Guruswami. Algebraic-geometric generalizations of the Parvaresh-Vardy codes. Electronic Colloquium on Computational Complexity (ECCC), (132), 2005.
  • [Gur09] Venkatesan Guruswami. List decoding of binary codes–a brief survey of some recent results. In Coding and Cryptology, pages 97–106. Springer Berlin Heidelberg, 2009.
  • [Gur10] Venkatesan Guruswami. Bridging Shannon and Hamming: List error-correction with optimal rate. In ICM, 2010.
  • [Ham50] Richard Hamming. Error detecting and error correcting codes. Bell System Technical Journal, 29:147–160, 1950.
  • [Hås97] J. Håstad. Some optimal inapproximability results. In Proceedings of the 29th ACM Symposium on Theory of Computing, pages 1–10, 1997.
  • [Hås01] Johan Håstad. Some optimal inapproximability results. Journal of the ACM, 48(4):798–859, 2001.
  • [HRW17] B. Hemenway, N. Ron-Zewi, and M. Wootters. Local list recovery of high-rate tensor codes applications. In Proceedings of the 58th IEEE Symposium on Foundations of Computer Science, pages 204–215, Oct 2017.
  • [IKW09] Russell Impagliazzo, Valentine Kabanets, and Avi Wigderson. New direct-product testers and 2-query PCPs. In Proceedings of the 41st ACM Symposium on Theory of Computing, STOC ’09, pages 131–140, 2009.
  • [IW97] Russell Impagliazzo and Avi Wigderson. P=BPPP=BPP unless EE has sub-exponential circuits. In Proceedings of the 29th ACM Symposium on Theory of Computing, pages 220–229, 1997.
  • [LPS88] Alexander Lubotzky, R. Phillips, and Peter Sarnak. Ramanujan graphs. Combinatorica, 8:261–277, 1988.
  • [MMT11] Alexander May, Alexander Meurer, and Enrico Thomae. Decoding random linear codes in 𝒪~(20.054n)\tilde{\mathcal{O}}(2^{0.054n}). In Advances in Cryptology – ASIACRYPT 2011, pages 107–124, 2011.
  • [MRRW77] R. McEliece, E. Rodemich, H. Rumsey, and L. Welch. New upper bounds on the rate of a code via the Delsarte-MacWilliams inequalities. IEEE Transactions on Information Theory, 23(2):157–166, 1977.
  • [MRRZ+19] Jonathan Mosheiff, Nicolas Resch, Noga Ron-Zewi, Shashwat Silas, and Mary Wootters. LDPC codes achieve list decoding capacity, 2019. arXiv:1909.06430.
  • [NN90] J. Naor and M. Naor. Small-bias probability spaces: efficient constructions and applications. In Proceedings of the 22nd ACM Symposium on Theory of Computing, pages 213–223, 1990.
  • [NS09] Michael Navon and Alex Samorodnitsky. Linear programming bounds for codes via a covering argument. Discrete Comput. Geom., 41(2):199–207, March 2009.
  • [RT12] Prasad Raghavendra and Ning Tan. Approximating CSPs with global cardinality constraints using SDP hierarchies. In Proceedings of the 23rd ACM-SIAM Symposium on Discrete Algorithms, SODA 2012, pages 373–387, 2012.
  • [RVW00] O. Reingold, S. Vadhan, and A. Wigderson. Entropy waves, the zig-zag graph product, and new constant-degree expanders and extractors. In Proceedings of the 41st IEEE Symposium on Foundations of Computer Science, 2000.
  • [Sha48] Claude Shannon. A mathematical theory of communications. Bell System Technical Journal, 27:379–423, 623–656, 1948.
  • [SKS19] Ronen Shaltiel, Swastik Kopparty, and Jad Silbak. Quasilinear time list-decodable codes for space bounded channels. 2019.
  • [Sti08] Henning Stichtenoth. Algebraic Function Fields and Codes. Springer Publishing Company, Incorporated, 2nd edition, 2008.
  • [Tho83] C. Thommesen. The existence of binary linear concatenated codes with Reed–Solomon outer codes which asymptotically meet the Gilbert-Varshamov bound. IEEE Transactions on Information Theory, 29(6):850–853, November 1983.
  • [TS17] Amnon Ta-Shma. Explicit, almost optimal, epsilon-balanced codes. In Proceedings of the 49th ACM Symposium on Theory of Computing, STOC 2017, pages 238–251, New York, NY, USA, 2017. ACM.
  • [Var57] R.R. Varshamov. Estimate of the number of signals in error correcting codes. Doklady Akademii Nauk SSSR, 117:739–741, 1957.
  • [vL99] Jacobus H. van Lint. Introduction to Coding Theory. Springer-Verlag, 1999.

Appendix A Auxiliary Results to Obtain Tensoriality

A key result used in the SOS rounding analysis is embodied in Lemma A.1 below. Roughly speaking, it quantifies the decrease in the potential \Phi^{G}, under conditioning on a random \mathbf{Y}_{i} for i\sim V, when the ensemble \{\mathbf{Y}_{i}\} has non-trivial correlation over the edges and G is a strong enough expander graph. A generalization of this result to low threshold rank graphs was presented in [BRS11]. To derive sharper parameters in the simpler expander case and to make the presentation self-contained, we give (essentially) a full proof of this result.

Lemma A.1 (Progress Lemma).

Suppose G satisfies \lambda_{2}(G)\leq\beta^{2}/q^{4}. If

\mathbb{E}_{i\sim j}\left[\left\lVert\{\mathbf{Y}_{i}\mathbf{Y}_{j}\}-\{\mathbf{Y}_{i}\}\{\mathbf{Y}_{j}\}\right\rVert_{1}\right]\geq\beta,

then

\mathbb{E}_{j\sim V}\left[\Phi^{G}_{|\mathbf{Y}_{j}}\right]\leq\Phi^{G}-\frac{\beta^{2}}{4\cdot q^{4}}.

A.1 Expander Case

We will need the following characterization of the spectral gap of a regular graph G. We denote by \mathsf{A}_{G} its adjacency operator and by \mathsf{L}_{G} its Laplacian operator [Chu97].

Fact A.2 (Spectral Gap [Chu97]).
λ2(𝖫G)=minv1,,vnn𝔼ijvivj2𝔼i,jVvivj2.\lambda_{2}(\mathsf{L}_{G})~{}=~{}\min_{v_{1},\dots,v_{n}\in\mathbb{R}^{n}}\frac{{\mathbb{E}}_{i\sim j}\left\lVert v_{i}-v_{j}\right\rVert^{2}}{{\mathbb{E}}_{i,j\sim V}\left\lVert v_{i}-v_{j}\right\rVert^{2}}.

Using the above characterization, we derive the following local-to-global result.

Lemma A.3 (Local-to-Global).

Let v1,,vnnv_{1},\dots,v_{n}\in\mathbb{R}^{n} be vectors in the unit ball. Suppose λ2(𝖫G)1β/2\lambda_{2}(\mathsf{L}_{G})\geq 1-\beta/2 (equivalently λ2(𝖠G)β/2\lambda_{2}(\mathsf{A}_{G})\leq\beta/2). If 𝔼ijvi,vjβ{\mathbb{E}}_{i\sim j}\left\langle v_{i},v_{j}\right\rangle\geq\beta, then

𝔼i,jVvi,vjβ2.{\mathbb{E}}_{i,j\sim V}\left\langle v_{i},v_{j}\right\rangle~{}\geq~{}\frac{\beta}{2}.
Proof.

Using A.2, we have

λ2(𝖫G)𝔼iVvi2𝔼ijvi,vj𝔼iVvi2𝔼i,jVvi,vj.\lambda_{2}(\mathsf{L}_{G})~{}\leq~{}\frac{{\mathbb{E}}_{i\sim V}\left\lVert v_{i}\right\rVert^{2}-{\mathbb{E}}_{i\sim j}\left\langle v_{i},v_{j}\right\rangle}{{\mathbb{E}}_{i\sim V}\left\lVert v_{i}\right\rVert^{2}-{\mathbb{E}}_{i,j\sim V}\left\langle v_{i},v_{j}\right\rangle}.

Set λ2=λ2(𝖫G)\lambda_{2}=\lambda_{2}(\mathsf{L}_{G}). We consider two cases: λ21\lambda_{2}\leq 1 and λ2>1\lambda_{2}>1. First, suppose λ21\lambda_{2}\leq 1. Then

𝔼i,jVvi,vj\displaystyle{\mathbb{E}}_{i,j\sim V}\left\langle v_{i},v_{j}\right\rangle 1λ2𝔼ijvi,vj(1λ2λ2)𝔼iVvi2\displaystyle~{}\geq~{}\frac{1}{\lambda_{2}}{\mathbb{E}}_{i\sim j}\left\langle v_{i},v_{j}\right\rangle-\left(\frac{1-\lambda_{2}}{\lambda_{2}}\right){\mathbb{E}}_{i\sim V}\left\lVert v_{i}\right\rVert^{2}
1λ2(β(1λ2))\displaystyle~{}\geq~{}\frac{1}{\lambda_{2}}\left(\beta-\left(1-\lambda_{2}\right)\right)
1λ2(β(β2))β2.\displaystyle~{}\geq~{}\frac{1}{\lambda_{2}}\left(\beta-\left(\frac{\beta}{2}\right)\right)\geq\frac{\beta}{2}.

Now suppose λ2>1\lambda_{2}>1. Then

𝔼i,jVvi,vj\displaystyle{\mathbb{E}}_{i,j\sim V}\left\langle v_{i},v_{j}\right\rangle 1λ2𝔼ijvi,vj(1λ2λ2)𝔼iVvi2\displaystyle~{}\geq~{}\frac{1}{\lambda_{2}}{\mathbb{E}}_{i\sim j}\left\langle v_{i},v_{j}\right\rangle-\left(\frac{1-\lambda_{2}}{\lambda_{2}}\right){\mathbb{E}}_{i\sim V}\left\lVert v_{i}\right\rVert^{2}
1λ2𝔼ijvi,vj1λ2ββ2,\displaystyle~{}\geq~{}\frac{1}{\lambda_{2}}{\mathbb{E}}_{i\sim j}\left\langle v_{i},v_{j}\right\rangle\geq\frac{1}{\lambda_{2}}\cdot\beta\geq\frac{\beta}{2},

where the last inequality follows from λ22\lambda_{2}\leq 2 for any graph GG.      

More Preliminaries

We will need some standard notions in information theory [CT06].

Definition A.4 (Relative Entropy/Kullback-Leibler Divergence).

The relative entropy of two distributions D1D_{1} and D2D_{2} with support contained in 𝒬\mathcal{Q} is

KL(D1,D2)a𝒬D1(a)log(D1(a)D2(a)).\textup{KL}(D_{1},D_{2})~{}\coloneqq~{}\sum_{a\in\mathcal{Q}}D_{1}(a)\log\left(\frac{D_{1}(a)}{D_{2}(a)}\right).
Notation A.5.

Let 𝐗\mathbf{X} be a random variable. We denote by {𝐗}\{\mathbf{X}\} the distribution of 𝐗\mathbf{X}.

Definition A.6 (Mutual Information).

Let 𝐗,𝐘\mathbf{X},\mathbf{Y} be two random variables. The mutual information I(𝐗,𝐘)\textup{I}(\mathbf{X},\mathbf{Y}) is

I(𝐗,𝐘)KL({𝐗,𝐘},{𝐗}{𝐘}).\textup{I}(\mathbf{X},\mathbf{Y})~{}\coloneqq~{}\textup{KL}(\{\mathbf{X},\mathbf{Y}\},\{\mathbf{X}\}\{\mathbf{Y}\}).
Fact A.7.
I(𝐗,𝐘)=H(𝐗)H(𝐗|𝐘).\textup{I}(\mathbf{X},\mathbf{Y})~{}=~{}\textup{H}(\mathbf{X})-\textup{H}(\mathbf{X}|\mathbf{Y}).
Fact A.8 (Fact B.5 of Raghavendra and Tan [RT12]).

Let 𝐗a\mathbf{X}_{a} and 𝐗b\mathbf{X}_{b} be indicator random variables. Then

Cov(𝐗a,𝐗b)22I(𝐗a,𝐗b).\textup{Cov}(\mathbf{X}_{a},\mathbf{X}_{b})^{2}~{}\leq~{}2\cdot\textup{I}(\mathbf{X}_{a},\mathbf{X}_{b}).
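A quick numeric check of Fact A.8 over random joint distributions of a pair of indicator variables (ours, for illustration only; with natural-log mutual information the inequality holds with room to spare, as Pinsker's inequality already gives \textup{Cov}^{2}\leq\textup{I}/8 here):

import numpy as np

# Numeric check of Fact A.8: Cov(X_a, X_b)^2 <= 2 * I(X_a, X_b) for indicator
# random variables, sampling random joint pmfs on {0,1} x {0,1}.
rng = np.random.default_rng(0)
for _ in range(1000):
    p = rng.dirichlet(np.ones(4)).reshape(2, 2)        # joint pmf of (X_a, X_b)
    pa, pb = p.sum(axis=1), p.sum(axis=0)              # marginals
    cov = p[1, 1] - pa[1] * pb[1]
    mi = sum(p[i, j] * np.log(p[i, j] / (pa[i] * pb[j]))
             for i in (0, 1) for j in (0, 1) if p[i, j] > 0)
    assert cov ** 2 <= 2 * mi + 1e-12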

Progress Lemma

We are ready to prove Lemma A.1 which we restate below for convenience.

Lemma A.9 (Progress Lemma (restatement of Lemma A.1)).

Suppose G satisfies \lambda_{2}(G)\leq\beta^{2}/q^{4}. If

\mathbb{E}_{i\sim j}\left[\left\lVert\{\mathbf{Y}_{i}\mathbf{Y}_{j}\}-\{\mathbf{Y}_{i}\}\{\mathbf{Y}_{j}\}\right\rVert_{1}\right]\geq\beta,

then

\mathbb{E}_{j\sim V}\left[\Phi^{G}_{|\mathbf{Y}_{j}}\right]\leq\Phi^{G}-\frac{\beta^{2}}{4\cdot q^{4}}.
Proof.

Firstly, we show how to relate the distances {𝐘i𝐘j}{𝐘i}{𝐘j}1\left\lVert\{\mathbf{Y}_{i}\mathbf{Y}_{j}\}-\{\mathbf{Y}_{i}\}\{\mathbf{Y}_{j}\}\right\rVert_{1} over the edges iji\sim j to certain covariances. Let a,b[q]2a,b\in[q]^{2}. Observe that

|Cov(𝐘i,a,𝐘j,b)|=|Pr[𝐘i=a𝐘j=b]Pr[𝐘i=a]Pr[𝐘j=b]|.\left\lvert\textup{Cov}\left(\mathbf{Y}_{i,a},\mathbf{Y}_{j,b}\right)\right\rvert~{}=~{}\left\lvert\Pr[\mathbf{Y}_{i}=a\wedge\mathbf{Y}_{j}=b]-\Pr[\mathbf{Y}_{i}=a]\Pr[\mathbf{Y}_{j}=b]\right\rvert.

We have

\mathbb{E}_{i\sim j}\left[\frac{1}{q^{2}}\sum_{a,b\in[q]^{2}}\textup{Cov}\left(\mathbf{Y}_{i,a},\mathbf{Y}_{j,b}\right)^{2}\right]~\geq~\left(\mathbb{E}_{i\sim j}\left[\frac{1}{q^{2}}\sum_{a,b\in[q]^{2}}\left|\textup{Cov}\left(\mathbf{Y}_{i,a},\mathbf{Y}_{j,b}\right)\right|\right]\right)^{2}
~\geq~\frac{1}{q^{4}}\left(\mathbb{E}_{i\sim j}\left[\left\lVert\{\mathbf{Y}_{i}\mathbf{Y}_{j}\}-\{\mathbf{Y}_{i}\}\{\mathbf{Y}_{j}\}\right\rVert_{1}\right]\right)^{2}\geq\frac{\beta^{2}}{q^{4}}.

Note that the graph GJ/q\mathcal{F}\coloneqq G\otimes J/q is an expander with λ2(GJ/q)=λ2(G)\lambda_{2}(G\otimes J/q)=\lambda_{2}(G). Moreover, the matrix 𝖢{Cov(𝐘i,a,𝐘j,b)}i,jV;a,b[q]2\mathsf{C}\coloneqq\{\textup{Cov}\left(\mathbf{Y}_{i,a},\mathbf{Y}_{j,b}\right)\}_{i,j\in V;a,b\in[q]^{2}} is PSD since the vectorization {vi,a𝔼[𝐘i,a]v}iV;a[q]\{v_{i,a}-{\mathbb{E}}[\mathbf{Y}_{i,a}]\cdot v_{\emptyset}\}_{i\in V;a\in[q]} gives a Gram matrix decomposition of 𝖢\mathsf{C}. Thus, the covariance matrix {Cov(𝐘i,a,𝐘j,b)2}i,jV;a,b[q]2\{\textup{Cov}\left(\mathbf{Y}_{i,a},\mathbf{Y}_{j,b}\right)^{2}\}_{i,j\in V;a,b\in[q]^{2}} is also PSD since it is the Schur product (i.e., entrywise product) of two PSD matrices, namely, 𝖢𝖢\mathsf{C}\circ\mathsf{C}. Therefore, we are in position of applying the local-to-global Lemma A.3 with the expander \mathcal{F} and a vectorization for 𝖢𝖢\mathsf{C}\circ\mathsf{C}. We have

\frac{\beta^{2}}{q^{4}}~\leq~\mathbb{E}_{i\sim j}\left[\frac{1}{q^{2}}\sum_{a,b\in[q]^{2}}\textup{Cov}\left(\mathbf{Y}_{i,a},\mathbf{Y}_{j,b}\right)^{2}\right]
~\leq~2\,\mathbb{E}_{i,j\sim V^{\otimes 2}}\left[\frac{1}{q^{2}}\sum_{a,b\in[q]^{2}}\textup{Cov}\left(\mathbf{Y}_{i,a},\mathbf{Y}_{j,b}\right)^{2}\right] (local-to-global Lemma A.3)
~\leq~\frac{4}{q^{2}}\,\mathbb{E}_{i,j\sim V^{\otimes 2}}\left[\sum_{a,b\in[q]^{2}}\textup{I}\left(\mathbf{Y}_{i,a},\mathbf{Y}_{j,b}\right)\right] (A.8)
~\leq~\frac{4}{q^{2}}\,\mathbb{E}_{i,j\sim V^{\otimes 2}}\left[\sum_{a,b\in[q]^{2}}\textup{H}\left(\mathbf{Y}_{i,a}\right)-\textup{H}\left(\mathbf{Y}_{i,a}|\mathbf{Y}_{j,b}\right)\right]
~\leq~\frac{4}{q}\left[\mathbb{E}_{i\sim V}\left[\sum_{a\in[q]}\textup{H}\left(\mathbf{Y}_{i,a}\right)\right]-\mathbb{E}_{i,j\sim V^{\otimes 2}}\left[\sum_{a\in[q]}\textup{H}\left(\mathbf{Y}_{i,a}|\mathbf{Y}_{j}\right)\right]\right]
~=~4\left[\mathbb{E}_{i\sim V}\left[\mathcal{H}\left(\mathbf{Y}_{i}\right)\right]-\mathbb{E}_{i,j\sim V^{\otimes 2}}\left[\mathcal{H}\left(\mathbf{Y}_{i}|\mathbf{Y}_{j}\right)\right]\right]
~=~4\left[\Phi^{G}-\mathbb{E}_{j\sim V}\left[\Phi^{G}_{|\mathbf{Y}_{j}}\right]\right].

Therefore, we have \mathbb{E}_{j\sim V}\left[\Phi^{G}_{|\mathbf{Y}_{j}}\right]\leq\Phi^{G}-\beta^{2}/(4\cdot q^{4}), as claimed.      

Appendix B Explicit Structures

We recall some explicit structures used in Ta-Shma’s construction.

B.1 Explicit Ramanujan Graphs

The outer graph G in the s-wide replacement product was chosen to be a Ramanujan graph. Ta-Shma provides a convenient lemma to efficiently obtain explicit Ramanujan graphs given the intended number of vertices n (which might end up being nearly twice as large), the expansion \lambda, and an error parameter \theta>0. These Ramanujan graphs are based on the LPS construction [LPS88]. For number-theoretic reasons, we might be forced to work with slightly different parameters, but this is not an issue.

Lemma B.1 (Lemma 2.10 [TS17]).

For every θ>0\theta>0, there exists an algorithm that given nn and λ(0,1)\lambda\in(0,1) runs in time poly(n){\mathrm{poly}}(n) and outputs a Ramanujan graph GG such that

  • -

    GG has degree d8/λd\leq 8/\lambda,

  • -

    σ2(G)λ\sigma_{2}(G)\leq\lambda, and

  • -

    |V(G)|\left\lvert V(G)\right\rvert is either in the range [(1θ)n,n][(1-\theta)n,n] or in the range [(1θ)2n,2n][(1-\theta)2n,2n].

Moreover, the algorithm outputs a locally invertible function φ:[d][d]\varphi\colon[d]\rightarrow[d] computable in polynomial time in its input length.

B.2 Explicit Biased Distribution

The inner graph HH in the ss-wide replacement product is chosen to be a Cayley graph on 2m\mathbb{Z}_{2}^{m} for some positive integer mm. Ta-Shma uses the construction of Alon et al. [AGHP92] (AGHP) to deduce a result similar to Lemma B.2 below. To compute the refined parameter version of our main result Theorem 1.1, we will need the specifics of the AGHP construction.

Lemma B.2 (Based on Lemma 6 [TS17]).

For every β=β(m)\beta=\beta(m), there exists a fully explicit set A2mA\subseteq\mathbb{Z}_{2}^{m} such that

  • -

    |A|4m2/β2\left\lvert A\right\rvert\leq 4\cdot m^{2}/\beta^{2}, and

  • -

    for every nonempty S\subseteq[m], we have \left\lvert{\mathbb{E}}_{z\in A}\chi_{S}(z)\right\rvert\leq\beta.

Furthermore, if m/\beta is a power of 2, then |A|=m^{2}/\beta^{2}. In particular, the graph \textup{Cay}(\mathbb{Z}_{2}^{m},A) is an (n=2^{m},d=|A|,\lambda=\beta) expander graph.

Remark B.3.

Given d,m+d,m\in\mathbb{N}^{+} such that dd is the square of a power of 22 with d2md\leq 2^{m}, by setting β=m/d\beta=m/\sqrt{d} we can use Lemma B.2 with β\beta and mm (note that m/βm/\beta is a power of 22) to obtain a Cayley graph Cay(2m,A)\textup{Cay}(\mathbb{Z}_{2}^{m},A) with parameters (n=2m,d=|A|,λ=β)(n=2^{m},d=\left\lvert A\right\rvert,\lambda=\beta).
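For small m, the \varepsilon-bias property of a candidate set A\subseteq\mathbb{Z}_{2}^{m} can be checked by brute force (vectors encoded as m-bit integers). This is a verification aid of ours, not part of the AGHP construction:

def max_bias(A, m):
    """Brute-force max over nonempty S of |E_{z in A} chi_S(z)|, where
    chi_S(z) = (-1)^<S,z> and S, z range over m-bit integers.
    Exponential in m; intended only as a sanity check for small m."""
    best = 0.0
    for S in range(1, 2 ** m):          # nonempty S as a bitmask
        total = sum(1 if bin(S & z).count("1") % 2 == 0 else -1 for z in A)
        best = max(best, abs(total) / len(A))
    return best

# Example: A = Z_2^m itself is 0-biased.
m = 4
assert max_bias(list(range(2 ** m)), m) == 0.0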

Appendix C Zig-Zag Spectral Bound

We prove the zig-zag spectral bound 4.4.

Claim C.1.

Let GG be an outer graph and HH be an inner graph used in the ss-wide replacement product. For any integer 0is10\leq i\leq s-1,

σ2((I𝖠H)Gi(I𝖠H))σ2(G)+2σ2(H)+σ2(H)2.\sigma_{2}((I\otimes\mathsf{A}_{H})G_{i}(I\otimes\mathsf{A}_{H}))\leq\sigma_{2}(G)+2\cdot\sigma_{2}(H)+\sigma_{2}(H)^{2}.
Proof.

Let vv be a unit vector such that v𝟏v\bot\mathbf{1}, and decompose it into v=u+wv=u+w such that u𝒲=span{abV(G)V(H)b=𝟏}u\in\mathcal{W}^{\parallel}=\textup{span}\{a\otimes b\in\mathbb{R}^{V(G)}\otimes\mathbb{R}^{V(H)}\mid b=\mathbf{1}\} and w𝒲=(𝒲)w\in\mathcal{W}^{\bot}=(\mathcal{W}^{\parallel})^{\bot}.

|v,(I𝖠H)Gi(I𝖠H)v|\displaystyle\left|\left\langle v,(I\otimes\mathsf{A}_{H})G_{i}(I\otimes\mathsf{A}_{H})v\right\rangle\right|\leq |u,(I𝖠H)Gi(I𝖠H)u|+|u,(I𝖠H)Gi(I𝖠H)w|+\displaystyle\left|\left\langle u,(I\otimes\mathsf{A}_{H})G_{i}(I\otimes\mathsf{A}_{H})u\right\rangle\right|+\left|\left\langle u,(I\otimes\mathsf{A}_{H})G_{i}(I\otimes\mathsf{A}_{H})w\right\rangle\right|+
|w,(I𝖠H)Gi(I𝖠H)u|+|w,(I𝖠H)Gi(I𝖠H)w|\displaystyle\left|\left\langle w,(I\otimes\mathsf{A}_{H})G_{i}(I\otimes\mathsf{A}_{H})u\right\rangle\right|+\left|\left\langle w,(I\otimes\mathsf{A}_{H})G_{i}(I\otimes\mathsf{A}_{H})w\right\rangle\right|
\displaystyle\leq |u,Giu|+|(I𝖠H)w|+\displaystyle\left|\left\langle u,G_{i}u\right\rangle\right|+|(I\otimes\mathsf{A}_{H})w|+
|(I𝖠H)w|+|(I𝖠H)w|2\displaystyle|(I\otimes\mathsf{A}_{H})w|+|(I\otimes\mathsf{A}_{H})w|^{2}
\displaystyle\leq |u,Giu|+2σ2(H)+σ22(H)\displaystyle\left|\left\langle u,G_{i}u\right\rangle\right|+2\sigma_{2}(H)+\sigma_{2}^{2}(H)

To bound \left|\left\langle u,G_{i}u\right\rangle\right|, observe that u=x\otimes\mathbf{1} for some x\in\mathbb{R}^{V(G)}. Then,

0=v,𝟏=u,𝟏+w,𝟏=u,𝟏=x,𝟏G0=\left\langle v,\mathbf{1}\right\rangle=\left\langle u,\mathbf{1}\right\rangle+\left\langle w,\mathbf{1}\right\rangle=\left\langle u,\mathbf{1}\right\rangle=\left\langle x,\mathbf{1}_{G}\right\rangle

so that x\bot\mathbf{1}_{G}. Because u is uniform over the H-component, \left|\left\langle u,G_{i}u\right\rangle\right|=\left|\left\langle x,Gx\right\rangle\right|\leq\sigma_{2}(G), which completes the proof.      

We also derive a (simple) tighter bound for the expansion of the zig-zag product in a particular parameter regime.

Claim C.2.

Let GG be a λ1\lambda_{1}-two-sided expander and HH be a λ2\lambda_{2}-two-sided expander such that both are regular graphs. If λ1λ2\lambda_{1}\leq\lambda_{2}, then

\sigma_{2}(G\operatorname{\textcircled{z}}H)\leq 2\cdot\lambda_{2}.
Proof.

Let v=av+bvv=a\cdot v^{\parallel}+b\cdot v^{\perp} with a2+b2=1a^{2}+b^{2}=1 be such that v1v\perp 1. In particular, if v=vG1Hv^{\parallel}=v^{G}\otimes 1^{H}, then vG1Gv^{G}\perp 1^{G} since otherwise v,1=vG,1G0\left\langle v,1\right\rangle=\left\langle v^{G},1^{G}\right\rangle\neq 0. We have

maxa,b:a2+b2=1a2λ1+2abλ2+b2λ22\displaystyle\max_{a,b\in\mathbb{R}\colon a^{2}+b^{2}=1}a^{2}\cdot\lambda_{1}+2ab\cdot\lambda_{2}+b^{2}\cdot\lambda_{2}^{2} maxa,b:a2+b2=1a2λ2+2abλ2+b2λ2\displaystyle\leq\max_{a,b\in\mathbb{R}\colon a^{2}+b^{2}=1}a^{2}\cdot\lambda_{2}+2ab\cdot\lambda_{2}+b^{2}\cdot\lambda_{2}
=maxa,b:a2+b2=1λ2+2abλ2,\displaystyle=\max_{a,b\in\mathbb{R}\colon a^{2}+b^{2}=1}\lambda_{2}+2ab\cdot\lambda_{2},

where the inequality follows from the assumption λ1λ2\lambda_{1}\leq\lambda_{2} (and trivially λ22λ2\lambda_{2}^{2}\leq\lambda_{2}) and the equality follows from a2+b2=1a^{2}+b^{2}=1. Since we also have 2ab=(a+b)2(a2+b2)12ab=(a+b)^{2}-(a^{2}+b^{2})\leq 1, the result follows.      
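The scalar optimization at the heart of Claim C.2 can be sanity-checked numerically over the unit circle (ours, for illustration):

import numpy as np

# Check: for lambda1 <= lambda2 <= 1, the max over a^2 + b^2 = 1 of
# a^2*l1 + 2ab*l2 + b^2*l2^2 is at most 2*l2 (Claim C.2).
theta = np.linspace(0, 2 * np.pi, 100001)
a, b = np.cos(theta), np.sin(theta)
for l1, l2 in [(0.05, 0.1), (0.2, 0.5), (0.3, 0.3), (0.0, 0.9)]:
    vals = a**2 * l1 + 2 * a * b * l2 + b**2 * l2**2
    assert vals.max() <= 2 * l2 + 1e-12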

Appendix D Derandomization

To uniquely decode in fixed polynomial time (i.e., \mathrm{poly}(n/\varepsilon)) we need to prune the list of coupled words \mathcal{L} covering the list \mathcal{L}^{*}(\tilde{y})=\{(z,y=\operatorname{dsum}(z))\mid z\in\mathcal{C},\Delta(\tilde{y},y)\leq 1/2-\sqrt{\eta}\} of codewords we want to retrieve. To do so, given (z^{*},y^{*}=\operatorname{dsum}(z^{*}))\in\mathcal{L}^{*}(\tilde{y}), we need to have (z,y=\operatorname{dsum}(z))\in\mathcal{L} such that

  1. 1.

    |y,dsum(z)|\left\lvert\left\langle y^{*},\operatorname{dsum}(z)\right\rangle\right\rvert is not too small, and

  2. 2.

    y~,dsum(z)\left\langle\tilde{y},\operatorname{dsum}(z)\right\rangle is not too small (in order to apply Lemma 6.11).

The slice (S,σ)(S,\sigma) of the SOS solution from which yy^{*} is recoverable satisfies in expectation

𝔼z{𝐙|(S,σ)}[y,dsum(z)2]3η2,{\mathbb{E}}_{z\sim\{\mathbf{Z}^{\otimes}|_{(S,\sigma)}\}}\left[\left\langle y^{*},\operatorname{dsum}(z)\right\rangle^{2}\right]\geq 3\eta^{2},

and

𝔼z{𝐙|(S,σ)}[y~,dsum(z)]3η/2.{\mathbb{E}}_{z\sim\{\mathbf{Z}^{\otimes}|_{(S,\sigma)}\}}\left[\left\langle\tilde{y},\operatorname{dsum}(z)\right\rangle\right]\geq 3\sqrt{\eta}/2.

Moreover, since z\mapsto\left\langle y^{*},\operatorname{dsum}(z)\right\rangle^{2} and z\mapsto\left\langle\tilde{y},\operatorname{dsum}(z)\right\rangle are O(1/n)-Lipschitz with respect to the \ell_{1}-norm (in this fixed polynomial time regime, the parameters s,d_{1},d_{2},\varepsilon_{0},\eta are constants independent of the final bias \varepsilon), Hoeffding's inequality gives

z{𝐙|(S,σ)}[y,dsum(z)2<η2]exp(Θ(n)),\operatorname*{\mathbb{P}}_{z\sim\{\mathbf{Z}^{\otimes}|_{(S,\sigma)}\}}\left[\left\langle y^{*},\operatorname{dsum}(z)\right\rangle^{2}<\eta^{2}\right]\leq\exp\left(-\Theta(n)\right),

and

z{𝐙|(S,σ)}[y~,dsum(z)<η]exp(Θ(n)).\operatorname*{\mathbb{P}}_{z\sim\{\mathbf{Z}^{\otimes}|_{(S,\sigma)}\}}\left[\left\langle\tilde{y},\operatorname{dsum}(z)\right\rangle<\sqrt{\eta}\right]\leq\exp\left(-\Theta(n)\right).

Hence such a z can easily be found using randomness. In [AJQ+20], as an alternative to satisfying Item 1, it was shown that by choosing z^{\prime}\in\{\pm 1\}^{n} by majority vote, i.e.,

zi=argmaxb{±1}[𝐙i=b]z^{\prime}_{i}=\operatorname*{\arg\!\max}_{b\in\{\pm 1\}}\operatorname*{\mathbb{P}}[\mathbf{Z}_{i}=b]

for i\in[n], one has that \left\lvert\left\langle z^{*},z^{\prime}\right\rangle\right\rvert is large, which is enough to address the first item. More precisely, implicit in [AJQ+20] is that, for any constant \beta\in(0,1), as long as parity sampling is sufficiently strong we have

𝔼z{𝐙|(S,σ)}[z,z2]1β.{\mathbb{E}}_{z\sim\{\mathbf{Z}^{\otimes}|_{(S,\sigma)}\}}\left[\left\langle z^{\prime},z\right\rangle^{2}\right]\geq 1-\beta.

Similarly, z\mapsto\left\langle z^{\prime},z\right\rangle^{2} is O(1/n)-Lipschitz with respect to the \ell_{1}-norm, so Hoeffding's inequality yields

z{𝐙|(S,σ)}[z,z2<1β/2]exp(Θ(n)).\operatorname*{\mathbb{P}}_{z\sim\{\mathbf{Z}^{\otimes}|_{(S,\sigma)}\}}\left[\left\langle z^{\prime},z\right\rangle^{2}<1-\beta/2\right]\leq\exp\left(-\Theta(n)\right).

However, we want to efficiently and deterministically find a z satisfying \left\langle z^{\prime},z\right\rangle^{2}\geq 1-\beta/2 as well as satisfying Item 2. Note that at this stage in the decoding process y^{*} is not known (without issuing a recursive unique decoding call), so running expectation maximization to satisfy Item 1 would not be possible. Fortunately, the majority z^{\prime} can be cheaply computed without a recursive call to a unique decoder. On the other hand, a z satisfying only Item 2 can be found by expectation maximization. The challenge is to satisfy both conditions at the same time. For this reason, we design a simultaneous expectation maximization derandomization procedure tailored to our setting.

D.1 Abstract Derandomization: Simultaneous Expectation Maximization

Suppose that Ω\Omega is a probability space where two random variables 𝐀\mathbf{A} and 𝐁\mathbf{B} are defined satisfying the following first moment conditions

{\mathbb{E}}\left[\mathbf{A}\right]\geq a\quad\text{ and }\quad{\mathbb{E}}\left[\mathbf{B}\right]\geq 1-\beta.

We provide sufficient conditions under which an \omega\in\Omega satisfying

\mathbf{A}(\omega)\geq a^{\prime}\quad\text{ and }\quad\mathbf{B}(\omega)\geq 1-\beta^{\prime}

can be found deterministically and efficiently with the aid of an oracle, where a^{\prime}\approx a and \beta^{\prime}\approx\beta. More precisely, we have the following lemma.

Lemma D.1.

Let \Omega=(\{-1,1\}^{n},\nu_{1}\times\cdots\times\nu_{n}) be a probability space with a product distribution. Suppose \mathbf{A}\in[-1,1] is a random variable on \Omega satisfying, for a>0 and for some function e_{A}\colon\mathbb{N}\rightarrow\mathbb{R}^{+},

\operatorname*{\mathbb{P}}\left[\mathbf{A}<a\right]\leq e_{A}(n).

Suppose \mathbf{B}\in[-1,1] is a random variable on \Omega satisfying, for some function e_{B}\colon\mathbb{N}\times\mathbb{R}^{+}\rightarrow\mathbb{R}^{+} and every \beta\in(0,1),

\operatorname*{\mathbb{P}}\left[\mathbf{B}<1-\beta\right]\leq e_{B}(n,\beta).

Suppose that there is an oracle to evaluate {\mathbb{E}}\left[\mathbf{A}\mathbf{B}^{2k}\right] under any product distribution \mu_{1}^{\prime}\times\cdots\times\mu_{n}^{\prime} for k\in\mathbb{N}. Given \delta,\beta\in(0,1), if

e_{A}(n)+e_{B}(n,\beta/(4\lceil-\ln(a(1-\beta))/\delta\rceil))\leq a\frac{\beta}{2}, (10)

then it is possible to find \omega\in\{\pm 1\}^{n}, using 2n invocations of the oracle, satisfying

\mathbf{A}(\omega)\geq a(1-\beta)\quad\text{ and }\quad\left\lvert\mathbf{B}(\omega)\right\rvert\geq 1-\delta.
Proof.

Set k=\lceil-\ln(a(1-\beta))/\delta\rceil and set \beta^{\prime}=\beta/(4k). Note that

{\mathbb{E}}\left[\mathbf{A}\mathbf{B}^{2k}\right]\geq a\left(1-\frac{\beta}{4k}\right)^{2k}-e_{A}(n)-e_{B}(n,\beta^{\prime})\geq a\left(1-\beta\right),

where the last inequality uses Bernoulli's inequality, \left(1-\frac{\beta}{4k}\right)^{2k}\geq 1-\frac{\beta}{2}, together with Eq. 10. Run expectation maximization (the method of conditional expectations, fixing one coordinate of \omega at a time) to deterministically find \omega\in\{\pm 1\}^{n}, with 2\cdot n invocations of the oracle for {\mathbb{E}}\left[\mathbf{A}\mathbf{B}^{2k}\right], such that

\mathbf{A}(\omega)\mathbf{B}(\omega)^{2k}\geq a\left(1-\beta\right).

Since \mathbf{B}(\omega)^{2k}\leq 1, we have \mathbf{A}(\omega)\geq a\left(1-\beta\right). Towards a contradiction, suppose \left\lvert\mathbf{B}(\omega)\right\rvert\leq 1-\delta. Using \mathbf{A}(\omega)\leq 1, we have

e^{-2k\cdot\delta}\geq(1-\delta)^{2k}\geq\mathbf{A}(\omega)\mathbf{B}(\omega)^{2k}\geq a(1-\beta). (11)

By our choice of k, we get

e^{-2k\cdot\delta}<a(1-\beta),

contradicting Eq. 11. ∎
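To make the search in the proof concrete, the following is a minimal Python sketch of the 2n-invocation expectation maximization loop. The oracle interface and the encoding of a product distribution as an array mu of probabilities \operatorname*{\mathbb{P}}[\omega_{i}=1] are assumptions of the sketch, not part of Lemma D.1.

    def simultaneous_expectation_maximization(n, oracle, nu):
        # oracle(mu) returns E[A * B^(2k)] under the product distribution in
        # which coordinate i equals +1 with probability mu[i]; nu encodes the
        # initial distribution nu_1 x ... x nu_n of Lemma D.1.
        mu = list(nu)
        omega = [0] * n
        for i in range(n):
            # Evaluate both conditionings of coordinate i (two oracle calls).
            mu[i] = 1.0
            val_plus = oracle(mu)
            mu[i] = 0.0
            val_minus = oracle(mu)
            # By averaging, max(val_plus, val_minus) is at least the
            # conditional expectation before coordinate i was fixed, so the
            # running expectation never decreases.
            if val_plus >= val_minus:
                omega[i], mu[i] = +1, 1.0
            else:
                omega[i], mu[i] = -1, 0.0
        # All coordinates fixed: oracle(mu) now equals A(omega) * B(omega)^(2k),
        # which is at least E[A * B^(2k)] >= a(1 - beta).
        return omega

Each iteration uses two oracle calls, 2n in total, and the invariant that the conditional expectation never decreases is exactly what yields \mathbf{A}(\omega)\mathbf{B}(\omega)^{2k}\geq a(1-\beta).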

D.2 Implementing the Oracle

Now, we provide an efficient deterministic oracle for our setting. We take

\mathbf{A}\coloneqq\left\langle\tilde{y},\operatorname{dsum}(z)\right\rangle\quad\text{ and }\quad\mathbf{B}\coloneqq\left\langle z^{\prime},z\right\rangle^{2},

where z^{\prime}_{i}=\operatorname*{\arg\!\max}_{b\in\{\pm 1\}}\operatorname*{\mathbb{P}}[\mathbf{Z}_{i}=b]. Note that

\mathbf{A}\mathbf{B}^{2k}=\sum_{T\subset[n]\colon\left\lvert T\right\rvert=O(1)}\alpha_{T}\prod_{i\in T}z_{i}.

To compute {\mathbb{E}}\left[\mathbf{A}\mathbf{B}^{2k}\right] under any product distribution \mu_{1}^{\prime}\times\cdots\times\mu_{n}^{\prime}, use linearity of expectation and sum at most n^{O(1)} terms \alpha_{T}{\mathbb{E}}\left[\prod_{i\in T}z_{i}\right], each of which can be computed in O(1) time since, restricted to T, we have a product distribution on \{\pm 1\}^{T} and the expectation factorizes as \prod_{i\in T}{\mathbb{E}}\left[z_{i}\right].
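The following minimal Python sketch illustrates this oracle. The list monomials of pairs (\alpha_{T},T), obtained by expanding \mathbf{A}\mathbf{B}^{2k}, is an assumed input; producing it from \tilde{y}, z^{\prime}, and the walk collection is routine but outside the sketch.

    def expected_monomial(T, mu):
        # Under a product distribution on {-1,+1}^n with P[z_i = +1] = mu[i],
        # the expectation of a monomial over distinct indices factorizes:
        # E[prod_{i in T} z_i] = prod_{i in T} E[z_i], with
        # E[z_i] = (+1) * mu[i] + (-1) * (1 - mu[i]) = 2 * mu[i] - 1.
        value = 1.0
        for i in T:
            value *= 2.0 * mu[i] - 1.0
        return value

    def oracle(monomials, mu):
        # monomials is a list of (alpha_T, T) pairs; there are n^{O(1)} of
        # them since k (and hence every |T|) is constant in this regime.
        return sum(alpha * expected_monomial(T, mu) for alpha, T in monomials)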