Unique Decoding of Explicit ε-balanced Codes Near
the Gilbert–Varshamov Bound
The Gilbert–Varshamov bound (non-constructively) establishes the existence of binary codes of distance 1/2 - ε/2 and rate Ω(ε^2) (where an upper bound of O(ε^2 log(1/ε)) is known). Ta-Shma [STOC 2017] gave an explicit construction of ε-balanced binary codes, where any two distinct codewords are at a distance between 1/2 - ε/2 and 1/2 + ε/2, achieving a near optimal rate of Ω(ε^{2+α}), where α → 0 as ε → 0.
We develop unique and list decoding algorithms for (a slight modification of) the family of codes constructed by Ta-Shma, in the adversarial error model. We prove the following results for ε-balanced codes with block length n and rate Ω(ε^{2+α}) in this family:
- For all ε, α > 0, there are explicit codes which can be uniquely decoded up to an error of half the minimum distance in time n^{O_{ε,α}(1)}.
- For any fixed constant α independent of ε, there is an explicit construction of codes which can be uniquely decoded up to an error of half the minimum distance in time (log(1/ε))^{O(1)} · n^{O_α(1)}.
- For any ε > 0, there are explicit ε-balanced codes with rate Ω(ε^{2+α}) which can be list decoded up to error 1/2 - ε_0 in time n^{O_{ε,ε_0,α}(1)}, where ε_0, α → 0 as ε → 0.
The starting point of our algorithms is the framework for list decoding direct-sum codes developed in Alev et al. [SODA 2020], which uses the Sum-of-Squares SDP hierarchy. The rates obtained there were quasipolynomially small in ε. Here, we show how to overcome the far from optimal rates of this framework, obtaining unique decoding algorithms for explicit binary codes of near optimal rate. These codes are based on simple modifications of Ta-Shma's construction.
1 Introduction
Binary error correcting codes have pervasive applications [Gur10, GRS19] and yet we are far from understanding some of their basic properties [Gur09]. For instance, until very recently no explicit binary code achieving distance 1/2 - ε/2 with rate near Ω(ε^2) was known, even though the existence of such codes was (non-constructively) established long ago [Gil52, Var57] in what is now referred to as the Gilbert–Varshamov (GV) bound. On the impossibility side, a rate upper bound of O(ε^2 log(1/ε)) is known for binary codes of distance 1/2 - ε/2 (e.g., [Del75, MRRW77, NS09]).
In a breakthrough result [TS17], Ta-Shma gave an explicit construction of binary codes achieving a nearly optimal distance versus rate trade-off, namely, binary codes of distance 1/2 - ε/2 with rate Ω(ε^{2+α}), where α vanishes as ε vanishes. (In fact, Ta-Shma obtained α = O((log log(1/ε) / log(1/ε))^{1/3}), and thus rate ε^{2+o(1)}.) Actually, Ta-Shma obtained ε-balanced binary linear codes, that is, linear binary codes with the additional property that non-zero codewords have Hamming weight bounded not only below by (1/2 - ε/2)·n but also above by (1/2 + ε/2)·n, and this is a fundamental property in the study of pseudo-randomness [NN90, AGHP92].
While the codes constructed by Ta-Shma are explicit, they were not known to admit efficient decoding algorithms, while such results are known for codes with smaller rates. In particular, an explicit binary code due to Guruswami and Rudra [GR06] is known to be even list decodable at an error radius 1/2 - ε with rate Ω(ε^3). We consider the following question:
Do explicit binary codes near the GV bound admit an efficient decoding algorithm?
Here, we answer this question in the affirmative by providing an efficient unique decoding algorithm for (essentially) Ta-Shma's code construction, which we refer to as Ta-Shma codes. (By "efficient", we mean polynomial time. Given the fundamental nature of the problem of decoding nearly optimal binary codes, it is an interesting open problem to make these techniques viable in practice.) More precisely, by building on Ta-Shma's construction and using our unique decoding algorithm, we have the following result.
Theorem 1.1 (Unique Decoding).
For every ε > 0 sufficiently small, there are explicit binary linear Ta-Shma codes C_{n,ε} ⊆ F_2^n for infinitely many values of n with
(i) distance at least 1/2 - ε/2 (actually ε-balanced),
(ii) rate Ω(ε^{2+α}) where α = O(1/(log(1/ε))^{1/6}), and
(iii) a unique decoding algorithm with running time n^{O_{ε,α}(1)}.
Furthermore, if instead we take α to be an arbitrary constant, the running time becomes (log(1/ε))^{O(1)} · n^{O_α(1)} (fixed polynomial time).
We can also perform “gentle” list decoding in the following sense (note that this partially implies Theorem 1.1).
Theorem 1.2 (Gentle List Decoding).
For every ε > 0 sufficiently small, there are explicit binary linear Ta-Shma codes C_{n,ε} ⊆ F_2^n for infinitely many values of n with
(i) distance at least 1/2 - ε/2 (actually ε-balanced),
(ii) rate Ω(ε^{2+α}) where α = O(1/(log(1/ε))^{1/6}), and
(iii) a list decoding algorithm that decodes within radius 1/2 - 2^{-Θ((log(1/ε))^{1/6})} in time n^{O_ε(1)}.
We observe that the exponent in the running time appearing in Theorem 1.1 and Theorem 1.2 depends on ε. If α is taken to be an arbitrary constant (independent of ε), this dependence disappears and the running time becomes (log(1/ε))^{O(1)} · n^{O_α(1)}. Avoiding this dependence in the exponent when α = o(1) is an interesting open problem. Furthermore, obtaining a list decoding radius of 1/2 - ε in Theorem 1.2 with the same rate (or even any polynomial rate) is another very interesting open problem, related to a central open question in the adversarial error regime [Gur09].
Direct sum codes. Our work can be viewed within the broader context of developing algorithms for the decoding of direct sum codes. Given a (say linear) code C_0 ⊆ F_2^n and a collection of tuples W ⊆ [n]^t, the code dsum_W(C_0) with block length |W| is defined as
dsum_W(C_0) = { (y_{w_1} + ⋯ + y_{w_t})_{(w_1, …, w_t) ∈ W} : y ∈ C_0 },
with the sums taken modulo 2.
The direct sum operation has been used for several applications in coding and complexity theory [ABN+92, IW97, GI01, IKW09, DS14, DDG+15, Cha16, DK17, Aro02]. It is easy to see that if C_0 is ε_0-balanced for a constant ε_0 < 1, then for any ε > 0, choosing W to be a random collection of tuples of size t = O(log(1/ε)) results in dsum_W(C_0) being an ε-balanced code (with high probability). The challenge in trying to construct good codes using this approach is to find explicit constructions of (sparse) collections W which are "pseudorandom" enough to yield a similar distance amplification as above. On the other hand, the challenge in decoding such codes is to identify notions of "structure" in such collections W, which can be exploited by decoding algorithms.
In Ta-Shma's construction [TS17], such a pseudorandom collection was constructed by considering an expanding graph G over the vertex set [n], and generating t-tuples using sufficiently long walks over the so-called s-wide replacement product of G with another (small) expanding graph H. Roughly speaking, this graph product is a generalization of the celebrated zig-zag product [RVW00] but with s different steps of the zig-zag product instead of a single one. Ta-Shma's construction can also be viewed as a clever way of selecting a sub-collection of all walks in G, which refines an earlier construction suggested by Rozenman and Wigderson [Bog12] (and also analyzed by Ta-Shma) using all walks of length t - 1 in G.
Identifying structures to facilitate decoding. For the closely related direct product construction (where the entry corresponding to a tuple w = (w_1, …, w_t) ∈ W is the entire restriction (y_{w_1}, …, y_{w_t}) rather than its parity), which amplifies distance but increases the alphabet size, it was proved by Alon et al. [ABN+92] that the resulting code admits a unique decoding algorithm if the incidence graph corresponding to the collection W is a good sampler. Very recently, it was proved by Dinur et al. [DHK+19] that such a direct product construction admits list decoding if the incidence graph is a "double sampler". The results of [DHK+19] also apply to direct sum, but the use of double samplers pushes the rate away from near optimality.
For the case of direct sum codes, the decoding task can be phrased as a maximum t-XOR problem with the additional constraint that the solution must lie in C_0. More precisely, given ỹ within the unique decoding radius of dsum_W(C_0), we consider the following optimization problem
argmin_{y ∈ C_0} Δ(dsum_W(y), ỹ),
where Δ is the (normalized) Hamming distance. While maximum t-XOR is in general hard to solve to even any non-trivial degree of approximation [Hås97], previous work by the authors [AJQ+20] identified a structural condition on W called "splittability" under which the above constraint satisfaction problem can be solved (approximately), resulting in efficient unique and list decoding algorithms. However, by itself the splittability condition is too crude to be applicable to codes such as the ones in Ta-Shma's construction. The requirements it places on the expansion of the graphs generating W are too strong, and the framework in [AJQ+20] is only able to obtain algorithms for direct sum codes with quasipolynomially small rate.
The conceptual contribution of this work can be viewed as identifying a different recursive structure in direct sums generated by expander walks, which allows us to view the construction as giving a sequence of codes C_0, C_1, …, C_ℓ. Here, C_0 is the starting code and C_ℓ is the final desired code, and each element in the sequence can be viewed as being obtained via a direct sum operation on the preceding code. Instead of considering a "one-shot" decoding task of finding an element of C_0, this facilitates an iterative approach where at each step we reduce the task of decoding the code C_i to decoding the code C_{i-1}, using the above framework from [AJQ+20]. Such an iterative approach with a sequence of codes was also used (in a very different setting) in a work of Guruswami and Indyk [GI03] constructing codes over a large alphabet which are list decodable in linear time via spectral algorithms.
Another simple and well-known observation (see e.g., [GI04]), which is very helpful in our setting, is the use of list decoding algorithms for unique decoding. For a code with distance close to 1/2, unique decoding can be obtained by list decoding at a much smaller error radius of (say) 1/4. This permits a much more efficient application of the framework from [AJQ+20], with a milder dependence on the expansion of the graphs G and H in Ta-Shma's construction, resulting in higher rates. We give a more detailed overview of our approach in Section 3.
Known results for random ensembles. While the focus in this work is on explicit constructions, there are several known (non-explicit) constructions of random ensembles of binary codes near or achieving the Gilbert–Varshamov bound (e.g., Table 1). Although it is usually straightforward to ensure the desired rate in such constructions, the distance only holds with high probability. Given a sample code from such ensembles, certifying the minimum distance is usually not known to be polynomial time in the block length. Derandomizing such constructions is also a possible avenue for obtaining optimal codes, although such results remain elusive to this date (to the best of our knowledge).
One of the simplest constructions is that of random binary linear codes in which the generator matrix is sampled uniformly. This random ensemble achieves the GV bound with high probability, but its decoding is believed to be computationally hard [MMT11].
Much progress has been made on binary codes by using results for larger alphabet codes [Gur09]. Codes over non-binary alphabets with optimal (or nearly optimal) parameters are available [vL99, Sti08, GR06], and thanks to this availability a popular approach to constructing binary codes has been to concatenate such large alphabet codes with binary ones. Thommesen [Tho83] showed that by concatenating Reed–Solomon (RS) codes with random binary codes (one random binary code for each position of the outer RS code) it is possible to achieve the GV bound. Note that Thommesen codes arise from a more structured ensemble than random binary linear codes. This additional structure enabled Guruswami and Indyk [GI04] to obtain efficient decoding algorithms for the non-explicit Thommesen codes (whose minimum distance is not known to admit efficient certification). This kind of concatenation, starting from a large alphabet code and using random binary codes, which we refer to as Thommesen-like, has been an important technique in tackling binary code constructions with a variety of properties near or at the GV bound. An important drawback of several such Thommesen-like code constructions is that they end up being non-explicit (unless efficient derandomization or brute-force is viable).
Using a Thommesen-like construction, Gopi et al. [GKO+17] showed non-explicit constructions of locally testable and locally correctable binary codes approaching the GV bound. More recently, again with a Thommesen-like construction, Hemenway et al. [HRW17] obtained non-explicit, near linear time uniquely decodable codes at the GV bound, improving the running time of Guruswami and Indyk [GI04] (and also the decoding rates). We summarize the results discussed so far in Table 1.
Binary Code Results near the Gilbert–Varshamov bound
Who? | Construction | GV | Explicit | Concatenated | Decoding | Local
[Gil52, Var57] | existential | yes | no | no | no | n/a
[Tho83] | Reed–Solomon + random binary | yes | no | yes | no | n/a
[GI04] | Thommesen [Tho83] | yes | no | yes | unique decoding | n/a
[GKO+17] | Thommesen-like | yes | no | yes | unique decoding | LTC/LCC
[HRW17] | Thommesen-like | yes | no | yes | near linear time unique decoding | n/a
[TS17] | Expander-based | yes | yes | no | no | n/a
this paper | Ta-Shma [TS17] | yes | yes | no | unique decoding and gentle list decoding | n/a
There are also non-explicit constructions known to achieve list decoding capacity [GR08, MRRZ+19] (being concatenated or LDPC/Gallager [Gal62] is not an obstruction to achieving capacity). Contrary to the other results in this subsection, Guruswami and Rudra [Gur05, GR06, Gur09], also using a Thommesen-like construction, obtained explicit codes that are efficiently list decodable from radius 1/2 - ε with rate Ω(ε^3). This was done by concatenating the so-called folded Reed–Solomon codes with a derandomization of a binary ensemble of random codes.
Results for non-adversarial error models. All the results mentioned above are for the adversarial error model of Hamming [Ham50, Gur10]. In the setting of random corruptions (Shannon model), the situation seems to be better understood thanks to the seminal result on explicit polar codes of Arikan [Ari09]. More recently, in another breakthrough Guruswami et al. [GRY19] showed that polar codes can achieve almost linear time decoding with near optimal convergence to capacity for the binary symmetric channel. This result gives an explicit code construction achieving parameter trade-offs similar to Shannon’s randomized construction [Sha48] while also admitting very efficient encoding and decoding. Explicit capacity-achieving constructions are also known for bounded memory channels [SKS19] which restrict the power of the adversary and thus interpolate between the Shannon and Hamming models.
2 Preliminaries and Notation
2.1 Codes
We briefly recall some standard code terminology. Given x, y ∈ F_2^n, recall that the relative Hamming distance between x and y is Δ(x, y) := |{i : x_i ≠ y_i}| / n. A binary code is any subset C ⊆ F_2^n. The distance of C is defined as Δ(C) := min_{x ≠ y ∈ C} Δ(x, y). We say that C is a linear code if C is a linear subspace of F_2^n. The rate of C is log_2(|C|) / n.
Instead of discussing the distance of a binary code, it will often be more natural to phrase results in terms of its bias.
Definition 2.1 (Bias).
The bias of a word z ∈ F_2^n is defined as bias(z) := |E_{i ∈ [n]} (-1)^{z_i}|. The bias of a code C is the maximum bias of any non-zero codeword in C.
Definition 2.2 (-balanced Code).
A binary code C is ε-balanced if bias(x + y) ≤ ε for every pair of distinct x, y ∈ C.
Remark 2.3.
For a linear binary code C, the condition bias(C) ≤ ε is equivalent to C being an ε-balanced code.
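As a small illustrative sketch (our own toy example, not part of the construction), the following Python code computes the bias of a word and checks the ε-balanced condition by brute force; the generators below are invented for illustration.

```python
from itertools import product

def bias(z):
    """Bias of a word z in F_2^n: |E_i[(-1)^{z_i}]| (Definition 2.1)."""
    return abs(sum((-1) ** b for b in z)) / len(z)

def span(generators):
    """All F_2-linear combinations of the generators (a small linear code)."""
    n = len(generators[0])
    return {
        tuple(sum(c * g[i] for c, g in zip(coeffs, generators)) % 2 for i in range(n))
        for coeffs in product((0, 1), repeat=len(generators))
    }

def is_eps_balanced(code, eps):
    """Definition 2.2: bias(x + y) <= eps for every pair of distinct codewords."""
    words = list(code)
    return all(
        bias(tuple(a ^ b for a, b in zip(x, y))) <= eps
        for i, x in enumerate(words) for y in words[i + 1:]
    )

# A tiny linear code in F_2^8 whose non-zero codewords all have weight 4,
# hence bias 0, so the code is 0-balanced (consistent with Remark 2.3).
C = span([(1, 0, 1, 0, 1, 0, 1, 0), (1, 1, 0, 0, 1, 1, 0, 0)])
```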
2.2 Direct Sum Lifts
Starting from a code C_0 ⊆ F_2^n, we amplify its distance by considering the direct sum lifting operation based on a collection W ⊆ [n]^t. The direct sum lifting maps each codeword of C_0 to a new word in F_2^{|W|} by taking the t-XOR of its entries on each element of W.
Definition 2.4 (Direct Sum Lifting).
Let W ⊆ [n]^t. For y ∈ F_2^n, we define the direct sum lifting as dsum_W(y) ∈ F_2^W such that dsum_W(y)_w = y_{w_1} + ⋯ + y_{w_t} (mod 2) for all w = (w_1, …, w_t) ∈ W. The direct sum lifting of a code C_0 is dsum_W(C_0) = {dsum_W(y) : y ∈ C_0}.
We will omit W from this notation when it is clear from context.
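A minimal Python sketch of Definition 2.4 (the collection of all pairs below is our own toy choice of W):

```python
from itertools import product

def dsum(W, y):
    """Direct sum lifting (Definition 2.4): the output bit for a tuple w
    is the XOR of the bits of y indexed by w."""
    return tuple(sum(y[i] for i in w) % 2 for w in W)

# Toy example: lift a 4-bit word by the collection of all pairs in [4]^2.
W = list(product(range(4), repeat=2))
y = (0, 1, 1, 0)
Y = dsum(W, y)
```

Note that the lifting is linear over F_2: dsum_W(x + y) = dsum_W(x) + dsum_W(y), which is why bias statements about non-zero codewords transfer to pairwise distances.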
Remark 2.5.
We will be concerned with collections W ⊆ [n]^t arising from length-(t-1) walks on expanding structures (mostly on the s-wide replacement product of two expander graphs).
We will be interested in cases where the direct sum lifting reduces the bias of the base code; in [TS17], structures with such a property are called parity samplers, as they emulate the reduction in bias that occurs by taking the parity of random samples.
Definition 2.6 (Parity Sampler).
A collection W ⊆ [n]^t is called an (ε_0, ε)-parity sampler if for all z ∈ F_2^n with bias(z) ≤ ε_0, we have bias(dsum_W(z)) ≤ ε.
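For intuition, Definition 2.6 can be checked by brute force for very small n. In the sketch below (our own toy instance), W is the collection of all pairs [n]^2, for which the lifted bias factorizes exactly as bias(z)^2, so W is an (ε_0, ε_0^2)-parity sampler.

```python
from itertools import product

def bias(z):
    return abs(sum((-1) ** b for b in z)) / len(z)

def dsum(W, y):
    return tuple(sum(y[i] for i in w) % 2 for w in W)

def is_parity_sampler(W, n, eps0, eps):
    """Brute-force check of Definition 2.6 over all of F_2^n (tiny n only):
    every word of bias at most eps0 must lift to a word of bias at most eps."""
    return all(
        bias(dsum(W, z)) <= eps + 1e-12
        for z in product((0, 1), repeat=n)
        if bias(z) <= eps0 + 1e-12
    )

n = 6
W = list(product(range(n), repeat=2))  # all pairs: bias(dsum_W(z)) = bias(z)^2
```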
2.3 Linear Algebra Conventions
All vectors considered in this paper are taken to be column vectors, and are multiplied on the left with any matrices or operators acting on them. Consequently, given an indexed sequence of operators A_1, …, A_k (with A_i corresponding to step i of a walk), we expand the product as A_k A_{k-1} ⋯ A_1, so that the operator for the first step is applied first.
Unless otherwise stated, all inner products for vectors in coordinate spaces are taken to be with respect to the (uniform) probability measure on the coordinates. Similarly, all inner products for functions are taken to be with respect to the uniform measure on the inputs. All operators considered in this paper are normalized to have singular values at most 1.
3 Proof Overview
The starting point for our work is the framework developed in [AJQ+20] for decoding direct sum codes, obtained by starting from a code C_0 and considering all parities corresponding to a set of t-tuples W ⊆ [n]^t. Ta-Shma's near optimal ε-balanced codes are constructed by starting from a code C_0 with constant rate and constant distance and considering such a direct sum lifting. The set of tuples in his construction corresponds to a set of walks of length t - 1 on the s-wide replacement product of an expanding graph G with vertex set [n] and a smaller expanding graph H. The s-wide replacement product can be thought of here as a way of constructing a much smaller pseudorandom subset of the set of all walks of length t - 1 on G, which yields a similar distance amplification for the lifted code.
The simplified construction with expander walks.
While we analyze Ta-Shma's construction later in the paper, it is instructive to first consider a collection W simply consisting of all walks of length t - 1 on an expander. This construction, based on a suggestion of Rozenman and Wigderson [Bog12], was also analyzed by Ta-Shma [TS17] and can be used to obtain ε-balanced codes with rate Ω(ε^{4+o(1)}). It helps to illustrate many of the conceptual ideas involved in our proof, while avoiding some technical issues.
Let G be a d-regular expanding graph with vertex set [n], with the (normalized) second singular value of its adjacency operator being λ. Let W ⊆ [n]^t denote the set of t-tuples corresponding to all walks of length t - 1, so that |W| = n · d^{t-1}. Ta-Shma proves that for all z ∈ F_2^n, W satisfies
bias(dsum_W(z)) ≤ (bias(z) + 2λ)^{⌊(t-1)/2⌋},
i.e., W is an (ε_0, ε)-parity sampler for ε = (ε_0 + 2λ)^{⌊(t-1)/2⌋}. Choosing λ = O(ε_0) and d = O(1/λ^2) (say), we can choose t = O(log(1/ε)) and obtain the ε-balanced code dsum_W(C_0) with rate Ω(ε^{4+o(1)}) (although the right constants matter a lot for optimal rates).
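As a sanity check on this picture (our own toy experiment, not part of the analysis), consider the degenerate case where G is the complete graph with self-loops, so λ = 0: each step of a walk is then an independent uniform vertex, the walk collection is all of [n]^t, and the lifted bias factorizes exactly as bias(z)^t, the ideal behavior that the parity-sampling bound approximates for genuine expanders.

```python
def bias(z):
    return abs(sum((-1) ** b for b in z)) / len(z)

def walks(adj, t):
    """All walks with t vertices (i.e., of length t-1), as index tuples."""
    W = [(v,) for v in range(len(adj))]
    for _ in range(t - 1):
        W = [w + (u,) for w in W for u in adj[w[-1]]]
    return W

def lifted_bias(adj, t, z):
    W = walks(adj, t)
    return bias(tuple(sum(z[i] for i in w) % 2 for w in W))

n, t = 6, 3
complete = [list(range(n)) for _ in range(n)]  # complete graph with self-loops
z = (1, 0, 0, 0, 0, 0)                         # bias 2/3
```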
Decoding as constraint satisfaction.
The starting point for our work is the framework in [AJQ+20], which views the task of decoding ỹ within the unique decoding radius of dsum_W(C_0) (whose distance is at least (1 - ε)/2) as an instance of the MAX t-XOR problem (see Fig. 1). The goal is to find
argmin_{y ∈ C_0} Δ(dsum_W(y), ỹ),
which can be rephrased as
argmax_{y ∈ C_0} Pr_{(w_1, …, w_t) ∈ W} [ ỹ_{(w_1, …, w_t)} = y_{w_1} + ⋯ + y_{w_t} ].
It is possible to ignore the condition y ∈ C_0 if the collection W is a slightly stronger parity sampler. Consider any solution ȳ ∈ F_2^n (not necessarily in C_0) whose lifting is at least as close to ỹ as that of the true codeword y*, i.e., Δ(dsum_W(ȳ), ỹ) ≤ Δ(dsum_W(y*), ỹ). By the triangle inequality, dsum_W(ȳ) and dsum_W(y*) are within twice the decoding radius of each other, and thus dsum_W(ȳ + y*) has substantial bias. If W is not just an (ε_0, ε)-parity sampler but a somewhat stronger one, this implies that bias(ȳ + y*) must be large as well. Thus, ȳ is close to y* (or to its complement), and we can use a unique decoding algorithm for C_0 to find y* given ȳ.
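The quantitative version of this argument can be written out as follows (a reconstruction using the notation above, assuming, as in the unique decoding setting, that Δ(ỹ, dsum_W(y*)) < (1 - ε)/4):

```latex
% If \bar{y} does at least as well as y^* against \tilde{y}, then
\begin{align*}
\Delta\big(\mathrm{dsum}_W(\bar{y}),\, \mathrm{dsum}_W(y^*)\big)
  &\le \Delta\big(\mathrm{dsum}_W(\bar{y}),\, \tilde{y}\big)
     + \Delta\big(\tilde{y},\, \mathrm{dsum}_W(y^*)\big) \\
  &\le 2\,\Delta\big(\tilde{y},\, \mathrm{dsum}_W(y^*)\big)
   \;<\; \frac{1-\varepsilon}{2}.
\end{align*}
% By linearity of the lifting,
% \mathrm{dsum}_W(\bar{y}) + \mathrm{dsum}_W(y^*) = \mathrm{dsum}_W(\bar{y} + y^*),
% so \mathrm{bias}(\mathrm{dsum}_W(\bar{y} + y^*)) \ge 1 - 2\Delta > \varepsilon,
% and the (\varepsilon_0, \varepsilon)-parity sampling property forces
% \mathrm{bias}(\bar{y} + y^*) > \varepsilon_0.
```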
The task of finding such a ȳ boils down to finding a solution to a MAX t-XOR instance, up to an additive loss in the fraction of constraints satisfied by the optimal solution. While this is hard to do in general [Hås01, Gri01], [AJQ+20] (building on [AJT19]) show that this can be done if the instance satisfies a special property called splittability. To define this, we let W[a, b] denote the collection of (b - a + 1)-tuples obtained by considering the indices between a and b for all tuples in W. We also assume that each tuple in W[a, b] can be extended to the same number of tuples in W (which is true for walks).
Definition 3.1 (Splittability (informal)).
A collection W ⊆ [n]^t is said to be τ-splittable if t = 1 (base case) or there exists t' ∈ [t - 1] such that:
1. The matrix S ∈ {0,1}^{W[1, t'] × W[t'+1, t]} defined by S[w, w'] = 1 iff (w, w') ∈ W has normalized second singular value at most τ (where (w, w') denotes the concatenated tuple).
2. The collections W[1, t'] and W[t'+1, t] are τ-splittable.
For example, considering walks of length 3 in G (so t = 4) and t' = 2, we get that W[1, 2] = W[3, 4] is the set of oriented edges in G. Also, S[w, w'] = 1 if and only if the second vertex of w and the first vertex of w' are adjacent in G. Thus, up to a permutation of rows and columns, we can write the normalized version of S as A ⊗ (J/d), where A is the normalized adjacency matrix of G and J denotes the d × d matrix of 1s. Hence such a W satisfies the splittability condition with τ = λ, and a similar proof works for walks of all lengths.
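This tensor-product structure can be verified numerically. The sketch below (our own toy check, using a 5-cycle, which is a weak expander, though the identity is exact for any regular graph) builds the normalized split operator S on oriented edges and confirms that its second singular value equals that of the normalized adjacency matrix.

```python
import numpy as np

# 5-cycle as a 2-regular outer graph.
n, d = 5, 2
A = np.array([[1.0 if (u - v) % n in (1, n - 1) else 0.0 for v in range(n)]
              for u in range(n)])

edges = [(u, v) for u in range(n) for v in range(n) if A[u, v]]

# Split operator for 3-step walks cut in the middle: rows and columns are
# oriented edges, S[(u1,u2),(u3,u4)] = 1 iff u2 ~ u3, normalized by d^2
# (each row has d * d ones).
S = np.array([[A[w[1], x[0]] for x in edges] for w in edges]) / d**2

sv_S = np.linalg.svd(S, compute_uv=False)      # descending order
sv_A = np.linalg.svd(A / d, compute_uv=False)
```

Up to row/column permutations, S equals (A/d) ⊗ (J/d), so its non-zero singular values are exactly those of A/d, padded with zeros.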
The framework in [AJQ+20] and [AJT19] gives that if W is τ-splittable for sufficiently small τ, then the above instance of MAX t-XOR can be solved to small additive error using the Sum-of-Squares (SOS) SDP hierarchy. Broadly speaking, splittability allows one to (recursively) treat instances as expanding instances of problems with two "tuple variables" in each constraint, which can then be analyzed using known algorithms for 2-CSPs [BRS11, GS11] in the SOS hierarchy. Combined with parity sampling, this yields a unique decoding algorithm. Crucially, this framework can also be extended to perform list decoding up to a radius approaching the distance of the code, under a similar condition on W, which will be very useful for our application. (While unique decoding can be thought of as recovering a single solution to a constraint satisfaction problem, the goal in the list decoding setting can be thought of as obtaining a "sufficiently rich" set of solutions which forms a good cover. This is achieved in the framework by adding an entropic term to the semidefinite program, which ensures that the SDP solution satisfies such a covering property.)
While the above can yield decoding algorithms for suitably expanding W, the requirement on τ (and hence on the degree d) makes the rate much worse. We need τ = 2^{-Θ(t)} (for unique decoding) and t = Θ(log(1/ε)) (for parity sampling), which requires d = 2^{Θ(t)}, yielding only a quasipolynomial rate of 2^{-Θ(log^2(1/ε))} for the code (recall that we could take d = poly(1/ε_0) earlier, yielding polynomial rates).
Unique decoding: weakening the error requirement.
We first observe that it is possible to relax the error requirement above by using the list decoding algorithm for unique decoding. It suffices to list decode up to a fixed constant error radius (say 1/4) and return the closest element from the list of all codewords within that radius, if we are promised that ỹ is within the unique decoding radius (see Fig. 2). However, this alone does not improve the rate, as we still need the splittability (and hence τ) to depend on the total walk length t = Θ(log(1/ε)).
Code cascades: handling the dependence on walk length.
To avoid the dependence of the expansion requirement on the length of the walk (and hence on ε), we avoid the "one-shot" decoding above, and instead consider a sequence of intermediate codes between the base code and the final one. Consider the case when t = t_1 · t_2: instead of computing t-wise sums of bits directly, we first compute t_1-wise sums according to walks of length t_1 - 1 on G, and then a t_2-wise sum of these values. In fact, the second sum can also be thought of as arising from a length t_2 - 1 walk on a different graph, with vertices corresponding to (directed) walks with t_1 vertices in G, and edges connecting w and w' when the last vertex of w is connected to the first one in w' (this is similar to the matrix S considered for defining splittability). We can thus think of a sequence of codes C_0, C_1, C_2, with C_1 a t_1-wise and C_2 a t_2-wise direct sum of the preceding code. More generally, when t = k^ℓ for an appropriate constant k, we can think of a sequence C_0, C_1, …, C_ℓ, where each C_i is a k-wise direct sum of the previous code, obtained via walks of length k - 1 (hence k vertices) in an appropriate graph. We refer to such sequences (defined formally in Section 5) as code cascades (see Fig. 3).
Instead of applying the decoding framework above to directly reduce the decoding of a corrupted codeword from C_ℓ to the unique decoding problem in C_0, we apply it at each level of a cascade, reducing the unique decoding problem in C_i to that in C_{i-1}. If the direct sum at each level of the cascade is a sufficiently strong parity sampler, the list decoding algorithm at a constant radius suffices for unique decoding even when the relevant error parameters are (sufficiently small) constants independent of ε. This implies that we can take k to be a (suitably large) constant. This also allows the splittability (and hence the degree) to be constant, yielding polynomial rates. We present the reduction using cascades in Section 6 and the parameter choices in Section 8. The specific versions of the list decoding results from [AJQ+20] needed here are instantiated in Section 9.
While the above allows for polynomial rate, the running time of the algorithm is still exponential in the number of levels ℓ of the cascade, since the list decoding for each level potentially produces a list with multiple candidates, and recursively calls the decoding algorithm for the previous level on each element of the list. We obtain a fixed polynomial time algorithm by "pruning" the list at each level of the cascade before invoking the decoding algorithm for the previous level, while only slightly increasing the parity sampling requirements. The details are contained in Section 6.
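The cascade structure itself is easy to picture: composing ℓ levels of k-wise sums is equivalent to a single k^ℓ-wise sum over the base code. The Python sketch below (our own toy model, using all tuples as a stand-in for walk collections) checks this equivalence for two levels.

```python
from itertools import product

def dsum(W, y):
    return tuple(sum(y[i] for i in w) % 2 for w in W)

def compose(W_outer, W_inner):
    """One cascade level: outer tuples index inner tuples; flattening yields
    the equivalent one-shot collection over the base index set."""
    return [tuple(i for idx in w for i in W_inner[idx]) for w in W_outer]

k, n = 2, 3
W1 = list(product(range(n), repeat=k))              # C_1 = dsum_{W1}(C_0)
W2_outer = list(product(range(len(W1)), repeat=k))  # C_2 = dsum_{W2_outer}(C_1)
W2 = compose(W2_outer, W1)                          # equivalent k^2-wise sums
```

Decoding proceeds in the opposite direction: a candidate for C_2 is reduced to candidates for C_1, and those to candidates for C_0.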
Working with Ta-Shma’s construction.
Finally, to obtain near-optimal rates, we need to work with Ta-Shma's construction, where the set of tuples corresponds to walks arising from the s-wide replacement product of G with another expanding graph H. One issue that arises is that the collection of walks as defined in [TS17] does not satisfy the important splittability condition required by our algorithms. However, this turns out to be easily fixable by modifying each step in Ta-Shma's construction to be exactly according to the zig-zag product of Reingold, Vadhan and Wigderson [RVW00]. We present Ta-Shma's construction and this modification in Section 4.
We also verify that the tuples given by Ta-Shma’s construction satisfy the conditions for applying the list decoding framework, in Section 7. While the sketch above stated this in terms of splittability, the results in [AJQ+20] are in terms of a more technical condition called tensoriality. We show in Section 7 that this is indeed implied by splittability, and also prove splittability for (the modified version of) Ta-Shma’s construction.
4 Ta-Shma’s Construction: A Summary and Some Tweaks
In this section, we first discuss the s-wide replacement product that is central to Ta-Shma's construction of near optimal ε-balanced codes, and then we describe the construction itself (we refer the reader to [TS17] for formal details beyond those we actually need here).
As mentioned before, we will also need to modify Ta-Shma's construction [TS17] a little to obtain splittability, which is a notion of expansion of a collection W (formally defined in Definition 7.9). The reason for this simple modification is that the splittability property is required by the list decoding framework. Note that we are not improving the Ta-Shma code parameters; in fact, we need to argue that with this modification we can still achieve Ta-Shma's parameters. Fortunately, this modification is simple enough that we will be able to essentially reuse Ta-Shma's original analysis. In Section 4.3, we will also have the opportunity to discuss, at an informal level, the intuition behind some parameter trade-offs in Ta-Shma codes, which should provide enough motivation when we instantiate these codes in Section 8.
4.1 The -wide Replacement Product
Ta-Shma's code construction is based on the so-called s-wide replacement product [TS17]. This is a derandomization of random walks on a graph G that will be defined via a product operation of G with another graph H (see Definition 4.2 for a formal definition). We will refer to G as the outer graph and H as the inner graph in this construction.
Let G be a d_1-regular graph on vertex set [n] and H be a d_2-regular graph on vertex set [d_1]^s, where s is any positive integer. Suppose the neighbors of each vertex of G are labeled 1, 2, …, d_1. For v ∈ V(G), let v_G[j] be the j-th neighbor of v. The s-wide replacement product is defined by replacing each vertex of G with a copy of H, called a "cloud". While the edges within each cloud are determined by H, the edges between clouds are based on the edges of G, which we will define via operators G_0, G_1, …, G_{s-1}. The i-th operator G_i specifies one inter-cloud edge for each vertex (v, (a_0, …, a_{s-1})) ∈ V(G) × V(H), which goes to the cloud whose G component is v_G[a_i], the neighbor of v in G indexed by the i-th coordinate of the H component. (We will resolve the question of what happens to the H component after taking such a step momentarily.)
Walks on the s-wide replacement product consist of steps with two different parts: an intra-cloud part followed by an inter-cloud part. All of the intra-cloud substeps simply move to a random neighbor in the current cloud, which corresponds to applying the operator I ⊗ A_H, where A_H is the normalized adjacency matrix of H. The inter-cloud substeps are all deterministic, with the first moving according to G_0, the second according to G_1, and so on, returning to G_0 for step number s + 1. The operator for such a walk taking t - 1 steps on the s-wide replacement product is
G_{(t-2) mod s} (I ⊗ A_H) ⋯ G_1 (I ⊗ A_H) G_0 (I ⊗ A_H).
Observe that a walk on the s-wide replacement product yields a walk on the outer graph G by recording the G component after each step of the walk. The number of (t - 1)-step walks on the s-wide replacement product is
n · d_1^s · d_2^{t-1},
since a walk is completely determined by its starting vertex and its intra-cloud steps. If d_2 is much smaller than d_1 and t is large compared to s, this is less than n · d_1^{t-1}, the number of (t - 1)-step walks on G itself. Thus the s-wide replacement product will be used to simulate random walks on G while requiring a reduced amount of randomness (of course, this simulation is only possible under special conditions, namely, when the walk distribution is uniform over each cloud).
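The savings in randomness can be made concrete with a quick count (the parameters below are our own illustrative choices, picked so that d_2 is much smaller than d_1 and t is large compared to s):

```python
def walk_counts(n, d1, d2, s, t):
    """(t-1)-step walk counts: on the s-wide replacement product, a walk is
    determined by its start vertex in [n] x [d1]^s and t-1 choices in [d2];
    on G itself, by a start vertex in [n] and t-1 choices in [d1]."""
    return n * d1**s * d2 ** (t - 1), n * d1 ** (t - 1)

replacement, plain = walk_counts(n=10**6, d1=2**10, d2=2**4, s=4, t=30)
```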
To formally define the -wide replacement product, we must consider the labeling of neighbors in more carefully.
Definition 4.1 (Rotation Map).
Suppose G is a d_1-regular graph on vertex set [n]. For each v ∈ [n] and j ∈ [d_1], let v_G[j] be the j-th neighbor of v in G. Based on the indexing of the neighbors of each vertex, we define the rotation map rot_G : [n] × [d_1] → [n] × [d_1] (this kind of map is called a rotation map in the zig-zag terminology [RVW00]) such that for every (v, j) ∈ [n] × [d_1],
rot_G((v, j)) = (v', j')  where  v' = v_G[j] and v = v'_G[j'].
Furthermore, if there exists a bijection φ : [d_1] → [d_1] such that for every (v, j) ∈ [n] × [d_1],
rot_G((v, j)) = (v_G[j], φ(j)),
then we call rot_G locally invertible.
If G has a locally invertible rotation map, the cloud label after applying the rotation map only depends on the current cloud label, not on the vertex of G. In the s-wide replacement product, this corresponds to the H component of the rotation map only depending on a vertex's H component, not its G component. We define the s-wide replacement product as described before, with the inter-cloud operator G_i using the i-th coordinate of the H component, which is a value in [d_1], to determine the inter-cloud step.
Definition 4.2 (-wide replacement product).
Suppose we are given the following:
- A d_1-regular graph G on vertex set [n], together with a locally invertible rotation map rot_G.
- A d_2-regular graph H on vertex set [d_1]^s.
And we define:
- For i ∈ {0, 1, …, s-1}, the map Rot_i : [n] × [d_1]^s → [n] × [d_1]^s given, for every v ∈ [n] and (a_0, …, a_{s-1}) ∈ [d_1]^s, by
Rot_i((v, (a_0, …, a_{s-1}))) := (v', (a_0, …, a_{i-1}, a'_i, a_{i+1}, …, a_{s-1})),
where (v', a'_i) = rot_G((v, a_i)).
- Denote by G_i the operator realizing Rot_i, and let A_H be the normalized random walk operator of H. Note that G_i is a permutation operator corresponding to a product of transpositions.
Then t - 1 steps of the s-wide replacement product are given by the operator
G_{(t-2) mod s} (I ⊗ A_H) ⋯ G_1 (I ⊗ A_H) G_0 (I ⊗ A_H).
Ta-Shma instantiates the s-wide replacement product with an outer graph G that is a Cayley graph, for which locally invertible rotation maps exist generically.
Remark 4.3.
Let Γ be a group and S ⊆ Γ a set closed under inversion. For every Cayley graph Cay(Γ, S), the map φ : S → S defined as φ(s) = s^{-1} gives rise to the locally invertible rotation map
rot((g, s)) = (g · s, s^{-1})
for every g ∈ Γ, s ∈ S.
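Remark 4.3 can be instantiated concretely. In the sketch below (our own toy choice of Cayley graph, with neighbor labels identified with generators), the rotation map (g, s) → (g + s, -s) on Cay(Z_12, {±1, ±5}) is an involution, witnessing local invertibility with φ(s) = s^{-1}.

```python
# Cayley graph Cay(Z_m, S) with S closed under inversion (negation mod m).
m = 12
S = [1, 11, 5, 7]  # {+-1, +-5} mod 12

def phi(s):
    """Local inversion function of Remark 4.3: phi(s) = s^{-1} (here -s mod m)."""
    return (-s) % m

def rot(g, s):
    """Rotation map: rot(g, s) = (g * s, phi(s)); for Z_m the product is g + s."""
    assert s in S
    return ((g + s) % m, phi(s))
```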
4.2 The Construction
Ta-Shma's code construction works by starting with a constant bias code C_0 ⊆ F_2^n and boosting it to arbitrarily small bias using direct sum liftings. Recall that the direct sum lifting is based on a collection W ⊆ [n]^t, which Ta-Shma obtains from t - 1 steps of random walk on the s-wide replacement product of two regular expander graphs G and H. The graph G is on n vertices (the blocklength of the base code), and the other parameters, such as the degrees d_1 and d_2 of G and H respectively, are chosen based on the target code parameters.
To elaborate, every (t - 1)-step walk on the replacement product gives a sequence of t outer vertices or G-vertices, which can be seen as an element of [n]^t. This gives the collection W with |W| = n · d_1^s · d_2^{t-1}, which means the rate of the lifted code is smaller than that of C_0 by a factor of d_1^s · d_2^{t-1}. However, the collection W is a parity sampler, and this means that the bias decreases (equivalently, the distance increases). The relationship between this decrease in bias and the decrease in rate, together with some careful parameter choices, allows Ta-Shma to obtain nearly optimal ε-balanced codes.
4.3 Tweaking the Construction
Recall the first steps in Ta-Shma’s construction are given by the operator
Naively decomposing the above operator into the product of operators is not good enough to obtain the splittability property which would hold provided was small for every in . However, each has singular values equal to since is an orthogonal operator and has singular values equal to . To avoid this issue we will tweak the construction to be the following product
The operator is exactly the walk operator of the zig-zag product of and with a rotation map given by the (rotation map) operator . This tweaked construction is slightly simpler in the sense that is an undirected graph. We know by the zig-zag analysis that is expanding as long as and are themselves expanders. More precisely, we have a bound that follows from [RVW00].
Fact 4.4.
Let be an outer graph and be an inner graph used in the -wide replacement product. For any integer ,
This bound will imply splittability as shown in Section 7.2. We will need to argue that this modification still preserves the correctness of the parity sampling and that it can be achieved with similar parameter trade-offs.
The formal definition of a length- walk on this slightly modified construction is given below.
Definition 4.5.
Let , be a -regular graph and be a -regular graph on vertices. Given a starting vertex , a -step walk on the tweaked -wide replacement product of and is a tuple such that
-
, and
-
for every , we have adjacent to in .
Note that each is a walk operator of a -regular graph. Therefore, the starting vertex together with a degree sequence uniquely defines a -step walk.
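The observation that a starting vertex and a degree sequence determine the walk can be sketched as follows (the interfaces are hypothetical: `neighbor(step, v, k)` stands in for "the k-th neighbor of v under the step-th walk operator" of the tweaked construction):

```python
# Minimal sketch: in the tweaked construction every step is a single
# move in some d-regular graph, so a walk is uniquely reconstructed
# from its start vertex plus one degree choice per step.

def walk_from_choices(start, choices, neighbor):
    """Reconstruct the walk (v_0, ..., v_t) from start and degree choices."""
    walk = [start]
    for step, k in enumerate(choices):
        walk.append(neighbor(step, walk[-1], k))
    return walk

# Toy instance: every step is one move on a 4-cycle (a 2-regular
# graph); choice 0 = clockwise, choice 1 = counterclockwise.
def cycle_neighbor(step, v, k):
    return (v + 1) % 4 if k == 0 else (v - 1) % 4

print(walk_from_choices(0, [0, 0, 1], cycle_neighbor))  # -> [0, 1, 2, 1]
```

This also explains the rate accounting: the number of walks is the number of starting vertices times the number of degree sequences.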
4.3.1 Parity Sampling
We argue informally why parity sampling still holds with similar parameter trade-offs. Later, in Section 4.3.2, we formalize a key result underlying parity sampling and, in Section 8, we compute the new trade-off between bias and rate in some regimes. In Section 4.1, the definition of the original -wide replacement product as a purely graph-theoretic operation was given. Now, we explain how Ta-Shma used this construction for parity sampling, obtaining codes near the GV bound.
For a word in the base code, let be the diagonal matrix, whose rows and columns are indexed by , with . Proving parity sampling requires analyzing the operator norm of the following product
(1)
when . Let be the all-ones vector and be the collection of all -step walks on the tweaked -wide replacement product. Ta-Shma showed (and it is not difficult to verify) that
From the previous equation, one readily deduces that
Set . To analyze the operator norm of , we will first need some notation. Note that is an operator acting on the space . Two of its subspaces play an important role in the analysis, namely,
Note that the complement subspace is with respect to the standard inner product. Observe that . Given arbitrary unit vectors , Ta-Shma considers the inner product
(2)
Each time an operator appears in the above expression, the next step of the walk can take one out of possibilities and thus the rate suffers a multiplicative decrease of . We think that we are “paying” for this step of the walk. The whole problem lies in the trade-off between rate and distance, so the crucial question now is how much the norm decreases as we pay . For a moment, suppose that the norm always decreases by a factor of per occurrence of . If in this hypothetical case we could further assume , then if was a product containing factors of , the final bias would be at most and the rate would have suffered a multiplicative decrease of (essentially) and we would be done.
Of course, this was an oversimplification. The general strategy is roughly the above, but a beautiful non-trivial step is needed. Going back to the bilinear form Eq. 2, if (or ), we pay and we do obtain a norm decrease of . More generally, note that can be decomposed as with and (decompose similarly), and we can carry out this process iteratively, collecting factors of . However, we are stuck with several terms of the form for ,
with , and for which the preceding naive norm decrease argument fails. This is the point in the analysis where the structure of the -wide replacement product is used. Since , these vectors are uniform on each “cloud”, i.e., copy of . Recall that a vertex in is an -tuple . Ta-Shma leverages the fact that such a tuple is uniform to implement (up to ) steps of random walk on . More precisely, Ta-Shma obtains the following beautiful result:
Theorem 4.6 (Adapted from Ta-Shma [TS17]).
Let be a locally invertible graph of degree , be a Cayley graph on , and be integers. If and , then
where is the diagonal matrix defined as for .
Remark 4.7.
Note that the walk operator in this theorem corresponds to the original construction. Theorem 4.6 was used by Ta-Shma to obtain Fact 4.9, whose Corollary 4.10 corresponds to the modified construction.
Ta-Shma proved Theorem 4.6 under the more general condition that is 0-pseudorandom. Roughly speaking, this property means that if we start with a distribution that is uniform over the clouds, and walk according to fixed -steps , then the distribution of -vertices obtained will be identical to the distribution obtained if we were doing the usual random walk on . We will always choose to be a Cayley graph on , which will imply that is also 0-pseudorandom. The proof of Theorem 4.6 crucially uses the product structure of : every vertex of can be represented by registers of bits each, and both inter-cloud and intra-cloud steps can be seen as applying register-wise bijections using some canonical mapping between and .
Ta-Shma’s original parity sampling proof required , where is the initial bias and is an error parameter arising from a number theoretic construction of Ramanujan graphs for the outer graph . This is because is the reduction of bias in every two steps while taking a walk on (see Theorem 5.2). Having ensured that after establishing Theorem 4.6, we were collecting enough bias reduction for the price we paid for two steps. In the modified construction, we now have possibilities for each step in (so the price for two steps is ), and so if instead we have in the modified construction, we claim that the correctness of the parity sampling analysis is preserved, as well as (essentially) the trade-off between walk length and norm decay. Fortunately, Ta-Shma’s parameters decouple, and we can choose parameters to satisfy the above requirement.
Remark 4.8.
This modification of the -wide replacement product of and essentially (except at the first and last factors in the product of operators) amounts to taking a different inner graph which can be factored as (and is still -pseudorandom).
4.3.2 Spectral Analysis of the Modified Construction
We formally show that we don’t lose much by going from Ta-Shma’s original -wide replacement product construction to its tweaked version. The key technical result obtained by Ta-Shma is the following, which is used to analyze the bias reduction as a function of the total number of walk steps .
Fact 4.9 (Theorem 24 abridged [TS17]).
If is a Cayley graph on and , then
where is the sign operator of a biased word defined as a diagonal matrix with for every .
We reduce the analysis of Ta-Shma’s tweaked construction to Fact 4.9. In doing so, we only lose one extra step, as shown below.
Corollary 4.10.
Proof.
Remark 4.11.
We know that in the modified construction is a Cayley graph since is a Cayley graph.
From this point onward, we will be working exclusively with the modified construction instead of using it in its original form. Any references to Ta-Shma’s construction or the -wide replacement product will actually refer to the modified versions described in this section.
5 Code Cascading
A code cascade is a sequence of codes generated by starting with a base code and recursively applying lifting operations.
Definition 5.1.
We say that a sequence of codes is a code cascade provided for every . Each is a subset of , where is the block length of the code .
Let us see how code cascades may be useful for decoding. Suppose we wish to lift the code to , and there is some such that . In our case of bias boosting, this will depend on the target bias . However, the expansion requirement of the list-decoding framework of [AJQ+20] has a poor dependence on . A way to work around this issue is to go from to via a code cascade as above such that each is a constant independent of the final bias but (which means depends on ). The final code of the cascade is the same as the code obtained from length- walks. While decoding will now become an -level recursive procedure, the gain from replacing by will outweigh this loss, as we discuss below.
5.1 Warm-up: Code Cascading Expander Walks
We now describe the code cascading construction and unique decoding algorithm in more detail. Let be a -regular graph with uniform distribution over the edges. Let be a sufficiently large positive integer, which will be the number of vertices of the walks used for the lifting between consecutive codes in the cascade. At first, it will be crucial that we can take so that the triangle inequality arising from the analysis of the lifting between two consecutive codes involves a constant number of terms. We construct a recursive family of codes as follows.
-
Start with a code which is linear and has constant bias .
-
Define the code , which is the direct sum lifting over the collection of all length- walks on using the code .
-
Let be the (directed) graph where is the collection of all walks on vertices on with two walks and connected iff is adjacent to in .
-
Define to be the direct sum lifting on the collection of all length- walks on using the code , i.e., .
-
Repeat this process to yield a code cascade .
Thanks to the definition of the graphs and the recursive nature of the construction, the final code is the same as the code obtained from by taking the direct sum lifting over all walks on vertices of . We can use Ta-Shma’s analysis (building on the ideas of Rozenman and Wigderson [Bog12]) for the simpler setting of walks over a single expander graph to determine the amplification in bias that occurs in going from all the way to .
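The recursive structure just described can be sketched concretely. The sketch below uses hypothetical helper names and a toy graph: level 1 takes all short walks on the base graph, level 2 takes short walks on the graph whose vertices are level-1 walks, and flattening a level-2 walk recovers an ordinary longer walk on the base graph.

```python
# Hedged sketch of the cascade's walk collections (names illustrative).

def walks(vertices, adj, k):
    """All k-vertex walks (w_1, ..., w_k) with adj(w_j, w_{j+1}) for each j."""
    if k == 1:
        return [(v,) for v in vertices]
    return [w + (v,) for w in walks(vertices, adj, k - 1)
            for v in vertices if adj(w[-1], v)]

# Toy base graph: the 4-cycle.
def adj_G(u, v):
    return (u - v) % 4 in (1, 3)

W1 = walks(range(4), adj_G, 2)     # level 1: 2-vertex walks on G

def adj_W(w, w2):                  # the derived graph on vertex set W1:
    return adj_G(w[-1], w2[0])     # last vertex of w borders first of w2

W2 = walks(W1, adj_W, 2)           # level 2: walks whose vertices are walks

def flatten(ww):
    return tuple(v for w in ww for v in w)

# Flattening a level-2 walk yields a genuine 4-vertex walk on G.
assert all(adj_G(u, v) for ww in W2
           for u, v in zip(flatten(ww), flatten(ww)[1:]))
```

This is exactly why the final code of the cascade coincides with the code obtained from the base code by one long-walk lifting.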
Theorem 5.2 (Adapted from Ta-Shma [TS17]).
Let be an -balanced linear code, and let be the direct sum lifting of over the collection of all length- walks on a graph . Then
If and , taking in the above theorem shows that the final code is -balanced. Observe that the required expansion of the graph only depends on the constant initial bias , not on the desired final bias . It will be important for being able to decode with better parameters that both and are constant with respect to ; only depends on the final bias (with more care we can make depend on , but we restrict this analysis to Ta-Shma’s refined construction on the -wide replacement product).
As mentioned before, to uniquely decode we will inductively employ the list decoding machinery for expander walks from [AJQ+20]. The list decoding algorithm can decode a direct sum lifting as long as the graph is sufficiently expanding, the walk length is large enough, and the base code has an efficient unique decoding algorithm (see Theorem 6.1 for details).
The expansion requirement ultimately depends on the desired list decoding radius of , or more specifically, on how close the list decoding radius is to . If the distance of is at most , its unique decoding radius is at most , which means list decoding at the unique decoding radius is at a constant difference from and thus places only a constant requirement on the expansion of . In the case of the code cascade , unique decoding of is guaranteed by the induction hypothesis. It is not too difficult to see that each graph will have the same second singular value as , so we can uniquely decode if meets the constant expansion requirement and is sufficiently large.
5.2 Code Cascading Ta-Shma’s Construction
We will now describe how to set up a code cascade based on walks on an -wide replacement product. Consider the -wide replacement product of the outer graph with the inner graph . The first walk steps are given by the walk operator
Let . If the total walk length is a multiple of , the walks are generated using the operator
Here is used as a “binding” operator to connect two walks containing vertices at level , vertices at level , and so on. More precisely, we form the following code cascade.
-
is an -balanced linear code efficiently uniquely decodable from a constant radius.
-
, where is the set of length-(s-1) walks given by the operator
-
, where is the set of length- walks over the vertex set (with the latter being the set of length- walks on the replacement product graph as mentioned above).
-
, where is the set of length- walks over the vertex set (for simplicity we chose the number of vertices in all walks of the cascade to be , but it could naturally be some depending on ). Similarly to the cascade of expander walks above, the lift can be thought of as being realized by taking walks using a suitable operator analogous to . Since its description is more technical, we postpone its definition (see Definition 7.2) to Section 7.2 where it is actually used.
-
denotes the final code in the sequence, which will later be chosen so that its bias is at most .
6 Unique Decoding of Ta-Shma Codes
We show how code cascading together with list decoding for each level of the cascade allow us to obtain an efficient unique decoding algorithm for Ta-Shma’s construction. We obtain a sequence of results of increasing strength culminating in Theorem 1.1 (which we recall below for convenience). The approach is as follows: we use several different instantiations of Ta-Shma’s construction, each yielding a value of (for the -wide replacement product) and expansion parameters for the family of outer and inner graphs, and show how the list decoding framework can be invoked in the associated cascade for each one.
See Theorem 1.1.
In this section, we will fit these objects and tools together assuming the parameters are chosen to achieve the required rates and the conditions for applying the list decoding results are satisfied. The concrete instantiations of Ta-Shma codes are done in Section 8. Establishing that the list decoding framework can be applied to this construction is done in Section 7 after which the framework is finally instantiated in Section 9.
Ta-Shma uses the direct sum lifting on an -wide replacement product graph to construct a family of -balanced codes with rate and finds parameters for such codes to have the required bias and rate. We will discuss unique decoding results for several versions of these codes. Throughout this section, we will use collections which will always be either the set of walks with vertices on an -wide replacement product graph (corresponding to the first level of the code cascade), which we denote , or a set of walks where the vertices are walks on a lower level of the code cascade.
6.1 Unique Decoding via Code Cascading
To perform unique decoding we will use the machinery of list decoding from Theorem 6.1 (proven later in Section 9), which relies on the list decoding framework of [AJQ+20]. Proving that this framework can be applied to Ta-Shma’s construction is one of our technical contributions.
Theorem 6.1 (List decoding direct sum lifting).
Let be a constant, , and
Suppose is an -balanced linear code and is the direct sum lifting of on a -splittable collection of walks . There exists an absolute constant such that if
then the code is -balanced and can be efficiently list decoded in the following sense:
If is -close to , then we can compute the list
in time
where is the running time of a unique decoding algorithm for . Otherwise, we return with the same running time as in the preceding case.
Note that the requirement on in the above theorem is necessary for the lifted code to be -balanced. Splittability will imply that the walk collection is expanding, which gives us parity sampling for large . Specifically, must be large enough for to be a -parity sampler.
Applying the list decoding tool above, we can perform unique decoding in the regime of , , and being constant. With these choices the expansion required for splittability and the parity sampling strength are only required to be constants.
Lemma 6.2 (Decoding Step).
Let and . If is a walk collection on vertex set with and splittability , where and are as in Theorem 6.1, we have the following unique decoding property:
If is an -balanced linear code that can be uniquely decoded in time , then is an -balanced code that can be uniquely decoded in time .
Proof.
Using Theorem 6.1, we can list decode up to a radius of for any if we have the appropriate parameters and . Let be a received word within the unique decoding radius of . To perform unique decoding, we simply run the list decoding algorithm on and return the codeword on the resulting list which is closest to ; this will yield the correct result as long as the list decoding radius is larger than the unique decoding radius. It suffices to have . We choose parameters as follows:
1.
Take to ensure .
2.
Let be the smallest integer satisfying the assumption in Theorem 6.1 with the chosen . Take .
3.
Take .
Note that and satisfy the conditions of Theorem 6.1, so we can use this theorem to list decode a received word in time . To uniquely decode, we return the element of the list closest to (or report failure if the list is empty).
Iteratively using the decoding step given by Lemma 6.2 above, we obtain unique decodability of all codes in a cascade (under suitable assumptions).
Lemma 6.3 (Code Cascade Decoding).
Let and . Suppose is a code cascade where is an -balanced linear code that can be uniquely decoded in time .
If for every we have that is obtained from by a -splittable walk collection on vertex set with and , where and are as in Theorem 6.1, then is uniquely decodable in time
Proof.
We are almost ready to prove our first main theorem establishing decodability close to the Gilbert–Varshamov bound. We will need parameters for an instantiation of Ta-Shma’s code that achieves the desired distance and rate (which will be developed in Section 8.1) and a lemma relating splittability to the spectral properties of the graphs used in the construction (to be proven in Section 7.2).
Lemma 6.4 (Ta-Shma Codes I).
For any , there are infinitely many values of (with 0 as an accumulation point) such that for infinitely many values of , there are explicit binary Ta-Shma codes with
(i)
distance at least (actually -balanced), and
(ii)
rate .
Furthermore, is the direct sum lifting of a base code using the collection of walks on the -wide replacement product of two graphs and , with the following parameters:
-
.
-
The inner graph is a regular graph with , where .
-
The outer graph is a regular graph with , where .
-
The base code is uniquely decodable in time and has bias .
-
The number of vertices in the walks satisfies .
Lemma 6.5.
Let be either the collection of walks of length on the -wide replacement product with outer graph and inner graph or the collection of walks over the vertex set , where . Then is -splittable with .
The statement of this first decoding theorem is more technical than Theorem 1.1, but it will be easier to prove while the latter will build on this theorem with a more careful tuning of parameters.
Theorem 6.6 (Main I).
For every , there are infinitely many values (with an accumulation point) such that for infinitely many values of there are explicit binary linear Ta-Shma codes with
(i)
distance at least (actually -balanced),
(ii)
rate , and
(iii)
a unique decoding algorithm with running time .
Proof.
We proceed as follows:
1.
Let and (these choices are arbitrary; we only need , , and ). Let be the constant from Theorem 6.1 with this value of .
2.
Given , Lemma 6.4 provides a value such that the direct sum lifting on the -wide replacement product with can achieve a rate of for infinitely many . Choose to be an integer larger than both and that also satisfies
(3)
where is the constant from Theorem 6.1.
3.
Use Lemma 6.4 with this value of to obtain graphs and and a base code having the specified parameters , , , and , with the additional requirement that for some integer . These parameter choices ensure that the resulting code has the desired distance and rate. Since , we have . From the choice of satisfying , we deduce that . Note also that the bias of the code is smaller than .
4.
Create a code cascade with levels using the -wide replacement product of the graphs and as in Section 5.2, starting with and ending with the final code . As the total number of vertices in a walk is , each level of the code cascade will use walks with vertices. Let be the sequence of codes in this cascade.
5.
In order to satisfy the splittability requirement of Lemma 6.3, the walk collection at each level of the code cascade must be -splittable, where . (We use instead of in the requirement for a technical reason that will be clear in Section 8.2.) The bounds on the singular values of and and Lemma 6.5 ensure that
which is smaller than by Eq. 3.
6.
As all hypotheses of Lemma 6.3 are satisfied by the code cascade, we apply it to conclude that is uniquely decodable in time
where we use that is uniquely decodable in time , , for every , and .
In the code cascade constructed in Theorem 6.6, the final number of vertices in a walk is , where is a sufficiently large constant that does not depend on . The limited choices for place some restrictions on the values of the final bias that can be achieved. To achieve any bias for we need to choose the parameters more carefully, which will be done in Section 8.2 to yield our next main result.
Theorem 6.7 (Main II).
For every and every with and sufficiently small, there are explicit binary linear Ta-Shma codes for infinitely many values with
(i)
distance at least (actually -balanced),
(ii)
rate , and
(iii)
a unique decoding algorithm with running time .
Ta-Shma obtained codes of rate with vanishing as goes to zero. We obtain a unique decoding algorithm for this regime (with decreasing slightly more slowly as vanishes). More precisely, using the parameters described in Section 8.3 and the running time analysis in Section 6.2, we obtain the following theorem, which is our main result for unique decoding.
Theorem 6.8 (Main Unique Decoding (restatement of Theorem 1.1)).
For every sufficiently small, there are explicit binary linear Ta-Shma codes for infinitely many values with
(i)
distance at least (actually -balanced),
(ii)
rate where , and
(iii)
a unique decoding algorithm with running time .
Furthermore, if instead we take to be an arbitrary constant, the running time becomes (fixed polynomial time).
Theorem 1.2 about gentle list decoding is proved in Section 8.4 after instantiating Ta-Shma codes in some parameter regimes in the preceding parts of Section 8.
6.2 Fixed Polynomial Time
In Theorem 6.7, a running time of was obtained to decode Ta-Shma codes of distance and rate for constant and block length . The running time contains an exponent which depends on the bias and is therefore not fixed polynomial time. We show how to remove this dependence in this regime of being an arbitrary constant. More precisely, we show the following.
Theorem 6.9 (Fixed PolyTime Unique Decoding).
Let be an arbitrary constant. For every sufficiently small, there are explicit binary linear Ta-Shma codes for infinitely many values with
(i)
distance at least (actually -balanced),
(ii)
rate , and
(iii)
a unique decoding algorithm with fixed polynomial running time .
The list decoding framework finds a list of pairs of size at most at each level of the code cascade and recursively issues decoding calls to all in this list. Since the number of lifts in the cascade is , we end up with an overall running time of .
We will describe a method of pruning these lists which will lead to fixed polynomial running time. Let , where is a constant, be the list decoding radius used for a unique decoding step in the cascade. To achieve fixed polynomial time we will prune this polynomially large list of words to a constant size at each inductive step in Lemma 6.3. As we are working with parameters within the Johnson bound, the actual list of codewords has constant size . At every step, we will be able to find a small sublist whose size only depends on that has a certain covering property, and then issue decoding calls to this much smaller list.
Definition 6.10 (-cover).
Let , be a code, , and . We say that is a -cover of if for every , there exists some with (that is, either or ).
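Definition 6.10 can be transcribed directly as a checkable predicate. A sketch with hypothetical names (words are bit tuples, and distances are taken here as plain Hamming counts rather than normalized distances, for simplicity):

```python
# Checkable version of the cover property: every codeword is within
# the given radius of some cover word or of its complement.

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def is_cover(cover_words, codewords, radius):
    comp = lambda z: tuple(1 - b for b in z)
    return all(any(hamming(c, z) <= radius or hamming(c, comp(z)) <= radius
                   for z in cover_words)
               for c in codewords)

print(is_cover([(0, 0, 0, 0)], [(0, 0, 0, 1), (1, 1, 1, 0)], 1))  # -> True
```

Allowing complements is what makes a cover potentially much smaller than the list itself, and is harmless because the lifted codes are linear up to complementation of bits along parities.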
Lemma 6.11 (Cover Compactness).
Let , be a linear -balanced code, be an -balanced code, and . Define
Suppose is a -cover of for some . Further, suppose that for every , we have . If is a -parity sampler, then there exists with which is a -cover of .
Proof.
Build a graph where the vertices are pairs and two vertices , are connected iff . Let be any maximal independent set of this graph. Any two vertices have and thus since is a -parity sampler. This means that is a code of distance at least . By the condition that for all and the Johnson bound, we have .
Finally, we will show that is a -cover of . Let . As is a -cover of , there exists a pair with , so is within distance of either or its complement . The construction of as a maximal independent set ensures that there is some with , so is also within distance of either or its complement . Applying the triangle inequality in all of the possible cases, we see that either or , which implies is a -cover of .
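The pruning in the proof above is a greedy maximal independent set computation. A hedged sketch (the closeness test on lifted words is simplified to plain Hamming distance, and `lift` is a hypothetical stand-in for the direct sum lifting):

```python
# Greedy maximal independent set on the graph whose vertices are cover
# elements, with an edge when their lifts are close. Maximality means
# every discarded element is close to a kept one, so by the triangle
# inequality the covering property degrades by at most a factor of 2.

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def prune_cover(cover, lift, threshold):
    """Keep a maximal set of pairwise-far elements (greedy MIS)."""
    kept = []
    for z in cover:
        if all(hamming(lift(z), lift(w)) > threshold for w in kept):
            kept.append(z)
    return kept

words = [(0, 0, 0, 0), (0, 0, 0, 1), (1, 1, 1, 1), (1, 1, 0, 1)]
pruned = prune_cover(words, lambda z: z, threshold=1)
# Every original word stays within distance 1 of some kept word.
assert all(any(hamming(z, w) <= 1 for w in pruned) for z in words)
```

In the actual proof, parity sampling then bounds the size of the kept set via the Johnson bound, which is what makes the pruned cover constant-sized.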
To decode in fixed polynomial time, we use a modification of the list decoding result Theorem 6.1 that outputs a -cover of the list of codewords instead of the list itself. Theorem 6.1 recovers the list by finding and unique decoding every element of it. To get , we use the same algorithm, but stop before the final decoding step. This removes the unique decoding time of the base code from the running time of the list decoding algorithm. We will apply Lemma 6.11 after each time we call this -cover algorithm to pare the list down to a constant size before unique decoding; note that this loses a factor of 2 in the strength of the cover. To compensate for this, we will use a collection with stronger parity sampling than required for Theorem 6.1. In that theorem, was a -parity sampler to ensure that we obtained words within the list decoding radius of the base code. By using a stronger parity sampler, the words in the pruned list will still be within the unique decoding radius even after accounting for the loss in the bias from cover compactness, which means decoding will still be possible at every level of the cascade. Fortunately, improving the parity sampling only requires increasing the walk length by a constant multiplicative factor. The cover retrieval algorithm below will be proven in Section 9.
Theorem 6.12 (Cover retrieval for direct sum lifting).
Let be a constant, , , and
Suppose is an -balanced linear code and is the direct sum lifting of on a -splittable collection of walks . There exists an absolute constant such that if
then the code is -balanced, is a -parity sampler, and we have the following:
If is -close to , then we can compute a -cover of the list
in which for every , in time
Otherwise, we return with the same running time as in the preceding case.
We now have all of the pieces necessary to prove Theorem 6.9. The process is essentially the same as our earlier unique decoding algorithm, except we use the cover retrieval algorithm from Theorem 6.12 instead of the full list decoding from Theorem 6.1. This allows us to insert a list pruning step in between obtaining the -cover and calling the unique decoding algorithm for the previous level of the cascade.
Proof of Theorem 6.9.
We use the code from Theorem 6.7 to get the desired distance and rate, with the slight modification of ensuring is larger than from Theorem 6.12 rather than from Theorem 6.1.
Each level of the code cascade between and uses constant and , which allows for decoding in a similar fashion to Lemma 6.2 and Lemma 6.3. The difference is that we use Theorem 6.12 as the decoding step to obtain a -cover of the list for , where . By Lemma 6.11 and the fact that the walk collection is a -parity sampler, has a -cover of size at most . The covering property says that for every , there exists some such that is within distance of either or its complement . This is the unique decoding radius of the -balanced code , so we can recursively decode the list
to obtain the complete list of codewords in .
Now we analyze the running time. On each level of the code cascade, we run the cover retrieval algorithm once to get , prune the cover to get , and then feed the union of and its complement (which has size at most ) into the unique decoding algorithm for the next level of the cascade. Letting be the running time of unique decoding a single word in the code , we have the following recurrence:
Note that the base code has constant bias and thus it has a fixed polynomial time decoding algorithm (e.g. Theorem 6.7). The height of the recursive call tree is the number of levels in the code cascade, which is , as in the proof of Theorem 6.6. Each node of this tree has a constant branching factor of . Thus, the tree has nodes, each of which costs at most time. Furthermore, in this regime of being a constant, is constant as well as , so we have and the total running time is .
7 Satisfying the List Decoding Framework Requirements
The list decoding framework of [AJQ+20] is capable of decoding codes obtained from direct sum liftings, provided they satisfy a few requisite properties. The framework was originally shown to work for expander walks; we need to adapt it to our case of a code cascade based on walks on the -wide replacement product. We will start with a broad overview of the list decoding algorithm and point out where various requirements arise.
The problem of finding a list of codewords in a direct sum lifting close to a received word can be viewed as finding approximate solutions to a -XOR instance. This is done by solving a particular SOS program and rounding the resulting solution. The algorithm is unable to perform rounding if the -XOR instance is based on an arbitrary collection of walks ; it can only handle liftings in which satisfies a property called tensoriality. If is tensorial, the SOS local variables in the solution can be approximated by product distributions, which will allow us to obtain a list of solutions by independent rounding. Tensoriality for expander walks is a consequence of a simpler property known as splittability, which is a certain measure of the expansion of a walk collection.
Unfortunately, the list returned by the rounding process will not contain codewords directly—instead, we only get a guarantee that all of the codewords we are looking for have a weak agreement (just over 1/2) with something on this list. We will find the desired codewords by relying on the parity sampling of . If is a sufficiently good parity sampler, weak agreement in the lifted space corresponds to a much stronger agreement in the ground space. This will allow us to recover the codewords using the unique decoding algorithm of the base code.
To recap, applying the list decoding framework in our setting requires doing the following:
1.
Proving parity sampling for the walks used in the code cascade (Section 7.1).
2.
Showing that the walk collection of the -wide replacement product is splittable (Section 7.2).
3.
Making Ta-Shma’s construction compatible with the Sum-of-Squares machinery (Section 7.3) and then obtaining tensoriality from splittability (Section 7.4).
An additional complication is introduced by using a code cascade instead of a single decoding step: the above requirements need to be satisfied at every level of the cascade. The details of the proofs will often differ between the first level of the cascade, which is constructed using walks on the -wide replacement product, and higher levels, which are walks on a directed graph whose vertices are walks themselves. Once we have established all of the necessary properties, we will instantiate the list decoding framework in Section 9.
We will first define some convenient notation which will be used throughout this section.
Notation 7.1.
Let be a -regular outer graph and be a -regular inner graph used in Ta-Shma’s -wide replacement product.
Let be integers. We define to be the set of all walks starting at time and ending at time in Ta-Shma’s construction. More precisely, since and are regular graphs, the collection contains all walks obtained by sampling a uniform vertex and applying the operator
where the index of each is taken modulo . Observe that when , we have .
We define a family of Markov operators which will play a similar role to the graphs from the cascade described in Section 5.1, but for Ta-Shma’s construction rather than expander walks.
Definition 7.2 (Split Operator).
Let . We define the graph walk split operator
such that for every ,
where denotes the concatenation of the walks and . The operator can be defined more concretely in matrix form such that for every and ,
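As an illustrative sketch (not part of the formal construction), the split operator can be pictured on walks in a single small undirected graph: the matrix has a nonzero entry at a pair of walks exactly when concatenating the row walk with the column walk yields a longer walk. The helper names below are hypothetical, and plain walks on one graph stand in for walks on Ta-Shma's replacement product.

```python
import numpy as np

def walks(adj, num_vertices):
    """Enumerate all walks visiting `num_vertices` vertices in the graph
    given by the adjacency-list dict `adj`."""
    result = [(v,) for v in sorted(adj)]
    for _ in range(num_vertices - 1):
        result = [w + (u,) for w in result for u in adj[w[-1]]]
    return result

def split_operator(adj, k1, k2):
    """Row-stochastic split operator: entry (w1, w2) is nonzero exactly when
    the concatenation w1 followed by w2 is itself a walk in the graph."""
    rows, cols = walks(adj, k1), walks(adj, k2)
    S = np.array([[1.0 if w2[0] in adj[w1[-1]] else 0.0 for w2 in cols]
                  for w1 in rows])
    return S / S.sum(axis=1, keepdims=True)
```

For instance, on a 4-cycle there are 8 walks with 2 vertices, and each row of `split_operator(adj, 2, 2)` is a probability distribution over the walks that can follow it.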
7.1 Parity Sampling for the Code Cascade
To be able to apply the list decoding machinery to the code cascade , we need the direct sum lifting at every level to be a parity sampler. The first level in the cascade uses walks directly on the -wide replacement product, which we can show is a good parity sampler using the spectral properties proven in Section 4.3.1. However, it will be more convenient for calculating parameters later on to prove a weaker result, which will suffice for our purposes since we only need to obtain constant bias for every level of the cascade. We analyze the parity sampling of these walks with the same strategy Ta-Shma employed to show parity sampling for walks on expander graphs (which resulted in Theorem 5.2).
Claim 7.3.
Let be the collection of walks on the -wide replacement product of the graphs and and be a word with . Let be the diagonal matrix with entries for . If for all , then
Proof.
Let be even. Take a vector with and let and be its parallel and orthogonal components to the all ones vector. For , let . Consider two terms of the product appearing in the claim. Since is unitary, . We have
Applying this inequality to every two terms of the product, the result follows.
Corollary 7.4.
Let be the collection of walks on the -wide replacement product of the graphs and and . If for all , then is an -parity sampler, where .
Proof.
Let have bias at most . The bias of is given by the following (this is slightly different from the expression for the bias given in Section 4.3, but both are equal since moving on the component of the graph doesn’t affect the bit assigned to a vertex):
where is the diagonal matrix with entries for and is the all-ones vector. Since is unitary, we have
by 7.3. Hence is an -parity sampler.
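Parity sampling can be illustrated numerically in a toy setting. On the complete graph with self-loops (standing in for the expander walks above), a walk with t vertices is just t independent uniform vertices, so the bias of the direct sum lifting is exactly the bias of the base word raised to the power t. All names below are hypothetical.

```python
import itertools

def bias(bits):
    """Bias of a binary word: |E[(-1)^{x_i}]| over its positions."""
    return abs(sum(1 - 2 * b for b in bits)) / len(bits)

def direct_sum_bias(x, walk_collection):
    """Bias of the direct sum lifting: walk w is assigned the XOR of x on w."""
    lifted = [sum(x[v] for v in w) % 2 for w in walk_collection]
    return bias(lifted)

# Complete graph with self-loops: every 3-tuple of vertices is a walk,
# so the lifted bias is exactly bias(x) ** 3.
x = [0] * 6 + [1] * 2                                   # a word of bias 1/2
walks3 = list(itertools.product(range(8), repeat=3))    # all walks, 3 vertices
```

Here `direct_sum_bias(x, walks3)` equals (1/2)^3 = 1/8, matching the exact amplification in this idealized case; expander walks achieve a comparable decay with far fewer walks.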
For higher levels of the cascade, we need to prove parity sampling for collections of walks over walks. Since the walks on the first level contain vertices, when we take walks on higher levels, the operator linking different walks together will always use as the walk operator for the step. Thus we can consider a more specific form of the split operator where we split at a time parameter that is one less than a multiple of .
Definition 7.5.
Let be a positive integer. We define the operator as
where , , and . In this case, .
All levels of the code cascade beyond the first use walks generated by the directed operator . Proving parity sampling for these walks is analogous to the proof of Corollary 7.4, but slightly simpler since the walk operator doesn’t change with each step.
Claim 7.6.
Let be a positive integer and be a word with . Let be the diagonal matrix with entries for . For every integer , we have
Proof.
Take a vector with and let and be its parallel and orthogonal components to the all ones vector. Since is unitary, . We have
As , the result follows.
Corollary 7.7.
Let be a positive integer and . The collection of walks with vertices over the vertex set using random walk operator is an -parity sampler, where .
Proof.
Let have bias at most . The bias of the direct sum lifting of is given by
where is the diagonal matrix with entries for and is the all-ones vector. Since is unitary, we have
by 7.6. Hence is an -parity sampler.
7.2 Splittability of Ta-Shma’s Construction
We investigate the splittability of the collection of walks generated by Ta-Shma’s construction. In order to formally define this property, we will need the concept of an interval splitting tree, which describes how a walk is split into smaller and smaller pieces.
Definition 7.8 (Interval Splitting Tree).
We say that a binary rooted tree is a -interval splitting tree if it has exactly leaves and
- the root of is labeled with for some , and
- each non-leaf non-root vertex of is labeled with for some integer . Suppose is the label assigned to the parent of . If is a left child, we must have and ; otherwise, we must have and .
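A balanced interval splitting tree of this kind can be built recursively; the following hedged sketch (with hypothetical names) labels each internal node with its interval and splits near the midpoint, so that the leaves enumerate the interval in order.

```python
def splitting_tree(a, b):
    """Interval splitting tree for [a, b]: a leaf is a single index, and an
    internal node stores its interval plus the left and right subtrees
    obtained by splitting near the midpoint."""
    if a == b:
        return a                              # leaf
    m = (a + b) // 2                          # split point
    return ((a, b), splitting_tree(a, m), splitting_tree(m + 1, b))

def leaves(tree):
    """Leaves of the splitting tree in left-to-right order."""
    if isinstance(tree, int):
        return [tree]
    _, left, right = tree
    return leaves(left) + leaves(right)
```

A tree with k leaves built this way has exactly k - 1 internal nodes, one per split operator appearing in the splittability definition below.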
Given an interval splitting tree , we can naturally associate a split operator to each internal node . The splittability of a collection of -tuples is a notion of expansion at every node in the splitting tree.
Definition 7.9 (-splittability).
The collection is said to be -splittable if is a -interval splitting tree and
for every internal node of .
If there exists some -interval splitting tree such that is -splittable, then will be called -splittable.
In order to prove that the collection of walks in Ta-Shma’s construction is splittable, a split operator can be related to the walk operator as shown below. This structural property will allow us to deduce spectral properties of from the spectrum of .
Lemma 7.10.
Let . Suppose is a -regular outer graph on vertex set with walk operator used at step of a walk on the -wide replacement product and is a -regular inner graph on vertex set with normalized random walk operator . Then there are orderings of the rows and columns of the representations of and as matrices such that
where is the all ones matrix.
Proof.
Partition the set of walks into the sets , where if the last vertex of the walk satisfies and . Similarly, partition into the sets , where if the first vertex of the walk satisfies and . Note that and for all , since there are choices for each step of the walk.
Now order the rows of the matrix so that all of the rows corresponding to walks in appear first, followed by those for walks in , and so on in lexicographic order of the indices of , with an arbitrary order within each set. Do a similar re-ordering of the columns for the sets . Observe that
which only depends on the adjacency of the last vertex of and the first vertex of . If the vertices and are adjacent, then
for every and ; otherwise, . Since the walks in the rows and columns are sorted according to their last and first vertices, respectively, the matrix exactly matches the tensor product .
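The block structure used in this proof can be checked numerically in a simplified setting: for walks on a single d-regular graph (standing in for the replacement product), sorting the row walks by last vertex and the column walks by first vertex turns the unnormalized split matrix into a Kronecker product of the adjacency matrix with an all-ones block. All names here are hypothetical.

```python
import numpy as np

def walks(adj, num_vertices):
    """All walks with `num_vertices` vertices in the adjacency-list graph."""
    result = [(v,) for v in sorted(adj)]
    for _ in range(num_vertices - 1):
        result = [w + (u,) for w in result for u in adj[w[-1]]]
    return result

adj = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}    # 4-cycle, 2-regular
d, k1, k2 = 2, 2, 2

rows = sorted(walks(adj, k1), key=lambda w: w[-1])     # group by last vertex
cols = sorted(walks(adj, k2), key=lambda w: w[0])      # group by first vertex
S = np.array([[1.0 if c[0] in adj[r[-1]] else 0.0 for c in cols] for r in rows])

A = np.array([[1.0 if u in adj[v] else 0.0 for u in sorted(adj)]
              for v in sorted(adj)])
J = np.ones((d ** (k1 - 1), d ** (k2 - 1)))            # all-ones block
```

With this ordering, `np.allclose(S, np.kron(A, J))` holds: each block of the split matrix is constant, equal to the corresponding adjacency entry.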
Corollary 7.11.
Let . Suppose is a -regular outer graph with walk operator used at step of a walk on the -wide replacement product and is a -regular inner graph with normalized random walk operator . Then
Proof.
Remark 7.12.
Corollary 7.11 is what causes the splittability argument to break down for Ta-Shma’s original construction, as .
By combining this result with the spectral bound from 4.4, we find that the collection of walks of length on the -wide replacement product is -splittable for any splitting tree , where is controlled by the second singular values of the graphs and . This analysis can also be applied to walks on higher levels of the cascade where the vertex set is .
Corollary 7.13 (Restatement of Lemma 6.5).
The collection of walks on the -wide replacement product with outer graph and inner graph and the collection of walks on the vertex set with random walk operator and are both -splittable with .
Proof.
By Corollary 7.11 and 4.4, the split operator for any satisfies
so is -splittable with , as any internal node of any -interval splitting tree will have . The split operators of any -interval splitting tree for the collection are of the form with and , which means is -splittable as well.
7.3 Integration with Sum-of-Squares
Before defining tensoriality and obtaining it in our setting, we examine how the Sum-of-Squares hierarchy is used in the list decoding algorithm in more detail.
7.3.1 SOS Preliminaries: -local PSD Ensembles
The SOS hierarchy gives a sequence of increasingly tight semidefinite programming relaxations for several optimization problems, including CSPs. Since we will use relatively few facts about the SOS hierarchy, already developed in the analysis of Barak, Raghavendra and Steurer [BRS11], we will adapt their notation of -local distributions to describe the relaxations.
Solutions to a semidefinite relaxation of a CSP on boolean variables using levels of the SOS hierarchy induce probability distributions over for any set with . These distributions are consistent on intersections: for , we have , where denotes the restriction of the distribution to the set . We use these distributions to define a collection of random variables taking values in such that for any set with , the collection of variables has joint distribution . Note that the entire collection may not have a joint distribution: this property is only true for sub-collections of size at most . We will refer to the collection as a -local ensemble of random variables.
For any with and any , we can define a -local ensemble by “conditioning” the local distributions on the event , where is shorthand for the collection . For any with , we define the distribution of as .
Finally, the semidefinite program also ensures that for any such conditioning, the conditional covariance matrix
is positive semidefinite, where . Here, for each pair the covariance is computed using the joint distribution . In this paper, we will only consider -local ensembles such that for every conditioning on a set of size at most , the conditional covariance matrix is PSD. We will refer to these as -local PSD ensembles. We will also need a simple corollary of the above definitions.
Fact 7.14.
Let be a -local PSD ensemble and . For , define to be the collection of tuples of size appearing in elements of . For all , the collection is a -local PSD ensemble, where .
For random variables in a -local PSD ensemble, we use the notation to denote the distribution of (which exists when ). As we will work with ordered tuples of variables instead of sets, we define for based on the set , taking care that repeated elements of are always assigned the same value.
Definition 7.15 (Plausible assignment).
Given and an assignment , we say that is plausible for if there are no distinct such that but .
The distribution is defined as if is plausible for , and otherwise.
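The plausibility condition of Definition 7.15 is a simple consistency check, sketched below with hypothetical names: an assignment to the positions of a tuple is plausible exactly when repeated elements of the tuple always receive the same value.

```python
def is_plausible(tup, assignment):
    """True iff repeated elements of the tuple always receive the same value,
    i.e. the assignment factors through the underlying set of the tuple."""
    seen = {}
    for elem, val in zip(tup, assignment):
        if seen.setdefault(elem, val) != val:
            return False
    return True
```

For example, the tuple (1, 2, 1) with values (0, 1, 0) is plausible, while (0, 1, 1) is not, since the repeated element 1 would receive two different values.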
7.3.2 Tensoriality
A key algorithm in the list decoding framework is propagation rounding (7.16), which solves a CSP to find solutions close to a codeword. Suppose is a collection of walks, or more generally, a collection of any -tuples. The algorithm starts with a local PSD ensemble which is the solution to an SOS program for list decoding. Propagation rounding takes this solution and conditions some of the variables according to a random assignment to these variables to yield another local PSD ensemble .
Algorithm 7.16 (Propagation Rounding Algorithm, adapted from [AJQ+20]).
Input
An -local PSD ensemble and collection .
Output
A random assignment and -local PSD ensemble .
1. Choose uniformly at random.
2. For , sample a walk independently and uniformly from .
3. Write for the set of the seed vertices.
4. Sample an assignment according to the local distribution .
5. Set , i.e. the local ensemble conditioned on agreeing with .
6. For all , sample independently .
7. Output and .
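The steps above can be sketched with a toy stand-in for the SOS solution: here each variable carries an independent marginal (a product distribution), so conditioning on the seed assignment simply pins those variables, whereas the true algorithm conditions a genuine local PSD ensemble. All names are hypothetical.

```python
import random

def propagation_rounding(marginals, walk_collection, L, seed=None):
    """Toy propagation rounding. `marginals` maps each variable to Pr[var = 1].
    Steps 1-2: pick m in [L] and sample m walks uniformly.
    Steps 3-5: condition on a sampled assignment to the seed vertices S.
    Steps 6-7: round every variable independently from conditioned marginals."""
    rng = random.Random(seed)
    m = rng.randrange(1, L + 1)
    S = set()
    for _ in range(m):
        S.update(rng.choice(walk_collection))          # seed vertices
    sigma = {v: int(rng.random() < marginals[v]) for v in S}
    conditioned = dict(marginals)
    conditioned.update({v: float(bit) for v, bit in sigma.items()})  # pin seeds
    y = {v: int(rng.random() < p) for v, p in conditioned.items()}
    return sigma, y
```

By construction the rounded assignment always agrees with the sampled seed assignment on the seed vertices, mirroring step 5 of the algorithm.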
If the collection used in the direct sum lifting is amenable to SOS rounding, the conditioned ensemble will be able to recover a word close to some codeword on the list. This is quantified by the following tensorial properties. We will see shortly how splittability will be used to obtain tensoriality in our setting.
Definition 7.17 (Tensorial Walk Collection).
Let , , and . Define to be the set of all tuples obtainable in propagation rounding (7.16) on with SOS degree parameter . We say that is -tensorial if the local PSD ensemble returned by propagation rounding satisfies
(4)
The framework actually uses a strengthening of the above property, in which variables for pairs of walks chosen independently approximately behave as a product.
Definition 7.18 (Two-Step Tensorial Walk Collection).
Let , , and . Define to be the set of all tuples obtainable in propagation rounding (7.16) on with SOS degree parameter . We say that is -two-step tensorial if it is -tensorial and the local PSD ensemble returned by propagation rounding satisfies the additional condition
7.3.3 From Directed to Undirected
In order to apply the list decoding framework using the directed split operator , we will replace it with the symmetrized version
and show how corresponds to a particular undirected graph.
Definition 7.19.
Let . We define the operator such that for every ,
for every .
The operator defines an undirected weighted bipartite graph on the vertices . We can see that is the adjoint of , which means that each edge in this graph is weighted according to the transition probability from one walk to the other whenever one of , is in and the other is in .
Claim 7.20.
Proof.
Let and . For , define to be the uniform distribution on . We show that . On one hand we have
On the other hand we have
Hence, as claimed.
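The symmetrization described above places the directed operator and its adjoint in the off-diagonal blocks of a bipartite adjacency matrix; a minimal numpy sketch (with hypothetical names) of this construction:

```python
import numpy as np

def symmetrize(S):
    """Adjacency matrix of the undirected weighted bipartite graph whose two
    sides index the rows and columns of the directed operator S; one
    off-diagonal block is S and the other is its adjoint S^T."""
    m, n = S.shape
    return np.block([[np.zeros((m, m)), S],
                     [S.T, np.zeros((n, n))]])
```

The result is symmetric for any input, which is what lets the directed split operator be treated as an undirected weighted graph in the SOS analysis.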
7.3.4 Variables for Walks on the -wide Replacement Product
When analyzing walks on the -wide replacement product, we actually need to use two separate, but related, local PSD ensembles. In Ta-Shma’s construction, the vertices of the outer graph correspond to positions in the base code , where . Given a vertex in the -wide replacement product and codeword , is assigned bit , regardless of the vertex of the inner graph. We will enforce this property by working with variables in rather than the full . The local PSD ensemble contains one variable for every vertex of , with local distributions for sets of variables up to a given size. For a walk on the -wide replacement product, we will use as an abbreviation for , where is the set of all -components of vertices visited on the walk.
The constraints of the CSP are placed on walks on the -wide replacement product that do care about the -component of the vertices, so we define a second local PSD ensemble with a variable for each vertex of the -wide replacement product of and . It is this collection for which we need to prove tensoriality in order to use the list decoding framework. When we perform propagation rounding, we condition the ensemble on a random assignment to a subset , rather than conditioning on a random assignment to a subset of . Working with ensures that the rounded assignments will be consistent on each cloud of the -wide replacement product. Since the bit assigned to a vertex only depends on , independent rounding of will also yield the desired rounding of .
We can define based on the ensemble more concretely. Suppose is a subset of size at most , where is the locality of the ensemble, and define . The distribution of is defined based on the distribution of by , where is an assignment to whose value on each vertex only depends on .
Observe that the introduction of the ensemble is only necessary on the first level of the Ta-Shma code cascade between the codes and , which takes place on the -wide replacement product. Higher levels of the cascade use walks on graphs whose vertices are the walks from the level below. The association of the bits of a codeword to the vertices of this graph has no consistency requirement, so we simply use a single local ensemble with a variable for each vertex.
7.4 Splittability Implies Tensoriality
The connection between splittability and tensoriality will be made with the help of a version of the triangle inequality.
Claim 7.21 (Triangle inequality, adapted from [AJQ+20]).
Let and be an -interval splitting tree. Then
where the sum is taken over the labels of the internal nodes of .
To prove tensoriality, we will use the method of [BRS11] and [AJT19] to show that we can break correlations over expanding collections of tuples arising in the -wide replacement product of the form
appearing on the right-hand side of the triangle inequality.
7.4.1 The First Level of the Cascade
We now check the technical details to obtain tensoriality for the first lifting in the code cascade between the codes and , which corresponds to taking steps in Ta-Shma’s construction. Recall that in order to obtain an assignment whose lifting is consistent on vertices with the same -component, we need to prove tensoriality for the ensemble with a variable for each vertex in .
The proof of tensoriality will make use of a specific entropic potential function. For an arbitrary random variable taking values in a finite set , define the function as
where H is the binary entropy function. Using this, we define a potential function for a weighted undirected graph .
Definition 7.22 (Graph Potential).
Let be a weighted graph with edge distribution . Let be the marginal distribution on . Suppose that is a -local PSD ensemble for some . We define to be
Let be an -interval splitting tree associated with the -wide replacement product of graphs and . We define
where is the associated bipartite undirected graph of the operator .
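Since the exact formula is elided above, one natural reading of this entropic potential is the expected binary entropy of the variable at a uniformly chosen endpoint of a random edge. The following sketch assumes binary variables with given marginals; all names are hypothetical.

```python
import math

def binary_entropy(p):
    """H(p) = -p log2 p - (1 - p) log2 (1 - p), with H(0) = H(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def graph_potential(marginals, edge_dist):
    """Expected entropy at a uniformly chosen endpoint of an edge drawn from
    `edge_dist` (a dict mapping edges (u, v) to probabilities)."""
    return sum(w * (binary_entropy(marginals[u]) + binary_entropy(marginals[v])) / 2
               for (u, v), w in edge_dist.items())
```

Conditioning variables can only decrease each marginal's entropy, so a potential of this shape is non-increasing along the rounding process, which is the property the argument below exploits.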
Lemma 7.23 (Splittability Implies Tensoriality).
Let be the walk collection of the -wide replacement product of two graphs and . If and is -splittable with , then is -tensorial.
Proof.
We need to show that
which can be proven by adapting a potential argument technique from [BRS11]. First, set the potential
(5)
where the distribution on is obtained from the process of choosing in propagation rounding (7.16) once has been fixed. Consider the error term
(6)
where . If , then
For each choice of and such that , applying the triangle inequality from 7.21 to the conditioned variables gives us
Hence, there exists such that
Note that choosing uniformly and restricting to gives a uniformly random element of . If we choose or with equal probability, then the final walk is distributed according to the stationary measure of . Let denote the chosen walk. Observe that is a deterministic function of . Now, we sample , which gives us a sample of . Applying Lemma A.1, we have
This conditioning on an assignment to does not increase the other terms of associated to split operators other than since entropy is non-increasing under conditioning. Similarly, conditioning on the remaining variables that are part of but not does not increase . Then, we obtain
Since , there can be at most indices such that . In particular, since the total number of indices is , we have
Our choice of is more than enough to ensure .
Applying the list decoding framework will require the stronger property of two-step tensoriality, which we can obtain under the same assumptions.
Lemma 7.24 (Splittability Implies Two-step Tensoriality).
Let be the walk collection of the -wide replacement product of two graphs and . If and is -splittable with , then is -two-step tensorial.
Proof.
Under our assumptions the -tensorial property follows from Lemma 7.23 (which is the only place where the assumption on is used), so we only need to show
which can be proven by adapting a potential argument technique from [BRS11]. First, set the potential
(7)
where the distribution on is obtained from the process of choosing in propagation rounding (7.16) once has been fixed. Consider the error term
(8)
where . If , then
Let be the graph with edges . Local correlation (expectation over the edges) on this graph is the same as global correlation (expectation over two independent copies of vertices). Then, we obtain the following (see [AJT19] or [BRS11] for the details):
Since , there can be at most indices such that . In particular, since the total number of indices is , we have
Our choice of is more than enough to ensure .
We have already established that is -splittable with in Corollary 7.13, so we can obtain -two-step tensoriality for any if this quantity is small enough.
7.4.2 Higher Levels of the Cascade
We now discuss tensoriality of the other levels of the cascade between and for . Tensorial properties are simpler to establish here than on the first level of the cascade. The relevant split operators are special cases of where and . The main difference now is that we can associate the parity bits of with the vertices of , which themselves represent walks. As this association of parity bits doesn’t need to satisfy a consistency condition, we only need to work with a single ensemble instead of working with two different ensembles as in the previous case. The proofs of Lemma 7.23 and Lemma 7.24 with these slight modifications give us two-step tensoriality.
Lemma 7.25 (Two-step Tensoriality for Higher Levels).
Let be the set of walks defined using steps of the operator . If and is -splittable with , then is -two-step tensorial.
We know that the collection of walks obtained from is -splittable, so the parameters necessary to obtain two-step tensoriality are the same as in the first level of the cascade.
8 Choosing Parameters for Ta-Shma’s Construction
We explore how some choices of parameters for Ta-Shma’s construction interact with the requirements of our decoding algorithm. The analysis is divided into rounds of increasingly stronger decoding guarantees with later rounds relying on the codes obtained in previous rounds. Naturally, the stronger guarantees come with more delicate and technical considerations. We briefly summarize the goals of each round and some key parameters.
1. Round I: For any constant , we obtain efficiently uniquely decodable codes with distance at least and rate for infinitely many discrete values of (with as close to as desired). In this regime, it suffices for the expansion of to be constant. This round leads to Theorem 6.6.
2. Round II: Similar to Round I, but now can be any value in an interval with being a function of . Again the expansion of can be constant. This round leads to Theorem 6.7.
3. Round III: We want to vanish as vanishes (this is qualitatively similar to Ta-Shma’s result). In this regime, we make the expansion of be a function of , and we rely on the uniquely decodable codes of Round II. This round leads to Theorem 1.1.
4. Round IV: For any constant , we obtain efficiently list decodable codes with list decoding radius and rate with as . In this regime, we make the expansion of be a function of , and we rely on the uniquely decodable codes of Round III. This round leads to Theorem 1.2.
The way we choose parameters for Ta-Shma’s construction borrows heavily from Ta-Shma’s arguments in [TS17]. We fix some notation common to all rounds. A graph is said to be an -graph provided it has vertices, is -regular, and has second largest singular value of its normalized adjacency matrix at most .
Notation 8.1.
We use the following notation for the graphs and used in the -wide replacement product.
- The outer graph will be an -graph.
- The inner graph will be a -graph.
The parameters and will be chosen in the subsequent sections.
8.1 Round I: Initial Analysis
We are given the dimension of the desired code and . We set a parameter such that (for convenience) is a power of and
(9)
We can assume that satisfy Eq. 9 since otherwise is a constant and we can use the list decodable codes from [AJQ+20]. The use of Eq. 9 will be clear shortly. It becomes a necessity from round III onward. For rounds I and II, the parameter will be a constant, but it will be useful to establish the analysis in more generality now so that subsequent rounds can reuse it.
The inner graph . The choice of is similar to Ta-Shma’s choice. More precisely, we set and (Ta-Shma took ). We obtain a Cayley graph such that is an graph where and . (The set of generators, , comes from a small bias code derived from a construction of Alon et al. [AGHP92], but we will rely on Ta-Shma’s analysis embodied in Lemma B.2 and not discuss it further.)
The base code . Set (this choice differs from Ta-Shma’s and it appears because we are essentially working with rather than ). We will choose a base code such that the desired code will be obtained as a direct sum lifting of , and because this lifting preserves the dimension, the dimension of should be . We choose to be an -balanced code with dimension and block length . For instance, we can start with any good (constant rate and relative distance) linear base code that has an efficient unique decoding algorithm and obtain a -balanced lifted code that can be efficiently unique decoded (as long as is constant) using the framework in [AJQ+20].
The outer graph . Set so that as required by the -wide replacement product. We apply Ta-Shma’s explicit Ramanujan graph Lemma B.1 with parameters , and to obtain an Ramanujan graph with and or . Here, is an error parameter that we set as (this choice of differs from Ta-Shma). Because we can construct words with block length (if needed) by duplicating each codeword, we may assume w.l.o.g. that is close to and . See Appendix B for a more formal description of this graph.
Note that since . Hence, .
The walk length. Set the walk length to be the smallest integer such that
This will imply, using Ta-Shma’s analysis, that the bias of the final code is at most , as shown later.
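The choice of walk length is a simple search for the smallest t meeting the bias requirement. Since the concrete bound from Ta-Shma's analysis is elided above, the sketch below uses a hypothetical exponential decay as a stand-in; all names are hypothetical.

```python
def smallest_walk_length(bias_bound, eps0):
    """Smallest integer t >= 1 with bias_bound(t) <= eps0, assuming the
    bound is nonincreasing in t."""
    t = 1
    while bias_bound(t) > eps0:
        t += 1
    return t

# Hypothetical decay standing in for the elided bound from Ta-Shma's analysis:
# the bias shrinks by a constant factor sigma per walk step.
sigma, eps0 = 0.5, 1e-3
t = smallest_walk_length(lambda t: sigma ** t, eps0)
```

With these illustrative numbers the search returns t = 10, the first step count at which 0.5^t drops below 10^-3; in the actual construction the decay rate is governed by the second singular values of the graphs involved.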
Claim 8.2.
We have .
Proof.
Remark 8.3.
By our choice of , we have . Since , we get .
Final Bias. We denote by the final code obtained by steps of the -wide replacement product. The bias of is given by Corollary 4.10 (which in turn is a simple corollary of Ta-Shma’s 4.9) as shown next.
Corollary 8.4.
The code is -balanced.
Proof.
Using Corollary 4.10, we have that the final bias
is bounded by
where the last inequality follows from and , the latter from 8.2.
Rate. The proof of the rate bound follows a structure similar to Ta-Shma’s original argument, except that we take to be a constant independent of so that , , and are also constants independent of . Note that we previously said needs to satisfy Equation 9, but that implies only an upper bound for , and smaller (even constant) values for are still permissible.
Claim 8.5.
has rate provided is constant.
Proof.
The support size is the number of walks of length on the -wide replacement product of and (each step of the walk has options), which is
where the penultimate equality follows from the assumption that is a constant.
Lemma 8.6 (Codes Near the GV bound I).
For every constant , there exists a sufficiently large constant in the above analysis so that for any dimension value (sufficiently large) and (sufficiently small) the final code , where is the block length, satisfies
- is -balanced,
- has rate , and
- is a linear code of dimension .
Remark 8.7.
As a consequence of code cascading, the final attainable walk lengths have the form where is a positive integer. Given , we have infinitely many values of attainable by such walk lengths which gives infinitely many codes . This means that although the bias cannot be arbitrary, we have an infinite sequence of values of for which the rates of the codes are near the GV bound. In Section 8.2, we show how to bypass this artificial limitation. These codes are used in the proof of Theorem 6.6.
We can view the above analysis as defining a function that receives
- the dimension ,
- the final bias ,
- the approximating error with being a power of two, and
- a multiplying factor such that (in the above was ).
and outputs a tuple of parameters , graphs and (as above) where, in particular, the number of steps is such that the final code has bias at most and rate .
8.2 Round II: A More Careful Analysis
We are given the dimension of the code and . As before, we set a parameter such that (for convenience) is a power of . Set and .
Apply to to obtain all parameters except . Choose to be the smallest integer satisfying
where we observe that an extra factor appears in the exponent. This change in worsens the rate, but since only a factor of is lost in the exponent, we can still lower bound the rate. That is, .
Set to be the smallest value such that (here we are implicitly assuming that ). If , we are done since we can use all the parameters returned by for the construction of . Now assume and let . Note that . Choose to be the integer in the interval such that
Because , and only powers of may be chosen for walk length, we might overshoot in walk length by a multiplicative factor of . This would cause a corresponding loss in the rate that we cannot afford. To overcome this, in the last level of the cascade between codes and , we perform the direct sum over walks of length instead of length . The new total number of vertices is . Note that can be as large as , so our splittability guarantee for (the walk collection from the lift between and ) has to be strong enough to accommodate this larger arity, not only arity .
Claim 8.8.
We have .
Proof.
By construction, we have the sequence of implications
from which we obtain
and
the latter using .
We apply again but this time to to obtain new parameters , , , , , and graphs and .
Claim 8.9.
The code obtained by walk steps on the -wide replacement product of and from the second application of has bias at most and rate .
Proof.
Let , and be the parameters of the first invocation of . Recall that was chosen to be the smallest integer satisfying
Let , and be the parameters of the second invocation of . Observe that
Then the bias of is at most
For the rate computation of , we will lower bound the term . Since , and (the latter by 8.8), the rate of is
8.3 Round III: Vanishing as Vanishes
We are given the dimension of the code and . As before, we set a parameter such that (for convenience) is a power of . Set .
We will consider the regime where is a function of . As a consequence, the parameters will also depend on . Since for (and ), if satisfies , it also satisfies Eq. 9 (we lose a log factor by replacing by , but we will favor simplicity of parameters). In particular, we can set so that is
and satisfy Eq. 9.
We follow the same choices as in Round II except for the base code .
The base code . Set . We choose an -balanced code with support size where (this choice of is arbitrary; it is enough to have as a fixed small constant) using the construction from Round II. It is crucial that we can uniquely decode (using our algorithm), since this is required in order to apply the list decoding framework.
Note that is no longer a constant. For this reason, we need to consider the rate computation of the final code more carefully. The proof will follow an argument similar to Ta-Shma’s.
Claim 8.10.
has rate where .
Proof.
The support size is the number of walks of length on the -wide replacement product of and (each step of the walk has options), which is
From this point the proof continues exactly as the proof of 8.5.
8.4 Round IV: Arbitrary Gentle List Decoding
In round III, when we take
we will have provided is large enough. This non-constant will allow us to perform “gentle” list decoding with radius arbitrarily close to . More precisely, we have the following.
Theorem 8.11 (Gentle List Decoding (restatement of Theorem 1.2)).
For every sufficiently small, there are explicit binary linear Ta-Shma codes for infinitely many values with
- (i) distance at least (actually -balanced),
- (ii) rate where , and
- (iii) a list decoding algorithm that decodes within radius in time .
Proof.
We consider some parameter requirements in order to apply the list decoding framework Theorem 9.1 between and . Suppose we want to list decode within radius . For parity sampling, we need
Since the number of vertices in a walk can be at most , for splittability we need
In particular, we can take and satisfy both conditions above.
9 Instantiating the List Decoding Framework
We established the tensoriality (actually two-step tensoriality) and parity sampling properties of every lifting between consecutive codes and in Ta-Shma’s cascade. Using these properties, we will be able to invoke the list decoding framework from [AJQ+20] to obtain the following list decoding result.
Theorem 9.1 (Restatement of Theorem 6.1).
Let be a constant, , and
Suppose is an -balanced linear code and is the direct sum lifting of on a -splittable collection of walks , where is either the set of walks on an -wide replacement product graph or a set of walks using the random walk operator . There exists an absolute constant such that if
then the code is -balanced and can be efficiently list decoded in the following sense:
If is -close to , then we can compute the list
in time
where is the running time of a unique decoding algorithm for . Otherwise, we return with the same running time as in the preceding case.[9]

[9]: In the case is not -close to , but the SOS program turns out to be feasible, some of the calls to the unique decoding algorithm of (issued by the list decoding framework) might fall outside all unique decoding balls. Such cases may be handled by returning failure if the algorithm does not terminate by the time . Even if a codeword in is found, the pruning step of list decoding [AJQ+20] will return an empty list for since is not -close to .
9.1 List Decoding Framework
We recall the precise statement of the list decoding framework tailored to direct sum lifting.
Theorem 9.2 (List Decoding Theorem (Adapted from [AJQ+20])).
Suppose is an -two-step tensorial direct sum lifting from an -balanced code to on a multiset which is a -parity sampler.
Let be -close to . Then the List Decoding algorithm returns the coupled code list . Furthermore, the running time is where is the running time of a unique decoding algorithm of .
We apply the list decoding framework of Theorem 9.2 to the liftings arising in the Ta-Shma cascade to obtain Theorem 9.1. This requires choosing parameters so that both the parity sampling and tensoriality requirements are met at every level of the cascade, which we do by appealing to our results from Section 7.
Proof of Theorem 9.1.
We want to define parameters for -splittability so that satisfies strong enough parity sampling and tensoriality assumptions to apply Theorem 9.2.
For parity sampling, we require to be an -parity sampler. Suppose is -splittable with . By Corollary 7.4 or Corollary 7.7 and splittability, the collection of walks is an -parity sampler, where . To achieve the desired parity sampling, we take and choose a value of large enough so that . Using the assumption , we compute
which will be smaller than as long as is at least
Achieving this level of parity sampling also ensures that the lifted code is -balanced.
The list decoding theorem also requires -two-step tensoriality. Lemma 7.24 (with ) and Lemma 7.25 each provide -two-step tensoriality for -splittable walk collections on the -wide replacement product and using , respectively, with
To get , we require
where and are (very large) constants. This ensures that is small enough for the parity sampling requirement as well. With these parameters, the running time for the list decoding algorithm in Theorem 9.2 becomes
For decoding in fixed polynomial time, we also need a variation of list decoding where we do not run the unique decoding algorithm of the base code and only obtain an approximate list of solutions. The proof is very similar to that of Theorem 9.1 above.
Theorem 9.3 (Restatement of Theorem 6.12).
Let be a constant, , , and
Suppose is an -balanced linear code and is the direct sum lifting of on a -splittable collection of walks , where is either the set of walks on an -wide replacement product graph or a set of walks using the random walk operator . There exists an absolute constant such that if
then the code is -balanced, is a -parity sampler, and we have the following:
If is -close to , then we can compute a -cover of the list
in which for every (a randomized rounding will ensure this, but see Appendix D for obtaining this property deterministically), in time
Otherwise, we return with the same running time as in the preceding case.
Proof.
The list decoding framework produces a cover of the list , and, as its final step, corrects the cover to obtain the actual list by running the unique decoding algorithm of on each entry of (see [AJQ+20] for details). Using Theorem 9.2 with a -parity sampler and omitting this final step of the algorithm, we can obtain the -cover in time .
The tensoriality part of the proof of Theorem 9.1 applies here unchanged, so we need only make sure is large enough to yield the stronger parity sampling necessary for this theorem. As in that proof, we have that is an -parity sampler with . Take . Using and assuming , we have
which will be smaller than as long as is at least
Acknowledgement
We thank Amnon Ta-Shma for suggesting the problem of decoding in fixed polynomial running time (with the exponent of independent of ), which led us to think about Theorem 6.9. Part of this work was done when some of the authors were visiting the Simons Institute in Berkeley, and we thank them for their kind hospitality.
References
- [ABN+92] N. Alon, J. Bruck, J. Naor, M. Naor, and R. Roth. Construction of asymptotically good, low-rate error-correcting codes through pseudo-random graphs. IEEE Transactions on Information Theory, 28:509–516, 1992.
- [AGHP92] N. Alon, O. Goldreich, J. Håstad, and R. Peralta. Simple constructions of almost -wise independent random variables. Random Structures and Algorithms, 3(3):289–304, 1992.
- [AJQ+20] Vedat Levi Alev, Fernando Granha Jeronimo, Dylan Quintana, Shashank Srivastava, and Madhur Tulsiani. List decoding of direct sum codes. In Proceedings of the 31st ACM-SIAM Symposium on Discrete Algorithms, pages 1412–1425. SIAM, 2020.
- [AJT19] Vedat Levi Alev, Fernando Granha Jeronimo, and Madhur Tulsiani. Approximating constraint satisfaction problems on high-dimensional expanders. In Proceedings of the 60th IEEE Symposium on Foundations of Computer Science, pages 180–201, 2019.
- [Ari09] E. Arikan. Channel polarization: A method for constructing capacity-achieving codes for symmetric binary-input memoryless channels. IEEE Transactions on Information Theory, 55(7):3051–3073, July 2009.
- [Aro02] Sanjeev Arora. How NP got a new definition: a survey of probabilistically checkable proofs. In Proceedings of the International Congress of Mathematicians, pages 637–648, 2002. Volume 3.
- [Bog12] Andrej Bogdanov. A different way to improve the bias via expanders. Lecture notes, April 2012. URL: http://www.cse.cuhk.edu.hk/~andrejb/csc5060/notes/12L12.pdf.
- [BRS11] Boaz Barak, Prasad Raghavendra, and David Steurer. Rounding semidefinite programming hierarchies via global correlation. In Proceedings of the 52nd IEEE Symposium on Foundations of Computer Science, pages 472–481, 2011.
- [Cha16] Siu On Chan. Approximation resistance from pairwise-independent subgroups. J. ACM, 63(3), August 2016.
- [Chu97] F. R. K. Chung. Spectral Graph Theory. American Mathematical Society, 1997.
- [CT06] Thomas M. Cover and Joy A. Thomas. Elements of Information Theory (Wiley Series in Telecommunications and Signal Processing). Wiley-Interscience, New York, NY, USA, 2006.
- [DDG+15] Roee David, Irit Dinur, Elazar Goldenberg, Guy Kindler, and Igor Shinkar. Direct sum testing. ITCS ’15, pages 327–336, New York, NY, USA, 2015. ACM.
- [Del75] P. Delsarte. The association schemes of coding theory. In Combinatorics, pages 143–161. Springer Netherlands, 1975.
- [DHK+19] Irit Dinur, Prahladh Harsha, Tali Kaufman, Inbal Livni Navon, and Amnon Ta-Shma. List decoding with double samplers. In Proceedings of the 30th ACM-SIAM Symposium on Discrete Algorithms, pages 2134–2153, 2019.
- [DK17] Irit Dinur and Tali Kaufman. High dimensional expanders imply agreement expanders. In Proceedings of the 58th IEEE Symposium on Foundations of Computer Science, pages 974–985, 2017.
- [DS14] Irit Dinur and David Steurer. Direct product testing. In Proceedings of the 29th IEEE Conference on Computational Complexity, CCC ’14, pages 188–196, 2014.
- [Gal62] R. Gallager. Low-density parity-check codes. IRE Transactions on Information Theory, 8(1):21–28, 1962.
- [GI01] Venkatesan Guruswami and Piotr Indyk. Expander-based constructions of efficiently decodable codes. In Proceedings of the 42nd IEEE Symposium on Foundations of Computer Science, pages 658–667, 2001.
- [GI03] Venkatesan Guruswami and Piotr Indyk. Linear time encodable and list decodable codes. In Proceedings of the 35th ACM Symposium on Theory of Computing, 2003.
- [GI04] Venkatesan Guruswami and Piotr Indyk. Efficiently decodable codes meeting Gilbert-Varshamov bound for low rates. In Proceedings of the 15th ACM-SIAM Symposium on Discrete Algorithms, SODA ’04, pages 756–757, 2004.
- [Gil52] E.N. Gilbert. A comparison of signalling alphabets. Bell System Technical Journal, 31:504–522, 1952.
- [GKO+17] Sivakanth Gopi, Swastik Kopparty, Rafael Oliveira, Noga Ron-Zewi, and Shubhangi Saraf. Locally testable and locally correctable codes approaching the Gilbert-Varshamov bound. In Proceedings of the 28th ACM-SIAM Symposium on Discrete Algorithms, SODA ’17, pages 2073–2091, 2017.
- [GR06] Venkatesan Guruswami and Atri Rudra. Explicit capacity-achieving list-decodable codes. In Proceedings of the 38th ACM Symposium on Theory of Computing, pages 1–10, 2006.
- [GR08] Venkatesan Guruswami and Atri Rudra. Concatenated codes can achieve list-decoding capacity. In Proceedings of the 19th ACM-SIAM Symposium on Discrete Algorithms, SODA ’08, pages 258–267, 2008.
- [Gri01] Dima Grigoriev. Linear lower bound on degrees of positivstellensatz calculus proofs for the parity. Theor. Comput. Sci., 259(1-2):613–622, 2001.
- [GRS19] Venkatesan Guruswami, Atri Rudra, and Madhu Sudan. Essential coding theory. 2019.
- [GRY19] Venkatesan Guruswami, Andrii Riazanov, and Min Ye. Arikan meets Shannon: Polar codes with near-optimal convergence to channel capacity. Electronic Colloquium on Computational Complexity (ECCC), 26:154, 2019.
- [GS11] Venkatesan Guruswami and Ali Kemal Sinop. Lasserre hierarchy, higher eigenvalues, and approximation schemes for graph partitioning and quadratic integer programming with psd objectives. In FOCS, pages 482–491, 2011.
- [Gur05] Venkatesan Guruswami. Algebraic-geometric generalizations of the Parvaresh-Vardy codes. Electronic Colloquium on Computational Complexity (ECCC), (132), 2005.
- [Gur09] Venkatesan Guruswami. List decoding of binary codes–a brief survey of some recent results. In Coding and Cryptology, pages 97–106. Springer Berlin Heidelberg, 2009.
- [Gur10] Venkatesan Guruswami. Bridging Shannon and Hamming: List error-correction with optimal rate. In ICM, 2010.
- [Ham50] Richard Hamming. Error detecting and error correcting codes. Bell System Technical Journal, 29:147–160, 1950.
- [Hås97] J. Håstad. Some optimal inapproximability results. In Proceedings of the 29th ACM Symposium on Theory of Computing, pages 1–10, 1997.
- [Hås01] Johan Håstad. Some optimal inapproximability results. Journal of the ACM, 48(4):798–859, 2001.
- [HRW17] B. Hemenway, N. Ron-Zewi, and M. Wootters. Local list recovery of high-rate tensor codes applications. In Proceedings of the 58th IEEE Symposium on Foundations of Computer Science, pages 204–215, Oct 2017.
- [IKW09] Russell Impagliazzo, Valentine Kabanets, and Avi Wigderson. New direct-product testers and 2-query PCPs. In Proceedings of the 41st ACM Symposium on Theory of Computing, STOC ’09, pages 131–140, 2009.
- [IW97] Russell Impagliazzo and Avi Wigderson. unless has sub-exponential circuits. In Proceedings of the 29th ACM Symposium on Theory of Computing, pages 220–229, 1997.
- [LPS88] Alexander Lubotzky, R. Phillips, and Peter Sarnak. Ramanujan graphs. Combinatorica, 8:261–277, 1988.
- [MMT11] Alexander May, Alexander Meurer, and Enrico Thomae. Decoding random linear codes in . In Advances in Cryptology – ASIACRYPT 2011, pages 107–124, 2011.
- [MRRW77] R. McEliece, E. Rodemich, H. Rumsey, and L. Welch. New upper bounds on the rate of a code via the Delsarte-MacWilliams inequalities. IEEE Transactions on Information Theory, 23(2):157–166, 1977.
- [MRRZ+19] Jonathan Mosheiff, Nicolas Resch, Noga Ron-Zewi, Shashwat Silas, and Mary Wootters. LDPC codes achieve list decoding capacity, 2019. arXiv:1909.06430.
- [NN90] J. Naor and M. Naor. Small-bias probability spaces: efficient constructions and applications. In Proceedings of the 22nd ACM Symposium on Theory of Computing, pages 213–223, 1990.
- [NS09] Michael Navon and Alex Samorodnitsky. Linear programming bounds for codes via a covering argument. Discrete Comput. Geom., 41(2):199–207, March 2009.
- [RT12] Prasad Raghavendra and Ning Tan. Approximating CSPs with global cardinality constraints using SDP hierarchies. In Proceedings of the 23rd ACM-SIAM Symposium on Discrete Algorithms, SODA 2012, pages 373–387, 2012.
- [RVW00] O. Reingold, S. Vadhan, and A. Wigderson. Entropy waves, the zig-zag graph product, and new constant-degree expanders and extractors. In Proceedings of the 41st IEEE Symposium on Foundations of Computer Science, 2000.
- [Sha48] Claude Shannon. A mathematical theory of communication. Bell System Technical Journal, 27:379–423, 623–656, 1948.
- [SKS19] Ronen Shaltiel, Swastik Kopparty, and Jad Silbak. Quasilinear time list-decodable codes for space bounded channels. 2019.
- [Sti08] Henning Stichtenoth. Algebraic Function Fields and Codes. Springer Publishing Company, Incorporated, 2nd edition, 2008.
- [Tho83] C. Thommesen. The existence of binary linear concatenated codes with Reed-Solomon outer codes which asymptotically meet the Gilbert-Varshamov bound. IEEE Transactions on Information Theory, 29(6):850–853, November 1983.
- [TS17] Amnon Ta-Shma. Explicit, almost optimal, epsilon-balanced codes. In Proceedings of the 49th ACM Symposium on Theory of Computing, STOC 2017, pages 238–251, New York, NY, USA, 2017. ACM.
- [Var57] R.R. Varshamov. Estimate of the number of signals in error correcting codes. Doklady Akademii Nauk SSSR, 117:739–741, 1957.
- [vL99] Jacobus H. van Lint. Introduction to Coding Theory. Springer-Verlag, 1999.
Appendix A Auxiliary Results to Obtain Tensoriality
A key result used in the SOS rounding analysis is embodied in Lemma A.1 below. Roughly speaking, it quantifies the decrease in the potential , under conditioning on a random for , when the ensemble has non-trivial correlation over the edges and is a strong enough expander graph. A generalization of this result to low threshold rank graphs was presented in [BRS11]. To derive sharper parameters in the simpler expander case and to make the presentation self-contained, we give (essentially) a full proof of this result.
Lemma A.1 (Progress Lemma).
Suppose satisfy . If
then
A.1 Expander Case
We will need the following characterization of the spectral gap of a regular graph . We denote by its adjacency operator and by its Laplacian operator [Chu97].
Fact A.2 (Spectral Gap [Chu97]).
Using the above characterization, we derive the following local-to-global result.
Lemma A.3 (Local-to-Global).
Let be vectors in the unit ball. Suppose (equivalently ). If , then
Proof.
Using A.2, we have
Set . We consider two cases: and . First, suppose . Then
Now suppose . Then
where the last inequality follows from for any graph .
More Preliminaries
We will need some standard notions in information theory [CT06].
Definition A.4 (Relative Entropy/Kullback-Leibler Divergence).
The relative entropy of two distributions and with support contained in is
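As a concrete illustration of this definition, relative entropy can be computed directly from two probability vectors. A minimal stdlib-Python sketch (in bits, with the standard conventions: terms with zero probability in the first argument contribute nothing, and the divergence is infinite when the second argument vanishes where the first does not):

```python
import math

def kl_divergence(p, q):
    """Relative entropy D(p || q) in bits for distributions given
    as aligned probability vectors."""
    total = 0.0
    for px, qx in zip(p, q):
        if px == 0:
            continue          # 0 * log(0/q) is taken to be 0
        if qx == 0:
            return math.inf   # p puts mass where q does not
        total += px * math.log2(px / qx)
    return total

print(kl_divergence([0.5, 0.5], [0.5, 0.5]))  # 0.0
print(kl_divergence([1.0, 0.0], [0.5, 0.5]))  # 1.0
```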
Notation A.5.
Let be a random variable. We denote by the distribution of .
Definition A.6 (Mutual Information).
Let be two random variables. The mutual information is
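The definition can be made concrete by viewing mutual information as the relative entropy between the joint distribution and the product of its marginals. A small illustrative sketch, with the joint distribution given as a (hypothetical) 2D probability table:

```python
import math

def mutual_information(joint):
    """I(X;Y) = D(p_XY || p_X x p_Y) in bits, for a joint
    distribution given as a 2D list of probabilities."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    mi = 0.0
    for i, row in enumerate(joint):
        for j, pxy in enumerate(row):
            if pxy > 0:
                mi += pxy * math.log2(pxy / (px[i] * py[j]))
    return mi

# independent bits carry zero information; identical bits carry one bit
print(mutual_information([[0.25, 0.25], [0.25, 0.25]]))  # 0.0
print(mutual_information([[0.5, 0.0], [0.0, 0.5]]))      # 1.0
```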
Fact A.7.
Fact A.8 (Fact B.5 of Raghavendra and Tan [RT12]).
Let and be indicator random variables. Then
Progress Lemma
We are ready to prove Lemma A.1 which we restate below for convenience.
Lemma A.9 (Progress Lemma (restatement of Lemma A.1)).
Suppose satisfy . If
then
Proof.
Firstly, we show how to relate the distances over the edges to certain covariances. Let . Observe that
We have
Note that the graph is an expander with . Moreover, the matrix is PSD since the vectorization gives a Gram matrix decomposition of . Thus, the covariance matrix is also PSD since it is the Schur product (i.e., entrywise product) of two PSD matrices, namely, . Therefore, we are in a position to apply the local-to-global Lemma A.3 with the expander and a vectorization for . We have
(local-to-global Lemma A.3) | ||||
(A.8) | ||||
Therefore, we have , as claimed.
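The PSD-preservation step invoked in the proof (the Schur product theorem) can be spot-checked numerically. A minimal stdlib-Python sketch, not part of the rounding machinery: build two PSD matrices as Gram matrices, take their entrywise product, and verify nonnegativity of the quadratic form on random test vectors:

```python
import random

def gram(vectors):
    """Gram matrix of a list of vectors: PSD by construction."""
    return [[sum(x * y for x, y in zip(u, v)) for v in vectors]
            for u in vectors]

def hadamard(A, B):
    """Entrywise (Schur) product of two matrices."""
    return [[a * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def quad_form(M, z):
    """z^T M z."""
    n = len(M)
    return sum(z[i] * M[i][j] * z[j] for i in range(n) for j in range(n))

random.seed(0)
n = 5
A = gram([[random.gauss(0, 1) for _ in range(n)] for _ in range(n)])
B = gram([[random.gauss(0, 1) for _ in range(n)] for _ in range(n)])
M = hadamard(A, B)
# Schur product theorem: z^T M z >= 0 for every z (up to float error)
assert all(quad_form(M, [random.gauss(0, 1) for _ in range(n)]) >= -1e-9
           for _ in range(1000))
print("Schur product of PSD matrices passed the PSD spot-check")
```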
Appendix B Explicit Structures
We recall some explicit structures used in Ta-Shma’s construction.
B.1 Explicit Ramanujan Graphs
The outer graph in the -wide replacement product was chosen to be a Ramanujan graph. Ta-Shma provides a convenient lemma to efficiently obtain explicit Ramanujan graphs given the intended number of vertices (which might end up being nearly twice as large), the expansion, and an error parameter . These Ramanujan graphs are based on the LPS construction [LPS88]. For number-theoretic reasons, we might be forced to work with slightly different parameters, but this is not an issue.
Lemma B.1 (Lemma 2.10 [TS17]).
For every , there exists an algorithm that given and runs in time and outputs a Ramanujan graph such that
-
-
has degree ,
-
-
, and
-
-
is either in the range or in the range .
Moreover, the algorithm outputs a locally invertible function computable in polynomial time in its input length.
B.2 Explicit Biased Distribution
The inner graph in the -wide replacement product is chosen to be a Cayley graph on for some positive integer . Ta-Shma uses the construction of Alon et al. [AGHP92] (AGHP) to deduce a result similar to Lemma B.2 below. To compute the refined parameter version of our main result Theorem 1.1, we will need the specifics of the AGHP construction.
Lemma B.2 (Based on Lemma 6 [TS17]).
For every , there exists a fully explicit set such that
-
-
, and
-
-
for every , we have .
Furthermore, if is a power of , then . In particular, the graph is a expander graph.
Remark B.3.
Given such that is the square of a power of with , by setting we can use Lemma B.2 with and (note that is a power of ) to obtain a Cayley graph with parameters .
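To make the small-bias notion concrete: the bias of a set S of vectors in F_2^k is the maximum, over nonzero characters, of the absolute average character value on S. A brute-force stdlib-Python sketch (exponential in k and purely illustrative; this is a checker, not the AGHP construction itself):

```python
from itertools import product

def bias(S, k):
    """Bias of a multiset S of vectors in F_2^k:
    max over nonzero alpha of |E_{s in S} (-1)^<alpha, s>|."""
    worst = 0.0
    for alpha in product((0, 1), repeat=k):
        if not any(alpha):
            continue  # skip the trivial character
        avg = sum((-1) ** (sum(a * b for a, b in zip(alpha, s)) % 2)
                  for s in S) / len(S)
        worst = max(worst, abs(avg))
    return worst

# the full cube F_2^3 is perfectly balanced against every character
cube = list(product((0, 1), repeat=3))
print(bias(cube, 3))  # 0.0
```

A Cayley graph on F_2^k with generator set S then has all nontrivial eigenvalues bounded in magnitude by `bias(S, k)`.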
Appendix C Zig-Zag Spectral Bound
We prove the zig-zag spectral bound 4.4.
Claim C.1.
Let be an outer graph and be an inner graph used in the -wide replacement product. For any integer ,
Proof.
Let be a unit vector such that , and decompose it into such that and .
To bound , observe that for some . Then,
so that . Because is uniform over -component, , which completes the proof.
We also derive a (simple) tighter bound for the expansion of the zig-zag product in a particular parameter regime.
Claim C.2.
Let be a -two-sided expander and be a -two-sided expander such that both are regular graphs. If , then
Proof.
Let with be such that . In particular, if , then since otherwise . We have
where the inequality follows from the assumption (and trivially ) and the equality follows from . Since we also have , the result follows.
Appendix D Derandomization
To uniquely decode in fixed polynomial time (i.e., ), we need to prune the list of coupled words covering the list of codewords we want to retrieve. To do so, given , we need to have such that
-
1.
is not too small, and
-
2.
is not too small (in order to apply Lemma 6.11).
The slice of the SOS solution from which is recoverable satisfies in expectation
and
Moreover, since and are -Lipschitz (in this fixed polynomial time regime, the parameters are constant independent of the final bias ) with respect to the -norm, Hoeffding’s inequality gives
and
At the very least, such a can easily be found randomly. In [AJQ+20], as an alternative to satisfying Item 1, it was shown that by choosing by majority vote, i.e.
for , one has that is large, which is enough to address the first item. More precisely, implicit in [AJQ+20] is that for any constant, as long as parity sampling is sufficiently strong, we have
Similarly, is -Lipschitz with respect to the -norm, so Hoeffding’s inequality yields
However, we want to efficiently and deterministically find a satisfying as well as satisfying Item 2. Note that at this stage in the decoding process is not known (without issuing a recursive unique decoding call), so running expectation maximization to satisfy Item 1 would not be possible. Fortunately, the majority can be cheaply computed without a recursive call to a unique decoder. On the other hand, satisfying only Item 2 can be found by expectation maximization. The challenge is to satisfy both conditions at the same time. For this reason, we design a simultaneous expectation maximization derandomization procedure tailored to our setting.
D.1 Abstract Derandomization: Simultaneous Expectation Maximization
Suppose that is a probability space where two random variables and are defined satisfying the following first moment conditions
We provide sufficient conditions so that satisfying
can be found efficiently and deterministically with the aid of an oracle, where and . More precisely, we have the following lemma.
Lemma D.1.
Let be a probability space with a product distribution. Suppose is a random variable on satisfying, for and for some function ,
Suppose is a random variable on satisfying, for some function ,
Suppose that there is an oracle to evaluate under any product distribution for . Given , if
(10) |
then it is possible to find using invocations of the oracle and satisfying
Proof.
D.2 Implementing the Oracle
Now, we provide an efficient deterministic oracle for our setting. We take
where . Note that
To compute under any product distribution , use linearity of expectation and sum at most terms where each can be computed in since restricted to we have a product distribution taking values in .
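The oracle-driven search of Lemma D.1 is an instance of the method of conditional expectations: fix the coordinates of one at a time, always choosing the value that keeps the conditional expectation above the target. A toy sketch of this scheme; the oracle and objective below are hypothetical stand-ins, not the actual SOS-derived quantities of Appendix D:

```python
def derandomize(n, cond_exp, threshold):
    """Method of conditional expectations over {0,1}^n with a product
    distribution. `cond_exp(prefix)` is an oracle returning the
    conditional expectation of the objective given that the first
    len(prefix) coordinates are fixed to `prefix`."""
    prefix = []
    for _ in range(n):
        # averaging argument: some extension preserves the expectation
        best = max((0, 1), key=lambda b: cond_exp(prefix + [b]))
        prefix.append(best)
    assert cond_exp(prefix) >= threshold  # expectation never decreased
    return prefix

# toy oracle: objective is the fraction of ones; under uniform bits the
# unfixed coordinates each contribute 1/2 in expectation
def frac_ones_cond_exp(prefix, n=6):
    return (sum(prefix) + 0.5 * (n - len(prefix))) / n

print(derandomize(6, frac_ones_cond_exp, threshold=0.5))
# [1, 1, 1, 1, 1, 1]
```

The "simultaneous" version in Lemma D.1 runs the same coordinate-by-coordinate scheme while tracking two objectives at once, which is why condition (10) is needed to leave slack for both.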