A Criterion for Decoding on the BSC
Abstract
We present an approach to showing that a linear code is resilient to random errors. We use this approach to obtain decoding results for both transitive codes and Reed-Muller codes. We give three kinds of results about linear codes in general, and transitive linear codes in particular.
-
1.
We give a tight bound on the weight distribution of every transitive linear code: any such code C ⊆ F_q^n of dimension k satisfies Pr_{z∼U_C}[|z| = w] ≤ q^{−k(1−H_q(w/n))} for every weight w (see Theorem 1).
-
2.
We give a criterion that certifies that a linear code C can be decoded on the binary symmetric channel. Let K_t denote the Krawtchouk polynomial of degree t, and let C^⊥ denote the dual code of C. We show that bounds on the expectations E_{z∼U_{C^⊥}}[K_t(|z|)] imply that C recovers from errors on the binary symmetric channel with parameter p. Weaker bounds can be used to obtain list-decoding results using similar methods. One consequence of our criterion is that whenever the weight distribution of C^⊥ is sufficiently close to the binomial distribution in some interval around n/2, C is resilient to p-errors.
-
3.
We combine known estimates for the Krawtchouk polynomials with our weight bound for transitive codes, and with known weight bounds for Reed-Muller codes, to obtain list-decoding results for both these families of codes. In some regimes, our bounds for Reed-Muller codes achieve the information-theoretic optimal trade-off between rate and list size.
1 Introduction
In his seminal 1948 paper, Shannon laid the foundations of coding theory and introduced the concept of channel capacity, which is the maximal rate at which information can be transmitted over a communication channel [Sha48]. The two channels that have received the most attention are the Binary Symmetric Channel (BSC), where each bit is independently flipped with some probability p, and the Binary Erasure Channel (BEC), where each bit is independently replaced by an erasure symbol with some probability ε. Shannon’s work initiated a decades-long search for explicit codes that can achieve high rates over a noisy channel.
Explicit constructions of codes often have a lot of symmetry. In particular, many known constructions of codes are transitive. The group of symmetries of a code is the subgroup of permutations π such that permuting the coordinates of every codeword using π does not change the code. A code is transitive if for every two coordinates i, j, there is a permutation π in its symmetry group with π(i) = j. A code is 2-transitive if for all pairs of distinct coordinates (i, j) and (k, l), there is a permutation π in its symmetry group with π(i) = k and π(j) = l. Many known constructions of codes are cyclic, and every cyclic code is transitive. Reed-Solomon codes, BCH codes and Reed-Muller codes are all transitive.
The binary code that is arguably the cleanest explicit candidate for achieving capacity over both the BSC and the BEC is the family of Reed-Muller codes. The codewords of the Reed-Muller code RM(m, d) are the evaluation vectors (over all points in F_2^m) of all multivariate polynomials of degree at most d in m variables.
Reed-Muller codes enjoy strong symmetry beyond transitivity: their symmetry group is the group of invertible affine transformations over F_2^m. Fundamental results from Fourier analysis about the influences of symmetric boolean functions [KKL88, Tal94, BK97] have led to a very successful line of work, with [KKM+16] showing that Reed-Muller codes achieve capacity over the BEC and [HSS21] showing that they are polynomially close to achieving capacity over the BSC. In fact, [KKM+16] show that if a linear code of rate r has a 2-transitive symmetry group, then for every ε > 0 it can tolerate a (1 − r − ε) fraction of random erasures. Given these results, it is natural to investigate the types of symmetry that lead to good codes. In this paper, we prove three kinds of results relevant to understanding the error resilience of general linear codes, transitive linear codes, and Reed-Muller codes.
-
1.
We give a clean and tight weight distribution bound for every transitive linear code. We show that any such code C ⊆ F_q^n of dimension k satisfies
Pr_{z∼U_C}[|z| = w] ≤ q^{−k(1−H_q(w/n))}
for every weight w. This bound is proved by combining transitivity with the subadditivity of entropy. In some regimes, it improves on all previously known weight bounds for Reed-Muller codes (see Appendix A).
-
2.
We give a new criterion that certifies that a code can be decoded over the BSC. Let K_t denote the Krawtchouk polynomial of degree t, and let C^⊥ denote the dual code of C. In spirit, our criterion says that any code for which the expectation
E_{z∼U_{C^⊥}}[K_t(|z|)]
is sufficiently small (for degrees t close to pn) can be uniquely decoded on the BSC with high probability. Our actual result is a little more technically involved (see Theorem 2). This criterion implies that any code whose dual codewords are distributed sufficiently close to the binomial distribution must be resilient to p-errors (see Corollary 3). Moreover, if the above expectation is only bounded by some larger quantity L, then we prove that the code can be list-decoded with a list size of about L.
-
3.
Finally, we combine known estimates for the Krawtchouk polynomials with our weight bound for transitive codes, and with known weight bounds for Reed-Muller codes, to obtain list-decoding results for both families of codes. In some regimes, our bounds for Reed-Muller codes achieve the information-theoretic optimal trade-off between rate and list size.
Next, we discuss our results more rigorously. We note that throughout this section, for any set S we denote the uniform distribution over S by U_S.
I. Weight Bounds for Transitive Codes
We bound the weight distribution of any transitive linear code over any prime field. See section 6 for the proof.
Theorem 1.
Let C ⊆ F_q^n be a transitive linear code of dimension k, where q is prime. Then for any weight w ∈ {0, 1, …, n} we have
Pr_{z∼U_C}[|z| = w] ≤ q^{−k(1−H_q(w/n))},
where U_C is the uniform distribution over all codewords in C, |z| is the number of non-zero coordinates of z, and H_q is the q-ary entropy
H_q(x) = x log_q(q − 1) − x log_q(x) − (1 − x) log_q(1 − x).
Note that H = H_2 denotes the binary entropy function. We note that in some regimes (e.g. when the degree is linear in the number of variables and the normalized weight is larger than some constant depending on the rate), the bound above improves on all previously proven weight distribution bounds for Reed-Muller codes, even though the only feature of the code that we use is transitivity. See Appendix A for the details.
II. A Criterion for Decoding on the BSC
We develop a new approach for proving decoding results over the BSC, i.e. the communication channel whose errors are sampled from the p-noisy distribution
μ_p(z) = p^{|z|} (1 − p)^{n−|z|}, z ∈ F_2^n,
for some p ∈ (0, 1/2). Our approach is based on Fourier analysis, although unlike [KKM+16] and [HSS21], the ideas we use do not rely on bounds on influences. We obtain the following result (recall that U_S denotes the uniform distribution over S):
Theorem 2.
Let C ⊆ F_2^n be any linear code, and denote by C^⊥ its dual code. Then for any p ∈ (0, 1/2), there exists a decoding function D such that for every codeword c ∈ C, the decoding error probability Pr_{e∼μ_p}[D(c + e) ≠ c] is controlled by the expectations E_{z∼U_{C^⊥}}[K_t(|z|)] for degrees t in a small window around pn, where K_t is the Krawtchouk polynomial of degree t.
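To make the criterion concrete, here is a minimal sketch (our own toy illustration, not taken from the paper: the [7,4] Hamming code and the helper names krawtchouk and span are our choices). It computes E_{z∼U_{C^⊥}}[K_t(|z|)] for every degree t, and checks the MacWilliams-type identity that this expectation equals the number of weight-t codewords of C, which is one way to see why small values of it certify resilience to weight-t errors.

```python
from itertools import product
from math import comb

def krawtchouk(t, x, n):
    # K_t(x) = sum_j (-1)^j C(x, j) C(n - x, t - j)
    return sum((-1) ** j * comb(x, j) * comb(n - x, t - j) for j in range(t + 1))

def span(rows, n):
    # All F_2-linear combinations of the given rows.
    words = set()
    for coeffs in product([0, 1], repeat=len(rows)):
        w = [0] * n
        for c, row in zip(coeffs, rows):
            if c:
                w = [(a + b) % 2 for a, b in zip(w, row)]
        words.add(tuple(w))
    return words

n = 7
# The [7,4] Hamming code C and its dual (the simplex code), spanned by
# the rows of the parity-check matrix H.
G = [(1, 1, 1, 0, 0, 0, 0), (1, 0, 0, 1, 1, 0, 0),
     (0, 1, 0, 1, 0, 1, 0), (1, 1, 0, 1, 0, 0, 1)]
H = [(1, 0, 1, 0, 1, 0, 1), (0, 1, 1, 0, 0, 1, 1), (0, 0, 0, 1, 1, 1, 1)]
C, dual = span(G, n), span(H, n)
assert all(sum(a * b for a, b in zip(c, z)) % 2 == 0 for c in C for z in dual)

# E_{z ~ dual}[K_t(|z|)] equals the number of weight-t codewords of C.
for t in range(n + 1):
    avg = sum(krawtchouk(t, sum(z), n) for z in dual) / len(dual)
    assert avg == sum(1 for c in C if sum(c) == t)
```

Here the expectation vanishes for t = 1, 2 (the Hamming code has distance 3) and first becomes non-zero at t = 3, consistent with the code uniquely decoding single errors but no more.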
We will now consider one interesting consequence of Theorem 2. Let ε > 0 be arbitrary, and define the set of weights around n/2
W_ε = {w : |w − n/2| ≤ εn}.
Our next corollary states that whenever the dual codewords of C are distributed sufficiently close to the binomial distribution for all weights in W_ε, the code must be resilient to p-errors. See Appendix B for the proof.
Corollary 3.
Let be a linear code, and let be arbitrary. Suppose that for every we have
and suppose that
Then is resilient to -errors.
As a proof of concept, we note that a uniformly random linear code of suitable dimension satisfies all these conditions simultaneously with high probability.
III. List Decoding Results
Using a generalized version of Theorem 2 (namely, Theorem 21 in section 5), we obtain list decoding bounds for both transitive codes and Reed-Muller codes. We start with our bound for Reed-Muller codes.
Theorem 4.
Let and be such that . Then the Reed-Muller code of dimension can with high probability list-decode -errors using a list of size
Although our lists have exponential size, for small p the list size is non-trivial, in the sense that it is much smaller than the number of noise vectors (which is about 2^{H(p)n}) and the number of codewords in the code (which is 2^k). In fact, a standard calculation (see Appendix C) shows that any code of dimension k that can successfully list-decode errors of probability p with list size L must satisfy
L ≥ 2^{k − (1 − H(p))n}.  (1)
Our bound in Theorem 4 shows that Reed-Muller codes achieve these optimal parameters, at least in some regimes (e.g. in the constant-rate regime with p small enough). We now turn to our list-decoding bound for transitive codes.
Theorem 5.
Fix any , , and . Then any transitive linear code of dimension can with high probability list-decode -errors using a list of size
As an explicit example of the types of bounds one gets from Theorem 5, we have that any transitive linear code of dimension can with high probability list-decode -errors using a list of size
For comparison, recall that our lower bound (1) states that any such code of dimension k requires a list size of at least about 2^{k − (1 − H(p))n}.
1.1 Techniques
Our weight distribution bound for transitive linear codes (Theorem 1) is proven by showing that the entropy of a uniformly random codeword of a given weight is small. To do this, we analyze the entropy of the coordinates corresponding to linearly independent columns of the generator matrix. Transitivity implies that every coordinate in the code has the same entropy, and subadditivity of entropy can then be used to bound the entropy of the entire distribution.
To obtain our decoding criterion, we make use of a connection between the probability of a decoding error and the L2 norm of the coset distribution of the code. To explain the intuition, let us start by assuming that exactly t of the coordinates in the codeword are flipped, although our results actually hold over the BSC as well. Let e be the vector in F_2^n that represents the errors introduced by the channel, and let H be the parity-check matrix of the code. Then by standard arguments, if e can be recovered from He, the codeword can be decoded. In the case where e is uniformly distributed on vectors of weight t, this amounts to showing that with high probability, the coset of e does not contain any other string of weight t (in other words, there is no x ≠ e of weight t such that Hx = He). This can be understood by computing the renormalized L2 norm of the distribution of the syndrome He. This norm is always at least 1, and if it is close to 1 then the code can be decoded with high probability. If it is larger than 1, then we show that the code can be list-decoded with high probability, where the size of the list is related to the value of the norm.
Thus, to understand decoding, we need to understand this norm. Using Fourier analysis, we express this quantity as
|C^⊥| · E_{z∼U_{C^⊥}}[ (K_t(z) / \binom{n}{t})² ],  (2)
where z is a uniformly random codeword in the dual code C^⊥ and K_t is the Krawtchouk polynomial of degree t. We note that such relations for the coset weight distribution have been used to understand the discrepancy of subsets of the sphere, as well as subsets of other homogeneous spaces. In particular, (2) was proven in a slightly different form in [Bar21] (see Theorem 2.1 and Lemma 4.1), whereas results of this type had previously been derived over the sphere and other two-point homogeneous spaces in [BDM18, Skr19].
Using estimates for the magnitude of Krawtchouk polynomials and bounds for the weight distribution of the dual code C^⊥, one can thus bound the norm in the set-up where the error string is a random vector of weight exactly t. Using essentially the same techniques, one can also bound the norm when the error string is a random vector whose weight merely lies close to t, i.e. is taken uniformly at random from the set of strings whose weight lies in a small window W around t.
Our next step is then to show that the norm corresponding to the p-biased distribution is very similar to the norm corresponding to the uniform distribution over this set. Intuitively, this is because W only contains a very small range of weights, so the p-biased distribution and the uniform distribution must behave very similarly over strings of weight in W. It then follows that their corresponding norms must be similar as well.
Our decoding criteria (Theorem 2, Corollary 3) are thus obtained by bounding the norm using estimates for Krawtchouk polynomials and for the weight distribution of the dual code C^⊥. Our list-decoding results (Theorems 4 and 5) then follow from our weight bound for transitive codes (Theorem 1) and from a weight bound of Samorodnitsky for Reed-Muller codes (Theorem 6).
1.2 Related Work
It has been shown that LDPC codes achieve capacity over Binary Memoryless Symmetric (BMS) channels [LMS+97, KRU13, Gal62], a class which includes both the BSC and the BEC. These constructions are not deterministic, and it is only with the advent of polar codes [Ari09] that we obtained capacity-achieving codes with both a deterministic construction and efficient encoding and decoding algorithms.
Polar codes are closely related to Reed-Muller codes, in the sense that they also consist of subspaces that correspond to polynomials over F_2. In [Ari09] it was shown that polar codes achieve capacity over the BSC, and algorithms were given to both encode and decode them.
It has long been believed that Reed-Muller codes achieve capacity, and significant progress has been made in that direction over the last few years. (See [ASY21] for a discussion of the subject, as well as a thorough exposition of Reed-Muller codes.) Abbe, Shpilka and Wigderson first showed that Reed-Muller codes achieve capacity over the BSC and the BEC for sub-constant and super-constant rates [ASW15]. Kudekar, Kumar, Mondelli, Pfister, Sasoglu and Urbanke then proved that in the constant rate regime, Reed-Muller codes achieve capacity over the BEC [KKM+16]. Abbe and Ye showed that the Reed-Muller transform polarizes the conditional mutual information, and proved that some non-explicit variant of the Reed-Muller code achieves capacity [AY19]. (They conjecture that this variant is in fact the Reed-Muller code itself.) Hazla, Samorodnitsky and Sberlo then proved that Reed-Muller codes of constant rate can decode a constant fraction of errors on the BSC [HSS21]; this had previously been shown for Almost-Reed-Muller codes by Abbe, Hazla and Nachum [AHN21]. Most recently, Reeves and Pfister showed that Reed-Muller codes achieve capacity over all BMS channels under bit-MAP decoding [RP21], i.e. one can with high probability recover any single bit of the original codeword (but not with high enough probability to take a union bound over all bits). Despite these breakthroughs, the conjecture that Reed-Muller codes achieve capacity over all BMS channels under block-MAP decoding (i.e. recovering the whole codeword with high probability) is still open.
Weight Bounds for Reed-Muller Codes
Several past works have proven bounds on the weight distribution of Reed-Muller codes. Kaufman, Lovett and Porat gave asymptotically tight bounds on the weight distribution of Reed-Muller codes of constant degree [KLP12]. Abbe, Shpilka and Wigderson then built on these techniques to obtain bounds for all degrees up to a certain threshold [ASW15], before Sberlo and Shpilka again improved the approach and obtained bounds for all degrees [SS20]. Most recently, Samorodnitsky used completely different ideas to obtain weight bounds in the regime where both the rate of the code and the normalized weight of the codeword are constant [Sam20]. We will later use the following result of his in our list-decoding arguments:
Theorem 6 ([Sam20]).
Let for some , and denote by the uniform distribution over all codewords in . Then for any we have
These bounds are strong when . For close to , the first results we are aware of are due to Ben-Eliezer, Hod and Lovett [BHL12]. Their bounds, which were extended to Reed-Muller codes over prime fields by Beame, Oveis Gharan and Yang [BGY20], are strongest when the degree is sublinear. Sberlo and Shpilka then obtained bounds for all degrees in [SS20], while Samorodnitsky again obtained bounds in the regime where both and are [Sam20].
We note that in some regimes (e.g. when the degree is linear in m and the normalized weight is larger than some constant depending on the rate), our Theorem 1 improves on all the aforementioned weight bounds. See Appendix A for the details.
List Decoding
List decoding was proposed by Elias in 1957 as an alternative to unique decoding [Eli57]. In the list decoding framework, the receiver of a corrupted codeword is asked to output a list of potential codewords, with the guarantee that with high probability one of these codewords is the original one. This of course allows for a greater fraction of errors to be tolerated.
The list decoding community has largely focused on proving results for the adversarial noise model, and many codes are now known to achieve list-decoding capacity. For example, uniformly random codes achieve capacity, as do uniformly random linear codes [GHSZ02, LW18, GHK11]. Folded Reed-Solomon codes were the first explicit codes to provably achieve list-decoding capacity [GR08], followed by several others a few years later [GX12, Kop15, HRW17, MRR+20]. For the rest of this paper, however, we will exclusively work in the model where the errors are stochastic. In this model, the strongest known list decoding bound for Reed-Muller codes of constant rate is, to our knowledge, that one can output a list of size
(3)
and succeed with high probability in decoding p-errors. This result, although not explicitly stated in [Sam20], can be obtained from his weight bound (Theorem 6) by bounding the expected number of codewords that end up closer to the received string than the original codeword, and then applying Markov’s inequality. We note that the expression in (3) falls short of the optimal trade-off given by (1) (see Appendix D.1 for a proof of this).
Krawtchouk polynomials
The Krawtchouk polynomial of degree t is the polynomial
K_t(x) = Σ_{j=0}^{t} (−1)^j \binom{x}{j} \binom{n−x}{t−j}.
For any subset W ⊆ {0, 1, …, n}, we will be interested in the corresponding polynomial K_W = Σ_{t∈W} K_t. For z ∈ F_2^n, we will sometimes abuse notation and use K_t(z) to mean K_t(|z|). The following proposition follows from standard results (see for instance [KL99], or Theorem 16 in [MS77]).
Proposition 7.
For any 0 ≤ t ≤ n and any 0 ≤ x ≤ n, we have
\binom{n}{x} · K_t(x)² ≤ 2^n · \binom{n}{t}.
Good estimates for Krawtchouk polynomials of any degree were obtained in [KL95, IS98, Pol19] (see e.g. [Pol19], Lemma 2.1). These estimates are asymptotically tight in the exponent. Note that K_{n−t}(x) = (−1)^x K_t(x), so it suffices to understand the case t ≤ n/2.
As the second expression can be somewhat cumbersome to use, [Pol19] also gives the following weaker bound (see Lemma 2.2 and equation 2.10 in [Pol19]):
Theorem 9 ([Pol19]).
For any and any , we have
We will need the above estimates when using our Theorem 2 to obtain list-decoding results for transitive codes and Reed-Muller codes.
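The displayed estimates above were partially lost in extraction; as a sanity check on the standard facts they rest on, the following sketch (our own, with an arbitrarily chosen small n) numerically verifies the binomial-weighted orthogonality relation for Krawtchouk polynomials, the pointwise bound of Proposition 7, and the degree symmetry just mentioned.

```python
from math import comb

def krawtchouk(t, x, n):
    # K_t(x) = sum_j (-1)^j C(x, j) C(n - x, t - j)
    return sum((-1) ** j * comb(x, j) * comb(n - x, t - j) for j in range(t + 1))

n = 10
# Orthogonality under binomial weights: sum_x C(n,x) K_t(x) K_s(x) = 2^n C(n,t) [t = s].
for t in range(n + 1):
    for s in range(n + 1):
        lhs = sum(comb(n, x) * krawtchouk(t, x, n) * krawtchouk(s, x, n)
                  for x in range(n + 1))
        assert lhs == (2 ** n * comb(n, t) if t == s else 0)

# Pointwise consequence (Proposition 7): C(n,x) K_t(x)^2 <= 2^n C(n,t).
assert all(comb(n, x) * krawtchouk(t, x, n) ** 2 <= 2 ** n * comb(n, t)
           for t in range(n + 1) for x in range(n + 1))

# Degree symmetry: K_{n-t}(x) = (-1)^x K_t(x), so degrees t <= n/2 suffice.
assert all(krawtchouk(n - t, x, n) == (-1) ** x * krawtchouk(t, x, n)
           for t in range(n + 1) for x in range(n + 1))
```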
2 Notation, Conventions and Preliminaries
For the sake of conciseness, we will use standard shorthand notation throughout.
Let n = 2^m. We will be working with the vector spaces F_2^m and F_2^n. For convenience, we associate F_2^m with the set {1, …, n}, by ordering the elements of F_2^m lexicographically. For x ∈ F_2^n, we write |x| to denote the weight of x, i.e. its number of non-zero coordinates.
2.1 Linear Codes
An n-bit code is a subset C ⊆ F_2^n. Whenever C is a subspace of F_2^n, we say that C is a linear code. Any linear code of dimension k can be represented by its generator matrix, which is a k × n matrix G whose rows form a basis for C. The matrix G generates all codewords of C in the sense that
C = {xG : x ∈ F_2^k}.
Another useful way to describe a linear code is via its parity-check matrix, which is an (n − k) × n matrix H whose rows span the orthogonal complement of C. The linear code can then be expressed as
C = {c ∈ F_2^n : Hc = 0}.
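As a quick illustration of these two descriptions (a minimal sketch with a toy [6,3] code of our own choosing), the following snippet checks that the generator-matrix and parity-check views define the same linear code:

```python
from itertools import product

# A toy [6,3] code: G = [I | A] in systematic form, H = [A^T | I].
G = [(1, 0, 0, 1, 1, 0),
     (0, 1, 0, 1, 0, 1),
     (0, 0, 1, 0, 1, 1)]
H = [(1, 1, 0, 1, 0, 0),
     (1, 0, 1, 0, 1, 0),
     (0, 1, 1, 0, 0, 1)]

def encode(x, G):
    # The codeword xG, computed column by column over F_2.
    return tuple(sum(xi * gi for xi, gi in zip(x, col)) % 2 for col in zip(*G))

code = {encode(x, G) for x in product([0, 1], repeat=3)}
kernel = {c for c in product([0, 1], repeat=6)
          if all(sum(hi * ci for hi, ci in zip(row, c)) % 2 == 0 for row in H)}
assert code == kernel and len(code) == 2 ** 3
```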
One property that will play an important role is transitivity, which we define below:
Definition 1.
A set C ⊆ F_2^n is transitive if for every pair of coordinates i, j ∈ {1, …, n} there exists a permutation π of {1, …, n} such that
-
(i)
π(i) = j, and
-
(ii)
For every element c ∈ C we have
(c_{π(1)}, …, c_{π(n)}) ∈ C.
We note that the dual code of a transitive code is also transitive (see Appendix D.2 for the proof).
Claim 10.
The dual code of a transitive code is transitive.
2.2 Reed-Muller Codes
We will denote by RM(m, d) the Reed-Muller code with m variables and degree d. Throughout this section, we let G_{m,d} be the generator matrix of RM(m, d); this is a (Σ_{i=0}^{d} \binom{m}{i}) × 2^m matrix whose rows correspond to sets S ⊆ {1, …, m} of size at most d, ordered lexicographically, and whose columns correspond to elements x of F_2^m. For S and x, the corresponding entry is Π_{i∈S} x_i. If S is empty, this entry is set to 1.
If y is a row vector indexed by the sets S of size at most d, then y can be thought of as describing the coefficients of a multilinear polynomial in m variables of degree at most d. The row vector y·G_{m,d} is then the evaluations of this polynomial on all inputs from F_2^m. It is well known that G_{m,d} has full rank Σ_{i=0}^{d} \binom{m}{i}. In fact we have the following standard fact (see Appendix D.3 for the proof):
Fact 11.
The columns of G_{m,d} that correspond to the points x ∈ F_2^m with |x| ≤ d are linearly independent.
The parity-check matrix of the Reed-Muller code RM(m, d) is known to be the same as the generator matrix of a different Reed-Muller code. Namely, let G_{m,m−d−1} be the generator matrix for the code RM(m, m − d − 1). Then G_{m,m−d−1} has full rank, and G_{m,d} · G_{m,m−d−1}^T = 0. So, the rows of G_{m,m−d−1} are a basis for the orthogonal complement of the span of the rows of G_{m,d}. Reed-Muller codes also have useful algebraic features, notably transitivity:
Fact 12.
For all m and all d ≤ m, the Reed-Muller code RM(m, d) is transitive.
See Appendix D.3 for the proof.
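The following sketch (our own; the helper rm_generator is not from the paper) builds the generator matrix described above for a small instance and checks the duality relation between RM(m, d) and RM(m, m − d − 1):

```python
from itertools import combinations, product

def rm_generator(m, d):
    # One row per monomial prod_{i in S} x_i with |S| <= d (grouped by degree),
    # one column per evaluation point x in F_2^m.
    points = list(product([0, 1], repeat=m))
    return [tuple(int(all(x[i] for i in S)) for x in points)
            for r in range(d + 1) for S in combinations(range(m), r)]

m, d = 4, 1
G1, G2 = rm_generator(m, d), rm_generator(m, m - d - 1)
# Row counts add up to 2^m, and every row of G1 is orthogonal to every row
# of G2, consistent with RM(m, d)^perp = RM(m, m - d - 1).
assert len(G1) + len(G2) == 2 ** m
assert all(sum(a * b for a, b in zip(u, v)) % 2 == 0 for u in G1 for v in G2)
```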
2.3 Entropy
The binary entropy function H: [0, 1] → [0, 1] is defined to be
H(x) = −x log_2(x) − (1 − x) log_2(1 − x).
The following fact allows us to approximate binomial coefficients using the entropy function:
Fact 13.
For 0 ≤ k ≤ n,
2^{nH(k/n)} / (n + 1) ≤ \binom{n}{k} ≤ 2^{nH(k/n)}.
The leftmost inequality is a consequence of Stirling’s approximation for the binomial coefficients, and the rightmost is a consequence of the sub-additivity of entropy.
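As a numerical sanity check of Fact 13 (our own sketch; we compare exponents rather than raw values to sidestep floating-point issues):

```python
from math import comb, log2

def H2(x):
    # Binary entropy H(x) = -x log2(x) - (1 - x) log2(1 - x).
    return 0.0 if x in (0, 1) else -x * log2(x) - (1 - x) * log2(1 - x)

n = 100
for k in range(n + 1):
    exponent = n * H2(k / n)
    # 2^{nH(k/n)} / (n + 1) <= C(n, k) <= 2^{nH(k/n)}, in log form.
    assert exponent - log2(n + 1) <= log2(comb(n, k)) <= exponent + 1e-9
```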
The following lemma, which is essentially a 2-way version of Pinsker’s inequality, gives a useful way to control the entropy function near x = 1/2.
Lemma 14.
For any δ ∈ [−1/2, 1/2], we have
2δ² log_2 e ≤ 1 − H(1/2 + δ) ≤ 4δ².
See Appendix D.4 for the proof.
2.4 Probability Distributions
There are two types of probability distributions that we will use frequently. The first one is the p-Bernoulli distribution over F_2^n, which we will denote by
μ_p(z) = p^{|z|} (1 − p)^{n−|z|}.
The second one is the uniform distribution over some set S, which we will denote by U_S.
There are two particular cases of the uniform distribution that will occur often enough that we attribute them their own notation. The first one is the uniform distribution over F_2^n, which we will denote by U. The second one is the uniform distribution over all vectors whose weight lies in some subset W ⊆ {0, 1, …, n}. We will denote this probability distribution by D_W.
2.5 Probability Theory
We will need two very standard results of probability theory (see e.g. [BLM13]): Markov’s inequality and Chernoff’s bound. We start with Markov’s inequality.
Lemma 15.
Let X be a nonnegative random variable. Then for any a > 0, we have
Pr[X ≥ a] ≤ E[X] / a.
We will also need Chernoff’s bound:
Lemma 16.
Let X_1, …, X_n be i.i.d. random variables taking values in {0, 1}, and define X = Σ_{i=1}^{n} X_i. Then for any δ > 0, we have
Pr[|X − E[X]| ≥ δn] ≤ 2e^{−2δ²n}.
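As a quick empirical illustration of how Chernoff’s bound will be used later (a sketch with arbitrary toy parameters of our own), the weight of a p-noisy error string concentrates tightly around pn:

```python
import random

random.seed(0)
n, p, trials, delta = 10_000, 0.1, 200, 0.02
weights = [sum(random.random() < p for _ in range(n)) for _ in range(trials)]
# A Hoeffding-type bound gives Pr[|X - pn| >= delta*n] <= 2 exp(-2 delta^2 n),
# about 7e-4 per trial here, so every sampled weight should be within
# delta*n = 200 of pn = 1000.
assert all(abs(w - p * n) < delta * n for w in weights)
print(min(weights), max(weights))
```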
2.6 Fourier Analysis
The Fourier basis is a useful basis for the space of functions mapping F_2^n to the real numbers. We recall some of its properties below (see e.g. [dW08]). For f, g: F_2^n → R, define the inner product
⟨f, g⟩ = E_{x∼U}[f(x) g(x)].
For every z ∈ F_2^n, define the character
χ_z(x) = (−1)^{Σ_i z_i x_i}.
These functions form an orthonormal basis, namely for z, z′ ∈ F_2^n,
⟨χ_z, χ_{z′}⟩ = 1 if z = z′, and 0 otherwise.
We define the Fourier coefficients f̂(z) = ⟨f, χ_z⟩. Then for every x ∈ F_2^n, we have
f(x) = Σ_{z∈F_2^n} f̂(z) χ_z(x).
In particular, Parseval’s identity holds: ⟨f, f⟩ = Σ_{z} f̂(z)².
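The following minimal sketch (our own) verifies Parseval’s identity numerically for a random function on F_2^4, using the normalization above (an expectation on the evaluation side and a plain sum on the coefficient side):

```python
from itertools import product
import random

random.seed(0)
n = 4
points = list(product([0, 1], repeat=n))

def chi(z, x):
    # Character chi_z(x) = (-1)^{z . x}.
    return (-1) ** (sum(zi * xi for zi, xi in zip(z, x)) % 2)

f = {x: random.gauss(0, 1) for x in points}
fhat = {z: sum(f[x] * chi(z, x) for x in points) / 2 ** n for z in points}

lhs = sum(v ** 2 for v in f.values()) / 2 ** n   # E_x[f(x)^2]
rhs = sum(v ** 2 for v in fhat.values())         # sum_z fhat(z)^2
assert abs(lhs - rhs) < 1e-9
```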
3 Outline of the Paper
The main question we will be looking into is whether or not a family of linear codes C_n ⊆ F_2^n is asymptotically resilient to independent errors of probability p under list decoding. Formally, we are given a list size L and want to know if there exists a family of decoding functions D_n, each mapping a received string in F_2^n to a list of at most L codewords, such that for every sequence of codewords c_n ∈ C_n we have
Pr_{e∼μ_p}[c_n ∈ D_n(c_n + e)] → 1.
We note that the unique decoding problem can be seen as setting L = 1 in the above set-up. Our general approach will be based on trying to identify the error string e from its image He under the parity-check matrix. In particular, we will be interested in the max-likelihood decoder
f(s) = argmax_{e ∈ F_2^n : He = s} μ_p(e).  (4)
We show in the following lemma that if the max-likelihood decoder is able to identify the error string e, then it is possible to recover the original codeword.
Lemma 17.
Let H be the parity-check matrix of the linear code C, and let f be an arbitrary function mapping syndromes to strings in F_2^n. Then there exists a decoding function
D: F_2^n → F_2^n
such that for every c ∈ C and every e ∈ F_2^n: if f(He) = e, then D(c + e) = c.
Proof.
Given y ∈ F_2^n, define D(y) to be
D(y) = y + f(Hy).
We will show that whenever e satisfies f(He) = e, the decoder D also satisfies D(c + e) = c for every c ∈ C. Suppose f(He) = e. Note that since H is the parity-check matrix of C, every c ∈ C satisfies Hc = 0. So for every c ∈ C, any e that satisfies f(He) = e must also satisfy f(H(c + e)) = e. It then follows by definition of D that
D(c + e) = c + e + f(H(c + e)) = c + e + e = c.
∎
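A minimal sketch of this reduction (our own toy instantiation, using the [7,4] Hamming code, whose syndromes determine all single-bit errors): the decoder D(y) = y + f(Hy) recovers every codeword from every error that f identifies correctly.

```python
from itertools import product

n = 7
H = [(1, 0, 1, 0, 1, 0, 1), (0, 1, 1, 0, 0, 1, 1), (0, 0, 0, 1, 1, 1, 1)]

def syndrome(y):
    return tuple(sum(a * b for a, b in zip(row, y)) % 2 for row in H)

# f maps each syndrome to the unique error string of weight <= 1 producing it.
errors = [tuple(int(i == j) for i in range(n)) for j in range(n)] + [(0,) * n]
f = {syndrome(e): e for e in errors}
assert len(f) == len(errors)  # all these syndromes are distinct

code = [c for c in product([0, 1], repeat=n) if syndrome(c) == (0, 0, 0)]
for c in code:
    for e in errors:
        y = tuple((ci + ei) % 2 for ci, ei in zip(c, e))
        # D(y) = y + f(Hy); since Hy = H(c + e) = He, this returns c.
        decoded = tuple((yi + fi) % 2 for yi, fi in zip(y, f[syndrome(y)]))
        assert decoded == c
```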
From this point onward, our goal will thus be to prove that the max-likelihood decoder in (4) succeeds in recovering e with high probability. In section 4, we relate the decoding error probability of the max-likelihood decoder to the collision probability of the syndrome map e ↦ He.
In section 5, we build on this result to obtain a bound on the performance of in terms of the weight distribution of the dual code. We then present new bounds on the weight distribution of transitive codes in section 6. These bounds are interesting in their own right, and we show that they are essentially tight. In section 7, we combine these bounds with our results from section 5 to obtain list-decoding results for transitive linear codes. We then repeat this argument with Samorodnitsky’s Theorem 6 in section 8 to obtain a stronger list-decoding bound for Reed-Muller codes.
4 Collisions vs Decoding
Recall that we denote by μ_p the p-Bernoulli distribution over F_2^n, i.e. the distribution
μ_p(z) = p^{|z|} (1 − p)^{n−|z|}.
Recall also that for any subset W ⊆ {0, 1, …, n}, we denote by D_W the uniform distribution over all strings whose weight lies in W.
The goal of this section will be to analyze the relationship between the decoding of an error string e and the collision probability of strings within the map e ↦ He. Intuitively, the more collisions there are within this mapping, the harder it is for our decoder to correctly identify the error string upon seeing only its image He. However, certain error strings might be unlikely enough to occur that our decoder can safely ignore them. For example, if we are interested in a p-noisy error string e ∼ μ_p, then e is unlikely to have weight far away from pn. We could thus choose to ignore all strings whose weights do not lie in the set W = {pn − a, …, pn + a}, for some integer a. In order to analyze the collisions that occur when strings are required to have weight in W, we define for every e and every W the set of W-colliders of e, i.e. the set of strings that lie in the coset of e and have weight in W:
Definition 2.
For any e ∈ F_2^n and any subset W ⊆ {0, 1, …, n}, define
Coll_W(e) = {x ∈ F_2^n : Hx = He and |x| ∈ W}.
This definition captures a natural parameter for how large of a list we need before we can confidently claim that it contains the error string: if we are given He and are told that with high probability the error string has weight in W, then we should output the list Coll_W(e). For unique decoding we want to argue that |Coll_W(e)| = 1 with high probability, whereas for list decoding we want to argue that |Coll_W(e)| ≤ L with high probability, for some integer L. The expectation of |Coll_W(e)| will thus be a key quantity in our analysis. We will call this expectation the "collision count," because it will later be useful to interpret it as the renormalized collision probability of the map e ↦ He (see for instance the proof of Proposition 20).
Definition 3.
For any subset W ⊆ {0, 1, …, n} and any matrix H with n columns, define the collision count
Coll_W(H) = E_{e∼D_W}[|Coll_W(e)|].
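To make Definitions 2 and 3 concrete, here is a small computation (with a toy parity-check matrix of our own choosing; this particular code is deliberately bad at weight 2) of the collider sets and the resulting collision count:

```python
from itertools import combinations

H = [(1, 1, 1, 1, 0, 0), (0, 0, 1, 1, 1, 1)]
n, W = 6, {2}

def syndrome(y):
    return tuple(sum(a * b for a, b in zip(row, y)) % 2 for row in H)

# All strings whose weight lies in W.
shell = [tuple(int(i in S) for i in range(n))
         for w in W for S in combinations(range(n), w)]
# Coll_W(e): strings of weight in W lying in the same coset as e.
coll = {e: [x for x in shell if syndrome(x) == syndrome(e)] for e in shell}
count = sum(len(v) for v in coll.values()) / len(shell)  # collision count
print(count)  # 3.8 here: far above 1, so weight-2 errors cannot be uniquely decoded
```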
In the following lemma, we use Markov’s inequality to bound the probability of a list decoding error in terms of Coll_W(H).
Lemma 18.
For any subset W ⊆ {0, 1, …, n}, any matrix H with n columns, and any integer L ≥ 1, we have
Pr_{e∼D_W}[|Coll_W(e)| > L] ≤ (Coll_W(H) − 1) / L.
Proof.
Note that |Coll_W(e)| ≥ 1 for any e with weight in W, so the random variable |Coll_W(e)| − 1 is always non-negative. Applying Markov’s inequality (i.e. Lemma 15), we then have
Pr_{e∼D_W}[|Coll_W(e)| > L] ≤ Pr[|Coll_W(e)| − 1 ≥ L] ≤ (Coll_W(H) − 1) / L.
∎
When the error string e is sampled uniformly at random from the set of strings with weight in W, the above lemma allows us to relate the decoding error probability to the collision count Coll_W(H). The problem we are most interested in, however, is when e is sampled not from some uniform distribution, but from the p-noisy probability distribution μ_p. We will now show how to connect these two decoding problems. The intuition is that by the Chernoff bound, we only need to concern ourselves with strings whose weights lie in a window W around pn, for some appropriately chosen width. But in this weight band all strings have similar weight, and so are given similar probability under the distribution μ_p. Intuitively, the μ_p-decoder must then perform very similarly to the D_W-decoder. The following theorem makes this idea precise, and then uses Lemma 18 to bound the probability of a decoding error. Recall that f is the max-likelihood decoder (4).
Theorem 19.
Fix , let be any matrix with columns, and let for some integers and . Then
where is when and otherwise
Proof.
We will consider the unique decoding case (L = 1) and the list-decoding case (L > 1) separately.
Case 1: Unique decoding, i.e. L = 1
Let r be the number of rows of the matrix H. We will show that a slightly less performant decoder f′ satisfies the desired probability bound. We define f′ as follows: upon receiving input s ∈ F_2^r, f′ outputs the minimum-weight string from the set
{x ∈ F_2^n : Hx = s and |x| ∈ W}.
If there is no such string, the decoder fails. It is clear that
Pr[f(He) ≠ e] ≤ Pr[f′(He) ≠ e],
since f always returns the most likely string whereas f′ may not. We thus turn to proving the desired bound for f′. Letting E denote the event that the weight of e lies in W, we have by Chernoff’s bound (i.e. Lemma 16) that
(5)
We want to bound the second term. For any w ∈ W, we define the set of "problematic weights" W_{≤w} = {w′ ∈ W : w′ ≤ w}. We note that for e of weight w, our decoder can only fail if there is some string x ≠ e satisfying Hx = He and |x| ∈ W_{≤w}. Recalling the definition of Coll_W(e), we can then rewrite our bound (5) as
Considering the most problematic weight level within the region and using a union bound over all lower levels , we get
We now note that conditioned on |e| = w, the probability distributions μ_p and D_W are identical (they are both uniform on strings of weight w). We can thus rewrite our last inequality as
But by basic conditional probability we know that
so we can bound our previous expression by
Now for any and , we have . It then follows that
The theorem statement then follows from Lemma 18.
Case 2: List decoding, i.e. L > 1
Let r be the number of rows of the matrix H. We will show that a slightly less performant decoding function f′ satisfies the desired probability bound. We define f′ as follows: upon receiving input s ∈ F_2^r, f′ outputs the L lightest strings from the set {x ∈ F_2^n : Hx = s and |x| = w}, for each weight w ∈ W. If there are fewer than L strings in some level w, the decoder returns all of them. It is clear that for any L we have
Pr[e ∉ f(He)] ≤ Pr[e ∉ f′(He)],
since f returns the most likely strings while f′ may not. We thus turn to proving the desired bound for f′. We first bound the probability that the weight of the error string is far away from its mean. Letting E denote the event that |e| ∈ W, we have, by Chernoff’s bound (i.e. Lemma 16), that
Since the distribution μ_p gives the same probability to any two strings of equal weight, we get
Applying Lemma 18, we get
where in the last line we used that whenever . ∎
5 A Criterion for Decoding
In this section, we give a criterion that certifies that a linear code is resilient to errors of probability p. We give such a criterion for both unique decoding and list decoding. The function we will need to make this connection is the Krawtchouk polynomial of degree t, which as we recall is defined as
K_t(x) = Σ_{j=0}^{t} (−1)^j \binom{x}{j} \binom{n−x}{t−j}.
For vectors z ∈ F_2^n, we will abuse notation and write K_t(z) to mean K_t(|z|). For convenience, we also define for any subset W ⊆ {0, 1, …, n} the function
K_W = Σ_{t∈W} K_t.
In the following proposition, we use basic Fourier analysis tools to rewrite the collision count Coll_W(H) in terms of the Krawtchouk polynomials. We note that Proposition 20 was previously proven in a different form in [Bar21] (see Theorem 2.1 and Lemma 4.1), and can be seen as describing the coset weight distribution of the code. Recall that we use U_S to denote the uniform distribution over a set S, and that we write C^⊥ for the code spanned by the rows of H.
Proposition 20.
Fix n, and let H be a matrix with entries in F_2 whose rows span the code C^⊥ ⊆ F_2^n. Then for any W ⊆ {0, 1, …, n}, we have
Coll_W(H) = E_{z∼U_{C^⊥}}[K_W(z)²] / Σ_{w∈W} \binom{n}{w}.
Proof.
The main tool we will use is Parseval’s Identity, which relates the evaluations of a function f: F_2^n → R to its Fourier coefficients by
E_{x∼U}[f(x)²] = Σ_{z∈F_2^n} f̂(z)².  (6)
We will first need to rewrite Coll_W(H) as the norm of some function. For this, we recall the definition of Coll_W(e) and note that
We are now ready to apply Parseval’s Identity. Letting in equation (6), we get
But by definition we have , so the last equation can be rewritten as
(7)
Define the level function 1_W: F_2^n → {0, 1} to be 1 if x satisfies |x| ∈ W, and 0 otherwise. We can then express the quantity of interest as
(8)
Combining expressions (7) and (8) and applying the definition of the Fourier transform, we get
∎
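The resulting identity can be checked numerically. The following sketch (our own toy example; the normalization matches the reading of Proposition 20 given above) verifies that the collision probability of the syndrome map on weight-t strings equals E_{z∼U_{C^⊥}}[(K_t(|z|)/\binom{n}{t})²]:

```python
from itertools import combinations, product
from math import comb

n, t = 6, 2
H = [(1, 1, 1, 1, 0, 0), (0, 0, 1, 1, 1, 1)]

def krawtchouk(t, x, n):
    return sum((-1) ** j * comb(x, j) * comb(n - x, t - j) for j in range(t + 1))

def syndrome(y):
    return tuple(sum(a * b for a, b in zip(row, y)) % 2 for row in H)

shell = [tuple(int(i in S) for i in range(n)) for S in combinations(range(n), t)]
lhs = sum(syndrome(e) == syndrome(x) for e in shell for x in shell) / len(shell) ** 2

dual = {tuple(sum(c * h for c, h in zip(cs, col)) % 2 for col in zip(*H))
        for cs in product([0, 1], repeat=len(H))}
rhs = sum(krawtchouk(t, sum(z), n) ** 2 for z in dual) / (len(dual) * comb(n, t) ** 2)
assert abs(lhs - rhs) < 1e-12
```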
We will now combine Theorem 19 and Proposition 20 to obtain Theorem 2, i.e. to obtain a bound on the decoding error probability in terms of the Fourier coefficients of the level function 1_W. We prove a generalized version of Theorem 2 below; to recover Theorem 2, set the window and list size appropriately. (One should think of the window width as being small compared to n in both the unique-decoding case and the list-decoding case, so that the error term is small.)
Theorem 21.
Fix , let be any Boolean matrix, and let for any integers and . Then
where the function is when , and otherwise.
One consequence of Theorem 21 is Corollary 3, which states that C is resilient to p-errors if the weight distribution of C^⊥ is close enough to the binomial distribution (see Appendix B for the proof). As another application of Theorem 21, we present the following bound on the probability of making a list-decoding error for a code C. We note that once again, our bound depends only on the weight distribution of the dual code C^⊥.
Proposition 22.
Fix any , and define for Let , and let for some integer . Then for all and all integers , we have that any matrix with entries in satisfies
Proof.
We will use Theorem 21 to bound the decoding error probability in terms of the Krawtchouk polynomials and the corresponding probability factors. Some of these terms will then be bounded using Proposition 7, and some will be bounded using Theorem 9. We proceed with the proof; applying Theorem 21 with our choice of parameters, we get
(9)
We want to bound the summation in the second term. We will start with the central terms, i.e. the weights close to n/2. For these we rely on Proposition 7, which states that \binom{n}{x} · K_t(x)² ≤ 2^n · \binom{n}{t} for all x. For any such weight, we then get
(10)
We then want to bound the contribution of the faraway terms to the summation in (9), i.e. we want to bound
(11)
We will want to apply Theorem 9 to every term in this sum. Note that by definition of the Krawtchouk polynomials, for any degree t we have |K_t| ≤ \binom{n}{t}.
We can then bound equation (11) by
Applying Theorem 9, we get
Combining this bound for the faraway terms with our bound (10) for the central terms of the summation, we bound the right-hand side of equation (9) by
∎
6 The Weight Distribution of Transitive Linear Codes
We will now prove Theorem 1. We note that the bound we get is essentially tight, since for any prime q the repetition code
{(a, a, …, a) : a ∈ F_q}
is transitive, has dimension 1, and has weight distribution
Pr_{z∼U_C}[|z| = n] = (q − 1)/q = q^{−(1−H_q(1))},
which matches the bound of Theorem 1 exactly (and similarly at w = 0). We recall and prove our Theorem 1 below:
Theorem.
Let C ⊆ F_q^n be a transitive linear code of dimension k, where q is prime. Then for any weight w ∈ {0, 1, …, n} we have
Pr_{z∼U_C}[|z| = w] ≤ q^{−k(1−H_q(w/n))},
where U_C is the uniform distribution over all codewords in C, |z| is the number of non-zero coordinates of z, and H_q is the q-ary entropy
H_q(x) = x log_q(q − 1) − x log_q(x) − (1 − x) log_q(1 − x).
Proof.
Let G be the generator matrix of C, and let C_w = {z ∈ C : |z| = w}. Without loss of generality, suppose that the first k columns of G span the column-space of G. Let Y be a uniformly random codeword in C_w. Now C is transitive, so for every pair of coordinates i, j the random variables Y_i and Y_j are identically distributed. By linearity of expectation and by definition of C_w, we thus have that for every i,
Pr[Y_i ≠ 0] = w/n.  (12)
But under condition (12), Y_i has maximal entropy when its remaining mass is uniformly distributed over the non-zero field elements, i.e. when Pr[Y_i = a] = (w/n)/(q − 1) for every a ≠ 0. The entropy of Y_i is thus bounded by
H(Y_i) ≤ H_q(w/n) · log_2 q.  (13)
We will now show that H(Y_j | Y_1, …, Y_k) = 0 for every j. To this end, fix some j. Recall that the columns G_1, …, G_k span the column-space of G, so we can write the column G_j as a linear combination of G_1, …, G_k. But any codeword can be expressed as xG for some x ∈ F_q^k, so the j-th coordinate of any codeword is that same linear combination of its first k coordinates.
The random variable Y_j is thus determined by (Y_1, …, Y_k), and so we indeed have
H(Y_j | Y_1, …, Y_k) = 0
for every j. Applying (13) and the chain rule for entropy then gives
log_2 |C_w| = H(Y) ≤ Σ_{i=1}^{k} H(Y_i) ≤ k · H_q(w/n) · log_2 q.
Now z is sampled uniformly from C, so Pr_{z∼U_C}[|z| = w] = |C_w| / q^k. We thus have
Pr_{z∼U_C}[|z| = w] ≤ q^{k·H_q(w/n)} / q^k = q^{−k(1−H_q(w/n))}.
∎
For Reed-Muller codes, we will abuse notation and denote by U_{m,d} the uniform distribution over all codewords in RM(m, d).
Theorem 23.
For any prime q and any m and d, the degree-d Reed-Muller code in m variables over the prime field F_q satisfies
Pr_{z∼U_{m,d}}[|z| = w] ≤ q^{−k(1−H_q(w/n))},
where k is the dimension of the code and n = q^m.
7 List Decoding for Transitive Codes
We now turn to proving Theorem 5. Recall that in section 5 we bounded the minimum size of the decoding list of a linear code in terms of the weight distribution of its dual code. But as we mentioned in the preliminaries, the dual code of a transitive code is also transitive. For any transitive linear code C, we can thus apply our Theorem 1 to the weight distribution of C^⊥ to get a bound on the size of the decoding list for C. We restate and prove our Theorem 5 below.
Theorem.
Fix any , , and . Then any transitive linear code of dimension can with high probability list-decode -errors using a list of size
Proof.
We will show that there exists a function mapping every received string to a list of codewords of the size advertised in the theorem statement, with the property that every codeword is recovered in this list with high probability.
Let denote the parity-check matrix of . By Lemma 17, it is sufficient to show that for any and any we have
(14)
We will thus prove (14). Recall that for , Proposition 22 yields the following bound on the left-hand side of (14):
(15)
where the sum splits into central terms (weights close to n/2) and faraway terms. Our goal will be to bound both the central terms and the faraway terms by using our bounds on the weight distribution of transitive codes. As we've seen in section 2, the dual code C^⊥ is a transitive linear code of dimension n − k. By Theorem 1, we thus have that for all w,
Pr_{z∼U_{C^⊥}}[|z| = w] ≤ 2^{−(n−k)(1−H(w/n))}.  (16)
For any , we then have by Fact 13 that
But for we have , so the right-hand side is maximized at . Applying Lemma 14, we get
(17)
We now turn to the faraway terms of equation (15). By equation (16), we have
Note that by definition of , any can be written as for some . By Lemma 14, we can then rewrite our previous expression as
But for any positive constant , the derivative of is , and the second derivative is always negative. Thus, the above expression achieves its maximum when . We then get
(18)
where in the last line we used a standard exponential approximation. We now use equations (17) and (18) to bound the central and faraway terms of (15) respectively. This gives
We have shown (14), and so we are done. ∎
8 List Decoding for Reed-Muller Codes
We will now turn to proving our list-decoding bounds for Reed-Muller codes. The dual code of the Reed-Muller code RM(m, d) is the code RM(m, m − d − 1), so we can combine Samorodnitsky's Theorem 6 with our Proposition 22. We restate and prove our Theorem 4 below.
Theorem.
Let and be such that . Then the Reed-Muller code of dimension can with high probability list-decode -errors using a list of size
Proof.
We will show that there exists a function mapping every received string to a list of codewords of the size advertised in the theorem statement, with the property that every codeword is recovered in this list with high probability.
Let denote the parity-check matrix of . By Lemma 17, it is sufficient to show that for any and any we have
(19)
We will thus prove (19). Recall that for , Proposition 22 yields the following bound on the left-hand side of (19):
(20)
where again the sum splits into central terms and faraway terms, as in (15). Our goal is to bound every term in these sums by using the weight distribution bounds given in Theorems 1 and 6. We bound the central terms in exactly the same way as in Theorem 5: by Theorem 23 we know that the weight distribution of the Reed-Muller code satisfies
so by Fact 13 we have
But , so by Lemma 14 we have
(21)
For the faraway terms, we use the weight bound from Theorem 6. By symmetry, we get that
(22)
Now the function
has first derivative
and second derivative
Thus achieves its maximum at and is decreasing over . Whenever , we have ; in that case the argument in equation (22) is maximized at , and we get
Combining this bound for the faraway terms with the bound (21) for the central terms, we bound the right-hand side of (20) by
We have shown (19), and so we are done. ∎
Acknowledgements
We thank Alexander Barg, Paul Beame, Noam Elkies, Amir Shpilka, Madhu Sudan and Amir Yehudayoff for useful discussions.
Appendix A Weight Bounds Comparisons
In this section, we will compare our Theorem 23 with previously known bounds on the weight distribution of Reed-Muller codes. We recall our Theorem 23 below. Note that throughout this section, U will denote the uniform distribution over all codewords of the Reed-Muller code under discussion, and |z| will denote the number of non-zero coordinates of z.
Theorem.
For any prime q and any m and d, the degree-d Reed-Muller code in m variables over the prime field F_q satisfies
Pr_{z∼U}[|z| = w] ≤ q^{−k(1−H_q(w/n))},
where k is the dimension of the code and n = q^m.
Reed-Muller codes over odd prime fields
We start with Reed-Muller codes over odd prime fields, for which the only preexisting weight bound we are aware of is the following result of [BGY20]:
Theorem 24 ([BGY20]).
For any , there are constants such that for any odd prime and for any integers such that , we have
This was a generalization of [BHL12], who proved the same result for Reed-Muller codes over F_2. Theorem 24 is very strong for small degrees, but gets weaker as the degree increases. When the degree is linear in m, Theorem 24 can only give a nontrivial bound on normalized weights that are at least a constant away from the mean. Our Theorem 23 gives nontrivial bounds for all normalized weights, for all degrees.
Reed-Muller codes over
We now turn to Reed-Muller codes over F_2, for which more results are known. The same bound as Theorem 24 was proven over F_2 by [BHL12]. For comparison with our Theorem 23, see the discussion above.
In the constant-rate regime, the strongest known bounds for constant normalized weights are the following two results of [Sam20]:
Theorem 25 ([Sam20]).
Let for some . Then for any we have
This result is strong when the normalized weight is away from 1/2. For weights close to n/2, the following bound is stronger.
Theorem 26 ([Sam20]).
Let for some , and define . Then for any ,
We note that the combination of Theorems 25 and 26 is stronger than our Theorem 23 whenever both the rate of the code and the normalized weight of the codeword are constant.
However, when the normalized weight is subconstant or when the degree is far from m/2, the error term becomes too large for Theorems 25 and 26 to give a strong bound. An approach that has been fairly successful in these two regimes (subconstant rate or subconstant weight) is the line of work of [KLP12, ASW15, SS20]. To our knowledge, the strongest results for these regimes are due to [SS20]. We start with their bound for lower weights:
Theorem 27 ([SS20]).
For any integers , we have
We claim that for every , there is some weight threshold for which our Theorem 23 is stronger than Theorem 27 for all weights larger than . One way to see this is to note that our Theorem 23 satisfies
while the expression in Theorem 27 satisfies
Thus our Theorem 23 is stronger than Theorem 27 whenever , i.e. whenever
For any , this gives a nontrivial range.
This concludes our comparison of Theorem 23 with Theorem 27, which was the bound of [SS20] for lower weights. We now turn to their bounds for larger weights.
Theorem 28 ([SS20]).
Let and let be a parameter (which may be constant or depend on ) such that Then
where .
This bound holds when the degree is smaller than . For arbitrary degree, [SS20] gives the following:
Theorem 29 ([SS20]).
For any integers and any , we have
We will start by comparing our Theorem 23 with Theorem 29. Applying Lemma 14, we get from Theorem 23 that
Thus our Theorem 23 is strictly stronger than Theorem 29 for all . We will now compare our Theorem 23 with Theorem 28. Applying Lemma 14, we get from Theorem 23 that
It follows that our Theorem 23 is stronger than Theorem 28 whenever , i.e. whenever
But , and as . Thus there exists some constant such that our Theorem 23 is stronger than Theorem 28 whenever . In private correspondence with Amir Shpilka and Ori Sberlo, we learned that can be computed to be .
Appendix B Proof of Corollary 3
Recall that for any ε > 0 we defined the set of weights W_ε = {w : |w − n/2| ≤ εn},
and that for any code C we denote by U_{C^⊥} the uniform distribution over the dual code C^⊥. We now restate and prove our Corollary 3.
Corollary.
Let be a linear code, and let be arbitrary. Suppose that for every we have
and suppose that
Then is resilient to -errors.
Proof.
From Theorem 2, we know that there exists some decoder D such that for every codeword,
(23)
where K_t denotes the Krawtchouk polynomial of degree t. Let ε be as in the statement of the corollary, and consider the set of weights W_ε defined above.
We will start by bounding the central terms in equation (23). Applying Proposition 7 and the first condition in our theorem statement, we immediately get that for any ,
(24)
We now turn to the faraway terms. For these we note that by definition of the Krawtchouk polynomials, for any integer t we have |K_t| ≤ \binom{n}{t}.
We can then bound the faraway terms of equation (23) by
Applying the second condition in our theorem statement in combination with Fact 13 and the subadditivity of entropy, we get that
Combining this bound for the faraway terms with our bound (24) for the central terms, we bound equation (23) by
∎
Appendix C Lower Bounds on List Decoding
Claim 30.
Let be arbitrary, and consider any . Suppose a code and a decoder satisfy
for the -noisy distribution and the uniform distribution on . Then we must have
Proof.
We will first show that in order for the decoder to succeed with high probability, there must be many codewords for which
Intuitively this is because the sphere of radius around any codeword contains points (and for any sent codeword , with high probability the received message will satisfy ). We will then simply double-count the number of pairs for which . On the one hand, there are such pairs, since every received message is mapped to codewords; on the other hand, there must be at least about pairs, since as we’ve just argued most codewords in need to be matched to at least points. It follows that we must have
Formally, we first note that the theorem condition implies that at least codewords must satisfy
(25)
Fix any such codeword c. Now from Chernoff's bound (i.e. Lemma 16), we have for large enough n that
In order for to satisfy with probability at least , there must then be a subset satisfying both
(26)
and
(27)
But every element satisfies , so every satisfies
(28)
Equations (27) and (28) imply that any codeword c that can be list-decoded by D with the required probability must satisfy the corresponding bound. It then follows from (26) that any such c must satisfy
By double counting, we get
The result then follows from rearranging terms. ∎
Appendix D Other Proofs for Sections 1 and 2
D.1 On Known List-Decoding Bounds for Reed-Muller Codes
We recall the known list-decoding bound for Reed-Muller codes (see equation (3) in section 1):
We claim this bound never achieves the information-theoretic optimum of equation (1).
Claim 31.
For any and any , we have
Proof.
We will show that for any and we have
i.e. that
(29)
We first fix some and compute the maximizing . Note that
and
so is minimized at and increasing over . We thus have
(30)
We deal with each case separately. For the case , we want to show that
The first derivative is
and the second derivative is
Thus the function is maximized at and monotone on each side of . In particular, since we know that over the interval the function achieves its minimum at either or . But , so we indeed have that
for all . This deals with the first case of (30). For the second case of (30), we want to show that for all we have
But
is decreasing in and , so we indeed have for all ∎
D.2 Duals of Transitive Codes - Proof of Claim 10
Claim.
The dual code of a transitive code is transitive.
Proof.
Let i, j be arbitrary coordinates. Since C is transitive, we know there exists a permutation π such that π(i) = j and such that applying π to the coordinates of any codeword of C yields a codeword of C. Clearly π satisfies property (i) of Definition 1, and we claim that it also satisfies π(z) ∈ C^⊥ for all z ∈ C^⊥, where π(z) denotes the permuted vector. For this we note that since π permutes the codewords of C, every c ∈ C can be written as c = π(c′) for some c′ ∈ C. Every z ∈ C^⊥ thus satisfies
⟨π(z), c⟩ = ⟨π(z), π(c′)⟩ = ⟨z, c′⟩ = 0.
We thus have π(z) ∈ C^⊥, as desired.
∎
D.3 Basic Properties of Reed-Muller Codes - Proof of Facts 11 and 12
Fact.
Let G_{m,d} be the generator matrix of the Reed-Muller code RM(m, d). The columns of G_{m,d} that correspond to the points x ∈ F_2^m with |x| ≤ d are linearly independent.
Proof.
Let B be the submatrix of G_{m,d} whose columns correspond to the points x with |x| ≤ d. It suffices to show that when we order the columns of B in increasing order of weight |x| (breaking ties lexicographically), every column is linearly independent from the preceding ones. But this is clearly the case: for the column indexed by a point v with support S (so |S| ≤ d), the row of the monomial Π_{i∈S} x_i has a 1 in this column and a 0 in all preceding columns, since no preceding point contains S in its support. ∎
Fact.
For all and all , the Reed-Muller code is transitive.
Proof.
Recall that we view each coordinate as a point x ∈ F_2^m, and that every codeword in RM(m, d) is the evaluation vector of a polynomial of degree at most d in m variables.
Now fix two points u, v ∈ F_2^m. We want to show that there is a permutation π of F_2^m such that
-
(i)
π(u) = v, and
-
(ii)
If z ∈ RM(m, d) then π(z) ∈ RM(m, d), where π(z) denotes the vector z with coordinates permuted by π.
To this end, we choose the translation permutation π(x) = x + u + v. Then:
-
(i)
π(u) = u + u + v = v.
-
(ii)
If z is a codeword, it can be written as z = (P(x))_{x∈F_2^m} for some polynomial P of degree at most d. But then the polynomial
Q(x) = P(x + u + v)
satisfies deg(Q) ≤ d, so its evaluation vector must be a codeword. Then since π(z) = (Q(x))_{x∈F_2^m} by definition, we have that π(z) ∈ RM(m, d).
∎
D.4 A version of Pinsker’s inequality - Proof of Lemma 14
Lemma.
For any δ ∈ [−1/2, 1/2], we have
2δ² log_2 e ≤ 1 − H(1/2 + δ) ≤ 4δ².
Proof.
Write f(δ) = 1 − H(1/2 + δ). Expanding the logarithms around 1/2 gives the series
f(δ) = log_2 e · Σ_{k≥1} (2δ)^{2k} / (2k(2k − 1)).
Keeping only the first term of the series gives f(δ) ≥ 2δ² log_2 e. For the upper bound, note that f(δ)/δ² is non-decreasing in |δ| (each term of the series is), and equals 4 at δ = ±1/2; hence f(δ) ≤ 4δ². ∎
References
- [AHN21] Emmanuel Abbe, Jan Hazla, and Ido Nachum. Almost-reed-muller codes achieve constant rates for random errors. IEEE Trans. Inf. Theory, 67(12):8034–8050, 2021.
- [Ari09] Erdal Arikan. Channel polarization: a method for constructing capacity-achieving codes for symmetric binary-input memoryless channels. IEEE Trans. Inf. Theory, 55(7):3051–3073, 2009.
- [ASW15] Emmanuel Abbe, Amir Shpilka, and Avi Wigderson. Reed-muller codes for random erasures and errors. IEEE Trans. Inf. Theory, 61(10):5229–5252, 2015.
- [ASY21] Emmanuel Abbe, Amir Shpilka, and Min Ye. Reed-muller codes: Theory and algorithms. IEEE Trans. Inf. Theory, 67(6):3251–3277, 2021.
- [AY19] Emmanuel Abbe and Min Ye. Reed-muller codes polarize. In David Zuckerman, editor, 60th IEEE Annual Symposium on Foundations of Computer Science, FOCS 2019, Baltimore, Maryland, USA, November 9-12, 2019, pages 273–286. IEEE Computer Society, 2019.
- [Bar21] Alexander Barg. Stolarsky’s invariance principle for finite metric spaces. Mathematika, 67(1):158–186, 2021.
- [BDM18] Dmitriy Bilyk, Feng Dai, and Ryan Matzke. Stolarsky principle and energy optimization on the sphere. Constructive Approximation, 48, 08 2018.
- [BGY20] Paul Beame, Shayan Oveis Gharan, and Xin Yang. On the bias of reed-muller codes over odd prime fields. SIAM J. Discret. Math., 34(2):1232–1247, 2020.
- [BHL12] Ido Ben-Eliezer, Rani Hod, and Shachar Lovett. Random low-degree polynomials are hard to approximate. Comput. Complex., 21(1):63–81, 2012.
- [BK97] Jean Bourgain and Gil Kalai. Influences of variables and threshold intervals under group symmetries. Geometric & Functional Analysis GAFA, 7(3):438–461, 1997.
- [BLM13] Stéphane Boucheron, Gábor Lugosi, and Pascal Massart. Concentration Inequalities - A Nonasymptotic Theory of Independence. Oxford University Press, 2013.
- [dW08] Ronald de Wolf. A brief introduction to fourier analysis on the boolean cube. Theory Comput., 1:1–20, 2008.
- [Eli57] Peter Elias. List decoding for noisy channels. Wescon Convention Record, Part 2, pages 94–104, 1957.
- [Gal62] Robert G. Gallager. Low-density parity-check codes. IRE Trans. Inf. Theory, 8(1):21–28, 1962.
- [GHK11] Venkatesan Guruswami, Johan Håstad, and Swastik Kopparty. On the list-decodability of random linear codes. IEEE Trans. Inf. Theory, 57(2):718–725, 2011.
- [GHSZ02] Venkatesan Guruswami, Johan Håstad, Madhu Sudan, and David Zuckerman. Combinatorial bounds for list decoding. IEEE Trans. Inf. Theory, 48(5):1021–1034, 2002.
- [GR08] Venkatesan Guruswami and Atri Rudra. Explicit codes achieving list decoding capacity: Error-correction with optimal redundancy. IEEE Trans. Inf. Theory, 54(1):135–150, 2008.
- [GX12] Venkatesan Guruswami and Chaoping Xing. Folded codes from function field towers and improved optimal rate list decoding. In Howard J. Karloff and Toniann Pitassi, editors, Proceedings of the 44th Symposium on Theory of Computing Conference, STOC 2012, New York, NY, USA, May 19 - 22, 2012, pages 339–350. ACM, 2012.
- [HRW17] Brett Hemenway, Noga Ron-Zewi, and Mary Wootters. Local list recovery of high-rate tensor codes & applications. In Chris Umans, editor, 58th IEEE Annual Symposium on Foundations of Computer Science, FOCS 2017, Berkeley, CA, USA, October 15-17, 2017, pages 204–215. IEEE Computer Society, 2017.
- [HSS21] Jan Hazla, Alex Samorodnitsky, and Ori Sberlo. On codes decoding a constant fraction of errors on the BSC. In Samir Khuller and Virginia Vassilevska Williams, editors, STOC ’21: 53rd Annual ACM SIGACT Symposium on Theory of Computing, Virtual Event, Italy, June 21-25, 2021, pages 1479–1488. ACM, 2021.
- [IS98] Mourad E.H Ismail and Plamen Simeonov. Strong asymptotics for krawtchouk polynomials. Journal of Computational and Applied Mathematics, 100(2):121–144, 1998.
- [KKL88] Jeff Kahn, Gil Kalai, and Nathan Linial. The influence of variables on boolean functions. In 29th Annual Symposium on Foundations of Computer Science, White Plains, New York, USA, 24-26 October 1988, pages 68–80. IEEE Computer Society, 1988.
- [KKM+16] Shrinivas Kudekar, Santhosh Kumar, Marco Mondelli, Henry D. Pfister, Eren Sasoglu, and Rüdiger L. Urbanke. Reed-muller codes achieve capacity on erasure channels. In Daniel Wichs and Yishay Mansour, editors, Proceedings of the 48th Annual ACM SIGACT Symposium on Theory of Computing, STOC 2016, Cambridge, MA, USA, June 18-21, 2016, pages 658–669. ACM, 2016.
- [KL95] Gil Kalai and Nathan Linial. On the distance distribution of codes. IEEE Trans. Inf. Theory, 41(5):1467–1472, 1995.
- [KL99] Ilia Krasikov and Simon Litsyn. Survey of binary krawtchouk polynomials. In Alexander Barg and Simon Litsyn, editors, Codes and Association Schemes, Proceedings of a DIMACS Workshop, Piscataway, New Jersey, USA, November 9-12, 1999, volume 56 of DIMACS Series in Discrete Mathematics and Theoretical Computer Science, pages 199–211. DIMACS/AMS, 1999.
- [KLP12] Tali Kaufman, Shachar Lovett, and Ely Porat. Weight distribution and list-decoding size of reed-muller codes. IEEE Trans. Inf. Theory, 58(5):2689–2696, 2012.
- [Kop15] Swastik Kopparty. List-decoding multiplicity codes. Theory Comput., 11:149–182, 2015.
- [KRU13] Shrinivas Kudekar, Tom Richardson, and Rüdiger L. Urbanke. Spatially coupled ensembles universally achieve capacity under belief propagation. IEEE Trans. Inf. Theory, 59(12):7761–7813, 2013.
- [LMS+97] Michael Luby, Michael Mitzenmacher, Mohammad Amin Shokrollahi, Daniel A. Spielman, and Volker Stemann. Practical loss-resilient codes. In Frank Thomson Leighton and Peter W. Shor, editors, Proceedings of the Twenty-Ninth Annual ACM Symposium on the Theory of Computing, El Paso, Texas, USA, May 4-6, 1997, pages 150–159. ACM, 1997.
- [LW18] Ray Li and Mary Wootters. Improved list-decodability of random linear binary codes. In Eric Blais, Klaus Jansen, José D. P. Rolim, and David Steurer, editors, Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, APPROX/RANDOM 2018, August 20-22, 2018 - Princeton, NJ, USA, volume 116 of LIPIcs, pages 50:1–50:19. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2018.
- [MRR+20] Jonathan Mosheiff, Nicolas Resch, Noga Ron-Zewi, Shashwat Silas, and Mary Wootters. LDPC codes achieve list decoding capacity. In 61st IEEE Annual Symposium on Foundations of Computer Science, FOCS 2020, Durham, NC, USA, November 16-19, 2020, pages 458–469. IEEE, 2020.
- [MS77] Florence MacWilliams and Neil Sloane. The theory of error correcting codes. North-Holland Publishing Company, 1977.
- [Pol19] Yury Polyanskiy. Hypercontractivity of spherical averages in hamming space. SIAM J. Discret. Math., 33(2):731–754, 2019.
- [RP21] Galen Reeves and Henry D. Pfister. Reed-muller codes achieve capacity on BMS channels. CoRR, abs/2110.14631, 2021.
- [Sam20] Alex Samorodnitsky. An upper bound on norms of noisy functions. IEEE Trans. Inf. Theory, 66(2):742–748, 2020.
- [Sha48] Claude E. Shannon. A mathematical theory of communication. Bell Syst. Tech. J., 27(3):379–423, 1948.
- [Skr19] M. Skriganov. Point distributions in two-point homogeneous spaces. Mathematika, 65:557–587, 03 2019.
- [SS20] Ori Sberlo and Amir Shpilka. On the performance of reed-muller codes with respect to random errors and erasures. In Shuchi Chawla, editor, Proceedings of the 2020 ACM-SIAM Symposium on Discrete Algorithms, SODA 2020, Salt Lake City, UT, USA, January 5-8, 2020, pages 1357–1376. SIAM, 2020.
- [Tal94] Michel Talagrand. On russo’s approximate zero-one law. The Annals of Probability, 22(3):1576–1587, 1994.