
A Criterion for Decoding on the BSC

Anup Rao
School of Computer Science, University of Washington
[email protected]
Supported by NSF CCF-2131899.
   Oscar Sprumont
School of Computer Science, University of Washington
[email protected]
Supported by NSERC PGSD3-545945-2020 and NSF CCF-2131899.
Abstract

We present an approach to showing that a linear code is resilient to random errors. We use this approach to obtain decoding results for both transitive codes and Reed-Muller codes. We give three kinds of results about linear codes in general, and transitive linear codes in particular.

  1.

    We give a tight bound on the weight distribution of every transitive linear code $C\subseteq\mathbb{F}_{2}^{N}$: $\Pr_{c\in C}[|c|=\alpha N]\leq 2^{-(1-h(\alpha))\cdot\mathsf{dim}(C)}$.

  2.

    We give a criterion that certifies that a linear code $C$ can be decoded on the binary symmetric channel. Let $K_{s}(x)$ denote the Krawtchouk polynomial of degree $s$, and let $C^{\perp}$ denote the dual code of $C$. We show that bounds on $\mathbb{E}_{c\in C^{\perp}}[K_{\epsilon N}(|c|)^{2}]$ imply that $C$ recovers from errors on the binary symmetric channel with parameter $\epsilon$. Weaker bounds can be used to obtain list-decoding results using similar methods. One consequence of our criterion is that whenever the weight distribution of $C^{\perp}$ is sufficiently close to the binomial distribution in some interval around $\frac{N}{2}$, $C$ is resilient to $\epsilon$-errors.

  3.

    We combine known estimates for the Krawtchouk polynomials with our weight bound for transitive codes, and with known weight bounds for Reed-Muller codes, to obtain list-decoding results for both these families of codes. In some regimes, our bounds for Reed-Muller codes achieve the information-theoretic optimal trade-off between rate and list size.

1 Introduction

In his seminal 1948 paper, Shannon laid the foundations of coding theory and introduced the concept of channel capacity, the maximal rate at which information can be transmitted over a communication channel [Sha48]. The two channels that have received the most attention are the Binary Symmetric Channel (BSC), where each bit is independently flipped with some probability $\epsilon$, and the Binary Erasure Channel (BEC), where each bit is independently replaced by an erasure symbol with some probability $\epsilon$. Shannon's work initiated a decades-long search for explicit codes that can achieve high rates over a noisy channel.

Explicit constructions of codes often have a lot of symmetry. In particular, many known constructions of codes are transitive. The group of symmetries of a code is the subgroup $G$ of permutations $\pi:\{1,\dotsc,N\}\rightarrow\{1,\dotsc,N\}$ such that permuting the coordinates of each of the codewords using $\pi$ does not change the code. A code is transitive if for every two coordinates $i,j$, there is a permutation $\pi\in G$ with $\pi(i)=j$. A code is $2$-transitive if for every $i\neq k$, $j\neq\ell$ there is a permutation $\pi\in G$ with $\pi(i)=j,\pi(k)=\ell$. Many known constructions of codes are cyclic, and every cyclic code is transitive. Reed-Solomon codes, BCH codes and Reed-Muller codes are all transitive.

The binary code that is arguably the cleanest explicit candidate for achieving capacity over both the BSC and the BEC is the family of Reed-Muller codes. The codewords of the Reed-Muller code $\mathsf{RM}(n,d)$ are the evaluation vectors (over all points in $\mathbb{F}_{2}^{n}$) of all multivariate polynomials of degree at most $d$ in $n$ variables.

Reed-Muller codes enjoy strong symmetry beyond transitivity: their symmetry group is the group of invertible affine transformations over $\mathbb{F}_{2}^{n}$. The use of fundamental results from Fourier analysis about the influences of symmetric boolean functions [KKL88, Tal94, BK97] has led to a very successful line of work, with [KKM+16] showing that Reed-Muller codes achieve capacity over the BEC and [HSS21] showing that they are polynomially close to achieving capacity over the BSC. In fact, [KKM+16] show that if a linear code $C\subseteq\mathbb{F}_{2}^{N}$ has a $2$-transitive symmetry group $G$ such that for every $S\subseteq\{1,\dotsc,N\}$ with $|S|=(s\log N)^{0.99}$ we have $|\{\pi(S):\pi\in G\}|\geq N^{s}$, then $C$ can tolerate an $\epsilon-O(1/s)$ fraction of random erasures. Given these results, it is natural to investigate the types of symmetry that lead to good codes. In this paper, we prove three kinds of results relevant to understanding the error resilience of general linear codes, transitive linear codes, and Reed-Muller codes.

  1.

    We give a clean and tight weight distribution bound for every transitive linear code. We show that for any such code $C\subseteq\mathbb{F}_{2}^{N}$,

    $\Pr_{c\in C}[|c|=\alpha N]\leq 2^{-(1-h(\alpha))\cdot\mathsf{dim}(C)}.$

    This bound is proved by combining transitivity with the subadditivity of entropy. In some regimes, it improves on all previously known weight bounds for Reed-Muller codes (see Appendix A).

  2.

    We give a new criterion for certifying that a code can be decoded over the BSC. Let

    $K_{t}(x)=\sum_{j=0}^{t}(-1)^{j}\binom{x}{j}\binom{N-x}{t-j}$

    denote the Krawtchouk polynomial of degree $t$, and let $C^{\perp}$ denote the dual code of $C$. In spirit, our criterion says that any code $C$ satisfying

    $\mathbb{E}_{c\in C^{\perp}}[K_{\epsilon N}(|c|)^{2}]<(1+o(N^{-1}))\cdot\binom{N}{\epsilon N}$

    can be uniquely decoded on the BSC with high probability. Our actual result is a little more technically involved (see Theorem 2). This criterion implies that any code whose dual codewords are distributed sufficiently close to the binomial distribution must be resilient to $\epsilon$-errors (see Corollary 3). Moreover, if the above expectation is bounded by $(1+o(kN^{-1}))\cdot\binom{N}{\epsilon N}$, then we prove that the code can be list-decoded with a list size of about $k$.

  3.

    Finally, we combine known estimates for the Krawtchouk polynomials with our weight bound for transitive codes, and with known weight bounds for Reed-Muller codes, to obtain list-decoding results for both families of codes. In some regimes, our bounds for Reed-Muller codes achieve the information-theoretic optimal trade-off between rate and list size.

Next, we discuss our results more rigorously. We note that throughout this section, for any set $X$ we denote the uniform distribution over $X$ by $\mathcal{D}(X)$.

I. Weight Bounds for Transitive Codes
We bound the weight distribution of any transitive linear code over any prime field. See section 6 for the proof.

Theorem 1.

Let $C\subseteq\mathbb{F}_{q}^{N}$ be a transitive linear code. Then for any $\alpha\in(0,1-1/q)$ we have

$\Pr_{c\sim\mathcal{D}(C)}\big[|c|=\alpha N\big]\leq q^{-(1-h_{q}(\alpha))\dim C},$

where $\mathcal{D}(C)$ is the uniform distribution over all codewords in $C$, $|c|$ is the number of non-zero coordinates of $c$, and $h_{q}$ is the $q$-ary entropy

$h_{q}(\alpha)=(1-\alpha)\log_{q}\frac{1}{1-\alpha}+\alpha\log_{q}\frac{q-1}{\alpha}.$

Note that $h_{2}(\alpha)$ is the binary entropy function. We note that in some regimes (e.g. when the degree satisfies $0.38n<d<0.499n$ and $\alpha$ is larger than some constant depending on $d/n$), the bound above improves on all previously proven weight distribution bounds for Reed-Muller codes, even though the only feature of the code that we use is transitivity. See Appendix A for some details.
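To make the bound concrete, here is a minimal numeric sketch of the quantities in Theorem 1 (the function names are ours, chosen for illustration):

```python
import math

def hq(alpha, q=2):
    # q-ary entropy h_q(alpha), valid for 0 < alpha < 1; h_2 = h is the
    # binary entropy function.
    return (1 - alpha) * math.log(1 / (1 - alpha), q) \
         + alpha * math.log((q - 1) / alpha, q)

def theorem1_bound(alpha, dim_C, q=2):
    # Theorem 1: Pr_{c ~ D(C)}[|c| = alpha * N] <= q^{-(1 - h_q(alpha)) * dim C},
    # for a transitive linear code C in F_q^N and alpha in (0, 1 - 1/q).
    return q ** (-(1 - hq(alpha, q)) * dim_C)

# A binary transitive code of dimension 100: codewords of relative weight 1/4
# occur with probability at most 2^{-(1 - h(1/4)) * 100} ~= 2^{-18.9}.
print(theorem1_bound(0.25, 100))
```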

II. A Criterion for Decoding on the BSC
We develop a new approach for proving decoding results over the BSC, i.e. the communication channel whose errors $z\in\mathbb{F}_{2}^{N}$ are sampled from the $\epsilon$-noisy distribution

$P_{\epsilon}(z)=\epsilon^{|z|}(1-\epsilon)^{N-|z|}$

for some $\epsilon\in(0,1)$. Our approach is based on Fourier analysis, although unlike [KKM+16] and [HSS21], the ideas we use do not rely on bounds on influences. We obtain the following result (recall that $\mathcal{D}(C^{\perp})$ denotes the uniform distribution over $C^{\perp}$):

Theorem 2.

Let $C\subseteq\mathbb{F}_{2}^{N}$ be any linear code, and denote by $C^{\perp}\subseteq\mathbb{F}_{2}^{N}$ its dual code. Then for any $\epsilon\in(0,\frac{1}{2})$, there exists a decoding function $d:\mathbb{F}_{2}^{N}\rightarrow\mathbb{F}_{2}^{N}$ such that for all $c\in C$ we have

$\Pr_{\rho\sim P_{\epsilon}}[d(c+\rho)\neq c]\leq 2e^{-\frac{\sqrt{N}}{3\epsilon}}+N\mathop{\max}_{\substack{S\subseteq\{\epsilon N\pm N^{3/4}\}\\ 1\leq|S|\leq 2}}\Big\{\frac{1}{\binom{N}{S}}\mathbb{E}_{c\sim\mathcal{D}(C^{\perp})}\big[K_{S}(|c|)^{2}\big]-1\Big\},$

where $\binom{N}{S}=\sum_{j\in S}\binom{N}{j}$, and where $K_{S}(x)=\sum_{j\in S}K_{j}(x)$ for $K_{j}$ the Krawtchouk polynomial of degree $j$.

We will now consider one interesting consequence of Theorem 2. Let $\epsilon\in(0,\frac{1}{2})$ be arbitrary, and define

$A_{\epsilon}=\{\alpha N:h(\alpha)>1-h(\epsilon)-N^{-1/5}\}.$

Our next corollary states that whenever the dual codewords of $C$ are distributed sufficiently close to the binomial distribution for all weights in $A_{\epsilon}$, the code $C$ must be resilient to $\epsilon$-errors. See Appendix B for the proof.

Corollary 3.

Let $C\subseteq\mathbb{F}_{2}^{N}$ be a linear code, and let $\epsilon\in(0,\frac{1}{2})$ be arbitrary. Suppose that for every $j\in A_{\epsilon}$ we have

$\Pr_{y\sim\mathcal{D}(C^{\perp})}\big[|y|=j\big]\leq\big(1+o(N^{-1})\big)\frac{\binom{N}{j}}{2^{N}},$

and suppose that

$\Pr_{y\sim\mathcal{D}(C^{\perp})}\big[|y|\notin A_{\epsilon}\big]\leq 2^{N^{\frac{3}{4}}}\cdot\frac{\sum_{i\notin A_{\epsilon}}\binom{N}{i}}{2^{N}}.$

Then $C$ is resilient to $\epsilon$-errors.

As a proof of concept, we note that a uniformly random linear code of dimension $(1-h(\epsilon))N+\sqrt{N}$ satisfies all these conditions simultaneously with high probability.

III. List Decoding Results
Using a generalized version of Theorem 2 (namely, Theorem 21 in section 5), we obtain list decoding bounds for both transitive codes and Reed-Muller codes. We start with our bound for Reed-Muller codes.

Theorem 4.

Let $\epsilon\in(0,\frac{1}{2})$ and $\gamma\in(0,1)$ be such that $1-\gamma\geq 2^{-\frac{2\epsilon}{(\ln 2)^{2}}}$. Then the Reed-Muller code $\mathsf{RM}(n,d)$ of dimension $\binom{n}{\leq d}=(1-\gamma)N$ can with high probability list-decode $\epsilon$-errors using a list $T$ of size

$|T|=2^{(h(\epsilon)-\gamma)N+o(N)}+2^{4\epsilon N+o(N)}.$

Although our lists have exponential size, for small $\epsilon$ the list size is non-trivial, in the sense that it is much smaller than the number of noise vectors (which is about $\binom{N}{\epsilon N}\approx 2^{h(\epsilon)N}$) and the number of codewords in the code (which is $2^{\dim C}$). In fact, a standard calculation (see Appendix C) shows that any code $C\subseteq\mathbb{F}_{2}^{N}$ of dimension $(1-\gamma)N$ that can successfully list-decode errors of probability $\epsilon$ with list size $|T|$ must satisfy

$|T|\gtrsim 2^{(h(\epsilon)-\gamma)N}.$ (1)

Our bound in Theorem 4 shows that Reed-Muller codes achieve these optimal parameters, at least in some regimes (e.g. when $\binom{n}{\leq d}\geq(1-\frac{1.99\epsilon}{\ln 2})N$ and $\epsilon$ is small enough). We now turn to our list-decoding bound for transitive codes.

Theorem 5.

Fix any $\epsilon\in(0,\frac{1}{2})$, $\eta\in(0,1)$, and $N>\left(\frac{5}{\epsilon}\right)^{20}$. Then any transitive linear code $C\subseteq\mathbb{F}_{2}^{N}$ of dimension $\dim C=\eta N$ can with high probability list-decode $\epsilon$-errors using a list $T$ of size

$|T|=2^{\epsilon N\log(\frac{2}{1-\eta})+o(N)}+2^{4\epsilon N}.$

As an explicit example of the types of bounds one gets from Theorem 5, any transitive linear code of dimension $\dim C=(1-\frac{4\epsilon}{e})N$ can with high probability list-decode $\epsilon$-errors using a list $T$ of size

$|T|=2^{(h(\epsilon)-\epsilon+\frac{\epsilon^{2}}{\ln 2})N+o(N)}+2^{4\epsilon N}.$

For comparison, recall that our lower bound (1) states that any code $C$ of dimension $(1-\frac{4\epsilon}{e})N$ requires a list size of at least about $2^{(h(\epsilon)-\frac{4\epsilon}{e})N}$.

1.1 Techniques

Our weight distribution bound for transitive linear codes (Theorem 1) is proven by showing that the entropy of a uniformly random codeword of weight $\alpha N$ is small. To do this, we analyze the entropy of the coordinates corresponding to linearly independent columns of the generator matrix. Transitivity implies that every coordinate in the code has the same entropy, and subadditivity of entropy can then be used to bound the entropy of the entire distribution.

To obtain our decoding criterion, we make use of a connection between the probability of a decoding error and the $\ell_{2}$ norm of the coset distribution of the code. To explain the intuition, let us start by assuming that exactly $\epsilon N$ of the coordinates in the codeword are flipped, although our results actually hold over the BSC as well. Let $z$ be the vector in $\mathbb{F}_{2}^{N}$ that represents the errors introduced by the channel, and let $H$ be the parity check matrix of the code. Then by standard arguments, if $z$ can be recovered from $Hz^{\intercal}$, the codeword can be decoded. In the case where $z$ is uniformly distributed on vectors of weight $\epsilon N$, this amounts to showing that with high probability, the coset of $z$ does not contain any other string of weight $\epsilon N$ (in other words, there is no $w\neq z$ of weight $|w|=\epsilon N$ such that $Hz^{\intercal}=Hw^{\intercal}$). This can be understood by computing the norm

$\|f\|_{2}^{2}=\frac{1}{2^{N}}\sum_{y}f(y)^{2}=\frac{1}{2^{N}}\sum_{y}\Pr[Hz^{\intercal}=y^{\intercal}]^{2},$

where $f(y)=\Pr[Hz^{\intercal}=y^{\intercal}]$. The norm above is always at least $2^{-N}\binom{N}{\epsilon N}^{-1}$, and if it is close to $2^{-N}\binom{N}{\epsilon N}^{-1}$ then the code can be decoded with high probability. If $\|f\|_{2}^{2}$ is larger than $2^{-N}\binom{N}{\epsilon N}^{-1}$, then we show that the code can be list-decoded with high probability, where the size of the list is related to $2^{N}\binom{N}{\epsilon N}\|f\|_{2}^{2}$.

Thus, to understand decoding, we need to understand $\|f\|_{2}^{2}$. Using Fourier analysis, we express this quantity as

$\|f\|_{2}^{2}=\sum_{j=0}^{N}\Pr[|c^{\perp}|=j]\cdot K_{\epsilon N}(j)^{2},$ (2)

where $c^{\perp}$ is a uniformly random codeword in the dual code and $K_{\epsilon N}$ is the Krawtchouk polynomial of degree $\epsilon N$. We note that such relations for the coset weight distribution have been used to understand the discrepancy of subsets of the sphere, as well as subsets of other homogeneous spaces. In particular, (2) was proven in a slightly different form in [Bar21] (see Theorem 2.1 and Lemma 4.1), whereas over $\mathbb{R}^{N}$ results of this type had previously been derived in [BDM18, Skr19].

Using estimates for the magnitude of Krawtchouk polynomials and bounds for the weight distribution of the dual code $C^{\perp}$, one can thus bound the norm $\|f\|_{2}^{2}$ in the set-up where the error string $z$ is a random vector of weight exactly $\epsilon N$. Using essentially the same techniques, one can also bound the norm $\|f\|_{2}^{2}$ when the error string $z$ is a random vector of weight $\approx\epsilon N$, i.e. $z$ is taken uniformly at random from the set $S=\{x\in\mathbb{F}_{2}^{N}:|x|=\epsilon N\pm N^{3/4}\}$.

Our next step is then to show that the $\ell_{2}$ norm corresponding to the $\epsilon$-noisy distribution is very similar to the $\ell_{2}$ norm corresponding to the uniform distribution over $S$. Intuitively, this is because $S$ only contains a very small range of weights, so the $\epsilon$-noisy distribution and the uniform distribution must behave very similarly over strings of weight in $S$. It then follows that their corresponding $\ell_{2}$ norms must be similar as well.

Our decoding criteria (Theorem 2, Corollary 3) are thus obtained by bounding the norm $\|f\|_{2}^{2}$ using estimates for Krawtchouk polynomials and for the weight distribution of the dual code $C^{\perp}$. Our list-decoding results (Theorems 4 and 5) then follow from our weight bound for transitive codes (Theorem 1) and from a weight bound of Samorodnitsky for Reed-Muller codes (Theorem 6).

1.2 Related Work

It has been shown that LDPC codes achieve capacity over Binary Memoryless Symmetric (BMS) channels [LMS+97, KRU13, Gal62], which include both the BSC and the BEC. These constructions are not deterministic, and it is only with the advent of polar codes [Ari09] that we obtained capacity-achieving codes with both a deterministic construction and efficient encoding and decoding algorithms.

Polar codes are closely related to Reed-Muller codes, in the sense that they also consist of subspaces that correspond to polynomials over $\mathbb{F}_{2}$ [Ari09]. In [Ari09] it was shown that polar codes achieve capacity over the BSC, and algorithms were given to both encode and decode them.

It has long been believed that Reed-Muller codes achieve capacity, and significant progress has been made in that direction over the last few years. (See [ASY21] for a discussion of the subject, as well as a thorough exposition of Reed-Muller codes.) Abbe, Shpilka and Wigderson first showed that Reed-Muller codes achieve capacity over the BSC and the BEC for sub-constant and super-constant rates [ASW15]. Kudekar, Kumar, Mondelli, Pfister, Sasoglu and Urbanke then proved that in the constant rate regime, Reed-Muller codes achieve capacity over the BEC [KKM+16]. Abbe and Ye showed that the Reed-Muller transform polarizes the conditional mutual information, and proved that some non-explicit variant of the Reed-Muller code achieves capacity [AY19]. (They conjecture that this variant is in fact the Reed-Muller code itself.) Hazla, Samorodnitsky and Sberlo then proved that Reed-Muller codes of constant rate can decode a constant fraction of errors on the BSC [HSS21]; this had previously been shown for Almost-Reed-Muller codes by Abbe, Hazla and Nachum [AHN21]. Most recently, Reeves and Pfister showed that Reed-Muller codes achieve capacity over all BMS channels under bit-MAP decoding [RP21], i.e. that one can with high probability recover any single bit of the original codeword (but not with high enough probability that one could take a union bound). Despite these breakthroughs, the conjecture that Reed-Muller codes achieve capacity over all BMS channels under block-MAP decoding (i.e. recovering the whole codeword with high probability) is ultimately still open.

Weight Bounds for Reed-Muller Codes
Several past works have proven bounds on the weight distribution of Reed-Muller codes. Kaufman, Lovett and Porat gave asymptotically tight bounds on the weight distribution of Reed-Muller codes of constant degree [KLP12]. Abbe, Shpilka and Wigderson then built on these techniques to obtain bounds for all degrees smaller than $\frac{n}{4}$ [ASW15], before Sberlo and Shpilka again improved the approach and obtained bounds for all degrees [SS20]. Most recently, Samorodnitsky used completely different ideas to obtain weight bounds in the regime where both the rate of the code and the normalized weight of the codeword are $\Theta(1)$ [Sam20]. We will later use the following result of his in our list-decoding arguments:

Theorem 6 ([Sam20]).

Let $\binom{n}{\leq d}=\eta 2^{n}=\eta N$ for some $\eta\in(0,1)$, and denote by $\mathcal{D}(n,d)$ the uniform distribution over all codewords in $\mathsf{RM}(n,d)$. Then for any $\alpha\in(0,\frac{1}{2})$ we have

$\Pr_{c\sim\mathcal{D}(n,d)}[|c|\leq\alpha N]\leq 2^{o(N)}\left(\frac{1}{1-\eta}\right)^{2\ln 2\cdot\alpha N}2^{-\eta N}.$

These bounds are strong when $\alpha\ll 1/2$. For $\alpha$ close to $1/2$, the first results we are aware of are due to Ben-Eliezer, Hod and Lovett [BHL12]. Their bounds, which were extended to Reed-Muller codes over prime fields by Beame, Oveis Gharan and Yang [BGY20], are strongest when the degree is sublinear. Sberlo and Shpilka then obtained bounds for all degrees in [SS20], while Samorodnitsky again obtained bounds in the regime where both $\alpha$ and $\eta$ are $\Theta(1)$ [Sam20].

We note that in some regimes (e.g. when the degree satisfies $0.38n<d<0.499n$ and $\alpha$ is larger than some constant depending on $d/n$), our Theorem 1 improves on all the aforementioned weight bounds. See Appendix A for some details.

List Decoding
List decoding was proposed by Elias in 1957 as an alternative to unique decoding [Eli57]. In the list decoding framework, the receiver of a corrupted codeword is asked to output a list of potential codewords, with the guarantee that with high probability one of these codewords is the original one. This of course allows for a greater fraction of errors to be tolerated.

The list decoding community has largely focused on proving results for the adversarial noise model, and many codes are now known to achieve list-decoding capacity. For example, uniformly random codes achieve capacity, as do uniformly random linear codes [GHSZ02, LW18, GHK11]. Folded Reed-Solomon codes were the first explicit codes to provably achieve list-decoding capacity [GR08], followed by several others a few years later [GX12, Kop15, HRW17, MRR+20]. For the rest of this paper however, we will exclusively work in the model where the errors are stochastic. In this model, the strongest known list decoding bound for the code $\mathsf{RM}(n,d)$ with $\binom{n}{\leq d}=\eta N>N-N\log(1+2\sqrt{\epsilon(1-\epsilon)})$ is, to our knowledge, that one can output a list $T$ of size

$|T|=2^{\epsilon N\log\frac{4\epsilon(1-\epsilon)}{(1-\eta)^{4\ln 2}}+o(N)}$ (3)

and succeed with high probability in decoding $\epsilon$-errors. This result, although not explicitly stated in [Sam20], can be obtained from his weight bound of Theorem 6 by bounding the expected number of codewords that end up closer to the received string than the original codeword, and then applying Markov's inequality. We note that the expression in (3) stays strictly above the optimal size of $2^{h(\epsilon)N-(1-\eta)N+o(N)}$ (see Appendix D.1 for a proof of this).

Krawtchouk polynomials
The Krawtchouk polynomial of degree $s$ is the polynomial

$K_{s}(x)=\sum_{j=0}^{s}(-1)^{j}\binom{x}{j}\binom{N-x}{s-j}.$

For any subset $S\subseteq\{0,\dotsc,N\}$, we will be interested in the corresponding polynomial $K_{S}(x):=\sum_{s\in S}K_{s}(x)$. For $v\in\mathbb{F}_{2}^{N}$, we will sometimes abuse notation and use $K_{S}(v)$ to mean $K_{S}(|v|)$. The following proposition follows from standard results (see for instance [KL99], or Theorem 16 in [MS77]).

Proposition 7.

For any $N$ and any $S\subseteq\{1,\dotsc,N\}$, we have

$\frac{2^{-N}}{\sum_{s\in S}\binom{N}{s}}\sum_{j=0}^{N}\binom{N}{j}K_{S}(j)^{2}=1.$
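The definition and Proposition 7 are easy to check numerically; here is a short sketch (ours, for illustration):

```python
import math

def krawtchouk(N, s, x):
    # K_s(x) = sum_{j=0}^{s} (-1)^j binom(x, j) binom(N - x, s - j)
    return sum((-1) ** j * math.comb(x, j) * math.comb(N - x, s - j)
               for j in range(s + 1))

def proposition7_lhs(N, S):
    # 2^{-N} / binom(N, S) * sum_j binom(N, j) K_S(j)^2, which should equal 1
    K_S = lambda j: sum(krawtchouk(N, s, j) for s in S)
    total = sum(math.comb(N, j) * K_S(j) ** 2 for j in range(N + 1))
    return total / (2 ** N * sum(math.comb(N, s) for s in S))

print(proposition7_lhs(12, {2, 5}))   # prints exactly 1.0
```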

Good estimates for Krawtchouk polynomials of any degree were obtained in [KL95, IS98, Pol19] (see e.g. [Pol19], Lemma 2.1). These estimates are asymptotically tight in the exponent. Note that $|K_{s}(x)|=|K_{s}(N-x)|=|K_{N-s}(x)|$, so it suffices to understand the case $x,s\leq\frac{N}{2}$.

Theorem 8 ([KL95, IS98, Pol19]).

Let $\epsilon,\delta\in(0,\frac{1}{2})$ be arbitrary. If $\delta\geq\frac{1}{2}-\sqrt{\epsilon(1-\epsilon)}$, then

$|K_{\epsilon N}(\delta N)|\leq 2^{(1+h(\epsilon)-h(\delta))\frac{N}{2}}.$

If $\delta<\frac{1}{2}-\sqrt{\epsilon(1-\epsilon)}$, define $\omega=\frac{1-2\delta-\textnormal{sgn}(1-2\delta)\sqrt{(1-2\delta)^{2}-4\epsilon(1-\epsilon)}}{2(1-2\delta)}$. Then

$|K_{\epsilon N}(\delta N)|\leq\frac{(1-\omega)^{\delta N}(1+\omega)^{(1-\delta)N}}{\omega^{\epsilon N}}.$

As the second expression can be somewhat cumbersome to use, [Pol19] also gives the following weaker bound (see Lemma 2.2 and equation 2.10 in [Pol19]):

Theorem 9 ([Pol19]).

For any $\epsilon\in(0,\frac{1}{2})$ and any $\delta<\frac{1}{2}-\sqrt{\epsilon(1-\epsilon)}$, we have

$|K_{\epsilon N}(\delta N)|\leq 2^{h(\epsilon)N+\epsilon N\log(1-2\delta)}.$
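As a sanity check, one can compare the exact value of $|K_{\epsilon N}(\delta N)|$ against the bound of Theorem 9; a quick sketch with our own (arbitrary) choice of parameters:

```python
import math

def h(p):
    # binary entropy function
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def krawtchouk(N, s, x):
    # as in the sketch above
    return sum((-1) ** j * math.comb(x, j) * math.comb(N - x, s - j)
               for j in range(s + 1))

N, eps, delta = 100, 0.1, 0.1   # delta < 1/2 - sqrt(eps * (1 - eps)) = 0.2
exact = abs(krawtchouk(N, round(eps * N), round(delta * N)))
bound = 2 ** (h(eps) * N + eps * N * math.log2(1 - 2 * delta))
assert exact <= bound           # the exact value stays below the bound
```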

We will need the above estimates when using our Theorem 2 to obtain list-decoding results for transitive codes and Reed-Muller codes.

2 Notation, Conventions and Preliminaries

For the sake of conciseness, we will use the notation

$\{a\pm l\}=\{a-l,\dotsc,a+l\},$

the notation

$\binom{n}{\leq d}=\binom{n}{0}+\binom{n}{1}+\dotsb+\binom{n}{d},$

and for $S\subseteq\{0,\dotsc,N\}$ the notation

$\binom{N}{S}=\sum_{s\in S}\binom{N}{s}.$

Let $N=2^{n}$. We will be working with the vector spaces $\mathbb{F}_{2}^{n}$ and $\mathbb{F}_{2}^{N}$. For convenience, we associate $\mathbb{F}_{2}^{n}$ with the set $[N]=\{1,2,\dotsc,N\}$ by ordering the elements of $\mathbb{F}_{2}^{n}$ lexicographically. For $x\in\mathbb{F}_{2}^{N}$, we write $|x|=|\{j\in[N]:x_{j}=1\}|$ to denote the weight of $x$.

2.1 Linear Codes

An $N$-bit code is a subset $C\subseteq\mathbb{F}_{2}^{N}$. Whenever $C$ is a subspace of $\mathbb{F}_{2}^{N}$, we say that $C$ is a linear code. Any linear code $C\subseteq\mathbb{F}_{2}^{N}$ can be represented by its generator matrix, which is a $\dim C\times N$ matrix $G$ whose rows form a basis for $C$. The matrix $G$ generates all codewords of $C$ in the sense that

$C=\{vG:v\in\mathbb{F}_{2}^{\dim C}\}.$

Another useful way to describe a linear code $C\subseteq\mathbb{F}_{2}^{N}$ is via its parity-check matrix, which is an $(N-\dim C)\times N$ matrix $H$ whose rows span the orthogonal complement of $C$. The linear code $C$ can then be expressed as

$C=\{c\in\mathbb{F}_{2}^{N}:Hc^{\intercal}=0\}.$
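As a small illustration (the $[7,4]$ Hamming code is our choice of example; it is not a code studied in this paper), one can check that the two descriptions agree:

```python
import itertools
import numpy as np

# Generator and parity-check matrices of the [7,4] Hamming code.
G = np.array([[1, 0, 0, 0, 1, 1, 0],
              [0, 1, 0, 0, 1, 0, 1],
              [0, 0, 1, 0, 0, 1, 1],
              [0, 0, 0, 1, 1, 1, 1]])
H = np.array([[1, 1, 0, 1, 1, 0, 0],
              [1, 0, 1, 1, 0, 1, 0],
              [0, 1, 1, 1, 0, 0, 1]])

# C = { vG : v in F_2^{dim C} } has 2^{dim C} codewords...
C = {tuple(np.dot(v, G) % 2) for v in itertools.product([0, 1], repeat=4)}
assert len(C) == 2 ** 4
# ...and every codeword satisfies H c^T = 0.
assert all(not np.any(np.dot(H, c) % 2) for c in C)
```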

One property that will play an important role is transitivity, which we define below:

Definition 1.

A set $C\subseteq\mathbb{F}_{2}^{N}$ is transitive if for every $i,j\in[N]$ there exists a permutation $\pi:[N]\rightarrow[N]$ such that

  (i)

    $\pi(i)=j$

  (ii)

    For every element $v=(v_{1},\dotsc,v_{N})\in C$ we have $(v_{\pi(1)},\dotsc,v_{\pi(N)})\in C$.

We note that the dual code of a transitive code is also transitive (see Appendix D.2 for the proof).

Claim 10.

The dual code $C^{\perp}$ of a transitive code $C\subseteq\mathbb{F}_{2}^{N}$ is transitive.

2.2 Reed-Muller Codes

We will denote by $\mathsf{RM}(n,d)$ the Reed-Muller code with $n$ variables and degree $d$. Throughout this section, we let $M$ be the generator matrix of $\mathsf{RM}(n,d)$; this is an $\binom{n}{\leq d}\times N$ matrix whose rows correspond to sets of size at most $d$, ordered lexicographically, and whose columns correspond to elements of $\mathbb{F}_{2}^{n}$. For $S\subseteq[n]$ with $|S|\leq d$ and $x\in\mathbb{F}_{2}^{n}$, the corresponding entry is $M_{S,x}=\prod_{j\in S}x_{j}$. If $S$ is empty, this entry is set to $1$.

If $v\in\mathbb{F}_{2}^{\binom{n}{\leq d}}$ is a row vector, $v$ can be thought of as describing the coefficients of a multilinear polynomial in $\mathbb{F}_{2}[X_{1},\dotsc,X_{n}]$ of degree at most $d$. The row vector $vM$ then consists of the evaluations of this polynomial on all inputs from $\mathbb{F}_{2}^{n}$. It is well known that $M$ has full rank $\binom{n}{\leq d}$. In fact we have the following standard fact (see Appendix D.3 for the proof):

Fact 11.

The columns of $M$ that correspond to the points $x\in\mathbb{F}_{2}^{n}$ with $|x|\leq d$ are linearly independent.
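The following sketch (ours) builds $M$ for a small Reed-Muller code and verifies Fact 11 by Gaussian elimination over $\mathbb{F}_2$:

```python
import itertools

def rm_generator(n, d):
    # One row per set S with |S| <= d, one column per point x in F_2^n;
    # the entry is M[S, x] = prod_{j in S} x_j (empty product = 1).
    points = list(itertools.product([0, 1], repeat=n))
    sets = [S for r in range(d + 1) for S in itertools.combinations(range(n), r)]
    return [[int(all(x[j] for j in S)) for x in points] for S in sets], points

def rank_gf2(rows):
    # Row reduction over F_2.
    rows = [row[:] for row in rows]
    rank = 0
    for col in range(len(rows[0])):
        pivot = next((i for i in range(rank, len(rows)) if rows[i][col]), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for i in range(len(rows)):
            if i != rank and rows[i][col]:
                rows[i] = [a ^ b for a, b in zip(rows[i], rows[rank])]
        rank += 1
    return rank

n, d = 4, 2
M, points = rm_generator(n, d)
low = [i for i, x in enumerate(points) if sum(x) <= d]  # columns with |x| <= d
sub = [[row[i] for i in low] for row in M]
assert rank_gf2(sub) == len(M) == len(low)  # Fact 11: those columns are independent
```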

The parity-check matrix of the Reed-Muller code is known to be the generator matrix of a different Reed-Muller code. Namely, let $H$ be the $\binom{n}{\leq n-d-1}\times N$ generator matrix of the code $\mathsf{RM}(n,n-d-1)$. Then $H$ has full rank, and $MH^{\intercal}=0$. So the rows of $H$ are a basis for the orthogonal complement of the span of the rows of $M$. Reed-Muller codes also have useful algebraic features, notably transitivity:

Fact 12.

For all $n$ and all $d\leq n$, the Reed-Muller code $\mathsf{RM}(n,d)$ is transitive.

See Appendix D.3 for the proof.

2.3 Entropy

The binary entropy function $h:[0,1]\rightarrow\mathbb{R}$ is defined to be

$h(\epsilon)=\epsilon\cdot\log\frac{1}{\epsilon}+(1-\epsilon)\cdot\log\frac{1}{1-\epsilon}.$

The following fact allows us to approximate binomial coefficients using the entropy function:

Fact 13.

For $n/2\geq d\geq 1$, $\sqrt{\frac{8\pi}{e^{4}n}}\cdot 2^{h(d/n)\cdot n}\leq\binom{n}{d}\leq\binom{n}{\leq d}\leq 2^{h(d/n)\cdot n}.$

The leftmost inequality is a consequence of Stirling’s approximation for the binomial coefficients, and the rightmost is a consequence of the sub-additivity of entropy.
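A quick numeric check of Fact 13 (the parameters are our own choice):

```python
import math

def h(p):
    # binary entropy function
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

n, d = 100, 30
upper = 2 ** (h(d / n) * n)
lower = math.sqrt(8 * math.pi / (math.e ** 4 * n)) * upper
assert lower <= math.comb(n, d) \
             <= sum(math.comb(n, i) for i in range(d + 1)) \
             <= upper
```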

The following lemma, which is essentially a 2-way version of Pinsker's inequality, gives a useful way to control the entropy function near $1/2$.

Lemma 14.

For any $\mu\in(0,1)$, we have

$\frac{\mu^{2}}{2\ln 2}\leq 1-h\left(\frac{1-\mu}{2}\right)\leq\mu^{2}.$

See Appendix D.4 for the proof.

2.4 Probability Distributions

There are two types of probability distributions that we will use frequently. The first one is the $\epsilon$-Bernoulli distribution over $\mathbb{F}_{2}^{N}$, which we will denote by

$P_{\epsilon}(z)=\epsilon^{|z|}(1-\epsilon)^{N-|z|}.$

The second one is the uniform distribution over some set $T$, which we will denote by

$\mathcal{D}(T)(z)=\begin{cases}\frac{1}{|T|}&\text{if }z\in T,\\ 0&\text{otherwise.}\end{cases}$

There are two particular cases of the uniform distribution that will occur often enough that we give them their own notation. The first one is the uniform distribution over $\mathbb{F}_{2}^{t}$, which we will denote by

$\mu_{t}=\mathcal{D}(\mathbb{F}_{2}^{t}).$

The second one is the uniform distribution over all vectors $z\in\mathbb{F}_{2}^{N}$ of weight $|z|\in S$, for some $S\subseteq\{0,\dotsc,N\}$. We will denote this probability distribution by

$\lambda_{S}=\mathcal{D}(\{z\in\mathbb{F}_{2}^{N}:|z|\in S\}).$

2.5 Probability Theory

We will need two very standard results from probability theory (see e.g. [BLM13]): Markov's inequality and Chernoff's bound. We start with Markov's inequality.

Lemma 15.

Let $X$ be a nonnegative random variable. Then for any $a>0$, we have

$\Pr[X\geq a]\leq\frac{\mathbb{E}[X]}{a}.$

We will also need Chernoff’s bound:

Lemma 16.

Let $X_{1},\dotsc,X_{n}$ be i.i.d. random variables taking values in $\{0,1\}$, and define $X=X_{1}+\dotsb+X_{n}$. Then for any $\delta\in(0,1)$, we have

$\Pr\Big[\big|X-\mathbb{E}[X]\big|>\delta\cdot n\mathbb{E}[X_{1}]\Big]\leq 2e^{-\frac{\delta^{2}\cdot n\mathbb{E}[X_{1}]}{3}}.$

2.6 Fourier Analysis

The Fourier basis is a useful basis for the space of functions mapping $\mathbb{F}_{2}^{N}$ to the real numbers. We recall some of its properties below (see e.g. [dW08]). For $f,g:\mathbb{F}_{2}^{N}\rightarrow\mathbb{R}$, define the inner product

$\langle f,g\rangle=\frac{1}{2^{N}}\sum_{x\in\mathbb{F}_{2}^{N}}f(x)g(x).$

For every $x,y\in\mathbb{F}_{2}^{N}$, define the character

$\chi_{y}(x)=(-1)^{\sum_{j=1}^{N}x_{j}y_{j}}.$

These functions form an orthonormal basis: for $y,y'\in\mathbb{F}_{2}^{N}$,

$\langle\chi_{y},\chi_{y'}\rangle=\begin{cases}1&\text{if }y=y',\\ 0&\text{otherwise.}\end{cases}$

We define the Fourier coefficients $\hat{f}(y)=\langle f,\chi_{y}\rangle$. Then for $f,g:\mathbb{F}_{2}^{N}\rightarrow\mathbb{R}$, we have

$\langle f,g\rangle=\sum_{y\in\mathbb{F}_{2}^{N}}\hat{f}(y)\cdot\hat{g}(y).$

In particular,

$\|f\|^{2}_{2}=\langle f,f\rangle=\sum_{y}\hat{f}(y)^{2}.$
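These identities are easy to confirm numerically; here is a small sketch for Parseval's identity (our own toy example, with $N$ small and a random $f$):

```python
import itertools
import numpy as np

N = 4
rng = np.random.default_rng(0)
xs = list(itertools.product([0, 1], repeat=N))
f = {x: rng.standard_normal() for x in xs}          # a random f : F_2^N -> R

def fhat(y):
    # \hat f(y) = <f, chi_y> = 2^{-N} sum_x f(x) (-1)^{sum_j x_j y_j}
    return sum(f[x] * (-1) ** np.dot(x, y) for x in xs) / 2 ** N

lhs = sum(v ** 2 for v in f.values()) / 2 ** N      # ||f||_2^2
rhs = sum(fhat(y) ** 2 for y in xs)                 # sum_y fhat(y)^2
assert abs(lhs - rhs) < 1e-12
```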

3 Outline of the Paper

The main question we will be looking into is whether or not a family of codes $\{C_{N}\}$, with $C_{N}\subseteq\mathbb{F}_{2}^{N}$, is asymptotically resilient to independent errors of probability $\epsilon$. Formally, we are given a list size $k=k(N)$ and want to know if there exists a family of decoding functions $\{d_{N}\}$, with $d_{N}:\mathbb{F}_{2}^{N}\rightarrow\left(\mathbb{F}_{2}^{N}\right)^{\otimes k}$, such that for every sequence of codewords $\{c_{N}\}$ we have

$\lim_{N\rightarrow\infty}\Pr_{\rho_{N}\sim P_{\epsilon}}\big[c_{N}\notin d_{N}(c_{N}+\rho_{N})\big]=0.$

We note that the unique decoding problem can be seen as setting $k=1$ in the above set-up. Our general approach will be based on trying to identify the error string $\rho\in\mathbb{F}_{2}^{N}$ from its image $H\rho^{\intercal}$. In particular, we will be interested in the max-likelihood decoder

$D_{k}(x)=\mathop{\textnormal{argmax}}_{\substack{\{z_{1},\dotsc,z_{k}\}\subseteq\mathbb{F}_{2}^{N}\\ Hz_{i}^{\intercal}=x\textnormal{ for all }i}}\{P_{\epsilon}(z_{1})+\dotsb+P_{\epsilon}(z_{k})\}=\mathop{\textnormal{argmin}}_{\substack{\{z_{1},\dotsc,z_{k}\}\subseteq\mathbb{F}_{2}^{N}\\ Hz_{i}^{\intercal}=x\textnormal{ for all }i}}\{|z_{1}|+\dotsb+|z_{k}|\}.$ (4)

We show in the following lemma that if the max-likelihood decoder is able to identify the error string $\rho$, then it is possible to recover the original codeword.

Lemma 17.

Let $H$ be the $t\times N$ parity-check matrix of the linear code $C$, and let $D:\mathbb{F}_{2}^{t}\rightarrow\left(\mathbb{F}_{2}^{N}\right)^{\otimes k}$ be an arbitrary function. Then there exists a decoding function

$d:\mathbb{F}_{2}^{N}\rightarrow\left(\mathbb{F}_{2}^{N}\right)^{\otimes k}$

such that for every $c\in C$ we have

$\Pr_{\rho\sim P_{\epsilon}}[c\notin d(c+\rho)]\leq\Pr_{\rho\sim P_{\epsilon}}[\rho\notin D(H\rho^{\intercal})].$
Proof.

Given $D:\mathbb{F}_{2}^{t}\rightarrow\left(\mathbb{F}_{2}^{N}\right)^{\otimes k}$, define $d:\mathbb{F}_{2}^{N}\rightarrow\left(\mathbb{F}_{2}^{N}\right)^{\otimes k}$ to be

$d(z)=\{z+y:y\in D(Hz^{\intercal})\}.$

We will show that whenever $\rho$ satisfies $\rho\in D(H\rho^{\intercal})$, we also have $c\in d(c+\rho)$ for every $c\in C$. Suppose $\rho\in D(H\rho^{\intercal})$. Note that since $H$ is the parity-check matrix of $C$, every $c\in C$ satisfies $Hc^{\intercal}=0$. So for every $c\in C$, any $\rho$ that satisfies $\rho\in D(H\rho^{\intercal})$ must also satisfy $\rho\in D(H(c+\rho)^{\intercal})$. It then follows by the definition of $d(c+\rho)$ that

$c=c+\rho+\rho\in d(c+\rho).$ ∎
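To make the reduction concrete, here is a brute-force sketch of the max-likelihood decoder $D_k$ of (4) together with the decoding function $d$ from the proof, run on the $[7,4]$ Hamming code (our illustrative choice of $H$; the search is exponential in $N$, so this is for tiny codes only):

```python
import itertools
import numpy as np

def D_k(H, x, k=1):
    # Max-likelihood decoder of (4): among all z with H z^T = x,
    # return the k strings of smallest weight.
    N = H.shape[1]
    coset = [np.array(z) for z in itertools.product([0, 1], repeat=N)
             if np.array_equal(np.dot(H, z) % 2, np.asarray(x))]
    coset.sort(key=lambda z: int(z.sum()))
    return coset[:k]

H = np.array([[1, 1, 0, 1, 1, 0, 0],   # parity-check matrix of the
              [1, 0, 1, 1, 0, 1, 0],   # [7,4] Hamming code
              [0, 1, 1, 1, 0, 0, 1]])
c = np.zeros(7, dtype=int)             # a codeword (0 is in every linear code)
rho = np.array([0, 0, 0, 0, 0, 1, 0])  # one flipped bit
r = (c + rho) % 2
# Lemma 17's decoder: d(r) = { r + y : y in D_k(H r^T) }
decoded = [(r + y) % 2 for y in D_k(H, np.dot(H, r) % 2)]
assert any(np.array_equal(w, c) for w in decoded)
```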

From this point onward, our goal will thus be to prove that the max-likelihood decoder in (4) succeeds in recovering $\rho$ with high probability. In section 4, we relate the decoding error probability of the max-likelihood decoder $D_{k}$ to the collision probability

$\sum_{x\in\mathbb{F}_{2}^{t}}\Pr[Hz^{\intercal}=x]^{2}.$

In section 5, we build on this result to obtain a bound on the performance of $D_{k}$ in terms of the weight distribution of the dual code. We then present new bounds on the weight distribution of transitive codes in section 6. These bounds are interesting in their own right, and we show that they are essentially tight. In section 7, we combine these bounds with our results from section 5 to obtain list-decoding results for transitive linear codes. We then repeat this argument with Samorodnitsky's Theorem 6 in section 8 to obtain a stronger list-decoding bound for Reed-Muller codes.

4 Collisions vs Decoding

Recall that we denote by $P_{\epsilon}$ the $\epsilon$-Bernoulli distribution over $\mathbb{F}_{2}^{N}$, i.e. the distribution

$P_{\epsilon}(z)=\epsilon^{|z|}(1-\epsilon)^{N-|z|}.$

Recall also that for any subset $S\subseteq\{0,\dotsc,N\}$, we denote by $\lambda_{S}$ the uniform distribution over all strings $z\in\mathbb{F}_{2}^{N}$ of weight $|z|\in S$, i.e.

$\lambda_{S}(z)=\begin{cases}\frac{1}{\sum_{j\in S}\binom{N}{j}}&\text{if }|z|\in S,\\ 0&\text{otherwise.}\end{cases}$

The goal of this section will be to analyze the relationship between the decoding of an error string $\rho\in\mathbb{F}_{2}^{N}$ and the collision probability of strings $z\in\mathbb{F}_{2}^{N}$ within the map $z\mapsto Hz^{\intercal}$. Intuitively, the more collisions there are within this mapping, the harder it is for our decoder to correctly identify the error string $\rho\in\mathbb{F}_{2}^{N}$ upon seeing only its image $H\rho^{\intercal}\in\mathbb{F}_{2}^{t}$. However, certain error strings might be unlikely enough to occur that our decoder can safely ignore them. For example, if we are interested in an $\epsilon$-noisy error string $\rho$, then $\rho$ is unlikely to have weight $|\rho|$ far away from $\epsilon N$. We could thus choose to ignore all strings whose weights do not lie in the set $S=\{\epsilon N-l,\dotsc,\epsilon N+l\}$, for some integer $l$. In order to analyze the collisions that occur when strings are required to have weight $|z|\in S$, we define for every $z\in\mathbb{F}_{2}^{N}$ and every $S\subseteq\{0,\dotsc,N\}$ the set of $S$-colliders of $z$, i.e. the set of strings $y$ that lie in the coset of $z$ and have weight $|y|\in S$:

Definition 2.

For any $z\in\mathbb{F}_{2}^{N}$ and any subset $S\subseteq\{0,\dotsc,N\}$, define

$\Omega_{z}^{S}=\left\{y\in\mathbb{F}_{2}^{N}:|y|\in S\textnormal{ and }Hy^{\intercal}=Hz^{\intercal}\right\}.$

This definition captures a natural parameter for how large of a list we need before we can confidently claim that it contains the error string: if we are given $H\rho^{\intercal}$ and are told that with high probability the error string $\rho$ has weight $|\rho|\in S$, then we should output the list $\Omega_{\rho}^{S}$. For unique decoding we want to argue that $|\Omega_{\rho}^{S}|=1$ with high probability, whereas for list decoding we want to argue that $|\Omega_{\rho}^{S}|\leq k$ with high probability, for some integer $k>1$. The expectation of $|\Omega_{\rho}^{S}|$ will thus be a key quantity in our analysis. We will call this expectation the "collision count," because it will later be useful to interpret it as the renormalized collision probability of the map $z\mapsto Hz^{\intercal}$ (see for instance the proof of Proposition 20).

Definition 3.

For any subset $S\subseteq\{0,\dotsc,N\}$ and any $t\times N$ matrix $H$, define

$\mathsf{Coll}_{H}(S)=\mathbb{E}_{z\sim\lambda_{S}}\big[|\Omega_{z}^{S}|\big].$
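On small examples, this expectation can be computed directly by brute force; a sketch (ours):

```python
import itertools
from collections import Counter
import numpy as np

def coll(H, S):
    # Coll_H(S) = E_{z ~ lambda_S}[ |Omega_z^S| ]: draw z uniformly from the
    # weight shell S and count the coset members of z inside that shell.
    N = H.shape[1]
    shell = [z for z in itertools.product([0, 1], repeat=N) if sum(z) in S]
    syndromes = [tuple(np.dot(H, z) % 2) for z in shell]
    counts = Counter(syndromes)
    return sum(counts[s] for s in syndromes) / len(shell)

H = np.array([[1, 0, 1, 1, 0],   # an arbitrary 2 x 5 parity-check matrix
              [0, 1, 1, 0, 1]])
print(coll(H, {1, 2}))           # always >= 1, and equals 1 iff no collisions
```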

In the following lemma, we use Markov's inequality to bound the probability of a list decoding error in terms of $\mathsf{Coll}_{H}(S)$.

Lemma 18.

For any subset $S\subseteq\{0,\dotsc,N\}$, any matrix $H$ with $N$ columns, and any integer $k\geq 1$, we have

$\Pr_{\rho\sim\lambda_{S}}\big[|\Omega_{\rho}^{S}|>k\big]\leq\frac{\mathsf{Coll}_{H}(S)-1}{k}.$
Proof.

Note that $|\Omega_{z}^{S}|\geq 1$ for any $z\in\mathbb{F}_{2}^{N}$ with weight $|z|\in S$, so the random variable $|\Omega_{z}^{S}|-1$ is always non-negative. Applying Markov's inequality (i.e. Lemma 15), we then have

$\Pr_{\rho\sim\lambda_{S}}\big[|\Omega_{\rho}^{S}|>k\big]=\Pr_{\rho\sim\lambda_{S}}\big[|\Omega_{\rho}^{S}|-1\geq k\big]\leq\frac{\mathsf{Coll}_{H}(S)-1}{k}.$ ∎

When the error string $\rho$ is sampled uniformly at random from the set $\{z\in\mathbb{F}_{2}^{N}:|z|\in S\}$, the above lemma allows us to relate the decoding error probability to the collision count $\mathsf{Coll}_{H}(S)$. The problem we are most interested in, however, is when $\rho$ is sampled not from some uniform distribution, but from the $\epsilon$-noisy probability distribution $P_{\epsilon}$. We will now show how to connect these two decoding problems. The intuition is that by the Chernoff bound, we only need to concern ourselves with strings whose weights lie in $S=\{\epsilon N\pm l\}$, for some appropriately chosen $l$. But in this weight band all strings have similar weight, and so are given similar probability under the distribution $P_{\epsilon}$. Intuitively, the $P_{\epsilon}$-decoder must then perform very similarly to the $\lambda_{S}$-decoder. The following theorem makes this idea precise, and then uses Lemma 18 to bound the probability of a decoding error. Recall that $D_{k}:\mathbb{F}_{2}^{t}\rightarrow(\mathbb{F}_{2}^{N})^{\otimes k}$ is the max-likelihood decoder

$D_{k}(x)=\mathop{\textnormal{argmin}}_{\substack{\{z_{1},\dotsc,z_{k}\}\subseteq\mathbb{F}_{2}^{N}\\ Hz_{i}^{\intercal}=x\textnormal{ for all }i}}\{|z_{1}|+\dotsb+|z_{k}|\}.$
Theorem 19.

Fix $\epsilon<\frac{1}{2}$, let $H$ be any matrix with $N$ columns, and let $k=(2l+1)m+1$ for some integers $m\geq 0$ and $l\leq(\frac{1}{2}-\epsilon)N$. Then

$\Pr_{\rho\sim P_{\epsilon}}[\rho\notin D_{k}(H\rho^{\intercal})]\leq 2e^{-\frac{l^{2}}{3\epsilon N}}+\frac{4(l+1)}{k}\mathop{\max}_{\substack{S\subseteq\{\epsilon N\pm l\}\\ 1\leq|S|\leq 1+\mathbbm{1}\{k=1\}}}\Big\{\mathsf{Coll}_{H}(S)-1\Big\},$

where $\mathbbm{1}\{k=1\}$ is $1$ when $k=1$ and $0$ otherwise.

Proof.

We will consider the unique decoding case ($k=1$, i.e. $m=0$) and the list-decoding case ($k>1$, i.e. $m\in\mathbb{N}$) separately.
Case 1: Unique decoding, i.e. $k=1$.
Let $t$ be the number of rows in the matrix $H$. We will show that a slightly less performant decoder $\tilde{D}_{1}:\mathbb{F}_{2}^{t}\rightarrow\mathbb{F}_{2}^{N}$ satisfies the desired probability bound. We define $\tilde{D}_{1}$ as follows: upon receiving input $x\in\mathbb{F}_{2}^{t}$, $\tilde{D}_{1}$ outputs the minimum-weight string from the set $\{z\in\mathbb{F}_{2}^{N}:Hz^{\intercal}=x,\,|z|=\epsilon N\pm l\}$. If there is no such string, the decoder fails. It is clear that

$\Pr_{\rho\sim P_{\epsilon}}[\rho\neq D_{1}(H\rho^{\intercal})]\leq\Pr_{\rho\sim P_{\epsilon}}[\rho\neq\tilde{D}_{1}(H\rho^{\intercal})],$

since $D_{1}$ always returns the most likely string whereas $\tilde{D}_{1}$ may not. We thus turn to proving the desired bound for $\tilde{D}_{1}$. Letting

$B=\{z\in\mathbb{F}_{2}^{N}:\big||z|-\epsilon N\big|>l\},$

we have by Chernoff’s bound (i.e. Lemma 16) that

PrρPϵ[ρD~1(Hρ)]\displaystyle\Pr_{\rho\sim P_{\epsilon}}[\rho\neq\tilde{D}_{1}(H\rho^{\intercal})] PrρPϵ[ρB]+PrρPϵ[ρD~1(Hρ)|ρB]\displaystyle\leq\Pr_{\rho\sim P_{\epsilon}}[\rho\in B]+\Pr_{\rho\sim P_{\epsilon}}[\rho\neq\tilde{D}_{1}(H\rho^{\intercal})\big{|}\rho\notin B]
2el23ϵN+PrρPϵ[ρD~1(Hρ)|ρB].\displaystyle\leq 2e^{-\frac{l^{2}}{3\epsilon N}}+\Pr_{\rho\sim P_{\epsilon}}[\rho\neq\tilde{D}_{1}(H\rho^{\intercal})\big{|}\rho\notin B]. (5)

We want to bound the second term. For any ρB\rho\notin B, we define the set of ”problematic weights” S(ρ)={ϵNl,,|ρ|}S(\rho)=\{\epsilon N-l,...,|\rho|\} . We note that for ρB\rho\notin B, our decoder D~1\tilde{D}_{1} can only fail if there is some string zρz\neq\rho satisfying Hz=HρHz^{\perp}=H\rho^{\perp} and |z|S(ρ)|z|\in S(\rho). Recalling the definition ΩρS={z:Hz=Hρ,|z|S}\Omega_{\rho}^{S}=\{z:Hz^{\perp}=H\rho^{\perp},|z|\in S\}, we can then rewrite our equation (4) as

PrρPϵ[ρD~1(Hρ)]\displaystyle\Pr_{\rho\sim P_{\epsilon}}[\rho\neq\tilde{D}_{1}(H\rho^{\intercal})] 2el23ϵN+PrρPϵ[|ΩρS(ρ)|>1|ρB].\displaystyle\leq 2e^{-\frac{l^{2}}{3\epsilon N}}+\Pr_{\rho\sim P_{\epsilon}}\big{[}|\Omega_{\rho}^{S(\rho)}|>1\big{|}\rho\notin B\big{]}.

Considering the most problematic weight level $w$ within the region $\{\epsilon N\pm l\}$ and using a union bound over all lower levels $w'\leq w$, we get

$\Pr_{\rho\sim P_{\epsilon}}[\rho\neq\tilde{D}_{1}(H\rho^{\intercal})]\leq 2e^{-\frac{l^{2}}{3\epsilon N}}+\max_{w\in\{\epsilon N\pm l\}}\left\{\Pr_{\rho\sim P_{\epsilon}}\big[|\Omega_{\rho}^{S(\rho)}|>1\,\big|\,|\rho|=w\big]\right\}\leq 2e^{-\frac{l^{2}}{3\epsilon N}}+(2l+1)\mathop{\max}_{\substack{w,w'\in\{\epsilon N\pm l\}\\ w'\leq w}}\left\{\Pr_{\rho\sim P_{\epsilon}}\big[|\Omega_{\rho}^{\{w,w'\}}|>1\,\big|\,|\rho|=w\big]\right\}.$

We now note that under the condition $|\rho|=w$, the probability distributions $P_{\epsilon}(\rho)$ and $\lambda_{\{w,w'\}}(\rho)$ are identical (they are both uniform on strings of weight $w$). We can thus rewrite our last inequality as

$\Pr_{\rho\sim P_{\epsilon}}[\rho\neq\tilde{D}_{1}(H\rho^{\intercal})]\leq 2e^{-\frac{l^{2}}{3\epsilon N}}+(2l+1)\mathop{\max}_{\substack{w,w'\in\{\epsilon N\pm l\}\\ w'\leq w}}\left\{\Pr_{\rho\sim\lambda_{\{w,w'\}}}\big[|\Omega_{\rho}^{\{w,w'\}}|>1\,\big|\,|\rho|=w\big]\right\}.$

But by basic conditional probability we know that

$\Pr_{\rho\sim\lambda_{\{w,w'\}}}\big[|\Omega_{\rho}^{\{w,w'\}}|>1\big]\geq\Pr_{\rho\sim\lambda_{\{w,w'\}}}\big[|\rho|=w\big]\cdot\Pr_{\rho\sim\lambda_{\{w,w'\}}}\big[|\Omega_{\rho}^{\{w,w'\}}|>1\,\big|\,|\rho|=w\big],$

so we can bound our previous expression by

$\Pr_{\rho\sim P_{\epsilon}}[\rho\neq\tilde{D}_{1}(H\rho^{\intercal})]\leq 2e^{-\frac{l^{2}}{3\epsilon N}}+(2l+1)\mathop{\max}_{\substack{w,w'\in\{\epsilon N\pm l\}\\ w'\leq w}}\left\{\frac{\Pr_{\rho\sim\lambda_{\{w,w'\}}}\big[|\Omega_{\rho}^{\{w,w'\}}|>1\big]}{\Pr_{\rho\sim\lambda_{\{w,w'\}}}\big[|\rho|=w\big]}\right\}.$

Now for any $w<\frac{N}{2}$ and $w'\leq w$, we have $\Pr_{\rho\sim\lambda_{\{w,w'\}}}\big[|\rho|=w\big]=\frac{\binom{N}{w}}{\binom{N}{\{w,w'\}}}\geq\frac{\binom{N}{w}}{\binom{N}{w}+\binom{N}{w'}}\geq\frac{1}{2}$. It then follows that

$\Pr_{\rho\sim P_{\epsilon}}\big[\rho\neq\tilde{D}_{1}(H\rho^{\intercal})\big]\leq 2e^{-\frac{l^{2}}{3\epsilon N}}+2(2l+1)\cdot\mathop{\max}_{\substack{S\subseteq\{\epsilon N\pm l\}\\ |S|\in\{1,2\}}}\left\{\Pr_{\rho\sim\lambda_{S}}\big[|\Omega_{\rho}^{S}|>1\big]\right\}.$

The theorem statement then follows from Lemma 18.

Case 2: List decoding, i.e. $k>1$.
Let $t$ be the number of rows in the matrix $H$. We will show that a slightly less performant decoding function $D_{k,l}:\mathbb{F}_{2}^{t}\rightarrow(\mathbb{F}_{2}^{N})^{\otimes k}$ satisfies the desired probability bound. We define $D_{k,l}$ as follows: upon receiving input $x\in\mathbb{F}_{2}^{t}$, $D_{k,l}$ outputs $\frac{k-1}{2l+1}$ strings from $\{z\in\mathbb{F}_{2}^{N}:Hz^{\intercal}=x,\,|z|=w\}$, for each $w\in\{\epsilon N\pm l\}$. If there are fewer than $\frac{k-1}{2l+1}$ strings at some level $w$, the decoder returns all of them. It is clear that for any $l$ we have

$\Pr_{\rho\sim P_{\epsilon}}[\rho\notin D_{k}(H\rho^{\intercal})]\leq\Pr_{\rho\sim P_{\epsilon}}[\rho\notin D_{k,l}(H\rho^{\intercal})],$

since $D_{k}$ returns the $k$ most likely strings while $D_{k,l}$ returns at most $k-1$ strings. We thus turn to proving the desired bound for $D_{k,l}$. We first bound the probability that the weight of the error string $\rho$ is far away from its mean. Letting

$B=\Big\{z\in\mathbb{F}_{2}^{N}:\big||z|-\epsilon N\big|>l\Big\},$

we have, by Chernoff’s bound (i.e. Lemma 16), that

$\Pr_{\rho\sim P_{\epsilon}}[\rho\notin D_{k,l}(H\rho^{\intercal})]\leq\Pr_{\rho\sim P_{\epsilon}}[\rho\in B]+\Pr_{\rho\sim P_{\epsilon}}[\rho\notin D_{k,l}(H\rho^{\intercal})\,\big|\,\rho\notin B]\leq 2e^{-\frac{l^{2}}{3\epsilon N}}+\max_{w\in\{\epsilon N\pm l\}}\Pr_{\rho\sim P_{\epsilon}}[\rho\notin D_{k,l}(H\rho^{\intercal})\,\big|\,|\rho|=w].$

Since the distribution $P_{\epsilon}$ gives the same probability to any two strings of equal weight, we get

$\Pr_{\rho\sim P_{\epsilon}}[\rho\notin D_{k,l}(H\rho^{\intercal})]\leq 2e^{-\frac{l^{2}}{3\epsilon N}}+\max_{w\in\{\epsilon N\pm l\}}\Pr_{\rho\sim\lambda_{\{w\}}}[\rho\notin D_{k,l}(H\rho^{\intercal})]\leq 2e^{-\frac{l^{2}}{3\epsilon N}}+\max_{w\in\{\epsilon N\pm l\}}\Pr_{\rho\sim\lambda_{\{w\}}}\big[|\Omega_{\rho}^{\{w\}}|>\tfrac{k-1}{2l+1}\big].$

Applying Lemma 18, we get

$\Pr_{\rho\sim P_{\epsilon}}[\rho\notin D_{k,l}(H\rho^{\intercal})]\leq 2e^{-\frac{l^{2}}{3\epsilon N}}+\frac{2l+1}{k-1}\cdot\max_{w\in\{\epsilon N\pm l\}}\Big\{\mathsf{Coll}_{H}(\{w\})-1\Big\}\leq 2e^{-\frac{l^{2}}{3\epsilon N}}+\frac{2(l+1)}{k}\cdot\max_{w\in\{\epsilon N\pm l\}}\Big\{\mathsf{Coll}_{H}(\{w\})-1\Big\},$

where in the last line we used that $\frac{a}{b}\leq\frac{a+1}{b+1}$ whenever $a\leq b$. ∎

5 A Criterion for Decoding

In this section, we give a criterion that certifies that a linear code $C\subseteq\mathbb{F}_{2}^{N}$ is resilient to errors of probability $\epsilon$. We give such a criterion for both unique decoding and list decoding. The function we will need to make this connection is the Krawtchouk polynomial of degree $s$, which we recall is defined as

$K_{s}(x)=\sum_{j=0}^{s}(-1)^{j}\binom{x}{j}\binom{N-x}{s-j}.$

For vectors $v\in\mathbb{F}_{2}^{N}$, we will abuse notation and write $K_{s}(v)$ to mean $K_{s}(|v|)$. For convenience, we also define for any $S\subseteq\{0,\dotsc,N\}$ the function

$K_{S}(x)=\sum_{s\in S}K_{s}(x).$

In the following proposition, we use basic Fourier analysis tools to rewrite the collision count $\mathsf{Coll}_{H}(S)$ in terms of the Krawtchouk polynomial $K_{S}$. We note that Proposition 20 was previously proven in a different form in [Bar21] (see Theorem 2.1 and Lemma 4.1), and can be seen as describing the coset weight distribution of the code. Recall that we use $\mu_{t}$ to denote the uniform distribution over all vectors in $\mathbb{F}_{2}^{t}$, and that we use the notation $\binom{N}{S}=\sum_{s\in S}\binom{N}{s}$.

Proposition 20.

Fix $\epsilon\in(0,\frac{1}{2})$, and let $H$ be a $t\times N$ matrix with entries in $\mathbb{F}_{2}$. Then for any $S\subseteq\{1,\dotsc,N\}$, we have

$\mathsf{Coll}_{H}(S)=\frac{1}{\binom{N}{S}}\mathbb{E}_{v\sim\mu_{t}}[K_{S}(vH)^{2}].$
Proof.

The main tool we will use is Parseval's identity, which relates the evaluations $f(x)$ of a function $f:\mathbb{F}_{2}^{t}\rightarrow\mathbb{R}$ to its Fourier coefficients $\hat{f}(y)$ by

$\frac{1}{2^{t}}\sum_{x\in\mathbb{F}_{2}^{t}}f(x)^{2}=\sum_{y\in\mathbb{F}_{2}^{t}}\hat{f}(y)^{2}.$ (6)

We will first need to rewrite $\mathsf{Coll}_{H}(S)$ as the $\ell_{2}$ norm of some function $f$. For this, we recall the definition $\Omega_{z}^{S}=\left\{y\in\mathbb{F}_{2}^{N}:|y|\in S\textnormal{ and }Hy^{\intercal}=Hz^{\intercal}\right\}$ and note that

$\mathsf{Coll}_{H}(S):=\frac{1}{\binom{N}{S}}\sum_{z\in\mathbb{F}_{2}^{N}:|z|\in S}|\Omega_{z}^{S}|=\binom{N}{S}\sum_{z\in\mathbb{F}_{2}^{N}:|z|\in S}\frac{1}{|\Omega_{z}^{S}|}\Pr_{a\sim\lambda_{S}}[Ha^{\intercal}=Hz^{\intercal}]^{2}=\binom{N}{S}\sum_{x\in\mathbb{F}_{2}^{t}}\Pr_{z\sim\lambda_{S}}[Hz^{\intercal}=x]^{2}.$

We are now ready to apply Parseval's identity. Letting $f(x)=\Pr_{z\sim\lambda_{S}}[Hz^{\intercal}=x]$ in equation (6), we get

$\mathsf{Coll}_{H}(S)=\binom{N}{S}\sum_{x\in\mathbb{F}_{2}^{t}}f(x)^{2}=2^{t}\binom{N}{S}\sum_{y\in\mathbb{F}_{2}^{t}}\hat{f}(y)^{2}.$

But by definition we have $\hat{f}(y):=2^{-t}\sum_{x\in\mathbb{F}_{2}^{t}}f(x)\cdot(-1)^{y\cdot x^{\intercal}}$, so the last equation can be rewritten as

$\mathsf{Coll}_{H}(S)=2^{-t}\binom{N}{S}\sum_{y\in\mathbb{F}_{2}^{t}}\Big(\sum_{x\in\mathbb{F}_{2}^{t}}f(x)\cdot(-1)^{y\cdot x^{\intercal}}\Big)^{2}.$ (7)

Define the function $L_{S}(z)$ to be $1$ if $z\in\mathbb{F}_{2}^{N}$ satisfies $|z|\in S$, and $0$ otherwise. We can then express $f(x)$ as

$f(x)=\Pr_{z\sim\lambda_{S}}[Hz^{\intercal}=x]=\frac{1}{\binom{N}{S}}\sum_{\substack{z\in\mathbb{F}_{2}^{N}\\ Hz^{\intercal}=x}}L_{S}(z).$ (8)

Combining expressions (7) and (8) and applying the definition of the Fourier transform, we get

$\mathsf{Coll}_{H}(S)=2^{-t}\binom{N}{S}\sum_{y\in\mathbb{F}_{2}^{t}}\Big(\sum_{z\in\mathbb{F}_{2}^{N}}\frac{L_{S}(z)}{\binom{N}{S}}\cdot(-1)^{yHz^{\intercal}}\Big)^{2}=\frac{2^{2N-t}}{\binom{N}{S}}\sum_{y\in\mathbb{F}_{2}^{t}}\hat{L}_{S}(yH)^{2}=\frac{2^{-t}}{\binom{N}{S}}\sum_{y\in\mathbb{F}_{2}^{t}}K_{S}(yH)^{2},$

where in the last step we used that $\hat{L}_{S}(v)=2^{-N}K_{S}(|v|)$. ∎
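Proposition 20 can be confirmed numerically on a tiny example; a sketch with an arbitrary choice of $H$ and $S$ (ours):

```python
import itertools
import math
import numpy as np

H = np.array([[1, 0, 1, 1, 0],
              [0, 1, 1, 0, 1]])
t, N = H.shape
S = {1, 2}
binom_S = sum(math.comb(N, s) for s in S)

def K_S(x):
    # K_S(x) = sum_{s in S} K_s(x)
    return sum((-1) ** j * math.comb(x, j) * math.comb(N - x, s - j)
               for s in S for j in range(s + 1))

# Left side: Coll_H(S), by brute force over the weight shell.
shell = [z for z in itertools.product([0, 1], repeat=N) if sum(z) in S]
syn = [tuple(np.dot(H, z) % 2) for z in shell]
lhs = sum(syn.count(s) for s in syn) / len(shell)

# Right side: (1 / binom(N, S)) * E_{v ~ mu_t}[ K_S(|vH|)^2 ].
rhs = sum(K_S(int(sum(np.dot(v, H) % 2))) ** 2
          for v in itertools.product([0, 1], repeat=t)) / (2 ** t * binom_S)

assert abs(lhs - rhs) < 1e-9
```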

We will now combine Theorem 19 and Proposition 20 to obtain Theorem 2, i.e. to obtain a bound on the decoding error probability in terms of the Fourier coefficients of the level function $L_{\epsilon}$. We prove a generalized version of Theorem 2 below. To recover Theorem 2, set $k=1$ and $l=N^{3/4}$. (You want to think of the parameter $l$ as satisfying $l\gg\sqrt{N}$ in both the case $k=1$ and the case $k>1$, so that the error term $2e^{-\frac{l^{2}}{3\epsilon N}}$ is small.)

Theorem 21.

Fix ϵ(0,12)\epsilon\in(0,\frac{1}{2}), let HH be any t×Nt\times N Boolean matrix, and let k=(2l+1)m+1k=(2l+1)m+1 for any integers m0m\geq 0 and l(12ϵ)Nl\leq(\frac{1}{2}-\epsilon)N. Then

PrρPϵ[ρDk(Hρ)]\displaystyle\Pr_{\rho\sim P_{\epsilon}}[\rho\notin D_{k}(H\rho^{\intercal})] 2el23ϵN+4(l+1)kmaxS{ϵN±l}1|S|1+𝟙{k=1}{1(NS)Evμt[KS(vH)2]1},\leq 2e^{-\frac{l^{2}}{3\epsilon N}}+\frac{4(l+1)}{k}\mathop{\max}_{\begin{subarray}{c}S\subseteq\{\epsilon N\pm l\}\\ 1\leq|S|\leq 1+\mathbbm{1}\{k=1\}\end{subarray}}\Big{\{}\frac{1}{\binom{N}{S}}\mathop{\mathbb{E}}_{v\sim\mu_{t}}\big{[}K_{S}(vH)^{2}\big{]}-1\Big{\}},

where the function 𝟙{k=1}\mathbbm{1}\{k=1\} is 11 when k=1k=1, and 0 otherwise.

Proof.

Applying Theorem 19 and Proposition 20, we have

PrρPϵ[ρDk(Hρ)]\displaystyle\Pr_{\rho\sim P_{\epsilon}}[\rho\notin D_{k}(H\rho^{\intercal})] 2el23ϵN+4(l+1)kmaxS{ϵN±l}1|S|1+𝟙{k=1}{𝖢𝗈𝗅𝗅𝖧(S)1}\displaystyle\leq 2e^{-\frac{l^{2}}{3\epsilon N}}+\frac{4(l+1)}{k}\mathop{\max}_{\begin{subarray}{c}S\subseteq\{\epsilon N\pm l\}\\ 1\leq|S|\leq 1+\mathbbm{1}\{k=1\}\end{subarray}}\Big{\{}\mathsf{Coll_{H}}(S)-1\Big{\}}
=2el23ϵN+4(l+1)kmaxS{ϵN±l}1|S|1+𝟙{k=1}{1(NS)Evμt[KS(vH)2]1}.=2e^{-\frac{l^{2}}{3\epsilon N}}+\frac{4(l+1)}{k}\mathop{\max}_{\begin{subarray}{c}S\subseteq\{\epsilon N\pm l\}\\ 1\leq|S|\leq 1+\mathbbm{1}\{k=1\}\end{subarray}}\Big{\{}\frac{1}{\binom{N}{S}}\mathop{\mathbb{E}}_{v\sim\mu_{t}}\big{[}K_{S}(vH)^{2}\big{]}-1\Big{\}}.

One consequence of Theorem 21 is Corollary 3, which states that CC is resilient to ϵ\epsilon-errors if the weight distribution of CC^{\perp} is close enough to the binomial distribution (see Appendix B for the proof). As another application of Theorem 21, we present the following bound on the probability of making a list-decoding error for a code CC. We note that once again, our bound depends only on the weight distribution of the dual code CC^{\perp}.

Proposition 22.

Fix any ϵ(0,12)\epsilon\in(0,\frac{1}{2}), and define β=12ϵ~(1ϵ~)2\beta=\frac{1-2\sqrt{\tilde{\epsilon}(1-\tilde{\epsilon})}}{2} for ϵ~=ϵ+N1/4.\tilde{\epsilon}=\epsilon+N^{-1/4}. Let B={βN,,(1β)N}B=\{\beta N,...,(1-\beta)N\}, and let k=(2N3/4+1)m+1k^{*}=(2N^{3/4}+1)m+1 for some integer m0m\geq 0. Then for all N>(5ϵ)20N>\left(\frac{5}{\epsilon}\right)^{20} and all integers kkk\geq k^{*}, we have that any t×Nt\times N matrix HH with entries in 𝔽2\mathbb{F}_{2} satisfies

PrρPϵ[ρDk(Hρ)]\displaystyle\Pr_{\rho\sim P_{\epsilon}}[\rho\notin D_{k}(H\rho^{\intercal})] 2eN3ϵ+NkmaxjB{Prvμt[|vH|=j]2N(Nj)1}\displaystyle\leq 2e^{-\frac{\sqrt{N}}{3\epsilon}}+\frac{N}{k^{*}}\max_{j\in B}\left\{\Pr_{v\sim\mu_{t}}[|vH|=j]\cdot\frac{2^{N}}{\binom{N}{j}}-1\right\}
+2h(ϵ)N+N45kmaxjB{Prvμt[|vH|=j]22ϵNlog|12jN|}.\displaystyle+\frac{2^{h(\epsilon)N+N^{\frac{4}{5}}}}{k^{*}}\max_{j\notin B}\left\{\Pr_{v\sim\mu_{t}}\big{[}|vH|=j\big{]}\cdot 2^{2\epsilon N\log|1-\frac{2j}{N}|}\right\}.
Proof.

We will use Theorem 21 to bound the decoding error probability in terms of the Krawtchouk polynomials KS(j)K_{S}(j) and the probability factors Prvμt[|vH|=j]\Pr_{v\sim\mu_{t}}\big{[}|vH|=j\big{]}. Some of these terms will then be bounded using Proposition 7, and some will be bounded using Theorem 9. We proceed with the proof. Since kkk\geq k^{*} implies Dk(Hρ)Dk(Hρ)D_{k^{*}}(H\rho^{\intercal})\subseteq D_{k}(H\rho^{\intercal}), it suffices to bound PrρPϵ[ρDk(Hρ)]\Pr_{\rho\sim P_{\epsilon}}[\rho\notin D_{k^{*}}(H\rho^{\intercal})]; letting l=N3/4l=N^{3/4} and k=kk=k^{*} in Theorem 21, we get

PrρPϵ[ρDk(Hρ)]\displaystyle\Pr_{\rho\sim P_{\epsilon}}[\rho\notin D_{k}(H\rho^{\intercal})] 2eN3ϵ+NkmaxS{ϵN±N3/4}1|S|2{1(NS)j=0NPrvμt[|vH|=j]KS(j)21}.\displaystyle\leq 2e^{-\frac{\sqrt{N}}{3\epsilon}}+\frac{N}{k^{*}}\mathop{\max}_{\begin{subarray}{c}S\subseteq\{\epsilon N\pm N^{3/4}\}\\ 1\leq|S|\leq 2\end{subarray}}\Big{\{}\frac{1}{\binom{N}{S}}\sum_{j=0}^{N}\Pr_{v\sim\mu_{t}}\big{[}|vH|=j\big{]}K_{S}(j)^{2}-1\Big{\}}. (9)

We want to bound the summation in the second term. We will start with the central terms jBj\in B. For these we rely on Proposition 7, which states that 2N(NS)j=0N(Nj)KS(j)2=1\frac{2^{-N}}{\binom{N}{S}}\sum_{j=0}^{N}\binom{N}{j}\cdot K_{S}(j)^{2}=1 for all S{0,,N}S\subseteq\{0,...,N\}. For any S{0,,N}S\subseteq\{0,...,N\}, we then get

1(NS)jBPrvμt[|vH|=j]KS(j)2\displaystyle\frac{1}{\binom{N}{S}}\sum_{j\in B}\Pr_{v\sim\mu_{t}}\big{[}|vH|=j\big{]}K_{S}(j)^{2} 1(NS)maxjB{Prvμt[|vH|=j]1(Nj)}jB(Nj)KS(j)2\displaystyle\leq\frac{1}{\binom{N}{S}}\max_{j\in B}\left\{\Pr_{v\sim\mu_{t}}[|vH|=j]\cdot\frac{1}{\binom{N}{j}}\right\}\sum_{j\in B}\binom{N}{j}\cdot K_{S}(j)^{2}
2NmaxjB{Prvμt[|vH|=j]1(Nj)}.\displaystyle\leq 2^{N}\max_{j\in B}\left\{\Pr_{v\sim\mu_{t}}[|vH|=j]\cdot\frac{1}{\binom{N}{j}}\right\}. (10)
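
The orthogonality relation from Proposition 7 invoked here is easy to confirm by exact computation; a minimal Python sketch (the length N and the weight sets S below are illustrative):

import math

# Exact check of the identity from Proposition 7,
#   2^{-N} * sum_j C(N,j) * K_S(j)^2 = C(N,S),  C(N,S) := sum_{s in S} C(N,s),
# which drives the bound on the central terms above.
N = 12

def K(s, x):
    return sum((-1) ** j * math.comb(x, j) * math.comb(N - x, s - j)
               for j in range(s + 1))

for S in ({2}, {3, 5}, {0, 4, 7}):
    C_NS = sum(math.comb(N, s) for s in S)
    total = sum(math.comb(N, j) * sum(K(s, j) for s in S) ** 2
                for j in range(N + 1))
    assert total == 2 ** N * C_NS  # exact integer arithmetic
print("Proposition 7 identity verified exactly")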

We then want to bound the contribution of the faraway terms jBj\notin B to the summation in (9), i.e. we want to bound

maxS{ϵN±N3/4}1|S|2{1(NS)jBPrvμt[|vH|=j]KS(j)2}.\displaystyle\mathop{\max}_{\begin{subarray}{c}S\subseteq\{\epsilon N\pm N^{3/4}\}\\ 1\leq|S|\leq 2\end{subarray}}\Big{\{}\frac{1}{\binom{N}{S}}\sum_{j\notin B}\Pr_{v\sim\mu_{t}}\big{[}|vH|=j\big{]}K_{S}(j)^{2}\Big{\}}. (11)

We will want to apply Theorem 9 to every term in this sum. Note that by definition of Krawtchouk polynomials, for any w,ww,w^{\prime} we have

K{w,w}(y)\displaystyle K_{\{w,w^{\prime}\}}(y) =Kw(y)+Kw(y)\displaystyle=K_{w}(y)+K_{w^{\prime}}(y)
2max{Kw(y),Kw(y)}.\displaystyle\leq 2\cdot\max\big{\{}K_{w}(y),K_{w^{\prime}}(y)\big{\}}.

We can then bound equation 11 by

(11)\displaystyle(\ref{bdgoal}) 1(NϵNN3/4)NmaxS{ϵN±N3/4}1|S|2jB{Prvμt[|vH|=j]KS(j)2}\displaystyle\leq\frac{1}{\binom{N}{\epsilon N-N^{3/4}}}\cdot N\mathop{\max}_{\begin{subarray}{c}S\subseteq\{\epsilon N\pm N^{3/4}\}\\ 1\leq|S|\leq 2\\ j\notin B\end{subarray}}\Big{\{}\Pr_{v\sim\mu_{t}}\big{[}|vH|=j\big{]}K_{S}(j)^{2}\Big{\}}
N(NϵNN3/4)maxw{ϵN±N3/4}jB{Prvμt[|vH|=j]4Kw(j)2}.\displaystyle\leq\frac{N}{\binom{N}{\epsilon N-N^{3/4}}}\mathop{\max}_{\begin{subarray}{c}w\in\{\epsilon N\pm N^{3/4}\}\\ j\notin B\end{subarray}}\Big{\{}\Pr_{v\sim\mu_{t}}\big{[}|vH|=j\big{]}\cdot 4K_{w}(j)^{2}\Big{\}}.

Applying Theorem 9, we get

(11)\displaystyle(\ref{bdgoal}) 4N(NϵNN34)maxw{ϵN±N34}jB{Prvμt[|vH|=j]22h(wN)N+2wlog|12jN|}\leq\frac{4N}{\binom{N}{\epsilon N-N^{\frac{3}{4}}}}\max_{\begin{subarray}{c}w\in\{\epsilon N\pm N^{\frac{3}{4}}\}\\ j\notin B\end{subarray}}\left\{\Pr_{v\sim\mu_{t}}\big{[}|vH|=j\big{]}\cdot 2^{2h(\frac{w}{N})N+2w\log|1-\frac{2j}{N}|}\right\}
2N4/5N2h(ϵ)NmaxjB{Prvμt[|vH|=j]22ϵNlog|12jN|}.\leq\frac{2^{N^{4/5}}}{N}\cdot 2^{h(\epsilon)N}\max_{j\notin B}\left\{\Pr_{v\sim\mu_{t}}\big{[}|vH|=j\big{]}\cdot 2^{2\epsilon N\log|1-\frac{2j}{N}|}\right\}.

Combining this bound for the faraway terms with our bound (10) for the central terms of the summation, we bound the right-hand side of equation (9) by

PrρPϵ[ρDk(Hρ)]\displaystyle\Pr_{\rho\sim P_{\epsilon}}[\rho\notin D_{k}(H\rho^{\intercal})] 2eN3ϵ+NkmaxjB{Prvμt[|vH|=j]2N(Nj)1}\displaystyle\leq 2e^{-\frac{\sqrt{N}}{3\epsilon}}+\frac{N}{k^{*}}\max_{j\in B}\left\{\Pr_{v\sim\mu_{t}}[|vH|=j]\cdot\frac{2^{N}}{\binom{N}{j}}-1\right\}
+2h(ϵ)N+N45kmaxjB{Prvμt[|vH|=j]22ϵNlog|12jN|}.+\frac{2^{h(\epsilon)N+N^{\frac{4}{5}}}}{k^{*}}\max_{j\notin B}\left\{\Pr_{v\sim\mu_{t}}\big{[}|vH|=j\big{]}\cdot 2^{2\epsilon N\log|1-\frac{2j}{N}|}\right\}.

6 The Weight Distribution of Transitive Linear Codes

We will now prove Theorem 1. We note that the bound we get is essentially tight, since for η(0,1)\eta\in(0,1) the repetition code

C={(z,,z)𝔽qN:z𝔽qηN}C=\left\{(z,...,z)\in\mathbb{F}_{q}^{N}:z\in\mathbb{F}_{q}^{\eta N}\right\}

is transitive, has dimension ηN\eta N, and has weight distribution

Prc𝒟(C)[|c|=αN]\displaystyle\Pr_{c\sim\mathcal{D}(C)}\Big{[}|c|=\alpha N\Big{]} =qηN(ηN(1α)ηN)(q1)αηN\displaystyle=q^{-\eta N}\cdot\binom{\eta N}{(1-\alpha)\eta N}(q-1)^{\alpha\eta N}
qηN8πe4ηN2h(α)ηNqαηNlogq(q1)\displaystyle\geq q^{-\eta N}\cdot\sqrt{\frac{8\pi}{e^{4}\eta N}}\cdot 2^{h(\alpha)\eta N}\cdot q^{\alpha\eta N\log_{q}(q-1)}
=8πe4ηNq(1hq(α))ηN\displaystyle=\sqrt{\frac{8\pi}{e^{4}\eta N}}\cdot q^{-(1-h_{q}(\alpha))\eta N}

for all α(0,1)\alpha\in(0,1). We recall and prove our Theorem 1 below:

Theorem.

Let C𝔽qNC\subseteq\mathbb{F}_{q}^{N} be a transitive linear code. Then for any α(0,11/q)\alpha\in(0,1-1/q) we have

Prc𝒟(C)[|c|=αN]q(1hq(α))dim C,\Pr_{c\sim\mathcal{D}(C)}\Big{[}|c|=\alpha N\Big{]}\leq q^{-(1-h_{q}(\alpha))\textnormal{dim }C},

where 𝒟(C)\mathcal{D}(C) is the uniform distribution over all codewords in CC, |c||c| is the number of non-zero coordinates of cc, and hqh_{q} is the q-ary entropy

hq(α)=(1α)logq11α+αlogqq1α.h_{q}(\alpha)=(1-\alpha)\log_{q}\frac{1}{1-\alpha}+\alpha\log_{q}\frac{q-1}{\alpha}.
Proof.

Let MM be the t×Nt\times N generator matrix of CC, and let r=rank M=dim Cr=\textnormal{rank }M=\textnormal{dim }C. Without loss of generality, suppose that the first rr columns of MM span the column-space of MM. Define

C(α)={cC:|c|=αN},C^{(\alpha)}=\{c\in C:|c|=\alpha N\},

and let Z=(Z1,,ZN)Z=(Z_{1},...,Z_{N}) be a uniformly random codeword in C(α)C^{(\alpha)}. Now CC is transitive, so for every j,k{1,,N}j,k\in\{1,...,N\} the random variables ZjZ_{j} and ZkZ_{k} are identically distributed. By linearity of expectation and by definition of C(α)C^{(\alpha)}, we thus have that for every j{1,,N}j\in\{1,...,N\},

PrZ𝒟(C(α))[Zj=0]=1α.\displaystyle\Pr_{Z\sim\mathcal{D}(C^{(\alpha)})}[Z_{j}=0]=1-\alpha. (12)

But under condition (12), ZjZ_{j} has maximal entropy when its remaining mass is uniformly distributed over {1,,q1}\{1,...,q-1\}, i.e. when Pr[Zj=m]=αq1\Pr[Z_{j}=m]=\frac{\alpha}{q-1} for all m{1,,q1}m\in\{1,...,q-1\}. The entropy of ZjZ_{j} is thus bounded by

𝖧Z𝒟(C(α))(Zj)\displaystyle\mathop{\mathsf{H}}_{Z\sim\mathcal{D}(C^{(\alpha)})}(Z_{j}) (1α)log11α+(q1)αq1logq1α\displaystyle\leq(1-\alpha)\log\frac{1}{1-\alpha}+(q-1)\cdot\frac{\alpha}{q-1}\log\frac{q-1}{\alpha}
=hq(α)log(q).\displaystyle=h_{q}(\alpha)\log(q). (13)
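
This maximum-entropy step can be sanity-checked numerically. The sketch below (q = 3 and α = 0.4 are illustrative choices) compares the entropy of arbitrary splits of the mass α against hq(α)log2(q)h_{q}(\alpha)\cdot\log_{2}(q):

import math
import random

# Among distributions on {0,...,q-1} with P(0) = 1 - alpha, entropy (in
# bits) is maximized by splitting the mass alpha evenly over the q-1
# nonzero symbols, where it equals h_q(alpha) * log2(q).
q, alpha = 3, 0.4

def H(p):  # Shannon entropy in bits
    return -sum(x * math.log2(x) for x in p if x > 0)

h_q = ((1 - alpha) * math.log(1 / (1 - alpha), q)
       + alpha * math.log((q - 1) / alpha, q))
best = h_q * math.log2(q)

random.seed(1)
for _ in range(10000):
    a = random.uniform(0, alpha)  # an arbitrary split of the mass alpha
    assert H([1 - alpha, a, alpha - a]) <= best + 1e-9
print("max-entropy bound verified:", best)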

We will now show that 𝖧(Zj|Z1,,Zj1)=0\mathsf{H}(Z_{j}|Z_{1},...,Z_{j-1})=0 for every j>rj>r. To this end, fix some j>rj>r. Recall that the columns {M1,,Mr}\{M_{1},...,M_{r}\} span the column-space of MM, so we can write the column MjM_{j} as Mj=k=1rβkMkM_{j}=\sum_{k=1}^{r}\beta_{k}M_{k} for some β1,,βr𝔽q\beta_{1},...,\beta_{r}\in\mathbb{F}_{q}. But any codeword cCc\in C can be expressed as v(c)Mv^{(c)}M for some v(c)𝔽qtv^{(c)}\in\mathbb{F}_{q}^{t}, so any codeword cCc\in C satisfies

cj=v(c)Mj=k=1rβkv(c)Mk=k=1rβkck.c_{j}=v^{(c)}M_{j}=\sum_{k=1}^{r}\beta_{k}v^{(c)}M_{k}=\sum_{k=1}^{r}\beta_{k}c_{k}.

The random variable ZjZ_{j} is thus determined by {Z1,,Zr}\{Z_{1},...,Z_{r}\}, and so we indeed have

𝖧Z𝒟(C(α))(Zj|Z1,,Zj1)=0\mathop{\mathsf{H}}_{Z\sim\mathcal{D}(C^{(\alpha)})}(Z_{j}|Z_{1},...,Z_{j-1})=0

for every j>rj>r. Applying (13) and the chain rule for entropy then gives

𝖧(Z)\displaystyle\mathsf{H}(Z) =𝖧(Z1)+i=2N𝖧(Zi|Z1,,Zi1)\displaystyle=\mathsf{H}(Z_{1})+\sum_{i=2}^{N}\mathsf{H}(Z_{i}|Z_{1},...,Z_{i-1})
i=1r𝖧(Zi)\displaystyle\leq\sum_{i=1}^{r}\mathsf{H}(Z_{i})
rhq(α)log(q).\displaystyle\leq r\cdot h_{q}(\alpha)\log(q).

Now ZZ is sampled uniformly from C(α)C^{(\alpha)}, so 𝖧(Z)=log(|C(α)|)\mathsf{H}(Z)=\log\Big{(}|C^{(\alpha)}|\Big{)}. We thus have

Prc𝒟(C)[|c|=αN]\displaystyle\Pr_{c\sim\mathcal{D}(C)}\Big{[}|c|=\alpha N\Big{]} =|C(α)|qr\displaystyle=\frac{\left|C^{(\alpha)}\right|}{q^{r}}
=2𝖧(Z)qr\displaystyle=2^{\mathsf{H}(Z)}\cdot q^{-r}
q(1hq(α))r.\displaystyle\leq q^{-(1-h_{q}(\alpha))\cdot r}.
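
As a concrete sanity check of Theorem 1, the following Python sketch enumerates a small cyclic (hence transitive) code, the [7,4] code generated by the shifts of x³+x+1, and verifies the weight bound for q = 2; the choice of code is illustrative:

import itertools
import math

# Verify Pr[|c| = alpha*N] <= q^{-(1 - h_q(alpha)) dim C} with q = 2 on a
# small cyclic code (cyclic codes are transitive).
N = 7
gen = (1, 1, 0, 1, 0, 0, 0)  # coefficients of x^3 + x + 1

def rot(v, s):  # cyclic shift by s positions
    return tuple(v[(i - s) % N] for i in range(N))

basis = [rot(gen, s) for s in range(4)]  # dim C = 7 - deg(g) = 4
code = {tuple(sum(a * b[i] for a, b in zip(co, basis)) % 2 for i in range(N))
        for co in itertools.product([0, 1], repeat=4)}

def h2(a):
    return -a * math.log2(a) - (1 - a) * math.log2(1 - a)

dim = round(math.log2(len(code)))  # = 4
for w in range(1, N):
    alpha = w / N
    if alpha >= 0.5:  # Theorem 1 requires alpha in (0, 1 - 1/q)
        break
    prob = sum(1 for c in code if sum(c) == w) / len(code)
    assert prob <= 2 ** (-(1 - h2(alpha)) * dim) + 1e-12, (w, prob)
print("Theorem 1 bound holds for every weight of this cyclic code")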

For Reed-Muller codes, we will abuse notation and denote by 𝒟(n,d)\mathcal{D}(n,d) the uniform distribution over all codewords in 𝖱𝖬(n,d)\mathsf{RM}(n,d).

Theorem 23.

For any n,d<n,n,d<n, and α(0,1)\alpha\in(0,1), the Reed-Muller code 𝖱𝖬(n,d)\mathsf{RM}(n,d) over the prime field 𝔽q\mathbb{F}_{q} satisfies

Prc𝒟(n,d)[|c|=αN]q(1hq(α))(nd).\Pr_{c\sim\mathcal{D}(n,d)}\Big{[}|c|=\alpha N\Big{]}\leq q^{-(1-h_{q}(\alpha))\cdot\binom{n}{\leq d}}.
Proof.

This follows immediately from Theorem 1, Fact 12, and Fact 11. ∎

7 List Decoding for Transitive Codes

We now turn to proving Theorem 5. Recall that in section 5 we bounded the minimum size for the decoding list of a linear code in terms of the weight distribution of its dual code. But as we mentioned in the preliminaries, the dual code of a transitive code is also transitive. For any transitive linear code CC, we can thus apply our Theorem 1 for the weight distribution of CC^{\perp} to get a bound on the size of the decoding list for CC. We restate and prove our Theorem 5 below.

Theorem.

Fix any ϵ(0,12)\epsilon\in(0,\frac{1}{2}), η(0,1)\eta\in(0,1), and N>(5ϵ)20N>\left(\frac{5}{\epsilon}\right)^{20}. Then any transitive linear code C𝔽2NC\subseteq\mathbb{F}_{2}^{N} of dimension dim C=ηN\textnormal{dim }C=\eta N can with high probability list-decode ϵ\epsilon-errors using a list TT of size

|T(x)|=2N5/6(24ϵηN+2ϵNlog(21η)).|T(x)|=2^{N^{5/6}}\cdot(2^{4\epsilon\eta N}+2^{\epsilon N\log(\frac{2}{1-\eta})}).
Proof.

We will show that there exists a function TT mapping every x𝔽2Nx\in\mathbb{F}_{2}^{N} to a subset T(x)CT(x)\subseteq C of size

|T(x)|=2N5/6(24ϵηN+2ϵNlog(21η)),|T(x)|=2^{N^{5/6}}\cdot(2^{4\epsilon\eta N}+2^{\epsilon N\log(\frac{2}{1-\eta})}),

with the property that for every codeword cCc\in C we have

PrρPϵ[cT(c+ρ)]2N.\displaystyle\Pr_{\rho\sim P_{\epsilon}}\big{[}c\notin T(c+\rho)\big{]}\leq\frac{2}{N}.

Let HH denote the parity-check matrix of CC. By Lemma 17, it is sufficient to show that for any N>(5ϵ)20N>\left(\frac{5}{\epsilon}\right)^{20} and any k>1k>1 we have

PrρPϵ[ρDk(Hρ)]\displaystyle\Pr_{\rho\sim P_{\epsilon}}[\rho\notin D_{k}(H\rho^{\intercal})] 1N+2N5/6Nk(24ϵηN+2ϵNlog(21η)).\displaystyle\leq\frac{1}{N}+\frac{2^{N^{5/6}}}{Nk}\cdot(2^{4\epsilon\eta N}+2^{\epsilon N\log(\frac{2}{1-\eta})}). (14)

We will thus prove (14). Recall that for k>Nk>N, Proposition 22 yields the following bound on the left-hand side of (14):

PrρPϵ[ρDk(Hρ)]\displaystyle\Pr_{\rho\sim P_{\epsilon}}[\rho\notin D_{k}(H\rho^{\intercal})] 2eN3ϵ+2NkmaxjB{Prvμt[|vH|=j]2N(Nj)}\displaystyle\leq 2e^{-\frac{\sqrt{N}}{3\epsilon}}+\frac{2N}{k}\max_{j\in B}\left\{\Pr_{v\sim\mu_{t}}[|vH|=j]\cdot\frac{2^{N}}{\binom{N}{j}}\right\}
+2h(ϵ)N+N45+1kmaxjB{Prvμt[|vH|=j]22ϵNlog|12jN|},+\frac{2^{h(\epsilon)N+N^{\frac{4}{5}}+1}}{k}\max_{j\notin B}\left\{\Pr_{v\sim\mu_{t}}\big{[}|vH|=j\big{]}\cdot 2^{2\epsilon N\log|1-\frac{2j}{N}|}\right\}, (15)

where β=12(12ϵ~(1ϵ~))\beta=\frac{1}{2}\left(1-2\sqrt{\tilde{\epsilon}(1-\tilde{\epsilon})}\right) for ϵ~=ϵ+N1/4\tilde{\epsilon}=\epsilon+N^{-1/4}, and B={βN,,(1β)N}B=\{\beta N,...,(1-\beta)N\}. Our goal will be to bound both the central terms jBj\in B and the faraway terms jBj\notin B by using our bounds on the weight distribution of transitive codes. As we’ve seen in section 2, the dual code CC^{\perp} is a transitive linear code of dimension Ndim CN-\textnormal{dim }C. By Theorem 1, we thus have that for all j{0,,N}j\in\{0,...,N\},

Prvμt[|vH|=j]2(1h(jN))(1η)N.\displaystyle\Pr_{v\sim\mu_{t}}\big{[}|vH|=j\big{]}\leq 2^{-(1-h(\frac{j}{N}))(1-\eta)N}. (16)

For any jBj\in B, we then have by Fact 13 that

Prvμt[|vH|=j]2N(Nj)\displaystyle\Pr_{v\sim\mu_{t}}\big{[}|vH|=j\big{]}\cdot\frac{2^{N}}{\binom{N}{j}} 2(1h(j/N))(1η)N2N8πe4N2h(j/N)N\displaystyle\leq 2^{-(1-h(j/N))(1-\eta)N}\cdot\frac{2^{N}}{\sqrt{\frac{8\pi}{e^{4}N}}\cdot 2^{h(j/N)N}}
=e4N8π2(1h(j/N))ηN.\displaystyle=\sqrt{\frac{e^{4}N}{8\pi}}\cdot 2^{(1-h(j/N))\eta N}.

But for jBj\in B we have β<jN<1β\beta<\frac{j}{N}<1-\beta, so the right-hand side is maximized at j=βNj=\beta N. Applying Lemma 14, we get

maxjB{Prvμt[|vH|=j]2N(Nj)}\displaystyle\max_{j\in B}\left\{\Pr_{v\sim\mu_{t}}\big{[}|vH|=j\big{]}\cdot\frac{2^{N}}{\binom{N}{j}}\right\} e4N8π2(1h(β))ηN\displaystyle\leq\sqrt{\frac{e^{4}N}{8\pi}}\cdot 2^{(1-h(\beta))\eta N}
e4N8π24ϵ~(1ϵ~)ηN.\displaystyle\leq\sqrt{\frac{e^{4}N}{8\pi}}\cdot 2^{4\tilde{\epsilon}(1-\tilde{\epsilon})\eta N}. (17)

We now turn to the faraway terms of equation (15). Since both the bound (16) and the factor |12jN||1-\frac{2j}{N}| are invariant under jNjj\mapsto N-j, it suffices to consider j<βNj<\beta N. By equation (16), we have

maxj<βN{Prvμt[|vH|=j]22ϵNlog(12jN)}maxδ<β{2(1h(δ))(1η)N22ϵNlog(12δ)}.\displaystyle\max_{j<\beta N}\left\{\Pr_{v\sim\mu_{t}}[|vH|=j]\cdot 2^{2\epsilon N\log(1-\frac{2j}{N})}\right\}\leq\max_{\delta<\beta}\left\{2^{-(1-h(\delta))(1-\eta)N}\cdot 2^{2\epsilon N\log(1-2\delta)}\right\}.

Note that by definition of β\beta, any δ<β\delta<\beta can be written as δ=12αϵ~(1ϵ~)2\delta=\frac{1-2\sqrt{\alpha\tilde{\epsilon}(1-\tilde{\epsilon})}}{2} for some α>1\alpha>1. By Lemma 14, we can then rewrite our previous expression as

maxj<βN{Prvμt[|vH|=j]22ϵNlog(12jN)}maxα>1{22αϵ~(1ϵ~)ln2(1η)N2ϵNlog(4αϵ~(1ϵ~))}.\displaystyle\max_{j<\beta N}\left\{\Pr_{v\sim\mu_{t}}[|vH|=j]\cdot 2^{2\epsilon N\log(1-\frac{2j}{N})}\right\}\leq\max_{\alpha>1}\left\{2^{-\frac{2\alpha\tilde{\epsilon}(1-\tilde{\epsilon})}{\ln 2}(1-\eta)N}\cdot 2^{\epsilon N\log(4\alpha\tilde{\epsilon}(1-\tilde{\epsilon}))}\right\}.

But for any positive constant cc, the derivative of log(α)cα\log(\alpha)-c\alpha is 1αln2c\frac{1}{\alpha\cdot\ln 2}-c, and the second derivative is always negative. Thus, the above expression achieves its maximum when α=ϵ2ϵ~(1ϵ~)(1η)\alpha=\frac{\epsilon}{2\tilde{\epsilon}(1-\tilde{\epsilon})(1-\eta)}. We then get

maxj<βN{Prvμt[|vH|=j]22ϵNlog(12jN)}\displaystyle\max_{j<\beta N}\left\{\Pr_{v\sim\mu_{t}}[|vH|=j]\cdot 2^{2\epsilon N\log(1-\frac{2j}{N})}\right\} 2ϵNln22ϵNlog(2ϵ1η)\displaystyle\leq 2^{-\frac{\epsilon N}{\ln 2}}\cdot 2^{\epsilon N\log(\frac{2\epsilon}{1-\eta})}
2h(ϵ)N2ϵNlog(21η),\displaystyle\leq 2^{-h(\epsilon)N}\cdot 2^{\epsilon N\log(\frac{2}{1-\eta})}, (18)

where in the last line we used the inequality log(1x)x(1x)ln2\log(1-x)\geq-\frac{x}{(1-x)\ln 2} for x<1x<1 to get h(ϵ)ϵlog(ϵ)+ϵln2h(\epsilon)\leq-\epsilon\log(\epsilon)+\frac{\epsilon}{\ln 2}. We now use equations (17) and (18) to bound the central and faraway terms of (15) respectively. This gives

PrρPϵ[ρDk(Hρ)]\displaystyle\Pr_{\rho\sim P_{\epsilon}}[\rho\notin D_{k}(H\rho^{\intercal})] 2eN3ϵ+2Nke4N8π24ϵ~(1ϵ~)ηN+2N45+1k2ϵNlog(21η)\displaystyle\leq 2e^{-\frac{\sqrt{N}}{3\epsilon}}+\frac{2N}{k}\cdot\sqrt{\frac{e^{4}N}{8\pi}}\cdot 2^{4\tilde{\epsilon}(1-\tilde{\epsilon})\eta N}+\frac{2^{N^{\frac{4}{5}}+1}}{k}\cdot 2^{\epsilon N\log(\frac{2}{1-\eta})}
1N+2N5/6Nk(24ϵηN+2ϵNlog(21η)).\displaystyle\leq\frac{1}{N}+\frac{2^{N^{5/6}}}{Nk}\cdot(2^{4\epsilon\eta N}+2^{\epsilon N\log(\frac{2}{1-\eta})}).

We have shown (14), and so we are done. ∎
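
The inner maximization over α in the preceding proof can also be verified numerically; a small sketch with illustrative values of ϵ and η (ϵ̃ is approximated by a fixed stand-in for ϵ + N^{-1/4}, and the exponent is taken per N):

import math

# Over alpha > 1, the per-N exponent
#   g(alpha) = -(2 alpha e~(1-e~)/ln 2)(1-eta) + eps * log2(4 alpha e~(1-e~))
# should peak at alpha* = eps / (2 e~ (1-e~)(1-eta)).
eps, eta = 0.1, 0.5
eps_t = eps + 1e-3  # stand-in for eps + N^{-1/4}
p = eps_t * (1 - eps_t)

def g(alpha):
    return -(2 * alpha * p / math.log(2)) * (1 - eta) + eps * math.log2(4 * alpha * p)

alpha_star = eps / (2 * p * (1 - eta))
grid = [1 + i * 1e-3 for i in range(20000)]
best = max(grid, key=g)
assert abs(alpha_star - best) < 1e-2, (alpha_star, best)
print("maximizer confirmed:", alpha_star)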

8 List Decoding for Reed-Muller Codes

We will now turn to proving our list-decoding bounds for Reed-Muller codes. The dual code of the Reed-Muller code 𝖱𝖬(n,d)\mathsf{RM}(n,d) is the code 𝖱𝖬(n,nd1)\mathsf{RM}(n,n-d-1), so we can apply Samorodnitsky’s Theorem 6 to our Proposition 22. We restate and prove our Theorem 4 below.

Theorem.

Let ϵ(0,12)\epsilon\in(0,\frac{1}{2}) and γ(0,1)\gamma\in(0,1) be such that 1γ22ϵ(ln2)21-\gamma\geq 2^{-\frac{2\epsilon}{(\ln 2)^{2}}}. Then the Reed-Muller code 𝖱𝖬(n,d)\mathsf{RM}(n,d) of dimension (nd)=(1γ)N\binom{n}{\leq d}=(1-\gamma)N can with high probability list-decode ϵ\epsilon-errors using a list TT of size

|T|=2h(ϵ)NγN+o(N)+24ϵN+o(N).|T|=2^{h(\epsilon)N-\gamma N+o(N)}+2^{4\epsilon N+o(N)}.
Proof.

We will show that there exists a function TT mapping every x𝔽2Nx\in\mathbb{F}_{2}^{N} to a subset T(x)𝖱𝖬(n,d)T(x)\subseteq\mathsf{RM}(n,d) of size

|T|=2h(ϵ)NγN+o(N)+24ϵN+o(N),|T|=2^{h(\epsilon)N-\gamma N+o(N)}+2^{4\epsilon N+o(N)},

with the property that for every codeword c𝖱𝖬(n,d)c\in\mathsf{RM}(n,d) we have

PrρPϵ[cT(c+ρ)]2N.\displaystyle\Pr_{\rho\sim P_{\epsilon}}\big{[}c\notin T(c+\rho)\big{]}\leq\frac{2}{N}.

Let HH denote the parity-check matrix of 𝖱𝖬(n,d)\mathsf{RM}(n,d). By Lemma 17, it is sufficient to show that for any N>(5ϵ)20N>\left(\frac{5}{\epsilon}\right)^{20} and any k>1k>1 we have

PrρPϵ[ρDk(Hρ)]\displaystyle\Pr_{\rho\sim P_{\epsilon}}[\rho\notin D_{k}(H\rho^{\intercal})] 1N+2o(N)kN(24ϵN+2h(ϵ)N(1η)N).\displaystyle\leq\frac{1}{N}+\frac{2^{o(N)}}{kN}\left(2^{4\epsilon N}+2^{h(\epsilon)N-(1-\eta)N}\right). (19)

We will thus prove (19). Recall that for k>Nk>N, Proposition 22 yields the following bound on the left-hand side of (19):

PrρPϵ[ρDk(Hρ)]\displaystyle\Pr_{\rho\sim P_{\epsilon}}[\rho\notin D_{k}(H\rho^{\intercal})] 2eN3ϵ+2NkmaxjB{Prvμt[|vH|=j]2N(Nj)}\displaystyle\leq 2e^{-\frac{\sqrt{N}}{3\epsilon}}+\frac{2N}{k}\max_{j\in B}\left\{\Pr_{v\sim\mu_{t}}\big{[}|vH|=j\big{]}\cdot\frac{2^{N}}{\binom{N}{j}}\right\}
+2h(ϵ)N+N45+1kmaxjB{Prvμt[|vH|=j]22ϵNlog|12jN|},\displaystyle+\frac{2^{h(\epsilon)N+N^{\frac{4}{5}}+1}}{k}\max_{j\notin B}\left\{\Pr_{v\sim\mu_{t}}\big{[}|vH|=j\big{]}\cdot 2^{2\epsilon N\log|1-\frac{2j}{N}|}\right\}, (20)

where β=12(12ϵ~(1ϵ~))\beta=\frac{1}{2}\left(1-2\sqrt{\tilde{\epsilon}(1-\tilde{\epsilon})}\right) for ϵ~=ϵ+N1/4\tilde{\epsilon}=\epsilon+N^{-1/4}, and B={βN,,(1β)N}B=\{\beta N,...,(1-\beta)N\}. Our goal is to bound every term in these sums by using the weight distribution bounds given in Theorems 1 and 6. We bound the central terms in exactly the same way as in Theorem 5: by Theorem 23 we know that the weight distribution of the dual code 𝖱𝖬(n,nd1)\mathsf{RM}(n,n-d-1) satisfies

Prvμt[|vH|=j]2(1h(jN))(1η)N,\displaystyle\Pr_{v\sim\mu_{t}}\big{[}|vH|=j\big{]}\leq 2^{-(1-h(\frac{j}{N}))(1-\eta)N},

so by Fact 13 we have

maxjB{Prvμt[|vH|=j]2N(Nj)}\displaystyle\max_{j\in B}\left\{\Pr_{v\sim\mu_{t}}\big{[}|vH|=j\big{]}\cdot\frac{2^{N}}{\binom{N}{j}}\right\} maxjB{2(1h(j/N))(1η)N2N8πe4N2h(j/N)N}\displaystyle\leq\max_{j\in B}\left\{2^{-(1-h(j/N))(1-\eta)N}\cdot\frac{2^{N}}{\sqrt{\frac{8\pi}{e^{4}N}}\cdot 2^{h(j/N)N}}\right\}
=maxjB{e4N8π2(1h(j/N))ηN}.\displaystyle=\max_{j\in B}\left\{\sqrt{\frac{e^{4}N}{8\pi}}\cdot 2^{(1-h(j/N))\eta N}\right\}.

But B={βN,,(1β)N}B=\{\beta N,...,(1-\beta)N\}, so by Lemma 14 we have

maxjB{Prvμt[|vH|=j]2N(Nj)}\displaystyle\max_{j\in B}\left\{\Pr_{v\sim\mu_{t}}\big{[}|vH|=j\big{]}\cdot\frac{2^{N}}{\binom{N}{j}}\right\} e4N8π2(1h(β))ηN\displaystyle\leq\sqrt{\frac{e^{4}N}{8\pi}}\cdot 2^{(1-h(\beta))\eta N}
e4N8π24ϵ~(1ϵ~)ηN.\displaystyle\leq\sqrt{\frac{e^{4}N}{8\pi}}\cdot 2^{4\tilde{\epsilon}(1-\tilde{\epsilon})\eta N}. (21)

For the faraway terms, we use the weight bound from Theorem 6. By symmetry, we get that

maxjB{Prvμt[|vH|=j]22ϵNlog|12jN|}\displaystyle\max_{j\notin B}\left\{\Pr_{v\sim\mu_{t}}[|vH|=j]\cdot 2^{2\epsilon N\log|1-\frac{2j}{N}|}\right\} 2o(N)maxjN2{2(1η)N(1η)2jln222ϵNlog|12jN|}\displaystyle\leq 2^{o(N)}\cdot\max_{j\leq\frac{N}{2}}\left\{2^{-(1-\eta)N}\left(\frac{1}{\eta}\right)^{2j\ln 2}\cdot 2^{2\epsilon N\log|1-\frac{2j}{N}|}\right\}
=2o(N)2(1η)NmaxjN2{22jln2log(η)+2ϵNlog(12jN)}.\displaystyle=2^{o(N)}\cdot 2^{-(1-\eta)N}\max_{j\leq\frac{N}{2}}\left\{2^{-2j\ln 2\cdot\log(\eta)+2\epsilon N\log(1-\frac{2j}{N})}\right\}. (22)

Now the function

g(j)=2jln2log(η)+2ϵNlog(12jN)g(j)=-2j\ln 2\cdot\log(\eta)+2\epsilon N\log(1-\frac{2j}{N})

has first derivative

dgdj=2ln2log(η)4ϵln2(12jN),\frac{dg}{dj}=-2\ln 2\cdot\log(\eta)-\frac{4\epsilon}{\ln 2\cdot(1-\frac{2j}{N})},

and second derivative

d2gdj2=8ϵln2N(12jN)2<0.\frac{d^{2}g}{dj^{2}}=-\frac{8\epsilon}{\ln 2\cdot N(1-\frac{2j}{N})^{2}}<0.

Thus g(j)g(j) achieves its maximum at j=N2+ϵN(ln2)2log(η)j^{*}=\frac{N}{2}+\frac{\epsilon N}{(\ln 2)^{2}\log(\eta)} and is decreasing over [j,N2][j^{*},\frac{N}{2}]. Whenever η>22ϵ(ln2)2\eta>2^{-\frac{2\epsilon}{(\ln 2)^{2}}}, we have j0j^{*}\leq 0; in that case the argument in equation (22) is maximized at j=0j=0, and we get

maxjB{Prvμt[|vH|=j]22ϵNlog|12jN|}\displaystyle\max_{j\notin B}\left\{\Pr_{v\sim\mu_{t}}[|vH|=j]\cdot 2^{2\epsilon N\log|1-\frac{2j}{N}|}\right\} 2(1η)N+o(N).\displaystyle\leq 2^{-(1-\eta)N+o(N)}.
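
The threshold just used can be checked numerically; a short sketch with illustrative values of ϵ and η:

import math

# The critical point j* = N/2 + eps*N/((ln 2)^2 * log2(eta)) of g(j) is
# nonpositive exactly when eta >= 2^{-2 eps/(ln 2)^2}, so that g is
# maximized at j = 0 on [0, N/2).
N = 10 ** 6
for eps in (0.05, 0.2):
    threshold = 2 ** (-2 * eps / math.log(2) ** 2)
    for eta in (0.5 * threshold, threshold, 0.5 + 0.5 * threshold):
        j_star = N / 2 + eps * N / (math.log(2) ** 2 * math.log2(eta))
        assert (j_star <= 1e-9 * N) == (eta >= threshold - 1e-12), (eps, eta)
print("threshold for j* <= 0 confirmed")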

Combining this bound for the faraway terms with the bound (21) for the central terms, we bound the right-hand side of (20) by

PrρPϵ[ρDk(Hρ)]\displaystyle\Pr_{\rho\sim P_{\epsilon}}[\rho\notin D_{k}(H\rho^{\intercal})] 1N+2o(N)kN(24ϵηN+2h(ϵ)N(1η)N).\displaystyle\leq\frac{1}{N}+\frac{2^{o(N)}}{kN}\left(2^{4\epsilon\eta N}+2^{h(\epsilon)N-(1-\eta)N}\right).

We have shown (19), and so we are done. ∎

Acknowledgements

We thank Alexander Barg, Paul Beame, Noam Elkies, Amir Shpilka, Madhu Sudan and Amir Yehudayoff for useful discussions.

Appendix A Weight Bounds Comparisons

In this section, we will compare our Theorem 23 with previously known bounds on the weight distribution of Reed-Muller codes. We recall our Theorem 23 below. Note that throughout this section, 𝒟(n,d)\mathcal{D}(n,d) will denote the uniform distribution over all codewords in 𝖱𝖬(n,d)\mathsf{RM}(n,d), and |c||c| will denote the number of non-zero coordinates of cc.

Theorem.

For any n,d<n,n,d<n, and α(0,1)\alpha\in(0,1), the Reed-Muller code 𝖱𝖬(n,d)\mathsf{RM}(n,d) over the prime field 𝔽q\mathbb{F}_{q} satisfies

Prc𝒟(n,d)[|c|=αN]q(1hq(α))(nd),\Pr_{c\sim\mathcal{D}(n,d)}\Big{[}|c|=\alpha N\Big{]}\leq q^{-(1-h_{q}(\alpha))\cdot\binom{n}{\leq d}},

where we have defined

hq(α)=(1α)logq11α+αlogqq1α.h_{q}(\alpha)=(1-\alpha)\log_{q}\frac{1}{1-\alpha}+\alpha\log_{q}\frac{q-1}{\alpha}.

Reed-Muller codes over odd prime fields
We start with Reed-Muller codes over odd prime fields, for which the only preexisting weight bound we are aware of is the following result of [BGY20]:

Theorem 24 ([BGY20]).

For any 0<δ<120<\delta<\frac{1}{2}, there are constants c1,c2>0c_{1},c_{2}>0 such that for any odd prime qq and for any integers d,nd,n such that dδnd\leq\delta n, we have

Prc𝒟(n,d)[|c|N11qqc1nd]qc2(nd).\Pr_{c\sim\mathcal{D}(n,d)}\Big{[}\frac{|c|}{N}\leq 1-\frac{1}{q}-q^{-c_{1}\frac{n}{d}}\Big{]}\leq q^{-c_{2}\binom{n}{\leq d}}.

This was a generalization of [BHL12], who proved the same result for Reed-Muller codes over 𝔽2\mathbb{F}_{2}. Theorem 24 is very strong for small degrees, but gets weaker as the degree increases. When dd is linear in nn we have qc1nd=Θ(1)q^{-c_{1}\frac{n}{d}}=\Theta(1), meaning that in this regime Theorem 24 can only give a nontrivial bound on normalized weights that are at least a constant away from 11q1-\frac{1}{q}. Our Theorem 23 gives nontrivial bounds for all normalized weights <11q<1-\frac{1}{q}, for all degrees d<nd<n.

Reed-Muller codes over 𝔽2\mathbb{F}_{2}
We now turn to Reed-Muller codes over 𝔽2\mathbb{F}_{2}, for which more results are known. The same bound as Theorem 24 was proven over 𝔽2\mathbb{F}_{2} by [BHL12]. For comparison with our Theorem 23, see the discussion above.

In the constant-rate regime (i.e. d=n2±O(n)d=\frac{n}{2}\pm O(\sqrt{n})), the strongest known bounds for constant weights are the following two results of [Sam20]:

Theorem 25 ([Sam20]).

Let (nd)=η2n=ηN\binom{n}{\leq d}=\eta 2^{n}=\eta N for some η(0,1)\eta\in(0,1). Then for any α(0,12)\alpha\in(0,\frac{1}{2}) we have

Prc𝒟(n,d)[|c|αN]2o(N)(11η)2ln2αN2ηN.\Pr_{c\sim\mathcal{D}(n,d)}[|c|\leq\alpha N]\leq 2^{o(N)}\left(\frac{1}{1-\eta}\right)^{2\ln 2\cdot\alpha N}2^{-\eta N}.

This result is strong when α\alpha is away from 1/21/2. For α\alpha close to 1/21/2, the following bound is stronger.

Theorem 26 ([Sam20]).

Let (nd)=η2n=ηN\binom{n}{\leq d}=\eta 2^{n}=\eta N for some η(0,1)\eta\in(0,1), and define A={1η2ln22,,12}A=\{\frac{1-\eta^{2\ln 2}}{2},...,\frac{1}{2}\}. Then for any α(0,12)\alpha\in(0,\frac{1}{2}),

Prc𝒟(n,d)[|c|αN]2o(N){(NαN)2Nif αA,1(1η2ln2)αN(1+η2ln2)(1α)Notherwise.\Pr_{c\sim\mathcal{D}(n,d)}[|c|\leq\alpha N]\leq 2^{o(N)}\cdot\begin{cases}\frac{\binom{N}{\alpha N}}{2^{N}}&\text{if $\alpha\in A$,}\\ \frac{1}{(1-\eta^{2\ln 2})^{\alpha N}(1+\eta^{2\ln 2})^{(1-\alpha)N}}&\text{otherwise.}\end{cases}

We note that the combination of Theorems 25 and 26 is stronger than our Theorem 23 whenever both the rate of the code and the normalized weight of the codeword are constant (i.e. α=Θ(1)\alpha=\Theta(1) and d=n2±O(n)d=\frac{n}{2}\pm O(\sqrt{n})).

However, when the normalized weight is subconstant or when the degree is away from n2\frac{n}{2} (i.e. α=o(1)\alpha=o(1) or d=n2Θ(n)d=\frac{n}{2}-\Theta(n)), the 2o(N)2^{o(N)} term becomes too large for Theorems 25 and 26 to give a strong bound. An approach that has been fairly successful in these two regimes (subconstant rate or subconstant weight) is the line of work of [KLP12, ASW15, SS20]. To our knowledge, the strongest results for these regimes are due to [SS20]. We start with their bound for lower weights, i.e. for weights in [0,N4].[0,\frac{N}{4}].

Theorem 27 ([SS20]).

For any integers j,n,dj,n,d, we have

Prc𝒟(n,d)[|c|2j2n]\displaystyle\Pr_{c\sim\mathcal{D}(n,d)}[|c|\leq 2^{-j}\cdot 2^{n}] 2(117(j1dn+2dn(1dn)2)(dn)j1)(nd)+O(n4).\displaystyle\leq 2^{-\big{(}1-17(\frac{j}{1-\frac{d}{n}}+\frac{2-\frac{d}{n}}{(1-\frac{d}{n})^{2}})(\frac{d}{n})^{j-1}\big{)}\binom{n}{\leq d}+O(n^{4})}.

We claim that for every d>n34d>\frac{n}{34}, there is some weight threshold Ad<14A_{d}<\frac{1}{4} for which our Theorem 23 is stronger than Theorem 27 for all weights larger than AdNA_{d}N. One way to see this is to note that our Theorem 23 satisfies

Pr[|c|2j2n]\displaystyle Pr[|c|\leq 2^{-j}\cdot 2^{n}] 2(1h(2j))(nd)\displaystyle\leq 2^{-\big{(}1-h(2^{-j})\big{)}\binom{n}{\leq d}}
2(12j2j)(nd),\displaystyle\leq 2^{-(1-2j\cdot 2^{-j})\binom{n}{\leq d}},

while the expression in Theorem 27 satisfies

2(117(j1dn+2dn(1dn)2)(dn)j1)(nd)2(117j(dn)j1)(nd).2^{-\big{(}1-17(\frac{j}{1-\frac{d}{n}}+\frac{2-\frac{d}{n}}{(1-\frac{d}{n})^{2}})(\frac{d}{n})^{j-1}\big{)}\binom{n}{\leq d}}\geq 2^{-\big{(}1-17j(\frac{d}{n})^{j-1}\big{)}\binom{n}{\leq d}}.

Thus our Theorem 23 is stronger than Theorem 27 whenever j2(j1)<17j(dn)j1j\cdot 2^{-(j-1)}<17j\cdot(\frac{d}{n})^{j-1}, i.e. whenever

j<log17logn2d+1.j<\frac{\log 17}{\log\frac{n}{2d}}+1.

For any d>n34d>\frac{n}{34}, this gives a nontrivial range.
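
For concreteness, the crossover can be evaluated for a few degree ratios d/n (a sketch; the ratios are illustrative):

import math

# Theorem 23 beats Theorem 27 for 2 <= j < log(17)/log(n/(2d)) + 1.
for ratio in (1 / 34 + 1e-3, 0.1, 0.25, 0.45):
    jmax = math.log2(17) / math.log2(1 / (2 * ratio)) + 1
    print(f"d/n = {ratio:.3f}: nontrivial for 2 <= j < {jmax:.2f}")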

This concludes our comparison of Theorem 23 with Theorem 27, which was the bound of [SS20] for weights in [0,N4].[0,\frac{N}{4}]. We now turn to their bounds for larger weights.

Theorem 28 ([SS20]).

Let j,nj,n\in\mathbb{N} and let 0<γ(n)<12Ω(lognn)0<\gamma(n)<\frac{1}{2}-\Omega\left(\sqrt{\frac{\log n}{n}}\right) be a parameter (which may be constant or depend on nn) such that j+log112γ(12γ)2=o(n).\frac{j+\log\frac{1}{1-2\gamma}}{(1-2\gamma)^{2}}=o(n). Then

Prc𝒟(n,γn)[|c|12j2N]22c(γ,j)(nd)+O(n4),\Pr_{c\sim\mathcal{D}(n,\gamma n)}[|c|\leq\frac{1-2^{-j}}{2}N]\leq 2^{-2^{-c(\gamma,j)}\binom{n}{\leq d}+O(n^{4})},

where c(γ,j)=O(γ2j+γlog112γ12γ+γ)c(\gamma,j)=O\left(\frac{\gamma^{2}j+\gamma\log\frac{1}{1-2\gamma}}{1-2\gamma}+\gamma\right).

This bound holds when the degree is smaller than n2\frac{n}{2}. For arbitrary degree, [SS20] gives the following:

Theorem 29 ([SS20]).

For any integers n,dn,d and any δ>0\delta>0, we have

Prc𝒟(n,d)[|c|1δ2N]\displaystyle\Pr_{c\sim\mathcal{D}(n,d)}[|c|\leq\frac{1-\delta}{2}N] eδ222d.\displaystyle\leq e^{-\frac{\delta^{2}}{2}\cdot 2^{d}}.

We will start by comparing our Theorem 23 with Theorem 29. Applying Lemma 14, we get from Theorem 23 that

Prc𝒟(n,d)[|c|1δ2N]\displaystyle\Pr_{c\sim\mathcal{D}(n,d)}[|c|\leq\frac{1-\delta}{2}N] 2(1h(1δ2))(nd)\displaystyle\leq 2^{-(1-h(\frac{1-\delta}{2}))\cdot\binom{n}{\leq d}}
eδ22(nd).\displaystyle\leq e^{-\frac{\delta^{2}}{2}\cdot\binom{n}{\leq d}}.

Thus our Theorem 23 is strictly stronger than Theorem 29 for all d<nd<n. We will now compare our Theorem 23 with Theorem 28. Applying Lemma 14, we get from Theorem 23 that

Prc𝒟(n,d)[|c|12j2N]\displaystyle\Pr_{c\sim\mathcal{D}(n,d)}[|c|\leq\frac{1-2^{-j}}{2}N] 2(1h(12j2))(nd)\displaystyle\leq 2^{-(1-h(\frac{1-2^{-j}}{2}))\cdot\binom{n}{\leq d}}
222j2ln2(nd).\displaystyle\leq 2^{-\frac{2^{-2j}}{2\ln 2}\cdot\binom{n}{\leq d}}.

It follows that our Theorem 23 is stronger than Theorem 28 whenever 2(2j+1)2c(γ,j)2^{-(2j+1)}\geq 2^{-c(\gamma,j)}, i.e. whenever

2j+1c(γ,j).\displaystyle 2j+1\leq c(\gamma,j).

But c(γ,j):=O(γ212γj+γlog112γ12γ+γ)c(\gamma,j):=O\left(\frac{\gamma^{2}}{1-2\gamma}\cdot j+\frac{\gamma\log\frac{1}{1-2\gamma}}{1-2\gamma}+\gamma\right), and γ212γ\frac{\gamma^{2}}{1-2\gamma}\rightarrow\infty as γ1/2\gamma\rightarrow 1/2. Thus there exists some constant γ(0,12)\gamma^{*}\in(0,\frac{1}{2}) such that our Theorem 23 is stronger than Theorem 28 whenever d>γnd>\gamma^{*}n. In private correspondence with Amir Shpilka and Ori Sberlo, we learned that γ\gamma^{*} can be computed to be γ0.38\gamma^{*}\approx 0.38.

Appendix B Proof of Corollary 3

Recall that for any ϵ(0,1)\epsilon\in(0,1) we defined

Aϵ={αN:h(α)>1h(ϵ)N1/5},A_{\epsilon}=\{\alpha N:h(\alpha)>1-h(\epsilon)-N^{-1/5}\},

and that for any code CC we denote by 𝒟(C)\mathcal{D}(C^{\perp}) the uniform distribution over the dual code CC^{\perp}. We now restate and prove our Corollary 3.

Corollary.

Let C𝔽2NC\subseteq\mathbb{F}_{2}^{N} be a linear code, and let ϵ(0,12)\epsilon\in(0,\frac{1}{2}) be arbitrary. Suppose that for every jAϵj\in A_{\epsilon} we have

Pry𝒟(C)[|y|=j](1+o(N1))(Nj)2N,\displaystyle\Pr_{y\sim\mathcal{D}(C^{\perp})}\big{[}|y|=j\big{]}\leq\big{(}1+o(N^{-1})\big{)}\frac{\binom{N}{j}}{2^{N}},

and suppose that

Pry𝒟(C)[|y|Aϵ]2N34iAϵ(Ni)2N.\displaystyle\Pr_{y\sim\mathcal{D}(C^{\perp})}\big{[}|y|\notin A_{\epsilon}\big{]}\leq 2^{N^{\frac{3}{4}}}\cdot\frac{\sum_{i\notin A_{\epsilon}}\binom{N}{i}}{2^{N}}.

Then CC is resilient to ϵ\epsilon-errors.

Proof.

From Theorem 2, we know that there exists some decoder d:𝔽2NCd:\mathbb{F}_{2}^{N}\rightarrow C such that for all cCc\in C,

PrρPϵ[d(c+ρ)c]\displaystyle\Pr_{\rho\sim P_{\epsilon}}[d(c+\rho)\neq c] 2eN3ϵ+NmaxS{ϵN±N3/4}1|S|2{1(NS)j=0NPrcC[|c|=j]KS(j)21},\displaystyle\leq 2e^{-\frac{\sqrt{N}}{3\epsilon}}+N\mathop{\max}_{\begin{subarray}{c}S\subseteq\{\epsilon N\pm N^{3/4}\}\\ 1\leq|S|\leq 2\end{subarray}}\Big{\{}\frac{1}{\binom{N}{S}}\sum_{j=0}^{N}\Pr_{c\sim C^{\perp}}\big{[}|c|=j\big{]}K_{S}(j)^{2}-1\Big{\}}, (23)

where (NS)=sS(Ns)\binom{N}{S}=\sum_{s\in S}\binom{N}{s} and where KS=sSKsK_{S}=\sum_{s\in S}K_{s} for KsK_{s} the Krawtchouk polynomial of degree ss. Let ν\nu be such that h(ν)=1h(ϵ)N1/5h(\nu)=1-h(\epsilon)-N^{-1/5}, and define the set of weights

Aϵ={νN,,(1ν)N}.A_{\epsilon}=\{\nu N,...,(1-\nu)N\}.

We will start by bounding the central terms jAϵj\in A_{\epsilon} in equation (23). Applying Proposition 7 and the first condition in our theorem statement, we immediately get that for any S{0,,N}S\subseteq\{0,...,N\},

1(NS)jAϵPrcC[|c|=j]KS(j)2\displaystyle\frac{1}{\binom{N}{S}}\sum_{j\in A_{\epsilon}}\Pr_{c\sim C^{\perp}}\big{[}|c|=j\big{]}K_{S}(j)^{2} 1+o(1N).\displaystyle\leq 1+o\big{(}\frac{1}{N}\big{)}. (24)

We now turn to the faraway terms jAϵj\notin A_{\epsilon}. For these we note that by definition of Krawtchouk polynomials, for any integer ss we have

Ks(x)=j=0s(1)j(xj)(Nxsj)j=0s(xj)(Nxsj)=(Ns).\displaystyle K_{s}(x)=\sum_{j=0}^{s}(-1)^{j}\binom{x}{j}\binom{N-x}{s-j}\leq\sum_{j=0}^{s}\binom{x}{j}\binom{N-x}{s-j}=\binom{N}{s}.

For any S{0,,N}S\subseteq\{0,...,N\}, we can then bound the faraway terms jAϵj\notin A_{\epsilon} of equation (23) by

1(NS)jAϵPrcC[|c|=j]KS(j)2\displaystyle\frac{1}{\binom{N}{S}}\sum_{j\notin A_{\epsilon}}\Pr_{c\sim C^{\perp}}\big{[}|c|=j\big{]}K_{S}(j)^{2} (NS)Pr[|y|Aϵ].\displaystyle\leq\binom{N}{S}\Pr\big{[}|y|\notin A_{\epsilon}\big{]}.

Applying the second condition in our theorem statement in combination with Fact 13 and the subadditivity of entropy, we get that

maxS{ϵN±N3/4}1|S|2{1(NS)jAϵPrcC[|c|=j]KS(j)2}\displaystyle\mathop{\max}_{\begin{subarray}{c}S\subseteq\{\epsilon N\pm N^{3/4}\}\\ 1\leq|S|\leq 2\end{subarray}}\Big{\{}\frac{1}{\binom{N}{S}}\sum_{j\notin A_{\epsilon}}\Pr_{c\sim C^{\perp}}\big{[}|c|=j\big{]}K_{S}(j)^{2}\Big{\}} 2(NϵN+N3/4)22h(ϵ)NN4/5+N3/4\displaystyle\leq 2\binom{N}{\epsilon N+N^{3/4}}\cdot 2\cdot 2^{-h(\epsilon)N-N^{4/5}+N^{3/4}}
42h(ϵ)N+h(N1/4)N2h(ϵ)NN4/5+N3/4\displaystyle\leq 4\cdot 2^{h(\epsilon)N+h(N^{-1/4})N}\cdot 2^{-h(\epsilon)N-N^{4/5}+N^{3/4}}
o(1N).\displaystyle\leq o(\frac{1}{N}).

Combining this bound for the faraway terms with our bound (24) for the central terms, we bound equation (23) by

PrρPϵ[d(c+ρ)c]\displaystyle\Pr_{\rho\sim P_{\epsilon}}[d(c+\rho)\neq c] 2eN3ϵ+No(1N)\leq 2e^{-\frac{\sqrt{N}}{3\epsilon}}+N\cdot o\big{(}\frac{1}{N}\big{)}
o(1).\displaystyle\leq o(1).

Appendix C Lower Bounds on List Decoding

Claim 30.

Let ϵ(0,12)\epsilon\in(0,\frac{1}{2}) be arbitrary, and consider any N>10ϵ2N>\frac{10}{\epsilon^{2}}. Suppose a code C𝔽2NC\subseteq\mathbb{F}_{2}^{N} and a decoder dk:𝔽2NCkd_{k}:\mathbb{F}_{2}^{N}\rightarrow C^{\otimes k} satisfy

PrρPϵc𝒟(𝒞)[cdk(c+ρ)]34,\mathop{\Pr}_{\begin{subarray}{c}\rho\sim P_{\epsilon}\\ c\sim\mathcal{D(C)}\end{subarray}}[c\in d_{k}(c+\rho)]\geq\frac{3}{4},

for PϵP_{\epsilon} the ϵ\epsilon-noisy distribution and 𝒟(𝒞)\mathcal{D(C)} the uniform distribution on CC. Then we must have

k|C|2(1h(ϵ))N2h(ϵ)N3/48.k\geq|C|\cdot 2^{-(1-h(\epsilon))N}\cdot\frac{2^{-h(\epsilon)N^{3/4}}}{8}.
Proof.

We will first show that in order for the decoder dkd_{k} to succeed with high probability, there must be many codewords cCc\in C for which

|{x𝔽2N:cdk(x)}|2h(ϵ)N.|\{x\in\mathbb{F}_{2}^{N}:c\in d_{k}(x)\}|\gtrsim 2^{h(\epsilon)N}.

Intuitively this is because the sphere of radius ϵN\epsilon N around any codeword cc contains 2h(ϵ)N\approx 2^{h(\epsilon)N} points (and for any sent codeword cc, with high probability the received message mm will satisfy |m+c|ϵN|m+c|\approx\epsilon N). We will then simply double-count the number of pairs (x,c)(x,c) for which cdk(x)c\in d_{k}(x). On the one hand, there are 2Nk2^{N}\cdot k such pairs, since every received message is mapped to kk codewords; on the other hand, there must be at least about |C|2h(ϵ)N|C|\cdot 2^{h(\epsilon)N} pairs, since as we’ve just argued most codewords in CC need to be matched to at least 2h(ϵ)N\approx 2^{h(\epsilon)N} points. It follows that we must have

k|C|2h(ϵ)N2N.k\gtrsim|C|\cdot\frac{2^{h(\epsilon)N}}{2^{N}}.

Formally, we first note that the theorem condition implies that at least |C|2\frac{|C|}{2} codewords cCc\in C must satisfy

PrρPϵ[cdk(c+ρ)]12.\displaystyle\Pr_{\rho\sim P_{\epsilon}}[c\in d_{k}(c+\rho)]\geq\frac{1}{2}. (25)

Fix any such cc. Now from Chernoff’s bound (i.e. Lemma 16), we have for NN large enough that

PrρPϵ[|ρ|ϵNϵN3/4]\displaystyle\Pr_{\rho\sim P_{\epsilon}}\big{[}|\rho|\leq\epsilon N-\epsilon N^{3/4}\big{]} 14.\displaystyle\leq\frac{1}{4}.

In order for cc to satisfy cdk(c+ρ)c\in d_{k}(c+\rho) with probability at least 12\frac{1}{2}, there must then be a subset Sc{x𝔽2N:|c+x|ϵNϵN3/4}S_{c}\subseteq\{x\in\mathbb{F}_{2}^{N}:|c+x|\geq\epsilon N-\epsilon N^{3/4}\} satisfying both

xSccdk(x)\displaystyle x\in S_{c}\implies c\in d_{k}(x) (26)

and

PrρPϵ[ρSc]14.\displaystyle\Pr_{\rho\sim P_{\epsilon}}\big{[}\rho\in S_{c}\big{]}\geq\frac{1}{4}. (27)

But every element xScx\in S_{c} satisfies |c+x|ϵNϵN3/4|c+x|\geq\epsilon N-\epsilon N^{3/4}, so every xScx\in S_{c} satisfies

PrρPϵ[ρ=c+x]\displaystyle\Pr_{\rho\sim P_{\epsilon}}\big{[}\rho=c+x\big{]} ϵϵNϵN3/4(1ϵ)(1ϵ)N+ϵN3/4\displaystyle\leq\epsilon^{\epsilon N-\epsilon N^{3/4}}(1-\epsilon)^{(1-\epsilon)N+\epsilon N^{3/4}}
2(1N1/4)h(ϵ)N\displaystyle\leq 2^{-(1-N^{-1/4})h(\epsilon)N} (28)

Equations (27) and (28) imply that any cCc\in C that can be list-decoded by dkd_{k} with probability 12\geq\frac{1}{2} must satisfy |Sc|2(1N1/4)h(ϵ)N4|S_{c}|\geq\frac{2^{(1-N^{-1/4})h(\epsilon)N}}{4}. It then follows from (26) that any such cc must satisfy

|{x𝔽2N:cdk(x)}|2(1N1/4)h(ϵ)N4.\big{|}\{x\in\mathbb{F}_{2}^{N}:c\in d_{k}(x)\}\big{|}\geq\frac{2^{(1-N^{-1/4})h(\epsilon)N}}{4}.

By double counting, we get

2Nk\displaystyle 2^{N}\cdot k cC|{x𝔽2N:cdk(x)}|\displaystyle\geq\sum_{c\in C}\big{|}\{x\in\mathbb{F}_{2}^{N}:c\in d_{k}(x)\}\big{|}
|C|22(1N1/4)h(ϵ)N4\displaystyle\geq\frac{|C|}{2}\cdot\frac{2^{(1-N^{-1/4})h(\epsilon)N}}{4}
=|C|82h(ϵ)Nh(ϵ)N3/4.\displaystyle=\frac{|C|}{8}\cdot 2^{h(\epsilon)N-h(\epsilon)N^{3/4}}.

The result then follows from rearranging terms. ∎
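
To see the scale of this lower bound, the following sketch evaluates its exponent for illustrative parameters (rate 0.8 and ϵ = 0.1 are our choices, not the paper's):

import math

# For |C| = 2^{rate*N}, the claim forces k >= |C| * 2^{-(1-h(eps))N}
# * 2^{-h(eps) N^{3/4}} / 8, i.e. exponentially large lists.
def h(p):
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

N, eps, rate = 10 ** 4, 0.1, 0.8
log2_k = rate * N - (1 - h(eps)) * N - h(eps) * N ** 0.75 - 3
print(f"log2(k) >= {log2_k:.1f}")  # positive for these parameters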

Appendix D Other Proofs for Sections 1 and 2

D.1 On Known List-Decoding Bounds for Reed-Muller Codes

We recall the known list-decoding bound for Reed-Muller codes (see equation (3) in section 1):

|T|=2ϵNlog4ϵ(1ϵ)(1η)4ln2+o(N)\displaystyle|T|=2^{\epsilon N\log\frac{4\epsilon(1-\epsilon)}{(1-\eta)^{4\ln 2}}+o(N)}

We claim this bound never achieves the information-theoretic optimum 2h(ϵ)N(Ndim C)+o(N)2^{h(\epsilon)N-(N-\textnormal{dim }C)+o(N)}.

Claim 31.

For any ϵ(0,12)\epsilon\in(0,\frac{1}{2}) and any γ=γ(ϵ)(0,1)\gamma=\gamma(\epsilon)\in(0,1), we have

ϵlog4ϵ(1ϵ)γ4ln2>h(ϵ)γ.\displaystyle\epsilon\log\frac{4\epsilon(1-\epsilon)}{\gamma^{4\ln 2}}>h(\epsilon)-\gamma.
Proof.

Since γ4ln2γ2\gamma^{4\ln 2}\leq\gamma^{2} for all γ(0,1)\gamma\in(0,1), it suffices to show that for any ϵ(0,12)\epsilon\in(0,\frac{1}{2}) and c=γϵ<1ϵc=\frac{\gamma}{\epsilon}<\frac{1}{\epsilon} we have

ϵlog4ϵ(1ϵ)(cϵ)2>h(ϵ)cϵ,\displaystyle\epsilon\log\frac{4\epsilon(1-\epsilon)}{(c\epsilon)^{2}}>h(\epsilon)-c\epsilon,

i.e. that

f(ϵ,c):=log(1ϵ)+2ϵ2ϵlogc+cϵ>0.\displaystyle f(\epsilon,c):=\log(1-\epsilon)+2\epsilon-2\epsilon\log c+c\epsilon>0. (29)

We first fix some ϵ(0,12)\epsilon\in(0,\frac{1}{2}) and compute the cc minimizing f(ϵ,c)f(\epsilon,c). Note that

cf(ϵ,c)\displaystyle\frac{\partial}{\partial c}f(\epsilon,c) =2ϵcln2+ϵ\displaystyle=-\frac{2\epsilon}{c\ln 2}+\epsilon

and

2c2f(ϵ,c)\displaystyle\frac{\partial^{2}}{\partial c^{2}}f(\epsilon,c) =2ϵc2ln2>0,\displaystyle=\frac{2\epsilon}{c^{2}\ln 2}>0,

so f(ϵ,c)f(\epsilon,c) is decreasing over c(0,2ln2]c\in(0,\frac{2}{\ln 2}], minimized at c=2ln2c=\frac{2}{\ln 2}, and increasing thereafter. We thus have

minc<1ϵf(ϵ,c)={f(ϵ,2ln2)if ϵ<ln22,f(ϵ,1ϵ)otherwise.\displaystyle\min_{c<\frac{1}{\epsilon}}f(\epsilon,c)=\begin{cases}f(\epsilon,\frac{2}{\ln 2})&\text{if $\epsilon<\frac{\ln 2}{2}$,}\\ f(\epsilon,\frac{1}{\epsilon})&\text{otherwise.}\end{cases} (30)

We deal with each case separately. For the case ϵ<ln22\epsilon<\frac{\ln 2}{2}, we want to show that

f(ϵ,2ln2)=log(1ϵ)+2ϵlog(ln2)+2ϵln20.\displaystyle f(\epsilon,\frac{2}{\ln 2})=\log(1-\epsilon)+2\epsilon\log(\ln 2)+\frac{2\epsilon}{\ln 2}\geq 0.

The first derivative is

ϵf(ϵ,2ln2)=1(1ϵ)ln2+2log(ln2)+2ln2,\displaystyle\frac{\partial}{\partial\epsilon}f(\epsilon,\frac{2}{\ln 2})=-\frac{1}{(1-\epsilon)\ln 2}+2\log(\ln 2)+\frac{2}{\ln 2},

and the second derivative is

2ϵ2f(ϵ,2ln2)=1(1ϵ)2ln2<0.\displaystyle\frac{\partial^{2}}{\partial\epsilon^{2}}f(\epsilon,\frac{2}{\ln 2})=-\frac{1}{(1-\epsilon)^{2}\ln 2}<0.

Thus the function f(ϵ,2ln2)f(\epsilon,\frac{2}{\ln 2}) is maximized at ϵ=11(2log(ln2)+2ln2)ln20.21,\epsilon^{*}=1-\frac{1}{(2\log(\ln 2)+\frac{2}{\ln 2})\ln 2}\approx 0.21, and monotone on each side of ϵ\epsilon^{*}. In particular, since ϵ[0,ln22]\epsilon^{*}\in[0,\frac{\ln 2}{2}] we know that over the interval [0,ln22][0,\frac{\ln 2}{2}] the function f(ϵ,2ln2)f(\epsilon,\frac{2}{\ln 2}) achieves its minimum at either ϵ=0\epsilon=0 or ϵ=ln22\epsilon=\frac{\ln 2}{2}. But f(0,2ln2)=0<f(ln22,2ln2)f(0,\frac{2}{\ln 2})=0<f(\frac{\ln 2}{2},\frac{2}{\ln 2}), so we indeed have that

f(ϵ,2ln2)0f(\epsilon,\frac{2}{\ln 2})\geq 0

for all ϵ<ln22\epsilon<\frac{\ln 2}{2}. This deals with the first case of (30). For the second case of (30), we want to show that for all ϵ(0,12)\epsilon\in(0,\frac{1}{2}) we have

f(ϵ,1ϵ)0.f(\epsilon,\frac{1}{\epsilon})\geq 0.

But

f(ϵ,1ϵ)=log(1ϵ)+2ϵ+2ϵlogϵ+1f(\epsilon,\frac{1}{\epsilon})=\log(1-\epsilon)+2\epsilon+2\epsilon\log\epsilon+1

is decreasing in ϵ\epsilon and f(12,2)=0f(\frac{1}{2},2)=0, so we indeed have f(ϵ,1ϵ)0f(\epsilon,\frac{1}{\epsilon})\geq 0 for all ϵ.\epsilon.
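
Inequality (29) can also be confirmed on a grid; a minimal sketch over ϵ ∈ (0, 1/2) and c ∈ (0, 1/ϵ):

import math

# Grid check of f(eps, c) = log2(1-eps) + 2 eps - 2 eps log2(c) + c eps > 0.
for i in range(1, 50):
    eps = i / 100
    for k in range(1, 200):
        c = (k / 200) * (1 / eps)
        f = math.log2(1 - eps) + 2 * eps - 2 * eps * math.log2(c) + c * eps
        assert f > 0, (eps, c, f)
print("f(eps, c) > 0 on the whole grid")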

D.2 Duals of Transitive Codes - Proof of Fact 10

Claim.

The dual code CC^{\perp} of a transitive code C𝔽2NC\subseteq\mathbb{F}_{2}^{N} is transitive.

Proof.

Let i,j[N]i,j\in[N] be arbitrary. Since CC is transitive, we know there exists a permutation π:[N][N]\pi:[N]\rightarrow[N] such that π(j)=i\pi(j)=i and that for any c=(c1,,cN)Cc=(c_{1},...,c_{N})\in C, we have cπ:=(cπ(1),,cπ(N))Cc_{\pi}:=(c_{\pi(1)},...,c_{\pi(N)})\in C. Clearly π1\pi^{-1} satisfies π1(i)=j\pi^{-1}(i)=j, and we claim that it also satisfies that vπ1Cv_{\pi^{-1}}\in C^{\perp} for all vCv\in C^{\perp}. For this we note that since cπCc_{\pi}\in C for every cCc\in C, we have by definition that every vCv\in C^{\perp} satisfies

kvkcπ(k)=0 for all cC.\sum_{k}v_{k}c_{\pi(k)}=0\textnormal{ for all }c\in C.

We thus have

vC\displaystyle v\in C^{\perp} kvkcπ(k)=0 for all cC\displaystyle\implies\sum_{k}v_{k}c_{\pi(k)}=0\textnormal{ for all }c\in C
kvπ1(k)ck=0 for all cC\displaystyle\implies\sum_{k}v_{\pi^{-1}(k)}c_{k}=0\textnormal{ for all }c\in C
vπ1C.\displaystyle\implies v_{\pi^{-1}}\in C^{\perp}.
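
This argument can be checked by brute force on a small cyclic (hence transitive) code; the sketch below uses the illustrative [7,4] code generated by the shifts of x³+x²+1 and verifies that its dual is closed under all rotations:

import itertools

# The dual of a cyclic code should itself be invariant under rotations.
N = 7
gen = (1, 0, 1, 1, 0, 0, 0)  # coefficients of x^3 + x^2 + 1

def rot(v, s):
    return tuple(v[(i - s) % N] for i in range(N))

basis = [rot(gen, s) for s in range(4)]
C = {tuple(sum(a * b[i] for a, b in zip(co, basis)) % 2 for i in range(N))
     for co in itertools.product([0, 1], repeat=4)}
dual = {v for v in itertools.product([0, 1], repeat=N)
        if all(sum(x * y for x, y in zip(v, c)) % 2 == 0 for c in C)}

assert all(rot(v, s) in dual for v in dual for s in range(N))
print("the dual of this cyclic code is invariant under all rotations")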

D.3 Basic Properties of Reed-Muller Codes - Proof of Facts 11 and 12

Fact.

Let MM be the (nd)×N\binom{n}{\leq d}\times N generator matrix of the Reed-Muller code. The columns of MM that correspond to the points x𝔽2nx\in\mathbb{F}_{2}^{n} with |x|d|x|\leq d are linearly independent.

Proof.

Let MM^{\prime} be the submatrix of MM whose columns correspond to the points v𝔽2nv\in\mathbb{F}_{2}^{n} with |v|d|v|\leq d. It suffices to show that when the columns MvM^{\prime}_{v} of MM^{\prime} are ordered in increasing order of |v||v|, every column is linearly independent of the preceding ones. But this is clearly the case, as for the monomial m=i:vi=1xim=\prod_{i:v_{i}=1}x_{i} we have Mm,v=1M_{m,v}=1 and Mm,v=0M_{m,v^{\prime}}=0 for all vv^{\prime} preceding vv. ∎
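
A quick computational check of this fact (a sketch; the parameters n = 4, d = 2 are illustrative): build the generator matrix of 𝖱𝖬(n,d)\mathsf{RM}(n,d), select the columns indexed by points of weight at most dd, and verify full rank over 𝔽2\mathbb{F}_{2} by Gaussian elimination:

import itertools

n, d = 4, 2
points = list(itertools.product([0, 1], repeat=n))
monos = [S for r in range(d + 1) for S in itertools.combinations(range(n), r)]

# row of monomial S = evaluations of prod_{i in S} x_i at every point
M = [[int(all(x[i] for i in S)) for x in points] for S in monos]
cols = [idx for idx, x in enumerate(points) if sum(x) <= d]
A = [[row[c] for c in cols] for row in M]  # square C(n,<=d) x C(n,<=d)

rank = 0  # Gaussian elimination over F_2
for col in range(len(cols)):
    piv = next((r for r in range(rank, len(A)) if A[r][col]), None)
    if piv is None:
        continue
    A[rank], A[piv] = A[piv], A[rank]
    for r in range(len(A)):
        if r != rank and A[r][col]:
            A[r] = [a ^ b for a, b in zip(A[r], A[rank])]
    rank += 1

assert rank == len(monos)  # the selected columns are linearly independent
print("columns of weight <= d are linearly independent")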

Fact.

For all nn and all d<nd<n, the Reed-Muller code 𝖱𝖬(n,d)𝔽2N\mathsf{RM}(n,d)\subseteq\mathbb{F}_{2}^{N} is transitive.

Proof.

Recall that we view each coordinate i[N]i\in[N] as a point vi𝔽2nv_{i}\in\mathbb{F}_{2}^{n}, and that every codeword in 𝖱𝖬(n,d)\mathsf{RM}(n,d) is the evaluation vector (f(v1),,f(vN))\big{(}f(v_{1}),...,f(v_{N})\big{)} of a polynomial ff of degree d\leq d in nn variables.

Now fix two points vi,vj𝔽2nv_{i},v_{j}\in\mathbb{F}_{2}^{n}. We want to show that there is a permutation π:𝔽2n𝔽2n\pi:\mathbb{F}_{2}^{n}\rightarrow\mathbb{F}_{2}^{n} such that

  1. (i)

    π(vi)=vj\pi(v_{i})=v_{j}

  2. (ii)

    If (zv1,,zvN)𝖱𝖬(n,d)\big{(}z_{v_{1}},...,z_{v_{N}}\big{)}\in\mathsf{RM}(n,d) then (zπ(v1),,zπ(vN))𝖱𝖬(n,d)\big{(}z_{\pi(v_{1})},...,z_{\pi(v_{N})}\big{)}\in\mathsf{RM}(n,d)

To this end, we choose the permutation π(x)=x+vi+vj\pi(x)=x+v_{i}+v_{j}. Then:

  1. (i)

    π(vi)=vi+vi+vj=vj\pi(v_{i})=v_{i}+v_{i}+v_{j}=v_{j}.

  2. (ii)

    If (zv1,,zvN)\big{(}z_{v_{1}},...,z_{v_{N}}\big{)} is a codeword, it can be written as (zv1,,zvN)=(f(v1),,f(vN))\big{(}z_{v_{1}},...,z_{v_{N}}\big{)}=\big{(}f(v_{1}),...,f(v_{N})\big{)} for some polynomial ff of degree d\leq d. But then the polynomial

    g(x)=f(x+vi+vj)g(x)=f(x+v_{i}+v_{j}) satisfies 𝖽𝖾𝗀(g)=𝖽𝖾𝗀(f)d\mathsf{deg}(g)=\mathsf{deg}(f)\leq d, so (g(v1),,g(vN))\big{(}g(v_{1}),...,g(v_{N})\big{)} must be a codeword. Then since g(x)=fπ(x)g(x)=f\circ\pi(x) by definition, we have that (zπ(v1),,zπ(vN))=(fπ(v1),,fπ(vN))=(g(v1),,g(vN))𝖱𝖬(n,d)\big{(}z_{\pi(v_{1})},...,z_{\pi(v_{N})}\big{)}=\big{(}f\circ\pi(v_{1}),...,f\circ\pi(v_{N})\big{)}=\big{(}g(v_{1}),...,g(v_{N})\big{)}\in\mathsf{RM}(n,d).

D.4 A version of Pinsker’s inequality - Proof of Lemma 14

Lemma.

For any μ(0,1)\mu\in(0,1), we have

μ22ln21h(1μ2)μ2\frac{\mu^{2}}{2\ln 2}\leq 1-h\left(\frac{1-\mu}{2}\right)\leq\mu^{2}
Proof.
1h(1μ2)\displaystyle 1-h(\frac{1-\mu}{2}) =1+1μ2log(1μ2)+1+μ2log(1+μ2)\displaystyle=1+\frac{1-\mu}{2}\log\left(\frac{1-\mu}{2}\right)+\frac{1+\mu}{2}\log\left(\frac{1+\mu}{2}\right)
=1μ2log(1μ)+1+μ2log(1+μ)\displaystyle=\frac{1-\mu}{2}\log\left(1-\mu\right)+\frac{1+\mu}{2}\log\left(1+\mu\right)
=12ln2[(1μ)i=1μii(1+μ)i=1(1)iμii]\displaystyle=\frac{1}{2\ln 2}\left[-(1-\mu)\sum_{i=1}^{\infty}\frac{\mu^{i}}{i}-(1+\mu)\sum_{i=1}^{\infty}(-1)^{i}\frac{\mu^{i}}{i}\right]
=12ln2[2μi=1μ2i12i12i=1μ2i2i]\displaystyle=\frac{1}{2\ln 2}\left[2\mu\sum_{i=1}^{\infty}\frac{\mu^{2i-1}}{2i-1}-2\sum_{i=1}^{\infty}\frac{\mu^{2i}}{2i}\right]
=1ln2i=1μ2i(12i112i)\displaystyle=\frac{1}{\ln 2}\sum_{i=1}^{\infty}\mu^{2i}\left(\frac{1}{2i-1}-\frac{1}{2i}\right)
=12ln2i=1μ2ii(2i1)\displaystyle=\frac{1}{2\ln 2}\sum_{i=1}^{\infty}\frac{\mu^{2i}}{i(2i-1)}

Thus 1h(1μ2)μ22ln21-h(\frac{1-\mu}{2})\geq\frac{\mu^{2}}{2\ln 2} and 1h(1μ2)12ln2i=1μ2i(2i1)=12ln22ln2μ2=μ21-h(\frac{1-\mu}{2})\leq\frac{1}{2\ln 2}\sum_{i=1}^{\infty}\frac{\mu^{2}}{i(2i-1)}=\frac{1}{2\ln 2}\cdot 2\ln 2\cdot\mu^{2}=\mu^{2}. ∎
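
A grid check of the two inequalities of Lemma 14 (a sketch):

import math

# Verify mu^2/(2 ln 2) <= 1 - h((1-mu)/2) <= mu^2 on a grid of mu in (0,1).
def h(p):
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

for i in range(1, 100):
    mu = i / 100
    val = 1 - h((1 - mu) / 2)
    assert mu ** 2 / (2 * math.log(2)) <= val <= mu ** 2, mu
print("Lemma 14 verified on the grid")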

References

  • [AHN21] Emmanuel Abbe, Jan Hazla, and Ido Nachum. Almost-Reed-Muller codes achieve constant rates for random errors. IEEE Trans. Inf. Theory, 67(12):8034–8050, 2021.
  • [Ari09] Erdal Arikan. Channel polarization: a method for constructing capacity-achieving codes for symmetric binary-input memoryless channels. IEEE Trans. Inf. Theory, 55(7):3051–3073, 2009.
  • [ASW15] Emmanuel Abbe, Amir Shpilka, and Avi Wigderson. Reed-Muller codes for random erasures and errors. IEEE Trans. Inf. Theory, 61(10):5229–5252, 2015.
  • [ASY21] Emmanuel Abbe, Amir Shpilka, and Min Ye. Reed-Muller codes: Theory and algorithms. IEEE Trans. Inf. Theory, 67(6):3251–3277, 2021.
  • [AY19] Emmanuel Abbe and Min Ye. Reed-Muller codes polarize. In David Zuckerman, editor, 60th IEEE Annual Symposium on Foundations of Computer Science, FOCS 2019, Baltimore, Maryland, USA, November 9-12, 2019, pages 273–286. IEEE Computer Society, 2019.
  • [Bar21] Alexander Barg. Stolarsky’s invariance principle for finite metric spaces. Mathematika, 67(1):158–186, 2021.
  • [BDM18] Dmitriy Bilyk, Feng Dai, and Ryan Matzke. Stolarsky principle and energy optimization on the sphere. Constructive Approximation, 48, 08 2018.
  • [BGY20] Paul Beame, Shayan Oveis Gharan, and Xin Yang. On the bias of Reed-Muller codes over odd prime fields. SIAM J. Discret. Math., 34(2):1232–1247, 2020.
  • [BHL12] Ido Ben-Eliezer, Rani Hod, and Shachar Lovett. Random low-degree polynomials are hard to approximate. Comput. Complex., 21(1):63–81, 2012.
  • [BK97] Jean Bourgain and Gil Kalai. Influences of variables and threshold intervals under group symmetries. Geometric & Functional Analysis GAFA, 7(3):438–461, 1997.
  • [BLM13] Stéphane Boucheron, Gábor Lugosi, and Pascal Massart. Concentration Inequalities - A Nonasymptotic Theory of Independence. Oxford University Press, 2013.
  • [dW08] Ronald de Wolf. A brief introduction to Fourier analysis on the Boolean cube. Theory Comput., 1:1–20, 2008.
  • [Eli57] Peter Elias. List decoding for noisy channels. Wescon Convention Record, Part 2, pages 94–104, 1957.
  • [Gal62] Robert G. Gallager. Low-density parity-check codes. IRE Trans. Inf. Theory, 8(1):21–28, 1962.
  • [GHK11] Venkatesan Guruswami, Johan Håstad, and Swastik Kopparty. On the list-decodability of random linear codes. IEEE Trans. Inf. Theory, 57(2):718–725, 2011.
  • [GHSZ02] Venkatesan Guruswami, Johan Håstad, Madhu Sudan, and David Zuckerman. Combinatorial bounds for list decoding. IEEE Trans. Inf. Theory, 48(5):1021–1034, 2002.
  • [GR08] Venkatesan Guruswami and Atri Rudra. Explicit codes achieving list decoding capacity: Error-correction with optimal redundancy. IEEE Trans. Inf. Theory, 54(1):135–150, 2008.
  • [GX12] Venkatesan Guruswami and Chaoping Xing. Folded codes from function field towers and improved optimal rate list decoding. In Howard J. Karloff and Toniann Pitassi, editors, Proceedings of the 44th Symposium on Theory of Computing Conference, STOC 2012, New York, NY, USA, May 19 - 22, 2012, pages 339–350. ACM, 2012.
  • [HRW17] Brett Hemenway, Noga Ron-Zewi, and Mary Wootters. Local list recovery of high-rate tensor codes & applications. In Chris Umans, editor, 58th IEEE Annual Symposium on Foundations of Computer Science, FOCS 2017, Berkeley, CA, USA, October 15-17, 2017, pages 204–215. IEEE Computer Society, 2017.
  • [HSS21] Jan Hazla, Alex Samorodnitsky, and Ori Sberlo. On codes decoding a constant fraction of errors on the BSC. In Samir Khuller and Virginia Vassilevska Williams, editors, STOC ’21: 53rd Annual ACM SIGACT Symposium on Theory of Computing, Virtual Event, Italy, June 21-25, 2021, pages 1479–1488. ACM, 2021.
  • [IS98] Mourad E. H. Ismail and Plamen Simeonov. Strong asymptotics for Krawtchouk polynomials. Journal of Computational and Applied Mathematics, 100(2):121–144, 1998.
  • [KKL88] Jeff Kahn, Gil Kalai, and Nathan Linial. The influence of variables on Boolean functions. In 29th Annual Symposium on Foundations of Computer Science, White Plains, New York, USA, 24-26 October 1988, pages 68–80. IEEE Computer Society, 1988.
  • [KKM+16] Shrinivas Kudekar, Santhosh Kumar, Marco Mondelli, Henry D. Pfister, Eren Sasoglu, and Rüdiger L. Urbanke. Reed-Muller codes achieve capacity on erasure channels. In Daniel Wichs and Yishay Mansour, editors, Proceedings of the 48th Annual ACM SIGACT Symposium on Theory of Computing, STOC 2016, Cambridge, MA, USA, June 18-21, 2016, pages 658–669. ACM, 2016.
  • [KL95] Gil Kalai and Nathan Linial. On the distance distribution of codes. IEEE Trans. Inf. Theory, 41(5):1467–1472, 1995.
  • [KL99] Ilia Krasikov and Simon Litsyn. Survey of binary Krawtchouk polynomials. In Alexander Barg and Simon Litsyn, editors, Codes and Association Schemes, Proceedings of a DIMACS Workshop, Piscataway, New Jersey, USA, November 9-12, 1999, volume 56 of DIMACS Series in Discrete Mathematics and Theoretical Computer Science, pages 199–211. DIMACS/AMS, 1999.
  • [KLP12] Tali Kaufman, Shachar Lovett, and Ely Porat. Weight distribution and list-decoding size of Reed-Muller codes. IEEE Trans. Inf. Theory, 58(5):2689–2696, 2012.
  • [Kop15] Swastik Kopparty. List-decoding multiplicity codes. Theory Comput., 11:149–182, 2015.
  • [KRU13] Shrinivas Kudekar, Tom Richardson, and Rüdiger L. Urbanke. Spatially coupled ensembles universally achieve capacity under belief propagation. IEEE Trans. Inf. Theory, 59(12):7761–7813, 2013.
  • [LMS+97] Michael Luby, Michael Mitzenmacher, Mohammad Amin Shokrollahi, Daniel A. Spielman, and Volker Stemann. Practical loss-resilient codes. In Frank Thomson Leighton and Peter W. Shor, editors, Proceedings of the Twenty-Ninth Annual ACM Symposium on the Theory of Computing, El Paso, Texas, USA, May 4-6, 1997, pages 150–159. ACM, 1997.
  • [LW18] Ray Li and Mary Wootters. Improved list-decodability of random linear binary codes. In Eric Blais, Klaus Jansen, José D. P. Rolim, and David Steurer, editors, Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, APPROX/RANDOM 2018, August 20-22, 2018 - Princeton, NJ, USA, volume 116 of LIPIcs, pages 50:1–50:19. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2018.
  • [MRR+20] Jonathan Mosheiff, Nicolas Resch, Noga Ron-Zewi, Shashwat Silas, and Mary Wootters. LDPC codes achieve list decoding capacity. In 61st IEEE Annual Symposium on Foundations of Computer Science, FOCS 2020, Durham, NC, USA, November 16-19, 2020, pages 458–469. IEEE, 2020.
  • [MS77] Florence MacWilliams and Neil Sloane. The Theory of Error-Correcting Codes. North-Holland Publishing Company, 1977.
  • [Pol19] Yury Polyanskiy. Hypercontractivity of spherical averages in Hamming space. SIAM J. Discret. Math., 33(2):731–754, 2019.
  • [RP21] Galen Reeves and Henry D. Pfister. Reed-Muller codes achieve capacity on BMS channels. CoRR, abs/2110.14631, 2021.
  • [Sam20] Alex Samorodnitsky. An upper bound on lql_{q} norms of noisy functions. IEEE Trans. Inf. Theory, 66(2):742–748, 2020.
  • [Sha48] Claude E. Shannon. A mathematical theory of communication. Bell Syst. Tech. J., 27(3):379–423, 1948.
  • [Skr19] M. Skriganov. Point distributions in two-point homogeneous spaces. Mathematika, 65:557–587, 03 2019.
  • [SS20] Ori Sberlo and Amir Shpilka. On the performance of Reed-Muller codes with respect to random errors and erasures. In Shuchi Chawla, editor, Proceedings of the 2020 ACM-SIAM Symposium on Discrete Algorithms, SODA 2020, Salt Lake City, UT, USA, January 5-8, 2020, pages 1357–1376. SIAM, 2020.
  • [Tal94] Michel Talagrand. On Russo’s approximate zero-one law. The Annals of Probability, 22(3):1576–1587, 1994.