
A Criterion for Decoding on the BSC

Anup Rao
School of Computer Science, University of Washington
[email protected]
Supported by NSF CCF-2131899.
   Oscar Sprumont
School of Computer Science, University of Washington
[email protected]
Supported by NSERC PGSD3-545945-2020 and NSF CCF-2131899.
Abstract

We present an approach to showing that a linear code is resilient to random errors. We use this approach to obtain decoding results for both transitive codes and Reed-Muller codes. We give three kinds of results about linear codes in general, and transitive linear codes in particular.

  1.

    We give a tight bound on the weight distribution of every transitive linear code $C\subseteq\mathbb{F}_{2}^{N}$: $\Pr_{c\in C}[|c|=\alpha N]\leq 2^{-(1-h(\alpha))\cdot\mathsf{dim}(C)}$.

  2.

    We give a criterion that certifies that a linear code $C$ can be decoded on the binary symmetric channel. Let $K_{s}(x)$ denote the Krawtchouk polynomial of degree $s$, and let $C^{\perp}$ denote the dual code of $C$. We show that bounds on $\mathbb{E}_{c\in C^{\perp}}[K_{\epsilon N}(|c|)^{2}]$ imply that $C$ recovers from errors on the binary symmetric channel with parameter $\epsilon$. Weaker bounds can be used to obtain list-decoding results using similar methods. One consequence of our criterion is that whenever the weight distribution of $C^{\perp}$ is sufficiently close to the binomial distribution in some interval around $\frac{N}{2}$, $C$ is resilient to $\epsilon$-errors.

  3.

    We combine known estimates for the Krawtchouk polynomials with our weight bound for transitive codes, and with known weight bounds for Reed-Muller codes, to obtain list-decoding results for both these families of codes. In some regimes, our bounds for Reed-Muller codes achieve the information-theoretic optimal trade-off between rate and list size.

1 Introduction

In his seminal 1948 paper, Shannon laid the foundations of coding theory and introduced the concept of channel capacity, the maximal rate at which information can be transmitted over a communication channel [Sha48]. The two channels that have received the most attention are the Binary Symmetric Channel (BSC), where each bit is independently flipped with some probability $\epsilon$, and the Binary Erasure Channel (BEC), where each bit is independently replaced by an erasure symbol with some probability $\epsilon$. Shannon's work initiated a decades-long search for explicit codes that can achieve high rates over a noisy channel.

Explicit constructions of codes often have a lot of symmetry. In particular, many known constructions of codes are transitive. The group of symmetries of a code is the subgroup $G$ of permutations $\pi:\{1,\dotsc,N\}\rightarrow\{1,\dotsc,N\}$ such that permuting the coordinates of each of the codewords using $\pi$ does not change the code. A code is transitive if for every two coordinates $i,j$, there is a permutation $\pi\in G$ with $\pi(i)=j$. A code is $2$-transitive if for every $i\neq k$, $j\neq\ell$ there is a permutation $\pi\in G$ with $\pi(i)=j,\pi(k)=\ell$. Many known constructions of codes are cyclic, and every cyclic code is transitive. Reed-Solomon codes, BCH codes and Reed-Muller codes are all transitive.

The binary code that is arguably the cleanest explicit candidate for achieving capacity over both the BSC and the BEC is the family of Reed-Muller codes. The codewords of the Reed-Muller code $\mathsf{RM}(n,d)$ are the evaluation vectors (over all points in $\mathbb{F}_{2}^{n}$) of all multivariate polynomials of degree at most $d$ in $n$ variables.

Reed-Muller codes enjoy strong symmetry beyond transitivity: their symmetry group is the group of invertible affine transformations over $\mathbb{F}_{2}^{n}$. The use of fundamental results from Fourier analysis about the influences of symmetric boolean functions [KKL88, Tal94, BK97] has led to a very successful line of work, with [KKM+16] showing that Reed-Muller codes achieve capacity over the BEC and [HSS21] showing that they are polynomially close to achieving capacity over the BSC. In fact, [KKM+16] show that if a linear code $C\subseteq\mathbb{F}_{2}^{N}$ has a $2$-transitive symmetry group $G$ such that for every $S\subseteq\{1,\dotsc,N\}$ with $|S|=(s\log N)^{0.99}$ we have $|\{\pi(S):\pi\in G\}|\geq N^{s}$, then $C$ can tolerate an $\epsilon-O(1/s)$ fraction of random erasures. Given these results, it is natural to investigate the types of symmetry that lead to good codes. In this paper, we prove three kinds of results relevant to understanding the error resilience of general linear codes, transitive linear codes, and Reed-Muller codes.

  1.

    We give a clean and tight weight distribution bound for every transitive linear code. We show that for any such code $C\subseteq\mathbb{F}_{2}^{N}$,

    $\Pr_{c\in C}[|c|=\alpha N]\leq 2^{-(1-h(\alpha))\cdot\mathsf{dim}(C)}.$

    This bound is proved by combining transitivity with the subadditivity of entropy. In some regimes, it improves on all previously known weight bounds for Reed-Muller codes (see Appendix A).

  2.

    We give a new criterion for certifying that a code can be decoded over the BSC. Let

    $K_{t}(x)=\sum_{j=0}^{t}(-1)^{j}\binom{x}{j}\binom{N-x}{t-j}$

    denote the Krawtchouk polynomial of degree $t$, and let $C^{\perp}$ denote the dual code of $C$. In spirit, our criterion says that any code $C$ satisfying

    $\mathbb{E}_{c\in C^{\perp}}[K_{\epsilon N}(|c|)^{2}]<(1+o(N^{-1}))\cdot\binom{N}{\epsilon N}$

    can be uniquely decoded on the BSC with high probability. Our actual result is a little more technically involved (see Theorem 2). This criterion implies that any code whose dual codewords are distributed sufficiently close to the binomial distribution must be resilient to $\epsilon$-errors (see Corollary 3). Moreover, if the above expectation is bounded by $(1+o(kN^{-1}))\cdot\binom{N}{\epsilon N}$, then we prove that the code can be list-decoded with a list size of about $k$.

  3.

    Finally, we combine known estimates for the Krawtchouk polynomials with our weight bound for transitive codes, and with known weight bounds for Reed-Muller codes, to obtain list-decoding results for both families of codes. In some regimes, our bounds for Reed-Muller codes achieve the information-theoretic optimal trade-off between rate and list size.

Next, we discuss our results more rigorously. We note that throughout this section, for any set $X$ we denote the uniform distribution over $X$ by $\mathcal{D}(X)$.

I. Weight Bounds for Transitive Codes
We bound the weight distribution of any transitive linear code over any prime field. See section 6 for the proof.

Theorem 1.

Let $C\subseteq\mathbb{F}_{q}^{N}$ be a transitive linear code. Then for any $\alpha\in(0,1-1/q)$ we have

$\Pr_{c\sim\mathcal{D}(C)}\big[|c|=\alpha N\big]\leq q^{-(1-h_{q}(\alpha))\dim C},$

where $\mathcal{D}(C)$ is the uniform distribution over all codewords in $C$, $|c|$ is the number of non-zero coordinates of $c$, and $h_{q}$ is the $q$-ary entropy

$h_{q}(\alpha)=(1-\alpha)\log_{q}\frac{1}{1-\alpha}+\alpha\log_{q}\frac{q-1}{\alpha}.$

Note that $h_{2}(\alpha)$ is the binary entropy function. We note that in some regimes (e.g. when the degree satisfies $0.38n<d<0.499n$ and $\alpha$ is larger than some constant depending on $d/n$), the bound above improves on all previously proven weight distribution bounds for Reed-Muller codes, even though the only feature of the code that we use is transitivity. See Appendix A for some details.
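To make the bound concrete, here is a minimal numeric sketch of the quantities in Theorem 1 (the function names are ours, chosen for illustration):

```python
import math

def hq(alpha, q=2):
    # q-ary entropy h_q(alpha), valid for 0 < alpha < 1; h_2 = h is the
    # binary entropy function.
    return (1 - alpha) * math.log(1 / (1 - alpha), q) \
         + alpha * math.log((q - 1) / alpha, q)

def theorem1_bound(alpha, dim_C, q=2):
    # Theorem 1: Pr_{c ~ D(C)}[|c| = alpha * N] <= q^{-(1 - h_q(alpha)) * dim C},
    # for a transitive linear code C in F_q^N and alpha in (0, 1 - 1/q).
    return q ** (-(1 - hq(alpha, q)) * dim_C)

# A binary transitive code of dimension 100: codewords of relative weight 1/4
# occur with probability at most 2^{-(1 - h(1/4)) * 100} ~= 2^{-18.9}.
print(theorem1_bound(0.25, 100))
```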

II. A Criterion for Decoding on the BSC
We develop a new approach for proving decoding results over the BSC, i.e. the communication channel whose errors $z\in\mathbb{F}_{2}^{N}$ are sampled from the $\epsilon$-noisy distribution

$P_{\epsilon}(z)=\epsilon^{|z|}(1-\epsilon)^{N-|z|}$

for some $\epsilon\in(0,1)$. Our approach is based on Fourier analysis, although unlike [KKM+16] and [HSS21], the ideas we use do not rely on bounds on influences. We obtain the following result (recall that $\mathcal{D}(C^{\perp})$ denotes the uniform distribution over $C^{\perp}$):

Theorem 2.

Let $C\subseteq\mathbb{F}_{2}^{N}$ be any linear code, and denote by $C^{\perp}\subseteq\mathbb{F}_{2}^{N}$ its dual code. Then for any $\epsilon\in(0,\frac{1}{2})$, there exists a decoding function $d:\mathbb{F}_{2}^{N}\rightarrow\mathbb{F}_{2}^{N}$ such that for all $c\in C$ we have

$\Pr_{\rho\sim P_{\epsilon}}[d(c+\rho)\neq c]\leq 2e^{-\frac{\sqrt{N}}{3\epsilon}}+N\mathop{\max}_{\substack{S\subseteq\{\epsilon N\pm N^{3/4}\}\\ 1\leq|S|\leq 2}}\Big\{\frac{1}{\binom{N}{S}}\mathbb{E}_{c\sim\mathcal{D}(C^{\perp})}\big[K_{S}(|c|)^{2}\big]-1\Big\},$

where $\binom{N}{S}=\sum_{j\in S}\binom{N}{j}$, and where $K_{S}(x)=\sum_{j\in S}K_{j}(x)$ for $K_{j}$ the Krawtchouk polynomial of degree $j$.

We will now consider one interesting consequence of Theorem 2. Let $\epsilon\in(0,\frac{1}{2})$ be arbitrary, and define

$A_{\epsilon}=\{\alpha N:h(\alpha)>1-h(\epsilon)-N^{-1/5}\}.$

Our next corollary states that whenever the dual codewords of $C$ are distributed sufficiently close to the binomial distribution for all weights in $A_{\epsilon}$, the code $C$ must be resilient to $\epsilon$-errors. See Appendix B for the proof.

Corollary 3.

Let $C\subseteq\mathbb{F}_{2}^{N}$ be a linear code, and let $\epsilon\in(0,\frac{1}{2})$ be arbitrary. Suppose that for every $j\in A_{\epsilon}$ we have

$\Pr_{y\sim\mathcal{D}(C^{\perp})}\big[|y|=j\big]\leq\big(1+o(N^{-1})\big)\frac{\binom{N}{j}}{2^{N}},$

and suppose that

$\Pr_{y\sim\mathcal{D}(C^{\perp})}\big[|y|\notin A_{\epsilon}\big]\leq 2^{N^{\frac{3}{4}}}\cdot\frac{\sum_{i\notin A_{\epsilon}}\binom{N}{i}}{2^{N}}.$

Then $C$ is resilient to $\epsilon$-errors.

As a proof of concept, we note that a uniformly random linear code of dimension $(1-h(\epsilon))N+\sqrt{N}$ satisfies all these conditions simultaneously with high probability.

III. List Decoding Results
Using a generalized version of Theorem 2 (namely, Theorem 21 in section 5), we obtain list decoding bounds for both transitive codes and Reed-Muller codes. We start with our bound for Reed-Muller codes.

Theorem 4.

Let $\epsilon\in(0,\frac{1}{2})$ and $\gamma\in(0,1)$ be such that $1-\gamma\geq 2^{-\frac{2\epsilon}{(\ln 2)^{2}}}$. Then the Reed-Muller code $\mathsf{RM}(n,d)$ of dimension $\binom{n}{\leq d}=(1-\gamma)N$ can with high probability list-decode $\epsilon$-errors using a list $T$ of size

$|T|=2^{(h(\epsilon)-\gamma)N+o(N)}+2^{4\epsilon N+o(N)}.$

Although our lists have exponential size, for small $\epsilon$ the list size is non-trivial, in the sense that it is much smaller than the number of noise vectors (which is about $\binom{N}{\epsilon N}\approx 2^{h(\epsilon)N}$) and the number of codewords in the code (which is $2^{\dim C}$). In fact, a standard calculation (see Appendix C) shows that any code $C\subseteq\mathbb{F}_{2}^{N}$ of dimension $(1-\gamma)N$ that can successfully list-decode errors of probability $\epsilon$ with list size $|T|$ must satisfy

$|T|\gtrsim 2^{(h(\epsilon)-\gamma)N}.$ (1)

Our bound in Theorem 4 shows that Reed-Muller codes achieve these optimal parameters, at least in some regimes (e.g. when $\binom{n}{\leq d}\geq(1-\frac{1.99\epsilon}{\ln 2})N$ and $\epsilon$ is small enough). We now turn to our list-decoding bound for transitive codes.

Theorem 5.

Fix any $\epsilon\in(0,\frac{1}{2})$, $\eta\in(0,1)$, and $N>\left(\frac{5}{\epsilon}\right)^{20}$. Then any transitive linear code $C\subseteq\mathbb{F}_{2}^{N}$ of dimension $\dim C=\eta N$ can with high probability list-decode $\epsilon$-errors using a list $T$ of size

$|T|=2^{\epsilon N\log(\frac{2}{1-\eta})+o(N)}+2^{4\epsilon N}.$

As an explicit example of the types of bounds one gets from Theorem 5, any transitive linear code of dimension $\dim C=(1-\frac{4\epsilon}{e})N$ can with high probability list-decode $\epsilon$-errors using a list $T$ of size

$|T|=2^{(h(\epsilon)-\epsilon+\frac{\epsilon^{2}}{\ln 2})N+o(N)}+2^{4\epsilon N}.$

For comparison, recall that our lower bound (1) states that any code $C$ of dimension $(1-\frac{4\epsilon}{e})N$ requires a list size of at least about $2^{(h(\epsilon)-\frac{4\epsilon}{e})N}$.

1.1 Techniques

Our weight distribution bound for transitive linear codes (Theorem 1) is proven by showing that the entropy of a uniformly random codeword of weight $\alpha N$ is small. To do this, we analyze the entropy of the coordinates corresponding to linearly independent columns of the generator matrix. Transitivity implies that every coordinate in the code has the same entropy, and subadditivity of entropy can then be used to bound the entropy of the entire distribution.

To obtain our decoding criterion, we make use of a connection between the probability of a decoding error and the $\ell_{2}$ norm of the coset distribution of the code. To explain the intuition, let us start by assuming that exactly $\epsilon N$ of the coordinates in the codeword are flipped, although our results actually hold over the BSC as well. Let $z$ be the vector in $\mathbb{F}_{2}^{N}$ that represents the errors introduced by the channel, and let $H$ be the parity check matrix of the code. Then by standard arguments, if $z$ can be recovered from $Hz^{\intercal}$, the codeword can be decoded. In the case where $z$ is uniformly distributed on vectors of weight $\epsilon N$, this amounts to showing that with high probability, the coset of $z$ does not contain any other string of weight $\epsilon N$ (in other words, there is no $w\neq z$ of weight $|w|=\epsilon N$ such that $Hz^{\intercal}=Hw^{\intercal}$). This can be understood by computing the norm

$\|f\|_{2}^{2}=\frac{1}{2^{N}}\sum_{y}f(y)^{2}=\frac{1}{2^{N}}\sum_{y}\Pr[Hz^{\intercal}=y^{\intercal}]^{2},$

where $f(y)=\Pr[Hz^{\intercal}=y^{\intercal}]$. The norm above is always at least $2^{-N}\binom{N}{\epsilon N}^{-1}$, and if it is close to $2^{-N}\binom{N}{\epsilon N}^{-1}$ then the code can be decoded with high probability. If $\|f\|_{2}^{2}$ is larger than $2^{-N}\binom{N}{\epsilon N}^{-1}$, then we show that the code can be list-decoded with high probability, where the size of the list is related to $2^{N}\binom{N}{\epsilon N}\|f\|_{2}^{2}$.

Thus, to understand decoding, we need to understand $\|f\|_{2}^{2}$. Using Fourier analysis, we express this quantity as

$\|f\|_{2}^{2}=\sum_{j=0}^{N}\Pr[|c^{\perp}|=j]\cdot K_{\epsilon N}(j)^{2},$ (2)

where $c^{\perp}$ is a uniformly random codeword in the dual code and $K_{\epsilon N}$ is the Krawtchouk polynomial of degree $\epsilon N$. We note that such relations for the coset weight distribution have been used to understand the discrepancy of subsets of the sphere, as well as subsets of other homogeneous spaces. In particular, (2) was proven in a slightly different form in [Bar21] (see Theorem 2.1 and Lemma 4.1), whereas over $\mathbb{R}^{N}$ results of this type had previously been derived in [BDM18, Skr19].

Using estimates for the magnitude of Krawtchouk polynomials and bounds for the weight distribution of the dual code $C^{\perp}$, one can thus bound the norm $\|f\|_{2}^{2}$ in the set-up where the error string $z$ is a random vector of weight exactly $\epsilon N$. Using essentially the same techniques, one can also bound the norm $\|f\|_{2}^{2}$ when the error string $z$ is a random vector of weight $\approx\epsilon N$, i.e. $z$ is taken uniformly at random from the set $S=\{x\in\mathbb{F}_{2}^{N}:|x|=\epsilon N\pm N^{3/4}\}$.

Our next step is then to show that the $\ell_{2}$ norm corresponding to the $\epsilon$-noisy distribution is very similar to the $\ell_{2}$ norm corresponding to the uniform distribution over $S$. Intuitively, this is because $S$ only contains a very small range of weights, so the $\epsilon$-noisy distribution and the uniform distribution must behave very similarly over strings of weight in $S$. It then follows that their corresponding $\ell_{2}$ norms must be similar as well.

Our decoding criteria (Theorem 2, Corollary 3) are thus obtained by bounding the norm $\|f\|_{2}^{2}$ using estimates for Krawtchouk polynomials and for the weight distribution of the dual code $C^{\perp}$. Our list-decoding results (Theorems 4 and 5) then follow from our weight bound for transitive codes (Theorem 1) and from a weight bound of Samorodnitsky for Reed-Muller codes (Theorem 6).

1.2 Related Work

It has been shown that LDPC codes achieve capacity over Binary Memoryless Symmetric (BMS) channels [LMS+97, KRU13, Gal62], which include both the BSC and the BEC. These constructions are not deterministic, and it is only with the advent of polar codes [Ari09] that we obtained capacity-achieving codes with both a deterministic construction and efficient encoding and decoding algorithms.

Polar codes are closely related to Reed-Muller codes, in the sense that they also consist of subspaces that correspond to polynomials over $\mathbb{F}_{2}$ [Ari09]. In [Ari09] it was shown that polar codes achieve capacity over the BSC, and algorithms were given to both encode and decode them.

It has long been believed that Reed-Muller codes achieve capacity, and significant progress has been made in that direction over the last few years. (See [ASY21] for a discussion of the subject, as well as a thorough exposition of Reed-Muller codes.) Abbe, Shpilka and Wigderson first showed that Reed-Muller codes achieve capacity over the BSC and the BEC for sub-constant and super-constant rates [ASW15]. Kudekar, Kumar, Mondelli, Pfister, Sasoglu and Urbanke then proved that in the constant rate regime, Reed-Muller codes achieve capacity over the BEC [KKM+16]. Abbe and Ye showed that the Reed-Muller transform polarizes the conditional mutual information, and proved that some non-explicit variant of the Reed-Muller code achieves capacity [AY19]. (They conjecture that this variant is in fact the Reed-Muller code itself.) Hazla, Samorodnitsky and Sberlo then proved that Reed-Muller codes of constant rate can decode a constant fraction of errors on the BSC [HSS21]; this had previously been shown for Almost-Reed-Muller codes by Abbe, Hazla and Nachum [AHN21]. Most recently, Reeves and Pfister showed that Reed-Muller codes achieve capacity over all BMS channels under bit-MAP decoding [RP21], i.e. that one can with high probability recover any single bit of the original codeword (but not with high enough probability that one could take a union bound). Despite these breakthroughs, the conjecture that Reed-Muller codes achieve capacity over all BMS channels under block-MAP decoding (i.e. recovering the whole codeword with high probability) is ultimately still open.

Weight Bounds for Reed-Muller Codes
Several past works have proven bounds on the weight distribution of Reed-Muller codes. Kaufman, Lovett and Porat gave asymptotically tight bounds on the weight distribution of Reed-Muller codes of constant degree [KLP12]. Abbe, Shpilka and Wigderson then built on these techniques to obtain bounds for all degrees smaller than $\frac{n}{4}$ [ASW15], before Sberlo and Shpilka again improved the approach and obtained bounds for all degrees [SS20]. Most recently, Samorodnitsky used completely different ideas to obtain weight bounds in the regime where both the rate of the code and the normalized weight of the codeword are $\Theta(1)$ [Sam20]. We will later use the following result of his in our list-decoding arguments:

Theorem 6 ([Sam20]).

Let $\binom{n}{\leq d}=\eta 2^{n}=\eta N$ for some $\eta\in(0,1)$, and denote by $\mathcal{D}(n,d)$ the uniform distribution over all codewords in $\mathsf{RM}(n,d)$. Then for any $\alpha\in(0,\frac{1}{2})$ we have

$\Pr_{c\sim\mathcal{D}(n,d)}[|c|\leq\alpha N]\leq 2^{o(N)}\left(\frac{1}{1-\eta}\right)^{2\ln 2\cdot\alpha N}2^{-\eta N}.$

These bounds are strong when $\alpha\ll 1/2$. For $\alpha$ close to $1/2$, the first results we are aware of are due to Ben-Eliezer, Hod and Lovett [BHL12]. Their bounds, which were extended to Reed-Muller codes over prime fields by Beame, Oveis Gharan and Yang [BGY20], are strongest when the degree is sublinear. Sberlo and Shpilka then obtained bounds for all degrees in [SS20], while Samorodnitsky again obtained bounds in the regime where both $\alpha$ and $\eta$ are $\Theta(1)$ [Sam20].

We note that in some regimes (e.g. when the degree satisfies $0.38n<d<0.499n$ and $\alpha$ is larger than some constant depending on $d/n$), our Theorem 1 improves on all the aforementioned weight bounds. See Appendix A for some details.

List Decoding
List decoding was proposed by Elias in 1957 as an alternative to unique decoding [Eli57]. In the list decoding framework, the receiver of a corrupted codeword is asked to output a list of potential codewords, with the guarantee that with high probability one of these codewords is the original one. This of course allows for a greater fraction of errors to be tolerated.

The list decoding community has largely focused on proving results for the adversarial noise model, and many codes are now known to achieve list-decoding capacity. For example, uniformly random codes achieve capacity, as do uniformly random linear codes [GHSZ02, LW18, GHK11]. Folded Reed-Solomon codes were the first explicit codes to provably achieve list-decoding capacity [GR08], followed by several others a few years later [GX12, Kop15, HRW17, MRR+20]. For the rest of this paper however, we will exclusively work in the model where the errors are stochastic. In this model, the strongest known list decoding bound for the code $\mathsf{RM}(n,d)$ with $\binom{n}{\leq d}=\eta N>N-N\log(1+2\sqrt{\epsilon(1-\epsilon)})$ is, to our knowledge, that one can output a list $T$ of size

$|T|=2^{\epsilon N\log\frac{4\epsilon(1-\epsilon)}{(1-\eta)^{4\ln 2}}+o(N)}$ (3)

and succeed with high probability in decoding $\epsilon$-errors. This result, although not explicitly stated in [Sam20], can be obtained from his weight bound of Theorem 6 by bounding the expected number of codewords that end up closer to the received string than the original codeword, and then applying Markov's inequality. We note that the expression in (3) stays strictly above the optimal size of $2^{h(\epsilon)N-(1-\eta)N+o(N)}$ (see Appendix D.1 for a proof of this).

Krawtchouk polynomials
The Krawtchouk polynomial of degree $s$ is the polynomial

$K_{s}(x)=\sum_{j=0}^{s}(-1)^{j}\binom{x}{j}\binom{N-x}{s-j}.$

For any subset $S\subseteq\{0,\dotsc,N\}$, we will be interested in the corresponding polynomial $K_{S}(x):=\sum_{s\in S}K_{s}(x)$. For $v\in\mathbb{F}_{2}^{N}$, we will sometimes abuse notation and use $K_{S}(v)$ to mean $K_{S}(|v|)$. The following proposition follows from standard results (see for instance [KL99], or Theorem 16 in [MS77]).

Proposition 7.

For any $N$ and any $S\subseteq\{1,\dotsc,N\}$, we have

$\frac{2^{-N}}{\sum_{s\in S}\binom{N}{s}}\sum_{j=0}^{N}\binom{N}{j}K_{S}(j)^{2}=1.$
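The definition and Proposition 7 are easy to check numerically; here is a short sketch (ours, for illustration):

```python
import math

def krawtchouk(N, s, x):
    # K_s(x) = sum_{j=0}^{s} (-1)^j binom(x, j) binom(N - x, s - j)
    return sum((-1) ** j * math.comb(x, j) * math.comb(N - x, s - j)
               for j in range(s + 1))

def proposition7_lhs(N, S):
    # 2^{-N} / binom(N, S) * sum_j binom(N, j) K_S(j)^2, which should equal 1
    K_S = lambda j: sum(krawtchouk(N, s, j) for s in S)
    total = sum(math.comb(N, j) * K_S(j) ** 2 for j in range(N + 1))
    return total / (2 ** N * sum(math.comb(N, s) for s in S))

print(proposition7_lhs(12, {2, 5}))   # prints exactly 1.0
```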

Good estimates for Krawtchouk polynomials of any degree were obtained in [KL95, IS98, Pol19] (see e.g. [Pol19], Lemma 2.1). These estimates are asymptotically tight in the exponent. Note that $|K_{s}(x)|=|K_{s}(N-x)|=|K_{N-s}(x)|$, so it suffices to understand the case $x,s\leq\frac{N}{2}$.

Theorem 8 ([KL95, IS98, Pol19]).

Let $\epsilon,\delta\in(0,\frac{1}{2})$ be arbitrary. If $\delta\geq\frac{1}{2}-\sqrt{\epsilon(1-\epsilon)}$, then

$|K_{\epsilon N}(\delta N)|\leq 2^{(1+h(\epsilon)-h(\delta))\frac{N}{2}}.$

If $\delta<\frac{1}{2}-\sqrt{\epsilon(1-\epsilon)}$, define $\omega=\frac{1-2\delta-\textnormal{sgn}(1-2\delta)\sqrt{(1-2\delta)^{2}-4\epsilon(1-\epsilon)}}{2(1-2\delta)}$. Then

$|K_{\epsilon N}(\delta N)|\leq\frac{(1-\omega)^{\delta N}(1+\omega)^{(1-\delta)N}}{\omega^{\epsilon N}}.$

As the second expression can be somewhat cumbersome to use, [Pol19] also gives the following weaker bound (see Lemma 2.2 and equation 2.10 in [Pol19]):

Theorem 9 ([Pol19]).

For any $\epsilon\in(0,\frac{1}{2})$ and any $\delta<\frac{1}{2}-\sqrt{\epsilon(1-\epsilon)}$, we have

$|K_{\epsilon N}(\delta N)|\leq 2^{h(\epsilon)N+\epsilon N\log(1-2\delta)}.$
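As a sanity check, one can compare the exact value of $|K_{\epsilon N}(\delta N)|$ against the bound of Theorem 9; a quick sketch with our own (arbitrary) choice of parameters:

```python
import math

def h(p):
    # binary entropy function
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def krawtchouk(N, s, x):
    # as in the sketch above
    return sum((-1) ** j * math.comb(x, j) * math.comb(N - x, s - j)
               for j in range(s + 1))

N, eps, delta = 100, 0.1, 0.1   # delta < 1/2 - sqrt(eps * (1 - eps)) = 0.2
exact = abs(krawtchouk(N, round(eps * N), round(delta * N)))
bound = 2 ** (h(eps) * N + eps * N * math.log2(1 - 2 * delta))
assert exact <= bound           # the exact value stays below the bound
```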

We will need the above estimates when using our Theorem 2 to obtain list-decoding results for transitive codes and Reed-Muller codes.

2 Notation, Conventions and Preliminaries

For the sake of conciseness, we will use the notation

$\{a\pm l\}=\{a-l,\dotsc,a+l\},$

the notation

$\binom{n}{\leq d}=\binom{n}{0}+\binom{n}{1}+\dotsb+\binom{n}{d},$

and for $S\subseteq\{0,\dotsc,N\}$ the notation

$\binom{N}{S}=\sum_{s\in S}\binom{N}{s}.$

Let $N=2^{n}$. We will be working with the vector spaces $\mathbb{F}_{2}^{n}$ and $\mathbb{F}_{2}^{N}$. For convenience, we associate $\mathbb{F}_{2}^{n}$ with the set $[N]=\{1,2,\dotsc,N\}$ by ordering the elements of $\mathbb{F}_{2}^{n}$ lexicographically. For $x\in\mathbb{F}_{2}^{N}$, we write $|x|=|\{j\in[N]:x_{j}=1\}|$ to denote the weight of $x$.

2.1 Linear Codes

An $N$-bit code is a subset $C\subseteq\mathbb{F}_{2}^{N}$. Whenever $C$ is a subspace of $\mathbb{F}_{2}^{N}$, we say that $C$ is a linear code. Any linear code $C\subseteq\mathbb{F}_{2}^{N}$ can be represented by its generator matrix, which is a $\dim C\times N$ matrix $G$ whose rows form a basis for $C$. The matrix $G$ generates all codewords of $C$ in the sense that

$C=\{vG:v\in\mathbb{F}_{2}^{\dim C}\}.$

Another useful way to describe a linear code $C\subseteq\mathbb{F}_{2}^{N}$ is via its parity-check matrix, which is an $(N-\dim C)\times N$ matrix $H$ whose rows span the orthogonal complement of $C$. The linear code $C$ can then be expressed as

$C=\{c\in\mathbb{F}_{2}^{N}:Hc^{\intercal}=0\}.$
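As a small illustration (the $[7,4]$ Hamming code is our choice of example; it is not a code studied in this paper), one can check that the two descriptions agree:

```python
import itertools
import numpy as np

# Generator and parity-check matrices of the [7,4] Hamming code.
G = np.array([[1, 0, 0, 0, 1, 1, 0],
              [0, 1, 0, 0, 1, 0, 1],
              [0, 0, 1, 0, 0, 1, 1],
              [0, 0, 0, 1, 1, 1, 1]])
H = np.array([[1, 1, 0, 1, 1, 0, 0],
              [1, 0, 1, 1, 0, 1, 0],
              [0, 1, 1, 1, 0, 0, 1]])

# C = { vG : v in F_2^{dim C} } has 2^{dim C} codewords...
C = {tuple(np.dot(v, G) % 2) for v in itertools.product([0, 1], repeat=4)}
assert len(C) == 2 ** 4
# ...and every codeword satisfies H c^T = 0.
assert all(not np.any(np.dot(H, c) % 2) for c in C)
```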

One property that will play an important role is transitivity, which we define below:

Definition 1.

A set $C\subseteq\mathbb{F}_{2}^{N}$ is transitive if for every $i,j\in[N]$ there exists a permutation $\pi:[N]\rightarrow[N]$ such that

  (i)

    $\pi(i)=j$

  (ii)

    For every element $v=(v_{1},\dotsc,v_{N})\in C$ we have $(v_{\pi(1)},\dotsc,v_{\pi(N)})\in C$.

We note that the dual code of a transitive code is also transitive (see Appendix D.2 for the proof).

Claim 10.

The dual code $C^{\perp}$ of a transitive code $C\subseteq\mathbb{F}_{2}^{N}$ is transitive.

2.2 Reed-Muller Codes

We will denote by $\mathsf{RM}(n,d)$ the Reed-Muller code with $n$ variables and degree $d$. Throughout this section, we let $M$ be the generator matrix of $\mathsf{RM}(n,d)$; this is an $\binom{n}{\leq d}\times N$ matrix whose rows correspond to sets of size at most $d$, ordered lexicographically, and whose columns correspond to elements of $\mathbb{F}_{2}^{n}$. For $S\subseteq[n]$ with $|S|\leq d$ and $x\in\mathbb{F}_{2}^{n}$, the corresponding entry is $M_{S,x}=\prod_{j\in S}x_{j}$. If $S$ is empty, this entry is set to $1$.

If $v\in\mathbb{F}_{2}^{\binom{n}{\leq d}}$ is a row vector, $v$ can be thought of as describing the coefficients of a multilinear polynomial in $\mathbb{F}_{2}[X_{1},\dotsc,X_{n}]$ of degree at most $d$. The row vector $vM$ then consists of the evaluations of this polynomial on all inputs from $\mathbb{F}_{2}^{n}$. It is well known that $M$ has full rank $\binom{n}{\leq d}$. In fact we have the following standard fact (see Appendix D.3 for the proof):

Fact 11.

The columns of $M$ that correspond to the points $x\in\mathbb{F}_{2}^{n}$ with $|x|\leq d$ are linearly independent.
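The following sketch (ours) builds $M$ for a small Reed-Muller code and verifies Fact 11 by Gaussian elimination over $\mathbb{F}_2$:

```python
import itertools

def rm_generator(n, d):
    # One row per set S with |S| <= d, one column per point x in F_2^n;
    # the entry is M[S, x] = prod_{j in S} x_j (empty product = 1).
    points = list(itertools.product([0, 1], repeat=n))
    sets = [S for r in range(d + 1) for S in itertools.combinations(range(n), r)]
    return [[int(all(x[j] for j in S)) for x in points] for S in sets], points

def rank_gf2(rows):
    # Row reduction over F_2.
    rows = [row[:] for row in rows]
    rank = 0
    for col in range(len(rows[0])):
        pivot = next((i for i in range(rank, len(rows)) if rows[i][col]), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for i in range(len(rows)):
            if i != rank and rows[i][col]:
                rows[i] = [a ^ b for a, b in zip(rows[i], rows[rank])]
        rank += 1
    return rank

n, d = 4, 2
M, points = rm_generator(n, d)
low = [i for i, x in enumerate(points) if sum(x) <= d]  # columns with |x| <= d
sub = [[row[i] for i in low] for row in M]
assert rank_gf2(sub) == len(M) == len(low)  # Fact 11: those columns are independent
```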

The parity-check matrix of the Reed-Muller code is known to be the generator matrix of a different Reed-Muller code. Namely, let $H$ be the $\binom{n}{\leq n-d-1}\times N$ generator matrix of the code $\mathsf{RM}(n,n-d-1)$. Then $H$ has full rank, and $MH^{\intercal}=0$. So the rows of $H$ are a basis for the orthogonal complement of the span of the rows of $M$. Reed-Muller codes also have useful algebraic features, notably transitivity:

Fact 12.

For all $n$ and all $d\leq n$, the Reed-Muller code $\mathsf{RM}(n,d)$ is transitive.

See Appendix D.3 for the proof.

2.3 Entropy

The binary entropy function $h:[0,1]\rightarrow\mathbb{R}$ is defined to be

$h(\epsilon)=\epsilon\cdot\log\frac{1}{\epsilon}+(1-\epsilon)\cdot\log\frac{1}{1-\epsilon}.$

The following fact allows us to approximate binomial coefficients using the entropy function:

Fact 13.

For $n/2\geq d\geq 1$, $\sqrt{\frac{8\pi}{e^{4}n}}\cdot 2^{h(d/n)\cdot n}\leq\binom{n}{d}\leq\binom{n}{\leq d}\leq 2^{h(d/n)\cdot n}.$

The leftmost inequality is a consequence of Stirling’s approximation for the binomial coefficients, and the rightmost is a consequence of the sub-additivity of entropy.
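A quick numeric check of Fact 13 (the parameters are our own choice):

```python
import math

def h(p):
    # binary entropy function
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

n, d = 100, 30
upper = 2 ** (h(d / n) * n)
lower = math.sqrt(8 * math.pi / (math.e ** 4 * n)) * upper
assert lower <= math.comb(n, d) \
             <= sum(math.comb(n, i) for i in range(d + 1)) \
             <= upper
```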

The following lemma, which is essentially a 2-way version of Pinsker's inequality, gives a useful way to control the entropy function near $1/2$.

Lemma 14.

For any $\mu\in(0,1)$, we have

$\frac{\mu^{2}}{2\ln 2}\leq 1-h\left(\frac{1-\mu}{2}\right)\leq\mu^{2}.$

See Appendix D.4 for the proof.

2.4 Probability Distributions

There are two types of probability distributions that we will use frequently. The first one is the $\epsilon$-Bernoulli distribution over $\mathbb{F}_{2}^{N}$, which we will denote by

$P_{\epsilon}(z)=\epsilon^{|z|}(1-\epsilon)^{N-|z|}.$

The second one is the uniform distribution over some set $T$, which we will denote by

$\mathcal{D}(T)(z)=\begin{cases}\frac{1}{|T|}&\text{if }z\in T,\\ 0&\text{otherwise.}\end{cases}$

There are two particular cases of the uniform distribution that will occur often enough that we give them their own notation. The first one is the uniform distribution over $\mathbb{F}_{2}^{t}$, which we will denote by

$\mu_{t}=\mathcal{D}(\mathbb{F}_{2}^{t}).$

The second one is the uniform distribution over all vectors $z\in\mathbb{F}_{2}^{N}$ of weight $|z|\in S$, for some $S\subseteq\{0,\dotsc,N\}$. We will denote this probability distribution by

$\lambda_{S}=\mathcal{D}(\{z\in\mathbb{F}_{2}^{N}:|z|\in S\}).$

2.5 Probability Theory

We will need two very standard results from probability theory (see e.g. [BLM13]): Markov's inequality and Chernoff's bound. We start with Markov's inequality.

Lemma 15.

Let $X$ be a nonnegative random variable. Then for any $a>0$, we have

$\Pr[X\geq a]\leq\frac{\mathbb{E}[X]}{a}.$

We will also need Chernoff’s bound:

Lemma 16.

Let $X_{1},\dotsc,X_{n}$ be i.i.d. random variables taking values in $\{0,1\}$, and define $X=X_{1}+\dotsb+X_{n}$. Then for any $\delta\in(0,1)$, we have

$\Pr\Big[\big|X-\mathbb{E}[X]\big|>\delta\cdot n\mathbb{E}[X_{1}]\Big]\leq 2e^{-\frac{\delta^{2}\cdot n\mathbb{E}[X_{1}]}{3}}.$

2.6 Fourier Analysis

The Fourier basis is a useful basis for the space of functions mapping $\mathbb{F}_{2}^{N}$ to the real numbers. We recall some of its properties below (see e.g. [dW08]). For $f,g:\mathbb{F}_{2}^{N}\rightarrow\mathbb{R}$, define the inner product

$\langle f,g\rangle=\frac{1}{2^{N}}\sum_{x\in\mathbb{F}_{2}^{N}}f(x)g(x).$

For every $x,y\in\mathbb{F}_{2}^{N}$, define the character

$\chi_{y}(x)=(-1)^{\sum_{j=1}^{N}x_{j}y_{j}}.$

These functions form an orthonormal basis: for $y,y'\in\mathbb{F}_{2}^{N}$,

$\langle\chi_{y},\chi_{y'}\rangle=\begin{cases}1&\text{if }y=y',\\ 0&\text{otherwise.}\end{cases}$

We define the Fourier coefficients $\hat{f}(y)=\langle f,\chi_{y}\rangle$. Then for $f,g:\mathbb{F}_{2}^{N}\rightarrow\mathbb{R}$, we have

$\langle f,g\rangle=\sum_{y\in\mathbb{F}_{2}^{N}}\hat{f}(y)\cdot\hat{g}(y).$

In particular,

$\|f\|^{2}_{2}=\langle f,f\rangle=\sum_{y}\hat{f}(y)^{2}.$
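These identities are easy to confirm numerically; here is a small sketch for Parseval's identity (our own toy example, with $N$ small and a random $f$):

```python
import itertools
import numpy as np

N = 4
rng = np.random.default_rng(0)
xs = list(itertools.product([0, 1], repeat=N))
f = {x: rng.standard_normal() for x in xs}          # a random f : F_2^N -> R

def fhat(y):
    # \hat f(y) = <f, chi_y> = 2^{-N} sum_x f(x) (-1)^{sum_j x_j y_j}
    return sum(f[x] * (-1) ** np.dot(x, y) for x in xs) / 2 ** N

lhs = sum(v ** 2 for v in f.values()) / 2 ** N      # ||f||_2^2
rhs = sum(fhat(y) ** 2 for y in xs)                 # sum_y fhat(y)^2
assert abs(lhs - rhs) < 1e-12
```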

3 Outline of the Paper

The main question we will be looking into is whether or not a family of codes $\{C_{N}\}$, with $C_{N}\subseteq\mathbb{F}_{2}^{N}$, is asymptotically resilient to independent errors of probability $\epsilon$. Formally, we are given a list size $k=k(N)$ and want to know if there exists a family of decoding functions $\{d_{N}\}$, with $d_{N}:\mathbb{F}_{2}^{N}\rightarrow\left(\mathbb{F}_{2}^{N}\right)^{\otimes k}$, such that for every sequence of codewords $\{c_{N}\}$ we have

$\lim_{N\rightarrow\infty}\Pr_{\rho_{N}\sim P_{\epsilon}}\big[c_{N}\notin d_{N}(c_{N}+\rho_{N})\big]=0.$

We note that the unique decoding problem can be seen as setting $k=1$ in the above set-up. Our general approach will be based on trying to identify the error string $\rho\in\mathbb{F}_{2}^{N}$ from its image $H\rho^{\intercal}$. In particular, we will be interested in the max-likelihood decoder

$D_{k}(x)=\mathop{\textnormal{argmax}}_{\substack{\{z_{1},\dotsc,z_{k}\}\subseteq\mathbb{F}_{2}^{N}\\ Hz_{i}^{\intercal}=x\textnormal{ for all }i}}\{P_{\epsilon}(z_{1})+\dotsb+P_{\epsilon}(z_{k})\}=\mathop{\textnormal{argmin}}_{\substack{\{z_{1},\dotsc,z_{k}\}\subseteq\mathbb{F}_{2}^{N}\\ Hz_{i}^{\intercal}=x\textnormal{ for all }i}}\{|z_{1}|+\dotsb+|z_{k}|\}.$ (4)

We show in the following lemma that if the max-likelihood decoder is able to identify the error string $\rho$, then it is possible to recover the original codeword.

Lemma 17.

Let $H$ be the $t\times N$ parity-check matrix of the linear code $C$, and let $D:\mathbb{F}_{2}^{t}\rightarrow\left(\mathbb{F}_{2}^{N}\right)^{\otimes k}$ be an arbitrary function. Then there exists a decoding function

$d:\mathbb{F}_{2}^{N}\rightarrow\left(\mathbb{F}_{2}^{N}\right)^{\otimes k}$

such that for every $c\in C$ we have

$\Pr_{\rho\sim P_{\epsilon}}[c\notin d(c+\rho)]\leq\Pr_{\rho\sim P_{\epsilon}}[\rho\notin D(H\rho^{\intercal})].$
Proof.

Given $D:\mathbb{F}_{2}^{t}\rightarrow\left(\mathbb{F}_{2}^{N}\right)^{\otimes k}$, define $d:\mathbb{F}_{2}^{N}\rightarrow\left(\mathbb{F}_{2}^{N}\right)^{\otimes k}$ to be

$d(z)=\{z+y:y\in D(Hz^{\intercal})\}.$

We will show that whenever $\rho$ satisfies $\rho\in D(H\rho^{\intercal})$, we also have $c\in d(c+\rho)$ for every $c\in C$. Suppose $\rho\in D(H\rho^{\intercal})$. Note that since $H$ is the parity-check matrix of $C$, every $c\in C$ satisfies $Hc^{\intercal}=0$. So for every $c\in C$, any $\rho$ that satisfies $\rho\in D(H\rho^{\intercal})$ must also satisfy $\rho\in D(H(c+\rho)^{\intercal})$. It then follows by the definition of $d(c+\rho)$ that

$c=c+\rho+\rho\in d(c+\rho).$ ∎
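To make the reduction concrete, here is a brute-force sketch of the max-likelihood decoder $D_k$ of (4) together with the decoding function $d$ from the proof, run on the $[7,4]$ Hamming code (our illustrative choice of $H$; the search is exponential in $N$, so this is for tiny codes only):

```python
import itertools
import numpy as np

def D_k(H, x, k=1):
    # Max-likelihood decoder of (4): among all z with H z^T = x,
    # return the k strings of smallest weight.
    N = H.shape[1]
    coset = [np.array(z) for z in itertools.product([0, 1], repeat=N)
             if np.array_equal(np.dot(H, z) % 2, np.asarray(x))]
    coset.sort(key=lambda z: int(z.sum()))
    return coset[:k]

H = np.array([[1, 1, 0, 1, 1, 0, 0],   # parity-check matrix of the
              [1, 0, 1, 1, 0, 1, 0],   # [7,4] Hamming code
              [0, 1, 1, 1, 0, 0, 1]])
c = np.zeros(7, dtype=int)             # a codeword (0 is in every linear code)
rho = np.array([0, 0, 0, 0, 0, 1, 0])  # one flipped bit
r = (c + rho) % 2
# Lemma 17's decoder: d(r) = { r + y : y in D_k(H r^T) }
decoded = [(r + y) % 2 for y in D_k(H, np.dot(H, r) % 2)]
assert any(np.array_equal(w, c) for w in decoded)
```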

From this point onward, our goal will thus be to prove that the max-likelihood decoder in (4) succeeds in recovering $\rho$ with high probability. In section 4, we relate the decoding error probability of the max-likelihood decoder $D_{k}$ to the collision probability

$\sum_{x\in\mathbb{F}_{2}^{t}}\Pr[Hz^{\intercal}=x]^{2}.$

In section 5, we build on this result to obtain a bound on the performance of $D_{k}$ in terms of the weight distribution of the dual code. We then present new bounds on the weight distribution of transitive codes in section 6. These bounds are interesting in their own right, and we show that they are essentially tight. In section 7, we combine these bounds with our results from section 5 to obtain list-decoding results for transitive linear codes. We then repeat this argument with Samorodnitsky's Theorem 6 in section 8 to obtain a stronger list-decoding bound for Reed-Muller codes.

4 Collisions vs Decoding

Recall that we denote by $P_{\epsilon}$ the $\epsilon$-Bernoulli distribution over $\mathbb{F}_{2}^{N}$, i.e. the distribution

$P_{\epsilon}(z)=\epsilon^{|z|}(1-\epsilon)^{N-|z|}.$

Recall also that for any subset $S\subseteq\{0,\dotsc,N\}$, we denote by $\lambda_{S}$ the uniform distribution over all strings $z\in\mathbb{F}_{2}^{N}$ of weight $|z|\in S$, i.e.

$\lambda_{S}(z)=\begin{cases}\frac{1}{\sum_{j\in S}\binom{N}{j}}&\text{if }|z|\in S,\\ 0&\text{otherwise.}\end{cases}$

The goal of this section will be to analyze the relationship between the decoding of an error string $\rho\in\mathbb{F}_{2}^{N}$ and the collision probability of strings $z\in\mathbb{F}_{2}^{N}$ within the map $z\mapsto Hz^{\intercal}$. Intuitively, the more collisions there are within this mapping, the harder it is for our decoder to correctly identify the error string $\rho\in\mathbb{F}_{2}^{N}$ upon seeing only its image $H\rho^{\intercal}\in\mathbb{F}_{2}^{t}$. However, certain error strings might be unlikely enough to occur that our decoder can safely ignore them. For example, if we are interested in an $\epsilon$-noisy error string $\rho$, then $\rho$ is unlikely to have weight $|\rho|$ far away from $\epsilon N$. We could thus choose to ignore all strings whose weights do not lie in the set $S=\{\epsilon N-l,\dotsc,\epsilon N+l\}$, for some integer $l$. In order to analyze the collisions that occur when strings are required to have weight $|z|\in S$, we define for every $z\in\mathbb{F}_{2}^{N}$ and every $S\subseteq\{0,\dotsc,N\}$ the set of $S$-colliders of $z$, i.e. the set of strings $y$ that lie in the coset of $z$ and have weight $|y|\in S$:

Definition 2.

For any $z\in\mathbb{F}_{2}^{N}$ and any subset $S\subseteq\{0,\dotsc,N\}$, define

$\Omega_{z}^{S}=\left\{y\in\mathbb{F}_{2}^{N}:|y|\in S\textnormal{ and }Hy^{\intercal}=Hz^{\intercal}\right\}.$

This definition captures a natural parameter for how large of a list we need before we can confidently claim that it contains the error string: if we are given $H\rho^{\intercal}$ and are told that with high probability the error string $\rho$ has weight $|\rho|\in S$, then we should output the list $\Omega_{\rho}^{S}$. For unique decoding we want to argue that $|\Omega_{\rho}^{S}|=1$ with high probability, whereas for list decoding we want to argue that $|\Omega_{\rho}^{S}|\leq k$ with high probability, for some integer $k>1$. The expectation of $|\Omega_{\rho}^{S}|$ will thus be a key quantity in our analysis. We will call this expectation the "collision count," because it will later be useful to interpret it as the renormalized collision probability of the map $z\mapsto Hz^{\intercal}$ (see for instance the proof of Proposition 20).

Definition 3.

For any subset $S\subseteq\{0,\dotsc,N\}$ and any $t\times N$ matrix $H$, define

$\mathsf{Coll}_{H}(S)=\mathbb{E}_{z\sim\lambda_{S}}\big[|\Omega_{z}^{S}|\big].$
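On small examples, this expectation can be computed directly by brute force; a sketch (ours):

```python
import itertools
from collections import Counter
import numpy as np

def coll(H, S):
    # Coll_H(S) = E_{z ~ lambda_S}[ |Omega_z^S| ]: draw z uniformly from the
    # weight shell S and count the coset members of z inside that shell.
    N = H.shape[1]
    shell = [z for z in itertools.product([0, 1], repeat=N) if sum(z) in S]
    syndromes = [tuple(np.dot(H, z) % 2) for z in shell]
    counts = Counter(syndromes)
    return sum(counts[s] for s in syndromes) / len(shell)

H = np.array([[1, 0, 1, 1, 0],   # an arbitrary 2 x 5 parity-check matrix
              [0, 1, 1, 0, 1]])
print(coll(H, {1, 2}))           # always >= 1, and equals 1 iff no collisions
```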

In the following lemma, we use Markov's inequality to bound the probability of a list decoding error in terms of $\mathsf{Coll}_{H}(S)$.

Lemma 18.

For any subset $S\subseteq\{0,\dotsc,N\}$, any matrix $H$ with $N$ columns, and any integer $k\geq 1$, we have

$\Pr_{\rho\sim\lambda_{S}}\big[|\Omega_{\rho}^{S}|>k\big]\leq\frac{\mathsf{Coll}_{H}(S)-1}{k}.$
Proof.

Note that $|\Omega_{z}^{S}|\geq 1$ for any $z\in\mathbb{F}_{2}^{N}$ with weight $|z|\in S$, so the random variable $|\Omega_{z}^{S}|-1$ is always non-negative. Applying Markov's inequality (i.e. Lemma 15), we then have

$\Pr_{\rho\sim\lambda_{S}}\big[|\Omega_{\rho}^{S}|>k\big]=\Pr_{\rho\sim\lambda_{S}}\big[|\Omega_{\rho}^{S}|-1\geq k\big]\leq\frac{\mathsf{Coll}_{H}(S)-1}{k}.$ ∎

When the error string $\rho$ is sampled uniformly at random from the set $\{z\in\mathbb{F}_{2}^{N}:|z|\in S\}$, the above lemma allows us to relate the decoding error probability to the collision count $\mathsf{Coll}_{H}(S)$. The problem we are most interested in, however, is when $\rho$ is sampled not from some uniform distribution, but from the $\epsilon$-noisy probability distribution $P_{\epsilon}$. We will now show how to connect these two decoding problems. The intuition is that by the Chernoff bound, we only need to concern ourselves with strings whose weights lie in $S=\{\epsilon N\pm l\}$, for some appropriately chosen $l$. But in this weight band all strings have similar weight, and so are given similar probability under the distribution $P_{\epsilon}$. Intuitively, the $P_{\epsilon}$-decoder must then perform very similarly to the $\lambda_{S}$-decoder. The following theorem makes this idea precise, and then uses Lemma 18 to bound the probability of a decoding error. Recall that $D_{k}:\mathbb{F}_{2}^{t}\rightarrow(\mathbb{F}_{2}^{N})^{\otimes k}$ is the max-likelihood decoder

$D_{k}(x)=\mathop{\textnormal{argmin}}_{\substack{\{z_{1},\dotsc,z_{k}\}\subseteq\mathbb{F}_{2}^{N}\\ Hz_{i}^{\intercal}=x\textnormal{ for all }i}}\{|z_{1}|+\dotsb+|z_{k}|\}.$
Theorem 19.

Fix $\epsilon<\frac{1}{2}$, let $H$ be any matrix with $N$ columns, and let $k=(2l+1)m+1$ for some integers $m\geq 0$ and $l\leq(\frac{1}{2}-\epsilon)N$. Then

$\Pr_{\rho\sim P_{\epsilon}}[\rho\notin D_{k}(H\rho^{\intercal})]\leq 2e^{-\frac{l^{2}}{3\epsilon N}}+\frac{4(l+1)}{k}\mathop{\max}_{\substack{S\subseteq\{\epsilon N\pm l\}\\ 1\leq|S|\leq 1+\mathbbm{1}\{k=1\}}}\Big\{\mathsf{Coll}_{H}(S)-1\Big\},$

where $\mathbbm{1}\{k=1\}$ is $1$ when $k=1$ and $0$ otherwise.

Proof.

We will consider the unique decoding case ($k=1$, i.e. $m=0$) and the list-decoding case ($k>1$, i.e. $m\in\mathbb{N}$) separately.
Case 1: Unique decoding, i.e. $k=1$.
Let $t$ be the number of rows in the matrix $H$. We will show that a slightly less performant decoder $\tilde{D}_{1}:\mathbb{F}_{2}^{t}\rightarrow\mathbb{F}_{2}^{N}$ satisfies the desired probability bound. We define $\tilde{D}_{1}$ as follows: upon receiving input $x\in\mathbb{F}_{2}^{t}$, $\tilde{D}_{1}$ outputs the minimum-weight string from the set $\{z\in\mathbb{F}_{2}^{N}:Hz^{\intercal}=x,\,|z|=\epsilon N\pm l\}$. If there is no such string, the decoder fails. It is clear that

$\Pr_{\rho\sim P_{\epsilon}}[\rho\neq D_{1}(H\rho^{\intercal})]\leq\Pr_{\rho\sim P_{\epsilon}}[\rho\neq\tilde{D}_{1}(H\rho^{\intercal})],$

since $D_{1}$ always returns the most likely string whereas $\tilde{D}_{1}$ may not. We thus turn to proving the desired bound for $\tilde{D}_{1}$. Letting

$B=\{z\in\mathbb{F}_{2}^{N}:\big||z|-\epsilon N\big|>l\},$

we have by Chernoff’s bound (i.e. Lemma 16) that

PrρPϵ[ρD~1(Hρ)]\displaystyle\Pr_{\rho\sim P_{\epsilon}}[\rho\neq\tilde{D}_{1}(H\rho^{\intercal})] PrρPϵ[ρB]+PrρPϵ[ρD~1(Hρ)|ρB]\displaystyle\leq\Pr_{\rho\sim P_{\epsilon}}[\rho\in B]+\Pr_{\rho\sim P_{\epsilon}}[\rho\neq\tilde{D}_{1}(H\rho^{\intercal})\big{|}\rho\notin B]
2el23ϵN+PrρPϵ[ρD~1(Hρ)|ρB].\displaystyle\leq 2e^{-\frac{l^{2}}{3\epsilon N}}+\Pr_{\rho\sim P_{\epsilon}}[\rho\neq\tilde{D}_{1}(H\rho^{\intercal})\big{|}\rho\notin B]. (5)

We want to bound the second term. For any ρB\rho\notin B, we define the set of ”problematic weights” S(ρ)={ϵNl,,|ρ|}S(\rho)=\{\epsilon N-l,...,|\rho|\} . We note that for ρB\rho\notin B, our decoder D~1\tilde{D}_{1} can only fail if there is some string zρz\neq\rho satisfying Hz=HρHz^{\perp}=H\rho^{\perp} and |z|S(ρ)|z|\in S(\rho). Recalling the definition ΩρS={z:Hz=Hρ,|z|S}\Omega_{\rho}^{S}=\{z:Hz^{\perp}=H\rho^{\perp},|z|\in S\}, we can then rewrite our equation (4) as

PrρPϵ[ρD~1(Hρ)]\displaystyle\Pr_{\rho\sim P_{\epsilon}}[\rho\neq\tilde{D}_{1}(H\rho^{\intercal})] 2el23ϵN+PrρPϵ[|ΩρS(ρ)|>1|ρB].\displaystyle\leq 2e^{-\frac{l^{2}}{3\epsilon N}}+\Pr_{\rho\sim P_{\epsilon}}\big{[}|\Omega_{\rho}^{S(\rho)}|>1\big{|}\rho\notin B\big{]}.

Considering the most problematic weight level $w$ within the region $\{\epsilon N\pm l\}$ and using a union bound over all lower levels $w'\leq w$, we get

$\Pr_{\rho\sim P_{\epsilon}}[\rho\neq\tilde{D}_{1}(H\rho^{\intercal})]\leq 2e^{-\frac{l^{2}}{3\epsilon N}}+\max_{w\in\{\epsilon N\pm l\}}\left\{\Pr_{\rho\sim P_{\epsilon}}\big[|\Omega_{\rho}^{S(\rho)}|>1\,\big|\,|\rho|=w\big]\right\}\leq 2e^{-\frac{l^{2}}{3\epsilon N}}+(2l+1)\mathop{\max}_{\substack{w,w'\in\{\epsilon N\pm l\}\\ w'\leq w}}\left\{\Pr_{\rho\sim P_{\epsilon}}\big[|\Omega_{\rho}^{\{w,w'\}}|>1\,\big|\,|\rho|=w\big]\right\}.$

We now note that under the condition $|\rho|=w$, the probability distributions $P_{\epsilon}(\rho)$ and $\lambda_{\{w,w'\}}(\rho)$ are identical (they are both uniform on strings of weight $w$). We can thus rewrite our last inequality as

$\Pr_{\rho\sim P_{\epsilon}}[\rho\neq\tilde{D}_{1}(H\rho^{\intercal})]\leq 2e^{-\frac{l^{2}}{3\epsilon N}}+(2l+1)\mathop{\max}_{\substack{w,w'\in\{\epsilon N\pm l\}\\ w'\leq w}}\left\{\Pr_{\rho\sim\lambda_{\{w,w'\}}}\big[|\Omega_{\rho}^{\{w,w'\}}|>1\,\big|\,|\rho|=w\big]\right\}.$

But by basic conditional probability we know that

$\Pr_{\rho\sim\lambda_{\{w,w'\}}}\big[|\Omega_{\rho}^{\{w,w'\}}|>1\big]\geq\Pr_{\rho\sim\lambda_{\{w,w'\}}}\big[|\rho|=w\big]\cdot\Pr_{\rho\sim\lambda_{\{w,w'\}}}\big[|\Omega_{\rho}^{\{w,w'\}}|>1\,\big|\,|\rho|=w\big],$

so we can bound our previous expression by

$\Pr_{\rho\sim P_{\epsilon}}[\rho\neq\tilde{D}_{1}(H\rho^{\intercal})]\leq 2e^{-\frac{l^{2}}{3\epsilon N}}+(2l+1)\mathop{\max}_{\substack{w,w'\in\{\epsilon N\pm l\}\\ w'\leq w}}\left\{\frac{\Pr_{\rho\sim\lambda_{\{w,w'\}}}\big[|\Omega_{\rho}^{\{w,w'\}}|>1\big]}{\Pr_{\rho\sim\lambda_{\{w,w'\}}}\big[|\rho|=w\big]}\right\}.$

Now for any $w<\frac{N}{2}$ and $w'\leq w$, we have $\Pr_{\rho\sim\lambda_{\{w,w'\}}}\big[|\rho|=w\big]=\frac{\binom{N}{w}}{\binom{N}{\{w,w'\}}}\geq\frac{\binom{N}{w}}{\binom{N}{w}+\binom{N}{w'}}\geq\frac{1}{2}$. It then follows that

$\Pr_{\rho\sim P_{\epsilon}}\big[\rho\neq\tilde{D}_{1}(H\rho^{\intercal})\big]\leq 2e^{-\frac{l^{2}}{3\epsilon N}}+2(2l+1)\cdot\mathop{\max}_{\substack{S\subseteq\{\epsilon N\pm l\}\\ |S|\in\{1,2\}}}\left\{\Pr_{\rho\sim\lambda_{S}}\big[|\Omega_{\rho}^{S}|>1\big]\right\}.$

The theorem statement then follows from Lemma 18.

Case 2: List decoding, i.e. $k>1$.
Let $t$ be the number of rows in the matrix $H$. We will show that a slightly less performant decoding function $D_{k,l}:\mathbb{F}_{2}^{t}\rightarrow(\mathbb{F}_{2}^{N})^{\otimes k}$ satisfies the desired probability bound. We define $D_{k,l}$ as follows: upon receiving input $x\in\mathbb{F}_{2}^{t}$, $D_{k,l}$ outputs $\frac{k-1}{2l+1}$ strings from $\{z\in\mathbb{F}_{2}^{N}:Hz^{\intercal}=x,\,|z|=w\}$, for each $w\in\{\epsilon N\pm l\}$. If there are fewer than $\frac{k-1}{2l+1}$ strings at some level $w$, the decoder returns all of them. It is clear that for any $l$ we have

$\Pr_{\rho\sim P_{\epsilon}}[\rho\notin D_{k}(H\rho^{\intercal})]\leq\Pr_{\rho\sim P_{\epsilon}}[\rho\notin D_{k,l}(H\rho^{\intercal})],$

since $D_{k}$ returns the $k$ most likely strings while $D_{k,l}$ returns at most $k-1$ strings. We thus turn to proving the desired bound for $D_{k,l}$. We first bound the probability that the weight of the error string $\rho$ is far away from its mean. Letting

$B=\Big\{z\in\mathbb{F}_{2}^{N}:\big||z|-\epsilon N\big|>l\Big\},$

we have, by Chernoff’s bound (i.e. Lemma 16), that

$\Pr_{\rho\sim P_{\epsilon}}[\rho\notin D_{k,l}(H\rho^{\intercal})]\leq\Pr_{\rho\sim P_{\epsilon}}[\rho\in B]+\Pr_{\rho\sim P_{\epsilon}}[\rho\notin D_{k,l}(H\rho^{\intercal})\,\big|\,\rho\notin B]\leq 2e^{-\frac{l^{2}}{3\epsilon N}}+\max_{w\in\{\epsilon N\pm l\}}\Pr_{\rho\sim P_{\epsilon}}[\rho\notin D_{k,l}(H\rho^{\intercal})\,\big|\,|\rho|=w].$

Since the distribution $P_{\epsilon}$ gives the same probability to any two strings of equal weight, we get

$\Pr_{\rho\sim P_{\epsilon}}[\rho\notin D_{k,l}(H\rho^{\intercal})]\leq 2e^{-\frac{l^{2}}{3\epsilon N}}+\max_{w\in\{\epsilon N\pm l\}}\Pr_{\rho\sim\lambda_{\{w\}}}[\rho\notin D_{k,l}(H\rho^{\intercal})]\leq 2e^{-\frac{l^{2}}{3\epsilon N}}+\max_{w\in\{\epsilon N\pm l\}}\Pr_{\rho\sim\lambda_{\{w\}}}\big[|\Omega_{\rho}^{\{w\}}|>\tfrac{k-1}{2l+1}\big].$

Applying Lemma 18, we get

$\Pr_{\rho\sim P_{\epsilon}}[\rho\notin D_{k,l}(H\rho^{\intercal})]\leq 2e^{-\frac{l^{2}}{3\epsilon N}}+\frac{2l+1}{k-1}\cdot\max_{w\in\{\epsilon N\pm l\}}\Big\{\mathsf{Coll}_{H}(\{w\})-1\Big\}\leq 2e^{-\frac{l^{2}}{3\epsilon N}}+\frac{2(l+1)}{k}\cdot\max_{w\in\{\epsilon N\pm l\}}\Big\{\mathsf{Coll}_{H}(\{w\})-1\Big\},$

where in the last line we used that $\frac{a}{b}\leq\frac{a+1}{b+1}$ whenever $a\leq b$. ∎

5 A Criterion for Decoding

In this section, we give a criterion that certifies that a linear code $C\subseteq\mathbb{F}_{2}^{N}$ is resilient to errors of probability $\epsilon$. We give such a criterion for both unique decoding and list decoding. The function we will need to make this connection is the Krawtchouk polynomial of degree $s$, which we recall is defined as

$K_{s}(x)=\sum_{j=0}^{s}(-1)^{j}\binom{x}{j}\binom{N-x}{s-j}.$

For vectors $v\in\mathbb{F}_{2}^{N}$, we will abuse notation and write $K_{s}(v)$ to mean $K_{s}(|v|)$. For convenience, we also define for any $S\subseteq\{0,\dotsc,N\}$ the function

$K_{S}(x)=\sum_{s\in S}K_{s}(x).$

In the following proposition, we use basic Fourier analysis tools to rewrite the collision count $\mathsf{Coll}_{H}(S)$ in terms of the Krawtchouk polynomial $K_{S}$. We note that Proposition 20 was previously proven in a different form in [Bar21] (see Theorem 2.1 and Lemma 4.1), and can be seen as describing the coset weight distribution of the code. Recall that we use $\mu_{t}$ to denote the uniform distribution over all vectors in $\mathbb{F}_{2}^{t}$, and that we use the notation $\binom{N}{S}=\sum_{s\in S}\binom{N}{s}$.

Proposition 20.

Fix $\epsilon\in(0,\frac{1}{2})$, and let $H$ be a $t\times N$ matrix with entries in $\mathbb{F}_{2}$. Then for any $S\subseteq\{1,\dotsc,N\}$, we have

$\mathsf{Coll}_{H}(S)=\frac{1}{\binom{N}{S}}\mathbb{E}_{v\sim\mu_{t}}[K_{S}(vH)^{2}].$
Proof.

The main tool we will use is Parseval's identity, which relates the evaluations $f(x)$ of a function $f:\mathbb{F}_{2}^{t}\rightarrow\mathbb{R}$ to its Fourier coefficients $\hat{f}(y)$ by

$\frac{1}{2^{t}}\sum_{x\in\mathbb{F}_{2}^{t}}f(x)^{2}=\sum_{y\in\mathbb{F}_{2}^{t}}\hat{f}(y)^{2}.$ (6)

We will first need to rewrite $\mathsf{Coll}_{H}(S)$ as the $\ell_{2}$ norm of some function $f$. For this, we recall the definition $\Omega_{z}^{S}=\left\{y\in\mathbb{F}_{2}^{N}:|y|\in S\textnormal{ and }Hy^{\intercal}=Hz^{\intercal}\right\}$ and note that

$\mathsf{Coll}_{H}(S):=\frac{1}{\binom{N}{S}}\sum_{z\in\mathbb{F}_{2}^{N}:|z|\in S}|\Omega_{z}^{S}|=\binom{N}{S}\sum_{z\in\mathbb{F}_{2}^{N}:|z|\in S}\frac{1}{|\Omega_{z}^{S}|}\Pr_{a\sim\lambda_{S}}[Ha^{\intercal}=Hz^{\intercal}]^{2}=\binom{N}{S}\sum_{x\in\mathbb{F}_{2}^{t}}\Pr_{z\sim\lambda_{S}}[Hz^{\intercal}=x]^{2}.$

We are now ready to apply Parseval's identity. Letting $f(x)=\Pr_{z\sim\lambda_{S}}[Hz^{\intercal}=x]$ in equation (6), we get

$\mathsf{Coll}_{H}(S)=\binom{N}{S}\sum_{x\in\mathbb{F}_{2}^{t}}f(x)^{2}=2^{t}\binom{N}{S}\sum_{y\in\mathbb{F}_{2}^{t}}\hat{f}(y)^{2}.$

But by definition we have $\hat{f}(y):=2^{-t}\sum_{x\in\mathbb{F}_{2}^{t}}f(x)\cdot(-1)^{y\cdot x^{\intercal}}$, so the last equation can be rewritten as

$\mathsf{Coll}_{H}(S)=2^{-t}\binom{N}{S}\sum_{y\in\mathbb{F}_{2}^{t}}\Big(\sum_{x\in\mathbb{F}_{2}^{t}}f(x)\cdot(-1)^{y\cdot x^{\intercal}}\Big)^{2}.$ (7)

Define the function $L_{S}(z)$ to be $1$ if $z\in\mathbb{F}_{2}^{N}$ satisfies $|z|\in S$, and $0$ otherwise. We can then express $f(x)$ as

$f(x)=\Pr_{z\sim\lambda_{S}}[Hz^{\intercal}=x]=\frac{1}{\binom{N}{S}}\sum_{\substack{z\in\mathbb{F}_{2}^{N}\\ Hz^{\intercal}=x}}L_{S}(z).$ (8)

Combining expressions (7) and (8) and applying the definition of the Fourier transform, we get

$\mathsf{Coll}_{H}(S)=2^{-t}\binom{N}{S}\sum_{y\in\mathbb{F}_{2}^{t}}\Big(\sum_{z\in\mathbb{F}_{2}^{N}}\frac{L_{S}(z)}{\binom{N}{S}}\cdot(-1)^{yHz^{\intercal}}\Big)^{2}=\frac{2^{2N-t}}{\binom{N}{S}}\sum_{y\in\mathbb{F}_{2}^{t}}\hat{L}_{S}(yH)^{2}=\frac{2^{-t}}{\binom{N}{S}}\sum_{y\in\mathbb{F}_{2}^{t}}K_{S}(yH)^{2},$

where in the last step we used that $\hat{L}_{S}(v)=2^{-N}K_{S}(|v|)$. ∎
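Proposition 20 can be confirmed numerically on a tiny example; a sketch with an arbitrary choice of $H$ and $S$ (ours):

```python
import itertools
import math
import numpy as np

H = np.array([[1, 0, 1, 1, 0],
              [0, 1, 1, 0, 1]])
t, N = H.shape
S = {1, 2}
binom_S = sum(math.comb(N, s) for s in S)

def K_S(x):
    # K_S(x) = sum_{s in S} K_s(x)
    return sum((-1) ** j * math.comb(x, j) * math.comb(N - x, s - j)
               for s in S for j in range(s + 1))

# Left side: Coll_H(S), by brute force over the weight shell.
shell = [z for z in itertools.product([0, 1], repeat=N) if sum(z) in S]
syn = [tuple(np.dot(H, z) % 2) for z in shell]
lhs = sum(syn.count(s) for s in syn) / len(shell)

# Right side: (1 / binom(N, S)) * E_{v ~ mu_t}[ K_S(|vH|)^2 ].
rhs = sum(K_S(int(sum(np.dot(v, H) % 2))) ** 2
          for v in itertools.product([0, 1], repeat=t)) / (2 ** t * binom_S)

assert abs(lhs - rhs) < 1e-9
```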

We will now combine Theorem 19 and Proposition 20 to obtain Theorem 2, i.e. to obtain a bound on the decoding error probability in terms of the Fourier coefficients of the level function $L_{\epsilon}$. We prove a generalized version of Theorem 2 below. To recover Theorem 2, set $k=1$ and $l=N^{3/4}$. (You want to think of the parameter $l$ as satisfying $l\gg\sqrt{N}$ in both the case $k=1$ and the case $k>1$, so that the error term $2e^{-\frac{l^{2}}{3\epsilon N}}$ is small.)

Theorem 21.

Fix ϵ(0,12)\epsilon\in(0,\frac{1}{2}), let HH be any t×Nt\times N Boolean matrix, and let k=(2l+1)m+1k=(2l+1)m+1 for any integers m0m\geq 0 and l(12ϵ)Nl\leq(\frac{1}{2}-\epsilon)N. Then

PrρPϵ[ρDk(Hρ)]\displaystyle\Pr_{\rho\sim P_{\epsilon}}[\rho\notin D_{k}(H\rho^{\intercal})] 2el23ϵN+4(l+1)kmaxS{ϵN±l}1|S|1+𝟙{k=1}{1(NS)Evμt[KS(vH)2]1},\leq 2e^{-\frac{l^{2}}{3\epsilon N}}+\frac{4(l+1)}{k}\mathop{\max}_{\begin{subarray}{c}S\subseteq\{\epsilon N\pm l\}\\ 1\leq|S|\leq 1+\mathbbm{1}\{k=1\}\end{subarray}}\Big{\{}\frac{1}{\binom{N}{S}}\mathop{\mathbb{E}}_{v\sim\mu_{t}}\big{[}K_{S}(vH)^{2}\big{]}-1\Big{\}},

where the function 𝟙{k=1}\mathbbm{1}\{k=1\} is 11 when k=1k=1, and 0 otherwise.

Proof.

Applying Theorem 19 and Proposition 20, we have

PrρPϵ[ρDk(Hρ)]\displaystyle\Pr_{\rho\sim P_{\epsilon}}[\rho\notin D_{k}(H\rho^{\intercal})] 2el23ϵN+4(l+1)kmaxS{ϵN±l}1|S|1+𝟙{k=1}{𝖢𝗈𝗅𝗅𝖧(S)1}\displaystyle\leq 2e^{-\frac{l^{2}}{3\epsilon N}}+\frac{4(l+1)}{k}\mathop{\max}_{\begin{subarray}{c}S\subseteq\{\epsilon N\pm l\}\\ 1\leq|S|\leq 1+\mathbbm{1}\{k=1\}\end{subarray}}\Big{\{}\mathsf{Coll_{H}}(S)-1\Big{\}}
=2el23ϵN+4(l+1)kmaxS{ϵN±l}1|S|1+𝟙{k=1}{1(NS)Evμt[KS(vH)2]1}.=2e^{-\frac{l^{2}}{3\epsilon N}}+\frac{4(l+1)}{k}\mathop{\max}_{\begin{subarray}{c}S\subseteq\{\epsilon N\pm l\}\\ 1\leq|S|\leq 1+\mathbbm{1}\{k=1\}\end{subarray}}\Big{\{}\frac{1}{\binom{N}{S}}\mathop{\mathbb{E}}_{v\sim\mu_{t}}\big{[}K_{S}(vH)^{2}\big{]}-1\Big{\}}.

One consequence of Theorem 21 is Corollary 3, which states that CC is resilient to ϵ\epsilon-errors if the weight distribution of CC^{\perp} is close enough to the binomial distribution (see Appendix B for the proof). As another application of Theorem 21, we present the following bound on the probability of making a list-decoding error for a code CC. We note that once again, our bound depends only on the weight distribution of the dual code CC^{\perp}.

Proposition 22.

Fix any ϵ(0,12)\epsilon\in(0,\frac{1}{2}), and define β=12ϵ~(1ϵ~)2\beta=\frac{1-2\sqrt{\tilde{\epsilon}(1-\tilde{\epsilon})}}{2} for ϵ~=ϵ+N1/4.\tilde{\epsilon}=\epsilon+N^{-1/4}. Let B={βN,,(1β)N}B=\{\beta N,...,(1-\beta)N\}, and let k=(2N3/4+1)m+1k^{*}=(2N^{3/4}+1)m+1 for some integer m0m\geq 0. Then for all N>(5ϵ)20N>\left(\frac{5}{\epsilon}\right)^{20} and all integers kkk\geq k^{*}, we have that any t×Nt\times N matrix HH with entries in 𝔽2\mathbb{F}_{2} satisfies

PrρPϵ[ρDk(Hρ)]\displaystyle\Pr_{\rho\sim P_{\epsilon}}[\rho\notin D_{k}(H\rho^{\intercal})] 2eN3ϵ+NkmaxjB{Prvμt[|vH|=j]2N(Nj)1}\displaystyle\leq 2e^{-\frac{\sqrt{N}}{3\epsilon}}+\frac{N}{k^{*}}\max_{j\in B}\left\{\Pr_{v\sim\mu_{t}}[|vH|=j]\cdot\frac{2^{N}}{\binom{N}{j}}-1\right\}
+2h(ϵ)N+N45kmaxjB{Prvμt[|vH|=j]22ϵNlog|12jN|}.\displaystyle+\frac{2^{h(\epsilon)N+N^{\frac{4}{5}}}}{k^{*}}\max_{j\notin B}\left\{\Pr_{v\sim\mu_{t}}\big{[}|vH|=j\big{]}\cdot 2^{2\epsilon N\log|1-\frac{2j}{N}|}\right\}.
Proof.

We will use Theorem 21 to bound the decoding error probability in terms of the Krawtchouk polynomials KS(j)K_{S}(j) and the probability factors Prvμt[|vH|=j]\Pr_{v\sim\mu_{t}}\big{[}|vH|=j\big{]}. Some of these terms will then be bounded using Proposition 7, and some will be bounded using Theorem 9. We proceed with the proof. Since kkk\geq k^{*} implies Dk(Hρ)Dk(Hρ)D_{k^{*}}(H\rho^{\intercal})\subseteq D_{k}(H\rho^{\intercal}), it suffices to bound PrρPϵ[ρDk(Hρ)]\Pr_{\rho\sim P_{\epsilon}}[\rho\notin D_{k^{*}}(H\rho^{\intercal})]; letting l=N3/4l=N^{3/4} and k=kk=k^{*} in Theorem 21, we get

PrρPϵ[ρDk(Hρ)]\displaystyle\Pr_{\rho\sim P_{\epsilon}}[\rho\notin D_{k}(H\rho^{\intercal})] 2eN3ϵ+NkmaxS{ϵN±N3/4}1|S|2{1(NS)j=0NPrvμt[|vH|=j]KS(j)21}.\displaystyle\leq 2e^{-\frac{\sqrt{N}}{3\epsilon}}+\frac{N}{k^{*}}\mathop{\max}_{\begin{subarray}{c}S\subseteq\{\epsilon N\pm N^{3/4}\}\\ 1\leq|S|\leq 2\end{subarray}}\Big{\{}\frac{1}{\binom{N}{S}}\sum_{j=0}^{N}\Pr_{v\sim\mu_{t}}\big{[}|vH|=j\big{]}K_{S}(j)^{2}-1\Big{\}}. (9)

We want to bound the summation in the second term. We will start with the central terms jBj\in B. For these we rely on Proposition 7, which states that 2N(NS)j=0N(Nj)KS(j)2=1\frac{2^{-N}}{\binom{N}{S}}\sum_{j=0}^{N}\binom{N}{j}\cdot K_{S}(j)^{2}=1 for all S{0,,N}S\subseteq\{0,...,N\}. For any S{0,,N}S\subseteq\{0,...,N\}, we then get

1(NS)jBPrvμt[|vH|=j]KS(j)2\displaystyle\frac{1}{\binom{N}{S}}\sum_{j\in B}\Pr_{v\sim\mu_{t}}\big{[}|vH|=j\big{]}K_{S}(j)^{2} 1(NS)maxjB{Prvμt[|vH|=j]1(Nj)}jB(Nj)KS(j)2\displaystyle\leq\frac{1}{\binom{N}{S}}\max_{j\in B}\left\{\Pr_{v\sim\mu_{t}}[|vH|=j]\cdot\frac{1}{\binom{N}{j}}\right\}\sum_{j\in B}\binom{N}{j}\cdot K_{S}(j)^{2}
2NmaxjB{Prvμt[|vH|=j]1(Nj)}.\displaystyle\leq 2^{N}\max_{j\in B}\left\{\Pr_{v\sim\mu_{t}}[|vH|=j]\cdot\frac{1}{\binom{N}{j}}\right\}. (10)
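
The orthogonality relation from Proposition 7 invoked here is easy to confirm by exact computation; a minimal Python sketch (the length N and the weight sets S below are illustrative):

import math

# Exact check of the identity from Proposition 7,
#   2^{-N} * sum_j C(N,j) * K_S(j)^2 = C(N,S),  C(N,S) := sum_{s in S} C(N,s),
# which drives the bound on the central terms above.
N = 12

def K(s, x):
    return sum((-1) ** j * math.comb(x, j) * math.comb(N - x, s - j)
               for j in range(s + 1))

for S in ({2}, {3, 5}, {0, 4, 7}):
    C_NS = sum(math.comb(N, s) for s in S)
    total = sum(math.comb(N, j) * sum(K(s, j) for s in S) ** 2
                for j in range(N + 1))
    assert total == 2 ** N * C_NS  # exact integer arithmetic
print("Proposition 7 identity verified exactly")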

We then want to bound the contribution of the faraway terms jBj\notin B to the summation in (9), i.e. we want to bound

maxS{ϵN±N3/4}1|S|2{1(NS)jBPrvμt[|vH|=j]KS(j)2}.\displaystyle\mathop{\max}_{\begin{subarray}{c}S\subseteq\{\epsilon N\pm N^{3/4}\}\\ 1\leq|S|\leq 2\end{subarray}}\Big{\{}\frac{1}{\binom{N}{S}}\sum_{j\notin B}\Pr_{v\sim\mu_{t}}\big{[}|vH|=j\big{]}K_{S}(j)^{2}\Big{\}}. (11)

We will want to apply Theorem 9 to every term in this sum. Note that by definition of Krawtchouk polynomials, for any w,ww,w^{\prime} we have

K{w,w}(y)\displaystyle K_{\{w,w^{\prime}\}}(y) =Kw(y)+Kw(y)\displaystyle=K_{w}(y)+K_{w^{\prime}}(y)
2max{Kw(y),Kw(y)}.\displaystyle\leq 2\cdot\max\big{\{}K_{w}(y),K_{w^{\prime}}(y)\big{\}}.

We can then bound equation 11 by

(11)\displaystyle(\ref{bdgoal}) 1(NϵNN3/4)NmaxS{ϵN±N3/4}1|S|2jB{Prvμt[|vH|=j]KS(j)2}\displaystyle\leq\frac{1}{\binom{N}{\epsilon N-N^{3/4}}}\cdot N\mathop{\max}_{\begin{subarray}{c}S\subseteq\{\epsilon N\pm N^{3/4}\}\\ 1\leq|S|\leq 2\\ j\notin B\end{subarray}}\Big{\{}\Pr_{v\sim\mu_{t}}\big{[}|vH|=j\big{]}K_{S}(j)^{2}\Big{\}}
N(NϵNN3/4)maxw{ϵN±N3/4}jB{Prvμt[|vH|=j]4Kw(j)2}.\displaystyle\leq\frac{N}{\binom{N}{\epsilon N-N^{3/4}}}\mathop{\max}_{\begin{subarray}{c}w\in\{\epsilon N\pm N^{3/4}\}\\ j\notin B\end{subarray}}\Big{\{}\Pr_{v\sim\mu_{t}}\big{[}|vH|=j\big{]}\cdot 4K_{w}(j)^{2}\Big{\}}.

Applying Theorem 9, we get

(11)\displaystyle(\ref{bdgoal}) 4N(NϵNN34)maxw{ϵN±N34}jB{Prvμt[|vH|=j]22h(wN)N+2wlog|12jN|}\leq\frac{4N}{\binom{N}{\epsilon N-N^{\frac{3}{4}}}}\max_{\begin{subarray}{c}w\in\{\epsilon N\pm N^{\frac{3}{4}}\}\\ j\notin B\end{subarray}}\left\{\Pr_{v\sim\mu_{t}}\big{[}|vH|=j\big{]}\cdot 2^{2h(\frac{w}{N})N+2w\log|1-\frac{2j}{N}|}\right\}
2N4/5N2h(ϵ)NmaxjB{Prvμt[|vH|=j]22ϵNlog|12jN|}.\leq\frac{2^{N^{4/5}}}{N}\cdot 2^{h(\epsilon)N}\max_{j\notin B}\left\{\Pr_{v\sim\mu_{t}}\big{[}|vH|=j\big{]}\cdot 2^{2\epsilon N\log|1-\frac{2j}{N}|}\right\}.

Combining this bound for the faraway terms with our bound (10) for the central terms of the summation, we bound the right-hand side of equation (9) by

PrρPϵ[ρDk(Hρ)]\displaystyle\Pr_{\rho\sim P_{\epsilon}}[\rho\notin D_{k}(H\rho^{\intercal})] 2eN3ϵ+NkmaxjB{Prvμt[|vH|=j]2N(Nj)1}\displaystyle\leq 2e^{-\frac{\sqrt{N}}{3\epsilon}}+\frac{N}{k^{*}}\max_{j\in B}\left\{\Pr_{v\sim\mu_{t}}[|vH|=j]\cdot\frac{2^{N}}{\binom{N}{j}}-1\right\}
+2h(ϵ)N+N45kmaxjB{Prvμt[|vH|=j]22ϵNlog|12jN|}.+\frac{2^{h(\epsilon)N+N^{\frac{4}{5}}}}{k^{*}}\max_{j\notin B}\left\{\Pr_{v\sim\mu_{t}}\big{[}|vH|=j\big{]}\cdot 2^{2\epsilon N\log|1-\frac{2j}{N}|}\right\}.

6 The Weight Distribution of Transitive Linear Codes

We will now prove Theorem 1. We note that the bound we get is essentially tight, since for η(0,1)\eta\in(0,1) the repetition code

C={(z,,z)𝔽qN:z𝔽qηN}C=\left\{(z,...,z)\in\mathbb{F}_{q}^{N}:z\in\mathbb{F}_{q}^{\eta N}\right\}

is transitive, has dimension ηN\eta N, and has weight distribution

Prc𝒟(C)[|c|=αN]\displaystyle\Pr_{c\sim\mathcal{D}(C)}\Big{[}|c|=\alpha N\Big{]} =qηN(ηN(1α)ηN)(q1)αηN\displaystyle=q^{-\eta N}\cdot\binom{\eta N}{(1-\alpha)\eta N}(q-1)^{\alpha\eta N}
qηN8πe4ηN2h(α)ηNqαηNlogq(q1)\displaystyle\geq q^{-\eta N}\cdot\sqrt{\frac{8\pi}{e^{4}\eta N}}\cdot 2^{h(\alpha)\eta N}\cdot q^{\alpha\eta N\log_{q}(q-1)}
=8πe4ηNq(1hq(α))ηN\displaystyle=\sqrt{\frac{8\pi}{e^{4}\eta N}}\cdot q^{-(1-h_{q}(\alpha))\eta N}

for all α(0,1)\alpha\in(0,1). We recall and prove our Theorem 1 below:

Theorem.

Let C𝔽qNC\subseteq\mathbb{F}_{q}^{N} be a transitive linear code. Then for any α(0,11/q)\alpha\in(0,1-1/q) we have

Prc𝒟(C)[|c|=αN]q(1hq(α))dim C,\Pr_{c\sim\mathcal{D}(C)}\Big{[}|c|=\alpha N\Big{]}\leq q^{-(1-h_{q}(\alpha))\textnormal{dim }C},

where 𝒟(C)\mathcal{D}(C) is the uniform distribution over all codewords in CC, |c||c| is the number of non-zero coordinates of cc, and hqh_{q} is the q-ary entropy

hq(α)=(1α)logq11α+αlogqq1α.h_{q}(\alpha)=(1-\alpha)\log_{q}\frac{1}{1-\alpha}+\alpha\log_{q}\frac{q-1}{\alpha}.
Proof.

Let MM be the t×Nt\times N generator matrix of CC, and let r=rank M=dim Cr=\textnormal{rank }M=\textnormal{dim }C. Without loss of generality, suppose that the first rr columns of MM span the column-space of MM. Define

C(α)={cC:|c|=αN},C^{(\alpha)}=\{c\in C:|c|=\alpha N\},

and let Z=(Z1,,ZN)Z=(Z_{1},...,Z_{N}) be a uniformly random codeword in C(α)C^{(\alpha)}. Now CC is transitive, so for every j,k{1,,N}j,k\in\{1,...,N\} the random variables ZjZ_{j} and ZkZ_{k} are identically distributed. By linearity of expectation and by definition of C(α)C^{(\alpha)}, we thus have that for every j{1,,N}j\in\{1,...,N\},

PrZ𝒟(C(α))[Zj=0]=1α.\displaystyle\Pr_{Z\sim\mathcal{D}(C^{(\alpha)})}[Z_{j}=0]=1-\alpha. (12)

But under condition (12), ZjZ_{j} has maximal entropy when its remaining mass is uniformly distributed over {1,,q1}\{1,...,q-1\}, i.e. when Pr[Zj=m]=αq1\Pr[Z_{j}=m]=\frac{\alpha}{q-1} for all m{1,,q1}m\in\{1,...,q-1\}. The entropy of ZjZ_{j} is thus bounded by

𝖧Z𝒟(C(α))(Zj)\displaystyle\mathop{\mathsf{H}}_{Z\sim\mathcal{D}(C^{(\alpha)})}(Z_{j}) (1α)log11α+(q1)αq1logq1α\displaystyle\leq(1-\alpha)\log\frac{1}{1-\alpha}+(q-1)\cdot\frac{\alpha}{q-1}\log\frac{q-1}{\alpha}
=hq(α)log(q).\displaystyle=h_{q}(\alpha)\log(q). (13)
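
This maximum-entropy step can be sanity-checked numerically. The sketch below (q = 3 and α = 0.4 are illustrative choices) compares the entropy of arbitrary splits of the mass α against hq(α)log2(q)h_{q}(\alpha)\cdot\log_{2}(q):

import math
import random

# Among distributions on {0,...,q-1} with P(0) = 1 - alpha, entropy (in
# bits) is maximized by splitting the mass alpha evenly over the q-1
# nonzero symbols, where it equals h_q(alpha) * log2(q).
q, alpha = 3, 0.4

def H(p):  # Shannon entropy in bits
    return -sum(x * math.log2(x) for x in p if x > 0)

h_q = ((1 - alpha) * math.log(1 / (1 - alpha), q)
       + alpha * math.log((q - 1) / alpha, q))
best = h_q * math.log2(q)

random.seed(1)
for _ in range(10000):
    a = random.uniform(0, alpha)  # an arbitrary split of the mass alpha
    assert H([1 - alpha, a, alpha - a]) <= best + 1e-9
print("max-entropy bound verified:", best)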

We will now show that 𝖧(Zj|Z1,,Zj1)=0\mathsf{H}(Z_{j}|Z_{1},...,Z_{j-1})=0 for every j>rj>r. To this end, fix some j>rj>r. Recall that the columns {M1,,Mr}\{M_{1},...,M_{r}\} span the column-space of MM, so we can write the column MjM_{j} as Mj=k=1rβkMkM_{j}=\sum_{k=1}^{r}\beta_{k}M_{k} for some β1,,βr𝔽q\beta_{1},...,\beta_{r}\in\mathbb{F}_{q}. But any codeword cCc\in C can be expressed as v(c)Mv^{(c)}M for some v(c)𝔽qtv^{(c)}\in\mathbb{F}_{q}^{t}, so any codeword cCc\in C satisfies

cj=v(c)Mj=k=1rβkv(c)Mk=k=1rβkck.c_{j}=v^{(c)}M_{j}=\sum_{k=1}^{r}\beta_{k}v^{(c)}M_{k}=\sum_{k=1}^{r}\beta_{k}c_{k}.

The random variable ZjZ_{j} is thus determined by {Z1,,Zr}\{Z_{1},...,Z_{r}\}, and so we indeed have

𝖧Z𝒟(C(α))(Zj|Z1,,Zj1)=0\mathop{\mathsf{H}}_{Z\sim\mathcal{D}(C^{(\alpha)})}(Z_{j}|Z_{1},...,Z_{j-1})=0

for every j>rj>r. Applying (13) and the chain rule for entropy then gives

𝖧(Z)\displaystyle\mathsf{H}(Z) =𝖧(Z1)+i=2N𝖧(Zi|Z1,,Zi1)\displaystyle=\mathsf{H}(Z_{1})+\sum_{i=2}^{N}\mathsf{H}(Z_{i}|Z_{1},...,Z_{i-1})
i=1r𝖧(Zi)\displaystyle\leq\sum_{i=1}^{r}\mathsf{H}(Z_{i})
rhq(α)log(q).\displaystyle\leq r\cdot h_{q}(\alpha)\log(q).

Now ZZ is sampled uniformly from C(α)C^{(\alpha)}, so 𝖧(Z)=log(|C(α)|)\mathsf{H}(Z)=\log\Big{(}|C^{(\alpha)}|\Big{)}. We thus have

Prc𝒟(C)[|c|=αN]\displaystyle\Pr_{c\sim\mathcal{D}(C)}\Big{[}|c|=\alpha N\Big{]} =|C(α)|qr\displaystyle=\frac{\left|C^{(\alpha)}\right|}{q^{r}}
=2𝖧(Z)qr\displaystyle=2^{\mathsf{H}(Z)}\cdot q^{-r}
q(1hq(α))r.\displaystyle\leq q^{-(1-h_{q}(\alpha))\cdot r}.
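
As a concrete sanity check of Theorem 1, the following Python sketch enumerates a small cyclic (hence transitive) code, the [7,4] code generated by the shifts of x³+x+1, and verifies the weight bound for q = 2; the choice of code is illustrative:

import itertools
import math

# Verify Pr[|c| = alpha*N] <= q^{-(1 - h_q(alpha)) dim C} with q = 2 on a
# small cyclic code (cyclic codes are transitive).
N = 7
gen = (1, 1, 0, 1, 0, 0, 0)  # coefficients of x^3 + x + 1

def rot(v, s):  # cyclic shift by s positions
    return tuple(v[(i - s) % N] for i in range(N))

basis = [rot(gen, s) for s in range(4)]  # dim C = 7 - deg(g) = 4
code = {tuple(sum(a * b[i] for a, b in zip(co, basis)) % 2 for i in range(N))
        for co in itertools.product([0, 1], repeat=4)}

def h2(a):
    return -a * math.log2(a) - (1 - a) * math.log2(1 - a)

dim = round(math.log2(len(code)))  # = 4
for w in range(1, N):
    alpha = w / N
    if alpha >= 0.5:  # Theorem 1 requires alpha in (0, 1 - 1/q)
        break
    prob = sum(1 for c in code if sum(c) == w) / len(code)
    assert prob <= 2 ** (-(1 - h2(alpha)) * dim) + 1e-12, (w, prob)
print("Theorem 1 bound holds for every weight of this cyclic code")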

For Reed-Muller codes, we will abuse notation and denote by 𝒟(n,d)\mathcal{D}(n,d) the uniform distribution over all codewords in 𝖱𝖬(n,d)\mathsf{RM}(n,d).

Theorem 23.

For any n,d<n,n,d<n, and α(0,1)\alpha\in(0,1), the Reed-Muller code 𝖱𝖬(n,d)\mathsf{RM}(n,d) over the prime field 𝔽q\mathbb{F}_{q} satisfies

Prc𝒟(n,d)[|c|=αN]q(1hq(α))(nd).\Pr_{c\sim\mathcal{D}(n,d)}\Big{[}|c|=\alpha N\Big{]}\leq q^{-(1-h_{q}(\alpha))\cdot\binom{n}{\leq d}}.
Proof.

This follows immediately from Theorem 1, Fact 12, and Fact 11. ∎

7 List Decoding for Transitive Codes

We now turn to proving Theorem 5. Recall that in section 5 we bounded the minimum size for the decoding list of a linear code in terms of the weight distribution of its dual code. But as we mentioned in the preliminaries, the dual code of a transitive code is also transitive. For any transitive linear code CC, we can thus apply our Theorem 1 for the weight distribution of CC^{\perp} to get a bound on the size of the decoding list for CC. We restate and prove our Theorem 5 below.

Theorem.

Fix any ϵ(0,12)\epsilon\in(0,\frac{1}{2}), η(0,1)\eta\in(0,1), and N>(5ϵ)20N>\left(\frac{5}{\epsilon}\right)^{20}. Then any transitive linear code C𝔽2NC\subseteq\mathbb{F}_{2}^{N} of dimension dim C=ηN\textnormal{dim }C=\eta N can with high probability list-decode ϵ\epsilon-errors using a list TT of size

|T(x)|=2N5/6(24ϵηN+2ϵNlog(21η)).|T(x)|=2^{N^{5/6}}\cdot(2^{4\epsilon\eta N}+2^{\epsilon N\log(\frac{2}{1-\eta})}).
Proof.

We will show that there exists a function TT mapping every x𝔽2Nx\in\mathbb{F}_{2}^{N} to a subset T(x)CT(x)\subseteq C of size

|T(x)|=2N5/6(24ϵηN+2ϵNlog(21η)),|T(x)|=2^{N^{5/6}}\cdot(2^{4\epsilon\eta N}+2^{\epsilon N\log(\frac{2}{1-\eta})}),

with the property that for every codeword cCc\in C we have

PrρPϵ[cT(c+ρ)]2N.\displaystyle\Pr_{\rho\sim P_{\epsilon}}\big{[}c\notin T(c+\rho)\big{]}\leq\frac{2}{N}.

Let HH denote the parity-check matrix of CC. By Lemma 17, it is sufficient to show that for any N>(5ϵ)20N>\left(\frac{5}{\epsilon}\right)^{20} and any k>1k>1 we have

PrρPϵ[ρDk(Hρ)]\displaystyle\Pr_{\rho\sim P_{\epsilon}}[\rho\notin D_{k}(H\rho^{\intercal})] 1N+2N5/6Nk(24ϵηN+2ϵNlog(21η)).\displaystyle\leq\frac{1}{N}+\frac{2^{N^{5/6}}}{Nk}\cdot(2^{4\epsilon\eta N}+2^{\epsilon N\log(\frac{2}{1-\eta})}). (14)

We will thus prove (14). Recall that for k>Nk>N, Proposition 22 yields the following bound on the left-hand side of (14):

PrρPϵ[ρDk(Hρ)]\displaystyle\Pr_{\rho\sim P_{\epsilon}}[\rho\notin D_{k}(H\rho^{\intercal})] 2eN3ϵ+2NkmaxjB{Prvμt[|vH|=j]2N(Nj)}\displaystyle\leq 2e^{-\frac{\sqrt{N}}{3\epsilon}}+\frac{2N}{k}\max_{j\in B}\left\{\Pr_{v\sim\mu_{t}}[|vH|=j]\cdot\frac{2^{N}}{\binom{N}{j}}\right\}
+2h(ϵ)N+N45+1kmaxjB{Prvμt[|vH|=j]22ϵNlog|12jN|},+\frac{2^{h(\epsilon)N+N^{\frac{4}{5}}+1}}{k}\max_{j\notin B}\left\{\Pr_{v\sim\mu_{t}}\big{[}|vH|=j\big{]}\cdot 2^{2\epsilon N\log|1-\frac{2j}{N}|}\right\}, (15)

where β=12(12ϵ~(1ϵ~))\beta=\frac{1}{2}\left(1-2\sqrt{\tilde{\epsilon}(1-\tilde{\epsilon})}\right) for ϵ~=ϵ+N1/4\tilde{\epsilon}=\epsilon+N^{-1/4}, and B={βN,,(1β)N}B=\{\beta N,...,(1-\beta)N\}. Our goal will be to bound both the central terms jBj\in B and the faraway terms jBj\notin B by using our bounds on the weight distribution of transitive codes. As we’ve seen in section 2, the dual code CC^{\perp} is a transitive linear code of dimension Ndim CN-\textnormal{dim }C. By Theorem 1, we thus have that for all j{0,,N}j\in\{0,...,N\},

Prvμt[|vH|=j]2(1h(jN))(1η)N.\displaystyle\Pr_{v\sim\mu_{t}}\big{[}|vH|=j\big{]}\leq 2^{-(1-h(\frac{j}{N}))(1-\eta)N}. (16)

For any jBj\in B, we then have by Fact 13 that

Prvμt[|vH|=j]2N(Nj)\displaystyle\Pr_{v\sim\mu_{t}}\big{[}|vH|=j\big{]}\cdot\frac{2^{N}}{\binom{N}{j}} 2(1h(j/N))(1η)N2N8πe4N2h(j/N)N\displaystyle\leq 2^{-(1-h(j/N))(1-\eta)N}\cdot\frac{2^{N}}{\sqrt{\frac{8\pi}{e^{4}N}}\cdot 2^{h(j/N)N}}
=e4N8π2(1h(j/N))ηN.\displaystyle=\sqrt{\frac{e^{4}N}{8\pi}}\cdot 2^{(1-h(j/N))\eta N}.

But for jBj\in B we have β<jN<1β\beta<\frac{j}{N}<1-\beta, so the right-hand side is maximized at j=βNj=\beta N. Applying Lemma 14, we get

maxjB{Prvμt[|vH|=j]2N(Nj)}\displaystyle\max_{j\in B}\left\{\Pr_{v\sim\mu_{t}}\big{[}|vH|=j\big{]}\cdot\frac{2^{N}}{\binom{N}{j}}\right\} e4N8π2(1h(β))ηN\displaystyle\leq\sqrt{\frac{e^{4}N}{8\pi}}\cdot 2^{(1-h(\beta))\eta N}
e4N8π24ϵ~(1ϵ~)ηN.\displaystyle\leq\sqrt{\frac{e^{4}N}{8\pi}}\cdot 2^{4\tilde{\epsilon}(1-\tilde{\epsilon})\eta N}. (17)

We now turn to the faraway terms of equation (15). Since both the bound (16) and the factor |12jN||1-\frac{2j}{N}| are invariant under jNjj\mapsto N-j, it suffices to consider j<βNj<\beta N. By equation (16), we have

maxj<βN{Prvμt[|vH|=j]22ϵNlog(12jN)}maxδ<β{2(1h(δ))(1η)N22ϵNlog(12δ)}.\displaystyle\max_{j<\beta N}\left\{\Pr_{v\sim\mu_{t}}[|vH|=j]\cdot 2^{2\epsilon N\log(1-\frac{2j}{N})}\right\}\leq\max_{\delta<\beta}\left\{2^{-(1-h(\delta))(1-\eta)N}\cdot 2^{2\epsilon N\log(1-2\delta)}\right\}.

Note that by definition of β\beta, any δ<β\delta<\beta can be written as δ=12αϵ~(1ϵ~)2\delta=\frac{1-2\sqrt{\alpha\tilde{\epsilon}(1-\tilde{\epsilon})}}{2} for some α>1\alpha>1. By Lemma 14, we can then rewrite our previous expression as

maxj<βN{Prvμt[|vH|=j]22ϵNlog(12jN)}maxα>1{22αϵ~(1ϵ~)ln2(1η)N2ϵNlog(4αϵ~(1ϵ~))}.\displaystyle\max_{j<\beta N}\left\{\Pr_{v\sim\mu_{t}}[|vH|=j]\cdot 2^{2\epsilon N\log(1-\frac{2j}{N})}\right\}\leq\max_{\alpha>1}\left\{2^{-\frac{2\alpha\tilde{\epsilon}(1-\tilde{\epsilon})}{\ln 2}(1-\eta)N}\cdot 2^{\epsilon N\log(4\alpha\tilde{\epsilon}(1-\tilde{\epsilon}))}\right\}.

But for any positive constant cc, the derivative of log(α)cα\log(\alpha)-c\alpha is 1αln2c\frac{1}{\alpha\cdot\ln 2}-c, and the second derivative is always negative. Thus, the above expression achieves its maximum when α=ϵ2ϵ~(1ϵ~)(1η)\alpha=\frac{\epsilon}{2\tilde{\epsilon}(1-\tilde{\epsilon})(1-\eta)}. We then get

maxj<βN{Prvμt[|vH|=j]22ϵNlog(12jN)}\displaystyle\max_{j<\beta N}\left\{\Pr_{v\sim\mu_{t}}[|vH|=j]\cdot 2^{2\epsilon N\log(1-\frac{2j}{N})}\right\} 2ϵNln22ϵNlog(2ϵ1η)\displaystyle\leq 2^{-\frac{\epsilon N}{\ln 2}}\cdot 2^{\epsilon N\log(\frac{2\epsilon}{1-\eta})}
2h(ϵ)N2ϵNlog(21η),\displaystyle\leq 2^{-h(\epsilon)N}\cdot 2^{\epsilon N\log(\frac{2}{1-\eta})}, (18)

where in the last line we used the inequality log(1x)x(1x)ln2\log(1-x)\geq-\frac{x}{(1-x)\ln 2} for x<1x<1 to get h(ϵ)ϵlog(ϵ)+ϵln2h(\epsilon)\leq-\epsilon\log(\epsilon)+\frac{\epsilon}{\ln 2}. We now use equations (17) and (18) to bound the central and faraway terms of (15) respectively. This gives

PrρPϵ[ρDk(Hρ)]\displaystyle\Pr_{\rho\sim P_{\epsilon}}[\rho\notin D_{k}(H\rho^{\intercal})] 2eN3ϵ+2Nke4N8π24ϵ~(1ϵ~)ηN+2N45+1k2ϵNlog(21η)\displaystyle\leq 2e^{-\frac{\sqrt{N}}{3\epsilon}}+\frac{2N}{k}\cdot\sqrt{\frac{e^{4}N}{8\pi}}\cdot 2^{4\tilde{\epsilon}(1-\tilde{\epsilon})\eta N}+\frac{2^{N^{\frac{4}{5}}+1}}{k}\cdot 2^{\epsilon N\log(\frac{2}{1-\eta})}
1N+2N5/6Nk(24ϵηN+2ϵNlog(21η)).\displaystyle\leq\frac{1}{N}+\frac{2^{N^{5/6}}}{Nk}\cdot(2^{4\epsilon\eta N}+2^{\epsilon N\log(\frac{2}{1-\eta})}).

We have shown (14), and so we are done. ∎
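
The inner maximization over α in the preceding proof can also be verified numerically; a small sketch with illustrative values of ϵ and η (ϵ̃ is approximated by a fixed stand-in for ϵ + N^{-1/4}, and the exponent is taken per N):

import math

# Over alpha > 1, the per-N exponent
#   g(alpha) = -(2 alpha e~(1-e~)/ln 2)(1-eta) + eps * log2(4 alpha e~(1-e~))
# should peak at alpha* = eps / (2 e~ (1-e~)(1-eta)).
eps, eta = 0.1, 0.5
eps_t = eps + 1e-3  # stand-in for eps + N^{-1/4}
p = eps_t * (1 - eps_t)

def g(alpha):
    return -(2 * alpha * p / math.log(2)) * (1 - eta) + eps * math.log2(4 * alpha * p)

alpha_star = eps / (2 * p * (1 - eta))
grid = [1 + i * 1e-3 for i in range(20000)]
best = max(grid, key=g)
assert abs(alpha_star - best) < 1e-2, (alpha_star, best)
print("maximizer confirmed:", alpha_star)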

8 List Decoding for Reed-Muller Codes

We will now turn to proving our list-decoding bounds for Reed-Muller codes. The dual code of the Reed-Muller code 𝖱𝖬(n,d)\mathsf{RM}(n,d) is the code 𝖱𝖬(n,nd1)\mathsf{RM}(n,n-d-1), so we can apply Samorodnitsky’s Theorem 6 to our Proposition 22. We restate and prove our Theorem 4 below.

Theorem.

Let ϵ(0,12)\epsilon\in(0,\frac{1}{2}) and γ(0,1)\gamma\in(0,1) be such that 1γ22ϵ(ln2)21-\gamma\geq 2^{-\frac{2\epsilon}{(\ln 2)^{2}}}. Then the Reed-Muller code 𝖱𝖬(n,d)\mathsf{RM}(n,d) of dimension (nd)=(1γ)N\binom{n}{\leq d}=(1-\gamma)N can with high probability list-decode ϵ\epsilon-errors using a list TT of size

|T|=2h(ϵ)NγN+o(N)+24ϵN+o(N).|T|=2^{h(\epsilon)N-\gamma N+o(N)}+2^{4\epsilon N+o(N)}.
Proof.

We will show that there exists a function TT mapping every x𝔽2Nx\in\mathbb{F}_{2}^{N} to a subset T(x)𝖱𝖬(n,d)T(x)\subseteq\mathsf{RM}(n,d) of size

|T|=2h(ϵ)NγN+o(N)+24ϵN+o(N),|T|=2^{h(\epsilon)N-\gamma N+o(N)}+2^{4\epsilon N+o(N)},

with the property that for every codeword c𝖱𝖬(n,d)c\in\mathsf{RM}(n,d) we have

PrρPϵ[cT(c+ρ)]2N.\displaystyle\Pr_{\rho\sim P_{\epsilon}}\big{[}c\notin T(c+\rho)\big{]}\leq\frac{2}{N}.

Let HH denote the parity-check matrix of 𝖱𝖬(n,d)\mathsf{RM}(n,d). By Lemma 17, it is sufficient to show that for any N>(5ϵ)20N>\left(\frac{5}{\epsilon}\right)^{20} and any k>1k>1 we have

PrρPϵ[ρDk(Hρ)]\displaystyle\Pr_{\rho\sim P_{\epsilon}}[\rho\notin D_{k}(H\rho^{\intercal})] 1N+2o(N)kN(24ϵN+2h(ϵ)N(1η)N).\displaystyle\leq\frac{1}{N}+\frac{2^{o(N)}}{kN}\left(2^{4\epsilon N}+2^{h(\epsilon)N-(1-\eta)N}\right). (19)

We will thus prove (19). Recall that for k>Nk>N, Proposition 22 yields the following bound on the left-hand side of (19):

PrρPϵ[ρDk(Hρ)]\displaystyle\Pr_{\rho\sim P_{\epsilon}}[\rho\notin D_{k}(H\rho^{\intercal})] 2eN3ϵ+2NkmaxjB{Prvμt[|vH|=j]2N(Nj)}\displaystyle\leq 2e^{-\frac{\sqrt{N}}{3\epsilon}}+\frac{2N}{k}\max_{j\in B}\left\{\Pr_{v\sim\mu_{t}}\big{[}|vH|=j\big{]}\cdot\frac{2^{N}}{\binom{N}{j}}\right\}
+2h(ϵ)N+N45+1kmaxjB{Prvμt[|vH|=j]22ϵNlog|12jN|},\displaystyle+\frac{2^{h(\epsilon)N+N^{\frac{4}{5}}+1}}{k}\max_{j\notin B}\left\{\Pr_{v\sim\mu_{t}}\big{[}|vH|=j\big{]}\cdot 2^{2\epsilon N\log|1-\frac{2j}{N}|}\right\}, (20)

where β=12(12ϵ~(1ϵ~))\beta=\frac{1}{2}\left(1-2\sqrt{\tilde{\epsilon}(1-\tilde{\epsilon})}\right) for ϵ~=ϵ+N1/4\tilde{\epsilon}=\epsilon+N^{-1/4}, and B={βN,,(1β)N}B=\{\beta N,...,(1-\beta)N\}. Our goal is to bound every term in these sums by using the weight distribution bounds given in Theorems 1 and 6. We bound the central terms in exactly the same way as in Theorem 5: by Theorem 23 we know that the weight distribution of the dual code 𝖱𝖬(n,nd1)\mathsf{RM}(n,n-d-1) satisfies

Prvμt[|vH|=j]2(1h(jN))(1η)N,\displaystyle\Pr_{v\sim\mu_{t}}\big{[}|vH|=j\big{]}\leq 2^{-(1-h(\frac{j}{N}))(1-\eta)N},

so by Fact 13 we have

maxjB{Prvμt[|vH|=j]2N(Nj)}\displaystyle\max_{j\in B}\left\{\Pr_{v\sim\mu_{t}}\big{[}|vH|=j\big{]}\cdot\frac{2^{N}}{\binom{N}{j}}\right\} maxjB{2(1h(j/N))(1η)N2N8πe4N2h(j/N)N}\displaystyle\leq\max_{j\in B}\left\{2^{-(1-h(j/N))(1-\eta)N}\cdot\frac{2^{N}}{\sqrt{\frac{8\pi}{e^{4}N}}\cdot 2^{h(j/N)N}}\right\}
=maxjB{e4N8π2(1h(j/N))ηN}.\displaystyle=\max_{j\in B}\left\{\sqrt{\frac{e^{4}N}{8\pi}}\cdot 2^{(1-h(j/N))\eta N}\right\}.

But B={βN,,(1β)N}B=\{\beta N,...,(1-\beta)N\}, so by Lemma 14 we have

maxjB{Prvμt[|vH|=j]2N(Nj)}\displaystyle\max_{j\in B}\left\{\Pr_{v\sim\mu_{t}}\big{[}|vH|=j\big{]}\cdot\frac{2^{N}}{\binom{N}{j}}\right\} e4N8π2(1h(β))ηN\displaystyle\leq\sqrt{\frac{e^{4}N}{8\pi}}\cdot 2^{(1-h(\beta))\eta N}
e4N8π24ϵ~(1ϵ~)ηN.\displaystyle\leq\sqrt{\frac{e^{4}N}{8\pi}}\cdot 2^{4\tilde{\epsilon}(1-\tilde{\epsilon})\eta N}. (21)

For the faraway terms, we use the weight bound from Theorem 6. By symmetry, we get that

maxjB{Prvμt[|vH|=j]22ϵNlog|12jN|}\displaystyle\max_{j\notin B}\left\{\Pr_{v\sim\mu_{t}}[|vH|=j]\cdot 2^{2\epsilon N\log|1-\frac{2j}{N}|}\right\} 2o(N)maxjN2{2(1η)N(1η)2jln222ϵNlog|12jN|}\displaystyle\leq 2^{o(N)}\cdot\max_{j\leq\frac{N}{2}}\left\{2^{-(1-\eta)N}\left(\frac{1}{\eta}\right)^{2j\ln 2}\cdot 2^{2\epsilon N\log|1-\frac{2j}{N}|}\right\}
=2o(N)2(1η)NmaxjN2{22jln2log(η)+2ϵNlog(12jN)}.\displaystyle=2^{o(N)}\cdot 2^{-(1-\eta)N}\max_{j\leq\frac{N}{2}}\left\{2^{-2j\ln 2\cdot\log(\eta)+2\epsilon N\log(1-\frac{2j}{N})}\right\}. (22)

Now the function

g(j)=2jln2log(η)+2ϵNlog(12jN)g(j)=-2j\ln 2\cdot\log(\eta)+2\epsilon N\log(1-\frac{2j}{N})

has first derivative

dgdj=2ln2log(η)4ϵln2(12jN),\frac{dg}{dj}=-2\ln 2\cdot\log(\eta)-\frac{4\epsilon}{\ln 2\cdot(1-\frac{2j}{N})},

and second derivative

d2gdj2=8ϵln2N(12jN)2<0.\frac{d^{2}g}{dj^{2}}=-\frac{8\epsilon}{\ln 2\cdot N(1-\frac{2j}{N})^{2}}<0.

Thus g(j)g(j) achieves its maximum at j=N2+ϵN(ln2)2log(η)j^{*}=\frac{N}{2}+\frac{\epsilon N}{(\ln 2)^{2}\log(\eta)} and is decreasing over [j,N2][j^{*},\frac{N}{2}]. Whenever η>22ϵ(ln2)2\eta>2^{-\frac{2\epsilon}{(\ln 2)^{2}}}, we have j0j^{*}\leq 0; in that case the argument in equation (22) is maximized at j=0j=0, and we get

maxjB{Prvμt[|vH|=j]22ϵNlog|12jN|}\displaystyle\max_{j\notin B}\left\{\Pr_{v\sim\mu_{t}}[|vH|=j]\cdot 2^{2\epsilon N\log|1-\frac{2j}{N}|}\right\} 2(1η)N+o(N).\displaystyle\leq 2^{-(1-\eta)N+o(N)}.
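
The threshold just used can be checked numerically; a short sketch with illustrative values of ϵ and η:

import math

# The critical point j* = N/2 + eps*N/((ln 2)^2 * log2(eta)) of g(j) is
# nonpositive exactly when eta >= 2^{-2 eps/(ln 2)^2}, so that g is
# maximized at j = 0 on [0, N/2).
N = 10 ** 6
for eps in (0.05, 0.2):
    threshold = 2 ** (-2 * eps / math.log(2) ** 2)
    for eta in (0.5 * threshold, threshold, 0.5 + 0.5 * threshold):
        j_star = N / 2 + eps * N / (math.log(2) ** 2 * math.log2(eta))
        assert (j_star <= 1e-9 * N) == (eta >= threshold - 1e-12), (eps, eta)
print("threshold for j* <= 0 confirmed")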

Combining this bound for the faraway terms with the bound (21) for the central terms, we bound the right-hand side of (20) by

PrρPϵ[ρDk(Hρ)]\displaystyle\Pr_{\rho\sim P_{\epsilon}}[\rho\notin D_{k}(H\rho^{\intercal})] 1N+2o(N)kN(24ϵηN+2h(ϵ)N(1η)N).\displaystyle\leq\frac{1}{N}+\frac{2^{o(N)}}{kN}\left(2^{4\epsilon\eta N}+2^{h(\epsilon)N-(1-\eta)N}\right).

We have shown (19), and so we are done. ∎

Acknowledgements

We thank Alexander Barg, Paul Beame, Noam Elkies, Amir Shpilka, Madhu Sudan and Amir Yehudayoff for useful discussions.

Appendix A Weight Bounds Comparisons

In this section, we will compare our Theorem 23 with previously known bounds on the weight distribution of Reed-Muller codes. We recall our Theorem 23 below. Note that throughout this section, 𝒟(n,d)\mathcal{D}(n,d) will denote the uniform distribution over all codewords in 𝖱𝖬(n,d)\mathsf{RM}(n,d), and |c||c| will denote the number of non-zero coordinates of cc.

Theorem.

For any n,d<n,n,d<n, and α(0,1)\alpha\in(0,1), the Reed-Muller code 𝖱𝖬(n,d)\mathsf{RM}(n,d) over the prime field 𝔽q\mathbb{F}_{q} satisfies

Prc𝒟(n,d)[|c|=αN]q(1hq(α))(nd),\Pr_{c\sim\mathcal{D}(n,d)}\Big{[}|c|=\alpha N\Big{]}\leq q^{-(1-h_{q}(\alpha))\cdot\binom{n}{\leq d}},

where we have defined

hq(α)=(1α)logq11α+αlogqq1α.h_{q}(\alpha)=(1-\alpha)\log_{q}\frac{1}{1-\alpha}+\alpha\log_{q}\frac{q-1}{\alpha}.

Reed-Muller codes over odd prime fields
We start with Reed-Muller codes over odd prime fields, for which the only preexisting weight bound we are aware of is the following result of [BGY20]:

Theorem 24 ([BGY20]).

For any 0<δ<120<\delta<\frac{1}{2}, there are constants c1,c2>0c_{1},c_{2}>0 such that for any odd prime qq and for any integers d,nd,n such that dδnd\leq\delta n, we have

Prc𝒟(n,d)[|c|N11qqc1nd]qc2(nd).\Pr_{c\sim\mathcal{D}(n,d)}\Big{[}\frac{|c|}{N}\leq 1-\frac{1}{q}-q^{-c_{1}\frac{n}{d}}\Big{]}\leq q^{-c_{2}\binom{n}{\leq d}}.

This was a generalization of [BHL12], who proved the same result for Reed-Muller codes over 𝔽2\mathbb{F}_{2}. Theorem 24 is very strong for small degrees, but gets weaker as the degree increases. When dd is linear in nn we have qc1nd=Θ(1)q^{-c_{1}\frac{n}{d}}=\Theta(1), meaning that in this regime Theorem 24 can only give a nontrivial bound on normalized weights that are at least a constant away from 11q1-\frac{1}{q}. Our Theorem 23 gives nontrivial bounds for all normalized weights <11q<1-\frac{1}{q}, for all degrees d<nd<n.

Reed-Muller codes over 𝔽2\mathbb{F}_{2}
We now turn to Reed-Muller codes over 𝔽2\mathbb{F}_{2}, for which more results are known. The same bound as Theorem 24 was proven over 𝔽2\mathbb{F}_{2} by [BHL12]. For comparison with our Theorem 23, see the discussion above.

In the constant-rate regime (i.e. d=n2±O(n)d=\frac{n}{2}\pm O(\sqrt{n})), the strongest known bounds for constant weights are the following two results of [Sam20]:

Theorem 25 ([Sam20]).

Let (nd)=η2n=ηN\binom{n}{\leq d}=\eta 2^{n}=\eta N for some η(0,1)\eta\in(0,1). Then for any α(0,12)\alpha\in(0,\frac{1}{2}) we have

Prc𝒟(n,d)[|c|αN]2o(N)(11η)2ln2αN2ηN.\Pr_{c\sim\mathcal{D}(n,d)}[|c|\leq\alpha N]\leq 2^{o(N)}\left(\frac{1}{1-\eta}\right)^{2\ln 2\cdot\alpha N}2^{-\eta N}.

This result is strong when α\alpha is away from 1/21/2. For α\alpha close to 1/21/2, the following bound is stronger.

Theorem 26 ([Sam20]).

Let (nd)=η2n=ηN\binom{n}{\leq d}=\eta 2^{n}=\eta N for some η(0,1)\eta\in(0,1), and define A={1η2ln22,,12}A=\{\frac{1-\eta^{2\ln 2}}{2},...,\frac{1}{2}\}. Then for any α(0,12)\alpha\in(0,\frac{1}{2}),

Prc𝒟(n,d)[|c|αN]2o(N){(NαN)2Nif αA,1(1η2ln2)αN(1+η2ln2)(1α)Notherwise.\Pr_{c\sim\mathcal{D}(n,d)}[|c|\leq\alpha N]\leq 2^{o(N)}\cdot\begin{cases}\frac{\binom{N}{\alpha N}}{2^{N}}&\text{if $\alpha\in A$,}\\ \frac{1}{(1-\eta^{2\ln 2})^{\alpha N}(1+\eta^{2\ln 2})^{(1-\alpha)N}}&\text{otherwise.}\end{cases}

We note that the combination of Theorems 25 and 26 is stronger than our Theorem 23 whenever both the rate of the code and the normalized weight of the codeword are constant (i.e. α=Θ(1)\alpha=\Theta(1) and d=n2±O(n)d=\frac{n}{2}\pm O(\sqrt{n})).

However, when the normalized weight is subconstant or when the degree is away from n2\frac{n}{2} (i.e. α=o(1)\alpha=o(1) or d=n2Θ(n)d=\frac{n}{2}-\Theta(n)), the 2o(N)2^{o(N)} term becomes too large for Theorems 25 and 26 to give a strong bound. An approach that has been fairly successful in these two regimes (subconstant rate or subconstant weight) is the line of work of [KLP12, ASW15, SS20]. To our knowledge, the strongest results for these regimes are due to [SS20]. We start with their bound for lower weights, i.e. for weights in [0,N4].[0,\frac{N}{4}].

Theorem 27 ([SS20]).

For any integers j,n,dj,n,d, we have

Prc𝒟(n,d)[|c|2j2n]\displaystyle\Pr_{c\sim\mathcal{D}(n,d)}[|c|\leq 2^{-j}\cdot 2^{n}] 2(117(j1dn+2dn(1dn)2)(dn)j1)(nd)+O(n4).\displaystyle\leq 2^{-\big{(}1-17(\frac{j}{1-\frac{d}{n}}+\frac{2-\frac{d}{n}}{(1-\frac{d}{n})^{2}})(\frac{d}{n})^{j-1}\big{)}\binom{n}{\leq d}+O(n^{4})}.

We claim that for every d>n34d>\frac{n}{34}, there is some weight threshold Ad<14A_{d}<\frac{1}{4} for which our Theorem 23 is stronger than Theorem 27 for all weights larger than AdNA_{d}N. One way to see this is to note that our Theorem 23 satisfies

Pr[|c|2j2n]\displaystyle Pr[|c|\leq 2^{-j}\cdot 2^{n}] 2(1h(2j))(nd)\displaystyle\leq 2^{-\big{(}1-h(2^{-j})\big{)}\binom{n}{\leq d}}
2(12j2j)(nd),\displaystyle\leq 2^{-(1-2j\cdot 2^{-j})\binom{n}{\leq d}},

while the expression in Theorem 27 satisfies

2(117(j1dn+2dn(1dn)2)(dn)j1)(nd)2(117j(dn)j1)(nd).2^{-\big{(}1-17(\frac{j}{1-\frac{d}{n}}+\frac{2-\frac{d}{n}}{(1-\frac{d}{n})^{2}})(\frac{d}{n})^{j-1}\big{)}\binom{n}{\leq d}}\geq 2^{-\big{(}1-17j(\frac{d}{n})^{j-1}\big{)}\binom{n}{\leq d}}.

Thus our Theorem 23 is stronger than Theorem 27 whenever j2(j1)<17j(dn)j1j\cdot 2^{-(j-1)}<17j\cdot(\frac{d}{n})^{j-1}, i.e. whenever

j<log17logn2d+1.j<\frac{\log 17}{\log\frac{n}{2d}}+1.

For any d>n34d>\frac{n}{34}, this gives a nontrivial range.
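
For concreteness, the crossover can be evaluated for a few degree ratios d/n (a sketch; the ratios are illustrative):

import math

# Theorem 23 beats Theorem 27 for 2 <= j < log(17)/log(n/(2d)) + 1.
for ratio in (1 / 34 + 1e-3, 0.1, 0.25, 0.45):
    jmax = math.log2(17) / math.log2(1 / (2 * ratio)) + 1
    print(f"d/n = {ratio:.3f}: nontrivial for 2 <= j < {jmax:.2f}")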

This concludes our comparison of Theorem 23 with Theorem 27, which was the bound of [SS20] for weights in [0,N4].[0,\frac{N}{4}]. We now turn to their bounds for larger weights.

Theorem 28 ([SS20]).

Let j,nj,n\in\mathbb{N} and let 0<γ(n)<12Ω(lognn)0<\gamma(n)<\frac{1}{2}-\Omega\left(\sqrt{\frac{\log n}{n}}\right) be a parameter (which may be constant or depend on nn) such that j+log112γ(12γ)2=o(n).\frac{j+\log\frac{1}{1-2\gamma}}{(1-2\gamma)^{2}}=o(n). Then

Prc𝒟(n,γn)[|c|12j2N]22c(γ,j)(nd)+O(n4),\Pr_{c\sim\mathcal{D}(n,\gamma n)}[|c|\leq\frac{1-2^{-j}}{2}N]\leq 2^{-2^{-c(\gamma,j)}\binom{n}{\leq d}+O(n^{4})},

where c(γ,j)=O(γ2j+γlog112γ12γ+γ)c(\gamma,j)=O\left(\frac{\gamma^{2}j+\gamma\log\frac{1}{1-2\gamma}}{1-2\gamma}+\gamma\right).

This bound holds when the degree is smaller than n2\frac{n}{2}. For arbitrary degree, [SS20] gives the following:

Theorem 29 ([SS20]).

For any integers n,dn,d and any δ>0\delta>0, we have

Prc𝒟(n,d)[|c|1δ2N]\displaystyle\Pr_{c\sim\mathcal{D}(n,d)}[|c|\leq\frac{1-\delta}{2}N] eδ222d.\displaystyle\leq e^{-\frac{\delta^{2}}{2}\cdot 2^{d}}.

We will start by comparing our Theorem 23 with Theorem 29. Applying Lemma 14, we get from Theorem 23 that

Prc𝒟(n,d)[|c|1δ2N]\displaystyle\Pr_{c\sim\mathcal{D}(n,d)}[|c|\leq\frac{1-\delta}{2}N] 2(1h(1δ2))(nd)\displaystyle\leq 2^{-(1-h(\frac{1-\delta}{2}))\cdot\binom{n}{\leq d}}
eδ22(nd).\displaystyle\leq e^{-\frac{\delta^{2}}{2}\cdot\binom{n}{\leq d}}.

Thus our Theorem 23 is strictly stronger than Theorem 29 for all d<nd<n. We will now compare our Theorem 23 with Theorem 28. Applying Lemma 14, we get from Theorem 23 that

Prc𝒟(n,d)[|c|12j2N]\displaystyle\Pr_{c\sim\mathcal{D}(n,d)}[|c|\leq\frac{1-2^{-j}}{2}N] 2(1h(12j2))(nd)\displaystyle\leq 2^{-(1-h(\frac{1-2^{-j}}{2}))\cdot\binom{n}{\leq d}}
222j2ln2(nd).\displaystyle\leq 2^{-\frac{2^{-2j}}{2\ln 2}\cdot\binom{n}{\leq d}}.

It follows that our Theorem 23 is stronger than Theorem 28 whenever 2(2j+1)2c(γ,j)2^{-(2j+1)}\geq 2^{-c(\gamma,j)}, i.e. whenever

2j+1c(γ,j).\displaystyle 2j+1\leq c(\gamma,j).

But c(γ,j):=O(γ212γj+γlog112γ12γ+γ)c(\gamma,j):=O\left(\frac{\gamma^{2}}{1-2\gamma}\cdot j+\frac{\gamma\log\frac{1}{1-2\gamma}}{1-2\gamma}+\gamma\right), and γ212γ\frac{\gamma^{2}}{1-2\gamma}\rightarrow\infty as γ1/2\gamma\rightarrow 1/2. Thus there exists some constant γ(0,12)\gamma^{*}\in(0,\frac{1}{2}) such that our Theorem 23 is stronger than Theorem 28 whenever d>γnd>\gamma^{*}n. In private correspondence with Amir Shpilka and Ori Sberlo, we learned that γ\gamma^{*} can be computed to be γ0.38\gamma^{*}\approx 0.38.

Appendix B Proof of Corollary 3

Recall that for any ϵ(0,1)\epsilon\in(0,1) we defined

Aϵ={αN:h(α)>1h(ϵ)N1/5},A_{\epsilon}=\{\alpha N:h(\alpha)>1-h(\epsilon)-N^{-1/5}\},

and that for any code CC we denote by 𝒟(C)\mathcal{D}(C^{\perp}) the uniform distribution over the dual code CC^{\perp}. We now restate and prove our Corollary 3.

Corollary.

Let C𝔽2NC\subseteq\mathbb{F}_{2}^{N} be a linear code, and let ϵ(0,12)\epsilon\in(0,\frac{1}{2}) be arbitrary. Suppose that for every jAϵj\in A_{\epsilon} we have

Pry𝒟(C)[|y|=j](1+o(N1))(Nj)2N,\displaystyle\Pr_{y\sim\mathcal{D}(C^{\perp})}\big{[}|y|=j\big{]}\leq\big{(}1+o(N^{-1})\big{)}\frac{\binom{N}{j}}{2^{N}},

and suppose that

Pry𝒟(C)[|y|Aϵ]2N34iAϵ(Ni)2N.\displaystyle\Pr_{y\sim\mathcal{D}(C^{\perp})}\big{[}|y|\notin A_{\epsilon}\big{]}\leq 2^{N^{\frac{3}{4}}}\cdot\frac{\sum_{i\notin A_{\epsilon}}\binom{N}{i}}{2^{N}}.

Then CC is resilient to ϵ\epsilon-errors.

Proof.

From Theorem 2, we know that there exists some decoder d:𝔽2NCd:\mathbb{F}_{2}^{N}\rightarrow C such that for all cCc\in C,

PrρPϵ[d(c+ρ)c]\displaystyle\Pr_{\rho\sim P_{\epsilon}}[d(c+\rho)\neq c] 2eN3ϵ+NmaxS{ϵN±N3/4}1|S|2{1(NS)j=0NPrcC[|c|=j]KS(j)21},\displaystyle\leq 2e^{-\frac{\sqrt{N}}{3\epsilon}}+N\mathop{\max}_{\begin{subarray}{c}S\subseteq\{\epsilon N\pm N^{3/4}\}\\ 1\leq|S|\leq 2\end{subarray}}\Big{\{}\frac{1}{\binom{N}{S}}\sum_{j=0}^{N}\Pr_{c\sim C^{\perp}}\big{[}|c|=j\big{]}K_{S}(j)^{2}-1\Big{\}}, (23)

where (NS)=sS(Ns)\binom{N}{S}=\sum_{s\in S}\binom{N}{s} and where KS=sSKsK_{S}=\sum_{s\in S}K_{s} for KsK_{s} the Krawtchouk polynomial of degree ss. Let ν\nu be such that h(ν)=1h(ϵ)N1/5h(\nu)=1-h(\epsilon)-N^{-1/5}, and define the set of weights

Aϵ={νN,,(1ν)N}.A_{\epsilon}=\{\nu N,...,(1-\nu)N\}.

We will start by bounding the central terms jAϵj\in A_{\epsilon} in equation (23). Applying Proposition 7 and the first condition in our theorem statement, we immediately get that for any S{0,,N}S\subseteq\{0,...,N\},

1(NS)jAϵPrcC[|c|=j]KS(j)2\displaystyle\frac{1}{\binom{N}{S}}\sum_{j\in A_{\epsilon}}\Pr_{c\sim C^{\perp}}\big{[}|c|=j\big{]}K_{S}(j)^{2} 1+o(1N).\displaystyle\leq 1+o\big{(}\frac{1}{N}\big{)}. (24)

We now turn to the faraway terms jAϵj\notin A_{\epsilon}. For these we note that by definition of Krawtchouk polynomials, for any integer ss we have

Ks(x)=j=0s(1)j(xj)(Nxsj)j=0s(xj)(Nxsj)=(Ns).\displaystyle K_{s}(x)=\sum_{j=0}^{s}(-1)^{j}\binom{x}{j}\binom{N-x}{s-j}\leq\sum_{j=0}^{s}\binom{x}{j}\binom{N-x}{s-j}=\binom{N}{s}.

For any S{0,,N}S\subseteq\{0,...,N\}, we can then bound the faraway terms jAϵj\notin A_{\epsilon} of equation (23) by

1(NS)jAϵPrcC[|c|=j]KS(j)2\displaystyle\frac{1}{\binom{N}{S}}\sum_{j\notin A_{\epsilon}}\Pr_{c\sim C^{\perp}}\big{[}|c|=j\big{]}K_{S}(j)^{2} (NS)Pr[|y|Aϵ].\displaystyle\leq\binom{N}{S}\Pr\big{[}|y|\notin A_{\epsilon}\big{]}.

Applying the second condition in our theorem statement in combination with Fact 13 and the subadditivity of entropy, we get that

maxS{ϵN±N3/4}1|S|2{1(NS)jAϵPrcC[|c|=j]KS(j)2}\displaystyle\mathop{\max}_{\begin{subarray}{c}S\subseteq\{\epsilon N\pm N^{3/4}\}\\ 1\leq|S|\leq 2\end{subarray}}\Big{\{}\frac{1}{\binom{N}{S}}\sum_{j\notin A_{\epsilon}}\Pr_{c\sim C^{\perp}}\big{[}|c|=j\big{]}K_{S}(j)^{2}\Big{\}} 2(NϵN+N3/4)22h(ϵ)NN4/5+N3/4\displaystyle\leq 2\binom{N}{\epsilon N+N^{3/4}}\cdot 2\cdot 2^{-h(\epsilon)N-N^{4/5}+N^{3/4}}
42h(ϵ)N+h(N1/4)N2h(ϵ)NN4/5+N3/4\displaystyle\leq 4\cdot 2^{h(\epsilon)N+h(N^{-1/4})N}\cdot 2^{-h(\epsilon)N-N^{4/5}+N^{3/4}}
o(1N).\displaystyle\leq o(\frac{1}{N}).

Combining this bound for the faraway terms with our bound (24) for the central terms, we bound equation (23) by

PrρPϵ[d(c+ρ)c]\displaystyle\Pr_{\rho\sim P_{\epsilon}}[d(c+\rho)\neq c] 2eN3ϵ+No(1N)\leq 2e^{-\frac{\sqrt{N}}{3\epsilon}}+N\cdot o\big{(}\frac{1}{N}\big{)}
o(1).\displaystyle\leq o(1).

Appendix C Lower Bounds on List Decoding

Claim 30.

Let ϵ(0,12)\epsilon\in(0,\frac{1}{2}) be arbitrary, and consider any N>10ϵ2N>\frac{10}{\epsilon^{2}}. Suppose a code C𝔽2NC\subseteq\mathbb{F}_{2}^{N} and a decoder dk:𝔽2NCkd_{k}:\mathbb{F}_{2}^{N}\rightarrow C^{\otimes k} satisfy

PrρPϵc𝒟(𝒞)[cdk(c+ρ)]34,\mathop{\Pr}_{\begin{subarray}{c}\rho\sim P_{\epsilon}\\ c\sim\mathcal{D(C)}\end{subarray}}[c\in d_{k}(c+\rho)]\geq\frac{3}{4},

for PϵP_{\epsilon} the ϵ\epsilon-noisy distribution and 𝒟(𝒞)\mathcal{D(C)} the uniform distribution on CC. Then we must have

k|C|2(1h(ϵ))N2h(ϵ)N3/48.k\geq|C|\cdot 2^{-(1-h(\epsilon))N}\cdot\frac{2^{-h(\epsilon)N^{3/4}}}{8}.
Proof.

We will first show that in order for the decoder dkd_{k} to succeed with high probability, there must be many codewords cCc\in C for which

|{x𝔽2N:cdk(x)}|2h(ϵ)N.|\{x\in\mathbb{F}_{2}^{N}:c\in d_{k}(x)\}|\gtrsim 2^{h(\epsilon)N}.

Intuitively this is because the sphere of radius ϵN\epsilon N around any codeword cc contains 2h(ϵ)N\approx 2^{h(\epsilon)N} points (and for any sent codeword cc, with high probability the received message mm will satisfy |m+c|ϵN|m+c|\approx\epsilon N). We will then simply double-count the number of pairs (x,c)(x,c) for which cdk(x)c\in d_{k}(x). On the one hand, there are 2Nk2^{N}\cdot k such pairs, since every received message is mapped to kk codewords; on the other hand, there must be at least about |C|2h(ϵ)N|C|\cdot 2^{h(\epsilon)N} pairs, since as we’ve just argued most codewords in CC need to be matched to at least 2h(ϵ)N\approx 2^{h(\epsilon)N} points. It follows that we must have

k|C|2h(ϵ)N2N.k\gtrsim|C|\cdot\frac{2^{h(\epsilon)N}}{2^{N}}.

Formally, we first note that the theorem condition implies that at least |C|2\frac{|C|}{2} codewords cCc\in C must satisfy

PrρPϵ[cdk(c+ρ)]12.\displaystyle\Pr_{\rho\sim P_{\epsilon}}[c\in d_{k}(c+\rho)]\geq\frac{1}{2}. (25)

Fix any such cc. Now from Chernoff’s bound (i.e. Lemma 16), we have for NN large enough that

PrρPϵ[|ρ|ϵNϵN3/4]\displaystyle\Pr_{\rho\sim P_{\epsilon}}\big{[}|\rho|\leq\epsilon N-\epsilon N^{3/4}\big{]} 14.\displaystyle\leq\frac{1}{4}.

In order for cc to satisfy cdk(c+ρ)c\in d_{k}(c+\rho) with probability at least 12\frac{1}{2}, there must then be a subset Sc{x𝔽2N:|c+x|ϵNϵN3/4}S_{c}\subseteq\{x\in\mathbb{F}_{2}^{N}:|c+x|\geq\epsilon N-\epsilon N^{3/4}\} satisfying both

xSccdk(x)\displaystyle x\in S_{c}\implies c\in d_{k}(x) (26)

and

PrρPϵ[ρSc]14.\displaystyle\Pr_{\rho\sim P_{\epsilon}}\big{[}\rho\in S_{c}\big{]}\geq\frac{1}{4}. (27)

But every element xScx\in S_{c} satisfies |c+x|ϵNϵN3/4|c+x|\geq\epsilon N-\epsilon N^{3/4}, so every xScx\in S_{c} satisfies

PrρPϵ[ρ=c+x]\displaystyle\Pr_{\rho\sim P_{\epsilon}}\big{[}\rho=c+x\big{]} ϵϵNϵN3/4(1ϵ)(1ϵ)N+ϵN3/4\displaystyle\leq\epsilon^{\epsilon N-\epsilon N^{3/4}}(1-\epsilon)^{(1-\epsilon)N+\epsilon N^{3/4}}
2(1N1/4)h(ϵ)N\displaystyle\leq 2^{-(1-N^{-1/4})h(\epsilon)N} (28)

Equations (27) and (28) imply that any cCc\in C that can be list-decoded by dkd_{k} with probability 12\geq\frac{1}{2} must satisfy |Sc|2(1N1/4)h(ϵ)N4|S_{c}|\geq\frac{2^{(1-N^{-1/4})h(\epsilon)N}}{4}. It then follows from (26) that any such cc must satisfy

|{x𝔽2N:cdk(x)}|2(1N1/4)h(ϵ)N4.\big{|}\{x\in\mathbb{F}_{2}^{N}:c\in d_{k}(x)\}\big{|}\geq\frac{2^{(1-N^{-1/4})h(\epsilon)N}}{4}.

By double counting, we get

2Nk\displaystyle 2^{N}\cdot k cC|{x𝔽2N:cdk(x)}|\displaystyle\geq\sum_{c\in C}\big{|}\{x\in\mathbb{F}_{2}^{N}:c\in d_{k}(x)\}\big{|}
|C|22(1N1/4)h(ϵ)N4\displaystyle\geq\frac{|C|}{2}\cdot\frac{2^{(1-N^{-1/4})h(\epsilon)N}}{4}
=|C|82h(ϵ)Nh(ϵ)N3/4.\displaystyle=\frac{|C|}{8}\cdot 2^{h(\epsilon)N-h(\epsilon)N^{3/4}}.

The result then follows from rearranging terms. ∎
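
To see the scale of this lower bound, the following sketch evaluates its exponent for illustrative parameters (rate 0.8 and ϵ = 0.1 are our choices, not the paper's):

import math

# For |C| = 2^{rate*N}, the claim forces k >= |C| * 2^{-(1-h(eps))N}
# * 2^{-h(eps) N^{3/4}} / 8, i.e. exponentially large lists.
def h(p):
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

N, eps, rate = 10 ** 4, 0.1, 0.8
log2_k = rate * N - (1 - h(eps)) * N - h(eps) * N ** 0.75 - 3
print(f"log2(k) >= {log2_k:.1f}")  # positive for these parameters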

Appendix D Other Proofs for Sections 1 and 2

D.1 On Known List-Decoding Bounds for Reed-Muller Codes

We recall the known list-decoding bound for Reed-Muller codes (see equation (3) in section 1):

|T|=2ϵNlog4ϵ(1ϵ)(1η)4ln2+o(N)\displaystyle|T|=2^{\epsilon N\log\frac{4\epsilon(1-\epsilon)}{(1-\eta)^{4\ln 2}}+o(N)}

We claim this bound never achieves the information-theoretic optimum 2h(ϵ)N(Ndim C)+o(N)2^{h(\epsilon)N-(N-\textnormal{dim }C)+o(N)}.

Claim 31.

For any ϵ(0,12)\epsilon\in(0,\frac{1}{2}) and any γ=γ(ϵ)(0,1)\gamma=\gamma(\epsilon)\in(0,1), we have

ϵlog4ϵ(1ϵ)γ4ln2>h(ϵ)γ.\displaystyle\epsilon\log\frac{4\epsilon(1-\epsilon)}{\gamma^{4\ln 2}}>h(\epsilon)-\gamma.
Proof.

Since γ4ln2γ2\gamma^{4\ln 2}\leq\gamma^{2} for all γ(0,1)\gamma\in(0,1), it suffices to show that for any ϵ(0,12)\epsilon\in(0,\frac{1}{2}) and c=γϵ<1ϵc=\frac{\gamma}{\epsilon}<\frac{1}{\epsilon} we have

ϵlog4ϵ(1ϵ)(cϵ)2>h(ϵ)cϵ,\displaystyle\epsilon\log\frac{4\epsilon(1-\epsilon)}{(c\epsilon)^{2}}>h(\epsilon)-c\epsilon,

i.e. that

f(ϵ,c):=log(1ϵ)+2ϵ2ϵlogc+cϵ>0.\displaystyle f(\epsilon,c):=\log(1-\epsilon)+2\epsilon-2\epsilon\log c+c\epsilon>0. (29)

We first fix some ϵ(0,12)\epsilon\in(0,\frac{1}{2}) and compute the cc minimizing f(ϵ,c)f(\epsilon,c). Note that

cf(ϵ,c)\displaystyle\frac{\partial}{\partial c}f(\epsilon,c) =2ϵcln2+ϵ\displaystyle=-\frac{2\epsilon}{c\ln 2}+\epsilon

and

2c2f(ϵ,c)\displaystyle\frac{\partial^{2}}{\partial c^{2}}f(\epsilon,c) =2ϵc2ln2>0,\displaystyle=\frac{2\epsilon}{c^{2}\ln 2}>0,

so f(ϵ,c)f(\epsilon,c) is decreasing over c(0,2ln2]c\in(0,\frac{2}{\ln 2}], minimized at c=2ln2c=\frac{2}{\ln 2}, and increasing thereafter. We thus have

minc<1ϵf(ϵ,c)={f(ϵ,2ln2)if ϵ<ln22,f(ϵ,1ϵ)otherwise.\displaystyle\min_{c<\frac{1}{\epsilon}}f(\epsilon,c)=\begin{cases}f(\epsilon,\frac{2}{\ln 2})&\text{if $\epsilon<\frac{\ln 2}{2}$,}\\ f(\epsilon,\frac{1}{\epsilon})&\text{otherwise.}\end{cases} (30)

We deal with each case separately. For the case ϵ<ln22\epsilon<\frac{\ln 2}{2}, we want to show that

f(ϵ,2ln2)=log(1ϵ)+2ϵlog(ln2)+2ϵln20.\displaystyle f(\epsilon,\frac{2}{\ln 2})=\log(1-\epsilon)+2\epsilon\log(\ln 2)+\frac{2\epsilon}{\ln 2}\geq 0.

The first derivative is

ϵf(ϵ,2ln2)=1(1ϵ)ln2+2log(ln2)+2ln2,\displaystyle\frac{\partial}{\partial\epsilon}f(\epsilon,\frac{2}{\ln 2})=-\frac{1}{(1-\epsilon)\ln 2}+2\log(\ln 2)+\frac{2}{\ln 2},

and the second derivative is

2ϵ2f(ϵ,2ln2)=1(1ϵ)2ln2<0.\displaystyle\frac{\partial^{2}}{\partial\epsilon^{2}}f(\epsilon,\frac{2}{\ln 2})=-\frac{1}{(1-\epsilon)^{2}\ln 2}<0.

Thus the function f(ϵ,2ln2)f(\epsilon,\frac{2}{\ln 2}) is maximized at ϵ=11(2log(ln2)+2ln2)ln20.21,\epsilon^{*}=1-\frac{1}{(2\log(\ln 2)+\frac{2}{\ln 2})\ln 2}\approx 0.21, and monotone on each side of ϵ\epsilon^{*}. In particular, since ϵ[0,ln22]\epsilon^{*}\in[0,\frac{\ln 2}{2}] we know that over the interval [0,ln22][0,\frac{\ln 2}{2}] the function f(ϵ,2ln2)f(\epsilon,\frac{2}{\ln 2}) achieves its minimum at either ϵ=0\epsilon=0 or ϵ=ln22\epsilon=\frac{\ln 2}{2}. But f(0,2ln2)=0<f(ln22,2ln2)f(0,\frac{2}{\ln 2})=0<f(\frac{\ln 2}{2},\frac{2}{\ln 2}), so we indeed have that

f(ϵ,2ln2)0f(\epsilon,\frac{2}{\ln 2})\geq 0

for all ϵ<ln22\epsilon<\frac{\ln 2}{2}. This deals with the first case of (30). For the second case of (30), we want to show that for all ϵ(0,12)\epsilon\in(0,\frac{1}{2}) we have

f(ϵ,1ϵ)0.f(\epsilon,\frac{1}{\epsilon})\geq 0.

But

f(ϵ,1ϵ)=log(1ϵ)+2ϵ+2ϵlogϵ+1f(\epsilon,\frac{1}{\epsilon})=\log(1-\epsilon)+2\epsilon+2\epsilon\log\epsilon+1

is decreasing in ϵ\epsilon and f(12,2)=0f(\frac{1}{2},2)=0, so we indeed have f(ϵ,1ϵ)0f(\epsilon,\frac{1}{\epsilon})\geq 0 for all ϵ.\epsilon.
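
Inequality (29) can also be confirmed on a grid; a minimal sketch over ϵ ∈ (0, 1/2) and c ∈ (0, 1/ϵ):

import math

# Grid check of f(eps, c) = log2(1-eps) + 2 eps - 2 eps log2(c) + c eps > 0.
for i in range(1, 50):
    eps = i / 100
    for k in range(1, 200):
        c = (k / 200) * (1 / eps)
        f = math.log2(1 - eps) + 2 * eps - 2 * eps * math.log2(c) + c * eps
        assert f > 0, (eps, c, f)
print("f(eps, c) > 0 on the whole grid")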

D.2 Duals of Transitive Codes - Proof of Fact 10

Claim.

The dual code CC^{\perp} of a transitive code C𝔽2NC\subseteq\mathbb{F}_{2}^{N} is transitive.

Proof.

Let i,j[N]i,j\in[N] be arbitrary. Since CC is transitive, we know there exists a permutation π:[N][N]\pi:[N]\rightarrow[N] such that π(j)=i\pi(j)=i and that for any c=(c1,,cN)Cc=(c_{1},...,c_{N})\in C, we have cπ:=(cπ(1),,cπ(N))Cc_{\pi}:=(c_{\pi(1)},...,c_{\pi(N)})\in C. Clearly π1\pi^{-1} satisfies π1(i)=j\pi^{-1}(i)=j, and we claim that it also satisfies that vπ1Cv_{\pi^{-1}}\in C^{\perp} for all vCv\in C^{\perp}. For this we note that since cπCc_{\pi}\in C for every cCc\in C, we have by definition that every vCv\in C^{\perp} satisfies

kvkcπ(k)=0 for all cC.\sum_{k}v_{k}c_{\pi(k)}=0\textnormal{ for all }c\in C.

We thus have

vC\displaystyle v\in C^{\perp} kvkcπ(k)=0 for all cC\displaystyle\implies\sum_{k}v_{k}c_{\pi(k)}=0\textnormal{ for all }c\in C
kvπ1(k)ck=0 for all cC\displaystyle\implies\sum_{k}v_{\pi^{-1}(k)}c_{k}=0\textnormal{ for all }c\in C
vπ1C.\displaystyle\implies v_{\pi^{-1}}\in C^{\perp}.
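
This argument can be checked by brute force on a small cyclic (hence transitive) code; the sketch below uses the illustrative [7,4] code generated by the shifts of x³+x²+1 and verifies that its dual is closed under all rotations:

import itertools

# The dual of a cyclic code should itself be invariant under rotations.
N = 7
gen = (1, 0, 1, 1, 0, 0, 0)  # coefficients of x^3 + x^2 + 1

def rot(v, s):
    return tuple(v[(i - s) % N] for i in range(N))

basis = [rot(gen, s) for s in range(4)]
C = {tuple(sum(a * b[i] for a, b in zip(co, basis)) % 2 for i in range(N))
     for co in itertools.product([0, 1], repeat=4)}
dual = {v for v in itertools.product([0, 1], repeat=N)
        if all(sum(x * y for x, y in zip(v, c)) % 2 == 0 for c in C)}

assert all(rot(v, s) in dual for v in dual for s in range(N))
print("the dual of this cyclic code is invariant under all rotations")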

D.3 Basic Properties of Reed-Muller Codes - Proof of Facts 11 and 12

Fact.

Let MM be the (nd)×N\binom{n}{\leq d}\times N generator matrix of the Reed-Muller code. The columns of MM that correspond to the points x𝔽2nx\in\mathbb{F}_{2}^{n} with |x|d|x|\leq d are linearly independent.

Proof.

Let MM^{\prime} be the submatrix of MM whose columns correspond to the points v𝔽2nv\in\mathbb{F}_{2}^{n} with |v|d|v|\leq d. It suffices to show that when the columns MvM^{\prime}_{v} of MM^{\prime} are ordered in increasing order of |v||v|, every column is linearly independent of the preceding ones. But this is clearly the case, as for the monomial m=i:vi=1xim=\prod_{i:v_{i}=1}x_{i} we have Mm,v=1M_{m,v}=1 and Mm,v=0M_{m,v^{\prime}}=0 for all vv^{\prime} preceding vv. ∎
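
A quick computational check of this fact (a sketch; the parameters n = 4, d = 2 are illustrative): build the generator matrix of 𝖱𝖬(n,d)\mathsf{RM}(n,d), select the columns indexed by points of weight at most dd, and verify full rank over 𝔽2\mathbb{F}_{2} by Gaussian elimination:

import itertools

n, d = 4, 2
points = list(itertools.product([0, 1], repeat=n))
monos = [S for r in range(d + 1) for S in itertools.combinations(range(n), r)]

# row of monomial S = evaluations of prod_{i in S} x_i at every point
M = [[int(all(x[i] for i in S)) for x in points] for S in monos]
cols = [idx for idx, x in enumerate(points) if sum(x) <= d]
A = [[row[c] for c in cols] for row in M]  # square C(n,<=d) x C(n,<=d)

rank = 0  # Gaussian elimination over F_2
for col in range(len(cols)):
    piv = next((r for r in range(rank, len(A)) if A[r][col]), None)
    if piv is None:
        continue
    A[rank], A[piv] = A[piv], A[rank]
    for r in range(len(A)):
        if r != rank and A[r][col]:
            A[r] = [a ^ b for a, b in zip(A[r], A[rank])]
    rank += 1

assert rank == len(monos)  # the selected columns are linearly independent
print("columns of weight <= d are linearly independent")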

Fact.

For all nn and all d<nd<n, the Reed-Muller code 𝖱𝖬(n,d)𝔽2N\mathsf{RM}(n,d)\subseteq\mathbb{F}_{2}^{N} is transitive.

Proof.

Recall that we view each coordinate i[N]i\in[N] as a point vi𝔽2nv_{i}\in\mathbb{F}_{2}^{n}, and that every codeword in 𝖱𝖬(n,d)\mathsf{RM}(n,d) is the evaluation vector (f(v1),,f(vN))\big{(}f(v_{1}),...,f(v_{N})\big{)} of a polynomial ff of degree d\leq d in nn variables.

Now fix two points vi,vj𝔽2nv_{i},v_{j}\in\mathbb{F}_{2}^{n}. We want to show that there is a permutation π:𝔽2n𝔽2n\pi:\mathbb{F}_{2}^{n}\rightarrow\mathbb{F}_{2}^{n} such that

  1. (i)

    π(vi)=vj\pi(v_{i})=v_{j}

  2. (ii)

    If (zv1,,zvN)𝖱𝖬(n,d)\big{(}z_{v_{1}},...,z_{v_{N}}\big{)}\in\mathsf{RM}(n,d) then (zπ(v1),,zπ(vN))𝖱𝖬(n,d)\big{(}z_{\pi(v_{1})},...,z_{\pi(v_{N})}\big{)}\in\mathsf{RM}(n,d)

To this end, we choose the permutation π(x)=x+vi+vj\pi(x)=x+v_{i}+v_{j}. Then:

  1. (i)

    π(vi)=vi+vi+vj=vj\pi(v_{i})=v_{i}+v_{i}+v_{j}=v_{j}.

  2. (ii)

    If (zv1,,zvN)\big{(}z_{v_{1}},...,z_{v_{N}}\big{)} is a codeword, it can be written as (zv1,,zvN)=(f(v1),,f(vN))\big{(}z_{v_{1}},...,z_{v_{N}}\big{)}=\big{(}f(v_{1}),...,f(v_{N})\big{)} for some polynomial ff of degree d\leq d. But then the polynomial

    g(x)=f(x+vi+vj)g(x)=f(x+v_{i}+v_{j}) satisfies 𝖽𝖾𝗀(g)=𝖽𝖾𝗀(f)d\mathsf{deg}(g)=\mathsf{deg}(f)\leq d, so (g(v1),,g(vN))\big{(}g(v_{1}),...,g(v_{N})\big{)} must be a codeword. Then since g(x)=fπ(x)g(x)=f\circ\pi(x) by definition, we have that (zπ(v1),,zπ(vN))=(fπ(v1),,fπ(vN))=(g(v1),,g(vN))𝖱𝖬(n,d)\big{(}z_{\pi(v_{1})},...,z_{\pi(v_{N})}\big{)}=\big{(}f\circ\pi(v_{1}),...,f\circ\pi(v_{N})\big{)}=\big{(}g(v_{1}),...,g(v_{N})\big{)}\in\mathsf{RM}(n,d).

D.4 A version of Pinsker’s inequality - Proof of Lemma 14

Lemma.

For any μ(0,1)\mu\in(0,1), we have

μ22ln21h(1μ2)μ2\frac{\mu^{2}}{2\ln 2}\leq 1-h\left(\frac{1-\mu}{2}\right)\leq\mu^{2}
Proof.
1h(1μ2)\displaystyle 1-h(\frac{1-\mu}{2}) =1+1μ2log(1μ2)+1+μ2log(1+μ2)\displaystyle=1+\frac{1-\mu}{2}\log\left(\frac{1-\mu}{2}\right)+\frac{1+\mu}{2}\log\left(\frac{1+\mu}{2}\right)
=1μ2log(1μ)+1+μ2log(1+μ)\displaystyle=\frac{1-\mu}{2}\log\left(1-\mu\right)+\frac{1+\mu}{2}\log\left(1+\mu\right)
=12ln2[(1μ)i=1μii(1+μ)i=1(1)iμii]\displaystyle=\frac{1}{2\ln 2}\left[-(1-\mu)\sum_{i=1}^{\infty}\frac{\mu^{i}}{i}-(1+\mu)\sum_{i=1}^{\infty}(-1)^{i}\frac{\mu^{i}}{i}\right]
=12ln2[2μi=1μ2i12i12i=1μ2i2i]\displaystyle=\frac{1}{2\ln 2}\left[2\mu\sum_{i=1}^{\infty}\frac{\mu^{2i-1}}{2i-1}-2\sum_{i=1}^{\infty}\frac{\mu^{2i}}{2i}\right]
=1ln2i=1μ2i(12i112i)\displaystyle=\frac{1}{\ln 2}\sum_{i=1}^{\infty}\mu^{2i}\left(\frac{1}{2i-1}-\frac{1}{2i}\right)
=12ln2i=1μ2ii(2i1)\displaystyle=\frac{1}{2\ln 2}\sum_{i=1}^{\infty}\frac{\mu^{2i}}{i(2i-1)}

Thus 1h(1μ2)μ22ln21-h(\frac{1-\mu}{2})\geq\frac{\mu^{2}}{2\ln 2} and 1h(1μ2)12ln2i=1μ2i(2i1)=12ln22ln2μ2=μ21-h(\frac{1-\mu}{2})\leq\frac{1}{2\ln 2}\sum_{i=1}^{\infty}\frac{\mu^{2}}{i(2i-1)}=\frac{1}{2\ln 2}\cdot 2\ln 2\cdot\mu^{2}=\mu^{2}. ∎
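
A grid check of the two inequalities of Lemma 14 (a sketch):

import math

# Verify mu^2/(2 ln 2) <= 1 - h((1-mu)/2) <= mu^2 on a grid of mu in (0,1).
def h(p):
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

for i in range(1, 100):
    mu = i / 100
    val = 1 - h((1 - mu) / 2)
    assert mu ** 2 / (2 * math.log(2)) <= val <= mu ** 2, mu
print("Lemma 14 verified on the grid")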

References

  • [AHN21] Emmanuel Abbe, Jan Hazla, and Ido Nachum. Almost-Reed-Muller codes achieve constant rates for random errors. IEEE Trans. Inf. Theory, 67(12):8034–8050, 2021.
  • [Ari09] Erdal Arikan. Channel polarization: a method for constructing capacity-achieving codes for symmetric binary-input memoryless channels. IEEE Trans. Inf. Theory, 55(7):3051–3073, 2009.
  • [ASW15] Emmanuel Abbe, Amir Shpilka, and Avi Wigderson. Reed-Muller codes for random erasures and errors. IEEE Trans. Inf. Theory, 61(10):5229–5252, 2015.
  • [ASY21] Emmanuel Abbe, Amir Shpilka, and Min Ye. Reed-Muller codes: Theory and algorithms. IEEE Trans. Inf. Theory, 67(6):3251–3277, 2021.
  • [AY19] Emmanuel Abbe and Min Ye. Reed-Muller codes polarize. In David Zuckerman, editor, 60th IEEE Annual Symposium on Foundations of Computer Science, FOCS 2019, Baltimore, Maryland, USA, November 9-12, 2019, pages 273–286. IEEE Computer Society, 2019.
  • [Bar21] Alexander Barg. Stolarsky’s invariance principle for finite metric spaces. Mathematika, 67(1):158–186, 2021.
  • [BDM18] Dmitriy Bilyk, Feng Dai, and Ryan Matzke. Stolarsky principle and energy optimization on the sphere. Constructive Approximation, 48, 08 2018.
  • [BGY20] Paul Beame, Shayan Oveis Gharan, and Xin Yang. On the bias of Reed-Muller codes over odd prime fields. SIAM J. Discret. Math., 34(2):1232–1247, 2020.
  • [BHL12] Ido Ben-Eliezer, Rani Hod, and Shachar Lovett. Random low-degree polynomials are hard to approximate. Comput. Complex., 21(1):63–81, 2012.
  • [BK97] Jean Bourgain and Gil Kalai. Influences of variables and threshold intervals under group symmetries. Geometric & Functional Analysis GAFA, 7(3):438–461, 1997.
  • [BLM13] Stéphane Boucheron, Gábor Lugosi, and Pascal Massart. Concentration Inequalities - A Nonasymptotic Theory of Independence. Oxford University Press, 2013.
  • [dW08] Ronald de Wolf. A brief introduction to Fourier analysis on the Boolean cube. Theory Comput., 1:1–20, 2008.
  • [Eli57] Peter Elias. List decoding for noisy channels. Wescon Convention Record, Part 2, pages 94–104, 1957.
  • [Gal62] Robert G. Gallager. Low-density parity-check codes. IRE Trans. Inf. Theory, 8(1):21–28, 1962.
  • [GHK11] Venkatesan Guruswami, Johan Håstad, and Swastik Kopparty. On the list-decodability of random linear codes. IEEE Trans. Inf. Theory, 57(2):718–725, 2011.
  • [GHSZ02] Venkatesan Guruswami, Johan Håstad, Madhu Sudan, and David Zuckerman. Combinatorial bounds for list decoding. IEEE Trans. Inf. Theory, 48(5):1021–1034, 2002.
  • [GR08] Venkatesan Guruswami and Atri Rudra. Explicit codes achieving list decoding capacity: Error-correction with optimal redundancy. IEEE Trans. Inf. Theory, 54(1):135–150, 2008.
  • [GX12] Venkatesan Guruswami and Chaoping Xing. Folded codes from function field towers and improved optimal rate list decoding. In Howard J. Karloff and Toniann Pitassi, editors, Proceedings of the 44th Symposium on Theory of Computing Conference, STOC 2012, New York, NY, USA, May 19 - 22, 2012, pages 339–350. ACM, 2012.
  • [HRW17] Brett Hemenway, Noga Ron-Zewi, and Mary Wootters. Local list recovery of high-rate tensor codes & applications. In Chris Umans, editor, 58th IEEE Annual Symposium on Foundations of Computer Science, FOCS 2017, Berkeley, CA, USA, October 15-17, 2017, pages 204–215. IEEE Computer Society, 2017.
  • [HSS21] Jan Hazla, Alex Samorodnitsky, and Ori Sberlo. On codes decoding a constant fraction of errors on the BSC. In Samir Khuller and Virginia Vassilevska Williams, editors, STOC ’21: 53rd Annual ACM SIGACT Symposium on Theory of Computing, Virtual Event, Italy, June 21-25, 2021, pages 1479–1488. ACM, 2021.
  • [IS98] Mourad E. H. Ismail and Plamen Simeonov. Strong asymptotics for Krawtchouk polynomials. Journal of Computational and Applied Mathematics, 100(2):121–144, 1998.
  • [KKL88] Jeff Kahn, Gil Kalai, and Nathan Linial. The influence of variables on Boolean functions. In 29th Annual Symposium on Foundations of Computer Science, White Plains, New York, USA, 24-26 October 1988, pages 68–80. IEEE Computer Society, 1988.
  • [KKM+16] Shrinivas Kudekar, Santhosh Kumar, Marco Mondelli, Henry D. Pfister, Eren Sasoglu, and Rüdiger L. Urbanke. Reed-Muller codes achieve capacity on erasure channels. In Daniel Wichs and Yishay Mansour, editors, Proceedings of the 48th Annual ACM SIGACT Symposium on Theory of Computing, STOC 2016, Cambridge, MA, USA, June 18-21, 2016, pages 658–669. ACM, 2016.
  • [KL95] Gil Kalai and Nathan Linial. On the distance distribution of codes. IEEE Trans. Inf. Theory, 41(5):1467–1472, 1995.
  • [KL99] Ilia Krasikov and Simon Litsyn. Survey of binary Krawtchouk polynomials. In Alexander Barg and Simon Litsyn, editors, Codes and Association Schemes, Proceedings of a DIMACS Workshop, Piscataway, New Jersey, USA, November 9-12, 1999, volume 56 of DIMACS Series in Discrete Mathematics and Theoretical Computer Science, pages 199–211. DIMACS/AMS, 1999.
  • [KLP12] Tali Kaufman, Shachar Lovett, and Ely Porat. Weight distribution and list-decoding size of Reed-Muller codes. IEEE Trans. Inf. Theory, 58(5):2689–2696, 2012.
  • [Kop15] Swastik Kopparty. List-decoding multiplicity codes. Theory Comput., 11:149–182, 2015.
  • [KRU13] Shrinivas Kudekar, Tom Richardson, and Rüdiger L. Urbanke. Spatially coupled ensembles universally achieve capacity under belief propagation. IEEE Trans. Inf. Theory, 59(12):7761–7813, 2013.
  • [LMS+97] Michael Luby, Michael Mitzenmacher, Mohammad Amin Shokrollahi, Daniel A. Spielman, and Volker Stemann. Practical loss-resilient codes. In Frank Thomson Leighton and Peter W. Shor, editors, Proceedings of the Twenty-Ninth Annual ACM Symposium on the Theory of Computing, El Paso, Texas, USA, May 4-6, 1997, pages 150–159. ACM, 1997.
  • [LW18] Ray Li and Mary Wootters. Improved list-decodability of random linear binary codes. In Eric Blais, Klaus Jansen, José D. P. Rolim, and David Steurer, editors, Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, APPROX/RANDOM 2018, August 20-22, 2018 - Princeton, NJ, USA, volume 116 of LIPIcs, pages 50:1–50:19. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2018.
  • [MRR+20] Jonathan Mosheiff, Nicolas Resch, Noga Ron-Zewi, Shashwat Silas, and Mary Wootters. LDPC codes achieve list decoding capacity. In 61st IEEE Annual Symposium on Foundations of Computer Science, FOCS 2020, Durham, NC, USA, November 16-19, 2020, pages 458–469. IEEE, 2020.
  • [MS77] Florence MacWilliams and Neil Sloane. The Theory of Error-Correcting Codes. North-Holland Publishing Company, 1977.
  • [Pol19] Yury Polyanskiy. Hypercontractivity of spherical averages in Hamming space. SIAM J. Discret. Math., 33(2):731–754, 2019.
  • [RP21] Galen Reeves and Henry D. Pfister. Reed-Muller codes achieve capacity on BMS channels. CoRR, abs/2110.14631, 2021.
  • [Sam20] Alex Samorodnitsky. An upper bound on lql_{q} norms of noisy functions. IEEE Trans. Inf. Theory, 66(2):742–748, 2020.
  • [Sha48] Claude E. Shannon. A mathematical theory of communication. Bell Syst. Tech. J., 27(3):379–423, 1948.
  • [Skr19] M. Skriganov. Point distributions in two-point homogeneous spaces. Mathematika, 65:557–587, 03 2019.
  • [SS20] Ori Sberlo and Amir Shpilka. On the performance of Reed-Muller codes with respect to random errors and erasures. In Shuchi Chawla, editor, Proceedings of the 2020 ACM-SIAM Symposium on Discrete Algorithms, SODA 2020, Salt Lake City, UT, USA, January 5-8, 2020, pages 1357–1376. SIAM, 2020.
  • [Tal94] Michel Talagrand. On Russo’s approximate zero-one law. The Annals of Probability, 22(3):1576–1587, 1994.