
Threshold-Based Fast Successive-Cancellation Decoding of Polar Codes

Haotian Zheng, Seyyed Ali Hashemi, Alexios Balatsoukas-Stimming, Zizheng Cao, Ton Koonen, John Cioffi, and Andrea Goldsmith. H. Zheng, A. Balatsoukas-Stimming, Z. Cao, and A. M. J. Koonen are with the Department of Electrical Engineering, Eindhoven University of Technology, The Netherlands (e-mails: {h.zheng, a.k.balatsoukas.stimming, z.cao, a.m.j.koonen}@tue.nl). S. A. Hashemi, J. Cioffi, and A. Goldsmith are with the Department of Electrical Engineering, Stanford University, Stanford, CA 94305 USA (e-mails: {ahashemi, cioffi}@stanford.edu, [email protected]). (Corresponding author: Zizheng Cao.) Part of this work was presented at the IEEE International Conference on Communications, June 2020.
Abstract

Fast SC decoding overcomes the latency caused by the serial nature of SC decoding by identifying new node types in the upper levels of the SC decoding tree and implementing their fast parallel decoders. In this work, we first present a novel sequence repetition node corresponding to a particular class of bit sequences. Most existing special node types are special cases of the proposed sequence repetition node. Then, a fast parallel decoder is proposed for this class of node. To further speed up the decoding of general nodes outside this class, a threshold-based hard-decision-aided scheme is introduced. The threshold value that guarantees a given error-correction performance in the proposed scheme is derived theoretically. Analysis and hardware implementation results on a polar code of length 1024 with code rates 1/4, 1/2, and 3/4 show that our proposed algorithm reduces the required clock cycles by up to 8% and leads to a 10% improvement in the maximum operating frequency compared to state-of-the-art decoders, without tangibly altering the error-correction performance. In addition, using the proposed threshold-based hard-decision-aided scheme, the decoding latency can be further reduced by 57% at $\mathrm{E_{b}}/\mathrm{N_{0}}=5.0$ dB.

Index Terms:
Polar codes, Fast successive-cancellation decoding, Sequence repetition node, Threshold-based hard-decision-aided scheme.

I Introduction

Polar codes represent a channel coding scheme that can provably achieve the capacity of binary-input memoryless channels [1]. The explicit coding structure and low-complexity successive-cancellation (SC) decoding algorithm have generated significant interest in polar code research across both industry and academia. In particular, in the latest cellular standard of 5G [2], polar codes are adopted in the control channel of the enhanced mobile broadband (eMBB) use case. Although SC decoding provides a low-complexity capacity-achieving solution for polar codes with long block length, its sequential bit-by-bit decoding nature leads to high decoding latency, which constrains its application in low-latency communication scenarios such as the ultra-reliable low-latency communication (URLLC) [2] scheme of 5G. Therefore, the design of fast SC-based decoding algorithms for polar codes with low decoding latency has received a lot of attention [3].

A look-ahead technique was adopted to speed up SC decoding in [4, 5, 6] by pre-computing all the possible likelihoods of the bits that have not been decoded yet, and selecting the appropriate likelihood once the corresponding bit is estimated. Using the binary tree representation of SC decoding of polar codes, instead of working at the bit-level that corresponds to the leaf nodes of the SC decoding tree, parallel multi-bit decision is performed at the intermediate nodes of the SC decoding tree. An exhaustive-search decoding algorithm is used in [7, 8, 9, 10] to make multi-bit decisions and to avoid the latency caused by the traversal of the SC decoding tree to compute the intermediate likelihoods. However, due to the high complexity of the exhaustive search, this method is generally only suitable for nodes that represent codes of very short lengths.

It was shown in [11] that a node in the SC decoding tree that represents a code of rate 0 (Rate-0 node) or a code of rate 1 (Rate-1 node) can be decoded efficiently without traversing the SC decoding tree. In [12], fast decoders of repetition (REP) and single parity-check (SPC) nodes were proposed for SC decoding. Using these nodes, a fully-unrolled hardware architecture was proposed in [13]. Techniques were developed in [14, 15, 16, 17, 18] to adjust the codes that are represented by the nodes in the SC decoding tree to increase the number of nodes that can be decoded efficiently. However, these methods result in a degraded error-correction performance. On the basis of the works in [11, 12], five new nodes (Type-I, Type-II, Type-III, Type-IV, and Type-V) were identified and their fast decoders were designed in [19]. In [20], a generalized REP (G-REP) node and a generalized parity-check (G-PC) node were proposed to reduce the latency of SC decoding even further. In [21], seven of the most prevalent node patterns in short polar codes were analysed and efficient algorithms for processing these node patterns in parallel were proposed. However, the decoding of some of these node patterns leads to significant performance loss. Memory footprint optimization and operation merging were described in [22, 23] to improve the speed of SC decoding. The combination of fast SC and fast SC flip decoding algorithms was investigated in [24, 25]. Moreover, the identification and the utilization of the aforementioned special nodes were also extended to SC list (SCL) [26] decoding [27, 28, 29, 30, 31]. All these works require the design of a separate decoder for each class of node, which inevitably increases the implementation complexity. In addition, as shown in this work, the achievable parallelism in decoding can be further increased without degrading the error-correction performance.

For general nodes that do not fall into one of the above node categories, [32] proposed a hard-decision scheme based on the node error probability. Specifically, that work showed that extra latency reduction can be achieved when the communication channel has low noise. However, the hard-decision threshold is calculated empirically rather than for a desired error-correction performance. In [33], a hypothesis-testing-based strategy was designed to select reliable unstructured nodes for hard decision. However, additional operations must be performed to calculate the decision rule, incurring extra decoding latency. In all existing hard-decision schemes, a threshold comparison is required each time a general constituent code is encountered in the course of the SC decoding algorithm.

In this paper, a fast SC decoding algorithm with a higher degree of parallelism than the state of the art is proposed. First, a class of sequence repetition (SR) nodes is proposed which provides a unified description of most of the existing special nodes. This class of nodes is typically found at a higher level and has a higher frequency of occurrence in the decoding tree than other existing special nodes. 5G polar codes with different code lengths and code rates can be represented using only SR nodes. Utilizing this class of nodes, a simple and efficient fast simplified SC decoding algorithm called the SR node-based fast SC (SRFSC) decoding algorithm is proposed, which achieves a high degree of parallelism without degrading the error-correction performance. Employing only SR nodes, the SRFSC decoder requires a lower number of time steps than most existing works. Moreover, SR nodes can be implemented together with other operation mergers, such as node-branch and branch mergers, to outperform the state-of-the-art decoders in terms of the required number of time steps. In addition, if a realistic hardware implementation is considered, the proposed SRFSC decoder provides up to an 8% reduction in the required number of clock cycles and a 10% improvement in the maximum operating frequency with respect to state-of-the-art decoders on a polar code of length 1024 with code rates 1/4, 1/2, and 3/4.

Second, a threshold-based hard-decision-aided (TA) scheme is proposed to speed up the decoding of the nodes that are not SR nodes for a binary additive white Gaussian noise (BAWGN) channel. Consequently, a TA-SRFSC decoding algorithm is proposed that adopts a simpler threshold for hard decision than that in [32]. The effect of the defined threshold on the error-correction performance of the proposed TA-SRFSC decoding algorithm is analyzed. Moreover, a systematic way to derive the threshold value for a desired upper bound for its block error rate (BLER) is determined. Performance results show that, with the help of the proposed TA scheme, the decoding latency of SRFSC decoding can be further reduced by 57% at $\mathrm{E_{b}}/\mathrm{N_{0}}=5$ dB on a polar code of length 1024 and rate 1/2. In addition, a multi-stage decoding strategy is introduced to mitigate the possible error-correction performance loss of TA-SRFSC decoding with respect to SC decoding, while achieving average decoding latency comparable to TA-SRFSC decoding.

The rest of this paper is organized as follows. Section II gives a brief introduction to the basic concept of polar codes and fast SC decoding. In Section III, the SRFSC decoding algorithm is introduced. With the help of the proposed TA scheme, the TA-SRFSC decoding is presented in Section IV. Section V analyzes the decoding latency and simulation results are shown in Section VI. Finally, Section VII gives a summary of the paper and concluding remarks.

II Preliminaries

II-A Notation Conventions

In this paper, blackboard letters, such as $\mathbb{X}$, denote a set and $\left|\mathbb{X}\right|$ denotes the number of elements in $\mathbb{X}$. Bold letters, such as $\boldsymbol{v}$, denote a row vector, $\boldsymbol{v}^{T}$ denotes the transpose of $\boldsymbol{v}$, and the notation $v\left[i:j\right]$, $1\leq i<j$, represents the subvector $\left(v\left[i\right],v\left[i+1\right],\dots,v\left[j\right]\right)$. $\oplus$ denotes the bitwise XOR operation and $v\left[i:j\right]\oplus z=\left(v\left[i\right]\oplus z,v\left[i+1\right]\oplus z,\dots,v\left[j\right]\oplus z\right)$, $z\in\left\{0,1\right\}$. The Kronecker product of two matrices $\mathbf{F}$ and $\mathbf{G}$ is written as $\mathbf{F}\otimes\mathbf{G}$. $\mathbf{F}_{N}$ represents an $N\times N$ square matrix and $\mathbf{F}_{N}^{\otimes n}$ denotes the $n$-th Kronecker power of $\mathbf{F}_{N}$. Throughout this paper, $\ln(x)$ denotes the natural logarithm and $\log(x)$ the base-2 logarithm of $x$.

II-B Polar Codes

A polar code with code length $N=2^{n}$ and information length $K$ is denoted by $\mathcal{P}\left(N,K\right)$ and has rate $R=K/N$. The encoding can be expressed as $\boldsymbol{x}=\boldsymbol{u}\mathbf{G}_{N}$, where $\boldsymbol{u}=\left(u\left[1\right],u\left[2\right],\ldots,u\left[N\right]\right)$ is the input bit sequence and $\boldsymbol{x}=\left(x\left[1\right],x\left[2\right],\ldots,x\left[N\right]\right)$ is the encoded bit sequence. $\mathbf{G}_{N}=\mathbf{R}_{N}\mathbf{F}_{2}^{\otimes n}$ is the generator matrix, where $\mathbf{R}_{N}$ is a bit-reversal permutation matrix and $\mathbf{F}_{2}=\left[\begin{smallmatrix}1&0\\ 1&1\end{smallmatrix}\right]$.

The input bit sequence $\boldsymbol{u}$ consists of $K$ information bits and $N-K$ frozen bits. The positions of the information bits form the set $\mathbb{A}$, while the positions of the frozen bits form the set $\mathbb{A}^{c}$, which carries fixed bits known to the receiver. For symmetric channels, without loss of generality, all frozen bits are set to zero [1]. To distinguish between frozen and information bits, a vector of flags $\boldsymbol{d}=\left(d\left[1\right],d\left[2\right],\ldots,d\left[N\right]\right)$ is used, where each flag $d\left[k\right]$ is assigned as

d\left[k\right]=\begin{cases}0,&\text{if }k\in\mathbb{A}^{c},\\ 1,&\text{otherwise.}\end{cases} (1)

The codeword $\boldsymbol{x}$ is transmitted through a channel after modulation. In this paper, non-systematic polar codes and binary phase-shift keying (BPSK) modulation, which maps $\left\{0,1\right\}$ to $\left\{+1,-1\right\}$, are considered. Transmission takes place over an additive white Gaussian noise (AWGN) channel.
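To make the encoding and modulation chain concrete, the following Python sketch encodes with $\mathbf{G}_{N}=\mathbf{R}_{N}\mathbf{F}_{2}^{\otimes n}$ and produces channel LLRs. The frozen set and noise variance below are illustrative values chosen for this sketch, not taken from the paper:

```python
def polar_encode(u, n):
    # x = u * G_N over GF(2), with G_N = R_N * F_2^{(x)n}
    N = 1 << n
    # apply the bit-reversal permutation R_N to the input
    x = [u[int(format(i, '0%db' % n)[::-1], 2)] for i in range(N)]
    # butterfly stages implementing F_2^{(x)n}: (a, b) -> (a xor b, b)
    step = 1
    while step < N:
        for i in range(0, N, 2 * step):
            for k in range(step):
                x[i + k] ^= x[i + k + step]
        step *= 2
    return x

n = 3
N = 1 << n
frozen = {1, 2, 3, 5}                   # example frozen positions (1-indexed)
u = [0 if k + 1 in frozen else 1 for k in range(N)]
x = polar_encode(u, n)

sigma2 = 0.5                            # example noise variance
s = [1 - 2 * b for b in x]              # BPSK: 0 -> +1, 1 -> -1
y = s                                   # noiseless transmission, for illustration
llr = [2 * yi / sigma2 for yi in y]     # channel LLRs fed to the SC decoder
```

The butterfly stages exploit the fact that the Kronecker factors of $\mathbf{F}_{2}^{\otimes n}$ act on disjoint tensor slots and therefore commute.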

II-C SC Decoding and Binary Tree Representation

Figure 1: SC decoding on the factor graph of a polar code with $N=8$.
Figure 2: Binary tree representation of an SC decoder for a polar code with $N=8$.

SC decoding can be illustrated on the factor graph of polar codes, as shown in Fig. 1. The factor graph consists of $n+1$ levels and, by grouping all the operations that can be performed in parallel, SC decoding can be represented as the traversal of a binary tree. This traversal is shown in Fig. 2, starting from the left side of the binary tree. At level $j$ of the SC decoding tree with $n+1$ levels, there are $2^{n-j}$ nodes ($0\leq j\leq n$), and the $i$-th node at level $j$ ($1\leq i\leq 2^{n-j}$) is denoted as $\mathcal{N}_{j}^{i}$. The left and right child nodes of $\mathcal{N}_{j}^{i}$ are $\mathcal{N}_{j-1}^{2i-1}$ and $\mathcal{N}_{j-1}^{2i}$, respectively, as illustrated in Fig. 2. For $\mathcal{N}_{j}^{i}$, $\alpha_{j}^{i}\left[k\right]$, $1\leq k\leq 2^{j}$, indicates the $k$-th input logarithmic likelihood ratio (LLR) value, and $\beta_{j}^{i}\left[k\right]$, $1\leq k\leq 2^{j}$, denotes the $k$-th output binary hard-valued message. For the AWGN channel, the received vector $\boldsymbol{y}=\left(y\left[1\right],y\left[2\right],\ldots,y\left[N\right]\right)$ can be used to calculate the channel LLRs as $2\boldsymbol{y}/\sigma^{2}$, where $\sigma^{2}$ is the variance of the Gaussian noise. SC decoding starts by setting $\alpha_{n}^{1}\left[1:N\right]=2\boldsymbol{y}/\sigma^{2}$. A node is activated once all its inputs are available. When LLR messages pass through a node in the factor graph indicated by the $\oplus$ sign, the $f$ function is executed over the LLR domain as

\alpha_{j-1}^{2i-1}\left[k\right]=f\left(\alpha_{j}^{i}\left[2k-1\right],\alpha_{j}^{i}\left[2k\right]\right), (2)

and when LLR messages pass through a node in the factor graph indicated by the = sign, the $g$ function is executed over the LLR domain as

\alpha_{j-1}^{2i}\left[k\right]=g\left(\alpha_{j}^{i}\left[2k-1\right],\alpha_{j}^{i}\left[2k\right],\beta_{j-1}^{2i-1}\left[k\right]\right), (3)

where

f\left(x,y\right)=2\operatorname{arctanh}\left(\tanh\left(\frac{x}{2}\right)\tanh\left(\frac{y}{2}\right)\right), (4)
g\left(x,y,u\right)=\left(-1\right)^{u}x+y. (5)

The $f$ function can be approximated as [34]

f\left(x,y\right)=\mathrm{sign}\left(x\right)\mathrm{sign}\left(y\right)\min\left(\left|x\right|,\left|y\right|\right). (6)
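A direct Python transcription of (4)-(6) (a sketch for exposition, not the paper's hardware implementation):

```python
import math

def f_exact(x, y):
    # (4): check-node update in the LLR domain
    return 2 * math.atanh(math.tanh(x / 2) * math.tanh(y / 2))

def f_minsum(x, y):
    # (6): hardware-friendly min-sum approximation of (4)
    s = 1.0 if x * y >= 0 else -1.0      # sign(x) * sign(y)
    return s * min(abs(x), abs(y))

def g(x, y, u):
    # (5): variable-node update given the partial sum u
    return (-1) ** u * x + y
```

Note that $|f(x,y)|\leq\min(|x|,|y|)$ always holds, so the min-sum rule only over-estimates reliability, never the sign.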

When the LLR value of the $k$-th bit at level zero, $\alpha_{0}^{k}$, $1\leq k\leq N$, is calculated, the estimate of $u\left[k\right]$, denoted as $\hat{u}\left[k\right]$, can be obtained as

\hat{u}\left[k\right]=\hat{\beta}_{0}^{k}=\begin{cases}0,&\text{if }k\in\mathbb{A}^{c},\\ \frac{1-\mathrm{sign}(\alpha_{0}^{k})}{2},&\text{otherwise.}\end{cases} (7)

The hard-valued messages are propagated back to the parent node as

\hat{\beta}_{j}^{i}\left[k\right]=\begin{cases}\hat{\beta}_{j-1}^{2i-1}\left[\frac{k+1}{2}\right]\oplus\hat{\beta}_{j-1}^{2i}\left[\frac{k+1}{2}\right],&\text{if }\bmod(k,2)=1,\\ \hat{\beta}_{j-1}^{2i}\left[\frac{k}{2}\right],&\text{if }\bmod(k,2)=0.\end{cases} (8)

After traversing all the nodes in the SC decoding tree, $\hat{\boldsymbol{u}}$ contains the decoding result. Thus, the latency of SC decoding for a polar code of length $N$, in terms of the number of time steps, is given by the number of nodes in the SC decoding tree as [1]

\mathcal{T}_{\text{SC}}=2N-2. (9)
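The full traversal of (2), (3), (7), and (8) can be sketched recursively; the min-sum rule (6) is used for the $f$ function, and the per-leaf `frozen` flags are an input of this sketch:

```python
def sc_decode(llr, frozen):
    # Recursive SC decoding over the binary tree of Fig. 2; returns the
    # estimated input bits u_hat and the hard output beta of this subtree.
    N = len(llr)
    if N == 1:
        u = 0 if (frozen[0] or llr[0] >= 0) else 1                    # (7)
        return [u], [u]
    half = N // 2
    f = lambda a, b: (1 if a * b >= 0 else -1) * min(abs(a), abs(b))  # (6)
    alpha_l = [f(llr[2 * k], llr[2 * k + 1]) for k in range(half)]    # (2)
    u_l, beta_l = sc_decode(alpha_l, frozen[:half])
    alpha_r = [(-1) ** beta_l[k] * llr[2 * k] + llr[2 * k + 1]        # (3)
               for k in range(half)]
    u_r, beta_r = sc_decode(alpha_r, frozen[half:])
    beta = []
    for k in range(half):                                             # (8)
        beta += [beta_l[k] ^ beta_r[k], beta_r[k]]
    return u_l + u_r, beta
```

Each call visits the two children of its node, so the recursion touches all $2N-2$ non-root nodes, matching (9).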

II-D Fast SC Decoding

SC decoding has strong data dependencies that limit the amount of parallelism that can be exploited within the algorithm, because the estimation of each bit depends on the estimates of all previous bits. This leads to a large decoding latency. It was shown in [9] that for a node $\mathcal{N}_{j}^{i}$, $\beta_{j}^{i}\left[1:2^{j}\right]$ can be estimated without traversing the decoding tree as

\hat{\beta}_{j}^{i}\left[1:2^{j}\right]=\underset{\beta_{j}^{i}\left[1:2^{j}\right]\in\mathbb{C}_{j}^{i}}{\operatorname*{arg\,max}}\sum_{k=1}^{2^{j}}\left(-1\right)^{\beta_{j}^{i}\left[k\right]}\alpha_{j}^{i}\left[k\right], (10)

where $\mathbb{C}_{j}^{i}$ is the set of all codewords associated with node $\mathcal{N}_{j}^{i}$. Multi-bit decoding can be performed directly at an intermediate level, instead of bit-by-bit sequential decoding at level 0, in order to traverse fewer nodes in the SC decoding tree and, consequently, to reduce the latency caused by data computation and exchange. However, the evaluation of (10) generally requires an exhaustive search over all the codewords in the set $\mathbb{C}_{j}^{i}$, which is computationally intensive in practice. In [12], a fast SC (FSC) decoding algorithm was proposed that performs fast parallel decoding when specific special node types are encountered. The sequences of information and frozen bits of these node types have special bit-patterns, so they can be decoded more efficiently without the need for an exhaustive search. Using the vector $\boldsymbol{d}$, the five special node types proposed in [12] are described as:

  • Rate-0 node: all bits are frozen bits, $\boldsymbol{d}=\left(0,0,\ldots,0\right)$.

  • Rate-1 node: all bits are non-frozen bits, $\boldsymbol{d}=\left(1,1,\ldots,1\right)$.

  • REP node: all bits are frozen bits except the last one, $\boldsymbol{d}=\left(0,\ldots,0,1\right)$.

  • SPC node: all bits are non-frozen bits except the first one, $\boldsymbol{d}=\left(0,1,\ldots,1\right)$.

  • ML node: a code of length 4 with $\boldsymbol{d}=\left(0,1,0,1\right)$.

By merging the aforementioned special nodes, five additional nodes were proposed, namely, the REP-SPC node, which is a REP node followed by an SPC node; the P-01/P-0SPC node, which is generated by merging a Rate-0 node with a Rate-1/SPC node; and the P-R1/P-RSPC node, which is generated by merging a $g$ function branch operation in (5) with the following Rate-1/SPC node. In [19], five additional special node types and their corresponding fast decoders were introduced. This enhanced FSC decoding algorithm can achieve a lower decoding latency than FSC decoding. The five special node types are:

  • Type-I node: all bits are frozen bits except the last two, $\boldsymbol{d}=\left(0,\ldots,0,1,1\right)$.

  • Type-II node: all bits are frozen bits except the last three, $\boldsymbol{d}=\left(0,\ldots,0,1,1,1\right)$.

  • Type-III node: all bits are non-frozen bits except the first two, $\boldsymbol{d}=\left(0,0,1,\ldots,1\right)$.

  • Type-IV node: all bits are non-frozen bits except the first three, $\boldsymbol{d}=\left(0,0,0,1,\ldots,1\right)$.

  • Type-V node: all bits are frozen bits except the last three and the fifth-to-last, $\boldsymbol{d}=\left(0,\ldots,0,1,0,1,1,1\right)$.

A generalized FSC (GFSC) decoding algorithm was proposed in [20] by introducing the G-PC node and the G-REP node (previously identified as Multiple-G0 nodes in [23]). The G-PC node is a node at level $j$ having all its descendants as Rate-1 nodes except the leftmost one at a certain level $r<j$, which is a Rate-0 node. The G-REP node is a node at level $j$ for which all its descendants are Rate-0 nodes, except the rightmost one at a certain level $r<j$, which is a generic node of rate $C$ (Rate-C).

In [18], in addition to the nodes in [12], three special node types are added: the REP1 node, with a REP node on the left and a Rate-1 node on the right; the 0REPSPC node, which is a concatenation of Rate-0, REP, and SPC nodes; and the 001 node, whose left 3/4 of bits are frozen bits and right 1/4 of bits are non-frozen bits.

To further reduce the latency, the operation merging scenarios are generalized in [22] and [23] where, in addition to merging special nodes, branch operation ((4), (5), and (8)) merging is also considered. Consequently, REP-REPSPC, REP-Rate1, and Rate0-ML nodes are adopted as mergers of special nodes, and the F-REP node is adopted as a node-branch merger, generated by performing an $f$ function operation followed by a REP node. The following branch operation mergers are further introduced:

  • $\text{F}^{\times 2}$: two consecutive $f$ function operations in (4).

  • $\text{G0}^{\times 2}$: two consecutive G0 operations, where G0 is the $g$ function in (5) with $u=0$.

  • $\text{C}^{\times 2}$, $\text{C}^{\times 3}$: up to three consecutive operations in (8).

  • $\text{C0}^{\times 2}$, $\text{C0}^{\times 3}$: up to three consecutive operations in (8), assuming the estimates from the left branch are all zeros.

  • G-F: a $g$ function operation followed by an $f$ function operation.

  • F-G0: an $f$ function operation followed by a G0 operation.

The key advantage of using specific parallel decoders for the aforementioned special nodes is that, since the SC decoding tree is not traversed when one of these nodes is encountered, significant latency savings can be achieved. For example, if $\mathcal{N}_{j}^{i}$ is a Rate-1 node, hard-decision decoding can be used to immediately obtain the decoding result as

\hat{\beta}_{j}^{i}\left[k\right]=h\left(\alpha_{j}^{i}\left[k\right]\right)=\begin{cases}0,&\text{if }\alpha_{j}^{i}\left[k\right]\geq 0,\\ 1,&\text{otherwise.}\end{cases} (11)

If $\mathcal{N}_{j}^{i}$ is an SPC node, a hard decision based on (11) is first made, followed by the calculation of the parity of the output using modulo-2 addition. The index of the least reliable bit is found as

k^{\prime}=\underset{k}{\operatorname*{arg\,min}}\left|\alpha_{j}^{i}\left[k\right]\right|. (12)

Finally, the bits in an SPC node are estimated as

\hat{\beta}_{j}^{i}\left[k\right]=\begin{cases}h\left(\alpha_{j}^{i}\left[k\right]\right)\oplus\mathrm{parity},&\text{if }k=k^{\prime},\\ h\left(\alpha_{j}^{i}\left[k\right]\right),&\text{otherwise.}\end{cases} (13)

This operation can be performed in a single time step [19]. Finally, if $\mathcal{N}_{j}^{i}$ is a G-PC node, the decoding can be viewed as the parallel decoding of several separate SPC nodes. The decoding of a G-PC node can generally be performed in one time step using parallel SPC decoders [20].
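A sketch of the single-time-step SPC decoding in (11)-(13):

```python
def decode_spc(alpha):
    # (11): bitwise hard decisions
    beta = [0 if a >= 0 else 1 for a in alpha]
    parity = 0
    for b in beta:
        parity ^= b                       # modulo-2 sum of the block
    # (12): index of the least reliable bit
    k_min = min(range(len(alpha)), key=lambda k: abs(alpha[k]))
    beta[k_min] ^= parity                 # (13): flip only if the parity fails
    return beta
```

When the hard decisions already satisfy even parity, `parity` is 0 and no bit is flipped, so (13) reduces to (11).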

All of the aforementioned fast SC decoding algorithms perform decoding at an intermediate level of the decoding tree in order to reduce the number of traversed nodes. A decoding algorithm that can decode a node at a higher level of the decoding tree generally results in more savings in terms of latency than one that decodes a node at a lower level of the decoding tree.

II-E Threshold-based Hard-decision-aided Scheme

For a binary AWGN channel with noise standard deviation $\sigma_{n}$, it was shown in [35] that, assuming all previous bits are decoded correctly, the LLR value $\alpha_{j}^{i}\left[k\right]$, $1\leq k\leq 2^{j}$, input to node $\mathcal{N}_{j}^{i}$ can be approximated as a Gaussian variable using a Gaussian approximation as

\alpha_{j}^{i}\left[k\right]\sim\mathcal{N}\left(M_{j}^{i}\left[k\right],2\left|M_{j}^{i}\left[k\right]\right|\right), (14)

where $M_{j}^{i}\left[k\right]$ is the expectation of $\alpha_{j}^{i}\left[k\right]$ [36] such that

M_{j}^{i}\left[k\right]=\begin{cases}m_{j}^{i},&\text{if }\beta_{j}^{i}\left[k\right]=0,\\ -m_{j}^{i},&\text{if }\beta_{j}^{i}\left[k\right]=1,\end{cases} (15)

and $m_{j}^{i}$ can be calculated recursively offline, assuming the all-zero codeword is transmitted, as

m_{n}^{1}=2/\sigma_{n}^{2},\quad m_{j-1}^{2i-1}=\varphi^{-1}\left(1-\left[1-\varphi\left(m_{j}^{i}\right)\right]^{2}\right),\quad m_{j-1}^{2i}=2m_{j}^{i}, (16)

where

\varphi(x)=\begin{cases}1-\frac{1}{\sqrt{4\pi x}}\int_{-\infty}^{\infty}\tanh\left(\frac{u}{2}\right)e^{-\frac{(u-x)^{2}}{4x}}du,&x>0,\\ 1,&x=0.\end{cases} (17)
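The offline recursion (16) can be evaluated numerically. The sketch below uses simple trapezoidal quadrature for the standard Gaussian-approximation $\varphi$ (with the $\sqrt{4\pi x}$ normalization) and bisection for $\varphi^{-1}$; the integration range and iteration counts are ad hoc choices of this sketch:

```python
import math

def phi(x):
    # (17), evaluated by trapezoidal quadrature over ~10 standard deviations
    if x == 0:
        return 1.0
    lo, hi, n = x - 10 * math.sqrt(2 * x), x + 10 * math.sqrt(2 * x), 2000
    h = (hi - lo) / n
    s = 0.0
    for i in range(n + 1):
        u = lo + i * h
        w = 0.5 if i in (0, n) else 1.0
        s += w * math.tanh(u / 2) * math.exp(-(u - x) ** 2 / (4 * x))
    return 1.0 - s * h / math.sqrt(4 * math.pi * x)

def phi_inv(y):
    # bisection on the monotonically decreasing phi
    lo, hi = 1e-6, 1e4
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if phi(mid) > y else (lo, mid)
    return 0.5 * (lo + hi)

def mean_llr(sigma2, path):
    # (16): mean LLR of the node reached from the root by `path`,
    # where 0 takes the f-branch (left child) and 1 the g-branch (right child)
    m = 2.0 / sigma2
    for b in path:
        m = 2 * m if b else phi_inv(1 - (1 - phi(m)) ** 2)
    return m
```

The $g$-branch simply doubles the mean, so only the $f$-branch requires the numerical $\varphi$ and $\varphi^{-1}$.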

It was shown in [32] that, when the magnitude of the LLR values at a certain node in the SC decoding tree is large enough, the node has enough reliability to perform hard decision directly at the node without tangibly altering the error-correction performance. To determine the reliability of the node, a threshold is defined in [32] as

T=c_{t}\log\frac{1-\frac{1}{2}\mathrm{erfc}\left(0.5\sqrt{m_{j}^{i}}\right)}{\frac{1}{2}\mathrm{erfc}\left(0.5\sqrt{m_{j}^{i}}\right)}, (18)

where $c_{t}\geq 1$ is a constant that is selected empirically. The hard-decision estimate of the received LLR values is calculated using

\mathrm{HB}_{j}^{i}\left[k\right]=\begin{cases}0,&\text{if }\alpha_{j}^{i}\left[k\right]>T,\\ 1,&\text{if }\alpha_{j}^{i}\left[k\right]<-T.\end{cases} (19)

The issue with the method in [32] is that the threshold defined in (18) involves the complementary error function $\mathrm{erfc}\left(\cdot\right)$, making its calculation inefficient. Moreover, the hard-decision threshold is calculated empirically rather than for a desired error-correction performance, and the threshold comparison in (19) is performed every time a node with no specific structure is encountered in the SC decoding process.
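The threshold test of (18)-(19) can be transcribed as follows; recall from the notation section that $\log$ is base-2. The value of $c_t$ below is an arbitrary example, and returning `None` for $|\alpha|\leq T$ is this sketch's way of marking a bit as not reliable enough for a hard decision:

```python
import math

def threshold(m, c_t=1.5):
    # (18): reliability threshold of [32]; c_t >= 1 is chosen empirically
    p = 0.5 * math.erfc(0.5 * math.sqrt(m))
    return c_t * math.log2((1 - p) / p)

def hard_bit(llr, T):
    # (19): hard-decide only sufficiently reliable LLRs;
    # None marks a bit left to regular SC processing in this sketch
    if llr > T:
        return 0
    if llr < -T:
        return 1
    return None
```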

III Fast SC Decoding with Sequence Repetition Nodes

This section introduces a class of nodes that is at a higher level of the SC decoding tree and shows how parallel decoding at this class of nodes can be exploited to achieve significant latency savings in comparison with the state of the art.

III-A Sequence Repetition (SR) Node

Let $\mathcal{N}_{j}^{i}$ be a node at level $j$ of the tree representation of SC decoding as shown in Fig. 2. An SR node is any node at level $j$ whose descendants are all either Rate-0 or REP nodes, except the rightmost one at a certain level $r$, $0\leq r\leq j$, which is a generic node of rate $C$. The structure of an SR node is depicted in Fig. 3. The rightmost node $\mathcal{N}_{r}^{i\times 2^{j-r}}$ at level $r$ is called the source node of the SR node $\mathcal{N}_{j}^{i}$. Let $E=i\times 2^{j-r}$, so the source node can be denoted as $\mathcal{N}_{r}^{E}$.

An SR node can be represented by three parameters as $\text{SR}(\boldsymbol{v},\text{SNT},r)$, where $r$ is the level of the SC decoding tree at which $\mathcal{N}_{r}^{E}$ is located, SNT is the source node type, and $\boldsymbol{v}=\left(v\left[j\right],v\left[j-1\right],\ldots,v\left[r+1\right]\right)$ is a vector of length $j-r$ such that, for the left child node of the parent node of $\mathcal{N}_{r}^{E}$ at level $k$, $r<k\leq j$, $v\left[k\right]$ is calculated as

v\left[k\right]=\begin{cases}0,&\text{if the left child node is a Rate-0 node,}\\ 1,&\text{if the left child node is a REP node.}\end{cases} (20)

Note that when $r=j$, $\mathcal{N}_{j}^{i}$ is itself a source node, and thus $\boldsymbol{v}$ is an empty vector, denoted as $\boldsymbol{v}=\emptyset$.
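As an illustration of the $\text{SR}(\boldsymbol{v},\text{SNT},r)$ parameterization, the sketch below extracts $\boldsymbol{v}$ and $r$ from a node's frozen-bit pattern $\boldsymbol{d}$ by walking down the rightmost path. The stopping rule (stop at a Rate-1 or SPC-like pattern) is a simplification for this sketch; a real decoder stops at the largest source node type it supports:

```python
def sr_params(d):
    # Sketch: extract (v, r) of the SR node covering the frozen-bit pattern d
    # (length 2^j, 0 = frozen, 1 = information). Descend the rightmost path
    # while the left child is Rate-0 or REP, per (20).
    v = []
    while len(d) > 1:
        if all(d):                          # Rate-1 source node
            break
        if d[0] == 0 and all(d[1:]):        # SPC-like (EG-PC) source node
            break
        half = len(d) // 2
        left = d[:half]
        if any(left[:-1]):                  # left child neither Rate-0 nor REP
            break
        v.append(left[-1])                  # (20): 0 for Rate-0, 1 for REP
        d = d[half:]                        # move to the right child
    r = len(d).bit_length() - 1             # level of the source node
    return v, r
```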

Figure 3: General structure of SR Node.
Figure 4: General structure of Extended G-PC Node.

III-B Source Node

To define the source node type, an extended class of G-PC (EG-PC) nodes is first introduced. The structure of the EG-PC node is depicted in Fig. 4. The EG-PC node differs from the G-PC node in that its leftmost descendant node can be either a Rate-0 or a REP node. The bits in an EG-PC node satisfy the following parity-check constraint,

z=\bigoplus_{m=\left(k-1\right)2^{j-r}+1}^{k\times 2^{j-r}}\beta_{j}^{i}\left[m\right], (21)

where $k\in\left\{1,\ldots,2^{r}\right\}$ and $z\in\left\{0,1\right\}$ is the parity. Unlike G-PC nodes, whose parity is always even ($z=0$), an EG-PC node can have either even parity ($z=0$) or odd parity ($z=1$). The parity of the EG-PC node can be calculated as

z=\begin{cases}0,&\text{if the leftmost node is Rate-0,}\\ h\left(\sum_{k=1}^{2^{r}}\alpha_{k}\right),&\text{otherwise,}\end{cases} (22)

where $\alpha_{k}=2\tanh^{-1}\left(\prod_{m=\left(k-1\right)2^{j-r}+1}^{k2^{j-r}}\tanh\left(\frac{\alpha_{j}^{i}\left[m\right]}{2}\right)\right)$. After computing $z$, Wagner decoders [37] can be used to decode the $2^{r}$ SPC codes with either even or odd parity constraints. A Wagner decoder performs hard decisions on the bits and, if the parity does not hold, flips the bit with the lowest absolute LLR value.
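A sketch of EG-PC decoding per (21)-(22): the shared parity $z$ is estimated first, then each of the $2^{r}$ blocks of length $2^{j-r}$ is decoded by a Wagner decoder. The LLR clamp before `atanh` is a numerical safeguard of this sketch:

```python
import math

def wagner(alpha, z):
    # Wagner decoder: hard decisions, then flip the least reliable bit
    # if the block parity does not equal z
    beta = [0 if a >= 0 else 1 for a in alpha]
    p = 0
    for b in beta:
        p ^= b
    if p != z:
        k = min(range(len(alpha)), key=lambda i: abs(alpha[i]))
        beta[k] ^= 1
    return beta

def decode_egpc(alpha, blocks, leftmost_rate0):
    # alpha: 2^j input LLRs split into `blocks` = 2^r SPC blocks of equal length
    n = len(alpha) // blocks
    if leftmost_rate0:
        z = 0                               # (22): G-PC case, even parity
    else:
        s = 0.0                             # (22): sum of per-block parity LLRs
        for k in range(blocks):
            t = 1.0
            for a in alpha[k * n:(k + 1) * n]:
                t *= math.tanh(a / 2)
            s += 2 * math.atanh(max(min(t, 0.999999), -0.999999))
        z = 0 if s >= 0 else 1
    beta = []
    for k in range(blocks):                 # parallel Wagner decoders, per (21)
        beta += wagner(alpha[k * n:(k + 1) * n], z)
    return beta
```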

SPC, Type-III, Type-IV, and G-PC nodes can be represented as special cases of EG-PC nodes. As a result, most of the common special nodes can be represented as SR nodes, as shown in Table I. Note that node-branch mergers like P-RSPC and F-REP, and branch operation mergers such as $\text{F}^{\times 2}$, G-F, $\text{G0}^{\times 2}$, and F-G0, do not fit into the category of SR nodes. It is worth mentioning that other parameter choices for SR nodes may lead to the discovery of more types of special nodes that can be decoded efficiently, but this exploration is beyond the scope of this work.

TABLE I: SR node representation of common node types.
Node Type | SR Node Representation | Length of $\boldsymbol{v}$
Rate-0 | $\mathrm{SR}(\emptyset,\text{Rate-0},j)$ | $0$
REP | $\mathrm{SR}((0,\ldots,0),\text{Rate-1},0)$ | $j$
SPC | $\mathrm{SR}(\emptyset,\text{EG-PC},j)$ | $0$
Rate-1 | $\mathrm{SR}(\emptyset,\text{Rate-1},j)$ | $0$
P-01 | $\mathrm{SR}((0),\text{Rate-1},j-1)$ | $1$
P-0SPC | $\mathrm{SR}((0),\text{EG-PC},j-1)$ | $1$
Type-I | $\mathrm{SR}((0,\ldots,0),\text{Rate-1},1)$ | $j-1$
Type-II | $\mathrm{SR}((0,\ldots,0),\text{EG-PC},2)$ | $j-2$
Type-III | $\mathrm{SR}(\emptyset,\text{EG-PC},j)$ | $0$
Type-IV | $\mathrm{SR}(\emptyset,\text{EG-PC},j)$ | $0$
Type-V | $\mathrm{SR}((0,\ldots,0,1),\text{EG-PC},2)$ | $j-2$
G-PC | $\mathrm{SR}(\emptyset,\text{EG-PC},j)$ | $0$
G-REP | $\mathrm{SR}((0,\ldots,0),\text{Rate-C},r)$ | $j-r$
REP-SPC | $\mathrm{SR}((1),\text{EG-PC},2)$ | $1$
0REPSPC | $\mathrm{SR}((0,1),\text{EG-PC},j-2)$ | $2$
001 | $\mathrm{SR}((0,0),\text{Rate-1},j-2)$ | $2$
REP-REPSPC | $\mathrm{SR}((1,1),\text{EG-PC},j-2)$ | $2$
Rate0-ML | $\mathrm{SR}((0,1,0),\text{Rate-1},0)$ | $3$
REP-Rate1 (REP1) | $\mathrm{SR}((1),\text{Rate-1},j-1)$ | $1$

III-C Repetition Sequence

In this subsection, a set of sequences, called repetition sequences, is defined that can be used to calculate the output bit estimates of an SR node based on the output bit estimates of its source node. To derive the repetition sequences, 𝒗\boldsymbol{v} is used to generate all the possible sequences that have to be XORed with the output of the source node to generate the output bit estimates of the SR node. Let ηk\eta_{k} denote the rightmost bit value of the left child node of the parent node of 𝒩rE\mathcal{N}^{E}_{r} at level k+1k+1. When v[k+1]=0v[k+1]=0, the left child node is a Rate-0 node so ηk=0\eta_{k}=0. When v[k+1]=1v[k+1]=1, the left child node is a REP node, thus ηk\eta_{k} can take the value of either 0 or 11. The number of repetition sequences is dependent on the number of different values that ηk\eta_{k} can take. Let W𝒗W_{\boldsymbol{v}} denote the number of ‘11’s in 𝒗\boldsymbol{v}. The number of all possible repetition sequences is thus 2W𝒗2^{W_{\boldsymbol{v}}}. Let 𝕊={𝒔1,,𝒔2W𝒗}\mathbb{S}=\{\boldsymbol{s}_{1},\ldots,\boldsymbol{s}_{2^{W_{\boldsymbol{v}}}}\} denote the set of all possible repetition sequences.

The output bits β_j^i[1:2^j] of an SR node 𝒩_j^i have the property that a repetition sequence is repeated in blocks of length 2^{j-r}. Let β_r^E[1:2^r] denote the output bits of the source node of the SR node 𝒩_j^i. The output bits of each block of length 2^{j-r} in 𝒩_j^i can be written with respect to the output bits of its source node as

βji[(k1)2jr+1:k2jr]=βrE[k]𝒔l,\beta_{j}^{i}\left[\left(k-1\right)2^{j-r}+1:k2^{j-r}\right]=\beta_{r}^{E}\left[k\right]\oplus{\boldsymbol{s}_{l}}, (23)

where k ∈ {1, …, 2^r} and 𝒔_l = {s_l[1], …, s_l[2^{j-r}]} is the l-th repetition sequence in 𝕊. To obtain the repetition sequence 𝒔_l, the Kronecker sum operator ⊞ is used (with a slight abuse of terminology and notation for convenience); it is equivalent to the Kronecker product operator, except that addition in GF(2) is used instead of multiplication. For each set of values that the η_k's can take, 𝒔_l can be calculated as

𝒔l=(ηr,0)(ηr+1,0)(ηj1,0).\boldsymbol{s}_{l}=\left(\eta_{r},0\right)\boxplus\left(\eta_{r+1},0\right)\boxplus\cdots\boxplus\left(\eta_{j-1},0\right). (24)
Example 1 (Repetition sequences for SR((1,1),EG-PC,2)\mathrm{SR}((1,1),\text{EG-PC},2)).

Consider an example in which the SR node 𝒩_4^1 is located at level 4 of the decoding tree and its source node 𝒩_2^4 is an EG-PC node located at level 2. Since 𝒗 = (1,1), W_𝒗 = 2 and |𝕊| = 4. For η_2 ∈ {0,1} and η_3 ∈ {0,1},

𝒔1\displaystyle\boldsymbol{s}_{1} =(0,0)(0,0)=(0,0,0,0),\displaystyle=(0,0)\boxplus(0,0)=(0,0,0,0),
𝒔2\displaystyle\boldsymbol{s}_{2} =(1,0)(0,0)=(1,1,0,0),\displaystyle=(1,0)\boxplus(0,0)=(1,1,0,0),
𝒔3\displaystyle\boldsymbol{s}_{3} =(0,0)(1,0)=(1,0,1,0),\displaystyle=(0,0)\boxplus(1,0)=(1,0,1,0),
𝒔4\displaystyle\boldsymbol{s}_{4} =(1,0)(1,0)=(0,1,1,0).\displaystyle=(1,0)\boxplus(1,0)=(0,1,1,0).

For a polar code with a given 𝒅\boldsymbol{d}, the locations of SR nodes in the decoding tree are fixed and can be determined off-line. Therefore, the repetition sequences in 𝕊{\mathbb{S}} of all of the SR nodes can be pre-computed and used in the course of decoding.
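Since the repetition sequences can be pre-computed off-line, their generation is easy to prototype. The sketch below (function names are ours) enumerates 𝕊 from 𝒗 by taking Kronecker sums of the factors (η_k, 0) as in (24): an entry of 𝒗 equal to 1 contributes a free η ∈ {0, 1}, while an entry equal to 0 forces η = 0.

```python
from itertools import product

def kron_sum(a, b):
    # Kronecker "sum": the Kronecker product with GF(2) addition (XOR)
    # in place of multiplication, as used in (24).
    return tuple(x ^ y for x in a for y in b)

def repetition_sequences(v):
    # Enumerate the set S of repetition sequences of an SR node with
    # source-node sequence v, following (24).
    if not v:  # v = empty: only the trivial length-1 sequence
        return [(0,)]
    free = [i for i, bit in enumerate(v) if bit == 1]
    seqs = []
    for etas in product((0, 1), repeat=len(free)):
        eta = [0] * len(v)
        for i, e in zip(free, etas):
            eta[i] = e
        s = (eta[0], 0)
        for k in range(1, len(v)):
            s = kron_sum(s, (eta[k], 0))
        seqs.append(s)
    return seqs
```

For 𝒗 = (1,1) this reproduces the four sequences of Example 1, and for an all-zero 𝒗 (a G-REP node) it returns the single all-zero sequence, consistent with |𝕊| = 2^{W_𝒗}.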

III-D Decoding of SR Nodes

To decode SR nodes, the LLR values αrlE[1:2r]\alpha_{r_{l}}^{E}[1:2^{r}] of the source node 𝒩rE\mathcal{N}_{r}^{E} are calculated based on the LLR values αji[1:2j]\alpha_{j}^{i}[1:2^{j}] of the SR node 𝒩ji\mathcal{N}_{j}^{i} for every repetition sequence 𝒔l\boldsymbol{s}_{l} as follows:

Proposition 1.

Let αji[1:2j]\alpha_{j}^{i}[1:2^{j}] be the LLR values of the SR node 𝒩ji\mathcal{N}_{j}^{i} and αrlE[1:2r]\alpha_{r_{l}}^{E}[1:2^{r}] be the LLR values of its source node 𝒩rE\mathcal{N}_{r}^{E} associated with the ll-th repetition sequence 𝐬l\boldsymbol{s}_{l}. For k{1,,2r}k\in\left\{1,\dots,2^{r}\right\} and l{1,,2W𝐯}l\in\left\{1,\dots,2^{W_{\boldsymbol{v}}}\right\},

αrlE[k]=m=12jrαji[(k1)2jr+m](1)sl[m].\alpha_{r_{l}}^{E}\left[k\right]=\sum_{m=1}^{2^{j-r}}\alpha_{j}^{i}\left[\left(k-1\right)2^{j-r}+m\right]\left(-1\right)^{s_{l}[m]}. (25)
Proof:

See Appendix A. ∎
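Equation (25) amounts to a sign flip pattern followed by a blockwise sum, which the following sketch makes explicit (the function name is ours, and zero-based indexing replaces the paper's one-based indexing):

```python
def source_llrs(alpha, s):
    # alpha: the 2^j LLR values of the SR node; s: one repetition
    # sequence of length 2^(j-r). Returns the 2^r source-node LLRs of
    # (25): each source LLR sums one length-2^(j-r) block of alpha,
    # with the sign of position m flipped whenever s[m] = 1.
    blk = len(s)
    assert len(alpha) % blk == 0
    return [sum(alpha[k * blk + m] * (-1) ** s[m] for m in range(blk))
            for k in range(len(alpha) // blk)]
```

For example, with the all-zero sequence the source LLRs are plain block sums, while a leading 1 in s negates the first LLR of every block before summing.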

Using (23) and (25), (10) can be written as

β^ji[1:2j]\displaystyle\hat{\beta}_{j}^{i}\left[1:2^{j}\right] =argmaxβji[1:2j]jik=12j(1)βji[k]αji[k]\displaystyle=\underset{\beta_{j}^{i}\left[1:2^{j}\right]\in\mathbb{C}_{j}^{i}}{\operatorname*{arg\,max}}\sum_{k=1}^{2^{j}}\left(-1\right)^{\beta_{j}^{i}\left[k\right]}\alpha_{j}^{i}\left[k\right] (26)
=argmaxβrE[1:2r]rE𝒔l𝕊k=12r(1)βrE[k]m=12jrαjli[(k1)2jr+m](1)sl[m]\displaystyle=\mkern-10.0mu\underset{\begin{subarray}{c}\beta_{r}^{E}\left[1:2^{r}\right]\in\mathbb{C}_{r}^{E}\\ \boldsymbol{s}_{l}\in{\mathbb{S}}\end{subarray}}{\operatorname*{arg\,max}}\!\sum_{k=1}^{2^{r}}\!\left(-\!1\right)^{\beta_{r}^{E}\left[k\right]}\!\sum_{m=1}^{2^{j-r}}\!\!\alpha_{j_{l}}^{i}\!\!\left[\left(k\!-\!\!1\right)\!2^{j-r}\!\!+\!\!m\right]\!\left(-\!1\right)^{s_{l}[m]}
=argmaxβrE[1:2r]rEl{1,,|𝕊|}k=12r(1)βrE[k]αrlE[k].\displaystyle=\mkern-10.0mu\underset{\begin{subarray}{c}\beta_{r}^{E}\left[1:2^{r}\right]\in\mathbb{C}_{r}^{E}\\ l\in\{1,\ldots,\left|{\mathbb{S}}\right|\}\end{subarray}}{\operatorname*{arg\,max}}\sum_{k=1}^{2^{r}}\left(-1\right)^{\beta_{r}^{E}\left[k\right]}\alpha_{r_{l}}^{E}\left[k\right]. (27)

Thus, the bit estimates β̂_j^i[1:2^j] of an SR node can be calculated by finding the bit estimates β_r^E[1:2^r] of its source node and the best repetition sequence using (27), and then mapping them to the SR node output as shown in (23).

The decoding algorithm of an SR node 𝒩_j^i is described in Algorithm 1. It first computes α_{r_l}^E for l ∈ {1, …, |𝕊|} and generates |𝕊| new paths by extending the decoding path at the j−r rightmost bits corresponding to η_r, η_{r+1}, …, η_{j−1}. Note that the l-th path is generated when the repetition sequence is 𝒔_l, and α_{r_l}^E, β̂_{r_l}^E, and β̂_{j_l}^i are its soft and hard messages. Then, the source node is decoded following the SC decoding schedule. If the source node is a special node, a hard decision is made directly; if the source node is an EG-PC node, a parity check and bit flipping are additionally performed using a Wagner decoder. Finally, the optimal decoding path index is selected according to the comparison in (28) and the decoding result is obtained using (23).

Based on Algorithm 1, the SR node-based fast SC (SRFSC) decoding algorithm is proposed. It follows the SC decoding schedule until an SR node is encountered, at which point Algorithm 1 is executed. Note that the |𝕊| paths can be processed simultaneously, and the path selection in step 3 of Algorithm 1 can be performed in parallel with the decoding of the source node in step 2 and the following g function operation. Once the selected index l̂ is obtained, only the l̂-th decoding path, corresponding to 𝒔_l̂, is retained and the remaining paths are deleted.

Input: α_j^i[1:2^j], 𝕊
Output: β̂_j^i[1:2^j]

1) Soft message computation
for l ∈ {1, …, |𝕊|} do
   Calculate α_{r_l}^E according to (25).

2) Decoding of source node 𝒩_r^E
for l ∈ {1, …, |𝕊|} do
   if SNT = Rate-C then
      Decode source node 𝒩_r^E using α_{r_l}^E and obtain β̂_{r_l}^E;
   else if SNT = Rate-0 then
      β̂_{r_l}^E[k] = 0, k ∈ {1, …, 2^r};
   else  // SNT = Rate-1 or SNT = EG-PC
      β̂_{r_l}^E[k] = h(α_{r_l}^E[k]), k ∈ {1, …, 2^r}.
   if SNT = EG-PC then
      Perform parity check and bit flipping on β̂_{r_l}^E using α_{r_l}^E.

3) Comparison and path selection
l̂ = argmax_{l ∈ {1, …, |𝕊|}} Σ_{k=1}^{2^r} |α_{r_l}^E[k]|.  (28)
Return β̂_{j_l̂}^i to the parent node according to (23).
Algorithm 1 Decoding algorithm of SR node 𝒩ji\mathcal{N}_{j}^{i}
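Restricting attention to a Rate-1 source node, the three steps of Algorithm 1 can be sketched as follows. A Rate-C source would instead be decoded recursively and an EG-PC source would add a Wagner-style parity check; the function name and the hard-decision sign convention (non-negative LLR gives bit 0) are our assumptions.

```python
def decode_sr_rate1(alpha, seqs):
    # alpha: the 2^j SR-node LLRs; seqs: the repetition sequences in S,
    # each of length 2^(j-r). Returns the estimated SR-node output bits.
    blk = len(seqs[0])
    n_src = len(alpha) // blk
    best_metric, best = None, None
    for s in seqs:
        # Step 1: soft messages of the source node, as in (25).
        a_src = [sum(alpha[k * blk + m] * (-1) ** s[m] for m in range(blk))
                 for k in range(n_src)]
        # Step 2: a Rate-1 source node decodes by hard decision.
        b_src = [0 if a >= 0 else 1 for a in a_src]
        # Step 3: path metric of (28); keep the best path.
        metric = sum(abs(a) for a in a_src)
        if best_metric is None or metric > best_metric:
            best_metric, best = metric, (s, b_src)
    s_hat, b_src = best
    # Reconstruct the SR-node output via (23): repeat each source bit
    # over a block and XOR it with the selected repetition sequence.
    return [b ^ s_hat[m] for b in b_src for m in range(blk)]
```

In the second test below, the repetition sequence (1, 0) wins the comparison in (28), so the output alternates within each block even though the source-node decisions are all zero.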

IV Hard-decision-aided Fast SC Decoding with Sequence Repetition Nodes

In this section, a novel threshold-based hard-decision-aided scheme is proposed, which speeds up the decoding of general nodes with no specific structure in the SRFSC decoding at high signal-to-noise ratios. Consequently, the threshold-based hard-decision-aided SRFSC (TA-SRFSC) decoding algorithm is proposed. In addition, a multi-stage decoding strategy is introduced to mitigate the possible error-correction performance loss of the proposed TA-SRFSC decoding.

IV-A Proposed Threshold-based Hard-decision-aided Scheme

In a binary AWGN channel, the LLR value αji[k]\alpha_{j}^{i}\left[k\right] of any node 𝒩ji\mathcal{N}_{j}^{i} can be approximated as a Gaussian variable as shown in Fig. 5. The red area in Fig. 5 represents the approximate probability of correct hard decision P~c\widetilde{P}_{c} and the blue area represents the approximate probability of incorrect hard decision P~e\widetilde{P}_{e} when βji[k]=0\beta_{j}^{i}\left[k\right]=0 such that

P~c=Q(Tmji2mji),P~e=Q(T+mji2mji),\widetilde{P}_{c}=Q\left(\frac{T-m_{j}^{i}}{\sqrt{2m_{j}^{i}}}\right),\;\;\widetilde{P}_{e}=Q\left(\frac{T+m_{j}^{i}}{\sqrt{2m_{j}^{i}}}\right), (29)

where Q(x)=12πxet22𝑑tQ\left(x\right)=\frac{1}{\sqrt{2\mathrm{\pi}}}\int_{x}^{\infty}e^{-\frac{t^{2}}{2}}dt. The area between the two dashed lines represents the approximate probability that a hard decision is not performed.

Figure 5: Probability distribution of αji[k]\alpha_{j}^{i}\left[k\right] under Gaussian approximation for mji=8m_{j}^{i}=8. The red dashed area represents the probability of correct hard decision and the blue solid area represents the probability of incorrect hard decision when βji[k]=0\beta_{j}^{i}\left[k\right]=0.

To simplify the calculation of the threshold, we take a different approach from [32]: using the Gaussian distribution of α_j^i[k], the approximate probability of error when β_j^i[k] = 0 is constrained to satisfy

P~e=Q(T+mji2mji)<Q(c),\widetilde{P}_{e}=Q\left(\frac{T+m_{j}^{i}}{\sqrt{2m_{j}^{i}}}\right)<Q\left(c\right), (30)

where cc (and thus Q(c)Q\left(c\right)) is a positive constant, whose selection method will be given in Observation 1. This is equivalent to T>mji+c2mjiT>-m_{j}^{i}+c\sqrt{2m_{j}^{i}}. Therefore, the threshold is

T=|mji+c2mji|,T=\left|-m_{j}^{i}+c\sqrt{2m_{j}^{i}}\right|, (31)

where the absolute value ensures TT is positive for all values of mji>0m_{j}^{i}>0 with any c>0c>0.
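Under the Gaussian approximation, with β_j^i[k] = 0 the LLR is distributed as N(m_j^i, 2 m_j^i), so (29)–(31) are direct to evaluate with the standard identity Q(x) = erfc(x/√2)/2. The sketch below (names ours) does this for the setting of Fig. 5:

```python
import math

def Q(x):
    # Gaussian tail probability via the complementary error function.
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def threshold(m, c):
    # Hard-decision threshold T of (31) for a node with LLR mean m.
    return abs(-m + c * math.sqrt(2.0 * m))

# Probabilities of (29) for m = 8 (as in Fig. 5), with c = 3.8.
m, c = 8.0, 3.8
T = threshold(m, c)
p_correct = Q((T - m) / math.sqrt(2.0 * m))
p_error = Q((T + m) / math.sqrt(2.0 * m))
```

For m = 8 and c = 3.8 this gives T = |−8 + 3.8·4| = 7.2, and P̃e equals Q(c) because T sits exactly at the bound of (30).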

Under the Gaussian approximation in (14), for a node that undergoes hard decision in (19), the proposed threshold leads to a bounded probability of correct hard decision as shown in the following observation. Note that the analytical derivations are accurate under the assumption of Gaussian approximation.

Observation 1.

Let 0.5<ε<10.5<\varepsilon<1 and c>0c>0 be real numbers such that

Q(c)1ε2n.Q\left(c\right)\leq 1-\sqrt[2^{n}]{\varepsilon}. (32)

Performing hard decision in (19) with the threshold in (31) on nodes 𝒩ji\mathcal{N}_{j}^{i} whose mjim_{j}^{i} satisfy

mji12[cQ1(Q(c)1ε2n1)]2,m_{j}^{i}\geq\frac{1}{2}\left[c-Q^{-1}\left(\frac{Q\left(c\right)}{\frac{1}{\sqrt[2^{n}]{\varepsilon}}-1}\right)\right]^{2}, (33)

has a probability of correct hard decision that is lower bounded by ε2(nj)\sqrt[2^{\left(n-j\right)}]{\varepsilon}, under the assumption of Gaussian approximation and assuming all prior bits are decoded correctly.

Explanation: See Appendix B.

The proposed TA scheme performs a hard decision on a node 𝒩_j^i only if (33) is satisfied and all of its input LLR values α_j^i[k] satisfy (19). Otherwise, standard SC decoding is applied on 𝒩_j^i to obtain the decoding result. As with the TA scheme in [32], the threshold values of the proposed scheme can be computed off-line. However, the proposed TA scheme has the following three advantages: 1) it can provide a higher latency reduction than the best case in [32], as shown in the simulation results; 2) the effect of the proposed threshold values on the error-correction performance is approximately predictable according to Observation 1; and 3) according to (33), only a fraction of the nodes undergo hard decision in the decoding process, which avoids unnecessary threshold comparisons.

To speed up the decoding process, the proposed TA scheme is combined with SRFSC decoding, resulting in the TA-SRFSC decoding algorithm. In TA-SRFSC decoding, when one of the special nodes considered in SRFSC decoding is encountered, SRFSC decoding is performed; when a general node with no special structure is encountered, the proposed TA scheme is applied. The following observation provides an approximate upper bound on the BLER of the proposed TA-SRFSC decoding.

Observation 2.

Let BLER_TA-SRFSC and BLER_SRFSC denote the BLER of TA-SRFSC decoding and SRFSC decoding, respectively. Then the following approximate inequality holds:

BLERTASRFSC1ε(1BLERSRFSC).{\mathrm{BLER}}_{\mathrm{TA-SRFSC}}\lesssim 1-\varepsilon\left(1-{\mathrm{BLER}}_{\mathrm{SRFSC}}\right). (34)

Explanation: See Appendix C.

Observation 2 provides a method to approximately derive the threshold value that achieves a desired BLER upper bound for TA-SRFSC decoding. In fact, a larger threshold value results in better error-correction performance at the cost of lower decoding speed. Therefore, a trade-off between error-correction performance and decoding speed can be achieved with the proposed TA-SRFSC decoding algorithm.

IV-B Multi-stage Decoding

To mitigate the possible error-correction performance loss of the proposed TA-SRFSC decoding, a multi-stage decoding strategy is adopted in which a maximum of two decoding attempts is conducted. In the first decoding attempt, TA-SRFSC decoding is used. If this decoding fails and if there existed a node that underwent hard decision in this first decoding attempt, then a second decoding attempt using SRFSC decoding is conducted. To determine if the TA-SRFSC decoding failed, a cyclic redundancy check (CRC) is concatenated to the polar code and it is verified after the TA-SRFSC decoding. Additionally, the CRC is verified after the second decoding attempt by the SRFSC decoding to determine if the overall decoding process succeeded.

As shown in the next section, most of the received frames are decoded correctly by the proposed TA-SRFSC decoding in the first decoding attempt. As a result, the average decoding latency of the proposed multi-stage SRFSC decoding is very close to that of the TA-SRFSC decoding with CRC bits. However, the error-correction performance of the proposed multi-stage SRFSC decoding algorithm is slightly worse than that of SRFSC decoding due to the added CRC bits. As such, multi-stage SRFSC decoding trades a slight degradation in error-correction performance for a significant reduction in the average decoding latency. It is worth noting that, in order to improve the error-correction performance of the proposed scheme, the multi-stage decoding strategy can be generalized to more than two decoding attempts. In this scenario, the first two attempts are the same as described above, i.e., TA-SRFSC decoding is used first, followed by SRFSC decoding. (If the TA-SRFSC decoder does not satisfy the CRC, the SRFSC decoding can either start from the beginning or from the first node that underwent a hard decision; in this paper, the SRFSC decoder starts decoding from the first bit to avoid storing intermediate LLRs.) The third and any subsequent attempts would use increasingly more powerful decoding techniques than the first two, such as SCL decoding.

V Decoding Latency

In this section, the decoding latency of the proposed fast decoders is analyzed under two cases: 1) with no resource limitation and 2) with hardware resource constraints. The former is to facilitate the comparisons with the works that do not consider hardware implementation constraints [19, 20], and the latter is for comparison with those that do [12, 18, 23].

V-A No Resource Limitation

In this case, the same assumptions as in [1, 19] are used and the decoding latency is measured in terms of the number of required time steps. More specifically: 1) there is no resource limitation so that all the parallelizable instructions are performed in one time step, 2) bit operations are carried out instantaneously, 3) addition/subtraction of real numbers and check-node operation consume one time step, and 4) Wagner decoding can be performed in one time step.

V-A1 SRFSC and TA-SRFSC

For any node 𝒩ji\mathcal{N}_{j}^{i} that has no special structure, e.g. a parent node of an SR node, and that satisfies (33), the threshold comparison in (19) is performed in parallel with the calculation of the LLR values of its left child node. Therefore, the proposed hard decision scheme does not affect the latency requirements for the nodes that undergo hard decision.

The number of time steps required for the decoding of the SR node is calculated according to Algorithm 1. In Step 1, the calculation of the LLR values for the source node requires one time step if 𝒗\boldsymbol{v}\neq\emptyset. If 𝒗=\boldsymbol{v}=\emptyset, then the LLR values of the source node are available immediately. Thus, the required number of time steps for Step 1 is

𝒯1={0,if 𝒗=,1,if 𝒗.{\mathcal{T}}_{1}=\begin{cases}0,&\text{if }\boldsymbol{v}=\emptyset,\\ 1,&\text{if }\boldsymbol{v}\neq\emptyset.\end{cases} (35)

The time step requirement of Step 2 depends on the source node type. If SNT=Rate-C\text{SNT}=\text{Rate-C}, the time step requirement of Step 2 is the time step requirement of the Rate-C node. If SNT=Rate-0\text{SNT}=\text{Rate-0} or Rate-1, then there is no latency overhead in Step 2. If SNT=EG-PC\text{SNT}=\text{EG-PC} and in accordance with Section III-B, zz can be estimated in two time steps if the leftmost node is a REP node (one time step for performing the check-node operation and one time step for adding the LLR values). Also, Wagner decoding can be performed in parallel with the estimation of zz assuming z=0z=0 or z=1z=1. As such, at most two time steps are required for parity check and bit flipping of the EG-PC node. The required number of time steps for Step 2 is

𝒯2={0,if SNT=Rate-0 or Rate-1,1 or 2,if SNT=EG-PC,2r+12,if SNT=Rate-C.{\mathcal{T}}_{2}=\begin{cases}0,&\text{if }\text{SNT}=\text{Rate-0 or Rate-1},\\ 1\text{ or }2,&\text{if }\text{SNT}=\text{EG-PC},\\ 2^{r+1}-2,&\text{if }\text{SNT}=\text{Rate-C}.\end{cases} (36)

Step 3 consumes two time steps using an adder tree and a comparison tree if |𝕊|>1\left|\mathbb{S}\right|>1 and no time steps otherwise. Thus, the number of required time steps for Step 3 is

𝒯3={0,if |𝕊|=1,2,if |𝕊|>1.{\mathcal{T}}_{3}=\begin{cases}0,&\text{if }\left|\mathbb{S}\right|=1,\\ 2,&\text{if }\left|\mathbb{S}\right|>1.\end{cases} (37)

Since path selection in Step 3 can be executed in parallel with the decoding of source node in Step 2 and the following gg function calculation, the total number of time steps required to decode an SR node can be expressed as

𝒯SR=𝒯1+max(𝒯2,𝒯31),{\mathcal{T}}_{\mathrm{SR}}={\mathcal{T}}_{1}+\max\left({\mathcal{T}}_{2},{\mathcal{T}}_{3}-1\right), (38)

where 𝒯_3 − 1 indicates that at least one time step of 𝒯_3 can be hidden by performing Step 3 in parallel with the g function calculation. Therefore, 𝒯_SR depends on the parameters of the SR node. However, for a given polar code, the total number of time steps required by SRFSC decoding is fixed, regardless of the channel conditions.
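Equations (35)–(38) combine into a small lookup that gives the time-step count of any SR node; the sketch below (function name ours) uses the worst case of two steps for an EG-PC source:

```python
def sr_node_time_steps(v_is_empty, snt, n_seqs, r=0):
    # Time steps to decode one SR node under the no-resource-limit
    # model. snt is the source node type; r is only used for Rate-C.
    t1 = 0 if v_is_empty else 1            # (35)
    if snt in ("Rate-0", "Rate-1"):
        t2 = 0                             # (36)
    elif snt == "EG-PC":
        t2 = 2                             # worst case of (36)
    else:                                  # Rate-C
        t2 = 2 ** (r + 1) - 2
    t3 = 0 if n_seqs == 1 else 2           # (37)
    return t1 + max(t2, t3 - 1)            # (38)
```

For instance, an SR node with 𝒗 ≠ ∅, an EG-PC source, and |𝕊| > 1 needs 1 + max(2, 1) = 3 time steps, while a Rate-1 source with 𝒗 = ∅ and a single repetition sequence is free.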

V-A2 Multi-stage SRFSC

Let 𝒯TASRFSCcrc{\mathcal{T}}_{\mathrm{TA-SRFSC}_{\mathrm{crc}}} and 𝒯SRFSCcrc{\mathcal{T}}_{\mathrm{SRFSC}_{\mathrm{crc}}} denote the average decoding latency of the proposed TA-SRFSC decoding and the SRFSC decoding using CRC bits, respectively. The average decoding latency of multi-stage SRFSC decoding, 𝒯MultistageSRFSC{\mathcal{T}}_{\mathrm{Multi-stage\;SRFSC}}, is

𝒯MultistageSRFSC=𝒯TASRFSCcrc+PRedecoding𝒯SRFSCcrc,\mathcal{T}_{\mathrm{Multi-stage\;SRFSC}}={\mathcal{T}}_{\mathrm{TA-SRFSC}_{\mathrm{crc}}}+P_{\mathrm{Re-decoding}}{\mathcal{T}}_{\mathrm{SRFSC}_{\mathrm{crc}}}, (39)

where PRedecodingP_{\mathrm{Re-decoding}} indicates the probability that TA-SRFSC decoding fails and there is at least one node that undergoes hard decision. Note that PRedecodingP_{\mathrm{Re-decoding}} is less than or equal to the probability that the output of TA-SRFSC decoding fails the CRC verification, which can be approximated by BLERTASRFSCcrc{\mathrm{BLER}}_{\mathrm{TA-SRFSC}_{\mathrm{crc}}}. The approximation is due to the fact that the undetected error probability of CRC is negligible when its length is long enough [38]. In accordance with Observation 2, the average decoding latency requirement for the proposed multi-stage SRFSC decoding is

𝒯MultistageSRFSC𝒯TASRFSCcrc+(1ε(1BLERSRFSCcrc))𝒯SRFSCcrc.\begin{array}[]{l}\mathcal{T}_{\mathrm{Multi-stage\;SRFSC}}\\ \lesssim{\mathcal{T}}_{\mathrm{TA-SRFSC}_{\mathrm{crc}}}+\left(1-\varepsilon\left(1-{\mathrm{BLER}}_{\mathrm{SRFSC}_{\mathrm{crc}}}\right)\right){\mathcal{T}}_{\mathrm{SRFSC}_{\mathrm{crc}}}.\end{array} (40)

Since the decoding latency of SRFSC decoding is fixed, the average decoding latency and the worst case decoding latency of SRFSC decoding are equivalent. The worst case decoding latency of the proposed TA-SRFSC decoding can be calculated when none of the nodes in the decoding tree undergo hard decision. This occurs when the channel has a high level of noise. Thus, the worst case decoding latency of the proposed TA-SRFSC decoding is equivalent to the decoding latency of the SRFSC decoding. Moreover, when the channel is too noisy, PRedecoding0P_{\mathrm{Re-decoding}}\approx 0, because almost none of the nodes undergo hard decision. Thus, the worst case decoding latency of the proposed multi-stage SRFSC decoding is equivalent to the worst case decoding latency of TA-SRFSC decoding using a CRC, which is the latency of SRFSC decoding with a CRC.

V-B With Hardware Resource Constraints

In this case, the decoding latency is measured in terms of the number of clock cycles in the actual hardware implementation. Contrary to Section V-A, a time step may take more than one clock cycle in a realistic hardware implementation. First, the idea of a semi-parallel decoder in [39] is adopted in which at most PP processing elements can run at the same time. Thus, parallel ff function or gg function operations on more than PP processing elements require more than one clock cycle. Second, the adder tree used for the calculation of LLR values in step 1 of Algorithm 1, and the compare-and-select (CS) tree used by the Wagner decoder to find the index of the least reliable input bit in step 2 of Algorithm 1, both require pipeline registers to reduce the critical path delay and improve the operating frequency and the overall throughput. The addition of these pipeline registers increases the number of decoding clock cycles. The exact number depends on the implementation and the given polar code.

VI Results and Comparison

In this section, the average decoding latency and the error-correction performance of the proposed decoding algorithms are analyzed and compared with state-of-the-art fast SC decoding algorithms. To derive the results, polar codes of lengths N ∈ {128, 512, 1024}, which are adopted in the 5G standard [40], are used. For a fair comparison, the latency of baseline decoding algorithms without a hardware implementation is calculated under the assumptions in Section V-A; otherwise, the hardware implementation results are reported.

To simulate the effect of ε on the error-correction performance and the latency of the proposed decoding algorithms, the assumption in [32] is adopted and three values ε ∈ {0.9, 0.99, 0.999} are selected. In accordance with (32), c ≥ 3.8 for ε = 0.9, c ≥ 4.3 for ε = 0.99, and c ≥ 4.8 for ε = 0.999. According to (30), P̃e decreases as c increases; at the same time, fewer nodes undergo hard decision and the decoding latency increases. To obtain a good trade-off between error-correction performance and latency, we set c = 3.8 for ε = 0.9, c = 4.3 for ε = 0.99, and c = 4.8 for ε = 0.999. Consequently, in accordance with (33), m_j^i ≥ 9.3891 for ε = 0.9, m_j^i ≥ 14.7255 for ε = 0.99, and m_j^i ≥ 16.1604 for ε = 0.999. Using these values, the threshold T defined in (31) and the BLER upper bound for TA-SRFSC decoding in (34) can be calculated for different values of ε.
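The parameter choices above can be reproduced numerically. The sketch below checks (32) for the three (ε, c) pairs and evaluates the right-hand side of (33); Q_inv is a simple bisection, valid because Q is monotonically decreasing (all names are ours):

```python
import math

def Q(x):
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def Q_inv(p):
    # Invert the (decreasing) Q function by bisection on [-10, 10].
    lo, hi = -10.0, 10.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if Q(mid) > p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def min_mean(eps, c, n):
    # Right-hand side of (33): the smallest LLR mean m_j^i for which a
    # node is allowed to take a hard decision.
    ratio = Q(c) / (eps ** (-1.0 / 2 ** n) - 1.0)
    return 0.5 * (c - Q_inv(ratio)) ** 2

n = 10  # N = 1024
pairs = [(0.9, 3.8), (0.99, 4.3), (0.999, 4.8)]
# (32): each c must satisfy Q(c) <= 1 - eps^(1/2^n).
ok = all(Q(c) <= 1.0 - eps ** (1.0 / 2 ** n) for eps, c in pairs)
```

Evaluating min_mean for the three pairs reproduces the quoted lower bounds 9.3891, 14.7255, and 16.1604 on m_j^i to within rounding.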

TABLE II: The number of SR nodes with different |𝕊|\left|{\mathbb{S}}\right| and the number of general nodes in 5G polar codes of lengths N{128,512,1024}N\in\{128,512,1024\} and rates R{1/4,1/2,3/4}R\in\{1/4,1/2,3/4\}.
SR General
NN RR |𝕊|\left|{\mathbb{S}}\right| Total Total
1 2 4 8 16
128 1/41/4 1 0 2 1 0 4 3
1/21/2 4 3 1 0 0 8 7
3/43/4 8 2 0 0 0 10 9
512 1/41/4 12 2 2 1 0 17 16
1/21/2 15 5 2 0 1 23 22
3/43/4 13 5 1 0 1 20 19
1024 1/41/4 17 6 2 2 1 28 27
1/21/2 25 8 2 3 1 39 38
3/43/4 29 8 2 1 0 40 39
TABLE III: Node length of SR nodes with different |𝕊|\left|{\mathbb{S}}\right| in 5G polar codes of lengths N{128,512,1024}N\in\{128,512,1024\} and rates R=1/2R=1/2.
NN Length |𝕊|\left|{\mathbb{S}}\right|
1 2 4 8 16
128 88 2 2 0 0 0
1616 0 1 1 0 0
3232 2 0 0 0 0
512 88 7 3 0 0 0
1616 4 1 2 0 0
3232 3 1 0 0 0
6464 1 0 0 0 0
128128 0 0 0 0 1
1024 88 10 6 0 0 0
1616 7 1 2 0 0
3232 4 0 0 3 0
6464 2 1 0 0 1
128128 2 0 0 0 0

Table II reports the number of SR nodes with different |𝕊|, the total number of SR nodes, and the total number of general nodes with no special structure, at different code lengths and rates. It can be seen that, for each of the code lengths 128, 512, and 1024, the code with rate 1/2 contains the largest number of SR nodes with |𝕊| > 1. This in turn results in more latency savings, because a higher degree of parallelism can be exploited with these nodes. Thanks to the binary tree structure and the fact that SR nodes are always present at an intermediate level of the decoding tree, the number of traversed general nodes with no special structure equals the total number of SR nodes minus one, since they are the parent nodes of SR nodes. Table III shows the length of SR nodes with different |𝕊| at different code lengths when R = 1/2. The length of an SR node corresponds to the level of the decoding tree at which it is located. SR nodes with larger |𝕊| that are located at higher levels of the decoding tree contribute more to the overall latency reduction.

Table IV reports the number of time steps required to decode polar codes of lengths N ∈ {128, 512, 1024} and rates R ∈ {1/4, 1/2, 3/4} with the proposed SRFSC decoding algorithm, and compares it with the required number of time steps of the decoders in [12, 18, 19, 20, 23]. Note that the decoder in [23] requires the fewest time steps among all the previous works considered in Table IV. Three versions of the proposed decoder are considered in Table IV: the SRFSC decoder, which only utilizes SR nodes; the SRFSC′ decoder, which additionally considers the P-0SPC nodes adopted in all the other baseline decoders in Table IV; and the SRFSC′′ decoder, which uses SR nodes, P-0SPC nodes, and the operation mergers F×2 and G-F. It can be seen that the SRFSC′′ decoder requires fewer time steps than the other decoders, except for the case of N = 128 and R = 3/4, where the frequent occurrence of REP-SPC nodes provides an advantage for [23]. This is due to the fact that the REP-SPC decoder in other works decodes the REP and SPC nodes in parallel, consuming only a single time step, while two time steps are needed to decode a REP-SPC node with the proposed SRFSC decoder.

TABLE IV: Number of time steps for different fast SC decoding algorithms of polar codes of lengths N{128,512,1024}N\in\left\{128,512,1024\right\} and rates R{1/4,1/2,3/4}R\in\left\{1/4,1/2,3/4\right\}.
NN RR [12] [18] [19] [20] [23] SRFSC SRFSC{}^{{}^{\prime}} SRFSC′′{}^{{}^{\prime\prime}}
128 1/41/4 25 23 24 23 10 13 12 10
1/21/2 33 31 24 24 21 25 24 20
3/43/4 22 22 22 22 16 29 25 20
512 1/41/4 73 70 63 63 42 57 52 40
1/21/2 87 83 77 77 56 72 66 52
3/43/4 79 73 64 64 50 63 56 45
1024 1/41/4 122 115 110 108 68 92 85 66
1/21/2 156 144 138 138 94 127 115 90
3/43/4 138 132 116 116 89 123 111 86

Table V compares the required number of clock cycles and the maximum operating frequency of hardware implementations of different decoders for a polar code with N = 1024, R ∈ {1/4, 1/2, 3/4}, and P = 64, for the proposed SRFSC decoding algorithm (taken from [41]) and the decoders in [12, 22, 23, 18]. All FPGA results are for an Altera Stratix IV EP4SGX530KH40C2 FPGA device and all ASIC results use TSMC 65nm CMOS technology. The required number of clock cycles for the proposed SRFSC decoder can be further reduced by considering node-branch merging and branch operation merging similar to [23]. We have verified that introducing the P-RSPC node into the proposed SRFSC decoder lengthens its critical path, whereas merging the branch operations F×2 and G-F does not affect the operating frequency. Thus, an enhanced SRFSC decoder that merges the branch operations F×2 and G-F is implemented and denoted as SRFSC* in Table V. It can be seen that the SRFSC* decoder requires a smaller number of clock cycles than the other decoders, with reductions with respect to [23] of 8%, 4%, and 8% at R ∈ {1/4, 1/2, 3/4}, respectively. Moreover, the SRFSC decoder and the SRFSC* decoder both have the highest maximum operating frequency f_max among the FPGA implementations. Compared with the works that report f_max results for FPGA implementations, our decoders achieve an improvement of at least 10%.

TABLE V: Required number of clock cycles and maximum operating frequency for different decoding algorithms of polar codes of length N=1024N=1024 with rate R={1/4,1/2,3/4}R=\left\{1/4,1/2,3/4\right\} and P=64P=64.
Decoder | Clock cycles (R = 1/4, 1/2, 3/4) | f_max FPGA (MHz) | f_max ASIC (MHz)
SRFSC | 186, 222, 200 | 109.6 | –
SRFSC* | 155, 191, 166 | 109.6 | –
[23] | 168, 198, 181 | – | 430
[22] | 189, 214, 185 | 89.6 | 420
[18] | 221, 252, 216 | – | 450
[12] | 225, 270, 225 | 99.8 | 450
Figure 6: BLER performance of different decoding algorithms for the 5G polar code of length N=1024 and rate R=1/2.

Fig. 6 shows the BLER performance of different decoding algorithms when N=1024 and R=1/2, for different values of the energy per bit to noise power spectral density ratio (E_b/N_0). For each value of ε, the BLER of TA-SRFSC decoding is depicted together with the upper bound calculated in Observation 2. It can be seen that the introduction of the TA scheme results in a BLER performance loss for the proposed TA-SRFSC decoding with respect to SC and SRFSC decoding, especially at higher values of E_b/N_0. The simulations confirm that the BLER curves of TA-SRFSC decoding fall below their respective upper bounds. It can also be seen that, as E_b/N_0 increases beyond a specific point, the BLER performance of TA-SRFSC decoding degrades. This is because of the difference in performance between hard-decision and soft-decision decoding. In accordance with (33), more nodes undergo hard-decision decoding at larger values of E_b/N_0. Therefore, while the channel conditions improve, hard-decision decoding introduces errors that reduce the error-correction performance gain associated with these large E_b/N_0 values. As a result, the BLER performance degrades after a certain value of E_b/N_0. This phenomenon persists as long as there are nodes that can undergo hard-decision decoding. After all the nodes are decoded using hard decisions, the BLER performance improves again as E_b/N_0 increases.

The proposed multi-stage SRFSC decoder is implemented using three different CRC lengths that are adopted in 5G to identify whether the decoding succeeded or failed: the CRC of length 6 (CRC6) with generator polynomial D^6+D^5+1, the CRC of length 11 (CRC11) with generator polynomial D^11+D^10+D^9+D^5+1, and the CRC of length 16 (CRC16) with generator polynomial D^16+D^12+D^5+1. For all values of ε, the proposed multi-stage SRFSC decoding results in almost the same BLER performance. Therefore, only the curve with ε=0.9 is plotted in Fig. 6 for multi-stage SRFSC decoding. CRC16 provides better error-correction performance than CRC6 and CRC11, especially at high values of E_b/N_0, due to the higher probability of undetected errors for short CRC lengths. Hence, CRC16 is selected for the polar code of length N=1024 in this paper. It can be seen that multi-stage SRFSC decoding with CRC16 shows a slight performance loss compared to the conventional SC and SRFSC decoders as a result of adding extra CRC bits. A similar trend is observed when comparing the BLER performance of the different schemes for polar codes of other lengths and rates.
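As an illustration of how such a generator polynomial acts on a message, the following is a minimal MSB-first polynomial-division sketch (plain long division over GF(2) only, with none of the 5G-specific bit ordering or interleaving; the function name and interface are ours):

```python
def crc_remainder(bits, poly, crc_len):
    """Remainder of bits(x) * x^crc_len divided by the generator polynomial,
    computed bit by bit (MSB first). `poly` lists the exponents with nonzero
    coefficients, e.g. the CRC6 generator D^6 + D^5 + 1 is poly=[6, 5, 0]."""
    g = sum(1 << p for p in poly)       # generator as an integer, MSB = D^crc_len
    reg = 0
    for b in bits + [0] * crc_len:      # append crc_len zeros, then divide mod 2
        reg = (reg << 1) | b
        if reg >> crc_len:              # leading coefficient set: subtract (XOR) g
            reg ^= g
    return reg                          # the crc_len-bit remainder

# Appending the remainder bits to the message makes the remainder zero,
# which is the pass/fail check performed after decoding:
r = crc_remainder([1, 0, 1, 1], [6, 5, 0], 6)
codeword = [1, 0, 1, 1] + [(r >> i) & 1 for i in range(5, -1, -1)]
assert crc_remainder(codeword, [6, 5, 0], 6) == 0
```

The same routine covers CRC11 and CRC16 by passing poly=[11, 10, 9, 5, 0] or poly=[16, 12, 5, 0] with the matching crc_len.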

Figure 7: Average decoding latency of different decoding algorithms for the 5G polar code of length N=1024 and rate R=1/2.

Fig. 7 presents the average decoding latency, in terms of the required number of time steps, for the proposed TA-SRFSC decoding and multi-stage decoding with CRC16. In particular, it compares them with the latency of SRFSC decoding and the decoder in [32] at different values of E_b/N_0 when N=1024 and R=1/2. It can be seen that the required number of time steps for the proposed TA-SRFSC decoding decreases as E_b/N_0 increases and, compared to SRFSC decoding at E_b/N_0 = 5 dB, is reduced by 40% for ε=0.999, by 48% for ε=0.99, and by 57% for ε=0.9. In addition, the proposed multi-stage SRFSC decoding reduces the required number of time steps by 37% for ε=0.999, by 46% for ε=0.99, and by 53% for ε=0.9, with respect to SRFSC decoding at E_b/N_0 = 5 dB. The proposed multi-stage SRFSC decoding also requires 19% fewer time steps than the method in [32] with c_t=1 for ε=0.9 at E_b/N_0 = 5 dB, while providing a significantly better BLER performance. Fig. 7 also presents the approximate upper bound derived in (40). It can be seen in the figure that the upper bound in (40) becomes tighter as ε increases.

Figure 8: Average number of threshold comparisons of the proposed TA-SRFSC decoding in comparison with the hard-decision scheme in [32] for the 5G polar code of length N=1024 and rate R=1/2.

Fig. 8 compares the average number of threshold comparisons in (19) for the proposed TA-SRFSC decoder with ε∈{0.9, 0.99, 0.999} and the decoder in [32] with c_t∈{1, 2} for the 5G polar code of length N=1024 and rate R=1/2. It can be seen that the proposed TA-SRFSC decoder shows significant benefits with respect to the decoder of [32] in terms of the average number of threshold comparisons. The TA-SRFSC decoder with ε=0.9 provides at least a 36% reduction with respect to [32] with c_t=1 while having a lower decoding latency. This indicates that the decoder in [32] executes many unnecessary threshold-comparison operations, whereas TA-SRFSC decoding only makes hard decisions when a node satisfies the condition in (33).

VII Conclusion

In this work, a new sequence repetition (SR) node is identified in the successive-cancellation (SC) decoding tree of polar codes and an SR-node-based fast SC (SRFSC) decoder is proposed. In addition, to speed up the decoding of nodes with no specific structure, the SRFSC decoder is combined with a threshold-based hard-decision-aided (TA) scheme and a multi-stage decoding strategy. We show that this method further reduces the decoding latency at high SNRs. In particular, hardware implementation results on an FPGA for a polar code of length 1024 with code rates 1/4, 1/2, and 3/4 show that the proposed SRFSC decoder with merged branch operations requires up to 8% fewer clock cycles and achieves a 10% higher maximum operating frequency compared to state-of-the-art decoders. In addition, the proposed TA-SRFSC decoding reduces the average decoding latency by 57% with respect to SRFSC decoding at E_b/N_0 = 5 dB on a polar code of length 1024 and rate 1/2. This average latency saving is particularly important in real-time applications such as video. Future work includes the design of a fast SC list decoder using SR nodes.

Appendix A Proof of Proposition 1

Let I_k denote the k×k identity matrix for k≥1. Since the source node is the rightmost node in an SR node, the g function calculation in (3) can be expressed as

\alpha_{r}^{E}\left[1:2^{r}\right]=\alpha_{j}^{i}\left[1:2^{j}\right]\times\left(I_{2^{j-1}}\otimes\left((-1)^{\eta_{j-1}},1\right)^{T}\right)\times\left(I_{2^{j-2}}\otimes\left((-1)^{\eta_{j-2}},1\right)^{T}\right)\times\cdots\times\left(I_{2^{r}}\otimes\left((-1)^{\eta_{r}},1\right)^{T}\right).

Using the identity \left(A\otimes B\right)\times\left(C\otimes D\right)=\left(A\times C\right)\otimes\left(B\times D\right) with A=I_{2^{j-2}}, B=I_{2}\otimes\left((-1)^{\eta_{j-1}},1\right)^{T}, C=I_{2^{j-2}}, and D=\left((-1)^{\eta_{j-2}},1\right)^{T} results in

\alpha_{r}^{E}\left[1:2^{r}\right]=\alpha_{j}^{i}\left[1:2^{j}\right]\times\Big[\left(I_{2^{j-2}}\times I_{2^{j-2}}\right)\otimes\left(I_{2}\otimes\left((-1)^{\eta_{j-1}},1\right)^{T}\times\left((-1)^{\eta_{j-2}},1\right)^{T}\right)\Big]\times\left(I_{2^{j-3}}\otimes\left((-1)^{\eta_{j-3}},1\right)^{T}\right)\times\cdots\times\left(I_{2^{r}}\otimes\left((-1)^{\eta_{r}},1\right)^{T}\right),

which can be written as

\alpha_{r}^{E}\left[1:2^{r}\right]=\alpha_{j}^{i}\left[1:2^{j}\right]\times I_{2^{j-2}}\otimes\left(\left((-1)^{\eta_{j-2}},1\right)^{T}\otimes\left((-1)^{\eta_{j-1}},1\right)^{T}\right)\times\left(I_{2^{j-3}}\otimes\left((-1)^{\eta_{j-3}},1\right)^{T}\right)\times\cdots\times\left(I_{2^{r}}\otimes\left((-1)^{\eta_{r}},1\right)^{T}\right),

where the identity I_{2}\otimes\left(a_{1},\dots,a_{k}\right)^{T}\times\left(b_{1},b_{2}\right)^{T}=\left(b_{1},b_{2}\right)^{T}\otimes\left(a_{1},\dots,a_{k}\right)^{T} is used. Repeating the above procedure results in

\alpha_{r}^{E}\left[1:2^{r}\right]=\alpha_{j}^{i}\left[1:2^{j}\right]\times I_{2^{r}}\otimes\left(\left((-1)^{\eta_{r}},1\right)^{T}\otimes\cdots\otimes\left((-1)^{\eta_{j-1}},1\right)^{T}\right)=\alpha_{j}^{i}\left[1:2^{j}\right]\times I_{2^{r}}\otimes\left((-1)^{s_{l}[1]},(-1)^{s_{l}[2]},\ldots,(-1)^{s_{l}[2^{j-r}]}\right)^{T}.

Thus, for k\in\left\{1,\dots,2^{r}\right\},

\alpha_{r_{l}}^{E}\left[k\right]=\sum_{m=1}^{2^{j-r}}\alpha_{j}^{i}\left[\left(k-1\right)2^{j-r}+m\right]\left(-1\right)^{s_{l}[m]}.
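The closed-form expression above is a signed block-sum over the parent LLR vector, and can be checked directly; a minimal sketch (the function name is ours, and the inputs are illustrative):

```python
def sr_source_llr(alpha_j, s_l):
    """Compute alpha_r^E[k] = sum_{m=1}^{2^{j-r}} alpha_j^i[(k-1)*2^{j-r} + m]
    * (-1)^{s_l[m]} for k = 1..2^r, given the parent LLRs alpha_j (length 2^j)
    and the binary source sequence s_l (length 2^{j-r})."""
    step = len(s_l)                       # block length 2^{j-r}
    signs = [(-1) ** b for b in s_l]      # (-1)^{s_l[m]}
    return [
        sum(alpha_j[k * step + m] * signs[m] for m in range(step))
        for k in range(len(alpha_j) // step)
    ]

# Example: j = 2, r = 1, so 2^{j-r} = 2 parent LLRs are combined per output.
print(sr_source_llr([1.0, 2.0, 3.0, 4.0], [0, 1]))  # [-1.0, -1.0]
```

This computes all 2^r source-node LLRs of an SR node in a single pass over the parent LLRs, which is what enables the parallel decoder described in the main text.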

Appendix B Explanation of Observation 1

To explain Observation 1, we first introduce the following lemma.

Lemma 1. Under the Gaussian approximation assumption in (14), for any node \mathcal{N}_{j}^{i} whose 2^{j} bits undergo a hard decision in (19), assuming all prior bits are decoded correctly, the probability of correct decoding is \left(\frac{\widetilde{P}_{c}}{\widetilde{P}_{e}+\widetilde{P}_{c}}\right)^{2^{j}}.

Proof:

In accordance with Fig. 5, for any node \mathcal{N}_{j}^{i}, assuming all previous bits are decoded correctly, the probability that the k-th bit (1\leq k\leq 2^{j}) in the node undergoes a hard decision is \widetilde{P}_{c}+\widetilde{P}_{e}. Moreover, the probability of a correct hard decision for the k-th bit in the node is \widetilde{P}_{c}, regardless of the value of \beta_{j}^{i}[k]. Thus, the conditional probability that a hard decision on the k-th bit is correct, given that the k-th bit undergoes a hard decision, is \frac{\widetilde{P}_{c}}{\widetilde{P}_{e}+\widetilde{P}_{c}}. Since the LLR values of the bits in a node are independent of each other, the conditional probability that the hard decisions on all 2^{j} bits of node \mathcal{N}_{j}^{i} are correct, given that all of its 2^{j} bits undergo hard decisions, is \left(\frac{\widetilde{P}_{c}}{\widetilde{P}_{e}+\widetilde{P}_{c}}\right)^{2^{j}}. ∎

To have a probability of correct decoding of at least ε for all the nodes that undergo hard decisions in a polar code of length 2^{n}, any such node \mathcal{N}_{j}^{i} is required to have a probability of correct decoding of at least \sqrt[2^{n-j}]{\varepsilon}. Therefore, using the result in Lemma 1, we have

\left(\frac{\widetilde{P}_{c}}{\widetilde{P}_{e}+\widetilde{P}_{c}}\right)^{2^{j}}\geq\sqrt[2^{n-j}]{\varepsilon}, (41)

which is equivalent to

\frac{\widetilde{P}_{c}}{\widetilde{P}_{e}+\widetilde{P}_{c}}\geq\sqrt[2^{n}]{\varepsilon}. (42)

If m_{j}^{i}\leq 2c^{2}, then T=-m_{j}^{i}+c\sqrt{2m_{j}^{i}}, \widetilde{P}_{c}=Q\left(c-\sqrt{2m_{j}^{i}}\right), and \widetilde{P}_{e}=Q\left(c\right). When 0.5<\varepsilon<1, (42) can be written as

\frac{1}{2}\left[c-Q^{-1}\left(\frac{Q\left(c\right)}{\frac{1}{\sqrt[2^{n}]{\varepsilon}}-1}\right)\right]^{2}\leq m_{j}^{i}\leq 2c^{2}, (43)

which requires

Q\left(c\right)\leq 1-\sqrt[2^{n}]{\varepsilon}. (44)

If m_{j}^{i}\geq 2c^{2}, then T=m_{j}^{i}-c\sqrt{2m_{j}^{i}}, \widetilde{P}_{c}=Q\left(-c\right), and \widetilde{P}_{e}=Q\left(\sqrt{2m_{j}^{i}}-c\right). Thus, (42) can be written as

m_{j}^{i}\geq\max\left\{2c^{2},\frac{1}{2}\left[c+Q^{-1}\left(\left(\frac{1}{\sqrt[2^{n}]{\varepsilon}}-1\right)Q\left(-c\right)\right)\right]^{2}\right\}. (45)

If (44) holds, then by using the fact that Q(-c)=1-Q(c),

2c^{2}\geq\frac{1}{2}\left[c+Q^{-1}\left(\left(\frac{1}{\sqrt[2^{n}]{\varepsilon}}-1\right)Q\left(-c\right)\right)\right]^{2}. (46)

Thus, (45) reduces to m_{j}^{i}\geq 2c^{2}, which holds by the initial assumption. Therefore, it suffices that

m_{j}^{i}\geq\frac{1}{2}\left[c-Q^{-1}\left(\frac{Q\left(c\right)}{\frac{1}{\sqrt[2^{n}]{\varepsilon}}-1}\right)\right]^{2}, (47)

and (44) to ensure (42). In other words, assuming all previous bits are decoded correctly, the probability that any node \mathcal{N}_{j}^{i} that undergoes the hard decision (19) in the decoding process is decoded correctly is lower bounded by \sqrt[2^{n-j}]{\varepsilon} if (44) and (47) are satisfied.
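Conditions (44) and (47) are straightforward to evaluate numerically with the standard normal distribution; below is a sketch under the Gaussian approximation of (14) (the function names are ours, and c is the design parameter of the threshold):

```python
from statistics import NormalDist

_nd = NormalDist()

def Q(x):
    """Gaussian tail probability Q(x) = 1 - Phi(x)."""
    return 1.0 - _nd.cdf(x)

def Q_inv(p):
    """Inverse of Q."""
    return _nd.inv_cdf(1.0 - p)

def hard_decision_llr_bound(n, eps, c):
    """Lower bound (47) on the mean LLR m_j^i of a node, or None when
    condition (44) fails for this choice of c (no threshold can then
    guarantee a per-node correctness probability of eps^(1/2^(n-j)))."""
    root = eps ** (1.0 / 2 ** n)          # eps^(1/2^n)
    if Q(c) > 1.0 - root:                 # condition (44)
        return None
    return 0.5 * (c - Q_inv(Q(c) / (1.0 / root - 1.0))) ** 2  # bound (47)
```

For instance, with n = 10 (N = 1024) and ε = 0.99, c = 4 violates (44), while c = 5 yields a lower bound of roughly 4.8 on m_j^i.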

Appendix C Explanation of Observation 2

Note that, based on Observation 1, any node \mathcal{N}_{j}^{i} that is decoded using (19) has a probability of correct hard decision approximately greater than or equal to \sqrt[2^{n-j}]{\varepsilon}. For any node that undergoes SRFSC decoding, the probability of correct decoding is determined by the error rate of SRFSC decoding. Thus, the probability of correct decoding for TA-SRFSC decoding is approximately greater than or equal to \varepsilon\left(1-\mathrm{BLER}_{\mathrm{SRFSC}}\right). Consequently, \mathrm{BLER}_{\mathrm{TA\text{-}SRFSC}}\lesssim 1-\varepsilon\left(1-\mathrm{BLER}_{\mathrm{SRFSC}}\right).
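The resulting bound is a one-line computation; a sketch with illustrative (not measured) numbers:

```python
def ta_srfsc_bler_upper_bound(eps, bler_srfsc):
    """Approximate upper bound of Observation 2:
    BLER_TA-SRFSC <~ 1 - eps * (1 - BLER_SRFSC)."""
    return 1.0 - eps * (1.0 - bler_srfsc)

# Illustrative inputs only: eps = 0.99 and a hypothetical BLER_SRFSC of 1e-3.
# At high Eb/N0 the bound is dominated by the 1 - eps term.
bound = ta_srfsc_bler_upper_bound(0.99, 1e-3)  # ~0.011
```

This matches the behavior seen in Fig. 6, where the gap between TA-SRFSC and its bound is governed mainly by the choice of ε.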

References

  • [1] E. Arıkan, “Channel polarization: A method for constructing capacity-achieving codes for symmetric binary-input memoryless channels,” IEEE Trans. Inf. Theory, vol. 55, no. 7, pp. 3051–3073, Jul. 2009.
  • [2] 3GPP, “Final report of 3GPP TSG RAN WG1 #87 v1.0.0,” Reno, USA, Nov. 2016. [Online]. Available: http://www.3gpp.org/ftp/tsg_ran/WG1_RL1/TSGR1_87.
  • [3] K. Niu, K. Chen, J. Lin, and Q. Zhang, “Polar codes: Primary concepts and practical decoding algorithms,” IEEE Commun. Mag., vol. 52, no. 7, pp. 192–203, 2014.
  • [4] C. Zhang, B. Yuan, and K. K. Parhi, “Reduced-latency SC polar decoder architectures,” in Proc. IEEE Int. Conf. Commun. (ICC), Jun. 2012, pp. 3471–3475.
  • [5] B. Yuan and K. K. Parhi, “Low-latency successive-cancellation polar decoder architectures using 2-bit decoding,” IEEE Trans. Circ. Syst. I: Regul., vol. 61, no. 4, pp. 1241–1254, Apr. 2014.
  • [6] A. Mishra, A. J. Raymond, L. G. Amaru, G. Sarkis, C. Leroux, P. Meinerzhagen, A. Burg, and W. J. Gross, “A successive cancellation decoder ASIC for a 1024-bit polar code in 180 nm CMOS,” in Proc. Asian Solid-State Circuits Conf., Nov. 2012, pp. 205–208.
  • [7] B. Yuan and K. K. Parhi, “Reduced-latency LLR-based SC list decoder for polar codes,” in Proc. 25th Ed. Great Lakes Symp. VLSI, May 2015, pp. 107–110.
  • [8] ——, “Low-latency successive-cancellation list decoders for polar codes with multibit decision,” IEEE Trans. VLSI Syst., vol. 23, no. 10, pp. 2268–2280, Oct. 2015.
  • [9] G. Sarkis and W. J. Gross, “Increasing the throughput of polar decoders,” IEEE Commun. Lett., vol. 17, no. 4, pp. 725–728, Apr. 2013.
  • [10] C. Husmann, P. C. Nikolaou, and K. Nikitopoulos, “Reduced latency ML polar decoding via multiple sphere-decoding tree searches,” IEEE Trans. Veh. Technol., vol. 67, no. 2, pp. 1835–1839, Feb. 2017.
  • [11] A. Alamdar-Yazdi and F. R. Kschischang, “A simplified successive-cancellation decoder for polar codes,” IEEE Commun. Lett., vol. 15, no. 12, pp. 1378–1380, Dec. 2011.
  • [12] G. Sarkis, P. Giard, A. Vardy, C. Thibeault, and W. J. Gross, “Fast polar decoders: Algorithm and implementation,” IEEE J. Sel. Areas Commun., vol. 32, no. 5, pp. 946–957, May 2014.
  • [13] P. Giard, G. Sarkis, C. Thibeault, and W. J. Gross, “Multi-mode unrolled architectures for polar decoders,” IEEE Trans. Circ. Syst. I: Regul., vol. 63, no. 9, pp. 1443–1453, Sep. 2016.
  • [14] Z. Huang, C. Diao, and M. Chen, “Latency reduced method for modified successive cancellation decoding of polar codes,” Electron. Lett., vol. 48, no. 23, pp. 1505–1506, Nov. 2012.
  • [15] A. Balatsoukas-Stimming, G. Karakonstantis, and A. Burg, “Enabling complexity-performance trade-offs for successive cancellation decoding of polar codes,” in Proc. IEEE Int. Symp. Inf. Theory Process. (ISIT), Jun. 2014, pp. 2977–2981.
  • [16] L. Zhang, Z. Zhang, X. Wang, C. Zhong, and L. Ping, “Simplified successive-cancellation decoding using information set reselection for polar codes with arbitrary blocklength,” IET Commun., vol. 9, no. 11, pp. 1380–1387, Jul. 2015.
  • [17] P. Giard, G. Sarkis, C. Thibeault, and W. J. Gross, “A 638 Mbps low-complexity rate 1/2 polar decoder on FPGAs,” in Proc. IEEE Int. Workshop Signal Process. Syst. (SiPS), Oct. 2015, pp. 1–6.
  • [18] P. Giard, A. Balatsoukas-Stimming, G. Sarkis, C. Thibeault, and W. J. Gross, “Fast low-complexity decoders for low-rate polar codes,” J. Sign. Process. Syst., vol. 90, no. 5, pp. 675–685, May 2018.
  • [19] M. Hanif and M. Ardakani, “Fast successive-cancellation decoding of polar codes: identification and decoding of new nodes,” IEEE Commun. Lett., vol. 21, no. 11, pp. 2360–2363, Nov. 2017.
  • [20] C. Condo, V. Bioglio, and I. Land, “Generalized fast decoding of polar codes,” in Proc. IEEE Global Commun. Conf. (GLOBECOM), Dec. 2018, pp. 1–6.
  • [21] H. Gamage, V. Ranasinghe, N. Rajatheva, and M. Latva-aho, “Low latency decoder for short blocklength polar codes,” arXiv preprint arXiv:1911.03201, 2019.
  • [22] F. Ercan, C. Condo, and W. J. Gross, “Reduced-memory high-throughput Fast-SSC polar code decoder architecture,” in Proc. IEEE Int. Workshop Signal Process. Syst. (SiPS), Oct. 2017, pp. 1–6.
  • [23] F. Ercan, T. Tonnellier, C. Condo, and W. J. Gross, “Operation merging for hardware implementations of fast polar decoders,” J Sign Process Syst, vol. 91, no. 9, pp. 995–1007, Sep. 2019.
  • [24] P. Giard and A. Burg, “Fast-SSC-flip decoding of polar codes,” in Proc. IEEE Wireless Commun. Netw. Conf. Workshops (WCNCW), Apr. 2018, pp. 73–77.
  • [25] F. Ercan, T. Tonnellier, and W. J. Gross, “Energy-efficient hardware architectures for fast polar decoders,” IEEE Trans. Circ. Syst. I: Regul., vol. 67, no. 1, pp. 322–335, Jan. 2020.
  • [26] I. Tal and A. Vardy, “List decoding of polar codes,” IEEE Trans. Inf. Theory, vol. 61, no. 5, pp. 2213–2226, May 2015.
  • [27] G. Sarkis, P. Giard, A. Vardy, C. Thibeault, and W. J. Gross, “Fast list decoders for polar codes,” IEEE J. Sel. Areas Commun., vol. 34, no. 2, pp. 318–328, Feb. 2016.
  • [28] S. A. Hashemi, C. Condo, and W. J. Gross, “A fast polar code list decoder architecture based on sphere decoding,” IEEE Trans. Circ. Syst. I: Regul., vol. 63, no. 12, pp. 2368–2380, Dec. 2016.
  • [29] S. A. Hashemi, C. Condo, and W. J. Gross, “Fast and flexible successive-cancellation list decoders for polar codes,” IEEE Trans. Signal Process., vol. 65, no. 21, pp. 5756–5769, Nov. 2017.
  • [30] S. A. Hashemi, C. Condo, M. Mondelli, and W. J. Gross, “Rate-flexible fast polar decoders,” IEEE Trans. Signal Process., vol. 67, no. 22, pp. 5689–5701, Nov. 2019.
  • [31] M. H. Ardakani, M. Hanif, M. Ardakani, and C. Tellambura, “Fast successive-cancellation-based decoders of polar codes,” IEEE Trans. Commun., vol. 67, no. 7, pp. 4562–4574, Jul. 2019.
  • [32] S. Li, Y. Deng, L. Lu, J. Liu, and T. Huang, “A low-latency simplified successive cancellation decoder for polar codes based on node error probability,” IEEE Commun. Lett., vol. 22, no. 12, pp. 2439–2442, Dec. 2018.
  • [33] H. Sun, R. Liu, and C. Gao, “A simplified decoding method of polar codes based on hypothesis testing,” IEEE Commun. Lett., pp. 1–1, Jan. 2020.
  • [34] C. Leroux, I. Tal, A. Vardy, and W. J. Gross, “Hardware architectures for successive cancellation decoding of polar codes,” in Proc. IEEE Int. Conf. Acoust., Speech Signal Process., May 2011, pp. 1665–1668.
  • [35] P. Trifonov, “Efficient design and decoding of polar codes,” IEEE Trans. Commun., vol. 60, no. 11, pp. 3221–3227, Nov. 2012.
  • [36] Z. Zhang and L. Zhang, “A split-reduced successive cancellation list decoder for polar codes,” IEEE J. Sel. Areas Commun., vol. 34, no. 2, pp. 292–302, Feb. 2016.
  • [37] R. Silverman and M. Balser, “Coding for constant-data-rate systems,” Trans. IRE Prof. Group Inf. Theory, vol. 4, no. 4, pp. 50–63, Sep. 1954.
  • [38] M. El-Khamy, J. Lee, and I. Kang, “Detection analysis of CRC-assisted decoding,” IEEE Commun. Lett., vol. 19, no. 3, pp. 483–486, Mar. 2015.
  • [39] C. Leroux, A. J. Raymond, G. Sarkis, and W. J. Gross, “A semi-parallel successive-cancellation decoder for polar codes,” IEEE Trans. Signal Process., vol. 61, no. 2, pp. 289–299, Jan. 2013.
  • [40] 3GPP, “3GPP TS RAN 38.212 v1.2.1,” Dec. 2017. [Online]. Available: http://www.3gpp.org/ftp/Specs/archive/38_series/38.212/38212-f30.zip.
  • [41] H. Zheng, A. Balatsoukas-Stimming, Z. Cao, and A. M. J. Koonen, “Implementation of a high-throughput Fast-SSC polar decoder with sequence repetition node,” in Proc. IEEE Int. Workshop Signal Process. Syst. (SiPS), Oct. 2020. arXiv:2007.11394.