Realizing Neural Decoder at the Edge with Ensembled BNN
Abstract
In this work, we propose extreme compression techniques such as binarization and ternarization for Neural Decoders like TurboAE. These methods cut the memory footprint by a factor of about 32 and replace floating-point arithmetic with bitwise operations, with performance better than Neural Decoders quantized (to 1 or 2 bits) after training. However, because of the limited representation capability of binary and ternary networks, the performance is not as good as that of the real-valued decoder. To fill this gap, we further propose to ensemble several such weak performers at the edge to achieve performance similar to the real-valued network. These ensembled decoders retain large savings in memory and computation while achieving performance similar to the real-valued TurboAE.
Index Terms:
Neural decoding, Deep Learning, Computation and memory efficiency.
I Introduction
The future wireless communication system 6G will be equipped not only with multi-band high-speed transmission but also with energy-efficient communication, low latency, and high security. In a digital communication system, physical-layer channel codes such as LDPC, Polar, and Turbo codes [9, 3] are used as channel coding methods [4] to protect the data from corruption by channel noise. When the channel deviates from the Gaussian setting in a practical scenario, Neural Networks (NN) have been used to design the decoder in order to exploit the power of the encoder, while the encoder is kept fixed as a near-optimal code [5]. Deploying decoders for these codes demands heavy computation, which has become feasible only because of recent advancements in signal processing methods. With a surge in the number of devices in the network, the interactions among them may result in excessive signal processing at the user end and hence high power consumption. Therefore, economical energy usage for longer battery life in mobile devices has been a research direction of utmost importance [17, 10, 11]. In a noisy channel, designing the encoder has been challenging even when the decoders perform well [16]; hence the authors in [13, 12, 18, 14] proposed neural codes where the encoder and decoder are jointly trained. To overcome the problem of convergence to a local minimum in this joint optimization, [7] proposed TurboAE, a Convolutional Neural Network (CNN) based over-complete Auto Encoder (AE) that incorporates interleavers and de-interleavers to achieve the performance of State Of The Art (SOTA) channel codes under the AWGN scenario. All existing neural AEs have real-valued network parameters and perform floating-point operations during deployment. For instance, the TurboAE architecture has millions of parameters, occupying several megabytes of memory with a 32-bit floating-point representation, and the iterative decoder accounts for the bulk of these parameters. Because of this huge number of parameters, deploying such AEs in a resource-limited Internet of Things (IoT) setup is a challenging task. Furthermore, with the advent of edge computing in IoT scenarios, computation is decentralized to edge devices where the data is processed locally. Realizing a Neural Decoder such as TurboAE [7] at a user end that has limited memory and computing power is therefore not practically feasible.
I-A Contributions
In the domain of wireless communication, the channel noise is real-valued, and to date only real-valued Neural Decoders, relying entirely on floating-point operations, have been used for end-to-end training. In this work, we explore extreme compression techniques for Machine Learning-based wireless decoders such as TurboAE. We further propose techniques that make the decoder memory and computation efficient while keeping its performance close to that of the real-valued decoder. The major contributions of our work are the following:
1. We propose to use binary filters/weights/biases and binary activations in the Neural Decoder to save memory and computation at the edge. Binary Neural Networks [6] take compression to the extreme by replacing 32-bit floating-point (FP) weights and activations with 1-bit values, giving a memory reduction of 32 times; the FP multiplication and addition operations are replaced with xnor and popcount operations, which reduces the computation cost radically at inference time.
2. The performance is further improved by the use of a Ternary Neural Network (TNN), where the weights take three levels while the activations remain binary. The proposed architectures with binary and ternary weights are shown to be better than a trained network quantized to 1 or 2 bits.
3. An ensemble of multiple weak binary and ternary decoders is then proposed and is shown to perform close to the real-valued TurboAE while still providing large savings in memory and a substantial speed-up from the reduced computation, thus enabling energy efficiency and low latency in edge communication.
II Extreme compression techniques
We denote a real-valued NN by $f(\cdot;\boldsymbol{\theta})$, where $\boldsymbol{\theta}$ represents the real-valued network parameters. The output of the NN is $\mathbf{y} = f(\mathbf{x};\boldsymbol{\theta})$, where the input feature $\mathbf{x}$ can be real-valued. The neural network can be of any type: fully connected, a CNN, or a Recurrent Neural Network (RNN). As TurboAE uses CNNs for the Neural Decoder, we focus on CNNs here. For a CNN of $L$ layers, the parameters are the filters $\boldsymbol{\theta} = \{\mathbf{W}^l\}_{l=1}^{L}$, where $\mathbf{W}^l \in \mathbb{R}^{c^l_{in} \times c^l_{out} \times k}$ for layer $l$ of a one-dimensional CNN. Here $c^l_{in}$ and $c^l_{out}$ represent the number of input and output channels and $k$ is the dimension of the filter. For a one-dimensional CNN as used in TurboAE, if the input to layer $l$ has spatial dimension $D^l$, then the input to layer $l$ is $\mathbf{A}^l \in \mathbb{R}^{c^l_{in} \times D^l}$. The output of layer $l$ is $\mathbf{A}^{l+1} \in \mathbb{R}^{c^l_{out} \times D^{l+1}}$, where $D^{l+1}$ is the spatial dimension of the output. For a Binary Neural Network (BNN), the weights and activations ($\mathbf{W}^l$ and $\mathbf{A}^l$) are binarized using the $\mathrm{sign}$ function before taking the convolution:
$$\mathrm{sign}(x) = \begin{cases} +1, & x \ge 0 \\ -1, & x < 0 \end{cases} \qquad (1)$$
The binarized parameters and activations are given by:
$$\mathbf{W}^l_b = \mathrm{sign}(\mathbf{W}^l), \qquad \mathbf{A}^l_b = \mathrm{sign}(\mathbf{A}^l). \qquad (2)$$
The real-valued convolution is approximated with binary weights and activations as $\mathbf{W}^l * \mathbf{A}^l \approx \mathbf{W}^l_b \circledast \mathbf{A}^l_b$, where $\circledast$ denotes convolution performed with bitwise operations. Even though the binarized weights are used for the forward pass, only the real-valued latent weights are updated with the real-valued gradients during backpropagation. During inference, these latent weights can be dropped and a network with binary weights and activations can be deployed. The $\mathrm{sign}$ function is non-differentiable and has zero gradient almost everywhere; thus it is not suitable for backpropagation during training. Therefore a straight-through estimator [2] is used that binarizes in the forward pass but, during backpropagation, simply passes the gradients through to the previous layers. For instance, if $z = \mathrm{sign}(r)$, then $\frac{\partial C}{\partial r} = \frac{\partial C}{\partial z}\,\mathbf{1}_{|r| \le 1}$, where $C$ is the cost function of the NN. To keep the updates stable during training, the updated real-valued weights are clipped to $[-1, 1]$.
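A minimal PyTorch-style sketch of this binarization with the straight-through estimator is shown below. The class names are illustrative assumptions, the bias is left real-valued for simplicity, and this is not the authors' released implementation.

```python
import torch

class BinarizeSTE(torch.autograd.Function):
    """sign() in the forward pass, clipped straight-through estimator in the backward pass."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        # sign() with the convention sign(0) = +1, as in Eq. (1)
        return torch.where(x >= 0, torch.ones_like(x), -torch.ones_like(x))

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        # pass the gradient through only where |x| <= 1
        return grad_out * (x.abs() <= 1).float()

binarize = BinarizeSTE.apply

class BinaryConv1d(torch.nn.Conv1d):
    """1D convolution whose latent real-valued weights and inputs are
    binarized on the fly during the forward pass."""

    def forward(self, x):
        w_b = binarize(self.weight)   # binary weights
        x_b = binarize(x)             # binary activations
        return torch.nn.functional.conv1d(
            x_b, w_b, self.bias, self.stride, self.padding)
```

After each optimizer step, the real-valued latent weights can be clipped with `p.data.clamp_(-1, 1)` to keep the updates stable, as described above.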
If a real-valued network is deployed on a 32-bit system, its binary version occupies 32 times less memory, and all floating-point operations can be converted to xnor and popcount operations. However, because of this extreme compression, the performance generally degrades significantly. Hence [8] proposed the Ternary Neural Network (TNN), where the weights are constrained to three levels. The ternarized parameter is given by:
$$\mathbf{W}^l_t = \begin{cases} +1, & \mathbf{W}^l > \Delta \\ 0, & |\mathbf{W}^l| \le \Delta \\ -1, & \mathbf{W}^l < -\Delta \end{cases} \qquad (3)$$
where $\Delta$ is a threshold computed in our architecture from the parameters of the real-valued network. The introduction of zero as a third level along with $\pm 1$ gives better representation power and therefore better performance than the BNN. The zero weights need not be stored during deployment, so the memory requirement of the TNN is the same as that of the BNN. Note that the activations are still binary, and thus the computational complexity is also the same as that of the BNN. Therefore, with a TNN, an improvement in performance over the BNN is possible without any increase in memory requirement or computation.
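A small sketch of the ternarization step is given below. The choice of a threshold proportional to the mean absolute weight follows common ternary-weight-network practice and is an assumption here, not a value taken from the text.

```python
import torch

def ternarize(w: torch.Tensor, delta_scale: float = 0.7) -> torch.Tensor:
    """Map real-valued weights to {-1, 0, +1} as in Eq. (3).
    The threshold is proportional to the mean absolute weight
    (the factor 0.7 is an illustrative assumption)."""
    delta = delta_scale * w.abs().mean()
    w_t = torch.zeros_like(w)
    w_t[w > delta] = 1.0
    w_t[w < -delta] = -1.0
    return w_t
```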
II-A Saving in computation
The convolution between the real-valued $\mathbf{W}^l$ and $\mathbf{A}^l$ at layer $l$ results in an output $\mathbf{A}^{l+1} \in \mathbb{R}^{c^l_{out} \times D^{l+1}}$. The total number of multiplications for layer $l$ is $c^l_{in} c^l_{out} k D^{l+1}$, and the number of additions is roughly the same. The total count of FLoating Point Operations (FLOPs) for layer $l$ of a real-valued 1D-CNN, the sum of the multiplications and additions, is therefore roughly twice the number of multiplications, i.e., about $2\, c^l_{in} c^l_{out} k D^{l+1}$. For the binary counterpart, as the weights and activations are constrained to $+1$ or $-1$, the 32-bit floating-point multiply-accumulate operations are replaced by 1-bit xnor-popcount operations [6]. Note that modern CPUs can perform a single floating-point multiplication and addition in one clock cycle, whereas many 1-bit xnor-popcount operations can be packed into a single clock cycle; a speedup of nearly 58 times for binary convolutions is reported in [15]. Because the filters take only $+1$ or $-1$, only a limited number of distinct filters are possible, so with a BNN the filter repetition can be exploited using dedicated hardware/software. The implementation on GPU can be made faster by using SIMD within a register (SWAR), where many binary variables are concatenated in a single register and a proportional speedup on bitwise operations like xnor is achieved.
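The operation counts above can be illustrated with a short helper. The packing factor of 64 one-bit operations per cycle is an assumption about the target register width, not a measured figure.

```python
def conv1d_flops(c_in: int, c_out: int, k: int, d_out: int) -> int:
    """Approximate FLOPs of a real-valued 1D convolution layer:
    one multiply and one add per weight per output position."""
    return 2 * c_in * c_out * k * d_out

def binary_op_count(c_in: int, c_out: int, k: int, d_out: int,
                    ops_per_cycle: int = 64) -> float:
    """Equivalent count of packed xnor-popcount operations when
    `ops_per_cycle` 1-bit operations are executed per clock cycle
    (the register width is an assumption)."""
    return conv1d_flops(c_in, c_out, k, d_out) / ops_per_cycle

# Example: one decoder-style layer with 100 input/output channels,
# kernel size 5, and output length 100.
print(conv1d_flops(100, 100, 5, 100))      # ~10 million FLOPs
print(binary_op_count(100, 100, 5, 100))   # far fewer packed binary operations
```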
III TurboAE and its binarized versions
The method of channel coding in TurboAE can be divided into three sub-problems: an encoder at the transmitter, a channel, and a decoder at the receiver. In a communication system, the encoder encodes the binary message $\mathbf{u} \in \{0,1\}^K$ of block length $K$ to produce the codeword $\mathbf{x}$ of length $N$ such that the codeword satisfies the power constraint. The code rate is $R = K/N$, where $K < N$. The i.i.d. AWGN channel corrupts the encoded bits and generates $y_i = x_i + z_i$ with $z_i \sim \mathcal{N}(0, \sigma^2)$ for $i = 1, \dots, N$. The noise level of the AWGN channel is expressed through the signal-to-noise ratio (SNR). After transmission through the channel, the decoder receives the real-valued noisy encoded bits $\mathbf{y}$ and maps them to an estimate $\hat{\mathbf{u}}$ of the actual message sequence using a decoding algorithm. Channel coding aims to minimize the Bit Error Rate (BER) or the BLock Error Rate (BLER) of the recovered message, given by $\mathrm{BER} = \frac{1}{K}\sum_{k=1}^{K}\Pr(\hat{u}_k \neq u_k)$ and $\mathrm{BLER} = \Pr(\hat{\mathbf{u}} \neq \mathbf{u})$. Naively applying deep learning models by replacing the encoder and decoder with general-purpose neural networks does not perform well. So in [7], the authors proposed TurboAE with interleaved encoding and iterative decoding using 1D convolutional neural networks. To make the Neural Decoder usable at the edge, we first propose to binarize and ternarize the iterative decoder of TurboAE and inspect its performance. We briefly describe the TurboAE architecture before discussing the proposed compression techniques.
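For concreteness, the channel and the error-rate metrics above can be written as a short sketch. The SNR-to-noise-variance mapping shown is a common convention for unit-power signals and is an assumption here, not taken from [7].

```python
import torch

def awgn(x: torch.Tensor, snr_db: float) -> torch.Tensor:
    """Add i.i.d. Gaussian noise to a unit-power codeword,
    assuming sigma^2 = 10^(-snr_db/10)."""
    sigma = 10 ** (-snr_db / 20)
    return x + sigma * torch.randn_like(x)

def ber(u_hat: torch.Tensor, u: torch.Tensor) -> float:
    """Bit error rate between decoded bits and the transmitted message."""
    return (u_hat != u).float().mean().item()

def bler(u_hat: torch.Tensor, u: torch.Tensor) -> float:
    """Block error rate: fraction of blocks with at least one bit error."""
    return (u_hat != u).any(dim=-1).float().mean().item()
```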
Turbo code is one of the first capacity-approaching codes; it is based on a recursive systematic convolutional (RSC) code that admits an optimal decoding algorithm, namely Bahl-Cocke-Jelinek-Raviv (BCJR) [1]. To add long-range memory to the code, interleaving is used: of two copies of the input bits, the first passes through the RSC code and the second goes through the interleaver before passing through the same RSC code, as shown in Fig. 1 (left). After transmission through the channel, the code is decoded by repeating (i) and (ii) alternately: (i) soft decoding based on the signal received for the first copy, and (ii) using the de-interleaved version as a prior for decoding the second copy, as shown in Fig. 1 (right). This iterative decoding method keeps re-estimating the posterior distribution of the transmitted bits. Both the interleaved encoder and the iterative decoder are learnable, as proposed in TurboAE [7]. The interleaver and the de-interleaver shuffle and un-shuffle the input sequence with a random interleaving array known to both the encoder and the decoder. A code rate of $1/3$ is considered for the interleaved encoder, which has three learnable blocks $f_1$, $f_2$, and $f_3$. The first two take the original message bits $\mathbf{u}$ to produce $\mathbf{x}_1$ and $\mathbf{x}_2$, whereas the third block takes the interleaved message $\pi(\mathbf{u})$ to return $\mathbf{x}_3$, as shown in Fig. 1. The encoded messages are transmitted through the channel, and the received noisy messages are $\mathbf{y}_1$, $\mathbf{y}_2$, and $\mathbf{y}_3$. Our focus is mostly on the compression of the iterative decoder so that it can be deployed at edge devices; we therefore do not discuss the encoder in detail in this work. Interested readers may refer to [7] for more details on the encoder.
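For reference, the rate-1/3 interleaved encoding can be sketched as follows; `f1`, `f2`, `f3` stand in for the learnable CNN encoding blocks, `perm` is the shared interleaving index array, and the power normalization is omitted. This is an illustrative sketch, not the TurboAE implementation.

```python
import torch

def interleaved_encode(u, f1, f2, f3, perm):
    """Rate-1/3 interleaved encoding in the style of TurboAE (sketch)."""
    x1 = f1(u)             # first branch: original message
    x2 = f2(u)             # second branch: original message
    x3 = f3(u[..., perm])  # third branch: interleaved message
    return torch.cat([x1, x2, x3], dim=1)  # codeword of length N = 3K (power norm omitted)
```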
III-A Binary and Ternary iterative decoder
Considering $M$ iterations of the iterative decoder, each iteration consists of two decoders. The first decoder in an iteration takes the original noisy messages $\mathbf{y}_1$, $\mathbf{y}_2$ and the prior distribution on the transmitted bits, and returns a posterior that goes to the second decoder via interleaving along with the interleaved noisy message $\pi(\mathbf{y}_1)$ and $\mathbf{y}_3$. In the proposed binarized and ternarized TurboAE, named BinTurboAE and TernTurboAE respectively, the real-valued decoders are replaced with binary and ternary decoders. For ease of notation, we represent the complete binary decoder by $g_b$ and the ternary decoder by $g_t$. The main limitation of BinTurboAE and TernTurboAE is that they do not perform as well as the real-valued TurboAE. But in applications where a degradation in performance is acceptable in exchange for reduced computation and better energy efficiency, BinTurboAE or TernTurboAE can be deployed at edge devices. As the performance of BinTurboAE is not as good as its real counterpart, each such decoder can be thought of as a single weak learner. Instead of relying on a single weak learner, we further propose to ensemble a set of weak learners' outcomes to obtain performance as good as that of the real-valued network, but with a much lower complexity and memory requirement.
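A minimal sketch of one pass through such an iterative decoder is given below. The argument names, tensor shapes, and the final sigmoid are illustrative assumptions rather than the exact TurboAE implementation; the per-iteration decoders may be real-valued, binary, or ternary CNN blocks.

```python
import torch

def iterative_decode(dec1_list, dec2_list, y1, y2, y3, perm, inv_perm, iters=6):
    """One forward pass of a TurboAE-style iterative decoder (sketch).
    dec1_list/dec2_list hold the per-iteration CNN decoder blocks;
    perm/inv_perm are the interleaving and de-interleaving index arrays."""
    prior = torch.zeros_like(y1)  # flat prior on the transmitted bits
    for i in range(iters):
        # first decoder: non-interleaved copies plus the current prior
        post = dec1_list[i](torch.cat([y1, y2, prior], dim=1))
        # second decoder: interleaved copies plus the interleaved posterior
        post_i = dec2_list[i](torch.cat([y1[..., perm], y3, post[..., perm]], dim=1))
        prior = post_i[..., inv_perm]  # de-interleave for the next iteration
    return torch.sigmoid(prior)        # soft estimate of the message bits
```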
III-B Proposed Ensembled binary TurboAE
Considering each such decoder a weak learner, several weak learners (four in our experiments) are trained separately on the complete dataset. The idea of "ensembling" is to combine the opinions of all these weak learners to arrive at a better prediction. One of the many ways weak learners can be ensembled is Bagging [19]. In this work, we propose to ensemble BinTurboAEs with the Bagging method and denote the resulting decoder BinTurboAE-Bag; the same with TernTurboAEs is called TernTurboAE-Bag. Bagging is used in machine learning to improve stability and accuracy and to reduce variance. In the Bagging method, the decisions from each of these BinTurboAEs are averaged to obtain the final prediction, as shown in Fig. 2.
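A sketch of the bagging step is shown below, reusing the `iterative_decode` sketch from Section III-A; the dictionary layout of the ensemble members is an assumption for illustration.

```python
import torch

def bagged_decode(ensemble, y1, y2, y3, perm, inv_perm):
    """Bagging over independently trained weak decoders (e.g., four BinTurboAEs):
    average their soft bit estimates and take one hard decision at the end."""
    soft = torch.stack([
        iterative_decode(m["dec1"], m["dec2"], y1, y2, y3, perm, inv_perm)
        for m in ensemble
    ])
    p = soft.mean(dim=0)          # averaged posterior over the ensemble
    return (p > 0.5).float()      # hard decision on the message bits
```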
IV Experiments
To validate the usefulness of the proposed compression techniques, we consider the setting used in [7] to train the encoder and decoder networks. A large batch size, preferably 500 or more, is used to average out the channel noise effects. We train the encoder and decoder separately to avoid possible local optima. BinTurboAE and TernTurboAE need a smaller learning rate than the real-valued TurboAE; hence we reduce the learning rate by a factor of 10 whenever the validation loss stays saturated over several training epochs. The hyper-parameters used in our experiments are shown in Table I.
Table I: Hyper-parameters used for training.

| Hyper-parameter | Value |
|---|---|
| Loss | Binary Cross-Entropy (BCE) |
| Encoder | 2-layer 1D-CNN, kernel size 5, 100 filters per learnable encoding block |
| Decoder | 5-layer 1D-CNN, kernel size 5, 100 filters per learnable decoding block |
| Decoder iterations | 6 |
| Info feature size F | 5 |
| Batch size | 500 |
| Optimizer | Adam |
| Learning rate | 0.0001 initially, reduced by a factor of 10 when the test loss saturates over several epochs |
| Block length K | 100 |
| Number of epochs | 800 |
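Below is a minimal, self-contained training-loop sketch wired up with the Table I hyper-parameters. The tiny stand-in encoder/decoder modules, the training SNR, and the scheduler patience are assumptions for illustration only and do not reproduce the TurboAE blocks.

```python
import torch

K, BATCH, EPOCHS, RATE = 100, 500, 800, 3   # block length, batch size, epochs, 1/R branches

encoder = torch.nn.Conv1d(1, RATE, kernel_size=5, padding=2)          # placeholder encoder
decoder = torch.nn.Sequential(                                        # placeholder decoder
    torch.nn.Conv1d(RATE, 100, 5, padding=2), torch.nn.ELU(),
    torch.nn.Conv1d(100, 1, 5, padding=2))

loss_fn   = torch.nn.BCEWithLogitsLoss()
optimizer = torch.optim.Adam(decoder.parameters(), lr=1e-4)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.1, patience=20)   # patience is an assumption

for epoch in range(EPOCHS):
    u = torch.randint(0, 2, (BATCH, 1, K)).float()    # random message bits
    x = encoder(2 * u - 1)                             # encoder kept fixed in this sketch
    y = x + 10 ** (-1.0 / 20) * torch.randn_like(x)    # AWGN at an illustrative SNR
    loss = loss_fn(decoder(y), u)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step(loss.item())   # LR /10 when the (validation) loss plateaus
```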
Table II: Memory and computation savings of the proposed decoders relative to the real-valued TurboAE.

| Model | Memory savings | Computation | Speed up | BER at fixed SNR |
|---|---|---|---|---|
| Full-precision DNN | 1x | FLOPs | 1x | |
| QuantTurboAE (q-bit) | 32/q x | FLOPs | 1x | (reported for q=4) |
| BinTurboAE | 32x | xnor-count | | |
| TernTurboAE | 32x | xnor-count | | |
| (Bin/Tern)TurboAE-Bag (n=4) | 32/n x | xnor-count | | |
IV-A Results
We present results showing performance in terms of BER vs. SNR for the proposed BinTurboAE and TernTurboAE and compare them with QuantTurboAE, a TurboAE quantized after training. For QuantTurboAE, the parameters of the trained TurboAE are quantized to different bit-widths, i.e., 8-bit, 4-bit, 2-bit, and 1-bit, giving memory savings of 4, 8, 16, and 32 times respectively compared to the real-valued TurboAE network, as shown in Table II. QuantTurboAE does not offer any saving in computation, unlike our proposed method. Post-training quantization with the larger bit-widths performs as well as the original TurboAE, but the 2-bit and 1-bit quantizations perform very poorly, as shown in Fig. 3. However, if instead of quantizing after training the network is trained with 1-bit quantization, as in BinTurboAE, it outperforms the 2-bit and 1-bit QuantTurboAEs. The Ternary network improves the BER performance further and performs similarly to a QuantTurboAE that stores each parameter with more bits, whereas TernTurboAE effectively uses only 1 bit. Therefore, compared to the real-valued TurboAE, both the binary and the ternary variants reduce the memory requirement by about 32 times and reduce the computation by converting all floating-point operations at the decoder to xnor and popcount operations. The performance gap between the proposed methods and TurboAE still exists and needs attention. To close this gap, several BinTurboAEs are ensembled as weak learners; the resulting performance is shown in Fig. 4.
The ensemble of just four BinTurboAEs implemented with the bagging method performs much better than a single BinTurboAE. BinTurboAE-Bag even outperforms the real-valued network in the low SNR region. The performance of TernTurboAE-Bag is slightly better than that of BinTurboAE-Bag, as shown in the figure. In the high SNR region, BinTurboAE-Bag performs close to the real TurboAE. This result is significant because BinTurboAE-Bag saves a lot of memory (about 8 times) and computation (FLOPs are replaced with xnor-count operations) at the edge device without compromising the BER performance.
IV-B Computation and memory savings at the edge devices
Decoding usually happens at the edge device. In the real TurboAE, the iterative decoder has a huge number of parameters that occupy a lot of memory, and it involves floating-point operations that make computation slow at the edge. Our main goal is therefore to reduce the memory requirement and the computation on the decoder side of TurboAE so that the proposed decoders are suitable for deployment at the edge. The savings for each of the proposed techniques are shown in Table II. BinTurboAE and TernTurboAE take up 32 times less memory than the real-valued TurboAE, while BinTurboAE-Bag takes four times the memory of a single BinTurboAE.
The decoder of the real TurboAE performs a very large number of FLOPs at the edge device. Even though the memory saving of a post-training quantized network can be substantial, QuantTurboAE and TurboAE do not speed up the computation because the operations are still carried out in floating point. As the Binary, Ternary, and Ensembled TurboAEs convert all the floating-point operations to bitwise operations only, their computation is extremely fast with much lower power consumption. Since many bitwise operations can be performed in a single clock cycle, the binary and ternary networks are many times faster, leading to very low latency compared with the real TurboAE network. Even though the computation in BinTurboAE-Bag is four times that of BinTurboAE, if parallel processing is available at the edge, BinTurboAE-Bag can be as fast as BinTurboAE.
V Conclusion
In summary, we propose BinTurboAE and TernTurboAE with the intention of deploying end-to-end channel coding on the targeted low-power edge devices, reducing the memory requirement by nearly 32 times and replacing floating-point computation with bitwise operations, at the cost of an acceptable performance degradation. We then propose BinTurboAE-Bag and TernTurboAE-Bag to improve the performance offered by a single BinTurboAE or TernTurboAE respectively and to approach the performance of the real-valued network. The ensembled technique implemented with four such weak learners is shown to consume about 8 times less memory and far less computing power than the real-valued TurboAE with nearly similar performance.
References
- [1] Lalit Bahl, John Cocke, Frederick Jelinek, and Josef Raviv. Optimal decoding of linear codes for minimizing symbol error rate (corresp.). IEEE Transactions on Information Theory, 20(2):284–287, 1974.
- [2] Yoshua Bengio, Nicholas Léonard, and Aaron Courville. Estimating or propagating gradients through stochastic neurons for conditional computation. arXiv preprint arXiv:1308.3432, 2013.
- [3] Claude Berrou, Alain Glavieux, and Punya Thitimajshima. Near Shannon limit error-correcting coding and decoding: Turbo-codes. 1. In Proceedings of ICC'93 - IEEE International Conference on Communications, volume 2, pages 1064–1070. IEEE, 1993.
- [4] Sepehr Dehdashtian, Matin Hashemi, and Saber Salehkaleybar. Deep-learning-based blind recognition of channel code parameters over candidate sets under AWGN and multi-path fading conditions. IEEE Wireless Communications Letters, 10(5):1041–1045, 2021.
- [5] Nghia Doan, Seyyed Ali Hashemi, and Warren J Gross. Neural successive cancellation decoding of polar codes. In 2018 IEEE 19th international workshop on signal processing advances in wireless communications (SPAWC), pages 1–5. IEEE, 2018.
- [6] Itay Hubara, Matthieu Courbariaux, Daniel Soudry, Ran El-Yaniv, and Yoshua Bengio. Binarized neural networks. In Advances in Neural Information Processing Systems, volume 29, 2016.
- [7] Yihan Jiang, Hyeji Kim, Himanshu Asnani, Sreeram Kannan, Sewoong Oh, and Pramod Viswanath. Turbo autoencoder: Deep learning based channel codes for point-to-point communication channels. In Advances in Neural Information Processing Systems, volume 32, 2019.
- [8] Fengfu Li, Bo Zhang, and Bin Liu. Ternary weight networks. arXiv preprint arXiv:1605.04711, 2016.
- [9] David JC MacKay and Radford M Neal. Near Shannon limit performance of low density parity check codes. Electronics Letters, 32(18):1645–1646, 1996.
- [10] Rajarshi Mahapatra, Yogesh Nijsure, Georges Kaddoum, Naveed Ul Hassan, and Chau Yuen. Energy efficiency tradeoff mechanism towards wireless green communication: A survey. IEEE Communications Surveys & Tutorials, 18(1):686–705, 2015.
- [11] Nancy Nayak, Thulasi Tholeti, Muralikrishnan Srinivasan, and Sheetal Kalyani. Green detnet: Computation and memory efficient detnet using smart compression and training. arXiv preprint arXiv:2003.09446, 2020.
- [12] Timothy J O'Shea, Tugba Erpek, and T Charles Clancy. Deep learning based MIMO communications. arXiv preprint arXiv:1707.07980, 2017.
- [13] Vishnu Raj and Sheetal Kalyani. Backpropagating through the air: Deep learning at physical layer without channel models. IEEE Communications Letters, 22(11):2278–2281, 2018.
- [14] Vishnu Raj and Sheetal Kalyani. Design of communication systems using deep learning: A variational inference perspective. IEEE Transactions on Cognitive Communications and Networking, 6(4):1320–1334, 2020.
- [15] Mohammad Rastegari, Vicente Ordonez, Joseph Redmon, and Ali Farhadi. Xnor-net: Imagenet classification using binary convolutional neural networks. In European conference on computer vision, pages 525–542. Springer, 2016.
- [16] Kirty Vedula, Randy Paffenroth, and D Richard Brown. Joint coding and modulation in the ultra-short blocklength regime for Bernoulli-Gaussian impulsive noise channels using autoencoders. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2020, pages 5065–5069. IEEE, 2020.
- [17] Ming Zhan, Zhibo Pang, Ming Xiao, and Hong Wen. A state metrics compressed decoding technique for energy-efficient turbo decoder. EURASIP Journal on Wireless Communications and Networking, 2018(1):1–7, 2018.
- [18] Banghua Zhu, Jintao Wang, Longzhuang He, and Jian Song. Joint transceiver optimization for wireless communication PHY using neural network. IEEE Journal on Selected Areas in Communications, 37(6):1364–1373, 2019.
- [19] Shilin Zhu, Xin Dong, and Hao Su. Binary ensemble neural network: More bits per network or more networks per bit? In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4923–4932, 2019.