
Joint Activity Detection, Channel Estimation, and Data Decoding for Grant-free Massive Random Access

Xinyu Bian,  Yuyi Mao,  and Jun Zhang,  Manuscript received 13 April 2022; revised 28 December 2022; accepted 3 February 2023. This work was supported in part by the General Research Fund (Project No. 15207220) from the Hong Kong Research Grants Council and in part by a start-up fund of the Hong Kong Polytechnic University (Project ID P0038174). This paper was presented in part at the 2021 IEEE International Workshop on Signal Processing Advances in Wireless Communications (SPAWC) [1]. (Corresponding author: Yuyi Mao.) X. Bian and J. Zhang are with the Department of Electronic and Computer Engineering, the Hong Kong University of Science and Technology, Hong Kong (E-mail: [email protected], [email protected]). Y. Mao is with the Department of Electronic and Information Engineering, the Hong Kong Polytechnic University, Hong Kong (E-mail: [email protected]).
Abstract

In the massive machine-type communication (mMTC) scenario, a large number of devices with sporadic traffic need to access the network on limited radio resources. Recently, grant-free random access has emerged as a promising mechanism for this challenging scenario, but its potential has not been fully unleashed. In particular, the available auxiliary information has not been fully exploited, including the common sparsity pattern in the received pilot and data signal, as well as the channel decoding information. This paper develops advanced receivers in a holistic manner to improve the massive access performance by jointly designing activity detection, channel estimation, and data decoding. To tackle the algorithmic and computational challenges, a turbo structure is adopted at the joint receiver. For performance enhancement, all the received symbols are utilized to jointly estimate the channel state, user activity, and soft data symbols, which effectively exploits the common sparsity pattern. Meanwhile, the extrinsic information from the channel decoder assists the joint channel estimation and data detection. To reduce the complexity, a low-cost side information (SI)-aided receiver is also proposed, where the channel decoder provides side information to update the estimates of whether each user is active. Simulation results show that the turbo receiver effectively reduces the activity detection, channel estimation, and data decoding errors, supporting twice as many active users as a separate design that disregards the common sparsity. In addition, the SI-aided receiver notably outperforms conventional methods at a relatively low complexity.

Index Terms:
Grant-free massive random access, massive machine-type communication (mMTC), user activity detection, channel estimation, channel coding, approximate message passing (AMP), turbo receiver.

I Introduction

The upsurge of numerous Internet of Things (IoT) applications, e.g., autonomous vehicles, intelligent robots, smart city, and Industry 4.0, is boosting a rapid paradigm shift of wireless communications from connecting people to connecting things [2]. It is estimated by Cisco that the share of machine-to-machine (M2M) communications will increase from 33% in 2018 to 50% in 2023 [3], leading to an unprecedented requirement of ubiquitous and scalable connectivity. In order to fulfill such an urgent need, massive machine-type communications (mMTC) has been identified as one of the key application scenarios of the fifth-generation (5G) cellular networks by the third-generation partnership project (3GPP) [4]. The most distinctive feature of mMTC is that a huge number of devices are simultaneously connected to a base station (BS), while only a small proportion of them are active for transmitting a short data packet at each time [5]. This brings forth significant challenges that cannot be met by existing wireless technologies. In particular, uplink access in cellular networks is traditionally controlled by grant-based random access (RA) mechanisms, where each user first initiates an RA procedure by transmitting a scheduling request (SR) to its serving BS, and it cannot start data transmission until the request is granted [6]. Nevertheless, due to the limited preamble sequences for SR, grant-based RA suffers from potential collisions of the connection requests when two or more users pick the same sequence, especially with massive concurrent requests [7]. Although complementary techniques such as power ramping, back-off, and access class barring [8] can be utilized to alleviate access collisions, they incur long access latency and significant signalling overhead, which are unfavorable for mMTC [9, 10]. Therefore, more efficient RA mechanisms are needed to meet the stringent latency and reliability requirements of mMTC.

Grant-free RA, which allows users to directly send messages without waiting for access permissions from the BS [11], is widely acknowledged as a promising alternative for massive RA. In contrast to grant-based RA, where each user selects a random pilot sequence at each time slot, grant-free RA assigns a fixed and unique pilot sequence to each user to enable contention-free uplink access. Nevertheless, the BS does not have knowledge of the set of transmitting (i.e., active) users in grant-free RA, making it arduous to perform accurate channel estimation and data reception. Consequently, detecting the set of transmitting users (i.e., user activity detection) at the BS becomes a new and critical task for grant-free massive RA [12].

I-A Related Works and Motivations

The popular pilot-based grant-free RA protocol is considered in this paper, where the active users transmit their unique pilot sequences on dedicated radio resources followed by data symbols, and the BS identifies the set of active users and decodes their data. Owing to the massive number of devices and the limited resources for pilot transmission, it is infeasible to assign orthogonal pilot sequences to all the devices, rendering conventional collision avoidance mechanisms not readily applicable. Fortunately, user activity detection in mMTC turns out to be a compressive sensing (CS) problem [13] thanks to the sporadic data traffic pattern, for which many efficient algorithms are available [14, 15, 16, 17]. For instance, by formulating user activity detection as a maximum likelihood (ML) estimation problem, low-complexity algorithms based on sample covariance matrices of the received pilot signal were proposed for massive multi-input multi-output (MIMO) systems in [18]. The user activity detection accuracies in massive and cooperative MIMO systems were analyzed in [19] with the approximate message passing (AMP) algorithm. Nevertheless, early studies on grant-free massive RA only focused on user activity detection but neglected the data reception performance, which motivated investigations of multi-user detection (MUD) for massive connectivity [20, 21]. In particular, based on orthogonal matching pursuit (OMP), a support detection algorithm was proposed for joint activity and data detection in grant-free non-orthogonal multiple access (NOMA) systems in [20]. A similar problem was tackled by fusing the expectation maximization (EM) algorithm with AMP in [21], which leverages prior information of the transmitted data symbols in addition to the sparse user activity pattern. However, these works assume full channel state information (CSI) at the BS, which is idealized and impractical for grant-free massive RA.

To bridge this gap, joint activity detection and channel estimation (JADCE) has become an emerging theme for grant-free massive RA. It was shown in [22] that JADCE can be formulated as a single measurement vector (SMV) or multiple measurement vector (MMV) CS problem for a single- and multi-antenna BS, respectively, and both problems can be efficiently solved by AMP. The false alarm and missed detection probabilities were also characterized in [22]. Interestingly, a subsequent investigation [23] revealed that the activity detection error can be made arbitrarily small with sufficient BS antennas. Besides, characteristics of the wireless channels have been utilized together with the sparse user activity pattern to improve the accuracy of JADCE [24, 25]. In particular, relying on spatial- and angular-domain channel models, respectively, user activity detection and channel estimation algorithms were developed for orthogonal frequency division multiplexing (OFDM) massive MIMO systems in [24] to achieve considerably improved access performance. Beyond those, low-complexity JADCE algorithms were proposed based on dimension reduction [26] and deep learning [27].

While the aforementioned JADCE algorithms exploit the sparsity pattern in the received pilot signal, a common sparsity pattern inherently embedded in both the received pilot and data signal could be further utilized to enhance the performance. The benefits of exploiting such a common sparsity pattern were first demonstrated in [28] for massive RA systems with a single-antenna BS, where a joint activity detection, channel estimation, and MUD algorithm was proposed under the framework of MMV CS. An extension to multi-antenna BSs was conducted in [29] via the bilinear generalized AMP (BiG-AMP) algorithm [17]. These preliminary attempts, however, were limited to uncoded transmissions and failed to take advantage of channel coding in modern digital communication systems. Specifically, the error detection mechanism of channel codes can be utilized to determine a subset of active users with high channel quality [30]. Besides, the soft decoding results, which carry posterior information of the transmitted data symbols, are valuable for improving the accuracy of activity detection, channel estimation, and MUD. However, the joint detection/decoding problem is computationally infeasible for channel codes even with reasonable block lengths [31]. Thus, it necessitates a holistic investigation of how channel decoders can be effectively integrated with other critical components in a massive RA receiver, including user activity detection, channel estimation, and MUD, which will be pursued in this paper.

I-B Contributions

In this paper, we endeavor to push the performance limit of uplink receivers for grant-free massive RA by leveraging both the common sparsity pattern and channel decoding results. Our main contributions lie in developing effective methods to tackle the algorithmic and computational challenges of the joint design of activity detection, channel estimation, and data decoding, as summarized below.

  • We propose a turbo receiver for joint activity detection, channel estimation, and data decoding, which iterates between a joint estimator and a channel decoder. In order to exploit the common sparsity pattern in the received pilot and data signal, the joint estimator for joint activity detection, channel estimation, and data symbol detection is developed by solving a bilinear inference problem based on the BiG-AMP algorithm. To boost its performance, we further leverage the posterior log-likelihood ratios (LLRs) of the data bits from the channel decoder to derive the extrinsic information, which serves as updated estimates of the user activity and the data symbol distribution and is used as prior information for the next turbo iteration.

  • Although the turbo receiver is effective in exploiting the common sparsity pattern, the BiG-AMP-based joint estimator incurs significant computation overhead. To facilitate fast execution while retaining the performance gain of the iterative receiver, we develop a side information (SI)-aided receiver that executes a sequential estimator and a channel decoder alternately. The sequential estimator is developed for JADCE based on the AMP algorithm, which processes only the received pilot signal, leaving data symbol detection to a minimum mean square error (MMSE) equalizer. To effectively leverage the common sparsity and channel decoding results, the estimates of whether each user is active, derived according to the parity check results and posterior LLRs, are used as SI for the sequential estimator.

  • Simulation results show that the turbo receiver significantly reduces the activity detection, channel estimation, and data decoding errors compared with the baseline schemes. Remarkably, in the simulated setting, assuming a block error rate (BLER) requirement of $10^{-3}$, the turbo receiver is able to support 40 active users while the separate design can only support 20. Meanwhile, the SI-aided receiver saves more than 60% of the execution time compared with the turbo receiver, while maintaining a noticeable performance improvement over a data-assisted design that only leverages the common sparsity pattern.

I-C Organization

The rest of this paper is organized as follows. We introduce the system model in Section II. A turbo receiver for joint activity detection, channel estimation, and data decoding is developed in Section III. In Section IV, we propose a low-complexity SI-aided receiver. Simulation results are presented in Section V, and Section VI concludes this paper.

I-D Notations

We use lower-case letters, bold-face lower-case letters, bold-face upper-case letters, and calligraphic letters to denote scalars, vectors, matrices, and sets, respectively. The entry in the $i$-th row and $j$-th column of matrix $\mathbf{M}$ is denoted as $m_{ij}$, and the matrix transpose, complex conjugate, and conjugate transpose operators are denoted as $(\cdot)^{\mathrm{T}}$, $(\cdot)^{*}$, and $(\cdot)^{\mathrm{H}}$, respectively. Besides, $\mathbf{M}_{\backslash i,j}$ represents all the elements in matrix $\mathbf{M}\triangleq[m_{ij}]$ except $m_{ij}$. In addition, $\exp(\cdot)$ denotes the exponential function, $\delta(\cdot)$ the Dirac delta function, $\lfloor\cdot\rfloor$ the floor function, and $\mathcal{CN}(x;\mu,v)$ the probability density function (PDF) of a complex Gaussian random variable $x$ with mean $\mu$ and variance $v$.

II System Model

We consider an uplink cellular system as shown in Fig. 1, where a BS with $M$ antennas serves $N$ single-antenna users. The users are assumed to have short data packets to transmit occasionally, and at each time instant, $K$ ($K\leq N$) among the $N$ users become active for transmission. Denote $u_{n}\in\{0,1\}$ as the user activity indicator, where $u_{n}=1$ means user $n$ is active and $u_{n}=0$ if it is inactive. The sets of system users and active users are represented by $\mathcal{N}\triangleq\{1,\cdots,N\}$ and $\Xi\triangleq\{n\in\mathcal{N}\,|\,u_{n}=1\}$, respectively, and the set of BS antennas is denoted as $\mathcal{M}\triangleq\{1,\cdots,M\}$. Besides, the number of BS antennas is assumed to be no less than the number of active users, i.e., $M\geq K$, to prevent the system from being overloaded [32].

Each transmission block contains $T$ symbol intervals, and we assume quasi-static block fading channels, i.e., the channel state remains unchanged within a transmission block but varies independently across blocks. The uplink channel vector from user $n$ to the BS is modeled as $\mathbf{f}_{n}=\sqrt{\beta_{n}}\bm{\alpha}_{n}$, where $\bm{\alpha}_{n}$ and $\beta_{n}$ denote the small-scale and large-scale fading coefficients, respectively. We focus on Rayleigh fading channels, i.e., $\bm{\alpha}_{n}\sim\mathcal{CN}(\bm{0},\mathbf{I}_{M})$ (with slight abuse of notation, $\mathcal{CN}(\bm{\mu},\bm{\Sigma})$ also denotes the complex Gaussian distribution with mean $\bm{\mu}$ and covariance matrix $\bm{\Sigma}$), and assume that the users are static with $\{\beta_{n}\}$'s known by the BS [22].

A grant-free RA scheme where each transmission block is divided into two phases, as shown in Fig. 1, is adopted for uplink transmission. Specifically, in the first phase, $L$ symbols, denoted as $\mathbf{T}_{p}$, are reserved for pilot transmission, which are essential for user activity detection and channel estimation at the BS, whereas the remaining $L_{d}\triangleq T-L$ symbols, denoted as $\mathbf{T}_{d}$, are used for payload delivery in the second phase. It is important to note that although orthogonal pilot sequences are most beneficial for accurate channel estimation, they are infeasible in mMTC since the number of users can be much larger than the pilot length, i.e., $N\gg L$. As a result, we assign the users a set of non-orthogonal and unique pilot sequences $\{\mathbf{x}_{pn}\}$'s by sampling a complex Gaussian distribution, i.e., $\mathbf{x}_{pn}\triangleq[x_{n1},\cdots,x_{nL}]$ with $x_{nl}\sim\mathcal{CN}(0,1)$, which achieves asymptotic orthogonality when $L$ is sufficiently large. Define $\mathbf{X}_{p}\triangleq[\mathbf{x}_{p1},\cdots,\mathbf{x}_{pN}]^{\mathrm{T}}$ as the collection of pilot sequences.
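The Gaussian pilot construction above is easy to visualize numerically. The sketch below (an illustrative NumPy snippet, not the paper's code; `gaussian_pilots` is a hypothetical helper) draws i.i.d. $\mathcal{CN}(0,1)$ pilot entries and shows empirically that the normalized cross-correlation between two users' pilots shrinks as $L$ grows, which is the asymptotic orthogonality the text refers to.

```python
import numpy as np

rng = np.random.default_rng(0)

def gaussian_pilots(n_users: int, pilot_len: int) -> np.ndarray:
    """Draw i.i.d. complex Gaussian pilots x_{nl} ~ CN(0, 1), unit power."""
    return (rng.standard_normal((n_users, pilot_len))
            + 1j * rng.standard_normal((n_users, pilot_len))) / np.sqrt(2)

# Normalized cross-correlation between two users' pilots shrinks as L grows,
# roughly as 1/sqrt(L) on average.
for L in (16, 256, 4096):
    Xp = gaussian_pilots(2, L)
    rho = abs(np.vdot(Xp[0], Xp[1])) / L
    print(f"L = {L:5d}: |<x_1, x_2>| / L = {rho:.3f}")
```

The decay of the printed correlation with $L$ illustrates why long non-orthogonal pilots still allow the users to be separated.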

Figure 1: System model and the transmission block structure.

In each transmission block, $N_{b}$ payload bits, denoted as $\bm{b}_{n}\triangleq[b_{n1},\cdots,b_{nN_{b}}]$, $n\in\Xi$, need to be transmitted for each active user, which are encoded for error detection and correction. Following contemporary communication standards such as long term evolution (LTE) [33] and 5G new radio (NR) [34], cyclic redundancy check (CRC) bits are generated and attached to the payload bits to form a code block. We represent the CRC generation and attachment procedures by a function $\Upsilon:\{0,1\}^{N_{b}}\rightarrow\{0,1\}^{N_{d}}$, where $N_{d}$ denotes the size of a code block. Thus, the code blocks of the active users can be expressed as follows:

$\bm{d}_{n}\triangleq\left[d_{n1},\cdots,d_{nN_{d}}\right]=\Upsilon(\bm{b}_{n}),\ n\in\Xi.$  (1)
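As a concrete instance of the CRC attachment function $\Upsilon$, the sketch below implements bit-level CRC generation and checking via polynomial division over GF(2). The CRC-8 polynomial and the helper names `upsilon` and `crc_check` are illustrative choices for this sketch, not the paper's (LTE/NR use longer CRCs such as CRC-24).

```python
# Illustrative CRC-8 polynomial x^8 + x^2 + x + 1 (not the paper's choice).
CRC_POLY = [1, 0, 0, 0, 0, 0, 1, 1, 1]  # coefficient list, degree 8
CRC_LEN = 8

def crc_remainder(bits):
    """Long division of bits * x^CRC_LEN by CRC_POLY over GF(2)."""
    padded = list(bits) + [0] * CRC_LEN
    for i in range(len(bits)):
        if padded[i]:
            for j, p in enumerate(CRC_POLY):
                padded[i + j] ^= p
    return padded[-CRC_LEN:]

def upsilon(payload_bits):
    """Upsilon: {0,1}^Nb -> {0,1}^Nd, appending the CRC to form a code block."""
    return list(payload_bits) + crc_remainder(payload_bits)

def crc_check(code_block):
    """True iff the code block leaves a zero CRC remainder (parity check passes)."""
    return all(b == 0 for b in crc_remainder(code_block))
```

This is the error-detection mechanism the receivers later exploit: any single-bit error in a received code block changes the remainder and fails `crc_check`.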

Each code block is then encoded by a channel encoder, abstracted as a function $\Phi:\{0,1\}^{N_{d}}\rightarrow\{0,1\}^{N_{c}}$, and the coded bits can be represented as follows:

$\bm{c}_{n}\triangleq\left[c_{n1},\cdots,c_{nN_{c}}\right]=\Phi(\bm{d}_{n}),\ n\in\Xi.$  (2)

Note that $N_{c}$ is the number of coded bits, and the code rate $\phi$ is defined as the ratio between $N_{d}$ and $N_{c}$, i.e., $\phi\triangleq\frac{N_{d}}{N_{c}}$.

The coded bits are modulated to a set of constellation points $\mathcal{X}$ with normalized average power via an invertible mapping $\mu:\{0,1\}^{\log_{2}|\mathcal{X}|}\rightarrow\mathcal{X}$, i.e., for an arbitrary bit sequence of length $\log_{2}|\mathcal{X}|$, $\mu([c_{1},\cdots,c_{\log_{2}|\mathcal{X}|}])=s$ if and only if $\mu^{-1}(s)=[c_{1},\cdots,c_{\log_{2}|\mathcal{X}|}]$, where $s\in\mathcal{X}$ is a constellation point. The modulated symbols for the active users are denoted as follows:

$\mathbf{x}_{dn}\triangleq\left[x_{n(L+1)},\cdots,x_{nT}\right],\ n\in\Xi.$  (3)
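The invertible mapping $\mu$ and its inverse $\mu^{-1}$ can be made concrete for $|\mathcal{X}|=4$. The sketch below assumes Gray-mapped QPSK with unit average power as the constellation (the paper leaves $\mathcal{X}$ generic); `modulate` and `demodulate` are hypothetical helper names.

```python
import numpy as np

# A concrete invertible mapping mu for |X| = 4: Gray-mapped QPSK with
# normalized average power (an assumption; the paper's constellation is generic).
QPSK = {
    (0, 0): (+1 + 1j) / np.sqrt(2),
    (0, 1): (+1 - 1j) / np.sqrt(2),
    (1, 0): (-1 + 1j) / np.sqrt(2),
    (1, 1): (-1 - 1j) / np.sqrt(2),
}
QPSK_INV = {s: b for b, s in QPSK.items()}  # mu^{-1}

def modulate(coded_bits):
    """mu: group coded bits into pairs and map each pair to a symbol."""
    pairs = zip(coded_bits[0::2], coded_bits[1::2])
    return np.array([QPSK[p] for p in pairs])

def demodulate(symbols):
    """mu^{-1}: hard inverse mapping from symbols back to coded bits."""
    return [b for s in symbols for b in QPSK_INV[complex(s)]]
```

The round trip `demodulate(modulate(bits)) == bits` reflects the invertibility of $\mu$ required by the system model.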

We assume $N_{c}=L_{d}\log_{2}|\mathcal{X}|$ for simplicity and assign zero vectors to $\mathbf{x}_{dn}$ for the inactive users. Let $\mathbf{X}_{d}\triangleq\left[\mathbf{x}_{d1},\cdots,\mathbf{x}_{dN}\right]^{\mathrm{T}}$ denote the transmitted data symbols from all the users. As a result, the received signal of the transmission block $\tilde{\mathbf{Y}}\in\mathbb{C}^{M\times T}$ at the BS can be expressed as follows:

$\tilde{\mathbf{Y}}\triangleq\left[\tilde{\mathbf{Y}}_{p},\tilde{\mathbf{Y}}_{d}\right]=\sqrt{\gamma}\,\mathbf{H}\underbrace{\left[\mathbf{X}_{p},\mathbf{X}_{d}\right]}_{\triangleq\mathbf{X}}+\underbrace{\left[\tilde{\mathbf{N}}_{p},\tilde{\mathbf{N}}_{d}\right]}_{\triangleq\tilde{\mathbf{N}}},$  (4)

where $\gamma$ is the uplink transmit power, $\tilde{\mathbf{Y}}_{p}\in\mathbb{C}^{M\times L}$ and $\tilde{\mathbf{Y}}_{d}\in\mathbb{C}^{M\times L_{d}}$ are the received pilot and data signal, respectively, and $\mathbf{H}\triangleq\left[\mathbf{h}_{1},\cdots,\mathbf{h}_{N}\right]\in\mathbb{C}^{M\times N}$ with $\mathbf{h}_{n}\triangleq u_{n}\mathbf{f}_{n}$ denotes the effective channel coefficient matrix. Besides, $\tilde{\mathbf{N}}=\left[\tilde{\mathbf{n}}_{1},\cdots,\tilde{\mathbf{n}}_{T}\right]\in\mathbb{C}^{M\times T}$ is the additive white Gaussian noise (AWGN) with zero mean and variance $\sigma^{2}$ for each element, and $\tilde{\mathbf{N}}_{p}\in\mathbb{C}^{M\times L}$ and $\tilde{\mathbf{N}}_{d}\in\mathbb{C}^{M\times L_{d}}$ are the noise of the received pilot and data signal, respectively. The noise variance $\sigma^{2}$ is assumed known, as it can be accurately estimated at the BS [35]. For ease of notation, define $\mathbf{Y}\triangleq\tilde{\mathbf{Y}}/\sqrt{\gamma}$, $\mathbf{Y}_{p}\triangleq\tilde{\mathbf{Y}}_{p}/\sqrt{\gamma}$, $\mathbf{Y}_{d}\triangleq\tilde{\mathbf{Y}}_{d}/\sqrt{\gamma}$, $\mathbf{N}\triangleq\tilde{\mathbf{N}}/\sqrt{\gamma}$, $\mathbf{N}_{p}\triangleq\tilde{\mathbf{N}}_{p}/\sqrt{\gamma}$, and $\mathbf{N}_{d}\triangleq\tilde{\mathbf{N}}_{d}/\sqrt{\gamma}$ as the normalized received signals and noise. TABLE I summarizes the key notations in this paper and their definitions.
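The full signal model of (4) can be assembled end to end in a short simulation. The sketch below uses illustrative dimensions (far smaller than a realistic mMTC setting) and QPSK data symbols as an assumed constellation; it generates a sparse activity pattern, effective channels $\mathbf{h}_{n}=u_{n}\mathbf{f}_{n}$, and the received signal $\tilde{\mathbf{Y}}=\sqrt{\gamma}\mathbf{H}\mathbf{X}+\tilde{\mathbf{N}}$.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative dimensions (assumptions for this sketch, not the paper's setup).
M, N, K, L, T = 8, 40, 4, 12, 20
gamma, sigma2 = 1.0, 0.1
Ld = T - L

# Sparse activity: exactly K of the N users are active (u_n = 1).
u = np.zeros(N)
u[rng.choice(N, K, replace=False)] = 1.0

beta = np.ones(N)  # large-scale fading, set to 1 here for simplicity
alpha = (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))) / np.sqrt(2)
H = alpha * np.sqrt(beta) * u  # effective channels h_n = u_n f_n (zero if inactive)

# Non-orthogonal Gaussian pilots and QPSK data; inactive users transmit nothing.
Xp = (rng.standard_normal((N, L)) + 1j * rng.standard_normal((N, L))) / np.sqrt(2)
Xd = rng.choice(np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]), size=(N, Ld)) / np.sqrt(2)
Xd *= u[:, None]
X = np.concatenate([Xp, Xd], axis=1)

Noise = np.sqrt(sigma2 / 2) * (rng.standard_normal((M, T)) + 1j * rng.standard_normal((M, T)))
Y_tilde = np.sqrt(gamma) * H @ X + Noise  # Eq. (4)
Y = Y_tilde / np.sqrt(gamma)              # normalized received signal
```

The common sparsity pattern exploited later is visible here: the same $K$ nonzero columns of $\mathbf{H}$ shape both the pilot part $\mathbf{Y}_{p}$ and the data part $\mathbf{Y}_{d}$ of $\mathbf{Y}$.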

TABLE I: Key Notations and Their Definitions
Notation Definition
$M$, $N$, $K$: Number of BS antennas, system users, and active users
$\mathcal{M}$, $\mathcal{N}$, $\Xi$: Sets of BS antennas, system users, and active users
$T$, $L$, $L_{d}$: Number of symbols, pilot symbols, and data symbols in a transmission block
$\bm{\alpha}_{n}$, $\beta_{n}$: Small-scale and large-scale fading coefficients
$u_{n}$: User activity indicator
$\mathbf{h}_{n}$, $\mathbf{H}$: Effective channel coefficient vector and matrix
$\bm{b}_{n}$, $\bm{d}_{n}$, $\bm{c}_{n}$: Payload bits, code block, and coded bits
$N_{b}$, $N_{d}$, $N_{c}$: Number of payload bits, code block bits, and coded bits
$\mathcal{X}$: Set of constellation points
$\mathbf{X}_{p}$, $\mathbf{X}_{d}$: Pilot sequences and modulated data symbols
$\tilde{\mathbf{Y}}_{p}$, $\tilde{\mathbf{Y}}_{d}$: Received pilot and data signals
$\mathbf{Y}_{p}$, $\mathbf{Y}_{d}$: Normalized received pilot and data signals
$\gamma$, $\sigma^{2}$: Uplink transmit power and noise variance
$\hat{\Xi}$, $\hat{\Xi}_{c}$: Estimated set of active users and set of users that pass CRC
$\hat{\bm{d}}_{n}$, $\hat{\bm{b}}_{n}$: Detected code blocks and payload bits

In the following sections, we will develop efficient algorithms to detect the set of active users, estimate their channel coefficients, and recover the transmitted payload bits.

III Joint Estimation via a Turbo Receiver

In this section, we develop a turbo receiver for joint estimation of the user activity, channel coefficients, and payload data of the active users. It is noteworthy that while the turbo principle has achieved great success in conventional multi-user MIMO systems [36, 37], its applications in grant-free massive RA remain uncharted due to the new requirement of user activity detection. Besides, in contrast to most state-of-the-art approaches that follow a sequential user activity detection and data detection/decoding pipeline [9], our design exploits the common sparsity pattern in both the received pilot and data signal. Meanwhile, it takes advantage of the soft decoding information in order to optimize the activity detection and data reception performance.

III-A Overview of the Turbo Receiver

The proposed turbo receiver iterates between a joint estimator and a channel decoder as shown in Fig. 2, which is inspired by the turbo decoding principle [38] that leverages multiple concatenated elementary decoders with the aid of extrinsic information. In particular, responsible for user activity detection, channel estimation, and soft data symbol detection, the joint estimator is designed based on the BiG-AMP algorithm [17]. It also estimates the posterior probabilities of the transmitted data symbols in each turbo iteration, which are converted into extrinsic information of the coded bits. On the other hand, the channel decoder is developed based on the belief propagation (BP) algorithm [39], which accepts the extrinsic information of the coded bits as input to generate their posteriors. The extrinsic LLRs of the coded bits, i.e., the logarithm of the ratio between the probabilities that a coded bit is “0” or “1”, are obtained accordingly and translated to priors of the transmitted data symbols for use by the joint estimator in the next turbo iteration. The turbo iteration terminates after $Q_{1}$ rounds or when an exit condition is met, after which hard decisions are performed to obtain the code block, followed by a cyclic redundancy check. The workflow of the turbo receiver is summarized in Algorithm 1, with details of the joint estimator and channel decoder elaborated in the following subsections.
Note that an initial estimate of the effective channel coefficients and their variances, as well as the average sparsity levels derived from them, are obtained via the AMP algorithm developed in [24]. (With prior knowledge of the user active probability, the AMP algorithm [24] estimates the effective channel coefficients and their variances based on the received pilot signal, and a set of belief indicators $\{\tilde{\rho}_{mn}\}$'s are derived as the posterior probabilities of the effective channel coefficients being non-zero. In this paper, we term $\tilde{\rho}_{mn}$ the posterior sparsity level of user $n$ at the $m$-th BS antenna, and define $\bar{\rho}_{n}\triangleq\frac{1}{M}\sum_{m\in\mathcal{M}}\tilde{\rho}_{mn}$ as the average sparsity level of user $n$, which is a reliable statistic of the activity status and is updated iteratively by the joint estimator of the proposed turbo receiver.)

Figure 2: The proposed turbo receiver for massive RA.
Algorithm 1 The Proposed Turbo Receiver for Massive RA

Input: The normalized received signal $\mathbf{Y}$, pilot symbols $\mathbf{X}_{p}$, maximum number of iterations $Q_{1}$, and accuracy tolerance $\epsilon_{1}$.
Output: The estimated set of active users $\hat{\Xi}$, the set of users that pass CRC $\hat{\Xi}_{c}$, and their detected payload bits $\{\hat{\bm{b}}_{n}\}$'s.
Initialize: $j\leftarrow 0$; $\hat{x}_{nt}^{(0)}\leftarrow 0$, $n\in\mathcal{N}$, $t\in\mathbf{T}_{d}$; $\lambda_{n}^{(0)}\leftarrow\frac{K}{N}$, $n\in\mathcal{N}$; $\eta_{nt,s}^{(1)}\leftarrow\frac{1}{|\mathcal{X}|}$, $t\in\mathbf{T}_{d}$; $L_{E}^{a}(c_{nj_{c}}^{(1)})\leftarrow 0$, $j_{c}=1,\cdots,N_{c}$.

1: Execute the AMP algorithm in [24] to obtain the initial estimates of the effective channel coefficients $\{\hat{h}_{mn}^{(0)}\}$'s and their variances $\{V_{mn}^{h(0)}\}$'s, and the average sparsity levels $\{\bar{\rho}_{n}^{(0)}\}$'s.
2: while $j<Q_{1}$ and $\frac{\sum_{n,t}|\hat{x}_{nt}^{(j)}-\hat{x}_{nt}^{(j-1)}|^{2}}{\sum_{n,t}|\hat{x}_{nt}^{(j-1)}|^{2}}>\epsilon_{1}$ do
3:   $j\leftarrow j+1$
     //The Joint Estimator//
4:   Based on $\{\hat{h}_{mn}^{(j-1)}\}$'s, $\{V_{mn}^{h(j-1)}\}$'s, and $\{\bar{\rho}_{n}^{(j-1)}\}$'s, the joint estimator executes Algorithm 2 to estimate the set of active users $\hat{\Xi}^{(j)}$, the posterior probabilities that $x_{nt}$ equals $s$, i.e., $\tilde{\eta}_{nt,s}^{(j)}$, $n\in\hat{\Xi}^{(j)}$, $t\in\mathbf{T}_{d}$, and the soft data symbols $\hat{x}_{nt}^{(j)}$, $t\in\mathbf{T}_{d}$.
5:   Convert $\tilde{\eta}_{nt,s}^{(j)}$ to the posterior LLRs of the coded bits $L_{E}^{p}(c_{nj_{c}}^{(j)})$, $n\in\hat{\Xi}^{(j)}$, according to (21).
6:   Calculate the extrinsic information $L_{E}^{e}(c_{nj_{c}}^{(j)})$, $n\in\hat{\Xi}^{(j)}$, according to (22), which serves as the input $L_{D}^{a}(c_{nj_{c}}^{(j)})$ of the channel decoder.
     //The Channel Decoder//
7:   Perform soft data decoding via a BP-based channel decoder and obtain the posterior LLRs of the coded bits $L_{D}^{p}(c_{nj_{c}}^{(j)})$, $n\in\hat{\Xi}^{(j)}$.
8:   Calculate the extrinsic information $L_{D}^{e}(c_{nj_{c}}^{(j)})$, $n\in\hat{\Xi}^{(j)}$, via (24), which serves as the input $L_{E}^{a}(c_{nj_{c}}^{(j+1)})$ of the joint estimator in the next turbo iteration, and obtain the prior probabilities that $x_{nt}$ equals $s$, i.e., $\eta_{nt,s}^{(j+1)}$, $t\in\mathbf{T}_{d}$, according to (26).
9: end while
10: Determine the set of active users $\hat{\Xi}$ as $\hat{\Xi}^{(j)}$.
11: Perform hard decisions based on $L_{D}^{p}(c_{nj_{c}}^{(j)})$ via (27) to obtain the code blocks $\hat{\bm{d}}_{n}$, $n\in\hat{\Xi}$.
12: Perform CRC to determine $\hat{\Xi}_{c}$ and detach the CRC bits from $\hat{\bm{d}}_{n}$ to obtain $\hat{\bm{b}}_{n}$, $n\in\hat{\Xi}_{c}$.
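The extrinsic-information exchange in Algorithm 1 instantiates the standard turbo rule: what one block passes to the other is its posterior LLR minus the prior it received from that other block. The exact conversions are Eqs. (21)-(26) of the paper, which are not shown in this excerpt; the sketch below illustrates only the generic rule and the LLR-to-probability conversion, under the convention $\mathrm{LLR}=\log\frac{P[c=0]}{P[c=1]}$.

```python
import numpy as np

def extrinsic(posterior_llr: np.ndarray, prior_llr: np.ndarray) -> np.ndarray:
    """Generic turbo rule: extrinsic LLR = posterior LLR - prior LLR,
    so each block only passes on information the other does not already have."""
    return posterior_llr - prior_llr

def llr_to_bit_probs(llr: np.ndarray):
    """Convert LLR = log(P[c=0]/P[c=1]) to (P[c=0], P[c=1])."""
    p0 = 1.0 / (1.0 + np.exp(-llr))
    return p0, 1.0 - p0
```

For example, an LLR of 0 corresponds to a maximally uncertain bit ($P[c=0]=0.5$), and large positive LLRs indicate a confident "0" decision.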

III-B The Joint Estimator

The joint estimator is designed to detect the set of active users and to estimate their channel coefficients and transmitted data symbols. Since the user activity pattern is encapsulated in $\mathbf{H}$ and can be determined accordingly, it remains for the joint estimator to estimate the effective channel coefficients and soft data symbols. We resort to MMSE estimators, which can be expressed for the effective channel coefficients and soft data symbols, respectively, as follows [40]:

$\hat{h}_{mn}\triangleq\mathbb{E}\left[h_{mn}|\mathbf{Y}\right]=\int h_{mn}\,p(h_{mn}|\mathbf{Y})\,dh_{mn},\ \forall m\in\mathcal{M},n\in\mathcal{N},$  (5)

$\hat{x}_{nt}\triangleq\mathbb{E}\left[x_{nt}|\mathbf{Y}\right]=\sum_{x_{nt}}x_{nt}\,p(x_{nt}|\mathbf{Y}),\ \forall n\in\mathcal{N},t\in\mathbf{T}_{d},$  (6)

where $\hat{h}_{mn}$ ($\hat{x}_{nt}$) is the estimate of $h_{mn}$ ($x_{nt}$), and $p(h_{mn}|\mathbf{Y})$ ($p(x_{nt}|\mathbf{Y})$) denotes the marginal posterior distribution of $h_{mn}$ ($x_{nt}$) given the normalized received signal $\mathbf{Y}$. The marginal posterior distributions can be rewritten in terms of the joint posterior distribution $p(\mathbf{H},\mathbf{X}|\mathbf{Y})$ as follows:

$p\left(h_{mn}|\mathbf{Y}\right)=\int_{\mathbf{H}_{\backslash m,n}}\sum_{\mathbf{X}}p(\mathbf{H},\mathbf{X}|\mathbf{Y})\,d\mathbf{H},$  (7)

$p\left(x_{nt}|\mathbf{Y}\right)=\sum_{\mathbf{X}_{\backslash n,t}}\int_{\mathbf{H}}p(\mathbf{H},\mathbf{X}|\mathbf{Y})\,d\mathbf{H},$  (8)

where $p\left(\mathbf{H},\mathbf{X}|\mathbf{Y}\right)$ can be factorized via Bayes' rule:

$\begin{aligned}p(\mathbf{H},\mathbf{X}|\mathbf{Y})&=\frac{p(\mathbf{Y}|\mathbf{H},\mathbf{X})p(\mathbf{H})p(\mathbf{X})}{p(\mathbf{Y})}\\&\overset{(a)}{=}\frac{1}{p(\mathbf{Y})}p(\mathbf{Y}|\mathbf{H},\mathbf{X})p(\mathbf{H}|\mathbf{U})p(\mathbf{U})p(\mathbf{X})\\&\overset{(b)}{=}\frac{1}{p(\mathbf{Y})}\prod_{m=1}^{M}\prod_{t=1}^{T}p\Big(y_{mt}\,\Big|\sum_{n=1}^{N}h_{mn}x_{nt}\Big)\prod_{n=1}^{N}\Big[p\left(u_{n}\right)\prod_{m=1}^{M}p\left(h_{mn}|u_{n}\right)\prod_{t=1}^{T}p\left(x_{nt}\right)\Big].\end{aligned}$  (9)

In (9), (a) holds since $p(\mathbf{H})=p(\mathbf{H},\mathbf{U})=p(\mathbf{H}|\mathbf{U})p(\mathbf{U})$, as the user activity pattern is deterministic given $\mathbf{H}$, and (b) is attributed to the conditional independence of the random variables.
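Once the marginals $p(x_{nt}|\mathbf{Y})$ are (approximately) available, the soft-symbol estimator in (6) reduces to a posterior mean over the constellation. The sketch below shows a minimal numeric instance, assuming a QPSK alphabet and made-up posterior probabilities for a single symbol $x_{nt}$; it also computes the posterior variance, which AMP-style algorithms track alongside the mean.

```python
import numpy as np

# Hypothetical posterior p(x_nt = s | Y) over QPSK (values invented for
# illustration; in the receiver these come from the message-passing marginals).
constellation = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)
posterior = np.array([0.7, 0.1, 0.1, 0.1])

# Eq. (6): the MMSE soft symbol is the posterior mean.
x_hat = np.sum(constellation * posterior)

# Posterior variance of the symbol, used as a reliability measure.
var_x = np.sum(np.abs(constellation - x_hat) ** 2 * posterior)
```

A peaked posterior drives `x_hat` toward one constellation point and `var_x` toward zero, while a flat posterior yields `x_hat` near the origin, mirroring how confident decoder feedback sharpens the soft symbols across turbo iterations.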

Nevertheless, the marginal distributions in (7) and (8) are intractable due to the high-dimensional integrals and summations. Fortunately, the factorization in (9) admits efficient approximations via the BP algorithm operating on factor graphs [39]. As shown in Fig. 3, a factor graph consists of variable nodes (circles), factor nodes (squares, each corresponding to a PDF), and edges connecting variable nodes and factor nodes. Messages, in the form of PDFs, are propagated along the edges of the factor graph and updated iteratively. Specifically, the message from a variable node to a factor node is the product of the messages from the other factor nodes adjacent to that variable node, while the message from a factor node to a variable node is the integral of the product of that factor and the messages from the other variable nodes adjacent to that factor node. The posterior PDF of a variable is approximated by the belief of the corresponding variable node, i.e., the product of the messages from all its adjacent factor nodes. For instance, let $I_{x_{nt}\rightarrow f_{y_{mt}}}$ and $I_{f_{y_{mt}}\rightarrow x_{nt}}$ denote the messages from variable node $x_{nt}$ to factor node $p(y_{mt}|\sum_{n\in\mathcal{N}}h_{mn}x_{nt})$ and in the reverse direction, respectively. They are updated in each iteration of the BP algorithm as follows:

$I_{x_{nt}\rightarrow f_{y_{mt}}}\leftarrow I_{f_{x_{nt}}\rightarrow x_{nt}}\prod_{k\in\mathcal{M}\setminus\{m\}}I_{f_{y_{kt}}\rightarrow x_{nt}},$ (10)
$I_{f_{y_{mt}}\rightarrow x_{nt}}\leftarrow\int p\left(y_{mt}\Big{|}\sum_{k=1}^{N}h_{mk}x_{kt}\right)\prod_{r\in\mathcal{N}\setminus\{n\}}I_{x_{rt}\rightarrow f_{y_{mt}}}\prod_{k\in\mathcal{N}}I_{h_{mk}\rightarrow f_{y_{mt}}}\,d\mathbf{h}_{m}\,d\mathbf{x}_{t\backslash n},$ (11)

where $I_{f_{x_{nt}}\rightarrow x_{nt}}$ denotes the message from factor node $p(x_{nt})$ to variable node $x_{nt}$, which approximates the prior distribution of $x_{nt}$, and $I_{h_{mn}\rightarrow f_{y_{mt}}}$ is the message from variable node $h_{mn}$ to factor node $p(y_{mt}|\sum_{n\in\mathcal{N}}h_{mn}x_{nt})$. The belief of variable node $x_{nt}$ and the approximated posterior distribution of $x_{nt}$, i.e., $B_{x_{nt}}$ and $r_{x_{nt}}$, are updated via the following expressions, respectively:

$B_{x_{nt}}\leftarrow I_{f_{x_{nt}}\rightarrow x_{nt}}\prod_{m\in\mathcal{M}}I_{f_{y_{mt}}\rightarrow x_{nt}},$ (12)
$r_{x_{nt}}\leftarrow\frac{B_{x_{nt}}}{\int B_{x_{nt}}\,dx_{nt}}.$ (13)
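The belief formation and normalization in (12)-(13) can be sketched numerically for a discrete variable. This is an illustrative toy example, not part of the paper's receiver: the message values are made-up placeholders, and the variable support is restricted to two points so that the product-and-normalize steps are easy to follow.

```python
import numpy as np

# Toy sketch of (12)-(13): belief formation for one discrete variable node
# with support {+1, -1}. The message values below are made-up; in the
# receiver they would come from the factor nodes p(y_mt | .) over M antennas.
prior_msg = np.array([0.5, 0.5])        # I_{f_x -> x}: prior message on x
factor_msgs = np.array([[0.7, 0.3],     # I_{f_y_mt -> x} for m = 1, ..., M
                        [0.6, 0.4],
                        [0.8, 0.2]])

belief = prior_msg * np.prod(factor_msgs, axis=0)  # (12): product of messages
posterior = belief / belief.sum()                  # (13): normalize to a PDF
```

Because every incoming message favors the first support point, the normalized belief concentrates there.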
Figure 3: The factor graph of the joint posterior distribution $p(\mathbf{H},\mathbf{X}|\mathbf{Y})$, where $z_{mt}\triangleq\sum_{n\in\mathcal{N}}h_{mn}x_{nt}$.

Unfortunately, although the BP algorithm is efficient in calculating marginal distributions, computing the high-dimensional integrals in (11) still incurs excessive complexity since $N$ is very large in massive RA systems. To develop a joint estimator with affordable complexity, we turn to the framework of AMP, a variant of BP that provides more tractable approximations of marginal distributions [16]. It invokes the Central Limit Theorem to approximate the product of messages as a complex Gaussian distribution, so that only the mean and variance need to be propagated. Besides, high-order terms of the messages are omitted when deriving the means and variances to further reduce the computational complexity. Since the joint estimation of effective channel coefficients and soft data symbols is a bilinear inference problem, the BiG-AMP algorithm [17] offers a viable solution by estimating $\mathbf{H}$ and $\mathbf{X}$ alternately. Key steps of the BiG-AMP-based joint estimator are summarized in Algorithm 2, an iterative algorithm that estimates three sets of variables in each iteration: 1) the linear mixing variables $\{z_{mt}\}$'s ($z_{mt}\triangleq\sum_{n\in\mathcal{N}}h_{mn}x_{nt}$); 2) the effective channel coefficients $\{h_{mn}\}$'s; and 3) the soft data symbols $\{x_{nt}\}$'s, as elaborated in the following.

Algorithm 2 The Joint Estimator based on BiG-AMP

Input: The normalized received signal $\mathbf{Y}$, pilot symbols $\mathbf{X}_{p}$, the estimated likelihoods that each user is active $\{\lambda_{n}^{(j)}\}$'s, the estimates of the effective channel coefficients $\{\hat{h}_{mn}^{(j-1)}\}$'s and their variances $\{V_{mn}^{h(j-1)}\}$'s, the prior probabilities $\{\eta_{nt,s}^{(j)}\}$'s that $x_{nt}$ equals $s$, the active-user detection threshold $\theta$, the maximum number of iterations $Q_{2}$, and the accuracy tolerance $\epsilon_{2}$.
Output: The estimated set of active users $\hat{\Xi}^{(j)}$, and the posterior probabilities $\tilde{\eta}_{nt,s}^{(j)}$ that $x_{nt}$ equals $s$, $n\in\hat{\Xi}^{(j)}$, $t\in\mathbf{T}_{d}$.
Initialize: $i\leftarrow 0$, $\hat{h}_{mn}^{(j)}(0)\leftarrow\hat{h}_{mn}^{(j-1)}$, $V_{mn}^{h(j)}(0)\leftarrow V_{mn}^{h(j-1)}$, $\hat{s}_{mt}^{(j)}(0)\leftarrow 0$, $\hat{x}_{nt}^{(j)}(0)\leftarrow 0$, $t\in\mathbf{T}_{d}$, $V_{nt}^{x(j)}(0)\leftarrow 1$, $t\in\mathbf{T}_{d}$.

1: while $i<Q_{2}$ and $\sum_{m,t}|\hat{z}_{mt}^{(j)}(i)-\hat{z}_{mt}^{(j)}(i-1)|^{2}\big/\sum_{m,t}|\hat{z}_{mt}^{(j)}(i-1)|^{2}>\epsilon_{2}$ do
2:   $i\leftarrow i+1$
  //Estimate the Linear Mixing Variables $\{z_{mt}\}$//
3:   $\forall m,t:\ M_{mt}^{p(j)}(i)=\sum_{n}\hat{h}_{mn}^{(j)}(i-1)\hat{x}_{nt}^{(j)}(i-1)-\hat{s}_{mt}^{(j)}(i-1)\sum_{n}\left(|\hat{x}_{nt}^{(j)}(i-1)|^{2}V_{mn}^{h(j)}(i-1)+|\hat{h}_{mn}^{(j)}(i-1)|^{2}V_{nt}^{x(j)}(i-1)\right)$
4:   $\forall m,t:\ V_{mt}^{p(j)}(i)=\sum_{n}\left(|\hat{x}_{nt}^{(j)}(i-1)|^{2}V_{mn}^{h(j)}(i-1)+|\hat{h}_{mn}^{(j)}(i-1)|^{2}V_{nt}^{x(j)}(i-1)\right)+\sum_{n}V_{mn}^{h(j)}(i-1)V_{nt}^{x(j)}(i-1)$
5:   $\forall m,t:\ \hat{z}_{mt}^{(j)}(i)=\mathbb{E}\left[z_{mt}|M_{mt}^{p(j)}(i),V_{mt}^{p(j)}(i)\right]=\frac{y_{mt}V_{mt}^{p(j)}(i)+(\sigma^{2}/\gamma)M_{mt}^{p(j)}(i)}{(\sigma^{2}/\gamma)+V_{mt}^{p(j)}(i)}$
6:   $\forall m,t:\ V_{mt}^{z(j)}(i)=\mathrm{Var}\left[z_{mt}|M_{mt}^{p(j)}(i),V_{mt}^{p(j)}(i)\right]=\frac{(\sigma^{2}/\gamma)V_{mt}^{p(j)}(i)}{(\sigma^{2}/\gamma)+V_{mt}^{p(j)}(i)}$
7:   $\forall m,t:\ \hat{s}_{mt}^{(j)}(i)=\left(\hat{z}_{mt}^{(j)}(i)-M_{mt}^{p(j)}(i)\right)/V_{mt}^{p(j)}(i)$
8:   $\forall m,t:\ V_{mt}^{s(j)}(i)=\left(1-V_{mt}^{z(j)}(i)/V_{mt}^{p(j)}(i)\right)/V_{mt}^{p(j)}(i)$
  //Estimate the Effective Channel Coefficients//
9:   $\forall m,n:\ Q_{p,mn}^{h(j)}(i)=\left(\sum_{t\in\mathbf{T}_{p}}|x_{nt}|^{2}V_{mt}^{s(j)}(i)\right)^{-1}$
10:  $\forall m,n:\ P_{p,mn}^{h(j)}(i)=\hat{h}_{mn}^{(j)}(i-1)+Q_{p,mn}^{h(j)}(i)\sum_{t\in\mathbf{T}_{p}}x_{nt}^{*}\hat{s}_{mt}^{(j)}(i)$
11:  $\forall m,n:\ Q_{d,mn}^{h(j)}(i)=\left(\sum_{t\in\mathbf{T}_{d}}|\hat{x}_{nt}^{(j)}(i-1)|^{2}V_{mt}^{s(j)}(i)\right)^{-1}$
12:  $\forall m,n:\ P_{d,mn}^{h(j)}(i)=\hat{h}_{mn}^{(j)}(i-1)\left(1-Q_{d,mn}^{h(j)}(i)\sum_{t\in\mathbf{T}_{d}}V_{nt}^{x(j)}(i-1)V_{mt}^{s(j)}(i)\right)+Q_{d,mn}^{h(j)}(i)\sum_{t\in\mathbf{T}_{d}}\hat{x}_{nt}^{(j)*}(i-1)\hat{s}_{mt}^{(j)}(i)$
13:  $\forall m,n:\ P_{mn}^{h(j)}(i)=\left(P_{p,mn}^{h(j)}(i)Q_{d,mn}^{h(j)}(i)+P_{d,mn}^{h(j)}(i)Q_{p,mn}^{h(j)}(i)\right)\Big/\left(Q_{p,mn}^{h(j)}(i)+Q_{d,mn}^{h(j)}(i)\right)$
14:  $\forall m,n:\ Q_{mn}^{h(j)}(i)=Q_{p,mn}^{h(j)}(i)Q_{d,mn}^{h(j)}(i)\Big/\left(Q_{p,mn}^{h(j)}(i)+Q_{d,mn}^{h(j)}(i)\right)$
15:  $\forall m,n:\ K_{mn}^{(j)}(i)=\ln\left(\frac{\mathcal{CN}\left(0;P_{mn}^{h(j)}(i),Q_{mn}^{h(j)}(i)+\beta_{n}\right)}{\mathcal{CN}\left(0;P_{mn}^{h(j)}(i),Q_{mn}^{h(j)}(i)\right)}\right)=\ln\left(\frac{Q_{mn}^{h(j)}(i)}{Q_{mn}^{h(j)}(i)+\beta_{n}}\right)+\frac{|P_{mn}^{h(j)}(i)|^{2}\beta_{n}}{\left(Q_{mn}^{h(j)}(i)+\beta_{n}\right)Q_{mn}^{h(j)}(i)}$
16:  $\forall m,n:\ L_{mn}^{(j)}(i)=\ln\left(\frac{\lambda_{n}^{(j)}}{1-\lambda_{n}^{(j)}}\right)+\sum_{k\in\mathcal{M}\backslash\{m\}}K_{kn}^{(j)}(i)$
17:  $\forall m,n:\ \rho_{mn}^{(j)}(i)=\exp\left(L_{mn}^{(j)}(i)\right)\Big/\left(1+\exp\left(L_{mn}^{(j)}(i)\right)\right)$
18:  $\forall m,n:\ \mu_{mn}^{(j)}(i)=\beta_{n}P_{mn}^{h(j)}(i)\Big/\left(\beta_{n}+Q_{mn}^{h(j)}(i)\right)$, $\tau_{mn}^{(j)}(i)=\beta_{n}Q_{mn}^{h(j)}(i)\Big/\left(\beta_{n}+Q_{mn}^{h(j)}(i)\right)$
19:  $\forall m,n:\ \tilde{\rho}_{mn}^{(j)}(i)=\rho_{mn}^{(j)}(i)\Big/\left(\rho_{mn}^{(j)}(i)+\left(1-\rho_{mn}^{(j)}(i)\right)\exp\left(-K_{mn}^{(j)}(i)\right)\right)$
20:  $\forall m,n:\ \hat{h}_{mn}^{(j)}(i)=\mathbb{E}\left[h_{mn}|P_{mn}^{h(j)}(i),Q_{mn}^{h(j)}(i)\right]=\tilde{\rho}_{mn}^{(j)}(i)\mu_{mn}^{(j)}(i)$
21:  $\forall m,n:\ V_{mn}^{h(j)}(i)=\mathrm{Var}\left[h_{mn}|P_{mn}^{h(j)}(i),Q_{mn}^{h(j)}(i)\right]=\tilde{\rho}_{mn}^{(j)}(i)\left(|\mu_{mn}^{(j)}(i)|^{2}+\tau_{mn}^{(j)}(i)\right)-|\hat{h}_{mn}^{(j)}(i)|^{2}$
  //Estimate the Soft Data Symbols//
22:  $\forall n,t\in\mathbf{T}_{d}:\ Q_{nt}^{x(j)}(i)=\left(\sum_{m}|\hat{h}_{mn}^{(j)}(i-1)|^{2}V_{mt}^{s(j)}(i)\right)^{-1}$
23:  $\forall n,t\in\mathbf{T}_{d}:\ P_{nt}^{x(j)}(i)=\hat{x}_{nt}^{(j)}(i-1)\left(1-Q_{nt}^{x(j)}(i)\sum_{m}V_{mn}^{h(j)}(i-1)V_{mt}^{s(j)}(i)\right)+Q_{nt}^{x(j)}(i)\sum_{m}\hat{h}_{mn}^{(j)*}(i-1)\hat{s}_{mt}^{(j)}(i)$
24:  $\forall n,t\in\mathbf{T}_{d}:\ \tilde{\eta}_{nt,s}^{(j)}(i)=\frac{\eta_{nt,s}^{(j)}\mathcal{CN}\left(s;P_{nt}^{x(j)}(i),Q_{nt}^{x(j)}(i)\right)}{\sum_{s^{\prime}\in\mathcal{X}}\eta_{nt,s^{\prime}}^{(j)}\mathcal{CN}\left(s^{\prime};P_{nt}^{x(j)}(i),Q_{nt}^{x(j)}(i)\right)}$
25:  $\forall n,t\in\mathbf{T}_{d}:\ \hat{x}_{nt}^{(j)}(i)=\mathbb{E}\left[x_{nt}|P_{nt}^{x(j)}(i),Q_{nt}^{x(j)}(i)\right]=\bar{\rho}_{n}^{(j-1)}\sum_{s\in\mathcal{X}}\tilde{\eta}_{nt,s}^{(j)}(i)s$
26:  $\forall n,t\in\mathbf{T}_{d}:\ V_{nt}^{x(j)}(i)=\mathrm{Var}\left[x_{nt}|P_{nt}^{x(j)}(i),Q_{nt}^{x(j)}(i)\right]=\sum_{s\in\mathcal{X}}\tilde{\eta}_{nt,s}^{(j)}(i)\left|\bar{\rho}_{n}^{(j-1)}s-\hat{x}_{nt}^{(j)}(i)\right|^{2}$
27: end while
28: Update $\bar{\rho}_{n}^{(j)}=\frac{1}{M}\sum_{m\in\mathcal{M}}\tilde{\rho}_{mn}^{(j)}$, $\hat{h}_{mn}^{(j)}$, and $V_{mn}^{h(j)}$ for the next turbo iteration, and set $\lambda_{n}^{(j+1)}=\kappa\bar{\rho}_{n}^{(j)}+(1-\kappa)\lambda_{n}^{(j)}$, $\forall n\in\mathcal{N}$.
29: Determine the estimated set of active users as $\hat{\Xi}^{(j)}\triangleq\{n\in\mathcal{N}\mid\bar{\rho}_{n}^{(j)}\geq\theta\}$.

III-B1 Estimate the linear mixing variables

In each iteration, the joint estimator first estimates the linear mixing variable $z_{mt}$ from $y_{mt}$. The basic principle is that, once the prior distribution of a variable and its likelihood function are available, the posterior distribution follows from Bayes' rule, from which the MMSE estimate is obtained. Since $y_{mt}=z_{mt}+n_{mt}$, the likelihood function is given as follows:

$p\left(y_{mt}\Big{|}\sum_{n\in\mathcal{N}}h_{mn}x_{nt}\right)=\frac{\gamma}{\pi\sigma^{2}}\exp\left(-\frac{\gamma}{\sigma^{2}}\Big{|}y_{mt}-\sum_{n\in\mathcal{N}}h_{mn}x_{nt}\Big{|}^{2}\right).$ (14)

The prior distribution of $z_{mt}$ is approximated as a complex Gaussian distribution with mean $M_{mt}^{p(j)}(i)$ and variance $V_{mt}^{p(j)}(i)$ in the $i$-th iteration of the joint estimator, as shown in Lines 3 and 4 of Algorithm 2, respectively, where the superscript "$(j)$" denotes the turbo iteration index, $\hat{h}_{mn}^{(j)}(i-1)$ and $V_{mn}^{h(j)}(i-1)$ are the most recent estimate of the effective channel coefficient and its variance, $\hat{x}_{nt}^{(j)}(i-1)$ and $V_{nt}^{x(j)}(i-1)$ are the latest estimate of the soft data symbol and its variance, and $\hat{s}_{mt}^{(j)}(i-1)$ denotes the scaled residual of $z_{mt}$. Note that for $t\in\mathbf{T}_{p}$, we have $\hat{x}_{nt}^{(j)}(i-1)=x_{nt}$ and $V_{nt}^{x(j)}(i-1)=0$ since the pilot symbols are known at the BS.

Since both the approximated prior distribution of $z_{mt}$ and the likelihood function of $z_{mt}$ are complex Gaussian, the posterior distribution $p(z_{mt}|y_{mt})$ can also be approximated by a complex Gaussian distribution with mean $\hat{z}_{mt}^{(j)}(i)$ and variance $V_{mt}^{z(j)}(i)$, given in Lines 5 and 6 of Algorithm 2, respectively. Detailed derivations are provided in Appendix A. Note that the posterior mean of $z_{mt}$ is also its MMSE estimate $\hat{z}_{mt}^{(j)}(i)$. Besides, the scaled residual $\hat{s}_{mt}^{(j)}(i)$ of $z_{mt}$ and the corresponding inverse-residual-variance $V_{mt}^{s(j)}(i)$ are updated in Lines 7 and 8, respectively, which are useful for approximating the likelihood functions of the effective channel coefficients and soft data symbols.
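The Gaussian-times-Gaussian posterior combination in Lines 5-8 of Algorithm 2 admits a simple closed form. The sketch below evaluates it for a single $(m,t)$ entry with made-up numbers for $M_{mt}^{p}$, $V_{mt}^{p}$, $y_{mt}$, and $\sigma^{2}/\gamma$; it is an illustration of the update rules, not the paper's implementation.

```python
import numpy as np

# Sketch of Lines 5-8 of Algorithm 2 for one (m, t) entry with made-up values:
# combining the Gaussian "prior" CN(z; M_p, V_p) with the Gaussian likelihood
# CN(y; z, sigma^2/gamma) yields a Gaussian posterior in closed form.
M_p, V_p = 0.4 + 0.1j, 0.5       # approximated prior mean / variance of z_mt
y, noise_var = 0.9 - 0.2j, 0.2   # received symbol and sigma^2 / gamma

z_hat = (y * V_p + noise_var * M_p) / (noise_var + V_p)  # posterior mean (Line 5)
V_z = noise_var * V_p / (noise_var + V_p)                # posterior variance (Line 6)
s_hat = (z_hat - M_p) / V_p                              # scaled residual (Line 7)
V_s = (1.0 - V_z / V_p) / V_p                            # inverse-residual-variance (Line 8)
```

The posterior variance is always smaller than both the prior variance and the noise variance, reflecting the information gained from the observation.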

III-B2 Estimate the effective channel coefficients

The effective channel coefficients and their variances are estimated by incorporating both the received pilot and data signals. To approximate the posterior distribution of $h_{mn}$ in the $i$-th iteration of the joint estimator, we first obtain the belief of variable node $h_{mn}$, which differs from the posterior distribution of $h_{mn}$ only by a normalizing constant and can be derived from the BP algorithm as follows:

$B_{h_{mn}}^{(j)}(i)=I_{f_{h_{mn}}\rightarrow h_{mn}}^{(j)}(i)\prod_{t\in\mathbf{T}_{p}}I_{f_{y_{mt}}\rightarrow h_{mn}}^{(j)}(i)\prod_{t\in\mathbf{T}_{d}}I_{f_{y_{mt}}\rightarrow h_{mn}}^{(j)}(i),$ (15)

where $I_{f_{h_{mn}}\rightarrow h_{mn}}^{(j)}(i)$ denotes the message from factor node $p(h_{mn}|u_{n})$ to variable node $h_{mn}$, which serves as the prior distribution of $h_{mn}$, and $I_{f_{y_{mt}}\rightarrow h_{mn}}^{(j)}(i)$ represents the message from factor node $p(y_{mt}|z_{mt})$ to variable node $h_{mn}$. Thus, the term $\prod_{t\in\mathbf{T}_{p}}I_{f_{y_{mt}}\rightarrow h_{mn}}^{(j)}(i)\prod_{t\in\mathbf{T}_{d}}I_{f_{y_{mt}}\rightarrow h_{mn}}^{(j)}(i)$ can be interpreted as the likelihood function of $h_{mn}$ in the $i$-th iteration. Specifically, the term $\prod_{t\in\mathbf{T}_{p}}I_{f_{y_{mt}}\rightarrow h_{mn}}^{(j)}(i)$, which corresponds to the received pilot symbols, is approximated as a complex Gaussian PDF with mean $P_{p,mn}^{h(j)}(i)$ and variance $Q_{p,mn}^{h(j)}(i)$, while the term $\prod_{t\in\mathbf{T}_{d}}I_{f_{y_{mt}}\rightarrow h_{mn}}^{(j)}(i)$, which relates to the received data symbols, is approximated as another complex Gaussian PDF with mean $P_{d,mn}^{h(j)}(i)$ and variance $Q_{d,mn}^{h(j)}(i)$. Consequently, their product is also approximated as a complex Gaussian PDF with mean $P_{mn}^{h(j)}(i)$ and variance $Q_{mn}^{h(j)}(i)$, given in Lines 13 and 14 of Algorithm 2, respectively, and derived in Appendix B.

To derive $B_{h_{mn}}^{(j)}(i)$, we obtain $I_{f_{h_{mn}}\rightarrow h_{mn}}^{(j)}(i)$ as follows:

$I_{f_{h_{mn}}\rightarrow h_{mn}}^{(j)}(i)=\left(1-\rho_{mn}^{(j)}(i)\right)\delta(h_{mn})+\rho_{mn}^{(j)}(i)\mathcal{CN}(h_{mn};0,\beta_{n}),$ (16)

where $\rho_{mn}^{(j)}(i)$ approximates the probability that $h_{mn}$ is non-zero and is defined as the sparsity level of user $n$ at the $m$-th BS antenna. Detailed derivations of (16) are deferred to Appendix C. Note that the estimated likelihood that each user is active in the considered transmission block, i.e., $\lambda_{n}^{(j)}$, is required to calculate $\{\rho_{mn}^{(j)}(i)\}$'s in Lines 15-17 of Algorithm 2. We propose to update $\{\lambda_{n}^{(j)}\}$'s in each turbo iteration for more accurate estimation in the BiG-AMP algorithm, as will be elaborated shortly. Therefore, the posterior distribution of $h_{mn}$ is approximated in the $i$-th iteration as follows:

$r_{h_{mn}}^{(j)}(i)=\frac{B_{h_{mn}}^{(j)}(i)}{\int B_{h_{mn}}^{(j)}(i)\,dh_{mn}}=\left(1-\tilde{\rho}_{mn}^{(j)}(i)\right)\delta(h_{mn})+\tilde{\rho}_{mn}^{(j)}(i)\mathcal{CN}\left(h_{mn};\mu_{mn}^{(j)}(i),\tau_{mn}^{(j)}(i)\right),$ (17)

where $\mu_{mn}^{(j)}(i)$ and $\tau_{mn}^{(j)}(i)$ are given in Line 18 of Algorithm 2, and $\tilde{\rho}_{mn}^{(j)}(i)$, presented in Line 19, is defined as the posterior sparsity level of user $n$ at the $m$-th BS antenna. Based on the posterior distribution, the MMSE estimate of the effective channel coefficient and its variance are obtained in Lines 20 and 21 of Algorithm 2, respectively.
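The Bernoulli-Gaussian (spike-and-slab) posterior in (17) has a closed-form MMSE estimate. The sketch below evaluates it for one $(m,n)$ pair with made-up values of $P_{mn}^{h}$, $Q_{mn}^{h}$, $\beta_{n}$, and the sparsity level; it illustrates the structure of Lines 18-21, not the paper's exact numerics.

```python
import numpy as np

# Sketch of the Bernoulli-Gaussian MMSE channel estimate in (17) and Lines
# 18-21 of Algorithm 2, for one (m, n) pair with made-up inputs.
P_h, Q_h = 0.6 + 0.3j, 0.1  # Gaussian likelihood mean / variance of h_mn
beta, rho = 1.0, 0.5        # channel variance beta_n and sparsity level rho_mn

# Slab statistics (Line 18) and posterior sparsity level (Lines 15, 19)
mu = beta * P_h / (beta + Q_h)
tau = beta * Q_h / (beta + Q_h)
K = np.log(Q_h / (Q_h + beta)) + abs(P_h) ** 2 * beta / ((Q_h + beta) * Q_h)
rho_post = rho / (rho + (1 - rho) * np.exp(-K))

# MMSE estimate and posterior variance (Lines 20-21)
h_hat = rho_post * mu
V_h = rho_post * (abs(mu) ** 2 + tau) - abs(h_hat) ** 2
```

Since $\tilde{\rho}<1$, the estimate shrinks the slab mean toward zero, which is exactly how the common sparsity is enforced on the channel estimate.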

III-B3 Estimate the soft data symbols

Due to the symmetry between $x_{nt}$ and $h_{mn}$ in the bilinear inference problem of the joint estimator, we similarly obtain the conditional variance $Q_{nt}^{x(j)}(i)$ and mean $P_{nt}^{x(j)}(i)$ of $x_{nt}$ in Lines 22 and 23 of Algorithm 2, respectively. The prior distributions of the transmitted data symbols can be estimated as follows:

$p(x_{nt})=I_{f_{x_{nt}}\rightarrow x_{nt}}^{(j)}(i)\approx\bar{\rho}_{n}^{(j-1)}\sum_{s\in\mathcal{X}}\eta_{nt,s}^{(j)}\delta(x_{nt}-s),\ t\in\mathbf{T}_{d},$ (18)

where $\bar{\rho}_{n}^{(j-1)}\triangleq\frac{1}{M}\sum_{m\in\mathcal{M}}\tilde{\rho}_{mn}^{(j-1)}$, and $\eta_{nt,s}^{(j)}$ denotes the probability that $x_{nt}$ equals constellation point $s$. As will be introduced in the next subsection, $\{\eta_{nt,s}^{(j)}\}$'s are obtained from the channel decoder in the last turbo iteration. Thus, the approximated posterior distributions of the transmitted data symbols in the $i$-th iteration of the joint estimator can be expressed as follows:

$r_{x_{nt}}^{(j)}(i)=\bar{\rho}_{n}^{(j-1)}\sum_{s\in\mathcal{X}}\tilde{\eta}_{nt,s}^{(j)}(i)\delta(x_{nt}-s),\ t\in\mathbf{T}_{d},$ (19)

where $\tilde{\eta}_{nt,s}^{(j)}(i)$ denotes the posterior probability that $x_{nt}$ equals constellation point $s$, derived via Bayes' rule in Line 24 of Algorithm 2. The soft data symbols and the corresponding posterior variances are estimated via Lines 25 and 26, respectively.
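The soft-symbol update in Lines 24-26 can be sketched for a single QPSK data symbol. The prior probabilities, conditional mean/variance, and average sparsity level below are made-up inputs; the complex Gaussian PDF is evaluated only up to a constant factor, which cancels in the normalization of Line 24.

```python
import numpy as np

# Sketch of Lines 24-26 of Algorithm 2 for one data symbol x_nt under QPSK,
# with made-up inputs.
constellation = np.array([1+1j, -1+1j, 1-1j, -1-1j]) / np.sqrt(2)
eta_prior = np.full(4, 0.25)   # prior symbol probabilities from the decoder
P_x, Q_x = 0.5 + 0.4j, 0.3     # conditional mean / variance of x_nt
rho_bar = 0.9                  # average sparsity level of user n

lik = np.exp(-np.abs(constellation - P_x) ** 2 / Q_x)  # CN(s; P_x, Q_x), up to a constant
eta_post = eta_prior * lik / np.sum(eta_prior * lik)                   # Line 24
x_hat = rho_bar * np.sum(eta_post * constellation)                     # Line 25
V_x = np.sum(eta_post * np.abs(rho_bar * constellation - x_hat) ** 2)  # Line 26
```

The soft symbol is a probability-weighted average of the constellation points, scaled by the activity belief $\bar{\rho}_{n}$.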

Once the while loop of Algorithm 2 terminates, $\bar{\rho}_{n}^{(j)}$, $\hat{h}_{mn}^{(j)}$, and $V_{mn}^{h(j)}$ are updated for the next turbo iteration. Accordingly, we update $\{\lambda_{n}^{(j)}\}$'s in Line 28 using the average sparsity levels $\bar{\rho}_{n}^{(j)}$, i.e., $p(u_{n}=1)\triangleq\lambda_{n}^{(j+1)}=\kappa\bar{\rho}_{n}^{(j)}+(1-\kappa)\lambda_{n}^{(j)}$, $n\in\mathcal{N}$, where $\kappa\in[0,1]$ is the learning rate. This is inspired by the idea of exploration and exploitation in reinforcement learning [41]: it avoids relying exclusively on the average sparsity levels, thereby mitigating estimation errors caused by inaccurate prior information in the BiG-AMP algorithm. The set of active users is determined as $\hat{\Xi}^{(j)}\triangleq\{n\in\mathcal{N}\,|\,\bar{\rho}_{n}^{(j)}\geq\theta\}$, where $\theta$ is an empirical threshold [24].
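The smoothing update and thresholding above amount to a few vector operations. The sketch below uses made-up values for $\kappa$, $\theta$, $\lambda_{n}^{(j)}$, and $\bar{\rho}_{n}^{(j)}$:

```python
import numpy as np

# Sketch of the activity-likelihood update and detection (end of Algorithm 2):
# an exponential-smoothing step with learning rate kappa, then thresholding.
kappa, theta = 0.3, 0.5
lam = np.array([0.1, 0.1, 0.1, 0.1])          # current lambda_n^(j)
rho_bar = np.array([0.95, 0.05, 0.85, 0.20])  # average sparsity levels

lam_next = kappa * rho_bar + (1 - kappa) * lam  # lambda_n^(j+1)
active_set = np.flatnonzero(rho_bar >= theta)   # estimated active users
```

With $\kappa<1$, a single noisy sparsity estimate cannot overwrite the accumulated activity belief, which is the exploration-exploitation trade-off mentioned above.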

III-B4 Derive the extrinsic information of the joint estimator

With the estimated set of active users $\hat{\Xi}^{(j)}$ and soft data symbols $\{\hat{x}_{nt}\}$'s, the extrinsic information of the joint estimator is derived as input to the channel decoder, which aims at minimizing the data decoding error by removing redundancy from the prior information of the coded bits. In particular, the posterior probability of a transmitted data symbol is translated to the posterior probabilities of the corresponding coded bits as follows:

$p\left(c_{nj_{c}}^{(j)}=b\,\big|\,\mathbf{Y}\right)=\sum_{s\in\mathcal{X}_{\hat{j}_{c}}^{b}}\tilde{\eta}_{nt,s}^{(j)},\ n\in\hat{\Xi}^{(j)},$ (20)

where $\hat{j}_{c}\triangleq\mathrm{mod}(j_{c},\log_{2}|\mathcal{X}|)$, $t=L+1+\left\lfloor\frac{j_{c}}{\log_{2}|\mathcal{X}|}\right\rfloor$, and $\mathcal{X}_{l}^{b}$ represents the set of constellation points whose $l$-th bit ($l=0,\cdots,\log_{2}|\mathcal{X}|-1$) equals $b$. For example, suppose the bit sequences "00", "01", "10", and "11" are modulated to constellation points $s_{0}$, $s_{1}$, $s_{2}$, and $s_{3}$, respectively, in quadrature phase shift keying (QPSK); then $\mathcal{X}_{0}^{0}=\{s_{0},s_{1}\}$, $\mathcal{X}_{0}^{1}=\{s_{2},s_{3}\}$, $\mathcal{X}_{1}^{0}=\{s_{0},s_{2}\}$, and $\mathcal{X}_{1}^{1}=\{s_{1},s_{3}\}$. Other modulation schemes, such as 16-quadrature amplitude modulation (16-QAM), can be handled similarly. The posterior probabilities of the coded bits are used to derive the posterior LLRs as follows:

$L_{E}^{p}\left(c_{nj_{c}}^{(j)}\right)\triangleq\ln\left(\frac{p(c_{nj_{c}}^{(j)}=0|\mathbf{Y})}{p(c_{nj_{c}}^{(j)}=1|\mathbf{Y})}\right),\ n\in\hat{\Xi}^{(j)},$ (21)

which are converted to the extrinsic information as defined below [42]:

$L_{E}^{e}\left(c_{nj_{c}}^{(j)}\right)\triangleq L_{E}^{p}\left(c_{nj_{c}}^{(j)}\right)-L_{E}^{a}\left(c_{nj_{c}}^{(j)}\right),\ n\in\hat{\Xi}^{(j)},$ (22)

where $L_{E}^{a}\left(c_{nj_{c}}^{(j)}\right)\triangleq\ln\left(\frac{p(c_{nj_{c}}^{(j)}=0)}{p(c_{nj_{c}}^{(j)}=1)}\right)$ is the prior information obtained from the channel decoder in the last turbo iteration.
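The chain (20)-(22) can be sketched for one QPSK symbol. The posterior symbol probabilities below are made-up, the bit mapping follows the QPSK example above, and the prior LLRs are set to zero for simplicity:

```python
import numpy as np

# Sketch of (20)-(22) for one QPSK symbol: marginalize symbol posteriors into
# per-bit posteriors, form posterior LLRs, and subtract the priors to obtain
# extrinsic LLRs. Mapping from the example: "00"->s0, "01"->s1, "10"->s2, "11"->s3.
eta_post = np.array([0.6, 0.2, 0.15, 0.05])    # tilde{eta}_{nt,s} for s0..s3 (made-up)
bits = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
L_prior = np.zeros(2)                          # L_E^a: prior LLRs (zero here)

L_ext = np.empty(2)
for l in range(2):
    p0 = eta_post[bits[:, l] == 0].sum()       # (20): P(c = 0 | Y)
    p1 = eta_post[bits[:, l] == 1].sum()
    L_post = np.log(p0 / p1)                   # (21): posterior LLR
    L_ext[l] = L_post - L_prior[l]             # (22): extrinsic LLR
```

With zero priors, the extrinsic LLRs equal the posterior LLRs; in the turbo iterations, the subtraction removes exactly the information the decoder itself supplied.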

III-C The Channel Decoder

The channel decoder determines the most probable code block for each user deemed active by the joint estimator, which can be formulated as the following maximum a posteriori (MAP) estimation problem:

$\hat{\bm{c}}_{n}=\arg\max_{\bm{c}_{n}\in\{0,1\}^{N_{c}}}p\left(\bm{c}_{n}\mid\{L_{E}^{e}(c_{nj_{c}}^{(j)})\}\right),\ n\in\hat{\Xi}^{(j)}.$ (23)

Known for its effectiveness in calculating marginal distributions, the BP algorithm has a long history of application in channel decoder design [42]. In this paper, we adopt a BP-based channel decoder to solve (23), which needs to accept the extrinsic information of the coded bits derived from the joint estimator as input, and to calculate the posterior LLRs of the coded bits $L_{D}^{p}(c_{nj_{c}}^{(j)})$, $n\in\hat{\Xi}^{(j)}$ as the soft decoding results. We emphasize that this is a mild requirement satisfied by a variety of off-the-shelf BP-based channel decoders, e.g., the decoders developed in [43], [44] for low-density parity-check (LDPC) codes and in [38], [45] for turbo codes.
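As a flavor of what such a BP-based decoder computes internally, the sketch below shows the standard sum-product check-node update for LDPC decoding. This is generic textbook material, not the specific decoder of [43]-[45]: for a parity check over bits with incoming LLRs $L_{1},\dots,L_{k}$, the outgoing extrinsic LLR toward bit $i$ is $2\,\mathrm{atanh}\big(\prod_{j\neq i}\tanh(L_{j}/2)\big)$.

```python
import numpy as np

# Generic sum-product check-node update (a core step of BP-based LDPC
# decoding); not the paper's specific decoder.
def check_node_update(llrs_in: np.ndarray) -> np.ndarray:
    t = np.tanh(llrs_in / 2.0)
    out = np.empty_like(llrs_in)
    for i in range(len(llrs_in)):
        prod = np.prod(np.delete(t, i))              # leave-one-out product
        out[i] = 2.0 * np.arctanh(np.clip(prod, -0.999999, 0.999999))
    return out
```

The sign of each output LLR is the product of the signs of the other inputs, and its magnitude never exceeds the smallest incoming magnitude, matching the parity-check intuition that a check is only as reliable as its least reliable participant.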

Similar to the joint estimator, the extrinsic information of the channel decoder is derived as follows:

$L_{D}^{e}\left(c_{nj_{c}}^{(j)}\right)\triangleq L_{D}^{p}\left(c_{nj_{c}}^{(j)}\right)-L_{D}^{a}\left(c_{nj_{c}}^{(j)}\right),\ n\in\hat{\Xi}^{(j)},$ (24)

which is adopted as the prior information $L_{E}^{a}(c_{nj_{c}}^{(j+1)})$, $n\in\hat{\Xi}^{(j)}$ for the joint estimator in the next turbo iteration. Therefore, the prior distribution of a coded bit is given as

$p\left(c_{nj_{c}}^{(j)}\right)=\begin{cases}\dfrac{1}{1+\exp\left(L_{D}^{e}\left(c_{nj_{c}}^{(j)}\right)\right)},&c_{nj_{c}}^{(j)}=1,\\ \dfrac{\exp\left(L_{D}^{e}\left(c_{nj_{c}}^{(j)}\right)\right)}{1+\exp\left(L_{D}^{e}\left(c_{nj_{c}}^{(j)}\right)\right)},&c_{nj_{c}}^{(j)}=0,\end{cases}\quad n\in\hat{\Xi}^{(j)},$ (25)

and prior distributions of the transmitted data symbols can be estimated according to the following expression:

$\eta_{nt,s}^{(j+1)}=\prod_{j_{c}=v_{1}}^{v_{2}}p\left(c_{nj_{c}}^{(j)}\right),\ t\in\mathbf{T}_{d},\ n\in\hat{\Xi}^{(j)},$ (26)

where $v_{1}\triangleq(t-L-1)\log_{2}|\mathcal{X}|$, $v_{2}\triangleq(t-L)\log_{2}|\mathcal{X}|-1$, and $s$ is the constellation point such that $\mu([c_{nv_{1}},\cdots,c_{nv_{2}}])=s$. Note that for the users determined as inactive, we reuse the prior information of the transmitted data symbols from the last turbo iteration by setting $L_{E}^{a}(c_{nj_{c}}^{(j+1)})=L_{E}^{a}(c_{nj_{c}}^{(j)})$ and $\eta_{nt,s}^{(j+1)}=\eta_{nt,s}^{(j)}$, $n\in\mathcal{N}\setminus\hat{\Xi}^{(j)}$.
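The conversion from decoder extrinsic LLRs to symbol priors in (25)-(26) can be sketched for one QPSK symbol; the LLR values below are made-up, and the bit mapping follows the earlier QPSK example:

```python
import numpy as np

# Sketch of (25)-(26) for one QPSK symbol: convert per-bit extrinsic LLRs to
# bit probabilities, then multiply them into symbol priors eta_{nt,s}
# (mapping "00"->s0, "01"->s1, "10"->s2, "11"->s3).
L_ext = np.array([1.2, -0.4])                   # L_D^e for the two coded bits
p_bit0 = np.exp(L_ext) / (1 + np.exp(L_ext))    # (25): P(c = 0)
p_bit = np.stack([p_bit0, 1 - p_bit0])          # p_bit[b, l] = P(c_l = b)

bits = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
eta = np.array([np.prod([p_bit[b, l] for l, b in enumerate(sym)])
                for sym in bits])               # (26): symbol priors
```

Because the coded bits of a symbol are treated as independent in (26), the resulting symbol priors automatically sum to one.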

The values of $\{L_{D}^{p}(c_{nj_{c}})\}$'s are also utilized after the last turbo iteration to obtain the code block $\hat{\bm{d}}_{n}$, $n\in\hat{\Xi}$ by performing hard decision as follows:

$\hat{d}_{nj_{c}}=\begin{cases}0,&L_{D}^{p}\left(c_{nj_{c}}\right)\geq 0,\\ 1,&L_{D}^{p}\left(c_{nj_{c}}\right)<0.\end{cases}$ (27)

CRC is then performed on $\hat{\bm{d}}_{n}$, $n\in\hat{\Xi}$, and the CRC bits are detached to obtain the payload bits $\hat{\bm{b}}_{n}$ for the users that pass the parity check, whose set is denoted as $\hat{\Xi}_{c}$ in Algorithm 1.
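The hard decision (27) and CRC screening can be sketched as follows. The 3-bit CRC polynomial $x^{3}+x+1$ (bit pattern `1011`) and the LLR values are made-up for illustration; the paper does not specify the CRC polynomial.

```python
import numpy as np

# Sketch of (27) plus CRC screening, with a made-up 3-bit CRC polynomial.
def hard_decision(llrs: np.ndarray) -> np.ndarray:
    return (llrs < 0).astype(int)          # (27): 0 if LLR >= 0, else 1

def crc_remainder(bits, poly=(1, 0, 1, 1)):
    reg = list(bits)                       # binary polynomial long division
    for i in range(len(bits) - len(poly) + 1):
        if reg[i]:
            for j, p in enumerate(poly):
                reg[i + j] ^= p
    return reg[-(len(poly) - 1):]          # all-zero iff the check passes

llrs = np.array([3.1, -0.7, 1.2, -2.0, 0.4, 1.1, -0.9])  # made-up posterior LLRs
d_hat = hard_decision(llrs)
passes = not any(crc_remainder(d_hat))
```

A user enters $\hat{\Xi}_{c}$ only when the remainder is all-zero; otherwise its decoded block is kept for further turbo iterations.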

IV A Low-complexity Side Information-aided Receiver

Assisted by prior information of the transmitted data symbols, the proposed turbo receiver effectively exploits the common sparsity pattern in the received pilot and data signals via BiG-AMP. Nevertheless, such a design incurs significant computation overhead even with a moderate payload size, since the joint estimation of effective channel coefficients and soft data symbols is performed iteratively over all the received symbols in each turbo iteration. Besides, in order to estimate the prior information, the channel decoder needs to be executed in each turbo iteration for all users in $\hat{\Xi}^{(j)}$ (see Line 7 of Algorithm 1), which brings additional computation overhead. These observations motivate low-complexity receivers for massive RA that leverage both the common sparsity pattern and the channel decoding results more efficiently. In this section, we develop a low-complexity side information (SI)-aided receiver that does not rely on BiG-AMP.

IV-A Overview of the SI-aided Receiver

The SI-aided receiver iterates between a sequential estimator and a channel decoder, as shown in Fig. 4. Unlike the turbo receiver developed in Section III, it estimates the effective channel coefficients and soft data symbols sequentially in each iteration to reduce the computation overhead. Specifically, the sequential estimator cascades the AMP algorithm [24] for JADCE with an MMSE-based soft demodulator that computes the prior LLRs of the coded bits. A BP-based channel decoder is adopted to obtain the posterior LLRs of the coded bits, similarly to the turbo receiver, while hard decision is required for the parity check in each iteration. By updating the SI, i.e., the estimates of whether each user is active, in each iteration jointly based on the average sparsity levels, the posterior LLRs of the coded bits, and the parity check results, the receiver progresses with increasingly precise prior knowledge for the sequential estimator, so that more accurate JADCE can be achieved [46]. The workflow of the SI-aided receiver is summarized in Algorithm 3. We introduce details of the sequential estimator and the channel decoder in Section IV-B, and elaborate on the design of the SI in Section IV-C.
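The MMSE equalization step inside the sequential estimator can be sketched with made-up dimensions; the channels, symbols, and noise variance below are synthetic, and the received data signal is taken noiseless purely for illustration:

```python
import numpy as np

# Sketch of the MMSE equalization step of the sequential estimator: given the
# estimated channels of the detected active users, recover the transmitted
# data symbols via a regularized inverse. All values are made-up.
rng = np.random.default_rng(0)
M, K_act, T_d = 8, 3, 10     # antennas, detected active users, data symbols
noise_var = 0.1              # sigma^2 / gamma

H_a = (rng.standard_normal((M, K_act)) + 1j * rng.standard_normal((M, K_act))) / np.sqrt(2)
X_true = rng.choice(np.array([1+1j, -1+1j, 1-1j, -1-1j]) / np.sqrt(2), size=(K_act, T_d))
Y_d = H_a @ X_true           # noiseless received data signal, for illustration

# X_hat = (H^H H + (sigma^2/gamma) I)^{-1} H^H Y_d
X_hat = np.linalg.solve(H_a.conj().T @ H_a + noise_var * np.eye(K_act),
                        H_a.conj().T @ Y_d)
```

Compared with running BiG-AMP over all received symbols, this single regularized matrix inversion over the detected users is the main source of the complexity savings.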

Figure 4: The proposed SI-aided receiver for massive RA.
Algorithm 3 The Proposed SI-aided Receiver for Massive RA

Input: The normalized received signal $\mathbf{Y}$, pilot symbols $\mathbf{X}_{p}$, the maximum number of iterations $Q_{3}$, and the accuracy tolerance $\epsilon_{3}$.
Output: The estimated set of active users $\hat{\Xi}$, the set of users that pass CRC $\hat{\Xi}_{c}$, and their detected payload bits $\hat{\bm{b}}_{n}$.
Initialize: $j\leftarrow 0$, $\lambda_{n}^{(1)}\leftarrow\frac{K}{N}$, $n\in\mathcal{N}$, $\hat{x}_{nt}^{(0)}\leftarrow 0$, $t\in\mathbf{T}_{d}$, $\hat{\Xi}_{c}\leftarrow\emptyset$.

1: while $j<Q_{3}$ and $\frac{\sum_{n,t}|\hat{x}_{nt}^{(j)}-\hat{x}_{nt}^{(j-1)}|^{2}}{\sum_{n,t}|\hat{x}_{nt}^{(j-1)}|^{2}}>\epsilon_{3}$ do
2:   $j\leftarrow j+1$
//The Sequential Estimator//
3:   Execute the AMP algorithm [24] with the SI $\{\lambda_{n}^{(j)}\}$'s as the prior knowledge of the user activity to estimate the effective channel coefficients $\{\hat{h}_{mn}^{(j)}\}$'s and the set of active users $\hat{\Xi}^{(j)}$.
4:   Estimate the transmitted data symbols $\hat{x}_{nt}^{(j)}$, $t\in\mathbf{T}_{d}$ via an MMSE equalizer, i.e., $\hat{\mathbf{X}}_{d,a}^{(j)}=\big((\hat{\mathbf{H}}_{a}^{(j)})^{\mathrm{H}}\hat{\mathbf{H}}_{a}^{(j)}+(\sigma^{2}/\gamma)\mathbf{I}\big)^{-1}(\hat{\mathbf{H}}_{a}^{(j)})^{\mathrm{H}}\mathbf{Y}_{d}$, where $\hat{\mathbf{H}}_{a}^{(j)}\triangleq\big[\{\hat{\mathbf{h}}_{k}^{(j)}\}_{k\in\hat{\Xi}^{(j)}}\big]$ stacks the effective channel coefficients of all the estimated active users, and $\hat{x}_{nt}^{(j)}$ is an entry of $\hat{\mathbf{X}}_{d,a}^{(j)}$.
5:   Compute the prior LLRs of the coded bits $L_{D}^{a}(c_{nj_{c}}^{(j)})$, $n\in\hat{\Xi}^{(j)}$ via soft demodulation according to (28).
//The Channel Decoder//
6:   Perform soft data decoding via a BP-based channel decoder to obtain the posterior LLRs of the coded bits $L_{D}^{p}(c_{nj_{c}}^{(j)})$, $n\in\hat{\Xi}^{(j)}\setminus\hat{\Xi}_{c}$.
7:   Perform hard decision, determine the set of users in $\hat{\Xi}^{(j)}$ that pass the CRC, denoted as $\hat{\Xi}_{c}^{(j)}$, and obtain their payload bits $\{\hat{\bm{b}}_{n}\}$'s.
8:   $\hat{\Xi}_{c}\leftarrow\hat{\Xi}_{c}\cup\hat{\Xi}_{c}^{(j)}$
9:   Update the SI $\{\lambda_{n}^{(j+1)}\}$'s according to (29).
10: end while
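As a concrete illustration, the MMSE equalization step of Algorithm 3 can be sketched as below. This is a minimal sketch, not the paper's implementation: the function and variable names are illustrative, with `H_hat` holding the estimated effective channels of the detected active users and `noise_var / gamma` playing the role of $\sigma^{2}/\gamma$.

```python
import numpy as np

def mmse_equalize(H_hat, Y_d, noise_var, gamma):
    """MMSE equalization of the data symbols.

    H_hat : (M, K) estimated effective channels of the detected active users.
    Y_d   : (M, L_d) normalized received data signal.
    Returns the (K, L_d) matrix of estimated transmitted data symbols.
    """
    M, K = H_hat.shape
    # Regularized Gram matrix (H^H H + (sigma^2/gamma) I).
    G = H_hat.conj().T @ H_hat + (noise_var / gamma) * np.eye(K)
    # Solve the linear system instead of forming an explicit inverse.
    return np.linalg.solve(G, H_hat.conj().T @ Y_d)
```

With zero noise variance the equalizer reduces to a least-squares solution, recovering the transmitted symbols exactly in a noiseless over-determined setting.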

IV-B The Sequential Estimator and the Channel Decoder

Based on the normalized received pilot signal 𝐘p\mathbf{Y}_{p}, the sequential estimator adopts the AMP algorithm [24] to estimate the effective channel coefficients {h^mn(j)}\{\hat{h}_{mn}^{(j)}\}’s and the set of active users Ξ^(j)\hat{\Xi}^{(j)}. We also derive the sparsity levels {ρmn(j)}\{\rho_{mn}^{(j)}\}’s from the AMP algorithm following steps similar to Lines 15-17 of Algorithm 2. Soft data symbol detection is then performed using an MMSE-based soft demodulator based on the results of the sequential estimator. In particular, the signal distortion caused by wireless fading is first removed from the normalized received data signal 𝐘d\mathbf{Y}_{d} to estimate the transmitted data symbols x^nt\hat{x}_{nt}, nΞ^(j)n\in\hat{\Xi}^{(j)}, t𝐓dt\in\mathbf{T}_{d} via an MMSE equalizer. The prior LLRs of the coded bits are then obtained via soft demodulation as follows:

LDa(cnjc(j))ln(p(cnjc(j)=0|x^nt(j))p(cnjc(j)=1|x^nt(j)))=ln(s𝒳j^c0exp(γx^nt(j)s22/σ2)s𝒳j^c1exp(γx^nt(j)s22/σ2)),nΞ^(j),\displaystyle\begin{split}L_{D}^{a}\left(c_{nj_{c}}^{(j)}\right)&\triangleq\ln\left(\frac{p\left(c_{nj_{c}}^{(j)}=0|\hat{x}_{nt}^{(j)}\right)}{p\left(c_{nj_{c}}^{(j)}=1|\hat{x}_{nt}^{(j)}\right)}\right)\\ &=\ln\left(\frac{\sum_{s\in\mathcal{X}_{\hat{j}_{c}}^{0}}\exp\left(-\gamma||\hat{x}_{nt}^{(j)}-s||_{2}^{2}/\sigma^{2}\right)}{\sum_{s\in\mathcal{X}_{\hat{j}_{c}}^{1}}\exp\left(-\gamma||\hat{x}_{nt}^{(j)}-s||_{2}^{2}/\sigma^{2}\right)}\right),n\in\hat{\Xi}^{(j)},\end{split} (28)

where j^cmod(jc,log2|𝒳|)\hat{j}_{c}\triangleq\mod({j}_{c},\log_{2}|\mathcal{X}|) and t=L+1+jclog2|𝒳|t=L+1+\left\lfloor\frac{j_{c}}{\log_{2}|\mathcal{X}|}\right\rfloor.
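The soft demodulation in (28) can be sketched as follows: for each bit position, the Gaussian likelihoods of the candidate constellation points are summed over the subsets labeled 0 and 1, and the LLR is the log-ratio of the two sums. This is an illustrative sketch assuming a Gray-mapped QPSK constellation; the names `prior_llrs`, `qpsk`, and `labels` are not from the paper.

```python
import numpy as np

def prior_llrs(x_hat, constellation, bit_labels, noise_var, gamma):
    """Per-bit prior LLRs from a soft symbol estimate, following (28).

    x_hat         : complex soft symbol estimate.
    constellation : array of the |X| candidate symbols s.
    bit_labels    : (|X|, B) array with the bit pattern of each symbol.
    Returns an array of B LLRs ln(p(bit=0)/p(bit=1)).
    """
    # Gaussian likelihood metric exp(-gamma*|x_hat - s|^2 / sigma^2).
    metric = np.exp(-gamma * np.abs(x_hat - constellation) ** 2 / noise_var)
    llrs = []
    for b in range(bit_labels.shape[1]):
        num = metric[bit_labels[:, b] == 0].sum()   # symbols whose bit b is 0
        den = metric[bit_labels[:, b] == 1].sum()   # symbols whose bit b is 1
        llrs.append(np.log(num / den))
    return np.array(llrs)

# Illustrative Gray-mapped QPSK constellation and its bit labels.
qpsk = np.array([1 + 1j, -1 + 1j, 1 - 1j, -1 - 1j]) / np.sqrt(2)
labels = np.array([[0, 0], [1, 0], [0, 1], [1, 1]])
```

A soft estimate close to a constellation point labeled (0, 0) yields two large positive LLRs, while one close to a point labeled (1, 1) yields two negative LLRs.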

With the knowledge of {LDa(cnjc(j))}\{L_{D}^{a}\left(c_{nj_{c}}^{(j)}\right)\}’s, the BP-based channel decoder calculates the posterior LLRs of the coded bits LDP(cnjc(j))L_{D}^{P}\left(c_{nj_{c}}^{(j)}\right), nΞ^(j)Ξ^cn\in\hat{\Xi}^{(j)}\setminus\hat{\Xi}_{c} and decides the code blocks according to (27). CRC is performed for all users in Ξ^(j)Ξ^c\hat{\Xi}^{(j)}\setminus\hat{\Xi}_{c} to obtain their payload bits {𝒃^n}\{\hat{\bm{b}}_{n}\}’s. Note that channel decoding is not performed for the users that have already passed the parity check, which differs from the turbo receiver and helps save computations. We would like to point out that the SI-aided receiver can be extended by incorporating the idea of successive interference cancellation, i.e., subtracting the user data that have passed CRC from the received signal. However, the performance improvement is not guaranteed due to the potentially large channel estimation error. A thorough investigation of such an extension is left for future work.

IV-C The Side Information

The AMP algorithm is a key component of the sequential estimator, which determines the average sparsity levels and effective channel coefficients based on the framework of Bayesian estimation [40]. As a consequence, prior knowledge of the user activity, i.e., the SI for the sequential estimator {λn}\{\lambda_{n}\}’s, also has a significant impact on the estimation accuracy, similar to the case of the BiG-AMP algorithm. In order to obtain more precise estimates through multiple iterations of Algorithm 3, we propose to update the SI by jointly utilizing the results of the sequential estimator and the channel decoder, distinguishing three cases according to the estimated set of active users and their parity check results, via the following update rule:

λn(j+1)={1,nΞ^c,κ1ρ¯n(j)+1κ1Ncjc|LDP(cnjc(j))|1+|LDP(cnjc(j))|,nΞ^(j)Ξ^c,κ2ρ¯n(j)+(1κ2)λn(j),n𝒩(Ξ^(j)Ξ^c).\displaystyle\lambda_{n}^{(j+1)}=\left\{\begin{aligned} &1,\ n\in\hat{\Xi}_{c},\\ &\kappa_{1}\bar{\rho}_{n}^{(j)}+\frac{1-\kappa_{1}}{N_{c}}\sum_{j_{c}}\frac{\left|L_{D}^{P}(c_{nj_{c}}^{(j)})\right|}{1+\left|L_{D}^{P}(c_{nj_{c}}^{(j)})\right|},\ n\in\hat{\Xi}^{(j)}\setminus\hat{\Xi}_{c},\\ &\kappa_{2}\bar{\rho}_{n}^{(j)}+(1-\kappa_{2})\lambda_{n}^{(j)},\ n\in\mathcal{N}\setminus(\hat{\Xi}^{(j)}\cup\hat{\Xi}_{c}).\end{aligned}\right. (29)

In particular, in the first case of (29), we set λn(j+1)=1\lambda_{n}^{(j+1)}=1, nΞ^cn\in\hat{\Xi}_{c}, since the users that have passed the parity check in the current or previous iterations can be safely determined as active. The second case handles the users that are estimated as active but fail the parity check in the current iteration, i.e., nΞ^(j)Ξ^cn\in\hat{\Xi}^{(j)}\setminus\hat{\Xi}_{c}. For this set of users, the average sparsity levels and the posterior LLRs of the coded bits are jointly utilized to update the SI, since both are informative on the users’ activity. Specifically, the term 1Ncjc|LDP(cnjc(j))|1+|LDP(cnjc(j))|[0,1)\frac{1}{N_{c}}\sum_{j_{c}}\frac{\left|L_{D}^{P}(c_{nj_{c}}^{(j)})\right|}{1+\left|L_{D}^{P}(c_{nj_{c}}^{(j)})\right|}\in[0,1) indicates the decoding reliability of user nn, as its complement, i.e., 1Ncjc11+|LDP(cnjc(j))|\frac{1}{N_{c}}\sum_{j_{c}}\frac{1}{1+\left|L_{D}^{P}(c_{nj_{c}}^{(j)})\right|}, provides an accurate estimate of the bit error rate [47]. Besides, the parameter κ1[0,1]\kappa_{1}\in[0,1] is an empirical weighting factor balancing the contributions of the channel estimation and data decoding results. In the third case, we update the SI for the users that have neither passed the CRC in any iteration nor been detected as active in the current iteration based on the average sparsity levels, following a methodology similar to that of the turbo receiver in Section III, where κ2[0,1]\kappa_{2}\in[0,1] denotes the learning rate.
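The three-case update rule in (29) can be sketched as follows. This is a minimal illustration under the notation of the text: `si` holds the current $\lambda_{n}$'s, `rho_bar` the average sparsity levels $\bar{\rho}_{n}$, and `k1`, `k2` play the roles of $\kappa_{1}$, $\kappa_{2}$; all function and variable names are illustrative.

```python
import numpy as np

def update_si(si, rho_bar, post_llrs, active_set, crc_passed, k1=0.5, k2=0.5):
    """One pass of the SI update rule (29).

    si         : length-N array of current activity estimates lambda_n.
    rho_bar    : length-N array of average sparsity levels from AMP.
    post_llrs  : dict mapping user index -> array of posterior LLRs.
    active_set : indices detected as active in this iteration.
    crc_passed : indices that passed CRC in this or earlier iterations.
    """
    new_si = si.copy()
    for n in range(len(si)):
        if n in crc_passed:
            # Case 1: CRC passed, user is surely active.
            new_si[n] = 1.0
        elif n in active_set:
            # Case 2: detected active but CRC failed; mix sparsity level
            # with the decoding reliability |L|/(1+|L|) averaged over bits.
            L = np.abs(post_llrs[n])
            reliability = np.mean(L / (1.0 + L))
            new_si[n] = k1 * rho_bar[n] + (1.0 - k1) * reliability
        else:
            # Case 3: detected inactive; smooth the sparsity level
            # with the previous estimate (learning rate k2).
            new_si[n] = k2 * rho_bar[n] + (1.0 - k2) * si[n]
    return new_si
```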

To demonstrate the rationality of the SI update rule in (29), we provide numerical examples on the evolution of {λn(j)}\{\lambda_{n}^{(j)}\}’s, considering two scenarios with K=40K=40 and 8080 in Fig. 5(a) and (b), respectively. From these figures, we see that as the iterations of Algorithm 3 proceed, the SI evolves from the initial value KN\frac{K}{N}, n𝒩n\!\in\!\mathcal{N}, to the perfect estimates, i.e., λn=1\lambda_{n}=1 for n=1,,Kn=1,\cdots,K and λn=0\lambda_{n}=0 for n=K+1,,Nn=K+1,\cdots,N. This validates the effectiveness of the proposed SI update rule. Besides, we observe that a larger number of active users leads to a slower convergence rate and a higher estimation variance, implying the need for more iterations in Algorithm 3.

Refer to caption
Figure 5: Illustrations on the SI update rule in (29). We set N=200N=200 and κ1=κ2=0.5\kappa_{1}=\kappa_{2}=0.5. The first KK users are active in the transmission block.

IV-D Computational Complexity Analysis

The computational complexity of the two proposed receivers is summarized in TABLE II, where the number of complex-valued multiplications is adopted as the metric, and the complexity of a real-valued multiplication is assumed to be one quarter of that of a complex-valued multiplication. We use OdO_{d} to denote the complexity of the decoder. Since the overall computational complexity of the two proposed receivers is determined by the iteration numbers, i.e., Q1Q_{1} and Q3Q_{3} for the turbo and SI-aided receivers, respectively, we focus on the complexity of one iteration in the following discussion, which is contributed by the estimator and the channel decoder.

TABLE II: Complexity of the proposed receivers in one iteration
Receiver — Number of complex multiplications
Turbo: $\big(\frac{9}{4}MNL+\frac{13}{4}MNL_{d}+\frac{15}{4}MN+\frac{3}{2}M(L+L_{d})+\frac{3}{4}NL_{d}|\mathcal{X}|\big)T_{j}+K_{1}^{\prime}O_{d}$
SI-aided: $\big(4MNL+\frac{7}{4}MN+\frac{19}{2}NL\big)T_{s}+\frac{5}{2}K_{2}^{2}M+\frac{3}{4}MK_{2}L_{d}+\frac{5}{4}K_{2}+\frac{1}{2}K_{2}L_{d}|\mathcal{X}|+K_{2}^{\prime}O_{d}$

As summarized in TABLE II, the complexity of the channel decoder in each iteration is $K_{1}^{\prime}O_{d}$ for the turbo receiver ($K_{2}^{\prime}O_{d}$ for the SI-aided receiver), where $K_{1}^{\prime}$ ($K_{2}^{\prime}$) is the number of users that need to be decoded in that iteration. The other terms in the table correspond to the complexity of the joint (sequential) estimator. While the joint estimator requires $\big(\frac{9}{4}MNL+\frac{13}{4}MNL_{d}+\frac{15}{4}MN+\frac{3}{2}M(L+L_{d})+\frac{3}{4}NL_{d}|\mathcal{X}|\big)T_{j}$ complex-valued multiplications for the BiG-AMP algorithm in each turbo iteration, the sequential estimator first performs the AMP algorithm, followed by a one-shot data detection procedure, which respectively require $\big(4MNL+\frac{7}{4}MN+\frac{19}{2}NL\big)T_{s}$ and $\frac{5}{2}K_{2}^{2}M+\frac{3}{4}MK_{2}L_{d}+\frac{5}{4}K_{2}+\frac{1}{2}K_{2}L_{d}|\mathcal{X}|$ complex-valued multiplications. Note that $K_{2}$ denotes the number of users that are detected as active, while $T_{j}$ and $T_{s}$ stand for the actual iteration numbers of the BiG-AMP and AMP algorithms, respectively.

To give a more concrete idea of the computational complexity of the two proposed receivers, we use the simulation setting that will be detailed in Section V and assume K=20K=20. The values of TjT_{j} and TsT_{s}, given respectively by 41 and 52, are obtained by averaging the results over 100100 independent channel realizations. The numbers of complex-valued multiplications of the turbo and SI-aided receivers in one iteration (excluding those of the channel decoders) are 3.2×1083.2\times 10^{8} and 1.39×1081.39\times 10^{8}, respectively. In addition, since a hard decision is made in each iteration of the SI-aided receiver, K2K_{2}^{\prime} is typically smaller than K1K_{1}^{\prime}, especially in later iterations. These two factors jointly imply that the SI-aided receiver has a much lower complexity than the turbo receiver. In the next section, we will compare the computational complexity of different receivers numerically using the measured execution time in simulations.
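The per-iteration multiplication counts in TABLE II can be checked by direct evaluation. The sketch below plugs in the Section-V setting ($M=64$, $N=200$, $L=50$, $L_{d}=150$, QPSK with $|\mathcal{X}|=4$, $T_{j}=41$, $T_{s}=52$, $K_{2}=20$), excluding the decoder terms; the function names are illustrative.

```python
def turbo_estimator_cost(M, N, L, Ld, X, Tj):
    """Complex multiplications of the joint estimator per turbo iteration."""
    per_iter = (9/4*M*N*L + 13/4*M*N*Ld + 15/4*M*N
                + 3/2*M*(L + Ld) + 3/4*N*Ld*X)
    return per_iter * Tj

def si_aided_estimator_cost(M, N, L, Ld, X, Ts, K2):
    """Complex multiplications of the sequential estimator per iteration."""
    amp = (4*M*N*L + 7/4*M*N + 19/2*N*L) * Ts
    detection = 5/2*K2**2*M + 3/4*M*K2*Ld + 5/4*K2 + 1/2*K2*Ld*X
    return amp + detection

# Section-V setting: M=64, N=200, L=50, Ld=150, |X|=4 (QPSK).
print(turbo_estimator_cost(64, 200, 50, 150, 4, 41))         # ~3.2e8
print(si_aided_estimator_cost(64, 200, 50, 150, 4, 52, 20))  # ~1.39e8
```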

V Simulation Results

V-A Simulation Setting and Baseline Schemes

A single-cell uplink cellular network is simulated, where 200 users are randomly distributed in a circle with a radius of 500 m centered at the BS, which is equipped with 64 antennas. The path loss of user nn is calculated as βn=128.136.7log10(rn)\beta_{n}=-128.1-36.7\text{log}_{10}(r_{n}) (dB), where rnr_{n} (km) is the distance to the BS. The system bandwidth is 1 MHz, and the user transmit power is 23 dBm. Unless otherwise specified, QPSK is employed as the modulation scheme and an LDPC code is used for channel coding. Besides, we select CRC-8, one of the CRC options for the physical uplink control channel (PUCCH) in the 3GPP standards [48], to show the effectiveness of the proposed receivers.

In order to achieve more stable convergence behavior, a damping factor ω(0,1]\omega\in(0,1] [17] is applied to moderate the updates of MmtpM_{mt}^{p}, VmtpV_{mt}^{p}, h^mn\hat{h}_{mn}, and x^nt\hat{x}_{nt}. For instance, the damped version of the estimated soft data symbol can be expressed as x¯nt(i)=ωx^nt(i)+(1ω)x¯nt(i1),t𝐓d\bar{x}_{nt}(i)=\omega\hat{x}_{nt}(i)+(1-\omega)\bar{x}_{nt}(i-1),t\in\mathbf{T}_{d}. In particular, the mean and variance of zmtz_{mt} in Lines 3 and 4 of Algorithm 2 are replaced with their damped versions M¯mtp\bar{M}_{mt}^{p} and V¯mtp\bar{V}_{mt}^{p}, respectively, while the damped versions of h^mn\hat{h}_{mn} and x^nt\hat{x}_{nt} are used in Lines 10-12 and Lines 22-23 of Algorithm 2. The simulation results are averaged over 10510^{5} independent channel realizations, and other critical simulation parameters are summarized in TABLE III.
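The damped update above is simply a convex combination of the new estimate and the previous one; a one-line sketch (names illustrative):

```python
def damp(new, prev, omega=0.6):
    """Damped update: x_bar(i) = omega * x_hat(i) + (1 - omega) * x_bar(i-1).

    omega = 1 disables damping; smaller omega trades convergence speed
    for stability by retaining more of the previous estimate.
    """
    return omega * new + (1.0 - omega) * prev
```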

TABLE III: Simulation parameters
Parameters Values Parameters Values
MM 64 TT 200
LL 50 LdL_{d} 150
NbN_{b} 142 NdN_{d} 150
NcN_{c} 300 CRC type CRC-8
ω\omega 0.6 NN 200
θ\theta 0.4 Q1Q_{1}, Q3Q_{3} 6
Q2Q_{2} 100 ϵ1\epsilon_{1}, ϵ2\epsilon_{2}, ϵ3\epsilon_{3} 10510^{-5}
κ\kappa, κ2\kappa_{2} 0.5 κ1\kappa_{1} 0.5
Noise power density -169 dBm/Hz Code rate 1/21/2

We adopt two baseline schemes and a performance upper bound for comparisons:

  • Separate design [24]: This scheme first performs JADCE via the AMP algorithm, after which data symbols are detected using an MMSE equalizer. The detected soft data symbols are then converted to prior LLRs of the coded bits via soft demodulation using (28) for data decoding. This can be viewed as an instance of the SI-aided receiver with Q3=1Q_{3}=1.

  • Data-assisted design with BiG-AMP [29]: This scheme exploits the common sparsity pattern using the BiG-AMP algorithm for joint activity detection, channel estimation, and soft data symbol detection. The detected soft data symbols are converted to prior LLRs of the coded bits for data decoding using (28). This is a special case of the turbo receiver when Q1=1Q_{1}=1.

  • Turbo receiver with known user activity: This scheme assumes perfect knowledge of the user activity and consequently, channel estimation and data decoding are performed via the proposed turbo receiver by setting λn(j)=1\lambda_{n}^{(j)}=1, nΞn\in\Xi, and λn(j)=0\lambda_{n}^{(j)}=0, n𝒩Ξn\in\mathcal{N}\setminus\Xi. This scheme serves as a performance upper bound.

Note that all the simulated schemes adopt the same BP-based LDPC decoder [43] for fair comparisons.

V-B Results

We first evaluate the activity detection error probability (including the missed detection and false alarm probabilities) and the normalized mean square error (NMSE) of channel estimation in Fig. 6 and Fig. 7, respectively. It is observed that a large number of active users degrades both the activity detection and channel estimation performance due to the limited radio resources reserved for pilot transmission. Compared with the separate design, the data-assisted design achieves much lower activity detection and channel estimation errors, validating the benefits of incorporating the received data symbols. It is also seen that the proposed turbo receiver significantly outperforms the data-assisted design, as the soft channel decoding results are further utilized to refine the prior distributions of the transmitted data symbols through multiple turbo iterations. Besides, despite some performance loss compared with the turbo receiver, the low-cost SI-aided receiver secures a noticeable performance improvement over the data-assisted design, which can be credited to the use of the customized SI as prior knowledge of the user activity.

Refer to caption
Figure 6: Activity detection error probability vs. the number of active users.
Refer to caption
Figure 7: NMSE of channel estimation vs. the number of active users.

Fig. 8 shows the BLER of all the simulated schemes versus the number of active users. Similar to activity detection and channel estimation, the turbo receiver achieves the best BLER performance. Assuming a block error rate requirement of 10310^{-3}, the turbo receiver is able to support 40 active users while the separate design can only support 20, a remarkable 100% increase. Compared with the baseline schemes, it also greatly narrows the performance gap to the upper-bound scheme with perfect knowledge of the user activity, owing to more accurate activity detection and channel estimation. For the same reason, the SI-aided receiver brings a notable BLER reduction compared with the data-assisted design.

Refer to caption
Figure 8: BLER vs. the number of active users.

Since both the turbo receiver and the SI-aided receiver iterate between an estimator and a channel decoder, we further investigate the impact of the number of iterations, i.e., Q1Q_{1} for the turbo receiver and Q3Q_{3} for the SI-aided receiver, as shown in Fig. 9. We examine the computation complexity of different schemes by measuring their average execution time on the same computing server. Since the average execution time is platform-specific, it is normalized with respect to that of the separate design. As shown in the figure, the separate design has the lowest complexity but the highest BLER, as it ignores both the common sparsity pattern and the information offered by the channel decoder. Besides, the performance achieved by both proposed receivers improves with the number of iterations, which again corroborates the effectiveness of iteratively refining the prior information on the user activity and the transmitted data symbols. However, this performance improvement is accompanied by increased computation complexity. Compared with the turbo receiver, the SI-aided receiver enjoys a 66%\sim74% reduction in average execution time, since the sequential estimator only processes the pilot signal for JADCE, and channel decoding is performed only for the users that have not yet passed the parity check. In addition, we notice that the major performance gains of the proposed receivers come from the first few iterations, e.g., seven in the considered scenario, and subsequent iterations contribute only marginal further improvement. In other words, there is no need to execute a large number of iterations, and wise choices of hyper-parameters for the proposed receivers are critical to balance the performance gain against the computation cost.

Refer to caption
Figure 9: BLER vs. the normalized average execution time (K=20K=20).

To show the effectiveness of the two proposed receivers with different modulation schemes, we also simulate a grant-free massive RA system with 16-QAM while keeping LdL_{d} unchanged. As shown in Fig. 10, the BLER performance of all receivers degrades with 16-QAM, as expected. However, the two proposed receivers still achieve significant performance improvements over the separate design, which again demonstrates the benefit of utilizing the common sparsity and data decoding results in designing grant-free massive RA receivers.

Refer to caption
Figure 10: BLER vs. the number of active users with QPSK and 16-QAM.

Furthermore, we investigate the impact of the threshold value θ\theta for determining the set of active users in the AMP-based receivers. A covariance-based receiver is also simulated, which applies the covariance-based method [18] for activity detection together with an MMSE channel estimator. Similar to the AMP-based receivers, a threshold ν\nu (ν>0\nu>0) is introduced to detect the set of active users in [18]. It is observed from Fig. 11 and Fig. 12 that there is a tradeoff between the missed detection and false alarm probabilities, and an optimal threshold value yields the best BLER performance for a given receiver. The two proposed receivers achieve better performance than the baselines for various values of θ\theta. Although the covariance-based receiver outperforms the conventional separate and data-assisted designs, it is surpassed by the proposed turbo receiver by a large margin. On the other hand, while the SI-aided receiver achieves activity detection performance comparable to the covariance-based receiver, it outperforms the covariance-based receiver in terms of BLER with an optimized threshold value. It is also worth noting that the covariance-based receiver has a very high complexity: its execution time is 6.5 times that of the separate design, whereas that of the SI-aided receiver is only 4.24.2 times. These results demonstrate the advantages of the proposed receivers over the covariance-based receiver.

Refer to caption
Figure 11: Probability of missed detection vs. probability of false alarm (K=30K=30, θ{0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9}\theta\in\{0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9\}, and ν{3,10,20,35,50,75,100,200,300}\nu\in\{3,10,20,35,50,75,100,200,300\}).
Refer to caption
Figure 12: BLER vs. threshold index (K=30K=30, θ{0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9}\theta\in\{0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9\}, and ν{3,10,20,35,50,75,100,200,300}\nu\in\{3,10,20,35,50,75,100,200,300\}).

VI Conclusions

This paper carried out the first holistic investigation that jointly considered activity detection, channel estimation, and data decoding for grant-free massive random access (RA). A turbo receiver was proposed to exploit the common sparsity pattern in the received pilot and data signal, and its performance was enhanced by the extrinsic information from the channel decoder. To reduce the complexity, we also developed a low-cost side information (SI)-aided receiver, where the SI is updated iteratively to take advantage of the common sparsity pattern and the channel decoding results. Simulation results demonstrated that substantial performance gains can be obtained with advanced receivers for a given protocol.

Our study demonstrates the benefits of exploiting the structured information from the pilot and data symbols, and the importance of incorporating the interplay among activity detection, channel estimation, and data decoding in the design of massive RA receivers. In other words, treating activity detection, channel estimation, and data decoding separately, as in many previous studies, leads to highly suboptimal receivers. For future investigations, it would be interesting to extend the proposed receivers to scenarios with spatial-temporal correlation of the user activity, and to investigate more complex massive RA systems supported by ultra-massive MIMO and reconfigurable meta-surfaces. Meanwhile, since the proposed receivers are iterative algorithms by nature, it is critical to further reduce their computational complexity via, for example, deep learning-based methods [49, 50] to facilitate practical implementation. In addition, optimal control of the uplink transmit power is another important direction.

Appendix A Derivations of z^mt\hat{z}_{mt} and Its Variance

Since p(ymt|zmt)=𝒞𝒩(zmt;ymt,σ2γ)p(y_{mt}|z_{mt})=\mathcal{CN}\left(z_{mt};y_{mt},\frac{\sigma^{2}}{\gamma}\right) and the prior distribution of p(zmt)p(z_{mt}) is approximated as 𝒞𝒩(zmt;Mmtp(j)(i),\mathcal{CN}\Big{(}z_{mt};M_{mt}^{p(j)}(i), Vmtp(j)(i))V_{mt}^{p(j)}(i)\Big{)}, the joint PDF p(ymt,zmt)p(y_{mt},z_{mt}) is approximated in the ii-th iteration of Algorithm 2 by

Jzmt(j)(i)\displaystyle J_{z_{mt}}^{(j)}(i) =𝒞𝒩(zmt;ymt,σ2γ)𝒞𝒩(zmt;Mmtp(j)(i),Vmtp(j)(i))\displaystyle=\mathcal{CN}\left(z_{mt};y_{mt},\frac{\sigma^{2}}{\gamma}\right)\mathcal{CN}\left(z_{mt};M_{mt}^{p(j)}(i),V_{mt}^{p(j)}(i)\right) (30)
=A1𝒞𝒩(zmt;C(j)(i),D(j)(i)),\displaystyle=A_{1}\cdot\mathcal{CN}\left(z_{mt};C^{(j)}(i),D^{(j)}(i)\right),

where C(j)(i)ymtVmtp(j)(i)γ+σ2Mmtp(j)(i)σ2+Vmtp(j)(i)γC^{(j)}(i)\triangleq\frac{y_{mt}V_{mt}^{p(j)}(i)\gamma+\sigma^{2}M_{mt}^{p(j)}(i)}{\sigma^{2}+V_{mt}^{p(j)}(i)\gamma}, D(j)(i)σ2Vmtp(j)(i)σ2+Vmtp(j)(i)γD^{(j)}(i)\triangleq\frac{\sigma^{2}V_{mt}^{p(j)}(i)}{\sigma^{2}+V_{mt}^{p(j)}(i)\gamma}, and A1𝒞𝒩(0;ymtMmtp(j)(i),σ2γ+Vmtp(j)(i))A_{1}\triangleq\mathcal{CN}\Big{(}0;y_{mt}-M_{mt}^{p(j)}(i),\frac{\sigma^{2}}{\gamma}+V_{mt}^{p(j)}(i)\Big{)}. Thus, the posterior distribution of zmtz_{mt} can be approximated in the ii-th iteration via the Bayes’ rule as follows:

rzmt(j)(i)=Jzmt(j)(i)Jzmt(j)(i)𝑑zmt=𝒞𝒩(zmt;C(j)(i),D(j)(i)).\displaystyle r_{z_{mt}}^{(j)}(i)=\frac{J_{z_{mt}}^{(j)}(i)}{\int J_{z_{mt}}^{(j)}(i)dz_{mt}}=\mathcal{CN}\left(z_{mt};C^{(j)}(i),D^{(j)}(i)\right). (31)

Since the MMSE estimate of zmtz_{mt} is the posterior mean, we have z^mt(j)(i)=ymtVmtp(j)(i)γ+σ2Mmtp(j)(i)σ2+Vmtp(j)(i)γ\hat{z}_{mt}^{(j)}(i)=\frac{y_{mt}V_{mt}^{p(j)}(i)\gamma+\sigma^{2}M_{mt}^{p(j)}(i)}{\sigma^{2}+V_{mt}^{p(j)}(i)\gamma}. Accordingly, the variance of the MMSE estimate is given by the posterior variance as Vmtz(j)(i)=σ2Vmtp(j)(i)σ2+Vmtp(j)(i)γV_{mt}^{z(j)}(i)=\frac{\sigma^{2}V_{mt}^{p(j)}(i)}{\sigma^{2}+V_{mt}^{p(j)}(i)\gamma}.
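The posterior mean and variance above follow from the standard product of two Gaussian densities, and the same mean/variance algebra holds for circularly-symmetric complex Gaussians. A numerical sanity check with illustrative values (the names `Mp`, `Vp`, `noise_var` stand in for $M_{mt}^{p(j)}(i)$, $V_{mt}^{p(j)}(i)$, and $\sigma^{2}/\gamma$):

```python
def gaussian_product(m1, v1, m2, v2):
    """Mean and variance of the (scaled) product of two Gaussian densities.

    N(x; m1, v1) * N(x; m2, v2) is proportional to N(x; m, v) with
    v = v1*v2/(v1+v2) and m = (m1*v2 + m2*v1)/(v1+v2),
    matching C^{(j)}(i) and D^{(j)}(i) in (30) with v1 = sigma^2/gamma.
    """
    v = v1 * v2 / (v1 + v2)
    m = (m1 * v2 + m2 * v1) / (v1 + v2)
    return m, v

# Posterior of z_mt: likelihood CN(z; y, sigma^2/gamma), prior CN(z; Mp, Vp).
y, noise_var = 1.0 + 0.5j, 0.2   # illustrative values
Mp, Vp = 0.8 + 0.4j, 0.5
z_hat, Vz = gaussian_product(y, noise_var, Mp, Vp)
```

With these values, `z_hat` equals $(y V_{mt}^{p}\gamma + \sigma^{2}M_{mt}^{p})/(\sigma^{2}+V_{mt}^{p}\gamma)$ with $\sigma^{2}/\gamma$ set to `noise_var`.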

Appendix B Derivations of Pmnh(j)(i)P_{mn}^{h\left(j\right)}\left(i\right) and Qmnh(j)(i)Q_{mn}^{h\left(j\right)}\left(i\right)

According to the principles of the BiG-AMP algorithm, t=1LIfymthmn(j)(i)\prod_{t=1}^{L}I_{f_{y_{mt}}\rightarrow h_{mn}}^{(j)}\left(i\right) and t=L+1TIfymthmn(j)(i)\prod_{t=L+1}^{T}I_{f_{y_{mt}}\rightarrow h_{mn}}^{(j)}\left(i\right) are approximated as complex Gaussian distributions 𝒞𝒩(hmn;Pp,mnh(j)(i),\mathcal{CN}\Big{(}h_{mn};P_{p,mn}^{h(j)}(i), Qp,mnh(j)(i))Q_{p,mn}^{h(j)}(i)\Big{)} and 𝒞𝒩(hmn;Pd,mnh(j)(i),Qd,mnh(j)(i))\mathcal{CN}\Big{(}h_{mn};P_{d,mn}^{h(j)}(i),Q_{d,mn}^{h(j)}(i)\Big{)}, respectively. By substituting these two complex Gaussian PDFs into the term t=1LIfymthmn(j)(i)t=L+1TIfymthmn(j)(i)\prod_{t=1}^{L}I_{f_{y_{mt}}\rightarrow h_{mn}}^{(j)}\left(i\right)\prod_{t=L+1}^{T}I_{f_{y_{mt}}\rightarrow h_{mn}}^{(j)}\left(i\right), we have:

t=1LIfymthmn(j)(i)t=L+1TIfymthmn(j)(i)\displaystyle\prod_{t=1}^{L}I_{f_{y_{mt}}\rightarrow h_{mn}}^{(j)}\left(i\right)\prod_{t=L+1}^{T}I_{f_{y_{mt}}\rightarrow h_{mn}}^{(j)}\left(i\right) (32)
=𝒞𝒩(hmn;Pp,mnh(j)(i),Qp,mnh(j)(i))𝒞𝒩(hmn;Pd,mnh(j)(i),Qd,mnh(j)(i))\displaystyle\!=\!\mathcal{CN}\!\left(h_{mn};\!P_{p,mn}^{h(j)}(i),Q_{p,mn}^{h(j)}(i)\!\right)\!\mathcal{CN}\!\left(h_{mn};\!P_{d,mn}^{h(j)}(i),Q_{d,mn}^{h(j)}(i)\!\right)
=A2𝒞𝒩(hmn;Pmnh(j)(i),Qmnh(j)(i)),\displaystyle\!=\!A_{2}\cdot\mathcal{CN}\!\left(h_{mn};P_{mn}^{h(j)}(i),Q_{mn}^{h(j)}(i)\right),

where $A_{2}\triangleq\mathcal{CN}\left(0;P_{p,mn}^{h(j)}(i)-P_{d,mn}^{h(j)}(i),Q_{p,mn}^{h(j)}(i)+Q_{d,mn}^{h(j)}(i)\right)$, $P_{mn}^{h(j)}(i)=\frac{P_{p,mn}^{h(j)}(i)Q_{d,mn}^{h(j)}(i)+P_{d,mn}^{h(j)}(i)Q_{p,mn}^{h(j)}(i)}{Q_{p,mn}^{h(j)}(i)+Q_{d,mn}^{h(j)}(i)}$, and $Q_{mn}^{h(j)}(i)=\frac{Q_{p,mn}^{h(j)}(i)Q_{d,mn}^{h(j)}(i)}{Q_{p,mn}^{h(j)}(i)+Q_{d,mn}^{h(j)}(i)}$.

Appendix C Derivation of (16)

It is straightforward that $p\left(h_{mn}|u_{n}=1\right)=\mathcal{CN}(h_{mn};0,\beta_{n})$ and $p\left(h_{mn}|u_{n}=0\right)=\delta(h_{mn})$. Thus, based on the BP algorithm, the term $I_{f_{h_{mn}}\rightarrow h_{mn}}^{(j)}(i)$ can be determined as

Ifhmnhmn(j)(i)\displaystyle I_{f_{h_{mn}}\rightarrow h_{mn}}^{(j)}(i) =un{0,1}p(hmn|un)Iunfhmn(j)(i)\displaystyle=\sum_{u_{n}\in\{0,1\}}p(h_{mn}|u_{n})I_{u_{n}\rightarrow f_{h_{mn}}}^{(j)}(i) (33)
=Iunfhmn(j)(i)|un=0δ(hmn)\displaystyle=I_{u_{n}\rightarrow f_{h_{mn}}}^{\left(j\right)}\left(i\right)\Big{|}_{u_{n}=0}\delta(h_{mn})
+Iunfhmn(j)(i)|un=1𝒞𝒩(hmn;0,βn),\displaystyle+I_{u_{n}\rightarrow f_{h_{mn}}}^{\left(j\right)}\left(i\right)\Big{|}_{u_{n}=1}\mathcal{CN}(h_{mn};0,\beta_{n}),

where Iunfhmn(j)(i)I_{u_{n}\rightarrow f_{h_{mn}}}^{(j)}(i) is the message from variable node unu_{n} to factor node p(hmn|un)p(h_{mn}|u_{n}) that can be obtained as follows:

Iunfhmn(j)(i)=p(un)k\{m}Ifhknun(j)(i).\displaystyle I_{u_{n}\rightarrow f_{h_{mn}}}^{(j)}(i)=p(u_{n})\prod_{k\in\mathcal{M}\backslash\{m\}}I_{f_{h_{kn}}\rightarrow u_{n}}^{(j)}(i). (34)

In (34), p(un)p\left(u_{n}\right) stands for the likelihood that user nn is active or not in the considered transmission block, and Ifhknun(j)(i)I_{f_{h_{kn}}\rightarrow u_{n}}^{(j)}(i) is the message from factor node p(hkn|un)p(h_{kn}|u_{n}) (k{m}k\in\mathcal{M}\setminus\{m\}) to variable node unu_{n} that can be expanded as follows:

Ifhknun(j)(i)=p(hkn|un)Ihknfhkn(j)(i)𝑑hkn.\displaystyle I_{f_{h_{kn}}\rightarrow u_{n}}^{(j)}(i)=\int p\left(h_{kn}|u_{n}\right)I_{h_{kn}\rightarrow f_{h_{kn}}}^{(j)}(i)dh_{kn}. (35)

In (35), Ihknfhkn(j)(i)I_{h_{kn}\rightarrow f_{h_{kn}}}^{(j)}(i) is the message from variable node hknh_{kn} to factor node p(hkn|un)p(h_{kn}|u_{n}) that can be obtained based on the BP algorithm as follows:

Ihknfhkn(j)(i)=t=1LIfykthkn(j)(i)t=L+1TIfykthkn(j)(i)\displaystyle I_{h_{kn}\rightarrow f_{h_{kn}}}^{(j)}(i)=\prod_{t=1}^{L}I_{f_{y_{kt}}\rightarrow h_{kn}}^{(j)}\left(i\right)\prod_{t=L+1}^{T}I_{f_{y_{kt}}\rightarrow h_{kn}}^{(j)}\left(i\right) (36)
=A3𝒞𝒩(hkn;Pknh(j)(i),Qknh(j)(i)),\displaystyle=A_{3}\cdot\mathcal{CN}\left(h_{kn};P_{kn}^{h(j)}(i),Q_{kn}^{h(j)}(i)\right),

where the second equality in (36) adopts the same approximation as the one used in (32), and A3A_{3} is a constant given as 𝒞𝒩(0;\mathcal{CN}\Big{(}0; Pp,knh(j)(i)Pd,knh(j)(i),Qp,knh(j)(i)+Qd,knh(j)(i))P_{p,kn}^{h(j)}(i)-P_{d,kn}^{h(j)}(i),Q_{p,kn}^{h(j)}(i)+Q_{d,kn}^{h(j)}(i)\Big{)}. By substituting the right-hand side of (36) into (35), we have:

Ifhknun(j)(i)\displaystyle I_{f_{h_{kn}}\rightarrow u_{n}}^{(j)}(i) =A3p(hkn|un)\displaystyle=A_{3}\int p(h_{kn}|u_{n}) (37)
×𝒞𝒩(hkn;Pknh(j)(i),Qknh(j)(i))dhkn\displaystyle\times\mathcal{CN}\left(h_{kn};P_{kn}^{h(j)}(i),Q_{kn}^{h(j)}(i)\right)dh_{kn}
=A3{𝒞𝒩(0;Pknh(j)(i),Qknh(j)(i)+βn),un=1,𝒞𝒩(0;Pknh(j)(i),Qknh(j)(i)),un=0.\displaystyle=A_{3}\left\{\begin{array}[]{ll}\mathcal{CN}\left(0;P_{kn}^{h(j)}(i),Q_{kn}^{h(j)}(i)+\beta_{n}\right),u_{n}=1,\\ \mathcal{CN}\left(0;P_{kn}^{h(j)}(i),Q_{kn}^{h(j)}(i)\right),u_{n}=0.\end{array}\right. (40)

Define

Lmn(j)(i)ln(Iunfhmn(j)(i)|un=1Iunfhmn(j)(i)|un=0),\displaystyle L_{mn}^{(j)}(i)\triangleq\ln\left(\frac{I_{u_{n}\rightarrow f_{h_{mn}}}^{(j)}(i)|_{u_{n}=1}}{I_{u_{n}\rightarrow f_{h_{mn}}}^{(j)}(i)|_{u_{n}=0}}\right), (41)

which uniquely determines the values of Iunfhmn(j)(i)|un=1I_{u_{n}\rightarrow f_{h_{mn}}}^{(j)}(i)\Big{|}_{u_{n}=1} and Iunfhmn(j)(i)|un=0I_{u_{n}\rightarrow f_{h_{mn}}}^{(j)}(i)\Big{|}_{u_{n}=0}, since un{0,1}Iunfhmn(j)(i)=1\sum_{u_{n}\in\{0,1\}}I_{u_{n}\rightarrow f_{h_{mn}}}^{(j)}(i)=1. We further define

$$K_{kn}^{(j)}(i)\triangleq\ln\left(\frac{I_{f_{h_{kn}}\rightarrow u_{n}}^{(j)}(i)|_{u_{n}=1}}{I_{f_{h_{kn}}\rightarrow u_{n}}^{(j)}(i)|_{u_{n}=0}}\right)=\ln\left(\frac{Q_{kn}^{h(j)}(i)}{Q_{kn}^{h(j)}(i)+\beta_{n}}\right)+\frac{|P_{kn}^{h(j)}(i)|^{2}\beta_{n}}{\left(Q_{kn}^{h(j)}(i)+\beta_{n}\right)Q_{kn}^{h(j)}(i)} \quad (42)$$

and Unln(p(un=1)p(un=0))U_{n}\triangleq\ln\left(\frac{p\left(u_{n}=1\right)}{p\left(u_{n}=0\right)}\right). With some basic mathematical manipulations, Lmn(j)(i)L_{mn}^{(j)}(i) can be simplified as follows:

Lmn(j)(i)=Un+k\{m}Kkn(j)(i).\displaystyle L_{mn}^{(j)}(i)=U_{n}+\sum\nolimits_{k\in\mathcal{M}\backslash\{m\}}K_{kn}^{(j)}(i). (43)

As a result, the terms Iunfhmn(j)(i)|un=1I_{u_{n}\rightarrow f_{h_{mn}}}^{(j)}(i)\Big{|}_{u_{n}=1} and Iunfhmn(j)(i)|un=0I_{u_{n}\rightarrow f_{h_{mn}}}^{(j)}(i)\Big{|}_{u_{n}=0} can be determined as exp(Lmn(j)(i))1+exp(Lmn(j)(i))\frac{\exp{\left(L_{mn}^{(j)}(i)\right)}}{1+\exp{\left(L_{mn}^{(j)}(i)\right)}} and 11+exp(Lmn(j)(i))\frac{1}{1+\exp{\left(L_{mn}^{(j)}(i)\right)}}, respectively. We complete the derivation by defining ρmn(j)(i)exp(Lmn(j)(i))1+exp(Lmn(j)(i))\rho_{mn}^{(j)}(i)\triangleq\frac{\exp{\left(L_{mn}^{(j)}(i)\right)}}{1+\exp{\left(L_{mn}^{(j)}(i)\right)}}.
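The activity estimate above is the logistic function of the LLR $L_{mn}^{(j)}(i)$, which sums the prior log-odds $U_{n}$ and the per-antenna messages $K_{kn}^{(j)}(i)$ excluding antenna $m$. A minimal sketch of (42), (43), and the final sigmoid, with illustrative names (`P`, `Q` are arrays of $P_{kn}^{h}$, $Q_{kn}^{h}$ over the $M$ antennas):

```python
import numpy as np

def activity_message(P, Q, beta):
    """K_kn in (42): activity log-likelihood ratio from one antenna."""
    return np.log(Q / (Q + beta)) + np.abs(P) ** 2 * beta / ((Q + beta) * Q)

def activity_estimate(U_n, P, Q, beta, m):
    """rho_mn: sigmoid of L_mn = U_n + sum of K_kn over antennas k != m."""
    K = activity_message(P, Q, beta)   # per-antenna messages
    L = U_n + K.sum() - K[m]           # exclude antenna m, as in (43)
    return 1.0 / (1.0 + np.exp(-L))
```

Strong channel estimates (large $|P_{kn}^{h}|$ relative to $Q_{kn}^{h}$ and $\beta_{n}$) drive the estimate toward 1, while near-zero estimates drive it toward 0.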

References

  • [1] X. Bian, Y. Mao, and J. Zhang, “Joint activity detection and data decoding in massive random access via a turbo receiver,” in Proc. IEEE Int. Workshop Signal Process. Adv. Wireless Commun. (SPAWC), Sep. 2021.
  • [2] A. Al-Fuqaha, M. Guizani, M. Mohammadi, M. Aledhari, and M. Ayyash, “Internet of Things: A survey on enabling technologies, protocols, and applications,” IEEE Commun. Surveys Tut., vol. 17, no. 4, pp. 2347-2376, Fourth Quart. 2015.
  • [3] Cisco, “Cisco annual Internet report (2018–2023),” Cisco White Paper, Mar. 2020.
  • [4] ITU-R, “Framework and overall objectives of the future development of IMT for 2020 and beyond,” Recommendation ITU-R M.2083-0, 2015.
  • [5] C. Bockelmann, N. Pratas, H. Nikopour, K. Au, T. Svensson, C. Stefanovic, P. Popovski, and A. Dekorsy, “Massive machine-type communications in 5G: Physical and MAC-layer solutions,” IEEE Commun. Mag., vol. 54, no. 9, pp. 59–65, Sep. 2016.
  • [6] M. Hasan, E. Hossain, and D. Niyato, “Random access for machine-to-machine communication in LTE-advanced networks: Issues and approaches,” IEEE Commun. Mag., vol. 51, no. 6, pp. 86-93, Jun. 2013.
  • [7] E. Björnson, E. de Carvalho, J. H. Sørensen, E. G. Larsson, and P. Popovski, “A random access protocol for pilot allocation in crowded massive MIMO systems,” IEEE Trans. Wireless Commun., vol. 16, no. 4, pp. 2220-2234, Apr. 2017.
  • [8] N. Jiang, Y. Deng, A. Nallanathan, X. Kang, and T. Q. S. Quek, “Analyzing random access collisions in massive IoT networks,” IEEE Trans. Wireless Commun., vol. 17, no. 10, pp. 6853-6879, Oct. 2018.
  • [9] L. Liu, E. G. Larsson, W. Yu, P. Popovski, C. Stefanović, and E. de Carvalho, “Sparse signal processing for grant-free massive connectivity: A future paradigm for random access protocols in the Internet of Things,” IEEE Signal Process. Mag., vol. 35, no. 5, pp. 88-99, Sep. 2018.
  • [10] P. Schulz et al., “Latency critical IoT applications in 5G: Perspective on the design of radio interface and network architecture,” IEEE Commun. Mag., vol. 55, no. 2, pp. 70-78, Feb. 2017.
  • [11] X. Chen, D. Ng, W. Yu, E. G. Larsson, N. Al-Dhahir, and R. Schober, “Massive access for 5G and beyond,” IEEE J. Sel. Areas Commun., vol. 39, no. 3, pp. 615-637, Mar. 2021.
  • [12] Y. Wu, X. Gao, S. Zhou, W. Yang, Y. Polyanskiy, and G. Caire, “Massive access for future wireless communication systems,” IEEE Wireless Commun., vol. 27, no. 4, pp. 148-156, Aug. 2020.
  • [13] D. L. Donoho, “Compressed sensing,” IEEE Trans. Inf. Theory, vol. 52, no. 4, pp. 1289–1306, Apr. 2006.
  • [14] R. Tibshirani, “Regression shrinkage and selection via the Lasso,” J. Roy. Statist. Soc., vol. 58, no. 1, pp. 267–288, Jan. 1996.
  • [15] J. Tropp, “Greed is good: Algorithmic results for sparse approximation,” IEEE Trans. Inf. Theory, vol. 50, no. 10, pp. 2231–2242, Oct. 2004.
  • [16] D. L. Donoho, A. Maleki, and A. Montanari, “Message passing algorithms for compressed sensing: I. motivation and construction,” in Proc. IEEE Inf. Theory Workshop (ITW), Cairo, Egypt, Jan. 2010.
  • [17] J. T. Parker, P. Schniter, and V. Cevher, “Bilinear generalized approximate message passing – Part I: Derivation,” IEEE Trans. Signal Process., vol. 62, no. 22, pp. 5839–5853, Nov. 2014.
  • [18] S. Haghighatshoar, P. Jung, and G. Caire, “Improved scaling law for activity detection in massive MIMO systems,” in Proc. IEEE Int. Symp. Inf. Theory (ISIT), Vail, CO, USA, Jun. 2018.
  • [19] Z. Chen, F. Sohrabi, and W. Yu, “Multi-cell sparse activity detection for massive random access: Massive MIMO versus cooperative MIMO,” IEEE Trans. Wireless Commun., vol. 18, no. 8, pp. 4060-4072, Aug. 2019.
  • [20] B. Wang, L. Dai, Y. Zhang, T. Mir, and J. Li, “Dynamic compressive sensing-based multi-user detection for uplink grant-free NOMA,” IEEE Commun. Lett., vol. 20, no. 11, pp. 2320-2323, Nov. 2016.
  • [21] C. Wei, H. Liu, Z. Zhang, J. Dang, and L. Wu, “Approximate message passing-based joint user activity and data detection for NOMA,” IEEE Commun. Lett., vol. 21, no. 3, pp. 640–643, Mar. 2017.
  • [22] Z. Chen, F. Sohrabi, and W. Yu, “Sparse activity detection for massive connectivity,” IEEE Trans. Signal Process., vol. 66, no. 7, pp. 1890–1904, Apr. 2018.
  • [23] L. Liu and W. Yu, “Massive connectivity with massive MIMO – Part I: Device activity detection and channel estimation,” IEEE Trans. Signal Process., vol. 66, no. 11, pp. 2933–2946, Jun. 2018.
  • [24] M. Ke, Z. Gao, Y. Wu, X. Gao, and R. Schober, “Compressive sensing based adaptive active user detection and channel estimation: Massive access meets massive MIMO,” IEEE Trans. Signal Process., vol. 68, pp. 764–779, 2020.
  • [25] Y. Cheng, L. Liu, and P. Li, “Orthogonal AMP for massive access in channels with spatial and temporal correlations,” IEEE J. Sel. Areas Commun., vol. 39, no. 3, pp. 726-740, Mar. 2021.
  • [26] X. Shao, X. Chen, and R. Jia, “A dimension reduction-based joint activity detection and channel estimation algorithm for massive access,” IEEE Trans. Signal Process., vol. 68, pp. 420–435, 2020.
  • [27] Y. Cui, S. Li, and W. Zhang, “Jointly sparse signal recovery and support recovery via deep learning with applications in MIMO-based grant-free random access,” IEEE J. Sel. Areas Commun., vol. 39, no. 3, pp. 788–803, Mar. 2021.
  • [28] Y. Du et al., “Joint channel estimation and multiuser detection for uplink grant-free NOMA,” IEEE Wireless Commun. Lett., vol. 7, no. 4, pp. 682–685, Aug. 2018.
  • [29] Q. Zou, H. Zhang, D. Cai, and H. Yang, “A low-complexity joint user activity, channel and data estimation for grant-free massive MIMO systems,” IEEE Signal Process. Lett., vol. 27, pp. 1290–1294, 2020.
  • [30] X. Bian, Y. Mao, and J. Zhang, “Supporting more active users for massive access via data-assisted activity detection,” in Proc. IEEE Int. Conf. Commun. (ICC), Montreal, QC, Canada, Jun. 2021.
  • [31] B. M. Hochwald and S. ten Brink, “Achieving near-capacity on a multiple-antenna channel,” IEEE Trans. Commun., vol. 51, no. 3, pp. 389-399, Mar. 2003.
  • [32] K. Wong, A. Paulraj, and R. Murch, “Efficient high-performance decoding for overloaded MIMO antenna systems,” IEEE Trans. Wireless Commun., vol. 6, no. 5, pp. 1833–1843, May 2007.
  • [33] 3GPP, “3GPP TS 36.212 version 10.0.0 Release 10,” Jan. 2011.
  • [34] 3GPP, “3GPP TS 38.212 version 15.10.0 Release 15,” Nov. 2020.
  • [35] T. Cui and C. Tellambura, “Power delay profile and noise variance estimation for OFDM,” IEEE Commun. Lett., vol. 10, no. 1, pp. 25-27, Jan. 2006.
  • [36] S. Haykin, M. Sellathurai, Y. de Jong, and T. Willink, “Turbo-MIMO for wireless communications,” IEEE Commun. Mag., vol. 42, no. 10, pp. 48-53, Oct. 2004.
  • [37] X. Wautelet, A. Dejonghe, and L. Vandendorpe, “MMSE-based fractional turbo receiver for space-time BICM over frequency-selective MIMO fading channels,” IEEE Trans. Signal Process., vol. 52, no. 6, pp. 1804-1809, Jun. 2004.
  • [38] C. Berrou and A. Glavieux, “Near optimum error correcting coding and decoding: Turbo-codes,” IEEE Trans. Commun., vol. 44, no. 10, pp. 1261-1271, Oct. 1996.
  • [39] F. R. Kschischang, B. J. Frey and H. A. Loeliger, “Factor graphs and the sum-product algorithm,” IEEE Trans. Inf. Theory, vol. 47, no. 2, pp. 498-519, Feb. 2001.
  • [40] S. Kay, Fundamentals of Statistical Signal Processing: Estimation Theory. Englewood Cliffs, NJ, USA: Prentice Hall, 1993.
  • [41] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction. MIT Press, 2018.
  • [42] B. Vucetic and J. Yuan, Turbo Codes: Principles and Applications. Springer, 2001.
  • [43] R. G. Gallager, “Low density parity check codes,” IRE Trans. Inf. Theory, vol. IT-8, no. 1, pp. 21-28, Jan. 1962.
  • [44] D. E. Hocevar, “A reduced complexity decoder architecture via layered decoding of LDPC codes,” in Proc. IEEE Workshop Signal Process. Syst. (SiPS), Austin, TX, USA, Oct. 2004.
  • [45] J. Hagenauer, E. Offer, and L. Papke, “Iterative decoding of binary block and convolutional codes,” IEEE Trans. Inf. Theory, vol. 42, no. 2, pp. 429-445, Mar. 1996.
  • [46] A. Ma, Y. Zhou, C. Rush, D. Baron, and D. Needell, “An approximate message passing framework for side information,” IEEE Trans. Signal Process., vol. 67, no. 7, pp. 1875-1888, Apr. 2019.
  • [47] B. Goektepe, S. Faehse, L. Thiele, T. Schierl, and C. Hellge, “Subcode-based early HARQ for 5G,” in Proc. IEEE Int. Conf. Commun. (ICC), Kansas City, MO, USA, May 2018.
  • [48] 3GPP, “3GPP TS 36.211 version 15.3.0 Release 15,” Oct. 2018.
  • [49] V. Satorras and M. Welling, “Neural enhanced belief propagation on factor graphs,” in Proc. AISTATS-21, pp. 685–693, Apr. 2021.
  • [50] Y. Shen, Y. Shi, J. Zhang, and K. B. Letaief, “Graph neural networks for scalable radio resource management: Architecture design and theoretical analysis,” IEEE J. Sel. Areas Commun., vol. 39, no. 1, pp. 101–115, Jan. 2021.