Joint Device Detection, Channel Estimation, and Data Decoding with Collision Resolution for MIMO Massive Unsourced Random Access

Tianya Li, Yongpeng Wu, Senior Member, IEEE, Mengfan Zheng, Wenjun Zhang, Fellow, IEEE,
Chengwen Xing, Member, IEEE, Jianping An, Senior Member, IEEE, Xiang-Gen Xia, Fellow, IEEE,
and Chengshan Xiao, Fellow, IEEE T. Li, Y. Wu (corresponding author), and W. Zhang are with the Department of Electronic Engineering, Shanghai Jiao Tong University, Minhang 200240, China. Email: {tianya, yongpeng.wu, zhangwenjun}@sjtu.edu.cn. M. Zheng is with the Department of Electrical, and Electronic Engineering, Imperial College London, London SW7 2AZ, U.K. E-mail: [email protected]. C. Xing and J. An are with the School of Information and Electronics, Beijing Institute of Technology, Beijing 100081, China. E-mail: [email protected]; [email protected]. X.-G. Xia is with the Department of Electrical, and Computer Engineering, University of Delaware, Newark, DE 19716 USA. E-mail: [email protected]. C. Xiao is with the Department of Electrical, and Computer Engineering, Lehigh University, Bethlehem, PA 18015 USA. E-mail: [email protected].

Abstract

In this paper, we investigate a joint device activity detection (DAD), channel estimation (CE), and data decoding (DD) algorithm for multiple-input multiple-output (MIMO) massive unsourced random access (URA). Different from the state-of-the-art slotted transmission scheme, the data in the proposed framework is split into only two parts. A portion of the data is coded by compressed sensing (CS) and the rest is low-density-parity-check (LDPC) coded. In addition to being part of the data, information bits in the CS phase also undertake the task of interleaving pattern design and channel estimation (CE). The principle of interleave-division multiple access (IDMA) is exploited to reduce the interference among devices in the LDPC phase. Based on the belief propagation (BP) algorithm, a low-complexity iterative message passing (MP) algorithm is utilized to decode the data embedded in these two phases separately. Moreover, combined with successive interference cancellation (SIC), the proposed joint DAD-CE-DD algorithm is performed to further improve performance by utilizing the belief of each other. Additionally, based on the energy detection (ED) and sliding window protocol (SWP), we develop a collision resolution protocol to handle the codeword collision, a common issue in the URA system. In addition to the complexity reduction, the proposed algorithm exhibits a substantial performance enhancement compared to the state-of-the-art in terms of efficiency and accuracy.

Index Terms:

belief propagation, compressed sensing, LDPC, MIMO, unsourced random access

I Introduction

I-A Background and Related Works

Next generation multiple access (NGMA) aims to support massive connectivity scenarios for many envisioned Internet of Things (IoT) applications, e.g. manufacturing, transportation, agriculture, and medicine, to be efficiently and flexibly connected [5, 2, 4, 6, 1, 3]. A generic scenario for IoT connectivity involves a massive number of machine-type connections [2], known as the massive machine-type communication (mMTC), which is one of the main use cases of the fifth-generation (5G) and beyond [3]. Compared to human-type communications (HTC), MTC has two distinct features: sporadic traffic and short packet communication. The sporadic access means that for any given time, for the sake of long battery life and energy saving, only a small fraction within a huge number of devices are active for data transmission [4]. While the small data payloads result in a fall of spectrum efficiency and an increase of latency when applied with traditional multiple access protocols. In this regard, a number of scalable random access schemes have been investigated in the content of massive connectivity. Conceptually, there are two lines of work for that end, namely, individual and common codebook-based approaches, respectively.

In the spirit of the individual codebook based approach, each device is equipped with a unique codebook for the identification conducted by the base station (BS) [4, 7]. Since the BS is interested in the IDs of devices, this framework can be referred to as sourced random access (SRA). A typical transmission process includes two phases in SRA, namely, training and data transmission, respectively. In the training phase, the BS conducts the task of device activity detection (DAD) and channel estimation (CE) based on the pre-allocated unique pilot sequences transmitted by active devices, while the data transmission is performed in the following phase. The task of DAD-CE can be modeled as the compressed sensing (CS) problem and there have been quantities of algorithmic solutions for this issue. Among these CS recovery techniques, the approximate message passing (AMP) algorithm, which is first proposed in [8] for the single measurement vector (SMV) based problem, exhibits great performance in terms of accuracy and complexity. Since the publication of [8], there have been vast variants based on the AMP algorithm, such as multiple measurement vector (MMV) AMP [4], generalized MMV (GMMV) AMP [9], orthogonal AMP (OAMP) [10], generalized AMP (GAMP) [11] and many other works. Another line of work is based on the belief propagation (BP) algorithm [12], which models the problem as a graph and iteratively calculates the messages among different nodes [13, 14, 15, 16]. Thanks to the consistency of the message updating rules, the messages can always be jointly updated within iterations to obtain improved performance. However, the premise of these algorithms is that each device should have a unique pilot sequence. As we jump out of these algorithmic solutions and go back to the essence of the individual codebook, we find this is inapplicable to assign each device a unique pilot in the mMTC scenario with a huge amount of potential devices. Meanwhile, it is a waste of resources especially with sporadic traffic. To mitigate this issue, the study on the framework of common codebooks is rapidly developed.

As opposed to the case of individual codebooks, active devices choose codewords from a common codebook for their messages. In this regard, the task of BS is only to produce the transmitted messages regardless of the corresponding IDs, thus leading to the so-called unsourced random access (URA). The main difference between URA with other grant-based and grant-free random access protocols is that the BS does not perform device identification but only decodes the list of active device messages up to permutations [3]. Conceptually, there are two distinct advantages for this novel framework: $\left.i\right)$ the BS is capable of accommodating massive devices, since all devices share the common codebook instead of being assigned unique preamble sequences as in traditional schemes. $\left.ii\right)$ No device identification information is needed to be embedded in the transmitted sequence. As such, the overhead will be reduced, contributing to improved efficiency. The study on URA is first reported in [17], in which Polyanskiy considered the scenario of a massive number of infrequent and uncoordinated devices and discussed a random coding existence bound in the Gaussian multiple-access channel (GMAC). Since the publication of [17], there have been many works devoted to approaching that bound [18, 19, 20, 21, 23, 24, 25, 26]. A low complexity coding scheme is investigated in [18]. Based on the combination of compute-and-forward and coding for a binary adder channel, it exhibits an improved performance compared with the traditional schemes, such as slotted-ALOHA and treating interference as noise (TIN). However, the size of the codebook increases exponentially with the blocklength, resulting in difficulty in achieving that underlying bound. To mitigate this issue, a slotted transmission framework is proposed in [19], referred to as the coded compressed sensing (CCS) scheme. This cascaded system includes inner and outer codes. The outer tree encoder first splits the data into several slots with parity bits added. The CS coding is then employed within each slot to pick codewords from the codebook. The blocklength, as well as the size of the codebook in each slot, are greatly reduced, thus leading to a relaxation of the computational burden. On the receiver side, the inner CS-based decoder is first utilized to recover the slot-wise transmitted messages within each slot. The outer tree decoder then stitches the decoded messages across different slots according to the prearranged parity. This structure is inherited by the later works in [20, 21], where the authors exploit the concept of sparse regression code (SPARC) [22] to conduct the inner encoding and decode it with the AMP algorithm. Some other coding schemes for URA are also reported in [23, 24, 25, 26]. The polar coding schemes based on TIN and successive interference cancellation (SIC) are investigated in [23, 24]. In [25, 26], the author consider a hierarchical framework, where the device’s data is split into two parts. A portion of the data is encoded by a common CS codebook and recovered by the CS techniques, while the rest is encoded by the low-density parity check code (LDPC) [27] with the key parameters conveyed in the former part. Besides the above works considering the GMAC, there are also works in the fading channel [28, 29, 30].

Besides the above works considering a single receive antenna at the BS, the study of massive URA in multiple-input multiple-output (MIMO) systems has also drawn increasing attention. The BS equipped with multiple antennas provides extra dimensions for the received signal, thus offering more opportunity for signal detection and access of a massive number of devices. A coordinated-wise descend activity detection (AD) algorithm is proposed in [31, 32], which finds the large-scale fading coefficients (LSFCs) of all the devices (coordinates) iteratively. The authors adopt a concatenated coding scheme with the aforementioned tree code as the outer code, and a non-Bayesian method (i.e., maximum likelihood (ML) detection) is leveraged as the inner decoder based on the covariance of the received signal, referred to as the covariance-based ML (CB-ML) algorithm. However, the performance of the CB-ML algorithm degrades dramatically when the number of antennas is less than that of the active devices. There are also some improvements to this algorithm in terms of complexity or accuracy [33, 34]. The slotted transmission framework proposed in [33] eliminates the need for concatenated coding. That is, no parity bit is added within each slot and the slot-wise messages can be stitched together by clustering the estimated channels. However, the channels in [33] are assumed to be identically distributed over all the slots, which is difficult to hold in practice. Besides, the collision in codewords (more than one device chooses the same codeword) will lead to a superimposition of the estimated channels, resulting in a failure of the stitching process. Currently, the design of an efficient collision resolution scheme in URA remains missing. In [34], a more efficient coordinate selection policy is developed based on the multi-armed bandit approaches, leading to a faster convergence than the CB-ML algorithm. Nevertheless, these slotted transmission frameworks, such as CB-ML and its variants, all map a piece of data within each slot to a long transmitted codeword based on the CS coding, resulting in low efficiency. In this regard, the algorithmic design for the MIMO massive URA with high efficiency and accuracy remains missing.

I-B Challenges

In this paper, we investigate a two-phase transmission scheme for MIMO massive URA, where the device’s transmitted data is split into two phases with CS and LDPC coded, respectively. In this framework, no parity bits are needed to be embedded in the CS phase, and the data is encoded linearly in the LDPC phase instead of being mapped to a long codeword. As such, it exhibits higher efficiency and lower latency than the aforementioned slotted transmission scheme. The existing schemes considering this framework all focus on the single-antenna case. For instance, Polar coding [24] and LDPC coding schemes [25, 26] consider the GMAC with perfect channel state information (CSI) at the BS. In this regard, the above channel coding schemes can obtain satisfying performance. However, such an assumption cannot be established under the MIMO case. Besides, because of the massive connectivity and sporadic traffic of devices, the positions of active devices in the estimated equations are not determined. Consequently, conventional linear receivers, such as zero-forcing (ZF), are not applicable. Moreover, compared with the fading channel models [28, 29, 30] where the channel of each device is a scalar, in MIMO, the channel is a vector, and elements among antennas share the same activity. Therefore, elements in the channel vector should be estimated jointly rather than separately, of which the distribution is formulated as a multi-dimensional Bernoulli-Gaussian distribution in this paper. In this regard, the conventional estimators, such as the minimum mean square error (MMSE) estimation or the maximum a posteriori (MAP) estimation, are hard to carry out straightforwardly. Specifically, it is computationally intractable to obtain a precise posterior distribution, since it involves marginalizing a joint distribution of the activity and channel with high dimensions.

Another challenge for the proposed scheme in the MIMO case is activity detection. We note that whether in the GMAC [24, 25, 26] or fading channel [28, 29, 30] in the single-antenna case, the device activity can be directly detected. For instance, since the device’s channel is a scalar in the fading channel, it can be simply determined to be active when the estimated channel is non-zero or larger than a given threshold. However, it will be much involved in MIMO because there are multiple observations of the activity in the estimated channel vector. The existing works, such as [9], make a hard decision based on the energy of the channel and [33] considers the LLR at each antenna separately and sums them to obtain the final decision. However, the threshold for the decision is hard to obtain in practice and the distribution of the channel is not utilized in [9]. Although [33] considers the channel distribution and gives a closed-form expression for activity detection, the nature that channels at each antenna share the same activity is not considered.

Besides, the LDPC code is efficient but sensitive to the CE results. Therefore, the overall performance is limited by the accuracy of CE. Moreover, with the presence of collision in URA, if more than one device chooses the same codeword in the CS phase, the corresponding channels will be superimposed and thus the subsequent LDPC decoding process will fail.

I-C Contributions

To cope with the arising issues, based on the message passing (MP) algorithm, we propose the Joint DAD-CE algorithm to conduct the task of joint activity detection and channel estimate in the CS phase, and MIMO-LDPC-SIC Decoding algorithm for data decoding embedded in the LDPC phase. Moreover, to further improve the performance, we propose the Joint DAD-CE-DD algorithm to jointly update the messages in these two phases by utilizing the belief of each other. Finally, we propose a collision resolution protocol to address the collision in URA. The key and distinguishing features of the proposed algorithms are listed below.

•

Based on the principle of the BP algorithm, we develop a low-complexity iterative MP algorithm to decode the two parts of data. For the CS phase, we investigate the Joint DAD-CE algorithm to recover part of the data embedded in the device activities, the interleaving patterns, and channel coefficients, which are the key parameters for the remaining data. Specifically, we derive a close-form expression for DAD by utilizing the joint distribution of the channel among antennas. Combined with SIC and in the spirit of interleave-division multiple access (IDMA) [35], we elaborate on the MIMO-LDPC-SIC Decoding algorithm to decode the remaining data embedded in the LDPC phase. In addition to complexity reduction, the proposed algorithm exhibits a substantial performance enhancement in terms of accuracy and efficiency compared to the state-of-the-art CB-ML algorithm.
•

Thanks to the consistency of the MP algorithm, we propose the Joint DAD-CE-DD algorithm to further improve the performance. The proposed algorithm suggests a paradigm connecting the two parts. That is, messages in the decoding process of CS and LDPC parts can be jointly updated by utilizing the belief of each other, thus leading to improved performance. We employ the correctly decoded codewords as soft pilots to conduct CE jointly with the codewords in the CS phase, which contributes to improved accuracy of the estimated channel. Combined with the SIC method, the accuracy of the residual signal can be improved, leading to the enhanced decoding performance.
•

Under the current framework based on the common codebook, a collision happens if there are more than one device having the same preamble, which leads to a superimposition of the estimated channels. Accordingly, the superimposed channel will cause a failure in the LDPC decoding process. To this end, we propose a collision resolution protocol based on the energy detection (ED) and sliding window protocol (SWP). Succinctly, the ED is performed on the estimated channel of each device to find out the superimposition. Then, the BS broadcasts the indexes of the superimposed channels to all devices. Afterwards, the devices in collision slide the window in the data sequence and the CS coding is again performed for retransmission.

The rest of the paper is organized as follows. In Section II, we introduce the system model for MIMO massive URA . In Section III, we implement the two-phase encoding scheme and the collision resolution protocol is developed in Section IV. Then, we elaborate on the low-complexity iterative decoding algorithm based on BP and explain the joint update algorithm in Section V. After verifying the numerical results and analyzing the complexity in VI, we conclude the paper in Section VII.

I-D Notations

Unless otherwise noted, lower- and upper-case bold fonts, $\bm{x}$ and $\bm{X}$ , are used to denote vectors and matrices, respectively; the $(m,n)$ -th entry of $\bm{X}$ is denoted as ${X}_{m,n}$ ; $\bm{X}_{i,:}$ denotes the $i$ -th row of $\bm{X}$ ; $\{\cdot\}^{*}$ denotes the conjugate of a number; $\{\cdot\}^{T}$ and $\{\cdot\}^{H}$ denote transpose and conjugate transpose of a matrix, respectively; $\mathbb{E}\{\cdot\}$ and $\text{Var}\left\{\cdot\right\}$ denote the statistical expectation and variance, respectively; $\bm{X}\sim\mathcal{CN}(\bm{x};\bm{\mu},\bm{\Sigma})$ means that the random vector $\bm{x}$ follows a complex Gaussian distribution with mean $\bm{\mu}$ and auto-covariance matrix $\bm{\Sigma}$ ; $\{\cdot\}!$ denotes the factorial; $\left[1:M\right]$ denotes the set of integers between $1$ and $M$ ; $\mathcal{L}\backslash l$ denotes the entries in set $\left\{1,2,\cdots,L\right\}$ except $l$ ; $\text{Re}(\cdot)$ and $\text{Im}(\cdot)$ denote the real and imaginary parts of a complex number, respectively; $\lceil\cdot\rceil$ denotes ceiling; $i$ is the imaginary unit (i.e., $i^{2}=-1$ ).

II System Model

Consider the uplink of a single-cell cellular network consisting of $K_{tot}$ single-antenna devices, which are being served by a base station (BS) equipped with $M$ antennas. This paper assumes sporadic device activity, i.e, a small number, $K_{a}\ll K_{tot}$ of devices are active within a coherence time. Each device has $B$ bits of information to be coded and transmitted into a block-fading channel with $L$ channel uses. Let $\bm{v}_{k}\in\left\{0,1\right\}^{B}$ denote device $k$ ’s binary message and $f(\cdot):\left\{0,1\right\}^{B}\rightarrow\mathbb{C}^{L}$ is some one-to-one encoding function. Typically, in URA scenario, the implementation of $f(\cdot)$ is to select the corresponding codeword $\bm{a}_{k}$ from a shared codebook $\bm{A}=\left[\bm{a}_{1},\bm{a}_{2},\cdots,\bm{a}_{2^{B}}\right]\in\mathbb{C}^{L\times 2^{B}}$ according to $\bm{v}_{k}$ [31, 20, 32]. The corresponding received signal can be written as

\bm{Y}=\sum\nolimits_{k\in\mathcal{K}_{tot}}{\phi_{k}f(\bm{v}_{k})\tilde{\bm{h}}_{k}^{T}}+\bm{Z},

(1)

where $\phi_{k}$ is the device activity indicator, which is modeled as a Bernoulli random variable and defined as follows:

\phi_{k}=\left\{\begin{array}[]{l}1,\text{ if device }k\text{ is active, }\\ 0,\text{ otherwise }\end{array}\quad\forall k\in\mathcal{K}_{tot}.\right.

(2)

$\tilde{\bm{h}}_{k}\in\mathbb{C}^{M\times 1}$ is the channel vector of device $k$ and $\bm{Z}\in\mathbb{C}^{L\times M}$ is the additive white Gaussian noise (AWGN) matrix distributed as $\mathcal{CN}(0,\sigma^{2}\bm{I}_{M})$ . In line with the state-of-the-art setting [32], we assume for simplicity and fair comparison that $\bm{h}_{k}$ are i.i.d., i.e., independent of each other and uncorrelated among antennas. Specifically, the Rayleigh fading model is considered in this paper, where $\tilde{\bm{h}}_{k}=\sqrt{\beta_{k}}\bm{g}_{k}$ and $\bm{g_{k}}\sim\mathcal{CN}(0,\bm{I}_{M})$ denotes the Rayleigh fading component, and $\beta_{k}$ denotes LSFC. We would like to mention that more realistic channel models in MIMO or massive MIMO have been discussed and well investigated, such as the spatially correlated fading channel [9, 36, 37, 38, 39]. Due to the limited angular spread, the channel in the virtual angular domain exhibits a sparsity among antennas. As such, the data of devices is not directly superimposed but staggered on different antennas, and the multi-user interference can be further reduced, contributing to improved performance. Nevertheless, for the sake of consistency with the benchmarks [4, 32] and isolating the fundamental aspects of the problem without additional model complication, we embark throughout this paper on the study of i.i.d. Rayleigh fading channel. The study on the spatially correlated channel is remarkable and left for future work.

The BS’s task is to produce a list of transmitted messages $\mathcal{L}(\bm{Y})$ without identifying from whom they are sent, thereby leading to the so-called URA. The performance of a URA system is evaluated by the probability of missed detection and false alarm, denoted by $p_{md}$ and $p_{fa}$ , respectively, which are given by:

	$\displaystyle p_{md}$	$\displaystyle=\frac{1}{{{K_{a}}}}\sum\nolimits_{k\in{\mathcal{K}_{a}}}{P\left({{\bm{v}_{{k}}}\notin\mathcal{L}}\right)}$		(3)
	$\displaystyle p_{fa}$	$\displaystyle=\frac{{\left\|{\mathcal{L}\backslash\left\{{{\bm{v}_{{k}}}:k\in{\mathcal{K}_{a}}}\right\}}\right\|}}{{\left\|\mathcal{L}\right\|}}.$		(4)

In this system, the code rate $R_{c}={B}/{L}$ and the spectral efficiency $\mu=\frac{B\cdot K_{a}}{L\cdot M}$ . Let $P$ denote the power (per symbol) of each device, then the energy-per-bit $E_{b}/N_{0}$ is defined by

E_{b}/N_{0}=\frac{LP}{2B}.

(5)

III Encoder

As aforementioned, the existing slotted transmission scheme exhibits low efficiency by the CS coding. To mitigate this issue, a two-phases coding scheme is proposed in [26, 25], which considers the $T$ -user Gaussian multiple access (GMAC) channel and also URA scenario. We extend this work to the MIMO system and refer to it as the CS-LDPC coding scheme. Similarly, the hierarchical form of the encoding process is considered in this paper. The $B$ bits of information are first divided into two parts, namely, $B_{p}$ and $B_{c}$ bits with $B_{p}+B_{c}=B$ . Typically, one would want $B_{p}\ll B_{c}$ . The former $B_{p}$ information bits are coded by a CS-based encoder to pick codeword from the common codebook. Based on the codebook, the BS is tasked to recover part of the messages, the number of active devices, channel coefficients as well as interleaving patterns for the latter part. The remaining $B_{c}$ bits are coded with LDPC codes. For clarity, we denote the former and latter encoding processes as CS and LDPC phases, respectively. Correspondingly, the total $L$ channel uses are split into two segments of lengths $L_{p}$ and $L_{c}$ , respectively, with $L_{p}+L_{c}=L$ . Since only a small fraction of data is CS coded and the rest is LDPC coded, the efficiency in our scheme is higher than those purely CS-coded schemes [33, 19, 20, 32]. The key features of this encoding process are summarized in Fig. 1. We elaborate on these two encoding phases below.

Refer to caption — Figure 1: The encoding process in the CS and LDPC phases.

III-A CS Phase

The URA fashion is considered in this phase. Let $\mathbf{A}\in\mathbb{C}^{L_{p}\times 2^{B_{p}}}$ denote the common codebook shared by all the devices. That is, the columns of $\mathbf{A}=\left[\bm{a}_{1},\bm{a}_{2},\cdots,\bm{a}_{2^{B_{p}}}\right]$ form a set of codewords with power constraint $\left\|{{\mathbf{a}_{i}}}\right\|_{2}^{2}=L_{p}$ , from which each active device chooses in order to encode its ${B_{p}}$ bits of information. With a slight abuse of notation, let $\bm{v}_{k}^{p}$ denote the first $B_{p}$ bits of device $k$ ’s binary message, i.e., $\bm{v}_{k}^{p}\triangleq\bm{v}_{k}(1:B_{p})\in\left\{0,1\right\}^{B_{p}}$ . To apply the encoding scheme, we convert $\bm{v}_{k}^{p}$ into the $\ell_{1}$ -norm binary vector $\tilde{\bm{v}}_{k}^{p}\in\left\{0,1\right\}^{2^{B_{p}}}$ , in which a single one is placed at the location $i_{k}$ . The value $i_{k}$ of binary sequence $\bm{v}_{k}^{p}$ is obtained by regarding it as an integer expressed in radix- $2$ (plus one), which we write as $i_{k}=\left[\bm{v}_{k}^{p}\right]_{2}\in\left[1:2^{B_{p}}\right]$ . Then, the coded sequence is obtained by taking the transpose of the $i_{k}$ -th column of $\bm{A}$ , which we denote by $\bm{a}_{i_{k}}^{T}$ . This facilitates the CS architecture, which maps the information to an elementary vector $\tilde{\bm{v}}_{k}^{p}$ , according to which the corresponding codeword is selected from a fixed codebook. The corresponding received signal can be written as

\bm{Y}=\sum\nolimits_{k\in\mathcal{K}_{tot}}{\phi_{k}\bm{a}_{i_{k}}\tilde{\bm{h}}_{k}^{T}}+\bm{Z},

(6)

where $\phi_{k}$ is the device activity indicator, as defined in (2). The matrix form of (6) is given by

\bm{Y}=\bm{A}\bm{\Gamma}\tilde{\bm{H}}+\bm{Z},

(7)

where $\mathbf{A}\in\mathbb{C}^{L\times 2^{B_{p}}}$ is the common codebook shared by all devices. $\mathbf{\Gamma}\in{\left\{{0,1}\right\}^{{2^{B_{p}}}\times{K_{tot}}}}$ denotes the binary selection matrix. For each active device $k\in\mathcal{K}_{a}$ , the corresponding column $\bm{\Gamma}_{:,k}$ is all-zero but a single one in position $i_{k}$ , while for all $\mathcal{K}_{tot}\backslash\mathcal{K}_{a}$ the corresponding column $\bm{\Gamma}_{:,k}$ is all zeros. $\tilde{\bm{H}}=\left[\tilde{\bm{h}}_{1},\cdots,\tilde{\bm{h}}_{K_{tot}}\right]^{T}\in\mathbb{C}^{K_{tot}\times M}$ corresponds to the MIMO channel coefficient matrix. We note that the number of total devices $K_{tot}$ plays no role in URA. In order to get rid of this variable, with slight abuse of notation we define the modified activity indicator and selection matrix as $\phi_{r}=\sum\nolimits_{k\in\mathcal{K}_{a}}{\gamma_{r,k}}$ and $\bm{\Phi}=\text{diag}\left\{\phi_{1},\cdots,\phi_{2^{B_{p}}}\right\}$ , respectively, where $\gamma_{r,k}$ is the $(r,k)$ -th entry of $\bm{\Gamma}$ . Correspondingly, the modified channel is ${\bm{H}}=\left[\bm{h}_{1},\cdots,\bm{h}_{2^{B_{p}}}\right]$ with $\bm{h}_{r}=\sum\nolimits_{k\in{\mathcal{K}_{a}}}{\gamma_{r,k}\tilde{\bm{h}}_{k}}$ . Hence, (7) can be written as

\bm{Y}=\bm{A}\bm{\Phi}\bm{H}+\bm{Z}.

(8)

Let $\bm{X}=\bm{\Phi}\bm{H}=\left[\bm{x}_{1},\cdots,\bm{x}_{2^{B_{p}}}\right]^{T}$ . The goal for the BS in the CS phase is to detect the non-zero positions of the selection matrix $\bm{\Phi}$ and the corresponding channel vectors by recovering $\bm{X}$ based on the noisy observation $\bm{Y}$ . Since $\bm{X}$ is row sparse, i.e., many $\bm{x}_{n}$ are zero, such reconstruction problem can be modeled as the CS problem [4]. Once $\bm{\Phi}$ is recovered, the message indicators of active devices $\left\{i_{k},\forall k\in\mathcal{K}_{a}\right\}$ are also recovered. Moreover, $i_{k}$ acts as the parameter of the LDPC code, since it determines the interleaving pattern of the data in the LDPC phase. Considering the fact that the $B_{p}$ bits of message carries key parameters for the latter phase, we name it the preamble. We note that it is different from the preamble in the traditional grant-free scenario, which is a pure pilot with no data embedded.

III-B LDPC Phase

Likewise, let $\bm{v}_{k}^{c}\triangleq\bm{v}_{k}(B_{p}+1:B)\in\left\{0,1\right\}^{B_{c}}$ denote the remaining $B_{c}$ information bits of device $k$ . $\bm{v}_{k}^{c}$ is first encoded into an LDPC code $\bm{b}_{k}^{c}\in\left\{0,1\right\}^{\tilde{L}_{c}}$ , which is determined by the LDPC parity check matrix $\bm{C}$ with size $\left(\tilde{L}_{c}-B_{c}\right)\times\tilde{L}_{c}$ . We note that in the decoding process, if $\hat{\bm{v}_{k}^{c}}$ is a valid codeword, then $\text{mod}\left(\bm{C}\hat{\bm{v}}_{k}^{c},2\right)=0$ . $\bm{b}_{k}^{c}$ is then modulated to $\bm{s}_{k}^{c}$ . We adopt a sparse spreading scheme introduced in [25]. That is, $\bm{s}_{k}^{c}$ is zero-padded into a length $L_{c}$ vector $\tilde{\bm{s}}_{k}^{c}$

\tilde{\bm{s}}_{k}^{c}=\left[\bm{s}_{k}^{c},~{}0,~{}\cdots,~{}0\right].

(9)

We then employ the index representation $i_{k}$ to permute the ordering of $\tilde{\bm{s}}_{k}^{c}$ . This is implemented by a random interleaver $\pi_{i_{k}}$ with interleaving pattern $i_{k}$ . As mentioned in [26], the purpose for permuting the codewords is to decorrelate the random access interference from other devices. This is similar to the IDMA scheme since the interleaving patterns for most of the devices are different because of the distinctive indices $i_{k}$ . Hence, $\bm{v}_{k}^{c}$ is finally encoded to $\pi_{i_{k}}\left(\tilde{\bm{s}}_{k}^{c}\right)$ . Appending it to the coded message in the CS phase yields the final codeword $\bm{x}_{k}$ :

\bm{x}_{k}=\left[\bm{a}_{i_{k}}^{T},\pi_{i_{k}}\left(\tilde{\bm{s}}_{k}^{c}\right)\right]^{T}.

(10)

The received signal including both the CS and LDPC phases is given by

\bm{Y}=\sum\nolimits_{k\in\mathcal{K}_{a}}{\bm{x}_{k}\bm{h}_{i_{k}}^{T}}+\bm{Z},

(11)

where $\bm{h}_{i_{k}}$ is assumed to follow independent quasi-static flat fading within the above two phases in this paper. The goal for the BS is to recover $\left\{\bm{v}_{k},\forall k\in\mathcal{K}_{a}\right\}$ based on the received signal $\bm{Y}$ and the channel $\bm{h}_{i_{k}}$ estimated in the CS phase. We emphasize again that the BS only produces the transmitted messages without distinguishing the corresponding devices.

This hierarchical encoding process appears to be similar to the work of [26], where CS and channel coding techniques are utilized to encode the messages. However, unlike in [26] considering the GMAC system, our approach is in the MIMO channel. Hence, in addition to recovering the parameters of LDPC codes conveyed by the codebook $\bm{A}$ as in [26], the BS is also tasked to estimate the channel coefficients. Besides, as we will see shortly in Section V-C, a belief propagation decoder draws a connection between the CS and LDPC phases. That is, messages in the CE as well as DD processes can be jointly updated by utilizing the belief of each other. Whereas the two phases in [26] are two independent modules and work sequentially.

IV Collision Resolution Protocol

It is possible that two or more devices have the same preamble message, $\bm{v}^{p}$ , which, although, may occur in a small probability. In this case, the collided devices will have the same interleaving pattern in the LDPC phase, which goes against the principles of the IDMA scheme. Moreover, they will choose the same codeword as their coded messages in the CS phase, which results in the received signal in the CS phase being

\bm{Y}_{colli}=\sum\nolimits_{i_{k}\in\mathcal{C}}{\bm{a}_{i_{k}}\sum\nolimits_{j\in\mathcal{K}_{i_{k}}}{\tilde{\bm{h}}_{j}^{T}}}+\bm{Z},

(12)

where $\mathcal{C}$ and $\mathcal{K}_{i_{k}}$ denote the set of collided indexes and collided devices corresponding to the message index $i_{k}$ , respectively. In this regard, the BS can only recover the superimposed channels of these devices, instead of their own, which will lead to the failure of the LDPC decoding process. Although collision may occur in a small probability, it can occur. However, the design of an efficient collision resolution scheme in URA remains missing.

In this paper, we develop a collision resolution protocol based on the ED and SWP. We note that in real scenarios, the near-far effects can be well solved with the existing power control schemes [40, 41, 42]. In this regard, a flat fading channel is considered in this paper. That is, the LSFCs $\beta_{k}$ for all the devices are assumed to be identical, as also considered in [31, 32]. As mentioned above, if a collision happens, the recovered channels of the collided devices will be a superposition of their own, that is

\hat{\bm{h}}_{i_{k}}=\sum\nolimits_{j\in\mathcal{K}_{i_{k}}}{\tilde{\bm{h}}_{j}}+\bm{z},\quad i_{k}\in\mathcal{C},

(13)

where $\bm{z}\sim\mathcal{CN}(0,\sigma^{2}\bm{I}_{M})$ denotes the Gaussian noise. And $\hat{\bm{h}}_{i_{k}}$ is distributed as $\hat{\bm{h}}_{i_{k}}\sim\mathcal{CN}(0,\left(|\mathcal{K}_{i_{k}}|+\sigma^{2})\bm{I}_{M}\right)$ , which has a higher power than those without collision. Therefore, an effective way to detect collision is to perform ED on the estimated channel by the BS

\epsilon_{i_{k}}=\mathbb{E}\left[\hat{\bm{h}}_{i_{k}}^{H}\hat{\bm{h}}_{i_{k}}\right].

(14)

If $\epsilon_{i_{k}}$ is greater than a given threshold $\eta$ , then it is utilized as evidence that there are at least two devices that have the same preamble and thus they choose the same CS codeword $\bm{a}_{i_{k}}$ . Since devices themselves do not know whether they have been in a collision, the BS needs to feed this information back. To this end, the BS will broadcast the collided index representations $\left\{i_{k}|\epsilon_{i_{k}}>\eta\right\}$ to all the devices to help them get the judgment. Fig. 2a showcases that device $i$ to $j$ realized that a conflict occurred after receiving the indexes broadcast by the BS. We note that in the above process, the additional information required at the BS is only the threshold and it can be easily preset in practice.

As illustrated in Fig. 2b, in order to get a new non-conflicting index representation, the collided devices will slide the window with length $B_{p}$ bits forward within the total $B$ bits to get new sequences, denoted by ${\bm{v}_{k}^{p}}^{\prime}$ . We denote $B_{0}$ as the sliding length which satisfies $0<B_{0}<B_{p}$ . The reason behind $B_{0}<B_{p}$ is that there should be a common part, labeled in green in Fig. 2b, between the sequences before and after sliding the window, so as to splice back the sequences between different windows.

Algorithm 1 Collision Resolution Protocol

1: Input: estimated channel

\hat{\bm{H}}

, maximum iteration

t_{max}

, maximum and minimum threshold

\eta

\gamma

2: Output: recovered channel

\tilde{\bm{H}}

3: Initialize: iteration count

t=0

\tilde{\bm{H}}=\left[~{}\right]

4: repeat

5: Energy detection:

\epsilon_{i_{k}}=\mathbb{E}\left[\hat{\bm{H}}_{i_{k},:}\hat{\bm{H}}_{i_{k},:}^{H}\right],i_{k}\in\left[1:2^{B_{p}}\right]

6: Feedback: the BS broadcasts

\left\{i_{k},\epsilon_{i_{k}}>\eta\right\}

7: Combine:

\tilde{\bm{H}}\leftarrow\left[\tilde{\bm{H}};\hat{\bm{H}}_{i_{k},:}\right],\left\{i_{k}|\gamma<\epsilon_{i_{k}}<\eta\right\}

8: Window sliding and retransmission

9: Channel estimation:

\hat{\bm{H}}

10:

t\leftarrow t+1

11: until

t=t_{max}

\{\epsilon_{i_{k}}<\eta,~{}\forall i_{k}\in\left[1:2^{B_{p}}\right]\}

12: return

\tilde{\bm{H}}

After obtaining ${\bm{v}_{k}^{p}}^{\prime}$ , the CS-based encoder is again performed to encode ${\bm{v}_{k}^{p}}^{\prime}$ to $\bm{a}_{{i_{k}}^{\prime}}$ with ${i_{k}}^{\prime}=\left[{\bm{v}_{k}^{p}}^{\prime}\right]_{2}\in\left[1:2^{B_{p}}\right]$ . The encoding process is the same as that in III-A. Channels of the collided devices are expected to be recovered separately after the retransmission. The ED will be again performed on the recovered channels. If the collision still exists, the window sliding process will be executed again until the maximum number of retransmission is reached or no collision exists. The above collision resolution protocol is summarized in Algorithm 1. We give the analysis for this collision resolution protocol in Appendix -A, which illustrates that as the window sliding progresses, the number of collided devices will decrease and tend to zero.

V Decoder

The decoding process can be distilled into two key operations: the recovery of the preamble as well as the key parameters for the LDPC code, and the LDPC decoding process combined with SIC [43]. Both are carried out with the MP algorithm. We emphasize again that the beliefs of these two parts can be leveraged to jointly update the messages of the decoding process, which is not considered in [25, 26].

V-A Joint DAD-CE Algorithm

The recovery of the preamble as well as the channels in the CS phase is equivalent to the joint DAD-CE problem, which can be modeled as a CS problem. According to the formulation in (8), the recovery of the sparse matrix $\bm{X}$ can be addressed by the CS-based methods, such as the AMP algorithm and its variants [8, 4, 9, 10, 11]. Besides, the MP-based approaches [13, 14] also work well on the above issues. The Bernoulli and Rician messages are jointly updated in [13], which considers the fading channel and grant-free scenario. This scenario is also considered in [14], which takes advantage of estimated data symbols as soft pilot sequences to perform joint channel and data estimation. In this subsection, we consider the MIMO system and derive the update rules of messages based on the BP algorithm.

We derive the update rules of the activity indicator $\phi_{k}$ and channel vector $\bm{h}_{k}$ in (6), which are modeled as Bernoulli and Gaussian messages, respectively. The Gaussian messages can be characterized by the estimation of mean-value $\bm{u}_{k}$ and auto-covariance matrix $\bm{\Sigma}_{k}$ . That is, $\bm{u}_{k}$ and $\bm{\Sigma}_{k}$ are the estimation and estimating deviation of $\bm{h}_{k}$ in (6), respectively. Besides, the Bernoulli messages for the activity indicator can be represented by $p_{k}$ , which is the probability of $\phi_{k}$ taking the value one. These messages are updated iteratively between the observation and variable nodes, which can be characterized by the factor graph in Fig. 3, where the received signal $y_{l,m}$ represents the observation node, denoted by SN, and channel $h_{k,m}$ and the activity pattern $\phi_{k}$ are the variable nodes, denoted by VN. The edges in the factor graph represent the connections among nodes. In the BP algorithm, messages are passed along these edges. We elaborate on the update rules of these messages below.

V-A1 Message Update at Observation Nodes

We denote $p_{i\rightarrow l}^{VN}(t)$ as the Bernoulli message for the activity of device $i$ , which is passed from VN $i$ to SN $l$ in the $t$ -th iteration. Accordingly, $\mu_{im\rightarrow lm}^{VN}(t)$ and $\bm{\Sigma}_{i\rightarrow l}^{VN}(t)$ denote the Gaussian messages passed from VN $im$ to SN $lm$ , which represent the estimation and the estimating deviation of the channel $\bm{h}_{i}$ , respectively. The index $m\in\left[1:M\right]$ denotes the $m$ -th antenna and also the $m$ -th value of $\bm{h}_{i}$ . Since the update rules of the messages are the same with respect to different iterations, the index of iteration is omitted in the following derivation. For clarity, we assume there is no collision in the CS phase for the following derivation. As such, $i_{k}$ , the original subscript of $\bm{h}$ is replaced by $k$ , since there is a one-to-one mapping between these two terms. We emphasize that the collision is considered in our implementation and addressed by the proposed resolution protocol in Algorithm 1.

To give the message update rules at the SN $y_{l,m}$ in Fig. 4, we first rewrite (6) as

	$\displaystyle y_{lm}$	$\displaystyle=\sum\nolimits_{i=1}^{K}{A_{li}\phi_{i}h_{im}}+n_{lm}$		(15)
		$\displaystyle=A_{lk}\phi_{k}h_{km}+\underbrace{\sum\nolimits_{i\in\mathcal{K}\backslash k}{A_{li}\phi_{i}h_{im}}+n_{lm}}_{z_{lkm}},$		(15)

where $l$ is in $L_{p}$ and $\mathcal{K}\backslash k$ denotes the entries in set $\left\{1,2,\cdots,K\right\}$ except $k$ . The term $\sum\nolimits_{i\in K\backslash k}{A_{li}\phi_{i}h_{im}}+n_{lm}$ is modeled as an equivalent Gaussian noise with $\bm{z}_{lk}\sim\mathcal{CN}(\bm{\mu}_{z_{lk}},\bm{\Sigma}_{z_{lk}})$ , where $\bm{\mu}_{z_{lk}}=[\mu_{z_{lk1}},\cdots,\mu_{z_{lkm}},\cdots,\mu_{z_{lkM}}]$ and $\mu_{z_{lkm}}$ is given by

\mu_{z_{lkm}}=\sum\nolimits_{i\in\mathcal{K}\backslash k}{A_{li}\cdot p_{i\rightarrow l}^{VN}\cdot\mu_{im\rightarrow lm}^{VN}}.

(16)

For $\bm{\Sigma}_{z_{lk}}$ , the auto-covariance of $\bm{z}_{lk}$ , since $\phi_{i}$ is the same for $M$ antennas, resulting in the correlation among antennas, we should not consider the variance of $\bm{z}_{lkm}$ at each antenna $m$ separately. Instead, the covariance of $z_{lk}$ is considered in this paper. Rewrite (15) in a vector form:

	$\displaystyle\mathbf{y}_{l}^{T}$	$\displaystyle=\left[{\begin{array}[]{{20}{c}}y_{l1}\\ y_{l2}\\ \vdots\\ y_{lM}\end{array}}\right]=A_{lk}\phi_{k}\cdot\left[{\begin{array}[]{{20}{c}}h_{k1}\\ h_{k2}\\ \vdots\\ h_{kM}\end{array}}\right]$		(17)
		$\displaystyle+\underbrace{\sum\nolimits_{i\in\mathcal{K}\backslash k}{A_{li}\phi_{i}\cdot\left[{\begin{array}[]{{20}{c}}h_{i1}\\ h_{i2}\\ \vdots\\ h_{iM}\end{array}}\right]}+\left[{\begin{array}[]{{20}{c}}n_{l1}\\ n_{l2}\\ \vdots\\ n_{lM}\end{array}}\right]}_{\mathbf{z}_{lk}}.$		(17)

The $(m,n)$ -th $(m\neq n)$ entry for $\bm{\Sigma_{z_{lk}}}$ satisfies

	$\displaystyle(\Sigma_{z_{lk}})_{(m,n)}$	$\displaystyle=\sum\nolimits_{i\in\mathcal{K}\backslash k}{{\left\|{{A_{li}}}\right\|}^{2}\cdot p_{i\to l}^{VN}\cdot\Big{\{}(\Sigma_{i\to l}^{VN})_{(m,n)}}$		(18)
		$\displaystyle{+~{}q_{i\to l}^{VN}\cdot\mu_{im\to lm}^{VN}\cdot(\mu_{in\to ln}^{VN})^{*}\Big{\}}},~{}m\neq n,$		(18)

where $q_{i\to l}^{VN}=1-p_{i\to l}^{VN}$ denotes the probability that the Bernoulli variable $\phi_{k}$ equals to zero. If $m=n$ , we have

	$\displaystyle(\Sigma_{z_{lk}})_{(m,m)}$	$\displaystyle=\sum\nolimits_{i\in\mathcal{K}\backslash k}{{\left\|{{A_{li}}}\right\|}^{2}\cdot p_{i\to l}^{VN}\cdot\Big{\{}(\Sigma_{i\to l}^{VN})_{(m,m)}}$		(19)
		$\displaystyle{\left.\quad\quad\quad+~{}q_{i\to l}^{VN}\cdot\left\|\mu_{im\to lm}^{VN}\right\|^{2}\right\}}+\sigma_{n}^{2}.$		(19)

Details about the derivation of $\bm{\Sigma_{z_{lk}}}$ are given in Appendix -B1. After obtaining the mean and covariance of $\bm{z}_{lk}$ , we can get the Gaussian messages $\mu_{lm\rightarrow km}^{SN}$ and $\bm{\Sigma}_{l\rightarrow k}^{SN}$ passed from SN $lm$ to VN $km$ as below

	$\displaystyle\begin{split}\mu_{lm\to km}^{SN}&=\mathbb{E}\left[h_{km}\|y_{lm},\mu_{z_{lkm}},\bm{\Sigma}_{z_{lk}},\phi_{k}=1\right]\\ &=(y_{lm}-\mu_{z_{lkm}})/A_{lk}\end{split}$			(20)
	$\displaystyle\begin{split}\bm{\Sigma}_{l\to k}^{SN}&=\text{Var}\left[h_{km}\|y_{lm},\mu_{z_{lkm}},\bm{\Sigma}_{z_{lk}},\phi_{k}=1\right]\\ &=\bm{\Sigma}_{z_{lk}}/\left\|A_{lk}\right\|^{2},\end{split}$			(21)

where $\mathbb{E}\left[a|b\right]$ and $\text{Var}\left[a|b\right]$ denote the expectation and variance of $a$ conditioned on $b$ , respectively.

For the Bernoulli message $p_{l\to k}^{SN}$ passed for SN $l$ to VN $k$ , we have

$\displaystyle p_{l\to k}^{SN}$	$\displaystyle=\left[1+\frac{P(\mathbf{y}_{l}\|\phi_{k}=0,\bm{\mu}_{z_{lk}},\bm{\Sigma}_{z_{lk}})}{P(\mathbf{y}_{l}\|\phi_{k}=1,\bm{\mu}_{z_{lk}},\bm{\Sigma}_{z_{lk}})}\right]^{-1}$	(22)
	$\displaystyle=\left[1+\frac{P(\bm{y}_{l}=\bm{z}_{lk}\|\bm{\mu}_{z_{lk}},\bm{\Sigma}_{z_{lk}})}{P(\bm{y}_{l}=A_{lk}\cdot\bm{h}_{k}+\bm{z}_{lk}\|\bm{\mu}_{z_{lk}},\bm{\Sigma}_{z_{lk}})}\right]^{-1}$
	$\displaystyle=\left[1+\frac{f(\bm{y}_{l}\|\bm{\mu}_{z_{lk}},\bm{\Sigma}_{z_{lk}})}{f(\bm{y}_{l}\|\bm{\mu}_{z_{lk}}^{{}^{\prime}},\bm{\Sigma}_{z_{lk}}^{{}^{\prime}})}\right]^{-1}$
	$\displaystyle=\frac{f(\bm{y}_{l}\|\bm{\mu}_{z_{lk}}^{{}^{\prime}},\bm{\Sigma}_{z_{lk}}^{{}^{\prime}})}{f(\bm{y}_{l}\|\bm{\mu}_{z_{lk}},\bm{\Sigma}_{z_{lk}})+f(\bm{y}_{l}\|\bm{\mu}_{z_{lk}}^{{}^{\prime}},\bm{\Sigma}_{z_{lk}}^{{}^{\prime}})},$

where

	$\displaystyle\bm{\mu}_{z_{lk}}^{{}^{\prime}}$	$\displaystyle=A_{lk}\cdot\bm{\mu}_{k\to l}^{VN}+\bm{\mu}_{z_{lk}}$		(23)
	$\displaystyle\bm{\Sigma}_{z_{lk}}^{{}^{\prime}}$	$\displaystyle=\left\|A_{lk}\right\|^{2}\cdot\bm{\Sigma}_{k\to l}^{VN}+\bm{\Sigma}_{z_{lk}},$		(24)

which denote the mean-value and covariance of $\bm{y}_{l}$ when $\phi_{k}=1$ , respectively. And $f(\bm{x}|\bm{\mu},\Sigma)$ denotes the probability density function (pdf) of the multi-dimensional complex Gaussian distribution $\mathcal{CN}_{M}(\bm{x}|\bm{\mu},\Sigma)$ , that is

f(\bm{x}|\bm{\mu},\bm{\Sigma})=\frac{1}{\pi^{M}\!\cdot\!\det\left(\bm{\Sigma}\right)}\!\cdot\exp\left[-(\bm{x}-\bm{\mu})^{H}\bm{\Sigma}^{-1}(\bm{x}-\bm{\mu})\right].

(25)

Moreover, the Bernoulli message can be simplified by the use of log-likelihood ratio (LLR) to reduce the complexity as well as to avoid the computation overflow. Hence, the LLR of the message in (22) can be represented as

$\displaystyle l_{l\to k}^{SN}$	$\displaystyle\triangleq\text{ln}\frac{P(\phi_{k}=1)}{P(\phi_{k}=0)}=\ln\frac{p_{l\to k}^{SN}}{1-p_{l\to k}^{SN}}$	(26)
	$\displaystyle\overset{(a)}{=}\ln\frac{\det\left(\bm{\Sigma}_{z_{lk}}\right)}{\det\left(\bm{\Sigma}_{z_{lk}}^{{}^{\prime}}\right)}\!+\!(\bm{y}_{l}-\bm{\mu}_{z_{lk}})^{H}\cdot\bm{\Sigma}_{z_{lk}}^{-1}\cdot(\bm{y}_{l}-\bm{\mu}_{z_{lk}})$
	$\displaystyle\quad\quad\quad-(\bm{y}_{l}-\bm{\mu}_{z_{lk}}^{{}^{\prime}})^{H}\cdot(\bm{\Sigma}_{z_{lk}}^{{}^{\prime}})^{-1}\cdot(\bm{y}_{l}-\bm{\mu}_{z_{lk}}^{{}^{\prime}}),$

where $\overset{(a)}{=}$ is derived by the substitution of (22) and (25).

V-A2 Message Update at Variable Nodes

Likewise, the Gaussian and Bernoulli messages at VNs are updated by collecting the incoming messages from SNs. To ensure convergence, messages from the VN’s own are not included in the calculation [12]. Typically, for the Gaussian messages, since $\bm{h}$ follows the Gaussian distribution, the update rule at the VN is to multiply the pdfs observed at each SN to obtain a new one. We note that the prior Gaussian distribution of $\bm{h}$ is also included in the multiplication. The new pdf still follows the Gaussian distribution, of which the mean and covariance are the updated messages at the VN. The pdf of $\bm{h}_{k}$ passed from VN $k$ to SN $l$ is given by

		$\displaystyle f(\bm{h}\|\bm{\mu}_{k\to l}^{VN},\bm{\Sigma}_{k\to l}^{VN})$		(27)
		$\displaystyle\quad\quad\propto\prod\nolimits_{i\in\mathcal{L}\backslash l}{f(\bm{h}\|\bm{\mu}_{i\to k}^{SN},\bm{\Sigma}_{i\to k}^{SN})}\cdot f(\bm{h}\|\bm{\mu}_{k}^{pri},\bm{\Sigma}_{k}^{pri}).$		(27)

Accordingly, for the Gaussian pdf $f(\bm{h}|\bm{\mu}_{k\to l}^{VN},\bm{\Sigma}_{k\to l}^{VN})$ , the mean and covariance are give by

	$\displaystyle\begin{split}\bm{\mu}_{k\to l}^{VN}&=\mathbb{E}\left[\bm{h}_{k}\|\bm{\mu}_{i\to k}^{SN},\bm{\Sigma}_{i\to k}^{SN},i\in\mathcal{L}\backslash l\right]\\ &=\bm{\Sigma}_{k\to l}^{VN}\cdot\Big{[}(\bm{\Sigma}_{k}^{pri})^{-1}\cdot\bm{\mu}_{k}^{pri}\\ &\quad\quad\quad\quad+\sum\nolimits_{i\in\mathcal{L}\backslash l}{(\bm{\Sigma}_{i\to k}^{SN})^{-1}\cdot\bm{\mu}_{i\to k}^{SN}}\Big{]}\end{split}$			(28)
	$\displaystyle\begin{split}\bm{\Sigma}_{k\to l}^{VN}&=\operatorname{Var}\left[\bm{h}_{k}\|\bm{\mu}_{i\to k}^{SN},\bm{\Sigma}_{i\to k}^{SN},i\in\mathcal{L}\backslash l\right]\\ &=\left[\sum\nolimits_{i\in\mathcal{L}\backslash l}{(\bm{\Sigma}_{i\to k}^{SN})^{-1}}+(\bm{\Sigma}_{k}^{pri})^{-1}\right]^{-1},\end{split}$			(29)

where $\bm{\mu}_{k}^{pri}=\bm{0}_{M\times 1}$ and $\bm{\Sigma}_{k}^{pri}=\bm{I}_{M}$ are the prior mean and covariance of $\bm{h}_{k}$ , $\mathcal{L}\backslash l$ denotes the entries in set $\left\{1,2,\cdots,L\right\}$ except $l$ . We give the derivations of (28) and (29) in Appendix -B2.

The derivation of the Bernoulli messages is the same as above, which is updated by collecting the messages observed at SNs. For $p_{k\to l}^{VN}$ passed for VN $k$ to SN $l$ , it is obtained by multiplying the probability of $\phi_{k}=1$ passed from all the SNs to VN $k$ and then normalizing. Likewise, for convergence, the message passed from SN $l$ to VN $k$ is not included. We emphasize that the prior activation probability of each device $p_{a}$ is also considered. As such, $p_{k\to l}^{VN}$ is given by

	$\displaystyle p_{k\to l}^{VN}$	$\displaystyle=P\left(\phi_{k}=1\|\left\{p_{i\to k}^{SN},i\in\mathcal{L}\backslash l\right\},p_{a}\right)$		(30)
		$\displaystyle=\frac{p_{a}\cdot\prod_{i\in\mathcal{L}\backslash l}p^{SN}_{i\to k}}{p_{a}\cdot\prod_{i\in\mathcal{L}\backslash l}p^{SN}_{i\to k}+(1-p_{a})\cdot\prod_{i\in\mathcal{L}\backslash l}\left({1-p^{SN}_{i\to k}}\right)}.$		(30)

Likewise, for complexity reduction, we also employ the LLR to represent this message in iterations, of which the relationship with the activation probability is

	$\displaystyle l_{k\to l}^{VN}$	$\displaystyle=\ln\frac{p^{VN}_{k\to l}}{1-p^{VN}_{k\to l}}=l_{0}+\sum_{i\in\mathcal{L}\backslash l}l^{SN}_{i\to k}$		(31)
	$\displaystyle p^{VN}_{k\to l}$	$\displaystyle=\frac{1}{1+\exp\left({{-l^{VN}_{k\to l}}}\right)},$		(31)

where $l_{0}=\ln\frac{p_{a}}{1-p_{a}}$ is the prior LLR of the probability for the device being active.

V-A3 DAD Decision and CE Output

Since the messages above are iteratively updated between SNs and VNs, after reaching the maximum number of iterations, the Bernoulli and Gaussian messages will have an output at VNs. For the Gaussian messages, similar to the above update rules, the output is obtained by combining all the incoming messages from SNs, i.e,

	$\displaystyle\begin{split}\bm{\mu}_{k}^{dec}&=\bm{\Sigma}_{k}^{dec}\cdot\left[(\bm{\Sigma}_{k}^{pri})^{-1}\cdot\bm{\mu}_{k}^{pri}\right.\\ &\quad\quad\quad\quad\left.+\sum\nolimits_{i\in\mathcal{L}}{(\bm{\Sigma}_{i\to k}^{SN})^{-1}\cdot\bm{\mu}_{i\to k}^{SN}}\right]\end{split}$			(32)
	$\displaystyle\begin{split}\bm{\Sigma}_{k}^{dec}&=\left[\sum\nolimits_{i\in\mathcal{L}}{(\bm{\Sigma}_{i\to k}^{SN})^{-1}}+(\bm{\Sigma}_{k}^{pri})^{-1}\right]^{-1},\end{split}$			(33)

which denote the output estimation and estimating deviation of $\bm{h}_{k}$ , respectively. For the Bernoulli messages, the LLR of the DAD decision is

l^{\text{dec}}_{k}=l_{0}+\sum\nolimits_{l\in\mathcal{L}}l^{SN}_{l\to k}+l^{ce}_{k}.

(34)

Device $k$ ’s activity is detected as $\hat{\phi}_{k}=1$ if $l^{\text{dec}}_{k}>0$ and vice versa. The term $l_{k}^{ce}$ in (34) is to improve the DAD accuracy by exploiting the CE result [13], which is derived as follows. The estimated channel $\bm{h}_{k}$ can be modeled as $\hat{\bm{h}}_{k}=\bm{h}_{k}+\bm{\epsilon}_{k}$ , where $\bm{\epsilon}_{k}$ is the complex Gaussian noise distributed as $\bm{\epsilon}_{k}\sim\mathcal{CN}(0,\bm{\Sigma}_{k}^{dec})$ . Accordingly, the distribution of $\hat{\bm{h}}_{k}$ with respect to value of $\phi_{k}$ is

\hat{\bm{h}}_{k}\sim\left\{\begin{array}[]{l c}\mathcal{CN}(\bm{\mu}_{k}^{pri},\bm{\Sigma}_{k}^{pri}+\bm{\Sigma}_{k}^{dec}),&\phi_{k}=1\\ \mathcal{CN}(\bm{0},\bm{\Sigma}_{k}^{dec}),&\phi_{k}=0\end{array}\quad\forall k\in\mathcal{K}_{tot}.\right.

(35)

Therefore, this information can be leveraged to give an extra belief to the DAD decision. Similar to (22), $l^{ce}_{k}$ is given by

$\displaystyle l^{ce}_{k}=$	$\displaystyle\ln\frac{P\left({\hat{\bm{h}}_{k}=\bm{\mu}_{k}^{dec}\|\phi_{k}=1,\bm{\mu}_{k}^{pri},\bm{\Sigma}_{k}^{dec},\bm{\Sigma}_{k}^{pri}}\right)}{P\left({\hat{\bm{h}}_{k}=\bm{\mu}_{k}^{dec}\|\phi_{k}=0,\bm{\Sigma}_{k}^{dec}}\right)}$	(36)
$\displaystyle=$	$\displaystyle\ln\frac{f\left({\bm{\mu}_{k}^{dec}\|\bm{\mu}_{k}^{pri},\bm{\Sigma}_{k}^{pri}+\bm{\Sigma}_{k}^{dec}}\right)}{f\left({\bm{\mu}_{k}^{dec}\|\bm{0},\bm{\Sigma}_{k}^{dec}}\right)}$
$\displaystyle=$	$\displaystyle\ln\frac{\det\left(\bm{\Sigma}_{k}^{dec}\right)}{\det\left(\bm{\Sigma}_{k}^{pri}\!+\!\bm{\Sigma}_{k}^{dec}\right)}+\left(\bm{\mu}_{k}^{dec}\right)^{H}\cdot\left(\bm{\Sigma}_{k}^{dec}\right)^{-1}\cdot\bm{\mu}_{k}^{dec}$
	$\displaystyle-\left(\bm{\mu}_{k}^{dec}\!-\!\bm{\mu}_{k}^{pri}\right)^{H}\ \!\!\!\cdot\!\left(\bm{\Sigma}_{k}^{pri}\!+\!\bm{\Sigma}_{k}^{dec}\right)^{-1}\!\!\!\cdot\!\left(\bm{\mu}_{k}^{dec}\!-\!\bm{\mu}_{k}^{pri}\right)\!.$

As aforementioned, we derive the LLR expression by utilizing the joint distribution of the channel among antennas. Finally, we obtain the estimated channel of device $k$ as

\bm{\hat{h}}_{k}=\hat{\phi}_{k}\cdot\bm{\mu}_{k}^{dec}.

(37)

The above joint DAD-CE algorithm is summarized in Algorithm 2, where $N_{iter}$ denotes the maximum number of iterations. We note that $\bm{\Sigma}^{VN}$ and $\bm{\Sigma}^{SN}$ both go to diagonal matrices over the iteration in our numerical verification. Hence, the corresponding matrix inverse operations can be simplified to the divisions to reduce the complexity with little performance loss. As shown in Fig. 5, Simplified denotes the approximation by treating $\bm{\Sigma}^{VN}$ and $\bm{\Sigma}^{SN}$ as diagonal matrices and Original means there is no approximation in Algorithm 2. We use normalized mean square error (NMSE) for the evaluation of the CE performance. Fig. 5 illustrates that this approximation introduces little performance loss, which, although, has greatly reduced the complexity as aforementioned.

Algorithm 2 Joint DAD-CE Algorithm

1: Input:

\bm{Y_{p}}

\bm{A}

\bm{\mu}^{pri}

\bm{\Sigma}^{pri}

\sigma_{n}^{2}

p_{a}

2: Initialize:

\bm{\mu}^{VN}=\bm{\mu}^{pri},\bm{\Sigma}^{VN}=\bm{\Sigma}^{pri}

3: repeat

4: SN update:

\bm{\mu}^{SN}

\bm{\Sigma}^{SN}

by (20)-(21)

5: SN update:

l^{SN}

by (26)

6: VN update:

\bm{\mu}^{VN}

\bm{\Sigma}^{VN}

by (28)-(29)

7: VN update:

l^{VN}

by (31)

8: until

N_{iter}

reached

9: CE output:

\bm{\mu}^{dec}

\bm{\Sigma}^{dec}

by (32)-(33)

10: DAD Decision:

l^{dec}

by (34)

11: return

\left\{\hat{\bm{h}}_{k},\hat{\phi}_{k},\forall k\in\mathcal{K}_{tot}\right\}

V-B MIMO-LDPC-SIC Decoder

After obtaining the key parameters, such as interleaving patterns and channels by the joint DAD-CE algorithm, the LDPC decoding problem can be addressed by the standard BP algorithm [12, 44]. Likewise, the DD process is performed by updating messages iteratively at different nodes. Differently, since the LDPC is a forward error correction code, besides the observation and variable nodes, the check nodes (CNs) are considered in the factor graph to provide an extra belief, as shown in Fig. 6. We rewrite the received signal in the LDPC phase as

{\mathbf{Y}_{c}}={\mathbf{Y}_{{L_{p}}+1:{L},:}}=\sum\limits_{k\in{\mathcal{K}_{a}}}{{\pi_{{i_{k}}}}\left({{\tilde{\mathbf{s}}_{k}}}\right){\mathbf{h}_{{i_{k}}}^{T}}+{\mathbf{Z}_{{}_{{L_{p}}+1:{L},:}}}},

(38)

where $\mathbf{Y}_{c}\in{C^{{L_{c}}\times M}}$ is the last $L_{c}$ rows of $\mathbf{Y}$ . The LDPC decoder is tasked to recover the last $B_{c}$ bits of information based on the received signal $\mathbf{Y_{c}}$ , estimated interleaving patterns and channels using the low-complexity iterative BP algorithm. Owing to the two-phase encoding scheme, these key parameters can be recovered in the decoding of the CS phase. We emphasize that once the active indicators $\phi_{k}$ are recovered, the positions of $\left\{\phi_{k}=1,k\in\mathcal{K}_{tot}\right\}$ in the selection matrix $\bm{\Phi}$ , i.e., $\left\{i_{k},\forall k\in\mathcal{K}_{a}\right\}$ , are determined. Thereby, the interleaving patterns of active devices are also recovered. As shown in (38), the zero-padded sequences $\tilde{\mathbf{s}}_{k}$ are subject to different permutations, and each is determined by the interleaving pattern $i_{k}$ . Therefore, the effect of these permutations needs to be considered when the messages are being sent to and from the VNs, i.e., interleaving and de-interleaving in Fig. 6, respectively.

The connections among nodes in Fig. 6 appear to be more involved than those of Fig. 3. In the upper part of Fig. 6, the CNs (blue color) and VNs (green color) as well as the edges connecting them constitute the Tanner graph in LDPC. The subscript $N=\tilde{L}_{c}-B_{c}$ denotes the number of CNs in the LDPC code, which corresponds to the number of rows of the LDPC check matrix. $K$ denotes the number of active devices estimated in the CS phase. Other subscripts are consistent with the aforementioned. The edges between CNs and VNs are described by the LDPC check matrix, which cannot be marked explicitly in the graph. For example, in the check matrix of device $k$ , if the entry $c_{i,j}=1$ , there will be an edge between SN $c_{k,i}$ and VN $s_{k,j}$ .

The lower part of Fig. 6 refers to the graph for MIMO detection, of which the edges between VNs and SNs (yellow color) are simply determined by (38) though looking complicated. For example, the VN $s_{k,l_{1}}\left(l_{1}\in\left[1:\tilde{L}_{c}\right]\right)$ is connected to the $l_{2}$ -th SN from all antennas (i.e., $y_{m,l_{2}},m=1,2,...,M,l_{2}\in\left[1:L_{c}\right]$ ). We note that $l_{1}$ is not necessarily equal to $l_{2}$ in the presence of zero-padding and interleaving. Correspondingly, the VNs connected to SN $y_{m,l_{2}}$ depend on whose data is interleaved to the $l_{2}$ -th channel use. For instance, as illustrated in Fig. 7, the set of VNs connected to SN $y_{1,3}$ is $\left\{s_{1,5},s_{3,4},s_{5,2}\right\}$ . That is, after zero-padding and interleaving, the fifth, fourth, and second bits of devices $1$ , $3$ , and $5$ are mapped to the third channel use, respectively. Before conducting the MP algorithm, we define the types of messages as follows.

•

${R_{k,n\to l_{1}}}$ : Messages passed from CN $c_{k,n}$ to VN $s_{k,l_{1}}$ .
•

${Q_{k,l_{1}\to n}}$ : Messages passed from VN $s_{k,l_{1}}$ to CN $c_{k,n}$ .
•

${P_{k,l_{1}\to m,l_{2}}}$ : Messages passed from VN $s_{k,l_{1}}$ to SN $y_{m,l_{2}}$ .
•

${\Lambda_{m,l_{2}\to k,l_{1}}}$ : Messages passed from SN $y_{m,l_{2}}$ to VN $s_{k,l_{1}}$ .

The massages $R$ and $Q$ refer to the parity check constraints in the LDPC code, while $P$ and $\Lambda$ are related to the received signals in the MIMO system.

We first give the MP rules for the LDPC decoding with BPSK modulation, which is known as the sum-product algorithm. As illustrated in Fig. 8, the messages $R$ and $Q$ are iteratively updated between CNs and VNs. Likewise, for the reduction of complexity, we give the updating rules in the LLR form.

	$\displaystyle{Q_{k,l_{1}\to n}}$	$\displaystyle=\sum\limits_{j\in M}{\pi_{i_{j}}^{-1}\left(\Lambda_{j,l_{2}\to k,l_{1}}\right)}\!+\!\!\sum\limits_{j\in{\mathcal{N}_{c}}\left({k,l_{1}}\right)\backslash n}{\!\!\!{R_{k,j\to l_{1}}}}$		(39)
	$\displaystyle{R_{k,n\to l_{1}}}$	$\displaystyle=2{\tanh^{-1}}\left({\prod\limits_{j\in{\mathcal{N}_{v}}\left({k,n}\right)\backslash l_{1}}{\!\!\!\tanh\left({\frac{{{Q_{k,j\to n}}}}{2}}\right)}}\right),$		(40)

where ${\mathcal{N}_{c}}\left({k,l_{1}}\right)\backslash n$ denotes the set of CNs connected to $s_{k,l_{1}}$ except $c_{k,n}$ , i.e., $\left\{c_{k,n},c_{k,n_{1}},\cdots,c_{k,n_{i}}\right\}$ in Fig. 8. Likewise, ${\mathcal{N}_{v}}\left({k,n}\right)\backslash l_{1}$ denotes the set of VNs connected to $c_{k,n}$ except $s_{k,l_{1}}$ , i.e., $\left\{s_{k,l_{1}^{1}},\cdots,s_{k,l_{1}^{i}}\right\}$ in Fig. 8.

The message ${\Lambda_{m,l_{2}\to k,l_{1}}}$ in (39) is the LLR with the probability of VN $s_{k,l_{1}}$ taking different values observed at SN $y_{m,l_{2}}$ . For BPSK modulated system, it is given by

	$\displaystyle{\Lambda_{m,l_{2}\to k,l_{1}}}$	$\displaystyle=\log\frac{{P\left({{y_{m,l_{2}}}\|\mathbf{H},{s_{k,l_{1}}}=+1}\right)}}{{P\left({{y_{m,l_{2}}}\|\mathbf{H},{s_{k,l_{1}}}=-1}\right)}}$		(41)
		$\displaystyle=\frac{2}{{\sigma_{{z_{k,l_{2}}}}^{2}}}\text{Re}\left({h_{m,k}^{*}\left({{y_{m,l_{2}}}-{\mu_{{z_{k,l_{2}}}}}}\right)}\right),$		(41)

where $\bm{H}$ is the channel matrix, which can be estimated in the CS phase. We note that similar to the joint DAD-CE algorithm, (41) is also obtained by treating the interference from other devices as noise. We rewrite (38) as

	$\displaystyle{y_{m,l_{2}}}$	$\displaystyle=\sum\limits_{\begin{subarray}{c}j\in\mathcal{K}(m,l_{2}),l\in\mathcal{L}(m,l_{2})\end{subarray}}{{h_{m,j}}{\pi_{i_{j}}\left(\tilde{s}_{j,l}\right)}+{n_{m,l_{2}}}}$		(42)
		$\displaystyle={h_{m,k}}\pi_{i_{k}}\left(\tilde{s}_{k,l_{1}}\right)\!+\!\underbrace{\sum\limits_{\begin{subarray}{c}j\in\mathcal{K}(m,l_{2})\backslash k\\ l\in\mathcal{L}(m,l_{2})\backslash l_{1}\end{subarray}}{{h_{m,j}}\pi_{i_{j}}\left(\tilde{s}_{j,l}\right)\!+\!{n_{m,l_{2}}}}}_{z_{k,l_{2}}},$		(42)

where $n_{m,l_{2}}$ is the Gaussian noise with zero mean and variance ${\sigma_{n}^{2}}$ , and $\mathcal{K}(m,l_{2}),\mathcal{L}(m,l_{2})$ denote the set of devices as well as the corresponding bits related with SN $y_{m,l_{2}}$ , respectively. Accordingly, the set of related VNs is $\left\{s_{j,l}|j\in\mathcal{K}(m,l_{2}),l\in\mathcal{L}(m,l_{2})\right\}$ and we note that the subscripts $j$ and $l$ are one-to-one mappings. For instance, Fig. 7 showcases the VNs related with SN $y_{1,3}$ . In this regard, $\mathcal{K}(1,3)=\left\{1,3,5\right\}$ and $\mathcal{L}(1,3)=\left\{5,4,2\right\}$ and the corresponding VNs are $\left\{s_{1,5},s_{3,4},s_{5,2}\right\}$ . The Gaussian noise $n_{m,l_{2}}$ and the interference from other devices are all treated as noise denoted by $z_{k,l_{2}}$ , which is a Gaussian variable with mean ${\mu_{{z_{k,l_{2}}}}}$ and variance ${\sigma_{{z_{k,l_{2}}}}^{2}}$ given by

	$\displaystyle{\mu_{{z_{k,l_{2}}}}}$	$\displaystyle=\sum\limits_{\begin{subarray}{c}j\in\mathcal{K}(m,l_{2})\backslash k\\ l\in\mathcal{L}(m,l_{2})\backslash l_{1}\end{subarray}}{{h_{m,j}}\cdot\mathbb{E}\left[\pi_{i_{j}}\left(\tilde{s}_{j,l}\right)\right]}$		(43)
	$\displaystyle\sigma_{{z_{k,l_{2}}}}^{2}$	$\displaystyle=\sum\limits_{\begin{subarray}{c}j\in\mathcal{K}(m,l_{2})\backslash k\\ l\in\mathcal{L}(m,l_{2})\backslash l_{1}\end{subarray}}{{{\left\|{{h_{m,j}}}\right\|}^{2}}\cdot\text{Var}\left[\left(\pi_{i_{j}}\left(\tilde{s}_{j,l}\right)\right)\right]}+\sigma_{n}^{2}.$		(44)

And for BPSK modulation, the mean and variance of $\pi_{i_{j}}\left(\tilde{s}_{j,l}\right)$ is given by

	$\displaystyle\mathbb{E}\left[\pi_{i_{j}}\left(\tilde{s}_{j,l}\right)\right]$	$\displaystyle=2\cdot\pi_{i_{j}}\left({P_{j,l\to m,l_{2}}}\right)-1$		(45)
	$\displaystyle\text{Var}\left[\left(\pi_{i_{j}}\left(\tilde{s}_{j,l}\right)\right)\right]$	$\displaystyle=4\cdot\pi_{i_{j}}\left(\left({1-{P_{j,l\to m,l_{2}}}}\right)\cdot{P_{j,l\to m,l_{2}}}\right),$		(46)

where ${P_{j,l\to m,l_{2}}}$ denotes the probability of VN $s_{j,l}=1$ , and is initialized to $0.5$ . We note that it needs to be interleaved before the calculation. As shown in Fig. 9, ${P_{k,l_{1}\to m,l_{2}}}$ is updated by collecting the incoming messages from CNs related with VN $s_{k,l_{1}}$ and those from all SNs except $y_{m,l_{2}}$ . We give the update rule as below.

{P_{k,l_{1}\to m,l_{2}}}\!=\!\frac{{\exp\!\left(\Lambda+R\right)}}{{1+\exp\left(\Lambda+R\right)}},

(47)

where

\Lambda=\!\!\!\sum\limits_{j\in M\backslash m}{\pi_{i_{j}}^{-1}\left({\Lambda_{j,l_{2}\to k,l_{1}}}\right)},~{}~{}R=\!\!\!\sum\limits_{j\in{\mathcal{N}_{c}}\left({k,l_{1}}\right)}{{R_{k,j\to l_{1}}}}.

(48)

Similar to (39), the message $\Lambda_{j,l_{2}\to k,l_{1}}$ needs to be de-interleaved in the update of $P$ . The LLR of VN $s_{k,l_{1}}$ at the end of an iteration is given by

{L_{k,l_{1}}}=\sum\limits_{j\in M}{\pi_{i_{j}}^{-1}\left({\Lambda_{j,l_{2}\to k,l_{1}}}\right)+\sum\limits_{j\in{\mathcal{N}_{c}}\left({k,l_{1}}\right)}{{R_{k,j\to l_{1}}}}}.

(49)

The information bit $\hat{v}_{k,l_{1}}^{c}$ is decoded as one if $L_{k,l_{1}}>0$ and zero otherwise. Since LDPC codes are described by the parity matrix $\bm{C}$ , the iteratively decoding process is continued till $\text{mod}\left(\bm{C}\hat{\bm{v}}_{k}^{c},2\right)=0$ or the maximum number of iterations $N_{iter}$ is reached.

To further improve the spectrum efficiency, the QPSK modulation is also considered in this coding system, of which the constellation set is $\mathbb{S}=\left\{\pm 1/\sqrt{2},\pm 1/\sqrt{2}i\right\}$ . Briefly, QPSK modulated signals can be split into two orthogonal BPSK ones. As such, we can implement the above MP algorithm on these two signals separately. Additionally, the real and imaginary parts of the messages $\Lambda$ and $P$ need to be considered separately, so does the parity of subscript $l_{1}$ in $Q_{k,l_{1}\to n}$ and $R_{k,n\to l_{1}}$ . It is worth noting that the range of $l_{1}$ in $\Lambda$ and $P$ is half of that in $Q$ and $P$ , i.e., $\left[1:\tilde{L}_{c}/2\right]$ , since there are both messages on the real and imaginary parts. In what follows we give the updated rules of these messages.

Similar to (41), $\Lambda$ is the LLR of the probability that $s_{k,l_{1}}$ takes different values in $\mathbb{S}$ . However, it is no longer a real number. Instead, it is given by

\Lambda_{m,l_{2}\to k,l_{1}}=\frac{2\sqrt{2}}{{\sigma_{{z_{k,l_{2}}}}^{2}}}\cdot{h_{m,k}^{*}\cdot\left({{y_{m,l_{2}}}-{\mu_{{z_{k,l_{2}}}}}}\right)},

(50)

where the mean and variance of $z_{k,l_{2}}$ are given in (43) and (44), respectively. And for QPSK modulation, the mean and variance of $\pi_{i_{j}}\left(\tilde{s}_{j,l}\right)$ is given by

$\displaystyle\mathbb{E}\left[\pi_{i_{j}}\left(\tilde{s}_{j,l}\right)\right]$	$\displaystyle=1/\sqrt{2}\cdot\left\{2\cdot\pi_{i_{j}}\left({P_{j,l\to m,l_{2}}^{r}}\right)-1\right.$
	$\displaystyle\quad\quad\quad~{}\left.+\left[2\cdot\pi_{i_{j}}\left({P_{j,l\to m,l_{2}}^{i}}\right)-1\right]\cdot i\right\}$	(51)
$\displaystyle\text{Var}\left[\left(\pi_{i_{j}}\left(\tilde{s}_{j,l}\right)\right)\right]$	$\displaystyle=2\cdot\pi_{i_{j}}\left[{P_{j,l\to m,l_{2}}^{r}}-\left({P_{j,l\to m,l_{2}}^{r}}\right)^{2}\right.$
	$\displaystyle\quad\quad\quad~{}\left.+{P_{j,l\to m,l_{2}}^{i}}-\left({P_{j,l\to m,l_{2}}^{i}}\right)^{2}\right],$	(52)

where ${P_{j,l\to m,l_{2}}^{r}}$ and ${P_{j,l\to m,l_{2}}^{i}}$ are the real and imaginary parts of $P_{j,l\to m,l_{2}}$ , respectively. Similar to (47) and (48), ${P_{k,l_{1}\to m,l_{2}}^{r}}$ and ${P_{k,l_{1}\to m,l_{2}}^{i}}$ are given by

	$\displaystyle{P_{k,\lceil\frac{l_{1}}{2}\rceil\to m,l_{2}}^{r}}$	$\displaystyle=\frac{\exp\left[\text{Re}\left(\Lambda\right)+R\right]}{{1+\exp\left[\text{Re}\left(\Lambda\right)+R\right]}},~{}l_{1}~{}\text{is odd},$		(53)
	$\displaystyle{P_{k,\frac{l_{1}}{2}\to m,l_{2}}^{i}}$	$\displaystyle=\frac{\exp\left[\text{Im}\left(\Lambda\right)+R\right]}{{1+\exp\left[\text{Im}\left(\Lambda\right)+R\right]}},~{}l_{1}~{}\text{is even}.$		(54)

As such, the range of subscript $l_{1}$ in $P$ is half of that in $R$ as aforementioned. $\Lambda$ and $R$ are defined in (48). The update rule for message $R$ is the same as (40) and that for $Q$ is given by

{Q_{k,l_{1}\to n}}=\left\{\begin{array}[]{l}\sum\limits_{j\in M}{\pi_{i_{j}}^{-1}\left(\text{Re}\left(\Lambda_{j,l_{2}\to k,\lceil\frac{l_{1}}{2}\rceil}\right)\right)}\!\\ \quad\quad\quad+\!\!\sum\limits_{j\in{\mathcal{N}_{c}}\left({k,l_{1}}\right)\backslash n}{\!\!\!{R_{k,j\to l_{1}}}},~{}l_{1}~{}\text{is odd}\\ \sum\limits_{j\in M}{\pi_{i_{j}}^{-1}\left(\text{Im}\left(\Lambda_{j,l_{2}\to k,\frac{l_{1}}{2}}\right)\right)}\!\\ \quad\quad\quad+\!\!\sum\limits_{j\in{\mathcal{N}_{c}}\left({k,l_{1}}\right)\backslash n}{\!\!\!{R_{k,j\to l_{1}}}},~{}l_{1}~{}\text{is even}\\ \end{array}\right.

(55)

The LLR of VN $s_{k,l_{1}}$ at the end of an iteration is given by

L_{k,l_{1}}={Q_{k,l_{1}\to n}}+R_{k,n\to l_{1}},~{}\forall n\in\mathcal{N}_{c}(k,l_{1}).

(56)

The decision rule and termination condition are the same as those in the BPSK system mentioned earlier. Finally, we obtain the decoded messages.

Note that the estimated number of active devices $K$ is not guaranteed to be equal to $K_{a}$ . Therefore, not all the decoded messages satisfy the parity check. We denote $\widehat{\mathcal{V}}=\left\{\hat{\bm{v}}_{k}^{c},k\in\widehat{\mathcal{K}}\right\}$ and $\widehat{\mathcal{K}}$ as the set of successfully decoded messages and the corresponding devices, respectively. And we have $\left|{\widehat{\mathcal{K}}}\right|\leq{K_{a}}$ . To further improve the performance, we combine the MIMO-LDPC decoder with the SIC method and we denote it as the MIMO-LDPC-SIC algorithm, which works as follows.

Let $\widehat{\mathbf{H}}\in{C^{K\times M}}$ and $\mathcal{K}$ denote the channel matrix and the set of active devices estimated in the CS phase. Let ${\hat{\mathcal{V}}_{0}}$ and ${\hat{\mathcal{K}}_{0}}$ respectively denote the sets of correctly decoded messages (i.e., those that satisfy the check) and the corresponding devices obtained by the decoder, which are initialized to empty sets. With $\bm{Y}_{c}$ , interleaving patterns $\left\{\pi_{i_{k}},k\in\mathcal{K}\right\}$ and $\hat{\bm{H}}$ , the decoder outputs the set of successfully decoded messages ${\widehat{\mathcal{V}}}$ and the corresponding devices ${\widehat{\mathcal{K}}}$ . Then we have ${\widehat{\mathcal{V}}_{0}}\leftarrow{\widehat{\mathcal{V}}_{0}}\cup\widehat{\mathcal{V}}$ , ${\widehat{\mathcal{K}}_{0}}\leftarrow{\widehat{\mathcal{K}}_{0}}\cup\widehat{\mathcal{K}}$ and ${\mathbf{H}}={\widehat{\mathbf{H}}_{k,:}}$ for $k\in{\mathcal{K}}\backslash{\widehat{\mathcal{K}}_{0}}$ . The residual signal is updated by

\mathbf{Y}={\mathbf{Y}_{c}}-\sum\nolimits_{k\in{{\hat{\mathcal{K}}}_{0}}}{{\pi_{{i_{k}}}}\left({{\tilde{\mathbf{s}}_{k}}}\right){\widehat{\mathbf{H}}_{k,:}}},

(57)

where $\tilde{\mathbf{s}}_{k}\in\mathbb{C}^{L_{c}\times 1}$ is the $k$ -th codeword in $\widehat{\mathcal{V}}_{0}$ after modulation and zero-padding. The updated $Y$ and $\bm{H}$ as well as the interleaving patterns are sent to the MIMO-LDPC decoder for the next round of decoding. This iterative process ends when ${\hat{\mathcal{K}}}=\emptyset$ or ${\mathcal{K}}\backslash{\widehat{\mathcal{K}}_{0}}=\emptyset$ . The overall decoding algorithm is summarized in Algorithm 3.

Algorithm 3 MIMO-LDPC-SIC Decoding Algorithm

1: Input:

\bm{Y_{c}}

\hat{\bm{H}}

\mathcal{L}=\left\{{{i_{k}},k\in\mathcal{K}}\right\}

\sigma_{n}^{2}

2: Initialize:

\bm{Y}\!=\!\bm{Y_{c}}

\bm{H}\!=\!\hat{\bm{H}}

{\widehat{\mathcal{V}}_{0}}\!\leftarrow\!\emptyset

{\widehat{\mathcal{K}}_{0}}\!\leftarrow\!\emptyset

{R_{k,n\to l_{1}}}=0

,BPSK:

{P_{k,l_{1}\to m,l_{2}}}=0.5

, QPSK:

{P_{k,l_{1}\to m,l_{2}}}=0.5+0.5i

3: repeat

4: repeat

\Lambda

update:

\Lambda_{m,l_{2}\to k,l_{1}}

by (41) or (50).

Q

update:

Q_{k,l_{1}\to n}

by (39) or (55).

R

update:

R_{k,n\to l_{1}}

by (40).

P

update:

P_{k,l_{1}\to m,l_{2}}

by (47) or (53)-(54).

L

update and hard decision:

L_{k,l_{1}}

by (49) or (56).

10: until

N_{iter}

reached or

\text{mod}(\bm{C}\hat{\bm{v}}^{c},2)=0

11: Output:

{\widehat{\mathcal{V}}}

{\widehat{\mathcal{K}}}

12:

{\widehat{\mathcal{V}}_{0}}\leftarrow{\widehat{\mathcal{V}}_{0}}\cup\widehat{\mathcal{V}}

{\widehat{\mathcal{K}}_{0}}\leftarrow{\widehat{\mathcal{K}}_{0}}\cup\widehat{\mathcal{K}}

13:

{\mathbf{H}}={\widehat{\mathbf{H}}_{k,:}}

for

k\in{\mathcal{K}}\backslash{\widehat{\mathcal{K}}_{0}}

14:

\mathbf{Y}={\mathbf{Y}_{c}}-\sum\nolimits_{k\in{{\hat{\mathcal{K}}}_{0}}}{{\pi_{{i_{k}}}}\left({{\tilde{\mathbf{s}}_{k}}}\right){\widehat{\mathbf{H}}_{k,:}}}

15: until

{\widehat{\mathcal{K}}}=\emptyset

{\mathcal{K}}\backslash{\widehat{\mathcal{K}}_{0}}=\emptyset

16: Return:

{\widehat{\mathcal{V}}_{0}}

{\widehat{\mathcal{K}}_{0}}

We note that the stitching of the messages in the CS and LDPC phases is easy. In URA, devices are not identified, as such, the IDs can not be employed to distinguish the messages. We have known that the interleaving patterns and channels $\left\{\pi_{i_{k}},\hat{\bm{h}}_{k},k\in\widehat{\mathcal{K}}_{0}\right\}$ obtained in the CS phase acting as key parameters participate in the decoding process in the LDPC phase. For each decoded message in $\widehat{\mathcal{V}}_{0}$ , it is decoded with a specific interleaving pattern $\pi_{i_{k}}$ as well as the channel $\hat{\bm{h}}_{k}$ . As aforementioned, $\pi_{i_{k}}$ is uniquely determined by the message index representation $i_{k}$ , which directly corresponds to device $k$ ’ preamble. Therefore, $\pi_{i_{k}}$ and $\hat{\bm{h}}_{k}$ establish a connection of devices’ messages in the two phases. Briefly, if $\hat{\bm{v}}^{c}$ is successfully decoded with the participation of $\pi_{i_{k}}$ and $\hat{\bm{h}}_{k}$ , it is exactly the latter $B_{c}$ bits of message of device $k$ . As such, the stitching of the messages in two phases will not be a problem.

V-C Joint Update

The above CS and LDPC decoders can recover the $B$ bits of information with their work in tandem. Besides, thanks to the consistency of the above MP algorithm, the BP-based decoder can draw a connection of the decoding process in the CS and LDPC phases. That is, messages in the CE as well as the MIMO-LDPC decoding processes can be jointly updated by utilizing the belief of each other, thus leading to improved performance. This joint update algorithm is denoted as joint DAD-CE-DD algorithm and elaborated as follows.

For the successfully decoded devices ${\widehat{\mathcal{K}}_{0}}$ , the corresponding messages $\left\{{\pi_{{i_{k}}}}\left({{\tilde{\mathbf{s}}_{k}}}\right),k\in\widehat{\mathcal{K}}_{0}\right\}$ can be leveraged as soft pilot sequences joint with their codewords $\left\{\bm{a}_{i_{k}},k\in\widehat{\mathcal{K}}_{0}\right\}$ in the CS phase to carry out a second CE. This longer pilot sequence will lead to a better CE performance, which has been confirmed in our simulation in Fig. 15. We note that the subsequent CE is conducted via the above joint DAD-CE algorithm with devices’ activity fixed. In this regard, the Gaussian messages $\left\{\mu_{lm\to km}^{SN},\bm{\Sigma}_{l\to k}^{SN},\mu_{lm\to km}^{VN},\bm{\Sigma}_{l\to k}^{VN}\right\}$ in (20)-(21) and (28)-(29) are iteratively updated with a longer observation sequence $\left\{\bm{y}_{l},l\in\left[1:L_{p}+L_{c}\right]\right\}$ . Besides, it is worth noting that in the CE output in (32)-(33), the prior mean and covariance $\mu_{k}^{pri},\bm{\Sigma}_{k}^{pri}$ are no longer zero and $\bm{I}_{M}$ , respectively. Instead, they are the output estimation $\mu_{k}^{dec}$ and estimating deviation $\bm{\Sigma}_{k}^{dec}$ of $\bm{h}_{k}$ in the first CE, respectively.

Algorithm 4 Joint DAD-CE-DD Algorithm

1: Input:

\bm{Y_{p}}

\bm{Y_{c}}

\bm{A}

\bm{\mu}^{pri}

\bm{\Sigma}^{pri}

\sigma_{n}^{2}

p_{a}

\triangleright

CS Phase: Joint DAD-CE Algorithm

\triangleright

Collision Resolution Protocol

4: Output:

\bm{\mu}^{dec}

\bm{\Sigma}^{dec}

\mathcal{L}=\left\{{{i_{k}},k\in\mathcal{K}}\right\}

\triangleright

Joint DAD-CE-DD Algorithm:

6: Initialize:

\widetilde{\bm{Y}}_{r}=\bm{Y}_{c}

\widetilde{\mathcal{V}}=\emptyset

\widetilde{\mathcal{K}}=\emptyset

7: repeat

\triangleright

LDPC Phase:

9: Input:

\widetilde{\bm{Y}}_{r}

\bm{\mu}^{dec}

\mathcal{L}

\sigma_{n}^{2}

10: MIMO-LDPC-SIC Decoding Algorithm

11: Output:

{\widehat{\mathcal{V}}_{0}}

{\widehat{\mathcal{K}}_{0}}

12:

{\widetilde{\mathcal{V}}}\leftarrow{\widetilde{\mathcal{V}}}\cup\widehat{\mathcal{V}}_{0}

{\widetilde{\mathcal{K}}}\leftarrow{\widetilde{\mathcal{K}}}\cup\widehat{\mathcal{K}}_{0}

13:

\widetilde{\bm{Y}}_{c}=\sum\nolimits_{k\in{{\widehat{\mathcal{K}}}_{0}}}{{\pi_{{i_{k}}}}\left({{\tilde{\mathbf{s}}_{k}}}\right){\bm{\mu}^{dec}_{k}}}+\bm{Z}

14:

\triangleright

CS Phase:

15: Input:

\bm{Y}_{p}

\bm{A}

\widetilde{\bm{Y}}_{c}

{\widehat{\mathcal{V}}_{0}}

\bm{\mu}^{dec}

\bm{\Sigma}^{dec}

\sigma_{n}^{2}

16: CE (Activity fixed)

17:

\triangleright

Collision Resolution Protocol

18: Output:

\widetilde{\bm{\mu}}^{dec}

\widetilde{\bm{\Sigma}}^{dec}

19:

\widetilde{\bm{Y}}_{r}={\mathbf{Y}_{c}}-\sum\nolimits_{k\in{{\widetilde{\mathcal{K}}}}}{{\pi_{{i_{k}}}}\left({{\tilde{\mathbf{s}}_{k}}}\right){\widetilde{\bm{\mu}}^{dec}_{k}}}

20: until

{\widehat{\mathcal{K}}}_{0}=\emptyset

{\mathcal{K}}\backslash{\widetilde{\mathcal{K}}}=\emptyset

21: Return:

\widetilde{\mathcal{V}}

\widetilde{\mathcal{K}}

\mathcal{L}=\left\{{{i_{k}},k\in\widetilde{\mathcal{K}}}\right\}

\widetilde{\bm{\mu}}^{dec}

Remark 1: We remark that messages in the MP algorithm exhibit the property of consistency and unity. As such, messages among different parts can always be jointly processed and updated. For instance, recalling the Joint DAD-CE algorithm, where the Bernoulli messages are updated by utilizing the Gaussian messages. Likewise, in the MIMO-LDPC-SIC Decoding algorithm, the MP algorithm can be applied to MIMO detection or LDPC decoding. In the proposed algorithm, we combine these two parts and update the underlying messages jointly, i.e., the LLR message of the symbol in MIMO detection can be involved in the LDPC decoding process, and vice versa. Moreover, the property is again exploited in the Joint DAD-CE-DD algorithm. By leveraging the belief of the estimated channel and the correctness of the LDPC codewords, the CE can be again performed aided with the correctly decoded codewords in the LDPC phase, thus connecting these two phases. Throughout the paper, we take into consideration the idea of the joint update for the messages in MP-based algorithms.

One might argue that for the correctly decoded messages, improving the accuracy of the corresponding channels will not bring substantial performance improvements. Nevertheless, according to the SIC method in (57), by improving the channel accuracy of those devices whose messages are successfully decoded, the residual signal of the incorrectly decoded messages can be obtained more accurately. We denote the residual signal with improved accuracy as $\widetilde{\bm{Y}}_{r}$ . Consequently, $\widetilde{\bm{Y}}_{r}$ will improve the performance for those messages that have not been successfully decoded yet. And this is the reason why the channel needs to be estimated twice or more times. We give a brief diagram for this algorithm in Fig. 10, where $\widetilde{\bm{Y}}_{c}$ denotes the noisy signal reconstructed from the correctly decoded messages.

We note this data-aided CE algorithm appears to be similar to the work in [14], which employs the data as soft pilots to conduct CE jointly with the real pilot sequences. However, the convergence and correctness of the estimated data cannot be verified, which may result in the propagation of errors and failure of the joint data and channel estimation. Whereas the convergence of the proposed algorithm can be guaranteed. The proposed Joint DAD-CE-DD algorithm consists of two modules, namely, the Joint DAD-CE algorithm for activity detection and channel estimation, and the MIMO-LDPC-SIC Decoding algorithm for data decoding. Messages are passed between these two modules. For arbitrary channel estimation results as input, the decoding process will be executed by the MIMO-LDPC-SIC Decoding algorithm. Once a codeword is successfully decoded (i.e., satisfies the check), it will not be changed in subsequent iterations. Therefore, the number of correctly decoded codewords is monotonically non-decreasing at each iteration. Correspondingly, the error rate $P_{e}$ is monotonically non-increasing, and also, it is bounded below by zero. According to the well-known monotone convergence theorem [45], the proposed algorithm is guaranteed to converge. Besides, with the correctly decoded codewords aided to estimate the channel, an improved CE accuracy can be guaranteed and it contributes to a higher probability of successfully decoding for those who have not yet. This iterative algorithm ends when no more messages are correctly decoded or all the messages are decoded successfully, which is summarized in Algorithm 4.

V-D Complexity

In this section, we compare the complexity of the proposed algorithm with the existing alternative in terms of the real-number multiplication (or division), addition (or subtraction), and other operations (eg., exp, log, tan, $\text{tan}^{-1}$ ), as shown in Table I. As a divide-and-conquer strategy, there are inner and outer decoders in the CB-ML algorithm, referred to as ML and tree decoder, respectively. The complexity of our algorithm is mainly incurred by the Joint DAD-CE algorithm and MIMO-LDPC decoder. In Table I, the parameters $\alpha$ and $c$ satisfy that $B_{p}=\alpha\log_{2}K_{a},L_{p}=c\log_{2}K_{a}$ , respectively. We note that the complexity of the tree decoder is defined by the total number of parity check constraints that need to be verified [19], and this is obtained in the regime that $B_{p}$ and $K_{a}$ tend to infinity. Nevertheless, in practice, it increases exponentially with the number of slots $S$ . Besides, the tree code in [32] verifies the estimated codewords in each segment successively, and the check and estimate are performed separately. While during the LDPC decoding process in our algorithm, the check and estimate of the codeword are performed jointly and simultaneously, both by calculating the soft messages. Once the codeword is estimated, the corresponding check is completed. As for the inner code, the complexity of the ML decoder is nearly the same order as that of our algorithm. However, as a consequence of the coordinate descent algorithm, there are $2^{B_{p}}$ cycles in the ML decoding per iteration, which can only be computed successively. On the contrary, as a remarkable property of the MP algorithm, all computations in our algorithm can be decomposed into many local ones, which can be performed in parallel over the factor graph. Hence, our algorithm has lower time complexity. In general, the proposed algorithm exhibits a complexity linear with $K,M,L$ , which is lower than the existing CB-ML algorithm.

TABLE I: COMPLEXITY COMPARISON PER ITERATION

Algorithms

Real Multi. / Div.

Real Add. / Sub.

Others

CB-ML

ML decoder

\mathcal{O}(KL_{p}^{2})

\mathcal{O}(KL_{p}^{2})

\backslash

Tree decoder

\mathcal{O}\left(K_{a}^{\alpha/c}\text{log}_{2}K_{a}\right),B_{p},K_{a}\rightarrow\infty

\backslash

Joint DAD-CE-DD

Joint DAD-CE

\mathcal{O}(KML_{p})

\mathcal{O}(KML_{p})

exp / log:

\mathcal{O}(KL_{p})

MIMO-LDPC

\mathcal{O}(\widehat{K}M\widetilde{L}_{c})

\mathcal{O}(\widehat{K}M\widetilde{L}_{c})

exp:

\mathcal{O}(\widehat{K}M\widetilde{L}_{c})

, tanh /

\text{tanh}^{-1}

\mathcal{O}(\widehat{K}NN_{v})

Note:

K=2^{B_{p}}

\widehat{K},N,N_{v}

denote the estimated number of active devices, the rows and row weights of the parity check matrix, respectively.

VI Numerical Results

VI-A Parameter Settings

In this section, we assess the overall performance of the proposed framework with the metric defined in (3)-(4). The CB-ML proposed by Fengler et. al. in [32] serves as the baseline in this paper. For the sake of fair comparison to the benchmarks, and isolating the fundamental aspects of the problem without additional model complication, we consider the flat path loss model in the simulation, i.e., the channel is i.i.d. Rayleigh fading model and the LSFC is fixed to $\beta_{k}=1$ in all schemes. We note again that this can be achieved by the well-studied power control schemes [40, 41, 42] in practice. As such, the carrier frequency is not specified in the simulation, since it can be arbitrary and does not affect the performance of the proposed algorithm, when considering flat path loss model. The user distribution is considered to be uniformly distributed in the cell. The noise variance is set to $\sigma_{n}^{2}=1$ and is known to all schemes. The maximum scheduling times for the collision resolution protocol is $t_{max}=3$ . For the joint DAD-CE algorithm, MIMO-LDPC-SIC decoder, and Joint DAD-CE-DD algorithm, the maximum number of iterations is set to $N_{iter}=20$ , $30$ , and $20$ , respectively. We fix the message length to $B=96$ with $B_{p}=12$ and $B_{c}=84$ . The length of the CS codeword is $L_{p}=100$ in both schemes. In the channel coding part, we employ the $(3,6)$ -regular LDPC code [27] with the rate $0.5$ . In our scheme, the length of channel use varies with the number of zeros padded in the sequence. However, it is discretely valued in the CB-ML scheme and changes with the number of slots denoted as $S$ . Since the length of the information is fixed, we can change $S$ by adjusting the parity check allocation, which is given in Table II according to the principle in [19]. We note that the slot length $J=12$ aligns with $B_{p}$ .

TABLE II: The parity check allocation for different number of slots.

$S$	Parity Check Allocation
12	(12,0), (3,9), (9,3), $\cdots$ , (9,3), (0,12)
13	(12,0), (4,8), (8,4), $\cdots$ , (0,12)
14	(12,0), (7,5), (7,5), $\cdots$ , (7,5), (0,12)
15	(12,0), (7,5), (7,5), $\cdots$ , (7,5), (0,12), (0,12)
16	(12,0), (6,6), (6,6), $\cdots$ , (6,6), (0,12)
17	(12,0), (6,6), (6,6), $\cdots$ , (6,6), (0,12), (0,12)

In Table II, (3,9) means that the first three are data bits and the last nine are parity bits and so on. For more details, we refer the reader to [32] for CB-ML algorithm and [19] for the tree coding scheme.

VI-B Results

We first evaluate the error rate performance of the proposed schemes compared with the CB-ML scheme as a function of the code rate. The relationship between the code rate $R_{c}$ and channel use $L$ is $R_{c}=B/L$ . In Fig. 11, we fix the energy-per-bit $E_{b}/N_{0}=10$ dB, the number of antennas and active devices $M=30$ and $K_{a}=50$ , respectively. In our schemes, No-colli-avoid refers to the scheme that there is no collision avoidance or the joint update. That is, the CS and LDPC phases work sequentially and potential collision may exist in the CS phase. For the other three schemes, the collision resolution protocol has been implemented and No-SIC denotes that the SIC method is not utilized in the LDPC phase nor do the two phases work jointly. Joint-BPSK and Joint-QPSK refer to the joint update algorithm with the SIC method in BPSK and QPSK modulations, respectively. As illustrated in Fig. 11, there is a substantial performance enhancement compared to the CB-ML algorithm. The main reason is that the employed LDPC code has a higher code rate than the tree code proposed in [19]. As such, the proposed algorithm can work well in a relatively high rate region while the CB-ML algorithm can not. Once we set a high rate, there will be a substantial performance enhancement. However, this improvement decreases with the decrease of rate. For the target error rate $P_{e}=0.1$ , the required code rate of the proposed Joint-BPSK is increased by $1.45$ times that of CB-ML, while the Joint-QPSK increases even more. For instance, the proposed Joint-QPSK and Joint-BPSK outperform CB-ML with a nearly $0.8$ dB gap and a $1.5$ dB gap at $R_{c}=0.06$ , respectively. Additionally, we note that Joint-QPSK exhibits an overall $0.7$ dB performance gain compared with Joint-BPSK in terms of the error rate. This is because only half of the channel use is needed for the QPSK than BPSK modulation and thus more zeros can be padded into the sequence. Consequently, the interference of devices in the QPSK system is further reduced, resulting in improved performance. Altogether, the gain of the collision resolution protocol increases with the decrease of the code rate. However, there is no much gain for the work of joint update. This is because the gain mainly comes from the reduction of interference of devices, which has been reduced to a relatively low level by the zero-padding. As such, the joint update cannot provide more gains. Nevertheless, as we will see shortly, there is a certain gain by the joint update under a higher interference scenario.

For the sake of complexity reduction, we employ BPSK modulation instead of QPSK in the subsequent simulations since both of them have already outperformed CB-ML. In Fig. 12, we compare the error rate performance of the proposed algorithms with respect to $E_{b}/N_{0}$ , the number of antennas $M$ and active devices $K_{a}$ , respectively. The number of channel uses is fixed to $L=1600$ . Correspondingly, the data is split into $16$ slots in the CB-ML algorithm, of which the parity check allocation is given in Table. II. The other parameters are set as $E_{b}/N_{0}=10$ dB, $M=30$ and $K_{a}=50$ . As illustrated in Fig. 12, the state-of-the-art method CB-ML suffers from high error floors, which stems from the poor parity check constraints. In contrast, with collision resolution, the proposed Joint and No-SIC schemes exhibit water-falling curves in terms of the error rate with respect to $E_{b}/N_{0}$ and $K_{a}$ , while they gradually stabilize with respect to $M$ . This is because the interference of devices cannot be reduced to infinitesimal by increasing $M$ . As such, error still exits even for a large $M$ .

Moreover, we evaluate the performance of the proposed Joint-DAD-CE-DD algorithm under a large-scale antenna with respect to different channel uses. As showcased in Fig. 13, the performance of the proposed algorithm is almost linear with the antennas, but with different slopes under different channel uses. We note that CB-ML cannot work at $L=1600$ , while the Joint-DAD-CE-DD algorithm can achieve $P_{e}<10^{-4}$ when $M>140$ . Besides, the proposed algorithm only needs half of the channel uses, i.e., $L=800$ to outperform CB-ML when $M\geq 90$ . To sum up, the proposed algorithms outperform CB-ML in terms of error rate and spectral efficiency, with an explicit gain with respect to various variables.

In order to provide more insights about the performance gain of different methods, we compare the error rate and CE performance among the methods in terms of $E_{b}/N_{0}$ . In this regard, we fix the channel use to $L=268$ with BPSK modulation and no zero is padded, resulting in the increase of interference among devices. Besides, the code rate $R_{c}=0.36$ , which is relatively high and the CB-ML cannot work under this circumstance. In Fig. 14, No-Joint refers to the scheme with the SIC method but no joint update. Firstly, it is obvious that the collision resolution-based schemes all outperform the No-colli-avoid scheme with the increase of $E_{b}/N_{0}$ . Then, the performance gain of the SIC method in the LDPC phase is about $0.3$ dB or even more when $E_{b}/N_{0}$ increases according to the comparison of No-SIC and No-Joint. Finally, by the comparison of No-Joint and Joint, there is also about $0.3$ dB gain coming from the work of joint update and it decreases with the increase of $E_{b}/N_{0}$ . At a higher $E_{b}/N_{0}$ , such as $20$ dB, there is nearly no performance gain by the joint update. This demonstrates that under a higher interference level of devices, the joint update algorithm can provide a certain gain, which gradually decreases with the increase of $E_{b}/N_{0}$ .

The NMSE is employed to evaluate the CE performance. In Fig. 15, the AMP algorithm investigated in [4] is utilized as a baseline for comparison. When $E_{b}/N_{0}$ ranges from $13$ to $15$ dB, the performance of Joint and No-Joint is of slight difference and both are slightly worse than AMP. However, the Joint scheme outperforms AMP with an ultra-linear speed when $E_{b}/N_{0}$ exceeds $15$ dB. As aforementioned, this gain exactly comes from the Joint-DAD-CE-DD algorithm, where the real pilot sequences as well as correctly decoded messages jointly conduct the task of CE. Figs. 14 and 15 demonstrate that the joint update indeed contributes to the improved accuracy of both the DD and CE. In general, the proposed algorithms provide substantial performance enhancements compared with CB-ML in terms of efficiency and accuracy.

VII Conclusion

In this paper, we have investigated a joint DAD, CE, and DD algorithm for MIMO massive URA. Different from the state-of-the-art slotted transmission scheme, the data in the proposed framework has been split into only two parts. A portion of the data is coded by CS and the rest is LDPC coded. Based on the principle of the BP, the iterative MP algorithm has been utilized to decode these two parts of data. Moreover, by exploiting the concept of the belief within each part, a joint decoding framework has been proposed to further improve the performance. Additionally, based on the ED and SWP, a collision resolution protocol has been developed to resolve the codeword collision issue in the URA system. In addition to the complexity reduction, the proposed algorithm has exhibited a substantial performance enhancement compared to the state-of-the-art in terms of efficiency and accuracy. The possible avenues for future work are various. More realistic channel models can be taken into consideration for MIMO URA. By utilizing the sparsity in the virtual angular domain of the spatially correlated channel, the multi-user interference can be further reduced. In addition, the proposed algorithm can be extended to handle more practical scenarios in the presence of asynchronization and frequency offset. Moreover, although this paper only studies LDPC codes, other codes, such as Turbo codes and Polar codes, could be applied if iterative soft decodings with superior performance exist in a desired block length.

-A Analysis of the Collision Resolution Protocol

Let $\bm{\phi}=\text{diag}\{\bm{\Phi}\}$ denote the vector composed of diagonal elements of $\bm{\Phi}$ and $M_{p}=2^{B_{p}}$ . $\bm{\Phi}$ is the modified selection matrix defined in (8). Then we have $\bm{\phi}=\left[\phi_{1},\phi_{2},\cdots,\phi_{M_{p}}\right]$ , which is drawn independently from the signal space

\mathcal{\bm{S}}=\{\bm{\phi}\in\{0,\cdots,K_{a}\}^{M_{p}}|\sum\limits_{i=1}^{M_{p}}{\phi_{i}}=K_{a}\}

(58)

according to the multinomial probability mass function :

p(\bm{\phi})=P\{\bm{\phi}=\left(\phi_{1},\phi_{2},\!\cdots,\!\phi_{M_{p}}\right)\}=\frac{K_{a}!}{\phi_{1}!\cdots\phi_{M_{p}}!}\!\cdot\!\frac{1}{{M_{p}}^{K_{a}}}

(59)

for $\bm{\phi}\in\mathcal{\bm{S}}$ . Then the probability of no collision among devices equals the probability that $K_{a}$ entries in $\bm{\phi}$ are one and the others are zero, which is given by

	$\displaystyle P_{no-colli}$	$\displaystyle=\frac{K_{a}!}{{M_{p}}^{K_{a}}}\cdot\binom{M_{p}}{K_{a}}$		(60)
		$\displaystyle=\frac{M_{p}(M_{p}-1)\cdots(M_{p}-K_{a}+1)}{{M_{p}}^{K_{a}}}.$		(60)

-A1 The first window sliding

The number of devices in collision is

k_{0}=K_{a}\cdot(1-P_{no-colli}).

(61)

Let $n_{0}$ and $k_{0}^{i}$ denote the numbers of collided messages in the first $B_{p}$ bits and collided devices with each message, respectively, which satisfies $\sum\nolimits_{i=1}^{n_{0}}{k_{0}^{i}}=k_{0}$ .

For the $i$ -th collided message, the probability of no collision in the sliding part is

P_{no-colli-1}=\frac{M_{p1}(M_{p1}\!-\!1)\!\cdots\!(M_{p1}\!-\!k_{0}^{i}+1)}{{M_{p1}}^{k_{0}^{i}}},i\in\left[1:n_{0}\right],

(62)

where $M_{p1}=2^{B_{0}}$ . Besides, the common part is for devices to splice back the messages among different windows. If the common parts of different collided messages are the same, the message of one device will be spliced to the other’s. As such, an error occurs. Therefore, this probability should also be taken into consideration. The probability that the common parts of different messages are different is

P_{no-colli-1^{\prime}}=\frac{M_{p1^{\prime}}(M_{p1^{\prime}}-1)\cdots(M_{p1^{\prime}}-n_{0}^{i}+1)}{{M_{p1^{\prime}}}^{n_{0}}},

(63)

where $M_{p1^{\prime}}=2^{B_{p}-B_{0}}$ . $P_{no-colli-1}$ refers to the probability of no collision after sliding once, while $P_{no-colli-1^{\prime}}$ represents the probability that messages of one device cross windows can be spliced back successfully after sliding and retransmission. However, there may still be collision after sliding once, which requires another round.

-A2 The second window sliding

For the $i$ -th collided message, the number of devices in collision after sliding once is

k_{1}^{i}=k_{0}^{i}\cdot(1-P_{no-colli-1}),~{}i\in\left[1:n_{0}\right].

(64)

Without loss of generality, we consider the $i$ -th collided message in the first sliding. Likewise, let $n_{1}^{i}$ and $k_{2}^{j}$ denote the numbers of collided messages and the corresponding devices after the first sliding, respectively, which satisfies $\sum\nolimits_{j=1}^{n_{1}^{i}}{k_{2}^{j}}=k_{1}^{i}$ .

For the sliding part, the probability of no collision is

P_{no-colli-2}=\frac{M_{p1}(M_{p1}\!-\!1)\!\cdots\!(M_{p1}\!-\!k_{2}^{j}\!+\!1)}{{M_{p1}}^{k_{2}^{j}}},j\in\left[1:n_{1}^{i}\right].

(65)

Note that in the common part, if there are identical sequences among $n_{i}^{i}$ messages, there will not be an error. What needs to be considered is whether the sequences among $n_{0}$ messages are the same or not. Thus, the probability of no collision in the common part is

P_{no-colli-2^{\prime}}=\frac{\mathbb{P}_{M_{p1^{\prime}}}^{\sum{n_{1}^{i}}}}{\prod_{i=1}^{n_{0}}{\mathbb{P}_{M_{p1^{\prime}}}^{n_{1}^{i}}}}.

(66)

Here we denote $\mathbb{P}_{n}^{r}$ as the permutations, i.e., $n!/(n-r)!$ . It is obvious that $\mathbb{P}_{M_{p1^{\prime}}}^{n_{1}^{i}}<({M_{p1^{\prime}}})^{n_{0}}$ since $n_{i}^{i}<n_{0}$ . However, the relationship between $\sum\nolimits_{i=1}^{n_{0}}{n_{1}^{i}}$ and $n_{0}$ is unknown. As such, the relationship between $P_{no-colli-2^{\prime}}$ and $P_{no-colli-1^{\prime}}$ is also unknown. Fortunately, since $k_{2}^{j}<k_{1}^{i}<k_{0}^{i}$ , we have $P_{no-colli-2}>P_{no-colli-1}$ , which demonstrates that as the window sliding progresses, the probability of devices in a collision will decrease.

-A3 The $l$ -th window sliding

In order to draw a general conclusion, we extend the above derivation to the $l$ -th ( $l>2$ ) window sliding. Similarly, we denote $n_{l-1}^{i}$ and $k_{l}^{j}$ as the numbers collided messages and corresponding devices after the $(l-1)$ -th sliding, respectively, which satisfies $\sum\nolimits_{j=1}^{n_{l-1}^{i}}{k_{l}^{j}}=k_{l-1}^{i}$ . As such, the probability of no collision after the $l$ -th sliding is given by

P_{no-colli-l}=\frac{M_{p1}(M_{p1}\!-\!1)\!\cdots\!(M_{p1}\!-\!k_{l}^{j}\!+\!1)}{{M_{p1}}^{k_{l}^{j}}},

(67)

where $j\in\left[1:n_{l-1}^{i}\right].$ And the number of collided devices after the the $l$ -th sliding is

k_{l+1}^{j}=k_{l}^{j}\cdot(1-P_{no-colli-l}),~{}j\in\left[1:n_{l-1}^{i}\right].

(68)

Obviously, we have

k_{l+1}^{j}<k_{l}^{j}<\cdots<k_{2}^{j}<k_{1}^{i}<k_{0}^{i}<\cdots<K_{a}.

(69)

Consequently, the probability of no collision satisfies

P_{no-colli-l}>\cdots>P_{no-colli-1}>P_{no-colli}.

(70)

In this regard, we further have

	$\displaystyle k_{l+1}^{j}$	$\displaystyle<K_{a}\cdot(1-P_{no-colli-l})\cdots(1-P_{no-colli})$		(71)
		$\displaystyle<K_{a}\cdot(1-P_{no-colli})^{(l+1)},$		(71)

which demonstrates that as the sliding times $l$ goes to infinity, the number of collided devices will eventually approach zero.

Although the collision probability of the common part does not show an obvious trend of decreasing, in practice, we can reduce the collision probability by controlling the length of the window. There is a compromise between the collision probability of common and sliding parts. A longer sliding length will result in a lower collision probability in the sliding part, but at the same time, it will also increase that of the common part. Since the collision probability of the sliding part will eventually approach zero, the sliding length can be shorter in practice to minimize the collision probability of the common part.

-B Derivation of the Gaussian and Bernoulli Messages in V-A

-B1 Derivation of $\bm{\Sigma}_{lk}$ at SNs

According to the formulation in (17), since the activity and channel among different devices are both independent from each other, the $(m,n)$ -th $(m\neq n)$ entry for $\bm{\Sigma_{z_{lk}}}$ is given by

		$\displaystyle(\Sigma_{z_{lk}})_{(m,n)}$		(72)
		$\displaystyle=\sum\nolimits_{i\in K\backslash k}{{{\left\|{{A_{li}}}\right\|}^{2}}}\mathbb{E}\left\{{\left[{\phi_{i}}{h_{im}}-\hat{\phi}_{i}\cdot\hat{h}_{im}\right]}\right.$
		$\displaystyle\quad\quad\quad\quad\quad\quad\quad\quad~{}\left.{\cdot\left[{{\phi_{i}}h_{in}}{-\hat{\phi}_{i}\cdot\hat{h}_{in}}\right]}^{*}\right\}$
		$\displaystyle=\sum\nolimits_{i\in K\backslash k}{{{\left\|{{A_{li}}}\right\|}^{2}}}\mathbb{E}\Big{\{}{\left[{{\phi_{i}}{h_{im}}-p_{i\to l}^{VN}\cdot\mu_{im\to lm}^{VN}}\right]}$
		$\displaystyle\quad\quad\quad\quad\quad\quad\quad\quad~{}{\cdot\left[{{\phi_{i}}h_{in}}{-p_{i\to l}^{VN}\cdot{{{\mu_{in\to ln}^{VN}}}}}\right]}^{*}\Big{\}}$
		$\displaystyle=\sum\nolimits_{i\in K\backslash k}{{\left\|{{A_{li}}}\right\|}^{2}\cdot\left\{\mathbb{E}\left[\phi_{i}^{2}\cdot h_{im}\cdot h_{in}^{*}\right]\right.}$
		$\displaystyle\quad\quad\quad\quad\quad\quad{\left.-(p_{i\to l}^{VN})^{2}\cdot\mu_{im\to lm}^{VN}\cdot(\mu_{in\to ln}^{VN})^{*}\right\}},$

where

		$\displaystyle\mathbb{E}\left[\phi_{i}^{2}\cdot h_{im}\cdot h_{in}^{}\right]=\mathbb{E}\left[\phi_{i}^{2}\right]\cdot\mathbb{E}\left[h_{im}\cdot h_{in}^{}\right].$		(73)
		$\displaystyle\quad=p_{i\to l}^{VN}\cdot\left[(\Sigma_{i\to l}^{VN})_{(m,n)}+\mu_{im\to lm}^{VN}\cdot(\mu_{in\to ln}^{VN})^{*}\right].$		(73)

Then we can rewrite (72) as

		$\displaystyle(\Sigma_{z_{lk}})_{(m,n)}$		(74)
		$\displaystyle=\sum\nolimits_{i\in K\backslash k}{{\left\|{{A_{li}}}\right\|}^{2}\cdot p_{i\to l}^{VN}\cdot\left\{(\Sigma_{i\to l}^{VN})_{(m,n)}\right.}$
		$\displaystyle\quad\quad\quad\quad\quad\quad~{}~{}{\left.+(1-p_{i\to l}^{VN})\cdot\mu_{im\to lm}^{VN}\cdot(\mu_{in\to ln}^{VN})^{*}\right\}}$
		$\displaystyle=\sum\nolimits_{i\in K\backslash k}{{\left\|{{A_{li}}}\right\|}^{2}\cdot p_{i\to l}^{VN}\cdot\left\{(\Sigma_{i\to l}^{VN})_{(m,n)}\right.}$
		$\displaystyle\quad\quad\quad\quad\quad\quad~{}~{}{\left.+q_{i\to l}^{VN}\cdot\mu_{im\to lm}^{VN}\cdot(\mu_{in\to ln}^{VN})^{*}\right\}},~{}m\neq n,$

where $q_{i\to l}^{VN}=1-p_{i\to l}^{VN}$ denotes the probability that the Bernoulli variable $\phi_{k}$ equals zero. Therefore, the covariance of $\mathbf{z}_{lk}$ is not diagonal. If $m=n$ , we have

	$\displaystyle(\Sigma_{z_{lk}})_{(m,m)}$	$\displaystyle=\sum\nolimits_{i\in K\backslash k}{{\left\|{{A_{li}}}\right\|}^{2}\cdot p_{i\to l}^{VN}\cdot\Big{\{}(\Sigma_{i\to l}^{VN})_{(m,m)}}$		(75)
		$\displaystyle{\quad\quad\quad+~{}q_{i\to l}^{VN}\cdot\left\|\mu_{im\to lm}^{VN}\right\|^{2}\Big{\}}}+\sigma_{n}^{2}.$		(75)

This completes the derivation.

-B2 Derivation of Gaussian Messages at VNs

We give the derivations of (28) and (29) here. Rewrite the multivariate complex Gaussian pdf in (25) to canonical notation as

f(\bm{x}|\bm{\mu},\bm{\Sigma})=\exp\left[-\bm{x}^{H}\bm{\Gamma}\bm{x}+\bm{\eta}^{H}\bm{x}+\bm{x}^{H}\bm{\eta}+\bm{\zeta}\right],

(76)

where

	$\displaystyle\bm{\Gamma}$	$\displaystyle=\bm{\Sigma}^{-1},\quad\bm{\eta}=\bm{\Sigma}^{-1}\bm{\mu}$		(77)
	$\displaystyle\bm{\zeta}$	$\displaystyle=-\bm{\eta}^{H}\bm{\Gamma}^{-1}\bm{\eta}+\ln\|\bm{\Gamma}\|-M\ln\pi,$		(77)

Therefore, the product of $n$ Gaussian pdfs is

	$\displaystyle\prod_{i=1}^{n}{f_{i}(\bm{x})}$	$\displaystyle=\exp\Big{[}-\bm{x}^{H}\left(\sum\nolimits_{i=1}^{n}{\bm{\Gamma}_{i}}\right)\bm{x}+\left(\sum\nolimits_{i=1}^{n}{\bm{\eta}_{i}}\right)^{H}\bm{x}$		(78)
		$\displaystyle\quad\quad\quad+\bm{x}^{H}\sum\nolimits_{i=1}^{n}{\bm{\eta}_{i}}+\sum\nolimits_{i=1}^{n}{\bm{\zeta}_{i}}\Big{]},$		(78)

where

\sum\nolimits_{i=1}^{n}{\bm{\zeta}_{i}}=-\sum\nolimits_{i=1}^{n}{\left(\bm{\eta}_{i}^{H}\bm{\Gamma}_{i}^{-1}\bm{\eta}_{i}+ln|\bm{\Gamma}_{i}|\right)}-nM\ln\pi.

(79)

We make a simple substitution as below

	$\displaystyle\bm{\Gamma}_{n}$	$\displaystyle=\sum\nolimits_{i=1}^{n}{\bm{\Gamma}_{i}},\quad\bm{\eta}_{n}=\sum\nolimits_{i=1}^{n}{\bm{\eta}_{i}}$		(80)
	$\displaystyle\bm{\zeta}_{n}$	$\displaystyle=-\bm{\eta}_{n}^{H}\bm{\Gamma}_{n}^{-1}\bm{\eta}_{n}+\ln\|\bm{\Gamma}_{n}\|-M\ln\pi.$		(80)

Then we can rewrite (78) as

$\displaystyle\prod_{i=1}^{n}{f_{i}(\bm{x})}$	$\displaystyle=\exp\Big{[}-\bm{x}^{H}\bm{\Gamma}_{n}\bm{x}+\bm{\eta}_{n}^{H}\bm{x}+\bm{x}^{H}\bm{\eta}_{n}+\bm{\zeta}_{n}$	(81)
	$\displaystyle\quad\quad\quad+\sum\nolimits_{i=1}^{n}{\bm{\zeta}_{i}}-\bm{\zeta}_{n}\Big{]}$
	$\displaystyle=c\cdot\exp\left[-\bm{x}^{H}\bm{\Gamma}_{n}\bm{x}+\bm{\eta}_{n}^{H}\bm{x}+\bm{x}^{H}\bm{\eta}_{n}+\bm{\zeta}_{n}\right],$

where $c=\sum\nolimits_{i=1}^{n}{\bm{\zeta}_{i}}-\bm{\zeta}_{n}$ is a constant for normalization. Since the $n$ Gaussian pdfs are independent, the product of which still follows the Gaussian distribution. Comparing with (77) and (78), the covariance and mean of the Gaussian distribution in (81) are

	$\displaystyle\bm{\Sigma}_{n}$	$\displaystyle=\bm{\Gamma}_{n}^{-1}=\left(\sum\nolimits_{i=1}^{n}{\bm{\Sigma}_{i}^{-1}}\right)^{-1}$		(82)
	$\displaystyle\bm{\mu}_{n}$	$\displaystyle=\bm{\Sigma}_{n}\cdot\sum\nolimits_{i=1}^{n}{\bm{\eta}_{i}}=\bm{\Sigma}_{n}\cdot\sum\nolimits_{i=1}^{n}{\bm{\Sigma}_{i}^{-1}\bm{\mu}_{i}}.$		(82)

Similarly, in (27), it is the product of Gaussian pdfs $\left\{f(\bm{h}|\bm{\mu}_{i\to k}^{SN},\bm{\Sigma}_{i\to k}^{SN}),i\in\mathcal{L}\backslash l\right\}$ and $f(\bm{h}|\bm{\mu}_{k}^{pri},\bm{\Sigma}_{k}^{pri})$ . According to the rule in (82), we can obtain (28) and (29).

References

[1] Y. Liu, S. Zhang, X. Mu, Z. Ding, R. Schober, N. Al-Dhahir, E. Hossain, and X. Shen, “Evolution of NOMA toward next generation multiple access (NGMA),” Aug. 2021. Online Available: arXiv:2108.04561.
[2] L. Liu, E. G. Larsson, W. Yu, P. Popovski, C. Stefanovic, and E. de Carvalho, “Sparse signal processing for grant-free massive connectivity: A future paradigm for random access protocols in the internet of things,” IEEE Signal Process. Mag., vol. 35, no. 5, pp. 88-99, Sept. 2018.
[3] X. Chen, D. W. K. Ng, W. Yu, E. G. Larsson, N. Al-Dhahir, and R. Schober, “Massive access for 5G and beyond,” IEEE J. Sel. Areas Commun., vol. 39, no. 3, pp. 615-637, March 2021.
[4] L. Liu and W. Yu, “Massive connectivity with massive MIMO—Part I: Device activity detection and channel estimation,” IEEE Trans. Signal Process., vol. 66, no. 11, pp. 2933–2946, June 2018.
[5] Y. Wu, X. Gao, S. Zhou, W. Yang, Y. Polyanskiy, and G. Caire, “Massive access for future wireless communication systems,” IEEE Wireless Commun., vol. 27, no. 4, pp. 148-156, Aug. 2020.
[6] V. Shyianov, F. Bellili, A. Mezghani, and E. Hossain, “Massive unsourced random access based on uncoupled compressive sensing: Another blessing of massive MIMO,” IEEE J. Sel. Areas Commun., vol. 39, no. 3, pp. 820-834, Mar. 2021.
[7] J. Gao, Y. Wu, Y. Wang, W. Zhang, and F. Wei, “Uplink transmission design for crowded correlated cell-free massive MIMO-OFDM systems,” Sci. China Inf. Sci., vol. 64, no. 8, pp. 182309:1–182309:16, Aug. 2021.
[8] D. L. Donoho, A. Maleki, and A. Montanari, “Message-passing algorithms for compressed sensing,” Proc. Nat. Acad. Sci., vol. 106, no. 45, pp. 18 914–18 919, Nov. 2009.
[9] M. Ke, Z. Gao, Y. Wu, X. Gao, and R. Schober, “Compressive sensing-based adaptive active user detection and channel estimation: Massive access meets massive MIMO,” IEEE Trans. Signal Process., vol. 68, pp. 764-779, Jan. 2020.
[10] J. Ma and L. Ping, “Orthogonal AMP,” IEEE Access, vol. 5, pp. 2020-2033, Jan. 2017.
[11] S. Rangan, “Generalized approximate message passing for estimation with random linear mixing,” in Proc. IEEE Int. Symp. Inform. Theory (ISIT), St. Petersburg, Russia, Aug. 2011, pp. 2168-2172.
[12] J. S. Yedidia, W. T. Freeman, and Y. Weiss, “Understanding belief propagation and its generalizations,” Mitsubishi Tech. Rep. TR2001-022, Jan. 2002.
[13] Z. Zhang, Y. Li, C. Huang, Q. Guo, L. Liu, C, Yuen, and Y. L. Guan, “User activity detection and channel estimation for grant-free random access in LEO satellite-enabled internet of things,” IEEE Internet Things J., vol. 7, no. 9, pp. 8811-8825, Sept. 2020.
[14] H. Iimori, T. Takahashim, K. Ishibashi, G. T. F. de Abreu, and W. Yu, “Grant-free access via bilinear inference for cell-free MIMO with low-coherent pilots,” to appear in IEEE Trans. Wireless Commun.. Online Available: arXiv:2009.12863v1.
[15] S. Jiang, X. Yuan, X. Wang, C. Xu, and W. Yu, “Joint user identification, channel estimation, and signal detection for grant-free NOMA,” IEEE Wireless Commun., vol. 19, no. 10, pp. 6960-6976, Oct. 2020.
[16] C. Huang, L. Liu, C. Yuen, and S. Sun, “Iterative channel estimation using LSE and sparse message passing for mmWave MIMO systems,” IEEE Trans. Signal Process., vol. 67, no. 1, pp. 245-259, Jan. 2019.
[17] Y. Polyanskiy, “A perspective on massive random-access,” in Proc. IEEE Int. Symp. Inform. Theory (ISIT), Aachen, Germany, June 2017, pp. 2523–2527.
[18] O. Ordentlich and Y. Polyanskiy, “Low complexity schemes for the random access Gaussian channel,” in Proc. IEEE Int. Symp. Inf. Theory (ISIT), Aachen, Germany, June 2017, pp. 2528-2532.
[19] V. K. Amalladinne, J. -F. Chamberland, and K. R. Narayanan, “A coded compressed sensing scheme for unsourced multiple access,” IEEE Trans. Inf. Theory, vol. 66, no. 10, pp. 6509-6533, Oct. 2020.
[20] A. Fengler, P. Jung, and G. Caire, “SPARCs and AMP for unsourced random access,” in Proc. IEEE Int. Symp. Inf. Theory (ISIT), Paris, France, July 2019, pp. 2843-2847 .
[21] A. Fengler, P. Jung, and G. Caire, “SPARCs for unsourced random access,” to appear in IEEE Trans. Inf. Theory. Online Available: https://arxiv.org/abs/1901.06234.
[22] A. Joseph and A. R. Barron, “Least squares superposition codes of moderate dictionary size are reliable at rates up to capacity,” IEEE Trans. Inf. Theory, vol. 58, no. 5, pp. 2541–2557, Jan. 2012.
[23] A. K. Pradhan, V. K. Amalladinne, K. R. Narayanan, and J. Chamberland, “Polar coding and random spreading for unsourced multiple access,” in Proc. IEEE Int. Conf. Commun. (ICC), Dublin, Ireland, June 2020, pp. 1-6.
[24] M. Zheng, Y. Wu, and W. Zhang, “Polar coding and sparse spreading for massive unsourced random access,” in Proc. IEEE Veh. Technol. Conf. (VTC-Fall), Victoria, BC, Canada, Nov. 2020, pp. 1-5.
[25] A. Pradhan, V. Amalladinne, A. Vem, K. R. Narayanan, and J. -F. Chamberland, “A joint graph based coding scheme for the unsourced random access Gaussian channel,” in Proc. IEEE Glob. Commun. Conf. (GLOBECOM), Waikoloa, HI, USA, Dec. 2019, pp. 1-6.
[26] A. Vem, K. R. Narayanan, J. Chamberland, and J. Cheng, “A user-independent successive interference cancellation based coding scheme for the unsourced random access Gaussian channel,” IEEE Trans. Commun., vol. 67, no. 12, pp. 8258-8272, Dec. 2019.
[27] R. Gallager, “Low-density parity-check codes,” IRE Trans. Inf. Theory, vol. 8, no. 1, pp. 21-28, Jan. 1962.
[28] S. S. Kowshik, K. Andreev, A. Frolov, and Y. Polyanskiy, “Short-packet low-power coded access for massive MAC,” in Proc. Asilomar Conf. Signals Syst. Comput. (ACSSC), Pacific Grove, CA, USA, Nov. 2019, pp. 827-832.
[29] S. S. Kowshik, K. Andreev, A. Frolov, and Y. Polyanskiy, “Energy efficient random access for the quasi-static fading MAC,” in Proc. IEEE Int. Symp. Inf. Theory (ISIT), Paris, France, 2019, July 2019, pp. 2768-2772.
[30] S. S. Kowshik, K. Andreev, A. Frolov, and Y. Polyanskiy, “Energy efficient coded random access for the wireless uplink,” IEEE Trans. Commun., vol. 68, no. 8, pp. 4694-4708, Aug. 2020.
[31] S. Haghighatshoar, P. Jung, and G. Caire, “Improved scaling law for activity detection in massive MIMO systems,” in Proc. IEEE Int. Symp. Inf. Theory (ISIT), Vail, CO, USA, Jun. 2018, pp. 381–385.
[32] A. Fengler, S. Haghighatshoar, P. Jung, and G. Caire, “Non-Bayesian activity detection, large-scale fading coefficient estimation, and unsourced random access with a massive MIMO receiver,” IEEE Trans. Inf. Theory, vol. 67, no. 5, pp. 2925-2951, May 2021.
[33] V. Shyianov, F. Bellili, A. Mezghani, and E. Hossain, “Massive unsourced random access based on uncoupled compressive sensing: Another blessing of massive MIMO,” IEEE J. Sel. Areas Commun., vol. 39, no. 3, pp. 820-834, Mar. 2021.
[34] J. Dong, J. Zhang, Y. Shi, and J. H. Wang, “Faster activity and data detection in massive random access: A multi-armed bandit approach,” Jan. 2020. Online Available: arXiv:2001.10237v1.
[35] L. Ping, L. Liu, K. Wu, and W. K. Leung, “Interleave division multiple access,” IEEE Trans. Wireless Commun., vol. 5, no. 4, pp. 938–947, Apr. 2006.
[36] L. Sanguinetti, E. Björnson, and J. Hoydis, “Toward massive MIMO 2.0: Understanding spatial correlation, interference suppression, and pilot contamination,” IEEE Trans. Commun., vol. 68, no. 1, pp. 232-257, Jan. 2020.
[37] Z. Gao, L. Dai, Z. Wang, and S. Chen, “Spatially common sparsity based adaptive channel estimation and feedback for FDD massive MIMO,” IEEE Trans. Signal Process., vol. 63, no. 23, pp. 6169-6183, Dec.1, 2015.
[38] L. You, X. Gao, X.-G. Xia, N. Ma, and Y. Peng, “Pilot reuse for massive MIMO transmission over spatially correlated Rayleigh fading channels,” IEEE Trans. Wireless Commun., vol. 14, no. 6, pp. 3352-3366, June 2015.
[39] X. Xie and Y. Wu, “Unsourced random access with a massive MIMO receiver: Exploiting angular domain sparsity,” in Proc. IEEE/CIC Int. Conf. Commun. China (ICCC), Xiamen, China, 2021, pp. 741-745.
[40] V. Chandrasekhar, J. G. Andrews, T. Muharemovic, Z. Shen, and A. Gatherer, “Power control in two-tier femtocell networks,” IEEE Trans. Wireless Commun., vol. 8, no. 8, pp. 4316-4328, Aug. 2009.
[41] P. Patel and J. Holtzman, “Analysis of a simple successive interference cancellation scheme in a DS/CDMA system,” IEEE J. Sel. Areas Commun., vol. 12, no. 5, pp. 796-807, June 1994.
[42] G. Turin, “The effects of multipath and fading on the performance of direct-sequence CDMA systems,” IEEE J. Sel. Areas Commun., vol. 2, no. 4, pp. 597-603, July 1984.
[43] T. Li, Y. Wu, M. Zheng, D. Wang, and W. Zhang, “SPARC-LDPC coding for MIMO massive unsourced random access,” in Proc. IEEE Globecom Workshops (GC Wkshps), Taipei, Taiwan, Dec. 2020, pp. 1-6.
[44] T. L. Narasimhan, A. Chockalingam, and B. S. Rajan, “Factor graph based joint detection/decoding for LDPC coded large-MIMO systems,” in Proc. IEEE Veh. Technol. Conf. (VTC-Spring), Yokohama, Japan, May 2012, pp. 1-5.
[45] J. Bibby, “Axiomatisations of the average and a further generalisation of monotonic sequences”, Glas. Math. J., vol. 15, no. 1, pp. 63–65, 1974.

$\displaystyle p_{l\to k}^{SN}$	$\displaystyle=\left[1+\frac{P(\mathbf{y}_{l}\|\phi_{k}=0,\bm{\mu}_{z_{lk}},\bm{\Sigma}_{z_{lk}})}{P(\mathbf{y}_{l}\|\phi_{k}=1,\bm{\mu}_{z_{lk}},\bm{\Sigma}_{z_{lk}})}\right]^{-1}$	(22)
	$\displaystyle=\left[1+\frac{P(\bm{y}_{l}=\bm{z}_{lk}\|\bm{\mu}_{z_{lk}},\bm{\Sigma}_{z_{lk}})}{P(\bm{y}_{l}=A_{lk}\cdot\bm{h}_{k}+\bm{z}_{lk}\|\bm{\mu}_{z_{lk}},\bm{\Sigma}_{z_{lk}})}\right]^{-1}$
	$\displaystyle=\left[1+\frac{f(\bm{y}_{l}\|\bm{\mu}_{z_{lk}},\bm{\Sigma}_{z_{lk}})}{f(\bm{y}_{l}\|\bm{\mu}_{z_{lk}}^{{}^{\prime}},\bm{\Sigma}_{z_{lk}}^{{}^{\prime}})}\right]^{-1}$
	$\displaystyle=\frac{f(\bm{y}_{l}\|\bm{\mu}_{z_{lk}}^{{}^{\prime}},\bm{\Sigma}_{z_{lk}}^{{}^{\prime}})}{f(\bm{y}_{l}\|\bm{\mu}_{z_{lk}},\bm{\Sigma}_{z_{lk}})+f(\bm{y}_{l}\|\bm{\mu}_{z_{lk}}^{{}^{\prime}},\bm{\Sigma}_{z_{lk}}^{{}^{\prime}})},$

Joint Device Detection, Channel Estimation, and Data Decoding with Collision Resolution for MIMO Massive Unsourced Random Access

Abstract

Index Terms:

I Introduction

I-A Background and Related Works

I-B Challenges

I-C Contributions

I-D Notations

II System Model

III Encoder

III-A CS Phase

III-B LDPC Phase

IV Collision Resolution Protocol

V Decoder

V-A Joint DAD-CE Algorithm

V-A1 Message Update at Observation Nodes

V-A2 Message Update at Variable Nodes

V-A3 DAD Decision and CE Output

V-B MIMO-LDPC-SIC Decoder

V-C Joint Update

V-D Complexity

VI Numerical Results

VI-A Parameter Settings

VI-B Results

VII Conclusion

-A Analysis of the Collision Resolution Protocol

-A1 The first window sliding

-A2 The second window sliding

-A3 The ll-th window sliding

-B Derivation of the Gaussian and Bernoulli Messages in V-A

-B1 Derivation of 𝚺l​k\bm{\Sigma}_{lk} at SNs

-B2 Derivation of Gaussian Messages at VNs

References

-A3 The $l$ -th window sliding

-B1 Derivation of $\bm{\Sigma}_{lk}$ at SNs