
Joint Device Detection, Channel Estimation, and Data Decoding with Collision Resolution for MIMO Massive Unsourced Random Access

Tianya Li, Yongpeng Wu, Senior Member, IEEE, Mengfan Zheng, Wenjun Zhang, Fellow, IEEE,
Chengwen Xing, Member, IEEE, Jianping An, Senior Member, IEEE, Xiang-Gen Xia, Fellow, IEEE,
and Chengshan Xiao, Fellow, IEEE
T. Li, Y. Wu (corresponding author), and W. Zhang are with the Department of Electronic Engineering, Shanghai Jiao Tong University, Minhang 200240, China. E-mail: {tianya, yongpeng.wu, zhangwenjun}@sjtu.edu.cn. M. Zheng is with the Department of Electrical and Electronic Engineering, Imperial College London, London SW7 2AZ, U.K. E-mail: [email protected]. C. Xing and J. An are with the School of Information and Electronics, Beijing Institute of Technology, Beijing 100081, China. E-mail: [email protected]; [email protected]. X.-G. Xia is with the Department of Electrical and Computer Engineering, University of Delaware, Newark, DE 19716, USA. E-mail: [email protected]. C. Xiao is with the Department of Electrical and Computer Engineering, Lehigh University, Bethlehem, PA 18015, USA. E-mail: [email protected].
Abstract

In this paper, we investigate a joint device activity detection (DAD), channel estimation (CE), and data decoding (DD) algorithm for multiple-input multiple-output (MIMO) massive unsourced random access (URA). Different from the state-of-the-art slotted transmission scheme, the data in the proposed framework is split into only two parts. A portion of the data is coded by compressed sensing (CS) and the rest is low-density parity-check (LDPC) coded. In addition to carrying part of the data, the information bits in the CS phase also determine the interleaving patterns and support CE. The principle of interleave-division multiple access (IDMA) is exploited to reduce the interference among devices in the LDPC phase. Based on the belief propagation (BP) algorithm, a low-complexity iterative message passing (MP) algorithm is utilized to decode the data embedded in these two phases separately. Moreover, combined with successive interference cancellation (SIC), the proposed joint DAD-CE-DD algorithm further improves performance by allowing the two phases to utilize each other's beliefs. Additionally, based on energy detection (ED) and a sliding window protocol (SWP), we develop a collision resolution protocol to handle codeword collisions, a common issue in URA systems. In addition to the complexity reduction, the proposed algorithm exhibits a substantial performance enhancement compared to the state-of-the-art in terms of efficiency and accuracy.

Index Terms:
belief propagation, compressed sensing, LDPC, MIMO, unsourced random access

I Introduction

I-A Background and Related Works

Next generation multiple access (NGMA) aims to support massive connectivity scenarios in which many envisioned Internet of Things (IoT) applications, e.g., manufacturing, transportation, agriculture, and medicine, are efficiently and flexibly connected [5, 2, 4, 6, 1, 3]. A generic scenario for IoT connectivity involves a massive number of machine-type connections [2], known as massive machine-type communication (mMTC), which is one of the main use cases of the fifth-generation (5G) networks and beyond [3]. Compared to human-type communications (HTC), MTC has two distinct features: sporadic traffic and short packet communication. Sporadic access means that at any given time, for the sake of long battery life and energy saving, only a small fraction of the huge number of devices is active for data transmission [4]. Meanwhile, the small data payloads result in reduced spectral efficiency and increased latency under traditional multiple access protocols. In this regard, a number of scalable random access schemes have been investigated in the context of massive connectivity. Conceptually, there are two lines of work to that end, namely, individual and common codebook-based approaches.

In the individual codebook based approach, each device is equipped with a unique codebook for the identification conducted by the base station (BS) [4, 7]. Since the BS is interested in the IDs of devices, this framework can be referred to as sourced random access (SRA). A typical transmission process in SRA includes two phases, namely, training and data transmission. In the training phase, the BS conducts the task of device activity detection (DAD) and channel estimation (CE) based on the pre-allocated unique pilot sequences transmitted by active devices, while the data transmission is performed in the following phase. The task of DAD-CE can be modeled as a compressed sensing (CS) problem, and there are numerous algorithmic solutions for this issue. Among these CS recovery techniques, the approximate message passing (AMP) algorithm, first proposed in [8] for the single measurement vector (SMV) based problem, exhibits great performance in terms of accuracy and complexity. Since the publication of [8], there have been many variants of the AMP algorithm, such as multiple measurement vector (MMV) AMP [4], generalized MMV (GMMV) AMP [9], orthogonal AMP (OAMP) [10], generalized AMP (GAMP) [11], and many other works. Another line of work is based on the belief propagation (BP) algorithm [12], which models the problem as a graph and iteratively calculates the messages among different nodes [13, 14, 15, 16]. Thanks to the consistency of the message updating rules, the messages can always be jointly updated within iterations to obtain improved performance. However, the premise of these algorithms is that each device has a unique pilot sequence. Stepping back from these algorithmic solutions to the essence of the individual codebook, it is impractical to assign each device a unique pilot in the mMTC scenario with a huge number of potential devices. Meanwhile, it is a waste of resources, especially with sporadic traffic. To mitigate this issue, the study of frameworks based on common codebooks has rapidly developed.

As opposed to the case of individual codebooks, active devices choose codewords from a common codebook for their messages. In this regard, the task of the BS is only to produce the transmitted messages regardless of the corresponding IDs, thus leading to the so-called unsourced random access (URA). The main difference between URA and other grant-based and grant-free random access protocols is that the BS does not perform device identification but only decodes the list of active device messages up to permutations [3]. Conceptually, there are two distinct advantages of this novel framework: i) the BS is capable of accommodating massive devices, since all devices share the common codebook instead of being assigned unique preamble sequences as in traditional schemes; ii) no device identification information needs to be embedded in the transmitted sequence. As such, the overhead is reduced, contributing to improved efficiency. The study of URA was first reported in [17], in which Polyanskiy considered the scenario of a massive number of infrequent and uncoordinated devices and derived a random coding existence bound for the Gaussian multiple-access channel (GMAC). Since the publication of [17], many works have been devoted to approaching that bound [18, 19, 20, 21, 23, 24, 25, 26]. A low complexity coding scheme is investigated in [18]. Based on the combination of compute-and-forward and coding for a binary adder channel, it exhibits improved performance compared with traditional schemes, such as slotted-ALOHA and treating interference as noise (TIN). However, the size of the codebook increases exponentially with the blocklength, resulting in difficulty in achieving that underlying bound. To mitigate this issue, a slotted transmission framework is proposed in [19], referred to as the coded compressed sensing (CCS) scheme. This cascaded system includes inner and outer codes.
The outer tree encoder first splits the data into several slots with parity bits added. CS coding is then employed within each slot to pick codewords from the codebook. The blocklength, as well as the size of the codebook in each slot, is greatly reduced, thus relaxing the computational burden. On the receiver side, the inner CS-based decoder is first utilized to recover the slot-wise transmitted messages within each slot. The outer tree decoder then stitches the decoded messages across different slots according to the prearranged parity. This structure is inherited by the later works in [20, 21], where the authors exploit the concept of sparse regression codes (SPARCs) [22] to conduct the inner encoding and decode it with the AMP algorithm. Some other coding schemes for URA are also reported in [23, 24, 25, 26]. Polar coding schemes based on TIN and successive interference cancellation (SIC) are investigated in [23, 24]. In [25, 26], the authors consider a hierarchical framework, where the device's data is split into two parts. A portion of the data is encoded by a common CS codebook and recovered by CS techniques, while the rest is encoded by a low-density parity-check (LDPC) code [27] with the key parameters conveyed in the former part. Besides the above works considering the GMAC, there are also works on the fading channel [28, 29, 30].

Besides the above works considering a single receive antenna at the BS, the study of massive URA in multiple-input multiple-output (MIMO) systems has also drawn increasing attention. A BS equipped with multiple antennas provides extra dimensions for the received signal, thus offering more opportunities for signal detection and for the access of a massive number of devices. A coordinate-wise descent activity detection (AD) algorithm is proposed in [31, 32], which finds the large-scale fading coefficients (LSFCs) of all the devices (coordinates) iteratively. The authors adopt a concatenated coding scheme with the aforementioned tree code as the outer code, and a non-Bayesian method (i.e., maximum likelihood (ML) detection) is leveraged as the inner decoder based on the covariance of the received signal, referred to as the covariance-based ML (CB-ML) algorithm. However, the performance of the CB-ML algorithm degrades dramatically when the number of antennas is less than the number of active devices. There are also some improvements to this algorithm in terms of complexity or accuracy [33, 34]. The slotted transmission framework proposed in [33] eliminates the need for concatenated coding. That is, no parity bit is added within each slot, and the slot-wise messages can be stitched together by clustering the estimated channels. However, the channels in [33] are assumed to be identically distributed over all the slots, which is difficult to guarantee in practice. Besides, a collision in codewords (more than one device choosing the same codeword) leads to a superimposition of the estimated channels, resulting in a failure of the stitching process. Currently, the design of an efficient collision resolution scheme in URA remains missing. In [34], a more efficient coordinate selection policy is developed based on multi-armed bandit approaches, leading to faster convergence than the CB-ML algorithm.
Nevertheless, these slotted transmission frameworks, such as CB-ML and its variants, all map a piece of data within each slot to a long transmitted codeword based on the CS coding, resulting in low efficiency. In this regard, the algorithmic design for the MIMO massive URA with high efficiency and accuracy remains missing.

I-B Challenges

In this paper, we investigate a two-phase transmission scheme for MIMO massive URA, where the device's transmitted data is split into two phases that are CS and LDPC coded, respectively. In this framework, no parity bits need to be embedded in the CS phase, and the data is encoded linearly in the LDPC phase instead of being mapped to a long codeword. As such, it exhibits higher efficiency and lower latency than the aforementioned slotted transmission schemes. The existing schemes considering this framework all focus on the single-antenna case. For instance, the polar coding [24] and LDPC coding schemes [25, 26] consider the GMAC with perfect channel state information (CSI) at the BS. In this regard, the above channel coding schemes can obtain satisfactory performance. However, such an assumption does not hold in the MIMO case. Besides, because of the massive connectivity and sporadic traffic of devices, the positions of active devices in the estimation equations are not determined. Consequently, conventional linear receivers, such as zero-forcing (ZF), are not applicable. Moreover, compared with the fading channel models [28, 29, 30] where the channel of each device is a scalar, in MIMO the channel is a vector, and the elements among antennas share the same activity. Therefore, the elements in the channel vector should be estimated jointly rather than separately; their distribution is formulated as a multi-dimensional Bernoulli-Gaussian distribution in this paper. In this regard, conventional estimators, such as the minimum mean square error (MMSE) estimator or the maximum a posteriori (MAP) estimator, are hard to carry out straightforwardly. Specifically, it is computationally intractable to obtain a precise posterior distribution, since it involves marginalizing a high-dimensional joint distribution of the activity and the channel.

Another challenge for the proposed scheme in the MIMO case is activity detection. We note that whether in the GMAC [24, 25, 26] or the fading channel [28, 29, 30] in the single-antenna case, the device activity can be directly detected. For instance, since the device's channel is a scalar in the fading channel, the device can simply be declared active when the estimated channel is non-zero or larger than a given threshold. However, detection is much more involved in MIMO because there are multiple observations of the activity in the estimated channel vector. Among the existing works, [9] makes a hard decision based on the energy of the channel, and [33] considers the LLR at each antenna separately and sums them to obtain the final decision. However, the decision threshold in [9] is hard to obtain in practice, and the distribution of the channel is not utilized. Although [33] considers the channel distribution and gives a closed-form expression for activity detection, the fact that the channels at all antennas share the same activity is not exploited.

Besides, the LDPC code is efficient but sensitive to the CE results. Therefore, the overall performance is limited by the accuracy of CE. Moreover, with the presence of collision in URA, if more than one device chooses the same codeword in the CS phase, the corresponding channels will be superimposed and thus the subsequent LDPC decoding process will fail.

I-C Contributions

To cope with these issues, based on the message passing (MP) algorithm, we propose the Joint DAD-CE algorithm to conduct the task of joint activity detection and channel estimation in the CS phase, and the MIMO-LDPC-SIC Decoding algorithm for decoding the data embedded in the LDPC phase. Moreover, to further improve the performance, we propose the Joint DAD-CE-DD algorithm to jointly update the messages in these two phases by utilizing the belief of each other. Finally, we propose a collision resolution protocol to address collisions in URA. The key and distinguishing features of the proposed algorithms are listed below.

  • Based on the principle of the BP algorithm, we develop a low-complexity iterative MP algorithm to decode the two parts of data. For the CS phase, we investigate the Joint DAD-CE algorithm to recover the part of the data embedded in the device activities, the interleaving patterns, and the channel coefficients, which are the key parameters for the remaining data. Specifically, we derive a closed-form expression for DAD by utilizing the joint distribution of the channel among antennas. Combined with SIC and in the spirit of interleave-division multiple access (IDMA) [35], we elaborate on the MIMO-LDPC-SIC Decoding algorithm to decode the remaining data embedded in the LDPC phase. In addition to the complexity reduction, the proposed algorithm exhibits a substantial performance enhancement in terms of accuracy and efficiency compared to the state-of-the-art CB-ML algorithm.

  • Thanks to the consistency of the MP algorithm, we propose the Joint DAD-CE-DD algorithm to further improve the performance. The proposed algorithm suggests a paradigm connecting the two parts. That is, messages in the decoding processes of the CS and LDPC parts can be jointly updated by utilizing the belief of each other, thus leading to improved performance. We employ the correctly decoded codewords as soft pilots to conduct CE jointly with the codewords in the CS phase, which contributes to improved accuracy of the estimated channel. Combined with the SIC method, the accuracy of the residual signal can be improved, leading to enhanced decoding performance.

  • Under the current framework based on the common codebook, a collision happens if more than one device has the same preamble, which leads to a superimposition of the estimated channels. Accordingly, the superimposed channel will cause a failure in the LDPC decoding process. To this end, we propose a collision resolution protocol based on energy detection (ED) and a sliding window protocol (SWP). Succinctly, ED is performed on the estimated channel of each device to detect the superimposition. Then, the BS broadcasts the indexes of the superimposed channels to all devices. Afterwards, the devices in collision slide the window over their data sequences and CS coding is again performed for retransmission.

The rest of the paper is organized as follows. In Section II, we introduce the system model for MIMO massive URA. In Section III, we present the two-phase encoding scheme, and the collision resolution protocol is developed in Section IV. Then, we elaborate on the low-complexity iterative decoding algorithm based on BP and explain the joint update algorithm in Section V. After presenting the numerical results and analyzing the complexity in Section VI, we conclude the paper in Section VII.

I-D Notations

Unless otherwise noted, lower- and upper-case bold fonts, \bm{x} and \bm{X}, are used to denote vectors and matrices, respectively; the (m,n)-th entry of \bm{X} is denoted as X_{m,n}; \bm{X}_{i,:} denotes the i-th row of \bm{X}; \{\cdot\}^{*} denotes the conjugate of a number; \{\cdot\}^{T} and \{\cdot\}^{H} denote the transpose and conjugate transpose of a matrix, respectively; \mathbb{E}\{\cdot\} and \text{Var}\{\cdot\} denote the statistical expectation and variance, respectively; \bm{x}\sim\mathcal{CN}(\bm{\mu},\bm{\Sigma}) means that the random vector \bm{x} follows a complex Gaussian distribution with mean \bm{\mu} and auto-covariance matrix \bm{\Sigma}; \{\cdot\}! denotes the factorial; [1:M] denotes the set of integers between 1 and M; \mathcal{L}\backslash l denotes the entries in the set \{1,2,\cdots,L\} except l; \text{Re}(\cdot) and \text{Im}(\cdot) denote the real and imaginary parts of a complex number, respectively; \lceil\cdot\rceil denotes the ceiling function; i is the imaginary unit (i.e., i^{2}=-1).

II System Model

Consider the uplink of a single-cell cellular network consisting of K_{tot} single-antenna devices served by a base station (BS) equipped with M antennas. This paper assumes sporadic device activity, i.e., only a small number K_{a}\ll K_{tot} of devices are active within a coherence time. Each device has B bits of information to be coded and transmitted over a block-fading channel with L channel uses. Let \bm{v}_{k}\in\{0,1\}^{B} denote device k's binary message and f(\cdot):\{0,1\}^{B}\rightarrow\mathbb{C}^{L} be a one-to-one encoding function. Typically, in the URA scenario, f(\cdot) is implemented by selecting the corresponding codeword \bm{a}_{k} from a shared codebook \bm{A}=\left[\bm{a}_{1},\bm{a}_{2},\cdots,\bm{a}_{2^{B}}\right]\in\mathbb{C}^{L\times 2^{B}} according to \bm{v}_{k} [31, 20, 32]. The corresponding received signal can be written as

\bm{Y}=\sum\nolimits_{k\in\mathcal{K}_{tot}}{\phi_{k}f(\bm{v}_{k})\tilde{\bm{h}}_{k}^{T}}+\bm{Z}, (1)

where \phi_{k} is the device activity indicator, which is modeled as a Bernoulli random variable and defined as follows:

\phi_{k}=\begin{cases}1,&\text{if device }k\text{ is active,}\\ 0,&\text{otherwise,}\end{cases}\quad\forall k\in\mathcal{K}_{tot}. (2)

\tilde{\bm{h}}_{k}\in\mathbb{C}^{M\times 1} is the channel vector of device k and \bm{Z}\in\mathbb{C}^{L\times M} is the additive white Gaussian noise (AWGN) matrix distributed as \mathcal{CN}(0,\sigma^{2}\bm{I}_{M}). In line with the state-of-the-art setting [32], we assume for simplicity and fair comparison that the channels \tilde{\bm{h}}_{k} are i.i.d., i.e., independent of each other and uncorrelated among antennas. Specifically, the Rayleigh fading model is considered in this paper, where \tilde{\bm{h}}_{k}=\sqrt{\beta_{k}}\bm{g}_{k}, \bm{g}_{k}\sim\mathcal{CN}(0,\bm{I}_{M}) denotes the Rayleigh fading component, and \beta_{k} denotes the LSFC. We would like to mention that more realistic channel models in MIMO or massive MIMO have been discussed and well investigated, such as the spatially correlated fading channel [9, 36, 37, 38, 39]. Due to the limited angular spread, the channel in the virtual angular domain exhibits sparsity among antennas. As such, the data of devices is not directly superimposed but staggered on different antennas, and the multi-user interference can be further reduced, contributing to improved performance. Nevertheless, for the sake of consistency with the benchmarks [4, 32] and to isolate the fundamental aspects of the problem without additional model complication, we focus throughout this paper on the i.i.d. Rayleigh fading channel. The study of spatially correlated channels is important and left for future work.
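As a concrete illustration, the received signal in (1) can be simulated directly from the model above. The sketch below uses purely illustrative dimensions (K_{tot}, K_{a}, L, M, and the noise variance are not taken from the paper) and random unit-power symbols as a stand-in for the encoded sequences f(\bm{v}_{k}):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (assumptions, not from the paper):
K_tot, K_a, L, M = 1000, 50, 128, 32
sigma2 = 0.1  # noise variance

# Sporadic activity: a uniformly chosen subset of K_a devices is active.
phi = np.zeros(K_tot)
phi[rng.choice(K_tot, size=K_a, replace=False)] = 1

# i.i.d. Rayleigh fading with unit LSFC (beta_k = 1), as in Section II.
H = (rng.standard_normal((K_tot, M)) + 1j * rng.standard_normal((K_tot, M))) / np.sqrt(2)

# Stand-in for the encoded sequences f(v_k): random unit-power symbols.
S = (rng.standard_normal((L, K_tot)) + 1j * rng.standard_normal((L, K_tot))) / np.sqrt(2)

# Received signal (1): Y = sum_k phi_k * f(v_k) * h_k^T + Z.
Z = np.sqrt(sigma2 / 2) * (rng.standard_normal((L, M)) + 1j * rng.standard_normal((L, M)))
Y = (S * phi) @ H + Z
```

Only the K_{a} active columns of S contribute to \bm{Y}, which is exactly the sparsity the CS decoder exploits.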

The BS's task is to produce a list of transmitted messages \mathcal{L}(\bm{Y}) without identifying from whom they are sent, thereby leading to the so-called URA. The performance of a URA system is evaluated by the probabilities of missed detection and false alarm, denoted by p_{md} and p_{fa}, respectively, which are given by:

p_{md}=\frac{1}{K_{a}}\sum\nolimits_{k\in\mathcal{K}_{a}}{P\left(\bm{v}_{k}\notin\mathcal{L}\right)} (3)
p_{fa}=\frac{\left|\mathcal{L}\backslash\left\{\bm{v}_{k}:k\in\mathcal{K}_{a}\right\}\right|}{\left|\mathcal{L}\right|}. (4)

In this system, the code rate is R_{c}=B/L and the spectral efficiency is \mu=\frac{B\cdot K_{a}}{L\cdot M}. Let P denote the power (per symbol) of each device; then the energy-per-bit E_{b}/N_{0} is defined by

E_{b}/N_{0}=\frac{LP}{2B}. (5)
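The metrics (3)-(5) can be computed directly once the transmitted list and the decoded list \mathcal{L} are represented as sets of messages. The sketch below uses small hypothetical message sets and illustrative values of L, P, and B (all assumptions, not from the paper's simulations):

```python
# Hypothetical transmitted messages and decoded list L (bit-strings).
transmitted = {"0101", "1100", "0011", "1111"}   # K_a = 4 active devices
decoded     = {"0101", "1100", "1000"}           # list produced by the BS

K_a = len(transmitted)
# (3): fraction of transmitted messages missing from the decoded list.
p_md = sum(1 for v in transmitted if v not in decoded) / K_a
# (4): fraction of decoded messages that were never transmitted.
p_fa = len(decoded - transmitted) / len(decoded)

# (5): energy-per-bit Eb/N0 = L*P / (2*B), with illustrative L, P, B.
L_uses, P, B = 3200, 0.05, 96
EbN0 = L_uses * P / (2 * B)
```

Here two of the four transmitted messages are missed (p_{md}=0.5) and one of the three decoded messages is a false alarm (p_{fa}=1/3).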

III Encoder

As aforementioned, the existing slotted transmission scheme exhibits low efficiency due to the CS coding. To mitigate this issue, a two-phase coding scheme is proposed in [26, 25], which considers the T-user GMAC and also the URA scenario. We extend this work to the MIMO system and refer to it as the CS-LDPC coding scheme. Similarly, the hierarchical form of the encoding process is considered in this paper. The B bits of information are first divided into two parts, namely, B_{p} and B_{c} bits with B_{p}+B_{c}=B. Typically, one would want B_{p}\ll B_{c}. The former B_{p} information bits are coded by a CS-based encoder to pick a codeword from the common codebook. Based on the codebook, the BS is tasked to recover part of the messages, the number of active devices, the channel coefficients, as well as the interleaving patterns for the latter part. The remaining B_{c} bits are coded with LDPC codes. For clarity, we denote the former and latter encoding processes as the CS and LDPC phases, respectively. Correspondingly, the total L channel uses are split into two segments of lengths L_{p} and L_{c}, respectively, with L_{p}+L_{c}=L. Since only a small fraction of the data is CS coded and the rest is LDPC coded, the efficiency of our scheme is higher than that of purely CS-coded schemes [33, 19, 20, 32]. The key features of this encoding process are summarized in Fig. 1. We elaborate on these two encoding phases below.

Refer to caption

Figure 1: The encoding process in the CS and LDPC phases.

III-A CS Phase

The URA fashion is considered in this phase. Let \mathbf{A}\in\mathbb{C}^{L_{p}\times 2^{B_{p}}} denote the common codebook shared by all the devices. That is, the columns of \mathbf{A}=\left[\bm{a}_{1},\bm{a}_{2},\cdots,\bm{a}_{2^{B_{p}}}\right] form a set of codewords with power constraint \|\bm{a}_{i}\|_{2}^{2}=L_{p}, from which each active device chooses in order to encode its B_{p} bits of information. With a slight abuse of notation, let \bm{v}_{k}^{p} denote the first B_{p} bits of device k's binary message, i.e., \bm{v}_{k}^{p}\triangleq\bm{v}_{k}(1:B_{p})\in\{0,1\}^{B_{p}}. To apply the encoding scheme, we convert \bm{v}_{k}^{p} into the unit \ell_{1}-norm binary vector \tilde{\bm{v}}_{k}^{p}\in\{0,1\}^{2^{B_{p}}}, in which a single one is placed at location i_{k}. The value i_{k} is obtained by regarding the binary sequence \bm{v}_{k}^{p} as an integer expressed in radix-2 (plus one), which we write as i_{k}=[\bm{v}_{k}^{p}]_{2}\in[1:2^{B_{p}}]. Then, the coded sequence is obtained by taking the transpose of the i_{k}-th column of \bm{A}, which we denote by \bm{a}_{i_{k}}^{T}. This facilitates the CS architecture, which maps the information to an elementary vector \tilde{\bm{v}}_{k}^{p}, according to which the corresponding codeword is selected from a fixed codebook. The corresponding received signal can be written as

\bm{Y}=\sum\nolimits_{k\in\mathcal{K}_{tot}}{\phi_{k}\bm{a}_{i_{k}}\tilde{\bm{h}}_{k}^{T}}+\bm{Z}, (6)

where \phi_{k} is the device activity indicator, as defined in (2). The matrix form of (6) is given by

\bm{Y}=\bm{A}\bm{\Gamma}\tilde{\bm{H}}+\bm{Z}, (7)

where \mathbf{A}\in\mathbb{C}^{L_{p}\times 2^{B_{p}}} is the common codebook shared by all devices and \bm{\Gamma}\in\{0,1\}^{2^{B_{p}}\times K_{tot}} denotes the binary selection matrix. For each active device k\in\mathcal{K}_{a}, the corresponding column \bm{\Gamma}_{:,k} is all-zero except for a single one in position i_{k}, while for all k\in\mathcal{K}_{tot}\backslash\mathcal{K}_{a} the corresponding column \bm{\Gamma}_{:,k} is all zeros. \tilde{\bm{H}}=\left[\tilde{\bm{h}}_{1},\cdots,\tilde{\bm{h}}_{K_{tot}}\right]^{T}\in\mathbb{C}^{K_{tot}\times M} corresponds to the MIMO channel coefficient matrix. We note that the number of total devices K_{tot} plays no role in URA. In order to get rid of this variable, with a slight abuse of notation we define the modified activity indicator and selection matrix as \phi_{r}=\sum\nolimits_{k\in\mathcal{K}_{a}}{\gamma_{r,k}} and \bm{\Phi}=\text{diag}\{\phi_{1},\cdots,\phi_{2^{B_{p}}}\}, respectively, where \gamma_{r,k} is the (r,k)-th entry of \bm{\Gamma}. Correspondingly, the modified channel is \bm{H}=\left[\bm{h}_{1},\cdots,\bm{h}_{2^{B_{p}}}\right]^{T} with \bm{h}_{r}=\sum\nolimits_{k\in\mathcal{K}_{a}}{\gamma_{r,k}\tilde{\bm{h}}_{k}}. Hence, (7) can be written as

\bm{Y}=\bm{A}\bm{\Phi}\bm{H}+\bm{Z}. (8)

Let \bm{X}=\bm{\Phi}\bm{H}=\left[\bm{x}_{1},\cdots,\bm{x}_{2^{B_{p}}}\right]^{T}. The goal for the BS in the CS phase is to detect the non-zero positions of the selection matrix \bm{\Phi} and the corresponding channel vectors by recovering \bm{X} from the noisy observation \bm{Y}. Since \bm{X} is row sparse, i.e., many \bm{x}_{n} are zero, this reconstruction problem can be modeled as a CS problem [4]. Once \bm{\Phi} is recovered, the message indicators of active devices \{i_{k},\forall k\in\mathcal{K}_{a}\} are also recovered. Moreover, i_{k} acts as a parameter of the LDPC code, since it determines the interleaving pattern of the data in the LDPC phase. Considering the fact that the B_{p} bits of message carry key parameters for the latter phase, we name this part the preamble. We note that it is different from the preamble in the traditional grant-free scenario, which is a pure pilot with no data embedded.
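The bits-to-codeword mapping described above can be sketched in a few lines. The sizes B_{p} and L_{p} below are illustrative assumptions, and a random Gaussian codebook stands in for the paper's common codebook; the index i_{k} uses 0-based indexing, so the "+1" of the radix-2 convention is dropped:

```python
import numpy as np

rng = np.random.default_rng(1)

B_p, L_p = 8, 32  # illustrative sizes (assumptions, not from the paper)

# Common codebook A with per-codeword power constraint ||a_i||_2^2 = L_p.
A = rng.standard_normal((L_p, 2**B_p)) + 1j * rng.standard_normal((L_p, 2**B_p))
A *= np.sqrt(L_p) / np.linalg.norm(A, axis=0)

v_p = [1, 0, 1, 1, 0, 0, 1, 0]  # device k's first B_p bits (example)

# i_k: the bits read as a radix-2 integer (0-based here, so no "+1").
i_k = int("".join(map(str, v_p)), 2)

# One-hot vector v~_k^p with a single one at position i_k.
v_onehot = np.zeros(2**B_p)
v_onehot[i_k] = 1

# Encoding: the transmitted preamble sequence is the i_k-th column of A.
a_ik = A @ v_onehot  # equals A[:, i_k]
```

Multiplying the codebook by the one-hot vector is exactly the linear CS model: stacking all devices' one-hot vectors (scaled by activity) recovers \bm{A}\bm{\Phi}\bm{H} in (8).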

III-B LDPC Phase

Likewise, let \bm{v}_{k}^{c}\triangleq\bm{v}_{k}(B_{p}+1:B)\in\{0,1\}^{B_{c}} denote the remaining B_{c} information bits of device k. \bm{v}_{k}^{c} is first encoded into an LDPC codeword \bm{b}_{k}^{c}\in\{0,1\}^{\tilde{L}_{c}}, which is determined by the LDPC parity check matrix \bm{C} of size (\tilde{L}_{c}-B_{c})\times\tilde{L}_{c}. We note that in the decoding process, if \hat{\bm{b}}_{k}^{c} is a valid codeword, then \text{mod}(\bm{C}\hat{\bm{b}}_{k}^{c},2)=\bm{0}. \bm{b}_{k}^{c} is then modulated to \bm{s}_{k}^{c}. We adopt the sparse spreading scheme introduced in [25]. That is, \bm{s}_{k}^{c} is zero-padded into a length-L_{c} vector \tilde{\bm{s}}_{k}^{c}:

\tilde{\bm{s}}_{k}^{c}=\left[\bm{s}_{k}^{c},~0,~\cdots,~0\right]. (9)

We then employ the index representation i_{k} to permute the ordering of \tilde{\bm{s}}_{k}^{c}. This is implemented by a random interleaver \pi_{i_{k}} with interleaving pattern i_{k}. As mentioned in [26], the purpose of permuting the codewords is to decorrelate the random access interference from other devices. This is similar to the IDMA scheme, since the interleaving patterns of most devices are different thanks to the distinctive indices i_{k}. Hence, \bm{v}_{k}^{c} is finally encoded to \pi_{i_{k}}(\tilde{\bm{s}}_{k}^{c}). Appending it to the coded message in the CS phase yields the final codeword \bm{x}_{k}:

\bm{x}_{k}=\left[\bm{a}_{i_{k}}^{T},\pi_{i_{k}}\left(\tilde{\bm{s}}_{k}^{c}\right)\right]^{T}. (10)
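The zero-padding of (9) and the index-seeded interleaver \pi_{i_{k}} can be sketched as follows. A pseudorandom permutation seeded by i_{k} is one simple way to realize an interleaver that both transmitter and receiver can reproduce from the recovered index; the lengths and symbols below are illustrative assumptions:

```python
import numpy as np

L_c = 16   # channel uses in the LDPC phase (illustrative)
i_k = 178  # index recovered from the CS phase; seeds the interleaver

# Modulated LDPC codeword s_k^c (here BPSK symbols, shorter than L_c).
s_c = np.array([1, -1, -1, 1, 1, -1, 1, 1, -1, 1, -1, -1], dtype=float)

# Sparse spreading (9): zero-pad s_k^c to length L_c.
s_tilde = np.concatenate([s_c, np.zeros(L_c - len(s_c))])

# Interleaver pi_{i_k}: a permutation seeded by i_k, so devices with
# distinct indices get distinct interleaving patterns (IDMA principle).
perm = np.random.default_rng(i_k).permutation(L_c)
s_interleaved = s_tilde[perm]

# De-interleaving at the receiver inverts the permutation.
s_recovered = np.empty(L_c)
s_recovered[perm] = s_interleaved
```

Since the BS recovers i_{k} in the CS phase, it can regenerate the same permutation and de-interleave before LDPC decoding.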

The received signal including both the CS and LDPC phases is given by

\bm{Y}=\sum\nolimits_{k\in\mathcal{K}_{a}}{\bm{x}_{k}\bm{h}_{i_{k}}^{T}}+\bm{Z}, (11)

where \bm{h}_{i_{k}} is assumed to follow independent quasi-static flat fading across the above two phases. The goal for the BS is to recover \{\bm{v}_{k},\forall k\in\mathcal{K}_{a}\} based on the received signal \bm{Y} and the channel \bm{h}_{i_{k}} estimated in the CS phase. We emphasize again that the BS only produces the transmitted messages without distinguishing the corresponding devices.

This hierarchical encoding process appears similar to the work of [26], where CS and channel coding techniques are utilized to encode the messages. However, unlike [26], which considers the GMAC, our approach operates over the MIMO channel. Hence, in addition to recovering the parameters of the LDPC codes conveyed by the codebook \bm{A} as in [26], the BS is also tasked to estimate the channel coefficients. Besides, as we will see shortly in Section V-C, a belief propagation decoder draws a connection between the CS and LDPC phases. That is, messages in the CE and DD processes can be jointly updated by utilizing the belief of each other, whereas the two phases in [26] are two independent modules that work sequentially.

IV Collision Resolution Protocol

It is possible that two or more devices have the same preamble message 𝒗p\bm{v}^{p}, which occurs with small but non-zero probability. In this case, the collided devices will have the same interleaving pattern in the LDPC phase, which goes against the principles of the IDMA scheme. Moreover, they will choose the same codeword as their coded messages in the CS phase, so that the received signal in the CS phase becomes

𝒀colli=ik𝒞𝒂ikj𝒦ik𝒉~jT+𝒁,\bm{Y}_{colli}=\sum\nolimits_{i_{k}\in\mathcal{C}}{\bm{a}_{i_{k}}\sum\nolimits_{j\in\mathcal{K}_{i_{k}}}{\tilde{\bm{h}}_{j}^{T}}}+\bm{Z}, (12)

where 𝒞\mathcal{C} and 𝒦ik\mathcal{K}_{i_{k}} denote the set of collided indexes and the set of collided devices corresponding to the message index iki_{k}, respectively. In this case, the BS can only recover the superposition of the collided devices' channels, rather than each channel individually, which leads to the failure of the LDPC decoding process. Although collisions occur with small probability, they cannot be ignored; however, the design of an efficient collision resolution scheme for URA remains missing.

Refer to caption
(a) The BS broadcasts collided index representations {ik|ϵik>η}\left\{i_{k}|\epsilon_{i_{k}}>\eta\right\} to all the devices. Based on this, active devices can determine whether they are involved in a collision. Devices ii to jj are in collision in this case.
Refer to caption
(b) The collided devices slide the window with length BpB_{p} bits forward within the total BB bits to get new sequences. The sliding length is B0B_{0}, satisfying 0<B0<Bp0<B_{0}<B_{p}.
Figure 2: The diagram of the collision resolution protocol.

In this paper, we develop a collision resolution protocol based on ED and the SWP. We note that in real scenarios, near-far effects can be effectively mitigated by existing power control schemes [40, 41, 42]. In this regard, a flat fading channel is considered in this paper. That is, the LSFCs βk\beta_{k} for all the devices are assumed to be identical, as also considered in [31, 32]. As mentioned above, if a collision happens, the recovered channel of a collided index will be a superposition of the collided devices' channels, that is

𝒉^ik=j𝒦ik𝒉~j+𝒛,ik𝒞,\hat{\bm{h}}_{i_{k}}=\sum\nolimits_{j\in\mathcal{K}_{i_{k}}}{\tilde{\bm{h}}_{j}}+\bm{z},\quad i_{k}\in\mathcal{C}, (13)

where 𝒛𝒞𝒩(0,σ2𝑰M)\bm{z}\sim\mathcal{CN}(0,\sigma^{2}\bm{I}_{M}) denotes the Gaussian noise, and 𝒉^ik\hat{\bm{h}}_{i_{k}} is distributed as 𝒉^ik𝒞𝒩(0,(|𝒦ik|+σ2)𝑰M)\hat{\bm{h}}_{i_{k}}\sim\mathcal{CN}\left(0,(|\mathcal{K}_{i_{k}}|+\sigma^{2})\bm{I}_{M}\right), which has a higher power than that of an index without collision. Therefore, an effective way for the BS to detect a collision is to perform ED on the estimated channel:

ϵik=𝔼[𝒉^ikH𝒉^ik].\epsilon_{i_{k}}=\mathbb{E}\left[\hat{\bm{h}}_{i_{k}}^{H}\hat{\bm{h}}_{i_{k}}\right]. (14)

If ϵik\epsilon_{i_{k}} is greater than a given threshold η\eta, this is taken as evidence that at least two devices have the same preamble and thus have chosen the same CS codeword 𝒂ik\bm{a}_{i_{k}}. Since the devices themselves do not know whether they are involved in a collision, the BS needs to feed this information back. To this end, the BS broadcasts the collided index representations {ik|ϵik>η}\left\{i_{k}|\epsilon_{i_{k}}>\eta\right\} to all the devices so that each device can determine whether it is involved in a collision. Fig. 2a shows devices ii to jj realizing that a collision has occurred after receiving the indices broadcast by the BS. We note that in the above process, the only additional information required at the BS is the threshold, which can easily be preset in practice.
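The ED statistic in (14) and the threshold test can be sketched as follows. The antenna count, noise power, and threshold are assumed toy values, and a sample average over antennas stands in for the expectation; a two-device superposition models the collided index of (13).

```python
import numpy as np

rng = np.random.default_rng(1)
M, sigma2, eta = 128, 0.05, 1.5   # antennas, noise power, threshold (assumed values)

def cn(size):                     # i.i.d. CN(0, 1) samples
    return (rng.standard_normal(size) + 1j * rng.standard_normal(size)) / np.sqrt(2)

noise = np.sqrt(sigma2) * cn(M)
h_hat = {0: cn(M) + noise,            # index occupied by a single device
         1: cn(M) + cn(M) + noise}    # collided index: superposed channels, cf. (13)

def energy(h):                    # sample version of the ED statistic in (14)
    return float(np.vdot(h, h).real) / M

collided = {i for i, h in h_hat.items() if energy(h) > eta}
```

The per-antenna energy concentrates around |𝒦ik|+σ2|\mathcal{K}_{i_{k}}|+\sigma^{2} for large MM, so a threshold between 1+σ21+\sigma^{2} and 2+σ22+\sigma^{2} separates singletons from collisions.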

As illustrated in Fig. 2b, in order to obtain a new non-conflicting index representation, the collided devices slide a window of length BpB_{p} bits forward within the total BB bits to obtain new sequences, denoted by 𝒗kp{\bm{v}_{k}^{p}}^{\prime}. We denote by B0B_{0} the sliding length, which satisfies 0<B0<Bp0<B_{0}<B_{p}. The reason behind B0<BpB_{0}<B_{p} is that there should be a common part, labeled in green in Fig. 2b, between the sequences before and after sliding the window, so that the sequences from different windows can be spliced back together.
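A minimal sketch of the window sliding, with assumed toy bit values and sizes, shows why the overlap of BpB0B_{p}-B_{0} bits enables splicing:

```python
B, B_p, B_0 = 12, 5, 3                       # toy sizes with 0 < B_0 < B_p
v = [1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0]     # device k's B message bits (example)

window_old = v[:B_p]                         # original preamble v_k^p
window_new = v[B_0:B_0 + B_p]                # preamble after sliding forward by B_0 bits

# The last B_p - B_0 bits of the old window reappear as the first bits of the
# new one; this common part (green in Fig. 2b) allows splicing the windows.
overlap = window_old[B_0:]
```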

Algorithm 1 Collision Resolution Protocol
1:  Input: estimated channel 𝑯^\hat{\bm{H}}, maximum iteration tmaxt_{max}, upper and lower energy thresholds η\eta, γ\gamma
2:  Output: recovered channel 𝑯~\tilde{\bm{H}}
3:  Initialize: iteration count t=0t=0, 𝑯~=[]\tilde{\bm{H}}=\left[~{}\right]
4:  repeat
5:     Energy detection:
ϵik=𝔼[𝑯^ik,:𝑯^ik,:H],ik[1:2Bp]\epsilon_{i_{k}}=\mathbb{E}\left[\hat{\bm{H}}_{i_{k},:}\hat{\bm{H}}_{i_{k},:}^{H}\right],i_{k}\in\left[1:2^{B_{p}}\right]
6:     Feedback: the BS broadcasts {ik,ϵik>η}\left\{i_{k},\epsilon_{i_{k}}>\eta\right\}
7:     Combine: 𝑯~[𝑯~;𝑯^ik,:],{ik|γ<ϵik<η}\tilde{\bm{H}}\leftarrow\left[\tilde{\bm{H}};\hat{\bm{H}}_{i_{k},:}\right],\left\{i_{k}|\gamma<\epsilon_{i_{k}}<\eta\right\}
8:     Window sliding and retransmission
9:     Channel estimation: 𝑯^\hat{\bm{H}}
10:     tt+1t\leftarrow t+1
11:  until t=tmaxt=t_{max} or {ϵik<η,ik[1:2Bp]}\{\epsilon_{i_{k}}<\eta,~{}\forall i_{k}\in\left[1:2^{B_{p}}\right]\}
12:  return  𝑯~\tilde{\bm{H}}

After obtaining 𝒗kp{\bm{v}_{k}^{p}}^{\prime}, the CS-based encoder is performed again to encode 𝒗kp{\bm{v}_{k}^{p}}^{\prime} to 𝒂ik\bm{a}_{{i_{k}}^{\prime}} with ik=[𝒗kp]2[1:2Bp]{i_{k}}^{\prime}=\left[{\bm{v}_{k}^{p}}^{\prime}\right]_{2}\in\left[1:2^{B_{p}}\right]. The encoding process is the same as that in Section III-A. The channels of the collided devices are expected to be recovered separately after the retransmission. ED is then performed again on the recovered channels. If a collision still exists, the window sliding process is executed again until the maximum number of retransmissions is reached or no collision remains. The above collision resolution protocol is summarized in Algorithm 1. We give the analysis of this collision resolution protocol in Appendix -A, which illustrates that as the window sliding progresses, the number of collided devices decreases and tends to zero.

V Decoder

The decoding process can be distilled into two key operations: the recovery of the preamble as well as the key parameters for the LDPC code, and the LDPC decoding process combined with SIC [43]. Both are carried out with the MP algorithm. We emphasize again that the beliefs of these two parts can be leveraged to jointly update the messages of the decoding process, which is not considered in [25, 26].

V-A Joint DAD-CE Algorithm

The recovery of the preamble as well as the channels in the CS phase is equivalent to the joint DAD-CE problem, which can be modeled as a CS problem. According to the formulation in (8), the recovery of the sparse matrix 𝑿\bm{X} can be addressed by the CS-based methods, such as the AMP algorithm and its variants [8, 4, 9, 10, 11]. Besides, the MP-based approaches [13, 14] also work well on the above issues. The Bernoulli and Rician messages are jointly updated in [13], which considers the fading channel and grant-free scenario. This scenario is also considered in [14], which takes advantage of estimated data symbols as soft pilot sequences to perform joint channel and data estimation. In this subsection, we consider the MIMO system and derive the update rules of messages based on the BP algorithm.

We derive the update rules of the activity indicator ϕk\phi_{k} and channel vector 𝒉k\bm{h}_{k} in (6), which are modeled as Bernoulli and Gaussian messages, respectively. The Gaussian messages can be characterized by the mean 𝒖k\bm{u}_{k} and the auto-covariance matrix 𝚺k\bm{\Sigma}_{k}. That is, 𝒖k\bm{u}_{k} and 𝚺k\bm{\Sigma}_{k} are the estimate and the estimation error covariance of 𝒉k\bm{h}_{k} in (6), respectively. Besides, the Bernoulli message for the activity indicator can be represented by pkp_{k}, the probability of ϕk\phi_{k} taking the value one. These messages are updated iteratively between the observation and variable nodes, as characterized by the factor graph in Fig. 3, where the received signal yl,my_{l,m} represents an observation node, denoted by SN, and the channel hk,mh_{k,m} and the activity pattern ϕk\phi_{k} are the variable nodes, denoted by VN. The edges in the factor graph represent the connections among nodes, along which messages are passed in the BP algorithm. We elaborate on the update rules of these messages below.

Refer to caption

Figure 3: Factor graph of the joint DAD-CE algorithm.

V-A1 Message Update at Observation Nodes

We denote pilVN(t)p_{i\rightarrow l}^{VN}(t) as the Bernoulli message for the activity of device ii, which is passed from VN ii to SN ll in the tt-th iteration. Accordingly, μimlmVN(t)\mu_{im\rightarrow lm}^{VN}(t) and 𝚺ilVN(t)\bm{\Sigma}_{i\rightarrow l}^{VN}(t) denote the Gaussian messages passed from VN imim to SN lmlm, which represent the estimation and the estimating deviation of the channel 𝒉i\bm{h}_{i}, respectively. The index m[1:M]m\in\left[1:M\right] denotes the mm-th antenna and also the mm-th value of 𝒉i\bm{h}_{i}. Since the update rules of the messages are the same with respect to different iterations, the index of iteration is omitted in the following derivation. For clarity, we assume there is no collision in the CS phase for the following derivation. As such, iki_{k}, the original subscript of 𝒉\bm{h} is replaced by kk, since there is a one-to-one mapping between these two terms. We emphasize that the collision is considered in our implementation and addressed by the proposed resolution protocol in Algorithm 1.

To give the message update rules at the SN yl,my_{l,m} in Fig. 4, we first rewrite (6) as

ylm\displaystyle y_{lm} =i=1KAliϕihim+nlm\displaystyle=\sum\nolimits_{i=1}^{K}{A_{li}\phi_{i}h_{im}}+n_{lm} (15)
=Alkϕkhkm+i𝒦\kAliϕihim+nlmzlkm,\displaystyle=A_{lk}\phi_{k}h_{km}+\underbrace{\sum\nolimits_{i\in\mathcal{K}\backslash k}{A_{li}\phi_{i}h_{im}}+n_{lm}}_{z_{lkm}},

Refer to caption

Figure 4: Message update rules at VNs and SNs. The output message on each edge is obtained by collecting the messages from the other edges connected with the same node.

where l[1:Lp]l\in\left[1:L_{p}\right] and 𝒦\k\mathcal{K}\backslash k denotes the entries in set {1,2,,K}\left\{1,2,\cdots,K\right\} except kk. The term i𝒦\kAliϕihim+nlm\sum\nolimits_{i\in\mathcal{K}\backslash k}{A_{li}\phi_{i}h_{im}}+n_{lm} is modeled as an equivalent Gaussian noise with 𝒛lk𝒞𝒩(𝝁zlk,𝚺zlk)\bm{z}_{lk}\sim\mathcal{CN}(\bm{\mu}_{z_{lk}},\bm{\Sigma}_{z_{lk}}), where 𝝁zlk=[μzlk1,,μzlkm,,μzlkM]\bm{\mu}_{z_{lk}}=[\mu_{z_{lk1}},\cdots,\mu_{z_{lkm}},\cdots,\mu_{z_{lkM}}] and μzlkm\mu_{z_{lkm}} is given by

μzlkm=i𝒦\kAlipilVNμimlmVN.\mu_{z_{lkm}}=\sum\nolimits_{i\in\mathcal{K}\backslash k}{A_{li}\cdot p_{i\rightarrow l}^{VN}\cdot\mu_{im\rightarrow lm}^{VN}}. (16)

Regarding 𝚺zlk\bm{\Sigma}_{z_{lk}}, the auto-covariance of 𝒛lk\bm{z}_{lk}: since ϕi\phi_{i} is shared by all MM antennas, the entries of 𝒛lk\bm{z}_{lk} are correlated across antennas. Hence, we should not treat the variance of zlkmz_{lkm} at each antenna mm separately; instead, the full covariance of 𝒛lk\bm{z}_{lk} is considered in this paper. Rewriting (15) in vector form:

𝐲lT\displaystyle\mathbf{y}_{l}^{T} =[yl1yl2ylM]=Alkϕk[hk1hk2hkM]\displaystyle=\left[{\begin{array}[]{*{20}{c}}y_{l1}\\ y_{l2}\\ \vdots\\ y_{lM}\end{array}}\right]=A_{lk}\phi_{k}\cdot\left[{\begin{array}[]{*{20}{c}}h_{k1}\\ h_{k2}\\ \vdots\\ h_{kM}\end{array}}\right] (17)
+i𝒦\kAliϕi[hi1hi2hiM]+[nl1nl2nlM]𝐳lk.\displaystyle+\underbrace{\sum\nolimits_{i\in\mathcal{K}\backslash k}{A_{li}\phi_{i}\cdot\left[{\begin{array}[]{*{20}{c}}h_{i1}\\ h_{i2}\\ \vdots\\ h_{iM}\end{array}}\right]}+\left[{\begin{array}[]{*{20}{c}}n_{l1}\\ n_{l2}\\ \vdots\\ n_{lM}\end{array}}\right]}_{\mathbf{z}_{lk}}.

The (m,n)(m,n)-th (mn)(m\neq n) entry for 𝚺𝒛𝒍𝒌\bm{\Sigma_{z_{lk}}} satisfies

(Σzlk)(m,n)\displaystyle(\Sigma_{z_{lk}})_{(m,n)} =i𝒦\k|Ali|2pilVN{(ΣilVN)(m,n)\displaystyle=\sum\nolimits_{i\in\mathcal{K}\backslash k}{{\left|{{A_{li}}}\right|}^{2}\cdot p_{i\to l}^{VN}\cdot\Big{\{}(\Sigma_{i\to l}^{VN})_{(m,n)}} (18)
+qilVNμimlmVN(μinlnVN)},mn,\displaystyle{+~{}q_{i\to l}^{VN}\cdot\mu_{im\to lm}^{VN}\cdot(\mu_{in\to ln}^{VN})^{*}\Big{\}}},~{}m\neq n,

where qilVN=1pilVNq_{i\to l}^{VN}=1-p_{i\to l}^{VN} denotes the probability that the Bernoulli variable ϕi\phi_{i} equals zero. If m=nm=n, we have

(Σzlk)(m,m)\displaystyle(\Sigma_{z_{lk}})_{(m,m)} =i𝒦\k|Ali|2pilVN{(ΣilVN)(m,m)\displaystyle=\sum\nolimits_{i\in\mathcal{K}\backslash k}{{\left|{{A_{li}}}\right|}^{2}\cdot p_{i\to l}^{VN}\cdot\Big{\{}(\Sigma_{i\to l}^{VN})_{(m,m)}} (19)
+qilVN|μimlmVN|2}+σn2.\displaystyle{\left.\quad\quad\quad+~{}q_{i\to l}^{VN}\cdot\left|\mu_{im\to lm}^{VN}\right|^{2}\right\}}+\sigma_{n}^{2}.

Details about the derivation of 𝚺𝒛𝒍𝒌\bm{\Sigma_{z_{lk}}} are given in Appendix -B1. After obtaining the mean and covariance of 𝒛lk\bm{z}_{lk}, we can get the Gaussian messages μlmkmSN\mu_{lm\rightarrow km}^{SN} and 𝚺lkSN\bm{\Sigma}_{l\rightarrow k}^{SN} passed from SN lmlm to VN kmkm as below

μlmkmSN=𝔼[hkm|ylm,μzlkm,𝚺zlk,ϕk=1]=(ylmμzlkm)/Alk\displaystyle\begin{split}\mu_{lm\to km}^{SN}&=\mathbb{E}\left[h_{km}|y_{lm},\mu_{z_{lkm}},\bm{\Sigma}_{z_{lk}},\phi_{k}=1\right]\\ &=(y_{lm}-\mu_{z_{lkm}})/A_{lk}\end{split} (20)
𝚺lkSN=Var[hkm|ylm,μzlkm,𝚺zlk,ϕk=1]=𝚺zlk/|Alk|2,\displaystyle\begin{split}\bm{\Sigma}_{l\to k}^{SN}&=\text{Var}\left[h_{km}|y_{lm},\mu_{z_{lkm}},\bm{\Sigma}_{z_{lk}},\phi_{k}=1\right]\\ &=\bm{\Sigma}_{z_{lk}}/\left|A_{lk}\right|^{2},\end{split} (21)

where 𝔼[a|b]\mathbb{E}\left[a|b\right] and Var[a|b]\text{Var}\left[a|b\right] denote the expectation and variance of aa conditioned on bb, respectively.
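A small sketch of the SN-to-VN Gaussian update in (20)-(21), using assumed toy values and a noiseless observation so that the interference cancellation is exact:

```python
import numpy as np

rng = np.random.default_rng(2)
M = 4
A_lk = 0.7 + 0.2j                 # codebook entry A_lk (assumed toy value)
h_k = (rng.standard_normal(M) + 1j * rng.standard_normal(M)) / np.sqrt(2)
mu_z = rng.standard_normal(M) + 1j * rng.standard_normal(M)   # interference mean, cf. (16)
Sigma_z = np.diag(rng.uniform(0.2, 0.5, M)).astype(complex)   # interference covariance (toy)

y_l = A_lk * h_k + mu_z           # noiseless toy observation at SN l, cf. (15)

# SN -> VN Gaussian message, eqs. (20)-(21): subtract the interference mean
# and rescale by the known codebook entry.
mu_sn = (y_l - mu_z) / A_lk
Sigma_sn = Sigma_z / abs(A_lk) ** 2
```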

For the Bernoulli message plkSNp_{l\to k}^{SN} passed from SN ll to VN kk, we have

plkSN\displaystyle p_{l\to k}^{SN} =[1+P(𝐲l|ϕk=0,𝝁zlk,𝚺zlk)P(𝐲l|ϕk=1,𝝁zlk,𝚺zlk)]1\displaystyle=\left[1+\frac{P(\mathbf{y}_{l}|\phi_{k}=0,\bm{\mu}_{z_{lk}},\bm{\Sigma}_{z_{lk}})}{P(\mathbf{y}_{l}|\phi_{k}=1,\bm{\mu}_{z_{lk}},\bm{\Sigma}_{z_{lk}})}\right]^{-1} (22)
=[1+P(𝒚l=𝒛lk|𝝁zlk,𝚺zlk)P(𝒚l=Alk𝒉k+𝒛lk|𝝁zlk,𝚺zlk)]1\displaystyle=\left[1+\frac{P(\bm{y}_{l}=\bm{z}_{lk}|\bm{\mu}_{z_{lk}},\bm{\Sigma}_{z_{lk}})}{P(\bm{y}_{l}=A_{lk}\cdot\bm{h}_{k}+\bm{z}_{lk}|\bm{\mu}_{z_{lk}},\bm{\Sigma}_{z_{lk}})}\right]^{-1}
=[1+f(𝒚l|𝝁zlk,𝚺zlk)f(𝒚l|𝝁zlk,𝚺zlk)]1\displaystyle=\left[1+\frac{f(\bm{y}_{l}|\bm{\mu}_{z_{lk}},\bm{\Sigma}_{z_{lk}})}{f(\bm{y}_{l}|\bm{\mu}_{z_{lk}}^{{}^{\prime}},\bm{\Sigma}_{z_{lk}}^{{}^{\prime}})}\right]^{-1}
=f(𝒚l|𝝁zlk,𝚺zlk)f(𝒚l|𝝁zlk,𝚺zlk)+f(𝒚l|𝝁zlk,𝚺zlk),\displaystyle=\frac{f(\bm{y}_{l}|\bm{\mu}_{z_{lk}}^{{}^{\prime}},\bm{\Sigma}_{z_{lk}}^{{}^{\prime}})}{f(\bm{y}_{l}|\bm{\mu}_{z_{lk}},\bm{\Sigma}_{z_{lk}})+f(\bm{y}_{l}|\bm{\mu}_{z_{lk}}^{{}^{\prime}},\bm{\Sigma}_{z_{lk}}^{{}^{\prime}})},

where

𝝁zlk\displaystyle\bm{\mu}_{z_{lk}}^{{}^{\prime}} =Alk𝝁klVN+𝝁zlk\displaystyle=A_{lk}\cdot\bm{\mu}_{k\to l}^{VN}+\bm{\mu}_{z_{lk}} (23)
𝚺zlk\displaystyle\bm{\Sigma}_{z_{lk}}^{{}^{\prime}} =|Alk|2𝚺klVN+𝚺zlk,\displaystyle=\left|A_{lk}\right|^{2}\cdot\bm{\Sigma}_{k\to l}^{VN}+\bm{\Sigma}_{z_{lk}}, (24)

which denote the mean and covariance of 𝒚l\bm{y}_{l} when ϕk=1\phi_{k}=1, respectively, and f(𝒙|𝝁,Σ)f(\bm{x}|\bm{\mu},\Sigma) denotes the probability density function (pdf) of the multi-dimensional complex Gaussian distribution 𝒞𝒩M(𝒙|𝝁,Σ)\mathcal{CN}_{M}(\bm{x}|\bm{\mu},\Sigma), that is

f(𝒙|𝝁,𝚺)=1πMdet(𝚺)exp[(𝒙𝝁)H𝚺1(𝒙𝝁)].f(\bm{x}|\bm{\mu},\bm{\Sigma})=\frac{1}{\pi^{M}\!\cdot\!\det\left(\bm{\Sigma}\right)}\!\cdot\exp\left[-(\bm{x}-\bm{\mu})^{H}\bm{\Sigma}^{-1}(\bm{x}-\bm{\mu})\right]. (25)

Moreover, the Bernoulli message can be simplified by the use of log-likelihood ratio (LLR) to reduce the complexity as well as to avoid the computation overflow. Hence, the LLR of the message in (22) can be represented as

llkSN\displaystyle l_{l\to k}^{SN} lnP(ϕk=1)P(ϕk=0)=lnplkSN1plkSN\displaystyle\triangleq\text{ln}\frac{P(\phi_{k}=1)}{P(\phi_{k}=0)}=\ln\frac{p_{l\to k}^{SN}}{1-p_{l\to k}^{SN}} (26)
=(a)lndet(𝚺zlk)det(𝚺zlk)+(𝒚l𝝁zlk)H𝚺zlk1(𝒚l𝝁zlk)\displaystyle\overset{(a)}{=}\ln\frac{\det\left(\bm{\Sigma}_{z_{lk}}\right)}{\det\left(\bm{\Sigma}_{z_{lk}}^{{}^{\prime}}\right)}\!+\!(\bm{y}_{l}-\bm{\mu}_{z_{lk}})^{H}\cdot\bm{\Sigma}_{z_{lk}}^{-1}\cdot(\bm{y}_{l}-\bm{\mu}_{z_{lk}})
(𝒚l𝝁zlk)H(𝚺zlk)1(𝒚l𝝁zlk),\displaystyle\quad\quad\quad-(\bm{y}_{l}-\bm{\mu}_{z_{lk}}^{{}^{\prime}})^{H}\cdot(\bm{\Sigma}_{z_{lk}}^{{}^{\prime}})^{-1}\cdot(\bm{y}_{l}-\bm{\mu}_{z_{lk}}^{{}^{\prime}}),

where =(a)\overset{(a)}{=} is derived by the substitution of (22) and (25).
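The LLR in (26) can be sketched by evaluating the two complex Gaussian log-pdfs of (25) directly in the log domain, which avoids the overflow of the raw pdf ratio in (22). The hypothesis parameters below are assumed toy values:

```python
import numpy as np

def cgauss_logpdf(x, mu, Sigma):
    """Log of the complex Gaussian pdf f(x | mu, Sigma) in (25)."""
    d = x - mu
    _, logdet = np.linalg.slogdet(Sigma)
    quad = float((d.conj() @ np.linalg.solve(Sigma, d)).real)
    return -len(x) * np.log(np.pi) - logdet - quad

rng = np.random.default_rng(3)
M = 4
y = rng.standard_normal(M) + 1j * rng.standard_normal(M)
mu0, Sigma0 = np.zeros(M, complex), np.eye(M)          # phi_k = 0: interference only
mu1, Sigma1 = mu0 + 0.5, 1.5 * np.eye(M)               # phi_k = 1 (toy hypothesis)

# LLR in (26): difference of log-likelihoods under the two hypotheses.
llr = cgauss_logpdf(y, mu1, Sigma1) - cgauss_logpdf(y, mu0, Sigma0)
p = 1.0 / (1.0 + np.exp(-llr))                         # recovered Bernoulli probability
```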

V-A2 Message Update at Variable Nodes

Likewise, the Gaussian and Bernoulli messages at VNs are updated by collecting the incoming messages from SNs. To ensure convergence, the message incoming from the destination SN itself is not included in the calculation [12]. Typically, for the Gaussian messages, since 𝒉\bm{h} follows the Gaussian distribution, the update rule at the VN is to multiply the pdfs observed at each SN to obtain a new one. We note that the prior Gaussian distribution of 𝒉\bm{h} is also included in the multiplication. The new pdf still follows the Gaussian distribution, whose mean and covariance are the updated messages at the VN. The pdf of 𝒉k\bm{h}_{k} passed from VN kk to SN ll is given by

f(𝒉|𝝁klVN,𝚺klVN)\displaystyle f(\bm{h}|\bm{\mu}_{k\to l}^{VN},\bm{\Sigma}_{k\to l}^{VN}) (27)
i\lf(𝒉|𝝁ikSN,𝚺ikSN)f(𝒉|𝝁kpri,𝚺kpri).\displaystyle\quad\quad\propto\prod\nolimits_{i\in\mathcal{L}\backslash l}{f(\bm{h}|\bm{\mu}_{i\to k}^{SN},\bm{\Sigma}_{i\to k}^{SN})}\cdot f(\bm{h}|\bm{\mu}_{k}^{pri},\bm{\Sigma}_{k}^{pri}).

Accordingly, for the Gaussian pdf f(𝒉|𝝁klVN,𝚺klVN)f(\bm{h}|\bm{\mu}_{k\to l}^{VN},\bm{\Sigma}_{k\to l}^{VN}), the mean and covariance are given by

𝝁klVN=𝔼[𝒉k|𝝁ikSN,𝚺ikSN,i\l]=𝚺klVN[(𝚺kpri)1𝝁kpri+i\l(𝚺ikSN)1𝝁ikSN]\displaystyle\begin{split}\bm{\mu}_{k\to l}^{VN}&=\mathbb{E}\left[\bm{h}_{k}|\bm{\mu}_{i\to k}^{SN},\bm{\Sigma}_{i\to k}^{SN},i\in\mathcal{L}\backslash l\right]\\ &=\bm{\Sigma}_{k\to l}^{VN}\cdot\Big{[}(\bm{\Sigma}_{k}^{pri})^{-1}\cdot\bm{\mu}_{k}^{pri}\\ &\quad\quad\quad\quad+\sum\nolimits_{i\in\mathcal{L}\backslash l}{(\bm{\Sigma}_{i\to k}^{SN})^{-1}\cdot\bm{\mu}_{i\to k}^{SN}}\Big{]}\end{split} (28)
𝚺klVN=Var[𝒉k|𝝁ikSN,𝚺ikSN,i\l]=[i\l(𝚺ikSN)1+(𝚺kpri)1]1,\displaystyle\begin{split}\bm{\Sigma}_{k\to l}^{VN}&=\operatorname{Var}\left[\bm{h}_{k}|\bm{\mu}_{i\to k}^{SN},\bm{\Sigma}_{i\to k}^{SN},i\in\mathcal{L}\backslash l\right]\\ &=\left[\sum\nolimits_{i\in\mathcal{L}\backslash l}{(\bm{\Sigma}_{i\to k}^{SN})^{-1}}+(\bm{\Sigma}_{k}^{pri})^{-1}\right]^{-1},\end{split} (29)

where 𝝁kpri=𝟎M×1\bm{\mu}_{k}^{pri}=\bm{0}_{M\times 1} and 𝚺kpri=𝑰M\bm{\Sigma}_{k}^{pri}=\bm{I}_{M} are the prior mean and covariance of 𝒉k\bm{h}_{k}, \l\mathcal{L}\backslash l denotes the entries in set {1,2,,L}\left\{1,2,\cdots,L\right\} except ll. We give the derivations of (28) and (29) in Appendix -B2.
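The product-of-Gaussians combination in (28)-(29) can be sketched in the precision (inverse-covariance) domain, with assumed diagonal toy covariances for the incoming messages:

```python
import numpy as np

rng = np.random.default_rng(4)
M, L = 4, 6
mu_pri = np.zeros(M, complex)               # prior mean, mu_k^pri = 0
Sigma_pri = np.eye(M)                       # prior covariance, Sigma_k^pri = I

# Incoming SN -> VN messages (toy values); the message from the destination
# SN l itself is excluded, hence only L - 1 of them.
scales = rng.uniform(0.5, 2.0, L - 1)
mus = [rng.standard_normal(M) + 1j * rng.standard_normal(M) for _ in scales]
Sigmas = [s * np.eye(M) for s in scales]

# Eqs. (28)-(29): the product of Gaussian pdfs is again Gaussian; precisions
# add, and means combine weighted by precision.
prec = np.linalg.inv(Sigma_pri) + sum(np.linalg.inv(S) for S in Sigmas)
Sigma_vn = np.linalg.inv(prec)
mu_vn = Sigma_vn @ (np.linalg.inv(Sigma_pri) @ mu_pri
                    + sum(np.linalg.inv(S) @ m for S, m in zip(Sigmas, mus)))
```

Combining messages can only shrink the posterior covariance, which is why each added SN observation sharpens the channel estimate.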

The derivation of the Bernoulli messages is the same as above: they are updated by collecting the messages observed at SNs. The message pklVNp_{k\to l}^{VN} passed from VN kk to SN ll is obtained by multiplying the probabilities of ϕk=1\phi_{k}=1 passed from all the SNs to VN kk and then normalizing. Likewise, for convergence, the message passed from SN ll to VN kk is not included. We emphasize that the prior activation probability pap_{a} of each device is also considered. As such, pklVNp_{k\to l}^{VN} is given by

pklVN\displaystyle p_{k\to l}^{VN} =P(ϕk=1|{pikSN,i\l},pa)\displaystyle=P\left(\phi_{k}=1|\left\{p_{i\to k}^{SN},i\in\mathcal{L}\backslash l\right\},p_{a}\right) (30)
=pai\lpikSNpai\lpikSN+(1pa)i\l(1pikSN).\displaystyle=\frac{p_{a}\cdot\prod_{i\in\mathcal{L}\backslash l}p^{SN}_{i\to k}}{p_{a}\cdot\prod_{i\in\mathcal{L}\backslash l}p^{SN}_{i\to k}+(1-p_{a})\cdot\prod_{i\in\mathcal{L}\backslash l}\left({1-p^{SN}_{i\to k}}\right)}.

Likewise, for complexity reduction, we also employ the LLR to represent this message in iterations, of which the relationship with the activation probability is

lklVN\displaystyle l_{k\to l}^{VN} =lnpklVN1pklVN=l0+i\llikSN\displaystyle=\ln\frac{p^{VN}_{k\to l}}{1-p^{VN}_{k\to l}}=l_{0}+\sum_{i\in\mathcal{L}\backslash l}l^{SN}_{i\to k} (31)
pklVN\displaystyle p^{VN}_{k\to l} =11+exp(lklVN),\displaystyle=\frac{1}{1+\exp\left({{-l^{VN}_{k\to l}}}\right)},

where l0=lnpa1pal_{0}=\ln\frac{p_{a}}{1-p_{a}} is the prior LLR of the probability for the device being active.
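The LLR-domain update in (31), and its equivalence to the probability-domain form in (30), can be checked with a few assumed toy LLR values:

```python
import numpy as np

p_a = 0.1                                  # prior activation probability (assumed)
l0 = np.log(p_a / (1 - p_a))               # prior LLR

# Incoming SN -> VN activity LLRs for the SNs in L \ l (toy values); the
# message from the destination SN l is excluded.
l_sn = [0.8, -0.3, 1.2, 0.5]

l_vn = l0 + sum(l_sn)                      # LLR-domain update, eq. (31)
p_vn = 1.0 / (1.0 + np.exp(-l_vn))         # corresponding probability

# Cross-check against the probability-domain form in (30).
p_sn = [1.0 / (1.0 + np.exp(-l)) for l in l_sn]
num = p_a * np.prod(p_sn)
p_check = num / (num + (1 - p_a) * np.prod([1 - p for p in p_sn]))
```

Working with sums of LLRs instead of products of probabilities is what avoids the numerical underflow mentioned above.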

V-A3 DAD Decision and CE Output

Since the messages above are iteratively updated between SNs and VNs, after reaching the maximum number of iterations, the Bernoulli and Gaussian messages are output at the VNs. For the Gaussian messages, similar to the above update rules, the output is obtained by combining all the incoming messages from SNs, i.e.,

𝝁kdec=𝚺kdec[(𝚺kpri)1𝝁kpri+i(𝚺ikSN)1𝝁ikSN]\displaystyle\begin{split}\bm{\mu}_{k}^{dec}&=\bm{\Sigma}_{k}^{dec}\cdot\left[(\bm{\Sigma}_{k}^{pri})^{-1}\cdot\bm{\mu}_{k}^{pri}\right.\\ &\quad\quad\quad\quad\left.+\sum\nolimits_{i\in\mathcal{L}}{(\bm{\Sigma}_{i\to k}^{SN})^{-1}\cdot\bm{\mu}_{i\to k}^{SN}}\right]\end{split} (32)
𝚺kdec=[i(𝚺ikSN)1+(𝚺kpri)1]1,\displaystyle\begin{split}\bm{\Sigma}_{k}^{dec}&=\left[\sum\nolimits_{i\in\mathcal{L}}{(\bm{\Sigma}_{i\to k}^{SN})^{-1}}+(\bm{\Sigma}_{k}^{pri})^{-1}\right]^{-1},\end{split} (33)

which denote the output estimation and estimating deviation of 𝒉k\bm{h}_{k}, respectively. For the Bernoulli messages, the LLR of the DAD decision is

lkdec=l0+lllkSN+lkce.l^{\text{dec}}_{k}=l_{0}+\sum\nolimits_{l\in\mathcal{L}}l^{SN}_{l\to k}+l^{ce}_{k}. (34)

Device kk’s activity is detected as ϕ^k=1\hat{\phi}_{k}=1 if lkdec>0l^{\text{dec}}_{k}>0 and vice versa. The term lkcel_{k}^{ce} in (34) improves the DAD accuracy by exploiting the CE result [13], and is derived as follows. The estimated channel of device kk can be modeled as 𝒉^k=𝒉k+ϵk\hat{\bm{h}}_{k}=\bm{h}_{k}+\bm{\epsilon}_{k}, where ϵk\bm{\epsilon}_{k} is the complex Gaussian noise distributed as ϵk𝒞𝒩(0,𝚺kdec)\bm{\epsilon}_{k}\sim\mathcal{CN}(0,\bm{\Sigma}_{k}^{dec}). Accordingly, the distribution of 𝒉^k\hat{\bm{h}}_{k} conditioned on the value of ϕk\phi_{k} is

𝒉^k{𝒞𝒩(𝝁kpri,𝚺kpri+𝚺kdec),ϕk=1𝒞𝒩(𝟎,𝚺kdec),ϕk=0k𝒦tot.\hat{\bm{h}}_{k}\sim\left\{\begin{array}[]{l c}\mathcal{CN}(\bm{\mu}_{k}^{pri},\bm{\Sigma}_{k}^{pri}+\bm{\Sigma}_{k}^{dec}),&\phi_{k}=1\\ \mathcal{CN}(\bm{0},\bm{\Sigma}_{k}^{dec}),&\phi_{k}=0\end{array}\quad\forall k\in\mathcal{K}_{tot}.\right. (35)

Therefore, this information can be leveraged to give an extra belief to the DAD decision. Similar to (22), lkcel^{ce}_{k} is given by

lkce=\displaystyle l^{ce}_{k}= lnP(𝒉^k=𝝁kdec|ϕk=1,𝝁kpri,𝚺kdec,𝚺kpri)P(𝒉^k=𝝁kdec|ϕk=0,𝚺kdec)\displaystyle\ln\frac{P\left({\hat{\bm{h}}_{k}=\bm{\mu}_{k}^{dec}|\phi_{k}=1,\bm{\mu}_{k}^{pri},\bm{\Sigma}_{k}^{dec},\bm{\Sigma}_{k}^{pri}}\right)}{P\left({\hat{\bm{h}}_{k}=\bm{\mu}_{k}^{dec}|\phi_{k}=0,\bm{\Sigma}_{k}^{dec}}\right)} (36)
=\displaystyle= lnf(𝝁kdec|𝝁kpri,𝚺kpri+𝚺kdec)f(𝝁kdec|𝟎,𝚺kdec)\displaystyle\ln\frac{f\left({\bm{\mu}_{k}^{dec}|\bm{\mu}_{k}^{pri},\bm{\Sigma}_{k}^{pri}+\bm{\Sigma}_{k}^{dec}}\right)}{f\left({\bm{\mu}_{k}^{dec}|\bm{0},\bm{\Sigma}_{k}^{dec}}\right)}
=\displaystyle= lndet(𝚺kdec)det(𝚺kpri+𝚺kdec)+(𝝁kdec)H(𝚺kdec)1𝝁kdec\displaystyle\ln\frac{\det\left(\bm{\Sigma}_{k}^{dec}\right)}{\det\left(\bm{\Sigma}_{k}^{pri}\!+\!\bm{\Sigma}_{k}^{dec}\right)}+\left(\bm{\mu}_{k}^{dec}\right)^{H}\cdot\left(\bm{\Sigma}_{k}^{dec}\right)^{-1}\cdot\bm{\mu}_{k}^{dec}
(𝝁kdec𝝁kpri)H(𝚺kpri+𝚺kdec)1(𝝁kdec𝝁kpri).\displaystyle-\left(\bm{\mu}_{k}^{dec}\!-\!\bm{\mu}_{k}^{pri}\right)^{H}\ \!\!\!\cdot\!\left(\bm{\Sigma}_{k}^{pri}\!+\!\bm{\Sigma}_{k}^{dec}\right)^{-1}\!\!\!\cdot\!\left(\bm{\mu}_{k}^{dec}\!-\!\bm{\mu}_{k}^{pri}\right)\!.

As aforementioned, we derive the LLR expression by utilizing the joint distribution of the channel among antennas. Finally, we obtain the estimated channel of device kk as

𝒉^k=ϕ^k𝝁kdec.\bm{\hat{h}}_{k}=\hat{\phi}_{k}\cdot\bm{\mu}_{k}^{dec}. (37)

Refer to caption

Figure 5: NMSE of CE by the Joint DAD-CE algorithm. Bp=9B_{p}=9, Ka=30K_{a}=30, M=30M=30, Lp=200L_{p}=200.

The above joint DAD-CE algorithm is summarized in Algorithm 2, where NiterN_{iter} denotes the maximum number of iterations. We note that 𝚺VN\bm{\Sigma}^{VN} and 𝚺SN\bm{\Sigma}^{SN} both tend toward diagonal matrices over the iterations in our numerical verification. Hence, the corresponding matrix inverse operations can be simplified to elementwise divisions, which reduces the complexity with little performance loss. As shown in Fig. 5, Simplified denotes the approximation that treats 𝚺VN\bm{\Sigma}^{VN} and 𝚺SN\bm{\Sigma}^{SN} as diagonal matrices, and Original means there is no approximation in Algorithm 2. We use the normalized mean square error (NMSE) to evaluate the CE performance. Fig. 5 illustrates that this approximation introduces little performance loss while greatly reducing the complexity.
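A small sketch of this simplification, with assumed toy values: once a covariance has become diagonal, the precision-weighted term used in the updates (e.g., in (28) and (32)) reduces to an elementwise division, costing O(M) instead of the O(M³) of a general inverse.

```python
import numpy as np

rng = np.random.default_rng(5)
M = 8
Sigma = np.diag(rng.uniform(0.5, 2.0, M))   # covariance that has become diagonal
mu = rng.standard_normal(M) + 1j * rng.standard_normal(M)

# Exact precision-weighted term versus the simplified elementwise version
# that replaces the M x M matrix inverse with a division over the diagonal.
exact = np.linalg.inv(Sigma) @ mu
simplified = mu / np.diag(Sigma)
```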

Algorithm 2 Joint DAD-CE Algorithm
1:  Input: 𝒀𝒑\bm{Y_{p}}, 𝑨\bm{A}, 𝝁pri\bm{\mu}^{pri}, 𝚺pri\bm{\Sigma}^{pri}, σn2\sigma_{n}^{2}, pap_{a}
2:  Initialize: 𝝁VN=𝝁pri,𝚺VN=𝚺pri\bm{\mu}^{VN}=\bm{\mu}^{pri},\bm{\Sigma}^{VN}=\bm{\Sigma}^{pri}
3:  repeat
4:     SN update: 𝝁SN\bm{\mu}^{SN}, 𝚺SN\bm{\Sigma}^{SN} by (20)-(21)
5:     SN update: lSNl^{SN} by (26)
6:     VN update: 𝝁VN\bm{\mu}^{VN}, 𝚺VN\bm{\Sigma}^{VN} by (28)-(29)
7:     VN update: lVNl^{VN} by (31)
8:  until NiterN_{iter} reached
9:  CE output: 𝝁dec\bm{\mu}^{dec}, 𝚺dec\bm{\Sigma}^{dec} by (32)-(33)
10:  DAD Decision: ldecl^{dec} by (34)
11:  return  {𝒉^k,ϕ^k,k𝒦tot}\left\{\hat{\bm{h}}_{k},\hat{\phi}_{k},\forall k\in\mathcal{K}_{tot}\right\}

Refer to caption

Figure 6: Factor graph for LDPC decoding.

V-B MIMO-LDPC-SIC Decoder

After obtaining the key parameters, such as the interleaving patterns and channels, by the joint DAD-CE algorithm, the LDPC decoding problem can be addressed by the standard BP algorithm [12, 44]. Likewise, the DD process is performed by updating messages iteratively at different nodes. In contrast, since LDPC is a forward error correction code, the check nodes (CNs) are included in the factor graph in addition to the observation and variable nodes to provide extra belief, as shown in Fig. 6. We rewrite the received signal in the LDPC phase as

𝐘c=𝐘Lp+1:L,:=k𝒦aπik(𝐬~k)𝐡ikT+𝐙Lp+1:L,:,{\mathbf{Y}_{c}}={\mathbf{Y}_{{L_{p}}+1:{L},:}}=\sum\limits_{k\in{\mathcal{K}_{a}}}{{\pi_{{i_{k}}}}\left({{\tilde{\mathbf{s}}_{k}}}\right){\mathbf{h}_{{i_{k}}}^{T}}+{\mathbf{Z}_{{}_{{L_{p}}+1:{L},:}}}}, (38)

where 𝐘cCLc×M\mathbf{Y}_{c}\in{C^{{L_{c}}\times M}} is the last LcL_{c} rows of 𝐘\mathbf{Y}. The LDPC decoder is tasked to recover the last BcB_{c} bits of information based on the received signal 𝐘𝐜\mathbf{Y_{c}}, estimated interleaving patterns and channels using the low-complexity iterative BP algorithm. Owing to the two-phase encoding scheme, these key parameters can be recovered in the decoding of the CS phase. We emphasize that once the active indicators ϕk\phi_{k} are recovered, the positions of {ϕk=1,k𝒦tot}\left\{\phi_{k}=1,k\in\mathcal{K}_{tot}\right\} in the selection matrix 𝚽\bm{\Phi}, i.e., {ik,k𝒦a}\left\{i_{k},\forall k\in\mathcal{K}_{a}\right\}, are determined. Thereby, the interleaving patterns of active devices are also recovered. As shown in (38), the zero-padded sequences 𝐬~k\tilde{\mathbf{s}}_{k} are subject to different permutations, and each is determined by the interleaving pattern iki_{k}. Therefore, the effect of these permutations needs to be considered when the messages are being sent to and from the VNs, i.e., interleaving and de-interleaving in Fig. 6, respectively.

The connections among nodes in Fig. 6 appear to be more involved than those of Fig. 3. In the upper part of Fig. 6, the CNs (blue color) and VNs (green color), as well as the edges connecting them, constitute the Tanner graph of the LDPC code. The subscript N=L~cBcN=\tilde{L}_{c}-B_{c} denotes the number of CNs in the LDPC code, which corresponds to the number of rows of the LDPC check matrix. KK denotes the number of active devices estimated in the CS phase. Other subscripts are consistent with the aforementioned. The edges between CNs and VNs are described by the LDPC check matrix, which cannot be marked explicitly in the graph. For example, in the check matrix of device kk, if the entry ci,j=1c_{i,j}=1, there will be an edge between CN ck,ic_{k,i} and VN sk,js_{k,j}.

Refer to caption

Figure 7: The set of VNs connected with an SN is decided by the interleaving patterns. For instance, in the diagram, the set of VNs related with y1,3y_{1,3} is {s1,5,s3,4,s5,2}\left\{s_{1,5},s_{3,4},s_{5,2}\right\}. Correspondingly, the set of SNs connected to these VNs is {ym,3,m=1,2,,M}\left\{y_{m,3},m=1,2,\cdots,M\right\}.

The lower part of Fig. 6 refers to the graph for MIMO detection, of which the edges between VNs and SNs (yellow color), though they look complicated, are simply determined by (38). For example, the VN sk,l1(l1[1:L~c])s_{k,l_{1}}\left(l_{1}\in\left[1:\tilde{L}_{c}\right]\right) is connected to the l2l_{2}-th SN of every antenna (i.e., ym,l2,m=1,2,,M,l2[1:Lc]y_{m,l_{2}},m=1,2,...,M,l_{2}\in\left[1:L_{c}\right]). We note that l1l_{1} is not necessarily equal to l2l_{2} in the presence of zero-padding and interleaving. Correspondingly, the VNs connected to SN ym,l2y_{m,l_{2}} depend on whose data is interleaved to the l2l_{2}-th channel use. For instance, as illustrated in Fig. 7, the set of VNs connected to SN y1,3y_{1,3} is {s1,5,s3,4,s5,2}\left\{s_{1,5},s_{3,4},s_{5,2}\right\}. That is, after zero-padding and interleaving, the fifth, fourth, and second bits of devices 11, 33, and 55 are mapped to the third channel use, respectively. Before conducting the MP algorithm, we define the types of messages as follows.

  • Rk,nl1{R_{k,n\to l_{1}}}: Messages passed from CN ck,nc_{k,n} to VN sk,l1s_{k,l_{1}}.

  • Qk,l1n{Q_{k,l_{1}\to n}}: Messages passed from VN sk,l1s_{k,l_{1}} to CN ck,nc_{k,n}.

  • Pk,l1m,l2{P_{k,l_{1}\to m,l_{2}}}: Messages passed from VN sk,l1s_{k,l_{1}} to SN ym,l2y_{m,l_{2}}.

  • Λm,l2k,l1{\Lambda_{m,l_{2}\to k,l_{1}}}: Messages passed from SN ym,l2y_{m,l_{2}} to VN sk,l1s_{k,l_{1}}.

The messages RR and QQ refer to the parity check constraints in the LDPC code, while PP and Λ\Lambda are related to the received signals in the MIMO system.

Refer to caption

Figure 8: Update rules for messages RR and QQ at CNs and VNs on the Tanner graph for device kk. The dashed and solid lines represent input and output messages, respectively.

We first give the MP rules for the LDPC decoding with BPSK modulation, which is known as the sum-product algorithm. As illustrated in Fig. 8, the messages RR and QQ are iteratively updated between CNs and VNs. Likewise, for the reduction of complexity, we give the updating rules in the LLR form.

Qk,l1n\displaystyle{Q_{k,l_{1}\to n}} =jMπij1(Λj,l2k,l1)+j𝒩c(k,l1)\nRk,jl1\displaystyle=\sum\limits_{j\in M}{\pi_{i_{j}}^{-1}\left(\Lambda_{j,l_{2}\to k,l_{1}}\right)}\!+\!\!\sum\limits_{j\in{\mathcal{N}_{c}}\left({k,l_{1}}\right)\backslash n}{\!\!\!{R_{k,j\to l_{1}}}} (39)
Rk,nl1\displaystyle{R_{k,n\to l_{1}}} =2tanh1(j𝒩v(k,n)\l1tanh(Qk,jn2)),\displaystyle=2{\tanh^{-1}}\left({\prod\limits_{j\in{\mathcal{N}_{v}}\left({k,n}\right)\backslash l_{1}}{\!\!\!\tanh\left({\frac{{{Q_{k,j\to n}}}}{2}}\right)}}\right), (40)

where 𝒩c(k,l1)\n{\mathcal{N}_{c}}\left({k,l_{1}}\right)\backslash n denotes the set of CNs connected to sk,l1s_{k,l_{1}} except ck,nc_{k,n}, i.e., {ck,n,ck,n1,,ck,ni}\left\{c_{k,n},c_{k,n_{1}},\cdots,c_{k,n_{i}}\right\} in Fig. 8. Likewise, 𝒩v(k,n)\l1{\mathcal{N}_{v}}\left({k,n}\right)\backslash l_{1} denotes the set of VNs connected to ck,nc_{k,n} except sk,l1s_{k,l_{1}}, i.e., {sk,l11,,sk,l1i}\left\{s_{k,l_{1}^{1}},\cdots,s_{k,l_{1}^{i}}\right\} in Fig. 8.
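The check-node update in (40), the "tanh rule," and the check-constraint part of (39) can be sketched with a few assumed toy LLRs; the MIMO term of (39) (the interleaved sum of Λ\Lambda messages) is set aside in this stand-alone sketch:

```python
import numpy as np

# Toy incoming VN -> CN LLRs Q_{k, j -> n} from the neighbors of check node n,
# excluding the destination VN, per (40).
Q_in = np.array([1.5, -0.8, 2.0])

# Sum-product check-node update (40), the "tanh rule".
R_out = 2.0 * np.arctanh(np.prod(np.tanh(Q_in / 2.0)))

# VN -> CN update (39), with the interleaved MIMO term set to zero here:
# sum the other incoming check messages.
R_others = np.array([0.4, -1.1])
Q_out = float(R_others.sum())
```

The tanh rule preserves the parity of the incoming signs and its magnitude is bounded by the least reliable input, which is the basis of the common min-sum approximation.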

Refer to caption

Figure 9: Update rules for messages Λ\Lambda and PP at VNs and SNs on the factor graph. The dashed and solid lines represent input and output messages, respectively.

The message Λm,l2k,l1{\Lambda_{m,l_{2}\to k,l_{1}}} in (39) is the LLR with the probability of VN sk,l1s_{k,l_{1}} taking different values observed at SN ym,l2y_{m,l_{2}}. For BPSK modulated system, it is given by

$\Lambda_{m,l_2\to k,l_1} = \log\frac{P\left(y_{m,l_2}|\mathbf{H}, s_{k,l_1}=+1\right)}{P\left(y_{m,l_2}|\mathbf{H}, s_{k,l_1}=-1\right)} = \frac{2}{\sigma_{z_{k,l_2}}^2}\,\text{Re}\left(h_{m,k}^{*}\left(y_{m,l_2}-\mu_{z_{k,l_2}}\right)\right),$ (41)

where $\mathbf{H}$ is the channel matrix estimated in the CS phase. We note that, similar to the joint DAD-CE algorithm, (41) is obtained by treating the interference from other devices as noise. We rewrite (38) as

$y_{m,l_2} = \sum_{j\in\mathcal{K}(m,l_2),\, l\in\mathcal{L}(m,l_2)} h_{m,j}\,\pi_{i_j}\left(\tilde{s}_{j,l}\right) + n_{m,l_2} = h_{m,k}\,\pi_{i_k}\left(\tilde{s}_{k,l_1}\right) + \underbrace{\sum_{j\in\mathcal{K}(m,l_2)\backslash k,\, l\in\mathcal{L}(m,l_2)\backslash l_1} h_{m,j}\,\pi_{i_j}\left(\tilde{s}_{j,l}\right) + n_{m,l_2}}_{z_{k,l_2}},$ (42)

where $n_{m,l_2}$ is the Gaussian noise with zero mean and variance $\sigma_n^2$, and $\mathcal{K}(m,l_2)$, $\mathcal{L}(m,l_2)$ denote the set of devices and the set of corresponding bits associated with SN $y_{m,l_2}$, respectively. Accordingly, the set of related VNs is $\{s_{j,l}\,|\,j\in\mathcal{K}(m,l_2), l\in\mathcal{L}(m,l_2)\}$, where the subscripts $j$ and $l$ are in one-to-one correspondence. For instance, Fig. 7 shows the VNs connected to SN $y_{1,3}$: we have $\mathcal{K}(1,3)=\{1,3,5\}$ and $\mathcal{L}(1,3)=\{5,4,2\}$, and the corresponding VNs are $\{s_{1,5}, s_{3,4}, s_{5,2}\}$. The Gaussian noise $n_{m,l_2}$ and the interference from other devices are jointly treated as noise, denoted by $z_{k,l_2}$, a Gaussian variable with mean $\mu_{z_{k,l_2}}$ and variance $\sigma_{z_{k,l_2}}^2$ given by
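The way the per-device interleavers determine these factor-graph connections can be illustrated with toy sizes (the indexing convention below, where `perms[k][l2]` gives the codeword bit position transmitted in chip slot $l_2$, is our assumption for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
L_c, K = 6, 3    # toy codeword length and number of active devices

# One interleaving pattern per device: device k transmits, in chip
# slot l2, the codeword bit at position perms[k][l2].
perms = [rng.permutation(L_c) for _ in range(K)]

def vns_for_sn(l2):
    """VNs {s_{j,l}} attached to SN y_{m,l2}: one bit per device,
    picked out by that device's own interleaver, so j and l are in
    one-to-one correspondence as stated in the text."""
    return [(k, int(perms[k][l2])) for k in range(K)]
```

Since every device uses a different permutation, the bits combined in a given slot come from different codeword positions, which is the IDMA mechanism the scheme relies on to separate devices.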

$\mu_{z_{k,l_2}} = \sum_{j\in\mathcal{K}(m,l_2)\backslash k,\, l\in\mathcal{L}(m,l_2)\backslash l_1} h_{m,j}\cdot\mathbb{E}\left[\pi_{i_j}\left(\tilde{s}_{j,l}\right)\right]$ (43)
$\sigma_{z_{k,l_2}}^2 = \sum_{j\in\mathcal{K}(m,l_2)\backslash k,\, l\in\mathcal{L}(m,l_2)\backslash l_1} \left|h_{m,j}\right|^2\cdot\text{Var}\left[\pi_{i_j}\left(\tilde{s}_{j,l}\right)\right] + \sigma_n^2.$ (44)

For BPSK modulation, the mean and variance of $\pi_{i_j}\left(\tilde{s}_{j,l}\right)$ are given by

$\mathbb{E}\left[\pi_{i_j}\left(\tilde{s}_{j,l}\right)\right] = 2\cdot\pi_{i_j}\left(P_{j,l\to m,l_2}\right) - 1$ (45)
$\text{Var}\left[\pi_{i_j}\left(\tilde{s}_{j,l}\right)\right] = 4\cdot\pi_{i_j}\left(\left(1-P_{j,l\to m,l_2}\right)\cdot P_{j,l\to m,l_2}\right),$ (46)

where $P_{j,l\to m,l_2}$ denotes the probability that VN $s_{j,l}=1$ and is initialized to $0.5$. We note that it needs to be interleaved before this calculation. As shown in Fig. 9, $P_{k,l_1\to m,l_2}$ is updated by collecting the incoming messages from the CNs connected to VN $s_{k,l_1}$ and those from all SNs except $y_{m,l_2}$. The update rule is

$P_{k,l_1\to m,l_2} = \frac{\exp\left(\Lambda+R\right)}{1+\exp\left(\Lambda+R\right)},$ (47)

where

$\Lambda = \sum_{j\in M\backslash m}\pi_{i_j}^{-1}\left(\Lambda_{j,l_2\to k,l_1}\right), \quad R = \sum_{j\in\mathcal{N}_c(k,l_1)} R_{k,j\to l_1}.$ (48)

Similar to (39), the message $\Lambda_{j,l_2\to k,l_1}$ needs to be de-interleaved in the update of $P$. The LLR of VN $s_{k,l_1}$ at the end of an iteration is given by

$L_{k,l_1} = \sum_{j\in M}\pi_{i_j}^{-1}\left(\Lambda_{j,l_2\to k,l_1}\right) + \sum_{j\in\mathcal{N}_c(k,l_1)} R_{k,j\to l_1}.$ (49)

The information bit $\hat{v}_{k,l_1}^c$ is decoded as one if $L_{k,l_1}>0$ and zero otherwise. Since LDPC codes are described by the parity-check matrix $\bm{C}$, the iterative decoding process continues until $\text{mod}\left(\bm{C}\hat{\bm{v}}_k^c, 2\right)=0$ or the maximum number of iterations $N_{iter}$ is reached.
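A minimal sketch of this decision and stopping rule (the tiny parity-check matrix below is ours for illustration, not the code used in the paper):

```python
import numpy as np

def hard_decision(llr):
    """Decode each bit as 1 if its LLR L_{k,l1} > 0 and 0 otherwise
    (here the LLR is log P(s = +1)/P(s = -1), with bit 1 mapped to +1)."""
    return (np.asarray(llr) > 0).astype(int)

def parity_ok(C, v_hat):
    """Termination test mod(C v, 2) = 0 for parity-check matrix C."""
    return not np.any((C @ v_hat) % 2)

C = np.array([[1, 1, 0],   # toy 2x3 parity-check matrix (assumption)
              [0, 1, 1]])
v = hard_decision([0.8, 2.1, 1.3])   # -> [1, 1, 1]
```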

To further improve the spectral efficiency, QPSK modulation is also considered in this coding system, with the constellation set $\mathbb{S}=\left\{\pm 1/\sqrt{2}, \pm i/\sqrt{2}\right\}$. Briefly, a QPSK-modulated signal can be split into two orthogonal BPSK signals, so the above MP algorithm can be applied to the two components separately. Accordingly, the real and imaginary parts of the messages $\Lambda$ and $P$ need to be handled separately, as does the parity of the subscript $l_1$ in $Q_{k,l_1\to n}$ and $R_{k,n\to l_1}$. It is worth noting that the range of $l_1$ in $\Lambda$ and $P$ is half of that in $Q$ and $R$, i.e., $\left[1:\tilde{L}_c/2\right]$, since each of these messages carries both a real and an imaginary part. In what follows we give the update rules for these messages.

Similar to (41), $\Lambda$ is the LLR of the probability that $s_{k,l_1}$ takes different values in $\mathbb{S}$. However, it is no longer a real number; instead, it is given by

$\Lambda_{m,l_2\to k,l_1} = \frac{2\sqrt{2}}{\sigma_{z_{k,l_2}}^2}\cdot h_{m,k}^{*}\cdot\left(y_{m,l_2}-\mu_{z_{k,l_2}}\right),$ (50)

where the mean and variance of $z_{k,l_2}$ are given in (43) and (44), respectively. For QPSK modulation, the mean and variance of $\pi_{i_j}\left(\tilde{s}_{j,l}\right)$ are given by

$\mathbb{E}\left[\pi_{i_j}\left(\tilde{s}_{j,l}\right)\right] = \frac{1}{\sqrt{2}}\left\{2\cdot\pi_{i_j}\left(P_{j,l\to m,l_2}^r\right)-1 + \left[2\cdot\pi_{i_j}\left(P_{j,l\to m,l_2}^i\right)-1\right]\cdot i\right\}$ (51)
$\text{Var}\left[\pi_{i_j}\left(\tilde{s}_{j,l}\right)\right] = 2\cdot\pi_{i_j}\left[P_{j,l\to m,l_2}^r-\left(P_{j,l\to m,l_2}^r\right)^2 + P_{j,l\to m,l_2}^i-\left(P_{j,l\to m,l_2}^i\right)^2\right],$ (52)

where $P_{j,l\to m,l_2}^r$ and $P_{j,l\to m,l_2}^i$ are the real and imaginary parts of $P_{j,l\to m,l_2}$, respectively. Similar to (47) and (48), $P_{k,l_1\to m,l_2}^r$ and $P_{k,l_1\to m,l_2}^i$ are given by

$P_{k,\lceil l_1/2\rceil\to m,l_2}^r = \frac{\exp\left[\text{Re}\left(\Lambda\right)+R\right]}{1+\exp\left[\text{Re}\left(\Lambda\right)+R\right]},~l_1~\text{is odd},$ (53)
$P_{k,l_1/2\to m,l_2}^i = \frac{\exp\left[\text{Im}\left(\Lambda\right)+R\right]}{1+\exp\left[\text{Im}\left(\Lambda\right)+R\right]},~l_1~\text{is even}.$ (54)

As such, the range of the subscript $l_1$ in $P$ is half of that in $R$, as aforementioned, and $\Lambda$ and $R$ are defined in (48). The update rule for message $R$ is the same as (40), and that for $Q$ is given by

$Q_{k,l_1\to n} = \begin{cases} \sum_{j\in M}\pi_{i_j}^{-1}\left(\text{Re}\left(\Lambda_{j,l_2\to k,\lceil l_1/2\rceil}\right)\right) + \sum_{j\in\mathcal{N}_c(k,l_1)\backslash n} R_{k,j\to l_1}, & l_1~\text{is odd}\\ \sum_{j\in M}\pi_{i_j}^{-1}\left(\text{Im}\left(\Lambda_{j,l_2\to k,l_1/2}\right)\right) + \sum_{j\in\mathcal{N}_c(k,l_1)\backslash n} R_{k,j\to l_1}, & l_1~\text{is even}\end{cases}$ (55)

The LLR of VN $s_{k,l_1}$ at the end of an iteration is given by

$L_{k,l_1} = Q_{k,l_1\to n} + R_{k,n\to l_1},~\forall n\in\mathcal{N}_c(k,l_1).$ (56)

The decision rule and termination condition are the same as those in the BPSK system mentioned earlier. Finally, we obtain the decoded messages.
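The split of QPSK into two orthogonal BPSK streams can be sketched as follows (the bit-to-axis convention is our assumption for illustration; the paper only fixes that odd $l_1$ rides on $\text{Re}(\Lambda)$ and even $l_1$ on $\text{Im}(\Lambda)$):

```python
import numpy as np

def qpsk_mod(bits):
    """Map bit pairs onto S = {±1/√2 ± i/√2}: in the paper's 1-based
    indexing, odd-numbered bits go on the real axis and even-numbered
    bits on the imaginary axis (bit 0 -> -1, bit 1 -> +1)."""
    b = 2.0 * np.asarray(bits, dtype=float) - 1.0
    return (b[0::2] + 1j * b[1::2]) / np.sqrt(2.0)

def split_llr(lam):
    """Undo the pairing for decoding: Re(Λ) supplies the LLRs of the
    odd code bits and Im(Λ) those of the even ones, cf. (53)-(55)."""
    lam = np.asarray(lam)
    out = np.empty(2 * lam.size)
    out[0::2] = lam.real
    out[1::2] = lam.imag
    return out
```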

Note that the estimated number of active devices $K$ is not guaranteed to equal $K_a$; therefore, not all decoded messages satisfy the parity check. We denote by $\widehat{\mathcal{V}}=\left\{\hat{\bm{v}}_k^c, k\in\widehat{\mathcal{K}}\right\}$ and $\widehat{\mathcal{K}}$ the set of successfully decoded messages and the set of corresponding devices, respectively, with $|\widehat{\mathcal{K}}|\leq K_a$. To further improve performance, we combine the MIMO-LDPC decoder with the SIC method, yielding the MIMO-LDPC-SIC algorithm, which works as follows.

Let $\widehat{\mathbf{H}}\in\mathbb{C}^{K\times M}$ and $\mathcal{K}$ denote the channel matrix and the set of active devices estimated in the CS phase, respectively. Let $\widehat{\mathcal{V}}_0$ and $\widehat{\mathcal{K}}_0$ denote the sets of correctly decoded messages (i.e., those that satisfy the check) and the corresponding devices obtained by the decoder, both initialized to empty sets. With $\bm{Y}_c$, the interleaving patterns $\left\{\pi_{i_k}, k\in\mathcal{K}\right\}$, and $\widehat{\bm{H}}$, the decoder outputs the set of successfully decoded messages $\widehat{\mathcal{V}}$ and the corresponding devices $\widehat{\mathcal{K}}$. Then we set $\widehat{\mathcal{V}}_0\leftarrow\widehat{\mathcal{V}}_0\cup\widehat{\mathcal{V}}$, $\widehat{\mathcal{K}}_0\leftarrow\widehat{\mathcal{K}}_0\cup\widehat{\mathcal{K}}$, and $\mathbf{H}=\widehat{\mathbf{H}}_{k,:}$ for $k\in\mathcal{K}\backslash\widehat{\mathcal{K}}_0$. The residual signal is updated by

$\mathbf{Y} = \mathbf{Y}_c - \sum\nolimits_{k\in\widehat{\mathcal{K}}_0}\pi_{i_k}\left(\tilde{\mathbf{s}}_k\right)\widehat{\mathbf{H}}_{k,:},$ (57)

where $\tilde{\mathbf{s}}_k\in\mathbb{C}^{L_c\times 1}$ is the codeword of device $k$ in $\widehat{\mathcal{V}}_0$ after modulation and zero-padding. The updated $\mathbf{Y}$ and $\mathbf{H}$, as well as the interleaving patterns, are sent to the MIMO-LDPC decoder for the next round of decoding. This iterative process ends when $\widehat{\mathcal{K}}=\emptyset$ or $\mathcal{K}\backslash\widehat{\mathcal{K}}_0=\emptyset$. The overall decoding algorithm is summarized in Algorithm 3.
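A noise-free sketch of the SIC step (57); the array shapes and the permutation convention are our assumptions for illustration:

```python
import numpy as np

def sic_residual(Yc, decoded, H_hat, perms):
    """Eq. (57): subtract the reconstructed contributions of the
    correctly decoded devices from the received block.

    Yc      : (Lc, M) received signal of the LDPC phase
    decoded : {k: modulated, zero-padded codeword of device k, (Lc,)}
    H_hat   : (K, M) channel estimates, row k for device k
    perms   : {k: interleaving pattern, a permutation of range(Lc)}
    """
    Y = Yc.astype(complex).copy()
    for k, s in decoded.items():
        # pi_{i_k}(s_k) * H_hat[k,:] is a rank-one contribution
        Y -= np.outer(s[perms[k]], H_hat[k])
    return Y
```

With perfect channel estimates and no noise, subtracting a decoded device removes its contribution exactly; in practice the residual retains the undecoded devices plus noise and estimation error, which the next decoding round then processes.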

Algorithm 3 MIMO-LDPC-SIC Decoding Algorithm
1:  Input: 𝒀𝒄\bm{Y_{c}}, 𝑯^\hat{\bm{H}}, ={ik,k𝒦}\mathcal{L}=\left\{{{i_{k}},k\in\mathcal{K}}\right\}, σn2\sigma_{n}^{2}
2:  Initialize:𝒀=𝒀𝒄\bm{Y}\!=\!\bm{Y_{c}}, 𝑯=𝑯^\bm{H}\!=\!\hat{\bm{H}}, 𝒱^0{\widehat{\mathcal{V}}_{0}}\!\leftarrow\!\emptyset, 𝒦^0{\widehat{\mathcal{K}}_{0}}\!\leftarrow\!\emptyset, Rk,nl1=0{R_{k,n\to l_{1}}}=0,BPSK: Pk,l1m,l2=0.5{P_{k,l_{1}\to m,l_{2}}}=0.5, QPSK: Pk,l1m,l2=0.5+0.5i{P_{k,l_{1}\to m,l_{2}}}=0.5+0.5i.
3:  repeat
4:     repeat
5:        Λ\Lambda update: Λm,l2k,l1\Lambda_{m,l_{2}\to k,l_{1}} by (41) or (50).
6:        QQ update: Qk,l1nQ_{k,l_{1}\to n} by (39) or (55).
7:        RR update: Rk,nl1R_{k,n\to l_{1}} by (40).
8:        PP update: Pk,l1m,l2P_{k,l_{1}\to m,l_{2}} by (47) or (53)-(54).
9:        LL update and hard decision: Lk,l1L_{k,l_{1}} by (49) or (56).
10:     until NiterN_{iter} reached or mod(𝑪𝒗^c,2)=0\text{mod}(\bm{C}\hat{\bm{v}}^{c},2)=0.
11:     Output: 𝒱^{\widehat{\mathcal{V}}}, 𝒦^{\widehat{\mathcal{K}}}
12:     𝒱^0𝒱^0𝒱^{\widehat{\mathcal{V}}_{0}}\leftarrow{\widehat{\mathcal{V}}_{0}}\cup\widehat{\mathcal{V}}, 𝒦^0𝒦^0𝒦^{\widehat{\mathcal{K}}_{0}}\leftarrow{\widehat{\mathcal{K}}_{0}}\cup\widehat{\mathcal{K}}.
13:     𝐇=𝐇^k,:{\mathbf{H}}={\widehat{\mathbf{H}}_{k,:}} for k𝒦\𝒦^0k\in{\mathcal{K}}\backslash{\widehat{\mathcal{K}}_{0}}.
14:     𝐘=𝐘ck𝒦^0πik(𝐬~k)𝐇^k,:\mathbf{Y}={\mathbf{Y}_{c}}-\sum\nolimits_{k\in{{\hat{\mathcal{K}}}_{0}}}{{\pi_{{i_{k}}}}\left({{\tilde{\mathbf{s}}_{k}}}\right){\widehat{\mathbf{H}}_{k,:}}}.
15:  until 𝒦^={\widehat{\mathcal{K}}}=\emptyset or 𝒦\𝒦^0={\mathcal{K}}\backslash{\widehat{\mathcal{K}}_{0}}=\emptyset.
16:  Return: 𝒱^0{\widehat{\mathcal{V}}_{0}}, 𝒦^0{\widehat{\mathcal{K}}_{0}}

We note that stitching the messages of the CS and LDPC phases is straightforward. In URA, devices are not identified; as such, IDs cannot be employed to distinguish the messages. Recall that the interleaving patterns and channels $\left\{\pi_{i_k}, \hat{\bm{h}}_k, k\in\widehat{\mathcal{K}}_0\right\}$ obtained in the CS phase act as key parameters in the decoding process of the LDPC phase. Each decoded message in $\widehat{\mathcal{V}}_0$ is decoded with a specific interleaving pattern $\pi_{i_k}$ and channel $\hat{\bm{h}}_k$. As aforementioned, $\pi_{i_k}$ is uniquely determined by the message index representation $i_k$, which directly corresponds to device $k$'s preamble. Therefore, $\pi_{i_k}$ and $\hat{\bm{h}}_k$ establish a connection between a device's messages in the two phases. Briefly, if $\hat{\bm{v}}^c$ is successfully decoded with the participation of $\pi_{i_k}$ and $\hat{\bm{h}}_k$, it is exactly the last $B_c$ bits of the message of device $k$. As such, stitching the messages of the two phases poses no problem.

V-C Joint Update

The above CS and LDPC decoders, working in tandem, can recover the $B$ bits of information. Moreover, thanks to the consistency of the above MP algorithm, the BP-based decoder can draw a connection between the decoding processes in the CS and LDPC phases. That is, messages in the CE and MIMO-LDPC decoding processes can be jointly updated by utilizing each other's beliefs, leading to improved performance. This joint update algorithm is denoted the joint DAD-CE-DD algorithm and is elaborated as follows.

For the successfully decoded devices $\widehat{\mathcal{K}}_0$, the corresponding messages $\left\{\pi_{i_k}\left(\tilde{\mathbf{s}}_k\right), k\in\widehat{\mathcal{K}}_0\right\}$, together with their codewords $\left\{\bm{a}_{i_k}, k\in\widehat{\mathcal{K}}_0\right\}$ in the CS phase, can be leveraged as soft pilot sequences to carry out a second CE. This longer pilot sequence leads to better CE performance, as confirmed by our simulation in Fig. 15. We note that the subsequent CE is conducted via the above joint DAD-CE algorithm with the devices' activity fixed. In this regard, the Gaussian messages $\left\{\mu_{lm\to km}^{SN},\bm{\Sigma}_{l\to k}^{SN},\mu_{lm\to km}^{VN},\bm{\Sigma}_{l\to k}^{VN}\right\}$ in (20)-(21) and (28)-(29) are iteratively updated with a longer observation sequence $\left\{\bm{y}_l, l\in\left[1:L_p+L_c\right]\right\}$. Besides, it is worth noting that in the CE output in (32)-(33), the prior mean and covariance $\mu_k^{pri}$, $\bm{\Sigma}_k^{pri}$ are no longer zero and $\bm{I}_M$, respectively; instead, they are the output estimate $\mu_k^{dec}$ and estimation covariance $\bm{\Sigma}_k^{dec}$ of $\bm{h}_k$ from the first CE.
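To make the "soft pilot" idea concrete, here is a single-device least-squares sketch under perfect decoding and no interference: the decoded, interleaved data symbols are appended to the CS codeword to form one long effective pilot. This is only an illustrative stand-in; the paper instead reruns its MP-based joint DAD-CE algorithm with activity fixed and with the priors $(\mu_k^{dec}, \bm{\Sigma}_k^{dec})$ from the first CE.

```python
import numpy as np

def data_aided_ce(Yp, Yc, a_k, s_k):
    """Re-estimate device k's channel from the stacked observations,
    using the pilot a_k (length Lp) and the correctly decoded,
    interleaved data symbols s_k (length Lc) as one long pilot."""
    p = np.concatenate([a_k, s_k])   # effective pilot, length Lp + Lc
    Y = np.vstack([Yp, Yc])          # (Lp + Lc, M) observations
    return (p.conj() @ Y) / np.linalg.norm(p) ** 2   # LS estimate of h_k
```

The longer the effective pilot, the smaller the variance of the estimate, which is the intuition behind the gain observed in Fig. 15.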

Algorithm 4 Joint DAD-CE-DD Algorithm
1:  Input: 𝒀𝒑\bm{Y_{p}}, 𝒀𝒄\bm{Y_{c}}, 𝑨\bm{A}, 𝝁pri\bm{\mu}^{pri}, 𝚺pri\bm{\Sigma}^{pri}, σn2\sigma_{n}^{2}, pap_{a}
2:  \triangleright CS Phase:  Joint DAD-CE Algorithm
3:   \triangleright Collision Resolution Protocol
4:   Output: 𝝁dec\bm{\mu}^{dec}, 𝚺dec\bm{\Sigma}^{dec}, ={ik,k𝒦}\mathcal{L}=\left\{{{i_{k}},k\in\mathcal{K}}\right\}
5:  \triangleright Joint DAD-CE-DD Algorithm:
6:   Initialize: 𝒀~r=𝒀c\widetilde{\bm{Y}}_{r}=\bm{Y}_{c}, 𝒱~=\widetilde{\mathcal{V}}=\emptyset, 𝒦~=\widetilde{\mathcal{K}}=\emptyset
7:   repeat
8:    \triangleright LDPC Phase:
9:     Input: 𝒀~r\widetilde{\bm{Y}}_{r}, 𝝁dec\bm{\mu}^{dec}, \mathcal{L}, σn2\sigma_{n}^{2}
10:     MIMO-LDPC-SIC Decoding Algorithm
11:     Output: 𝒱^0{\widehat{\mathcal{V}}_{0}}, 𝒦^0{\widehat{\mathcal{K}}_{0}}
12:    𝒱~𝒱~𝒱^0{\widetilde{\mathcal{V}}}\leftarrow{\widetilde{\mathcal{V}}}\cup\widehat{\mathcal{V}}_{0}, 𝒦~𝒦~𝒦^0{\widetilde{\mathcal{K}}}\leftarrow{\widetilde{\mathcal{K}}}\cup\widehat{\mathcal{K}}_{0}
13:    𝒀~c=k𝒦^0πik(𝐬~k)𝝁kdec+𝒁\widetilde{\bm{Y}}_{c}=\sum\nolimits_{k\in{{\widehat{\mathcal{K}}}_{0}}}{{\pi_{{i_{k}}}}\left({{\tilde{\mathbf{s}}_{k}}}\right){\bm{\mu}^{dec}_{k}}}+\bm{Z}
14:     \triangleright CS Phase:
15:     Input: 𝒀p\bm{Y}_{p}, 𝑨\bm{A}, 𝒀~c\widetilde{\bm{Y}}_{c}, 𝒱^0{\widehat{\mathcal{V}}_{0}}, 𝝁dec\bm{\mu}^{dec}, 𝚺dec\bm{\Sigma}^{dec}, σn2\sigma_{n}^{2}
16:     CE (Activity fixed)
17:     \triangleright Collision Resolution Protocol
18:     Output: 𝝁~dec\widetilde{\bm{\mu}}^{dec}, 𝚺~dec\widetilde{\bm{\Sigma}}^{dec}
19:    𝒀~r=𝐘ck𝒦~πik(𝐬~k)𝝁~kdec\widetilde{\bm{Y}}_{r}={\mathbf{Y}_{c}}-\sum\nolimits_{k\in{{\widetilde{\mathcal{K}}}}}{{\pi_{{i_{k}}}}\left({{\tilde{\mathbf{s}}_{k}}}\right){\widetilde{\bm{\mu}}^{dec}_{k}}}
20:   until   𝒦^0={\widehat{\mathcal{K}}}_{0}=\emptyset or 𝒦\𝒦~={\mathcal{K}}\backslash{\widetilde{\mathcal{K}}}=\emptyset
21:  Return: 𝒱~\widetilde{\mathcal{V}}, 𝒦~\widetilde{\mathcal{K}}, ={ik,k𝒦~}\mathcal{L}=\left\{{{i_{k}},k\in\widetilde{\mathcal{K}}}\right\}, 𝝁~dec\widetilde{\bm{\mu}}^{dec}

Remark 1: We remark that the messages in the MP algorithm exhibit the property of consistency and unity. As such, messages among different parts can always be jointly processed and updated. For instance, recall the joint DAD-CE algorithm, where the Bernoulli messages are updated by utilizing the Gaussian messages. Likewise, in the MIMO-LDPC-SIC decoding algorithm, the MP algorithm can be applied to MIMO detection or LDPC decoding; in the proposed algorithm, we combine these two parts and update the underlying messages jointly, i.e., the LLR message of a symbol in MIMO detection can be involved in the LDPC decoding process, and vice versa. Moreover, this property is again exploited in the joint DAD-CE-DD algorithm: by leveraging the belief of the estimated channel and the correctness of the LDPC codewords, the CE can be performed again, aided by the correctly decoded codewords in the LDPC phase, thus connecting the two phases. Throughout the paper, we apply this idea of jointly updating the messages in MP-based algorithms.

One might argue that, for the correctly decoded messages, improving the accuracy of the corresponding channels brings no substantial performance improvement. Nevertheless, according to the SIC method in (57), improving the channel accuracy of the devices whose messages have been successfully decoded yields a more accurate residual signal for the incorrectly decoded messages. We denote this residual signal with improved accuracy as $\widetilde{\bm{Y}}_r$. Consequently, $\widetilde{\bm{Y}}_r$ improves the performance for the messages that have not yet been successfully decoded, which is why the channel needs to be estimated twice or more. A brief diagram of this algorithm is given in Fig. 10, where $\widetilde{\bm{Y}}_c$ denotes the noisy signal reconstructed from the correctly decoded messages.

Refer to caption

Figure 10: The diagram of the joint DAD-CE-DD algorithm.

We note that this data-aided CE algorithm appears similar to the work in [14], which employs the data as soft pilots to conduct CE jointly with the real pilot sequences. However, in [14] the convergence and correctness of the estimated data cannot be verified, which may result in error propagation and failure of the joint data and channel estimation, whereas the convergence of the proposed algorithm can be guaranteed. The proposed joint DAD-CE-DD algorithm consists of two modules, namely, the joint DAD-CE algorithm for activity detection and channel estimation, and the MIMO-LDPC-SIC decoding algorithm for data decoding, with messages passed between the two modules. For an arbitrary channel estimate as input, the decoding process is executed by the MIMO-LDPC-SIC decoding algorithm. Once a codeword is successfully decoded (i.e., satisfies the check), it is not changed in subsequent iterations. Therefore, the number of correctly decoded codewords is monotonically non-decreasing over the iterations; correspondingly, the error rate $P_e$ is monotonically non-increasing and bounded below by zero. By the well-known monotone convergence theorem [45], the proposed algorithm is thus guaranteed to converge. Besides, with the correctly decoded codewords aiding the channel estimation, improved CE accuracy is guaranteed, which in turn contributes to a higher probability of successful decoding for the messages that have not yet been decoded. This iterative algorithm ends when no more messages are correctly decoded or all messages are decoded successfully, and is summarized in Algorithm 4.

V-D Complexity

In this section, we compare the complexity of the proposed algorithm with the existing alternative in terms of real-number multiplications (or divisions), additions (or subtractions), and other operations (e.g., exp, log, tanh, $\text{tanh}^{-1}$), as shown in Table I. As a divide-and-conquer strategy, the CB-ML algorithm contains inner and outer decoders, referred to as the ML and tree decoders, respectively. The complexity of our algorithm is mainly incurred by the joint DAD-CE algorithm and the MIMO-LDPC decoder. In Table I, the parameters $\alpha$ and $c$ satisfy $B_p=\alpha\log_2 K_a$ and $L_p=c\log_2 K_a$, respectively. We note that the complexity of the tree decoder is measured by the total number of parity check constraints that need to be verified [19], obtained in the regime where $B_p$ and $K_a$ tend to infinity; in practice, it increases exponentially with the number of slots $S$. Besides, the tree code in [32] verifies the estimated codewords in each segment successively, so the check and the estimate are performed separately. In contrast, during the LDPC decoding process in our algorithm, the check and the estimate of a codeword are performed jointly and simultaneously, both by calculating soft messages: once the codeword is estimated, the corresponding check is completed. As for the inner code, the complexity of the ML decoder is of nearly the same order as that of our algorithm. However, as a consequence of the coordinate descent algorithm, the ML decoding involves $2^{B_p}$ cycles per iteration, which can only be computed sequentially. On the contrary, as a remarkable property of the MP algorithm, all computations in our algorithm can be decomposed into many local ones, which can be performed in parallel over the factor graph. Hence, our algorithm has lower time complexity. In general, the proposed algorithm exhibits a complexity linear in $K$, $M$, and $L$, which is lower than that of the existing CB-ML algorithm.

TABLE I: COMPLEXITY COMPARISON PER ITERATION
Algorithms Real Multi. / Div.      Real Add. / Sub. Others
CB-ML ML decoder 𝒪(KLp2)\mathcal{O}(KL_{p}^{2}) 𝒪(KLp2)\mathcal{O}(KL_{p}^{2}) \\backslash
Tree decoder 𝒪(Kaα/clog2Ka),Bp,Ka\mathcal{O}\left(K_{a}^{\alpha/c}\text{log}_{2}K_{a}\right),B_{p},K_{a}\rightarrow\infty \\backslash
Joint DAD-CE-DD Joint DAD-CE 𝒪(KMLp)\mathcal{O}(KML_{p}) 𝒪(KMLp)\mathcal{O}(KML_{p}) exp / log: 𝒪(KLp)\mathcal{O}(KL_{p})
MIMO-LDPC 𝒪(K^ML~c)\mathcal{O}(\widehat{K}M\widetilde{L}_{c}) 𝒪(K^ML~c)\mathcal{O}(\widehat{K}M\widetilde{L}_{c}) exp: 𝒪(K^ML~c)\mathcal{O}(\widehat{K}M\widetilde{L}_{c}), tanh / tanh1\text{tanh}^{-1}: 𝒪(K^NNv)\mathcal{O}(\widehat{K}NN_{v})
Note: K=2BpK=2^{B_{p}}, K^,N,Nv\widehat{K},N,N_{v} denote the estimated number of active devices, the rows and row weights of the parity check matrix, respectively.

VI Numerical Results

VI-A Parameter Settings

In this section, we assess the overall performance of the proposed framework with the metrics defined in (3)-(4). The CB-ML algorithm proposed by Fengler et al. in [32] serves as the baseline. For a fair comparison with the benchmarks, and to isolate the fundamental aspects of the problem without additional model complications, we consider a flat path loss model in the simulation, i.e., the channel is i.i.d. Rayleigh fading and the LSFC is fixed to $\beta_k=1$ in all schemes. We note again that this can be achieved in practice by well-studied power control schemes [40, 41, 42]. As such, the carrier frequency is not specified in the simulation, since under the flat path loss model it can be arbitrary and does not affect the performance of the proposed algorithm. The devices are assumed to be uniformly distributed in the cell. The noise variance is set to $\sigma_n^2=1$ and is known to all schemes. The maximum number of scheduling rounds for the collision resolution protocol is $t_{max}=3$. For the joint DAD-CE algorithm, the MIMO-LDPC-SIC decoder, and the joint DAD-CE-DD algorithm, the maximum number of iterations is set to $N_{iter}=20$, $30$, and $20$, respectively. We fix the message length to $B=96$ with $B_p=12$ and $B_c=84$. The length of the CS codeword is $L_p=100$ in both schemes. In the channel coding part, we employ the $(3,6)$-regular LDPC code [27] with rate $0.5$. In our scheme, the number of channel uses varies with the number of zeros padded into the sequence; in the CB-ML scheme, however, it takes discrete values and changes with the number of slots, denoted $S$. Since the length of the information is fixed, we can change $S$ by adjusting the parity check allocation, which is given in Table II according to the principle in [19]. We note that the slot length $J=12$ aligns with $B_p$.

TABLE II: The parity check allocation for different number of slots.
       SS        Parity Check Allocation
       12        (12,0), (3,9), (9,3), \cdots, (9,3), (0,12)
       13        (12,0), (4,8), (8,4), \cdots, (0,12)
       14        (12,0), (7,5), (7,5), \cdots, (7,5), (0,12)
       15        (12,0), (7,5), (7,5), \cdots, (7,5), (0,12), (0,12)
       16        (12,0), (6,6), (6,6), \cdots, (6,6), (0,12)
       17        (12,0), (6,6), (6,6), \cdots, (6,6), (0,12), (0,12)

In Table II, $(3,9)$ means that the first three bits of the slot are data bits and the last nine are parity bits, and so on. For more details, we refer the reader to [32] for the CB-ML algorithm and [19] for the tree coding scheme.
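The allocations can be sanity-checked numerically; e.g., for $S=16$ with slot length $J=12$, the data bits across the slots must sum back to $B=96$ (a small consistency check we add for illustration, not part of any decoder):

```python
# Table II, S = 16: first slot all data, last slot all parity,
# fourteen middle slots carrying (6 data, 6 parity) bits each.
J = 12
alloc = [(12, 0)] + [(6, 6)] * 14 + [(0, 12)]

assert len(alloc) == 16
assert all(d + p == J for d, p in alloc)   # every slot fills J bits
B = sum(d for d, _ in alloc)               # total information bits
print(B)   # 96, matching B = Bp + Bc = 12 + 84
```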

VI-B Results

Refer to caption

Figure 11: Performance of the proposed URA schemes as a function of the code rate. $E_b/N_0=10$ dB, $M=30$, $K_a=50$.

Refer to caption

Figure 12: Performance of the proposed URA schemes as a function of Eb/N0E_{b}/N_{0}, the number of antennas MM and active devices KaK_{a}, respectively. Parameter settings: Eb/N0=10E_{b}/N_{0}=10 dB, M=30M=30, Ka=50K_{a}=50, L=1600L=1600.

We first evaluate the error rate performance of the proposed schemes compared with the CB-ML scheme as a function of the code rate, where the code rate $R_c$ and the number of channel uses $L$ are related by $R_c=B/L$. In Fig. 11, we fix the energy-per-bit $E_b/N_0=10$ dB and the numbers of antennas and active devices $M=30$ and $K_a=50$, respectively. In our schemes, No-colli-avoid refers to the scheme with neither collision avoidance nor the joint update; that is, the CS and LDPC phases work sequentially and potential collisions may exist in the CS phase. For the other three schemes, the collision resolution protocol is implemented; No-SIC denotes that the SIC method is not utilized in the LDPC phase, nor do the two phases work jointly, while Joint-BPSK and Joint-QPSK refer to the joint update algorithm with the SIC method under BPSK and QPSK modulation, respectively. As illustrated in Fig. 11, there is a substantial performance enhancement compared to the CB-ML algorithm. The main reason is that the employed LDPC code has a higher code rate than the tree code proposed in [19]; as such, the proposed algorithm works well in a relatively high rate region where the CB-ML algorithm cannot. At a high rate, the performance enhancement is substantial, but the improvement shrinks as the rate decreases. For the target error rate $P_e=0.1$, the required code rate of the proposed Joint-BPSK is $1.45$ times that of CB-ML, and that of Joint-QPSK is even higher. For instance, the proposed Joint-BPSK and Joint-QPSK outperform CB-ML with nearly $0.8$ dB and $1.5$ dB gaps at $R_c=0.06$, respectively. Additionally, we note that Joint-QPSK exhibits an overall $0.7$ dB performance gain compared with Joint-BPSK in terms of the error rate. This is because QPSK needs only half the channel uses of BPSK, so more zeros can be padded into the sequence; consequently, the inter-device interference in the QPSK system is further reduced, resulting in improved performance. Altogether, the gain of the collision resolution protocol increases as the code rate decreases. However, there is not much gain from the joint update here. This is because the gain mainly comes from the reduction of inter-device interference, which has already been reduced to a relatively low level by the zero-padding; as such, the joint update cannot provide further gains. Nevertheless, as we will see shortly, the joint update does provide a certain gain in higher-interference scenarios.

To reduce complexity, we employ BPSK modulation instead of QPSK in the subsequent simulations, since both already outperform CB-ML. In Fig. 12, we compare the error rate performance of the proposed algorithms with respect to $E_b/N_0$, the number of antennas $M$, and the number of active devices $K_a$, respectively. The number of channel uses is fixed to $L=1600$; correspondingly, the data is split into $16$ slots in the CB-ML algorithm, whose parity check allocation is given in Table II. The other parameters are set as $E_b/N_0=10$ dB, $M=30$, and $K_a=50$. As illustrated in Fig. 12, the state-of-the-art CB-ML method suffers from high error floors, which stem from its weak parity check constraints. In contrast, with collision resolution, the proposed Joint and No-SIC schemes exhibit waterfall-like error rate curves with respect to $E_b/N_0$ and $K_a$, while they gradually saturate with respect to $M$. This is because the inter-device interference cannot be made arbitrarily small by increasing $M$ alone; as such, errors still exist even for large $M$.

Refer to caption

Figure 13: Performance of the proposed Joint-DAD-CE-DD algorithm as a function of MM and channel use. Parameter settings: Eb/N0=8E_{b}/N_{0}=8 dB, Ka=70K_{a}=70, Lp=100L_{p}=100.

Moreover, we evaluate the performance of the proposed joint DAD-CE-DD algorithm in the large-scale antenna regime with respect to different numbers of channel uses. As showcased in Fig. 13, the performance of the proposed algorithm is almost linear in the number of antennas, but with different slopes for different numbers of channel uses. We note that CB-ML cannot work at $L=1600$, while the joint DAD-CE-DD algorithm achieves $P_e<10^{-4}$ when $M>140$. Besides, the proposed algorithm needs only half the channel uses, i.e., $L=800$, to outperform CB-ML when $M\geq 90$. To sum up, the proposed algorithms outperform CB-ML in terms of error rate and spectral efficiency, with clear gains across the various system parameters.


Figure 14: Error rate of the proposed URA schemes as a function of Eb/N0E_{b}/N_{0}. M=30M=30, Ka=40K_{a}=40, L=268L=268, Rc=0.36R_{c}=0.36.

To provide more insight into the performance gains of the different methods, we compare their error rate and CE performance in terms of Eb/N0E_{b}/N_{0}. In this regard, we fix the channel use to L=268L=268 with BPSK modulation and no zero-padding, resulting in increased interference among devices. Besides, the code rate Rc=0.36R_{c}=0.36 is relatively high, and CB-ML fails to work under this circumstance. In Fig. 14, No-Joint refers to the scheme with the SIC method but without the joint update. Firstly, it is obvious that the collision resolution-based schemes all outperform the No-colli-avoid scheme as Eb/N0E_{b}/N_{0} increases. Secondly, comparing No-SIC with No-Joint, the performance gain of the SIC method in the LDPC phase is about 0.30.3 dB or even more as Eb/N0E_{b}/N_{0} increases. Finally, comparing No-Joint with Joint, the joint update contributes about another 0.30.3 dB gain, which decreases as Eb/N0E_{b}/N_{0} increases. At a higher Eb/N0E_{b}/N_{0}, such as 2020 dB, the joint update brings nearly no performance gain. This demonstrates that under a higher interference level among devices, the joint update algorithm provides a certain gain, which gradually vanishes as Eb/N0E_{b}/N_{0} increases.



Figure 15: NMSE of CE with the proposed URA schemes as a function of Eb/N0E_{b}/N_{0}. M=30M=30, Ka=40K_{a}=40, L=268L=268, Rc=0.36R_{c}=0.36.

The NMSE is employed to evaluate the CE performance. In Fig. 15, the AMP algorithm investigated in [4] is utilized as a baseline for comparison. When Eb/N0E_{b}/N_{0} ranges from 1313 to 1515 dB, the performance of Joint and No-Joint differs only slightly, and both are slightly worse than AMP. However, the Joint scheme outperforms AMP by a rapidly growing margin once Eb/N0E_{b}/N_{0} exceeds 1515 dB. As aforementioned, this gain comes exactly from the Joint-DAD-CE-DD algorithm, in which the real pilot sequences as well as the correctly decoded messages jointly conduct the task of CE. Figs. 14 and 15 demonstrate that the joint update indeed improves the accuracy of both DD and CE. In general, the proposed algorithms provide substantial performance enhancements over CB-ML in terms of efficiency and accuracy.
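For reference, the NMSE metric used above can be sketched in a few lines of Python. The normalization ‖H−Ĥ‖²_F/‖H‖²_F is the common convention and is assumed here; the channel dimensions and noise level are illustrative only.

```python
import numpy as np

def nmse_db(H: np.ndarray, H_hat: np.ndarray) -> float:
    """NMSE of a channel estimate in dB, assuming the common
    normalization ||H - H_hat||_F^2 / ||H||_F^2."""
    err = np.linalg.norm(H - H_hat) ** 2 / np.linalg.norm(H) ** 2
    return 10.0 * np.log10(err)

# Toy usage with an i.i.d. CN(0, 1) channel of Ka = 40 devices, M = 30 antennas
rng = np.random.default_rng(0)
H = (rng.standard_normal((40, 30)) + 1j * rng.standard_normal((40, 30))) / np.sqrt(2)
# A 10% amplitude error corresponds to 10*log10(0.01) = -20 dB NMSE
print(nmse_db(H, 0.9 * H))
```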

VII Conclusion

In this paper, we have investigated a joint DAD, CE, and DD algorithm for MIMO massive URA. Different from the state-of-the-art slotted transmission scheme, the data in the proposed framework has been split into only two parts: a portion of the data is coded by CS and the rest is LDPC coded. Based on the principle of BP, an iterative MP algorithm has been utilized to decode these two parts of the data. Moreover, by exploiting the beliefs exchanged between the two parts, a joint decoding framework has been proposed to further improve the performance. Additionally, based on ED and the SWP, a collision resolution protocol has been developed to resolve the codeword collision issue in the URA system. In addition to the complexity reduction, the proposed algorithm has exhibited a substantial performance enhancement over the state-of-the-art in terms of efficiency and accuracy. There are various possible avenues for future work. More realistic channel models can be taken into consideration for MIMO URA; by utilizing the sparsity in the virtual angular domain of the spatially correlated channel, the multi-user interference can be further reduced. In addition, the proposed algorithm can be extended to handle more practical scenarios in the presence of asynchronization and frequency offsets. Moreover, although this paper only studies LDPC codes, other codes, such as Turbo and Polar codes, could be applied as long as iterative soft decoding with superior performance exists at the desired block length.

-A Analysis of the Collision Resolution Protocol

Let ϕ=diag{𝚽}\bm{\phi}=\text{diag}\{\bm{\Phi}\} denote the vector composed of diagonal elements of 𝚽\bm{\Phi} and Mp=2BpM_{p}=2^{B_{p}}. 𝚽\bm{\Phi} is the modified selection matrix defined in (8). Then we have ϕ=[ϕ1,ϕ2,,ϕMp]\bm{\phi}=\left[\phi_{1},\phi_{2},\cdots,\phi_{M_{p}}\right], which is drawn independently from the signal space

𝓢={ϕ{0,,Ka}Mp|i=1Mpϕi=Ka}\mathcal{\bm{S}}=\{\bm{\phi}\in\{0,\cdots,K_{a}\}^{M_{p}}|\sum\limits_{i=1}^{M_{p}}{\phi_{i}}=K_{a}\} (58)

according to the multinomial probability mass function:

p(ϕ)=P{ϕ=(ϕ1,ϕ2,,ϕMp)}=Ka!ϕ1!ϕMp!1MpKap(\bm{\phi})=P\{\bm{\phi}=\left(\phi_{1},\phi_{2},\!\cdots,\!\phi_{M_{p}}\right)\}=\frac{K_{a}!}{\phi_{1}!\cdots\phi_{M_{p}}!}\!\cdot\!\frac{1}{{M_{p}}^{K_{a}}} (59)

for ϕ𝓢\bm{\phi}\in\mathcal{\bm{S}}. Then the probability of no collision among devices equals the probability that KaK_{a} entries in ϕ\bm{\phi} are one and the others are zero, which is given by

Pnocolli\displaystyle P_{no-colli} =Ka!MpKa(MpKa)\displaystyle=\frac{K_{a}!}{{M_{p}}^{K_{a}}}\cdot\binom{M_{p}}{K_{a}} (60)
=Mp(Mp1)(MpKa+1)MpKa.\displaystyle=\frac{M_{p}(M_{p}-1)\cdots(M_{p}-K_{a}+1)}{{M_{p}}^{K_{a}}}.
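As a quick sanity check, (60) can be evaluated directly; it is the classical birthday-problem expression. The values of KaK_{a} and BpB_{p} below are illustrative only, and the expected number of collided devices Ka(1Pnocolli)K_{a}(1-P_{no-colli}) is computed alongside.

```python
from math import prod

def p_no_collision(k_a: int, b_p: int) -> float:
    """Probability that k_a devices choose k_a distinct codewords out of
    M_p = 2**b_p, i.e., the birthday-problem expression in (60)."""
    m_p = 2 ** b_p
    return prod((m_p - i) / m_p for i in range(k_a))

# Illustrative values, not the paper's settings:
K_a, B_p = 50, 12
P = p_no_collision(K_a, B_p)
expected_collided = K_a * (1 - P)  # expected number of devices in collision
```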

-A1 The first window sliding

The expected number of devices in collision is

k0=Ka(1Pnocolli).k_{0}=K_{a}\cdot(1-P_{no-colli}). (61)

Let n0n_{0} and k0ik_{0}^{i} denote the numbers of collided messages in the first BpB_{p} bits and of collided devices associated with each such message, respectively, which satisfy i=1n0k0i=k0\sum\nolimits_{i=1}^{n_{0}}{k_{0}^{i}}=k_{0}.

Figure 16: The first window sliding. Collided devices slide the window forward with window length and sliding length BpB_{p} and B0B_{0}, respectively.

For the ii-th collided message, the probability of no collision in the sliding part is

Pnocolli1=Mp1(Mp11)(Mp1k0i+1)Mp1k0i,i[1:n0],P_{no-colli-1}=\frac{M_{p1}(M_{p1}\!-\!1)\!\cdots\!(M_{p1}\!-\!k_{0}^{i}+1)}{{M_{p1}}^{k_{0}^{i}}},i\in\left[1:n_{0}\right], (62)

where Mp1=2B0M_{p1}=2^{B_{0}}. Besides, the common part enables devices to splice the messages across different windows back together. If the common parts of two different collided messages are identical, the message of one device will be spliced to the other's, and an error occurs. Therefore, this probability should also be taken into consideration. The probability that the common parts of different messages are all distinct is

Pnocolli1=Mp1(Mp11)(Mp1n0+1)Mp1n0,P_{no-colli-1^{\prime}}=\frac{M_{p1^{\prime}}(M_{p1^{\prime}}-1)\cdots(M_{p1^{\prime}}-n_{0}+1)}{{M_{p1^{\prime}}}^{n_{0}}}, (63)

where Mp1=2BpB0M_{p1^{\prime}}=2^{B_{p}-B_{0}}. Pnocolli1P_{no-colli-1} refers to the probability of no collision after sliding once, while Pnocolli1P_{no-colli-1^{\prime}} represents the probability that the messages of one device across windows can be spliced back successfully after sliding and retransmission. However, collisions may still remain after sliding once, which requires another round.

-A2 The second window sliding

For the ii-th collided message, the number of devices in collision after sliding once is

k1i=k0i(1Pnocolli1),i[1:n0].k_{1}^{i}=k_{0}^{i}\cdot(1-P_{no-colli-1}),~{}i\in\left[1:n_{0}\right]. (64)

Without loss of generality, we consider the ii-th collided message in the first sliding. Likewise, let n1in_{1}^{i} and k2jk_{2}^{j} denote the numbers of collided messages and of the corresponding devices after the first sliding, respectively, which satisfy j=1n1ik2j=k1i\sum\nolimits_{j=1}^{n_{1}^{i}}{k_{2}^{j}}=k_{1}^{i}.

Figure 17: The second window sliding with window length and sliding length BpB_{p} and B0B_{0}, respectively.

For the sliding part, the probability of no collision is

Pnocolli2=Mp1(Mp11)(Mp1k2j+1)Mp1k2j,j[1:n1i].P_{no-colli-2}=\frac{M_{p1}(M_{p1}\!-\!1)\!\cdots\!(M_{p1}\!-\!k_{2}^{j}\!+\!1)}{{M_{p1}}^{k_{2}^{j}}},j\in\left[1:n_{1}^{i}\right]. (65)

Note that in the common part, if there are identical sequences among the n1in_{1}^{i} messages, there will not be an error; what needs to be considered is whether the sequences among the n0n_{0} messages are the same or not. Thus, the probability of no collision in the common part is

Pnocolli2=Mp1n1ii=1n0Mp1n1i.P_{no-colli-2^{\prime}}=\frac{\mathbb{P}_{M_{p1^{\prime}}}^{\sum{n_{1}^{i}}}}{\prod_{i=1}^{n_{0}}{\mathbb{P}_{M_{p1^{\prime}}}^{n_{1}^{i}}}}. (66)

Here we denote nr\mathbb{P}_{n}^{r} as the number of permutations, i.e., n!/(nr)!n!/(n-r)!. It is obvious that Mp1n1i<(Mp1)n0\mathbb{P}_{M_{p1^{\prime}}}^{n_{1}^{i}}<({M_{p1^{\prime}}})^{n_{0}} since n1i<n0n_{1}^{i}<n_{0}. However, the relationship between i=1n0n1i\sum\nolimits_{i=1}^{n_{0}}{n_{1}^{i}} and n0n_{0} is unknown. As such, the relationship between Pnocolli2P_{no-colli-2^{\prime}} and Pnocolli1P_{no-colli-1^{\prime}} is also unknown. Fortunately, since k2j<k1i<k0ik_{2}^{j}<k_{1}^{i}<k_{0}^{i}, we have Pnocolli2>Pnocolli1P_{no-colli-2}>P_{no-colli-1}, which demonstrates that as the window sliding progresses, the probability that devices are in collision decreases.

-A3 The ll-th window sliding

In order to draw a general conclusion, we extend the above derivation to the ll-th (l>2l>2) window sliding. Similarly, we denote nl1in_{l-1}^{i} and kljk_{l}^{j} as the numbers of collided messages and of the corresponding devices after the (l1)(l-1)-th sliding, respectively, which satisfy j=1nl1iklj=kl1i\sum\nolimits_{j=1}^{n_{l-1}^{i}}{k_{l}^{j}}=k_{l-1}^{i}. As such, the probability of no collision after the ll-th sliding is given by

Pnocollil=Mp1(Mp11)(Mp1klj+1)Mp1klj,P_{no-colli-l}=\frac{M_{p1}(M_{p1}\!-\!1)\!\cdots\!(M_{p1}\!-\!k_{l}^{j}\!+\!1)}{{M_{p1}}^{k_{l}^{j}}}, (67)

where j[1:nl1i].j\in\left[1:n_{l-1}^{i}\right]. The number of collided devices after the ll-th sliding is

kl+1j=klj(1Pnocollil),j[1:nl1i].k_{l+1}^{j}=k_{l}^{j}\cdot(1-P_{no-colli-l}),~{}j\in\left[1:n_{l-1}^{i}\right]. (68)

Obviously, we have

kl+1j<klj<<k2j<k1i<k0i<<Ka.k_{l+1}^{j}<k_{l}^{j}<\cdots<k_{2}^{j}<k_{1}^{i}<k_{0}^{i}<\cdots<K_{a}. (69)

Consequently, the probability of no collision satisfies

Pnocollil>>Pnocolli1>Pnocolli.P_{no-colli-l}>\cdots>P_{no-colli-1}>P_{no-colli}. (70)

In this regard, we further have

kl+1j\displaystyle k_{l+1}^{j} <Ka(1Pnocollil)(1Pnocolli)\displaystyle<K_{a}\cdot(1-P_{no-colli-l})\cdots(1-P_{no-colli}) (71)
<Ka(1Pnocolli)(l+1),\displaystyle<K_{a}\cdot(1-P_{no-colli})^{(l+1)},

which demonstrates that as the number of slidings ll goes to infinity, the number of collided devices eventually approaches zero.
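The contraction in (69)-(71) can be illustrated numerically. The sketch below collapses the per-message bookkeeping into a single collided group and uses illustrative values of KaK_{a}, BpB_{p}, and B0B_{0}, so it is a simplified simulation of the recursion rather than the exact protocol.

```python
from math import prod

def p_no_collision(k: int, bits: int) -> float:
    """Birthday-problem probability that k picks out of 2**bits are distinct."""
    m = 2 ** bits
    return prod((m - i) / m for i in range(k))

K_a, B_p, B_0 = 50, 12, 8                 # illustrative, not the paper's settings
k = K_a * (1 - p_no_collision(K_a, B_p))  # expected collided devices, as in (61)
history = [k]
for _ in range(6):                        # six rounds of window sliding
    k *= 1 - p_no_collision(round(k), B_0)
    history.append(k)
# history decreases monotonically toward zero, mirroring (69)-(71)
```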

Although the collision probability of the common part does not exhibit an obvious decreasing trend, in practice we can reduce it by controlling the window length. There is a trade-off between the collision probabilities of the common and sliding parts: a longer sliding length results in a lower collision probability in the sliding part, but at the same time increases that of the common part. Since the collision probability of the sliding part eventually approaches zero, the sliding length can be kept short in practice to minimize the collision probability of the common part.

-B Derivation of the Gaussian and Bernoulli Messages in V-A

-B1 Derivation of 𝚺lk\bm{\Sigma}_{lk} at SNs

According to the formulation in (17), since the activities and channels of different devices are independent of each other, the (m,n)(m,n)-th (mn)(m\neq n) entry of 𝚺𝒛𝒍𝒌\bm{\Sigma_{z_{lk}}} is given by

(Σzlk)(m,n)\displaystyle(\Sigma_{z_{lk}})_{(m,n)} (72)
=iK\k|Ali|2𝔼{[ϕihimϕ^ih^im]\displaystyle=\sum\nolimits_{i\in K\backslash k}{{{\left|{{A_{li}}}\right|}^{2}}}\mathbb{E}\left\{{\left[{\phi_{i}}{h_{im}}-\hat{\phi}_{i}\cdot\hat{h}_{im}\right]}\right.
[ϕihinϕ^ih^in]}\displaystyle\quad\quad\quad\quad\quad\quad\quad\quad~{}\left.{\cdot\left[{{\phi_{i}}h_{in}}{-\hat{\phi}_{i}\cdot\hat{h}_{in}}\right]}^{*}\right\}
=iK\k|Ali|2𝔼{[ϕihimpilVNμimlmVN]\displaystyle=\sum\nolimits_{i\in K\backslash k}{{{\left|{{A_{li}}}\right|}^{2}}}\mathbb{E}\Big{\{}{\left[{{\phi_{i}}{h_{im}}-p_{i\to l}^{VN}\cdot\mu_{im\to lm}^{VN}}\right]}
[ϕihinpilVNμinlnVN]}\displaystyle\quad\quad\quad\quad\quad\quad\quad\quad~{}{\cdot\left[{{\phi_{i}}h_{in}}{-p_{i\to l}^{VN}\cdot{{{\mu_{in\to ln}^{VN}}}}}\right]}^{*}\Big{\}}
=iK\k|Ali|2{𝔼[ϕi2himhin]\displaystyle=\sum\nolimits_{i\in K\backslash k}{{\left|{{A_{li}}}\right|}^{2}\cdot\left\{\mathbb{E}\left[\phi_{i}^{2}\cdot h_{im}\cdot h_{in}^{*}\right]\right.}
(pilVN)2μimlmVN(μinlnVN)},\displaystyle\quad\quad\quad\quad\quad\quad{\left.-(p_{i\to l}^{VN})^{2}\cdot\mu_{im\to lm}^{VN}\cdot(\mu_{in\to ln}^{VN})^{*}\right\}},

where

𝔼[ϕi2himhin]=𝔼[ϕi2]𝔼[himhin].\displaystyle\mathbb{E}\left[\phi_{i}^{2}\cdot h_{im}\cdot h_{in}^{*}\right]=\mathbb{E}\left[\phi_{i}^{2}\right]\cdot\mathbb{E}\left[h_{im}\cdot h_{in}^{*}\right]. (73)
=pilVN[(ΣilVN)(m,n)+μimlmVN(μinlnVN)].\displaystyle\quad=p_{i\to l}^{VN}\cdot\left[(\Sigma_{i\to l}^{VN})_{(m,n)}+\mu_{im\to lm}^{VN}\cdot(\mu_{in\to ln}^{VN})^{*}\right].

Then we can rewrite (72) as

(Σzlk)(m,n)\displaystyle(\Sigma_{z_{lk}})_{(m,n)} (74)
=iK\k|Ali|2pilVN{(ΣilVN)(m,n)\displaystyle=\sum\nolimits_{i\in K\backslash k}{{\left|{{A_{li}}}\right|}^{2}\cdot p_{i\to l}^{VN}\cdot\left\{(\Sigma_{i\to l}^{VN})_{(m,n)}\right.}
+(1pilVN)μimlmVN(μinlnVN)}\displaystyle\quad\quad\quad\quad\quad\quad~{}~{}{\left.+(1-p_{i\to l}^{VN})\cdot\mu_{im\to lm}^{VN}\cdot(\mu_{in\to ln}^{VN})^{*}\right\}}
=iK\k|Ali|2pilVN{(ΣilVN)(m,n)\displaystyle=\sum\nolimits_{i\in K\backslash k}{{\left|{{A_{li}}}\right|}^{2}\cdot p_{i\to l}^{VN}\cdot\left\{(\Sigma_{i\to l}^{VN})_{(m,n)}\right.}
+qilVNμimlmVN(μinlnVN)},mn,\displaystyle\quad\quad\quad\quad\quad\quad~{}~{}{\left.+q_{i\to l}^{VN}\cdot\mu_{im\to lm}^{VN}\cdot(\mu_{in\to ln}^{VN})^{*}\right\}},~{}m\neq n,

where qilVN=1pilVNq_{i\to l}^{VN}=1-p_{i\to l}^{VN} denotes the probability that the Bernoulli variable ϕi\phi_{i} equals zero. Therefore, the covariance of 𝐳lk\mathbf{z}_{lk} is not diagonal. If m=nm=n, we have

(Σzlk)(m,m)\displaystyle(\Sigma_{z_{lk}})_{(m,m)} =iK\k|Ali|2pilVN{(ΣilVN)(m,m)\displaystyle=\sum\nolimits_{i\in K\backslash k}{{\left|{{A_{li}}}\right|}^{2}\cdot p_{i\to l}^{VN}\cdot\Big{\{}(\Sigma_{i\to l}^{VN})_{(m,m)}} (75)
+qilVN|μimlmVN|2}+σn2.\displaystyle{\quad\quad\quad+~{}q_{i\to l}^{VN}\cdot\left|\mu_{im\to lm}^{VN}\right|^{2}\Big{\}}}+\sigma_{n}^{2}.

This completes the derivation.
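The per-device term in (74)-(75) is simply the covariance of the Bernoulli-Gaussian product z = φh with φ ∼ Bernoulli(p) and h ∼ CN(μ, Σ). A small numerical sketch (with made-up p, μ, and Σ) confirms the closed form and the non-diagonal structure noted above:

```python
import numpy as np

def bg_covariance(p: float, mu: np.ndarray, Sigma: np.ndarray) -> np.ndarray:
    """Covariance of z = phi * h, phi ~ Bernoulli(p), h ~ CN(mu, Sigma):
    Cov(z) = p * (Sigma + (1 - p) * mu mu^H), the per-device term in (74)-(75)."""
    return p * (Sigma + (1 - p) * np.outer(mu, mu.conj()))

# Made-up parameters for illustration
p = 0.5
mu = np.array([1.0 + 1.0j, 2.0])
Sigma = np.eye(2)
C = bg_covariance(p, mu, Sigma)
# C has nonzero off-diagonal entries even though Sigma is diagonal,
# which is why the covariance of z_lk is not diagonal in general.
```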

-B2 Derivation of Gaussian Messages at VNs

We give the derivations of (28) and (29) here. Rewrite the multivariate complex Gaussian pdf in (25) to canonical notation as

f(𝒙|𝝁,𝚺)=exp[𝒙H𝚪𝒙+𝜼H𝒙+𝒙H𝜼+𝜻],f(\bm{x}|\bm{\mu},\bm{\Sigma})=\exp\left[-\bm{x}^{H}\bm{\Gamma}\bm{x}+\bm{\eta}^{H}\bm{x}+\bm{x}^{H}\bm{\eta}+\bm{\zeta}\right], (76)

where

𝚪\displaystyle\bm{\Gamma} =𝚺1,𝜼=𝚺1𝝁\displaystyle=\bm{\Sigma}^{-1},\quad\bm{\eta}=\bm{\Sigma}^{-1}\bm{\mu} (77)
𝜻\displaystyle\bm{\zeta} =𝜼H𝚪1𝜼+ln|𝚪|Mlnπ,\displaystyle=-\bm{\eta}^{H}\bm{\Gamma}^{-1}\bm{\eta}+\ln|\bm{\Gamma}|-M\ln\pi,

Therefore, the product of nn Gaussian pdfs is

i=1nfi(𝒙)\displaystyle\prod_{i=1}^{n}{f_{i}(\bm{x})} =exp[𝒙H(i=1n𝚪i)𝒙+(i=1n𝜼i)H𝒙\displaystyle=\exp\Big{[}-\bm{x}^{H}\left(\sum\nolimits_{i=1}^{n}{\bm{\Gamma}_{i}}\right)\bm{x}+\left(\sum\nolimits_{i=1}^{n}{\bm{\eta}_{i}}\right)^{H}\bm{x} (78)
+𝒙Hi=1n𝜼i+i=1n𝜻i],\displaystyle\quad\quad\quad+\bm{x}^{H}\sum\nolimits_{i=1}^{n}{\bm{\eta}_{i}}+\sum\nolimits_{i=1}^{n}{\bm{\zeta}_{i}}\Big{]},

where

i=1n𝜻i=i=1n(𝜼iH𝚪i1𝜼i+ln|𝚪i|)nMlnπ.\sum\nolimits_{i=1}^{n}{\bm{\zeta}_{i}}=\sum\nolimits_{i=1}^{n}{\left(-\bm{\eta}_{i}^{H}\bm{\Gamma}_{i}^{-1}\bm{\eta}_{i}+\ln|\bm{\Gamma}_{i}|\right)}-nM\ln\pi. (79)

We make a simple substitution as below

𝚪n\displaystyle\bm{\Gamma}_{n} =i=1n𝚪i,𝜼n=i=1n𝜼i\displaystyle=\sum\nolimits_{i=1}^{n}{\bm{\Gamma}_{i}},\quad\bm{\eta}_{n}=\sum\nolimits_{i=1}^{n}{\bm{\eta}_{i}} (80)
𝜻n\displaystyle\bm{\zeta}_{n} =𝜼nH𝚪n1𝜼n+ln|𝚪n|Mlnπ.\displaystyle=-\bm{\eta}_{n}^{H}\bm{\Gamma}_{n}^{-1}\bm{\eta}_{n}+\ln|\bm{\Gamma}_{n}|-M\ln\pi.

Then we can rewrite (78) as

i=1nfi(𝒙)\displaystyle\prod_{i=1}^{n}{f_{i}(\bm{x})} =exp[𝒙H𝚪n𝒙+𝜼nH𝒙+𝒙H𝜼n+𝜻n\displaystyle=\exp\Big{[}-\bm{x}^{H}\bm{\Gamma}_{n}\bm{x}+\bm{\eta}_{n}^{H}\bm{x}+\bm{x}^{H}\bm{\eta}_{n}+\bm{\zeta}_{n} (81)
+i=1n𝜻i𝜻n]\displaystyle\quad\quad\quad+\sum\nolimits_{i=1}^{n}{\bm{\zeta}_{i}}-\bm{\zeta}_{n}\Big{]}
=cexp[𝒙H𝚪n𝒙+𝜼nH𝒙+𝒙H𝜼n+𝜻n],\displaystyle=c\cdot\exp\left[-\bm{x}^{H}\bm{\Gamma}_{n}\bm{x}+\bm{\eta}_{n}^{H}\bm{x}+\bm{x}^{H}\bm{\eta}_{n}+\bm{\zeta}_{n}\right],

where c=exp(i=1n𝜻i𝜻n)c=\exp\left(\sum\nolimits_{i=1}^{n}{\bm{\zeta}_{i}}-\bm{\zeta}_{n}\right) is a normalization constant. Since the nn Gaussian pdfs are independent, their product still follows a Gaussian distribution. Comparing (81) with the canonical form in (76) and (77), the covariance and mean of the Gaussian distribution in (81) are

𝚺n\displaystyle\bm{\Sigma}_{n} =𝚪n1=(i=1n𝚺i1)1\displaystyle=\bm{\Gamma}_{n}^{-1}=\left(\sum\nolimits_{i=1}^{n}{\bm{\Sigma}_{i}^{-1}}\right)^{-1} (82)
𝝁n\displaystyle\bm{\mu}_{n} =𝚺ni=1n𝜼i=𝚺ni=1n𝚺i1𝝁i.\displaystyle=\bm{\Sigma}_{n}\cdot\sum\nolimits_{i=1}^{n}{\bm{\eta}_{i}}=\bm{\Sigma}_{n}\cdot\sum\nolimits_{i=1}^{n}{\bm{\Sigma}_{i}^{-1}\bm{\mu}_{i}}.

Similarly, (27) is the product of the Gaussian pdfs {f(𝒉|𝝁ikSN,𝚺ikSN),i\l}\left\{f(\bm{h}|\bm{\mu}_{i\to k}^{SN},\bm{\Sigma}_{i\to k}^{SN}),i\in\mathcal{L}\backslash l\right\} and f(𝒉|𝝁kpri,𝚺kpri)f(\bm{h}|\bm{\mu}_{k}^{pri},\bm{\Sigma}_{k}^{pri}). According to the rule in (82), we can obtain (28) and (29).
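The fusion rule (82) is easy to verify numerically. The helper below (the function name is ours) multiplies Gaussian messages in information form and recovers the combined mean and covariance:

```python
import numpy as np

def fuse_gaussians(mus, Sigmas):
    """Product of independent Gaussian pdfs, per (82):
    Sigma_n = (sum_i Sigma_i^{-1})^{-1}, mu_n = Sigma_n * sum_i Sigma_i^{-1} mu_i."""
    Gammas = [np.linalg.inv(S) for S in Sigmas]   # information matrices
    etas = [G @ m for G, m in zip(Gammas, mus)]   # information vectors
    Sigma_n = np.linalg.inv(sum(Gammas))
    mu_n = Sigma_n @ sum(etas)
    return mu_n, Sigma_n

# Two scalar messages N(0, 1) and N(2, 1) fuse to N(1, 0.5)
mu_n, Sigma_n = fuse_gaussians([np.array([0.0]), np.array([2.0])],
                               [np.eye(1), np.eye(1)])
```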

References

  • [1] Y. Liu, S. Zhang, X. Mu, Z. Ding, R. Schober, N. Al-Dhahir, E. Hossain, and X. Shen, “Evolution of NOMA toward next generation multiple access (NGMA),” Aug. 2021. Online Available: arXiv:2108.04561.
  • [2] L. Liu, E. G. Larsson, W. Yu, P. Popovski, C. Stefanovic, and E. de Carvalho, “Sparse signal processing for grant-free massive connectivity: A future paradigm for random access protocols in the internet of things,” IEEE Signal Process. Mag., vol. 35, no. 5, pp. 88-99, Sept. 2018.
  • [3] X. Chen, D. W. K. Ng, W. Yu, E. G. Larsson, N. Al-Dhahir, and R. Schober, “Massive access for 5G and beyond,” IEEE J. Sel. Areas Commun., vol. 39, no. 3, pp. 615-637, March 2021.
  • [4] L. Liu and W. Yu, “Massive connectivity with massive MIMO—Part I: Device activity detection and channel estimation,” IEEE Trans. Signal Process., vol. 66, no. 11, pp. 2933–2946, June 2018.
  • [5] Y. Wu, X. Gao, S. Zhou, W. Yang, Y. Polyanskiy, and G. Caire, “Massive access for future wireless communication systems,” IEEE Wireless Commun., vol. 27, no. 4, pp. 148-156, Aug. 2020.
  • [6] V. Shyianov, F. Bellili, A. Mezghani, and E. Hossain, “Massive unsourced random access based on uncoupled compressive sensing: Another blessing of massive MIMO,” IEEE J. Sel. Areas Commun., vol. 39, no. 3, pp. 820-834, Mar. 2021.
  • [7] J. Gao, Y. Wu, Y. Wang, W. Zhang, and F. Wei, “Uplink transmission design for crowded correlated cell-free massive MIMO-OFDM systems,” Sci. China Inf. Sci., vol. 64, no. 8, pp. 182309:1–182309:16, Aug. 2021.
  • [8] D. L. Donoho, A. Maleki, and A. Montanari, “Message-passing algorithms for compressed sensing,” Proc. Nat. Acad. Sci., vol. 106, no. 45, pp. 18 914–18 919, Nov. 2009.
  • [9] M. Ke, Z. Gao, Y. Wu, X. Gao, and R. Schober, “Compressive sensing-based adaptive active user detection and channel estimation: Massive access meets massive MIMO,” IEEE Trans. Signal Process., vol. 68, pp. 764-779, Jan. 2020.
  • [10] J. Ma and L. Ping, “Orthogonal AMP,” IEEE Access, vol. 5, pp. 2020-2033, Jan. 2017.
  • [11] S. Rangan, “Generalized approximate message passing for estimation with random linear mixing,” in Proc. IEEE Int. Symp. Inform. Theory (ISIT), St. Petersburg, Russia, Aug. 2011, pp. 2168-2172.
  • [12] J. S. Yedidia, W. T. Freeman, and Y. Weiss, “Understanding belief propagation and its generalizations,” Mitsubishi Tech. Rep. TR2001-022, Jan. 2002.
  • [13] Z. Zhang, Y. Li, C. Huang, Q. Guo, L. Liu, C. Yuen, and Y. L. Guan, “User activity detection and channel estimation for grant-free random access in LEO satellite-enabled internet of things,” IEEE Internet Things J., vol. 7, no. 9, pp. 8811-8825, Sept. 2020.
  • [14] H. Iimori, T. Takahashi, K. Ishibashi, G. T. F. de Abreu, and W. Yu, “Grant-free access via bilinear inference for cell-free MIMO with low-coherent pilots,” to appear in IEEE Trans. Wireless Commun. Online Available: arXiv:2009.12863v1.
  • [15] S. Jiang, X. Yuan, X. Wang, C. Xu, and W. Yu, “Joint user identification, channel estimation, and signal detection for grant-free NOMA,” IEEE Wireless Commun., vol. 19, no. 10, pp. 6960-6976, Oct. 2020.
  • [16] C. Huang, L. Liu, C. Yuen, and S. Sun, “Iterative channel estimation using LSE and sparse message passing for mmWave MIMO systems,” IEEE Trans. Signal Process., vol. 67, no. 1, pp. 245-259, Jan. 2019.
  • [17] Y. Polyanskiy, “A perspective on massive random-access,” in Proc. IEEE Int. Symp. Inform. Theory (ISIT), Aachen, Germany, June 2017, pp. 2523–2527.
  • [18] O. Ordentlich and Y. Polyanskiy, “Low complexity schemes for the random access Gaussian channel,” in Proc. IEEE Int. Symp. Inf. Theory (ISIT), Aachen, Germany, June 2017, pp. 2528-2532.
  • [19] V. K. Amalladinne, J. -F. Chamberland, and K. R. Narayanan, “A coded compressed sensing scheme for unsourced multiple access,” IEEE Trans. Inf. Theory, vol. 66, no. 10, pp. 6509-6533, Oct. 2020.
  • [20] A. Fengler, P. Jung, and G. Caire, “SPARCs and AMP for unsourced random access,” in Proc. IEEE Int. Symp. Inf. Theory (ISIT), Paris, France, July 2019, pp. 2843-2847.
  • [21] A. Fengler, P. Jung, and G. Caire, “SPARCs for unsourced random access,” to appear in IEEE Trans. Inf. Theory. Online Available: https://arxiv.org/abs/1901.06234.
  • [22] A. Joseph and A. R. Barron, “Least squares superposition codes of moderate dictionary size are reliable at rates up to capacity,” IEEE Trans. Inf. Theory, vol. 58, no. 5, pp. 2541–2557, Jan. 2012.
  • [23] A. K. Pradhan, V. K. Amalladinne, K. R. Narayanan, and J. Chamberland, “Polar coding and random spreading for unsourced multiple access,” in Proc. IEEE Int. Conf. Commun. (ICC), Dublin, Ireland, June 2020, pp. 1-6.
  • [24] M. Zheng, Y. Wu, and W. Zhang, “Polar coding and sparse spreading for massive unsourced random access,” in Proc. IEEE Veh. Technol. Conf. (VTC-Fall), Victoria, BC, Canada, Nov. 2020, pp. 1-5.
  • [25] A. Pradhan, V. Amalladinne, A. Vem, K. R. Narayanan, and J. -F. Chamberland, “A joint graph based coding scheme for the unsourced random access Gaussian channel,” in Proc. IEEE Glob. Commun. Conf. (GLOBECOM), Waikoloa, HI, USA, Dec. 2019, pp. 1-6.
  • [26] A. Vem, K. R. Narayanan, J. Chamberland, and J. Cheng, “A user-independent successive interference cancellation based coding scheme for the unsourced random access Gaussian channel,” IEEE Trans. Commun., vol. 67, no. 12, pp. 8258-8272, Dec. 2019.
  • [27] R. Gallager, “Low-density parity-check codes,” IRE Trans. Inf. Theory, vol. 8, no. 1, pp. 21-28, Jan. 1962.
  • [28] S. S. Kowshik, K. Andreev, A. Frolov, and Y. Polyanskiy, “Short-packet low-power coded access for massive MAC,” in Proc. Asilomar Conf. Signals Syst. Comput. (ACSSC), Pacific Grove, CA, USA, Nov. 2019, pp. 827-832.
  • [29] S. S. Kowshik, K. Andreev, A. Frolov, and Y. Polyanskiy, “Energy efficient random access for the quasi-static fading MAC,” in Proc. IEEE Int. Symp. Inf. Theory (ISIT), Paris, France, 2019, July 2019, pp. 2768-2772.
  • [30] S. S. Kowshik, K. Andreev, A. Frolov, and Y. Polyanskiy, “Energy efficient coded random access for the wireless uplink,” IEEE Trans. Commun., vol. 68, no. 8, pp. 4694-4708, Aug. 2020.
  • [31] S. Haghighatshoar, P. Jung, and G. Caire, “Improved scaling law for activity detection in massive MIMO systems,” in Proc. IEEE Int. Symp. Inf. Theory (ISIT), Vail, CO, USA, Jun. 2018, pp. 381–385.
  • [32] A. Fengler, S. Haghighatshoar, P. Jung, and G. Caire, “Non-Bayesian activity detection, large-scale fading coefficient estimation, and unsourced random access with a massive MIMO receiver,” IEEE Trans. Inf. Theory, vol. 67, no. 5, pp. 2925-2951, May 2021.
  • [33] V. Shyianov, F. Bellili, A. Mezghani, and E. Hossain, “Massive unsourced random access based on uncoupled compressive sensing: Another blessing of massive MIMO,” IEEE J. Sel. Areas Commun., vol. 39, no. 3, pp. 820-834, Mar. 2021.
  • [34] J. Dong, J. Zhang, Y. Shi, and J. H. Wang, “Faster activity and data detection in massive random access: A multi-armed bandit approach,” Jan. 2020. Online Available: arXiv:2001.10237v1.
  • [35] L. Ping, L. Liu, K. Wu, and W. K. Leung, “Interleave division multiple access,” IEEE Trans. Wireless Commun., vol. 5, no. 4, pp. 938–947, Apr. 2006.
  • [36] L. Sanguinetti, E. Björnson, and J. Hoydis, “Toward massive MIMO 2.0: Understanding spatial correlation, interference suppression, and pilot contamination,” IEEE Trans. Commun., vol. 68, no. 1, pp. 232-257, Jan. 2020.
  • [37] Z. Gao, L. Dai, Z. Wang, and S. Chen, “Spatially common sparsity based adaptive channel estimation and feedback for FDD massive MIMO,” IEEE Trans. Signal Process., vol. 63, no. 23, pp. 6169-6183, Dec. 2015.
  • [38] L. You, X. Gao, X.-G. Xia, N. Ma, and Y. Peng, “Pilot reuse for massive MIMO transmission over spatially correlated Rayleigh fading channels,” IEEE Trans. Wireless Commun., vol. 14, no. 6, pp. 3352-3366, June 2015.
  • [39] X. Xie and Y. Wu, “Unsourced random access with a massive MIMO receiver: Exploiting angular domain sparsity,” in Proc. IEEE/CIC Int. Conf. Commun. China (ICCC), Xiamen, China, 2021, pp. 741-745.
  • [40] V. Chandrasekhar, J. G. Andrews, T. Muharemovic, Z. Shen, and A. Gatherer, “Power control in two-tier femtocell networks,” IEEE Trans. Wireless Commun., vol. 8, no. 8, pp. 4316-4328, Aug. 2009.
  • [41] P. Patel and J. Holtzman, “Analysis of a simple successive interference cancellation scheme in a DS/CDMA system,” IEEE J. Sel. Areas Commun., vol. 12, no. 5, pp. 796-807, June 1994.
  • [42] G. Turin, “The effects of multipath and fading on the performance of direct-sequence CDMA systems,” IEEE J. Sel. Areas Commun., vol. 2, no. 4, pp. 597-603, July 1984.
  • [43] T. Li, Y. Wu, M. Zheng, D. Wang, and W. Zhang, “SPARC-LDPC coding for MIMO massive unsourced random access,” in Proc. IEEE Globecom Workshops (GC Wkshps), Taipei, Taiwan, Dec. 2020, pp. 1-6.
  • [44] T. L. Narasimhan, A. Chockalingam, and B. S. Rajan, “Factor graph based joint detection/decoding for LDPC coded large-MIMO systems,” in Proc. IEEE Veh. Technol. Conf. (VTC-Spring), Yokohama, Japan, May 2012, pp. 1-5.
  • [45] J. Bibby, “Axiomatisations of the average and a further generalisation of monotonic sequences”, Glas. Math. J., vol. 15, no. 1, pp. 63–65, 1974.