MIMO-OFDM-Based Massive Connectivity With Frequency Selectivity Compensation

Wenjun Jiang, Mingyang Yue, Xiaojun Yuan, and Yong Zuo
Abstract

In this paper, we study how to efficiently and reliably detect active devices and estimate their channels in a multiple-input multiple-output (MIMO) orthogonal frequency-division multiplexing (OFDM) based grant-free non-orthogonal multiple access (NOMA) system to enable massive machine-type communications (mMTC). First, by exploiting the correlation of the channel frequency responses in narrow-band mMTC, we propose a block-wise linear channel model. Specifically, the contiguous OFDM subcarriers in the narrow band are divided into several sub-blocks, and a linear function with only two variables (mean and slope) is used to approximate the frequency-selective channel in each sub-block. This significantly reduces the number of variables to be determined in channel estimation, and the sub-block number can be adjusted to reliably compensate for the channel frequency selectivity. Second, we formulate the joint active device detection and channel estimation in the block-wise linear system as a Bayesian inference problem. By exploiting the block-sparsity of the channel matrix, we develop an efficient turbo message passing (Turbo-MP) algorithm to resolve the Bayesian inference problem with near-linear complexity. We further incorporate machine learning approaches into Turbo-MP to learn unknown prior parameters. Numerical results demonstrate the superior performance of the proposed algorithm over state-of-the-art algorithms.

Index Terms:
grant-free non-orthogonal multiple access, orthogonal frequency division multiplexing, turbo message passing.

I Introduction

Massive machine-type communications (mMTC) has been envisioned as one of the three key application scenarios of fifth-generation (5G) wireless communications. To support massive connectivity of machine-type devices, mMTC typically features sporadic transmission with short packets, which is different from conventional human-type communications [1]. This implies that only a small subgroup of devices are active in any time instance of mMTC. As such, in addition to channel estimation and signal detection, a fundamentally new challenge for the design of an mMTC receiver is to reliably and efficiently identify which subgroup of devices are actively engaged in packet transmission.

Recently, a new random access protocol termed grant-free non-orthogonal multiple access (NOMA) has been evaluated and highlighted for mMTC [2]. Specifically, in grant-free NOMA, the devices share the same time and frequency resources for signal transmission, and the signals can be transmitted without a scheduling grant from the base station (BS). The receiver at the BS is then required to perform active device detection (ADD), channel estimation (CE), and signal detection (SD), either separately or jointly. The earlier work [3, 4, 5, 6] assumed that full channel state information (CSI) can be acquired at the BS and studied joint CE and SD. However, the assumption of full CSI availability is not practical since it incurs a huge overhead to estimate the CSI of all access devices. The follow-up work [7] proposed to divide the process at the BS into two phases, namely, the joint ADD and CE phase and the SD phase. Since the BS only needs to estimate the CSI of active devices, the pilot overhead is significantly reduced. In addition, the channel sparsity in the device domain enables the employment of compressed sensing (CS) algorithms [8] to solve the joint ADD and CE problem. For example, the authors in [9] considered the multiple-input multiple-output (MIMO) transmission and leveraged a multiple measurement vector (MMV) CS technique termed vector approximate message passing (Vector AMP) [10] to achieve asymptotically perfect ADD. In [11], the authors further considered a mixed analog-to-digital converters (ADCs) architecture at the BS antennas and proposed a CS algorithm based on turbo compressive sensing (Turbo-CS) [12]. It is known from [12] that Turbo-CS outperforms approximate message passing (AMP) [13] both in convergence performance and computational complexity. Another line of research considered the more challenging joint ADD, CE, and SD problem [14, 15]. As compared to the two-phase approach, these joint schemes can achieve significant performance improvement but at the expense of higher computational complexity.

Orthogonal frequency division multiplexing (OFDM) is a mature and enabling technology for 5G to provide high spectral efficiency. As such, the design of OFDM-based grant-free NOMA has attracted much research interest in recent years [16, 17, 18]. In [16], the authors exploited the block-sparsity of the channel responses on OFDM subcarriers to design a message-passing-based iterative algorithm. Besides, it has been demonstrated in [17] that the message-passing-based iterative algorithm can be unfolded into a deep neural network. By training the parameters of the neural network, the convergence and performance of the algorithm are improved. Furthermore, OFDM-based grant-free NOMA with massive MIMO has been considered in [18]. By leveraging the sparsity both in the device domain and the virtual angular domain, the authors utilized the AMP algorithm to achieve the joint ADD and CE. One issue in [16, 17, 18] is that the frequency-domain channel estimation on every subcarrier requires a high pilot overhead, which is inefficient for the short-packet transmission in mMTC.

To reduce the pilot overhead, a common strategy is to transform the frequency-domain channel into the time-domain channel by inverse discrete Fourier transform (IDFT) [19]. Due to the limited delay spread in the time-domain channel, the time-domain channel is sparse, thereby requiring fewer pilots for CE. Furthermore, by exploiting the sparsity of both the time-domain channel and the device activity pattern, some state-of-the-art CS algorithms such as Turbo-CS [12, 20] and Vector AMP [10, 9] can be applied to the considered systems with some straightforward modifications. However, there exists an energy leakage problem caused by the IDFT transform to obtain the time-domain channel. The energy leakage compromises the channel sparsity in the time domain. In addition, the power delay profile (PDP) is generally difficult for the BS to acquire, and thus cannot be exploited as prior information to improve the system performance.

Motivated by the bottleneck of the existing channel models when applied to the MIMO-OFDM-based grant-free NOMA system, we aim to construct a channel model to enable efficient massive random access. Due to the short packets in mMTC, the bandwidth for packet transmission is usually narrow, e.g., 1 MHz for 10^{5} access devices [21]. Then the variations of the channel frequency responses across the subcarriers are limited and slow. By leveraging this fact, we propose a block-wise linear channel model. Specifically, the contiguous subcarriers are divided into several sub-blocks. In each sub-block, the frequency-selective channel is approximated by a linear function. We demonstrate that the number of variables in the block-wise linear channel model is typically much less than the number of non-zero delay taps in the time-domain channel. Moreover, the sub-block number can be modified to strike a trade-off between the model accuracy and the number of channel variables to be estimated.

Based on the block-wise linear system model, we aim to design a CS algorithm to solve the joint ADD and CE problem. We first introduce a probability model to characterize the block-sparsity of the channel matrix. Then the joint ADD and CE is formulated as a Bayesian inference problem. Inspired by the success of Turbo-CS [12, 20] in sparse signal recovery, we design a message passing algorithm termed turbo message passing (Turbo-MP) to resolve the Bayesian inference problem. The message passing processes are derived based on the principle of Turbo-CS and the sum-product rule [22]. By designing a partial orthogonal pilot matrix with fast transform, Turbo-MP achieves near-linear complexity.

Furthermore, we show that machine learning methods can be incorporated into Turbo-MP to learn the unknown prior parameters. We adopt the expectation maximization (EM) algorithm [23] to learn the prior parameters. We then show how to unfold Turbo-MP into a neural network (NN), where the prior parameters are seen as the learnable parameters of the neural network. Numerical results show that NN-based Turbo-MP has a faster convergence rate than EM-based Turbo-MP. More importantly, we show that Turbo-MP designed for the proposed frequency-domain block-wise linear model significantly outperforms the state-of-the-art counterparts, especially the message passing algorithms designed for the time-domain sparse channel model [9, 11].

I-A Notation and Organization

We use bold capital letters like \mathbf{X} for matrices and bold lowercase letters \mathbf{x} for vectors. (\cdot)^{T} and (\cdot)^{H} denote the transpose and the conjugate transpose, respectively. We use {\rm diag}(\mathbf{x}) for the diagonal matrix created from vector \mathbf{x}, {\rm diag}(\mathbf{x}_{1},\ldots,\mathbf{x}_{N}) for the block diagonal matrix with the n-th block being vector \mathbf{x}_{n}, and {\rm diag}(\mathbf{X}_{1},\ldots,\mathbf{X}_{N}) for the block diagonal matrix with the n-th block being matrix \mathbf{X}_{n}. We use {\rm vec}(\mathbf{X}) for the vectorization of matrix \mathbf{X} and \otimes for the Kronecker product. ||\mathbf{X}||_{F} and ||\mathbf{x}||_{2} denote the Frobenius norm of matrix \mathbf{X} and the l_{2} norm of vector \mathbf{x}, respectively. \mathbf{I} denotes the identity matrix of appropriate size. For a random vector \mathbf{x}, we denote its probability density function (pdf) by p(\mathbf{x}). \delta(\cdot) denotes the Dirac delta function and \delta[\cdot] denotes the Kronecker delta function. The pdf of a complex Gaussian random vector \mathbf{x}\in\mathbb{C}^{N} with mean \mathbf{m} and covariance \mathbf{C} is denoted by \mathcal{CN}(\mathbf{x};\mathbf{m},\mathbf{C})=\exp(-(\mathbf{x}-\mathbf{m})^{H}\mathbf{C}^{-1}(\mathbf{x}-\mathbf{m}))/(\pi^{N}|\mathbf{C}|).

The remainder of this paper is organized as follows. In Section II, we introduce the existing system models. Furthermore, we propose the block-wise linear system model and demonstrate its superiority. In Section III, we formulate a Bayesian inference problem to address the ADD and CE problem. In Section IV, we propose the Turbo-MP algorithm, describe the pilot design, and analyze the algorithm complexity. In Section V, we apply the EM algorithm to learn the prior parameters. Moreover, we show how Turbo-MP is unfolded into a neural network. In Section VI, we present the numerical results. In Section VII, we conclude this paper.

II System Modeling

II-A MIMO-OFDM-Based Grant-free NOMA Model

Figure 1: An illustration of MIMO-OFDM-based mMTC, where a small subgroup of devices are active in each time instance and share N contiguous subcarriers for pilot transmission within T OFDM symbols.

Consider a MIMO-OFDM-based grant-free NOMA system as shown in Fig. 1, where a frequency band of N adjacent OFDM subcarriers is allocated for mMTC. N is usually much smaller than the total number of available subcarriers in the considered OFDM system. This frequency allocation strategy follows the idea of narrow-band internet-of-things [24] and network slicing [25]. The allocated bandwidth is used to support K single-antenna devices that randomly access an M-antenna base station (BS). In each time instance, only a small subset of devices are active. To characterize such sporadic transmission, the device activity is described by an indicator function \alpha_{k} as

\alpha_{k}=\begin{cases}1,&\text{device }k\text{ is active}\\0,&\text{device }k\text{ is inactive},\end{cases}\qquad k=1,\ldots,K \qquad (1)

with p(\alpha_{k}=1)=\lambda, where \lambda\ll 1.

We adopt a multipath block-fading channel, i.e., the multipath channel response remains constant within the coherence time. Denote the channel frequency response on the n-th subcarrier from the k-th device at the m-th BS antenna by

g_{k,m,n}=\sum_{l=1}^{L_{k}}\sqrt{\rho_{k,l}}\,\beta_{k,m,l}\,e^{-j2\pi\Delta f\tau_{k,l}n},\quad k=1,\ldots,K;~m=1,\ldots,M;~n=1,\ldots,N \qquad (2)

where \Delta f is the OFDM subcarrier spacing; L_{k} is the number of channel taps of the k-th device; \rho_{k,l} and \tau_{k,l} are respectively the l-th tap power and tap delay of the k-th device (strictly speaking, the differences of the tap delays at different BS antennas are absorbed in \beta_{k,m,l}); \beta_{k,m,l}\sim\mathcal{CN}(\beta_{k,m,l};0,1) is the normalized complex gain, assumed independent for any k, m, l [26]. Then the channel frequency response can be expressed in matrix form as

\mathbf{G}_{k}=\alpha_{k}\begin{pmatrix}g_{k,1,1}&\cdots&g_{k,m,1}&\cdots&g_{k,M,1}\\\vdots&&\vdots&&\vdots\\g_{k,1,N}&\cdots&g_{k,m,N}&\cdots&g_{k,M,N}\end{pmatrix}. \qquad (6)
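To make the model concrete, the following Python sketch samples a multipath channel and evaluates (2) over all N subcarriers and M antennas. The tap delays and the exponential power delay profile are illustrative assumptions, not the profile used in the paper's simulations.

```python
import numpy as np

def channel_frequency_response(L, M, N, delta_f, rng):
    """Evaluate (2): g_{m,n} = sum_l sqrt(rho_l) beta_{m,l} exp(-j*2*pi*delta_f*tau_l*n).
    Returns the N x M matrix G_k of one device. The tap delays and the
    exponential power delay profile below are illustrative assumptions."""
    tau = np.sort(rng.uniform(0.0, 1e-6, size=L))      # tap delays in seconds (assumed)
    rho = np.exp(-tau / 3e-7); rho /= rho.sum()        # normalized exponential PDP (assumed)
    beta = (rng.standard_normal((M, L)) + 1j * rng.standard_normal((M, L))) / np.sqrt(2)
    n = np.arange(1, N + 1)[:, None]                   # subcarrier indices 1..N
    steering = np.exp(-1j * 2 * np.pi * delta_f * tau[None, :] * n)  # N x L
    return steering @ (np.sqrt(rho)[:, None] * beta.T)               # N x M

rng = np.random.default_rng(0)
G_k = channel_frequency_response(L=8, M=4, N=72, delta_f=15e3, rng=rng)
print(G_k.shape)  # (72, 4)
```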

Let a_{k,n}^{(t)} be the pilot symbol of the k-th device transmitted on the n-th subcarrier at the t-th OFDM symbol, and T be the number of OFDM symbols for pilot transmission. Then we construct a block diagonal matrix \mathbf{\Lambda}_{k}\in\mathbb{C}^{TN\times N} with the n-th diagonal block being [a_{k,n}^{(1)},\ldots,a_{k,n}^{(T)}]^{T}, i.e.,

\mathbf{\Lambda}_{k}={\rm diag}\big([a_{k,1}^{(1)},\ldots,a_{k,1}^{(T)}]^{T},\ldots,[a_{k,N}^{(1)},\ldots,a_{k,N}^{(T)}]^{T}\big). \qquad (7)

Assume that the cyclic-prefix (CP) length satisfies L_{cp}>\tau_{k,l},\forall k,l. After removing the CP and applying the discrete Fourier transform (DFT), the system model in the frequency domain is described as

\mathbf{Y}=\sum_{k=1}^{K}\mathbf{\Lambda}_{k}\mathbf{G}_{k}+\mathbf{N} \qquad (8)

where \mathbf{Y}\in\mathbb{C}^{TN\times M} is the observation matrix; \mathbf{N} is an additive white Gaussian noise (AWGN) matrix with its elements independently drawn from \mathcal{CN}(0,\sigma_{N}^{2}). In [16, 17, 18], CS algorithms were proposed based on (8) to achieve the CE on every OFDM subcarrier.

Define the time-domain channel matrix of the k-th device as \tilde{\mathbf{H}}_{k}\in\mathbb{C}^{N\times M}. Note that \tilde{\mathbf{H}}_{k} can be represented as the IDFT of \mathbf{G}_{k}, i.e.,

\tilde{\mathbf{H}}_{k}=\mathbf{F}^{H}\mathbf{G}_{k} \qquad (9)

where \mathbf{F}\in\mathbb{C}^{N\times N} is the DFT matrix with the (n_{1},n_{2})-th element being (1/\sqrt{N})\exp(-j2\pi n_{1}n_{2}/N). Similarly, \mathbf{G}_{k} is represented as \mathbf{G}_{k}=\mathbf{F}\tilde{\mathbf{H}}_{k}. Substituting \mathbf{G}_{k}=\mathbf{F}\tilde{\mathbf{H}}_{k} into (8), we obtain

\mathbf{Y}=\sum_{k=1}^{K}\mathbf{\Lambda}_{k}\mathbf{F}\tilde{\mathbf{H}}_{k}+\mathbf{N}=[\mathbf{\Lambda}_{1}\mathbf{F},\ldots,\mathbf{\Lambda}_{K}\mathbf{F}]\tilde{\mathbf{H}}+\mathbf{N} \qquad (10)

where \tilde{\mathbf{H}}=[\tilde{\mathbf{H}}_{1}^{T},\ldots,\tilde{\mathbf{H}}_{K}^{T}]^{T} is the channel matrix in the time domain. It is known that the channel delay spread is usually much smaller than N, i.e., some rows of \tilde{\mathbf{H}}_{k} are zeros. Besides, due to the sporadic transmission of the devices, \tilde{\mathbf{H}}_{k} is an all-zero matrix if device k is inactive. In this case, CS algorithms such as Vector AMP [10, 9] and Turbo-CS [12, 20] can be used to recover \tilde{\mathbf{H}} from observation \mathbf{Y} by exploiting the sparsity of \tilde{\mathbf{H}}. However, a path delay is generally not an integer multiple of the sampling interval, resulting in the energy leakage problem of the IDFT transform (9), which severely compromises the sparsity of \tilde{\mathbf{H}}. An illustration of the energy leakage problem is given in Fig. 2(b).
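The energy leakage effect is easy to reproduce. In the minimal sketch below, a single path with an on-grid delay yields exactly one significant IDFT tap, while a fractionally delayed path spreads its energy over many taps, which is what destroys the sparsity of \tilde{\mathbf{H}}.

```python
import numpy as np

N, delta_f = 72, 15e3
n = np.arange(N)
tau_on  = 5.0 / (N * delta_f)       # delay on the sampling grid
tau_off = 5.4 / (N * delta_f)       # fractional (off-grid) delay
for tau in (tau_on, tau_off):
    g = np.exp(-1j * 2 * np.pi * delta_f * tau * n)  # frequency response of one path
    h = np.fft.ifft(g)                               # time-domain taps via the IDFT
    support = np.sum(np.abs(h) > 1e-3 * np.abs(h).max())
    print(f"normalized delay {tau * N * delta_f:.1f}: {support} significant taps")
# Expected: 1 significant tap on-grid, dozens of taps off-grid (energy leakage).
```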

Figure 2: An example of channel model comparison for the TDL-C multipath channel in TR 38.901 R14 [27]. The number of OFDM subcarriers is N=72. (a) Frequency-domain channel response and its block-wise linear approximation with sub-block number Q=4. (b) Corresponding time-domain channel response, where the increase of the channel response at the tail is caused by the energy leakage of the IDFT transform.

II-B Proposed Block-Wise Linear System Model

For narrow-band mMTC [1], the variations of the channel frequency response across the subcarriers are typically slow and limited, which inspires us to develop an alternative channel model that efficiently leverages the correlation in the frequency domain. Specifically, we propose a block-wise linear channel model. We divide the N contiguous subcarriers into Q sub-blocks. In each sub-block q, a linear function is used as the approximation of the channel frequency response:

g_{k,n_{q},m}=h_{k,q,m}+(n_{q}-l_{q})c_{k,q,m}+\Delta_{k,n_{q},m},\quad n_{q}=(q-1)N/Q+1,\ldots,qN/Q \qquad (11)

where h_{k,q,m} and c_{k,q,m} represent the mean and slope of the linear function in the q-th sub-block, respectively; \Delta_{k,n_{q},m} is the error term due to model mismatch; and l_{q}=(q-\frac{1}{2})N/Q is the midpoint of the index range of n_{q}. Intuitively, h_{k,q,m} can be seen as the mean value of the channel response in the q-th sub-block, and c_{k,q,m} characterizes the change of the channel response for the compensation of the frequency selectivity. For the k-th device, we define the matrices \mathbf{H}_{k}\in\mathbb{C}^{Q\times M} and \mathbf{C}_{k}\in\mathbb{C}^{Q\times M} as

\mathbf{H}_{k}=\alpha_{k}\begin{pmatrix}h_{k,1,1}&\cdots&h_{k,1,m}&\cdots&h_{k,1,M}\\\vdots&&\vdots&&\vdots\\h_{k,Q,1}&\cdots&h_{k,Q,m}&\cdots&h_{k,Q,M}\end{pmatrix} \qquad (15)
\mathbf{C}_{k}=\alpha_{k}\begin{pmatrix}c_{k,1,1}&\cdots&c_{k,1,m}&\cdots&c_{k,1,M}\\\vdots&&\vdots&&\vdots\\c_{k,Q,1}&\cdots&c_{k,Q,m}&\cdots&c_{k,Q,M}\end{pmatrix}. \qquad (19)

The reason for introducing \alpha_{k} in (15)-(19) is that the channel estimation and device activity detection can be jointly achieved by recovering \mathbf{H}_{k} and \mathbf{C}_{k}.

Define \mathbf{E}_{1}={\rm diag}(\mathbf{1}_{N/Q},\ldots,\mathbf{1}_{N/Q})\in\mathbb{R}^{N\times Q} with \mathbf{1}_{N/Q} being an all-one vector of length N/Q, and \mathbf{E}_{2}={\rm diag}(\mathbf{d},\ldots,\mathbf{d})\in\mathbb{R}^{N\times Q} with \mathbf{d}=[-\frac{N}{2Q}+1,\ldots,\frac{N}{2Q}]^{T}. Then the block-wise linear approximation of the channel frequency response \mathbf{G}_{k} is given by

\mathbf{G}_{k}=\mathbf{E}_{1}\mathbf{H}_{k}+\mathbf{E}_{2}\mathbf{C}_{k}+\mathbf{\Delta}_{k} \qquad (20)

where \mathbf{\Delta}_{k}\in\mathbb{C}^{N\times M} is the error matrix of the k-th device with the (n,m)-th element \Delta_{k,n,m} given in (11). Substituting (20) into (8) with some manipulations, we obtain the block-wise linear system model as

\mathbf{Y}=\mathbf{A}\mathbf{H}+\mathbf{B}\mathbf{C}+\mathbf{W} \qquad (21)

where \mathbf{A}=[\mathbf{\Lambda}_{1}\mathbf{E}_{1},\ldots,\mathbf{\Lambda}_{K}\mathbf{E}_{1}]\in\mathbb{C}^{TN\times QK} and \mathbf{B}=[\mathbf{\Lambda}_{1}\mathbf{E}_{2},\ldots,\mathbf{\Lambda}_{K}\mathbf{E}_{2}]\in\mathbb{C}^{TN\times QK} are the pilot matrices; \mathbf{H}=[\mathbf{H}_{1}^{T},\ldots,\mathbf{H}_{K}^{T}]^{T}\in\mathbb{C}^{QK\times M} is the channel mean matrix; \mathbf{C}=[\mathbf{C}_{1}^{T},\ldots,\mathbf{C}_{K}^{T}]^{T}\in\mathbb{C}^{QK\times M} is the channel compensation matrix; and \mathbf{W} is the sum of the AWGN and the error terms from model mismatch:

\mathbf{W}=\sum_{k=1}^{K}\mathbf{\Lambda}_{k}\mathbf{\Delta}_{k}+\mathbf{N}. \qquad (22)

Following the central limit theorem, for a large device number K, \mathbf{W} can be modeled as an AWGN matrix with zero mean and variance \sigma_{w}^{2}. We note that with Q=N and \mathbf{C}=\mathbf{0}, the system model (21) reduces to the model (8) in [16, 17, 18], where the channel response on every subcarrier needs to be estimated exactly. Furthermore, we adjust the sub-block number Q to strike a balance between the number of channel variables to be estimated and the model accuracy. An example is shown in Fig. 2, where the channel response in the frequency domain and its block-wise linear approximation are given in Fig. 2(a). It is seen that a sub-block number of Q=4 is sufficient to ensure that the block-wise linear model approximates the frequency-domain channel response very well. This implies that, with model (21), only 2Q variables need to be estimated for each device at each BS antenna. Compared to the channel response in the time domain as shown in Fig. 2(b), it is clear that the number of unknown variables 2Q is much smaller than the number of corresponding non-zero taps. In the remainder of the paper, we focus on the estimation algorithm design based on model (21). In the simulation section, we will show that our algorithm designed based on (21) significantly outperforms the existing approaches based on (8) and (10).
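As a sanity check of the model, the sketch below builds E_1 and E_2 as defined above and fits the 2Q coefficients per antenna by least squares to a randomly drawn frequency response. The path delays and gains are illustrative assumptions; the point is to expose the trade-off between Q and the fitting error.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, Q, delta_f = 72, 4, 4, 15e3
tau = rng.uniform(0, 1e-6, 6)                     # illustrative path delays (s)
gain = (rng.standard_normal((6, M)) + 1j * rng.standard_normal((6, M))) / np.sqrt(12)
G_k = np.exp(-2j * np.pi * delta_f * np.outer(np.arange(1, N + 1), tau)) @ gain  # N x M

E1 = np.kron(np.eye(Q), np.ones((N // Q, 1)))     # N x Q, sub-block means
d = np.arange(-N // (2 * Q) + 1, N // (2 * Q) + 1, dtype=float)[:, None]
E2 = np.kron(np.eye(Q), d)                        # N x Q, centred sub-block slopes
E = np.hstack([E1, E2]).astype(complex)           # N x 2Q
coef, *_ = np.linalg.lstsq(E, G_k, rcond=None)    # stacked [H_k; C_k], 2Q x M
nmse = np.linalg.norm(G_k - E @ coef)**2 / np.linalg.norm(G_k)**2
print(f"NMSE of the Q={Q} block-wise linear fit: {nmse:.2e}")
```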

III Problem Statement

With model (21), our goal is to recover the channel mean matrix \mathbf{H} and the channel compensation matrix \mathbf{C} from the noisy observation \mathbf{Y}. This task can be cast as a Bayesian inference problem. In the following, we first introduce the probability model of \mathbf{H} and \mathbf{C}, and then describe the statistical inference problem.

Due to the sporadic transmission of the devices, matrices \mathbf{H} and \mathbf{C} have a structured sparsity referred to as block-sparsity. Specifically, if the k-th device is inactive, we have \mathbf{H}_{k}=\mathbf{0} and \mathbf{C}_{k}=\mathbf{0}. With some abuse of notation, we utilize a conditional Bernoulli-Gaussian (BG) distribution [19] to characterize the block-sparsity as

p(\mathbf{H}_{k}|\alpha_{k})=\delta[\alpha_{k}]\delta(\mathbf{H}_{k})+\delta[1-\alpha_{k}]\mathcal{CN}(\mathbf{H}_{k};\mathbf{0},\vartheta_{\mathbf{H}}\mathbf{I})
p(\mathbf{C}_{k}|\alpha_{k})=\delta[\alpha_{k}]\delta(\mathbf{C}_{k})+\delta[1-\alpha_{k}]\mathcal{CN}(\mathbf{C}_{k};\mathbf{0},\vartheta_{\mathbf{C}}\mathbf{I}) \qquad (23)

where \delta(\cdot) is the Dirac delta function and \delta[\cdot] is the Kronecker delta function. With indicator \alpha_{k}=0, \mathbf{H}_{k} and \mathbf{C}_{k} are both zero. With \alpha_{k}=1, the elements of \mathbf{H}_{k} and \mathbf{C}_{k} are independent and identically distributed (i.i.d.) Gaussian with variances \vartheta_{\mathbf{H}} and \vartheta_{\mathbf{C}}, respectively. We further assume that each device accesses the BS in an i.i.d. manner. Then the indicator \alpha_{k} is drawn from the Bernoulli distribution as

p(\alpha_{k})=(1-\lambda)\delta[\alpha_{k}]+\lambda\delta[1-\alpha_{k}] \qquad (24)

where \lambda is the device activity rate.
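The prior (23)-(24) can be sampled directly, e.g., to generate synthetic data for testing the algorithm; the parameter values below are illustrative.

```python
import numpy as np

def sample_prior(K, Q, M, lam, var_H, var_C, rng):
    """Draw (alpha, H, C) from (23)-(24): a device is active w.p. lam; inactive
    devices contribute all-zero Q x M blocks (block-sparsity)."""
    alpha = (rng.random(K) < lam).astype(float)
    cgauss = lambda v: np.sqrt(v / 2) * (rng.standard_normal((K, Q, M))
                                         + 1j * rng.standard_normal((K, Q, M)))
    H = alpha[:, None, None] * cgauss(var_H)
    C = alpha[:, None, None] * cgauss(var_C)
    return alpha, H.reshape(K * Q, M), C.reshape(K * Q, M)

rng = np.random.default_rng(1)
alpha, H, C = sample_prior(K=100, Q=4, M=8, lam=0.1, var_H=1.0, var_C=1e-3, rng=rng)
print(int(alpha.sum()), "active devices")
```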

Consider an estimator that minimizes the mean-square error (MSE) of \mathbf{H} and \mathbf{C}. It is known that the MSE-minimizing estimator is the expectation with respect to the posterior distribution [28]. Define \mathbf{h}_{m}\in\mathbb{C}^{QK} and \mathbf{c}_{m}\in\mathbb{C}^{QK} as the m-th columns of \mathbf{H} and \mathbf{C}, respectively, and \mathbf{y}_{m}\in\mathbb{C}^{TN} as the m-th column of \mathbf{Y}. Then the posterior distribution p(\mathbf{H},\mathbf{C},\boldsymbol{\alpha}|\mathbf{Y}) is described as

p(\mathbf{H},\mathbf{C},\boldsymbol{\alpha}|\mathbf{Y})\propto p(\mathbf{Y}|\mathbf{H},\mathbf{C})p(\mathbf{H},\mathbf{C},\boldsymbol{\alpha})\propto\prod_{m}p(\mathbf{y}_{m}|\mathbf{h}_{m},\mathbf{c}_{m})\prod_{k}p(\mathbf{H}_{k}|\alpha_{k})p(\mathbf{C}_{k}|\alpha_{k})p(\alpha_{k}) \qquad (25)

where \prod_{m}p(\mathbf{y}_{m}|\mathbf{h}_{m},\mathbf{c}_{m}) and \prod_{k}p(\mathbf{H}_{k}|\alpha_{k})p(\mathbf{C}_{k}|\alpha_{k})p(\alpha_{k}) are the likelihood and the prior, respectively, and \boldsymbol{\alpha}=[\alpha_{1},\ldots,\alpha_{K}]^{T}. In mMTC with a large device number K, it is computationally intractable to obtain the minimum mean-square error (MMSE) estimator. In the following section, we propose a low-complexity algorithm termed turbo message passing (Turbo-MP) to obtain an approximate MMSE solution.

IV Turbo Message Passing

IV-A Algorithm Framework

Figure 3: The block diagram of the proposed turbo message passing (Turbo-MP) algorithm.

The factor graph representation of p(\mathbf{H},\mathbf{C},\boldsymbol{\alpha}|\mathbf{Y}) is shown in Fig. 3, based on which Turbo-MP is established. In the factor graph, the likelihood and the prior in (25) are treated as factor nodes (grey rectangles), while the random variables are treated as variable nodes (blank circles). Specifically, Turbo-MP consists of four modules, referred to as Modules A, B, C, and D. Module A obtains the estimates of the channel mean matrix \mathbf{H} and the channel compensation matrix \mathbf{C} by exploiting the likelihood \prod_{m}p(\mathbf{y}_{m}|\mathbf{h}_{m},\mathbf{c}_{m}). Modules B and C obtain the estimates of \mathbf{H} and \mathbf{C} by exploiting the priors \prod_{k}p(\mathbf{H}_{k}|\alpha_{k}) and \prod_{k}p(\mathbf{C}_{k}|\alpha_{k}), respectively. Module D obtains the estimate of \alpha_{k} by exploiting the prior p(\alpha_{k}). The estimates are passed between modules in each Turbo-MP iteration, as in turbo decoding [29]. For example, the output of Module B is used as the input of Module A, and vice versa. The four modules are executed iteratively until convergence. The derivation of the Turbo-MP algorithm follows the sum-product rule [22] and the principle of Turbo-CS [12, 20].

IV-B Module A: Linear Estimation of \mathbf{h}_{m} and \mathbf{c}_{m}

In Module A, \mathbf{h}_{m} and \mathbf{c}_{m} at antennas m=1,\ldots,M are estimated separately by exploiting the likelihood \prod_{m}p(\mathbf{y}_{m}|\mathbf{h}_{m},\mathbf{c}_{m}) and the messages from Modules B and C. Specifically, denote the message from variable node \mathbf{h}_{m} to factor node p(\mathbf{y}_{m}|\mathbf{h}_{m},\mathbf{c}_{m}) by \mathcal{M}_{\mathbf{h}_{m}\to\mathbf{y}_{m}}(\mathbf{h}_{m}), and the message from variable node \mathbf{c}_{m} to factor node p(\mathbf{y}_{m}|\mathbf{h}_{m},\mathbf{c}_{m}) by \mathcal{M}_{\mathbf{c}_{m}\to\mathbf{y}_{m}}(\mathbf{c}_{m}). Following the principle of Turbo-CS [12, 20], we assume that these two messages are Gaussian, i.e., \mathcal{M}_{\mathbf{h}_{m}\to\mathbf{y}_{m}}(\mathbf{h}_{m})\sim\mathcal{CN}(\mathbf{h}_{m};\mathbf{h}_{m}^{A,pri},v_{\mathbf{h}_{m}}^{A,pri}\mathbf{I}) and \mathcal{M}_{\mathbf{c}_{m}\to\mathbf{y}_{m}}(\mathbf{c}_{m})\sim\mathcal{CN}(\mathbf{c}_{m};\mathbf{c}_{m}^{A,pri},v_{\mathbf{c}_{m}}^{A,pri}\mathbf{I}). The algorithm starts with \mathbf{h}_{m}^{A,pri}=\mathbf{0}, \mathbf{c}_{m}^{A,pri}=\mathbf{0}, v_{\mathbf{h}_{m}}^{A,pri}=\lambda\vartheta_{\mathbf{H}}, and v_{\mathbf{c}_{m}}^{A,pri}=\lambda\vartheta_{\mathbf{C}}. From the sum-product rule, the belief of h_{k,q,m} at factor node p(\mathbf{y}_{m}|\mathbf{h}_{m},\mathbf{c}_{m}) is

\mathcal{M}_{\mathbf{y}_{m}}(h_{k,q,m})=\int_{/h_{k,q,m}}p(\mathbf{y}_{m}|\mathbf{h}_{m},\mathbf{c}_{m})\,\mathcal{M}_{\mathbf{h}_{m}\to\mathbf{y}_{m}}(\mathbf{h}_{m})\,\mathcal{M}_{\mathbf{c}_{m}\to\mathbf{y}_{m}}(\mathbf{c}_{m}). \qquad (26)

In the above, the subscript \mathbf{y}_{m} is shorthand for the factor node p(\mathbf{y}_{m}|\mathbf{h}_{m},\mathbf{c}_{m}), and /h_{k,q,m} denotes the set of all elements in \mathbf{h}_{m} except h_{k,q,m}. Then we define the belief of \mathbf{h}_{m} at factor node p(\mathbf{y}_{m}|\mathbf{h}_{m},\mathbf{c}_{m}) as

\mathcal{M}_{\mathbf{y}_{m}}(\mathbf{h}_{m})=\prod_{k,q}\mathcal{M}_{\mathbf{y}_{m}}(h_{k,q,m}). \qquad (27)

Combining (26) and (27), the message \mathcal{M}_{\mathbf{y}_{m}}(\mathbf{h}_{m}) is also Gaussian, with mean \mathbf{h}_{m}^{A,post} and variance v_{\mathbf{h}_{m}}^{A,post}. From [20, Eq. 11], the mean \mathbf{h}_{m}^{A,post} is expressed as

\mathbf{h}_{m}^{A,post}=\mathbf{h}_{m}^{A,pri}+v_{\mathbf{h}_{m}}^{A,pri}\mathbf{A}^{H}\boldsymbol{\Sigma}_{m}^{-1}\left(\mathbf{y}_{m}-\mathbf{A}\mathbf{h}_{m}^{A,pri}-\mathbf{B}\mathbf{c}_{m}^{A,pri}\right) \qquad (28)

where \boldsymbol{\Sigma}_{m} is the covariance matrix given by

\boldsymbol{\Sigma}_{m}=v_{\mathbf{h}_{m}}^{A,pri}\mathbf{A}\mathbf{A}^{H}+v_{\mathbf{c}_{m}}^{A,pri}\mathbf{B}\mathbf{B}^{H}+\sigma_{w}^{2}\mathbf{I}. \qquad (29)

To reduce the computational complexity of the matrix inverse \boldsymbol{\Sigma}_{m}^{-1}, we require \mathbf{A} to be partial orthogonal [12], i.e., \mathbf{A}\mathbf{A}^{H}=KP\mathbf{I}, where P is the average power of the pilot symbols. (The design of \mathbf{A} to guarantee partial orthogonality is presented in Section IV-F.) In addition, it is easy to verify that \mathbf{B}=\mathbf{D}\mathbf{A}, where the diagonal matrix \mathbf{D}={\rm diag}\big([\mathbf{d}^{T},\ldots,\mathbf{d}^{T}]^{T}\otimes\mathbf{1}_{T}\big)\in\mathbb{R}^{TN\times TN} with \mathbf{1}_{T} being an all-one vector of length T. Then the expression of \boldsymbol{\Sigma}_{m} simplifies to

\boldsymbol{\Sigma}_{m}=KPv_{\mathbf{h}_{m}}^{A,pri}\mathbf{I}+KPv_{\mathbf{c}_{m}}^{A,pri}\mathbf{D}\mathbf{D}^{H}+\sigma_{w}^{2}\mathbf{I}. \qquad (30)

Then the variance v_{\mathbf{h}_{m}}^{A,post} is given by

v_{\mathbf{h}_{m}}^{A,post}=v_{\mathbf{h}_{m}}^{A,pri}-\sum_{i=1}^{TN}\frac{P(v_{\mathbf{h}_{m}}^{A,pri})^{2}}{\Sigma_{m,i,i}} \qquad (31)

where \Sigma_{m,i,i} is the (i,i)-th element of \boldsymbol{\Sigma}_{m}. We note that (28) and (31) also correspond to the linear minimum mean-square error (LMMSE) estimator [28, Chap. 11] given the observation \mathbf{y}_{m}, the means \mathbf{h}_{m}^{A,pri}, \mathbf{c}_{m}^{A,pri}, and the variances v_{\mathbf{h}_{m}}^{A,pri}, v_{\mathbf{c}_{m}}^{A,pri}.
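A minimal dense-matrix sketch of the Module A update for one antenna, combining (28), (30), (31), and (35), is given below. It exploits only the diagonality of \boldsymbol{\Sigma}_{m} in (30); the actual implementation would additionally replace the products with \mathbf{A} and \mathbf{B} by FFTs (Section IV-F).

```python
import numpy as np

def module_a_update(y, A, B, h_pri, c_pri, v_h, v_c, sigma_w2, K, P, Ddiag):
    """Posterior (LMMSE) means and variances of h_m and c_m per (28), (31), (35),
    with Sigma_m diagonal as in (30). Ddiag holds the diagonal entries D_{i,i}."""
    Sigma = K * P * v_h + K * P * v_c * Ddiag**2 + sigma_w2   # diagonal of (30)
    r = (y - A @ h_pri - B @ c_pri) / Sigma                   # Sigma^{-1} times residual
    h_post = h_pri + v_h * (A.conj().T @ r)                   # (28)
    c_post = c_pri + v_c * (B.conj().T @ r)                   # first line of (35)
    v_h_post = v_h - np.sum(P * v_h**2 / Sigma)               # (31)
    v_c_post = v_c - np.sum(P * Ddiag**2 * v_c**2 / Sigma)    # second line of (35)
    return h_post, c_post, v_h_post, v_c_post
```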

From the sum-product rule, the extrinsic message from factor node p(\mathbf{y}_{m}|\mathbf{h}_{m},\mathbf{c}_{m}) to variable node \mathbf{h}_{m} is calculated as

\mathcal{M}_{\mathbf{y}_{m}\to\mathbf{h}_{m}}(\mathbf{h}_{m})\propto\frac{\mathcal{M}_{\mathbf{y}_{m}}(\mathbf{h}_{m})}{\mathcal{M}_{\mathbf{h}_{m}\to\mathbf{y}_{m}}(\mathbf{h}_{m})}. \qquad (32)

Given the Gaussian messages \mathcal{M}_{\mathbf{y}_{m}}(\mathbf{h}_{m}) and \mathcal{M}_{\mathbf{h}_{m}\to\mathbf{y}_{m}}(\mathbf{h}_{m}), the extrinsic message is also Gaussian:

\mathcal{M}_{\mathbf{y}_{m}\to\mathbf{h}_{m}}(\mathbf{h}_{m})\sim\mathcal{CN}(\mathbf{h}_{m};\mathbf{h}_{m}^{A,ext},v_{\mathbf{h}_{m}}^{A,ext}\mathbf{I}) \qquad (33)

where the variance v_{\mathbf{h}_{m}}^{A,ext} and the mean \mathbf{h}_{m}^{A,ext} are respectively given by

v_{\mathbf{h}_{m}}^{A,ext}=\left(\frac{1}{v_{\mathbf{h}_{m}}^{A,post}}-\frac{1}{v_{\mathbf{h}_{m}}^{A,pri}}\right)^{-1}
\mathbf{h}_{m}^{A,ext}=v_{\mathbf{h}_{m}}^{A,ext}\left(\frac{\mathbf{h}_{m}^{A,post}}{v_{\mathbf{h}_{m}}^{A,post}}-\frac{\mathbf{h}_{m}^{A,pri}}{v_{\mathbf{h}_{m}}^{A,pri}}\right). \qquad (34)

The calculation of the belief of \mathbf{c}_{m} is similar. We have the Gaussian belief \mathcal{M}_{\mathbf{y}_{m}}(\mathbf{c}_{m}) with its mean and variance respectively given by

\mathbf{c}_{m}^{A,post}=\mathbf{c}_{m}^{A,pri}+v_{\mathbf{c}_{m}}^{A,pri}\mathbf{B}^{H}\boldsymbol{\Sigma}_{m}^{-1}\left(\mathbf{y}_{m}-\mathbf{A}\mathbf{h}_{m}^{A,pri}-\mathbf{B}\mathbf{c}_{m}^{A,pri}\right)
v_{\mathbf{c}_{m}}^{A,post}=v_{\mathbf{c}_{m}}^{A,pri}-\sum_{i=1}^{TN}\frac{PD_{i,i}^{2}(v_{\mathbf{c}_{m}}^{A,pri})^{2}}{\Sigma_{m,i,i}} \qquad (35)

where D_{i,i} is the (i,i)-th element of \mathbf{D}. Then we obtain the extrinsic message \mathcal{M}_{\mathbf{y}_{m}\to\mathbf{c}_{m}}(\mathbf{c}_{m})\sim\mathcal{CN}(\mathbf{c}_{m};\mathbf{c}_{m}^{A,ext},v_{\mathbf{c}_{m}}^{A,ext}\mathbf{I}) with its mean and variance as follows:

v_{\mathbf{c}_{m}}^{A,ext}=\left(\frac{1}{v_{\mathbf{c}_{m}}^{A,post}}-\frac{1}{v_{\mathbf{c}_{m}}^{A,pri}}\right)^{-1}
\mathbf{c}_{m}^{A,ext}=v_{\mathbf{c}_{m}}^{A,ext}\left(\frac{\mathbf{c}_{m}^{A,post}}{v_{\mathbf{c}_{m}}^{A,post}}-\frac{\mathbf{c}_{m}^{A,pri}}{v_{\mathbf{c}_{m}}^{A,pri}}\right). \qquad (36)
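The extrinsic computations (34) and (36) share one pattern, namely the division of two Gaussian densities; a small helper captures both:

```python
import numpy as np

def extrinsic(m_post, v_post, m_pri, v_pri):
    """Gaussian extrinsic message per (34)/(36): remove the incoming (prior)
    message from the posterior belief. Works elementwise on numpy arrays."""
    v_ext = 1.0 / (1.0 / v_post - 1.0 / v_pri)
    m_ext = v_ext * (m_post / v_post - m_pri / v_pri)
    return m_ext, v_ext
```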

IV-C Module B: Denoiser of \mathbf{H}_{k}

In Module B, each \mathbf{H}_{k},\forall k, is estimated individually by exploiting the prior p(\mathbf{H}_{k}|\alpha_{k}) and the messages from Modules A and D. Specifically, given the message \mathcal{M}_{\mathbf{y}_{m}\to\mathbf{h}_{m}}(\mathbf{h}_{m})\sim\mathcal{CN}(\mathbf{h}_{m};\mathbf{h}_{m}^{A,ext},v_{\mathbf{h}_{m}}^{A,ext}\mathbf{I}) from Module A, we have

\mathcal{M}_{\mathbf{y}_{m}\to h_{k,q,m}}(h_{k,q,m})\sim\mathcal{CN}(h_{k,q,m};h_{k,q,m}^{A,ext},v_{\mathbf{h}_{m}}^{A,ext}). \qquad (37)

For notational convenience, the factor node p(\mathbf{H}_{k}|\alpha_{k}) is denoted by f_{k}^{B}. Then the message from variable node \mathbf{H}_{k} to factor node p(\mathbf{H}_{k}|\alpha_{k}) is expressed as

\mathcal{M}_{\mathbf{H}_{k}\to f_{k}^{B}}(\mathbf{H}_{k})=\prod_{q,m}\mathcal{M}_{\mathbf{y}_{m}\to h_{k,q,m}}(h_{k,q,m})=\mathcal{CN}(\mathbf{H}_{k};\mathbf{H}_{k}^{B,pri},\mathbf{V}^{B}) \qquad (38)

where \mathbf{H}_{k}^{B,pri}\in\mathbb{C}^{Q\times M} with the (q,m)-th element h_{k,q,m}^{B,pri}=h_{k,q,m}^{A,ext}, and \mathbf{V}^{B}={\rm diag}([v_{\mathbf{h}_{1}}^{B,pri},\ldots,v_{\mathbf{h}_{M}}^{B,pri}]^{T}\otimes\mathbf{1}_{Q}) with \mathbf{1}_{Q} being an all-one vector of length Q.

Combining the Bernoulli-Gaussian prior p(\mathbf{H}_{k}|\alpha_{k}), the Gaussian message \mathcal{M}_{\mathbf{H}_{k}\to f_{k}^{B}}(\mathbf{H}_{k}), and the Bernoulli message \mathcal{M}_{\alpha_{k}\to f^{B}_{k}}(\alpha_{k}) in (62), the belief of \mathbf{H}_{k} at factor node f_{k}^{B} is Bernoulli-Gaussian and expressed as

\mathcal{M}_{f_{k}^{B}}(\mathbf{H}_{k})\propto\sum_{\alpha_{k}=0}^{1}p(\mathbf{H}_{k}|\alpha_{k})\mathcal{M}_{\alpha_{k}\to f^{B}_{k}}(\alpha_{k})\mathcal{M}_{\mathbf{H}_{k}\to f_{k}^{B}}(\mathbf{H}_{k})=(1-\lambda_{k}^{B,post})\delta(\mathbf{H}_{k})+\lambda_{k}^{B,post}\mathcal{CN}(\mathbf{H}_{k};\boldsymbol{\mu}_{k}^{B},\mathbf{\Phi}_{k}^{B}) \qquad (39)

where

\lambda_{k}^{B,post}=\left(1+\frac{(1-\lambda_{k}^{B,pri})\cdot\mathcal{CN}(\mathbf{0};\mathbf{H}_{k}^{B,pri},\mathbf{V}^{B})}{\lambda_{k}^{B,pri}\cdot\mathcal{CN}(\mathbf{0};\mathbf{H}_{k}^{B,pri},\mathbf{V}^{B}+\vartheta_{\mathbf{H}}\mathbf{I})}\right)^{-1}
\mathbf{\Phi}_{k}^{B}=(\vartheta_{\mathbf{H}}^{-1}\mathbf{I}+(\mathbf{V}^{B})^{-1})^{-1}
\boldsymbol{\mu}_{k}^{B}=\vartheta_{\mathbf{H}}(\vartheta_{\mathbf{H}}\mathbf{I}+\mathbf{V}^{B})^{-1}\cdot{\rm vec}(\mathbf{H}_{k}^{B,pri}). \qquad (40)

We note that the mean of {\rm vec}(\mathbf{H}_{k}) with respect to \mathcal{M}_{f_{k}^{B}}(\mathbf{H}_{k}) is

{\rm vec}(\mathbf{H}_{k}^{B,post})=\lambda_{k}^{B,post}\boldsymbol{\mu}_{k}^{B}. \qquad (41)

Define l=(m-1)Q+q. The variance of the (q,m)-th element of \mathbf{H}_{k} with respect to \mathcal{M}_{f_{k}^{B}}(\mathbf{H}_{k}) is

\vartheta_{h_{k,q,m}}^{B,post}=\lambda_{k}^{B,post}\left(|\mu_{k,l}^{B}|^{2}+\Phi_{k,l,l}^{B}\right)-|h_{k,q,m}^{B,post}|^{2}. \qquad (42)

Recall that the relationship between \mathbf{H}_{k} and \mathbf{h}_{m} is [\mathbf{H}_{1}^{T},\ldots,\mathbf{H}_{K}^{T}]^{T}=[\mathbf{h}_{1},\ldots,\mathbf{h}_{M}]. Through this relationship, we obtain \mathbf{h}_{m}^{B,post}. Following [12, 20], the variance v_{\mathbf{h}_{m}}^{B,post} is approximated by

v_{\mathbf{h}_{m}}^{B,post}=\frac{1}{KQ}\sum_{k,q}\vartheta_{h_{k,q,m}}^{B,post}. \qquad (43)

Then the belief of \mathbf{h}_{m} at Module B is defined as

\mathcal{M}_{B}(\mathbf{h}_{m})\sim\mathcal{CN}(\mathbf{h}_{m};\mathbf{h}_{m}^{B,post},v_{\mathbf{h}_{m}}^{B,post}\mathbf{I}). \qquad (44)

The above Gaussian belief assumption is widely used in message-passing-based iterative algorithms such as Turbo-CS [12], approximate message passing (AMP) [13], and expectation propagation (EP) [30]. Such a treatment may lose some information but facilitates the message updates. (Strictly speaking, a belief of a variable corresponds to a factor node rather than a module. However, the belief of \mathbf{h}_{m} is associated with K factor nodes p(\mathbf{H}_{k}|\alpha_{k}),\forall k, making the expression cumbersome. For convenience, we use the definition \mathcal{M}_{B}(\mathbf{h}_{m}).)
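A sketch of the Module B denoiser for a single device block, implementing (40)-(42) in the log domain for numerical stability, is given below. As a simplification it assumes one common input variance v for all entries, whereas \mathbf{V}^{B} in the paper carries a per-antenna variance.

```python
import numpy as np

def bg_denoise(H_pri, v, var_H, lam_pri):
    """Bernoulli-Gaussian denoiser per (40)-(42) for one Q x M block, assuming a
    common scalar input variance v (a simplification of the diagonal V^B)."""
    QM = H_pri.size
    # log of CN(0; H_pri, v I) / CN(0; H_pri, (v + var_H) I), cf. (40)
    log_ratio = (QM * np.log((v + var_H) / v)
                 - np.sum(np.abs(H_pri)**2) * (1.0 / v - 1.0 / (v + var_H)))
    lam_post = 1.0 / (1.0 + (1.0 - lam_pri) / lam_pri * np.exp(log_ratio))
    gain = var_H / (var_H + v)                 # mu = gain * H_pri, cf. (40)
    phi = var_H * v / (var_H + v)              # posterior variance if active
    H_post = lam_post * gain * H_pri           # (41)
    var_post = lam_post * (np.abs(gain * H_pri)**2 + phi) - np.abs(H_post)**2  # (42)
    return lam_post, H_post, var_post.mean()   # mean variance, in the spirit of (43)
```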

From the sum-product rule, we have \mathcal{M}_{\mathbf{h}_{m}\to B}(\mathbf{h}_{m})=\mathcal{M}_{\mathbf{y}_{m}\to\mathbf{h}_{m}}(\mathbf{h}_{m}). Then the extrinsic message is calculated as

\mathcal{M}_{B\to\mathbf{h}_{m}}(\mathbf{h}_{m})\propto\frac{\mathcal{M}_{B}(\mathbf{h}_{m})}{\mathcal{M}_{\mathbf{y}_{m}\to\mathbf{h}_{m}}(\mathbf{h}_{m})}. \qquad (45)

Given the Gaussian messages \mathcal{M}_{B}(\mathbf{h}_{m}) and \mathcal{M}_{\mathbf{h}_{m}\to B}(\mathbf{h}_{m}), the extrinsic message is also Gaussian, with its variance and mean respectively given by

v_{\mathbf{h}_{m}}^{B,ext}=\left(\frac{1}{v_{\mathbf{h}_{m}}^{B,post}}-\frac{1}{v_{\mathbf{h}_{m}}^{B,pri}}\right)^{-1}
\mathbf{h}_{m}^{B,ext}=v_{\mathbf{h}_{m}}^{B,ext}\left(\frac{\mathbf{h}_{m}^{B,post}}{v_{\mathbf{h}_{m}}^{B,post}}-\frac{\mathbf{h}_{m}^{B,pri}}{v_{\mathbf{h}_{m}}^{B,pri}}\right) \qquad (46)

where \mathbf{h}_{m}^{B,ext} and v_{\mathbf{h}_{m}}^{B,ext} are used as the input mean and variance of \mathbf{h}_{m} for Module A, i.e., \mathbf{h}_{m}^{A,pri}=\mathbf{h}_{m}^{B,ext} and v_{\mathbf{h}_{m}}^{A,pri}=v_{\mathbf{h}_{m}}^{B,ext}. Then we have

\mathcal{M}_{\mathbf{h}_{m}\to\mathbf{y}_{m}}(\mathbf{h}_{m})=\mathcal{M}_{B\to\mathbf{h}_{m}}(\mathbf{h}_{m})\sim\mathcal{CN}(\mathbf{h}_{m};\mathbf{h}_{m}^{A,pri},v_{\mathbf{h}_{m}}^{A,pri}\mathbf{I}). \qquad (47)

IV-D Module C: Denoiser of \mathbf{C}_{k}

Similarly to the process in Module B, each \mathbf{C}_{k},\forall k, in Module C is estimated individually by exploiting the prior p(\mathbf{C}_{k}|\alpha_{k}) and the messages from Modules A and D. For notational convenience, the factor node p(\mathbf{C}_{k}|\alpha_{k}) is denoted by f_{k}^{C}. The message from variable node \mathbf{C}_{k} to factor node p(\mathbf{C}_{k}|\alpha_{k}) is expressed as

\mathcal{M}_{\mathbf{C}_{k}\to f_{k}^{C}}(\mathbf{C}_{k})\sim\mathcal{CN}(\mathbf{C}_{k};\mathbf{C}_{k}^{C,pri},\mathbf{V}^{C}) \qquad (48)

where \mathbf{C}_{k}^{C,pri}\in\mathbb{C}^{Q\times M} with its (q,m)-th element c_{k,q,m}^{C,pri}=c_{k,q,m}^{A,ext}, and \mathbf{V}^{C}={\rm diag}([v_{\mathbf{c}_{1}}^{C,pri},\ldots,v_{\mathbf{c}_{M}}^{C,pri}]^{T}\otimes\mathbf{1}_{Q}).

With the Bernoulli-Gaussian prior p(\mathbf{C}_{k}|\alpha_{k}), the Gaussian message \mathcal{M}_{\mathbf{C}_{k}\to f_{k}^{C}}(\mathbf{C}_{k}), and the message \mathcal{M}_{\alpha_{k}\to f^{C}_{k}}(\alpha_{k}) in (58), the belief of \mathbf{C}_{k} at factor node f_{k}^{C} is Bernoulli-Gaussian and expressed as

\mathcal{M}_{f_{k}^{C}}(\mathbf{C}_{k})\propto\sum_{\alpha_{k}=0}^{1}p(\mathbf{C}_{k}|\alpha_{k})\mathcal{M}_{\alpha_{k}\to f^{C}_{k}}(\alpha_{k})\mathcal{M}_{\mathbf{C}_{k}\to f_{k}^{C}}(\mathbf{C}_{k})=(1-\lambda_{k}^{C,post})\delta(\mathbf{C}_{k})+\lambda_{k}^{C,post}\mathcal{CN}(\mathbf{C}_{k};\boldsymbol{\mu}_{k}^{C},\mathbf{\Phi}_{k}^{C}) \qquad (49)

where

\lambda_{k}^{C,post}=\left(1+\frac{(1-\lambda_{k}^{C,pri})\cdot\mathcal{CN}(\mathbf{0};\mathbf{C}_{k}^{C,pri},\mathbf{V}^{C})}{\lambda_{k}^{C,pri}\cdot\mathcal{CN}(\mathbf{0};\mathbf{C}_{k}^{C,pri},\mathbf{V}^{C}+\vartheta_{\mathbf{C}}\mathbf{I})}\right)^{-1}
\mathbf{\Phi}_{k}^{C}=(\vartheta_{\mathbf{C}}^{-1}\mathbf{I}+(\mathbf{V}^{C})^{-1})^{-1}
\boldsymbol{\mu}_{k}^{C}=\vartheta_{\mathbf{C}}(\vartheta_{\mathbf{C}}\mathbf{I}+\mathbf{V}^{C})^{-1}\cdot{\rm vec}(\mathbf{C}_{k}^{C,pri}). \qquad (50)

Then the mean of {\rm vec}(\mathbf{C}_{k}) with respect to \mathcal{M}_{f_{k}^{C}}(\mathbf{C}_{k}) is

{\rm vec}(\mathbf{C}_{k}^{C,post})=\lambda_{k}^{C,post}\boldsymbol{\mu}_{k}^{C}. \qquad (51)

The variance of the (q,m)-th element of \mathbf{C}_{k} is given by

\vartheta_{c_{k,q,m}}^{C,post}=\lambda_{k}^{C,post}\left(|\mu_{k,l}^{C}|^{2}+\Phi_{k,l,l}^{C}\right)-|c_{k,q,m}^{C,post}|^{2}. \qquad (52)

Similarly to (44), we define the belief of \mathbf{c}_{m} at Module C as

\mathcal{M}_{C}(\mathbf{c}_{m})\sim\mathcal{CN}(\mathbf{c}_{m};\mathbf{c}_{m}^{C,post},v_{\mathbf{c}_{m}}^{C,post}\mathbf{I}) \qquad (53)

with the variance v_{\mathbf{c}_{m}}^{C,post} given by

v_{\mathbf{c}_{m}}^{C,post}=\frac{1}{KQ}\sum_{k,q}\vartheta_{c_{k,q,m}}^{C,post}. \qquad (54)

Then we calculate the Gaussian extrinsic message with its variance and mean as follows:

v_{\mathbf{c}_{m}}^{C,ext}=\left(\frac{1}{v_{\mathbf{c}_{m}}^{C,post}}-\frac{1}{v_{\mathbf{c}_{m}}^{C,pri}}\right)^{-1}
\mathbf{c}_{m}^{C,ext}=v_{\mathbf{c}_{m}}^{C,ext}\left(\frac{\mathbf{c}_{m}^{C,post}}{v_{\mathbf{c}_{m}}^{C,post}}-\frac{\mathbf{c}_{m}^{C,pri}}{v_{\mathbf{c}_{m}}^{C,pri}}\right). \qquad (55)

The input mean and variance of \mathbf{c}_{m} for Module A are respectively set as \mathbf{c}_{m}^{A,pri}=\mathbf{c}_{m}^{C,ext} and v_{\mathbf{c}_{m}}^{A,pri}=v_{\mathbf{c}_{m}}^{C,ext}.

IV-E Module D: Estimation of \alpha_{k}

In Module D, we calculate the messages \mathcal{M}_{\alpha_{k}\to f^{B}_{k}}(\alpha_{k}) and \mathcal{M}_{\alpha_{k}\to f^{C}_{k}}(\alpha_{k}) as the inputs of Modules B and C, respectively. Furthermore, the device activity is detected by combining the messages from Modules B and C. According to the sum-product rule, the message from factor node f_{k}^{B} to variable node \alpha_{k} is

\mathcal{M}_{f_{k}^{B}\to\alpha_{k}}(\alpha_{k})\propto\int_{\mathbf{H}_{k}}f_{k}^{B}(\mathbf{H}_{k},\alpha_{k})\cdot\mathcal{M}_{\mathbf{H}_{k}\to f_{k}^{B}}(\mathbf{H}_{k})=(1-\pi_{k}^{B})\delta[\alpha_{k}]+\pi_{k}^{B}\delta[1-\alpha_{k}] \qquad (56)

where

\pi_{k}^{B}=\left(1+\frac{\mathcal{CN}(\mathbf{0};\mathbf{H}_{k}^{B,pri},\mathbf{V}^{B})}{\mathcal{CN}(\mathbf{0};\mathbf{H}_{k}^{B,pri},\mathbf{V}^{B}+\vartheta_{\mathbf{H}}\mathbf{I})}\right)^{-1}. \qquad (57)

Then the message from variable node \alpha_{k} to factor node f_{k}^{C} is

\mathcal{M}_{\alpha_{k}\to f_{k}^{C}}(\alpha_{k})\propto\mathcal{M}_{f^{B}_{k}\to\alpha_{k}}(\alpha_{k})\cdot p(\alpha_{k})=(1-\lambda_{k}^{C,pri})\delta[\alpha_{k}]+\lambda_{k}^{C,pri}\delta[1-\alpha_{k}] \qquad (58)

with

\lambda_{k}^{C,pri}=\frac{\lambda\cdot\pi_{k}^{B}}{\lambda\cdot\pi_{k}^{B}+(1-\lambda)\cdot(1-\pi_{k}^{B})}. \qquad (59)

Similarly to (56), the message from factor node f_{k}^{C} to variable node \alpha_{k} is

\mathcal{M}_{f_{k}^{C}\to\alpha_{k}}(\alpha_{k})=(1-\pi_{k}^{C})\delta[\alpha_{k}]+\pi_{k}^{C}\delta[1-\alpha_{k}] \qquad (60)

where

\pi_{k}^{C}=\left(1+\frac{\mathcal{CN}(\mathbf{0};\mathbf{C}_{k}^{C,pri},\mathbf{V}^{C})}{\mathcal{CN}(\mathbf{0};\mathbf{C}_{k}^{C,pri},\mathbf{V}^{C}+\vartheta_{\mathbf{C}}\mathbf{I})}\right)^{-1}. \qquad (61)

Then the message from variable node \alpha_{k} to factor node f^{B}_{k} is

\mathcal{M}_{\alpha_{k}\to f^{B}_{k}}(\alpha_{k})\propto\mathcal{M}_{f^{C}_{k}\to\alpha_{k}}(\alpha_{k})\cdot p(\alpha_{k})=(1-\lambda_{k}^{B,pri})\delta[\alpha_{k}]+\lambda_{k}^{B,pri}\delta[1-\alpha_{k}] \qquad (62)

with

\lambda_{k}^{B,pri}=\frac{\lambda\cdot\pi_{k}^{C}}{\lambda\cdot\pi_{k}^{C}+(1-\lambda)\cdot(1-\pi_{k}^{C})}. \qquad (63)

Define the belief of \alpha_{k} at factor node p(\alpha_{k}) as \mathcal{M}_{k}(\alpha_{k}). From the sum-product rule, \mathcal{M}_{k}(\alpha_{k}) is given by

\mathcal{M}_{k}(\alpha_{k})\propto\mathcal{M}_{f_{k}^{B}\to\alpha_{k}}(\alpha_{k})\mathcal{M}_{f_{k}^{C}\to\alpha_{k}}(\alpha_{k})p(\alpha_{k})=(1-\lambda_{k}^{D,post})\delta[\alpha_{k}]+\lambda_{k}^{D,post}\delta[1-\alpha_{k}] \qquad (64)

where

\lambda_{k}^{D,post}=\frac{\lambda\cdot\pi_{k}^{B}\cdot\pi_{k}^{C}}{\lambda\cdot\pi_{k}^{B}\cdot\pi_{k}^{C}+(1-\lambda)\cdot(1-\pi_{k}^{B})\cdot(1-\pi_{k}^{C})}. \qquad (65)

Clearly, \lambda_{k}^{D,post} (0\leq\lambda_{k}^{D,post}\leq 1) indicates the probability that the k-th device is active. Therefore, we adopt a threshold-based strategy for device activity detection as

\hat{\alpha}_{k}=\begin{cases}1,&\lambda_{k}^{D,post}\geq\lambda^{thr}\\0,&\lambda_{k}^{D,post}<\lambda^{thr}\end{cases}\qquad k=1,\ldots,K \qquad (66)

where \lambda^{thr} is a predetermined threshold.
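Modules B and C feed Module D through \pi_{k}^{B} and \pi_{k}^{C}, so the combination and detection rule (65)-(66) reduces to a few vectorized lines:

```python
import numpy as np

def detect_activity(pi_B, pi_C, lam, thr=0.5):
    """Combine pi_k^B and pi_k^C into lambda_k^{D,post} per (65) and threshold
    per (66). pi_B, pi_C: arrays over the K devices; thr is a design choice."""
    num = lam * pi_B * pi_C
    lam_post = num / (num + (1 - lam) * (1 - pi_B) * (1 - pi_C))
    return (lam_post >= thr).astype(int), lam_post
```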

IV-F Pilot Design and Complexity Analysis

The overall algorithm is summarized in Algorithm 1. In each iteration, the channel mean matrix \mathbf{H} is first updated (steps 2-8) and then the channel compensation matrix \mathbf{C} is updated (steps 9-15). This ordering is chosen because the power of the channel mean matrix \mathbf{H} is dominant, and the iteration process effectively suppresses error propagation. Note that the learning of the prior parameters is presented in the next section.

As mentioned in Section IV-B, the pilot matrix \mathbf{A} is required to be partial orthogonal. To fulfill this requirement, the pilot symbols \{a_{k,n}^{(t)}\}_{k=1}^{K} transmitted on the n-th subcarrier at the t-th OFDM symbol satisfy

[a_{1,n}^{(t)},\ldots,a_{k,n}^{(t)},\ldots,a_{K,n}^{(t)}]=\mathbf{u}_{i} \qquad (67)

where \mathbf{u}_{i} is a row vector randomly selected from an orthogonal matrix \mathbf{U}\in\mathbb{C}^{K\times K}, and the selected row is different for different (n,t). Combining \mathbf{A}=[\mathbf{\Lambda}_{1}\mathbf{E}_{1},\ldots,\mathbf{\Lambda}_{K}\mathbf{E}_{1}] and the definition of \mathbf{\Lambda}_{k} in (7), it is easy to verify the partial orthogonality of \mathbf{A}.

We next show that when \mathbf{U} is the discrete Fourier transform (DFT) matrix, the algorithm complexity can be further reduced. To enable the fast Fourier transform (FFT), the pilot matrix \mathbf{A} is expressed as

\mathbf{A}={\rm diag}(\mathbf{S}_{1}\mathbf{U},\ldots,\mathbf{S}_{Q}\mathbf{U})\mathbf{P} \qquad (68)

where \mathbf{S}_{q}\in\mathbb{R}^{TN/Q\times K} is a row selection matrix consisting of TN/Q randomly selected rows of the K\times K identity matrix. (The selected rows are different for different \mathbf{S}_{q}.) \mathbf{P}=[\mathbf{P}_{1},\ldots,\mathbf{P}_{K}] is a column exchange matrix. Specifically, in the q-th column of \mathbf{P}_{k}\in\mathbb{R}^{KQ\times Q},\forall k, only the (k+K(q-1))-th row is one while the others are zeros. Note that |a_{k,n}^{(t)}|^{2}=P, and matrix \mathbf{B} admits the same fast transform as \mathbf{A} due to the relationship \mathbf{B}=\mathbf{D}\mathbf{A}.
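The construction can be verified numerically. The sketch below assigns a distinct row of a scaled K\times K DFT matrix to every (subcarrier, symbol) pair per (67), assembles \mathbf{A}=[\mathbf{\Lambda}_{1}\mathbf{E}_{1},\ldots,\mathbf{\Lambda}_{K}\mathbf{E}_{1}] by brute force rather than via (68), and checks \mathbf{A}\mathbf{A}^{H}=KP\mathbf{I}. The dimensions are illustrative and chosen so that NT\leq K.

```python
import numpy as np

K, N, T, Q, P = 64, 8, 4, 4, 1.0
k_idx = np.arange(K)
U = np.sqrt(P) * np.exp(-2j * np.pi * np.outer(k_idx, k_idx) / K)     # scaled DFT matrix
rows = np.random.default_rng(2).choice(K, size=N * T, replace=False)  # distinct row per (n,t)
pilots = U[rows].reshape(N, T, K)            # pilots[n, t, k] = a_{k,n}^{(t)}, cf. (67)
E1 = np.kron(np.eye(Q), np.ones((N // Q, 1)))
A = np.zeros((T * N, Q * K), dtype=complex)
for k in range(K):
    Lam = np.zeros((T * N, N), dtype=complex)   # block diagonal Lambda_k, cf. (7)
    for n in range(N):
        Lam[n * T:(n + 1) * T, n] = pilots[n, :, k]
    A[:, k * Q:(k + 1) * Q] = Lam @ E1
err = np.linalg.norm(A @ A.conj().T - K * P * np.eye(T * N))
print(f"||A A^H - KP I||_F = {err:.2e}")     # ~0: partial orthogonality holds
```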

By using the FFT, the estimates of \mathbf{h}_{m}\in\mathbb{C}^{QK} and \mathbf{c}_{m}\in\mathbb{C}^{QK},\forall m, in Module A have complexity \mathcal{O}(QMK\log_{2}K). In Modules B, C, and D, the estimates of \mathbf{H}_{k}\in\mathbb{C}^{Q\times M}, \mathbf{C}_{k}\in\mathbb{C}^{Q\times M}, and \alpha_{k},\forall k, involve vector multiplications with complexity \mathcal{O}(QMK). As a result, the overall complexity of Turbo-MP is \mathcal{O}(QMK\log_{2}K)+\mathcal{O}(QMK) per iteration. It is worth noting that the algorithm complexity is linear in the antenna number M and approximately linear in the device number K.

Algorithm 1 Turbo Message Passing (Turbo-MP)
Input: \mathbf{Y}, \mathbf{A}, \mathbf{B}.
1: Initialize \boldsymbol{\theta} by the EM or NN approach.
while the stopping criterion is not met do
  % Module A: Linear estimation of \mathbf{h}_{m}
  2: Update \mathbf{h}_{m}^{A,post}, v_{\mathbf{h}_{m}}^{A,post} by (19) and (21)-(22), \forall m.
  3: Update \mathbf{h}_{m}^{A,ext}, v_{\mathbf{h}_{m}}^{A,ext} by (25), \forall m.
  % Module B: Denoiser of \mathbf{H}_{k}
  4: Update \mathbf{H}_{k}^{B,pri}, \mathbf{V}^{B} by (29), \forall k.
  5: Update \mathcal{M}_{f_{k}^{C}\to\alpha_{k}}, \mathcal{M}_{\alpha_{k}\to f_{k}^{B}} by (51)-(54), \forall k.
  6: Update \mathbf{H}_{k}^{B,post}, v_{h_{k,q,m}}^{B,post} by (30)-(33), \forall k.
  7: Update \mathbf{h}_{m}^{B,post}, v_{\mathbf{h}_{m}}^{B,post} by (34), \forall m.
  8: Update \mathbf{h}_{m}^{B,ext}, v_{\mathbf{h}_{m}}^{B,ext} by (37), \forall m.
  % Module A: Linear estimation of \mathbf{c}_{m}
  9: Update \mathbf{c}_{m}^{A,post}, v_{\mathbf{c}_{m}}^{A,post} by (21) and (26), \forall m.
  10: Update \mathbf{c}_{m}^{A,ext}, v_{\mathbf{c}_{m}}^{A,ext} by (27), \forall m.
  % Module C: Denoiser of \mathbf{C}_{k}
  11: Update \mathbf{C}_{k}^{C,pri}, \mathbf{V}^{C} by (39), \forall k.
  12: Update \mathcal{M}_{f_{k}^{B}\to\alpha_{k}}, \mathcal{M}_{\alpha_{k}\to f_{k}^{C}} by (47)-(50), \forall k.
  13: Update \mathbf{C}_{k}^{C,post}, v_{c_{k,q,m}}^{C,post} by (40)-(43), \forall k.
  14: Update \mathbf{c}_{m}^{C,post}, v_{\mathbf{c}_{m}}^{C,post} by (45), \forall m.
  15: Update \mathbf{c}_{m}^{C,ext}, v_{\mathbf{c}_{m}}^{C,ext} by (46), \forall m.
  % Parameter learning
  16: Update \boldsymbol{\theta} by the EM or NN approach.
end while
17: Update \lambda_{k}^{D,post} by (55) and (56), \forall k.
Output: \mathbf{H}_{k}^{B,post}, \mathbf{C}_{k}^{C,post}, and \lambda_{k}^{D,post}, \forall k.

V Parameters Learning

The prior parameters \boldsymbol{\theta}=\{\vartheta_{\mathbf{H}},\vartheta_{\mathbf{C}},\sigma_{w}^{2},\lambda\} used in Turbo-MP are unknown and need to be estimated in practice. In the following, we utilize two machine learning methods, namely the EM and the NN approaches, to learn these prior parameters.

V-A EM Approach

We first use the EM algorithm [23] to learn \boldsymbol{\theta}. The EM process is described as

\boldsymbol{\theta}^{(i+1)}=\arg\max_{\boldsymbol{\theta}}\mathbb{E}\left[\ln p(\mathbf{Y},\mathbf{H},\mathbf{C},\boldsymbol{\alpha};\boldsymbol{\theta})\mid\mathbf{Y};\boldsymbol{\theta}^{(i)}\right] \qquad (69)

where \boldsymbol{\theta}^{(i)} is the estimate of \boldsymbol{\theta} at the i-th EM iteration, and \mathbb{E}[\cdot|\mathbf{Y};\boldsymbol{\theta}^{(i)}] represents the expectation over the posterior distribution p(\mathbf{H},\mathbf{C},\boldsymbol{\alpha}|\mathbf{Y};\boldsymbol{\theta}^{(i)}). Note that it is difficult to obtain the true posterior distribution. Instead, we utilize the message product \prod_{k}\mathcal{M}_{f_{k}^{B}}(\mathbf{H}_{k})\mathcal{M}_{f_{k}^{C}}(\mathbf{C}_{k})\mathcal{M}_{k}(\alpha_{k}) as an approximation. Then we set the derivative of \mathbb{E}[\ln p(\mathbf{H},\mathbf{C},\mathbf{Y};\boldsymbol{\theta})\mid\mathbf{Y};\boldsymbol{\theta}^{(i)}] with respect to \vartheta_{\mathbf{H}} to zero and obtain

\vartheta_{\mathbf{H}}^{(i+1)}=\frac{\sum_{k}\lambda_{k}^{D,post}\left(||\mathbf{H}_{k}^{B,post}||_{F}^{2}+\sum_{q,m}\vartheta_{h_{k,q,m}}^{B,post}\right)}{QM\sum_{k}\lambda_{k}^{D,post}}. \qquad (70)

Similarly, the EM estimate of ϑ𝐂\vartheta_{\mathbf{C}} is given by

ϑ𝐂(i+1)=kλkD,post(𝐂kC,postF2+q,mϑck,q,mC,post)QMkλkD,post.\displaystyle\vartheta_{\mathbf{C}}^{(i+1)}=\frac{\sum_{k}\lambda_{k}^{D,post}\left(||\mathbf{C}_{k}^{C,post}||_{F}^{2}+\sum_{q,m}\vartheta_{c_{k,q,m}}^{C,post}\right)}{QM\sum_{k}\lambda_{k}^{D,post}}. (71)

The EM estimate of σw2\sigma_{w}^{2} is given by

(σw2)(i+1)=\displaystyle(\sigma_{w}^{2})^{(i+1)}= 1MTN𝐘𝐀𝐇B,post𝐁𝐂C,postF2\displaystyle\frac{1}{MTN}\Big{|}\Big{|}\mathbf{Y}-\mathbf{A}\mathbf{H}^{B,post}-\mathbf{B}\mathbf{C}^{C,post}\Big{|}\Big{|}_{F}^{2}
+KMm(v𝐡mB,post+1TNi=1TNDi,i2v𝐜mC,post).\displaystyle+\frac{K}{M}\sum_{m}\big{(}v_{\mathbf{h}_{m}}^{B,post}+\frac{1}{TN}\sum_{i=1}^{TN}D_{i,i}^{2}v_{\mathbf{c}_{m}}^{C,post}\big{)}. (72)

The EM estimate of $\lambda$ is given by

\lambda^{(i+1)}=\frac{1}{K}\sum_{k}\lambda_{k}^{D,post}. \qquad (73)

The Turbo-MP algorithm with the EM approach to learn $\boldsymbol{\theta}$, referred to as Turbo-MP-EM, is summarized in Algorithm 1. In practice, we can update $\mathbf{H}$ several times and then update $\mathbf{C}$ once to improve the algorithm stability. Besides, it is known that the estimation accuracy of $\vartheta_{\mathbf{H}}$, $\vartheta_{\mathbf{C}}$, and $\lambda$ by EM relies on the accuracy of the posterior approximation in Turbo-MP, and inaccurate estimates may harm the algorithm convergence. We therefore recommend updating $\vartheta_{\mathbf{H}}$, $\vartheta_{\mathbf{C}}$, and $\lambda$ once after $\mathbf{H}$ and $\mathbf{C}$ have been updated several times. In the first algorithm iteration, we initialize $\vartheta_{\mathbf{H}}^{(0)}=1$, $\vartheta_{\mathbf{C}}^{(0)}=10^{-3}$, and $\lambda^{(0)}=0.1$. The noise variance $\sigma_{w}^{2}$ is updated in every Turbo-MP iteration with the initialization $(\sigma_{w}^{2})^{(0)}=\|\mathbf{Y}\|_{F}^{2}/MTN$. However, we find that the second term of (72), i.e., $\frac{K}{M}\sum_{m}\big(v_{\mathbf{h}_{m}}^{B,post}+\frac{1}{TN}\sum_{i=1}^{TN}D_{i,i}^{2}v_{\mathbf{c}_{m}}^{C,post}\big)$, is negligible in most cases, while in the worst case it keeps growing with the Turbo-MP iterations. We therefore drop this term in the simulations.
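As a concrete illustration, the following NumPy sketch implements the M-step updates (70)-(73) from the Turbo-MP posterior outputs. The array names and shapes are our own assumptions, and the variance correction term of (72) is dropped, as discussed above.

```python
# Minimal NumPy sketch of the EM updates (70)-(73). Assumed shapes:
# H_post, C_post, v_H, v_C: (K, Q, M) posterior means/variances;
# lam_post: (K,) activity posteriors lambda_k^{D,post};
# Y: (T*N, M); A, B: (T*N, K*Q).
import numpy as np

def em_m_step(Y, A, B, H_post, v_H, C_post, v_C, lam_post, Q, M):
    denom = Q * M * lam_post.sum()
    # (70): activity-weighted second moment of the mean variables
    theta_H = np.sum(lam_post * ((np.abs(H_post) ** 2).sum(axis=(1, 2))
                                 + v_H.sum(axis=(1, 2)))) / denom
    # (71): same update for the slope variables
    theta_C = np.sum(lam_post * ((np.abs(C_post) ** 2).sum(axis=(1, 2))
                                 + v_C.sum(axis=(1, 2)))) / denom
    # (72) with the variance correction term dropped, as in our simulations
    H_mat = H_post.reshape(-1, M)   # stack per-device blocks to match A
    C_mat = C_post.reshape(-1, M)
    sigma_w2 = np.linalg.norm(Y - A @ H_mat - B @ C_mat, 'fro') ** 2 / Y.size
    # (73): activity rate = average posterior activity probability
    lam = lam_post.mean()
    return theta_H, theta_C, sigma_w2, lam
```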

V-B NN Approach

In deep learning [3], training data can be used to train the parameters of a deep neural network. Inspired by the ideas in [31] and [32], we unfold the iterations of the Turbo-MP algorithm into a feed-forward NN, termed Turbo-MP-NN. Specifically, each iteration corresponds to one layer of the feed-forward NN, and $\boldsymbol{\theta}$ serves as the network parameter. With an appropriately defined loss function, Turbo-MP-NN can adaptively learn $\boldsymbol{\theta}$ from the training data.

Figure 4: The block diagram of Turbo-MP-NN. The iterations of Turbo-MP are unfolded into a neural network. Capital letters A, B, C, and D denote modules A, B, C, and D, respectively.

The structure of Turbo-MP-NN is shown in Fig. 4. It consists of $L$ layers, each with the same structure, i.e., the same linear and non-linear operations following steps 2-15 in Algorithm 1. To distinguish the estimates at different layers, denote by $\mathbf{H}^{A,pri,(i)}$, $\mathbf{C}^{A,pri,(i)}$, and $\pi_{k}^{C,(i)}$ the quantities $\mathbf{H}^{A,pri}$, $\mathbf{C}^{A,pri}$, and $\pi_{k}^{C}$ obtained at the $(i-1)$-th layer (iteration). In the $i$-th layer, the inputs consist of the training data $\mathbf{Y}$ and the outputs of the $(i-1)$-th layer, namely $\mathbf{H}^{A,pri,(i)}$, $\mathbf{C}^{A,pri,(i)}$, and $\{\pi_{k}^{C,(i)}\}_{k=1}^{K}$. The loss function is defined as the normalized mean square error (NMSE) of channel estimation, given by

f(\boldsymbol{\theta})=\sum_{k}\frac{\|\mathbf{G}_{k}-\mathbf{E}_{1}\hat{\mathbf{H}}_{k}^{(L)}-\mathbf{E}_{2}\hat{\mathbf{C}}_{k}^{(L)}\|_{F}^{2}}{\|\mathbf{G}_{k}\|_{F}^{2}}, \qquad (74)

where the estimates $\hat{\mathbf{H}}_{k}^{(L)}$ and $\hat{\mathbf{C}}_{k}^{(L)}$ can be either $\mathbf{H}_{k}^{B,post}$ and $\mathbf{C}_{k}^{C,post}$ or $\mathbf{H}_{k}^{A,pri}$ and $\mathbf{C}_{k}^{C,pri}$ obtained in the $L$-th layer. Note that, different from the EM approach, all layers of the NN share the same $\boldsymbol{\theta}$. To avoid over-fitting, we train the neural network by the layer-by-layer method [32]: we begin with the training of the first layer, then the first two layers, and finally all $L$ layers. $\boldsymbol{\theta}$ is initialized following (63). For the first $i$ layers, $i=1,2,\ldots,L$, we optimize $\boldsymbol{\theta}$ by back-propagation to minimize the loss function

f^{(i)}(\boldsymbol{\theta})=\sum_{k}\frac{\|\mathbf{G}_{k}-\mathbf{E}_{1}\hat{\mathbf{H}}_{k}^{(i)}-\mathbf{E}_{2}\hat{\mathbf{C}}_{k}^{(i)}\|_{F}^{2}}{\|\mathbf{G}_{k}\|_{F}^{2}}. \qquad (75)

The detailed training process is given in Algorithm 2. Once trained, Turbo-MP-NN iterates as in Algorithm 1. From the simulations, we find that for a fixed sub-block number $Q$, the learned $\vartheta_{\mathbf{H}}$ and $\vartheta_{\mathbf{C}}$ vary linearly with the signal-to-noise ratio ${\rm SNR}=P/\sigma_{N}^{2}$, and the learned $\sigma_{w}^{2}$ minus $\sigma_{N}^{2}$ also varies linearly with the SNR. As such, there is no need to train the prior parameters offline at every SNR point; see the sketch below.
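For instance, the following is a minimal sketch of reusing trained parameters across SNR, under the assumption that the linearity holds in the fitted variable; the training points and values below are placeholders, not results from the paper.

```python
# Hedged sketch: fit a line to the learned theta_H at a few trained SNRs and
# evaluate at any other SNR. The numbers are illustrative placeholders.
import numpy as np

snr_trained = np.array([5.0, 10.0, 15.0])      # SNRs used for offline training
theta_H_trained = np.array([0.8, 1.0, 1.2])    # placeholder learned values

slope, intercept = np.polyfit(snr_trained, theta_H_trained, deg=1)

def theta_H_at(snr):
    """Linear interpolation/extrapolation of the learned parameter."""
    return slope * snr + intercept
```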

Algorithm 2 Parameter Training of Turbo-MP-NN via Layer-by-Layer Method
Input: $\mathbf{Y}$, $\mathbf{A}$, $\mathbf{B}$.
Output: $\boldsymbol{\theta}$.
1: Initialize $\boldsymbol{\theta}$.
2: for $i\in[1,L]$ do
3:   Run $i$ iterations of the Turbo-MP algorithm following steps 2-15 in Algorithm 1 to obtain $\widehat{\mathbf{H}}_{k}^{(i)}$ and $\widehat{\mathbf{C}}_{k}^{(i)}$.
4:   With the loss function $f^{(i)}(\boldsymbol{\theta})$, use back-propagation to update $\boldsymbol{\theta}$.
5: end for
6: return $\boldsymbol{\theta}$.
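Below is a hedged PyTorch sketch of this layer-by-layer procedure. `turbo_mp_layers` is a hypothetical stand-in for steps 2-15 of Algorithm 1 (assumed to be implemented with differentiable torch operations on $\boldsymbol{\theta}$), and the minibatch tuple layout is our own assumption.

```python
# Hedged PyTorch sketch of Algorithm 2 (layer-by-layer training).
import torch

def turbo_mp_layers(Y, A, B, theta, n_layers):
    """Hypothetical stand-in: run steps 2-15 of Algorithm 1 for n_layers
    iterations with differentiable torch ops and return (H_hat, C_hat)."""
    raise NotImplementedError("supply the Turbo-MP layer updates here")

def nmse_loss(G, H_hat, C_hat, E1, E2):
    # Loss (75): per-device NMSE of the reconstructed channel, summed over k
    err = G - E1 @ H_hat - E2 @ C_hat
    return (err.abs().pow(2).sum(dim=(-2, -1))
            / G.abs().pow(2).sum(dim=(-2, -1))).sum()

def train_layer_by_layer(batches, theta_init, L, lr=1e-3):
    theta = {k: torch.nn.Parameter(v.clone()) for k, v in theta_init.items()}
    optimizer = torch.optim.Adam(theta.values(), lr=lr)
    for i in range(1, L + 1):               # first layer, first two layers, ...
        for Y, A, B, G, E1, E2 in batches:  # minibatches of channel realizations
            H_hat, C_hat = turbo_mp_layers(Y, A, B, theta, n_layers=i)
            loss = nmse_loss(G, H_hat, C_hat, E1, E2)
            optimizer.zero_grad()
            loss.backward()                 # back-propagate through i layers
            optimizer.step()
    return theta
```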

VI Numerical Results

In this section, we evaluate the performance of the proposed Turbo-MP algorithm in channel estimation and device activity detection. The simulation setup is as follows. The BS is equipped with $M=4$ or $8$ antennas. $P=72$ OFDM subcarriers are allocated for the random access with subcarrier spacing $\Delta f=15$ kHz. $K=1000$ devices access the BS and transmit their signals with probability $\lambda=0.05$ in each timeslot. We adopt the TDL-C channel model with 300 ns RMS delay spread; its detailed PDP can be found in TR 38.901 R14 [27]. For Turbo-MP-NN, we randomly generate $10^{4}$ channel realizations for training, and the training data is divided into minibatches of size 4. To evaluate the performance of the proposed algorithm, we use the NMSE and the detection error probability ${\rm P_e}={\rm P_{miss}}+{\rm P_{false}}$ as the performance metrics, where ${\rm P_{miss}}=\frac{1}{K}\sum_{k}p(\hat{\alpha}_{k}=0\mid\alpha_{k}=1)$ is the probability of miss detection and ${\rm P_{false}}=\frac{1}{K}\sum_{k}p(\hat{\alpha}_{k}=1\mid\alpha_{k}=0)$ is the probability of false alarm. For the figures showing CE performance, each data point is averaged over 5000 realizations; for the figures showing ADD performance, each data point accumulates more than $10^{3}$ detection errors.
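For clarity, a minimal NumPy sketch of these two metrics follows; the variable names are illustrative, with `alpha`/`alpha_hat` the true/detected 0-1 activity vectors and `G`/`G_hat` the stacked true/estimated channels.

```python
# Minimal sketch of the performance metrics used in this section.
import numpy as np

def detection_error_probability(alpha, alpha_hat):
    K = alpha.size
    p_miss = np.sum((alpha == 1) & (alpha_hat == 0)) / K   # P_miss
    p_false = np.sum((alpha == 0) & (alpha_hat == 1)) / K  # P_false
    return p_miss + p_false                                # Pe = P_miss + P_false

def nmse_db(G, G_hat):
    # Channel estimation NMSE in dB
    return 10 * np.log10(np.linalg.norm(G - G_hat) ** 2
                         / np.linalg.norm(G) ** 2)
```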

The baseline algorithms are as follows:

  • FD-GMMV-AMP: The GMMV-AMP algorithm proposed in [18] jointly estimates the channel response on every subcarrier and detects the active devices based on the frequency-domain system model (8). We assume that the prior parameters, including the channel variance on every subcarrier, the noise variance $\sigma_{N}^{2}$, and the access probability $\lambda$, are known to FD-GMMV-AMP. (The following algorithms also treat $\sigma_{N}^{2}$ and $\lambda$ as known.)

  • TD-Gturbo-MMV: We adopt the Gturbo-MMV algorithm [11] for ADD and CE based on the time-domain system model (II-A), which we refer to as TD-Gturbo-MMV. Note that the denoiser in [11] is extended and applied to $\tilde{\mathbf{H}}_{k}$. We further assume that the BS knows the delay spread of the time-domain channel, so that the time-domain delay taps can be truncated to reduce the number of channel coefficients to be estimated. Specifically, 8 delay taps are kept, 4 at the head and 4 at the tail, which contain 99% of the energy of the time-domain channel. As a Bayesian algorithm, TD-Gturbo-MMV requires the PDP of the time-domain channel as prior information. One option is to assume that the exact PDP is known. However, in practice, the exact PDP of each device is difficult to acquire. Therefore, we also consider using a flat PDP with 8 equal-power delay taps as the prior information.

  • TD-Vector AMP: The Vector AMP algorithm [9] is adopted as another baseline under the time-domain system model (II-A). Similarly, we extend and apply the denoiser in [9] to $\tilde{\mathbf{H}}_{k}$. TD-Vector AMP with the exact PDP and with the flat PDP are both considered.

Figure 5: Channel estimation NMSE versus the iteration number. The number of OFDM symbols $T=8$, the sub-block number $Q=4$, ${\rm SNR}=10$ dB, and the number of BS antennas $M=8$.

Fig. 5 shows the channel estimation NMSE versus the iteration number. Both Turbo-MP-EM and Turbo-MP-NN converge and significantly outperform the other algorithms, by more than 6 dB in CE NMSE. Turbo-MP-NN converges faster than Turbo-MP-EM, implying that the neural network approach obtains more accurate prior parameters. In addition, FD-GMMV-AMP performs quite poorly since the average number of unknown variables on every subcarrier is much larger than the number of measurements, i.e., $\lambda K\gg T$. Concerning TD-Gturbo-MMV and TD-Vector AMP, there is a performance loss when the PDP is not exactly known.

Figure 6: Channel estimation NMSE versus SNR. The number of OFDM symbols $T=8$ and the sub-block number $Q=4$.
Figure 7: Detection error probability versus SNR. The number of OFDM symbols $T=8$ and the sub-block number $Q=4$.

To further demonstrate the performance superiority of Turbo-MP, Fig. 6 shows the channel estimation NMSE against the SNR. As the SNR increases, the performance gap between Turbo-MP and the baselines widens, which suggests that a BS adopting the Turbo-MP algorithm can reach a given CE accuracy with devices transmitting at much lower power. It is also found that the CE performance of each algorithm is similar for the different numbers of BS antennas. Fig. 7 shows the detection error probability versus the SNR. As the antenna number $M$ increases, the detection performance of the tested algorithms improves by more than one order of magnitude. This is because more BS antennas lead to a larger dimension of the block-sparse vectors. Among the tested algorithms, Turbo-MP-NN outperforms the others, especially when $M=8$.

Figure 8: Channel estimation NMSE versus the number of OFDM symbols $T$. ${\rm SNR}=10$ dB and the sub-block number $Q=4$.
Figure 9: Detection error probability versus the number of OFDM symbols $T$. ${\rm SNR}=-15$ dB and the sub-block number $Q=4$.

Fig. 8 shows the channel estimation NMSE versus the number of OFDM symbols. There is a clear performance gap between Turbo-MP and the baselines at every $T$. Moreover, to reach ${\rm NMSE}=-15$ dB, the pilot overhead ($T=5$) of Turbo-MP is only half of that ($T=10$) of TD-Vector AMP and TD-Gturbo-MMV with the exact PDP. This shows that the proposed scheme can support mMTC with a dramatically reduced overhead. Fig. 9 shows the detection error probability versus the number of OFDM symbols. The trend is similar to Fig. 7: Turbo-MP achieves the best ADD performance at every tested number of OFDM symbols.

Figure 10: Channel estimation NMSE versus the sub-block number $Q$ at different SNRs. The number of BS antennas $M=8$ and the number of OFDM symbols $T=10$.
Figure 11: Probability of miss detection versus probability of false alarm for different sub-block numbers and antenna numbers. ${\rm SNR}=-15$ dB and the number of OFDM symbols $T=10$.

Fig. 10 illustrates the impact of the sub-block number on the CE performance at different SNRs, revealing a trade-off between the estimation accuracy and the model accuracy. Specifically, at low SNR a smaller sub-block number yields better NMSE performance, while the opposite holds at high SNR. The reason is that at low SNR the NMSE is dominated by the AWGN, in which case a small sub-block number (i.e., fewer variables) improves the estimation accuracy; at high SNR, the CE performance is instead limited by the model mismatch, which can be reduced by increasing the sub-block number. In practice, a proper sub-block number needs to be chosen according to the wireless environment. Fig. 11 shows the miss detection probability versus the false alarm probability, where we vary the threshold $\lambda^{thr}$ to trade off miss detection against false alarm (a sketch of this sweep is given below). Clearly, Turbo-MP-NN outperforms Turbo-MP-EM under the different settings of the sub-block number and the antenna number. Besides, Turbo-MP with the smaller sub-block number achieves better ADD performance at ${\rm SNR}=-15$ dB, consistent with the CE performance in Fig. 10. From Fig. 10 and Fig. 11, we find that the algorithms achieve excellent device detection performance at relatively low SNR, while accurate channel estimation requires a higher SNR. This observation suggests that when the BS is equipped with multiple antennas, the bottleneck in mMTC is CE rather than ADD.
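Below is a hedged sketch of such a threshold sweep, assuming the posterior activity probabilities $\lambda_{k}^{D,post}$ are available as a vector; all names are illustrative.

```python
# Hedged sketch of the threshold sweep behind Fig. 11: vary lambda_thr over
# the posterior activity probabilities and record the resulting
# (P_false, P_miss) operating points.
import numpy as np

def miss_false_curve(lam_post, alpha, num_points=50):
    points = []
    for thr in np.linspace(0.0, 1.0, num_points):
        alpha_hat = (lam_post > thr).astype(int)   # declare active if above threshold
        p_miss = np.mean((alpha == 1) & (alpha_hat == 0))
        p_false = np.mean((alpha == 0) & (alpha_hat == 1))
        points.append((p_false, p_miss))
    return points
```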

VII Conclusion

In this paper, a frequency-domain block-wise linear channel model was established for the MIMO-OFDM-based grant-free NOMA system to effectively compensate for the channel frequency-selectivity with a small number of variables to be determined in channel estimation. From the perspective of Bayesian inference, we designed the low-complexity Turbo-MP algorithm to solve the joint ADD and CE problem, where machine learning is incorporated to learn the prior parameters. Numerical results show that Turbo-MP designed for the proposed block-wise linear model significantly outperforms the state-of-the-art counterpart algorithms.

References

  • [1] C. Bockelmann, N. Pratas, H. Nikopour, K. Au, T. Svensson, C. Stefanovic, P. Popovski, and A. Dekorsy, “Massive machine-type communications in 5G: Physical and MAC-layer solutions,” IEEE Commun. Mag., vol. 54, no. 9, pp. 59–65, Sep. 2016.
  • [2] Study on New Radio Access Technology, Std. TR 38.901 version 14.2.0 Release 14, 3GPP, Sep. 2017.
  • [3] B. Wang, L. Dai, Y. Zhang, T. Mir, and J. Li, “Dynamic compressive sensing-based multi-user detection for uplink grant-free NOMA,” IEEE Commun. Lett., vol. 20, no. 11, pp. 2320–2323, Nov. 2016.
  • [4] Y. Du, C. Cheng, B. Dong, Z. Chen, X. Wang, J. Fang, and S. Li, “Block-sparsity-based multiuser detection for uplink grant-free NOMA,” IEEE Trans. Wireless Commun., vol. 17, no. 12, pp. 7894–7909, Dec. 2018.
  • [5] B. K. Jeong, B. Shim, and K. B. Lee, “MAP-based active user and data detection for massive machine-type communications,” IEEE Trans. Veh. Technol., vol. 67, no. 9, pp. 8481–8494, Sep. 2018.
  • [6] L. Liu, C. Yuen, Y. L. Guan, Y. Li, and C. Huang, “Gaussian message passing for overloaded massive MIMO-NOMA,” IEEE Trans. Wireless Commun., vol. 18, no. 1, pp. 210–226, Jan. 2018.
  • [7] L. Liu, E. G. Larsson, W. Yu, P. Popovski, C. Stefanovic, and E. de Carvalho, “Sparse signal processing for grant-free massive connectivity: A future paradigm for random access protocols in the internet of things,” IEEE Signal Process. Mag, vol. 35, no. 5, pp. 88–99, Sep. 2018.
  • [8] D. L. Donoho, “Compressed sensing,” IEEE Trans. Inf. Theory, vol. 52, no. 4, pp. 1289–1306, April 2006.
  • [9] L. Liu and W. Yu, “Massive connectivity with massive MIMO-part I: Device activity detection and channel estimation,” IEEE Trans. Signal Process., vol. 66, no. 11, pp. 2933–2946, June 2018.
  • [10] Z. Chen, F. Sohrabi, and W. Yu, “Sparse activity detection for massive connectivity,” IEEE Transactions on Signal Processing, vol. 66, no. 7, pp. 1890–1904, April 2018.
  • [11] T. Liu, S. Jin, C. Wen, M. Matthaiou, and X. You, “Generalized channel estimation and user detection for massive connectivity with mixed-ADC massive MIMO,” IEEE Trans. Wireless Commun., vol. 18, no. 6, pp. 3236–3250, June 2019.
  • [12] J. Ma, X. Yuan, and L. Ping, “Turbo compressed sensing with partial DFT sensing matrix,” IEEE Signal Process. Lett., vol. 22, no. 2, pp. 158–161, Feb. 2015.
  • [13] M. Bayati and A. Montanari, “The dynamics of message passing on dense graphs, with applications to compressed sensing,” IEEE Trans. Inf. Theory, vol. 57, no. 2, pp. 764–785, Feb. 2011.
  • [14] Q. Zou, H. Zhang, D. Cai, and H. Yang, “A low-complexity joint user activity, channel and data estimation for grant-free massive MIMO systems,” IEEE Signal Process. Lett., vol. 27, pp. 1290–1294, July 2020.
  • [15] T. Ding, X. Yuan, and S. C. Liew, “Sparsity learning-based multiuser detection in grant-free massive-device multiple access,” IEEE Trans. Wireless Commun., vol. 18, no. 7, pp. 3569–3582, July 2019.
  • [16] Y. Zhang, Q. Guo, Z. Wang, J. Xi, and N. Wu, “Block sparse Bayesian learning based joint user activity detection and channel estimation for grant-free NOMA systems,” IEEE Trans. Veh. Technol., vol. 67, no. 10, pp. 9631–9640, Oct. 2018.
  • [17] Z. Zhang, Y. Li, C. Huang, Q. Guo, C. Yuen, and Y. L. Guan, “DNN-aided block sparse Bayesian learning for user activity detection and channel estimation in grant-free non-orthogonal random access,” IEEE Trans. Veh. Technol., vol. 68, no. 12, pp. 12 000–12 012, Dec. 2019.
  • [18] M. Ke, Z. Gao, Y. Wu, X. Gao, and R. Schober, “Compressive sensing-based adaptive active user detection and channel estimation: Massive access meets massive MIMO,” IEEE Trans. Signal Process., vol. 68, pp. 764–779, Jan 2020.
  • [19] X. Kuai, L. Chen, X. Yuan, and A. Liu, “Structured turbo compressed sensing for downlink massive MIMO-OFDM channel estimation,” IEEE Trans. Wireless Commun., vol. 18, no. 8, pp. 3813–3826, Aug. 2019.
  • [20] Z. Xue, X. Yuan, and Y. Yang, “Denoising-based turbo message passing for compressed video background subtraction,” IEEE Trans. Image Process., vol. 30, pp. 2682–2696, Feb. 2021.
  • [21] R. Ratasuk, N. Mangalvedhe, D. Bhatoolaul, and A. Ghosh, “LTE-M evolution towards 5G massive MTC,” in Proc. 2017 IEEE Globecom Workshops (GC Wkshps), 2017, pp. 1–6.
  • [22] F. R. Kschischang, B. J. Frey, and H.-A. Loeliger, “Factor graphs and the sum-product algorithm,” IEEE Trans. Inf. Theory, vol. 47, no. 2, pp. 498–519, Feb. 2001.
  • [23] T. K. Moon, “The expectation-maximization algorithm,” IEEE Signal Processing Magazine, vol. 13, no. 6, pp. 47–60, Nov 1996.
  • [24] M. Chen, Y. Miao, Y. Hao, and K. Hwang, “Narrow band internet of things,” IEEE Access, vol. 5, pp. 20 557–20 577, Sep. 2017.
  • [25] P. Popovski, K. F. Trillingsgaard, O. Simeone, and G. Durisi, “5G wireless network slicing for eMBB, URLLC, and mMTC: A communication-theoretic view,” IEEE Access, vol. 6, pp. 55 765–55 779, Sep. 2018.
  • [26] Y. Li, L. J. Cimini, and N. R. Sollenberger, “Robust channel estimation for OFDM systems with rapid dispersive fading channels,” IEEE Trans. Commun., vol. 46, no. 7, pp. 902–915, July 1998.
  • [27] 5G; Study on channel model for frequencies from 0.5 to 100 GHz, Std. TR 38.901 version 14.0.0 Release 14, 3GPP, May 2017.
  • [28] S. M. Kay, Fundamentals of Statistical Signal Processing: Estimation Theory.   Prentice Hall International Editions, 1993.
  • [29] C. Berrou and A. Glavieux, “Near optimum error correcting coding and decoding: turbo-codes,” IEEE Trans. Commun., Oct. 1996.
  • [30] T. P. Minka, “Expectation propagation for approximate Bayesian inference,” arXiv preprint arXiv:1301.2294, Jan. 2013.
  • [31] M. Borgerding, P. Schniter, and S. Rangan, “AMP-inspired deep networks for sparse linear inverse problems,” IEEE Trans. Signal Process., vol. 65, no. 16, pp. 4293–4308, August 2017.
  • [32] X. He, Z. Xue, and X. Yuan, “Learned turbo message passing for affine rank minimization and compressed robust principal component analysis,” IEEE Access, vol. 7, pp. 140 606–140 617, 2019.