
Message Passing-Based Joint User Activity Detection and Channel Estimation for Temporally-Correlated Massive Access

Weifeng Zhu, Meixia Tao, Xiaojun Yuan, and Yunfeng Guan

This paper was presented in part at the IEEE International Conference on Communications (ICC) 2021 [1]. W. Zhu, M. Tao, and Y. Guan are with the Department of Electronic Engineering, Shanghai Jiao Tong University, Shanghai 200240, China (e-mail: {wf.zhu, mxtao, yfguan69}@sjtu.edu.cn). X. Yuan is with the Center for Intelligent Networking and Communication (CINC), University of Electronic Science and Technology of China, Chengdu 610000, China (e-mail: [email protected]).
Abstract

This paper studies the user activity detection and channel estimation problem in a temporally-correlated massive access system, where a very large number of users communicate with a base station sporadically and each user, once activated, is likely to keep transmitting over multiple consecutive frames. We formulate the problem as a dynamic compressed sensing (DCS) problem to exploit both the sparsity and the temporal correlation of user activity. By leveraging the hybrid generalized approximate message passing (HyGAMP) framework, we design a computationally efficient algorithm, HyGAMP-DCS, to solve this problem. In contrast to methods that exploit only the historical estimates, the proposed algorithm performs bidirectional message passing between neighboring frames to update the activity likelihoods, thereby fully exploiting the temporally-correlated user activities. Furthermore, we develop an expectation maximization HyGAMP-DCS (EM-HyGAMP-DCS) algorithm to adaptively learn the hyperparameters during the estimation procedure when the system statistics are unknown. In particular, we propose to utilize the analysis tool of state evolution to find the appropriate hyperparameter initialization of EM-HyGAMP-DCS. Simulation results demonstrate that our proposed algorithms can significantly improve the user activity detection accuracy and reduce the channel estimation error.

Index Terms:
Temporally-correlated massive access, user activity detection, channel estimation, hybrid generalized approximate message passing (HyGAMP), expectation maximization (EM).

I Introduction

Massive machine-type communications (mMTC) is one of the main use cases of the fifth-generation (5G) cellular networks for Internet of Things (IoT) applications [2]. It aims to provide wireless connectivity to a massive number of IoT devices, whose traffic is typically sporadic [3, 4]. The main technical challenge of mMTC is to design scalable, efficient, and low-latency multiple-access schemes. Due to the massive number of devices, conventional grant-based random access schemes suffer from excessive delay and signaling overhead, so grant-free random access is considered a promising solution to decrease the access delay and reduce the control overhead for coordination [4].

In the grant-free random access protocol, a unique pilot sequence for identification and channel estimation is pre-allocated to each device. Note that, due to the massive number of IoT devices and the limited coherence time interval, the pilot sequences are usually non-orthogonal. When a device is activated, it directly transmits its dedicated pilot followed by data without waiting for a grant from the base station (BS). Meanwhile, the BS can perform joint user activity detection and channel estimation based on the received signals. Given the sporadic traffic generated by IoT devices, the joint user activity detection and channel estimation problem can naturally be cast into a large-scale sparse signal recovery problem, which can be reliably solved by compressed sensing (CS) techniques [4].

In many practical IoT systems, once a device is activated, it often transmits continuously over multiple frames. This suggests that the device activities are often temporally-correlated. To exploit the temporally-correlated user activity, we are motivated to detect the active users and estimate their channels in multiple consecutive frames jointly. The joint user activity detection and channel estimation problem can be formulated from the dynamic compressed sensing (DCS) perspective [5, 6], which takes both the sporadic user activity and the temporal correlation of user activities into account. Based on the probabilistic and graphical model of the signals, the joint user activity detection and channel estimation can be realized by the standard message passing (MP) algorithm [7]. To simplify the implementation of MP in large-scale systems, we propose to leverage the hybrid generalized approximate message passing (HyGAMP) framework [8], which employs a hybrid of AMP and MP on the graphical model. Considering that perfect system statistics are usually unknown, the expectation maximization (EM) technique is incorporated into the algorithm to adaptively learn the hyperparameters during the estimation procedure. Compared with the existing DCS-based algorithms, we show that the proposed algorithms can significantly improve the user activity detection accuracy and reduce the channel estimation error.

I-A Related Works

To accomplish joint user activity detection and channel estimation for massive access, a variety of advanced sparse signal recovery algorithms have been proposed for different wireless communication systems. For the single-cell scenario, the works [9, 10] utilize the standard CS algorithms of orthogonal matching pursuit (OMP) and basis pursuit denoising (BPDN). However, these two kinds of algorithms usually suffer from high computational complexity due to the extremely large number of devices. The works [11, 12] then propose computationally efficient AMP algorithms with Bayesian denoisers for user activity detection. In the work [13], a message-scheduling generalized AMP algorithm is developed to further reduce the computational complexity without degrading the detection and estimation performance. The AMP-based algorithms are also extended to perform data detection along with activity detection in [14, 15]. Besides the AMP-based algorithms, covariance matching pursuit [16, 17], dimension reduction-based optimization [18], and deep learning methods [19, 20, 21] are also investigated for performance enhancement. In particular, the covariance matching pursuit algorithm [16, 17] and the dimension reduction-based optimization algorithm [18] are especially designed for massive multiple-input multiple-output (MIMO) systems and can significantly outperform the conventional CS-based methods. The deep learning methods [19, 20, 21] employ artificial neural networks and learn the network parameters from pre-collected data, which enables them to outperform traditional hand-designed algorithms while enjoying low computational complexity. Moreover, a sparsity-constrained method is proposed in [22] for the non-ideal scenario where different users have unknown frequency offsets. In multi-cell systems, cooperative user activity detection and channel estimation is studied in [23, 24] based on AMP-based algorithms. Recently, unsourced random access, as a new random access protocol, is also studied in [25, 16], where all users share a common codebook and the BS only needs to detect the transmitted codewords.

The aforementioned works all detect the user activity in each frame individually by assuming that the user activity is independent across frames. Due to low-rate transmission, however, the information transmission of one active user may occupy multiple consecutive frames. Such temporal correlation of user activity can be exploited for performance enhancement. Assuming the user sparsity level and the channel state information (CSI) are available, the work [26] proposes a DCS-based algorithm where the set of detected active users in the previous slot is used as the initial set to be detected in the current slot. By adaptively exploiting the prior support based on the corresponding support quality information, the work [27] proposes a prior-information-aided adaptive subspace pursuit (PIA-ASP) algorithm, which consistently outperforms the DCS-based algorithm in [26]. These algorithms involve matrix inversion in each iteration, and thus suffer from high computational complexity in large-scale problems. Moreover, both works [26, 27] assume that the CSI is perfectly known in advance, which is unrealistic in massive access. By leveraging the system statistics, the authors in [28] propose a sequential AMP (S-AMP) algorithm to sequentially perform user activity detection and channel estimation in each frame. The work [29] then proposes to extract side information (SI) from the estimation results of the previous frame to enhance the user activity detection performance in the current frame based on the AMP framework. Different from [29], which only utilizes the historical estimation results as SI, our previous work [30] proposes to extract double-sided information by considering the estimation from the next frame as well, thereby further exploiting the temporal correlation.

Compared with these prior works [26, 27, 29, 28, 30] that exploit the temporal correlation for user activity detection and channel estimation, this paper differs mainly in the following two aspects. First, while all the existing works perform activity detection in a frame-by-frame manner (i.e., the user activity is still detected sequentially frame by frame, even though the temporal correlation among adjacent frames has been exploited), this paper performs multi-frame activity detection in a block-by-block manner. Second, this paper proposes to adaptively learn the parameters of the system statistics by using the EM technique, while the previous algorithms either do not consider the system statistics [26, 27] or assume the system statistics are perfectly known [29, 28, 30].

I-B Main Contributions

In this work, we consider the problem of joint user activity detection and channel estimation in multiple consecutive frames for the temporally-correlated massive access systems. The main contributions and distinctions of this work are summarized as follows:

  1.

    We propose to perform the user activity detection and channel estimation jointly in multiple consecutive frames and formulate it as a DCS problem. Based on the probabilistic model, the formulation captures not only the sparse activity pattern but also the statistical relationships of the activities across these consecutive frames.

  2.

    In contrast to [28, 29], which only make use of the historical estimates, this paper proposes a HyGAMP-DCS algorithm to fully exploit the temporally-correlated user activities in the multiple consecutive frames. The HyGAMP-DCS algorithm combines the computationally efficient GAMP algorithm for channel estimation with the standard MP algorithm for the activity likelihood update. In particular, the activity likelihood values in each frame are updated at each iteration by aggregating both the forward messages from the previous frame and the backward messages from the next frame. Numerical results show that HyGAMP-DCS achieves superior performance to the existing DCS-based algorithms [26, 27, 28, 29] thanks to the bidirectional message propagation.

  3.

    To make the proposed algorithm applicable to practical systems with imperfect system statistics, this paper incorporates the EM algorithm into HyGAMP-DCS to adaptively learn the hyperparameters during the estimation procedure. Compared with the traditional EM-AMP algorithm [31] for the CS problem, the proposed EM-HyGAMP-DCS algorithm additionally learns the statistical dependencies between the activities in neighboring frames. Simulation results demonstrate that EM-HyGAMP-DCS achieves performance very close to that of HyGAMP-DCS, which requires perfect system statistics.

  4.

    We provide the performance and complexity analysis of the proposed algorithms. We first introduce the state evolution (SE) to predict the asymptotic performance of HyGAMP-DCS and EM-HyGAMP-DCS. In particular, we point out that the SE is essential for finding the appropriate hyperparameter initialization of EM-HyGAMP-DCS, which is validated by the numerical results. A computational complexity comparison with state-of-the-art methods is then given to illustrate the computational efficiency of our proposed algorithms.

I-C Organizations and Notations

The remaining part of this paper is organized as follows. Section II introduces the system model of temporally-correlated massive access. In Section III, we present the probabilistic model of the considered system and then introduce the Bayesian inference methods. In Section IV, the HyGAMP-DCS algorithm is proposed, then the EM algorithm is employed for hyperparameter update in Section V. Next, we analyze the performance and the computational complexity of the proposed algorithms in Section VI. The simulation results of the proposed algorithms are shown in Section VII. Finally, we conclude this paper in Section VIII.

In this paper, upper-case and lower-case letters denote random variables and their realizations, respectively. Boldface lower-case and upper-case letters, e.g., $\mathbf{x}$ and $\mathbf{X}$, denote vectors and matrices, respectively. Superscripts $(\cdot)^{T}$ and $(\cdot)^{*}$ denote transpose and conjugate, respectively. Further, $\mathbb{E}[\cdot]$ and $\mathbb{V}[\cdot]$ denote the expectation and variance operations, respectively; $|\cdot|$ denotes the magnitude of a variable. In addition, a random vector $\mathbf{x}\in\mathbb{C}^{M\times 1}$ drawn from the complex Gaussian distribution with mean $\mathbf{x}_{0}\in\mathbb{C}^{M\times 1}$ and covariance matrix $\boldsymbol{\Sigma}\in\mathbb{C}^{M\times M}$ is characterized by the probability density function $\mathcal{CN}(\mathbf{x};\mathbf{x}_{0},\boldsymbol{\Sigma})=\frac{\exp\left(-(\mathbf{x}-\mathbf{x}_{0})^{H}\boldsymbol{\Sigma}^{-1}(\mathbf{x}-\mathbf{x}_{0})\right)}{\pi^{M}|\boldsymbol{\Sigma}|}$.

II System Model

Figure 1: Illustration of a temporally-correlated massive access system.

We consider a single-cell massive access system where a large number $N$ of users communicate with a common BS under the grant-free mechanism. The BS and the users are all equipped with a single antenna. We assume that the transmit signals of all users are perfectly synchronized at the BS. The block fading channel model is adopted, i.e., the channel coefficients of each communication link remain unchanged within a transmission frame but vary across frames. Due to the sporadic traffic of IoT applications, only a small portion of users are active for uplink transmission in each time frame while the others keep silent.

II-A Temporal Correlation of the User Activity

As shown in Fig. 1, once a user is activated, its data transmission can occupy multiple consecutive frames. As such, there exist temporal correlations of the user activities in neighboring frames.

Similar to [28, 29, 30, 6], the dynamic user activity over frames is modeled as a stationary first-order Markov chain with two discrete states. The Markov chains of different devices are assumed to be independent and identically distributed (i.i.d.). Let $\lambda_{n,t}\in\{0,1\}$ indicate whether user $n\in\{1,2,\dots,N\}$ is active or not in the $t$th frame. The steady-state probability is given by $\text{Pr}(\lambda_{n,t}=1)=p_{a}$, where $p_{a}$ denotes the active probability of each user $n$ in frame $t$. The transition probability matrix is defined as

\mathbf{P}=\left[\begin{array}{cc}p_{00}&p_{10}\\ p_{01}&p_{11}\end{array}\right], \qquad (1)

where $p_{10}=\text{Pr}\{\lambda_{n,t}=0|\lambda_{n,t-1}=1\}$ and the other three transition probabilities are defined similarly. By the steady-state assumption, the Markov chain can be completely characterized by the two parameters $p_{a}$ and $p_{10}$. The other three transition probabilities can be easily obtained as $p_{01}=p_{a}p_{10}/(1-p_{a})$, $p_{00}=1-p_{01}$, and $p_{11}=1-p_{10}$. Note that $1/p_{10}$ can be regarded as the expected number of consecutive frames occupied by the data transmission of one user, so a smaller $p_{10}$ leads to a stronger temporal correlation of the user activity over frames. Although the temporal correlation could be described more precisely by higher-order Markov chains, we consider the first-order Markov chain here for simplicity.
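As a quick illustration of this activity model, the following minimal NumPy sketch (our own; the function name and parameter values are hypothetical, not from the paper) simulates the i.i.d. two-state Markov chains and verifies that the empirical activity fraction matches the steady-state probability $p_{a}$:

```python
import numpy as np

def simulate_activity(N, T, p_a, p10, rng=None):
    """Simulate N i.i.d. two-state activity Markov chains over T frames."""
    rng = np.random.default_rng(0) if rng is None else rng
    p01 = p_a * p10 / (1 - p_a)              # inactive -> active transition probability
    lam = np.zeros((N, T), dtype=int)
    lam[:, 0] = rng.random(N) < p_a          # start from the steady state
    for t in range(1, T):
        u = rng.random(N)
        # stay active w.p. p11 = 1 - p10; activate w.p. p01
        lam[:, t] = np.where(lam[:, t - 1] == 1, u < 1 - p10, u < p01)
    return lam

lam = simulate_activity(N=4000, T=100, p_a=0.1, p10=0.2)
print(lam.mean())  # empirical activity fraction, close to p_a = 0.1
```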

II-B Multi-Frame Dynamic Signal Model

In order to exploit the temporally-correlated user activities, we consider $T$ consecutive frames for signal transmission with $T\geq 2$. In each frame $t\in\{1,2,\dots,T\}$, the transmission is divided into two phases, the pilot phase and the data phase. Each user $n$ is pre-assigned a unique pilot sequence $\mathbf{a}_{n}=[a_{n,1},a_{n,2},\dots,a_{n,L}]^{T}\in\mathbb{C}^{L\times 1}$ of length $L$. The BS performs user identification and channel estimation based on the detected pilot sequences in the pilot phase and then decodes the data in the data phase. In this work, we concentrate on the joint user activity detection and channel estimation problem in the pilot phase. Due to the limited orthogonal resources, the pilot length $L$ is assumed to be much smaller than the number of users $N$, i.e., $L\ll N$, so that the pilot sequences of different users cannot be mutually orthogonal. The pilot sequences in this work are generated from i.i.d. complex Gaussian distributions with zero mean and variance $1/L$, and then scaled to have unit power.

The channel in each frame is assumed to follow an independent Rayleigh distribution. We denote the channel coefficient between user $n$ and the BS in the $t$th frame as $h_{n,t}=\sqrt{\beta_{n}}g_{n,t}$, where $\beta_{n}=P_{n}\gamma_{n}$ is the effective large-scale fading component determined by the transmit power $P_{n}$ and the large-scale attenuation $\gamma_{n}$, and $g_{n,t}\in\mathbb{C}$ is the small-scale fading coefficient following the i.i.d. circularly symmetric complex Gaussian distribution with zero mean and unit variance, i.e., $g_{n,t}\sim\mathcal{CN}(0,1)$. We adopt the power control strategy in [14], so that the effective large-scale fading component of each user is the same, i.e., $\beta_{n}=\beta,\forall n$. Then the received pilot signals at the BS over the $T$ consecutive frames can be written as

\mathbf{Y}=\mathbf{A}\left(\mathbf{\Lambda}\odot\mathbf{H}\right)+\mathbf{W}\triangleq\mathbf{A}\mathbf{X}+\mathbf{W}, \qquad (2)

where $\mathbf{A}=[\mathbf{a}_{1},\mathbf{a}_{2},\dots,\mathbf{a}_{N}]\in\mathbb{C}^{L\times N}$ is the pilot matrix of all users; $\mathbf{\Lambda}=[\boldsymbol{\lambda}_{1},\boldsymbol{\lambda}_{2},\dots,\boldsymbol{\lambda}_{T}]\in\mathbb{C}^{N\times T}$ is the activity matrix with $\boldsymbol{\lambda}_{t}=[\lambda_{1,t},\lambda_{2,t},\dots,\lambda_{N,t}]^{T}\in\mathbb{C}^{N\times 1}$; $\mathbf{H}=[\mathbf{h}_{1},\mathbf{h}_{2},\dots,\mathbf{h}_{T}]\in\mathbb{C}^{N\times T}$ is the channel matrix with $\mathbf{h}_{t}=[h_{1,t},h_{2,t},\dots,h_{N,t}]^{T}\in\mathbb{C}^{N\times 1}$; $\mathbf{X}=[\mathbf{x}_{1},\mathbf{x}_{2},\dots,\mathbf{x}_{T}]\in\mathbb{C}^{N\times T}$ is the effective channel matrix defined as the Hadamard product of $\mathbf{\Lambda}$ and $\mathbf{H}$ with entries $x_{n,t}=\lambda_{n,t}h_{n,t}$; and $\mathbf{W}=[\mathbf{w}_{1},\mathbf{w}_{2},\dots,\mathbf{w}_{T}]\in\mathbb{C}^{L\times T}$ is the additive white complex Gaussian noise (AWGN) matrix, each element of which has zero mean and variance $\sigma^{2}_{w}$.

Our objective is to jointly detect the active users and estimate their channels by recovering $\mathbf{X}$ from the received signal $\mathbf{Y}$ given the pilot matrix $\mathbf{A}$. This underdetermined linear inverse problem can be referred to as a dynamic compressed sensing (DCS) problem, where the support of the vector $\mathbf{x}_{t}$ changes slowly with $t$. The DCS problem can be solved by optimization-based methods using convex relaxation and standard convex problem solvers, e.g., the dynamic LASSO [5]. It can also be solved by greedy algorithms based on OMP and SP that employ the detected active user set of the previous frame as prior information to assist the detection in the current frame [26, 27]. These two approaches can outperform the traditional CS-based algorithms but are unable to fully utilize the system statistics, including the Markov chain-modeled user activity and the channel distributions. To improve upon the optimization-based methods and the greedy algorithms, we employ the Bayesian inference method in this work, whose details are given in the next section.
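For concreteness, a minimal sketch of generating a synthetic instance of the observation model (2) (our own illustration, assuming NumPy and the simulate_activity helper sketched in Section II-A; the dimensions are hypothetical) is:

```python
import numpy as np

rng = np.random.default_rng(1)
L, N, T = 100, 1000, 5                      # hypothetical system dimensions
p_a, p10, beta, sigma2_w = 0.1, 0.2, 1.0, 0.1

# Pilot matrix: i.i.d. complex Gaussian entries, columns scaled to unit power
A = (rng.standard_normal((L, N)) + 1j * rng.standard_normal((L, N))) / np.sqrt(2 * L)
A /= np.linalg.norm(A, axis=0, keepdims=True)

Lam = simulate_activity(N, T, p_a, p10, rng)      # Markov-chain user activities
H = np.sqrt(beta / 2) * (rng.standard_normal((N, T)) + 1j * rng.standard_normal((N, T)))
X = Lam * H                                       # effective channel matrix, X = Lambda (Hadamard) H
W = np.sqrt(sigma2_w / 2) * (rng.standard_normal((L, T)) + 1j * rng.standard_normal((L, T)))
Y = A @ X + W                                     # received pilot signals, eq. (2)
```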

III Bayesian Inference

In this section, the probabilistic model of the temporally-correlated massive access system is first introduced, and then we introduce the Bayesian inference methods used to solve the considered DCS problem.

First, we represent all the variables and their relationships from the probability perspective. Since the block fading Rayleigh channel is assumed, the prior probability of the effective channel coefficient of user $n$ in the $t$th frame, $x_{n,t}$, can be characterized by an independent Bernoulli-Gaussian distribution as

p(x_{n,t}|\beta,\lambda_{n,t})=(1-\lambda_{n,t})\delta(x_{n,t})+\lambda_{n,t}\mathcal{CN}(x_{n,t};0,\beta),\quad\forall n,t, \qquad (3)

where $\delta(\cdot)$ is the Dirac delta function. Note that more complicated channel models can also be considered, in which case we can approximate the channel distribution by a Gaussian mixture distribution [31]. The conditional probability $p(\mathbf{y}_{t}|\mathbf{x}_{t})$ is obtained from the AWGN channel as

p(\mathbf{y}_{t}|\mathbf{x}_{t})=\mathcal{CN}(\mathbf{y}_{t};\mathbf{A}\mathbf{x}_{t},\sigma^{2}_{w}\mathbf{I}_{L}),\quad\forall t. \qquad (4)

By combining the Markov chain-modeled user activities, the probabilistic signal model describing the linear system (2) can finally be derived as

p(\mathbf{Y},\mathbf{X},\boldsymbol{\Lambda})=\prod_{t=1}^{T}\left(p(\mathbf{y}_{t}|\mathbf{x}_{t})\prod_{n=1}^{N}p(x_{n,t}|\lambda_{n,t})p(\lambda_{n,t}|\lambda_{n,t-1})\right), \qquad (5)

where it is noted that $p(\lambda_{n,1}|\lambda_{n,0})=p(\lambda_{n,1})=(1-p_{a})(1-\lambda_{n,1})+p_{a}\lambda_{n,1}$, and $p(\lambda_{n,t}|\lambda_{n,t-1})$ is given in the Appendix.

Based on the probabilistic model (5), the optimal performance of user activity detection and channel estimation under the minimum mean square error (MMSE) principle can be realized by Bayesian inference. Specifically, we first follow Bayes' rule to obtain the joint posterior probability of $\mathbf{X}$ and $\mathbf{\Lambda}$ as

p(\mathbf{X},\mathbf{\Lambda}|\mathbf{Y})=\frac{p(\mathbf{Y},\mathbf{X},\boldsymbol{\Lambda})}{p(\mathbf{Y})}, \qquad (6)

where the marginal probability of the received signal $\mathbf{Y}$ is derived by

p(\mathbf{Y})=\int_{\mathbf{X}}\int_{\mathbf{\Lambda}}p(\mathbf{Y},\mathbf{X},\boldsymbol{\Lambda})\,d\mathbf{X}\,d\mathbf{\Lambda}. \qquad (7)

Given the posterior marginal probability $p(x_{n,t}|\mathbf{Y})=\int_{\mathbf{\Lambda}}d\mathbf{\Lambda}\int_{\mathbf{X}_{\setminus x_{n,t}}}p(\mathbf{X},\mathbf{\Lambda}|\mathbf{Y})\,d\mathbf{X}_{\setminus x_{n,t}}$, the MMSE estimator for $x_{n,t}$ can be expressed as

\widehat{x}_{n,t}=\int x_{n,t}\cdot p(x_{n,t}|\mathbf{Y})\,dx_{n,t},\quad\forall n,t. \qquad (8)

The estimator (8) achieves the minimum (Bayesian) MSE defined as

\text{MSE}(\mathbf{X})=\frac{1}{NT}\mathbb{E}\left[\|\widehat{\mathbf{X}}-\mathbf{X}\|_{F}^{2}\right], \qquad (9)

where the expectation is computed over the joint distribution $p(\mathbf{X},\mathbf{\Lambda},\mathbf{Y})$ expressed in (5). Then the activity likelihood $\widehat{p}_{n,t}$ of user $n$ in frame $t$ can be obtained as

\widehat{p}_{n,t}=p(\lambda_{n,t}=1|\mathbf{Y}),\quad\forall n,t, \qquad (10)

where $p(\lambda_{n,t}|\mathbf{Y})=\int_{\mathbf{X}}d\mathbf{X}\int_{\mathbf{\Lambda}_{\setminus\lambda_{n,t}}}p(\mathbf{X},\mathbf{\Lambda}|\mathbf{Y})\,d\mathbf{\Lambda}_{\setminus\lambda_{n,t}}$. By comparing (10) with a predefined threshold, the user activities can finally be decided.

Due to the fact that the number of users in massive access is very large, both the MMSE estimator (8) and the user activity likelihood calculator (10) involve very high-dimensional integrals and are thus intractable. In the next section, we will design a computationally efficient HyGAMP-based algorithm to approximately derive (8) and (10) for our problem.

IV The HyGAMP-DCS Algorithm

Figure 2: The factor graph representation of the joint posterior distribution $p(\mathbf{X},\mathbf{\Lambda}|\mathbf{Y})$.

Before illustrating the algorithm design, we first give the factor graph representation of the joint probability $p(\mathbf{Y},\mathbf{X},\boldsymbol{\Lambda})$ with the decomposition (5), as shown in Fig. 2. Specifically, each black rectangle represents a factor node corresponding to the function $p(y_{l,t}|\mathbf{x}_{t})$, $p(x_{n,t}|\beta,\lambda_{n,t})$, or $p(\lambda_{n,t}|\lambda_{n,t-1})$, and each blank circle represents a variable node associated with the random variable $y_{l,t}$, $x_{n,t}$, or $\lambda_{n,t}$. Intuitively, the factor graph suggests using the MP algorithm [7] to approximately obtain (8) and (10). The standard MP algorithm iteratively updates the messages conveyed between adjacent nodes and aggregates the messages arriving at the nodes $\{x_{n,t}\}$ and $\{\lambda_{n,t}\}$ to obtain their posterior probabilities, which avoids the extremely high-dimensional integrals in (8) and (10). However, the calculation of the messages $\mu^{i}_{(n,t)\rightarrow(l,t)}(x_{n,t})$ and $\upsilon^{i}_{(n,t)\leftarrow(l,t)}(x_{n,t})$ in the dense bipartite graph formed by the nodes $\{x_{n,t}\}$ and $\{p(y_{l,t}|\mathbf{x}_{t})\}$ is still intractable since $L$ and $N$ are large in massive access. Fortunately, the HyGAMP framework [8] can be leveraged to significantly simplify the implementation of the standard MP algorithm. Its key idea is to partition the edges of the factor graph into strong and weak subsets: the messages passed on the weak edges are simplified by AMP-style approximations, while the messages passed on the strong edges are still updated by the standard MP rules.

By following the idea of the HyGAMP framework, we propose a HyGAMP-DCS algorithm tailored to the considered DCS problem. Compared with the conventional HyGAMP algorithm in [8], the MP equations in HyGAMP-DCS are re-derived based on the statistical dependencies of the signals in our considered system. The HyGAMP-DCS algorithm can be divided into a GAMP part and an MP part according to the message updating rules, where these two parts perform channel estimation and activity likelihood update, respectively. By exchanging extrinsic messages between these two parts, the performance of both user activity detection and channel estimation can be improved. In the following, the details of the message passing equations in these two parts are introduced.

TABLE I: Message Definitions in the Factor Graph at the $i$th Iteration
$\mu^{i}_{(n,t)\rightarrow(l,t)}(x_{n,t})$: message from $x_{n,t}$ to $p(y_{l,t}|\mathbf{x}_{t})$
$\upsilon^{i}_{(n,t)\leftarrow(l,t)}(x_{n,t})$: message from $p(y_{l,t}|\mathbf{x}_{t})$ to $x_{n,t}$
$\mu^{i}_{(n,t)\rightarrow(n,t)}(x_{n,t})$: message from $x_{n,t}$ to $p(x_{n,t}|\beta,\lambda_{n,t})$
$\upsilon^{i}_{(n,t)\leftarrow(n,t)}(x_{n,t})$: message from $p(x_{n,t}|\beta,\lambda_{n,t})$ to $x_{n,t}$
$\mu^{i}_{(n,t)\rightarrow(n,t)}(\lambda_{n,t})$: message from $\lambda_{n,t}$ to $p(x_{n,t}|\beta,\lambda_{n,t})$
$\upsilon^{i}_{(n,t)\leftarrow(n,t)}(\lambda_{n,t})$: message from $p(x_{n,t}|\beta,\lambda_{n,t})$ to $\lambda_{n,t}$
$\xi^{i}_{(n,t)\rightarrow(n,t)}(\lambda_{n,t})$: message from $p(\lambda_{n,t}|\lambda_{n,t-1})$ to $\lambda_{n,t}$
$\zeta^{i}_{(n,t)\leftarrow(n,t)}(\lambda_{n,t})$: message from $\lambda_{n,t}$ to $p(\lambda_{n,t}|\lambda_{n,t-1})$
$\xi^{i}_{(n,t)\rightarrow(n,t+1)}(\lambda_{n,t})$: message from $\lambda_{n,t}$ to $p(\lambda_{n,t+1}|\lambda_{n,t})$
$\zeta^{i}_{(n,t)\leftarrow(n,t+1)}(\lambda_{n,t})$: message from $p(\lambda_{n,t+1}|\lambda_{n,t})$ to $\lambda_{n,t}$

IV-A GAMP Part

Following the HyGAMP framework, the GAMP part regards the edges between the nodes $\{x_{n,t}\}$ and $\{p(y_{l,t}|\mathbf{x}_{t})\}$ as weak edges thanks to their linearizable coupling. The messages $\mu^{i}_{(n,t)\rightarrow(l,t)}(x_{n,t})$ and $\upsilon^{i}_{(n,t)\leftarrow(l,t)}(x_{n,t})$ are then simplified by the GAMP approximations based on the central limit theorem and Taylor expansion.

The basic GAMP estimation is given in lines 8-17 of Algorithm 1, which can incorporate an arbitrary distribution on the input $\mathbf{X}$ and output $\mathbf{Y}$ [32]. Compared with the standard MP algorithm, the number of update variables in the GAMP estimation shrinks from $\mathcal{O}(LN)$ to $\mathcal{O}(L+N)$. In lines 8-11, the GAMP estimation updates the noiseless signal $\mathbf{Z}=\mathbf{A}\mathbf{X}$. The distribution of each element $z_{l,t}$ is obtained as $\mathcal{CN}(z_{l,t};\widehat{p}_{l,t}(i),\tau^{p}_{l,t}(i))$ from the messages from the variable nodes $\{x_{n,t}\}$, where $\widehat{p}_{l,t}(i)$ is the plug-in estimate of $z_{l,t}$ with variance $\tau^{p}_{l,t}(i)$ at the $i$th iteration. By accounting for the AWGN channel between $\mathbf{Z}$ and $\mathbf{Y}$, the explicit expressions of the MMSE estimator for $z_{l,t}$ and its variance in lines 10-11 are available as

\widehat{z}^{0}_{l,t}(i)=\frac{y_{l,t}\tau^{p}_{l,t}(i)+\widehat{p}_{l,t}(i)\sigma^{2}_{w}}{\tau^{p}_{l,t}(i)+\sigma^{2}_{w}}, \qquad (11)

\tau^{z}_{l,t}(i)=\frac{\tau^{p}_{l,t}(i)\sigma^{2}_{w}}{\tau^{p}_{l,t}(i)+\sigma^{2}_{w}}. \qquad (12)

After updating the noiseless signal $z_{l,t}$, the mean and variance of each element of the residual signal, obtained by removing the estimated $\mathbf{X}$ from $\mathbf{Y}$, are given in lines 12-13. Lines 14-15 then update the plug-in estimate $\widehat{r}_{n,t}$ of the true signal $x_{n,t}$, which is modeled as $\widehat{r}_{n,t}(i)=x_{n,t}+n_{n,t}(i)$ with $n_{n,t}(i)\sim\mathcal{CN}(0,\tau^{r}_{n,t}(i))$. To obtain the MMSE estimator of $\mathbf{X}$, we can write the approximate posterior marginal distribution of $x_{n,t}$ as

p^{i}(x_{n,t}|\mathbf{Y})=\frac{p^{i}_{X}(x_{n,t})\,p(\widehat{r}_{n,t}(i)|x_{n,t})}{p(\widehat{r}_{n,t}(i))}=\frac{p^{i}_{X}(x_{n,t})\mathcal{CN}(x_{n,t};\widehat{r}_{n,t}(i),\tau_{n,t}^{r}(i))}{\int p^{i}_{X}(x_{n,t})\mathcal{CN}(x_{n,t};\widehat{r}_{n,t}(i),\tau_{n,t}^{r}(i))\,dx_{n,t}}, \qquad (13)

where $p^{i}_{X}(x_{n,t})=(1-\overrightarrow{p}_{n,t}(i))\delta(x_{n,t})+\overrightarrow{p}_{n,t}(i)\mathcal{CN}(x_{n,t};0,\beta)$ based on the extrinsic message $\mu^{i}_{(n,t)\rightarrow(n,t)}(\lambda_{n,t})=(1-\overrightarrow{p}_{n,t}(i))(1-\lambda_{n,t})+\overrightarrow{p}_{n,t}(i)\lambda_{n,t}$. The detailed derivation of $\overrightarrow{p}_{n,t}(i)$ will be given in (25) in the MP part. By simplifying (13), the approximate posterior marginal distribution $p^{i}(x_{n,t}|\mathbf{Y})$ can be written in the form of a spike-and-slab probability as

p^{i}(x_{n,t}|\mathbf{Y})=\frac{(1-\overrightarrow{p}_{n,t}(i))\delta(x_{n,t})\mathcal{CN}(x_{n,t};\widehat{r}_{n,t}(i),\tau_{n,t}^{r}(i))+\overrightarrow{p}_{n,t}(i)\mathcal{CN}(x_{n,t};0,\beta)\mathcal{CN}(x_{n,t};\widehat{r}_{n,t}(i),\tau_{n,t}^{r}(i))}{\int\left[(1-\overrightarrow{p}_{n,t}(i))\delta(x_{n,t})\mathcal{CN}(x_{n,t};\widehat{r}_{n,t}(i),\tau_{n,t}^{r}(i))+\overrightarrow{p}_{n,t}(i)\mathcal{CN}(x_{n,t};0,\beta)\mathcal{CN}(x_{n,t};\widehat{r}_{n,t}(i),\tau_{n,t}^{r}(i))\right]dx_{n,t}}
=\frac{(1-\overrightarrow{p}_{n,t}(i))\mathcal{CN}(0;\widehat{r}_{n,t}(i),\tau_{n,t}^{r}(i))\delta(x_{n,t})+\overrightarrow{p}_{n,t}(i)\mathcal{CN}(0;\widehat{r}_{n,t}(i),\beta+\tau^{r}_{n,t}(i))\mathcal{CN}(x_{n,t};\gamma_{n,t}(i),\tau^{\gamma}_{n,t}(i))}{(1-\overrightarrow{p}_{n,t}(i))\mathcal{CN}(0;\widehat{r}_{n,t}(i),\tau_{n,t}^{r}(i))+\overrightarrow{p}_{n,t}(i)\mathcal{CN}(0;\widehat{r}_{n,t}(i),\beta+\tau^{r}_{n,t}(i))}
=(1-\varpi_{n,t}(i))\delta(x_{n,t})+\varpi_{n,t}(i)\mathcal{CN}(x_{n,t};\gamma_{n,t}(i),\tau^{\gamma}_{n,t}(i)), \qquad (14)

where

\varpi_{n,t}(i)=\left[1+\frac{1-\overrightarrow{p}_{n,t}(i)}{\overrightarrow{p}_{n,t}(i)}\,\frac{\beta+\tau_{n,t}^{r}(i)}{\tau_{n,t}^{r}(i)}\exp\left(-\frac{|\widehat{r}_{n,t}(i)|^{2}}{\tau_{n,t}^{r}(i)\left(1+\frac{\tau_{n,t}^{r}(i)}{\beta}\right)}\right)\right]^{-1}, \qquad (15)

\gamma_{n,t}(i)=\frac{\beta}{\beta+\tau_{n,t}^{r}(i)}\widehat{r}_{n,t}(i), \qquad (16)

\tau_{n,t}^{\gamma}(i)=\frac{\beta\tau^{r}_{n,t}(i)}{\beta+\tau^{r}_{n,t}(i)}. \qquad (17)

Thus, we can obtain the explicit expressions of the posterior mean and variance of $x_{n,t}$ in lines 16-17 as

\widehat{x}_{n,t}(i+1)=\mathbb{E}[x_{n,t}|\widehat{r}_{n,t}(i),\tau^{r}_{n,t}(i),\overrightarrow{p}_{n,t}(i)]=\varpi_{n,t}(i)\gamma_{n,t}(i), \qquad (18)

\tau^{x}_{n,t}(i+1)=\mathbb{V}[x_{n,t}|\widehat{r}_{n,t}(i),\tau^{r}_{n,t}(i),\overrightarrow{p}_{n,t}(i)]=\varpi_{n,t}(i)\left((1-\varpi_{n,t}(i))|\gamma_{n,t}(i)|^{2}+\tau^{\gamma}_{n,t}(i)\right). \qquad (19)

Since $\overrightarrow{p}_{n,t}(i)$ is updated by the MP part at each iteration, the MMSE estimator (18) also changes at each iteration. In contrast, the MMSE estimator of $x_{n,t}$ remains fixed over iterations in the AMP-based algorithms proposed in [28, 29] for temporally-correlated massive access.
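For illustration, the input denoising step (15)-(19) can be sketched in NumPy as follows (a minimal, vectorized version; the function name and interface are our own, not from the paper):

```python
import numpy as np

def bg_denoiser(r_hat, tau_r, p_fwd, beta):
    """Posterior mean/variance of x under the Bernoulli-Gaussian prior, eqs. (15)-(19).

    r_hat : pseudo-observation modeled as x plus CN(0, tau_r) noise
    p_fwd : prior activity likelihood supplied by the MP part
    beta  : variance of the active channel coefficient
    """
    # Posterior activity probability, eq. (15)
    ratio = (1 - p_fwd) / p_fwd * (beta + tau_r) / tau_r
    varpi = 1 / (1 + ratio * np.exp(-np.abs(r_hat) ** 2 / (tau_r * (1 + tau_r / beta))))
    # Gaussian slab statistics, eqs. (16)-(17)
    gamma = beta / (beta + tau_r) * r_hat
    tau_gamma = beta * tau_r / (beta + tau_r)
    # Posterior mean (18) and variance (19)
    x_hat = varpi * gamma
    tau_x = varpi * ((1 - varpi) * np.abs(gamma) ** 2 + tau_gamma)
    return x_hat, tau_x
```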

IV-B MP Part

Algorithm 1 HyGAMP-DCS for joint user activity detection and channel estimation
Input: Pilot matrix $\mathbf{A}$, received signals $\mathbf{Y}$, large-scale fading coefficient $\beta$, active probability $p_{a}$, transition probability matrix $\mathbf{P}$, noise variance $\sigma^{2}_{w}$, tolerance $\varepsilon$, and maximum number of iterations $I_{max}$
Output: MMSE estimate $\widehat{\mathbf{X}}$
1:  Initialize
2:  $i\leftarrow 1$
3:  $\forall n,t:~\widehat{x}_{n,t}(i)=0$, $\tau^{x}_{n,t}(i)=\mathbb{V}[x_{n,t}]=p_{a}\beta$, $\overrightarrow{p}_{n,t}(i)=p_{a}$.
4:  $\forall l,t:~\widehat{s}_{l,t}(i-1)=0$.
5:  $\forall n:~\overrightarrow{q}_{n,1}(i)=p_{a}$.
6:  repeat
7:     {Basic GAMP estimation}
8:     $\forall l,t:~\tau^{p}_{l,t}(i)=\sum_{n=1}^{N}|a_{l,n}|^{2}\tau^{x}_{n,t}(i)$.
9:     $\forall l,t:~\widehat{p}_{l,t}(i)=\sum_{n=1}^{N}a_{l,n}\widehat{x}_{n,t}(i)-\tau^{p}_{l,t}(i)\widehat{s}_{l,t}(i-1)$.
10:    $\forall l,t:~\tau^{z}_{l,t}(i)=\mathbb{V}[z_{l,t}|\widehat{p}_{l,t}(i),\tau^{p}_{l,t}(i),y_{l,t},\sigma^{2}_{w}]$.
11:    $\forall l,t:~\widehat{z}^{0}_{l,t}(i)=\mathbb{E}[z_{l,t}|\widehat{p}_{l,t}(i),\tau^{p}_{l,t}(i),y_{l,t},\sigma^{2}_{w}]$.
12:    $\forall l,t:~\tau^{s}_{l,t}(i)=(1-\tau^{z}_{l,t}(i)/\tau^{p}_{l,t}(i))/\tau^{p}_{l,t}(i)$.
13:    $\forall l,t:~\widehat{s}_{l,t}(i)=(\widehat{z}^{0}_{l,t}(i)-\widehat{p}_{l,t}(i))/\tau^{p}_{l,t}(i)$.
14:    $\forall n,t:~\tau^{r}_{n,t}(i)=1/\left(\sum_{l=1}^{L}|a_{l,n}|^{2}\tau^{s}_{l,t}(i)\right)$.
15:    $\forall n,t:~\widehat{r}_{n,t}(i)=\widehat{x}_{n,t}(i)+\tau^{r}_{n,t}(i)\sum_{l=1}^{L}a^{*}_{l,n}\widehat{s}_{l,t}(i)$.
16:    $\forall n,t:~\tau^{x}_{n,t}(i+1)=\mathbb{V}[x_{n,t}|\widehat{r}_{n,t}(i),\tau^{r}_{n,t}(i),\overrightarrow{p}_{n,t}(i)]$.
17:    $\forall n,t:~\widehat{x}_{n,t}(i+1)=\mathbb{E}[x_{n,t}|\widehat{r}_{n,t}(i),\tau^{r}_{n,t}(i),\overrightarrow{p}_{n,t}(i)]$.
18:    {Activity likelihood update}
19:    $\forall n,t:~\overleftarrow{p}_{n,t}(i)$ is updated by (21).
20:    $\forall n,t:~\overrightarrow{q}_{n,t}(i)$ is updated by (23) from frame $1$ to frame $T$.
21:    $\forall n,t:~\overleftarrow{q}_{n,t}(i)$ is updated by (24) from frame $T-1$ to frame $1$.
22:    $\forall n,t:~\overrightarrow{p}_{n,t}(i+1)$ is updated by (25).
23:    $i\leftarrow i+1$
24: until $i>I_{max}$ or $\frac{\|\widehat{\mathbf{X}}(i+1)-\widehat{\mathbf{X}}(i)\|_{F}^{2}}{\|\widehat{\mathbf{X}}(i)\|_{F}^{2}}\leq\varepsilon$.

By regarding all the edges in the MP part as strong edges, we utilize the standard MP algorithm to update the activity probability of each user based on the extrinsic message $\upsilon^{i}_{(n,t)\leftarrow(n,t)}(\lambda_{n,t})$ from the GAMP part.

First, the extrinsic message $\upsilon^{i}_{(n,t)\leftarrow(n,t)}(\lambda_{n,t})$ is obtained as

\upsilon^{i}_{(n,t)\leftarrow(n,t)}(\lambda_{n,t})=\int p(x_{n,t}|\beta,\lambda_{n,t})\cdot\upsilon^{i}_{(n,t)\leftarrow(n,t)}(x_{n,t})\,dx_{n,t}=(1-\overleftarrow{p}_{n,t}(i))(1-\lambda_{n,t})+\overleftarrow{p}_{n,t}(i)\lambda_{n,t}, \qquad (20)

where $\upsilon^{i}_{(n,t)\leftarrow(n,t)}(x_{n,t})=\mathcal{CN}(x_{n,t};\widehat{r}_{n,t}(i),\tau^{r}_{n,t}(i))$ and $\overleftarrow{p}_{n,t}(i)$ can be expressed by

\overleftarrow{p}_{n,t}(i)=\frac{p(\widehat{r}_{n,t}(i)|\lambda_{n,t}=1)}{p(\widehat{r}_{n,t}(i)|\lambda_{n,t}=0)+p(\widehat{r}_{n,t}(i)|\lambda_{n,t}=1)}=\frac{\mathcal{CN}(\widehat{r}_{n,t};0,\tau^{r}_{n,t}(i)+\beta)}{\mathcal{CN}(\widehat{r}_{n,t};0,\tau^{r}_{n,t}(i))+\mathcal{CN}(\widehat{r}_{n,t};0,\tau^{r}_{n,t}(i)+\beta)}. \qquad (21)

With (21), we can exploit the statistical dependencies across frames based on the Markov chain of the activity variables $\{\lambda_{n,t}\}$. We also define $\overleftarrow{q}_{n,t}(i)$ and $\overrightarrow{q}_{n,t}(i)$ as the activity likelihoods of user $n$ contained in the messages $\zeta^{i}_{(n,t)\leftarrow(n,t+1)}(\lambda_{n,t})$ and $\xi^{i}_{(n,t)\rightarrow(n,t)}(\lambda_{n,t})$, respectively. The message $\xi^{i}_{(n,t)\rightarrow(n,t+1)}(\lambda_{n,t})$ for $t=1,\dots,T-1$ is expressed as the product of two Bernoulli distributions:

\xi^{i}_{(n,t)\rightarrow(n,t+1)}(\lambda_{n,t})\propto\xi^{i}_{(n,t)\rightarrow(n,t)}(\lambda_{n,t})\cdot\upsilon^{i}_{(n,t)\leftarrow(n,t)}(\lambda_{n,t})\propto(1-\overrightarrow{q}_{n,t}(i))(1-\overleftarrow{p}_{n,t}(i))(1-\lambda_{n,t})+\overrightarrow{q}_{n,t}(i)\overleftarrow{p}_{n,t}(i)\lambda_{n,t}. \qquad (22)

Then we update the forward message $\xi^{i}_{(n,t)\rightarrow(n,t)}(\lambda_{n,t})$ that represents the useful information conveyed from $\lambda_{n,t-1}$ to $\lambda_{n,t}$. For $t=1$, we always have $\xi^{i}_{(n,1)\rightarrow(n,1)}(\lambda_{n,1})=(1-p_{a})(1-\lambda_{n,1})+p_{a}\lambda_{n,1}$, i.e., $\overrightarrow{q}_{n,1}(i)=p_{a},\forall n$, since the factor node $p(\lambda_{n,1})$ is connected only with $\lambda_{n,1}$ in the factor graph. For $t\geq 2$, the forward message $\xi^{i}_{(n,t)\rightarrow(n,t)}(\lambda_{n,t})$ can be obtained as

\xi^{i}_{(n,t)\rightarrow(n,t)}(\lambda_{n,t})=\int p(\lambda_{n,t}|\lambda_{n,t-1})\cdot\xi^{i}_{(n,t-1)\rightarrow(n,t)}(\lambda_{n,t-1})\,d\lambda_{n,t-1}\propto(1-\overrightarrow{q}_{n,t}(i))(1-\lambda_{n,t})+\overrightarrow{q}_{n,t}(i)\lambda_{n,t},

where

\overrightarrow{q}_{n,t}(i)=\frac{p_{01}(1-\overleftarrow{p}_{n,t-1}(i))(1-\overrightarrow{q}_{n,t-1}(i))+p_{11}\overleftarrow{p}_{n,t-1}(i)\overrightarrow{q}_{n,t-1}(i)}{(1-\overleftarrow{p}_{n,t-1}(i))(1-\overrightarrow{q}_{n,t-1}(i))+\overleftarrow{p}_{n,t-1}(i)\overrightarrow{q}_{n,t-1}(i)}. \qquad (23)

Note that the activity likelihoods $\{\overrightarrow{q}_{n,t}(i)\}$ are updated sequentially from frame $1$ to frame $T$.

During the forward message passing, the backward message passing can be performed in parallel to convey the useful information from $\lambda_{n,t+1}$ to $\lambda_{n,t}$ for $t=1,\dots,T-1$. By symmetry, the update formula for $\overleftarrow{q}_{n,t}(i)$ in the backward message $\zeta^{i}_{(n,t)\leftarrow(n,t+1)}(\lambda_{n,t})$ is similar to that of $\overrightarrow{q}_{n,t}(i)$, except that $\{\overleftarrow{q}_{n,t}(i)\}$ are updated sequentially from frame $T-1$ to frame $1$. For $t\leq T-1$, the activity likelihood $\overleftarrow{q}_{n,t}(i)$ can be obtained as

\overleftarrow{q}_{n,t}(i)=\frac{p_{10}(1-\overleftarrow{p}_{n,t+1}(i))(1-\overleftarrow{q}_{n,t+1}(i))+p_{11}\overleftarrow{p}_{n,t+1}(i)\overleftarrow{q}_{n,t+1}(i)}{(p_{00}+p_{10})(1-\overleftarrow{p}_{n,t+1}(i))(1-\overleftarrow{q}_{n,t+1}(i))+(p_{11}+p_{01})\overleftarrow{p}_{n,t+1}(i)\overleftarrow{q}_{n,t+1}(i)}. \qquad (24)

It is noted that the proposed algorithm with block-by-block detection performs bidirectional message propagation, while the algorithms in [28, 29] only have forward message propagation. Therefore, by utilizing the useful information from both $\lambda_{n,t-1}$ and $\lambda_{n,t+1}$, HyGAMP-DCS can further exploit the temporal correlation to provide a more precise estimate of the activity $\lambda_{n,t}$, which in turn contributes to enhanced channel estimation performance.

With both the forward and backward messages above, the extrinsic message $\mu^{i}_{(n,t)\rightarrow(n,t)}(\lambda_{n,t})$ used to refine the channel estimation in the next iteration is subsequently given by $\mu^{i}_{(n,t)\rightarrow(n,t)}(\lambda_{n,t})\propto\xi^{i}_{(n,t)\rightarrow(n,t)}(\lambda_{n,t})\cdot\zeta^{i}_{(n,t)\leftarrow(n,t+1)}(\lambda_{n,t})$ for $t=1,\dots,T-1$. The update rule of $\overrightarrow{p}_{n,t}(i)$ in the message $\mu^{i}_{(n,t)\rightarrow(n,t)}(\lambda_{n,t})$ for $t=1,\dots,T-1$ is then obtained as

\overrightarrow{p}_{n,t}(i+1)=\frac{\overleftarrow{q}_{n,t}(i)\overrightarrow{q}_{n,t}(i)}{(1-\overleftarrow{q}_{n,t}(i))(1-\overrightarrow{q}_{n,t}(i))+\overleftarrow{q}_{n,t}(i)\overrightarrow{q}_{n,t}(i)}. \qquad (25)

However, for $t=T$, the update rule is given by $\mu^{i}_{(n,T)\rightarrow(n,T)}(\lambda_{n,T})=\xi^{i}_{(n,T)\rightarrow(n,T)}(\lambda_{n,T})$ and $\overrightarrow{p}_{n,T}(i+1)=\overrightarrow{q}_{n,T}(i)$, since the variable node $\lambda_{n,T}$ is connected to only one factor node $p(\lambda_{n,T}|\lambda_{n,T-1})$ in the MP part. The whole procedure of the proposed HyGAMP-DCS algorithm is outlined in Algorithm 1, and a sketch of the MP part is given below. In this work, we only consider the algorithm design for the single-antenna scenario, while the extension to the multiple-antenna scenario is straightforward.
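As an illustration of the MP part, the following minimal NumPy sketch (our own; the uniform initialization of the backward likelihood at frame $T$ is an assumption consistent with $\lambda_{n,T}$ having no right neighbor) implements the likelihood updates (21) and (23)-(25) for all users at once:

```python
import numpy as np

def activity_likelihood_update(r_hat, tau_r, beta, p_a, p10):
    """One MP-part sweep over all users: eqs. (21) and (23)-(25). Arrays are N x T."""
    N, T = r_hat.shape
    p01 = p_a * p10 / (1 - p_a)
    p00, p11 = 1 - p01, 1 - p10

    # Extrinsic activity likelihood from the GAMP part, eq. (21)
    cn0 = np.exp(-np.abs(r_hat) ** 2 / tau_r) / (np.pi * tau_r)
    cn1 = np.exp(-np.abs(r_hat) ** 2 / (tau_r + beta)) / (np.pi * (tau_r + beta))
    p_back = cn1 / (cn0 + cn1)

    # Forward pass, eq. (23), from frame 1 to frame T
    q_fwd = np.empty((N, T))
    q_fwd[:, 0] = p_a
    for t in range(1, T):
        a = p_back[:, t - 1] * q_fwd[:, t - 1]
        b = (1 - p_back[:, t - 1]) * (1 - q_fwd[:, t - 1])
        q_fwd[:, t] = (p01 * b + p11 * a) / (b + a)

    # Backward pass, eq. (24), from frame T-1 to frame 1;
    # frame T has no right neighbor, so its backward likelihood is uninformative (1/2)
    q_bwd = np.full((N, T), 0.5)
    for t in range(T - 2, -1, -1):
        a = p_back[:, t + 1] * q_bwd[:, t + 1]
        b = (1 - p_back[:, t + 1]) * (1 - q_bwd[:, t + 1])
        q_bwd[:, t] = (p10 * b + p11 * a) / ((p00 + p10) * b + (p11 + p01) * a)

    # Combination, eq. (25); frame T keeps only the forward message
    p_fwd = q_bwd * q_fwd / ((1 - q_bwd) * (1 - q_fwd) + q_bwd * q_fwd)
    p_fwd[:, -1] = q_fwd[:, -1]
    return p_fwd
```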

IV-C Discussion

In this subsection, we discuss the impact of the number of joint detection frames $T$ on the estimation performance and the detection delay of the proposed algorithm. Due to the block-by-block detection, the estimation in the $1$st and the $T$th frames only benefits from single-sided messages, while the estimation in the $t$th frame ($2\leq t\leq T-1$) can utilize double-sided messages. The ratio of the number of frames whose estimations can combine double-sided messages to the number of joint detection frames, i.e., $\frac{T-2}{T}$, grows as $T$ increases. By increasing the number of joint detection frames $T$, the overall user activity detection and channel estimation performance over the $T$ consecutive frames is improved. When $T$ is large, the ratio $\frac{T-2}{T}$ approaches $1$ and the proposed algorithm approaches its best performance. However, the block-by-block detection also incurs detection delay, whose average is $\frac{T-1}{2}$ frames. From the simulation results in Section VII, we find that HyGAMP-DCS achieves significant performance improvement with a moderate value of $T$ (e.g., $T=5$).

V Hyperparameter Learning via EM Algorithm

The HyGAMP-DCS algorithm requires prior knowledge of the system statistics characterized by the hyperparameter set $\boldsymbol{\vartheta}=\{p_{a},\beta,p_{10},\sigma^{2}_{w}\}$, which may not be accurately estimated in advance. In this section, we propose to integrate the expectation maximization algorithm [33] with the proposed HyGAMP-DCS algorithm, where the statistical parameters are learned from the received signals of the jointly detected $T$ frames during the estimation procedure.

The EM algorithm aims to find the parameter setting that maximizes the likelihood function $p(\mathbf{Y},\mathbf{X}|\boldsymbol{\vartheta})$, but this optimization problem is usually non-convex and computationally intractable. Instead, the EM algorithm maximizes a lower bound of the likelihood function $p(\mathbf{Y},\mathbf{X}|\boldsymbol{\vartheta})$ in each iteration, so that $p(\mathbf{Y},\mathbf{X}|\boldsymbol{\vartheta})$ is guaranteed to increase toward a local maximum. Specifically, the E-step and the M-step are performed alternately until convergence, which can be written as

\mathcal{L}^{i}(\boldsymbol{\vartheta})=\mathbb{E}_{p^{i}(\mathbf{X})}[\ln p(\mathbf{X},\mathbf{Y}|\boldsymbol{\vartheta})], \qquad (26)

\boldsymbol{\vartheta}^{i+1}=\arg\max_{\boldsymbol{\vartheta}}\mathcal{L}^{i}(\boldsymbol{\vartheta}), \qquad (27)

where $p^{i}(\mathbf{X})=p(\mathbf{X}|\mathbf{Y},\boldsymbol{\vartheta}^{i})$ is the estimated posterior marginal distribution of $\mathbf{X}$ based on the updated statistical parameters $\boldsymbol{\vartheta}^{i}$, and $\mathbb{E}_{p^{i}(\mathbf{X})}[\cdot]$ denotes the expectation over the posterior distribution $p^{i}(\mathbf{X})$. Since the number of users $N$ is very large, the expectation calculation may incur unaffordable computational cost. Fortunately, the HyGAMP framework enables us to approximate the posterior probability by $p^{i}(\mathbf{X})=\prod_{t=1}^{T}\prod_{n=1}^{N}p(x_{n,t}|\mathbf{Y},\boldsymbol{\vartheta}^{i})$ in large-scale systems, which leads to a computationally efficient EM estimation procedure. After obtaining the expectation, we adopt the incremental version of the EM algorithm [34] to simplify the joint optimization problem (27), where the hyperparameters are optimized in a coordinate-wise manner, so that only one hyperparameter $\vartheta\in\boldsymbol{\vartheta}$ is updated at a time while the others are kept fixed. The simplified optimization problem can be written as

\vartheta^{i+1}=\arg\max_{\vartheta}\mathcal{L}^{i}(\boldsymbol{\vartheta}^{i}_{\setminus\vartheta},\vartheta). \qquad (28)

The solution of problem (28) can be easily obtained by setting the derivative of $\mathcal{L}^{i}(\boldsymbol{\vartheta}^{i}_{\setminus\vartheta},\vartheta)$ with respect to $\vartheta$ to zero. The hyperparameters at the $i$th iteration are updated by

\sigma^{2}_{w}(i+1)=\frac{1}{LT}\sum_{t=1}^{T}\sum_{l=1}^{L}\left[|y_{l,t}-\widehat{z}^{0}_{l,t}(i)|^{2}+\tau_{l,t}^{z}(i)\right], \qquad (29)

\beta(i+1)=\frac{\sum_{t=1}^{T}\sum_{n=1}^{N}\left\{\varpi_{n,t}(i)\left[|\gamma_{n,t}(i)|^{2}+\tau^{\gamma}_{n,t}(i)\right]\right\}}{\sum_{t=1}^{T}\sum_{n=1}^{N}\varpi_{n,t}(i)}, \qquad (30)

p_{a}(i+1)=\frac{1}{N}\sum_{n=1}^{N}\frac{\overleftarrow{q}_{n,1}(i)\overrightarrow{q}_{n,1}(i)\overleftarrow{p}_{n,1}(i)}{(1-\overleftarrow{q}_{n,1}(i))(1-\overrightarrow{q}_{n,1}(i))(1-\overleftarrow{p}_{n,1}(i))+\overleftarrow{q}_{n,1}(i)\overrightarrow{q}_{n,1}(i)\overleftarrow{p}_{n,1}(i)}, \qquad (31)

p_{10}(i+1)=\frac{\sum_{t=2}^{T}\sum_{n=1}^{N}\left(\mathbb{E}[\lambda_{n,t-1}]-\mathbb{E}[\lambda_{n,t-1}\lambda_{n,t}]\right)}{\sum_{t=2}^{T}\sum_{n=1}^{N}\mathbb{E}[\lambda_{n,t-1}]}. \qquad (32)
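For illustration, the closed-form updates (29)-(31) can be sketched as follows (a minimal NumPy version with hypothetical array names; the update (32) additionally requires the posterior expectations of $\lambda_{n,t}$ derived in the Appendix and is omitted here):

```python
import numpy as np

def em_update(Y, z_hat, tau_z, varpi, gamma, tau_gamma, q_fwd, q_bwd, p_back):
    """Closed-form EM updates, eqs. (29)-(31).

    Y, z_hat, tau_z : L x T arrays from the GAMP output step
    varpi, gamma, tau_gamma : N x T posterior statistics from (15)-(17)
    q_fwd, q_bwd, p_back : N x T activity likelihoods from the MP part
    """
    # Noise variance, eq. (29): average residual power plus posterior variance
    sigma2_w = np.mean(np.abs(Y - z_hat) ** 2 + tau_z)
    # Active channel variance, eq. (30): weighted second moment of the slab
    beta = np.sum(varpi * (np.abs(gamma) ** 2 + tau_gamma)) / np.sum(varpi)
    # Active probability, eq. (31): posterior activity probability in frame 1
    num = q_bwd[:, 0] * q_fwd[:, 0] * p_back[:, 0]
    den = (1 - q_bwd[:, 0]) * (1 - q_fwd[:, 0]) * (1 - p_back[:, 0]) + num
    p_a = np.mean(num / den)
    return sigma2_w, beta, p_a
```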

After obtaining the updated probabilities $p_{a}(i+1)$ and $p_{10}(i+1)$, we can simply obtain the updated probabilities $p_{01}(i+1)=p_{a}(i)p_{10}(i)/(1-p_{a}(i))$, $p_{11}(i+1)=1-p_{10}(i+1)$, and $p_{00}(i+1)=1-p_{01}(i+1)$, which completes the update of the transition probability matrix of the Markov chain. To make this work self-contained, the detailed derivations of the above equations (29) to (32) are provided in the Appendix. In addition, the explicit expressions of $\mathbb{E}[\lambda_{n,t-1}]$ and $\mathbb{E}[\lambda_{n,t-1}\lambda_{n,t}]$ are also given in the Appendix. To enable the EM algorithm to converge to the global maximum, we use a judicious initialization of the hyperparameters [33]:

\sigma^{2}_{w}(1)=\frac{\|\mathbf{Y}\|^{2}_{F}}{(\text{SNR}^{0}+1)LT}, \qquad (33)

p_{a}(1)=\frac{L}{N}\left\{\max_{c>0}\frac{1-\frac{2N}{L}\left[(1+c^{2})\Phi(-c)-c\phi(c)\right]}{1+c^{2}-2\left[(1+c^{2})\Phi(-c)-c\phi(c)\right]}\right\}, \qquad (34)

\beta(1)=\frac{\|\mathbf{Y}\|_{F}^{2}-LT\sigma^{2}_{w}(1)}{\|\mathbf{A}\|_{F}^{2}p_{a}(1)T}, \qquad (35)

p_{10}(1)=p_{a}(1), \qquad (36)

where $\Phi(\cdot)$ and $\phi(\cdot)$ denote the cumulative distribution function and the probability density function of the standard normal distribution, respectively. Since knowledge of the SNR may not be available, an appropriate initial value of $\text{SNR}^{0}$ is needed. Conventional works usually adopt the value $\text{SNR}^{0}=20~\text{dB}$ [31], but we find that this setting usually leads to a bad local optimum for the EM-HyGAMP-DCS algorithm under our system setting. Although a suitable value of $\text{SNR}^{0}$ can be found by exhaustive search, this approach needs to collect a large amount of data with known user activities and channels, which is inefficient. To tackle this problem, we will first introduce the analysis tool of SE in the next section and then show that the SE is an effective tool to find a suitable value of $\text{SNR}^{0}$ for the specific system. Finally, we outline the whole procedure of the EM-HyGAMP-DCS algorithm in Algorithm 2.
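A minimal sketch of this initialization (our own, assuming NumPy/SciPy and a simple grid search over $c$) is:

```python
import numpy as np
from scipy.stats import norm

def em_init(Y, A, snr0_db):
    """Hyperparameter initialization, eqs. (33)-(36)."""
    L, T = Y.shape
    N = A.shape[1]
    snr0 = 10 ** (snr0_db / 10)
    sigma2_w = np.linalg.norm(Y, 'fro') ** 2 / ((snr0 + 1) * L * T)       # (33)
    c = np.linspace(1e-3, 10, 10000)                                      # grid over c > 0
    g = (1 + c ** 2) * norm.cdf(-c) - c * norm.pdf(c)
    p_a = (L / N) * np.max((1 - 2 * (N / L) * g) / (1 + c ** 2 - 2 * g))  # (34)
    beta = (np.linalg.norm(Y, 'fro') ** 2 - L * T * sigma2_w) \
           / (np.linalg.norm(A, 'fro') ** 2 * p_a * T)                    # (35)
    p10 = p_a                                                             # (36)
    return sigma2_w, p_a, beta, p10
```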

Algorithm 2 EM-HyGAMP-DCS for joint user activity detection and channel estimation
Input: Pilot matrix $\mathbf{A}$, received signals $\mathbf{Y}$, tolerance $\varepsilon$, and maximum number of iterations $I_{max}$
Output: MMSE estimate $\widehat{\mathbf{X}}$
1:  Initialize
2:  $i\leftarrow 1$
3:  $\sigma^{2}_{w}(1)$, $p_{a}(1)$, $\beta(1)$, and $p_{10}(1)$ are initialized as (33)-(36).
4:  $\forall n,t:~\widehat{x}_{n,t}(i)=0$, $\tau^{x}_{n,t}(i)=\mathbb{V}[x_{n,t}]=p_{a}(1)\beta(1)$, $\overrightarrow{p}_{n,t}(i)=p_{a}(i)$.
5:  $\forall l,t:~\widehat{s}_{l,t}(i-1)=0$.
6:  $\forall n:~\overrightarrow{q}_{n,1}(1)=p_{a}(i)$.
7:  repeat
8:     {HyGAMP-DCS estimation}
9:     Perform lines 8-24 of Algorithm 1 with the updated $\sigma^{2}_{w}(i)$, $p_{a}(i)$, $\beta(i)$, and $p_{10}(i)$.
10:    {Hyperparameter update}
11:    $\sigma^{2}_{w}(i+1)$, $\beta(i+1)$, $p_{a}(i+1)$, and $p_{10}(i+1)$ are updated as (29)-(32).
12:    $i\leftarrow i+1$
13: until $i>I_{max}$ or $\frac{\|\widehat{\mathbf{X}}(i+1)-\widehat{\mathbf{X}}(i)\|_{F}^{2}}{\|\widehat{\mathbf{X}}(i)\|_{F}^{2}}\leq\varepsilon$.

VI Performance and Complexity Analysis

In this section, we first analyze the performance of the proposed algorithms via the SE in the asymptotic regime, where $L,N\to\infty$ with a fixed ratio $\frac{L}{N}$. It is noted that the SE can not only be used to decide the settings of the pilot length $L$ and the number of joint detection frames $T$, but also plays an important role in finding the appropriate hyperparameter initialization for the EM-HyGAMP-DCS algorithm. We then illustrate the computational efficiency of the proposed algorithms.

VI-A Performance Analysis Using State Evolution

SE is a common framework for analyzing the estimation performance of AMP-based algorithms in the asymptotic regime. When the pilot matrix has i.i.d. sub-Gaussian elements, the SE can accurately track the MSE of the estimator $\widehat{\mathbf{X}}(i)$ in large-scale problems. To simplify the SE analysis, we consider that HyGAMP-DCS has a scalar variance in each frame $t$. These scalar variances of Algorithm 1 can be rewritten as

\tau^{x}_{t}(i)\approx\bar{\tau}^{x}_{t}(i)=\frac{1}{N}\sum_{n=1}^{N}\tau^{x}_{n,t}(i),\quad\forall t, \qquad (37)

\tau^{p}_{t}(i)\approx\bar{\tau}^{p}_{t}(i)=\frac{N}{L}\tau^{x}_{t}(i),\quad\forall t, \qquad (38)

\tau^{z}_{t}(i)\approx\bar{\tau}^{z}_{t}(i)=\frac{\tau^{p}_{t}(i)\sigma^{2}_{w}}{\tau^{p}_{t}(i)+\sigma^{2}_{w}},\quad\forall t, \qquad (39)

\tau^{s}_{t}(i)\approx\bar{\tau}^{s}_{t}(i)=\frac{1}{\tau^{p}_{t}(i)}\left(1-\frac{\tau^{z}_{t}(i)}{\tau^{p}_{t}(i)}\right),\quad\forall t, \qquad (40)

\tau^{r}_{t}(i)\approx\bar{\tau}^{r}_{t}(i)=\frac{1}{\tau^{s}_{t}(i)},\quad\forall t, \qquad (41)

and \tau^{x}_{n,t}(i+1) is again calculated by (19).
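As an illustration, the scalar-variance recursions (37)–(41) for a single frame t can be transcribed directly into Python; this is our own sketch, with variable names (tau_x, sigma_w2, and so on) chosen for readability rather than taken from any reference implementation.

    import numpy as np

    def scalar_variance_pass(tau_x, sigma_w2, N, L):
        """One pass of (37)-(41) for a frame t.

        tau_x: length-N array of per-user variances tau^x_{n,t}(i).
        Returns the scalar variances (tau_p, tau_z, tau_s, tau_r).
        """
        tau_x_bar = np.mean(tau_x)                     # (37)
        tau_p = (N / L) * tau_x_bar                    # (38)
        tau_z = tau_p * sigma_w2 / (tau_p + sigma_w2)  # (39)
        tau_s = (1.0 - tau_z / tau_p) / tau_p          # (40)
        tau_r = 1.0 / tau_s                            # (41)
        return tau_p, tau_z, tau_s, tau_r

Note that these recursions give tau_r = sigma_w2 + tau_p, consistent with (43).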

Following the assumptions of the SE for AMP-based algorithms in [32], the asymptotic MSE of the estimate \widehat{x}_{n,t}(i) is identical to \mathbb{E}[\tau^{x}_{t}(i)], where the expectation is taken over \widehat{r}_{n,t}(i). Specifically, for a random variable x_0 \sim p_X(x_0) with p_X(x_0) = (1-\rho_0)\delta(x_0) + \rho_0\,\mathcal{CN}(x_0;0,\beta), the variable \widehat{r}_0(i) in each iteration i can be modeled as

\widehat{r}_{0}(i) = x_{0} + n^{r}_{0}(i), \qquad (42)

where n^{r}_{0}(i) \sim \mathcal{CN}(0,\tau^{r}_{0}(i)) is the corrupting complex Gaussian noise, independent of x_0. Under the proposed algorithm with scalar variances, we have

\tau^{r}_{0}(i) = \frac{(\tau^{p}_{0}(i))^{2}}{\tau^{p}_{0}(i)-\tau^{z}_{0}(i)} = \sigma^{2}_{w} + \frac{N}{L}\,\mathbb{E}[\tau^{x}_{0}(i)], \qquad (43)

where \tau^{x}_{0}(i) is calculated by (19) and the expectation is taken over both x_0 and n^{r}_{0}(i-1). The explicit expression of the expectation is

\mathbb{E}[\tau^{x}_{0}(i)] = \iint \tau^{x}_{0}(i)\, p_{X}(x_{0})\, p(n^{r}_{0}(i-1))\, dx_{0}\, dn^{r}_{0} = \frac{\rho_{0}\beta\,\tau^{r}_{0}(i-1)}{\beta+\tau^{r}_{0}(i-1)} + \frac{\rho_{0}\beta^{2}\big(1-\psi(\beta/\tau^{r}_{0}(i-1))\big)}{\beta+\tau^{r}_{0}(i-1)}, \qquad (44)

where

\psi(b) = \int_{0}^{\infty} \frac{s\exp(-s)}{1+(\rho^{-1}_{0}-1)(1+b)\exp(-bs)}\, ds. \qquad (45)

Then the asymptotic MSE of 𝐗^(i)\widehat{\mathbf{X}}(i) in each iteration ii can be given as

\mathbb{E}[\tau^{x}(i)] = \mathbb{E}\left[\frac{1}{NT}\sum_{n=1}^{N}\sum_{t=1}^{T}\tau^{x}_{n,t}(i)\right] = \frac{1}{NT}\sum_{n=1}^{N}\sum_{t=1}^{T}\mathbb{E}[\tau^{x}_{n,t}(i)]. \qquad (46)

However, the activity probability ρ_0 is updated in the MP part at each iteration, and it is hard to give an explicit expression for the marginal probability of ρ_0. We therefore resort to Monte Carlo simulations to obtain ρ_{n,t} = \overrightarrow{p}_{n,t}(i) for each user n in frame t at each iteration, and then substitute it into (46). The performance of the EM-HyGAMP-DCS algorithm can be predicted by the same procedure, with the statistical parameter set ϑ in each iteration replaced by the learned one ϑ(i).
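For concreteness, the single-frame SE recursion (43)–(45) can be evaluated numerically as in the sketch below, where ψ(b) is computed by quadrature. This is our own illustration of the fixed-ρ_0 special case: it omits the per-iteration Monte Carlo update of the activity probabilities described above.

    import numpy as np
    from scipy.integrate import quad

    def psi(b, rho0):
        """Numerical evaluation of the integral psi(b) in (45)."""
        integrand = lambda s: s * np.exp(-s) / (
            1.0 + (1.0 / rho0 - 1.0) * (1.0 + b) * np.exp(-b * s))
        value, _ = quad(integrand, 0.0, np.inf)
        return value

    def se_recursion(rho0, beta, sigma_w2, N, L, num_iter=30):
        """Track E[tau^x(i)] via the fixed point of (43)-(44)."""
        tau_r = sigma_w2 + (N / L) * rho0 * beta   # from the prior variance
        mse = []
        for _ in range(num_iter):
            b = beta / tau_r
            e_tau_x = (rho0 * beta * tau_r / (beta + tau_r)
                       + rho0 * beta**2 * (1.0 - psi(b, rho0))
                       / (beta + tau_r))            # (44)
            mse.append(e_tau_x)
            tau_r = sigma_w2 + (N / L) * e_tau_x    # (43)
        return np.array(mse)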

[Figure 3: The impact of the value of SNR^0 on the recovery performance of EM-HyGAMP-DCS and its SE under the system setting SNR = β/σ_w^2 = −10 dB (SNR^0 = −15 dB is appropriate). Panels: (a) EM-HyGAMP-DCS; (b) the SE of EM-HyGAMP-DCS.]
[Figure 4: The impact of the value of SNR^0 on the recovery performance of EM-HyGAMP-DCS and its SE under the system setting SNR = β/σ_w^2 = 0 dB (SNR^0 = −5 dB is appropriate). Panels: (a) EM-HyGAMP-DCS; (b) the SE of EM-HyGAMP-DCS.]

From simulation trials, we find that the EM-HyGAMP-DCS algorithm easily converges to a bad local optimum under an inappropriate choice of SNR^0. In practice, it is easy to collect a large number of received signals {𝐲_t} at the BS, but the true channels {𝐱^0_t} are hard to obtain in advance, which prevents us from finding an appropriate initialization empirically. Fortunately, the SE can be utilized to find an appropriate value of SNR^0 for initializing the EM-HyGAMP-DCS algorithm in different systems: we can try different values of SNR^0 in the SE and select one that ensures the EM-HyGAMP-DCS algorithm converges to a satisfactory point. Fig. 3 and Fig. 4 show the impact of the value of SNR^0 on the algorithm performance and on the SE for systems with SNR = β/σ_w^2 = −10 dB and SNR = β/σ_w^2 = 0 dB, respectively. Here, we use the time-averaged normalized MSE (TNMSE) as the performance metric, computed as

\text{TNMSE} = 10\log_{10}\left(\frac{\|\widehat{\mathbf{X}}-\mathbf{X}\|^{2}_{F}}{\|\mathbf{X}\|_{F}^{2}}\right). \qquad (47)
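For reference, (47) amounts to a one-line computation; the helper below is our own and not from the paper.

    import numpy as np

    def tnmse_db(X_hat, X):
        """Time-averaged normalized MSE (47) in dB."""
        return 10.0 * np.log10(np.linalg.norm(X_hat - X, 'fro')**2
                               / np.linalg.norm(X, 'fro')**2)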

The numerical results in Fig. 3 and Fig. 4 show that the performance of EM-HyGAMP-DCS is sensitive to the value of SNR^0. We observe that choosing SNR^0 < SNR always ensures that the EM-HyGAMP-DCS algorithm converges to a satisfactory point, though possibly at a slow convergence rate. For SNR^0 ≫ SNR, the EM-HyGAMP-DCS algorithm usually fails to converge to a stationary point and thus achieves poor performance. Similarly, the SE with SNR^0 < SNR converges to a stationary point. However, if SNR^0 ≪ SNR, the SE exhibits TNMSE fluctuations before convergence in the case of SNR = −10 dB, and it also converges more slowly. When SNR^0 ≫ SNR, the SE cannot converge to a stationary point, even though it seems to give a smaller TNMSE. When SNR^0 is close to the true SNR, the SE curve is smooth without TNMSE fluctuations and converges quickly. In summary, the SE is an essential tool for selecting an appropriate value of SNR^0, with which EM-HyGAMP-DCS achieves satisfactory performance and fast convergence.

Besides the value of SNR^0, the appropriate initialization of the other hyperparameters can be found with the SE in the same way. Thus, the SE not only analyzes the asymptotic performance of AMP-based algorithms under different system settings, but also serves as a powerful tool for finding the appropriate hyperparameter initialization for AMP-based algorithms incorporating EM techniques.

VI-B Computational Complexity Analysis

We compare the computational complexity of the proposed algorithms with that of the benchmark algorithms in Table II. We consider five benchmarks: the DCS-OMP algorithm [26], the PIA-ASP algorithm [27], the GAMP algorithm [32], the S-AMP algorithm [28], and the DCS-AMP algorithm [6]. Note that DCS-OMP, PIA-ASP, GAMP, and S-AMP are frame-by-frame detection schemes, while DCS-AMP and the proposed HyGAMP-DCS and EM-HyGAMP-DCS are block-by-block detection schemes. For a fair comparison, we report the number of complex-valued multiplications at each iteration i normalized by the number of joint detection frames T, denoted by C_i. The average computational complexity per frame of each algorithm is then C = Σ_{i=1}^{I} C_i, with I being the number of iterations. The average per-frame complexities of DCS-OMP and PIA-ASP are both on the order of O(I(p_a N)^3), since these two algorithms adopt least-squares (LS) estimation, which performs a matrix inversion in each iteration. By contrast, the average per-frame complexities of the AMP-based algorithms are on the order of O(ILN). Compared with the standard GAMP algorithm, the HyGAMP-DCS and EM-HyGAMP-DCS algorithms have slightly higher complexity due to the additional message updates between adjacent frames. Numerical results further suggest that HyGAMP-DCS and EM-HyGAMP-DCS converge faster than the GAMP algorithm, meaning that the computational cost of the two proposed algorithms can be further reduced.

TABLE II: Computational Complexity Comparison

Algorithm | Complex-valued multiplications per iteration i, normalized by the number of joint detection frames (C_i) | Order of the average per-frame complexity O(C)
DCS-OMP | LN + 2N + Li + 2Li^2 + i^3 | O(I(p_a N)^3)
PIA-ASP | LN + 6N − 2s_p + L(s_p + j) + 6L(s_p + j)^2 + 3(s_p + j)^3 | O(I(p_a N)^3)
GAMP | 4LN + 9L + 16N | O(ILN)
S-AMP | 4LN + 9L + 24N | O(ILN)
DCS-AMP | 4I_0LN + 9L + 32N | O(II_0LN)
HyGAMP-DCS | 4LN + 9L + 32N | O(ILN)
EM-HyGAMP-DCS | 4LN + 9L + 45N | O(ILN)

Note: i denotes the iteration index; j and s_p denote the sparsity-level index and the quality information of the PIA-ASP algorithm; I_0 denotes the number of inner AMP iterations in the DCS-AMP algorithm.
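To get a feel for the numbers in Table II, the small helper below evaluates the per-iteration costs C_i of the AMP-family rows for a concrete system size. It is our own illustration; the DCS-OMP and PIA-ASP rows are omitted because their costs depend on the inner indices i, j, and s_p.

    def per_iteration_cost(L, N, I0):
        """C_i of the AMP-family algorithms, from Table II."""
        return {
            "GAMP":          4*L*N + 9*L + 16*N,
            "S-AMP":         4*L*N + 9*L + 24*N,
            "DCS-AMP":       4*I0*L*N + 9*L + 32*N,
            "HyGAMP-DCS":    4*L*N + 9*L + 32*N,
            "EM-HyGAMP-DCS": 4*L*N + 9*L + 45*N,
        }

    # Example: L = 300 pilots, N = 1000 users, I0 = 5 inner AMP iterations.
    for name, cost in per_iteration_cost(300, 1000, 5).items():
        print(f"{name:>14s}: {cost:,} complex multiplications per frame")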

Remark 1

The proposed HyGAMP-DCS algorithm has a similar structure to the DCS-AMP algorithm proposed in [6]. However, in each outer iteration the DCS-AMP algorithm runs a complete AMP algorithm, in which the variables are first initialized and then iteratively updated I_0 ≫ 1 times [6]. Thanks to this I_0-iteration AMP module, DCS-AMP may need only a few outer iterations of message exchange between adjacent frames to update the activity likelihoods. In each iteration of HyGAMP-DCS, by contrast, the variables in the GAMP part are updated only once, based on the estimates from the previous iteration. Note that the computational cost of both DCS-AMP and HyGAMP-DCS mainly results from the AMP estimation. Therefore, the overall computational complexity of DCS-AMP is usually higher than that of HyGAMP-DCS, even though DCS-AMP requires only a few outer iterations.

VII Simulation Results

In this section, we present simulation results for our proposed algorithms in the temporally-correlated massive access system. To show their advantage, the conventional DCS-based PIA-ASP algorithm [27], the standard GAMP algorithm [32], and the S-AMP algorithm [28] are also evaluated as benchmarks. (From our simulation trials, the S-AMP algorithm performs very similarly to the SI-aided MMV-AMP algorithm [29]; we therefore only show the performance of the S-AMP algorithm in this section.)

We consider a simulation scenario with N = 10^3 users. The number of consecutive frames for joint detection is set to T = 4 and we set p_10 = 0.25, unless otherwise specified. The performance metric for user activity detection is the time-averaged activity error ratio (TAER), calculated as

\text{TAER} = \frac{\sum_{t=1}^{T}\big[N_{m}(t)+N_{f}(t)\big]}{NT}, \qquad (48)

where N_m(t) is the number of missed detections in the t-th frame and N_f(t) is the number of false alarms in the t-th frame. All numerical results are obtained by averaging over 10^4 channel realizations.
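The TAER in (48) can be computed from boolean activity maps as in the following sketch (our own helper):

    import numpy as np

    def taer(act_hat, act_true):
        """Time-averaged activity error ratio (48).

        act_hat, act_true: boolean arrays of shape (N, T); an entry is
        True when the user is (declared) active in that frame.
        """
        missed = np.logical_and(~act_hat, act_true).sum()       # sum of N_m(t)
        false_alarm = np.logical_and(act_hat, ~act_true).sum()  # sum of N_f(t)
        return (missed + false_alarm) / act_true.size           # divide by NT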

[Figure 5: The impact of SNR on the user activity detection performance.]
[Figure 6: The impact of SNR on the channel estimation performance.]

VII-A Comparison with the Benchmarks

We evaluate the performance of the proposed HyGAMP-DCS algorithm for user activity detection and channel estimation in Fig. 5 and Fig. 6. We set L = 300 and p_a = 0.2. A user is detected as active when the power of its estimated channel exceeds a pre-selected threshold ϱ, i.e., |x̂_{n,t}|^2 > ϱ, and the value of ϱ is the same for all the evaluated schemes. The results show that the three AMP-based algorithms all outperform the PIA-ASP algorithm, meaning that the prior knowledge of the channels can be exploited to greatly improve performance. It is also observed that the gaps between the PIA-ASP algorithm and the three AMP-based algorithms widen as the SNR increases, which implies that the AMP-based algorithms acquire a larger performance gain than the greedy algorithm in the high-SNR regime. By exploiting the temporal correlation, the S-AMP algorithm and the proposed HyGAMP-DCS algorithm achieve better performance than the GAMP algorithm. Moreover, the HyGAMP-DCS algorithm further improves upon S-AMP by performing bidirectional message propagation, at the cost of increased detection delay.

[Figure 7: The impact of pilot length L on the user activity detection performance.]
[Figure 8: The impact of pilot length L on the channel estimation performance.]

The impact of the pilot length L on the performance of the proposed algorithms and the benchmarks is evaluated in Fig. 7 and Fig. 8. We set p_a = 0.1 and SNR = −10 dB. We observe that the performance gaps between the AMP-based algorithms and the PIA-ASP algorithm grow as L increases, which indicates the essential role of the channel statistics in performance improvement. It is also observed that the PIA-ASP algorithm outperforms the GAMP algorithm in user activity detection when L is smaller than 190, although the GAMP algorithm always has better channel estimation performance. As L increases further, the GAMP algorithm outperforms the PIA-ASP algorithm, which means that exploiting the temporal correlation provides only a limited performance gain for the greedy algorithm. The EM-HyGAMP-DCS algorithm achieves performance similar to that of the HyGAMP-DCS algorithm even though the true system statistics are unknown, implying that the EM-HyGAMP-DCS algorithm is suitable for practical scenarios.

[Figure 9: The impact of the number of joint detection frames, T, on the user activity detection performance.]
[Figure 10: The impact of the number of joint detection frames, T, on the channel estimation performance.]

Fig. 9 and Fig. 10 show the impact of the number of joint detection frames T on the performance of the evaluated algorithms. We set L = 200, p_a = 0.1, and SNR = −10 dB. Both the TAER and the TNMSE of the DCS-based algorithms are reduced as T increases. Moreover, the TAERs and TNMSEs of the S-AMP, HyGAMP-DCS, and EM-HyGAMP-DCS algorithms drop sharply as T increases while T < 4. However, the performance improvement diminishes as T keeps increasing, and eventually the performance saturates. As such, a moderate value of T suffices to strike a compromise between performance and detection delay.

[Figure 11: The impact of transition probability p_11 on the user activity detection performance.]
[Figure 12: The impact of transition probability p_11 on the channel estimation performance.]

Fig. 11 and Fig. 12 depict the impact of the transition probability p_11 on the algorithm performance under a fixed activity probability p_a = 0.1. We again set L = 200 and SNR = −10 dB. It is observed that the DCS-based algorithms all achieve smaller TAER and TNMSE under stronger temporal correlation, and their performance is greatly enhanced as p_11 approaches 1. These results demonstrate that exploiting the temporal correlation can greatly improve the user activity detection and channel estimation performance. Furthermore, our proposed algorithms significantly outperform the benchmarks, especially in the scenario with strong temporal correlation, which validates their superior performance.

[Figure 13: TNMSE performance with SNR = −10 dB.]
[Figure 14: TNMSE performance with SNR = 0 dB.]

VII-B Validation of the Theoretical Analysis

Finally, the simulation and theoretical results under the setup L = 200 and p_a = 0.1 are given in Fig. 13 and Fig. 14. We find that the converged TNMSE of the proposed HyGAMP-DCS and EM-HyGAMP-DCS algorithms is accurately predicted by the SE. Compared with the GAMP algorithm, the proposed algorithms converge faster in the high-SNR scenario, which means that the HyGAMP-DCS-based algorithms improve both estimation accuracy and convergence. The theoretical results also indicate that the EM-HyGAMP-DCS algorithm achieves almost the same performance as the HyGAMP-DCS algorithm in the asymptotic regime, even though the true system statistics are unknown. Therefore, in a practical system with imperfect system statistics, we can first employ the SE to find the appropriate hyperparameter initialization and then employ the EM-HyGAMP-DCS algorithm for joint user activity detection and channel estimation.

VIII Conclusions

This work studies the grant-free temporally-correlated massive access system, in which active users transmit over multiple consecutive frames with a large probability. The problem of joint user activity detection and channel estimation over multiple consecutive frames is formulated as a DCS problem. Based on the probabilistic model, we propose the computationally efficient HyGAMP-DCS algorithm, which exploits both the channel statistics and the temporally-correlated user activities from the perspective of Bayesian inference. Simulation results show that the proposed algorithm achieves much better performance than the conventional DCS-based algorithms. To acquire the system statistics, the EM algorithm is incorporated into HyGAMP-DCS to adaptively update the hyperparameters during the estimation procedure. In particular, we point out that the SE is a powerful tool for selecting the appropriate hyperparameter initialization of EM-HyGAMP-DCS. Additionally, we observe that a moderate number of joint detection frames is enough for the proposed algorithms to nearly achieve the optimal performance.

Appendix A: Derivation of (29)

The objective function in problem (27) can be expressed as

\mathcal{L}^{i}(\boldsymbol{\vartheta}) = \mathbb{E}_{p^{i}(\mathbf{X})}[\ln p(\mathbf{X},\mathbf{Y}|\boldsymbol{\vartheta})] = \mathbb{E}_{p^{i}(\mathbf{X})}\left[\sum_{t=1}^{T}\Big(\ln p(\mathbf{y}_{t}|\mathbf{x}_{t}) + \sum_{n=1}^{N}\big(\ln p(x_{n,t}|\beta,\lambda_{n,t}) + \ln p(\lambda_{n,t}|\lambda_{n,t-1})\big)\Big)\right]. \qquad (49)

We observe that all the terms except ln p(𝐲_t|𝐱_t) for all t are independent of the noise variance σ_w^2, so the partial derivative of (49) with respect to σ_w^2 is obtained as

\frac{\partial}{\partial\sigma^{2}_{w}}\mathcal{L}^{i}(\boldsymbol{\vartheta}) = \frac{\partial}{\partial\sigma^{2}_{w}}\mathbb{E}_{p^{i}(\mathbf{X})}\left[\sum_{t=1}^{T}\ln p(\mathbf{y}_{t}|\mathbf{x}_{t})\right] = \frac{1}{\sigma_{w}^{4}}\sum_{t=1}^{T}\sum_{l=1}^{L}\Big[|y_{l,t}-\widehat{z}^{0}_{l,t}(i)|^{2}+\tau^{z}_{l,t}(i)\Big] - \frac{LT}{\sigma^{2}_{w}}. \qquad (50)

Finally, the update rule of σ_w^2 in (29) is obtained by setting (50) equal to zero.
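Assuming that (29) is the closed form obtained by zeroing (50), namely the empirical average of the residual powers plus posterior variances, the corresponding M-step can be sketched as follows (array names are ours):

    import numpy as np

    def update_noise_variance(Y, Z_hat, tau_z):
        """sigma_w^2 update obtained by setting (50) to zero.

        Y, Z_hat, tau_z: (L, T) arrays of observations, posterior means
        z_hat^0_{l,t}(i), and posterior variances tau^z_{l,t}(i).
        """
        L, T = Y.shape
        return float((np.abs(Y - Z_hat)**2 + tau_z).sum() / (L * T))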

Appendix B: Derivation of (30)

Similarly, only the terms ln p(x_{n,t}|β,λ_{n,t}) for all t depend on the large-scale attenuation component β. The partial derivative of (49) with respect to β is therefore

\frac{\partial}{\partial\beta}\mathcal{L}^{i}(\boldsymbol{\vartheta}) = \sum_{t=1}^{T}\sum_{n=1}^{N}\int p^{i}(x_{n,t}|\mathbf{Y})\,\frac{\partial}{\partial\beta}\ln p(x_{n,t}|\beta,\lambda_{n,t})\, dx_{n,t}, \qquad (51)

where

\frac{\partial}{\partial\beta}\ln p(x_{n,t}|\beta,\lambda_{n,t}) = \begin{cases} 0, & x_{n,t}=0,\\ \frac{|x_{n,t}|^{2}}{\beta^{2}}-\frac{1}{\beta}, & x_{n,t}\neq 0. \end{cases} \qquad (52)

To tackle the discontinuity of \partial p(x_{n,t}|\beta,\lambda_{n,t})/\partial\beta at the point x_{n,t} = 0, we adopt the strategy in [31]: define the closed region \mathcal{B} \triangleq \{x\,|\,|x|\leq\epsilon, x\in\mathbb{C}\} and its complement \bar{\mathcal{B}} = \mathbb{C}\setminus\mathcal{B}, and take the limit \epsilon\to 0. We then obtain

\frac{\partial}{\partial\beta}\mathcal{L}^{i}(\boldsymbol{\vartheta}) = \frac{1}{\beta^{2}}\sum_{t=1}^{T}\sum_{n=1}^{N}\int_{\bar{\mathcal{B}}}p^{i}(x_{n,t}|\mathbf{Y})\,|x_{n,t}|^{2}\, dx_{n,t} - \frac{1}{\beta}\sum_{t=1}^{T}\sum_{n=1}^{N}\int_{\bar{\mathcal{B}}}p^{i}(x_{n,t}|\mathbf{Y})\, dx_{n,t} = \frac{1}{\beta^{2}}\sum_{t=1}^{T}\sum_{n=1}^{N}\varpi_{n,t}(i)\big(|\gamma_{n,t}(i)|^{2}+\tau_{n,t}^{\gamma}(i)\big) - \frac{1}{\beta}\sum_{t=1}^{T}\sum_{n=1}^{N}\varpi_{n,t}(i). \qquad (53)

The update rule of β in (30) then follows by setting (53) equal to zero.
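Under the analogous assumption that (30) is the closed form obtained by zeroing (53), the β update is a posterior-weighted second moment (our own sketch):

    import numpy as np

    def update_beta(varpi, gamma, tau_gamma):
        """beta update obtained by setting (53) to zero.

        varpi: posterior activity probabilities varpi_{n,t}(i), shape (N, T).
        gamma, tau_gamma: posterior means and variances of the nonzero
        (active) coefficients, shape (N, T).
        """
        return float((varpi * (np.abs(gamma)**2 + tau_gamma)).sum()
                     / varpi.sum())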

Appendix C: Derivation of (31)

We observe that only the term ln p(λ_{n,1}|λ_{n,0}), i.e., the transition term for t = 1, depends on p_a. Therefore, the partial derivative of (49) with respect to p_a is given by

\frac{\partial}{\partial p_{a}}\mathcal{L}^{i}(\boldsymbol{\vartheta}) = \sum_{n=1}^{N}\int p^{i}(\lambda_{n,1}|\mathbf{Y})\,\frac{\partial}{\partial p_{a}}\ln p(\lambda_{n,1}|\lambda_{n,0})\, d\lambda_{n,1}. \qquad (54)

Since p(λ_{n,1}|λ_{n,0}) follows a Bernoulli distribution, the partial derivative of ln p(λ_{n,1}|λ_{n,0}) with respect to p_a can be expressed as

\frac{\partial}{\partial p_{a}}\ln p(\lambda_{n,1}|\lambda_{n,0}) = \frac{2\lambda_{n,1}-1}{p(\lambda_{n,1}|\lambda_{n,0})} = \begin{cases} -\frac{1}{1-p_{a}}, & \lambda_{n,1}=0,\\ \frac{1}{p_{a}}, & \lambda_{n,1}=1. \end{cases} \qquad (57)

From the message passing algorithm, the posterior probability of λ_{n,1} can be obtained as

p^{i}(\lambda_{n,1}|\mathbf{Y}) = \xi^{i}_{(n,1)\rightarrow(n,1)}(\lambda_{n,1})\cdot\upsilon^{i}_{(n,1)\leftarrow(n,1)}(\lambda_{n,1})\cdot\zeta^{i}_{(n,1)\leftarrow(n,2)}(\lambda_{n,1}) = (1-\kappa_{n,1}(i))(1-\lambda_{n,1}) + \kappa_{n,1}(i)\lambda_{n,1}, \qquad (58)

where

\kappa_{n,1}(i) = \frac{\overrightarrow{q}_{n,1}(i)\,\overleftarrow{q}_{n,1}(i)\,\overleftarrow{p}_{n,1}(i)}{(1-\overrightarrow{q}_{n,1}(i))(1-\overleftarrow{q}_{n,1}(i))(1-\overleftarrow{p}_{n,1}(i)) + \overrightarrow{q}_{n,1}(i)\,\overleftarrow{q}_{n,1}(i)\,\overleftarrow{p}_{n,1}(i)}. \qquad (59)

By inserting (57) and (58) into (54), we can express (54) as

\frac{\partial}{\partial p_{a}}\mathcal{L}^{i}(\boldsymbol{\vartheta}) = \sum_{n=1}^{N}\left[-\frac{1-\kappa_{n,1}(i)}{1-p_{a}} + \frac{\kappa_{n,1}(i)}{p_{a}}\right]. \qquad (60)

Finally, we obtain (31) as the update rule for pap_{a} by setting (60) equal to zero.
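Zeroing (60) gives p_a = (1/N) Σ_n κ_{n,1}(i), which we take to be the rule (31); a minimal sketch:

    import numpy as np

    def update_pa(kappa_1):
        """p_a update obtained by setting (60) to zero.

        kappa_1: length-N array of the posterior activity probabilities
        kappa_{n,1}(i) in (59) for the first frame.
        """
        return float(np.mean(kappa_1))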

Appendix D: Derivation of (32)

It is found that only the terms ln p(λ_{n,t}|λ_{n,t-1}) for t ≥ 2 depend on p_10, and we have

p(\lambda_{n,t}|\lambda_{n,t-1}) = (1-\lambda_{n,t-1})(1-\lambda_{n,t})p_{00} + (1-\lambda_{n,t-1})\lambda_{n,t}p_{01} + \lambda_{n,t-1}(1-\lambda_{n,t})p_{10} + \lambda_{n,t-1}\lambda_{n,t}p_{11}. \qquad (61)

Thus, the partial derivative of (49) with respect to p_10 is given by

\frac{\partial}{\partial p_{10}}\mathcal{L}^{i}(\boldsymbol{\vartheta}) = \sum_{t=2}^{T}\sum_{n=1}^{N}\iint p^{i}(\lambda_{n,t-1},\lambda_{n,t}|\mathbf{Y})\,\frac{\partial}{\partial p_{10}}\ln p(\lambda_{n,t}|\lambda_{n,t-1})\, d\lambda_{n,t-1}\, d\lambda_{n,t} = \sum_{t=2}^{T}\sum_{n=1}^{N}\iint p^{i}(\lambda_{n,t-1},\lambda_{n,t}|\mathbf{Y})\left[\frac{\lambda_{n,t-1}(1-\lambda_{n,t})}{p_{10}}-\frac{\lambda_{n,t-1}\lambda_{n,t}}{1-p_{10}}\right] d\lambda_{n,t-1}\, d\lambda_{n,t} = \sum_{t=2}^{T}\sum_{n=1}^{N}\left(\frac{\mathbb{E}[\lambda_{n,t-1}]}{p_{10}}-\frac{\mathbb{E}[\lambda_{n,t-1}\lambda_{n,t}]}{p_{10}(1-p_{10})}\right). \qquad (62)

Then the update rule of p_10 in (32) is obtained by setting (62) equal to zero. To give its explicit expression, we first note that 𝔼[λ_{n,t-1}] = κ_{n,t-1}(i) in the i-th iteration, where κ_{n,t-1}(i) is obtained analogously to (59). From the MP-based algorithm, the pairwise joint posterior probability p(λ_{n,t-1}, λ_{n,t}|𝐘) can be expressed as

p^{i}(\lambda_{n,t-1},\lambda_{n,t}|\mathbf{Y}) = p(\lambda_{n,t}|\lambda_{n,t-1})\cdot\xi^{i}_{(n,t-1)\rightarrow(n,t)}(\lambda_{n,t-1})\cdot\zeta^{i}_{(n,t-1)\leftarrow(n,t)}(\lambda_{n,t}) = p(\lambda_{n,t}|\lambda_{n,t-1})\cdot\big[(1-\phi_{n,t-1}(i))(1-\lambda_{n,t-1})+\phi_{n,t-1}(i)\lambda_{n,t-1}\big]\cdot\big[(1-\varphi_{n,t}(i))(1-\lambda_{n,t})+\varphi_{n,t}(i)\lambda_{n,t}\big], \qquad (63)

where

\phi_{n,t-1}(i) = \frac{\overrightarrow{q}_{n,t-1}(i)\,\overleftarrow{p}_{n,t-1}(i)}{(1-\overrightarrow{q}_{n,t-1}(i))(1-\overleftarrow{p}_{n,t-1}(i)) + \overrightarrow{q}_{n,t-1}(i)\,\overleftarrow{p}_{n,t-1}(i)}, \qquad (64)
\varphi_{n,t}(i) = \frac{\overleftarrow{q}_{n,t}(i)\,\overleftarrow{p}_{n,t}(i)}{(1-\overleftarrow{q}_{n,t}(i))(1-\overleftarrow{p}_{n,t}(i)) + \overleftarrow{q}_{n,t}(i)\,\overleftarrow{p}_{n,t}(i)}. \qquad (65)

From the joint posterior probability p(λ_{n,t-1}, λ_{n,t}|𝐘), we have

\mathbb{E}[\lambda_{n,t-1}\lambda_{n,t}] = p_{11}\,\phi_{n,t-1}(i)\,\varphi_{n,t}(i). \qquad (66)

By inserting (66) into (32), we finally obtain the explicit expression of the update rule for p10p_{10}.
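Assuming that zeroing (62) reduces (32) to p_10 = 1 − Σ_{t≥2}Σ_n 𝔼[λ_{n,t-1}λ_{n,t}] / Σ_{t≥2}Σ_n 𝔼[λ_{n,t-1}], with the joint moment supplied by (66), the update can be sketched as follows (our own helper):

    import numpy as np

    def update_p10(kappa, p11, phi, varphi):
        """p_10 update obtained by setting (62) to zero.

        kappa: E[lambda_{n,t}] = kappa_{n,t}(i), shape (N, T).
        phi, varphi: the quantities (64)-(65) for t = 2, ..., T,
        shape (N, T-1), so E[lambda_{n,t-1} lambda_{n,t}] is given
        by p11 * phi * varphi as in (66).
        """
        joint = p11 * phi * varphi          # (66)
        return float(1.0 - joint.sum() / kappa[:, :-1].sum())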

References

  • [1] W. Zhu, M. Tao, and Y. Guan, “Joint user activity detection and channel estimation for temporal-correlated massive access,” in Proc. IEEE Int. Conf. Commun., June 2021, pp. 1–6.
  • [2] X. Chen, D. W. K. Ng, W. Yu, E. G. Larsson, N. Al-Dhahir, and R. Schober, “Massive access for 5G and beyond,” IEEE J. Sel. Areas Commun., vol. 39, no. 3, pp. 615–637, 2021.
  • [3] C. Bockelmann, N. Pratas, H. Nikopour, K. Au, T. Svensson, C. Stefanovic, P. Popovski, and A. Dekorsy, “Massive machine-type communications in 5G: physical and MAC-layer solutions,” IEEE Commun. Mag., vol. 54, no. 9, pp. 59–65, Sep. 2016.
  • [4] L. Liu, E. G. Larsson, W. Yu, P. Popovski, C. Stefanovic, and E. de Carvalho, “Sparse signal processing for grant-free massive connectivity: A future paradigm for random access protocols in the Internet of Things,” IEEE Signal Process. Mag., vol. 35, no. 5, pp. 88–99, Sep. 2018.
  • [5] D. Angelosante, G. B. Giannakis, and E. Grossi, “Compressed sensing of time-varying signals,” in Proc. Int. Conf. Digit. Signal. Process., 2009, pp. 1–8.
  • [6] J. Ziniel and P. Schniter, “Dynamic compressive sensing of time-varying signals via approximate message passing,” IEEE Trans. Signal Process., vol. 61, no. 21, pp. 5270–5284, 2013.
  • [7] F. Kschischang, B. Frey, and H.-A. Loeliger, “Factor graphs and the sum-product algorithm,” IEEE Trans. Inf. Theory, vol. 47, no. 2, pp. 498–519, 2001.
  • [8] S. Rangan, A. K. Fletcher, V. K. Goyal, E. Byrne, and P. Schniter, “Hybrid approximate message passing,” IEEE Trans. Signal Process., vol. 65, no. 17, pp. 4577–4592, 2017.
  • [9] H. F. Schepker, C. Bockelmann, and A. Dekorsy, “Exploiting sparsity in channel and data estimation for sporadic multi-user communication,” in Proc. Int. Symp. Wireless Commun. Syst., Aug 2013, pp. 1–5.
  • [10] G. Wunder, P. Jung, and M. Ramadan, “Compressive random access using a common overloaded control channel,” in Proc. IEEE Global Commun. Conf. Workshops, Dec 2015, pp. 1–6.
  • [11] Z. Chen, F. Sohrabi, and W. Yu, “Sparse activity detection for massive connectivity,” IEEE Trans. Signal Process., vol. 66, no. 7, pp. 1890–1904, April 2018.
  • [12] L. Liu and W. Yu, “Massive connectivity with massive MIMO-Part I: Device activity detection and channel estimation,” IEEE Trans. Signal Process., vol. 66, no. 11, pp. 2933–2946, June 2018.
  • [13] R. B. D. Renna and R. C. de Lamare, “Dynamic message scheduling based on activity-aware residual belief propagation for asynchronous mMTC,” IEEE Wireless Commun. Lett., vol. 10, no. 6, pp. 1290–1294, 2021.
  • [14] K. Senel and E. G. Larsson, “Grant-free massive MTC-enabled massive MIMO: A compressive sensing approach,” IEEE Trans. Commun., vol. 66, no. 12, pp. 6164–6175, Dec 2018.
  • [15] S. Jiang, X. Yuan, X. Wang, C. Xu, and W. Yu, “Joint user identification, channel estimation, and signal detection for grant-free NOMA,” IEEE Trans. Wireless Commun., vol. 19, no. 10, pp. 6960–6976, 2020.
  • [16] A. Fengler, S. Haghighatshoar, P. Jung, and G. Caire, “Non-Bayesian activity detection, large-scale fading coefficient estimation, and unsourced random access with a massive MIMO receiver,” IEEE Trans. Inf. Theory, vol. 67, no. 5, pp. 2925–2951, 2021.
  • [17] Z. Chen, F. Sohrabi, Y.-F. Liu, and W. Yu, “Phase transition analysis for covariance-based massive random access with massive MIMO,” IEEE Trans. Inf. Theory, vol. 68, no. 3, pp. 1696–1715, 2021.
  • [18] X. Shao, X. Chen, and R. Jia, “A dimension reduction-based joint activity detection and channel estimation algorithm for massive access,” IEEE Trans. Signal Process., vol. 68, pp. 420–435, 2020.
  • [19] Z. Zhang, Y. Li, C. Huang, Q. Guo, C. Yuen, and Y. L. Guan, “DNN-aided block sparse Bayesian learning for user activity detection and channel estimation in grant-free non-orthogonal random access,” IEEE Trans. Veh. Technol., vol. 68, no. 12, pp. 12000–12012, Dec. 2019.
  • [20] X. Shao, X. Chen, Y. Qiang, C. Zhong, and Z. Zhang, “Feature-aided adaptive-tuning deep learning for massive device detection,” IEEE J. Sel. Areas Commun., vol. 39, no. 7, pp. 1899–1914, 2021.
  • [21] W. Zhu, M. Tao, X. Yuan, and Y. Guan, “Deep-learned approximate message passing for asynchronous massive connectivity,” IEEE Trans. Wireless Commun., vol. 20, no. 8, pp. 5434–5448, 2021.
  • [22] Y. Li, M. Xia, and Y. Wu, “Activity detection for massive connectivity under frequency offsets via first-order algorithms,” IEEE Trans. Wireless Commun., vol. 18, no. 3, pp. 1988–2002, 2019.
  • [23] Z. Chen, F. Sohrabi, and W. Yu, “Multi-cell sparse activity detection for massive random access: Massive MIMO versus cooperative MIMO,” IEEE Trans. Wireless Commun., vol. 18, no. 8, pp. 4060–4074, 2019.
  • [24] M. Ke, Z. Gao, Y. Wu, X. Gao, and K. K. Wong, “Massive access in cell-free massive MIMO-based Internet of Things: Cloud computing and edge computing paradigms,” IEEE J. Sel. Areas Commun., vol. 39, no. 3, pp. 756–772, 2021.
  • [25] Y. Polyanskiy, “A perspective on massive random-access,” in Proc IEEE Int. Symp. Inf. Theory, June 2017, pp. 2523–2527.
  • [26] B. Wang, L. Dai, Y. Zhang, T. Mir, and J. Li, “Dynamic compressive sensing-based multi-user detection for uplink grant-free NOMA,” IEEE Commun. Lett., vol. 20, no. 11, pp. 2320–2323, 2016.
  • [27] Y. Du, B. Dong, Z. Chen, X. Wang, Z. Liu, P. Gao, and S. Li, “Efficient multi-user detection for uplink grant-free NOMA: Prior-information aided adaptive compressive sensing perspective,” IEEE J. Sel. Areas Commun., vol. 35, no. 12, pp. 2812–2828, 2017.
  • [28] J.-C. Jiang and H.-M. Wang, “Massive random access with sporadic short packets: Joint active user detection and channel estimation via sequential message passing,” IEEE Trans. Wireless Commun., vol. 20, no. 7, pp. 4541–4555, 2021.
  • [29] Q. Wang, L. Liu, S. Zhang, and F. C. M. Lau, “On massive IoT connectivity with temporally-correlated user activity,” in Proc. IEEE Int. Symp. Inf. Theory, 2021, pp. 3020–3025.
  • [30] W. Zhu, M. Tao, and Y. Guan, “Double-sided information aided temporal-correlated massive access,” IEEE Wireless Commun. Lett., vol. 11, no. 9, pp. 1860–1864, 2022.
  • [31] J. P. Vila and P. Schniter, “Expectation-maximization Gaussian-mixture approximate message passing,” IEEE Trans. Signal Process., vol. 61, no. 19, pp. 4658–4672, 2013.
  • [32] S. Rangan, “Generalized approximate message passing for estimation with random linear mixing,” in Proc. IEEE Int. Symp. Inf. Theory, 2011, pp. 2168–2172.
  • [33] T. K. Moon, “The expectation-maximization algorithm,” IEEE Signal Process. Mag., vol. 13, no. 6, pp. 47–60, 1996.
  • [34] R. M. Neal and G. E. Hinton, “A view of the EM algorithm that justifies incremental, sparse, and other variants,” in Learning in Graphical Models.   Berlin: Springer, 1998, pp. 355–368.