
Joint Activity-Delay Detection and Channel Estimation for Asynchronous Massive Random Access: A Free Probability Theory Approach

Xinyu Bian,  Yuyi Mao,  and Jun Zhang X. Bian and J. Zhang are with the Department of Electronic and Computer Engineering, the Hong Kong University of Science and Technology, Hong Kong (E-mail: [email protected], [email protected]). Y. Mao is with the Department of Electrical and Electronic Engineering, the Hong Kong Polytechnic University, Hong Kong (E-mail: [email protected]). This work was supported by the General Research Fund (Project No. 15207220) from the Hong Kong Research Grants Council. (Corresponding author: Yuyi Mao) Part of this work was presented at the 2023 IEEE Global Communications Conference (GLOBECOM) [1].
Abstract

Grant-free random access (RA) has been recognized as a promising solution to support massive connectivity since it removes the uplink grant request procedure. While most prior efforts assume perfect synchronization between users and the base station, this paper investigates asynchronous grant-free massive RA and develops efficient algorithms for joint user activity detection, synchronization delay detection, and channel estimation. Exploiting the sparsity of user activity, we formulate a sparse signal recovery problem and propose to utilize the framework of orthogonal approximate message passing (OAMP) to handle the non-independent and identically distributed (i.i.d.) Gaussian pilot matrices caused by the synchronization delays. In particular, an OAMP-based algorithm is developed to fully harness the common sparsity among received pilot signals from multiple base station antennas. To reduce the computational complexity, we further propose a free probability AMP (FPAMP)-based algorithm, which exploits the rectangular free cumulants to make the cost-effective AMP framework compatible with general pilot matrices. Simulation results demonstrate that the two proposed algorithms outperform various baselines, and the FPAMP-based algorithm reduces the computations by 40% while maintaining detection/estimation accuracy comparable to the OAMP-based algorithm.

Index Terms:
Grant-free massive random access, activity detection, delay detection, channel estimation, asynchronous connectivity, free probability theory, approximate message passing (AMP).

I Introduction

As applications supported by massive machine-type communications (mMTC) technologies rapidly evolve [2], an enormous challenge arises in supporting reliable communication for massive devices with limited bandwidth resources [3]. This challenge should be tackled together with other unique features of mMTC, such as the sporadic data traffic pattern, via novel random access (RA) schemes [4]. Conventionally, each user (e.g., an Internet-of-Things device) requests a grant from the base station (BS) for its transmission by initiating a complex handshaking process. Such grant-based RA schemes result in significant signaling overhead and access latency [5, 6], especially in the target scenarios with massive numbers of potential users. In addition, due to the limited number of preamble sequences for uplink grant requests, grant-based RA suffers from potential collisions when two or more users pick the same sequence [7]. As a result, grant-free RA, where each active user directly transmits data without waiting for access permission from the BS, is more appealing for mMTC [8].

Since user activity is unknown beforehand at the BS with grant-free RA, each user is assigned a fixed and unique pilot sequence to enable user activity detection and channel estimation, which is critical to the downstream data detection and decoding [9]. However, owing to the large number of users and the limited resources for pilot transmission, only non-orthogonal pilot sequences can be adopted, which unavoidably compromises the accuracy of user activity detection and channel estimation. Therefore, many works have been carried out to develop advanced user activity detection and channel estimation algorithms for grant-free RA systems.

I-A Related Works and Motivations

User activity detection and channel estimation have been major research topics for grant-free RA. In [10] and [11], the set of active users is determined by solving a maximum likelihood (ML) estimation problem via coordinate-wise descent based on the sample covariance matrix of the received pilot signal. Joint activity detection and channel estimation (JADCE) was investigated in [12, 13, 14, 15, 17] by adopting the family of approximate message passing (AMP) algorithms. In particular, by formulating JADCE as a single measurement vector (SMV) or multiple measurement vector (MMV) compressive sensing problem with a single- or multi-antenna BS, respectively, a variety of AMP algorithms with a minimum mean square error (MMSE) denoiser were developed in [12]. To improve the user activity detection and channel estimation accuracy, wireless channel sparsity in the spatial and angular domains was further utilized by a general MMV-AMP algorithm in [13]. Besides, the common sparsity pattern in the received pilot and data signals was exploited via the bilinear generalized AMP (BiG-AMP) algorithm along with soft data decoding information in [14], while a correlated AMP algorithm was developed to take advantage of the user activity correlation among transmission blocks in [15]. Moreover, low-complexity JADCE methods based on deep learning were proposed in [16, 17].

Nevertheless, the above studies assume that users and the BS are perfectly synchronized, which is impractical in grant-free RA systems. This is because, without coordination by the BS, different users may send their pilot sequences on demand, which can be randomly delayed by some unknown number of symbol periods [18]. Therefore, asynchronous grant-free massive RA has attracted growing interest. Specifically, the joint activity and delay detection problem in asynchronous grant-free RA was formulated as a group least absolute shrinkage and selection operator (LASSO) problem in [19], which was solved by a block coordinate descent algorithm. However, this algorithm does not leverage the common sparsity among the received pilot signals of multiple BS antennas. Accordingly, a sample covariance matrix-based approach was proposed in [20] to achieve higher activity and delay detection accuracy at increased computational complexity. Note that the methods in [19] and [20] require separate channel estimation procedures following user activity and delay detection. To overcome this limitation, joint activity-delay detection and channel estimation was investigated in [21] and [22], where low-complexity orthogonal matching pursuit (OMP)-based and alternating direction method of multipliers (ADMM)-based methods were developed, respectively. However, it can be anticipated that both methods are far from optimal: on one hand, the OMP-based algorithm is unable to utilize prior knowledge of the channel coefficients; on the other hand, the ADMM-based approach does not explicitly incorporate the synchronization delays in its optimization. Therefore, [23] proposed a model-driven learned AMP (LAMP) network for asynchronous grant-free massive RA, which unrolls the AMP iterations as neural network layers. However, since the entries of the effective pilot matrix in asynchronous grant-free massive RA systems are not i.i.d. Gaussian due to the delayed pilot symbols, such a basic AMP-based algorithm cannot achieve the best detection and estimation accuracy. It is thus of pressing need to develop advanced receivers for asynchronous grant-free massive RA to enhance the performance of activity-delay detection and channel estimation.

I-B Contributions

In this paper, we develop novel joint activity-delay detection and channel estimation algorithms for asynchronous grant-free massive RA systems, which enjoy both strong performance and low computational complexity. Our main contributions are summarized below.

  • Given that the conventional AMP-based joint activity detection and channel estimation algorithm fails to handle pilot matrices with non-i.i.d. Gaussian entries caused by the synchronization delay, the framework of orthogonal AMP (OAMP) [24] is adopted to jointly estimate the user activity, synchronization delay, and channel coefficients for asynchronous grant-free massive RA systems. To boost the detection/estimation performance, the common sparsity among multiple BS antennas is also utilized to update the prior information in each iteration of the OAMP-based algorithm.

  • One drawback of the OAMP-based algorithm is its high computational complexity due to the use of a linear MMSE (LMMSE) estimator. In order to reduce the complexity, we resort to free probability theory and develop a free probability AMP (FPAMP)-based algorithm for joint activity-delay detection and channel estimation. Unlike the OAMP-based algorithm, FPAMP estimates the moments of the empirical eigenvalue distribution of the pilot matrix and exploits its rectangular free cumulants to extend the low-complexity AMP framework to general pilot matrices, which avoids the complex LMMSE estimator.

  • Simulation results show that the proposed OAMP-based and FPAMP-based algorithms significantly reduce the activity-delay detection and channel estimation errors compared with the baseline schemes. In particular, when the required probabilities of false alarm and missed detection are $10^{-1}$ and $2\times 10^{-3}$, respectively, the two proposed algorithms are able to support 76% more active users in comparison with the conventional AMP-based algorithm. In addition, the FPAMP-based algorithm enjoys a similar complexity as the AMP-based algorithm, while maintaining almost the same detection/estimation performance as the OAMP-based algorithm.

I-C Organization

The rest of this paper is organized as follows. We introduce the system model in Section II. In Section III, we propose an OAMP-based algorithm for joint activity-delay detection and channel estimation. A low-complexity FPAMP-based algorithm is developed in Section IV. Simulation results are presented in Section V and conclusions are drawn in Section VI.

I-D Notations

We use lower-case letters, bold-face lower-case letters, bold-face upper-case letters, and calligraphic letters to denote scalars, vectors, matrices, and sets, respectively. The transpose and conjugate transpose of matrix $\mathbf{M}$ are denoted as $\mathbf{M}^{\mathrm{T}}$ and $\mathbf{M}^{\mathrm{H}}$, respectively. Besides, we denote the complex Gaussian distribution with mean $\bm{\mu}$ and covariance matrix $\bm{\Sigma}$ as $\mathcal{CN}(\bm{\mu},\bm{\Sigma})$, and the probability density function (PDF) of a complex Gaussian variable $\bm{x}$ as $\mathcal{CN}(\bm{x};\bm{\mu},\bm{\Sigma})$. In addition, $\mathbf{I}_{L}$ denotes the $L$-by-$L$ identity matrix, $\delta_{0}$ denotes the Dirac delta function, "$\otimes$" stands for the Kronecker product, $\operatorname{tr}(\cdot)$ returns the trace of a matrix, and $\mathbb{E}[\cdot]$ and $\operatorname{Var}[\cdot]$ denote the statistical expectation and variance, respectively. Given vector $\mathbf{x}=(x_{1},\cdots,x_{L})^{\mathrm{T}}$, we use $\langle\mathbf{x}\rangle$ to denote its empirical average, i.e., $\frac{1}{L}\sum_{i=1}^{L}x_{i}$. We further use $[\mathbf{M}]_{i_{1}:i_{2},j_{1}:j_{2}}$ to denote the submatrix obtained by extracting the elements of matrix $\mathbf{M}$ from row $i_{1}$ to $i_{2}$ and column $j_{1}$ to $j_{2}$. If $i_{1}=i_{2}$ or $j_{1}=j_{2}$, the submatrix reduces to a vector and the second index is omitted. Moreover, $[f(\cdot)]_{t}$ denotes the $t$-th entry of the output of vector-valued function $f(\cdot)$.

II System Model

We consider an uplink asynchronous grant-free massive RA system with $N$ single-antenna users and an $M$-antenna BS, where the sets of users and BS antennas are denoted as $\mathcal{N}\triangleq\{1,\cdots,N\}$ and $\mathcal{M}\triangleq\{1,\cdots,M\}$, respectively. Users are assumed to have sporadic data traffic. Specifically, in each channel block, $K$ ($K\leq N$) of the $N$ users independently become active for uplink transmission with equal probability [14]. Let $u_{n}\in\{0,1\}$ be the activity indicator of user $n$, where $u_{n}=1$ indicates that user $n$ is active and vice versa. Denote the set of active users as $\mathcal{K}\triangleq\{n\in\mathcal{N}\,|\,u_{n}=1\}$. We assume quasi-static Rayleigh fading channels for simplicity, and denote the uplink channel vector from user $n$ to the BS as $\mathbf{f}_{n}\sim\mathcal{CN}(\mathbf{0},\beta_{n}\mathbf{I}_{M})$, where $\beta_{n}$ is the path loss of user $n$, known at the BS.

A grant-free RA scheme is adopted for uplink transmission, which divides each transmission block into two phases for transmitting pilot and data signals, respectively. In the first phase, $\bar{L}$ pilot symbols are transmitted for user activity detection and channel estimation at the BS. Since orthogonal pilot sequences are insufficient for the massive number of potential users, a unique and non-orthogonal pilot sequence, denoted as $\sqrt{\bar{L}}\bar{\mathbf{p}}_{n}$, where $\bar{\mathbf{p}}_{n}\triangleq[\bar{p}_{n,1},\cdots,\bar{p}_{n,\bar{L}}]^{\mathrm{T}}$ and $\bar{p}_{n,l}\sim\mathcal{CN}(0,1/\bar{L})$, is assigned to each user [12]. However, owing to the synchronization delay, the signal transmissions of different active users may start at different time instants. Without loss of generality, we assume each user transmits asynchronously at the frame level but synchronously at the symbol level, i.e., the pilot sequence is transmitted with a delay of some unknown number of symbol periods. We use $t_{n}$ to denote the unknown delay of user $n$, which is assumed to be an integer uniformly distributed in the set $\{0,\cdots,T\}$, with $T$ denoting the maximum synchronization delay. To avoid interference between pilot and data signals, we insert a guard interval spanning $T$ symbol periods between them following [23], as shown in Fig. 1. Therefore, the expanded pilot sequence of user $n$ with delay $t_{n}$, denoted as $\tilde{\mathbf{p}}_{n,t_{n}}$, can be expressed as $\tilde{\mathbf{p}}_{n,t_{n}}\triangleq[\mathbf{0}_{t_{n}}^{\mathrm{T}},\bar{\mathbf{p}}_{n}^{\mathrm{T}},\mathbf{0}_{T-t_{n}}^{\mathrm{T}}]^{\mathrm{T}}$, which is a sequence of length $L=\bar{L}+T$ obtained by padding $t_{n}$ and $T-t_{n}$ zeros before and after $\bar{\mathbf{p}}_{n}$, respectively. Correspondingly, by considering all possible synchronization delays, we define $\mathbf{P}_{n}\triangleq[\tilde{\mathbf{p}}_{n,0},\cdots,\tilde{\mathbf{p}}_{n,T}]\in\mathbb{C}^{L\times(T+1)}$ as the expanded pilot matrix of user $n$. A minimal sketch of this construction is given below.
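For concreteness, the following snippet (not from the original paper; Python/numpy assumed, with an illustrative function name and sizes) shows how the expanded pilot matrix $\mathbf{P}_{n}$ can be built by zero-padding:

```python
import numpy as np

def expanded_pilot_matrix(p_bar: np.ndarray, T: int) -> np.ndarray:
    """Build P_n: column t pads t zeros before and T - t zeros after p_bar."""
    L_bar = p_bar.shape[0]
    P_n = np.zeros((L_bar + T, T + 1), dtype=complex)
    for t in range(T + 1):
        P_n[t:t + L_bar, t] = p_bar
    return P_n

# Example: a length-6 pilot with entries ~ CN(0, 1/L_bar) and maximum delay T = 3
rng = np.random.default_rng(0)
L_bar, T = 6, 3
p_bar = (rng.standard_normal(L_bar) + 1j * rng.standard_normal(L_bar)) / np.sqrt(2 * L_bar)
P_n = expanded_pilot_matrix(p_bar, T)  # shape (9, 4), one column per candidate delay
```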

Figure 1: Frame structure of asynchronous grant-free massive RA, where a guard interval spanning $T=3$ symbol periods is inserted between the pilot and data symbols.

Since both the user activity and the synchronization delay are unknown at the BS, we further define the expanded indicator $\boldsymbol{\phi}_{n}\triangleq[\phi_{n,0},\cdots,\phi_{n,T}]^{\mathrm{T}}$ for user $n$, where $\phi_{n,t}=1$ only when $u_{n}=1$ and $t_{n}=t$. The received pilot signal $\tilde{\mathbf{Y}}\in\mathbb{C}^{L\times M}$ at the BS is expressed as follows:

\tilde{\mathbf{Y}}=\sqrt{\rho L}\,\mathbf{P}\mathbf{H}+\tilde{\mathbf{N}}, \qquad (1)

where $\mathbf{P}\triangleq[\mathbf{P}_{1},\cdots,\mathbf{P}_{N}]\in\mathbb{C}^{L\times(T+1)N}$ concatenates the expanded pilot matrices of all users, and $\mathbf{H}\triangleq[\mathbf{H}_{1}^{\mathrm{T}},\cdots,\mathbf{H}_{N}^{\mathrm{T}}]^{\mathrm{T}}\in\mathbb{C}^{(T+1)N\times M}$ stands for the expanded effective channel matrix with $\mathbf{H}_{n}\triangleq\boldsymbol{\phi}_{n}\otimes\mathbf{f}_{n}\in\mathbb{C}^{(T+1)\times M}$. Besides, $\rho$ is the transmit power, and $\tilde{\mathbf{N}}=[\tilde{\mathbf{n}}_{1},\cdots,\tilde{\mathbf{n}}_{L}]^{\mathrm{T}}$ denotes the Gaussian noise with zero mean and variance $\sigma^{2}$ for each element. We normalize the received signal and noise as $\mathbf{Y}\triangleq\tilde{\mathbf{Y}}/\sqrt{\rho L}$ and $\mathbf{N}\triangleq\tilde{\mathbf{N}}/\sqrt{\rho L}$, respectively, for notational convenience.
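As an illustration of the model in (1), the sketch below generates a synthetic instance, reusing expanded_pilot_matrix() from the previous snippet; the parameter values are arbitrary assumptions chosen only for demonstration.

```python
import numpy as np

rng = np.random.default_rng(1)
N, K, M, L_bar, T = 40, 4, 8, 12, 3        # illustrative sizes
L = L_bar + T
P = np.concatenate([expanded_pilot_matrix(
        (rng.standard_normal(L_bar) + 1j * rng.standard_normal(L_bar)) / np.sqrt(2 * L_bar), T)
        for _ in range(N)], axis=1)         # shape (L, (T+1)N)

H = np.zeros(((T + 1) * N, M), dtype=complex)
for n in rng.choice(N, size=K, replace=False):   # K active users
    t_n = rng.integers(0, T + 1)                 # uniform synchronization delay
    f_n = (rng.standard_normal(M) + 1j * rng.standard_normal(M)) / np.sqrt(2)  # beta_n = 1
    H[n * (T + 1) + t_n, :] = f_n                # phi_n ⊗ f_n: one non-zero row per active user

rho, sigma2 = 1.0, 0.01
N_tilde = np.sqrt(sigma2 / 2) * (rng.standard_normal((L, M)) + 1j * rng.standard_normal((L, M)))
Y = (np.sqrt(rho * L) * P @ H + N_tilde) / np.sqrt(rho * L)   # normalized received signal
```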

In contrast to synchronous grant-free massive RA systems, where only the user activity and channel coefficients need to be estimated, the synchronization delays of active users warrant additional attention in asynchronous grant-free massive RA systems. However, the heterogeneous synchronization delays among active users break the standard assumption that the entries of the expanded pilot matrix are independent and follow an identical Gaussian distribution, which makes existing AMP algorithms inapplicable. In the following sections, we develop efficient algorithms to detect the set of active users and to estimate their synchronization delays as well as channel coefficients.

III OAMP-based Algorithm for Asynchronous Massive RA

To tackle the limitation of AMP in handling the non-i.i.d. entries in the expanded pilot matrix $\mathbf{P}$, the framework of OAMP [24] emerges as an ideal candidate. However, the original OAMP framework was proposed for SMV problems and cannot be directly applied to asynchronous grant-free RA systems with multi-antenna BSs. Therefore, we first derive the operations of an OAMP iteration based on the received signal of an individual BS antenna in Section III-A, where a denoiser is carefully designed according to the prior information of the expanded effective channel matrix. Then, we further develop the OAMP iterations by utilizing the common sparsity among the received signals of multiple BS antennas [25], i.e., enforcing the same prior sparsity ratio for all antennas, in Section III-B, leading to the OAMP-based algorithm for asynchronous grant-free massive RA.

III-A OAMP Iteration for Each BS Antenna

Since both the user activity and the synchronization delay are embedded in $\mathbf{H}$ and can be determined accordingly, we first derive the operations of an OAMP iteration for each BS antenna based on its normalized received pilot signal given as follows:

\mathbf{y}_{m}=\mathbf{P}\mathbf{h}_{m}+\mathbf{n}_{m},\quad \forall m\in\mathcal{M}, \qquad (2)

where $\mathbf{y}_{m}$, $\mathbf{h}_{m}$, and $\mathbf{n}_{m}$ denote the $m$-th columns of $\mathbf{Y}$, $\mathbf{H}$, and $\mathbf{N}$, respectively. The conventional OAMP algorithm iterates between a linear estimator (LE) and a non-linear estimator (NLE). In principle, by restricting the LE to be de-correlated and the NLE to be divergence-free, the input and output errors of both the LE and NLE are orthogonal, so that the OAMP algorithm achieves Bayes-optimal performance. Starting with $\mathbf{s}_{m}^{(1)}=\mathbf{0}$, the operations in the $i$-th OAMP iteration are expressed as follows:

LE: \mathbf{r}_{m}^{(i)}=\mathbf{s}_{m}^{(i)}+\mathbf{W}_{m}^{(i)}(\mathbf{y}_{m}-\mathbf{P}\mathbf{s}_{m}^{(i)}), \qquad (3)

NLE: \mathbf{s}_{m}^{(i+1)}=\eta^{(i)}(\mathbf{r}_{m}^{(i)})=C_{m}^{(i)}\Big(\hat{\eta}^{(i)}(\mathbf{r}_{m}^{(i)})-\frac{\sum_{n=1}^{N}\sum_{t=0}^{T}[\hat{\eta}^{\prime(i)}(\mathbf{r}_{n,m}^{(i)})]_{t}}{(T+1)N}\mathbf{r}_{m}^{(i)}\Big), \qquad (4)

where $\mathbf{r}_{m}^{(i)}$ and $\mathbf{s}_{m}^{(i+1)}$ are the outputs of the LE and NLE, respectively, and the other notations will be introduced in the sequel.

III-A1 LE

The LE aims at decorrelating the vector estimation problem in (2) into $N$ scalar estimation problems for each user, which is achieved by restricting the de-correlated estimator $\mathbf{W}_{m}^{(i)}$ to the following optimal structure:

\mathbf{W}_{m}^{(i)}=\frac{(T+1)N}{\operatorname{tr}\big(\hat{\mathbf{W}}_{m}^{(i)}\mathbf{P}\big)}\hat{\mathbf{W}}_{m}^{(i)}, \qquad (5)

where $\hat{\mathbf{W}}_{m}^{(i)}=\mathbf{P}^{\mathrm{H}}\big(\mathbf{P}\mathbf{P}^{\mathrm{H}}+(\sigma^{2}/\rho L(v_{m}^{(i)})^{2})\mathbf{I}_{L}\big)^{-1}$ is the LMMSE estimator, with $(v_{m}^{(i)})^{2}$ representing the mean square error between the output of the NLE and the actual expanded channel vector $\mathbf{h}_{m}$, defined as

(v_{m}^{(i)})^{2}\triangleq\frac{\mathbb{E}\big[\|\mathbf{s}_{m}^{(i)}-\mathbf{h}_{m}\|_{2}^{2}\big]}{(T+1)N}. \qquad (6)

The empirical evaluation of $(v_{m}^{(i)})^{2}$ will be presented in (12) when introducing the NLE. Similarly, the mean square error between the output of the LE and the actual expanded channel vector $\mathbf{h}_{m}$, i.e., $(\tau_{m}^{(i)})^{2}\triangleq\frac{\mathbb{E}[\|\mathbf{r}_{m}^{(i)}-\mathbf{h}_{m}\|_{2}^{2}]}{(T+1)N}$, can be empirically evaluated as follows:

(\tau_{m}^{(i)})^{2}\approx\frac{\operatorname{tr}\big(\mathbf{B}_{m}^{(i)}(\mathbf{B}_{m}^{(i)})^{\mathrm{H}}\big)(v_{m}^{(i)})^{2}+\operatorname{tr}\big(\mathbf{W}_{m}^{(i)}(\mathbf{W}_{m}^{(i)})^{\mathrm{H}}\big)\cdot\frac{\sigma^{2}}{\rho L}}{(T+1)N}, \qquad (7)

where $\mathbf{B}_{m}^{(i)}\triangleq\mathbf{I}_{(T+1)N}-\mathbf{W}_{m}^{(i)}\mathbf{P}$.
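For concreteness, a minimal numpy sketch of one LE step, combining (3), (5), and (7), is given below; the variable names are illustrative and this is not the authors' implementation.

```python
import numpy as np

def oamp_le(P, y_m, s_m, v2_m, sigma2_over_rhoL):
    """One LE step for antenna m; assumes v2_m > 0."""
    L, D = P.shape                                   # D = (T+1)N
    W_hat = P.conj().T @ np.linalg.inv(
        P @ P.conj().T + (sigma2_over_rhoL / v2_m) * np.eye(L))   # LMMSE matrix
    W = (D / np.trace(W_hat @ P)) * W_hat            # de-correlated estimator, (5)
    r_m = s_m + W @ (y_m - P @ s_m)                  # LE output, (3)
    B = np.eye(D) - W @ P
    tau2_m = (np.trace(B @ B.conj().T).real * v2_m
              + np.trace(W @ W.conj().T).real * sigma2_over_rhoL) / D   # (7)
    return r_m, tau2_m
```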

III-A2 NLE

The NLE applies a denoiser $\eta^{(i)}(\cdot)$ to the output of the LE, which is restricted to be divergence-free, i.e., $\mathbb{E}[\eta^{\prime(i)}(\cdot)]=0$, with $\eta^{\prime(i)}(\cdot)$ denoting the first-order derivative of $\eta^{(i)}(\cdot)$. Accordingly, $\hat{\eta}^{(i)}(\cdot)$ in the optimal NLE is the section-wise MMSE denoiser given as follows:

\hat{\eta}^{(i)}(\mathbf{r}_{n,m}^{(i)})=\mathbb{E}\big[\mathbf{h}_{n,m}\mid\mathbf{r}_{n,m}^{(i)}\big], \qquad (8)

where $\mathbf{h}_{m}$ and $\mathbf{r}_{m}^{(i)}$ each contain $N$ sections, i.e., $\mathbf{h}_{m}=[\mathbf{h}_{1,m}^{\mathrm{T}},\cdots,\mathbf{h}_{N,m}^{\mathrm{T}}]^{\mathrm{T}}$ and $\mathbf{r}_{m}^{(i)}=[(\mathbf{r}_{1,m}^{(i)})^{\mathrm{T}},\cdots,(\mathbf{r}_{N,m}^{(i)})^{\mathrm{T}}]^{\mathrm{T}}$. This means that the estimate of the denoiser in the $i$-th OAMP iteration is the posterior mean of $\mathbf{h}_{n,m}$ given $\mathbf{r}_{n,m}^{(i)}$. Thus, the posterior variance of each entry of $\mathbf{h}_{n,m}$, i.e., $h_{n,t,m}$, is given as follows:

\psi_{n,t,m}^{(i)}=\mathbb{E}\big[\big|[\hat{\eta}^{(i)}(\mathbf{r}_{n,m}^{(i)})]_{t}-h_{n,t,m}\big|^{2}\big]. \qquad (9)

Besides, $C_{m}^{(i)}$ in the optimal NLE (4) is given as follows:

C_{m}^{(i)}=\frac{(\tau_{m}^{(i)})^{2}}{(\tau_{m}^{(i)})^{2}-\bar{\psi}_{m}^{(i)}}, \qquad (10)

where $\bar{\psi}_{m}^{(i)}\triangleq\frac{\sum_{n=1}^{N}\sum_{t=0}^{T}\psi_{n,t,m}^{(i)}}{(T+1)N}$. On the other hand, the term $\frac{\sum_{n=1}^{N}\sum_{t=0}^{T}[\hat{\eta}^{\prime(i)}(\mathbf{r}_{n,m}^{(i)})]_{t}}{(T+1)N}$ is derived as

\frac{\sum_{n=1}^{N}\sum_{t=0}^{T}[\hat{\eta}^{\prime(i)}(\mathbf{r}_{n,m}^{(i)})]_{t}}{(T+1)N}=\frac{\bar{\psi}_{m}^{(i)}}{(\tau_{m}^{(i)})^{2}}, \qquad (11)

and $(v_{m}^{(i+1)})^{2}$ can be calculated as

(v_{m}^{(i+1)})^{2}\approx\Big(\frac{1}{\bar{\psi}_{m}^{(i)}}-\frac{1}{(\tau_{m}^{(i)})^{2}}\Big)^{-1}. \qquad (12)

It remains to obtain the posterior mean and variance in (8) and (9), for which the posterior distribution of $\mathbf{h}_{m}$ is needed. For this purpose, the prior information and the likelihood function of $\mathbf{h}_{m}$ must first be obtained. Since all users become active with equal probability and the synchronization delay of an active user is uniformly distributed, for user $n$, there is at most one non-zero element in its expanded channel vector $\mathbf{h}_{n,m}$, i.e., the user is either inactive, or active with one particular synchronization delay. As a result, we model the prior information of $\mathbf{h}_{n,m}$ for user $n$ as follows:

p(\mathbf{h}_{n,m})=\Big(1-\sum_{t=0}^{T}\lambda_{n,t,m}\Big)\prod_{t=0}^{T}\delta_{0}(h_{n,t,m})+\sum_{t=0}^{T}\Big(\lambda_{n,t,m}\,\mathcal{CN}(h_{n,t,m};0,\beta_{n})\prod_{t^{\prime}\neq t}\delta_{0}(h_{n,t^{\prime},m})\Big), \qquad (13)

where $\lambda_{n,t,m}=\frac{K}{(T+1)N}$, $\forall t\in\{0,\cdots,T\}$. Since $\mathbf{r}_{n,m}^{(i)}$ is modeled as an observation of $\mathbf{h}_{n,m}$ corrupted by additive white Gaussian noise (AWGN) in the OAMP framework, i.e., $\mathbf{r}_{n,m}^{(i)}=\mathbf{h}_{n,m}+\tau_{m}^{(i)}\mathbf{z}_{n,m}$ with $\mathbf{z}_{n,m}\sim\mathcal{CN}(\mathbf{0},\mathbf{I}_{T+1})$ independent of $\mathbf{h}_{n,m}$, according to Bayes' theorem, the posterior distribution of $\mathbf{h}_{n,m}$ can be obtained as follows:

p(\mathbf{h}_{n,m}|\mathbf{r}_{n,m}^{(i)})=\Big(1-\sum_{t=0}^{T}\pi_{n,t,m}^{(i)}\Big)\prod_{t=0}^{T}\delta_{0}(h_{n,t,m})+\sum_{t=0}^{T}\Big(\pi_{n,t,m}^{(i)}\,\mathcal{CN}(h_{n,t,m};\mu_{n,t,m}^{(i)},\Gamma_{n,t,m}^{(i)})\prod_{t^{\prime}\neq t}\delta_{0}(h_{n,t^{\prime},m})\Big), \qquad (14)

where $\mu_{n,t,m}^{(i)}=\frac{\beta_{n}r_{n,t,m}^{(i)}}{(\tau_{m}^{(i)})^{2}+\beta_{n}}$, $\Gamma_{n,t,m}^{(i)}=\frac{(\tau_{m}^{(i)})^{2}\beta_{n}}{(\tau_{m}^{(i)})^{2}+\beta_{n}}$, and the posterior sparsity ratio is given as

\pi_{n,t,m}^{(i)}=\frac{e^{\frac{\beta_{n}|r_{n,t,m}^{(i)}|^{2}}{(\tau_{m}^{(i)})^{2}(\beta_{n}+(\tau_{m}^{(i)})^{2})}}}{\Big(1+\frac{\beta_{n}}{(\tau_{m}^{(i)})^{2}}\Big)\frac{1-\sum_{t^{\prime}=0}^{T}\lambda_{n,t^{\prime},m}}{\lambda_{n,t,m}}+\sum_{t^{\prime}=0}^{T}\frac{\lambda_{n,t^{\prime},m}}{\lambda_{n,t,m}}e^{\frac{\beta_{n}|r_{n,t^{\prime},m}^{(i)}|^{2}}{(\tau_{m}^{(i)})^{2}(\beta_{n}+(\tau_{m}^{(i)})^{2})}}}. \qquad (15)

Therefore, the posterior mean (8) and variance (9) are given respectively as follows:

[\hat{\eta}^{(i)}(\mathbf{r}_{n,m}^{(i)})]_{t}=\pi_{n,t,m}^{(i)}\mu_{n,t,m}^{(i)}, \qquad (16)

\psi_{n,t,m}^{(i)}=\pi_{n,t,m}^{(i)}\big(|\mu_{n,t,m}^{(i)}|^{2}+\Gamma_{n,t,m}^{(i)}\big)-\big|[\hat{\eta}^{(i)}(\mathbf{r}_{n,m}^{(i)})]_{t}\big|^{2}. \qquad (17)
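The computations in (15)-(17) vectorize naturally over the $T+1$ delay hypotheses within one user section. A minimal numpy sketch follows; names are illustrative and the exponentials may need rescaling for numerical stability in practice.

```python
import numpy as np

def mmse_denoiser_section(r_n, tau2, beta_n, lam):
    """Posterior quantities (15)-(17) for one user section.
    r_n, lam: length-(T+1) arrays of observations and prior sparsity ratios."""
    mu = beta_n * r_n / (tau2 + beta_n)                        # posterior mean if active
    Gamma = tau2 * beta_n / (tau2 + beta_n)                    # posterior variance if active
    expo = np.exp(beta_n * np.abs(r_n) ** 2 / (tau2 * (beta_n + tau2)))
    denom = ((1 + beta_n / tau2) * (1 - lam.sum()) + (lam * expo).sum()) / lam
    pi = expo / denom                                          # posterior sparsity ratio, (15)
    h_hat = pi * mu                                            # posterior mean, (16)
    psi = pi * (np.abs(mu) ** 2 + Gamma) - np.abs(h_hat) ** 2  # posterior variance, (17)
    return h_hat, psi, pi
```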

III-B OAMP-based Algorithm for Multi-antenna BSs

The OAMP iteration for each BS antenna developed in Section III-A neglects the common sparsity among the received pilot signals of multiple BS antennas, which can be leveraged to enhance the estimation performance. Since all BS antennas receive the pilot signal at the same instants, a common activity indicator is formed to update the prior information for all of them. Specifically, we replace $\lambda_{n,t,m}$ in (13) by the variable $\omega_{n,t}^{(i)}$, which is updated according to the posterior sparsity ratios $\pi_{n,t,m}^{(i-1)},\forall m$, obtained in the $(i-1)$-th OAMP iteration, as follows:

\omega_{n,t}^{(i)}=\frac{1}{M}\sum_{m=1}^{M}\pi_{n,t,m}^{(i-1)}, \qquad (18)

i.e., the structured sparsity in the received signals of multiple BS antennas is learned by averaging their posterior sparsity ratios. Thus, the updated prior information in the ii-th OAMP iteration for a multi-antenna BS is expressed as follows:

p(\mathbf{h}_{n,m})=\Big(1-\sum_{t=0}^{T}\omega_{n,t}^{(i)}\Big)\prod_{t=0}^{T}\delta_{0}(h_{n,t,m})+\sum_{t=0}^{T}\Big(\omega_{n,t}^{(i)}\,\mathcal{CN}(h_{n,t,m};0,\beta_{n})\prod_{t^{\prime}\neq t}\delta_{0}(h_{n,t^{\prime},m})\Big). \qquad (19)

Accordingly, we modify the posterior sparsity ratio as follows:

\pi_{n,t,m}^{(i)}=\frac{e^{\frac{\beta_{n}|r_{n,t,m}^{(i)}|^{2}}{(\tau_{m}^{(i)})^{2}(\beta_{n}+(\tau_{m}^{(i)})^{2})}}}{\Big(1+\frac{\beta_{n}}{(\tau_{m}^{(i)})^{2}}\Big)\frac{1-\sum_{t^{\prime}=0}^{T}\omega_{n,t^{\prime}}^{(i)}}{\omega_{n,t}^{(i)}}+\sum_{t^{\prime}=0}^{T}\frac{\omega_{n,t^{\prime}}^{(i)}}{\omega_{n,t}^{(i)}}e^{\frac{\beta_{n}|r_{n,t^{\prime},m}^{(i)}|^{2}}{(\tau_{m}^{(i)})^{2}(\beta_{n}+(\tau_{m}^{(i)})^{2})}}}. \qquad (20)

This completes our proposed OAMP-based algorithm for joint activity-delay detection and channel estimation with a multi-antenna BS, with details summarized in Algorithm 1. After the OAMP-based algorithm terminates at the $I$-th iteration, the effective channel vector is estimated as $\hat{\mathbf{h}}_{m}=\hat{\eta}^{(I)}(\mathbf{r}_{m}^{(I)})$, $\forall m$, and the set of active users is determined as $\hat{\mathcal{K}}\triangleq\{n\in\mathcal{N}\,|\,\sum_{t=0}^{T}\omega_{n,t}^{(I+1)}\geq\theta\}$, where $\theta$ is an empirical threshold. Besides, the synchronization delay of an estimated active user is determined as $t_{n}=\arg\max_{t\in\{0,\cdots,T\}}\omega_{n,t}^{(I+1)}$, $\forall n\in\hat{\mathcal{K}}$.

Algorithm 1 The Proposed OAMP-based Algorithm

Input: The normalized received pilot signal $\mathbf{Y}$, pilot sequences $\mathbf{P}$, maximum number of iterations $Q$, maximum synchronization delay $T$, and accuracy tolerance $\epsilon$.
Output: The estimated effective channel matrix $\hat{\mathbf{H}}$, set of active users $\hat{\mathcal{K}}$, and synchronization delays $\{t_{n}\}$, $n\in\hat{\mathcal{K}}$.
Initialize: $i\leftarrow 0$, $\mathbf{s}_{m}^{(1)}=\mathbf{0}$, $\forall m\in\mathcal{M}$, $v_{m}^{(1)}=0$, $\forall m\in\mathcal{M}$, $\omega_{n,t}^{(1)}=\frac{\lambda}{T+1}$, $\forall n\in\mathcal{N}$, $\forall t\in\{0,\cdots,T\}$.

1: while $i<Q$ and $\frac{\sum_{m}\|\mathbf{s}_{m}^{(i)}-\mathbf{s}_{m}^{(i-1)}\|_{2}^{2}}{\sum_{m}\|\mathbf{s}_{m}^{(i-1)}\|_{2}^{2}}>\epsilon$ do
2:   $i\leftarrow i+1$
3:   //The LE//
4:   Calculate $\mathbf{W}_{m}^{(i)}$, $\forall m$, according to (5).
5:   Perform the LE to obtain $\mathbf{r}_{m}^{(i)}$, $\forall m$, via (3).
6:   Calculate $(\tau_{m}^{(i)})^{2}$, $\forall m$, according to (7).
7:   //The NLE//
8:   Update the prior sparsity ratio $\omega_{n,t}^{(i)}$, $\forall n,t$, and the corresponding prior information $p(\mathbf{h}_{n,m})$, $\forall n,m$, according to (18) and (19), respectively.
9:   Calculate the posterior mean $[\hat{\eta}^{(i)}(\mathbf{r}_{n,m}^{(i)})]_{t}$ and variance $\psi_{n,t,m}^{(i)}$, $\forall n,t,m$, according to (16) and (17), respectively.
10:  Calculate $C_{m}^{(i)}$, $\frac{\sum_{n=1}^{N}\sum_{t=0}^{T}[\hat{\eta}^{\prime(i)}(\mathbf{r}_{n,m}^{(i)})]_{t}}{(T+1)N}$, and $(v_{m}^{(i+1)})^{2}$, $\forall m$, according to (10), (11), and (12), respectively.
11:  Perform the NLE to obtain $\mathbf{s}_{m}^{(i+1)}$, $\forall m$, according to (4).
12: end while
13: Obtain $\hat{\mathbf{H}}=[\hat{\mathbf{h}}_{1},\cdots,\hat{\mathbf{h}}_{M}]$ with $\hat{\mathbf{h}}_{m}=\hat{\eta}^{(I)}(\mathbf{r}_{m}^{(I)})$.
14: Determine the set of active users as $\hat{\mathcal{K}}\triangleq\{n\in\mathcal{N}\,|\,\sum_{t=0}^{T}\omega_{n,t}^{(I+1)}\geq\theta\}$.
15: Determine the synchronization delays as $t_{n}=\arg\max_{t\in\{0,\cdots,T\}}\omega_{n,t}^{(I+1)}$, $\forall n\in\hat{\mathcal{K}}$.
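The final detection steps of Algorithm 1 (thresholding and delay selection) amount to a few array operations. A minimal sketch, assuming the converged ratios $\omega_{n,t}^{(I+1)}$ are stored in an $N\times(T+1)$ array, is:

```python
import numpy as np

def detect(omega: np.ndarray, theta: float):
    """omega: (N, T+1) array of converged ratios; theta: empirical threshold."""
    activity_metric = omega.sum(axis=1)               # evidence that user n is active
    K_hat = np.flatnonzero(activity_metric >= theta)  # estimated active set
    t_hat = omega[K_hat].argmax(axis=1)               # delay estimate per detected user
    return K_hat, t_hat
```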

IV A Low-Complexity FPAMP Approach for Asynchronous Massive RA

Although the OAMP-based algorithm is effective in solving the joint activity-delay detection and channel estimation problem, its high-complexity LMMSE estimator limits its application in practical mMTC systems that demand fast processing. In this section, we propose a novel algorithm for joint activity-delay detection and channel estimation based on the FPAMP framework, which has computational complexity of the same order as the standard AMP algorithm, while maintaining performance similar to the OAMP-based algorithm.

IV-A Free Probability Theory

To develop the FPAMP-based algorithm, we first introduce the basics of free probability theory [26, 27], which was established to calculate moments and distributions of non-commutative random variables, such as random matrices. For two independent scalar random variables $x_{1}$ and $x_{2}$, one can easily conclude $\mathbb{E}[x_{1}x_{2}x_{1}x_{2}]=\mathbb{E}[x_{1}^{2}x_{2}^{2}]$ from classical probability theory. However, this result is not necessarily valid for two random matrices $\mathbf{X}_{1}$ and $\mathbf{X}_{2}$, since they do not commute. Therefore, free probability theory introduces the concept of free independence between non-commutative random variables, analogous to independence between commutative random variables. Since free independence is defined on a non-commutative probability space, we review both definitions as follows.

Definition 1.

(Non-commutative Probability Space) A non-commutative probability space is a pair ($\mathcal{A}$, $\phi$) consisting of a unital algebra $\mathcal{A}$, which contains an identity element $1$ (the unit of the algebra) satisfying $1x=x=x1$, $\forall x\in\mathcal{A}$, and a linear functional $\phi:\mathcal{A}\rightarrow\mathbb{C}$ such that $\phi(1)=1$. Elements $a_{1},\cdots,a_{k}$ of $\mathcal{A}$ are referred to as non-commutative random variables.

Definition 2.

(Free Independence) Consider a non-commutative probability space ($\mathcal{A}$, $\phi$) and let $\{\mathcal{A}_{i}\}$ be unital subalgebras of $\mathcal{A}$, i.e., $\mathcal{A}_{i}\subset\mathcal{A}$, $\forall i$. The subalgebras are said to be freely independent if $\phi(a_{1}\cdots a_{n})=0$ whenever $i_{1}\neq i_{2}$, $i_{2}\neq i_{3}$, $\cdots$, $i_{n-1}\neq i_{n}$, $a_{k}\in\mathcal{A}_{i_{k}}$, and $\phi(a_{k})=0$, $\forall k=1,\cdots,n$. Random variables are called freely independent if the unital subalgebras they generate are freely independent.

While it is difficult to explicitly define PDFs for non-commutative random variables (e.g., random matrices), we turn to their moments, since characteristic functions and PDFs can often be recovered from moments. The free cumulants of symmetric random matrices, which facilitate moment calculations, are defined below.

Definition 3.

(Free Cumulants) Let $X$ be a random variable with finite moments of all orders, and let $m_{k}\triangleq\mathbb{E}[X^{k}]$ be its $k$-th moment. Suppose the law of $X$ is the empirical eigenvalue distribution of a symmetric random matrix $\mathbf{A}\in\mathbb{C}^{n\times n}$. The free cumulants of $\mathbf{A}$, denoted by $(\kappa_{k})_{k=1}^{\infty}$, are defined recursively via the moment-cumulant relation as follows:

m_{k}=\sum_{\pi\in\mathcal{NC}(k)}\prod_{j=1}^{k}\kappa_{j}^{N_{j}(\pi)}, \qquad (21)

where $\pi$ is an element of $\mathcal{NC}(k)$, i.e., the set of all non-crossing partitions of $[k]\triangleq\{1,2,\cdots,k\}$, and $N_{j}(\pi)$ denotes the number of length-$j$ blocks in $\pi$. (A partition $\pi$ of the set $[k]$ is defined as $\pi\triangleq\{V_{1},\cdots,V_{s}\}$ such that $V_{1},\cdots,V_{s}\subset[k]$, where $V_{i}\neq\emptyset$, $V_{i}\cap V_{j}=\emptyset$ for $i\neq j\in\{1,\cdots,s\}$, and $V_{1}\cup\cdots\cup V_{s}=[k]$. The subsets $V_{1},\cdots,V_{s}$ are called the blocks of $\pi$, and the length of a block is the number of its elements. Besides, if there exist $i<j<l<s$ such that $i$ and $l$ are in one block $V_{p}$, and $j$ and $s$ are in another block $V_{q}$, we say that $V_{p}$ and $V_{q}$ cross. If no pair of blocks in $\pi$ crosses, $\pi$ is called a non-crossing partition.)
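For instance, enumerating the non-crossing partitions for $k\leq 3$ ($\mathcal{NC}(3)$ contains five partitions: one three-element block, three pair-plus-singleton partitions, and the all-singleton partition) yields the familiar relations

m_{1}=\kappa_{1},\qquad m_{2}=\kappa_{2}+\kappa_{1}^{2},\qquad m_{3}=\kappa_{3}+3\kappa_{2}\kappa_{1}+\kappa_{1}^{3},

which can be inverted recursively to obtain the free cumulants from the moments.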

As will be seen later, the empirical eigenvalue distributions of random matrices are critical to the FPAMP framework. However, since the pilot matrix $\mathbf{P}$ is not square, we next introduce the rectangular free cumulants of rectangular random matrices.

Definition 4.

(Rectangular Free Cumulants) Let $X$ be a random variable with finite moments of all orders, and let $m_{2k}\triangleq\mathbb{E}[X^{2k}]$ be its even moments. Suppose the law of $X^{2}$ is the empirical eigenvalue distribution of $\mathbf{A}\mathbf{A}^{\mathrm{H}}$, with $\mathbf{A}\in\mathbb{C}^{n\times d}$ denoting a rectangular random matrix. The rectangular free cumulants $\{\kappa_{2k}\}_{k\geq 1}$ of $\mathbf{A}$ are defined recursively via the moment-cumulant relation as follows:

m_{2k}=\frac{n}{d}\sum_{\pi\in\mathcal{NC}(2k)}\prod_{\substack{\mathcal{S}\in\pi\\ \min\mathcal{S}\text{ is odd}}}\kappa_{|\mathcal{S}|}\prod_{\substack{\mathcal{S}^{\prime}\in\pi\\ \min\mathcal{S}^{\prime}\text{ is even}}}\kappa_{|\mathcal{S}^{\prime}|}, \qquad (22)

where each $\mathcal{S}\in\pi\in\mathcal{NC}(2k)$ has even cardinality.

With this background on free probability theory, we propose the low-complexity FPAMP-based receiver in the next subsection.

IV-B The Proposed FPAMP-based Algorithm

We propose to avoid the high computational complexity caused by the LMMSE estimator in the OAMP-based algorithm by utilizing memorized information from all preceding iterations. This is motivated by a solution to the Thouless-Anderson-Palmer (TAP) equations of Ising models with general invariant matrices [28, 29], where iterative algorithms that utilize the information of preceding iterations were analyzed. Our proposed FPAMP-based algorithm evolves from the AMP framework, taking a major detour from the memory AMP algorithm [1], which was developed under the OAMP framework. Based on the normalized received pilot signal of an individual BS antenna in (2), the FPAMP-based algorithm iteratively performs the following updates:

\mathbf{h}_{m}^{(i)}=\mathbf{P}^{\mathrm{H}}\mathbf{s}_{m}^{(i)}-\sum_{j=1}^{i-1}\gamma_{m,j}^{(i)}\tilde{\mathbf{h}}_{m}^{(j)}, \qquad (23)
\tilde{\mathbf{h}}_{m}^{(i)}=f^{(i)}(\mathbf{h}_{m}^{(1)},\cdots,\mathbf{h}_{m}^{(i)}), \qquad (24)
\mathbf{r}_{m}^{(i)}=\mathbf{P}\tilde{\mathbf{h}}_{m}^{(i)}-\sum_{j=1}^{i}\alpha_{m,j}^{(i)}\mathbf{s}_{m}^{(j)}, \qquad (25)
\mathbf{s}_{m}^{(i+1)}=g^{(i+1)}(\mathbf{r}_{m}^{(1)},\cdots,\mathbf{r}_{m}^{(i)},\mathbf{y}_{m}), \qquad (26)

where $\mathbf{s}_{m}^{(1)}=\mathbf{y}_{m}$ and $\mathbf{h}_{m}^{(1)}=\mathbf{P}^{\mathrm{H}}\mathbf{s}_{m}^{(1)}$. In (24) and (26), $f^{(i)}(\cdot)$ and $g^{(i)}(\cdot)$ are Lipschitz functions, called denoisers, which will be developed in the following. Besides, $\{\alpha_{m,j}^{(i)}\}_{j=1}^{i}$ and $\{\gamma_{m,j}^{(i)}\}_{j=1}^{i-1}$ are the Onsager parameters, which restrict the joint distributions of $\{\mathbf{h}_{m}^{(i)},\mathbf{r}_{m}^{(i)}\}$ conditioned on $\mathbf{h}_{m}$ to be Gaussian.

IV-B1 Onsager coefficients

We compute the Onsager coefficients via auxiliary matrices $\mathbf{\Psi}_{m}^{(i+1)}\in\mathbb{C}^{(i+1)\times(i+1)}$ and $\mathbf{\Phi}_{m}^{(i+1)}\in\mathbb{C}^{(i+1)\times(i+1)}$ defined as follows:

\mathbf{\Psi}_{m}^{(i+1)}=\begin{bmatrix}0&0&\cdots&0&0\\ 0&\langle\partial_{1}\tilde{\mathbf{h}}_{m}^{(1)}\rangle&0&\cdots&0\\ 0&\langle\partial_{1}\tilde{\mathbf{h}}_{m}^{(2)}\rangle&\langle\partial_{2}\tilde{\mathbf{h}}_{m}^{(2)}\rangle&\cdots&0\\ \vdots&\vdots&\vdots&\ddots&\vdots\\ 0&\langle\partial_{1}\tilde{\mathbf{h}}_{m}^{(i)}\rangle&\langle\partial_{2}\tilde{\mathbf{h}}_{m}^{(i)}\rangle&\cdots&\langle\partial_{i}\tilde{\mathbf{h}}_{m}^{(i)}\rangle\end{bmatrix}, \qquad (32)

\mathbf{\Phi}_{m}^{(i+1)}=\begin{bmatrix}0&0&\cdots&0&0\\ \langle\partial_{\mathbf{G}_{m}}\mathbf{s}_{m}^{(1)}\rangle&0&0&\cdots&0\\ \langle\partial_{\mathbf{G}_{m}}\mathbf{s}_{m}^{(2)}\rangle&\langle\partial_{1}\mathbf{s}_{m}^{(2)}\rangle&0&\cdots&0\\ \vdots&\vdots&\ddots&\vdots&\vdots\\ \langle\partial_{\mathbf{G}_{m}}\mathbf{s}_{m}^{(i)}\rangle&\langle\partial_{1}\mathbf{s}_{m}^{(i)}\rangle&\cdots&\langle\partial_{i-1}\mathbf{s}_{m}^{(i)}\rangle&0\end{bmatrix}, \qquad (38)

where $\partial_{k}\tilde{\mathbf{h}}_{m}^{(i)}\in\mathbb{C}^{(T+1)N\times 1}$ denotes the vector of partial derivatives $\big[\partial_{h_{1,m}^{(k)}}f^{(i)}(h_{1,m}^{(1)},\cdots,h_{1,m}^{(i)}),\cdots,\partial_{h_{(T+1)N,m}^{(k)}}f^{(i)}(h_{(T+1)N,m}^{(1)},\cdots,h_{(T+1)N,m}^{(i)})\big]^{\mathrm{T}}$ for $k\in\{1,\cdots,i\}$, with $h_{l,m}^{(i)}$ denoting the $l$-th element of $\mathbf{h}_{m}^{(i)}$. Besides, $\mathbf{G}_{m}\triangleq\mathbf{P}\mathbf{h}_{m}$, and $\partial_{\mathbf{G}_{m}}\mathbf{s}_{m}^{(i)}\in\mathbb{C}^{L\times 1}$ denotes the vector of partial derivatives $\big[\partial_{G_{1,m}}g^{(i)}(r_{1,m}^{(1)},\cdots,r_{1,m}^{(i-1)},y_{1,m}),\cdots,\partial_{G_{L,m}}g^{(i)}(r_{L,m}^{(1)},\cdots,r_{L,m}^{(i-1)},y_{L,m})\big]^{\mathrm{T}}$, with $G_{l,m}$, $r_{l,m}^{(i)}$, and $y_{l,m}$ denoting the $l$-th elements of $\mathbf{G}_{m}$, $\mathbf{r}_{m}^{(i)}$, and $\mathbf{y}_{m}$, respectively. Similarly, $\partial_{k}\mathbf{s}_{m}^{(i)}\in\mathbb{C}^{L\times 1}$ denotes the vector of partial derivatives $\big[\partial_{r_{1,m}^{(k)}}g^{(i)}(r_{1,m}^{(1)},\cdots,r_{1,m}^{(i-1)},y_{1,m}),\cdots,\partial_{r_{L,m}^{(k)}}g^{(i)}(r_{L,m}^{(1)},\cdots,r_{L,m}^{(i-1)},y_{L,m})\big]^{\mathrm{T}}$ for $k\in\{1,\cdots,i-1\}$.

Then, we define matrices $\mathbf{M}_{m,\alpha}^{(i+1)},\mathbf{M}_{m,\gamma}^{(i+1)}\in\mathbb{C}^{(i+1)\times(i+1)}$ as follows:

\mathbf{M}_{m,\alpha}^{(i+1)}=\sum_{j=0}^{i+1}\kappa_{2(j+1)}\mathbf{\Psi}_{m}^{(i+1)}\big(\mathbf{\Phi}_{m}^{(i+1)}\mathbf{\Psi}_{m}^{(i+1)}\big)^{j}, \qquad (39)
\mathbf{M}_{m,\gamma}^{(i+1)}=\frac{L}{(T+1)N}\sum_{j=0}^{i}\kappa_{2(j+1)}\mathbf{\Phi}_{m}^{(i+1)}\big(\mathbf{\Psi}_{m}^{(i+1)}\mathbf{\Phi}_{m}^{(i+1)}\big)^{j}, \qquad (40)

where $\{\kappa_{2k}\}_{k\geq 1}$ denotes the rectangular free cumulants of $\mathbf{P}$, which can be calculated by exploiting their connection with formal power series, i.e., a generalization of polynomials that allows an infinite number of terms. In particular, $\kappa_{2}=m_{2}$, and for $k>1$ [30],

\kappa_{2k}=m_{2k}-\big[z^{k}\big]\Bigg(\sum_{j=1}^{k-1}\kappa_{2j}\Big(z\Big(\frac{L}{(T+1)N}M(z)+1\Big)\big(M(z)+1\big)\Big)^{j}\Bigg). \qquad (41)

Note that $\{m_{2k}\}_{k=1}^{i}$ denotes the first $i$ even moments of the empirical eigenvalue distribution of $\mathbf{P}\mathbf{P}^{\mathrm{H}}$, $M(z)\triangleq\sum_{k^{\prime}=1}^{\infty}m_{2k^{\prime}}z^{k^{\prime}}$, and $[z^{k}](q(z))$ denotes the coefficient of $z^{k}$ in the polynomial $q(z)$. The moments $\{m_{2k}\}_{k=1}^{i}$ can be estimated in $\mathcal{O}(L(T+1)N)$ operations following [31], which is of the same complexity order as one iteration of the proposed FPAMP-based algorithm, as will be analyzed in Section IV-C. To do so, one may sample a standard complex Gaussian vector $\mathbf{c}_{0}\sim\mathcal{CN}(\mathbf{0}_{L},\mathbf{I}_{L})$ and compute $\mathbf{c}_{k}=\mathbf{P}^{\mathrm{H}}\mathbf{c}_{k-1}$ for odd $k$ and $\mathbf{c}_{k}=\mathbf{P}\mathbf{c}_{k-1}$ for even $k$. Then, $\frac{\|\mathbf{c}_{k}\|^{2}}{(T+1)L}$ is a consistent estimate of the $k$-th moment of the empirical eigenvalue distribution of $\mathbf{P}\mathbf{P}^{\mathrm{H}}$. Besides, we assume that the empirical distribution of the singular values of $\mathbf{P}$ converges weakly almost surely to a random variable whose even moments and rectangular free cumulants are denoted by $\{\bar{m}_{2k}\}_{k\geq 1}$ and $\{\bar{\kappa}_{2k}\}_{k\geq 1}$, respectively. Therefore, as $L,N\rightarrow\infty$, $m_{2k}\rightarrow\bar{m}_{2k}$ and $\kappa_{2k}\rightarrow\bar{\kappa}_{2k}$, $\forall k$.
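A minimal numpy sketch of both steps, the random-vector moment estimator described above and a truncated power-series evaluation of (41), is given below; the helper names are illustrative assumptions, and the moment normalization simply follows the text above.

```python
import numpy as np
from numpy.polynomial import polynomial as npoly

def estimate_even_moments(P, num, T, rng):
    """Estimate m_{2k}, k = 1..num, of PP^H with one Gaussian probe vector."""
    L = P.shape[0]
    c = (rng.standard_normal(L) + 1j * rng.standard_normal(L)) / np.sqrt(2)  # c_0 ~ CN(0, I_L)
    moments = []
    for k in range(1, num + 1):
        c = P.conj().T @ c if k % 2 == 1 else P @ c
        moments.append(np.linalg.norm(c) ** 2 / ((T + 1) * L))  # normalization from the text
    return np.array(moments)

def rectangular_free_cumulants(m_even, gamma):
    """kappa_2,...,kappa_{2K} from m_2,...,m_{2K} via (41); gamma = L / ((T+1)N)."""
    K = len(m_even)
    M = np.concatenate(([0.0], m_even))              # coefficients of M(z), degrees 0..K
    one = np.zeros(K + 1); one[0] = 1.0
    w = np.concatenate(([0.0], npoly.polymul(gamma * M + one, M + one)))[:K + 1]  # z(gamma*M+1)(M+1)
    kappa = [m_even[0]]                              # kappa_2 = m_2
    for k in range(2, K + 1):
        wj, s = np.array([1.0]), 0.0
        for j in range(1, k):                        # [z^k] of sum_j kappa_{2j} w(z)^j
            wj = npoly.polymul(wj, w)[:K + 1]
            s += kappa[j - 1] * wj[k]
        kappa.append(m_even[k - 1] - s)
    return np.array(kappa)
```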

Finally, we obtain the Onsager coefficients $\{\alpha_{m,j}^{(i)}\}_{j=1}^{i}$ and $\{\gamma_{m,j}^{(i)}\}_{j=1}^{i-1}$ from the last rows of $\mathbf{M}_{m,\alpha}^{(i+1)}$ and $\mathbf{M}_{m,\gamma}^{(i+1)}$ as follows:

(\alpha_{m,1}^{(i)},\ldots,\alpha_{m,i}^{(i)})=\big([\mathbf{M}_{m,\alpha}^{(i+1)}]_{i+1,2},\ldots,[\mathbf{M}_{m,\alpha}^{(i+1)}]_{i+1,i+1}\big), \qquad (42)
(\gamma_{m,1}^{(i)},\ldots,\gamma_{m,i-1}^{(i)})=\big([\mathbf{M}_{m,\gamma}^{(i+1)}]_{i+1,2},\ldots,[\mathbf{M}_{m,\gamma}^{(i+1)}]_{i+1,i}\big). \qquad (43)
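Since (39), (40), (42), and (43) only involve accumulated matrix powers, they admit a compact sketch (numpy assumed; names illustrative):

```python
import numpy as np

def onsager_coefficients(Psi, Phi, kappa2, L, D):
    """alpha, gamma from (39)-(43). kappa2[j] holds kappa_{2(j+1)}; D = (T+1)N."""
    n = Psi.shape[0]                       # n = i + 1
    M_alpha = np.zeros((n, n), dtype=complex)
    M_gamma = np.zeros((n, n), dtype=complex)
    PhiPsi_pow = np.eye(n, dtype=complex)  # (Phi Psi)^j
    PsiPhi_pow = np.eye(n, dtype=complex)  # (Psi Phi)^j
    for kap in kappa2:                     # truncated at the available cumulants
        M_alpha += kap * Psi @ PhiPsi_pow
        M_gamma += kap * Phi @ PsiPhi_pow
        PhiPsi_pow = Phi @ Psi @ PhiPsi_pow
        PsiPhi_pow = Psi @ Phi @ PsiPhi_pow
    M_gamma *= L / D
    alpha = M_alpha[-1, 1:]                # (42): last row, entries 2..i+1
    gamma = M_gamma[-1, 1:-1]              # (43): last row, entries 2..i
    return alpha, gamma
```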

IV-B2 State evolution and joint empirical distributions

In order to design the denoisers $f^{(i)}(\cdot)$ and $g^{(i)}(\cdot)$, we first present the state evolution of the FPAMP-based algorithm and the convergence of the joint empirical distributions of the related random variables, as shown in Theorem 1. To proceed, we introduce the definition of $W_{2}$-convergence [32].

Definition 5.

($W_{2}$-convergence) We write $(\mathbf{g}^{(1)},\cdots,\mathbf{g}^{(i)})\stackrel{W_{2}}{\longrightarrow}(G^{(1)},\cdots,G^{(i)})$ to denote that the joint empirical distribution of the rows of the random matrix $[\mathbf{g}^{(1)},\cdots,\mathbf{g}^{(i)}]\in\mathbb{C}^{d\times i}$ converges to the law of the random vector $[G^{(1)},\cdots,G^{(i)}]$ in the Wasserstein space of order 2. Specifically [Corollary 7.4, 36], if a function $\psi:\mathbb{C}^{i}\rightarrow\mathbb{C}$ satisfies

|\psi(\mathbf{u})-\psi(\mathbf{v})|\leq C\|\mathbf{u}-\mathbf{v}\|_{2}(1+\|\mathbf{u}\|_{2}+\|\mathbf{v}\|_{2}) \qquad (44)

for some constant $C>0$ and all $\mathbf{u},\mathbf{v}\in\mathbb{C}^{i}$ (such a function $\psi$ is said to be pseudo-Lipschitz of order 2), then

\lim_{d\rightarrow\infty}\frac{1}{d}\sum_{j=1}^{d}\psi\big(g_{j}^{(1)},\ldots,g_{j}^{(i)}\big)=\mathbb{E}\big[\psi\big(G^{(1)},\ldots,G^{(i)}\big)\big] \qquad (45)

holds whenever $(\mathbf{g}^{(1)},\cdots,\mathbf{g}^{(i)})\stackrel{W_{2}}{\longrightarrow}(G^{(1)},\cdots,G^{(i)})$.

Theorem 1.

Let $\psi:\mathbb{C}^{2i+1}\rightarrow\mathbb{C}$ and $\phi:\mathbb{C}^{2i+2}\rightarrow\mathbb{C}$ be any pseudo-Lipschitz functions of order 2. For $i=1,2,\cdots$, it holds almost surely that

\lim_{(T+1)N\rightarrow\infty}\frac{1}{(T+1)N}\sum_{t=1}^{(T+1)N}\psi\big(h_{t,m}^{(1)},\ldots,h_{t,m}^{(i)},\tilde{h}_{t,m}^{(1)},\ldots,\tilde{h}_{t,m}^{(i)},h_{t,m}\big)=\mathbb{E}\big[\psi\big(H_{m}^{(1)},\ldots,H_{m}^{(i)},\tilde{H}_{m}^{(1)},\ldots,\tilde{H}_{m}^{(i)},H_{m}\big)\big], \qquad (46)

\lim_{L\rightarrow\infty}\frac{1}{L}\sum_{l=1}^{L}\phi\big(r_{l,m}^{(1)},\ldots,r_{l,m}^{(i)},s_{l,m}^{(1)},\ldots,s_{l,m}^{(i+1)},y_{l,m}\big)=\mathbb{E}\big[\phi\big(R_{m}^{(1)},\ldots,R_{m}^{(i)},S_{m}^{(1)},\ldots,S_{m}^{(i+1)},Y_{m}\big)\big], \qquad (47)

where $H_{m}$, $H_{m}^{(i)}$, $\tilde{H}_{m}^{(i)}$, $S_{m}^{(i)}$, $R_{m}^{(i)}$, and $Y_{m}$ are defined by the state evolution of the FPAMP-based algorithm as follows:

[G_{m},R_{m}^{(1)},\ldots,R_{m}^{(i-1)}]^{\mathrm{T}}\sim\mathcal{CN}(\mathbf{0},\bar{\mathbf{\Sigma}}_{m}^{(i)}), \qquad (48)
S_{m}^{(i)}=g^{(i)}(R_{m}^{(1)},\ldots,R_{m}^{(i-1)},Y_{m}), \qquad (49)
[H_{m}^{(1)},\cdots,H_{m}^{(i)}]^{\mathrm{T}}=\bar{\boldsymbol{\mu}}_{m}^{(i)}H_{m}+[W_{m}^{(1)},\cdots,W_{m}^{(i)}]^{\mathrm{T}}, \qquad (50)
\tilde{H}_{m}^{(i)}=f^{(i)}(H_{m}^{(1)},\cdots,H_{m}^{(i)}). \qquad (51)

In particular, $\mathbf{h}_{m}\stackrel{W_{2}}{\longrightarrow}H_{m}$, $G_{m}\sim\mathcal{CN}(0,\bar{\kappa}_{2}\mathbb{E}[H_{m}^{2}])$, and $Y_{m}=G_{m}+N_{m}$ with $\mathbf{n}_{m}\stackrel{W_{2}}{\longrightarrow}N_{m}$. Besides, $\bar{\boldsymbol{\mu}}_{m}^{(i)}=[\bar{\mu}_{m}^{(1)},\cdots,\bar{\mu}_{m}^{(i)}]^{\mathrm{T}}$ and $[W_{m}^{(1)},\cdots,W_{m}^{(i)}]\sim\mathcal{CN}(\mathbf{0},\bar{\mathbf{\Omega}}_{m}^{(i)})$, which are independent of $H_{m}$. Equivalently, as $L,N\rightarrow\infty$, the joint empirical distributions of $(\mathbf{h}_{m}^{(1)},\cdots,\mathbf{h}_{m}^{(i)},\tilde{\mathbf{h}}_{m}^{(1)},\cdots,\tilde{\mathbf{h}}_{m}^{(i)},\mathbf{h}_{m})$ and $(\mathbf{r}_{m}^{(1)},\cdots,\mathbf{r}_{m}^{(i)},\mathbf{s}_{m}^{(1)},\cdots,\mathbf{s}_{m}^{(i+1)},\mathbf{y}_{m})$ converge almost surely in Wasserstein-2 distance to $(H_{m}^{(1)},\ldots,H_{m}^{(i)},\tilde{H}_{m}^{(1)},\ldots,\tilde{H}_{m}^{(i)},H_{m})$ and $(R_{m}^{(1)},\ldots,R_{m}^{(i)},S_{m}^{(1)},\ldots,S_{m}^{(i+1)},Y_{m})$, respectively.

Proof Sketch.

Due to the limited space, we only highlight the key idea of the proof, which is based on a general recursion and its state evolution that characterize the common structure of the FPAMP-based algorithm. We first prove that the joint empirical distributions of certain variables in the general recursion converge to the multivariate Gaussian distribution given by its state evolution. Then, it can be shown via induction that the state evolution parameters of the general recursion match those of the FPAMP-based algorithm, and that the FPAMP iterates stay close to those of the general recursion. Therefore, the joint empirical distributions of the corresponding variables in the FPAMP-based algorithm also converge to the multivariate Gaussian distribution given by its state evolution. Detailed derivations are provided in Appendix A. ∎

IV-B3 Denoisers

Based on the state evolution of the FPAMP-based algorithm in (48)-(51), $\big[(\mathbf{h}_{m}^{(1)}-\bar{\mu}_{m}^{(1)}\mathbf{h}_{m})^{\mathrm{T}},\cdots,(\mathbf{h}_{m}^{(i)}-\bar{\mu}_{m}^{(i)}\mathbf{h}_{m})^{\mathrm{T}}\big]^{\mathrm{T}}$ and $\big[\mathbf{G}_{m}^{\mathrm{T}},(\mathbf{r}_{m}^{(1)})^{\mathrm{T}},\cdots,(\mathbf{r}_{m}^{(i)})^{\mathrm{T}}\big]^{\mathrm{T}}$ converge in the large system limit, i.e., $L,N\rightarrow\infty$, to two independent zero-mean multivariate complex Gaussian distributions with covariance matrices $\bar{\mathbf{\Omega}}_{m}^{(i)}$ and $\bar{\mathbf{\Sigma}}_{m}^{(i+1)}$, respectively [29, 31]. This property can be used to design the denoisers, with which we only need to track the mean vector $\bar{\boldsymbol{\mu}}_{m}^{(i)}$ and the covariance matrices $\bar{\mathbf{\Omega}}_{m}^{(i)}$ and $\bar{\mathbf{\Sigma}}_{m}^{(i+1)}$.

First, we define auxiliary matrices $\bar{\mathbf{\Psi}}_{m}^{(i+1)}$, $\bar{\mathbf{\Phi}}_{m}^{(i+1)}$, $\bar{\mathbf{\Gamma}}_{m}^{(i+1)}$, and $\bar{\mathbf{\Delta}}_{m}^{(i+1)}$ as follows:

\bar{\mathbf{\Psi}}_{m}^{(i+1)}=\begin{bmatrix}0&0&\cdots&0&0\\ 0&\mathbb{E}[\partial_{1}\tilde{H}_{m}^{(1)}]&0&\cdots&0\\ 0&\mathbb{E}[\partial_{1}\tilde{H}_{m}^{(2)}]&\mathbb{E}[\partial_{2}\tilde{H}_{m}^{(2)}]&\cdots&0\\ \vdots&\vdots&\vdots&\ddots&\vdots\\ 0&\mathbb{E}[\partial_{1}\tilde{H}_{m}^{(i)}]&\mathbb{E}[\partial_{2}\tilde{H}_{m}^{(i)}]&\cdots&\mathbb{E}[\partial_{i}\tilde{H}_{m}^{(i)}]\end{bmatrix}, \qquad (57)

\bar{\mathbf{\Phi}}_{m}^{(i+1)}=\begin{bmatrix}0&0&\cdots&0&0\\ \mathbb{E}[\partial_{G_{m}}S_{m}^{(1)}]&0&0&\cdots&0\\ \mathbb{E}[\partial_{G_{m}}S_{m}^{(2)}]&\mathbb{E}[\partial_{1}S_{m}^{(2)}]&0&\cdots&0\\ \vdots&\vdots&\ddots&\vdots&\vdots\\ \mathbb{E}[\partial_{G_{m}}S_{m}^{(i)}]&\mathbb{E}[\partial_{1}S_{m}^{(i)}]&\cdots&\mathbb{E}[\partial_{i-1}S_{m}^{(i)}]&0\end{bmatrix}, \qquad (63)

\bar{\mathbf{\Gamma}}_{m}^{(i+1)}=\begin{bmatrix}\mathbb{E}[H_{m}^{2}]&\mathbb{E}[H_{m}\tilde{H}_{m}^{(1)}]&\cdots&\mathbb{E}[H_{m}\tilde{H}_{m}^{(i)}]\\ \mathbb{E}[H_{m}\tilde{H}_{m}^{(1)}]&\mathbb{E}[(\tilde{H}_{m}^{(1)})^{2}]&\cdots&\mathbb{E}[\tilde{H}_{m}^{(1)}\tilde{H}_{m}^{(i)}]\\ \mathbb{E}[H_{m}\tilde{H}_{m}^{(2)}]&\mathbb{E}[\tilde{H}_{m}^{(1)}\tilde{H}_{m}^{(2)}]&\cdots&\mathbb{E}[\tilde{H}_{m}^{(2)}\tilde{H}_{m}^{(i)}]\\ \vdots&\vdots&\ddots&\vdots\\ \mathbb{E}[H_{m}\tilde{H}_{m}^{(i)}]&\mathbb{E}[\tilde{H}_{m}^{(1)}\tilde{H}_{m}^{(i)}]&\cdots&\mathbb{E}[(\tilde{H}_{m}^{(i)})^{2}]\end{bmatrix}, \qquad (69)

\bar{\mathbf{\Delta}}_{m}^{(i+1)}=\begin{bmatrix}0&0&\cdots&0&0\\ 0&\mathbb{E}[(S_{m}^{(1)})^{2}]&\mathbb{E}[S_{m}^{(1)}S_{m}^{(2)}]&\cdots&\mathbb{E}[S_{m}^{(1)}S_{m}^{(i)}]\\ 0&\mathbb{E}[S_{m}^{(1)}S_{m}^{(2)}]&\mathbb{E}[(S_{m}^{(2)})^{2}]&\cdots&\mathbb{E}[S_{m}^{(2)}S_{m}^{(i)}]\\ \vdots&\vdots&\vdots&\ddots&\vdots\\ 0&\mathbb{E}[S_{m}^{(1)}S_{m}^{(i)}]&\mathbb{E}[S_{m}^{(2)}S_{m}^{(i)}]&\cdots&\mathbb{E}[(S_{m}^{(i)})^{2}]\end{bmatrix}, \qquad (75)

where $\partial_{k}\tilde{H}_{m}^{(i)}$, $\partial_{G_{m}}S_{m}^{(i)}$, and $\partial_{k}S_{m}^{(i)}$ denote the partial derivatives $\partial_{H_{m}^{(k)}}f^{(i)}(H_{m}^{(1)},\cdots,H_{m}^{(i)})$, $\partial_{G_{m}}g^{(i)}(R_{m}^{(1)},\cdots,R_{m}^{(i-1)},Y_{m})$, and $\partial_{R_{m}^{(k)}}g^{(i)}(R_{m}^{(1)},\cdots,R_{m}^{(i-1)},Y_{m})$, respectively. Starting with $\bar{\mathbf{\Sigma}}_{m}^{(1)}=\bar{\kappa}_{2}\mathbb{E}[H_{m}^{2}]$, $\bar{\boldsymbol{\mu}}_{m}^{(1)}=\frac{L\bar{\kappa}_{2}\mathbb{E}[\partial_{G_{m}}g^{(1)}(Y_{m})]}{(T+1)N}$, and $\bar{\mathbf{\Omega}}_{m}^{(1)}=\frac{L\big(\bar{\kappa}_{2}\mathbb{E}[(g^{(1)}(Y_{m}))^{2}]+\bar{\kappa}_{4}\mathbb{E}[H_{m}^{2}](\mathbb{E}[\partial_{G_{m}}g^{(1)}(Y_{m})])^{2}\big)}{(T+1)N}$, the covariance matrix $\bar{\mathbf{\Sigma}}_{m}^{(i+1)}$ is updated as follows:

𝚺¯m(i+1)=j=02i+1κ¯2(j+1)𝚵m,j(i+1),\displaystyle\bar{\mathbf{\Sigma}}_{m}^{(i+1)}=\sum_{j=0}^{2i+1}\bar{\kappa}_{2(j+1)}\mathbf{\Xi}_{m,j}^{(i+1)}, (76)

where 𝚵m,0(i+1)=𝚪¯m(i+1)\mathbf{\Xi}_{m,0}^{(i+1)}=\bar{\mathbf{\Gamma}}_{m}^{(i+1)}, and for j1j\geq 1,

𝚵m,j(i+1)=k=0j(𝚿¯m(i+1)𝚽¯m(i+1))k𝚪¯m(i+1)((𝚿¯m(i+1)𝚽¯m(i+1))H)jk+k=0j1(𝚿¯m(i+1)𝚽¯m(i+1))k𝚿¯m(i+1)𝚫¯m(i+1)(𝚿¯m(i+1))H((𝚿¯m(i+1)𝚽¯m(i+1))H)jk1.\displaystyle\begin{aligned} \mathbf{\Xi}_{m,j}^{(i+1)}&=\sum_{k=0}^{j}\left(\bar{\mathbf{\Psi}}_{m}^{(i+1)}\bar{\mathbf{\Phi}}_{m}^{(i+1)}\right)^{k}\bar{\mathbf{\Gamma}}_{m}^{(i+1)}\left(\left(\bar{\mathbf{\Psi}}_{m}^{(i+1)}\bar{\mathbf{\Phi}}_{m}^{(i+1)}\right)^{\mathrm{H}}\right)^{j-k}\\ &+\sum_{k=0}^{j-1}\left(\bar{\mathbf{\Psi}}_{m}^{(i+1)}\bar{\mathbf{\Phi}}_{m}^{(i+1)}\right)^{k}\bar{\mathbf{\Psi}}_{m}^{(i+1)}\bar{\mathbf{\Delta}}_{m}^{(i+1)}(\bar{\mathbf{\Psi}}_{m}^{(i+1)})^{\mathrm{H}}\\ &\quad\quad\cdot\left(\left(\bar{\mathbf{\Psi}}_{m}^{(i+1)}\bar{\mathbf{\Phi}}_{m}^{(i+1)}\right)^{\mathrm{H}}\right)^{j-k-1}.\end{aligned} (77)
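For concreteness, the recursion in (76) and (77) can be evaluated directly from the auxiliary matrices. The following NumPy sketch is illustrative only (function and variable names are our own, not the authors' implementation); the update (78)-(79) for $\check{\mathbf{\Omega}}_{m}^{(i+2)}$ follows the same pattern with the roles of $(\bar{\mathbf{\Psi}},\bar{\mathbf{\Gamma}})$ and $(\bar{\mathbf{\Phi}},\bar{\mathbf{\Delta}})$ swapped and an extra factor $\frac{L}{(T+1)N}$.

```python
import numpy as np

def sigma_bar_update(kappa, Psi, Phi, Gamma, Delta):
    """Covariance update (76)-(77): Sigma_bar = sum_j kappa_{2(j+1)} * Xi_j,
    with Xi_0 = Gamma. kappa[j] holds the cumulant kappa_{2(j+1)}; Psi, Phi,
    Gamma, Delta are the (i+1) x (i+1) auxiliary matrices (assumed shapes)."""
    d = Gamma.shape[0]
    A = Psi @ Phi                           # recurring product in (77)
    B = Psi @ Delta @ Psi.conj().T          # middle factor of the second sum
    Sigma = kappa[0] * Gamma.astype(complex)
    for j in range(1, len(kappa)):          # j = 1, ..., 2i+1 in (76)
        Xi = np.zeros((d, d), dtype=complex)
        for k in range(j + 1):
            Xi += np.linalg.matrix_power(A, k) @ Gamma @ \
                  np.linalg.matrix_power(A.conj().T, j - k)
        for k in range(j):
            Xi += np.linalg.matrix_power(A, k) @ B @ \
                  np.linalg.matrix_power(A.conj().T, j - k - 1)
        Sigma += kappa[j] * Xi
    return Sigma
```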

Next, we define a symmetric matrix 𝛀ˇm(i+2)(i+2)×(i+2)\check{\mathbf{\Omega}}_{m}^{(i+2)}\in\mathbb{C}^{(i+2)\times(i+2)} by padding one row and one column of zeros before the first row and first column of 𝛀¯m(i+1)(i+1)×(i+1)\bar{\mathbf{\Omega}}_{m}^{(i+1)}\in\mathbb{C}^{(i+1)\times(i+1)}. In particular, we can compute 𝛀ˇm(i+2)\check{\mathbf{\Omega}}_{m}^{(i+2)} as follows:

𝛀ˇm(i+2)=L(T+1)Nj=02(i+1)κ¯2(j+1)𝚯m,j(i+2),\displaystyle\check{\mathbf{\Omega}}_{m}^{(i+2)}=\frac{L}{(T+1)N}\sum_{j=0}^{2(i+1)}\bar{\kappa}_{2(j+1)}\mathbf{\Theta}_{m,j}^{(i+2)}, (78)

where 𝚯m,0(i+2)=𝚫¯m(i+2)\mathbf{\Theta}_{m,0}^{(i+2)}=\bar{\mathbf{\Delta}}_{m}^{(i+2)}, and for j1j\geq 1,

𝚯m,j(i+2)=k=0j(𝚽¯m(i+2)𝚿¯m(i+2))k𝚫¯m(i+2)((𝚽¯m(i+2)𝚿¯m(i+2))H)jk+k=0j1(𝚽¯m(i+2)𝚿¯m(i+2))k𝚽¯m(i+2)𝚪¯m(i+2)(𝚽¯m(i+2))H((𝚽¯m(i+2)𝚿¯m(i+2))H)jk1.\displaystyle\begin{aligned} \mathbf{\Theta}_{m,j}^{(i+2)}&=\sum_{k=0}^{j}\left(\bar{\mathbf{\Phi}}_{m}^{(i+2)}\bar{\mathbf{\Psi}}_{m}^{(i+2)}\right)^{k}\bar{\mathbf{\Delta}}_{m}^{(i+2)}\left(\left(\bar{\mathbf{\Phi}}_{m}^{(i+2)}\bar{\mathbf{\Psi}}_{m}^{(i+2)}\right)^{\mathrm{H}}\right)^{j-k}\\ &+\sum_{k=0}^{j-1}\left(\bar{\mathbf{\Phi}}_{m}^{(i+2)}\bar{\mathbf{\Psi}}_{m}^{(i+2)}\right)^{k}\bar{\mathbf{\Phi}}_{m}^{(i+2)}\bar{\mathbf{\Gamma}}_{m}^{(i+2)}(\bar{\mathbf{\Phi}}_{m}^{(i+2)})^{\mathrm{H}}\\ &\quad\quad\cdot\left(\left(\bar{\mathbf{\Phi}}_{m}^{(i+2)}\bar{\mathbf{\Psi}}_{m}^{(i+2)}\right)^{\mathrm{H}}\right)^{j-k-1}.\end{aligned} (79)

Therefore, 𝛀¯m(i+1)\bar{\mathbf{\Omega}}_{m}^{(i+1)} is expressed as follows:

𝛀¯m(i+1)=[𝛀ˇm(i+2)]2:i+2,2:i+2.\displaystyle\bar{\mathbf{\Omega}}_{m}^{(i+1)}=[\check{\mathbf{\Omega}}_{m}^{(i+2)}]_{2:i+2,2:i+2}. (80)

Finally, we evaluate the mean vector 𝝁¯m(i+1)\bar{\boldsymbol{\mu}}_{m}^{(i+1)} recursively with each element given as follows:

μ¯m(i+1)=[𝐌¯m,γ(i+2)]i+2,1,\displaystyle\bar{\mu}_{m}^{(i+1)}=[\bar{\mathbf{M}}^{(i+2)}_{m,\gamma}]_{i+2,1}, (81)

where 𝐌¯m,γ(i+2)\bar{\mathbf{M}}^{(i+2)}_{m,\gamma} is obtained as

𝐌¯m,γ(i+2)=L(T+1)Nj=0i+1κ¯2(j+1)𝚽¯m(i+2)(𝚿¯m(i+2)𝚽¯m(i+2))j.\displaystyle\bar{\mathbf{M}}^{(i+2)}_{m,\gamma}=\frac{L}{(T+1)N}\sum_{j=0}^{i+1}\bar{\kappa}_{2(j+1)}\bar{\mathbf{\Phi}}_{m}^{(i+2)}\left(\bar{\mathbf{\Psi}}_{m}^{(i+2)}\bar{\mathbf{\Phi}}_{m}^{(i+2)}\right)^{j}. (82)
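A corresponding sketch for (81) and (82), again with hypothetical names, accumulates the matrix polynomial iteratively instead of forming explicit powers:

```python
import numpy as np

def mu_bar_update(kappa, Psi, Phi, L, N, T):
    """Mean update (81)-(82): mu_bar^{(i+1)} is the bottom-left entry of
    M_gamma = L/((T+1)N) * sum_{j=0}^{i+1} kappa_{2(j+1)} Phi (Psi Phi)^j,
    where Psi and Phi are the (i+2)-dimensional auxiliary matrices."""
    d = Phi.shape[0]
    PsiPhi = Psi @ Phi
    M = np.zeros((d, d), dtype=complex)
    term = Phi.astype(complex)              # Phi (Psi Phi)^0
    for j in range(len(kappa)):             # kappa[j] = kappa_{2(j+1)}
        M += kappa[j] * term
        term = term @ PsiPhi                # advance to Phi (Psi Phi)^{j+1}
    M *= L / ((T + 1) * N)
    return M[d - 1, 0]                      # entry [i+2, 1] in (81)
```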

Note that the formulas of $\mathbf{\Theta}_{m,j}^{(i+2)}$ and $\bar{\mathbf{M}}_{m,\gamma}^{(i+2)}$ in (79) and (82) involve the matrices $\bar{\mathbf{\Psi}}_{m}^{(i+2)}$ and $\bar{\mathbf{\Gamma}}_{m}^{(i+2)}$, which have not been computed yet. However, the last rows and columns of $\bar{\mathbf{\Psi}}_{m}^{(i+2)}$ and $\bar{\mathbf{\Gamma}}_{m}^{(i+2)}$ do not influence the calculation, since these rows and columns are zeroed out by the forms of $\bar{\mathbf{\Phi}}_{m}^{(i+2)}$ and $\bar{\mathbf{\Delta}}_{m}^{(i+2)}$. In other words, we can simply use the upper-left submatrices of $\bar{\mathbf{\Psi}}_{m}^{(i+2)}$ and $\bar{\mathbf{\Gamma}}_{m}^{(i+2)}$, i.e., $\bar{\mathbf{\Psi}}_{m}^{(i+1)}$ and $\bar{\mathbf{\Gamma}}_{m}^{(i+1)}$, to obtain $\mathbf{\Theta}_{m,j}^{(i+2)}$ and $\bar{\mathbf{M}}_{m,\gamma}^{(i+2)}$. In addition, all the expectations in these matrices are calculated via empirical averaging, so that $\mathbf{\Psi}_{m}^{(i)}=\bar{\mathbf{\Psi}}_{m}^{(i)}$ and $\mathbf{\Phi}_{m}^{(i)}=\bar{\mathbf{\Phi}}_{m}^{(i)}$, while $\mathbb{E}[\tilde{H}_{m}^{(i)}\tilde{H}_{m}^{(j)}]$, $\mathbb{E}[H_{m}\tilde{H}_{m}^{(i)}]$, $\mathbb{E}[H_{m}^{2}]$, and $\mathbb{E}[S_{m}^{(i)}S_{m}^{(j)}]$ are obtained as $\frac{(\tilde{\mathbf{h}}_{m}^{(i)})^{\mathrm{H}}\tilde{\mathbf{h}}_{m}^{(j)}}{(T+1)N}$, $\frac{\|\tilde{\mathbf{h}}_{m}^{(i)}\|_{2}^{2}}{(T+1)N}$, $\frac{\lambda}{N}\sum_{n=1}^{N}\beta_{n}$, and $\frac{(\mathbf{s}_{m}^{(i)})^{\mathrm{H}}\mathbf{s}_{m}^{(j)}}{L}$, respectively. Besides, the empirical values of $\{\kappa_{2k}\}_{k\geq 1}$ are directly used in place of $\{\bar{\kappa}_{2k}\}_{k\geq 1}$.
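The empirical averaging described above could be realized as follows; array shapes and names are our assumptions, so this is a sketch rather than the authors' code:

```python
import numpy as np

def empirical_moments(h_tilde, s, beta, lam, N, T, L):
    """Empirical averages replacing the expectations above.
    h_tilde: list of denoised iterates, each of length (T+1)N;
    s: list of residual iterates, each of length L; beta: length-N array."""
    i = len(h_tilde)
    G = np.zeros((i, i), dtype=complex)     # estimates of E[H~^(a) H~^(b)]
    D = np.zeros((i, i), dtype=complex)     # estimates of E[S^(a) S^(b)]
    for a in range(i):
        for b in range(i):
            G[a, b] = h_tilde[a].conj() @ h_tilde[b] / ((T + 1) * N)
            D[a, b] = s[a].conj() @ s[b] / L
    EH2 = lam / N * beta.sum()              # estimate of E[H_m^2]
    EHHt = np.array([np.linalg.norm(h) ** 2 / ((T + 1) * N) for h in h_tilde])
    return G, D, EH2, EHHt                  # EHHt estimates E[H_m H~^(a)]
```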

Similar to the AMP algorithm, we adopt MMSE denoisers; however, the intermediate estimates from all preceding iterations, rather than only the most recent one, are utilized. Specifically, the denoisers are expressed as follows:

f(i)(𝐡m(1),,𝐡m(i))=𝔼[𝐡m|𝐡m(1),,𝐡m(i)],\displaystyle f^{(i)}(\mathbf{h}_{m}^{(1)},\cdots,\mathbf{h}_{m}^{(i)})=\mathbb{E}[\mathbf{h}_{m}|\mathbf{h}_{m}^{(1)},\cdots,\mathbf{h}_{m}^{(i)}], (83)
g(i+1)(𝐫m(1),,𝐫m(i),𝐲m)=𝔼[𝐆m|𝐫m(1),,𝐫m(i),𝐲m]𝔼[𝐆m|𝐫m(1),,𝐫m(i)].\displaystyle\begin{aligned} g^{(i+1)}(\mathbf{r}_{m}^{(1)},\cdots,\mathbf{r}_{m}^{(i)},\mathbf{y}_{m})&=\mathbb{E}[\mathbf{G}_{m}|\mathbf{r}_{m}^{(1)},\cdots,\mathbf{r}_{m}^{(i)},\mathbf{y}_{m}]\\ &-\mathbb{E}[\mathbf{G}_{m}|\mathbf{r}_{m}^{(1)},\cdots,\mathbf{r}_{m}^{(i)}].\end{aligned} (84)

Regarding the denoiser in (83), we need to compute the posterior distribution of $\mathbf{h}_{m}$. Since the likelihood function is given in (48) and the prior information is available in (13), the posterior distribution is calculated via Bayes' theorem as follows:

p(𝐡n,m|𝐡n,m(1),,𝐡n,m(i))=(1t=0Tπn,t,m(i))t=0Tδ0(hn,t,m)+t=0T(πn,t,m(i)𝒞𝒩(hn,t,m;Cn,t,m(i),Dn,t,m(i))ttδ0(hn,t,m)),\displaystyle\begin{aligned} &\quad p\left(\mathbf{h}_{n,m}|\mathbf{h}_{n,m}^{(1)},\cdots,\mathbf{h}_{n,m}^{(i)}\right)\\ &=\left(1-\sum_{t=0}^{T}\pi_{n,t,m}^{(i)}\right)\prod_{t=0}^{T}\delta_{0}\left(h_{n,t,m}\right)\\ &+\sum_{t=0}^{T}\left(\pi_{n,t,m}^{(i)}\mathcal{CN}\left(h_{n,t,m};C_{n,t,m}^{(i)},D_{n,t,m}^{(i)}\right)\prod_{t^{\prime}\neq t}\delta_{0}(h_{n,t^{\prime},m})\right),\end{aligned} (85)

where

Cn,t,m(i)=En,t,m(i)Fn,m(i),\displaystyle C_{n,t,m}^{(i)}=\frac{E_{n,t,m}^{(i)}}{F_{n,m}^{(i)}}, (86)
Dn,t,m(i)=1Fn,m(i),\displaystyle D_{n,t,m}^{(i)}=\frac{1}{F_{n,m}^{(i)}}, (87)
En,t,m(i)=(𝝁¯m(i))H(𝛀¯m(i))1[hn,t,m(1),,hn,t,m(i)]T,\displaystyle E_{n,t,m}^{(i)}=(\bar{\boldsymbol{\mu}}_{m}^{(i)})^{\mathrm{H}}(\bar{\mathbf{\Omega}}_{m}^{(i)})^{-1}\left[h_{n,t,m}^{(1)},\cdots,h_{n,t,m}^{(i)}\right]^{\mathrm{T}}, (88)
Fn,m(i)=1/βn+(𝝁¯m(i))H(𝛀¯m(i))1𝝁¯m(i),\displaystyle F_{n,m}^{(i)}=1/\beta_{n}+(\bar{\boldsymbol{\mu}}_{m}^{(i)})^{\mathrm{H}}(\bar{\mathbf{\Omega}}_{m}^{(i)})^{-1}\bar{\boldsymbol{\mu}}_{m}^{(i)}, (89)
πn,t,m(i)=e(En,t,m(i))2Fn,m(i)βnFn,m(i)1t=0Tλn,t,mλn,t,m+t=0Tλn,t,mλn,t,me(En,t,m(i))2Fn,m(i).\displaystyle\pi_{n,t,m}^{(i)}=\frac{e^{\frac{\left(E_{n,t,m}^{(i)}\right)^{2}}{F_{n,m}^{(i)}}}}{\beta_{n}F_{n,m}^{(i)}\frac{1-\sum_{t^{\prime}=0}^{T}\lambda_{n,t^{\prime},m}}{\lambda_{n,t,m}}+\sum_{t^{\prime}=0}^{T}\frac{\lambda_{n,t^{\prime},m}}{\lambda_{n,t,m}}e^{\frac{\left(E_{n,t^{\prime},m}^{(i)}\right)^{2}}{F_{n,m}^{(i)}}}}. (90)

Therefore, each element of $\tilde{\mathbf{h}}_{m}^{(i)}$ in (24) at the $i$-th iteration, i.e., $\tilde{h}_{n,t,m}^{(i)}$, can be obtained as follows:

h~n,t,m(i)=[f(i)(𝐡n,m(1),,𝐡n,m(i))]t=πn,t,m(i)Cn,t,m(i).\displaystyle\tilde{h}_{n,t,m}^{(i)}=[f^{(i)}(\mathbf{h}_{n,m}^{(1)},\cdots,\mathbf{h}_{n,m}^{(i)})]_{t}=\pi_{n,t,m}^{(i)}C_{n,t,m}^{(i)}. (91)
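Putting (86)-(91) together, a sketch of the denoiser $f^{(i)}$ for a single user-antenna pair might read as follows; all names are illustrative, and the exponential in (90) is evaluated with $|E|^{2}$ for complex inputs, which is our reading of the real-valued-expectation convention:

```python
import numpy as np

def denoiser_f(h_iters, mu_bar, Omega_bar, beta_n, lam_nt):
    """MMSE denoiser (86)-(91) for user n and antenna m (illustrative sketch).
    h_iters : (i, T+1) array stacking the iterates h_{n,t,m}^{(1..i)};
    mu_bar  : length-i mean vector;  Omega_bar : i x i covariance;
    beta_n  : large-scale fading;    lam_nt : prior probabilities, length T+1."""
    w = np.linalg.solve(Omega_bar, mu_bar)           # Omega^{-1} mu
    F = (1.0 / beta_n + mu_bar.conj() @ w).real      # (89)
    E = w.conj() @ h_iters                           # (88), one entry per delay t
    C = E / F                                        # (86); D = 1/F as in (87)
    expo = np.exp(np.abs(E) ** 2 / F)                # e^{E_t^2/F} terms in (90)
    lam_sum = lam_nt.sum()
    pi = np.array([expo[t] / (beta_n * F * (1 - lam_sum) / lam_nt[t]
                              + (lam_nt / lam_nt[t] * expo).sum())
                   for t in range(len(lam_nt))])     # (90)
    return pi * C                                    # (91): h~_{n,t,m}
```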

After some manipulations, hn,t,m(k)[f(i)(𝐡n,m(1),,𝐡n,m(i))]t\partial_{h_{n,t,m}^{(k)}}[f^{(i)}(\mathbf{h}_{n,m}^{(1)},\cdots,\mathbf{h}_{n,m}^{(i)})]_{t} is derived as follows:

hn,t,m(k)[f(i)(𝐡n,m(1),,𝐡n,m(i))]t=πn,t,m(i)[2En,t,m(i)Fn,m(i)Cn,t,m(i)(1πn,t,m(i))+1](𝝁¯m(i))H(𝛀¯m(i))1𝐞k,\displaystyle\begin{aligned} &\partial_{h_{n,t,m}^{(k)}}[f^{(i)}(\mathbf{h}_{n,m}^{(1)},\cdots,\mathbf{h}_{n,m}^{(i)})]_{t}=\\ &\pi_{n,t,m}^{(i)}\left[\frac{2E_{n,t,m}^{(i)}}{F_{n,m}^{(i)}}C_{n,t,m}^{(i)}\left(1-\pi_{n,t,m}^{(i)}\right)+1\right](\bar{\boldsymbol{\mu}}_{m}^{(i)})^{\mathrm{H}}(\bar{\mathbf{\Omega}}_{m}^{(i)})^{-1}\mathbf{e}_{k},\end{aligned} (92)

where 𝐞ki×1\mathbf{e}_{k}\in\mathbb{R}^{i\times 1} represents the kk-th column of 𝐈i\mathbf{I}_{i}.

Regarding the denoiser in (84), since (2) is a linear model, we set g(1)(𝐲m)=𝐲mg^{(1)}(\mathbf{y}_{m})=\mathbf{y}_{m} and consequently Gl,mg(1)(yl,m)=1\partial_{G_{l,m}}g^{(1)}(y_{l,m})=1 for i=0i=0. For i>0i>0, based on the state evolution, we have [Gm,Rm(1),,Rm(i)]T𝒞𝒩(𝟎,𝚺¯m(i+1))[G_{m},R_{m}^{(1)},\cdots,R_{m}^{(i)}]^{\mathrm{T}}\sim\mathcal{CN}(\mathbf{0},\bar{\mathbf{\Sigma}}_{m}^{(i+1)}), and the second conditional expectation in (84) is given as follows:

𝔼[Gl,m|rl,m(1),,rl,m(i)]=[𝚺¯m(i+1)]1,2:i+1([𝚺¯m(i+1)]2:i+1,2:i+1)1[rl,m(1),,rl,m(i)]T.\displaystyle\begin{aligned} &\mathbb{E}[G_{l,m}|r_{l,m}^{(1)},\cdots,r_{l,m}^{(i)}]\\ &=[\bar{\mathbf{\Sigma}}_{m}^{(i+1)}]_{1,2:i+1}([\bar{\mathbf{\Sigma}}_{m}^{(i+1)}]_{2:i+1,2:i+1})^{-1}\left[r_{l,m}^{(1)},\cdots,r_{l,m}^{(i)}\right]^{\mathrm{T}}.\end{aligned} (93)

By further incorporating the received pilot signal, we have [Gm,Rm(1),,Rm(i),Ym]T𝒞𝒩(𝟎,𝐒¯m(i+2))[G_{m},R_{m}^{(1)},\cdots,R_{m}^{(i)},Y_{m}]^{\mathrm{T}}\sim\mathcal{CN}(\mathbf{0},\bar{\mathbf{S}}_{m}^{(i+2)}), where

𝐒¯m(i+2)=[𝚺¯m(i+1)[𝚺¯m(i+1)]1:i+1,1[𝚺¯m(i+1)]1,1:i+1κ2𝔼[Hm2]+σ2ρL].\displaystyle\bar{\mathbf{S}}_{m}^{(i+2)}=\left[\begin{array}[]{cc}\bar{\mathbf{\Sigma}}_{m}^{(i+1)}&\left[\bar{\mathbf{\Sigma}}_{m}^{(i+1)}\right]_{1:i+1,1}\\ \left[\bar{\mathbf{\Sigma}}_{m}^{(i+1)}\right]_{1,1:i+1}&\kappa_{2}\mathbb{E}[H_{m}^{2}]+\frac{\sigma^{2}}{\rho L}\end{array}\right]. (96)

Thus, the first conditional expectation in (84) is expressed as follows:

𝔼[Gl,m|rl,m(1),,rl,m(i),yl,m]=[𝐒¯m(i+2)]1,2:i+2([𝐒¯m(i+2)]2:i+2,2:i+2)1[rl,m(1),,rl,m(i),yl,m]T.\displaystyle\begin{aligned} &\mathbb{E}[G_{l,m}|r_{l,m}^{(1)},\cdots,r_{l,m}^{(i)},y_{l,m}]\\ &=[\bar{\mathbf{S}}_{m}^{(i+2)}]_{1,2:i+2}([\bar{\mathbf{S}}_{m}^{(i+2)}]_{2:i+2,2:i+2})^{-1}\left[r_{l,m}^{(1)},\cdots,r_{l,m}^{(i)},y_{l,m}\right]^{\mathrm{T}}.\end{aligned} (97)

By combining the results in (93) and (97), we obtain

g(i+1)(rl,m(1),,rl,m(i),yl,m)=[𝐒¯m(i+2)]1,2:i+2([𝐒¯m(i+2)]2:i+2,2:i+2)1[rl,m(1),,rl,m(i),yl,m]T[𝚺¯m(i+1)]1,2:i+1([𝚺¯m(i+1)]2:i+1,2:i+1)1[rl,m(1),,rl,m(i)]T.\displaystyle\begin{aligned} &g^{(i+1)}(r_{l,m}^{(1)},\cdots,r_{l,m}^{(i)},y_{l,m})\\ &=[\bar{\mathbf{S}}_{m}^{(i+2)}]_{1,2:i+2}([\bar{\mathbf{S}}_{m}^{(i+2)}]_{2:i+2,2:i+2})^{-1}\left[r_{l,m}^{(1)},\cdots,r_{l,m}^{(i)},y_{l,m}\right]^{\mathrm{T}}\\ &-[\bar{\mathbf{\Sigma}}_{m}^{(i+1)}]_{1,2:i+1}([\bar{\mathbf{\Sigma}}_{m}^{(i+1)}]_{2:i+1,2:i+1})^{-1}\left[r_{l,m}^{(1)},\cdots,r_{l,m}^{(i)}\right]^{\mathrm{T}}.\end{aligned} (98)

Accordingly, the partial derivatives of g(i+1)g^{(i+1)} can be calculated as follows:

Gl,mg(i+1)(rl,m(1),,rl,m(i),yl,m)=[𝐒¯m(i+2)]1,2:i+2([𝐒¯m(i+2)]2:i+2,2:i+2)1𝐞i+1,\displaystyle\begin{aligned} &\partial_{G_{l,m}}g^{(i+1)}(r_{l,m}^{(1)},\cdots,r_{l,m}^{(i)},y_{l,m})\\ &=[\bar{\mathbf{S}}_{m}^{(i+2)}]_{1,2:i+2}([\bar{\mathbf{S}}_{m}^{(i+2)}]_{2:i+2,2:i+2})^{-1}\mathbf{e}_{i+1},\end{aligned} (99)
rl,m(k)g(i+1)(rl,m(1),,rl,m(i),yl,m)=[𝐒¯m(i+2)]1,2:i+2([𝐒¯m(i+2)]2:i+2,2:i+2)1[𝐞kT,0]T[𝚺¯m(i+1)]1,2:i+1([𝚺¯m(i+1)]2:i+1,2:i+1)1𝐞k.\displaystyle\begin{aligned} &\partial_{r_{l,m}^{(k)}}g^{(i+1)}(r_{l,m}^{(1)},\cdots,r_{l,m}^{(i)},y_{l,m})\\ &=[\bar{\mathbf{S}}_{m}^{(i+2)}]_{1,2:i+2}([\bar{\mathbf{S}}_{m}^{(i+2)}]_{2:i+2,2:i+2})^{-1}\left[\mathbf{e}_{k}^{\mathrm{T}},0\right]^{\mathrm{T}}\\ &-[\bar{\mathbf{\Sigma}}_{m}^{(i+1)}]_{1,2:i+1}([\bar{\mathbf{\Sigma}}_{m}^{(i+1)}]_{2:i+1,2:i+1})^{-1}\mathbf{e}_{k}.\end{aligned} (100)
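The pieces (93)-(100) combine into a compact routine. The sketch below is our own construction, assuming $i\geq 1$ (the case $g^{(1)}(y)=y$ is handled separately, as in the text); it builds the bordered covariance (96) and returns the denoiser output together with its partial derivatives:

```python
import numpy as np

def denoiser_g(r_iters, y, Sigma_bar, noise_term):
    """Denoiser g^{(i+1)} in (98) with its partial derivatives (99)-(100),
    for one entry (l, m).
    r_iters    : length-i array [r^{(1)}, ..., r^{(i)}];
    Sigma_bar  : (i+1) x (i+1) covariance of [G, R^{(1)}, ..., R^{(i)}];
    noise_term : kappa_2 E[H^2] + sigma^2 / (rho L), the Y-variance in (96)."""
    i = len(r_iters)
    S = np.zeros((i + 2, i + 2), dtype=complex)     # bordered matrix in (96)
    S[:i + 1, :i + 1] = Sigma_bar
    S[:i + 1, i + 1] = Sigma_bar[:, 0]
    S[i + 1, :i + 1] = Sigma_bar[0, :]
    S[i + 1, i + 1] = noise_term
    w_full = S[0, 1:] @ np.linalg.inv(S[1:, 1:])    # LMMSE weights in (97)
    w_r = Sigma_bar[0, 1:] @ np.linalg.inv(Sigma_bar[1:, 1:])  # weights in (93)
    g_val = w_full @ np.r_[r_iters, y] - w_r @ r_iters          # (98)
    dG = w_full[-1]                                 # (99): weight on y
    dr = w_full[:-1] - w_r                          # (100): gaps on each r^{(k)}
    return g_val, dG, dr
```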

Similar to the OAMP-based algorithm, the common sparsity pattern among all antennas is leveraged in the FPAMP-based algorithm. When the FPAMP iterations terminate, the remaining steps follow those of the proposed OAMP-based algorithm in Section III-B. The proposed FPAMP-based algorithm for joint activity-delay detection and channel estimation is summarized in Algorithm 2.

Algorithm 2 The Proposed FPAMP-based Algorithm

Input: The normalized received pilot signal 𝐘\mathbf{Y}, pilot sequences 𝐏\mathbf{P}, maximum synchronization delay TT, maximum number of iterations QQ, and accuracy tolerance ϵ\epsilon.
Output: The estimated effective channel matrix 𝐇^\hat{\mathbf{H}}, set of active users 𝒦^\hat{\mathcal{K}}, and synchronization delays {tn}\{t_{n}\}’s, n𝒦^n\in\hat{\mathcal{K}}.

1: Calculate the rectangular free cumulants $\{\kappa_{2k}\}_{k\geq 1}$ with (41).
2: Initialize: $i\leftarrow 0$; $\mathbf{s}_{m}^{(1)}=\mathbf{y}_{m}$ and $\mathbf{h}_{m}^{(1)}=\mathbf{P}^{\mathrm{H}}\mathbf{s}_{m}^{(1)}$, $\forall m\in\mathcal{M}$; $\omega_{n,t}^{(1)}=\frac{\lambda}{T+1}$, $\forall n\in\mathcal{N}$, $\forall t\in\{0,\cdots,T\}$; $\bar{\boldsymbol{\mu}}_{m}^{(1)}=\frac{L}{(T+1)N}\kappa_{2}$, $\bar{\mathbf{\Sigma}}_{m}^{(1)}=\kappa_{2}\frac{\lambda}{N}\sum_{n=1}^{N}\beta_{n}$, and $\bar{\mathbf{\Omega}}_{m}^{(1)}=\frac{1}{(T+1)N}\kappa_{2}\mathbf{y}_{m}^{\mathrm{H}}\mathbf{y}_{m}+\frac{L}{(T+1)N}\kappa_{4}\frac{\lambda}{N}\sum_{n=1}^{N}\beta_{n}$, $\forall m\in\mathcal{M}$.
3: while $i<Q$ and $\frac{\sum_{m}\|\tilde{\mathbf{h}}_{m}^{(i)}-\tilde{\mathbf{h}}_{m}^{(i-1)}\|_{2}^{2}}{\sum_{m}\|\tilde{\mathbf{h}}_{m}^{(i-1)}\|_{2}^{2}}>\epsilon$ do
4:   $i\leftarrow i+1$.
5:   Calculate the auxiliary matrices $\bar{\mathbf{\Psi}}_{m}^{(i+1)}$, $\bar{\mathbf{\Phi}}_{m}^{(i+1)}$, $\bar{\mathbf{\Gamma}}_{m}^{(i+1)}$, and $\bar{\mathbf{\Delta}}_{m}^{(i+1)}$, $\forall m$ according to (57), (63), (69) and (75).
6:   Calculate the matrices $\mathbf{M}_{m,\gamma}^{(i+1)}$ and the Onsager coefficients $\{\gamma_{m,j}^{(i)}\}_{j=1}^{i-1}$, $\forall m$ according to (40) and (43), where $\mathbf{\Psi}_{m}^{(i+1)}=\bar{\mathbf{\Psi}}_{m}^{(i+1)}$ and $\mathbf{\Phi}_{m}^{(i+1)}=\bar{\mathbf{\Phi}}_{m}^{(i+1)}$, $\forall m$.
7:   Perform (23), $\forall m$.
8:   Update the prior sparsity ratio $\omega_{n,t}^{(i)}$, $\forall n,t$ according to (18).
9:   Perform (24), $\forall m$ according to (86)-(91), where $\lambda_{n,t,m}$ is replaced by $\omega_{n,t}^{(i)}$.
10:  Calculate the matrices $\mathbf{M}_{m,\alpha}^{(i+1)}$ and the Onsager coefficients $\{\alpha_{m,j}^{(i)}\}_{j=1}^{i}$, $\forall m$ according to (39) and (42), where $\mathbf{\Psi}_{m}^{(i+1)}=\bar{\mathbf{\Psi}}_{m}^{(i+1)}$ and $\mathbf{\Phi}_{m}^{(i+1)}=\bar{\mathbf{\Phi}}_{m}^{(i+1)}$, $\forall m$.
11:  Perform (25), $\forall m$.
12:  Calculate the covariance matrix $\bar{\mathbf{\Sigma}}_{m}^{(i+1)}$, $\forall m$ for the denoiser $g^{(i+1)}$ according to (76) and (77).
13:  Perform (26), $\forall m$ according to (96) and (98).
14:  Calculate the mean vector $\bar{\boldsymbol{\mu}}_{m}^{(i+1)}$, $\forall m$ according to (81) and (82).
15:  Calculate the covariance matrix $\bar{\mathbf{\Omega}}_{m}^{(i+1)}$, $\forall m$ for the denoiser $f^{(i+1)}$ according to (78), (79) and (80).
16:  Calculate the partial derivatives $\{\partial_{h_{n,t,m}^{(k)}}[f^{(i)}(\mathbf{h}_{n,m}^{(1)},\cdots,\mathbf{h}_{n,m}^{(i)})]_{t}\}$, $\{\partial_{G_{l,m}}g^{(i+1)}(r_{l,m}^{(1)},\cdots,r_{l,m}^{(i)},y_{l,m})\}$, and $\{\partial_{r_{l,m}^{(k)}}g^{(i+1)}(r_{l,m}^{(1)},\cdots,r_{l,m}^{(i)},y_{l,m})\}$, $\forall m$ according to (92), (99) and (100), where $\lambda_{n,t,m}$ is replaced by $\omega_{n,t}^{(i)}$.
17: end while
18: Obtain $\hat{\mathbf{H}}=[\hat{\mathbf{h}}_{1},\cdots,\hat{\mathbf{h}}_{M}]$ with $\hat{\mathbf{h}}_{m}=\tilde{\mathbf{h}}_{m}^{(I)}$.
19: Determine the set of active users as $\hat{\mathcal{K}}\triangleq\{n\in\mathcal{N}\,|\,\sum_{t=0}^{T}\omega_{n,t}^{(I+1)}\geq\theta\}$.
20: Determine the synchronization delays as $t_{n}=\arg\max_{t\in\{0,\cdots,T\}}\omega_{n,t}^{(I+1)}$, $\forall n$.
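The control flow of Algorithm 2 can also be summarized in a short structural sketch. This is our own construction rather than the authors' implementation: the per-iteration body (lines 4-16) is abstracted as a caller-supplied callable `step`, while the stopping rule of line 3 and the final decisions of lines 19-20 are spelled out.

```python
import numpy as np

def fpamp_loop(h_init, step, Q, eps):
    """Run the FPAMP iterations until the maximum count Q is reached or the
    relative change of the denoised channel iterates falls below eps."""
    h_prev = h_init                     # list of h_tilde_m vectors, one per antenna
    for i in range(1, Q + 1):
        h_new = step(i, h_prev)         # lines 4-16 of Algorithm 2
        num = sum(np.linalg.norm(a - b) ** 2 for a, b in zip(h_new, h_prev))
        den = sum(np.linalg.norm(b) ** 2 for b in h_prev)
        h_prev = h_new
        if num / den <= eps:
            break
    return h_prev

def final_decisions(omega, theta):
    """Lines 19-20: activity by thresholding the summed posterior sparsity
    ratios, synchronization delay by their argmax. omega: (N, T+1) array."""
    active = np.flatnonzero(omega.sum(axis=1) >= theta)   # estimated set K_hat
    delays = omega.argmax(axis=1)                          # t_n for each user
    return active, delays
```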

IV-C Computational Complexity Analysis

The computational complexity of the proposed OAMP-based and FPAMP-based algorithms is analyzed and compared with that of the AMP-based algorithm [23]. Since the deep unrolling method in [23] cannot significantly reduce the complexity of the AMP algorithm, it is not considered in the analysis. As shown in Table I, we choose the number of complex-valued multiplications as the metric of interest, where the complexity of one real-valued multiplication is counted as one quarter of that of a complex-valued multiplication. The OAMP-based algorithm, which uses the LMMSE estimator, incurs the highest computational complexity due to the matrix inversion in (5) at each iteration. Our proposed FPAMP-based algorithm avoids the matrix inversion and thus enjoys a low complexity. For iterations with index $i\ll(T+1)N$, the FPAMP-based and AMP-based algorithms have comparable complexity. Our experimental results show that the number of iterations required for the FPAMP-based algorithm to converge typically does not exceed 20, which is far smaller than $(T+1)N$ in grant-free massive RA systems.

We use the simulation setting with $K=50$, $M=16$, $N=400$, $\bar{L}=50$, and $T=4$, as will be detailed in Section V, to give an intuitive picture of the computational complexity of these algorithms. The numbers of complex-valued multiplications per iteration of the AMP-based and OAMP-based algorithms are $2.2\times 10^{6}$ and $9.4\times 10^{7}$, respectively. In addition, as $i$ increases from 1 to 10, the number of complex-valued multiplications of the FPAMP-based algorithm in the $i$-th iteration grows from $1.8\times 10^{6}$ to $4.2\times 10^{6}$. These results confirm that the FPAMP-based algorithm has a much lower complexity than the OAMP-based algorithm.

TABLE I: Computational Complexity Analysis
Algorithm Number of complex multiplications in each iteration
AMP-based 14M(5(T+1)NL+9(T+1)N+3L)\frac{1}{4}M\left(5(T+1)NL+9(T+1)N+3L\right)
OAMP-based 14M(4(T+1)NL2+L3+2(T+1)NL+4(T+1)N)\frac{1}{4}M\left(4(T+1)NL^{2}\!+\!L^{3}+2(T+1)NL+4(T+1)N\right)
FPAMP-based $\frac{1}{4}M\big(4(T+1)NL+3(T+1)N(i+1)+L(3i+2)+12i^{3}+9i^{2}+10i+(4i^{3}+4i^{2})\sum_{k=1}^{2i}\left(\log_{2}k+\log_{2}(2i+1-k)\right)\big)$
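The per-iteration counts in Table I can be checked numerically. In the sketch below, the expanded pilot length $L=\bar{L}+T=54$ is our reading of the system model; under this assumption the evaluation reproduces the figures quoted above.

```python
import numpy as np

# Per-iteration complex-multiplication counts from Table I, evaluated for the
# setting of Section V (M=16, N=400, T=4, and, by assumption, L = L_bar + T).
M, N, T, L = 16, 400, 4, 54
TN = (T + 1) * N

amp = M / 4 * (5 * TN * L + 9 * TN + 3 * L)
oamp = M / 4 * (4 * TN * L**2 + L**3 + 2 * TN * L + 4 * TN)

def fpamp_count(i):
    logs = sum(np.log2(k) + np.log2(2 * i + 1 - k) for k in range(1, 2 * i + 1))
    return M / 4 * (4 * TN * L + 3 * TN * (i + 1) + L * (3 * i + 2)
                    + 12 * i**3 + 9 * i**2 + 10 * i
                    + (4 * i**3 + 4 * i**2) * logs)

print(f"AMP: {amp:.1e}, OAMP: {oamp:.1e}")                  # ~2.2e6 and ~9.4e7
print(f"FPAMP i=1: {fpamp_count(1):.1e}, i=10: {fpamp_count(10):.1e}")  # ~1.8e6, ~4.2e6
```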

V Simulation Results

We consider an uplink cellular network where 400 users are uniformly distributed within a circular ring centered at a multi-antenna BS in our simulations. The path loss of user $n$ is modeled as $\beta_{n}=-128.1-36.7\log_{10}(d_{n})$ (dB) with $d_{n}\in[0.05,1]$ km. The number of BS antennas is $M=16$ and the pilot sequence length is $\bar{L}=50$. Besides, the transmit power of each user is set to 23 dBm, and the noise power spectral density is $-169$ dBm/Hz over 1 MHz bandwidth. In addition, the maximum number of iterations is $Q=50$, and the accuracy tolerance is $\epsilon=10^{-5}$. The simulation results are averaged over $10^{5}$ independent channel and active user set realizations. By default, the number of active users $K$, the transmit power, and the maximum delay $T$ are set as 50, 23 dBm, and 4 symbols, respectively. For comparison, the following three baselines are also simulated (a brief sketch of the setup generation is given after the list):

  • Group LASSO-based algorithm [19]: This method formulates the joint activity-delay detection and channel estimation problem as a group LASSO problem, which is solved by a block coordinate descent algorithm.

  • AMP-based algorithm [23]: This method uses the AMP algorithm with an MMSE denoiser to perform joint activity-delay detection and channel estimation. However, we do not consider the deep learning implementation in [23] due to its marginal gain in detection and estimation accuracy.

  • Covariance-based algorithm [20]: This method employs a block coordinate descent algorithm based on the covariance matrix of the received pilot signal to first perform activity and delay detection, after which the channel coefficients are estimated via an MMSE algorithm.
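As noted above, a minimal sketch of the large-scale fading generation for this setup is given below; the uniform sampling of user distances over $[0.05,1]$ km and all variable names are our assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 400
d_km = rng.uniform(0.05, 1.0, N)            # user-BS distances in km (assumed uniform)
beta_dB = -128.1 - 36.7 * np.log10(d_km)    # stated path-loss model, in dB
beta = 10 ** (beta_dB / 10)                 # linear-scale beta_n used by the priors
```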

Figure 2: Probability of missed detection versus probability of false alarm.

We first evaluate the user activity detection performance of different algorithms. The probability of missed detection versus the probability of false alarm is shown in Fig. 2, obtained by tuning the threshold $\theta$ for determining the set of active users. In particular, the probability of missed detection is defined as $K_{m}/K$, where $K_{m}$ denotes the number of active users that are detected as inactive, and the probability of false alarm is defined as $N_{f}/(N-K)$ with $N_{f}$ denoting the number of inactive users that are detected as active. It is first observed from Fig. 2 that the two proposed algorithms achieve the lowest missed detection probabilities for a given false alarm probability. Besides, although the covariance-based method outperforms the group LASSO-based and AMP-based methods, it is outperformed by the two proposed algorithms by a large margin. Such performance improvement, on one hand, is attributed to the capabilities of the proposed OAMP-based and FPAMP-based algorithms in handling pilot matrices with non-i.i.d. entries. On the other hand, exploitation of the common sparsity pattern among multiple BS antennas also contributes significantly. In addition, the proposed FPAMP-based algorithm shows negligible performance degradation compared with the OAMP-based algorithm, despite its reduced computational complexity.
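The two error probabilities used in Fig. 2 follow directly from the definitions above; a small helper (our own, set-based naming) could read:

```python
def detection_metrics(active_true, active_hat, N):
    """Missed-detection and false-alarm probabilities as defined above.
    active_true, active_hat: sets of user indices; N: total number of users."""
    K = len(active_true)
    K_m = len(active_true - active_hat)     # active users declared inactive
    N_f = len(active_hat - active_true)     # inactive users declared active
    return K_m / K, N_f / (N - K)
```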

Next, the delay detection error probability is shown in Fig. 3, which is defined as $K_{d}/K$ with $K_{d}$ denoting the number of active users whose synchronization delays are wrongly detected. The delay detection error probabilities are obtained with a fixed false alarm probability of $10^{-1}$, which is achieved by tuning the threshold value $\theta$. It is seen that an increased number of active users leads to degraded delay detection performance due to the limited radio resources for pilot transmission. In accordance with Fig. 2, the two proposed algorithms achieve the best performance, which results from their superior activity detection performance, since the delay detection procedure builds on the channel estimation and activity detection results. Successful transmission critically depends on correct synchronization delay detection, and with a fixed false alarm probability, the missed detection probability is lower bounded by the delay detection error probability. Hence, we adopt the delay detection error probability as an indicator of the capability of supporting active users in asynchronous massive RA systems. Specifically, assuming the probability of missed detection requirement is $2\times 10^{-3}$, the two proposed algorithms are able to support 60 active users while the AMP-based algorithm can only support 34, which is a remarkable 76% increase.

Figure 3: Delay detection error probability versus the number of active users KK.

We investigate the relationship between the normalized mean square error (NMSE) of channel estimation and the transmit power in Fig. 4. Similar to Figs. 2 and 3, the two proposed algorithms achieve significant NMSE reductions compared with the baselines, especially at large transmit power, and the FPAMP-based algorithm maintains similar performance to the OAMP-based algorithm. These observations again validate the benefit of the LMMSE estimator in the OAMP-based algorithm in eliminating multi-user interference, as well as that of using the rectangular free cumulants of the non-i.i.d. Gaussian pilot matrix. In particular, the latter makes the low-complexity AMP framework compatible with general pilot matrices.
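For completeness, the NMSE reported in Fig. 4 follows the standard per-realization definition; a one-line helper (our own) is:

```python
import numpy as np

def nmse(H_hat, H):
    """NMSE of channel estimation for one realization; in Fig. 4 this is
    further averaged over independent realizations."""
    return np.linalg.norm(H_hat - H) ** 2 / np.linalg.norm(H) ** 2
```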

Figure 4: NMSE of channel estimation versus transmit power.

The user activity detection performance with different maximum synchronization delays, i.e., $T=4$ and $T=9$, is further examined in Fig. 5. We find that the performance of all methods deteriorates with a larger maximum synchronization delay, since the expanded pilot matrix contains more zero entries, which destroys its Gaussianity to a greater extent. Furthermore, the performance of the two proposed algorithms is less affected than that of the AMP-based algorithm, indicating that they are more robust to non-i.i.d. Gaussian pilot matrices.

Figure 5: Probability of missed detection versus probability of false alarm with different maximum synchronization delays.

The computational complexity of different algorithms, in terms of the average execution time on an Apple MacBook Pro laptop (14-inch model, M1 Pro chip with 16 GB RAM), is summarized in Table II. From the table, it is clear that the group LASSO-based method has the lowest complexity. However, its performance is far inferior to that of the other methods. Besides, while the OAMP-based algorithm achieves the best detection and estimation accuracy, it suffers from a heavy computational overhead that originates from the LMMSE estimator. Although the covariance-based method has a similar computational complexity to the OAMP-based method, there exists a significant performance gap, as previously shown in Figs. 2 $\sim$ 4. In addition, the AMP-based and FPAMP-based algorithms have comparable average execution times, and the FPAMP-based algorithm secures approximately 41% complexity reduction compared with the OAMP-based algorithm. This demonstrates that the FPAMP-based algorithm can fully exploit the properties of the pilot matrix via its rectangular free cumulants to achieve low complexity and satisfactory detection/estimation accuracy.

TABLE II: Average execution time of different algorithms
Algorithm Average execution time (s)
Group LASSO-based 2.78
AMP-based 3.27
Covariance-based 5.11
OAMP-based 5.92
FPAMP-based 3.51

VI Conclusions

In this paper, we investigated the joint activity detection, synchronization delay detection, and channel estimation in asynchronous grant-free massive random access (RA) systems. Considering that entries in the pilot matrix are not independent and identically Gaussian distributed, we first proposed a novel algorithm based on orthogonal approximate message passing (OAMP), which also fully utilizes the common sparsity among the received pilot signals of multiple base station antennas. To accelerate the computation, a free probability approximate message passing (FPAMP)-based algorithm was further developed, which explores the rectangular free cumulants of the pilot matrix to avoid matrix inversion. Simulation results showed the effectiveness of the proposed algorithms, and the potential of the FPAMP-based algorithm for fast joint activity-delay detection and channel estimation in grant-free massive RA systems.

Our study demonstrates the benefits of exploiting high-order statistical information from the pilot matrix when designing the detection and estimation algorithm for grant-free massive RA. For future investigations, it would be interesting to develop more advanced algorithms for asynchronous massive RA systems, e.g., extending the proposed algorithms by fusing deep unrolling and FPAMP, to further enhance the performance and reduce the complexity.

Appendix A Proof of Theorem 1

In order to prove the theorem, we first present the general recursion as follows:

𝐳m(i)=𝐏H𝐮m(i)j=1i1bm,j(i)𝐯m(j),\displaystyle\mathbf{z}_{m}^{(i)}=\mathbf{P}^{\mathrm{H}}\mathbf{u}_{m}^{(i)}-\sum_{j=1}^{i-1}b_{m,j}^{(i)}\mathbf{v}_{m}^{(j)}, (101)
𝐯m(i)=f~(i)(𝐳m(1),,𝐳m(i),𝐄m),\displaystyle\mathbf{v}_{m}^{(i)}=\tilde{f}^{(i)}(\mathbf{z}_{m}^{(1)},\cdots,\mathbf{z}_{m}^{(i)},\mathbf{E}_{m}), (102)
𝐨m(i)=𝐏𝐯m(i)j=1iam,j(i)𝐮m(j),\displaystyle\mathbf{o}_{m}^{(i)}=\mathbf{P}\mathbf{v}_{m}^{(i)}-\sum_{j=1}^{i}a_{m,j}^{(i)}\mathbf{u}_{m}^{(j)}, (103)
𝐮m(i+1)=g~(i+1)(𝐨m(1),,𝐨m(i),𝐅m),\displaystyle\mathbf{u}_{m}^{(i+1)}=\tilde{g}^{(i+1)}(\mathbf{o}_{m}^{(1)},\cdots,\mathbf{o}_{m}^{(i)},\mathbf{F}_{m}), (104)

where $\mathbf{u}_{m}^{(1)}=\mathbf{0}$, $\mathbf{z}_{m}^{(1)}=\mathbf{P}^{\mathrm{H}}\mathbf{u}_{m}^{(1)}$, and $\mathbf{E}_{m}=\mathbf{h}_{m}$ and $\mathbf{F}_{m}=\mathbf{n}_{m}$ are side information independent of $\mathbf{P}$ with finite second-order moments. Functions $\tilde{f}^{(i)}:\mathbb{C}^{i+1}\rightarrow\mathbb{C}$ and $\tilde{g}^{(i+1)}:\mathbb{C}^{i+1}\rightarrow\mathbb{C}$ are pseudo-Lipschitz. In particular, $\tilde{f}^{(1)}(\mathbf{z}_{m}^{(1)},\mathbf{h}_{m})=\mathbf{h}_{m}$, and for $i=1,2,\cdots$, $\tilde{f}^{(i+1)}(\cdot)$ and $\tilde{g}^{(i+1)}(\cdot)$ are defined as follows:

f~(i+1)(𝐳m(1),,𝐳m(i+1),𝐡m)=f(i)(𝐳m(2)+μ¯m(1)𝐡m,,𝐳m(i+1)+μ¯m(i)𝐡m),\displaystyle\begin{aligned} &\tilde{f}^{(i+1)}(\mathbf{z}_{m}^{(1)},\cdots,\mathbf{z}_{m}^{(i+1)},\mathbf{h}_{m})\\ &=f^{(i)}(\mathbf{z}_{m}^{(2)}+\bar{\mu}_{m}^{(1)}\mathbf{h}_{m},\cdots,\mathbf{z}_{m}^{(i+1)}+\bar{\mu}_{m}^{(i)}\mathbf{h}_{m}),\end{aligned} (105)
g~(i+1)(𝐨m(1),,𝐨m(i),𝐧m)=g(i)(𝐨m(2),,𝐨m(i),𝐨m(1)+𝐧m).\displaystyle\tilde{g}^{(i+1)}(\mathbf{o}_{m}^{(1)},\cdots,\mathbf{o}_{m}^{(i)},\mathbf{n}_{m})=g^{(i)}(\mathbf{o}_{m}^{(2)},\cdots,\mathbf{o}_{m}^{(i)},\mathbf{o}_{m}^{(1)}+\mathbf{n}_{m}). (106)

Then, the coefficients {am,j(i)}j=1i\{a_{m,j}^{(i)}\}_{j=1}^{i} and {bm,j(i)}j=1i1\{b_{m,j}^{(i)}\}_{j=1}^{i-1} are respectively given by the last rows of matrices 𝐌^m,a(i)\hat{\mathbf{M}}^{(i)}_{m,a} and 𝐌^m,b(i)\hat{\mathbf{M}}^{(i)}_{m,b} as follows:

𝐌^m,a(i)=j=0iκ2(j+1)𝚿^m(i)(𝚽^m(i)𝚿^m(i))j,\displaystyle\hat{\mathbf{M}}^{(i)}_{m,a}=\sum_{j=0}^{i}\kappa_{2(j+1)}\hat{\mathbf{\Psi}}_{m}^{(i)}\left(\hat{\mathbf{\Phi}}_{m}^{(i)}\hat{\mathbf{\Psi}}_{m}^{(i)}\right)^{j}, (107)
𝐌^m,b(i)=L(T+1)Nj=0i1κ2(j+1)𝚽^m(i)(𝚿^m(i)𝚽^m(i))j.\displaystyle\hat{\mathbf{M}}^{(i)}_{m,b}=\frac{L}{(T+1)N}\sum_{j=0}^{i-1}\kappa_{2(j+1)}\hat{\mathbf{\Phi}}_{m}^{(i)}\left(\hat{\mathbf{\Psi}}_{m}^{(i)}\hat{\mathbf{\Phi}}_{m}^{(i)}\right)^{j}. (108)

In the above expressions, the auxiliary matrices 𝚿^m(i)\hat{\mathbf{\Psi}}_{m}^{(i)} and 𝚽^m(i)\hat{\mathbf{\Phi}}_{m}^{(i)} are defined as follows:

𝚿^m(i)=[000002𝐯m(2)0002𝐯m(3)3𝐯m(3)002𝐯m(i)3𝐯m(i)i𝐯m(i)],\displaystyle\hat{\mathbf{\Psi}}_{m}^{(i)}=\left[\begin{array}[]{ccccc}0&0&\ldots&0&0\\ 0&\left\langle\partial_{2}\mathbf{v}_{m}^{(2)}\right\rangle&0&\ldots&0\\ 0&\left\langle\partial_{2}\mathbf{v}_{m}^{(3)}\right\rangle&\left\langle\partial_{3}\mathbf{v}_{m}^{(3)}\right\rangle&\ldots&0\\ \vdots&\vdots&\vdots&\ddots&\vdots\\ 0&\left\langle\partial_{2}\mathbf{v}_{m}^{(i)}\right\rangle&\left\langle\partial_{3}\mathbf{v}_{m}^{(i)}\right\rangle&\ldots&\left\langle\partial_{i}\mathbf{v}_{m}^{(i)}\right\rangle\end{array}\right], (114)
$\hat{\mathbf{\Phi}}_{m}^{(i)}=\left[\begin{array}{ccccc}0&0&\ldots&0&0\\ \left\langle\partial_{1}\mathbf{u}_{m}^{(2)}\right\rangle&0&0&\ldots&0\\ \left\langle\partial_{1}\mathbf{u}_{m}^{(3)}\right\rangle&\left\langle\partial_{2}\mathbf{u}_{m}^{(3)}\right\rangle&0&\ldots&0\\ \vdots&\vdots&\ddots&\vdots&\vdots\\ \left\langle\partial_{1}\mathbf{u}_{m}^{(i)}\right\rangle&\left\langle\partial_{2}\mathbf{u}_{m}^{(i)}\right\rangle&\ldots&\left\langle\partial_{i-1}\mathbf{u}_{m}^{(i)}\right\rangle&0\end{array}\right]$, (120)

where k𝐯m(i)\partial_{k}\mathbf{v}_{m}^{(i)} and k𝐮m(i)\partial_{k}\mathbf{u}_{m}^{(i)} denote the row-wise partial derivatives.
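The Onsager coefficient matrices in (107) and (108) can be assembled in the same iterative fashion as before; the following sketch (names are our own) assumes $i\times i$ inputs $\hat{\mathbf{\Psi}}_{m}^{(i)}$ and $\hat{\mathbf{\Phi}}_{m}^{(i)}$ built from the row-wise averaged derivatives:

```python
import numpy as np

def onsager_matrices(kappa, Psi_hat, Phi_hat, L, N, T):
    """Debiasing matrices (107)-(108); the coefficients {a_{m,j}^{(i)}} and
    {b_{m,j}^{(i)}} are read off the last rows. kappa[j] = kappa_{2(j+1)} and
    must contain i+1 entries for i x i inputs."""
    PhiPsi = Phi_hat @ Psi_hat
    PsiPhi = Psi_hat @ Phi_hat
    Ma = np.zeros_like(Psi_hat, dtype=complex)
    Mb = np.zeros_like(Phi_hat, dtype=complex)
    Ta = Psi_hat.astype(complex)            # Psi (Phi Psi)^j, starting at j = 0
    Tb = Phi_hat.astype(complex)            # Phi (Psi Phi)^j, starting at j = 0
    for j in range(len(kappa)):
        Ma += kappa[j] * Ta                 # term of (107), j = 0, ..., i
        if j < len(kappa) - 1:
            Mb += kappa[j] * Tb             # term of (108), j = 0, ..., i-1
        Ta = Ta @ PhiPsi
        Tb = Tb @ PsiPhi
    Mb *= L / ((T + 1) * N)
    return Ma[-1, :], Mb[-1, :]             # last rows give the coefficients
```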

On the other hand, the state evolution of the general recursion is defined as follows:

[Zm(1),,Zm(i)]T𝒞𝒩(𝟎,𝛀~m(i)),\displaystyle\left[Z_{m}^{(1)},\ldots,Z_{m}^{(i)}\right]^{\mathrm{T}}\sim\mathcal{CN}(\mathbf{0},\tilde{\mathbf{\Omega}}_{m}^{(i)}), (121)
Vm(i)=f~(i)(Zm(1),,Zm(i),Hm),\displaystyle V_{m}^{(i)}=\tilde{f}^{(i)}(Z_{m}^{(1)},\ldots,Z_{m}^{(i)},H_{m}), (122)
[Om(1),,Om(i)]T𝒞𝒩(𝟎,𝚺~m(i)),\displaystyle\left[O_{m}^{(1)},\cdots,O_{m}^{(i)}\right]^{\mathrm{T}}\sim\mathcal{CN}(\mathbf{0},\tilde{\mathbf{\Sigma}}_{m}^{(i)}), (123)
Um(i+1)=g~(i+1)(Om(1),,Om(i),Nm).\displaystyle U_{m}^{(i+1)}=\tilde{g}^{(i+1)}(O_{m}^{(1)},\cdots,O_{m}^{(i)},N_{m}). (124)

The covariance matrices 𝚺~m(i+1)\tilde{\mathbf{\Sigma}}_{m}^{(i+1)} and 𝛀~m(i+1)\tilde{\mathbf{\Omega}}_{m}^{(i+1)} in (123) and (121) are computed as

𝚺~m(i+1)=j=02i+1κ¯2(j+1)𝚵~m,j(i+1),\displaystyle\tilde{\mathbf{\Sigma}}_{m}^{(i+1)}=\sum_{j=0}^{2i+1}\bar{\kappa}_{2(j+1)}\tilde{\mathbf{\Xi}}_{m,j}^{(i+1)}, (125)
𝛀~m(i+1)=L(T+1)Nj=02iκ¯2(j+1)𝚯~m,j(i+1),\displaystyle\tilde{\mathbf{\Omega}}_{m}^{(i+1)}=\frac{L}{(T+1)N}\sum_{j=0}^{2i}\bar{\kappa}_{2(j+1)}\tilde{\mathbf{\Theta}}_{m,j}^{(i+1)}, (126)

where 𝚵~m,0(i+1)=𝚪~m(i+1)\tilde{\mathbf{\Xi}}_{m,0}^{(i+1)}=\tilde{\mathbf{\Gamma}}_{m}^{(i+1)}, 𝚯~m,0(i+1)=𝚫~m(i+1)\tilde{\mathbf{\Theta}}_{m,0}^{(i+1)}=\tilde{\mathbf{\Delta}}_{m}^{(i+1)}, and for j1j\geq 1:

𝚵~m,j(i+1)=k=0j(𝚿~m(i+1)𝚽~m(i+1))k𝚪~m(i+1)((𝚿~m(i+1)𝚽~m(i+1))H)jk+k=0j1(𝚿~m(i+1)𝚽~m(i+1))k𝚿~m(i+1)𝚫~m(i+1)(𝚿~m(i+1))H((𝚿~m(i+1)𝚽~m(i+1))H)jk1,\displaystyle\begin{aligned} \tilde{\mathbf{\Xi}}_{m,j}^{(i+1)}&=\sum_{k=0}^{j}\left(\tilde{\mathbf{\Psi}}_{m}^{(i+1)}\tilde{\mathbf{\Phi}}_{m}^{(i+1)}\right)^{k}\tilde{\mathbf{\Gamma}}_{m}^{(i+1)}\left(\left(\tilde{\mathbf{\Psi}}_{m}^{(i+1)}\tilde{\mathbf{\Phi}}_{m}^{(i+1)}\right)^{\mathrm{H}}\right)^{j-k}\\ &+\sum_{k=0}^{j-1}\left(\tilde{\mathbf{\Psi}}_{m}^{(i+1)}\tilde{\mathbf{\Phi}}_{m}^{(i+1)}\right)^{k}\tilde{\mathbf{\Psi}}_{m}^{(i+1)}\tilde{\mathbf{\Delta}}_{m}^{(i+1)}(\tilde{\mathbf{\Psi}}_{m}^{(i+1)})^{\mathrm{H}}\\ &\quad\quad\cdot\left(\left(\tilde{\mathbf{\Psi}}_{m}^{(i+1)}\tilde{\mathbf{\Phi}}_{m}^{(i+1)}\right)^{\mathrm{H}}\right)^{j-k-1},\end{aligned} (127)
𝚯~m,j(i+1)=k=0j(𝚽~m(i+1)𝚿~m(i+1))k𝚫~m(i+1)((𝚽~m(i+1)𝚿~m(i+1))H)jk+k=0j1(𝚽~m(i+1)𝚿~m(i+1))k𝚽~m(i+1)𝚪~m(i+1)(𝚽~m(i+1))H((𝚽~m(i+1)𝚿~m(i+1))H)jk1.\displaystyle\begin{aligned} \tilde{\mathbf{\Theta}}_{m,j}^{(i+1)}&=\sum_{k=0}^{j}\left(\tilde{\mathbf{\Phi}}_{m}^{(i+1)}\tilde{\mathbf{\Psi}}_{m}^{(i+1)}\right)^{k}\tilde{\mathbf{\Delta}}_{m}^{(i+1)}\left(\left(\tilde{\mathbf{\Phi}}_{m}^{(i+1)}\tilde{\mathbf{\Psi}}_{m}^{(i+1)}\right)^{\mathrm{H}}\right)^{j-k}\\ &+\sum_{k=0}^{j-1}\left(\tilde{\mathbf{\Phi}}_{m}^{(i+1)}\tilde{\mathbf{\Psi}}_{m}^{(i+1)}\right)^{k}\tilde{\mathbf{\Phi}}_{m}^{(i+1)}\tilde{\mathbf{\Gamma}}_{m}^{(i+1)}(\tilde{\mathbf{\Phi}}_{m}^{(i+1)})^{\mathrm{H}}\\ &\quad\quad\left(\left(\tilde{\mathbf{\Phi}}_{m}^{(i+1)}\tilde{\mathbf{\Psi}}_{m}^{(i+1)}\right)^{\mathrm{H}}\right)^{j-k-1}.\end{aligned} (128)

In particular, $\tilde{\mathbf{\Psi}}_{m}^{(i+1)}$ and $\tilde{\mathbf{\Phi}}_{m}^{(i+1)}$ are the deterministic versions of $\hat{\mathbf{\Psi}}_{m}^{(i+1)}$ and $\hat{\mathbf{\Phi}}_{m}^{(i+1)}$, given as follows:

$\tilde{\mathbf{\Psi}}_{m}^{(i+1)}=\left[\begin{array}{ccccc}0&0&\ldots&0&0\\ 0&\mathbb{E}[\partial_{2}V_{m}^{(2)}]&0&\ldots&0\\ 0&\mathbb{E}[\partial_{2}V_{m}^{(3)}]&\mathbb{E}[\partial_{3}V_{m}^{(3)}]&\ldots&0\\ \vdots&\vdots&\vdots&\ddots&\vdots\\ 0&\mathbb{E}[\partial_{2}V_{m}^{(i+1)}]&\mathbb{E}[\partial_{3}V_{m}^{(i+1)}]&\ldots&\mathbb{E}[\partial_{i+1}V_{m}^{(i+1)}]\end{array}\right]$, (129)
$\tilde{\mathbf{\Phi}}_{m}^{(i+1)}=\left[\begin{array}{ccccc}0&0&\ldots&0&0\\ \mathbb{E}[\partial_{1}U_{m}^{(2)}]&0&0&\ldots&0\\ \mathbb{E}[\partial_{1}U_{m}^{(3)}]&\mathbb{E}[\partial_{2}U_{m}^{(3)}]&0&\ldots&0\\ \vdots&\vdots&\ddots&\vdots&\vdots\\ \mathbb{E}[\partial_{1}U_{m}^{(i+1)}]&\mathbb{E}[\partial_{2}U_{m}^{(i+1)}]&\ldots&\mathbb{E}[\partial_{i}U_{m}^{(i+1)}]&0\end{array}\right]$. (130)

Besides, 𝚪~m(i+1)\tilde{\mathbf{\Gamma}}_{m}^{(i+1)} and 𝚫~m(i+1)\tilde{\mathbf{\Delta}}_{m}^{(i+1)} are given as follows:

𝚪~m(i+1)=[𝔼[(Vm(1))2]𝔼[Vm(1)Vm(2)]𝔼[Vm(1)Vm(i+1)]𝔼[Vm(1)Vm(2)]𝔼[(Vm(2))2]𝔼[Vm(2)Vm(i+1)]𝔼[Vm(1)Vm(3)]𝔼[Vm(2)Vm(3)]𝔼[Vm(3)Vm(i+1)]𝔼[Vm(1)Vm(i+1)]𝔼[Vm(2)Vm(i+1)]𝔼[(Vm(i+1))2]],\displaystyle\begin{aligned} &\tilde{\mathbf{\Gamma}}_{m}^{(i+1)}=\\ &\left[\begin{array}[]{cccc}\mathbb{E}[(V_{m}^{(1)})^{2}]&\mathbb{E}[V_{m}^{(1)}V_{m}^{(2)}]&\ldots&\mathbb{E}[V_{m}^{(1)}V_{m}^{(i+1)}]\\ \mathbb{E}[V_{m}^{(1)}V_{m}^{(2)}]&\mathbb{E}[(V_{m}^{(2)})^{2}]&\ldots&\mathbb{E}[V_{m}^{(2)}V_{m}^{(i+1)}]\\ \mathbb{E}[V_{m}^{(1)}V_{m}^{(3)}]&\mathbb{E}[V_{m}^{(2)}V_{m}^{(3)}]&\ldots&\mathbb{E}[V_{m}^{(3)}V_{m}^{(i+1)}]\\ \vdots&\vdots&\ddots&\vdots\\ \mathbb{E}[V_{m}^{(1)}V_{m}^{(i+1)}]&\mathbb{E}[V_{m}^{(2)}V_{m}^{(i+1)}]&\ldots&\mathbb{E}[(V_{m}^{(i+1)})^{2}]\end{array}\right],\end{aligned} (131)
$\tilde{\mathbf{\Delta}}_{m}^{(i+1)}=\left[\begin{array}{cccc}\mathbb{E}[(U_{m}^{(1)})^{2}]&\mathbb{E}[U_{m}^{(1)}U_{m}^{(2)}]&\ldots&\mathbb{E}[U_{m}^{(1)}U_{m}^{(i+1)}]\\ \mathbb{E}[U_{m}^{(1)}U_{m}^{(2)}]&\mathbb{E}[(U_{m}^{(2)})^{2}]&\ldots&\mathbb{E}[U_{m}^{(2)}U_{m}^{(i+1)}]\\ \mathbb{E}[U_{m}^{(1)}U_{m}^{(3)}]&\mathbb{E}[U_{m}^{(2)}U_{m}^{(3)}]&\ldots&\mathbb{E}[U_{m}^{(3)}U_{m}^{(i+1)}]\\ \vdots&\vdots&\ddots&\vdots\\ \mathbb{E}[U_{m}^{(1)}U_{m}^{(i+1)}]&\mathbb{E}[U_{m}^{(2)}U_{m}^{(i+1)}]&\ldots&\mathbb{E}[(U_{m}^{(i+1)})^{2}]\end{array}\right]$. (132)

The following theorem reveals the relationship between the general recursion and its state evolution.

Theorem 2.

Consider the general recursion in (101) \sim (104) and its state evolution defined in (121) \sim (124). Let ψ~\tilde{\psi}: 2i+1\mathbb{C}^{2i+1}\rightarrow\mathbb{C} and ϕ~\tilde{\phi}: 2i+2\mathbb{C}^{2i+2}\rightarrow\mathbb{C} be any pseudo-Lipschitz functions of order 2. For i=1,2,i=1,2,\cdots, it is almost sure that

lim(T+1)Nt=1(T+1)Nψ~(zt,m(2),,zt,m(i+1),vt,m(2),,vt,m(i+1),ht,m)(T+1)N=𝔼[ψ~(Zm(2),,Zm(i+1),Vm(2),,Vm(i+1),Hm)],\displaystyle\begin{aligned} \lim_{(T+1)N\rightarrow\infty}&\frac{\sum_{t=1}^{(T+1)N}\tilde{\psi}\left(z_{t,m}^{(2)},\ldots,z_{t,m}^{(i+1)},v_{t,m}^{(2)},\ldots,v_{t,m}^{(i+1)},h_{t,m}\right)}{(T+1)N}\\ &=\mathbb{E}\left[\tilde{\psi}\left(Z_{m}^{(2)},\ldots,Z_{m}^{(i+1)},V_{m}^{(2)},\ldots,V_{m}^{(i+1)},H_{m}\right)\right],\end{aligned} (133)
limL1Ll=1Lϕ~(ol,m(1),,ol,m(i),ul,m(1),,ul,m(i+1),nl,m)=𝔼[ϕ~(Om(1),,Om(i),Um(1),,Um(i+1),Nm)].\displaystyle\begin{aligned} \lim_{L\rightarrow\infty}&\frac{1}{L}\sum_{l=1}^{L}\tilde{\phi}\left(o_{l,m}^{(1)},\ldots,o_{l,m}^{(i)},u_{l,m}^{(1)},\ldots,u_{l,m}^{(i+1)},n_{l,m}\right)\\ &=\mathbb{E}\left[\tilde{\phi}\left(O_{m}^{(1)},\ldots,O_{m}^{(i)},U_{m}^{(1)},\ldots,U_{m}^{(i+1)},N_{m}\right)\right].\end{aligned} (134)

In other words, as N,LN,L\rightarrow\infty, the following results hold almost surely:

$\left(\mathbf{z}_{m}^{(2)},\cdots,\mathbf{z}_{m}^{(i+1)},\mathbf{v}_{m}^{(2)},\cdots,\mathbf{v}_{m}^{(i+1)},\mathbf{h}_{m}\right)\stackrel{W_{2}}{\longrightarrow}\left(Z_{m}^{(2)},\ldots,Z_{m}^{(i+1)},V_{m}^{(2)},\ldots,V_{m}^{(i+1)},H_{m}\right)$, (135)
(𝐨m(1),,𝐨m(i),𝐮m(1),,𝐮m(i+1),𝐧m)W2(Om(1),,Om(i),Um(1),,Um(i+1),Nm).\displaystyle\begin{aligned} &\left(\mathbf{o}_{m}^{(1)},\cdots,\mathbf{o}_{m}^{(i)},\mathbf{u}_{m}^{(1)},\cdots,\mathbf{u}_{m}^{(i+1)},\mathbf{n}_{m}\right)\\ &\stackrel{{\scriptstyle W_{2}}}{{\longrightarrow}}\left(O_{m}^{(1)},\ldots,O_{m}^{(i)},U_{m}^{(1)},\ldots,U_{m}^{(i+1)},N_{m}\right).\end{aligned} (136)
Proof.

The proof follows from Theorem 5.3 in [29]. ∎

Theorem 2 shows that the joint empirical distribution of (𝐳m(1),,𝐳m(i))(\mathbf{z}_{m}^{(1)},\ldots,\mathbf{z}_{m}^{(i)}) converges to an ii-dimensional complex Gaussian distribution 𝒞𝒩(𝟎,𝛀~m(i))\mathcal{CN}(\mathbf{0},\tilde{\mathbf{\Omega}}_{m}^{(i)}), and the joint empirical distribution of (𝐨m(1),,𝐨m(i))(\mathbf{o}_{m}^{(1)},\ldots,\mathbf{o}_{m}^{(i)}) converges to an ii-dimensional complex Gaussian distribution 𝒞𝒩(𝟎,𝚺~m(i))\mathcal{CN}(\mathbf{0},\tilde{\mathbf{\Sigma}}_{m}^{(i)}). Based on this theorem, we next claim that the state evolution of the general recursion and the state evolution of the FPAMP-based algorithm are equivalent.

Lemma 1.

For i=1,2,i=1,2,\cdots, we have 𝚺~m(i)=𝚺¯m(i)\tilde{\mathbf{\Sigma}}_{m}^{(i)}=\bar{\mathbf{\Sigma}}_{m}^{(i)} and 𝛀~m(i+1)=𝛀ˇm(i+1)\tilde{\mathbf{\Omega}}_{m}^{(i+1)}=\check{\mathbf{\Omega}}_{m}^{(i+1)}, where 𝛀¯m(i)\bar{\mathbf{\Omega}}_{m}^{(i)} is the lower right i×ii\times i submatrix of 𝛀ˇm(i+1)\check{\mathbf{\Omega}}_{m}^{(i+1)}.

Proof.

We use induction to prove this lemma. In particular, for i=1i=1, Om(1)𝒞𝒩(0,𝚺~m(1))O_{m}^{(1)}\sim\mathcal{CN}(0,\tilde{\mathbf{\Sigma}}_{m}^{(1)}) with 𝚺~m(1)=κ¯2𝔼[Hm2]=𝚺¯m(1)\tilde{\mathbf{\Sigma}}_{m}^{(1)}=\bar{\kappa}_{2}\mathbb{E}[H_{m}^{2}]=\bar{\mathbf{\Sigma}}_{m}^{(1)}. Then, matrix 𝛀~m(2)\tilde{\mathbf{\Omega}}_{m}^{(2)} can be computed as

𝛀~m(2)=[000L(κ¯2𝔼[(g(1))2]+κ¯4𝔼[(Vm(1))2](𝔼[Om(1)g(1)])2)(T+1)N].\displaystyle\tilde{\mathbf{\Omega}}_{m}^{(2)}=\left[\begin{array}[]{cc}0&0\\ 0&\frac{L\left(\bar{\kappa}_{2}\mathbb{E}[(g^{(1)})^{2}]+\bar{\kappa}_{4}\mathbb{E}[(V_{m}^{(1)})^{2}](\mathbb{E}[\partial_{O_{m}^{(1)}}g^{(1)}])^{2}\right)}{(T+1)N}\end{array}\right]. (139)

Since $V_{m}^{(1)}$ and $H_{m}$ (respectively, $O_{m}^{(1)}$ and $G_{m}$) follow the same distribution, we have $\tilde{\mathbf{\Omega}}_{m}^{(2)}=\check{\mathbf{\Omega}}_{m}^{(2)}$.

Suppose $\tilde{\mathbf{\Sigma}}_{m}^{(i)}=\bar{\mathbf{\Sigma}}_{m}^{(i)}$ and $\tilde{\mathbf{\Omega}}_{m}^{(i+1)}=\check{\mathbf{\Omega}}_{m}^{(i+1)}$ for $i\geq 1$. Also, based on the state evolution of the general recursion, we have $(O_{m}^{(1)},\cdots,O_{m}^{(i)})\sim\mathcal{CN}(\mathbf{0},\tilde{\mathbf{\Sigma}}_{m}^{(i)})$ and $(Z_{m}^{(1)},\cdots,Z_{m}^{(i+1)})\sim\mathcal{CN}(\mathbf{0},\tilde{\mathbf{\Omega}}_{m}^{(i+1)})$. By utilizing the induction hypothesis $\tilde{\mathbf{\Sigma}}_{m}^{(i)}=\bar{\mathbf{\Sigma}}_{m}^{(i)}$ and $\tilde{\mathbf{\Omega}}_{m}^{(i+1)}=\check{\mathbf{\Omega}}_{m}^{(i+1)}$ in the definitions of $\tilde{\mathbf{\Psi}}_{m}^{(i+1)}$, $\tilde{\mathbf{\Phi}}_{m}^{(i+1)}$, $\tilde{\mathbf{\Gamma}}_{m}^{(i+1)}$, and $\tilde{\mathbf{\Delta}}_{m}^{(i+1)}$ in (129) $\sim$ (132), we obtain $\tilde{\mathbf{\Psi}}_{m}^{(i+1)}=\bar{\mathbf{\Psi}}_{m}^{(i+1)}$, $\tilde{\mathbf{\Phi}}_{m}^{(i+1)}=\bar{\mathbf{\Phi}}_{m}^{(i+1)}$, $\tilde{\mathbf{\Gamma}}_{m}^{(i+1)}=\bar{\mathbf{\Gamma}}_{m}^{(i+1)}$, and $\tilde{\mathbf{\Delta}}_{m}^{(i+1)}=\bar{\mathbf{\Delta}}_{m}^{(i+1)}$. As a result, by comparing (76) and (125), we have $\tilde{\mathbf{\Sigma}}_{m}^{(i+1)}=\bar{\mathbf{\Sigma}}_{m}^{(i+1)}$. This implies that $\tilde{\mathbf{\Phi}}_{m}^{(i+2)}=\bar{\mathbf{\Phi}}_{m}^{(i+2)}$ and $\tilde{\mathbf{\Delta}}_{m}^{(i+2)}=\bar{\mathbf{\Delta}}_{m}^{(i+2)}$. Hence, according to (79) and (128), we have $\tilde{\mathbf{\Theta}}_{m,j}^{(i+2)}=\mathbf{\Theta}_{m,j}^{(i+2)}$, and thus $\tilde{\mathbf{\Omega}}_{m}^{(i+2)}=\check{\mathbf{\Omega}}_{m}^{(i+2)}$, which completes the proof. ∎

Therefore, (Zm(2),,Zm(i+1))(Z_{m}^{(2)},\cdots,Z_{m}^{(i+1)}) and (Hm(1)μ¯m(1)Hm,,(H_{m}^{(1)}-\bar{\mu}_{m}^{(1)}H_{m},\cdots, Hm(i)μ¯m(i)Hm)H_{m}^{(i)}-\bar{\mu}_{m}^{(i)}H_{m}) have the same distribution, which also applies to (Om(1),,Om(i))(O_{m}^{(1)},\cdots,O_{m}^{(i)}) and (Gm,Rm(1),,Rm(i1))(G_{m},R_{m}^{(1)},\cdots,R_{m}^{(i-1)}). In order to prove Theorem 1, it remains to show in the following lemma that the FPAMP iterations are close to the general recursion.

Lemma 2.

Let ψ\psi: 2i+1\mathbb{C}^{2i+1}\rightarrow\mathbb{C} and ϕ\phi: 2i+2\mathbb{C}^{2i+2}\rightarrow\mathbb{C} be any pseudo-Lipschitz functions of order 2. For i=1,2,i=1,2,\cdots, it is almost sure that

lim(T+1)N1(T+1)N|t=1(T+1)Nψ(ht,m(1),,ht,m(i),h~t,m(1),,h~t,m(i),ht,m)t=1(T+1)Nψ(zt,m(2)+μ¯m(1)ht,m,,zt,m(i+1)+μ¯m(i)ht,m,vt,m(2),,vt,m(i+1),ht,m)|=0,\displaystyle\begin{aligned} &\lim_{(T+1)N\rightarrow\infty}\frac{1}{(T+1)N}\\ &\Bigg{|}\sum_{t=1}^{(T+1)N}\psi\left(h_{t,m}^{(1)},\ldots,h_{t,m}^{(i)},\tilde{h}_{t,m}^{(1)},\ldots,\tilde{h}_{t,m}^{(i)},h_{t,m}\right)\\ &-\sum_{t=1}^{(T+1)N}\psi\Big{(}z_{t,m}^{(2)}+\bar{\mu}_{m}^{(1)}h_{t,m},\ldots,z_{t,m}^{(i+1)}+\bar{\mu}_{m}^{(i)}h_{t,m},\\ &\quad\quad\quad\quad\quad v_{t,m}^{(2)},\ldots,v_{t,m}^{(i+1)},h_{t,m}\Big{)}\Bigg{|}=0,\end{aligned} (140)
limL1L|l=1Lϕ(rl,m(1),,rl,m(i),sl,m(1),,sl,m(i+1),yl,m)l=1Lϕ(ol,m(2),,ol,m(i+1),ul,m(2),,ul,m(i+2),ol,m(1)+nl,m)|=0.\displaystyle\begin{aligned} &\lim_{L\rightarrow\infty}\frac{1}{L}\Bigg{|}\sum_{l=1}^{L}\phi\left(r_{l,m}^{(1)},\ldots,r_{l,m}^{(i)},s_{l,m}^{(1)},\ldots,s_{l,m}^{(i+1)},y_{l,m}\right)\\ &-\sum_{l=1}^{L}\phi\left(o_{l,m}^{(2)},\ldots,o_{l,m}^{(i+1)},u_{l,m}^{(2)},\ldots,u_{l,m}^{(i+2)},o_{l,m}^{(1)}+n_{l,m}\right)\Bigg{|}=0.\end{aligned} (141)
Proof.

Since $\psi(\cdot)$ is a pseudo-Lipschitz function, applying the Cauchy-Schwarz inequality yields the following results:

1(T+1)N|t=1(T+1)Nψ(ht,m(1),,ht,m(i),h~t,m(1),,h~t,m(i),ht,m)t=1(T+1)Nψ(zt,m(2)+μ¯m(1)ht,m,,zt,m(i+1)+μ¯m(i)ht,m,vt,m(2),,vt,m(i+1),ht,m)|C0(T+1)Nt=1(T+1)N(1+|ht,m|+j=1i(|ht,m(j)|+|h~t,m(j)|+|zt,m(j+1)+μ¯m(j)ht,m|+|vt,m(j+1)|))×(j=1i(|ht,m(j)zt,m(j+1)μ¯m(j)ht,m|2+|h~t,m(j)vt,m(j+1)|2))12C0(4i+2)[1+𝐡m2(T+1)N+j=1i𝐡m(j)2+𝐡~m(j)2+𝐳m(j+1)+μ¯m(j)𝐡m2+𝐯m(j+1)2(T+1)N]12×(j=1i𝐡m(j)𝐳m(j+1)μ¯m(j)𝐡m2+𝐡~m(j)𝐯m(j+1)2(T+1)N)12,\displaystyle\begin{aligned} &\frac{1}{(T+1)N}\Bigg{|}\sum_{t=1}^{(T+1)N}\psi\left(h_{t,m}^{(1)},\ldots,h_{t,m}^{(i)},\tilde{h}_{t,m}^{(1)},\ldots,\tilde{h}_{t,m}^{(i)},h_{t,m}\right)\\ &-\sum_{t=1}^{(T+1)N}\psi\Big{(}z_{t,m}^{(2)}+\bar{\mu}_{m}^{(1)}h_{t,m},\ldots,z_{t,m}^{(i+1)}+\bar{\mu}_{m}^{(i)}h_{t,m},\\ &\quad\quad\quad\quad\quad v_{t,m}^{(2)},\ldots,v_{t,m}^{(i+1)},h_{t,m}\Big{)}\Bigg{|}\\ &\leq\frac{C_{0}}{(T+1)N}\sum_{t=1}^{(T+1)N}\Bigg{(}1+|h_{t,m}|+\sum_{j=1}^{i}\Big{(}|h_{t,m}^{(j)}|+|\tilde{h}_{t,m}^{(j)}|+|z_{t,m}^{(j+1)}\\ &\quad\quad\quad\quad\quad\quad\quad\quad\quad+\bar{\mu}_{m}^{(j)}h_{t,m}|+|v_{t,m}^{(j+1)}|\Big{)}\Bigg{)}\\ &\quad\times\left(\sum_{j=1}^{i}\left(|h_{t,m}^{(j)}-z_{t,m}^{(j+1)}-\bar{\mu}_{m}^{(j)}h_{t,m}|^{2}+|\tilde{h}_{t,m}^{(j)}-v_{t,m}^{(j+1)}|^{2}\right)\right)^{\frac{1}{2}}\\ &\leq C_{0}(4i+2)\Bigg{[}1+\frac{||\mathbf{h}_{m}||^{2}}{(T+1)N}\\ &\ +\sum_{j=1}^{i}\frac{||\mathbf{h}_{m}^{(j)}||^{2}+||\tilde{\mathbf{h}}_{m}^{(j)}||^{2}+||\mathbf{z}_{m}^{(j+1)}+\bar{\mu}_{m}^{(j)}\mathbf{h}_{m}||^{2}+||\mathbf{v}_{m}^{(j+1)}||^{2}}{(T+1)N}\Bigg{]}^{\frac{1}{2}}\\ &\quad\times\left(\sum_{j=1}^{i}\frac{||\mathbf{h}_{m}^{(j)}-\mathbf{z}_{m}^{(j+1)}-\bar{\mu}_{m}^{(j)}\mathbf{h}_{m}||^{2}+||\tilde{\mathbf{h}}_{m}^{(j)}-\mathbf{v}_{m}^{(j+1)}||^{2}}{(T+1)N}\right)^{\frac{1}{2}},\end{aligned} (142)

where C0C_{0} denotes a generic positive constant. Similarly, we have

1L|l=1Lϕ(rl,m(1),,rl,m(i),sl,m(1),,sl,m(i+1),yl,m)l=1Lϕ(ol,m(2),,ol,m(i+1),ul,m(2),,ul,m(i+2),ol,m(1)+nl,m)|C0(4i+5)[1+𝐲m2+𝐨m(1)+𝐧m2L+j=1i𝐫m(j)2+𝐨m(j+1)2L+j=1i+1𝐬m(j)2+𝐮m(j+1)2L]12×(𝐬m(i+1)𝐮m(i+2)2+𝐲m𝐨m(1)𝐧m2L+j=1i𝐫m(j)𝐨m(j+1)2+𝐬m(j)𝐮m(j+1)2L)12.\displaystyle\begin{aligned} &\frac{1}{L}\Bigg{|}\sum_{l=1}^{L}\phi\left(r_{l,m}^{(1)},\ldots,r_{l,m}^{(i)},s_{l,m}^{(1)},\ldots,s_{l,m}^{(i+1)},y_{l,m}\right)\\ &-\sum_{l=1}^{L}\phi\left(o_{l,m}^{(2)},\ldots,o_{l,m}^{(i+1)},u_{l,m}^{(2)},\ldots,u_{l,m}^{(i+2)},o_{l,m}^{(1)}+n_{l,m}\right)\Bigg{|}\\ &\leq C_{0}(4i+5)\Bigg{[}1+\frac{||\mathbf{y}_{m}||^{2}+||\mathbf{o}_{m}^{(1)}+\mathbf{n}_{m}||^{2}}{L}\\ &+\sum_{j=1}^{i}\frac{||\mathbf{r}_{m}^{(j)}||^{2}+||\mathbf{o}_{m}^{(j+1)}||^{2}}{L}+\sum_{j=1}^{i+1}\frac{||\mathbf{s}_{m}^{(j)}||^{2}+||\mathbf{u}_{m}^{(j+1)}||^{2}}{L}\Bigg{]}^{\frac{1}{2}}\\ &\times\Bigg{(}\frac{||\mathbf{s}_{m}^{(i+1)}-\mathbf{u}_{m}^{(i+2)}||^{2}+||\mathbf{y}_{m}-\mathbf{o}_{m}^{(1)}-\mathbf{n}_{m}||^{2}}{L}\\ &+\sum_{j=1}^{i}\frac{||\mathbf{r}_{m}^{(j)}-\mathbf{o}_{m}^{(j+1)}||^{2}+||\mathbf{s}_{m}^{(j)}-\mathbf{u}_{m}^{(j+1)}||^{2}}{L}\Bigg{)}^{\frac{1}{2}}.\end{aligned} (143)

To prove this lemma, we just need to show that, as $N,L\rightarrow\infty$, the terms inside the brackets of (142) and (143) converge to finite and deterministic limits, while the terms in the last lines of (142) and (143) converge to zero. This can be accomplished via mathematical induction following Appendix D.4 in [31].

Let

ψ~(zt,m(2),,zt,m(i+1),vt,m(2),,vt,m(i+1),ht,m)=ψ(zt,m(2)+μ¯m(1)ht,m,,zt,m(i+1)+μ¯m(i)ht,m,vt,m(2),,vt,m(i+1),ht,m).\displaystyle\begin{aligned} &\tilde{\psi}\Big{(}z_{t,m}^{(2)},\ldots,z_{t,m}^{(i+1)},v_{t,m}^{(2)},\ldots,v_{t,m}^{(i+1)},h_{t,m}\Big{)}\\ &=\psi\Big{(}z_{t,m}^{(2)}+\bar{\mu}_{m}^{(1)}h_{t,m},\ldots,z_{t,m}^{(i+1)}+\bar{\mu}_{m}^{(i)}h_{t,m},\\ &\quad\quad v_{t,m}^{(2)},\ldots,v_{t,m}^{(i+1)},h_{t,m}\Big{)}.\end{aligned} (144)

It is almost sure that

lim(T+1)N1(T+1)Nt=1(T+1)Nψ(zt,m(2)+μ¯m(1)ht,m,,zt,m(i+1)+μ¯m(i)ht,m,vt,m(2),,vt,m(i+1),ht,m)=(a)𝔼[ψ(Zm(2)+μ¯m(1)Hm,,Zm(i+1)+μ¯m(i)Hm,Vm(2),,Vm(i+1),Hm)]=(b)𝔼[ψ(Hm(1),,Hm(i),H~m(1),,H~m(i),Hm)],\displaystyle\begin{aligned} &\lim_{(T+1)N\rightarrow\infty}\frac{1}{(T+1)N}\\ &\sum_{t=1}^{(T+1)N}\psi\Big{(}z_{t,m}^{(2)}+\bar{\mu}_{m}^{(1)}h_{t,m},\ldots,z_{t,m}^{(i+1)}+\bar{\mu}_{m}^{(i)}h_{t,m},\\ &\quad\quad\quad\quad v_{t,m}^{(2)},\ldots,v_{t,m}^{(i+1)},h_{t,m}\Big{)}\\ &\overset{(\text{a})}{=}\mathbb{E}\Big{[}\psi\Big{(}Z_{m}^{(2)}+\bar{\mu}_{m}^{(1)}H_{m},\ldots,Z_{m}^{(i+1)}+\bar{\mu}_{m}^{(i)}H_{m},\\ &\quad\quad\quad\quad V_{m}^{(2)},\ldots,V_{m}^{(i+1)},H_{m}\Big{)}\Big{]}\\ &\overset{(\text{b})}{=}\mathbb{E}\left[\psi\left(H_{m}^{(1)},\ldots,H_{m}^{(i)},\tilde{H}_{m}^{(1)},\ldots,\tilde{H}_{m}^{(i)},H_{m}\right)\right],\end{aligned} (145)

where (a) is obtained via (133) in Theorem 2, and (b) is based on Lemma 1, (51), and (122). Therefore, with (144) and (140) of Lemma 2, we obtain (46) of Theorem 1. Meanwhile, (47) can be derived similarly, which completes the proof of Theorem 1.

References

  • [1] X. Bian, Y. Mao, and J. Zhang, “Joint activity-delay detection and channel estimation for asynchronous massive random access,” in Proc. IEEE Global Commun. Conf. (GLOBECOM), Kuala Lumpur, Malaysia, Dec. 2023.
  • [2] C. Bockelmann et al., “Massive machine-type communications in 5G: Physical and MAC-layer solutions,” IEEE Commun. Mag., vol. 54, no. 9, pp. 59–65, Sep. 2016.
  • [3] Y. Shi, J. Dong, and J. Zhang, Low-overhead Communications in IoT Networks - Structured Signal Processing Approaches, Springer, 2020.
  • [4] M. T. Islam, A. E. M. Taha, and S. Akl, “A survey of access management techniques in machine type communications,” IEEE Commun. Mag., vol. 52, no. 4, pp. 74–81, Apr. 2014.
  • [5] P. Schulz et al., “Latency critical IoT applications in 5G: Perspective on the design of radio interface and network architecture,” IEEE Commun. Mag., vol. 55, no. 2, pp. 70-78, Feb. 2017.
  • [6] X. Chen, D. W. K. Ng, W. Yu, E. G. Larsson, N. Al-Dhahir, and R. Schober, “Massive access for 5G and beyond,” IEEE J. Sel. Areas Commun., vol. 39, no. 3, pp. 615–637, Mar. 2021.
  • [7] E. Björnson, E. Carvalho, J. H. Sørensen, E. G. Larsson, and P. Popovski, “A random access protocol for pilot allocation in crowded massive MIMO systems,” IEEE Trans. Wireless Commun., vol. 16, no. 4, pp. 2220-2234, Apr. 2017.
  • [8] J. Choi, J. Ding, N. Le, and Z. Ding, “Grant-free random access in machine-type communication: Approaches and challenges,” IEEE Wireless Commun., vol. 29, no. 1, pp. 151-158, Feb. 2022.
  • [9] Y. Wu et al., “Massive access for future wireless communications,” IEEE Wireless Commun., vol. 27, no. 4, pp. 148-156, Aug. 2020.
  • [10] Z. Chen, F. Sohrabi, Y.-F. Liu, and W. Yu, “Covariance based joint activity and data detection for massive access with massive MIMO,” in Proc. IEEE Int. Conf. Commun. (ICC), Shanghai, China, May 2019.
  • [11] S. Haghighatshoar, P. Jung, and G. Caire, “Improved scaling law for activity detection in massive MIMO systems,” in Proc. IEEE Int. Symp. Inf. Theory (ISIT), Vail, CO, USA, Jun. 2018.
  • [12] Z. Chen, F. Sohrabi, and W. Yu, “Sparse activity detection for massive connectivity,” IEEE Trans. Signal Process., vol. 66, no. 7, pp. 1890–1904, Apr. 2018.
  • [13] M. Ke, Z. Gao, Y. Wu, X. Gao, and R. Schober, “Compressive sensing based adaptive active user detection and channel estimation: Massive access meets massive MIMO," IEEE Trans. Signal Process., vol. 68, pp. 764–779, 2020.
  • [14] X. Bian, Y. Mao, and J. Zhang, “Joint activity detection, channel estimation, and data decoding for grant-free massive random access,” IEEE Internet Things J., vol. 10, no. 16, pp. 14042-14057, Aug. 2023.
  • [15] X. Bian, Y. Mao, and J. Zhang, “Grant-free massive random access with retransmission: Receiver optimization and performance analysis,” IEEE Trans. Commun., vol. 72, no. 2, pp. 786-800, Feb. 2024.
  • [16] Y. Qiang, X. Shao and X. Chen, “A model-driven deep learning algorithm for joint activity detection and channel estimation,” IEEE Commun. Lett., vol. 24, no. 11, pp. 2508-2512, Nov. 2020.
  • [17] Y. Cui, S. Li and W. Zhang, “Jointly sparse signal recovery and support recovery via deep learning with applications in MIMO-based grant-free random access,” IEEE J. Sel. Areas Commun., vol. 39, no. 3, pp. 788-803, Mar. 2021.
  • [18] H. Zhang et al., “Asynchronous interference mitigation in cooperative base station systems,” IEEE Wireless Commun., vol. 7, no. 1, pp. 155-165, Jan. 2008.
  • [19] L. Liu, and Y. Liu, “An efficient algorithm for device detection and channel estimation in asynchronous IoT systems,” in Proc. IEEE Int. Conf. Acoustics, Speech Signal Process. (ICASSP), Toronto, ON, Canada, Jun. 2021.
  • [20] Z. Wang, Y. Liu, and L. Liu, “Covariance-based joint device activity and delay detection in asynchronous mMTC,” IEEE Signal Process. Lett., vol. 29, pp. 538-542, Jan. 2022.
  • [21] Y. Guo, Z. Liu, and Y. Sun, “Low-complexity joint activity detection and channel estimation with partially orthogonal pilot for asynchronous massive access,” IEEE Internet Things J., vol. 11, no. 1, pp. 1773-1783, Jan. 2024.
  • [22] M. Qiu, K. Cao, D. Cai, Z. Dong and Y. Cui, “Low-complexity joint estimation for asynchronous massive internet of things: An ADMM approach,” in Proc. Int. Conf. Intell. Techn. Embedded Syst. (ICITES), Chengdu, China, Sept. 2022.
  • [23] W. Zhu, M. Tao, X. Yuan, and Y. Guan, “Deep-learned approximate message passing for asynchronous massive connectivity,” IEEE Trans. Wireless Commun., vol. 20, no. 8, pp. 5434-5448, Mar. 2021.
  • [24] J. Ma and L. Ping, “Orthogonal AMP,” IEEE Access, vol. 5, pp. 2020–2033, 2017.
  • [25] Y. Mei et al., “Compressive sensing-based joint activity and data detection for grant-free massive IoT access,” IEEE Trans. Wireless Commun., vol. 21, no. 3, pp. 1851-1869, Mar. 2022.
  • [26] R. Speicher, “Lecture notes on free probability theory,” arXiv: 1908.08125, 2019. https://arxiv.org/pdf/1908.08125.
  • [27] D. Voiculescu, “Addition of certain noncommuting random variables,” J. Funct. Anal., vol. 66, no. 3, pp. 323-346, May 1986.
  • [28] M. Opper, B. Cakmak, and O. Winther, “A theory of solving TAP equations for Ising models with general invariant random matrices,” J. Phys. A: Math. Theor., vol. 49, no. 11, p. 114002, Feb. 2016.
  • [29] Z. Fan, “Approximate message passing algorithms for rotationally invariant matrices,” Ann. Stats., vol. 50, no. 1, pp. 197-224, 2022.
  • [30] F. Benaych-Georges, “Rectangular random matrices, related convolution,” Probab. Theory Relat. Fields, vol. 144, pp. 471–515, Apr. 2009.
  • [31] R. Venkataramanan, K. Kögler, and M. Mondelli, “Estimation in rotationally invariant generalized linear models via approximate message passing,” in Proc. Int. Conf. Mach. Learn. (ICML), Baltimore MD, USA, Jul. 2022.
  • [32] C. Villani. Optimal Transport: Old and New. Springer, 2009.
  • [33] O. Y. Feng, R. Venkataramanan, C. Rush, and R. J. Samworth, “A unifying tutorial on approximate message passing,” Fdn. Trends® Mach. Learn., vol. 15, no. 4, pp. 335–536, 2022.