LEO Satellite-Enabled Grant-Free Random Access with MIMO-OTFS

Boxiao Shen, Yongpeng Wu, Wenjun Zhang, Geoffrey Ye Li, Jianping An, and Chengwen Xing The work of Y. Wu is supported in part by the National Key R&D Program of China under Grant 2018YFB1801102, National Science Foundation (NSFC) under Grant 62122052 and 62071289, 111 project BP0719010, and STCSM 18DZ2270700. B. Shen, Y. Wu, and W. Zhang are with the Department of Electronic Engineering, Shanghai Jiao Tong University, Shanghai 200240, China (e-mail: [email protected]; [email protected]; [email protected]). G. Y. Li is with the Department of Electrical and Electronic Engineering, Imperial College London, London SW7 2AZ, UK (Email: [email protected]).J. An and C. Xing are with the School of Information and Electronics, Beijing Institute of Technology, Beijing 100081, China (e-mails: [email protected]; [email protected]).

Abstract

This paper investigates joint channel estimation and device activity detection in the LEO satellite-enabled grant-free random access systems with large differential delay and Doppler shift. In addition, the multiple-input multiple-output (MIMO) with orthogonal time-frequency space modulation (OTFS) is utilized to combat the dynamics of the terrestrial-satellite link. To simplify the computation process, we estimate the channel tensor in parallel along the delay dimension. Then, the deep learning and expectation-maximization approach are integrated into the generalized approximate message passing with cross-correlation-based Gaussian prior to capture the channel sparsity in the delay-Doppler-angle domain and learn the hyperparameters. Finally, active devices are detected by computing energy of the estimated channel. Simulation results demonstrate that the proposed algorithms outperform conventional methods.

Index Terms:

Random access, OTFS, satellite communications, message passing, Doppler shift

I Introduction

Internet-of-Things (IoT) are important application scenarios in the 5G and future networks. However, in many situations, a large number of IoT devices are distributed in remote areas, such as deserts, oceans, forests, etc[1]. The existing cellular communication networks are mainly designed and deployed in places where populations are concentrated, which makes it is difficult to support remote IoT devices. Fortunately, low-earth orbit (LEO) satellites are with low propagation delay, low path loss, and flexible elevation angle and could provide a very promising solution in these situations[2]. Therefore, we are investigating the LEO satellite-enabled grant-free random access (RA) for IoT systems.

Over the past few years, many methods have been proposed for joint channel estimation and device activity detection in the terrestrial grant-free RA systems. For example, the sparse Bayesian learning (SBL) has been adopted in [3] to solve this problem. In [4], the generalized multiple measurement vector approximate message passing (GMMV-AMP) has been proposed to explore the sparsity in the angular domain, which can further improve the estimation performance. The above works have been designed for the block fading channel, which is constant during a transmission slot. However, the high mobility of LEO satellites inevitably leads to the rapid change of terrestrial-satellite link (TSL) and the large Doppler shift, which degrades the performance of the current algorithms. Besides, the long delay effect should also be involved in the design of satellite communication systems. Therefore, the current grant-free random access schemes are hard to apply to LEO satellite communications straightforwardly.

Orthogonal time-frequency space (OTFS) which directly operates in the delay-Doppler domain is a promising solution to tackle the above issues. In [5] and [6], the input-output relationship of OTFS with massive multiple-input multiple-output (MIMO) has been derived and the 3D-structured sparsity of the channel in the delay-Doppler-angle (DDA) domain has been identified, which enables the development of the 3D-structured orthogonal matching pursuit channel estimation algorithm. In [7] and [8], two grant-free RA schemes with MIMO-OTFS have been proposed for the LEO satellite communications, where the device precompensates the delay and/or Doppler shift before the RA process. However, the precompensation incurs extra complexity, which will decrease the battery life of these remote IoT devices.

In this paper, we investigate the joint channel estimation and device activity detection in the LEO satellite-enabled grant-free RA, where OTFS is adopted to address the doubly dispersive effect in the TSL link and MIMO is integrated to improve the performance. Different from the existing literature, we consider the case that the differential delay is more than one symbol duration and/or the differential Doppler shift is more than one subcarrier spacing. To reduce the complexity of the IoT devices, we handle the large differential delay and Doppler shift at the satellite end. In this scenario, channel at each antenna can be expressed as a 3D tensor, which is complicated to handle directly. Hence, we estimate channel along delay dimension so that the 3D tensor can be decomposed into a set of matrices and computations can perform in parallel. Notice that the channel tensor in each delay dimension has the 2D burst block sparsity due to the adoption of MIMO-OTFS. To capture this 2D sparsity, we propose a deep learning-based generalized approximate message passing (DL-GAMP) algorithm with cross-correlation-based Gaussian prior, where the GAMP estimates the channel tensor, and the deep learning and expectation-maximization (EM) rule are adopted to learn the hyperparameters.

II System Model

In this section, the basic system setup is first introduced. Then, we present the adopted channel model and input-output relationship of the MIMO-OTFS system when both the large differential delay and Doppler shift exist. Finally, we formulate the considered problem.

II-A System Setup

We consider a LEO satellite-enabled grant-free random access system with MIMO-OTFS, where $U$ single-antenna devices adopting OTFS modulation intend to communicate with a LEO satellite. The satellite is equipped with $N_{a}=N_{y}\times N_{z}$ uniform planar array (UPA) antennas and works with the regenerative payload, which allows it for on-board processing of baseband signals. In a given time interval, only $U_{a}$ devices are active and transmit signal to satellite utilizing the same time-frequency resources. We denote $\lambda_{u}$ as the activity indicator of the $u$ -th device, with $\lambda_{u}=1$ if active and $\lambda_{u}=0$ otherwise. We follow the recommendations of 3GPP[9] and assume that the satellite precompensates Doppler shift for the center of the beam and broadcasts a common delay to all devices. The residual delay and Doppler shift (called differential delay and Doppler shift in 3GPP terminology) are handled by satellite in the uplink transmissions.

II-B Channel Model and Signal Modulation

II-B1 Channel Model

We consider the following channel response corresponding to the $u$ -th device in the delay-Doppler-space domain, i.e.,

\displaystyle\mathbf{h}_{u}(\tau,\nu)=\sum_{i=0}^{P}h_{u,i}\delta\left(\tau-\tau_{u,i}\right)\delta\left(\nu-\nu_{u,i}\right)\mathbf{v}(\psi_{u,i},\varphi_{u,i}),

(1)

where $h_{u,i}$ , $\nu_{u,i}$ , $\tau_{u,i}$ , $\psi_{u,i}$ , and $\varphi_{u,i}$ denote the complex gain, differential Doppler shift, differential delay, azimuth angle, and elevation angle of the $i$ -th path, respectively, and $\mathbf{v}(\psi_{u,i},\varphi_{u,i})$ is the steering vector of the antennas. The steering vector is the Kronecker product of the array response vectors $\mathbf{v}(\vartheta_{y_{u,i}})$ and $\mathbf{v}(\vartheta_{z_{u,i}})$ corresponding to the directions with respect to $y$ - and $z$ -axis, respectively, that is,

\displaystyle\mathbf{v}(\psi_{u,i},\varphi_{u,i})=\mathbf{v}(\vartheta_{y_{u,i}})\otimes\mathbf{v}(\vartheta_{z_{u,i}}),

(2)

where $\vartheta_{y_{u,i}}=\sin\varphi_{u,i}\sin\psi_{u,i}$ and $\vartheta_{z_{u,i}}=\cos\varphi_{u,i}$ are the directional cosines along the $y$ - and $z$ -axis, respectively, and

\displaystyle\mathbf{v}(\vartheta)=\left[1\exp\left\{-\bar{\jmath}\pi\vartheta\right\}\ldots\right.\left.\exp\left\{-\bar{\jmath}\pi\left(N_{y}-1\right)\vartheta\right\}\right]^{T}

(3)

The Doppler and delay taps for the $i$ -th path of the $u$ -th device can be expressed as

\displaystyle\nu_{u,i}=\frac{k_{u,i}+\tilde{k}_{u,i}+b_{u,i}N}{NT_{\text{sym}}},\tau_{u,i}=\frac{l_{u,i}+c_{u,i}M}{M\Delta f},

(4)

where $N$ is the number of OFDM symbols, $M$ is the number of subcarriers, and $\Delta f$ is the subcarrier spacing; $l_{u,i}=0,\dots,M-1$ and $k_{u,i}=\lceil-N/2\rceil,\dots,\lceil N/2\rceil-1$ represent indexes of the delay tap and Doppler tap corresponding to the delay $\tau_{u,i}$ and Doppler $\nu_{u,i}$ , respectively; $\tilde{k}_{i}\in(-\frac{1}{2},\frac{1}{2}]$ corresponds to the fractional Doppler shift; $T_{\mathrm{sym}}$ is the time duration of a OFDM symbol with cyclic prefix (CP); $b_{u,i}$ and $c_{u,i}$ are integers, referred to as the outer Doppler tap and outer delay tap, respectively, which are non-zeros when differential Doppler $\nu_{u,i}>\Delta f$ and/or differential delay $\tau_{u,i}>T$ ( $T$ is one symbol duration without CP). Outer Doppler and delay taps are often non-zeros in the LEO satellite communications due to the large geographical difference of devices, which is distinct from the terrestrial communications ( $\nu_{u,i}\ll\Delta f$ and $\tau_{u,i}\ll T$ ). For example, in the 3GPP set-2 configuration[9], the maximum differential Doppler and delay can be up to 3880 Hz and 699 $\mu$ s[2], respectively. Then, if a subcarrier spacing is equal to 3880 Hz, a symbol duration (without CP) will be 258 $\mu$ s and smaller than 699 $\mu$ s.

II-B2 Signal Modulation

The $u$ -th device employs the OFDM-based OTFS modulation to combat the doubly fading effect of the above channel. The detailed mathematical derivations of this modulation are omitted due to the limited spacing and interested readers can refer to [5, 6, 7]. We directly give the input-output relationship of the $u$ -th device in the delay-Doppler-space domain at the $(n_{z}+n_{y}N_{z})$ -th antenna as[7]

	$\displaystyle Y_{u,n_{y},n_{z}}^{\mathrm{DDS}}[k,l]=\sum_{i=0}^{P}\sum_{k^{\prime}=\lceil-N/2\rceil}^{\lceil N/2\rceil-1}X_{u}^{\mathrm{DD}}\left[\left\langle k-k^{\prime}\right\rangle_{N},\left(l-l_{u,i}\right)_{M}\right]$
	$\displaystyle\times H_{u,n_{y},n_{z}}^{\mathrm{DDS}}\left[k^{\prime},l_{u,i},l\right]+Z_{n_{y},n_{z}}^{\mathrm{DDS}}\left[k,l\right]$		(5)

where $n_{y}=0,\dots,N_{y}-1$ , $n_{z}=0,\dots,N_{z}-1$ , $(\cdot)_{M}$ denotes mod $M$ , $\langle x\rangle_{N}$ denotes $\left(x+\left\lfloor\frac{N}{2}\right\rfloor\right)_{N}-\left\lfloor\frac{N}{2}\right\rfloor$ , $Z_{n_{y},n_{z}}^{\mathrm{DDS}}$ is the noise, and the effective channel $H_{u}^{\mathrm{DDS}}$ . The effective channel can be expressed as

	$\displaystyle H_{u,n_{y},n_{z}}^{\mathrm{DDS}}$	$\displaystyle\left[k^{\prime},l_{u,i},l\right]=\frac{1}{N}\sum_{j=0}^{N-1}h_{M_{\mathrm{cp}}+j\left(M+M_{\mathrm{cp}}\right),l_{u,i}}e^{-\bar{\jmath}\frac{2\pi}{N}k^{\prime}j}$
		$\displaystyle\times e^{\bar{\jmath}2\pi\frac{\left(k_{u,i}+\tilde{k}_{u,i}+b_{u,i}N\right)\left(l-c_{u,i}M\right)}{\left(M+M_{\mathrm{cp}}\right)N}}e^{\bar{\jmath}\pi n_{y}\vartheta_{y_{u,i}}}e^{\bar{\jmath}\pi n_{z}\vartheta_{z_{u,i}}},$		(6)

where $M_{\text{cp}}$ is the length of the CP and $h_{\rho,l}$ is the complex gain of the time-variant channel on the delay tap $l$ at the time $\rho T_{\mathrm{s}}$ , defined as [6]

\displaystyle h_{\rho,l}=\sum_{i=0}^{P}h_{{u,i}}e^{\bar{\jmath}2\pi(\rho-l)T_{s}\nu_{u,i}}\delta\left(lT_{\mathrm{s}}-(\tau_{u,i})_{T}\right),

(7)

where $T_{\mathrm{s}}=\frac{1}{M\Delta f}$ is the system sampling interval.

Remark 1

The difference between the effective channel in (II-B2) and those in previous literature[5, 6, 7] is that the phase compensation $e^{\bar{\jmath}2\pi\frac{\left(k_{u,i}+\tilde{k}_{u,i}+b_{u,i}N\right)\left(l-c_{u,i}M\right)}{\left(M+M_{\mathrm{cp}}\right)N}}$ is difficult to be separated from the effective channel or to be neglected, due to the existence of outer Doppler tap $b_{i}$ and outer delay tap $c_{i}$ . In this scenario, the effective channel at each antenna is a 3D tensor and is complicated to estimate directly. To reduce computational complexity, we propose to handle it along delay dimension, subsequently.

II-C Problem Formulation

To exploit the sparsity in the angular domain, the two-dimensional discrete Fourier transform (2D-DFT) is applied for the channel of $u$ -th device along the space dimension, and then the delay-Doppler-angle domain channel is given by

		$\displaystyle H_{u,a_{y},a_{z}}^{\mathrm{DDA}}[k,l_{u,i},l]=\sqrt{N_{y}N_{z}}\sum_{i=0}^{P}h_{u,i}e^{\bar{\jmath}2\pi(M_{\text{cp}}+l)T_{s}\nu_{u,i}}$
		$\displaystyle\times e^{-\bar{\jmath}2\pi\tau_{u,i}\nu_{u,i}}\Pi_{N}(k-NT_{\text{sym}}\nu_{u,i})\delta\left(l_{u,i}T_{\mathrm{s}}-(\tau_{u,i})_{T}\right)$
		$\displaystyle\times\Pi_{N_{y}}(a_{y}-N_{y}\vartheta_{y_{u,i}}/2)\Pi_{N_{z}}(a_{z}-N_{z}\vartheta_{z_{u,i}}/2),$		(8)

where $a_{y}=0,\dots,N_{y}-1$ and $a_{z}=0,\dots,N_{z}-1$ are indexes along angular domain and $\Pi_{N}(x)\triangleq\frac{1}{N}\sum_{i=0}^{N-1}e^{-\bar{\jmath}2\pi\frac{x}{N}i}$ . From (II-C), $H_{u,a_{y},a_{z}}^{\mathrm{DDA}}[k,l_{u,i},l]^{\mathrm{DDA}}$ has dominant elements only if $k\approx NT_{\text{sym}}(\nu_{u,i})_{1/T_{\text{sym}}}$ , $l_{u,i}\approx M\Delta f(\tau_{u,i})_{T}$ , $a_{y}\approx N_{y}\Omega_{y_{u}}/2$ , and $a_{z}\approx N_{z}\Omega_{z_{u}}/2$ . Therefore, the channel in the delay-Doppler-angle domain has the 3D-structured sparsity[5].

The received symbols from all the devices in the delay-Doppler-angle domain is given by

		$\displaystyle Y_{a_{y},a_{z}}^{\mathrm{DDA}}[k,l]=\sum_{u=0}^{U-1}\sum_{l^{\prime}=0}^{M-1}\sum_{k^{\prime}=\lceil-N/2\rceil}^{\lceil N/2\rceil-1}\lambda_{u}H_{u,a_{y},a_{z}}^{\mathrm{DDA}}\left[k^{\prime},l^{\prime},l\right]$
		$\displaystyle\times X_{u}^{\mathrm{DD}}\left[\left\langle k-k^{\prime}\right\rangle_{N},\left(l-l^{\prime}\right)_{M}\right]+Z^{\mathrm{DDA}}[k,l,a_{y},a_{z}],$		(9)

where $Z^{\mathrm{DDA}}$ is the noise in the delay-Doppler-angle domain. To facilitate the following analysis, we rewrite (II-C) into the matrix form for each $l$ as

\displaystyle\mathbf{Y}_{l}=\mathbf{X}_{l}\Lambda\tilde{\mathbf{H}}_{l}+\mathbf{Z}_{l},

(10)

where the elements of $\mathbf{Z}_{l}$ are independent zero-mean Gaussian noises with variance $\sigma^{2}$ ; the $(a_{z}+1+N_{z}a_{y})$ -th column of $\mathbf{Y}_{l}$ is $\mathbf{Y}_{a_{y},a_{z}}^{\mathrm{DDA}}[:,l]$ ; $\mathbf{X}_{l}=[\mathbf{X}_{0,l},\dots,\mathbf{X}_{U-1,l}]\in\mathrm{C}^{N\times UMN}$ , where $\mathbf{X}_{u,l}\in\mathrm{C}^{N\times MN}$ is the pilot matrix of the $u$ -th device in delay dimension $l$ . Note that $\mathbf{X}_{u,l}$ is a block circulant matrix due to the 2D circular convolution in (II-C), and its sub-matrix $\mathbf{X}_{u,l,l^{\prime}}$ , $l^{\prime}=0,\dots,M-1$ is circulant matrix, formed by $\left[X_{u}^{\mathrm{DD}}[0,(l-l^{\prime})_{M}],\dots,X_{u}^{\mathrm{DD}}[-1,(l-l^{\prime})_{M}]\right]$ . $\Lambda=\mathrm{diag}([\lambda_{0}\mathbf{1}_{MN},\cdots,\lambda_{U-1}\mathbf{1}_{MN}])$ is a diagonal matrix composed of the activity indicator of all the devices and $\mathbf{1}_{MN}\in R^{MN}$ is a all-one vector; Finally, the $(a_{z}+1+N_{z}a_{y})$ -th column of $\tilde{\mathbf{H}}$ is given by

\displaystyle\tilde{\mathbf{H}}_{l}[:,a_{z}+1+N_{z}a_{y}]=\left[\begin{array}[]{cccc}\mathrm{vec}\left(\mathbf{H}^{\mathrm{DDA}}_{0,a_{y},a_{z}}[:,:,l]\right)\\ \vdots\\ \mathrm{vec}\left(\mathbf{H}^{\mathrm{DDA}}_{U-1,a_{y},a_{z}}[:,:,l]\right)\end{array}\right],

(14)

where $\mathrm{vec}(\cdot)$ denotes the vectorization of a matrix. We denote the channel matrix $\Lambda\tilde{\mathbf{H}_{l}}$ as $\mathbf{H}_{l}$ . Then, (10) can be written as

\displaystyle\mathbf{Y}_{l}=\mathbf{X}_{l}\mathbf{H}_{l}+\mathbf{Z}_{l}.

(15)

Therefore, the joint device activity detection and channel estimation in this scenario turns into a sparse signal reconstruction problem, and can be separately and parallelly handled along delay dimension for each $\mathbf{H}_{l}$ . In addition, the 2D burst block sparsity[7] appears in $\mathbf{H}_{l}$ , which is different from those in the conventional grant-free random access systems[3, 4] and can be utilized to improve the estimation performance further.

In this work, we estimate $\mathbf{H}_{l}$ adopting the minimum mean square error (MMSE) rule, represented as

\displaystyle\hat{\mathbf{H}}_{l}=\arg\max_{\mathbf{H^{\prime}}}\mathrm{E}[\|\mathbf{H^{\prime}}-\mathbf{H}_{l}\|_{2}^{2}|\mathbf{Y}_{l}]

(16)

Based on the estimation, the active device can be detected by comparing the energy of each device’s channel with a pre-defined threshold, i.e.,

\displaystyle\hat{\lambda}_{u}=\mathbb{I}\left\{\frac{1}{M}\sum_{l=0}^{M-1}\sum_{i=uMN}^{(u+1)MN-1}\sum_{j=0}^{N_{y}N_{z}-1}\left|\hat{H}_{l,i,j}\right|^{2}>\xi_{th}\right\},

(17)

where $\hat{H}_{l,i,j}$ is the $(i,j)$ -th element of $\hat{\mathbf{H}}_{l}$ . Note that since each $\mathbf{H}_{l}$ can be estimated separately, we omit the index $l$ in the following content.

III Joint Channel Estimation and Device
Activity Detection

Refer to caption — Figure 1: The proposed DL-GAMP algorithm.

III-A Correlation-based Prior for Channel Coefficients

Since channel matrix $\mathbf{H}$ has the 2D burst block sparsity, i.e., its non-zero elements are statistically dependent along the row and column, we consider the cross-correlation-based Gaussian prior for the $(i,j)$ -th element of $\mathbf{H}$ , given by

\displaystyle p(h_{i,j}|\mathrm{\mathbf{A}},\mathrm{\mathbf{B}})=\mathcal{CN}(h_{i,j}|0,\left(\sum_{p=-D_{x}}^{D_{x}}\sum_{q=-D_{y}}^{D_{y}}\beta_{p,q}\alpha_{i+p,j+q}\right)^{-1}),

(18)

where $\mathrm{\mathbf{A}}$ is the precision matrix whose $(i,j)$ -th element $\alpha_{i,j}$ is the local precision hyperparameter of $h_{i,j}$ ; $\mathrm{\mathbf{B}}\in[0,1]^{D_{x}\times D_{y}}$ is the kernel and its $(p,q)$ -th element $\beta_{p,q}$ represents the relevence between $h_{i,j}$ and $h_{i+p,j+q}$ . We denote $\theta_{i,j}$ as the prior variance of $h_{i,j}$ . From (18), $\theta_{i,j}$ not only depends on local hyperparameter $\alpha_{i,j}$ of $h_{i,j}$ , but also on the hyperparameters of its neighborhoods along the row and column of $\mathbf{H}$ . Therefore, such coupled prior has the potential to capture the unknown 2D sparsity structure in $\mathbf{H}$ . Then, following the conventional SBL, we adopt the Gamma distribution as the hierarchical prior for the local precision hyperparameter $\mathrm{\mathbf{A}}$ , i.e.,

\displaystyle p(\mathrm{\mathbf{A}}|a,b)=\prod_{i=0}^{UMN-1}\prod_{j=0}^{N_{y}N_{z}-1}\Gamma(a)^{-1}b^{a}\alpha_{i,j}^{a}e^{-b\alpha_{i,j}},

(19)

where $\Gamma(c)\triangleq\int_{0}^{\infty}t^{c-1}e^{-t}dt$ . Note that this hierarchical Bayesian modeling encourages the sparseness in the estimation since the overall prior of $\mathbf{H}$ (i.e., integrate out $\alpha_{i,j}$ ) is a Student- $t$ distribution that is sharply peaked at zero. Based on the above prior distribution, it is possible to find the analytical solution of (16). However, the matrix inversion will inevitably be involved, which leads to the high computational complexity. Fortunately, the prior and the likelihood function of $\mathbf{H}$ can be fully factorized, leading to the following posterior distribution, i.e.,

	$\displaystyle p(h_{i,j}$	$\displaystyle\|\mathbf{Y},\mathrm{\mathbf{A}},\mathrm{\mathbf{B}},\sigma^{2})\propto\int p(\mathbf{y}_{j}\|\mathbf{g}_{j},\sigma^{2})p(\mathbf{h}_{j}\|\mathrm{\mathbf{A}},\mathrm{\mathbf{B}})dh_{\sim i,j}$
	$\displaystyle=$	$\displaystyle\int\prod_{o=0}^{N-1}p(y_{o,j}\|g_{o,j},\sigma^{2})\prod_{i=0}^{UMN-1}p(h_{i,j}\|\mathrm{\mathbf{A}},\mathrm{\mathbf{B}})dh_{\sim i,j},$		(20)

where $\mathbf{y}_{j}$ and $\mathbf{h}_{j}$ are the $j$ -th column of $\mathbf{Y}$ and $\mathbf{H}$ , respectively, and $\mathbf{g}_{j}=\mathbf{X}\mathbf{h}_{j}$ . Hence, we can resort to the GAMP algorithm to estimate the posterior mean and variance of the channel matrix with lower complexity.

III-B MMSE Estimation via GAMP

The messages passed within the GAMP algorithm are approximated by Gaussian distributions, and then only the means and variances of the messages need to be calculated and preserved, which reduces the complexity significantly. We next outline the MMSE estimation of channel matrix $\mathbf{H}$ in the $t$ -th iteration by following the GAMP algorithms[10]. The detailed derivations are omitted for brevity. Here, the hyperparameters { $\mathbf{A}$ , $\mathbf{B}$ , $\sigma$ } are assumed to be known and fixed, and we will update them later.

III-B1 Initilization ( $t=1$ )

The initial estimation of the posterior mean of the channel is set as its prior mean i.e., $\hat{h}_{i,j}^{1}=0$ and the estimation of the posterior variance $\hat{\gamma}_{i,j}$ is initialized to its prior variance $\theta_{i,j}$ with a small positive number, e.g., $\hat{\gamma}_{i,j}^{1}=\theta_{i,j}^{1}=0.01$ . Residual $\hat{s}_{o,j}^{0}$ is set as 0.

III-B2 Messages on the Output Nodes

Based on linear relationship $\mathbf{g}_{j}=\mathbf{X}\mathbf{h}_{j}$ and observation $\mathbf{y}_{j}$ , the messages are combined to obtain the posterior mean and variance of $g_{o,j}$ , given by

	$\displaystyle\hat{g}_{o,j}=\mathrm{E}\left[g_{o,j}\|y_{o,j},\hat{p}_{o,j}^{t},\tau_{o,j}^{p,t}\right],$
	$\displaystyle\tau_{o,j}^{g,t}=\mathrm{Var}\left[g_{o,j}\|y_{o,j},\hat{p}_{o,j}^{t},\tau_{o,j}^{p,t}\right],$		(21)

where $\mathrm{E}[\cdot]$ and $\mathrm{Var}[\cdot]$ are taken with respect to the posterior distribution of $g_{o,j}$ given prior $\mathcal{CN}(g_{o,j}|\hat{p}_{o,j}^{t},\tau_{o,j}^{p,t})$ and likelihood function $\mathcal{CN}(y_{o,j}|g_{o,j},\sigma^{2}_{t})$ . The prior mean and variance of $g_{o,j}$ are calculated as

	$\displaystyle\tau_{o,j}^{p,t}=\sum_{i}\left\|X_{o,i}\right\|^{2}\hat{\gamma}_{i,j}^{t},$
	$\displaystyle\hat{p}_{o,j}^{t}=\sum_{i}(X_{o,i}\hat{h}_{i,j}^{t})-\hat{s}_{o,j}^{t-1}\tau_{o,j}^{p,t}.$		(22)

Then, residual $\hat{s}_{o,j}^{t}$ and inverse-residual variance $\tau_{o,j}^{s,t}$ are computed by

		$\displaystyle\hat{s}_{o,j}^{t}=\left(\hat{g}_{o,j}^{t}-\hat{p}_{o,j}^{t}\right)/\tau_{o,j}^{p,t},$
		$\displaystyle\tau_{o,j}^{s,t}=\left(1-\tau_{o,j}^{g,t}/\tau_{o,j}^{p,t}\right)/\tau_{o,j}^{p,t}.$		(23)

III-B3 Messages on the Input Nodes

Based on the residual and the inverse-residual variance, the posterior mean and variance of $h_{i,j}$ can be updated by

		$\displaystyle\hat{h}_{i,j}^{t+1}=\mathrm{E}[h_{i,j}\|\hat{r}_{i,j}^{t},\tau^{r,t}_{i,j},\mathrm{\mathbf{A}^{t}},\mathrm{\mathbf{B}^{t}}],$
		$\displaystyle\hat{\gamma}_{i,j}^{t+1}=\mathrm{Var}[h_{i,j}\|\hat{r}_{i,j}^{t},\tau^{r,t}_{i,j},\mathrm{\mathbf{A}^{t}},\mathrm{\mathbf{B}^{t}}],$		(24)

where $\mathrm{E}[\cdot]$ and $\mathrm{Var}[\cdot]$ are taken with respect to the posterior distribution of $h_{i,j}$ given prior $p(h_{i,j}|\mathbf{A}^{t},\mathbf{B}^{t})$ and likelihood function $\mathcal{CN}(h_{i,j}|\hat{r}_{i,j}^{t},\tau^{r,t}_{i,j})$ . The mean and variance in the likelihood function are calculated as

	$\displaystyle\tau^{r,t}_{i,j}=\left(\sum_{o}\left\|X_{o,i}\right\|^{2}\tau_{o,j}^{s,t}\right)^{-1},$
	$\displaystyle\hat{r}_{i,j}^{t}=\hat{h}_{i,j}^{t}+\tau^{r,t}_{i,j}\sum_{o}X_{o,i}^{*}\hat{s}_{o,j}^{t}.$		(25)

Combining (18) and (III-B3)-(25), we have

	$\displaystyle\hat{h}_{i,j}^{t+1}=\frac{\theta_{i,j}^{t}\hat{r}_{i,j}^{t}}{\theta_{i,j}^{t}+\tau^{r,t}_{i,j}},$
	$\displaystyle\hat{\gamma}_{i,j}^{t+1}=\frac{\theta_{i,j}^{t}\tau^{r,t}_{i,j}}{\theta_{i,j}^{t}+\tau^{r,t}_{i,j}}.$		(26)

III-C Learning the Hyperparameters

In this subsection, we utilize the EM algorithm together with deep learning to learn the hyperparameters { $\mathbf{A}$ , $\mathbf{B}$ , $\sigma$ }, where $\mathbf{A}$ and $\sigma$ are updated by the EM-based rule and $\mathbf{B}$ is learned by a convolutional neural network (CNN), as shown in Fig. 1. We then explain it in detail. Following the conventional SBL, we place a Gamma hyperprior over $\sigma^{-2}$ , i.e.,

\displaystyle p(\sigma|r,s)=\Gamma(r)^{-1}s^{r}\sigma^{-2r}e^{-s\sigma^{-2}}.

(27)

Based on the estimation in the $t$ -th iteration, the precision matrix and the noise variance can be updated by

	$\displaystyle\mathrm{\mathbf{A}}^{t+1}=\arg\max_{\mathrm{A}}\mathrm{E}[\log p(\mathbf{H}\|\mathrm{\mathbf{A}},\mathrm{\mathbf{B}})p(\mathrm{\mathbf{A}})\|\mathbf{Y},\mathbf{A}^{t},\sigma^{t}],$		(28)
	$\displaystyle\sigma^{t+1}=\arg\max_{\sigma}\mathrm{E}[\log p(\mathbf{Y}\|\mathbf{H},\sigma)p(\sigma)\|\mathbf{Y},\mathbf{A}^{t},\sigma^{t}],$		(29)

where the kernel $\mathrm{\mathbf{B}}$ is assumed to be known in this step. The optimization in (28) is difficult to get the analytic solution since the hyperparameters are entangled with each other due to the logarithm term. Although the gradient descent methods can be adopted to get the optimal solution, it involves higher computational complexity as compared with an analytical update rule. We derive a closed-form sub-optimal solution (see [7] for more detail) by examining the first derivative of the objective function with respect to $\alpha_{i,j}$ and by proper scaling, given as

\displaystyle\alpha_{i,j}^{t+1}=\frac{a}{b+\omega_{i,j}^{t}},

(30)

where $\omega_{i,j}^{t}=\sum_{p=-D_{x}}^{D_{x}}\sum_{q=-D_{y}}^{D_{y}}\beta_{p,q}\mathrm{E}[\left|h_{i-p,j-q}\right|^{2}]$ and posterior second moment $\mathrm{E}[\left|h_{pq}\right|^{2}]=\left|\hat{h}^{t}_{p,q}\right|^{2}+\left|\hat{\gamma}^{t}_{p,q}\right|^{2}$ , which are the outputs of the GAMP algorithm in the $t$ -th iteration. Then, computing the first derivative of the objective function in (29) with respect to $\sigma^{2}$ and setting it equal to zero, we get

\displaystyle(\sigma^{2})^{t+1}=\frac{\sum_{j=0}^{N_{y}N_{z}-1}\mathrm{E}\left[\left\|\mathbf{y}_{j}-\mathbf{X}\mathbf{h}_{j}\right\|^{2}\right]+s}{MNN_{y}N_{z}+r},

(31)

where

	$\displaystyle\mathrm{E}\left[\left\\|\mathbf{y}_{j}-\mathbf{X}\mathbf{h}_{j}\right\\|^{2}\right]=$	$\displaystyle\left\\|\mathbf{y}_{j}-\mathbf{X}\mathbf{\hat{h}}^{t}_{j}\right\\|^{2}+(\sigma^{2})^{t}$
		$\displaystyle\times\sum_{i=0}^{UMN-1}(1-\hat{\gamma}^{t}_{i,j}/\theta^{t}_{i,j}).$		(32)

By examining (18) and (30), the update of precision matrix $\mathbf{A}$ is related to the 2D convolution between kernel $\mathbf{B}$ and the posterior second moment, and the update of prior variance $\theta_{i,j}$ is related to the cross-correlation between the kernel and precision matrix, i.e.,

\displaystyle(\theta_{i,j}^{-1})^{t+1}=\sum_{p=-D_{x}}^{D_{x}}\sum_{q=-D_{y}}^{D_{y}}\beta_{p,q}\alpha_{i+p,j+q}^{t+1}.

(33)

The only difference between the 2D convolution and the cross-correlation is whether the kernel is flipped within the computations, which motivates us to utilize the CNN to learn kernel $\mathbf{B}$ and then update precision matrix $\mathbf{A}$ . Here, the trainable filter of the convolutional layer can be seen as flipped kernel $\mathbf{B}$ . Combining this idea with the EM update rule and the GAMP algorithm leads to the deep learning-based GAMP algorithm (DL-GAMP), and we summarize the computation process as follows.

As shown in Fig. 1, in the $t$ -th iteration, the posterior second moment computed by the GAMP algorithm in the $(t-1)$ -th iteration is firstly inputted to a convolutional layer and then the rectified linear unit (ReLU)[11] will ensure a positive output $\omega_{i,j}^{t}$ . Next, precision $\alpha_{i,j}^{t+1}$ is updated according to (30), and then is inputted to another convolutional layer followed by the ReLU to get the update of prior variance $\theta_{i,j}^{t+1}$ (similar to (33)). Simultaneously, noise standard variance $\sigma^{t+1}$ is given by (31). Finally, the output of GAMP in this iteration { $\hat{h}_{i,j}^{t+1}$ , $\hat{\gamma}_{i,j}^{t+1}$ } and the update of { $\theta_{i,j}^{t+1}$ , $\sigma^{t+1}$ } computed by the CNN and the EM approach will be inputted to the next iteration of the DL-GAMP. The algorithm will run until the maximum iteration $T_{\text{max}}$ is reached, and then the active device is detected according to (17). Note that the filters in the two convolutional layers are not restricted to be identical since they will be learned from the training samples. Besides, the matrix multiplication in the GAMP algorithm dominates other computations, like the 2D convolution in CNN, leading to complexity $\mathcal{O}(UMN^{2}N_{y}N_{z})$ in each iteration when each channel matrix $\mathbf{H}_{l}$ is parallelly estimated by DL-GAMP. If $\mathbf{H}_{l}$ is estimated in a serial manner, the complexity will be $\mathcal{O}(UM^{2}N^{2}N_{y}N_{z})$ in each iteration.

IV Numerical Results

TABLE I: SIMULATION PARAMETERS

Parameter	Values
Carrier frequency $(\mathrm{GHz})$	2
Subcarrier spacing $(\mathrm{kHz})$	3.75
Size of OTFS symbol $(M,N)$	$(16,35)$
Length of CP	42
Number of Satellite antennas	$4\times 4$
Number of LoS paths	1
Number of Non-LoS path	1
The differential delay (µs)	$[0,699]$
The differential Doppler shift (kHz)	$[-3.88,3.88]$
Directional cosine along the y- and z-axis	$[-1,1]$
Rician factor (dB)	$8$

In this section, we demonstrate the performance of the proposed algorithms through computer simulations. We consider the set-2 configuration of the non-terrestrial networks recommended by the 3GPP[2][9]. The detailed system parameters are summarized in Table I. Each device transmits Gaussian pliot matrix with independent entries obeying $\mathcal{CN}(0,\frac{1}{MN})$ . Moreover, the power of the multi-path channel gain is normalized as 1, i.e., $\sum_{i=0}^{p}\left|h_{u,i}\right|^{2}=1$ . Finally, we define the signal-to-noise ratio (SNR) as $\text{ SNR }=10\log_{10}\frac{1}{MN\sigma^{2}}$ .

To train the DL-GAMP, we generate $2\times 10^{4}$ samples; set $D_{x}=D_{y}=1$ , $b=r=s=10^{-4}$ , and $T_{max}=600$ ; set $a$ as the hyerparameter learned in the training process; adopt the Adam optimizer to update parameters in CNN with the learning rate of $10^{-4}$ . We generate $3\times 10^{3}$ samples at each SNR point to test the DL-GAMP. Besides, we adopt GMMV-AMP[4] as benchmarks to show the effectiveness of the proposed algorithms. Finally, the average normalized mean-squared-error (NMSE) and the average error probability are adopted as the metrics for the channel estimation and device activity detection, respectively, given by

\displaystyle\text{ NMSE }=\frac{1}{M}\sum_{l=0}^{M-1}\frac{\left\|\mathbf{H}_{l}-\hat{\mathbf{H}}_{l}\right\|^{2}_{\mathrm{F}}}{\left\|\mathbf{H}_{l}\right\|^{2}_{\mathrm{F}}},P_{e}=\frac{1}{U}\sum_{u=0}^{U-1}\left|\lambda_{u}-\hat{\lambda}_{u}\right|.

(34)

Fig. 2 compares the channel estimation performance between the DL-GAMP and GMMV-AMP. In the figure, we also provide the performance of the proposed algorithm with a constant kernel (without learning), named ConvGAMP, where the center coefficient of the kernel is set as 1 and other coefficients are set to be identical with $\beta=0.5$ or $\beta=1$ , which means that the neighborhoods of the channel $h_{i,j}$ have the same influence on it. Besides, there are five potential devices with an active one, i.e., $U=5$ and $U_{a}=1$ . From the figure, the proposed algorithms ConvGAMP and DL-GAMP outperform the GMMV-AMP under different SNR values since the GMMV-AMP only captures the sparsity in the angular domain while the proposed algorithms are able to deal with the sparsity in the delay-Doppler-angle domain. For example, when the NMSE is about -7.5 dB, the proposed algorithms outperform the GMMV-AMP by around 5 dB. Besides, the DL-GAMP has better performance than ConvGAMP, and has the 2.5 dB gain when the NMSE is -10 dB, which indicates that the kernel learned by CNN can provide better estimation performance.

Finally, in Fig. 3, we compare the device activity detection performance of the GMMV-AMP, ConvGAMP, and DL-GAMP, where $U=5$ and $U_{a}=1$ . From the figure, similar to Fig. 2, the ConvGAMP and DL-GAMP outperform the GMMV-AMP. With the assistance of CNN, the performance of DL-GAMP is improved further compared with the ConvGAMP.

V Conclusion

This work investigated the application of MIMO-OTFS for the grant-free random access in LEO satellite communications, where there are usually large differential delay and Doppler shift. To exploit the 2D burst block sparsity in the delay-Doppler-angle domain, the correlation-based Gaussian prior with the GAMP was proposed to estimate the channel and detect the active device. Finally, the hyperparameters were learned by the EM rule and deep learning. Simulation results demonstrated that the proposed algorithms could acquire accurate channel state information and device activity.

References

[1] M. De Sanctis, E. Cianca, G. Araniti, I. Bisio, and R. Prasad, “Satellite communications supporting internet of remote things,” IEEE Internet Things J., vol. 3, no. 1, pp. 113–123, Oct. 2016.
[2] H. Chougrani, S. Kisseleff, W. A. Martins, and S. Chatzinotas, “NB-IoT random access for non-terrestrial networks: Preamble detection and uplink synchronization,” IEEE Internet Things J., pp. 1–1, 2021 (Early Access).
[3] Y. Zhang, Q. Guo, Z. Wang, J. Xi, and N. Wu, “Block sparse Bayesian learning based joint user activity detection and channel estimation for grant-free NOMA systems,” IEEE Trans. Veh. Technol., vol. 67, no. 10, pp. 9631–9640, Jul. 2018.
[4] M. Ke, Z. Gao, Y. Wu, X. Gao, and R. Schober, “Compressive sensing-based adaptive active user detection and channel estimation: Massive access meets massive MIMO,” IEEE Trans. Signal Process., vol. 68, pp. 764–779, Jan. 2020.
[5] W. Shen, L. Dai, J. An, P. Fan, and R. W. Heath, “Channel estimation for orthogonal time frequency space (OTFS) massive MIMO,” IEEE Trans. Signal Process., vol. 67, no. 16, pp. 4204–4217, May 2019.
[6] D. Shi, W. Wang, L. You, X. Song, Y. Hong, X. Gao, and G. Fettweis, “Deterministic pilot design and channel estimation for downlink massive MIMO-OTFS systems in presence of the fractional Doppler,” IEEE Trans. Wireless Commun., pp. 1–1, May 2021 (Early Access).
[7] B. Shen, Y. Wu, J. An, C. Xing, L. Zhao, and W. Zhang, “Random access with massive MIMO-OTFS in LEO satellite communications,” 2022. [Online]. Available: https://arxiv.org/abs/2202.13058.
[8] X. Zhou, K. Ying, Z. Gao, Y. Wu, Z. Xiao, S. Chatzinotas, J. Yuan, and B. Ottersten, “Active terminal identification, channel estimation, and signal detection for grant-free NOMA-OTFS in LEO satellite internet-of-things,” 2022. [Online]. Available: https://arxiv.org/abs/2201.02084.
[9] 3rd Generation Partnership Project, “3rd generation partnership project; technical specification group radio access network; solutions for NR to support nonterrestrial networks (NTN) (Release 16),” 3GPP TR 38.811V15.1.0, June 2019.
[10] S. Rangan, “Generalized approximate message passing for estimation with random linear mixing,” in Proc. IEEE Int. Symp. Inf. Theory (ISIT), St. Petersburg, Russia, Oct. 2011, pp. 2168–2172.
[11] X. Glorot, A. Bordes, and Y. Bengio, “Deep sparse rectifier neural networks,” in Proc. Int. Conf. Artif. Intell. Statist, Ft. Lauderdale, FL, USA, Jan. 2011, pp. 315–323.