
Learning Bayes-Optimal Channel Estimation for Holographic MIMO in Unknown EM Environments

   Wentao Yu, Hengtao He, Xianghao Yu, Shenghui Song, Jun Zhang, Fellow, IEEE,
Ross D. Murch, Fellow, IEEE, and Khaled B. Letaief, Fellow, IEEE
This work was supported in part by the Hong Kong Research Grants Council under Grant No. 16212922, 16209622, and the Areas of Excellence Scheme Grant No. AoE/E-601/22-R, in part by a grant from the NSFC/RGC Joint Research Scheme sponsored by the Research Grants Council of the Hong Kong SAR, China and National Natural Science Foundation of China (Project No. N_HKUST656/22), and in part by the National Natural Science Foundation of China for Young Scientists under Grant No. 62301468. An extended journal version of this work is available at [1].
Dept. of Electronic and Computer Engineering, Hong Kong University of Science and Technology, Hong Kong
Dept. of Electrical Engineering, City University of Hong Kong, Hong Kong
Email: {wyuaq, eehthe, eeshsong, eejzhang, eermurch, eekhaled}@ust.hk, [email protected]
Abstract

Holographic MIMO (HMIMO) has recently been recognized as a promising enabler for future 6G systems, as it packs an ultra-massive number of antennas into a compact space to exploit the propagation characteristics of the electromagnetic (EM) channel. Nevertheless, the promised gains of HMIMO cannot be fully unleashed without an efficient means to estimate the high-dimensional channel. Bayes-optimal estimators typically require either a large volume of supervised training samples or a priori knowledge of the true channel distribution, both of which are hardly available in practice due to the enormous system scale and the complicated EM environments. It is thus important to design a Bayes-optimal estimator for HMIMO channels in arbitrary and unknown EM environments, free of any supervision or priors. This work proposes a self-supervised minimum mean-square-error (MMSE) channel estimation algorithm based on powerful machine learning tools, i.e., score matching and principal component analysis (PCA). The training stage requires only the pilot signals, without knowledge of the spatial correlation, the ground-truth channels, or the received signal-to-noise ratio (SNR). Simulation results show that, despite being fully self-supervised, the proposed algorithm approaches the performance of the oracle MMSE method with extremely low complexity, making it a competitive candidate in practice.

Index Terms:
6G, holographic MIMO, channel estimation, score matching, self-supervised learning, MMSE estimation

I Introduction

With an ultra-massive number of antennas closely packed in a compact space, holographic MIMO (HMIMO) is envisioned as a promising next-generation multi-antenna technology that enables extremely high spectral and energy efficiency [2]. To exploit the benefits of the electromagnetic (EM) channel, it is important to acquire accurate channel state information. However, this is difficult owing to both the high dimensionality of the channel and its complicated EM characteristics.

The minimum mean-square-error (MMSE) estimator achieves the Bayes-optimal performance in terms of MSE. Implementing it requires either perfect knowledge of the prior distribution of the channels [3, 4] or learning such a distribution from a substantial number of ground-truth channels [5, 6], both of which are difficult, if not impossible, in HMIMO systems owing to the extremely large number of antennas. Additionally, the computational complexity of the MMSE estimator, even of its linear version (LMMSE), is extremely high, since it involves computationally intensive matrix inversion [7]. Existing studies proposed various low-complexity alternatives, but they all come at the cost of inferior performance compared with the MMSE estimator. In [3], a subspace-based channel estimation algorithm was proposed, which exploits the low-rank property of the HMIMO spatial correlation without requiring full knowledge of the spatial correlation matrix. In [8], a discrete Fourier transform (DFT)-based HMIMO channel estimation algorithm was proposed by approximating the spatial correlation with a suitable circulant matrix. Nevertheless, this algorithm is limited to uniform linear array (ULA)-based HMIMO systems and cannot be extended to more general antenna array geometries. In [4], a concise tutorial on HMIMO channel modeling and estimation was presented. Even though the aforementioned estimators significantly outperform the conventional least squares (LS) scheme, there still exists a large performance gap to the MMSE estimator.

In this paper, we affirmatively answer a fundamental question: Is it possible to establish a Bayes-optimal MMSE channel estimator for HMIMO systems in arbitrary and unknown EM environments? Owing to the complicated channel distribution and the ultra-high dimensionality of the problem, classical analytical methods become either sub-optimal in performance or too complicated to implement. Supervised deep learning-based methods can achieve near-optimal performance, but rely heavily on a substantial dataset of ground-truth channels [5], which is difficult to acquire in HMIMO systems. Data availability and complexity constitute the two core challenges that prevent the practical implementation of the MMSE channel estimator in HMIMO systems. Both challenges are tackled by our proposed learning-based estimator:

  1. Data availability: The proposed estimator needs neither the prior distribution nor the ground-truth channel data. Only the received pilot signals are required at the model training stage.

  2. Complexity: The proposed estimator drops the prohibitive matrix inversion and has extremely low complexity.

Specifically, we propose a self-supervised deep learning framework for realizing the Bayes-optimal MMSE channel estimator for HMIMO. We first prove that the MMSE channel estimator can be constructed solely from the distribution of the received pilot signals through Stein's score function. Afterwards, we propose a practical algorithm to train neural networks that estimate the score function using only the collected received pilot signals. Lastly, a low-complexity principal component analysis (PCA)-based method is proposed to estimate the received signal-to-noise ratio (SNR) from the pilots alone, since it is required by the score-based MMSE estimator. Simulation results in both isotropic and non-isotropic environments are provided to illustrate the effectiveness and efficiency of the proposed estimator. Notably, it achieves almost the same performance as the oracle MMSE estimator with more than a 20-fold reduction in complexity in a nominal HMIMO setup.

Notation: $a$ is a scalar. $\|\mathbf{a}\|$ is the $\ell_{2}$-norm of a vector $\mathbf{a}$. $\mathbf{A}^{T}$, $\mathbf{A}^{H}$, $\Re(\mathbf{A})$, and $\Im(\mathbf{A})$ are the transpose, Hermitian, real part, and imaginary part of a matrix $\mathbf{A}$, respectively. $\mathcal{CN}(\boldsymbol{\mu},\mathbf{R})$ and $\mathcal{N}(\boldsymbol{\mu},\mathbf{R})$ are complex and real Gaussian distributions with mean $\boldsymbol{\mu}$ and covariance $\mathbf{R}$, respectively. $\mathbf{I}$ is an identity matrix of appropriate size.

II HMIMO System and Channel Models

Consider the uplink of an HMIMO system, where the base station (BS) is equipped with a uniform planar array (UPA) with $\sqrt{N}\times\sqrt{N}$ antennas that simultaneously serves $K$ single-antenna user equipments (UEs). We focus on the cases where the BS has thousands of closely packed antennas with spacing $d_{a}$ below the nominal value of half the carrier wavelength $\lambda_{c}$. We define a local spherical coordinate system at the UPA with $\varphi\in[-\frac{\pi}{2},\frac{\pi}{2}]$ and $\vartheta\in[-\frac{\pi}{2},\frac{\pi}{2}]$ being the azimuth and elevation angles of arrival (AoAs), respectively, as depicted in Fig. 1. We index the antennas row-by-row with $n\in\{1,2,\ldots,N\}$, and denote the position of the $n$-th antenna as $\mathbf{u}_{n}=[u_{x,n},u_{y,n},0]^{T}$, in which

\begin{cases}u_{x,n}=-\frac{(\sqrt{N}-1)d_{a}}{2}+d_{a}\,\text{mod}(n-1,\sqrt{N}),\\ u_{y,n}=\frac{(\sqrt{N}-1)d_{a}}{2}-d_{a}\lfloor\frac{n-1}{\sqrt{N}}\rfloor.\end{cases} (1)

The notations $\text{mod}(\cdot,\cdot)$ and $\lfloor\cdot\rfloor$ refer to the modulus operation and the floor function, respectively. Considering a planar wave impinging on the UPA (while we focus on the far-field case here, our proposal is also applicable to the near-field case [9], which will be discussed in our follow-up works), the array response vector is given by

\mathbf{a}(\varphi,\vartheta)=[e^{j\frac{2\pi}{\lambda_{c}}\mathbf{t}(\varphi,\vartheta)^{T}\mathbf{u}_{1}},\ldots,e^{j\frac{2\pi}{\lambda_{c}}\mathbf{t}(\varphi,\vartheta)^{T}\mathbf{u}_{N}}]^{T}, (2)

with $\mathbf{t}(\varphi,\vartheta)=[\cos(\vartheta)\cos(\varphi),\cos(\vartheta)\sin(\varphi),\sin(\vartheta)]^{T}$ being the unit vector in the AoA direction. We assume that orthogonal pilots are adopted and consider the channel $\mathbf{\bar{h}}\in\mathbb{C}^{N\times 1}$ between the BS and an arbitrary UE, consisting of the superposition of multi-path components that can be represented by a continuum of planar waves [3], given by

\mathbf{\bar{h}}=\iint_{-\pi/2}^{\pi/2}g(\varphi,\vartheta)\mathbf{a}(\varphi,\vartheta)\,d\vartheta\, d\varphi, (3)

where $g(\varphi,\vartheta)$ denotes the angular spread function specifying the phase shift and the gain for each AoA direction $(\varphi,\vartheta)$. In accordance with [3], we model $g(\varphi,\vartheta)$ as a spatially uncorrelated symmetric Gaussian stochastic process with cross-correlation given by

\mathbb{E}\{g(\varphi,\vartheta)g^{*}(\varphi^{\prime},\vartheta^{\prime})\}=\beta f(\varphi,\vartheta)\delta(\varphi-\varphi^{\prime})\delta(\vartheta-\vartheta^{\prime}), (4)

where $\beta$ is the average channel gain, $\delta(\cdot)$ denotes the Dirac delta function, and $f(\varphi,\vartheta)$ is the spatial scattering function, i.e., the joint probability density function (PDF) of the azimuth and elevation AoAs. The function $f(\varphi,\vartheta)$ is normalized such that $\iint f(\varphi,\vartheta)\,d\vartheta\, d\varphi=1$. The HMIMO channel can be modeled by correlated Rayleigh fading (the proposed algorithm can work with any possible distribution of the HMIMO channel; we follow the convention in the literature and adopt correlated Rayleigh fading here because, in this case, the Bayes-optimal estimator admits a closed form when the covariance $\mathbf{R}$ is perfectly known, which serves as the oracle performance bound to benchmark our algorithm), i.e., $\mathbf{\bar{h}}\sim\mathcal{CN}(\mathbf{0},\mathbf{R})$, with a spatial correlation matrix $\mathbf{R}\in\mathbb{C}^{N\times N}$ satisfying $\text{tr}(\mathbf{R})=\beta N$ [3, 8]. Following (4), the correlation matrix $\mathbf{R}$ can be calculated by

\mathbf{R}=\mathbb{E}\{\mathbf{\bar{h}}\mathbf{\bar{h}}^{H}\}=\beta\iint_{-\pi/2}^{\pi/2}f(\varphi,\vartheta)\mathbf{a}(\varphi,\vartheta)\mathbf{a}^{H}(\varphi,\vartheta)\,d\vartheta\, d\varphi. (5)

For any function $f(\varphi,\vartheta)$, the $(l,m)$-th entry of the correlation matrix $\mathbf{R}$ is given by

Figure 1: A UPA-shaped HMIMO BS in a 3D spherical coordinate system with an impinging plane wave from azimuth AoA $\varphi$ and elevation AoA $\vartheta$.
[\mathbf{R}]_{l,m}=\beta\iint_{-\pi/2}^{\pi/2}e^{j\frac{2\pi}{\lambda_{c}}\mathbf{t}(\varphi,\vartheta)^{T}(\mathbf{u}_{l}-\mathbf{u}_{m})}f(\varphi,\vartheta)\,d\vartheta\, d\varphi, (6)

where $[\cdot]_{l,m}$ denotes the $(l,m)$-th element of a matrix. While in most cases the integral in (6) can only be computed numerically, a closed-form solution exists in isotropic scattering environments, where the covariance $\mathbf{R}^{\text{iso}}$ is given in [10] as

[\mathbf{R}^{\text{iso}}]_{l,m}=\text{sinc}\left(\frac{2\|\mathbf{u}_{l}-\mathbf{u}_{m}\|}{\lambda_{c}}\right). (7)

Here, $\text{sinc}(x)\triangleq\frac{\sin(\pi x)}{\pi x}$ denotes the sinc function, while $\|\cdot\|$ is the Euclidean norm. In non-isotropic scattering environments, where the angular density is not evenly distributed over the whole space, $f(\varphi,\vartheta)$ takes many distinct forms in the literature. For example, in [3], the non-isotropic scattering function $f(\varphi,\vartheta)$ is assumed to follow a cosine directivity pattern, while in [4], the leading eigenvalues of $\mathbf{R}^{\text{iso}}$ are truncated to obtain a non-isotropic covariance, which will be detailed in Section IV.
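For illustration, the following is a minimal NumPy sketch of the antenna positions in (1), the array response in (2), and the closed-form isotropic covariance in (7). The function and variable names are illustrative and not part of the proposed estimator.

```python
import numpy as np

def upa_positions(N, d_a):
    """Antenna positions u_n = [u_x, u_y, 0]^T of a sqrt(N) x sqrt(N) UPA, cf. (1)."""
    n = np.arange(N)                                      # 0-based counterpart of n - 1
    side = int(np.sqrt(N))
    u_x = -(side - 1) * d_a / 2 + d_a * np.mod(n, side)
    u_y = (side - 1) * d_a / 2 - d_a * np.floor(n / side)
    return np.stack([u_x, u_y, np.zeros(N)], axis=1)      # shape (N, 3)

def array_response(phi, theta, positions, wavelength):
    """Far-field array response a(phi, vartheta), cf. (2)."""
    t = np.array([np.cos(theta) * np.cos(phi),
                  np.cos(theta) * np.sin(phi),
                  np.sin(theta)])                          # unit vector toward the AoA
    return np.exp(1j * 2 * np.pi / wavelength * positions @ t)

def isotropic_covariance(positions, wavelength):
    """[R_iso]_{l,m} = sinc(2||u_l - u_m|| / lambda_c), cf. (7); np.sinc is sin(pi x)/(pi x)."""
    dist = np.linalg.norm(positions[:, None, :] - positions[None, :, :], axis=-1)
    return np.sinc(2 * dist / wavelength)

# Example: N = 1024 antennas with quarter-wavelength spacing, as in Section IV.
wavelength = 1.0                                           # normalized carrier wavelength
pos = upa_positions(1024, wavelength / 4)
a = array_response(np.pi / 6, np.pi / 12, pos, wavelength)
R_iso = isotropic_covariance(pos, wavelength)
```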

We then introduce the system model. In the uplink channel estimation phase, the UEs send known pilot sequences to the BS. Assuming that orthogonal pilots are utilized, the real-valued equivalent of the received pilot signal (measurement) from an arbitrary UE, $\mathbf{y}\in\mathbb{R}^{2N\times 1}$, is given by (similar to [3], we consider a fully-digital system model, in which the dimensions of $\mathbf{y}$ and $\mathbf{h}$ are identical; nevertheless, the proposed algorithm can be readily extended to compressed sensing-based channel estimation in hybrid analog-digital systems, in a similar manner as [11])

\mathbf{y}=\sqrt{\rho}\mathbf{h}+\mathbf{n}, (8)

where $\mathbf{h}=[\Re(\mathbf{\bar{h}})^{T},\Im(\mathbf{\bar{h}})^{T}]^{T}\in\mathbb{R}^{2N\times 1}$ represents the real-valued channel, $\rho$ is the received signal-to-noise ratio (SNR), and $\mathbf{n}\sim\mathcal{N}(\mathbf{0},\mathbf{I})$ is the additive white Gaussian noise. In practice, the channel estimator should only have knowledge of $\mathbf{y}$, without knowing the true covariance $\mathbf{R}$ or the SNR $\rho$.
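As an illustration of the measurement model, the hedged NumPy sketch below draws a correlated Rayleigh channel $\mathbf{\bar{h}}\sim\mathcal{CN}(\mathbf{0},\mathbf{R})$, stacks its real and imaginary parts, and forms the real-valued pilot observation according to (8). The helper name and the Cholesky-with-jitter construction are assumptions for this sketch.

```python
import numpy as np

def sample_pilot(R, rho, rng=np.random.default_rng(0)):
    """Draw h_bar ~ CN(0, R), stack real/imaginary parts, and return (y, h) as in (8)."""
    N = R.shape[0]
    L_chol = np.linalg.cholesky(R + 1e-9 * np.eye(N))   # jitter: R may be rank-deficient
    w = (rng.standard_normal(N) + 1j * rng.standard_normal(N)) / np.sqrt(2)
    h_bar = L_chol @ w                                   # correlated Rayleigh fading
    h = np.concatenate([h_bar.real, h_bar.imag])         # real-valued channel of length 2N
    n = rng.standard_normal(2 * N)                       # unit-variance AWGN
    return np.sqrt(rho) * h + n, h

# Example usage with the isotropic covariance from the previous sketch:
# y, h = sample_pilot(R_iso, rho=10.0)
```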

III MMSE Estimation via Score Matching

In the following, we first discuss how to derive the score-based MMSE estimator solely based on the received pilots $\mathbf{y}$, and then introduce how to estimate the two key components of the algorithm, i.e., the score function and the received SNR.

III-A Bridging MMSE Estimation with the Score Function

Our target is a Bayes-optimal channel estimator, $\mathbf{\hat{h}}=D(\mathbf{y})$, that minimizes the mean-square-error (MSE), i.e.,

\text{MSE}\triangleq\mathbb{E}(\|\mathbf{h}-\hat{\mathbf{h}}\|^{2}\,|\,\mathbf{y})=\int\|\mathbf{h}-\mathbf{\hat{h}}\|^{2}p(\mathbf{h}|\mathbf{y})\,d\mathbf{h}, (9)

where the expectation is taken with respect to (w.r.t.) the unknown channel $\mathbf{h}$, and $p(\mathbf{h}|\mathbf{y})$ is the posterior density. Taking the derivative of the above equation w.r.t. $\hat{\mathbf{h}}$ and setting it to zero, we reach the Bayes-optimal, i.e., minimum MSE (MMSE), channel estimator, given by

\hat{\mathbf{h}}_{\text{MMSE}}=\mathbb{E}(\mathbf{h}|\mathbf{y})=\int\mathbf{h}\,p(\mathbf{h}|\mathbf{y})\,d\mathbf{h}=\int\mathbf{h}\frac{p(\mathbf{h},\mathbf{y})}{p(\mathbf{y})}\,d\mathbf{h}, (10)

where $p(\mathbf{h},\mathbf{y})$ is the joint density, and $p(\mathbf{y})$ is the measurement density obtained via marginalization, i.e.,

p(\mathbf{y})=\int p(\mathbf{h},\mathbf{y})\,d\mathbf{h}=\int p(\mathbf{y}|\mathbf{h})p(\mathbf{h})\,d\mathbf{h} (11)
=\left(\frac{\rho}{2\pi}\right)^{N}\int\exp\left\{-\frac{\rho}{2}\|\mathbf{y}-\mathbf{h}\|^{2}\right\}p(\mathbf{h})\,d\mathbf{h}.

The last equality holds since the likelihood function $p(\mathbf{y}|\mathbf{h})\sim\mathcal{N}(\mathbf{h},\frac{1}{\rho}\mathbf{I})$ expresses $p(\mathbf{y})$ as a convolution between the prior distribution $p(\mathbf{h})$ and the i.i.d. Gaussian noise. Taking the derivative of both sides of (11) w.r.t. $\mathbf{y}$ gives

\nabla_{\mathbf{y}}p(\mathbf{y})=\left(\frac{\rho}{2\pi}\right)^{N}\int\nabla_{\mathbf{y}}\exp\left\{-\frac{\rho}{2}\|\mathbf{y}-\mathbf{h}\|^{2}\right\}p(\mathbf{h})\,d\mathbf{h} (12)
=\rho\left(\frac{\rho}{2\pi}\right)^{N}\int(\mathbf{h}-\mathbf{y})\exp\left\{-\frac{\rho}{2}\|\mathbf{y}-\mathbf{h}\|^{2}\right\}p(\mathbf{h})\,d\mathbf{h}
=\rho\int(\mathbf{h}-\mathbf{y})p(\mathbf{y}|\mathbf{h})p(\mathbf{h})\,d\mathbf{h}.

Dividing both sides of (12) by $p(\mathbf{y})$ results in the following:

\frac{\nabla_{\mathbf{y}}p(\mathbf{y})}{p(\mathbf{y})}=\rho\int(\mathbf{h}-\mathbf{y})\frac{p(\mathbf{y}|\mathbf{h})p(\mathbf{h})}{p(\mathbf{y})}\,d\mathbf{h} (13)
=\rho\int(\mathbf{h}-\mathbf{y})p(\mathbf{h}|\mathbf{y})\,d\mathbf{h}
=\rho\int\mathbf{h}\,p(\mathbf{h}|\mathbf{y})\,d\mathbf{h}-\rho\mathbf{y}\int p(\mathbf{h}|\mathbf{y})\,d\mathbf{h}
=\rho\left(\mathbf{\hat{h}}_{\text{MMSE}}-\mathbf{y}\right),

where the second equality holds owing to Bayes' theorem, and the last equality holds due to (10) and $\int p(\mathbf{h}|\mathbf{y})\,d\mathbf{h}=1$. By rearranging the terms and plugging in $\frac{\nabla_{\mathbf{y}}p(\mathbf{y})}{p(\mathbf{y})}=\nabla_{\mathbf{y}}\log p(\mathbf{y})$, we reach the foundation of the proposed algorithm:

\hat{\mathbf{h}}_{\text{MMSE}}=\mathbf{y}+\frac{1}{\rho}\nabla_{\mathbf{y}}\log p(\mathbf{y}), (14)

where $\nabla_{\mathbf{y}}\log p(\mathbf{y})$ is called Stein's score function in statistics [12]. From (14), we notice that the Bayes-optimal MMSE channel estimator can be obtained solely from the received pilot signals $\mathbf{y}$, without access to the prior distribution $p(\mathbf{h})$ or a supervised dataset of the ground-truth channels $\mathbf{h}$, which are unavailable in practice. With an efficient estimator of the score function $\nabla_{\mathbf{y}}\log p(\mathbf{y})$, the Bayes-optimal MMSE channel estimator can be computed in closed form with extremely low complexity. In addition, since (14) holds regardless of the distribution of the HMIMO channel $\mathbf{h}$, one can construct the MMSE estimator in arbitrary EM environments without any assumptions on the scatterers or the array geometry.
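To illustrate how lightweight the inference rule (14) is, a minimal NumPy sketch follows. The callable score_fn is a placeholder for the trained score network of Section III-B, and the toy check assumes the likelihood convention of (11) (effective noise variance $1/\rho$) together with a Gaussian prior, for which the true score is available in closed form.

```python
import numpy as np

def score_based_mmse(y, score_fn, rho_hat):
    """h_hat = y + (1 / rho_hat) * grad_y log p(y), cf. (14)."""
    return y + score_fn(y) / rho_hat

# Toy check: with p(y|h) = N(h, I/rho) as in (11) and a Gaussian prior h ~ N(0, I),
# the score of p(y) is -y / (1 + 1/rho), and (14) reduces to the classical MMSE
# shrinkage rho / (rho + 1) * y.
rho = 10.0
h = np.random.randn(2048)                      # real-valued channel, 2N = 2048
y = h + np.random.randn(2048) / np.sqrt(rho)   # pilots under the convention of (11)
h_hat = score_based_mmse(y, lambda v: -v / (1.0 + 1.0 / rho), rho)
```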

Two terms need to be obtained in (14), namely the received SNR $\rho$ and the measurement score function $\nabla_{\mathbf{y}}\log p(\mathbf{y})$. In the following, we discuss how to utilize machine learning tools to obtain accurate estimates of both, based solely on the measurement $\mathbf{y}$.

III-B Self-Supervised Learning of the Score Function

We now discuss how to obtain the score function $\nabla_{\mathbf{y}}\log p(\mathbf{y})$. Given that a closed-form expression is intractable, we instead aim to learn a parameterized approximation with a neural network, trained via score matching. We first introduce the denoising auto-encoder (DAE) [12], the core of the training process, and explain how to utilize it to approximate the score function.

To obtain the score function $\nabla_{\mathbf{y}}\log p(\mathbf{y})$, the measurement $\mathbf{y}$ is treated as the target signal that the DAE should denoise. The general idea is to obtain the score function $\nabla_{\mathbf{y}}\log p(\mathbf{y})$ based on its analytical relationship with the DAE of $\mathbf{y}$, which will be established later in Theorem 1. We first construct a noisy version of the target signal, $\tilde{\mathbf{y}}=\mathbf{y}+\sigma\mathbf{u}$, by manually adding white Gaussian noise $\sigma\mathbf{u}$, where $\mathbf{u}\sim\mathcal{N}(\mathbf{0},\mathbf{I}_{2N})$ and $\sigma$ controls the noise level (note that the extra noise is only added during the training process), and then train a DAE to remove the manually added noise. The DAE, denoted by $R_{\bm{\theta}}(\cdot;\cdot)$, takes $\tilde{\mathbf{y}}$ as input and is trained with the $\ell_{2}$-loss function, i.e.,

\mathcal{L}_{\text{DAE}}(\bm{\theta})=\mathbb{E}\|\mathbf{y}-R_{\bm{\theta}}(\tilde{\mathbf{y}};\sigma)\|^{2}. (15)

The theorem below explains the relationship between the score function and the trained DAE.

Theorem 1 (Alain-Bengio [12, Theorem 1]).

The optimal DAE, $R_{\bm{\theta}^{*}}(\cdot;\cdot)$, behaves asymptotically as

R_{\bm{\theta}^{*}}(\mathbf{y};\sigma)=\mathbf{y}+\sigma^{2}\nabla_{\mathbf{y}}\log p(\mathbf{y})+o(\sigma^{2}),\,\,\text{as}\,\,\sigma\rightarrow 0. (16)
Proof:

Please refer to [12, Appendix A]. ∎

The above theorem indicates that, for a sufficiently small $\sigma$, we can approximate the score function based on the DAE by $\nabla_{\mathbf{y}}\log p(\mathbf{y})\approx\frac{R_{\bm{\theta}}(\mathbf{y};\sigma)-\mathbf{y}}{\sigma^{2}}$, assuming that the parameters of the DAE, $\bm{\theta}$, are near-optimal, i.e., $\bm{\theta}\approx\bm{\theta}^{*}$. Nevertheless, the approximation can be numerically unstable since the denominator, $\sigma^{2}$, is close to zero. To alleviate this problem, we improve the structure of the DAE and rescale the original loss function.

First, we consider a residual form of the DAE with a scaling factor. Specifically, let $R_{\bm{\theta}}(\mathbf{y};\sigma)=\sigma^{2}S_{\bm{\theta}}(\mathbf{y};\sigma)+\mathbf{y}$. Plugging it into (16), the score function is approximately equal to

\nabla_{\mathbf{y}}\log p(\mathbf{y})\approx\frac{(\sigma^{2}S_{\bm{\theta}}(\mathbf{y};\sigma)+\mathbf{y})-\mathbf{y}}{\sigma^{2}}=S_{\bm{\theta}}(\mathbf{y};\sigma), (17)

as $\sigma\rightarrow 0$. This reparameterization enables $S_{\bm{\theta}}(\mathbf{y};\sigma)$ to approximate the score function directly, thereby circumventing the division that may lead to numerical instability. Also, the residual link significantly enhances the capability of the DAE, since it can easily learn an identity mapping [13].

Second, since the variance $\sigma^{2}$ of the manually added noise is small, the gradient of the DAE loss function (15) can easily vanish and lead to difficulties in training. Hence, we rescale the denoising error by a factor of $\frac{1}{\sigma}$ to guard against the vanishing gradient problem, i.e.,

\mathcal{L}_{\text{DAE}}(\bm{\theta})=\mathbb{E}\|\mathbf{u}+\sigma S_{\bm{\theta}}(\tilde{\mathbf{y}};\sigma)\|^{2}, (18)

where the reparameterized DAE in (17) has been substituted into (15).
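A hedged PyTorch-style sketch of the reparameterized score network and the rescaled loss (18) is given below. The small MLP ScoreNet is only a stand-in for the simplified UNet used in this work, and all names are illustrative; the network is evaluated at the noise-perturbed pilots, as described above.

```python
import torch
import torch.nn as nn

class ScoreNet(nn.Module):
    """S_theta(y; sigma): directly outputs an approximation of the score of p(y)."""
    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim + 1, 512), nn.SiLU(),
                                 nn.Linear(512, 512), nn.SiLU(),
                                 nn.Linear(512, dim))

    def forward(self, y, sigma):
        # Condition on the manually added noise level sigma by concatenation.
        return self.net(torch.cat([y, sigma], dim=-1))

def dae_loss(score_net, y, sigma):
    """Rescaled loss (18): E || u + sigma * S_theta(y + sigma * u; sigma) ||^2."""
    u = torch.randn_like(y)                    # manually added unit-variance noise
    y_noisy = y + sigma * u                    # noisy copy of the received pilots
    return (u + sigma * score_net(y_noisy, sigma)).pow(2).sum(dim=-1).mean()
```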

We are interested in the regime where $\sigma$ is sufficiently close to zero, in which case $S_{\bm{\theta}}(\mathbf{y};0)$ can be deemed equal to the score function $\nabla_{\mathbf{y}}\log p(\mathbf{y})$ according to (17). Nevertheless, directly training the network with a very small $\sigma$ is difficult, since the SNR of the gradient signal decreases at a linear rate $\mathcal{O}(\sigma)$ with respect to $\sigma$, which causes difficulties for stochastic gradient descent [14]. To exploit the asymptotic optimality of the score function approximation as $\sigma\rightarrow 0$, we propose to train the network $S_{\bm{\theta}}(\mathbf{y};\sigma)$ with varying $\sigma$ values, so that it handles various noise levels and then naturally generalizes to the desired regime, i.e., $S_{\bm{\theta}}(\mathbf{y};0)$. To achieve this goal, we control the manually added noise by letting $\sigma$ follow a zero-mean Gaussian distribution $\sigma\sim\mathcal{N}(0,\xi^{2})$ and gradually annealing $\xi\in[\sigma_{\text{min}},\sigma_{\text{max}}]$ from a large value $\sigma_{\text{max}}$ to a small one $\sigma_{\text{min}}\approx 0$ over the training iterations. That is, we condition $S_{\bm{\theta}}(\mathbf{y};\sigma)$ on the manually added noise level $\sigma$ during training.

The proposed algorithm is shown in Algorithm 1. The DAE is trained using stochastic gradient descent for $Q$ epochs. In each epoch, we draw a random vector $\mathbf{u}$ and anneal $\xi$ in $\sigma\sim\mathcal{N}(0,\xi^{2})$ to control the extra noise level according to the current epoch index $q$. Then, the DAE loss function $\mathcal{L}_{\text{DAE}}$ in (18) is minimized by stochastic optimization. Note that the training process requires nothing but a dataset of received pilot signals $\mathbf{y}$, which is readily available in practice. In the inference stage, one can apply formula (14) to compute the score-based MMSE estimate, in which the score function is approximated by $S_{\bm{\theta}}(\mathbf{y};0)$, i.e., setting $\sigma$ to zero, and the received SNR $\hat{\rho}$ is estimated by the PCA-based algorithm in the next subsection.

For the neural architecture of the DAE $S_{\bm{\theta}}(\cdot;\cdot)$, we adopt a simplified UNet architecture [15]. Note that, depending on the complexity budget, many other prevailing neural architectures could also be applied [6]. Other details of the training process are deferred to Section IV.

Algorithm 1 Training and inference of the proposed algorithm
1:  /* The offline training stage */
2:  Input: Learning rate $\gamma$, maximum extra noise level $\sigma_{\text{max}}$, minimum extra noise level $\sigma_{\text{min}}$, number of epochs $Q$, a dataset of received pilots $\{\mathbf{y}_{i}\}_{i=1}^{M}$
3:  Output: Trained DAE parameters $\bm{\theta}$
4:  for $q=1:Q$ do
5:      Draw $\mathbf{u}\sim\mathcal{N}(\mathbf{0},\mathbf{I}_{2N})$
6:      Set the extra noise level with $\xi\leftarrow\frac{Q-q}{Q}\sigma_{\text{max}}+\frac{q}{Q}\sigma_{\text{min}}$
7:      Compute the loss function $\mathcal{L}_{\text{DAE}}$ as in (18)
8:      Update NN parameters as $\bm{\theta}\leftarrow\bm{\theta}-\gamma\nabla_{\bm{\theta}}\mathcal{L}_{\text{DAE}}$
9:  return $\bm{\theta}$
10:  /* The online inference stage */
11:  Input: Received pilots $\mathbf{y}$, trained DAE parameters $\bm{\theta}$, size of the sliding window $\sqrt{d}\times\sqrt{d}$
12:  Initialize: $\nabla_{\mathbf{y}}\log p(\mathbf{y})\leftarrow S_{\bm{\theta}}(\mathbf{y};0)$
13:  Utilize the PCA-based algorithm [16] to estimate the SNR $\hat{\rho}$
14:  Compute the estimated channel $\mathbf{\hat{h}}_{\text{MMSE}}$ as in (14)
15:  return $\mathbf{\hat{h}}_{\text{MMSE}}$
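The sketch below mirrors Algorithm 1 in PyTorch-style code. It reuses the ScoreNet and dae_loss definitions from the previous sketch; the mini-batch sampling, optimizer, and batch size are assumptions not specified in the algorithm, and the inference step implements (14).

```python
import torch

def train_score_net(score_net, pilots, Q=100, lr=1e-3,
                    sigma_max=0.1, sigma_min=1e-3, batch_size=128):
    """Offline training stage of Algorithm 1 on a pilot dataset of shape (M, 2N)."""
    opt = torch.optim.Adam(score_net.parameters(), lr=lr)
    M = pilots.shape[0]
    for q in range(1, Q + 1):
        # Line 6: anneal the extra noise scale xi from sigma_max down to sigma_min.
        xi = (Q - q) / Q * sigma_max + q / Q * sigma_min
        idx = torch.randint(0, M, (batch_size,))
        y = pilots[idx]
        sigma = xi * torch.randn(batch_size, 1)   # sigma ~ N(0, xi^2), cf. Section III-B
        loss = dae_loss(score_net, y, sigma)      # line 7: rescaled loss (18)
        opt.zero_grad()
        loss.backward()
        opt.step()                                 # line 8: gradient update
    return score_net

def estimate_channel(score_net, y, rho_hat):
    """Online inference stage: approximate the score by S_theta(y; 0), then apply (14)."""
    with torch.no_grad():
        score = score_net(y, torch.zeros(y.shape[0], 1))
    return y + score / rho_hat
```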

III-C PCA-Based Received SNR Estimation

We propose a low-complexity PCA-based algorithm to estimate the received SNR $\rho$ in (14) from a single instance of the pilot signals $\mathbf{y}$. Before further discussion, we stress that the received SNR estimation is executed only at the inference stage, not at the training stage, as shown in Algorithm 1.

The basic idea behind the PCA-based algorithm is the low-rankness of the HMIMO spatial correlation matrix $\mathbf{R}$ due to the dense deployment of the antenna elements. Specifically, for isotropic scattering environments, the rank of the correlation matrix $\mathbf{R}^{\text{iso}}$ is approximately $\text{rank}(\mathbf{R}^{\text{iso}})\approx\pi Nd_{a}^{2}/\lambda_{c}^{2}$ [17]. It decreases as the antenna spacing shrinks and the carrier frequency increases. For example, when $d_{a}=\lambda_{c}/4$, around $80\%$ of the eigenvalues of $\mathbf{R}^{\text{iso}}$ shrink towards zero. The rank deficiency of $\mathbf{R}$ tends to be even more prominent in non-isotropic scattering environments [4].

Similar to Fig. 2 in our previous work [16], we decompose the HMIMO channel into multiple virtual subarray channels (VSCs) using a sliding window. Specifically, we reshape the real-valued HMIMO channel $\mathbf{h}\in\mathbb{R}^{2N\times 1}$ into a tensor $\mathbf{H}\in\mathbb{R}^{\sqrt{N}\times\sqrt{N}\times 2}$. We then decompose $\mathbf{H}$ into $s=(\sqrt{N}-\sqrt{d}+1)^{2}$ VSC tensors using a sliding window of size $\sqrt{d}\times\sqrt{d}\times 2$, and reshape them back into vector form to obtain a set of VSCs, denoted by $\{\mathbf{h}_{t}\in\mathbb{R}^{2d\times 1}\}_{t=1}^{s}$. Similarly, the received pilot signals and the noise can be decomposed as $\{\mathbf{y}_{t}\in\mathbb{R}^{2d\times 1}\}_{t=1}^{s}$ and $\{\mathbf{n}_{t}\in\mathbb{R}^{2d\times 1}\}_{t=1}^{s}$, respectively, and satisfy

\mathbf{y}_{t}=\sqrt{\rho}\mathbf{h}_{t}+\mathbf{n}_{t}. (19)

Due to the low-rankness of the spatial correlation matrix $\mathbf{R}$, the decomposed VSCs $\mathbf{h}_{t}$ also lie in a low-dimensional subspace. In Fig. 2, we plot the eigenvalues of the covariance matrices of the decomposed VSCs $\{\mathbf{h}_{t}\}_{t=1}^{s}$ from an HMIMO channel in isotropic scattering environments and of their corresponding received pilots $\{\mathbf{y}_{t}\}_{t=1}^{s}$, with a reference line marking the inverse of the received SNR, $\frac{1}{\rho}$. According to the figure, the eigenvalues of the covariance of $\mathbf{h}_{t}$ quickly shrink to zero beyond about 30 principal dimensions. The zero eigenvalues correspond to the redundant dimensions. In contrast, the eigenvalues of the covariance of $\mathbf{y}_{t}$ concentrate around $\frac{1}{\rho}$ in the redundant dimensions. This example suggests that it is indeed possible to estimate the received SNR from the redundant eigenvalues. Rigorously, these eigenvalues can be proved to follow a Gaussian distribution $\mathcal{N}(\frac{1}{\rho},\frac{2}{s\rho^{2}})$ [16]. Moreover, the principal and redundant eigenvalues can be separated by an iterative process. Hence, we leverage [16, Algorithm 1] to accurately estimate the received SNR. Detailed setups are discussed in Section IV.
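A simplified NumPy sketch of the PCA-based SNR estimation is given below. The sliding-window decomposition follows the description above, while the median-based split between principal and redundant eigenvalues is only a crude stand-in for the iterative separation of [16, Algorithm 1]; it assumes the redundant eigenvalues cluster around $\frac{1}{\rho}$, as illustrated in Fig. 2.

```python
import numpy as np

def estimate_inv_snr(y, N, d):
    """Estimate 1/rho from one real-valued pilot vector y of length 2N."""
    side, win = int(np.sqrt(N)), int(np.sqrt(d))
    # Tensor form of size sqrt(N) x sqrt(N) x 2 (real part, imaginary part).
    Y = np.stack([y[:N].reshape(side, side), y[N:].reshape(side, side)], axis=-1)
    vscs = []
    for i in range(side - win + 1):               # slide the window over the UPA
        for j in range(side - win + 1):
            vscs.append(Y[i:i + win, j:j + win, :].reshape(-1))
    V = np.stack(vscs)                            # s x 2d matrix of VSC observations
    C = V.T @ V / V.shape[0]                      # sample covariance of the VSCs
    eig = np.sort(np.linalg.eigvalsh(C))          # eigenvalues in ascending order
    redundant = eig[eig < np.median(eig)]         # crude split; see [16, Alg. 1] instead
    return redundant.mean()                        # redundant eigenvalues cluster at 1/rho

# Example with the Section IV setup: N = 1024 antennas and a 7 x 7 window (d = 49).
# inv_rho_hat = estimate_inv_snr(y, N=1024, d=49)
```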

Figure 2: Eigenvalues of the covariance matrices in descending order from an HMIMO channel in isotropic scattering environments and their corresponding received pilots, when the received SNR is 0 dB.

III-D Complexity Analysis

The inference complexity of the proposed algorithm consists of the computation of $S_{\bm{\theta}}(\mathbf{y};0)$ and the PCA-based estimation of $\rho$. The former depends on the specific neural architecture of $S_{\bm{\theta}}(\cdot,\cdot)$, and costs a constant complexity, denoted by $p$, once the network is trained. The computational complexity of the latter, as analyzed in [16], is $\mathcal{O}(Nd^{2}+d^{3})$, where $d$, the size of the sliding window, is usually quite small. Hence, the overall complexity is $\mathcal{O}(Nd^{2}+d^{3}+p)$, which scales only linearly with the number of antennas $N$.

In practice, the received SNR may not fluctuate frequently. In such a case, it is not necessary to estimate the received SNR for every instance of the received pilot signals $\mathbf{y}$. Therefore, the actual complexity of the proposed score-based algorithm lies between $\mathcal{O}(p)$ and $\mathcal{O}(Nd^{2}+d^{3}+p)$, which is extremely efficient given its near-optimal performance (most previous works on channel estimation assume that the received SNR is perfectly known; in this case, the complexity reduces to $\mathcal{O}(p)$). By sharp contrast, the (oracle) MMSE estimator requires time-consuming matrix inversion, whose complexity is $\mathcal{O}(N^{3})$. In Section IV, we provide running time results to offer a straightforward comparison.

IV Simulation Results

IV-A Simulation Setups

We consider a typical HMIMO system setup with $N=1024$ and $d_{a}=\lambda_{c}/4$. In the training stage, the hyper-parameters of the proposed score-based algorithm are chosen as $\gamma=0.001$, $\sigma_{\text{min}}=0.001$, $\sigma_{\text{max}}=0.1$, $Q=100$, and $M=10{,}000$. The learning rate $\gamma$ is halved after every 25 epochs. In the inference stage, the size of the sliding window is set to $7\times 7$. The performance reported below is averaged over a held-out dataset consisting of $L=10{,}000$ testing samples.

TABLE I: Performance of the received SNR estimation
$\frac{1}{\rho}$ (SNR)  | Method   | Bias     | Std    | RMSE
1.0000 (0 dB)           | Oracle   | 0.0004   | 0.0160 | 0.0160
                        | Proposed | 0.0080   | 0.0371 | 0.0380
                        | Sparsity | 0.1769   | 0.0277 | 0.1790
0.1778 (15 dB)          | Oracle   | <0.0001  | 0.0028 | 0.0028
                        | Proposed | 0.0031   | 0.0064 | 0.0071
                        | Sparsity | 0.1530   | 0.0163 | 0.1539

For isotropic scattering, the covariance $\mathbf{R}^{\text{iso}}$ is given via (7). In the non-isotropic case, we follow [4] and construct the covariance $\mathbf{R}$ by truncating the leading $\text{rank}(\mathbf{R}^{\text{iso}})/8$ eigenvalues of $\mathbf{R}^{\text{iso}}$. Mathematically, the non-isotropic covariance can be expressed as $\mathbf{R}=\mathbf{V}\boldsymbol{\Lambda}_{\text{trunc}}\mathbf{V}^{H}$, where $\mathbf{V}$ contains the eigenvectors of $\mathbf{R}^{\text{iso}}$, and $\boldsymbol{\Lambda}_{\text{trunc}}$ is a diagonal matrix that keeps only the leading $\text{rank}(\mathbf{R}^{\text{iso}})/8$ eigenvalues of $\mathbf{R}^{\text{iso}}$, arranged in descending order, and sets the remaining ones to zero.
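For reproducibility, the following is a short NumPy sketch of this truncation; the function name and the numerical-rank tolerance are illustrative assumptions. It eigendecomposes $\mathbf{R}^{\text{iso}}$, keeps the leading eighth of its numerically nonzero eigenvalues, and zeroes out the rest.

```python
import numpy as np

def truncated_covariance(R_iso, keep_fraction=1 / 8, tol=1e-6):
    """Non-isotropic covariance R = V * Lambda_trunc * V^H built from R_iso, following [4]."""
    eigval, eigvec = np.linalg.eigh(R_iso)
    order = np.argsort(eigval)[::-1]                  # sort eigenvalues in descending order
    eigval, eigvec = eigval[order], eigvec[:, order]
    rank = int(np.sum(eigval > tol * eigval[0]))      # numerical rank of R_iso
    k = max(1, int(rank * keep_fraction))             # keep the leading rank/8 eigenvalues
    trunc = np.zeros_like(eigval)
    trunc[:k] = eigval[:k]
    return (eigvec * trunc) @ eigvec.conj().T

# Example: R_noniso = truncated_covariance(R_iso) with R_iso from the Section II sketch.
```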

IV-B Accuracy of Received SNR Estimation

The accuracy of the received SNR estimation can influence the performance of the proposed score-based estimator. Hence, different from previous works that assume perfect knowledge of the received SNR, we propose a practical PCA-based means to estimate it and examine its performance. In Table I, we list the estimation accuracy under different SNRs. We present the performance in terms of the inverse of the SNR, i.e., $\frac{1}{\rho}$, since this is the quantity used in (14). The bias, the standard deviation (std), and the root MSE (RMSE) are given by $\mathbb{E}[|\frac{1}{\rho}-\mathbb{E}[\frac{1}{\hat{\rho}}]|]$, $\sqrt{\mathbb{E}[(\frac{1}{\hat{\rho}}-\mathbb{E}[\frac{1}{\hat{\rho}}])^{2}]}$, and $\sqrt{\mathbb{E}[(\frac{1}{\rho}-\frac{1}{\hat{\rho}})^{2}]}$, respectively. The estimator's accuracy and robustness are reflected by the bias and the std, while the RMSE provides an assessment of its overall performance. From Table I, we observe that the proposed PCA-based method is both highly accurate and robust, and outperforms the sparsity-based median absolute deviation (MAD) estimator in [18], as HMIMO channels are not exactly sparse. Also, its performance is close to the oracle bound, which assumes perfect knowledge of the channel $\mathbf{h}$ and estimates $\rho$ directly from $\mathbf{y}-\mathbf{h}$ [16]. Later, in Section IV-D, we will illustrate that the proposed score-based channel estimator is robust to SNR estimation errors.

Figure 3: Simulation results. (a) NMSE versus SNR in isotropic scattering. (b) NMSE versus SNR in non-isotropic scattering. (c) The influence of the accuracy of the received SNR estimation on the NMSE performance, when the received SNR is $\rho=10$ dB.

IV-C Normalized MSE (NMSE) in Isotropic and Non-Isotropic Scattering Environments

We compare the proposed score-based estimator with three benchmarks: LS, sample MMSE, and oracle MMSE. Here, oracle MMSE refers to the MMSE estimator with perfect knowledge of both the covariance $\mathbf{R}$ and the received SNR $\rho$, which is the Bayesian performance bound, given by

\mathbf{\hat{h}}_{\text{oracle-MMSE}}=\sqrt{\rho}\mathbf{R}(\rho\mathbf{R}+\mathbf{I})^{-1}\mathbf{y}. (20)

This is difficult, if not impossible, to implement in practice, since $\mathbf{R}$ contains $N^{2}$ entries and is prohibitive to estimate when $N$ is particularly large in an HMIMO system. The sample MMSE method utilizes the same equation as (20), but replaces the true covariance $\mathbf{R}$ with an estimate $\mathbf{R}_{\text{sample}}$ computed from the testing samples, given by $\mathbf{R}_{\text{sample}}\triangleq\frac{1}{L}\sum_{l=1}^{L}\mathbf{y}_{l}\mathbf{y}_{l}^{H}-\frac{1}{\rho}\mathbf{I}$ [8], where $L$ is the number of testing samples and $\mathbf{y}_{l}$ denotes the $l$-th received pilot signal in the testing dataset. The sample MMSE method also uses the perfect received SNR $\rho$.
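The two MMSE baselines can be summarized by the brief NumPy sketch below: the oracle MMSE rule (20) with the true $\mathbf{R}$ and $\rho$, and the sample MMSE variant that plugs in $\mathbf{R}_{\text{sample}}$ computed from the testing pilots. This is a hedged illustration of the benchmark formulas, not of the proposed estimator.

```python
import numpy as np

def oracle_mmse(y, R, rho):
    """h_hat = sqrt(rho) * R * (rho * R + I)^{-1} * y, cf. (20)."""
    return np.sqrt(rho) * R @ np.linalg.solve(rho * R + np.eye(R.shape[0]), y)

def sample_mmse(y, pilots, rho):
    """Same rule with R replaced by R_sample = (1/L) sum_l y_l y_l^H - (1/rho) I."""
    L, N = pilots.shape
    R_sample = pilots.conj().T @ pilots / L - np.eye(N) / rho
    return oracle_mmse(y, R_sample, rho)

# Usage: pilots is an (L, N) array of testing received pilots y_l, and y is one measurement.
```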

In Fig. 3(a), we present the NMSE as a function of the SNR $\rho$ in isotropic scattering environments. The proposed score-based algorithm significantly outperforms the LS and the sample MMSE estimators, and achieves almost the same NMSE as the oracle MMSE bound at every SNR level. Note that the oracle MMSE method utilizes the true covariance and received SNR, whereas the proposed method requires neither.

In Fig. 3(b), we compare the NMSE in non-isotropic environments, in which the rank of the covariance matrix $\mathbf{R}$ is further reduced. As a result, the performance gap between the LS and the oracle MMSE estimators is enlarged as the rank deficiency becomes more significant. Nevertheless, similar to the isotropic case, the proposed score-based estimator exhibits almost the same performance as the oracle bound and significantly outperforms the sample MMSE method, illustrating its effectiveness in different scattering environments.

IV-D Robustness to SNR Estimation Errors

In Fig. 3(c), we examine how the accuracy of the received SNR estimation influences the performance of the proposed score-based estimator. In the simulations, the true received SNR is set to 10 dB, i.e., $\frac{1}{\rho}=0.1$. We vary the value of $\frac{1}{\hat{\rho}}$ adopted in (14) within $[0,0.24]$, and plot the corresponding NMSE performance curve in blue. In particular, we use red and green dots to denote the NMSE achieved with the estimated and the true SNR values, respectively. The LS and the oracle MMSE algorithms are presented as the performance upper and lower bounds. It is observed that even when an inexact received SNR is adopted, the performance of the score-based algorithm remains quite robust and significantly outperforms the LS method. Also, the received SNR estimated by the PCA-based method is accurate enough to offer near-optimal performance, even in unknown EM environments.

IV-E Running Time Complexity

We report the CPU running time of the proposed algorithm. For the considered setups, the proposed score-based estimator takes only about 3 ms on an Intel Core i7-9750H CPU, which is much shorter than that of the oracle MMSE method involving a high-dimensional matrix inversion (around 70 ms). The high efficiency and the Bayes-optimal performance in unknown EM environments thus make the proposed score-based algorithm an ideal candidate in practice.

V Conclusion and Future Directions

In this paper, we studied channel estimation for HMIMO systems, and proposed a score-based MMSE channel estimator that achieves Bayes-optimal performance with extremely low complexity. In particular, the proposed algorithm is trained solely on the received pilots, without requiring any priors or supervised datasets that are prohibitive to collect in practice. This enables it to work in the arbitrary and unknown EM environments that may appear in real-world deployments. As a future direction, it is interesting to extend the proposed framework to compressed sensing-based setups. Furthermore, since the proposed algorithm is self-supervised, it is promising to investigate the possibility of online learning and adaptation.

References

  • [1] W. Yu, H. He, X. Yu, S. Song, J. Zhang, R. D. Murch, and K. B. Letaief, “Bayes-optimal unsupervised learning for channel estimation in near-field holographic MIMO,” arXiv preprint arXiv:2312.10438, 2023.
  • [2] A. Pizzo, T. L. Marzetta, and L. Sanguinetti, “Spatially-stationary model for holographic MIMO small-scale fading,” IEEE J. Sel. Areas Commun., vol. 38, no. 9, pp. 1964–1979, Sept. 2020.
  • [3] O. T. Demir, E. Bjornson, and L. Sanguinetti, “Channel modeling and channel estimation for holographic massive MIMO with planar arrays,” IEEE Wireless Commun. Lett., vol. 11, no. 5, pp. 997–1001, May 2022.
  • [4] J. An, C. Yuen, C. Huang, M. Debbah, H. V. Poor, and L. Hanzo, “A tutorial on holographic MIMO communications–part I: Channel modeling and channel estimation,” IEEE Commun. Lett., vol. 27, no. 7, pp. 1664–1668, Jul. 2023.
  • [5] W. Yu, Y. Shen, H. He, X. Yu, S. Song, J. Zhang, and K. B. Letaief, “An adaptive and robust deep learning framework for THz ultra-massive MIMO channel estimation,” IEEE J. Sel. Topics Signal Process., vol. 17, no. 4, pp. 761–776, Jul. 2023.
  • [6] W. Yu, Y. Ma, H. He, S. Song, J. Zhang, and K. B. Letaief, “AI-native transceiver design for near-field ultra-massive MIMO: Principles and techniques,” arXiv preprint arXiv:2309.09575, 2023.
  • [7] H. He, C.-K. Wen, S. Jin, and G. Y. Li, “Model-driven deep learning for MIMO detection,” IEEE Trans. Signal Process., vol. 68, pp. 1702–1715, Feb. 2020.
  • [8] A. A. D’Amico, G. Bacci, and L. Sanguinetti, “DFT-based channel estimation for holographic MIMO,” in Proc. Asilomar Conf. Signals Syst. Comput., Pacific Grove, CA, USA, Nov. 2023.
  • [9] Z. Wang, J. Zhang, H. Du, D. Niyato, S. Cui, B. Ai, M. Debbah, K. B. Letaief, and H. V. Poor, “A tutorial on extremely large-scale MIMO for 6G: Fundamentals, signal processing, and applications,” arXiv preprint arXiv:2307.07340, 2023.
  • [10] A. de Jesus Torres, L. Sanguinetti, and E. Bjornson, “Electromagnetic interference in RIS-aided communications,” IEEE Wireless Commun. Lett., vol. 11, no. 4, pp. 668–672, Apr. 2022.
  • [11] C. A. Metzler, A. Maleki, and R. G. Baraniuk, “From denoising to compressed sensing,” IEEE Trans. Inf. Theory, vol. 62, no. 9, pp. 5117–5144, Sept. 2016.
  • [12] G. Alain and Y. Bengio, “What regularized auto-encoders learn from the data-generating distribution,” J. Mach. Learn. Res., vol. 15, no. 1, pp. 3563–3593, Nov. 2014.
  • [13] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proc. IEEE Conf. Comput. Vis. Pattern Recog., Las Vegas, NV, USA, Jun. 2016.
  • [14] J. H. Lim, A. Courville, C. Pal, and C.-W. Huang, “AR-DAE: towards unbiased neural entropy gradient estimation,” in Proc. Int. Conf. Mach. Learn., Virtual, Jul. 2020.
  • [15] O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” in Proc. Int. Conf. Med. Image Comput. Comput.-Assisted Intervention, Munich, Germany, Oct. 2015.
  • [16] W. Yu, H. He, X. Yu, S. Song, J. Zhang, and K. B. Letaief, “Blind performance prediction for deep learning based ultra-massive MIMO channel estimation,” in Proc. IEEE Int. Conf. Commun., Rome, Italy, May 2023.
  • [17] A. Pizzo, L. Sanguinetti, and T. L. Marzetta, “Fourier plane-wave series expansion for holographic MIMO communications,” IEEE Trans. Wireless Commun., vol. 21, no. 9, pp. 6890–6905, Sept. 2022.
  • [18] A. Gallyas-Sanhueza and C. Studer, “Low-complexity blind parameter estimation in wireless systems with noisy sparse signals,” IEEE Trans. Wireless Commun., vol. 22, no. 10, pp. 7055–7071, Oct. 2023.