This paper was converted on www.awesomepapers.org from LaTeX by an anonymous user.
Want to know more? Visit the Converter page.

Beam-Delay Domain Channel Estimation for mmWave XL-MIMO Systems

Hongwei Hou, , Xuan He, ,
Tianhao Fang, ,  Xinping Yi, ,
Wenjin Wang, ,  Shi Jin, 
Manuscript received xxx; revised xxx.Hongwei Hou, Xuan He, Tianhao Fang, Xinping Yi, Wenjin Wang, and Shi Jin are with the National Mobile Communications Research Laboratory, Southeast University, Nanjing 210096, China (e-mail: [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]).
Abstract

This paper investigates the uplink channel estimation of the millimeter-wave (mmWave) extremely large-scale multiple-input-multiple-output (XL-MIMO) communication system in the beam-delay domain, taking into account the near-field and beam-squint effects due to the transmission bandwidth and array aperture growth. Specifically, we model the sparsity in the delay domain to explore inter-subcarrier correlations and propose the beam-delay domain sparse representation of spatial-frequency domain channels. The independent and non-identically distributed Bernoulli-Gaussian models with unknown prior hyperparameters are employed to capture the sparsity in the beam-delay domain, posing a challenge for channel estimation. Under the constrained Bethe free energy minimization framework, we design different structures on the beliefs to develop hybrid message passing (HMP) algorithms, thus achieving efficient joint estimation of beam-delay domain channel and prior hyperparameters. To further improve the model accuracy, the multidimensional grid point perturbation (MDGPP)-based representation is presented, which assigns individual perturbation parameters to each multidimensional discrete grid. By treating the MDGPP parameters as unknown hyperparameters, we propose the two-stage HMP algorithm for MDGPP-based channel estimation, where the output of the initial estimation stage is pruned for the refinement stage for the computational complexity reduction. Numerical simulations demonstrate the significant superiority of the proposed algorithms over benchmarks with both near-field and beam-squint effects.

Index Terms:
Channel estimation, millimeter-wave, extremely large antenna array, near-field effect, beam-squint effect.

I Introduction

Driven by the demands of higher data rates, both millimeter-wave (mmWave) and extremely large-scale multiple-input multiple-output (XL-MIMO) technologies are expected to play pivotal roles for the future cellular communication systems due to the vast spectrum and high spectral efficiency [1, 2, 3, 4, 5]. Accurate channel estimation is crucial for enabling the performance gains of mmWave and XL-MIMO, which is significantly challenged by the near-field and beam-squint effects [6, 7, 8, 9, 10].

Due to the extremely large aperture array (ELAA) employed in the mmWave XL-MIMO systems, the Rayleigh distance, which is proportional to the square of the array aperture, becomes comparable to the cell coverage radius [6]. This fact allows mobile terminal (MTs) and scatterers to appear in the near-field region, thereby leading to the special wave propagation [7]. On the other hand, large bandwidths are typically employed in the mmWave systems, such as the maximum transmission bandwidth of up to 22 GHz in the fifth-generation (5G) New Radio (NR) specifications [11], which results in a rather low sampling interval. Therefore, when mmWave is integrated with ELAA, the propagation delay difference over the antenna array will be higher than the sampling interval, resulting in the beam-squint effect [8, 9, 10]. With the above analysis, it is shown that the near-field and beam-squint effects in mmWave XL-MIMO systems render its channel model significantly different from that of the conventional massive MIMO (mMIMO) system, which calls for new approaches for accurate channel estimation.

I-A Previous Works

At millimeter-wave frequencies, the channels are typically contributed by very few paths due to the sparse scattering nature [12, 13, 14], which facilitates compressed sensing techniques to enhance the channel estimation performance. For conventional mMIMO systems, the orthogonal matching pursuit (OMP)-based channel estimation algorithm is proposed in [15], which exploits beam domain sparsity based on the angle sampling and reconstructs the channel by extracting the dominant paths. For orthogonal frequency division multiplexing (OFDM) systems, [16] and [17] present the simultaneously weighted OMP (SW-OMP) algorithm and subcarrier selection-simultaneous iterative gridless weighted-orthogonal least squares (SS-SIGW-OLS) to exploit the joint sparsity of beam domain channels under different subcarriers. Furthermore, channel estimation based on beam-delay domain sparsity is proposed in [18] to explore the inter-antenna and inter-subcarrier correlations.

Since the near-field effect in XL-MIMO systems destroys the plane wave assumption, the beam domain sparsity based on the angle sampling no longer holds, necessitating the novel beam domain representation. Following the spherical wave in the near-field channels, the beam domain sparsity based on the angle-distance sampling, denoted as polar domain, is proposed in [19, 20], followed by the OMP algorithm for channel estimation. To alleviate the estimation error introduced by discrete sampling, [19] presents the polar domain simultaneously iterative gridless weighted (P-SIGW) algorithm, which updates the angle-distance sampling grids based on the gradient descent. As the scatterers and MTs may be distributed in both far-field and near-field regions, the hybrid-field model is employed in [21, 22, 23, 24] for such scenarios, which exploit both angle and angle-distance sampling.

With the transmission bandwidth and array aperture growth, the beam-squint effect becomes more significant, which destroys the joint sparsity of beam domain across subcarriers. To address this issue, [25] and [26] propose the pattern of beam domain sparsity variations with subcarriers in far-field channels, followed by the generalized simultaneous OMP (GSOMP) the beam-split pattern detection (BSPD) algorithms. Furthermore, by exploiting the beam-delay domain sparsity with beam-squint effects, the super resolution-based channel estimation algorithms are proposed in [27, 28, 29]. Most recently, the channel estimation with both near-field and beam-squint effects is investigated in [30, 31]. Specifically, [30] explores the bilinear patterns of beam domain sparsity variations with subcarriers in the common sensing matrix, facilitating the bilinear patterns detection (BPD)-based channel estimation algorithm. Besides, the near-field beam-split-aware OMP (NBA-OMP) algorithms is proposed in the [31] based on the GSOMP algorithm, which exploits the frequency selective sensing matrices to explore beam domain sparsity based on angle-distance domain sampling and the joint sparsity across subcarriers. The exploration of joint sparsity variations due to beam squint effects in [31, 30] provides one solution for the channel estimation in mmWave XL-MIMO systems, paving the way for subsequent work.

I-B Motivations and Contributions

In mmWave XL-MIMO systems, the near-field and beam-squint effects induce substantial changes in channel models, necessitating more accurate modeling. Besides, there are significant correlations between subcarriers in the OFDM systems, which are under-explored in such scenarios. Finally, the sparse model based on discretized sampling encounters an inherent mismatch with continuous parameters, thereby warranting further investigation into effective mitigation strategies for this issue.

Motivated by the previous works, we investigate the channel estimation for mmWave XL-MIMO systems with near-field and beam-squint effects. The main contributions of this paper are summarized as follows:

  • By incorporating the near-field and beam-squint effects, we model the channel in the beam-delay domain to explore inter-antenna and inter-subcarrier correlations. In doing so, we end up with the beam-delay domain sparse representation of the spatial-frequency domain channel. To capture the beam-delay domain sparsity, the independent and non-identically distributed Bernoulli-Gaussian models with unknown prior hyperparameters are employed, facilitating the formulation of the channel estimation problem.

  • To cope with the unknown hyperparameters in the prior models, we introduce the constrained Bethe free energy (BFE) minimization framework, which enables the joint estimation of the beam-delay domain channel and the hyperparameters in the prior model. The hybrid message passing (HMP) algorithm is established by designing different belief structures to estimate beam-delay domain channels and their prior hyperparameters, thereby, spatial-frequency domain channels.

  • To further improve the model accuracy, we introduce the multidimensional grid point perturbation (MDGPP)-based representation, which assigns individual perturbation parameters to each multidimensional discrete grid and thus mitigates mismatches between continuous parameters and discrete grids. Under the constrained BFE minimization framework, the MDGPP parameters are treated as unknown hyperparameters, facilitating the HMP algorithm for MDGPP-based channel estimation. To reduce the computational complexity, we propose the two-stage HMP algorithm, where the output of the initial estimation stage is pruned for the refinement stage.

Organization: The rest of this paper is organized as below. Section II describes the system model and formulates the channel estimation problem. In Section III, the HMP algorithm is developed based on constrained BFE minimization framework. The two-stage HMP algorithm for the MDGPP-based channel estimation is proposed in Section IV, whose computational complexity is also analyzed. Numerical simulations in Section V shows the superiority of the proposed algorithms over the benchmarks. Finally, Section VI concludes the paper.

Notation: Throughout this paper, imaginary unit is represented by j=1j=\sqrt{-1}. Lowercase, bold lowercase, and bold uppercase letters denote scalars, column vectors, and matrices, respectively. The transpose, conjugate, and conjugate-transpose operations are represented by the superscripts ()T(\cdot)^{T}, ()(\cdot)^{*}, and ()H(\cdot)^{H}, respectively. The Kronecker and Hardmard products are represented by the operator \otimes and \circ, respectively. []i[\cdot]_{i} and []i,j[\cdot]_{i,j} are the ii-th element of a vector and the (i,j)(i,j)-th element of a matrix, respectively. 2\norm{\cdot}_{2} and \norm{\cdot}_{\infty} represent the 2\ell_{2} and \ell_{\infty} norms, respectively. max{}\max\left\{\cdot\right\} and min{}\min\left\{\cdot\right\} denote the maximum and minimum operators, respectively. 𝖤{}\mathsf{E}\left\{\cdot\right\} and 𝖵{}\mathsf{V}\left\{\cdot\right\} denote the expectation and variance operators, respectively. \lfloor\cdot\rfloor, \lceil\cdot\rceil, and \langle\cdot\rangle denote the floor, ceil, and modulo operations, respectively. The element-wise inverse is represented by the superscripts ()1(\cdot)^{{\circ}-1}. The symbols \mathbb{C} denote the complex number fields. 𝒞𝒩(x;μ,σ2)\mathcal{CN}(x;{\mu},{\sigma}^{2}) denotes the random variable xx obeying circularly symmetric complex Gaussian distribution with mean μ\mu and variance σ2{\sigma}^{2}.

II System Model and Problem Formulation

In this paper, we consider the XL-MIMO-OFDM system in mmWave frequencies, which consists of a BS equipped with a uniform linear array (ULA) of NN (N 1N{\;\gg\;}1) antennas and UU single-antenna MTs. In the time dulplex division (TDD) systems, the channel is estimated from the uplink pilot symbols. It is assumed that the NpN_{\text{p}} pilot subcarriers in each training symbol are equally spaced without overlapping for each MT. Let NallN_{\text{all}} and Δf{\Delta}f denote the number of subcarriers and subcarrier spacing, respectively. Then the transmission bandwidth and pilot subcarrier spacing can be defined as B=NallΔfB=N_{\text{all}}{\Delta}f and Δf¯=B/Np{\Delta}\bar{f}=B/N_{\text{p}}, respectively.

II-A Received Signal Model and Channel Model

In the mmWave XL-MIMO systems, the BS usually adopts hybrid precoding architecture, which makes only a far smaller number of baseband received signals at each training symbol observed. With the hybrid precoding architecture, we assume that the BS is equipped with NRFN_{\text{RF}} RF chains (NRFNN_{\text{RF}}{\;\ll\;}N), where the RF chains and the antennas are connected by phase shifters network[32]. Besides, quasi-static channel assumptions are considered in this paper, where BS, MTs and scatters remain nearly unchanged during channel estimation. Thus, the received signal 𝐲b,p\mathbf{y}_{b,p} under the pp-th pilot subcarrier at the bb-th training symbol can be expressed as111Since non-overlapping pilot subcarriers are allocated for different MTs, we can focus on only the single MT and omit the MT index to simplify the expression.

𝐲b,p=𝐅b,pH𝐡pxb,p+𝐳b,p,\mathbf{y}_{b,p}=\mathbf{F}_{b,p}^{H}\mathbf{h}_{p}x_{b,p}+\mathbf{z}_{b,p}, (1)

where xb,px_{b,p} denotes the pilot under the pp-th pilot subcarrier at the bb-th training symbol, which can be set to 11 without loss of generality, and 𝐳b,p𝒞𝒩(0,σz2𝐈)\mathbf{z}_{b,p}{\;\sim\;}\mathcal{CN}\left(0,{\sigma}_{z}^{2}\mathbf{I}\right) denotes the additive white Gaussian nosie (AWGN) under the pp-th pilot subcarrier.

In the hybrid precoding architecture, the number of measurements required for reliable channel estimation cannot be satisfied with only one pilot symbol, and thus NPSN_{\text{PS}} pilot symbols are required. Stacking all the received signal under the pp-th pilot subcarrier in a column vector, we can obtain

𝐲p=𝐅pH𝐡p+𝐳p,\mathbf{y}_{p}=\mathbf{F}_{p}^{H}\mathbf{h}_{p}+\mathbf{z}_{p}, (2)

where 𝐅p=[𝐅0,p,,𝐅NRX1,p]N×M\mathbf{F}_{p}=\left[\mathbf{F}_{0,p},{\dots},\mathbf{F}_{N_{\text{RX}}-1,p}\right]{\;\in\;}\mathbb{C}^{N{\times}M} denotes the hybrid precoder and M=NRFNPSM=N_{\text{RF}}N_{\text{PS}} denotes the number of measurements. Then, all received signals under NpN_{\text{p}} pilot subcarriers are stacked in a column vector, so that the received signal 𝐲\mathbf{y} can be expressed as

𝐲=𝐅¯H𝐡+𝐳,\mathbf{y}=\bar{\mathbf{F}}^{H}\mathbf{h}+\mathbf{z}, (3)

where 𝐅¯=𝖻𝗅𝗄𝖽𝗂𝖺𝗀{𝐅0,,𝐅Np1}\bar{\mathbf{F}}=\mathsf{blkdiag}\{\mathbf{F}_{0},{\dots},\mathbf{F}_{N_{\text{p}}-1}\} denotes the hybrid precoder of all pilot subcarriers, and 𝐡\mathbf{h} denotes the spatial-frequency domain channel.

We assume that the channel between the target MT and the BS has LL paths. Then the channel between the MT and the nn-th antenna of BS under the pp-th pilot subcarrier can be expressed as

hp,n=l=0L1αlrl(n)exp[j2πfpc(rl(n)+cτ¯l)],h_{p,n}=\sum_{l=0}^{L-1}\frac{{\alpha}_{l}}{r_{l}^{\left(n\right)}}\exp[-j\frac{2{\pi}f_{p}}{c}\left(r_{l}^{\left(n\right)}+c\bar{\tau}_{l}\right)], (4)

where αl{\alpha}_{l} denotes the complex gain of ll-th path, rl(n)r_{l}^{\left(n\right)} denotes the distance between the nn-th antenna of BS and the last-hop scatter (LHS)222For LOS path and NLOS path, the LHS is defined as the MT and the last scatterer before arriving at the antenna array, respectively. of ll-th path, τ¯l\bar{\tau}_{l} denotes the propagation delay between the MT and LHS of ll-th path, fp=fc+(pNp/2)Δf¯f_{p}=f_{\text{c}}+\left(p-N_{\text{p}}/2\right){\Delta}\bar{f} denotes the operating frequency of the pp-th pilot subcarrier, and cc denotes the speed of light.

Refer to caption
Figure 1: Near-field channel model based on spherical wave.

As shown in Fig. 1, the distance between the nn-th antenna of BS and LHS of the ll-th path for MT satisfies [7]:

rl(n)\displaystyle r_{l}^{\left(n\right)} =[rl(n)cos(θ)l(n)]2+[rl(n)sin(θ)l(n)]2\displaystyle=\sqrt{\left[r_{l}^{\left(n\right)}\cos{\theta}_{l}^{\left(n\right)}\right]^{2}+\left[r_{l}^{\left(n\right)}\sin{\theta}_{l}^{\left(n\right)}\right]^{2}}
=(a)[rl(no)cos(θ)l(no)]2+[rl(no)sin(θ)l(no)+Δnd]2,\displaystyle\mathop{=}^{\left(a\right)}\sqrt{\left[r_{l}^{\left(n_{\text{o}}\right)}\cos{\theta}_{l}^{\left(n_{\text{o}}\right)}\right]^{2}+\left[r_{l}^{\left(n_{\text{o}}\right)}\sin{\theta}_{l}^{\left(n_{\text{o}}\right)}+{\Delta}nd\right]^{2}}, (5)

where θl(n){\theta}_{l}^{\left(n\right)} denotes the physical angle between the nn-th antenna of ULA and LHS of the ll-th path, non_{o} denotes the index of reference antenna, Δnnno{\Delta}n{\;\triangleq\;}n-n_{\text{o}} denotes the antenna index difference between the nn-th antenna and the reference antenna, and (a)\left(a\right) follows from the fact that the geometric constraint rl(n)cos(θ)l(n)r_{l}^{\left(n\right)}\cos{\theta}_{l}^{\left(n\right)} is constant, n,l\forall n,l. With the Taylor series expansion of 1+x\sqrt{1+x} at x=0x=0, rl(n)r_{l}^{\left(n\right)} can be approximated as

rl(n)(a)rl(no)+Δndψl(no)+(Δn)2d21[ψl(no)]2rlno,r_{l}^{\left(n\right)}\mathop{{\;\approx\;}}^{\left(a\right)}r_{l}^{\left(n_{\text{o}}\right)}+{\Delta}nd\psi_{l}^{\left(n_{\text{o}}\right)}+({\Delta}n)^{2}d^{2}\frac{1-\left[{\psi}_{l}^{\left(n_{\text{o}}\right)}\right]^{2}}{r_{l}^{n_{\text{o}}}}, (6)

where ψl(no)sinψl(no){\psi}_{l}^{\left(n_{\text{o}}\right)}{\;\triangleq\;}\sin\psi_{l}^{\left(n_{\text{o}}\right)} denotes the angle between the nn-th antenna of BS and LHS of the ll-th path, d=c/(2fc)d=c/(2f_{\text{c}}) denotes the half-wavelength antenna spacing, and (a)\left(a\right) is obtained by ignoring the higher order terms of Δnd\Delta{n}d.

For convenience, rlrl(no)r_{l}{\;\triangleq\;}r_{l}^{\left(n_{\text{o}}\right)} and ψlψl(no){\psi}_{l}{\;\triangleq\;}{\psi}_{l}^{\left(n_{\text{o}}\right)} are regarded as the location parameters of the ll-th path for MT without causing confusion. Substituting (II-A) into (4) and stacking the spatial domain channels into vector form, we can obtain

𝐡p=l=0L1α¯lej2πfpτl𝐚(rl,ψl,fp)𝐛(rl,ψl),\mathbf{h}_{p}=\sum_{l=0}^{L-1}\bar{\alpha}_{l}e^{-j2{\pi}f_{p}{\tau}_{l}}\mathbf{a}\left({r}_{l},{\psi}_{l},f_{p}\right){\;\circ\;}\mathbf{b}\left({r}_{l},{\psi}_{l}\right), (7)

where τl=τ¯l+rl/c{\tau}_{l}=\bar{\tau}_{l}+r_{l}/c denotes the propagation delay between the MT and reference antenna at BS, 𝐚(r,ψ,fp)\mathbf{a}\left(r,{\psi},f_{p}\right) denotes phase differences under the pp-th pilot subcarrier, which can be defined by

[𝐚(r,ψ,fp)]n=exp[j2πfpc(Δndψ+(Δn)2d21ψ2r)],\left[\mathbf{a}\left({r},{\psi},f_{p}\right)\right]_{n}\!=\!\exp\left[-j\frac{2{\pi}f_{p}}{c}\left({\Delta}nd{\psi}\!+\!({\Delta}n)^{2}d^{2}\frac{1-{\psi}^{2}}{r}\right)\right], (8)

𝐛(r,ψ)\mathbf{b}\left(r,{\psi}\right) denotes power differences, which can be defined by

[𝐛(r,ψ)]n=ϱr+Δndψ+(Δn)2d21ψ2r,\left[\mathbf{b}\left({r},{\psi}\right)\right]_{n}=\frac{\varrho}{r+{\Delta}nd\psi+({\Delta}n)^{2}d^{2}\frac{1-{\psi}^{2}}{r}}, (9)

where ϱ\varrho denotes the normalized factor, such that

1ϱ2=n=0N11(r+Δndψ+(Δn)2d21ψ2r)2,\frac{1}{{\varrho}^{2}}=\sum_{n=0}^{N-1}\frac{1}{\left(r+{\Delta}nd\psi+({\Delta}n)^{2}d^{2}\frac{1-{\psi}^{2}}{r}\right)^{2}}, (10)

and the common phase and gain between all antennas independent of Δn{\Delta}n is merged into the path gain α¯l\bar{\alpha}_{l}. For convenience, we define 𝐜(r,ψ,fp)𝐚(r,ψ,fp)𝐛(r,ψ)\mathbf{c}\left(r,{\psi},f_{p}\right){\;\triangleq\;}\mathbf{a}\left({r},{\psi},f_{p}\right){\;\circ\;}\mathbf{b}\left({r},{\psi}\right) as beam domain steering vector.

In (7), the differences between the mmWave XL-MIMO and mMIMO channel model are two-fold:

  1. 1.

    Near-field effect: In mmWave XL-MIMO systems, the MTs and LHSs located within the Rayleigh distance result in the spherical wave propagation [6, 7]. Moreover, the propagation distance difference causes fluctuations in power attenuation across antennas.

  2. 2.

    Beam-squint effect: In mmWave XL-MIMO systems, the propagation delay differences on the antenna array are not negligible due to the large bandwidth of mmWave systems, which leads to the frequency-dependent beam domain steering vector [8].

To illustrate the deviation from the plane wave assumption due to near-field effects and facilitate subsequent beam domain representations, we can introduce the slope parameter η(1ψ2)/r\eta{\;\triangleq\;}\left(1-{\psi}^{2}\right)/r to characterize the rate of instantaneous angle change with Δn{\Delta}n, which is shown by

[𝐚(r,ψ,fp)]n=exp{j2πfpc[Δnd(ψ+Δnd1ψ2rInstantaneous angle)]}.\left[\mathbf{a}\left({r},{\psi},f_{p}\right)\right]_{n}\!=\!\exp\{\!-\!j\frac{2{\pi}f_{p}}{c}\Bigg{[}{\Delta}nd\Big{(}\underbrace{{\psi}\!+\!{\Delta}nd\frac{1-{\psi}^{2}}{r}}_{\text{Instantaneous angle}}\Big{)}\Bigg{]}\Bigg{\}}. (11)

Then, the angle-distance sampling can be converted to the angle-slope sampling by replace the (1ψ2)/r\left(1-{\psi}^{2}\right)/r with η{\eta}, resulting in the beam domain steering vector 𝐜(η,ψ,fp)\mathbf{c}\left({\eta},{\psi},f_{p}\right).

By stacking the spatial channel of all pilot subcarriers, the spatial-frequency channel is expressed as

𝐡=l=0L1αl[𝐝(τl) 1N]𝐜¯(ηl,ψl),\mathbf{h}=\sum_{l=0}^{L-1}{\alpha}_{l}\left[\mathbf{d}\left({\tau}_{l}\right){\;\otimes\;}\mathbf{1}_{N}\right]{\;\circ\;}\bar{\mathbf{c}}\left({\eta}_{l},{\psi}_{l}\right), (12)

where 𝐝(τ)\mathbf{d}\left({\tau}\right) denotes the delay domain steering vector defined as [𝐝(τ)]p=exp(j2πfpτ)\left[\mathbf{d}\left({\tau}\right)\right]_{p}=\exp(-j2{\pi}f_{p}{\tau}), and 𝐜¯(η,ψ)\bar{\mathbf{c}}\left({\eta},{\psi}\right) denotes aggregate beam domain steering vector given by

𝐜¯(η,ψ)=[𝐜T(η,ψ,f0),,𝐜T(η,ψ,fNp1)]T.\bar{\mathbf{c}}\left({\eta},{\psi}\right)=\left[{\mathbf{c}}^{T}\left({\eta},{\psi},f_{0}\right),{\dots},{\mathbf{c}}^{T}\left({\eta},{\psi},f_{N_{\text{p}}-1}\right)\right]^{T}. (13)
Remark 1.

When specific effects and structure are ignored, the conventional mMIMO channel model can be considered as special cases of the proposed channel model.

  1. 1.

    With both near-field and beam-squint effects, but ignoring the correlation across subcarriers, we have 𝐝(τ)=𝝆\mathbf{d}(\tau)=\bm{\rho}, whose elements are uncorrelated. Thus, the channel model in (12) degenerates to the form in [30, 31].

  2. 2.

    With the beam-squint effect only, we have η 0{\eta}{\;\approx\;}0, thus the channel model in (12) degenerates to the form in [28, 29].

  3. 3.

    With the near-field effect only, we have fpfc,pf_{p}{\;\approx\;}f_{\text{c}},\forall p, thus the channel model in (12) degenerates to the form in [19].

  4. 4.

    Without both near-field and beam-squint effects, we have η 0{\eta}{\;\approx\;}0 and fpfc,pf_{p}{\;\approx\;}f_{\text{c}},\forall p, thus channel model in (12) degenerates to the form in [16, 18].

II-B Beam-Delay Domain Sparse Representation

From (12), the spatial-frequency domain channel is superposed by several paths, resulting in the inter-antenna and inter-subcarrier correlations. To exploit such correlations, we develop the beam-delay domain sparse representation. Toward this end, we assume that the angles, slopes and delays for each MT satisfy ψ[1,1){\psi}{\;\in\;}\left[-1,1\right), η[0,ηmax]{\eta}{\;\in\;}\left[0,{\eta}_{\max}\right], and τ[0,τmax]{\tau}{\;\in\;}\left[0,{\tau}_{\max}\right], respectively. The maximum slope parameter ηmax=1/rmin{\eta}_{\max}=1/r_{\min}, where rminr_{\min} denotes the minimum service distance, and τmax{\tau}_{\max} denotes the maximum allowable delay, which is determined by the cyclic prefix length and the array aperture under the beam-squint effect. Then, the angle, slope, and delay domain are sampled uniformly at intervals ψΔ=2/N{\psi}_{\Delta}=2/N, ηΔ{\eta}_{\Delta}333The sampling interval ηΔ{\eta}_{\Delta} is determined by the discrete Fresnel function with the coherence threshold [19]., and τΔ=1/(NpΔf¯){\tau}_{\Delta}=1/(N_{\text{p}}{\Delta}\bar{f}) to obtain the sampling sets, which can be defined by (14a), (14b), and (14c), respectively.

Υan={ψ¯kan:1+(kan+12)ψΔ,kan𝒦an},\Upsilon_{\text{an}}=\left\{\bar{\psi}_{k_{\text{an}}}:-1+\left(k_{\text{an}}+\frac{1}{2}\right){\psi}_{\Delta},k_{\text{an}}{\;\in\;}\mathcal{K}_{\text{an}}\right\}, (14a)
Υsl={η¯ksl:ηmin+(ksl+12)ηΔ,ksl𝒦sl},\Upsilon_{\text{sl}}=\left\{\bar{\eta}_{k_{\text{sl}}}:{\eta}_{\min}+\left(k_{\text{sl}}+\frac{1}{2}\right){\eta}_{\Delta},k_{\text{sl}}{\;\in\;}\mathcal{K}_{\text{sl}}\right\}, (14b)
Υde={τ¯kde:(kde+12)τΔ,kde𝒦de},\Upsilon_{\text{de}}=\left\{\bar{\tau}_{k_{\text{de}}}:\left(k_{\text{de}}+\frac{1}{2}\right){\tau}_{\Delta},k_{\text{de}}{\;\in\;}\mathcal{K}_{\text{de}}\right\}, (14c)

where 𝒦x{0,1,,Kx1},x{an,sl,de}\mathcal{K}_{\text{x}}{\;\triangleq\;}\left\{0,1,{\dots},K_{\text{x}}-1\right\},\text{x}{\;\in\;}\left\{\text{an},\text{sl},\text{de}\right\} denotes the sampling point index set, Kan=2/ψΔK_{\text{an}}=\lceil 2/{\psi}_{\Delta}\rceil, Kan=ηmax/ηΔK_{\text{an}}=\lceil{\eta}_{\max}/{\eta}_{\Delta}\rceil, and Kde=τmax/τΔK_{\text{de}}=\lceil{\tau}_{\max}/{\tau}_{\Delta}\rceil denote the number of sampling points in angle, slope, and delay domains, respectively. Since the sampling intervals decrease as NN and NpN_{\text{p}} increase, the spatial-frequency domain channel is well approximated as a linear combination of basis functions determined by angle-slope-delay tuples (ψ¯kan,η¯ksl,τ¯kde)\left(\bar{\psi}_{k_{\text{an}}},\bar{\eta}_{k_{\text{sl}}},\bar{\tau}_{k_{\text{de}}}\right) for sufficiently large NN and NpN_{\text{p}}.

Therefore, with the pre-defined sampling set of the angles, slopes, and delays, (12) can be approximated as

𝐡kankslkdeβkan,ksl,kde𝐮(ψ¯kan,η¯ksl,τ¯kde),\mathbf{h}{\;\approx\;}\sum_{{k}_{\text{an}}}\sum_{{k}_{\text{sl}}}\sum_{{k}_{\text{de}}}{\beta}_{k_{\text{an}},k_{\text{sl}},k_{\text{de}}}\mathbf{u}\left(\bar{\psi}_{k_{\text{an}}},\bar{\eta}_{k_{\text{sl}}},\bar{\tau}_{k_{\text{de}}}\right), (15)

where 𝐮(ψ¯kan,η¯ksl,τ¯kde)\mathbf{u}\left(\bar{\psi}_{k_{\text{an}}},\bar{\eta}_{k_{\text{sl}}},\bar{\tau}_{k_{\text{de}}}\right) denotes the angle-slope-delay domain basis function defined as

𝐮(ψ¯kan,η¯ksl,τ¯kde)=[𝐝(τ¯kde) 1N]𝐜¯(η¯ksl,ψ¯kan),\mathbf{u}\left(\bar{\psi}_{k_{\text{an}}},\bar{\eta}_{k_{\text{sl}}},\bar{\tau}_{k_{\text{de}}}\right)=\left[\mathbf{d}\left(\bar{\tau}_{k_{\text{de}}}\right){\;\otimes\;}\mathbf{1}_{N}\right]{\;\circ\;}\bar{\mathbf{c}}\left(\bar{\eta}_{k_{\text{sl}}},\bar{\psi}_{k_{\text{an}}}\right), (16)

and βkan,ksl,kde{\beta}_{k_{\text{an}},k_{\text{sl}},k_{\text{de}}} denotes the complex gain of the path with respect to angle-slope-delay tuple (ψ¯kan,η¯ksl,τ¯kde)\left(\bar{\psi}_{k_{\text{an}}},\bar{\eta}_{k_{\text{sl}}},\bar{\tau}_{k_{\text{de}}}\right), which is given by

βkan,ksl,kde={αl,if(ψ¯kan,η¯ksl,τ¯kde)=(ψl,ηl,τl)0,otherwise.{\beta}_{k_{\text{an}},k_{\text{sl}},k_{\text{de}}}=\begin{cases}{\alpha}_{l},&\text{if}\;\left(\bar{\psi}_{k_{\text{an}}},\bar{\eta}_{k_{\text{sl}}},\bar{\tau}_{k_{\text{de}}}\right)=\left({\psi}_{l},{\eta}_{l},{\tau}_{l}\right)\\ 0,&\text{otherwise}\end{cases}. (17)

Reformulating (15) into matrix form and assuming that the spatial-frequency domain channels can be represented perfectly, we can obtain

𝐡=𝐔𝜷,\mathbf{h}=\mathbf{U}\bm{\beta}, (18)

where 𝐔NNp×KanKslKde\mathbf{U}{\;\in\;}\mathbb{C}^{NN_{\text{p}}{\times}K_{\text{an}}K_{\text{sl}}K_{\text{de}}} and 𝜷KanKslKde×1\bm{\beta}{\;\in\;}\mathbb{C}^{K_{\text{an}}K_{\text{sl}}K_{\text{de}}{\times}1} denote the beam-delay domain transformation matrix and the beam-delay domain channel defined as

[𝐔]:,kdeKanKsl+kanKsl+ksl=𝐮(ψ¯kan,η¯ksl,τ¯kde),\left[\mathbf{U}\right]_{:,k_{\text{de}}K_{\text{an}}K_{\text{sl}}+k_{\text{an}}K_{\text{sl}}+k_{\text{sl}}}=\mathbf{u}\left(\bar{\psi}_{k_{\text{an}}},\bar{\eta}_{k_{\text{sl}}},\bar{\tau}_{k_{\text{de}}}\right), (19)

and

[𝜷]kdeKanKsl+kanKsl+ksl=βkan,ksl,kde,\left[\bm{\beta}\right]_{k_{\text{de}}K_{\text{an}}K_{\text{sl}}+k_{\text{an}}K_{\text{sl}}+k_{\text{sl}}}={\beta}_{k_{\text{an}},k_{\text{sl}},k_{\text{de}}}, (20)

respectively. Without causing confusion, (kan,ksl,kde)(k_{\text{an}},k_{\text{sl}},k_{\text{de}}) maps to the simple index k{0,1,,K1}k{\;\in\;}\{0,1,{\dots},K-1\} in a one-to-one manner, which satisfies

kde=kKanKsl,kan=kKanKslKsl,kan=kKsl,k_{\text{de}}=\lfloor\frac{k}{K_{\text{an}}K_{\text{sl}}}\rfloor,k_{\text{an}}=\lfloor\frac{\langle k\rangle_{K_{\text{an}}K_{\text{sl}}}}{K_{\text{sl}}}\rfloor,k_{\text{an}}=\langle k\rangle_{K_{\text{sl}}}, (21)

and K=KanKslKdeK=K_{\text{an}}K_{\text{sl}}K_{\text{de}}.

II-C Channel Estimation Problem Formulation

With the proposed sparse representation, the channel estimation problem is to estimate the beam-delay domain channel and ultimately the spatial-frequency domain channel based on the received signal. The MMSE channel estimator can be expressed as the a posteriori mean [33], given by

𝜷=𝜷𝖯𝗋(𝜷,𝐬𝐲)d𝜷d𝐬,\bm{\beta}^{\star}=\int\bm{\beta}\mathsf{Pr}\left(\bm{\beta},\mathbf{s}\mid\mathbf{y}\right)\mathrm{d}\bm{\beta}\mathrm{d}\mathbf{s}, (22)

where 𝐬𝐆𝜷\mathbf{s}{\;\triangleq\;}\mathbf{G}\bm{\beta} and 𝐆𝐅¯H𝐔\mathbf{G}{\;\triangleq\;}\bar{\mathbf{F}}^{H}\mathbf{U} denote the auxiliary vector and measurement matrix, respectively, 𝖯𝗋(𝜷,𝐬𝐲)\mathsf{Pr}\left(\bm{\beta},\mathbf{s}\mid\mathbf{y}\right) denotes the posterior probability density function (PDF) of 𝜷\bm{\beta} and 𝐬\mathbf{s}.

From the received signal model and the beam-delay domain channel representation, p(𝜷,𝐬𝐲)p\left(\bm{\beta},\mathbf{s}\mid\mathbf{y}\right) can be expressed as

𝖯𝗋(𝜷,𝐬𝐲)𝖯𝗋(𝐲𝐬)𝖯𝗋(𝐬𝜷)𝖯𝗋(𝜷),\mathsf{Pr}\left(\bm{\beta},\mathbf{s}\mid\mathbf{y}\right){\;\propto\;}\mathsf{Pr}\left(\mathbf{y}\mid\mathbf{s}\right)\mathsf{Pr}\left(\mathbf{s}\mid\bm{\beta}\right)\mathsf{Pr}\left(\bm{\beta}\right), (23)

where 𝖯𝗋(𝐲𝐬)\mathsf{Pr}\left(\mathbf{y}\mid\mathbf{s}\right), 𝖯𝗋(𝐬𝜷)\mathsf{Pr}\left(\mathbf{s}\mid\bm{\beta}\right), and 𝖯𝗋(𝜷)\mathsf{Pr}\left(\bm{\beta}\right) denote received noise model, baseband transfer model, and beam-delay domain channel prior model, respectively.

Since AWGN channels are employed, the conditional PDF 𝖯𝗋(𝐲𝐬)\mathsf{Pr}\left(\mathbf{y}\mid\mathbf{s}\right) can be detailed as

𝖯𝗋(𝐲𝐬)=mp𝒞𝒩(ym,p;sm,p,σz2)𝖯𝗋(ym,psm,p),\mathsf{Pr}\left(\mathbf{y}\mid\mathbf{s}\right)=\prod_{m}\prod_{p}\underbrace{\mathcal{CN}\left(y_{m,p};s_{m,p},{\sigma}_{z}^{2}\right)}_{{\;\triangleq\;}\mathsf{Pr}\left(y_{m,p}\mid s_{m,p}\right)}, (24)

where ym,py_{m,p} and sm,ps_{m,p} denote the (mNp+p)\left(mN_{\text{p}}+p\right)-th element of 𝐲\mathbf{y} and 𝐬\mathbf{s}, respectively. Then the conditional PDF 𝖯𝗋(𝐬𝜷)\mathsf{Pr}\left(\mathbf{s}\mid\bm{\beta}\right) can be factorized as

𝖯𝗋(𝐬𝜷)=mpδ(sm,p𝐠m,p(r)𝜷)𝖯𝗋(sm,p𝜷),\mathsf{Pr}\left(\mathbf{s}\mid\bm{\beta}\right)=\prod_{m}\prod_{p}\underbrace{\delta\left(s_{m,p}-\mathbf{g}_{m,p}^{(\text{r})}\bm{\beta}\right)}_{{\;\triangleq\;}\mathsf{Pr}\left(s_{m,p}\mid\bm{\beta}\right)}, (25)

where 𝐠m,p(r)\mathbf{g}_{m,p}^{(\text{r})} denotes the (mNp+p)\left(mN_{\text{p}}+p\right)-th row of 𝐆\mathbf{G}.

With the assumption of wide-sense stationary uncorrelated scattering Rayleigh fading channel, the elements of 𝜷\bm{\beta} are statistically independent for sufficiently large NN and NpN_{\text{p}} [34]. Moreover, the sparse nature of mmWave channels is considered, implying that only a few number of elements in 𝜷\bm{\beta} have significant power, with most elements being zero. Thus, we can model 𝜷\bm{\beta} as an independent and non-identically distributed Bernoulli-Gaussian distribution, and the prior PDF 𝖯𝗋(𝜷;𝝎)\mathsf{Pr}\left(\bm{\beta};\bm{\omega}\right) controlled by hyperparameters 𝝎\bm{\omega} can be decomposed into

𝖯𝗋(𝜷;𝝎)=k[(1λ)δ(βk)+λ𝒞𝒩(βk;0,χk)]𝖯𝗋(βk;λ,χk),\mathsf{Pr}\left(\bm{\beta};\bm{\omega}\right)=\prod_{k}\underbrace{\left[\left(1-{\lambda}\right){\delta}\left({\beta}_{k}\right)+{\lambda}\mathcal{CN}\left({\beta}_{k};0,{\chi}_{k}\right)\right]}_{{\;\triangleq\;}\mathsf{Pr}\left({\beta}_{k};{\lambda},{\chi}_{k}\right)}, (26)

where 𝝎=[λ,χ0,,χK1]\bm{\omega}=\left[{\lambda},{\chi}_{0},{\dots},{\chi}_{K-1}\right], λ{\lambda} denotes the sparsity level, and χk{\chi}_{k} denotes the power of βk{\beta}_{k} when βk{\beta}_{k} is nonzero.

Building on the probabilistic modeling, the MMSE estimation of spatial-frequency domain channel can be given by

𝐡=𝐔𝜷,\mathbf{h}^{\star}=\mathbf{U}\bm{\beta}^{\star}, (27)

due to the commutability of the MMSE estimator over linear transformations [33].

III Constrained BFE Minimization-based Channel Estimation

The MMSE estimator in (22) requires the multi-dimensional integrals of 𝖯𝗋(𝜷,𝐬𝐲;𝝎)\mathsf{Pr}\left(\bm{\beta},\mathbf{s}\mid\mathbf{y};\bm{\omega}\right), which introduces an unacceptable complexity in large-scale systems. In this section, we approximate 𝖯𝗋(𝜷,𝐬𝐲;𝝎)\mathsf{Pr}\left(\bm{\beta},\mathbf{s}\mid\mathbf{y};\bm{\omega}\right) with structured trial beliefs based on the Bethe method, and then develop the HMP algorithm to achieve efficient joint estimation of beam-delay domain channel and prior hyperparameters.

III-A Constrained BFE Minimization Framework

Following the VFE minimization framework [35, 36, 37], we approximate 𝖯𝗋(𝜷,𝐬𝐲;𝝎)\mathsf{Pr}\left(\bm{\beta},\mathbf{s}\mid\mathbf{y};\bm{\omega}\right) by the trial belief 𝖻(𝜷,𝐬;𝝎)\mathsf{b}\left(\bm{\beta},\mathbf{s};\bm{\omega}\right), which can be obtained by minimizing the Kullback-Leibler (KL) divergence of b(𝜷,𝐬;𝝎)b\left(\bm{\beta},\mathbf{s};\bm{\omega}\right) and 𝖯𝗋(𝜷,𝐬𝐲;𝝎)\mathsf{Pr}\left(\bm{\beta},\mathbf{s}\mid\mathbf{y};\bm{\omega}\right). The KL divergence minimization problem can be expressed by

𝖻^(𝜷,𝐬;𝝎)\displaystyle\hat{\mathsf{b}}\left(\bm{\beta},\mathbf{s};\bm{\omega}\right) =argminb𝒬D[𝖻(𝜷,𝐬;𝝎)𝖯𝗋(𝜷,𝐬𝐲;𝝎)]\displaystyle=\arg\min_{b{\in}\mathcal{Q}}D\left[\mathsf{b}\left(\bm{\beta},\mathbf{s};\bm{\omega}\right)\|\mathsf{Pr}\left(\bm{\beta},\mathbf{s}\mid\mathbf{y};\bm{\omega}\right)\right]
=argminb𝒬V+lnZ(𝐲),\displaystyle=\arg\min_{b{\in}\mathcal{Q}}\mathcal{F}_{V}+\ln Z\left(\mathbf{y}\right), (28)

where 𝒬\mathcal{Q} denotes the constrained set of 𝖻(𝜷,𝐬;𝝎)\mathsf{b}\left(\bm{\beta},\mathbf{s};\bm{\omega}\right), D[]D\left[\cdot\|\cdot\right] denotes the KL divergence, V\mathcal{F}_{V} denotes the VFE defined by

V𝖻(𝜷,𝐬;𝝎)ln𝖻(𝜷,𝐬;𝝎)𝖯𝗋(𝜷,𝐬,𝐲;𝝎)d𝜷d𝐬,\mathcal{F}_{V}{\;\triangleq\;}\iint\mathsf{b}\left(\bm{\beta},\mathbf{s};\bm{\omega}\right)\ln\frac{\mathsf{b}\left(\bm{\beta},\mathbf{s};\bm{\omega}\right)}{\mathsf{Pr}\left(\bm{\beta},\mathbf{s},\mathbf{y};\bm{\omega}\right)}\mathrm{d}\bm{\beta}\mathrm{d}\mathbf{s}, (29)

where 𝖯𝗋(𝜷,𝐲;𝝎)\mathsf{Pr}\left(\bm{\beta},\mathbf{y};\bm{\omega}\right) denotes the joint PDF, and lnZ(𝐲)lnp(𝐲)-\ln Z\left(\mathbf{y}\right){\;\triangleq\;}-\ln p\left(\mathbf{y}\right) denotes the Helmholtz free energy independent of 𝜷\bm{\beta}. Thus, the KL divergence minimization is equivalent to the VFE minimization.

The VFE minimization is still intractable in large-scale systems, which motivates the constrained set design of 𝖻(𝜷;𝝎)\mathsf{b}\left(\bm{\beta};\bm{\omega}\right). To balance the exactness and tractability, the Bethe method is adopted to design the constrained set and convert the VFE minimization into the BFE minimization. Following Bayes rule and the probabilistic model, the joint PDF can be factorized as

𝖯𝗋\displaystyle\mathsf{Pr} (𝜷,𝐬,𝐲;𝝎)\displaystyle\left(\bm{\beta},\mathbf{s},\mathbf{y};\bm{\omega}\right)
=mp𝖯𝗋(ym,psm,p)fy,m,p𝖯𝗋(sm,p𝜷)fs,m,pk𝖯𝗋(βk;λ,χk)fβ,k,\displaystyle=\!\prod_{m}\prod_{p}\underbrace{\mathsf{Pr}\left(y_{m,p}\!\mid\!s_{m,p}\right)}_{{\;\triangleq\;}f_{y,m,p}}\underbrace{\mathsf{Pr}\left(s_{m,p}\!\mid\!\bm{\beta}\right)}_{{\;\triangleq\;}f_{s,m,p}}\prod_{k}\underbrace{\mathsf{Pr}\left({\beta}_{k};{\lambda},{\chi}_{k}\right)}_{{\;\triangleq\;}f_{\beta,k}}, (30)

where fy,m,pf_{y,m,p}, fs,m,pf_{s,m,p} and fβ,kf_{\beta,k} denote the factor nodes. Based on the Bethe method [38, 39, 40], we introduce auxiliary beliefs 𝖻s,m,p(sm,p)\mathsf{b}_{s,m,p}\left(s_{m,p}\right), 𝖻s𝜷,m,p(sm,p,𝜷)\mathsf{b}_{s\bm{\beta},m,p}\left(s_{m,p},\bm{\beta}\right) and 𝖻βλχ,k(βk,λ,χk)\mathsf{b}_{{\beta}{\lambda}{\chi},k}\left({\beta}_{k},{\lambda},{\chi}_{k}\right) for factor nodes fy,m,pf_{y,m,p}, fs,m,pf_{s,m,p} and fβ,kf_{{\beta},k}, respectively. Similarly, we introduce auxiliary beliefs 𝗊λ(λ)\mathsf{q}_{\lambda}\left({\lambda}\right), 𝗊χ,k(χk)\mathsf{q}_{{\chi},k}\left({\chi}_{k}\right), 𝗊β,k(βk)\mathsf{q}_{{\beta},k}\left({\beta}_{k}\right) and 𝗊s,m,p(sm,p)\mathsf{q}_{s,m,p}\left(s_{m,p}\right) for variable nodes λ{\lambda}, χk{\chi}_{k}, βk{\beta}_{k} and sm,ps_{m,p}, respectively. Following this, the trial belief and the BFE can be expressed as

𝖻(𝜷,𝐬;𝝎)=mp𝖻s,m,p𝖻s𝜷,m,pk𝖻βλχ,kmp𝗊s,m,p𝗊λKk𝗊χ,k𝗊β,kMNp,\mathsf{b}\left(\bm{\beta},\mathbf{s};\bm{\omega}\right)=\frac{\prod\limits_{m}\prod\limits_{p}\mathsf{b}_{s,m,p}\mathsf{b}_{s\bm{\beta},m,p}\prod\limits_{k}\mathsf{b}_{{\beta}{\lambda}{\chi},k}}{\prod\limits_{m}\prod\limits_{p}\mathsf{q}_{s,m,p}\mathsf{q}_{\lambda}^{K}\prod\limits_{k}\mathsf{q}_{\chi,k}\mathsf{q}_{\beta,k}^{MN_{\text{p}}}}, (31)

and

B=\displaystyle\mathcal{F}_{B}= mp{D[𝖻s,m,pfy,m,p]+D[𝖻s𝜷,m,pfs,m,p]}\displaystyle\sum_{m}\sum_{p}\left\{D\left[\mathsf{b}_{s,m,p}\|f_{y,m,p}\right]+D\left[\mathsf{b}_{s\bm{\beta},m,p}\|f_{s,m,p}\right]\right\}
+kD[𝖻βλχ,kfβ,k]+mpH[𝗊s,m,p]\displaystyle+\sum_{k}D\left[\mathsf{b}_{{\beta}{\lambda}{\chi},k}\|f_{\beta,k}\right]+\sum_{m}\sum_{p}H[\mathsf{q}_{s,m,p}]
+kMNpH[𝗊β,k]+KH[𝗊λ]+H[𝗊χ,k],\displaystyle+\sum_{k}MN_{\text{p}}H\left[\mathsf{q}_{\beta,k}\right]+KH[\mathsf{q}_{\lambda}]+H[\mathsf{q}_{\chi,k}], (32)

respectively, where H[]H\left[\cdot\right] denotes the entropy.

Inspired by the variational message-passing algorithm [36], we can factorize complicated beliefs into the product of simple beliefs by exploiting the slow-varying characteristic of hyperparameters, which can be expressed as

𝖻βλχ,k(βk,λ,χk)=𝖻β,k(βk)𝖻λ(λ)𝖻χ,k(χk),\mathsf{b}_{{\beta}{\lambda}{\chi},k}\left({\beta}_{k},{\lambda},{\chi}_{k}\right)=\mathsf{b}_{\beta,k}\left({\beta}_{k}\right)\mathsf{b}_{\lambda}\left({\lambda}\right)\mathsf{b}_{\chi,k}\left({\chi}_{k}\right), (33)

where 𝖻β,k(βk)\mathsf{b}_{\beta,k}\left({\beta}_{k}\right), 𝖻λ(λ)\mathsf{b}_{\lambda}\left({\lambda}\right) and 𝖻χ,k(χk)\mathsf{b}_{\chi,k}\left({\chi}_{k}\right) denote the auxiliary belief with respect to βk{\beta}_{k}, λ{\lambda} and χk{\chi}_{k}, respectively. Furthermore, the hyperparameter beliefs can be constrained by the Dirac delta function, which is given as

𝖻λ(λ)=δ(λλtrue),\mathsf{b}_{\lambda}\left({\lambda}\right)=\delta\left({\lambda}-{\lambda}_{\text{true}}\right), (34a)
𝖻χ,k(χk)=δ(χkχk,true),\mathsf{b}_{\chi,k}\left({\chi}_{k}\right)=\delta\left({\chi}_{k}-{\chi}_{k,\text{true}}\right), (34b)

where λtrue(0,1){\lambda}_{\text{true}}{\;\in\;}\left(0,1\right) and χk,true 0{\chi}_{k,\text{true}}{\;\geq\;}0 denote the ground-truth hyperparameters.

To guarantee the global dependence of factor nodes, the auxiliary beliefs of factor and variable nodes in Bethe method should satisfy marginal consistency constraints (MCCs), which is intractable for continuous random variables. To address this issue, the MCCs can be relaxed to mean and variance consistency constraints (MVCCs), which can be given by

𝖤{𝜷𝗊β}=𝖤{𝜷𝖻β}=𝖤{𝜷𝖻s𝜷,m,p},m,p,\mathsf{E}\left\{\bm{\beta}\mid\mathsf{q}_{\beta}\right\}=\mathsf{E}\left\{\bm{\beta}\mid\mathsf{b}_{\beta}\right\}=\mathsf{E}\left\{\bm{\beta}\mid\mathsf{b}_{s\bm{\beta},m,p}\right\},\forall m,p, (35a)
𝖵{𝜷𝗊β}=𝖵{𝜷𝖻β}=𝖵{𝜷𝖻s𝜷,m,p},m,p,\mathsf{V}\left\{\bm{\beta}\mid\mathsf{q}_{\beta}\right\}=\mathsf{V}\left\{\bm{\beta}\mid\mathsf{b}_{\beta}\right\}=\mathsf{V}\left\{\bm{\beta}\mid\mathsf{b}_{s\bm{\beta},m,p}\right\},\forall m,p, (35b)
𝖤{𝐬𝗊s}=𝖤{𝐬𝖻s}=𝖤{𝐬𝖻s𝜷},\mathsf{E}\left\{\mathbf{s}\mid\mathsf{q}_{s}\right\}=\mathsf{E}\left\{\mathbf{s}\mid\mathsf{b}_{s}\right\}=\mathsf{E}\left\{\mathbf{s}\mid\mathsf{b}_{s\bm{\beta}}\right\}, (35c)
𝖵{𝐬𝗊s}=𝖵{𝐬𝖻s}=𝖵{𝐬𝖻s𝜷},\mathsf{V}\left\{\mathbf{s}\mid\mathsf{q}_{s}\right\}=\mathsf{V}\left\{\mathbf{s}\mid\mathsf{b}_{s}\right\}=\mathsf{V}\left\{\mathbf{s}\mid\mathsf{b}_{s\bm{\beta}}\right\}, (35d)

where the beliefs without index subscripts denote the vector version of the scalar beliefs. The variance consistency constraints with respect to 𝜷\bm{\beta} and 𝖻s𝜷\mathsf{b}_{s\bm{\beta}} are further relaxed as (36) by averaging, which can be given by

𝖵{𝜷𝗊β}=𝖵{𝜷𝖻β}=1MNpmp𝖵{𝜷𝖻s𝜷,m,p}.\mathsf{V}\left\{\bm{\beta}\mid\mathsf{q}_{\beta}\right\}=\mathsf{V}\left\{\bm{\beta}\mid\mathsf{b}_{\beta}\right\}=\frac{1}{MN_{\text{p}}}\sum_{m}\sum_{p}\mathsf{V}\left\{\bm{\beta}\mid\mathsf{b}_{s\bm{\beta},m,p}\right\}. (36)

Finally, the KL divergence minimization problem can be converted into the constrained BFE minimization problem, which is expressed by

minb𝒬B,s.t.(33),(34),(35),(36).\min_{b{\in}\mathcal{Q}}{\ }\mathcal{F}_{B},{\quad}\text{s.t.}{\ }\eqref{eq:asd_channel_hyparameter_belief},\eqref{eq:hyparameter_belief},\eqref{eq:MVCC_NR},\eqref{eq:RVCC}. (37)

III-B HMP Algorithm for Channel Estimation

Algorithm 1 HMP Algorithm
1:𝐲\mathbf{y} (received signal), σz2{\sigma}_{z}^{2} (noise variance).
2:Spatial-frequency channel 𝐡^=𝐔𝜷^\hat{\mathbf{h}}=\mathbf{U}\hat{\bm{\beta}}.
3:Initialize 𝜷^,𝝈β2,𝝇β,bs𝜷,𝝃s,bs\hat{\bm{\beta}},\bm{\sigma}_{\beta}^{2},\bm{\varsigma}^{{\beta},b_{s\bm{\beta}}},\bm{\xi}^{s,b_{s}}
4:for t=1,,Tt=1,{\dots},T do
5:     𝝇s,bs=(|𝐆|2(𝝇β,bs𝜷)1)1\bm{\varsigma}^{s,b_{s}}=({|{\mathbf{G}}|^{{\circ}2}}(\bm{\varsigma}^{\beta,b_{s\bm{\beta}}})^{{\circ}-1})^{{\circ}-1}
6:     𝝁s,bs=𝝃s,bs(𝝇s,bs)1+𝐆𝜷^\bm{\mu}^{s,b_{s}}=-{\bm{\xi}^{s,b_{s}}}{\;\circ\;}(\bm{\varsigma}^{s,b_{s}})^{{\circ}-1}+\mathbf{G}\hat{\bm{\beta}}
7:     𝖻s𝒞𝒩(𝐬;𝐲,σz2𝐈)𝒞𝒩(𝐬;𝝁s,bs,𝖽𝗂𝖺𝗀{𝝇s,bs}1)\begin{aligned} \mathsf{b}_{s}{\;\propto\;}&\mathcal{CN}\left(\mathbf{s};\mathbf{y},{\sigma}_{z}^{2}\mathbf{I}\right){\cdot}\\ &\mathcal{CN}\left(\mathbf{s};\bm{\mu}^{s,b_{s}},-\mathsf{diag}\{\bm{\varsigma}^{s,b_{s}}\}^{{\circ}-1}\right)\end{aligned}
8:     Obtain 𝐬^\hat{\mathbf{s}} and 𝝈s\bm{\sigma}_{s} following 𝖻s\mathsf{b}_{s}.
9:     𝝇s,bs𝜷=𝝈s2𝝇s,bs\bm{\varsigma}^{s,b_{s\bm{\beta}}}=-\bm{\sigma}_{s}^{{\circ}-2}-\bm{\varsigma}^{s,b_{s}}
10:     𝝁s,bs𝜷=𝝃s,bs(𝝇s,bs𝜷)1+𝐬^\bm{\mu}^{s,b_{s\bm{\beta}}}={\bm{\xi}^{s,b_{s}}}{\;\circ\;}(\bm{\varsigma}^{s,b_{s\bm{\beta}}})^{{\circ}-1}+\hat{\mathbf{s}}
11:     Δ𝝁=𝝁s,bs𝜷𝝁s,bs{\Delta}\bm{\mu}=\bm{\mu}^{s,b_{s\bm{\beta}}}-\bm{\mu}^{s,b_{s}}
12:     𝝃s,bs=((𝝇s,bs)1+(𝝇s,bs𝜷)1)1Δ𝝁\bm{\xi}^{s,b_{s}}=((\bm{\varsigma}^{s,b_{s}})^{{\circ}-1}+(\bm{\varsigma}^{s,b_{s\bm{\beta}}})^{{\circ}-1})^{{\circ}-1}{\Delta}\bm{\mu}
13:     𝝅s,bs=𝝇s,bs(1+𝝇s,bs𝝈s2)\bm{\pi}^{s,b_{s}}=\bm{\varsigma}^{s,b_{s}}{\;\circ\;}(1+\bm{\varsigma}^{s,b_{s}}{\;\circ\;}\bm{\sigma}_{s}^{2})
14:     𝝅β,bβ=|𝐆H|2𝝅s,bs\bm{\pi}^{\beta,b_{\beta}}=|\mathbf{G}^{H}|^{{\circ}2}\bm{\pi}^{s,b_{s}}
15:     𝝇β,bβ=((𝝅β,bβ)1(MNp𝝇β,bs𝜷)1)1\bm{\varsigma}^{\beta,b_{\beta}}=((\bm{\pi}^{\beta,b_{\beta}})^{{\circ}-1}-(MN_{\text{p}}\bm{\varsigma}^{\beta,b_{s\bm{\beta}}})^{{\circ}-1})^{{\circ}-1}
16:     𝝁β=(𝝇β,bβ)1(𝐆H𝝃s,bs)+𝜷^\bm{\mu}^{\beta}=(\bm{\varsigma}^{\beta,b_{\beta}})^{{\circ}-1}{\;\circ\;}(\mathbf{G}^{H}\bm{\xi}^{s,b_{s}})+\hat{\bm{\beta}}
17:     𝖻β[(1λ^)δ(𝜷)+λ^𝒞𝒩(𝜷;0,𝖽𝗂𝖺𝗀{𝝌^})]𝒞𝒩(𝜷;𝝁β,𝖽𝗂𝖺𝗀{𝝇β,bβ}1)\begin{aligned} \mathsf{b}_{\beta}{\;\propto\;}&[(1-\hat{\lambda}){\delta}(\bm{\beta})+\hat{\lambda}\mathcal{CN}(\bm{\beta};0,\mathsf{diag}\{\hat{\bm{\chi}}\})]{\cdot}\\ &\mathcal{CN}(\bm{\beta};\bm{\mu}^{\beta},-\mathsf{diag}\{\bm{\varsigma}^{\beta,b_{\beta}}\}^{{\circ}-1})\end{aligned}
18:     Obtain 𝜷^\hat{\bm{\beta}} and 𝝈β\bm{\sigma}_{\beta} following 𝖻β\mathsf{b}_{{\beta}}.
19:     𝝇β,bs𝜷=𝝈β2𝝇β,bβ/(MNp)\bm{\varsigma}^{\beta,b_{s\bm{\beta}}}=-\bm{\sigma}_{\beta}^{{\circ}-2}-\bm{\varsigma}^{\beta,b_{\beta}}/(MN_{\text{p}})
20:     Update χ^k,k\hat{\chi}_{k},\forall k by (57) and λ^\hat{\lambda} by (58).
21:end for

The optimization problem in (37) can be solved by the Lagrange multiplier method [41], which sets the derivative of the Lagrangian function with respect to the auxiliary beliefs to zero. Since the auxiliary beliefs of the hyperparameters can be completely factorized in (33), their MCCs can be satisfied constantly and thus can be ignored in the Lagrangian function. Substituting constraints (33) and (34) into 𝖻βλχ,k\mathsf{b}_{{\beta}{\lambda}{\chi},k}, the Lagrangian function of (37) can be expressed as

B=B, HP+M+V,\mathcal{L}_{\text{B}}=\mathcal{F}_{\text{B, HP}}+\mathcal{L}_{\text{M}}+\mathcal{L}_{\text{V}}, (38)

where B, HP\mathcal{F}_{\text{B, HP}} denotes the BFE with factorized hyperparameter constraints, which can be obtained by replacing the KL-divergence D[𝖻βλχ,kfβ,k]D[\mathsf{b}_{{\beta}{\lambda}{\chi},k}\|f_{\beta,k}] with D[𝖻β,kf^β,k]D[\mathsf{b}_{{\beta},k}\|\hat{f}_{\beta,k}] in (III-A), and f^β,k𝖯𝗋(βk;λtrue,χk,true)\hat{f}_{\beta,k}{\;\triangleq\;}\mathsf{Pr}({\beta}_{k};{\lambda}_{\text{true}},{\chi}_{k,\text{true}}), M\mathcal{L}_{\text{M}} and V\mathcal{L}_{\text{V}} defined in (39) denote the subfunctions of mean consistency constraints and variance consistency constraints, respectively, 𝝃s,bs\bm{\xi}^{s,b_{s}}, 𝝃s,bs𝜷\bm{\xi}^{s,b_{s\bm{\beta}}}, 𝝃β,bβ\bm{\xi}^{{\beta},b_{{\beta}}}, and 𝝃m,pβ,bs𝜷\bm{\xi}_{m,p}^{{\beta},b_{s\bm{\beta}}} denote the Lagrangian multiplier for mean consistency constraints, 𝝇s,bs\bm{\varsigma}^{s,b_{s}}, 𝝇s,bs𝜷\bm{\varsigma}^{s,b_{s\bm{\beta}}}, 𝝇β,bβ\bm{\varsigma}^{{\beta},b_{{\beta}}} and 𝝇β,bβ\bm{\varsigma}^{{\beta},b_{{\beta}}} denote the Lagrangian multiplier for variance consistency constraints.

M=\displaystyle\mathcal{L}_{\text{M}}= 2Re{(𝝃s,bs)H[𝖤{𝐬𝗊s}𝖤{𝐬𝖻s}]}+2Re{(𝝃s,bs𝜷)H[𝖤{𝐬𝗊s}𝖤{𝐬𝖻s𝜷}]}\displaystyle 2\real\left\{\left(\bm{\xi}^{s,b_{s}}\right)^{H}\left[\mathsf{E}\left\{\mathbf{s}\mid\mathsf{q}_{s}\right\}-\mathsf{E}\left\{\mathbf{s}\mid\mathsf{b}_{s}\right\}\right]\right\}+2\real\left\{\left(\bm{\xi}^{s,b_{s\bm{\beta}}}\right)^{H}\left[\mathsf{E}\left\{\mathbf{s}\mid\mathsf{q}_{s}\right\}-\mathsf{E}\left\{\mathbf{s}\mid\mathsf{b}_{s\bm{\beta}}\right\}\right]\right\}
+2Re{(𝝃β,bβ)H[𝖤{𝜷𝗊β}𝖤{𝜷𝖻β}]}+mp2Re{(𝝃m,pβ,bs𝜷)H[𝖤{𝜷𝗊β}𝖤{𝜷𝖻s𝜷,m,p}]},\displaystyle+2\real\left\{\left(\bm{\xi}^{{\beta},b_{{\beta}}}\right)^{H}\left[\mathsf{E}\left\{\bm{\beta}\mid\mathsf{q}_{\beta}\right\}-\mathsf{E}\left\{\bm{\beta}\mid\mathsf{b}_{\beta}\right\}\right]\right\}+\sum_{m}\sum_{p}2\real\left\{\left(\bm{\xi}_{m,p}^{{\beta},b_{s\bm{\beta}}}\right)^{H}\left[\mathsf{E}\left\{\bm{\beta}\mid\mathsf{q}_{\beta}\right\}-\mathsf{E}\left\{\bm{\beta}\mid\mathsf{b}_{s\bm{\beta},m,p}\right\}\right]\right\}, (39a)
V=\displaystyle\mathcal{L}_{\text{V}}= (𝝇s,bs)T[𝖵{𝐬𝗊s}𝖵{𝐬𝖻s}]+(𝝇s,bs𝜷)T[𝖵{𝐬𝗊s}𝖵{𝐬𝖻s𝜷}]\displaystyle\left(\bm{\varsigma}^{s,b_{s}}\right)^{T}\left[\mathsf{V}\left\{\mathbf{s}\mid\mathsf{q}_{s}\right\}-\mathsf{V}\left\{\mathbf{s}\mid\mathsf{b}_{s}\right\}\right]+\left(\bm{\varsigma}^{s,b_{s\bm{\beta}}}\right)^{T}\left[\mathsf{V}\left\{\mathbf{s}\mid\mathsf{q}_{s}\right\}-\mathsf{V}\left\{\mathbf{s}\mid\mathsf{b}_{s\bm{\beta}}\right\}\right]
+(𝝇β,bβ)T[𝖵{𝜷𝗊β}𝖵{𝜷𝖻β}]+(𝝇β,bβ)T[MNp𝖵{𝜷𝗊β}mp𝖵{𝜷𝖻s𝜷,m,p}].\displaystyle+\left(\bm{\varsigma}^{{\beta},b_{{\beta}}}\right)^{T}\left[\mathsf{V}\left\{\bm{\beta}\mid\mathsf{q}_{\beta}\right\}-\mathsf{V}\left\{\bm{\beta}\mid\mathsf{b}_{\beta}\right\}\right]+\left(\bm{\varsigma}^{{\beta},b_{{\beta}}}\right)^{T}\left[MN_{\text{p}}\mathsf{V}\left\{\bm{\beta}\mid\mathsf{q}_{\beta}\right\}-\sum_{m}\sum_{p}\mathsf{V}\left\{\bm{\beta}\mid\mathsf{b}_{s\bm{\beta},m,p}\right\}\right]. (39b)

Due to space limitations, the stationary-point equations are provided in Appendix A. Building on the derived equations, the resulting HMP algorithm for channel estimation is summarized in Algorithm 1, where the adaptive damping based on the BFE evaluation [42] is introduced to improve the convergence.

Remark 2.

In the constrained BFE minimization problem, we leverage the properties of different beliefs to design hybrid constraints, thus incorporating the existing message passing algorithms into Algorithm 1. It is noteworthy that the Bernoulli-Gaussian distribution is only one of the prior models, and other specific forms of 𝖻β\mathsf{b}_{\beta} can be obtained by different prior models, such as Bernoulli-Gaussian-Mixture [43] and Laplacian [44] distributions.

IV MDGPP-Based Channel Estimation

Due to the finite number of antennas and pilot subcarriers in the practical system, the sampling sets fail to match the continuous parameters in the channel perfectly [19, 28], resulting in the modeling inaccuracy [45, 46]. To address this issue, we introduce the MDGPP-based representation and propose the two-stage HMP algorithm based on the constrained BFE minimization framework.

IV-A MDGPP-Based Channel Representation

To eliminate the gap between the sampling sets and the continuous parameters, the spatial-frequency domain channel can be decorrelated by the MDGPP-based channel basis functions, which is expressed as

𝐡kβk𝐮~k(Δψk,Δηk,Δτk),\mathbf{h}{\;\approx\;}\sum_{k}{\beta}_{k}\tilde{\mathbf{u}}_{k}\left({\Delta}{\psi}_{k},{\Delta}{\eta}_{k},{\Delta}{\tau}_{k}\right), (40)

where Δψk[ψΔ/2,ψΔ/2){\Delta}{\psi}_{k}{\;\in\;}\left[-{\psi}_{\Delta}/2,{\psi}_{\Delta}/2\right), Δηk[ηΔ/2,ηΔ/2){\Delta}{\eta}_{k}{\;\in\;}\left[-{\eta}_{\Delta}/2,{\eta}_{\Delta}/2\right), and Δτk[τΔ/2,τΔ/2){\Delta}{\tau}_{k}{\;\in\;}\left[-{\tau}_{\Delta}/2,{\tau}_{\Delta}/2\right) denote the perturbation parameters in angle, slope, and delay of the kk-th sampling tuple, respectively, βk{\beta}_{k} denotes the complex gain of the kk-th sampling tuple, and 𝐮~k(Δψk,Δηk,Δτk)\tilde{\mathbf{u}}_{k}\left({\Delta}{\psi}_{k},{\Delta}{\eta}_{k},{\Delta}{\tau}_{k}\right) denote the perturbed basis function of the kk-th sampling tuple defined as

𝐮~k(Δψk,Δηk,Δτk)=\displaystyle\tilde{\mathbf{u}}_{k}\left({\Delta}{\psi}_{k},{\Delta}{\eta}_{k},{\Delta}{\tau}_{k}\right)= [𝐝(τ¯kde+Δτk) 1N]\displaystyle\left[\mathbf{d}\left(\bar{\tau}_{k_{\text{de}}}+{\Delta}{\tau}_{k}\right){\;\otimes\;}\mathbf{1}_{N}\right]
𝐜¯(η¯ksl+Δηk,ψ¯an+Δψk),\displaystyle{\;\circ\;}\bar{\mathbf{c}}\left(\bar{\eta}_{k_{\text{sl}}}+{\Delta}{\eta}_{k},\bar{\psi}_{\text{an}}+{\Delta}{\psi}_{k}\right), (41)

where the index tuple (kan,ksl,kde)\left({k}_{\text{an}},{k}_{\text{sl}},{k}_{\text{de}}\right) and the simple index kk satisfies the mapping rule (21). The proposed MDGPP-based representation assigns individual perturbation parameters to each beam-delay domain sampling grid, while the conventional perturbation assigns common perturbation parameters based on grid line constraints [47]. To visualize advantages of the MDGPP-based representation, the schematic of the two-dimensional example is shown in Fig. 2, where the perturbations are shown on the (θ,ϕ)(\theta,\phi) parameter. It can be observed that, with conventional perturbations, the perturbation of “Grid 1” for matching “Path 1” causes the mismatch between “Grid 2” and “Path 2” due to the common perturbation parameter Δθn{\Delta}{\theta}_{n}, which can be avoided by assigning different perturbation parameters Δθn,p{\Delta}{\theta}_{n,p} and Δθn,q{\Delta}{\theta}_{n,q} in the MDGPP.

Remark 3.

The conventional perturbation can be considered as a special case of the MDGPP. Specifically, when the MDGPP parameters exhibits the specific structure as

Δ𝝍=Δ𝝍¯ 1Ksl 1Kde,\Delta\bm{\psi}=\overline{\Delta\bm{\psi}}{\;\otimes\;}\mathbf{1}_{K_{\text{sl}}}{\;\otimes\;}\mathbf{1}_{K_{\text{de}}}, (42a)
Δ𝜼=𝟏KanΔ𝜼¯ 1Kde,\Delta\bm{\eta}=\mathbf{1}_{K_{\text{an}}}{\;\otimes\;}\overline{\Delta\bm{\eta}}{\;\otimes\;}\mathbf{1}_{K_{\text{de}}}, (42b)
Δ𝝉=𝟏Kan 1KslΔ𝝉¯,\Delta\bm{\tau}=\mathbf{1}_{K_{\text{an}}}{\;\otimes\;}\mathbf{1}_{K_{\text{sl}}}{\;\otimes\;}\overline{\Delta\bm{\tau}}, (42c)

the MDGPP-based representation degenerates to the form in [47], where Δ𝝍\Delta\bm{\psi}, Δ𝜼\Delta\bm{\eta} and Δ𝝉\Delta\bm{\tau} denote vectors of MDGPP parameters in the angle, slope and delay domains, respectively, and Δ𝝍¯\overline{\Delta\bm{\psi}}, Δ𝜼¯\overline{\Delta\bm{\eta}} and Δ𝝉¯\overline{\Delta\bm{\tau}} denote vectors of conventional perturbation parameters in the angle, slope and delay domains, respectively.

Refer to caption
(a) Conventional perturbations
Refer to caption
(b) MDGPP
Figure 2: Conventional perturbations and MDGPP of the two-dimensional example (Green circle: ground-truth parameter, blue circle: grids without perturbation, puple/red cross: matched/mismatched grids with perturbation).

Benefiting from MDGPP-based representations, each path can be associated with the sampling tuple in Υ\Upsilon which is closest to the path parameters, and the deviations are described by the MDGPP parameters. By stacking the perturbed basis function into matrix form, the spatial-frequency domain channel can be given by the bilinear form as

𝐡=𝐔~(Δ𝝍,Δ𝜼,Δ𝝉)𝜷,\mathbf{h}=\tilde{\mathbf{U}}\left(\Delta\bm{\psi},\Delta\bm{\eta},\Delta\bm{\tau}\right)\bm{\beta}, (43)

where 𝐔~(Δ𝝍,Δ𝜼,Δ𝝉)\tilde{\mathbf{U}}\left(\Delta\bm{\psi},\Delta\bm{\eta},\Delta\bm{\tau}\right) denotes the perturbed transformation maxtrix with perturbation parameter vector Δ𝝍\Delta\bm{\psi}, Δ𝜼\Delta\bm{\eta}, and Δ𝝉\Delta\bm{\tau}. To balance the exactness and complexity, the perturbed transformation maxtrix is approximated by the Taylor series [48], which can be expressed as

𝐔~(Δ𝝍,Δ𝜼,Δ𝝉)=\displaystyle\tilde{\mathbf{U}}\left(\Delta\bm{\psi},\Delta\bm{\eta},\Delta\bm{\tau}\right)= 𝐔+𝐔ψ𝖽𝗂𝖺𝗀{Δ𝝍}+\displaystyle\mathbf{U}+\mathbf{U}_{\psi}\mathsf{diag}\left\{\Delta\bm{\psi}\right\}+
𝐔η𝖽𝗂𝖺𝗀{Δ𝜼}+𝐔τ𝖽𝗂𝖺𝗀{Δ𝝉},\displaystyle\mathbf{U}_{\eta}\mathsf{diag}\left\{\Delta\bm{\eta}\right\}+\mathbf{U}_{\tau}\mathsf{diag}\left\{\Delta\bm{\tau}\right\}, (44)

where 𝐔ψ\mathbf{U}_{\psi}, 𝐔η\mathbf{U}_{\eta}, and 𝐔τ\mathbf{U}_{\tau} denote the first-order partial derivatives of 𝐔\mathbf{U} with respect to angle, slope and delay.

IV-B Two-Stage HMP Algorithm for MDGPP-based Channel Estimation

Following the proposed representation, the channel estimation problem aims to estimate both beam-delay domain channel and the MDGPP paramters, leading to the structured bilinear model. To tackle this problem, the joint PDF with the MDGPP paramters is factorized as

𝖯𝗋\displaystyle\mathsf{Pr} (𝐲,𝐬,𝜷,Δ𝝍,Δ𝜼,Δ𝝉)=p(𝐲𝐬)𝖯𝗋(𝜷)\displaystyle\left(\mathbf{y},\mathbf{s},\bm{\beta},\Delta\bm{\psi},\Delta\bm{\eta},\Delta\bm{\tau}\right)=p\left(\mathbf{y}\mid\mathbf{s}\right)\mathsf{Pr}\left(\bm{\beta}\right)
𝖯𝗋(𝐬𝜷,Δ𝝍,Δ𝜼,Δ𝝉)𝖯𝗋(Δ𝝍)𝖯𝗋(Δ𝜼)𝖯𝗋(Δ𝝉),\displaystyle\mathsf{Pr}\left(\mathbf{s}\mid\bm{\beta},\Delta\bm{\psi},\Delta\bm{\eta},\Delta\bm{\tau}\right)\mathsf{Pr}\left(\Delta\bm{\psi}\right)\mathsf{Pr}\left(\Delta\bm{\eta}\right)\mathsf{Pr}\left(\Delta\bm{\tau}\right), (45)

where the MDGPP parameters are assumed to be mutually independent of each other and beam-delay domain channels. With the independent assumptions, the prior PDF of MDGPP parameters can be fully factorized to 𝖯𝗋(Δψk)\mathsf{Pr}\left(\Delta{\psi}_{k}\right), 𝖯𝗋(Δηk)\mathsf{Pr}\left(\Delta{\eta}_{k}\right), and 𝖯𝗋(Δτk)\mathsf{Pr}\left(\Delta{\tau}_{k}\right), which can be denoted by fΔψ,kf_{\Delta{\psi},k}, fΔη,kf_{\Delta{\eta},k}, and fΔτ,kf_{\Delta{\tau},k}, respectively. The conditional PDF can be factorized as

𝖯𝗋\displaystyle\mathsf{Pr} (𝐬𝜷,Δ𝝍,Δ𝜼,Δ𝝉)\displaystyle\left(\mathbf{s}\mid\bm{\beta},\Delta\bm{\psi},\Delta\bm{\eta},\Delta\bm{\tau}\right)
=mp𝖯𝗋(sm,p𝜷,Δ𝝍,Δ𝜼,Δ𝝉)f~s,m,p.\displaystyle=\prod_{m}\prod_{p}\underbrace{\mathsf{Pr}\left(s_{m,p}\mid\bm{\beta},\Delta\bm{\psi},\Delta\bm{\eta},\Delta\bm{\tau}\right)}_{\tilde{f}_{s,m,p}}. (46)

Building on factorizations of the joint PDF, the factor graph of MDGPP-based channel estimation is shown in Fig. 3. We introduce auxiliary beliefs 𝖻Δψ,k(Δψk)\mathsf{b}_{\Delta{\psi},k}\left({\Delta}{\psi}_{k}\right), 𝖻Δη,k(Δηk)\mathsf{b}_{\Delta{\eta},k}\left({\Delta}{\eta}_{k}\right), 𝖻Δτ,k(Δτk)\mathsf{b}_{\Delta{\tau},k}\left({\Delta}{\tau}_{k}\right), and 𝖻s𝜷Δ𝝍Δ𝜼Δ𝝉,m,p(sm,p,𝜷,Δ𝝍,Δ𝜼,Δ𝝉)\mathsf{b}_{s\bm{\beta}\Delta\bm{\psi}\Delta\bm{\eta}\Delta\bm{\tau},m,p}\left(s_{m,p},\bm{\beta},\Delta\bm{\psi},\Delta\bm{\eta},\Delta\bm{\tau}\right) for factor nodes fΔψ,kf_{\Delta{\psi},k}, fΔη,kf_{\Delta{\eta},k}, fΔτ,kf_{\Delta{\tau},k}, and f~s,m,p\tilde{f}_{s,m,p}, respectively. Similarly, the auxiliary beliefs 𝗊Δψ,k(Δψk)\mathsf{q}_{\Delta{\psi},k}\left({\Delta}{\psi}_{k}\right), 𝗊Δη,k(Δηk)\mathsf{q}_{\Delta{\eta},k}\left({\Delta}{\eta}_{k}\right), and 𝗊Δτ,k(Δτk)\mathsf{q}_{\Delta{\tau},k}\left({\Delta}{\tau}_{k}\right) are introduced for variable nodes Δψk\Delta{\psi}_{k}, Δηk\Delta{\eta}_{k}, and Δτk\Delta{\tau}_{k}, respectively. To guarantee the tractability of the structured bilinear inference problem, the complicated beliefs 𝖻s𝜷Δ𝝍Δ𝜼Δ𝝉,m,p\mathsf{b}_{s\bm{\beta}\Delta\bm{\psi}\Delta\bm{\eta}\Delta\bm{\tau},m,p} can be factorized as

𝖻s𝜷Δ𝝍Δ𝜼Δ𝝉,m,p=𝖻s𝜷,m,pk(𝖻Δψ,k𝖻Δη,k𝖻Δτ,k),\mathsf{b}_{s\bm{\beta}\Delta\bm{\psi}\Delta\bm{\eta}\Delta\bm{\tau},m,p}=\mathsf{b}_{s\bm{\beta},m,p}\prod_{k}\left(\mathsf{b}_{\Delta{\psi},k}\mathsf{b}_{\Delta{\eta},k}\mathsf{b}_{\Delta{\tau},k}\right), (47)

where 𝖻Δψ,k\mathsf{b}_{\Delta{\psi},k}, 𝖻Δη,k\mathsf{b}_{\Delta{\eta},k}, and 𝖻Δτ,k\mathsf{b}_{\Delta{\tau},k} are further constrained by the Dirac delta function with ground-truth perturbation parameters Δψk,true[ψΔ/2,ψΔ/2){\Delta}{\psi}_{k,\text{true}}{\in}\left[-{\psi}_{\Delta}/2,{\psi}_{\Delta}/2\right), Δηk,true[ηΔ/2,ηΔ/2){\Delta}{\eta}_{k,\text{true}}{\in}\left[-{\eta}_{\Delta}/2,{\eta}_{\Delta}/2\right), and Δτk,true[τΔ/2,τΔ/2){\Delta}{\tau}_{k,\text{true}}{\in}\left[-{\tau}_{\Delta}/2,{\tau}_{\Delta}/2\right).

Refer to caption
Figure 3: Factor graph representation of the MDGPP-based channel estimation.

With the formulation above, the perturbation parameters can be acquired efficiently by the constrained BFE minimization framework. Compared with Algorithm 1, the HMP algorithm for MDGPP-based channel estimation differ in the update of perturbation parameters, which is derived in Appendix B. The update rule of angle domain perturbation parameters is

Δ𝝍=argminΔ𝝍Δ𝝍T𝐏ψΔ𝝍2𝐮ψTΔ𝝍,\Delta\bm{\psi}^{\star}=\arg\min_{\Delta\bm{\psi}}\Delta\bm{\psi}^{T}\mathbf{P}_{\psi}\Delta\bm{\psi}-2\mathbf{u}_{\psi}^{T}\Delta\bm{\psi}, (48)

where 𝐏ψ\mathbf{P}_{\psi} and 𝐮ψ\mathbf{u}_{\psi} can be defined by

𝐏ψ(𝐔ψH𝐅¯𝐅¯H𝐔ψ)(𝜷^𝜷^H+𝚺β),\mathbf{P}_{\psi}{\;\triangleq\;}\left(\mathbf{U}_{\psi}^{H}\bar{\mathbf{F}}\bar{\mathbf{F}}^{H}\mathbf{U}_{\psi}\right)^{\ast}{\;\circ\;}\left(\hat{\bm{\beta}}\hat{\bm{\beta}}^{H}+\bm{\Sigma}_{\beta}\right), (49)

and

𝐮ψ\displaystyle\mathbf{u}_{\psi}{\;\triangleq\;} Re{𝖽𝗂𝖺𝗀{𝜷^}𝐔ψH𝐅¯(𝐲𝐅¯H𝐔\ψ𝜷^)}\displaystyle\real\left\{\mathsf{diag}\left\{\hat{\bm{\beta}}^{\ast}\right\}\mathbf{U}_{\psi}^{H}\bar{\mathbf{F}}\left(\mathbf{y}-\bar{\mathbf{F}}^{H}\mathbf{U}_{\backslash{\psi}}\hat{\bm{\beta}}\right)\right\}
Re{𝖽𝗂𝖺𝗀{𝐔ψH𝐅¯𝐅¯H𝐔\ψ𝚺β}},\displaystyle-\real\left\{\mathsf{diag}\left\{\mathbf{U}_{\psi}^{H}\bar{\mathbf{F}}\bar{\mathbf{F}}^{H}\mathbf{U}_{\backslash{\psi}}\bm{\Sigma}_{\beta}\right\}\right\}, (50)

where 𝐔\ψ𝐔+𝐔η𝖽𝗂𝖺𝗀{Δ𝜼}+𝐔τ𝖽𝗂𝖺𝗀{Δ𝝉}\mathbf{U}_{\backslash{\psi}}{\;\triangleq\;}\mathbf{U}+\mathbf{U}_{\eta}\mathsf{diag}\left\{\Delta\bm{\eta}\right\}+\mathbf{U}_{\tau}\mathsf{diag}\left\{\Delta\bm{\tau}\right\}, and 𝚺β\bm{\Sigma}_{\beta} denotes the posterior covariance matrix whose diagonal elements are 𝝈β2\bm{\sigma}_{\beta}^{{\circ}2} in Algorithm 1. To solve the above optimization problem, the closed-form solution without bounding constraints is obtained and then checked whether the bounding constraints are satisfied. When the closed-form solution does not satisfy the bounding constraints, the perturbation parameters are adjusted sequentially by

Δψk=Limit{[𝐮ψ]n[𝐏ψ]\n,nT[Δ𝝍]\n[𝐏ψ]n,n,ψΔ2,ψΔ2},\Delta{\psi}_{k}^{\star}\!=\!\mathrm{Limit}\left\{\frac{\left[\mathbf{u}_{\psi}\right]_{n}\!-\!\left[\mathbf{P}_{\psi}\right]_{\backslash n,n}^{T}\left[\Delta\bm{\psi}\right]_{\backslash n}}{\left[\mathbf{P}_{\psi}\right]_{n,n}},\!-\!\frac{{\psi}_{\Delta}}{2},\!\frac{{\psi}_{\Delta}}{2}\right\}, (51)

where Limit{x,a,b}min{max{x,a},b}\mathrm{Limit}\left\{x,a,b\right\}{\;\triangleq\;}\min\left\{\max\left\{x,a\right\},b\right\}. Similarly, the slope and delay domain perturbation parameters Δ𝜼\Delta\bm{\eta} and Δ𝝉\Delta\bm{\tau} can be obtained in the same manner.

However, the closed-form solution of the MDGPP parameters requires matrix inversion, which has a prohibitively high computational complexity in large-scale systems. Motivated by the mmWave channel sparsity, the two-stage HMP algorithm for MDGPP-based channel estimation can be proposed in Algorithm 2, which involves the initial estimation and refinement stages. In the first stage, the initial estimation of the beam-delay domain channel is obtained by Algorithm 1 without model perturbations. Building upon the initial estimation, the elements with energies below a certain threshold are pruned to significantly reduce the number of perturbation parameters to be estimated next. In the second stage, the pruned beam-delay domain channel and its corresponding perturbation parameters are iteratively acquired.

Algorithm 2 Two-Stage HMP Algorithm
1:𝐲\mathbf{y}, σz2{\sigma}_{z}^{2}, EthE_{\text{th}} (energy threshold).
2:𝐡^=𝐔𝜷^\hat{\mathbf{h}}=\mathbf{U}\hat{\bm{\beta}}.
3:Execute Algorithm 1 to obtain the initial estimation of beam-delay domain channel 𝜷^ini\hat{\bm{\beta}}_{\text{ini}}.
4:Determine the unpurned indices based on the energy criterion 𝒮ref={k:[𝜷^ini]k>Eth𝜷^ini}\mathcal{S}_{\text{ref}}=\{k:[\hat{\bm{\beta}}_{\text{ini}}]_{k}>E_{\text{th}}\|\hat{\bm{\beta}}_{\text{ini}}\|_{\infty}\}.
5:Prune the initial estimation of beam-delay domain channel 𝜷^ref=[𝜷^ini]𝒮ref\hat{\bm{\beta}}_{\text{ref}}=[\hat{\bm{\beta}}_{\text{ini}}]_{\mathcal{S}_{\text{ref}}}, and Kref𝖼𝖺𝗋𝖽(𝒮ref)K_{\text{ref}}{\;\triangleq\;}\mathsf{card}(\mathcal{S}_{\text{ref}}).
6:Initialize Δ𝝍\Delta\bm{\psi}, Δ𝜼\Delta\bm{\eta}, and Δ𝝉\Delta\bm{\tau} to 𝟎\mathbf{0}.
7:for t=1,,Treft=1,{\dots},T_{\text{ref}} do
8:     Construct 𝐆~=𝐅¯H𝐔~\tilde{\mathbf{G}}=\bar{\mathbf{F}}^{H}\tilde{\mathbf{U}} based on Δ𝝍\Delta\bm{\psi}, Δ𝜼\Delta\bm{\eta}, and Δ𝝉\Delta\bm{\tau}.
9:     Obtain pruned measurement matrix 𝐆ref=[𝐆~]:,𝒮ref\mathbf{G}_{\text{ref}}=[\tilde{\mathbf{G}}]_{:,\mathcal{S}_{\text{ref}}}.
10:     Replace 𝜷^\hat{\bm{\beta}} and 𝐆\mathbf{G} by 𝜷^ref\hat{\bm{\beta}}_{\text{ref}} and 𝐆ref\mathbf{G}_{\text{ref}}, respectively.
11:     Execute Line 5 - Line 19 in Algorithm 1.
12:     for k=0,1,,Kref1k=0,1,{\dots},K_{\text{ref}}-1 do
13:         Update χ^k\hat{\chi}_{k} by (57) and λ^\hat{\lambda} by (58).
14:         Update Δψk\Delta{\psi}_{k}, Δηk\Delta{\eta}_{k}, and Δτk\Delta{\tau}_{k}.
15:     end for
16:end for

IV-C Computational Complexity

The per-iteration computational complexity of Algorithm 1 is 𝒪(MNpKanKslKde)\mathcal{O}(MN_{\text{p}}K_{\text{an}}K_{\text{sl}}K_{\text{de}}), which are dominated by matrix-vector multiplications. For Algorithm 2, the per-iteration computational complexities in the initial estimation stage and the refinement stage are 𝒪(MNpKanKslKde)\mathcal{O}(MN_{\text{p}}K_{\text{an}}K_{\text{sl}}K_{\text{de}}) and 𝒪(MNpKref2+Kref3)\mathcal{O}(MN_{\text{p}}K_{\text{ref}}^{2}+K_{\text{ref}}^{3}), where the latter is dominated by the perturbation parameter acquisitions. Since the size of beam-delay domain channel after pruning can be assumed to keep the same order of magnitude as path number LL (LMNpL{\;\ll\;}MN_{\text{p}}), we have KrefMNpK_{\text{ref}}{\;\ll\;}MN_{\text{p}}, such that the per-iteration computational complexity in the refinement stage can be approximated as 𝒪(MNpKref2)\mathcal{O}(MN_{\text{p}}K_{\text{ref}}^{2}). Therefore, the total computational complexity of Algorithm 2 is about 𝒪(MNpKanKslKdeTini+MNpKref2Tref)\mathcal{O}(MN_{\text{p}}K_{\text{an}}K_{\text{sl}}K_{\text{de}}T_{\text{ini}}+MN_{\text{p}}K_{\text{ref}}^{2}T_{\text{ref}}), where TiniT_{\text{ini}} and TrefT_{\text{ref}} denote the number of iterations for the initial estimation stage and refinement stage, respectively. It can also be found that KrefK_{\text{ref}} controls the tradeoff between the computational complexity and performance. Specifically, a larger KrefK_{\text{ref}} leads to higher complexity, while smaller KrefK_{\text{ref}} results in performance degradations since the true beam-delay domain channel components will be pruned.

It can be observed that the exploration of inter-subcarrier correlations increases the computational complexity but delivers significant performance gains, as verified in subsequent simulations. Since the proposed algorithms are inherently compatible with the prior models, the further reduction of the computational complexity is feasible by exploiting the prior information, which can be acquired by the sophisticated statistical channel state information estimation [49] and channel knowledge map [50]. This reduction unfolds two aspects:

  • The reduction in unknown variables: With the prior information, the support acquisition of beam-delay domain channels is avoided, which enables the reduction of the unknown variables to the order of the number of paths.

  • The reduction in observations: The received signals are highly correlated with support-aided beam-delay domain channels. To eliminate such redundancy, the row-orthogonalization [51] can be introduced to reduce the effective observations to the order of unknown variables based on an equivalent BFE representation.

Furthermore, the subsequent simulations also indicate that the proposed algorithms requires fewer pilot symbols than benchmarks, enabling a further reduction in computational complexity while maintaining performance.

V Numeical Simulations

Unless otherwise specified, the simulation configurations follow the system model in Section II with the paramters summarized in Table I. In the subsequent simulations, the SNR is defined as the received SNR defined as SNR(dB)=10log10(𝖤{𝐅¯H𝐡22}/𝖤{𝐳22})\text{SNR}\;(\text{dB})=10\log_{10}(\mathsf{E}\{\|\bar{\mathbf{F}}^{H}\mathbf{h}\|_{2}^{2}\}/\mathsf{E}\{\|\mathbf{z}\|_{2}^{2}\}).

TABLE I: Scenario Paramters
Paramter Value
Carrier Frequency fc=30f_{\text{c}}=30 GHz
Pilot Subcarrier Spacing Δf¯=25{\Delta}\bar{f}=25 MHz
Number of BS Antennas N=256N=256
Number of BS RF Chains NRF=16N_{\text{RF}}=16
Number of Pilot Subcarriers Np=64N_{\text{p}}=64
Number of Pilot Symbols NPS=8N_{\text{PS}}=8
Minimum Service Distance rmin=3r_{\min}=3 m
Number of Paths L=6L=6

To demonstrate the superiority of the proposed algorithms, we select the following excellent channel estimation algorithms as benchmarks:

  • P-SIGW [19]: This channel estimation algorithm is designed for near-field channels without beam squint effect, which models the beam domain sparsity with sampling in the angle and distance doamin.

  • GSOMP [25]: This channel estimation algorithm is designed for far-field channels with beam squint effect, which employs frequency-dependent sensing matrices.

  • BPD [30], NBA-OMP [31]: These channel estimation algorithm are designed for near-field channels with beam squint effect, which captures the joint sparsity pattern across subcarriers by bilinear pattern and employs frequency-dependent sensing matrices, respectively.

For the comparison with the proposed algorithms and benchmarks, the normalized mean square error (NMSE) of the spatial-frequency channels is adopted as the performance metric, which is defined by:

NMSE(dB)=10log10{1Dd=0D1𝐡^(d)𝐡(d)22𝐡(d)22},\text{NMSE}\;(\text{dB})=10\log_{10}\left\{\frac{1}{D}\sum_{d=0}^{D-1}\frac{\|\hat{\mathbf{h}}^{(d)}-\mathbf{h}^{(d)}\|_{2}^{2}}{\|\mathbf{h}^{(d)}\|_{2}^{2}}\right\}, (52)

where DD denotes the numebr of simulations, 𝐡(d)\mathbf{h}^{(d)} and 𝐡^(d)\hat{\mathbf{h}}^{(d)} denote the ground-truth and estimated spatial-frequency channels in the dd-th simulation, respectively.

Refer to caption
Figure 4: The convergence performance of the proposed algorithms.

The convergence performance comparison of Algorithm 1 and Algorithm 2 at the SNR of 1515dB is provided in Fig. 4. It can be seen that the MDGPP-based representation introduced in Algorithm 2 significantly reduces the NMSE from 15.94-15.94dB to 18.75-18.75dB. Since both Algorithm 1 and Algorithm 2 have rapid decline rate of NMSE in the beginning iterations, we can terminate the algorithm iterations earlier to reduce the computational complexity based on the performance requirements of practical systems. To show the effectiveness of the proposed beam-delay domain sparse representation, the beam-delay domain channel power of the on-grid case estimated from Algorithm 1 without specific effects are provided in Fig. 5. The proposed beam-delay domain sparse representation can decorrelate the spatial-frequency domain channel effectively, while ignoring the specific effects induces significant energy leakage.

Refer to caption
Figure 5: Beam-delay domain channel power of the on-grid case: (a) This work, (b) Without near-field effect, (c) Without beam-squint effect, (d) Without both near-field and beam-squint effects.
Refer to caption
Figure 6: NMSE versus received SNR.

The NMSE of the proposed algorithms and the benchmarks with the variation of SNR is shown in Fig. 6. It can be seen that the proposed algorithms significantly outperforms benchmarks at the full SNR region, with the performance gain increasing as SNR grows. Due to the mismatch between the assumptions and this scenario, the GSOMP and P-SIGW fail to achieve an NMSE below 7-7dB at the SNR of 1515dB. By exploiting the joint sparsity variations across subcarriers and modeling the spherical wave propagation, the NMSE performances of BPD and NBA-OMP are almost identical, both outperforming GSOMP and P-SIGW. However, there is still a significant gap between the BPD, NBA-OMP and proposed algorithms, which stems from the further exploration of inter-subcarrier correlations. From the performance comparison of Algorithm 1 and Algorithm 2, it is clear that MDGPP-based representations can significantly improve model accuracy and thus Algorithm 2 achieves the lowest NMSE.

Refer to caption
Figure 7: NMSE versus the number of pilot symbols.

The trend of the channel estimation performance with the number of pilot symbols is crucial in the system design, hence, we provide the NMSE of the different algorithms as the number of pilot symbols varies in Fig. 7. Although the received SNRs are kept identical for different numbers of pilot symbols, the NMSE of the GSOMP, P-SIGW, BPD, NBA-OMP, and the proposed algorithms still decreases with the increase in the number of pilot symbols. It indicates that the number of measurements is also an essential factor in the channel estimation problem besides the received SNRs. The proposed algorithms significantly outperform all benchmarks for the full range of pilot symbol number, implying that the pilot overhead can be effectively reduced in the proposed algorithms with the same NMSE requirement. Specifically, Algorithm 1 can reduce the pilot overhead by about 10%10\% compared to the NBA-OMP for 10-10dB NMSE, while Algorithm 2 can further reduce the pilot overhead by about 25%25\% compared to Algorithm 1 for 17-17dB NMSE.

Refer to caption
Figure 8: NMSE versus maximum slope parameter of MTs.

To illustrate the impact of near-field and beam-squint effects on different algorithms, the NMSEs of the proposed algorithms and the benchmarks as maximum slope parameter of MTs and the transmission bandwidths varies are shown in Fig. 8 and Fig. 9, respectively. As the near-field effect growing more significant, the deviation of the plane wave assumption adopted by the GSOMP rises, resulting in increasing NMSE. For the beam-squint effect, the P-SIGW that relies on the frequency-independent beam domain assumption exhibits the increasing NMSE as the transmission bandwidth grows. Besides, although the BPD and NBA-OMP model the frequency-dependent beam domain and spherical wave propagation to achieve better NMSE than other benchmarks, the proposed algorithms reach the lowest NMSE near 20-20dB under both near-field and beam-squint effects.

Refer to caption
Figure 9: NMSE versus transmission bandwidth.

VI Conclusion

In this paper, we investigated the uplink channel estimation of mmWave XL-MIMO communication systems in the beam-delay domain, taking into account the near-field and beam-squint effects. Specifically, we modeled the sparsity in the beam-delay domain, and captured it by the independent and non-identically distributed Bernoulli-Gaussian models with unknown prior hyperparameters. To improve the model accuracy, we introduced the MDGPP-based representation, and treated the MDGPP parameters as unknown hyperparameters. Under the constrained BFE minimization framework, we established the two-stage HMP algorithm for MDGPP-based channel estimation, which enables efficient joint estimation of the beam-delay domain channel and its prior hyperparameters, and reduces the computational complexity by pruning the output of the initial estimation stage. With extensive numerical simulations, the proposed algorithms exhibits superiority over benchmarks in mmWave XL-MIMO communication systems. For future work, channel estimation of mmWave XL-MIMO systems with spatial non-stationary can be investigated, which further destroys the sparsity in the beam domain.

Appendix A HMP Algorithm for Constrained BFE Minimization

Based on the Lagrangian multiplier method, we set the derivative of the Lagrangian function with respect to the auxiliary beliefs to zero. The stationary-point of the auxiliary belief can be given in (53), where 𝝃β,bs𝜷mp𝝃m,pβ,bs𝜷\bm{\xi}^{\beta,b_{s\bm{\beta}}}{\;\triangleq\;}\sum_{m}\sum_{p}\bm{\xi}_{m,p}^{\beta,b_{s\bm{\beta}}}. By substituting the stationary equations in (53) into the MVCCs in (35) and (36), we obtain a series of iteration equations for the Lagrange multipliers. The iteration equations are organized following the message flow backwards and forwards from the observations to the beam-delay domain sparse channel, which yields Algorithm 1 except for the hyperparameter learning.

𝖻s𝒞𝒩(𝐬;𝐲,σz2𝐈)𝒞𝒩(𝐬;𝝃s,bs(𝝇s,bs)1+𝖤{𝐬𝖻s},𝖽𝗂𝖺𝗀{𝝇s,bs}1),\mathsf{b}_{s}{\;\propto\;}\mathcal{CN}\left(\mathbf{s};\mathbf{y},{\sigma}_{z}^{2}\mathbf{I}\right){\;\cdot\;}\mathcal{CN}\left(\mathbf{s};-{\bm{\xi}^{s,b_{s}}}{\;\circ\;}(\bm{\varsigma}^{s,b_{s}})^{{\circ}-1}+\mathsf{E}\{\mathbf{s}\mid\mathsf{b}_{s}\},-\mathsf{diag}\{\bm{\varsigma}^{s,b_{s}}\}^{{\circ}-1}\right), (53a)
𝗊s𝒞𝒩(𝐬;(𝝃s,bs+𝝃s,bs𝜷)(𝝇s,bs+𝝇s,bs𝜷)1+𝖤{𝐬𝗊s},𝖽𝗂𝖺𝗀{𝝇s,bs+𝝇s,bs𝜷}1),\mathsf{q}_{s}{\;\propto\;}\mathcal{CN}\left(\mathbf{s};-(\bm{\xi}^{s,b_{s}}+\bm{\xi}^{s,b_{s\bm{\beta}}}){\;\circ\;}(\bm{\varsigma}^{s,b_{s}}+\bm{\varsigma}^{s,b_{s\bm{\beta}}})^{{\circ}-1}+\mathsf{E}\{\mathbf{s}\mid\mathsf{q}_{s}\},-\mathsf{diag}\{\bm{\varsigma}^{s,b_{s}}+\bm{\varsigma}^{s,b_{s\bm{\beta}}}\}^{{\circ}-1}\right), (53b)
𝖻β[(1λ^)δ(𝜷)+λ^𝒞𝒩(𝜷;0,𝖽𝗂𝖺𝗀{𝝌^})]𝒞𝒩(𝜷;𝝃β(𝝇β,bβ)1+𝖤{𝜷𝖻β},𝖽𝗂𝖺𝗀{𝝇β,bβ}1)\mathsf{b}_{\beta}{\;\propto\;}[(1-\hat{\lambda}){\delta}(\bm{\beta})+\hat{\lambda}\mathcal{CN}(\bm{\beta};0,\mathsf{diag}\{\hat{\bm{\chi}}\})]{\;\cdot\;}\mathcal{CN}(\bm{\beta};\bm{\xi}^{\beta}{\;\circ\;}(\bm{\varsigma}^{\beta,b_{\beta}})^{{\circ}-1}+\mathsf{E}\{\bm{\beta}\mid\mathsf{b}_{\beta}\},-\mathsf{diag}\{\bm{\varsigma}^{\beta,b_{\beta}}\}^{{\circ}-1}) (53c)
𝗊β𝒞𝒩(𝜷;(𝝃β,bβ+𝝃β,bs𝜷)(𝝇β,bβ+MNp𝝇β,bsβ)1+𝖤{𝜷𝗊β},MNp𝖽𝗂𝖺𝗀{𝝇β,bβ++MNp𝝇β,bsβ}1)\mathsf{q}_{\beta}{\;\propto\;}\mathcal{CN}(\bm{\beta};(\bm{\xi}^{\beta,b_{\beta}}+\bm{\xi}^{\beta,b_{s\bm{\beta}}}){\;\circ\;}(\bm{\varsigma}^{\beta,b_{\beta}}+MN_{\text{p}}\bm{\varsigma}^{\beta,b_{s\beta}})^{{\circ}-1}+\mathsf{E}\{\bm{\beta}\mid\mathsf{q}_{\beta}\},-MN_{\text{p}}\mathsf{diag}\{\bm{\varsigma}^{\beta,b_{\beta}}++MN_{\text{p}}\bm{\varsigma}^{\beta,b_{s\beta}}\}^{{\circ}-1}) (53d)
bs𝜷,m,p\displaystyle b_{s\bm{\beta},m,p} δ(sm,p𝐠m,p(r)𝜷)𝒞𝒩(sm,p;ξm,ps,bs𝜷/ςm,ps,bs𝜷+𝖤{sm,pbs,𝜷,m,p},1/ςm,ps,bs𝜷)\displaystyle{\;\propto\;}{\delta}(s_{m,p}-\mathbf{g}_{m,p}^{(\text{r})}\bm{\beta}){\;\cdot\;}\mathcal{CN}({s}_{m,p};-{{\xi}_{m,p}^{s,b_{s\bm{\beta}}}}/{{\varsigma}_{m,p}^{s,b_{s\bm{\beta}}}}+\mathsf{E}\{{s}_{m,p}\mid b_{{s,\bm{\beta}},m,p}\},-{1}/{{\varsigma}_{m,p}^{s,b_{s\bm{\beta}}}})
𝒞𝒩(𝜷;𝝃m,pβ,bs𝜷(𝝃kβ,bs𝜷)1+𝖤{𝜷bs,𝜷,m,p},1/𝝇kβ,bs𝜷).\displaystyle{\;\cdot\;}\mathcal{CN}(\bm{\beta};-\bm{\xi}_{m,p}^{{\beta},b_{s\bm{\beta}}}{\;\circ\;}(\bm{\xi}_{k}^{\beta,b_{s\bm{\beta}}})^{{\circ}-1}+\mathsf{E}\{\bm{\beta}\mid b_{{s,\bm{\beta}},m,p}\},-{1}/{\bm{\varsigma}_{k}^{\beta,b_{s\bm{\beta}}}}). (53e)

For the hyperparameters χk{\chi}_{k}, the constrained BFE minimization with respect to χk,true{\chi}_{k,\text{true}} can be expressed as

χ^k,true(t+1)=argmaxχk,true0𝖻β,k(t)ln𝖯𝗋(βk;λtrue,χk,true)dχk,true,\hat{\chi}_{k,\text{true}}^{(t+1)}=\arg\max_{{\chi}_{k,\text{true}}{\geq}0}\int\mathsf{b}_{\beta,k}^{(t)}\ln\mathsf{Pr}({\beta}_{k};{\lambda}_{\text{true}},{\chi}_{k,\text{true}})\mathrm{d}{\chi}_{k,\text{true}}, (54)

where 𝖻β,k(t)\mathsf{b}_{\beta,k}^{(t)} denotes the fixed-point solution of b[β,k]b_{[}\beta,k] in the tt-th iteration. The derivative of the objective function with respect to χk,true{\chi}_{k,\text{true}} can be obtained by

χk,true\displaystyle\frac{{\partial}}{{\partial}{\chi}_{k,\text{true}}} ln𝖯𝗋(βk;λtrue,χk,true)\displaystyle\ln\mathsf{Pr}({\beta}_{k};{\lambda}_{\text{true}},{\chi}_{k,\text{true}})
={12(|βk|2χk,true21χk,true),βk 00,βk=0.\displaystyle=\begin{cases}\frac{1}{2}\left(\frac{|{\beta}_{k}|^{2}}{{\chi}_{k,\text{true}}^{2}}-\frac{1}{{\chi}_{k,\text{true}}}\right),&{\beta}_{k}{\;\neq\;}0\\ 0,&{\beta}_{k}=0\end{cases}. (55)

Splitting the domain of integration in (54) into closed ball ϵ={βk:|βk|ϵ}\mathcal{B}_{\epsilon}=\{{\beta}_{k}:|{\beta}_{k}|{\;\leq\;}{\epsilon}\} and its complement ϵ=\ϵ\mathcal{B}_{\epsilon}=\mathbb{C}\;\backslash\;\mathcal{B}_{\epsilon}, the first order condition can be given with the limit of ϵ 0{\epsilon}{\;\rightarrow\;}0:

limϵ0βk¯ϵ(|βk|2χk,true)𝖻β,k(t)dβk=0.\lim_{{\epsilon}{\rightarrow}0}\int_{{\beta}_{k}{\in}\bar{\mathcal{B}}_{\epsilon}}(|{\beta}_{k}|^{2}-{\chi}_{k,\text{true}})\mathsf{b}_{\beta,k}^{(t)}\mathrm{d}{\beta}_{k}=0. (56)

Following this, the learning rule of χk,true{\chi}_{k,\text{true}} is

χ^k,true(t+1)=𝖤{|βk|2𝖻β,k(t)},\hat{{\chi}}_{k,\text{true}}^{(t+1)}=\mathsf{E}\{|{\beta}_{k}|^{2}\mid\mathsf{b}_{\beta,k}^{(t)}\}, (57)

whose bounding constraint χ^k,true 0\hat{{\chi}}_{k,\text{true}}{\;\geq\;}0 constantly holds.

In the similar manner, the learning rule of λtrue{\lambda}_{\text{true}} is given by

λ^true(t+1)=1Kk11+LRk(t),\hat{{\lambda}}_{\text{true}}^{(t+1)}=\frac{1}{K}\sum_{k}\frac{1}{1+\mathrm{LR}_{k}^{(t)}}, (58)

where LRk(t)\mathrm{LR}_{k}^{(t)} denotes the posterior likelihood of the kk-th angle-slope-delay tuple, which can be defined by

LR(t)=(1λ^true(t))𝒞𝒩(βk;0,χk,true(t))λ^true(t)𝒞𝒩(βk;0,χk,true(t)+σz2).\mathrm{LR}^{(t)}=\frac{(1-\hat{{\lambda}}_{\text{true}}^{(t)})\mathcal{CN}({\beta}_{k};0,{\chi}_{k,\text{true}}^{(t)})}{\hat{{\lambda}}_{\text{true}}^{(t)}\mathcal{CN}({\beta}_{k};0,{\chi}_{k,\text{true}}^{(t)}+{\sigma}_{z}^{2})}. (59)

Appendix B Perturbation Parameter Learning in the MDGPP-based Channel Estimation

Due to the similarity of perturbation parameter updating in the angle, slope and delay domains, we will give the derivation of the perturbation parameter updating rules in the angle domain as an example.

The optimization objective function of constrained BFE minimization with respect to Δϕ{\Delta}\bm{\phi} can be expressed as

f(Δ𝝍)=m,p𝖤{ln𝖯𝗋(sm,p𝜷,Δ𝝍,Δ𝜼,Δ𝝉)𝖻s𝜷}.f(\Delta\bm{\psi})=\sum_{m,p}\mathsf{E}\{\ln\mathsf{Pr}(s_{m,p}\mid\bm{\beta},\Delta\bm{\psi},\Delta\bm{\eta},\Delta\bm{\tau})\mid\mathsf{b}_{s\bm{\beta}}\}. (60)

To avoid the singularity of Dirac delta function, the condition PDF can be approximated by the limitation form:

𝖯𝗋(sm,p𝜷,Δ𝝍,Δ𝜼,Δ𝝉)=limϵ0𝒞𝒩(sm,p;𝐠~m,p(r)𝜷,ϵ),\mathsf{Pr}(s_{m,p}\mid\bm{\beta},\Delta\bm{\psi},\Delta\bm{\eta},\Delta\bm{\tau})\!=\!\lim_{{\epsilon}{\rightarrow}0}\mathcal{CN}(s_{m,p};\tilde{\mathbf{g}}_{m,p}^{(\text{r})}\bm{\beta},{\epsilon}), (61)

where 𝐠~m,p(r)\tilde{\mathbf{g}}_{m,p}^{(\text{r})} denotes the (mNp+p)(mN_{\text{p}}+p)-th row of 𝐆~\tilde{\mathbf{G}}, and 𝐆~𝐅¯H𝐔~\tilde{\mathbf{G}}{\;\triangleq\;}\bar{\mathbf{F}}^{H}\tilde{\mathbf{U}}. With this approximation, the optimization objective function can be simplified as

f(Δ𝝍)𝐬^𝐆~𝜷^22+𝖳𝗋{𝐆~𝖽𝗂𝖺𝗀{𝝈β2}𝐆~H}.f(\Delta\bm{\psi}){\;\propto\;}\|\hat{\mathbf{s}}-\tilde{\mathbf{G}}\hat{\bm{\beta}}\|_{2}^{2}+\mathsf{Tr}\{\tilde{\mathbf{G}}\mathsf{diag}\{\bm{\sigma}_{\beta}^{2}\}\tilde{\mathbf{G}}^{H}\}. (62)

For the first term of (62), it can be simplified as

𝐬^𝐆~𝜷^22=\displaystyle\|\hat{\mathbf{s}}-\tilde{\mathbf{G}}\hat{\bm{\beta}}\|_{2}^{2}= (𝐬^𝐅¯H𝐔~\ψ𝜷^)𝐅¯H𝐔ψ𝖽𝗂𝖺𝗀{𝜷^}Δ𝝍22\displaystyle\|(\hat{\mathbf{s}}-\bar{\mathbf{F}}^{H}\tilde{\mathbf{U}}_{\backslash{\psi}}\hat{\bm{\beta}})-\bar{\mathbf{F}}^{H}\mathbf{U}_{\psi}\mathsf{diag}\{\hat{\bm{\beta}}\}\Delta\bm{\psi}\|_{2}^{2}
\displaystyle{\;\propto\;} Δ𝝍T((𝐔ψH𝐅¯𝐅¯H𝐔ψ)𝜷^𝜷^H)Δ𝝍\displaystyle\Delta\bm{\psi}^{T}((\mathbf{U}_{\psi}^{H}\bar{\mathbf{F}}\bar{\mathbf{F}}^{H}\mathbf{U}_{\psi})^{\ast}{\;\circ\;}\hat{\bm{\beta}}\hat{\bm{\beta}}^{H})\Delta\bm{\psi}-
2Re{𝖽𝗂𝖺𝗀{𝜷^}𝐔ψH𝐅¯(𝐬^𝐅¯H𝐔~\ψ𝜷^)}TΔ𝝍.\displaystyle 2\real\{\mathsf{diag}\{\hat{\bm{\beta}}^{\ast}\}\mathbf{U}_{\psi}^{H}\bar{\mathbf{F}}(\hat{\mathbf{s}}-\bar{\mathbf{F}}^{H}\tilde{\mathbf{U}}_{\backslash{\psi}}\hat{\bm{\beta}})\}^{T}\Delta\bm{\psi}. (63)

For the second term of (62), it can be simplified as

𝖳𝗋\displaystyle\mathsf{Tr} {𝐆~𝖽𝗂𝖺𝗀{𝝈β2}𝐆~H}\displaystyle\{\tilde{\mathbf{G}}\mathsf{diag}\{\bm{\sigma}_{\beta}^{2}\}\tilde{\mathbf{G}}^{H}\}
\displaystyle{\;\propto\;} 2Re{𝖳𝗋{𝐔~ψH𝐅¯𝐅¯H𝐔~\ψ𝖽𝗂𝖺𝗀{𝝈β2}𝖽𝗂𝖺𝗀{Δ𝝍}}}\displaystyle 2\real\{\mathsf{Tr}\{\tilde{\mathbf{U}}_{{\psi}}^{H}\bar{\mathbf{F}}\bar{\mathbf{F}}^{H}\tilde{\mathbf{U}}_{\backslash{\psi}}\mathsf{diag}\{\bm{\sigma}_{\beta}^{2}\}\mathsf{diag}\{\Delta\bm{\psi}\}\}\}
+𝖳𝗋{𝖽𝗂𝖺𝗀{Δ𝝍}𝖽𝗂𝖺𝗀{𝝈β2}𝖽𝗂𝖺𝗀{Δ𝝍}𝐔ψH𝐅¯𝐅¯H𝐔ψ}\displaystyle+\mathsf{Tr}\{\mathsf{diag}\{\Delta\bm{\psi}\}\mathsf{diag}\{\bm{\sigma}_{\beta}^{2}\}\mathsf{diag}\{\Delta\bm{\psi}\}\mathbf{U}_{\psi}^{H}\bar{\mathbf{F}}\bar{\mathbf{F}}^{H}\mathbf{U}_{\psi}\}
=\displaystyle= 2Re{𝖳𝗋{𝖽𝗂𝖺𝗀{𝐔~ψH𝐅¯𝐅¯H𝐔~\ψ𝖽𝗂𝖺𝗀{𝝈β2}}}}TΔ𝝍\displaystyle 2\real\{\mathsf{Tr}\{\mathsf{diag}\{\tilde{\mathbf{U}}_{{\psi}}^{H}\bar{\mathbf{F}}\bar{\mathbf{F}}^{H}\tilde{\mathbf{U}}_{\backslash{\psi}}\mathsf{diag}\{\bm{\sigma}_{\beta}^{2}\}\}\}\}^{T}\Delta\bm{\psi}
+Δ𝝍T((𝐔ψH𝐅¯𝐅¯H𝐔ψ)𝖽𝗂𝖺𝗀{𝝈β2})Δ𝝍\displaystyle+\Delta\bm{\psi}^{T}((\mathbf{U}_{\psi}^{H}\bar{\mathbf{F}}\bar{\mathbf{F}}^{H}\mathbf{U}_{\psi})^{\ast}{\;\circ\;}\mathsf{diag}\{\bm{\sigma}_{\beta}^{2}\})\Delta\bm{\psi} (64)

where the last equality is based on the identity equation as

𝖳𝗋{𝖽𝗂𝖺𝗀H{𝐱}𝐀𝖽𝗂𝖺𝗀{𝐲}𝐁T}=𝐱H(𝐀𝐁)𝐲\mathsf{Tr}\{\mathsf{diag}^{H}\{\mathbf{x}\}\mathbf{A}\mathsf{diag}\{\mathbf{y}\}\mathbf{B}^{T}\}=\mathbf{x}^{H}(\mathbf{A}{\;\circ\;}\mathbf{B})\mathbf{y} (65)

for the proper dimensions of matrices 𝐀\mathbf{A}, 𝐁\mathbf{B} and vectors 𝐱\mathbf{x}, 𝐲\mathbf{y}.

Based on the simplification above, the constrained BFE minimization with respect to Δ𝝍\Delta\bm{\psi} can be equivalently expressed in (48), which can be solved efficiently by closed-form solution without bounding constraints and sequential adjustment in (51). The perturbation parameter updating rules in the slope and delay domains can be derived in the similar manner, which are omiited due to the space limitations.

References

  • [1] W. Roh, J.-Y. Seol et al., “Millimeter-wave beamforming as an enabling technology for 5G cellular communications: theoretical feasibility and prototype results,” IEEE Commun. Mag., vol. 52, no. 2, pp. 106–113, Feb. 2014.
  • [2] M. Xiao, S. Mumtaz et al., “Millimeter wave communications for future mobile networks,” IEEE J. Sel. Areas Commun., vol. 35, no. 9, pp. 1909–1935, Sep. 2017.
  • [3] Y. Han, S. Jin et al., “Channel estimation for extremely large-scale massive MIMO systems,” IEEE Wireless Commun. Lett., vol. 9, no. 5, pp. 633–637, May 2020.
  • [4] G. Bacci, L. Sanguinetti, and E. Björnson, “Spherical wavefronts improve MU-MIMO spectral efficiency when using electrically large arrays,” IEEE Wireless Commun. Lett., vol. 12, no. 7, pp. 1219–1223, Jul. 2023.
  • [5] Z. Wang, J. Zhang et al., “Extremely large-scale MIMO: Fundamentals, challenges, solutions, and future directions,” IEEE Wireless Commun., pp. 1–9, 2023.
  • [6] K. T. Selvan and R. Janaswamy, “Fraunhofer and fresnel distances: Unified derivation for aperture antennas,” IEEE Antennas Propag. Mag., vol. 59, no. 4, pp. 12–15, Aug. 2017.
  • [7] Z. Zhou, X. Gao et al., “Spherical wave channel and analysis for large linear array in LoS conditions,” in Proc. IEEE Globecom Workshops (GC Workshop), 2015, pp. 1–6.
  • [8] B. Wang, F. Gao et al., “Spatial-and frequency-wideband effects in millimeter-wave massive MIMO systems,” IEEE Trans. Signal Process., vol. 66, no. 13, pp. 3393–3406, Jul. 2018.
  • [9] M. M. Mojahedian, M. Attarifar, and A. Lozano, “Spatial-wideband effect in line-of-sight MIMO communication,” in Proc. IEEE Int. Conf. Commun. (ICC), 2022, pp. 3972–3977.
  • [10] H. Luo, F. Gao et al., “Beam squint assisted user localization in near-field integrated sensing and communications systems,” IEEE Trans. Wireless Commun., 2023.
  • [11] 3GPP, “NR; user equipment (UE) radio transmission and reception; part 2: Range 2 standalone,” 3rd Generation Partnership Project (3GPP), Technical Specification (TS) 38.101-2, 2023.
  • [12] M. R. Akdeniz, Y. Liu et al., “Millimeter wave channel modeling and cellular capacity evaluation,” IEEE J. Sel. Areas Commun., vol. 32, no. 6, pp. 1164–1179, Jun. 2014.
  • [13] I. A. Hemadeh, K. Satyanarayana et al., “Millimeter-wave communications: Physical channel models, design considerations, antenna constructions, and link-budget,” IEEE Commun. Surveys Tuts., vol. 20, no. 2, pp. 870–913, Secondquarter 2018.
  • [14] C.-X. Wang, J. Bian et al., “A survey of 5G channel measurements and models,” IEEE Commun. Surveys Tuts., vol. 20, no. 4, pp. 3142–3168, Aug. 2018.
  • [15] J. Lee, G.-T. Gil, and Y. H. Lee, “Channel estimation via orthogonal matching pursuit for hybrid MIMO systems in millimeter wave communications,” IEEE Trans. Commun., vol. 64, no. 6, pp. 2370–2386, Jun. 2016.
  • [16] J. P. González-Coma, J. Rodríguez-Fernández et al., “Channel estimation and hybrid precoding for frequency selective multiuser mmWave MIMO systems,” IEEE J. Sel. Topics Signal Process., vol. 12, no. 2, pp. 353–367, May 2018.
  • [17] N. González-Prelcic, H. Xie et al., “Wideband channel tracking and hybrid precoding for mmWave MIMO systems,” IEEE Trans. Wireless Commun., vol. 20, no. 4, pp. 2161–2174, Apr. 2021.
  • [18] K. Venugopal, A. Alkhateeb et al., “Channel estimation for hybrid architecture-based wideband millimeter wave systems,” IEEE J. Sel. Areas Commun., vol. 35, no. 9, pp. 1996–2009, Sep. 2017.
  • [19] M. Cui and L. Dai, “Channel estimation for extremely large-scale MIMO: Far-field or near-field?” IEEE Trans. Commun., vol. 70, no. 4, pp. 2663–2677, Jan. 2022.
  • [20] Y. Lu and L. Dai, “Near-field channel estimation in mixed LoS/NLoS environments for extremely large-scale MIMO systems,” IEEE Trans. Commun., vol. 71, no. 6, pp. 3694–3707, Jun. 2023.
  • [21] Z. Hu, C. Chen et al., “Hybrid-field channel estimation for extremely large-scale massive MIMO system,” IEEE Commun. Lett., vol. 27, no. 1, pp. 303–307, Jan. 2023.
  • [22] X. Wei and L. Dai, “Channel estimation for extremely large-scale massive MIMO: Far-field, near-field, or hybrid-field?” IEEE Commun. Lett., vol. 26, no. 1, pp. 177–181, Jan. 2022.
  • [23] X. Peng, L. Zhao et al., “Channel estimation for extremely large-scale massive MIMO systems in hybrid-field channel,” in Proc. IEEE/CIC Int. Conf. Commun. China (ICCC), 2023, pp. 1–6.
  • [24] Y. Li and M. Jiang, “ADMM-based hybrid-field uplink channel estimation for extremely large-scale MIMO systems,” in Proc. IEEE/CIC Int. Conf. Commun. China (ICCC), 2023, pp. 1–5.
  • [25] K. Dovelos, M. Matthaiou et al., “Channel estimation and hybrid combining for wideband Terahertz massive MIMO systems,” IEEE J. Sel. Areas Commun., vol. 39, no. 6, pp. 1604–1620, Jun. 2021.
  • [26] J. Tan and L. Dai, “Wideband channel estimation for THz massive MIMO,” China Commun., vol. 18, no. 5, pp. 66–80, May 2021.
  • [27] I.-S. Kim and J. Choi, “Spatial wideband channel estimation for mmWave massive MIMO systems with hybrid architectures and low-resolution ADCs,” IEEE Trans. Wireless Commun., vol. 20, no. 6, pp. 4016–4029, Jun. 2021.
  • [28] B. Wang, M. Jian et al., “Beam squint and channel estimation for wideband mmWave massive MIMO-OFDM systems,” IEEE Trans. Signal Process., vol. 67, no. 23, pp. 5893–5908, Dec. 2019.
  • [29] M. Jian, F. Gao et al., “Angle-domain aided UL/DL channel estimation for wideband mmWave massive MIMO systems with beam squint,” IEEE Trans. Wireless Commun., vol. 18, no. 7, pp. 3515–3527, Jul. 2019.
  • [30] M. Cui and L. Dai, “Near-field wideband channel estimation for extremely large-scale MIMO,” Sci. China Inf. Sci., vol. 66, no. 7, p. 172303, Jun. 2023.
  • [31] A. M. Elbir, K. V. Mishra, and S. Chatzinotas, “NBA-OMP: Near-field beam-split-aware orthogonal matching pursuit for wideband THz channel estimation,” in Proc. IEEE Int. Conf. Acoust., Speech Signal Process. (ICASSP), 2023, pp. 1–5.
  • [32] R. Méndez-Rial, C. Rusu et al., “Hybrid MIMO architectures for millimeter wave communications: Phase shifters or switches?” IEEE Access, vol. 4, pp. 247–267, Jan. 2016.
  • [33] S. M. Kay, Fundamentals of Statistical Signal Processing: Estimation Theory.   Prentice-Hall, Inc., 1993.
  • [34] J. Yang, A.-A. Lu et al., “Channel estimation for massive MIMO: An information geometry approach,” IEEE Trans. Signal Process., vol. 70, pp. 4820–4834, 2022.
  • [35] M. I. Jordan, Z. Ghahramani et al., “An introduction to variational methods for graphical models,” Mach. learn., vol. 37, pp. 183–233, Nov. 1999.
  • [36] J. Winn, C. M. Bishop, and T. Jaakkola, “Variational message passing,” J. Mach. Learn. Res., vol. 6, no. 4, Apr. 2005.
  • [37] C. M. Bishop and N. M. Nasrabadi, Pattern Recognition and Machine Learning.   Springer, 2006, vol. 4, no. 4.
  • [38] J. Yedidia, W. Freeman, and Y. Weiss, “Constructing free-energy approximations and generalized belief propagation algorithms,” IEEE Trans. Inf. Theory, vol. 51, no. 7, pp. 2282–2312, Jul. 2005.
  • [39] D. Zhang, X. Song et al., “Unifying message passing algorithms under the framework of constrained Bethe free energy minimization,” IEEE Trans. Wireless Commun., vol. 20, no. 7, pp. 4144–4158, Jul. 2021.
  • [40] X. Liu, W. Wang et al., “Sparse channel estimation via hierarchical hybrid message passing for massive MIMO-OFDM systems,” IEEE Trans. Wireless Commun., vol. 20, no. 11, pp. 7118–7134, Nov. 2021.
  • [41] S. P. Boyd and L. Vandenberghe, Convex Optimization.   Cambridge University Press, 2004.
  • [42] J. Vila, P. Schniter et al., “Adaptive damping and mean removal for the generalized approximate message passing algorithm,” in Proc. IEEE Int. Conf. Acoust., Speech Signal Process. (ICASSP), 2015, pp. 2021–2025.
  • [43] J. P. Vila and P. Schniter, “Expectation-maximization Gaussian-mixture approximate message passing,” IEEE Trans. Signal Process., vol. 61, no. 19, pp. 4658–4672, Oct. 2013.
  • [44] F. Bellili, F. Sohrabi, and W. Yu, “Generalized approximate message passing for massive MIMO mmWave channel estimation with Laplacian prior,” IEEE Trans. Commun., vol. 67, no. 5, pp. 3205–3219, May 2019.
  • [45] Y. Chi, L. L. Scharf et al., “Sensitivity to basis mismatch in compressed sensing,” IEEE Trans. Signal Process., vol. 59, no. 5, pp. 2182–2195, May 2011.
  • [46] G. Tang, B. N. Bhaskar et al., “Compressed sensing off the grid,” IEEE Trans. Inf. Theory, vol. 59, no. 11, pp. 7465–7490, Nov. 2013.
  • [47] G. Liu, A. Liu et al., “Angular-domain selective channel tracking and Doppler compensation for high-mobility mmWave massive MIMO,” IEEE Trans. Wireless Commun., vol. 20, no. 5, pp. 2902–2916, May 2021.
  • [48] T. M. Apostol, Mathematical Analysis.   Addison-Wesley, 1974.
  • [49] A.-A. Lu, Y. Chen, and X. Gao, “2D beam domain statistical CSI estimation for massive MIMO uplink,” IEEE Trans. Wireless Commun., 2023.
  • [50] Y. Zeng and X. Xu, “Toward environment-aware 6G communications via channel knowledge map,” IEEE Wireless Commun., vol. 28, no. 3, pp. 84–91, Jun. 2021.
  • [51] H. Fan, W. Wang et al., “Generalized approximate message passing detection with row-orthogonal linear preprocessing for uplink massive MIMO systems,” in Proc. IEEE Veh. Technol. Conf. (VTC-Fall), 2017, pp. 1–6.