This paper was converted on www.awesomepapers.org from LaTeX by an anonymous user.
Want to know more? Visit the Converter page.

Bayesian Channel Estimation for Intelligent Reflecting Surface-Aided mmWave Massive MIMO Systems With Semi-Passive Elements

In-soo Kim, Mehdi Bennis, Jaeky Oh, Jaehoon Chung, and Junil Choi I. Kim is with Wireless Research & Development (WRD), Qualcomm Technologies, Inc., San Diego, CA, USA (e-mail: [email protected]).J. Choi is with the School of Electrical Engineering, KAIST, Daejeon, South Korea (e-mail: [email protected]).M. Bennis is with the Centre for Wireless Communications, University of Oulu, Oulu, Finland (e-mail: [email protected]).Jaeky Oh and Jaehoon Chung are with C&M Standard Lab, ICT Technology Center, LG Electronics Inc. (e-mail: {jaeky.oh; jaehoon.chung}@lge.com).This work was supported in part by LG Electronics Inc., in part by the MSIT (Ministry of Science and ICT), Korea, under the ITRC (Information Technology Research Center) support program (IITP-2020-0-01787) supervised by the IITP (Institute of Information & Communications Technology Planning & Evaluation), in part by the Ministry of Trade, Industry and Energy (MOTIE) and Korea Institute for Advancement of Technology (KIAT) through the International Cooperative R&D program (P0022557), and in part by the Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. 2021-0-00269, Development of sub-THz band wireless transmission and access core technology for 6G Tbps data rate).
Abstract

In this paper, we propose a Bayesian channel estimator for intelligent reflecting surface-aided (IRS-aided) millimeter wave (mmWave) massive multiple-input multiple-output (MIMO) systems with semi-passive elements that can receive the signal in the active sensing mode. Ultimately, our goal is to minimize the channel estimation error using the received signal at the base station and additional information acquired from a small number of active sensors at the IRS. Unlike recent works on channel estimation with semi-passive elements that require both uplink and downlink training signals to estimate the UE-IRS and IRS-BS links, we only use uplink training signals to estimate all the links. To compute the minimum mean squared error (MMSE) estimates of all the links, we propose a novel variational inference-sparse Bayesian learning (VI-SBL) channel estimator that performs approximate posterior inference on the channel using VI with the mean-field approximation under the SBL framework. The simulation results show that VI-SBL outperforms the state-of-the-art baselines for IRS with passive reflecting elements in terms of the channel estimation accuracy and training overhead. Furthermore, VI-SBL with semi-passive elements is shown to be more spectral- and energy-efficient than the baselines with passive reflecting elements.

Index Terms:
Channel estimation, intelligent reflecting surface (IRS), semi-passive element, variational inference (VI), sparse Bayesian learning (SBL).

I Introduction

5G wireless communications support high data rates by communicating in the millimeter wave (mmWave) band [1]. The high carrier frequency in the range of 30-300 GHz offers a large bandwidth, which results in a significant throughput gain. The problem, however, is that the severe path loss renders mmWave communications vulnerable to blockages. To overcome such an issue, intelligent reflecting surface (IRS) was recently proposed [2], which is an array of metamaterial-based passive reflecting elements capable of adjusting the amplitude and phase of the impinging signal. For the IRS to generate a favorable detour around a blockage, the reflection amplitude and phase shift must align with the channel, which necessitates accurate channel state information (CSI).

The distinct feature of channel estimation for IRS with passive reflecting elements is that the UE-IRS link of size NKNK and IRS-BS link of size MNMN form a UE-IRS-BS link of size MNKMNK where MM, NN, and KK are the numbers of base station antennas, IRS elements, and single-antenna users. Since passive reflecting elements cannot observe the UE-IRS and IRS-BS links, the UE-IRS-BS link must be estimated from the reflected signal, which results in a significant training overhead.

Next, we review prior works [3, 4, 5, 6, 7, 8, 9, 10, 11] on channel estimation for IRS with passive reflecting elements that focus on training overhead reduction. In [3], a three-phase channel estimator is proposed inspired by the correlation between the channels of different users. In particular, [3] exploits the fact that the UE-IRS-BS links of different users share the same IRS-BS link. As a result, the three-phase channel estimator proceeds by estimating the UE-BS links of all the users in the first phase, UE-IRS-BS link of a particular user in the second phase, and UE-IRS-BS links of the remaining users in the third phase where most of the training overhead reduction occurs. Moving on to [4], channel estimation for IRS with discrete phase shifts is considered, for which a reflection design is proposed. In particular, the reflection design is optimized based on grouping IRS elements to reduce the channel estimation error. In [5], a channel estimator is proposed for the orthogonal frequency division multiple access (OFDMA) scenario based on the line-of-sight (LoS) assumption on the UE-BS link. Again, the channel estimator exploits the fact that the UE-IRS-BS links of different users share the same IRS-BS link to reduce the training overhead as in [3]. In [6], the channel estimation problem is reformulated as a matrix factorization problem, which is solved using the message passing (MP) algorithm. In [7], a dual link training signal-based channel estimator is proposed. In particular, the dual link training signal transmits downlink training signals to the IRS, whose reflected version is used as uplink training signals to estimate the IRS-BS link. In [8], an atomic norm minimization-based channel estimator is proposed that extracts the angle parameters of the channel. In [9], a channel estimator is proposed based on the single-path approximation of the channel. In addition, [9] develops another channel estimator that exploits IRS phase shifts and training signals. In [10], the mmWave channel estimation is reformulated as a compressed sensing problem to capture the channel sparsity. In [11], the double-structured orthogonal matching pursuit (DS-OMP) channel estimator is proposed that exploits the common IRS-BS link shared by all the users. In particular, DS-OMP reduces the channel estimation error and training overhead by identifying the common rows and columns that the UE-IRS-BS links of all the users share by taking into account the double sparsity structure inherent in the IRS links.

The training overhead in [3, 4, 5, 6, 7, 8, 9, 10, 11] is still high because estimating the UE-IRS-BS link with passive reflecting elements that cannot receive the signal is a challenging task. To overcome such an issue, IRS with semi-passive elements that can be switched to the active sensing mode was recently proposed in [12, 13]. In particular, the received signal at the active sensors is leveraged to estimate the UE-IRS and IRS-BS links in the first coherence block. Then, [12, 13] can replace UE-IRS-BS link estimation of size MNKMNK with UE-IRS link estimation of size NKNK in the subsequent coherence blocks because the IRS-BS link remains constant over multiple coherence blocks, which results in significant channel estimation overhead reduction. The problem of [12, 13], however, is that both uplink and downlink training signals are necessary to estimate all the links. Furthermore, [12, 13] are semi-passive element-driven rather than aided because only the received signal at the active sensors is leveraged, while the received signal at the base station is discarded.

In this paper, a Bayesian channel estimator is proposed for IRS-aided mmWave massive multiple-input multiple-output (MIMO) systems with semi-passive elements. In particular, the semi-passive elements are activated to the active sensing mode in the channel estimation phase and deactivated to the passive reflecting mode in the data transmission phase. For the first time in the literature, we estimate the UE-IRS and IRS-BS links using only uplink training signals, which is in contrast to recent works [12, 13] on channel estimation with semi-passive elements that rely on both uplink and downlink training signals to estimate all the links. To compute the minimum mean squared error (MMSE) estimate of the channel from the received signal at the base station and additional information acquired from the active sensors at the IRS, we perform posterior inference on the channel under the sparse Bayesian learning (SBL) framework [14, 15]. Since exact posterior inference is intractable, we use the variational inference (VI) approach with the mean-field approximation [16, 17]. The proposed VI-SBL-based channel estimator enables us to compute the approximate MMSE estimates of all the links iteratively. In addition, we reduce the complexity of VI-SBL by converting a large matrix inversion to many small matrix inversions using the mean-field approximation. The simulation results show that VI-SBL outperforms the state-of-the-art channel estimators for IRS with passive reflecting elements in terms of the channel estimation error, training overhead, and energy efficiency, which is defined as the spectral efficiency normalized by the total power consumption to make a fair comparison between IRS with semi-passive elements and passive reflecting elements.

The rest of the paper is organized as follows. The channel model and signal model are introduced in Section II. In Section III, the proposed VI-SBL-based channel estimator is developed, which is followed by the mean-field approximation-based complexity reduction scheme. The performance of VI-SBL is assessed in Section IV based on various performance metrics, and a concluding remark follows in Section V.

Notation: aa, 𝐚\mathbf{a}, and 𝐀\mathbf{A} denote a scalar, vector, and matrix. The complex conjugate, transpose, and conjugate transpose of 𝐀\mathbf{A} are written as 𝐀\mathbf{A}^{*}, 𝐀T\mathbf{A}^{\mathrm{T}}, and 𝐀H\mathbf{A}^{\mathrm{H}}. |a||a| and a\angle a are the magnitude and phase of aa. The ii-th element of 𝐚\mathbf{a} is aia_{i}, while the (i,j)(i,j)-th element and ii-th column of 𝐀\mathbf{A} are [𝐀]i,j[\mathbf{A}]_{i,j} and [𝐀]:,i[\mathbf{A}]_{:,i}. The elementwise product, Kronecker product, and Khatri-Rao product of 𝐀\mathbf{A} and 𝐁\mathbf{B} are written as 𝐀𝐁\mathbf{A}\odot\mathbf{B}, 𝐀𝐁\mathbf{A}\otimes\mathbf{B}, and kr(𝐀,𝐁)\mathrm{kr}(\mathbf{A},\mathbf{B}). The n×1n\times 1 all-zero vector, n×1n\times 1 all-one vector, and n×nn\times n identity matrix are written as 𝟎n\mathbf{0}_{n}, 𝟏n\mathbf{1}_{n}, and 𝐈n\mathbf{I}_{n}. vec(𝐀)\mathrm{vec}(\mathbf{A}) is the vectorization of 𝐀\mathbf{A}, while diag(𝐚)\mathrm{diag}(\mathbf{a}) is the diagonal matrix with 𝐚\mathbf{a} on the main diagonal. The probability density function (PDF) of a complex Gaussian random vector 𝐱𝒞𝒩(𝐦,𝐂)\mathbf{x}\sim\mathcal{CN}(\mathbf{m},\mathbf{C}) with mean 𝐦\mathbf{m} and covariance 𝐂\mathbf{C} is written as 𝒞𝒩(𝐱|𝐦,𝐂)\mathcal{CN}(\mathbf{x}|\mathbf{m},\mathbf{C}). The probability measure of a random vector 𝐱\mathbf{x} is p(𝐱)p(\mathbf{x}). The set difference between sets 𝒜\mathcal{A} and \mathcal{B} is 𝒜\mathcal{A}\setminus\mathcal{B}. N\llbracket N\rrbracket denotes {1,,N}\{1,...,N\}.

Refer to caption
Figure 1: The uplink of an IRS-aided mmWave massive MIMO system with semi-passive elements. The semi-passive elements act as active sensors in the channel estimation phase and passive reflecting elements in the data transmission phase.

II System Model and Problem Formulation

Consider the uplink of an IRS-aided mmWave massive MIMO system with an MM-antenna base station and KK single-antenna users as illustrated in Fig. 1. The IRS is equipped with NN elements to compensate for the path loss in the mmWave band. In particular, NpN_{\mathrm{p}} passive reflecting elements reflect the impinging signal as usual [18, 19, 20]. Meanwhile, Na=NNpNN_{\mathrm{a}}=N-N_{\mathrm{p}}\ll N semi-passive elements act as active sensors in the channel estimation phase and passive reflecting elements in the data transmission phase. The active sensors are capable of receiving the signal, whose quantized version is forwarded to the base station. In practice, the active sensors are implemented by connecting the IRS elements to NaN_{\mathrm{a}} radio frequency (RF) chains with BB-bit analog-to-digital converters (ADCs) [12], whose connection can be switched using base station-controlled switching networks [21, 22]. In this paper, a small number of active sensors and low-resolution ADCs are considered to reduce the additional power consumed by the IRS.

In essence, our goal is to exploit the received signal at the base station and additional information acquired from the active sensors at the IRS to estimate the UE-IRS and IRS-BS links instead of the UE-IRS-BS link. Then, the base station can avoid estimating the UE-IRS-BS link of size MNKMNK and only estimate the UE-IRS link of size NKNK in the subsequent coherence blocks because the IRS-BS link remains constant over multiple coherence blocks in practice [7]. As a result, channel estimation overhead reduction in the long run is attained from the reduced training overhead achieved by the active sensors in the first coherence block, and replacing UE-IRS-BS link estimation with UE-IRS link estimation in the subsequent coherence blocks. In this paper, we focus on the first coherence block where all the links are unknown.

II-A Channel Model

The channel is composed of the UE-IRS link 𝐟¯kN\bar{\mathbf{f}}_{k}\in\mathbb{C}^{N} for kKk\in\llbracket K\rrbracket, IRS-BS link 𝐆¯M×N\bar{\mathbf{G}}\in\mathbb{C}^{M\times N}, and UE-BS link 𝐡¯kM\bar{\mathbf{h}}_{k}\in\mathbb{C}^{M} for kKk\in\llbracket K\rrbracket. Since the scatterers are limited in the mmWave band as the path loss is severe, the links are typically modeled as [23]

𝐟¯k=\displaystyle\bar{\mathbf{f}}_{k}= NκUI,k1+κUI,kαUI,k,0𝐚I(θUI,k,0AoA,θUI,k,0ZoA)+\displaystyle\sqrt{\frac{N\kappa_{\mathrm{UI},k}}{1+\kappa_{\mathrm{UI},k}}}\alpha_{\mathrm{UI},k,0}\mathbf{a}_{\mathrm{I}}(\theta_{\mathrm{UI},k,0}^{\mathrm{AoA}},\theta_{\mathrm{UI},k,0}^{\mathrm{ZoA}})+
NLUI,k(1+κUI,k)=1LUI,kαUI,k,𝐚I(θUI,k,AoA,θUI,k,ZoA),\displaystyle\sqrt{\frac{N}{L_{\mathrm{UI},k}(1+\kappa_{\mathrm{UI},k})}}\sum_{\ell=1}^{L_{\mathrm{UI},k}}\alpha_{\mathrm{UI},k,\ell}\mathbf{a}_{\mathrm{I}}(\theta_{\mathrm{UI},k,\ell}^{\mathrm{AoA}},\theta_{\mathrm{UI},k,\ell}^{\mathrm{ZoA}}),
𝐆¯=\displaystyle\bar{\mathbf{G}}= MNκIB1+κIBαIB,0𝐚B(θIB,0AoA)𝐚IH(θIB,0AoD,θIB,0ZoD)+\displaystyle\sqrt{\frac{MN\kappa_{\mathrm{IB}}}{1+\kappa_{\mathrm{IB}}}}\alpha_{\mathrm{IB},0}\mathbf{a}_{\mathrm{B}}(\theta_{\mathrm{IB},0}^{\mathrm{AoA}})\mathbf{a}_{\mathrm{I}}^{\mathrm{H}}(\theta_{\mathrm{IB},0}^{\mathrm{AoD}},\theta_{\mathrm{IB},0}^{\mathrm{ZoD}})+
MNLIB(1+κIB)=1LIBαIB,𝐚B(θIB,AoA)𝐚IH(θIB,AoD,θIB,ZoD),\displaystyle\sqrt{\frac{MN}{L_{\mathrm{IB}}(1+\kappa_{\mathrm{IB}})}}\sum_{\ell=1}^{L_{\mathrm{IB}}}\alpha_{\mathrm{IB},\ell}\mathbf{a}_{\mathrm{B}}(\theta_{\mathrm{IB},\ell}^{\mathrm{AoA}})\mathbf{a}_{\mathrm{I}}^{\mathrm{H}}(\theta_{\mathrm{IB},\ell}^{\mathrm{AoD}},\theta_{\mathrm{IB},\ell}^{\mathrm{ZoD}}),
𝐡¯k=\displaystyle\bar{\mathbf{h}}_{k}= MκUB,k1+κUB,kαUB,k,0𝐚B(θUB,k,0AoA)+\displaystyle\sqrt{\frac{M\kappa_{\mathrm{UB},k}}{1+\kappa_{\mathrm{UB},k}}}\alpha_{\mathrm{UB},k,0}\mathbf{a}_{\mathrm{B}}(\theta_{\mathrm{UB},k,0}^{\mathrm{AoA}})+
MLUB,k(1+κUB,k)=1LUB,kαUB,k,𝐚B(θUB,k,AoA)\displaystyle\sqrt{\frac{M}{L_{\mathrm{UB},k}(1+\kappa_{\mathrm{UB},k})}}\sum_{\ell=1}^{L_{\mathrm{UB},k}}\alpha_{\mathrm{UB},k,\ell}\mathbf{a}_{\mathrm{B}}(\theta_{\mathrm{UB},k,\ell}^{\mathrm{AoA}}) (1)

where a uniform linear array (ULA) geometry at the base station and uniform planar array (UPA) geometry at the IRS with half-wavelength spacings are assumed without loss of generality. For each link, α𝒞𝒩(0,1/PL)\alpha_{\ell}\sim\mathcal{CN}(0,1/\mathrm{PL}) is the \ell-th path gain where PL\mathrm{PL} is the distance- and frequency-dependent path loss, θ\theta_{\ell} is the \ell-th azimuth/zenith angle of arrival/departure, κ\kappa is the Rician K-factor, and LL is the number of non-LoS (NLoS) paths. In addition, 𝐚B()M\mathbf{a}_{\mathrm{B}}(\cdot)\in\mathbb{C}^{M} and 𝐚I(,)N\mathbf{a}_{\mathrm{I}}(\cdot,\cdot)\in\mathbb{C}^{N} are the array response vectors of the base station and IRS.

II-B Signal Model

Let 𝛀[t]{0,1}N\bm{\Omega}[t]\in\{0,1\}^{N} and 𝛀c[t]=𝟏N𝛀[t]\bm{\Omega}^{\mathrm{c}}[t]=\mathbf{1}_{N}-\bm{\Omega}[t] denote the index vectors of the active sensors and passive reflecting elements that constitute the IRS at time tt. In addition, define 𝐅¯=[𝐟¯1,,𝐟¯K]\bar{\mathbf{F}}=[\bar{\mathbf{f}}_{1},\dots,\bar{\mathbf{f}}_{K}] and 𝐇¯=[𝐡¯1,,𝐡¯K]\bar{\mathbf{H}}=[\bar{\mathbf{h}}_{1},\dots,\bar{\mathbf{h}}_{K}] for notational simplicity.

Then, the received signal 𝐲[t]M\mathbf{y}[t]\in\mathbb{C}^{M} at the base station is

𝐲[t]=𝐆¯(𝛀c[t]𝐯[t]=𝐬[t]𝐅¯𝐱[t])+𝐇¯𝐱[t]+𝐧B[t]\mathbf{y}[t]=\bar{\mathbf{G}}\left(\underbrace{\bm{\Omega}^{\mathrm{c}}[t]\odot\mathbf{v}[t]}_{=\mathbf{s}[t]}\odot\bar{\mathbf{F}}\mathbf{x}[t]\right)+\bar{\mathbf{H}}\mathbf{x}[t]+\mathbf{n}_{\mathrm{B}}[t] (2)

where 𝐯[t]N\mathbf{v}[t]\in\mathbb{C}^{N} is the passive reflection vector with the reflection amplitude |vn[t]|1|v_{n}[t]|\leq 1 and phase shift vn[t][0,2π)\angle v_{n}[t]\in[0,2\pi), 𝐱[t]K\mathbf{x}[t]\in\mathbb{C}^{K} is the transmit signal of the users under the transmit power constraint 𝔼{|xk[t]|2}Pk[t]\mathbb{E}\{|x_{k}[t]|^{2}\}\leq P_{k}[t], and 𝐧B[t]𝒞𝒩(𝟎M,σB2𝐈M)\mathbf{n}_{\mathrm{B}}[t]\sim\mathcal{CN}(\mathbf{0}_{M},\sigma_{\mathrm{B}}^{2}\mathbf{I}_{M}) is the additive white Gaussian noise (AWGN) at the base station. Likewise, the quantized received signal 𝐳[t]N\mathbf{z}[t]\in\mathbb{C}^{N} at the active sensors forwarded to the base station is

𝐳[t]=𝛀[t]Q(𝐅¯𝐱[t]+𝐧I[t])\mathbf{z}[t]=\bm{\Omega}[t]\odot\mathrm{Q}(\bar{\mathbf{F}}\mathbf{x}[t]+\mathbf{n}_{\mathrm{I}}[t]) (3)

where 𝐧I[t]𝒞𝒩(𝟎N,σI2𝐈N)\mathbf{n}_{\mathrm{I}}[t]\sim\mathcal{CN}(\mathbf{0}_{N},\sigma_{\mathrm{I}}^{2}\mathbf{I}_{N}) is the AWGN at the active sensors. The BB-bit quantizer Q()\mathrm{Q}(\cdot) is applied to the real and imaginary parts elementwise as

z=Q(u){Re(zlo)Re(u)<Re(zup)Im(zlo)Im(u)<Im(zup)z=\mathrm{Q}(u)\iff\begin{cases}\mathrm{Re}(z^{\mathrm{lo}})\leq\mathrm{Re}(u)<\mathrm{Re}(z^{\mathrm{up}})\\ \mathrm{Im}(z^{\mathrm{lo}})\leq\mathrm{Im}(u)<\mathrm{Im}(z^{\mathrm{up}})\end{cases} (4)

where zloz^{\mathrm{lo}}\in\mathbb{C} and zupz^{\mathrm{up}}\in\mathbb{C} are the lower and upper thresholds associated with zz\in\mathbb{C}. In other words, the real and imaginary parts of zz, zloz^{\mathrm{lo}}, and zupz^{\mathrm{up}} correspond to one of the 2B2^{B} quantization intervals.

A coherence block of length TcT_{\mathrm{c}} in the uplink consists of the channel estimation phase of length TT and data transmission phase of length TcTT_{\mathrm{c}}-T. To proceed, let 𝒯c=T\mathcal{T}_{\mathrm{c}}=\llbracket T\rrbracket and 𝒯d=TcT\mathcal{T}_{\mathrm{d}}=\llbracket T_{\mathrm{c}}\rrbracket\setminus\llbracket T\rrbracket be the time slots for the channel estimation phase and data transmission phase. Then, the received signals 𝐘M×T\mathbf{Y}\in\mathbb{C}^{M\times T} and 𝐙N×T\mathbf{Z}\in\mathbb{C}^{N\times T} at the base station and active sensors over the channel estimation phase of length TT are expressed as

𝐘\displaystyle\mathbf{Y} =[𝐲[1]𝐲[T]]\displaystyle=\begin{bmatrix}\mathbf{y}[1]&\cdots&\mathbf{y}[T]\end{bmatrix}
=𝐆¯(𝐒𝐅¯𝐗)+𝐇¯𝐗+𝐍B,\displaystyle=\bar{\mathbf{G}}(\mathbf{S}\odot\bar{\mathbf{F}}\mathbf{X})+\bar{\mathbf{H}}\mathbf{X}+\mathbf{N}_{\mathrm{B}}, (5)
𝐙\displaystyle\mathbf{Z} =[𝐳[1]𝐳[T]]\displaystyle=\begin{bmatrix}\mathbf{z}[1]&\cdots&\mathbf{z}[T]\end{bmatrix}
=𝛀Q(𝐅¯𝐗+𝐍I)\displaystyle=\bm{\Omega}\odot\mathrm{Q}(\bar{\mathbf{F}}\mathbf{X}+\mathbf{N}_{\mathrm{I}}) (6)

using the notations 𝐒=[𝐬[1],,𝐬[T]]\mathbf{S}=[\mathbf{s}[1],\dots,\mathbf{s}[T]], 𝛀=[𝛀[1],,𝛀[T]]\bm{\Omega}=[\bm{\Omega}[1],\dots,\bm{\Omega}[T]], 𝐗=[𝐱[1],,𝐱[T]]\mathbf{X}=[\mathbf{x}[1],\dots,\mathbf{x}[T]], 𝐍B=[𝐧B[1],,𝐧B[T]]\mathbf{N}_{\mathrm{B}}=[\mathbf{n}_{\mathrm{B}}[1],\dots,\mathbf{n}_{\mathrm{B}}[T]], and 𝐍I=[𝐧I[1],,𝐧I[T]]\mathbf{N}_{\mathrm{I}}=[\mathbf{n}_{\mathrm{I}}[1],\dots,\mathbf{n}_{\mathrm{I}}[T]].

In addition, the semi-passive elements are configured as

𝛀[t]0={Nafor t𝒯c0for t𝒯d,\|\bm{\Omega}[t]\|_{0}=\begin{cases}N_{\mathrm{a}}&\text{for }t\in\mathcal{T}_{\mathrm{c}}\\ 0&\text{for }t\in\mathcal{T}_{\mathrm{d}}\end{cases}, (7)

which means that NaN_{\mathrm{a}} semi-passive elements are activated to the active sensing mode in the channel estimation phase and deactivated to the passive reflecting mode in the data transmission phase. The switching performance of switching networks that connect the IRS elements to RF chains is determined by the switching period Tsn1T_{\mathrm{sn}}\geq 1 defined as the time required for the 0-1 pattern of 𝛀[t]\bm{\Omega}[t] to change such that

𝛀[(i1)Tsn+1]==𝛀[iTsn] for i,\bm{\Omega}[(i-1)T_{\mathrm{sn}}+1]=\cdots=\bm{\Omega}[iT_{\mathrm{sn}}]\text{ for }i\in\mathbb{N}, (8)

and we define the switching frequency as fsn=1/Tsn1f_{\mathrm{sn}}=1/T_{\mathrm{sn}}\leq 1. Therefore, the 0-1 pattern of 𝛀[t]\bm{\Omega}[t] can change according to (7) and (8).

II-C Problem Formulation via Virtual Channel Representation

We formulate the channel estimation problem using the virtual channel representation in conjunction with the channel model and signal model introduced in the previous subsections. To proceed, define the overcomplete dictionaries

𝐀B\displaystyle\mathbf{A}_{\mathrm{B}} =[𝐚B(θ^1)𝐚B(θ^Mg)]M×Mg,\displaystyle=\begin{bmatrix}\mathbf{a}_{\mathrm{B}}(\hat{\theta}_{1})&\cdots&\mathbf{a}_{\mathrm{B}}(\hat{\theta}_{M_{\mathrm{g}}})\end{bmatrix}\in\mathbb{C}^{M\times M_{\mathrm{g}}}, (9)
𝐀I\displaystyle\mathbf{A}_{\mathrm{I}} =[𝐚I(θ^1A,θ^1Z)𝐚I(θ^NgA,θ^NgZ)]N×Ng\displaystyle=\begin{bmatrix}\mathbf{a}_{\mathrm{I}}(\hat{\theta}_{1}^{\mathrm{A}},\hat{\theta}_{1}^{\mathrm{Z}})&\cdots&\mathbf{a}_{\mathrm{I}}(\hat{\theta}_{N_{\mathrm{g}}}^{\mathrm{A}},\hat{\theta}_{N_{\mathrm{g}}}^{\mathrm{Z}})\end{bmatrix}\in\mathbb{C}^{N\times N_{\mathrm{g}}} (10)

over the predefined grids {θ^m}mMg\{\hat{\theta}_{m}\}_{m\in\llbracket M_{\mathrm{g}}\rrbracket} and {(θ^nA,θ^nZ)}nNg\{(\hat{\theta}_{n}^{\mathrm{A}},\hat{\theta}_{n}^{\mathrm{Z}})\}_{n\in\llbracket N_{\mathrm{g}}\rrbracket} where MgMM_{\mathrm{g}}\geq M and NgNN_{\mathrm{g}}\geq N are the grid resolutions. Then, the virtual channel representation admits the transformations [24]

𝐅¯\displaystyle\bar{\mathbf{F}} =𝐀I𝐅,\displaystyle=\mathbf{A}_{\mathrm{I}}\mathbf{F}, (11)
𝐆¯\displaystyle\bar{\mathbf{G}} =𝐀B𝐆𝐀IH,\displaystyle=\mathbf{A}_{\mathrm{B}}\mathbf{G}\mathbf{A}_{\mathrm{I}}^{\mathrm{H}}, (12)
𝐇¯\displaystyle\bar{\mathbf{H}} =𝐀B𝐇\displaystyle=\mathbf{A}_{\mathrm{B}}\mathbf{H} (13)

where 𝐅Ng×K\mathbf{F}\in\mathbb{C}^{N_{\mathrm{g}}\times K}, 𝐆Mg×Ng\mathbf{G}\in\mathbb{C}^{M_{\mathrm{g}}\times N_{\mathrm{g}}}, and 𝐇Mg×K\mathbf{H}\in\mathbb{C}^{M_{\mathrm{g}}\times K} are the equivalent angular-domain channels. In practice, the angular-domain channels are sparse because there are limited scatterers that constitute the channels in the mmWave band [25, 26].

Now, we can reexpress (5) and (6) using the virtual channel representation as

𝐘\displaystyle\mathbf{Y} =𝐀B𝐆𝐀IH(𝐒𝐀I𝐅𝐗)+𝐀B𝐇𝐗+𝐍B,\displaystyle=\mathbf{A}_{\mathrm{B}}\mathbf{G}\mathbf{A}_{\mathrm{I}}^{\mathrm{H}}(\mathbf{S}\odot\mathbf{A}_{\mathrm{I}}\mathbf{F}\mathbf{X})+\mathbf{A}_{\mathrm{B}}\mathbf{H}\mathbf{X}+\mathbf{N}_{\mathrm{B}}, (14)
𝐙\displaystyle\mathbf{Z} =𝛀Q(𝐀I𝐅𝐗+𝐍I),\displaystyle=\bm{\Omega}\odot\mathrm{Q}(\mathbf{A}_{\mathrm{I}}\mathbf{F}\mathbf{X}+\mathbf{N}_{\mathrm{I}}), (15)

and our goal is to estimate {𝐅,𝐆,𝐇}\{\mathbf{F},\mathbf{G},\mathbf{H}\} from {𝐘,𝐙}\{\mathbf{Y},\mathbf{Z}\}. Since all the links are sparse, (14) and (15) is a combination of low-rank matrix factorization [27] and low-rank matrix completion [28] problems, which are NP-hard in general. To the best of our knowledge, our work is the first attempt to jointly exploit the received signal at the base station and additional information acquired from the active sensors at the IRS to estimate the UE-IRS and IRS-BS links. In contrast, recent works [12, 13] on channel estimation with semi-passive elements cannot be considered as semi-passive element-aided but rather driven because only the received signal at the semi-passive elements is leveraged. As a result, [12, 13] require both uplink and downlink training signals to estimate the UE-IRS and BS-IRS links, whereas we only use uplink training signals to estimate all the links.

III Proposed VI-SBL-Based Channel Estimator

In this section, a VI-SBL-based channel estimator is proposed that performs approximate posterior inference on {𝐅,𝐆,𝐇}\{\mathbf{F},\mathbf{G},\mathbf{H}\} from {𝐘,𝐙}\{\mathbf{Y},\mathbf{Z}\} under the SBL framework [14, 15]. In particular, we solve SBL via the variational free energy principle with the mean-field approximation [16, 17] to derive the posterior distributions of {𝐅,𝐆,𝐇}\{\mathbf{F},\mathbf{G},\mathbf{H}\}. Then, we can compute the posterior means from the approximate posterior distributions, or equivalently the MMSE estimates that minimize the channel estimation error.

III-A Pseudo-Measurement Model

To facilitate the analysis, we propose a pseudo-measurement model

𝐙^\displaystyle\hat{\mathbf{Z}} =Q(𝐔)\displaystyle=\mathrm{Q}(\mathbf{U})
=Q(𝛀𝐀I𝐅𝐗+𝐍I)\displaystyle=\mathrm{Q}(\bm{\Omega}\odot\mathbf{A}_{\mathrm{I}}\mathbf{F}\mathbf{X}+\mathbf{N}_{\mathrm{I}}) (16)

where the elements of 𝐙^N×T\hat{\mathbf{Z}}\in\mathbb{C}^{N\times T} corresponding to the nonzero pattern of 𝛀\bm{\Omega} are equal to the nonzero elements of 𝐙\mathbf{Z}. In addition, the lower and upper pseudo-thresholds 𝐙^loN×T\hat{\mathbf{Z}}^{\mathrm{lo}}\in\mathbb{C}^{N\times T} and 𝐙^upN×T\hat{\mathbf{Z}}^{\mathrm{up}}\in\mathbb{C}^{N\times T} associated with 𝐙^\hat{\mathbf{Z}} are defined as in (4). The elements of 𝐙\mathbf{Z} and 𝐙^\hat{\mathbf{Z}} corresponding to the zeros of 𝛀\bm{\Omega}, however, are not the same. Nevertheless, (15) and (16) are statistically equivalent because the elements of 𝐙\mathbf{Z} and 𝐙^\hat{\mathbf{Z}} corresponding to the nonzero pattern of 𝛀\bm{\Omega} bear the same information about 𝐅\mathbf{F}, while those corresponding to the zeros of 𝛀\bm{\Omega} have no meaningful information about 𝐅\mathbf{F} as evident from (15) and (16). Therefore, the elements of 𝐙^\hat{\mathbf{Z}}, 𝐙^lo\hat{\mathbf{Z}}^{\mathrm{lo}}, and 𝐙^up\hat{\mathbf{Z}}^{\mathrm{up}} corresponding to the zeros of 𝛀\bm{\Omega}—the elements that cannot be determined from 𝐙\mathbf{Z}—can be assigned arbitrarily from the 2B2^{B} quantization intervals of the BB-bit quantizer without loss of generality. Since the pseudo-measurement model is more convenient to deal with, we estimate {𝐅,𝐆,𝐇}\{\mathbf{F},\mathbf{G},\mathbf{H}\} from {𝐘,𝐙^}\{\mathbf{Y},\hat{\mathbf{Z}}\} in (14) and (16) in the sequel.

III-B Hierarchical Bayesian Model of SBL Framework

To account for the interaction among {𝐅,𝐆,𝐇,𝐔,𝐘,𝐙^}\{\mathbf{F},\mathbf{G},\mathbf{H},\mathbf{U},\mathbf{Y},\hat{\mathbf{Z}}\}, we treat all the variables as random variables that constitute a hierarchical Bayesian model as shown in Fig. 2. In addition, we introduce the random variables 𝚪𝐅Ng×K\bm{\Gamma}_{\mathbf{F}}\in\mathbb{C}^{N_{\mathrm{g}}\times K}, 𝚪𝐆Mg×Ng\bm{\Gamma}_{\mathbf{G}}\in\mathbb{C}^{M_{\mathrm{g}}\times N_{\mathrm{g}}}, and 𝚪𝐇Mg×K\bm{\Gamma}_{\mathbf{H}}\in\mathbb{C}^{M_{\mathrm{g}}\times K} to capture the sparse nature of {𝐅,𝐆,𝐇}\{\mathbf{F},\mathbf{G},\mathbf{H}\}, which is the well-known SBL framework [14, 15]. Before moving on, we introduce the equivalent vector forms of the measurement 𝒴={𝐲,𝐳^}\mathcal{Y}=\{\mathbf{y},\hat{\mathbf{z}}\}, hidden variable 𝒳={𝐟,𝐠,𝐡,𝜸𝐟,𝜸𝐠,𝜸𝐡,𝐮}\mathcal{X}=\{\mathbf{f},\mathbf{g},\mathbf{h},\bm{\gamma}_{\mathbf{f}},\bm{\gamma}_{\mathbf{g}},\bm{\gamma}_{\mathbf{h}},\mathbf{u}\}, and pseudo-threshold {𝐳^lo,𝐳^up}\{\hat{\mathbf{z}}^{\mathrm{lo}},\hat{\mathbf{z}}^{\mathrm{up}}\} to facilitate the analysis. For example, 𝐟=vec(𝐅)\mathbf{f}=\mathrm{vec}(\mathbf{F}), 𝐳^lo=vec(𝐙^lo)\hat{\mathbf{z}}^{\mathrm{lo}}=\mathrm{vec}(\hat{\mathbf{Z}}^{\mathrm{lo}}), and 𝐳^up=vec(𝐙^up)\hat{\mathbf{z}}^{\mathrm{up}}=\mathrm{vec}(\hat{\mathbf{Z}}^{\mathrm{up}}).

Refer to caption
Figure 2: The Bayesian network of a hierarchical Bayesian model where 𝒴={𝐘,𝐙^}\mathcal{Y}=\{\mathbf{Y},\hat{\mathbf{Z}}\} is the measurement and 𝒳={𝐅,𝐆,𝐇,𝚪𝐅,𝚪𝐆,𝚪𝐇,𝐔}\mathcal{X}=\{\mathbf{F},\mathbf{G},\mathbf{H},\bm{\Gamma}_{\mathbf{F}},\bm{\Gamma}_{\mathbf{G}},\bm{\Gamma}_{\mathbf{H}},\mathbf{U}\} is the hidden variable. The arrows represent the conditional dependence between two random variables.

In essence, the goal of SBL is to perform posterior inference on 𝒳\mathcal{X} from 𝒴\mathcal{Y} where the interaction among {𝒳,𝒴}\{\mathcal{X},\mathcal{Y}\} is captured by the conditional distributions listed as follows. First, {𝐟,𝐠,𝐡}\{\mathbf{f},\mathbf{g},\mathbf{h}\} conditioned on {𝜸𝐟,𝜸𝐠,𝜸𝐡}\{\bm{\gamma}_{\mathbf{f}},\bm{\gamma}_{\mathbf{g}},\bm{\gamma}_{\mathbf{h}}\} are assumed to be Gaussian distributed as

p(𝐟|𝜸𝐟)\displaystyle p(\mathbf{f}|\bm{\gamma}_{\mathbf{f}}) =𝒞𝒩(𝐟|𝟎,𝚪𝐟1),\displaystyle=\mathcal{CN}(\mathbf{f}|\mathbf{0},\bm{\Gamma}_{\mathbf{f}}^{-1}), (17)
p(𝐠|𝜸𝐠)\displaystyle p(\mathbf{g}|\bm{\gamma}_{\mathbf{g}}) =𝒞𝒩(𝐠|𝟎,𝚪𝐠1),\displaystyle=\mathcal{CN}(\mathbf{g}|\mathbf{0},\bm{\Gamma}_{\mathbf{g}}^{-1}), (18)
p(𝐡|𝜸𝐡)\displaystyle p(\mathbf{h}|\bm{\gamma}_{\mathbf{h}}) =𝒞𝒩(𝐡|𝟎,𝚪𝐡1)\displaystyle=\mathcal{CN}(\mathbf{h}|\mathbf{0},\bm{\Gamma}_{\mathbf{h}}^{-1}) (19)

where 𝚪𝐟=diag(𝜸𝐟)\bm{\Gamma}_{\mathbf{f}}=\mathrm{diag}(\bm{\gamma}_{\mathbf{f}}), 𝚪𝐠=diag(𝜸𝐠)\bm{\Gamma}_{\mathbf{g}}=\mathrm{diag}(\bm{\gamma}_{\mathbf{g}}), and 𝚪𝐡=diag(𝜸𝐡)\bm{\Gamma}_{\mathbf{h}}=\mathrm{diag}(\bm{\gamma}_{\mathbf{h}}) are the precision matrices of the Gaussian distributions above. Meanwhile, the hyperpriors of the hyperparameters {𝜸𝐟,𝜸𝐠,𝜸𝐡}\{\bm{\gamma}_{\mathbf{f}},\bm{\gamma}_{\mathbf{g}},\bm{\gamma}_{\mathbf{h}}\} are modeled as independent and identically distributed (i.i.d.) Gamma distributions

p(𝜸𝐟)\displaystyle p(\bm{\gamma}_{\mathbf{f}}) =iGamma(γ𝐟,i|a,b)\displaystyle=\prod_{i}\mathrm{Gamma}(\gamma_{\mathbf{f},i}|a,b)
=ibaΓ(a)γ𝐟,ia1ebγ𝐟,i,\displaystyle=\prod_{i}\frac{b^{a}}{\Gamma(a)}\gamma_{\mathbf{f},i}^{a-1}e^{-b\gamma_{\mathbf{f},i}}, (20)
p(𝜸𝐠)\displaystyle p(\bm{\gamma}_{\mathbf{g}}) =iGamma(γ𝐠,i|a,b)\displaystyle=\prod_{i}\mathrm{Gamma}(\gamma_{\mathbf{g},i}|a,b)
=ibaΓ(a)γ𝐠,ia1ebγ𝐠,i,\displaystyle=\prod_{i}\frac{b^{a}}{\Gamma(a)}\gamma_{\mathbf{g},i}^{a-1}e^{-b\gamma_{\mathbf{g},i}}, (21)
p(𝜸𝐡)\displaystyle p(\bm{\gamma}_{\mathbf{h}}) =iGamma(γ𝐡,i|a,b)\displaystyle=\prod_{i}\mathrm{Gamma}(\gamma_{\mathbf{h},i}|a,b)
=ibaΓ(a)γ𝐡,ia1ebγ𝐡,i\displaystyle=\prod_{i}\frac{b^{a}}{\Gamma(a)}\gamma_{\mathbf{h},i}^{a-1}e^{-b\gamma_{\mathbf{h},i}} (22)

where Γ()\Gamma(\cdot) is the Gamma function, and aa and bb are the shape and rate to be chosen. The reason for choosing the Gaussian-Gamma distribution is because {p(𝜸𝐟),p(𝜸𝐠),p(𝜸𝐡)}\{p(\bm{\gamma_{\mathbf{f}}}),p(\bm{\gamma_{\mathbf{g}}}),p(\bm{\gamma_{\mathbf{h}}})\} are the conjugate priors for the likelihood functions {p(𝐟|𝜸𝐟),p(𝐠|𝜸𝐠),p(𝐡|𝜸𝐡)}\{p(\mathbf{f}|\bm{\gamma}_{\mathbf{f}}),p(\mathbf{g}|\bm{\gamma}_{\mathbf{g}}),p(\mathbf{h}|\bm{\gamma}_{\mathbf{h}})\} that make posterior inference tractable [29]. Furthermore, the Gaussian-Gamma distribution captures the sparse nature of {𝐟,𝐠,𝐡}\{\mathbf{f},\mathbf{g},\mathbf{h}\} by making the marginal priors {p(𝐟),p(𝐠),p(𝐡)}\{p(\mathbf{f}),p(\mathbf{g}),p(\mathbf{h})\} Student’s-tt distributions, which are sparsity-promoting priors under the appropriate choice of aa and bb [14, 15]. In the simulation, we adhere to the convention that assumes uninformative priors by setting a=b=106a=b=10^{-6}.

Moving on to the measurement model, the conditional distribution of 𝐲\mathbf{y} is

p(𝐲|𝐟,𝐠,𝐡)=\displaystyle p(\mathbf{y}|\mathbf{f},\mathbf{g},\mathbf{h})=
𝒞𝒩(𝐲|vec(𝐀B𝐆𝐀IH(𝐒𝐀I𝐅𝐗)+𝐀B𝐇𝐗),σB2𝐈),\displaystyle\mathcal{CN}(\mathbf{y}|\mathrm{vec}(\mathbf{A}_{\mathrm{B}}\mathbf{G}\mathbf{A}_{\mathrm{I}}^{\mathrm{H}}(\mathbf{S}\odot\mathbf{A}_{\mathrm{I}}\mathbf{F}\mathbf{X})+\mathbf{A}_{\mathrm{B}}\mathbf{H}\mathbf{X}),\sigma_{\mathrm{B}}^{2}\mathbf{I}), (23)

which follows from (14). Likewise, the conditional distributions of 𝐮\mathbf{u} and 𝐳^\hat{\mathbf{z}} in the pseudo-measurement model are

p(𝐮|𝐟)=\displaystyle p(\mathbf{u}|\mathbf{f})= 𝒞𝒩(𝐮|vec(𝛀𝐀I𝐅𝐗),σI2𝐈),\displaystyle\mathcal{CN}(\mathbf{u}|\mathrm{vec}(\bm{\Omega}\odot\mathbf{A}_{\mathrm{I}}\mathbf{F}\mathbf{X}),\sigma_{\mathrm{I}}^{2}\mathbf{I}), (24)
p(𝐳^|𝐮)=\displaystyle p(\hat{\mathbf{z}}|\mathbf{u})= 𝕀(𝐳^=Q(𝐮))\displaystyle\mathbb{I}(\hat{\mathbf{z}}=\mathrm{Q}(\mathbf{u}))
=\displaystyle= 𝕀(Re(𝐳^lo)Re(𝐮)Re(𝐳^up))×\displaystyle\mathbb{I}(\mathrm{Re}(\hat{\mathbf{z}}^{\mathrm{lo}})\preceq\mathrm{Re}(\mathbf{u})\prec\mathrm{Re}(\hat{\mathbf{z}}^{\mathrm{up}}))\times
𝕀(Im(𝐳^lo)Im(𝐮)Im(𝐳^up))\displaystyle\mathbb{I}(\mathrm{Im}(\hat{\mathbf{z}}^{\mathrm{lo}})\preceq\mathrm{Im}(\mathbf{u})\prec\mathrm{Im}(\hat{\mathbf{z}}^{\mathrm{up}})) (25)

where (24) and (25) come from (16) and (4). The indicator function 𝕀()\mathbb{I}(\cdot) is equal to one if the argument is true and zero otherwise.

Now, recall that the goal of SBL is to perform exact posterior inference on 𝒳\mathcal{X} from 𝒴\mathcal{Y} using (17)-(25). The problem, however, is that computing p(𝒳|𝒴)=p(𝒳,𝒴)/p(𝒳,𝒴)𝑑𝒳p(\mathcal{X}|\mathcal{Y})=p(\mathcal{X},\mathcal{Y})/\int p(\mathcal{X},\mathcal{Y})d\mathcal{X} is intractable in general. Therefore, we focus on finding the approximate posterior distribution using the VI approach.

III-C VI Approach to SBL

First, we explain the idea behind VI with the mean-field approximation [16, 17]. Then, we derive the approximate posterior distributions, which enable us to compute the posterior means. To proceed, consider the decomposition of the log-evidence

logp(𝒴)=\displaystyle\log p(\mathcal{Y})=
q(𝒳)logq(𝒳)p(𝒳|𝒴)d𝒳=DKL(qp)+q(𝒳)logp(𝒳,𝒴)q(𝒳)d𝒳=(q)\displaystyle\underbrace{\int q(\mathcal{X})\log\frac{q(\mathcal{X})}{p(\mathcal{X}|\mathcal{Y})}d\mathcal{X}}_{=D_{\mathrm{KL}}(q\|p)}+\underbrace{\int q(\mathcal{X})\log\frac{p(\mathcal{X},\mathcal{Y})}{q(\mathcal{X})}d\mathcal{X}}_{=\mathcal{L}(q)} (26)

where q(𝒳)q(\mathcal{X}) is any probability distribution that approximates p(𝒳|𝒴)p(\mathcal{X}|\mathcal{Y}), DKL(qp)D_{\mathrm{KL}}(q\|p) is the Kullback–Leibler (KL) divergence that measures the distance between q(𝒳)q(\mathcal{X}) and p(𝒳|𝒴)p(\mathcal{X}|\mathcal{Y}), and (q)\mathcal{L}(q) is known as the negative variational free energy. Then, we can minimize DKL(qp)D_{\mathrm{KL}}(q\|p) with respect to q(𝒳)q(\mathcal{X}) by maximizing (q)\mathcal{L}(q) because logp(𝒴)\log p(\mathcal{Y}) is constant.

In general, the variational free energy minimization problem is intractable for a general class of q(𝒳)q(\mathcal{X}). Therefore, we adopt the mean-field approximation by considering the class of q(𝒳)q(\mathcal{X}) that assumes independence among the partition of 𝒳\mathcal{X} such that q(𝒳)=iq(𝒳i)q(\mathcal{X})=\prod_{i}q(\mathcal{X}_{i}). Then, {q(𝒳i)}i\{q(\mathcal{X}_{i})\}_{\forall i} is the global minimum of the variational free energy minimization problem if and only if [16, 17]

q(𝒳i)=1Ziexp{𝔼jiq(𝒳j){logp(𝒴|𝒳)p(𝒳)}=logp(𝒴|𝒳)p(𝒳)𝒳i} for iq(\mathcal{X}_{i})=\frac{1}{Z_{i}}\exp\left\{\underbrace{\mathbb{E}_{\prod_{j\neq i}q(\mathcal{X}_{j})}\{\log p(\mathcal{Y}|\mathcal{X})p(\mathcal{X})\}}_{=\langle\log p(\mathcal{Y}|\mathcal{X})p(\mathcal{X})\rangle_{\mathcal{X}_{i}}}\right\}\text{ for }\forall i (27)

where ZiZ_{i} is the normalization constant that makes q(𝒳i)q(\mathcal{X}_{i}) a valid probability distribution. Here, 𝒳i\langle\cdot\rangle_{\mathcal{X}_{i}} indicates the expectation with respect to jiq(𝒳j)\prod_{j\neq i}q(\mathcal{X}_{j}). In addition, we use the notation \langle\cdot\rangle to indicate the expectation with respect to q(𝒳)q(\mathcal{X}) in the sequel.

Then, our goal is to derive the functional forms of {q(𝒳i)}i\{q(\mathcal{X}_{i})\}_{\forall i} from (27) that enable us to deduce the distributions of {q(𝒳i)}i\{q(\mathcal{X}_{i})\}_{\forall i} by inspection. Since the distributions of {q(𝒳i)}i\{q(\mathcal{X}_{i})\}_{\forall i} are coupled in an intertwined manner as evident from (27), however, q(𝒳i)q(\mathcal{X}_{i}) is updated for fixed {q(𝒳j)}ji\{q(\mathcal{X}_{j})\}_{j\neq i} by cycling through ii, which leads us to a local minimum. Now, we derive the functional forms of {q(𝒳i)}i\{q(\mathcal{X}_{i})\}_{\forall i} to determine the approximate posterior distributions for the partition 𝒳={𝐟,𝐠,𝐡,𝜸𝐟,𝜸𝐠,𝜸𝐡,𝐮}\mathcal{X}=\{\mathbf{f},\mathbf{g},\mathbf{h},\bm{\gamma}_{\mathbf{f}},\bm{\gamma}_{\mathbf{g}},\bm{\gamma}_{\mathbf{h}},\mathbf{u}\}.

1. Derivation of q(𝐮)q(\mathbf{u}): To derive q(𝐮)q(\mathbf{u}), we only need to plug in (24) and (25) to (27). The reason for not plugging in (17)-(23) to (27) is because the expectations of (17)-(23) with respect to 𝐮\langle\cdot\rangle_{\mathbf{u}} are constants that do not affect the functional form of q(𝐮)q(\mathbf{u}). In the posterior inference jargon, {𝐟,𝐳^}\{\mathbf{f},\hat{\mathbf{z}}\} is said to be the Markov blanket of 𝐮\mathbf{u} [30]. In the sequel, we identify the Markov blanket by inspection without further explanation.

To proceed, define

𝐛𝐮𝐟\displaystyle\mathbf{b}_{\mathbf{uf}} =vec(𝛀𝐀I𝐅𝐗)\displaystyle=\mathrm{vec}(\bm{\Omega}\odot\mathbf{A}_{\mathrm{I}}\mathbf{F}\mathbf{X})
=diag(vec(𝛀))(𝐗T𝐀I)𝐟\displaystyle=\mathrm{diag}(\mathrm{vec}(\bm{\Omega}))(\mathbf{X}^{\mathrm{T}}\otimes\mathbf{A}_{\mathrm{I}})\mathbf{f} (28)

from (24). Then, plugging in (24) and (25) to (27) yields

q(𝐮)\displaystyle q(\mathbf{u}) exp{logp(𝐳^|𝐮)+logp(𝐮|𝐟)𝐮}\displaystyle\propto\exp\{\log p(\hat{\mathbf{z}}|\mathbf{u})+\langle\log p(\mathbf{u}|\mathbf{f})\rangle_{\mathbf{u}}\}
𝕀(𝐳^=Q(𝐮))exp{𝐮𝐛𝐮𝐟22/σI2𝐮}\displaystyle\propto\mathbb{I}(\hat{\mathbf{z}}=\mathrm{Q}(\mathbf{u}))\exp\{\langle-\|\mathbf{u}-\mathbf{b}_{\mathbf{uf}}\|_{2}^{2}/\sigma_{\mathrm{I}}^{2}\rangle_{\mathbf{u}}\}
𝕀(𝐳^=Q(𝐮))exp{𝐮𝐛𝐮𝐟22/σI2}\displaystyle\propto\mathbb{I}(\hat{\mathbf{z}}=\mathrm{Q}(\mathbf{u}))\exp\{-\|\mathbf{u}-\langle\mathbf{b}_{\mathbf{uf}}\rangle\|_{2}^{2}/\sigma_{\mathrm{I}}^{2}\}
𝕀(𝐳^=Q(𝐮))𝒞𝒩(𝐮|𝐛𝐮𝐟,σI2𝐈),\displaystyle\propto\mathbb{I}(\hat{\mathbf{z}}=\mathrm{Q}(\mathbf{u}))\mathcal{CN}(\mathbf{u}|\langle\mathbf{b}_{\mathbf{uf}}\rangle,\sigma_{\mathrm{I}}^{2}\mathbf{I}), (29)

or equivalently

q(Re(ui))\displaystyle q(\mathrm{Re}(u_{i}))\propto
𝕀(Re(z^ilo)Re(ui)<Re(z^iup))𝒩(Re(ui)|Re(b𝐮𝐟,i),σI22)\displaystyle\mathbb{I}(\mathrm{Re}(\hat{z}_{i}^{\mathrm{lo}})\leq\mathrm{Re}(u_{i})<\mathrm{Re}(\hat{z}_{i}^{\mathrm{up}}))\mathcal{N}\left(\mathrm{Re}(u_{i})|\mathrm{Re}(\langle b_{\mathbf{uf},i}\rangle),\frac{\sigma_{\mathrm{I}}^{2}}{2}\right)

for the real as well as the imaginary part, from which we recognize that 𝐮\mathbf{u} is truncated Gaussian distributed. The posterior mean of 𝐮\mathbf{u} has a well-known form [31, 32, 33]

Re(ui)=\displaystyle\langle\mathrm{Re}(u_{i})\rangle=
Re(b𝐮𝐟,i)σI2×ϕ(Re(βi))ϕ(Re(αi))Φ(Re(βi))Φ(Re(αi)),\displaystyle\quad\mathrm{Re}(\langle b_{\mathbf{uf},i}\rangle)-\frac{\sigma_{\mathrm{I}}}{2}\times\frac{\phi(\mathrm{Re}(\beta_{i}))-\phi(\mathrm{Re}(\alpha_{i}))}{\Phi(\mathrm{Re}(\beta_{i}))-\Phi(\mathrm{Re}(\alpha_{i}))}, (30)
Im(ui)=\displaystyle\langle\mathrm{Im}(u_{i})\rangle=
Im(b𝐮𝐟,i)σI2×ϕ(Im(βi))ϕ(Im(αi))Φ(Im(βi))Φ(Im(αi))\displaystyle\quad\mathrm{Im}(\langle b_{\mathbf{uf},i}\rangle)-\frac{\sigma_{\mathrm{I}}}{2}\times\frac{\phi(\mathrm{Im}(\beta_{i}))-\phi(\mathrm{Im}(\alpha_{i}))}{\Phi(\mathrm{Im}(\beta_{i}))-\Phi(\mathrm{Im}(\alpha_{i}))} (31)

where ϕ()\phi(\cdot) and Φ()\Phi(\cdot) are the standard normal PDF and cumulative distribution function (CDF), αi=(z^ilob𝐮𝐟,i)/(σI/2)\alpha_{i}=(\hat{z}_{i}^{\mathrm{lo}}-\langle b_{\mathbf{uf},i}\rangle)/(\sigma_{\mathrm{I}}/\sqrt{2}), and βi=(z^iupb𝐮𝐟,i)/(σI/2)\beta_{i}=(\hat{z}_{i}^{\mathrm{up}}-\langle b_{\mathbf{uf},i}\rangle)/(\sigma_{\mathrm{I}}/\sqrt{2}). To compute the expressions above, we need 𝐛𝐮𝐟\langle\mathbf{b}_{\mathbf{uf}}\rangle, which we can obtain from (28) by replacing 𝐟\mathbf{f} with 𝐟\langle\mathbf{f}\rangle.

2. Derivation of q(𝐟)q(\mathbf{f}): First, define 𝐀𝐟𝐠\mathbf{A}_{\mathbf{fg}}, 𝐛𝐟𝐡\mathbf{b}_{\mathbf{fh}}, and 𝐀𝐟\mathbf{A}_{\mathbf{f}} from (23) and (24) as (32) and (33) at the bottom of the next page. Then, plugging in (17), (23), and (24) to (27) with some straightforward but tedious algebra leads to

q(𝐟)\displaystyle q(\mathbf{f})\propto exp{logp(𝐲|𝐟,𝐠,𝐡)+logp(𝐮|𝐟)+logp(𝐟|𝜸𝐟)𝐟}\displaystyle\exp\{\langle\log p(\mathbf{y}|\mathbf{f},\mathbf{g},\mathbf{h})+\log p(\mathbf{u}|\mathbf{f})+\log p(\mathbf{f}|\bm{\gamma}_{\mathbf{f}})\rangle_{\mathbf{f}}\}
\displaystyle\propto exp{𝐲𝐛𝐟𝐡𝐀𝐟𝐠𝐟22/σB2𝐟}×\displaystyle\exp\{\langle-\|\mathbf{y}-\mathbf{b}_{\mathbf{fh}}-\mathbf{A}_{\mathbf{fg}}\mathbf{f}\|_{2}^{2}/\sigma_{\mathrm{B}}^{2}\rangle_{\mathbf{f}}\}\times
exp{𝐮𝐀𝐟𝐟22/σI2𝐟}×\displaystyle\exp\{\langle-\|\mathbf{u}-\mathbf{A}_{\mathbf{f}}\mathbf{f}\|_{2}^{2}/\sigma_{\mathrm{I}}^{2}\rangle_{\mathbf{f}}\}\times
exp{𝐟H𝚪𝐟𝐟𝐟}\displaystyle\exp\{\langle-\mathbf{f}^{\mathrm{H}}\bm{\Gamma}_{\mathbf{f}}\mathbf{f}\rangle_{\mathbf{f}}\}
\displaystyle\propto exp{(𝐟𝐦𝐟)H𝐂𝐟1(𝐟𝐦𝐟)},\displaystyle\exp\{-(\mathbf{f}-\mathbf{m}_{\mathbf{f}})^{\mathrm{H}}\mathbf{C}_{\mathbf{f}}^{-1}(\mathbf{f}-\mathbf{m}_{\mathbf{f}})\}, (34)

which is a Gaussian distribution with

𝐦𝐟\displaystyle\mathbf{m}_{\mathbf{f}} =𝐂𝐟(1σB2𝐀𝐟𝐠H(𝐲𝐛𝐟𝐡)+1σI2𝐀𝐟H𝐮),\displaystyle=\mathbf{C}_{\mathbf{f}}\left(\frac{1}{\sigma_{\mathrm{B}}^{2}}\langle\mathbf{A}_{\mathbf{fg}}\rangle^{\mathrm{H}}(\mathbf{y}-\langle\mathbf{b}_{\mathbf{fh}}\rangle)+\frac{1}{\sigma_{\mathrm{I}}^{2}}\mathbf{A}_{\mathbf{f}}^{\mathrm{H}}\langle\mathbf{u}\rangle\right), (35)
𝐂𝐟\displaystyle\mathbf{C}_{\mathbf{f}} =(1σB2𝐀𝐟𝐠H𝐀𝐟𝐠+1σI2𝐀𝐟H𝐀𝐟+𝚪𝐟)1.\displaystyle=\left(\frac{1}{\sigma_{\mathrm{B}}^{2}}\langle\mathbf{A}_{\mathbf{fg}}^{\mathrm{H}}\mathbf{A}_{\mathbf{fg}}\rangle+\frac{1}{\sigma_{\mathrm{I}}^{2}}\mathbf{A}_{\mathbf{f}}^{\mathrm{H}}\mathbf{A}_{\mathbf{f}}+\langle\bm{\Gamma}_{\mathbf{f}}\rangle\right)^{-1}. (36)

The posterior mean and covariance above require 𝐀𝐟𝐠\langle\mathbf{A}_{\mathbf{fg}}\rangle, 𝐛𝐟𝐡\langle\mathbf{b}_{\mathbf{fh}}\rangle, and 𝐀𝐟𝐠H𝐀𝐟𝐠\langle\mathbf{A}_{\mathbf{fg}}^{\mathrm{H}}\mathbf{A}_{\mathbf{fg}}\rangle to be computed, from which the first two can be obtained from (32) by substituting 𝐆\mathbf{G} and 𝐡\mathbf{h} with 𝐆\langle\mathbf{G}\rangle and 𝐡\langle\mathbf{h}\rangle. The expression for 𝐀𝐟𝐠H𝐀𝐟𝐠\langle\mathbf{A}_{\mathbf{fg}}^{\mathrm{H}}\mathbf{A}_{\mathbf{fg}}\rangle is provided in Appendix A as the derivation is more involved.

vec(𝐀B𝐆𝐀IH(𝐒𝐀I𝐅𝐗)+𝐀B𝐇𝐗)=(𝐈T𝐀B𝐆𝐀IH)diag(vec(𝐒))(𝐗T𝐀I)=𝐀𝐟𝐠𝐟+(𝐗T𝐀B)𝐡=𝐛𝐟𝐡,\displaystyle\mathrm{vec}(\mathbf{A}_{\mathrm{B}}\mathbf{G}\mathbf{A}_{\mathrm{I}}^{\mathrm{H}}(\mathbf{S}\odot\mathbf{A}_{\mathrm{I}}\mathbf{F}\mathbf{X})+\mathbf{A}_{\mathrm{B}}\mathbf{H}\mathbf{X})=\underbrace{(\mathbf{I}_{T}\otimes\mathbf{A}_{\mathrm{B}}\mathbf{G}\mathbf{A}_{\mathrm{I}}^{\mathrm{H}})\mathrm{diag}(\mathrm{vec}(\mathbf{S}))(\mathbf{X}^{\mathrm{T}}\otimes\mathbf{A}_{\mathrm{I}})}_{=\mathbf{A}_{\mathbf{fg}}}\mathbf{f}+\underbrace{(\mathbf{X}^{\mathrm{T}}\otimes\mathbf{A}_{\mathrm{B}})\mathbf{h}}_{=\mathbf{b}_{\mathbf{fh}}}, (32)
vec(𝛀𝐀I𝐅𝐗)=diag(vec(𝛀))(𝐗T𝐀I)=𝐀𝐟𝐟,\displaystyle\mathrm{vec}(\bm{\Omega}\odot\mathbf{A}_{\mathrm{I}}\mathbf{F}\mathbf{X})=\underbrace{\mathrm{diag}(\mathrm{vec}(\bm{\Omega}))(\mathbf{X}^{\mathrm{T}}\otimes\mathbf{A}_{\mathrm{I}})}_{=\mathbf{A}_{\mathbf{f}}}\mathbf{f}, (33)
vec(𝐀B𝐆𝐀IH(𝐒𝐀I𝐅𝐗)+𝐀B𝐇𝐗)=((𝐒T𝐗T𝐅T𝐀IT)𝐀I𝐀B)=𝐀𝐠𝐟𝐠+(𝐗T𝐀B)𝐡=𝐛𝐠𝐡,\displaystyle\mathrm{vec}(\mathbf{A}_{\mathrm{B}}\mathbf{G}\mathbf{A}_{\mathrm{I}}^{\mathrm{H}}(\mathbf{S}\odot\mathbf{A}_{\mathrm{I}}\mathbf{F}\mathbf{X})+\mathbf{A}_{\mathrm{B}}\mathbf{H}\mathbf{X})=\underbrace{((\mathbf{S}^{\mathrm{T}}\odot\mathbf{X}^{\mathrm{T}}\mathbf{F}^{\mathrm{T}}\mathbf{A}_{\mathrm{I}}^{\mathrm{T}})\mathbf{A}_{\mathrm{I}}^{*}\otimes\mathbf{A}_{\mathrm{B}})}_{=\mathbf{A}_{\mathbf{gf}}}\mathbf{g}+\underbrace{(\mathbf{X}^{\mathrm{T}}\otimes\mathbf{A}_{\mathrm{B}})\mathbf{h}}_{=\mathbf{b}_{\mathbf{gh}}}, (37)
vec(𝐀B𝐆𝐀IH(𝐒𝐀I𝐅𝐗)+𝐀B𝐇𝐗)=(𝐗T𝐀B)=𝐀𝐡𝐡+((𝐒T𝐗T𝐅T𝐀IT)𝐀I𝐀B)𝐠=𝐛𝐡𝐟𝐠\displaystyle\mathrm{vec}(\mathbf{A}_{\mathrm{B}}\mathbf{G}\mathbf{A}_{\mathrm{I}}^{\mathrm{H}}(\mathbf{S}\odot\mathbf{A}_{\mathrm{I}}\mathbf{F}\mathbf{X})+\mathbf{A}_{\mathrm{B}}\mathbf{H}\mathbf{X})=\underbrace{(\mathbf{X}^{\mathrm{T}}\otimes\mathbf{A}_{\mathrm{B}})}_{=\mathbf{A}_{\mathbf{h}}}\mathbf{h}+\underbrace{((\mathbf{S}^{\mathrm{T}}\odot\mathbf{X}^{\mathrm{T}}\mathbf{F}^{\mathrm{T}}\mathbf{A}_{\mathrm{I}}^{\mathrm{T}})\mathbf{A}_{\mathrm{I}}^{*}\otimes\mathbf{A}_{\mathrm{B}})\mathbf{g}}_{=\mathbf{b}_{\mathbf{hfg}}} (41)

3. Derivation of q(𝐠)q(\mathbf{g}): First, we define 𝐀𝐠𝐟\mathbf{A}_{\mathbf{gf}} and 𝐛𝐠𝐡\mathbf{b}_{\mathbf{gh}} from (23) as (37) at the bottom of the next page. Then, expanding (27) after plugging in (18) and (23) gives

q(𝐠)\displaystyle q(\mathbf{g})\propto exp{logp(𝐲|𝐟,𝐠,𝐡)+logp(𝐠|𝜸𝐠)𝐠}\displaystyle\exp\{\langle\log p(\mathbf{y}|\mathbf{f},\mathbf{g},\mathbf{h})+\log p(\mathbf{g}|\bm{\gamma}_{\mathbf{g}})\rangle_{\mathbf{g}}\}
\displaystyle\propto exp{𝐲𝐛𝐠𝐡𝐀𝐠𝐟𝐠22/σB2𝐠}×\displaystyle\exp\{\langle-\|\mathbf{y}-\mathbf{b}_{\mathbf{gh}}-\mathbf{A}_{\mathbf{gf}}\mathbf{g}\|_{2}^{2}/\sigma_{\mathrm{B}}^{2}\rangle_{\mathbf{g}}\}\times
exp{𝐠H𝚪𝐠𝐠𝐠}\displaystyle\exp\{\langle-\mathbf{g}^{\mathrm{H}}\bm{\Gamma}_{\mathbf{g}}\mathbf{g}\rangle_{\mathbf{g}}\}
\displaystyle\propto exp{(𝐠𝐦𝐠)H𝐂𝐠1(𝐠𝐦𝐠)},\displaystyle\exp\{-(\mathbf{g}-\mathbf{m}_{\mathbf{g}})^{\mathrm{H}}\mathbf{C}_{\mathbf{g}}^{-1}(\mathbf{g}-\mathbf{m}_{\mathbf{g}})\}, (38)

which is recognized as a Gaussian distribution associated with the posterior mean and covariance

𝐦𝐠\displaystyle\mathbf{m}_{\mathbf{g}} =𝐂𝐠(1σB2𝐀𝐠𝐟H(𝐲𝐛𝐠𝐡)),\displaystyle=\mathbf{C}_{\mathbf{g}}\left(\frac{1}{\sigma_{\mathrm{B}}^{2}}\langle\mathbf{A}_{\mathbf{gf}}\rangle^{\mathrm{H}}(\mathbf{y}-\langle\mathbf{b}_{\mathbf{gh}}\rangle)\right), (39)
𝐂𝐠\displaystyle\mathbf{C}_{\mathbf{g}} =(1σB2𝐀𝐠𝐟H𝐀𝐠𝐟+𝚪𝐠)1.\displaystyle=\left(\frac{1}{\sigma_{\mathrm{B}}^{2}}\langle\mathbf{A}_{\mathbf{gf}}^{\mathrm{H}}\mathbf{A}_{\mathbf{gf}}\rangle+\langle\bm{\Gamma}_{\mathbf{g}}\rangle\right)^{-1}. (40)

Again, computing the expressions above requires 𝐀𝐠𝐟\langle\mathbf{A}_{\mathbf{gf}}\rangle and 𝐛𝐠𝐡\langle\mathbf{b}_{\mathbf{gh}}\rangle as well as 𝐀𝐠𝐟H𝐀𝐠𝐟\langle\mathbf{A}_{\mathbf{gf}}^{\mathrm{H}}\mathbf{A}_{\mathbf{gf}}\rangle, from which the terms corresponding to the first moment are given by (37) after replacing 𝐅\mathbf{F} and 𝐡\mathbf{h} with 𝐅\langle\mathbf{F}\rangle and 𝐡\langle\mathbf{h}\rangle. The explicit form of 𝐀𝐠𝐟H𝐀𝐠𝐟\langle\mathbf{A}_{\mathbf{gf}}^{\mathrm{H}}\mathbf{A}_{\mathbf{gf}}\rangle with a detailed derivation is provided in Appendix B.

4. Derivation of q(𝐡)q(\mathbf{h}): First, define 𝐀𝐡\mathbf{A}_{\mathbf{h}} and 𝐛𝐡𝐟𝐠\mathbf{b}_{\mathbf{hfg}} from (23) as (41) at the bottom of the next page. By plugging in (19) and (23) to (27), we arrive at

q(𝐡)\displaystyle q(\mathbf{h})\propto exp{logp(𝐲|𝐟,𝐠,𝐡)+logp(𝐡|𝜸𝐡)𝐡}\displaystyle\exp\{\langle\log p(\mathbf{y}|\mathbf{f},\mathbf{g},\mathbf{h})+\log p(\mathbf{h}|\bm{\gamma}_{\mathbf{h}})\rangle_{\mathbf{h}}\}
\displaystyle\propto exp{𝐲𝐛𝐡𝐟𝐠𝐀𝐡𝐡22/σB2𝐡}×\displaystyle\exp\{\langle-\|\mathbf{y}-\mathbf{b}_{\mathbf{hfg}}-\mathbf{A}_{\mathbf{h}}\mathbf{h}\|_{2}^{2}/\sigma_{\mathrm{B}}^{2}\rangle_{\mathbf{h}}\}\times
exp{𝐡H𝚪𝐡𝐡𝐡}\displaystyle\exp\{\langle-\mathbf{h}^{\mathrm{H}}\bm{\Gamma}_{\mathbf{h}}\mathbf{h}\rangle_{\mathbf{h}}\}
\displaystyle\propto exp{(𝐡𝐦𝐡)H𝐂𝐡1(𝐡𝐦𝐡)},\displaystyle\exp\{-(\mathbf{h}-\mathbf{m}_{\mathbf{h}})^{\mathrm{H}}\mathbf{C}_{\mathbf{h}}^{-1}(\mathbf{h}-\mathbf{m}_{\mathbf{h}})\}, (42)

which implies that 𝐡\mathbf{h} is Gaussian distributed parameterized by

𝐦𝐡\displaystyle\mathbf{m}_{\mathbf{h}} =𝐂𝐡(1σB2𝐀𝐡H(𝐲𝐛𝐡𝐟𝐠)),\displaystyle=\mathbf{C}_{\mathbf{h}}\left(\frac{1}{\sigma_{\mathrm{B}}^{2}}\mathbf{A}_{\mathbf{h}}^{\mathrm{H}}(\mathbf{y}-\langle\mathbf{b}_{\mathbf{hfg}}\rangle)\right), (43)
𝐂𝐡\displaystyle\mathbf{C}_{\mathbf{h}} =(1σB2𝐀𝐡H𝐀𝐡+𝚪𝐡)1.\displaystyle=\left(\frac{1}{\sigma_{\mathrm{B}}^{2}}\mathbf{A}_{\mathbf{h}}^{\mathrm{H}}\mathbf{A}_{\mathbf{h}}+\langle\bm{\Gamma}_{\mathbf{h}}\rangle\right)^{-1}. (44)

The posterior mean and covariance above require 𝐛𝐡𝐟𝐠\langle\mathbf{b}_{\mathbf{hfg}}\rangle to be computed, which can be obtained from (41) by substituting 𝐅\mathbf{F} and 𝐠\mathbf{g} with 𝐅\langle\mathbf{F}\rangle and 𝐠\langle\mathbf{g}\rangle.

5. Derivation of q(γ𝐟)q(\bm{\gamma}_{\mathbf{f}}): Let us plug in (17) and (20) to (27) to obtain

q(𝜸𝐟)\displaystyle q(\bm{\gamma}_{\mathbf{f}})\propto exp{logp(𝐟|𝜸𝐟)𝜸𝐟+logp(𝜸𝐟)}\displaystyle\exp\{\langle\log p(\mathbf{f}|\bm{\gamma}_{\mathbf{f}})\rangle_{\bm{\gamma}_{\mathbf{f}}}+\log p(\bm{\gamma}_{\mathbf{f}})\}
\displaystyle\propto exp{logdet(𝚪𝐟)𝐟H𝚪𝐟𝐟𝜸𝐟+logp(𝜸𝐟)}\displaystyle\exp\{\log\det(\bm{\Gamma}_{\mathbf{f}})-\langle\mathbf{f}^{\mathrm{H}}\bm{\Gamma}_{\mathbf{f}}\mathbf{f}\rangle_{\bm{\gamma}_{\mathbf{f}}}+\log p(\bm{\gamma}_{\mathbf{f}})\}
\displaystyle\propto iγ𝐟,i×exp{|fi|2γ𝐟,i}×γ𝐟,ia1exp{bγ𝐟,i}\displaystyle\prod_{i}\gamma_{\mathbf{f},i}\times\exp\{-\langle|f_{i}|^{2}\rangle\gamma_{\mathbf{f},i}\}\times\gamma_{\mathbf{f},i}^{a-1}\exp\{-b\gamma_{\mathbf{f},i}\}
\displaystyle\propto iγ𝐟,i(a+1)1exp{(b+|fi|2)γ𝐟,i},\displaystyle\prod_{i}\gamma_{\mathbf{f},i}^{(a+1)-1}\exp\{-(b+\langle|f_{i}|^{2}\rangle)\gamma_{\mathbf{f},i}\}, (45)

which is recognized as a product of Gamma distributions associated with the same posterior shape a¯=a+1\bar{a}=a+1 and different rates b¯𝐟,i=b+|fi|2=b+[𝐂𝐟+𝐦𝐟𝐦𝐟H]i,i\bar{b}_{\mathbf{f},i}=b+\langle|f_{i}|^{2}\rangle=b+[\mathbf{C}_{\mathbf{f}}+\mathbf{m}_{\mathbf{f}}\mathbf{m}_{\mathbf{f}}^{\mathrm{H}}]_{i,i}. Therefore, the posterior mean of 𝜸𝐟\bm{\gamma}_{\mathbf{f}} is

γ𝐟,i=a¯b¯𝐟,i.\langle\gamma_{\mathbf{f},i}\rangle=\frac{\bar{a}}{\bar{b}_{\mathbf{f},i}}. (46)

6. Derivation of q(γ𝐠)q(\bm{\gamma}_{\mathbf{g}}): Since the priors {p(𝐟|𝜸𝐟),p(𝐠|𝜸𝐠)}\{p(\mathbf{f}|\bm{\gamma}_{\mathbf{f}}),p(\mathbf{g}|\bm{\gamma}_{\mathbf{g}})\} as well as the hyperpriors {p(𝜸𝐟),p(𝜸𝐠)}\{p(\bm{\gamma}_{\mathbf{f}}),p(\bm{\gamma}_{\mathbf{g}})\} are i.i.d., the derivation of q(𝜸𝐠)q(\bm{\gamma}_{\mathbf{g}}) leads to the same functional form, which is a product of Gamma distributions with the same posterior shape a¯=a+1\bar{a}=a+1 and different rates b¯𝐠,i=b+|gi|2=b+[𝐂𝐠+𝐦𝐠𝐦𝐠H]i,i\bar{b}_{\mathbf{g},i}=b+\langle|g_{i}|^{2}\rangle=b+[\mathbf{C}_{\mathbf{g}}+\mathbf{m}_{\mathbf{g}}\mathbf{m}_{\mathbf{g}}^{\mathrm{H}}]_{i,i}. Therefore, the posterior mean of 𝜸𝐠\bm{\gamma}_{\mathbf{g}} is

γ𝐠,i=a¯b¯𝐠,i.\langle\gamma_{\mathbf{g},i}\rangle=\frac{\bar{a}}{\bar{b}_{\mathbf{g},i}}. (47)

7. Derivation of q(γ𝐡)q(\bm{\gamma}_{\mathbf{h}}): By the same argument as in the derivation of q(𝜸𝐠)q(\bm{\gamma}_{\mathbf{g}}), we conclude that q(𝜸𝐡)q(\bm{\gamma}_{\mathbf{h}}) is a product of Gamma distributions parameterized by the same posterior shape a¯=a+1\bar{a}=a+1 and different rates b¯𝐡,i=b+|hi|2=b+[𝐂𝐡+𝐦𝐡𝐦𝐡H]i,i\bar{b}_{\mathbf{h},i}=b+\langle|h_{i}|^{2}\rangle=b+[\mathbf{C}_{\mathbf{h}}+\mathbf{m}_{\mathbf{h}}\mathbf{m}_{\mathbf{h}}^{\mathrm{H}}]_{i,i}, from which the posterior mean of 𝜸𝐡\bm{\gamma}_{\mathbf{h}} is deduced as

γ𝐡,i=a¯b¯𝐡,i.\langle\gamma_{\mathbf{h},i}\rangle=\frac{\bar{a}}{\bar{b}_{\mathbf{h},i}}. (48)

The VI-SBL-based channel estimator developed until now is summarized in Algorithm 1. In essence, Algorithm 1 performs approximate posterior inference by updating {q(𝐟),q(𝐠),q(𝐡),q(𝜸𝐟),q(𝜸𝐠),q(𝜸𝐡),q(𝐮)}\{q(\mathbf{f}),q(\mathbf{g}),q(\mathbf{h}),q(\bm{\gamma}_{\mathbf{f}}),q(\bm{\gamma}_{\mathbf{g}}),q(\bm{\gamma}_{\mathbf{h}}),q(\mathbf{u})\} iteratively. The MMSE estimates of all the links are given by

𝐅¯^\displaystyle\hat{\bar{\mathbf{F}}} =𝐀Ireshape(𝐦𝐟,[Ng,K]),\displaystyle=\mathbf{A}_{\mathrm{I}}\mathrm{reshape}(\mathbf{m}_{\mathbf{f}},[N_{\mathrm{g}},K]), (49)
𝐆¯^\displaystyle\hat{\bar{\mathbf{G}}} =𝐀Breshape(𝐦𝐠,[Mg,Ng])𝐀IH,\displaystyle=\mathbf{A}_{\mathrm{B}}\mathrm{reshape}(\mathbf{m}_{\mathbf{g}},[M_{\mathrm{g}},N_{\mathrm{g}}])\mathbf{A}_{\mathrm{I}}^{\mathrm{H}}, (50)
𝐇¯^\displaystyle\hat{\bar{\mathbf{H}}} =𝐀Breshape(𝐦𝐡,[Mg,K])\displaystyle=\mathbf{A}_{\mathrm{B}}\mathrm{reshape}(\mathbf{m}_{\mathbf{h}},[M_{\mathrm{g}},K]) (51)

where reshape(𝐚,[m,n])\mathrm{reshape}(\mathbf{a},[m,n]) reshapes 𝐚\mathbf{a} to a matrix of size m×nm\times n that preserves the columnwise ordering. To investigate the convergence of Algorithm 1, recall that Algorithm 1 tackles (27) by solving q(𝒳i)q(\mathcal{X}_{i}) for fixed {q(𝒳j)}ji\{q(\mathcal{X}_{j})\}_{j\neq i} in a cycling manner instead of jointly solving for {q(𝒳i)}i\{q(\mathcal{X}_{i})\}_{\forall i}. Since (27) defines the global minimum of the variational free energy minimization problem [16, 17], each line in Algorithm 1 at least reduces the variational free energy. By noting that the variational free energy is lower-bounded by the negative log-evidence as evident from (26), we conclude that Algorithm 1 converges to a local minimum of the variational free energy problem.

Algorithm 1 VI-SBL-based channel estimator
1:𝐲\mathbf{y}, 𝐳^\hat{\mathbf{z}}
2:𝐦𝐟\mathbf{m}_{\mathbf{f}}, 𝐦𝐠\mathbf{m}_{\mathbf{g}}, 𝐦𝐡\mathbf{m}_{\mathbf{h}}, or equivalently 𝐟\langle\mathbf{f}\rangle, 𝐠\langle\mathbf{g}\rangle, 𝐡\langle\mathbf{h}\rangle
3:Set the parameters aa and bb for the hyperpriors
4:Initialize 𝐦𝐟\mathbf{m}_{\mathbf{f}}, 𝐂𝐟\mathbf{C}_{\mathbf{f}}, 𝐦𝐡\mathbf{m}_{\mathbf{h}}, 𝐂𝐡\mathbf{C}_{\mathbf{h}}, 𝜸𝐟\langle\bm{\gamma}_{\mathbf{f}}\rangle, 𝜸𝐠\langle\bm{\gamma}_{\mathbf{g}}\rangle, 𝜸𝐡\langle\bm{\gamma}_{\mathbf{h}}\rangle, and 𝐮\langle\mathbf{u}\rangle
5:while termination condition do
6:     Update 𝐦𝐠\mathbf{m}_{\mathbf{g}} and 𝐂𝐠\mathbf{C}_{\mathbf{g}} according to (39) and (40)
7:     Update 𝜸𝐠\langle\bm{\gamma}_{\mathbf{g}}\rangle according to (47)
8:     Update 𝐦𝐡\mathbf{m}_{\mathbf{h}} and 𝐂𝐡\mathbf{C}_{\mathbf{h}} according to (43) and (44)
9:     Update 𝜸𝐡\langle\bm{\gamma}_{\mathbf{h}}\rangle according to (48)
10:     Update 𝐦𝐟\mathbf{m}_{\mathbf{f}} and 𝐂𝐟\mathbf{C}_{\mathbf{f}} according to (35) and (36)
11:     Update 𝜸𝐟\langle\bm{\gamma}_{\mathbf{f}}\rangle according to (46)
12:     Update 𝐮\langle\mathbf{u}\rangle according to (30) and (31)
13:end while

The problem, however, is that poor initialization can result in a bad local minimum. The simulation results showed that a good initialization strategy is to perform “initial” posterior inference on a subset of the parameters in advance, and then use the coarse estimates obtained in this stage to initialize the parameters. The initial posterior inference we consider is the UE-RIS link estimation and UE-BS link estimation. In particular, the UE-IRS link initialization proceeds by performing initial posterior inference on {𝐟,𝜸𝐟,𝐮}\{\mathbf{f},\bm{\gamma}_{\mathbf{f}},\mathbf{u}\} based only on the measurement 𝐳^\hat{\mathbf{z}} to initialize {𝐦𝐟,𝐂𝐟,𝜸𝐟,𝐮}\{\mathbf{m}_{\mathbf{f}},\mathbf{C}_{\mathbf{f}},\langle\bm{\gamma}_{\mathbf{f}}\rangle,\langle\mathbf{u}\rangle\} via Lines 8-10. For the UE-BS link initialization, we first turn off the passive reflecting elements for a short duration during the channel estimation phase as shown in (57). Then, the UE-BS link initialization proceeds by performing initial posterior inference on {𝐡,𝜸𝐡}\{\mathbf{h},\bm{\gamma}_{\mathbf{h}}\} based on a subset of 𝐲\mathbf{y} corresponding to the time slots the passive reflecting elements were turned off as in (57) to initialize {𝐦𝐡,𝐂𝐡,𝜸𝐡}\{\mathbf{m}_{\mathbf{h}},\mathbf{C}_{\mathbf{h}},\langle\bm{\gamma}_{\mathbf{h}}\rangle\} via Lines 6-7. Since each initial posterior inference is equivalent to the conventional direct link channel estimation, i.e., UE-IRS link and UE-BS link, the initial posterior inference can be performed without much difficulty (even though the estimates obtained in this stage are very coarse due to the extremely small size of the measurements used, thus using them only for initialization and preventing us from using them as final estimates). The remaining parameter 𝜸𝐠\langle\bm{\gamma}_{\mathbf{g}}\rangle, which is not associated with neither the UE-IRS nor UE-BS link initialization, can be initialized as its prior mean 𝔼{𝜸𝐠}=a/b×𝟏\mathbb{E}\{\bm{\gamma}_{\mathbf{g}}\}=a/b\times\mathbf{1}. By adopting the proposed initialization strategy, Algorithm 1 was demonstrated to produce good results.

III-D Complexity Reduction via Mean-Field Approximation

The complexity of VI-SBL is mainly attributed to the matrix inversions in (36), (40), and (44) where the matrices to be inverted are of sizes NgK×NgKN_{\mathrm{g}}K\times N_{\mathrm{g}}K, MgNg×MgNgM_{\mathrm{g}}N_{\mathrm{g}}\times M_{\mathrm{g}}N_{\mathrm{g}}, and MgK×MgKM_{\mathrm{g}}K\times M_{\mathrm{g}}K. Therefore, the complexity of VI-SBL is 𝒪(Ng3K3+Mg3Ng3+Mg3K3)\mathcal{O}(N_{\mathrm{g}}^{3}K^{3}+M_{\mathrm{g}}^{3}N_{\mathrm{g}}^{3}+M_{\mathrm{g}}^{3}K^{3}). To reduce the complexity of VI-SBL, a mean-field approximation-based solution is proposed that converts a large matrix inversion to many small matrix inversions.

In particular, we consider the mean-field approximation that splits {𝐟,𝜸𝐟}\{\mathbf{f},\bm{\gamma}_{\mathbf{f}}\}, {𝐠,𝜸𝐠}\{\mathbf{g},\bm{\gamma}_{\mathbf{g}}\}, and {𝐡,𝜸𝐡}\{\mathbf{h},\bm{\gamma}_{\mathbf{h}}\} to S𝐟S_{\mathbf{f}}, S𝐠S_{\mathbf{g}}, and S𝐡S_{\mathbf{h}} subvectors. For example, the partition of {𝐟,𝜸𝐟}\{\mathbf{f},\bm{\gamma}_{\mathbf{f}}\} can be written as 𝐟T=[𝐟[1]T,,𝐟[S𝐟]T]\mathbf{f}^{\mathrm{T}}=[\mathbf{f}_{[1]}^{\mathrm{T}},\dots,\mathbf{f}_{[S_{\mathbf{f}}]}^{\mathrm{T}}] and 𝜸𝐟T=[𝜸𝐟[1]T,,𝜸𝐟[S𝐟]T]\bm{\gamma}_{\mathbf{f}}^{\mathrm{T}}=[\bm{\gamma}_{\mathbf{f}[1]}^{\mathrm{T}},\dots,\bm{\gamma}_{\mathbf{f}[S_{\mathbf{f}}]}^{\mathrm{T}}] where 𝐟[i]NgK/S𝐟\mathbf{f}_{[i]}\in\mathbb{C}^{N_{\mathrm{g}}K/S_{\mathbf{f}}} and 𝜸𝐟[i]NgK/S𝐟\bm{\gamma}_{\mathbf{f}[i]}\in\mathbb{C}^{N_{\mathrm{g}}K/S_{\mathbf{f}}}, and the same logic holds for {𝐠,𝜸𝐠}\{\mathbf{g},\bm{\gamma}_{\mathbf{g}}\} and {𝐡,𝜸𝐡}\{\mathbf{h},\bm{\gamma}_{\mathbf{h}}\}. Then, VI-SBL performs posterior inference on the partition

q(𝒳)=\displaystyle q(\mathcal{X})=
q(𝐮)i=1S𝐟q(𝐟[i])q(𝜸𝐟[i])j=1S𝐠q(𝐠[j])q(𝜸𝐠[j])k=1S𝐡q(𝐡[k])q(𝜸𝐡[k])\displaystyle q(\mathbf{u})\prod_{i=1}^{S_{\mathbf{f}}}q(\mathbf{f}_{[i]})q(\bm{\gamma}_{\mathbf{f}[i]})\prod_{j=1}^{S_{\mathbf{g}}}q(\mathbf{g}_{[j]})q(\bm{\gamma}_{\mathbf{g}[j]})\prod_{k=1}^{S_{\mathbf{h}}}q(\mathbf{h}_{[k]})q(\bm{\gamma}_{\mathbf{h}[k]}) (52)

from {𝐲,𝐳^}\{\mathbf{y},\hat{\mathbf{z}}\}.

Now, we present how the update rules in (35), (36), (39), (40), (43), and (44) are modified after the proposed mean-field approximation is applied. As a preliminary, let us introduce some shorthand notations. First, 𝐟[i]c\mathbf{f}_{[i]}^{\mathrm{c}} is defined as the vector obtained by removing 𝐟[i]\mathbf{f}_{[i]} from 𝐟\mathbf{f}. Likewise, 𝐀𝐟[i]𝐠\mathbf{A}_{\mathbf{f}[i]\mathbf{g}} and 𝐀𝐟[i]𝐠c\mathbf{A}_{\mathbf{f}[i]\mathbf{g}}^{\mathrm{c}} are defined as the matrices obtained by retaining the columns of 𝐀𝐟𝐠\mathbf{A}_{\mathbf{fg}} in (32) corresponding to 𝐟[i]\mathbf{f}_{[i]} and 𝐟[i]c\mathbf{f}_{[i]}^{\mathrm{c}}. The same logic can be extended to denote any vectors and matrices obtained by pruning. As an exception, the pruned versions of the precision matrices are denoted by 𝚪𝐟[i]=diag(𝜸𝐟[i])\bm{\Gamma}_{\mathbf{f}[i]}=\mathrm{diag}(\bm{\gamma}_{\mathbf{f}[i]}), 𝚪𝐠[i]=diag(𝜸𝐠[i])\bm{\Gamma}_{\mathbf{g}[i]}=\mathrm{diag}(\bm{\gamma}_{\mathbf{g}[i]}), and 𝚪𝐡[i]=diag(𝜸𝐡[i])\bm{\Gamma}_{\mathbf{h}[i]}=\mathrm{diag}(\bm{\gamma}_{\mathbf{h}[i]}).

Then, the new update rules for (35), (36), (39), (40), (43), and (44) are given by (53)-(55) at the bottom of the page. Since (53) repeats an NgK/S𝐟×NgK/S𝐟N_{\mathrm{g}}K/S_{\mathbf{f}}\times N_{\mathrm{g}}K/S_{\mathbf{f}} matrix inversion S𝐟S_{\mathbf{f}} times, the overall complexity is 𝒪(Ng3K3/S𝐟2)\mathcal{O}(N_{\mathrm{g}}^{3}K^{3}/S_{\mathbf{f}}^{2}). The same logic applies to (54) and (55). As a result, the overall complexity of VI-SBL reduces to 𝒪(Ng3K3/S𝐟2+Mg3Ng3/S𝐠2+Mg3K3/S𝐡2)\mathcal{O}(N_{\mathrm{g}}^{3}K^{3}/S_{\mathbf{f}}^{2}+M_{\mathrm{g}}^{3}N_{\mathrm{g}}^{3}/S_{\mathbf{g}}^{2}+M_{\mathrm{g}}^{3}K^{3}/S_{\mathbf{h}}^{2}).

𝐦𝐟[i]\displaystyle\mathbf{m}_{\mathbf{f}[i]} =𝐂𝐟[i](1σB2{𝐀𝐟[i]𝐠H(𝐲𝐛𝐟𝐡)𝐀𝐟[i]𝐠H𝐀𝐟[i]𝐠c𝐟[i]c}+1σI2𝐀𝐟[i]H(𝐮𝐀𝐟[i]c𝐟[i]c)),\displaystyle=\mathbf{C}_{\mathbf{f}[i]}\bigg{(}\frac{1}{\sigma_{\mathrm{B}}^{2}}\bigg{\{}\langle\mathbf{A}_{\mathbf{f}[i]\mathbf{g}}\rangle^{\mathrm{H}}(\mathbf{y}-\langle\mathbf{b}_{\mathbf{fh}}\rangle)-\langle\mathbf{A}_{\mathbf{f}[i]\mathbf{g}}^{\mathrm{H}}\mathbf{A}_{\mathbf{f}[i]\mathbf{g}}^{\mathrm{c}}\rangle\langle\mathbf{f}_{[i]}^{\mathrm{c}}\rangle\bigg{\}}+\frac{1}{\sigma_{\mathrm{I}}^{2}}\mathbf{A}_{\mathbf{f}[i]}^{\mathrm{H}}(\langle\mathbf{u}\rangle-\mathbf{A}_{\mathbf{f}[i]}^{\mathrm{c}}\langle\mathbf{f}_{[i]}^{\mathrm{c}}\rangle)\bigg{)},
𝐂𝐟[i]\displaystyle\mathbf{C}_{\mathbf{f}[i]} =(1σB2𝐀𝐟[i]𝐠H𝐀𝐟[i]𝐠+1σI2𝐀𝐟[i]H𝐀𝐟[i]+𝚪𝐟[i])1 for iS𝐟,\displaystyle=\left(\frac{1}{\sigma_{\mathrm{B}}^{2}}\langle\mathbf{A}_{\mathbf{f}[i]\mathbf{g}}^{\mathrm{H}}\mathbf{A}_{\mathbf{f}[i]\mathbf{g}}\rangle+\frac{1}{\sigma_{\mathrm{I}}^{2}}\mathbf{A}_{\mathbf{f}[i]}^{\mathrm{H}}\mathbf{A}_{\mathbf{f}[i]}+\langle\bm{\Gamma}_{\mathbf{f}[i]}\rangle\right)^{-1}\text{ for }i\in\llbracket S_{\mathbf{f}}\rrbracket, (53)
𝐦𝐠[i]\displaystyle\mathbf{m}_{\mathbf{g}[i]} =𝐂𝐠[i](1σB2{𝐀𝐠[i]𝐟H(𝐲𝐛𝐠𝐡)𝐀𝐠[i]𝐟H𝐀𝐠[i]𝐟c𝐠[i]c}),\displaystyle=\mathbf{C}_{\mathbf{g}[i]}\bigg{(}\frac{1}{\sigma_{\mathrm{B}}^{2}}\bigg{\{}\langle\mathbf{A}_{\mathbf{g}[i]\mathbf{f}}\rangle^{\mathrm{H}}(\mathbf{y}-\langle\mathbf{b}_{\mathbf{gh}}\rangle)-\langle\mathbf{A}_{\mathbf{g}[i]\mathbf{f}}^{\mathrm{H}}\mathbf{A}_{\mathbf{g}[i]\mathbf{f}}^{\mathrm{c}}\rangle\langle\mathbf{g}_{[i]}^{\mathrm{c}}\rangle\bigg{\}}\bigg{)},
𝐂𝐠[i]\displaystyle\mathbf{C}_{\mathbf{g}[i]} =(1σB2𝐀𝐠[i]𝐟H𝐀𝐠[i]𝐟+𝚪𝐠[i])1 for iS𝐠,\displaystyle=\left(\frac{1}{\sigma_{\mathrm{B}}^{2}}\langle\mathbf{A}_{\mathbf{g}[i]\mathbf{f}}^{\mathrm{H}}\mathbf{A}_{\mathbf{g}[i]\mathbf{f}}\rangle+\langle\bm{\Gamma}_{\mathbf{g}[i]}\rangle\right)^{-1}\text{ for }i\in\llbracket S_{\mathbf{g}}\rrbracket, (54)
𝐦𝐡[i]\displaystyle\mathbf{m}_{\mathbf{h}[i]} =𝐂𝐡[i](1σB2𝐀𝐡[i]H(𝐲𝐛𝐡𝐟𝐠𝐀𝐡[i]c𝐡[i]c)),𝐂𝐡[i]=(1σB2𝐀𝐡[i]H𝐀𝐡[i]+𝚪𝐡[i])1 for iS𝐡\displaystyle=\mathbf{C}_{\mathbf{h}[i]}\left(\frac{1}{\sigma_{\mathrm{B}}^{2}}\mathbf{A}_{\mathbf{h}[i]}^{\mathrm{H}}(\mathbf{y}-\langle\mathbf{b}_{\mathbf{hfg}}\rangle-\mathbf{A}_{\mathbf{h}[i]}^{\mathrm{c}}\langle\mathbf{h}_{[i]}^{\mathrm{c}}\rangle)\right),\quad\mathbf{C}_{\mathbf{h}[i]}=\left(\frac{1}{\sigma_{\mathrm{B}}^{2}}\mathbf{A}_{\mathbf{h}[i]}^{\mathrm{H}}\mathbf{A}_{\mathbf{h}[i]}+\langle\bm{\Gamma}_{\mathbf{h}[i]}\rangle\right)^{-1}\text{ for }i\in\llbracket S_{\mathbf{h}}\rrbracket (55)

IV Simulation Results

In this section, we evaluate the performance of the proposed VI-SBL-based channel estimator based on the channel estimation error and energy efficiency. The baselines are state-of-the-art compressed sensing-based channel estimators for IRS-aided mmWave massive MIMO systems with passive reflecting elements. To implement the baselines, the UE-IRS-BS link estimation problem is reformulated as a sparse recovery problem as proposed in [10]. Then, the problem is solved using the generalized approximate MP (GAMP) [34], vector approximate MP (VAMP) [35], SBL [14], and generalized expectation consistent-signal recovery (GEC-SR) [36] algorithms, which are compressed sensing algorithms widely adopted in the mmWave channel estimation domain [37, 32, 38]. In addition, we also adopt the DS-OMP channel estimator as our baseline, which is an algorithm that exploits the double sparsity structure of the UE-IRS-BS link by taking into account the fact that all the users share a common IRS-BS link. DS-OMP attempts to improve the channel estimation accuracy by identifying the common rows and columns that the UE-IRS-BS links of all the users share.

The channel parameters in (1) are as follows. The bandwidth and path loss are W=80W=80 MHz and

PL={31.4+20log10dfor LoS42+29.2log10dfor NLoS\mathrm{PL}=\begin{cases}31.4+20\log_{10}d&\text{for LoS}\\ 42+29.2\log_{10}d&\text{for NLoS}\end{cases} (56)

in dB where dd is the distance of the link in meters. The Rician K-factors are κUI,k=κIB=13.2\kappa_{\mathrm{UI},k}=\kappa_{\mathrm{IB}}=13.2 dB for the UE-IRS and IRS-BS links in LoS and κUB,k=\kappa_{\mathrm{UB},k}=-\infty dB for the UE-BS link in NLoS. The number of NLoS paths is LUI,k=LIB=LUB,k=4L_{\mathrm{UI},k}=L_{\mathrm{IB}}=L_{\mathrm{UB},k}=4 for all the links.

In addition, the system parameters are configured as specified in the Dense Urban-eMBB scenario in ITU-R M.2412-0 [39]. In particular, the transmit power is Pk[t]=23P_{k}[t]=23 dBm. Meanwhile, the noise figure of the base station and active sensors is NF=7\mathrm{NF}=7 dB. Therefore, the noise power is σB2=σI2=W×N0×NF\sigma_{\mathrm{B}}^{2}=\sigma_{\mathrm{I}}^{2}=W\times N_{0}\times\mathrm{NF} where N0=174N_{0}=-174 dBm/Hz is the noise spectral density. There are M=16M=16 antennas at the base station, N=8×8N=8\times 8 elements at the IRS with Na=4N_{\mathrm{a}}=4 semi-passive elements, and K=4K=4 users. The parameters for the active sensors are B=4B=4 bit for the ADC resolution and fsn=1f_{\mathrm{sn}}=1 for the switching frequency. In addition, the uniform BB-bit quantizer proposed in [37, 40] is adopted. The passive reflecting elements are initially turned off for UE-BS link estimation and turned on as

|vn[t]|={0for t501for tT50|v_{n}[t]|=\begin{cases}0&\text{for }t\in\llbracket 50\rrbracket\\ 1&\text{for }t\in\llbracket T\rrbracket\setminus\llbracket 50\rrbracket\end{cases} (57)

with random phase shifts in the channel estimation phase. The base station and IRS are at (0,0)(0,0) m and (20,10)(20,10) m, while the users are around the circle with center (40,0)(40,0) m and radius 5 m. The lengths of the coherence block and channel estimation phase are Tc=1800T_{\mathrm{c}}=1800 and T=400T=400. Throughout the simulation, the system parameters are fixed unless stated otherwise. In addition, the parameters for VI-SBL are Mg=M,Ng=NM_{\mathrm{g}}=M,N_{\mathrm{g}}=N, S𝐟=1S_{\mathbf{f}}=1, S𝐠=8S_{\mathbf{g}}=8, S𝐡=1S_{\mathbf{h}}=1, and a=b=106a=b=10^{-6}. To understand the reason behind S𝐠=8S_{\mathbf{g}}=8, recall that the complexity of VI-SBL is 𝒪(Ng3K3/S𝐟2+Mg3Ng3/S𝐠2+Mg3K3/S𝐡2)\mathcal{O}(N_{\mathrm{g}}^{3}K^{3}/S_{\mathbf{f}}^{2}+M_{\mathrm{g}}^{3}N_{\mathrm{g}}^{3}/S_{\mathbf{g}}^{2}+M_{\mathrm{g}}^{3}K^{3}/S_{\mathbf{h}}^{2}). Since massive MIMO and IRS imply large MgMM_{\mathrm{g}}\geq M and NgNN_{\mathrm{g}}\geq N, most of the complexity comes from (54) that repeats an MgNg/S𝐠×MgNg/S𝐠M_{\mathrm{g}}N_{\mathrm{g}}/S_{\mathbf{g}}\times M_{\mathrm{g}}N_{\mathrm{g}}/S_{\mathbf{g}} matrix inversion S𝐠S_{\mathbf{g}} times. Therefore, we convert a large matrix inversion to many small matrix inversions by setting S𝐠=8S_{\mathbf{g}}=8.

TABLE I: Power consumption parameters
Parameter Value Parameter Value
BB_{\mathrm{\infty}} 10 bit PTP_{\mathrm{T}} [41] 0.25 W/Gbps
FOM\mathrm{FOM} [42] 1432.1 fJ/conversion-step PLOP_{\mathrm{LO}} [43] 22.5 mW
fsf_{\mathrm{s}} 80 MHz PRFP_{\mathrm{RF}} [43] 31.6 mW

Now, we explain the power consumption model that accounts for the power dissipated at the base station, IRS, and fronthaul link through which the received signal at the active sensors is forwarded. Since the base station is equipped with a local oscillator, MM RF chains, and MM pairs of BB_{\mathrm{\infty}}-bit ADCs where B1B_{\mathrm{\infty}}\gg 1, the power consumed by the base station is modeled as PBS=PLO+M(PRF+2PADC(B))P_{\mathrm{BS}}=P_{\mathrm{LO}}+M(P_{\mathrm{RF}}+2P_{\mathrm{ADC}}(B_{\infty})). Here, PADC(B)=FOM×fs×2BP_{\mathrm{ADC}}(B)=\mathrm{FOM}\times f_{\mathrm{s}}\times 2^{B} [44] is the power consumption of a BB-bit ADC where FOM\mathrm{FOM} and fsf_{\mathrm{s}} are the figure of merit and sampling frequency. Likewise, the power consumed by the active sensors at the IRS is modeled as PIRS=PLO+Na(PRF+2PADC(B))P_{\mathrm{IRS}}=P_{\mathrm{LO}}+N_{\mathrm{a}}(P_{\mathrm{RF}}+2P_{\mathrm{ADC}}(B)). Moving on to the fronthaul link that forwards 2BNaW2BN_{\mathrm{a}}W bits per second, the power consumption model is PFH=2BNaWPTP_{\mathrm{FH}}=2BN_{\mathrm{a}}WP_{\mathrm{T}} [41] where PTP_{\mathrm{T}} is the traffic-dependent power consumption. Then, the total power consumption is PBS+T/Tc×(PIRS+PFH)P_{\mathrm{BS}}+T/T_{\mathrm{c}}\times(P_{\mathrm{IRS}}+P_{\mathrm{FH}}) under the premise that the semi-passive elements become passive reflecting elements in the data transmission phase. On the other hand, the total power consumption of the baselines with passive reflecting elements is PBSP_{\mathrm{BS}}. The power consumption parameters are shown in Table I.

TABLE II: Per-iteration complexities of various channel estimators
Algorithm Complexity
Proposed 𝒪(Ng3K3/S𝐟2+Mg3Ng3/S𝐠2+Mg3K3/S𝐡2)\mathcal{O}(N_{\mathrm{g}}^{3}K^{3}/S_{\mathbf{f}}^{2}+M_{\mathrm{g}}^{3}N_{\mathrm{g}}^{3}/S_{\mathbf{g}}^{2}+M_{\mathrm{g}}^{3}K^{3}/S_{\mathbf{h}}^{2})
GAMP 𝒪(MTMgNgK)\mathcal{O}(MTM_{\mathrm{g}}N_{\mathrm{g}}K)
VAMP 𝒪(MTMgNgK)\mathcal{O}(MTM_{\mathrm{g}}N_{\mathrm{g}}K)
SBL 𝒪(Mg3Ng3K3)\mathcal{O}(M_{\mathrm{g}}^{3}N_{\mathrm{g}}^{3}K^{3})
GEC-SR 𝒪(Mg3Ng3K3)\mathcal{O}(M_{\mathrm{g}}^{3}N_{\mathrm{g}}^{3}K^{3})
DS-OMP 𝒪(MTK+NTKLIBLUI,k3)\mathcal{O}(MTK+NTKL_{\mathrm{IB}}L_{\mathrm{UI},k}^{3})
Refer to caption
Figure 3: NMSE of the UE-IRS-BS link vs. transmit power in the channel estimation phase.
Refer to caption
Figure 4: Energy efficiency vs. transmit power in the channel estimation phase. The transmit power in the data transmission phase is Pk[𝒯d]=23P_{k}[\mathcal{T}_{\mathrm{d}}]=23 dBm.

The performance metrics for the channel estimation error and energy efficiency are as follows. The channel estimation error is measured based on the normalized MSE (NMSE). In particular, the NMSEs for the UE-IRS, IRS-BS, and UE-BS links are

NMSE(𝐅¯)\displaystyle\mathrm{NMSE}(\bar{\mathbf{F}}) =𝔼{𝐅¯^𝐅¯F2/𝐅¯F2},\displaystyle=\mathbb{E}\{\|\hat{\bar{\mathbf{F}}}-\bar{\mathbf{F}}\|_{\mathrm{F}}^{2}/\|\bar{\mathbf{F}}\|_{\mathrm{F}}^{2}\}, (58)
NMSE(𝐆¯)\displaystyle\mathrm{NMSE}(\bar{\mathbf{G}}) =𝔼{𝐆¯^𝐆¯F2/𝐆¯F2},\displaystyle=\mathbb{E}\{\|\hat{\bar{\mathbf{G}}}-\bar{\mathbf{G}}\|_{\mathrm{F}}^{2}/\|\bar{\mathbf{G}}\|_{\mathrm{F}}^{2}\}, (59)
NMSE(𝐇¯)\displaystyle\mathrm{NMSE}(\bar{\mathbf{H}}) =𝔼{𝐇¯^𝐇¯F2/𝐇¯F2}.\displaystyle=\mathbb{E}\{\|\hat{\bar{\mathbf{H}}}-\bar{\mathbf{H}}\|_{\mathrm{F}}^{2}/\|\bar{\mathbf{H}}\|_{\mathrm{F}}^{2}\}. (60)

Meanwhile, the spectral efficiency is evaluated based on the sum rate obtained by optimizing the passive reflecting elements [45] and combiner at the base station [46] using the estimates of the UE-IRS-BS link kr(𝐅¯T,𝐆¯)\mathrm{kr}(\bar{\mathbf{F}}^{\mathrm{T}},\bar{\mathbf{G}}) and UE-BS link. Then, the energy efficiency is defined as the spectral efficiency scaled by WW and normalized by the total power consumption, which is defined as such to make a fair comparison between IRS with semi-passive elements and passive reflecting elements. Therefore, the energy efficiency is given by

EE={W×SEPBS+T/Tc×(PIRS+PFH)for Na>0W×SEPBSfor Na=0\mathrm{EE}=\begin{cases}\frac{W\times\mathrm{SE}}{P_{\mathrm{BS}}+T/T_{\mathrm{c}}\times(P_{\mathrm{IRS}}+P_{\mathrm{FH}})}&\text{for }N_{\mathrm{a}}>0\\ \frac{W\times\mathrm{SE}}{P_{\mathrm{BS}}}&\text{for }N_{\mathrm{a}}=0\end{cases} (61)

where the first case corresponds to VI-SBL with semi-passive elements, while the second case corresponds to the baselines with passive reflecting elements. Throughout the simulation, we mainly compare the NMSE of the UE-IRS-BS link because the baselines can only estimate the UE-IRS-BS and UE-BS links unlike VI-SBL that estimates all the links. Moreover, the reason for omitting the UE-BS link is because the UE-BS link estimation problem can be converted to the conventional channel estimation problem as pointed out in [3].

Refer to caption
Figure 5: NMSE of the UE-IRS-BS link vs. TT.
Refer to caption
Figure 6: Energy efficiency vs. TT.
Refer to caption
Figure 7: NMSEs of all the links vs. NaN_{\mathrm{a}}. The baselines are not shown.
Refer to caption
Figure 8: Energy efficiency vs. NaN_{\mathrm{a}}. The baselines with passive reflecting elements are shown as references.
Refer to caption
Figure 9: NMSEs of all the links vs. BB. The baselines are not shown.
Refer to caption
Figure 10: Energy efficiency vs. BB. The baselines with passive reflecting elements are shown as references.
Refer to caption
Figure 11: Energy-spectral efficiency as a function of BB for various NaN_{\mathrm{a}}. At each line, the markers correspond to B=1,,8B=1,\dots,8 from left to right. The baselines with passive reflecting elements are shown as references.

In the first simulation, we compare the performance of VI-SBL and the baselines in terms of the NMSE of the UE-IRS-BS link and energy efficiency for various transmit powers in the channel estimation phase. According to Fig. 3, VI-SBL outperforms the baselines in terms of the channel estimation accuracy. The superior performance of VI-SBL is expected because VI-SBL exploits the additional information acquired from a small number of active sensors at the IRS, whereas only the received signal at the base station is available to the baselines. As a result, we see a significant energy efficiency gap from Fig. 4 due to the high channel estimation accuracy of VI-SBL that outweighs the additional power consumption.

In the second simulation, we evaluate the performance of various channel estimators for different TT. According to Fig. 5, T200T\geq 200 is sufficient for VI-SBL to yield an accurate channel estimate, whereas the baselines require at least T400T\geq 400. As a result, the energy efficiency gap is large in the low training overhead regime as evident from Fig. 6. The reason for the dramatic training overhead gap comes from the fact that the baselines cannot observe the UE-IRS link, whereas VI-SBL has direct access to the UE-IRS link through a small number of active sensors. Therefore, the training overhead issue problematic in IRS-aided mmWave massive MIMO systems incurred by the large size of the UE-IRS-BS link can be resolved by adopting a small number of active sensors at the cost of additional but marginal power consumption.

In the third simulation, we evaluate the performance of VI-SBL in terms of the NMSEs of all the links and energy efficiency for various NaN_{\mathrm{a}}. From Fig. 7, observe that the NMSE of the UE-IRS link decreases as NaN_{\mathrm{a}} increases, which is not as surprising. In contrast, the NMSE of the IRS-BS link increases unlike the UE-IRS link. The reason for such a phenomenon is because the IRS-BS link can only be observed through the reflected signal. Therefore, more active sensors means less passive reflecting elements, or equivalently less IRS-BS link measurements. In addition, another interesting point is that the NMSE of the UE-IRS-BS link composed of the UE-IRS and IRS-BS links is lower-bounded by the worst NMSE of the UE-IRS and IRS-BS links as evident from Fig. 7. From the discussion until now, we recommend to keep NaN_{\mathrm{a}} small because increasing NaN_{\mathrm{a}} results in additional power consumption, while degrading the quality of the UE-IRS-BS link estimate. By recalling that the impact of the UE-IRS-BS link on the spectral efficiency is significant, we conclude that increasing NaN_{\mathrm{a}} offers no benefit other than increasing the accuracy of the UE-IRS link estimate. In fact, the energy efficiency falls below the baselines as NaN_{\mathrm{a}} increases as evident from Fig. 8.

In the fourth simulation, we assess the performance of VI-SBL in terms of the NMSEs of all the links and energy efficiency for various BB. According to Fig. 9, the NMSEs of all the links except the UE-IRS link saturate at B4B\geq 4. In contrast, the NMSE of the UE-IRS link hits the floor at B12B\geq 12. To understand such a phenomenon, recall that the UE-IRS link is observed through the quantized received signal at the active sensors. Then, we conclude that the impact of BB on the quality of the UE-IRS link estimate must be significant. Also, another interesting point is that the NMSE of the UE-IRS-BS link is lower-bounded by the worst NMSE of the UE-IRS and IRS-BS links as previously observed in Fig. 7. Moving on to Fig. 10, we see that there is a diminishing return beyond B6B\geq 6 at the cost of additional power consumption. Therefore, we can find the most energy-efficient operating point as a function of BB, which is B=6B=6 in our simulation setup.

In the fifth simulation, we investigate the energy-spectral efficiency of VI-SBL as a function of BB for various NaN_{\mathrm{a}}. According to Fig. 11, it is evident that increasing the ADC resolution beyond a certain limit, i.e., approximately B4B\geq 4 for any NaN_{\mathrm{a}}, yields only a small spectral efficiency gain at the cost of large additional power consumption. Therefore, it is not worth it to deploy high-resolution ADCs at the active sensors of the IRS since the gain is marginal. In addition, an interesting observation is that increasing the number of active sensors at the IRS actually decreases the spectral efficiency. Therefore, the combined effect of reduced spectral efficiency and increased power consumption results in significantly degraded system performance. This is in line with Figs. 7 and 8, and this phenomenon can be attributed to the fact that increasing the number of active sensors decreases the amount of signals reflected at the IRS towards the base station, thus decreasing the IRS-BS link measurements. Therefore, we conclude that an energy-efficient architecture that attains high energy efficiency requires both the number of the active sensors and ADC resolution to be low.

Before moving on, we emphasize that the superior performance of VI-SBL with active sensors over the baselines with passive reflecting elements is due to the availability of the additional information acquired at the active sensors. That is, the simulation results until now do not imply that VI-SBL proposed in this paper is algorithmically superior over the existing compressed sensing algorithms. The main contribution of the proposed VI-SBL is that it suggests a systematic way of how to jointly process the pilots received at the base station and active sensors effectively to yield good channel estimation accuracy/energy efficiency. This is in contrast to the recent works on IRS with semi-passive elements [12, 13] that only use the pilots received at the active sensors due to the complicated nature of jointly exploiting the pilots received at the base station and active sensors. The channel estimators in [12, 13] indeed require a dedicated channel estimator protocol involving both uplink and downlink pilots, which make them to be only applicable to limited scenarios.

Next, we investigate the complexities of various channel estimators. According to Table II, SBL and GEC-SR that perform an MgNgK×MgNgKM_{\mathrm{g}}N_{\mathrm{g}}K\times M_{\mathrm{g}}N_{\mathrm{g}}K matrix inversion have the highest complexities. In contrast, GAMP and VAMP have relatively low complexities that involve a matrix-vector multiplication of size MT×MgNgKMT\times M_{\mathrm{g}}N_{\mathrm{g}}K, while DS-OMP has the lowest complexity slightly larger than that of the conventional OMP algorithm [47]. Meanwhile, VI-SBL performs many small matrix inversions, whose complexity is controlled by S𝐟S_{\mathbf{f}}, S𝐠S_{\mathbf{g}}, and S𝐡S_{\mathbf{h}} as discussed earlier. The complexity of VI-SBL is demanding but not as SBL and GEC-SR. Therefore, we conclude that VI-SBL offers a significant performance gain at the expense of not-so-low complexity.

V Conclusion

A Bayesian channel estimator was proposed for IRS-aided mmWave massive MIMO systems with semi-passive elements. Unlike recent works on channel estimation with semi-passive elements that require both uplink and downlink signals to estimate the UE-IRS and IRS-BS links, the proposed channel estimator aimed to estimate all the links using only uplink training signals. To perform approximate posterior inference on the channel, the channel estimation problem was recast as an SBL framework. Then, SBL was solved using the variational free energy principle and mean-field approximation, from which the VI-SBL-based channel estimator was obtained. The simulation results showed that the proposed channel estimator requires low training overhead by taking advantage of the active sensors. Also, VI-SBL was capable of estimating all the links, which reduces the channel estimation overhead in the long run by replacing UE-IRS-BS link estimation with UE-IRS link estimation as the IRS-BS link is quasi-static in practice.

𝐀𝐠𝐟H𝐀𝐠𝐟\displaystyle\langle\mathbf{A}_{\mathbf{gf}}^{\mathrm{H}}\mathbf{A}_{\mathbf{gf}}\rangle =(𝐀IT(𝐒𝐀I𝐅𝐗)𝐀BH)((𝐒T𝐗T𝐅T𝐀IT)𝐀I𝐀B)\displaystyle\overset{\phantom{(a)}}{=}\langle(\mathbf{A}_{\mathrm{I}}^{\mathrm{T}}(\mathbf{S}^{*}\odot\mathbf{A}_{\mathrm{I}}^{*}\mathbf{F}^{*}\mathbf{X}^{*})\otimes\mathbf{A}_{\mathrm{B}}^{\mathrm{H}})((\mathbf{S}^{\mathrm{T}}\odot\mathbf{X}^{\mathrm{T}}\mathbf{F}^{\mathrm{T}}\mathbf{A}_{\mathrm{I}}^{\mathrm{T}})\mathbf{A}_{\mathrm{I}}^{*}\otimes\mathbf{A}_{\mathrm{B}})\rangle
=(a)𝐀IT(𝐒𝐀I𝐅𝐗)(𝐒T𝐗T𝐅T𝐀IT)𝐀I𝐀BH𝐀B\displaystyle\overset{(a)}{=}\langle\mathbf{A}_{\mathrm{I}}^{\mathrm{T}}(\mathbf{S}^{*}\odot\mathbf{A}_{\mathrm{I}}^{*}\mathbf{F}^{*}\mathbf{X}^{*})(\mathbf{S}^{\mathrm{T}}\odot\mathbf{X}^{\mathrm{T}}\mathbf{F}^{\mathrm{T}}\mathbf{A}_{\mathrm{I}}^{\mathrm{T}})\mathbf{A}_{\mathrm{I}}^{*}\otimes\mathbf{A}_{\mathrm{B}}^{\mathrm{H}}\mathbf{A}_{\mathrm{B}}\rangle
=𝐀IT(𝐒𝐀I𝐅𝐗)(𝐒T𝐗T𝐅T𝐀IT)𝐀I𝐀BH𝐀B\displaystyle\overset{\phantom{(a)}}{=}\mathbf{A}_{\mathrm{I}}^{\mathrm{T}}\langle(\mathbf{S}^{*}\odot\mathbf{A}_{\mathrm{I}}^{*}\mathbf{F}^{*}\mathbf{X}^{*})(\mathbf{S}^{\mathrm{T}}\odot\mathbf{X}^{\mathrm{T}}\mathbf{F}^{\mathrm{T}}\mathbf{A}_{\mathrm{I}}^{\mathrm{T}})\rangle\mathbf{A}_{\mathrm{I}}^{*}\otimes\mathbf{A}_{\mathrm{B}}^{\mathrm{H}}\mathbf{A}_{\mathrm{B}} (62)

Appendix A Derivation of 𝐀𝐟𝐠H𝐀𝐟𝐠\langle\mathbf{A}_{\mathbf{fg}}^{\mathrm{H}}\mathbf{A}_{\mathbf{fg}}\rangle for Updating q(𝐟)q(\mathbf{f})

As a preliminary, define 𝐁𝐟\mathbf{B}_{\mathbf{f}} from

𝐀𝐟𝐠=(𝐈T𝐀B𝐆𝐀IH)diag(vec(𝐒))(𝐗T𝐀I)=𝐁𝐟\mathbf{A}_{\mathbf{fg}}=(\mathbf{I}_{T}\otimes\mathbf{A}_{\mathrm{B}}\mathbf{G}\mathbf{A}_{\mathrm{I}}^{\mathrm{H}})\underbrace{\mathrm{diag}(\mathrm{vec}(\mathbf{S}))(\mathbf{X}^{\mathrm{T}}\otimes\mathbf{A}_{\mathrm{I}})}_{=\mathbf{B}_{\mathbf{f}}}

in (32). Then, expanding 𝐀𝐟𝐠H𝐀𝐟𝐠\langle\mathbf{A}_{\mathbf{fg}}^{\mathrm{H}}\mathbf{A}_{\mathbf{fg}}\rangle leads to

𝐀𝐟𝐠H𝐀𝐟𝐠\displaystyle\langle\mathbf{A}_{\mathbf{fg}}^{\mathrm{H}}\mathbf{A}_{\mathbf{fg}}\rangle =𝐁𝐟H(𝐈T𝐀I𝐆H𝐀BH)(𝐈T𝐀B𝐆𝐀IH)𝐁𝐟\displaystyle\overset{\phantom{(a)}}{=}\langle\mathbf{B}_{\mathbf{f}}^{\mathrm{H}}(\mathbf{I}_{\mathrm{T}}\otimes\mathbf{A}_{\mathrm{I}}\mathbf{G}^{\mathrm{H}}\mathbf{A}_{\mathrm{B}}^{\mathrm{H}})(\mathbf{I}_{T}\otimes\mathbf{A}_{\mathrm{B}}\mathbf{G}\mathbf{A}_{\mathrm{I}}^{\mathrm{H}})\mathbf{B}_{\mathbf{f}}\rangle
=(a)𝐁𝐟H(𝐈T𝐀I𝐆H𝐀BH𝐀B𝐆𝐀IH)𝐁𝐟\displaystyle\overset{(a)}{=}\langle\mathbf{B}_{\mathbf{f}}^{\mathrm{H}}(\mathbf{I}_{T}\otimes\mathbf{A}_{\mathrm{I}}\mathbf{G}^{\mathrm{H}}\mathbf{A}_{\mathrm{B}}^{\mathrm{H}}\mathbf{A}_{\mathrm{B}}\mathbf{G}\mathbf{A}_{\mathrm{I}}^{\mathrm{H}})\mathbf{B}_{\mathbf{f}}\rangle
=𝐁𝐟H(𝐈T𝐀I𝐆H𝐀BH𝐀B𝐆𝐀IH)𝐁𝐟\displaystyle\overset{\phantom{(a)}}{=}\mathbf{B}_{\mathbf{f}}^{\mathrm{H}}(\mathbf{I}_{T}\otimes\mathbf{A}_{\mathrm{I}}\langle\mathbf{G}^{\mathrm{H}}\mathbf{A}_{\mathrm{B}}^{\mathrm{H}}\mathbf{A}_{\mathrm{B}}\mathbf{G}\rangle\mathbf{A}_{\mathrm{I}}^{\mathrm{H}})\mathbf{B}_{\mathbf{f}}

where the mixed-product property of the Kronecker product was used in (a). Meanwhile, the vectorized version of 𝐆H𝐀BH𝐀B𝐆\langle\mathbf{G}^{\mathrm{H}}\mathbf{A}_{\mathrm{B}}^{\mathrm{H}}\mathbf{A}_{\mathrm{B}}\mathbf{G}\rangle in the middle of the last equality can be expressed as

vec(𝐆H𝐀BH𝐀B𝐆)=𝐆T𝐆Hvec(𝐀BH𝐀B).\mathrm{vec}(\langle\mathbf{G}^{\mathrm{H}}\mathbf{A}_{\mathrm{B}}^{\mathrm{H}}\mathbf{A}_{\mathrm{B}}\mathbf{G}\rangle)=\langle\mathbf{G}^{\mathrm{T}}\otimes\mathbf{G}^{\mathrm{H}}\rangle\mathrm{vec}(\mathbf{A}_{\mathrm{B}}^{\mathrm{H}}\mathbf{A}_{\mathrm{B}}).

Now, let us focus on the conjugate transpose of 𝐆T𝐆H\langle\mathbf{G}^{\mathrm{T}}\otimes\mathbf{G}^{\mathrm{H}}\rangle, or equivalently 𝐆𝐆\langle\mathbf{G}^{*}\otimes\mathbf{G}\rangle, which contains Mg×NgM_{\mathrm{g}}\times N_{\mathrm{g}} submatrices of sizes Mg×NgM_{\mathrm{g}}\times N_{\mathrm{g}}. The (i,j)(i,j)-th submatrix of 𝐆𝐆\langle\mathbf{G}^{*}\otimes\mathbf{G}\rangle can be vectorized as

vec(gi+(j1)Mg𝐆)=[𝐂𝐠+𝐦𝐠𝐦𝐠H]:,i+(j1)Mg,\mathrm{vec}(\langle g_{i+(j-1)M_{\mathrm{g}}}^{*}\mathbf{G}\rangle)=[\mathbf{C}_{\mathbf{g}}+\mathbf{m}_{\mathbf{g}}\mathbf{m}_{\mathbf{g}}^{\mathrm{H}}]_{:,i+(j-1)M_{\mathrm{g}}},

which can be reshaped and assembled to compute 𝐀𝐟𝐠H𝐀𝐟𝐠\langle\mathbf{A}_{\mathbf{fg}}^{\mathrm{H}}\mathbf{A}_{\mathbf{fg}}\rangle.

Appendix B Derivation of 𝐀𝐠𝐟H𝐀𝐠𝐟\langle\mathbf{A}_{\mathbf{gf}}^{\mathrm{H}}\mathbf{A}_{\mathbf{gf}}\rangle for Updating q(𝐠)q(\mathbf{g})

First, we expand 𝐀𝐠𝐟H𝐀𝐠𝐟\langle\mathbf{A}_{\mathbf{gf}}^{\mathrm{H}}\mathbf{A}_{\mathbf{gf}}\rangle using (37) to obtain (62) at the top of the page where the mixed-product property of the Kronecker product was applied in (a). Now, let us focus on the transpose of (𝐒𝐀I𝐅𝐗)(𝐒T𝐗T𝐅T𝐀IT)\langle(\mathbf{S}^{*}\odot\mathbf{A}_{\mathrm{I}}^{*}\mathbf{F}^{*}\mathbf{X}^{*})(\mathbf{S}^{\mathrm{T}}\odot\mathbf{X}^{\mathrm{T}}\mathbf{F}^{\mathrm{T}}\mathbf{A}_{\mathrm{I}}^{\mathrm{T}})\rangle in the middle of the last line of (62). To proceed, we define 𝐁𝐠\mathbf{B}_{\mathbf{g}} from

vec(𝐒𝐀I𝐅𝐗)=diag(vec(𝐒))(𝐗T𝐀I)=𝐁𝐠𝐟.\mathrm{vec}(\mathbf{S}\odot\mathbf{A}_{\mathrm{I}}\mathbf{F}\mathbf{X})=\underbrace{\mathrm{diag}(\mathrm{vec}(\mathbf{S}))(\mathbf{X}^{\mathrm{T}}\otimes\mathbf{\mathbf{A}_{\mathrm{I}}})}_{=\mathbf{B}_{\mathbf{g}}}\mathbf{f}.

Then, the (i,j)(i,j)-th element of (𝐒𝐀I𝐅𝐗)(𝐒H𝐗H𝐅H𝐀IH)\langle(\mathbf{S}\odot\mathbf{A}_{\mathrm{I}}\mathbf{F}\mathbf{X})(\mathbf{S}^{\mathrm{H}}\odot\mathbf{X}^{\mathrm{H}}\mathbf{F}^{\mathrm{H}}\mathbf{A}_{\mathrm{I}}^{\mathrm{H}})\rangle is

[(𝐒𝐀I𝐅𝐗)(𝐒H𝐗H𝐅H𝐀IH)]i,j=\displaystyle[\langle(\mathbf{S}\odot\mathbf{A}_{\mathrm{I}}\mathbf{F}\mathbf{X})(\mathbf{S}^{\mathrm{H}}\odot\mathbf{X}^{\mathrm{H}}\mathbf{F}^{\mathrm{H}}\mathbf{A}_{\mathrm{I}}^{\mathrm{H}})\rangle]_{i,j}=
k=1T[𝐁𝐠𝐟𝐟H𝐁𝐠H]i+(k1)N,j+(k1)N=\displaystyle\sum_{k=1}^{T}[\mathbf{B}_{\mathbf{g}}\langle\mathbf{f}\mathbf{f}^{\mathrm{H}}\rangle\mathbf{B}_{\mathbf{g}}^{\mathrm{H}}]_{i+(k-1)N,j+(k-1)N}=
k=1T[𝐁𝐠(𝐂𝐟+𝐦𝐟𝐦𝐟H)𝐁𝐠H]i+(k1)N,j+(k1)N,\displaystyle\sum_{k=1}^{T}[\mathbf{B}_{\mathbf{g}}(\mathbf{C}_{\mathbf{f}}+\mathbf{m}_{\mathbf{f}}\mathbf{m}_{\mathbf{f}}^{\mathrm{H}})\mathbf{B}_{\mathbf{g}}^{\mathrm{H}}]_{i+(k-1)N,j+(k-1)N},

which can be assembled and rearranged to compute 𝐀𝐠𝐟H𝐀𝐠𝐟\langle\mathbf{A}_{\mathbf{gf}}^{\mathrm{H}}\mathbf{A}_{\mathbf{gf}}\rangle.

References

  • [1] X. Wang, L. Kong, F. Kong, F. Qiu, M. Xia, S. Arnon, and G. Chen, “Millimeter wave communication: A comprehensive survey,” IEEE Commun. Surveys Tuts., vol. 20, no. 3, pp. 1616–1653, 2018.
  • [2] Q. Wu and R. Zhang, “Towards smart and reconfigurable environment: Intelligent reflecting surface aided wireless network,” IEEE Commun. Mag., vol. 58, no. 1, pp. 106–112, 2020.
  • [3] Z. Wang, L. Liu, and S. Cui, “Channel estimation for intelligent reflecting surface assisted multiuser communications: Framework, algorithms, and analysis,” IEEE Trans. Wireless Commun., vol. 19, no. 10, pp. 6607–6620, 2020.
  • [4] C. You, B. Zheng, and R. Zhang, “Channel estimation and passive beamforming for intelligent reflecting surface: Discrete phase shift and progressive refinement,” IEEE J. Sel. Areas Commun., vol. 38, no. 11, pp. 2604–2620, 2020.
  • [5] B. Zheng, C. You, and R. Zhang, “Intelligent reflecting surface assisted multi-user OFDMA: Channel estimation and training design,” IEEE Trans. Wireless Commun., vol. 19, no. 12, pp. 8315–8329, 2020.
  • [6] H. Liu, X. Yuan, and Y.-J. A. Zhang, “Matrix-calibration-based cascaded channel estimation for reconfigurable intelligent surface assisted multiuser MIMO,” IEEE J. Sel. Areas Commun., vol. 38, no. 11, pp. 2621–2636, 2020.
  • [7] C. Hu, L. Dai, S. Han, and X. Wang, “Two-timescale channel estimation for reconfigurable intelligent surface aided wireless communications,” IEEE Trans. Commun., vol. 69, no. 11, pp. 7736–7747, 2021.
  • [8] J. He, H. Wymeersch, and M. Juntti, “Channel estimation for RIS-aided mmWave MIMO systems via atomic norm minimization,” IEEE Trans. Wireless Commun., vol. 20, no. 9, pp. 5786–5797, 2021.
  • [9] S. Kim, H. Lee, J. Cha, S.-J. Kim, J. Park, and J. Choi, “Practical channel estimation and phase shift design for intelligent reflecting surface empowered MIMO systems,” IEEE Trans. Wireless Commun., pp. 1–1, 2022.
  • [10] P. Wang, J. Fang, H. Duan, and H. Li, “Compressed channel estimation for intelligent reflecting surface-assisted millimeter wave systems,” IEEE Signal Process. Lett., vol. 27, pp. 905–909, 2020.
  • [11] X. Wei, D. Shen, and L. Dai, “Channel estimation for RIS assisted wireless communications—Part II: An improved solution based on double-structured sparsity,” IEEE Commun. Lett., vol. 25, no. 5, pp. 1403–1407, 2021.
  • [12] X. Hu, R. Zhang, and C. Zhong, “Semi-passive elements assisted channel estimation for intelligent reflecting surface-aided communications,” IEEE Trans. Wireless Commun., vol. 21, no. 2, pp. 1132–1142, 2022.
  • [13] Y. Jin, J. Zhang, X. Zhang, H. Xiao, B. Ai, and D. W. K. Ng, “Channel estimation for semi-passive reconfigurable intelligent surfaces with enhanced deep residual networks,” IEEE Trans. Veh. Technol., vol. 70, no. 10, pp. 11 083–11 088, 2021.
  • [14] M. E. Tipping, “Sparse Bayesian learning and the relevance vector machine,” Journal of Machine Learning Research, vol. 1, p. 211–244, 2001.
  • [15] D. Wipf and B. Rao, “Sparse Bayesian learning for basis selection,” IEEE Trans. Signal Process., vol. 52, no. 8, pp. 2153–2164, 2004.
  • [16] D. G. Tzikas, A. C. Likas, and N. P. Galatsanos, “The variational approximation for Bayesian inference,” IEEE Signal Process. Mag., vol. 25, no. 6, pp. 131–146, 2008.
  • [17] M. J. Wainwright and M. I. Jordan, “Graphical models, exponential families, and variational inference,” Foundations and Trends® in Machine Learning, vol. 1, no. 1–2, pp. 1–305, 2008.
  • [18] Q. Wu and R. Zhang, “Intelligent reflecting surface enhanced wireless network via joint active and passive beamforming,” IEEE Trans. Wireless Commun., vol. 18, no. 11, pp. 5394–5409, 2019.
  • [19] S. Abeywickrama, R. Zhang, Q. Wu, and C. Yuen, “Intelligent reflecting surface: Practical phase shift model and beamforming optimization,” IEEE Trans. Commun., vol. 68, no. 9, pp. 5849–5863, 2020.
  • [20] Q. Wu, S. Zhang, B. Zheng, C. You, and R. Zhang, “Intelligent reflecting surface-aided wireless communications: A tutorial,” IEEE Trans. Commun., vol. 69, no. 5, pp. 3313–3351, 2021.
  • [21] R. Méndez-Rial, C. Rusu, N. González-Prelcic, A. Alkhateeb, and R. W. Heath, “Hybrid MIMO architectures for millimeter wave communications: Phase shifters or switches?” IEEE Access, vol. 4, pp. 247–267, 2016.
  • [22] S. Park, A. Alkhateeb, and R. W. Heath, “Dynamic subarrays for hybrid precoding in wideband mmWave MIMO systems,” IEEE Trans. Wireless Commun., vol. 16, no. 5, pp. 2907–2920, 2017.
  • [23] M. R. Akdeniz, Y. Liu, M. K. Samimi, S. Sun, S. Rangan, T. S. Rappaport, and E. Erkip, “Millimeter wave channel modeling and cellular capacity evaluation,” IEEE J. Sel. Areas Commun., vol. 32, no. 6, pp. 1164–1179, 2014.
  • [24] A. Sayeed, “Deconstructing multiantenna fading channels,” IEEE Trans. Signal Process., vol. 50, no. 10, pp. 2563–2579, 2002.
  • [25] A. Alkhateeb, O. El Ayach, G. Leus, and R. W. Heath, “Channel estimation and hybrid precoding for millimeter wave cellular systems,” IEEE J. Sel. Topics Signal Process., vol. 8, no. 5, pp. 831–846, 2014.
  • [26] K. Venugopal, A. Alkhateeb, N. González Prelcic, and R. W. Heath, “Channel estimation for hybrid architecture-based wideband millimeter wave systems,” IEEE J. Sel. Areas Commun., vol. 35, no. 9, pp. 1996–2009, 2017.
  • [27] Y. Koren, R. Bell, and C. Volinsky, “Matrix factorization techniques for recommender systems,” Computer, vol. 42, no. 8, pp. 30–37, 2009.
  • [28] E. J. Candès and B. Recht, “Exact matrix completion via convex optimization,” Foundations of Computational Mathematics, vol. 9, no. 6, pp. 717–772, 2009.
  • [29] P. D. Hoff, A first course in Bayesian statistical methods.   Springer, 2009.
  • [30] J. Pearl, Probabilistic reasoning in intelligent systems: Networks of plausible inference.   Morgan Kaufmann, 1988.
  • [31] C.-K. Wen, C.-J. Wang, S. Jin, K.-K. Wong, and P. Ting, “Bayes-optimal joint channel-and-data estimation for massive MIMO with low-precision ADCs,” IEEE Trans. Signal Process., vol. 64, no. 10, pp. 2541–2556, 2016.
  • [32] Y. Ding, S.-E. Chiu, and B. D. Rao, “Bayesian channel estimation algorithms for massive MIMO systems with hybrid analog-digital processing and low-resolution ADCs,” IEEE J. Sel. Topics Signal Process., vol. 12, no. 3, pp. 499–513, 2018.
  • [33] X. Cheng, B. Xia, K. Xu, and S. Li, “Bayesian channel estimation and data detection in oversampled OFDM receiver with low-resolution ADC,” IEEE Trans. Wireless Commun., vol. 20, no. 9, pp. 5558–5571, 2021.
  • [34] S. Rangan, “Generalized approximate message passing for estimation with random linear mixing,” in 2011 IEEE Int. Symp. Inf. Theory (ISIT), 2011, pp. 2168–2172.
  • [35] P. Schniter, S. Rangan, and A. K. Fletcher, “Vector approximate message passing for the generalized linear model,” in 2016 50th Asilomar Conf. Signals, Syst. and Comput., 2016, pp. 1525–1529.
  • [36] H. He, C.-K. Wen, and S. Jin, “Generalized expectation consistent signal recovery for nonlinear measurements,” in 2017 IEEE Int. Symp. Inf. Theory (ISIT), 2017, pp. 2333–2337.
  • [37] J. Mo, P. Schniter, and R. W. Heath, “Channel estimation in broadband millimeter wave MIMO systems with few-bit ADCs,” IEEE Trans. Signal Process., vol. 66, no. 5, pp. 1141–1154, 2018.
  • [38] H. He, C.-K. Wen, and S. Jin, “Bayesian optimal data detector for hybrid mmWave MIMO-OFDM systems with low-resolution ADCs,” IEEE J. Sel. Topics Signal Process., vol. 12, no. 3, pp. 469–483, 2018.
  • [39] Report ITU-R M.2412-0, “Guidelines for evaluation of radio interface technologies for IMT-2020,” ITU-R, Tech. Rep., 2017.
  • [40] J. Max, “Quantizing for minimum distortion,” IRE Trans. Inf. Theory, vol. 6, no. 1, pp. 7–12, 1960.
  • [41] H. Masoumi and M. J. Emadi, “Performance analysis of cell-free massive MIMO system with limited fronthaul capacity and hardware impairments,” IEEE Trans. Wireless Commun., vol. 19, no. 2, pp. 1038–1053, 2020.
  • [42] B. Murmann, “ADC performance survey 1997-2021,” http://web.stanford.edu/ murmann/adcsurvey.html.
  • [43] L. N. Ribeiro, S. Schwarz, M. Rupp, and A. L. F. de Almeida, “Energy efficiency of mmWave massive MIMO precoding with low-resolution DACs,” IEEE J. Sel. Topics Signal Process., vol. 12, no. 2, pp. 298–312, 2018.
  • [44] H.-S. Lee and C. G. Sodini, “Analog-to-digital converters: Digitizing the analog world,” Proc. IEEE, vol. 96, no. 2, pp. 323–334, 2008.
  • [45] X. Yu, D. Xu, and R. Schober, “MISO wireless communication systems via intelligent reflecting surfaces,” in 2019 IEEE/CIC Int. Conf. Commun. China (ICCC), 2019, pp. 735–740.
  • [46] Y. Xiu, J. Zhao, E. Basar, M. D. Renzo, W. Sun, G. Gui, and N. Wei, “Uplink achievable rate maximization for reconfigurable intelligent surface aided millimeter wave systems with resolution-adaptive ADCs,” IEEE Wireless Commun. Lett., vol. 10, no. 8, pp. 1608–1612, 2021.
  • [47] J. A. Tropp and A. C. Gilbert, “Signal recovery from random measurements via orthogonal matching pursuit,” IEEE Trans. Inf. Theory, vol. 53, no. 12, pp. 4655–4666, 2007.