
Channel Estimation By Transmitting Pilots From Reconfigurable Intelligent Surface

Yanze Zhu, Yang Liu, Qingqing Wu, Changsheng You, and Qingjiang Shi    Y. Zhu and Y. Liu are with the School of Information and Communication Engineering, Dalian University of Technology, Dalian, China, email: [email protected], [email protected].   Q. Wu is with the Department of Electronic Engineering, Shanghai Jiao Tong University, 200240, China, email: [email protected].   C. You is with the Department of Electronic and Electrical Engineering, Southern University of Science and Technology, Shenzhen, China, email: [email protected].   Q. Shi is with the School of Software Engineering, Tongji University, Shanghai, China, and also with the Shenzhen Research Institute of Big Data, Shenzhen, China, email: [email protected].
Abstract

Reconfigurable intelligent surface (RIS) is a promising technology for future wireless communication systems. Channel estimation (CE) for the RIS is a critical yet challenging issue for its development. Most existing CE methods follow the so-called cascaded channel (CscdChn) estimation scheme, which treats the multiplicative two-hop RIS channels as one effective channel and measures it as a whole. This CscdChn training method suffers from severe double-fading attenuation loss, which significantly degrades the CE accuracy. In this paper, we propose a novel RIS-transmitting (RIS-TX) based CE scheme, which has lower pilot overhead than the CscdChn scheme and effectively overcomes the double-fading curse by incorporating only a single transmit radio frequency (RF)-chain into the RIS. We develop highly efficient gradient descent (GD) and penalty duality decomposition (PDD)-based solutions to the pilot design task of the RIS-TX CE scheme, which is a difficult quartic optimization problem. Our designed pilot signal outperforms the discrete Fourier transform (DFT) sequence, which is reported to be optimal for the CscdChn scheme. Besides, both theoretical analysis and numerical results demonstrate that our proposed RIS-TX scheme exhibits performance characteristics distinct from its CscdChn counterpart and yields superior accuracy when the RIS is not extremely large.

Index Terms:
Reconfigurable intelligent surface (RIS), channel estimation (CE), gradient descent (GD), penalty duality decomposition (PDD), pilot design.

I Introduction

A. Background

The emerging reconfigurable intelligent surface (RIS) [1], also known as intelligent reflecting surface (IRS) [2], has attracted surging interest from both academia and industry and is envisioned as a promising technology for next-generation wireless communication systems [3]. An RIS can boost network communication performance in various ways while consuming very little energy. The potential of RIS has been explored extensively in recent years, and many exciting applications can be found in [1]-[3] and the references therein.

Beyond its versatility, one of the most critical aspects of RIS technology is channel state information (CSI) acquisition. CSI is essential for configuring the RIS phase shifts, yet it is practically difficult to obtain because the RIS itself is incapable of transmitting, receiving, or processing signals. Therefore, a multitude of recent works are dedicated to exploring channel estimation (CE) techniques suitable for RIS, e.g., the references in [4]. Among them, most available methods belong to the so-called cascaded channel (CscdChn) estimation scheme, e.g., [5]-[13]. That is, the effective channel cascading the two-hop links by way of the RIS is measured as a whole during the training procedure. For instance, the work [5] proposes to turn on a single RIS element at a time and estimate each CscdChn one after another. In [6], the authors design the pilot sequence according to the minimum variance unbiased estimation rule. The authors of [7] decompose the CscdChn into rank-one subchannels and design pilot signals by adjusting the RIS phase shifts to minimize the mean square error (MSE) of the subchannel estimates. The paper [8] proposes both linear minimum mean square error (LMMSE)-based and deep learning-based pilot design methods to improve the CE precision of the CscdChn and shows that discrete Fourier transform (DFT) pilots can achieve nearly optimal MSE. The authors of [9] exploit the fact that the link between the base station (BS) and the RIS is common to all users’ CscdChns to reduce the overall training overhead of the multi-user system. The work [10] studies the RIS training method for orthogonal frequency division multiplexing (OFDM) systems by optimizing the training phase shifts and demonstrates that the DFT sequence can achieve the optimal performance. The authors of [11] propose a hierarchical training scheme that groups the RIS elements and performs CE progressively using discrete phase shifters. Besides, a line of research, e.g., [12] and [13], explores leveraging the hidden channel sparsity of high-frequency/large-antenna-array systems to reduce training overhead.

Although the CscdChn training method dominates the existing literature, a number of works explore active methods at the RIS side for channel training [14]-[18]. For example, the work [14] proposes a hybrid RIS structure that embeds sensors in the element array to measure channels. The authors of [15] install a single receive (RX) RF-chain in the RIS and utilize it to develop a compressive sensing (CS)-based channel recovery algorithm that exploits channel sparsity. In [16], active sensing elements are installed on the uniform planar element array of the RIS and arranged in an “L” shape to sense the channels. The latest works [17] and [18] consider a hybrid RIS architecture similar to that proposed in [14] and utilize tensor decomposition and atomic norm methods, respectively, to recover the sparse channels. Besides, some works investigate extracting the one-hop CSI of the BS-RIS or RIS-user links from the reflected pilot signals. For instance, the authors of [19] and [20] propose factor decomposition and matrix completion methods to achieve this goal. The work [21] exploits matrix calibration theory and channel sparsity to recover the separate RIS channels.

B. Motivation

As research on RIS technology deepens, some critical insights have been obtained very recently. In particular, as pointed out in [22], one predominant defect of RIS is the severe fading loss of the concatenated channels. This feature is termed the double-fading effect in [22] and derives from the multiplicative nature of the CscdChn attenuation. As analyzed in [22], the attenuation loss experienced by the two-hop channels going through the RIS is generally several orders of magnitude larger than that of the direct channel. This effect has also been substantiated by numerical experiments and real-field tests reported in the latest works [23] and [24]. At the same time, according to the Cramér-Rao lower bound (CRLB) analysis presented in [1] and [25], the MSE of CE is inversely proportional to the signal-to-noise ratio (SNR). Therefore, the double-fading loss severely weakens the received SNR of the pilot signals and hence bottlenecks the CE performance of the classical CscdChn training schemes, e.g., [5]-[13], as well as the one-hop CSI extraction methods relying on reflected pilot signals, e.g., [19]-[21].

Based on the above observations, a natural way to improve the CE precision is to break through the double-fading curse. In fact, a number of works have made efforts toward this end. For instance, the authors of [23] and [24] have proposed a novel active-RIS architecture that introduces amplifiers into the RIS elements to magnify the impinging signals. At the same time, the works [14]-[18] introduce a small number of RX RF-chains at the RIS side to perform signal reception locally.

Inspired by the above works, this paper proposes a novel RIS channel training scheme. Specifically, we equip the RIS with a single TX RF-chain [15], [26], as shown in Fig. 1. During the channel training period, the TX RF-chain is connected to all the phase shifters [26], which enables the RIS to broadcast pilot signals to the BS and all users. This novel scheme offers the following advantages:

  • This scheme can overcome the double-fading curse. In fact, only “one-hop” channels are estimated.

  • Only one TX RF-chain is required, which largely preserves the hardware efficiency of the RIS device.

  • The BS and all users conduct CE simultaneously. This makes the RIS channel training overhead independent of the number of users.

Fig. 1. CE based on the proposed transmit-able RIS.

C. Contributions

Motivated by the above observations, this paper proposes a novel RIS-transmitting (RIS-TX) channel training scheme as specified above. We conduct comprehensive research on this new scheme. Specifically, the contributions of this paper are as follows:

  • Firstly, we put forward a brand-new RIS-TX CE scheme. By incorporating one TX RF-chain into the RIS device, we propose that the RIS broadcast “pilot signals”, which are in fact implemented by the RIS elements’ predefined phase shifts, to the BS and all mobile users. The pilot overhead of this novel CE scheme is lower than that of the conventional CscdChn methods, e.g., [4]-[11], [19], [20], and it effectively overcomes the double-fading effect. To the best of the authors’ knowledge, a similar RIS-TX scheme has never been considered in the existing literature, e.g., [4]-[26].

  • Next, we study the associated pilot sequence design problem. Specifically, we aim to minimize the estimation MSE of the effective RIS channels of all users by designing the pilot sequences. This task turns out to be a nonconvex quartic optimization problem and is much more challenging than those in the existing relevant literature, e.g., [6]-[11]. To tackle this difficult problem, we develop a gradient descent (GD)-based solution, which exhibits fast convergence.

  • Besides, we also develop an alternative pilot design solution by exploiting the cutting-edge penalty duality decomposition (PDD) framework [27]. This PDD-based solution has all its block coordinates updated either in closed form or by solving a structured linear equation and consequently runs highly efficiently.

  • Furthermore, we theoretically analyze the performance of our RIS-TX training scheme. CRLB analysis uncovers the important fact that the number of RIS elements exerts opposite impacts on the RIS-TX scheme and its prevailing CscdChn counterpart. The former yields superior estimation accuracy when the RIS has moderate size (e.g., \leq 10000 elements), while the latter is more beneficial for a large-scale RIS. Asymptotic analysis reveals that the two competing CE schemes exhibit distinct scaling laws in power and RIS size. These insights provide valuable guidelines for selecting between the RIS-TX and CscdChn schemes in various application scenarios, which have never been discussed in the existing literature, e.g., [4]-[24].

  • Last but not least, extensive numerical results demonstrate the effectiveness of our proposed algorithms and the benefit of the novel RIS-TX training scheme. Very interestingly, the pilot sequences optimized by our GD/PDD-based algorithms outperform the DFT pilot sequence, which has been reported to be optimal for the CscdChn scheme in several works [6]-[8], [10], [11].

II System Model and Problem Formulation

In this section, we elaborate on the system setting and the channel training procedure, and then formulate the pilot design problem aimed at minimizing the CSI estimation errors.

A. System Setting

As illustrated in Fig. 1, we consider a narrowband time division duplex (TDD) cellular system which consists of a BS equipped with MBM_{\mathrm{B}} antennas, KK single-antenna users and a RIS containing MRM_{\mathrm{R}} reflecting elements. The sets of users and RIS elements are denoted by 𝒦{1,,K}\mathcal{K}\triangleq\{1,\cdots,K\} and {1,,MR}\mathcal{M}\triangleq\{1,\cdots,M_{\mathrm{R}}\}, respectively. Moreover, we denote 𝐆MB×MR\mathbf{G}\in\mathbb{C}^{M_{\mathrm{B}}\times M_{\mathrm{R}}} and 𝐡kMR,k𝒦\mathbf{h}_{k}\in\mathbb{C}^{M_{\mathrm{R}}},\;k\in\mathcal{K}, as the channel between the BS and the RIS and the one between the RIS and the kkth user, respectively. Note that our setting does not consider the direct channels connecting the BS and users for simplicity, which can be easily acquired by following the standard CE procedure for conventional cellular system without RIS [28].

As previously discussed, one RF-chain is built in the RIS device to empower it with transmission capability, as shown in Fig. 1. During the channel training period, every RIS element is switched to connecting the single RF-chain and adjusts the signal’s phase before emitting it over the air. Specifically, denote the RF-chain signal as a complex scalar αejθ0\sqrt{\alpha}e^{j\theta_{0}} with α\sqrt{\alpha} and θ0\theta_{0} representing its amplitude and phase, respectively. Therefore, the emitted signal from the RIS can be expressed as 𝝍[αej(θ0+θ1),,αej(θ0+θMR)]T\bm{\psi}\triangleq[\sqrt{\alpha}e^{j(\theta_{0}+\theta_{1})},\cdots,\sqrt{\alpha}e^{j(\theta_{0}+\theta_{M_{\mathrm{R}}})}]^{\mathrm{T}} with θi,i\theta_{i},\;i\in\mathcal{M}, being the phase shift yielded by the iith element. Since the pilot signal is predefined, we can just assume θ0=0\theta_{0}=0 without loss of generality.

B. Channel Estimation

Based on the above, we elaborate the proposed RIS-TX CSI acquisition procedure. The training period is composed of TT consecutive time slots and is assumed to be shorter than the channel’s coherence time. Within each time slot, the RIS broadcasts one pilot symbol, which is phase-modulated by the RIS elements, to the BS and all users. Specifically, denoting 𝒯{1,,T}\mathcal{T}\triangleq\{1,\cdots,T\}, we can express the transmitted signal in the ttth time slot as [7]-[11]

𝐱(t)=𝝍(t)=αtϕ(t),t𝒯,\mathbf{x}^{(t)}=\bm{\psi}^{(t)}=\sqrt{\alpha_{t}}\bm{\phi}^{(t)},\;t\in\mathcal{T}, (1)

where ϕ(t)[ejθ1(t),,ejθMR(t)]T\bm{\phi}^{(t)}\triangleq[e^{j\theta_{1}^{(t)}},\cdots,e^{j\theta_{M_{\mathrm{R}}}^{(t)}}]^{\mathrm{T}}.

After collecting all T pilot symbols from the RIS, the BS and all users simultaneously perform signal processing and obtain the estimates of their own channels to the RIS. This procedure will be detailed in the sequel.

In the ttth time interval, the BS receives the signal from the RIS which is given as

𝐲B(t)=𝐆𝝍(t)+𝐧B(t),t𝒯,\mathbf{y}_{\mathrm{B}}^{(t)}=\mathbf{G}\bm{\psi}^{(t)}+\mathbf{n}_{\mathrm{B}}^{(t)},\;t\in\mathcal{T}, (2)

where 𝐧B(t),t𝒯\mathbf{n}_{\mathrm{B}}^{(t)},\;t\in\mathcal{T}, represents the thermal noise at the BS and 𝐧B(t)𝒞𝒩(𝟎,𝚺B)\mathbf{n}_{\mathrm{B}}^{(t)}\sim\mathcal{CN}(\bm{0},\bm{\Sigma}_{\mathrm{B}}) with the positive definite matrix 𝚺B\bm{\Sigma}_{\mathrm{B}} denoting the covariance matrix of the noise which is assumed to be known [8], [9], [29].11Note the noise statistics generally vary slowly in a large timescale [30], which hence can be obtained at a low expense of time overhead. It is also reasonable to assume that the noise covariance matrix is invariant during the coherence time, which indicates that different 𝐧B(t)\mathbf{n}_{\mathrm{B}}^{(t)}’s have one same covariance matrix, i.e., 𝚺B\bm{\Sigma}_{\mathrm{B}}. Besides, we assume that 𝐧B(t)\mathbf{n}_{\mathrm{B}}^{(t)}’s with different tt are independent for simplicity [6]-[11]. To fully determine the channel 𝐆\mathbf{G}, the length of the pilot sequence TT should satisfy TMRT\geq M_{\mathrm{R}}. To reduce training’s overhead, we just assume that T=MRT=M_{\mathrm{R}}. By collecting all TT observations and stacking them in a column-by-column manner, the training data obtained by the BS can be represented as

𝐘B=𝐆𝚿+𝐍B,\mathbf{Y}_{\mathrm{B}}=\mathbf{G}\bm{\Psi}+\mathbf{N}_{\mathrm{B}}, (3)

where 𝐘B[𝐲B(1),,𝐲B(T)]\mathbf{Y}_{\mathrm{B}}\triangleq[\mathbf{y}_{\mathrm{B}}^{(1)},\cdots,\mathbf{y}_{\mathrm{B}}^{(T)}], 𝚿[𝝍(1),,𝝍(T)]\bm{\Psi}\triangleq[\bm{\psi}^{(1)},\cdots,\bm{\psi}^{(T)}], 𝐍B[𝐧B(1),,𝐧B(T)]\mathbf{N}_{\mathrm{B}}\triangleq[\mathbf{n}_{\mathrm{B}}^{(1)},\cdots,\mathbf{n}_{\mathrm{B}}^{(T)}]. Via vectorizing 𝐘B\mathbf{Y}_{\mathrm{B}} and utilizing the formula 𝗏𝖾𝖼(𝐀𝐁𝐂)=(𝐂T𝐀)𝗏𝖾𝖼(𝐁)\mathsf{vec}(\mathbf{ABC})=(\mathbf{C}^{\mathrm{T}}\otimes\mathbf{A})\mathsf{vec}(\mathbf{B}), we obtain

𝐲B=𝗏𝖾𝖼(𝐘B)=(𝚿T𝐈MB)𝐠+𝐧B=𝖠G(𝚿)𝐠+𝐧B,\mathbf{y}_{\mathrm{B}}=\mathsf{vec}(\mathbf{Y}_{\mathrm{B}})=(\bm{\Psi}^{\mathrm{T}}\otimes\mathbf{I}_{M_{\mathrm{B}}})\mathbf{g}+\mathbf{n}_{\mathrm{B}}=\mathsf{A}_{\mathrm{G}}(\bm{\Psi})\mathbf{g}+\mathbf{n}_{\mathrm{B}}, (4)

where 𝐠𝗏𝖾𝖼(𝐆)\mathbf{g}\triangleq\mathsf{vec}(\mathbf{G}), 𝐧B𝗏𝖾𝖼(𝐍B)\mathbf{n}_{\mathrm{B}}\triangleq\mathsf{vec}(\mathbf{N}_{\mathrm{B}}), 𝖠G(𝚿)𝚿T𝐈MB\mathsf{A}_{\mathrm{G}}(\bm{\Psi})\triangleq\bm{\Psi}^{\mathrm{T}}\otimes\mathbf{I}_{M_{\mathrm{B}}} and 𝐈MB\mathbf{I}_{M_{\mathrm{B}}} is MB×MBM_{\mathrm{B}}\times M_{\mathrm{B}} identity matrix.
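As a quick numerical sanity check of the stacking step leading to (4), the following Python sketch (with arbitrary example dimensions, not tied to any specific setup of this paper) verifies the identity vec(G Psi) = (Psi^T ⊗ I_{M_B}) vec(G):

```python
import numpy as np

# Minimal check of vec(G @ Psi) = (Psi^T kron I_MB) @ vec(G); all sizes are illustrative.
MB, MR = 4, 8
T = MR                                     # pilot length equals the RIS size, as assumed above
rng = np.random.default_rng(0)
G = rng.standard_normal((MB, MR)) + 1j * rng.standard_normal((MB, MR))
Psi = rng.standard_normal((MR, T)) + 1j * rng.standard_normal((MR, T))

Y = G @ Psi                                # noiseless version of (3)
vec_Y = Y.reshape(-1, order="F")           # column-wise vectorization
A_G = np.kron(Psi.T, np.eye(MB))           # A_G(Psi) = Psi^T kron I_MB
assert np.allclose(vec_Y, A_G @ G.reshape(-1, order="F"))
```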

Similarly, the compact form of the signals observed by the kkth user during the whole training period can be expressed as

𝐲U,k=𝚿T𝐡k+𝐧U,k,k𝒦,\mathbf{y}_{\mathrm{U},k}=\bm{\Psi}^{\mathrm{T}}\mathbf{h}_{k}+\mathbf{n}_{\mathrm{U},k},\;k\in\mathcal{K}, (5)

where 𝐧U,k[nU,k(1),,nU,k(T)]T\mathbf{n}_{\mathrm{U},k}\triangleq[n_{\mathrm{U},k}^{(1)},\cdots,n_{\mathrm{U},k}^{(T)}]^{\mathrm{T}} and nU,k(t),k𝒦,t𝒯n_{\mathrm{U},k}^{(t)},\;k\in\mathcal{K},\;t\in\mathcal{T}, indicates the thermal noise at the kkth user in the ttth interval with nU,k(t)𝒞𝒩(0,σU,k2)n_{\mathrm{U},k}^{(t)}\sim\mathcal{CN}(0,\sigma_{\mathrm{U},k}^{2}). For simplicity, it is assumed that each nU,k(t)n_{\mathrm{U},k}^{(t)} is independent to others for different users and time slots.

In this paper, we assume that the channels 𝐠\mathbf{g} and {𝐡k}k=1K\{\mathbf{h}_{k}\}_{k=1}^{K} have zero mean and known covariance matrices. Specifically, we denote 𝐑G\mathbf{R}_{\mathrm{G}} and 𝐑h,k,k𝒦\mathbf{R}_{\mathrm{h},k},\;k\in\mathcal{K}, as covariance matrices of 𝐠\mathbf{g} and 𝐡k,k𝒦\mathbf{h}_{k},\;k\in\mathcal{K}, respectively. In reality, channel statistics generally vary slowly and can be easily obtained [29]. For instance, the covariance matrix can be empirically estimated at low overhead [31], [32]. Besides, the zero mean assumption can also be readily satisfied via subtracting their nonzero mean values. Besides, it is reasonable to assume that the channels associated with the BS and users are mutually uncorrelated. With the linear receivers 𝐖G\mathbf{W}_{\mathrm{G}} at the BS and 𝐖h,k,k𝒦\mathbf{W}_{\mathrm{h},k},\;k\in\mathcal{K}, at the kkth user, the BS/every user obtains the linear estimate of their own channels given as follows

𝐠^=𝐖GH𝐲B=𝐖GH𝖠G(𝚿)𝐠+𝐖GH𝐧B,\displaystyle\hat{\mathbf{g}}=\mathbf{W}_{\mathrm{G}}^{\mathrm{H}}\mathbf{y}_{\mathrm{B}}=\mathbf{W}_{\mathrm{G}}^{\mathrm{H}}\mathsf{A}_{\mathrm{G}}(\bm{\Psi})\mathbf{g}+\mathbf{W}_{\mathrm{G}}^{\mathrm{H}}\mathbf{n}_{\mathrm{B}}, (6)
𝐡^k=𝐖h,kH𝐲U,k=𝐖h,kH𝚿T𝐡k+𝐖h,kH𝐧U,k,k𝒦,\displaystyle\hat{\mathbf{h}}_{k}=\mathbf{W}_{\mathrm{h},k}^{\mathrm{H}}\mathbf{y}_{\mathrm{U},k}=\mathbf{W}_{\mathrm{h},k}^{\mathrm{H}}\bm{\Psi}^{\mathrm{T}}\mathbf{h}_{k}+\mathbf{W}_{\mathrm{h},k}^{\mathrm{H}}\mathbf{n}_{\mathrm{U},k},\;k\in\mathcal{K}, (7)

whose covariance matrices are given as follows after some manipulations

𝐂G^=𝐖GH𝖠G(𝚿)𝐑G𝖠GH(𝚿)𝐖G+𝐖GH(𝚺B𝐈MR)𝐖G,\displaystyle\mathbf{C}_{\hat{\mathrm{G}}}\!=\!\mathbf{W}_{\mathrm{G}}^{\mathrm{H}}\mathsf{A}_{\mathrm{G}}(\bm{\Psi})\mathbf{R}_{\mathrm{G}}\mathsf{A}_{\mathrm{G}}^{\mathrm{H}}(\bm{\Psi})\mathbf{W}_{\mathrm{G}}\!+\!\mathbf{W}_{\mathrm{G}}^{\mathrm{H}}(\bm{\Sigma}_{\mathrm{B}}\!\!\otimes\!\!\mathbf{I}_{M_{\mathrm{R}}})\mathbf{W}_{\mathrm{G}}, (8)
𝐂h^,k=𝐖h,kH𝚿T𝐑h,k𝚿𝐖h,k+𝐖h,kH𝚺U,k𝐖h,k,k𝒦,\displaystyle\mathbf{C}_{\hat{\mathrm{h}},k}\!=\!\mathbf{W}_{\mathrm{h},k}^{\mathrm{H}}\bm{\Psi}^{\mathrm{T}}\mathbf{R}_{\mathrm{h},k}\bm{\Psi}^{\mathrm{*}}\mathbf{W}_{\mathrm{h},k}\!+\!\mathbf{W}_{\mathrm{h},k}^{\mathrm{H}}\bm{\Sigma}_{\mathrm{U},k}\mathbf{W}_{\mathrm{h},k},k\!\in\!\mathcal{K}, (9)

where 𝚺U,k𝖣𝗂𝖺𝗀(σU,k2,,σU,k2),k𝒦\bm{\Sigma}_{\mathrm{U},k}\triangleq\mathsf{Diag}(\sigma_{\mathrm{U},k}^{2},\cdots,\sigma_{\mathrm{U},k}^{2}),\;k\in\mathcal{K}.
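To illustrate (9), the following Monte Carlo sketch (with small, randomly generated covariance, pilot, and receiver matrices that are placeholders rather than quantities from this paper) compares the sample covariance of the linear estimate in (7) against the closed-form expression:

```python
import numpy as np

# Monte-Carlo check of the estimate covariance (9) for one user; all values are illustrative.
rng = np.random.default_rng(1)
MR, N, sigma2 = 4, 200_000, 0.1
A = rng.standard_normal((MR, MR)) + 1j * rng.standard_normal((MR, MR))
R_h = A @ A.conj().T / MR                                  # example PSD channel covariance
Psi = np.exp(1j * rng.uniform(0, 2 * np.pi, (MR, MR)))     # unit-modulus pilot matrix
W = rng.standard_normal((MR, MR)) + 1j * rng.standard_normal((MR, MR))  # arbitrary receiver

Lc = np.linalg.cholesky(R_h)
h = Lc @ (rng.standard_normal((MR, N)) + 1j * rng.standard_normal((MR, N))) / np.sqrt(2)
n = np.sqrt(sigma2 / 2) * (rng.standard_normal((MR, N)) + 1j * rng.standard_normal((MR, N)))
h_hat = W.conj().T @ (Psi.T @ h + n)                       # linear estimate as in (7)

C_mc = h_hat @ h_hat.conj().T / N                          # sample covariance
C_th = W.conj().T @ Psi.T @ R_h @ Psi.conj() @ W + sigma2 * (W.conj().T @ W)
print(np.max(np.abs(C_mc - C_th)))                         # small for large N
```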

C. Pilot Design Problem

After performing CE, mobile users feedback their local estimates to the BS. The effective CscdChn by way of the RIS for each user is defined by 𝐇c,k𝐆𝖣𝗂𝖺𝗀(𝐡k),k𝒦\mathbf{H}_{\mathrm{c},k}\triangleq\mathbf{G}\mathsf{Diag}(\mathbf{h}_{k}),\;k\in\mathcal{K}, and its estimate is given as 𝐇^c,k𝐆^𝖣𝗂𝖺𝗀(𝐡^k),k𝒦\hat{\mathbf{H}}_{\mathrm{c},k}\triangleq\hat{\mathbf{G}}\mathsf{Diag}(\hat{\mathbf{h}}_{k}),\;k\in\mathcal{K}. Based on {𝐇^c,k}k=1K\{\hat{\mathbf{H}}_{\mathrm{c},k}\}_{k=1}^{K}, the BS will determine the scheduling and beamforming for all users. Therefore, it is critical to minimize the estimation error between the estimate {𝐇^c,k}k=1K\{\hat{\mathbf{H}}_{\mathrm{c},k}\}_{k=1}^{K} and their true values. To evaluate the estimation performance, the MSE matrix of 𝐇^c,k,k𝒦\hat{\mathbf{H}}_{\mathrm{c},k},\;k\in\mathcal{K}, is defined as 𝐂Hc,k𝔼{𝗏𝖾𝖼(𝐇c,k𝐇^c,k)𝗏𝖾𝖼H(𝐇c,k𝐇^c,k)},k𝒦\mathbf{C}_{\mathrm{H}_{\mathrm{c}},k}\triangleq\mathbb{E}\{\mathsf{vec}(\mathbf{H}_{\mathrm{c},k}-\hat{\mathbf{H}}_{\mathrm{c},k})\mathsf{vec}^{\mathrm{H}}(\mathbf{H}_{\mathrm{c},k}-\hat{\mathbf{H}}_{\mathrm{c},k})\},\;k\in\mathcal{K}, and its value is determined by the following theorem, which is derived in Appendix A.

Theorem 1.

The value of 𝐂Hc,k𝔼{𝗏𝖾𝖼(𝐇c,k𝐇^c,k)𝗏𝖾𝖼H(𝐇c,k𝐇^c,k)},k𝒦\mathbf{C}_{\mathrm{H}_{\mathrm{c}},k}\triangleq\mathbb{E}\{\mathsf{vec}(\mathbf{H}_{\mathrm{c},k}-\hat{\mathbf{H}}_{\mathrm{c},k})\mathsf{vec}^{\mathrm{H}}(\mathbf{H}_{\mathrm{c},k}-\hat{\mathbf{H}}_{\mathrm{c},k})\},\;k\in\mathcal{K}, is given by

𝐂Hc,k=𝐑G(𝐑h,k𝟙MB)+𝐂G^(𝐂h^,k𝟙MB)\displaystyle\mathbf{C}_{\mathrm{H}_{\mathrm{c}},k}=\mathbf{R}_{\mathrm{G}}\odot(\mathbf{R}_{\mathrm{h},k}\otimes\mathbbm{1}_{M_{\mathrm{B}}})+\mathbf{C}_{\hat{\mathrm{G}}}\odot(\mathbf{C}_{\hat{\mathrm{h}},k}\otimes\mathbbm{1}_{M_{\mathrm{B}}})
(𝐑G𝖠GH(𝚿)𝐖G)((𝐑h,k𝚿𝐖h,k)𝟙MB)\displaystyle-(\mathbf{R}_{\mathrm{G}}\mathsf{A}_{\mathrm{G}}^{\mathrm{H}}(\bm{\Psi})\mathbf{W}_{\mathrm{G}})\odot((\mathbf{R}_{\mathrm{h},k}\bm{\Psi}^{\mathrm{*}}\mathbf{W}_{\mathrm{h},k})\otimes\mathbbm{1}_{M_{\mathrm{B}}})
((𝐑G𝖠GH(𝚿)𝐖G)((𝐑h,k𝚿𝐖h,k)𝟙MB))H,\displaystyle-((\mathbf{R}_{\mathrm{G}}\mathsf{A}_{\mathrm{G}}^{\mathrm{H}}(\bm{\Psi})\mathbf{W}_{\mathrm{G}})\odot((\mathbf{R}_{\mathrm{h},k}\bm{\Psi}^{\mathrm{*}}\mathbf{W}_{\mathrm{h},k})\otimes\mathbbm{1}_{M_{\mathrm{B}}}))^{\mathrm{H}}, (10)

where 𝟙MB\mathbbm{1}_{M_{\mathrm{B}}} denotes the MB×MBM_{\mathrm{B}}\times M_{\mathrm{B}} all-ones matrix and \odot indicates the Hadamard product.

Hence, the estimation MSE of the effective channel 𝐇^c,k,k𝒦\hat{\mathbf{H}}_{\mathrm{c},k},\;k\in\mathcal{K}, is given as 𝖳𝗋{𝐂Hc,k},k𝒦\mathsf{Tr}\{\mathbf{C}_{\mathrm{H}_{\mathrm{c},k}}\},\;k\in\mathcal{K}, which is actually a function of the pilot sequence 𝚿\bm{\Psi}. In the subsequent exposition, we define 𝚿=𝚽𝐏\bm{\Psi}=\bm{\Phi}\mathbf{P} with 𝚽[ϕ(1),,ϕ(T)]T\bm{\Phi}\triangleq[\bm{\phi}^{(1)},\cdots,\bm{\phi}^{(T)}]^{\mathrm{T}} and 𝐏𝖣𝗂𝖺𝗀(α1,,αT)\mathbf{P}\triangleq\mathsf{Diag}(\sqrt{\alpha_{1}},\cdots,\sqrt{\alpha_{T}}). Besides, we denote 𝓦h{𝐖h,k}k=1K\bm{\mathcal{W}}_{\mathrm{h}}\triangleq\{\mathbf{W}_{\mathrm{h},k}\}_{k=1}^{K}.

Based on the above exposition, we aim at minimizing the overall MSE of all users’ effective channels {𝐇^c,k}k=1K\{\hat{\mathbf{H}}_{\mathrm{c},k}\}_{k=1}^{K} via designing the pilot sequence and the linear receivers (𝚽,𝐏,𝐖G,𝓦h)(\bm{\Phi},\mathbf{P},\mathbf{W}_{\mathrm{G}},\bm{\mathcal{W}}_{\mathrm{h}}). This task is formulated as the following problem

(𝒫1):min𝚽,𝐏,𝐖G,𝓦h\displaystyle(\mathcal{P}1):\min_{\bm{\Phi},\mathbf{P},\mathbf{W}_{\mathrm{G}},\bm{\mathcal{W}}_{\mathrm{h}}}\; k=1K𝖳𝗋{𝐂Hc,k(𝚽,𝐏,𝐖G,𝓦h)}K\displaystyle\sum_{k=1}^{K}\frac{\mathsf{Tr}\{\mathbf{C}_{\mathrm{H}_{\mathrm{c},k}}(\bm{\Phi},\mathbf{P},\mathbf{W}_{\mathrm{G}},\bm{\mathcal{W}}_{\mathrm{h}})\}}{K} (11)
s.t.\displaystyle\mathrm{s.t.}\; 𝚽𝐏F2Pmax,\displaystyle\|\bm{\Phi}\mathbf{P}\|_{\mathrm{F}}^{2}\leq P_{\mathrm{max}}, (11a)
|[𝚽]i,j|=1,i,j,\displaystyle|[\bm{\Phi}]_{i,j}|=1,\;i,j\in\mathcal{M}, (11b)
αt0,t𝒯,\displaystyle\alpha_{t}\geq 0,\;t\in\mathcal{T}, (11c)

where PmaxP_{\mathrm{max}} in (11a) represents the overall transmission power budget for training. The problem (𝒫1)(\mathcal{P}1) is highly challenging due to its nonconvex objective. Especially, it is a nonconvex quartic function in (𝚽,𝐏)(\bm{\Phi},\mathbf{P}). In the following, we propose two algorithms to deal with (𝒫1)(\mathcal{P}1).

III Pilot Sequence Design

In this section, we develop efficient solutions to attack the pilot design problem (𝒫1)(\mathcal{P}1).

A. GD-Based Solution to (𝒫1)(\mathcal{P}1)

In the following, we resolve (𝒫1)(\mathcal{P}1) in a block coordinate descent (BCD) [33] manner.

1) Optimize the phase shift matrix 𝚽\bm{\Phi}

Fixing other variables, the optimization w.r.t. 𝚽\bm{\Phi} is given by

(𝒫2):min𝚽\displaystyle(\mathcal{P}2):\min_{\bm{\Phi}}\; k=1K𝖳𝗋{𝐂Hc,k(𝚽|𝐏,𝐖G,𝓦h)}\displaystyle{\sum}_{k=1}^{K}\mathsf{Tr}\{\mathbf{C}_{\mathrm{H}_{\mathrm{c},k}}(\bm{\Phi}|\mathbf{P},\mathbf{W}_{\mathrm{G}},\bm{\mathcal{W}}_{\mathrm{h}})\} (12)
s.t.\displaystyle\mathrm{s.t.}\; |[𝚽]i,j|=1,i,j,\displaystyle|[\bm{\Phi}]_{i,j}|=1,\;i,j\in\mathcal{M}, (12a)

which is non-convex due to the equality constraint (12a) and challenging to solve. Hence, we turn to an equivalent reformulation of (𝒫2)(\mathcal{P}2), which can be expressed as

(𝒫3):min𝚯\displaystyle(\mathcal{P}3):\min_{\bm{\Theta}}\; k=1K𝖳𝗋{𝐂Hc,k(𝚯|𝐏,𝐖G,𝓦h)}\displaystyle{\sum}_{k=1}^{K}\mathsf{Tr}\{\mathbf{C}_{\mathrm{H}_{\mathrm{c},k}}(\bm{\Theta}|\mathbf{P},\mathbf{W}_{\mathrm{G}},\bm{\mathcal{W}}_{\mathrm{h}})\} (13)
s.t.\displaystyle\mathrm{s.t.}\; [𝚯]i,j[0,2π],i,j,\displaystyle[\bm{\Theta}]_{i,j}\in[0,2\pi],\;i,j\in\mathcal{M}, (13a)

where 𝚯(𝚽)\bm{\Theta}\triangleq\angle(\bm{\Phi}) and (𝚽)\angle(\bm{\Phi}) takes the value of phases of 𝚽\bm{\Phi}’s entries in an element-wise manner. By noting that ej(θ+2nπ)=ejθ,ne^{j(\theta+2n\pi)}=e^{j\theta},\;n\in\mathbb{Z}, problem (𝒫3)(\mathcal{P}3) can be viewed as an unconstrained optimization problem w.r.t. 𝚯\bm{\Theta}. Therefore, we adopt GD algorithm to tackle (𝒫3)(\mathcal{P}3). Denote 𝗀(𝚯)\mathsf{g}(\bm{\Theta}) as the objective of (𝒫3)(\mathcal{P}3). Then, the gradient descent of 𝚯\bm{\Theta} is conducted as

𝚯(n+1):=𝚯(n)η𝚯(n)𝚯𝗀(𝚯)|𝚯=𝚯(n),\bm{\Theta}^{(n+1)}:=\bm{\Theta}^{(n)}-\eta_{\bm{\Theta}}^{(n)}\nabla_{\bm{\Theta}}\mathsf{g}(\bm{\Theta})|_{\bm{\Theta}=\bm{\Theta}^{(n)}}, (14)

where η𝚯(n)\eta_{\bm{\Theta}}^{(n)} is the step size for updating 𝚯\bm{\Theta}. Generally, constant or diminishing step size rule can yield satisfactory convergence [34]. We relegate the derivation of 𝚯𝗀(𝚯)\nabla_{\bm{\Theta}}\mathsf{g}(\bm{\Theta}) to Appendix B. Lastly, 𝚽(n+1)\bm{\Phi}^{(n+1)} is given by 𝚽(n+1)=ej𝚯(n+1)\bm{\Phi}^{(n+1)}=e^{j\bm{\Theta}^{(n+1)}}.
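A minimal sketch of the update (14) is given below, assuming a user-supplied callable grad_fn that returns the real-valued gradient derived in Appendix B (not reproduced here) and a constant step size; it is a skeleton rather than a complete implementation:

```python
import numpy as np

def gd_phase_update(Theta, grad_fn, step=1e-2, iters=200):
    """Unconstrained gradient descent on the phase matrix Theta, cf. (14).
    grad_fn(Theta) is a placeholder for the Appendix-B gradient of the MSE objective."""
    for _ in range(iters):
        Theta = Theta - step * grad_fn(Theta)   # plain GD step with a constant step size
    Phi = np.exp(1j * Theta)                    # map back to the unit-modulus pilot matrix
    return Theta, Phi
```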

2) Optimize the power allocation 𝐏\mathbf{P}

We proceed to study the optimization of power allocation 𝐏\mathbf{P} when other variables are given. Notice again 𝐏𝖣𝗂𝖺𝗀(α1,,αT)\mathbf{P}\triangleq\mathsf{Diag}(\sqrt{\alpha_{1}},\cdots,\sqrt{\alpha_{T}}). Therefore, by defining 𝐩𝖽𝗂𝖺𝗀(𝐏)\mathbf{p}\triangleq\mathsf{diag}(\mathbf{P}), the sub-problem to optimize power allocation can be expressed as

(𝒫4):min𝐩T\displaystyle(\mathcal{P}4):\min_{\mathbf{p}\in\mathbb{R}^{T}}\; k=1K𝖳𝗋{𝐂Hc,k(𝐩|𝚽,𝐖G,𝓦h)}\displaystyle{\sum}_{k=1}^{K}\mathsf{Tr}\{\mathbf{C}_{\mathrm{H}_{\mathrm{c},k}}(\mathbf{p}|\bm{\Phi},\mathbf{W}_{\mathrm{G}},\bm{\mathcal{W}}_{\mathrm{h}})\} (15)
s.t.\displaystyle\mathrm{s.t.}\; 𝐩T𝐩PmaxMR,\displaystyle\mathbf{p}^{\mathrm{T}}\mathbf{p}\leq\frac{P_{\mathrm{max}}}{M_{\mathrm{R}}}, (15a)
𝐩𝟎.\displaystyle\mathbf{p}\geq\bm{0}. (15b)

The objective of (𝒫4)(\mathcal{P}4) is non-convex, and we still adopt a GD-like method to attack it. Note that the plain GD algorithm applies only to unconstrained optimization problems. For the constrained problem (𝒫4)(\mathcal{P}4), we adopt the gradient projection (GP) method [34], an iterative procedure that performs two steps in each iteration: i) gradient descent and ii) projection onto the feasible domain. Denoting the objective of (𝒫4)(\mathcal{P}4) as 𝗁(𝐩)\mathsf{h}(\mathbf{p}), we elaborate these two steps in the following:

  • Gradient Descent: The variable 𝐩\mathbf{p} moves in the direction of negative derivative, i.e.

    𝐩(n+1):=𝐩(n)ηp(n)𝐩𝗁(𝐩)|𝐩=𝐩(n),\mathbf{p}^{(n+1)}:=\mathbf{p}^{(n)}-\eta_{\mathrm{p}}^{(n)}\nabla_{\mathbf{p}}\mathsf{h}(\mathbf{p})|_{\mathbf{p}=\mathbf{p}^{(n)}}, (16)

    where ηp(n)\eta_{\mathrm{p}}^{(n)} is the step size for updating power allocation, which can be set by following the step size rules presented in [34]. The calculation of 𝐩𝗁(𝐩)\nabla_{\mathbf{p}}\mathsf{h}(\mathbf{p}) is shown in Appendix C.

  • Projection: After the above update of 𝐩\mathbf{p}, the iterate 𝐩(n+1)\mathbf{p}^{(n+1)} may move out of the feasible domain identified by (15a) and (15b). Hence, we need to project 𝐩(n+1)\mathbf{p}^{(n+1)} onto the convex feasible domain of (𝒫4)(\mathcal{P}4) [34], which amounts to solving the following optimization problem

    (𝒫5):min𝐱T\displaystyle(\mathcal{P}5):\min_{\mathbf{x}\in\mathbb{R}^{T}}\; 12𝐱𝐩(n+1)22\displaystyle\frac{1}{2}\|\mathbf{x}-\mathbf{p}^{(n+1)}\|_{2}^{2} (17)
    s.t.\displaystyle\mathrm{s.t.}\; 𝐱T𝐱PmaxMR,\displaystyle\mathbf{x}^{\mathrm{T}}\mathbf{x}\leq\frac{P_{\mathrm{max}}}{M_{\mathrm{R}}}, (17a)
    𝐱𝟎.\displaystyle\mathbf{x}\geq\bm{0}. (17b)

    The optimal solution to (𝒫5)(\mathcal{P}5) is given by the following Theorem 2, whose proof is left to Appendix D.

    Theorem 2.

    If [𝐩(n+1)]+22PmaxMR\|[\mathbf{p}^{(n+1)}]_{+}\|_{2}^{2}\leq\frac{P_{\mathrm{max}}}{M_{\mathrm{R}}}, we have

    𝐱=[𝐩(n+1)]+,\mathbf{x}^{\star}=[\mathbf{p}^{(n+1)}]_{+}, (18)

    otherwise,

    𝐱=PmaxMR[𝐩(n+1)]+[𝐩(n+1)]+2,\mathbf{x}^{\star}=\sqrt{\frac{P_{\mathrm{max}}}{M_{\mathrm{R}}}}\frac{[\mathbf{p}^{(n+1)}]_{+}}{\|[\mathbf{p}^{(n+1)}]_{+}\|_{2}}, (19)

    where [𝐩(n+1)]+max{𝐩(n+1),𝟎}[\mathbf{p}^{(n+1)}]_{+}\triangleq\max\{\mathbf{p}^{(n+1)},\bm{0}\} in an element-wise manner.
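In code, Theorem 2 amounts to a clip-and-rescale operation; a short sketch with illustrative numbers is given below:

```python
import numpy as np

def project_power(p, Pmax, MR):
    """Projection onto {x >= 0, x^T x <= Pmax/MR}, following Theorem 2."""
    x = np.maximum(p, 0.0)                  # clip to the nonnegative orthant
    radius2 = Pmax / MR
    if x @ x > radius2:                     # rescale onto the ball only if infeasible
        x = np.sqrt(radius2) * x / np.linalg.norm(x)
    return x

# Illustrative usage with arbitrary numbers
p_new = np.array([0.8, -0.1, 0.5, 0.3])
print(project_power(p_new, Pmax=1.0, MR=4))
```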

3) Optimize the linear receiver at the BS 𝐖G\mathbf{W}_{\mathrm{G}}

When other variables are given, the subproblem w.r.t. 𝐖G\mathbf{W}_{\mathrm{G}} is indeed an unconstrained convex quadratic optimization problem. Therefore, by checking the first-order optimality condition of 𝐖G\mathbf{W}_{\mathrm{G}}, the newly obtained 𝐖G\mathbf{W}_{\mathrm{G}} can be expressed in the closed-form as follows

𝐖G=𝐉G𝐃G,1𝐃G,21,\mathbf{W}_{\mathrm{G}}=\mathbf{J}_{\mathrm{G}}\mathbf{D}_{\mathrm{G},1}\mathbf{D}_{\mathrm{G},2}^{-1}, (20)

where

𝐉G=(𝖠G(𝚿)𝐑G𝖠GH(𝚿)+𝚺B𝐈MR)1𝖠G(𝚿)𝐑G,\displaystyle\mathbf{J}_{\mathrm{G}}=(\mathsf{A}_{\mathrm{G}}(\bm{\Psi})\mathbf{R}_{\mathrm{G}}\mathsf{A}_{\mathrm{G}}^{\mathrm{H}}(\bm{\Psi})+\bm{\Sigma}_{\mathrm{B}}\otimes\mathbf{I}_{M_{\mathrm{R}}})^{-1}\mathsf{A}_{\mathrm{G}}(\bm{\Psi})\mathbf{R}_{\mathrm{G}},
𝐃G,1=k=1K𝖣𝖽𝗂𝖺𝗀((𝐖h,kH𝚿T𝐑h,k)𝟙MB),\displaystyle\mathbf{D}_{\mathrm{G},1}={\sum}_{k=1}^{K}\mathsf{Ddiag}((\mathbf{W}_{\mathrm{h},k}^{\mathrm{H}}\bm{\Psi}^{\mathrm{T}}\mathbf{R}_{\mathrm{h},k})\otimes\mathbbm{1}_{M_{\mathrm{B}}}),
𝐃G,2=k=1K𝖣𝖽𝗂𝖺𝗀(𝐂h^,k𝟙MB),\displaystyle\mathbf{D}_{\mathrm{G},2}={\sum}_{k=1}^{K}\mathsf{Ddiag}(\mathbf{C}_{\hat{\mathrm{h}},k}\otimes\mathbbm{1}_{M_{\mathrm{B}}}), (21)

with 𝖣𝖽𝗂𝖺𝗀(𝐗)\mathsf{Ddiag}(\mathbf{X}) constructing a diagonal matrix with its diagonal elements being those of the square matrix 𝐗\mathbf{X}. The derivation of (20) is detailed in Appendix E.

4) Optimize the linear receiver at every user 𝓦h\bm{\mathcal{W}}_{\mathrm{h}}

Following the similar procedure of updating 𝐖G\mathbf{W}_{\mathrm{G}}, the solution to 𝐖h,k,k𝒦\mathbf{W}_{\mathrm{h},k},\;k\in\mathcal{K}, can also be obtained in a closed-form as follows (details are omitted for brevity)

𝐖h,k=𝐉h,k𝐃h,1𝐃h,21,k𝒦,\mathbf{W}_{\mathrm{h},k}=\mathbf{J}_{\mathrm{h},k}\mathbf{D}_{\mathrm{h},1}\mathbf{D}_{\mathrm{h},2}^{-1},\;k\in\mathcal{K}, (22)

where

𝐉h,k=(𝚿T𝐑h,k𝚿+𝚺U,k)1𝚿T𝐑h,k,k𝒦,\mathbf{J}_{\mathrm{h},k}=(\bm{\Psi}^{\mathrm{T}}\mathbf{R}_{\mathrm{h},k}\bm{\Psi}^{*}+\bm{\Sigma}_{\mathrm{U},k})^{-1}\bm{\Psi}^{\mathrm{T}}\mathbf{R}_{\mathrm{h},k},\;k\in\mathcal{K}, (23)

𝐃h,1\mathbf{D}_{\mathrm{h},1} and 𝐃h,2\mathbf{D}_{\mathrm{h},2} are MRM_{\mathrm{R}}-dimensional diagonal matrices with [𝐃h,1]m,m=n=1MB[𝐖GH𝖠G(𝚿)𝐑G](m1)MB+n,(m1)MB+n,m[\mathbf{D}_{\mathrm{h},1}]_{m,m}=\sum_{n=1}^{M_{\mathrm{B}}}[\mathbf{W}_{\mathrm{G}}^{\mathrm{H}}\mathsf{A}_{\mathrm{G}}(\bm{\Psi})\mathbf{R}_{\mathrm{G}}]_{(m-1)M_{\mathrm{B}}+n,(m-1)M_{\mathrm{B}}+n},\;m\in\mathcal{M}, and [𝐃h,2]m,m=n=1MB[𝐂G^](m1)MB+n,(m1)MB+n,m[\mathbf{D}_{\mathrm{h},2}]_{m,m}=\sum_{n=1}^{M_{\mathrm{B}}}[\mathbf{C}_{\hat{\mathrm{G}}}]_{(m-1)M_{\mathrm{B}}+n,(m-1)M_{\mathrm{B}}+n},\;m\in\mathcal{M}, respectively.

The overall algorithm of solving (𝒫1)(\mathcal{P}1) based on GD is summarized in Algorithm 1.

1:  Initialize feasible 𝚿(0)=𝚽(0)𝐏(0)\bm{\Psi}^{(0)}=\bm{\Phi}^{(0)}\mathbf{P}^{(0)} and t=0t=0;
2:  repeat
3:     update 𝐖G(t+1)\mathbf{W}_{\mathrm{G}}^{(t+1)} by (20);
4:     update 𝓦h(t+1)\bm{\mathcal{W}}_{\mathrm{h}}^{(t+1)} by (22);
5:     update 𝚽(t+1)\bm{\Phi}^{(t+1)} by solving (𝒫2)(\mathcal{P}2) via GD procedure;
6:     update 𝐏(t+1)\mathbf{P}^{(t+1)} by solving (𝒫4)(\mathcal{P}4) via GP procedure;
7:     set 𝚿(t+1)=𝚽(t+1)𝐏(t+1)\bm{\Psi}^{(t+1)}=\bm{\Phi}^{(t+1)}\mathbf{P}^{(t+1)};
8:     t:=t+1t:=t+1;
9:  until a certain stopping criterion is reached
Algorithm 1 Solving the problem (𝒫1)(\mathcal{P}1) based on GD

B. PDD-Based Solution to (𝒫1)(\mathcal{P}1)

One potential drawback of the previously discussed GD-based solution is its onerous gradient evaluation procedure, as reflected in Appendices B and C. In this subsection, we adopt the PDD methodology [27] to propose an alternative solution that updates all block coordinates highly efficiently. To this end, we first transform the difficult objective in (11) into a more tractable form amenable to block coordinate updates by introducing auxiliary variables. Specifically, we introduce multiple copies of 𝚽𝐏\bm{\Phi}\mathbf{P} by defining

𝐔1,k=(𝚽𝐏),k𝒦,\displaystyle\mathbf{U}_{1,k}=(\bm{\Phi}\mathbf{P})^{*},\;k\in\mathcal{K}, (24)
𝐔2=(𝚽𝐏)𝐈MB,\displaystyle\mathbf{U}_{2}=(\bm{\Phi}\mathbf{P})^{*}\otimes\mathbf{I}_{M_{\mathrm{B}}}, (25)

and rewrite the original complicated objective into simple forms of the newly introduced copies as follows

𝐔3,k=𝐑h,k12𝐔1,k𝐖h,k,k𝒦,\displaystyle\mathbf{U}_{3,k}=\mathbf{R}_{\mathrm{h},k}^{\frac{1}{2}}\mathbf{U}_{1,k}\mathbf{W}_{\mathrm{h},k},\;k\in\mathcal{K}, (26)
𝐔4=𝐑G12𝐔2𝐖G,\displaystyle\mathbf{U}_{4}=\mathbf{R}_{\mathrm{G}}^{\frac{1}{2}}\mathbf{U}_{2}\mathbf{W}_{\mathrm{G}}, (27)
𝐅1,k(𝐔3,k,𝐖h,k)𝐅~1,k(𝐔3,k,𝐖h,k)𝟙MB,k𝒦,\displaystyle\mathbf{F}_{1,k}(\mathbf{U}_{3,k},\mathbf{W}_{\mathrm{h},k})\!\triangleq\!\tilde{\mathbf{F}}_{1,k}(\mathbf{U}_{3,k},\mathbf{W}_{\mathrm{h},k})\otimes\mathbbm{1}_{M_{\mathrm{B}}},\;k\in\mathcal{K}, (28)
𝐅2(𝐔4,𝐖G)𝐔4H𝐔4+𝐖GH(𝚺B𝐈MR)𝐖G,\displaystyle\mathbf{F}_{2}(\mathbf{U}_{4},\mathbf{W}_{\mathrm{G}})\triangleq\mathbf{U}_{4}^{\mathrm{H}}\mathbf{U}_{4}+\mathbf{W}_{\mathrm{G}}^{\mathrm{H}}(\bm{\Sigma}_{\mathrm{B}}\otimes\mathbf{I}_{M_{\mathrm{R}}})\mathbf{W}_{\mathrm{G}}, (29)
𝐅~1,k(𝐔3,k,𝐖h,k)𝐔3,kH𝐔3,k+𝐖h,kH𝚺U,k𝐖h,k,k𝒦.\displaystyle\tilde{\mathbf{F}}_{1,k}(\mathbf{U}_{3,k},\!\mathbf{W}_{\mathrm{h},k})\!\triangleq\!\mathbf{U}_{3,k}^{\mathrm{H}}\mathbf{U}_{3,k}\!\!+\!\!\mathbf{W}_{\mathrm{h},k}^{\mathrm{H}}\bm{\Sigma}_{\mathrm{U},k}\mathbf{W}_{\mathrm{h},k},k\!\in\!\mathcal{K}. (30)

Therefore, the problem (𝒫1)(\mathcal{P}1) can be equivalently rewritten as

(𝒫6):min𝓥k=1K𝖳𝗋{𝐅2(𝐔4,𝐖G)𝐅1,k(𝐔3,k,𝐖h,k)+𝐑G\displaystyle(\mathcal{P}6):\min_{\bm{\mathcal{V}}}\;\sum_{k=1}^{K}\mathsf{Tr}\{\mathbf{F}_{2}(\mathbf{U}_{4},\mathbf{W}_{\mathrm{G}})\!\odot\!\mathbf{F}_{1,k}(\mathbf{U}_{3,k},\mathbf{W}_{\mathrm{h},k})\!+\!\mathbf{R}_{\mathrm{G}}
(𝐑h,k𝟙MB)2𝖱𝖾{(𝐑G12𝐔4)((𝐑h,k12𝐔3,k)𝟙MB)}}\displaystyle\!\odot\!(\mathbf{R}_{\mathrm{h},k}\!\!\otimes\!\!\mathbbm{1}_{M_{\mathrm{B}}})\!\!-\!\!2\mathsf{Re}\{(\mathbf{R}_{\mathrm{G}}^{\frac{1}{2}}\mathbf{U}_{4})\!\odot\!((\mathbf{R}_{\mathrm{h},k}^{\frac{1}{2}}\mathbf{U}_{3,k})\!\otimes\!\mathbbm{1}_{M_{\mathrm{B}}})\}\} (31)
s.t.(11a)(11c),\displaystyle\qquad\quad\mathrm{s.t.}\;\mathrm{(11a)-(11c)}, (31a)
(24)(29),\displaystyle\qquad\qquad\quad\mathrm{(24)-(29)}, (31b)

where the newly introduced notations above are defined as

𝓥{𝓤1,𝐔2,𝓤3,𝐔4,𝚽,𝐏,𝓦h,𝐖G},\displaystyle\bm{\mathcal{V}}\triangleq\{\bm{\mathcal{U}}_{1},\mathbf{U}_{2},\bm{\mathcal{U}}_{3},\mathbf{U}_{4},\bm{\Phi},\mathbf{P},\bm{\mathcal{W}}_{\mathrm{h}},\mathbf{W}_{\mathrm{G}}\},
𝓤1{𝐔1,k}k=1K,𝓤3{𝐔3,k}k=1K.\displaystyle\bm{\mathcal{U}}_{1}\triangleq\{\mathbf{U}_{1,k}\}_{k=1}^{K},\;\bm{\mathcal{U}}_{3}\triangleq\{\mathbf{U}_{3,k}\}_{k=1}^{K}. (32)

Following the PDD method [27], to tackle (𝒫6)(\mathcal{P}6), we turn to solve its augmented Lagrangian problem given as follows

(𝒫7):min𝓥𝖿ρ(𝓥)+k=1K𝖳𝗋{𝐅2(𝐔4,𝐖G)𝐅1,k(𝐔3,k,𝐖h,k)+𝐑G\displaystyle(\mathcal{P}7):\min_{\bm{\mathcal{V}}}\;\mathsf{f}_{\rho}(\bm{\mathcal{V}})\!+\!\!\sum_{k=1}^{K}\!\mathsf{Tr}\{\mathbf{F}_{2}(\mathbf{U}_{4},\mathbf{W}_{\mathrm{G}})\!\odot\!\mathbf{F}_{1,k}(\mathbf{U}_{3,k},\mathbf{W}_{\mathrm{h},k})\!+\!\mathbf{R}_{\mathrm{G}}
(𝐑h,k𝟙MB)2𝖱𝖾{(𝐑G12𝐔4)((𝐑h,k12𝐔3,k)𝟙MB)}}\displaystyle\!\odot\!(\mathbf{R}_{\mathrm{h},k}\!\!\otimes\!\!\mathbbm{1}_{M_{\mathrm{B}}})\!\!-\!\!2\mathsf{Re}\{(\mathbf{R}_{\mathrm{G}}^{\frac{1}{2}}\mathbf{U}_{4})\!\odot\!((\mathbf{R}_{\mathrm{h},k}^{\frac{1}{2}}\mathbf{U}_{3,k})\!\otimes\!\mathbbm{1}_{M_{\mathrm{B}}})\}\} (33)
s.t.(11a)(11c),\displaystyle\qquad\quad\mathrm{s.t.}\;\mathrm{(11a)-(11c)}, (33a)

with the augmented term 𝖿ρ(𝓥)\mathsf{f}_{\rho}(\bm{\mathcal{V}}) defined as

𝖿ρ(𝓥)12ρ(k=1K𝐔3,k𝐑h,k12𝐔1,k𝐖h,k+ρ𝚲3,kF2\displaystyle\mathsf{f}_{\rho}(\bm{\mathcal{V}})\triangleq\frac{1}{2\rho}\bigg{(}\sum_{k=1}^{K}\|\mathbf{U}_{3,k}-\mathbf{R}_{\mathrm{h},k}^{\frac{1}{2}}\mathbf{U}_{1,k}\mathbf{W}_{\mathrm{h},k}+\rho\bm{\Lambda}_{3,k}\|_{\mathrm{F}}^{2}
+k=1K𝐔1,k(𝚽𝐏)+ρ𝚲1,kF2+𝐔2(𝚽𝐏)𝐈MB+ρ𝚲2F2\displaystyle+\!\!\sum_{k=1}^{K}\!\|\mathbf{U}_{1,k}\!\!-\!\!(\bm{\Phi}\mathbf{P})^{*}\!\!+\!\!\rho\bm{\Lambda}_{1,k}\|_{\mathrm{F}}^{2}\!+\!\|\mathbf{U}_{2}\!\!-\!\!(\bm{\Phi}\mathbf{P})^{*}\!\otimes\!\mathbf{I}_{M_{\mathrm{B}}}\!\!+\!\!\rho\bm{\Lambda}_{2}\|_{\mathrm{F}}^{2}
+𝐔4𝐑G12𝐔2𝐖G+ρ𝚲4F2),\displaystyle+\|\mathbf{U}_{4}-\mathbf{R}_{\mathrm{G}}^{\frac{1}{2}}\mathbf{U}_{2}\mathbf{W}_{\mathrm{G}}+\rho\bm{\Lambda}_{4}\|_{\mathrm{F}}^{2}\bigg{)}, (34)

where ρ\rho represents the penalty coefficient, {𝚲1,k}k=1K\{\bm{\Lambda}_{1,k}\}_{k=1}^{K}, 𝚲2\bm{\Lambda}_{2}, {𝚲3,k}k=1K\{\bm{\Lambda}_{3,k}\}_{k=1}^{K} and 𝚲4\bm{\Lambda}_{4} represent the Lagrangian multipliers associated with the constraints (24)-(27), respectively.

The PDD method is a two-layer iterative procedure [27], with its inner layer alternatively optimizing the different blocks of variables comprising 𝓥\bm{\mathcal{V}} and its outer layer selectively updating the dual variables 𝚲i\bm{\Lambda}_{i}, i=1,2,3,4i=1,2,3,4 or the penalty coefficient ρ\rho. The two layers’ updating procedure will be specified in the sequel.

For the inner layer, we adopt the block successive upper-bound minimization (BSUM) to update various blocks [27], [35], with each block’s update being elaborated as follows.

1) Optimize the phase shift matrix 𝚽\bm{\Phi}

When other variables are given, the subproblem w.r.t. 𝚽\bm{\Phi} can be expressed as

(𝒫8):min𝚽\displaystyle(\mathcal{P}8):\min_{\bm{\Phi}}\; k=1K𝐔1,k(𝚽𝐏)+ρ𝚲1,kF2\displaystyle{\sum}_{k=1}^{K}\|\mathbf{U}_{1,k}-(\bm{\Phi}\mathbf{P})^{*}+\rho\bm{\Lambda}_{1,k}\|_{\mathrm{F}}^{2}
+𝐔2(𝚽𝐏)𝐈MB+ρ𝚲2F2\displaystyle+\|\mathbf{U}_{2}-(\bm{\Phi}\mathbf{P})^{*}\otimes\mathbf{I}_{M_{\mathrm{B}}}+\rho\bm{\Lambda}_{2}\|_{\mathrm{F}}^{2} (35)
s.t.\displaystyle\mathrm{s.t.}\; |[𝚽]i,j|=1,i,j.\displaystyle|[\bm{\Phi}]_{i,j}|=1,\;i,j\in\mathcal{M}. (35a)

Note that the quadratic term 𝚽𝐏F2\|\bm{\Phi}\mathbf{P}\|_{\mathrm{F}}^{2} reduces to a constant Tt=1TαtT\sum_{t=1}^{T}\alpha_{t} due to (35a), which is independent of 𝚽\bm{\Phi}. Hence, utilizing the above fact, the problem (𝒫8)(\mathcal{P}8) boils down to maximizing a linear objective given as follows

(𝒫9):max𝚽\displaystyle(\mathcal{P}9):\max_{\bm{\Phi}}\; 𝖱𝖾{𝖳𝗋{𝐐𝚽}}\displaystyle\mathsf{Re}\{\mathsf{Tr}\{\mathbf{Q}\bm{\Phi}\}\} (36)
s.t.\displaystyle\mathrm{s.t.}\; |[𝚽]i,j|=1,i,j,\displaystyle|[\bm{\Phi}]_{i,j}|=1,\;i,j\in\mathcal{M}, (36a)

where the parameter 𝐐\mathbf{Q} is defined as

𝐐k=1K𝐏(𝐔1,kT+ρ𝚲1,kT)+𝐂~𝚽,\mathbf{Q}\triangleq{\sum}_{k=1}^{K}\mathbf{P}(\mathbf{U}_{1,k}^{\mathrm{T}}+\rho\bm{\Lambda}_{1,k}^{\mathrm{T}})+\tilde{\mathbf{C}}_{\bm{\Phi}}, (37)

and the (i,j)(i,j)th element of the matrix 𝐂~𝚽\tilde{\mathbf{C}}_{\bm{\Phi}} is given by m=1MB[(𝐏𝐈MB)(𝐔2T+ρ𝚲2T)](i1)MB+m,(j1)MB+m,i,j\sum_{m=1}^{M_{\mathrm{B}}}[(\mathbf{P}\!\otimes\!\mathbf{I}_{M_{\mathrm{B}}})(\mathbf{U}_{2}^{\mathrm{T}}\!+\!\rho\bm{\Lambda}_{2}^{\mathrm{T}})]_{(i\!-\!1)M_{\mathrm{B}}\!+\!m,(j\!-\!1)M_{\mathrm{B}}\!+\!m},\;i,j\in\mathcal{M}.

The problem (𝒫9)(\mathcal{P}9), although nonconvex, achieves its optimum when the phases of 𝚽\bm{\Phi}’s elements align with those of 𝐐H\mathbf{Q}^{\mathrm{H}}. Consequently, the optimal solution of (𝒫8)(\mathcal{P}8) is given analytically by

𝚽=ej(𝐐H).\bm{\Phi}=e^{j\angle(\mathbf{Q}^{\mathrm{H}})}. (38)
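In practice, the update (38) is a one-line operation; the sketch below (with a randomly generated Q purely for illustration) also confirms that phase alignment never does worse than a random unit-modulus matrix on the objective of (P9):

```python
import numpy as np

rng = np.random.default_rng(2)
M = 6
Q = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))   # placeholder for (37)

Phi_opt = np.exp(1j * np.angle(Q.conj().T))            # closed-form update (38)
Phi_rand = np.exp(1j * rng.uniform(0, 2 * np.pi, (M, M)))

obj = lambda Phi: np.real(np.trace(Q @ Phi))           # objective of (P9)
assert obj(Phi_opt) >= obj(Phi_rand)                   # phase alignment attains the maximum
print(obj(Phi_opt), obj(Phi_rand))
```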

2) Optimize the power allocation 𝐏\mathbf{P}

After some manipulations, the update of 𝐏\mathbf{P} with the remaining variables fixed can be written as

(𝒫10):min𝐩T\displaystyle(\mathcal{P}10):\min_{\mathbf{p}\in\mathbb{R}^{T}}\; (KMR+MBMR)𝐩T𝐩2𝖱𝖾{𝐪H𝐩}\displaystyle(KM_{\mathrm{R}}+M_{\mathrm{B}}M_{\mathrm{R}})\mathbf{p}^{\mathrm{T}}\mathbf{p}-2\mathsf{Re}\{\mathbf{q}^{\mathrm{H}}\mathbf{p}\} (39)
s.t.\displaystyle\mathrm{s.t.}\; 𝐩T𝐩PmaxMR,\displaystyle\mathbf{p}^{\mathrm{T}}\mathbf{p}\leq\frac{P_{\mathrm{max}}}{M_{\mathrm{R}}}, (39a)
𝐩𝟎,\displaystyle\mathbf{p}\geq\bm{0}, (39b)

where 𝐩𝖽𝗂𝖺𝗀(𝐏)\mathbf{p}\!\triangleq\!\mathsf{diag}({\mathbf{P}}), 𝐪H(𝖽𝗂𝖺𝗀(k=1K(𝐔1,kT+ρ𝚲1,kT)𝚽))T+𝐜2T\mathbf{q}^{\mathrm{H}}\triangleq\big{(}\mathsf{diag}\big{(}\sum_{k=1}^{K}(\mathbf{U}_{1,k}^{\mathrm{T}}+\rho\bm{\Lambda}_{1,k}^{\mathrm{T}})\bm{\Phi}\big{)}\big{)}^{\mathrm{T}}+\mathbf{c}_{2}^{\mathrm{T}} and [𝐜2]m=n=1MB[(𝐔2T+ρ𝚲2T)(𝚽𝐈MB)](m1)MB+n,(m1)MB+n,m[\mathbf{c}_{2}]_{m}=\sum_{n=1}^{M_{\mathrm{B}}}[(\mathbf{U}_{2}^{\mathrm{T}}+\rho\bm{\Lambda}_{2}^{\mathrm{T}})(\bm{\Phi}\otimes\mathbf{I}_{M_{\mathrm{B}}})]_{(m-1)M_{\mathrm{B}}+n,(m-1)M_{\mathrm{B}}+n},\;m\in\mathcal{M}.

Following the similar procedure as solving (𝒫5)(\mathcal{P}5), the optimal solution to (𝒫10)(\mathcal{P}10) is given by

  • if [𝖱𝖾{𝐪}]+KMR+MBMR22PmaxMR\big{\|}\frac{[\mathsf{Re}\{\mathbf{q}\}]_{+}}{KM_{\mathrm{R}}+M_{\mathrm{B}}M_{\mathrm{R}}}\big{\|}_{2}^{2}\leq\frac{P_{\mathrm{max}}}{M_{\mathrm{R}}}, we have

    𝐩=[𝖱𝖾{𝐪}]+KMR+MBMR,\mathbf{p}^{\star}=\frac{[\mathsf{Re}\{\mathbf{q}\}]_{+}}{KM_{\mathrm{R}}+M_{\mathrm{B}}M_{\mathrm{R}}}, (40)
  • otherwise,

    𝐩=PmaxMR[𝖱𝖾{𝐪}]+[𝖱𝖾{𝐪}]+2,\mathbf{p}^{\star}=\sqrt{\frac{P_{\mathrm{max}}}{M_{\mathrm{R}}}}\frac{[\mathsf{Re}\{\mathbf{q}\}]_{+}}{\|[\mathsf{Re}\{\mathbf{q}\}]_{+}\|_{2}}, (41)

which can be proven by exploiting Appendix D.
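This update is again a clip-and-rescale step; a standalone sketch (with hypothetical inputs) reads:

```python
import numpy as np

def update_power_p10(q, K, MB, MR, Pmax):
    """Closed-form solution of (P10): clip Re{q}/(K*MR + MB*MR) to the nonnegative
    orthant, then rescale onto {p^T p <= Pmax/MR} if the budget is exceeded."""
    x = np.maximum(np.real(q), 0.0) / (K * MR + MB * MR)
    radius2 = Pmax / MR
    if x @ x > radius2:
        x = np.sqrt(radius2) * x / np.linalg.norm(x)
    return x

# Illustrative usage with arbitrary q and system sizes
q = np.array([3.0 + 1j, -2.0, 5.0 - 0.5j, 1.0])
print(update_power_p10(q, K=2, MB=4, MR=4, Pmax=1.0))
```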

3) Optimize the auxiliary variable 𝓤1\bm{\mathcal{U}}_{1}

Obviously, the optimization w.r.t. 𝓤1\bm{\mathcal{U}}_{1} is an unconstrained convex quadratic minimization problem. Hence, by setting the first-order derivative of (33) w.r.t. 𝐔1,k,k𝒦\mathbf{U}_{1,k},\;k\in\mathcal{K}, to zero, we attain the following equation

𝐑h,k1𝐔1,k+𝐔1,k𝐖h,k𝐖h,kH=𝐂~1,k,k𝒦,\mathbf{R}_{\mathrm{h},k}^{-1}\mathbf{U}_{1,k}+\mathbf{U}_{1,k}\mathbf{W}_{\mathrm{h},k}\mathbf{W}_{\mathrm{h},k}^{\mathrm{H}}=\tilde{\mathbf{C}}_{1,k},\;k\in\mathcal{K}, (42)

where

𝐂~1,k𝐑h,k1((𝚽𝐏)ρ𝚲1,k+𝐑h,k12(𝐔3,k+ρ𝚲3,k)𝐖h,kH).\tilde{\mathbf{C}}_{1,k}\!\triangleq\!\mathbf{R}_{\mathrm{h},k}^{-1}((\bm{\Phi}\mathbf{P})^{*}\!-\!\rho\bm{\Lambda}_{1,k}\!+\!\mathbf{R}_{\mathrm{h},k}^{\frac{1}{2}}(\mathbf{U}_{3,k}\!+\!\rho\bm{\Lambda}_{3,k})\mathbf{W}_{\mathrm{h},k}^{\mathrm{H}}). (43)

The linear system in (42) is the well-known Sylvester equation and can be readily solved via the seminal Hessenberg-Schur algorithm, which has a complexity of 𝒪(MR3)\mathcal{O}(M_{\mathrm{R}}^{3}) [36]. Note that in MATLAB, the standard built-in function 𝗌𝗒𝗅𝗏𝖾𝗌𝗍𝖾𝗋()\mathsf{sylvester}() can be invoked to solve (42) directly.
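For readers working in Python rather than MATLAB, equation (42) can be handled by SciPy's Sylvester solver; the sketch below uses small random placeholder matrices (not the paper's quantities):

```python
import numpy as np
from scipy.linalg import solve_sylvester

# Solve R_h^{-1} U1 + U1 (W_h W_h^H) = C1, i.e., equation (42), for one user.
rng = np.random.default_rng(3)
MR = 6                                           # T = MR, so all blocks are square
X = rng.standard_normal((MR, MR)) + 1j * rng.standard_normal((MR, MR))
R_h = X @ X.conj().T + MR * np.eye(MR)           # example positive-definite covariance
W_h = rng.standard_normal((MR, MR)) + 1j * rng.standard_normal((MR, MR))
C1 = rng.standard_normal((MR, MR)) + 1j * rng.standard_normal((MR, MR))

U1 = solve_sylvester(np.linalg.inv(R_h), W_h @ W_h.conj().T, C1)
assert np.allclose(np.linalg.inv(R_h) @ U1 + U1 @ (W_h @ W_h.conj().T), C1)
```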

4) Optimize the auxiliary variable 𝐔2\mathbf{U}_{2}

Following the similar arguments as in updating 𝓤1\bm{\mathcal{U}}_{1}, the optimization of 𝐔2\mathbf{U}_{2} also reduces to solving a Sylvester equation given as follows

𝐑G1𝐔2+𝐔2𝐖G𝐖GH=𝐂~2,\mathbf{R}_{\mathrm{G}}^{-1}\mathbf{U}_{2}+\mathbf{U}_{2}\mathbf{W}_{\mathrm{G}}\mathbf{W}_{\mathrm{G}}^{\mathrm{H}}=\tilde{\mathbf{C}}_{2}, (44)

where

𝐂~2𝐑G1(𝖠GH(𝚿)ρ𝚲2+𝐑G12(𝐔4+ρ𝚲4)𝐖GH).\tilde{\mathbf{C}}_{2}\triangleq\mathbf{R}_{\mathrm{G}}^{-1}(\mathsf{A}_{\mathrm{G}}^{\mathrm{H}}(\bm{\Psi})-\rho\bm{\Lambda}_{2}+\mathbf{R}_{\mathrm{G}}^{\frac{1}{2}}(\mathbf{U}_{4}+\rho\bm{\Lambda}_{4})\mathbf{W}_{\mathrm{G}}^{\mathrm{H}}). (45)

5) Optimize the auxiliary variable 𝓤3\bm{\mathcal{U}}_{3}

Next, we proceed to study the update of 𝓤3\bm{\mathcal{U}}_{3}, which is an unconstrained convex quadratic problem, whose closed-form solution can be obtained via checking the first-order optimality condition, shown as

𝐔3,k=\displaystyle\mathbf{U}_{3,k}= (12ρ𝐑h,k12𝐔1,k𝐖h,k12𝚲3,k+𝐑h,k12𝐂~d,1)\displaystyle\bigg{(}\frac{1}{2\rho}\mathbf{R}_{\mathrm{h},k}^{\frac{1}{2}}\mathbf{U}_{1,k}\mathbf{W}_{\mathrm{h},k}-\frac{1}{2}\bm{\Lambda}_{3,k}+\mathbf{R}_{\mathrm{h},k}^{\frac{1}{2}}\tilde{\mathbf{C}}_{\mathrm{d},1}\bigg{)}
×(12ρ𝐈MR+𝐂~d,2)1,k𝒦,\displaystyle\times\bigg{(}\frac{1}{2\rho}\mathbf{I}_{M_{\mathrm{R}}}+\tilde{\mathbf{C}}_{\mathrm{d},2}\bigg{)}^{-1},\;k\in\mathcal{K}, (46)

where 𝐂~d,1\tilde{\mathbf{C}}_{\mathrm{d},1} and 𝐂~d,2\tilde{\mathbf{C}}_{\mathrm{d},2} are MRM_{\mathrm{R}}-dimensional diagonal matrices whose iith diagonal elements, ii\in\mathcal{M}, equal to j=(i1)MB+1iMB[𝐔4H𝐑G12]jj\sum_{j=(i-1)M_{\mathrm{B}}+1}^{iM_{\mathrm{B}}}[\mathbf{U}_{4}^{\mathrm{H}}\mathbf{R}_{\mathrm{G}}^{\frac{1}{2}}]_{jj} and j=(i1)MB+1iMB[𝐔4H𝐔4+𝐖GH(𝚺B𝐈MR)𝐖G]jj\sum_{j=(i-1)M_{\mathrm{B}}+1}^{iM_{\mathrm{B}}}[\mathbf{U}_{4}^{\mathrm{H}}\mathbf{U}_{4}+\mathbf{W}_{\mathrm{G}}^{\mathrm{H}}(\bm{\Sigma}_{\mathrm{B}}\otimes\mathbf{I}_{M_{\mathrm{R}}})\mathbf{W}_{\mathrm{G}}]_{jj}, respectively.

6) Optimize the auxiliary variable 𝐔4\mathbf{U}_{4}

Similar to the optimization of 𝓤3\bm{\mathcal{U}}_{3}, the solution to 𝐔4\mathbf{U}_{4} can be expressed in the closed-form as

𝐔4=\displaystyle\mathbf{U}_{4}= (12ρ𝐑G12𝐔2𝐖G12𝚲4+k=1K𝐑G12𝖣𝖽𝗂𝖺𝗀((𝐔3,kH𝐑h,k12)\displaystyle\bigg{(}\frac{1}{2\rho}\mathbf{R}_{\mathrm{G}}^{\frac{1}{2}}\mathbf{U}_{2}\mathbf{W}_{\mathrm{G}}-\frac{1}{2}\bm{\Lambda}_{4}+\sum_{k=1}^{K}\mathbf{R}_{\mathrm{G}}^{\frac{1}{2}}\mathsf{Ddiag}((\mathbf{U}_{3,k}^{\mathrm{H}}\mathbf{R}_{\mathrm{h},k}^{\frac{1}{2}})
𝟙MB))(12ρ𝐈MBMR+𝐂~d,3)1,\displaystyle\otimes\mathbbm{1}_{M_{\mathrm{B}}})\bigg{)}\bigg{(}\frac{1}{2\rho}\mathbf{I}_{M_{\mathrm{B}}M_{\mathrm{R}}}+\tilde{\mathbf{C}}_{\mathrm{d},3}\bigg{)}^{-1}, (47)

where

𝐂~d,3k=1K𝖣𝖽𝗂𝖺𝗀((𝐔3,kH𝐔3,k+𝐖h,kH𝚺U,k𝐖h,k)𝟙MB).\tilde{\mathbf{C}}_{\mathrm{d},3}\!\triangleq\!\sum_{k=1}^{K}\mathsf{Ddiag}((\mathbf{U}_{3,k}^{\mathrm{H}}\mathbf{U}_{3,k}\!+\!\mathbf{W}_{\mathrm{h},k}^{\mathrm{H}}\bm{\Sigma}_{\mathrm{U},k}\mathbf{W}_{\mathrm{h},k})\!\otimes\!\mathbbm{1}_{M_{\mathrm{B}}}). (48)

7) Optimize the linear receiver at every user 𝓦h\bm{\mathcal{W}}_{\mathrm{h}}

The update of 𝓦h\bm{\mathcal{W}}_{\mathrm{h}} is an unconstrained convex quadratic optimization problem, whose optimal solution is equivalent to solving the following Sylvester equation

12ρ𝚺U,k1𝐔1,kH𝐑h,k𝐔1,k𝐖h,k+𝐖h,k𝐂~d,2=𝐂~Wh,k,k𝒦,\frac{1}{2\rho}\bm{\Sigma}_{\mathrm{U},k}^{-1}\mathbf{U}_{1,k}^{\mathrm{H}}\mathbf{R}_{\mathrm{h},k}\mathbf{U}_{1,k}\mathbf{W}_{\mathrm{h},k}\!\!+\!\!\mathbf{W}_{\mathrm{h},k}\tilde{\mathbf{C}}_{\mathrm{d},2}\!\!=\!\!\tilde{\mathbf{C}}_{\mathrm{W_{h}},k},k\!\in\!\mathcal{K}, (49)

where

𝐂~Wh,k12ρ𝚺U,k1𝐔1,kH𝐑h,k12(𝐔3,k+ρ𝚲3,k),k𝒦.\tilde{\mathbf{C}}_{\mathrm{W_{h}},k}\triangleq\frac{1}{2\rho}\bm{\Sigma}_{\mathrm{U},k}^{-1}\mathbf{U}_{1,k}^{\mathrm{H}}\mathbf{R}_{\mathrm{h},k}^{\frac{1}{2}}(\mathbf{U}_{3,k}+\rho\bm{\Lambda}_{3,k}),\;k\in\mathcal{K}. (50)

8) Optimize the linear receiver at the BS 𝐖G\mathbf{W}_{\mathrm{G}}

The update of 𝐖G\mathbf{W}_{\mathrm{G}} is similar to that of 𝓦h\bm{\mathcal{W}}_{\mathrm{h}} and can be conducted by solving the following Sylvester equation (details are skipped to avoid repetition)

12ρ(𝚺B𝐈MR)1𝐔2H𝐑G𝐔2𝐖G+𝐖G𝐂~d,3\displaystyle\frac{1}{2\rho}(\bm{\Sigma}_{\mathrm{B}}\otimes\mathbf{I}_{M_{\mathrm{R}}})^{-1}\mathbf{U}_{2}^{\mathrm{H}}\mathbf{R}_{\mathrm{G}}\mathbf{U}_{2}\mathbf{W}_{\mathrm{G}}+\mathbf{W}_{\mathrm{G}}\tilde{\mathbf{C}}_{\mathrm{d},3}
=12ρ(𝚺B𝐈MR)1𝐔2H𝐑G12(𝐔4+ρ𝚲4).\displaystyle=\frac{1}{2\rho}(\bm{\Sigma}_{\mathrm{B}}\otimes\mathbf{I}_{M_{\mathrm{R}}})^{-1}\mathbf{U}_{2}^{\mathrm{H}}\mathbf{R}_{\mathrm{G}}^{\frac{1}{2}}(\mathbf{U}_{4}+\rho\bm{\Lambda}_{4}). (51)

For the outer layer, if all the equalities (24)-(27) are approximately satisfied, we will update the Lagrangian multipliers in a gradient ascent manner, i.e., 𝚲i(t+1):=𝚲i(t)+ρ1(𝐔i𝐔iR)\bm{\Lambda}_{i}^{(t+1)}:=\bm{\Lambda}_{i}^{(t)}+\rho^{-1}(\mathbf{U}_{i}-\mathbf{U}_{i}^{\mathrm{R}}), where 𝐔iR\mathbf{U}_{i}^{\mathrm{R}} represents the right hand side of the iith equality. Otherwise, the penalty coefficient ρ\rho should be decreased to force the equalities to be approached in the subsequent iterations.
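A compact sketch of this outer-layer rule for a single coupling constraint, with an illustrative residual tolerance δ and shrinking factor c ∈ (0,1), is:

```python
import numpy as np

def pdd_outer_step(U, U_rhs, Lam, rho, delta=1e-4, c=0.8):
    """Outer-layer PDD update for one coupling constraint (sketch).
    If the residual is small, perform dual ascent on the multiplier;
    otherwise shrink the penalty coefficient rho to enforce the equality harder."""
    residual = U - U_rhs
    if np.max(np.abs(residual)) <= delta:
        Lam = Lam + residual / rho          # gradient-ascent multiplier update
    else:
        rho = c * rho                       # tighten the penalty
    return Lam, rho
```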

The proposed PDD-based algorithm of handling (𝒫1)(\mathcal{P}1) is stated in Algorithm 2.

1:  Initialize feasible 𝚿(0)=𝚽(0)𝐏(0)\bm{\Psi}^{(0)}=\bm{\Phi}^{(0)}\mathbf{P}^{(0)} and t=0t=0;
2:  repeat
3:     set 𝚽(t,0):=𝚽(t)\bm{\Phi}^{(t,0)}:=\bm{\Phi}^{(t)}, 𝐏(t,0):=𝐏(t)\mathbf{P}^{(t,0)}:=\mathbf{P}^{(t)}, 𝓦h(t,0):=𝓦h(t)\bm{\mathcal{W}}_{\mathrm{h}}^{(t,0)}:=\bm{\mathcal{W}}_{\mathrm{h}}^{(t)}, 𝐖G(t,0):=𝐖G(t)\mathbf{W}_{\mathrm{G}}^{(t,0)}:=\mathbf{W}_{\mathrm{G}}^{(t)}, {𝐔i(t,0)}i=14:={𝐔i(t)}i=14\{\mathbf{U}_{i}^{(t,0)}\}_{i=1}^{4}:=\{\mathbf{U}_{i}^{(t)}\}_{i=1}^{4} and
s=0s=0;
4:     repeat
5:        update 𝓦h(t,s+1)\bm{\mathcal{W}}_{\mathrm{h}}^{(t,s+1)} by solving equation (49);
6:        update 𝐖G(t,s+1)\mathbf{W}_{\mathrm{G}}^{(t,s+1)} by solving equation (51);
7:        update 𝓤1(t,s+1)\bm{\mathcal{U}}_{1}^{(t,s+1)} by solving equation (42);
8:        update 𝐔2(t,s+1)\mathbf{U}_{2}^{(t,s+1)} by solving equation (44);
9:        update 𝓤3(t,s+1)\bm{\mathcal{U}}_{3}^{(t,s+1)} by (46);
10:        update 𝐔4(t,s+1)\mathbf{U}_{4}^{(t,s+1)} by (47);
11:        update 𝚽(t,s+1)\bm{\Phi}^{(t,s+1)} by (38);
12:        update 𝐏(t,s+1)\mathbf{P}^{(t,s+1)} by solving (𝒫10)(\mathcal{P}10);
13:        s:=s+1s:=s+1;
14:     until a certain stopping criterion is reached
15:     set 𝚽(t+1):=𝚽(t,)\bm{\Phi}^{(t+1)}:=\bm{\Phi}^{(t,\infty)}, 𝐏(t+1):=𝐏(t,)\mathbf{P}^{(t+1)}:=\mathbf{P}^{(t,\infty)}, 𝓦h(t+1):=𝓦h(t,)\bm{\mathcal{W}}_{\mathrm{h}}^{(t+1)}:=\bm{\mathcal{W}}_{\mathrm{h}}^{(t,\infty)}, 𝐖G(t+1):=𝐖G(t,)\mathbf{W}_{\mathrm{G}}^{(t+1)}:=\mathbf{W}_{\mathrm{G}}^{(t,\infty)} and {𝐔i(t+1)}i=14:={𝐔i(t,)}i=14\{\mathbf{U}_{i}^{(t+1)}\}_{i=1}^{4}:=\{\mathbf{U}_{i}^{(t,\infty)}\}_{i=1}^{4};
16:     set 𝚿(t+1)=𝚽(t+1)𝐏(t+1)\bm{\Psi}^{(t+1)}=\bm{\Phi}^{(t+1)}\mathbf{P}^{(t+1)};
17:     for i=1i=1 to 44 do
18:        if 𝐔i(t+1)(𝐔iR)(t+1)δ\|\mathbf{U}_{i}^{(t+1)}-(\mathbf{U}_{i}^{\mathrm{R}})^{(t+1)}\|_{\infty}\leq\delta then
19:           𝚲i(t+1):=𝚲i(t)+1ρ(t)(𝐔i(t+1)(𝐔iR)(t+1))\bm{\Lambda}_{i}^{(t+1)}:=\bm{\Lambda}_{i}^{(t)}+\frac{1}{\rho^{(t)}}(\mathbf{U}_{i}^{(t+1)}-(\mathbf{U}_{i}^{\mathrm{R}})^{(t+1)});
20:           ρ(t+1):=ρ(t)\rho^{(t+1)}:=\rho^{(t)};
21:        else
22:           𝚲i(t+1):=𝚲i(t)\bm{\Lambda}_{i}^{(t+1)}:=\bm{\Lambda}_{i}^{(t)}, ρ(t+1):=cρ(t)\rho^{(t+1)}:=c\cdot\rho^{(t)};
23:        end if
24:     end for
25:     t:=t+1t:=t+1;
26:  until a certain stopping criterion is reached
Algorithm 2 Solving the problem (𝒫1)(\mathcal{P}1) based on PDD

Remark 1. As observed from the simulation results in Sec. V, Algs. 1 and 2 converge and yield nearly identical performance. Nevertheless, when the number of users is small, e.g., K\leq 3, Alg. 1 exhibits lower complexity, and vice versa. Besides, assuming that Alg. 1 starts from a feasible point and that the step size of the (projected) GD procedure is sufficiently small, it can be proved that the solution iterates generated by Alg. 1 remain feasible for (𝒫1)(\mathcal{P}1) and yield monotonically decreasing objective values of (𝒫1)(\mathcal{P}1).

IV Performance Analysis and Comparison

In this section, we analyze the estimation performance of our proposed RIS-TX CE method and compare it with that of the conventional CscdChn training scheme, including both LMMSE and LS estimation. To this end, we employ the DFT matrix as the codebook for pilot sequence, i.e., 𝚽=𝐅\bm{\Phi}=\mathbf{F} with [𝐅]i,j=ej2π(i1)(j1)MR,i,j[\mathbf{F}]_{i,j}=e^{-j\frac{2\pi(i-1)(j-1)}{M_{\mathrm{R}}}},\;i,j\in\mathcal{M}, and uniform power allocation, i.e., 𝐏=PmaxMR𝐈MR\mathbf{P}=\frac{\sqrt{P_{\mathrm{max}}}}{M_{\mathrm{R}}}\mathbf{I}_{M_{\mathrm{R}}}. The use of DFT pilot sequence is due to the following considerations:

  • The DFT pilot sequence has been shown to be nearly optimal for the CscdChn training scheme by different works [6]-[8], [10], [11].

  • Although the pilot sequences produced by our GD/PDD-based algorithms outperform the DFT codebook (see Sec. V), using the DFT pilot yields closed-form MSE and CRLB expressions, which greatly simplifies the analysis and provides more insight.

Besides, we also assume all channels follow Rayleigh fading distribution [28], i.e., 𝐠𝒞𝒩(𝟎,ρG𝐈MBMR)\mathbf{g}\sim\mathcal{CN}(\bm{0},\rho_{\mathrm{G}}\mathbf{I}_{M_{\mathrm{B}}M_{\mathrm{R}}}) and 𝐡k𝒞𝒩(𝟎,ρh,k𝐈MR),k𝒦\mathbf{h}_{k}\sim\mathcal{CN}(\bm{0},\rho_{\mathrm{h},k}\mathbf{I}_{M_{\mathrm{R}}}),\;k\in\mathcal{K}, where ρG\rho_{\mathrm{G}} and ρh,k,k𝒦\rho_{\mathrm{h},k},\;k\in\mathcal{K}, stand for the large-scale fading coefficients of 𝐆\mathbf{G} and 𝐡k\mathbf{h}_{k}, respectively.
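For concreteness, the DFT codebook and uniform power allocation used in this section can be generated as follows; the sizes and power budget are illustrative, and the snippet confirms that the training power constraint (11a) is met with equality:

```python
import numpy as np

MR = 8                                  # number of RIS elements, T = MR (illustrative)
Pmax = 1.0                              # illustrative training power budget

i, j = np.meshgrid(np.arange(MR), np.arange(MR), indexing="ij")
F = np.exp(-2j * np.pi * i * j / MR)    # DFT codebook [F]_{i,j} = e^{-j 2*pi*(i-1)(j-1)/MR}
P = (np.sqrt(Pmax) / MR) * np.eye(MR)   # uniform power allocation
Psi = F @ P

assert np.isclose(np.linalg.norm(Psi, "fro") ** 2, Pmax)          # (11a) holds with equality
assert np.allclose(Psi.conj().T @ Psi, (Pmax / MR) * np.eye(MR))  # scaled-orthogonal pilots
```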

A. CscdChn Training Scheme

To perform analysis and comparison, we first briefly review the conventional CscdChn training procedure, which is conducted in analogy with the classical uplink CE protocol [28]. Specifically, for the system described in Sec. II, all KK single-antenna users simultaneously transmit orthogonal pilot sequences to the BS within TMRT\geq M_{\mathrm{R}} consecutive time slots. Here, the direct channels between users and the BS are still not considered since they can be easily obtained in advance [1], [9], [10]. Therefore, in the ttth time slot, the signal received at the BS is

𝐘~B(t)=k=1K𝐇c,kϕp(t)𝐳kH+𝐍~B(t),t𝒯,\tilde{\mathbf{Y}}_{\mathrm{B}}^{(t)}={\sum}_{k=1}^{K}\mathbf{H}_{\mathrm{c},k}\bm{\phi}_{\mathrm{p}}^{(t)}\mathbf{z}_{k}^{\mathrm{H}}+\tilde{\mathbf{N}}_{\mathrm{B}}^{(t)},\;t\in\mathcal{T}, (52)

where 𝐇c,k,k𝒦\mathbf{H}_{\mathrm{c},k},\;k\in\mathcal{K}, is the effective CscdChn defined in Sec. II, ϕp(t),t𝒯\bm{\phi}_{\mathrm{p}}^{(t)},\;t\in\mathcal{T}, indicates phase shifts of RIS elements and is set as the ttth column of the DFT matrix, 𝐳k,k𝒦\mathbf{z}_{k},\;k\in\mathcal{K}, represents pilot sequence transmitted by the kkth user and 𝐍~B(t),t𝒯\tilde{\mathbf{N}}_{\mathrm{B}}^{(t)},\;t\in\mathcal{T}, denotes the thermal noise at the BS with each element following i.i.d. 𝒞𝒩(0,σB2)\mathcal{CN}(0,\sigma_{\mathrm{B}}^{2}). We assume that pilot sequences from different users are mutually orthogonal [28], i.e., 𝐳k22=Pmax\|\mathbf{z}_{k}\|_{2}^{2}=P_{\mathrm{max}} and 𝐳kH𝐳j=0,k,j𝒦,kj\mathbf{z}_{k}^{\mathrm{H}}\mathbf{z}_{j}=0,\;k,j\in\mathcal{K},\;k\neq j. To save overhead, we just assume T=MRT=M_{\mathrm{R}}.

Remark 2. Without channel sparsity assumption, CscdChn scheme has the pilot overhead of at least MR+𝗆𝖺𝗑{K1,(K1)MR/MB}M_{\mathrm{R}}+\mathsf{max}\{K-1,(K-1)\lceil M_{\mathrm{R}}/M_{\mathrm{B}}\rceil\} [1], [9], which is higher than our proposed RIS-TX scheme (i.e. MRM_{\mathrm{R}}).
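A quick numerical comparison of the two overheads, for arbitrary example system sizes rather than values used elsewhere in this paper, is:

```python
import math

# Pilot-overhead comparison from Remark 2 for illustrative system sizes
MR, MB, K = 64, 8, 4
cscd_overhead = MR + max(K - 1, (K - 1) * math.ceil(MR / MB))
ris_tx_overhead = MR
print(cscd_overhead, ris_tx_overhead)   # 88 vs. 64 pilot symbols
```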

To extract each user’s channel information, by right multiplying both sides of (52) with 𝐳k,k𝒦\mathbf{z}_{k},\;k\in\mathcal{K}, the BS obtains

𝐲~B,k(t)=𝐘~B(t)𝐳k=Pmax𝐇c,kϕp(t)+𝐧~B,k(t),k𝒦,t𝒯,\tilde{\mathbf{y}}_{\mathrm{B},k}^{(t)}\!=\!\tilde{\mathbf{Y}}_{\mathrm{B}}^{(t)}\mathbf{z}_{k}\!=\!P_{\mathrm{max}}\mathbf{H}_{\mathrm{c},k}\bm{\phi}_{\mathrm{p}}^{(t)}\!+\!\tilde{\mathbf{n}}_{\mathrm{B},k}^{(t)},k\in\mathcal{K},t\in\mathcal{T}, (53)

where 𝐧~B,k(t)𝐍~B(t)𝐳k,k𝒦,t𝒯\tilde{\mathbf{n}}_{\mathrm{B},k}^{(t)}\triangleq\tilde{\mathbf{N}}_{\mathrm{B}}^{(t)}\mathbf{z}_{k},\;k\in\mathcal{K},\;t\in\mathcal{T}. Via collecting all TT 𝐲~B,k(t)\tilde{\mathbf{y}}_{\mathrm{B},k}^{(t)} and stacking them in a tall vector, user-kk’s data obtained by the BS can be expressed as

𝐲~B,k=Pmax𝖠G(𝚽p)𝐡c,k+𝐧~B,k,k𝒦,\tilde{\mathbf{y}}_{\mathrm{B},k}=P_{\mathrm{max}}\mathsf{A}_{\mathrm{G}}(\bm{\Phi}_{\mathrm{p}})\mathbf{h}_{\mathrm{c},k}+\tilde{\mathbf{n}}_{\mathrm{B},k},\;k\in\mathcal{K}, (54)

where 𝚽p[ϕp(1),,ϕp(T)]\bm{\Phi}_{\mathrm{p}}\triangleq[\bm{\phi}_{\mathrm{p}}^{(1)},\cdots,\bm{\phi}_{\mathrm{p}}^{(T)}] is a DFT matrix, 𝐡c,k𝗏𝖾𝖼(𝐇c,k),k𝒦\mathbf{h}_{\mathrm{c},k}\triangleq\mathsf{vec}(\mathbf{H}_{\mathrm{c},k}),\;k\in\mathcal{K}, and 𝐧~B,k𝗏𝖾𝖼([𝐧~B,k(1),,𝐧~B,k(T)]),k𝒦\tilde{\mathbf{n}}_{\mathrm{B},k}\triangleq\mathsf{vec}\big{(}[\tilde{\mathbf{n}}_{\mathrm{B},k}^{(1)},\cdots,\tilde{\mathbf{n}}_{\mathrm{B},k}^{(T)}]\big{)},\;k\in\mathcal{K}.

Note that 𝐡c,k,k𝒦\mathbf{h}_{\mathrm{c},k},\;k\in\mathcal{K}, has zero mean and correlation matrix 𝐑Hc,k,k𝒦\mathbf{R}_{\mathrm{H_{c}},k},\;k\in\mathcal{K}. Based on (54), adopting the LMMSE estimator, the MSE matrix of 𝐡c,k,k𝒦\mathbf{h}_{\mathrm{c},k},\;k\in\mathcal{K}, can be shown to be given by [37]

𝐂~Hc,k=𝐑Hc,k𝐑Hc,k𝖠GH(𝚽p)𝐉Hc,k𝖠G(𝚽p)𝐑Hc,k,k𝒦,\tilde{\mathbf{C}}_{\mathrm{H}_{\mathrm{c},k}}\!\!=\!\!\mathbf{R}_{\mathrm{H_{c}},k}\!\!-\!\!\mathbf{R}_{\mathrm{H_{c}},k}\mathsf{A}_{\mathrm{G}}^{\mathrm{H}}(\!\bm{\Phi}_{\mathrm{p}}\!)\mathbf{J}_{\mathrm{H_{c}},k}\mathsf{A}_{\mathrm{G}}(\!\bm{\Phi}_{\mathrm{p}}\!)\mathbf{R}_{\mathrm{H_{c}},k},k\!\in\!\mathcal{K}, (55)

where 𝐉Hc,k(𝖠G(𝚽p)𝐑Hc,k𝖠GH(𝚽p)+σB2Pmax𝐈MBMR)1\mathbf{J}_{\mathrm{H_{c}},k}\triangleq\big{(}\mathsf{A}_{\mathrm{G}}(\bm{\Phi}_{\mathrm{p}})\mathbf{R}_{\mathrm{H_{c}},k}\mathsf{A}_{\mathrm{G}}^{\mathrm{H}}(\bm{\Phi}_{\mathrm{p}})+\frac{\sigma_{\mathrm{B}}^{2}}{P_{\mathrm{max}}}\mathbf{I}_{M_{\mathrm{B}}M_{\mathrm{R}}}\big{)}^{-1}, k𝒦k\in\mathcal{K}.

B. MSE Performance

Next, we present the estimation performance of the proposed RIS-TX and the classical CscdChn training scheme, including both LMMSE and LS estimation.

1) LMMSE estimation

i) RIS-TX training

Note the receivers 𝐖G\mathbf{W}_{\mathrm{G}} and 𝓦h\bm{\mathcal{W}}_{\mathrm{h}} are coupled (see (10)) and hence the joint optimal receivers cannot be obtained analytically. To obtain insight, we adopt a suboptimal yet simple receiving scheme—the BS/users conduct LMMSE receiving to minimize CE error of their direct channel to RIS. Specifically, substituting 𝐖G=(𝖠G(𝚿)𝐑G𝖠GH(𝚿)+𝚺B𝐈MR)1𝖠G(𝚿)𝐑G\mathbf{W}_{\mathrm{G}}=(\mathsf{A}_{\mathrm{G}}(\bm{\Psi})\mathbf{R}_{\mathrm{G}}\mathsf{A}_{\mathrm{G}}^{\mathrm{H}}(\bm{\Psi})+\bm{\Sigma}_{\mathrm{B}}\otimes\mathbf{I}_{M_{\mathrm{R}}})^{-1}\mathsf{A}_{\mathrm{G}}(\bm{\Psi})\mathbf{R}_{\mathrm{G}} and 𝐖h,k=(𝚿T𝐑h,k𝚿+𝚺U,k)1𝚿T𝐑h,k,k𝒦\mathbf{W}_{\mathrm{h},k}=(\bm{\Psi}^{\mathrm{T}}\mathbf{R}_{\mathrm{h},k}\bm{\Psi}^{*}+\bm{\Sigma}_{\mathrm{U},k})^{-1}\bm{\Psi}^{\mathrm{T}}\mathbf{R}_{\mathrm{h},k},\;k\in\mathcal{K}, into (10), the MSE for DFT pilot and Rayleigh fading scenario reduces to (56), as shown on the top of next page.

𝐂Hc,k=ρG2ρh,kσU,k2MRPmax+ρh,k2ρGσB2MRPmax+ρGρh,kσB2σU,k2MR2(ρGPmax+σB2MR)(ρh,kPmax+σU,k2MR)𝐈MBMR,k𝒦.\mathbf{C}_{\mathrm{H}_{\mathrm{c},k}}=\frac{\rho_{\mathrm{G}}^{2}\rho_{\mathrm{h},k}\sigma_{\mathrm{U},k}^{2}M_{\mathrm{R}}P_{\mathrm{max}}+\rho_{\mathrm{h},k}^{2}\rho_{\mathrm{G}}\sigma_{\mathrm{B}}^{2}M_{\mathrm{R}}P_{\mathrm{max}}+\rho_{\mathrm{G}}\rho_{\mathrm{h},k}\sigma_{\mathrm{B}}^{2}\sigma_{\mathrm{U},k}^{2}M_{\mathrm{R}}^{2}}{(\rho_{\mathrm{G}}P_{\mathrm{max}}+\sigma_{\mathrm{B}}^{2}M_{\mathrm{R}})(\rho_{\mathrm{h},k}P_{\mathrm{max}}+\sigma_{\mathrm{U},k}^{2}M_{\mathrm{R}})}\mathbf{I}_{M_{\mathrm{B}}M_{\mathrm{R}}},\;k\in\mathcal{K}. (56)

ii) CscdChn training

Utilizing the DFT pilot and Rayleigh fading hypothesis, the expression in (55) can be simplified into

𝐂~Hc,k=ρGρh,kσB2ρGρh,kMRPmax+σB2𝐈MBMR,k𝒦.\tilde{\mathbf{C}}_{\mathrm{H}_{\mathrm{c},k}}=\frac{\rho_{\mathrm{G}}\rho_{\mathrm{h},k}\sigma_{\mathrm{B}}^{2}}{\rho_{\mathrm{G}}\rho_{\mathrm{h},k}M_{\mathrm{R}}P_{\mathrm{max}}+\sigma_{\mathrm{B}}^{2}}\mathbf{I}_{M_{\mathrm{B}}M_{\mathrm{R}}},\;k\in\mathcal{K}. (57)
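As a sanity check, the reduction of (55) to (57) can be verified numerically. The sketch below assumes A_G(Phi_p) = Phi_p^T ⊗ I_{M_B} (consistent with (54) and (68)) and R_{H_c,k} = ρ_G ρ_{h,k} I under Rayleigh fading; these identifications are inferred from the surrounding equations, and all parameter values are arbitrary placeholders.

```python
import numpy as np

M_R, M_B = 8, 2
P_max, sigma_B2 = 1.0, 0.1
rho_G, rho_h = 0.5, 0.3

# DFT training phase-shift matrix Phi_p (M_R x M_R)
i, j = np.meshgrid(np.arange(M_R), np.arange(M_R), indexing="ij")
Phi_p = np.exp(-1j * 2 * np.pi * i * j / M_R)

A = np.kron(Phi_p.T, np.eye(M_B))            # assumed A_G(Phi_p) = Phi_p^T kron I_MB
R = rho_G * rho_h * np.eye(M_B * M_R)        # R_{Hc,k} under Rayleigh fading

# LMMSE MSE matrix, eq. (55)
J = np.linalg.inv(A @ R @ A.conj().T + (sigma_B2 / P_max) * np.eye(M_B * M_R))
C_55 = R - R @ A.conj().T @ J @ A @ R

# Closed form, eq. (57)
c_57 = rho_G * rho_h * sigma_B2 / (rho_G * rho_h * M_R * P_max + sigma_B2)
print(np.allclose(C_55, c_57 * np.eye(M_B * M_R)))   # expected: True
```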

2) LS estimation

i) RIS-TX training

When utilizing the LS estimator, i.e., 𝐖G=𝖠G(𝚿)(𝖠GH(𝚿)𝖠G(𝚿))1\mathbf{W}_{\mathrm{G}}=\mathsf{A}_{\mathrm{G}}(\bm{\Psi})(\mathsf{A}_{\mathrm{G}}^{\mathrm{H}}(\bm{\Psi})\mathsf{A}_{\mathrm{G}}(\bm{\Psi}))^{-1} and 𝐖h,k=𝚿T(𝚿𝚿T)1,k𝒦\mathbf{W}_{\mathrm{h},k}=\bm{\Psi}^{\mathrm{T}}(\bm{\Psi}^{*}\bm{\Psi}^{\mathrm{T}})^{-1},\;k\in\mathcal{K}, under the DFT pilot and Rayleigh fading assumptions, (10) simplifies to (58), shown on the top of the next page. (Note that the channel statistics are used here only for the purpose of analysis; they are indeed unavailable when performing LS estimation.)

𝐂Hc,kLS=ρh,kσB2MRPmax+ρGσU,k2MRPmax+σB2σU,k2MR2Pmax2𝐈MBMR,k𝒦.\mathbf{C}_{\mathrm{H}_{\mathrm{c},k}}^{\mathrm{LS}}=\frac{\rho_{\mathrm{h},k}\sigma_{\mathrm{B}}^{2}M_{\mathrm{R}}P_{\mathrm{max}}+\rho_{\mathrm{G}}\sigma_{\mathrm{U},k}^{2}M_{\mathrm{R}}P_{\mathrm{max}}+\sigma_{\mathrm{B}}^{2}\sigma_{\mathrm{U},k}^{2}M_{\mathrm{R}}^{2}}{P_{\mathrm{max}}^{2}}\mathbf{I}_{M_{\mathrm{B}}M_{\mathrm{R}}},\;k\in\mathcal{K}. (58)

ii) CscdChn training

Applying LS estimation to (54), under the DFT pilot and Rayleigh fading assumptions, the error covariance matrix of the CscdChn estimate is given as

𝐂~Hc,kLS\displaystyle\tilde{\mathbf{C}}_{\mathrm{H}_{\mathrm{c},k}}^{\mathrm{LS}} =σB2Pmax((𝚽p𝚽pT)1𝐈MB)\displaystyle=\frac{\sigma_{\mathrm{B}}^{2}}{P_{\mathrm{max}}}((\bm{\Phi}_{\mathrm{p}}^{*}\bm{\Phi}_{\mathrm{p}}^{\mathrm{T}})^{-1}\otimes\mathbf{I}_{M_{\mathrm{B}}}) (59)
=σB2MRPmax𝐈MBMR,k𝒦.\displaystyle=\frac{\sigma_{\mathrm{B}}^{2}}{M_{\mathrm{R}}P_{\mathrm{max}}}\mathbf{I}_{M_{\mathrm{B}}M_{\mathrm{R}}},\;k\in\mathcal{K}. (60)

C. CRLB Analysis

In this subsection, we present CRLBs for both the RIS-TX and CscdChn training schemes and compare their performance.

1) CRLB for RIS-TX scheme

Based on signal model (4), the likelihood function of the random vector 𝐲B\mathbf{y}_{\mathrm{B}} conditioned on 𝐠\mathbf{g} is

𝖿R(𝐲B;𝐠)=e(𝚺B𝐈MR)12(𝐲B((𝚽𝐏)T𝐈MB)𝐠)22πMBMR𝖽𝖾𝗍(𝚺B𝐈MR).\mathsf{f}_{\mathrm{R}}(\mathbf{y}_{\mathrm{B}};\mathbf{g})=\frac{e^{-\|(\bm{\Sigma}_{\mathrm{B}}\otimes\mathbf{I}_{M_{\mathrm{R}}})^{-\frac{1}{2}}(\mathbf{y}_{\mathrm{B}}-((\bm{\Phi}\mathbf{P})^{\mathrm{T}}\otimes\mathbf{I}_{M_{\mathrm{B}}})\mathbf{g})\|_{2}^{2}}}{\pi^{M_{\mathrm{B}}M_{\mathrm{R}}}\mathsf{det}(\bm{\Sigma}_{\mathrm{B}}\otimes\mathbf{I}_{M_{\mathrm{R}}})}. (61)

The Fisher information matrix (FIM) of 𝐠\mathbf{g} is given as [37]

𝐉(𝐠)𝔼{2𝗅𝗇𝖿R(𝐲B;𝐠)𝐠𝐠}\displaystyle\mathbf{J}(\mathbf{g})\triangleq\mathbb{E}\bigg{\{}-\frac{\partial^{2}\mathsf{ln}\;\mathsf{f}_{\mathrm{R}}(\mathbf{y}_{\mathrm{B}};\mathbf{g})}{\partial\mathbf{g}^{*}\partial\mathbf{g}}\bigg{\}}
=((𝚽𝐏)𝐈MB)(𝚺B𝐈MR)1((𝚽𝐏)T𝐈MB).\displaystyle=((\bm{\Phi}\mathbf{P})^{*}\otimes\mathbf{I}_{M_{\mathrm{B}}})(\bm{\Sigma}_{\mathrm{B}}\otimes\mathbf{I}_{M_{\mathrm{R}}})^{-1}((\bm{\Phi}\mathbf{P})^{\mathrm{T}}\otimes\mathbf{I}_{M_{\mathrm{B}}}). (62)

Following a similar procedure, the FIM of 𝐡k,k𝒦\mathbf{h}_{k},\;k\in\mathcal{K}, can be expressed as

𝐉(𝐡k)=(𝚽𝐏)𝚺U,k1(𝚽𝐏)T,k𝒦.\mathbf{J}(\mathbf{h}_{k})=(\bm{\Phi}\mathbf{P})^{*}\bm{\Sigma}_{\mathrm{U},k}^{-1}(\bm{\Phi}\mathbf{P})^{\mathrm{T}},\;k\in\mathcal{K}. (63)

By exploiting [37, Eq. (3.30)], the CRLB of effective CscdChn’s MSE matrix 𝐂Hc,k,k𝒦\mathbf{C}_{\mathrm{H_{c}},k},\;k\in\mathcal{K}, is given as follows

𝐂Hc,k(𝐡c,kT𝐠)H𝐉1(𝐠)𝐡c,kT𝐠+(𝐡c,kT𝐡k)H𝐉1(𝐡k)𝐡c,kT𝐡k,\displaystyle\mathbf{C}_{\mathrm{H_{c}},k}\!\!\succeq\!\!\bigg{(}\!\frac{\partial\mathbf{h}_{\mathrm{c},k}^{\mathrm{T}}}{\partial\mathbf{g}}\!\bigg{)}^{\mathrm{H}}\!\!\mathbf{J}^{-\!1}(\!\mathbf{g}\!)\frac{\partial\mathbf{h}_{\mathrm{c},k}^{\mathrm{T}}}{\partial\mathbf{g}}\!\!+\!\!\bigg{(}\!\frac{\partial\mathbf{h}_{\mathrm{c},k}^{\mathrm{T}}}{\partial\mathbf{h}_{k}}\!\bigg{)}^{\mathrm{H}}\!\!\mathbf{J}^{-\!1}(\!\mathbf{h}_{k}\!)\frac{\partial\mathbf{h}_{\mathrm{c},k}^{\mathrm{T}}}{\partial\mathbf{h}_{k}}, (64)

where

𝐡c,kT𝐠=𝖣𝗂𝖺𝗀(𝐡k)𝐈MB,k𝒦,\displaystyle\frac{\partial\mathbf{h}_{\mathrm{c},k}^{\mathrm{T}}}{\partial\mathbf{g}}=\mathsf{Diag}(\mathbf{h}_{k})\otimes\mathbf{I}_{M_{\mathrm{B}}},\;k\in\mathcal{K},
𝐡c,kT𝐡k=𝖻𝗅𝗄𝖽𝗂𝖺𝗀([𝐆]:,1T,,[𝐆]:,MRT),k𝒦.\displaystyle\frac{\partial\mathbf{h}_{\mathrm{c},k}^{\mathrm{T}}}{\partial\mathbf{h}_{k}}=\mathsf{blkdiag}([\mathbf{G}]_{:,1}^{\mathrm{T}},\cdots,[\mathbf{G}]_{:,M_{\mathrm{R}}}^{\mathrm{T}}),\;k\in\mathcal{K}. (65)

Utilizing DFT pilot and assuming Rayleigh fading channels, the NMSE of the RIS-TX scheme is lower-bounded by

𝖭𝖬𝖲𝖤k𝖳𝗋{𝐂Hc,k}𝖳𝗋{𝐑Hc,k}σB2MB𝐡k22+σU,k2𝐆F2ρGρh,kMBPmax,k𝒦,\mathsf{NMSE}_{k}\!\!\triangleq\!\!\frac{\mathsf{Tr}\{\!\mathbf{C}_{\mathrm{H}_{\mathrm{c},k}}\!\}}{\mathsf{Tr}\{\!\mathbf{R}_{\mathrm{H}_{\mathrm{c},k}}\!\}}\!\!\geq\!\!\frac{\sigma_{\mathrm{B}}^{2}M_{\mathrm{B}}\|\mathbf{h}_{k}\|_{2}^{2}\!+\!\sigma_{\mathrm{U},k}^{2}\|\mathbf{G}\|_{\mathrm{F}}^{2}}{\rho_{\mathrm{G}}\rho_{\mathrm{h},k}M_{\mathrm{B}}P_{\mathrm{max}}},k\!\in\!\mathcal{K}, (66)

where 𝖳𝗋{𝐑Hc,k}=ρGρh,kMBMR,k𝒦\mathsf{Tr}\{\mathbf{R}_{\mathrm{H}_{\mathrm{c},k}}\}=\rho_{\mathrm{G}}\rho_{\mathrm{h},k}M_{\mathrm{B}}M_{\mathrm{R}},\;k\in\mathcal{K}. Note that the above CRLB depends on the instantaneous realizations of 𝐡k,k𝒦\mathbf{h}_{k},\;k\in\mathcal{K}, and 𝐆\mathbf{G}. To remove this randomness, we take expectations with respect to the channels and obtain

𝖭𝖬𝖲𝖤kρh,kσB2MR+ρGσU,k2MRρGρh,kPmax,k𝒦.\mathsf{NMSE}_{k}\geq\frac{\rho_{\mathrm{h},k}\sigma_{\mathrm{B}}^{2}M_{\mathrm{R}}+\rho_{\mathrm{G}}\sigma_{\mathrm{U},k}^{2}M_{\mathrm{R}}}{\rho_{\mathrm{G}}\rho_{\mathrm{h},k}P_{\mathrm{max}}},\;k\in\mathcal{K}. (67)
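The averaging step leading to (67) uses E{‖h_k‖²} = ρ_{h,k}M_R and E{‖G‖_F²} = ρ_G M_B M_R under Rayleigh fading. A quick Monte Carlo check of this step is sketched below; the parameter values are arbitrary placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
M_R, M_B = 32, 4
P_max, sigma_B2, sigma_U2 = 1.0, 1e-3, 1e-3
rho_G, rho_h = 1e-2, 1e-2

def crandn(*shape):
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

# Monte Carlo average of the instantaneous bound in (66)
trials = 20000
acc = 0.0
for _ in range(trials):
    h = np.sqrt(rho_h) * crandn(M_R)
    G = np.sqrt(rho_G) * crandn(M_B, M_R)
    acc += (sigma_B2 * M_B * np.linalg.norm(h)**2
            + sigma_U2 * np.linalg.norm(G, 'fro')**2) / (rho_G * rho_h * M_B * P_max)
mc_bound = acc / trials

# Closed-form averaged bound, eq. (67)
bound_67 = (rho_h * sigma_B2 * M_R + rho_G * sigma_U2 * M_R) / (rho_G * rho_h * P_max)
print(mc_bound, bound_67)   # the two values should be close
```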

2) CRLB for CscdChn scheme

The likelihood function of the received signal in (54) given 𝐡c,k,k𝒦\mathbf{h}_{\mathrm{c},k},\;k\in\mathcal{K}, is

𝖿C(𝐲~B,k;𝐡c,k)=e𝐲~B,kPmax(𝚽pT𝐈MB)𝐡c,k22σB2Pmax(πσB2Pmax)MBMR,k𝒦.\mathsf{f}_{\mathrm{C}}(\tilde{\mathbf{y}}_{\mathrm{B},k};\mathbf{h}_{\mathrm{c},k})=\frac{e^{-\frac{\|\tilde{\mathbf{y}}_{\mathrm{B},k}-P_{\mathrm{max}}(\bm{\Phi}_{\mathrm{p}}^{\mathrm{T}}\otimes\mathbf{I}_{M_{\mathrm{B}}})\mathbf{h}_{\mathrm{c},k}\|_{2}^{2}}{\sigma_{\mathrm{B}}^{2}P_{\mathrm{max}}}}}{(\pi\sigma_{\mathrm{B}}^{2}P_{\mathrm{max}})^{M_{\mathrm{B}}M_{\mathrm{R}}}},\;k\in\mathcal{K}. (68)

The FIM of 𝐡c,k,k𝒦\mathbf{h}_{\mathrm{c},k},\;k\in\mathcal{K}, can be readily obtained as

𝐉~(𝐡c,k)=PmaxσB2((𝚽p𝚽pT)𝐈MB),k𝒦.\tilde{\mathbf{J}}(\mathbf{h}_{\mathrm{c},k})=\frac{P_{\mathrm{max}}}{\sigma_{\mathrm{B}}^{2}}((\bm{\Phi}_{\mathrm{p}}^{*}\bm{\Phi}_{\mathrm{p}}^{\mathrm{T}})\otimes\mathbf{I}_{M_{\mathrm{B}}}),\;k\in\mathcal{K}. (69)

Under the DFT pilot and Rayleigh fading channel assumptions, the CRLB of the CscdChn scheme’s NMSE is

𝖭𝖬𝖲𝖤k𝖳𝗋{𝐉~1(𝐡c,k)}𝖳𝗋{𝐑Hc,k}=σB2ρGρh,kMRPmax,k𝒦.\mathsf{NMSE}_{k}\geq\frac{\mathsf{Tr}\{\tilde{\mathbf{J}}^{-1}(\mathbf{h}_{\mathrm{c},k})\}}{\mathsf{Tr}\{\mathbf{R}_{\mathrm{H_{c}},k}\}}=\frac{\sigma_{\mathrm{B}}^{2}}{\rho_{\mathrm{G}}\rho_{\mathrm{h},k}M_{\mathrm{R}}P_{\mathrm{max}}},\;k\in\mathcal{K}. (70)

3) RIS-TX v.s. CscdChn — The Watershed MRM_{\mathrm{R}}

Comparing (67) and (70), we notice one important fact: MRM_{\mathrm{R}} exerts opposite impacts on the two competing CE schemes. Growing MRM_{\mathrm{R}} increases the RIS-TX scheme’s CRLB while lowering that of the CscdChn method, which implies that there exists a threshold MRThM_{\mathrm{R}}^{\mathrm{Th}} such that the RIS-TX scheme outperforms CscdChn when MR<MRThM_{\mathrm{R}}<M_{\mathrm{R}}^{\mathrm{Th}} and vice versa. In the following, we investigate the impact of PmaxP_{\mathrm{max}} and pathloss fading on the cross-point MRThM_{\mathrm{R}}^{\mathrm{Th}}.

Fig. 2. The impact of pathloss fading on MRThM_{\mathrm{R}}^{\mathrm{Th}} (Pmax=50dBmP_{\mathrm{max}}=50\mathrm{dBm}, {σU,k2}k=1K=σB2=90dBm\{\sigma_{\mathrm{U},k}^{2}\}_{k=1}^{K}=\sigma_{\mathrm{B}}^{2}=-90\mathrm{dBm}).

i) Impact of PmaxP_{\mathrm{max}} on MRThM_{\mathrm{R}}^{\mathrm{Th}}

With the other system settings fixed, MRThM_{\mathrm{R}}^{\mathrm{Th}} does not change as PmaxP_{\mathrm{max}} varies. In fact, dividing the right-hand side of (66) by that of (70) yields the ratio (σB2MB𝐡k22+σU,k2𝐆F2)MRσB2MB\frac{(\sigma_{\mathrm{B}}^{2}M_{\mathrm{B}}\|\mathbf{h}_{k}\|_{2}^{2}+\sigma_{\mathrm{U},k}^{2}\|\mathbf{G}\|_{\mathrm{F}}^{2})M_{\mathrm{R}}}{\sigma_{\mathrm{B}}^{2}M_{\mathrm{B}}}, which is independent of PmaxP_{\mathrm{max}}.

ii) Impact of pathloss fading on MRThM_{\mathrm{R}}^{\mathrm{Th}}

To investigate the impact of pathloss on MRThM_{\mathrm{R}}^{\mathrm{Th}}, we adopt the classical 3GPP Urban Micro channel model [38] in the following comparison. Specifically, according to [38], the channel fading gain for a typical indoor environment is given as

𝖿(d)=Gt+Gr37.522log10(d/1m)[dB],\mathsf{f}(d)=G_{\mathrm{t}}+G_{\mathrm{r}}-37.5-22\log_{10}(d/1\mathrm{m})\;[\mathrm{dB}], (71)

where GtG_{\mathrm{t}} and GrG_{\mathrm{r}} indicate the antenna gains (in dBi) at the transmitter and the receiver, respectively, and dd is the distance (in meters) between the TX and RX terminals (here dd is the distance between the RIS and BS/users). Besides, we denote {σU,k2}k=1K=σB2=σ2\{\sigma_{\mathrm{U},k}^{2}\}_{k=1}^{K}=\sigma_{\mathrm{B}}^{2}=\sigma^{2} and omit the subscript kk in (67) and (70) for simplicity. In Fig. 2, under various pathloss settings (i.e. dd), we present the CRLBs of the two CE schemes and the associated cross-point MRThM_{\mathrm{R}}^{\mathrm{Th}}. The results in Fig. 2 indicate that the severer the pathloss, the larger MRThM_{\mathrm{R}}^{\mathrm{Th}} becomes. The underlying reason is that the CscdChn scheme experiences “double” fading while RIS-TX experiences only “single” fading, so the CscdChn scheme is affected by pathloss much more severely.

In particular, as shown in Fig. 2, for typical indoor scenarios, the CscdChn scheme outperforms its RIS-TX counterpart only when the RIS is very large (i.e., MR3000M_{\mathrm{R}}\geq 3000), which implies that our RIS-TX scheme is beneficial for RIS devices of small or moderate size.
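Under the equal-noise setting σ_U² = σ_B² = σ², equating (67) and (70) gives the closed-form cross-point M_R^Th = σ_B/sqrt(ρ_h σ_B² + ρ_G σ_U²) = 1/sqrt(ρ_G + ρ_h). The sketch below evaluates this cross-point against the distance d via (71); the antenna gains G_t and G_r are assumed values (they are not specified in this section), so the resulting numbers only indicate the trend observed in Fig. 2.

```python
import numpy as np

sigma2 = 1e-12           # -90 dBm in watts
Gt, Gr = 0.0, 0.0        # assumed antenna gains in dBi (placeholders)

def rho(d):
    """Large-scale gain from the 3GPP UMi model in (71), converted to linear scale."""
    return 10 ** ((Gt + Gr - 37.5 - 22 * np.log10(d)) / 10)

for d in [20, 40, 80, 160]:
    rho_G = rho_h = rho(d)     # equal BS-RIS and RIS-user distances
    # cross-point of (67) and (70); equals 1/sqrt(rho_G + rho_h) when noise powers match
    M_R_th = np.sqrt(sigma2) / np.sqrt(rho_h * sigma2 + rho_G * sigma2)
    print(f"d = {d:4d} m  ->  M_R^Th ~ {M_R_th:.0f}")
```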

Remark 3. Note that the above discussion on MRThM_{\mathrm{R}}^{\mathrm{Th}} is based on CRLB, which is only achievable when PmaxP_{\mathrm{max}} is extremely large. For finite PmaxP_{\mathrm{max}}, the cross-point MRThM_{\mathrm{R}}^{\mathrm{Th}} can be unrealistically large, e.g., \geq 10000 (see Sec. IV. E).

D. Scaling Laws

In the following, we examine the scaling laws of the RIS-TX and CscdChn schemes when PmaxP_{\mathrm{max}} and MRM_{\mathrm{R}} are in the limiting regime. Here, we focus on the CE performance of one specific user kk. The normalized MSE (NMSE) is defined in (66), where 𝖳𝗋{𝐑Hc,k}=ρGρh,kMBMR,k𝒦\mathsf{Tr}\{\mathbf{R}_{\mathrm{H}_{\mathrm{c},k}}\}=\rho_{\mathrm{G}}\rho_{\mathrm{h},k}M_{\mathrm{B}}M_{\mathrm{R}},\;k\in\mathcal{K}, under the Rayleigh fading assumption. For simplicity, we denote {ρh,k}k=1K=ρh\{\rho_{\mathrm{h},k}\}_{k=1}^{K}=\rho_{\mathrm{h}} and σU,k2=σB2=σ2\sigma_{\mathrm{U},k}^{2}=\sigma_{\mathrm{B}}^{2}=\sigma^{2}.

Fig. 3. RIS-TX training v.s. CscdChn training: Impact of PmaxP_{\mathrm{max}} (LMMSE estimation).

1) Scaling law of PmaxP_{\mathrm{max}}

Define 𝖲𝖭𝖱=Pmax/σ2\mathsf{SNR}=P_{\mathrm{max}}/\sigma^{2}. Then, when 𝖲𝖭𝖱\mathsf{SNR} is large, it can be shown that the NMSE of the RIS-TX training scheme, including both LMMSE and LS estimation, can be approximated by

𝖭𝖬𝖲𝖤ρGMR+ρhMRρGρh𝒪(𝖲𝖭𝖱1).\mathsf{NMSE}\approx\frac{\rho_{\mathrm{G}}M_{\mathrm{R}}+\rho_{\mathrm{h}}M_{\mathrm{R}}}{\rho_{\mathrm{G}}\rho_{\mathrm{h}}}\mathcal{O}(\mathsf{SNR}^{-1}). (72)

At the same time, the NMSE of the CscdChn scheme using either LMMSE or LS converges to

𝖭𝖬𝖲𝖤1ρGρhMR𝒪(𝖲𝖭𝖱1).\mathsf{NMSE}\approx\frac{1}{\rho_{\mathrm{G}}\rho_{\mathrm{h}}M_{\mathrm{R}}}\mathcal{O}(\mathsf{SNR}^{-1}). (73)

As seen above, when PmaxP_{\mathrm{max}} is large, the NMSEs of the two competing training schemes both converge to zero at the same 𝒪(𝖲𝖭𝖱1)\mathcal{O}(\mathsf{SNR}^{-1}) rate but with different coefficients: comparing (72) and (73), RIS-TX has the smaller coefficient whenever (ρG+ρh)MR2<1(\rho_{\mathrm{G}}+\rho_{\mathrm{h}})M_{\mathrm{R}}^{2}<1. In realistic settings, where MRM_{\mathrm{R}} is finite (several tens or hundreds) and the BS-RIS and RIS-user distances exceed ten meters, this condition holds and the RIS-TX scheme generally converges faster.
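A small numerical check of this condition, under an assumed pathloss with placeholder antenna gains of 0 dBi in (71), is sketched below.

```python
import numpy as np

Gt = Gr = 0.0                                    # assumed antenna gains (dBi)
d = 80.0                                         # BS-RIS / RIS-user distance in meters
rho = 10 ** ((Gt + Gr - 37.5 - 22 * np.log10(d)) / 10)
rho_G = rho_h = rho

for M_R in [16, 64, 256, 8192]:
    c_ris = (rho_G + rho_h) * M_R / (rho_G * rho_h)   # high-SNR coefficient in (72)
    c_csc = 1.0 / (rho_G * rho_h * M_R)               # high-SNR coefficient in (73)
    better = "RIS-TX" if c_ris < c_csc else "CscdChn"
    print(f"M_R = {M_R:5d}: RIS-TX coef {c_ris:.2e}, CscdChn coef {c_csc:.2e} -> {better} smaller")
```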

Fig. 4. RIS-TX training v.s. CscdChn training: Impact of MRM_{\mathrm{R}} (LMMSE estimation).
Fig. 5. RIS-TX training v.s. CscdChn training: Coverage (LMMSE estimation).

2) Scaling law of MRM_{\mathrm{R}}

When MRM_{\mathrm{R}} tends to infinity, it can be observed that the NMSE of the RIS-TX training method exploiting LMMSE estimation converges to ρGρhσ4σ4×MBMRρGρhMBMR=1\frac{\rho_{\mathrm{G}}\rho_{\mathrm{h}}\sigma^{4}}{\sigma^{4}}\times\frac{M_{\mathrm{B}}M_{\mathrm{R}}}{\rho_{\mathrm{G}}\rho_{\mathrm{h}}M_{\mathrm{B}}M_{\mathrm{R}}}=1, while that utilizing the LS estimator approaches

𝖭𝖬𝖲𝖤1ρGρh𝖲𝖭𝖱2𝒪(MR2)MR.\mathsf{NMSE}\approx\frac{1}{\rho_{\mathrm{G}}\rho_{\mathrm{h}}\mathsf{SNR}^{2}}\mathcal{O}(M_{\mathrm{R}}^{2})\xrightarrow{M_{\mathrm{R}}\rightarrow\infty}\infty. (74)

Meanwhile, the NMSE of the CscdChn scheme leveraging both LMMSE and LS estimation converges to

𝖭𝖬𝖲𝖤1ρGρh𝖲𝖭𝖱𝒪(MR1)MR0.\mathsf{NMSE}\approx\frac{1}{\rho_{\mathrm{G}}\rho_{\mathrm{h}}\mathsf{SNR}}\mathcal{O}(M_{\mathrm{R}}^{-1})\xrightarrow{M_{\mathrm{R}}\rightarrow\infty}0. (75)

As demonstrated above, when the RIS size is extremely large, the NMSE of the CscdChn training scheme converges to zero under both LMMSE and LS estimation, and it eventually outperforms the RIS-TX estimation method, whose NMSE converges to a constant (LMMSE estimator) or diverges (LS estimator). The reason is that, when MRM_{\mathrm{R}} is large, more element-wise CscdChns are combined to form the effective channel in the CscdChn estimation scheme, which enlarges the effective channel’s magnitude and therefore improves the CE performance. In contrast, for the RIS-TX training scheme, when MRM_{\mathrm{R}} grows, the average power allocated to each element-wise channel decreases, which deteriorates the CE precision. Therefore, MRM_{\mathrm{R}} exerts opposite influences on the two training schemes.
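These limits can also be seen by directly evaluating the NMSE implied by (56)-(58) for growing M_R (each MSE matrix is a scalar multiple of the identity, so the NMSE equals that scalar divided by ρ_Gρ_h). A short sketch with placeholder parameters follows.

```python
import numpy as np

P_max, sB2, sU2 = 1.0, 1e-3, 1e-3
rho_G, rho_h = 1e-2, 1e-2

def nmse_ris_lmmse(M_R):   # from (56), divided by rho_G*rho_h
    num = (rho_G**2 * rho_h * sU2 * M_R * P_max + rho_h**2 * rho_G * sB2 * M_R * P_max
           + rho_G * rho_h * sB2 * sU2 * M_R**2)
    den = (rho_G * P_max + sB2 * M_R) * (rho_h * P_max + sU2 * M_R)
    return num / den / (rho_G * rho_h)

def nmse_ris_ls(M_R):      # from (58), divided by rho_G*rho_h
    return (rho_h * sB2 * M_R * P_max + rho_G * sU2 * M_R * P_max
            + sB2 * sU2 * M_R**2) / P_max**2 / (rho_G * rho_h)

def nmse_csc_lmmse(M_R):   # from (57), divided by rho_G*rho_h
    return sB2 / (rho_G * rho_h * M_R * P_max + sB2)

for M_R in [10, 100, 1000, 10000, 100000]:
    print(M_R, nmse_ris_lmmse(M_R), nmse_ris_ls(M_R), nmse_csc_lmmse(M_R))
# RIS-TX LMMSE -> 1, RIS-TX LS -> infinity, CscdChn -> 0 as M_R grows
```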

The above asymptotic analysis appears to suggest that CscdChn training can be more beneficial than the RIS-TX scheme. However, realistic settings of PmaxP_{\mathrm{max}} and MRM_{\mathrm{R}} lie in the finite regime, as elaborated in the next subsection.

E. CE Performance for Realistic Scenarios

We continue to assess the performance of the different training schemes in realistic settings. To this end, we still adopt the classical 3GPP Urban Micro channel model [38]. Based on (71), we consider a typical indoor setting in which both the BS-RIS and RIS-user distances are set to d=80md=80\mathrm{m} and σU,k2=σB2=90dBm\sigma_{\mathrm{U},k}^{2}=\sigma_{\mathrm{B}}^{2}=-90\mathrm{dBm} in the following assessment.

1) Comparison of PmaxP_{\mathrm{max}}

Based on the above setting, we visually compare the two training schemes using LMMSE estimation by plotting their NMSE curves w.r.t. PmaxP_{\mathrm{max}} in Fig. 3, whose left and right halves illustrate the RIS-TX and CscdChn schemes, respectively. As shown by Fig. 3, the RIS-TX scheme significantly outperforms its CscdChn counterpart. For instance, with MR=64M_{\mathrm{R}}=64, to achieve an NMSE of 10dB-10\mathrm{dB}, the TX power required by RIS-TX is 0.032W0.032\mathrm{W}, while that of the CscdChn counterpart reaches up to 105.1W105.1\mathrm{W}. When PmaxP_{\mathrm{max}} is 20dBm20\mathrm{dBm}, RIS-TX is 14.64dB14.64\mathrm{dB} superior to CscdChn in NMSE.
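The transmit power required to reach a target NMSE can be read off from the closed forms (56) and (57), as in the sketch below. The antenna gains entering (71) are assumed placeholders here, so the absolute numbers differ from those quoted above, but the orders-of-magnitude gap between the two schemes is reproduced.

```python
import numpy as np

M_R = 64
sB2 = sU2 = 1e-12                              # -90 dBm
Gt = Gr = 0.0                                  # assumed antenna gains (dBi)
d = 80.0
rho = 10 ** ((Gt + Gr - 37.5 - 22 * np.log10(d)) / 10)
rho_G = rho_h = rho
target = 10 ** (-10 / 10)                      # NMSE target of -10 dB

def nmse_ris(P):                               # LMMSE RIS-TX, from (56)
    num = (rho_G**2 * rho_h * sU2 * M_R * P + rho_h**2 * rho_G * sB2 * M_R * P
           + rho_G * rho_h * sB2 * sU2 * M_R**2)
    den = (rho_G * P + sB2 * M_R) * (rho_h * P + sU2 * M_R)
    return num / den / (rho_G * rho_h)

def nmse_csc(P):                               # LMMSE CscdChn, from (57)
    return sB2 / (rho_G * rho_h * M_R * P + sB2)

P_grid = np.logspace(-6, 6, 20001)             # candidate powers in watts
P_ris = P_grid[np.argmax(nmse_ris(P_grid) <= target)]
P_csc = P_grid[np.argmax(nmse_csc(P_grid) <= target)]
print(f"Pmax for -10 dB NMSE: RIS-TX ~ {P_ris:.3g} W, CscdChn ~ {P_csc:.3g} W")
```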

2) Comparison of MRM_{\mathrm{R}}

We present the NMSE w.r.t. MRM_{\mathrm{R}} of the RIS-TX and CscdChn schemes in the left and right parts of Fig. 4, respectively, where LMMSE estimation is used. As seen, increasing MRM_{\mathrm{R}} has opposite impacts on the two training schemes. Although growing MRM_{\mathrm{R}} benefits the CscdChn scheme, its MRM_{\mathrm{R}} has to be prohibitively large to beat the RIS-TX counterpart. For instance, with Pmax=20dBmP_{\mathrm{max}}=20\mathrm{dBm}, to reach the 10dB-10\mathrm{dB} NMSE level, MRM_{\mathrm{R}} has to exceed 6.73×1046.73\times 10^{4}, which is unrealistic.

3) Comparison of coverage

We also examine the coverage of the two training schemes in the above realistic setting. The two schemes’ NMSE curves w.r.t. the distance dd are presented in Fig. 5, where LMMSE estimation is used. As seen, by fixing Pmax=15dBmP_{\mathrm{max}}=15\mathrm{dBm} and MR=64M_{\mathrm{R}}=64, to reach 10dB-10\mathrm{dB} NMSE level, the coverage of CscdChn scheme is 12.67m12.67\mathrm{m}, while that of the RIS-TX can extend up to 79.18m79.18\mathrm{m}.

V Simulation Results

Fig. 6. Simulation scenario.
Fig. 7. Convergence of GD algorithm (Alg. 1).

In this section, extensive numerical results will be presented to verify the effectiveness of our algorithms and the benefit of the proposed RIS-TX training scheme. As illustrated in Fig. 6, we consider a system which includes a BS with MB=8M_{\mathrm{B}}=8 antennas, K=4K=4 users and a RIS equipped with one TX RF-chain. The covariance matrices are given by 𝐑G=ρG(𝐑GR𝐑GB)\mathbf{R}_{\mathrm{G}}=\rho_{\mathrm{G}}(\mathbf{R}_{\mathrm{GR}}\otimes\mathbf{R}_{\mathrm{GB}}) and 𝐑h,k=ρh,k𝐑hR,k,k𝒦\mathbf{R}_{\mathrm{h},k}=\rho_{\mathrm{h},k}\mathbf{R}_{\mathrm{hR},k},\;k\in\mathcal{K}, where 𝐑GR\mathbf{R}_{\mathrm{GR}}, 𝐑GB\mathbf{R}_{\mathrm{GB}} and 𝐑hR,k,k𝒦\mathbf{R}_{\mathrm{hR},k},\;k\in\mathcal{K}, denote the spatial correlation matrices at the RIS for 𝐆\mathbf{G}, at the BS for 𝐆\mathbf{G} and at the RIS for 𝐡k,k𝒦\mathbf{h}_{k},\;k\in\mathcal{K}, respectively. Hence, the covariance matrix of 𝐡c,k,k𝒦\mathbf{h}_{\mathrm{c},k},\;k\in\mathcal{K}, can be calculated as 𝐑c,k=ρGρh,k((𝐑hR,k𝐑GR)𝐑GB),k𝒦\mathbf{R}_{\mathrm{c},k}=\rho_{\mathrm{G}}\rho_{\mathrm{h},k}((\mathbf{R}_{\mathrm{hR},k}\odot\mathbf{R}_{\mathrm{GR}})\otimes\mathbf{R}_{\mathrm{GB}}),\;k\in\mathcal{K} [1]. Unless otherwise specified, the distance-dependent pathloss is established as (71), dB={dU,k}k=1K=d=100md_{\mathrm{B}}=\{d_{\mathrm{U},k}\}_{k=1}^{K}=d=100\mathrm{m}, 𝚺B=σB2𝐈MB\bm{\Sigma}_{\mathrm{B}}=\sigma_{\mathrm{B}}^{2}\mathbf{I}_{M_{\mathrm{B}}}, σB2=90dBm\sigma_{\mathrm{B}}^{2}=-90\mathrm{dBm} and σU,k2=80dBm,k𝒦\sigma_{\mathrm{U},k}^{2}=-80\mathrm{dBm},\;k\in\mathcal{K}.

Fig. 8. Convergence of PDD algorithm (Alg. 2).
Fig. 9. The impact of PmaxP_{\mathrm{max}} on CE performance.

Fig. 7 illustrates the convergence of Alg. 1. It can be observed that the GD algorithm generally converges within twenty to thirty iterations.

Fig. 8 verifies the convergence behaviour of the proposed PDD algorithm. The left part of Fig. 8 illustrates the violation of the equality constraints in (24)-(27), and the right part shows the objective value across iterations. The PDD algorithm generally converges very fast.

In Fig. 9, we investigate the impact of the transmit power PmaxP_{\mathrm{max}} on the MSE performance. As suggested by the figure, the MSE decreases as the transmit power grows. Interestingly, the pilots numerically designed by our proposed GD and PDD solutions yield nearly identical performance. Importantly, the results in Fig. 9 show that the pilot sequence yielded by our proposed GD/PDD algorithms outperforms the classical DFT codebook for the RIS-TX training scheme. Note that the DFT sequence is nearly optimal for the classical cascaded CE scheme, as demonstrated in the literature [6]-[8], [10], [11].

Fig. 10 plots the impact of the number of reflecting elements on the CE performance. As shown in Fig. 10, given a specific PmaxP_{\mathrm{max}}, the CE performance degrades as the number of RIS elements MRM_{\mathrm{R}} increases, for the reason presented in Sec. IV. D. Besides, as previously discussed, the pilot sequences yielded by our GD/PDD algorithms outperform the DFT counterpart.

Fig. 10. The impact of MRM_{\mathrm{R}} on CE performance.
Fig. 11. RIS-TX training v.s. CscdChn training: Impact of PmaxP_{\mathrm{max}} (LS estimation, d=80md=80\mathrm{m}, σU,k2=σB2=90dBm\sigma_{\mathrm{U},k}^{2}=\sigma_{\mathrm{B}}^{2}=-90\mathrm{dBm}).
Fig. 12. RIS-TX training v.s. CscdChn training: Impact of MRM_{\mathrm{R}} (LS estimation, d=80md=80\mathrm{m}, σU,k2=σB2=90dBm\sigma_{\mathrm{U},k}^{2}=\sigma_{\mathrm{B}}^{2}=-90\mathrm{dBm}).
Fig. 13. RIS-TX training v.s. CscdChn training: Coverage (LS estimation, σU,k2=σB2=90dBm\sigma_{\mathrm{U},k}^{2}=\sigma_{\mathrm{B}}^{2}=-90\mathrm{dBm}).

In Fig. 11, we compare the MSE performance of our proposed RIS-TX CE scheme and the standard CscdChn training scheme [8], [9] based on LS estimation. Specifically, the left half presents the CE performance of the RIS-TX training scheme while the right half plots that of the standard CscdChn training. As suggested by the results in Fig. 11, the proposed RIS-TX scheme significantly outperforms CscdChn training. This is because the double-fading effect severely attenuates the power of the CscdChn. At the same time, it is interesting to note that increasing MRM_{\mathrm{R}} tends to improve the CE performance of the CscdChn training scheme, as opposed to the impact observed for the RIS-TX scheme (recall Fig. 10). A detailed explanation of this phenomenon has been given previously. Besides, the LMMSE estimator dramatically outperforms its LS counterpart, since the latter does not suppress the noise, which is much stronger than the received pilot signal (recall Fig. 3).

In Fig. 12, the impact of MRM_{\mathrm{R}} on the CE precision is illustrated for LS estimation. The left half exhibits the NMSE performance of the RIS-TX training scheme and the right half presents that of the CscdChn training. As shown by this figure, for this test case, which corresponds to a typical indoor setting, the RIS-TX scheme is clearly superior to the CscdChn training. Besides, as previously explained, increasing MRM_{\mathrm{R}} exerts opposite influences on the NMSE of the two channel training schemes.

In Fig. 13, we investigate the coverage of the proposed RIS-TX and CscdChn training schemes using the LS estimator. Based on the pathloss model specified in [38] (see (71)), we plot the relation between the NMSE and the distance dd in Fig. 13. Taking a reasonable NMSE level of 10dB-10\mathrm{dB} as the target, we indicate the coverage difference between the two channel training schemes in the plots. As can be seen, the proposed RIS-TX scheme can enlarge the coverage by several tens or even a hundred meters in a typical indoor scenario.

VI Conclusion

In this paper, we propose a novel RIS-TX channel training scheme, where the RIS is equipped with one TX RF-chain and is able to transmit pilot signals during the training period. The new training scheme’s pilot sequence design problem turns out to be a very challenging quartic optimization problem. We have developed two efficient solutions to design the pilot sequences. Both algorithms yield pilots with estimation performance superior to the DFT sequence, which is widely believed to be nearly optimal in the literature. Both theoretical analysis and numerical results show that our proposed RIS-TX training scheme can significantly outperform the classical CscdChn training method in realistic scenarios.

A. Derivation of (10)

According to the definition of 𝐂Hc,k,k𝒦\mathbf{C}_{\mathrm{H}_{\mathrm{c}},k},\;k\in\mathcal{K}, its value can be calculated as

𝐂Hc,k=𝔼{𝗏𝖾𝖼(𝐆𝖣𝗂𝖺𝗀(𝐡k))𝗏𝖾𝖼H(𝐆𝖣𝗂𝖺𝗀(𝐡k))}\displaystyle\mathbf{C}_{\mathrm{H}_{\mathrm{c}},k}=\mathbb{E}\{\mathsf{vec}(\mathbf{G}\mathsf{Diag}(\mathbf{h}_{k}))\mathsf{vec}^{\mathrm{H}}(\mathbf{G}\mathsf{Diag}(\mathbf{h}_{k}))\}
+𝔼{𝗏𝖾𝖼(𝐆^𝖣𝗂𝖺𝗀(𝐡^k))𝗏𝖾𝖼H(𝐆^𝖣𝗂𝖺𝗀(𝐡^k))}\displaystyle+\mathbb{E}\{\mathsf{vec}(\hat{\mathbf{G}}\mathsf{Diag}(\hat{\mathbf{h}}_{k}))\mathsf{vec}^{\mathrm{H}}(\hat{\mathbf{G}}\mathsf{Diag}(\hat{\mathbf{h}}_{k}))\}
𝔼{𝗏𝖾𝖼(𝐆𝖣𝗂𝖺𝗀(𝐡k))𝗏𝖾𝖼H(𝐆^𝖣𝗂𝖺𝗀(𝐡^k))}\displaystyle-\mathbb{E}\{\mathsf{vec}(\mathbf{G}\mathsf{Diag}(\mathbf{h}_{k}))\mathsf{vec}^{\mathrm{H}}(\hat{\mathbf{G}}\mathsf{Diag}(\hat{\mathbf{h}}_{k}))\}
𝔼{𝗏𝖾𝖼(𝐆^𝖣𝗂𝖺𝗀(𝐡^k))𝗏𝖾𝖼H(𝐆𝖣𝗂𝖺𝗀(𝐡k))},k𝒦.\displaystyle-\mathbb{E}\{\mathsf{vec}(\hat{\mathbf{G}}\mathsf{Diag}(\hat{\mathbf{h}}_{k}))\mathsf{vec}^{\mathrm{H}}(\mathbf{G}\mathsf{Diag}(\mathbf{h}_{k}))\},\;k\in\mathcal{K}. (76)

We firstly evaluate the first two expectations. To this end, we study the following expectation

𝐑=𝔼{𝗏𝖾𝖼(𝐗𝖣𝗂𝖺𝗀(𝐲))𝗏𝖾𝖼H(𝐗𝖣𝗂𝖺𝗀(𝐲))},\mathbf{R}=\mathbb{E}\{\mathsf{vec}(\mathbf{X}\mathsf{Diag}(\mathbf{y}))\mathsf{vec}^{\mathrm{H}}(\mathbf{X}\mathsf{Diag}(\mathbf{y}))\}, (77)

where 𝐗M×N\mathbf{X}\in\mathbb{C}^{M\times N} and 𝐲N\mathbf{y}\in\mathbb{C}^{N} are uncorrelated, 𝗏𝖾𝖼(𝐗)\mathsf{vec}(\mathbf{X}) and 𝐲\mathbf{y} have zero means and the correlation matrices 𝐑X\mathbf{R}_{\mathrm{X}} and 𝐑y\mathbf{R}_{\mathrm{y}}, respectively. Define 𝐃𝖣𝗂𝖺𝗀(𝐲)\mathbf{D}\triangleq\mathsf{Diag}(\mathbf{y}), then, the value of 𝐑\mathbf{R} can be derived as follows

𝐑\displaystyle\mathbf{R} =𝔼{𝗏𝖾𝖼(𝐗𝐃)𝗏𝖾𝖼H(𝐗𝐃)}\displaystyle=\mathbb{E}\{\mathsf{vec}(\mathbf{X}\mathbf{D})\mathsf{vec}^{\mathrm{H}}(\mathbf{X}\mathbf{D})\}
=𝔼𝐗,𝐲{(𝐃T𝐈M)𝗏𝖾𝖼(𝐗)𝗏𝖾𝖼H(𝐗)(𝐃𝐈M)}\displaystyle=\mathbb{E}_{\mathbf{X},\mathbf{y}}\{(\mathbf{D}^{\mathrm{T}}\otimes\mathbf{I}_{M})\mathsf{vec}(\mathbf{X})\mathsf{vec}^{\mathrm{H}}(\mathbf{X})(\mathbf{D}^{\mathrm{*}}\otimes\mathbf{I}_{M})\}
=𝔼𝐲{(𝐃T𝐈M)𝔼𝐗{𝗏𝖾𝖼(𝐗)𝗏𝖾𝖼H(𝐗)}(𝐃𝐈M)}\displaystyle=\mathbb{E}_{\mathbf{y}}\{(\mathbf{D}^{\mathrm{T}}\otimes\mathbf{I}_{M})\mathbb{E}_{\mathbf{X}}\{\mathsf{vec}(\mathbf{X})\mathsf{vec}^{\mathrm{H}}(\mathbf{X})\}(\mathbf{D}^{\mathrm{*}}\otimes\mathbf{I}_{M})\}
=𝔼𝐲{𝖣𝗂𝖺𝗀(𝐲𝟏M)𝐑X𝖣𝗂𝖺𝗀(𝐲𝟏M)}\displaystyle=\mathbb{E}_{\mathbf{y}}\{\mathsf{Diag}(\mathbf{y}\otimes\bm{1}_{M})\mathbf{R}_{\mathrm{X}}\mathsf{Diag}(\mathbf{y}^{*}\otimes\bm{1}_{M})\}
=𝐑X𝔼𝐲{(𝐲𝟏M)(𝐲𝟏M)H}\displaystyle=\mathbf{R}_{\mathrm{X}}\odot\mathbb{E}_{\mathbf{y}}\{(\mathbf{y}\otimes\bm{1}_{M})(\mathbf{y}\otimes\bm{1}_{M})^{\mathrm{H}}\}
=𝐑X𝔼𝐲{(𝐲𝐲H)(𝟏M𝟏MT)}\displaystyle=\mathbf{R}_{\mathrm{X}}\odot\mathbb{E}_{\mathbf{y}}\{(\mathbf{y}\mathbf{y}^{\mathrm{H}})\otimes(\bm{1}_{M}\bm{1}_{M}^{\mathrm{T}})\}
=𝐑X(𝔼𝐲{𝐲𝐲H}𝟙M)=𝐑X(𝐑y𝟙M),\displaystyle=\mathbf{R}_{\mathrm{X}}\odot(\mathbb{E}_{\mathbf{y}}\{\mathbf{y}\mathbf{y}^{\mathrm{H}}\}\otimes\mathbbm{1}_{M})=\mathbf{R}_{\mathrm{X}}\odot(\mathbf{R}_{\mathrm{y}}\otimes\mathbbm{1}_{M}), (78)

where 𝟏M\bm{1}_{M} represents an MM-dimensional all-one vector. Hence, invoking (78), the first two expectations in (76) are given by

𝔼{𝗏𝖾𝖼(𝐆𝖣𝗂𝖺𝗀(𝐡k))𝗏𝖾𝖼H(𝐆𝖣𝗂𝖺𝗀(𝐡k))}=𝐑G(𝐑h,k𝟙MB),\displaystyle\mathbb{E}\{\mathsf{vec}(\mathbf{G}\mathsf{Diag}(\mathbf{h}_{k}))\mathsf{vec}^{\mathrm{H}}(\mathbf{G}\mathsf{Diag}(\mathbf{h}_{k}))\}\!\!=\!\!\mathbf{R}_{\mathrm{G}}\!\!\odot\!\!(\mathbf{R}_{\mathrm{h},k}\!\otimes\!\mathbbm{1}_{M_{\mathrm{B}}}), (79)
𝔼{𝗏𝖾𝖼(𝐆^𝖣𝗂𝖺𝗀(𝐡^k))𝗏𝖾𝖼H(𝐆^𝖣𝗂𝖺𝗀(𝐡^k))}=𝐂G^(𝐂h^,k𝟙MB).\displaystyle\mathbb{E}\{\mathsf{vec}(\hat{\mathbf{G}}\mathsf{Diag}(\hat{\mathbf{h}}_{k}))\mathsf{vec}^{\mathrm{H}}(\hat{\mathbf{G}}\mathsf{Diag}(\hat{\mathbf{h}}_{k}))\}\!\!=\!\!\mathbf{C}_{\hat{\mathrm{G}}}\!\!\odot\!\!(\mathbf{C}_{\hat{\mathrm{h}},k}\!\otimes\!\mathbbm{1}_{M_{\mathrm{B}}}). (80)
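The identity (78) invoked above can be spot-checked by Monte Carlo simulation, as in the following sketch with small, arbitrary dimensions; the residual printed at the end is pure Monte Carlo error.

```python
import numpy as np

rng = np.random.default_rng(1)
M, N = 3, 4
trials = 50000

# Random Hermitian positive-definite correlation matrices R_X (MN x MN) and R_y (N x N)
A = rng.standard_normal((M * N, M * N)) + 1j * rng.standard_normal((M * N, M * N))
R_X = A @ A.conj().T / (M * N)
B = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
R_y = B @ B.conj().T / N

Lx = np.linalg.cholesky(R_X)
Ly = np.linalg.cholesky(R_y)

acc = np.zeros((M * N, M * N), dtype=complex)
for _ in range(trials):
    x = Lx @ ((rng.standard_normal(M * N) + 1j * rng.standard_normal(M * N)) / np.sqrt(2))
    y = Ly @ ((rng.standard_normal(N) + 1j * rng.standard_normal(N)) / np.sqrt(2))
    X = x.reshape(N, M).T                       # vec(X) = x (column-major stacking)
    v = (X @ np.diag(y)).T.reshape(-1)          # vec(X Diag(y))
    acc += np.outer(v, v.conj())
R_emp = acc / trials

R_theory = R_X * np.kron(R_y, np.ones((M, M)))  # R_X Hadamard (R_y kron 1_M 1_M^T), eq. (78)
print(np.max(np.abs(R_emp - R_theory)))         # should be small (Monte Carlo error)
```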

Next, utilizing the following identities that are easily verified

𝔼{𝐠𝐠^H}=𝐑G𝖠GH(𝚿)𝐖G,\displaystyle\mathbb{E}\{\mathbf{g}\hat{\mathbf{g}}^{\mathrm{H}}\}=\mathbf{R}_{\mathrm{G}}\mathsf{A}_{\mathrm{G}}^{\mathrm{H}}(\bm{\Psi})\mathbf{W}_{\mathrm{G}}, (81)
𝔼{𝐡k𝐡^kH}=𝐑h,k𝚿𝐖h,k,k𝒦,\displaystyle\mathbb{E}\{\mathbf{h}_{k}\hat{\mathbf{h}}_{k}^{\mathrm{H}}\}=\mathbf{R}_{\mathrm{h},k}\bm{\Psi}^{\mathrm{*}}\mathbf{W}_{\mathrm{h},k},\;k\in\mathcal{K}, (82)

the third expectation in (76) can be derived as follows

𝔼{𝗏𝖾𝖼(𝐆𝖣𝗂𝖺𝗀(𝐡k))𝗏𝖾𝖼H(𝐆^𝖣𝗂𝖺𝗀(𝐡^k))}\displaystyle\mathbb{E}\{\mathsf{vec}(\mathbf{G}\mathsf{Diag}(\mathbf{h}_{k}))\mathsf{vec}^{\mathrm{H}}(\hat{\mathbf{G}}\mathsf{Diag}(\hat{\mathbf{h}}_{k}))\}
=\displaystyle=\; 𝔼𝐆,𝐡{(𝐃h,kT𝐈MB)𝗏𝖾𝖼(𝐆)𝗏𝖾𝖼H(𝐆^)(𝐃^h,k𝐈MB)}\displaystyle\mathbb{E}_{\mathbf{G},\mathbf{h}}\{(\mathbf{D}_{\mathrm{h},k}^{\mathrm{T}}\otimes\mathbf{I}_{M_{\mathrm{B}}})\mathsf{vec}(\mathbf{G})\mathsf{vec}^{\mathrm{H}}(\hat{\mathbf{G}})(\hat{\mathbf{D}}_{\mathrm{h},k}^{*}\otimes\mathbf{I}_{M_{\mathrm{B}}})\}
=\displaystyle=\; 𝔼𝐡{(𝐃h,kT𝐈MB)𝔼𝐆{𝗏𝖾𝖼(𝐆)𝗏𝖾𝖼H(𝐆^)}(𝐃^h,k𝐈MB)}\displaystyle\mathbb{E}_{\mathbf{h}}\{(\mathbf{D}_{\mathrm{h},k}^{\mathrm{T}}\otimes\mathbf{I}_{M_{\mathrm{B}}})\mathbb{E}_{\mathbf{G}}\{\mathsf{vec}(\mathbf{G})\mathsf{vec}^{\mathrm{H}}(\hat{\mathbf{G}})\}(\hat{\mathbf{D}}_{\mathrm{h},k}^{*}\otimes\mathbf{I}_{M_{\mathrm{B}}})\}
=\displaystyle=\; 𝔼𝐡{𝖣𝗂𝖺𝗀(𝐡k𝟏MB)𝐑G𝖠GH(𝚿)𝐖G𝖣𝗂𝖺𝗀(𝐡^k𝟏MB)}\displaystyle\mathbb{E}_{\mathbf{h}}\{\mathsf{Diag}(\mathbf{h}_{k}\otimes\bm{1}_{M_{\mathrm{B}}})\mathbf{R}_{\mathrm{G}}\mathsf{A}_{\mathrm{G}}^{\mathrm{H}}(\bm{\Psi})\mathbf{W}_{\mathrm{G}}\mathsf{Diag}(\hat{\mathbf{h}}_{k}^{*}\otimes\bm{1}_{M_{\mathrm{B}}})\}
=\displaystyle=\; (𝐑G𝖠GH(𝚿)𝐖G)𝔼𝐡{(𝐡k𝟏MB)(𝐡^k𝟏MB)H}\displaystyle(\mathbf{R}_{\mathrm{G}}\mathsf{A}_{\mathrm{G}}^{\mathrm{H}}(\bm{\Psi})\mathbf{W}_{\mathrm{G}})\odot\mathbb{E}_{\mathbf{h}}\{(\mathbf{h}_{k}\otimes\bm{1}_{M_{\mathrm{B}}})(\hat{\mathbf{h}}_{k}\otimes\bm{1}_{M_{\mathrm{B}}})^{\mathrm{H}}\}
=\displaystyle=\; (𝐑G𝖠GH(𝚿)𝐖G)𝔼𝐡{(𝐡k𝐡^kH)(𝟏MB𝟏MBT)}\displaystyle(\mathbf{R}_{\mathrm{G}}\mathsf{A}_{\mathrm{G}}^{\mathrm{H}}(\bm{\Psi})\mathbf{W}_{\mathrm{G}})\odot\mathbb{E}_{\mathbf{h}}\{(\mathbf{h}_{k}\hat{\mathbf{h}}_{k}^{\mathrm{H}})\otimes(\bm{1}_{M_{\mathrm{B}}}\bm{1}_{M_{\mathrm{B}}}^{\mathrm{T}})\}
=\displaystyle=\; (𝐑G𝖠GH(𝚿)𝐖G)(𝔼𝐡{(𝐡k𝐡^kH)}𝟙MB)\displaystyle(\mathbf{R}_{\mathrm{G}}\mathsf{A}_{\mathrm{G}}^{\mathrm{H}}(\bm{\Psi})\mathbf{W}_{\mathrm{G}})\odot(\mathbb{E}_{\mathbf{h}}\{(\mathbf{h}_{k}\hat{\mathbf{h}}_{k}^{\mathrm{H}})\}\otimes\mathbbm{1}_{M_{\mathrm{B}}})
=\displaystyle=\; (𝐑G𝖠GH(𝚿)𝐖G)((𝐑h,k𝚿𝐖h,k)𝟙MB),\displaystyle(\mathbf{R}_{\mathrm{G}}\mathsf{A}_{\mathrm{G}}^{\mathrm{H}}(\bm{\Psi})\mathbf{W}_{\mathrm{G}})\odot((\mathbf{R}_{\mathrm{h},k}\bm{\Psi}^{\mathrm{*}}\mathbf{W}_{\mathrm{h},k})\otimes\mathbbm{1}_{M_{\mathrm{B}}}), (83)

where 𝐃h,k𝖣𝗂𝖺𝗀(𝐡k),k𝒦\mathbf{D}_{\mathrm{h},k}\triangleq\mathsf{Diag}(\mathbf{h}_{k}),\;k\in\mathcal{K}, and 𝐃^h,k𝖣𝗂𝖺𝗀(𝐡^k),k𝒦\hat{\mathbf{D}}_{\mathrm{h},k}\triangleq\mathsf{Diag}(\hat{\mathbf{h}}_{k}),\;k\in\mathcal{K}. Combining (79), (80), (83) and its hermitian transpose, we obtain (10).

B. Derivation of 𝚯𝗀(𝚯)\nabla_{\bm{\Theta}}\mathsf{g}(\bm{\Theta})

Define 𝗀~(𝚽)\tilde{\mathsf{g}}(\bm{\Phi}) as the objective of (𝒫2)(\mathcal{P}2); we first calculate 𝚽𝗀~(𝚽)\frac{\partial}{\partial\bm{\Phi}}\tilde{\mathsf{g}}(\bm{\Phi}). The Hadamard product and the Kronecker product make this derivative difficult to obtain directly. Notice that 𝖳𝗋{𝐀𝐁}\mathsf{Tr}\{\mathbf{A}\odot\mathbf{B}\} can be written as

𝖳𝗋{𝐀𝐁}=𝖳𝗋{𝖣𝖽𝗂𝖺𝗀(𝐀)𝐁}=𝖳𝗋{𝐀𝖣𝖽𝗂𝖺𝗀(𝐁)}.\mathsf{Tr}\{\mathbf{A}\odot\mathbf{B}\}=\mathsf{Tr}\{\mathsf{Ddiag}(\mathbf{A})\mathbf{B}\}=\mathsf{Tr}\{\mathbf{A}\mathsf{Ddiag}(\mathbf{B})\}. (84)

By exploiting (84), the Hadamard product in (12) can be eliminated.
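A quick numerical check of (84), with Ddiag(·) denoting the diagonal matrix that retains only the diagonal entries of its argument, is given below.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
B = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))

Ddiag = lambda M: np.diag(np.diag(M))           # keep only the diagonal of M

lhs = np.trace(A * B)                           # Tr{A Hadamard B}
mid = np.trace(Ddiag(A) @ B)                    # Tr{Ddiag(A) B}
rhs = np.trace(A @ Ddiag(B))                    # Tr{A Ddiag(B)}
print(np.allclose(lhs, mid), np.allclose(lhs, rhs))   # expected: True True
```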

To overcome the difficulty caused by the Kronecker product, we first define and study the following derivatives

𝐅o(𝐀,𝐁,𝐂,𝐗)𝐗𝖳𝗋{𝐂((𝐀𝐗𝐁)𝟙N)},\displaystyle\mathbf{F}_{\mathrm{o}}(\mathbf{A},\mathbf{B},\mathbf{C},\mathbf{X})\triangleq\frac{\partial}{\partial\mathbf{X}}\mathsf{Tr}\{\mathbf{C}((\mathbf{AXB})\otimes\mathbbm{1}_{N})\}, (85)
𝐅e(𝐀,𝐁,𝐂,𝐗)𝐗𝖳𝗋{𝐂((𝐀𝐗𝐁)𝐈N)},\displaystyle\mathbf{F}_{\mathrm{e}}(\mathbf{A},\mathbf{B},\mathbf{C},\mathbf{X})\triangleq\frac{\partial}{\partial\mathbf{X}}\mathsf{Tr}\{\mathbf{C}((\mathbf{AXB})\otimes\mathbf{I}_{N})\}, (86)

where the matrices 𝐀\mathbf{A}, 𝐁\mathbf{B} and 𝐗\mathbf{X} are MM-dimensional complex square matrices and 𝐂MN×MN\mathbf{C}\in\mathbb{C}^{MN\times MN}.

We firstly take (85) into consideration. Denote an MM-dimensional matrix 𝐂~\tilde{\mathbf{C}} whose (i,j)(i,j)th element is given by s=(i1)N+1iNt=(j1)N+1jN[𝐂]st\sum_{s=(i-1)N+1}^{iN}\sum_{t=(j-1)N+1}^{jN}[\mathbf{C}]_{st}. Then, 𝖳𝗋{𝐂((𝐀𝐗𝐁)𝟙N)}\mathsf{Tr}\{\mathbf{C}((\mathbf{AXB})\otimes\mathbbm{1}_{N})\} in (85) can be calculated as

𝖳𝗋{𝐂((𝐀𝐗𝐁)𝟙N)}=m=1M[𝐂~]m,:𝐀𝐗[𝐁]:,m.\mathsf{Tr}\{\mathbf{C}((\mathbf{AXB})\otimes\mathbbm{1}_{N})\}=\sum_{m=1}^{M}[\tilde{\mathbf{C}}]_{m,:}\mathbf{AX}[\mathbf{B}]_{:,m}. (87)

Hence, the derivative shown in (85) reads as

𝐗𝖳𝗋{𝐂((𝐀𝐗𝐁)𝟙N)}=𝐗m=1M[𝐂~]m,:𝐀𝐗[𝐁]:,m\displaystyle\frac{\partial}{\partial\mathbf{X}}\mathsf{Tr}\{\mathbf{C}((\mathbf{AXB})\otimes\mathbbm{1}_{N})\}=\frac{\partial}{\partial\mathbf{X}}\sum_{m=1}^{M}[\tilde{\mathbf{C}}]_{m,:}\mathbf{AX}[\mathbf{B}]_{:,m}
=\displaystyle= 𝐗𝖳𝗋{m=1M[𝐁]:,m[𝐂~]m,:𝐀𝐗}=(𝐁𝐂~𝐀)T.\displaystyle\frac{\partial}{\partial\mathbf{X}}\mathsf{Tr}\bigg{\{}\sum_{m=1}^{M}[\mathbf{B}]_{:,m}[\tilde{\mathbf{C}}]_{m,:}\mathbf{AX}\bigg{\}}=(\mathbf{B}\tilde{\mathbf{C}}\mathbf{A})^{\mathrm{T}}. (88)

Following a similar procedure, the derivative in (86) takes the same form as (88), with the (i,j)(i,j)th element of 𝐂~\tilde{\mathbf{C}} given by 𝖳𝗋{[𝐂](i1)N+1:iN,(j1)N+1:jN}\mathsf{Tr}\{[\mathbf{C}]_{(i-1)N+1:iN,(j-1)N+1:jN}\} instead.
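Since the traces in (85) and (86) are linear in X, the derivative in (88) and its block-trace variant for (86) can be verified exactly by first-order finite differences, as in the following sketch with arbitrary small dimensions.

```python
import numpy as np

rng = np.random.default_rng(3)
M, N = 3, 2
A, B, X = (rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M)) for _ in range(3))
C = rng.standard_normal((M * N, M * N)) + 1j * rng.standard_normal((M * N, M * N))

def f_ones(X):   # Tr{C((AXB) kron 1_N 1_N^T)}, as in (85)
    return np.trace(C @ np.kron(A @ X @ B, np.ones((N, N))))

def f_eye(X):    # Tr{C((AXB) kron I_N)}, as in (86)
    return np.trace(C @ np.kron(A @ X @ B, np.eye(N)))

# C-tilde for (85): block sums of C; for (86): block traces of C
Cb = C.reshape(M, N, M, N)
C_sum = Cb.sum(axis=(1, 3))
C_trc = np.trace(Cb, axis1=1, axis2=3)

grad_sum = (B @ C_sum @ A).T                 # claimed derivative, eq. (88)
grad_trc = (B @ C_trc @ A).T                 # block-trace variant for (86)

# Finite differences (exact here, since both maps are linear in X)
eps = 1.0
num_sum = np.zeros((M, M), dtype=complex)
num_trc = np.zeros((M, M), dtype=complex)
for p in range(M):
    for q in range(M):
        E = np.zeros((M, M)); E[p, q] = eps
        num_sum[p, q] = (f_ones(X + E) - f_ones(X)) / eps
        num_trc[p, q] = (f_eye(X + E) - f_eye(X)) / eps
print(np.allclose(grad_sum, num_sum), np.allclose(grad_trc, num_trc))   # True True
```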

Via leveraging (85) and (86), the derivative 𝚽𝗀~(𝚽)\frac{\partial}{\partial\bm{\Phi}}\tilde{\mathsf{g}}(\bm{\Phi}) can be expressed as

𝚽𝗀~(𝚽)=i=12𝐃𝚽,io+𝐃𝚽e,\frac{\partial}{\partial\bm{\Phi}}\tilde{\mathsf{g}}(\bm{\Phi})=\sum_{i=1}^{2}\mathbf{D}_{\bm{\Phi},i}^{\mathrm{o}}+\mathbf{D}_{\bm{\Phi}}^{\mathrm{e}}, (89)

where

𝐃𝚽,1ok=1K𝐅o(𝐖h,kH𝐏,𝐑h,k(𝚽𝐏)𝐖h,k,𝖣𝖽𝗂𝖺𝗀(𝐂1o),𝚽),\displaystyle\mathbf{D}_{\bm{\Phi},1}^{\mathrm{o}}\triangleq{\sum}_{k=1}^{K}\mathbf{F}_{\mathrm{o}}(\mathbf{W}_{\mathrm{h},k}^{\mathrm{H}}\mathbf{P},\mathbf{R}_{\mathrm{h},k}(\bm{\Phi}\mathbf{P})^{*}\mathbf{W}_{\mathrm{h},k},\mathsf{Ddiag}(\mathbf{C}_{1}^{\mathrm{o}}),\bm{\Phi}),
𝐃𝚽,2ok=1K𝐅o(𝐖h,kH𝐏,𝐑h,k,𝖣𝖽𝗂𝖺𝗀(𝐂2o),𝚽),\displaystyle\mathbf{D}_{\bm{\Phi},2}^{\mathrm{o}}\triangleq-{\sum}_{k=1}^{K}\mathbf{F}_{\mathrm{o}}(\mathbf{W}_{\mathrm{h},k}^{\mathrm{H}}\mathbf{P},\mathbf{R}_{\mathrm{h},k},\mathsf{Ddiag}(\mathbf{C}_{2}^{\mathrm{o}}),\bm{\Phi}),
𝐃𝚽e𝐅e(𝐏,𝐈MR,𝐂1e+𝐂2e𝐂3e,𝚽),\displaystyle\mathbf{D}_{\bm{\Phi}}^{\mathrm{e}}\triangleq\mathbf{F}_{\mathrm{e}}(\mathbf{P},\mathbf{I}_{M_{\mathrm{R}}},\mathbf{C}_{1}^{\mathrm{e}}+\mathbf{C}_{2}^{\mathrm{e}}-\mathbf{C}_{3}^{\mathrm{e}},\bm{\Phi}),
𝐂1o𝐖GH𝐂1,1o𝐖G,\displaystyle\mathbf{C}_{1}^{\mathrm{o}}\triangleq\mathbf{W}_{\mathrm{G}}^{\mathrm{H}}\mathbf{C}_{1,1}^{\mathrm{o}}\mathbf{W}_{\mathrm{G}},
𝐂1,1o((𝚽𝐏)T𝐈MB)𝐑G((𝚽𝐏)𝐈MB)+𝚺B𝐈MR,\displaystyle\mathbf{C}_{1,1}^{\mathrm{o}}\triangleq((\bm{\Phi}\mathbf{P})^{\mathrm{T}}\otimes\mathbf{I}_{M_{\mathrm{B}}})\mathbf{R}_{\mathrm{G}}((\bm{\Phi}\mathbf{P})^{\mathrm{*}}\otimes\mathbf{I}_{M_{\mathrm{B}}})+\bm{\Sigma}_{\mathrm{B}}\otimes\mathbf{I}_{M_{\mathrm{R}}},
𝐂2o𝐖GH((𝚽𝐏)T𝐈MB)𝐑G,\displaystyle\mathbf{C}_{2}^{\mathrm{o}}\triangleq\mathbf{W}_{\mathrm{G}}^{\mathrm{H}}((\bm{\Phi}\mathbf{P})^{\mathrm{T}}\otimes\mathbf{I}_{M_{\mathrm{B}}})\mathbf{R}_{\mathrm{G}},
𝐂1e𝐑G((𝚽𝐏)𝐈MB)𝐖G𝐂1,1e𝐖GH,\displaystyle\mathbf{C}_{1}^{\mathrm{e}}\triangleq\mathbf{R}_{\mathrm{G}}((\bm{\Phi}\mathbf{P})^{\mathrm{*}}\otimes\mathbf{I}_{M_{\mathrm{B}}})\mathbf{W}_{\mathrm{G}}\mathbf{C}_{1,1}^{\mathrm{e}}\mathbf{W}_{\mathrm{G}}^{\mathrm{H}},
𝐂2e𝐑G((𝚽𝐏)𝐈MB)𝐖G𝐂2,1e𝐖GH,\displaystyle\mathbf{C}_{2}^{\mathrm{e}}\triangleq\mathbf{R}_{\mathrm{G}}((\bm{\Phi}\mathbf{P})^{\mathrm{*}}\otimes\mathbf{I}_{M_{\mathrm{B}}})\mathbf{W}_{\mathrm{G}}\mathbf{C}_{2,1}^{\mathrm{e}}\mathbf{W}_{\mathrm{G}}^{\mathrm{H}},
𝐂3e𝐑G𝐂3,1e𝐖GH,\displaystyle\mathbf{C}_{3}^{\mathrm{e}}\triangleq\mathbf{R}_{\mathrm{G}}\mathbf{C}_{3,1}^{\mathrm{e}}\mathbf{W}_{\mathrm{G}}^{\mathrm{H}},
𝐂1,1ek=1K𝖣𝖽𝗂𝖺𝗀((𝐖h,kH(𝚽𝐏)T𝐑h,k(𝚽𝐏)𝐖h,k)𝟙MB),\displaystyle\mathbf{C}_{1,1}^{\mathrm{e}}\triangleq{\sum}_{k=1}^{K}\mathsf{Ddiag}((\mathbf{W}_{\mathrm{h},k}^{\mathrm{H}}(\bm{\Phi}\mathbf{P})^{\mathrm{T}}\mathbf{R}_{\mathrm{h},k}(\bm{\Phi}\mathbf{P})^{*}\mathbf{W}_{\mathrm{h},k})\otimes\mathbbm{1}_{M_{\mathrm{B}}}),
𝐂2,1ek=1K𝖣𝖽𝗂𝖺𝗀((𝐖h,kH𝚺U,k𝐖h,k)𝟙MB),\displaystyle\mathbf{C}_{2,1}^{\mathrm{e}}\triangleq{\sum}_{k=1}^{K}\mathsf{Ddiag}((\mathbf{W}_{\mathrm{h},k}^{\mathrm{H}}\bm{\Sigma}_{\mathrm{U},k}\mathbf{W}_{\mathrm{h},k})\otimes\mathbbm{1}_{M_{\mathrm{B}}}),
𝐂3,1ek=1K𝖣𝖽𝗂𝖺𝗀((𝐖h,kH(𝚽𝐏)T𝐑h,k)𝟙MB).\displaystyle\mathbf{C}_{3,1}^{\mathrm{e}}\triangleq{\sum}_{k=1}^{K}\mathsf{Ddiag}((\mathbf{W}_{\mathrm{h},k}^{\mathrm{H}}(\bm{\Phi}\mathbf{P})^{\mathrm{T}}\mathbf{R}_{\mathrm{h},k})\otimes\mathbbm{1}_{M_{\mathrm{B}}}). (90)

Moreover, recalling 𝚽=ej𝚯\bm{\Phi}=e^{j\bm{\Theta}} and utilizing the chain rule, we have

𝚯𝗀(𝚯)=j𝚽𝗀~(𝚽)ej𝚯.\frac{\partial}{\partial\bm{\Theta}}\mathsf{g}(\bm{\Theta})=j\frac{\partial}{\partial\bm{\Phi}}\tilde{\mathsf{g}}(\bm{\Phi})\odot e^{j\bm{\Theta}}. (91)

Since 𝗀(𝚯)\mathsf{g}(\bm{\Theta}) is real-valued, 𝚯𝗀(𝚯)\nabla_{\bm{\Theta}}\mathsf{g}(\bm{\Theta}) can be calculated as [39]

𝚯𝗀(𝚯)=𝚯𝗀(𝚯)+𝚯𝗀(𝚯)\displaystyle\nabla_{\bm{\Theta}}\mathsf{g}(\bm{\Theta})=\frac{\partial}{\partial\bm{\Theta}}\mathsf{g}(\bm{\Theta})+\frac{\partial}{\partial\bm{\Theta}^{*}}\mathsf{g}(\bm{\Theta})
=𝚯𝗀(𝚯)+(𝚯𝗀(𝚯))=2𝖱𝖾{𝚯𝗀(𝚯)}.\displaystyle=\frac{\partial}{\partial\bm{\Theta}}\mathsf{g}(\bm{\Theta})+\bigg{(}\frac{\partial}{\partial\bm{\Theta}}\mathsf{g}(\bm{\Theta})\bigg{)}^{*}=2\mathsf{Re}\bigg{\{}\frac{\partial}{\partial\bm{\Theta}}\mathsf{g}(\bm{\Theta})\bigg{\}}. (92)

C. Derivation of 𝐩𝗁(𝐩)\nabla_{\mathbf{p}}\mathsf{h}(\mathbf{p})

Denote by 𝗁~(𝐏)\tilde{\mathsf{h}}(\mathbf{P}) (recall that 𝐏=𝖣𝗂𝖺𝗀(𝐩)\mathbf{P}=\mathsf{Diag}(\mathbf{p})) the objective of (𝒫4)(\mathcal{P}4); we first investigate 𝐏𝗁~(𝐏)\nabla_{\mathbf{P}}\tilde{\mathsf{h}}(\mathbf{P}). Similar to the procedure in Appendix B, 𝐏𝗁~(𝐏)\nabla_{\mathbf{P}}\tilde{\mathsf{h}}(\mathbf{P}) is given by

𝐏𝗁~(𝐏)=2𝖱𝖾{i=12𝐃𝐏,io+𝐃𝐏e},\nabla_{\mathbf{P}}\tilde{\mathsf{h}}(\mathbf{P})=2\mathsf{Re}\bigg{\{}\sum_{i=1}^{2}\mathbf{D}_{\mathbf{P},i}^{\mathrm{o}}+\mathbf{D}_{\mathbf{P}}^{\mathrm{e}}\bigg{\}}, (93)

where

𝐃𝐏,1ok=1K𝐅o(𝐖h,kH,𝚽T𝐑h,k(𝚽𝐏)𝐖h,k,𝖣𝖽𝗂𝖺𝗀(𝐂1o),𝐏),\displaystyle\mathbf{D}_{\mathbf{P},1}^{\mathrm{o}}\triangleq{\sum}_{k=1}^{K}\mathbf{F}_{\mathrm{o}}(\mathbf{W}_{\mathrm{h},k}^{\mathrm{H}},\bm{\Phi}^{\mathrm{T}}\mathbf{R}_{\mathrm{h},k}(\bm{\Phi}\mathbf{P})^{*}\mathbf{W}_{\mathrm{h},k},\mathsf{Ddiag}(\mathbf{C}_{1}^{\mathrm{o}}),\mathbf{P}),
𝐃𝐏,2ok=1K𝐅o(𝐖h,kH,𝚽T𝐑h,k,𝖣𝖽𝗂𝖺𝗀(𝐂2o),𝐏),\displaystyle\mathbf{D}_{\mathbf{P},2}^{\mathrm{o}}\triangleq-{\sum}_{k=1}^{K}\mathbf{F}_{\mathrm{o}}(\mathbf{W}_{\mathrm{h},k}^{\mathrm{H}},\bm{\Phi}^{\mathrm{T}}\mathbf{R}_{\mathrm{h},k},\mathsf{Ddiag}(\mathbf{C}_{2}^{\mathrm{o}}),\mathbf{P}),
𝐃𝐏e𝐅e(𝐈MR,𝚽T,𝐂1e+𝐂2e𝐂3e,𝐏),\displaystyle\mathbf{D}_{\mathbf{P}}^{\mathrm{e}}\triangleq\mathbf{F}_{\mathrm{e}}(\mathbf{I}_{M_{\mathrm{R}}},\bm{\Phi}^{\mathrm{T}},\mathbf{C}_{1}^{\mathrm{e}}+\mathbf{C}_{2}^{\mathrm{e}}-\mathbf{C}_{3}^{\mathrm{e}},\mathbf{P}), (94)

and the notations in (94) have been defined in (90).

By exploiting 𝐩=𝖽𝗂𝖺𝗀(𝐏)\mathbf{p}=\mathsf{diag}(\mathbf{P}), we obtain

𝐩𝗁(𝐩)=𝖽𝗂𝖺𝗀(𝐏𝗁~(𝐏)).\nabla_{\mathbf{p}}\mathsf{h}(\mathbf{p})=\mathsf{diag}(\nabla_{\mathbf{P}}\tilde{\mathsf{h}}(\mathbf{P})). (95)

D. Proof of Theorem 2

Proof.

Consider the following optimization

(𝒫11):min𝐩N\displaystyle(\mathcal{P}11):\min_{\mathbf{p}\in\mathbb{R}^{N}}\; 𝐩222𝖱𝖾{𝐪H𝐩}\displaystyle\|\mathbf{p}\|_{2}^{2}-2\mathsf{Re}\{\mathbf{q}^{\mathrm{H}}\mathbf{p}\} (96)
s.t.\displaystyle\mathrm{s.t.}\; 𝐩22P,\displaystyle\|\mathbf{p}\|_{2}^{2}\leq P, (96a)
[𝐩]n0,n=1,,N,\displaystyle[\mathbf{p}]_{n}\geq 0,\;n=1,\cdots,N, (96b)

which is strictly convex.

The Lagrangian function of (𝒫11)(\mathcal{P}11) reads as

(𝐩,𝝀,μ)=𝐩222𝖱𝖾{𝐪H𝐩}𝝀T𝐩+μ(𝐩22P).\mathcal{L}(\mathbf{p},\bm{\lambda},\mu)=\|\mathbf{p}\|_{2}^{2}-2\mathsf{Re}\{\mathbf{q}^{\mathrm{H}}\mathbf{p}\}-\bm{\lambda}^{\mathrm{T}}\mathbf{p}+\mu(\|\mathbf{p}\|_{2}^{2}-P). (97)

By setting (𝐩,𝝀,μ)𝐩\frac{\partial\mathcal{L}(\mathbf{p},\bm{\lambda},\mu)}{\partial\mathbf{p}} to 0, we obtain

𝐩=11+μ(𝖱𝖾{𝐪}+12𝝀).\mathbf{p}=\frac{1}{1+\mu}\bigg{(}\mathsf{Re}\{\mathbf{q}\}+\frac{1}{2}\bm{\lambda}\bigg{)}. (98)

According to the KKT conditions

[𝝀]n0,[𝝀]n[𝐩]n=0,n=1,,N,\displaystyle[\bm{\lambda}]_{n}\geq 0,\;[\bm{\lambda}]_{n}[\mathbf{p}]_{n}=0,\;n=1,\cdots,N, (99)
μ0,μ(𝐩22P)=0,\displaystyle\mu\geq 0,\;\mu(\|\mathbf{p}\|_{2}^{2}-P)=0, (100)

and constraint (96b), we can conclude from (98) that

  • if [𝖱𝖾{𝐪}]n0[\mathsf{Re}\{\mathbf{q}\}]_{n}\leq 0, we have [𝝀]n0[\bm{\lambda}]_{n}\geq 0, [𝐩]n=0[\mathbf{p}]_{n}=0,

  • otherwise, [𝐩]n>0[\mathbf{p}]_{n}>0, [𝝀]n=0[\bm{\lambda}]_{n}=0.

Therefore, (98) can be simplified as

𝐩=[𝖱𝖾{𝐪}]+1+μ,\mathbf{p}=\frac{[\mathsf{Re}\{\mathbf{q}\}]_{+}}{1+\mu}, (101)

where [𝐱]+max{𝐱,𝟎}[\mathbf{x}]_{+}\triangleq\max\{\mathbf{x},\bm{0}\} is applied in an element-wise manner.

Following (100), we can observe that

  • if [𝖱𝖾{𝐪}]+22P\|[\mathsf{Re}\{\mathbf{q}\}]_{+}\|_{2}^{2}\leq P, then μ=0\mu=0 and we obtain

    𝐩=[𝖱𝖾{𝐪}]+,\mathbf{p}^{\star}=[\mathsf{Re}\{\mathbf{q}\}]_{+}, (102)
  • otherwise, there must exist μ>0\mu>0 such that

    𝐩=P[𝖱𝖾{𝐪}]+2[𝖱𝖾{𝐪}]+,\mathbf{p}^{\star}=\frac{\sqrt{P}}{\|[\mathsf{Re}\{\mathbf{q}\}]_{+}\|_{2}}[\mathsf{Re}\{\mathbf{q}\}]_{+}, (103)

Substituting the parameters in (𝒫5)(\mathcal{P}5) into (102) or (103), we obtain the desired result. ∎
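Since the objective of (P11) equals ‖p - Re{q}‖₂² up to a constant, Theorem 2 amounts to the Euclidean projection of Re{q} onto the intersection of the nonnegative orthant and the ball of radius sqrt(P). A minimal sketch of this closed form, together with a brute-force sanity check against random feasible points, is given below.

```python
import numpy as np

def solve_p11(q, P):
    """Closed-form solution of (P11): project Re{q} onto {p >= 0, ||p||^2 <= P}."""
    p = np.maximum(np.real(q), 0.0)              # [Re{q}]_+, eq. (102)
    nrm2 = np.dot(p, p)
    if nrm2 > P:                                 # scale back onto the ball, eq. (103)
        p *= np.sqrt(P) / np.sqrt(nrm2)
    return p

def objective(p, q):
    return np.dot(p, p) - 2 * np.real(np.vdot(q, p))

rng = np.random.default_rng(4)
N, P = 6, 1.0
q = rng.standard_normal(N) + 1j * rng.standard_normal(N)
p_star = solve_p11(q, P)

# Sanity check: no random feasible point should achieve a lower objective
best = min(objective(np.sqrt(P) * np.abs(v) / np.linalg.norm(v), q)
           for v in rng.standard_normal((5000, N)))
print(objective(p_star, q) <= best + 1e-12)      # expected: True
```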

E. Derivation of (20)

The derivative 𝐖Gk=1K𝖳𝗋{𝐂Hc,k(𝚽,𝐏,𝐖G,𝓦h)}\frac{\partial}{\partial\mathbf{W}_{\mathrm{G}}^{*}}\sum_{k=1}^{K}\mathsf{Tr}\{\mathbf{C}_{\mathrm{H}_{\mathrm{c},k}}(\bm{\Phi},\mathbf{P},\mathbf{W}_{\mathrm{G}},\bm{\mathcal{W}}_{\mathrm{h}})\} can be written as

𝐖Gk=1K𝖳𝗋{𝐂Hc,k(𝚽,𝐏,𝐖G,𝓦h)}=(𝖠G(𝚿)𝐑G𝖠GH(𝚿)\displaystyle\frac{\partial}{\partial\mathbf{W}_{\mathrm{G}}^{*}}\sum_{k=1}^{K}\mathsf{Tr}\{\mathbf{C}_{\mathrm{H}_{\mathrm{c},k}}(\bm{\Phi},\mathbf{P},\mathbf{W}_{\mathrm{G}},\bm{\mathcal{W}}_{\mathrm{h}})\}=(\mathsf{A}_{\mathrm{G}}(\bm{\Psi})\mathbf{R}_{\mathrm{G}}\mathsf{A}_{\mathrm{G}}^{\mathrm{H}}(\bm{\Psi})
+𝚺B𝐈MR)𝐖Gk=1K𝖣𝖽𝗂𝖺𝗀(𝐂h^,k𝟙MB)𝖠G(𝚿)𝐑G\displaystyle+\bm{\Sigma}_{\mathrm{B}}\otimes\mathbf{I}_{M_{\mathrm{R}}})\mathbf{W}_{\mathrm{G}}{\sum}_{k=1}^{K}\mathsf{Ddiag}(\mathbf{C}_{\hat{\mathrm{h}},k}\otimes\mathbbm{1}_{M_{\mathrm{B}}})-\mathsf{A}_{\mathrm{G}}(\bm{\Psi})\mathbf{R}_{\mathrm{G}}
×k=1K𝖣𝖽𝗂𝖺𝗀((𝐖h,kH𝚿T𝐑h,k)𝟙MB),\displaystyle\times{\sum}_{k=1}^{K}\mathsf{Ddiag}((\mathbf{W}_{\mathrm{h},k}^{\mathrm{H}}\bm{\Psi}^{\mathrm{T}}\mathbf{R}_{\mathrm{h},k})\otimes\mathbbm{1}_{M_{\mathrm{B}}}), (104)

where (84) is utilized. Then, by setting the above derivative to zero and exploiting (21), we obtain (20).

References

  • [1] C. Pan and et al., “An overview of signal processing techniques for RIS/IRS-aided wireless systems,” IEEE J. Sel. Topics Signal Process., vol. 16, no. 5, pp. 883–919, Aug. 2022.
  • [2] Q. Wu, S. Zhang, B. Zheng, C. You, and R. Zhang, “Intelligent reflecting surface-aided wireless communications: A tutorial,” IEEE Trans. Commun., vol. 69, no. 5, pp. 3313–3351, May 2021.
  • [3] H. Tataria, M. Shafi, A. F. Molisch, M. Dohler, H. Sjöland, and F. Tufvesson, “6G wireless systems: Vision, requirements, challenges, insights, and opportunities,” in Proceedings of the IEEE, vol. 109, no. 7, pp. 1166–1199, Jul. 2021.
  • [4] B. Zheng, C. You, W. Mei, and R. Zhang, “A survey on channel estimation and practical passive beamforming design for intelligent reflecting surface aided wireless communications,” IEEE Commun. Surveys Tuts., vol. 24, no. 2, pp. 1035–1071, Secondquarter 2022.
  • [5] D. Mishra and H. Johansson, “Channel estimation and low-complexity beamforming design for passive intelligent surface assisted MISO wireless energy transfer,” in Proc. IEEE Int. Conf. Acoust., Speech Signal Process. (ICASSP), Brighton, U.K., May 2019.
  • [6] T. L. Jensen and E. D. Carvalho, “An optimal channel estimation scheme for intelligent reflecting surfaces based on a minimum variance unbiased estimator,” in Proc. IEEE Int. Conf. Acoust., Speech Signal Process. (ICASSP), Barcelona, Spain, May 2020.
  • [7] Z. Zhou, N. Ge, Z. Wang, and L. Hanzo, “Joint transmit precoding and reconfigurable intelligent surface phase adjustment: A decomposition-aided channel estimation approach,” IEEE Trans. Commun., vol. 69, no. 2, pp. 1228–1243, Feb. 2021.
  • [8] N. K. Kundu and M. R. McKay, “Channel estimation for reconfigurable intelligent surface aided MISO communications: From LMMSE to deep learning solutions,” IEEE Open J. Commun. Soc., vol. 2, pp. 471–487, Mar. 2021.
  • [9] Z. Wang, L. Liu, and S. Cui, “Channel estimation for intelligent reflecting surface assisted multiuser communications: Framework, algorithms, and analysis,” IEEE Trans. Wireless Commun., vol. 19, no. 10, pp. 6607–6620, Oct. 2020.
  • [10] B. Zheng and R. Zhang, “Intelligent reflecting surface-enhanced OFDM: Channel estimation and reflection optimization,” IEEE Wireless Commun. Lett., vol. 9, no. 4, pp. 518–522, Apr. 2020.
  • [11] C. You, B. Zheng, and R. Zhang, “Channel estimation and passive beamforming for intelligent reflecting surface: Discrete phase shift and progressive refinement,” IEEE J. Sel. Areas Commun., vol. 38, no. 11, pp. 2604–2620, Nov. 2020.
  • [12] P. Wang, J. Fang, H. Duan, and H. Li, “Compressed channel estimation for intelligent reflecting surface-assisted millimeter wave systems,” IEEE Signal Process. Lett., vol. 27, pp. 905–909, 2020.
  • [13] H. Liu, J. Zhang, Q. Wu, H. Xiao, and B. Ai, “ADMM based channel estimation for RISs aided millimeter wave communications,” IEEE Commun. Lett., vol. 25, no. 9, pp. 2894–2898, Sep. 2021.
  • [14] A. Taha, M. Alrabeiah, and A. Alkhateeb, “Enabling large intelligent surfaces with compressive sensing and deep learning,” IEEE Access, vol. 9, pp. 44304–44321, Mar. 2021.
  • [15] S. Liu, Z. Gao, J. Zhang, M. D. Renzo, and M.-S. Alouini, “Deep denoising neural network assisted compressive channel estimation for mmWave intelligent reflecting surfaces,” IEEE Trans. Veh. Technol., vol. 69, no. 8, pp. 9223–9228, Aug. 2020.
  • [16] X. Chen, J. Shi, Z. Yang, and L. Wu, “Low-complexity channel estimation for intelligent reflecting surface-enhanced massive MIMO,” IEEE Wireless Commun. Lett., vol. 10, no. 5, pp. 996–1000, May 2021.
  • [17] Y. Lin, S. Jin, M. Matthaiou, and X. You, “Tensor-based algebraic channel estimation for hybrid IRS-assisted MIMO-OFDM,” IEEE Trans. Wireless Commun., vol. 20, no. 6, pp. 3770–3784, Jun. 2021.
  • [18] R. Schroeder, J. He, G. Brante, and M. Juntti, “Two-stage channel estimation for hybrid RIS assisted MIMO systems,” IEEE Trans. Commun., vol. 70, no. 7, pp. 4793–4806, Jul. 2022.
  • [19] L. Wei, C. Huang, G. C. Alexandropoulos, C. Yuen, Z. Zhang, and M. Debbah, “Channel estimation for RIS-empowered multi-user MISO wireless communications,” IEEE Trans. Commun., vol. 69, no. 6, pp. 4144–4157, Jun. 2021.
  • [20] Z.-Q. He and X. Yuan, “Cascaded channel estimation for large intelligent metasurface assisted massive MIMO,” IEEE Wireless Commun. Lett., vol. 9, no. 12, pp. 210–214, Feb. 2020.
  • [21] H. Liu, X. Yuan, and Y.-J. A. Zhang, “Matrix-calibration-based cascaded channel estimation for reconfigurable intelligent surface assisted multiuser MIMO,” IEEE J. Sel. Areas Commun., vol. 38, no. 11, pp. 2621–2636, Nov. 2020.
  • [22] M. Najafi, V. Jamali, R. Schober, and H. V. Poor, “Physics-based modeling and scalable optimization of large intelligent reflecting surfaces,” IEEE Trans. Commun., vol. 69, no. 4, pp. 2673–2691, Apr. 2021.
  • [23] Z. Zhang and et al., “Active RIS vs. passive RIS: Which will prevail in 6G?” IEEE Trans. Commun., vol. 71, no. 3, pp. 1707–1725, Mar. 2023.
  • [24] C. You and R. Zhang, “Wireless communication aided by intelligent reflecting surface: Active or passive?” IEEE Wireless Commun. Lett., vol. 10, no. 12, pp. 2659–2663, Dec. 2021.
  • [25] A. L. Swindlehurst, G. Zhou, R. Liu, C. Pan, and M. Li, “Channel estimation with reconfigurable intelligent surfaces—a general framework,” in Proceedings of the IEEE, vol. 110, no. 9, pp. 1312–1338, Sep. 2022.
  • [26] R. W. Heath Jr., N. G.-Prelcic, S. Rangan, W. Roh, and A. M. Sayeed, “An overview of signal processing techniques for millimeter wave MIMO systems,” IEEE J. Sel. Topics Signal Process., vol. 10, no. 3, pp. 436–453, Apr. 2016.
  • [27] Q. Shi and M. Hong, “Penalty dual decomposition method for nonsmooth nonconvex optimization—part i: Algorithms and convergence analysis,” IEEE Trans. Signal Process., vol. 68, pp. 4108–4122, Jun. 2020.
  • [28] E. Björnson, J. Hoydis, and L. Sanguinetti, “Massive MIMO networks: Spectral, energy, and hardware efficiency,” Foundat. Trends Signal Process., vol. 11, no. 3–4, pp. 154–655, 2017.
  • [29] E. Björnson, L. Sanguinetti, and M. Debbah, “Massive MIMO with imperfect channel covariance information,” in Proc. 50th Asilomar Conf. Signals, Syst. Comput., Nov. 2016, pp. 974–978.
  • [30] I. Viering, H. Hofstetter, and W. Utschick, “Spatial long-term variations in urban, rural and indoor environments,” in the 5th Meeting of COST273, Lisbon, Portugal, 2002.
  • [31] N. Shariati, E. Björnson, M. Bengtsson, and M. Debbah, “Low-complexity polynomial channel estimation in large-scale MIMO with arbitrary statistics,” IEEE J. Sel. Topics Signal Process., vol. 8, no. 5, pp. 815–830, Oct. 2014.
  • [32] K. Upadhya and S. A. Vorobyov, “Covariance matrix estimation for massive MIMO,” IEEE Signal Process. Lett., vol. 25, no. 4, pp. 546–550, Apr. 2018.
  • [33] D. Bertsekas, Nonlinear programming, 2nd ed. Belmont, MA: Athena Scientific, 1999.
  • [34] A. Beck, First-Order Methods in Optimizations, Society for Industrial and Applied Mathematics and Mathematical Optimization Society, 2017.
  • [35] M. Razaviyayn, M. Hong, and Z.-Q. Luo, “A unified convergence analysis of block successive minimization methods for nonsmooth optimization,” SIAM J. Optim., vol. 23, no. 2, pp. 1126–1153, 2013.
  • [36] G. Golub, S. Nash, and C. V. Loan, “A Hessenberg-Schur method for the problem AX + XB = C,” IEEE Trans. Autom. Control, vol. 24, no. 6, pp. 909–913, Dec. 1979.
  • [37] S. M. Kay, Fundamentals of Statistical Signal Processing: Estimation Theory, Prentice Hall, Upper Saddle River, NJ, USA, 1993.
  • [38] Further advancements for E-UTRA physical layer aspects (Release 9). 3GPP TS 36.814, Mar. 2010.
  • [39] A. Hjørungnes, Complex-Valued Matrix Derivatives, New York, NY, USA: Cambridge Univ. Press, 2011.