This paper was converted on www.awesomepapers.org from LaTeX by an anonymous user.
Want to know more? Visit the Converter page.

Energy-Efficient Designs for SIM-Based Broadcast MIMO Systems

Nemanja Stefan Perović,  Eduard E. Bahingayi, and Le-Nam Tran N. S. Perović was with Université Paris-Saclay, CNRS, CentraleSupélec, Laboratoire des Signaux et Systèmes, 3 Rue Joliot-Curie, 91192 Gif-sur-Yvette, France (Email: [email protected]).Eduard E. Bahingayi and Le-Nam Tran are with the School of Electrical and Electronic Engineering, University College Dublin, Belfield, Dublin 4, D04 V1W8, Ireland (Email: [email protected], [email protected]).
Abstract

Stacked intelligent metasurface (SIM), which consists of multiple layers of intelligent metasurfaces, is emerging as a promising solution for future wireless communication systems. In this timely context, we focus on broadcast multiple-input multiple-output (MIMO) systems and aim to characterize their energy efficiency (EE) performance. To gain a comprehensive understanding of the potential of SIM, we consider both dirty paper coding (DPC) and linear precoding and formulate the corresponding EE maximization problems. For DPC, we employ the broadcast channel (BC)- multiple-access channel (MAC) duality to obtain an equivalent problem, and optimize users’ covariance matrices using the successive convex approximation (SCA) method, which is based on a tight lower bound of the achievable sum-rate, in combination with Dinkelbach’s method. Since optimizing the phase shifts of the SIM meta-elements is an optimization problem of extremely large size, we adopt a conventional projected gradient-based method for its simplicity. A similar approach is derived for the case of linear precoding. Simulation results show that the proposed optimization methods for the considered SIM-based systems can significantly improve the EE, compared to the conventional counterparts. Also, we demonstrate that the number of SIM meta-elements and their distribution across the SIM layers have a significant impact on both the achievable sum-rate and EE performance.

Index Terms:
Optimization, broadcast, EE, MIMO, stacked intelligent metasurface (SIM), multi-user.

I Introduction

The framework for the future development of International Mobile Telecommunications (IMT) for 2030 highlights sustainability as a fundamental goal for future communication systems [1]. This means that these systems are expected to be designed with minimal environmental impact, focusing on the efficient use of resources, reducing power consumption, and lowering greenhouse gas emissions. Due to this, the study and development of energy-efficient wireless communications have recently attracted much of attention. At the same time, the global mobile network data traffic is expected to reach 563 exabytes (EBs) by 2029 [2]. To accommodate such a high volume of data traffic, existing network technologies need to evolve, providing additional capabilities. For example, conventional MIMO systems are advancing toward massive MIMO (mMIMO) and ultra-massive MIMO (umMIMO) systems. However, a large number of radio frequency (RF) chains required to support mMIMO transmissions results in substantial power consumption, which leads to an unsustainable and energy inefficient communication model.

A promising technical solution that addresses the growing demand for higher data rates while simultaneously enhancing EE is based on the use of intelligent metasurfaces, specifically reconfigurable intelligent surfaces. RISs are composed of a large number of programmable metamaterial or tiny discrete antenna elements. Each of these elements is capable, using integrated electronic circuits, of dynamically adjusting its electromagnetic (EM) properties (i.e., to form EM fields with controllable amplitudes, phases, polarization) and consequently its EM response to the incoming waves. In this way, RISs can modify the incoming waves in a programmable and controllable manner [3]. This capability allows RISs to simultaneously improve multiple performance metrics, such as spectrum efficiency, EE, coverage. Unfortunately, the multiplicative effect of the path loss of the RIS-assisted links significantly limits the potential EE gains from RIS deployment.

To address the critical issue of RIS, several innovative approaches have been proposed that utilize metamaterial-based antenna technologies instead of conventional antenna arrays in mMIMO transceiver design. These include holographic radio, dynamic metasurface antennas and SIMs. Holographic radio, also known as holographic MIMO (HMIMO), is a hybrid transceiver architecture that achieves high directive gain, spectral efficiency and EE by incorporating a continuous structure of densely packing sub-wavelength metamaterial antenna elements. These element, combined with holographic techniques, are capable of recording and reconstructing the amplitude and the phase of wave fronts [4]. The significantly lower power consumption of HMIMO allows for the deployment of a greater number of antenna elements compared to traditional mMIMO, resulting in higher EE [5]. Similarly, DMAs consist of multiple microstrips, each composed of a multitude of metamaterial radiating elements and connected to a single RF chain [6]. Due to this, DMAs achieve better EE performance than even hybrid analog-to-digital (A/D) architectures, since they do not need additional power to support numerous phase shifters [7]. However, both HMIMOs and DMAs are single layer matasuface structures, which may require a very large number of elements due to practical hardware constraints that limit the number of tunable amplitudes/phases associated with each meta-element.

In contrast, SIMs represent the latest advancement in metamaterial-based antenna technologies. SIMs consist of multiple parallel metasurface layers, each accommodating numerous meta-elements with programmable phase characteristics. These layers are integrated with conventional radio transceivers that employ a small to moderate number of active antennas. The concept of SIMs draws inspiration from the architecture of a deep neural network (DNN), which is a multi-layer neuron structure capable of implementing various functions [8]. Similarly, SIMs can efficiently implement different signal processing tasks, such as transmit precoding and receive combining, directly in the EM domain when properly controlled and programmed. Hence, SIMs have the potential to substantially improve the performance metrics of conventional communication systems, such as the achievable rate and the EE, while requiring minimal additional hardware complexity.

In [9], SIMs were exploited to implement a 2D discrete Fourier transform (DFT) for direction of arrival direction of arrival (DOA) estimation. Moreover, a hybrid channel estimator was proposed in [10], in which the received training symbols were initially processed in the wave domain and subsequently in the digital domain. In [11], the authors jointly optimized the transmit beamforming at the base station (BS) and the SIM phase shifts, to minimize the Cramer-Rao bound (CRB) for target estimation. Using an experimental SIM platform, they evaluated the performance of the proposed algorithms for communication and sensing tasks.

A general path loss model for an SIM-assisted wireless communication system was developed in [12], based on which, an algorithm aimed at maximizing the received power was derived. In [13], the authors studied the achievable sum-rate maximization problem for a downlink channel between a SIM-assisted BS and multiple single-antenna users. The achievable rate optimization for a downlink multi-user SIM-assisted system using statistical channel state information (CSI) was proposed in [14]. Utilizing statistical CSI, the ergodic sum-rate was optimized for a satellite communication system in [15]. In [16], a joint optimization of the SIM phase shifts and transmit power allocation for maximizing the sum-rate in a SIM-assisted multi-user multiple-input single-output (MISO) communication system was implemented, employing a deep reinforcement learning (DRL) approach. In [17], the authors optimized the achievable rate in an uplink SIM-based cell-free MIMO architecture with distributed signal processing. In this setup, each access point (AP) performs local detection of user information, and a central processing unit (CPU) subsequently combines these local estimates to recover the final user information.

The integration of SIMs with transmitters and receivers into a so-called SIM-based HMIMO system, which performs signal precoding and combining in the wave domain, was proposed in [18]. The introduced channel fitting approach enables the SIM-based HMIMO system to achieve significant channel capacity gains compared to mMIMO and RIS-assisted counterparts. Furthermore, the optimization of achievable rates for the SIM-based HMIMO system was studied in [19]. An approach for the mutual information maximization in a SIM-based HMIMO system with discrete signaling was presented in [20], using the cutoff rate as an alternative metric. This study demonstrates that incorporating even a small-scale digital precoder into the system can substantially increase the mutual information performance.

Despite the extensive research summarized in the aforementioned papers, the EE analysis of SIM-assisted MIMO systems remains unexplored. Motivated by this gap, we aim to maximize the EE for a SIM-aided broadcast system with DPC and linear precoding. To find the maximum achievable EE in the case of DPC, we formulate a joint optimization problem of the covariance matrix of the transmitted signal and the phase shifts of the SIM meta-elements. For linear precoding, which is more practical, we consider a joint optimization problem of the transmit signal precoding and the phase shifts of the SIM meta-elements. In both cases, the BS has a limited total power budget and the SIM meta-elements are subject to the unit modulus constraint. The main contributions of this paper are summarized as follows:

  • For DPC, we exploit the well-known Gaussian MIMO BC-MAC duality, reformulating the original EEmax problem as a function of the users’ covariance matrices in the MAC and the phase shifts of the SIM meta-elements. In the context of the adopted AO framework, we present an efficient solution for optimizing the users’ covariance matrices. This solution is based on a tight and concave lower bound of the achievable sum-rate, which is derived using the SCA method. By applying Dinkelbach’s method, we then obtain the optimal users’ covariance matrices by closed-form expressions. Our complexity analysis demonstrates that our proposed method has significantly lower complexity compared to an existing solution. For the optimization of the phase shifts of the SIM meta-elements, we employ a conventional projected gradient-based method, updating all SIM layers in parallel. This approach is viable considering the large size of this problem. In this context, we derive closed-form expressions for the complex-valued gradients involved.

  • For linear precoding, we leverage an interesting recent result for the sum-rate maximization that allows for reformulating the considered EEmax problem as an equivalent one, but with a greatly reduced dimension. After this important step, we again invoke the SCA method to derive a quadratic lower bound of the achievable sum-rate and approximate the EEmax problem as a concave fractional program. Next, we apply Dinkebach’s method to solve the resulting problem, where optimal users’ precoders are found by closed-form expressions. Similar to the DPC-based scheme, the phase shifts of the SIM meta-elements in this setting are optimized in parallel using a conventional projected gradient-based method.

  • We present efficient implementations of the proposed algorithms, analyze their computational complexities in terms of the number of complex multiplications, and mathematically prove their convergence.

  • We show through simulation results that the proposed algorithms can substantially increase the EE in SIM-aided broadcast communication systems, with greater improvements observed in the case of DPC. Moreover, we demonstrate that using the aforementioned precoding schemes is crucial to mitigate the impact of multi-user inference, especially in systems with a large number of users. We also provide several valuable insights into the design and performance of SIM-based holographic MIMO systems. First, we show that the EE is highly dependent on the number and distribution of SIM meta-elements across the SIM layers. Second, we find that the EE for a SIM-aided system with a low number of meta-elements can even be lower than the EE for a conventional MIMO system without SIM integration. Third, in SIM-aided broadcast systems without digital precoding, optimal EE transmission involves activating only a subset of the available transmit antennas, where each antenna in this subset transmits an independent data stream. Lastly, we demonstrate that at least 3 bits per meta-element are required to ensure that the reduction in EE caused by quantization errors remains within acceptable limits.

Notation: Bold lower and upper case letters represent vectors and matrices, respectively. m×n\mathbb{C}^{m\times n} denotes the space of m×nm\times n complex matrices. 𝐇T\mathbf{H}^{T} and 𝐇H\mathbf{H}^{H} denote the transpose and Hermitian transpose of 𝐇\mathbf{H}, respectively. |𝐇||\mathbf{H}| is the determinant of 𝐇\mathbf{H} and Tr(𝐇)\operatorname{Tr}(\mathbf{H}) denotes the trace of 𝐇\mathbf{H}. log2()\log_{2}(\cdot) is the binary logarithm, ln()\ln(\cdot) is the natural logarithm, ()+(\cdot)^{+} denotes the pseudo-inverse and ()\left(\cdot\right)^{\ast} denotes the complex conjugate. 𝐇\left\|\mathbf{H}\right\| denotes the Frobenius norm of 𝐇\mathbf{H} which reduces to the Euclidean norm if 𝐇\mathbf{H} is a vector. vecd(𝐇)\operatorname{vec}_{d}(\mathbf{H}) is the vector comprised of the diagonal elements of 𝐇\mathbf{H}. The notation 𝐀()𝐁\mathbf{A}\succeq(\succ)\mathbf{B} means that 𝐀𝐁\mathbf{A}-\mathbf{B} is positive semidefinite (definite). 𝐈\mathbf{I} represents an identity matrix whose size depends from the context. (𝐱)\Re(\mathbf{x}) and (𝐱)\Im(\mathbf{x}) denote the real and imaginary part of 𝐱\mathbf{x}, respectively. For a vector 𝐱\mathbf{x}, diag(𝐱)\operatorname{diag}(\mathbf{x}) denotes a diagonal matrix with the elements of 𝐱\mathbf{x} on the diagonal. 𝒞𝒩(μ,σ2\mathcal{CN}(\mu,\sigma^{2}) denotes a circularly symmetric complex Gaussian random variable with mean μ\mu and variance σ2\sigma^{2}. |x||x| denotes the modulus of the complex number xx, and |𝐇\mathbf{H}| denotes the determinant of 𝐇\mathbf{H}. Finally, we denote by 𝐱f()\nabla_{\mathbf{x}}f(\cdot) the complex gradient of f()f(\cdot) with respect to (w.r.t.) 𝐱\mathbf{x}^{\ast}, i.e., 𝐱f()=12(f()(𝐱)+jf()(𝐱))\nabla_{\mathbf{x}}f(\cdot)=\frac{1}{2}\Bigl{(}\frac{\partial f(\cdot)}{\partial\Re(\mathbf{x})}+j\frac{\partial f(\cdot)}{\partial\Im(\mathbf{x})}\Bigr{)}.

II System Model and Problem Formulation

II-A System Model

We consider a multi-user broadcast system in which a BS with NtN_{t} transmit antennas communicates with KK users, where each user has NrN_{r} receive antennas. The BS is also equipped with a SIM, which consists of LL metasurface layers with NN meta-elements per layer. In general, SIMs are controlled by external field programmable gate array (FPGA) devices, which adjust the phase shifts of individual meta-elements, thereby implementing signal beamforming directly in the EM wave domain.111By considering both SIM (i.e., wave-based precoding) and digital precoding, our system model is general enough to include the wave-based only precoding as a special case.

The phase shifts of the meta-elements in the ll-th SIM layer are presented by the diagonal matrix 𝚽l=diag(ϕl)=diag([ϕ1lϕ2lϕNl]T)\boldsymbol{\mathbf{\Phi}}^{l}=\text{diag}(\boldsymbol{\phi}^{l})=\text{diag}([\phi_{1}^{l}\>\phi_{2}^{l}\>\cdots\>\phi_{N}^{l}]^{T}), where ϕnl=exp(jθnl)\phi_{n}^{l}=\exp(j\theta_{n}^{l}) and θnl\theta_{n}^{l} is the phase shift introduced by the nn-th element of the ll-th layer. Signal propagation between two consecutive layers, l1l-1 and ll, of the SIM is modeled by the matrix 𝐖lN×N\mathbf{W}^{l}\in\mathbb{C}^{N\times N} for l=2,3,,Ll=2,3,\dots,L. More precisely, signal propagation between the nn-th meta-element of the (l1)(l-1)-th and the mm-th meta-element of ll-th layer of the SIM is presented by the (m,n)(m,n)-th element of 𝐖l\mathbf{W}^{l}, which is calculated according to the Rayleigh-Sommerfeld diffraction theory as [21, Eq. (1)]

[𝐖l]m,n=Akcosχm,ndm,n(12πdm,njλ)ej2πdm,nλ[\mathbf{W}^{l}]_{m,n}=\frac{A_{k}\cos\chi_{m,n}}{d_{m,n}}\Bigl{(}\frac{1}{2\pi d_{m,n}}-\frac{j}{\lambda}\Bigr{)}e^{j\frac{2\pi d_{m,n}}{\lambda}} (1)

where AkA_{k} is the area of each meta-element, dm,nd_{m,n} is the distance between the meta-elements of these two layers of the SIM, χm,n\chi_{m,n} is the angle between the propagation direction and normal direction of the (l1)(l-1)-th layer, and λ\lambda is the wavelength. Signal propagation between the transmit antenna array and the first layer of the SIM is presented by the matrix 𝐖1N×Nt\mathbf{W}^{1}\in\mathbb{C}^{N\times N_{t}}, whose elements can be calculated as in (1). Finally, the EM response of the transmit SIM can be written as

𝐁=𝚽L𝐖L𝚽2𝐖2𝚽1𝐖1N×Nt.\mathbf{B}=\boldsymbol{\mathbf{\Phi}}^{L}\mathbf{W}^{L}\cdots\boldsymbol{\mathbf{\Phi}}^{2}\mathbf{W}^{2}\boldsymbol{\mathbf{\Phi}}^{1}\mathbf{W}^{1}\in\mathbb{C}^{N\times N_{t}}. (2)

For the considered system, the end-to-end channel matrix between the BS and the kk-th user receive antenna array is given by

𝐇k=𝐆k𝐁Nr×Nt\mathbf{H}_{k}=\mathbf{G}_{k}\mathbf{B}\in\mathbb{C}^{N_{r}\times N_{t}} (3)

where 𝐆k\mathbf{G}_{k} Nr×N\in\mathbb{C}^{N_{r}\times N} denotes the channel matrix between the final layer of the SIM and user kk. We assume that 𝐆k\mathbf{G}_{k} is perfectly known to the BS in an effort to investigate its full theoretical potential.

II-B Dirty Paper Coding (DPC)

In a multi-user broadcast system, the received signal at user kk is given by

𝐲k=𝐇k𝐬k+j<k𝐇k𝐬j+j>k𝐇k𝐬j+𝐧k\mathbf{y}_{k}=\mathbf{H}_{k}\mathbf{s}_{k}+\sum\nolimits_{j<k}\mathbf{H}_{k}\mathbf{s}_{j}+\sum\nolimits_{j>k}\mathbf{H}_{k}\mathbf{s}_{j}+\mathbf{n}_{k} (4)

where 𝐇kNr×Nt\mathbf{H}_{k}\in\mathbb{C}^{N_{r}\times N_{t}} is the channel matrix for user kk, 𝐬kNt×1\mathbf{s}_{k}\in\mathbb{C}^{N_{t}\times 1} is the transmitted signal intended for user kk, and 𝐬jNt×1\mathbf{s}_{j}\in\mathbb{C}^{N_{t}\times 1} for jij\neq i are the transmitted signals intended for the other users, which act as interference for the detection of 𝐬k\mathbf{s}_{k}. The noise vector 𝐧kNr×1\mathbf{n}_{k}\in\mathbb{C}^{N_{r}\times 1} consists of independent and identically distributed (i.i.d.) elements that are distributed according to 𝒞𝒩(0,σ2)\mathcal{CN}(0,\sigma^{2}), where σ2\sigma^{2} is the noise variance. DPC is capable of eliminating the interference term j<k𝐇k𝐱j\sum_{j<k}\mathbf{H}_{k}\mathbf{x}_{j}, which is caused by users 1,2,,k11,2,\dots,k-1. Therefore, the achievable rate of user kk is given by

RBC,k(𝐐,ϕ)=ln|𝐈+𝐇k(jk𝐐j)𝐇kH||𝐈+𝐇k(j>k𝐐j)𝐇kH|,R_{\text{BC},k}(\mathbf{Q},\boldsymbol{\phi})=\ln\frac{\Bigl{|}\mathbf{I}+\mathbf{H}_{k}\bigl{(}\sum_{j\geq k}\mathbf{Q}_{j}\bigr{)}\mathbf{H}_{k}^{H}\Bigr{|}}{\Bigl{|}\mathbf{I}+\mathbf{H}_{k}\bigl{(}\sum_{j>k}\mathbf{Q}_{j}\bigr{)}\mathbf{H}_{k}^{H}\Bigr{|}}, (5)

where, by slight abuse of notation, 𝐇k\mathbf{H}_{k} stands for 𝐇k/σ\mathbf{H}_{k}/\sigma (i.e., 𝐇k\mathbf{H}_{k} is normalized by the square root of the noise power), 𝐐k=𝔼{𝐬k𝐬kH}𝟎\mathbf{Q}_{k}=\mathbb{E}\bigl{\{}\mathbf{s}_{k}\mathbf{s}_{k}^{H}\bigr{\}}\succeq\mathbf{0} is the input covariance matrix of user kk and 𝐐=(𝐐1,𝐐2,,𝐐K)\mathbf{Q}=(\mathbf{Q}_{1},\mathbf{Q}_{2},\dots,\mathbf{Q}_{K}). These covariance matrices are constrained by the total power budget as

k=1KTr(𝐐k)Pmax\sum\nolimits_{k=1}^{K}\operatorname{Tr}(\mathbf{Q}_{k})\leq P_{\max} (6)

where PmaxP_{\max} is the available transmit power budget.

II-C Multi-user MIMO with Linear Precoding

Although DPC is a capacity achieving scheme, it has high complexity due to its nonlinear processing nature. On the other hand, linear precoding is much simpler to implement in practice. For linear precoding, the transmitted signal is expressed as

𝐱=k=1K𝐏k𝐬k\mathbf{x}=\sum\nolimits_{k=1}^{K}\mathbf{P}_{k}\mathbf{s}_{k} (7)

where 𝐬k\mathbf{s}_{k} is the signal intended for user kk and 𝐏kNt×Nr\mathbf{P}_{k}\in\mathbb{C}^{N_{t}\times N_{r}} is the corresponding linear precoder. Thus, the received signal at user kk is given by

𝐲k\displaystyle\mathbf{y}_{k} =𝐇k𝐱+𝐧k\displaystyle=\mathbf{H}_{k}\mathbf{x}+\mathbf{n}_{k}
=𝐇k𝐏k𝐬k+j=1,jkK𝐇k𝐏j𝐬j+𝐧k.\displaystyle=\mathbf{H}_{k}\mathbf{P}_{k}\mathbf{s}_{k}+\sum\nolimits_{j=1,j\neq k}^{K}\mathbf{H}_{k}\mathbf{P}_{j}\mathbf{s}_{j}+\mathbf{n}_{k}. (8)

By treating the multiuser interference as Gausian noise, the achievable rate of user kk is given by

RL,k(𝐏,ϕ)=\displaystyle\!\!\!R_{\text{L},k}(\mathbf{P},\boldsymbol{\phi})= ln|𝐈+𝐇k𝐏k𝐏kH𝐇kH\displaystyle\ln\bigg{|}\mathbf{I}+\mathbf{H}_{k}\mathbf{P}_{k}\mathbf{P}_{k}^{H}\mathbf{H}_{k}^{H}
×(𝐈+j=1,jkK𝐇k𝐏j𝐏jH𝐇kH)1|\displaystyle\times\left(\mathbf{I}+\sum\nolimits_{j=1,j\neq k}^{K}\mathbf{H}_{k}\mathbf{P}_{j}\mathbf{P}_{j}^{H}\mathbf{H}_{k}^{H}\right)^{-1}\bigg{|} (9)

where 𝐇k=𝐇k/σ\mathbf{H}_{k}=\mathbf{H}_{k}/\sigma and the precoding matrices 𝐏=(𝐏1,𝐏2,,𝐏K)\mathbf{P}=(\mathbf{P}_{1},\mathbf{P}_{2},\dots,\mathbf{P}_{K}) have to satisfy the total power constraint:

k=1KTr(𝐏k𝐏kH)Pmax.\sum\nolimits_{k=1}^{K}\operatorname{Tr}(\mathbf{P}_{k}\mathbf{P}_{k}^{H})\leq P_{\max}. (10)

II-D Problem Formulation

In this paper, our goal is to maximize the EE of the considered communication system, which is defined as the ratio of the sum-rate and the total power consumption. To this end, we model the total power consumption as

Ptot=Pt+NtPc+P0+LNPs,P_{\mathrm{tot}}=P_{t}+N_{t}P_{c}+P_{0}+LNP_{s}, (11)

where PtP_{t} is the data-dependent transmit signal power, PcP_{c} is the circuit power per RF chain, P0P_{0} is the basic power consumed at the BS, and PsP_{s} is the power consumption of the switching circuits (e.g., PIN diode, varactor diodes) of every SIM meta-element.222In this model, we assume that the control and driving circuits of the SIM are integrated in the BS, and the power consumption of these circuits is already included in the BS power consumption, P0P_{0}. Note that Pt=k=1KTr(𝐐k)P_{t}=\sum\nolimits_{k=1}^{K}\operatorname{Tr}(\mathbf{Q}_{k}) for the DPC-based scheme and Pt=k=1KTr(𝐏k𝐏kH)P_{t}=\sum\nolimits_{k=1}^{K}\operatorname{Tr}(\mathbf{P}_{k}\mathbf{P}_{k}^{H}) for the linear precoding scheme. Since BSs have the largest power consumption in mobile networks, the users’ consumed power is not taken into account in the considered EE optimization.

For the DPC-based scheme, the EE maximization (EEmax) problem is stated as

​​maximize𝐐,ϕ\underset{\mathbf{Q},\boldsymbol{\phi}}{\operatorname{maximize}} ηdpc=Wk=1KRBC,k(𝐐,ϕ)k=1KTr(𝐐k)+NtPc+P0+LNPs\displaystyle\eta_{\mathrm{dpc}}=\frac{W\sum_{k=1}^{K}R_{\text{BC},k}(\mathbf{Q},\boldsymbol{\phi})}{\sum_{k=1}^{K}\operatorname{Tr}(\mathbf{Q}_{k})+N_{t}P_{c}+P_{0}+LNP_{s}} (12a)
subjectto\displaystyle\operatorname{subject~{}to} |ϕ|=1,\displaystyle\left|\boldsymbol{\phi}\right|=1, (12b)
k=1KTr(𝐐k)Pmax;𝐐k𝟎,k,\displaystyle\sum\nolimits_{k=1}^{K}\operatorname{Tr}(\mathbf{Q}_{k})\leq P_{\text{max}};\mathbf{Q}_{k}\succeq\mathbf{0},\forall k, (12c)

where WW is the system bandwidth. Similarly, the EEmax problem with linear precoding is written as

​​maximize𝐏,ϕ\underset{\mathbf{P},\boldsymbol{\phi}}{\operatorname{maximize}} ηlp=Wk=1KRL,k(𝐏,ϕ)k=1KTr(𝐏k𝐏kH)+NtPc+P0+LNPs\displaystyle\eta_{\mathrm{lp}}=\frac{W\sum_{k=1}^{K}R_{\text{L},k}(\mathbf{P},\boldsymbol{\phi})}{\sum_{k=1}^{K}\operatorname{Tr}(\mathbf{P}_{k}\mathbf{P}_{k}^{H})+N_{t}P_{c}+P_{0}+LNP_{s}} (13a)
subjectto\displaystyle\operatorname{subject~{}to} |ϕ|=1,\displaystyle\left|\boldsymbol{\phi}\right|=1, (13b)
k=1KTr(𝐏k𝐏kH)Pmax.\displaystyle\sum\nolimits_{k=1}^{K}\operatorname{Tr}(\mathbf{P}_{k}\mathbf{P}_{k}^{H})\leq P_{\text{max}}. (13c)

Since WW is constant, we will drop it when solving (12) and (13), but it is included in simulation results in Section VII. In the following sections, we present our proposed methods for solving the above two EEmax problems.

III Proposed Solution to DPC-based SIM

To solve (12), we present an iterative optimization algorithm which optimizes the covariance matrices and the SIM phase shifts in an alternating manner, which is a prevailing method in existing studies for SIM. In particular, for fixed phase shifts, we propose a novel method which can optimize the covariance matrices in parallel, using a closed-form expression. Our proposed method is derived by applying Dinkelbach’s method to maximize a quadratic lower bound of the objective, iteratively. The phase shifts of the meta-elements of the SIM layers are optimized by a gradient-based optimization method, which is a natural choice, considering the extremely large size of the SIM.

III-A Covariance Matrix Optimization

We remark that the objective function in (12a) is neither convex nor concave with respect to the optimization variables. To deal with this, we exploit the well-know duality between BCs and MACs, introduced in [22], which states that the achievable sum-rate of the MIMO BC equals the achievable sum-rate of the dual MIMO MAC. Accordingly, (12) is equivalent to the EEmax problem in the dual MAC, which is expressed as

​​maximize𝐒,ϕ\underset{\mathbf{S},\boldsymbol{\phi}}{\operatorname{maximize}} ln|𝐈+k=1K𝐇kH𝐒k𝐇k|k=1KTr(𝐒k)+NtPc+P0+LNPs\displaystyle\quad\frac{\ln\left|\mathbf{I}+\sum_{k=1}^{K}\mathbf{H}_{k}^{H}\mathbf{S}_{k}\mathbf{H}_{k}\right|}{\sum_{k=1}^{K}\operatorname{Tr}(\mathbf{S}_{k})+N_{t}P_{c}+P_{0}+LNP_{s}} (14a)
subjectto\displaystyle\operatorname{subject~{}to} |ϕ|=1,\displaystyle\quad\left|\boldsymbol{\phi}\right|=1, (14b)
k=1KTr(𝐒k)P;𝐒k𝟎,k,\displaystyle\quad\sum\nolimits_{k=1}^{K}\operatorname{Tr}(\mathbf{S}_{k})\leq P;\mathbf{S}_{k}\succeq\mathbf{0},\forall k, (14c)

where 𝐇kH\mathbf{H}_{k}^{H} is the dual MAC of user kk. Also, 𝐒=(𝐒1,𝐒2,,𝐒K)\mathbf{S}=(\mathbf{S}_{1},\mathbf{S}_{2},\dots,\mathbf{S}_{K}), where 𝐒kNr×Nr\mathbf{S}_{k}\in\mathbb{C}^{N_{r}\times N_{r}} is the input covariance matrix of user kk in the dual MAC. Note that the equality constraint in (14b) is treated element-wise. The key idea to develop efficient solutions to (12a) is first to drop the power constraint (12c), which results in

maximize𝐒k0 g(𝐒)=\displaystyle\text{$\underset{\mathbf{S}_{k}\succeq 0}{\operatorname{maximize}}$ }g(\mathbf{S})= ln|𝐈+k=1K𝐇kH𝐒k𝐇k|k=1KTr(𝐒k)+NtPc+P0+LNPs.\displaystyle\frac{\ln\left|\mathbf{I}+\sum_{k=1}^{K}\mathbf{H}_{k}^{H}\mathbf{S}_{k}\mathbf{H}_{k}\right|}{\sum_{k=1}^{K}\operatorname{Tr}(\mathbf{S}_{k})+N_{t}P_{c}+P_{0}+LNP_{s}}. (15)

To appreciate the novelty of our proposed method, we briefly describe the block-coordinate method proposed in [23], which optimizes each 𝐒k\mathbf{S}_{k} sequentially, while other variables are fixed. More precisely, let 𝐒(n)=(𝐒1(n),𝐒2(n),,𝐒K(n))\mathbf{S}^{(n)}=(\mathbf{S}_{1}^{(n)},\mathbf{S}_{2}^{(n)},\dots,\mathbf{S}_{K}^{(n)}) denote the current iterate. Then the next iterate 𝐒k(n+1)\mathbf{S}_{k}^{(n+1)} is obtained as

𝐒k(n+1)\displaystyle\mathbf{S}_{k}^{(n+1)} =argmax𝐒k𝟎g(𝐒1(n+1),,𝐒k1(n+1),𝐒k,𝐒k+1(n),,𝐒K(n))\displaystyle=\arg\underset{\mathbf{S}_{k}\succeq\mathbf{0}}{\max}\>\>g\bigl{(}\mathbf{S}_{1}^{(n+1)},\ldots,\mathbf{S}_{k-1}^{(n+1)},\mathbf{S}_{k},\mathbf{S}_{k+1}^{(n)},\ldots,\mathbf{S}_{K}^{(n)}\bigr{)}
=argmax𝐒k𝟎ln|𝐙k|+ln|𝐈+𝐓kH𝐒k𝐓k|pk+Tr(𝐒k)\displaystyle=\arg\underset{\mathbf{S}_{k}\succeq\mathbf{0}}{\max}\>\>\frac{\ln\left|\mathbf{Z}_{k}\right|+\ln\left|\mathbf{I}+\mathbf{T}_{k}^{H}\mathbf{S}_{k}\mathbf{T}_{k}\right|}{p_{k}+\operatorname{Tr}(\mathbf{S}_{k})} (16)

where 𝐙k=𝐈+j=1,jkK𝐇jH𝐒j𝐇j\mathbf{Z}_{k}=\mathbf{I}+\sum_{j=1,j\neq k}^{K}\mathbf{H}_{j}^{H}\mathbf{S}_{j}\mathbf{H}_{j}, 𝐓k=𝐇k𝐙k1/2\mathbf{T}_{k}=\mathbf{H}_{k}\mathbf{Z}_{k}^{-1/2}, and pk=j=1,jkKTr(𝐒j)+NtPc+P0+LNPsp_{k}=\sum_{j=1,j\neq k}^{K}\operatorname{Tr}(\mathbf{\mathbf{S}}_{j})+N_{t}P_{c}+P_{0}+LNP_{s}. Next, to solve (16), the authors of [23] applied Dinkelbach’s method, which leads to the following problem

max𝐒k𝟎Fλ(𝐒)ln|𝐙k|+ln|𝐈+𝐓kH𝐒k𝐓k|λD(pk+Tr(𝐒k))\underset{\mathbf{S}_{k}\succeq\mathbf{0}}{\max}F_{\lambda}\bigl{(}\mathbf{S}\bigr{)}\triangleq\ln\left|\mathbf{Z}_{k}\right|+\ln\left|\mathbf{I}+\mathbf{T}_{k}^{H}\mathbf{S}_{k}\mathbf{T}_{k}\right|-\lambda_{D}(p_{k}+\operatorname{Tr}(\mathbf{S}_{k})) (17)

where λD\lambda_{D} is a non-negative parameter. For a given λD\lambda_{D}, the above problem can be solved in closed-form [23]. However, such a method requires to find the inverse of 𝐙k𝒞Nt×Nt\mathbf{Z}_{k}\in\mathcal{C}^{N_{t}\times N_{t}} which has a complexity of 𝒪(Nt3)\mathcal{O}(N_{t}^{3}) in general, and compute the singular value decomposition (SVD) of 𝐓k\mathbf{T}_{k} which has a complexity of 𝒪(Nt2Nr)\mathcal{O}(N_{t}^{2}N_{r}). Thus, the overall complexity of the method presented in [23] is very high.

In this paper, we propose a more efficient method based on the fact that a stationary solution to (15) is also globally optimal since (15) is a concave-convex program. This motivates us to adopt the SCA framework, which is normally used to find a stationary solution for nonconvex programs. To proceed, since 𝐒k𝟎\mathbf{S}_{k}\succeq\mathbf{0}, we can write 𝐒k=𝐔kH𝐔k\mathbf{S}_{k}=\mathbf{U}_{k}^{H}\mathbf{U}_{k} where 𝐔kNr×Nr\mathbf{U}_{k}\in\mathbb{C}^{N_{r}\times N_{r}}. Thus, (15) is equivalent to the following unconstrained optimization problem:

maximize𝐔k g(𝐔)=h(𝐔)k=1KTr(𝐔kH𝐔k)+NtPc+P0+LNPs.\text{$\underset{\mathbf{U}_{k}}{\operatorname{maximize}}$ }g(\mathbf{U})=\frac{h(\mathbf{U)}}{\sum_{k=1}^{K}\operatorname{Tr}(\mathbf{U}_{k}^{H}\mathbf{U}_{k})+N_{t}P_{c}+P_{0}+LNP_{s}}. (18)

where h(𝐔)=ln|𝐈+k=1K𝐇kH𝐔kH𝐔k𝐇k|h(\mathbf{U)}=\ln\left|\mathbf{I}+\sum_{k=1}^{K}\mathbf{H}_{k}^{H}\mathbf{U}_{k}^{H}\mathbf{U}_{k}\mathbf{H}_{k}\right|. It is easy to see that h(𝐔)h(\mathbf{U)} can be equivalently rewritten as

h(𝐔)=j=1Kln|𝐈+k=jK𝐇kH𝐔kH𝐔k𝐇k||𝐈+k=j+1K𝐇kH𝐔kH𝐔k𝐇k|\displaystyle h(\mathbf{U)}=\sum_{j=1}^{K}\ln\frac{\left|\mathbf{I}+\sum_{k=j}^{K}\mathbf{H}_{k}^{H}\mathbf{U}_{k}^{H}\mathbf{U}_{k}\mathbf{H}_{k}\right|}{\left|\mathbf{I}+\sum_{k=j+1}^{K}\mathbf{H}_{k}^{H}\mathbf{U}_{k}^{H}\mathbf{U}_{k}\mathbf{H}_{k}\right|} (19)
=j=1Kln|𝐈+(𝐈+k=j+1K𝐇kH𝐔kH𝐔k𝐇k)1𝐇kH𝐔kH𝐔k𝐇k|\displaystyle=\sum_{j=1}^{K}\ln\left|\mathbf{I}+\Bigl{(}\mathbf{I}+\sum_{k=j+1}^{K}\mathbf{H}_{k}^{H}\mathbf{U}_{k}^{H}\mathbf{U}_{k}\mathbf{H}_{k}\Bigr{)}^{-1}\mathbf{H}_{k}^{H}\mathbf{U}_{k}^{H}\mathbf{U}_{k}\mathbf{H}_{k}\right| (20)
=j=1Kln|𝐈+𝐔j𝐇j(𝐈+k=j+1K𝐇kH𝐔kH𝐔k𝐇k)1𝐇jH𝐔jH|.\displaystyle=\sum_{j=1}^{K}\ln\left|\mathbf{I}+\mathbf{U}_{j}\mathbf{H}_{j}\Bigl{(}\mathbf{I}+\sum_{k=j+1}^{K}\mathbf{H}_{k}^{H}\mathbf{U}_{k}^{H}\mathbf{U}_{k}\mathbf{H}_{k}\Bigr{)}^{-1}\mathbf{H}_{j}^{H}\mathbf{U}_{j}^{H}\right|. (21)

In fact, the jj-th term in the sum above is the capacity of user jj in the dual MAC, using successive interference cancellation [22]. As shown shortly, the above reformulation of h(𝐔)h(\mathbf{U)} allows for approximating h(𝐔)h(\mathbf{U)} by a “proper bound” to obtain a subproblem that admits a closed-form solution. In this regard, we recall the following inequality [24]:

ln|𝐈+𝐕𝐘1𝐕H|ln|𝐈+𝐕^𝐘^1𝐕^H|Tr(𝐕^𝐘^1𝐕^H)\displaystyle\ln\left|\mathbf{I}+\mathbf{V}\mathbf{Y}^{-1}\mathbf{V}^{H}\right|\geq\ln\left|\mathbf{I}+\hat{\mathbf{V}}\hat{\mathbf{Y}}^{-1}\hat{\mathbf{V}}^{H}\right|-\operatorname{Tr}(\hat{\mathbf{V}}\hat{\mathbf{Y}}^{-1}\hat{\mathbf{V}}^{H})
+2(Tr(𝐕^𝐘^1𝐕H))Tr(𝐀H(𝐕H𝐕+𝐘)),\displaystyle+2\mathfrak{R}(\operatorname{Tr}(\hat{\mathbf{V}}\hat{\mathbf{Y}}^{-1}\mathbf{V}^{H}))-\operatorname{Tr}(\mathbf{A}^{H}(\mathbf{V}^{H}\mathbf{V}+\mathbf{Y})), (22)

where 𝐀=𝐘^1(𝐕^H𝐕^+𝐘^)1\mathbf{A}=\hat{\mathbf{Y}}^{-1}-(\hat{\mathbf{V}}^{H}\hat{\mathbf{V}}+\hat{\mathbf{Y}})^{-1}. We remark that the above inequality holds for arbitrary 𝐕\mathbf{V}, 𝐕^\hat{\mathbf{V}}, 𝐘𝟎\mathbf{Y}\succ\mathbf{0} and 𝐘^𝟎\hat{\mathbf{Y}}\succ\mathbf{0}, whose sizes are compatible and the equality occurs when 𝐕=𝐕^\mathbf{V}=\hat{\mathbf{V}} and 𝐘=𝐘^\mathbf{Y}=\hat{\mathbf{Y}}. In light of the SCA framework, we denote by 𝐔j(n)\mathbf{U}_{j}^{(n)} the value of 𝐔j\mathbf{U}_{j} after nn iterations. Now let

Input: 𝐇\mathbf{H}, 𝐒\mathbf{S}, λD(0)0\lambda_{D}^{(0)}\geq 0, m0m\leftarrow 0
1
2repeat
3     
4     n0n\leftarrow 0
5     
6     repeat
7          
8          Calculate all 𝐕^j(n)\hat{\mathbf{V}}_{j}^{(n)}, 𝐘^j(n)\hat{\mathbf{Y}}_{j}^{(n)}, 𝐀j(n)\mathbf{A}_{j}^{(n)}, 𝐁j(n)\mathbf{B}_{j}^{(n)}
9          
10          Calculate all 𝐔j(n+1)\mathbf{U}_{j}^{(n+1)} from (32)
11          
12          nn+1n\leftarrow n+1
13          
14           until g(𝐔(n+1))g(𝐔(n))>ϵ1,Dg(\mathbf{U}^{(n+1)})-g(\mathbf{U}^{(n)})>\epsilon_{1,D}
15          λD(m+1)g(𝐔(n))\lambda_{D}^{(m+1)}\leftarrow g(\mathbf{U}^{(n)}) according to (18)
16          
17          mm+1m\leftarrow m+1
18          
19          𝐔(0)𝐔(n)\mathbf{U}^{(0)}\leftarrow\mathbf{U}^{(n)}
20          
21           until λD(m+1)λD(m)>ϵ2,D\lambda_{D}^{(m+1)}-\lambda_{D}^{(m)}>\epsilon_{2,D}
22          Calculate all 𝐒k=𝐔k(n)H𝐔k(n)\mathbf{S}_{k}^{*}=\mathbf{U}_{k}^{(n)H}\mathbf{U}_{k}^{(n)}
23          
24          if  k=1KTr(𝐒k)P\sum\nolimits_{k=1}^{K}\operatorname{Tr}(\mathbf{S}_{k}^{*})\leq P  then
25               
26               𝐒opt=𝐒\mathbf{S}_{\text{opt}}=\mathbf{S}^{*}
27                else
28                    
29                    𝐒opt=𝐒^\mathbf{S}_{\text{opt}}=\hat{\mathbf{S}} obtained from [25]
30                    
31                     end if
Algorithm 1 Optimization of the covariance matrices in the dual MAC.
𝐕=𝐔j𝐇j𝐕jNr×Nt,\mathbf{V}=\mathbf{U}_{j}\mathbf{H}_{j}\triangleq\mathbf{V}_{j}\in\mathbb{C}^{N_{r}\times N_{t}},
𝐕^=𝐔j(n)𝐇j𝐕^j(n)Nr×Nt,\hat{\mathbf{V}}=\mathbf{U}_{j}^{(n)}\mathbf{H}_{j}\triangleq\hat{\mathbf{V}}_{j}^{(n)}\in\mathbb{C}^{N_{r}\times N_{t}},
𝐘=𝐈+k=j+1K𝐇kH𝐔kH𝐔k𝐇k=𝐈+k=j+1K𝐕jH𝐕j𝐘j\mathbf{Y}=\mathbf{I}+\sum_{k=j+1}^{K}\mathbf{H}_{k}^{H}\mathbf{U}_{k}^{H}\mathbf{U}_{k}\mathbf{H}_{k}=\mathbf{I}+\sum_{k=j+1}^{K}\mathbf{V}_{j}^{H}\mathbf{V}_{j}\triangleq\mathbf{Y}_{j}

and

𝐘^\displaystyle\hat{\mathbf{Y}} =𝐈+k=j+1K𝐇kH𝐔k(n)H𝐔k(n)𝐇k\displaystyle=\mathbf{I}+\sum_{k=j+1}^{K}\mathbf{H}_{k}^{H}\mathbf{U}_{k}^{(n)H}\mathbf{U}_{k}^{(n)}\mathbf{H}_{k}
=𝐈+k=j+1K𝐕^j(n)H𝐕^j(n)𝐘^j(n)Nt×Nt.\displaystyle=\mathbf{I}+\sum_{k=j+1}^{K}\hat{\mathbf{V}}_{j}^{(n)H}\hat{\mathbf{V}}_{j}^{(n)}\triangleq\hat{\mathbf{Y}}_{j}^{(n)}\mathbb{C}^{N_{t}\times N_{t}}.

Then (22) implies

h(𝐔)\displaystyle h(\mathbf{U}) h¯(𝐔;𝐔(n))=c(n)+j=1K2(Tr(𝐁j(n)𝐔jH))\displaystyle\geq\bar{h}(\mathbf{U};\mathbf{U}^{(n)})=c^{(n)}+\sum\nolimits_{j=1}^{K}2\mathfrak{R}\Bigl{(}\operatorname{Tr}(\mathbf{B}_{j}^{(n)}\mathbf{U}_{j}^{H})\Bigr{)}
j=1KTr(𝐀j(n)Hk=jK𝐇kH𝐔kH𝐔k𝐇k).\displaystyle\quad-\sum\nolimits_{j=1}^{K}\operatorname{Tr}(\mathbf{A}_{j}^{(n)H}\sum\nolimits_{k=j}^{K}\mathbf{H}_{k}^{H}\mathbf{U}_{k}^{H}\mathbf{U}_{k}\mathbf{H}_{k}). (23)

where

c(n)\displaystyle c^{(n)} =j=1K[ln|𝐈+𝐕^j(n)(𝐘^(n))1𝐕^j(n)H|\displaystyle=\sum\nolimits_{j=1}^{K}\Bigg{[}\ln\Bigl{|}\mathbf{I}+\hat{\mathbf{V}}_{j}^{(n)}\Bigl{(}\hat{\mathbf{Y}}^{(n)}\Bigr{)}^{-1}\hat{\mathbf{V}}_{j}^{(n)H}\Bigr{|}
Tr(𝐕^j(n)(𝐘^(n))1𝐕^j(n)H)Tr(𝐀j(n)H)]\displaystyle-\operatorname{Tr}(\hat{\mathbf{V}}_{j}^{(n)}\Bigl{(}\hat{\mathbf{Y}}^{(n)}\Bigr{)}^{-1}\hat{\mathbf{V}}_{j}^{(n)H})-\operatorname{Tr}(\mathbf{A}_{j}^{(n)H})\Bigg{]}
=h(𝐔(n))\displaystyle=h(\mathbf{U}^{(n)})
j=1KTr(𝐕^j(n)(𝐘^(n))1𝐕^j(n)H)Tr(𝐀j(n)H)\displaystyle\quad-\sum\nolimits_{j=1}^{K}\operatorname{Tr}(\hat{\mathbf{V}}_{j}^{(n)}\Bigl{(}\hat{\mathbf{Y}}^{(n)}\Bigr{)}^{-1}\hat{\mathbf{V}}_{j}^{(n)H})-\operatorname{Tr}(\mathbf{A}_{j}^{(n)H})
𝐁j(n)\displaystyle\mathbf{B}_{j}^{(n)} =𝐁¯j(n)𝐇jHNr×Nr;𝐁¯j(n)=𝐕^j(n)(𝐘^j(n))1Nr×Nt\displaystyle=\bar{\mathbf{B}}_{j}^{(n)}\mathbf{H}_{j}^{H}\in\mathbb{C}^{N_{r}\times N_{r}};\bar{\mathbf{B}}_{j}^{(n)}=\hat{\mathbf{V}}_{j}^{(n)}\Bigl{(}\hat{\mathbf{Y}}_{j}^{(n)}\Bigr{)}^{-1}\in\mathbb{C}^{N_{r}\times N_{t}}
𝐀j(n)\displaystyle\mathbf{A}_{j}^{(n)} =(𝐘^j(n))1(𝐕^j(n)H𝐕^j(n)+𝐘^j(n))1\displaystyle=\Bigl{(}\hat{\mathbf{Y}}_{j}^{(n)}\Bigr{)}^{-1}-\Bigl{(}\hat{\mathbf{V}}_{j}^{(n)H}\hat{\mathbf{V}}_{j}^{(n)}+\hat{\mathbf{Y}}_{j}^{(n)}\Bigr{)}^{-1}
=(𝐘^j(n))1(𝐘^j1(n))1Nt×Nt\displaystyle=\Bigl{(}\hat{\mathbf{Y}}_{j}^{(n)}\Bigr{)}^{-1}-\Bigl{(}\hat{\mathbf{Y}}_{j-1}^{(n)}\Bigr{)}^{-1}\in\mathbb{C}^{N_{t}\times N_{t}}

Regarding the complexity of the above approximation, the following remark is in order.

Remark 1.

Since 𝐘^Nt×Nt\hat{\mathbf{Y}}\in\mathbb{C}^{N_{t}\times N_{t}}, it may appear that the complexity of constructing the above bound is 𝒪(Nt3)\mathcal{O}(N_{t}^{3}) due to the computation of (𝐘^j(n))1\Bigl{(}\hat{\mathbf{Y}}_{j}^{(n)}\Bigr{)}^{-1}. However, we emphasize that this is not the case. Specifically, by invoking the matrix-inversion lemma we can write

(𝐘^j(n))1\displaystyle\Bigl{(}\hat{\mathbf{Y}}_{j}^{(n)}\Bigr{)}^{-1} =(𝐕^j+1(n)H𝐕^j+1(n)+𝐘^j+1(n))1\displaystyle=\bigl{(}\hat{\mathbf{V}}_{j+1}^{(n)H}\hat{\mathbf{V}}_{j+1}^{(n)}+\hat{\mathbf{Y}}_{j+1}^{(n)}\bigr{)}^{-1} (24)
=(𝐘^j+1(n))1(𝐘^j+1(n))1𝐕^j+1(n)H(𝐈+\displaystyle=\bigl{(}\hat{\mathbf{Y}}_{j+1}^{(n)}\bigr{)}^{-1}-\bigl{(}\hat{\mathbf{Y}}_{j+1}^{(n)}\bigr{)}^{-1}\hat{\mathbf{V}}_{j+1}^{(n)H}\bigl{(}\mathbf{I}+ (25)
𝐕^j+1(n)(𝐘^j+1(n))1𝐕^j+1(n)H)1𝐕^j+1(n)(𝐘^j+1(n))1\displaystyle\quad\hat{\mathbf{V}}_{j+1}^{(n)}\bigl{(}\hat{\mathbf{Y}}_{j+1}^{(n)}\bigr{)}^{-1}\hat{\mathbf{V}}_{j+1}^{(n)H}\bigr{)}^{-1}\hat{\mathbf{V}}_{j+1}^{(n)}\bigl{(}\hat{\mathbf{Y}}_{j+1}^{(n)}\bigr{)}^{-1} (26)
=(𝐘^j+1(n))1𝐁¯j+1(n)H(𝐈+𝐁¯j+1(n)𝐕^j+1(n)H)1𝐁¯j+1(n),\displaystyle=\bigl{(}\hat{\mathbf{Y}}_{j+1}^{(n)}\bigr{)}^{-1}-\bar{\mathbf{B}}_{j+1}^{(n)H}\bigl{(}\mathbf{I}+\bar{\mathbf{B}}_{j+1}^{(n)}\hat{\mathbf{V}}_{j+1}^{(n)H}\bigr{)}^{-1}\bar{\mathbf{B}}_{j+1}^{(n)}, (27)

for j=1,2,,K1j=1,2,\ldots,K-1. The above equation indeed suggests a recursive method to compute (𝐘^j(n))1\Bigl{(}\hat{\mathbf{Y}}_{j}^{(n)}\Bigr{)}^{-1} efficiently. Suppose the inverse of 𝐘^j+1(n)\hat{\mathbf{Y}}_{j+1}^{(n)} is known. Then, we only need to compute the inverse of the matrix 𝐈+𝐁¯j+1(n)𝐕^j+1(n)HNr×Nr\mathbf{I}+\bar{\mathbf{B}}_{j+1}^{(n)}\hat{\mathbf{V}}_{j+1}^{(n)H}\in\mathbb{C}^{N_{r}\times N_{r}}, which has complexity of 𝒪(Nr3)\mathcal{O}(N_{r}^{3}), to obtain (𝐘^j(n))1\Bigl{(}\hat{\mathbf{Y}}_{j}^{(n)}\Bigr{)}^{-1}. Thus, starting from 𝐘^K(n)=𝐈\hat{\mathbf{Y}}_{K}^{(n)}=\mathbf{I}, we can gradually compute (𝐘^j(n))1\Bigl{(}\hat{\mathbf{Y}}_{j}^{(n)}\Bigr{)}^{-1} for j=K1,K2,1j=K-1,K-2,\ldots 1. In this way, we remark that 𝐀j(n)\mathbf{A}_{j}^{(n)} is also obtained easily.

Now, using the above lower bound of h(𝐔)h(\mathbf{U}), we consider the following approximate problem

maximize𝐔k h¯(𝐔;𝐔(n))k=1KTr(𝐔kH𝐔k)+NtPc+P0+LNPs.\text{$\underset{\mathbf{U}_{k}}{\operatorname{maximize}}$ }\frac{\bar{h}(\mathbf{U};\mathbf{U}^{(n)})}{\sum_{k=1}^{K}\operatorname{Tr}(\mathbf{U}_{k}^{H}\mathbf{U}_{k})+N_{t}P_{c}+P_{0}+LNP_{s}}. (28)

It is important to note that the above problem is a concave-convex fractional program since h¯(𝐔;𝐔(n))\bar{h}(\mathbf{U};\mathbf{U}^{(n)}) is concave. Thus, Dinkelbach’s method can be applied to find the optimal solution, which leads to the following parameterized problem

maximize𝐔kf(𝐔)=c(n)+j=1K2(Tr(𝐁j(n)𝐔jH))\displaystyle\underset{\mathbf{U}_{k}}{\operatorname{maximize}}f(\mathbf{U})=c^{(n)}+\sum_{j=1}^{K}2\mathfrak{R}\Bigl{(}\operatorname{Tr}(\mathbf{B}_{j}^{(n)}\mathbf{U}_{j}^{H})\Bigr{)} (29)
j=1KTr(𝐀j(n)Hk=jK𝐇kH𝐔kH𝐔k𝐇k)\displaystyle-\sum_{j=1}^{K}\operatorname{Tr}\bigl{(}\mathbf{A}_{j}^{(n)H}\sum\nolimits_{k=j}^{K}\mathbf{H}_{k}^{H}\mathbf{U}_{k}^{H}\mathbf{U}_{k}\mathbf{H}_{k}\bigr{)} (30)
λD(j=1KTr(𝐔jH𝐔j)+NtPc+P0+LNPs)\displaystyle-\lambda_{D}\Bigl{(}\sum_{j=1}^{K}\operatorname{Tr}(\mathbf{U}_{j}^{H}\mathbf{U}_{j})+N_{t}P_{c}+P_{0}+LNP_{s}\Bigr{)} (31)

where λD>0\lambda_{D}>0 is a given parameter. It is important to note that the above optimization problem can be solved independently for each 𝐔j\mathbf{U}_{j}, which admits the following closed-form solution

𝐔j=𝐁j(n)(𝐇jl=1j𝐀l(n)H𝐇jH+λ𝐈)1,j=1,2,K.\mathbf{U}_{j}=\mathbf{B}_{j}^{(n)}\Bigl{(}\mathbf{H}_{j}\sum_{l=1}^{j}\mathbf{A}_{l}^{(n)H}\mathbf{H}_{j}^{H}+\lambda\mathbf{I}\Bigr{)}^{-1},\>j=1,2,\ldots K. (32)

Let 𝐒\mathbf{S}^{\ast} be the optimal solution to (15). Obviously, if k=1KTr(𝐒k)Pmax\sum\nolimits_{k=1}^{K}\operatorname{Tr}(\mathbf{S}_{k}^{\ast})\leq P_{\max}, then 𝐒\mathbf{S}^{\ast} is also optimal to (12a). On the other hand, if k=1KTr(𝐒k)>Pmax\sum\nolimits_{k=1}^{K}\operatorname{Tr}(\mathbf{S}_{k}^{\ast})>P_{\max}, it is straightforward to see that the optimal solution to (12a) is obtained by solving the sum-rate maximization problem with the sum power constraint, which is defined as:

​​maximize𝐒k\underset{\mathbf{S}_{k}}{\operatorname{maximize}} ln|𝐈+k=1K𝐇kH𝐒k𝐇k|\displaystyle\ln\left|\mathbf{I}+\sum\nolimits_{k=1}^{K}\mathbf{H}_{k}^{H}\mathbf{S}_{k}\mathbf{H}_{k}\right| (33a)
subjectto\displaystyle\operatorname{subject~{}to} Tr(𝐒k)Pmax.\displaystyle\operatorname{Tr}(\mathbf{S}_{k})\leq P_{\max}. (33b)

An efficient method for solving the above problem was proposed in [25], which we omit the details for the sake of brevity. Let 𝐒^=(𝐒^1,𝐒^2,,𝐒^K)\hat{\mathbf{S}}=(\hat{\mathbf{S}}_{1},\hat{\mathbf{S}}_{2,}\dots,\hat{\mathbf{S}}_{K}) be the optimal solution to (33). Then it is easy to see that the optimal covariance matrices for (14) are given by

𝐒opt={𝐒k=1KTr(𝐒k)Pmax𝐒^otherwise.\mathbf{S}_{\text{opt}}=\begin{cases}\mathbf{S}^{\ast}&\sum\nolimits_{k=1}^{K}\operatorname{Tr}(\mathbf{S}_{k}^{\ast})\leq P_{\max}\\ \hat{\mathbf{S}}&\text{otherwise}.\end{cases} (34)

A summary of the described covariance matrix optimization method is presented in Algorithm 1.

III-B SIM Phase Shift Optimization

Since the power consumption does not depend on the channel matrix, the phase shift optimization can improve the EE by increasing the sum-rate of the considered system. Therefore, for fixed 𝐒\mathbf{S}, the SIM phase shift optimization problem is formulated as

maximizeϕ\displaystyle\underset{\boldsymbol{\phi}}{\operatorname{maximize}}\quad κ(ϕ)=ln|𝐈+k=1K𝐇kH𝐒k𝐇k|\displaystyle\kappa(\boldsymbol{\phi})=\ln\left|\mathbf{I}+\sum\nolimits_{k=1}^{K}\mathbf{H}_{k}^{H}\mathbf{S}_{k}\mathbf{H}_{k}\right| (35a)
subjectto\displaystyle\operatorname{subject~{}to}\quad |ϕ|=1.\displaystyle\left|\boldsymbol{\phi}\right|=1. (35b)

Considering the large size of the SIM, we adopt a gradient-based method to optimize the phase shifts, which consists of the following iterations:

ϕ(n+1)=𝒫ϕ(ϕ(n)+unϕκ(ϕ(n))),\boldsymbol{\phi}^{(n+1)}=\mathcal{P}_{\phi}(\boldsymbol{\phi}^{(n)}+u_{n}\nabla_{\boldsymbol{\phi}}\kappa(\boldsymbol{\phi}^{(n)})), (36)

where unu_{n} is the appropriate step size. The gradient of κ(ϕ)\kappa(\boldsymbol{\phi}) w.r.t. the phase shifts of the SIM is determined by the gradients of κ(ϕ)\kappa(\boldsymbol{\phi}) w.r.t. the phase shifts of the constituent SIM layers as

ϕκ(ϕ)\displaystyle\nabla_{\boldsymbol{\phi}}\kappa(\boldsymbol{\phi}) =[ϕ1κ(ϕ)TϕLκ(ϕ)T]T.\displaystyle=[\nabla_{\boldsymbol{\phi}^{1}}\kappa(\boldsymbol{\phi})^{T}\>\cdots\>\nabla_{\boldsymbol{\phi}^{L}}\kappa(\boldsymbol{\phi})^{T}]^{T}.

The gradient w.r.t. each ϕL\boldsymbol{\phi}^{L} is provided in Theorem 1.

Theorem 1.

The gradient of h(ϕ)h(\boldsymbol{\phi}) w.r.t. ϕL\boldsymbol{\phi}^{L} is given by

ϕlκ(ϕ)\displaystyle\nabla_{\boldsymbol{\phi}^{l}}\kappa(\boldsymbol{\phi}) =vecd(𝐂)\displaystyle=\operatorname{vec}_{d}(\mathbf{C}) (37)

where

𝐃\displaystyle\mathbf{D} =𝐈+k=1K𝐇kH𝐒k𝐇k\displaystyle=\mathbf{I}+\sum\nolimits_{k=1}^{K}\mathbf{H}_{k}^{H}\mathbf{S}_{k}\mathbf{H}_{k} (38a)
𝐂\displaystyle\mathbf{C} =𝚯l+1:Lk=1K𝐆kH𝐒k𝐇k𝐃1𝚯1:l1(𝐖l)H\displaystyle=\boldsymbol{\Theta}^{l+1:L}\sum\nolimits_{k=1}^{K}\mathbf{G}_{k}^{H}\mathbf{S}_{k}\mathbf{H}_{k}\mathbf{D}^{-1}\boldsymbol{\Theta}^{1:l-1}(\mathbf{W}^{l})^{H} (38b)

where 𝚯m:n=(𝐖m)H(𝚽m)H(𝐖n)H(𝚽n)H\boldsymbol{\Theta}^{m:n}=(\mathbf{W}^{m})^{H}(\boldsymbol{\mathbf{\Phi}}^{m})^{H}\cdots(\mathbf{W}^{n})^{H}(\boldsymbol{\mathbf{\Phi}}^{n})^{H}.

Proof:

Differentiating κ(ϕ)\kappa(\boldsymbol{\phi}) w.r.t. 𝚽l\mathbf{\boldsymbol{\Phi}}^{l}, we obtain

dκ(ϕ)=Tr(𝐃1k=1Kd(𝐇kH𝐒k𝐇k))=\displaystyle\!\!\!\!\!\!\text{d}\kappa(\boldsymbol{\phi})=\operatorname{Tr}\bigg{(}\mathbf{D}^{-1}\sum\nolimits_{k=1}^{K}\text{d}(\mathbf{H}_{k}^{H}\mathbf{S}_{k}\mathbf{H}_{k})\bigg{)}=
Tr(k=1K𝐒k𝐇k𝐃1d𝐇kH+k=1K𝐃1𝐇kH𝐒kd𝐇k).\displaystyle\operatorname{Tr}\bigg{(}\sum\nolimits_{k=1}^{K}\mathbf{S}_{k}\mathbf{H}_{k}\mathbf{D}^{-1}\text{d}\mathbf{H}_{k}^{H}+\sum\nolimits_{k=1}^{K}\mathbf{D}^{-1}\mathbf{H}_{k}^{H}\mathbf{S}_{k}\text{d}\mathbf{H}_{k}\bigg{)}. (39)

Substituting

d𝐇k=𝐆k𝚽L𝐖Ld𝚽l𝐖l𝚽1𝐖1\text{d}\mathbf{H}_{k}=\mathbf{G}_{k}\boldsymbol{\mathbf{\Phi}}^{L}\mathbf{W}^{L}\cdots\text{d}\boldsymbol{\mathbf{\Phi}}^{l}\mathbf{W}^{l}\cdots\boldsymbol{\mathbf{\Phi}}^{1}\mathbf{W}^{1} (40)

in the previous equation, it can be written as

dκ(ϕ)=Tr(𝐂d(𝚽l)H+𝐂Hd𝚽l)\text{d}\kappa(\boldsymbol{\phi})=\operatorname{Tr}(\mathbf{C}\text{d}(\mathbf{\boldsymbol{\Phi}}^{l})^{H}+\mathbf{C}^{H}\text{d}\mathbf{\boldsymbol{\Phi}}^{l}) (41)

where 𝐂\mathbf{C} is defined in (38b). Hence, we have

𝚽lκ(ϕ)=𝐂\nabla_{\boldsymbol{\mathbf{\Phi}}^{l}}\kappa(\boldsymbol{\phi})=\mathbf{C} (42)

and using 𝚽l=diag(ϕl),\boldsymbol{\mathbf{\Phi}}^{l}=\text{diag}(\boldsymbol{\phi}^{l}), we obtain (37). ∎

Since all the elements of ϕ\boldsymbol{\phi} have the unit amplitude, the projection 𝒫ϕ(ϕ)\mathcal{P}_{\phi}(\boldsymbol{\phi}) is defined by

ϕ¯nl={ϕnl/|ϕnl|ϕnl0,ejα,α[0,2π]ϕnl=0.\bar{\phi}_{n}^{l}=\begin{cases}\phi_{n}^{l}/|\phi_{n}^{l}|&\phi_{n}^{l}\neq 0,\\ e^{j\alpha},\alpha\in[0,2\pi]&\phi_{n}^{l}=0.\end{cases} (43)

Finally, the the EE optimization algorithm for a SIM-aided broadcast system with DPC precoding is outlined in Algorithm 2.

Input: 𝐇\mathbf{H}, 𝐒(0)\mathbf{S}^{(0)}, ϕ(0)\boldsymbol{\phi}^{(0)}, δ>0\delta>0, u0>0u_{0}>0, ρ(0,1)\rho\in(0,1), n0n\leftarrow 0
1
2repeat
3     
4     Call Algorithm 1 to obtain 𝐒(n+1)\mathbf{S}^{(n+1)}
5     
6     repeat
7          
8          ϕ(n+1)=𝒫ϕ(ϕ(n)+unϕκ(ϕ(n)))\boldsymbol{\phi}^{(n+1)}=\mathcal{P}_{\phi}(\boldsymbol{\phi}^{(n)}+u_{n}\nabla_{\boldsymbol{\phi}}\kappa(\boldsymbol{\phi}^{(n)}))
9          
10          if κ(ϕ(n+1))<κ(ϕ(n))+δϕ(n+1)ϕ(n)2\kappa(\boldsymbol{\phi}^{(n+1)})<\kappa(\boldsymbol{\phi}^{(n)})+\delta||\boldsymbol{\phi}^{(n+1)}-\boldsymbol{\phi}^{(n)}||^{2} then
11               
12               unρunu_{n}\leftarrow\rho u_{n}
13               
14                end if
15               
16                until κ(ϕ(n+1))κ(ϕ(n))+δϕ(n+1)ϕ(n)2\kappa(\boldsymbol{\phi}^{(n+1)})\geq\kappa(\boldsymbol{\phi}^{(n)})+\delta||\boldsymbol{\phi}^{(n+1)}-\boldsymbol{\phi}^{(n)}||^{2}
17               Calculate 𝐁\mathbf{B} and 𝐇\mathbf{H} for ϕ(n+1)\boldsymbol{\phi}^{(n+1)}
18               
19               nn+1n\leftarrow n+1
20               
21                until convergence of η\eta in (12a)
Algorithm 2 Proposed algorithm for the EE optimization for a SIM-aided broadcast system with DPC precoding.

IV Proposed Solution to Linear Precoding

In this section, we propose an optimization method for the EEmax problem with linear precoding. Similarly as in the previous section, the precoding matrices are found by closed-form expressions, which are derived by implementing Dinkelbach’s method, while the optimal phase shifts for the SIM meta-elements are optimized by a gradient-based method.

IV-A Precoding Matrix Optimization

The precoding matrix optimization, for fixed ϕ\boldsymbol{\phi}, in (13) in fact reduces to the EEmax problem in conventional MIMO systems. It can be observed that the complexity of the direct optimization of 𝐏\mathbf{P} is proportional to NtN_{t}, which can cause a significant complexity burden for systems even with moderate NtN_{t}, and thus such a direct optimization method is not practically appealing.

To overcome this issue, we consider an equivalent formulation of (13), introduced in [26], that has a smaller dimension and thus requires a lower computational complexity. Denoting 𝐇=[𝐇1T𝐇2T𝐇KT]TKNr×Nt\mathbf{H}=[\mathbf{H}_{1}^{T}\>\mathbf{H}_{2}^{T}\>\cdots\>\mathbf{H}_{K}^{T}]^{T}\in\mathbb{C}^{KN_{r}\times N_{t}}, (13) can be equivalently written as

​​maximize𝐗\underset{\mathbf{X}}{\operatorname{maximize}} f(𝐗)=k=1KR¯kk=1KTr(𝐇¯𝐗k𝐗kH)+NtPc+P0+LNPs\displaystyle f(\mathbf{X})=\frac{\sum_{k=1}^{K}\bar{R}_{k}}{\sum_{k=1}^{K}\operatorname{Tr}(\bar{\mathbf{H}}\mathbf{X}_{k}\mathbf{X}_{k}^{H})+N_{t}P_{c}+P_{0}+LNP_{s}} (44a)
subjectto\displaystyle\operatorname{subject~{}to} k=1KTr(𝐇¯𝐗k𝐗kH)Pmax,\displaystyle\sum_{k=1}^{K}\operatorname{Tr}(\bar{\mathbf{H}}\mathbf{X}_{k}\mathbf{X}_{k}^{H})\leq P_{\max}, (44b)

where

R¯k=\displaystyle\bar{R}_{k}= ln|𝐈+𝐇¯k𝐗k𝐗kH𝐇¯kH\displaystyle\ln\Bigg{|}\mathbf{I}+\bar{\mathbf{H}}_{k}\mathbf{X}_{k}\mathbf{X}_{k}^{H}\bar{\mathbf{H}}_{k}^{H}
×(𝐈+j=1,jkK𝐇¯k𝐗j𝐗jH𝐇¯kH)1|,\displaystyle\times\left(\mathbf{I}+\sum\nolimits_{j=1,j\neq k}^{K}\bar{\mathbf{H}}_{k}\mathbf{X}_{j}\mathbf{X}_{j}^{H}\bar{\mathbf{H}}_{k}^{H}\right)^{-1}\Bigg{|}, (45)

𝐗=[𝐗1𝐗2𝐗K]KNr×KNr\mathbf{X}=[\mathbf{X}_{1}\>\mathbf{X}_{2}\>\cdots\>\mathbf{X}_{K}]\in\mathbb{C}^{KN_{r}\times KN_{r}} are new optimization variables, and 𝐇¯k=𝐇k𝐇HNr×KNr\bar{\mathbf{H}}_{k}=\mathbf{H}_{k}\mathbf{H}^{H}\in\mathbb{C}^{N_{r}\times KN_{r}} is the kk-th sub-matrix of 𝐇¯=𝐇𝐇HKNr×KNr\bar{\mathbf{H}}=\mathbf{H}\mathbf{H}^{H}\in\mathbb{C}^{KN_{r}\times KN_{r}}. The equivalence between (13) and (44) is a result of [26, Prop. 2], and the optimal solutions of the two problems are related as 𝐏k=𝐇H𝐗k\mathbf{P}_{k}=\mathbf{H}^{H}\mathbf{X}_{k}. We remark that, comparing the size of 𝐗\mathbf{X} and 𝐏=[𝐏1,𝐏2,,𝐏K]Nt×KNr\mathbf{P}=[\mathbf{P}_{1},\mathbf{P}_{2},\dots,\mathbf{P}_{K}]\in\mathbb{C}^{N_{t}\times KN_{r}}, the equivalent formulation in (44) can significantly reduce the number of optimization variables for systems with large NtN_{t} (i.e., NtNrN_{t}\gg N_{r}). In the sequel, similar to the development of Algorithm 1, we first drop (44b) and derive a solution for the unconstrained case. The solution for the constrained case follows immediately.

Upon close inspection, we can observe that the denominator of (44) is a quadratic convex function, while the numerator of (44) is neither convex nor concave function. As a result, finding the solution of (44) is not a trivial task. To find an efficient method for solving (44), we again exploit the inequality, given in (22), to obtain a lower bound of the achievable rate in (45). Utilizing the identity |𝐈+𝐙k𝐙kH𝐘k|=|𝐈+𝐙kH𝐘k𝐙k|\left|\mathbf{I}+\mathbf{Z}_{k}\mathbf{Z}_{k}^{H}\mathbf{Y}_{k}\right|=\left|\mathbf{I}+\mathbf{Z}_{k}^{H}\mathbf{Y}_{k}\mathbf{Z}_{k}\right| to reformulate (45), the lower bound of the achievable rate of user kk can be expressed as

R¯k\displaystyle\bar{R}_{k} Lk=ln|𝐈+𝐙^kH𝐘^k1𝐙^k|Tr(𝐙^kH𝐘^k1𝐙^k)\displaystyle\geq L_{k}=\ln\left|\mathbf{I}+\hat{\mathbf{Z}}_{k}^{H}\hat{\mathbf{Y}}_{k}^{-1}\hat{\mathbf{Z}}_{k}\right|-\operatorname{Tr}(\hat{\mathbf{Z}}_{k}^{H}\hat{\mathbf{Y}}_{k}^{-1}\hat{\mathbf{Z}}_{k})
+2(Tr(𝐙^kH𝐘^k1𝐇¯k𝐗k))\displaystyle+2\mathfrak{R}(\operatorname{Tr}(\hat{\mathbf{Z}}_{k}^{H}\hat{\mathbf{Y}}_{k}^{-1}\bar{\mathbf{H}}_{k}\mathbf{X}_{k}))
j=1KTr(𝐗jH𝐇¯kH𝐀kH𝐇¯k𝐗j)Tr(𝐀kH)\displaystyle-\sum\nolimits_{j=1}^{K}\operatorname{Tr}(\mathbf{X}_{j}^{H}\bar{\mathbf{H}}_{k}^{H}\mathbf{A}_{k}^{H}\bar{\mathbf{H}}_{k}\mathbf{X}_{j})-\operatorname{Tr}(\mathbf{A}_{k}^{H}) (46)

where 𝐙k=𝐇¯k𝐗kNr×Nr\mathbf{Z}_{k}=\bar{\mathbf{H}}_{k}\mathbf{X}_{k}\in\mathbb{C}^{N_{r}\times N_{r}}, 𝐙^k=𝐇¯k𝐗k(n)Nr×Nr\mathbf{\hat{Z}}_{k}=\bar{\mathbf{H}}_{k}\mathbf{X}_{k}^{(n)}\in\mathbb{C}^{N_{r}\times N_{r}}, 𝐘k=j=1,jkK𝐇¯k𝐗j𝐗jH𝐇¯kH+𝐈Nr×Nr\mathbf{Y}_{k}=\sum\nolimits_{j=1,j\neq k}^{K}\bar{\mathbf{H}}_{k}\mathbf{X}_{j}\mathbf{X}_{j}^{H}\bar{\mathbf{H}}_{k}^{H}+\mathbf{I}\in\mathbb{C}^{N_{r}\times N_{r}}, 𝐘^k=j=1,jkK𝐇¯k𝐗j(n)(𝐗j(n))H𝐇¯kH+𝐈Nr×Nr\hat{\mathbf{Y}}_{k}=\sum\nolimits_{j=1,j\neq k}^{K}\bar{\mathbf{H}}_{k}\mathbf{X}_{j}^{(n)}(\mathbf{X}_{j}^{(n)})^{H}\bar{\mathbf{H}}_{k}^{H}+\mathbf{I}\in\mathbb{C}^{N_{r}\times N_{r}} and 𝐀k=𝐘^k1(𝐙^k𝐙^kH+𝐘^k)1Nr×Nr\mathbf{A}_{k}=\hat{\mathbf{Y}}_{k}^{-1}-(\hat{\mathbf{Z}}_{k}\hat{\mathbf{Z}}_{k}^{H}+\hat{\mathbf{Y}}_{k})^{-1}\in\mathbb{C}^{N_{r}\times N_{r}}. Consequently, the resulting approximate problem of (44) is given by

maximize𝐗\displaystyle\underset{\mathbf{X}}{\operatorname{maximize}} f¯(𝐗)=k=1KLkk=1KTr(𝐇¯𝐗k𝐗kH)+NtPc+P0+LNPs\displaystyle\bar{f}(\mathbf{X})=\frac{\sum_{k=1}^{K}L_{k}}{\sum_{k=1}^{K}\operatorname{Tr}(\bar{\mathbf{H}}\mathbf{X}_{k}\mathbf{X}_{k}^{H})+N_{t}P_{c}+P_{0}+LNP_{s}} (47)

which is a concave-convex optimization problem.

Next, we apply Dinkelbach’s method to solve (47), leading to the following optimization problem

maximize𝐗MλL(𝐗)\underset{\mathbf{X}}{\operatorname{maximize}}\;M_{\lambda_{L}}(\mathbf{X}) (48)

where

MλL(𝐗)\displaystyle M_{\lambda_{L}}(\mathbf{X}) =k=1KLkλL(k=1KTr(𝐇¯𝐗k𝐗kH)\displaystyle=\sum\nolimits_{k=1}^{K}L_{k}-\lambda_{L}\Big{(}\sum\nolimits_{k=1}^{K}\operatorname{Tr}(\bar{\mathbf{H}}\mathbf{X}_{k}\mathbf{X}_{k}^{H})
+NtPc+P0+LNPs)\displaystyle+N_{t}P_{c}+P_{0}+LNP_{s}\Big{)} (49)

and λL0\lambda_{L}\geq 0 is a given parameter.

Implementing the Karush-Kuhn-Tucker (KKT) first-order optimality condition to (48) by taking the gradient of this expression w.r.t. 𝐗j\mathbf{X}_{j}^{*} and setting it to 0, we obtain

Input: 𝐇\mathbf{H}, 𝐗(0)\mathbf{X}^{(0)}, ϕ\boldsymbol{\phi}, λL(0)0\lambda_{L}^{(0)}\geq 0, m0m\leftarrow 0
1
2repeat
3     
4     n0n\leftarrow 0
5     
6     repeat
7          
8          Calculate all 𝐙^k\mathbf{\hat{Z}}_{k}, 𝐘^k\mathbf{\hat{Y}}_{k}, 𝐀k\mathbf{A}_{k}
9          
10          Calculate 𝐗opt\mathbf{X}_{\text{opt}} according to (51) or (53)
11          
12          𝐗(n+1)𝐗opt\mathbf{X}^{(n+1)}\leftarrow\mathbf{X}_{\text{opt}}
13          
14          nn+1n\leftarrow n+1
15          
16           until f(𝐗(n+1))f(𝐗(n))>ϵ1,Lf(\mathbf{X}^{(n+1)})-f(\mathbf{X}^{(n)})>\epsilon_{1,L}
17          λL(m+1)f(𝐗(n))\lambda_{L}^{(m+1)}\leftarrow f(\mathbf{X}^{(n)}) according to (44a)
18          
19          mm+1m\leftarrow m+1
20          
21          𝐗(0)𝐗(n)\mathbf{X}^{(0)}\leftarrow\mathbf{X}^{(n)}
22          
23           until  λL(m+1)λL(m)>ϵ2,L\lambda_{L}^{(m+1)}-\lambda_{L}^{(m)}>\epsilon_{2,L}
24          if k=1KTr(𝐇¯𝐗k𝐗kH)Pmax\sum_{k=1}^{K}\operatorname{Tr}(\bar{\mathbf{H}}\mathbf{X}_{k}\mathbf{X}_{k}^{H})\leq P_{\max} then
25               
26               𝐗opt=𝐗\mathbf{X}_{\text{opt}}=\mathbf{X}
27                else
28                    
29                    𝐗opt\mathbf{X}_{\text{opt}} obtained from [26]
30                    
31                     end if
32                    
33                    for k=1k=1 𝐭𝐨\mathbf{to} KK do
34                         
35                         𝐏k=𝐇H𝐗opt,k\mathbf{P}_{k}=\mathbf{H}^{H}\mathbf{X}_{\text{opt},k}
36                         
37                          end for
Algorithm 3 Optimization of the precoding matrices.
(𝐙^jH𝐘^j1𝐇¯j)Hk=1K𝐇¯kH𝐀kH𝐇¯k𝐗jλL𝐇¯𝐗j=𝟎(\hat{\mathbf{Z}}_{j}^{H}\hat{\mathbf{Y}}_{j}^{-1}\bar{\mathbf{H}}_{j})^{H}-\sum\nolimits_{k=1}^{K}\bar{\mathbf{H}}_{k}^{H}\mathbf{A}_{k}^{H}\bar{\mathbf{H}}_{k}\mathbf{X}_{j}-\lambda_{L}\bar{\mathbf{H}}\mathbf{X}_{j}=\boldsymbol{0} (50)

To solve for 𝐗j\mathbf{X}_{j} we differentiate two cases. If 𝐇\mathbf{H} is row-rank matrix, e.g. (i.e., KNr<NtKN_{r}<N_{t}), then 𝐇¯\bar{\mathbf{H}} is invertible, and thus (50) results in

𝐗j\displaystyle\mathbf{X}_{j} =(k=1K𝐇¯kH𝐀kH𝐇¯k+λL𝐇¯)1𝐇¯jH𝐘^j1𝐙^j.\displaystyle=\left(\sum\nolimits_{k=1}^{K}\bar{\mathbf{H}}_{k}^{H}\mathbf{A}_{k}^{H}\bar{\mathbf{H}}_{k}+\lambda_{L}\bar{\mathbf{H}}\right)^{-1}\bar{\mathbf{H}}_{j}^{H}\hat{\mathbf{Y}}_{j}^{-1}\hat{\mathbf{Z}}_{j}. (51)

If 𝐇\mathbf{H} is column-rank matrix (i.e., KNr>NtKN_{r}>N_{t}), we rewrite (50) as

𝐇(𝐇jH𝐘^j1𝐙^jk=1K𝐇kH𝐀kH𝐇¯k𝐗jλL𝐇H𝐗j)=𝟎\mathbf{H}\Bigl{(}\mathbf{H}_{j}^{H}\hat{\mathbf{Y}}_{j}^{-1}\hat{\mathbf{Z}}_{j}-\sum_{k=1}^{K}\mathbf{H}_{k}^{H}\mathbf{A}_{k}^{H}\bar{\mathbf{H}}_{k}\mathbf{X}_{j}-\lambda_{L}\mathbf{H}^{H}\mathbf{X}_{j}\Bigr{)}=\boldsymbol{0} (52)

and finally obtain

𝐗j=(k=1K𝐇kH𝐀kH𝐇¯k+λL𝐇H)+𝐇jH𝐘^j1𝐙^j.\mathbf{X}_{j}=\Bigl{(}\sum_{k=1}^{K}\mathbf{H}_{k}^{H}\mathbf{A}_{k}^{H}\bar{\mathbf{H}}_{k}+\lambda_{L}\mathbf{H}^{H}\Bigr{)}^{+}\mathbf{H}_{j}^{H}\hat{\mathbf{Y}}_{j}^{-1}\hat{\mathbf{Z}}_{j}. (53)

If the obtained solution from (51) or (53) satisfy the power constraint (44b) then it also the general case solution. Otherwise, the optimal 𝐗\mathbf{X} is the solution of the achievable rate optimization problem

​​maximize𝐗\underset{\mathbf{X}}{\operatorname{maximize}} k=1KR¯k\displaystyle\sum_{k=1}^{K}\bar{R}_{k} (54a)
subjectto\displaystyle\operatorname{subject~{}to} k=1KTr(𝐇¯𝐗k𝐗kH)Pmax,\displaystyle\sum_{k=1}^{K}\operatorname{Tr}(\bar{\mathbf{H}}\mathbf{X}_{k}\mathbf{X}_{k}^{H})\leq P_{\max}, (54b)

which can be solved by [26, Algorithm 1]. The described optimization algorithm is summarized in Algorithm 3.

IV-B SIM Phase Shift Optimization

For fixed 𝐏\mathbf{P}, the SIM phase shift optimization problem is formulated as

maximizeϕ\displaystyle\underset{\boldsymbol{\phi}}{\operatorname{maximize}}\quad τ(ϕ)=k=1KRk(ϕ)\displaystyle\tau(\boldsymbol{\phi})=\sum\nolimits_{k=1}^{K}R_{k}(\boldsymbol{\phi}) (55a)
subjectto\displaystyle\operatorname{subject~{}to}\quad |ϕ|=1\displaystyle\left|\boldsymbol{\phi}\right|=1 (55b)

where the achievable rate for user kk is expressed as

RL,k(ϕ)\displaystyle R_{\text{L},k}(\boldsymbol{\phi}) =ln|𝐈+j=1K𝐇k𝐏j𝐏jH𝐇kH|\displaystyle=\ln\left|\mathbf{I}+\sum\nolimits_{j=1}^{K}\mathbf{H}_{k}\mathbf{P}_{j}\mathbf{P}_{j}^{H}\mathbf{H}_{k}^{H}\right|
ln|𝐈+j=1,jkK𝐇k𝐏j𝐏jH𝐇kH|.\displaystyle-\ln\left|\mathbf{I}+\sum\nolimits_{j=1,j\neq k}^{K}\mathbf{H}_{k}\mathbf{P}_{j}\mathbf{P}_{j}^{H}\mathbf{H}_{k}^{H}\right|. (56)

Since we again apply the projected gradient method for the phase shift optimization, the rate expression in (9) is used, instead of (45). The obvious reason is that it is easier to find the gradient of the objective w.r.t ϕ\boldsymbol{\phi} using (9) than using (9).

The optimization of the phase shifts of the BS SIM follows the same steps as in the case of DPC in subsection III-B. Specifically, the phase shifts are iteratively updated as

ϕ(n+1)=𝒫ϕ(ϕ(n)+tnϕτ(ϕ(n))),\boldsymbol{\phi}^{(n+1)}=\mathcal{P}_{\phi}(\boldsymbol{\phi}^{(n)}+t_{n}\nabla_{\boldsymbol{\phi}}\tau(\boldsymbol{\phi}^{(n)})), (57)

where tnt_{n} is the appropriate step size. Also, the gradient ϕτ(ϕ)\nabla_{\boldsymbol{\phi}}\tau(\boldsymbol{\phi}) is given by ϕτ(ϕ)=[ϕ1τ(ϕ)TϕLτ(ϕ))T]T\nabla_{\boldsymbol{\phi}}\tau(\boldsymbol{\phi})=[\nabla_{\boldsymbol{\phi}^{1}}\tau(\boldsymbol{\phi})^{T}\>\cdots\>\nabla_{\boldsymbol{\phi}^{L}}\tau(\boldsymbol{\phi}))^{T}]^{T}, where ϕlτ(ϕ)\nabla_{\boldsymbol{\phi}^{l}}\tau(\boldsymbol{\phi}) is expressed in the following theorem.

Theorem 2.

The gradients of τ(ϕ)\tau(\boldsymbol{\phi}) w.r.t. the l-th layer of the SIM at the BS is given by

ϕlτ(ϕ)=vecd(k=1K𝚯l+1:L𝐆kH𝐅1,k1𝐇k𝐏^s𝚯1:l1𝐖lH)\displaystyle\nabla_{\boldsymbol{\phi}^{l}}\tau(\boldsymbol{\phi})=\operatorname{vec}_{d}\left(\sum_{k=1}^{K}\boldsymbol{\Theta}^{l+1:L}\mathbf{G}_{k}^{H}\mathbf{F}_{1,k}^{-1}\mathbf{H}_{k}\hat{\mathbf{P}}_{s}\boldsymbol{\Theta}^{1:l-1}\mathbf{W}_{l}^{H}\right)
vecd(k=1K𝚯l+1:L𝐆kH𝐅2,k1𝐇k𝐏^k𝚯1:l1𝐖lH)\displaystyle-\operatorname{vec}_{d}\left(\sum_{k=1}^{K}\boldsymbol{\Theta}^{l+1:L}\mathbf{G}_{k}^{H}\mathbf{F}_{2,k}^{-1}\mathbf{H}_{k}\hat{\mathbf{P}}_{k}\boldsymbol{\Theta}^{1:l-1}\mathbf{W}_{l}^{H}\right) (58)

where 𝚯m:n=(𝐖m)H(𝚽m)H(𝐖n)H(𝚽n)H\boldsymbol{\Theta}^{m:n}=(\mathbf{W}^{m})^{H}(\boldsymbol{\mathbf{\Phi}}^{m})^{H}\cdots(\mathbf{W}^{n})^{H}(\boldsymbol{\mathbf{\Phi}}^{n})^{H}, 𝐏^s=j=1K𝐏j𝐏jH\hat{\mathbf{P}}_{s}=\sum_{j=1}^{K}\mathbf{P}_{j}\mathbf{P}_{j}^{H}, 𝐏^k=j=1,jkK𝐏j𝐏jH\hat{\mathbf{P}}_{k}=\sum_{j=1,j\neq k}^{K}\mathbf{P}_{j}\mathbf{P}_{j}^{H}, 𝐅1,k=𝐈+𝐇k𝐏^s𝐇kH\mathbf{F}_{1,k}=\mathbf{I}+\mathbf{H}_{k}\hat{\mathbf{P}}_{s}\mathbf{H}_{k}^{H} and 𝐅2,k=𝐈+𝐇k𝐏^k𝐇kH\mathbf{F}_{2,k}=\mathbf{I}+\mathbf{H}_{k}\hat{\mathbf{P}}_{k}\mathbf{H}_{k}^{H}.

Proof:

See Appendix A. ∎

The outline of the proposed algorithm is given in Algorithm 4.

Input: 𝐇\mathbf{H}, 𝐗(0)\mathbf{X}^{(0)}, ϕ(0)\boldsymbol{\phi}^{(0)}, λ(0)0\lambda^{(0)}\geq 0, δ>0\delta>0, tn>0t_{n}>0, ρ(0,1)\rho\in(0,1), n0n\leftarrow 0
1
2repeat
3     
4     Call Algorithm 3 to obtain 𝐏(n+1)\mathbf{P}^{(n+1)}
5     
6     repeat
7          
8          ϕ(n+1)=𝒫ϕ(ϕ(n)+tnϕτ(ϕ(n)))\boldsymbol{\phi}^{(n+1)}=\mathcal{P}_{\phi}(\boldsymbol{\phi}^{(n)}+t_{n}\nabla_{\boldsymbol{\phi}}\tau(\boldsymbol{\phi}^{(n)}))
9          
10          if  τ(ϕ(n+1))<τ(ϕ(n))+δϕ(n+1)ϕ(n)2\tau(\boldsymbol{\phi}^{(n+1)})<\tau(\boldsymbol{\phi}^{(n)})+\delta||\boldsymbol{\phi}^{(n+1)}-\boldsymbol{\phi}^{(n)}||^{2}  then
11               
12               tnρtnt_{n}\leftarrow\rho t_{n}
13               
14                end if
15               
16                until  τ(ϕ(n+1))τ(ϕ(n))+δϕ(n+1)ϕ(n)2\tau(\boldsymbol{\phi}^{(n+1)})\geq\tau(\boldsymbol{\phi}^{(n)})+\delta||\boldsymbol{\phi}^{(n+1)}-\boldsymbol{\phi}^{(n)}||^{2}
17               nn+1n\leftarrow n+1
18               
19                until convergence of η\eta in (13a)
Algorithm 4 Proposed algorithm for the EE optimization for a SIM-aided broadcast system with linear precoding.

V Computational Complexity

In this section, the computational complexity for SIM-aided broadcast systems with DPC and linear precoding are obtained by counting the required number of complex multiplications. In the following complexity analysis, for ease of exposition, we assume that N,NtNrN,N_{t}\gg N_{r} which is the typical case for a SIM-based broadcast communication system.

V-A DPC Precoding

The optimization of the covariance matrices is performed by Algorithm 1. The complexity of obtaining 𝐔\mathbf{U} from 𝐒\mathbf{S} can be neglected. In addition, the complexity of calculating all 𝐕^j(n)\hat{\mathbf{V}}_{j}^{(n)}, 𝐘^j(n)\hat{\mathbf{Y}}_{j}^{(n)}, 𝐀j(n)\mathbf{A}_{j}^{(n)} and 𝐁j(n)\mathbf{B}_{j}^{(n)} is 𝒪(KNtNr2)\mathcal{O}(KN_{t}N_{r}^{2}) as explained previously. Furthermore, 𝒪(KNt2Nr)\mathcal{O}(KN_{t}^{2}N_{r}) multiplications is needed to calculate all 𝐔j\mathbf{U}_{j} in (32). The complexity of calculating the objective function is 𝒪(KNt2Nr+Nt3)\mathcal{O}(KN_{t}^{2}N_{r}+N_{t}^{3}). Let IUI_{U} be the number of inner loops (i.e., lines 3 to 7 in Algorithm 1). Then the complexity of lines 1 to 11 in Algorithm 1 is 𝒪(IU(KNt2Nr+Nt3))\mathcal{O}(I_{U}(KN_{t}^{2}N_{r}+N_{t}^{3})).

For optimizing the SIM phase shifts, we need 𝒪(KNNtNr)\mathcal{O}(KNN_{t}N_{r}) multiplications for the computation of k=1K𝐆kH𝐒k𝐇k\sum\nolimits_{k=1}^{K}\mathbf{G}_{k}^{H}\mathbf{S}_{k}\mathbf{H}_{k}. The complexity of the matrix inversion 𝐀1\mathbf{A}^{-1} and its multiplication with the previous sum is 𝒪(Nt3+NNt2)\mathcal{O}(N_{t}^{3}+NN_{t}^{2}). As all the matrix product (𝐖m)H(𝚽m)H(\mathbf{W}^{m})^{H}(\boldsymbol{\mathbf{\Phi}}^{m})^{H} are precomputed in advance and the fact that we need only the diagonal elements in (37), the additional complexity is 𝒪(LN3)\mathcal{O}(LN^{3}). Hence, the complexity of the gradient calculation is 𝒪(NNt2+LN3)\mathcal{O}(NN_{t}^{2}+LN^{3}). After obtaining ϕ(n+1)\boldsymbol{\phi}^{(n+1)}, the complexity of calculating 𝐁\mathbf{B} and 𝐇\mathbf{H} is 𝒪(LN3)\mathcal{O}(LN^{3}). In addition, 𝒪(KNtNr(Nt+Nr)+Nt3)𝒪(Nt3)\mathcal{O}(KN_{t}N_{r}(N_{t}+N_{r})+N_{t}^{3})\approx\mathcal{O}(N_{t}^{3}) multiplications is needed for calculating h(ϕ(n+1))h(\boldsymbol{\phi}^{(n+1)}). The complexity of optimizing the SIM phase shifts is given by 𝒪(NNt2+LN3+Iϕ,D(LN3+Nt3))𝒪(NNt2+Iϕ,D(LN3+Nt3))\mathcal{O}(NN_{t}^{2}+LN^{3}+I_{\phi,D}(LN^{3}+N_{t}^{3}))\approx\mathcal{O}(NN_{t}^{2}+I_{\phi,D}(LN^{3}+N_{t}^{3})), where Iϕ,DI_{\phi,D} is the number of line search loops.

Therefore, the overall computational complexity for one iteration of the DPC algorithm is given by

CDPC\displaystyle C_{\text{DPC}} =𝒪(IU(KNt2Nr+Nt3)+NNt2\displaystyle=\mathcal{O}(I_{U}(KN_{t}^{2}N_{r}+N_{t}^{3})+NN_{t}^{2}
+Iϕ,D(LN3+Nt3)).\displaystyle+I_{\phi,D}(LN^{3}+N_{t}^{3})). (59)

V-B Linear Precoding

The optimization of the precoding matrices is specified by Algorithm 3. The calculation of 𝐇¯\bar{\mathbf{H}} requires 𝒪(K2NtNr2)\mathcal{O}(K^{2}N_{t}N_{r}^{2}) multiplications. Computing the initial 𝐗\mathbf{X} from 𝐏\mathbf{P} requires 𝒪(K2NtNr2+K3Nr3)\mathcal{O}(K^{2}N_{t}N_{r}^{2}+K^{3}N_{r}^{3}) multiplications. Furthermore, the complexity of calculating all 𝐙^k\mathbf{\hat{Z}}_{k}, 𝐘^k\mathbf{\hat{Y}}_{k} and 𝐀k\mathbf{A}_{k} is 𝒪(K3Nr3+K2Nr3)\mathcal{O}(K^{3}N_{r}^{3}+K^{2}N_{r}^{3}). Next, we need to determine the optimal 𝐗\mathbf{X} according to (51) if NtKNrN_{t}\geq KN_{r}, or otherwise according to (53). The calculation of the optimal 𝐗\mathbf{X} according to (51) requires 𝒪(K3Nr3)\mathcal{O}(K^{3}N_{r}^{3}) multiplications and according to (53) 𝒪(K3Nr3+K3NtNr2)𝒪(K3NtNr2)\mathcal{O}(K^{3}N_{r}^{3}+K^{3}N_{t}N_{r}^{2})\approx\mathcal{O}(K^{3}N_{t}N_{r}^{2}), which can be written as

Cx={𝒪(K3Nr3)NtKNr𝒪(K3NtNr2)Nt<KNr.C_{x}=\begin{cases}\mathcal{O}(K^{3}N_{r}^{3})&N_{t}\geq KN_{r}\\ \mathcal{O}(K^{3}N_{t}N_{r}^{2})&N_{t}<KN_{r}.\end{cases} (60)

The complexity for calculating kR¯k\sum_{k}\bar{R}_{k} and f(𝐗(n))f(\mathbf{X}^{(n)}) can be neglected. If the number of inner loops (i.e., lines 3 to 8 in Algorithm 3) is IXI_{X}, then the complexity of Algorithm 3 is given by 𝒪(IXCx)\mathcal{O}(I_{X}C_{x}).

To optimize the phase shifts of the SIM, we need 𝒪(KNt2)\mathcal{O}(KN_{t}^{2}) multiplications to obtain 𝐏^s\hat{\mathbf{P}}_{s} and all 𝐏^k\hat{\mathbf{P}}_{k} matrices. The complexity of calculating 𝐅1,k\mathbf{F}_{1,k} and 𝐅2,k\mathbf{F}_{2,k} is 𝒪(KNtNr(Nt+Nr))\mathcal{O}(KN_{t}N_{r}(N_{t}+N_{r})) and the same is also true for 𝐅1,k1𝐇k𝐏^s𝐅2,k1𝐇k𝐏^k\mathbf{F}_{1,k}^{-1}\mathbf{H}_{k}\hat{\mathbf{P}}_{s}-\mathbf{F}_{2,k}^{-1}\mathbf{H}_{k}\hat{\mathbf{P}}_{k}. Multiplying these terms with 𝐆kH\mathbf{G}_{k}^{H} has the complexity of 𝒪(KNNt2)\mathcal{O}(KNN_{t}^{2}). Utilizing the fact that all the matrix product (𝐖m)H(𝚽m)H(\mathbf{W}^{m})^{H}(\boldsymbol{\mathbf{\Phi}}^{m})^{H} are precomputed and that we need only the diagonal elements in (58), the additional complexity is 𝒪(LN3)\mathcal{O}(LN^{3}). Hence, the complexity of the gradient calculation is 𝒪(KNNt2+LN3)\mathcal{O}(KNN_{t}^{2}+LN^{3}). After obtaining ϕ(n+1)\boldsymbol{\phi}^{(n+1)}, the complexity of calculating 𝐁\mathbf{B} and 𝐇\mathbf{H} is 𝒪(LN3)\mathcal{O}(LN^{3}). To obtain all terms j𝐇k𝐏j𝐏jH𝐇kH,\sum\nolimits_{j}\mathbf{H}_{k}\mathbf{P}_{j}\mathbf{P}_{j}^{H}\mathbf{H}_{k}^{H}, we need 𝒪(KNtNr(Nt+Nr))\mathcal{O}(KN_{t}N_{r}(N_{t}+N_{r})) multiplications. Any additional complexity for computing g(ϕ(n+1))g(\boldsymbol{\phi}^{(n+1)}) can be neglected. Hence, the complexity of optimizing the SIM phase shifts is given by 𝒪(KNNt2+LN3+Iϕ,LLN3)𝒪(KNNt2+Iϕ,LLN3)\mathcal{O}(KNN_{t}^{2}+LN^{3}+I_{\phi,L}LN^{3})\approx\mathcal{O}(KNN_{t}^{2}+I_{\phi,L}LN^{3}), where Iϕ,LI_{\phi,L} is the number of line search loops.

Therefore, the overall computational complexity for one iteration of the linear precoding algorithm is given by

CLIN=𝒪(IXCx+KNNt2+Iϕ,LLN3).C_{\text{LIN}}=\mathcal{O}(I_{X}C_{x}+KNN_{t}^{2}+I_{\phi,L}LN^{3}). (61)

VI Convergence

Let us now prove the convergence of Algorithm 2. First, for a given ϕ\boldsymbol{\phi}, Algorithm 2 achieves monotonic convergence, which can be shown as follows. From (23), it holds that

g(𝐔(n+1))=h(𝐔(n+1))k=1KTr(𝐔k(n+1)H𝐔k(n+1))+NtPc+P0+LNPs\displaystyle g(\mathbf{U}^{(n+1)})=\frac{h(\mathbf{U}^{(n+1)})}{\sum_{k=1}^{K}\operatorname{Tr}(\mathbf{U}_{k}^{(n+1)H}\mathbf{U}_{k}^{(n+1)})+N_{t}P_{c}+P_{0}+LNP_{s}} (62)
(a)h¯(𝐔(n+1);𝐔(n))k=1KTr(𝐔k(n+1)H𝐔k(n+1))+NtPc+P0+LNPs\displaystyle\stackrel{{\scriptstyle\textrm{(a)}}}{{\geq}}\frac{\bar{h}(\mathbf{U}^{(n+1)};\mathbf{U}^{(n)})}{\sum_{k=1}^{K}\operatorname{Tr}(\mathbf{U}_{k}^{(n+1)H}\mathbf{U}_{k}^{(n+1)})+N_{t}P_{c}+P_{0}+LNP_{s}} (63)
(b)h¯(𝐔(n);𝐔(n))k=1KTr(𝐔k(n)H𝐔k(n))+NtPc+P0+LNPs\displaystyle\stackrel{{\scriptstyle\textrm{(b)}}}{{\geq}}\frac{\bar{h}(\mathbf{U}^{(n)};\mathbf{U}^{(n)})}{\sum_{k=1}^{K}\operatorname{Tr}(\mathbf{U}_{k}^{(n)H}\mathbf{U}_{k}^{(n)})+N_{t}P_{c}+P_{0}+LNP_{s}} (64)
(c)h(𝐔(n))k=1KTr(𝐔k(n)H𝐔k(n))+NtPc+P0+LNPs=g(𝐔(n))\displaystyle\stackrel{{\scriptstyle\textrm{(c)}}}{{\geq}}\frac{h(\mathbf{U}^{(n)})}{\sum_{k=1}^{K}\operatorname{Tr}(\mathbf{U}_{k}^{(n)H}\mathbf{U}_{k}^{(n)})+N_{t}P_{c}+P_{0}+LNP_{s}}=g(\mathbf{U}^{(n)}) (65)

where (a) is a due to (23), (b) is due to the fact that 𝐔(n+1)\mathbf{U}^{(n+1)} is the optimal solution to (28) and that the optimal objective is no less than the objective of a feasible point, and (c) is true because it is easy to check that h¯(𝐔(n);𝐔(n))=h(𝐔(n))\bar{h}(\mathbf{U}^{(n)};\mathbf{U}^{(n)})=h(\mathbf{U}^{(n)}), i.e. the equality in (23) occurs when 𝐔=𝐔(n)\mathbf{U}=\mathbf{U}^{(n)}. Regarding (b) above, note again that since (28) is a concave-convex fractional program, Dinkelbach’s method is guaranteed to converge to an optimal solution to (28). Consequently, the sequence {g(𝐔(n))}\{g(\mathbf{U}^{(n)})\} increases monotonically to an optimal solution to (18), and thus, Algorithm 2 is able to compute an optimal solution to (14). Next, for given covariance matrices, the SIM phase shift is optimized by a standard projected gradient method, for which the convergence is guaranteed. Also, the projected gradient method always yields an improved solution. In other words, Algorithm 2 generates a non-decreasing objective sequence. Since the feasible sets for the convariance matrices and phase shifts are continuous, the objective sequence produced by Algorithm 2 is guaranteed to converge.

Since Algorithm 4 uses a similar method for the optimization of the precoding matrices, as Algorithm 2 for the optimization of the covariance matrices, we can prove, following the same derivation steps, that for a given ϕ\boldsymbol{\phi} Algorithm 4 is guaranteed to provide an optimal solution to (28). In addition, a gradient-based optimization of the SIM phase shifts always increase the objective function. Moreover, the feasible sets for the precoding matrices and phase shifts are continuous, which ensures the convergence of the objective sequence in Algorithm 4.

VII Simulation Results

In this section, we evaluate the EE of the considered systems using proposed algorithms by means of Monte Carlo simulations. First, we compare the EE of the proposed algorithms and three benchmark schemes. The first benchmark scheme, referred to as LIN w/o SIM, is based on linear precoding without the presence of a SIM. In that case, the total power consumption is Pr+NtPc+P0+LNPs.P_{r}+N_{t}P_{c}+P_{0}+LNP_{s}. The second benchmark scheme, referred to as LIN w/o prec., does not employ digital precoding; instead data streams are fed directly to transmit antennas, while the phase shifts of the SIM meta-elements are optimized as described in Algorithm 4. The achievable rate for a single user in this scheme can be calculated using (9) and the identity 𝐏k𝐏kH=(Pmax/KNt)𝐈\mathbf{P}_{k}\mathbf{P}_{k}^{H}=(P_{\mathrm{max}}/KN_{t})\mathbf{I}, which ensures that the power constraint (10) is satisfied. The last benchmark scheme, termed LIN w/o prec. red. RF, also does not include a SIM, but differs from the previous one by utilizing a reduced number of transmit antennas and, consequently, a reduced number of RF chains. More precisely, this scheme uses KNrKN_{r} active transmit antennas (i.e., RF chains), each transmitting an independent data stream. The remaining transmit antennas (i.e., RF chains) are inactive, resulting in a total power consumption of Pr+KNrPc+P0+LNPs.P_{r}+KN_{r}P_{c}+P_{0}+LNP_{s}.

The channel matrix between the BS and user kk is modeled according to the spatially-correlated channel model as 𝐆k=𝐆¯k𝐑T1/2Nr×L\mathbf{G}_{k}=\bar{\mathbf{G}}_{k}\mathbf{R}_{\text{T}}^{1/2}\in\mathbb{C}^{N_{r}\times L}, where 𝐆¯kNr×L\bar{\mathbf{G}}_{k}\in\mathbb{C}^{N_{r}\times L} denotes the channel between the last SIM layer and the receiver, and follows a complex Gaussian distribution 𝒞𝒩(0,β𝐈)\mathcal{CN}(0,\beta\mathbf{I}). The free space path loss between the transmitter and the receiver, β\beta, is given by β(d)=β(d0)+10blog10(d/d0)\beta(d)=\beta(d_{0})+10b\log_{10}(d/d_{0}), where β(d0)=20log10(4πd0/λ)\beta(d_{0})=20\log_{10}(4\pi d_{0}/\lambda) is the free space path loss at the reference distance d0d_{0}, bb is the path loss exponent, and dd is the distance between the BS and user kk. Moreover, 𝐑TL×L\mathbf{R}_{\text{T}}\in\mathbb{C}^{L\times L} is the spatial correlation matrix of the SIM, with its elements defined according to [18, Eq. (14), (15)].

In the following simulation setup, the parameters are set as follows: λ=5cm\lambda=5\,\text{cm} (i.e., f=6GHzf=6\,\text{GHz}), Nt=16N_{t}=16, Nr=2N_{r}=2, W=100kHzW=100\thinspace\text{kHz}, β=3.5\beta=3.5, d0=1md_{0}=1\,\text{m}, L=4L=4, K=4K=4 and σ2=110dB\sigma^{2}=-110\,\text{dB}. Unless otherwise specified, the number of meta-elements per SIM layer, NN, is 100. The BS antennas are placed in a planner array parallel to the xyxy-plane and the position of its midpoint is (30m,0,0)(30\,\mathrm{m},0,0). The inter antenna separation of the BS antennas is λ/2\lambda/2 in both dimensions. The BS SIM layers are also placed parallel to the xyxy-plane, with the midpoint of the ll-th layer located at (30m,0,lλ/2)(30\,\mathrm{m},0,l\lambda/2). Moreover, the meta-elements in each SIM layer are uniformly placed in a square formation, where each meta-element has dimensions λ/2×λ/2\lambda/2\times\lambda/2. It is assumed that all users’ ULAs are parallel to the xx-axis and the midpoint of the kk-th user’s ULA is positioned at (xk,yk,zk)(x_{k},y_{k},z_{k}). The users’ coordinates are randomly selected such that xkx_{k} is drawn from a uniform distribution between 1.6 m and 2 m with a resolution of 1 cm, yky_{k} is drawn from a uniform distribution between 20-20 m and 20 m with a resolution of 0.5 m, and zkz_{k} is drawn from a uniform distribution between 80 m and 120 m with a resolution of 0.5 m. The circuit power per RF chain is Pc=30 dBmP_{c}=30\text{\thinspace dBm} and the basic power consumption at the BS is P0=40dBmP_{0}=40\thinspace\text{dBm} [7, 27]. The power consumption of each SIM meta-element is Ps=10dBmP_{s}=10\thinspace\text{dBm} [28, 29]. For the DPC method, we use ϵ1,D=ϵ2,D=106\epsilon_{1,D}=\epsilon_{2,D}=10^{-6} and for linear precoding ϵ1,L=ϵ2,L=106\epsilon_{1,L}=\epsilon_{2,L}=10^{-6}. Regarding the gradient-based optimization methods, the initial step size value is 1000, ρ=1/2\rho=1/2 and δ=103\delta=10^{-3}. All results are averaged over 200 independent channel realizations.

Refer to caption
Figure 1: Convergence of the proposed algorithms for different number of SIM layers.
Refer to caption
Figure 2: EE versus number meta-elements per SIM layer.

The convergence of the proposed algorithms for different number of SIM layers is shown in Fig. 1. In general, we can see that the number of iterations required for the algorithms to converge increases with the number of SIM layers (i.e., meta-elements). A similar, but much more pronounced, effect was previously observed in [20], where the authors optimized the SIM phase shifts on a layer-by-layer basis. Moreover, the EE does not change monotonically with LL. For DPC, the EE at convergence is almost the same for L=4L=4 and L=8L=8, which is visibly larger in the case of linear precoding. This can be attributed to the saturation of the achievable sum-rate as the number of SIM layers increases [19, Fig. 4], coupled with the fact that power consumption scales linearly with the number of meta-elements.

In Fig. 2, we present the EE evaluated for different number meta-elements per SIM layer. Specifically, the numbers of meta-elements considered per SIM layer are 25, 49, 100 and 196. The EE of both the proposed schemes and the benchmark schemes without precoding (i.e., LIN w/o prec. and LIN w/o prec. red. RF) increases with the number of meta-elements per SIM layer. The increase rates of the corresponding EE gradually reduces with NN. The reason is that for these schemes, the transmitter operates full power, and thus the EE increases in line with the achieved sum-rate, which follows a logarithm function. Among the benchmark schemes without precoding, the LIN w/o prec. red. RF scheme achieves significantly higher EE. This is partly due to lower power consumption, as some of its RF chains are inactive. Additionally, in the LIN w/o prec. scheme, each antenna simultaneously transmits data for all users, while in the LIN w/o prec. red. RF scheme, each active antenna transmits an independent data stream, which better suppresses the multi-user interference. Since the LIN w/o SIM scheme does not incorporate a SIM, its EE is independent of the number of SIM meta-elements. For a small number of meta-elements per SIM layer, this benchmark scheme has the largest EE, while for a higher number of meta-elements the EE of other schemes becomes larger. Finally, we observe that the scheme with DPC achieves a slightly higher EE than the linear precoding scheme, and that this EE difference increases with the number of SIM meta-elements.

Refer to caption
Figure 3: EE (blue solid lines) and achievable sum-rate (red dashed lines) versus number SIM layer for the constant number of 400 meta-elements.

In Fig. 3, we present the EE and the achievable sum-rate of the considered system versus the number of SIM layers, while maintaining a constant total of 400400 meta-elements. As observed, the EE and the achievable sum-rate of the proposed schemes do not change monotonically with the number of SIM layers, LL. Both the EE and the achievable sum-rate increase when LL changes from 1 to 2, which is likely due to the enhanced beamforming capabilities offered by multi-layer structures. However, with LL increases further, both of performance metrics significantly decrease, potentially reaching levels comparable to those of the LIN w/o SIM scheme, a benchmark scheme that does not utilize a SIM. This effect can be explained by the following reasoning:as the number of meta-elements per SIM layer decreases, the beamforming gain of each individual layer also reduces. Additionally, the signal propagation between adjacent SIM layers can be viewed as a form of path loss, which increases with the number of SIM layers. These two facts contribute to the observed reduction in both EE and achievable sum-rate when the number of SIM layers is greater than two. A similar trend is observed in the benchmark schemes without precoding (i.e., LIN w/o prec. and LIN w/o prec. red. RF), where the EE and the achievable sum-rate also decrease as LL increases. Among these two scheme, the LIN w/o prec. red. RF scheme achieves much better system performance.

Refer to caption
Figure 4: The EE versus the number users.

The EE of the considered schemes for different number of users, KK, is shown in Fig. 4. For a small KK, the LIN w/o prec. red. RF scheme provides the best EE, primarily due to the low number of RF chains used in this scheme. However, as KK increases, the number of RF chains used by this scheme becomes comparable to the total number of RF chains in other schemes. In addition, the lack of capability of adjusting the amplitude of the transmitted signal prevents this scheme from effectively suppressing the multi-user interference [25]. As a result, the EE of this scheme reduces as KK increases. For the same reason, the EE of the LIN w/o prec. scheme also decreases with an increasing number of users. On the other hand, the EE of the proposed schemes and the LIN w/o SIM scheme increase with KK, because of the presence of digital precoders that can reduce or even eliminate the multi-user interference, allowing them to exploit the multiuser diversity. Comparing the EE of the proposed schemes with that of the benchmark schemes, we can see that using a SIM in combination with digital precoding almost always provides the best EE, except in cases where KK is very small.

Refer to caption
Figure 5: EE versus the maximum transmit power.

Next, we study how the EE varies with the maximum transmit power, as shown in Fig. 5. The EE curves for all schemes exhibit an approximately logarithmic shape due to the logarithmic increase in the achievable rate. As expected, the proposed schemes achieve higher EE compared to benchmark schemes. Among all the benchmark schemes, the LIN w/o prec. red. RF scheme obtains the largest EE because of a smaller number of RF chains used. The difference between the EE of the proposed scheme with linear precoding scheme and that of any other benchmark scheme generally increases with transmit power, although these differences are almost stabilized at higher transmit power levels. Additionally, the DPC-based scheme consistently shows a noticeable improvement in EE over all other schemes, which can be attributed to the superior interference suppression capabilities of DPC.

In order to better understand the impact of realistic imperfections, we present the EE of the proposed schemes for the case of discrete SIM phase shifts in Fig. 6. The EE generally deteriorates as the number of quantization bits decreases. This effect is more evident for SIMs with a larger number of layers, since they contain more meta-elements and thus can cause a larger EE reduction. As a rule of thumb, at least 3 bits per meta-element are required for SIMs with a small LL (e.g., 2 or 4) to ensure that the EE reduction caused by quantization errors remains within acceptable limits. For SIMs with a larger LL, the minimum number of bits per meta-element is expected to be higher.

Refer to caption
Figure 6: EE for the case of discrete SIM phase shifts.

The per-iteration computational complexity of the proposed optimization schemes are shown in Table I. The relevant iteration counts for Dinkelbach’s method IUI_{U} and IXI_{X}, and the number of the line search steps, Iϕ,DI_{\phi,D} and Iϕ,LI_{\phi,L}, are averaged over the number of iterations required for each optimization scheme to reach 95 % of the EE at the 1000-th iteration. It can be observed that the number of iterations of Dinkelbach’s method remains almost unchanged as KK varies for both schemes, and the same holds true for the number of line search steps. Moreover, the computational complexity of the proposed optimization scheme for DPC-based systems is higher than that of the scheme with linear precoding when KK is 4 and 8. However, when KK increases to 12, the DPC-based scheme exhibits significantly lower complexity compared to the linear precoding scheme. This substantial increase in the complexity of the linear precoding scheme is due to the fact that the complexity of the precoding matrix optimization, CxC_{x}, is proportional to NtN_{t} when K=12K=12.

DPC Linear precoding
KK IUI_{U} Iϕ,DI_{\phi,D} CDPCC_{\text{DPC}} IXI_{X} Iϕ,LI_{\phi,L} CLINC_{\text{LIN}}
4 55 1 4367616 51 1 4129024
8 55 1 4480256 50 1 4409600
12 55 1 4592896 51 1 9947392
TABLE I: Comparison of the per-iteration computational complexities of the schemes with DPC and linear precoding.

VIII Conclusion

In this paper, we studied the EE maximization in a SIM-aided broadcast system with DPC and linear precoding at the BS. For DPC, we exploited the well-known BC-MAC duality and optimize the users’ covariance matrices by employing a SCA-based technique, which establishes a tight lower bound of the achievable sum-rate, and applying Dinkelbach’s method. A similar approach was used to optimize the precoders in the case of linear precoding. The phase shifts of the SIM meta-elements for DPC and linear precoding were optimized using a conventional projected gradient-based method due to its simplicity. Also, we conducted a computation complexity analysis of the proposed optimization algorithms and proved their convergence. Numerical results showed that implementing these proposed optimization algorithms can significantly improve the EE for SIM-aided broadcast systems. Moreover, we demonstrated that the EE depends on the number of SIM meta-elements and their distribution across the SIM layers. We also found that in SIM-aided broadcast systems without precoding, optimal energy efficient transmission strategies typically involve a subset of active transmit antennas.

Appendix A Proof of Theorem 2

Differentiating Rk(ϕ)R_{k}(\boldsymbol{\phi}) in (56) w.r.t. 𝐇k\mathbf{H}_{k} yields

dRL,k(ϕ)=dln|𝐅1,k|dln|𝐅2,k|\text{d}R_{\text{L},k}(\boldsymbol{\phi})=\text{d}\ln\left|\mathbf{F}_{1,k}\right|-\text{d}\ln\left|\mathbf{F}_{2,k}\right| (66)

where

dln|𝐅1,k|\displaystyle\text{d}\ln\left|\mathbf{F}_{1,k}\right| =Tr(𝐏^s𝐇kH𝐅1,k1d𝐇k+𝐅1,k1𝐇k𝐏^sd𝐇kH)\displaystyle=\operatorname{Tr}\left(\hat{\mathbf{P}}_{s}\mathbf{H}_{k}^{H}\mathbf{F}_{1,k}^{-1}\text{d}\mathbf{H}_{k}+\mathbf{F}_{1,k}^{-1}\mathbf{H}_{k}\hat{\mathbf{P}}_{s}\text{d}\mathbf{H}_{k}^{H}\right) (67)
dln|𝐅2,k|\displaystyle\!\!\!\text{d}\ln\left|\mathbf{F}_{2,k}\right| =Tr(𝐏^s𝐇kH𝐅2,k1d𝐇k+𝐅2,k1𝐇k𝐏^kd𝐇kH).\displaystyle=\operatorname{Tr}\left(\hat{\mathbf{P}}_{s}\mathbf{H}_{k}^{H}\mathbf{F}_{2,k}^{-1}\text{d}\mathbf{H}_{k}+\mathbf{F}_{2,k}^{-1}\mathbf{H}_{k}\hat{\mathbf{P}}_{k}\text{d}\mathbf{H}_{k}^{H}\right). (68)

Substituting (40) into the previous expressions, we obtain

dln|𝐅1,k|\displaystyle\text{d}\ln\left|\mathbf{F}_{1,k}\right| =Tr(𝐆1Hd𝚽l+𝐆1d(𝚽l)H)\displaystyle=\operatorname{Tr}\left(\mathbf{G}_{1}^{H}\text{d}\boldsymbol{\mathbf{\Phi}}^{l}+\mathbf{G}_{1}\text{d}(\boldsymbol{\mathbf{\Phi}}^{l})^{H}\right) (69)
dln|𝐅2,k|\displaystyle\text{d}\ln\left|\mathbf{F}_{2,k}\right| =Tr(𝐆2Hd𝚽l+𝐆2d(𝚽l)H)\displaystyle=\operatorname{Tr}\left(\mathbf{G}_{2}^{H}\text{d}\boldsymbol{\mathbf{\Phi}}^{l}+\mathbf{G}_{2}\text{d}(\boldsymbol{\mathbf{\Phi}}^{l})^{H}\right) (70)

where

𝐆1=\displaystyle\mathbf{G}_{1}= 𝚯l+1:L𝐆kH𝐅11𝐇k𝐏^s𝚯1:l1𝐖lH\displaystyle\boldsymbol{\Theta}^{l+1:L}\mathbf{G}_{k}^{H}\mathbf{F}_{1}^{-1}\mathbf{H}_{k}\hat{\mathbf{P}}_{s}\boldsymbol{\Theta}^{1:l-1}\mathbf{W}_{l}^{H} (71)
𝐆2=\displaystyle\mathbf{G}_{2}= 𝚯l+1:L𝐆kH𝐅21𝐇k𝐏^k𝚯1:l1𝐖lH\displaystyle\boldsymbol{\Theta}^{l+1:L}\mathbf{G}_{k}^{H}\mathbf{F}_{2}^{-1}\mathbf{H}_{k}\hat{\mathbf{P}}_{k}\boldsymbol{\Theta}^{1:l-1}\mathbf{W}_{l}^{H} (72)

After a few simple mathematical steps, we get

ϕlRL,k(ϕ)=vecd(𝚯l+1:L𝐆kH𝐅1,k1𝐇k𝐏^s𝚯1:l1𝐖lH)\displaystyle\nabla_{\boldsymbol{\phi}^{l}}R_{\text{L},k}(\boldsymbol{\phi})=\operatorname{vec}_{d}\left(\boldsymbol{\Theta}^{l+1:L}\mathbf{G}_{k}^{H}\mathbf{F}_{1,k}^{-1}\mathbf{H}_{k}\hat{\mathbf{P}}_{s}\boldsymbol{\Theta}^{1:l-1}\mathbf{W}_{l}^{H}\right)
vecd(𝚯l+1:L𝐆kH𝐅2,k1𝐇k𝐏^k𝚯1:l1𝐖lH).\displaystyle-\operatorname{vec}_{d}\left(\boldsymbol{\Theta}^{l+1:L}\mathbf{G}_{k}^{H}\mathbf{F}_{2,k}^{-1}\mathbf{H}_{k}\hat{\mathbf{P}}_{k}\boldsymbol{\Theta}^{1:l-1}\mathbf{W}_{l}^{H}\right). (73)

From this gradient expression for the achievable rate of user kk, we can easily obtain the appropriate gradients for all other users. After summation of all these gradient expressions, we obtain (58). This completes the proof.

References

  • [1] “Framework and overall objectives of the future development of imt for 2030 and beyond,” International Telecommunication Union (ITU) Recommendation (ITU-R), 2023.
  • [2] “Mobile data traffic outlook – Ericsson Mobility Report,” 2024, [Accessed 05-07-2024]. [Online]. Available: https://www.ericsson.com/en/reports-and-papers/mobility-report/dataforecasts/mobile-traffic-forecast
  • [3] M. Di Renzo et al., “Smart radio environments empowered by reconfigurable intelligent surfaces: How it works, state of research, and the road ahead,” IEEE Journal on Selected Areas in Communications, vol. 38, no. 11, pp. 2450–2525, 2020.
  • [4] T. Gong et al., “Holographic MIMO communications: Theoretical foundations, enabling technologies, and future directions,” IEEE Communications Surveys & Tutorials, vol. 26, no. 1, pp. 196–257, 2024.
  • [5] A. Zappone et al., “Energy efficiency of holographic transceivers based on RIS,” in GLOBECOM 2022-2022 IEEE Global Communications Conference.   IEEE, 2022, pp. 4613–4618.
  • [6] N. Shlezinger et al., “Dynamic metasurface antennas for 6G extreme massive MIMO communications,” IEEE Wireless Communications, vol. 28, no. 2, pp. 106–113, 2021.
  • [7] L. You et al., “Energy efficiency maximization of massive MIMO communications with dynamic metasurface antennas,” IEEE Transactions on Wireless Communications, vol. 22, no. 1, pp. 393–407, 2022.
  • [8] C. Liu et al., “A programmable diffractive deep neural network based on a digital-coding metasurface array,” Nature Electronics, vol. 5, no. 2, pp. 113–122, 2022.
  • [9] J. An et al., “Two-dimensional direction-of-arrival estimation using stacked intelligent metasurfaces,” arXiv preprint arXiv:2402.08224, 2024.
  • [10] Q.-U.-A. Nadeem et al., “Hybrid digital-wave domain channel estimator for stacked intelligent metasurface enabled multi-user MISO systems,” arXiv preprint arXiv:2309.16204, 2023.
  • [11] Z. Wang et al., “Multi-user ISAC through stacked intelligent metasurfaces: New algorithms and experiments,” arXiv preprint arXiv:2405.01104, 2024.
  • [12] N. U. Hassan et al., “Efficient beamforming and radiation pattern control using stacked intelligent metasurfaces,” IEEE Open Journal of the Communications Society, vol. 5, pp. 599–611, 2024.
  • [13] J. An et al., “Stacked intelligent metasurfaces for multiuser downlink beamforming in the wave domain,” arXiv preprint arXiv:2309.02687, 2023.
  • [14] A. Papazafeiropoulos et al., “Achievable rate optimization for large stacked intelligent metasurfaces based on statistical CSI,” IEEE Wireless Communications Letters, 2024, Early Access.
  • [15] S. Lin et al., “Stacked intelligent metasurface enabled LEO satellite communications relying on statistical CSI,” IEEE Wireless Communications Letters, vol. 13, no. 5, pp. 1295–1299, 2024.
  • [16] H. Liu et al., “DRL-based orchestration of multi-user MISO systems with stacked intelligent metasurfaces,” arXiv preprint arXiv:2402.09006, 2024.
  • [17] Q. Li et al., “Stacked intelligent metasurfaces for holographic MIMO aided cell-free networks,” IEEE Transactions on Communications, 2024, Early Access.
  • [18] J. An et al., “Stacked intelligent metasurfaces for efficient holographic MIMO communications in 6G,” IEEE Journal on Selected Areas in Communications, vol. 41, no. 8, pp. 2380–2396, 2023.
  • [19] A. Papazafeiropoulos et al., “Achievable rate optimization for stacked intelligent metasurface-assisted holographic MIMO communications,” arXiv preprint arXiv:2402.16415, 2024.
  • [20] N. S. Perović and L.-N. Tran, “Mutual information optimization for SIM-based holographic MIMO systems,” arXiv preprint arXiv:2403.18307, 2024.
  • [21] X. Lin et al., “All-optical machine learning using diffractive deep neural networks,” Science, vol. 361, no. 6406, pp. 1004–1008, 2018.
  • [22] S. Vishwanath et al., “Duality, achievable rates, and sum-rate capacity of gaussian MIMO broadcast channels,” IEEE Transactions on Information Theory, vol. 49, no. 10, pp. 2658–2668, 2003.
  • [23] J. Xu and L. Qiu, “Energy efficiency optimization for MIMO broadcast channels,” IEEE Transactions on Wireless Communications, vol. 12, no. 2, pp. 690–701, 2013.
  • [24] H. H. M. Tam et al., “Successive convex quadratic programming for quality-of-service management in full-duplex MU-MIMO multicell networks,” IEEE Transactions on Communications, vol. 64, no. 6, pp. 2340–2353, 2016.
  • [25] N. S. Perović et al., “On the maximum achievable sum-rate of the ris-aided mimo broadcast channel,” IEEE Transactions on Signal Processing, vol. 70, pp. 6316–6331, 2022.
  • [26] X. Zhao et al., “Rethinking WMMSE: Can its complexity scale linearly with the number of BS antennas?” IEEE Transactions on Signal Processing, vol. 71, pp. 433–446, 2023.
  • [27] S. He et al., “Coordinated beamforming for energy efficient transmission in multicell multiuser systems,” IEEE Transactions on Communications, vol. 61, no. 12, pp. 4961–4971, 2013.
  • [28] J. Wang et al., “Reconfigurable intelligent surface: Power consumption modeling and practical measurement validation,” IEEE Transactions on Communications, 2024, Early Access.
  • [29] C. Huang et al., “Reconfigurable intelligent surfaces for energy efficiency in wireless communication,” IEEE Transactions on Wireless Communications, vol. 18, no. 8, pp. 4157–4170, 2019.