Rate-Splitting Multiple Access for Downlink Multiuser MIMO: Precoder Optimization and PHY-Layer Design

Anup Mishra, Yijie Mao, Onur Dizdar, and Bruno Clerckx

This work has been partially supported by the U.K. Engineering and Physical Sciences Research Council (EPSRC) under grants EP/N015312/1 and EP/R511547/1.

Abstract

Rate-Splitting Multiple Access (RSMA) has recently emerged as a powerful and robust multiple access and interference management strategy for downlink Multi-user (MU) multi-antenna communications. In this work, we study the precoder design problem for the RSMA scheme in downlink MU systems with both perfect and imperfect Channel State Information at the Transmitter (CSIT) and assess the role and benefits of transmitting multiple common streams. Unlike existing works, which consider single-antenna receivers (Multiple-Input Single-Output, MISO), we extend the RSMA framework to multi-antenna receivers (Multiple-Input Multiple-Output, MIMO) and formulate the precoder optimization problem with the aim of maximizing the Weighted Ergodic Sum-Rate (WESR). The precoder optimization problem is solved using Sample Average Approximation (SAA) together with the proposed vectorization and Weighted Minimum Mean Square Error (WMMSE) based approach. The achievable sum-Degree of Freedom (DoF) of RSMA is derived for the proposed framework as an increasing function of the number of transmitted common and private streams, which is further validated by the Ergodic Sum Rate (ESR) performance using Monte Carlo simulations. Conventional MU–MIMO based on linear precoders and Non-Orthogonal Multiple Access (NOMA) schemes are considered as baselines. Numerical results show that with imperfect CSIT, the sum-DoF and ESR performance of RSMA is superior to that of the two baselines and increases with the number of transmitted common streams. Moreover, by better managing the interference, RSMA not only achieves significant ESR gains over the baseline schemes but is also more robust to CSIT inaccuracies, network loads and user deployments.

Index Terms:
Rate-Splitting Multiple Access (RSMA), interference management, imperfect CSIT, Degree of Freedom (DoF), MU–MIMO, MIMO, MIMO NOMA, Weighted Sum-Rate (WSR).

I Introduction

Multiple-Input Multiple-Output (MIMO) communication networks are one of the key enabling technologies for current and future wireless networks. By multiplexing signals in space, MIMO networks are capable of providing remarkably higher spectral efficiency. For a point-to-point MIMO channel, the channel capacity is known to scale linearly with the minimum number of transmit and receive antennas at high Signal-to-Noise Ratio (SNR), regardless of the level of Channel State Information (CSI) available to the Base Station (BS) [1]. However, such a result does not hold in a Multi-user (MU) network environment. When Channel State Information at the Transmitter (CSIT) is perfect, it is well known that Dirty Paper Coding (DPC) achieves the capacity region of the MIMO Broadcast Channel (BC) [2, 3], but its implementation is prohibitive due to high computational complexity. A more practical approach that has attracted great attention is Space Division Multiple Access (SDMA) and MU–MIMO implemented using MU Linear Precoders (LP) [4, 5, 6, 7]. (MU–MIMO techniques based on LP are referred to simply as MU–MIMO in the rest of the paper.) However, this approach has many limitations. One crucial limitation is that SDMA and MU–MIMO require accurate CSIT to design beamforming and manage interference, and are therefore sensitive to CSIT inaccuracy. In practical wireless communication networks, the CSIT quality can deteriorate for several reasons, such as pilot reuse in Time Division Duplex (TDD), mobility, latency, quantization errors in Frequency Division Duplex (FDD) and hardware impairments. Poor CSIT quality leads to poor interference management in current MIMO BC based on SDMA and MU–MIMO and therefore acts as a primary bottleneck in meeting demands for higher data rates.

Besides SDMA, another multiple access scheme that has recently attracted great attention for managing interference in multi-antenna MU networks is Non-Orthogonal Multiple Access (NOMA). Motivated by the established result in the Single-Input Single-Output (SISO) BC, where power-domain NOMA based on Superposition Coding (SC) at the transmitter and Successive Interference Cancellation (SIC) at the receivers is known to be a capacity-achieving scheme (such a scheme is also known as SC–SIC), NOMA has been applied to the multi-antenna BC [8, 9]. A typical MIMO NOMA strategy is to group the users into different user groups and apply SIC within each group to decode the intra-group interference. The inter-group interference is treated as noise [10]. However, it has been pointed out in [10, 11, 12, 13] that NOMA, which forces one user to decode the messages of other users, causes a spatial Degree of Freedom (DoF) loss and is inefficient in multi-antenna settings. Hence, in the multi-antenna (Gaussian) BC with perfect CSIT, the only known capacity-achieving scheme is DPC, rather than NOMA.

To overcome the limitations of SDMA and NOMA, a novel multiple access scheme called Rate-Splitting Multiple Access (RSMA) is introduced in [10] for the Multiple-Input Single-Output (MISO) BC. RSMA is based on splitting user messages into common and private parts at the transmitter. The common parts of the user messages are combined into common messages and encoded into common streams to be decoded by multiple users (but not necessarily intended for all those users). The private parts of all users are independently encoded into private streams to be decoded by the corresponding users only and treated as noise by other users. The encoded common and private streams are superposed in a non-orthogonal manner. Users rely on SIC to first decode the intended common streams before decoding the intended private streams. By adjusting the message split and the power allocation to the common and private streams, RSMA manages to partially decode the interference and treat the remaining interference as noise. This capability allows RSMA to act as a bridge between the two extreme interference management schemes of fully treating interference as noise (as in SDMA and MU–MIMO) and fully decoding interference (as in NOMA), and creates the opportunity to enhance the Quality of Service (QoS) and reduce complexity [14]. RSMA is shown to be a more general multiple access scheme embracing SDMA and NOMA as special cases [10, 12]. It is built upon 1-layer Rate-Splitting (RS), a low-complexity strategy that relies on one layer of SIC at each user [15]. For simplicity, 1-layer RS is referred to as 'RS' in the rest of the paper.

The concept of RS was first introduced in [16] for a two-user SISO Interference Channel (IC) and has since been developed further for the multi-antenna BC [12, 17, 18, 19, 20, 21, 22, 23, 24]. In the multi-antenna BC, RS has been studied from an information theoretic perspective [15, 20, 25, 21], demonstrating that it achieves the optimal sum-DoF [20] and the entire DoF region [25] of the $K$-user underloaded MISO BC with imperfect CSIT. [21] investigates the achievable DoF region of RS in asymmetric MIMO BC and IC with imperfect CSIT, and this achievable DoF region is shown in [26] to be optimal. Expanding the scope from the high SNR to the finite SNR regime of MISO BC, the energy efficiency performance of RSMA is investigated in [27] and its spectral efficiency performance is investigated in [10] with perfect CSIT and in [20, 18] with imperfect CSIT. RSMA has been shown to achieve a rate region close to DPC in MISO BC with perfect CSIT [10]. When CSIT is imperfect, linearly precoded RS is able to achieve a larger rate region than DPC [28]. A novel non-linearly precoded RS scheme, namely Dirty Paper Coded RS, is also proposed in [28], which is shown to outperform linearly precoded RS and DPC. [29] explores the benefits of RS using a non-linear precoding technique named Tomlinson-Harashima Precoding (THP) in MISO BC. [30] investigates precoder design and stream selection for RS in MISO BC. Apart from the conventional MISO BC, the performance benefits of RS have also been exploited in massive MIMO [31, 32], millimeter wave systems [33, 34], multigroup multicasting [35, 36], multicarrier multigroup multicast [37], joint unicast and multicast transmission [14], Cloud Radio Access Network (C-RAN) [38], cooperative user relaying [39], secure transmission [40], etc.
Furthermore, [41] investigates resource allocation for multicarrier RSMA systems, [42] studies RSMA in aerial networks and RSMA is shown to provide better robustness, rate and QoS in multi-cell Coordinated Multipoint (CoMP) [43]. RSMA is therefore a more promising strategy to manage interference in MIMO networks with both perfect and imperfect CSIT.

RS has been studied fairly extensively in the aforementioned works. However, the number of receive antennas at each user is limited to one in most of them. While there are studies considering multi-antenna receivers (in MIMO settings), the scope of such studies is limited. For example, [44] proposes practical stream combining techniques together with Regularized Block Diagonalization (RBD) precoding for RS in MIMO BC, but with only a single common stream and without precoder optimization. To the best of our knowledge, the role and benefits of multiple common streams in MIMO BC at finite SNR have not been investigated and remain an open problem. Moreover, in both perfect and imperfect CSIT settings, the achievable rate region of RS in MIMO BC is still unknown.

I-A Motivations and Contributions

In light of the information theoretic results of [21], in a symmetric setup with $M$ transmit antennas and $Q$ receive antennas at each user, $\min(M,Q)$ common streams should be transmitted in RS to achieve the information theoretically optimal DoF with imperfect CSIT. Motivated by this result and the performance gain of RS over SDMA, NOMA and DPC in terms of rate and sum-DoF with imperfect CSIT in MISO BC, we fill the aforementioned research gaps and make the following contributions:

  • We introduce a general framework of RS in symmetric MIMO BC with the same number of receive antennas at all users. The setting is general in the sense that RS can have an arbitrary number of common streams between $1$ and $\min(M,Q)$ inclusive. This is the first work that allows flexibility in the number of common and private streams to be transmitted in RS.

  • At high SNR, we derive the achievable sum-DoF for the proposed RS framework in MIMO BC with imperfect CSIT and show the influence of multiple common streams on the sum-DoF of RS. Even with a single common stream, the sum-DoF of RS is shown to be greater than the sum-DoF of MU–MIMO and MIMO NOMA. The sum-DoF of RS increases with the number of common streams, and so does its sum-DoF gain over MU–MIMO and MIMO NOMA. These assertions are further corroborated by the Ergodic Sum Rate (ESR) performance using Monte Carlo simulations. This is the first paper to compare the DoF of MU–MIMO, MIMO NOMA and RS in a MIMO setting, as opposed to existing comparisons such as [13], which are limited to MISO settings.

  • We propose a vectorization and Weighted Minimum Mean Square Error (WMMSE)-based Alternating Optimization (AO) algorithm to optimize the precoders for RS in MIMO BC with the aim of maximizing the WSR subject to the transmit power constraint. The proposed optimization framework addresses the intractability introduced by matrix-valued optimization variables. To the best of our knowledge, this is the first work that studies the precoder optimization and the benefits of transmitting multiple common streams in RS-assisted MIMO BC with perfect and imperfect CSIT.

  • Under the assumption of Gaussian signalling and infinite block lengths, we demonstrate that the Ergodic Rate (ER) region of RS with optimized precoders always outperforms the ER regions of MU–MIMO and MIMO NOMA in MIMO BC with both perfect and imperfect CSIT. When CSIT is perfect, we also demonstrate that the ER region of RS comes closer to the capacity region achieved by DPC than MU–MIMO and MIMO NOMA. This is the first work to demonstrate such benefits of RS in MIMO settings.

  • To demonstrate the performance of RS in practical systems, we design the Physical (PHY)-layer architecture of RS with finite constellation modulation schemes, finite length polar codes and Adaptive Modulation and Coding (AMC). We show via the Link Level Simulations (LLS) that RS achieves significant throughput gain over MU–MIMO and MIMO NOMA in MIMO BC. This is the first work to design the PHY-layer architecture and to provide the LLS of RS in MIMO settings.

I-B Organisation

The rest of the paper is organized as follows. In Section II, the system model and CSIT assumptions are delineated. The problem is formulated in Section III. Section IV contains the proposed methodology to solve the optimization problem. Section V describes the PHY-layer architecture for RS. Simulation results are illustrated in Section VI and Section VII concludes the paper. Appendix A contains the derivation of the achievable sum-DoF for the RS, MU–MIMO and MIMO NOMA schemes.

I-C Notations

Matrices are denoted by boldface uppercase letters, column vectors by boldface lowercase letters and scalars by standard letters. The trace and determinant of a matrix $\mathbf{A}$ are denoted by $\mathrm{tr}(\mathbf{A})$ and $\det(\mathbf{A})$, respectively. $\mathrm{diag}(\mathbf{A})$ denotes the diagonal entries of the matrix. $\mathbf{A}^{T}$ and $\mathbf{A}^{H}$ denote the transpose and Hermitian transpose of $\mathbf{A}$, respectively. The Euclidean norm of a vector $\mathbf{a}$ is denoted by $\lVert\mathbf{a}\rVert$. $\otimes$ denotes the Kronecker product and $\mathrm{vec}(\mathbf{A})$ denotes the vectorization of $\mathbf{A}$. $\mathbb{E}_{X}\{Y\}$ is the expectation of $Y$ w.r.t. the random variable $X$. $\mathbb{C}^{M\times N}$ and $\mathbb{R}^{M\times N}$ denote the sets of all $M\times N$ matrices with complex-valued and real-valued entries, respectively. The Circularly Symmetric Complex Gaussian (CSCG) distribution with mean $\mu$ and variance $\sigma^{2}$ is denoted by $\mathcal{CN}(\mu,\sigma^{2})$.
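
As a small illustration of the vectorization and Kronecker product notation, the sketch below numerically checks the standard identity $\mathrm{vec}(\mathbf{A}\mathbf{X}\mathbf{B})=(\mathbf{B}^{T}\otimes\mathbf{A})\,\mathrm{vec}(\mathbf{X})$. It assumes the usual column-stacking convention for $\mathrm{vec}(\cdot)$, which the text does not state explicitly; the matrix sizes and the helper name `vec` are arbitrary choices made for the example.

```python
import numpy as np

def vec(A):
    """Stack the columns of A into one vector (column-major order, an assumed convention)."""
    return A.reshape(-1, order="F")

rng = np.random.default_rng(0)
# Arbitrary complex test matrices with compatible shapes.
A = rng.standard_normal((3, 4)) + 1j * rng.standard_normal((3, 4))
X = rng.standard_normal((4, 2)) + 1j * rng.standard_normal((4, 2))
B = rng.standard_normal((2, 5)) + 1j * rng.standard_normal((2, 5))

lhs = vec(A @ X @ B)                 # vec(A X B)
rhs = np.kron(B.T, A) @ vec(X)       # (B^T kron A) vec(X), plain transpose (no conjugate)
identity_holds = np.allclose(lhs, rhs)
print(identity_holds)
```

Identities of this kind are what make it possible to rewrite expressions in a matrix variable as expressions in a single vector variable.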

II System Model

We consider a system in which a BS with $M$ transmit antennas serves $K$ users indexed by the set $\mathcal{K}=\{1,\ldots,K\}$, each equipped with $Q$ receive antennas. The transmit signal $\mathbf{x}\in\mathbb{C}^{M\times 1}$ is subject to the power constraint $\mathbb{E}\{\|\mathbf{x}\|^{2}\}\leq P_{t}$. The signal is transmitted through a MIMO BC, where $\mathbf{H}_{k}\in\mathbb{C}^{M\times Q}$ denotes the channel matrix between the BS and user-$k$ and is drawn from a continuous distribution. The signal received at user-$k$ is given by

$$\mathbf{y}_{k}=\mathbf{H}_{k}^{H}\mathbf{x}+\mathbf{n}_{k}, \tag{1}$$

where $\mathbf{n}_{k}\sim\mathcal{CN}(\mathbf{0},\sigma_{n,k}^{2}\mathbf{I}_{Q})$ is the Additive White Gaussian Noise (AWGN) vector and is independent of the channel matrices. Without loss of generality, we assume the noise variances across users to be equal, i.e., $\sigma_{n,k}^{2}=\sigma_{n}^{2},\,\forall k\in\mathcal{K}$. We assume that each user has complete knowledge of the channel information, i.e., perfect CSI at the Receiver (CSIR). In contrast, the BS only has partial knowledge of the users' CSI. Next, we detail the channel acquisition at the BS.
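
The received-signal model (1) can be sketched numerically as follows. This is only an illustration under stated assumptions: the channel entries are drawn i.i.d. $\mathcal{CN}(0,1)$ (the text only requires a continuous distribution), the transmit vector is a random placeholder rather than a precoded signal, and the helper names `cscg` and `received_signal` are hypothetical.

```python
import numpy as np

def cscg(rng, shape, var=1.0):
    """Draw i.i.d. circularly symmetric complex Gaussian entries CN(0, var)."""
    return np.sqrt(var / 2.0) * (rng.standard_normal(shape) + 1j * rng.standard_normal(shape))

def received_signal(H_k, x, sigma_n2, rng):
    """y_k = H_k^H x + n_k with n_k ~ CN(0, sigma_n2 * I_Q), as in (1)."""
    Q = H_k.shape[1]
    n_k = cscg(rng, (Q,), sigma_n2)
    return H_k.conj().T @ x + n_k

rng = np.random.default_rng(0)
M, Q, K = 4, 2, 2                               # example dimensions
H = [cscg(rng, (M, Q)) for _ in range(K)]       # assumed Rayleigh fading, H_k is M x Q
x = cscg(rng, (M,), 1.0 / M)                    # placeholder transmit vector, E||x||^2 = 1
y = [received_signal(H_k, x, 1.0, rng) for H_k in H]
print([y_k.shape for y_k in y])                 # each user observes Q samples
```

Note that $\mathbf{H}_{k}$ is stored as an $M\times Q$ matrix, so the receiver applies its Hermitian transpose to the transmit vector.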

II-A Imperfect CSIT

The overall channel state is denoted by $\mathbf{H}=[\mathbf{H}_{1},\mathbf{H}_{2},\ldots,\mathbf{H}_{K}]\in\mathbb{C}^{M\times(QK)}$, where the fading channel varies according to an ergodic stationary process during the time of transmission. The probability density function of the stationary process is $f_{\mathbf{H}}(\mathbf{H})$. Practical limitations in CSI acquisition, such as quantized feedback [45], feedback and processing delay [46], [47], hardware impairments [48] and channel estimation [49], result in partial knowledge of the CSI at the BS. This partial knowledge is given by $\widehat{\mathbf{H}}=[\widehat{\mathbf{H}}_{1},\widehat{\mathbf{H}}_{2},\ldots,\widehat{\mathbf{H}}_{K}]$ and is modeled as $\mathbf{H}=\widehat{\mathbf{H}}+\widetilde{\mathbf{H}}$. We assume that the joint distribution of the channel state and its estimate $\{\mathbf{H},\widehat{\mathbf{H}}\}$ is ergodic and stationary [20]. The conditional density $f_{\mathbf{H}|\widehat{\mathbf{H}}}(\mathbf{H}|\widehat{\mathbf{H}})$ is assumed to be known at the BS, while $\mathbf{H}$ is unknown over the entire transmission. The estimation error is captured by the channel estimation error matrix $\widetilde{\mathbf{H}}=[\widetilde{\mathbf{H}}_{1},\widetilde{\mathbf{H}}_{2},\ldots,\widetilde{\mathbf{H}}_{K}]$, in which the elements of $\widetilde{\mathbf{H}}_{k}$ are independent and identically distributed (i.i.d.) zero-mean complex Gaussian random variables. The covariance matrix of the error matrix, $\mathbb{E}\{\widetilde{\mathbf{H}}_{k}\widetilde{\mathbf{H}}_{k}^{H}\}=\mathbf{R}_{e,k}$, is independent of $\widehat{\mathbf{H}}_{k}$. Furthermore, the average CSIT error power is defined as $\sigma_{e,k}^{2}\triangleq\mathbb{E}_{\widetilde{\mathbf{H}}_{k}}\big\{\lVert\widetilde{\mathbf{H}}_{k}\rVert^{2}\big\}=\frac{1}{M}\mathrm{tr}(\mathbf{R}_{e,k})$.

$\sigma_{e,k}^{2}$ is allowed to scale as $\mathcal{O}(P_{t}^{-\alpha})$, facilitating the scaling of the CSIT quality with SNR, where $\alpha\in[0,\infty)$ is the quality scaling factor representing the quality of CSI at the BS in the high SNR regime [20, 45, 46, 47]. Consequently, we write $\sigma_{e,k}^{2}=\mathcal{O}(P_{t}^{-\alpha})$, i.e., the error variance is assumed to decay as a power of the SNR with exponent $\alpha$. In the limit $\alpha\to\infty$, the average CSIT error power is equal to zero, $\sigma_{e,k}^{2}=0,\,\forall k\in\mathcal{K}$, resulting in a perfect CSIT scenario. At the other extreme, for $\alpha=0$, the CSIT quality remains invariant w.r.t. SNR. Thus, a finite non-zero $\alpha$ leads to CSIT quality improvement as SNR increases, e.g., by increasing the number of feedback bits with SNR. Here we restrict $\alpha$ to $[0,1]$, since from a DoF perspective $\alpha=1$ corresponds to perfect CSIT [20].
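
The error model $\mathbf{H}=\widehat{\mathbf{H}}+\widetilde{\mathbf{H}}$ with $\sigma_{e}^{2}=P_{t}^{-\alpha}$ can be sketched as below for a single user. The sketch makes assumptions that go beyond the text: unit-variance Rayleigh fading, an estimate of variance $1-\sigma_{e}^{2}$ drawn independently of the error, unit noise variance so that $P_{t}$ equals the SNR, and the hypothetical helper name `draw_channel_pair`.

```python
import numpy as np

def cscg(rng, shape, var=1.0):
    """Draw i.i.d. circularly symmetric complex Gaussian entries CN(0, var)."""
    return np.sqrt(var / 2.0) * (rng.standard_normal(shape) + 1j * rng.standard_normal(shape))

def draw_channel_pair(M, Q, P_t, alpha, rng):
    """Return (H, H_hat, H_tilde) with H = H_hat + H_tilde for one user.

    Assumption: unit-variance Rayleigh fading, so the estimate carries variance
    1 - sigma_e2 and the independent error carries sigma_e2 = P_t ** (-alpha).
    """
    sigma_e2 = P_t ** (-alpha)
    if not 0.0 <= sigma_e2 <= 1.0:
        raise ValueError("this sketch needs P_t >= 1 and alpha >= 0")
    H_hat = cscg(rng, (M, Q), 1.0 - sigma_e2)
    H_tilde = cscg(rng, (M, Q), sigma_e2)
    return H_hat + H_tilde, H_hat, H_tilde

rng = np.random.default_rng(0)
P_t = 100.0                          # 20 dB SNR under unit noise variance
for alpha in (0.0, 0.5, 1.0):        # larger alpha means better CSIT at the same SNR
    H, H_hat, H_tilde = draw_channel_pair(4, 2, P_t, alpha, rng)
    print(alpha, P_t ** (-alpha), np.mean(np.abs(H_tilde) ** 2))
```

With $\alpha=0$ the error power stays at its maximum regardless of $P_{t}$, whereas with $\alpha=1$ it shrinks in proportion to $1/P_{t}$.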

II-B MIMO Rate Splitting

Here we delineate the RS framework proposed for MIMO BC.

II-B1 Transmitter

There are $Q_{k}\leq\min(M,Q)$ messages intended for user-$k$, $\forall k\in\mathcal{K}$, such that $\sum_{k=1}^{K}Q_{k}=Q_{p}=\min(M,KQ)$. These messages are expressed as $\mathbf{w}_{k}=\{W_{1}^{k},W_{2}^{k},\ldots,W_{Q_{k}}^{k}\}$, $k\in\mathcal{K}$. Each message of user-$k$ is split into a common part and a private part as $W_{i}^{k}=\{W_{i}^{c,k},W_{i}^{p,k}\}$, $\forall i\in\{1,\ldots,Q_{k}\}$. The common parts $\mathbf{w}_{c,1},\ldots,\mathbf{w}_{c,K}$ of the messages of all users, with $\mathbf{w}_{c,k}=\{W_{1}^{c,k},W_{2}^{c,k},\ldots,W_{Q_{k}}^{c,k}\}$ denoting the common parts of user-$k$, are combined into $Q_{c}\in\{1,\ldots,\min(M,Q)\}$ common messages denoted by $\mathbf{w}_{c}\in\mathbb{C}^{Q_{c}\times 1}$, and encoded together into a common stream vector of size $Q_{c}$ denoted by $\mathbf{s}_{c}=[s_{1}^{c},\ldots,s_{Q_{c}}^{c}]^{T}$, which is decoded by all users. The private parts of user-$k$, $\mathbf{w}_{p,k}=\{W_{1}^{p,k},\ldots,W_{Q_{k}}^{p,k}\}\in\mathbb{C}^{Q_{k}\times 1}$, are independently encoded into a private stream vector $\mathbf{s}_{k}=[s_{1}^{p,k},\ldots,s_{Q_{k}}^{p,k}]^{T}$ meant to be decoded by the corresponding user-$k$ only. Therefore, the overall data stream vector to be transmitted is $\mathbf{s}=[\mathbf{s}_{c}^{T},\mathbf{s}_{1}^{T},\ldots,\mathbf{s}_{K}^{T}]^{T}$. We use linear precoders $\mathbf{P}=[\mathbf{P}_{c},\mathbf{P}_{1},\ldots,\mathbf{P}_{K}]$ to precode the data streams, where $\mathbf{P}_{c}\in\mathbb{C}^{M\times Q_{c}}$ is the precoder for the common stream vector and $\mathbf{P}_{k}\in\mathbb{C}^{M\times Q_{k}}$ is the precoder for the private stream vector of user-$k$. The resulting transmit signal is $\mathbf{x}=\mathbf{P}\mathbf{s}$. We assume that $\mathbb{E}\{\mathbf{s}\mathbf{s}^{H}\}=\mathbf{I}$, so that the transmit power constraint becomes $\mathbb{E}\big\{\mathrm{tr}(\mathbf{P}\mathbf{P}^{H})\big\}\leq P_{t}$.
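
The assembly of the transmit signal $\mathbf{x}=\mathbf{P}\mathbf{s}$ can be sketched as follows for $K=2$, $M=4$, $Q_{c}=2$ and $Q_{1}=Q_{2}=2$. The precoders here are random placeholders scaled to meet the power budget with equality, not optimized precoders, and the symbols are Gaussian as a stand-in for encoded streams; both are assumptions made for illustration.

```python
import numpy as np

def cscg(rng, shape, var=1.0):
    """Draw i.i.d. circularly symmetric complex Gaussian entries CN(0, var)."""
    return np.sqrt(var / 2.0) * (rng.standard_normal(shape) + 1j * rng.standard_normal(shape))

rng = np.random.default_rng(1)
M, Q_c, Q_k = 4, 2, [2, 2]           # sum(Q_k) = min(M, K*Q) = 4 for K = 2, Q = 2
P_t = 10.0                           # transmit power budget (example value)

# Placeholder precoders: P = [P_c, P_1, P_2], one column per stream.
P_c = cscg(rng, (M, Q_c))
P_priv = [cscg(rng, (M, q)) for q in Q_k]
P = np.hstack([P_c] + P_priv)

# Scale so that tr(P P^H) = P_t, i.e. the power constraint is met with equality.
P = P * np.sqrt(P_t / np.trace(P @ P.conj().T).real)

# Stack s = [s_c^T, s_1^T, s_2^T]^T with unit-power, uncorrelated symbols.
s = cscg(rng, (P.shape[1],))
x = P @ s                            # superposed transmit signal, length M
print(x.shape, np.trace(P @ P.conj().T).real)
```

Because $\mathbb{E}\{\mathbf{s}\mathbf{s}^{H}\}=\mathbf{I}$, the average transmit power depends only on $\mathrm{tr}(\mathbf{P}\mathbf{P}^{H})$, which is why the scaling acts on the precoder alone.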

II-B2 MMSE Receiver and Rates

At user-$k$, the common stream vector $\mathbf{s}_{c}$ is first decoded into $\widehat{\mathbf{w}}_{c}$ by treating the interference from all private stream vectors as noise. Once the common stream vector is decoded and removed successfully using SIC, the private stream vector $\mathbf{s}_{k}$ of user-$k$ is decoded into $\widehat{\mathbf{w}}_{p,k}$ by treating the interference from the private stream vectors of other users as noise. User-$k$ reconstructs its original message by extracting $\widehat{\mathbf{w}}_{c,k}$ from $\widehat{\mathbf{w}}_{c}$ and combining it with $\widehat{\mathbf{w}}_{p,k}$ to form $\widehat{\mathbf{w}}_{k}$. Fig. 1 shows the $K$-user RS transmission model for MIMO BC. Next, we specify the instantaneous and ergodic rates for the common and private stream vectors (which are respectively denoted as common rate and private rate in the following).

Figure 1: Transmission model of RS in MIMO BC

Since the precoder design at the BS depends on the channel estimate $\widehat{\mathbf{H}}$, while each user, having perfect CSIR, decodes its intended streams based on the exact channel $\mathbf{H}$, the joint fading state $\{\widehat{\mathbf{H}},\mathbf{H}\}$ determines the instantaneous common and private rates of each user. To avoid redundancy, wherever possible, the subscript $z\in\{c,p\}$ is used throughout the paper to simultaneously represent entities associated with the common and private stream vectors, with $c$ representing entities associated with the common stream vector and $p$ those associated with the private stream vector. Assuming Gaussian signalling, for a given channel realization, the instantaneous common and private rates $R_{z,k}(\mathbf{H},\widehat{\mathbf{H}}),\,z\in\{c,p\}$, can be written as

$$\begin{split}R_{c,k}(\mathbf{H},\widehat{\mathbf{H}})&=\log_{2}\det\big(\mathbf{I}+\mathbf{P}_{c}^{H}\mathbf{H}_{k}(\mathbf{R}_{c,k})^{-1}\mathbf{H}_{k}^{H}\mathbf{P}_{c}\big),\\ R_{p,k}(\mathbf{H},\widehat{\mathbf{H}})&=\log_{2}\det\big(\mathbf{I}+\mathbf{P}_{k}^{H}\mathbf{H}_{k}(\mathbf{R}_{p,k})^{-1}\mathbf{H}_{k}^{H}\mathbf{P}_{k}\big).\end{split} \tag{2}$$

The noise plus interference covariance matrices $\mathbf{R}_{z,k}(\mathbf{H},\widehat{\mathbf{H}}),\,z\in\{c,p\}$, for the common and private stream vectors at user-$k$ are defined as

$$\mathbf{R}_{c,k}(\mathbf{H},\widehat{\mathbf{H}})=\mathbf{I}_{Q}+\sum_{i=1}^{K}\mathbf{H}_{k}^{H}\mathbf{P}_{i}\mathbf{P}_{i}^{H}\mathbf{H}_{k},\;\;\mathbf{R}_{p,k}(\mathbf{H},\widehat{\mathbf{H}})=\mathbf{I}_{Q}+\sum_{i=1,i\neq k}^{K}\mathbf{H}_{k}^{H}\mathbf{P}_{i}\mathbf{P}_{i}^{H}\mathbf{H}_{k}. \tag{3}$$
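
The rate expressions (2) and covariance matrices (3) can be evaluated numerically as in the sketch below. It follows (3) in using an identity noise covariance (unit noise variance); the random channel and precoders, the 0-based user index and the function names are assumptions made for the example.

```python
import numpy as np

def cscg(rng, shape, var=1.0):
    """Draw i.i.d. circularly symmetric complex Gaussian entries CN(0, var)."""
    return np.sqrt(var / 2.0) * (rng.standard_normal(shape) + 1j * rng.standard_normal(shape))

def herm(A):
    """Hermitian (conjugate) transpose."""
    return A.conj().T

def log2det(A):
    """log2 det(A) for a Hermitian positive definite matrix A."""
    return np.linalg.slogdet(A)[1] / np.log(2.0)

def instantaneous_rates(H_k, P_c, P_priv, k):
    """Common and private rates of user k (0-based) following (2) and (3)."""
    Q = H_k.shape[1]
    Hh = herm(H_k)
    # Covariance of each private stream vector as seen at user k.
    cov = [Hh @ P_i @ herm(P_i) @ H_k for P_i in P_priv]
    R_c = np.eye(Q) + sum(cov)                                  # all private streams act as noise
    R_p = np.eye(Q) + sum(c for i, c in enumerate(cov) if i != k)
    A_c, A_p = Hh @ P_c, Hh @ P_priv[k]                         # effective channels H_k^H P
    rate_c = log2det(np.eye(A_c.shape[1]) + herm(A_c) @ np.linalg.solve(R_c, A_c))
    rate_p = log2det(np.eye(A_p.shape[1]) + herm(A_p) @ np.linalg.solve(R_p, A_p))
    return rate_c, rate_p

rng = np.random.default_rng(0)
M, Q, K, Q_c = 4, 2, 2, 2
H = [cscg(rng, (M, Q)) for _ in range(K)]
P_c = cscg(rng, (M, Q_c))
P_priv = [cscg(rng, (M, 2)) for _ in range(K)]
for k in range(K):
    print(k, instantaneous_rates(H[k], P_c, P_priv, k))
```

The common rate is computed against $\mathbf{R}_{c,k}$, which contains every private stream, while the private rate is computed against $\mathbf{R}_{p,k}$, from which the user's own private stream and the already cancelled common stream are absent.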

Denote the receive filters for the common and private stream vectors at user-$k$ as $\mathbf{G}_{c,k}(\mathbf{H},\widehat{\mathbf{H}})\in\mathbb{C}^{Q_{c}\times Q}$ and $\mathbf{G}_{p,k}(\mathbf{H},\widehat{\mathbf{H}})\in\mathbb{C}^{Q_{k}\times Q}$, respectively. The estimated common stream vector is denoted as $\widehat{\mathbf{s}}_{c,k}=\mathbf{G}_{c,k}\mathbf{y}_{k}$. Assuming successful decoding and removal of the common stream vector, the private stream vector is estimated as $\widehat{\mathbf{s}}_{k}=\mathbf{G}_{p,k}\big(\mathbf{y}_{k}-\mathbf{H}_{k}^{H}\mathbf{P}_{c}\mathbf{s}_{c}\big)$. Therefore, the Mean Square Error (MSE) matrices are written as

$$\mathbf{E}_{c,k}(\mathbf{H},\widehat{\mathbf{H}})=\mathbb{E}\big[(\widehat{\mathbf{s}}_{c,k}-\mathbf{s}_{c})(\widehat{\mathbf{s}}_{c,k}-\mathbf{s}_{c})^{H}\big],\;\;\mathbf{E}_{p,k}(\mathbf{H},\widehat{\mathbf{H}})=\mathbb{E}\big[(\widehat{\mathbf{s}}_{k}-\mathbf{s}_{k})(\widehat{\mathbf{s}}_{k}-\mathbf{s}_{k})^{H}\big]. \tag{4}$$

Minimizing the MSEs by solving $\frac{\partial\mathbf{E}_{z,k}}{\partial\mathbf{G}_{z,k}}=0,\,z\in\{c,p\}$, leads to the respective Minimum MSE (MMSE) filters

$$\mathbf{G}_{c,k}^{\textrm{MMSE}}(\mathbf{H},\widehat{\mathbf{H}})=\arg\min_{\mathbf{G}_{c,k}}\mathbb{E}\big[\|\mathbf{G}_{c,k}\mathbf{y}_{k}-\mathbf{s}_{c}\|^{2}\big]=\mathbf{P}_{c}^{H}\mathbf{H}_{k}\big(\mathbf{H}_{k}^{H}\mathbf{P}_{c}\mathbf{P}_{c}^{H}\mathbf{H}_{k}+\mathbf{R}_{c,k}\big)^{-1}, \tag{5}$$

$$\mathbf{G}_{p,k}^{\textrm{MMSE}}(\mathbf{H},\widehat{\mathbf{H}})=\arg\min_{\mathbf{G}_{p,k}}\mathbb{E}\big[\|\mathbf{G}_{p,k}(\mathbf{y}_{k}-\mathbf{H}_{k}^{H}\mathbf{P}_{c}\mathbf{s}_{c})-\mathbf{s}_{k}\|^{2}\big]=\mathbf{P}_{k}^{H}\mathbf{H}_{k}\big(\mathbf{H}_{k}^{H}\mathbf{P}_{k}\mathbf{P}_{k}^{H}\mathbf{H}_{k}+\mathbf{R}_{p,k}\big)^{-1}. \tag{6}$$

Substituting the MMSE filters (5) and (6) in (4), respectively, the MMSE matrices for common and private stream vectors are calculated as

$$\mathbf{E}_{c,k}^{\textrm{MMSE}}(\mathbf{H},\widehat{\mathbf{H}})=\big(\mathbf{I}+\mathbf{P}_{c}^{H}\mathbf{H}_{k}(\mathbf{R}_{c,k})^{-1}\mathbf{H}_{k}^{H}\mathbf{P}_{c}\big)^{-1},\;\;\mathbf{E}_{p,k}^{\textrm{MMSE}}(\mathbf{H},\widehat{\mathbf{H}})=\big(\mathbf{I}+\mathbf{P}_{k}^{H}\mathbf{H}_{k}(\mathbf{R}_{p,k})^{-1}\mathbf{H}_{k}^{H}\mathbf{P}_{k}\big)^{-1}. \tag{7}$$
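
The step from the MMSE filter (5) to the closed-form MMSE matrix (7) can be checked numerically for the common stream, as sketched below. The MSE matrix is expanded from (4) as $(\mathbf{G}\mathbf{A}-\mathbf{I})(\mathbf{G}\mathbf{A}-\mathbf{I})^{H}+\mathbf{G}\mathbf{R}\mathbf{G}^{H}$ with $\mathbf{A}=\mathbf{H}_{k}^{H}\mathbf{P}_{c}$, which relies on unit-power streams that are uncorrelated with the noise and interference. The random channel and precoders and all function names are assumptions made for the example.

```python
import numpy as np

def cscg(rng, shape, var=1.0):
    """Draw i.i.d. circularly symmetric complex Gaussian entries CN(0, var)."""
    return np.sqrt(var / 2.0) * (rng.standard_normal(shape) + 1j * rng.standard_normal(shape))

def herm(A):
    """Hermitian (conjugate) transpose."""
    return A.conj().T

def mmse_filter(A, R):
    """G = A^H (A A^H + R)^{-1}, where A = H_k^H P is the effective channel, as in (5)."""
    return herm(A) @ np.linalg.inv(A @ herm(A) + R)

def mse_matrix(G, A, R):
    """E = (G A - I)(G A - I)^H + G R G^H, the expansion of (4) for unit-power streams."""
    D = G @ A - np.eye(A.shape[1])
    return D @ herm(D) + G @ R @ herm(G)

rng = np.random.default_rng(0)
M, Q, Q_c = 4, 2, 2
H_k = cscg(rng, (M, Q))
P_c = cscg(rng, (M, Q_c))
P_priv = [cscg(rng, (M, 2)), cscg(rng, (M, 2))]

Hh = herm(H_k)
A_c = Hh @ P_c
R_c = np.eye(Q) + sum(Hh @ P @ herm(P) @ H_k for P in P_priv)   # (3), common stream

G_c = mmse_filter(A_c, R_c)
E_direct = mse_matrix(G_c, A_c, R_c)                            # (5) substituted into (4)
S = np.eye(Q_c) + herm(A_c) @ np.linalg.solve(R_c, A_c)
E_closed = np.linalg.inv(S)                                     # closed form (7)

rate_from_mse = -np.linalg.slogdet(E_direct)[1] / np.log(2.0)   # log2 det(E^{-1})
rate_from_eq2 = np.linalg.slogdet(S)[1] / np.log(2.0)           # rate expression (2)
print(np.allclose(E_direct, E_closed), rate_from_mse, rate_from_eq2)
```

The same check applies to the private stream by replacing $\mathbf{P}_{c}$ with $\mathbf{P}_{k}$ and $\mathbf{R}_{c,k}$ with $\mathbf{R}_{p,k}$.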

Using the MMSE matrices in (7) and equation (2), we obtain the instantaneous rate expressions as $R_{z,k}(\mathbf{H},\widehat{\mathbf{H}})=\log_{2}\det\big((\mathbf{E}_{z,k}^{\textrm{MMSE}}(\mathbf{H},\widehat{\mathbf{H}}))^{-1}\big),\,z\in\{c,p\}$. Partial knowledge of the CSI at the BS may result in overestimation of the instantaneous rates, rendering them unachievable [20]. A robust method is to design the precoders based on the ER, assuming that the transmission is delay unlimited. The ERs for the common and private stream vectors of user-$k$ are defined as

$$\bar{R}_{z,k}\triangleq\mathbb{E}_{\{\mathbf{H},\widehat{\mathbf{H}}\}}\{R_{z,k}(\mathbf{H},\widehat{\mathbf{H}})\},\,z\in\{c,p\}. \tag{8}$$

The ER characterizes the long-term performance of user-$k$ over all possible joint fading states. To ensure that each user is able to successfully decode the common stream vector, it needs to be sent at an ER $\bar{R}_{c}=\min_{j}\big\{\bar{R}_{c,j}\big\}_{j=1}^{K}$. The common rate is shared by all users, with $\bar{C}_{k}$ denoting the share allocated to user-$k$ such that $\sum_{k\in\mathcal{K}}\bar{C}_{k}=\bar{R}_{c}$. Therefore, the total ER achieved by user-$k$ is equal to $\bar{R}_{k,tot}=\bar{R}_{p,k}+\bar{C}_{k}$.
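
The bookkeeping of the common rate can be sketched as follows. The per-user rates and the split fractions are made-up illustrative numbers, not results from the paper, and the function name `total_ergodic_rates` is hypothetical; how the split should be chosen is not decided here.

```python
import numpy as np

def total_ergodic_rates(Rc_per_user, Rp_per_user, split):
    """Return (R_c, C, R_tot) with R_c = min_j R_c,j, sum_k C_k = R_c and R_k,tot = R_p,k + C_k."""
    split = np.asarray(split, dtype=float)
    if np.any(split < 0) or not np.isclose(split.sum(), 1.0):
        raise ValueError("split must be nonnegative and sum to 1")
    R_c = float(np.min(Rc_per_user))      # the common stream must be decodable by every user
    C = split * R_c                       # share of the common rate given to each user
    return R_c, C, np.asarray(Rp_per_user, dtype=float) + C

# Illustrative numbers only (bit/s/Hz): user 2 limits the common rate.
R_c, C, R_tot = total_ergodic_rates([3.0, 2.0], [1.5, 4.0], [0.75, 0.25])
print(R_c, C, R_tot)   # 2.0 [1.5 0.5] [3.  4.5]
```

The weakest user for the common stream determines $\bar{R}_{c}$, but the shares $\bar{C}_{k}$ can still be distributed freely among users as long as they sum to $\bar{R}_{c}$.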

II-C Two User Example

To better illustrate RS, we consider a two-user case. At the BS, the $Q_{k},\,k\in\{1,2\}$, messages of the two users, denoted by $\mathbf{w}_{1}=\{W_{1}^{1},\ldots,W_{Q_{1}}^{1}\}$ and $\mathbf{w}_{2}=\{W_{1}^{2},\ldots,W_{Q_{2}}^{2}\}$, are respectively split into sub-messages as $\mathbf{w}_{1}=\big\{\{W_{1}^{c,1},W_{1}^{p,1}\},\ldots,\{W_{Q_{1}}^{c,1},W_{Q_{1}}^{p,1}\}\big\}$ and $\mathbf{w}_{2}=\big\{\{W_{1}^{c,2},W_{1}^{p,2}\},\ldots,\{W_{Q_{2}}^{c,2},W_{Q_{2}}^{p,2}\}\big\}$. The common sub-messages $\{W_{1}^{c,k},\ldots,W_{Q_{k}}^{c,k}\}$, $k\in\{1,2\}$, are combined and jointly encoded into a common stream vector $\mathbf{s}_{c}$ of size $Q_{c}$, whereas the private sub-messages $\{W_{1}^{p,1},\ldots,W_{Q_{1}}^{p,1}\}$ and $\{W_{1}^{p,2},\ldots,W_{Q_{2}}^{p,2}\}$ are respectively encoded into streams $\mathbf{s}_{1}$ of size $Q_{1}$ and $\mathbf{s}_{2}$ of size $Q_{2}$. For instance, in our simulations, we consider $M=4$, $Q=2$, $Q_{1}=Q_{2}=2$ and $Q_{c}=2$. The transmit signal is formed by precoding and superposing the encoded data stream vectors $\mathbf{s}_{c}$, $\mathbf{s}_{1}$ and $\mathbf{s}_{2}$, and is expressed as $\mathbf{x}=\mathbf{P}_{c}\mathbf{s}_{c}+\mathbf{P}_{1}\mathbf{s}_{1}+\mathbf{P}_{2}\mathbf{s}_{2}$. At the user side, both users decode their intended streams using SIC. At user-$1$, $\mathbf{s}_{c}$ is first decoded by treating $\mathbf{s}_{1}$ and $\mathbf{s}_{2}$ as noise. Assuming $\mathbf{s}_{c}$ is successfully decoded, user-$1$ then removes it from the received signal and decodes $\mathbf{s}_{1}$ by treating $\mathbf{s}_{2}$ as noise. Similarly, user-$2$ decodes its intended common and private streams sequentially.

The two-user RS case reduces to the conventional MU–MIMO strategy by simply turning off the common streams, i.e., allocating no power to the common stream vector. Considering the other extreme of fully decoding the interference, we look at the MIMO NOMA strategy. By encoding the entire $\mathbf{w}_{2}$ into $\mathbf{s}_{c}$, allocating no power to $\mathbf{s}_{2}$ and encoding $\mathbf{w}_{1}$ into $\mathbf{s}_{1}$, the two-user RS reduces to MIMO NOMA with decoding order $1\rightarrow 2$, where the message of user-$2$ is decoded by both users and the message of user-$1$ is decoded by user-$1$ only. Table I illustrates the mapping of messages to streams.

Table I: Messages mapped to streams in different schemes.

| Scheme | $\mathbf{s}_{1}$ | $\mathbf{s}_{2}$ | $\mathbf{s}_{c}$ |
|---|---|---|---|
| RS | $\mathbf{w}_{p,1}$ | $\mathbf{w}_{p,2}$ | $\mathbf{w}_{c,1},\mathbf{w}_{c,2}$ |
| MU–MIMO | $\mathbf{w}_{1}$ | $\mathbf{w}_{2}$ | - |
| MIMO NOMA | $\mathbf{w}_{1}$ | - | $\mathbf{w}_{2}$ |
| Decoded by | its corresponding user (treated as noise by the other user) | its corresponding user (treated as noise by the other user) | both users |

III Problem Formulation

In this section, we formulate the precoder optimization problem for the proposed system model. A naive approach to precoder design is to assume the channel estimate $\widehat{\mathbf{H}}$ to be perfect and optimize the instantaneous precoder $\mathbf{P}$ by maximizing the instantaneous WSR subject to the instantaneous power constraint $\mathrm{tr}(\mathbf{P}\mathbf{P}^{H})\leq P_{t}$. However, this approach might not be able to cope with MU interference and may lead the BS to transmit at undecodable rates [20]. A more robust approach is to maximize the Weighted Ergodic Sum-Rate (WESR), which captures the long-term WSR performance of all users, to ensure reliable transmission. We first define the WESR for RS as $\bar{R}_{RS}(\boldsymbol{\mu})=\sum_{k=1}^{K}\mu_{k}\bar{R}_{k,tot}$, where $\mu_{k}$ is the weight allocated to user-$k$ and $\boldsymbol{\mu}=\{\mu_{1},\ldots,\mu_{K}\}$. Next, we consider the Weighted Average SR (WASR) optimization approach to maximize the WESR at the BS with imperfect instantaneous CSIT. Although the BS cannot predict the instantaneous rates, it can instead access the Average Rate (AR).

Definition 1.

For a given channel state estimate $\widehat{\mathbf{H}}$ and precoder $\mathbf{P}(\widehat{\mathbf{H}})$, the AR is defined as the expected performance over the CSIT error distribution. The ARs for the common and private stream vectors at user-$k$ are given by

$$\widehat{R}_{z,k}(\widehat{\mathbf{H}})=\mathbb{E}_{\mathbf{H}|\widehat{\mathbf{H}}}\big\{R_{z,k}(\mathbf{H},\widehat{\mathbf{H}})\,|\,\widehat{\mathbf{H}}\big\},\quad z\in\{c,p\}. \tag{9}$$

AR should not be confused with ER. While ER captures the long-term performance over all channel states, AR measures the short-term expected performance over the CSIT error distribution for one channel estimate. By using the law of total expectation and the definition of AR, the relation between the ER and AR of the common and private stream vectors at user-$k$ is established as $\bar{R}_{z,k}=\mathbb{E}_{\widehat{\mathbf{H}}}\big\{\widehat{R}_{z,k}(\widehat{\mathbf{H}})\big\},\,z\in\{c,p\}$ [20]. The share of the AR allocated to user-$k$ corresponding to the common stream vector is defined as $\widehat{C}_{k}$ such that $\sum_{k=1}^{K}\widehat{C}_{k}=\widehat{R}_{c}$, and $\widehat{R}_{c}$ must not be greater than $\min_{k\in\mathcal{K}}\{\widehat{R}_{c,k}\}$. For calculating the AR of the common stream vector $\widehat{R}_{c}$, we write

$$\min_{k\in\mathcal{K}}\big\{\mathbb{E}_{\widehat{\mathbf{H}}}\{\widehat{R}_{c,k}\}\big\}\geq\mathbb{E}_{\widehat{\mathbf{H}}}\big\{\min_{k\in\mathcal{K}}\{\widehat{R}_{c,k}\}\big\}, \tag{10}$$

since moving the minimization inside the expectation, as on the right-hand side of (10), cannot increase the value relative to the left-hand side. Consequently, following the Law of Large Numbers (LLN), we approximate the ER of each stream by averaging its AR over all channel states and thereby remove dependencies among channel states. This allows us to decompose the WESR maximization problem with short-term power constraints (for tractability, we replace the long-term power constraint $\mathbb{E}_{\mathbf{H}}\{tr(\mathbf{P}\mathbf{P}^{H})\}\leq P_{t}$ with short-term power constraints [20]) into a WASR maximization problem for each $\widehat{\mathbf{H}}$, defined as

$$\widehat{R}_{RS}(P_{t},\boldsymbol{\mu})=\max_{\mathbf{P},\widehat{\mathbf{c}}}\;\sum_{k=1}^{K}\mu_{k}\widehat{R}_{k,tot} \tag{11a}$$
$$\text{s.t.}\quad\widehat{C}_{1}+\widehat{C}_{2}+\ldots+\widehat{C}_{K}\leq\widehat{R}_{c} \tag{11b}$$
$$tr(\mathbf{P}\mathbf{P}^{H})\leq P_{t} \tag{11c}$$
$$\widehat{\mathbf{c}}\geq\mathbf{0}, \tag{11d}$$

where $\widehat{\mathbf{c}}=[\widehat{C}_{1},\widehat{C}_{2},\ldots,\widehat{C}_{K}]$ is the average common rate vector and $\widehat{R}_{k,tot}=\widehat{R}_{p,k}+\widehat{C}_{k}$. Having formulated the WASR problem for the RS scheme, we next examine the effect of the instantaneous CSIT quality on its long-term performance and how the RS scheme fares against conventional multiple access schemes. We do so through a sum-DoF analysis.
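The inequality in (10) can be checked numerically on sample averages. The following is a minimal sketch, assuming a hypothetical array of per-estimate common-stream ARs (rows index channel estimates, columns index users); the function name and the toy data are illustrative and not taken from the paper.

```python
import numpy as np

def min_of_means_vs_mean_of_min(rates):
    """Compare both sides of (10) on sample averages.

    rates: array of shape (num_estimates, K); entry [t, k] is a hypothetical
    AR of the common stream at user k for the t-th channel estimate.
    """
    rates = np.asarray(rates, dtype=float)
    min_of_means = rates.mean(axis=0).min()   # left-hand side of (10)
    mean_of_mins = rates.min(axis=1).mean()   # right-hand side of (10)
    return min_of_means, mean_of_mins

# Toy data only: exponential samples stand in for the ARs.
rng = np.random.default_rng(0)
toy_rates = rng.exponential(scale=2.0, size=(500, 3))
lhs, rhs = min_of_means_vs_mean_of_min(toy_rates)
```

For any data set the first returned value is at least as large as the second, which is why the per-estimate minimum is a conservative choice for the common rate.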

III-A Common message and DoF Analysis

DoF is the total number of interference-free streams that can be transmitted simultaneously in a single channel use [15]. The sum-DoF for RS is defined as

$$d_{s}^{\mathrm{RS}}=\lim_{P_{t}\rightarrow\infty}\frac{\mathbb{E}_{\widehat{\mathbf{H}}}\{\widehat{R}_{RS}(P_{t})\}}{\log_{2}(P_{t})}, \tag{12}$$

where $\widehat{R}_{RS}(P_{t})$ is the Average SR (ASR) and is equal to $\widehat{R}_{RS}(P_{t},\boldsymbol{\mu})$ for equal user weights, i.e., $\mu_{k}=1,\,\forall k\in\mathcal{K}$. We aim to establish the sum-DoF achieved by the RS scheme for symmetric MIMO BC transmission with imperfect CSIT under the assumption that the channel $\widehat{\mathbf{H}}_{k}$ is full rank and the CSIT error matrix $\widetilde{\mathbf{H}}_{k}$ is isotropically distributed. Note that these assumptions are not necessary for the optimization. With $Q$ receive antennas at each user, $Q_{p}$ the total number of private streams transmitted and $Q_{c}$ the number of common streams, the sum-DoF achieved by the RS precoding scheme is

$$d_{s}^{\mathrm{RS}}:\begin{cases}Q_{c}(1-\alpha)+Q_{p}\alpha, & M\in\{2Q,3Q,\ldots,KQ\}\\ M, & M\leq Q.\end{cases} \tag{13}$$

For comparison, we consider the conventional MU–MIMO and MIMO NOMA schemes with sum-DoF expressed as

$$d_{s}^{\mathrm{MU\text{-}MIMO}}=\max\big(\min(M,Q),\,Q_{p}\alpha\big), \tag{14}$$
$$d_{s}^{\mathrm{NOMA}}=\min(M,Q). \tag{15}$$

The procedure to obtain the sum-DoF achieved by all schemes is relegated to Appendix A. Following the principle of SC-SIC, the ASR achieved by MIMO NOMA (SC–SIC) is limited by the decodability of all users' messages decoded in the last place. This restricts the sum-DoF to a maximum value of $\min(M,Q)$. For MU–MIMO, the sum-DoF can attain a maximum value of $Q_{p}$ when $\alpha=1$. As $\alpha$ falls below $1$, the detrimental effects of interference lead to a decrease in its sum-DoF. Once $\alpha$ decreases further, the CSIT quality deteriorates to a point where it can no longer support MU transmission and transmitting to a single user yields a better sum-DoF. In the RS scheme, however, the presence of common messages allows the transmitter to adjust the power allocated to the private stream vectors such that the interference is always at the level of noise. Thus, the DoF of the private stream vectors is maintained at $Q_{p}\alpha$ by scaling down the power allocated to the private stream vectors to $\mathcal{O}(P_{t}^{\alpha})$. The remaining power, which scales as $\mathcal{O}(P_{t})$, is allocated to the common stream vector. The DoF gain achieved by the common streams is $Q_{c}(1-\alpha)$. For $\alpha\in(0,1)$ and $Q_{c}=\min(M,Q)$, the sum-DoF of RS is strictly greater than the sum-DoF of both MU–MIMO and MIMO NOMA.
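The sum-DoF expressions (13)-(15) can be evaluated directly. The following is a minimal sketch; the function names and the example configuration ($M=4$, $Q=2$, $K=2$, $Q_c=2$, $Q_p=4$, $\alpha=0.5$) are illustrative assumptions rather than values taken from the paper.

```python
def sum_dof_rs(M, Q, K, Qc, Qp, alpha):
    """Sum-DoF of RS following the two cases of (13)."""
    if M <= Q:
        return float(M)
    if M % Q == 0 and 2 <= M // Q <= K:
        return Qc * (1.0 - alpha) + Qp * alpha
    raise ValueError("(13) only covers M <= Q or M in {2Q, ..., KQ}")

def sum_dof_mu_mimo(M, Q, Qp, alpha):
    """Sum-DoF of conventional MU-MIMO, eq. (14)."""
    return float(max(min(M, Q), Qp * alpha))

def sum_dof_noma(M, Q):
    """Sum-DoF of MIMO NOMA, eq. (15)."""
    return float(min(M, Q))

# Hypothetical configuration chosen only for illustration.
M, Q, K, Qc, Qp, alpha = 4, 2, 2, 2, 4, 0.5
d_rs = sum_dof_rs(M, Q, K, Qc, Qp, alpha)
d_mu = sum_dof_mu_mimo(M, Q, Qp, alpha)
d_noma = sum_dof_noma(M, Q)
```

For this configuration, with $Q_c=\min(M,Q)$ and $\alpha\in(0,1)$, the RS value exceeds both baselines, in line with the statement above.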

Though optimization does not improve the achievable sum-DoF, it does play a significant role in improving the rate performance. As $\widehat{R}_{\mathrm{MU\text{-}MIMO}}(P_{t})$ can be obtained by switching off the common streams, the inequality $\widehat{R}_{\mathrm{RS}}(P_{t})\geq\widehat{R}_{\mathrm{MU\text{-}MIMO}}(P_{t})$ is guaranteed for the entire range of SNR. The results in Section VI validate the theoretical assertions.

IV Optimization Framework

The optimization problem (11) and its sub-problems are non-deterministic in nature and are therefore very difficult to solve in their current form. We adopt a three-step approach to solve the optimization problem (11). First, we use the Sample Average Approximation (SAA) method to obtain a deterministic approximation of the problem. Second, we transform the WASR problem into a block-wise convex Weighted Average MMSE (WAMMSE) problem. Finally, we use vectorization to reduce the matrix variables to their vectorized forms, thereby making the WAMMSE problem tractable. Using Alternating Optimization (AO), we obtain the precoders and, consequently, the optimized rate for a given weight vector $\boldsymbol{\mu}$.

IV-A Sample Average Approximation

We first consider a set of $N$ i.i.d. channel samples indexed by $\mathcal{N}=\{1,2,\ldots,N\}$ and drawn from a distribution with density $f_{\mathbf{H}|\widehat{\mathbf{H}}}(\mathbf{H}|\widehat{\mathbf{H}})$. Therefore, for a given channel estimate $\widehat{\mathbf{H}}$ we have $N$ channel samples denoted as $\mathbb{H}^{(N)}\triangleq\big\{\mathbf{H}^{(n)}=\widehat{\mathbf{H}}+\widetilde{\mathbf{H}}^{(n)}\mid\widehat{\mathbf{H}},n\in\mathcal{N}\big\}$. Using the channel realizations and the Sample Average Functions (SAFs) defined as $\widehat{R}_{z,k}^{(N)}\triangleq\frac{1}{N}\sum_{n=1}^{N}R_{z,k}^{(n)},\,z\in\{c,p\}$, we approximate the average rates. Here, $R_{z,k}^{(n)}\triangleq R_{z,k}(\mathbf{H}^{(n)}),\,z\in\{c,p\}$ are the common and private rates associated with the $n$-th realization. The SAA of problem (11) is

$$\widehat{R}_{RS}^{(N)}(P_{t},\boldsymbol{\mu})=\max_{\mathbf{P},\widehat{\mathbf{c}}}\;\sum_{k=1}^{K}\mu_{k}\widehat{R}_{k,tot}^{(N)} \tag{16a}$$
$$\text{s.t.}\quad\widehat{C}_{1}+\widehat{C}_{2}+\ldots+\widehat{C}_{K}\leq\widehat{R}_{c}^{(N)} \tag{16b}$$
$$tr(\mathbf{P}\mathbf{P}^{H})\leq P_{t} \tag{16c}$$
$$\widehat{\mathbf{c}}\geq\mathbf{0}, \tag{16d}$$

where $\widehat{R}_{k,tot}^{(N)}=\widehat{R}_{p,k}^{(N)}+\widehat{C}_{k}$ and $\widehat{R}_{c}^{(N)}\triangleq\min_{j}\{\widehat{R}_{c,j}^{(N)}\}_{j=1}^{K}$. The rates obtained here are bounded [20] and therefore, by applying the LLN, it can be inferred that

$$\lim_{N\rightarrow\infty}\widehat{R}_{z,k}^{(N)}(\mathbf{P})=\widehat{R}_{z,k}(\mathbf{P}),\quad\forall\,\mathbf{P}\in\mathbb{P},\;z\in\{c,p\}. \tag{17}$$

The set $\mathbb{P}\triangleq\{\mathbf{P}\mid tr(\mathbf{P}\mathbf{P}^{H})\leq P_{t}\}$ is the feasible set of precoders, on which the rate functions are bounded, continuous and differentiable in $\mathbf{P}$, thereby making the convergence in (17) uniform in $\mathbf{P}$. The ARs are also continuous and differentiable [20] and therefore, using (17), we obtain

$$\lim_{N\rightarrow\infty}\widehat{R}_{k,tot}^{(N)}=\widehat{R}_{k,tot},\quad\forall\,\mathbf{P}\in\mathbb{P}. \tag{18}$$

Based on (17) and (18), as $N\rightarrow\infty$, the optimum solutions of the SAA problem (16) converge to the solution of the stochastic problem (11) [50, 20].
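The sample average function can be sketched in a few lines. The example below is a minimal illustration only: it assumes a generic single-stream-vector log-det rate as a stand-in for the paper's rate expressions (which are not reproduced in this section), unit-variance error samples scaled by an arbitrary factor, and hypothetical names `rate` and `sample_average_rate`.

```python
import numpy as np

def rate(H, P, sigma2=1.0):
    """Placeholder rate log2 det(I + H^H P P^H H / sigma2).

    H is an M x Q channel and P an M x Qs precoder. This is an assumed
    stand-in for R_{z,k}, not the paper's exact expression.
    """
    A = H.conj().T @ P                                   # Q x Qs effective channel
    _, logdet = np.linalg.slogdet(np.eye(A.shape[0]) + (A @ A.conj().T) / sigma2)
    return logdet / np.log(2.0)

def sample_average_rate(H_hat, err_samples, P, sigma2=1.0):
    """SAF: average the rate over the samples H^(n) = H_hat + H_tilde^(n)."""
    return float(np.mean([rate(H_hat + E, P, sigma2) for E in err_samples]))

rng = np.random.default_rng(1)
M, Q, Qs, N = 4, 2, 2, 200
H_hat = (rng.standard_normal((M, Q)) + 1j * rng.standard_normal((M, Q))) / np.sqrt(2)
errs = 0.1 * (rng.standard_normal((N, M, Q)) + 1j * rng.standard_normal((N, M, Q)))
P = np.eye(M, Qs, dtype=complex)                         # arbitrary fixed precoder
saa_rate = sample_average_rate(H_hat, errs, P)
```

With all error samples set to zero the SAF collapses to the rate evaluated at the estimate itself, which is a convenient sanity check of the sampling code.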

IV-B WASR → WAMMSE

In this subsection, we aim at solving the sample average approximated problem (16) by using the methods adopted in [4] to transform problem (16) into an equivalent WAMMSE form.

First, we define the Augmented Weighted Mean Square Error (AWMSE) for the common and private stream vectors as

$$\xi_{z,k}(\mathbf{H},\widehat{\mathbf{H}})=tr(\mathbf{U}_{z,k}\mathbf{E}_{z,k})-\log\det(\mathbf{U}_{z,k}),\quad z\in\{c,p\}, \tag{19}$$

where $\mathbf{U}_{z,k},\,z\in\{c,p\}$ are the instantaneous weights introduced for the common and private MSE matrices of user-$k$. Next, we establish the Rate-WMMSE relationship by optimizing the AWMSEs w.r.t. the equalizers (filters) and weights. By solving $\frac{\partial\xi_{z,k}(\mathbf{H},\widehat{\mathbf{H}})}{\partial\mathbf{G}_{z,k}}=0$, the optimum equalizers are obtained as $\mathbf{G}_{z,k}^{*}=\mathbf{G}_{z,k}^{\mathrm{MMSE}}$, $z\in\{c,p\}$. Substituting the optimum equalizers in (19), we get

$$\xi_{z,k}\big(\mathbf{G}_{z,k}^{\mathrm{MMSE}}\big)=tr(\mathbf{U}_{z,k}\mathbf{E}_{z,k}^{\mathrm{MMSE}})-\log\det(\mathbf{U}_{z,k}),\quad z\in\{c,p\}. \tag{20}$$

By solving $\frac{\partial\xi_{z,k}(\mathbf{G}_{z,k}^{\mathrm{MMSE}})}{\partial\mathbf{U}_{z,k}}=0$, the optimum MMSE weights are obtained as $\mathbf{U}_{z,k}^{*}=\mathbf{U}_{z,k}^{\mathrm{MMSE}}\triangleq(\mathbf{E}_{z,k}^{\mathrm{MMSE}})^{-1}$, $z\in\{c,p\}$. Substituting the obtained optimum weights into (20), the instantaneous Rate-WMMSE relationship is established as

$$\xi_{c,k}(\mathbf{H},\widehat{\mathbf{H}})\triangleq\min_{\mathbf{G}_{c,k},\mathbf{U}_{c,k}}\xi_{c,k}=Q_{c}-R_{c,k}(\mathbf{H},\widehat{\mathbf{H}}),\qquad\xi_{p,k}(\mathbf{H},\widehat{\mathbf{H}})\triangleq\min_{\mathbf{G}_{p,k},\mathbf{U}_{p,k}}\xi_{p,k}=Q_{k}-R_{p,k}(\mathbf{H},\widehat{\mathbf{H}}). \tag{21}$$
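The relationship in (21) can be verified numerically for a generic linear model. The sketch below assumes a received signal $\mathbf{y}=\mathbf{A}\mathbf{s}+\mathbf{n}$ with unit-power symbols, an effective channel $\mathbf{A}=\mathbf{H}^{H}\mathbf{P}$ and a hypothetical interference-plus-noise covariance `R_n`; it also assumes natural logarithms, so the rate is in nats. All variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)

def crandn(*shape):
    """Circularly symmetric complex Gaussian entries with unit variance."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

M, Q, Qs = 4, 3, 2                        # Tx antennas, Rx antennas, streams
H, P, B = crandn(M, Q), crandn(M, Qs), crandn(Q, Q)
R_n = np.eye(Q) + B @ B.conj().T          # assumed interference-plus-noise covariance

A = H.conj().T @ P                        # Q x Qs effective channel
R_y = A @ A.conj().T + R_n                # covariance of the received signal
G = A.conj().T @ np.linalg.inv(R_y)       # MMSE equalizer (Qs x Q)

eye_s = np.eye(Qs)
# MSE matrix E{(G y - s)(G y - s)^H} evaluated at the MMSE equalizer
E = (G @ A - eye_s) @ (G @ A - eye_s).conj().T + G @ R_n @ G.conj().T
U = np.linalg.inv(E)                      # optimum weight U = E^{-1}

xi = np.trace(U @ E).real - np.linalg.slogdet(U)[1]       # AWMSE as in (19)
rate_nats = np.linalg.slogdet(eye_s + A.conj().T @ np.linalg.inv(R_n) @ A)[1]
```

Under these assumptions `xi` equals `Qs - rate_nats`, since $tr(\mathbf{U}\mathbf{E})$ reduces to the number of streams and $\log\det(\mathbf{U})$ to the rate.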

Based on the principle of SAA, the AR-WAMMSE relationship is derived by taking the expectation over the conditional distribution of the channel $\mathbf{H}$ for a given channel estimate $\widehat{\mathbf{H}}$ and is written as

$$\widehat{\xi}_{c,k}\triangleq\mathbb{E}_{\mathbf{H}|\widehat{\mathbf{H}}}\big\{\min_{\mathbf{G}_{c,k},\mathbf{U}_{c,k}}\xi_{c,k}\,|\,\widehat{\mathbf{H}}\big\}=Q_{c}-\widehat{R}_{c,k},\qquad\widehat{\xi}_{p,k}\triangleq\mathbb{E}_{\mathbf{H}|\widehat{\mathbf{H}}}\big\{\min_{\mathbf{G}_{p,k},\mathbf{U}_{p,k}}\xi_{p,k}\,|\,\widehat{\mathbf{H}}\big\}=Q_{k}-\widehat{R}_{p,k}. \tag{22}$$

Next, we use the SAFs to obtain the deterministic equivalent relations of (22). Taking the $N$ i.i.d. channel samples, the average AWMSEs are $\widehat{\xi}_{z,k}^{(N)}\triangleq\frac{1}{N}\sum_{n=1}^{N}\xi_{z,k}^{(n)}$, $z\in\{c,p\}$, where $\xi_{z,k}^{(n)}$, $\mathbf{G}_{z,k}^{(n)}$ and $\mathbf{U}_{z,k}^{(n)}$, $z\in\{c,p\}$, are all associated with the $n$-th realization in $\mathbb{H}^{(N)}$. For ease of notation, we use $\mathbf{G}$ to represent the set of equalizers for the common and private stream vectors, i.e., $\mathbf{G}\triangleq\{\mathbf{G}_{z,k}\mid\forall\,k\in\mathcal{K}\}$, where $\mathbf{G}_{z,k}\triangleq\{\mathbf{G}_{z,k}^{(n)}\mid\forall\,n\in\mathcal{N},\,z\in\{c,p\}\}$. Following the same method, we obtain the set of weights for the common and private streams, denoted as $\mathbf{U}$.

Following the LLN as in (17) and the approach used to obtain (21), the AR-WAMMSE relationship in (22) is written as

$$\big(\widehat{\xi}_{c,k}^{\,\mathrm{MMSE}}\big)^{(N)}\triangleq\min_{\mathbf{G}_{c,k},\mathbf{U}_{c,k}}\widehat{\xi}_{c,k}^{(N)}=Q_{c}-\widehat{R}_{c,k}^{(N)},\qquad\big(\widehat{\xi}_{p,k}^{\,\mathrm{MMSE}}\big)^{(N)}\triangleq\min_{\mathbf{G}_{p,k},\mathbf{U}_{p,k}}\widehat{\xi}_{p,k}^{(N)}=Q_{k}-\widehat{R}_{p,k}^{(N)}. \tag{23}$$

The sets of optimum MMSE equalizers and weights associated with (23) are defined as $\mathbf{G}^{\mathrm{MMSE}}\triangleq\big\{\big(\mathbf{G}_{z,k}^{\mathrm{MMSE}}\big)^{(n)}\mid z\in\{c,p\},\forall\,n\in\mathcal{N},\,\forall k\in\mathcal{K}\big\}$ and $\mathbf{U}^{\mathrm{MMSE}}\triangleq\big\{\big(\mathbf{U}_{z,k}^{\mathrm{MMSE}}\big)^{(n)}\mid z\in\{c,p\},\forall\,n\in\mathcal{N},\,\forall k\in\mathcal{K}\big\}$, respectively. Motivated by the AR-WAMMSE relationship in (23), the deterministic WAMMSE optimization problem is formulated as

$$\min_{\mathbf{P},\widehat{\mathbf{x}},\mathbf{U},\mathbf{G}}\;\sum_{k=1}^{K}\mu_{k}\widehat{\xi}_{k,tot}^{(N)} \tag{24a}$$
$$\text{s.t.}\quad\widehat{X}_{1}+\widehat{X}_{2}+\ldots+\widehat{X}_{K}+Q_{c}\geq\widehat{\xi}_{c}^{(N)} \tag{24b}$$
$$tr(\mathbf{P}\mathbf{P}^{H})\leq P_{t} \tag{24c}$$
$$\widehat{\mathbf{x}}\leq\mathbf{0}, \tag{24d}$$

where $\widehat{\xi}_{c}^{(N)}=\max\{\widehat{\xi}_{c,k}^{(N)}\}_{k=1}^{K}$, $\widehat{\xi}_{k,tot}^{(N)}=\widehat{\xi}_{p,k}^{(N)}+\widehat{X}_{k}$ and $\widehat{\mathbf{x}}=\{\widehat{X}_{1},\widehat{X}_{2},\ldots,\widehat{X}_{K}\}=-\widehat{\mathbf{c}}$. Problem (24) is optimized w.r.t. $(\mathbf{U},\mathbf{G})$ by minimizing the individual AWMSEs shown in (23), as the AWMSEs are decoupled in their corresponding weights and equalizers. This can be validated by showing that, for a given precoder $\mathbf{P}$, the KKT optimality conditions of (24) are satisfied by the MMSE solution $\{\mathbf{U}^{\mathrm{MMSE}},\mathbf{G}^{\mathrm{MMSE}}\}$. Consequently, it can be shown that for the MMSE solution, (24) boils down to (16), and the Rate-WMMSE relationship is not limited to the global optimum solution but extends to the entire set of stationary points. Any point $\{\mathbf{U}^{*},\mathbf{G}^{*},\mathbf{P}^{*},\widehat{\mathbf{x}}^{*}\}$ satisfying the KKT optimality conditions of (24) also satisfies the KKT optimality conditions of (16), with $\widehat{\mathbf{c}}=-\widehat{\mathbf{x}}$ [20]. Therefore, as $N\rightarrow\infty$, solving (24) yields a solution for (16), which in turn converges to a solution of the WASR problem in (11).

IV-C Vectorization and Alternate Optimization

Problem (24) is non-convex in the joint optimization of the variables $\mathbf{G}$, $\mathbf{U}$, $\widehat{\mathbf{x}}$ and $\mathbf{P}$, but it is convex in each block of variables when the others are fixed. Therefore, we utilize the AO algorithm with two steps: 1) updating the equalizers $\mathbf{G}$ and weights $\mathbf{U}$ with the precoders $\mathbf{P}$ fixed to those of the previous iteration and 2) updating the precoders $\mathbf{P}$ and the message split $\widehat{\mathbf{x}}$ by solving the optimization problem for a given $\mathbf{G}$ and $\mathbf{U}$. Unlike the MISO case in [10], the optimization in (24) involves matrix variables. Furthermore, the presence of the $\mathbf{R}_{z,k}^{-1},\,z\in\{c,p\}$ matrices in the AWMSE expressions makes the optimization intractable. Bearing that in mind, we first use vectorization to reduce the objective function to a tractable form. Let us consider the AWMSE expression (19) for the common stream. In step 2 of the AO, the weights are fixed. However, the calculation of the term $tr(\mathbf{U}_{c,k}\mathbf{E}_{c,k})$ introduces difficulties for the aforementioned reasons, and thus we simplify the expression and, consequently, the entire objective function into a solvable form. Expanding $tr(\mathbf{U}_{c,k}\mathbf{E}_{c,k})$, it follows that

$$\begin{split}tr(\mathbf{U}_{c,k}\mathbf{E}_{c,k})&=tr\Big(\mathbf{U}_{c,k}\mathbb{E}\big\{(\mathbf{G}_{c,k}\mathbf{y}_{k}-\mathbf{s}_{c})(\mathbf{G}_{c,k}\mathbf{y}_{k}-\mathbf{s}_{c})^{H}\big\}\Big)=tr\Big(\mathbf{U}_{c,k}\big\{\mathbf{G}_{c,k}\mathbf{H}_{k}^{H}\mathbf{P}_{c}\mathbf{P}_{c}^{H}\mathbf{H}_{k}\mathbf{G}_{c,k}^{H}\\&+\mathbf{G}_{c,k}\mathbf{H}_{k}^{H}\Big(\sum_{i=1}^{K}\mathbf{P}_{i}\mathbf{P}_{i}^{H}\Big)\mathbf{H}_{k}\mathbf{G}_{c,k}^{H}-\mathbf{G}_{c,k}\mathbf{H}_{k}^{H}\mathbf{P}_{c}-\mathbf{P}_{c}^{H}\mathbf{H}_{k}\mathbf{G}_{c,k}^{H}+\sigma_{n}^{2}\mathbf{G}_{c,k}\mathbf{G}_{c,k}^{H}+\mathbf{I}\big\}\Big).\end{split} \tag{25}$$

Simplifying the expanded expression in (25) by applying the matrix identities $tr(\mathbf{AB})=tr(\mathbf{BA})$ and $tr(\mathbf{ABC})=vec(\mathbf{A}^{H})^{H}(\mathbf{I}\otimes\mathbf{B})vec(\mathbf{C})$, we get

$$tr(\mathbf{U}_{c,k}\mathbf{E}_{c,k})-\log\det(\mathbf{U}_{c,k})=\mathbf{p}_{c}^{H}\mathbf{A}_{c,k}^{\prime}\mathbf{p}_{c}+\sum_{i=1}^{K}\mathbf{p}_{i}^{H}\mathbf{A}_{c,k}\mathbf{p}_{i}-\mathbf{a}_{c,k}^{H}\mathbf{p}_{c}-\mathbf{p}_{c}^{H}\mathbf{a}_{c,k}+\Phi_{c,k}. \tag{26}$$

Similarly, the resultant expression for the private stream vector of user-$k$ is

$$tr(\mathbf{U}_{p,k}\mathbf{E}_{p,k})-\log\det(\mathbf{U}_{p,k})=\mathbf{p}_{k}^{H}\mathbf{A}_{p,k}\mathbf{p}_{k}+\sum_{i\neq k}^{K}\mathbf{p}_{i}^{H}\mathbf{A}_{p,k}\mathbf{p}_{i}-\mathbf{a}_{p,k}^{H}\mathbf{p}_{k}-\mathbf{p}_{k}^{H}\mathbf{a}_{p,k}+\Phi_{p,k}, \tag{27}$$

where $\mathbf{p}_{c}=vec(\mathbf{P}_{c})$, $\mathbf{p}_{k}=vec(\mathbf{P}_{k})$, $\mathbf{A}_{c,k}^{\prime}=\mathbf{I}_{Q_{c}}\otimes\mathbf{H}_{k}\mathbf{G}_{c,k}^{H}\mathbf{U}_{c,k}\mathbf{G}_{c,k}\mathbf{H}_{k}^{H}$, $\mathbf{A}_{z,k}=\mathbf{I}_{Q_{k}}\otimes\mathbf{H}_{k}\mathbf{G}_{z,k}^{H}\mathbf{U}_{z,k}\mathbf{G}_{z,k}\mathbf{H}_{k}^{H}$, $\mathbf{a}_{z,k}=vec(\mathbf{H}_{k}\mathbf{G}_{z,k}^{H}\mathbf{U}_{z,k})$ (the factors are ordered so that the matrix dimensions conform and $\mathbf{a}_{c,k}^{H}\mathbf{p}_{c}=tr(\mathbf{U}_{c,k}\mathbf{G}_{c,k}\mathbf{H}_{k}^{H}\mathbf{P}_{c})$) and $\Phi_{z,k}=\sigma_{n}^{2}tr(\mathbf{U}_{z,k}\mathbf{G}_{z,k}\mathbf{G}_{z,k}^{H})+tr(\mathbf{U}_{z,k})-\log\det(\mathbf{U}_{z,k})$, $z\in\{c,p\}$. Next, we calculate the SAFs of the AWMSEs following (26) and (27).
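The vectorization step can be checked numerically. The sketch below is an illustration under stated assumptions: `vec` stacks columns, $\mathbf{H}_k$ is taken as $M\times Q$ so that $\mathbf{H}_k^{H}\mathbf{P}_c$ is $Q\times Q_c$, the weight is an arbitrary Hermitian positive definite matrix, and the linear term uses the ordering $vec(\mathbf{H}_k\mathbf{G}^{H}\mathbf{U})$ so that the dimensions conform. All names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(4)

def crandn(*shape):
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

def vec(X):
    """Column-stacking vectorization."""
    return X.reshape(-1, order="F")

M, Q, Qc = 4, 2, 2
H, G, Pc = crandn(M, Q), crandn(Qc, Q), crandn(M, Qc)
W = crandn(Qc, Qc)
U = W @ W.conj().T + np.eye(Qc)                  # Hermitian positive definite weight

p_c = vec(Pc)
B = H @ G.conj().T @ U @ G @ H.conj().T          # M x M
A_prime = np.kron(np.eye(Qc), B)                 # I_{Qc} (x) B
a = vec(H @ G.conj().T @ U)                      # linear-term vector

# Quadratic term of (25) in trace form and in vectorized form
quad_trace = np.trace(U @ G @ H.conj().T @ Pc @ Pc.conj().T @ H @ G.conj().T)
quad_vec = p_c.conj() @ A_prime @ p_c
# Linear term of (25) in trace form and in vectorized form
lin_trace = np.trace(U @ G @ H.conj().T @ Pc)
lin_vec = a.conj() @ p_c
```

Both pairs agree to numerical precision, which is what allows the matrix-valued objective to be written as a quadratic form in the stacked precoder vector.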

IV-C1 STEP 1

Let us denote the precoders from the previous iteration as $\mathbf{P}^{[i-1]}$. For each channel realization, the equalizers $\big(\mathbf{G}(\mathbf{P}^{[i-1]})\big)^{(n)}$ and weights $\big(\mathbf{U}(\mathbf{P}^{[i-1]})\big)^{(n)}$ are calculated for both the common and private stream vectors. The precoders are fixed for each i.i.d. realization. After obtaining the equalizers and weights, we consider the SAFs of the following entities: $\widehat{\mathbf{A}}_{c,k}^{\prime(N)}=\frac{1}{N}\sum_{n=1}^{N}(\mathbf{A}_{c,k}^{\prime})^{(n)}$, $\widehat{\mathbf{A}}_{z,k}^{(N)}=\frac{1}{N}\sum_{n=1}^{N}\mathbf{A}_{z,k}^{(n)}$, $\widehat{\mathbf{a}}_{z,k}^{(N)}=\frac{1}{N}\sum_{n=1}^{N}\mathbf{a}_{z,k}^{(n)}$, $\widehat{\Phi}_{z,k}^{(N)}=\frac{1}{N}\sum_{n=1}^{N}\Phi_{z,k}^{(n)}$, $z\in\{c,p\}$.

IV-C2 STEP 2

The next step is to update the precoders by substituting the updated equalizers, weights and SAFs of the dependent entities into problem (24). The problem is formulated as

$$\min_{\mathbf{P},\widehat{\mathbf{x}}}\;\sum_{k=1}^{K}\mu_{k}\bigg(\widehat{X}_{k}+\mathbf{p}_{k}^{H}\widehat{\mathbf{A}}_{p,k}\mathbf{p}_{k}+\sum_{i\neq k}^{K}\mathbf{p}_{i}^{H}\widehat{\mathbf{A}}_{p,k}\mathbf{p}_{i}-\widehat{\mathbf{a}}_{p,k}^{H}\mathbf{p}_{k}-\mathbf{p}_{k}^{H}\widehat{\mathbf{a}}_{p,k}+\widehat{\Phi}_{p,k}\bigg) \tag{28a}$$
$$\text{s.t.}\quad\sum_{i=1}^{K}\widehat{X}_{i}+Q_{c}\geq\mathbf{p}_{c}^{H}\widehat{\mathbf{A}}_{c,k}^{\prime}\mathbf{p}_{c}+\sum_{i=1}^{K}\mathbf{p}_{i}^{H}\widehat{\mathbf{A}}_{c,k}\mathbf{p}_{i}-\widehat{\mathbf{a}}_{c,k}^{H}\mathbf{p}_{c}-\mathbf{p}_{c}^{H}\widehat{\mathbf{a}}_{c,k}+\widehat{\Phi}_{c,k},\quad\forall\,k\in\mathcal{K} \tag{28b}$$
$$tr(\mathbf{P}\mathbf{P}^{H})\leq P_{t} \tag{28c}$$
$$\widehat{\mathbf{x}}\leq\mathbf{0}. \tag{28d}$$

Algorithm 1 AO Algorithm
1: Initialize: $i=0$, $\mathbf{P}^{[0]}$
2: Iterate
3:    $i=i+1$, $\mathbf{G}=\mathbf{G}(\mathbf{P}^{[i-1]})$, $\mathbf{U}=\mathbf{U}(\mathbf{P}^{[i-1]})$.
4:    Compute $\widehat{\mathbf{A}}_{c,k}^{\prime}$, $\widehat{\mathbf{A}}_{c,k}$, $\widehat{\mathbf{A}}_{p,k}$, $\widehat{\mathbf{a}}_{c,k}$, $\widehat{\mathbf{a}}_{p,k}$, $\widehat{\Phi}_{c,k}$, $\widehat{\Phi}_{p,k}$, $\forall\,k\in\mathcal{K}$.
5:    Solve (28), update $\mathbf{P}^{[i]}$, $\widehat{\mathbf{x}}$.
6: until convergence

Problem (28) is a convex Quadratically Constrained Quadratic Program (QCQP) and can be solved using interior-point methods [51, 52]. Steps 1 and 2 are repeated until convergence, as specified in Algorithm 1. Proposition 1 of [20] and its proof show that, for a given $\mathbb{H}^{(N)}$, Algorithm 1 converges to a KKT solution of the sampled WASR problem (16) and, as $N\rightarrow\infty$, to a KKT solution of the WASR problem (11).
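The control flow of Algorithm 1 can be sketched on a toy problem. The example below is not the paper's solver: it replaces the equalizer/weight update and the QCQP (28) with closed-form block updates of a hypothetical two-variable objective $f(u,p)=(up-1)^2+0.1(u^2+p^2)$, purely to show the alternating structure and the stopping rule. The function and argument names are assumptions.

```python
def alternating_optimization(update_aux, update_primary, objective,
                             primary0, tol=1e-10, max_iter=500):
    """Generic two-block AO loop mirroring the structure of Algorithm 1.

    update_aux plays the role of Step 1 (equalizers and weights for fixed
    precoders); update_primary plays the role of Step 2 (precoders for
    fixed equalizers and weights).
    """
    primary = primary0
    aux = update_aux(primary)
    history = [objective(aux, primary)]
    for _ in range(max_iter):
        primary = update_primary(aux)          # Step 2 with aux fixed
        aux = update_aux(primary)              # Step 1 with primary fixed
        history.append(objective(aux, primary))
        if history[-2] - history[-1] < tol:    # objective has stopped decreasing
            break
    return aux, primary, history

# Toy objective: each block update is the exact minimizer over its own block.
def toy_objective(u, p):
    return (u * p - 1.0) ** 2 + 0.1 * (u ** 2 + p ** 2)

def toy_update_u(p):
    return p / (p ** 2 + 0.1)

def toy_update_p(u):
    return u / (u ** 2 + 0.1)

u_opt, p_opt, hist = alternating_optimization(
    toy_update_u, toy_update_p, toy_objective, primary0=2.0)
```

Because each block update is an exact minimization, the recorded objective values are non-increasing, which is the property that underlies the convergence argument for AO.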

V PHY-Layer Design for MIMO Channels

Figure 2: Proposed transceiver architecture

In addition to the theoretical foundations, it is important to demonstrate the improved performance of RS in practical setups. In this section, we propose a practical transceiver architecture for RS in MIMO settings. Fig. 2 illustrates the proposed transceiver architecture, which builds upon and generalizes the design in [53] for RS in MISO channels. The transmitter employs the finite-alphabet modulation schemes $4$-QAM, $16$-QAM, $64$-QAM and $256$-QAM, finite-length polar coding [54] for Forward Error Correction (FEC), and an Adaptive Modulation and Coding (AMC) algorithm.

The combined common messages are mapped to binary vectors $\mathbf{w}^{c}_{i}$ of length $K^{c}_{i}$, for $i\in\{1,2,\ldots,Q_{c}\}$. Similarly, the private messages are mapped to binary vectors $\mathbf{w}^{p,k}_{j}$ of length $K^{p,k}_{j}$, for $k\in\mathcal{K}$ and $j\in\{1,2,\ldots,Q_{k}\}$. We assume the split messages are independent, such that the common and private information bit vectors are independent and uniformly distributed in $\mathbb{F}_{2}^{K^{c}_{i}}$ and $\mathbb{F}_{2}^{K^{p,k}_{j}}$. The information bit vectors are independently encoded and modulated into the common and private symbol streams $\mathbf{s}^{c}_{i}$ and $\mathbf{s}^{p,k}_{j}$, each of length $S$. The AMC algorithm selects a suitable modulation-coding rate pair based on the ARs. In this work, the transmit rate calculations for the AMC algorithm are performed assuming the instantaneous channel is known at the AMC module. More details on the channel coding procedure and the AMC algorithm are given in [53]. The precoders for the common and private streams are obtained as described in Algorithm 1.

We note that the rate expressions in (2) are valid under the assumption of joint decoding of all $Q_{c}$ common streams (and all $Q_{k}$ private streams, $\forall k\in\mathcal{K}$). This precludes the use of conventional and practical point-to-point decoding methods for channel coding at the receiver side. Although there are studies on joint decoding of several types of channel codes (e.g., polar codes and Low-Density Parity Check codes), such implementations have higher complexity than point-to-point decoding methods, especially as the number of jointly decoded streams increases.

Instead of performing joint decoding, we perform interference nulling and interference cancellation among the streams at the receiver in order to benefit from low-complexity decoding methods. Such a receiver design was originally proposed in [55, 56] for V-BLAST systems. The proposed design yields a separate transmission rate for each common and private stream for Modulation and Coding Scheme (MCS) selection, as opposed to assigning a single rate value calculated by (2) to all common and private streams. The proposed receiver architecture is illustrated in Fig. 2 for a detection and decoding order based on the natural ordering of the stream indexes. We note that this ordering of the common (and private) streams in the figure and in the following explanations is adopted for simplicity; the actual stream ordering criterion used in our design is also explained in this section.

Consider the scenario where the common streams $1,2,\ldots,l-1$, for any $l<Q_{c}$, have been correctly decoded and removed from the received signal at user-$k$ to obtain the interference-cancelled received signal $\widetilde{\mathbf{y}}_{k}^{c,l}$. We define the effective channel for the $l$-th common stream as $\bar{\mathbf{h}}_{c,k}^{(l)}\triangleq\mathbf{H}_{k}^{H}\mathbf{P}_{c}(l)\in\mathbb{C}^{Q\times 1}$, where $\mathbf{P}_{c}(l)$ is the $l$-th column of the matrix $\mathbf{P}_{c}$. We can write $\widetilde{\mathbf{y}}_{k}^{c,l}$ in terms of the real and effective channels as

$$\widetilde{\mathbf{y}}_{k}^{c,l}=\sum_{i=l}^{Q_{c}}\bar{\mathbf{h}}_{c,k}^{(i)}s_{i}^{c}+\sum_{j\in\mathcal{K}}\mathbf{H}_{k}^{H}\mathbf{P}_{j}\mathbf{s}_{j}+\mathbf{n}_{k}. \tag{29}$$

The detection of the $l$-th common stream is performed by multiplying $\widetilde{\mathbf{y}}_{k}^{c,l}$ with a linear nulling filter $\mathbf{g}_{c,k}^{l}$. The nulling filter is designed based on the MMSE criterion and expressed as

$$\mathbf{g}_{c,k}^{l}=(\bar{\mathbf{h}}_{c,k}^{(l)})^{H}\left(\mathbf{I}_{Q}+\sum_{i=l}^{Q_{c}}\bar{\mathbf{h}}_{c,k}^{(i)}(\bar{\mathbf{h}}_{c,k}^{(i)})^{H}+\sum_{j\in\mathcal{K}}\mathbf{H}_{k}^{H}\mathbf{P}_{j}\mathbf{P}_{j}^{H}\mathbf{H}_{k}\right)^{-1}.$$

The definitions above assume, for brevity, that the detection order follows the natural indexing of the streams. It is demonstrated in [55, 56] that ordering the streams according to their post-processing SINR yields the best performance. We therefore follow this approach and calculate the post-processing SINRs of the streams after filtering by their corresponding linear nulling filters. Consider the detection and decoding of the $l$-th common stream at user-$k$. The index $i^{\prime}$ of this stream is determined as the solution of the problem

$$i^{\prime}=\operatorname*{arg\,max}_{i\in\mathcal{S}_{l}}\gamma^{i}_{c,k},$$

where $\mathcal{S}_{l}$ is the index set of the undetected streams with cardinality $|\mathcal{S}_{l}|=Q_{c}-l+1$, $\gamma^{i}_{c,k}=1/\epsilon^{i}_{c,k}-1$ is the post-processing SINR of the undetected common stream $i$ at user-$k$ and $\epsilon^{i}_{c,k}\triangleq\mathbb{E}\left\{|\mathbf{g}_{c,k}^{i}\widetilde{\mathbf{y}}_{k}^{c,i}-s_{i}^{c}|^{2}\right\}$ is the MSE of the undetected common stream $i$.
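The nulling filter and the SINR-based ordering can be sketched as follows. This is a minimal illustration assuming unit noise variance and unit-power symbols; the matrix `H_c` (columns are the effective channels of the common streams), the covariance `C_private` of the private-stream contribution and all function names are hypothetical.

```python
import numpy as np

def mmse_nulling_filter(h_l, H_undetected, C_private):
    """Row vector g = h_l^H (I + sum_i h_i h_i^H + C_private)^{-1}.

    H_undetected holds the effective channels of all undetected common
    streams as columns, including h_l itself.
    """
    Q = h_l.shape[0]
    C = np.eye(Q) + H_undetected @ H_undetected.conj().T + C_private
    return np.linalg.solve(C, h_l).conj()      # valid because C is Hermitian

def post_processing_sinr(g, h_l):
    """gamma = 1/eps - 1, with MSE eps = 1 - g h_l for the MMSE filter."""
    mse = 1.0 - (g @ h_l).real
    return 1.0 / mse - 1.0

def pick_next_stream(H_c, undetected, C_private):
    """Return the undetected stream index with the largest SINR."""
    cols = H_c[:, undetected]
    sinrs = [post_processing_sinr(mmse_nulling_filter(H_c[:, i], cols, C_private),
                                  H_c[:, i]) for i in undetected]
    return undetected[int(np.argmax(sinrs))], sinrs

rng = np.random.default_rng(3)
Q, Qc = 4, 3
H_c = (rng.standard_normal((Q, Qc)) + 1j * rng.standard_normal((Q, Qc))) / np.sqrt(2)
H_p = 0.3 * (rng.standard_normal((Q, Q)) + 1j * rng.standard_normal((Q, Q)))
C_private = H_p @ H_p.conj().T                 # stands in for sum_j H_k^H P_j P_j^H H_k
best, sinrs = pick_next_stream(H_c, [0, 1, 2], C_private)
```

After the selected stream is decoded and cancelled, its index is removed from the undetected set and the selection is repeated for the remaining streams.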

The decoding of the common stream $l$ is performed by a Soft Decision (SD) decoder, for which the Log-Likelihood Ratios (LLRs) are calculated from the equalized symbol $\mathbf{g}_{c,k}^{l}\widetilde{\mathbf{y}}_{k}^{c,l}$. We use the LLR calculation method in [57]. Let $\lambda_{k,i}^{c,l}$ denote the LLR of the $i$-th bit of the equalized common stream symbol $l$ for the $k$-th user. We write

$$\lambda_{k,i}^{c,l}=\gamma_{c,k}^{l}\left[\min_{a\in\theta_{1}^{(i)}}\psi(a)-\min_{a\in\theta_{0}^{(i)}}\psi(a)\right],$$

where $\theta_{b}^{(i)}$ is the set of modulation symbols with the value $b$, $b\in\{0,1\}$, at the $i$-th bit location, $\psi(a)=\left|\frac{\mathbf{g}_{c,k}^{l}\widetilde{\mathbf{y}}_{k}^{c,l}}{\rho^{l}_{c,k}}-a\right|^{2}$ and $\rho^{l}_{c,k}=\gamma^{l}_{c,k}/(1+\gamma^{l}_{c,k})$.
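The LLR expression above can be illustrated for a single equalized symbol. The sketch below assumes a hypothetical Gray-mapped 4-QAM constellation with unit average energy; the bit-to-symbol mapping and the function name are illustrative choices, not the mapping used in the paper.

```python
import numpy as np

# Hypothetical Gray mapping: bits (b0, b1) -> unit-energy 4-QAM symbol.
CONSTELLATION = {
    (0, 0): (1 + 1j) / np.sqrt(2),
    (0, 1): (1 - 1j) / np.sqrt(2),
    (1, 0): (-1 + 1j) / np.sqrt(2),
    (1, 1): (-1 - 1j) / np.sqrt(2),
}

def max_log_llrs(z, gamma):
    """LLRs of both bits from the equalized symbol z and its SINR gamma.

    A positive LLR favours bit value 0, following the sign convention of
    the expression above (min over theta_1 minus min over theta_0).
    """
    rho = gamma / (1.0 + gamma)                # MMSE bias of the equalized symbol
    llrs = []
    for i in range(2):
        d0 = min(abs(z / rho - a) ** 2 for bits, a in CONSTELLATION.items() if bits[i] == 0)
        d1 = min(abs(z / rho - a) ** 2 for bits, a in CONSTELLATION.items() if bits[i] == 1)
        llrs.append(gamma * (d1 - d0))
    return np.array(llrs)

gamma = 9.0
z = (gamma / (1.0 + gamma)) * CONSTELLATION[(0, 1)]   # noiseless, biased symbol
llrs = max_log_llrs(z, gamma)
```

For this noiseless input the first LLR is positive and the second is negative, matching the transmitted bits (0, 1).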

A similar procedure is applied to the private streams intended for user-$k$ after all common streams are decoded. The decoding operation is performed using a Successive Cancellation List (SCL) decoder for point-to-point polar codes [58].

Remark: As our aim is to verify the theoretical foundations in the paper, we propose a receiver architecture which has a higher complexity than a receiver with a single interference cancellation (IC) process. An example design for a receiver with a single IC would employ linear equalizers to detect each common (or private) stream by treating all other streams as interference and then decode all common (or private) streams in parallel. Although such a receiver design is expected to suffer a performance loss due to its sub-optimality, it may be more suitable for practical systems due to its reduced complexity.

VI Numerical Results

In this section, we evaluate the performance of RS in the MIMO BC with perfect and imperfect CSIT. We first illustrate the WSR performance of RS in the MIMO BC, followed by the sum-DoF performance. Finally, we present the Link-Level Simulation (LLS) results. For comparison, the following three strategies are considered as baselines:

  • DPC: Implemented based on the algorithm in [59]. With perfect CSIT, DPC is a capacity achieving scheme which cancels interference at the transmitter.

  • MU–MIMO: Results are produced by turning off the common stream and solving problem (11), i.e., allocating no power to the common stream vector.

  • MIMO NOMA: Implemented by extending the degraded beamforming methodology proposed in [11] to the MIMO case, delineated in Appendix A. The precoders and the decoding order are jointly optimized to achieve maximum performance, where precoders are optimized using the WMMSE and AO algorithm.

Note that the comparison with DPC is limited to the perfect CSIT scenario (as DPC is capacity achieving only with perfect CSIT), and MIMO NOMA is limited to the $2$-user case because of the high complexity of jointly optimizing the decoding order and precoders for more than $2$ users. User channels are randomly generated in accordance with [20, 18]. The actual channel experienced at user-$k$, $\mathbf{H}_{k}$, and the channel estimation error $\widetilde{\mathbf{H}}_{k}$ both have complex Gaussian entries drawn from the distributions $\mathcal{CN}(0,\sigma_{k}^{2})$ and $\mathcal{CN}(0,\sigma_{e,k}^{2})$, respectively. The channel estimation error power is defined as $\sigma_{e,k}^{2}\triangleq\sigma_{k}^{2}P_{t}^{-\alpha}$ such that the CSIT quality of the user channels scales with both channel variance and transmit power. Consequently, the channel estimate $\widehat{\mathbf{H}}_{k}=\mathbf{H}_{k}-\widetilde{\mathbf{H}}_{k}$ also follows a Gaussian distribution with entries $\mathcal{CN}(0,\sigma_{k}^{2}-\sigma_{e,k}^{2})$. The WESR is obtained by averaging the WASR over $100$ channel realizations. For each channel estimate $\widehat{\mathbf{H}}$, $N=1000$ channel error samples are generated to form $\mathbb{H}^{(n)}$ and then the SAA method is used to approximate the AR. For a given channel estimate $\widehat{\mathbf{H}}$, the $n$-th channel estimation error sample $\widetilde{\mathbf{H}}^{(n)}$ is generated randomly from the error distribution and forms the $n$-th conditional channel $\mathbf{H}^{(n)}=\widehat{\mathbf{H}}+\widetilde{\mathbf{H}}^{(n)}$. In the case of perfect CSIT, $N=0$ and $\widehat{\mathbf{H}}=\mathbf{H}$.
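The sampling procedure described above can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the simulation code used for the results: the function names are hypothetical, each user channel is taken to be $M\times Q$ (consistent with the $\mathbf{H}_{k}^{H}\mathbf{P}$ products in Appendix A), and `rate_fn` is a placeholder for whatever rate expression is being averaged.

```python
import numpy as np

def complex_gaussian(rng, shape, variance):
    """i.i.d. CN(0, variance) entries."""
    scale = np.sqrt(variance / 2.0)
    return scale * (rng.standard_normal(shape) + 1j * rng.standard_normal(shape))

def sample_conditional_channels(rng, M, Q, sigma2, Pt, alpha, N):
    """Draw one channel estimate and N conditional channel samples for one user.

    Assumes Pt >= 1 so that the error variance sigma2 * Pt**(-alpha) does not
    exceed the channel variance sigma2.
    """
    sigma_e2 = sigma2 * Pt ** (-alpha)                  # error power sigma_e^2
    H_hat = complex_gaussian(rng, (M, Q), sigma2 - sigma_e2)
    H_err = complex_gaussian(rng, (N, M, Q), sigma_e2)  # N error samples
    return H_hat, H_hat[None, :, :] + H_err             # H^(n) = H_hat + H_err^(n)

def saa_average(rate_fn, H_samples):
    """Sample average approximation: mean of rate_fn over the conditional channels."""
    return float(np.mean([rate_fn(H) for H in H_samples]))
```

For example, `sample_conditional_channels(np.random.default_rng(0), 4, 2, 1.0, 100.0, 0.6, 1000)` corresponds to one user at SNR $=20$ dB with $\alpha=0.6$.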

Initialization of the precoders of all three schemes is crucial and plays an important role at higher SNRs, especially for convergence [20]. For MU–MIMO and MIMO NOMA, the precoders are initialized using Maximum Ratio Transmission (MRT), with the initial power allocated uniformly among the users. For RS, the precoders are initialized according to MRT-Singular Value Decomposition (SVD): the private streams are initialized using MRT, and the initial precoder for the common stream vector is the dominant $M\times Q_{c}$ sub-matrix of the left singular matrix of $\widehat{\mathbf{H}}$. The power allocated to the common stream vector is $q_{c}=P_{t}-P_{t}^{\alpha}$, and the remaining power $P_{t}^{\alpha}$ is uniformly distributed among the private stream vectors of all users. The noise variance is assumed to be $\sigma_{n}^{2}=1$.
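A minimal sketch of the MRT-SVD initialization for RS is given below. Several details are assumptions made for illustration and are not specified in the text: the function name, stacking the per-user estimates side by side to form $\widehat{\mathbf{H}}$, splitting the common power equally over the $Q_{c}$ common streams, and normalizing each MRT precoder by its Frobenius norm.

```python
import numpy as np

def mrt_svd_init(H_hat_list, Qc, Pt, alpha):
    """Initial RS precoders: SVD-based common precoder, MRT private precoders.

    H_hat_list : list of K channel estimates, each of shape (M, Q)
    Returns (P_c, [P_1, ..., P_K]) with total transmit power Pt.
    Assumes Pt >= 1 so that the common power Pt - Pt**alpha is non-negative.
    """
    K = len(H_hat_list)
    H_hat = np.hstack(H_hat_list)            # assumed composite estimate, M x KQ
    U, _, _ = np.linalg.svd(H_hat)           # columns of U: left singular vectors
    q_c = Pt - Pt ** alpha                   # power of the common stream vector
    P_c = np.sqrt(q_c / Qc) * U[:, :Qc]      # dominant M x Qc sub-matrix, scaled
    q_k = Pt ** alpha / K                    # private power per user (uniform)
    P_priv = [np.sqrt(q_k) * H / np.linalg.norm(H) for H in H_hat_list]
    return P_c, P_priv
```

Because the columns of `U` are orthonormal, the squared Frobenius norms of the returned precoders sum to $P_{t}$.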

VI-A WESR Performance: Rate-Region

We first consider $M=4$, $K=2$, $Q=2$, $Q_{c}=2$ and SNR $=20$ dB. The ER-regions achieved by different strategies are illustrated for perfect and imperfect CSIT in Fig. 3 and Fig. 4, respectively, where different channel strength disparities are considered. A boundary point for any transmission strategy is obtained by solving the WESR problem for a weight pair: the WASR is computed with Algorithm 1 for each channel realization and then averaged over $100$ channel realizations. The entire rate region is traced over a set of different weight pairs assigned to the users. To obtain the rate-regions, the weight of user-1 is fixed at $\mu_{1}=1$ and the weight of user-2 is varied as $\mu_{2}\in 10^{[-3,-1,-0.95,\ldots,0.95,1,3]}$.
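The weight grid can be generated as in the sketch below. It assumes that the ellipsis in the exponent list denotes a uniform step of $0.05$ between $-1$ and $1$, which is inferred from the first two listed values rather than stated explicitly; the variable names are illustrative.

```python
import numpy as np

# Exponents: -3, then -1 to 1 in assumed steps of 0.05 (41 points), then 3.
exponents = np.concatenate(([-3.0], np.linspace(-1.0, 1.0, 41), [3.0]))
mu2_grid = 10.0 ** exponents

# One weight pair (mu_1, mu_2) per boundary point, with mu_1 fixed to 1.
weight_pairs = [(1.0, float(mu2)) for mu2 in mu2_grid]
```

Each pair yields one boundary point of the rate region once the WESR problem is solved for it.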

Fig. 3 illustrates the ER-regions of all four transmission strategies in the perfect CSIT scenario. In both subfigures (a) and (b), DPC achieves the largest rate region, which is the capacity region. In Fig. 3(a), we observe that with no disparity in the strength of the user channels, MIMO NOMA achieves the worst rate region. As the MIMO NOMA strategy is designed to exploit disparities in channel strengths, it is unable to properly manage the interference in this scenario. MU–MIMO, in contrast, achieves a larger rate region than MIMO NOMA as it relies on the precoder design at the transmitter, while at the receiver side each user decodes its own streams by treating the streams of other users as noise. When user-2 suffers an additional $10$ dB path loss, MIMO NOMA achieves a larger rate region than MU–MIMO, as illustrated in Fig. 3(b), when the weight of user-2 is either larger than or comparable to the weight of user-1. When the weight of user-1 is significantly larger than the weight of user-2, the effect of the disparity in channel strength fades as user-2 is weighted significantly less. Hence, MU–MIMO starts performing better than MIMO NOMA. In comparison, RS performs better than both MU–MIMO and MIMO NOMA and achieves a rate region closer to that of the capacity-achieving DPC in both subfigures. This is because the common streams enable the users to partially decode the interference and partially treat the interference as noise.

Fig. 4 illustrates the ER-regions obtained for the imperfect CSIT scenario. As the CSIT quality degrades, the performance of MU-MIMO deteriorates significantly while RS, because of the presence of common streams, exhibits robustness and shows a clear performance gain over MU-MIMO and MIMO NOMA. In the case of different channel strengths, we observe that the results are consistent with the perfect CSIT scenario. Comparing Fig. 3 and Fig. 4, we conclude that better management of interference makes RS robust to CSIT inaccuracy and different user deployments.

Figure 3: ER-region comparison of different strategies with perfect CSIT, averaged over $100$ random channel realizations, SNR $=20$ dB, $M=4$, $K=2$, $Q=2$, $Q_{c}=2$, $\sigma_{1}^{2}=1$. Panels: (a) $\sigma_{2}^{2}=1$; (b) $\sigma_{2}^{2}=0.09$.

Figure 4: ER-region comparison of different strategies with imperfect CSIT, averaged over $100$ random channel realizations, SNR $=20$ dB, $M=4$, $K=2$, $Q=2$, $Q_{c}=2$, $\alpha=0.6$, $\sigma_{1}^{2}=1$. Panels: (a) $\sigma_{2}^{2}=1$; (b) $\sigma_{2}^{2}=0.09$.

VI-B Sum-DoF: Effect of Common Message

VI-B1 CSIT Quality

From Fig. 5(a), we see that at high SNR, RS with $Q_{c}=2$ always achieves a higher ESR than RS with $Q_{c}=1$, and both perform better than MU–MIMO and MIMO NOMA. (Footnote 5: The performance gain of RS with $Q_{c}=1$ over MIMO NOMA has implications for receiver complexity, with RS requiring fewer SICs than MIMO NOMA.) For $\alpha=0.6$, the sum-DoF obtained for RS with $Q_{c}=2$, RS with $Q_{c}=1$, MU-MIMO and MIMO NOMA are $[3.08,2.67,2.10,1.93]$, respectively. These values are close to the theoretical sum-DoF values $[3.2,2.8,2.4,2]$ calculated using (13)–(15). Thus, transmitting multiple common streams yields better ESR and sum-DoF performance than transmitting a single common stream. Furthermore, as $\alpha$ decreases from $0.9$ to $0.3$, the ESR and sum-DoF gaps between RS and the other two schemes increase, as seen in Fig. 5(b) and Fig. 5(c). The results are in line with equations (13)–(15). Since $\alpha=0.9$ is closer to the perfect CSIT case, at higher SNRs the MU–MIMO curve is nearly parallel to the RS curve, though with lower ESR. This is because the contribution of the common streams decreases as $\alpha$ increases and, with respect to sum-DoF, MU–MIMO gets closer to RS. However, the optimization and contribution of the common streams still make the ESR performance of RS better than that of MU–MIMO. MIMO NOMA, on the other hand, is observed to have the same sum-DoF irrespective of $\alpha$. The observation is consistent with the theoretical sum-DoF expression in (15), which implies that the sum-DoF of MIMO NOMA is independent of $\alpha$ in our scenario. (Footnote 6: The results are consistent with the findings in [13] that NOMA is unable to exploit the available CSIT efficiently.) Therefore, RS is better suited to the MIMO BC than MU–MIMO and MIMO NOMA, especially as the CSIT quality deteriorates.
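The theoretical values quoted above can be reproduced with a short calculation. The sketch below assumes that (13)–(15) take the forms obtained in Appendix A, namely $Q_{c}(1-\alpha)+JQ\alpha$ for RS, $JQ\alpha$ for MU–MIMO and $\min(M,Q)$ for MIMO NOMA, with $J=\min(K,M/Q)$ users served by private streams. The function names are illustrative, and $M$ is assumed to be a multiple of $Q$ with $M>Q$.

```python
def private_users(M, Q, K):
    """Number J of users that receive private streams (assumes M = J*Q, capped at K)."""
    return min(K, M // Q)

def sum_dof_rs(M, Q, K, Qc, alpha):
    """RS: common-stream DoF Qc*(1 - alpha) plus private DoF J*Q*alpha."""
    return Qc * (1.0 - alpha) + private_users(M, Q, K) * Q * alpha

def sum_dof_mu_mimo(M, Q, K, alpha):
    """MU-MIMO: RS with the common streams switched off."""
    return private_users(M, Q, K) * Q * alpha

def sum_dof_noma(M, Q):
    """MIMO NOMA: bounded by the single-user DoF, independent of alpha."""
    return min(M, Q)

# Setting of Fig. 5(b): M = 4, Q = 2, K = 2, alpha = 0.6.
theoretical = [
    round(sum_dof_rs(4, 2, 2, 2, 0.6), 2),
    round(sum_dof_rs(4, 2, 2, 1, 0.6), 2),
    round(sum_dof_mu_mimo(4, 2, 2, 0.6), 2),
    sum_dof_noma(4, 2),
]
```

Under these assumptions `theoretical` matches the list $[3.2,2.8,2.4,2]$ given in the text.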

VI-B2 Number of Users

Fig. 6(a) and Fig. 6(b) illustrate the ESR performance of the schemes in the underloaded and overloaded regimes, respectively. In the underloaded regime, for all three user configurations, i.e., $K=2$, $K=3$ and $K=4$, RS with $Q_{c}=2$ has a better ESR than RS with $Q_{c}=1$, which in turn performs better than MU–MIMO and MIMO NOMA. Fig. 6(b) illustrates that in the overloaded regime, RS also outperforms MU–MIMO and MIMO NOMA in both ESR and sum-DoF. The sum-DoF achieved by all three schemes is in line with equations (13)–(15) for the underloaded and overloaded scenarios. Fig. 6(c) illustrates the ESR performance of all three schemes with higher dimensions, i.e., $M=9$, $Q=3$. RS achieves an ESR gain over MU–MIMO and MIMO NOMA in all scenarios.

Figure 5: ESR versus SNR comparison of different strategies with different imperfect CSIT inaccuracies, averaged over $100$ random channel realizations, $M=4$, $K=2$, $Q=2$, $\sigma_{1}^{2}=1$, $\sigma_{2}^{2}=1$. Panels: (a) $\alpha=0.9$; (b) $\alpha=0.6$; (c) $\alpha=0.3$.

Figure 6: ESR versus SNR comparison of different strategies for different network loads, averaged over $100$ random channel realizations, $\alpha=0.6$, $\sigma_{k}^{2}=1,\ \forall k\in[1,4]$. Panels: (a) $Q=2$, $M=KQ$; (b) $Q=2$, $M=(K-1)Q$; (c) $Q=3$, $M=9$.

VI-C Link-Level Simulation Results

In this section, we aim to verify the theoretical foundations in the paper under realistic and practical setups. We perform Link-Level Simulations (LLS) to analyze the throughput performance of RS and compare it with those of MU-MIMO and MIMO NOMA. We employ the practical transceiver architecture described in Section V for RS, MU-MIMO and MIMO NOMA. Note that in the proposed architecture, the common signal is turned off to simulate MU-MIMO and one out of two private signals is turned off to simulate MIMO NOMA. We assume that the instantaneous channel is perfectly known at the transmitter for MCS selection. Let $S^{(l)}$ denote the number of channel uses in the $l$-th Monte-Carlo realization and $D_{s,k}^{(l)}$ denote the number of information bits successfully recovered by user-$k$ from the common stream (excluding the common part of the message intended for the other user) and its private stream. Then, we calculate the throughput as

$$\mathrm{Throughput\,[bps/Hz]}=\frac{\sum_{l}\left(D_{s,1}^{(l)}+D_{s,2}^{(l)}\right)}{\sum_{l}S^{(l)}}. \tag{30}$$
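As a small illustration of (30), the sketch below accumulates recovered bits and channel uses over Monte-Carlo realizations. The function name and the input layout (one tuple of per-user bit counts per realization) are assumptions made for the example.

```python
def throughput_bps_hz(recovered_bits, channel_uses):
    """Throughput as in (30): total recovered bits over total channel uses.

    recovered_bits : list of (D_s1, D_s2) tuples, one per realization
    channel_uses   : list of S values, one per realization
    """
    total_bits = sum(sum(per_user) for per_user in recovered_bits)
    total_uses = sum(channel_uses)
    return total_bits / total_uses
```

Note that (30) is a ratio of sums, not an average of per-realization ratios, so realizations with more channel uses carry proportionally more weight.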

Fig. 7 shows the Shannon Bound (ESR obtained with Gaussian signalling and infinite block length) and the throughput levels achieved by RS, MU-MIMO and MIMO NOMA in both the underloaded and overloaded scenarios, for $M=4$, $Q=2$, $Q_{c}=2$ and $\alpha=0.6$. The throughput performance is consistent with the ESR performance for all three schemes in both the underloaded and overloaded regimes.

Figure 7: Throughput versus SNR comparison of different strategies averaged over $100$ random channel realizations. Panels: (a) $K=2$; (b) $K=3$.

VII Conclusion

To conclude, we introduce a general framework for RSMA in MIMO BC with both perfect and imperfect CSIT, where RSMA is able to transmit an arbitrary number of common streams. We study the proposed framework in both finite and high SNR regimes. In the finite SNR regime, we propose the vectorization and WMMSE-based approach for precoder optimization to analyze the rate performance with perfect and imperfect CSIT. In the high SNR regime, we derive the sum-DoF achieved by RS, MU–MIMO and MIMO NOMA in a symmetric MIMO BC setup with imperfect CSIT. We demonstrate via the theoretical results that by transmitting multiple common streams, RSMA achieves a higher sum-DoF with imperfect CSIT compared to the conventional multiple access schemes. Numerical results show that RSMA achieves a higher rate performance than MU–MIMO and MIMO NOMA irrespective of the network load, user deployments or the CSIT quality. Moreover, numerical results validate the theoretical sum-DoF expressions derived for all three transmission strategies and the sum-DoF gain of RS over MU–MIMO and MIMO NOMA. Moving beyond the assumptions of Gaussian signalling and infinite block lengths, we design the PHY-layer architecture of RS and analyze its performance in practical systems. Through LLS, we demonstrate that by better managing the interference, RS achieves a significant throughput performance gain over MU-MIMO and MIMO NOMA in MIMO BC. Therefore, we conclude that RSMA is a powerful and promising physical-layer strategy for multi-antenna networks.

Appendix A Achievable Sum-DoF

Since the precoders for each channel estimate are decoupled from each other, we consider the precoding scheme for a given channel estimate $\widehat{\mathbf{H}}$ with precoders defined by $\{\mathbf{P}\}_{P_{t}}$. We begin with a general case by denoting the CSIT quality of user-$k$ as $\alpha_{k}=\alpha,\ \forall\,k\in\mathcal{K}$. Here, $\alpha$ assumes a non-negative value and is in the range $[0,1]$. Also, replacing a negative value of $\alpha$ with zero does not alter the sum-DoF results derived next. Therefore, one has $\mathbb{E}[\lvert\mathbf{h}_{k,i}^{H}\mathbf{p}_{k}\rvert^{2}]\sim P^{-\alpha}$, with $\mathbf{h}_{k,i}$ as the $i$-th column of the channel $\mathbf{H}_{k}$ and $\mathbf{p}_{k}$ being the vector of unit norm in the null space of $\widehat{\mathbf{H}}_{k}$, i.e., a ZF precoder. The quantity $\lvert\mathbf{h}_{k,i}^{H}\mathbf{p}_{k}\rvert^{2}$ represents the residual interference power at the unintended receiver. The power exponent of the private stream vector of each user is taken as $\alpha=\min_{k\in\mathcal{K}}\{\alpha_{k}\},\ \forall\,k\in\mathcal{K}$, as this power exponent maximizes the sum-DoF [21]. As mentioned earlier, we consider a homogeneous network with an equal number of receive antennas $Q$ at every user.

A-A RS and MU-MIMO

We begin by deriving the achievable sum-DoF of RS. We consider two possible scenarios, $M\leq Q$ and $M>Q$. In the former scenario, the sum-DoF is restricted to $M$ and can be achieved by switching to single-user transmission. Therefore, we focus on the latter scenario where $M>Q$. This scenario encompasses both network load regimes, i.e., underloaded when $M=KQ$ (Footnote 7: In terms of the achievable sum-DoF, $M>KQ$ is equivalent to the $M=KQ$ scenario [60].) and overloaded when $M=LQ,\ 1<L<K$. Consequently, we assume the number of users that receive private streams to be $J\leq K$ such that $M=JQ$, whereas the common streams are multicast to all $K$ users. Let $\mathcal{J}\subseteq\mathcal{K}$ denote the set of users receiving the private streams. The remaining $K-J$ users form a set $\mathcal{J}^{*}$ such that $\mathcal{J}\cup\mathcal{J}^{*}=\mathcal{K}$. Therefore, we have $Q_{k}=Q,\ \forall k\in\mathcal{J}$ and $Q_{k}=0,\ \forall k\in\mathcal{J}^{*}$. We assume that the power for the private streams is uniformly distributed among users, i.e., $P^{\alpha}/J$ per user, and consequently the power allocated to the common streams is $P-P^{\alpha}$, which scales as $P-P^{\alpha}\sim P$ in the high SNR regime. Hence, the transmission block is constructed as follows:

  • A private stream vector denoted by $\mathbf{s}_{k}\in\mathbb{C}^{Q\times 1}$ is transmitted to each user in the set $\mathcal{J}$ using a ZF precoder $\mathbf{P}_{k}\in\mathbb{C}^{M\times Q},\ \forall\,k\in\mathcal{J}$.

  • A common stream vector denoted by $\mathbf{s}_{c}\in\mathbb{C}^{Q_{c}\times 1}$ is multicast to all users using a precoder $\mathbf{P}_{c}\in\mathbb{C}^{M\times Q_{c}}$, where $\mathbf{P}_{c}$ consists of the first $Q_{c}$ left singular vectors of $\widehat{\mathbf{H}}$.

As per the system model, the transmitted signal from the BS and the received signal at user-$k$ are written as

$$\mathbf{x}=\underbrace{\mathbf{P}_{c}\mathbf{s}_{c}}_{P}+\sum_{k\in\mathcal{J}}\underbrace{\mathbf{P}_{k}\mathbf{s}_{k}}_{P^{\alpha}/J}, \tag{31}$$

$$\mathbf{y}_{k,k\in\mathcal{J}}=\underbrace{\mathbf{H}_{k}^{H}\mathbf{P}_{c}\mathbf{s}_{c}}_{P}+\underbrace{\mathbf{H}_{k}^{H}\mathbf{P}_{k}\mathbf{s}_{k}}_{P^{\alpha}}+\sum_{j\in\mathcal{J},j\neq k}\underbrace{\mathbf{H}_{k}^{H}\mathbf{P}_{j}\mathbf{s}_{j}}_{P^{\alpha-\alpha_{k}}}+\underbrace{\mathbf{n}_{k}}_{P^{0}}, \tag{32}$$

$$\mathbf{y}_{k,k\in\mathcal{J}^{*}}=\underbrace{\mathbf{H}_{k}^{H}\mathbf{P}_{c}\mathbf{s}_{c}}_{P}+\sum_{j\in\mathcal{J}}\underbrace{\mathbf{H}_{k}^{H}\mathbf{P}_{j}\mathbf{s}_{j}}_{P^{\alpha}}+\underbrace{\mathbf{n}_{k}}_{P^{0}}. \tag{33}$$

A-A1 User-$k$ in $\mathcal{J}$

First, we write the common and private rates achieved by user-$k$,

$$\widehat{R}_{c,k}=\log_{2}\det(\mathbf{Q}_{c}+\mathbf{Q}_{k}+\mathbf{Q}_{\eta_{k}})-\log_{2}\det(\mathbf{Q}_{k}+\mathbf{Q}_{\eta_{k}}), \tag{34}$$

$$\widehat{R}_{p,k}=\log_{2}\det(\mathbf{Q}_{k}+\mathbf{Q}_{\eta_{k}})-\log_{2}\det(\mathbf{Q}_{\eta_{k}}), \tag{35}$$

where $\mathbf{Q}_{c}=\mathbf{H}_{k}^{H}\mathbf{P}_{c}\mathbf{P}_{c}^{H}\mathbf{H}_{k}$, $\mathbf{Q}_{k}=\mathbf{H}_{k}^{H}\mathbf{P}_{k}\mathbf{P}_{k}^{H}\mathbf{H}_{k}$ and $\mathbf{Q}_{\eta_{k}}=\sum_{j\in\mathcal{J},j\neq k}\mathbf{H}_{k}^{H}\mathbf{P}_{j}\mathbf{P}_{j}^{H}\mathbf{H}_{k}+\sigma_{n}^{2}\mathbf{I}_{Q}$ are the respective covariance matrices of $\mathbf{H}_{k}^{H}\mathbf{P}_{c}\mathbf{s}_{c}$, $\mathbf{H}_{k}^{H}\mathbf{P}_{k}\mathbf{s}_{k}$ and $\sum_{j\in\mathcal{J},j\neq k}\mathbf{H}_{k}^{H}\mathbf{P}_{j}\mathbf{s}_{j}+\mathbf{n}_{k}$. Furthermore, we consider the eigenvalue decompositions of $\mathbf{Q}_{c}$, $\mathbf{Q}_{k}$ and $\mathbf{Q}_{\eta_{k}}$ as $\mathbf{W}_{c}\mathbf{D}_{c}\mathbf{W}_{c}^{H}$, $\mathbf{W}_{k}\mathbf{D}_{k}\mathbf{W}_{k}^{H}$ and $\mathbf{W}_{\eta_{k}}\mathbf{D}_{\eta_{k}}\mathbf{W}_{\eta_{k}}^{H}$, respectively, with $\mathbf{D}_{c}\sim \mathrm{diag}(P\mathbf{I}_{Q_{c}},\mathbf{0}_{Q-Q_{c}})$, $\mathbf{D}_{k}\sim \mathrm{diag}(P^{\alpha}\mathbf{I}_{Q})$ and $\mathbf{D}_{\eta_{k}}\sim \mathrm{diag}(P^{(\alpha-\alpha_{k})^{+}}\mathbf{I}_{Q})$. Here, $(x)^{+}\triangleq\max(x,0)$. Then it follows that

$$\log_{2}\det(\mathbf{Q}_{c}+\mathbf{Q}_{k}+\mathbf{Q}_{\eta_{k}})=\big(Q_{c}+(Q-Q_{c})\alpha\big)\log_{2}(P_{t})+\mathcal{O}(\log_{2}(P_{t})), \tag{36}$$

$$\log_{2}\det(\mathbf{Q}_{k}+\mathbf{Q}_{\eta_{k}})=(Q\alpha)\log_{2}(P_{t})+\mathcal{O}(\log_{2}(P_{t})), \tag{37}$$

$$\log_{2}\det(\mathbf{Q}_{\eta_{k}})=\mathcal{O}(\log_{2}(P_{t})). \tag{38}$$

A-A2 User-$k$ in $\mathcal{J}^{*}$

For user-$k$ in the set $\mathcal{J}^{*}$, we only have the common stream vector and the common rate is written as

$$\widehat{R}_{c,k}=\log_{2}\det(\mathbf{Q}_{c}+\mathbf{Q}_{\eta_{k}})-\log_{2}\det(\mathbf{Q}_{\eta_{k}}), \tag{39}$$

where $\mathbf{Q}_{\eta_{k}}=\sum_{j\in\mathcal{J}}\mathbf{H}_{k}^{H}\mathbf{P}_{j}\mathbf{P}_{j}^{H}\mathbf{H}_{k}+\sigma_{n}^{2}\mathbf{I}_{Q}$, whose eigenvalue decomposition satisfies $\mathbf{D}_{\eta_{k}}\sim \mathrm{diag}(P^{\alpha}\mathbf{I}_{Q})$. It follows that

$$\log_{2}\det(\mathbf{Q}_{c}+\mathbf{Q}_{\eta_{k}})=\big(Q_{c}+(Q-Q_{c})\alpha\big)\log_{2}(P_{t})+\mathcal{O}(\log_{2}(P_{t})), \tag{40}$$

$$\log_{2}\det(\mathbf{Q}_{\eta_{k}})=(Q\alpha)\log_{2}(P_{t}). \tag{41}$$

The term $\mathcal{O}(\log_{2}(P_{t}))$ dies with $P_{t}\rightarrow\infty$ because of the negative exponent. We proceed to obtain the common and private DoF for user-$k$ by using the expression in (12). From (34), (39), (36)–(37) and (40)–(41), the DoF for the common stream vector is $d_{c,k}=Q_{c}(1-\alpha)$ and is shared by all users in $\mathcal{K}$. Similarly, from (35) and (37)–(38), the DoF for the private stream vector at each user in $\mathcal{J}$ is $d_{p,k}=Q\alpha$, and $d_{p,k}=0$ for users in $\mathcal{J}^{*}$. Considering the total rate achieved by user-$k$, $\forall k\in\mathcal{K}$, it can be written that $\widehat{R}_{c}+\widehat{R}_{p,k}\leq\widehat{R}_{c,k}+\widehat{R}_{p,k}$. Hence, the sum-DoF achieved by RS is $d_{s}^{\textrm{RS}}\leq d_{c,k}+\sum_{k=1}^{K}d_{p,k}$ and is expressed in (13).
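For reference, the subtraction of pre-log factors behind these DoF values can be spelled out as below. This is a worked restatement under the assumption that (12) defines the DoF as the limit of the rate divided by $\log_{2}(P_{t})$; it uses only the scalings in (36)–(38) and (40)–(41).

```latex
\begin{align*}
d_{c,k} &= \lim_{P_t\to\infty}\frac{\widehat{R}_{c,k}}{\log_2 P_t}
         = \big(Q_c+(Q-Q_c)\alpha\big)-Q\alpha
         = Q_c(1-\alpha), && k\in\mathcal{K},\\
d_{p,k} &= \lim_{P_t\to\infty}\frac{\widehat{R}_{p,k}}{\log_2 P_t}
         = Q\alpha-0 = Q\alpha, && k\in\mathcal{J},\\
d_{c,k}+\sum_{k=1}^{K}d_{p,k} &= Q_c(1-\alpha)+JQ\alpha .
\end{align*}
```

The first line holds for users in both $\mathcal{J}$ and $\mathcal{J}^{*}$ because the two differences, (36) minus (37) and (40) minus (41), have the same pre-log factor.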

Switching off the common stream and using equations (35)-(38), the expression for the sum-DoF achieved by the conventional MU–MIMO scheme is obtained and is expressed in (14).

A-B MIMO NOMA

We now consider multi-antenna MIMO NOMA. Without loss of generality, we assume the decoding order $1\rightarrow K$ (Footnote 8: The sum-DoF analysis is independent of the basis of ordering and will hold for any ordering of users.) such that user-1 performs $K-1$ layers of SIC to fully decode the messages (and therefore remove interference) from the other $K-1$ users. Similarly, the next user, i.e., user-2, performs $K-2$ layers of SIC to fully decode the messages of $K-2$ other users, and so on. Thus, user-$K$ decodes its own message vector while treating the messages of the remaining $K-1$ users as noise, and the message vector of user-1 is decoded by user-1 after it decodes and removes the messages of all the other $K-1$ users. Following the NOMA principle, the transmit signal vector $\mathbf{x}$ is generated such that the messages intended for the users are encoded using a shared codebook, so that user-$k$, $k\in\mathcal{K}$, is able to decode the message of user-$j$, $j\in\mathcal{K}$, for $j>k$. After encoding the messages, linear precoders $\mathbf{P}_{k}\in\mathbb{C}^{M\times Q_{k}},\ \forall k\in\mathcal{K}$, can be used to construct the transmit signal vector denoted by

$$\mathbf{x}=\sum_{k\in\mathcal{K}}\underbrace{\mathbf{P}_{k}\mathbf{s}_{k}}_{P/K}, \tag{42}$$

and the received signal at user-$k$ is given by $\mathbf{y}_{k}=\mathbf{H}_{k}^{H}\mathbf{x}+\mathbf{n}_{k}$. Using SIC, user-$k$ decodes the message vector of user-$j$, $j\geq k$, while treating the interference from users $\{i\mid i<j,\ i\in\mathcal{K}\}$ as noise. Under the assumption of Gaussian signalling and perfect SIC, the rate at user-$k$ to decode the message vector of user-$j$ is given by

$$\widehat{R}_{k,j}=\log_{2}\det\Big(\mathbf{I}_{Q}+\mathbf{Q}_{k,j}\big(\mathbf{I}_{Q}+\mathbf{Q}_{k,j}^{(in)}\big)^{-1}\Big), \tag{43}$$

where $\mathbf{Q}_{k,j}=\mathbf{H}_{k}^{H}\mathbf{P}_{j}\mathbf{P}_{j}^{H}\mathbf{H}_{k}$ and $\mathbf{Q}_{k,j}^{(in)}=\sum_{i<j,i\in\mathcal{K}}\mathbf{H}_{k}^{H}\mathbf{P}_{i}\mathbf{P}_{i}^{H}\mathbf{H}_{k}$. In order to ensure decodability, the message vector of user-$j$ must be decodable at every user-$k$ with $k\leq j$, $k\in\mathcal{K}$, and thus the rate of user-$j$ should not exceed $\widehat{R}_{j}=\min_{k\leq j,\ k\in\mathcal{K}}\widehat{R}_{k,j}$. Therefore, the achievable rates of the $K$ users are given by $\widehat{R}_{1}=\widehat{R}_{1,1}$, $\widehat{R}_{2}=\min(\widehat{R}_{1,2},\widehat{R}_{2,2})$, $\widehat{R}_{3}=\min(\widehat{R}_{1,3},\widehat{R}_{2,3},\widehat{R}_{3,3})$, …, $\widehat{R}_{K}=\min(\widehat{R}_{1,K},\widehat{R}_{2,K},\ldots,\widehat{R}_{K,K})$. Since the message vector of each user is required to be decoded by user-1, the SR $\widehat{R}_{\textrm{NOMA}}$ of the $K$-user MIMO NOMA can then be upper bounded as

$$\widehat{R}_{\textrm{NOMA}}\leq\sum_{k=1}^{K}\widehat{R}_{1,k}=\log_{2}\det\Big(\mathbf{I}_{Q}+\sum_{k=1}^{K}\mathbf{Q}_{1,k}\Big). \tag{44}$$

The SR bound achieved with this $K$-user MIMO NOMA strategy is further upper bounded as

$$\widehat{R}_{\textrm{NOMA}}\leq\log_{2}\det\left(\mathbf{I}_{Q}+\mathbf{H}_{1}^{H}\mathbf{Q}_{1}^{\star}\mathbf{H}_{1}\right)\stackrel{P_{t}\nearrow}{=}\min(M,Q)\log_{2}\left(P_{t}\right)+\mathcal{O}(1),$$

where $\mathbf{Q}_{1}^{\star}$ refers to the optimal covariance matrix for user-1 in a single-user (OMA) setup with $\mathrm{tr}(\mathbf{P}_{1}\mathbf{P}_{1}^{H})=P_{t}$, i.e., obtained by transmitting along the dominant eigenvector of $\mathbf{H}_{1}\mathbf{H}_{1}^{H}$ and allocating power $P_{t}$ according to the water-filling solution. The sum-DoF obtained for MIMO NOMA is described in equation (15).

References

  • [1] A. Goldsmith, S. A. Jafar, N. Jindal, and S. Vishwanath, “Capacity limits of MIMO channels,” IEEE J. Sel. Areas Commun., vol. 21, no. 5, pp. 684–702, June 2003.
  • [2] G. Caire and S. Shamai, “On the achievable throughput of a multiantenna Gaussian broadcast channel,” IEEE Trans. Inf. Theory, vol. 49, no. 7, pp. 1691–1706, 2003.
  • [3] H. Weingarten, Y. Steinberg, and S. S. Shamai, “The capacity region of the Gaussian multiple-input multiple-output broadcast channel,” IEEE Trans. Inf. Theory, vol. 52, no. 9, pp. 3936–3964, Sept. 2006.
  • [4] S. S. Christensen, R. Agarwal, E. D. Carvalho, and J. M. Cioffi, “Weighted sum-rate maximization using weighted MMSE for MIMO-BC beamforming design,” IEEE Trans. Wireless Commun., vol. 7, no. 12, pp. 4792–4799, Dec. 2008.
  • [5] Lai-U Choi and R. D. Murch, “A transmit preprocessing technique for multiuser MIMO systems using a decomposition approach,” IEEE Transactions on Wireless Communications, vol. 3, no. 1, pp. 20–24, 2004.
  • [6] M. Sadek, A. Tarighat, and A. H. Sayed, “A leakage-based precoding scheme for downlink multi-user MIMO channels,” IEEE Transactions on Wireless Communications, vol. 6, no. 5, pp. 1711–1721, 2007.
  • [7] Zhengang Pan, Kai-Kit Wong, and Tung-Sang Ng, “Generalized multiuser orthogonal space-division multiplexing,” IEEE Transactions on Wireless Communications, vol. 3, no. 6, pp. 1969–1973, 2004.
  • [8] Q. Sun, S. Han, C. L. I, and Z. Pan, “On the ergodic capacity of MIMO NOMA systems,” IEEE Wireless Commun. Lett., vol. 4, no. 4, pp. 405–408, Aug. 2015.
  • [9] Q. Zhang, Q. Li, and J. Qin, “Robust beamforming for nonorthogonal multiple-access systems in MISO channels,” IEEE Trans. Veh. Technol., vol. 65, no. 12, pp. 10 231–10 236, Dec. 2016.
  • [10] Y. Mao, B. Clerckx, and V. O. K. Li, “Rate-splitting multiple access for downlink communication systems: bridging, generalizing, and outperforming SDMA and NOMA,” EURASIP J. Wireless Commun. Netw., vol. 2018, no. 1, p. 133, May 2018.
  • [11] H. Joudeh and B. Clerckx, “Rate-splitting for max-min fair multigroup multicast beamforming in overloaded systems,” IEEE Trans. Wireless Commun., vol. 16, no. 11, pp. 7276–7289, Nov. 2017.
  • [12] B. Clerckx, Y. Mao, R. Schober, and H. V. Poor, “Rate-splitting unifying SDMA, OMA, NOMA, and multicasting in MISO broadcast channel: A simple two-user rate analysis,” IEEE Wireless Communications Letters, vol. 9, no. 3, pp. 349–353, 2020.
  • [13] B. Clerckx, Y. Mao, R. Schober, E. Jorswieck, D. J. Love, J. Yuan, L. Hanzo, G. Y. Li, E. G. Larsson, and G. Caire, “Is NOMA efficient in multi-antenna networks? A critical look at next generation multiple access techniques,” arXiv:2101.04802, 2021.
  • [14] Y. Mao, B. Clerckx, and V. O. K. Li, “Rate-splitting for multi-antenna non-orthogonal unicast and multicast transmission: Spectral and energy efficiency analysis,” IEEE Trans. Commun., vol. 67, no. 12, pp. 8754–8770, Dec. 2019.
  • [15] B. Clerckx, H. Joudeh, C. Hao, M. Dai, and B. Rassouli, “Rate splitting for MIMO wireless networks: A promising PHY-layer strategy for LTE evolution,” IEEE Commun. Mag., vol. 54, no. 5, pp. 98–105, May 2016.
  • [16] T. Han and K. Kobayashi, “A new achievable rate region for the interference channel,” IEEE Trans. Inf. Theory, vol. 27, no. 1, pp. 49–60, Jan. 1981.
  • [17] C. Hao, Y. Wu, and B. Clerckx, “Rate analysis of two-receiver MISO broadcast channel with finite rate feedback: A rate-splitting approach,” IEEE Trans. Commun., vol. 63, no. 9, pp. 3232–3246, Sept. 2015.
  • [18] H. Joudeh and B. Clerckx, “Robust transmission in downlink multiuser MISO systems: A rate-splitting approach,” IEEE Trans. Signal Process., vol. 64, no. 23, pp. 6227–6242, Dec. 2016.
  • [19] E. Piovano, H. Joudeh, and B. Clerckx, “Overloaded multiuser MISO transmission with imperfect CSIT,” in Proc. 50th Asilomar Conf. Signals, Syst. Comput., Nov. 2016, pp. 34–38.
  • [20] H. Joudeh and B. Clerckx, “Sum-rate maximization for linearly precoded downlink multiuser MISO systems with partial CSIT: A rate-splitting approach,” IEEE Trans. Commun., vol. 64, no. 11, pp. 4847–4861, Nov. 2016.
  • [21] C. Hao, B. Rassouli, and B. Clerckx, “Achievable DoF regions of MIMO networks with imperfect CSIT,” IEEE Trans. Inf. Theory, vol. 63, no. 10, pp. 6587–6606, Oct. 2017.
  • [22] C. Hao and B. Clerckx, “MISO networks with imperfect CSIT: A topological rate-splitting approach,” IEEE Trans. Commun., vol. 65, no. 5, pp. 2164–2179, May 2017.
  • [23] M. Medra and T. N. Davidson, “Robust downlink transmission: An offset-based single-rate-splitting approach,” in Proc. IEEE Int. Workshop Signal Process. Adv. Wireless Commun. (SPAWC), June 2018, pp. 1–5.
  • [24] G. Lu, L. Li, H. Tian, and F. Qian, “MMSE-based precoding for rate splitting systems with finite feedback,” IEEE Commun. Lett., vol. 22, no. 3, pp. 642–645, Mar. 2018.
  • [25] E. Piovano and B. Clerckx, “Optimal DoF region of the K-user MISO BC with partial CSIT,” IEEE Commun. Lett., vol. 21, no. 11, pp. 2368–2371, Nov. 2017.
  • [26] A. Gholami Davoodi and S. Jafar, “Degrees of Freedom region of the $(M,N_{1},N_{2})$ MIMO broadcast channel with partial CSIT: An application of sum-set inequalities based on aligned image sets,” IEEE Trans. Inf. Theory, vol. 66, no. 10, pp. 6256–6279, 2020.
  • [27] Y. Mao, B. Clerckx, and V. O. K. Li, “Energy efficiency of rate-splitting multiple access, and performance benefits over SDMA and NOMA,” in Proc. IEEE Int. Symp. Wireless Commun. Syst. (ISWCS), Aug. 2018, pp. 1–5.
  • [28] Y. Mao and B. Clerckx, “Beyond dirty paper coding for multi-antenna broadcast channel with partial CSIT: A rate-splitting approach,” IEEE Trans. Commun., vol. 68, no. 11, pp. 6775–6791, 2020.
  • [29] A. R. Flores, B. Clerckx, and R. C. de Lamare, “Tomlinson-Harashima precoded rate-splitting for multiuser multiple-antenna systems,” in Proc. IEEE Int. Symp. Wireless Commun. Syst. (ISWCS), Aug. 2018, pp. 1–6.
  • [30] Z. Li, C. Ye, Y. Cui, S. Yang, and S. Shamai, “Rate splitting for multi-antenna downlink: Precoder design and practical implementation,” IEEE J. Sel. Areas Commun., vol. 38, no. 8, pp. 1910–1924, 2020.
  • [31] M. Dai, B. Clerckx, D. Gesbert, and G. Caire, “A rate splitting strategy for massive MIMO with imperfect CSIT,” IEEE Trans. Wireless Commun., vol. 15, no. 7, pp. 4611–4624, July 2016.
  • [32] A. Papazafeiropoulos, B. Clerckx, and T. Ratnarajah, “Rate-splitting to mitigate residual transceiver hardware impairments in massive MIMO systems,” IEEE Trans. Veh. Technol., vol. 66, no. 9, pp. 8196–8211, Sept. 2017.
  • [33] M. Dai and B. Clerckx, “Multiuser millimeter wave beamforming strategies with quantized and statistical CSIT,” IEEE Trans. Wireless Commun., vol. 16, no. 11, pp. 7025–7038, Nov. 2017.
  • [34] O. Kolawole, A. Papazafeiropoulos, and T. Ratnarajah, “A rate-splitting strategy for multi-user millimeter-wave systems with imperfect CSI,” in Proc. IEEE Int. Workshop Signal Process. Adv. Wireless Commun. (SPAWC), June 2018, pp. 1–5.
  • [35] H. Joudeh and B. Clerckx, “Sum rate maximization for MU-MISO with partial CSIT using joint multicasting and broadcasting,” in Proc. IEEE Int. Conf. Commun. (ICC), June 2015, pp. 4733–4738.
  • [36] O. Tervo, L.-N. Tran, S. Chatzinotas, B. Ottersten, and M. Juntti, “Multigroup multicast beamforming and antenna selection with rate-splitting in multicell systems,” in Proc. IEEE Int. Workshop Signal Process. Adv. Wireless Commun. (SPAWC), June 2018, pp. 1–5.
  • [37] H. Chen, D. Mi, T. Wang, Z. Chu, Y. Xu, D. He, and P. Xiao, “Rate-splitting for multicarrier multigroup multicast: Precoder design and error performance,” IEEE Trans. Broadcast., pp. 1–12, 2021.
  • [38] A. Alameer Ahmad, H. Dahrouj, A. Chaaban, A. Sezgin, and M. Alouini, “Interference mitigation via rate-splitting and common message decoding in cloud radio access networks,” IEEE Access, vol. 7, pp. 80 350–80 365, June 2019.
  • [39] Y. Mao, B. Clerckx, J. Zhang, V. O. K. Li, and M. Arafah, “Max-min fairness of K-user cooperative rate-splitting in MISO broadcast channel with user relaying,” IEEE Trans. Wireless Commun., pp. 1–1, 2020.
  • [40] H. Fu, S. Feng, W. Tang, and D. W. K. Ng, “Robust secure beamforming design for two-user downlink MISO rate-splitting systems,” IEEE Trans. Wireless Commun., vol. 19, no. 12, pp. 8351–8365, 2020.
  • [41] L. Li, K. Chai, J. Li, and X. Li, “Resource allocation for multicarrier rate-splitting multiple access system,” IEEE Access, vol. 8, pp. 174 222–174 232, 2020.
  • [42] W. Jaafar, S. Naser, S. Muhaidat, P. C. Sofotasios, and H. Yanikomeroglu, “Multiple access in aerial networks: From orthogonal and non-orthogonal to rate-splitting,” IEEE Open J. Veh. Tech., vol. 1, pp. 372–392, 2020.
  • [43] Y. Mao, B. Clerckx, and V. O. K. Li, “Rate-splitting multiple access for coordinated multi-point joint transmission,” in Proc. IEEE Int. Conf. Commun. (ICC) Workshop, May 2019, pp. 1–6.
  • [44] A. R. Flores, R. C. de Lamare, and B. Clerckx, “Linear precoding and stream combining for rate splitting in multiuser MIMO systems,” IEEE Commun. Lett., vol. 24, no. 4, pp. 890–894, 2020.
  • [45] N. Jindal, “MIMO broadcast channels with finite-rate feedback,” IEEE Trans. Inf. Theory, vol. 52, no. 11, pp. 5045–5060, Nov. 2006.
  • [46] G. Caire, N. Jindal, M. Kobayashi, and N. Ravindran, “Multiuser MIMO achievable rates with downlink training and channel state feedback,” IEEE Trans. Inf. Theory, vol. 56, no. 6, pp. 2845–2866, June 2010.
  • [47] S. Yang, M. Kobayashi, D. Gesbert, and X. Yi, “Degrees of freedom of time correlated MISO broadcast channel with delayed CSIT,” IEEE Trans. Inf. Theory, vol. 59, no. 1, pp. 315–328, Jan. 2013.
  • [48] A. Papazafeiropoulos, B. Clerckx, and T. Ratnarajah, “Rate-splitting to mitigate residual transceiver hardware impairments in massive MIMO systems,” IEEE Trans. Veh. Technol., vol. 66, no. 9, pp. 8196–8211, 2017.
  • [49] T. Yoo and A. Goldsmith, “Capacity of fading MIMO channels with channel estimation error,” in Proc. IEEE Int. Conf. Commun. (ICC), vol. 2, 2004, pp. 808–813.
  • [50] A. Shapiro et al., Lectures on Stochastic Programming: Modeling and Theory. Philadelphia, PA, USA: SIAM, 2009.
  • [51] Y. Ye, Interior point algorithms: theory and analysis. Springer, 1997.
  • [52] M. Grant, S. Boyd, and Y. Ye, “CVX: Matlab software for disciplined convex programming,” 2008.
  • [53] O. Dizdar, Y. Mao, W. Han, and B. Clerckx, “Rate-splitting multiple access for downlink multi-antenna communications: Physical layer design and link-level simulations,” in IEEE 31st Annual Int. Symp. Personal Indoor and Mobile Radio Commun. (PIMRC), 2020, pp. 1–6.
  • [54] E. Arıkan, “Channel polarization: a method for constructing capacity-achieving codes for symmetric binary-input memoryless channels,” IEEE Trans. Inf. Theory, vol. 55, no. 7, pp. 3051–3073, 2009.
  • [55] P. W. Wolniansky, G. J. Foschini, G. D. Golden, and R. A. Valenzuela, “V-BLAST: an architecture for realizing very high data rates over the rich-scattering wireless channel,” in Proc. Int. Symp. Signals, Syst., Electron. (ISSSE), Pisa, Italy, 1998, pp. 295–300.
  • [56] G. J. Foschini, G. D. Golden, R. A. Valenzuela, and P. W. Wolniansky, “Simplified processing for high spectral efficiency wireless communication employing multi-element arrays,” IEEE J. Sel. Areas Commun., vol. 17, no. 11, pp. 1841–1852, 1999.
  • [57] D. Seethaler, G. Matz, and F. Hlawatsch, “An efficient MMSE-based demodulator for MIMO bit-interleaved coded modulation,” in Proc. IEEE Global Commun. Conf. (GLOBECOM), vol. 4, 2004, pp. 2455–2459.
  • [58] I. Tal and A. Vardy, “List decoding of polar codes,” IEEE Trans. Inf. Theory, vol. 61, no. 5, pp. 2213–2226, 2015.
  • [59] H. Viswanathan, S. Venkatesan, and H. Huang, “Downlink capacity evaluation of cellular networks with known-interference cancellation,” IEEE J. Sel. Areas Commun., vol. 21, no. 5, pp. 802–811, June 2003.
  • [60] J. Chen and P. Elia, “Symmetric two-user MIMO BC with evolving feedback,” in Proc. Inf. Theory Appl. Workshop (ITA), 2014, pp. 1–5.